# ADR-041: AI/Gemini Integration Architecture

**Date**: 2026-01-09

**Status**: Accepted

**Implemented**: 2026-01-09
## Context

The application relies heavily on Google Gemini AI for core functionality:

1. **Flyer Processing**: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
2. **Receipt Analysis**: Parsing purchased items and prices from receipt images.
3. **Recipe Suggestions**: Generating recipe ideas based on available ingredients.
4. **Text Extraction**: OCR-style extraction from cropped image regions.

These AI operations pose distinct challenges:

- **Rate Limits**: The Google AI API enforces requests-per-minute (RPM) limits.
- **Quota Buckets**: Different model families (stable, preview, experimental) have separate quotas.
- **Model Availability**: Models may be unavailable due to regional restrictions, updates, or high load.
- **Cost Variability**: Different models have different pricing (Flash-Lite vs. Pro).
- **Output Limits**: Some models cap output at 8k tokens, others at 65k.
- **Testability**: Tests must not make real API calls.
## Decision

We will implement a centralized `AIService` class with:

1. **Dependency Injection**: AI client and filesystem are injectable for testability.
2. **Model Fallback Chain**: Automatic failover through prioritized model lists.
3. **Rate Limiting**: Per-instance rate limiter using `p-ratelimit`.
4. **Tiered Model Selection**: Different model lists for different task types.
5. **Environment-Aware Mocking**: Automatic mock client in test environments.

### Design Principles

- **Single Responsibility**: `AIService` handles all AI interactions.
- **Fail-Safe Fallbacks**: If a model fails, try the next one in the chain.
- **Cost Optimization**: Use cheaper "lite" models for simple text tasks.
- **Structured Logging**: Log all AI interactions with timing and model info.
## Implementation Details

### AIService Class Structure

Located in `src/services/aiService.server.ts`:

```typescript
interface IAiClient {
  generateContent(request: {
    contents: Content[];
    tools?: Tool[];
    useLiteModels?: boolean;
  }): Promise<GenerateContentResponse>;
}

interface IFileSystem {
  readFile(path: string): Promise<Buffer>;
}

export class AIService {
  private aiClient: IAiClient;
  private fs: IFileSystem;
  private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
  private logger: Logger;

  constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
    // If aiClient provided: use it (unit test)
    // Else if test environment: use internal mock (integration test)
    // Else: create real GoogleGenAI client (production)
  }
}
```
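The injectable `IAiClient` is what makes unit testing straightforward. As a sketch (the mock shape below is illustrative, not the project's actual test helper), a recording mock can stand in for the real client and be passed to the constructor:

```typescript
// Minimal shapes matching the interfaces above (illustrative only).
interface GenerateContentResponse { text?: string }
interface IAiClient {
  generateContent(request: { contents: unknown[] }): Promise<GenerateContentResponse>;
}

// A recording mock: returns canned JSON and remembers every request.
function makeMockAiClient(payload: unknown) {
  const calls: unknown[] = [];
  const client: IAiClient = {
    generateContent: async (request) => {
      calls.push(request);
      return { text: JSON.stringify(payload) };
    },
  };
  return { client, calls };
}

// In a unit test this would be injected as: new AIService(logger, client)
const { client, calls } = makeMockAiClient({ store_name: 'Test Mart' });
client.generateContent({ contents: [] }).then(response => {
  console.log(response.text, calls.length); // {"store_name":"Test Mart"} 1
});
```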
### Tiered Model Lists

Models are organized by task complexity and quota bucket:

```typescript
// For image processing (vision + long output)
private readonly models = [
  // Tier A: Fast & Stable
  'gemini-2.5-flash',      // Primary, 65k output
  'gemini-2.5-flash-lite', // Cost-saver, 65k output

  // Tier B: Heavy Lifters
  'gemini-2.5-pro',        // Complex layouts, 65k output

  // Tier C: Preview Bucket (separate quota)
  'gemini-3-flash-preview',
  'gemini-3-pro-preview',

  // Tier D: Experimental Bucket
  'gemini-exp-1206',

  // Tier E: Last Resort
  'gemma-3-27b-it',
  'gemini-2.0-flash-exp',  // WARNING: 8k limit
];

// For simple text tasks (recipes, categorization)
private readonly models_lite = [
  'gemini-2.5-flash-lite',
  'gemini-2.0-flash-lite-001',
  'gemini-2.0-flash-001',
  'gemma-3-12b-it',
  'gemma-3-4b-it',
  'gemini-2.0-flash-exp',
];
```
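How the `useLiteModels` flag routes a request to one of these chains can be sketched as follows (abbreviated copies of the lists above; the `selectModels` helper is illustrative, not the actual method name):

```typescript
// Abbreviated copies of the chains above, for illustration only.
const models = ['gemini-2.5-flash', 'gemini-2.5-flash-lite', 'gemini-2.5-pro'];
const models_lite = ['gemini-2.5-flash-lite', 'gemini-2.0-flash-lite-001'];

// Pick a chain by task type: lite models for simple text tasks.
function selectModels(useLiteModels?: boolean): string[] {
  return useLiteModels ? models_lite : models;
}

console.log(selectModels(true)[0]); // gemini-2.5-flash-lite
console.log(selectModels()[0]);     // gemini-2.5-flash
```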
### Fallback with Retry Logic

```typescript
private async _generateWithFallback(
  genAI: GoogleGenAI,
  request: { contents: Content[]; tools?: Tool[] },
  models: string[],
): Promise<GenerateContentResponse> {
  let lastError: Error | null = null;

  for (const modelName of models) {
    try {
      return await genAI.models.generateContent({ model: modelName, ...request });
    } catch (error: unknown) {
      const errorMsg = extractErrorMessage(error);
      const isRetriable = [
        'quota', '429', '503', 'resource_exhausted',
        'overloaded', 'unavailable', 'not found',
      ].some(term => errorMsg.toLowerCase().includes(term));

      if (isRetriable) {
        this.logger.warn(`Model '${modelName}' failed, trying next...`);
        lastError = new Error(errorMsg);
        continue;
      }
      throw error; // Non-retriable error
    }
  }
  throw lastError || new Error('All AI models failed.');
}
```
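The retriable-error classification can be exercised in isolation. A standalone sketch of the same substring test used in the loop above:

```typescript
// Substring markers that indicate a transient, fallback-worthy failure.
const RETRIABLE_TERMS = [
  'quota', '429', '503', 'resource_exhausted',
  'overloaded', 'unavailable', 'not found',
];

function isRetriable(errorMsg: string): boolean {
  const lower = errorMsg.toLowerCase();
  return RETRIABLE_TERMS.some(term => lower.includes(term));
}

console.log(isRetriable('Error 429: RESOURCE_EXHAUSTED')); // true  -> try next model
console.log(isRetriable('Invalid API key'));               // false -> rethrow
```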
### Rate Limiting

```typescript
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
  interval: 60 * 1000,
  rate: requestsPerMinute,
  concurrency: requestsPerMinute,
});

// Usage:
const result = await this.rateLimiter(() =>
  this.aiClient.generateContent({ contents: [...] })
);
```
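For intuition, the limiter's contract (at most `rate` call starts inside any rolling `interval`) could be sketched without the library. This is an illustrative stand-in, not `p-ratelimit`'s implementation:

```typescript
// Sliding-window limiter sketch: at most `rate` starts per `intervalMs`.
function makeLimiter(rate: number, intervalMs: number) {
  const starts: number[] = [];
  return async <T>(fn: () => Promise<T>): Promise<T> => {
    for (;;) {
      const now = Date.now();
      // Forget call starts that have left the rolling window.
      while (starts.length && now - starts[0] >= intervalMs) starts.shift();
      if (starts.length < rate) break;
      // Window is full: sleep until the oldest start expires.
      await new Promise(resolve => setTimeout(resolve, intervalMs - (now - starts[0])));
    }
    starts.push(Date.now());
    return fn();
  };
}

async function demo() {
  const limit = makeLimiter(2, 200);
  const t0 = Date.now();
  await limit(async () => 'a');
  await limit(async () => 'b');
  await limit(async () => 'c'); // must wait for the window to roll
  console.log(Date.now() - t0 >= 180); // true
}
demo();
```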
### Test Environment Detection

```typescript
const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;

if (aiClient) {
  // Unit test: use provided mock
  this.aiClient = aiClient;
} else if (isTestEnvironment) {
  // Integration test: use internal mock
  this.aiClient = {
    generateContent: async () => ({
      text: JSON.stringify(this.getMockFlyerData()),
    }),
  };
} else {
  // Production: use real client
  const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  this.aiClient = { generateContent: /* adapter */ };
}
```
### Prompt Engineering

Prompts are constructed with:

1. **Clear Task Definition**: What to extract and in what format.
2. **Structured Output Requirements**: JSON schema with field descriptions.
3. **Examples**: Concrete examples of expected output.
4. **Context Hints**: User location for store address resolution.

```typescript
private _buildFlyerExtractionPrompt(
  masterItems: MasterGroceryItem[],
  submitterIp?: string,
  userProfileAddress?: string,
): string {
  // Location hint for address resolution
  let locationHint = '';
  if (userProfileAddress) {
    locationHint = `The user has profile address "${userProfileAddress}"...`;
  }

  // Simplified master item list (reduce token usage)
  const simplifiedMasterList = masterItems.map(item => ({
    id: item.master_grocery_item_id,
    name: item.name,
  }));

  return `
# TASK
Analyze the flyer image(s) and extract...

# RULES
1. Extract store_name, valid_from, valid_to, store_address
2. Extract items array with item, price_display, price_in_cents...

# EXAMPLES
- { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }

# MASTER LIST
${JSON.stringify(simplifiedMasterList)}
`;
}
```
### Response Parsing

AI responses may contain markdown fences, trailing text, or formatting issues:

````typescript
private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
  if (!responseText) return null;

  // Try to extract JSON from a markdown code block
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;

  // Find the JSON boundaries
  const startIndex = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity,
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));

  if (startIndex === Infinity || endIndex === -1) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
  } catch {
    return null;
  }
}
````
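As a usage sketch, here is the same strategy as a standalone function applied to a fenced reply (copied into a free function so the example is self-contained; the real method is private on `AIService`):

````typescript
function parseJsonFromAiResponse<T>(responseText?: string): T | null {
  if (!responseText) return null;
  // Prefer the contents of a markdown code block, if any.
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;
  // Trim to the outermost JSON object/array boundaries.
  const start = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity,
  );
  const end = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
  if (start === Infinity || end === -1) return null;
  try {
    return JSON.parse(jsonString.substring(start, end + 1));
  } catch {
    return null;
  }
}

// A typical model reply: fenced JSON plus trailing commentary.
const reply = '```json\n{"store_name": "Test Mart"}\n```\nHope that helps!';
console.log(parseJsonFromAiResponse<{ store_name: string }>(reply)); // { store_name: 'Test Mart' }
````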
## Consequences

### Positive

- **Resilience**: Automatic failover when models are unavailable or rate-limited.
- **Cost Control**: Uses cheaper models for simple tasks.
- **Testability**: Full mock support for unit and integration tests.
- **Observability**: Detailed logging of all AI operations with timing.
- **Maintainability**: Centralized AI logic in one service.

### Negative

- **Model List Maintenance**: Must update model lists when new models release.
- **Complexity**: Fallback logic adds complexity.
- **Delayed Failures**: May take longer to fail if all models are down.

### Mitigation

- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup.
- Consider caching successful model selections per task type.
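The last mitigation could look roughly like this (a hypothetical sketch; `lastGoodModel`, `orderModels`, and `recordSuccess` are illustrative names, not existing code):

```typescript
// Remember the most recent model that succeeded for each task type,
// so the fallback loop tries it first on the next call.
const lastGoodModel = new Map<string, string>();

function orderModels(taskType: string, models: string[]): string[] {
  const preferred = lastGoodModel.get(taskType);
  if (!preferred || !models.includes(preferred)) return models;
  return [preferred, ...models.filter(m => m !== preferred)];
}

function recordSuccess(taskType: string, model: string): void {
  lastGoodModel.set(taskType, model);
}

recordSuccess('flyer', 'gemini-2.5-pro');
console.log(orderModels('flyer', ['gemini-2.5-flash', 'gemini-2.5-pro'])); // [ 'gemini-2.5-pro', 'gemini-2.5-flash' ]
```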
## Key Files

- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation

## Related ADRs

- [ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md) - Naming Conventions for AI Types
- [ADR-039](./0039-dependency-injection-pattern.md) - Dependency Injection Pattern
- [ADR-001](./0001-standardized-error-handling.md) - Error Handling