flyer-crawler.projectium.com/docs/adr/0041-ai-gemini-integration-architecture.md
Torben Sorensen e14c19c112
2026-01-09 22:38:57 -08:00

ADR-041: AI/Gemini Integration Architecture

Date: 2026-01-09

Status: Accepted

Implemented: 2026-01-09

Context

The application relies heavily on Google Gemini AI for core functionality:

  1. Flyer Processing: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
  2. Receipt Analysis: Parsing purchased items and prices from receipt images.
  3. Recipe Suggestions: Generating recipe ideas based on available ingredients.
  4. Text Extraction: OCR-style extraction from cropped image regions.

These AI operations pose several challenges:

  • Rate Limits: The Google AI API enforces requests-per-minute (RPM) limits.
  • Quota Buckets: Different model families (stable, preview, experimental) have separate quotas.
  • Model Availability: Models may be unavailable due to regional restrictions, updates, or high load.
  • Cost Variability: Pricing differs between models (for example, Flash-Lite vs. Pro).
  • Output Limits: Some models cap output at roughly 8k tokens, others at roughly 65k.
  • Testability: Tests must not make real API calls.

Decision

We will implement a centralized AIService class with:

  1. Dependency Injection: AI client and filesystem are injectable for testability.
  2. Model Fallback Chain: Automatic failover through prioritized model lists.
  3. Rate Limiting: Per-instance rate limiter using p-ratelimit.
  4. Tiered Model Selection: Different model lists for different task types.
  5. Environment-Aware Mocking: Automatic mock client in test environments.

Design Principles

  • Single Responsibility: AIService handles all AI interactions.
  • Fail-Safe Fallbacks: If a model fails, try the next one in the chain.
  • Cost Optimization: Use cheaper "lite" models for simple text tasks.
  • Structured Logging: Log all AI interactions with timing and model info.
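
The structured-logging principle can be illustrated with a small timing wrapper. This is a minimal sketch, not the service's actual logging code: `TimingLogger` and `withTiming` are hypothetical names, and the real `Logger` type is not shown in this ADR.

```typescript
// Hypothetical minimal logger shape; the real Logger type is not shown in this ADR.
interface TimingLogger {
  info(fields: Record<string, unknown>, message: string): void;
}

// Runs an AI call and logs which model was used, how long it took, and whether it succeeded.
async function withTiming<T>(
  logger: TimingLogger,
  model: string,
  call: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await call();
    logger.info({ model, durationMs: Date.now() - startedAt, ok: true }, 'AI call finished');
    return result;
  } catch (error) {
    logger.info({ model, durationMs: Date.now() - startedAt, ok: false }, 'AI call failed');
    throw error; // The wrapper only observes; error handling stays with the caller.
  }
}
```

Logging fields as an object rather than interpolating them into the message keeps model names and durations queryable in a log aggregator.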

Implementation Details

AIService Class Structure

Located in src/services/aiService.server.ts:

interface IAiClient {
  generateContent(request: {
    contents: Content[];
    tools?: Tool[];
    useLiteModels?: boolean;
  }): Promise<GenerateContentResponse>;
}

interface IFileSystem {
  readFile(path: string): Promise<Buffer>;
}

export class AIService {
  private aiClient: IAiClient;
  private fs: IFileSystem;
  private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
  private logger: Logger;

  constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
    // If aiClient provided: use it (unit test)
    // Else if test environment: use internal mock (integration test)
    // Else: create real GoogleGenAI client (production)
  }
}
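
To show why the injectable client matters for tests, here is a hedged sketch of a hand-written fake passed into a consumer. `FakeAiClient`, `RecipeSuggester`, and the simplified `Content` and `GenerateContentResponse` types are assumptions made for illustration; the real service and SDK types are richer.

```typescript
// Simplified stand-ins for the SDK types used by IAiClient (assumptions for this sketch).
type Content = { role?: string; parts: Array<{ text?: string }> };
type GenerateContentResponse = { text?: string };
type GenerateRequest = { contents: Content[]; useLiteModels?: boolean };

interface IAiClient {
  generateContent(request: GenerateRequest): Promise<GenerateContentResponse>;
}

// Hypothetical fake: records every request and returns a canned reply.
class FakeAiClient implements IAiClient {
  readonly requests: GenerateRequest[] = [];
  private readonly cannedText: string;

  constructor(cannedText: string) {
    this.cannedText = cannedText;
  }

  async generateContent(request: GenerateRequest): Promise<GenerateContentResponse> {
    this.requests.push(request);
    return { text: this.cannedText };
  }
}

// Toy consumer standing in for AIService: it depends only on the interface.
class RecipeSuggester {
  private readonly aiClient: IAiClient;

  constructor(aiClient: IAiClient) {
    this.aiClient = aiClient;
  }

  async suggest(ingredients: string[]): Promise<string> {
    const prompt = `Suggest a recipe using: ${ingredients.join(', ')}`;
    const response = await this.aiClient.generateContent({
      contents: [{ role: 'user', parts: [{ text: prompt }] }],
      useLiteModels: true,
    });
    return response.text ?? '';
  }
}
```

A test can then assert both on the returned text and on `fake.requests`, without any network access or API key.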

Tiered Model Lists

Models are organized by task complexity and quota bucket:

// For image processing (vision + long output)
private readonly models = [
  // Tier A: Fast & Stable
  'gemini-2.5-flash',      // Primary, 65k output
  'gemini-2.5-flash-lite', // Cost-saver, 65k output

  // Tier B: Heavy Lifters
  'gemini-2.5-pro',        // Complex layouts, 65k output

  // Tier C: Preview Bucket (separate quota)
  'gemini-3-flash-preview',
  'gemini-3-pro-preview',

  // Tier D: Experimental Bucket
  'gemini-exp-1206',

  // Tier E: Last Resort
  'gemma-3-27b-it',
  'gemini-2.0-flash-exp',  // WARNING: 8k limit
];

// For simple text tasks (recipes, categorization)
private readonly models_lite = [
  'gemini-2.5-flash-lite',
  'gemini-2.0-flash-lite-001',
  'gemini-2.0-flash-001',
  'gemma-3-12b-it',
  'gemma-3-4b-it',
  'gemini-2.0-flash-exp',
];
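
The ADR does not show how the `useLiteModels` flag maps to a list, so the following is only a sketch of one plausible mapping. `selectModelList` is a hypothetical helper, and both lists are shortened for readability.

```typescript
// Shortened copies of the two lists above, for illustration only.
const MODELS: readonly string[] = ['gemini-2.5-flash', 'gemini-2.5-flash-lite', 'gemini-2.5-pro'];
const MODELS_LITE: readonly string[] = ['gemini-2.5-flash-lite', 'gemini-2.0-flash-lite-001'];

// Hypothetical helper: simple text tasks opt in to the cheaper chain,
// everything else (including callers that omit the flag) gets the full chain.
function selectModelList(useLiteModels?: boolean): readonly string[] {
  return useLiteModels ? MODELS_LITE : MODELS;
}
```

Defaulting to the full chain when the flag is omitted means a caller has to opt in explicitly before a vision task can land on a text-oriented list.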

Fallback with Retry Logic

private async _generateWithFallback(
  genAI: GoogleGenAI,
  request: { contents: Content[]; tools?: Tool[] },
  models: string[],
): Promise<GenerateContentResponse> {
  let lastError: Error | null = null;

  for (const modelName of models) {
    try {
      return await genAI.models.generateContent({ model: modelName, ...request });
    } catch (error: unknown) {
      const errorMsg = extractErrorMessage(error);
      const isRetriable = [
        'quota', '429', '503', 'resource_exhausted',
        'overloaded', 'unavailable', 'not found'
      ].some(term => errorMsg.toLowerCase().includes(term));

      if (isRetriable) {
        this.logger.warn(`Model '${modelName}' failed, trying next...`);
        lastError = new Error(errorMsg);
        continue;
      }
      throw error; // Non-retriable error
    }
  }
  throw lastError || new Error('All AI models failed.');
}
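
The failover behaviour is easier to test when separated from the SDK. The sketch below mirrors the loop above with a stubbed `generate` function in place of the real client; `GenerateFn`, `isRetriable`, and `generateWithFallback` are illustrative names, not the service's actual members.

```typescript
// Stand-in for a single model call; the real code calls the Google Gen AI SDK.
type GenerateFn = (model: string) => Promise<string>;

const RETRIABLE_TERMS = [
  'quota', '429', '503', 'resource_exhausted',
  'overloaded', 'unavailable', 'not found',
];

// Same substring matching as above: the error text decides whether to move on.
function isRetriable(error: unknown): boolean {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  return RETRIABLE_TERMS.some((term) => message.includes(term));
}

// Tries each model in order; returns the first success together with the model used.
async function generateWithFallback(
  models: readonly string[],
  generate: GenerateFn,
): Promise<{ model: string; text: string }> {
  let lastError: unknown = null;
  for (const model of models) {
    try {
      return { model, text: await generate(model) };
    } catch (error) {
      if (!isRetriable(error)) throw error; // e.g. bad API key: no point trying other models
      lastError = error;
    }
  }
  throw lastError ?? new Error('All AI models failed.');
}
```

Each model is attempted once, so this is failover across models rather than repeated retries of the same model. Matching on message substrings is simple but brittle; if the SDK exposes a structured status code, checking that would be more robust.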

Rate Limiting

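import { pRateLimit } from 'p-ratelimit';

// Requests allowed per minute; defaults to 5 when GEMINI_RPM is unset.
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
  interval: 60 * 1000,            // window length in ms (one minute)
  rate: requestsPerMinute,        // max calls started per window
  concurrency: requestsPerMinute, // max calls in flight at once
});

// Usage:
const result = await this.rateLimiter(() =>
  this.aiClient.generateContent({ contents: [/* ... */] })
);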
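
One edge case worth noting: `parseInt` returns `NaN` for a non-numeric `GEMINI_RPM`, and how the limiter treats such a value is not covered here. A defensive parse could look like the sketch below; `resolveRequestsPerMinute` is a hypothetical helper, not part of the current service.

```typescript
// Hypothetical helper: falls back to the default when the value is missing,
// non-numeric, zero, or negative.
function resolveRequestsPerMinute(raw: string | undefined, fallback = 5): number {
  const parsed = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}
```

It would be called as `resolveRequestsPerMinute(process.env.GEMINI_RPM)` before the limiter is constructed.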

Test Environment Detection

const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;

if (aiClient) {
  // Unit test: use provided mock
  this.aiClient = aiClient;
} else if (isTestEnvironment) {
  // Integration test: use internal mock
  this.aiClient = {
    generateContent: async () => ({
      text: JSON.stringify(this.getMockFlyerData()),
    }),
  };
} else {
  // Production: use real client
  const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  this.aiClient = { generateContent: /* adapter */ };
}
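
The three-way branch can be expressed as a pure function, which makes the precedence (injected client first, then test detection, then real client) easy to verify. This is a sketch under the assumption that the environment is passed in explicitly; `ClientMode` and `selectClientMode` are hypothetical names.

```typescript
type ClientMode = 'injected' | 'internal-mock' | 'real';

// Mirrors the branch order above; env is passed in so the function needs no globals.
function selectClientMode(
  hasInjectedClient: boolean,
  env: Record<string, string | undefined>,
): ClientMode {
  if (hasInjectedClient) return 'injected';
  const isTestEnvironment = env.NODE_ENV === 'test' || !!env.VITEST_POOL_ID;
  return isTestEnvironment ? 'internal-mock' : 'real';
}
```

An injected client wins even in production, so the constructor argument should only ever be supplied by tests or by deliberate wiring code.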

Prompt Engineering

Prompts are constructed with:

  1. Clear Task Definition: What to extract and in what format.
  2. Structured Output Requirements: JSON schema with field descriptions.
  3. Examples: Concrete examples of expected output.
  4. Context Hints: User location for store address resolution.

private _buildFlyerExtractionPrompt(
  masterItems: MasterGroceryItem[],
  submitterIp?: string,
  userProfileAddress?: string,
): string {
  // Location hint for address resolution
  let locationHint = '';
  if (userProfileAddress) {
    locationHint = `The user has profile address "${userProfileAddress}"...`;
  }

  // Simplified master item list (reduce token usage)
  const simplifiedMasterList = masterItems.map(item => ({
    id: item.master_grocery_item_id,
    name: item.name,
  }));

  return `
    # TASK
    Analyze the flyer image(s) and extract...

    # RULES
    1. Extract store_name, valid_from, valid_to, store_address
    2. Extract items array with item, price_display, price_in_cents...

    # EXAMPLES
    - { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }

    # MASTER LIST
    ${JSON.stringify(simplifiedMasterList)}
  `;
}
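
The effect of simplifying the master list can be checked in isolation. In this sketch, `category_name` and `created_at` are hypothetical extra columns used only to show the size difference; only `master_grocery_item_id` and `name` appear in the code above.

```typescript
interface MasterGroceryItem {
  master_grocery_item_id: number;
  name: string;
  // Hypothetical extra columns, included only to illustrate the savings.
  category_name?: string;
  created_at?: string;
}

// Keeps just the fields the model needs in order to match flyer items to master items.
function simplifyMasterList(items: MasterGroceryItem[]): Array<{ id: number; name: string }> {
  return items.map((item) => ({ id: item.master_grocery_item_id, name: item.name }));
}
```

Because the list is serialized into every flyer prompt, trimming unused fields reduces input tokens on each request; the exact saving depends on the real row shape.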

Response Parsing

AI responses may wrap the JSON in a markdown code block or surround it with extra text, so the parser isolates the outermost JSON object or array before parsing:

private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
  if (!responseText) return null;

  // Try to extract from markdown code block
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  let jsonString = markdownMatch?.[2]?.trim() || responseText;

  // Find JSON boundaries
  const startIndex = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));

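  // No opening bracket, or the last closing bracket precedes it: nothing to parse
  if (startIndex === Infinity || endIndex < startIndex) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
  } catch (error) {
    logger.warn(`Failed to parse JSON from AI response: ${extractErrorMessage(error)}`);
    return null;
  }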
}
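
The parsing approach can be exercised as a standalone function. The sketch below follows the same steps (unwrap a fenced block, locate the outermost brackets, parse) but builds the fence pattern from a constant to avoid literal backtick runs; `parseJsonFromAiResponse` and `FENCE` are illustrative names.

```typescript
// Three backticks, built programmatically so this example nests cleanly in documentation.
const FENCE = '`'.repeat(3);
const FENCED_BLOCK = new RegExp(FENCE + '(json)?\\s*([\\s\\S]*?)\\s*' + FENCE);

function parseJsonFromAiResponse<T>(responseText: string | undefined): T | null {
  if (!responseText) return null;

  // Prefer the contents of a fenced block when one is present.
  const jsonString = responseText.match(FENCED_BLOCK)?.[2]?.trim() || responseText;

  // Earliest opening bracket and latest closing bracket of either kind.
  const firstObject = jsonString.indexOf('{');
  const firstArray = jsonString.indexOf('[');
  const startIndex = Math.min(
    firstObject >= 0 ? firstObject : Infinity,
    firstArray >= 0 ? firstArray : Infinity,
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
  if (startIndex === Infinity || endIndex < startIndex) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1)) as T;
  } catch {
    return null;
  }
}
```

A fenced reply such as `{"a":1}` inside a `json` block and a chatty reply such as `Here is the result: [1, 2, 3]. Hope that helps!` both parse. The bracket search is a heuristic: text like `{"a":[1,2]} trailing ]` yields `null` because the final `]` extends the slice past the object. Parsed values are not validated here; schema validation is a separate step.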

Consequences

Positive

  • Resilience: Automatic failover when models are unavailable or rate-limited.
  • Cost Control: Uses cheaper models for simple tasks.
  • Testability: Full mock support for unit and integration tests.
  • Observability: Detailed logging of all AI operations with timing.
  • Maintainability: Centralized AI logic in one service.

Negative

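  • Model List Maintenance: The model lists must be updated whenever Google releases or retires models.
  • Complexity: The fallback logic adds code paths that must be understood and tested.
  • Delayed Failures: When every model is down, a request fails only after the entire chain has been tried.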

Mitigation

  • Monitor model deprecation announcements from Google.
  • Add health checks that validate AI connectivity on startup.
  • Consider caching successful model selections per task type.
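
The caching idea in the last mitigation could be sketched as follows. This is not implemented in the service as described; `PreferredModelCache` and the `'vision'` and `'text'` task names are hypothetical.

```typescript
// Hypothetical task categories; the ADR distinguishes image processing from simple text tasks.
type TaskType = 'vision' | 'text';

// Remembers the last model that succeeded per task type and tries it first next time.
class PreferredModelCache {
  private readonly lastGood = new Map<TaskType, string>();

  recordSuccess(task: TaskType, model: string): void {
    this.lastGood.set(task, model);
  }

  // Returns a copy of the chain with the remembered model moved to the front.
  order(task: TaskType, chain: readonly string[]): string[] {
    const preferred = this.lastGood.get(task);
    if (!preferred || !chain.includes(preferred)) return [...chain];
    return [preferred, ...chain.filter((model) => model !== preferred)];
  }
}
```

One trade-off to weigh: if the remembered model is a costlier or lower-priority fallback, the cache keeps preferring it after the primary model recovers, so an expiry or periodic reset would likely be needed.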

Key Files

  • src/services/aiService.server.ts - Main AIService class
  • src/services/aiService.server.test.ts - Unit tests with mocked AI client
  • src/services/aiApiClient.ts - Low-level API client wrapper
  • src/services/aiAnalysisService.ts - Higher-level analysis orchestration
  • src/types/ai.ts - Zod schemas for AI response validation

Related ADRs

  • ADR-027 - Naming Conventions for AI Types
  • ADR-039 - Dependency Injection Pattern
  • ADR-001 - Error Handling