# ADR-041: AI/Gemini Integration Architecture

Date: 2026-01-09
Status: Accepted
Implemented: 2026-01-09

## Context
The application relies heavily on Google Gemini AI for core functionality:
- Flyer Processing: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
- Receipt Analysis: Parsing purchased items and prices from receipt images.
- Recipe Suggestions: Generating recipe ideas based on available ingredients.
- Text Extraction: OCR-style extraction from cropped image regions.
These AI operations present unique challenges:
- Rate Limits: Google AI API enforces requests-per-minute (RPM) limits.
- Quota Buckets: Different model families (stable, preview, experimental) have separate quotas.
- Model Availability: Models may be unavailable due to regional restrictions, updates, or high load.
- Cost Variability: Different models have different pricing (Flash-Lite vs Pro).
- Output Limits: Some models have 8k token limits, others 65k.
- Testability: Tests must not make real API calls.
## Decision

We will implement a centralized `AIService` class with:
- Dependency Injection: AI client and filesystem are injectable for testability.
- Model Fallback Chain: Automatic failover through prioritized model lists.
- Rate Limiting: Per-instance rate limiter using `p-ratelimit`.
- Tiered Model Selection: Different model lists for different task types.
- Environment-Aware Mocking: Automatic mock client in test environments.
### Design Principles

- Single Responsibility: `AIService` handles all AI interactions.
- Fail-Safe Fallbacks: If a model fails, try the next one in the chain.
- Cost Optimization: Use cheaper "lite" models for simple text tasks.
- Structured Logging: Log all AI interactions with timing and model info (see the sketch below).
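A minimal sketch of what that logging might capture around a single model call (the log field names are assumptions, not the project's actual schema):

```typescript
// Sketch only: structured log entry around one model call.
const start = Date.now();
try {
  const response = await genAI.models.generateContent({ model: modelName, ...request });
  this.logger.info({ event: 'ai.generate', model: modelName, durationMs: Date.now() - start });
  return response;
} catch (error) {
  this.logger.warn({ event: 'ai.generate.failed', model: modelName, durationMs: Date.now() - start });
  throw error;
}
```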
## Implementation Details

### AIService Class Structure

Located in `src/services/aiService.server.ts`:

```typescript
interface IAiClient {
generateContent(request: {
contents: Content[];
tools?: Tool[];
useLiteModels?: boolean;
}): Promise<GenerateContentResponse>;
}
interface IFileSystem {
readFile(path: string): Promise<Buffer>;
}
export class AIService {
private aiClient: IAiClient;
private fs: IFileSystem;
private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
private logger: Logger;
constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
// If aiClient provided: use it (unit test)
// Else if test environment: use internal mock (integration test)
// Else: create real GoogleGenAI client (production)
}
}
```
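Because both collaborators are injectable, a unit test can hand the service a mock client and never touch the network. A sketch of such a test (Vitest-style; the mock payload and the commented-out method call are illustrative, not the project's real API):

```typescript
import { describe, expect, it, vi } from 'vitest';
import { AIService } from './aiService.server';

describe('AIService', () => {
  it('uses the injected client, never the real API', async () => {
    const mockClient = {
      generateContent: vi.fn().mockResolvedValue({
        text: JSON.stringify({ store_name: 'Test Mart', items: [] }),
      }),
    };
    const logger = { info: vi.fn(), warn: vi.fn(), error: vi.fn() };

    const service = new AIService(logger as never, mockClient as never);

    // e.g. await service.processFlyer('/tmp/flyer.jpg'); // hypothetical method
    // expect(mockClient.generateContent).toHaveBeenCalled();
    expect(service).toBeInstanceOf(AIService);
  });
});
```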
### Tiered Model Lists

Models are organized by task complexity and quota bucket:

```typescript
// For image processing (vision + long output)
private readonly models = [
// Tier A: Fast & Stable
'gemini-2.5-flash', // Primary, 65k output
'gemini-2.5-flash-lite', // Cost-saver, 65k output
// Tier B: Heavy Lifters
'gemini-2.5-pro', // Complex layouts, 65k output
// Tier C: Preview Bucket (separate quota)
'gemini-3-flash-preview',
'gemini-3-pro-preview',
// Tier D: Experimental Bucket
'gemini-exp-1206',
// Tier E: Last Resort
'gemma-3-27b-it',
'gemini-2.0-flash-exp', // WARNING: 8k limit
];
// For simple text tasks (recipes, categorization)
private readonly models_lite = [
'gemini-2.5-flash-lite',
'gemini-2.0-flash-lite-001',
'gemini-2.0-flash-001',
'gemma-3-12b-it',
'gemma-3-4b-it',
'gemini-2.0-flash-exp',
];
```
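Routing between the two chains is driven by the `useLiteModels` flag on the request. A plausible sketch of the selection step (the helper name is an assumption):

```typescript
// Sketch: pick the fallback chain based on task complexity.
private _selectModels(useLiteModels?: boolean): string[] {
  return useLiteModels ? this.models_lite : this.models;
}

// e.g. const models = this._selectModels(request.useLiteModels);
```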
### Fallback with Retry Logic

```typescript
private async _generateWithFallback(
genAI: GoogleGenAI,
request: { contents: Content[]; tools?: Tool[] },
models: string[],
): Promise<GenerateContentResponse> {
let lastError: Error | null = null;
for (const modelName of models) {
try {
return await genAI.models.generateContent({ model: modelName, ...request });
} catch (error: unknown) {
const errorMsg = extractErrorMessage(error);
const isRetriable = [
'quota', '429', '503', 'resource_exhausted',
'overloaded', 'unavailable', 'not found'
].some(term => errorMsg.toLowerCase().includes(term));
if (isRetriable) {
this.logger.warn(`Model '${modelName}' failed, trying next...`);
lastError = new Error(errorMsg);
continue;
}
throw error; // Non-retriable error
}
}
throw lastError || new Error('All AI models failed.');
}
```
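The loop leans on an `extractErrorMessage` helper that is not shown above. A minimal sketch of what it might do (an assumption, not the project's actual implementation):

```typescript
// Sketch: normalize unknown thrown values into a string for matching.
function extractErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === 'string') return error;
  try {
    return JSON.stringify(error); // API errors often surface as plain objects
  } catch {
    return String(error);
  }
}
```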
### Rate Limiting

```typescript
import { pRateLimit } from 'p-ratelimit';

// In the AIService constructor:
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
interval: 60 * 1000,
rate: requestsPerMinute,
concurrency: requestsPerMinute,
});
// Usage:
const result = await this.rateLimiter(() =>
this.aiClient.generateContent({ contents: [...] })
);
```
### Test Environment Detection

```typescript
// In the AIService constructor:
const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;
if (aiClient) {
// Unit test: use provided mock
this.aiClient = aiClient;
} else if (isTestEnvironment) {
// Integration test: use internal mock
this.aiClient = {
generateContent: async () => ({
text: JSON.stringify(this.getMockFlyerData()),
}),
};
} else {
// Production: use real client
const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
this.aiClient = { generateContent: /* adapter */ };
}
```
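The elided adapter is where the pieces above meet: it exposes the narrow `IAiClient` interface while routing every call through the rate limiter and the model fallback chain. A sketch of how it might be wired (details assumed):

```typescript
// Sketch: production adapter bridging IAiClient to the real SDK client.
this.aiClient = {
  generateContent: (request) =>
    this.rateLimiter(() =>
      this._generateWithFallback(
        genAI,
        { contents: request.contents, tools: request.tools },
        request.useLiteModels ? this.models_lite : this.models,
      ),
    ),
};
```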
### Prompt Engineering
Prompts are constructed with:
- Clear Task Definition: What to extract and in what format.
- Structured Output Requirements: JSON schema with field descriptions.
- Examples: Concrete examples of expected output.
- Context Hints: User location for store address resolution.

```typescript
private _buildFlyerExtractionPrompt(
masterItems: MasterGroceryItem[],
submitterIp?: string,
userProfileAddress?: string,
): string {
// Location hint for address resolution
let locationHint = '';
if (userProfileAddress) {
locationHint = `The user has profile address "${userProfileAddress}"...`;
}
// Simplified master item list (reduce token usage)
const simplifiedMasterList = masterItems.map(item => ({
id: item.master_grocery_item_id,
name: item.name,
}));
return `
# TASK
Analyze the flyer image(s) and extract...
# RULES
1. Extract store_name, valid_from, valid_to, store_address
2. Extract items array with item, price_display, price_in_cents...
# EXAMPLES
- { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }
# MASTER LIST
${JSON.stringify(simplifiedMasterList)}
`;
}
```
### Response Parsing

AI responses may contain markdown, trailing text, or formatting issues:

```typescript
private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
if (!responseText) return null;
// Try to extract from markdown code block
const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
let jsonString = markdownMatch?.[2]?.trim() || responseText;
// Find JSON boundaries
const startIndex = Math.min(
jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity
);
const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
if (startIndex === Infinity || endIndex === -1) return null;
try {
return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
} catch {
return null;
}
}
```
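Since `src/types/ai.ts` holds Zod schemas for AI responses, the loosely-parsed value can then be validated before use. A sketch of the pairing (the schema and its export name are hypothetical):

```typescript
import { flyerExtractionSchema } from '../types/ai'; // hypothetical export name

const raw = this._parseJsonFromAiResponse<unknown>(response.text, this.logger);
const result = flyerExtractionSchema.safeParse(raw);
if (!result.success) {
  this.logger.warn('AI response failed schema validation');
  return null;
}
return result.data;
```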
## Consequences

### Positive
- Resilience: Automatic failover when models are unavailable or rate-limited.
- Cost Control: Uses cheaper models for simple tasks.
- Testability: Full mock support for unit and integration tests.
- Observability: Detailed logging of all AI operations with timing.
- Maintainability: Centralized AI logic in one service.
### Negative
- Model List Maintenance: Must update model lists when new models release.
- Complexity: Fallback logic adds complexity.
- Delayed Failures: May take longer to fail if all models are down.
### Mitigation
- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup (see the sketch after this list).
- Consider caching successful model selections per task type.
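One possible shape for the startup health check suggested above (a sketch; the method name is illustrative):

```typescript
// Sketch: fail fast at startup if no model in the chain is reachable.
export async function checkAiConnectivity(service: AIService, logger: Logger): Promise<boolean> {
  try {
    // Any cheap text task exercises the full rate-limit + fallback path.
    await service.generateText('Reply with the single word: ok'); // hypothetical method
    return true;
  } catch (error) {
    logger.error({ event: 'ai.healthcheck.failed', error });
    return false;
  }
}
```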
## Key Files

- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation