# ADR-041: AI/Gemini Integration Architecture

**Date**: 2026-01-09

**Status**: Accepted

**Implemented**: 2026-01-09

## Context

The application relies heavily on Google Gemini AI for core functionality:

1. **Flyer Processing**: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
2. **Receipt Analysis**: Parsing purchased items and prices from receipt images.
3. **Recipe Suggestions**: Generating recipe ideas based on available ingredients.
4. **Text Extraction**: OCR-style extraction from cropped image regions.

These AI operations have unique challenges:

- **Rate Limits**: Google AI API enforces requests-per-minute (RPM) limits.
- **Quota Buckets**: Different model families (stable, preview, experimental) have separate quotas.
- **Model Availability**: Models may be unavailable due to regional restrictions, updates, or high load.
- **Cost Variability**: Different models have different pricing (Flash-Lite vs Pro).
- **Output Limits**: Some models have 8k token limits, others 65k.
- **Testability**: Tests must not make real API calls.

## Decision

We will implement a centralized `AIService` class with:

1. **Dependency Injection**: AI client and filesystem are injectable for testability.
2. **Model Fallback Chain**: Automatic failover through prioritized model lists.
3. **Rate Limiting**: Per-instance rate limiter using `p-ratelimit`.
4. **Tiered Model Selection**: Different model lists for different task types.
5. **Environment-Aware Mocking**: Automatic mock client in test environments.

### Design Principles

- **Single Responsibility**: `AIService` handles all AI interactions.
- **Fail-Safe Fallbacks**: If a model fails, try the next one in the chain.
- **Cost Optimization**: Use cheaper "lite" models for simple text tasks.
- **Structured Logging**: Log all AI interactions with timing and model info.
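
The Structured Logging principle could be met with a thin wrapper around each model call. The following is a minimal sketch, not the project's implementation: it assumes a logger that accepts `(fields, message)` pairs, and `MinimalLogger` and `withAiTiming` are hypothetical names introduced only for illustration.

```typescript
// Hypothetical logger shape; the real Logger type is not shown in this ADR.
interface MinimalLogger {
  info(fields: Record<string, unknown>, msg: string): void;
  warn(fields: Record<string, unknown>, msg: string): void;
}

// Runs one AI call and logs the model name, operation, outcome, and elapsed time.
// Errors are logged and then rethrown so the caller's fallback logic still sees them.
async function withAiTiming<T>(
  logger: MinimalLogger,
  model: string,
  operation: string,
  call: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await call();
    logger.info(
      { model, operation, durationMs: Date.now() - startedAt },
      'AI call succeeded',
    );
    return result;
  } catch (error: unknown) {
    logger.warn(
      { model, operation, durationMs: Date.now() - startedAt, error: String(error) },
      'AI call failed',
    );
    throw error;
  }
}
```

Because the wrapper rethrows, it can sit inside a fallback loop without changing which errors are treated as retriable.
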

## Implementation Details

### AIService Class Structure

Located in `src/services/aiService.server.ts`:

```typescript
import type { Content, Tool } from '@google/genai';

interface IAiClient {
  // Response shape assumed from the mock adapter under Test Environment Detection.
  generateContent(request: {
    contents: Content[];
    tools?: Tool[];
    useLiteModels?: boolean;
  }): Promise<{ text?: string }>;
}

interface IFileSystem {
  // Assumed to resolve to raw bytes, as image files are read for upload.
  readFile(path: string): Promise<Buffer>;
}

export class AIService {
  private aiClient: IAiClient;
  private fs: IFileSystem;
  private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
  private logger: Logger;

  constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
    // If aiClient provided: use it (unit test)
    // Else if test environment: use internal mock (integration test)
    // Else: create real GoogleGenAI client (production)
  }
}
```

### Tiered Model Lists

Models are organized by task complexity and quota bucket:

```typescript
// For image processing (vision + long output)
private readonly models = [
  // Tier A: Fast & Stable
  'gemini-2.5-flash', // Primary, 65k output
  'gemini-2.5-flash-lite', // Cost-saver, 65k output

  // Tier B: Heavy Lifters
  'gemini-2.5-pro', // Complex layouts, 65k output

  // Tier C: Preview Bucket (separate quota)
  'gemini-3-flash-preview',
  'gemini-3-pro-preview',

  // Tier D: Experimental Bucket
  'gemini-exp-1206',

  // Tier E: Last Resort
  'gemma-3-27b-it',
  'gemini-2.0-flash-exp', // WARNING: 8k limit
];

// For simple text tasks (recipes, categorization)
private readonly models_lite = [
  'gemini-2.5-flash-lite',
  'gemini-2.0-flash-lite-001',
  'gemini-2.0-flash-001',
  'gemma-3-12b-it',
  'gemma-3-4b-it',
  'gemini-2.0-flash-exp',
];
```

### Fallback with Retry Logic

```typescript
private async _generateWithFallback(
  genAI: GoogleGenAI,
  request: { contents: Content[]; tools?: Tool[] },
  models: string[],
): Promise<GenerateContentResponse> {
  let lastError: Error | null = null;

  for (const modelName of models) {
    try {
      return await genAI.models.generateContent({ model: modelName, ...request });
    } catch (error: unknown) {
      const errorMsg = extractErrorMessage(error);

      // Quota, overload, and availability errors move on to the next model.
      const isRetriable = [
        'quota',
        '429',
        '503',
        'resource_exhausted',
        'overloaded',
        'unavailable',
        'not found',
      ].some(term => errorMsg.toLowerCase().includes(term));

      if (isRetriable) {
        this.logger.warn(`Model '${modelName}' failed, trying next...`);
        lastError = new Error(errorMsg);
        continue;
      }

      throw error; // Non-retriable error
    }
  }

  // Every model in the chain failed with a retriable error.
  throw lastError || new Error('All AI models failed.');
}
```

### Rate Limiting

```typescript
import { pRateLimit } from 'p-ratelimit';

const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);

this.rateLimiter = pRateLimit({
  interval: 60 * 1000,
  rate: requestsPerMinute,
  concurrency: requestsPerMinute,
});

// Usage:
const result = await this.rateLimiter(() =>
  this.aiClient.generateContent({ contents: [...] })
);
```

### Test Environment Detection

```typescript
const isTestEnvironment =
  process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;

if (aiClient) {
  // Unit test: use provided mock
  this.aiClient = aiClient;
} else if (isTestEnvironment) {
  // Integration test: use internal mock
  this.aiClient = {
    generateContent: async () => ({
      text: JSON.stringify(this.getMockFlyerData()),
    }),
  };
} else {
  // Production: use real client
  const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  this.aiClient = { generateContent: /* adapter */ };
}
```

### Prompt Engineering

Prompts are constructed with:

1. **Clear Task Definition**: What to extract and in what format.
2. **Structured Output Requirements**: JSON schema with field descriptions.
3. **Examples**: Concrete examples of expected output.
4. **Context Hints**: User location for store address resolution.

```typescript
private _buildFlyerExtractionPrompt(
  masterItems: MasterGroceryItem[],
  submitterIp?: string,
  userProfileAddress?: string,
): string {
  // Location hint for address resolution
  let locationHint = '';
  if (userProfileAddress) {
    locationHint = `The user has profile address "${userProfileAddress}"...`;
  }

  // Simplified master item list (reduce token usage)
  const simplifiedMasterList = masterItems.map(item => ({
    id: item.master_grocery_item_id,
    name: item.name,
  }));

  return `
    # TASK
    Analyze the flyer image(s) and extract...

    # RULES
    1. Extract store_name, valid_from, valid_to, store_address
    2. Extract items array with item, price_display, price_in_cents...

    # EXAMPLES
    - { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }

    # MASTER LIST
    ${JSON.stringify(simplifiedMasterList)}
  `;
}
```

### Response Parsing

AI responses may contain markdown, trailing text, or formatting issues:

````typescript
private _parseJsonFromAiResponse<T>(
  responseText: string | undefined,
  logger: Logger,
): T | null {
  if (!responseText) return null;

  // Try to extract from markdown code block
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;

  // Find JSON boundaries
  const startIndex = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));

  if (startIndex === Infinity || endIndex === -1) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
  } catch {
    return null;
  }
}
````

## Consequences

### Positive

- **Resilience**: Automatic failover when models are unavailable or rate-limited.
- **Cost Control**: Uses cheaper models for simple tasks.
- **Testability**: Full mock support for unit and integration tests.
- **Observability**: Detailed logging of all AI operations with timing.
- **Maintainability**: Centralized AI logic in one service.

### Negative

- **Model List Maintenance**: Model lists must be updated as Google releases or deprecates models.
- **Complexity**: The fallback chain adds code paths and error classification that must be maintained and tested.
- **Delayed Failures**: If every model is down, a request fails only after the entire chain has been tried.

### Mitigation

- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup.
- Consider caching successful model selections per task type.

## Key Files

- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation

## Related ADRs

- [ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md) - Naming Conventions for AI Types
- [ADR-039](./0039-dependency-injection-pattern.md) - Dependency Injection Pattern
- [ADR-001](./0001-standardized-error-handling.md) - Error Handling
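
The mitigation of caching successful model selections per task type could be approached as follows. This is a sketch under stated assumptions, not existing code: `TaskType`, `CachedChoice`, and `ModelPreferenceCache` are hypothetical names, and the expiry window is an added design choice that the ADR does not specify.

```typescript
// Hypothetical task categories; the ADR does not define this type.
type TaskType = 'flyer' | 'receipt' | 'recipe' | 'text';

interface CachedChoice {
  model: string;
  recordedAt: number; // epoch milliseconds
}

// Remembers the last model that succeeded for each task type and, while the
// entry is fresh, moves that model to the front of the fallback chain.
class ModelPreferenceCache {
  private readonly lastGood: Partial<Record<TaskType, CachedChoice>> = {};

  constructor(
    private readonly ttlMs: number,
    // Injectable clock so tests can control time (assumption, mirrors the DI approach).
    private readonly now: () => number = () => Date.now(),
  ) {}

  recordSuccess(task: TaskType, model: string): void {
    this.lastGood[task] = { model, recordedAt: this.now() };
  }

  // Returns a reordered copy; the input list is never mutated.
  prioritize(task: TaskType, models: readonly string[]): string[] {
    const cached = this.lastGood[task];
    const isFresh =
      cached !== undefined && this.now() - cached.recordedAt < this.ttlMs;
    if (!cached || !isFresh || models.indexOf(cached.model) === -1) {
      return models.slice();
    }
    return [cached.model].concat(models.filter((m) => m !== cached.model));
  }
}
```

The expiry matters because a cache with no time limit would keep the service on a fallback model after the primary model's quota has recovered. With a short window, the chain reverts to its configured order and the primary is retried.
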