# ADR-041: AI/Gemini Integration Architecture

Date: 2026-01-09
Status: Accepted
Implemented: 2026-01-09

## Context
The application relies heavily on Google Gemini AI for core functionality:
- Flyer Processing: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
- Receipt Analysis: Parsing purchased items and prices from receipt images.
- Recipe Suggestions: Generating recipe ideas based on available ingredients.
- Text Extraction: OCR-style extraction from cropped image regions.
These AI operations present unique challenges:
- Rate Limits: Google AI API enforces requests-per-minute (RPM) limits.
- Quota Buckets: Different model families (stable, preview, experimental) have separate quotas.
- Model Availability: Models may be unavailable due to regional restrictions, updates, or high load.
- Cost Variability: Different models have different pricing (Flash-Lite vs Pro).
- Output Limits: Some models have 8k token limits, others 65k.
- Testability: Tests must not make real API calls.
## Decision

We will implement a centralized `AIService` class with:
- Dependency Injection: AI client and filesystem are injectable for testability.
- Model Fallback Chain: Automatic failover through prioritized model lists.
- Rate Limiting: Per-instance rate limiter using `p-ratelimit`.
- Tiered Model Selection: Different model lists for different task types.
- Environment-Aware Mocking: Automatic mock client in test environments.
### Design Principles

- Single Responsibility: `AIService` handles all AI interactions.
- Fail-Safe Fallbacks: If a model fails, try the next one in the chain.
- Cost Optimization: Use cheaper "lite" models for simple text tasks.
- Structured Logging: Log all AI interactions with timing and model info (see the sketch below).
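A minimal sketch of what that logging might capture around a single model call (the log field names are assumptions, not the project's actual schema):

```typescript
// Sketch only: structured log entry around one model call.
const start = Date.now();
try {
  const response = await genAI.models.generateContent({ model: modelName, ...request });
  this.logger.info({ event: 'ai.generate', model: modelName, durationMs: Date.now() - start });
  return response;
} catch (error) {
  this.logger.warn({ event: 'ai.generate.failed', model: modelName, durationMs: Date.now() - start });
  throw error;
}
```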
## Implementation Details

### AIService Class Structure

Located in `src/services/aiService.server.ts`:

```typescript
interface IAiClient {
generateContent(request: {
contents: Content[];
tools?: Tool[];
useLiteModels?: boolean;
}): Promise<GenerateContentResponse>;
}
interface IFileSystem {
readFile(path: string): Promise<Buffer>;
}
export class AIService {
private aiClient: IAiClient;
private fs: IFileSystem;
private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
private logger: Logger;
constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
// If aiClient provided: use it (unit test)
// Else if test environment: use internal mock (integration test)
// Else: create real GoogleGenAI client (production)
}
}
```
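Because both collaborators are injectable, a unit test can hand the service a mock client and never touch the network. A sketch of such a test (Vitest-style; the mock payload and the commented-out method call are illustrative, not the project's real API):

```typescript
import { describe, expect, it, vi } from 'vitest';
import { AIService } from './aiService.server';

describe('AIService', () => {
  it('uses the injected client, never the real API', async () => {
    const mockClient = {
      generateContent: vi.fn().mockResolvedValue({
        text: JSON.stringify({ store_name: 'Test Mart', items: [] }),
      }),
    };
    const logger = { info: vi.fn(), warn: vi.fn(), error: vi.fn() };

    const service = new AIService(logger as never, mockClient as never);

    // e.g. await service.processFlyer('/tmp/flyer.jpg'); // hypothetical method
    // expect(mockClient.generateContent).toHaveBeenCalled();
    expect(service).toBeInstanceOf(AIService);
  });
});
```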
### Tiered Model Lists

Models are organized by task complexity and quota bucket:

```typescript
// For image processing (vision + long output)
private readonly models = [
// Tier A: Fast & Stable
'gemini-2.5-flash', // Primary, 65k output
'gemini-2.5-flash-lite', // Cost-saver, 65k output
// Tier B: Heavy Lifters
'gemini-2.5-pro', // Complex layouts, 65k output
// Tier C: Preview Bucket (separate quota)
'gemini-3-flash-preview',
'gemini-3-pro-preview',
// Tier D: Experimental Bucket
'gemini-exp-1206',
// Tier E: Last Resort
'gemma-3-27b-it',
'gemini-2.0-flash-exp', // WARNING: 8k limit
];
// For simple text tasks (recipes, categorization)
private readonly models_lite = [
'gemini-2.5-flash-lite',
'gemini-2.0-flash-lite-001',
'gemini-2.0-flash-001',
'gemma-3-12b-it',
'gemma-3-4b-it',
'gemini-2.0-flash-exp',
];
```
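Routing between the two chains is driven by the `useLiteModels` flag on the request. A plausible sketch of the selection step (the helper name is an assumption):

```typescript
// Sketch: pick the fallback chain based on task complexity.
private _selectModels(useLiteModels?: boolean): string[] {
  return useLiteModels ? this.models_lite : this.models;
}

// e.g. const models = this._selectModels(request.useLiteModels);
```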
### Fallback with Retry Logic

```typescript
private async _generateWithFallback(
genAI: GoogleGenAI,
request: { contents: Content[]; tools?: Tool[] },
models: string[],
): Promise<GenerateContentResponse> {
let lastError: Error | null = null;
for (const modelName of models) {
try {
return await genAI.models.generateContent({ model: modelName, ...request });
} catch (error: unknown) {
const errorMsg = extractErrorMessage(error);
const isRetriable = [
'quota', '429', '503', 'resource_exhausted',
'overloaded', 'unavailable', 'not found'
].some(term => errorMsg.toLowerCase().includes(term));
if (isRetriable) {
this.logger.warn(`Model '${modelName}' failed, trying next...`);
lastError = new Error(errorMsg);
continue;
}
throw error; // Non-retriable error
}
}
throw lastError || new Error('All AI models failed.');
}
```
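The loop leans on an `extractErrorMessage` helper that is not shown above. A minimal sketch of what it might do (an assumption, not the project's actual implementation):

```typescript
// Sketch: normalize unknown thrown values into a string for matching.
function extractErrorMessage(error: unknown): string {
  if (error instanceof Error) return error.message;
  if (typeof error === 'string') return error;
  try {
    return JSON.stringify(error); // API errors often surface as plain objects
  } catch {
    return String(error);
  }
}
```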
### Rate Limiting

```typescript
import { pRateLimit } from 'p-ratelimit';

// In the AIService constructor:
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
interval: 60 * 1000,
rate: requestsPerMinute,
concurrency: requestsPerMinute,
});
// Usage:
const result = await this.rateLimiter(() =>
this.aiClient.generateContent({ contents: [...] })
);
```
### Test Environment Detection

```typescript
// In the AIService constructor:
const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;
if (aiClient) {
// Unit test: use provided mock
this.aiClient = aiClient;
} else if (isTestEnvironment) {
// Integration test: use internal mock
this.aiClient = {
generateContent: async () => ({
text: JSON.stringify(this.getMockFlyerData()),
}),
};
} else {
// Production: use real client
const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
this.aiClient = { generateContent: /* adapter */ };
}
```
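The elided adapter is where the pieces above meet: it exposes the narrow `IAiClient` interface while routing every call through the rate limiter and the model fallback chain. A sketch of how it might be wired (details assumed):

```typescript
// Sketch: production adapter bridging IAiClient to the real SDK client.
this.aiClient = {
  generateContent: (request) =>
    this.rateLimiter(() =>
      this._generateWithFallback(
        genAI,
        { contents: request.contents, tools: request.tools },
        request.useLiteModels ? this.models_lite : this.models,
      ),
    ),
};
```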
### Prompt Engineering
Prompts are constructed with:
- Clear Task Definition: What to extract and in what format.
- Structured Output Requirements: JSON schema with field descriptions.
- Examples: Concrete examples of expected output.
- Context Hints: User location for store address resolution.

```typescript
private _buildFlyerExtractionPrompt(
masterItems: MasterGroceryItem[],
submitterIp?: string,
userProfileAddress?: string,
): string {
// Location hint for address resolution
let locationHint = '';
if (userProfileAddress) {
locationHint = `The user has profile address "${userProfileAddress}"...`;
}
// Simplified master item list (reduce token usage)
const simplifiedMasterList = masterItems.map(item => ({
id: item.master_grocery_item_id,
name: item.name,
}));
return `
# TASK
Analyze the flyer image(s) and extract...
# RULES
1. Extract store_name, valid_from, valid_to, store_address
2. Extract items array with item, price_display, price_in_cents...
# EXAMPLES
- { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }
# MASTER LIST
${JSON.stringify(simplifiedMasterList)}
`;
}
```
### Response Parsing

AI responses may contain markdown, trailing text, or formatting issues:

```typescript
private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
if (!responseText) return null;
// Try to extract from markdown code block
const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
let jsonString = markdownMatch?.[2]?.trim() || responseText;
// Find JSON boundaries
const startIndex = Math.min(
jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity
);
const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
if (startIndex === Infinity || endIndex === -1) return null;
try {
return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
} catch {
return null;
}
}
```
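Since `src/types/ai.ts` holds Zod schemas for AI responses, the loosely-parsed value can then be validated before use. A sketch of the pairing (the schema and its export name are hypothetical):

```typescript
import { flyerExtractionSchema } from '../types/ai'; // hypothetical export name

const raw = this._parseJsonFromAiResponse<unknown>(response.text, this.logger);
const result = flyerExtractionSchema.safeParse(raw);
if (!result.success) {
  this.logger.warn('AI response failed schema validation');
  return null;
}
return result.data;
```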
## Consequences

### Positive
- Resilience: Automatic failover when models are unavailable or rate-limited.
- Cost Control: Uses cheaper models for simple tasks.
- Testability: Full mock support for unit and integration tests.
- Observability: Detailed logging of all AI operations with timing.
- Maintainability: Centralized AI logic in one service.
### Negative
- Model List Maintenance: Must update model lists when new models release.
- Complexity: Fallback logic adds complexity.
- Delayed Failures: May take longer to fail if all models are down.
### Mitigation
- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup (see the sketch after this list).
- Consider caching successful model selections per task type.
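One possible shape for the startup health check suggested above (a sketch; the method name is illustrative):

```typescript
// Sketch: fail fast at startup if no model in the chain is reachable.
export async function checkAiConnectivity(service: AIService, logger: Logger): Promise<boolean> {
  try {
    // Any cheap text task exercises the full rate-limit + fallback path.
    await service.generateText('Reply with the single word: ok'); // hypothetical method
    return true;
  } catch (error) {
    logger.error({ event: 'ai.healthcheck.failed', error });
    return false;
  }
}
```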
## Key Files

- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation