flyer-crawler.projectium.com/docs/adr/0041-ai-gemini-integration-architecture.md
Torben Sorensen e14c19c112
2026-01-09 22:38:57 -08:00

# ADR-041: AI/Gemini Integration Architecture
**Date**: 2026-01-09
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
The application relies heavily on Google Gemini AI for core functionality:
1. **Flyer Processing**: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
2. **Receipt Analysis**: Parsing purchased items and prices from receipt images.
3. **Recipe Suggestions**: Generating recipe ideas based on available ingredients.
4. **Text Extraction**: OCR-style extraction from cropped image regions.
These AI operations have unique challenges:
- **Rate Limits**: Google AI API enforces requests-per-minute (RPM) limits.
- **Quota Buckets**: Different model families (stable, preview, experimental) have separate quotas.
- **Model Availability**: Models may be unavailable due to regional restrictions, updates, or high load.
- **Cost Variability**: Pricing differs across models (e.g., Flash-Lite is cheaper than Pro).
- **Output Limits**: Maximum output length varies by model (8k tokens for some, 65k for others).
- **Testability**: Tests must not make real API calls.
## Decision
We will implement a centralized `AIService` class with:
1. **Dependency Injection**: AI client and filesystem are injectable for testability.
2. **Model Fallback Chain**: Automatic failover through prioritized model lists.
3. **Rate Limiting**: Per-instance rate limiter using `p-ratelimit`.
4. **Tiered Model Selection**: Different model lists for different task types.
5. **Environment-Aware Mocking**: Automatic mock client in test environments.
### Design Principles
- **Single Responsibility**: `AIService` handles all AI interactions.
- **Fail-Safe Fallbacks**: If a model fails, try the next one in the chain.
- **Cost Optimization**: Use cheaper "lite" models for simple text tasks.
- **Structured Logging**: Log all AI interactions with timing and model info.
## Implementation Details
### AIService Class Structure
Located in `src/services/aiService.server.ts`:
```typescript
import type { Content, GenerateContentResponse, Tool } from '@google/genai';
// `Logger` is the project's structured logger type (import omitted here).

interface IAiClient {
  generateContent(request: {
    contents: Content[];
    tools?: Tool[];
    useLiteModels?: boolean;
  }): Promise<GenerateContentResponse>;
}

interface IFileSystem {
  readFile(path: string): Promise<Buffer>;
}

export class AIService {
  private aiClient: IAiClient;
  private fs: IFileSystem;
  private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
  private logger: Logger;

  constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
    // If aiClient provided: use it (unit test)
    // Else if test environment: use internal mock (integration test)
    // Else: create real GoogleGenAI client (production)
  }
}
```
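To show why the injectable client matters for testing, here is a minimal, self-contained sketch of the same constructor-injection pattern with a hand-written fake. The request and response shapes are simplified stand-ins for the `@google/genai` types, and `RecipeService`, `suggestRecipes`, and `FakeAiClient` are hypothetical names used only for illustration.

```typescript
// Simplified stand-ins for the SDK types (assumption: only `text` is read).
interface AiRequest {
  prompt: string;
  useLiteModels?: boolean;
}

interface AiResponse {
  text?: string;
}

interface IAiClient {
  generateContent(request: AiRequest): Promise<AiResponse>;
}

// Hypothetical service that depends only on the IAiClient interface.
class RecipeService {
  private readonly aiClient: IAiClient;

  constructor(aiClient: IAiClient) {
    this.aiClient = aiClient;
  }

  async suggestRecipes(ingredients: string[]): Promise<string> {
    const response = await this.aiClient.generateContent({
      prompt: `Suggest a recipe using: ${ingredients.join(', ')}`,
      useLiteModels: true, // simple text task, so ask for the cheaper tier
    });
    return response.text ?? '';
  }
}

// Fake client for unit tests: records each request and returns canned text.
class FakeAiClient implements IAiClient {
  readonly requests: AiRequest[] = [];

  async generateContent(request: AiRequest): Promise<AiResponse> {
    this.requests.push(request);
    return { text: 'Tomato omelette' };
  }
}
```

In a test, `new RecipeService(new FakeAiClient())` exercises the service without network access, and the fake's `requests` array lets the test assert which model tier was requested.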
### Tiered Model Lists
Models are organized by task complexity and quota bucket:
```typescript
// For image processing (vision + long output)
private readonly models = [
  // Tier A: Fast & Stable
  'gemini-2.5-flash', // Primary, 65k output
  'gemini-2.5-flash-lite', // Cost-saver, 65k output
  // Tier B: Heavy Lifters
  'gemini-2.5-pro', // Complex layouts, 65k output
  // Tier C: Preview Bucket (separate quota)
  'gemini-3-flash-preview',
  'gemini-3-pro-preview',
  // Tier D: Experimental Bucket
  'gemini-exp-1206',
  // Tier E: Last Resort
  'gemma-3-27b-it',
  'gemini-2.0-flash-exp', // WARNING: 8k limit
];

// For simple text tasks (recipes, categorization)
private readonly models_lite = [
  'gemini-2.5-flash-lite',
  'gemini-2.0-flash-lite-001',
  'gemini-2.0-flash-001',
  'gemma-3-12b-it',
  'gemma-3-4b-it',
  'gemini-2.0-flash-exp',
];
```
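The `WARNING: 8k limit` comment matters because a large flyer can produce more JSON than an 8k-token model can emit, so a late fallback may return truncated output. One hypothetical guard is to skip models whose known output cap is below an estimated requirement. The table below holds only the caps noted in the comments above (as 65,536 and 8,192 tokens), and `modelsWithOutputCapacity` is an assumed helper, not part of `aiService.server.ts`.

```typescript
// Hypothetical table: only the caps noted in the model-list comments.
// Models absent from the table have an unknown cap and are kept.
const OUTPUT_TOKEN_LIMITS = new Map<string, number>([
  ['gemini-2.5-flash', 65_536],
  ['gemini-2.5-flash-lite', 65_536],
  ['gemini-2.5-pro', 65_536],
  ['gemini-2.0-flash-exp', 8_192],
]);

// Keep the original priority order, dropping models known to be too small.
function modelsWithOutputCapacity(models: readonly string[], requiredTokens: number): string[] {
  return models.filter((model) => {
    const limit = OUTPUT_TOKEN_LIMITS.get(model);
    return limit === undefined || limit >= requiredTokens;
  });
}
```

Filtering rather than reordering keeps the tier priorities intact; the trade-off is that a request with a large estimate has fewer fallbacks available.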
### Fallback with Retry Logic
```typescript
private async _generateWithFallback(
  genAI: GoogleGenAI,
  request: { contents: Content[]; tools?: Tool[] },
  models: string[],
): Promise<GenerateContentResponse> {
  let lastError: Error | null = null;

  for (const modelName of models) {
    try {
      // In @google/genai, tools are passed inside `config`, not at the top level.
      return await genAI.models.generateContent({
        model: modelName,
        contents: request.contents,
        config: request.tools ? { tools: request.tools } : undefined,
      });
    } catch (error: unknown) {
      const errorMsg = extractErrorMessage(error);
      // Quota, overload, and "model not found" errors move on to the next model.
      const isRetriable = [
        'quota', '429', '503', 'resource_exhausted',
        'overloaded', 'unavailable', 'not found',
      ].some((term) => errorMsg.toLowerCase().includes(term));

      if (isRetriable) {
        this.logger.warn(`Model '${modelName}' failed, trying next...`);
        lastError = new Error(errorMsg);
        continue;
      }
      throw error; // Non-retriable error: fail fast
    }
  }

  throw lastError ?? new Error('All AI models failed.');
}
```
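Stripped of the SDK types, the fallback loop is a small generic helper. The sketch below is self-contained so it can be exercised with a fake model call; `generateWithFallback`, `isRetriableMessage`, and the `onSkip` callback are hypothetical names, and `call` stands in for `genAI.models.generateContent`.

```typescript
// Substrings that mark an error as worth retrying on the next model.
const RETRIABLE_TERMS = [
  'quota', '429', '503', 'resource_exhausted',
  'overloaded', 'unavailable', 'not found',
];

function isRetriableMessage(message: string): boolean {
  const lower = message.toLowerCase();
  return RETRIABLE_TERMS.some((term) => lower.includes(term));
}

// Try each model in priority order; stop at the first success or the first
// non-retriable error. If every model fails retriably, rethrow the last error.
async function generateWithFallback<T>(
  models: readonly string[],
  call: (model: string) => Promise<T>,
  onSkip: (model: string, message: string) => void = () => {},
): Promise<T> {
  let lastError: Error | null = null;
  for (const model of models) {
    try {
      return await call(model);
    } catch (error: unknown) {
      const message = error instanceof Error ? error.message : String(error);
      if (!isRetriableMessage(message)) throw error;
      onSkip(model, message);
      lastError = new Error(message);
    }
  }
  throw lastError ?? new Error('All AI models failed.');
}
```

Substring matching keeps the classifier simple but broad: any error message that happens to contain "not found" or "unavailable" is treated as a reason to move on to the next model.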
### Rate Limiting
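```typescript
import { pRateLimit } from 'p-ratelimit';

// In the constructor: allow GEMINI_RPM requests per 60-second interval.
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
  interval: 60 * 1000, // interval length in ms
  rate: requestsPerMinute, // max calls per interval
  concurrency: requestsPerMinute, // max calls in flight at once
});

// Usage: wrap every AI call so it waits for a free slot.
const result = await this.rateLimiter(() =>
  this.aiClient.generateContent({ contents: [/* ... */] }),
);
```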
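`p-ratelimit` does the scheduling here. To make the three options concrete, the following hypothetical sliding-window limiter accepts the same `interval`, `rate`, and `concurrency` settings and returns a function with the same shape as `this.rateLimiter`. It is an illustrative sketch, not a reimplementation of `p-ratelimit`, whose internal bookkeeping may differ.

```typescript
interface RateLimitOptions {
  interval: number; // window length in milliseconds
  rate: number; // max task starts per window
  concurrency: number; // max tasks running at once
}

// Hypothetical sliding-window limiter; not p-ratelimit's actual implementation.
function createRateLimiter(opts: RateLimitOptions) {
  const starts: number[] = []; // start times inside the current window
  const queue: Array<() => void> = []; // tasks waiting for a slot
  let active = 0;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const pump = (): void => {
    while (queue.length > 0) {
      const now = Date.now();
      // Forget starts that have slid out of the window.
      while (starts.length > 0 && now - starts[0] >= opts.interval) starts.shift();

      if (active >= opts.concurrency) return; // resumed when a task settles
      if (starts.length >= opts.rate) {
        // Wake up once the oldest start leaves the window.
        if (timer === undefined) {
          timer = setTimeout(() => {
            timer = undefined;
            pump();
          }, opts.interval - (now - starts[0]));
        }
        return;
      }

      starts.push(now);
      active++;
      const run = queue.shift()!;
      run();
    }
  };

  return function limit<T>(fn: () => Promise<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        Promise.resolve()
          .then(fn)
          .then(resolve, reject)
          .finally(() => {
            active--;
            pump();
          });
      });
      pump();
    });
  };
}
```

In this sketch, with the default `GEMINI_RPM` of 5, a sixth request queues until the oldest of the previous five starts is more than 60 seconds old, and no more than five requests are ever in flight.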
### Test Environment Detection
```typescript
const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;

if (aiClient) {
  // Unit test: use provided mock
  this.aiClient = aiClient;
} else if (isTestEnvironment) {
  // Integration test: use internal mock
  this.aiClient = {
    generateContent: async () => ({
      text: JSON.stringify(this.getMockFlyerData()),
    }),
  };
} else {
  // Production: use real client
  const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  this.aiClient = { generateContent: /* adapter */ };
}
```
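Pulling the branch decision into a pure function makes the precedence easy to test without mutating `process.env`. This is a hypothetical refactoring; `selectClientKind` and `ClientKind` do not exist in the service.

```typescript
type ClientKind = 'injected' | 'mock' | 'real';

// Pure version of the constructor's decision. `env` carries the two
// variables the constructor checks, passed in explicitly for testability.
function selectClientKind(
  env: { NODE_ENV?: string; VITEST_POOL_ID?: string },
  hasInjectedClient: boolean,
): ClientKind {
  if (hasInjectedClient) return 'injected';
  const isTestEnvironment = env.NODE_ENV === 'test' || !!env.VITEST_POOL_ID;
  return isTestEnvironment ? 'mock' : 'real';
}
```

Note the precedence: an injected client always wins, and the internal mock is chosen whenever `NODE_ENV` is `test` or `VITEST_POOL_ID` is set, so a production process started with `NODE_ENV=test` would quietly return mock flyer data.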
### Prompt Engineering
Prompts are constructed with:
1. **Clear Task Definition**: What to extract and in what format.
2. **Structured Output Requirements**: JSON schema with field descriptions.
3. **Examples**: Concrete examples of expected output.
4. **Context Hints**: User location for store address resolution.
```typescript
private _buildFlyerExtractionPrompt(
  masterItems: MasterGroceryItem[],
  submitterIp?: string,
  userProfileAddress?: string,
): string {
  // Location hint for address resolution
  let locationHint = '';
  if (userProfileAddress) {
    locationHint = `The user has profile address "${userProfileAddress}"...`;
  }

  // Simplified master item list (reduce token usage)
  const simplifiedMasterList = masterItems.map((item) => ({
    id: item.master_grocery_item_id,
    name: item.name,
  }));

  return `
# TASK
Analyze the flyer image(s) and extract...
# RULES
1. Extract store_name, valid_from, valid_to, store_address
2. Extract items array with item, price_display, price_in_cents...
# EXAMPLES
- { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }
# MASTER LIST
${JSON.stringify(simplifiedMasterList)}
`;
}
```
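The token saving from `simplifiedMasterList` comes from serializing only the id and name of each master item. A minimal sketch follows; the extra fields on `MasterGroceryItem` (`category_name`, `created_at`), the numeric id type, and the `simplifyMasterList` helper are assumptions standing in for whatever the real type carries.

```typescript
// Hypothetical full record; only `master_grocery_item_id` and `name` appear in the ADR.
interface MasterGroceryItem {
  master_grocery_item_id: number;
  name: string;
  category_name?: string;
  created_at?: string;
}

// Keep only an id and a name per item before embedding the list in the prompt.
function simplifyMasterList(items: MasterGroceryItem[]): Array<{ id: number; name: string }> {
  return items.map((item) => ({ id: item.master_grocery_item_id, name: item.name }));
}
```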
### Response Parsing
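AI responses often wrap the JSON in a markdown code fence or surround it with explanatory text, so the parser strips the fence and slices from the first `{` or `[` to the last `}` or `]`: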
````typescript
private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
  if (!responseText) return null;

  // Prefer the contents of a ```json ... ``` block if the model added one
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;

  // Find the outermost JSON object or array boundaries
  const firstBrace = jsonString.indexOf('{');
  const firstBracket = jsonString.indexOf('[');
  const startIndex = Math.min(
    firstBrace >= 0 ? firstBrace : Infinity,
    firstBracket >= 0 ? firstBracket : Infinity,
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
  // Also rejects a closing bracket that appears before the opening one
  if (startIndex === Infinity || endIndex < startIndex) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1)) as T;
  } catch {
    logger.warn('Failed to parse JSON from AI response.');
    return null;
  }
}
````
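Parsing only proves the text is JSON, not that it has the expected shape; the project validates AI responses with Zod schemas in `src/types/ai.ts`. As a dependency-free illustration of that step, here is a hand-written type guard. The `ExtractedItem` shape is an assumption based on the field names in the prompt (`item`, `price_display`, `price_in_cents`), and treating `price_in_cents` as nullable is hypothetical.

```typescript
// Assumed item shape, inferred from the prompt's field names.
interface ExtractedItem {
  item: string;
  price_display: string;
  price_in_cents: number | null;
}

function isExtractedItem(value: unknown): value is ExtractedItem {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.item === 'string' &&
    typeof v.price_display === 'string' &&
    (v.price_in_cents === null ||
      (typeof v.price_in_cents === 'number' && Number.isInteger(v.price_in_cents)))
  );
}

// Keep well-formed items and count the rest so callers can log them.
function validateItems(parsed: unknown): { items: ExtractedItem[]; rejected: number } {
  if (!Array.isArray(parsed)) return { items: [], rejected: 0 };
  const items = parsed.filter(isExtractedItem);
  return { items, rejected: parsed.length - items.length };
}
```

Dropping malformed items instead of failing the whole response keeps one bad line in a large flyer from discarding every other extracted item.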
## Consequences
### Positive
- **Resilience**: Automatic failover when models are unavailable or rate-limited.
- **Cost Control**: Uses cheaper models for simple tasks.
- **Testability**: Full mock support for unit and integration tests.
- **Observability**: Detailed logging of all AI operations with timing.
- **Maintainability**: Centralized AI logic in one service.
### Negative
- **Model List Maintenance**: Model lists must be updated as Google releases new models and retires old ones.
- **Complexity**: Fallback logic adds complexity.
- **Delayed Failures**: When every model is unavailable, a request fails only after each model in the chain has been tried.
### Mitigation
- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup.
- Consider caching successful model selections per task type.
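Caching successful model selections per task type could look like the following sketch: remember which model last succeeded for each task and try it first next time, keeping the rest of the chain in its original priority order. `ModelPreferenceCache` and the task-type strings are hypothetical; nothing like this exists in the current service.

```typescript
// Hypothetical cache of the last model that succeeded for each task type.
class ModelPreferenceCache {
  private readonly lastSuccess = new Map<string, string>();

  recordSuccess(taskType: string, model: string): void {
    this.lastSuccess.set(taskType, model);
  }

  // Move the cached model to the front; keep the others in priority order.
  order(taskType: string, models: readonly string[]): string[] {
    const preferred = this.lastSuccess.get(taskType);
    if (preferred === undefined || !models.includes(preferred)) return [...models];
    return [preferred, ...models.filter((m) => m !== preferred)];
  }
}
```

A cached entry should probably expire: a model that succeeded while the primary was rate-limited may be more expensive than the primary once its quota recovers.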
## Key Files
- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation
## Related ADRs
- [ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md) - Naming Conventions for AI Types
- [ADR-039](./0039-dependency-injection-pattern.md) - Dependency Injection Pattern
- [ADR-001](./0001-standardized-error-handling.md) - Error Handling