# ADR-041: AI/Gemini Integration Architecture

**Date**: 2026-01-09

**Status**: Accepted

**Implemented**: 2026-01-09
## Context

The application relies heavily on Google Gemini AI for core functionality:

1. **Flyer Processing**: Extracting store names, dates, addresses, and individual sale items from uploaded flyer images.
2. **Receipt Analysis**: Parsing purchased items and prices from receipt images.
3. **Recipe Suggestions**: Generating recipe ideas based on available ingredients.
4. **Text Extraction**: OCR-style extraction from cropped image regions.

These AI operations pose distinct challenges:

- **Rate Limits**: The Google AI API enforces requests-per-minute (RPM) limits.
- **Quota Buckets**: Different model families (stable, preview, experimental) have separate quotas.
- **Model Availability**: Models may be unavailable due to regional restrictions, updates, or high load.
- **Cost Variability**: Different models have different pricing (Flash-Lite vs. Pro).
- **Output Limits**: Some models cap output at 8k tokens, others at 65k.
- **Testability**: Tests must not make real API calls.
## Decision

We will implement a centralized `AIService` class with:

1. **Dependency Injection**: AI client and filesystem are injectable for testability.
2. **Model Fallback Chain**: Automatic failover through prioritized model lists.
3. **Rate Limiting**: Per-instance rate limiter using `p-ratelimit`.
4. **Tiered Model Selection**: Different model lists for different task types.
5. **Environment-Aware Mocking**: Automatic mock client in test environments.

### Design Principles

- **Single Responsibility**: `AIService` handles all AI interactions.
- **Fail-Safe Fallbacks**: If a model fails, try the next one in the chain.
- **Cost Optimization**: Use cheaper "lite" models for simple text tasks.
- **Structured Logging**: Log all AI interactions with timing and model info.
## Implementation Details

### AIService Class Structure

Located in `src/services/aiService.server.ts`:

```typescript
interface IAiClient {
  generateContent(request: {
    contents: Content[];
    tools?: Tool[];
    useLiteModels?: boolean;
  }): Promise<GenerateContentResponse>;
}

interface IFileSystem {
  readFile(path: string): Promise<Buffer>;
}

export class AIService {
  private aiClient: IAiClient;
  private fs: IFileSystem;
  private rateLimiter: <T>(fn: () => Promise<T>) => Promise<T>;
  private logger: Logger;

  constructor(logger: Logger, aiClient?: IAiClient, fs?: IFileSystem) {
    // If aiClient provided: use it (unit test)
    // Else if test environment: use internal mock (integration test)
    // Else: create real GoogleGenAI client (production)
  }
}
```
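The injectable `IAiClient` is what makes unit testing straightforward. As a sketch (the mock shape below is illustrative, not the project's actual test helper), a recording mock can stand in for the real client and be passed to the constructor:

```typescript
// Minimal shapes matching the interfaces above (illustrative only).
interface GenerateContentResponse { text?: string }
interface IAiClient {
  generateContent(request: { contents: unknown[] }): Promise<GenerateContentResponse>;
}

// A recording mock: returns canned JSON and remembers every request.
function makeMockAiClient(payload: unknown) {
  const calls: unknown[] = [];
  const client: IAiClient = {
    generateContent: async (request) => {
      calls.push(request);
      return { text: JSON.stringify(payload) };
    },
  };
  return { client, calls };
}

// In a unit test this would be injected as: new AIService(logger, client)
const { client, calls } = makeMockAiClient({ store_name: 'Test Mart' });
client.generateContent({ contents: [] }).then(response => {
  console.log(response.text, calls.length); // {"store_name":"Test Mart"} 1
});
```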
### Tiered Model Lists

Models are organized by task complexity and quota bucket:

```typescript
// For image processing (vision + long output)
private readonly models = [
  // Tier A: Fast & Stable
  'gemini-2.5-flash',      // Primary, 65k output
  'gemini-2.5-flash-lite', // Cost-saver, 65k output

  // Tier B: Heavy Lifters
  'gemini-2.5-pro',        // Complex layouts, 65k output

  // Tier C: Preview Bucket (separate quota)
  'gemini-3-flash-preview',
  'gemini-3-pro-preview',

  // Tier D: Experimental Bucket
  'gemini-exp-1206',

  // Tier E: Last Resort
  'gemma-3-27b-it',
  'gemini-2.0-flash-exp',  // WARNING: 8k limit
];

// For simple text tasks (recipes, categorization)
private readonly models_lite = [
  'gemini-2.5-flash-lite',
  'gemini-2.0-flash-lite-001',
  'gemini-2.0-flash-001',
  'gemma-3-12b-it',
  'gemma-3-4b-it',
  'gemini-2.0-flash-exp',
];
```
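How the `useLiteModels` flag routes a request to one of these chains can be sketched as follows (abbreviated copies of the lists above; the `selectModels` helper is illustrative, not the actual method name):

```typescript
// Abbreviated copies of the chains above, for illustration only.
const models = ['gemini-2.5-flash', 'gemini-2.5-flash-lite', 'gemini-2.5-pro'];
const models_lite = ['gemini-2.5-flash-lite', 'gemini-2.0-flash-lite-001'];

// Pick a chain by task type: lite models for simple text tasks.
function selectModels(useLiteModels?: boolean): string[] {
  return useLiteModels ? models_lite : models;
}

console.log(selectModels(true)[0]); // gemini-2.5-flash-lite
console.log(selectModels()[0]);     // gemini-2.5-flash
```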
### Fallback with Retry Logic

```typescript
private async _generateWithFallback(
  genAI: GoogleGenAI,
  request: { contents: Content[]; tools?: Tool[] },
  models: string[],
): Promise<GenerateContentResponse> {
  let lastError: Error | null = null;

  for (const modelName of models) {
    try {
      return await genAI.models.generateContent({ model: modelName, ...request });
    } catch (error: unknown) {
      const errorMsg = extractErrorMessage(error);
      const isRetriable = [
        'quota', '429', '503', 'resource_exhausted',
        'overloaded', 'unavailable', 'not found',
      ].some(term => errorMsg.toLowerCase().includes(term));

      if (isRetriable) {
        this.logger.warn(`Model '${modelName}' failed, trying next...`);
        lastError = new Error(errorMsg);
        continue;
      }
      throw error; // Non-retriable error
    }
  }
  throw lastError || new Error('All AI models failed.');
}
```
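The retriable-error classification can be exercised in isolation. A standalone sketch of the same substring test used in the loop above:

```typescript
// Substring markers that indicate a transient, fallback-worthy failure.
const RETRIABLE_TERMS = [
  'quota', '429', '503', 'resource_exhausted',
  'overloaded', 'unavailable', 'not found',
];

function isRetriable(errorMsg: string): boolean {
  const lower = errorMsg.toLowerCase();
  return RETRIABLE_TERMS.some(term => lower.includes(term));
}

console.log(isRetriable('Error 429: RESOURCE_EXHAUSTED')); // true  -> try next model
console.log(isRetriable('Invalid API key'));               // false -> rethrow
```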
### Rate Limiting

```typescript
const requestsPerMinute = parseInt(process.env.GEMINI_RPM || '5', 10);
this.rateLimiter = pRateLimit({
  interval: 60 * 1000,
  rate: requestsPerMinute,
  concurrency: requestsPerMinute,
});

// Usage:
const result = await this.rateLimiter(() =>
  this.aiClient.generateContent({ contents: [...] })
);
```
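For intuition, the limiter's contract (at most `rate` call starts inside any rolling `interval`) could be sketched without the library. This is an illustrative stand-in, not `p-ratelimit`'s implementation:

```typescript
// Sliding-window limiter sketch: at most `rate` starts per `intervalMs`.
function makeLimiter(rate: number, intervalMs: number) {
  const starts: number[] = [];
  return async <T>(fn: () => Promise<T>): Promise<T> => {
    for (;;) {
      const now = Date.now();
      // Forget call starts that have left the rolling window.
      while (starts.length && now - starts[0] >= intervalMs) starts.shift();
      if (starts.length < rate) break;
      // Window is full: sleep until the oldest start expires.
      await new Promise(resolve => setTimeout(resolve, intervalMs - (now - starts[0])));
    }
    starts.push(Date.now());
    return fn();
  };
}

async function demo() {
  const limit = makeLimiter(2, 200);
  const t0 = Date.now();
  await limit(async () => 'a');
  await limit(async () => 'b');
  await limit(async () => 'c'); // must wait for the window to roll
  console.log(Date.now() - t0 >= 180); // true
}
demo();
```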
### Test Environment Detection

```typescript
const isTestEnvironment = process.env.NODE_ENV === 'test' || !!process.env.VITEST_POOL_ID;

if (aiClient) {
  // Unit test: use provided mock
  this.aiClient = aiClient;
} else if (isTestEnvironment) {
  // Integration test: use internal mock
  this.aiClient = {
    generateContent: async () => ({
      text: JSON.stringify(this.getMockFlyerData()),
    }),
  };
} else {
  // Production: use real client
  const genAI = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
  this.aiClient = { generateContent: /* adapter */ };
}
```
### Prompt Engineering

Prompts are constructed with:

1. **Clear Task Definition**: What to extract and in what format.
2. **Structured Output Requirements**: JSON schema with field descriptions.
3. **Examples**: Concrete examples of expected output.
4. **Context Hints**: User location for store address resolution.

```typescript
private _buildFlyerExtractionPrompt(
  masterItems: MasterGroceryItem[],
  submitterIp?: string,
  userProfileAddress?: string,
): string {
  // Location hint for address resolution
  let locationHint = '';
  if (userProfileAddress) {
    locationHint = `The user has profile address "${userProfileAddress}"...`;
  }

  // Simplified master item list (reduce token usage)
  const simplifiedMasterList = masterItems.map(item => ({
    id: item.master_grocery_item_id,
    name: item.name,
  }));

  return `
# TASK
Analyze the flyer image(s) and extract...

# RULES
1. Extract store_name, valid_from, valid_to, store_address
2. Extract items array with item, price_display, price_in_cents...

# EXAMPLES
- { "item": "Red Grapes", "price_display": "$1.99 /lb", ... }

# MASTER LIST
${JSON.stringify(simplifiedMasterList)}
`;
}
```
### Response Parsing

AI responses may contain markdown fences, trailing text, or formatting issues:

````typescript
private _parseJsonFromAiResponse<T>(responseText: string | undefined, logger: Logger): T | null {
  if (!responseText) return null;

  // Try to extract JSON from a markdown code block
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;

  // Find the JSON boundaries
  const startIndex = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity,
  );
  const endIndex = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));

  if (startIndex === Infinity || endIndex === -1) return null;

  try {
    return JSON.parse(jsonString.substring(startIndex, endIndex + 1));
  } catch {
    return null;
  }
}
````
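As a usage sketch, here is the same strategy as a standalone function applied to a fenced reply (copied into a free function so the example is self-contained; the real method is private on `AIService`):

````typescript
function parseJsonFromAiResponse<T>(responseText?: string): T | null {
  if (!responseText) return null;
  // Prefer the contents of a markdown code block, if any.
  const markdownMatch = responseText.match(/```(json)?\s*([\s\S]*?)\s*```/);
  const jsonString = markdownMatch?.[2]?.trim() || responseText;
  // Trim to the outermost JSON object/array boundaries.
  const start = Math.min(
    jsonString.indexOf('{') >= 0 ? jsonString.indexOf('{') : Infinity,
    jsonString.indexOf('[') >= 0 ? jsonString.indexOf('[') : Infinity,
  );
  const end = Math.max(jsonString.lastIndexOf('}'), jsonString.lastIndexOf(']'));
  if (start === Infinity || end === -1) return null;
  try {
    return JSON.parse(jsonString.substring(start, end + 1));
  } catch {
    return null;
  }
}

// A typical model reply: fenced JSON plus trailing commentary.
const reply = '```json\n{"store_name": "Test Mart"}\n```\nHope that helps!';
console.log(parseJsonFromAiResponse<{ store_name: string }>(reply)); // { store_name: 'Test Mart' }
````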
## Consequences

### Positive

- **Resilience**: Automatic failover when models are unavailable or rate-limited.
- **Cost Control**: Uses cheaper models for simple tasks.
- **Testability**: Full mock support for unit and integration tests.
- **Observability**: Detailed logging of all AI operations with timing.
- **Maintainability**: Centralized AI logic in one service.

### Negative

- **Model List Maintenance**: Must update model lists when new models release.
- **Complexity**: Fallback logic adds complexity.
- **Delayed Failures**: May take longer to fail if all models are down.

### Mitigation

- Monitor model deprecation announcements from Google.
- Add health checks that validate AI connectivity on startup.
- Consider caching successful model selections per task type.
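The last mitigation could look roughly like this (a hypothetical sketch; `lastGoodModel`, `orderModels`, and `recordSuccess` are illustrative names, not existing code):

```typescript
// Remember the most recent model that succeeded for each task type,
// so the fallback loop tries it first on the next call.
const lastGoodModel = new Map<string, string>();

function orderModels(taskType: string, models: string[]): string[] {
  const preferred = lastGoodModel.get(taskType);
  if (!preferred || !models.includes(preferred)) return models;
  return [preferred, ...models.filter(m => m !== preferred)];
}

function recordSuccess(taskType: string, model: string): void {
  lastGoodModel.set(taskType, model);
}

recordSuccess('flyer', 'gemini-2.5-pro');
console.log(orderModels('flyer', ['gemini-2.5-flash', 'gemini-2.5-pro'])); // [ 'gemini-2.5-pro', 'gemini-2.5-flash' ]
```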
## Key Files

- `src/services/aiService.server.ts` - Main AIService class
- `src/services/aiService.server.test.ts` - Unit tests with mocked AI client
- `src/services/aiApiClient.ts` - Low-level API client wrapper
- `src/services/aiAnalysisService.ts` - Higher-level analysis orchestration
- `src/types/ai.ts` - Zod schemas for AI response validation

## Related ADRs

- [ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md) - Naming Conventions for AI Types
- [ADR-039](./0039-dependency-injection-pattern.md) - Dependency Injection Pattern
- [ADR-001](./0001-standardized-error-handling.md) - Error Handling