integration test fixes - claude for the win? try 3

2026-01-09 03:41:57 -08:00
parent ff912b9055
commit 664ad291be
10 changed files with 591 additions and 63 deletions
--- a/docs/adr/0006-background-job-processing-and-task-queues.md
+++ b/docs/adr/0006-background-job-processing-and-task-queues.md
@@ -2,7 +2,7 @@

 **Date**: 2025-12-12

-**Status**: Proposed
+**Status**: Accepted

 ## Context

@@ -16,3 +16,82 @@ We will implement a dedicated background job processing system using a task queu

 **Positive**: Decouples the API from heavy processing, allows for retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.
 **Negative**: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring of the flyer processing logic to work within a job queue structure.
+
+## Implementation Details
+
+### Queue Infrastructure
+
+The implementation uses **BullMQ v5.65.1** with **ioredis v5.8.2** for Redis connectivity. Six distinct queues handle different job types:
+
+| Queue Name                   | Purpose                     | Retry Attempts | Backoff Strategy       |
+| ---------------------------- | --------------------------- | -------------- | ---------------------- |
+| `flyer-processing`           | OCR/AI processing of flyers | 3              | Exponential (5s base)  |
+| `email-sending`              | Email delivery              | 5              | Exponential (10s base) |
+| `analytics-reporting`        | Daily report generation     | 2              | Exponential (60s base) |
+| `weekly-analytics-reporting` | Weekly report generation    | 2              | Exponential (1h base)  |
+| `file-cleanup`               | Temporary file cleanup      | 3              | Exponential (30s base) |
+| `token-cleanup`              | Expired token removal       | 2              | Exponential (1h base)  |
+
+### Key Files
+
+- `src/services/queues.server.ts` - Queue definitions and configuration
+- `src/services/workers.server.ts` - Worker implementations with configurable concurrency
+- `src/services/redis.server.ts` - Redis connection management
+- `src/services/queueService.server.ts` - Queue lifecycle and graceful shutdown
+- `src/services/flyerProcessingService.server.ts` - 5-stage flyer processing pipeline
+- `src/types/job-data.ts` - TypeScript interfaces for all job data types
+
+### API Design
+
+Endpoints for long-running tasks return **202 Accepted** immediately with a job ID:
+
+```text
+POST /api/ai/upload-and-process → 202 { jobId: "..." }
+GET  /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
+```
+
+### Worker Configuration
+
+Workers are configured via environment variables:
+
+- `WORKER_CONCURRENCY` - Flyer processing parallelism (default: 1)
+- `EMAIL_WORKER_CONCURRENCY` - Email worker parallelism (default: 10)
+- `ANALYTICS_WORKER_CONCURRENCY` - Analytics worker parallelism (default: 1)
+- `CLEANUP_WORKER_CONCURRENCY` - Cleanup worker parallelism (default: 10)
+
+### Monitoring
+
+- **Bull Board UI** available at `/api/admin/jobs` for admin users
+- Worker status endpoint: `GET /api/admin/workers/status`
+- Queue status endpoint: `GET /api/admin/queues/status`
+
+### Graceful Shutdown
+
+Both the API and worker processes shut down gracefully: in-flight jobs get up to 30 seconds to complete before the process terminates.
+
+## Compliance Notes
+
+### Deprecated Synchronous Endpoints
+
+The following endpoints process flyers synchronously and are **deprecated**:
+
+- `POST /api/ai/upload-legacy` - For integration testing only
+- `POST /api/ai/flyers/process` - Legacy workflow; callers should migrate to the queue-based approach
+
+New integrations MUST use `POST /api/ai/upload-and-process` for queue-based processing.
+
+### Email Handling
+
+- **Bulk emails** (deal notifications): Enqueued via `emailQueue`
+- **Transactional emails** (password reset): Sent synchronously for immediate user feedback
+
+## Future Enhancements
+
+Potential improvements for consideration:
+
+1. **Dead Letter Queue (DLQ)**: Move permanently failed jobs to a dedicated queue for analysis
+2. **Job Priority Levels**: Allow priority-based processing for different job types
+3. **Real-time Progress**: WebSocket/SSE for live job progress updates to clients
+4. **Per-Queue Rate Limiting**: Throttle job processing based on external API limits
+5. **Job Dependencies**: Support for jobs that depend on completion of other jobs
+6. **Prometheus Metrics**: Export queue metrics for observability dashboards
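The retry settings in the queue table above can be made concrete with a small sketch of the resulting wait times. It assumes that the "Retry Attempts" column is the total number of attempts (as with BullMQ's `attempts` option) and that the exponential strategy doubles the base delay on each retry; neither assumption is confirmed by the ADR. The names `retryPolicies`, `retryDelayMs` and `retrySchedule` are hypothetical and do not come from the codebase.

```typescript
// Hypothetical illustration only; none of these names exist in the ADR.
interface RetryPolicy {
  attempts: number; // assumed: total attempts, including the first run
  baseDelayMs: number; // base delay of the exponential backoff
}

// Values copied from the queue table in ADR-0006.
const retryPolicies = {
  "flyer-processing": { attempts: 3, baseDelayMs: 5000 },
  "email-sending": { attempts: 5, baseDelayMs: 10000 },
  "analytics-reporting": { attempts: 2, baseDelayMs: 60000 },
  "weekly-analytics-reporting": { attempts: 2, baseDelayMs: 3600000 },
  "file-cleanup": { attempts: 3, baseDelayMs: 30000 },
  "token-cleanup": { attempts: 2, baseDelayMs: 3600000 },
};

// Delay before the given retry (1-based), assuming the delay doubles each time.
function retryDelayMs(policy: RetryPolicy, retry: number): number {
  return policy.baseDelayMs * 2 ** (retry - 1);
}

// Every delay a job waits through before it is finally marked as failed.
function retrySchedule(policy: RetryPolicy): number[] {
  const retries = Math.max(policy.attempts - 1, 0);
  return Array.from({ length: retries }, (_, i) => retryDelayMs(policy, i + 1));
}

console.log(retrySchedule(retryPolicies["flyer-processing"]));
console.log(retrySchedule(retryPolicies["email-sending"]));
```

Under these assumptions a failing flyer job waits 5 s and then 10 s before its last attempt, while a failing email waits 10 s, 20 s, 40 s and 80 s.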
--- a/docs/adr/0009-caching-strategy-for-read-heavy-operations.md
+++ b/docs/adr/0009-caching-strategy-for-read-heavy-operations.md
@@ -2,7 +2,7 @@

 **Date**: 2025-12-12

-**Status**: Proposed
+**Status**: Accepted

 ## Context

@@ -20,3 +20,107 @@ We will implement a multi-layered caching strategy using an in-memory data store

 **Positive**: Directly addresses application performance and scalability. Reduces database load and improves API response times for common requests.
 **Negative**: Introduces Redis as a dependency if not already used. Adds complexity to the data-fetching logic and requires careful management of cache invalidation to prevent stale data.
+
+## Implementation Details
+
+### Cache Service
+
+A centralized cache service (`src/services/cacheService.server.ts`) provides reusable caching functionality:
+
+- **`getOrSet<T>(key, fetcher, options)`**: Cache-aside pattern implementation
+- **`get<T>(key)`**: Retrieve cached value
+- **`set<T>(key, value, ttl)`**: Store value with TTL
+- **`del(key)`**: Delete specific key
+- **`invalidatePattern(pattern)`**: Delete keys matching a pattern
+
+All cache operations are fail-safe: a cache failure does not break the application.
+
+### TTL Configuration
+
+Different data types use different TTL values based on volatility:
+
+| Data Type         | TTL        | Rationale                             |
+| ----------------- | ---------- | ------------------------------------- |
+| Brands/Stores     | 1 hour     | Rarely changes, safe to cache longer  |
+| Flyer lists       | 5 minutes  | Changes when new flyers are added     |
+| Individual flyers | 10 minutes | Stable once created                   |
+| Flyer items       | 10 minutes | Stable once created                   |
+| Statistics        | 5 minutes  | Can be slightly stale                 |
+| Frequent sales    | 15 minutes | Aggregated data, updated periodically |
+| Categories        | 1 hour     | Rarely changes                        |
+
+### Cache Key Strategy
+
+Cache keys share a consistent prefix scheme so that related keys can be invalidated by pattern:
+
+- `cache:brands` - All brands list
+- `cache:flyers:{limit}:{offset}` - Paginated flyer lists
+- `cache:flyer:{id}` - Individual flyer data
+- `cache:flyer-items:{flyerId}` - Items for a specific flyer
+- `cache:stats:*` - Statistics data
+- `geocode:{address}` - Geocoding results (30-day TTL)
+
+### Cached Endpoints
+
+The following repository methods implement server-side caching:
+
+| Method                            | Cache Key Pattern               | TTL        |
+| --------------------------------- | ------------------------------- | ---------- |
+| `FlyerRepository.getAllBrands()`  | `cache:brands`                  | 1 hour     |
+| `FlyerRepository.getFlyers()`     | `cache:flyers:{limit}:{offset}` | 5 minutes  |
+| `FlyerRepository.getFlyerItems()` | `cache:flyer-items:{flyerId}`   | 10 minutes |
+
+### Cache Invalidation
+
+**Event-based invalidation** is triggered on write operations:
+
+- **Flyer creation** (`FlyerPersistenceService.saveFlyer`): Invalidates all `cache:flyers*` keys
+- **Flyer deletion** (`FlyerRepository.deleteFlyer`): Invalidates specific flyer and flyer items cache, plus flyer lists
+
+**Manual invalidation** via admin endpoints:
+
+- `POST /api/admin/system/clear-cache` - Clears all application cache (flyers, brands, stats)
+- `POST /api/admin/system/clear-geocode-cache` - Clears geocoding cache
+
+### Client-Side Caching
+
+TanStack React Query provides client-side caching with configurable stale times:
+
+| Query Type        | Stale Time  |
+| ----------------- | ----------- |
+| Categories        | 1 hour      |
+| Master Items      | 10 minutes  |
+| Flyer Items       | 5 minutes   |
+| Flyers            | 2 minutes   |
+| Shopping Lists    | 1 minute    |
+| Activity Log      | 30 seconds  |
+
+### Multi-Layer Cache Architecture
+
+```text
+Client Request
+      ↓
+[TanStack React Query] ← Client-side cache (staleTime-based)
+      ↓
+[Express API]
+      ↓
+[CacheService.getOrSet()] ← Server-side Redis cache (TTL-based)
+      ↓
+[PostgreSQL Database]
+```
+
+## Key Files
+
+- `src/services/cacheService.server.ts` - Centralized cache service
+- `src/services/db/flyer.db.ts` - Repository with caching for brands, flyers, flyer items
+- `src/services/flyerPersistenceService.server.ts` - Cache invalidation on flyer creation
+- `src/routes/admin.routes.ts` - Admin cache management endpoints
+- `src/config/queryClient.ts` - Client-side query cache configuration
+
+## Future Enhancements
+
+1. **Recipe caching**: Add caching to expensive recipe queries (by-sale-percentage, etc.)
+2. **Cache warming**: Pre-populate cache on startup for frequently accessed static data
+3. **Cache metrics**: Add hit/miss rate monitoring for observability
+4. **Conditional caching**: Skip cache for authenticated user-specific data
+5. **Cache compression**: Compress large cached payloads to reduce Redis memory usage
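The cache-aside and fail-safe behaviour that ADR-0009 attributes to `getOrSet` can be sketched roughly as follows. This is not the code in `cacheService.server.ts`: the `CacheStore` interface, the in-memory `MemoryStore` standing in for Redis, the explicit `store` parameter and the `ttlSeconds` option name are all assumptions made so that the example runs on its own.

```typescript
// Minimal store interface; a real deployment would back this with Redis.
interface CacheStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// Hypothetical in-memory stand-in for Redis so the sketch is self-contained.
class MemoryStore implements CacheStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (entry.expiresAt <= Date.now()) {
      this.entries.delete(key); // expired entries behave like a miss
      return null;
    }
    return entry.value;
  }

  async set(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside: try the cache, fall back to the fetcher, then store the result.
// Cache errors are swallowed so that a cache outage degrades to a direct fetch.
async function getOrSet<T>(
  store: CacheStore,
  key: string,
  fetcher: () => Promise<T>,
  options: { ttlSeconds: number },
): Promise<T> {
  try {
    const cached = await store.get(key);
    if (cached !== null) return JSON.parse(cached) as T;
  } catch {
    // A failed or unparsable read is treated as a cache miss.
  }
  const fresh = await fetcher();
  try {
    await store.set(key, JSON.stringify(fresh), options.ttlSeconds);
  } catch {
    // A failed write must not fail the request.
  }
  return fresh;
}
```

A repository method could then wrap its query as `getOrSet(store, "cache:brands", loadBrands, { ttlSeconds: 3600 })`, where `loadBrands` is a hypothetical function that runs the database query; the one-hour TTL matches the Brands/Stores row of the TTL table.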