integration test fixes - claude for the win? try 3
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 30m3s

This commit is contained in:
2026-01-09 03:41:57 -08:00
parent ff912b9055
commit 664ad291be
10 changed files with 591 additions and 63 deletions

View File

@@ -2,7 +2,7 @@
**Date**: 2025-12-12
**Status**: Proposed
**Status**: Accepted
## Context
@@ -16,3 +16,82 @@ We will implement a dedicated background job processing system using a task queu
**Positive**: Decouples the API from heavy processing, allows for retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.
**Negative**: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring of the flyer processing logic to work within a job queue structure.
## Implementation Details
### Queue Infrastructure
The implementation uses **BullMQ v5.65.1** with **ioredis v5.8.2** for Redis connectivity. Six distinct queues handle different job types:
| Queue Name | Purpose | Retry Attempts | Backoff Strategy |
| ---------------------------- | --------------------------- | -------------- | ---------------------- |
| `flyer-processing` | OCR/AI processing of flyers | 3 | Exponential (5s base) |
| `email-sending` | Email delivery | 5 | Exponential (10s base) |
| `analytics-reporting` | Daily report generation | 2 | Exponential (60s base) |
| `weekly-analytics-reporting` | Weekly report generation | 2 | Exponential (1h base) |
| `file-cleanup` | Temporary file cleanup | 3 | Exponential (30s base) |
| `token-cleanup` | Expired token removal | 2 | Exponential (1h base) |
### Key Files
- `src/services/queues.server.ts` - Queue definitions and configuration
- `src/services/workers.server.ts` - Worker implementations with configurable concurrency
- `src/services/redis.server.ts` - Redis connection management
- `src/services/queueService.server.ts` - Queue lifecycle and graceful shutdown
- `src/services/flyerProcessingService.server.ts` - 5-stage flyer processing pipeline
- `src/types/job-data.ts` - TypeScript interfaces for all job data types
### API Design
Endpoints for long-running tasks return **202 Accepted** immediately with a job ID:
```text
POST /api/ai/upload-and-process → 202 { jobId: "..." }
GET /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
```
### Worker Configuration
Workers are configured via environment variables:
- `WORKER_CONCURRENCY` - Flyer processing parallelism (default: 1)
- `EMAIL_WORKER_CONCURRENCY` - Email worker parallelism (default: 10)
- `ANALYTICS_WORKER_CONCURRENCY` - Analytics worker parallelism (default: 1)
- `CLEANUP_WORKER_CONCURRENCY` - Cleanup worker parallelism (default: 10)
### Monitoring
- **Bull Board UI** available at `/api/admin/jobs` for admin users
- Worker status endpoint: `GET /api/admin/workers/status`
- Queue status endpoint: `GET /api/admin/queues/status`
### Graceful Shutdown
Both API and worker processes implement graceful shutdown with a 30-second timeout, ensuring in-flight jobs complete before process termination.
## Compliance Notes
### Deprecated Synchronous Endpoints
The following endpoints process flyers synchronously and are **deprecated**:
- `POST /api/ai/upload-legacy` - For integration testing only
- `POST /api/ai/flyers/process` - Legacy workflow, should migrate to queue-based approach
New integrations MUST use `POST /api/ai/upload-and-process` for queue-based processing.
### Email Handling
- **Bulk emails** (deal notifications): Enqueued via `emailQueue`
- **Transactional emails** (password reset): Sent synchronously for immediate user feedback
## Future Enhancements
Potential improvements for consideration:
1. **Dead Letter Queue (DLQ)**: Move permanently failed jobs to a dedicated queue for analysis
2. **Job Priority Levels**: Allow priority-based processing for different job types
3. **Real-time Progress**: WebSocket/SSE for live job progress updates to clients
4. **Per-Queue Rate Limiting**: Throttle job processing based on external API limits
5. **Job Dependencies**: Support for jobs that depend on completion of other jobs
6. **Prometheus Metrics**: Export queue metrics for observability dashboards

View File

@@ -2,7 +2,7 @@
**Date**: 2025-12-12
**Status**: Proposed
**Status**: Accepted
## Context
@@ -20,3 +20,107 @@ We will implement a multi-layered caching strategy using an in-memory data store
**Positive**: Directly addresses application performance and scalability. Reduces database load and improves API response times for common requests.
**Negative**: Introduces Redis as a dependency if not already used. Adds complexity to the data-fetching logic and requires careful management of cache invalidation to prevent stale data.
## Implementation Details
### Cache Service
A centralized cache service (`src/services/cacheService.server.ts`) provides reusable caching functionality:
- **`getOrSet<T>(key, fetcher, options)`**: Cache-aside pattern implementation
- **`get<T>(key)`**: Retrieve cached value
- **`set<T>(key, value, ttl)`**: Store value with TTL
- **`del(key)`**: Delete specific key
- **`invalidatePattern(pattern)`**: Delete keys matching a pattern
All cache operations are fail-safe - cache failures do not break the application.
### TTL Configuration
Different data types use different TTL values based on volatility:
| Data Type | TTL | Rationale |
| ------------------- | --------- | -------------------------------------- |
| Brands/Stores | 1 hour | Rarely changes, safe to cache longer |
| Flyer lists | 5 minutes | Changes when new flyers are added |
| Individual flyers | 10 minutes| Stable once created |
| Flyer items | 10 minutes| Stable once created |
| Statistics | 5 minutes | Can be slightly stale |
| Frequent sales | 15 minutes| Aggregated data, updated periodically |
| Categories | 1 hour | Rarely changes |
### Cache Key Strategy
Cache keys follow a consistent prefix pattern for pattern-based invalidation:
- `cache:brands` - All brands list
- `cache:flyers:{limit}:{offset}` - Paginated flyer lists
- `cache:flyer:{id}` - Individual flyer data
- `cache:flyer-items:{flyerId}` - Items for a specific flyer
- `cache:stats:*` - Statistics data
- `geocode:{address}` - Geocoding results (30-day TTL)
### Cached Endpoints
The following repository methods implement server-side caching:
| Method | Cache Key Pattern | TTL |
| ------ | ----------------- | --- |
| `FlyerRepository.getAllBrands()` | `cache:brands` | 1 hour |
| `FlyerRepository.getFlyers()` | `cache:flyers:{limit}:{offset}` | 5 minutes |
| `FlyerRepository.getFlyerItems()` | `cache:flyer-items:{flyerId}` | 10 minutes |
### Cache Invalidation
**Event-based invalidation** is triggered on write operations:
- **Flyer creation** (`FlyerPersistenceService.saveFlyer`): Invalidates all `cache:flyers*` keys
- **Flyer deletion** (`FlyerRepository.deleteFlyer`): Invalidates specific flyer and flyer items cache, plus flyer lists
**Manual invalidation** via admin endpoints:
- `POST /api/admin/system/clear-cache` - Clears all application cache (flyers, brands, stats)
- `POST /api/admin/system/clear-geocode-cache` - Clears geocoding cache
### Client-Side Caching
TanStack React Query provides client-side caching with configurable stale times:
| Query Type | Stale Time |
| ----------------- | ----------- |
| Categories | 1 hour |
| Master Items | 10 minutes |
| Flyer Items | 5 minutes |
| Flyers | 2 minutes |
| Shopping Lists | 1 minute |
| Activity Log | 30 seconds |
### Multi-Layer Cache Architecture
```text
Client Request
[TanStack React Query] ← Client-side cache (staleTime-based)
[Express API]
[CacheService.getOrSet()] ← Server-side Redis cache (TTL-based)
[PostgreSQL Database]
```
## Key Files
- `src/services/cacheService.server.ts` - Centralized cache service
- `src/services/db/flyer.db.ts` - Repository with caching for brands, flyers, flyer items
- `src/services/flyerPersistenceService.server.ts` - Cache invalidation on flyer creation
- `src/routes/admin.routes.ts` - Admin cache management endpoints
- `src/config/queryClient.ts` - Client-side query cache configuration
## Future Enhancements
1. **Recipe caching**: Add caching to expensive recipe queries (by-sale-percentage, etc.)
2. **Cache warming**: Pre-populate cache on startup for frequently accessed static data
3. **Cache metrics**: Add hit/miss rate monitoring for observability
4. **Conditional caching**: Skip cache for authenticated user-specific data
5. **Cache compression**: Compress large cached payloads to reduce Redis memory usage