Files
flyer-crawler.projectium.com/docs/adr/0006-background-job-processing-and-task-queues.md
Torben Sorensen 664ad291be
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 30m3s
integration test fixes - claude for the win? try 3
2026-01-09 03:41:57 -08:00

98 lines
4.5 KiB
Markdown

# ADR-006: Background Job Processing and Task Queues
**Date**: 2025-12-12
**Status**: Accepted
## Context
The application's core purpose involves long-running, resource-intensive tasks like OCR processing of flyers. Executing these tasks within the lifecycle of an API request is unreliable and does not scale. A failure during processing could be lost without a robust system.
## Decision
We will implement a dedicated background job processing system using a task queue library like **BullMQ** (with Redis). All flyer ingestion, OCR processing, and other long-running tasks (e.g., sending bulk emails) MUST be dispatched as jobs to this queue.
## Consequences
**Positive**: Decouples the API from heavy processing, allows for retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.
**Negative**: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring of the flyer processing logic to work within a job queue structure.
## Implementation Details
### Queue Infrastructure
The implementation uses **BullMQ v5.65.1** with **ioredis v5.8.2** for Redis connectivity. Six distinct queues handle different job types:
| Queue Name | Purpose | Retry Attempts | Backoff Strategy |
| ---------------------------- | --------------------------- | -------------- | ---------------------- |
| `flyer-processing` | OCR/AI processing of flyers | 3 | Exponential (5s base) |
| `email-sending` | Email delivery | 5 | Exponential (10s base) |
| `analytics-reporting` | Daily report generation | 2 | Exponential (60s base) |
| `weekly-analytics-reporting` | Weekly report generation | 2 | Exponential (1h base) |
| `file-cleanup` | Temporary file cleanup | 3 | Exponential (30s base) |
| `token-cleanup` | Expired token removal | 2 | Exponential (1h base) |
### Key Files
- `src/services/queues.server.ts` - Queue definitions and configuration
- `src/services/workers.server.ts` - Worker implementations with configurable concurrency
- `src/services/redis.server.ts` - Redis connection management
- `src/services/queueService.server.ts` - Queue lifecycle and graceful shutdown
- `src/services/flyerProcessingService.server.ts` - 5-stage flyer processing pipeline
- `src/types/job-data.ts` - TypeScript interfaces for all job data types
### API Design
Endpoints for long-running tasks return **202 Accepted** immediately with a job ID:
```text
POST /api/ai/upload-and-process → 202 { jobId: "..." }
GET /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
```
### Worker Configuration
Workers are configured via environment variables:
- `WORKER_CONCURRENCY` - Flyer processing parallelism (default: 1)
- `EMAIL_WORKER_CONCURRENCY` - Email worker parallelism (default: 10)
- `ANALYTICS_WORKER_CONCURRENCY` - Analytics worker parallelism (default: 1)
- `CLEANUP_WORKER_CONCURRENCY` - Cleanup worker parallelism (default: 10)
### Monitoring
- **Bull Board UI** available at `/api/admin/jobs` for admin users
- Worker status endpoint: `GET /api/admin/workers/status`
- Queue status endpoint: `GET /api/admin/queues/status`
### Graceful Shutdown
Both API and worker processes implement graceful shutdown with a 30-second timeout, ensuring in-flight jobs complete before process termination.
## Compliance Notes
### Deprecated Synchronous Endpoints
The following endpoints process flyers synchronously and are **deprecated**:
- `POST /api/ai/upload-legacy` - For integration testing only
- `POST /api/ai/flyers/process` - Legacy workflow, should migrate to queue-based approach
New integrations MUST use `POST /api/ai/upload-and-process` for queue-based processing.
### Email Handling
- **Bulk emails** (deal notifications): Enqueued via `emailQueue`
- **Transactional emails** (password reset): Sent synchronously for immediate user feedback
## Future Enhancements
Potential improvements for consideration:
1. **Dead Letter Queue (DLQ)**: Move permanently failed jobs to a dedicated queue for analysis
2. **Job Priority Levels**: Allow priority-based processing for different job types
3. **Real-time Progress**: WebSocket/SSE for live job progress updates to clients
4. **Per-Queue Rate Limiting**: Throttle job processing based on external API limits
5. **Job Dependencies**: Support for jobs that depend on completion of other jobs
6. **Prometheus Metrics**: Export queue metrics for observability dashboards