# ADR-006: Background Job Processing and Task Queues

**Date**: 2025-12-12

**Status**: Accepted

## Context

The application's core purpose involves long-running, resource-intensive tasks such as OCR processing of flyers. Executing these tasks within the lifecycle of an API request is unreliable (the request can time out, or the process can restart mid-task) and does not scale. Without a durable job system, a failure during processing could cause the work to be silently lost.

## Decision

We will implement a dedicated background job processing system using the **BullMQ** task queue library, backed by Redis. All flyer ingestion, OCR processing, and other long-running tasks (e.g., sending bulk emails) MUST be dispatched as jobs to this queue.

## Consequences

**Positive**: Decouples the API from heavy processing, allows retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.

**Negative**: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring the flyer processing logic to run within a job queue structure.

## Implementation Details

### Queue Infrastructure

The implementation uses **BullMQ v5.65.1** with **ioredis v5.8.2** for Redis connectivity.
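
Retries on failure are one of the main reasons for adopting a queue, and each queue's retry behavior comes from its job options. As a worked illustration of how an exponential backoff policy turns into concrete wait times, here is a minimal TypeScript sketch. It assumes BullMQ's built-in `exponential` strategy, where retry n waits `delay × 2^(n − 1)` ms with no jitter, and that the per-queue retry counts in this ADR map directly to BullMQ's `attempts` option, which counts the first run as well. The names `QueueName`, `RetryPolicy`, `retryPolicies`, and `retryDelays` are hypothetical and are not taken from `queues.server.ts`.

```typescript
// Hypothetical sketch: expand each queue's retry policy into concrete waits.
// Assumes BullMQ's built-in "exponential" backoff (retry n waits
// delay * 2^(n - 1) ms, no jitter) and that `attempts` counts every try,
// including the first run.

type QueueName =
  | "flyer-processing"
  | "email-sending"
  | "analytics-reporting"
  | "weekly-analytics-reporting"
  | "file-cleanup"
  | "token-cleanup";

interface RetryPolicy {
  attempts: number; // total tries, first run included
  backoff: { type: "exponential"; delay: number }; // base delay in ms
}

const SECOND = 1000;
const HOUR = 60 * 60 * SECOND;

// Values taken from this ADR's queue table; the object shape mirrors the
// `attempts` and `backoff` fields of BullMQ job options.
const retryPolicies: Record<QueueName, RetryPolicy> = {
  "flyer-processing": { attempts: 3, backoff: { type: "exponential", delay: 5 * SECOND } },
  "email-sending": { attempts: 5, backoff: { type: "exponential", delay: 10 * SECOND } },
  "analytics-reporting": { attempts: 2, backoff: { type: "exponential", delay: 60 * SECOND } },
  "weekly-analytics-reporting": { attempts: 2, backoff: { type: "exponential", delay: HOUR } },
  "file-cleanup": { attempts: 3, backoff: { type: "exponential", delay: 30 * SECOND } },
  "token-cleanup": { attempts: 2, backoff: { type: "exponential", delay: HOUR } },
};

// Wait in ms before each retry; a policy with N attempts has N - 1 retries.
function retryDelays(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < policy.attempts; retry++) {
    delays.push(Math.round(policy.backoff.delay * 2 ** (retry - 1)));
  }
  return delays;
}

// Print the retry schedule for every queue.
for (const [queue, policy] of Object.entries(retryPolicies)) {
  console.log(queue, retryDelays(policy));
}
```

Under these assumptions, `flyer-processing` waits 5 s and then 10 s between its three tries, and `email-sending` waits 10, 20, 40, and 80 s. If the queue table's counts instead exclude the first run, each `attempts` value would be one higher and each schedule would gain one more doubled step.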
Six distinct queues handle different job types:

| Queue Name                   | Purpose                     | Retry Attempts | Backoff Strategy       |
| ---------------------------- | --------------------------- | -------------- | ---------------------- |
| `flyer-processing`           | OCR/AI processing of flyers | 3              | Exponential (5s base)  |
| `email-sending`              | Email delivery              | 5              | Exponential (10s base) |
| `analytics-reporting`        | Daily report generation     | 2              | Exponential (60s base) |
| `weekly-analytics-reporting` | Weekly report generation    | 2              | Exponential (1h base)  |
| `file-cleanup`               | Temporary file cleanup      | 3              | Exponential (30s base) |
| `token-cleanup`              | Expired token removal       | 2              | Exponential (1h base)  |

### Key Files

- `src/services/queues.server.ts` - Queue definitions and configuration
- `src/services/workers.server.ts` - Worker implementations with configurable concurrency
- `src/services/redis.server.ts` - Redis connection management
- `src/services/queueService.server.ts` - Queue lifecycle and graceful shutdown
- `src/services/flyerProcessingService.server.ts` - 5-stage flyer processing pipeline
- `src/types/job-data.ts` - TypeScript interfaces for all job data types

### API Design

Endpoints for long-running tasks return **202 Accepted** immediately with a job ID:

```text
POST /api/ai/upload-and-process → 202 { jobId: "..." }
GET /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
```

### Worker Configuration

Workers are configured via environment variables:

- `WORKER_CONCURRENCY` - Flyer processing parallelism (default: 1)
- `EMAIL_WORKER_CONCURRENCY` - Email worker parallelism (default: 10)
- `ANALYTICS_WORKER_CONCURRENCY` - Analytics worker parallelism (default: 1)
- `CLEANUP_WORKER_CONCURRENCY` - Cleanup worker parallelism (default: 10)

### Monitoring

- **Bull Board UI** available at `/api/admin/jobs` for admin users
- Worker status endpoint: `GET /api/admin/workers/status`
- Queue status endpoint: `GET /api/admin/queues/status`

### Graceful Shutdown

Both API and worker processes implement graceful shutdown with a 30-second timeout, giving in-flight jobs up to 30 seconds to complete before the process terminates.

## Compliance Notes

### Deprecated Synchronous Endpoints

The following endpoints process flyers synchronously and are **deprecated**:

- `POST /api/ai/upload-legacy` - For integration testing only
- `POST /api/ai/flyers/process` - Legacy workflow; callers should migrate to the queue-based approach

New integrations MUST use `POST /api/ai/upload-and-process` for queue-based processing.

### Email Handling

- **Bulk emails** (deal notifications): Enqueued via `emailQueue`
- **Transactional emails** (password reset): Sent synchronously for immediate user feedback

## Future Enhancements

Potential improvements for consideration:

1. **Dead Letter Queue (DLQ)**: Move permanently failed jobs to a dedicated queue for analysis
2. **Job Priority Levels**: Allow priority-based processing for different job types
3. **Real-time Progress**: WebSocket/SSE for live job progress updates to clients
4. **Per-Queue Rate Limiting**: Throttle job processing based on external API limits
5. **Job Dependencies**: Support for jobs that depend on completion of other jobs
6. **Prometheus Metrics**: Export queue metrics for observability dashboards