# ADR-006: Background Job Processing and Task Queues

**Date**: 2025-12-12

**Status**: Accepted

## Context

The application's core purpose involves long-running, resource-intensive tasks such as OCR processing of flyers. Executing these tasks within the lifecycle of an API request is unreliable and does not scale. Without a robust job system, a failure during processing can be lost silently, with no record and no retry.

## Decision

We will implement a dedicated background job processing system using a task queue library like **BullMQ** (with Redis). All flyer ingestion, OCR processing, and other long-running tasks (e.g., sending bulk emails) MUST be dispatched as jobs to this queue.

## Consequences

**Positive**: Decouples the API from heavy processing, allows for retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.

**Negative**: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring of the flyer processing logic to work within a job queue structure.

## Implementation Details

### Queue Infrastructure

The implementation uses **BullMQ v5.65.1** with **ioredis v5.8.2** for Redis connectivity. Six distinct queues handle different job types:

| Queue Name                   | Purpose                     | Retry Attempts | Backoff Strategy       |
| ---------------------------- | --------------------------- | -------------- | ---------------------- |
| `flyer-processing`           | OCR/AI processing of flyers | 3              | Exponential (5s base)  |
| `email-sending`              | Email delivery              | 5              | Exponential (10s base) |
| `analytics-reporting`        | Daily report generation     | 2              | Exponential (60s base) |
| `weekly-analytics-reporting` | Weekly report generation    | 2              | Exponential (1h base)  |
| `file-cleanup`               | Temporary file cleanup      | 3              | Exponential (30s base) |
| `token-cleanup`              | Expired token removal       | 2              | Exponential (1h base)  |

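To make the retry columns concrete, the sketch below computes the wait before each retry. It assumes the queues use BullMQ's built-in `exponential` backoff, which to our understanding waits `delay * 2^(attemptsMade - 1)` milliseconds, and it reads "Retry Attempts" as BullMQ's `attempts` option (the total number of tries). The `RetryPolicy` type and the `retryDelaysMs` helper are hypothetical names used only for illustration; they are not part of the codebase.

```typescript
// Shape mirrors BullMQ's `attempts` / `backoff` job options; the type itself is illustrative.
interface RetryPolicy {
  attempts: number; // total tries, including the first one (assumption)
  backoff: { type: "exponential"; delay: number }; // base delay in milliseconds
}

// Values copied from the `flyer-processing` row of the queue table.
const flyerProcessingPolicy: RetryPolicy = {
  attempts: 3,
  backoff: { type: "exponential", delay: 5000 },
};

// Delay before each retry, assuming delay * 2^(attemptsMade - 1).
function retryDelaysMs(policy: RetryPolicy): number[] {
  const delays: number[] = [];
  // After `attemptsMade` failures, a retry is scheduled only while tries remain.
  for (let attemptsMade = 1; attemptsMade < policy.attempts; attemptsMade++) {
    delays.push(Math.round(policy.backoff.delay * 2 ** (attemptsMade - 1)));
  }
  return delays;
}

console.log(retryDelaysMs(flyerProcessingPolicy)); // [ 5000, 10000 ]
```

Under these assumptions a flyer job is retried after 5 s and then 10 s before it is marked as failed.
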
### Key Files

- `src/services/queues.server.ts` - Queue definitions and configuration
- `src/services/workers.server.ts` - Worker implementations with configurable concurrency
- `src/services/redis.server.ts` - Redis connection management
- `src/services/queueService.server.ts` - Queue lifecycle and graceful shutdown
- `src/services/flyerProcessingService.server.ts` - 5-stage flyer processing pipeline
- `src/types/job-data.ts` - TypeScript interfaces for all job data types

### API Design

Endpoints for long-running tasks return **202 Accepted** immediately with a job ID:

```text
POST /api/ai/upload-and-process → 202 { jobId: "..." }
GET /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
```

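A client might consume these two endpoints by submitting once and then polling the status endpoint until the job settles. The sketch below is a minimal, transport-agnostic version of that loop. `JobStatus`, `pollUntilSettled`, and the injected `fetchStatus` function are hypothetical, and the terminal state names `completed` and `failed` are assumed to follow BullMQ's job states rather than taken from the API itself.

```typescript
// Assumed response shape for GET /api/ai/jobs/:jobId/status.
interface JobStatus {
  state: string;
  progress?: unknown;
}

// Assumption: the API reports BullMQ's terminal job states unchanged.
const TERMINAL_STATES = new Set(["completed", "failed"]);

// Calls `fetchStatus` until the job reaches a terminal state or the poll budget runs out.
async function pollUntilSettled(
  fetchStatus: () => Promise<JobStatus>,
  intervalMs = 2000,
  maxPolls = 150,
): Promise<JobStatus> {
  for (let poll = 0; poll < maxPolls; poll++) {
    const status = await fetchStatus();
    if (TERMINAL_STATES.has(status.state)) return status;
    // Wait before asking again so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job did not settle after ${maxPolls} polls`);
}
```

In a browser, `fetchStatus` could wrap a `fetch` call to `/api/ai/jobs/:jobId/status` using the `jobId` returned by the 202 response. Injecting it keeps the loop independent of the HTTP client.
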
### Worker Configuration

Workers are configured via environment variables:

- `WORKER_CONCURRENCY` - Flyer processing parallelism (default: 1)
- `EMAIL_WORKER_CONCURRENCY` - Email worker parallelism (default: 10)
- `ANALYTICS_WORKER_CONCURRENCY` - Analytics worker parallelism (default: 1)
- `CLEANUP_WORKER_CONCURRENCY` - Cleanup worker parallelism (default: 10)

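One way these variables might be read is sketched below: parse each value as a positive integer and fall back to the documented default when it is missing or invalid. The `Env` type and the `readConcurrency` and `loadWorkerConcurrency` helpers are hypothetical; the actual parsing in `src/services/workers.server.ts` may differ.

```typescript
type Env = Record<string, string | undefined>;

// Parse a positive integer from the environment; use the fallback when the
// variable is unset, not numeric, or not greater than zero.
function readConcurrency(env: Env, name: string, fallback: number): number {
  const parsed = Number.parseInt(env[name] ?? "", 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}

// Defaults are the ones listed above.
function loadWorkerConcurrency(env: Env) {
  return {
    flyer: readConcurrency(env, "WORKER_CONCURRENCY", 1),
    email: readConcurrency(env, "EMAIL_WORKER_CONCURRENCY", 10),
    analytics: readConcurrency(env, "ANALYTICS_WORKER_CONCURRENCY", 1),
    cleanup: readConcurrency(env, "CLEANUP_WORKER_CONCURRENCY", 10),
  };
}

console.log(loadWorkerConcurrency({ WORKER_CONCURRENCY: "4" }));
// { flyer: 4, email: 10, analytics: 1, cleanup: 10 }
```

In the application the argument would typically be `process.env`. Taking it as a parameter keeps the helper easy to test.
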
### Monitoring

- **Bull Board UI** available at `/api/admin/jobs` for admin users
- Worker status endpoint: `GET /api/admin/workers/status`
- Queue status endpoint: `GET /api/admin/queues/status`

### Graceful Shutdown

Both the API and worker processes implement graceful shutdown with a 30-second timeout, giving in-flight jobs up to 30 seconds to complete before the process terminates.

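A timeout-bounded shutdown of this kind can be sketched as a race between closing every worker and a timer. This is an illustration under assumptions, not the code in `src/services/queueService.server.ts`: `Closable` and `shutdownWithTimeout` are hypothetical names, and the only thing assumed of a worker is an async `close()` method, which BullMQ's `Worker` provides.

```typescript
// Anything with an async close(), e.g. a BullMQ Worker or Queue.
interface Closable {
  close(): Promise<void>;
}

// Resolves to "clean" if every worker closed in time, otherwise "timed-out".
async function shutdownWithTimeout(
  workers: Closable[],
  timeoutMs = 30000,
): Promise<"clean" | "timed-out"> {
  // Assigned synchronously inside the Promise executor below.
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<"timed-out">((resolve) => {
    timer = setTimeout(() => resolve("timed-out"), timeoutMs);
  });
  // A failing close() should not prevent the other workers from closing.
  const closeAll = Promise.all(
    workers.map((w) => w.close().catch(() => undefined)),
  ).then(() => "clean" as const);
  try {
    return await Promise.race([closeAll, timeout]);
  } finally {
    clearTimeout(timer); // do not keep the process alive after a clean close
  }
}
```

A process would typically call this from its `SIGTERM` and `SIGINT` handlers and exit once the promise resolves, logging a warning on `"timed-out"`.
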
## Compliance Notes

### Deprecated Synchronous Endpoints

The following endpoints process flyers synchronously and are **deprecated**:

- `POST /api/ai/upload-legacy` - For integration testing only
- `POST /api/ai/flyers/process` - Legacy workflow; callers should migrate to the queue-based approach

New integrations MUST use `POST /api/ai/upload-and-process` for queue-based processing.

### Email Handling

- **Bulk emails** (deal notifications): Enqueued via `emailQueue`
- **Transactional emails** (password reset): Sent synchronously for immediate user feedback

## Future Enhancements

Potential improvements for consideration:

1. **Dead Letter Queue (DLQ)**: Move permanently failed jobs to a dedicated queue for analysis
2. **Job Priority Levels**: Allow priority-based processing for different job types
3. **Real-time Progress**: WebSocket/SSE for live job progress updates to clients
4. **Per-Queue Rate Limiting**: Throttle job processing based on external API limits
5. **Job Dependencies**: Support for jobs that depend on completion of other jobs
6. **Prometheus Metrics**: Export queue metrics for observability dashboards