ADR-006: Background Job Processing and Task Queues

Date: 2025-12-12

Status: Accepted

Context

The application's core purpose involves long-running, resource-intensive tasks such as OCR processing of flyers. Executing these tasks within the lifecycle of an API request is unreliable and does not scale, and without a robust queueing system, work interrupted by a failure during processing would simply be lost.

Decision

We will implement a dedicated background job processing system using a task queue library like BullMQ (with Redis). All flyer ingestion, OCR processing, and other long-running tasks (e.g., sending bulk emails) MUST be dispatched as jobs to this queue.

Consequences

Positive: Decouples the API from heavy processing, allows retries on failure, and enables scaling the processing workers independently. Increases application reliability and resilience.

Negative: Introduces a new dependency (Redis) into the infrastructure. Requires refactoring the flyer processing logic to work within a job queue structure.

Implementation Details

Queue Infrastructure

The implementation uses BullMQ v5.65.1 with ioredis v5.8.2 for Redis connectivity. Six distinct queues handle different job types:

| Queue Name | Purpose | Retry Attempts | Backoff Strategy |
| --- | --- | --- | --- |
| flyer-processing | OCR/AI processing of flyers | 3 | Exponential (5s base) |
| email-sending | Email delivery | 5 | Exponential (10s base) |
| analytics-reporting | Daily report generation | 2 | Exponential (60s base) |
| weekly-analytics-reporting | Weekly report generation | 2 | Exponential (1h base) |
| file-cleanup | Temporary file cleanup | 3 | Exponential (30s base) |
| token-cleanup | Expired token removal | 2 | Exponential (1h base) |
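
For illustration, a queue definition in src/services/queues.server.ts might look like the sketch below. Only the queue name, attempt count, and backoff come from the table above; the connection handling and variable names are assumptions.

```typescript
// Sketch only: one of the six queues, wired with the retry/backoff values
// from the table above. Connection details are assumed, not taken from the repo.
import { Queue } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL ?? "redis://localhost:6379", {
  maxRetriesPerRequest: null, // required by BullMQ for its blocking connections
});

export const flyerProcessingQueue = new Queue("flyer-processing", {
  connection,
  defaultJobOptions: {
    attempts: 3, // retry failed OCR/AI jobs up to 3 times
    backoff: { type: "exponential", delay: 5000 }, // 5s base, doubling per attempt
  },
});
```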

Key Files

  • src/services/queues.server.ts - Queue definitions and configuration
  • src/services/workers.server.ts - Worker implementations with configurable concurrency
  • src/services/redis.server.ts - Redis connection management
  • src/services/queueService.server.ts - Queue lifecycle and graceful shutdown
  • src/services/flyerProcessingService.server.ts - 5-stage flyer processing pipeline
  • src/types/job-data.ts - TypeScript interfaces for all job data types
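
The interfaces in src/types/job-data.ts are not reproduced here; as a purely hypothetical illustration, the flyer-processing payload might be shaped like this (field names are invented):

```typescript
// Hypothetical example of a job-data interface; the real definitions in
// src/types/job-data.ts may use different fields and names.
export interface FlyerProcessingJobData {
  flyerId: string;     // database ID of the uploaded flyer
  filePath: string;    // where the uploaded file was stored
  requestedBy: string; // user who triggered processing
}
```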

API Design

Endpoints for long-running tasks return 202 Accepted immediately with a job ID:

POST /api/ai/upload-and-process → 202 { jobId: "..." }
GET  /api/ai/jobs/:jobId/status → { state: "...", progress: ... }
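
A minimal sketch of this enqueue-and-poll pattern, shown with Express-style routing for brevity (the project's actual route handlers and payload fields may differ):

```typescript
// Sketch only: return 202 with a job ID, then let clients poll for status.
import express from "express";
import { flyerProcessingQueue } from "./services/queues.server";

const app = express();
app.use(express.json());

app.post("/api/ai/upload-and-process", async (req, res) => {
  // Enqueue instead of processing inline; payload fields are illustrative.
  const job = await flyerProcessingQueue.add("process-flyer", { flyerId: req.body.flyerId });
  res.status(202).json({ jobId: job.id });
});

app.get("/api/ai/jobs/:jobId/status", async (req, res) => {
  const job = await flyerProcessingQueue.getJob(req.params.jobId);
  if (!job) return res.status(404).json({ error: "job not found" });
  res.json({ state: await job.getState(), progress: job.progress });
});
```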

Worker Configuration

Workers are configured via environment variables:

  • WORKER_CONCURRENCY - Flyer processing parallelism (default: 1)
  • EMAIL_WORKER_CONCURRENCY - Email worker parallelism (default: 10)
  • ANALYTICS_WORKER_CONCURRENCY - Analytics worker parallelism (default: 1)
  • CLEANUP_WORKER_CONCURRENCY - Cleanup worker parallelism (default: 10)
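
A sketch of how a worker might pick up WORKER_CONCURRENCY; the processor body is a placeholder for the 5-stage pipeline in flyerProcessingService.server.ts:

```typescript
// Sketch only: a flyer-processing worker whose parallelism comes from the
// environment. The processor body stands in for the real pipeline.
import { Worker } from "bullmq";
import IORedis from "ioredis";

const connection = new IORedis(process.env.REDIS_URL ?? "redis://localhost:6379", {
  maxRetriesPerRequest: null,
});

export const flyerWorker = new Worker(
  "flyer-processing",
  async (job) => {
    await job.updateProgress(0);
    // ...run the OCR/AI stages here...
    await job.updateProgress(100);
  },
  {
    connection,
    concurrency: Number(process.env.WORKER_CONCURRENCY ?? 1), // default: 1
  },
);
```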

Monitoring

  • Bull Board UI available at /api/admin/jobs for admin users (see the mounting sketch after this list)
  • Worker status endpoint: GET /api/admin/workers/status
  • Queue status endpoint: GET /api/admin/queues/status
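
The Bull Board dashboard can be mounted roughly as follows; this is a sketch assuming an Express app and an existing admin-auth middleware (requireAdmin is hypothetical):

```typescript
// Sketch only: expose Bull Board at /api/admin/jobs behind admin auth.
import { createBullBoard } from "@bull-board/api";
import { BullMQAdapter } from "@bull-board/api/bullMQAdapter";
import { ExpressAdapter } from "@bull-board/express";
import { flyerProcessingQueue, emailQueue } from "./services/queues.server";

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath("/api/admin/jobs");

createBullBoard({
  queues: [new BullMQAdapter(flyerProcessingQueue), new BullMQAdapter(emailQueue)], // plus the other queues
  serverAdapter,
});

// app.use("/api/admin/jobs", requireAdmin, serverAdapter.getRouter());
```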

Graceful Shutdown

Both API and worker processes implement graceful shutdown with a 30-second timeout, ensuring in-flight jobs complete before process termination.
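
A sketch of that shutdown sequence for a worker process, assuming the worker instances from workers.server.ts (the real logic lives in queueService.server.ts and may differ in detail):

```typescript
// Sketch only: close workers on SIGTERM/SIGINT, but never wait longer than 30s.
import { flyerWorker } from "./services/workers.server";

const SHUTDOWN_TIMEOUT_MS = 30_000;

async function shutdown(signal: string): Promise<void> {
  console.log(`${signal} received, draining in-flight jobs...`);
  const timeout = new Promise<void>((resolve) => setTimeout(resolve, SHUTDOWN_TIMEOUT_MS));
  // worker.close() waits for active jobs to finish; cap the wait at 30 seconds.
  await Promise.race([flyerWorker.close(), timeout]);
  process.exit(0);
}

process.on("SIGTERM", () => void shutdown("SIGTERM"));
process.on("SIGINT", () => void shutdown("SIGINT"));
```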

Compliance Notes

Deprecated Synchronous Endpoints

The following endpoints process flyers synchronously and are deprecated:

  • POST /api/ai/upload-legacy - For integration testing only
  • POST /api/ai/flyers/process - Legacy workflow; callers should migrate to the queue-based approach

New integrations MUST use POST /api/ai/upload-and-process for queue-based processing.

Email Handling

  • Bulk emails (deal notifications): Enqueued via emailQueue (see the sketch below)
  • Transactional emails (password reset): Sent synchronously for immediate user feedback
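
A sketch contrasting the two paths; sendMail and the module it comes from are hypothetical stand-ins for the project's mailer:

```typescript
// Sketch only: bulk notifications go through the queue, transactional mail is sent inline.
import { emailQueue } from "./services/queues.server";
import { sendMail } from "./services/email.server"; // hypothetical mailer module

// Bulk: enqueue and return immediately; the email worker delivers later with retries.
export async function notifyDealSubscribers(userIds: string[], dealId: string): Promise<void> {
  await emailQueue.addBulk(
    userIds.map((userId) => ({ name: "deal-notification", data: { userId, dealId } })),
  );
}

// Transactional: send synchronously so the user gets immediate feedback.
export async function sendPasswordReset(email: string, token: string): Promise<void> {
  await sendMail({ to: email, template: "password-reset", token });
}
```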

Future Enhancements

Potential improvements under consideration:

  1. Dead Letter Queue (DLQ): Move permanently failed jobs to a dedicated queue for analysis
  2. Job Priority Levels: Allow priority-based processing for different job types
  3. Real-time Progress: WebSocket/SSE for live job progress updates to clients
  4. Per-Queue Rate Limiting: Throttle job processing based on external API limits
  5. Job Dependencies: Support for jobs that depend on completion of other jobs
  6. Prometheus Metrics: Export queue metrics for observability dashboards