# ADR-038: Graceful Shutdown Pattern

**Date**: 2026-01-09
**Status**: Accepted
**Implemented**: 2026-01-09

## Context

When deploying or restarting the application, abrupt termination can cause:

1. **Lost Jobs**: BullMQ jobs in progress may be marked as failed or stalled.
2. **Connection Leaks**: Database and Redis connections may not be properly closed.
3. **Incomplete Requests**: In-flight HTTP requests may receive no response.
4. **Data Corruption**: Transactions may be left in an inconsistent state.

Kubernetes and PM2 send termination signals (SIGTERM, SIGINT) to processes before forcefully killing them. The application must handle these signals to shut down gracefully.

## Decision

We will implement a coordinated graceful shutdown pattern that:

1. **Stops Accepting New Work**: Closes the HTTP server and pauses job queues.
2. **Completes In-Flight Work**: Waits for active requests and jobs to finish.
3. **Releases Resources**: Closes database pools, Redis connections, and queues.
4. **Logs Shutdown Progress**: Provides visibility into the shutdown process.

### Signal Handling

| Signal  | Source             | Behavior                                 |
| ------- | ------------------ | ---------------------------------------- |
| SIGTERM | Kubernetes, PM2    | Graceful shutdown with resource cleanup  |
| SIGINT  | Ctrl+C in terminal | Same as SIGTERM                          |
| SIGKILL | Force kill         | Cannot be caught; immediate termination  |

## Implementation Details

### Queue and Worker Shutdown

Located in `src/services/queueService.server.ts`:

```typescript
import { logger } from './logger.server';

// The queues below (flyerQueue, emailQueue, ...) and the shared Redis
// `connection` are defined earlier in this file.
export const gracefulShutdown = async (signal: string): Promise<void> => {
  logger.info(`[Shutdown] Received ${signal}. Closing all queues and workers...`);

  const resources = [
    { name: 'flyerQueue', close: () => flyerQueue.close() },
    { name: 'emailQueue', close: () => emailQueue.close() },
    { name: 'analyticsQueue', close: () => analyticsQueue.close() },
    { name: 'weeklyAnalyticsQueue', close: () => weeklyAnalyticsQueue.close() },
    { name: 'cleanupQueue', close: () => cleanupQueue.close() },
    { name: 'tokenCleanupQueue', close: () => tokenCleanupQueue.close() },
    { name: 'redisConnection', close: () => connection.quit() },
  ];

  const results = await Promise.allSettled(
    resources.map(async (resource) => {
      try {
        await resource.close();
        logger.info(`[Shutdown] ${resource.name} closed successfully.`);
      } catch (error) {
        logger.error({ err: error }, `[Shutdown] Error closing ${resource.name}`);
        throw error;
      }
    }),
  );

  const failures = results.filter((r) => r.status === 'rejected');
  if (failures.length > 0) {
    logger.error(`[Shutdown] ${failures.length} resources failed to close.`);
  }

  logger.info('[Shutdown] All resources closed. Process can now exit.');
};

// Register signal handlers
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
```
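One subtlety: both this module and `server.ts` (next section) register handlers for the same signals, and a repeated signal (for example, a second Ctrl+C) can re-enter the close sequence while it is still running. Below is a minimal sketch of an idempotency guard, reusing `logger` and `gracefulShutdown` from the block above; the `isShuttingDown` flag and `shutdownOnce` wrapper are illustrative names, not part of the codebase.

```typescript
// Sketch only: prevent the shutdown sequence from running twice.
let isShuttingDown = false;

const shutdownOnce = async (signal: string): Promise<void> => {
  if (isShuttingDown) {
    logger.warn(`[Shutdown] ${signal} received, but shutdown is already in progress.`);
    return;
  }
  isShuttingDown = true;
  await gracefulShutdown(signal);
};

process.on('SIGTERM', () => void shutdownOnce('SIGTERM'));
process.on('SIGINT', () => void shutdownOnce('SIGINT'));
```

If adopted, this wrapper would replace the plain `process.on` registrations shown above.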
### HTTP Server Shutdown

Located in `server.ts`:

```typescript
import { gracefulShutdown as shutdownQueues } from './src/services/queueService.server';
import { closePool } from './src/services/db/connection.db';

// `app`, `PORT`, and `logger` are defined earlier in server.ts.
const server = app.listen(PORT, () => {
  logger.info(`Server listening on port ${PORT}`);
});

const gracefulShutdown = async (signal: string): Promise<void> => {
  logger.info(`[Shutdown] Received ${signal}. Starting graceful shutdown...`);

  // 1. Stop accepting new connections
  server.close((err) => {
    if (err) {
      logger.error({ err }, '[Shutdown] Error closing HTTP server');
    } else {
      logger.info('[Shutdown] HTTP server closed.');
    }
  });

  // 2. Wait for in-flight requests (with timeout)
  await new Promise<void>((resolve) => setTimeout(resolve, 5000));

  // 3. Close queues and workers
  await shutdownQueues(signal);

  // 4. Close database pool
  await closePool();
  logger.info('[Shutdown] Database pool closed.');

  // 5. Exit process
  process.exit(0);
};

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
```

### Database Pool Shutdown

Located in `src/services/db/connection.db.ts`:

```typescript
import { Pool } from 'pg';
import { logger } from '../logger.server';

let pool: Pool | null = null;

export function getPool(): Pool {
  if (!pool) {
    // Connection details come from the standard PG* environment variables.
    pool = new Pool({
      max: 20,
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 2000,
    });
  }
  return pool;
}

export async function closePool(): Promise<void> {
  if (pool) {
    await pool.end();
    pool = null;
    logger.info('[Database] Connection pool closed.');
  }
}

export function getPoolStatus(): { totalCount: number; idleCount: number; waitingCount: number } {
  const p = getPool();
  return {
    totalCount: p.totalCount,
    idleCount: p.idleCount,
    waitingCount: p.waitingCount,
  };
}
```

### PM2 Ecosystem Configuration

Located in `ecosystem.config.cjs`:

```javascript
module.exports = {
  apps: [
    {
      name: 'flyer-crawler-api',
      script: 'server.ts',
      interpreter: 'tsx',

      // Graceful shutdown settings
      kill_timeout: 10000, // 10 seconds to clean up before SIGKILL
      wait_ready: true, // Wait for 'ready' signal before considering app started
      listen_timeout: 10000, // Timeout for ready signal

      // Cluster mode for zero-downtime reloads
      instances: 1,
      exec_mode: 'fork',

      // Environment variables
      env_production: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
      env_test: {
        NODE_ENV: 'test',
        PORT: 3001,
      },
    },
  ],
};
```
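Because `wait_ready: true` is set, PM2 waits for the process to emit a `'ready'` message over its IPC channel before marking it online (or gives up after `listen_timeout`). A minimal sketch of how `server.ts` might send that signal, assuming the `app`, `PORT`, and `logger` from the earlier example:

```typescript
const server = app.listen(PORT, () => {
  logger.info(`Server listening on port ${PORT}`);

  // Notify PM2 that the app is ready for traffic. `process.send` only
  // exists when the process has an IPC channel (as it does under PM2),
  // so the call is guarded for plain `node`/`tsx` runs.
  if (process.send) {
    process.send('ready');
  }
});
```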
### Worker Graceful Shutdown

BullMQ workers can be configured to wait for active jobs:

```typescript
import { Worker } from 'bullmq';

// `processor`, `connection`, and `logger` come from the surrounding module.
const worker = new Worker('flyerQueue', processor, {
  connection,
  // Stall detection settings; worker.close() waits for the active job to
  // finish unless it is called with force = true.
  lockDuration: 30000, // Time before a job is considered stalled
  stalledInterval: 5000, // Check for stalled jobs every 5s
});

// Lifecycle events emitted while the worker shuts down
worker.on('closing', () => {
  logger.info('[Worker] flyerQueue worker is closing...');
});

worker.on('closed', () => {
  logger.info('[Worker] flyerQueue worker closed.');
});
```

### Shutdown Sequence Diagram

```text
SIGTERM Received
        │
        ▼
┌──────────────────────┐
│ Stop HTTP Server     │ ← No new connections accepted
│ (server.close())     │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ Wait for In-Flight   │ ← 5-second grace period
│ Requests             │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ Close BullMQ Queues  │ ← Stop processing new jobs
│ and Workers          │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ Close Redis          │ ← Disconnect from Redis
│ Connection           │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ Close Database Pool  │ ← Release all DB connections
│ (pool.end())         │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│ process.exit(0)      │ ← Clean exit
└──────────────────────┘
```

## Consequences

### Positive

- **Zero Lost Work**: In-flight requests and jobs complete before shutdown.
- **Clean Resource Cleanup**: All connections are properly closed.
- **Zero-Downtime Deploys**: PM2 can reload without dropping requests.
- **Observability**: Shutdown progress is logged for debugging.

### Negative

- **Shutdown Delay**: A full shutdown takes 5-15 seconds.
- **Complexity**: Multiple shutdown handlers must be coordinated.
- **Edge Cases**: Very long-running jobs may be killed if they exceed the grace period.

## Key Files

- `server.ts` - HTTP server shutdown and signal handling
- `src/services/queueService.server.ts` - Queue shutdown (`gracefulShutdown`)
- `src/services/db/connection.db.ts` - Database pool shutdown (`closePool`)
- `ecosystem.config.cjs` - PM2 configuration with `kill_timeout`

## Related ADRs

- [ADR-006](./0006-background-job-processing-and-task-queues.md) - Background Job Processing
- [ADR-020](./0020-health-checks-and-liveness-readiness-probes.md) - Health Checks
- [ADR-014](./0014-containerization-and-deployment-strategy.md) - Containerization