ADR-038: Graceful Shutdown Pattern

Date: 2026-01-09

Status: Accepted

Implemented: 2026-01-09

Context

When deploying or restarting the application, abrupt termination can cause:

  1. Lost Jobs: BullMQ jobs in progress may be marked as failed or stalled.
  2. Connection Leaks: Database and Redis connections may not be properly closed.
  3. Incomplete Requests: HTTP requests in flight may receive no response.
  4. Data Corruption: Transactions may be left in an inconsistent state.

Kubernetes and PM2 send termination signals (SIGTERM, SIGINT) to processes before forcefully killing them. The application must handle these signals to shut down gracefully.

Decision

We will implement a coordinated graceful shutdown pattern that:

  1. Stops Accepting New Work: Closes HTTP server, pauses job queues.
  2. Completes In-Flight Work: Waits for active requests and jobs to finish.
  3. Releases Resources: Closes database pools, Redis connections, and queues.
  4. Logs Shutdown Progress: Provides visibility into the shutdown process.

Signal Handling

Signal    Source                                         Behavior
SIGTERM   Kubernetes                                     Graceful shutdown with resource cleanup
SIGINT    Ctrl+C in terminal, PM2 (default stop signal)  Same as SIGTERM
SIGKILL   Force kill                                     Cannot be caught; immediate termination

Implementation Details

Queue and Worker Shutdown

Located in src/services/queueService.server.ts:

import { logger } from './logger.server';

// The queues and the shared Redis `connection` are defined earlier in this file.
let shutdownPromise: Promise<void> | null = null;

const closeAll = async (signal: string): Promise<void> => {
  logger.info(`[Shutdown] Received ${signal}. Closing all queues and workers...`);

  const queues = [
    { name: 'flyerQueue', close: () => flyerQueue.close() },
    { name: 'emailQueue', close: () => emailQueue.close() },
    { name: 'analyticsQueue', close: () => analyticsQueue.close() },
    { name: 'weeklyAnalyticsQueue', close: () => weeklyAnalyticsQueue.close() },
    { name: 'cleanupQueue', close: () => cleanupQueue.close() },
    { name: 'tokenCleanupQueue', close: () => tokenCleanupQueue.close() },
  ];

  // Close every queue in parallel; one failure must not block the others.
  const results = await Promise.allSettled(
    queues.map(async (resource) => {
      try {
        await resource.close();
        logger.info(`[Shutdown] ${resource.name} closed successfully.`);
      } catch (error) {
        logger.error({ err: error }, `[Shutdown] Error closing ${resource.name}`);
        throw error;
      }
    }),
  );
  let failures = results.filter((r) => r.status === 'rejected').length;

  // Close Redis only after the queues, which still use the connection while closing.
  try {
    await connection.quit();
    logger.info('[Shutdown] redisConnection closed successfully.');
  } catch (error) {
    logger.error({ err: error }, '[Shutdown] Error closing redisConnection');
    failures += 1;
  }

  if (failures > 0) {
    logger.error(`[Shutdown] ${failures} resources failed to close.`);
  } else {
    logger.info('[Shutdown] All resources closed. Process can now exit.');
  }
};

// Idempotent: repeated calls (for example, a second signal) share one shutdown run.
export const gracefulShutdown = (signal: string): Promise<void> => {
  shutdownPromise ??= closeAll(signal);
  return shutdownPromise;
};

// Signal handlers are registered once, in server.ts, which calls gracefulShutdown.
// Registering them here as well would run the shutdown twice in the same process.

HTTP Server Shutdown

Located in server.ts:

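import { gracefulShutdown as shutdownQueues } from './src/services/queueService.server';
import { closePool } from './src/services/db/connection.db';
import { logger } from './src/services/logger.server';

// `app` and `PORT` are defined earlier in this file.
const server = app.listen(PORT, () => {
  logger.info(`Server listening on port ${PORT}`);
});

const SHUTDOWN_GRACE_MS = 5000;
let shuttingDown = false;

const gracefulShutdown = async (signal: string): Promise<void> => {
  if (shuttingDown) return; // Ignore repeated signals
  shuttingDown = true;
  logger.info(`[Shutdown] Received ${signal}. Starting graceful shutdown...`);
  let exitCode = 0;

  try {
    // 1. Stop accepting new connections; the callback fires once open connections have ended
    const httpClosed = new Promise<void>((resolve) => {
      server.close((err) => {
        if (err) {
          logger.error({ err }, '[Shutdown] Error closing HTTP server');
        } else {
          logger.info('[Shutdown] HTTP server closed.');
        }
        resolve();
      });
    });

    // 2. Wait for in-flight requests, but no longer than the grace period
    const graceElapsed = new Promise<void>((resolve) => setTimeout(resolve, SHUTDOWN_GRACE_MS));
    await Promise.race([httpClosed, graceElapsed]);

    // 3. Close queues and workers
    await shutdownQueues(signal);

    // 4. Close database pool
    await closePool();
    logger.info('[Shutdown] Database pool closed.');
  } catch (err) {
    logger.error({ err }, '[Shutdown] Graceful shutdown failed');
    exitCode = 1;
  }

  // 5. Exit process
  process.exit(exitCode);
};

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));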
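
Steps 3 and 4 have no time limit of their own, so a close call that hangs would only end when the process manager sends SIGKILL. One way to bound them is a small deadline wrapper. The sketch below is illustrative only: withDeadline is a hypothetical helper, not part of the codebase.

```typescript
// Hypothetical helper: resolves to 'done' if `work` settles first,
// or to 'timeout' once `ms` milliseconds have passed.
export async function withDeadline(
  work: Promise<unknown>,
  ms: number,
): Promise<'done' | 'timeout'> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), ms);
  });
  try {
    // A rejection from `work` propagates to the caller unchanged.
    return await Promise.race([work.then(() => 'done' as const), deadline]);
  } finally {
    // Clear the timer so it cannot keep the event loop alive.
    clearTimeout(timer);
  }
}
```

A shutdown handler could then log a warning when `await withDeadline(shutdownQueues(signal), 3000)` returns 'timeout' and continue with the remaining steps. The 3000 ms value is an assumption chosen to fit inside the 10-second kill_timeout.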

Database Pool Shutdown

Located in src/services/db/connection.db.ts:

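import { Pool } from 'pg';
import { logger } from '../logger.server';

let pool: Pool | null = null;

export function getPool(): Pool {
  if (!pool) {
    pool = new Pool({
      max: 20,
      idleTimeoutMillis: 30000,
      connectionTimeoutMillis: 2000,
    });
  }
  return pool;
}

export async function closePool(): Promise<void> {
  if (!pool) return;
  // Clear the reference first so a concurrent call cannot end the same pool twice
  const current = pool;
  pool = null;
  await current.end();
  logger.info('[Database] Connection pool closed.');
}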

export function getPoolStatus(): { totalCount: number; idleCount: number; waitingCount: number } {
  const p = getPool();
  return {
    totalCount: p.totalCount,
    idleCount: p.idleCount,
    waitingCount: p.waitingCount,
  };
}
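
The counters returned by getPoolStatus can be turned into a short summary and logged just before closePool is called, which makes it visible when connections are still checked out at shutdown. A minimal sketch, where summarizePool is a hypothetical helper that is not in the codebase:

```typescript
// Same shape as the value returned by getPoolStatus().
interface PoolStatus {
  totalCount: number;
  idleCount: number;
  waitingCount: number;
}

// Hypothetical helper: reports what is still in use before the pool is closed.
export function summarizePool(status: PoolStatus): {
  active: number;
  waiting: number;
  idle: boolean;
} {
  // Clients that exist but are not idle are currently checked out.
  const active = status.totalCount - status.idleCount;
  return {
    active,
    waiting: status.waitingCount,
    idle: active === 0 && status.waitingCount === 0,
  };
}
```

For example, a status of 5 total, 3 idle, and 1 waiting yields 2 active connections and idle set to false.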

PM2 Ecosystem Configuration

Located in ecosystem.config.cjs:

module.exports = {
  apps: [
    {
      name: 'flyer-crawler-api',
      script: 'server.ts',
      interpreter: 'tsx',

      // Graceful shutdown settings
      kill_timeout: 10000, // 10 seconds to clean up before SIGKILL
      wait_ready: true, // Wait for process.send('ready') before considering the app started
      listen_timeout: 10000, // How long to wait for the ready signal (ms)

      // Single instance in fork mode; zero-downtime reloads require exec_mode: 'cluster'
      instances: 1,
      exec_mode: 'fork',

      // Environment variables
      env_production: {
        NODE_ENV: 'production',
        PORT: 3000,
      },
      env_test: {
        NODE_ENV: 'test',
        PORT: 3001,
      },
    },
  ],
};
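
With wait_ready enabled, PM2 treats the process as started only after it receives a 'ready' message over the IPC channel; otherwise it waits until listen_timeout expires. A minimal sketch of sending that message, where notifyReady is a hypothetical helper and the send function is passed in so the code also works outside PM2:

```typescript
// Hypothetical helper: sends the 'ready' message that PM2 waits for when wait_ready is true.
// Returns false when there is no IPC channel (for example, when started directly with node).
export function notifyReady(send?: (message: string) => unknown): boolean {
  if (typeof send !== 'function') {
    return false;
  }
  send('ready');
  return true;
}

// Possible usage inside the listen callback in server.ts (assumption, not existing code):
//   app.listen(PORT, () => {
//     notifyReady(process.send?.bind(process));
//   });
```

Calling it from the listen callback ensures PM2 is told the process is ready only once the port is actually open.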

Worker Graceful Shutdown

BullMQ workers wait for their active jobs to finish when they are closed with worker.close():

import { Worker } from 'bullmq';

// `logger`, `processor`, and the shared Redis `connection` come from the surrounding module.
const worker = new Worker('flyerQueue', processor, {
  connection,
  // In BullMQ these are top-level worker options, not nested under `settings`
  lockDuration: 30000, // How long a job lock lasts; a job whose lock expires is considered stalled
  stalledInterval: 5000, // Check for stalled jobs every 5s
});

// Workers do not close on their own: call worker.close() during shutdown.
// It stops fetching new jobs and resolves once the active jobs have finished.
worker.on('closing', () => {
  logger.info('[Worker] flyerQueue worker is closing...');
});

worker.on('closed', () => {
  logger.info('[Worker] flyerQueue worker closed.');
});
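
Because workers, queues, and the Redis connection depend on one another, the order in which they close matters: workers first, then queues, then the connection. A minimal sketch of phased closing follows. The Closable interface and closeInPhases are hypothetical names, and the structural type stands in for BullMQ objects so the example runs on its own.

```typescript
// Hypothetical shape: anything with a name and an async close(),
// such as a wrapper around a BullMQ Worker or Queue.
interface Closable {
  name: string;
  close: () => Promise<unknown>;
}

// Closes the phases one after another; resources inside a phase close in parallel.
// Returns the names of the resources that failed to close.
export async function closeInPhases(phases: Closable[][]): Promise<string[]> {
  const failed: string[] = [];
  for (const phase of phases) {
    const results = await Promise.allSettled(
      phase.map(async (resource) => resource.close()),
    );
    results.forEach((result, index) => {
      if (result.status === 'rejected') {
        failed.push(phase[index].name);
      }
    });
  }
  return failed;
}

// Possible usage (assumption, not existing code):
//   await closeInPhases([
//     [{ name: 'flyerWorker', close: () => worker.close() }],
//     [{ name: 'flyerQueue', close: () => flyerQueue.close() }],
//     [{ name: 'redisConnection', close: () => connection.quit() }],
//   ]);
```

A failure in one phase is recorded but does not stop later phases, so the Redis connection is still released when a queue fails to close.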

Shutdown Sequence Diagram

SIGTERM Received
       │
       ▼
┌──────────────────────┐
│ Stop HTTP Server     │  ← No new connections accepted
│ (server.close())     │
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ Wait for In-Flight   │  ← 5-second grace period
│ Requests             │
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ Close BullMQ Queues  │  ← Stop processing new jobs
│ and Workers          │
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ Close Redis          │  ← Disconnect from Redis
│ Connection           │
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ Close Database Pool  │  ← Release all DB connections
│ (pool.end())         │
└──────────────────────┘
       │
       ▼
┌──────────────────────┐
│ process.exit(0)      │  ← Clean exit
└──────────────────────┘

Consequences

Positive

  • Minimal Lost Work: In-flight requests and jobs that finish within the grace period complete before shutdown.
  • Clean Resource Cleanup: All connections are properly closed.
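  • Clean Deploys: During a PM2 restart, in-flight requests can finish instead of being dropped. True zero-downtime reloads additionally require cluster mode with more than one instance.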
  • Observability: Shutdown progress is logged for debugging.

Negative

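  • Shutdown Delay: Shutting down can take several seconds: up to 5 seconds for in-flight requests, plus the time needed to close queues and connections. Under PM2 it must finish within kill_timeout (10 seconds), or the process is killed.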
  • Complexity: Multiple shutdown handlers must be coordinated.
  • Edge Cases: Very long-running jobs may be killed if they exceed the grace period.
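
The timing trade-off above comes down to simple arithmetic: the HTTP grace period plus the time spent closing resources has to fit inside kill_timeout. A minimal sketch of that check, where remainingBudgetMs is a hypothetical helper and the 2000 ms close estimate in the usage comment is an assumed figure, not a measurement:

```typescript
interface ShutdownBudget {
  httpGraceMs: number; // Grace period for in-flight HTTP requests
  closeEstimateMs: number; // Estimated time to close queues, Redis, and the database pool
  killTimeoutMs: number; // kill_timeout from ecosystem.config.cjs
}

// Hypothetical helper: milliseconds left before the process manager sends SIGKILL.
// A negative result means the shutdown sequence cannot finish in time.
export function remainingBudgetMs(budget: ShutdownBudget): number {
  return budget.killTimeoutMs - (budget.httpGraceMs + budget.closeEstimateMs);
}

// With the values from this ADR and an assumed 2000 ms for closing resources:
//   remainingBudgetMs({ httpGraceMs: 5000, closeEstimateMs: 2000, killTimeoutMs: 10000 })
//   leaves 3000 ms of headroom.
```

If the result approaches zero, either kill_timeout needs to grow or the grace period needs to shrink.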

Key Files

  • server.ts - HTTP server shutdown and signal handling
  • src/services/queueService.server.ts - Queue shutdown (gracefulShutdown)
  • src/services/db/connection.db.ts - Database pool shutdown (closePool)
  • ecosystem.config.cjs - PM2 configuration with kill_timeout