
ADR-020: Health Checks and Liveness/Readiness Probes

Date: 2025-12-12

Status: Accepted

Implemented: 2026-01-09

Context

When the application is containerized (ADR-014), the container orchestrator (e.g., Kubernetes, Docker Swarm) needs a way to determine if the application is running correctly. Without this, it cannot manage application lifecycle events like restarts or rolling updates effectively.

Decision

We will implement dedicated health check endpoints in the Express application.

A Liveness Probe (/api/health/live) will return a 200 OK to indicate the server is running. If it fails, the orchestrator should restart the container.
A Readiness Probe (/api/health/ready) will return a 200 OK only if the application is ready to accept traffic (e.g., database connection is established). If it fails, the orchestrator will temporarily remove the container from the load balancer.

Consequences

Positive: Enables robust, automated application lifecycle management in a containerized environment. Prevents traffic from being sent to unhealthy or uninitialized application instances.
Negative: Adds a small amount of code for the health check endpoints. Requires configuration in the container orchestration layer.

Implementation Status

What's Implemented

✅ Liveness Probe (/api/health/live) - Simple process health check
✅ Readiness Probe (/api/health/ready) - Comprehensive dependency health check
✅ Startup Probe (/api/health/startup) - Initial startup verification
✅ Individual Service Checks - Database, Redis, Storage endpoints
✅ Detailed Health Response - Service latency, status, and details

Implementation Details

Probe Endpoints

| Endpoint | Purpose | Checks | HTTP Status |
| --- | --- | --- | --- |
| `/api/health/live` | Liveness probe | Process running | 200 = alive |
| `/api/health/ready` | Readiness probe | DB, Redis, Storage | 200 = ready, 503 = not ready |
| `/api/health/startup` | Startup probe | Database only | 200 = started, 503 = starting |

Liveness Probe

The liveness probe is intentionally simple with no external dependencies:

// GET /api/health/live
{
  "status": "ok",
  "timestamp": "2026-01-09T12:00:00.000Z"
}

Usage: If this endpoint fails to respond, the container should be restarted.
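A handler of this kind could be sketched as follows. This is a minimal illustration, not the code in src/routes/health.routes.ts: `MinimalResponse`, `buildLivePayload`, and `liveHandler` are hypothetical names, and `MinimalResponse` only mimics the two Express response methods the handler needs. The point is that nothing here touches the database, Redis, or disk.

```typescript
// Hypothetical stand-in for the subset of an Express-style response used here.
interface MinimalResponse {
  status(code: number): MinimalResponse;
  json(body: unknown): void;
}

interface LivePayload {
  status: "ok";
  timestamp: string;
}

// Builds the liveness body. No I/O, so it cannot hang on a dependency.
function buildLivePayload(now: Date = new Date()): LivePayload {
  return { status: "ok", timestamp: now.toISOString() };
}

// Express-shaped handler: the request is unused, the response is always 200.
function liveHandler(_req: unknown, res: MinimalResponse): void {
  res.status(200).json(buildLivePayload());
}
```

Keeping the payload builder separate from the handler makes the response shape easy to test without starting a server.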

Readiness Probe

The readiness probe checks all critical dependencies:

// GET /api/health/ready
{
  "status": "healthy",  // healthy | degraded | unhealthy
  "timestamp": "2026-01-09T12:00:00.000Z",
  "uptime": 3600.5,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5,
      "details": {
        "totalConnections": 10,
        "idleConnections": 8,
        "waitingConnections": 0
      }
    },
    "redis": {
      "status": "healthy",
      "latency": 2
    },
    "storage": {
      "status": "healthy",
      "latency": 1,
      "details": {
        "path": "/var/www/.../flyer-images"
      }
    }
  }
}

Status Logic:

healthy - All critical services (database, Redis) are healthy
degraded - Some non-critical issues (high connection wait, storage issues)
unhealthy - Critical service unavailable (returns 503)
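The status logic above can be sketched as a pure function. The names (`overallStatus`, `httpStatusFor`, `ServiceMap`) are hypothetical and the real implementation may differ. Two assumptions are made explicit: storage is treated as non-critical, so a storage failure only degrades the instance, and a `degraded` result still answers 200.

```typescript
type ServiceStatus = "healthy" | "degraded" | "unhealthy";

interface ServiceHealth {
  status: ServiceStatus;
}

interface ServiceMap {
  database: ServiceHealth;
  redis: ServiceHealth;
  storage: ServiceHealth;
}

// Critical services per the status logic; storage is assumed non-critical.
const CRITICAL: ReadonlyArray<keyof ServiceMap> = ["database", "redis"];

function overallStatus(services: ServiceMap): ServiceStatus {
  // Any critical service being down makes the whole instance unhealthy.
  if (CRITICAL.some((name) => services[name].status === "unhealthy")) {
    return "unhealthy";
  }
  // Any other deviation from fully healthy only degrades the instance.
  const all = [services.database, services.redis, services.storage];
  return all.every((s) => s.status === "healthy") ? "healthy" : "degraded";
}

// Only "unhealthy" maps to 503. Assumption: "degraded" still returns 200.
function httpStatusFor(status: ServiceStatus): 200 | 503 {
  return status === "unhealthy" ? 503 : 200;
}
```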

Startup Probe

The startup probe is used during container initialization. It checks only the database and returns 503 until the database is reachable:

// GET /api/health/startup
// Success (200):
{
  "status": "started",
  "timestamp": "2026-01-09T12:00:00.000Z",
  "database": { "status": "healthy", "latency": 5 }
}

// Still starting (503):
{
  "status": "starting",
  "message": "Waiting for database connection",
  "database": { "status": "unhealthy", "message": "..." }
}
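The mapping from the database check to the two responses above might look like the following sketch. `DbCheck`, `StartupResult`, and `startupResponse` are hypothetical names chosen for illustration. The field names and the "Waiting for database connection" message are taken from the example payloads.

```typescript
interface DbCheck {
  status: "healthy" | "unhealthy";
  latency?: number;
  message?: string;
}

interface StartupResult {
  httpStatus: 200 | 503;
  body: {
    status: "started" | "starting";
    timestamp?: string;
    message?: string;
    database: DbCheck;
  };
}

// Translates a single database check into the startup probe response.
function startupResponse(db: DbCheck, now: Date = new Date()): StartupResult {
  if (db.status === "healthy") {
    return {
      httpStatus: 200,
      body: { status: "started", timestamp: now.toISOString(), database: db },
    };
  }
  // Not ready yet: 503 tells the orchestrator to keep waiting and retry.
  return {
    httpStatus: 503,
    body: { status: "starting", message: "Waiting for database connection", database: db },
  };
}
```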

Individual Service Endpoints

For detailed diagnostics:

| Endpoint | Purpose |
| --- | --- |
| `/api/health/ping` | Simple server responsiveness |
| `/api/health/db-schema` | Verify database tables exist |
| `/api/health/db-pool` | Database connection pool status |
| `/api/health/redis` | Redis connectivity |
| `/api/health/storage` | File storage accessibility |
| `/api/health/time` | Server time synchronization |

Kubernetes Configuration Example

apiVersion: v1
kind: Pod
metadata:
  name: flyer-crawler
spec:
  containers:
    - name: flyer-crawler
      image: flyer-crawler:latest
      livenessProbe:
        httpGet:
          path: /api/health/live
          port: 3001
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3

      readinessProbe:
        httpGet:
          path: /api/health/ready
          port: 3001
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3

      startupProbe:
        httpGet:
          path: /api/health/startup
          port: 3001
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 30 # Allow up to 150 seconds for startup
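The 150-second figure in the comment comes from multiplying `periodSeconds` by `failureThreshold`. The sketch below works through that arithmetic for all three probes. The function names are hypothetical, and the results are approximations that ignore per-probe timeouts and scheduling jitter.

```typescript
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate time spent on consecutive failures before the probe is
// considered failed.
function failureWindowSeconds(p: ProbeTiming): number {
  return p.periodSeconds * p.failureThreshold;
}

// Approximate total budget from container start, including the initial delay.
function startupBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + failureWindowSeconds(p);
}

// Values copied from the manifest above.
const startup: ProbeTiming = { initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 15, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 10, failureThreshold: 3 };

console.log(startupBudgetSeconds(startup)); // 150
console.log(failureWindowSeconds(liveness)); // 45
console.log(failureWindowSeconds(readiness)); // 30
```

With these settings a hung process is restarted after roughly 45 seconds of failed liveness checks, and an instance is pulled from rotation after roughly 30 seconds of failed readiness checks.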

Docker Compose Configuration Example

services:
  api:
    image: flyer-crawler:latest
    healthcheck:
      # curl must be installed in the image for this check to run
      test: ['CMD', 'curl', '-f', 'http://localhost:3001/api/health/ready']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

PM2 Configuration Example

For non-containerized deployments using PM2:

// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'flyer-crawler',
      script: 'dist/server.js',
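      // Caution: `health_check` is not among PM2's documented ecosystem
      // options, so PM2 likely ignores this block. Polling the endpoint and
      // restarting on failure needs an external monitor.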
      health_check: {
        url: 'http://localhost:3001/api/health/ready',
        interval: 30000,
        timeout: 10000,
      },
    },
  ],
};

Key Files

src/routes/health.routes.ts - Health check endpoint implementations
server.ts - Health routes mounted at /api/health

Service Health Thresholds

| Service | Healthy | Degraded | Unhealthy |
| --- | --- | --- | --- |
| Database | Responds to `SELECT 1` | > 3 waiting connections | Connection fails |
| Redis | `PING` returns `PONG` | N/A | Connection fails |
| Storage | Write access to path | N/A | Path not accessible |
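As an illustration of the database row, the thresholds can be expressed as a small classifier. This is a sketch under assumptions: `classifyDatabase` and `PoolStats` are hypothetical names, `queryOk` stands for whether `SELECT 1` succeeded, and the pool fields mirror the `details` object in the readiness response.

```typescript
type ServiceStatus = "healthy" | "degraded" | "unhealthy";

interface PoolStats {
  totalConnections: number;
  idleConnections: number;
  waitingConnections: number;
}

// Threshold from the table: more than 3 waiting connections means degraded.
const MAX_WAITING_BEFORE_DEGRADED = 3;

function classifyDatabase(queryOk: boolean, pool: PoolStats): ServiceStatus {
  // A failed query or connection is always unhealthy, whatever the pool says.
  if (!queryOk) return "unhealthy";
  return pool.waitingConnections > MAX_WAITING_BEFORE_DEGRADED ? "degraded" : "healthy";
}
```

Note that the comparison is strict, so exactly 3 waiting connections still counts as healthy.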
