ADR-020: Health Checks and Liveness/Readiness Probes
Date: 2025-12-12
Status: Accepted
Implemented: 2026-01-09
Context
When the application is containerized (ADR-014), the container orchestrator (e.g., Kubernetes, Docker Swarm) needs a way to determine if the application is running correctly. Without this, it cannot manage application lifecycle events like restarts or rolling updates effectively.
Decision
We will implement dedicated health check endpoints in the Express application.
- A Liveness Probe (/api/health/live) will return a 200 OK to indicate the server is running. If it fails, the orchestrator should restart the container.
- A Readiness Probe (/api/health/ready) will return a 200 OK only if the application is ready to accept traffic (e.g., database connection is established). If it fails, the orchestrator will temporarily remove the container from the load balancer.
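The split between the two probes can be sketched as a pure routing function. This is a minimal illustration, not the code in the repository: `ProbeResult` and `resolveProbe` are hypothetical names, and the 503 for a not-ready instance follows the probe table later in this document.

```typescript
// Hypothetical sketch: ProbeResult and resolveProbe are illustrative names,
// not identifiers from the actual codebase.
type ProbeResult = {
  httpStatus: 200 | 503 | 404;
  body: { status: string };
};

function resolveProbe(path: string, dependenciesReady: boolean): ProbeResult {
  switch (path) {
    case "/api/health/live":
      // Liveness ignores dependencies: being able to answer at all
      // is the evidence that the process is running.
      return { httpStatus: 200, body: { status: "ok" } };
    case "/api/health/ready":
      // Readiness gates traffic on dependency state.
      return dependenciesReady
        ? { httpStatus: 200, body: { status: "healthy" } }
        : { httpStatus: 503, body: { status: "unhealthy" } };
    default:
      return { httpStatus: 404, body: { status: "unknown" } };
  }
}
```

Keeping liveness independent of the database avoids restart loops: a database outage should take instances out of rotation (readiness) rather than restart them (liveness).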
Consequences
- Positive: Enables robust, automated application lifecycle management in a containerized environment. Prevents traffic from being sent to unhealthy or uninitialized application instances.
- Negative: Adds a small amount of code for the health check endpoints. Requires configuration in the container orchestration layer.
Implementation Status
What's Implemented
- ✅ Liveness Probe (/api/health/live) - Simple process health check
- ✅ Readiness Probe (/api/health/ready) - Comprehensive dependency health check
- ✅ Startup Probe (/api/health/startup) - Initial startup verification
- ✅ Individual Service Checks - Database, Redis, Storage endpoints
- ✅ Detailed Health Response - Service latency, status, and details
Implementation Details
Probe Endpoints
| Endpoint | Purpose | Checks | HTTP Status |
|---|---|---|---|
| /api/health/live | Liveness probe | Process running | 200 = alive |
| /api/health/ready | Readiness probe | DB, Redis, Storage | 200 = ready, 503 = not ready |
| /api/health/startup | Startup probe | Database only | 200 = started, 503 = starting |
Liveness Probe
The liveness probe is intentionally simple with no external dependencies:
// GET /api/health/live
{
  "status": "ok",
  "timestamp": "2026-01-09T12:00:00.000Z"
}
Usage: If this endpoint fails to respond, the container should be restarted.
Readiness Probe
The readiness probe checks all critical dependencies:
// GET /api/health/ready
{
  "status": "healthy", // healthy | degraded | unhealthy
  "timestamp": "2026-01-09T12:00:00.000Z",
  "uptime": 3600.5,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5,
      "details": {
        "totalConnections": 10,
        "idleConnections": 8,
        "waitingConnections": 0
      }
    },
    "redis": {
      "status": "healthy",
      "latency": 2
    },
    "storage": {
      "status": "healthy",
      "latency": 1,
      "details": {
        "path": "/var/www/.../flyer-images"
      }
    }
  }
}
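Each service entry above pairs a status with a latency. One way to produce that shape is to wrap every dependency check in a timer, as in the sketch below. `CheckResult` and `timedCheck` are hypothetical names, and measuring latency in milliseconds with `Date.now()` is an assumption; the source does not state the unit.

```typescript
// Hypothetical helper: wraps any async dependency check and records
// how long it took. Names and the millisecond unit are assumptions.
interface CheckResult {
  status: "healthy" | "unhealthy";
  latency: number; // elapsed milliseconds (assumed unit)
  message?: string;
}

async function timedCheck(check: () => Promise<void>): Promise<CheckResult> {
  const start = Date.now();
  try {
    await check();
    return { status: "healthy", latency: Date.now() - start };
  } catch (err) {
    // A thrown error marks the service unhealthy and is surfaced as a message.
    const message = err instanceof Error ? err.message : String(err);
    return { status: "unhealthy", latency: Date.now() - start, message };
  }
}
```

A caller would pass a function that performs the actual probe, for example one that runs a trivial query against the database or pings Redis, and place the result under the matching key in `services`.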
Status Logic:
- healthy - All critical services (database, Redis) are healthy
- degraded - Some non-critical issues (high connection wait, storage issues)
- unhealthy - Critical service unavailable (returns 503)
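The status rules above can be sketched as a small aggregation function. The names `ServiceStatus`, `ServiceStatuses` and `overallStatus` are hypothetical. Two points are assumptions rather than facts from the source: that a degraded result still returns 200, and that an unhealthy storage service counts as a non-critical issue (degraded) rather than a critical failure.

```typescript
// Hypothetical sketch of the aggregation rules; not the repository's code.
type ServiceStatus = "healthy" | "degraded" | "unhealthy";

interface ServiceStatuses {
  database: ServiceStatus;
  redis: ServiceStatus;
  storage: ServiceStatus;
}

function overallStatus(
  s: ServiceStatuses
): { status: ServiceStatus; httpStatus: 200 | 503 } {
  // Database and Redis are the critical services: either one being
  // unavailable makes the whole instance unhealthy (503).
  if (s.database === "unhealthy" || s.redis === "unhealthy") {
    return { status: "unhealthy", httpStatus: 503 };
  }
  // Anything else that is not fully healthy is a non-critical issue.
  // Assumption: storage problems degrade the instance but do not fail it.
  const degraded =
    s.database === "degraded" ||
    s.redis === "degraded" ||
    s.storage !== "healthy";
  // Assumption: degraded instances keep serving traffic, so they return 200.
  return { status: degraded ? "degraded" : "healthy", httpStatus: 200 };
}
```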
Startup Probe
The startup probe is used during container initialization:
// GET /api/health/startup
// Success (200):
{
  "status": "started",
  "timestamp": "2026-01-09T12:00:00.000Z",
  "database": { "status": "healthy", "latency": 5 }
}
// Still starting (503):
{
  "status": "starting",
  "message": "Waiting for database connection",
  "database": { "status": "unhealthy", "message": "..." }
}
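The two response shapes above can be derived from a single database check result. The sketch below is illustrative only: `DbCheck`, `StartupResult` and `startupResponse` are hypothetical names, and the field layout simply mirrors the two examples.

```typescript
// Hypothetical sketch: builds the startup probe response from one
// database check result. Names are illustrative, not from the codebase.
interface DbCheck {
  status: "healthy" | "unhealthy";
  latency?: number;
  message?: string;
}

interface StartupResult {
  httpStatus: 200 | 503;
  body: {
    status: "started" | "starting";
    timestamp?: string;
    message?: string;
    database: DbCheck;
  };
}

function startupResponse(db: DbCheck, now: Date): StartupResult {
  if (db.status === "healthy") {
    return {
      httpStatus: 200,
      body: { status: "started", timestamp: now.toISOString(), database: db },
    };
  }
  // Until the database answers, report 503 so the orchestrator keeps waiting.
  return {
    httpStatus: 503,
    body: {
      status: "starting",
      message: "Waiting for database connection",
      database: db,
    },
  };
}
```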
Individual Service Endpoints
For detailed diagnostics:
| Endpoint | Purpose |
|---|---|
| /api/health/ping | Simple server responsiveness |
| /api/health/db-schema | Verify database tables exist |
| /api/health/db-pool | Database connection pool status |
| /api/health/redis | Redis connectivity |
| /api/health/storage | File storage accessibility |
| /api/health/time | Server time synchronization |
Kubernetes Configuration Example
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: flyer-crawler
      livenessProbe:
        httpGet:
          path: /api/health/live
          port: 3001
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /api/health/ready
          port: 3001
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3
      startupProbe:
        httpGet:
          path: /api/health/startup
          port: 3001
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 30 # Allow up to 150 seconds for startup
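The 150-second figure in the startup probe comment comes from multiplying the probe period by the failure threshold. The sketch below reproduces that arithmetic for all three probes. `ProbeTiming` and both function names are hypothetical, and the results are approximations that ignore per-request timeouts.

```typescript
// Hypothetical helpers for reasoning about probe timing.
// Results are approximate: per-request timeouts are ignored.
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Longest time a container may take to start before it is killed.
function startupBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.periodSeconds * p.failureThreshold;
}

// Roughly how long a running container can fail a probe before
// the orchestrator acts (restart for liveness, removal for readiness).
function detectionWindowSeconds(p: ProbeTiming): number {
  return p.periodSeconds * p.failureThreshold;
}

const startup: ProbeTiming = { initialDelaySeconds: 0, periodSeconds: 5, failureThreshold: 30 };
const liveness: ProbeTiming = { initialDelaySeconds: 10, periodSeconds: 15, failureThreshold: 3 };
const readiness: ProbeTiming = { initialDelaySeconds: 5, periodSeconds: 10, failureThreshold: 3 };

console.log(startupBudgetSeconds(startup)); // 150
console.log(detectionWindowSeconds(liveness)); // 45
console.log(detectionWindowSeconds(readiness)); // 30
```

With these values, a hung process is restarted after roughly 45 seconds, and an instance that loses a critical dependency stops receiving traffic after roughly 30 seconds.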
Docker Compose Configuration Example
services:
  api:
    image: flyer-crawler:latest
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3001/api/health/ready']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
PM2 Configuration Example
For non-containerized deployments using PM2:
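// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'flyer-crawler',
      script: 'dist/server.js',
      // Intended behavior: poll this endpoint and restart the app if it fails.
      // Note: health_check is not among PM2's documented core ecosystem options;
      // confirm which PM2 module or version provides it before relying on this.
      health_check: {
        url: 'http://localhost:3001/api/health/ready',
        interval: 30000,
        timeout: 10000,
      },
    },
  ],
};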
Key Files
- src/routes/health.routes.ts - Health check endpoint implementations
- server.ts - Health routes mounted at /api/health
Service Health Thresholds
| Service | Healthy | Degraded | Unhealthy |
|---|---|---|---|
| Database | Responds to SELECT 1 | > 3 waiting connections | Connection fails |
| Redis | PING returns PONG | N/A | Connection fails |
| Storage | Write access to path | N/A | Path not accessible |
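The database row of the table can be expressed as a small classifier. This is a sketch under stated assumptions: `PoolStats` mirrors the `details` fields shown in the readiness response, while `DbStatus`, `MAX_WAITING_CONNECTIONS` and `classifyDatabase` are hypothetical names.

```typescript
// Hypothetical sketch of the database thresholds; not the repository's code.
type DbStatus = "healthy" | "degraded" | "unhealthy";

interface PoolStats {
  totalConnections: number;
  idleConnections: number;
  waitingConnections: number;
}

// Threshold from the table: more than 3 waiting connections is degraded.
const MAX_WAITING_CONNECTIONS = 3;

function classifyDatabase(selectOneSucceeded: boolean, pool: PoolStats): DbStatus {
  // A failed SELECT 1 means the connection itself is broken.
  if (!selectOneSucceeded) return "unhealthy";
  // The query works, but clients are queuing for a connection.
  if (pool.waitingConnections > MAX_WAITING_CONNECTIONS) return "degraded";
  return "healthy";
}
```

Because the comparison is strictly greater than, exactly 3 waiting connections still counts as healthy.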