integration test fixes - claude for the win? try 4 - i have a good feeling
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 16m58s
@@ -2,7 +2,9 @@

**Date**: 2025-12-12

**Status**: Proposed
**Status**: Accepted

**Implemented**: 2026-01-09

## Context

@@ -20,3 +22,195 @@ We will implement dedicated health check endpoints in the Express application.

- **Positive**: Enables robust, automated application lifecycle management in a containerized environment. Prevents traffic from being sent to unhealthy or uninitialized application instances.
- **Negative**: Adds a small amount of code for the health check endpoints. Requires configuration in the container orchestration layer.

## Implementation Status

### What's Implemented

- ✅ **Liveness Probe** (`/api/health/live`) - Simple process health check
- ✅ **Readiness Probe** (`/api/health/ready`) - Comprehensive dependency health check
- ✅ **Startup Probe** (`/api/health/startup`) - Initial startup verification
- ✅ **Individual Service Checks** - Database, Redis, Storage endpoints
- ✅ **Detailed Health Response** - Service latency, status, and details

## Implementation Details

### Probe Endpoints

| Endpoint              | Purpose         | Checks             | HTTP Status                   |
| --------------------- | --------------- | ------------------ | ----------------------------- |
| `/api/health/live`    | Liveness probe  | Process running    | 200 = alive                   |
| `/api/health/ready`   | Readiness probe | DB, Redis, Storage | 200 = ready, 503 = not ready  |
| `/api/health/startup` | Startup probe   | Database only      | 200 = started, 503 = starting |

### Liveness Probe

The liveness probe is intentionally simple with no external dependencies:

```typescript
// GET /api/health/live
{
  "status": "ok",
  "timestamp": "2026-01-09T12:00:00.000Z"
}
```

**Usage**: If this endpoint fails to respond, the container should be restarted.

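A minimal sketch of how such a payload could be produced, assuming a hypothetical `buildLivenessPayload` helper (the name and types are illustrative and not taken from `health.routes.ts`). It reads only the clock and touches no database, Redis, or storage, which is what keeps the probe free of external dependencies:

```typescript
// Hypothetical helper; illustrative only, not the project's actual code.
interface LivenessPayload {
  status: "ok";
  timestamp: string; // ISO 8601, UTC
}

// Builds the liveness body from the current time only, with no I/O.
function buildLivenessPayload(now: Date = new Date()): LivenessPayload {
  return { status: "ok", timestamp: now.toISOString() };
}
```

An Express handler could then reply with `res.status(200).json(buildLivenessPayload())`.
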
### Readiness Probe

The readiness probe checks all critical dependencies:

```typescript
// GET /api/health/ready
{
  "status": "healthy", // healthy | degraded | unhealthy
  "timestamp": "2026-01-09T12:00:00.000Z",
  "uptime": 3600.5,
  "services": {
    "database": {
      "status": "healthy",
      "latency": 5,
      "details": {
        "totalConnections": 10,
        "idleConnections": 8,
        "waitingConnections": 0
      }
    },
    "redis": {
      "status": "healthy",
      "latency": 2
    },
    "storage": {
      "status": "healthy",
      "latency": 1,
      "details": {
        "path": "/var/www/.../flyer-images"
      }
    }
  }
}
```

**Status Logic**:

- `healthy` - All critical services (database, Redis) are healthy
- `degraded` - Some non-critical issues (high connection wait, storage issues)
- `unhealthy` - Critical service unavailable (returns 503)

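The status logic above can be sketched as a small pure function. This is an illustration under stated assumptions, not the actual implementation: `aggregateHealth` and `CRITICAL_SERVICES` are hypothetical names, and the sketch assumes a `degraded` instance still answers 200, since only `unhealthy` is documented as returning 503.

```typescript
type ServiceStatus = "healthy" | "degraded" | "unhealthy";

// Assumption: database and Redis are the critical services, per the list above.
const CRITICAL_SERVICES: readonly string[] = ["database", "redis"];

// Derives the overall status and HTTP code from per-service statuses.
function aggregateHealth(
  services: Record<string, ServiceStatus>,
): { status: ServiceStatus; httpStatus: 200 | 503 } {
  const entries = Object.entries(services);

  // An unhealthy critical service makes the whole instance not ready.
  const criticalDown = entries.some(
    ([name, status]) => CRITICAL_SERVICES.includes(name) && status === "unhealthy",
  );
  if (criticalDown) return { status: "unhealthy", httpStatus: 503 };

  // Any other non-healthy service only degrades the instance (assumed 200).
  const anyIssue = entries.some(([, status]) => status !== "healthy");
  return { status: anyIssue ? "degraded" : "healthy", httpStatus: 200 };
}
```

Under these assumptions, an unreachable storage path yields `degraded`, while an unreachable Redis yields `unhealthy` and a 503.
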
### Startup Probe

The startup probe is used during container initialization:

```typescript
// GET /api/health/startup
// Success (200):
{
  "status": "started",
  "timestamp": "2026-01-09T12:00:00.000Z",
  "database": { "status": "healthy", "latency": 5 }
}

// Still starting (503):
{
  "status": "starting",
  "message": "Waiting for database connection",
  "database": { "status": "unhealthy", "message": "..." }
}
```

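The two responses above differ only in the outcome of the database check. A minimal sketch of that branching, assuming hypothetical `DbCheck` and `buildStartupResponse` names whose fields mirror the JSON shown above:

```typescript
// Hypothetical result of a database check; fields mirror the responses above.
interface DbCheck {
  status: "healthy" | "unhealthy";
  latency?: number;
  message?: string;
}

interface StartupResponse {
  httpStatus: 200 | 503;
  body: {
    status: "started" | "starting";
    timestamp?: string;
    message?: string;
    database: DbCheck;
  };
}

// Maps the database check onto the startup probe's status code and body.
function buildStartupResponse(db: DbCheck, now: Date = new Date()): StartupResponse {
  if (db.status === "healthy") {
    return {
      httpStatus: 200,
      body: { status: "started", timestamp: now.toISOString(), database: db },
    };
  }
  return {
    httpStatus: 503,
    body: { status: "starting", message: "Waiting for database connection", database: db },
  };
}
```
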
### Individual Service Endpoints

For detailed diagnostics:

| Endpoint                | Purpose                         |
| ----------------------- | ------------------------------- |
| `/api/health/ping`      | Simple server responsiveness    |
| `/api/health/db-schema` | Verify database tables exist    |
| `/api/health/db-pool`   | Database connection pool status |
| `/api/health/redis`     | Redis connectivity              |
| `/api/health/storage`   | File storage accessibility      |
| `/api/health/time`      | Server time synchronization     |

## Kubernetes Configuration Example

```yaml
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: flyer-crawler
      livenessProbe:
        httpGet:
          path: /api/health/live
          port: 3001
        initialDelaySeconds: 10
        periodSeconds: 15
        failureThreshold: 3

      readinessProbe:
        httpGet:
          path: /api/health/ready
          port: 3001
        initialDelaySeconds: 5
        periodSeconds: 10
        failureThreshold: 3

      startupProbe:
        httpGet:
          path: /api/health/startup
          port: 3001
        initialDelaySeconds: 0
        periodSeconds: 5
        failureThreshold: 30 # Allow up to 150 seconds for startup
```

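The 150-second figure in the `startupProbe` comment follows from the probe timing fields. A rough sketch of the arithmetic, assuming a hypothetical `failureBudgetSeconds` helper and ignoring per-request timeouts:

```typescript
// Probe timing fields as they appear in the manifest above.
interface ProbeTiming {
  initialDelaySeconds: number;
  periodSeconds: number;
  failureThreshold: number;
}

// Approximate seconds before the probe is considered failed:
// the initial delay plus failureThreshold consecutive failed periods.
function failureBudgetSeconds(p: ProbeTiming): number {
  return p.initialDelaySeconds + p.failureThreshold * p.periodSeconds;
}

// Startup probe from the manifest: 0 + 30 * 5 = 150 seconds.
const startupBudget = failureBudgetSeconds({
  initialDelaySeconds: 0,
  periodSeconds: 5,
  failureThreshold: 30,
});
```

If the application routinely needs longer than this to reach the database, raising `failureThreshold` extends the budget without slowing detection once the instance is running.
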
## Docker Compose Configuration Example

```yaml
services:
  api:
    image: flyer-crawler:latest
    healthcheck:
      test: ['CMD', 'curl', '-f', 'http://localhost:3001/api/health/ready']
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```

## PM2 Configuration Example

For non-containerized deployments using PM2:

```javascript
// ecosystem.config.js
module.exports = {
  apps: [
    {
      name: 'flyer-crawler',
      script: 'dist/server.js',
      // NOTE: `health_check` is not among the options documented for PM2
      // ecosystem files, so treat this block as illustrative. Polling the
      // endpoint and restarting on failure needs a separate monitor.
      health_check: {
        url: 'http://localhost:3001/api/health/ready',
        interval: 30000,
        timeout: 10000,
      },
    },
  ],
};
```

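Whatever process polls the endpoint in a non-containerized deployment needs a rule for when failed checks should trigger a restart. A minimal sketch, assuming a hypothetical three-strike rule; the default threshold is borrowed from the Kubernetes and Docker Compose examples rather than from PM2, and `recordCheck` is an illustrative name:

```typescript
// Hypothetical watchdog state; not part of PM2 or of this project's code.
interface WatchState {
  consecutiveFailures: number;
}

// Folds one poll result into the state and reports whether a restart is due.
// Any 2xx response resets the failure streak.
function recordCheck(
  state: WatchState,
  httpStatus: number,
  threshold: number = 3,
): { state: WatchState; restart: boolean } {
  const ok = httpStatus >= 200 && httpStatus < 300;
  const consecutiveFailures = ok ? 0 : state.consecutiveFailures + 1;
  return { state: { consecutiveFailures }, restart: consecutiveFailures >= threshold };
}
```

Requiring several consecutive failures avoids restarting the process on a single slow or dropped request.
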
## Key Files

- `src/routes/health.routes.ts` - Health check endpoint implementations
- `server.ts` - Health routes mounted at `/api/health`

## Service Health Thresholds

| Service  | Healthy                | Degraded                | Unhealthy           |
| -------- | ---------------------- | ----------------------- | ------------------- |
| Database | Responds to `SELECT 1` | > 3 waiting connections | Connection fails    |
| Redis    | `PING` returns `PONG`  | N/A                     | Connection fails    |
| Storage  | Write access to path   | N/A                     | Path not accessible |

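The database row of the table can be expressed as a short classification function. This is a sketch under assumptions: `classifyDatabase` and `MAX_WAITING_CONNECTIONS` are hypothetical names, `connected` stands for whether `SELECT 1` succeeded, and `waitingConnections` follows the field name in the readiness response.

```typescript
type PoolHealth = "healthy" | "degraded" | "unhealthy";

// Threshold taken from the table above: more than 3 waiting connections degrades.
const MAX_WAITING_CONNECTIONS = 3;

// Classifies the database from the query outcome and the pool's wait queue.
function classifyDatabase(connected: boolean, waitingConnections: number): PoolHealth {
  if (!connected) return "unhealthy";
  return waitingConnections > MAX_WAITING_CONNECTIONS ? "degraded" : "healthy";
}
```

Note that exactly 3 waiting connections still counts as `healthy`, because the table specifies a strict `> 3`.
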