Files
flyer-crawler.projectium.com/docs/adr/0029-secret-rotation-and-key-management.md
Torben Sorensen 4a04e478c4
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 16m58s
integration test fixes - claude for the win? try 4 - i have a good feeling
2026-01-09 05:56:19 -08:00

148 lines
4.8 KiB
Markdown

# ADR-029: Secret Rotation and Key Management Strategy
**Date**: 2026-01-09
**Status**: Proposed
## Context
While ADR-007 covers configuration validation at startup, it does not address the lifecycle management of secrets:
1. **JWT Secrets**: If the JWT_SECRET is rotated, all existing user sessions are immediately invalidated
2. **Database Credentials**: No documented procedure for rotating database passwords without downtime
3. **API Keys**: External service API keys (AI services, geocoding) have no rotation strategy
4. **Emergency Revocation**: No process for immediately invalidating compromised credentials
Current risks:
- Long-lived secrets that never change become high-value targets
- No ability to rotate secrets without application restart
- No audit trail of when secrets were last rotated
- Compromised keys could remain active indefinitely
## Decision
We will implement a comprehensive secret rotation and key management strategy.
### 1. JWT Secret Rotation with Dual-Key Support
Support multiple JWT secrets simultaneously to enable zero-downtime rotation:
```typescript
// Environment variables
JWT_SECRET = current_secret;
JWT_SECRET_PREVIOUS = old_secret; // Optional, for transition period
// Token verification tries current first, falls back to previous
const verifyToken = (token: string) => {
try {
return jwt.verify(token, process.env.JWT_SECRET);
} catch {
if (process.env.JWT_SECRET_PREVIOUS) {
return jwt.verify(token, process.env.JWT_SECRET_PREVIOUS);
}
throw new AuthenticationError('Invalid token');
}
};
```
### 2. Database Credential Rotation
Document and implement a procedure for PostgreSQL credential rotation:
1. Create new database user with identical permissions
2. Update application configuration to use new credentials
3. Restart application instances (rolling restart)
4. Remove old database user after all instances updated
5. Log rotation event for audit purposes
### 3. API Key Management
For external service API keys (Google AI, geocoding services):
1. **Naming Convention**: `{SERVICE}_API_KEY` and `{SERVICE}_API_KEY_PREVIOUS`
2. **Fallback Logic**: Try primary key, fall back to previous on 401/403
3. **Health Checks**: Validate API keys on startup
4. **Usage Logging**: Track which key is being used for each request
### 4. Emergency Revocation Procedures
Document emergency procedures for:
- **JWT Compromise**: Set new JWT_SECRET, clear all refresh tokens from database
- **Database Compromise**: Rotate credentials immediately, audit access logs
- **API Key Compromise**: Regenerate at provider, update environment, restart
### 5. Secret Audit Trail
Track secret lifecycle events:
- When secrets were last rotated
- Who initiated the rotation
- Which instances are using which secrets
## Implementation Approach
### Phase 1: Dual JWT Secret Support
- Modify token verification to support fallback secret
- Add JWT_SECRET_PREVIOUS to configuration schema
- Update documentation
### Phase 2: Rotation Scripts
- Create `scripts/rotate-jwt-secret.sh`
- Create `scripts/rotate-db-credentials.sh`
- Add rotation instructions to operations runbook
### Phase 3: API Key Fallback
- Wrap external API clients with fallback logic
- Add key validation to health checks
- Implement key usage logging
## Consequences
### Positive
- **Zero-Downtime Rotation**: Secrets can be rotated without invalidating all sessions
- **Reduced Risk**: Regular rotation limits exposure window for compromised credentials
- **Audit Trail**: Clear record of when secrets were changed
- **Emergency Response**: Documented procedures for security incidents
### Negative
- **Complexity**: Dual-key logic adds code complexity
- **Operations Overhead**: Regular rotation requires operational discipline
- **Testing**: Rotation procedures need to be tested periodically
## Implementation Status
### What's Implemented
- ❌ Not yet implemented
### What Needs To Be Done
1. Implement dual JWT secret verification
2. Create rotation scripts
3. Document emergency procedures
4. Add secret validation to health checks
5. Create rotation schedule recommendations
## Key Files (To Be Created)
- `src/utils/secretManager.ts` - Secret rotation utilities
- `scripts/rotate-jwt-secret.sh` - JWT rotation script
- `scripts/rotate-db-credentials.sh` - Database credential rotation
- `docs/operations/secret-rotation.md` - Operations runbook
## Rotation Schedule Recommendations
| Secret Type | Rotation Frequency | Grace Period |
| ------------------ | -------------------------- | ----------------- |
| JWT_SECRET | 90 days | 7 days (dual-key) |
| Database Passwords | 180 days | Rolling restart |
| AI API Keys | On suspicion of compromise | Immediate |
| Refresh Tokens | 7-day max age | N/A (per-token) |