# ADR-029: Secret Rotation and Key Management Strategy

**Date**: 2026-01-09

**Status**: Proposed

## Context

While ADR-007 covers configuration validation at startup, it does not address the lifecycle management of secrets:

1. **JWT Secrets**: If the JWT_SECRET is rotated, all existing user sessions are immediately invalidated
2. **Database Credentials**: No documented procedure for rotating database passwords without downtime
3. **API Keys**: External service API keys (AI services, geocoding) have no rotation strategy
4. **Emergency Revocation**: No process for immediately invalidating compromised credentials

Current risks:

- Long-lived secrets that never change become high-value targets
- No ability to rotate secrets without application restart
- No audit trail of when secrets were last rotated
- Compromised keys could remain active indefinitely

## Decision

We will implement a comprehensive secret rotation and key management strategy.

### 1. JWT Secret Rotation with Dual-Key Support

Support multiple JWT secrets simultaneously to enable zero-downtime rotation:
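
```typescript
import jwt from 'jsonwebtoken';

// Environment variables (set in the deployment environment, not in code):
//   JWT_SECRET=<current secret>
//   JWT_SECRET_PREVIOUS=<old secret>   (optional, for the transition period)

// AuthenticationError is assumed to be the application's existing error class.

// Token verification tries the current secret first, then falls back to the previous one.
const verifyToken = (token: string) => {
  const secrets = [process.env.JWT_SECRET, process.env.JWT_SECRET_PREVIOUS].filter(
    (secret): secret is string => Boolean(secret),
  );
  for (const secret of secrets) {
    try {
      return jwt.verify(token, secret);
    } catch {
      // Verification failed with this secret; try the next one, if any.
    }
  }
  // No configured secret accepted the token.
  throw new AuthenticationError('Invalid token');
};
```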

### 2. Database Credential Rotation

Document and implement a procedure for PostgreSQL credential rotation:

1. Create new database user with identical permissions
2. Update application configuration to use new credentials
3. Restart application instances (rolling restart)
4. Remove old database user after all instances updated
5. Log rotation event for audit purposes
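
Steps 1 and 4 of the procedure above can be sketched as a small helper that builds the SQL for one rotation. This is a minimal sketch under stated assumptions, not a definitive procedure: it assumes privileges are held by a shared group role (the hypothetical `app_rw` in the test data), so a new login role receives identical permissions through membership. The names `buildDbRotationPlan`, `quoteIdent`, and `quoteLiteral` are illustrative and do not exist in the codebase.

```typescript
// Hypothetical helper: builds the SQL statements for steps 1 and 4.
// Assumption: privileges live on a group role, so granting membership
// gives the new login role the same permissions as the old one.

// Quote a PostgreSQL identifier (double quotes, embedded quotes doubled).
const quoteIdent = (name: string): string => `"${name.replace(/"/g, '""')}"`;

// Quote a string literal (single quotes doubled; assumes
// standard_conforming_strings is on, the PostgreSQL default).
const quoteLiteral = (value: string): string => `'${value.replace(/'/g, "''")}'`;

interface RotationPlan {
  create: string[]; // run before updating application configuration
  cleanup: string[]; // run only after every instance has restarted
}

function buildDbRotationPlan(
  groupRole: string,
  oldUser: string,
  newUser: string,
  newPassword: string,
): RotationPlan {
  return {
    create: [
      `CREATE ROLE ${quoteIdent(newUser)} WITH LOGIN PASSWORD ${quoteLiteral(newPassword)};`,
      `GRANT ${quoteIdent(groupRole)} TO ${quoteIdent(newUser)};`,
    ],
    cleanup: [`DROP ROLE ${quoteIdent(oldUser)};`],
  };
}
```

`DROP ROLE` fails while the old role still owns objects, so a real script would likely need `REASSIGN OWNED BY` first. Statements that contain the password should also be kept out of shell history and logs.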

### 3. API Key Management

For external service API keys (Google AI, geocoding services):

1. **Naming Convention**: `{SERVICE}_API_KEY` and `{SERVICE}_API_KEY_PREVIOUS`
2. **Fallback Logic**: Try primary key, fall back to previous on 401/403
3. **Health Checks**: Validate API keys on startup
4. **Usage Logging**: Track which key is being used for each request
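
The fallback and usage-logging items above could look roughly like the wrapper below. It is a sketch only: `ApiResult`, `callWithKeyFallback`, and the `onKeyUsed` hook are hypothetical names, and the wrapper assumes the external client reports rejection through an HTTP status code rather than by throwing.

```typescript
// Hypothetical shape of an external API response; only the status code matters here.
interface ApiResult<T> {
  status: number;
  body?: T;
}

type KeyLabel = 'primary' | 'previous';

// Calls `request` with the primary key; on 401/403 retries once with the
// previous key, if one is configured. `onKeyUsed` is a hook for usage
// logging and reports which key produced the returned response.
async function callWithKeyFallback<T>(
  primaryKey: string,
  previousKey: string | undefined,
  request: (apiKey: string) => Promise<ApiResult<T>>,
  onKeyUsed: (label: KeyLabel) => void = () => {},
): Promise<ApiResult<T>> {
  const first = await request(primaryKey);
  const rejected = first.status === 401 || first.status === 403;
  if (!rejected || !previousKey) {
    onKeyUsed('primary');
    return first;
  }
  const second = await request(previousKey);
  onKeyUsed('previous');
  return second;
}
```

In this sketch the caller would read the two keys from the environment using the naming convention above and pass them in, which keeps the wrapper free of any service-specific configuration.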

### 4. Emergency Revocation Procedures

Document emergency procedures for:

- **JWT Compromise**: Set a new JWT_SECRET, leave JWT_SECRET_PREVIOUS unset so the compromised secret is no longer accepted, and clear all refresh tokens from the database
- **Database Compromise**: Rotate credentials immediately, audit access logs
- **API Key Compromise**: Regenerate at provider, update environment, restart

### 5. Secret Audit Trail

Track secret lifecycle events:

- When secrets were last rotated
- Who initiated the rotation
- Which instances are using which secrets
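
One way to record these events without ever storing a secret value is to log a short fingerprint of it. The sketch below assumes Node's built-in `node:crypto` module; the `SecretRotationEvent` shape and both function names are hypothetical and do not describe an existing schema.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical audit record; field names are illustrative only.
interface SecretRotationEvent {
  secretName: string; // e.g. 'JWT_SECRET'
  rotatedAt: string; // ISO 8601 timestamp: when the secret was rotated
  initiatedBy: string; // who initiated the rotation
  fingerprint: string; // short hash of the new value, never the value itself
}

// Short, non-reversible identifier. Instances can report the fingerprint of
// the secret they loaded, which shows which instance uses which secret.
const fingerprintSecret = (secret: string): string =>
  createHash('sha256').update(secret).digest('hex').slice(0, 12);

function buildRotationEvent(
  secretName: string,
  newValue: string,
  initiatedBy: string,
  now: Date = new Date(),
): SecretRotationEvent {
  return {
    secretName,
    rotatedAt: now.toISOString(),
    initiatedBy,
    fingerprint: fingerprintSecret(newValue),
  };
}
```

A fingerprint is only safe to log when the secret itself is high-entropy, since a hash of a guessable value can be brute-forced.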

## Implementation Approach

### Phase 1: Dual JWT Secret Support

- Modify token verification to support fallback secret
- Add JWT_SECRET_PREVIOUS to configuration schema
- Update documentation

### Phase 2: Rotation Scripts

- Create `scripts/rotate-jwt-secret.sh`
- Create `scripts/rotate-db-credentials.sh`
- Add rotation instructions to operations runbook

### Phase 3: API Key Fallback

- Wrap external API clients with fallback logic
- Add key validation to health checks
- Implement key usage logging

## Consequences

### Positive

- **Zero-Downtime Rotation**: Secrets can be rotated without invalidating all sessions
- **Reduced Risk**: Regular rotation limits exposure window for compromised credentials
- **Audit Trail**: Clear record of when secrets were changed
- **Emergency Response**: Documented procedures for security incidents

### Negative

- **Complexity**: Dual-key logic adds code complexity
- **Operations Overhead**: Regular rotation requires operational discipline
- **Testing**: Rotation procedures need to be tested periodically

## Implementation Status

### What's Implemented

- ❌ Not yet implemented

### What Needs To Be Done

1. Implement dual JWT secret verification
2. Create rotation scripts
3. Document emergency procedures
4. Add secret validation to health checks
5. Create rotation schedule recommendations

## Key Files (To Be Created)

- `src/utils/secretManager.ts` - Secret rotation utilities
- `scripts/rotate-jwt-secret.sh` - JWT rotation script
- `scripts/rotate-db-credentials.sh` - Database credential rotation
- `docs/operations/secret-rotation.md` - Operations runbook

## Rotation Schedule Recommendations

| Secret Type        | Rotation Frequency         | Grace Period      |
| ------------------ | -------------------------- | ----------------- |
| JWT_SECRET         | 90 days                    | 7 days (dual-key) |
| Database Passwords | 180 days                   | Rolling restart   |
| AI API Keys        | On suspicion of compromise | Immediate         |
| Refresh Tokens     | 7-day max age              | N/A (per-token)   |
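
The time-based rows of this table could be checked automatically, for example from a health check or a scheduled job. The following is a minimal sketch: `ROTATION_FREQUENCY_DAYS`, `JWT_GRACE_DAYS`, `isRotationDue`, and `isPreviousJwtSecretExpired` are hypothetical names, and the sketch assumes the last rotation time is available from the audit trail.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical constants mirroring the time-based rows of the table (in days).
const ROTATION_FREQUENCY_DAYS = {
  JWT_SECRET: 90,
  DATABASE_PASSWORD: 180,
} as const;

// Dual-key grace period during which JWT_SECRET_PREVIOUS is still accepted.
const JWT_GRACE_DAYS = 7;

type ScheduledSecret = keyof typeof ROTATION_FREQUENCY_DAYS;

const daysBetween = (from: Date, to: Date): number =>
  (to.getTime() - from.getTime()) / MS_PER_DAY;

// True once the secret has reached its recommended rotation age.
function isRotationDue(
  secret: ScheduledSecret,
  lastRotated: Date,
  now: Date = new Date(),
): boolean {
  return daysBetween(lastRotated, now) >= ROTATION_FREQUENCY_DAYS[secret];
}

// True once the grace period has passed and JWT_SECRET_PREVIOUS can be removed.
function isPreviousJwtSecretExpired(rotatedAt: Date, now: Date = new Date()): boolean {
  return daysBetween(rotatedAt, now) >= JWT_GRACE_DAYS;
}
```

API keys and refresh tokens are left out of the sketch because the table rotates them on events or per token rather than on a fixed calendar.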