integration test fixes - claude for the win? try 4 - i have a good feeling
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 16m58s

This commit is contained in:
2026-01-09 05:55:55 -08:00
parent 1814469eb4
commit 4a04e478c4
20 changed files with 3006 additions and 570 deletions

View File

@@ -2,7 +2,9 @@
**Date**: 2025-12-12
**Status**: Proposed
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
@@ -16,3 +18,216 @@ We will introduce a centralized, schema-validated configuration service. We will
**Positive**: Improves application reliability and developer experience by catching configuration errors at startup rather than at runtime. Provides a single source of truth for all required configuration.
**Negative**: Adds a small amount of boilerplate for defining the configuration schema. Requires a one-time effort to refactor all `process.env` access points to use the new configuration service.
## Implementation Status
### What's Implemented
-**Centralized Configuration Schema** - Zod-based validation in `src/config/env.ts`
-**Type-Safe Access** - Full TypeScript types for all configuration
-**Fail-Fast Startup** - Clear error messages for missing/invalid config
-**Environment Helpers** - `isProduction`, `isTest`, `isDevelopment` exports
-**Service Configuration Helpers** - `isSmtpConfigured`, `isAiConfigured`, etc.
### Migration Status
- ⏳ Gradual migration of `process.env` access to `config.*` in progress
- Legacy `process.env` access still works during transition
## Implementation Details
### Configuration Schema
The configuration is organized into logical groups:
```typescript
import { config, isProduction, isTest } from './config/env';
// Database
config.database.host; // DB_HOST
config.database.port; // DB_PORT (default: 5432)
config.database.user; // DB_USER
config.database.password; // DB_PASSWORD
config.database.name; // DB_NAME
// Redis
config.redis.url; // REDIS_URL
config.redis.password; // REDIS_PASSWORD (optional)
// Authentication
config.auth.jwtSecret; // JWT_SECRET (min 32 chars)
config.auth.jwtSecretPrevious; // JWT_SECRET_PREVIOUS (for rotation)
// SMTP (all optional - email degrades gracefully)
config.smtp.host; // SMTP_HOST
config.smtp.port; // SMTP_PORT (default: 587)
config.smtp.user; // SMTP_USER
config.smtp.pass; // SMTP_PASS
config.smtp.secure; // SMTP_SECURE (default: false)
config.smtp.fromEmail; // SMTP_FROM_EMAIL
// AI Services
config.ai.geminiApiKey; // GEMINI_API_KEY
config.ai.geminiRpm; // GEMINI_RPM (default: 5)
config.ai.priceQualityThreshold; // AI_PRICE_QUALITY_THRESHOLD (default: 0.5)
// Google Services
config.google.mapsApiKey; // GOOGLE_MAPS_API_KEY (optional)
config.google.clientId; // GOOGLE_CLIENT_ID (optional)
config.google.clientSecret; // GOOGLE_CLIENT_SECRET (optional)
// Worker Configuration
config.worker.concurrency; // WORKER_CONCURRENCY (default: 1)
config.worker.lockDuration; // WORKER_LOCK_DURATION (default: 30000)
config.worker.emailConcurrency; // EMAIL_WORKER_CONCURRENCY (default: 10)
config.worker.analyticsConcurrency; // ANALYTICS_WORKER_CONCURRENCY (default: 1)
config.worker.cleanupConcurrency; // CLEANUP_WORKER_CONCURRENCY (default: 10)
config.worker.weeklyAnalyticsConcurrency; // WEEKLY_ANALYTICS_WORKER_CONCURRENCY (default: 1)
// Server
config.server.nodeEnv; // NODE_ENV (development/production/test)
config.server.port; // PORT (default: 3001)
config.server.frontendUrl; // FRONTEND_URL
config.server.baseUrl; // BASE_URL
config.server.storagePath; // STORAGE_PATH (default: /var/www/.../flyer-images)
```
### Convenience Helpers
```typescript
import { isProduction, isTest, isDevelopment, isSmtpConfigured } from './config/env';
// Environment checks
if (isProduction) {
// Production-only logic
}
// Service availability checks
if (isSmtpConfigured) {
await sendEmail(...);
} else {
logger.warn('Email not configured, skipping notification');
}
```
### Fail-Fast Error Messages
When configuration is invalid, the application exits with a clear error:
```text
╔════════════════════════════════════════════════════════════════╗
║ CONFIGURATION ERROR - APPLICATION STARTUP ║
╚════════════════════════════════════════════════════════════════╝
The following environment variables are missing or invalid:
- database.host: DB_HOST is required
- auth.jwtSecret: JWT_SECRET must be at least 32 characters for security
Please check your .env file or environment configuration.
See ADR-007 for the complete list of required environment variables.
```
### Usage Example
```typescript
// Before (direct process.env access)
const pool = new Pool({
host: process.env.DB_HOST,
port: parseInt(process.env.DB_PORT || '5432', 10),
user: process.env.DB_USER,
password: process.env.DB_PASSWORD,
database: process.env.DB_NAME,
});
// After (type-safe config access)
import { config } from './config/env';
const pool = new Pool({
host: config.database.host,
port: config.database.port,
user: config.database.user,
password: config.database.password,
database: config.database.name,
});
```
## Required Environment Variables
### Critical (Application will not start without these)
| Variable | Description |
| ------------- | ----------------------------------------------------- |
| `DB_HOST` | PostgreSQL database host |
| `DB_USER` | PostgreSQL database user |
| `DB_PASSWORD` | PostgreSQL database password |
| `DB_NAME` | PostgreSQL database name |
| `REDIS_URL` | Redis connection URL (e.g., `redis://localhost:6379`) |
| `JWT_SECRET` | JWT signing secret (minimum 32 characters) |
### Optional with Defaults
| Variable | Default | Description |
| ---------------------------- | ------------------------- | ------------------------------- |
| `DB_PORT` | 5432 | PostgreSQL port |
| `PORT` | 3001 | Server HTTP port |
| `NODE_ENV` | development | Environment mode |
| `STORAGE_PATH` | /var/www/.../flyer-images | File upload directory |
| `SMTP_PORT` | 587 | SMTP server port |
| `SMTP_SECURE` | false | Use TLS for SMTP |
| `GEMINI_RPM` | 5 | Gemini API requests per minute |
| `AI_PRICE_QUALITY_THRESHOLD` | 0.5 | AI extraction quality threshold |
| `WORKER_CONCURRENCY` | 1 | Flyer processing concurrency |
| `WORKER_LOCK_DURATION` | 30000 | Worker lock duration (ms) |
### Optional (Feature-specific)
| Variable | Description |
| --------------------- | ------------------------------------------- |
| `GEMINI_API_KEY` | Google Gemini API key (enables AI features) |
| `GOOGLE_MAPS_API_KEY` | Google Maps API key (enables geocoding) |
| `SMTP_HOST` | SMTP server (enables email notifications) |
| `SMTP_USER` | SMTP authentication username |
| `SMTP_PASS` | SMTP authentication password |
| `SMTP_FROM_EMAIL` | Sender email address |
| `FRONTEND_URL` | Frontend URL for email links |
| `JWT_SECRET_PREVIOUS` | Previous JWT secret for rotation (ADR-029) |
## Key Files
- `src/config/env.ts` - Configuration schema and validation
- `.env.example` - Template for required environment variables
## Migration Guide
To migrate existing `process.env` usage:
1. Import the config:
```typescript
import { config, isProduction } from '../config/env';
```
2. Replace direct access:
```typescript
// Before
process.env.DB_HOST;
process.env.NODE_ENV === 'production';
parseInt(process.env.PORT || '3001', 10);
// After
config.database.host;
isProduction;
config.server.port;
```
3. Use service helpers for optional features:
```typescript
import { isSmtpConfigured, isAiConfigured } from '../config/env';
if (isSmtpConfigured) {
// Email is available
}
```

View File

@@ -2,7 +2,9 @@
**Date**: 2025-12-12
**Status**: Proposed
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
@@ -20,3 +22,195 @@ We will implement dedicated health check endpoints in the Express application.
- **Positive**: Enables robust, automated application lifecycle management in a containerized environment. Prevents traffic from being sent to unhealthy or uninitialized application instances.
- **Negative**: Adds a small amount of code for the health check endpoints. Requires configuration in the container orchestration layer.
## Implementation Status
### What's Implemented
-**Liveness Probe** (`/api/health/live`) - Simple process health check
-**Readiness Probe** (`/api/health/ready`) - Comprehensive dependency health check
-**Startup Probe** (`/api/health/startup`) - Initial startup verification
-**Individual Service Checks** - Database, Redis, Storage endpoints
-**Detailed Health Response** - Service latency, status, and details
## Implementation Details
### Probe Endpoints
| Endpoint | Purpose | Checks | HTTP Status |
| --------------------- | --------------- | ------------------ | ----------------------------- |
| `/api/health/live` | Liveness probe | Process running | 200 = alive |
| `/api/health/ready` | Readiness probe | DB, Redis, Storage | 200 = ready, 503 = not ready |
| `/api/health/startup` | Startup probe | Database only | 200 = started, 503 = starting |
### Liveness Probe
The liveness probe is intentionally simple with no external dependencies:
```typescript
// GET /api/health/live
{
"status": "ok",
"timestamp": "2026-01-09T12:00:00.000Z"
}
```
**Usage**: If this endpoint fails to respond, the container should be restarted.
### Readiness Probe
The readiness probe checks all critical dependencies:
```typescript
// GET /api/health/ready
{
"status": "healthy", // healthy | degraded | unhealthy
"timestamp": "2026-01-09T12:00:00.000Z",
"uptime": 3600.5,
"services": {
"database": {
"status": "healthy",
"latency": 5,
"details": {
"totalConnections": 10,
"idleConnections": 8,
"waitingConnections": 0
}
},
"redis": {
"status": "healthy",
"latency": 2
},
"storage": {
"status": "healthy",
"latency": 1,
"details": {
"path": "/var/www/.../flyer-images"
}
}
}
}
```
**Status Logic**:
- `healthy` - All critical services (database, Redis) are healthy
- `degraded` - Some non-critical issues (high connection wait, storage issues)
- `unhealthy` - Critical service unavailable (returns 503)
### Startup Probe
The startup probe is used during container initialization:
```typescript
// GET /api/health/startup
// Success (200):
{
"status": "started",
"timestamp": "2026-01-09T12:00:00.000Z",
"database": { "status": "healthy", "latency": 5 }
}
// Still starting (503):
{
"status": "starting",
"message": "Waiting for database connection",
"database": { "status": "unhealthy", "message": "..." }
}
```
### Individual Service Endpoints
For detailed diagnostics:
| Endpoint | Purpose |
| ----------------------- | ------------------------------- |
| `/api/health/ping` | Simple server responsiveness |
| `/api/health/db-schema` | Verify database tables exist |
| `/api/health/db-pool` | Database connection pool status |
| `/api/health/redis` | Redis connectivity |
| `/api/health/storage` | File storage accessibility |
| `/api/health/time` | Server time synchronization |
## Kubernetes Configuration Example
```yaml
apiVersion: v1
kind: Pod
spec:
containers:
- name: flyer-crawler
livenessProbe:
httpGet:
path: /api/health/live
port: 3001
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /api/health/ready
port: 3001
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 3
startupProbe:
httpGet:
path: /api/health/startup
port: 3001
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 30 # Allow up to 150 seconds for startup
```
## Docker Compose Configuration Example
```yaml
services:
api:
image: flyer-crawler:latest
healthcheck:
test: ['CMD', 'curl', '-f', 'http://localhost:3001/api/health/ready']
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
```
## PM2 Configuration Example
For non-containerized deployments using PM2:
```javascript
// ecosystem.config.js
module.exports = {
apps: [
{
name: 'flyer-crawler',
script: 'dist/server.js',
// PM2 will check this endpoint
// and restart if it fails
health_check: {
url: 'http://localhost:3001/api/health/ready',
interval: 30000,
timeout: 10000,
},
},
],
};
```
## Key Files
- `src/routes/health.routes.ts` - Health check endpoint implementations
- `server.ts` - Health routes mounted at `/api/health`
## Service Health Thresholds
| Service | Healthy | Degraded | Unhealthy |
| -------- | ---------------------- | ----------------------- | ------------------- |
| Database | Responds to `SELECT 1` | > 3 waiting connections | Connection fails |
| Redis | `PING` returns `PONG` | N/A | Connection fails |
| Storage | Write access to path | N/A | Path not accessible |

View File

@@ -2,7 +2,9 @@
**Date**: 2025-12-12
**Status**: Proposed
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
@@ -10,10 +12,171 @@ The project contains both frontend (React) and backend (Node.js) code. While lin
## Decision
We will mandate the use of **Prettier** for automated code formatting and a unified **ESLint** configuration for code quality rules across both frontend and backend. This will be enforced automatically using a pre-commit hook managed by a tool like **Husky**.
We will mandate the use of **Prettier** for automated code formatting and a unified **ESLint** configuration for code quality rules across both frontend and backend. This will be enforced automatically using a pre-commit hook managed by **Husky** and **lint-staged**.
## Consequences
**Positive**: Improves developer experience and team velocity by automating code consistency. Reduces time spent on stylistic code review comments. Enhances code readability and maintainability.
**Negative**: Requires an initial setup and configuration of Prettier, ESLint, and Husky. May require a one-time reformatting of the entire codebase.
## Implementation Status
### What's Implemented
-**Prettier Configuration** - `.prettierrc` with consistent settings
-**Prettier Ignore** - `.prettierignore` to exclude generated files
-**ESLint Configuration** - `eslint.config.js` with TypeScript and React support
-**ESLint + Prettier Integration** - `eslint-config-prettier` to avoid conflicts
-**Husky Pre-commit Hooks** - Automatic enforcement on commit
-**lint-staged** - Run linters only on staged files for performance
## Implementation Details
### Prettier Configuration
The project uses a consistent Prettier configuration in `.prettierrc`:
```json
{
"semi": true,
"trailingComma": "all",
"singleQuote": true,
"printWidth": 100,
"tabWidth": 2,
"useTabs": false,
"endOfLine": "auto"
}
```
### ESLint Configuration
ESLint is configured with:
- TypeScript support via `typescript-eslint`
- React hooks rules via `eslint-plugin-react-hooks`
- React Refresh support for HMR
- Prettier compatibility via `eslint-config-prettier`
```javascript
// eslint.config.js (ESLint v9 flat config)
import globals from 'globals';
import tseslint from 'typescript-eslint';
import pluginReact from 'eslint-plugin-react';
import pluginReactHooks from 'eslint-plugin-react-hooks';
import pluginReactRefresh from 'eslint-plugin-react-refresh';
import eslintConfigPrettier from 'eslint-config-prettier';
export default tseslint.config(
// ... configurations
eslintConfigPrettier, // Must be last to override formatting rules
);
```
### Pre-commit Hook
The pre-commit hook runs lint-staged automatically:
```bash
# .husky/pre-commit
npx lint-staged
```
### lint-staged Configuration
lint-staged runs appropriate tools based on file type:
```json
{
"*.{js,jsx,ts,tsx}": ["eslint --fix", "prettier --write"],
"*.{json,md,css,html,yml,yaml}": ["prettier --write"]
}
```
### NPM Scripts
| Script | Description |
| ------------------ | ---------------------------------------------- |
| `npm run format` | Format all files with Prettier |
| `npm run lint` | Run ESLint on all TypeScript/JavaScript files |
| `npm run validate` | Run Prettier check + TypeScript check + ESLint |
## Key Files
| File | Purpose |
| -------------------- | -------------------------------- |
| `.prettierrc` | Prettier configuration |
| `.prettierignore` | Files to exclude from formatting |
| `eslint.config.js` | ESLint flat configuration (v9) |
| `.husky/pre-commit` | Pre-commit hook script |
| `.lintstagedrc.json` | lint-staged configuration |
## Developer Workflow
### Automatic Formatting on Commit
When you commit changes:
1. Husky intercepts the commit
2. lint-staged identifies staged files
3. ESLint fixes auto-fixable issues
4. Prettier formats the code
5. Changes are automatically staged
6. Commit proceeds if no errors
### Manual Formatting
```bash
# Format entire codebase
npm run format
# Check formatting without changes
npx prettier --check .
# Run ESLint
npm run lint
# Run all validation checks
npm run validate
```
### IDE Integration
For the best experience, configure your IDE:
**VS Code** - Install extensions:
- Prettier - Code formatter
- ESLint
Add to `.vscode/settings.json`:
```json
{
"editor.defaultFormatter": "esbenp.prettier-vscode",
"editor.formatOnSave": true,
"editor.codeActionsOnSave": {
"source.fixAll.eslint": "explicit"
}
}
```
## Troubleshooting
### "eslint --fix failed"
ESLint may fail on unfixable errors. Review the output and manually fix the issues.
### "prettier --write failed"
Check for syntax errors in the file that prevent parsing.
### Bypassing Hooks (Emergency)
In rare cases, you may need to bypass hooks:
```bash
git commit --no-verify -m "emergency fix"
```
Use sparingly - the CI pipeline will still catch formatting issues.

View File

@@ -0,0 +1,149 @@
# ADR-028: API Response Standardization and Envelope Pattern
**Date**: 2026-01-09
**Status**: Proposed
## Context
The API currently has inconsistent response formats across different endpoints:
1. Some endpoints return raw data arrays (`[{...}, {...}]`)
2. Some return wrapped objects (`{ data: [...] }`)
3. Pagination is handled inconsistently (some use `page`/`limit`, others use `offset`/`count`)
4. Error responses vary in structure between middleware and route handlers
5. No standard for including metadata (pagination info, request timing, etc.)
This inconsistency creates friction for:
- Frontend developers who must handle multiple response formats
- API documentation and client SDK generation
- Implementing consistent error handling across the application
- Future API versioning transitions
## Decision
We will adopt a standardized response envelope pattern for all API responses.
### Success Response Format
```typescript
interface ApiSuccessResponse<T> {
success: true;
data: T;
meta?: {
// Pagination (when applicable)
pagination?: {
page: number;
limit: number;
total: number;
totalPages: number;
hasNextPage: boolean;
hasPrevPage: boolean;
};
// Timing
requestId?: string;
timestamp?: string;
duration?: number;
};
}
```
### Error Response Format
```typescript
interface ApiErrorResponse {
success: false;
error: {
code: string; // Machine-readable error code (e.g., 'VALIDATION_ERROR')
message: string; // Human-readable message
details?: unknown; // Additional context (validation errors, etc.)
};
meta?: {
requestId?: string;
timestamp?: string;
};
}
```
### Implementation Approach
1. **Response Helper Functions**: Create utility functions in `src/utils/apiResponse.ts`:
- `sendSuccess(res, data, meta?)`
- `sendPaginated(res, data, pagination)`
- `sendError(res, code, message, details?, statusCode?)`
2. **Error Handler Integration**: Update `errorHandler.ts` to use the standard error format
3. **Gradual Migration**: Apply to new endpoints immediately, migrate existing endpoints incrementally
4. **TypeScript Types**: Export response types for frontend consumption
## Consequences
### Positive
- **Consistency**: All responses follow a predictable structure
- **Type Safety**: Frontend can rely on consistent types
- **Debugging**: Request IDs and timestamps aid in issue investigation
- **Pagination**: Standardized pagination metadata reduces frontend complexity
- **API Evolution**: Envelope pattern makes it easier to add fields without breaking changes
### Negative
- **Verbosity**: Responses are slightly larger due to envelope overhead
- **Migration Effort**: Existing endpoints need updating
- **Learning Curve**: Developers must learn and use the helper functions
## Implementation Status
### What's Implemented
- ❌ Not yet implemented
### What Needs To Be Done
1. Create `src/utils/apiResponse.ts` with helper functions
2. Create `src/types/api.ts` with response type definitions
3. Update `errorHandler.ts` to use standard error format
4. Create migration guide for existing endpoints
5. Update 2-3 routes as examples
6. Document pattern in this ADR
## Example Usage
```typescript
// In a route handler
router.get('/flyers', async (req, res, next) => {
try {
const { page = 1, limit = 20 } = req.query;
const { flyers, total } = await flyerService.getFlyers({ page, limit });
return sendPaginated(res, flyers, {
page,
limit,
total,
});
} catch (error) {
next(error);
}
});
// Response:
// {
// "success": true,
// "data": [...],
// "meta": {
// "pagination": {
// "page": 1,
// "limit": 20,
// "total": 150,
// "totalPages": 8,
// "hasNextPage": true,
// "hasPrevPage": false
// },
// "requestId": "abc-123",
// "timestamp": "2026-01-09T12:00:00.000Z"
// }
// }
```

View File

@@ -0,0 +1,147 @@
# ADR-029: Secret Rotation and Key Management Strategy
**Date**: 2026-01-09
**Status**: Proposed
## Context
While ADR-007 covers configuration validation at startup, it does not address the lifecycle management of secrets:
1. **JWT Secrets**: If the JWT_SECRET is rotated, all existing user sessions are immediately invalidated
2. **Database Credentials**: No documented procedure for rotating database passwords without downtime
3. **API Keys**: External service API keys (AI services, geocoding) have no rotation strategy
4. **Emergency Revocation**: No process for immediately invalidating compromised credentials
Current risks:
- Long-lived secrets that never change become high-value targets
- No ability to rotate secrets without application restart
- No audit trail of when secrets were last rotated
- Compromised keys could remain active indefinitely
## Decision
We will implement a comprehensive secret rotation and key management strategy.
### 1. JWT Secret Rotation with Dual-Key Support
Support multiple JWT secrets simultaneously to enable zero-downtime rotation:
```typescript
// Environment variables
JWT_SECRET = current_secret;
JWT_SECRET_PREVIOUS = old_secret; // Optional, for transition period
// Token verification tries current first, falls back to previous
const verifyToken = (token: string) => {
try {
return jwt.verify(token, process.env.JWT_SECRET);
} catch {
if (process.env.JWT_SECRET_PREVIOUS) {
return jwt.verify(token, process.env.JWT_SECRET_PREVIOUS);
}
throw new AuthenticationError('Invalid token');
}
};
```
### 2. Database Credential Rotation
Document and implement a procedure for PostgreSQL credential rotation:
1. Create new database user with identical permissions
2. Update application configuration to use new credentials
3. Restart application instances (rolling restart)
4. Remove old database user after all instances updated
5. Log rotation event for audit purposes
### 3. API Key Management
For external service API keys (Google AI, geocoding services):
1. **Naming Convention**: `{SERVICE}_API_KEY` and `{SERVICE}_API_KEY_PREVIOUS`
2. **Fallback Logic**: Try primary key, fall back to previous on 401/403
3. **Health Checks**: Validate API keys on startup
4. **Usage Logging**: Track which key is being used for each request
### 4. Emergency Revocation Procedures
Document emergency procedures for:
- **JWT Compromise**: Set new JWT_SECRET, clear all refresh tokens from database
- **Database Compromise**: Rotate credentials immediately, audit access logs
- **API Key Compromise**: Regenerate at provider, update environment, restart
### 5. Secret Audit Trail
Track secret lifecycle events:
- When secrets were last rotated
- Who initiated the rotation
- Which instances are using which secrets
## Implementation Approach
### Phase 1: Dual JWT Secret Support
- Modify token verification to support fallback secret
- Add JWT_SECRET_PREVIOUS to configuration schema
- Update documentation
### Phase 2: Rotation Scripts
- Create `scripts/rotate-jwt-secret.sh`
- Create `scripts/rotate-db-credentials.sh`
- Add rotation instructions to operations runbook
### Phase 3: API Key Fallback
- Wrap external API clients with fallback logic
- Add key validation to health checks
- Implement key usage logging
## Consequences
### Positive
- **Zero-Downtime Rotation**: Secrets can be rotated without invalidating all sessions
- **Reduced Risk**: Regular rotation limits exposure window for compromised credentials
- **Audit Trail**: Clear record of when secrets were changed
- **Emergency Response**: Documented procedures for security incidents
### Negative
- **Complexity**: Dual-key logic adds code complexity
- **Operations Overhead**: Regular rotation requires operational discipline
- **Testing**: Rotation procedures need to be tested periodically
## Implementation Status
### What's Implemented
- ❌ Not yet implemented
### What Needs To Be Done
1. Implement dual JWT secret verification
2. Create rotation scripts
3. Document emergency procedures
4. Add secret validation to health checks
5. Create rotation schedule recommendations
## Key Files (To Be Created)
- `src/utils/secretManager.ts` - Secret rotation utilities
- `scripts/rotate-jwt-secret.sh` - JWT rotation script
- `scripts/rotate-db-credentials.sh` - Database credential rotation
- `docs/operations/secret-rotation.md` - Operations runbook
## Rotation Schedule Recommendations
| Secret Type | Rotation Frequency | Grace Period |
| ------------------ | -------------------------- | ----------------- |
| JWT_SECRET | 90 days | 7 days (dual-key) |
| Database Passwords | 180 days | Rolling restart |
| AI API Keys | On suspicion of compromise | Immediate |
| Refresh Tokens | 7-day max age | N/A (per-token) |

View File

@@ -0,0 +1,150 @@
# ADR-030: Graceful Degradation and Circuit Breaker Pattern
**Date**: 2026-01-09
**Status**: Proposed
## Context
The application depends on several external services:
1. **AI Services** (Google Gemini) - For flyer item extraction
2. **Redis** - For caching, rate limiting, and job queues
3. **PostgreSQL** - Primary data store
4. **Geocoding APIs** - For location services
Currently, when these services fail:
- AI failures may cause the entire upload to fail
- Redis unavailability could crash the application or bypass rate limiting
- No circuit breakers prevent repeated calls to failing services
- No fallback behaviors are defined
This creates fragility where a single service outage can cascade into application-wide failures.
## Decision
We will implement a graceful degradation strategy with circuit breakers for external service dependencies.
### 1. Circuit Breaker Pattern
Implement circuit breakers for external service calls using a library like `opossum`:
```typescript
import CircuitBreaker from 'opossum';
const aiCircuitBreaker = new CircuitBreaker(callAiService, {
timeout: 30000, // 30 second timeout
errorThresholdPercentage: 50, // Open circuit at 50% failures
resetTimeout: 30000, // Try again after 30 seconds
volumeThreshold: 5, // Minimum calls before calculating error %
});
aiCircuitBreaker.on('open', () => {
logger.warn('AI service circuit breaker opened');
});
aiCircuitBreaker.on('halfOpen', () => {
logger.info('AI service circuit breaker half-open, testing...');
});
```
### 2. Fallback Behaviors by Service
| Service | Fallback Behavior |
| ---------------------- | ---------------------------------------- |
| **Redis (Cache)** | Skip cache, query database directly |
| **Redis (Rate Limit)** | Log warning, allow request (fail-open) |
| **Redis (Queues)** | Queue to memory, process synchronously |
| **AI Service** | Return partial results, queue for retry |
| **Geocoding** | Return null location, allow manual entry |
| **PostgreSQL** | No fallback - critical dependency |
### 3. Health Status Aggregation
Extend health checks (ADR-020) to report service-level health:
```typescript
// GET /api/health/ready response
{
"status": "degraded", // healthy | degraded | unhealthy
"services": {
"database": { "status": "healthy", "latency": 5 },
"redis": { "status": "healthy", "latency": 2 },
"ai": { "status": "degraded", "circuitState": "half-open" },
"geocoding": { "status": "healthy", "latency": 150 }
}
}
```
### 4. Retry Strategies
Define retry policies for transient failures:
```typescript
const retryConfig = {
ai: { maxRetries: 3, backoff: 'exponential', initialDelay: 1000 },
geocoding: { maxRetries: 2, backoff: 'linear', initialDelay: 500 },
database: { maxRetries: 3, backoff: 'exponential', initialDelay: 100 },
};
```
## Implementation Approach
### Phase 1: Redis Fallbacks
- Wrap cache operations with try-catch (already partially done in cacheService)
- Add fail-open for rate limiting when Redis is down
- Log degraded state
### Phase 2: AI Circuit Breaker
- Wrap AI service calls with circuit breaker
- Implement queue-for-retry on circuit open
- Add manual fallback UI for failed extractions
### Phase 3: Health Aggregation
- Update health endpoints with service status
- Add Prometheus-compatible metrics
- Create dashboard for service health
## Consequences
### Positive
- **Resilience**: Application continues functioning during partial outages
- **User Experience**: Degraded but functional is better than complete failure
- **Observability**: Clear visibility into service health
- **Protection**: Circuit breakers prevent cascading failures
### Negative
- **Complexity**: Additional code for fallback logic
- **Testing**: Requires testing failure scenarios
- **Consistency**: Some operations may have different results during degradation
## Implementation Status
### What's Implemented
- ✅ Cache operations fail gracefully (cacheService.server.ts)
- ❌ Circuit breakers for AI services
- ❌ Rate limit fail-open behavior
- ❌ Health aggregation endpoint
- ❌ Retry strategies with backoff
### What Needs To Be Done
1. Install and configure `opossum` circuit breaker library
2. Wrap AI service calls with circuit breaker
3. Add fail-open to rate limiting
4. Extend health endpoints with service status
5. Document degraded mode behaviors
## Key Files
- `src/utils/circuitBreaker.ts` - Circuit breaker configurations (to create)
- `src/services/cacheService.server.ts` - Already has graceful fallbacks
- `src/routes/health.routes.ts` - Health check endpoints (to extend)
- `src/services/aiService.server.ts` - AI service wrapper (to wrap)

View File

@@ -0,0 +1,199 @@
# ADR-031: Data Retention and Privacy Compliance (GDPR/CCPA)
**Date**: 2026-01-09
**Status**: Proposed
## Context
The application stores various types of user data:
1. **User Accounts**: Email, password hash, profile information
2. **Shopping Lists**: Personal shopping preferences and history
3. **Watch Lists**: Tracked items and price alerts
4. **Activity Logs**: User actions for analytics and debugging
5. **Tracking Data**: Page views, interactions, feature usage
Current gaps in privacy compliance:
- **No Data Retention Policies**: Activity logs accumulate indefinitely
- **No User Data Export**: Users cannot export their data (GDPR Article 20)
- **No User Data Deletion**: No self-service account deletion (GDPR Article 17)
- **No Cookie Consent**: Cookie usage not disclosed or consented
- **No Privacy Policy Enforcement**: Privacy commitments not enforced in code
These gaps create legal exposure for users in EU (GDPR) and California (CCPA).
## Decision
We will implement comprehensive data retention and privacy compliance features.
### 1. Data Retention Policies
| Data Type | Retention Period | Deletion Method |
| ------------------------- | ------------------------ | ------------------------ |
| **Activity Logs** | 90 days | Automated cleanup job |
| **Tracking Events** | 30 days | Automated cleanup job |
| **Deleted User Data** | 30 days (soft delete) | Hard delete after period |
| **Expired Sessions** | 7 days after expiry | Token cleanup job |
| **Failed Login Attempts** | 24 hours | Automated cleanup |
| **Flyer Data** | Indefinite (public data) | N/A |
| **User Shopping Lists** | Until account deletion | With account |
| **User Watch Lists** | Until account deletion | With account |
### 2. User Data Export (Right to Portability)
Implement `GET /api/users/me/export` endpoint:
```typescript
interface UserDataExport {
exportDate: string;
user: {
email: string;
created_at: string;
profile: ProfileData;
};
shoppingLists: ShoppingList[];
watchedItems: WatchedItem[];
priceAlerts: PriceAlert[];
achievements: Achievement[];
// Exclude: password hash, internal IDs, admin flags
}
```
Export formats: JSON (primary), CSV (optional)
### 3. User Data Deletion (Right to Erasure)
Implement `DELETE /api/users/me` endpoint:
1. **Soft Delete**: Mark account as deleted, anonymize PII
2. **Grace Period**: 30 days to restore account
3. **Hard Delete**: Permanently remove all user data after grace period
4. **Audit Log**: Record deletion request (anonymized)
Deletion cascade:
- User account → Anonymize email/name
- Shopping lists → Delete
- Watch lists → Delete
- Achievements → Delete
- Activity logs → Anonymize user_id
- Sessions/tokens → Delete immediately
### 4. Cookie Consent
Implement cookie consent banner:
```typescript
// Cookie categories
enum CookieCategory {
ESSENTIAL = 'essential', // Always allowed (auth, CSRF)
FUNCTIONAL = 'functional', // Dark mode, preferences
ANALYTICS = 'analytics', // Usage tracking
}
// Store consent in localStorage and server-side
interface CookieConsent {
essential: true; // Cannot be disabled
functional: boolean;
analytics: boolean;
consentDate: string;
consentVersion: string;
}
```
### 5. Privacy Policy Enforcement
Enforce privacy commitments in code:
- Email addresses never logged in plaintext
- Passwords never logged (already in pino redact config)
- IP addresses anonymized after 7 days
- Third-party data sharing requires explicit consent
## Implementation Approach
### Phase 1: Data Retention Jobs
- Create retention cleanup job in background job service
- Add activity_log retention (90 days)
- Add tracking_events retention (30 days)
### Phase 2: User Data Export
- Create export endpoint
- Implement data aggregation query
- Add rate limiting (1 export per 24h)
### Phase 3: Account Deletion
- Implement soft delete with anonymization
- Create hard delete cleanup job
- Add account recovery endpoint
### Phase 4: Cookie Consent
- Create consent banner component
- Store consent preferences
- Gate analytics based on consent
## Consequences
### Positive
- **Legal Compliance**: Meets GDPR and CCPA requirements
- **User Trust**: Demonstrates commitment to privacy
- **Data Hygiene**: Automatic cleanup prevents data bloat
- **Reduced Liability**: Less data = less risk
### Negative
- **Implementation Effort**: Significant feature development
- **Operational Complexity**: Deletion jobs need monitoring
- **Feature Limitations**: Some features may be limited without consent
## Implementation Status
### What's Implemented
- ✅ Token cleanup job exists (tokenCleanupQueue)
- ❌ Activity log retention
- ❌ User data export endpoint
- ❌ Account deletion endpoint
- ❌ Cookie consent banner
- ❌ Data anonymization functions
### What Needs To Be Done
1. Add activity_log cleanup to background jobs
2. Create `/api/users/me/export` endpoint
3. Create `/api/users/me` DELETE endpoint with soft delete
4. Implement cookie consent UI component
5. Document data retention in privacy policy
6. Add anonymization utility functions
## Key Files (To Be Created/Modified)
- `src/services/backgroundJobService.ts` - Add retention jobs
- `src/routes/user.routes.ts` - Add export/delete endpoints
- `src/services/privacyService.server.ts` - Data export/deletion logic
- `src/components/CookieConsent.tsx` - Consent banner
- `src/utils/anonymize.ts` - Data anonymization utilities
## Compliance Checklist
### GDPR Requirements
- [ ] Article 15: Right of Access (data export)
- [ ] Article 17: Right to Erasure (account deletion)
- [ ] Article 20: Right to Data Portability (JSON export)
- [ ] Article 7: Conditions for Consent (cookie consent)
- [ ] Article 13: Information to be Provided (privacy policy)
### CCPA Requirements
- [ ] Right to Know (data export)
- [ ] Right to Delete (account deletion)
- [ ] Right to Opt-Out (cookie consent for analytics)
- [ ] Non-Discrimination (no feature penalty for privacy choices)

View File

@@ -4,49 +4,55 @@ This directory contains a log of the architectural decisions made for the Flyer
## 1. Foundational / Core Infrastructure
**[ADR-002](./0002-standardized-transaction-management.md)**: Standardized Transaction Management and Unit of Work Pattern (Proposed)
**[ADR-007](./0007-configuration-and-secrets-management.md)**: Configuration and Secrets Management (Proposed)
**[ADR-020](./0020-health-checks-and-liveness-readiness-probes.md)**: Health Checks and Liveness/Readiness Probes (Proposed)
**[ADR-002](./0002-standardized-transaction-management.md)**: Standardized Transaction Management and Unit of Work Pattern (Accepted)
**[ADR-007](./0007-configuration-and-secrets-management.md)**: Configuration and Secrets Management (Accepted)
**[ADR-020](./0020-health-checks-and-liveness-readiness-probes.md)**: Health Checks and Liveness/Readiness Probes (Accepted)
**[ADR-030](./0030-graceful-degradation-and-circuit-breaker.md)**: Graceful Degradation and Circuit Breaker Pattern (Proposed)
## 2. Data Management
**[ADR-009](./0009-caching-strategy-for-read-heavy-operations.md)**: Caching Strategy for Read-Heavy Operations (Proposed)
**[ADR-009](./0009-caching-strategy-for-read-heavy-operations.md)**: Caching Strategy for Read-Heavy Operations (Partially Implemented)
**[ADR-013](./0013-database-schema-migration-strategy.md)**: Database Schema Migration Strategy (Proposed)
**[ADR-019](./0019-data-backup-and-recovery-strategy.md)**: Data Backup and Recovery Strategy (Proposed)
**[ADR-023](./0023-database-schema-migration-strategy.md)**: Database Schema Migration Strategy (Proposed)
**[ADR-031](./0031-data-retention-and-privacy-compliance.md)**: Data Retention and Privacy Compliance (Proposed)
## 3. API & Integration
**[ADR-003](./0003-standardized-input-validation-using-middleware.md)**: Standardized Input Validation using Middleware (Proposed)
**[ADR-003](./0003-standardized-input-validation-using-middleware.md)**: Standardized Input Validation using Middleware (Accepted)
**[ADR-008](./0008-api-versioning-strategy.md)**: API Versioning Strategy (Proposed)
**[ADR-018](./0018-api-documentation-strategy.md)**: API Documentation Strategy (Proposed)
**[ADR-022](./0022-real-time-notification-system.md)**: Real-time Notification System (Proposed)
**[ADR-028](./0028-api-response-standardization.md)**: API Response Standardization and Envelope Pattern (Proposed)
## 4. Security & Compliance
**[ADR-001](./0001-standardized-error-handling.md)**: Standardized Error Handling for Service and Repository Layers (Accepted)
**[ADR-011](./0011-advanced-authorization-and-access-control-strategy.md)**: Advanced Authorization and Access Control Strategy (Proposed)
**[ADR-016](./0016-api-security-hardening.md)**: API Security Hardening (Proposed)
**[ADR-016](./0016-api-security-hardening.md)**: API Security Hardening (Accepted)
**[ADR-029](./0029-secret-rotation-and-key-management.md)**: Secret Rotation and Key Management Strategy (Proposed)
## 5. Observability & Monitoring
**[ADR-004](./0004-standardized-application-wide-structured-logging.md)**: Standardized Application-Wide Structured Logging (Proposed)
**[ADR-004](./0004-standardized-application-wide-structured-logging.md)**: Standardized Application-Wide Structured Logging (Accepted)
**[ADR-015](./0015-application-performance-monitoring-and-error-tracking.md)**: Application Performance Monitoring (APM) and Error Tracking (Proposed)
## 6. Deployment & Operations
**[ADR-006](./0006-background-job-processing-and-task-queues.md)**: Background Job Processing and Task Queues (Proposed)
**[ADR-006](./0006-background-job-processing-and-task-queues.md)**: Background Job Processing and Task Queues (Partially Implemented)
**[ADR-014](./0014-containerization-and-deployment-strategy.md)**: Containerization and Deployment Strategy (Proposed)
**[ADR-017](./0017-ci-cd-and-branching-strategy.md)**: CI/CD and Branching Strategy (Proposed)
**[ADR-024](./0024-feature-flagging-strategy.md)**: Feature Flagging Strategy (Proposed)
## 7. Frontend / User Interface
**[ADR-005](./0005-frontend-state-management-and-server-cache-strategy.md)**: Frontend State Management and Server Cache Strategy (Proposed)
**[ADR-012](./0012-frontend-component-library-and-design-system.md)**: Frontend Component Library and Design System (Proposed)
**[ADR-005](./0005-frontend-state-management-and-server-cache-strategy.md)**: Frontend State Management and Server Cache Strategy (Accepted)
**[ADR-012](./0012-frontend-component-library-and-design-system.md)**: Frontend Component Library and Design System (Partially Implemented)
**[ADR-025](./0025-internationalization-and-localization-strategy.md)**: Internationalization (i18n) and Localization (l10n) Strategy (Proposed)
**[ADR-026](./0026-standardized-client-side-structured-logging.md)**: Standardized Client-Side Structured Logging (Proposed)
## 8. Development Workflow & Quality
**[ADR-010](./0010-testing-strategy-and-standards.md)**: Testing Strategy and Standards (Proposed)
**[ADR-021](./0021-code-formatting-and-linting-unification.md)**: Code Formatting and Linting Unification (Proposed)
**[ADR-010](./0010-testing-strategy-and-standards.md)**: Testing Strategy and Standards (Accepted)
**[ADR-021](./0021-code-formatting-and-linting-unification.md)**: Code Formatting and Linting Unification (Accepted)
**[ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md)**: Standardized Naming Convention for AI and Database Types (Accepted)