integration test fixes - claude for the win? try 4 - i have a good feeling

2026-01-09 05:55:55 -08:00
parent 1814469eb4
commit 4a04e478c4
20 changed files with 3006 additions and 570 deletions
--- a/docs/adr/0007-configuration-and-secrets-management.md
+++ b/docs/adr/0007-configuration-and-secrets-management.md
@@ -2,7 +2,9 @@

 **Date**: 2025-12-12

-**Status**: Proposed
+**Status**: Accepted
+
+**Implemented**: 2026-01-09

 ## Context

@@ -16,3 +18,216 @@ We will introduce a centralized, schema-validated configuration service. We will

 **Positive**: Improves application reliability and developer experience by catching configuration errors at startup rather than at runtime. Provides a single source of truth for all required configuration.
 **Negative**: Adds a small amount of boilerplate for defining the configuration schema. Requires a one-time effort to refactor all `process.env` access points to use the new configuration service.
+
+## Implementation Status
+
+### What's Implemented
+
+- ✅ **Centralized Configuration Schema** - Zod-based validation in `src/config/env.ts`
+- ✅ **Type-Safe Access** - Full TypeScript types for all configuration
+- ✅ **Fail-Fast Startup** - Clear error messages for missing/invalid config
+- ✅ **Environment Helpers** - `isProduction`, `isTest`, `isDevelopment` exports
+- ✅ **Service Configuration Helpers** - `isSmtpConfigured`, `isAiConfigured`, etc.
+
+### Migration Status
+
+- ⏳ Gradual migration of `process.env` access to `config.*` in progress
+- Legacy `process.env` access still works during transition
+
+## Implementation Details
+
+### Configuration Schema
+
+The configuration is organized into logical groups:
+
+```typescript
+import { config, isProduction, isTest } from './config/env';
+
+// Database
+config.database.host; // DB_HOST
+config.database.port; // DB_PORT (default: 5432)
+config.database.user; // DB_USER
+config.database.password; // DB_PASSWORD
+config.database.name; // DB_NAME
+
+// Redis
+config.redis.url; // REDIS_URL
+config.redis.password; // REDIS_PASSWORD (optional)
+
+// Authentication
+config.auth.jwtSecret; // JWT_SECRET (min 32 chars)
+config.auth.jwtSecretPrevious; // JWT_SECRET_PREVIOUS (for rotation)
+
+// SMTP (all optional - email degrades gracefully)
+config.smtp.host; // SMTP_HOST
+config.smtp.port; // SMTP_PORT (default: 587)
+config.smtp.user; // SMTP_USER
+config.smtp.pass; // SMTP_PASS
+config.smtp.secure; // SMTP_SECURE (default: false)
+config.smtp.fromEmail; // SMTP_FROM_EMAIL
+
+// AI Services
+config.ai.geminiApiKey; // GEMINI_API_KEY
+config.ai.geminiRpm; // GEMINI_RPM (default: 5)
+config.ai.priceQualityThreshold; // AI_PRICE_QUALITY_THRESHOLD (default: 0.5)
+
+// Google Services
+config.google.mapsApiKey; // GOOGLE_MAPS_API_KEY (optional)
+config.google.clientId; // GOOGLE_CLIENT_ID (optional)
+config.google.clientSecret; // GOOGLE_CLIENT_SECRET (optional)
+
+// Worker Configuration
+config.worker.concurrency; // WORKER_CONCURRENCY (default: 1)
+config.worker.lockDuration; // WORKER_LOCK_DURATION (default: 30000)
+config.worker.emailConcurrency; // EMAIL_WORKER_CONCURRENCY (default: 10)
+config.worker.analyticsConcurrency; // ANALYTICS_WORKER_CONCURRENCY (default: 1)
+config.worker.cleanupConcurrency; // CLEANUP_WORKER_CONCURRENCY (default: 10)
+config.worker.weeklyAnalyticsConcurrency; // WEEKLY_ANALYTICS_WORKER_CONCURRENCY (default: 1)
+
+// Server
+config.server.nodeEnv; // NODE_ENV (development/production/test)
+config.server.port; // PORT (default: 3001)
+config.server.frontendUrl; // FRONTEND_URL
+config.server.baseUrl; // BASE_URL
+config.server.storagePath; // STORAGE_PATH (default: /var/www/.../flyer-images)
+```
+
+### Convenience Helpers
+
+```typescript
+import { isProduction, isTest, isDevelopment, isSmtpConfigured } from './config/env';
+
+// Environment checks
+if (isProduction) {
+  // Production-only logic
+}
+
+// Service availability checks
+if (isSmtpConfigured) {
+  await sendEmail(...);
+} else {
+  logger.warn('Email not configured, skipping notification');
+}
+```
+
+### Fail-Fast Error Messages
+
+When configuration is invalid, the application exits with a clear error:
+
+```text
+╔════════════════════════════════════════════════════════════════╗
+║           CONFIGURATION ERROR - APPLICATION STARTUP            ║
+╚════════════════════════════════════════════════════════════════╝
+
+The following environment variables are missing or invalid:
+
+  - database.host: DB_HOST is required
+  - auth.jwtSecret: JWT_SECRET must be at least 32 characters for security
+
+Please check your .env file or environment configuration.
+See ADR-007 for the complete list of required environment variables.
+```
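+
+A minimal, dependency-free sketch of the kind of check that could produce the message above. The real schema is Zod-based and lives in `src/config/env.ts`; the `EnvLike` type and the `collectConfigErrors` and `assertValidConfig` helpers are hypothetical names, and only two of the required variables are covered:
+
+```typescript
+// Hypothetical stand-in for the Zod schema; names are illustrative only.
+type EnvLike = Record<string, string | undefined>;
+
+// Collect every problem instead of stopping at the first one,
+// so a single startup attempt reports the full list.
+function collectConfigErrors(env: EnvLike): string[] {
+  const errors: string[] = [];
+  if (!env.DB_HOST) {
+    errors.push('database.host: DB_HOST is required');
+  }
+  if (!env.JWT_SECRET || env.JWT_SECRET.length < 32) {
+    errors.push('auth.jwtSecret: JWT_SECRET must be at least 32 characters for security');
+  }
+  return errors;
+}
+
+// Throws at startup so the process can stop before serving traffic.
+function assertValidConfig(env: EnvLike): void {
+  const errors = collectConfigErrors(env);
+  if (errors.length > 0) {
+    const lines = errors.map((e) => `  - ${e}`).join('\n');
+    throw new Error(`The following environment variables are missing or invalid:\n\n${lines}`);
+  }
+}
+```
+
+Calling `assertValidConfig(process.env)` once, before any other module reads configuration, is what would turn a late runtime failure into an immediate startup failure.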
+
+### Usage Example
+
+```typescript
+// Before (direct process.env access)
+const pool = new Pool({
+  host: process.env.DB_HOST,
+  port: parseInt(process.env.DB_PORT || '5432', 10),
+  user: process.env.DB_USER,
+  password: process.env.DB_PASSWORD,
+  database: process.env.DB_NAME,
+});
+
+// After (type-safe config access)
+import { config } from './config/env';
+
+const pool = new Pool({
+  host: config.database.host,
+  port: config.database.port,
+  user: config.database.user,
+  password: config.database.password,
+  database: config.database.name,
+});
+```
+
+## Required Environment Variables
+
+### Critical (Application will not start without these)
+
+| Variable      | Description                                           |
+| ------------- | ----------------------------------------------------- |
+| `DB_HOST`     | PostgreSQL database host                              |
+| `DB_USER`     | PostgreSQL database user                              |
+| `DB_PASSWORD` | PostgreSQL database password                          |
+| `DB_NAME`     | PostgreSQL database name                              |
+| `REDIS_URL`   | Redis connection URL (e.g., `redis://localhost:6379`) |
+| `JWT_SECRET`  | JWT signing secret (minimum 32 characters)            |
+
+### Optional with Defaults
+
+| Variable                     | Default                   | Description                     |
+| ---------------------------- | ------------------------- | ------------------------------- |
+| `DB_PORT`                    | 5432                      | PostgreSQL port                 |
+| `PORT`                       | 3001                      | Server HTTP port                |
+| `NODE_ENV`                   | development               | Environment mode                |
+| `STORAGE_PATH`               | /var/www/.../flyer-images | File upload directory           |
+| `SMTP_PORT`                  | 587                       | SMTP server port                |
+| `SMTP_SECURE`                | false                     | Use TLS for SMTP                |
+| `GEMINI_RPM`                 | 5                         | Gemini API requests per minute  |
+| `AI_PRICE_QUALITY_THRESHOLD` | 0.5                       | AI extraction quality threshold |
+| `WORKER_CONCURRENCY`         | 1                         | Flyer processing concurrency    |
+| `WORKER_LOCK_DURATION`       | 30000                     | Worker lock duration (ms)       |
+
+### Optional (Feature-specific)
+
+| Variable              | Description                                 |
+| --------------------- | ------------------------------------------- |
+| `GEMINI_API_KEY`      | Google Gemini API key (enables AI features) |
+| `GOOGLE_MAPS_API_KEY` | Google Maps API key (enables geocoding)     |
+| `SMTP_HOST`           | SMTP server (enables email notifications)   |
+| `SMTP_USER`           | SMTP authentication username                |
+| `SMTP_PASS`           | SMTP authentication password                |
+| `SMTP_FROM_EMAIL`     | Sender email address                        |
+| `FRONTEND_URL`        | Frontend URL for email links                |
+| `JWT_SECRET_PREVIOUS` | Previous JWT secret for rotation (ADR-029)  |
+
+## Key Files
+
+- `src/config/env.ts` - Configuration schema and validation
+- `.env.example` - Template for required environment variables
+
+## Migration Guide
+
+To migrate existing `process.env` usage:
+
+1. Import the config:
+
+   ```typescript
+   import { config, isProduction } from '../config/env';
+   ```
+
+2. Replace direct access:
+
+   ```typescript
+   // Before
+   process.env.DB_HOST;
+   process.env.NODE_ENV === 'production';
+   parseInt(process.env.PORT || '3001', 10);
+
+   // After
+   config.database.host;
+   isProduction;
+   config.server.port;
+   ```
+
+3. Use service helpers for optional features:
+
+   ```typescript
+   import { isSmtpConfigured, isAiConfigured } from '../config/env';
+
+   if (isSmtpConfigured) {
+     // Email is available
+   }
+   ```
--- a/docs/adr/0020-health-checks-and-liveness-readiness-probes.md
+++ b/docs/adr/0020-health-checks-and-liveness-readiness-probes.md
@@ -2,7 +2,9 @@

 **Date**: 2025-12-12

-**Status**: Proposed
+**Status**: Accepted
+
+**Implemented**: 2026-01-09

 ## Context

@@ -20,3 +22,195 @@ We will implement dedicated health check endpoints in the Express application.

 - **Positive**: Enables robust, automated application lifecycle management in a containerized environment. Prevents traffic from being sent to unhealthy or uninitialized application instances.
 - **Negative**: Adds a small amount of code for the health check endpoints. Requires configuration in the container orchestration layer.
+
+## Implementation Status
+
+### What's Implemented
+
+- ✅ **Liveness Probe** (`/api/health/live`) - Simple process health check
+- ✅ **Readiness Probe** (`/api/health/ready`) - Comprehensive dependency health check
+- ✅ **Startup Probe** (`/api/health/startup`) - Initial startup verification
+- ✅ **Individual Service Checks** - Database, Redis, Storage endpoints
+- ✅ **Detailed Health Response** - Service latency, status, and details
+
+## Implementation Details
+
+### Probe Endpoints
+
+| Endpoint              | Purpose         | Checks             | HTTP Status                   |
+| --------------------- | --------------- | ------------------ | ----------------------------- |
+| `/api/health/live`    | Liveness probe  | Process running    | 200 = alive                   |
+| `/api/health/ready`   | Readiness probe | DB, Redis, Storage | 200 = ready, 503 = not ready  |
+| `/api/health/startup` | Startup probe   | Database only      | 200 = started, 503 = starting |
+
+### Liveness Probe
+
+The liveness probe is intentionally simple with no external dependencies:
+
+```typescript
+// GET /api/health/live
+{
+  "status": "ok",
+  "timestamp": "2026-01-09T12:00:00.000Z"
+}
+```
+
+**Usage**: If this endpoint fails to respond, the container should be restarted.
+
+### Readiness Probe
+
+The readiness probe checks all critical dependencies:
+
+```typescript
+// GET /api/health/ready
+{
+  "status": "healthy",  // healthy | degraded | unhealthy
+  "timestamp": "2026-01-09T12:00:00.000Z",
+  "uptime": 3600.5,
+  "services": {
+    "database": {
+      "status": "healthy",
+      "latency": 5,
+      "details": {
+        "totalConnections": 10,
+        "idleConnections": 8,
+        "waitingConnections": 0
+      }
+    },
+    "redis": {
+      "status": "healthy",
+      "latency": 2
+    },
+    "storage": {
+      "status": "healthy",
+      "latency": 1,
+      "details": {
+        "path": "/var/www/.../flyer-images"
+      }
+    }
+  }
+}
+```
+
+**Status Logic**:
+
+- `healthy` - All critical services (database, Redis) are healthy
+- `degraded` - Some non-critical issues (high connection wait, storage issues)
+- `unhealthy` - Critical service unavailable (returns 503)
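+
+The status logic above can be sketched as a small pure function. This is an illustration rather than the code in `src/routes/health.routes.ts`: the `aggregateStatus` and `httpStatusFor` names are hypothetical, and returning 200 for `degraded` is an assumption based on only `unhealthy` being documented as 503:
+
+```typescript
+type ServiceStatus = 'healthy' | 'degraded' | 'unhealthy';
+
+interface ServiceStatuses {
+  database: ServiceStatus;
+  redis: ServiceStatus;
+  storage: ServiceStatus;
+}
+
+// Critical services (database, Redis) decide whether the instance is unhealthy.
+// Any other non-healthy result only downgrades the overall status to "degraded".
+function aggregateStatus(s: ServiceStatuses): ServiceStatus {
+  if (s.database === 'unhealthy' || s.redis === 'unhealthy') {
+    return 'unhealthy';
+  }
+  const allHealthy =
+    s.database === 'healthy' && s.redis === 'healthy' && s.storage === 'healthy';
+  return allHealthy ? 'healthy' : 'degraded';
+}
+
+// Assumption: only "unhealthy" maps to 503; "degraded" still accepts traffic.
+function httpStatusFor(status: ServiceStatus): number {
+  return status === 'unhealthy' ? 503 : 200;
+}
+```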
+
+### Startup Probe
+
+The startup probe is used during container initialization:
+
+```typescript
+// GET /api/health/startup
+// Success (200):
+{
+  "status": "started",
+  "timestamp": "2026-01-09T12:00:00.000Z",
+  "database": { "status": "healthy", "latency": 5 }
+}
+
+// Still starting (503):
+{
+  "status": "starting",
+  "message": "Waiting for database connection",
+  "database": { "status": "unhealthy", "message": "..." }
+}
+```
+
+### Individual Service Endpoints
+
+For detailed diagnostics:
+
+| Endpoint                | Purpose                         |
+| ----------------------- | ------------------------------- |
+| `/api/health/ping`      | Simple server responsiveness    |
+| `/api/health/db-schema` | Verify database tables exist    |
+| `/api/health/db-pool`   | Database connection pool status |
+| `/api/health/redis`     | Redis connectivity              |
+| `/api/health/storage`   | File storage accessibility      |
+| `/api/health/time`      | Server time synchronization     |
+
+## Kubernetes Configuration Example
+
+```yaml
+apiVersion: v1
+kind: Pod
+spec:
+  containers:
+    - name: flyer-crawler
+      livenessProbe:
+        httpGet:
+          path: /api/health/live
+          port: 3001
+        initialDelaySeconds: 10
+        periodSeconds: 15
+        failureThreshold: 3
+
+      readinessProbe:
+        httpGet:
+          path: /api/health/ready
+          port: 3001
+        initialDelaySeconds: 5
+        periodSeconds: 10
+        failureThreshold: 3
+
+      startupProbe:
+        httpGet:
+          path: /api/health/startup
+          port: 3001
+        initialDelaySeconds: 0
+        periodSeconds: 5
+        failureThreshold: 30 # Allow up to 150 seconds for startup
+```
+
+## Docker Compose Configuration Example
+
+```yaml
+services:
+  api:
+    image: flyer-crawler:latest
+    healthcheck:
+      test: ['CMD', 'curl', '-f', 'http://localhost:3001/api/health/ready']
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 40s
+```
+
+## PM2 Configuration Example
+
+For non-containerized deployments using PM2:
+
+```javascript
+// ecosystem.config.js
+module.exports = {
+  apps: [
+    {
+      name: 'flyer-crawler',
+      script: 'dist/server.js',
+      // NOTE: PM2 itself has no built-in HTTP health check option.
+      // This block is illustrative and needs an external monitor or module to act on it
+      health_check: {
+        url: 'http://localhost:3001/api/health/ready',
+        interval: 30000,
+        timeout: 10000,
+      },
+    },
+  ],
+};
+```
+
+## Key Files
+
+- `src/routes/health.routes.ts` - Health check endpoint implementations
+- `server.ts` - Health routes mounted at `/api/health`
+
+## Service Health Thresholds
+
+| Service  | Healthy                | Degraded                | Unhealthy           |
+| -------- | ---------------------- | ----------------------- | ------------------- |
+| Database | Responds to `SELECT 1` | > 3 waiting connections | Connection fails    |
+| Redis    | `PING` returns `PONG`  | N/A                     | Connection fails    |
+| Storage  | Write access to path   | N/A                     | Path not accessible |
--- a/docs/adr/0021-code-formatting-and-linting-unification.md
+++ b/docs/adr/0021-code-formatting-and-linting-unification.md
@@ -2,7 +2,9 @@

 **Date**: 2025-12-12

-**Status**: Proposed
+**Status**: Accepted
+
+**Implemented**: 2026-01-09

 ## Context

@@ -10,10 +12,171 @@ The project contains both frontend (React) and backend (Node.js) code. While lin

 ## Decision

-We will mandate the use of **Prettier** for automated code formatting and a unified **ESLint** configuration for code quality rules across both frontend and backend. This will be enforced automatically using a pre-commit hook managed by a tool like **Husky**.
+We will mandate the use of **Prettier** for automated code formatting and a unified **ESLint** configuration for code quality rules across both frontend and backend. This will be enforced automatically using a pre-commit hook managed by **Husky** and **lint-staged**.

 ## Consequences

 **Positive**: Improves developer experience and team velocity by automating code consistency. Reduces time spent on stylistic code review comments. Enhances code readability and maintainability.

 **Negative**: Requires an initial setup and configuration of Prettier, ESLint, and Husky. May require a one-time reformatting of the entire codebase.
+
+## Implementation Status
+
+### What's Implemented
+
+- ✅ **Prettier Configuration** - `.prettierrc` with consistent settings
+- ✅ **Prettier Ignore** - `.prettierignore` to exclude generated files
+- ✅ **ESLint Configuration** - `eslint.config.js` with TypeScript and React support
+- ✅ **ESLint + Prettier Integration** - `eslint-config-prettier` to avoid conflicts
+- ✅ **Husky Pre-commit Hooks** - Automatic enforcement on commit
+- ✅ **lint-staged** - Run linters only on staged files for performance
+
+## Implementation Details
+
+### Prettier Configuration
+
+The project uses a consistent Prettier configuration in `.prettierrc`:
+
+```json
+{
+  "semi": true,
+  "trailingComma": "all",
+  "singleQuote": true,
+  "printWidth": 100,
+  "tabWidth": 2,
+  "useTabs": false,
+  "endOfLine": "auto"
+}
+```
+
+### ESLint Configuration
+
+ESLint is configured with:
+
+- TypeScript support via `typescript-eslint`
+- React hooks rules via `eslint-plugin-react-hooks`
+- React Refresh support for HMR
+- Prettier compatibility via `eslint-config-prettier`
+
+```javascript
+// eslint.config.js (ESLint v9 flat config)
+import globals from 'globals';
+import tseslint from 'typescript-eslint';
+import pluginReact from 'eslint-plugin-react';
+import pluginReactHooks from 'eslint-plugin-react-hooks';
+import pluginReactRefresh from 'eslint-plugin-react-refresh';
+import eslintConfigPrettier from 'eslint-config-prettier';
+
+export default tseslint.config(
+  // ... configurations
+  eslintConfigPrettier, // Must be last to override formatting rules
+);
+```
+
+### Pre-commit Hook
+
+The pre-commit hook runs lint-staged automatically:
+
+```bash
+# .husky/pre-commit
+npx lint-staged
+```
+
+### lint-staged Configuration
+
+lint-staged runs appropriate tools based on file type:
+
+```json
+{
+  "*.{js,jsx,ts,tsx}": ["eslint --fix", "prettier --write"],
+  "*.{json,md,css,html,yml,yaml}": ["prettier --write"]
+}
+```
+
+### NPM Scripts
+
+| Script             | Description                                    |
+| ------------------ | ---------------------------------------------- |
+| `npm run format`   | Format all files with Prettier                 |
+| `npm run lint`     | Run ESLint on all TypeScript/JavaScript files  |
+| `npm run validate` | Run Prettier check + TypeScript check + ESLint |
+
+## Key Files
+
+| File                 | Purpose                          |
+| -------------------- | -------------------------------- |
+| `.prettierrc`        | Prettier configuration           |
+| `.prettierignore`    | Files to exclude from formatting |
+| `eslint.config.js`   | ESLint flat configuration (v9)   |
+| `.husky/pre-commit`  | Pre-commit hook script           |
+| `.lintstagedrc.json` | lint-staged configuration        |
+
+## Developer Workflow
+
+### Automatic Formatting on Commit
+
+When you commit changes:
+
+1. Husky intercepts the commit
+2. lint-staged identifies staged files
+3. ESLint fixes auto-fixable issues
+4. Prettier formats the code
+5. Changes are automatically staged
+6. Commit proceeds if no errors
+
+### Manual Formatting
+
+```bash
+# Format entire codebase
+npm run format
+
+# Check formatting without changes
+npx prettier --check .
+
+# Run ESLint
+npm run lint
+
+# Run all validation checks
+npm run validate
+```
+
+### IDE Integration
+
+For the best experience, configure your IDE:
+
+**VS Code** - Install extensions:
+
+- Prettier - Code formatter
+- ESLint
+
+Add to `.vscode/settings.json`:
+
+```json
+{
+  "editor.defaultFormatter": "esbenp.prettier-vscode",
+  "editor.formatOnSave": true,
+  "editor.codeActionsOnSave": {
+    "source.fixAll.eslint": "explicit"
+  }
+}
+```
+
+## Troubleshooting
+
+### "eslint --fix failed"
+
+ESLint may fail on unfixable errors. Review the output and manually fix the issues.
+
+### "prettier --write failed"
+
+Check for syntax errors in the file that prevent parsing.
+
+### Bypassing Hooks (Emergency)
+
+In rare cases, you may need to bypass hooks:
+
+```bash
+git commit --no-verify -m "emergency fix"
+```
+
+Use sparingly - the CI pipeline will still catch formatting issues.
--- a/docs/adr/0028-api-response-standardization.md
+++ b/docs/adr/0028-api-response-standardization.md
@@ -0,0 +1,149 @@
+# ADR-028: API Response Standardization and Envelope Pattern
+
+**Date**: 2026-01-09
+
+**Status**: Proposed
+
+## Context
+
+The API currently has inconsistent response formats across different endpoints:
+
+1. Some endpoints return raw data arrays (`[{...}, {...}]`)
+2. Some return wrapped objects (`{ data: [...] }`)
+3. Pagination is handled inconsistently (some use `page`/`limit`, others use `offset`/`count`)
+4. Error responses vary in structure between middleware and route handlers
+5. No standard for including metadata (pagination info, request timing, etc.)
+
+This inconsistency creates friction for:
+
+- Frontend developers who must handle multiple response formats
+- API documentation and client SDK generation
+- Implementing consistent error handling across the application
+- Future API versioning transitions
+
+## Decision
+
+We will adopt a standardized response envelope pattern for all API responses.
+
+### Success Response Format
+
+```typescript
+interface ApiSuccessResponse<T> {
+  success: true;
+  data: T;
+  meta?: {
+    // Pagination (when applicable)
+    pagination?: {
+      page: number;
+      limit: number;
+      total: number;
+      totalPages: number;
+      hasNextPage: boolean;
+      hasPrevPage: boolean;
+    };
+    // Timing
+    requestId?: string;
+    timestamp?: string;
+    duration?: number;
+  };
+}
+```
+
+### Error Response Format
+
+```typescript
+interface ApiErrorResponse {
+  success: false;
+  error: {
+    code: string; // Machine-readable error code (e.g., 'VALIDATION_ERROR')
+    message: string; // Human-readable message
+    details?: unknown; // Additional context (validation errors, etc.)
+  };
+  meta?: {
+    requestId?: string;
+    timestamp?: string;
+  };
+}
+```
+
+### Implementation Approach
+
+1. **Response Helper Functions**: Create utility functions in `src/utils/apiResponse.ts`:
+   - `sendSuccess(res, data, meta?)`
+   - `sendPaginated(res, data, pagination)`
+   - `sendError(res, code, message, details?, statusCode?)`
+
+2. **Error Handler Integration**: Update `errorHandler.ts` to use the standard error format
+
+3. **Gradual Migration**: Apply to new endpoints immediately, migrate existing endpoints incrementally
+
+4. **TypeScript Types**: Export response types for frontend consumption
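+
+As a rough sketch of what `sendPaginated` would need to derive from `page`, `limit`, and `total`, the helper below fills in the remaining pagination fields. The `buildPagination` name is hypothetical and the helper is not part of the proposed `src/utils/apiResponse.ts` API:
+
+```typescript
+interface PaginationMeta {
+  page: number;
+  limit: number;
+  total: number;
+  totalPages: number;
+  hasNextPage: boolean;
+  hasPrevPage: boolean;
+}
+
+// Derives the computed fields so route handlers only supply page, limit and total.
+function buildPagination(page: number, limit: number, total: number): PaginationMeta {
+  // Guard against a zero or negative limit to avoid dividing by zero.
+  const totalPages = limit > 0 ? Math.ceil(total / limit) : 0;
+  return {
+    page,
+    limit,
+    total,
+    totalPages,
+    hasNextPage: page < totalPages,
+    hasPrevPage: page > 1,
+  };
+}
+```
+
+For example, `buildPagination(1, 20, 150)` yields `totalPages: 8`, `hasNextPage: true`, and `hasPrevPage: false`.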
+
+## Consequences
+
+### Positive
+
+- **Consistency**: All responses follow a predictable structure
+- **Type Safety**: Frontend can rely on consistent types
+- **Debugging**: Request IDs and timestamps aid in issue investigation
+- **Pagination**: Standardized pagination metadata reduces frontend complexity
+- **API Evolution**: Envelope pattern makes it easier to add fields without breaking changes
+
+### Negative
+
+- **Verbosity**: Responses are slightly larger due to envelope overhead
+- **Migration Effort**: Existing endpoints need updating
+- **Learning Curve**: Developers must learn and use the helper functions
+
+## Implementation Status
+
+### What's Implemented
+
+- ❌ Not yet implemented
+
+### What Needs To Be Done
+
+1. Create `src/utils/apiResponse.ts` with helper functions
+2. Create `src/types/api.ts` with response type definitions
+3. Update `errorHandler.ts` to use standard error format
+4. Create migration guide for existing endpoints
+5. Update 2-3 routes as examples
+6. Document pattern in this ADR
+
+## Example Usage
+
+```typescript
+// In a route handler
+router.get('/flyers', async (req, res, next) => {
+  try {
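+    // Query string values arrive as strings, so coerce them to numbers first
+    const page = Number(req.query.page) || 1;
+    const limit = Number(req.query.limit) || 20;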
+    const { flyers, total } = await flyerService.getFlyers({ page, limit });
+
+    return sendPaginated(res, flyers, {
+      page,
+      limit,
+      total,
+    });
+  } catch (error) {
+    next(error);
+  }
+});
+
+// Response:
+// {
+//   "success": true,
+//   "data": [...],
+//   "meta": {
+//     "pagination": {
+//       "page": 1,
+//       "limit": 20,
+//       "total": 150,
+//       "totalPages": 8,
+//       "hasNextPage": true,
+//       "hasPrevPage": false
+//     },
+//     "requestId": "abc-123",
+//     "timestamp": "2026-01-09T12:00:00.000Z"
+//   }
+// }
+```
--- a/docs/adr/0029-secret-rotation-and-key-management.md
+++ b/docs/adr/0029-secret-rotation-and-key-management.md
@@ -0,0 +1,147 @@
+# ADR-029: Secret Rotation and Key Management Strategy
+
+**Date**: 2026-01-09
+
+**Status**: Proposed
+
+## Context
+
+While ADR-007 covers configuration validation at startup, it does not address the lifecycle management of secrets:
+
+1. **JWT Secrets**: If the JWT_SECRET is rotated, all existing user sessions are immediately invalidated
+2. **Database Credentials**: No documented procedure for rotating database passwords without downtime
+3. **API Keys**: External service API keys (AI services, geocoding) have no rotation strategy
+4. **Emergency Revocation**: No process for immediately invalidating compromised credentials
+
+Current risks:
+
+- Long-lived secrets that never change become high-value targets
+- No ability to rotate secrets without application restart
+- No audit trail of when secrets were last rotated
+- Compromised keys could remain active indefinitely
+
+## Decision
+
+We will implement a comprehensive secret rotation and key management strategy.
+
+### 1. JWT Secret Rotation with Dual-Key Support
+
+Support multiple JWT secrets simultaneously to enable zero-downtime rotation:
+
+```typescript
+// Environment variables (set in the environment or .env file, not in code):
+//   JWT_SECRET=current_secret
+//   JWT_SECRET_PREVIOUS=old_secret   (optional, for the transition period)
+
+// Token verification tries current first, falls back to previous
+const verifyToken = (token: string) => {
+  try {
+    return jwt.verify(token, process.env.JWT_SECRET);
+  } catch {
+    if (process.env.JWT_SECRET_PREVIOUS) {
+      return jwt.verify(token, process.env.JWT_SECRET_PREVIOUS);
+    }
+    throw new AuthenticationError('Invalid token');
+  }
+};
+```
+
+### 2. Database Credential Rotation
+
+Document and implement a procedure for PostgreSQL credential rotation:
+
+1. Create new database user with identical permissions
+2. Update application configuration to use new credentials
+3. Restart application instances (rolling restart)
+4. Remove old database user after all instances updated
+5. Log rotation event for audit purposes
+
+### 3. API Key Management
+
+For external service API keys (Google AI, geocoding services):
+
+1. **Naming Convention**: `{SERVICE}_API_KEY` and `{SERVICE}_API_KEY_PREVIOUS`
+2. **Fallback Logic**: Try primary key, fall back to previous on 401/403
+3. **Health Checks**: Validate API keys on startup
+4. **Usage Logging**: Track which key is being used for each request
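+
+A minimal sketch of the fallback logic in item 2, assuming the caller supplies a function that performs the request with a given key and resolves to an HTTP-like status. The `KeyedResult` type and `callWithKeyFallback` helper are hypothetical; the returned `keyUsed` field is one possible hook for the usage logging in item 4:
+
+```typescript
+// Hypothetical result shape: an HTTP-like status code plus an optional body.
+interface KeyedResult<T> {
+  status: number;
+  body?: T;
+}
+
+type KeyUsed = 'primary' | 'previous';
+
+async function callWithKeyFallback<T>(
+  primaryKey: string,
+  previousKey: string | undefined,
+  call: (apiKey: string) => Promise<KeyedResult<T>>,
+): Promise<KeyedResult<T> & { keyUsed: KeyUsed }> {
+  const first = await call(primaryKey);
+  const rejected = first.status === 401 || first.status === 403;
+  if (!rejected || !previousKey) {
+    return { ...first, keyUsed: 'primary' };
+  }
+  // The primary key was rejected: retry once with the previous key.
+  const second = await call(previousKey);
+  return { ...second, keyUsed: 'previous' };
+}
+```
+
+Only 401 and 403 trigger the fallback, so rate limits or server errors from the provider are returned unchanged rather than retried with a different key.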
+
+### 4. Emergency Revocation Procedures
+
+Document emergency procedures for:
+
+- **JWT Compromise**: Set new JWT_SECRET, clear all refresh tokens from database
+- **Database Compromise**: Rotate credentials immediately, audit access logs
+- **API Key Compromise**: Regenerate at provider, update environment, restart
+
+### 5. Secret Audit Trail
+
+Track secret lifecycle events:
+
+- When secrets were last rotated
+- Who initiated the rotation
+- Which instances are using which secrets
+
+## Implementation Approach
+
+### Phase 1: Dual JWT Secret Support
+
+- Modify token verification to support fallback secret
+- Add JWT_SECRET_PREVIOUS to configuration schema
+- Update documentation
+
+### Phase 2: Rotation Scripts
+
+- Create `scripts/rotate-jwt-secret.sh`
+- Create `scripts/rotate-db-credentials.sh`
+- Add rotation instructions to operations runbook
+
+### Phase 3: API Key Fallback
+
+- Wrap external API clients with fallback logic
+- Add key validation to health checks
+- Implement key usage logging
+
+## Consequences
+
+### Positive
+
+- **Zero-Downtime Rotation**: Secrets can be rotated without invalidating all sessions
+- **Reduced Risk**: Regular rotation limits exposure window for compromised credentials
+- **Audit Trail**: Clear record of when secrets were changed
+- **Emergency Response**: Documented procedures for security incidents
+
+### Negative
+
+- **Complexity**: Dual-key logic adds code complexity
+- **Operations Overhead**: Regular rotation requires operational discipline
+- **Testing**: Rotation procedures need to be tested periodically
+
+## Implementation Status
+
+### What's Implemented
+
+- ❌ Not yet implemented
+
+### What Needs To Be Done
+
+1. Implement dual JWT secret verification
+2. Create rotation scripts
+3. Document emergency procedures
+4. Add secret validation to health checks
+5. Create rotation schedule recommendations
+
+## Key Files (To Be Created)
+
+- `src/utils/secretManager.ts` - Secret rotation utilities
+- `scripts/rotate-jwt-secret.sh` - JWT rotation script
+- `scripts/rotate-db-credentials.sh` - Database credential rotation
+- `docs/operations/secret-rotation.md` - Operations runbook
+
+## Rotation Schedule Recommendations
+
+| Secret Type        | Rotation Frequency         | Grace Period      |
+| ------------------ | -------------------------- | ----------------- |
+| JWT_SECRET         | 90 days                    | 7 days (dual-key) |
+| Database Passwords | 180 days                   | Rolling restart   |
+| AI API Keys        | On suspicion of compromise | Immediate         |
+| Refresh Tokens     | 7-day max age              | N/A (per-token)   |
--- a/docs/adr/0030-graceful-degradation-and-circuit-breaker.md
+++ b/docs/adr/0030-graceful-degradation-and-circuit-breaker.md
@@ -0,0 +1,150 @@
+# ADR-030: Graceful Degradation and Circuit Breaker Pattern
+
+**Date**: 2026-01-09
+
+**Status**: Proposed
+
+## Context
+
+The application depends on several external services:
+
+1. **AI Services** (Google Gemini) - For flyer item extraction
+2. **Redis** - For caching, rate limiting, and job queues
+3. **PostgreSQL** - Primary data store
+4. **Geocoding APIs** - For location services
+
+Currently, when these services fail:
+
+- AI failures may cause the entire upload to fail
+- Redis unavailability could crash the application or bypass rate limiting
+- No circuit breakers prevent repeated calls to failing services
+- No fallback behaviors are defined
+
+This creates fragility where a single service outage can cascade into application-wide failures.
+
+## Decision
+
+We will implement a graceful degradation strategy with circuit breakers for external service dependencies.
+
+### 1. Circuit Breaker Pattern
+
+Implement circuit breakers for external service calls using a library like `opossum`:
+
+```typescript
+import CircuitBreaker from 'opossum';
+
+const aiCircuitBreaker = new CircuitBreaker(callAiService, {
+  timeout: 30000, // 30 second timeout
+  errorThresholdPercentage: 50, // Open circuit at 50% failures
+  resetTimeout: 30000, // Try again after 30 seconds
+  volumeThreshold: 5, // Minimum calls before calculating error %
+});
+
+aiCircuitBreaker.on('open', () => {
+  logger.warn('AI service circuit breaker opened');
+});
+
+aiCircuitBreaker.on('halfOpen', () => {
+  logger.info('AI service circuit breaker half-open, testing...');
+});
+```
+
+### 2. Fallback Behaviors by Service
+
+| Service                | Fallback Behavior                        |
+| ---------------------- | ---------------------------------------- |
+| **Redis (Cache)**      | Skip cache, query database directly      |
+| **Redis (Rate Limit)** | Log warning, allow request (fail-open)   |
+| **Redis (Queues)**     | Queue to memory, process synchronously   |
+| **AI Service**         | Return partial results, queue for retry  |
+| **Geocoding**          | Return null location, allow manual entry |
+| **PostgreSQL**         | No fallback - critical dependency        |
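+
+The two simplest rows of this table, cache bypass and fail-open rate limiting, might look roughly like the sketch below. The helper names and callback parameters are hypothetical and are not taken from `cacheService` or the rate limiter; the callbacks stand in for the real Redis and database calls:
+
+```typescript
+// Cache read that treats a cache outage the same as a cache miss.
+async function getWithCacheFallback<T>(
+  readCache: () => Promise<T | null>,
+  queryDatabase: () => Promise<T>,
+): Promise<T> {
+  try {
+    const cached = await readCache();
+    if (cached !== null) {
+      return cached;
+    }
+  } catch {
+    // Cache unavailable: skip it and fall through to the database.
+  }
+  return queryDatabase();
+}
+
+// Fail-open rate limiting: if the limiter store is down, warn and allow the request.
+async function isRequestAllowed(
+  checkLimit: () => Promise<boolean>,
+  warn: (message: string) => void,
+): Promise<boolean> {
+  try {
+    return await checkLimit();
+  } catch {
+    warn('Rate limiter unavailable, allowing request (fail-open)');
+    return true;
+  }
+}
+```
+
+Failing open trades protection for availability: during a Redis outage no request is rate limited, which is why the table pairs it with a logged warning.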
+
+### 3. Health Status Aggregation
+
+Extend health checks (ADR-020) to report service-level health:
+
+```typescript
+// GET /api/health/ready response
+{
+  "status": "degraded",  // healthy | degraded | unhealthy
+  "services": {
+    "database": { "status": "healthy", "latency": 5 },
+    "redis": { "status": "healthy", "latency": 2 },
+    "ai": { "status": "degraded", "circuitState": "half-open" },
+    "geocoding": { "status": "healthy", "latency": 150 }
+  }
+}
+```
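The top-level `status` field could be derived from the per-service entries. The following sketch assumes a rule the ADR does not spell out: the response is `unhealthy` only when a critical dependency (here, just the database) is unhealthy, and `degraded` when any service is not healthy. `aggregateStatus` and `CRITICAL_SERVICES` are hypothetical names.

```typescript
type ServiceStatus = 'healthy' | 'degraded' | 'unhealthy';

interface ServiceHealth {
  status: ServiceStatus;
  latency?: number; // milliseconds
  circuitState?: string;
}

// Assumption: only the database is critical, matching the "No fallback" table row.
const CRITICAL_SERVICES: string[] = ['database'];

function aggregateStatus(services: Record<string, ServiceHealth>): ServiceStatus {
  const criticalDown = CRITICAL_SERVICES.some(
    (service) => services[service]?.status === 'unhealthy',
  );
  if (criticalDown) return 'unhealthy';

  const anyImpaired = Object.keys(services).some(
    (service) => services[service].status !== 'healthy',
  );
  return anyImpaired ? 'degraded' : 'healthy';
}
```

Applied to the example response, the degraded `ai` entry makes the overall status `degraded` while the database remains healthy.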
+
+### 4. Retry Strategies
+
+Define retry policies for transient failures:
+
+```typescript
+const retryConfig = {
+  ai: { maxRetries: 3, backoff: 'exponential', initialDelay: 1000 },
+  geocoding: { maxRetries: 2, backoff: 'linear', initialDelay: 500 },
+  database: { maxRetries: 3, backoff: 'exponential', initialDelay: 100 },
+};
+```
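One way to read the policies above is as inputs to a delay calculation. The sketch below is a minimal illustration, not the project's implementation: `RetryPolicy` and `retryDelay` are hypothetical names, and it assumes `exponential` means doubling the delay on each retry while `linear` means adding `initialDelay` each time. Jitter is omitted for brevity.

```typescript
type Backoff = 'exponential' | 'linear';

interface RetryPolicy {
  maxRetries: number;
  backoff: Backoff;
  initialDelay: number; // milliseconds
}

// `attempt` is 1-based: 1 is the first retry after the initial failed call.
// Returns the delay in milliseconds, or null once the retry budget is spent.
function retryDelay(policy: RetryPolicy, attempt: number): number | null {
  if (attempt < 1 || attempt > policy.maxRetries) {
    return null;
  }
  return policy.backoff === 'exponential'
    ? policy.initialDelay * 2 ** (attempt - 1)
    : policy.initialDelay * attempt;
}

const aiPolicy: RetryPolicy = { maxRetries: 3, backoff: 'exponential', initialDelay: 1000 };
const aiDelays = [1, 2, 3, 4].map((attempt) => retryDelay(aiPolicy, attempt));
// aiDelays is [1000, 2000, 4000, null]
```

Under these assumptions the AI policy waits at most 7 seconds in total across its three retries before giving up.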
+
+## Implementation Approach
+
+### Phase 1: Redis Fallbacks
+
+- Wrap cache operations with try-catch (already partially done in cacheService)
+- Add fail-open for rate limiting when Redis is down
+- Log degraded state
+
+### Phase 2: AI Circuit Breaker
+
+- Wrap AI service calls with circuit breaker
+- Implement queue-for-retry on circuit open
+- Add manual fallback UI for failed extractions
+
+### Phase 3: Health Aggregation
+
+- Update health endpoints with service status
+- Add Prometheus-compatible metrics
+- Create dashboard for service health
+
+## Consequences
+
+### Positive
+
+- **Resilience**: Application continues functioning during partial outages
+- **User Experience**: Degraded but functional is better than complete failure
+- **Observability**: Clear visibility into service health
+- **Protection**: Circuit breakers prevent cascading failures
+
+### Negative
+
+- **Complexity**: Additional code for fallback logic
+- **Testing**: Requires testing failure scenarios
+- **Consistency**: Some operations may return different results while a dependency is degraded
+
+## Implementation Status
+
+### What's Implemented
+
+- ✅ Cache operations fail gracefully (cacheService.server.ts)
+- ❌ Circuit breakers for AI services
+- ❌ Rate limit fail-open behavior
+- ❌ Health aggregation endpoint
+- ❌ Retry strategies with backoff
+
+### What Needs To Be Done
+
+1. Install and configure `opossum` circuit breaker library
+2. Wrap AI service calls with circuit breaker
+3. Add fail-open to rate limiting
+4. Extend health endpoints with service status
+5. Document degraded mode behaviors
+6. Implement retry strategies with backoff
+
+## Key Files
+
+- `src/utils/circuitBreaker.ts` - Circuit breaker configurations (to create)
+- `src/services/cacheService.server.ts` - Already has graceful fallbacks
+- `src/routes/health.routes.ts` - Health check endpoints (to extend)
+- `src/services/aiService.server.ts` - AI service wrapper (to wrap)
--- a/docs/adr/0031-data-retention-and-privacy-compliance.md
+++ b/docs/adr/0031-data-retention-and-privacy-compliance.md
@@ -0,0 +1,199 @@
+# ADR-031: Data Retention and Privacy Compliance (GDPR/CCPA)
+
+**Date**: 2026-01-09
+
+**Status**: Proposed
+
+## Context
+
+The application stores various types of user data:
+
+1. **User Accounts**: Email, password hash, profile information
+2. **Shopping Lists**: Personal shopping preferences and history
+3. **Watch Lists**: Tracked items and price alerts
+4. **Activity Logs**: User actions for analytics and debugging
+5. **Tracking Data**: Page views, interactions, feature usage
+
+Current gaps in privacy compliance:
+
+- **No Data Retention Policies**: Activity logs accumulate indefinitely
+- **No User Data Export**: Users cannot export their data (GDPR Article 20)
+- **No User Data Deletion**: No self-service account deletion (GDPR Article 17)
+- **No Cookie Consent**: Cookie usage is neither disclosed to users nor subject to their consent
+- **No Privacy Policy Enforcement**: Privacy commitments not enforced in code
+
+These gaps create legal exposure for the application with respect to users in the EU (GDPR) and California (CCPA).
+
+## Decision
+
+We will implement comprehensive data retention and privacy compliance features.
+
+### 1. Data Retention Policies
+
+| Data Type                 | Retention Period         | Deletion Method          |
+| ------------------------- | ------------------------ | ------------------------ |
+| **Activity Logs**         | 90 days                  | Automated cleanup job    |
+| **Tracking Events**       | 30 days                  | Automated cleanup job    |
+| **Deleted User Data**     | 30 days (soft delete)    | Hard delete after period |
+| **Expired Sessions**      | 7 days after expiry      | Token cleanup job        |
+| **Failed Login Attempts** | 24 hours                 | Automated cleanup job    |
+| **Flyer Data**            | Indefinite (public data) | N/A                      |
+| **User Shopping Lists**   | Until account deletion   | With account             |
+| **User Watch Lists**      | Until account deletion   | With account             |
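
A cleanup job needs to turn a retention period into a cutoff timestamp. A minimal sketch, assuming the periods above are keyed by table name (`activity_log` and `tracking_events` are named in the implementation phases of this ADR; `RETENTION_DAYS` and `retentionCutoff` are hypothetical):

```typescript
// Retention periods in days, copied from the table above.
const RETENTION_DAYS: Record<string, number> = {
  activity_log: 90,
  tracking_events: 30,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Rows with a timestamp strictly older than the returned date are eligible for deletion.
function retentionCutoff(dataType: string, now: Date): Date {
  const days = RETENTION_DAYS[dataType];
  if (days === undefined) {
    throw new Error(`No retention policy defined for "${dataType}"`);
  }
  return new Date(now.getTime() - days * MS_PER_DAY);
}

const activityCutoff = retentionCutoff('activity_log', new Date('2026-01-09T00:00:00.000Z'));
// activityCutoff.toISOString() is '2025-10-11T00:00:00.000Z'
```

Throwing on an unknown data type is a deliberate choice in this sketch: a job that silently defaulted to some period could delete data that is meant to be kept indefinitely.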
+
+### 2. User Data Export (Right to Portability)
+
+Implement `GET /api/users/me/export` endpoint:
+
+```typescript
+interface UserDataExport {
+  exportDate: string;
+  user: {
+    email: string;
+    created_at: string;
+    profile: ProfileData;
+  };
+  shoppingLists: ShoppingList[];
+  watchedItems: WatchedItem[];
+  priceAlerts: PriceAlert[];
+  achievements: Achievement[];
+  // Exclude: password hash, internal IDs, admin flags
+}
+```
+
+Export formats: JSON (primary), CSV (optional)
+
+### 3. User Data Deletion (Right to Erasure)
+
+Implement `DELETE /api/users/me` endpoint:
+
+1. **Soft Delete**: Mark account as deleted, anonymize PII
+2. **Grace Period**: 30 days to restore account
+3. **Hard Delete**: Permanently remove all user data after grace period
+4. **Audit Log**: Record deletion request (anonymized)
+
+Deletion cascade:
+
+- User account → Anonymize email/name
+- Shopping lists → Delete
+- Watch lists → Delete
+- Achievements → Delete
+- Activity logs → Anonymize user_id
+- Sessions/tokens → Delete immediately
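The soft-delete and grace-period steps could look roughly like the sketch below. The `UserRecord` shape, the placeholder email format, and both function names are hypothetical. Note one open question the steps above leave unresolved: if PII is overwritten at soft-delete time, restoring the account within the grace period requires the original values to be kept somewhere else, which this sketch does not attempt.

```typescript
interface UserRecord {
  id: number;
  email: string;
  name: string;
  deletedAt: string | null; // ISO timestamp of the soft delete, if any
}

const GRACE_PERIOD_DAYS = 30;

// Soft delete: keep the row so it can be hard-deleted later, but strip direct identifiers.
function softDeleteUser(user: UserRecord, now: Date): UserRecord {
  return {
    ...user,
    email: `deleted-${user.id}@example.invalid`,
    name: 'Deleted User',
    deletedAt: now.toISOString(),
  };
}

// True once the grace period has fully elapsed since the soft delete.
function isReadyForHardDelete(user: UserRecord, now: Date): boolean {
  if (user.deletedAt === null) return false;
  const elapsedMs = now.getTime() - new Date(user.deletedAt).getTime();
  return elapsedMs >= GRACE_PERIOD_DAYS * 24 * 60 * 60 * 1000;
}
```

A hard-delete cleanup job would then select rows for which `isReadyForHardDelete` holds and remove them together with the dependent records listed in the cascade.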
+
+### 4. Cookie Consent
+
+Implement cookie consent banner:
+
+```typescript
+// Cookie categories
+enum CookieCategory {
+  ESSENTIAL = 'essential', // Always allowed (auth, CSRF)
+  FUNCTIONAL = 'functional', // Dark mode, preferences
+  ANALYTICS = 'analytics', // Usage tracking
+}
+
+// Store consent in localStorage and server-side
+interface CookieConsent {
+  essential: true; // Cannot be disabled
+  functional: boolean;
+  analytics: boolean;
+  consentDate: string;
+  consentVersion: string;
+}
+```
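Gating features on the stored consent could be reduced to a single check. This sketch restates the categories as a string union rather than the enum above so that it stands alone; `isCategoryAllowed` and `CURRENT_CONSENT_VERSION` are hypothetical, and treating a version mismatch as "no consent" is an assumption about how `consentVersion` is meant to be used.

```typescript
type CookieCategory = 'essential' | 'functional' | 'analytics';

interface CookieConsent {
  essential: true; // Cannot be disabled
  functional: boolean;
  analytics: boolean;
  consentDate: string;
  consentVersion: string;
}

// Assumption: bumping this value invalidates consent given under an older policy.
const CURRENT_CONSENT_VERSION = '1';

function isCategoryAllowed(consent: CookieConsent | null, category: CookieCategory): boolean {
  if (category === 'essential') return true; // Essential cookies need no opt-in.
  if (consent === null || consent.consentVersion !== CURRENT_CONSENT_VERSION) {
    return false; // No valid consent on record: treat as declined.
  }
  return consent[category];
}
```

Analytics code would call `isCategoryAllowed(consent, 'analytics')` before recording an event, so that a missing or outdated consent record defaults to no tracking.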
+
+### 5. Privacy Policy Enforcement
+
+Enforce privacy commitments in code:
+
+- Email addresses never logged in plaintext
+- Passwords never logged (already in pino redact config)
+- IP addresses anonymized after 7 days
+- Third-party data sharing requires explicit consent
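For the IP address commitment, one common technique is to zero the last octet of an IPv4 address. The ADR does not say which anonymization method is intended, so this is only one possible reading; `anonymizeIpv4` is a hypothetical helper and IPv6 handling is left to the caller.

```typescript
// Zeroes the last octet of an IPv4 address, e.g. for log rows older than 7 days.
// Returns null for anything that is not a dotted-quad IPv4 address.
function anonymizeIpv4(ip: string): string | null {
  const octets = ip.split('.');
  const valid =
    octets.length === 4 &&
    octets.every((octet) => /^\d{1,3}$/.test(octet) && Number(octet) <= 255);
  if (!valid) return null;
  return `${octets[0]}.${octets[1]}.${octets[2]}.0`;
}

const maskedIp = anonymizeIpv4('203.0.113.42');
// maskedIp is '203.0.113.0'
```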
+
+## Implementation Approach
+
+### Phase 1: Data Retention Jobs
+
+- Create retention cleanup job in background job service
+- Add activity_log retention (90 days)
+- Add tracking_events retention (30 days)
+
+### Phase 2: User Data Export
+
+- Create export endpoint
+- Implement data aggregation query
+- Add rate limiting (1 export per 24h)
+
+### Phase 3: Account Deletion
+
+- Implement soft delete with anonymization
+- Create hard delete cleanup job
+- Add account recovery endpoint
+
+### Phase 4: Cookie Consent
+
+- Create consent banner component
+- Store consent preferences
+- Gate analytics based on consent
+
+## Consequences
+
+### Positive
+
+- **Legal Compliance**: Meets GDPR and CCPA requirements
+- **User Trust**: Demonstrates commitment to privacy
+- **Data Hygiene**: Automatic cleanup prevents data bloat
+- **Reduced Liability**: Less data = less risk
+
+### Negative
+
+- **Implementation Effort**: Significant feature development
+- **Operational Complexity**: Deletion jobs need monitoring
+- **Feature Limitations**: Some features may be limited without consent
+
+## Implementation Status
+
+### What's Implemented
+
+- ✅ Token cleanup job exists (tokenCleanupQueue)
+- ❌ Activity log retention
+- ❌ User data export endpoint
+- ❌ Account deletion endpoint
+- ❌ Cookie consent banner
+- ❌ Data anonymization functions
+
+### What Needs To Be Done
+
+1. Add activity_log cleanup to background jobs
+2. Create `/api/users/me/export` endpoint
+3. Create `/api/users/me` DELETE endpoint with soft delete
+4. Implement cookie consent UI component
+5. Document data retention in privacy policy
+6. Add anonymization utility functions
+
+## Key Files (To Be Created/Modified)
+
+- `src/services/backgroundJobService.ts` - Add retention jobs
+- `src/routes/user.routes.ts` - Add export/delete endpoints
+- `src/services/privacyService.server.ts` - Data export/deletion logic
+- `src/components/CookieConsent.tsx` - Consent banner
+- `src/utils/anonymize.ts` - Data anonymization utilities
+
+## Compliance Checklist
+
+### GDPR Requirements
+
+- [ ] Article 15: Right of Access (data export)
+- [ ] Article 17: Right to Erasure (account deletion)
+- [ ] Article 20: Right to Data Portability (JSON export)
+- [ ] Article 7: Conditions for Consent (cookie consent)
+- [ ] Article 13: Information to be Provided (privacy policy)
+
+### CCPA Requirements
+
+- [ ] Right to Know (data export)
+- [ ] Right to Delete (account deletion)
+- [ ] Right to Opt-Out (cookie consent for analytics)
+- [ ] Non-Discrimination (no feature penalty for privacy choices)
--- a/docs/adr/index.md
+++ b/docs/adr/index.md
@@ -4,49 +4,55 @@ This directory contains a log of the architectural decisions made for the Flyer

 ## 1. Foundational / Core Infrastructure

-**[ADR-002](./0002-standardized-transaction-management.md)**: Standardized Transaction Management and Unit of Work Pattern (Proposed)
-**[ADR-007](./0007-configuration-and-secrets-management.md)**: Configuration and Secrets Management (Proposed)
-**[ADR-020](./0020-health-checks-and-liveness-readiness-probes.md)**: Health Checks and Liveness/Readiness Probes (Proposed)
+**[ADR-002](./0002-standardized-transaction-management.md)**: Standardized Transaction Management and Unit of Work Pattern (Accepted)
+**[ADR-007](./0007-configuration-and-secrets-management.md)**: Configuration and Secrets Management (Accepted)
+**[ADR-020](./0020-health-checks-and-liveness-readiness-probes.md)**: Health Checks and Liveness/Readiness Probes (Accepted)
+**[ADR-030](./0030-graceful-degradation-and-circuit-breaker.md)**: Graceful Degradation and Circuit Breaker Pattern (Proposed)

 ## 2. Data Management

-**[ADR-009](./0009-caching-strategy-for-read-heavy-operations.md)**: Caching Strategy for Read-Heavy Operations (Proposed)
+**[ADR-009](./0009-caching-strategy-for-read-heavy-operations.md)**: Caching Strategy for Read-Heavy Operations (Partially Implemented)
 **[ADR-013](./0013-database-schema-migration-strategy.md)**: Database Schema Migration Strategy (Proposed)
 **[ADR-019](./0019-data-backup-and-recovery-strategy.md)**: Data Backup and Recovery Strategy (Proposed)
 **[ADR-023](./0023-database-schema-migration-strategy.md)**: Database Schema Migration Strategy (Proposed)
+**[ADR-031](./0031-data-retention-and-privacy-compliance.md)**: Data Retention and Privacy Compliance (Proposed)

 ## 3. API & Integration

-**[ADR-003](./0003-standardized-input-validation-using-middleware.md)**: Standardized Input Validation using Middleware (Proposed)
+**[ADR-003](./0003-standardized-input-validation-using-middleware.md)**: Standardized Input Validation using Middleware (Accepted)
 **[ADR-008](./0008-api-versioning-strategy.md)**: API Versioning Strategy (Proposed)
 **[ADR-018](./0018-api-documentation-strategy.md)**: API Documentation Strategy (Proposed)
 **[ADR-022](./0022-real-time-notification-system.md)**: Real-time Notification System (Proposed)
+**[ADR-028](./0028-api-response-standardization.md)**: API Response Standardization and Envelope Pattern (Proposed)

 ## 4. Security & Compliance

 **[ADR-001](./0001-standardized-error-handling.md)**: Standardized Error Handling for Service and Repository Layers (Accepted)
 **[ADR-011](./0011-advanced-authorization-and-access-control-strategy.md)**: Advanced Authorization and Access Control Strategy (Proposed)
-**[ADR-016](./0016-api-security-hardening.md)**: API Security Hardening (Proposed)
+**[ADR-016](./0016-api-security-hardening.md)**: API Security Hardening (Accepted)
+**[ADR-029](./0029-secret-rotation-and-key-management.md)**: Secret Rotation and Key Management Strategy (Proposed)

 ## 5. Observability & Monitoring

-**[ADR-004](./0004-standardized-application-wide-structured-logging.md)**: Standardized Application-Wide Structured Logging (Proposed)
+**[ADR-004](./0004-standardized-application-wide-structured-logging.md)**: Standardized Application-Wide Structured Logging (Accepted)
 **[ADR-015](./0015-application-performance-monitoring-and-error-tracking.md)**: Application Performance Monitoring (APM) and Error Tracking (Proposed)

 ## 6. Deployment & Operations

-**[ADR-006](./0006-background-job-processing-and-task-queues.md)**: Background Job Processing and Task Queues (Proposed)
+**[ADR-006](./0006-background-job-processing-and-task-queues.md)**: Background Job Processing and Task Queues (Partially Implemented)
 **[ADR-014](./0014-containerization-and-deployment-strategy.md)**: Containerization and Deployment Strategy (Proposed)
 **[ADR-017](./0017-ci-cd-and-branching-strategy.md)**: CI/CD and Branching Strategy (Proposed)
 **[ADR-024](./0024-feature-flagging-strategy.md)**: Feature Flagging Strategy (Proposed)

 ## 7. Frontend / User Interface

-**[ADR-005](./0005-frontend-state-management-and-server-cache-strategy.md)**: Frontend State Management and Server Cache Strategy (Proposed)
-**[ADR-012](./0012-frontend-component-library-and-design-system.md)**: Frontend Component Library and Design System (Proposed)
+**[ADR-005](./0005-frontend-state-management-and-server-cache-strategy.md)**: Frontend State Management and Server Cache Strategy (Accepted)
+**[ADR-012](./0012-frontend-component-library-and-design-system.md)**: Frontend Component Library and Design System (Partially Implemented)
 **[ADR-025](./0025-internationalization-and-localization-strategy.md)**: Internationalization (i18n) and Localization (l10n) Strategy (Proposed)
+**[ADR-026](./0026-standardized-client-side-structured-logging.md)**: Standardized Client-Side Structured Logging (Proposed)

 ## 8. Development Workflow & Quality

-**[ADR-010](./0010-testing-strategy-and-standards.md)**: Testing Strategy and Standards (Proposed)
-**[ADR-021](./0021-code-formatting-and-linting-unification.md)**: Code Formatting and Linting Unification (Proposed)
+**[ADR-010](./0010-testing-strategy-and-standards.md)**: Testing Strategy and Standards (Accepted)
+**[ADR-021](./0021-code-formatting-and-linting-unification.md)**: Code Formatting and Linting Unification (Accepted)
+**[ADR-027](./0027-standardized-naming-convention-for-ai-and-database-types.md)**: Standardized Naming Convention for AI and Database Types (Accepted)