doc updates and test fixin

2026-01-22 11:17:06 -08:00
parent 9f7b821760
commit fac98f4c54
56 changed files with 11967 additions and 357 deletions
--- a/docs/subagents/SECURITY-DEBUG-GUIDE.md
+++ b/docs/subagents/SECURITY-DEBUG-GUIDE.md
@@ -0,0 +1,406 @@
+# Security and Debugging Subagent Guide
+
+This guide covers security and debugging-focused subagents:
+
+- **security-engineer**: Security audits, vulnerability scanning, OWASP, pentesting
+- **log-debug**: Production errors, observability, Bugsink/Sentry analysis
+- **code-reviewer**: Code quality, security review, best practices
+
+## The security-engineer Subagent
+
+### When to Use
+
+Use the **security-engineer** subagent when you need to:
+
+- Conduct security audits of code or features
+- Review authentication/authorization flows
+- Identify vulnerabilities (OWASP Top 10)
+- Review API security
+- Assess data protection measures
+- Plan security improvements
+
+### What security-engineer Knows
+
+The security-engineer subagent understands:
+
+- OWASP Top 10 vulnerabilities
+- Node.js/Express security best practices
+- JWT authentication security
+- SQL injection prevention
+- XSS and CSRF protection
+- Rate limiting strategies (ADR-032)
+- API security hardening (ADR-016)
+
+### Example Requests
+
+**Security Audit:**
+```
+"Use security-engineer to audit the user registration and
+login flow for security vulnerabilities. Check for common
+issues like credential stuffing, brute force, and session
+management problems."
+```
+
+**API Security Review:**
+```
+"Use security-engineer to review the flyer upload endpoint
+for security issues. Consider file type validation, size
+limits, malicious file handling, and authorization."
+```
+
+**Vulnerability Assessment:**
+```
+"Use security-engineer to assess our exposure to the OWASP
+Top 10 vulnerabilities. Identify any gaps in our current
+security measures."
+```
+
+### Security Checklist
+
+The security-engineer subagent uses this checklist:
+
+#### Authentication & Authorization
+- [ ] Password hashing with bcrypt (cost factor >= 10)
+- [ ] JWT tokens with appropriate expiration
+- [ ] Refresh token rotation
+- [ ] Session invalidation on password change
+- [ ] Role-based access control (RBAC)
+
+#### Input Validation
+- [ ] All user input validated with Zod schemas
+- [ ] SQL queries use parameterized statements
+- [ ] File uploads validated for type and size
+- [ ] Path traversal prevention
+
+#### Data Protection
+- [ ] Sensitive data encrypted at rest
+- [ ] HTTPS enforced in production
+- [ ] No secrets in source code
+- [ ] Proper error messages (no stack traces to users)
+
+#### Rate Limiting
+- [ ] Login attempts limited
+- [ ] API endpoints rate limited
+- [ ] File upload rate limited
+
+#### Headers & CORS
+- [ ] Security headers set (Helmet.js)
+- [ ] CORS configured appropriately
+- [ ] Content-Security-Policy defined
+
+### Security Patterns in This Project
+
+**Rate Limiting (ADR-032):**
+```typescript
+// src/config/rateLimiters.ts
+export const loginLimiter = rateLimit({
+  windowMs: 15 * 60 * 1000, // 15 minutes
+  max: 5, // 5 attempts per window
+  message: 'Too many login attempts',
+});
+```
+
+**Input Validation (ADR-003):**
+```typescript
+// src/middleware/validation.middleware.ts
+router.post(
+  '/register',
+  validateRequest(registerSchema),
+  async (req, res, next) => { ... }
+);
+```
+
+**Authentication (ADR-048):**
+```typescript
+// JWT with refresh tokens
+const accessToken = jwt.sign(payload, secret, { expiresIn: '15m' });
+const refreshToken = jwt.sign({ userId }, refreshSecret, { expiresIn: '7d' });
+```
+
+## The log-debug Subagent
+
+### When to Use
+
+Use the **log-debug** subagent when you need to:
+
+- Debug production errors
+- Analyze Bugsink/Sentry error reports
+- Investigate performance issues
+- Trace request flows through logs
+- Identify patterns in error occurrences
+
+### What log-debug Knows
+
+The log-debug subagent understands:
+
+- Pino structured logging
+- Bugsink/Sentry error tracking
+- Log aggregation with Logstash
+- PostgreSQL function observability (ADR-050)
+- Request tracing patterns
+- Error correlation
+
+### MCP Tools for Debugging
+
+The log-debug subagent can use MCP tools to access error tracking:
+
+```
+// Check Bugsink for production errors
+mcp__bugsink__list_projects()
+mcp__bugsink__list_issues({ project_id: 1 })
+mcp__bugsink__get_event({ event_id: "..." })
+mcp__bugsink__get_stacktrace({ event_id: "..." })
+
+// Check local dev errors
+mcp__localerrors__list_issues({ project_id: 1 })
+```
+
+### Example Requests
+
+**Production Error Investigation:**
+```
+"Use log-debug to investigate the spike in 500 errors on the
+flyer processing endpoint yesterday. Check Bugsink for error
+patterns and identify the root cause."
+```
+
+**Performance Analysis:**
+```
+"Use log-debug to analyze the slow response times on the deals
+page. Check logs for database query timing and identify any
+bottlenecks."
+```
+
+**Error Pattern Analysis:**
+```
+"Use log-debug to identify patterns in the authentication
+failures over the past week. Are they coming from specific
+IPs or affecting specific users?"
+```
+
+### Log Analysis Patterns
+
+**Structured Log Format (Pino):**
+```json
+{
+  "level": 50,
+  "time": 1704067200000,
+  "pid": 1234,
+  "hostname": "server1",
+  "module": "flyerService",
+  "requestId": "abc-123",
+  "userId": "user-456",
+  "msg": "Flyer processing failed",
+  "err": {
+    "type": "AIExtractionError",
+    "message": "Rate limit exceeded",
+    "stack": "..."
+  }
+}
+```
+
+**Request Tracing:**
+```typescript
+// Each request gets a unique ID for tracing
+app.use((req, res, next) => {
+  req.requestId = crypto.randomUUID();
+  req.log = logger.child({ requestId: req.requestId });
+  next();
+});
+```
+
+**Error Correlation:**
+- Same `requestId` across all logs for a request
+- Same `userId` for user-related errors
+- Same `flyerId` for flyer processing errors
+
+### Bugsink Error Tracking
+
+**Production Bugsink Projects:**
+
+| Project | ID | Purpose |
+|---------|-----|---------|
+| flyer-crawler-backend | 1 | Backend errors |
+| flyer-crawler-frontend | 2 | Frontend errors |
+| flyer-crawler-backend-test | 3 | Test backend |
+| flyer-crawler-frontend-test | 4 | Test frontend |
+| flyer-crawler-infrastructure | 5 | Infra errors |
+
+**Accessing Bugsink:**
+- Production: https://bugsink.projectium.com
+- Dev Container: http://localhost:8000
+
+### Log File Locations
+
+| Environment | Log Path |
+|-------------|----------|
+| Production | `/var/www/flyer-crawler.projectium.com/logs/app.log` |
+| Test | `/var/www/flyer-crawler-test.projectium.com/logs/app.log` |
+| Dev Container | `/app/logs/app.log` |
+
+## The code-reviewer Subagent
+
+### When to Use
+
+Use the **code-reviewer** subagent when you need to:
+
+- Review code quality before merging
+- Identify potential issues in implementations
+- Check adherence to project patterns
+- Review security implications
+- Assess test coverage
+
+### What code-reviewer Knows
+
+The code-reviewer subagent understands:
+
+- Project architecture patterns (ADRs)
+- Repository pattern standards (ADR-034)
+- Service layer architecture (ADR-035)
+- Testing standards (ADR-010)
+- TypeScript best practices
+- Security considerations
+
+### Example Requests
+
+**Code Review:**
+```
+"Use code-reviewer to review the changes in the shopping list
+feature branch. Check for adherence to project patterns,
+potential bugs, and security issues."
+```
+
+**Architecture Review:**
+```
+"Use code-reviewer to review the proposed changes to the
+caching layer. Does it follow our patterns? Are there
+potential issues with cache invalidation?"
+```
+
+**Security-Focused Review:**
+```
+"Use code-reviewer to review the new file upload handling
+code with a focus on security. Check for path traversal,
+file type validation, and size limits."
+```
+
+### Code Review Checklist
+
+The code-reviewer subagent checks:
+
+#### Code Quality
+- [ ] Follows TypeScript strict mode
+- [ ] No `any` types without justification
+- [ ] Proper error handling
+- [ ] Meaningful variable names
+- [ ] Appropriate comments
+
+#### Architecture
+- [ ] Follows layer separation (Routes -> Services -> Repositories)
+- [ ] Uses correct file naming conventions
+- [ ] Repository methods follow naming patterns
+- [ ] Transactions used for multi-operation changes
+
+#### Testing
+- [ ] New code has corresponding tests
+- [ ] Tests follow project patterns
+- [ ] Edge cases covered
+- [ ] Mocks used appropriately
+
+#### Security
+- [ ] Input validation present
+- [ ] Authorization checks in place
+- [ ] No secrets in code
+- [ ] Error messages don't leak information
+
+#### Performance
+- [ ] No obvious N+1 queries
+- [ ] Appropriate use of caching
+- [ ] Large data sets paginated
+- [ ] Expensive operations async/queued
+
+### Review Output Format
+
+```markdown
+## Code Review: [Feature/PR Name]
+
+### Summary
+Brief overview of the changes reviewed.
+
+### Issues Found
+
+#### Critical
+- **[File:Line]** Description of critical issue
+  - Impact: What could go wrong
+  - Suggestion: How to fix
+
+#### High Priority
+- **[File:Line]** Description
+
+#### Medium Priority
+- **[File:Line]** Description
+
+#### Low Priority / Suggestions
+- **[File:Line]** Description
+
+### Positive Observations
+- Good patterns followed
+- Well-tested areas
+- Clean implementations
+
+### Recommendations
+1. Priority items to address before merge
+2. Items for follow-up tickets
+```
+
+## Debugging Workflow
+
+### 1. Error Investigation
+
+```
+1. Identify the error in Bugsink
+   mcp__bugsink__list_issues({ project_id: 1, status: "unresolved" })
+
+2. Get error details
+   mcp__bugsink__get_issue({ issue_id: "..." })
+
+3. Get full stacktrace
+   mcp__bugsink__get_stacktrace({ event_id: "..." })
+
+4. Check for patterns across events
+   mcp__bugsink__list_events({ issue_id: "..." })
+```
+
+### 2. Log Correlation
+
+```bash
+# Find related logs by request ID
+grep "requestId\":\"abc-123\"" /var/www/flyer-crawler.projectium.com/logs/app.log
+
+# Find all errors in a time range
+jq 'select(.level >= 50 and .time >= 1704067200000)' app.log
+```
+
+### 3. Database Query Analysis
+
+```bash
+# Check slow query log
+tail -f /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep "duration:"
+```
+
+### 4. Root Cause Analysis
+
+- Correlate error timing with deployments
+- Check for resource constraints (memory, connections)
+- Review recent code changes
+- Check external service status
+
+## Related Documentation
+
+- [OVERVIEW.md](./OVERVIEW.md) - Subagent system overview
+- [DEVOPS-GUIDE.md](./DEVOPS-GUIDE.md) - Infrastructure debugging
+- [../adr/0016-api-security-hardening.md](../adr/0016-api-security-hardening.md) - Security ADR
+- [../adr/0032-rate-limiting-strategy.md](../adr/0032-rate-limiting-strategy.md) - Rate limiting
+- [../adr/0015-application-performance-monitoring-and-error-tracking.md](../adr/0015-application-performance-monitoring-and-error-tracking.md) - Monitoring ADR
+- [../adr/0050-postgresql-function-observability.md](../adr/0050-postgresql-function-observability.md) - Database observability
+- [../BARE-METAL-SETUP.md](../BARE-METAL-SETUP.md) - Production setup