# ADR-0042: Browser Test Performance Optimization
**Status**: Accepted
**Date**: 2026-02-10
**Authors**: Claude Code AI Agent
## Context
### Current State
The stock-alert project has 59 Playwright browser tests across 5 spec files, taking approximately 240 seconds (~4 minutes) to execute. Analysis reveals three major performance bottlenecks:
| Metric | Count | Impact |
| ----------------------------------- | ----- | -------------------------------------------- |
| Hardcoded `waitForTimeout()` calls | 66 | ~120s cumulative wait time |
| Redundant per-test login calls       | 43    | ~2-3s each = 86-129s overhead                 |
| Visual regression tests blocking CI | 4 | Cannot run in parallel with functional tests |
### Test Distribution
| File | Tests | `waitForTimeout` Calls | `login()` Calls |
| ------------------- | ------ | ---------------------- | ------------------------ |
| `dashboard.spec.js` | 10 | 8 | 10 |
| `alerts.spec.js` | 14 | 25 | 1 (beforeEach) |
| `gaps.spec.js` | 20 | 29 | 1 (beforeEach) |
| `login.spec.js` | 11 | 4 | 0 (tests login itself) |
| `visual.spec.js` | 4 | 0 | 4 (via navigateWithAuth) |
| **Total** | **59** | **66** | **16 patterns** |
### Root Causes
1. **Anti-Pattern: Hardcoded Timeouts**
- `waitForTimeout(2000)` used to "wait for data to load"
- Unnecessarily slow on fast systems, flaky on slow systems
- No correlation to actual page readiness
2. **Anti-Pattern: Per-Test Authentication**
- Each test navigates to `/login`, enters password, submits
- Session cookie persists across requests but not across tests
- `beforeEach` login adds 2-3 seconds per test
3. **Architecture: Mixed Test Types**
- Visual regression tests require different infrastructure (baseline images)
- Functional tests and visual tests compete for worker slots
- Cannot optimize CI parallelization
### Requirements
1. Reduce test suite runtime by 40-55%
2. Improve test determinism (eliminate flakiness)
3. Maintain test coverage and reliability
4. Enable parallel CI execution where possible
5. Document patterns for other projects
## Decision
Implement three optimization phases:
### Phase 1: Event-Based Wait Replacement (Primary Impact: ~50% of time savings)
Replace all 66 `waitForTimeout()` calls with Playwright's event-based waiting APIs.
**Replacement Patterns:**
| Current Pattern | Replacement | Rationale |
| --------------------------------------- | ------------------------------------------------- | ----------------------------- |
| `waitForTimeout(2000)` after navigation | `waitForLoadState('networkidle')` | Waits for network quiescence |
| `waitForTimeout(1000)` after click | `waitForSelector('.result')` | Waits for specific DOM change |
| `waitForTimeout(3000)` for charts | `waitForSelector('canvas', { state: 'visible' })` | Waits for chart render |
| `waitForTimeout(500)` for viewport | `waitForFunction(() => ...)` | Waits for layout reflow |
**Implementation Examples:**
```javascript
// BEFORE: Hardcoded timeout
await page.goto('/alerts');
await page.waitForTimeout(2000);
const rows = await page.locator('tbody tr').count();
// AFTER: Event-based wait
await page.goto('/alerts');
await page.waitForLoadState('networkidle');
await page.waitForSelector('tbody tr', { state: 'attached' });
const rows = await page.locator('tbody tr').count();
```
```javascript
// BEFORE: Hardcoded timeout after action
await page.click('#runCheckBtn');
await page.waitForTimeout(2000);
// AFTER: Wait for response
const [response] = await Promise.all([
page.waitForResponse((resp) => resp.url().includes('/api/check')),
page.click('#runCheckBtn'),
]);
```
**Helper Function Addition to `helpers.js`:**
```javascript
/**
* Waits for page to be fully loaded with data.
* Replaces hardcoded waitForTimeout calls.
*/
async function waitForPageReady(page, options = {}) {
const { dataSelector = null, networkIdle = true, minTime = 0 } = options;
const promises = [];
if (networkIdle) {
promises.push(page.waitForLoadState('networkidle'));
}
if (dataSelector) {
promises.push(page.waitForSelector(dataSelector, { state: 'visible' }));
}
if (minTime > 0) {
promises.push(page.waitForTimeout(minTime)); // Escape hatch for animations
}
await Promise.all(promises);
}
```
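A typical call site, replacing a hardcoded wait (the selector is illustrative):

```javascript
// BEFORE: await page.goto('/dashboard'); await page.waitForTimeout(2000);
await page.goto('/dashboard');
await waitForPageReady(page, { dataSelector: '.chart-card canvas' });
```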
**Estimated Time Savings:** 60-80 seconds (eliminates ~120s of cumulative waits, but event waits have overhead)
### Phase 2: Global Authentication Setup (Primary Impact: ~35% of time savings)
Share authenticated session across all tests using Playwright's global setup feature.
**Architecture:**
```
┌──────────────────┐
│ global-setup.js │
│ │
│ 1. Login once │
│ 2. Save storage │
└────────┬─────────┘
┌──────────────────────┼──────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ dashboard.spec │ │ alerts.spec │ │ gaps.spec │
│ (reuses auth) │ │ (reuses auth) │ │ (reuses auth) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**Implementation Files:**
**`tests/browser/global-setup.js`:**
```javascript
const { chromium } = require('@playwright/test');
const path = require('path');
const authFile = path.join(__dirname, '.auth', 'user.json');
module.exports = async function globalSetup() {
const browser = await chromium.launch();
const page = await browser.newPage();
// Only perform login if authentication is enabled
  if (process.env.DASHBOARD_PASSWORD) {
    const baseURL = process.env.PLAYWRIGHT_BASE_URL || 'http://localhost:8980';
    // Perform login (pages created outside the test runner have no baseURL
    // option, so relative paths like '/login' would fail - use absolute URLs)
    await page.goto(`${baseURL}/login`);
    await page.fill('#password', process.env.DASHBOARD_PASSWORD);
    await page.click('button[type="submit"]');
    await page.waitForURL(`${baseURL}/`);
// Save authentication state
await page.context().storageState({ path: authFile });
}
await browser.close();
};
```
**`playwright.config.js` Updates:**
```javascript
module.exports = defineConfig({
// ... existing config ...
// Global setup runs once before all tests
globalSetup: require.resolve('./tests/browser/global-setup.js'),
projects: [
{
name: 'chromium',
use: {
...devices['Desktop Chrome'],
// Reuse authentication state from global setup
storageState: './tests/browser/.auth/user.json',
},
},
],
});
```
**Test File Updates:**
```javascript
// BEFORE: Login in beforeEach
test.beforeEach(async ({ page }) => {
page.consoleErrors = captureConsoleErrors(page);
if (isAuthEnabled()) {
await login(page);
}
});
// AFTER: Remove login (handled by global setup)
test.beforeEach(async ({ page }) => {
page.consoleErrors = captureConsoleErrors(page);
// Authentication already applied via storageState
});
```
**Estimated Time Savings:** 80-100 seconds (43 logins x ~2-3s each, minus 3s for global setup)
### Phase 3: Visual Test Separation (Primary Impact: CI parallelization)
Separate visual regression tests into a dedicated project for parallel CI execution.
**Project Configuration:**
```javascript
// playwright.config.js
module.exports = defineConfig({
projects: [
// Functional tests - fast, event-based
{
name: 'functional',
testMatch: /^(?!.*visual).*\.spec\.js$/,
use: {
...devices['Desktop Chrome'],
storageState: './tests/browser/.auth/user.json',
},
},
// Visual tests - separate baseline management
{
name: 'visual',
testMatch: '**/visual.spec.js',
use: {
...devices['Desktop Chrome'],
storageState: './tests/browser/.auth/user.json',
},
// Different snapshot handling
snapshotPathTemplate: '{testDir}/__screenshots__/{projectName}/{testFilePath}/{arg}{ext}',
},
],
});
```
**CI Pipeline Updates:**
```yaml
# .gitea/workflows/test.yml
jobs:
browser-functional:
runs-on: ubuntu-latest
steps:
- run: npx playwright test --project=functional
browser-visual:
runs-on: ubuntu-latest
steps:
- run: npx playwright test --project=visual
```
**Estimated Time Savings:** 30-45 seconds (parallel execution vs sequential)
## Implementation Schedule
### Critical Path (Estimated 8-12 hours)
```
Phase 1 (Event Waits) ████████████████ [4-6h]
Phase 2 (Global Auth) ████████ [2-3h]
Phase 3 (Visual Separation) ████ [2-3h]
```
### Effort Summary
| Phase | Min Hours | Max Hours | Expected Savings |
| ------------------------ | --------- | --------- | --------------------- |
| 1. Event-Based Waits | 4 | 6 | 60-80s (25-33%) |
| 2. Global Authentication | 2 | 3 | 80-100s (33-42%) |
| 3. Visual Separation | 2 | 3 | 30-45s (CI parallel) |
| **Total**                | **8**     | **12**    | **170-225s (70-94%)** |

Note: per-phase estimates are not strictly additive (a test that no longer logs in also waits less), so realized end-to-end savings are lower; see Expected Results below.
### Expected Results
| Metric | Before | After | Improvement |
| ------------------ | ------ | --------- | ------------- |
| Total Runtime | 240s | 110-140s | 42-54% faster |
| Flaky Test Rate | ~5% | <1% | 80% reduction |
| CI Parallelization | None | 2 workers | 2x throughput |
| Login Operations | 43 | 1 | 98% reduction |
| Hardcoded Waits | 66 | <5 | 92% reduction |
## Consequences
### Positive
1. **Performance**: 40-55% reduction in test runtime
2. **Reliability**: Event-based waits eliminate timing flakiness
3. **Scalability**: Global setup pattern scales to N tests with O(1) login cost
4. **CI Efficiency**: Parallel visual tests enable faster feedback loops
5. **Maintainability**: Centralized auth logic reduces code duplication
6. **Transferable Knowledge**: Patterns applicable to any Playwright project
### Negative
1. **Initial Migration Effort**: 8-12 hours of refactoring
2. **Learning Curve**: Team must understand Playwright wait APIs
3. **Global Setup Complexity**: Adds shared state between tests
4. **Debugging Harder**: Shared auth can mask test isolation issues
### Mitigations
| Risk | Mitigation |
| ------------------ | ------------------------------------------------------------- |
| Global setup fails | Add retry logic; fallback to per-test login |
| Event waits flaky | Keep small timeout buffer (100ms) as escape hatch |
| Visual tests drift | Separate baseline management per environment |
| Test isolation | Run `--project=functional` without auth for isolation testing |
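For the first mitigation, a minimal retry sketch around the global-setup login (`performLogin` wraps the login steps shown earlier; the fallback env flag is illustrative):

```javascript
// global-setup.js: retry login before falling back to per-test auth
module.exports = async function globalSetup() {
  for (let attempt = 1; attempt <= 3; attempt++) {
    try {
      await performLogin(); // the login + storageState steps shown above
      return;
    } catch (err) {
      console.warn(`Global auth attempt ${attempt} failed: ${err.message}`);
    }
  }
  // Signal specs to use per-test login instead (illustrative flag)
  process.env.FALLBACK_PER_TEST_LOGIN = 'true';
};
```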
### Neutral
- Test count unchanged (59 tests)
- Coverage unchanged
- Visual baselines unchanged (path changes only)
## Alternatives Considered
### Alternative 1: Reduce Test Count
**Rejected:** Sacrifices coverage for speed. Tests exist for a reason.
### Alternative 2: Increase Worker Parallelism
**Rejected:** Server cannot handle >2 concurrent sessions reliably; creates resource contention.
### Alternative 3: Use `page.waitForTimeout()` with Shorter Durations
**Rejected:** Addresses symptom, not root cause. Still creates timing-dependent tests.
### Alternative 4: Cookie Injection Instead of Login
**Rejected:** Requires reverse-engineering session format; brittle if auth changes.
### Alternative 5: HTTP API Authentication (No Browser)
**Rejected:** Loses validation of real browser session behavior, and the login flow itself still needs browser coverage.
## Implementation Details
### Wait Replacement Mapping
| File | Current Timeouts | Replacement Strategy |
| ------------------- | ---------------------- | ---------------------------------------------------------------------- |
| `dashboard.spec.js` | 1000ms, 2000ms, 3000ms | `waitForSelector` for charts, `waitForLoadState` for navigation |
| `alerts.spec.js` | 500ms, 1000ms, 2000ms | `waitForResponse` for API calls, `waitForSelector` for table rows |
| `gaps.spec.js` | 500ms, 1000ms, 2000ms | `waitForResponse` for `/api/gaps`, `waitForSelector` for summary cards |
| `login.spec.js` | 500ms, 2000ms | `waitForURL` for redirects, `waitForSelector` for error messages |
### Common Wait Patterns for This Codebase
| Scenario | Recommended Pattern | Example |
| --------------------- | ------------------------------------------------- | ------------------------- |
| After page navigation | `waitForLoadState('networkidle')` | Loading dashboard data |
| After button click | `waitForResponse()` + `waitForSelector()` | Run Check button |
| After filter change | `waitForResponse(/api\/.*/)` | Status filter dropdown |
| For chart rendering | `waitForSelector('canvas', { state: 'visible' })` | Chart cards |
| For modal appearance | `waitForSelector('.modal', { state: 'visible' })` | Confirmation dialogs |
| For layout change | `waitForFunction()` | Responsive viewport tests |
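As a concrete instance of the filter-change row (selector and endpoint are illustrative):

```javascript
// Change the status filter and wait for the resulting API call to succeed
const [response] = await Promise.all([
  page.waitForResponse((resp) => /\/api\//.test(resp.url()) && resp.ok()),
  page.selectOption('#statusFilter', 'triggered'),
]);
await page.waitForSelector('tbody tr', { state: 'attached' });
```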
### Auth Storage Structure
```
tests/browser/
├── .auth/
│ └── user.json # Generated by global-setup, gitignored
├── global-setup.js # Creates user.json
├── dashboard.spec.js # Uses storageState
├── alerts.spec.js
├── gaps.spec.js
├── login.spec.js # Tests login itself, may need special handling
└── visual.spec.js
```
**`.gitignore` Addition:**
```
tests/browser/.auth/
```
### Login.spec.js Special Handling
`login.spec.js` tests the login flow itself and must NOT use the shared auth state:
```javascript
// playwright.config.js
projects: [
{
name: 'functional',
    testMatch: /^(?!.*(login|visual)).*\.spec\.js$/, // exclude login and visual (each has its own project)
use: { storageState: './tests/browser/.auth/user.json' },
},
{
name: 'login',
testMatch: '**/login.spec.js',
use: { storageState: undefined }, // No auth - tests login flow
},
],
```
## Testing the Optimization
### Baseline Measurement
```bash
# Before optimization: establish baseline
time npm run test:browser 2>&1 | tee baseline-timing.log
grep -E "passed|failed|skipped" baseline-timing.log
```
### Incremental Verification
```bash
# After Phase 1: verify wait replacement
npm run test:browser -- --reporter=list 2>&1 | grep -E "passed|failed|slow"
# After Phase 2: verify global auth
npm run test:browser -- --trace on
# Check trace for login occurrences (should be 1)
# After Phase 3: verify parallel execution
npm run test:browser -- --project=functional &
npm run test:browser -- --project=visual &
wait
```
### Success Criteria
| Metric | Target | Measurement |
| ---------------------- | ------ | -------------------------------------- |
| Total runtime | <150s | `time npm run test:browser` |
| Login count | 1 | Grep traces for `/login` navigation |
| Flaky rate | <2% | 50 consecutive CI runs |
| `waitForTimeout` count | <5 | `grep -r waitForTimeout tests/browser` |
## Lessons Learned / Patterns for Other Projects
### Pattern 1: Always Prefer Event-Based Waits
```javascript
// Bad
await page.click('#submit');
await page.waitForTimeout(2000);
expect(await page.title()).toBe('Success');
// Good
await Promise.all([page.waitForNavigation(), page.click('#submit')]);
expect(await page.title()).toBe('Success');
```
### Pattern 2: Global Setup for Authentication
Playwright's `storageState` feature should be the default for any authenticated app:
1. Create `global-setup.js` that performs login once
2. Save cookies/storage to JSON file
3. Configure `storageState` in `playwright.config.js`
4. Tests start authenticated with zero overhead
### Pattern 3: Separate Test Types by Execution Characteristics
| Test Type | Characteristics | Strategy |
| ---------- | ------------------------ | --------------------------------- |
| Functional | Fast, deterministic | Run first, gate deployment |
| Visual | Slow, baseline-dependent | Run in parallel, separate project |
| E2E | Cross-service, slow | Run nightly, separate workflow |
### Pattern 4: Measure Before and After
Always establish baseline metrics before optimization:
```bash
# Essential metrics to capture
time npm run test:browser # Total runtime
grep -c waitForTimeout *.js # Hardcoded wait count
grep -c 'await login' *.js # Login call count
```
## Related ADRs
- [ADR-0031](0031-quality-gates-eslint-playwright.md): Quality Gates - ESLint, Pre-commit Hooks, and Playwright Browser Testing
- [ADR-0035](0035-browser-test-selector-fixes.md): Browser Test Selector Fixes
- [ADR-0008](0008-testing-strategy.md): Testing Strategy
## References
- Playwright Best Practices: https://playwright.dev/docs/best-practices
- Playwright Authentication: https://playwright.dev/docs/auth
- Playwright Wait Strategies: https://playwright.dev/docs/actionability
- Test Files: `tests/browser/*.spec.js`
- Helper Module: `tests/browser/helpers.js`
- Configuration: `playwright.config.js`

# ADR-027: Standardized Application-Wide Structured Logging
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-004)
**Related**: [ADR-017](ADR-017-structured-logging-with-pino.md), [ADR-028](ADR-028-client-side-structured-logging.md), [ADR-029](ADR-029-error-tracking-with-bugsink.md)
## Context
While ADR-017 established Pino as our logging framework, this ADR extends that foundation with application-wide standards for request tracing, context propagation, and structured log formats.
The implementation of logging can vary significantly across different modules. The error handler middleware may produce high-quality, structured JSON logs for errors, but logging within route handlers and service layers can become ad-hoc, using plain strings or inconsistent object structures.
This inconsistency leads to several problems:
- **Difficult Debugging**: It is hard to trace a single user request through the system or correlate events related to a specific operation
- **Ineffective Log Analysis**: Inconsistent log formats make it difficult to effectively query, filter, and create dashboards in log management systems (like Datadog, Splunk, or the ELK stack)
- **Security Risks**: There is no enforced standard for redacting sensitive information (like passwords or tokens) in logs outside of the error handler, increasing the risk of accidental data exposure
- **Missing Context**: Logs often lack crucial context, such as a unique request ID, the authenticated user's ID, or the source IP address, making them less useful for diagnosing issues
## Decision
We will adopt a standardized, application-wide structured logging policy. All log entries MUST be in JSON format and adhere to a consistent schema.
### 1. Request-Scoped Logger with Context
We will create a middleware that runs at the beginning of the request lifecycle. This middleware will:
- Generate a unique `request_id` for each incoming request
- Create a request-scoped logger instance (a "child logger") that automatically includes the `request_id`, `user_id` (if authenticated), and `ip_address` in every log message it generates
- Attach this child logger to the `req` object (e.g., `req.log`)
### 2. Mandatory Use of Request-Scoped Logger
All route handlers and any service functions called by them **MUST** use the request-scoped logger (`req.log`) instead of the global logger instance. This ensures all logs for a given request are automatically correlated.
### 3. Standardized Log Schema
All log messages should follow a base schema. The logger configuration will be updated to enforce this.
**Base Fields**: `level`, `timestamp`, `message`, `request_id`, `user_id`, `ip_address`
**Error Fields**: When logging an error, the log entry MUST include an `error` object with `name`, `message`, and `stack`.
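For illustration, an ERROR-level entry under this schema might look like (values are hypothetical):

```json
{
  "level": "error",
  "timestamp": "2026-02-10T12:34:56.789Z",
  "message": "Failed to update user profile",
  "request_id": "abc123",
  "user_id": "user_456",
  "ip_address": "192.168.1.1",
  "error": {
    "name": "TypeError",
    "message": "Cannot read properties of undefined",
    "stack": "TypeError: Cannot read properties of undefined\n    at updateProfile (...)"
  }
}
```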
### 4. Standardized Logging Practices
| Level | HTTP Status | Scenario |
| ----- | ----------- | -------------------------------------------------- |
| DEBUG | Any | Request incoming, internal state, development info |
| INFO | 2xx | Successful requests, business events |
| WARN | 4xx | Client errors, validation failures, not found |
| ERROR | 5xx | Server errors, unhandled exceptions |
## Implementation Details
### Logger Configuration
Located in `src/services/logger.server.ts`:
```typescript
import pino from 'pino';
const isProduction = process.env.NODE_ENV === 'production';
const isTest = process.env.NODE_ENV === 'test';
export const logger = pino({
level: isProduction ? 'info' : 'debug',
transport:
isProduction || isTest
? undefined
: {
target: 'pino-pretty',
options: {
colorize: true,
translateTime: 'SYS:standard',
ignore: 'pid,hostname',
},
},
redact: {
paths: [
'req.headers.authorization',
'req.headers.cookie',
'*.body.password',
'*.body.newPassword',
'*.body.currentPassword',
'*.body.confirmPassword',
'*.body.refreshToken',
'*.body.token',
],
censor: '[REDACTED]',
},
});
```
### Request Logger Middleware
Located in `server.ts`:
```typescript
import { randomUUID } from 'crypto';
import type { Request, Response, NextFunction } from 'express';
import { logger } from './services/logger.server';

// UserProfile is the application's auth user type (attached by Passport);
// getDurationInMilliseconds converts a process.hrtime() tuple to ms.
const getDurationInMilliseconds = (start: [number, number]): number => {
  const diff = process.hrtime(start);
  return diff[0] * 1000 + diff[1] / 1e6;
};
const requestLogger = (req: Request, res: Response, next: NextFunction) => {
const requestId = randomUUID();
const user = req.user as UserProfile | undefined;
const start = process.hrtime();
// Create request-scoped logger
req.log = logger.child({
request_id: requestId,
user_id: user?.user.user_id,
ip_address: req.ip,
});
req.log.debug({ method: req.method, originalUrl: req.originalUrl }, 'INCOMING');
res.on('finish', () => {
const duration = getDurationInMilliseconds(start);
const { statusCode, statusMessage } = res;
    const logDetails: Record<string, unknown> = {
user_id: (req.user as UserProfile | undefined)?.user.user_id,
method: req.method,
originalUrl: req.originalUrl,
statusCode,
statusMessage,
duration: duration.toFixed(2),
};
// Include request details for failed requests (for debugging)
if (statusCode >= 400) {
logDetails.req = { headers: req.headers, body: req.body };
}
if (statusCode >= 500) req.log.error(logDetails, 'Request completed with server error');
else if (statusCode >= 400) req.log.warn(logDetails, 'Request completed with client error');
else req.log.info(logDetails, 'Request completed successfully');
});
next();
};
app.use(requestLogger);
```
### TypeScript Support
The `req.log` property is typed via declaration merging in `src/types/express.d.ts`:
```typescript
import { Logger } from 'pino';
declare global {
namespace Express {
export interface Request {
log: Logger;
}
}
}
```
### Automatic Sensitive Data Redaction
The Pino logger automatically redacts sensitive fields:
```json
// Before redaction
{
"body": {
"email": "user@example.com",
"password": "secret123",
"newPassword": "newsecret456"
}
}
// After redaction (in logs)
{
"body": {
"email": "user@example.com",
"password": "[REDACTED]",
"newPassword": "[REDACTED]"
}
}
```
### Service Layer Logging
Services accept the request-scoped logger as an optional parameter:
```typescript
import { Logger } from 'pino';
import { logger } from './logger.server';

export async function registerUser(email: string, password: string, reqLog?: Logger) {
  const log = reqLog || logger; // Fall back to the global logger
log.info({ email }, 'Registering new user');
// ... implementation
log.debug({ userId: user.user_id }, 'User created successfully');
return user;
}
// In a route handler: pass the request-scoped logger through
router.post('/register', async (req, res, next) => {
  try {
    await authService.registerUser(req.body.email, req.body.password, req.log);
    res.status(201).end();
  } catch (err) {
    next(err); // centralized error handler logs via req.log
  }
});
```
### Log Output Format
**Development** (pino-pretty):
```text
[2026-01-09 12:34:56.789] INFO (request_id=abc123): Request completed successfully
method: "GET"
originalUrl: "/api/users"
statusCode: 200
duration: "45.23"
```
**Production** (JSON):
```json
{
"level": 30,
"time": 1704812096789,
"request_id": "abc123",
"user_id": "user_456",
"ip_address": "192.168.1.1",
"method": "GET",
"originalUrl": "/api/users",
"statusCode": 200,
"duration": "45.23",
"msg": "Request completed successfully"
}
```
## Consequences
### Positive
- **Enhanced Observability**: Every log line from a single request can be instantly grouped and analyzed, dramatically speeding up debugging
- **Improved Security**: Centralizing the addition of context (like `user_id`) reduces the chance of developers manually logging sensitive data
- **Scalable Log Management**: Consistent JSON logs are easily ingested and indexed by any modern log aggregation tool
- **Clearer Code**: Removes the need to manually pass contextual information (like user ID) down to service functions just for logging purposes
### Negative
- **Refactoring Effort**: Requires adding the `requestLogger` middleware and refactoring all routes and services to use `req.log` instead of the global `logger`
- **Slight Performance Overhead**: Creating a child logger for every request adds a minor performance cost, though this is negligible for most modern logging libraries
## Key Files
- `src/services/logger.server.ts` - Pino logger configuration
- `src/services/logger.client.ts` - Client-side logger (for frontend)
- `src/types/express.d.ts` - TypeScript declaration for `req.log`
- `server.ts` - Request logger middleware
## References
- [ADR-017: Structured Logging with Pino](ADR-017-structured-logging-with-pino.md)
- [ADR-001: Standardized Error Handling](ADR-001-standardized-error-handling.md) - Error handler uses `req.log` for error logging
- [ADR-028: Client-Side Structured Logging](ADR-028-client-side-structured-logging.md) - Client-side logging strategy
- [Pino Documentation](https://getpino.io/#/)

# ADR-028: Standardized Client-Side Structured Logging
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-026)
**Related**: [ADR-027](ADR-027-application-wide-structured-logging.md), [ADR-029](ADR-029-error-tracking-with-bugsink.md)
## Context
Following the standardization of backend logging in ADR-027, it is clear that our frontend components also require a consistent logging strategy. Currently, components either use `console.log` directly or a simple wrapper, but without a formal standard, this can lead to inconsistent log formats and difficulty in debugging user-facing issues.
While the frontend does not have the concept of a "request-scoped" logger, the principles of structured, context-rich logging are equally important for:
1. **Effective Debugging**: Understanding the state of a component or the sequence of user interactions that led to an error
2. **Integration with Monitoring Tools**: Sending structured logs to services like Sentry/Bugsink or LogRocket allows for powerful analysis and error tracking in production
3. **Clean Test Outputs**: Uncontrolled logging can pollute test runner output, making it difficult to spot actual test failures
An existing client-side logger at `src/services/logger.client.ts` provides a simple, structured logging interface. This ADR formalizes its use as the application standard.
## Decision
We will adopt a standardized, application-wide structured logging policy for all client-side (React) code.
### 1. Mandatory Use of the Global Client Logger
All frontend components, hooks, and services **MUST** use the global logger singleton exported from `src/services/logger.client.ts`. Direct use of `console.log`, `console.error`, etc., is discouraged.
### 2. Pino-like API for Structured Logging
The client logger mimics the `pino` API, which is the standard on the backend. It supports two primary call signatures:
- `logger.info('A simple message');`
- `logger.info({ key: 'value' }, 'A message with a structured data payload');`
The second signature, which includes a data object as the first argument, is **strongly preferred**, especially for logging errors or complex state.
### 3. Mocking in Tests
All Jest/Vitest tests for components or hooks that use the logger **MUST** mock the `src/services/logger.client.ts` module. This prevents logs from appearing in test output and allows for assertions that the logger was called correctly.
## Implementation
### Client Logger Service
Located in `src/services/logger.client.ts`:
```typescript
type LogLevel = 'debug' | 'info' | 'warn' | 'error';
interface LoggerOptions {
level?: LogLevel;
enabled?: boolean;
}
const LOG_LEVELS: Record<LogLevel, number> = {
debug: 0,
info: 1,
warn: 2,
error: 3,
};
class ClientLogger {
private level: LogLevel;
private enabled: boolean;
constructor(options: LoggerOptions = {}) {
this.level = options.level ?? 'info';
this.enabled = options.enabled ?? import.meta.env.DEV;
}
private shouldLog(level: LogLevel): boolean {
return this.enabled && LOG_LEVELS[level] >= LOG_LEVELS[this.level];
}
private formatMessage(data: object | string, message?: string): string {
if (typeof data === 'string') {
return data;
}
const payload = JSON.stringify(data, null, 2);
return message ? `${message}\n${payload}` : payload;
}
debug(data: object | string, message?: string): void {
if (this.shouldLog('debug')) {
console.debug(`[DEBUG] ${this.formatMessage(data, message)}`);
}
}
info(data: object | string, message?: string): void {
if (this.shouldLog('info')) {
console.info(`[INFO] ${this.formatMessage(data, message)}`);
}
}
warn(data: object | string, message?: string): void {
if (this.shouldLog('warn')) {
console.warn(`[WARN] ${this.formatMessage(data, message)}`);
}
}
error(data: object | string, message?: string): void {
if (this.shouldLog('error')) {
console.error(`[ERROR] ${this.formatMessage(data, message)}`);
}
}
}
export const logger = new ClientLogger({
level: import.meta.env.DEV ? 'debug' : 'warn',
enabled: true,
});
```
### Example Usage
**Logging an Error in a Component:**
```typescript
// In a React component or hook
import { logger } from '../services/logger.client';
import { notifyError } from '../services/notificationService';
const fetchData = async () => {
try {
const data = await apiClient.getData();
return data;
} catch (err) {
// Log the full error object for context, along with a descriptive message.
logger.error({ err }, 'Failed to fetch component data');
notifyError('Something went wrong. Please try again.');
}
};
```
**Logging State Changes:**
```typescript
// In a Zustand store or state hook
import { logger } from '../services/logger.client';
const useAuthStore = create((set) => ({
login: async (credentials) => {
logger.info({ email: credentials.email }, 'User login attempt');
try {
const user = await authService.login(credentials);
logger.info({ userId: user.id }, 'User logged in successfully');
set({ user, isAuthenticated: true });
} catch (error) {
logger.error({ error }, 'Login failed');
throw error;
}
},
}));
```
### Mocking the Logger in Tests
```typescript
// In a *.test.tsx file
import { vi } from 'vitest';
// Mock the logger at the top of the test file
vi.mock('../services/logger.client', () => ({
logger: {
info: vi.fn(),
warn: vi.fn(),
error: vi.fn(),
debug: vi.fn(),
},
}));
// vi.mock is hoisted above imports, so this resolves to the mocked logger
import { logger } from '../services/logger.client';
describe('MyComponent', () => {
beforeEach(() => {
vi.clearAllMocks(); // Clear mocks between tests
});
it('should log an error when fetching fails', async () => {
// ... test setup to make fetch fail ...
// Assert that the logger was called with the expected structure
expect(logger.error).toHaveBeenCalledWith(
expect.objectContaining({ err: expect.any(Error) }),
'Failed to fetch component data',
);
});
});
```
## Integration with Error Tracking
When using Sentry/Bugsink for error tracking (see ADR-029), the client logger can be extended to send logs as breadcrumbs:
```typescript
import * as Sentry from '@sentry/react';
class ClientLogger {
// ... existing implementation
error(data: object | string, message?: string): void {
if (this.shouldLog('error')) {
console.error(`[ERROR] ${this.formatMessage(data, message)}`);
}
// Add to Sentry breadcrumbs for error context
Sentry.addBreadcrumb({
category: 'log',
level: 'error',
message: typeof data === 'string' ? data : message,
data: typeof data === 'object' ? data : undefined,
});
}
}
```
## Consequences
### Positive
- **Consistency**: All client-side logs will have a predictable structure, making them easier to read and parse
- **Debuggability**: Errors logged with a full object (`{ err }`) capture the stack trace and other properties, which is invaluable for debugging
- **Testability**: Components that log are easier to test without polluting CI/CD output. We can also assert that logging occurs when expected
- **Future-Proof**: If we later decide to send client-side logs to a remote service, we only need to modify the central `logger.client.ts` file instead of every component
- **Error Tracking Integration**: Logs can be used as breadcrumbs in Sentry/Bugsink for better error context
### Negative
- **Minor Boilerplate**: Requires importing the logger in every file that needs it and mocking it in every corresponding test file. However, this is a small and consistent effort
- **Production Noise**: Care must be taken to configure appropriate log levels in production to avoid performance impact
## Key Files
- `src/services/logger.client.ts` - Client-side logger implementation
- `src/services/logger.server.ts` - Backend logger (for reference)
## References
- [ADR-027: Application-Wide Structured Logging](ADR-027-application-wide-structured-logging.md)
- [ADR-029: Error Tracking with Bugsink](ADR-029-error-tracking-with-bugsink.md)
- [Pino Documentation](https://getpino.io/#/)

# ADR-029: Error Tracking and Observability with Bugsink
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-015)
**Related**: [ADR-027](ADR-027-application-wide-structured-logging.md), [ADR-028](ADR-028-client-side-structured-logging.md), [ADR-030](ADR-030-postgresql-function-observability.md), [ADR-032](ADR-032-application-performance-monitoring.md)
## Context
While ADR-027 established structured logging with Pino, the application lacks a high-level, aggregated view of its health and errors. It is difficult to spot trends, identify recurring issues, or be proactively notified of new types of errors.
Key requirements:
1. **Self-hosted**: No external SaaS dependencies for error tracking
2. **Sentry SDK compatible**: Leverage mature, well-documented SDKs
3. **Lightweight**: Minimal resource overhead in the dev container
4. **Production-ready**: Same architecture works on bare-metal production servers
5. **AI-accessible**: MCP server integration for Claude Code and other AI tools
**Note**: Application Performance Monitoring (APM) and distributed tracing are covered separately in [ADR-032](ADR-032-application-performance-monitoring.md).
## Decision
We implement a self-hosted error tracking stack using **Bugsink** as the Sentry-compatible backend, with the following components:
### 1. Error Tracking Backend: Bugsink
**Bugsink** is a lightweight, self-hosted Sentry alternative that:
- Runs as a single process (no Kafka, Redis, ClickHouse required)
- Is fully compatible with Sentry SDKs
- Supports ARM64 and AMD64 architectures
- Can use SQLite (dev) or PostgreSQL (production)
**Deployment**:
- **Dev container**: Installed as a systemd service inside the container
- **Production**: Runs as a systemd service on bare-metal, listening on localhost only
- **Database**: Uses PostgreSQL with a dedicated `bugsink` user and `bugsink` database (same PostgreSQL instance as the main application)
### 2. Backend Integration: @sentry/node
The Express backend integrates `@sentry/node` SDK to:
- Capture unhandled exceptions before PM2/process manager restarts
- Report errors with full stack traces and context
- Integrate with Pino logger for breadcrumbs
- Filter errors by severity (only 5xx errors sent by default)
### 3. Frontend Integration: @sentry/react
The React frontend integrates `@sentry/react` SDK to:
- Wrap the app in an Error Boundary for graceful error handling
- Capture unhandled JavaScript errors
- Report errors with component stack traces
- Filter out browser extension errors
- **Frontend Error Correlation**: The global API client intercepts 4xx/5xx responses and can attach the `x-request-id` header to Sentry scope for correlation with backend logs
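A minimal sketch of that correlation hook, assuming an axios-style `apiClient` (the client instance and wiring are illustrative):

```typescript
import * as Sentry from '@sentry/react';
import { apiClient } from './apiClient'; // hypothetical axios instance

// Tag failed responses with the backend's request ID so Bugsink issues
// can be correlated with the structured backend logs (ADR-027).
apiClient.interceptors.response.use(
  (response) => response,
  (error) => {
    const requestId = error.response?.headers?.['x-request-id'];
    if (requestId) {
      Sentry.setTag('request_id', requestId);
    }
    return Promise.reject(error);
  },
);
```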
### 4. Log Aggregation: Logstash
**Logstash** parses application and infrastructure logs, forwarding error patterns to Bugsink:
- **Installation**: Installed inside the dev container (and on bare-metal prod servers)
- **Inputs**:
- Pino JSON logs from the Node.js application (PM2 managed)
- Redis logs (connection errors, memory warnings, slow commands)
- PostgreSQL function logs (via `fn_log()` - see ADR-030)
- NGINX access/error logs
- **Filter**: Identifies error-level logs (5xx responses, unhandled exceptions, Redis errors)
- **Output**: Sends to Bugsink via Sentry-compatible HTTP API
This provides a secondary error capture path for:
- Errors that occur before Sentry SDK initialization
- Log-based errors that do not throw exceptions
- Redis connection/performance issues
- Database function errors and slow queries
- Historical error analysis from log files
### 5. MCP Server Integration: bugsink-mcp
For AI tool integration (Claude Code, Cursor, etc.), we use the open-source [bugsink-mcp](https://github.com/j-shelfwood/bugsink-mcp) server:
- **No code changes required**: Configurable via environment variables
- **Capabilities**: List projects, get issues, view events, get stacktraces, manage releases
- **Configuration**:
- `BUGSINK_URL`: Points to Bugsink instance (`http://localhost:8000` for dev, `https://bugsink.example.com` for prod)
- `BUGSINK_API_TOKEN`: API token from Bugsink (created via Django management command)
- `BUGSINK_ORG_SLUG`: Organization identifier (usually "sentry")
## Architecture
```text
+---------------------------------------------------------------------------+
| Dev Container / Production Server |
+---------------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Frontend | | Backend | |
| | (React) | | (Express) | |
| | @sentry/react | | @sentry/node | |
| +--------+---------+ +--------+---------+ |
| | | |
| | Sentry SDK Protocol | |
| +-----------+---------------+ |
| | |
| v |
| +----------------------+ |
| | Bugsink | |
| | (localhost:8000) |<------------------+ |
| | | | |
| | PostgreSQL backend | | |
| +----------------------+ | |
| | |
| +----------------------+ | |
| | Logstash |-------------------+ |
| | (Log Aggregator) | Sentry Output |
| | | |
| | Inputs: | |
| | - PM2/Pino logs | |
| | - Redis logs | |
| | - PostgreSQL logs | |
| | - NGINX logs | |
| +----------------------+ |
| ^ ^ ^ ^ |
| | | | | |
| +-----------+ | | +-----------+ |
| | | | | |
| +----+-----+ +-----+----+ +-----+----+ +-----+----+ |
| | PM2 | | Redis | | PostgreSQL| | NGINX | |
| | Logs | | Logs | | Logs | | Logs | |
| +----------+ +----------+ +-----------+ +---------+ |
| |
| +----------------------+ |
| | PostgreSQL | |
| | +----------------+ | |
| | | app_database | | (main app database) |
| | +----------------+ | |
| | | bugsink | | (error tracking database) |
| | +----------------+ | |
| +----------------------+ |
| |
+---------------------------------------------------------------------------+
External (Developer Machine):
+--------------------------------------------+
| Claude Code / Cursor / VS Code             |
| +----------------------------------------+ |
| | bugsink-mcp                            | |
| | (MCP Server)                           | |
| |                                        | |
| | BUGSINK_URL=http://localhost:8000      | |
| | BUGSINK_API_TOKEN=...                  | |
| | BUGSINK_ORG_SLUG=...                   | |
| +----------------------------------------+ |
+--------------------------------------------+
```
## Implementation Details
### Environment Variables
| Variable | Description | Default (Dev) |
| -------------------- | -------------------------------- | -------------------------- |
| `SENTRY_DSN` | Sentry-compatible DSN (backend) | Set after project creation |
| `VITE_SENTRY_DSN` | Sentry-compatible DSN (frontend) | Set after project creation |
| `SENTRY_ENVIRONMENT` | Environment name | `development` |
| `SENTRY_DEBUG` | Enable debug logging | `false` |
| `SENTRY_ENABLED` | Enable/disable error reporting | `true` |
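The init code below reads these via a `config` module; a minimal sketch of the assumed shape (property names inferred from usage):

```typescript
// src/config/env.ts (sketch): map environment variables onto the
// config object consumed by sentry.server.ts
export const config = {
  sentry: {
    enabled: process.env.SENTRY_ENABLED !== 'false',
    dsn: process.env.SENTRY_DSN,
    environment: process.env.SENTRY_ENVIRONMENT,
    debug: process.env.SENTRY_DEBUG === 'true',
  },
  server: {
    nodeEnv: process.env.NODE_ENV ?? 'development',
  },
};
```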
### PostgreSQL Setup
```sql
-- Create dedicated Bugsink database and user
CREATE USER bugsink WITH PASSWORD 'bugsink_dev_password';
CREATE DATABASE bugsink OWNER bugsink;
GRANT ALL PRIVILEGES ON DATABASE bugsink TO bugsink;
```
### Bugsink Configuration
```bash
# Environment variables for Bugsink service
SECRET_KEY=<random-50-char-string>
DATABASE_URL=postgresql://bugsink:bugsink_dev_password@localhost:5432/bugsink
BASE_URL=http://localhost:8000
PORT=8000
```
### Backend Sentry Integration
Located in `src/services/sentry.server.ts`:
```typescript
import * as Sentry from '@sentry/node';
import { config } from '../config/env';
export function initSentry() {
if (!config.sentry.enabled || !config.sentry.dsn) {
return;
}
Sentry.init({
dsn: config.sentry.dsn,
environment: config.sentry.environment || config.server.nodeEnv,
debug: config.sentry.debug,
// Performance monitoring - disabled by default (see ADR-032)
tracesSampleRate: 0,
// Filter out 4xx errors - only report server errors
beforeSend(event) {
const statusCode = event.contexts?.response?.status_code;
if (statusCode && statusCode >= 400 && statusCode < 500) {
return null;
}
return event;
},
});
}
// Set user context after authentication
export function setUserContext(user: { id: string; email: string; name?: string }) {
Sentry.setUser({
id: user.id,
email: user.email,
username: user.name,
});
}
// Clear user context on logout
export function clearUserContext() {
Sentry.setUser(null);
}
```
### Frontend Sentry Integration
Located in `src/services/sentry.client.ts`:
```typescript
import * as Sentry from '@sentry/react';
import { config } from '../config';
export function initSentry() {
if (!config.sentry.enabled || !config.sentry.dsn) {
return;
}
Sentry.init({
dsn: config.sentry.dsn,
environment: config.sentry.environment,
// Performance monitoring - disabled by default (see ADR-032)
tracesSampleRate: 0,
// Filter out browser extension errors
beforeSend(event) {
// Ignore errors from browser extensions
if (
event.exception?.values?.[0]?.stacktrace?.frames?.some((frame) =>
frame.filename?.includes('extension://'),
)
) {
return null;
}
return event;
},
});
}
// Set user context after login
export function setUserContext(user: { id: string; email: string; name?: string }) {
Sentry.setUser({
id: user.id,
email: user.email,
username: user.name,
});
}
// Clear user context on logout
export function clearUserContext() {
Sentry.setUser(null);
}
```
### Error Boundary Component
Located in `src/components/ErrorBoundary.tsx`:
```typescript
import * as Sentry from '@sentry/react';
import { Component, ErrorInfo, ReactNode } from 'react';
interface Props {
children: ReactNode;
fallback?: ReactNode;
}
interface State {
hasError: boolean;
}
export class ErrorBoundary extends Component<Props, State> {
constructor(props: Props) {
super(props);
this.state = { hasError: false };
}
static getDerivedStateFromError(): State {
return { hasError: true };
}
componentDidCatch(error: Error, errorInfo: ErrorInfo) {
Sentry.withScope((scope) => {
scope.setExtras({ componentStack: errorInfo.componentStack });
Sentry.captureException(error);
});
}
render() {
if (this.state.hasError) {
return this.props.fallback || (
<div className="error-boundary">
<h1>Something went wrong</h1>
<p>Please refresh the page or contact support.</p>
</div>
);
}
return this.props.children;
}
}
```
### Logstash Pipeline Configuration
Key routing for log sources:
| Source | Bugsink Project |
| --------------- | --------------- |
| Backend (Pino) | Backend API |
| Worker (Pino) | Backend API |
| PostgreSQL logs | Backend API |
| Vite logs | Infrastructure |
| Redis logs | Infrastructure |
| NGINX logs | Infrastructure |
| Frontend errors | Frontend |
## Consequences
### Positive
- **Full observability**: Aggregated view of errors and trends
- **Self-hosted**: No external SaaS dependencies or subscription costs
- **SDK compatibility**: Leverages mature Sentry SDKs with excellent documentation
- **AI integration**: MCP server enables Claude Code to query and analyze errors
- **Unified architecture**: Same setup works in dev container and production
- **Lightweight**: Bugsink runs in a single process, unlike full Sentry (16GB+ RAM)
- **Error correlation**: Request IDs allow correlation between frontend errors and backend logs
### Negative
- **Additional services**: Bugsink and Logstash add complexity to the container
- **PostgreSQL overhead**: Additional database for error tracking
- **Initial setup**: Requires configuration of multiple components
- **Logstash learning curve**: Pipeline configuration requires Logstash knowledge
## Alternatives Considered
1. **Full Sentry self-hosted**: Rejected due to complexity (Kafka, Redis, ClickHouse, 16GB+ RAM minimum)
2. **GlitchTip**: Considered, but Bugsink is lighter weight and easier to deploy
3. **Sentry SaaS**: Rejected due to self-hosted requirement
4. **Custom error aggregation**: Rejected in favor of proven Sentry SDK ecosystem
## References
- [Bugsink Documentation](https://www.bugsink.com/docs/)
- [Bugsink Docker Install](https://www.bugsink.com/docs/docker-install/)
- [@sentry/node Documentation](https://docs.sentry.io/platforms/javascript/guides/node/)
- [@sentry/react Documentation](https://docs.sentry.io/platforms/javascript/guides/react/)
- [bugsink-mcp](https://github.com/j-shelfwood/bugsink-mcp)
- [Logstash Reference](https://www.elastic.co/guide/en/logstash/current/index.html)
- [ADR-030: PostgreSQL Function Observability](ADR-030-postgresql-function-observability.md)
- [ADR-032: Application Performance Monitoring](ADR-032-application-performance-monitoring.md)

# ADR-030: PostgreSQL Function Observability
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-050)
**Related**: [ADR-029](ADR-029-error-tracking-with-bugsink.md), [ADR-027](ADR-027-application-wide-structured-logging.md)
## Context
Applications often use PostgreSQL functions and triggers for business logic, including:
- Data transformations and validations
- Complex query encapsulation
- Trigger-based side effects
- Audit logging
**Current Problem**: These database functions can fail silently in several ways:
1. **`ON CONFLICT DO NOTHING`** - Swallows constraint violations without notification
2. **`IF NOT FOUND THEN RETURN;`** - Silently exits when data is missing
3. **Trigger functions returning `NULL`** - No indication of partial failures
4. **No logging inside functions** - No visibility into function execution
When these silent failures occur:
- The application layer receives no error (function "succeeds" but does nothing)
- No logs are generated for debugging
- Issues are only discovered when users report missing data
- Root cause analysis is extremely difficult
**Example of Silent Failure**:
```sql
-- This function silently does nothing if the record doesn't exist
CREATE OR REPLACE FUNCTION public.process_item(p_user_id UUID, p_item_name TEXT)
RETURNS void AS $$
DECLARE
  v_item_id BIGINT;
BEGIN
  SELECT item_id INTO v_item_id FROM items WHERE name = p_item_name;
  IF v_item_id IS NULL THEN
    RETURN; -- Silent failure: no log, no error
  END IF;
  -- ...
END;
$$ LANGUAGE plpgsql;
```
ADR-029 established Logstash + Bugsink for error tracking, with PostgreSQL log integration. This ADR defines the implementation.
## Decision
We will implement a standardized PostgreSQL function observability strategy with three tiers of logging severity.
### 1. Function Logging Helper
Create a reusable logging function that outputs structured JSON to PostgreSQL logs:
```sql
-- Function to emit structured log messages from PL/pgSQL
CREATE OR REPLACE FUNCTION public.fn_log(
p_level TEXT, -- 'DEBUG', 'INFO', 'NOTICE', 'WARNING', 'ERROR'
p_function_name TEXT, -- The calling function name
p_message TEXT, -- Human-readable message
p_context JSONB DEFAULT NULL -- Additional context (user_id, params, etc.)
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
log_line TEXT;
BEGIN
-- Build structured JSON log line
log_line := jsonb_build_object(
'timestamp', now(),
'level', p_level,
'source', 'postgresql',
'function', p_function_name,
'message', p_message,
'context', COALESCE(p_context, '{}'::jsonb)
)::text;
-- Use appropriate RAISE level
CASE p_level
WHEN 'DEBUG' THEN RAISE DEBUG '%', log_line;
WHEN 'INFO' THEN RAISE INFO '%', log_line;
WHEN 'NOTICE' THEN RAISE NOTICE '%', log_line;
WHEN 'WARNING' THEN RAISE WARNING '%', log_line;
WHEN 'ERROR' THEN RAISE LOG '%', log_line; -- Use LOG for errors to ensure capture
ELSE RAISE NOTICE '%', log_line;
END CASE;
END;
$$;
```
### 2. Logging Tiers
#### Tier 1: Critical Functions (Always Log)
Functions where silent failure causes data corruption or user-facing issues:
| Function Type | Log Events |
| ---------------------------- | --------------------------------------- |
| User creation/management | User creation, profile creation, errors |
| Permission/role changes | Role not found, permission denied |
| Financial transactions | Transaction not found, balance issues |
| Data approval workflows | Record not found, permission denied |
| Critical business operations | Items added, operations completed |
**Pattern**:
```sql
CREATE OR REPLACE FUNCTION public.process_critical_operation(p_user_id UUID, p_operation_name TEXT)
RETURNS void AS $$
DECLARE
v_operation_id BIGINT;
v_context JSONB;
BEGIN
v_context := jsonb_build_object('user_id', p_user_id, 'operation_name', p_operation_name);
SELECT operation_id INTO v_operation_id
FROM public.operations WHERE name = p_operation_name;
IF v_operation_id IS NULL THEN
-- Log the issue instead of silent return
PERFORM fn_log('WARNING', 'process_critical_operation',
'Operation not found: ' || p_operation_name, v_context);
RETURN;
END IF;
-- Perform operation
INSERT INTO public.user_operations (user_id, operation_id)
VALUES (p_user_id, v_operation_id)
ON CONFLICT (user_id, operation_id) DO NOTHING;
IF FOUND THEN
PERFORM fn_log('INFO', 'process_critical_operation',
'Operation completed: ' || p_operation_name, v_context);
END IF;
END;
$$ LANGUAGE plpgsql;
```
#### Tier 2: Business Logic Functions (Log on Anomalies)
Functions where unexpected conditions should be logged but are not critical:
| Function Type | Log Events |
| --------------------------- | -------------------------------- |
| Search/suggestion functions | No match found (below threshold) |
| Recommendation engines | No recommendations generated |
| Data lookup functions | Empty results, no matches found |
| Price/analytics queries | No data available, stale data |
**Pattern**: Log when results are unexpectedly empty or inputs are invalid.
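A sketch of that pattern for a lookup function (table and column names are illustrative):

```sql
CREATE OR REPLACE FUNCTION public.suggest_items(p_query TEXT)
RETURNS SETOF items AS $$
BEGIN
  RETURN QUERY
    SELECT * FROM items WHERE name ILIKE '%' || p_query || '%';
  IF NOT FOUND THEN
    -- Anomaly: a search produced no suggestions
    PERFORM fn_log('INFO', 'suggest_items', 'No matches for query',
                   jsonb_build_object('query', p_query));
  END IF;
END;
$$ LANGUAGE plpgsql;
```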
#### Tier 3: Triggers (Log Errors Only)
Triggers should be fast, so only log when something goes wrong:
| Trigger Type | Log Events |
| --------------------- | ---------------------------- |
| Audit triggers | Failed to update audit trail |
| Aggregation triggers | Calculation failed |
| Cascade triggers | Related record lookup failed |
| Notification triggers | External service call failed |
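A sketch of error-only logging in a trigger (the trigger and stats table are illustrative); the exception handler keeps the original write from failing:

```sql
CREATE OR REPLACE FUNCTION public.trg_update_item_stats()
RETURNS trigger AS $$
BEGIN
  UPDATE item_stats SET item_count = item_count + 1
  WHERE item_id = NEW.item_id;
  RETURN NEW;
EXCEPTION WHEN OTHERS THEN
  -- Log instead of aborting the triggering statement
  PERFORM fn_log('ERROR', 'trg_update_item_stats', SQLERRM,
                 jsonb_build_object('item_id', NEW.item_id));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```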
### 3. PostgreSQL Configuration
Enable logging in `postgresql.conf`:
```ini
# Log all function notices and above
log_min_messages = notice
# Include timestamp, process ID, user, and database in the log prefix
log_line_prefix = '%t [%p] %u@%d '
# Log to file for Logstash pickup
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_rotation_size = 100MB
# Capture slow queries from functions
log_min_duration_statement = 1000 # Log queries over 1 second
```
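Most of these settings can be reloaded and verified without a restart (`logging_collector` is the exception and requires one):

```sql
SELECT pg_reload_conf();          -- re-read postgresql.conf
SHOW log_min_messages;            -- expect: notice
SHOW log_min_duration_statement;  -- expect: 1s
```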
### 4. Logstash Integration
Update the Logstash pipeline (extends ADR-029 configuration):
```conf
# PostgreSQL function log input
input {
file {
path => "/var/log/postgresql/*.log"
type => "postgres"
tags => ["postgres"]
start_position => "beginning"
sincedb_path => "/var/lib/logstash/sincedb_postgres"
}
}
filter {
if [type] == "postgres" {
# Extract timestamp and process ID from PostgreSQL log prefix
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:pg_timestamp} \[%{POSINT:pg_pid}\] %{USER:pg_user}@%{WORD:pg_database} %{GREEDYDATA:pg_message}" }
}
    # Check if this is a structured JSON log from fn_log()
    # (jsonb::text renders with spaces after colons, so match loosely)
    if [pg_message] =~ /"source":\s*"postgresql"/ {
json {
source => "pg_message"
target => "fn_log"
}
# Mark as error if level is WARNING or ERROR
if [fn_log][level] in ["WARNING", "ERROR"] {
mutate { add_tag => ["error", "db_function"] }
}
}
# Also catch native PostgreSQL errors
if [pg_message] =~ /^ERROR:/ or [pg_message] =~ /^FATAL:/ {
mutate { add_tag => ["error", "postgres_native"] }
}
}
}
output {
if "error" in [tags] and "postgres" in [tags] {
http {
url => "http://localhost:8000/api/store/"
http_method => "post"
format => "json"
}
}
}
```
### 5. Dual-File Update Requirement
**IMPORTANT**: All SQL function changes must be applied to BOTH files:
1. `sql/Initial_triggers_and_functions.sql` - Used for incremental updates
2. `sql/master_schema_rollup.sql` - Used for fresh database setup
Both files must remain in sync for triggers and functions.
## Implementation Steps
1. **Create `fn_log()` helper function**:
- Add to both SQL files
- Test with `SELECT fn_log('INFO', 'test', 'Test message', '{"key": "value"}'::jsonb);`
2. **Update Tier 1 critical functions** (highest priority):
- Identify functions with silent failures
- Add appropriate logging calls
- Test error paths
3. **Update Tier 2 business logic functions**:
- Add anomaly logging to suggestion/recommendation functions
- Log empty result sets with context
4. **Update Tier 3 trigger functions**:
- Add error-only logging to critical triggers
- Wrap complex trigger logic in exception handlers
5. **Configure PostgreSQL logging**:
- Update `postgresql.conf` in dev container
- Update production PostgreSQL configuration
- Verify logs appear in expected location
6. **Update Logstash pipeline**:
- Add PostgreSQL input to Logstash config
- Add filter rules for structured JSON extraction
- Test end-to-end: function log -> Logstash -> Bugsink
7. **Verify in Bugsink**:
- Confirm database function errors appear as issues
- Verify context (user_id, function name, params) is captured
## Consequences
### Positive
- **Visibility**: Silent failures become visible in error tracking
- **Debugging**: Function execution context captured for root cause analysis
- **Proactive detection**: Anomalies logged before users report issues
- **Unified monitoring**: Database errors appear alongside application errors in Bugsink
- **Structured logs**: JSON format enables filtering and aggregation
### Negative
- **Performance overhead**: Logging adds latency to function execution
- **Log volume**: Tier 1/2 functions may generate significant log volume
- **Maintenance**: Two SQL files must be kept in sync
- **PostgreSQL configuration**: Requires access to `postgresql.conf`
### Mitigations
- **Performance**: Only log meaningful events, not every function call
- **Log volume**: Use appropriate log levels; Logstash filters reduce noise
- **Sync**: Add CI check to verify SQL files match for function definitions
- **Configuration**: Document PostgreSQL settings in deployment runbook
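The sync check can be as simple as diffing the declared function names (a sketch; the grep pattern may need tuning for multi-line signatures):

```bash
#!/usr/bin/env bash
# Fail CI when the two SQL files declare different sets of functions
set -euo pipefail
extract() { grep -oiE 'CREATE OR REPLACE FUNCTION [a-zA-Z0-9_.]+' "$1" | sort -u; }
diff <(extract sql/Initial_triggers_and_functions.sql) \
     <(extract sql/master_schema_rollup.sql)
```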
## Examples
### Before (Silent Failure)
```sql
-- User thinks operation completed, but it silently failed
SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void (no error, no log)
-- Result: User never gets expected result, nobody knows why
```
### After (Observable Failure)
```sql
SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void
-- PostgreSQL log: {"timestamp":"2026-01-11T10:30:00Z","level":"WARNING","source":"postgresql","function":"process_item","message":"Item not found: Nonexistent Item","context":{"user_id":"user-uuid","item_name":"Nonexistent Item"}}
-- Bugsink: New issue created with full context
```
## References
- [ADR-029: Error Tracking with Bugsink](ADR-029-error-tracking-with-bugsink.md)
- [ADR-027: Application-Wide Structured Logging](ADR-027-application-wide-structured-logging.md)
- [PostgreSQL RAISE Documentation](https://www.postgresql.org/docs/current/plpgsql-errors-and-messages.html)
- [PostgreSQL Logging Configuration](https://www.postgresql.org/docs/current/runtime-config-logging.html)

# ADR-031: Granular Debug Logging Strategy
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-052)
**Related**: [ADR-027](ADR-027-application-wide-structured-logging.md), [ADR-017](ADR-017-structured-logging-with-pino.md)
## Context
Global log levels (INFO vs DEBUG) are too coarse. Developers need to inspect detailed debug information for specific subsystems (e.g., `ai-service`, `db-pool`, `auth-service`) without being flooded by logs from the entire application.
When debugging a specific feature:
- Setting `LOG_LEVEL=debug` globally produces too much noise
- Manually adding/removing debug statements is error-prone
- No standard way to enable targeted debugging in production
## Decision
We will adopt a namespace-based debug filter pattern, similar to the `debug` npm package, but integrated into our Pino logger.
1. **Logger Namespaces**: Every service/module logger must be initialized with a `module` property (e.g., `logger.child({ module: 'ai-service' })`).
2. **Environment Filter**: We will support a `DEBUG_MODULES` environment variable that overrides the log level for matching modules.
## Implementation
### Core Implementation
Implemented in `src/services/logger.server.ts`:
```typescript
import pino from 'pino';
// Parse DEBUG_MODULES from environment
const debugModules = (process.env.DEBUG_MODULES || '').split(',').map((s) => s.trim());
// Base logger configuration
export const logger = pino({
level: process.env.LOG_LEVEL || (process.env.NODE_ENV === 'production' ? 'info' : 'debug'),
// ... other configuration
});
/**
* Creates a scoped logger for a specific module.
* If DEBUG_MODULES includes this module or '*', debug level is enabled.
*/
export const createScopedLogger = (moduleName: string) => {
// If DEBUG_MODULES contains the module name or "*", force level to 'debug'
const isDebugEnabled = debugModules.includes('*') || debugModules.includes(moduleName);
  // Pino v7+ takes the child's level via the options argument, not the bindings
  return logger.child(
    { module: moduleName },
    { level: isDebugEnabled ? 'debug' : logger.level },
  );
};
```
### Service Usage Examples
```typescript
// src/services/aiService.server.ts
import { createScopedLogger } from './logger.server';
const logger = createScopedLogger('ai-service');
export async function processWithAI(data: unknown) {
logger.debug({ data }, 'Starting AI processing');
  // ... implementation (assumed to produce `result`)
  logger.info({ result }, 'AI processing completed');
}
```
```typescript
// src/services/authService.server.ts
import { createScopedLogger } from './logger.server';
const logger = createScopedLogger('auth-service');
export async function validateToken(token: string) {
logger.debug({ tokenLength: token.length }, 'Validating token');
// ... implementation
}
```
### Module Naming Convention
Use kebab-case suffixed with `-service` or `-worker`:
| Module Name | Purpose | File |
| --------------- | -------------------------------- | ------------------------------------- |
| `ai-service` | AI/external API interactions | `src/services/aiService.server.ts` |
| `auth-service` | Authentication and authorization | `src/services/authService.server.ts` |
| `db-pool` | Database connection pooling | `src/services/database.server.ts` |
| `cache-service` | Redis/caching operations | `src/services/cacheService.server.ts` |
| `queue-worker` | Background job processing | `src/workers/queueWorker.ts` |
| `email-service` | Email sending | `src/services/emailService.server.ts` |
## Usage
### Enable Debug Logging for Specific Modules
To debug only AI and authentication:
```bash
DEBUG_MODULES=ai-service,auth-service npm run dev
```
### Enable All Debug Logging
Use wildcard to enable debug logging for all modules:
```bash
DEBUG_MODULES=* npm run dev
```
### Development Environment
In `.env.development`:
```bash
# Enable debug logging for specific modules during development
DEBUG_MODULES=ai-service
```
### Production Troubleshooting
Temporarily enable debug logging for a specific subsystem:
```bash
# SSH into production server
ssh root@example.com
# Set the variable and restart; --update-env applies it to the running process
DEBUG_MODULES=ai-service pm2 restart app-api --update-env
# View logs
pm2 logs app-api --lines 100
# Disable debug logging again by clearing the variable
DEBUG_MODULES= pm2 restart app-api --update-env
```
### With PM2 Configuration
In `ecosystem.config.js`:
```javascript
module.exports = {
apps: [
{
name: 'app-api',
script: 'dist/server.js',
env: {
NODE_ENV: 'production',
// DEBUG_MODULES is unset by default
},
env_debug: {
NODE_ENV: 'production',
DEBUG_MODULES: 'ai-service,auth-service',
},
},
],
};
```
Start with debug logging:
```bash
pm2 start ecosystem.config.js --env debug
```
## Best Practices
### 1. Use Scoped Loggers for Long-Running Services
Services with complex workflows or external API calls should use `createScopedLogger` to allow targeted debugging:
```typescript
const logger = createScopedLogger('payment-service');
export async function processPayment(payment: Payment) {
logger.debug({ paymentId: payment.id }, 'Starting payment processing');
try {
const result = await externalPaymentAPI.process(payment);
logger.debug({ result }, 'External API response');
return result;
} catch (error) {
logger.error({ error, paymentId: payment.id }, 'Payment processing failed');
throw error;
}
}
```
### 2. Use Child Loggers for Contextual Data
Even within scoped loggers, create child loggers with job/request-specific context:
```typescript
const logger = createScopedLogger('queue-worker');
async function processJob(job: Job) {
const jobLogger = logger.child({ jobId: job.id, jobName: job.name });
jobLogger.debug('Starting job processing');
// ... processing
jobLogger.info('Job completed successfully');
}
```
### 3. Consistent Debug Message Patterns
Use consistent patterns for debug messages:
```typescript
// Function entry
logger.debug({ params: sanitizedParams }, 'Function entry: processOrder');
// External API calls
logger.debug({ url, method }, 'External API request');
logger.debug({ statusCode, duration }, 'External API response');
// State changes
logger.debug({ before, after }, 'State transition');
// Decision points
logger.debug({ condition, result }, 'Branch decision');
```
### 4. Production Usage Guidelines
- `DEBUG_MODULES` can be set in production for temporary debugging
- Should not be used continuously due to increased log volume
- Always unset after troubleshooting is complete
- Monitor log storage when debug logging is enabled
## Consequences
### Positive
- Developers can inspect detailed logs for specific subsystems without log flooding
- Production debugging becomes more targeted and efficient
- No performance impact when debug logging is disabled
- Compatible with existing Pino logging infrastructure
- Follows familiar pattern from `debug` npm package
### Negative
- Requires developers to know module names (mitigated by documentation)
- Not all services have adopted scoped loggers yet (gradual migration)
- Additional configuration complexity
## References
- [ADR-027: Application-Wide Structured Logging](ADR-027-application-wide-structured-logging.md)
- [ADR-017: Structured Logging with Pino](ADR-017-structured-logging-with-pino.md)
- [debug npm package](https://www.npmjs.com/package/debug) - Inspiration for namespace pattern
- [Pino Child Loggers](https://getpino.io/#/docs/child-loggers)

View File

@@ -0,0 +1,263 @@
# ADR-032: Application Performance Monitoring (APM)
**Date**: 2026-02-10
**Status**: Proposed
**Source**: Imported from flyer-crawler project (ADR-056)
**Related**: [ADR-029](ADR-029-error-tracking-with-bugsink.md) (Error Tracking with Bugsink)
## Context
Application Performance Monitoring (APM) provides visibility into application behavior through:
- **Distributed Tracing**: Track requests across services, queues, and database calls
- **Performance Metrics**: Response times, throughput, error rates
- **Resource Monitoring**: Memory usage, CPU, database connections
- **Transaction Analysis**: Identify slow endpoints and bottlenecks
While ADR-029 covers error tracking and observability, APM is a distinct concern focused on performance rather than errors. The Sentry SDK supports APM through its tracing features, but this capability is currently **intentionally disabled** in our application.
### Current State
The Sentry SDK is installed and configured for error tracking (see ADR-029), but APM features are disabled:
```typescript
// src/services/sentry.client.ts
Sentry.init({
dsn: config.sentry.dsn,
environment: config.sentry.environment,
// Performance monitoring - disabled for now to keep it simple
tracesSampleRate: 0,
// ...
});
```
```typescript
// src/services/sentry.server.ts
Sentry.init({
dsn: config.sentry.dsn,
environment: config.sentry.environment || config.server.nodeEnv,
// Performance monitoring - disabled for now to keep it simple
tracesSampleRate: 0,
// ...
});
```
### Why APM is Currently Disabled
1. **Complexity**: APM adds overhead and complexity to debugging
2. **Bugsink Limitations**: Bugsink's APM support is less mature than its error tracking
3. **Resource Overhead**: Tracing adds memory and CPU overhead
4. **Focus**: Error tracking provides more immediate value for our current scale
5. **Cost**: High sample rates can significantly increase storage requirements
## Decision
We propose a **staged approach** to APM implementation:
### Phase 1: Selective Backend Tracing (Low Priority)
Enable tracing for specific high-value operations:
```typescript
// Enable tracing for specific transactions only
Sentry.init({
dsn: config.sentry.dsn,
tracesSampleRate: 0, // Keep default at 0
// Trace only specific high-value transactions
tracesSampler: (samplingContext) => {
const transactionName = samplingContext.transactionContext?.name;
    // Sample long-running jobs at 10%
    if (transactionName?.includes('job-processing')) {
      return 0.1;
    }
    // Sample AI/external API calls at 50%
    if (transactionName?.includes('external-api')) {
      return 0.5;
    }
    // Follow the parent's sampling decision for child transactions
    if (samplingContext.parentSampled) {
      return 0.1; // 10% for child transactions
}
return 0; // Don't trace other transactions
},
});
```
### Phase 2: Custom Performance Metrics
Add custom metrics without full tracing overhead:
```typescript
// Custom metric for slow database queries
import { metrics } from '@sentry/node';
// In repository methods
const startTime = performance.now();
const result = await pool.query(sql, params);
const duration = performance.now() - startTime;
metrics.distribution('db.query.duration', duration, {
tags: { query_type: 'select', table: 'users' },
});
if (duration > 1000) {
logger.warn({ duration, sql }, 'Slow query detected');
}
```
### Phase 3: Full APM Integration (Future)
When/if full APM is needed:
```typescript
Sentry.init({
dsn: config.sentry.dsn,
tracesSampleRate: 0.1, // 10% of transactions
profilesSampleRate: 0.1, // 10% of traced transactions get profiled
integrations: [
// Database tracing
Sentry.postgresIntegration(),
// Redis tracing
Sentry.redisIntegration(),
// BullMQ job tracing (custom integration)
],
});
```
## Implementation Steps
### To Enable Basic APM
1. **Update Sentry Configuration**:
- Set `tracesSampleRate` > 0 in `src/services/sentry.server.ts`
- Set `tracesSampleRate` > 0 in `src/services/sentry.client.ts`
- Add environment variable `SENTRY_TRACES_SAMPLE_RATE` (default: 0)
2. **Add Instrumentation**:
- Enable automatic Express instrumentation
- Add manual spans for BullMQ job processing (see the sketch after these steps)
- Add database query instrumentation
3. **Frontend Tracing**:
- Add Browser Tracing integration
- Configure page load and navigation tracing
4. **Environment Variables**:
```bash
SENTRY_TRACES_SAMPLE_RATE=0.1 # 10% sampling
SENTRY_PROFILES_SAMPLE_RATE=0 # Profiling disabled
```
5. **Bugsink Configuration**:
- Verify Bugsink supports performance data ingestion
- Configure retention policies for performance data
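As referenced in step 2, one possible shape for the manual BullMQ spans, assuming @sentry/node v8's `Sentry.startSpan` API; the queue name and handler are illustrative, not the project's actual code:
```typescript
// Illustrative manual span around a BullMQ processor (assumes @sentry/node v8)
import * as Sentry from '@sentry/node';
import { Worker, type Job } from 'bullmq';

// Hypothetical job handler
async function processJob(job: Job): Promise<void> {
  // ... actual job logic
}

export const worker = new Worker(
  'flyer-processing', // hypothetical queue name
  async (job: Job) =>
    // startSpan opens a transaction when no parent span is active; the
    // "job-processing:" prefix matches the tracesSampler check above
    Sentry.startSpan({ name: `job-processing:${job.name}`, op: 'queue.process' }, () =>
      processJob(job),
    ),
  { connection: { host: 'localhost', port: 6379 } },
);
```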
### Configuration Changes Required
```typescript
// src/config/env.ts - Add new config
sentry: {
dsn: env.SENTRY_DSN,
environment: env.SENTRY_ENVIRONMENT,
debug: env.SENTRY_DEBUG === 'true',
tracesSampleRate: parseFloat(env.SENTRY_TRACES_SAMPLE_RATE || '0'),
profilesSampleRate: parseFloat(env.SENTRY_PROFILES_SAMPLE_RATE || '0'),
},
```
```typescript
// src/services/sentry.server.ts - Updated init
Sentry.init({
dsn: config.sentry.dsn,
environment: config.sentry.environment,
tracesSampleRate: config.sentry.tracesSampleRate,
profilesSampleRate: config.sentry.profilesSampleRate,
// ... rest of config
});
```
## Trade-offs
### Enabling APM
**Benefits**:
- Identify performance bottlenecks
- Track distributed transactions across services
- Profile slow endpoints
- Monitor resource utilization trends
**Costs**:
- Increased memory usage (~5-15% overhead)
- Additional CPU for trace processing
- Increased storage in Bugsink/Sentry
- More complex debugging (noise in traces)
- Potential latency from tracing overhead
### Keeping APM Disabled
**Benefits**:
- Simpler operation and debugging
- Lower resource overhead
- Focused on error tracking (higher priority)
- No additional storage costs
**Costs**:
- No automated performance insights
- Manual profiling required for bottleneck detection
- Limited visibility into slow transactions
## Alternatives Considered
1. **OpenTelemetry**: More vendor-neutral, but adds another dependency and complexity
2. **Prometheus + Grafana**: Good for metrics, but doesn't provide distributed tracing
3. **Jaeger/Zipkin**: Purpose-built for tracing, but requires additional infrastructure
4. **New Relic/Datadog SaaS**: Full-featured but conflicts with self-hosted requirement
## Current Recommendation
**Keep APM disabled** (`tracesSampleRate: 0`) until:
1. Specific performance issues are identified that require tracing
2. Bugsink's APM support is verified and tested
3. Infrastructure can support the additional overhead
4. There is a clear business need for performance visibility
When enabling APM becomes necessary, start with Phase 1 (selective tracing) to minimize overhead while gaining targeted insights.
## Consequences
### Positive (When Implemented)
- Automated identification of slow endpoints
- Distributed trace visualization across async operations
- Correlation between errors and performance issues
- Proactive alerting on performance degradation
### Negative
- Additional infrastructure complexity
- Storage overhead for trace data
- Potential performance impact from tracing itself
- Learning curve for trace analysis
## References
- [Sentry Performance Monitoring](https://docs.sentry.io/product/performance/)
- [@sentry/node Performance](https://docs.sentry.io/platforms/javascript/guides/node/performance/)
- [@sentry/react Performance](https://docs.sentry.io/platforms/javascript/guides/react/performance/)
- [OpenTelemetry](https://opentelemetry.io/) (alternative approach)
- [ADR-029: Error Tracking with Bugsink](ADR-029-error-tracking-with-bugsink.md)

View File

@@ -0,0 +1,340 @@
# ADR-033: Bugsink to Gitea Issue Synchronization
**Date**: 2026-02-10
**Status**: Proposed
**Source**: Imported from flyer-crawler project (ADR-054)
**Related**: [ADR-029](ADR-029-error-tracking-with-bugsink.md), [ADR-012](ADR-012-bullmq-background-job-processing.md)
## Context
The application uses Bugsink (Sentry-compatible self-hosted error tracking) to capture runtime errors across multiple projects:
| Project Type | Environment | Description |
| -------------- | ------------ | ---------------------------------------- |
| Backend | Production | Main API server errors |
| Backend | Test/Staging | Pre-production API errors |
| Frontend | Production | Client-side JavaScript errors |
| Frontend | Test/Staging | Pre-production frontend errors |
| Infrastructure | Production | Infrastructure-level errors (Redis, PM2) |
| Infrastructure | Test/Staging | Pre-production infrastructure errors |
Currently, errors remain in Bugsink until manually reviewed. There is no automated workflow to:
1. Create trackable tickets for errors
2. Assign errors to developers
3. Track resolution progress
4. Prevent errors from being forgotten
## Decision
Implement an automated background worker that synchronizes unresolved Bugsink issues to Gitea as trackable tickets. The sync worker will:
1. **Run only on the test/staging server** (not production, not dev container)
2. **Poll all Bugsink projects** for unresolved issues
3. **Create Gitea issues** with full error context
4. **Mark synced issues as resolved** in Bugsink (to prevent re-polling)
5. **Track sync state in Redis** to ensure idempotency
### Why Test/Staging Only?
- The sync worker is a background service that needs API tokens for both Bugsink and Gitea
- Running on test/staging provides a single sync point without duplicating infrastructure
- All Bugsink projects (including production) are synced from this one worker
- Production server stays focused on serving users, not running sync jobs
## Architecture
### Component Overview
```
+-----------------------------------------------------------------------+
| TEST/STAGING SERVER |
| |
| +------------------+ +------------------+ +-------------------+ |
| | BullMQ Queue |--->| Sync Worker |--->| Redis DB 15 | |
| | bugsink-sync | | (15min repeat) | | Sync State | |
| +------------------+ +--------+---------+ +-------------------+ |
| | |
+-----------------------------------+------------------------------------+
|
+---------------+---------------+
v v
+------------------+ +------------------+
| Bugsink | | Gitea |
| (all projects) | | (1 repo) |
+------------------+ +------------------+
```
### Queue Configuration
| Setting | Value | Rationale |
| --------------- | ---------------------- | -------------------------------------------- |
| Queue Name | `bugsink-sync` | Follows existing naming pattern |
| Repeat Interval | 15 minutes | Balances responsiveness with API rate limits |
| Retry Attempts | 3 | Standard retry policy |
| Backoff | Exponential (30s base) | Handles temporary API failures |
| Concurrency | 1 | Serial processing prevents race conditions |
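A sketch of how this table could map onto a BullMQ queue definition; the shared connection import is an assumption:
```typescript
// src/services/queues.server.ts (sketch of the configuration in the table above)
import { Queue } from 'bullmq';
import { redisConnection } from './redis.server'; // hypothetical shared connection

export const bugsinkSyncQueue = new Queue('bugsink-sync', {
  connection: redisConnection,
  defaultJobOptions: {
    attempts: 3, // standard retry policy
    backoff: { type: 'exponential', delay: 30_000 }, // 30s base
  },
});

// Repeating sync job every 15 minutes; concurrency: 1 is set on the Worker side
await bugsinkSyncQueue.add('sync', {}, { repeat: { every: 15 * 60 * 1000 } });
```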
### Redis Database Allocation
| Database | Usage | Owner |
| -------- | ------------------- | --------------- |
| 0 | BullMQ (Production) | Existing queues |
| 1 | BullMQ (Test) | Existing queues |
| 2-14 | Reserved | Future use |
| 15 | Bugsink Sync State | This feature |
### Redis Key Schema
```
bugsink:synced:{bugsink_issue_id}
+-- Value: JSON {
gitea_issue_number: number,
synced_at: ISO timestamp,
project: string,
title: string
}
```
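A minimal sketch of the corresponding sync-state helpers, assuming ioredis and the db 15 allocation above:
```typescript
// src/services/bugsinkSync.server.ts (sketch of the Redis sync-state helpers)
import { Redis } from 'ioredis';

// Dedicated connection to db 15 per the allocation table above
const syncStateRedis = new Redis({ host: 'localhost', port: 6379, db: 15 });

interface SyncRecord {
  gitea_issue_number: number;
  synced_at: string; // ISO timestamp
  project: string;
  title: string;
}

const keyFor = (bugsinkIssueId: string) => `bugsink:synced:${bugsinkIssueId}`;

export async function getSyncRecord(issueId: string): Promise<SyncRecord | null> {
  const raw = await syncStateRedis.get(keyFor(issueId));
  return raw ? (JSON.parse(raw) as SyncRecord) : null;
}

export async function storeSyncRecord(issueId: string, record: SyncRecord): Promise<void> {
  // No TTL: sync history is kept permanently (see idempotency guarantees below)
  await syncStateRedis.set(keyFor(issueId), JSON.stringify(record));
}
```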
### Gitea Labels
The following labels should be created in the repository:
| Label | Color | Purpose |
| -------------------- | ------------------ | ---------------------------------- |
| `bug:frontend` | #e11d48 (Red) | Frontend JavaScript/React errors |
| `bug:backend` | #ea580c (Orange) | Backend Node.js/API errors |
| `bug:infrastructure` | #7c3aed (Purple) | Infrastructure errors (Redis, PM2) |
| `env:production` | #dc2626 (Dark Red) | Production environment |
| `env:test` | #2563eb (Blue) | Test/staging environment |
| `env:development` | #6b7280 (Gray) | Development environment |
| `source:bugsink` | #10b981 (Green) | Auto-synced from Bugsink |
### Label Mapping
| Bugsink Project Type | Bug Label | Env Label |
| --------------------- | ------------------ | -------------- |
| backend (prod) | bug:backend | env:production |
| backend (test) | bug:backend | env:test |
| frontend (prod) | bug:frontend | env:production |
| frontend (test) | bug:frontend | env:test |
| infrastructure (prod) | bug:infrastructure | env:production |
| infrastructure (test) | bug:infrastructure | env:test |
All synced issues also receive the `source:bugsink` label.
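The mapping is mechanical enough to express as a small helper; a sketch (deriving the project type and environment from the Bugsink project slug is left to the caller):
```typescript
type ProjectType = 'backend' | 'frontend' | 'infrastructure';
type Environment = 'production' | 'test';

// Maps a Bugsink project's type and environment to the Gitea labels above
export function labelsFor(projectType: ProjectType, env: Environment): string[] {
  const bugLabel = `bug:${projectType}`;
  const envLabel = env === 'production' ? 'env:production' : 'env:test';
  return [bugLabel, envLabel, 'source:bugsink']; // every synced issue gets source:bugsink
}
```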
## Implementation Details
### New Files
| File | Purpose |
| -------------------------------------- | ------------------------------------------- |
| `src/services/bugsinkSync.server.ts` | Core synchronization logic |
| `src/services/bugsinkClient.server.ts` | HTTP client for Bugsink API |
| `src/services/giteaClient.server.ts` | HTTP client for Gitea API |
| `src/types/bugsink.ts` | TypeScript interfaces for Bugsink responses |
| `src/routes/admin/bugsink-sync.ts` | Admin endpoints for manual trigger |
### Modified Files
| File | Changes |
| -------------------------------- | ------------------------------------- |
| `src/services/queues.server.ts` | Add `bugsinkSyncQueue` definition |
| `src/services/workers.server.ts` | Add sync worker implementation |
| `src/config/env.ts` | Add bugsink sync configuration schema |
| `.env.example` | Document new environment variables |
### Environment Variables
```bash
# Bugsink Configuration
BUGSINK_URL=https://bugsink.example.com
BUGSINK_API_TOKEN=... # Created via Django management command
# Gitea Configuration
GITEA_URL=https://gitea.example.com
GITEA_API_TOKEN=... # Personal access token with repo scope
GITEA_OWNER=org-name
GITEA_REPO=project-repo
# Sync Control
BUGSINK_SYNC_ENABLED=false # Set true only in test environment
BUGSINK_SYNC_INTERVAL=15 # Minutes between sync runs
```
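Following the fragment style of the config example in ADR-032, the added block in `src/config/env.ts` might look like this (key names mirror the variables above):
```typescript
// src/config/env.ts - sketch of the added config block
bugsinkSync: {
  enabled: env.BUGSINK_SYNC_ENABLED === 'true',
  intervalMinutes: parseInt(env.BUGSINK_SYNC_INTERVAL || '15', 10),
  bugsinkUrl: env.BUGSINK_URL,
  bugsinkApiToken: env.BUGSINK_API_TOKEN,
  giteaUrl: env.GITEA_URL,
  giteaApiToken: env.GITEA_API_TOKEN,
  giteaOwner: env.GITEA_OWNER,
  giteaRepo: env.GITEA_REPO,
},
```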
### Gitea Issue Template
```markdown
## Error Details
| Field | Value |
| ------------ | --------------- |
| **Type** | {error_type} |
| **Message** | {error_message} |
| **Platform** | {platform} |
| **Level** | {level} |
## Occurrence Statistics
- **First Seen**: {first_seen}
- **Last Seen**: {last_seen}
- **Total Occurrences**: {count}
## Request Context
- **URL**: {request_url}
- **Additional Context**: {context}
## Stacktrace
<details>
<summary>Click to expand</summary>
{stacktrace}
</details>
---
**Bugsink Issue**: {bugsink_url}
**Project**: {project_slug}
**Trace ID**: {trace_id}
```
### Sync Workflow
```
1. Worker triggered (every 15 min or manual)
2. For each Bugsink project:
a. List issues with status='unresolved'
b. For each issue:
i. Check Redis for existing sync record
ii. If already synced -> skip
iii. Fetch issue details + stacktrace
iv. Create Gitea issue with labels
v. Store sync record in Redis
vi. Mark issue as 'resolved' in Bugsink
3. Log summary (synced: N, skipped: N, failed: N)
```
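A sketch of this workflow in TypeScript; `bugsink`, `gitea`, `renderIssueBody`, and `logger` stand in for the clients and helpers listed under New Files, and their method names are assumptions:
```typescript
// Sketch of one sync pass; client method names are assumptions
export async function runSync(): Promise<{ synced: number; skipped: number; failed: number }> {
  const summary = { synced: 0, skipped: 0, failed: 0 };
  for (const project of await bugsink.listProjects()) {
    const issues = await bugsink.listIssues(project.slug, { status: 'unresolved' });
    for (const issue of issues) {
      try {
        if (await getSyncRecord(issue.id)) {
          summary.skipped++; // already synced -> skip (idempotency)
          continue;
        }
        const details = await bugsink.getIssueDetails(issue.id); // includes stacktrace
        const created = await gitea.createIssue({
          title: `[${project.slug}] ${issue.title}`,
          body: renderIssueBody(details), // fills in the template above
          labels: labelsFor(project.type, project.env),
        });
        await storeSyncRecord(issue.id, {
          gitea_issue_number: created.number,
          synced_at: new Date().toISOString(),
          project: project.slug,
          title: issue.title,
        });
        await bugsink.resolveIssue(issue.id); // prevents re-polling
        summary.synced++;
      } catch (error) {
        summary.failed++;
        logger.error({ error, issueId: issue.id }, 'Failed to sync Bugsink issue');
      }
    }
  }
  logger.info(summary, 'Bugsink sync completed');
  return summary;
}
```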
### Idempotency Guarantees
1. **Redis check before creation**: Prevents duplicate Gitea issues
2. **Atomic Redis write after Gitea create**: Ensures state consistency
3. **Query only unresolved issues**: Resolved issues won't appear in polls
4. **No TTL on Redis keys**: Permanent sync history
## Admin Interface
### Manual Sync Endpoint
```
POST /api/admin/bugsink/sync
Authorization: Bearer {admin_jwt}
Response:
{
"success": true,
"data": {
"synced": 3,
"skipped": 12,
"failed": 0,
"duration_ms": 2340
}
}
```
### Sync Status Endpoint
```
GET /api/admin/bugsink/sync/status
Authorization: Bearer {admin_jwt}
Response:
{
"success": true,
"data": {
"enabled": true,
"last_run": "2026-01-17T10:30:00Z",
"next_run": "2026-01-17T10:45:00Z",
"total_synced": 47,
"projects": [
{ "slug": "backend-prod", "synced_count": 12 },
...
]
}
}
```
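A sketch of the manual trigger route; the `requireAdmin` middleware and file paths are assumptions:
```typescript
// src/routes/admin/bugsink-sync.ts (sketch; auth middleware is an assumption)
import { Router } from 'express';
import { requireAdmin } from '../../middleware/auth'; // hypothetical admin-JWT guard
import { runSync } from '../../services/bugsinkSync.server';

export const bugsinkSyncRouter = Router();

bugsinkSyncRouter.post('/api/admin/bugsink/sync', requireAdmin, async (_req, res) => {
  const start = Date.now();
  const { synced, skipped, failed } = await runSync();
  res.json({
    success: true,
    data: { synced, skipped, failed, duration_ms: Date.now() - start },
  });
});
```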
## Implementation Phases
### Phase 1: Core Infrastructure
- Add environment variables to `env.ts` schema
- Create `BugsinkClient` service (HTTP client)
- Create `GiteaClient` service (HTTP client)
- Add Redis db 15 connection for sync tracking
### Phase 2: Sync Logic
- Create `BugsinkSyncService` with sync logic
- Add `bugsink-sync` queue to `queues.server.ts`
- Add sync worker to `workers.server.ts`
- Create TypeScript types for API responses
### Phase 3: Integration
- Add admin endpoints for manual sync trigger
- Update CI/CD with new secrets
- Add secrets to repository settings
- Test end-to-end in staging environment
### Phase 4: Documentation
- Update CLAUDE.md with sync information
- Create operational runbook for sync issues
## Consequences
### Positive
1. **Visibility**: All application errors become trackable tickets
2. **Accountability**: Errors can be assigned to developers
3. **History**: Complete audit trail of when errors were discovered and resolved
4. **Integration**: Errors appear alongside feature work in Gitea
5. **Automation**: No manual error triage required
### Negative
1. **API Dependencies**: Requires both Bugsink and Gitea APIs to be available
2. **Token Management**: Additional secrets to manage in CI/CD
3. **Potential Noise**: High-frequency errors could create many tickets (mitigated by Bugsink's issue grouping)
4. **Single Point**: Sync only runs on test server (if test server is down, no sync occurs)
### Risks and Mitigations
| Risk | Mitigation |
| ----------------------- | ------------------------------------------------- |
| Bugsink API rate limits | 15-minute polling interval |
| Gitea API rate limits | Sequential processing with delays |
| Redis connection issues | Reuse existing connection patterns |
| Duplicate issues | Redis tracking + idempotent checks |
| Missing stacktrace | Graceful degradation (create issue without trace) |
## Future Enhancements
1. **Bi-directional sync**: Update Bugsink when Gitea issue is closed
2. **Smart deduplication**: Detect similar errors across projects
3. **Priority mapping**: High occurrence count -> high priority label
4. **Slack/Discord notifications**: Alert on new critical errors
5. **Metrics dashboard**: Track error trends over time
## References
- [ADR-012: BullMQ Background Job Processing](ADR-012-bullmq-background-job-processing.md)
- [ADR-029: Error Tracking with Bugsink](ADR-029-error-tracking-with-bugsink.md)
- [Bugsink API Documentation](https://bugsink.com/docs/api/)
- [Gitea API Documentation](https://docs.gitea.io/en-us/api-usage/)

package-lock.json (generated)
View File

@@ -1,12 +1,12 @@
{
"name": "flyer-crawler",
"version": "0.14.0",
"version": "0.14.1",
"lockfileVersion": 3,
"requires": true,
"packages": {
"": {
"name": "flyer-crawler",
"version": "0.14.0",
"version": "0.14.1",
"dependencies": {
"@bull-board/api": "^6.14.2",
"@bull-board/express": "^6.14.2",

View File

@@ -1,7 +1,7 @@
{
"name": "flyer-crawler",
"private": true,
"version": "0.14.0",
"version": "0.14.1",
"type": "module",
"scripts": {
"dev": "concurrently \"npm:start:dev\" \"vite\"",

View File

@@ -44,7 +44,10 @@ export const useAppInitialization = () => {
if (appVersion) {
logger.info(`Application version: ${appVersion}`);
const lastSeenVersion = localStorage.getItem('lastSeenVersion');
-      if (appVersion !== lastSeenVersion) {
+      const onboardingCompleted = localStorage.getItem('flyer_crawler_onboarding_completed');
+      // Only show "What's New" if onboarding tour has been completed
+      if (appVersion !== lastSeenVersion && onboardingCompleted === 'true') {
openModal('whatsNew');
localStorage.setItem('lastSeenVersion', appVersion);
}