Files
flyer-crawler.projectium.com/docs/adr/0015-error-tracking-and-observability.md
Torben Sorensen 61cfb518e6
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 1m13s
ADR-015 done
2026-01-26 11:48:42 -08:00

273 lines
13 KiB
Markdown

# ADR-015: Error Tracking and Observability
**Date**: 2025-12-12
**Status**: Accepted (Fully Implemented)
**Updated**: 2026-01-26 (user context integration completed)
**Related**: [ADR-056](./0056-application-performance-monitoring.md) (Application Performance Monitoring)
## Context
While ADR-004 established structured logging with Pino, the application lacks a high-level, aggregated view of its health and errors. It's difficult to spot trends, identify recurring issues, or be proactively notified of new types of errors.
Key requirements:
1. **Self-hosted**: No external SaaS dependencies for error tracking
2. **Sentry SDK compatible**: Leverage mature, well-documented SDKs
3. **Lightweight**: Minimal resource overhead in the dev container
4. **Production-ready**: Same architecture works on bare-metal production servers
5. **AI-accessible**: MCP server integration for Claude Code and other AI tools
**Note**: Application Performance Monitoring (APM) and distributed tracing are covered separately in [ADR-056](./0056-application-performance-monitoring.md).
## Decision
We implement a self-hosted error tracking stack using **Bugsink** as the Sentry-compatible backend, with the following components:
### 1. Error Tracking Backend: Bugsink
**Bugsink** is a lightweight, self-hosted Sentry alternative that:
- Runs as a single process (no Kafka, Redis, ClickHouse required)
- Is fully compatible with Sentry SDKs
- Supports ARM64 and AMD64 architectures
- Can use SQLite (dev) or PostgreSQL (production)
**Deployment**:
- **Dev container**: Installed as a systemd service inside the container
- **Production**: Runs as a systemd service on bare-metal, listening on localhost only
- **Database**: Uses PostgreSQL with a dedicated `bugsink` user and `bugsink` database (same PostgreSQL instance as the main application)
### 2. Backend Integration: @sentry/node
The Express backend integrates `@sentry/node` SDK to:
- Capture unhandled exceptions before PM2/process manager restarts
- Report errors with full stack traces and context
- Integrate with Pino logger for breadcrumbs
- Filter errors by severity (only 5xx errors sent by default)
### 3. Frontend Integration: @sentry/react
The React frontend integrates `@sentry/react` SDK to:
- Wrap the app in an Error Boundary for graceful error handling
- Capture unhandled JavaScript errors
- Report errors with component stack traces
- Filter out browser extension errors
- **Frontend Error Correlation**: The global API client intercepts 4xx/5xx responses and can attach the `x-request-id` header to Sentry scope for correlation with backend logs
### 4. Log Aggregation: Logstash
**Logstash** parses application and infrastructure logs, forwarding error patterns to Bugsink:
- **Installation**: Installed inside the dev container (and on bare-metal prod servers)
- **Inputs**:
- Pino JSON logs from the Node.js application (PM2 managed)
- Redis logs (connection errors, memory warnings, slow commands)
- PostgreSQL function logs (via `fn_log()` - see ADR-050)
- NGINX access/error logs
- **Filter**: Identifies error-level logs (5xx responses, unhandled exceptions, Redis errors)
- **Output**: Sends to Bugsink via Sentry-compatible HTTP API
This provides a secondary error capture path for:
- Errors that occur before Sentry SDK initialization
- Log-based errors that don't throw exceptions
- Redis connection/performance issues
- Database function errors and slow queries
- Historical error analysis from log files
### 5. MCP Server Integration: bugsink-mcp
For AI tool integration (Claude Code, Cursor, etc.), we use the open-source [bugsink-mcp](https://github.com/j-shelfwood/bugsink-mcp) server:
- **No code changes required**: Configurable via environment variables
- **Capabilities**: List projects, get issues, view events, get stacktraces, manage releases
- **Configuration**:
- `BUGSINK_URL`: Points to Bugsink instance (`http://localhost:8000` for dev, `https://bugsink.projectium.com` for prod)
- `BUGSINK_API_TOKEN`: API token from Bugsink (created via Django management command)
- `BUGSINK_ORG_SLUG`: Organization identifier (usually "sentry")
## Architecture
```text
+---------------------------------------------------------------------------+
| Dev Container / Production Server |
+---------------------------------------------------------------------------+
| |
| +------------------+ +------------------+ |
| | Frontend | | Backend | |
| | (React) | | (Express) | |
| | @sentry/react | | @sentry/node | |
| +--------+---------+ +--------+---------+ |
| | | |
| | Sentry SDK Protocol | |
| +-----------+---------------+ |
| | |
| v |
| +----------------------+ |
| | Bugsink | |
| | (localhost:8000) |<------------------+ |
| | | | |
| | PostgreSQL backend | | |
| +----------------------+ | |
| | |
| +----------------------+ | |
| | Logstash |-------------------+ |
| | (Log Aggregator) | Sentry Output |
| | | |
| | Inputs: | |
| | - PM2/Pino logs | |
| | - Redis logs | |
| | - PostgreSQL logs | |
| | - NGINX logs | |
| +----------------------+ |
| ^ ^ ^ ^ |
| | | | | |
| +-----------+ | | +-----------+ |
| | | | | |
| +----+-----+ +-----+----+ +-----+----+ +-----+----+ |
| | PM2 | | Redis | | PostgreSQL| | NGINX | |
| | Logs | | Logs | | Logs | | Logs | |
| +----------+ +----------+ +-----------+ +---------+ |
| |
| +----------------------+ |
| | PostgreSQL | |
| | +----------------+ | |
| | | flyer_crawler | | (main app database) |
| | +----------------+ | |
| | | bugsink | | (error tracking database) |
| | +----------------+ | |
| +----------------------+ |
| |
+---------------------------------------------------------------------------+
External (Developer Machine):
+--------------------------------------+
| Claude Code / Cursor / VS Code |
| +--------------------------------+ |
| | bugsink-mcp | |
| | (MCP Server) | |
| | | |
| | BUGSINK_URL=http://localhost:8000
| | BUGSINK_API_TOKEN=... | |
| | BUGSINK_ORG_SLUG=... | |
| +--------------------------------+ |
+--------------------------------------+
```
## Implementation Status
### Completed
- [x] Bugsink installed and configured in dev container
- [x] PostgreSQL `bugsink` database and user created
- [x] `@sentry/node` SDK integrated in backend (`src/services/sentry.server.ts`)
- [x] `@sentry/react` SDK integrated in frontend (`src/services/sentry.client.ts`)
- [x] ErrorBoundary component created (`src/components/ErrorBoundary.tsx`)
- [x] ErrorBoundary wrapped around app (`src/providers/AppProviders.tsx`)
- [x] Logstash pipeline configured for PM2/Pino, Redis, PostgreSQL, NGINX logs
- [x] MCP server (`bugsink-mcp`) documented and configured
- [x] Environment variables added to `src/config/env.ts` and frontend `src/config.ts`
- [x] Browser extension errors filtered in `beforeSend`
- [x] 5xx error filtering in backend error handler
### Recently Completed (2026-01-26)
- [x] **User context after authentication**: Integrated `setUser()` calls in `AuthProvider.tsx` to associate errors with authenticated users
- Called on profile fetch from query (line 44-49)
- Called on direct login with profile (line 94-99)
- Called on login with profile fetch (line 124-129)
- Cleared on logout (line 76-77)
- Maps `user_id``id`, `email``email`, `full_name``username`
This completes the error tracking implementation - all errors are now associated with the authenticated user who encountered them, enabling user-specific error analysis and debugging.
## Configuration
### Environment Variables
| Variable | Description | Default (Dev) |
| -------------------- | -------------------------------- | -------------------------- |
| `SENTRY_DSN` | Sentry-compatible DSN (backend) | Set after project creation |
| `VITE_SENTRY_DSN` | Sentry-compatible DSN (frontend) | Set after project creation |
| `SENTRY_ENVIRONMENT` | Environment name | `development` |
| `SENTRY_DEBUG` | Enable debug logging | `false` |
| `SENTRY_ENABLED` | Enable/disable error reporting | `true` |
### PostgreSQL Setup
```sql
-- Create dedicated Bugsink database and user
CREATE USER bugsink WITH PASSWORD 'bugsink_dev_password';
CREATE DATABASE bugsink OWNER bugsink;
GRANT ALL PRIVILEGES ON DATABASE bugsink TO bugsink;
```
### Bugsink Configuration
```bash
# Environment variables for Bugsink service
SECRET_KEY=<random-50-char-string>
DATABASE_URL=postgresql://bugsink:bugsink_dev_password@localhost:5432/bugsink
BASE_URL=http://localhost:8000
PORT=8000
```
### Logstash Pipeline
See `docker/logstash/bugsink.conf` for the full pipeline configuration.
Key routing:
| Source | Bugsink Project |
| --------------- | --------------- |
| Backend (Pino) | Backend API |
| Worker (Pino) | Backend API |
| PostgreSQL logs | Backend API |
| Vite logs | Infrastructure |
| Redis logs | Infrastructure |
| NGINX logs | Infrastructure |
| Frontend errors | Frontend |
## Consequences
### Positive
- **Full observability**: Aggregated view of errors and trends
- **Self-hosted**: No external SaaS dependencies or subscription costs
- **SDK compatibility**: Leverages mature Sentry SDKs with excellent documentation
- **AI integration**: MCP server enables Claude Code to query and analyze errors
- **Unified architecture**: Same setup works in dev container and production
- **Lightweight**: Bugsink runs in a single process, unlike full Sentry (16GB+ RAM)
- **Error correlation**: Request IDs allow correlation between frontend errors and backend logs
### Negative
- **Additional services**: Bugsink and Logstash add complexity to the container
- **PostgreSQL overhead**: Additional database for error tracking
- **Initial setup**: Requires configuration of multiple components
- **Logstash learning curve**: Pipeline configuration requires Logstash knowledge
## Alternatives Considered
1. **Full Sentry self-hosted**: Rejected due to complexity (Kafka, Redis, ClickHouse, 16GB+ RAM minimum)
2. **GlitchTip**: Considered, but Bugsink is lighter weight and easier to deploy
3. **Sentry SaaS**: Rejected due to self-hosted requirement
4. **Custom error aggregation**: Rejected in favor of proven Sentry SDK ecosystem
## References
- [Bugsink Documentation](https://www.bugsink.com/docs/)
- [Bugsink Docker Install](https://www.bugsink.com/docs/docker-install/)
- [@sentry/node Documentation](https://docs.sentry.io/platforms/javascript/guides/node/)
- [@sentry/react Documentation](https://docs.sentry.io/platforms/javascript/guides/react/)
- [bugsink-mcp](https://github.com/j-shelfwood/bugsink-mcp)
- [Logstash Reference](https://www.elastic.co/guide/en/logstash/current/index.html)
- [ADR-050: PostgreSQL Function Observability](./0050-postgresql-function-observability.md)
- [ADR-056: Application Performance Monitoring](./0056-application-performance-monitoring.md)