# ADR-015: Application Performance Monitoring (APM) and Error Tracking

**Date**: 2025-12-12

**Status**: Accepted

**Updated**: 2026-01-11

## Context

While `ADR-004` established structured logging with Pino, the application lacks a high-level, aggregated view of its health, performance, and errors. It's difficult to spot trends, identify slow API endpoints, or be proactively notified of new types of errors.

Key requirements:

1. **Self-hosted**: No external SaaS dependencies for error tracking
2. **Sentry SDK compatible**: Leverage mature, well-documented SDKs
3. **Lightweight**: Minimal resource overhead in the dev container
4. **Production-ready**: Same architecture works on bare-metal production servers
5. **AI-accessible**: MCP server integration for Claude Code and other AI tools

## Decision

We will implement a self-hosted error tracking stack using **Bugsink** as the Sentry-compatible backend, with the following components:

### 1. Error Tracking Backend: Bugsink

**Bugsink** is a lightweight, self-hosted Sentry alternative that:

- Runs as a single process (no Kafka, Redis, ClickHouse required)
- Is fully compatible with Sentry SDKs
- Supports ARM64 and AMD64 architectures
- Can use SQLite (dev) or PostgreSQL (production)

**Deployment**:

- **Dev container**: Installed as a systemd service inside the container
- **Production**: Runs as a systemd service on bare-metal, listening on localhost only
- **Database**: Uses PostgreSQL with a dedicated `bugsink` user and `bugsink` database (same PostgreSQL instance as the main application)

### 2. Backend Integration: @sentry/node

The Express backend will integrate the `@sentry/node` SDK to:

- Capture unhandled exceptions before PM2 (or another process manager) restarts the process
- Report errors with full stack traces and context
- Integrate with the Pino logger for breadcrumbs
- Track transaction performance (optional)

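The Pino breadcrumb integration could be sketched roughly as follows. This is a minimal illustration, not the project's actual integration: the `PinoRecord` and `Breadcrumb` types and the function names are local assumptions, and the level mapping is one reasonable choice based on Pino's default numeric levels.

```typescript
// Sketch: turn a Pino log record into a Sentry-style breadcrumb.
// All types and names here are hypothetical and defined locally for illustration.

type SentryLevel = "debug" | "info" | "warning" | "error" | "fatal";

interface PinoRecord {
  level: number; // Pino numeric level: 10 trace, 20 debug, 30 info, 40 warn, 50 error, 60 fatal
  time: number; // epoch milliseconds (Pino's default timestamp)
  msg?: string;
}

interface Breadcrumb {
  category: string;
  level: SentryLevel;
  message: string;
  timestamp: number; // seconds since epoch
}

// Map Pino's numeric levels onto Sentry severity names.
function pinoLevelToSentry(level: number): SentryLevel {
  if (level >= 60) return "fatal";
  if (level >= 50) return "error";
  if (level >= 40) return "warning";
  if (level >= 30) return "info";
  return "debug";
}

// Build a breadcrumb from a single log record.
function toBreadcrumb(record: PinoRecord): Breadcrumb {
  return {
    category: "pino",
    level: pinoLevelToSentry(record.level),
    message: record.msg ?? "",
    timestamp: record.time / 1000,
  };
}
```

In the backend, the result could then be handed to `Sentry.addBreadcrumb(...)` from a Pino hook or transport; where exactly that hook lives is left open here.
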
### 3. Frontend Integration: @sentry/react

The React frontend will integrate the `@sentry/react` SDK to:

- Wrap the app in a Sentry Error Boundary
- Capture unhandled JavaScript errors
- Report errors with component stack traces
- Track user session context

### 4. Log Aggregation: Logstash

**Logstash** parses application and infrastructure logs, forwarding error patterns to Bugsink:

- **Installation**: Installed inside the dev container (and on bare-metal prod servers)
- **Inputs**:
  - Pino JSON logs from the Node.js application
  - Redis logs (connection errors, memory warnings, slow commands)
  - PostgreSQL function logs (future - see Implementation Steps)
- **Filter**: Identifies error-level logs (5xx responses, unhandled exceptions, Redis errors)
- **Output**: Sends to Bugsink via Sentry-compatible HTTP API

This provides a secondary error capture path for:

- Errors that occur before Sentry SDK initialization
- Log-based errors that don't throw exceptions
- Redis connection/performance issues
- Database function errors and slow queries
- Historical error analysis from log files

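When writing the Redis filter, it helps to remember that Redis log lines mark severity with a single character (`.` debug, `-` verbose, `*` notice, `#` warning) rather than a word. The sketch below mirrors that line format in TypeScript so a pattern can be checked against sample lines before it is ported to Logstash. The function names and the sample line are illustrative assumptions.

```typescript
// Sketch: parse a Redis server log line such as
//   "1234:M 11 Jan 2026 14:30:00.123 # Connection refused"
// Names below are hypothetical helpers, not part of any library.

interface RedisLogLine {
  pid: number;
  role: string; // single letter, e.g. M (master), S (replica), C (child), X (sentinel)
  timestamp: string;
  level: "." | "-" | "*" | "#";
  message: string;
}

// pid:role, day month year time.millis, level character, free-form message
const REDIS_LINE =
  /^(\d+):([A-Z]) (\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) ([.\-*#]) (.*)$/;

function parseRedisLogLine(line: string): RedisLogLine | null {
  const m = REDIS_LINE.exec(line);
  if (!m) return null;
  return {
    pid: Number(m[1]),
    role: m[2],
    timestamp: m[3],
    level: m[4] as RedisLogLine["level"],
    message: m[5],
  };
}

// Only "#" (warning) lines are treated as candidates for forwarding.
function isRedisWarning(line: string): boolean {
  return parseRedisLogLine(line)?.level === "#";
}
```

Lines that do not match the expected layout return `null`, so unparseable input is never tagged as an error by accident.
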
### 5. MCP Server Integration: sentry-selfhosted-mcp

For AI tool integration (Claude Code, Cursor, etc.), we use the open-source [sentry-selfhosted-mcp](https://github.com/ddfourtwo/sentry-selfhosted-mcp) server:

- **No code changes required**: Configurable via environment variables
- **Capabilities**: List projects, get issues, view events, update status, add comments
- **Configuration**:
  - `SENTRY_URL`: Points to Bugsink instance
  - `SENTRY_AUTH_TOKEN`: API token from Bugsink
  - `SENTRY_ORG_SLUG`: Organization identifier

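As a rough illustration of how these three variables reach the MCP server, the sketch below builds an `mcpServers` entry of the kind many MCP clients read from their JSON config. The `node` launch command, the entry-point path, and the token and slug values are assumptions for illustration; check the `sentry-selfhosted-mcp` README for the actual start command.

```typescript
// Sketch: build an MCP client config entry for sentry-selfhosted-mcp.
// The launch command and the server entry path are hypothetical.

interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpOptions {
  serverEntryPath: string; // hypothetical path to the built MCP server script
  sentryUrl: string;
  authToken: string;
  orgSlug: string;
}

function buildMcpConfig(opts: McpOptions): { mcpServers: Record<string, McpServerEntry> } {
  return {
    mcpServers: {
      "sentry-selfhosted-mcp": {
        command: "node",
        args: [opts.serverEntryPath],
        env: {
          SENTRY_URL: opts.sentryUrl,
          SENTRY_AUTH_TOKEN: opts.authToken,
          SENTRY_ORG_SLUG: opts.orgSlug,
        },
      },
    },
  };
}

// Print the JSON that would be pasted into the client's MCP config file.
console.log(
  JSON.stringify(
    buildMcpConfig({
      serverEntryPath: "/path/to/sentry-selfhosted-mcp/index.js", // placeholder
      sentryUrl: "http://localhost:8000",
      authToken: "<token-from-bugsink>", // placeholder
      orgSlug: "<org-slug>", // placeholder
    }),
    null,
    2,
  ),
);
```
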
## Architecture

```text
┌─────────────────────────────────────────────────────────────────────────┐
│                    Dev Container / Production Server                    │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────────┐    ┌──────────────────┐                           │
│  │  Frontend        │    │  Backend         │                           │
│  │  (React)         │    │  (Express)       │                           │
│  │  @sentry/react   │    │  @sentry/node    │                           │
│  └────────┬─────────┘    └────────┬─────────┘                           │
│           │                       │                                     │
│           │  Sentry SDK Protocol  │                                     │
│           └───────────┬───────────┘                                     │
│                       │                                                 │
│                       ▼                                                 │
│           ┌────────────────────────┐                                    │
│           │        Bugsink         │                                    │
│           │    (localhost:8000)    │◄─────────────────┐                 │
│           │                        │                  │                 │
│           │   PostgreSQL backend   │                  │                 │
│           └────────────────────────┘                  │                 │
│                                                       │                 │
│           ┌────────────────────────┐                  │                 │
│           │        Logstash        │──────────────────┘                 │
│           │    (Log Aggregator)    │  Sentry Output                     │
│           │                        │                                    │
│           │  Inputs:               │                                    │
│           │  - Pino app logs       │                                    │
│           │  - Redis logs          │                                    │
│           │  - PostgreSQL (future) │                                    │
│           └────────────────────────┘                                    │
│                 ▲     ▲     ▲                                           │
│                 │     │     │                                           │
│        ┌────────┘     │     └────────┐                                  │
│        │              │              │                                  │
│   ┌────┴─────┐  ┌─────┴────┐  ┌──────┴─────┐                            │
│   │   Pino   │  │  Redis   │  │ PostgreSQL │                            │
│   │   Logs   │  │   Logs   │  │ Logs (TBD) │                            │
│   └──────────┘  └──────────┘  └────────────┘                            │
│                                                                         │
│           ┌────────────────────────┐                                    │
│           │       PostgreSQL       │                                    │
│           │   ┌────────────────┐   │                                    │
│           │   │ flyer_crawler  │   │ (main app database)                │
│           │   ├────────────────┤   │                                    │
│           │   │ bugsink        │   │ (error tracking database)          │
│           │   └────────────────┘   │                                    │
│           └────────────────────────┘                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

External (Developer Machine):
┌────────────────────────────────────────┐
│  Claude Code / Cursor / VS Code        │
│ ┌────────────────────────────────────┐ │
│ │  sentry-selfhosted-mcp             │ │
│ │  (MCP Server)                      │ │
│ │                                    │ │
│ │  SENTRY_URL=http://localhost:8000  │ │
│ │  SENTRY_AUTH_TOKEN=...             │ │
│ │  SENTRY_ORG_SLUG=...               │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘
```

## Configuration

### Environment Variables

| Variable           | Description                    | Default (Dev)              |
| ------------------ | ------------------------------ | -------------------------- |
| `BUGSINK_DSN`      | Sentry-compatible DSN for SDKs | Set after project creation |
| `BUGSINK_ENABLED`  | Enable/disable error reporting | `true`                     |
| `BUGSINK_BASE_URL` | Bugsink web UI URL (internal)  | `http://localhost:8000`    |

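A minimal sketch of how the application might read these variables is shown below. The `loadBugsinkConfig` name, the returned shape, and the rule that reporting stays off until a DSN exists are assumptions for illustration, not the contents of `src/config/env.ts`.

```typescript
// Sketch: read the Bugsink variables from an environment map.
// Function name and return shape are hypothetical.

interface BugsinkConfig {
  enabled: boolean;
  dsn: string | undefined;
  baseUrl: string;
}

type Env = Record<string, string | undefined>;

function loadBugsinkConfig(env: Env): BugsinkConfig {
  // Treat an empty or whitespace-only DSN as "not configured yet".
  const dsn = env.BUGSINK_DSN?.trim() || undefined;
  // BUGSINK_ENABLED defaults to true, matching the table above.
  const flag = (env.BUGSINK_ENABLED ?? "true").trim().toLowerCase();
  const flagOn = flag !== "false" && flag !== "0";
  return {
    // Assumption: reporting only makes sense once a DSN has been created.
    enabled: flagOn && dsn !== undefined,
    dsn,
    baseUrl: env.BUGSINK_BASE_URL ?? "http://localhost:8000",
  };
}
```

The result could be called as `loadBugsinkConfig(process.env)` at startup, with `dsn` and `enabled` passed through to the SDK's `Sentry.init` options.
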
### PostgreSQL Setup

```sql
-- Create dedicated Bugsink database and user
CREATE USER bugsink WITH PASSWORD 'bugsink_dev_password';
CREATE DATABASE bugsink OWNER bugsink;
GRANT ALL PRIVILEGES ON DATABASE bugsink TO bugsink;
```

### Bugsink Configuration

```bash
# Environment variables for Bugsink service
SECRET_KEY=<random-50-char-string>
DATABASE_URL=postgresql://bugsink:bugsink_dev_password@localhost:5432/bugsink
BASE_URL=http://localhost:8000
PORT=8000
```

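One way to produce the random 50-character `SECRET_KEY` is sketched below using Node's built-in `crypto` module. The helper name is hypothetical, and any cryptographically secure generator would serve equally well.

```typescript
import { randomBytes } from "node:crypto";

// Sketch: generate a URL-safe random string of the requested length.
// `generateSecretKey` is a hypothetical helper name.
function generateSecretKey(length = 50): string {
  // base64url produces more characters than input bytes,
  // so `length` random bytes always yield at least `length` characters.
  return randomBytes(length).toString("base64url").slice(0, length);
}

console.log(`SECRET_KEY=${generateSecretKey()}`);
```
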
### Logstash Pipeline

```conf
# /etc/logstash/conf.d/bugsink.conf

# === INPUTS ===
input {
  # Pino application logs
  file {
    path => "/app/logs/*.log"
    codec => json
    type => "pino"
    tags => ["app"]
  }

  # Redis logs
  file {
    path => "/var/log/redis/*.log"
    type => "redis"
    tags => ["redis"]
  }

  # PostgreSQL logs (for function logging - future)
  # file {
  #   path => "/var/log/postgresql/*.log"
  #   type => "postgres"
  #   tags => ["postgres"]
  # }
}

# === FILTERS ===
filter {
  # Pino error detection (level 50 = error, 60 = fatal)
  if [type] == "pino" and [level] >= 50 {
    mutate { add_tag => ["error"] }
  }

  # Redis error detection
  # Redis lines look like "1234:M 11 Jan 2026 14:30:00.123 # message".
  # The level is a single character: "." debug, "-" verbose, "*" notice, "#" warning.
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR} %{TIME} (?<loglevel>[.*#-]) %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] == "#" {
      mutate { add_tag => ["error"] }
    }
  }

  # PostgreSQL function error detection (future)
  # if [type] == "postgres" {
  #   # Parse PostgreSQL log format and detect ERROR/FATAL levels
  # }
}

# === OUTPUT ===
output {
  if "error" in [tags] {
    http {
      # Placeholders: take <project-id> and <public-key> from BUGSINK_DSN
      url => "http://localhost:8000/api/<project-id>/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_key=<public-key>, sentry_client=logstash/1.0"
      }
      # The store endpoint expects a Sentry event JSON body (not an envelope),
      # so the log fields still need to be mapped to that schema.
    }
  }
}
```

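Sentry-compatible ingestion is scoped to a project and authenticated with the DSN's public key, so the values the Logstash output needs can be derived from `BUGSINK_DSN`. The sketch below shows that derivation for the legacy store endpoint. The function name, the `logstash/1.0` client string, and the sample DSN are illustrative assumptions.

```typescript
// Sketch: derive the store endpoint URL and X-Sentry-Auth header from a DSN
// of the form "<scheme>://<public-key>@<host>[:port]/<project-id>".
// `dsnToIngestTarget` is a hypothetical helper name.

interface IngestTarget {
  storeUrl: string;
  authHeader: string;
}

function dsnToIngestTarget(dsn: string, client = "logstash/1.0"): IngestTarget {
  const url = new URL(dsn);
  // The project id is the last path segment; anything before it is a path prefix.
  const segments = url.pathname.split("/").filter(Boolean);
  const projectId = segments.pop();
  if (!url.username || !projectId) {
    throw new Error("DSN must contain a public key and a project id");
  }
  const prefix = segments.length > 0 ? `/${segments.join("/")}` : "";
  return {
    storeUrl: `${url.protocol}//${url.host}${prefix}/api/${projectId}/store/`,
    authHeader: `Sentry sentry_version=7, sentry_key=${url.username}, sentry_client=${client}`,
  };
}

// Example with a made-up DSN:
console.log(dsnToIngestTarget("http://abc123@localhost:8000/1"));
```

The two resulting strings correspond to the `url` and `X-Sentry-Auth` header values of the Logstash `http` output.
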
## Implementation Steps

1. **Update Dockerfile.dev**:
   - Install Bugsink (pip package or binary)
   - Install Logstash (Elastic APT repository)
   - Add systemd service files for both

2. **PostgreSQL initialization**:
   - Add Bugsink user/database creation to `sql/00-init-extensions.sql`

3. **Backend SDK integration**:
   - Install `@sentry/node`
   - Initialize in `server.ts` before Express app
   - Configure error handler middleware integration

4. **Frontend SDK integration**:
   - Install `@sentry/react`
   - Wrap `App` component with `Sentry.ErrorBoundary`
   - Configure in `src/index.tsx`

5. **Environment configuration**:
   - Add Bugsink variables to `src/config/env.ts`
   - Update `.env.example` and `compose.dev.yml`

6. **Logstash configuration**:
   - Create pipeline config for Pino → Bugsink
   - Configure Pino to write to log file in addition to stdout
   - Configure Redis log monitoring (connection errors, slow commands)

7. **MCP server documentation**:
   - Document `sentry-selfhosted-mcp` setup in CLAUDE.md

8. **PostgreSQL function logging** (future):
   - Configure PostgreSQL to log function execution errors
   - Add Logstash input for PostgreSQL logs
   - Define filter rules for function-level error detection
   - _Note: Ask for implementation details when this step is reached_

## Consequences

### Positive

- **Full observability**: Aggregated view of errors, trends, and performance
- **Self-hosted**: No external SaaS dependencies or subscription costs
- **SDK compatibility**: Leverages mature Sentry SDKs with excellent documentation
- **AI integration**: MCP server enables Claude Code to query and analyze errors
- **Unified architecture**: Same setup works in dev container and production
- **Lightweight**: Bugsink runs in a single process, unlike full Sentry (16GB+ RAM)

### Negative

- **Additional services**: Bugsink and Logstash add complexity to the container
- **PostgreSQL overhead**: Additional database for error tracking
- **Initial setup**: Requires configuration of multiple components
- **Logstash learning curve**: Pipeline configuration requires Logstash knowledge

## Alternatives Considered

1. **Full Sentry self-hosted**: Rejected due to complexity (Kafka, Redis, ClickHouse, 16GB+ RAM minimum)
2. **GlitchTip**: Considered, but Bugsink is lighter weight and easier to deploy
3. **Sentry SaaS**: Rejected due to self-hosted requirement
4. **Custom error aggregation**: Rejected in favor of proven Sentry SDK ecosystem

## References

- [Bugsink Documentation](https://www.bugsink.com/docs/)
- [Bugsink Docker Install](https://www.bugsink.com/docs/docker-install/)
- [@sentry/node Documentation](https://docs.sentry.io/platforms/javascript/guides/node/)
- [@sentry/react Documentation](https://docs.sentry.io/platforms/javascript/guides/react/)
- [sentry-selfhosted-mcp](https://github.com/ddfourtwo/sentry-selfhosted-mcp)
- [Logstash Reference](https://www.elastic.co/guide/en/logstash/current/index.html)