ADR-032: Application Performance Monitoring (APM)
Date: 2026-02-10
Status: Proposed
Source: Imported from flyer-crawler project (ADR-056)
Related: ADR-029 (Error Tracking with Bugsink)
Context
Application Performance Monitoring (APM) provides visibility into application behavior through:
- Distributed Tracing: Track requests across services, queues, and database calls
- Performance Metrics: Response times, throughput, error rates
- Resource Monitoring: Memory usage, CPU, database connections
- Transaction Analysis: Identify slow endpoints and bottlenecks
While ADR-029 covers error tracking and observability, APM is a distinct concern focused on performance rather than errors. The Sentry SDK supports APM through its tracing features, but we have intentionally left this capability disabled in our application for now.
Current State
The Sentry SDK is installed and configured for error tracking (see ADR-029), but APM features are disabled:
// src/services/sentry.client.ts
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment,
  // Performance monitoring - disabled for now to keep it simple
  tracesSampleRate: 0,
  // ...
});

// src/services/sentry.server.ts
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment || config.server.nodeEnv,
  // Performance monitoring - disabled for now to keep it simple
  tracesSampleRate: 0,
  // ...
});
Why APM is Currently Disabled
- Complexity: Traces add noise and one more layer to reason about when debugging
- Bugsink Limitations: Bugsink's APM support is less mature than its error tracking
- Resource Overhead: Tracing adds memory and CPU overhead
- Focus: Error tracking provides more immediate value for our current scale
- Cost: High sample rates can significantly increase storage requirements
Decision
We propose a staged approach to APM implementation:
Phase 1: Selective Backend Tracing (Low Priority)
Enable tracing for specific high-value operations:
// Enable tracing for specific transactions only
Sentry.init({
  dsn: config.sentry.dsn,
  // Kept at 0 as the default; tracesSampler takes precedence when both are set
  tracesSampleRate: 0,
  // Trace only specific high-value transactions
  tracesSampler: (samplingContext) => {
    const transactionName = samplingContext.transactionContext?.name;
    // Sample a fraction of long-running jobs
    if (transactionName?.includes('job-processing')) {
      return 0.1; // 10% sample rate
    }
    // Sample a larger fraction of AI/external API calls
    if (transactionName?.includes('external-api')) {
      return 0.5; // 50% sample rate
    }
    // Continue a fraction of traces whose parent was already sampled
    if (samplingContext.parentSampled) {
      return 0.1; // 10% for child transactions
    }
    return 0; // Don't trace other transactions
  },
});
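The sampling rules above can also be pulled out into a pure function so they can be unit tested without initializing Sentry. The following is a minimal sketch under that assumption; `SamplingRule`, `SAMPLING_RULES`, and `selectSampleRate` are hypothetical names, and the name fragments and rates simply mirror the Phase 1 example.

```typescript
// Hypothetical rule table; fragments and rates mirror the Phase 1 example.
interface SamplingRule {
  match: string; // substring looked for in the transaction name
  rate: number; // sample rate in [0, 1]
}

const SAMPLING_RULES: readonly SamplingRule[] = [
  { match: 'job-processing', rate: 0.1 },
  { match: 'external-api', rate: 0.5 },
];

// Returns the sample rate for a transaction. The first matching rule wins;
// otherwise children of an already-sampled parent get 10%, everything else 0.
function selectSampleRate(name: string | undefined, parentSampled?: boolean): number {
  if (name !== undefined) {
    for (const rule of SAMPLING_RULES) {
      if (name.includes(rule.match)) {
        return rule.rate;
      }
    }
  }
  return parentSampled ? 0.1 : 0;
}
```

A `tracesSampler` callback could then delegate to `selectSampleRate`, passing the transaction name and `parentSampled` from the sampling context.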
Phase 2: Custom Performance Metrics
Add custom metrics without full tracing overhead:
// Custom metric for slow database queries
import { metrics } from '@sentry/node';

// In repository methods
const startTime = performance.now();
const result = await pool.query(sql, params);
const duration = performance.now() - startTime;
metrics.distribution('db.query.duration', duration, {
  tags: { query_type: 'select', table: 'users' },
});
if (duration > 1000) {
  logger.warn({ duration, sql }, 'Slow query detected');
}
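Repeating the timing code in every repository method is easy to get wrong, for example by skipping the measurement when the query throws. One option is a small wrapper that takes the reporting step as a callback. This is a sketch only; `timed` and `DurationReporter` are hypothetical names, and the injectable `now` clock exists purely to make the helper testable.

```typescript
// Callback that receives the measured duration in milliseconds.
type DurationReporter = (durationMs: number) => void;

// Runs `operation`, measures elapsed time with `now`, and reports the
// duration whether the operation resolves or rejects.
async function timed<T>(
  operation: () => Promise<T>,
  report: DurationReporter,
  now: () => number = () => Date.now(),
): Promise<T> {
  const start = now();
  try {
    return await operation();
  } finally {
    report(now() - start);
  }
}
```

In a repository method the call might look like `timed(() => pool.query(sql, params), (ms) => metrics.distribution('db.query.duration', ms))`, with the slow-query warning placed inside the same reporter.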
Phase 3: Full APM Integration (Future)
If and when full APM is needed:
Sentry.init({
  dsn: config.sentry.dsn,
  tracesSampleRate: 0.1, // 10% of transactions
  profilesSampleRate: 0.1, // 10% of traced transactions get profiled
  integrations: [
    // Database tracing
    Sentry.postgresIntegration(),
    // Redis tracing
    Sentry.redisIntegration(),
    // BullMQ job tracing (custom integration)
  ],
});
Implementation Steps
To Enable Basic APM
- Update Sentry Configuration:
  - Set tracesSampleRate > 0 in src/services/sentry.server.ts
  - Set tracesSampleRate > 0 in src/services/sentry.client.ts
  - Add environment variable SENTRY_TRACES_SAMPLE_RATE (default: 0)
- Add Instrumentation:
  - Enable automatic Express instrumentation
  - Add manual spans for BullMQ job processing
  - Add database query instrumentation
- Frontend Tracing:
  - Add Browser Tracing integration
  - Configure page load and navigation tracing
- Environment Variables:
  SENTRY_TRACES_SAMPLE_RATE=0.1 # 10% sampling
  SENTRY_PROFILES_SAMPLE_RATE=0 # Profiling disabled
- Bugsink Configuration:
  - Verify Bugsink supports performance data ingestion
  - Configure retention policies for performance data
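The "manual spans for BullMQ job processing" step could be approached by wrapping each job handler in a span. The sketch below is an assumption, not the project's actual integration: `withJobSpan`, `StartSpan`, and `JobLike` are hypothetical names, the span function is injected so the wrapper runs without Sentry installed, and the `queue.process` op and `job-processing:` name prefix are illustrative choices (the prefix matches the fragment the Phase 1 sampler looks for). `StartSpan` is shaped so that `Sentry.startSpan` from `@sentry/node` should be passable as the first argument.

```typescript
// Minimal shape of a span-starting function (assumed compatible with Sentry.startSpan).
type StartSpan = <T>(options: { name: string; op?: string }, callback: () => T) => T;

// The only job fields this sketch relies on.
interface JobLike {
  name: string;
  id?: string;
}

// Wraps a job handler so that every invocation runs inside its own span.
function withJobSpan<J extends JobLike, R>(
  startSpan: StartSpan,
  handler: (job: J) => Promise<R>,
): (job: J) => Promise<R> {
  return (job) =>
    startSpan(
      { name: `job-processing:${job.name}`, op: 'queue.process' },
      () => handler(job),
    );
}
```

The wrapped function has the same signature as the original handler, so it can be handed to a worker in its place.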
Configuration Changes Required
// src/config/env.ts - Add new config
sentry: {
  dsn: env.SENTRY_DSN,
  environment: env.SENTRY_ENVIRONMENT,
  debug: env.SENTRY_DEBUG === 'true',
  tracesSampleRate: parseFloat(env.SENTRY_TRACES_SAMPLE_RATE || '0'),
  profilesSampleRate: parseFloat(env.SENTRY_PROFILES_SAMPLE_RATE || '0'),
},

// src/services/sentry.server.ts - Updated init
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment,
  tracesSampleRate: config.sentry.tracesSampleRate,
  profilesSampleRate: config.sentry.profilesSampleRate,
  // ... rest of config
});
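One caveat with the config above: `parseFloat` returns `NaN` for a non-numeric value such as `SENTRY_TRACES_SAMPLE_RATE=ten`, and nothing keeps the result inside the 0 to 1 range that a sample rate is expected to fall in. A small guard could normalize the value first. This is a sketch; `parseSampleRate` is a hypothetical helper, not part of the existing config module.

```typescript
// Parses a sample rate from an environment variable value.
// Falls back to `fallback` when the value is unset, blank, or not a number,
// and clamps anything else to the range [0, 1].
function parseSampleRate(raw: string | undefined, fallback = 0): number {
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  const value = Number.parseFloat(raw);
  if (Number.isNaN(value)) {
    return fallback;
  }
  return Math.min(1, Math.max(0, value));
}
```

With this in place, the config entries would become `parseSampleRate(env.SENTRY_TRACES_SAMPLE_RATE)` and `parseSampleRate(env.SENTRY_PROFILES_SAMPLE_RATE)`, so a typo in the environment degrades to tracing disabled instead of an invalid rate.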
Trade-offs
Enabling APM
Benefits:
- Identify performance bottlenecks
- Track distributed transactions across services
- Profile slow endpoints
- Monitor resource utilization trends
Costs:
- Increased memory usage (~5-15% overhead)
- Additional CPU for trace processing
- Increased storage in Bugsink/Sentry
- More complex debugging (noise in traces)
- Potential latency from tracing overhead
Keeping APM Disabled
Benefits:
- Simpler operation and debugging
- Lower resource overhead
- Focused on error tracking (higher priority)
- No additional storage costs
Costs:
- No automated performance insights
- Manual profiling required for bottleneck detection
- Limited visibility into slow transactions
Alternatives Considered
- OpenTelemetry: More vendor-neutral, but adds another dependency and complexity
- Prometheus + Grafana: Good for metrics, but do not provide distributed tracing
- Jaeger/Zipkin: Purpose-built for tracing, but requires additional infrastructure
- New Relic/Datadog SaaS: Full-featured but conflicts with self-hosted requirement
Current Recommendation
Keep APM disabled (tracesSampleRate: 0) until:
- Specific performance issues are identified that require tracing
- Bugsink's APM support is verified and tested
- Infrastructure can support the additional overhead
- There is a clear business need for performance visibility
When enabling APM becomes necessary, start with Phase 1 (selective tracing) to minimize overhead while gaining targeted insights.
Consequences
Positive (When Implemented)
- Automated identification of slow endpoints
- Distributed trace visualization across async operations
- Correlation between errors and performance issues
- Proactive alerting on performance degradation
Negative
- Additional infrastructure complexity
- Storage overhead for trace data
- Potential performance impact from tracing itself
- Learning curve for trace analysis