torbo/flyer-crawler.projectium.com

Torben Sorensen 4d323a51ca

2026-02-12 04:29:43 -08:00

ADR-032: Application Performance Monitoring (APM)

Date: 2026-02-10

Status: Proposed

Source: Imported from flyer-crawler project (ADR-056)

Related: ADR-029 (Error Tracking with Bugsink)

Context

Application Performance Monitoring (APM) provides visibility into application behavior through:

- Distributed Tracing: Track requests across services, queues, and database calls
- Performance Metrics: Response times, throughput, error rates
- Resource Monitoring: Memory usage, CPU, database connections
- Transaction Analysis: Identify slow endpoints and bottlenecks

While ADR-029 covers error tracking and observability, APM is a distinct concern focused on performance rather than errors. The Sentry SDK supports APM through its tracing features, but we have intentionally left this capability disabled in our application.

Current State

The Sentry SDK is installed and configured for error tracking (see ADR-029), but APM features are disabled:

// src/services/sentry.client.ts
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment,
  // Performance monitoring - disabled for now to keep it simple
  tracesSampleRate: 0,
  // ...
});

// src/services/sentry.server.ts
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment || config.server.nodeEnv,
  // Performance monitoring - disabled for now to keep it simple
  tracesSampleRate: 0,
  // ...
});

Why APM is Currently Disabled

- Complexity: APM adds instrumentation to maintain and extra noise to sift through when debugging
- Bugsink Limitations: Bugsink's APM support is less mature than its error tracking
- Resource Overhead: Tracing adds memory and CPU overhead
- Focus: Error tracking provides more immediate value at our current scale
- Cost: High sample rates can significantly increase storage requirements

Decision

We propose a staged approach to APM implementation:

Phase 1: Selective Backend Tracing (Low Priority)

Enable tracing for specific high-value operations:

// Enable tracing for specific transactions only
Sentry.init({
  dsn: config.sentry.dsn,
  tracesSampleRate: 0, // Ignored while tracesSampler is defined, which takes precedence

  // Trace only specific high-value transactions
  tracesSampler: (samplingContext) => {
    // Assumes Sentry SDK v8+, which exposes the name directly;
    // older SDKs used samplingContext.transactionContext?.name
    const transactionName = samplingContext.name;

    // Sample a fraction of long-running jobs
    if (transactionName?.includes('job-processing')) {
      return 0.1; // 10% sample rate
    }

    // Sample AI/external API calls at a higher rate
    if (transactionName?.includes('external-api')) {
      return 0.5; // 50% sample rate
    }

    // Continue traces that were already sampled upstream
    if (samplingContext.parentSampled) {
      return 0.1; // 10% for child transactions
    }

    return 0; // Don't trace other transactions
  },
});
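
The sampling rules above can also live in a standalone function, which makes them unit-testable without initializing Sentry. The following is a minimal sketch under that assumption; `SamplingInput`, `SAMPLE_RULES`, and `selectSampleRate` are hypothetical names that exist in neither the Sentry SDK nor this codebase, and the input type models only the two fields the rules read.

```typescript
// Minimal shape of the fields this sketch reads. The real Sentry sampling
// context carries more; only `name` and `parentSampled` are assumed here.
interface SamplingInput {
  name?: string;
  parentSampled?: boolean;
}

// Hypothetical rule table: substring of the transaction name -> sample rate.
// Rules are checked in order and the first match wins.
const SAMPLE_RULES: ReadonlyArray<readonly [string, number]> = [
  ['job-processing', 0.1],
  ['external-api', 0.5],
];

function selectSampleRate(ctx: SamplingInput): number {
  const name = ctx.name ?? '';
  for (const [needle, rate] of SAMPLE_RULES) {
    if (name.includes(needle)) {
      return rate;
    }
  }
  // Continue a fraction of traces that an upstream caller already sampled.
  if (ctx.parentSampled) {
    return 0.1;
  }
  // Everything else stays untraced.
  return 0;
}
```

A function like this could then be supplied as `tracesSampler: (ctx) => selectSampleRate(ctx)`, provided the installed SDK's sampling context exposes `name` and `parentSampled`.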

Phase 2: Custom Performance Metrics

Add custom metrics without full tracing overhead:

import { metrics } from '@sentry/node';

// Custom metric for slow database queries (in repository methods)
const startTime = performance.now();
const result = await pool.query(sql, params);
const duration = performance.now() - startTime; // milliseconds

metrics.distribution('db.query.duration', duration, {
  tags: { query_type: 'select', table: 'users' },
  unit: 'millisecond',
});

// Log queries slower than 1 second
if (duration > 1000) {
  logger.warn({ duration, sql }, 'Slow query detected');
}
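
Repeating the timing code at every call site gets noisy. One option is a small wrapper that measures any async operation and hands the duration to a reporting callback. This is a sketch only: `timed` and `MetricSink` are hypothetical names, and the sink is where a call such as `metrics.distribution(...)` or a slow-query log line would go.

```typescript
// Callback that receives the metric name, the duration in milliseconds,
// and the tags. Injected so the helper does not depend on any SDK.
type MetricSink = (
  name: string,
  durationMs: number,
  tags: Record<string, string>,
) => void;

// Runs `fn`, reports how long it took, and passes the result through.
// The duration is reported even when `fn` rejects.
async function timed<T>(
  name: string,
  tags: Record<string, string>,
  sink: MetricSink,
  fn: () => Promise<T>,
): Promise<T> {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    sink(name, performance.now() - start, tags);
  }
}
```

A repository method could then wrap its query as `timed('db.query.duration', { table: 'users' }, sink, () => pool.query(sql, params))`, keeping the measurement logic in one place.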

Phase 3: Full APM Integration (Future)

When/if full APM is needed:

Sentry.init({
  dsn: config.sentry.dsn,
  tracesSampleRate: 0.1, // 10% of transactions
  profilesSampleRate: 0.1, // 10% of traced transactions get profiled (requires the @sentry/profiling-node integration)

  integrations: [
    // Database tracing
    Sentry.postgresIntegration(),
    // Redis tracing
    Sentry.redisIntegration(),
    // BullMQ job tracing (custom integration)
  ],
});

Implementation Steps

To Enable Basic APM

1. Update Sentry Configuration:
   - Set tracesSampleRate > 0 in src/services/sentry.server.ts
   - Set tracesSampleRate > 0 in src/services/sentry.client.ts
   - Add environment variable SENTRY_TRACES_SAMPLE_RATE (default: 0)
2. Add Instrumentation:
   - Enable automatic Express instrumentation
   - Add manual spans for BullMQ job processing
   - Add database query instrumentation
3. Frontend Tracing:
   - Add Browser Tracing integration
   - Configure page load and navigation tracing

Environment Variables:

SENTRY_TRACES_SAMPLE_RATE=0.1  # 10% sampling
SENTRY_PROFILES_SAMPLE_RATE=0  # Profiling disabled

Bugsink Configuration:
- Verify Bugsink supports performance data ingestion
- Configure retention policies for performance data
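
The "manual spans for BullMQ job processing" step could be approached by wrapping each job processor in a span. The sketch below is an illustration under stated assumptions, not the project's implementation: `withJobSpan`, `StartSpan`, `SpanOptions`, and `JobLike` are hypothetical names, and `StartSpan` is a structural stand-in assumed to be compatible with a span API such as Sentry's `startSpan`. The span name uses the `job-processing` prefix so it would match the Phase 1 sampling rule.

```typescript
// Structural stand-in for a span-starting function. Assumed (not verified
// against this codebase) to be compatible with Sentry's `startSpan`.
interface SpanOptions {
  name: string;
  op: string;
}
type StartSpan = <T>(options: SpanOptions, callback: () => T) => T;

// Only the job fields this sketch reads; a real BullMQ job has many more.
interface JobLike {
  name: string;
}

// Returns a processor that runs the original one inside a span.
function withJobSpan<J extends JobLike, R>(
  startSpan: StartSpan,
  processor: (job: J) => Promise<R>,
): (job: J) => Promise<R> {
  return (job) =>
    startSpan(
      { name: `job-processing:${job.name}`, op: 'queue.process' },
      () => processor(job),
    );
}
```

Injecting the span function keeps the wrapper testable with a fake and avoids a hard dependency on the tracing SDK inside worker code.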

Configuration Changes Required

// src/config/env.ts - Add new config
sentry: {
  dsn: env.SENTRY_DSN,
  environment: env.SENTRY_ENVIRONMENT,
  debug: env.SENTRY_DEBUG === 'true',
  tracesSampleRate: parseFloat(env.SENTRY_TRACES_SAMPLE_RATE || '0'),
  profilesSampleRate: parseFloat(env.SENTRY_PROFILES_SAMPLE_RATE || '0'),
},

// src/services/sentry.server.ts - Updated init
Sentry.init({
  dsn: config.sentry.dsn,
  environment: config.sentry.environment,
  tracesSampleRate: config.sentry.tracesSampleRate,
  profilesSampleRate: config.sentry.profilesSampleRate,
  // ... rest of config
});
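
Note that `parseFloat` returns `NaN` for malformed input and happily accepts values outside the 0 to 1 range, so a mistyped environment variable would flow straight into `Sentry.init`. A defensive parser is one way to guard against that. This is a minimal sketch; `parseSampleRate` is a hypothetical helper, and clamping (rather than rejecting) out-of-range values is a design assumption.

```typescript
// Parses a sample rate from an environment variable string.
// Falls back when the value is missing, empty, or not a finite number,
// and clamps the result to the valid range [0, 1].
function parseSampleRate(raw: string | undefined, fallback = 0): number {
  if (raw === undefined || raw.trim() === '') {
    return fallback;
  }
  const value = Number(raw);
  if (!Number.isFinite(value)) {
    return fallback;
  }
  return Math.min(1, Math.max(0, value));
}
```

With such a helper, the config entry could read `tracesSampleRate: parseSampleRate(env.SENTRY_TRACES_SAMPLE_RATE)`, keeping the default of 0 when the variable is unset or invalid.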

Trade-offs

Enabling APM

Benefits:

- Identify performance bottlenecks
- Track distributed transactions across services
- Profile slow endpoints
- Monitor resource utilization trends

Costs:

- Increased memory usage (~5-15% overhead)
- Additional CPU for trace processing
- Increased storage in Bugsink/Sentry
- More complex debugging (noise in traces)
- Potential latency from tracing overhead

Keeping APM Disabled

Benefits:

- Simpler operation and debugging
- Lower resource overhead
- Focused on error tracking (higher priority)
- No additional storage costs

Costs:

- No automated performance insights
- Manual profiling required for bottleneck detection
- Limited visibility into slow transactions

Alternatives Considered

- OpenTelemetry: More vendor-neutral, but adds another dependency and complexity
- Prometheus + Grafana: Good for metrics, but doesn't provide distributed tracing
- Jaeger/Zipkin: Purpose-built for tracing, but requires additional infrastructure
- New Relic/Datadog SaaS: Full-featured, but conflicts with the self-hosted requirement

Current Recommendation

Keep APM disabled (tracesSampleRate: 0) until:

- Specific performance issues are identified that require tracing
- Bugsink's APM support is verified and tested
- Infrastructure can support the additional overhead
- There is a clear business need for performance visibility

When enabling APM becomes necessary, start with Phase 1 (selective tracing) to minimize overhead while gaining targeted insights.
