ADR-030: PostgreSQL Function Observability

Date: 2026-02-10

Status: Accepted

Source: Imported from flyer-crawler project (ADR-050)

Related: ADR-029, ADR-027

Context

Applications often use PostgreSQL functions and triggers for business logic, including:

  • Data transformations and validations
  • Complex query encapsulation
  • Trigger-based side effects
  • Audit logging

Current Problem: These database functions can fail silently in several ways:

  1. ON CONFLICT DO NOTHING - Swallows constraint violations without notification
  2. IF NOT FOUND THEN RETURN; - Silently exits when data is missing
  3. Trigger functions returning NULL - No indication of partial failures
  4. No logging inside functions - No visibility into function execution

When these silent failures occur:

  • The application layer receives no error (function "succeeds" but does nothing)
  • No logs are generated for debugging
  • Issues are only discovered when users report missing data
  • Root cause analysis is extremely difficult

Example of Silent Failure:

-- This function silently does nothing if the record doesn't exist
CREATE OR REPLACE FUNCTION public.process_item(p_user_id UUID, p_item_name TEXT)
RETURNS void AS $$
DECLARE
    v_item_id BIGINT;
BEGIN
    SELECT item_id INTO v_item_id FROM items WHERE name = p_item_name;
    IF v_item_id IS NULL THEN
        RETURN;  -- Silent failure - no log, no error
    END IF;
    -- ...
END;
$$ LANGUAGE plpgsql;

ADR-029 established Logstash + Bugsink for error tracking, with PostgreSQL log integration. This ADR defines the implementation.

Decision

We will implement a standardized PostgreSQL function observability strategy with three tiers of logging severity.

1. Function Logging Helper

Create a reusable logging function that outputs structured JSON to PostgreSQL logs:

-- Function to emit structured log messages from PL/pgSQL
CREATE OR REPLACE FUNCTION public.fn_log(
    p_level TEXT,           -- 'DEBUG', 'INFO', 'NOTICE', 'WARNING', 'ERROR'
    p_function_name TEXT,   -- The calling function name
    p_message TEXT,         -- Human-readable message
    p_context JSONB DEFAULT NULL  -- Additional context (user_id, params, etc.)
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    log_line TEXT;
BEGIN
    -- Build structured JSON log line
    log_line := jsonb_build_object(
        'timestamp', now(),
        'level', p_level,
        'source', 'postgresql',
        'function', p_function_name,
        'message', p_message,
        'context', COALESCE(p_context, '{}'::jsonb)
    )::text;

    -- Use appropriate RAISE level
    CASE p_level
        WHEN 'DEBUG' THEN RAISE DEBUG '%', log_line;
        WHEN 'INFO' THEN RAISE INFO '%', log_line;
        WHEN 'NOTICE' THEN RAISE NOTICE '%', log_line;
        WHEN 'WARNING' THEN RAISE WARNING '%', log_line;
        WHEN 'ERROR' THEN RAISE LOG '%', log_line;  -- Use LOG for errors to ensure capture
        ELSE RAISE NOTICE '%', log_line;
    END CASE;
END;
$$;
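
A quick smoke test of the helper from any session (the emitted line is illustrative; `jsonb::text` sorts keys and puts a space after each colon, and the exact timestamp comes from now()):

```sql
-- Emit a structured WARNING via the helper
SELECT fn_log(
    'WARNING',
    'smoke_test',
    'Structured logging smoke test',
    jsonb_build_object('user_id', '00000000-0000-0000-0000-000000000001')
);

-- Server log receives something like (illustrative):
-- WARNING:  {"level": "WARNING", "source": "postgresql", "function": "smoke_test", ...}
```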

2. Logging Tiers

Tier 1: Critical Functions (Always Log)

Functions where silent failure causes data corruption or user-facing issues:

| Function Type | Log Events |
| --- | --- |
| User creation/management | User creation, profile creation, errors |
| Permission/role changes | Role not found, permission denied |
| Financial transactions | Transaction not found, balance issues |
| Data approval workflows | Record not found, permission denied |
| Critical business operations | Items added, operations completed |

Pattern:

CREATE OR REPLACE FUNCTION public.process_critical_operation(p_user_id UUID, p_operation_name TEXT)
RETURNS void AS $$
DECLARE
    v_operation_id BIGINT;
    v_context JSONB;
BEGIN
    v_context := jsonb_build_object('user_id', p_user_id, 'operation_name', p_operation_name);

    SELECT operation_id INTO v_operation_id
    FROM public.operations WHERE name = p_operation_name;

    IF v_operation_id IS NULL THEN
        -- Log the issue instead of silent return
        PERFORM fn_log('WARNING', 'process_critical_operation',
            'Operation not found: ' || p_operation_name, v_context);
        RETURN;
    END IF;

    -- Perform operation
    INSERT INTO public.user_operations (user_id, operation_id)
    VALUES (p_user_id, v_operation_id)
    ON CONFLICT (user_id, operation_id) DO NOTHING;

    IF FOUND THEN
        PERFORM fn_log('INFO', 'process_critical_operation',
            'Operation completed: ' || p_operation_name, v_context);
    ELSE
        -- ON CONFLICT DO NOTHING fired: record it rather than swallow it
        PERFORM fn_log('WARNING', 'process_critical_operation',
            'Operation already recorded for user: ' || p_operation_name, v_context);
    END IF;
END;
$$ LANGUAGE plpgsql;

Tier 2: Business Logic Functions (Log on Anomalies)

Functions where unexpected conditions should be logged but are not critical:

| Function Type | Log Events |
| --- | --- |
| Search/suggestion functions | No match found (below threshold) |
| Recommendation engines | No recommendations generated |
| Data lookup functions | Empty results, no matches found |
| Price/analytics queries | No data available, stale data |

Pattern: Log when results are unexpectedly empty or inputs are invalid.
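
The Tier 1 pattern adapts directly. A sketch for a Tier 2 lookup function (table and column names are illustrative; RETURN QUERY sets FOUND based on whether any rows were returned):

```sql
CREATE OR REPLACE FUNCTION public.suggest_items(p_query TEXT)
RETURNS SETOF public.items AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM public.items
    WHERE name ILIKE '%' || p_query || '%';

    IF NOT FOUND THEN
        -- Anomaly, not an error: log at NOTICE so it is captured but not escalated
        PERFORM fn_log('NOTICE', 'suggest_items',
            'No suggestions for query: ' || p_query,
            jsonb_build_object('query', p_query));
    END IF;
END;
$$ LANGUAGE plpgsql;
```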

Tier 3: Triggers (Log Errors Only)

Triggers should be fast, so only log when something goes wrong:

| Trigger Type | Log Events |
| --- | --- |
| Audit triggers | Failed to update audit trail |
| Aggregation triggers | Calculation failed |
| Cascade triggers | Related record lookup failed |
| Notification triggers | External service call failed |
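
An error-only audit trigger might look like the following sketch (table and column names are illustrative). The exception handler logs the failure but still returns NEW so that an audit problem never blocks the original write:

```sql
CREATE OR REPLACE FUNCTION public.trg_audit_items()
RETURNS trigger AS $$
BEGIN
    INSERT INTO public.items_audit (item_id, changed_at, old_name, new_name)
    VALUES (NEW.item_id, now(), OLD.name, NEW.name);
    RETURN NEW;
EXCEPTION WHEN OTHERS THEN
    -- Error-only logging: capture the failure, do not abort the triggering statement
    PERFORM fn_log('ERROR', 'trg_audit_items',
        'Audit insert failed: ' || SQLERRM,
        jsonb_build_object('item_id', NEW.item_id));
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_audit
AFTER UPDATE ON public.items
FOR EACH ROW EXECUTE FUNCTION public.trg_audit_items();
```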

3. PostgreSQL Configuration

Enable logging in postgresql.conf:

# Capture fn_log() output at INFO and above
# (for log_min_messages, INFO ranks below NOTICE; 'notice' would drop INFO-level logs)
log_min_messages = info

# Include function name in log prefix
log_line_prefix = '%t [%p] %u@%d '

# Log to file for Logstash pickup
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_rotation_size = 100MB

# Capture slow queries from functions
log_min_duration_statement = 1000  # Log queries over 1 second
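
After reloading the configuration (e.g. with `SELECT pg_reload_conf();`), the effective values can be confirmed from any session:

```sql
-- Verify the logging settings took effect
SELECT name, setting
FROM pg_settings
WHERE name IN ('log_min_messages', 'logging_collector',
               'log_directory', 'log_min_duration_statement');
```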

4. Logstash Integration

Update the Logstash pipeline (extends ADR-029 configuration):

# PostgreSQL function log input
input {
  file {
    path => "/var/log/postgresql/*.log"
    type => "postgres"
    tags => ["postgres"]
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_postgres"
  }
}

filter {
  if [type] == "postgres" {
    # Extract timestamp and process ID from PostgreSQL log prefix
    # Note: %t in log_line_prefix includes a timezone abbreviation after the timestamp
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:pg_timestamp} %{TZ:pg_tz} \[%{POSINT:pg_pid}\] %{USER:pg_user}@%{WORD:pg_database} %{GREEDYDATA:pg_message}" }
    }

    # Check if this is a structured JSON log from fn_log()
    # (jsonb::text puts a space after each colon, so allow optional whitespace)
    if [pg_message] =~ /"source":\s*"postgresql"/ {
      json {
        source => "pg_message"
        target => "fn_log"
      }

      # Mark as error if level is WARNING or ERROR
      if [fn_log][level] in ["WARNING", "ERROR"] {
        mutate { add_tag => ["error", "db_function"] }
      }
    }

    # Also catch native PostgreSQL errors
    if [pg_message] =~ /^ERROR:/ or [pg_message] =~ /^FATAL:/ {
      mutate { add_tag => ["error", "postgres_native"] }
    }
  }
}

output {
  if "error" in [tags] and "postgres" in [tags] {
    http {
      url => "http://localhost:8000/api/store/"
      http_method => "post"
      format => "json"
    }
  }
}
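
The filter's JSON-detection logic can be sanity-checked outside Logstash. A minimal Python sketch (the sample log line is hypothetical) mirroring the regex match and JSON parse:

```python
import json
import re

# Mirrors the Logstash conditional: does the message carry fn_log()'s
# structured JSON payload? jsonb::text puts a space after each colon,
# so the pattern allows optional whitespace.
FN_LOG_PATTERN = re.compile(r'"source":\s*"postgresql"')

def parse_fn_log(pg_message: str):
    """Return the parsed fn_log payload, or None for non-structured lines."""
    if not FN_LOG_PATTERN.search(pg_message):
        return None
    return json.loads(pg_message)

# Hypothetical log line in the shape fn_log() emits
line = ('{"timestamp": "2026-02-10T10:30:00Z", "level": "WARNING", '
        '"source": "postgresql", "function": "process_item", '
        '"message": "Item not found", "context": {"user_id": "abc"}}')

event = parse_fn_log(line)
print(event["level"])                                   # WARNING
print(parse_fn_log("ERROR: relation does not exist"))   # None
```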

5. Dual-File Update Requirement

IMPORTANT: All SQL function changes must be applied to BOTH files:

  1. sql/Initial_triggers_and_functions.sql - Used for incremental updates
  2. sql/master_schema_rollup.sql - Used for fresh database setup

Both files must remain in sync for triggers and functions.

Implementation Steps

  1. Create fn_log() helper function:

    • Add to both SQL files
    • Test with SELECT fn_log('INFO', 'test', 'Test message', '{"key": "value"}'::jsonb);
  2. Update Tier 1 critical functions (highest priority):

    • Identify functions with silent failures
    • Add appropriate logging calls
    • Test error paths
  3. Update Tier 2 business logic functions:

    • Add anomaly logging to suggestion/recommendation functions
    • Log empty result sets with context
  4. Update Tier 3 trigger functions:

    • Add error-only logging to critical triggers
    • Wrap complex trigger logic in exception handlers
  5. Configure PostgreSQL logging:

    • Update postgresql.conf in dev container
    • Update production PostgreSQL configuration
    • Verify logs appear in expected location
  6. Update Logstash pipeline:

    • Add PostgreSQL input to Logstash config
    • Add filter rules for structured JSON extraction
    • Test end-to-end: function log -> Logstash -> Bugsink
  7. Verify in Bugsink:

    • Confirm database function errors appear as issues
    • Verify context (user_id, function name, params) is captured

Consequences

Positive

  • Visibility: Silent failures become visible in error tracking
  • Debugging: Function execution context captured for root cause analysis
  • Proactive detection: Anomalies logged before users report issues
  • Unified monitoring: Database errors appear alongside application errors in Bugsink
  • Structured logs: JSON format enables filtering and aggregation

Negative

  • Performance overhead: Logging adds latency to function execution
  • Log volume: Tier 1/2 functions may generate significant log volume
  • Maintenance: Two SQL files must be kept in sync
  • PostgreSQL configuration: Requires access to postgresql.conf

Mitigations

  • Performance: Only log meaningful events, not every function call
  • Log volume: Use appropriate log levels; Logstash filters reduce noise
  • Sync: Add CI check to verify SQL files match for function definitions
  • Configuration: Document PostgreSQL settings in deployment runbook

Examples

Before (Silent Failure)

-- User thinks operation completed, but it silently failed
SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void (no error, no log)
-- Result: User never gets expected result, nobody knows why

After (Observable Failure)

SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void
-- PostgreSQL log: {"timestamp":"2026-01-11T10:30:00Z","level":"WARNING","source":"postgresql","function":"process_item","message":"Item not found: Nonexistent Item","context":{"user_id":"user-uuid","item_name":"Nonexistent Item"}}
-- Bugsink: New issue created with full context

References