Files
flyer-crawler.projectium.com/docs/adr/0050-postgresql-function-observability.md
Torben Sorensen 4e22213cd1
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 15m54s
all the new shiny things
2026-01-11 02:04:52 -08:00

12 KiB

ADR-050: PostgreSQL Function Observability

Date: 2026-01-11

Status: Proposed

Related: ADR-015, ADR-004

Context

The application uses 30+ PostgreSQL functions and 11+ triggers for business logic, including:

  • Recipe recommendations and search
  • Shopping list generation from menu plans
  • Price history tracking
  • Achievement awards
  • Activity logging
  • User profile creation

Current Problem: These database functions can fail silently in several ways:

  1. ON CONFLICT DO NOTHING - Swallows constraint violations without notification
  2. IF NOT FOUND THEN RETURN; - Silently exits when data is missing
  3. Trigger functions returning NULL - No indication of partial failures
  4. No logging inside functions - No visibility into function execution

When these silent failures occur:

  • The application layer receives no error (function "succeeds" but does nothing)
  • No logs are generated for debugging
  • Issues are only discovered when users report missing data
  • Root cause analysis is extremely difficult

Example of Silent Failure:

-- This function silently does nothing if achievement doesn't exist
CREATE OR REPLACE FUNCTION public.award_achievement(p_user_id UUID, p_achievement_name TEXT)
RETURNS void AS $$
BEGIN
    SELECT achievement_id INTO v_achievement_id FROM achievements WHERE name = p_achievement_name;
    IF v_achievement_id IS NULL THEN
        RETURN;  -- Silent failure - no log, no error
    END IF;
    -- ...
END;
$$;

ADR-015 established Logstash + Bugsink for error tracking, with PostgreSQL log integration marked as "future". This ADR defines the implementation.

Decision

We will implement a standardized PostgreSQL function observability strategy with three tiers of logging severity:

1. Function Logging Helper

Create a reusable logging function that outputs structured JSON to PostgreSQL logs:

-- Function to emit structured log messages from PL/pgSQL
CREATE OR REPLACE FUNCTION public.fn_log(
    p_level TEXT,           -- 'DEBUG', 'INFO', 'NOTICE', 'WARNING', 'ERROR'
    p_function_name TEXT,   -- The calling function name
    p_message TEXT,         -- Human-readable message
    p_context JSONB DEFAULT NULL  -- Additional context (user_id, params, etc.)
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    log_line TEXT;
BEGIN
    -- Build structured JSON log line
    log_line := jsonb_build_object(
        'timestamp', now(),
        'level', p_level,
        'source', 'postgresql',
        'function', p_function_name,
        'message', p_message,
        'context', COALESCE(p_context, '{}'::jsonb)
    )::text;

    -- Use appropriate RAISE level
    CASE p_level
        WHEN 'DEBUG' THEN RAISE DEBUG '%', log_line;
        WHEN 'INFO' THEN RAISE INFO '%', log_line;
        WHEN 'NOTICE' THEN RAISE NOTICE '%', log_line;
        WHEN 'WARNING' THEN RAISE WARNING '%', log_line;
        WHEN 'ERROR' THEN RAISE LOG '%', log_line;  -- Use LOG for errors to ensure capture
        ELSE RAISE NOTICE '%', log_line;
    END CASE;
END;
$$;

2. Logging Tiers

Tier 1: Critical Functions (Always Log)

Functions where silent failure causes data corruption or user-facing issues:

Function Log Events
handle_new_user() User creation, profile creation, errors
award_achievement() Achievement not found, already awarded
approve_correction() Correction not found, permission denied
complete_shopping_list() List not found, permission denied
add_menu_plan_to_shopping_list() Permission denied, items added
fork_recipe() Original not found, fork created

Pattern:

CREATE OR REPLACE FUNCTION public.award_achievement(p_user_id UUID, p_achievement_name TEXT)
RETURNS void AS $$
DECLARE
    v_achievement_id BIGINT;
    v_points_value INTEGER;
    v_context JSONB;
BEGIN
    v_context := jsonb_build_object('user_id', p_user_id, 'achievement_name', p_achievement_name);

    SELECT achievement_id, points_value INTO v_achievement_id, v_points_value
    FROM public.achievements WHERE name = p_achievement_name;

    IF v_achievement_id IS NULL THEN
        -- Log the issue instead of silent return
        PERFORM fn_log('WARNING', 'award_achievement',
            'Achievement not found: ' || p_achievement_name, v_context);
        RETURN;
    END IF;

    INSERT INTO public.user_achievements (user_id, achievement_id)
    VALUES (p_user_id, v_achievement_id)
    ON CONFLICT (user_id, achievement_id) DO NOTHING;

    IF FOUND THEN
        UPDATE public.profiles SET points = points + v_points_value WHERE user_id = p_user_id;
        PERFORM fn_log('INFO', 'award_achievement',
            'Achievement awarded: ' || p_achievement_name, v_context);
    END IF;
END;
$$;

Tier 2: Business Logic Functions (Log on Anomalies)

Functions where unexpected conditions should be logged but aren't critical:

Function Log Events
suggest_master_item_for_flyer_item() No match found (below threshold)
recommend_recipes_for_user() No recommendations generated
find_recipes_from_pantry() Empty pantry, no recipes found
get_best_sale_prices_for_user() No watched items, no current sales

Pattern: Log when results are unexpectedly empty or inputs are invalid.

Tier 3: Triggers (Log Errors Only)

Triggers should be fast, so only log when something goes wrong:

Trigger Function Log Events
update_price_history_on_flyer_item_insert() Failed to update history
update_recipe_rating_aggregates() Rating calculation failed
log_new_recipe() Profile lookup failed
log_new_flyer() Store lookup failed

3. PostgreSQL Configuration

Enable logging in postgresql.conf:

# Log all function notices and above
log_min_messages = notice

# Include function name in log prefix
log_line_prefix = '%t [%p] %u@%d '

# Log to file for Logstash pickup
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_rotation_size = 100MB

# Capture slow queries from functions
log_min_duration_statement = 1000  # Log queries over 1 second

4. Logstash Integration

Update the Logstash pipeline (extends ADR-015 configuration):

# PostgreSQL function log input
input {
  file {
    path => "/var/log/postgresql/*.log"
    type => "postgres"
    tags => ["postgres"]
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_postgres"
  }
}

filter {
  if [type] == "postgres" {
    # Extract timestamp and process ID from PostgreSQL log prefix
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:pg_timestamp} \[%{POSINT:pg_pid}\] %{USER:pg_user}@%{WORD:pg_database} %{GREEDYDATA:pg_message}" }
    }

    # Check if this is a structured JSON log from fn_log()
    if [pg_message] =~ /^\{.*"source":"postgresql".*\}$/ {
      json {
        source => "pg_message"
        target => "fn_log"
      }

      # Mark as error if level is WARNING or ERROR
      if [fn_log][level] in ["WARNING", "ERROR"] {
        mutate { add_tag => ["error", "db_function"] }
      }
    }

    # Also catch native PostgreSQL errors
    if [pg_message] =~ /^ERROR:/ or [pg_message] =~ /^FATAL:/ {
      mutate { add_tag => ["error", "postgres_native"] }
    }
  }
}

output {
  if "error" in [tags] and "postgres" in [tags] {
    http {
      url => "http://localhost:8000/api/store/"
      http_method => "post"
      format => "json"
    }
  }
}

5. Dual-File Update Requirement

IMPORTANT: All SQL function changes must be applied to BOTH files:

  1. sql/Initial_triggers_and_functions.sql - Used for incremental updates
  2. sql/master_schema_rollup.sql - Used for fresh database setup

Both files must remain in sync for triggers and functions.

Implementation Steps

  1. Create fn_log() helper function:

    • Add to both Initial_triggers_and_functions.sql and master_schema_rollup.sql
    • Test with SELECT fn_log('INFO', 'test', 'Test message', '{"key": "value"}'::jsonb);
  2. Update Tier 1 critical functions (highest priority):

    • award_achievement() - Log missing achievements, duplicate awards
    • handle_new_user() - Log user creation success/failure
    • approve_correction() - Log not found, permission denied
    • complete_shopping_list() - Log permission checks
    • add_menu_plan_to_shopping_list() - Log permission checks, items added
    • fork_recipe() - Log original not found
  3. Update Tier 2 business logic functions:

    • Add anomaly logging to suggestion/recommendation functions
    • Log empty result sets with context
  4. Update Tier 3 trigger functions:

    • Add error-only logging to critical triggers
    • Wrap complex trigger logic in exception handlers
  5. Configure PostgreSQL logging:

    • Update postgresql.conf in dev container
    • Update production PostgreSQL configuration
    • Verify logs appear in expected location
  6. Update Logstash pipeline:

    • Add PostgreSQL input to bugsink.conf
    • Add filter rules for structured JSON extraction
    • Test end-to-end: function log → Logstash → Bugsink
  7. Verify in Bugsink:

    • Confirm database function errors appear as issues
    • Verify context (user_id, function name, params) is captured

Consequences

Positive

  • Visibility: Silent failures become visible in error tracking
  • Debugging: Function execution context captured for root cause analysis
  • Proactive detection: Anomalies logged before users report issues
  • Unified monitoring: Database errors appear alongside application errors in Bugsink
  • Structured logs: JSON format enables filtering and aggregation

Negative

  • Performance overhead: Logging adds latency to function execution
  • Log volume: Tier 1/2 functions may generate significant log volume
  • Maintenance: Two SQL files must be kept in sync
  • PostgreSQL configuration: Requires access to postgresql.conf

Mitigations

  • Performance: Only log meaningful events, not every function call
  • Log volume: Use appropriate log levels; Logstash filters reduce noise
  • Sync: Add CI check to verify SQL files match for function definitions
  • Configuration: Document PostgreSQL settings in deployment runbook

Examples

Before (Silent Failure)

-- User thinks achievement was awarded, but it silently failed
SELECT award_achievement('user-uuid', 'Nonexistent Badge');
-- Returns: void (no error, no log)
-- Result: User never gets achievement, nobody knows why

After (Observable Failure)

SELECT award_achievement('user-uuid', 'Nonexistent Badge');
-- Returns: void
-- PostgreSQL log: {"timestamp":"2026-01-11T10:30:00Z","level":"WARNING","source":"postgresql","function":"award_achievement","message":"Achievement not found: Nonexistent Badge","context":{"user_id":"user-uuid","achievement_name":"Nonexistent Badge"}}
-- Bugsink: New issue created with full context

References