# ADR-030: PostgreSQL Function Observability
**Date**: 2026-02-10
**Status**: Accepted
**Source**: Imported from flyer-crawler project (ADR-050)
**Related**: [ADR-029](ADR-029-error-tracking-with-bugsink.md), [ADR-027](ADR-027-application-wide-structured-logging.md)
## Context
Applications often use PostgreSQL functions and triggers for business logic, including:
- Data transformations and validations
- Complex query encapsulation
- Trigger-based side effects
- Audit logging
**Current Problem**: These database functions can fail silently in several ways:
1. **`ON CONFLICT DO NOTHING`** - Swallows constraint violations without notification
2. **`IF NOT FOUND THEN RETURN;`** - Silently exits when data is missing
3. **Trigger functions returning `NULL`** - No indication of partial failures
4. **No logging inside functions** - No visibility into function execution
When these silent failures occur:
- The application layer receives no error (function "succeeds" but does nothing)
- No logs are generated for debugging
- Issues are only discovered when users report missing data
- Root cause analysis is extremely difficult
**Example of Silent Failure**:
```sql
-- This function silently does nothing if the record doesn't exist
CREATE OR REPLACE FUNCTION public.process_item(p_user_id UUID, p_item_name TEXT)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    v_item_id BIGINT;
BEGIN
    SELECT item_id INTO v_item_id FROM items WHERE name = p_item_name;
    IF v_item_id IS NULL THEN
        RETURN; -- Silent failure - no log, no error
    END IF;
    -- ...
END;
$$;
```
ADR-029 established Logstash + Bugsink for error tracking, including PostgreSQL log integration. This ADR defines how database functions emit logs that feed that pipeline.
## Decision
We will implement a standardized PostgreSQL function observability strategy with three tiers of logging severity.
### 1. Function Logging Helper
Create a reusable logging function that outputs structured JSON to PostgreSQL logs:
```sql
-- Function to emit structured log messages from PL/pgSQL
CREATE OR REPLACE FUNCTION public.fn_log(
    p_level         TEXT,               -- 'DEBUG', 'INFO', 'NOTICE', 'WARNING', 'ERROR'
    p_function_name TEXT,               -- The calling function name
    p_message       TEXT,               -- Human-readable message
    p_context       JSONB DEFAULT NULL  -- Additional context (user_id, params, etc.)
)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    log_line TEXT;
BEGIN
    -- Build structured JSON log line
    log_line := jsonb_build_object(
        'timestamp', now(),
        'level', p_level,
        'source', 'postgresql',
        'function', p_function_name,
        'message', p_message,
        'context', COALESCE(p_context, '{}'::jsonb)
    )::text;
    -- Use appropriate RAISE level
    CASE p_level
        WHEN 'DEBUG'   THEN RAISE DEBUG '%', log_line;
        WHEN 'INFO'    THEN RAISE INFO '%', log_line;
        WHEN 'NOTICE'  THEN RAISE NOTICE '%', log_line;
        WHEN 'WARNING' THEN RAISE WARNING '%', log_line;
        WHEN 'ERROR'   THEN RAISE LOG '%', log_line; -- Use LOG for errors to ensure capture
        ELSE RAISE NOTICE '%', log_line;
    END CASE;
END;
$$;
```
### 2. Logging Tiers
#### Tier 1: Critical Functions (Always Log)
Functions where silent failure causes data corruption or user-facing issues:
| Function Type | Log Events |
| ---------------------------- | --------------------------------------- |
| User creation/management | User creation, profile creation, errors |
| Permission/role changes | Role not found, permission denied |
| Financial transactions | Transaction not found, balance issues |
| Data approval workflows | Record not found, permission denied |
| Critical business operations | Items added, operations completed |
**Pattern**:
```sql
CREATE OR REPLACE FUNCTION public.process_critical_operation(p_user_id UUID, p_operation_name TEXT)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
    v_operation_id BIGINT;
    v_context      JSONB;
BEGIN
    v_context := jsonb_build_object('user_id', p_user_id, 'operation_name', p_operation_name);
    SELECT operation_id INTO v_operation_id
    FROM public.operations WHERE name = p_operation_name;
    IF v_operation_id IS NULL THEN
        -- Log the issue instead of returning silently
        PERFORM fn_log('WARNING', 'process_critical_operation',
            'Operation not found: ' || p_operation_name, v_context);
        RETURN;
    END IF;
    -- Perform operation
    INSERT INTO public.user_operations (user_id, operation_id)
    VALUES (p_user_id, v_operation_id)
    ON CONFLICT (user_id, operation_id) DO NOTHING;
    IF FOUND THEN
        PERFORM fn_log('INFO', 'process_critical_operation',
            'Operation completed: ' || p_operation_name, v_context);
    ELSE
        -- The ON CONFLICT clause swallowed a duplicate; record it rather than stay silent
        PERFORM fn_log('NOTICE', 'process_critical_operation',
            'Operation already recorded: ' || p_operation_name, v_context);
    END IF;
END;
$$;
```
#### Tier 2: Business Logic Functions (Log on Anomalies)
Functions where unexpected conditions should be logged but are not critical:
| Function Type | Log Events |
| --------------------------- | -------------------------------- |
| Search/suggestion functions | No match found (below threshold) |
| Recommendation engines | No recommendations generated |
| Data lookup functions | Empty results, no matches found |
| Price/analytics queries | No data available, stale data |
**Pattern**: Log when results are unexpectedly empty or inputs are invalid.
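For example, a minimal sketch of the Tier 2 pattern (the `find_similar_items` function is hypothetical; it reuses the `items` table from the earlier example):
```sql
-- Hypothetical Tier 2 lookup: log only when the result is unexpectedly empty
CREATE OR REPLACE FUNCTION public.find_similar_items(p_search_term TEXT)
RETURNS SETOF public.items
LANGUAGE plpgsql
AS $$
BEGIN
    RETURN QUERY
    SELECT * FROM public.items
    WHERE name ILIKE '%' || p_search_term || '%';
    IF NOT FOUND THEN
        -- Anomaly, not an error: record the miss and its input for later analysis
        PERFORM fn_log('NOTICE', 'find_similar_items',
            'No matches for search term',
            jsonb_build_object('search_term', p_search_term));
    END IF;
END;
$$;
```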
#### Tier 3: Triggers (Log Errors Only)
Triggers should be fast, so only log when something goes wrong:
| Trigger Type | Log Events |
| --------------------- | ---------------------------- |
| Audit triggers | Failed to update audit trail |
| Aggregation triggers | Calculation failed |
| Cascade triggers | Related record lookup failed |
| Notification triggers | External service call failed |
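As a sketch of the error-only pattern (the audit table and trigger names are hypothetical), the trigger body is wrapped in an exception handler so a failed audit write is logged without aborting the triggering statement:
```sql
-- Hypothetical Tier 3 audit trigger: silent on success, logs only on failure
CREATE OR REPLACE FUNCTION public.trg_audit_items()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO public.items_audit (item_id, changed_at, changed_by)
    VALUES (NEW.item_id, now(), current_user);
    RETURN NEW;
EXCEPTION WHEN OTHERS THEN
    -- Log the failure but let the original statement proceed
    PERFORM fn_log('ERROR', 'trg_audit_items',
        'Failed to update audit trail: ' || SQLERRM,
        jsonb_build_object('item_id', NEW.item_id));
    RETURN NEW;
END;
$$;
-- CREATE TRIGGER items_audit_trg
--   AFTER INSERT OR UPDATE ON public.items
--   FOR EACH ROW EXECUTE FUNCTION public.trg_audit_items();
```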
### 3. PostgreSQL Configuration
Enable logging in `postgresql.conf`:
```ini
# Log function messages at NOTICE level and above
# (raise to 'info' if Tier 1 INFO-level fn_log messages should also reach the server log)
log_min_messages = notice
# Prefix log lines with timestamp, PID, and user@database (must match the Logstash grok pattern)
log_line_prefix = '%t [%p] %u@%d '
# Log to file for Logstash pickup
logging_collector = on
log_directory = '/var/log/postgresql'
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_age = 1d
log_rotation_size = 100MB
# Capture slow queries from functions
log_min_duration_statement = 1000 # Log queries over 1 second
```
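For the reloadable settings, the same values can also be applied and checked from a superuser session rather than by editing the file; this is a sketch, and `logging_collector` still requires a full restart:
```sql
-- Apply the reloadable settings without editing postgresql.conf (requires superuser)
ALTER SYSTEM SET log_min_messages = 'notice';
ALTER SYSTEM SET log_min_duration_statement = '1000';
SELECT pg_reload_conf();
-- Verify the effective values; pending_restart flags settings awaiting a restart
SELECT name, setting, pending_restart
FROM pg_settings
WHERE name IN ('log_min_messages', 'logging_collector',
               'log_line_prefix', 'log_min_duration_statement');
```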
### 4. Logstash Integration
Update the Logstash pipeline (extends ADR-029 configuration):
```conf
# PostgreSQL function log input
input {
  file {
    path => "/var/log/postgresql/*.log"
    type => "postgres"
    tags => ["postgres"]
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_postgres"
  }
}
filter {
  if [type] == "postgres" {
    # Extract the log_line_prefix fields (timestamp, timezone, PID, user@database);
    # the rest of the line, including PostgreSQL's severity prefix, lands in pg_message
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:pg_timestamp} %{WORD:pg_timezone} \[%{POSINT:pg_pid}\] %{USER:pg_user}@%{WORD:pg_database} %{GREEDYDATA:pg_message}" }
    }
    # Check if this is a structured JSON log from fn_log()
    # (the line arrives as e.g. "WARNING:  {...}", and jsonb output puts a space after colons)
    if [pg_message] =~ /"source": ?"postgresql"/ {
      # Strip the severity prefix so the JSON parses cleanly
      mutate {
        gsub => [ "pg_message", "^[A-Z]+:\s+", "" ]
      }
      json {
        source => "pg_message"
        target => "fn_log"
      }
      # Mark as error if level is WARNING or ERROR
      if [fn_log][level] in ["WARNING", "ERROR"] {
        mutate { add_tag => ["error", "db_function"] }
      }
    }
    # Also catch native PostgreSQL errors
    if [pg_message] =~ /^ERROR:/ or [pg_message] =~ /^FATAL:/ {
      mutate { add_tag => ["error", "postgres_native"] }
    }
  }
}
output {
  if "error" in [tags] and "postgres" in [tags] {
    http {
      url => "http://localhost:8000/api/store/"
      http_method => "post"
      format => "json"
    }
  }
}
```
### 5. Dual-File Update Requirement
**IMPORTANT**: All SQL function changes must be applied to BOTH files:
1. `sql/Initial_triggers_and_functions.sql` - Used for incremental updates
2. `sql/master_schema_rollup.sql` - Used for fresh database setup
Both files must remain in sync for triggers and functions.
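One way to spot drift (a sketch, not an established check) is to dump the live definitions and diff them against both files:
```sql
-- Dump live function and procedure definitions for a manual diff against both SQL files
SELECT pg_get_functiondef(p.oid) AS definition
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
  AND p.prokind IN ('f', 'p')
ORDER BY p.proname;
```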
## Implementation Steps
1. **Create `fn_log()` helper function**:
- Add to both SQL files
- Test with `SELECT fn_log('INFO', 'test', 'Test message', '{"key": "value"}'::jsonb);`
2. **Update Tier 1 critical functions** (highest priority):
- Identify functions with silent failures
- Add appropriate logging calls
- Test error paths
3. **Update Tier 2 business logic functions**:
- Add anomaly logging to suggestion/recommendation functions
- Log empty result sets with context
4. **Update Tier 3 trigger functions**:
- Add error-only logging to critical triggers
- Wrap complex trigger logic in exception handlers
5. **Configure PostgreSQL logging**:
- Update `postgresql.conf` in dev container
- Update production PostgreSQL configuration
- Verify logs appear in expected location
6. **Update Logstash pipeline**:
- Add PostgreSQL input to Logstash config
- Add filter rules for structured JSON extraction
- Test end-to-end: function log -> Logstash -> Bugsink (see the smoke-test query after this list)
7. **Verify in Bugsink**:
- Confirm database function errors appear as issues
- Verify context (user_id, function name, params) is captured
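For the end-to-end test in step 6, a deliberate warning can be emitted directly (the function name passed here is an arbitrary label, not an existing function):
```sql
-- End-to-end smoke test: this WARNING should appear in the PostgreSQL log,
-- be picked up by Logstash, and surface as a new Bugsink issue
SELECT fn_log('WARNING', 'observability_smoke_test',
    'Deliberate test warning - safe to resolve',
    jsonb_build_object('triggered_by', current_user, 'emitted_at', now()));
```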
## Consequences
### Positive
- **Visibility**: Silent failures become visible in error tracking
- **Debugging**: Function execution context captured for root cause analysis
- **Proactive detection**: Anomalies logged before users report issues
- **Unified monitoring**: Database errors appear alongside application errors in Bugsink
- **Structured logs**: JSON format enables filtering and aggregation
### Negative
- **Performance overhead**: Logging adds latency to function execution
- **Log volume**: Tier 1/2 functions may generate significant log volume
- **Maintenance**: Two SQL files must be kept in sync
- **PostgreSQL configuration**: Requires access to `postgresql.conf`
### Mitigations
- **Performance**: Only log meaningful events, not every function call
- **Log volume**: Use appropriate log levels; Logstash filters reduce noise
- **Sync**: Add CI check to verify SQL files match for function definitions
- **Configuration**: Document PostgreSQL settings in deployment runbook
## Examples
### Before (Silent Failure)
```sql
-- User thinks operation completed, but it silently failed
SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void (no error, no log)
-- Result: User never gets expected result, nobody knows why
```
### After (Observable Failure)
```sql
SELECT process_item('user-uuid', 'Nonexistent Item');
-- Returns: void
-- PostgreSQL log: {"timestamp":"2026-01-11T10:30:00Z","level":"WARNING","source":"postgresql","function":"process_item","message":"Item not found: Nonexistent Item","context":{"user_id":"user-uuid","item_name":"Nonexistent Item"}}
-- Bugsink: New issue created with full context
```
## References
- [ADR-029: Error Tracking with Bugsink](ADR-029-error-tracking-with-bugsink.md)
- [ADR-027: Application-Wide Structured Logging](ADR-027-application-wide-structured-logging.md)
- [PostgreSQL RAISE Documentation](https://www.postgresql.org/docs/current/plpgsql-errors-and-messages.html)
- [PostgreSQL Logging Configuration](https://www.postgresql.org/docs/current/runtime-config-logging.html)