all the new shiny things
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 15m54s
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 15m54s
This commit is contained in:
341
docs/adr/0050-postgresql-function-observability.md
Normal file
341
docs/adr/0050-postgresql-function-observability.md
Normal file
@@ -0,0 +1,341 @@
|
||||
# ADR-050: PostgreSQL Function Observability
|
||||
|
||||
**Date**: 2026-01-11
|
||||
|
||||
**Status**: Proposed
|
||||
|
||||
**Related**: [ADR-015](0015-application-performance-monitoring-and-error-tracking.md), [ADR-004](0004-standardized-application-wide-structured-logging.md)
|
||||
|
||||
## Context
|
||||
|
||||
The application uses 30+ PostgreSQL functions and 11+ triggers for business logic, including:
|
||||
|
||||
- Recipe recommendations and search
|
||||
- Shopping list generation from menu plans
|
||||
- Price history tracking
|
||||
- Achievement awards
|
||||
- Activity logging
|
||||
- User profile creation
|
||||
|
||||
**Current Problem**: These database functions can fail silently in several ways:
|
||||
|
||||
1. **`ON CONFLICT DO NOTHING`** - Swallows constraint violations without notification
|
||||
2. **`IF NOT FOUND THEN RETURN;`** - Silently exits when data is missing
|
||||
3. **Trigger functions returning `NULL`** - No indication of partial failures
|
||||
4. **No logging inside functions** - No visibility into function execution
|
||||
|
||||
When these silent failures occur:
|
||||
|
||||
- The application layer receives no error (function "succeeds" but does nothing)
|
||||
- No logs are generated for debugging
|
||||
- Issues are only discovered when users report missing data
|
||||
- Root cause analysis is extremely difficult
|
||||
|
||||
**Example of Silent Failure**:
|
||||
|
||||
```sql
|
||||
-- This function silently does nothing if achievement doesn't exist
|
||||
CREATE OR REPLACE FUNCTION public.award_achievement(p_user_id UUID, p_achievement_name TEXT)
|
||||
RETURNS void AS $$
|
||||
BEGIN
|
||||
SELECT achievement_id INTO v_achievement_id FROM achievements WHERE name = p_achievement_name;
|
||||
IF v_achievement_id IS NULL THEN
|
||||
RETURN; -- Silent failure - no log, no error
|
||||
END IF;
|
||||
-- ...
|
||||
END;
|
||||
$$;
|
||||
```
|
||||
|
||||
ADR-015 established Logstash + Bugsink for error tracking, with PostgreSQL log integration marked as "future". This ADR defines the implementation.
|
||||
|
||||
## Decision
|
||||
|
||||
We will implement a standardized PostgreSQL function observability strategy with three tiers of logging severity:
|
||||
|
||||
### 1. Function Logging Helper
|
||||
|
||||
Create a reusable logging function that outputs structured JSON to PostgreSQL logs:
|
||||
|
||||
```sql
|
||||
-- Function to emit structured log messages from PL/pgSQL
|
||||
CREATE OR REPLACE FUNCTION public.fn_log(
|
||||
p_level TEXT, -- 'DEBUG', 'INFO', 'NOTICE', 'WARNING', 'ERROR'
|
||||
p_function_name TEXT, -- The calling function name
|
||||
p_message TEXT, -- Human-readable message
|
||||
p_context JSONB DEFAULT NULL -- Additional context (user_id, params, etc.)
|
||||
)
|
||||
RETURNS void
|
||||
LANGUAGE plpgsql
|
||||
AS $$
|
||||
DECLARE
|
||||
log_line TEXT;
|
||||
BEGIN
|
||||
-- Build structured JSON log line
|
||||
log_line := jsonb_build_object(
|
||||
'timestamp', now(),
|
||||
'level', p_level,
|
||||
'source', 'postgresql',
|
||||
'function', p_function_name,
|
||||
'message', p_message,
|
||||
'context', COALESCE(p_context, '{}'::jsonb)
|
||||
)::text;
|
||||
|
||||
-- Use appropriate RAISE level
|
||||
CASE p_level
|
||||
WHEN 'DEBUG' THEN RAISE DEBUG '%', log_line;
|
||||
WHEN 'INFO' THEN RAISE INFO '%', log_line;
|
||||
WHEN 'NOTICE' THEN RAISE NOTICE '%', log_line;
|
||||
WHEN 'WARNING' THEN RAISE WARNING '%', log_line;
|
||||
WHEN 'ERROR' THEN RAISE LOG '%', log_line; -- Use LOG for errors to ensure capture
|
||||
ELSE RAISE NOTICE '%', log_line;
|
||||
END CASE;
|
||||
END;
|
||||
$$;
|
||||
```
|
||||
|
||||
### 2. Logging Tiers
|
||||
|
||||
#### Tier 1: Critical Functions (Always Log)
|
||||
|
||||
Functions where silent failure causes data corruption or user-facing issues:
|
||||
|
||||
| Function | Log Events |
|
||||
| ---------------------------------- | --------------------------------------- |
|
||||
| `handle_new_user()` | User creation, profile creation, errors |
|
||||
| `award_achievement()` | Achievement not found, already awarded |
|
||||
| `approve_correction()` | Correction not found, permission denied |
|
||||
| `complete_shopping_list()` | List not found, permission denied |
|
||||
| `add_menu_plan_to_shopping_list()` | Permission denied, items added |
|
||||
| `fork_recipe()` | Original not found, fork created |
|
||||
|
||||
**Pattern**:
|
||||
|
||||
```sql
|
||||
CREATE OR REPLACE FUNCTION public.award_achievement(p_user_id UUID, p_achievement_name TEXT)
|
||||
RETURNS void AS $$
|
||||
DECLARE
|
||||
v_achievement_id BIGINT;
|
||||
v_points_value INTEGER;
|
||||
v_context JSONB;
|
||||
BEGIN
|
||||
v_context := jsonb_build_object('user_id', p_user_id, 'achievement_name', p_achievement_name);
|
||||
|
||||
SELECT achievement_id, points_value INTO v_achievement_id, v_points_value
|
||||
FROM public.achievements WHERE name = p_achievement_name;
|
||||
|
||||
IF v_achievement_id IS NULL THEN
|
||||
-- Log the issue instead of silent return
|
||||
PERFORM fn_log('WARNING', 'award_achievement',
|
||||
'Achievement not found: ' || p_achievement_name, v_context);
|
||||
RETURN;
|
||||
END IF;
|
||||
|
||||
INSERT INTO public.user_achievements (user_id, achievement_id)
|
||||
VALUES (p_user_id, v_achievement_id)
|
||||
ON CONFLICT (user_id, achievement_id) DO NOTHING;
|
||||
|
||||
IF FOUND THEN
|
||||
UPDATE public.profiles SET points = points + v_points_value WHERE user_id = p_user_id;
|
||||
PERFORM fn_log('INFO', 'award_achievement',
|
||||
'Achievement awarded: ' || p_achievement_name, v_context);
|
||||
END IF;
|
||||
END;
|
||||
$$;
|
||||
```
|
||||
|
||||
#### Tier 2: Business Logic Functions (Log on Anomalies)
|
||||
|
||||
Functions where unexpected conditions should be logged but aren't critical:
|
||||
|
||||
| Function | Log Events |
|
||||
| -------------------------------------- | ---------------------------------- |
|
||||
| `suggest_master_item_for_flyer_item()` | No match found (below threshold) |
|
||||
| `recommend_recipes_for_user()` | No recommendations generated |
|
||||
| `find_recipes_from_pantry()` | Empty pantry, no recipes found |
|
||||
| `get_best_sale_prices_for_user()` | No watched items, no current sales |
|
||||
|
||||
**Pattern**: Log when results are unexpectedly empty or inputs are invalid.
|
||||
|
||||
#### Tier 3: Triggers (Log Errors Only)
|
||||
|
||||
Triggers should be fast, so only log when something goes wrong:
|
||||
|
||||
| Trigger Function | Log Events |
|
||||
| --------------------------------------------- | ------------------------- |
|
||||
| `update_price_history_on_flyer_item_insert()` | Failed to update history |
|
||||
| `update_recipe_rating_aggregates()` | Rating calculation failed |
|
||||
| `log_new_recipe()` | Profile lookup failed |
|
||||
| `log_new_flyer()` | Store lookup failed |
|
||||
|
||||
### 3. PostgreSQL Configuration
|
||||
|
||||
Enable logging in `postgresql.conf`:
|
||||
|
||||
```ini
|
||||
# Log all function notices and above
|
||||
log_min_messages = notice
|
||||
|
||||
# Include function name in log prefix
|
||||
log_line_prefix = '%t [%p] %u@%d '
|
||||
|
||||
# Log to file for Logstash pickup
|
||||
logging_collector = on
|
||||
log_directory = '/var/log/postgresql'
|
||||
log_filename = 'postgresql-%Y-%m-%d.log'
|
||||
log_rotation_age = 1d
|
||||
log_rotation_size = 100MB
|
||||
|
||||
# Capture slow queries from functions
|
||||
log_min_duration_statement = 1000 # Log queries over 1 second
|
||||
```
|
||||
|
||||
### 4. Logstash Integration
|
||||
|
||||
Update the Logstash pipeline (extends ADR-015 configuration):
|
||||
|
||||
```conf
|
||||
# PostgreSQL function log input
|
||||
input {
|
||||
file {
|
||||
path => "/var/log/postgresql/*.log"
|
||||
type => "postgres"
|
||||
tags => ["postgres"]
|
||||
start_position => "beginning"
|
||||
sincedb_path => "/var/lib/logstash/sincedb_postgres"
|
||||
}
|
||||
}
|
||||
|
||||
filter {
|
||||
if [type] == "postgres" {
|
||||
# Extract timestamp and process ID from PostgreSQL log prefix
|
||||
grok {
|
||||
match => { "message" => "%{TIMESTAMP_ISO8601:pg_timestamp} \[%{POSINT:pg_pid}\] %{USER:pg_user}@%{WORD:pg_database} %{GREEDYDATA:pg_message}" }
|
||||
}
|
||||
|
||||
# Check if this is a structured JSON log from fn_log()
|
||||
if [pg_message] =~ /^\{.*"source":"postgresql".*\}$/ {
|
||||
json {
|
||||
source => "pg_message"
|
||||
target => "fn_log"
|
||||
}
|
||||
|
||||
# Mark as error if level is WARNING or ERROR
|
||||
if [fn_log][level] in ["WARNING", "ERROR"] {
|
||||
mutate { add_tag => ["error", "db_function"] }
|
||||
}
|
||||
}
|
||||
|
||||
# Also catch native PostgreSQL errors
|
||||
if [pg_message] =~ /^ERROR:/ or [pg_message] =~ /^FATAL:/ {
|
||||
mutate { add_tag => ["error", "postgres_native"] }
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
output {
|
||||
if "error" in [tags] and "postgres" in [tags] {
|
||||
http {
|
||||
url => "http://localhost:8000/api/store/"
|
||||
http_method => "post"
|
||||
format => "json"
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5. Dual-File Update Requirement
|
||||
|
||||
**IMPORTANT**: All SQL function changes must be applied to BOTH files:
|
||||
|
||||
1. `sql/Initial_triggers_and_functions.sql` - Used for incremental updates
|
||||
2. `sql/master_schema_rollup.sql` - Used for fresh database setup
|
||||
|
||||
Both files must remain in sync for triggers and functions.
|
||||
|
||||
## Implementation Steps
|
||||
|
||||
1. **Create `fn_log()` helper function**:
|
||||
- Add to both `Initial_triggers_and_functions.sql` and `master_schema_rollup.sql`
|
||||
- Test with `SELECT fn_log('INFO', 'test', 'Test message', '{"key": "value"}'::jsonb);`
|
||||
|
||||
2. **Update Tier 1 critical functions** (highest priority):
|
||||
- `award_achievement()` - Log missing achievements, duplicate awards
|
||||
- `handle_new_user()` - Log user creation success/failure
|
||||
- `approve_correction()` - Log not found, permission denied
|
||||
- `complete_shopping_list()` - Log permission checks
|
||||
- `add_menu_plan_to_shopping_list()` - Log permission checks, items added
|
||||
- `fork_recipe()` - Log original not found
|
||||
|
||||
3. **Update Tier 2 business logic functions**:
|
||||
- Add anomaly logging to suggestion/recommendation functions
|
||||
- Log empty result sets with context
|
||||
|
||||
4. **Update Tier 3 trigger functions**:
|
||||
- Add error-only logging to critical triggers
|
||||
- Wrap complex trigger logic in exception handlers
|
||||
|
||||
5. **Configure PostgreSQL logging**:
|
||||
- Update `postgresql.conf` in dev container
|
||||
- Update production PostgreSQL configuration
|
||||
- Verify logs appear in expected location
|
||||
|
||||
6. **Update Logstash pipeline**:
|
||||
- Add PostgreSQL input to `bugsink.conf`
|
||||
- Add filter rules for structured JSON extraction
|
||||
- Test end-to-end: function log → Logstash → Bugsink
|
||||
|
||||
7. **Verify in Bugsink**:
|
||||
- Confirm database function errors appear as issues
|
||||
- Verify context (user_id, function name, params) is captured
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- **Visibility**: Silent failures become visible in error tracking
|
||||
- **Debugging**: Function execution context captured for root cause analysis
|
||||
- **Proactive detection**: Anomalies logged before users report issues
|
||||
- **Unified monitoring**: Database errors appear alongside application errors in Bugsink
|
||||
- **Structured logs**: JSON format enables filtering and aggregation
|
||||
|
||||
### Negative
|
||||
|
||||
- **Performance overhead**: Logging adds latency to function execution
|
||||
- **Log volume**: Tier 1/2 functions may generate significant log volume
|
||||
- **Maintenance**: Two SQL files must be kept in sync
|
||||
- **PostgreSQL configuration**: Requires access to `postgresql.conf`
|
||||
|
||||
### Mitigations
|
||||
|
||||
- **Performance**: Only log meaningful events, not every function call
|
||||
- **Log volume**: Use appropriate log levels; Logstash filters reduce noise
|
||||
- **Sync**: Add CI check to verify SQL files match for function definitions
|
||||
- **Configuration**: Document PostgreSQL settings in deployment runbook
|
||||
|
||||
## Examples
|
||||
|
||||
### Before (Silent Failure)
|
||||
|
||||
```sql
|
||||
-- User thinks achievement was awarded, but it silently failed
|
||||
SELECT award_achievement('user-uuid', 'Nonexistent Badge');
|
||||
-- Returns: void (no error, no log)
|
||||
-- Result: User never gets achievement, nobody knows why
|
||||
```
|
||||
|
||||
### After (Observable Failure)
|
||||
|
||||
```sql
|
||||
SELECT award_achievement('user-uuid', 'Nonexistent Badge');
|
||||
-- Returns: void
|
||||
-- PostgreSQL log: {"timestamp":"2026-01-11T10:30:00Z","level":"WARNING","source":"postgresql","function":"award_achievement","message":"Achievement not found: Nonexistent Badge","context":{"user_id":"user-uuid","achievement_name":"Nonexistent Badge"}}
|
||||
-- Bugsink: New issue created with full context
|
||||
```
|
||||
|
||||
## References
|
||||
|
||||
- [ADR-015: Application Performance Monitoring](0015-application-performance-monitoring-and-error-tracking.md)
|
||||
- [ADR-004: Standardized Structured Logging](0004-standardized-application-wide-structured-logging.md)
|
||||
- [PostgreSQL RAISE Documentation](https://www.postgresql.org/docs/current/plpgsql-errors-and-messages.html)
|
||||
- [PostgreSQL Logging Configuration](https://www.postgresql.org/docs/current/runtime-config-logging.html)
|
||||
Reference in New Issue
Block a user