# Logstash Troubleshooting Runbook

This runbook provides step-by-step diagnostics and solutions for common Logstash issues in the PostgreSQL observability pipeline (ADR-050).
## Quick Reference
| Symptom | Most Likely Cause | Quick Check |
|---|---|---|
| No errors in Bugsink | Logstash not running | `systemctl status logstash` |
| Events not processed | Grok pattern mismatch | Check filter failures in stats |
| Wrong Bugsink project | Environment detection failed | Verify `pg_database` field extraction |
| 403 authentication error | Missing/wrong DSN key | Check `X-Sentry-Auth` header |
| 500 error from Bugsink | Invalid event format | Verify `event_id` and required fields |
## Diagnostic Steps

### 1. Verify Logstash is Running
```bash
# Check service status
systemctl status logstash

# If stopped, start it
systemctl start logstash

# View recent logs
journalctl -u logstash -n 50 --no-pager
```

Expected output:

- Status: `active (running)`
- No error messages in recent logs
### 2. Check Configuration Syntax

```bash
# Test configuration file
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
```

Expected output: `Configuration OK`

If there are syntax errors:

- Review the error message for the line number
- Check for missing braces, quotes, or commas
- Verify plugin names are correct (e.g., `json`, `grok`, `uuid`, `http`)
### 3. Verify PostgreSQL Logs Are Being Read

```bash
# Check if log file exists and has content
ls -lh /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log

# Check Logstash can read the file
sudo -u logstash cat /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | head -10
```

Expected output:

- Log file exists and is not empty
- Logstash user can read the file without permission errors

If permission denied:

```bash
# Check Logstash is in the postgres group
groups logstash
# Should show: logstash : logstash adm postgres

# If not, add it to the group
usermod -a -G postgres logstash
systemctl restart logstash
```
### 4. Check Logstash Pipeline Stats

```bash
# Get pipeline statistics
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters'
```

Key metrics to check:

- Grok filter events:
  - `"events.in"` - Total events received
  - `"events.out"` - Events successfully parsed
  - `"failures"` - Events that failed to parse

  If failures > 0: the grok pattern doesn't match the log format. Check the PostgreSQL log format.

- JSON filter events:
  - `"events.in"` - Events received by the JSON parser
  - `"events.out"` - Successfully parsed JSON

  If `events.in` = 0: the regex check `pg_message =~ /^\{/` is not matching. Verify the `fn_log()` output format.

- UUID filter events:
  - Should match the number of errors being forwarded
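When checking these counters repeatedly, it can help to pull the grok failure total out of the stats JSON in one step. A minimal Python sketch, assuming the response shape shown by the `curl | jq` command above (verify the nesting against your Logstash version):

```python
import json

def grok_failures(stats_json: str) -> int:
    """Sum the 'failures' counter across grok filter plugins in a
    Logstash _node/stats/pipelines response."""
    stats = json.loads(stats_json)
    filters = stats["pipelines"]["main"]["plugins"]["filters"]
    return sum(f.get("failures", 0) for f in filters if f.get("name") == "grok")

# Example with a trimmed stats payload:
sample = json.dumps({
    "pipelines": {"main": {"plugins": {"filters": [
        {"name": "grok", "failures": 3, "events": {"in": 100, "out": 97}},
        {"name": "json", "events": {"in": 97, "out": 97}},
    ]}}}
})
print(grok_failures(sample))  # -> 3
```

A non-zero result here is the same signal as the `jq` filter in Issue 1 below: the grok pattern is rejecting log lines.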
### 5. Test Grok Pattern Manually

```bash
# Get a sample log line
tail -1 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log

# Example expected format:
# 2026-01-20 10:30:00 +05 [12345] flyer_crawler_prod@flyer-crawler-prod WARNING: {"level":"WARNING","source":"postgresql",...}
```

Pattern breakdown:

```
%{TIMESTAMP_ISO8601:pg_timestamp}    # 2026-01-20 10:30:00
[+-]%{INT:pg_timezone}               # +05
\[%{POSINT:pg_pid}\]                 # [12345]
%{DATA:pg_user}@%{DATA:pg_database}  # flyer_crawler_prod@flyer-crawler-prod
%{WORD:pg_level}:                    # WARNING:
%{GREEDYDATA:pg_message}             # (rest of line)
```
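Without a Grok Debugger handy, a rough regex equivalent can be checked locally. This is an approximation for offline testing only (grok's `DATA`/`GREEDYDATA` semantics differ slightly from plain regex):

```python
import re

# Rough Python equivalent of the grok pattern broken down above
LINE_RE = re.compile(
    r"^(?P<pg_timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<pg_timezone>[+-]\d+) "
    r"\[(?P<pg_pid>\d+)\] "
    r"(?P<pg_user>[^@\s]+)@(?P<pg_database>\S+) "
    r"(?P<pg_level>\w+): "
    r"(?P<pg_message>.*)$"
)

sample = ('2026-01-20 10:30:00 +05 [12345] '
          'flyer_crawler_prod@flyer-crawler-prod WARNING: '
          '{"level":"WARNING","source":"postgresql"}')

m = LINE_RE.match(sample)
print(m.group("pg_database"))  # flyer-crawler-prod
print(m.group("pg_level"))     # WARNING
```

If this fails on a real log line, the `log_line_prefix` check below is the next step.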
If the pattern doesn't match:

- Check the PostgreSQL `log_line_prefix` setting in `/etc/postgresql/14/main/conf.d/observability.conf`
- It should be: `log_line_prefix = '%t [%p] %u@%d '`
- Restart PostgreSQL if changed: `systemctl restart postgresql`
### 6. Verify Environment Detection

```bash
# Check recent PostgreSQL logs for the database field
tail -20 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep -E "flyer-crawler-(prod|test)"
```

Expected:

- Production database: `flyer_crawler_prod@flyer-crawler-prod`
- Test database: `flyer_crawler_test@flyer-crawler-test`

If the database name doesn't match:

- Check the database connection string in the application
- Verify the `DB_DATABASE_PROD` and `DB_DATABASE_TEST` Gitea secrets
### 7. Test Bugsink API Connection

```bash
# Test production endpoint
curl -X POST https://bugsink.projectium.com/api/1/store/ \
  -H "X-Sentry-Auth: Sentry sentry_version=7, sentry_client=test/1.0, sentry_key=911aef02b9a548fa8fabb8a3c81abfe5" \
  -H "Content-Type: application/json" \
  -d '{
    "event_id": "12345678901234567890123456789012",
    "timestamp": "2026-01-20T10:30:00Z",
    "platform": "other",
    "level": "error",
    "logger": "test",
    "message": "Test error from troubleshooting"
  }'
```

Expected response:

- HTTP 200 OK
- Response body: `{"id": "..."}`

If 403 Forbidden:

- The DSN key is wrong in `/etc/logstash/conf.d/bugsink.conf`
- Get the correct key from the Bugsink UI: Settings → Projects → DSN

If 500 Internal Server Error:

- Missing required fields (`event_id`, `timestamp`, `level`)
- Check the `mapping` section in the Logstash config
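For repeated checks, the same request can be scripted. A sketch in Python, reusing the endpoint and key from the curl example above (the actual `requests.post` call is left commented out so the snippet runs offline):

```python
import json
import uuid
from datetime import datetime, timezone

BUGSINK_URL = "https://bugsink.projectium.com/api/1/store/"  # from the curl example
SENTRY_KEY = "911aef02b9a548fa8fabb8a3c81abfe5"              # project DSN key

def build_store_request(message: str, level: str = "error"):
    """Build headers and JSON body for a Sentry-style store API call."""
    headers = {
        "X-Sentry-Auth": (
            "Sentry sentry_version=7, sentry_client=test/1.0, "
            f"sentry_key={SENTRY_KEY}"
        ),
        "Content-Type": "application/json",
    }
    payload = {
        "event_id": uuid.uuid4().hex,  # 32 hex chars, no dashes
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "platform": "other",
        "level": level,
        "logger": "test",
        "message": message,
    }
    return headers, json.dumps(payload)

headers, body = build_store_request("Test error from troubleshooting")
# To actually send:
# requests.post(BUGSINK_URL, headers=headers, data=body)
print(len(json.loads(body)["event_id"]))  # 32
```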
### 8. Monitor Logstash Output in Real-Time

```bash
# Watch Logstash processing logs
journalctl -u logstash -f
```

What to look for:

- `response code => 200` - Successful forwarding to Bugsink
- `response code => 403` - Authentication failure
- `response code => 500` - Invalid event format
- Grok parse failures
## Common Issues and Solutions

### Issue 1: Grok Pattern Parse Failures

Symptoms:

- Logstash stats show an increasing `"failures"` count
- No events reaching Bugsink

Diagnosis:

```bash
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | .failures'
```

Solution:

- Check that the PostgreSQL log format matches the expected pattern
- Verify `log_line_prefix` in the PostgreSQL config
- Test with a sample log line using the Grok Debugger (Kibana Dev Tools)
### Issue 2: JSON Filter Not Parsing fn_log() Output

Symptoms:

- Grok parses successfully but the JSON filter shows 0 events
- `[fn_log]` fields missing in Logstash output

Diagnosis:

```bash
# Check if the pg_message field contains JSON
tail -20 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep "WARNING:" | grep "{"
```

Solution:

- Verify the `fn_log()` function exists in the database: `\df fn_log`
- Test the `fn_log()` output format: `SELECT fn_log('WARNING', 'test', 'Test message', '{"key":"value"}'::jsonb);`
- Check that the logs show JSON output starting with `{`
### Issue 3: Events Going to Wrong Bugsink Project

Symptoms:

- Production errors appear in the test project (or vice versa)

Diagnosis:

```bash
# Check database name detection in recent logs
tail -50 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep -E "(flyer-crawler-prod|flyer-crawler-test)"
```

Solution:

- Verify the database names in the filter section match the actual database names
- Check the `pg_database` field is correctly extracted by the grok pattern:

  ```
  # Enable debug output in Logstash config temporarily
  stdout { codec => rubydebug { metadata => true } }
  ```

- Verify the environment tagging in the filter:
  - `pg_database == "flyer-crawler-prod"` → adds "production" tag → routes to project 1
  - `pg_database == "flyer-crawler-test"` → adds "test" tag → routes to project 3
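A filter conditional of roughly this shape would implement the tagging described above. This is a sketch only — the field name matches the grok pattern in step 5, but your actual `/etc/logstash/conf.d/bugsink.conf` is authoritative:

```
filter {
  if [pg_database] == "flyer-crawler-prod" {
    mutate { add_tag => ["production"] }
  } else if [pg_database] == "flyer-crawler-test" {
    mutate { add_tag => ["test"] }
  }
}
```

The output section can then route on these tags to select the per-project DSN.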
### Issue 4: 403 Authentication Errors from Bugsink

Symptoms:

- Logstash logs show `response code => 403`
- Events not appearing in Bugsink

Diagnosis:

```bash
# Check Logstash output logs for authentication errors
journalctl -u logstash -n 100 | grep "403"
```

Solution:

- Verify the DSN key in `/etc/logstash/conf.d/bugsink.conf` matches the Bugsink project
- Get the correct DSN from the Bugsink UI:
  1. Navigate to Settings → Projects → click the project
  2. Copy the "DSN" value
  3. Extract the key: `http://KEY@host/PROJECT_ID` → use `KEY`
- Update the `X-Sentry-Auth` header in the Logstash config:

  ```
  "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_KEY_HERE"
  ```

- Restart Logstash: `systemctl restart logstash`
### Issue 5: 500 Errors from Bugsink

Symptoms:

- Logstash logs show `response code => 500`
- Bugsink logs show validation errors

Diagnosis:

```bash
# Check Bugsink logs for details
docker logs bugsink-web 2>&1 | tail -50
```

Common causes:

- Missing `event_id` field
- Invalid timestamp format
- Missing required Sentry fields

Solution:

- Verify the `uuid` filter is generating `event_id`:

  ```
  uuid {
    target    => "[@metadata][event_id]"
    overwrite => true
  }
  ```

- Check the `mapping` section includes all required fields:
  - `event_id` (UUID)
  - `timestamp` (ISO 8601)
  - `platform` (string)
  - `level` (error/warning/info)
  - `logger` (string)
  - `message` (string)
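An event payload can be checked offline against that field list before blaming Bugsink. A small sketch, with the requirements assumed from the list above:

```python
# Required fields per the mapping checklist above (names assumed from this runbook)
REQUIRED_FIELDS = {
    "event_id": str,
    "timestamp": str,
    "platform": str,
    "level": str,
    "logger": str,
    "message": str,
}

def missing_fields(event: dict) -> list[str]:
    """Return names of required fields that are absent or not strings."""
    return [name for name, typ in REQUIRED_FIELDS.items()
            if not isinstance(event.get(name), typ)]

event = {"timestamp": "2026-01-20T10:30:00Z", "level": "error",
         "platform": "other", "logger": "test", "message": "boom"}
print(missing_fields(event))  # ['event_id']
```

An empty list means the payload at least has the right shape; a 500 then points at value formats (e.g. the timestamp) rather than missing keys.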
### Issue 6: High Memory Usage by Logstash

Symptoms:

- Server running out of memory
- Logstash OOM-killed

Diagnosis:

```bash
# Check Logstash memory usage
ps aux | grep logstash
systemctl status logstash
```

Solution:

- Limit the Logstash heap size in `/etc/logstash/jvm.options`:

  ```
  -Xms1g
  -Xmx1g
  ```

- Restart Logstash: `systemctl restart logstash`
- Monitor with: `top -p $(pgrep -f logstash)`
### Issue 7: Log File Rotation Issues

Symptoms:

- Logstash stops processing after the log file rotates
- Sincedb file pointing to an old inode

Diagnosis:

```bash
# Check the sincedb file
cat /var/lib/logstash/sincedb_postgres

# Check the current log file inode
ls -li /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
```

Solution:

- Logstash should detect rotation automatically
- If stuck, delete the sincedb file (this will reprocess recent logs):

  ```bash
  systemctl stop logstash
  rm /var/lib/logstash/sincedb_postgres
  systemctl start logstash
  ```
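The manual inode comparison from the diagnosis step can be automated. A sketch, assuming the common sincedb line layout where the inode is the first whitespace-separated column (verify against your Logstash version's sincedb format):

```python
import os

def sincedb_inodes(sincedb_path: str) -> set:
    """Parse the first column (inode number) from each sincedb line."""
    with open(sincedb_path) as f:
        return {int(line.split()[0]) for line in f if line.strip()}

def tracks_file(sincedb_path: str, log_path: str) -> bool:
    """True if the sincedb references the current log file's inode."""
    return os.stat(log_path).st_ino in sincedb_inodes(sincedb_path)
```

If `tracks_file("/var/lib/logstash/sincedb_postgres", current_log)` returns False after a rotation, Logstash is still pointed at the old inode and the sincedb reset above applies.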
## Verification Checklist

After making any changes, verify the pipeline is working:

- Logstash is running: `systemctl status logstash`
- Configuration is valid: `/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf`
- No grok failures: `curl localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | .failures'`
- Events being processed: `curl localhost:9600/_node/stats/pipelines?pretty | jq '.pipelines.main.events'`
- Test error appears in Bugsink: trigger a database function error and check the Bugsink UI
## Test Database Function Error

To generate a test error for verification:

```bash
# Connect to the production database
sudo -u postgres psql -d flyer-crawler-prod
```

```sql
-- Trigger an error (achievement not found)
SELECT award_achievement('00000000-0000-0000-0000-000000000001'::uuid, 'Nonexistent Badge');
\q
```

Expected flow:

1. PostgreSQL logs the error to `/var/log/postgresql/postgresql-YYYY-MM-DD.log`
2. Logstash reads and parses the log (within ~30 seconds)
3. The error appears in Bugsink project 1 (production)

If the error doesn't appear:

- Check each diagnostic step above
- Review the Logstash logs: `journalctl -u logstash -f`
## Related Documentation

- Setup Guide: `docs/BARE-METAL-SETUP.md` - PostgreSQL Function Observability section
- Architecture: `docs/adr/0050-postgresql-function-observability.md`
- Configuration Reference: `CLAUDE.md` - Logstash Configuration section
- Bugsink MCP Server: `CLAUDE.md` - Sentry/Bugsink MCP Server Setup section