bugsink mcp and claude subagents - documentation and test fixes
1961
docs/operations/BARE-METAL-SETUP.md
Normal file
File diff suppressed because it is too large
271
docs/operations/DEPLOYMENT.md
Normal file
@@ -0,0 +1,271 @@
|
||||
# Deployment Guide
|
||||
|
||||
This guide covers deploying Flyer Crawler to a production server.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Ubuntu server (22.04 LTS recommended)
|
||||
- PostgreSQL 14+ with PostGIS extension
|
||||
- Redis
|
||||
- Node.js 20.x
|
||||
- NGINX (reverse proxy)
|
||||
- PM2 (process manager)
|
||||
|
||||
---
|
||||
|
||||
## Server Setup
|
||||
|
||||
### Install Node.js
|
||||
|
||||
```bash
|
||||
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
|
||||
sudo apt-get install -y nodejs
|
||||
```
|
||||
|
||||
### Install PM2
|
||||
|
||||
```bash
|
||||
sudo npm install -g pm2
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Application Deployment
|
||||
|
||||
### Clone and Install
|
||||
|
||||
```bash
|
||||
git clone <repository-url>
|
||||
cd flyer-crawler.projectium.com
|
||||
npm install
|
||||
```
|
||||
|
||||
### Build for Production
|
||||
|
||||
```bash
|
||||
npm run build
|
||||
```
|
||||
|
||||
### Start with PM2
|
||||
|
||||
```bash
|
||||
npm run start:prod
|
||||
```
|
||||
|
||||
This starts three PM2 processes:
|
||||
|
||||
- `flyer-crawler-api` - Main API server
|
||||
- `flyer-crawler-worker` - Background job worker
|
||||
- `flyer-crawler-analytics-worker` - Analytics processing worker
|
||||
|
||||
---
|
||||
|
||||
## Environment Variables (Gitea Secrets)
|
||||
|
||||
For deployments using Gitea CI/CD workflows, configure these as **repository secrets**:
|
||||
|
||||
| Secret                      | Description                                 |
| --------------------------- | ------------------------------------------- |
| `DB_HOST`                   | PostgreSQL server hostname                  |
| `DB_USER`                   | PostgreSQL username                         |
| `DB_PASSWORD`               | PostgreSQL password                         |
| `DB_DATABASE_PROD`          | Production database name                    |
| `REDIS_PASSWORD_PROD`       | Production Redis password                   |
| `REDIS_PASSWORD_TEST`       | Test Redis password                         |
| `JWT_SECRET`                | Long, random string for signing auth tokens |
| `VITE_GOOGLE_GENAI_API_KEY` | Google Gemini API key                       |
| `GOOGLE_MAPS_API_KEY`       | Google Maps Geocoding API key               |
|
||||
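Before starting the PM2 processes by hand, it can help to confirm that the variables from the table above are actually present in the shell. The following is a minimal sketch, not part of the project: `check_required_env` is a hypothetical helper name, and it assumes the secrets are exposed to the process as plain environment variables with the same names.

```bash
# Report which of the named variables are unset or empty.
# Prints one "missing: NAME" line per gap and returns non-zero if any are missing.
# check_required_env is a hypothetical helper, not a project script.
check_required_env() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh as well as bash.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Example call using the secret names from the table:
# check_required_env DB_HOST DB_USER DB_PASSWORD DB_DATABASE_PROD \
#   REDIS_PASSWORD_PROD JWT_SECRET VITE_GOOGLE_GENAI_API_KEY GOOGLE_MAPS_API_KEY
```

The function only checks for presence; it does not validate that a value is correct.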
|
||||
---
|
||||
|
||||
## NGINX Configuration
|
||||
|
||||
### Reverse Proxy Setup
|
||||
|
||||
Create a site configuration at `/etc/nginx/sites-available/flyer-crawler.projectium.com`:
|
||||
|
||||
```nginx
server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
    }
}
```
|
||||
|
||||
Enable the site:
|
||||
|
||||
```bash
|
||||
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
|
||||
sudo nginx -t
|
||||
sudo systemctl reload nginx
|
||||
```
|
||||
|
||||
### MIME Types Fix for .mjs Files
|
||||
|
||||
If JavaScript modules (`.mjs` files) aren't loading correctly, add the proper MIME type.
|
||||
|
||||
**Option 1**: Edit the site configuration file directly:
|
||||
|
||||
```nginx
# Add inside the server block.
# A types block at this level replaces the inherited MIME map,
# so re-include the defaults next to it.
include /etc/nginx/mime.types;
types {
    application/javascript mjs;
}
```
|
||||
|
||||
**Option 2**: Edit `/etc/nginx/mime.types` globally:
|
||||
|
||||
```
|
||||
# Change this line:
|
||||
application/javascript js;
|
||||
|
||||
# To:
|
||||
application/javascript js mjs;
|
||||
```
|
||||
|
||||
After changes:
|
||||
|
||||
```bash
|
||||
sudo nginx -t
|
||||
sudo systemctl reload nginx
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## PM2 Log Management
|
||||
|
||||
Install and configure pm2-logrotate to manage log files:
|
||||
|
||||
```bash
|
||||
pm2 install pm2-logrotate
|
||||
pm2 set pm2-logrotate:max_size 10M
|
||||
pm2 set pm2-logrotate:retain 14
|
||||
pm2 set pm2-logrotate:compress false
|
||||
pm2 set pm2-logrotate:dateFormat YYYY-MM-DD_HH-mm-ss
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
The application respects the Gemini AI service's rate limits. You can adjust the `GEMINI_RPM` (requests per minute) environment variable in production as needed without changing the code.
|
||||
|
||||
---
|
||||
|
||||
## CI/CD Pipeline
|
||||
|
||||
The project includes Gitea workflows at `.gitea/workflows/deploy.yml` that:
|
||||
|
||||
1. Run tests against a test database
|
||||
2. Build the application
|
||||
3. Deploy to production on successful builds
|
||||
|
||||
The workflow automatically:
|
||||
|
||||
- Sets up the test database schema before tests
|
||||
- Tears down test data after tests complete
|
||||
- Deploys to the production server
|
||||
|
||||
---
|
||||
|
||||
## Monitoring
|
||||
|
||||
### Check PM2 Status
|
||||
|
||||
```bash
|
||||
pm2 status
|
||||
pm2 logs
|
||||
pm2 logs flyer-crawler-api --lines 100
|
||||
```
|
||||
|
||||
### Restart Services
|
||||
|
||||
```bash
|
||||
pm2 restart all
|
||||
pm2 restart flyer-crawler-api
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Error Tracking with Bugsink (ADR-015)
|
||||
|
||||
Bugsink is a self-hosted, Sentry-compatible error tracking system. See [docs/adr/0015-application-performance-monitoring-and-error-tracking.md](../adr/0015-application-performance-monitoring-and-error-tracking.md) for the full architecture decision.
|
||||
|
||||
### Creating Bugsink Projects and DSNs
|
||||
|
||||
After Bugsink is installed and running, you need to create projects and obtain DSNs:
|
||||
|
||||
1. **Access Bugsink UI**: Navigate to `http://localhost:8000`
|
||||
|
||||
2. **Log in** with your admin credentials
|
||||
|
||||
3. **Create Backend Project**:
   - Click "Create Project"
   - Name: `flyer-crawler-backend`
   - Platform: Node.js
   - Copy the generated DSN (format: `http://<key>@localhost:8000/<project_id>`)

4. **Create Frontend Project**:
   - Click "Create Project"
   - Name: `flyer-crawler-frontend`
   - Platform: React
   - Copy the generated DSN
|
||||
|
||||
5. **Configure Environment Variables**:
|
||||
|
||||
```bash
|
||||
# Backend (server-side)
|
||||
export SENTRY_DSN=http://<backend-key>@localhost:8000/<backend-project-id>
|
||||
|
||||
# Frontend (client-side, exposed to browser)
|
||||
export VITE_SENTRY_DSN=http://<frontend-key>@localhost:8000/<frontend-project-id>
|
||||
|
||||
# Shared settings
|
||||
export SENTRY_ENVIRONMENT=production
|
||||
export VITE_SENTRY_ENVIRONMENT=production
|
||||
export SENTRY_ENABLED=true
|
||||
export VITE_SENTRY_ENABLED=true
|
||||
```
|
||||
|
||||
### Testing Error Tracking
|
||||
|
||||
Verify Bugsink is receiving events:
|
||||
|
||||
```bash
|
||||
npx tsx scripts/test-bugsink.ts
|
||||
```
|
||||
|
||||
This sends test error and info events. Check the Bugsink UI for:
|
||||
|
||||
- `BugsinkTestError` in the backend project
|
||||
- Info message "Test info message from test-bugsink.ts"
|
||||
|
||||
### Sentry SDK v10+ HTTP DSN Limitation
|
||||
|
||||
The Sentry SDK v10+ enforces HTTPS-only DSNs by default. Since Bugsink runs locally over HTTP, our implementation uses the Sentry Store API directly instead of the SDK's built-in transport. This is handled transparently by the `sentry.server.ts` and `sentry.client.ts` modules.
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Database Setup](DATABASE.md) - PostgreSQL and PostGIS configuration
|
||||
- [Authentication Setup](AUTHENTICATION.md) - OAuth provider configuration
|
||||
- [Installation Guide](INSTALL.md) - Local development setup
|
||||
- [Bare-Metal Server Setup](BARE-METAL-SETUP.md) - Manual server installation guide
|
||||
75
docs/operations/LOGSTASH-QUICK-REF.md
Normal file
@@ -0,0 +1,75 @@
|
||||
# Logstash Quick Reference (ADR-050)
|
||||
|
||||
Aggregates logs from PostgreSQL, PM2, Redis, NGINX; forwards errors to Bugsink.
|
||||
|
||||
## Configuration
|
||||
|
||||
**Primary config**: `/etc/logstash/conf.d/bugsink.conf`
|
||||
|
||||
### Related Files
|
||||
|
||||
| Path                                                | Purpose                   |
| --------------------------------------------------- | ------------------------- |
| `/etc/postgresql/14/main/conf.d/observability.conf` | PostgreSQL logging config |
| `/var/log/postgresql/*.log`                         | PostgreSQL logs           |
| `/home/gitea-runner/.pm2/logs/*.log`                | PM2 worker logs           |
| `/var/log/redis/redis-server.log`                   | Redis logs                |
| `/var/log/nginx/access.log`                         | NGINX access logs         |
| `/var/log/nginx/error.log`                          | NGINX error logs          |
| `/var/log/logstash/*.log`                           | Logstash file outputs     |
| `/var/lib/logstash/sincedb_*`                       | Position tracking files   |
|
||||
|
||||
## Features
|
||||
|
||||
- **Multi-source aggregation**: PostgreSQL, PM2 workers, Redis, NGINX
|
||||
- **Environment routing**: Auto-detects prod/test, routes to correct Bugsink project
|
||||
- **JSON parsing**: Extracts `fn_log()` from PostgreSQL, Pino JSON from PM2
|
||||
- **Sentry format**: Transforms to `event_id`, `timestamp`, `level`, `message`, `extra`
|
||||
- **Error filtering**: Only forwards WARNING/ERROR to Bugsink
|
||||
- **Operational storage**: Non-error logs saved to `/var/log/logstash/`
|
||||
- **Request monitoring**: NGINX requests categorized by status, slow request detection
|
||||
|
||||
## Commands
|
||||
|
||||
```bash
|
||||
# Status and control
|
||||
systemctl status logstash
|
||||
systemctl restart logstash
|
||||
|
||||
# Test configuration
|
||||
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
|
||||
|
||||
# View logs
|
||||
journalctl -u logstash -f
|
||||
|
||||
# Check stats (events processed, failures)
|
||||
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters'
|
||||
|
||||
# Monitor sources
|
||||
tail -f /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
|
||||
tail -f /var/log/logstash/pm2-workers-$(date +%Y-%m-%d).log
|
||||
tail -f /var/log/logstash/redis-operational-$(date +%Y-%m-%d).log
|
||||
tail -f /var/log/logstash/nginx-access-$(date +%Y-%m-%d).log
|
||||
|
||||
# Check disk usage
|
||||
du -sh /var/log/logstash/
|
||||
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
| Issue                 | Check            | Solution                                                                                       |
| --------------------- | ---------------- | ---------------------------------------------------------------------------------------------- |
| No Bugsink errors     | Logstash running | `systemctl status logstash`                                                                    |
| Config syntax error   | Test config      | `/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf` |
| Grok pattern failures | Stats endpoint   | `curl 'localhost:9600/_node/stats/pipelines?pretty' \| jq '.pipelines.main.plugins.filters'`   |
| Wrong Bugsink project | Env detection    | Check tags in logs match expected environment                                                  |
| Permission denied     | Logstash groups  | `groups logstash` should include `postgres`, `adm`                                             |
| PM2 not captured      | File paths       | `ls /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log`                                   |
| NGINX logs missing    | Output directory | `ls -lh /var/log/logstash/nginx-access-*.log`                                                  |
| High disk usage       | Log rotation     | Verify `/etc/logrotate.d/logstash` configured                                                  |
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- **Full setup**: [BARE-METAL-SETUP.md](BARE-METAL-SETUP.md) - PostgreSQL Function Observability section
|
||||
- **Architecture**: [adr/0050-postgresql-function-observability.md](../adr/0050-postgresql-function-observability.md)
|
||||
- **Troubleshooting details**: [LOGSTASH-TROUBLESHOOTING.md](LOGSTASH-TROUBLESHOOTING.md)
|
||||
460
docs/operations/LOGSTASH-TROUBLESHOOTING.md
Normal file
@@ -0,0 +1,460 @@
|
||||
# Logstash Troubleshooting Runbook
|
||||
|
||||
This runbook provides step-by-step diagnostics and solutions for common Logstash issues in the PostgreSQL observability pipeline (ADR-050).
|
||||
|
||||
## Quick Reference
|
||||
|
||||
| Symptom                  | Most Likely Cause            | Quick Check                           |
| ------------------------ | ---------------------------- | ------------------------------------- |
| No errors in Bugsink     | Logstash not running         | `systemctl status logstash`           |
| Events not processed     | Grok pattern mismatch        | Check filter failures in stats        |
| Wrong Bugsink project    | Environment detection failed | Verify `pg_database` field extraction |
| 403 authentication error | Missing/wrong DSN key        | Check `X-Sentry-Auth` header          |
| 500 error from Bugsink   | Invalid event format         | Verify `event_id` and required fields |
|
||||
|
||||
---
|
||||
|
||||
## Diagnostic Steps
|
||||
|
||||
### 1. Verify Logstash is Running
|
||||
|
||||
```bash
|
||||
# Check service status
|
||||
systemctl status logstash
|
||||
|
||||
# If stopped, start it
|
||||
systemctl start logstash
|
||||
|
||||
# View recent logs
|
||||
journalctl -u logstash -n 50 --no-pager
|
||||
```
|
||||
|
||||
**Expected output:**
|
||||
|
||||
- Status: `active (running)`
|
||||
- No error messages in recent logs
|
||||
|
||||
---
|
||||
|
||||
### 2. Check Configuration Syntax
|
||||
|
||||
```bash
|
||||
# Test configuration file
|
||||
/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
|
||||
```
|
||||
|
||||
**Expected output:**
|
||||
|
||||
```
|
||||
Configuration OK
|
||||
```
|
||||
|
||||
**If syntax errors:**
|
||||
|
||||
1. Review error message for line number
|
||||
2. Check for missing braces, quotes, or commas
|
||||
3. Verify plugin names are correct (e.g., `json`, `grok`, `uuid`, `http`)
|
||||
|
||||
---
|
||||
|
||||
### 3. Verify PostgreSQL Logs Are Being Read
|
||||
|
||||
```bash
|
||||
# Check if log file exists and has content
|
||||
ls -lh /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
|
||||
|
||||
# Check Logstash can read the file
|
||||
sudo -u logstash cat /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | head -10
|
||||
```
|
||||
|
||||
**Expected output:**
|
||||
|
||||
- Log file exists and is not empty
|
||||
- Logstash user can read the file without permission errors
|
||||
|
||||
**If permission denied:**
|
||||
|
||||
```bash
|
||||
# Check Logstash is in postgres group
|
||||
groups logstash
|
||||
|
||||
# Should show: logstash : logstash adm postgres
|
||||
|
||||
# If not, add to group
|
||||
usermod -a -G postgres logstash
|
||||
systemctl restart logstash
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Check Logstash Pipeline Stats
|
||||
|
||||
```bash
|
||||
# Get pipeline statistics
|
||||
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters'
|
||||
```
|
||||
|
||||
**Key metrics to check:**
|
||||
|
||||
1. **Grok filter events:**
   - `"events.in"` - Total events received
   - `"events.out"` - Events successfully parsed
   - `"failures"` - Events that failed to parse

   **If failures > 0:** Grok pattern doesn't match log format. Check PostgreSQL log format.

2. **JSON filter events:**
   - `"events.in"` - Events received by JSON parser
   - `"events.out"` - Successfully parsed JSON

   **If events.in = 0:** Regex check `pg_message =~ /^\{/` is not matching. Verify fn_log() output format.

3. **UUID filter events:**
   - Should match number of errors being forwarded
|
||||
|
||||
---
|
||||
|
||||
### 5. Test Grok Pattern Manually
|
||||
|
||||
```bash
|
||||
# Get a sample log line
|
||||
tail -1 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
|
||||
|
||||
# Example expected format:
|
||||
# 2026-01-20 10:30:00 +05 [12345] flyer_crawler_prod@flyer-crawler-prod WARNING: {"level":"WARNING","source":"postgresql",...}
|
||||
```
|
||||
|
||||
**Pattern breakdown:**
|
||||
|
||||
```
%{TIMESTAMP_ISO8601:pg_timestamp}     # 2026-01-20 10:30:00
[+-]%{INT:pg_timezone}                # +05
\[%{POSINT:pg_pid}\]                  # [12345]
%{DATA:pg_user}@%{DATA:pg_database}   # flyer_crawler_prod@flyer-crawler-prod
%{WORD:pg_level}:                     # WARNING:
%{GREEDYDATA:pg_message}              # (rest of line)
```
|
||||
|
||||
**If pattern doesn't match:**
|
||||
|
||||
1. Check PostgreSQL `log_line_prefix` setting in `/etc/postgresql/14/main/conf.d/observability.conf`
|
||||
2. Should be: `log_line_prefix = '%t [%p] %u@%d '`
|
||||
3. Restart PostgreSQL if changed: `systemctl restart postgresql`
|
||||
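When no Grok Debugger is at hand, a rough shell check can show whether a sample line has the shape the pattern breakdown above expects. This is a sketch under assumptions: `PG_LINE_RE` is a hand-written extended regular expression that only approximates the grok pattern (it is not the pattern from `bugsink.conf`), and `pg_line_fields` is a hypothetical helper name.

```bash
# ERE approximation of the grok pattern above (an assumption, not the pipeline's own pattern).
# Groups: 1=timestamp 2=timezone 3=pid 4=user 5=database 6=level 7=message
PG_LINE_RE='^([0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}) ([+-][0-9]+) \[([0-9]+)\] ([^@ ]+)@([^ ]+) ([A-Z]+): (.*)$'

# Print "database level" for a line that matches; print nothing and fail otherwise.
pg_line_fields() {
  printf '%s\n' "$1" | sed -nE "s/$PG_LINE_RE/\5 \6/p" | grep .
}

# Example against the newest log line:
# pg_line_fields "$(tail -1 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log)"
```

A line that prints nothing here is a likely candidate for a grok failure, although the real pattern remains the authority.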
|
||||
---
|
||||
|
||||
### 6. Verify Environment Detection
|
||||
|
||||
```bash
|
||||
# Check recent PostgreSQL logs for database field
|
||||
tail -20 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep -E "flyer-crawler-(prod|test)"
|
||||
```
|
||||
|
||||
**Expected:**
|
||||
|
||||
- Production database: `flyer_crawler_prod@flyer-crawler-prod`
|
||||
- Test database: `flyer_crawler_test@flyer-crawler-test`
|
||||
|
||||
**If database name doesn't match:**
|
||||
|
||||
- Check database connection string in application
|
||||
- Verify `DB_DATABASE_PROD` and `DB_DATABASE_TEST` Gitea secrets
|
||||
|
||||
---
|
||||
|
||||
### 7. Test Bugsink API Connection
|
||||
|
||||
```bash
|
||||
# Test production endpoint
|
||||
curl -X POST https://bugsink.projectium.com/api/1/store/ \
|
||||
-H "X-Sentry-Auth: Sentry sentry_version=7, sentry_client=test/1.0, sentry_key=911aef02b9a548fa8fabb8a3c81abfe5" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{
|
||||
"event_id": "12345678901234567890123456789012",
|
||||
"timestamp": "2026-01-20T10:30:00Z",
|
||||
"platform": "other",
|
||||
"level": "error",
|
||||
"logger": "test",
|
||||
"message": "Test error from troubleshooting"
|
||||
}'
|
||||
```
|
||||
|
||||
**Expected response:**
|
||||
|
||||
- HTTP 200 OK
|
||||
- Response body: `{"id": "..."}`
|
||||
|
||||
**If 403 Forbidden:**
|
||||
|
||||
- DSN key is wrong in `/etc/logstash/conf.d/bugsink.conf`
|
||||
- Get correct key from Bugsink UI: Settings → Projects → DSN
|
||||
|
||||
**If 500 Internal Server Error:**
|
||||
|
||||
- Missing required fields (event_id, timestamp, level)
|
||||
- Check `mapping` section in Logstash config
|
||||
|
||||
---
|
||||
|
||||
### 8. Monitor Logstash Output in Real-Time
|
||||
|
||||
```bash
|
||||
# Watch Logstash processing logs
|
||||
journalctl -u logstash -f
|
||||
```
|
||||
|
||||
**What to look for:**
|
||||
|
||||
- `"response code => 200"` - Successful forwarding to Bugsink
|
||||
- `"response code => 403"` - Authentication failure
|
||||
- `"response code => 500"` - Invalid event format
|
||||
- Grok parse failures
|
||||
|
||||
---
|
||||
|
||||
## Common Issues and Solutions
|
||||
|
||||
### Issue 1: Grok Pattern Parse Failures
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Logstash stats show increasing `"failures"` count
|
||||
- No events reaching Bugsink
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
curl -XGET 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | .failures'
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Check PostgreSQL log format matches expected pattern
|
||||
2. Verify `log_line_prefix` in PostgreSQL config
|
||||
3. Test with sample log line using Grok Debugger (Kibana Dev Tools)
|
||||
|
||||
---
|
||||
|
||||
### Issue 2: JSON Filter Not Parsing fn_log() Output
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Grok parses successfully but JSON filter shows 0 events
|
||||
- `[fn_log]` fields missing in Logstash output
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check if pg_message field contains JSON
|
||||
tail -20 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep "WARNING:" | grep "{"
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Verify `fn_log()` function exists in database:
|
||||
```sql
|
||||
\df fn_log
|
||||
```
|
||||
2. Test `fn_log()` output format:
|
||||
```sql
|
||||
SELECT fn_log('WARNING', 'test', 'Test message', '{"key":"value"}'::jsonb);
|
||||
```
|
||||
3. Check logs show JSON output starting with `{`
|
||||
|
||||
---
|
||||
|
||||
### Issue 3: Events Going to Wrong Bugsink Project
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Production errors appear in test project (or vice versa)
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check database name detection in recent logs
|
||||
tail -50 /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log | grep -E "(flyer-crawler-prod|flyer-crawler-test)"
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Verify database names in filter section match actual database names
|
||||
2. Check `pg_database` field is correctly extracted by grok pattern:
|
||||
   ```conf
   # Temporarily add to the output section of the Logstash config
   stdout { codec => rubydebug { metadata => true } }
   ```
|
||||
3. Verify environment tagging in filter:
|
||||
- `pg_database == "flyer-crawler-prod"` → adds "production" tag → routes to project 1
|
||||
- `pg_database == "flyer-crawler-test"` → adds "test" tag → routes to project 3
|
||||
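The two routing rules above can be restated as a small lookup, which is handy for checking a database name seen in the logs against the expected destination. This is only a shell restatement of the rules as listed; the real logic lives in the Logstash filter, and `route_for_database` is a hypothetical helper name.

```bash
# Map a PostgreSQL database name to "<environment tag> <Bugsink project id>".
# The values mirror the two routing rules listed above; anything else is unrouted.
route_for_database() {
  case "$1" in
    flyer-crawler-prod) echo "production 1" ;;
    flyer-crawler-test) echo "test 3" ;;
    *) echo "unrouted"; return 1 ;;
  esac
}

# Example:
# route_for_database flyer-crawler-prod
```

A name that comes back as `unrouted` (for example one with underscores instead of hyphens) points at a mismatch between the filter conditions and the actual database name.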
|
||||
---
|
||||
|
||||
### Issue 4: 403 Authentication Errors from Bugsink
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Logstash logs show `response code => 403`
|
||||
- Events not appearing in Bugsink
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check Logstash output logs for authentication errors
|
||||
journalctl -u logstash -n 100 | grep "403"
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Verify DSN key in `/etc/logstash/conf.d/bugsink.conf` matches Bugsink project
|
||||
2. Get correct DSN from Bugsink UI:
|
||||
- Navigate to Settings → Projects → Click project
|
||||
- Copy "DSN" value
|
||||
- Extract key: `http://KEY@host/PROJECT_ID` → use KEY
|
||||
3. Update `X-Sentry-Auth` header in Logstash config:
|
||||
```conf
|
||||
"X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_KEY_HERE"
|
||||
```
|
||||
4. Restart Logstash: `systemctl restart logstash`
|
||||
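Steps 2 and 3 above amount to pulling the key out of the DSN and placing it in the header value. A minimal sketch of that string handling follows; `dsn_key` and `sentry_auth_header` are hypothetical helper names, and the DSN is assumed to follow the `scheme://KEY@host/PROJECT_ID` form shown in step 2.

```bash
# Extract the key from a DSN of the form scheme://KEY@host/PROJECT_ID.
dsn_key() {
  rest=${1#*://}                 # drop the scheme
  printf '%s\n' "${rest%%@*}"    # keep everything before the first "@"
}

# Build the X-Sentry-Auth header value in the format shown in step 3.
sentry_auth_header() {
  printf 'Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=%s\n' "$(dsn_key "$1")"
}

# Example (placeholder DSN):
# sentry_auth_header 'http://YOUR_KEY_HERE@localhost:8000/1'
```

The printed value can be compared character by character with the header in `/etc/logstash/conf.d/bugsink.conf`.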
|
||||
---
|
||||
|
||||
### Issue 5: 500 Errors from Bugsink
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Logstash logs show `response code => 500`
|
||||
- Bugsink logs show validation errors
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check Bugsink logs for details
|
||||
docker logs bugsink-web 2>&1 | tail -50
|
||||
```
|
||||
|
||||
**Common causes:**
|
||||
|
||||
1. Missing `event_id` field
|
||||
2. Invalid timestamp format
|
||||
3. Missing required Sentry fields
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Verify `uuid` filter is generating `event_id`:
|
||||
```conf
|
||||
uuid {
|
||||
target => "[@metadata][event_id]"
|
||||
overwrite => true
|
||||
}
|
||||
```
|
||||
2. Check `mapping` section includes all required fields:
|
||||
- `event_id` (UUID)
|
||||
- `timestamp` (ISO 8601)
|
||||
- `platform` (string)
|
||||
- `level` (error/warning/info)
|
||||
- `logger` (string)
|
||||
- `message` (string)
|
||||
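To separate a mapping problem in Logstash from a problem on the Bugsink side, it can be useful to build an event with the fields listed above by hand. The sketch below is an illustration, not the pipeline's code: `new_event_id` and `minimal_event` are hypothetical helpers, the `platform` and `logger` values are copied from the curl test in diagnostic step 7, and the 32-character hex id follows the format used in that test.

```bash
# Generate a 32-character lowercase hex event id from /dev/urandom.
new_event_id() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}

# Print a minimal event body containing the fields listed above.
# $1 = level (error/warning/info)
# $2 = message (plain text; quotes and backslashes are not escaped here)
minimal_event() {
  printf '{"event_id":"%s","timestamp":"%s","platform":"other","level":"%s","logger":"test","message":"%s"}\n' \
    "$(new_event_id)" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}

# Example:
# minimal_event error "Test error from troubleshooting"
```

The output can be piped into the curl command from step 7 by replacing the inline body with `-d @-`. If a hand-built event is accepted while Logstash events are rejected, the `mapping` section is the more likely culprit.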
|
||||
---
|
||||
|
||||
### Issue 6: High Memory Usage by Logstash
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Server running out of memory
|
||||
- Logstash OOM killed
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check Logstash memory usage
|
||||
ps aux | grep logstash
|
||||
systemctl status logstash
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Limit Logstash heap size in `/etc/logstash/jvm.options`:
|
||||
```
|
||||
-Xms1g
|
||||
-Xmx1g
|
||||
```
|
||||
2. Restart Logstash: `systemctl restart logstash`
|
||||
3. Monitor with: `top -p "$(pgrep -d, -f logstash)"`
|
||||
|
||||
---
|
||||
|
||||
### Issue 7: Log File Rotation Issues
|
||||
|
||||
**Symptoms:**
|
||||
|
||||
- Logstash stops processing after log file rotates
|
||||
- Sincedb file pointing to old inode
|
||||
|
||||
**Diagnosis:**
|
||||
|
||||
```bash
|
||||
# Check sincedb file
|
||||
cat /var/lib/logstash/sincedb_postgres
|
||||
|
||||
# Check current log file inode
|
||||
ls -li /var/log/postgresql/postgresql-$(date +%Y-%m-%d).log
|
||||
```
|
||||
|
||||
**Solution:**
|
||||
|
||||
1. Logstash should automatically detect rotation
|
||||
2. If stuck, delete sincedb file (will reprocess recent logs):
|
||||
```bash
|
||||
systemctl stop logstash
|
||||
rm /var/lib/logstash/sincedb_postgres
|
||||
systemctl start logstash
|
||||
```
|
||||
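Before deleting the sincedb file, the two diagnosis commands above can be combined into a single check. This sketch assumes the sincedb layout in which each line starts with the tracked file's inode number; `sincedb_tracks_file` is a hypothetical helper name.

```bash
# Succeed if the inode of log file $2 has an entry in sincedb file $1.
# Assumes the inode is the first whitespace-separated column of each sincedb line.
sincedb_tracks_file() {
  inode=$(ls -i "$2" | awk '{print $1}')
  # Compare as strings so very large inode numbers are not rounded.
  awk -v inode="$inode" '($1 "") == (inode "") { found = 1 } END { exit !found }' "$1"
}

# Example:
# sincedb_tracks_file /var/lib/logstash/sincedb_postgres \
#   "/var/log/postgresql/postgresql-$(date +%Y-%m-%d).log" || echo "current log not tracked"
```

A failure right after rotation may simply mean Logstash has not yet discovered the new file, so it is worth rechecking after a short wait before removing anything.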
|
||||
---
|
||||
|
||||
## Verification Checklist
|
||||
|
||||
After making any changes, verify the pipeline is working:
|
||||
|
||||
- [ ] Logstash is running: `systemctl status logstash`
- [ ] Configuration is valid: `/usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf`
- [ ] No grok failures: `curl 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | .failures'`
- [ ] Events being processed: `curl 'localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'`
- [ ] Test error appears in Bugsink: Trigger a database function error and check Bugsink UI
|
||||
|
||||
---
|
||||
|
||||
## Test Database Function Error
|
||||
|
||||
To generate a test error for verification:
|
||||
|
||||
```bash
# Connect to the production database and trigger an error (achievement not found)
sudo -u postgres psql -d flyer-crawler-prod \
  -c "SELECT award_achievement('00000000-0000-0000-0000-000000000001'::uuid, 'Nonexistent Badge');"
```
|
||||
|
||||
**Expected flow:**
|
||||
|
||||
1. PostgreSQL logs the error to `/var/log/postgresql/postgresql-YYYY-MM-DD.log`
|
||||
2. Logstash reads and parses the log (within ~30 seconds)
|
||||
3. Error appears in Bugsink project 1 (production)
|
||||
|
||||
**If error doesn't appear:**
|
||||
|
||||
- Check each diagnostic step above
|
||||
- Review Logstash logs: `journalctl -u logstash -f`
|
||||
|
||||
---
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- **Setup Guide**: [BARE-METAL-SETUP.md](BARE-METAL-SETUP.md) - PostgreSQL Function Observability section
- **Architecture**: [docs/adr/0050-postgresql-function-observability.md](../adr/0050-postgresql-function-observability.md)
|
||||
- **Configuration Reference**: [CLAUDE.md](../CLAUDE.md) - Logstash Configuration section
|
||||
- **Bugsink MCP Server**: [CLAUDE.md](../CLAUDE.md) - Sentry/Bugsink MCP Server Setup section
|
||||