Production Deployment Checklist: Extended Logstash Configuration
Important: This checklist follows an inspect-first, then-modify approach: each step checks the current state before changing anything.
Phase 1: Pre-Deployment Inspection
Step 1.1: Verify Logstash Status
ssh root@projectium.com
systemctl status logstash
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'
Record current state:
- Status: [active/inactive]
- Events processed: [number]
- Memory usage: [amount]
Expected: Logstash should be active and processing PostgreSQL logs from ADR-050.
Step 1.2: Inspect Existing Configuration Files
# List all configuration files
ls -alF /etc/logstash/conf.d/
# Check existing backups (if any)
ls -lh /etc/logstash/conf.d/*.backup-* 2>/dev/null || echo "No backups found"
# View current configuration
cat /etc/logstash/conf.d/bugsink.conf
Record current state:
- Configuration files present: [list]
- Existing backups: [list or "none"]
- Current config size: [bytes]
Questions to answer:
- ✅ Is there an existing bugsink.conf?
- ✅ Are there any existing backups?
- ✅ What inputs/filters/outputs are currently configured?
Step 1.3: Inspect Log Output Directory
# Check if directory exists
ls -ld /var/log/logstash 2>/dev/null || echo "Directory does not exist"
# If exists, check contents
ls -alF /var/log/logstash/
# Check ownership and permissions
ls -ld /var/log/logstash
Record current state:
- Directory exists: [yes/no]
- Current ownership: [user:group]
- Current permissions: [drwx------]
- Existing files: [list]
Questions to answer:
- ✅ Does /var/log/logstash/ already exist?
- ✅ What files are currently in it?
- ✅ Are these Logstash's own logs or our operational logs?
Step 1.4: Check Logrotate Configuration
# Check if logrotate config exists
cat /etc/logrotate.d/logstash 2>/dev/null || echo "No logrotate config found"
# List all logrotate configs
ls -lh /etc/logrotate.d/ | grep logstash
Record current state:
- Logrotate config exists: [yes/no]
- Current rotation policy: [daily/weekly/none]
Step 1.5: Check Logstash User Groups
# Check current group membership
groups logstash
# Verify which groups have access to required logs
ls -l /home/gitea-runner/.pm2/logs/*.log | head -3
ls -l /var/log/redis/redis-server.log
ls -l /var/log/nginx/access.log
ls -l /var/log/nginx/error.log
Record current state:
- Logstash groups: [list]
- PM2 log file group: [group]
- Redis log file group: [group]
- NGINX log file group: [group]
Questions to answer:
- ✅ Is logstash already in the adm group?
- ✅ Is logstash already in the postgres group?
- ✅ Can logstash currently read PM2 logs?
Step 1.6: Test Log File Access (Current State)
# In each test, 2>&1 sits before the pipe so that errors from cat are captured
# Test PM2 worker logs
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log 2>&1 | head -5
# Test PM2 analytics worker logs
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log 2>&1 | head -5
# Test Redis logs
sudo -u logstash cat /var/log/redis/redis-server.log 2>&1 | head -5
# Test NGINX access logs
sudo -u logstash cat /var/log/nginx/access.log 2>&1 | head -5
# Test NGINX error logs
sudo -u logstash cat /var/log/nginx/error.log 2>&1 | head -5
Record current state:
- PM2 worker logs accessible: [yes/no/error]
- PM2 analytics logs accessible: [yes/no/error]
- Redis logs accessible: [yes/no/error]
- NGINX access logs accessible: [yes/no/error]
- NGINX error logs accessible: [yes/no/error]
If any fail: Note the specific error message (permission denied, file not found, etc.)
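The yes/no/error record above can be filled in with a small loop instead of reading each command's output by eye. The following is a minimal sketch, not part of the original checklist: the function name report_access and the RUN_AS variable are hypothetical, and the check only distinguishes "not found" from "not readable".

```shell
# Sketch: print one status line per file.
# RUN_AS is a hypothetical variable holding an optional command prefix,
# e.g. RUN_AS="sudo -u logstash". Left empty, the check runs as the current user.
report_access() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "$f: not found"
    elif ${RUN_AS:-} test -r "$f"; then
      echo "$f: yes"
    else
      echo "$f: no (not readable)"
    fi
  done
}

# Usage on the server (globs are expanded by the invoking shell):
#   RUN_AS="sudo -u logstash" report_access \
#     /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log \
#     /var/log/redis/redis-server.log \
#     /var/log/nginx/access.log /var/log/nginx/error.log
```

A "no" here can also mean that a parent directory is not traversable by the logstash user, so the cat tests above remain the source of the exact error message.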
Step 1.7: Check PM2 Log File Locations
# List all PM2 log files
ls -lh /home/gitea-runner/.pm2/logs/
# Check for production and test worker logs
ls -lh /home/gitea-runner/.pm2/logs/ | grep -E "(flyer-crawler-worker|flyer-crawler-analytics-worker)"
Record current state:
- Production worker logs present: [yes/no]
- Test worker logs present: [yes/no]
- Analytics worker logs present: [yes/no]
- File naming pattern: [describe pattern]
Questions to answer:
- ✅ Do the log file paths match what's in the new Logstash config?
- ✅ Are there separate logs for production vs test environments?
Step 1.8: Check Disk Space
# Check available disk space
df -h /var/log/
# Check current size of Logstash logs
du -sh /var/log/logstash/
# Check size of PM2 logs
du -sh /home/gitea-runner/.pm2/logs/
Record current state:
- Available space on /var/log: [amount]
- Current Logstash log size: [amount]
- Current PM2 log size: [amount]
Risk assessment:
- ✅ Is there sufficient space for 30 days of rotated logs?
- ✅ Estimate: ~100MB/day for new operational logs = ~3GB for 30 days
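The estimate above is simple arithmetic (daily volume times retention days), so it can be checked against the actual free space. This is a rough sketch under the checklist's own figures of 100MB/day and 30 days. The function name space_ok is illustrative, and compression of rotated logs is ignored, which makes the estimate conservative.

```shell
# Sketch: compare free space (in MB) against daily_mb * days.
space_ok() {
  avail_mb=$1
  daily_mb=$2
  days=$3
  needed_mb=$(( daily_mb * days ))
  if [ "$avail_mb" -ge "$needed_mb" ]; then
    echo "sufficient: need ${needed_mb}MB, have ${avail_mb}MB"
  else
    echo "insufficient: need ${needed_mb}MB, have ${avail_mb}MB"
  fi
}

# Usage on the server (with GNU df, -Pm prints available MB in column 4):
#   space_ok "$(df -Pm /var/log | awk 'NR==2 {print $4}')" 100 30
```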
Step 1.9: Review Bugsink Projects
# Check if Bugsink projects 5 and 6 exist
# (This requires accessing Bugsink UI or API)
echo "Manual check: Navigate to https://bugsink.projectium.com"
echo "Verify project IDs 5 and 6 exist and their names/DSNs"
Record current state:
- Project 5 exists: [yes/no]
- Project 5 name: [name]
- Project 6 exists: [yes/no]
- Project 6 name: [name]
Questions to answer:
- ✅ Do the project IDs in the new config match actual Bugsink projects?
- ✅ Are DSNs correct?
Phase 2: Make Deployment Decisions
Based on Phase 1 inspection, answer these questions:
- Backup needed?
  - Current config exists: [yes/no]
  - Decision: [create backup / no backup needed]
- Directory creation needed?
  - /var/log/logstash/ exists with correct permissions: [yes/no]
  - Decision: [create directory / fix permissions / no action needed]
- Logrotate config needed?
  - Config exists: [yes/no]
  - Decision: [create config / update config / no action needed]
- Group membership needed?
  - Logstash already in adm group: [yes/no]
  - Decision: [add to group / already member]
- Log file access issues?
  - Any files inaccessible: [list files]
  - Decision: [fix permissions / fix group membership / no action needed]
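As an illustration of how an inspection result maps to a decision, the directory question can be sketched as a function that returns one of its three listed outcomes. The name decide_directory and its arguments are hypothetical, and stat -c is the GNU coreutils form (BSD stat uses different flags).

```shell
# Sketch: map a directory's state to one of the three Phase 2 decisions.
# Arguments: path, expected owner as user:group, expected octal mode.
decide_directory() {
  dir=$1
  want_owner=$2
  want_mode=$3
  if [ ! -d "$dir" ]; then
    echo "create directory"
  elif [ "$(stat -c '%U:%G' "$dir")" != "$want_owner" ] ||
       [ "$(stat -c '%a' "$dir")" != "$want_mode" ]; then
    echo "fix permissions"
  else
    echo "no action needed"
  fi
}

# Usage on the server:
#   decide_directory /var/log/logstash-operational logstash:logstash 755
```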
Phase 3: Execute Deployment
Step 3.1: Create Configuration Backup
Only if: Configuration file exists and no recent backup.
# Create timestamped backup
sudo cp /etc/logstash/conf.d/bugsink.conf \
/etc/logstash/conf.d/bugsink.conf.backup-$(date +%Y%m%d-%H%M%S)
# Verify backup
ls -lh /etc/logstash/conf.d/*.backup-*
Confirmation: ✅ Backup file created with timestamp.
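The "only if" condition for this step can be made explicit. The sketch below assumes that "recent" means a backup modified within a chosen number of minutes, a threshold the checklist does not define. The function name backup_if_stale is illustrative.

```shell
# Sketch: back up a file unless a backup newer than max_age_min minutes exists.
# Plain cp (without -p) is used so the backup's mtime records when it was taken.
backup_if_stale() {
  conf=$1
  max_age_min=$2
  if [ ! -f "$conf" ]; then
    echo "nothing to back up"
    return 0
  fi
  recent=$(find "$(dirname "$conf")" -maxdepth 1 \
    -name "$(basename "$conf").backup-*" -mmin "-$max_age_min" | head -n 1)
  if [ -n "$recent" ]; then
    echo "recent backup exists: $recent"
  else
    cp "$conf" "$conf.backup-$(date +%Y%m%d-%H%M%S)"
    echo "backup created"
  fi
}

# Usage on the server (run as root, since conf.d is root-owned):
#   backup_if_stale /etc/logstash/conf.d/bugsink.conf 60
```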
Step 3.2: Handle Log Output Directory
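If the directory doesn't exist (commands shown for the separate logstash-operational directory; see the note on Options A and B below):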
sudo mkdir -p /var/log/logstash-operational
sudo chown logstash:logstash /var/log/logstash-operational
sudo chmod 755 /var/log/logstash-operational
If directory exists but has wrong permissions:
sudo chown logstash:logstash /var/log/logstash
sudo chmod 755 /var/log/logstash
Note: The existing /var/log/logstash/ contains Logstash's own operational logs (logstash-plain.log, etc.). You have two options:
Option A: Use a separate directory for our operational logs (recommended):
- Directory: /var/log/logstash-operational/
- Update config to use this path instead
Option B: Share the directory (requires careful logrotate config):
- Keep using /var/log/logstash/
- Ensure logrotate doesn't rotate our custom logs the same way as Logstash's own logs
Decision: [Choose Option A or B]
Verification:
ls -ld /var/log/logstash-operational # or /var/log/logstash
Confirmation: ✅ Directory exists with drwxr-xr-x logstash logstash.
Step 3.3: Configure Logrotate
Only if: Logrotate config doesn't exist or needs updating.
For Option A (separate directory):
sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash-operational/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 logstash logstash
sharedscripts
postrotate
# No reload needed - Logstash handles rotation automatically
endscript
}
EOF
For Option B (shared directory):
sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash/pm2-workers-*.log
/var/log/logstash/redis-operational-*.log
/var/log/logstash/nginx-access-*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 logstash logstash
sharedscripts
postrotate
# No reload needed - Logstash handles rotation automatically
endscript
}
EOF
Verify configuration:
sudo logrotate -d /etc/logrotate.d/logstash-operational
cat /etc/logrotate.d/logstash-operational
Confirmation: ✅ Logrotate config created, syntax check passes.
Step 3.4: Grant Logstash Permissions
Only if: Logstash not already in adm group.
# Add logstash to adm group (for NGINX and system logs)
sudo usermod -a -G adm logstash
# Verify group membership
groups logstash
Expected output: logstash : logstash adm postgres
Confirmation: ✅ Logstash user is in required groups.
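To keep this step idempotent, the "only if" check can be scripted so that usermod runs only when the group is missing. This is a small sketch with an illustrative helper name, group_listed. It matches whole group names, so a group such as "administrators" does not count as "adm".

```shell
# Sketch: succeed if a group name appears in a space-separated group list.
group_listed() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the server:
#   group_listed adm "$(id -nG logstash)" || sudo usermod -a -G adm logstash
```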
Step 3.5: Verify Log File Access (Post-Permission Changes)
Only if: Previous access tests failed.
# Re-test log file access
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log | head -5
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log | head -5
sudo -u logstash cat /var/log/redis/redis-server.log | head -5
sudo -u logstash cat /var/log/nginx/access.log | head -5
sudo -u logstash cat /var/log/nginx/error.log | head -5
Confirmation: ✅ All log files now readable without errors.
Step 3.6: Update Logstash Configuration
Important: Before pasting, adjust the file output paths based on your directory decision.
# Open configuration file
sudo nano /etc/logstash/conf.d/bugsink.conf
Paste the complete configuration from docs/BARE-METAL-SETUP.md.
If using Option A (separate directory), update these lines in the config:
# Change this:
path => "/var/log/logstash/pm2-workers-%{+YYYY-MM-dd}.log"
# To this:
path => "/var/log/logstash-operational/pm2-workers-%{+YYYY-MM-dd}.log"
# (Repeat for redis-operational and nginx-access file outputs)
Save and exit: Ctrl+X, Y, Enter
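As an alternative to editing the three paths by hand, the substitution can be sketched with sed. This assumes the file outputs use exactly the pm2-workers, redis-operational and nginx-access name prefixes shown in this checklist. The function name rewrite_output_paths is illustrative, and it prints to stdout rather than editing in place so the result can be reviewed first.

```shell
# Sketch: print a config with the three file-output paths pointed at the
# Option A directory. Other paths under /var/log/logstash/ are left untouched.
rewrite_output_paths() {
  sed -E 's#/var/log/logstash/(pm2-workers|redis-operational|nginx-access)-#/var/log/logstash-operational/\1-#' "$1"
}

# Usage on the server (review the diff before replacing the live file):
#   rewrite_output_paths /etc/logstash/conf.d/bugsink.conf > /tmp/bugsink.conf.new
#   diff /etc/logstash/conf.d/bugsink.conf /tmp/bugsink.conf.new
```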
Step 3.7: Test Configuration Syntax
# Test for syntax errors
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
Expected output: Configuration OK
If errors:
- Review error message for line number
- Check for missing braces, quotes, commas
- Verify file paths match your directory decision
- Compare against documentation
Confirmation: ✅ Configuration syntax is valid.
Step 3.8: Restart Logstash Service
# Restart Logstash
sudo systemctl restart logstash
# Check service started successfully
sudo systemctl status logstash
# Wait for initialization
sleep 30
# Check for startup errors
sudo journalctl -u logstash -n 100 --no-pager | grep -i error
Expected:
- Status: active (running)
- No critical errors (warnings about missing files are OK initially)
Confirmation: ✅ Logstash restarted successfully.
Phase 4: Post-Deployment Verification
Step 4.1: Verify Pipeline Processing
# Check pipeline stats - events should be increasing
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'
# Check input plugins
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.inputs'
# Check for grok failures
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | {name, events_in: .events.in, events_out: .events.out, failures}'
Expected:
- events.in and events.out are increasing
- Input plugins show files being read
- Grok failures < 1% of events
Confirmation: ✅ Pipeline processing events from multiple sources.
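The "under 1%" threshold for grok failures can be checked with integer arithmetic once the two counters have been read from the stats output. This is a minimal sketch: the function name grok_rate_ok is illustrative, and the two numbers are assumed to be the failures and events_in values reported for a grok filter in this step's jq output.

```shell
# Sketch: pass when failures are strictly below 1% of incoming events.
# failures * 100 < events_in is the same test as failures / events_in < 0.01.
grok_rate_ok() {
  failures=$1
  events_in=$2
  if [ "$events_in" -eq 0 ]; then
    echo "no events yet"
    return 1
  fi
  if [ $(( failures * 100 )) -lt "$events_in" ]; then
    echo "ok"
  else
    echo "too many grok failures"
    return 1
  fi
}

# Usage: grok_rate_ok 5 1000
```

An exact 1% rate is reported as a failure, matching the strict "less than" in the expectation above.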
Step 4.2: Verify File Outputs Created
# Wait a few minutes for log generation
sleep 120
# Check files were created
ls -lh /var/log/logstash-operational/ # or /var/log/logstash/
# View sample logs
tail -20 /var/log/logstash-operational/pm2-workers-$(date +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/redis-operational-$(date +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/nginx-access-$(date +%Y-%m-%d).log
Expected:
- Files exist with today's date
- Files contain JSON-formatted log entries
- Timestamps are recent
Confirmation: ✅ Operational logs being written successfully.
Step 4.3: Test Error Forwarding to Bugsink
# Check HTTP output stats (Bugsink forwarding)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.outputs[] | select(.name == "http") | {name, events_in: .events.in, events_out: .events.out}'
Manual check:
- Navigate to: https://bugsink.projectium.com
- Check Project 5 (production infrastructure) for recent events
- Check Project 6 (test infrastructure) for recent events
Confirmation: ✅ Errors forwarded to correct Bugsink projects.
Step 4.4: Monitor Logstash Performance
# Check memory usage
ps aux | grep logstash | grep -v grep
# Check disk usage
du -sh /var/log/logstash-operational/
# Monitor in real-time (Ctrl+C to exit)
sudo journalctl -u logstash -f
Expected:
- Memory usage < 1.5GB (with 1GB heap)
- Disk usage reasonable (< 100MB for first day)
- No repeated errors
Confirmation: ✅ Performance is stable.
Step 4.5: Verify Environment Detection
# Check recent logs for environment tags
sudo journalctl -u logstash -n 500 | grep -E "(production|test)" | tail -20
# Check file outputs for correct tagging
grep -o '"environment":"[^"]*"' /var/log/logstash-operational/pm2-workers-$(date +%Y-%m-%d).log | sort | uniq -c
Expected:
- Production worker logs tagged as "production"
- Test worker logs tagged as "test"
Confirmation: ✅ Environment detection working correctly.
Step 4.6: Document Deployment
# Record deployment
echo "Extended Logstash Configuration deployed on $(date)" | sudo tee -a /var/log/deployments.log
# Record configuration version
sudo ls -lh /etc/logstash/conf.d/bugsink.conf
Confirmation: ✅ Deployment documented.
Phase 5: 24-Hour Monitoring Plan
Monitor these metrics over the next 24 hours:
Every 4 hours:
- Service health: systemctl status logstash
- Disk usage: du -sh /var/log/logstash-operational/
- Memory usage: ps aux | grep logstash | grep -v grep
Every 12 hours:
- Error rates: Check Bugsink projects 5 and 6
- Log file growth: ls -lh /var/log/logstash-operational/
- Pipeline stats: curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'
Rollback Procedure
If issues occur:
# Stop Logstash
sudo systemctl stop logstash
# Find latest backup
ls -lt /etc/logstash/conf.d/*.backup-* | head -1
# Restore backup (replace TIMESTAMP with actual timestamp)
sudo cp /etc/logstash/conf.d/bugsink.conf.backup-TIMESTAMP \
/etc/logstash/conf.d/bugsink.conf
# Test restored config
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
# Restart Logstash
sudo systemctl start logstash
# Verify status
systemctl status logstash
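The TIMESTAMP placeholder in the restore step can be resolved automatically by picking the newest backup by modification time. This is a sketch with an illustrative helper name, latest_backup. It relies on backup file names containing no whitespace, which holds for the date-based names created in Step 3.1.

```shell
# Sketch: print the most recently modified backup of a config file,
# or nothing if no backup exists.
latest_backup() {
  ls -t "$1".backup-* 2>/dev/null | head -n 1
}

# Usage on the server (stops if no backup is found):
#   b=$(latest_backup /etc/logstash/conf.d/bugsink.conf)
#   [ -n "$b" ] && sudo cp "$b" /etc/logstash/conf.d/bugsink.conf
```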
Quick Health Check
Run this anytime to verify deployment health:
# Chained health check (stops at the first failing step; Option A path shown)
systemctl is-active logstash && \
echo "Service: OK" && \
ls /var/log/logstash-operational/*.log &>/dev/null && \
echo "Logs: OK" && \
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq -e '.pipelines.main.events.in > 0' &>/dev/null && \
echo "Processing: OK"
Expected output:
active
Service: OK
Logs: OK
Processing: OK
Summary Checklist
After completing all steps:
- ✅ Phase 1: Inspection complete, state recorded
- ✅ Phase 2: Deployment decisions made
- ✅ Phase 3: Configuration deployed
- ✅ Backup created
- ✅ Directory configured
- ✅ Logrotate configured
- ✅ Permissions granted
- ✅ Config updated and tested
- ✅ Service restarted
- ✅ Phase 4: Verification complete
- ✅ Pipeline processing
- ✅ File outputs working
- ✅ Errors forwarded to Bugsink
- ✅ Performance stable
- ✅ Environment detection working
- ✅ Phase 5: Monitoring plan established
Deployment Status: [READY / IN PROGRESS / COMPLETE / ROLLED BACK]