flyer-crawler.projectium.com/docs/LOGSTASH_DEPLOYMENT_CHECKLIST.md
Torben Sorensen 4e5d709973
2026-01-21 03:27:44 -08:00

Production Deployment Checklist: Extended Logstash Configuration

Important: This checklist follows an inspect-first, then-modify approach. Each step checks the current state before making any change.


Phase 1: Pre-Deployment Inspection

Step 1.1: Verify Logstash Status

ssh root@projectium.com
systemctl status logstash
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'

Record current state:

  • Status: [active/inactive]
  • Events processed: [number]
  • Memory usage: [amount]

Expected: Logstash should be active and processing PostgreSQL logs from ADR-050.


Step 1.2: Inspect Existing Configuration Files

# List all configuration files
ls -alF /etc/logstash/conf.d/

# Check existing backups (if any)
ls -lh /etc/logstash/conf.d/*.backup-* 2>/dev/null || echo "No backups found"

# View current configuration
cat /etc/logstash/conf.d/bugsink.conf

Record current state:

  • Configuration files present: [list]
  • Existing backups: [list or "none"]
  • Current config size: [bytes]

Questions to answer:

  • Is there an existing bugsink.conf?
  • Are there any existing backups?
  • What inputs/filters/outputs are currently configured?

Step 1.3: Inspect Log Output Directory

# Check if directory exists
ls -ld /var/log/logstash 2>/dev/null || echo "Directory does not exist"

# If exists, check contents
ls -alF /var/log/logstash/

# Check ownership and permissions
ls -ld /var/log/logstash

Record current state:

  • Directory exists: [yes/no]
  • Current ownership: [user:group]
  • Current permissions: [drwx------]
  • Existing files: [list]

Questions to answer:

  • Does /var/log/logstash/ already exist?
  • What files are currently in it?
  • Are these Logstash's own logs or our operational logs?

Step 1.4: Check Logrotate Configuration

# Check if logrotate config exists
cat /etc/logrotate.d/logstash 2>/dev/null || echo "No logrotate config found"

# List all logrotate configs
ls -lh /etc/logrotate.d/ | grep logstash

Record current state:

  • Logrotate config exists: [yes/no]
  • Current rotation policy: [daily/weekly/none]

Step 1.5: Check Logstash User Groups

# Check current group membership
groups logstash

# Verify which groups have access to required logs
ls -l /home/gitea-runner/.pm2/logs/*.log | head -3
ls -l /var/log/redis/redis-server.log
ls -l /var/log/nginx/access.log
ls -l /var/log/nginx/error.log

Record current state:

  • Logstash groups: [list]
  • PM2 log file group: [group]
  • Redis log file group: [group]
  • NGINX log file group: [group]

Questions to answer:

  • Is logstash already in the adm group?
  • Is logstash already in the postgres group?
  • Can logstash currently read PM2 logs?
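
The group questions above can be answered with a small exact-match test instead of reading the `groups` output by eye. This is a minimal sketch, assuming bash and the standard `id` utility; `in_group` is a hypothetical helper name introduced here, not something the deployment provides.

```shell
# Hypothetical helper: succeed if user $1 is a member of group $2.
# "id -nG" prints the user's group names separated by spaces.
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Usage on the server:
# in_group logstash adm      && echo "adm: yes"      || echo "adm: no"
# in_group logstash postgres && echo "postgres: yes" || echo "postgres: no"
```

Matching whole lines with `grep -x` avoids false positives from group names that merely contain the string, such as `lpadmin` when looking for `adm`.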

Step 1.6: Test Log File Access (Current State)

# Test PM2 worker logs
# (head runs as the logstash user, so 2>&1 captures its permission errors)
sudo -u logstash head -n 5 /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log 2>&1

# Test PM2 analytics worker logs
sudo -u logstash head -n 5 /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log 2>&1

# Test Redis logs
sudo -u logstash head -n 5 /var/log/redis/redis-server.log 2>&1

# Test NGINX access logs
sudo -u logstash head -n 5 /var/log/nginx/access.log 2>&1

# Test NGINX error logs
sudo -u logstash head -n 5 /var/log/nginx/error.log 2>&1

Record current state:

  • PM2 worker logs accessible: [yes/no/error]
  • PM2 analytics logs accessible: [yes/no/error]
  • Redis logs accessible: [yes/no/error]
  • NGINX access logs accessible: [yes/no/error]
  • NGINX error logs accessible: [yes/no/error]

If any fail: Note the specific error message (permission denied, file not found, etc.)
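
The per-file tests above can also be condensed into one pass that separates "file not found" from "permission denied". This is a minimal sketch, assuming bash; `report_access` and the `RUN_AS` variable are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: classify each path as missing, readable, or denied.
# RUN_AS (assumed name) is an optional command prefix, e.g. "sudo -u logstash".
report_access() {
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"            # existence is checked as the invoking user
    elif ${RUN_AS:-} test -r "$f"; then
      echo "readable: $f"           # readability is checked as the RUN_AS user
    else
      echo "denied: $f"
    fi
  done
}

# Usage on the server:
# RUN_AS="sudo -u logstash" report_access \
#   /var/log/redis/redis-server.log /var/log/nginx/access.log /var/log/nginx/error.log
```

Because `test -r` follows the whole path, a file that is readable by group but sits under a directory the logstash user cannot traverse is reported as denied.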


Step 1.7: Check PM2 Log File Locations

# List all PM2 log files
ls -lh /home/gitea-runner/.pm2/logs/

# Check for production and test worker logs
ls -lh /home/gitea-runner/.pm2/logs/ | grep -E "(flyer-crawler-worker|flyer-crawler-analytics-worker)"

Record current state:

  • Production worker logs present: [yes/no]
  • Test worker logs present: [yes/no]
  • Analytics worker logs present: [yes/no]
  • File naming pattern: [describe pattern]

Questions to answer:

  • Do the log file paths match what's in the new Logstash config?
  • Are there separate logs for production vs test environments?

Step 1.8: Check Disk Space

# Check available disk space
df -h /var/log/

# Check current size of Logstash logs
du -sh /var/log/logstash/

# Check size of PM2 logs
du -sh /home/gitea-runner/.pm2/logs/

Record current state:

  • Available space on /var/log: [amount]
  • Current Logstash log size: [amount]
  • Current PM2 log size: [amount]

Risk assessment:

  • Is there sufficient space for 30 days of rotated logs?
  • Estimate: ~100MB/day for new operational logs = ~3GB for 30 days
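
The estimate above is simple arithmetic (100 MB/day × 30 days = 3000 MB), and it can be compared against the free space mechanically. This is a minimal sketch, assuming bash and GNU `df`; `has_log_headroom` is a hypothetical helper name, and the 100 MB/day figure is the checklist's own estimate rather than a measured value.

```shell
# Hypothetical helper: succeed if available space covers the retention window.
# Arguments: available MB, estimated MB per day, retention days.
has_log_headroom() {
  avail_mb=$1; daily_mb=$2; days=$3
  needed_mb=$((daily_mb * days))
  echo "needed=${needed_mb}MB available=${avail_mb}MB"
  [ "$avail_mb" -ge "$needed_mb" ]
}

# Usage on the server (GNU df assumed; -BM reports sizes in MB):
# avail=$(df -BM --output=avail /var/log | tail -n 1 | tr -dc '0-9')
# has_log_headroom "$avail" 100 30 && echo "Space: OK" || echo "Space: insufficient"
```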

Step 1.9: Review Bugsink Projects

# Check if Bugsink projects 5 and 6 exist
# (This requires accessing Bugsink UI or API)
echo "Manual check: Navigate to https://bugsink.projectium.com"
echo "Verify project IDs 5 and 6 exist and their names/DSNs"

Record current state:

  • Project 5 exists: [yes/no]
  • Project 5 name: [name]
  • Project 6 exists: [yes/no]
  • Project 6 name: [name]

Questions to answer:

  • Do the project IDs in the new config match actual Bugsink projects?
  • Are DSNs correct?

Phase 2: Make Deployment Decisions

Based on Phase 1 inspection, answer these questions:

  1. Backup needed?

    • Current config exists: [yes/no]
    • Decision: [create backup / no backup needed]
  2. Directory creation needed?

    • /var/log/logstash/ exists with correct permissions: [yes/no]
    • Decision: [create directory / fix permissions / no action needed]
  3. Logrotate config needed?

    • Config exists: [yes/no]
    • Decision: [create config / update config / no action needed]
  4. Group membership needed?

    • Logstash already in adm group: [yes/no]
    • Decision: [add to group / already member]
  5. Log file access issues?

    • Any files inaccessible: [list files]
    • Decision: [fix permissions / fix group membership / no action needed]

Phase 3: Execute Deployment

Step 3.1: Create Configuration Backup

Only if: The configuration file exists and has no recent backup.

# Create timestamped backup
sudo cp /etc/logstash/conf.d/bugsink.conf \
  /etc/logstash/conf.d/bugsink.conf.backup-$(date +%Y%m%d-%H%M%S)

# Verify backup
ls -lh /etc/logstash/conf.d/*.backup-*

Confirmation: Backup file created with timestamp.
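
Listing the backup confirms that a file exists, but not that it matches the source. The following is a minimal sketch of a copy-and-compare step, assuming bash and the standard `cmp` utility; `backup_with_verify` is a hypothetical helper name.

```shell
# Hypothetical helper: copy a file to <file>.backup-<timestamp> and verify the copy.
# Prints the backup path only if the copy is byte-identical to the source.
backup_with_verify() {
  src=$1
  dest="${src}.backup-$(date +%Y%m%d-%H%M%S)"
  cp -p "$src" "$dest" && cmp -s "$src" "$dest" && echo "$dest"
}

# Usage on the server (run as root, as in the SSH session from Step 1.1):
# backup_with_verify /etc/logstash/conf.d/bugsink.conf
```

The printed path is the value to substitute for TIMESTAMP if a rollback becomes necessary.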


Step 3.2: Handle Log Output Directory

If using Option A (see the note below) and the directory doesn't exist:

sudo mkdir -p /var/log/logstash-operational
sudo chown logstash:logstash /var/log/logstash-operational
sudo chmod 755 /var/log/logstash-operational

If using Option B and the existing directory has the wrong ownership or permissions:

sudo chown logstash:logstash /var/log/logstash
sudo chmod 755 /var/log/logstash

Note: The existing /var/log/logstash/ contains Logstash's own service logs (logstash-plain.log, etc.). You have two options:

Option A: Use a separate directory for our operational logs (recommended):

  • Directory: /var/log/logstash-operational/
  • Update config to use this path instead

Option B: Share the directory (requires careful logrotate config):

  • Keep using /var/log/logstash/
  • Ensure logrotate doesn't rotate our custom logs the same way as Logstash's own logs

Decision: [Choose Option A or B]

Verification:

ls -ld /var/log/logstash-operational  # or /var/log/logstash

Confirmation: Directory exists with drwxr-xr-x logstash logstash.
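
The confirmation above can be checked as a string comparison rather than by eye. This is a minimal sketch, assuming bash and GNU `stat` (the `-c` format option); `describe_dir` and `dir_matches` are hypothetical helper names.

```shell
# Hypothetical helpers: print and compare "<mode> <owner> <group>" for a directory.
describe_dir() {
  stat -c '%A %U %G' "$1"
}

dir_matches() {
  [ "$(describe_dir "$1")" = "$2" ]
}

# Usage on the server (substitute /var/log/logstash for Option B):
# dir_matches /var/log/logstash-operational "drwxr-xr-x logstash logstash" \
#   && echo "Directory: OK" || describe_dir /var/log/logstash-operational
```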


Step 3.3: Configure Logrotate

Only if: Logrotate config doesn't exist or needs updating.

For Option A (separate directory):

sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash-operational/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 logstash logstash
    sharedscripts
    postrotate
        # No reload needed - Logstash handles rotation automatically
    endscript
}
EOF

For Option B (shared directory):

sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash/pm2-workers-*.log
/var/log/logstash/redis-operational-*.log
/var/log/logstash/nginx-access-*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 logstash logstash
    sharedscripts
    postrotate
        # No reload needed - Logstash handles rotation automatically
    endscript
}
EOF

Verify configuration:

sudo logrotate -d /etc/logrotate.d/logstash-operational
cat /etc/logrotate.d/logstash-operational

Confirmation: Logrotate config created, syntax check passes.


Step 3.4: Grant Logstash Permissions

Only if: The logstash user is not already in the adm group.

# Add logstash to adm group (for NGINX and system logs)
sudo usermod -a -G adm logstash

# Verify group membership
groups logstash

Expected output (group order may vary): logstash : logstash adm postgres

Confirmation: Logstash user is in required groups.


Step 3.5: Verify Log File Access (Post-Permission Changes)

Only if: Previous access tests failed.

# Re-test log file access
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log | head -5
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log | head -5
sudo -u logstash cat /var/log/redis/redis-server.log | head -5
sudo -u logstash cat /var/log/nginx/access.log | head -5
sudo -u logstash cat /var/log/nginx/error.log | head -5

Confirmation: All log files now readable without errors.


Step 3.6: Update Logstash Configuration

Important: Before pasting, adjust the file output paths based on your directory decision.

# Open configuration file
sudo nano /etc/logstash/conf.d/bugsink.conf

Paste the complete configuration from docs/BARE-METAL-SETUP.md.

If using Option A (separate directory), update these lines in the config:

# Change this:
path => "/var/log/logstash/pm2-workers-%{+YYYY-MM-dd}.log"

# To this:
path => "/var/log/logstash-operational/pm2-workers-%{+YYYY-MM-dd}.log"

# (Repeat for redis-operational and nginx-access file outputs)

Save and exit: Ctrl+X, Y, Enter


Step 3.7: Test Configuration Syntax

# Test for syntax errors
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf

Expected output: Configuration OK

If errors:

  1. Review error message for line number
  2. Check for missing braces, quotes, commas
  3. Verify file paths match your directory decision
  4. Compare against documentation

Confirmation: Configuration syntax is valid.


Step 3.8: Restart Logstash Service

# Restart Logstash
sudo systemctl restart logstash

# Check service started successfully
sudo systemctl status logstash

# Wait for initialization
sleep 30

# Check for startup errors
sudo journalctl -u logstash -n 100 --no-pager | grep -i error

Expected:

  • Status: active (running)
  • No critical errors (warnings about missing files are OK initially)

Confirmation: Logstash restarted successfully.


Phase 4: Post-Deployment Verification

Step 4.1: Verify Pipeline Processing

# Check pipeline stats - events should be increasing
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'

# Check input plugins
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.inputs'

# Check for grok failures
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | {name, events_in: .events.in, events_out: .events.out, failures}'

Expected:

  • events.in and events.out are increasing
  • Input plugins show files being read
  • Grok failures < 1% of events

Confirmation: Pipeline processing events from multiple sources.
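
The "Grok failures < 1% of events" threshold can be turned into a pass/fail check once the two counters are known. This is a minimal sketch, assuming bash and `awk`; `grok_failure_ok` is a hypothetical helper name, and the counters are assumed to be read by hand from the `failures` and `events_in` fields printed by the grok query above.

```shell
# Hypothetical helper: print the grok failure percentage.
# Exit status: 0 if below 1%, 1 if at or above 1%, 2 if no events were seen yet.
# Arguments: failure count, events-in count.
grok_failure_ok() {
  awk -v f="$1" -v n="$2" 'BEGIN {
    if (n == 0) { print "no events yet"; exit 2 }
    pct = 100 * f / n
    printf "%.2f%%\n", pct
    exit (pct < 1 ? 0 : 1)
  }'
}

# Usage: with failures=5 and events_in=1000 taken from the stats output
# grok_failure_ok 5 1000 && echo "Grok: OK" || echo "Grok: review patterns"
```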


Step 4.2: Verify File Outputs Created

# Wait a few minutes for log generation
sleep 120

# Check files were created
ls -lh /var/log/logstash-operational/  # or /var/log/logstash/

# View sample logs
# (Logstash's %{+YYYY-MM-dd} is rendered in UTC, so use date -u to match the file name)
tail -20 /var/log/logstash-operational/pm2-workers-$(date -u +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/redis-operational-$(date -u +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/nginx-access-$(date -u +%Y-%m-%d).log

Expected:

  • Files exist with today's date
  • Files contain JSON-formatted log entries
  • Timestamps are recent

Confirmation: Operational logs being written successfully.


Step 4.3: Test Error Forwarding to Bugsink

# Check HTTP output stats (Bugsink forwarding)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.outputs[] | select(.name == "http") | {name, events_in: .events.in, events_out: .events.out}'

Manual check:

  1. Navigate to: https://bugsink.projectium.com
  2. Check Project 5 (production infrastructure) for recent events
  3. Check Project 6 (test infrastructure) for recent events

Confirmation: Errors forwarded to correct Bugsink projects.


Step 4.4: Monitor Logstash Performance

# Check memory usage
ps aux | grep logstash | grep -v grep

# Check disk usage
du -sh /var/log/logstash-operational/

# Monitor in real-time (Ctrl+C to exit)
sudo journalctl -u logstash -f

Expected:

  • Memory usage < 1.5GB (with 1GB heap)
  • Disk usage reasonable (< 100MB for first day)
  • No repeated errors

Confirmation: Performance is stable.


Step 4.5: Verify Environment Detection

# Check recent logs for environment tags
sudo journalctl -u logstash -n 500 | grep -E "(production|test)" | tail -20

# Check file outputs for correct tagging (file names use the UTC date)
grep -o '"environment":"[^"]*"' /var/log/logstash-operational/pm2-workers-$(date -u +%Y-%m-%d).log | sort | uniq -c

Expected:

  • Production worker logs tagged as "production"
  • Test worker logs tagged as "test"

Confirmation: Environment detection working correctly.


Step 4.6: Document Deployment

# Record deployment
echo "Extended Logstash Configuration deployed on $(date)" | sudo tee -a /var/log/deployments.log

# Record configuration version
sudo ls -lh /etc/logstash/conf.d/bugsink.conf

Confirmation: Deployment documented.


Phase 5: 24-Hour Monitoring Plan

Monitor these metrics over the next 24 hours:

Every 4 hours:

  1. Service health: systemctl status logstash
  2. Disk usage: du -sh /var/log/logstash-operational/
  3. Memory usage: ps aux | grep logstash | grep -v grep

Every 12 hours:

  1. Error rates: Check Bugsink projects 5 and 6
  2. Log file growth: ls -lh /var/log/logstash-operational/
  3. Pipeline stats: curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'

Rollback Procedure

If issues occur:

# Stop Logstash
sudo systemctl stop logstash

# Find latest backup
ls -lt /etc/logstash/conf.d/*.backup-* | head -1

# Restore backup (replace TIMESTAMP with actual timestamp)
sudo cp /etc/logstash/conf.d/bugsink.conf.backup-TIMESTAMP \
  /etc/logstash/conf.d/bugsink.conf

# Test restored config
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf

# Restart Logstash
sudo systemctl start logstash

# Verify status
systemctl status logstash

Quick Health Check

Run this anytime to verify deployment health:

# Chained health check (stops silently at the first failing step)
systemctl is-active logstash && \
  echo "Service: OK" && \
  ls /var/log/logstash-operational/*.log &>/dev/null && \
  echo "Logs: OK" && \
  curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq -e '.pipelines.main.events.in > 0' &>/dev/null && \
  echo "Processing: OK"

Expected output:

active
Service: OK
Logs: OK
Processing: OK
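
Because the chained form prints nothing for the step that fails, a variant that reports every check independently can be easier to read during an incident. This is a minimal sketch, assuming bash; `run_check` is a hypothetical helper name, and the three usage lines mirror the checks above.

```shell
# Hypothetical helper: run a command quietly and print "<label>: OK" or "<label>: FAIL".
run_check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "$label: OK"
  else
    echo "$label: FAIL"
    return 1
  fi
}

# Usage on the server (each check runs even if an earlier one failed):
# run_check Service systemctl is-active logstash
# run_check Logs sh -c 'ls /var/log/logstash-operational/*.log'
# run_check Processing sh -c \
#   'curl -s "http://localhost:9600/_node/stats/pipelines" | jq -e ".pipelines.main.events.in > 0"'
```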

Summary Checklist

After completing all steps:

  • Phase 1: Inspection complete, state recorded
  • Phase 2: Deployment decisions made
  • Phase 3: Configuration deployed
    • Backup created
    • Directory configured
    • Logrotate configured
    • Permissions granted
    • Config updated and tested
    • Service restarted
  • Phase 4: Verification complete
    • Pipeline processing
    • File outputs working
    • Errors forwarded to Bugsink
    • Performance stable
    • Environment detection working
  • Phase 5: Monitoring plan established

Deployment Status: [READY / IN PROGRESS / COMPLETE / ROLLED BACK]