# Production Deployment Checklist: Extended Logstash Configuration

**Important**: This checklist follows an **inspect-first, then-modify** approach. Each step checks the current state before making any change.

---

## Phase 1: Pre-Deployment Inspection

### Step 1.1: Verify Logstash Status

```bash
ssh root@projectium.com
systemctl status logstash
# Quote the URL so the shell does not treat "?" as a glob character
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'
```

**Record current state:**

- Status: [active/inactive]
- Events processed: [number]
- Memory usage: [amount]

**Expected**: Logstash should be active and processing PostgreSQL logs from ADR-050.

---

### Step 1.2: Inspect Existing Configuration Files

```bash
# List all configuration files
ls -alF /etc/logstash/conf.d/

# Check existing backups (if any)
ls -lh /etc/logstash/conf.d/*.backup-* 2>/dev/null || echo "No backups found"

# View current configuration
cat /etc/logstash/conf.d/bugsink.conf
```

**Record current state:**

- Configuration files present: [list]
- Existing backups: [list or "none"]
- Current config size: [bytes]

**Questions to answer:**

- ✅ Is there an existing `bugsink.conf`?
- ✅ Are there any existing backups?
- ✅ What inputs/filters/outputs are currently configured?

---

### Step 1.3: Inspect Log Output Directory

```bash
# Check if directory exists
ls -ld /var/log/logstash 2>/dev/null || echo "Directory does not exist"

# If exists, check contents
ls -alF /var/log/logstash/

# Check ownership and permissions
ls -ld /var/log/logstash
```

**Record current state:**

- Directory exists: [yes/no]
- Current ownership: [user:group]
- Current permissions: [drwx------]
- Existing files: [list]

**Questions to answer:**

- ✅ Does `/var/log/logstash/` already exist?
- ✅ What files are currently in it?
- ✅ Are these Logstash's own logs or our operational logs?

---

### Step 1.4: Check Logrotate Configuration

```bash
# Check if logrotate config exists
cat /etc/logrotate.d/logstash 2>/dev/null || echo "No logrotate config found"

# List all logrotate configs
ls -lh /etc/logrotate.d/ | grep logstash
```

**Record current state:**

- Logrotate config exists: [yes/no]
- Current rotation policy: [daily/weekly/none]

---

### Step 1.5: Check Logstash User Groups

```bash
# Check current group membership
groups logstash

# Verify which groups have access to required logs
ls -l /home/gitea-runner/.pm2/logs/*.log | head -3
ls -l /var/log/redis/redis-server.log
ls -l /var/log/nginx/access.log
ls -l /var/log/nginx/error.log
```

**Record current state:**

- Logstash groups: [list]
- PM2 log file group: [group]
- Redis log file group: [group]
- NGINX log file group: [group]

**Questions to answer:**

- ✅ Is logstash already in the `adm` group?
- ✅ Is logstash already in the `postgres` group?
- ✅ Can logstash currently read PM2 logs?

---

### Step 1.6: Test Log File Access (Current State)

```bash
# Note: 2>&1 is attached to cat (not head) so that permission errors are shown

# Test PM2 worker logs
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log 2>&1 | head -5

# Test PM2 analytics worker logs
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log 2>&1 | head -5

# Test Redis logs
sudo -u logstash cat /var/log/redis/redis-server.log 2>&1 | head -5

# Test NGINX access logs
sudo -u logstash cat /var/log/nginx/access.log 2>&1 | head -5

# Test NGINX error logs
sudo -u logstash cat /var/log/nginx/error.log 2>&1 | head -5
```

**Record current state:**

- PM2 worker logs accessible: [yes/no/error]
- PM2 analytics logs accessible: [yes/no/error]
- Redis logs accessible: [yes/no/error]
- NGINX access logs accessible: [yes/no/error]
- NGINX error logs accessible: [yes/no/error]

**If any fail**: Note the specific error message (permission denied, file not found, etc.)

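
The access tests above can also be condensed into a yes/no report per file. The following is a minimal sketch, not part of the original checklist: `report_access` is a hypothetical helper name, and it assumes `bash` is available to the `logstash` user. A path reported as `missing` may also sit behind a directory the user cannot enter. To evaluate the checks as `logstash` from the root shell, one option is `sudo -u logstash bash -c "$(declare -f report_access); report_access /var/log/nginx/access.log"`.

```bash
# Hypothetical helper (an illustration, not part of the original checklist).
# Prints one status line per path, judged for the user running the function:
#   readable <path>  the file exists and can be read
#   denied   <path>  the file exists but cannot be read
#   missing  <path>  the file does not exist, or a parent directory blocks access
report_access() {
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      printf 'missing  %s\n' "$path"
    elif [ -r "$path" ]; then
      printf 'readable %s\n' "$path"
    else
      printf 'denied   %s\n' "$path"
    fi
  done
}
```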
---

### Step 1.7: Check PM2 Log File Locations

```bash
# List all PM2 log files
ls -lh /home/gitea-runner/.pm2/logs/

# Check for production and test worker logs
ls -lh /home/gitea-runner/.pm2/logs/ | grep -E "(flyer-crawler-worker|flyer-crawler-analytics-worker)"
```

**Record current state:**

- Production worker logs present: [yes/no]
- Test worker logs present: [yes/no]
- Analytics worker logs present: [yes/no]
- File naming pattern: [describe pattern]

**Questions to answer:**

- ✅ Do the log file paths match what's in the new Logstash config?
- ✅ Are there separate logs for production vs test environments?

---

### Step 1.8: Check Disk Space

```bash
# Check available disk space
df -h /var/log/

# Check current size of Logstash logs
du -sh /var/log/logstash/

# Check size of PM2 logs
du -sh /home/gitea-runner/.pm2/logs/
```

**Record current state:**

- Available space on `/var/log`: [amount]
- Current Logstash log size: [amount]
- Current PM2 log size: [amount]

**Risk assessment:**

- ✅ Is there sufficient space for 30 days of rotated logs?
- ✅ Estimate: ~100 MB/day for new operational logs = ~3 GB for 30 days (before logrotate compression)

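
The estimate above is simple arithmetic, and it can be compared against the free space reported by `df`. This is a small sketch under stated assumptions: `estimate_retention_mb` and `has_headroom` are hypothetical helper names, and the 100 MB/day figure is the checklist's own estimate rather than a measured value. On a GNU system the available space in MB could be read with `df --output=avail -BM /var/log | tail -n 1 | tr -dc '0-9'` and passed as the first argument to `has_headroom`.

```bash
# Hypothetical helpers (illustration only).

# estimate_retention_mb DAILY_MB DAYS
# Prints the uncompressed total in MB for the retention window.
estimate_retention_mb() {
  echo $(( $1 * $2 ))
}

# has_headroom AVAILABLE_MB NEEDED_MB
# Succeeds when the available space covers the estimated need.
has_headroom() {
  [ "$1" -ge "$2" ]
}
```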
---

### Step 1.9: Review Bugsink Projects

```bash
# Check if Bugsink projects 5 and 6 exist
# (This requires accessing Bugsink UI or API)
echo "Manual check: Navigate to https://bugsink.projectium.com"
echo "Verify project IDs 5 and 6 exist and their names/DSNs"
```

**Record current state:**

- Project 5 exists: [yes/no]
- Project 5 name: [name]
- Project 6 exists: [yes/no]
- Project 6 name: [name]

**Questions to answer:**

- ✅ Do the project IDs in the new config match actual Bugsink projects?
- ✅ Are DSNs correct?

---

## Phase 2: Make Deployment Decisions

Based on Phase 1 inspection, answer these questions:

1. **Backup needed?**
   - Current config exists: [yes/no]
   - Decision: [create backup / no backup needed]

2. **Directory creation needed?**
   - `/var/log/logstash/` exists with correct permissions: [yes/no]
   - Decision: [create directory / fix permissions / no action needed]

3. **Logrotate config needed?**
   - Config exists: [yes/no]
   - Decision: [create config / update config / no action needed]

4. **Group membership needed?**
   - Logstash already in `adm` group: [yes/no]
   - Decision: [add to group / already member]

5. **Log file access issues?**
   - Any files inaccessible: [list files]
   - Decision: [fix permissions / fix group membership / no action needed]

---

## Phase 3: Execute Deployment

### Step 3.1: Create Configuration Backup

**Only if**: The configuration file exists and there is no recent backup.

```bash
# Create timestamped backup
sudo cp /etc/logstash/conf.d/bugsink.conf \
   /etc/logstash/conf.d/bugsink.conf.backup-$(date +%Y%m%d-%H%M%S)

# Verify backup
ls -lh /etc/logstash/conf.d/*.backup-*
```

**Confirmation**: ✅ Backup file created with timestamp.

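
The backup step can be wrapped so that the copy is also verified byte for byte before any edits begin. A minimal sketch, assuming it runs from a shell that may write to `/etc/logstash/conf.d/` (for example the root shell from Step 1.1); `backup_config` is a hypothetical helper name, and the timestamp format matches the one used above.

```bash
# Hypothetical helper (illustration only).
# backup_config FILE
# Copies FILE to FILE.backup-<YYYYmmdd-HHMMSS>, checks that the copy is
# identical to the source, and prints the backup path on success.
backup_config() {
  local src=$1
  local dest
  dest="${src}.backup-$(date +%Y%m%d-%H%M%S)"
  [ -f "$src" ] || { echo "No such file: $src" >&2; return 1; }
  cp -p "$src" "$dest" || return 1
  cmp -s "$src" "$dest" || { echo "Backup differs from source: $dest" >&2; return 1; }
  echo "$dest"
}
```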
---

### Step 3.2: Handle Log Output Directory

**Note**: The existing `/var/log/logstash/` contains Logstash's own operational logs (`logstash-plain.log`, etc.). You have two options:

**Option A**: Use a separate directory for our operational logs (recommended):

- Directory: `/var/log/logstash-operational/`
- Update config to use this path instead

**Option B**: Share the directory (requires careful logrotate config):

- Keep using `/var/log/logstash/`
- Ensure logrotate doesn't rotate our custom logs the same way as Logstash's own logs

**Decision**: [Choose Option A or B]

**Option A, if the directory doesn't exist:**

```bash
sudo mkdir -p /var/log/logstash-operational
sudo chown logstash:logstash /var/log/logstash-operational
sudo chmod 755 /var/log/logstash-operational
```

**Option B, if the directory exists but has wrong permissions:**

```bash
sudo chown logstash:logstash /var/log/logstash
sudo chmod 755 /var/log/logstash
```

**Verification:**

```bash
ls -ld /var/log/logstash-operational  # or /var/log/logstash
```

**Confirmation**: ✅ Directory exists with `drwxr-xr-x logstash logstash`.

---

### Step 3.3: Configure Logrotate

**Only if**: Logrotate config doesn't exist or needs updating.

**For Option A (separate directory):**

```bash
sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash-operational/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 logstash logstash
    sharedscripts
    postrotate
        # No reload needed - Logstash handles rotation automatically
    endscript
}
EOF
```

**For Option B (shared directory):**

```bash
sudo tee /etc/logrotate.d/logstash-operational <<'EOF'
/var/log/logstash/pm2-workers-*.log
/var/log/logstash/redis-operational-*.log
/var/log/logstash/nginx-access-*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 logstash logstash
    sharedscripts
    postrotate
        # No reload needed - Logstash handles rotation automatically
    endscript
}
EOF
```

**Verify configuration:**

```bash
sudo logrotate -d /etc/logrotate.d/logstash-operational
cat /etc/logrotate.d/logstash-operational
```

**Confirmation**: ✅ Logrotate config created, syntax check passes.

---

### Step 3.4: Grant Logstash Permissions

**Only if**: Logstash is not already in the `adm` group.

```bash
# Add logstash to adm group (for NGINX and system logs)
sudo usermod -a -G adm logstash

# Verify group membership
groups logstash
```

**Expected output**: `logstash : logstash adm postgres`

The running Logstash service picks up the new group only after the restart in Step 3.8.

**Confirmation**: ✅ Logstash user is in required groups.

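
In keeping with the inspect-first approach, the group change can be guarded so that `usermod` runs only when membership is actually missing. A minimal sketch: `in_group` is a hypothetical helper name built on `id -nG`, and from the root shell it might be used as `in_group logstash adm || usermod -a -G adm logstash`.

```bash
# Hypothetical helper (illustration only).
# in_group USER GROUP
# Succeeds when USER is a member of GROUP (exact name match).
in_group() {
  id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}
```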
---

### Step 3.5: Verify Log File Access (Post-Permission Changes)

**Only if**: Previous access tests failed.

```bash
# Re-test log file access
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-worker-*.log | head -5
sudo -u logstash cat /home/gitea-runner/.pm2/logs/flyer-crawler-analytics-worker-*.log | head -5
sudo -u logstash cat /var/log/redis/redis-server.log | head -5
sudo -u logstash cat /var/log/nginx/access.log | head -5
sudo -u logstash cat /var/log/nginx/error.log | head -5
```

**Confirmation**: ✅ All log files now readable without errors.

---

### Step 3.6: Update Logstash Configuration

**Important**: Before pasting, adjust the file output paths based on your directory decision.

```bash
# Open configuration file
sudo nano /etc/logstash/conf.d/bugsink.conf
```

**Paste the complete configuration from `docs/BARE-METAL-SETUP.md`.**

**If using Option A (separate directory)**, update these lines in the config:

```ruby
# Change this:
path => "/var/log/logstash/pm2-workers-%{+YYYY-MM-dd}.log"

# To this:
path => "/var/log/logstash-operational/pm2-workers-%{+YYYY-MM-dd}.log"

# (Repeat for redis-operational and nginx-access file outputs)
```

**Save and exit**: Ctrl+X, Y, Enter

---

### Step 3.7: Test Configuration Syntax

```bash
# Test for syntax errors
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
```

**Expected output**: `Configuration OK`

**If errors:**

1. Review the error message for the line number
2. Check for missing braces, quotes, commas
3. Verify file paths match your directory decision
4. Compare against documentation

**Confirmation**: ✅ Configuration syntax is valid.

---

### Step 3.8: Restart Logstash Service

```bash
# Restart Logstash
sudo systemctl restart logstash

# Check service started successfully
sudo systemctl status logstash

# Wait for initialization
sleep 30

# Check for startup errors
sudo journalctl -u logstash -n 100 --no-pager | grep -i error
```

**Expected**:

- Status: `active (running)`
- No critical errors (warnings about missing files are OK initially)

**Confirmation**: ✅ Logstash restarted successfully.

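
As an alternative to the fixed 30-second wait, the monitoring API on port 9600 (the same endpoint queried in Step 1.1) can be polled until it answers. This is a sketch under stated assumptions: `wait_until` is a hypothetical helper name, and the attempt count and delay are illustrative. One possible call is `wait_until 30 2 curl -sf -o /dev/null http://localhost:9600/`.

```bash
# Hypothetical helper (illustration only).
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Succeeds as soon as COMMAND succeeds; fails if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    # Do not sleep after the final attempt.
    if (( i < attempts )); then
      sleep "$delay"
    fi
  done
  return 1
}
```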
---

## Phase 4: Post-Deployment Verification

### Step 4.1: Verify Pipeline Processing

```bash
# Check pipeline stats - events should be increasing
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'

# Check input plugins
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.inputs'

# Check for grok failures
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.filters[] | select(.name == "grok") | {name, events_in: .events.in, events_out: .events.out, failures}'
```

**Expected**:

- `events.in` and `events.out` are increasing
- Input plugins show files being read
- Grok failures < 1% of events

**Confirmation**: ✅ Pipeline processing events from multiple sources.

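
The 1% threshold above can be checked with integer arithmetic once the two counters have been read from the stats output. A minimal sketch: `grok_failures_ok` is a hypothetical helper name, and its arguments are assumed to be the `failures` and `events_in` values printed by the grok query above.

```bash
# Hypothetical helper (illustration only).
# grok_failures_ok FAILURES EVENTS_IN
# Succeeds when FAILURES is strictly less than 1% of EVENTS_IN.
grok_failures_ok() {
  local failures=$1 events=$2
  # With no events there is nothing to judge, so report "not OK".
  (( events > 0 )) || return 1
  # Integer form of: failures / events < 0.01
  (( failures * 100 < events ))
}
```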
---

### Step 4.2: Verify File Outputs Created

```bash
# Wait a few minutes for log generation
sleep 120

# Check files were created
ls -lh /var/log/logstash-operational/  # or /var/log/logstash/

# View sample logs
tail -20 /var/log/logstash-operational/pm2-workers-$(date +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/redis-operational-$(date +%Y-%m-%d).log
tail -20 /var/log/logstash-operational/nginx-access-$(date +%Y-%m-%d).log
```

**Expected**:

- Files exist with today's date
- Files contain JSON-formatted log entries
- Timestamps are recent

**Confirmation**: ✅ Operational logs being written successfully.

---

### Step 4.3: Test Error Forwarding to Bugsink

```bash
# Check HTTP output stats (Bugsink forwarding)
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.plugins.outputs[] | select(.name == "http") | {name, events_in: .events.in, events_out: .events.out}'
```

**Manual check**:

1. Navigate to: https://bugsink.projectium.com
2. Check Project 5 (production infrastructure) for recent events
3. Check Project 6 (test infrastructure) for recent events

**Confirmation**: ✅ Errors forwarded to correct Bugsink projects.

---

### Step 4.4: Monitor Logstash Performance

```bash
# Check memory usage
ps aux | grep logstash | grep -v grep

# Check disk usage
du -sh /var/log/logstash-operational/

# Monitor in real-time (Ctrl+C to exit)
sudo journalctl -u logstash -f
```

**Expected**:

- Memory usage < 1.5GB (with 1GB heap)
- Disk usage reasonable (< 100MB for first day)
- No repeated errors

**Confirmation**: ✅ Performance is stable.

---

### Step 4.5: Verify Environment Detection

```bash
# Check recent logs for environment tags
sudo journalctl -u logstash -n 500 | grep -E "(production|test)" | tail -20

# Check file outputs for correct tagging
grep -o '"environment":"[^"]*"' /var/log/logstash-operational/pm2-workers-$(date +%Y-%m-%d).log | sort | uniq -c
```

**Expected**:

- Production worker logs tagged as "production"
- Test worker logs tagged as "test"

**Confirmation**: ✅ Environment detection working correctly.

---

### Step 4.6: Document Deployment

```bash
# Record deployment
echo "Extended Logstash Configuration deployed on $(date)" | sudo tee -a /var/log/deployments.log

# Record configuration version
sudo ls -lh /etc/logstash/conf.d/bugsink.conf
```

**Confirmation**: ✅ Deployment documented.

---

## Phase 5: 24-Hour Monitoring Plan

Monitor these metrics over the next 24 hours:

**Every 4 hours:**

1. **Service health**: `systemctl status logstash`
2. **Disk usage**: `du -sh /var/log/logstash-operational/`
3. **Memory usage**: `ps aux | grep logstash | grep -v grep`

**Every 12 hours:**

1. **Error rates**: Check Bugsink projects 5 and 6
2. **Log file growth**: `ls -lh /var/log/logstash-operational/`
3. **Pipeline stats**: `curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq '.pipelines.main.events'`

---

## Rollback Procedure

**If issues occur:**

```bash
# Stop Logstash
sudo systemctl stop logstash

# Find latest backup
ls -lt /etc/logstash/conf.d/*.backup-* | head -1

# Restore backup (replace TIMESTAMP with actual timestamp)
sudo cp /etc/logstash/conf.d/bugsink.conf.backup-TIMESTAMP \
   /etc/logstash/conf.d/bugsink.conf

# Test restored config
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf

# Restart Logstash
sudo systemctl start logstash

# Verify status
systemctl status logstash
```

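
Because the backup names from Step 3.1 end in a `YYYYmmdd-HHMMSS` timestamp, the newest backup can also be picked by sorting the names rather than copying the timestamp by hand. A minimal sketch: `latest_backup` is a hypothetical helper name, and it assumes every backup follows that naming scheme. From a root shell, the restore could then read `cp "$(latest_backup /etc/logstash/conf.d/bugsink.conf)" /etc/logstash/conf.d/bugsink.conf`.

```bash
# Hypothetical helper (illustration only).
# latest_backup CONFIG_FILE
# Prints the newest CONFIG_FILE.backup-<timestamp> path, chosen by name order.
# Fails when no backup exists.
latest_backup() {
  local matches=( "$1".backup-* )
  # An unmatched glob stays literal, so confirm the first entry really exists.
  [ -e "${matches[0]}" ] || return 1
  printf '%s\n' "${matches[@]}" | sort | tail -n 1
}
```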
---

## Quick Health Check

Run this anytime to verify deployment health:

```bash
# Chained health check: stops at the first failing step
systemctl is-active logstash && \
echo "Service: OK" && \
ls /var/log/logstash-operational/*.log &>/dev/null && \
echo "Logs: OK" && \
curl -s 'http://localhost:9600/_node/stats/pipelines?pretty' | jq -e '.pipelines.main.events.in > 0' &>/dev/null && \
echo "Processing: OK"
```

Expected output:

```
active
Service: OK
Logs: OK
Processing: OK
```

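
The chained command above stops silently at the first failing step, so a failure shows up only as missing lines. A variant that reports every check on its own line might look like the sketch below: `check` is a hypothetical helper name, and each call wraps one of the commands already used above, for example `check "Service" systemctl is-active logstash`.

```bash
# Hypothetical helper (illustration only).
# check LABEL COMMAND...
# Runs COMMAND with its output discarded and prints "LABEL: OK" or "LABEL: FAIL".
check() {
  local label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "$label: OK"
  else
    echo "$label: FAIL"
    return 1
  fi
}
```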
---

## Summary Checklist

After completing all steps:

- ✅ Phase 1: Inspection complete, state recorded
- ✅ Phase 2: Deployment decisions made
- ✅ Phase 3: Configuration deployed
  - ✅ Backup created
  - ✅ Directory configured
  - ✅ Logrotate configured
  - ✅ Permissions granted
  - ✅ Config updated and tested
  - ✅ Service restarted
- ✅ Phase 4: Verification complete
  - ✅ Pipeline processing
  - ✅ File outputs working
  - ✅ Errors forwarded to Bugsink
  - ✅ Performance stable
  - ✅ Environment detection working
- ✅ Phase 5: Monitoring plan established

**Deployment Status**: [READY / IN PROGRESS / COMPLETE / ROLLED BACK]