PM2 Crash Debugging Guide
Overview
This guide helps diagnose PM2 daemon crashes and identify which project is causing the issue.
Common Symptoms
- PM2 processes disappear between deployments
- `ENOENT: no such file or directory, uv_cwd` errors in PM2 logs
- Processes require `pm2 resurrect` after deployments
- PM2 daemon restarts unexpectedly
Root Cause
PM2 processes crash when their working directory (CWD) is deleted or modified while they're running. This typically happens when:
- rsync --delete removes/recreates directories while processes are active
- npm install modifies node_modules while processes are using them
- Deployments don't stop processes before file operations
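The failure can be reproduced safely outside the production trees. The sketch below uses a throwaway directory and a dummy process name (`cwd-demo`); both are illustrative, not part of the real deployments:
# Minimal reproduction sketch (throwaway directory and process name are illustrative)
mkdir -p /tmp/cwd-demo && cd /tmp/cwd-demo
echo 'setInterval(() => {}, 1000)' > idle.js
pm2 start idle.js --name cwd-demo         # process whose CWD is /tmp/cwd-demo
rm -rf /tmp/cwd-demo                      # simulate rsync --delete removing the tree
pm2 restart cwd-demo                      # PM2 must now resolve a CWD that no longer exists
grep -iE "ENOENT|uv_cwd" ~/.pm2/pm2.log   # same errors listed under Common Symptoms
pm2 delete cwd-demo                       # clean up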
Debugging Tools
1. PM2 Diagnostics Workflow
Run the comprehensive diagnostics workflow:
# In Gitea Actions UI:
# 1. Go to Actions → "PM2 Diagnostics"
# 2. Click "Run workflow"
# 3. Choose monitoring duration (default: 60s)
This workflow captures:
- Current PM2 state
- Working directory validation
- PM2 daemon logs
- All PM2-managed projects
- Crash patterns
- Deployment script analysis
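For reference, the monitoring portion of that workflow amounts to periodically snapshotting PM2 state for the chosen duration. A rough bash equivalent (the interval and variable names here are illustrative, not the workflow's exact steps):
# Illustrative equivalent of the workflow's monitoring loop
DURATION="${1:-60}"                      # corresponds to the "monitoring duration" input
END=$(( $(date +%s) + DURATION ))
while [ "$(date +%s)" -lt "$END" ]; do
  date -Is
  pm2 jlist | jq -r '.[] | "\(.name)\t\(.pm2_env.status)\t\(.pm2_env.pm_cwd)"'
  sleep 5
done
tail -50 ~/.pm2/pm2.log                  # daemon log captured at the end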
2. PM2 Crash Analysis Script
Run the crash analysis script on the server:
# SSH to server
ssh gitea-runner@projectium.com
# Run analysis
cd /var/www/flyer-crawler.projectium.com
bash scripts/analyze-pm2-crashes.sh
# Or save to file
bash scripts/analyze-pm2-crashes.sh > pm2-crash-report.txt
3. Manual PM2 Inspection
Quick manual checks:
# Current PM2 state
pm2 list
# Detailed JSON state
pm2 jlist | jq '.'
# Check for missing CWDs
pm2 jlist | jq -r '.[] | "\(.name): \(.pm2_env.pm_cwd)"' | while read line; do
PROC=$(echo "$line" | cut -d: -f1)
CWD=$(echo "$line" | cut -d: -f2- | xargs)
[ -d "$CWD" ] && echo "✅ $PROC" || echo "❌ $PROC (CWD missing: $CWD)"
done
# View PM2 daemon log
tail -100 ~/.pm2/pm2.log
# Search for ENOENT errors
grep -i "ENOENT\|uv_cwd" ~/.pm2/pm2.log
Identifying the Problematic Project
Check Which Projects Share PM2 Daemon
pm2 list
# Group by project
pm2 jlist | jq -r '.[] | .name' | grep -oE "^[a-z-]+" | sort -u
Projects on projectium.com:
- `flyer-crawler` (production, test)
- `stock-alert` (production, test)
- Others?
Check Deployment Timing
- Review PM2 daemon restart times: `grep "New PM2 Daemon started" ~/.pm2/pm2.log`
- Compare with deployment times in Gitea Actions
- Identify which deployment triggered the crash
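A quick side-by-side view helps here. The snippet below prints recent daemon starts next to the last modification time of each deploy target (directory mtime is only a rough proxy for deployment time):
# Daemon restarts vs. last-deploy times of each project tree
echo "--- PM2 daemon starts ---"
grep "New PM2 Daemon started" ~/.pm2/pm2.log | tail -5
echo "--- Last modification of each deploy target ---"
stat -c '%y  %n' /var/www/flyer-crawler.projectium.com /var/www/stock-alert.projectium.com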
Check Deployment Scripts
For each project, check if deployment stops PM2 before rsync:
# Flyer-crawler
cat /var/www/flyer-crawler.projectium.com/.gitea/workflows/deploy-to-prod.yml | grep -B5 -A5 "rsync.*--delete"
# Stock-alert
cat /var/www/stock-alert.projectium.com/.gitea/workflows/deploy-to-prod.yml | grep -B5 -A5 "rsync.*--delete"
Look for:
- ❌ `rsync --delete` before `pm2 stop`
- ✅ `pm2 stop` before `rsync --delete`
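A small helper can make this check mechanical. The function below is a hypothetical convenience, not part of either repo; it only compares the line order of the first `pm2 stop` and the first `rsync --delete` in a workflow file:
# Hypothetical order check: does pm2 stop come before rsync --delete?
check_order() {
  local file="$1"
  local stop_line rsync_line
  stop_line=$(grep -n "pm2 stop" "$file" | head -1 | cut -d: -f1)
  rsync_line=$(grep -n "rsync.*--delete" "$file" | head -1 | cut -d: -f1)
  if [ -n "$stop_line" ] && [ -n "$rsync_line" ] && [ "$stop_line" -lt "$rsync_line" ]; then
    echo "✅ $file: pm2 stop runs before rsync --delete"
  else
    echo "❌ $file: rsync --delete may run while processes are up"
  fi
}
check_order /var/www/flyer-crawler.projectium.com/.gitea/workflows/deploy-to-prod.yml
check_order /var/www/stock-alert.projectium.com/.gitea/workflows/deploy-to-prod.yml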
Common Culprits
1. Flyer-Crawler Deployments
Before Fix:
# ❌ BAD - Deploys files while processes running
- name: Deploy Application
run: |
rsync --delete ./ /var/www/...
pm2 restart ...
After Fix:
# ✅ GOOD - Stops processes first
- name: Deploy Application
run: |
pm2 stop flyer-crawler-api flyer-crawler-worker
rsync --delete ./ /var/www/...
pm2 startOrReload ...
2. Stock-Alert Deployments
Check if stock-alert follows the same pattern. If it deploys without stopping PM2, it could crash the shared PM2 daemon.
3. Cross-Project Interference
If multiple projects share PM2:
- One project's deployment can crash another project's processes
- The crashed project's processes lose their CWD
- PM2 daemon may restart, clearing all processes
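To confirm that the projects really do share one daemon: everything running under the same user (and the same `PM2_HOME`, by default `~/.pm2`) talks to a single daemon socket.
# One pid file and one RPC socket for every project under this user
ls -l ~/.pm2/pm2.pid ~/.pm2/rpc.sock
# Apps from every project appear in the same list
pm2 list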
Solutions
Immediate Fix (Manual)
# Restore processes from dump file
pm2 resurrect
# Verify all processes are running
pm2 list
Permanent Fix
- Update deployment workflows to stop PM2 before file operations
- Isolate PM2 daemons by user or namespace
- Monitor deployments to ensure proper sequencing
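For the isolation option, PM2 keys its daemon on `PM2_HOME`, so giving each project its own `PM2_HOME` yields independent daemons. A sketch (the per-project `.pm2` paths are illustrative; they would also need to be set in the deploy workflows and any startup integration):
# Each project gets its own daemon by pointing PM2_HOME at a per-project directory
export PM2_HOME=/var/www/flyer-crawler.projectium.com/.pm2
pm2 startOrReload ecosystem.config.cjs && pm2 save

export PM2_HOME=/var/www/stock-alert.projectium.com/.pm2
pm2 startOrReload ecosystem.config.cjs && pm2 save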
Deployment Workflow Template
Correct sequence:
- name: Deploy Application
run: |
# 1. STOP PROCESSES FIRST
pm2 stop my-api my-worker
# 2. THEN deploy files
rsync -avz --delete ./ /var/www/my-app/
# 3. Install dependencies (safe, no processes running)
cd /var/www/my-app
npm install --omit=dev
# 4. Clean up errored processes
pm2 delete my-api my-worker || true
# 5. START processes
pm2 startOrReload ecosystem.config.cjs
pm2 save
Monitoring & Prevention
Enable Verbose Logging
Enhanced deployment logging (already implemented in flyer-crawler):
- name: Deploy Application
run: |
set -x # Command tracing
echo "Step 1: Stopping PM2..."
pm2 stop ...
pm2 list # Verify stopped
echo "Step 2: Deploying files..."
rsync --delete ...
echo "Step 3: Starting PM2..."
pm2 start ...
pm2 list # Verify started
Regular Health Checks
# Add to cron or monitoring system
*/5 * * * * pm2 jlist | jq -r '.[] | select(.pm2_env.status != "online") | "ALERT: \(.name) is \(.pm2_env.status)"'
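If the cron output should feed a monitor rather than just mail, a tiny wrapper that exits non-zero on any unhealthy process works with most alerting setups. The script name and behaviour below are a suggestion, not an existing file:
#!/usr/bin/env bash
# pm2-health.sh (hypothetical): exit 1 if any PM2 process is not "online"
set -euo pipefail
BAD=$(pm2 jlist | jq -r '.[] | select(.pm2_env.status != "online") | "\(.name) is \(.pm2_env.status)"')
if [ -n "$BAD" ]; then
  printf 'ALERT:\n%s\n' "$BAD"
  exit 1
fi
echo "All PM2 processes online"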
Troubleshooting Decision Tree
PM2 processes missing?
├─ YES → Run `pm2 resurrect`
│ └─ Check PM2 daemon log for ENOENT errors
│ ├─ ENOENT found → Working directory deleted during deployment
│ │ └─ Fix: Add `pm2 stop` before rsync
│ └─ No ENOENT → Check other error patterns
│
└─ NO → Processes running but unstable?
└─ Check restart counts: `pm2 jlist | jq '.[].pm2_env.restart_time'`
└─ High restarts → Application-level issue (not PM2 crash)
Quick Reference Commands
# Diagnose
pm2 list # Current state
pm2 jlist | jq '.' # Detailed JSON
tail -100 ~/.pm2/pm2.log # Recent logs
grep ENOENT ~/.pm2/pm2.log # Find crashes
# Fix
pm2 resurrect # Restore from dump
pm2 restart all # Restart everything
pm2 save # Save current state
# Analyze
bash scripts/analyze-pm2-crashes.sh # Run analysis script
pm2 jlist | jq -r '.[].pm2_env.pm_cwd' # Check working dirs