Compare commits
34 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | d21496e8b2 | | |
| | dfeb2f1c3d | | |
| | 209d7ceba1 | | |
| | 645c1784b7 | | |
| | e02716c092 | | |
| | 66e6d2fdbc | | |
| | 82a38b4e2a | | |
| | f6f4415aeb | | |
| | 0c23aa4c5e | | |
| | 07125fc99d | | |
| | 626aa80799 | | |
| | 4025f29c5c | | |
| | e9e3b14050 | | |
| | 507e89ea4e | | |
| | 1efe42090b | | |
| | 97cc14288b | | |
| | 96251ec2cc | | |
| | fe79522ea4 | | |
| | 743216ef1b | | |
| | c53295a371 | | |
| | c18efb1b60 | | |
| | 822805e4c4 | | |
| | 6fd690890b | | |
| | 5fd836190c | | |
| | 441467eb8a | | |
| | 59bfc859d7 | | |
| | b989405a53 | | |
| | 6af2533e9e | | |
| | f434a5846a | | |
| | aea368677f | | |
| | cd8ee92813 | | |
| | cf2cc5b832 | | |
| | d2db3562bb | | |
| | 0532b4b22e | | |
93
.gitattributes
vendored
Normal file
@@ -0,0 +1,93 @@
# .gitattributes
#
# Optimize Gitea performance by excluding generated and vendored files
# from language statistics and indexing.
#
# See: https://github.com/github/linguist/blob/master/docs/overrides.md

# =============================================================================
# Vendored Dependencies
# =============================================================================
node_modules/** linguist-vendored

# =============================================================================
# Generated Files - Coverage Reports
# =============================================================================
coverage/** linguist-generated
.coverage/** linguist-generated
public/coverage/** linguist-generated
.nyc_output/** linguist-generated

# =============================================================================
# Generated Files - Build Artifacts
# =============================================================================
dist/** linguist-generated
build/** linguist-generated

# =============================================================================
# Generated Files - Test Results
# =============================================================================
test-results/** linguist-generated
playwright-report/** linguist-generated
playwright-report-visual/** linguist-generated
.vitest-results/** linguist-generated

# =============================================================================
# Generated Files - TSOA OpenAPI Spec & Routes
# =============================================================================
src/routes/routes.ts linguist-generated
public/swagger.json linguist-generated

# =============================================================================
# Documentation Files
# =============================================================================
*.md linguist-documentation

# =============================================================================
# Line Ending Normalization
# =============================================================================
# Ensure consistent line endings across platforms
* text=auto

# Shell scripts should always use LF
*.sh text eol=lf

# Windows batch files should use CRLF
*.bat text eol=crlf
*.cmd text eol=crlf

# SQL files should use LF
*.sql text eol=lf

# Configuration files
*.json text
*.yml text
*.yaml text
*.toml text
*.ini text

# Source code
*.ts text
*.tsx text
*.js text
*.jsx text
*.cjs text
*.mjs text
*.css text
*.scss text
*.html text

# =============================================================================
# Binary Files (explicit binary to prevent corruption)
# =============================================================================
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.ico binary
*.pdf binary
*.woff binary
*.woff2 binary
*.ttf binary
*.eot binary
*.otf binary
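Review note: the rules in this `.gitattributes` can be spot-checked with `git check-attr`, which prints the value Git resolves for each attribute on a given path. The sketch below is only an illustration; `check_attrs` and the sample paths are hypothetical, and the paths do not need to exist on disk.

```shell
#!/usr/bin/env bash
# Sketch: show which attributes Git resolves for a few sample paths.
# `check_attrs` is a hypothetical helper, not part of this change set.

check_attrs() {
  # Prints one "<path>: <attribute>: <value>" line per attribute and path.
  local path
  for path in "$@"; do
    git check-attr text eol linguist-generated linguist-vendored -- "$path"
  done
}

# Example, run from the repository root (sample paths are assumptions):
#   check_attrs deploy.sh dist/app.js node_modules/pkg/index.js
```

A value of `set` means the attribute applies, `unspecified` means no rule matched, and `eol` reports the configured line ending.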
@@ -86,6 +86,12 @@ jobs:
echo "✅ Schema is up to date. No changes detected."
fi

- name: Generate TSOA OpenAPI Spec and Routes
run: |
echo "Generating TSOA OpenAPI specification and route handlers..."
npm run tsoa:build
echo "✅ TSOA files generated successfully"

- name: Build React Application for Production
# Source Maps (ADR-015): If SENTRY_AUTH_TOKEN is set, the @sentry/vite-plugin will:
# 1. Generate hidden source maps during build
@@ -119,18 +125,82 @@ jobs:

- name: Deploy Application to Production Server
run: |
echo "Deploying application files to /var/www/flyer-crawler.projectium.com..."
echo "========================================="
echo "DEPLOYING TO PRODUCTION SERVER"
echo "========================================="
APP_PATH="/var/www/flyer-crawler.projectium.com"

# CRITICAL: Stop PM2 processes BEFORE deploying files to prevent CWD errors
echo "--- Stopping production PM2 processes before file deployment ---"
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker || echo "No production processes to stop"
# ========================================
# LAYER 1: PRE-FLIGHT SAFETY CHECKS
# ========================================
echo ""
echo "--- Pre-Flight Safety Checks ---"

# Check 1: Verify we're in a git repository
if ! git rev-parse --git-dir > /dev/null 2>&1; then
echo "❌ FATAL: Not in a git repository! Aborting to prevent data loss."
exit 1
fi
echo "✅ Git repository verified"

# Check 2: Verify critical files exist before deployment
if [ ! -f "package.json" ] || [ ! -f "server.ts" ]; then
echo "❌ FATAL: Critical files missing (package.json or server.ts). Aborting."
exit 1
fi
echo "✅ Critical files verified"

# Check 3: Verify we have actual content to deploy (prevent empty checkout)
FILE_COUNT=$(find . -type f | wc -l)
if [ "$FILE_COUNT" -lt 10 ]; then
echo "❌ FATAL: Suspiciously few files ($FILE_COUNT). Aborting to prevent catastrophic deletion."
exit 1
fi
echo "✅ File count verified: $FILE_COUNT files ready to deploy"

# ========================================
# LAYER 2: STOP PM2 BEFORE FILE OPERATIONS
# ========================================
echo ""
echo "--- Stopping PM2 Processes ---"
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker --namespace flyer-crawler-prod || echo "No production processes to stop"
pm2 list --namespace flyer-crawler-prod

# ========================================
# LAYER 3: SAFE RSYNC WITH COMPREHENSIVE EXCLUDES
# ========================================
echo ""
echo "--- Deploying Application Files ---"
mkdir -p "$APP_PATH"
mkdir -p "$APP_PATH/flyer-images/icons" "$APP_PATH/flyer-images/archive"
rsync -avz --delete --exclude 'node_modules' --exclude '.git' --exclude 'dist' --exclude 'flyer-images' ./ "$APP_PATH/"
rsync -avz dist/ "$APP_PATH"
echo "Application deployment complete."

# Deploy backend with critical file exclusions
rsync -avz --delete \
--exclude 'node_modules' \
--exclude '.git' \
--exclude 'dist' \
--exclude 'flyer-images' \
--exclude 'ecosystem.config.cjs' \
--exclude 'ecosystem-test.config.cjs' \
--exclude 'ecosystem.dev.config.cjs' \
--exclude '.env.*' \
--exclude 'coverage' \
--exclude '.coverage' \
--exclude 'test-results' \
--exclude 'playwright-report' \
--exclude 'playwright-report-visual' \
./ "$APP_PATH/" 2>&1 | tail -20

echo "✅ Backend files deployed ($(find "$APP_PATH" -type f | wc -l) files)"

# Deploy frontend assets
rsync -avz dist/ "$APP_PATH" 2>&1 | tail -10
echo "✅ Frontend assets deployed"

echo ""
echo "========================================="
echo "DEPLOYMENT COMPLETE"
echo "========================================="
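Review note: in this step both `rsync` calls are piped into `tail`. Unless `pipefail` is enabled for the step's shell, a pipeline reports the exit status of its last command, so a failing `rsync` could still be followed by the success messages. The sketch below assumes the step runs under bash; `run_with_tail` is a hypothetical helper name, not something this workflow defines.

```shell
#!/usr/bin/env bash
# Sketch: trim a command's output with tail while still returning the
# command's own exit status. Requires bash because it uses PIPESTATUS.

run_with_tail() {
  # $1 = number of trailing lines to keep; remaining args = command to run.
  local lines="$1"
  shift
  local status
  "$@" 2>&1 | tail -n "$lines"
  status="${PIPESTATUS[0]}"   # status of the command itself, not of tail
  return "$status"
}

# Possible use in the deploy step (flags copied from the diff above):
#   run_with_tail 10 rsync -avz dist/ "$APP_PATH" || exit 1
```

Adding `set -o pipefail` at the top of the script is a simpler alternative when every pipeline in the step should fail on any failing stage.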

- name: Log Workflow Metadata
run: |
@@ -183,7 +253,7 @@ jobs:

# === PRE-CLEANUP PM2 STATE LOGGING ===
echo "=== PRE-CLEANUP PM2 STATE ==="
pm2 jlist
pm2 jlist --namespace flyer-crawler-prod
echo "=== END PRE-CLEANUP STATE ==="

# --- Cleanup Errored Processes with Defense-in-Depth Safeguards ---
@@ -191,7 +261,7 @@ jobs:
node -e "
const exec = require('child_process').execSync;
try {
const list = JSON.parse(exec('pm2 jlist').toString());
const list = JSON.parse(exec('pm2 jlist --namespace flyer-crawler-prod').toString());
const prodProcesses = ['flyer-crawler-api', 'flyer-crawler-worker', 'flyer-crawler-analytics-worker'];

// Filter for processes that match our criteria
@@ -219,7 +289,7 @@ jobs:
targetProcesses.forEach(p => {
console.log('Deleting ' + p.pm2_env.status + ' production process: ' + p.name + ' (' + p.pm2_env.pm_id + ')');
try {
exec('pm2 delete ' + p.pm2_env.pm_id);
exec('pm2 delete ' + p.pm2_env.pm_id + ' --namespace flyer-crawler-prod');
} catch(e) {
console.error('Failed to delete ' + p.pm2_env.pm_id);
}
@@ -231,9 +301,13 @@ jobs:
}
"

# Save PM2 process list after cleanup to persist deletions
echo "Saving PM2 process list after cleanup..."
pm2 save --namespace flyer-crawler-prod

# === POST-CLEANUP VERIFICATION ===
echo "=== POST-CLEANUP VERIFICATION ==="
pm2 jlist | node -e "
pm2 jlist --namespace flyer-crawler-prod | node -e "
try {
const list = JSON.parse(require('fs').readFileSync(0, 'utf-8'));
const prodProcesses = list.filter(p => p.name && p.name.startsWith('flyer-crawler-') && !p.name.endsWith('-test') && !p.name.endsWith('-dev'));
@@ -257,7 +331,7 @@ jobs:

# Get the running version from PM2 for the main API process
# We use a small node script to parse the JSON output from pm2 jlist
RUNNING_VERSION=$(pm2 jlist | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.version : ''); } catch(e) { console.log(''); }")
RUNNING_VERSION=$(pm2 jlist --namespace flyer-crawler-prod | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.version : ''); } catch(e) { console.log(''); }")
echo "Running PM2 Version: $RUNNING_VERSION"

if [ "${{ gitea.event.inputs.force_reload }}" == "true" ] || [ "$NEW_VERSION" != "$RUNNING_VERSION" ] || [ -z "$RUNNING_VERSION" ]; then
@@ -266,7 +340,7 @@ jobs:
else
echo "Version mismatch (Running: $RUNNING_VERSION -> Deployed: $NEW_VERSION) or app not running. Reloading PM2..."
fi
pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save
pm2 startOrReload ecosystem.config.cjs --update-env --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
echo "Production backend server reloaded successfully."
else
echo "Version $NEW_VERSION is already running. Skipping PM2 reload."
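Review note: the reload condition in this workflow combines three checks (forced reload, no running version, version mismatch). A minimal sketch of the same decision as a standalone function follows; `should_reload` is a hypothetical name, and it uses the POSIX `=` operator rather than the `==` seen in the workflow.

```shell
#!/usr/bin/env bash
# Sketch of the reload decision from the workflow above.
# `should_reload` is a hypothetical helper, not part of this change set.

should_reload() {
  # $1 = force flag ("true" forces a reload)
  # $2 = version being deployed
  # $3 = version reported by the running process (may be empty)
  [ "$1" = "true" ] || [ -z "$3" ] || [ "$2" != "$3" ]
}

# Example:
#   if should_reload "$FORCE" "$NEW_VERSION" "$RUNNING_VERSION"; then
#     echo "reload needed"
#   fi
```

Keeping the condition in one function makes it easy to exercise each branch without touching PM2.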
@@ -296,14 +370,14 @@ jobs:
sleep 5 # Wait a few seconds for the app to start and log its output.

# Resolve the PM2 ID dynamically to ensure we target the correct process
PM2_ID=$(pm2 jlist | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.pm_id : ''); } catch(e) { console.log(''); }")
PM2_ID=$(pm2 jlist --namespace flyer-crawler-prod | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.pm_id : ''); } catch(e) { console.log(''); }")

if [ -n "$PM2_ID" ]; then
echo "Found process ID: $PM2_ID"
pm2 describe "$PM2_ID" || echo "Failed to describe process $PM2_ID"
pm2 logs "$PM2_ID" --lines 20 --nostream || echo "Failed to get logs for $PM2_ID"
pm2 env "$PM2_ID" || echo "Failed to get env for $PM2_ID"
pm2 describe "$PM2_ID" --namespace flyer-crawler-prod || echo "Failed to describe process $PM2_ID"
pm2 logs "$PM2_ID" --lines 20 --nostream --namespace flyer-crawler-prod || echo "Failed to get logs for $PM2_ID"
pm2 env "$PM2_ID" --namespace flyer-crawler-prod || echo "Failed to get env for $PM2_ID"
else
echo "Could not find process 'flyer-crawler-api' in pm2 list."
pm2 list # Fallback to listing everything to help debug
pm2 list --namespace flyer-crawler-prod # Fallback to listing everything to help debug
fi
File diff suppressed because it is too large
@@ -57,8 +57,9 @@ jobs:
- name: Step 1 - Stop Application Server
run: |
echo "Stopping PRODUCTION PM2 processes to release database connections..."
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker || echo "Production PM2 processes were not running."
echo "✅ Production application server stopped."
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker --namespace flyer-crawler-prod || echo "Production PM2 processes were not running."
pm2 save --namespace flyer-crawler-prod
echo "✅ Production application server stopped and saved."

- name: Step 2 - Drop and Recreate Database
run: |
@@ -91,5 +92,5 @@ jobs:
run: |
echo "Restarting application server..."
cd /var/www/flyer-crawler.projectium.com
pm2 startOrReload ecosystem.config.cjs --env production && pm2 save
pm2 startOrReload ecosystem.config.cjs --env production --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
echo "✅ Application server restarted."
@@ -85,6 +85,12 @@ jobs:
echo "✅ Schema is up to date. No changes detected."
fi

- name: Generate TSOA OpenAPI Spec and Routes
run: |
echo "Generating TSOA OpenAPI specification and route handlers..."
npm run tsoa:build
echo "✅ TSOA files generated successfully"

- name: Build React Application for Production
run: |
if [ -z "${{ secrets.VITE_GOOGLE_GENAI_API_KEY }}" ]; then
@@ -151,7 +157,7 @@ jobs:

# === PRE-CLEANUP PM2 STATE LOGGING ===
echo "=== PRE-CLEANUP PM2 STATE ==="
pm2 jlist
pm2 jlist --namespace flyer-crawler-prod
echo "=== END PRE-CLEANUP STATE ==="

# --- Cleanup Errored Processes with Defense-in-Depth Safeguards ---
@@ -159,7 +165,7 @@ jobs:
node -e "
const exec = require('child_process').execSync;
try {
const list = JSON.parse(exec('pm2 jlist').toString());
const list = JSON.parse(exec('pm2 jlist --namespace flyer-crawler-prod').toString());
const prodProcesses = ['flyer-crawler-api', 'flyer-crawler-worker', 'flyer-crawler-analytics-worker'];

// Filter for processes that match our criteria
@@ -187,7 +193,7 @@ jobs:
targetProcesses.forEach(p => {
console.log('Deleting ' + p.pm2_env.status + ' production process: ' + p.name + ' (' + p.pm2_env.pm_id + ')');
try {
exec('pm2 delete ' + p.pm2_env.pm_id);
exec('pm2 delete ' + p.pm2_env.pm_id + ' --namespace flyer-crawler-prod');
} catch(e) {
console.error('Failed to delete ' + p.pm2_env.pm_id);
}
@@ -199,9 +205,13 @@ jobs:
}
"

# Save PM2 process list after cleanup to persist deletions
echo "Saving PM2 process list after cleanup..."
pm2 save --namespace flyer-crawler-prod

# === POST-CLEANUP VERIFICATION ===
echo "=== POST-CLEANUP VERIFICATION ==="
pm2 jlist | node -e "
pm2 jlist --namespace flyer-crawler-prod | node -e "
try {
const list = JSON.parse(require('fs').readFileSync(0, 'utf-8'));
const prodProcesses = list.filter(p => p.name && p.name.startsWith('flyer-crawler-') && !p.name.endsWith('-test') && !p.name.endsWith('-dev'));
@@ -225,7 +235,7 @@ jobs:

# Get the running version from PM2 for the main API process
# We use a small node script to parse the JSON output from pm2 jlist
RUNNING_VERSION=$(pm2 jlist | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.version : ''); } catch(e) { console.log(''); }")
RUNNING_VERSION=$(pm2 jlist --namespace flyer-crawler-prod | node -e "try { const list = JSON.parse(require('fs').readFileSync(0, 'utf-8')); const app = list.find(p => p.name === 'flyer-crawler-api'); console.log(app ? app.pm2_env.version : ''); } catch(e) { console.log(''); }")
echo "Running PM2 Version: $RUNNING_VERSION"

if [ "${{ gitea.event.inputs.force_reload }}" == "true" ] || [ "$NEW_VERSION" != "$RUNNING_VERSION" ] || [ -z "$RUNNING_VERSION" ]; then
@@ -234,7 +244,7 @@ jobs:
else
echo "Version mismatch (Running: $RUNNING_VERSION -> Deployed: $NEW_VERSION) or app not running. Reloading PM2..."
fi
pm2 startOrReload ecosystem.config.cjs --env production --update-env && pm2 save
pm2 startOrReload ecosystem.config.cjs --env production --update-env --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
echo "Production backend server reloaded successfully."
else
echo "Version $NEW_VERSION is already running. Skipping PM2 reload."
@@ -257,6 +267,6 @@ jobs:
run: |
echo "--- Displaying recent PM2 logs for flyer-crawler-api ---"
sleep 5
pm2 describe flyer-crawler-api || echo "Could not find production pm2 process."
pm2 logs flyer-crawler-api --lines 20 --nostream || echo "Could not find production pm2 process."
pm2 env flyer-crawler-api || echo "Could not find production pm2 process."
pm2 describe flyer-crawler-api --namespace flyer-crawler-prod || echo "Could not find production pm2 process."
pm2 logs flyer-crawler-api --lines 20 --nostream --namespace flyer-crawler-prod || echo "Could not find production pm2 process."
pm2 env flyer-crawler-api --namespace flyer-crawler-prod || echo "Could not find production pm2 process."
258
.gitea/workflows/pm2-diagnostics.yml
Normal file
@@ -0,0 +1,258 @@
# .gitea/workflows/pm2-diagnostics.yml
#
# Comprehensive PM2 diagnostics to identify crash causes and problematic projects
name: PM2 Diagnostics

on:
workflow_dispatch:
inputs:
capture_interval:
description: 'Seconds between PM2 state captures (default: 5)'
required: false
default: '5'
duration:
description: 'Total monitoring duration in seconds (default: 60)'
required: false
default: '60'

jobs:
pm2-diagnostics:
runs-on: projectium.com

steps:
- name: PM2 Current State Snapshot
run: |
echo "========================================="
echo "PM2 CURRENT STATE SNAPSHOT"
echo "========================================="
echo ""
echo "=== Production Namespace (flyer-crawler-prod) ==="
echo "--- PM2 List (Human Readable) ---"
pm2 list --namespace flyer-crawler-prod
echo ""
echo "--- PM2 List (JSON) ---"
pm2 jlist --namespace flyer-crawler-prod > /tmp/pm2-state-initial-prod.json
cat /tmp/pm2-state-initial-prod.json | jq '.'
echo ""
echo "=== Test Namespace (flyer-crawler-test) ==="
echo "--- PM2 List (Human Readable) ---"
pm2 list --namespace flyer-crawler-test
echo ""
echo "--- PM2 List (JSON) ---"
pm2 jlist --namespace flyer-crawler-test > /tmp/pm2-state-initial-test.json
cat /tmp/pm2-state-initial-test.json | jq '.'
echo ""
echo "=== All Namespaces Combined ==="
echo "--- PM2 List (All) ---"
pm2 list
echo ""
echo "--- PM2 Daemon Info ---"
pm2 info pm2-logrotate || echo "pm2-logrotate not found"
echo ""
echo "--- PM2 Version ---"
pm2 --version
echo ""
echo "--- Node Version ---"
node --version

- name: PM2 Process Working Directories
run: |
echo "========================================="
echo "PROCESS WORKING DIRECTORIES"
echo "========================================="
echo ""
echo "=== Production Namespace (flyer-crawler-prod) ==="
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[] | "Process: \(.name) | CWD: \(.pm2_env.pm_cwd) | Exists: \(if .pm2_env.pm_cwd then "checking..." else "N/A" end)"'
echo ""
echo "--- Checking if CWDs still exist ---"
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[].pm2_env.pm_cwd' | while read cwd; do
if [ -n "$cwd" ] && [ "$cwd" != "null" ]; then
if [ -d "$cwd" ]; then
echo "✅ EXISTS: $cwd"
else
echo "❌ MISSING: $cwd (THIS WILL CAUSE CRASHES!)"
fi
fi
done
echo ""
echo "=== Test Namespace (flyer-crawler-test) ==="
pm2 jlist --namespace flyer-crawler-test | jq -r '.[] | "Process: \(.name) | CWD: \(.pm2_env.pm_cwd) | Exists: \(if .pm2_env.pm_cwd then "checking..." else "N/A" end)"'
echo ""
echo "--- Checking if CWDs still exist ---"
pm2 jlist --namespace flyer-crawler-test | jq -r '.[].pm2_env.pm_cwd' | while read cwd; do
if [ -n "$cwd" ] && [ "$cwd" != "null" ]; then
if [ -d "$cwd" ]; then
echo "✅ EXISTS: $cwd"
else
echo "❌ MISSING: $cwd (THIS WILL CAUSE CRASHES!)"
fi
fi
done
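Review note: the working-directory check is repeated for both namespaces and only prints its findings. One possible refactor, sketched below under the assumption that the paths arrive one per line on standard input (as they would from the `jq -r '.[].pm2_env.pm_cwd'` call in this step), is a small function that also returns a non-zero status when any directory is missing. `report_missing_dirs` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: read directory paths from stdin, report each one, and fail
# if any are missing. `report_missing_dirs` is a hypothetical helper.

report_missing_dirs() {
  missing=0
  # IFS= and -r keep leading spaces and backslashes in paths intact.
  while IFS= read -r dir; do
    # Skip empty lines and the literal "null" that jq prints for absent fields.
    [ -n "$dir" ] && [ "$dir" != "null" ] || continue
    if [ -d "$dir" ]; then
      echo "EXISTS: $dir"
    else
      echo "MISSING: $dir"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example:
#   printf '%s\n' /var/www /no/such/dir | report_missing_dirs || echo "at least one CWD is gone"
```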

- name: PM2 Log Analysis
run: |
echo "========================================="
echo "PM2 LOG ANALYSIS"
echo "========================================="
echo ""
echo "--- PM2 Daemon Log (Last 100 Lines) ---"
tail -100 /home/gitea-runner/.pm2/pm2.log
echo ""
echo "--- Searching for ENOENT errors ---"
grep -i "ENOENT\|no such file or directory\|uv_cwd" /home/gitea-runner/.pm2/pm2.log || echo "No ENOENT errors found"
echo ""
echo "--- Searching for crash patterns ---"
grep -i "crash\|error\|exception" /home/gitea-runner/.pm2/pm2.log | tail -50 || echo "No crashes found"

- name: Identify All PM2-Managed Projects
run: |
echo "========================================="
echo "ALL PM2-MANAGED PROJECTS"
echo "========================================="
echo ""
echo "=== Production Namespace (flyer-crawler-prod) ==="
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[] | "[\(.pm_id)] \(.name) - v\(.pm2_env.version // "N/A") - \(.pm2_env.status) - CWD: \(.pm2_env.pm_cwd)"'
echo ""
echo "--- Projects by CWD ---"
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[].pm2_env.pm_cwd' | sort -u
echo ""
echo "=== Test Namespace (flyer-crawler-test) ==="
pm2 jlist --namespace flyer-crawler-test | jq -r '.[] | "[\(.pm_id)] \(.name) - v\(.pm2_env.version // "N/A") - \(.pm2_env.status) - CWD: \(.pm2_env.pm_cwd)"'
echo ""
echo "--- Projects by CWD ---"
pm2 jlist --namespace flyer-crawler-test | jq -r '.[].pm2_env.pm_cwd' | sort -u
echo ""
echo "=== All Namespaces (for reference) ==="
pm2 jlist | jq -r '.[] | "[\(.pm_id)] \(.name) [ns: \(.pm2_env.namespace // "default")] - \(.pm2_env.status)"'
echo ""
echo "--- Checking which projects might interfere ---"
for dir in /var/www/*; do
if [ -d "$dir" ]; then
echo ""
echo "Directory: $dir"
ls -la "$dir" | grep -E "ecosystem|package.json|node_modules" || echo " No PM2/Node files"
fi
done

- name: Monitor PM2 State Over Time
run: |
echo "========================================="
echo "PM2 STATE MONITORING"
echo "========================================="
echo "Monitoring PM2 for ${{ gitea.event.inputs.duration }} seconds..."
echo "Capturing state every ${{ gitea.event.inputs.capture_interval }} seconds"
echo ""

INTERVAL=${{ gitea.event.inputs.capture_interval }}
DURATION=${{ gitea.event.inputs.duration }}
COUNT=$((DURATION / INTERVAL))

for i in $(seq 1 $COUNT); do
echo "--- Capture $i at $(date) ---"
echo ""
echo "=== Production Namespace (flyer-crawler-prod) ==="
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[] | "\(.name): \(.pm2_env.status) (restarts: \(.pm2_env.restart_time))"'

# Check for crashes in production
CRASHED_PROD=$(pm2 jlist --namespace flyer-crawler-prod | jq '[.[] | select(.pm2_env.status == "errored" or .pm2_env.status == "stopped")] | length')
if [ "$CRASHED_PROD" -gt 0 ]; then
echo "⚠️ WARNING: $CRASHED_PROD PRODUCTION process(es) in crashed state!"
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[] | select(.pm2_env.status == "errored" or .pm2_env.status == "stopped") | " - \(.name): \(.pm2_env.status)"'
fi

echo ""
echo "=== Test Namespace (flyer-crawler-test) ==="
pm2 jlist --namespace flyer-crawler-test | jq -r '.[] | "\(.name): \(.pm2_env.status) (restarts: \(.pm2_env.restart_time))"'

# Check for crashes in test
CRASHED_TEST=$(pm2 jlist --namespace flyer-crawler-test | jq '[.[] | select(.pm2_env.status == "errored" or .pm2_env.status == "stopped")] | length')
if [ "$CRASHED_TEST" -gt 0 ]; then
echo "⚠️ WARNING: $CRASHED_TEST TEST process(es) in crashed state!"
pm2 jlist --namespace flyer-crawler-test | jq -r '.[] | select(.pm2_env.status == "errored" or .pm2_env.status == "stopped") | " - \(.name): \(.pm2_env.status)"'
fi

echo ""
sleep $INTERVAL
done
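Review note: the monitoring loop computes `COUNT=$((DURATION / INTERVAL))` straight from the dispatch inputs, so an empty, non-numeric, or zero interval would abort the arithmetic. A hedged sketch of input validation follows; `capture_count` is a hypothetical helper, and rejecting leading zeros is a choice made here to avoid octal interpretation in shell arithmetic.

```shell
#!/usr/bin/env bash
# Sketch: validate duration and interval before computing the number
# of captures. `capture_count` is a hypothetical helper name.

capture_count() {
  # $1 = total duration in seconds, $2 = interval in seconds
  case "$1" in
    ''|*[!0-9]*|0[0-9]*) echo "duration must be a plain non-negative integer" >&2; return 2 ;;
  esac
  case "$2" in
    ''|*[!0-9]*|0[0-9]*) echo "interval must be a plain positive integer" >&2; return 2 ;;
  esac
  if [ "$2" -eq 0 ]; then
    echo "interval must be greater than zero" >&2
    return 2
  fi
  # Integer division: a partial trailing interval is dropped.
  echo $(( $1 / $2 ))
}

# Example:
#   COUNT=$(capture_count "$DURATION" "$INTERVAL") || exit 1
```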

- name: PM2 Dump File Analysis
run: |
echo "========================================="
echo "PM2 DUMP FILE ANALYSIS"
echo "========================================="
echo "--- Dump file location ---"
ls -lh /home/gitea-runner/.pm2/dump.pm2
echo ""
echo "--- Dump file contents ---"
cat /home/gitea-runner/.pm2/dump.pm2 | jq '.'
echo ""
echo "--- Processes in dump ---"
cat /home/gitea-runner/.pm2/dump.pm2 | jq -r '.apps[] | "\(.name) at \(.pm_cwd)"'

- name: Check for Rogue Deployment Scripts
run: |
echo "========================================="
echo "DEPLOYMENT SCRIPT ANALYSIS"
echo "========================================="
echo "Checking for scripts that might delete directories..."
echo ""
for project in flyer-crawler stock-alert; do
for env in "" "-test"; do
DIR="/var/www/$project$env.projectium.com"
if [ -d "$DIR" ]; then
echo "--- Project: $project$env ---"
echo "Location: $DIR"
if [ -f "$DIR/.gitea/workflows/deploy-to-test.yml" ]; then
echo "Has deploy-to-test workflow"
grep -n "rsync.*--delete\|rm -rf" "$DIR/.gitea/workflows/deploy-to-test.yml" | head -5 || echo "No dangerous commands found"
fi
if [ -f "$DIR/.gitea/workflows/deploy-to-prod.yml" ]; then
echo "Has deploy-to-prod workflow"
grep -n "rsync.*--delete\|rm -rf" "$DIR/.gitea/workflows/deploy-to-prod.yml" | head -5 || echo "No dangerous commands found"
fi
echo ""
fi
done
done

- name: Generate Diagnostic Report
run: |
echo "========================================="
echo "DIAGNOSTIC SUMMARY"
echo "========================================="
echo ""
echo "=== Production Namespace (flyer-crawler-prod) ==="
echo "Total processes: $(pm2 jlist --namespace flyer-crawler-prod | jq 'length')"
echo "Online: $(pm2 jlist --namespace flyer-crawler-prod | jq '[.[] | select(.pm2_env.status == "online")] | length')"
echo "Stopped: $(pm2 jlist --namespace flyer-crawler-prod | jq '[.[] | select(.pm2_env.status == "stopped")] | length')"
echo "Errored: $(pm2 jlist --namespace flyer-crawler-prod | jq '[.[] | select(.pm2_env.status == "errored")] | length')"
echo ""
echo "Flyer-crawler PROD processes:"
pm2 jlist --namespace flyer-crawler-prod | jq -r '.[] | select(.name | contains("flyer-crawler")) | " \(.name): \(.pm2_env.status)"'
echo ""
echo "=== Test Namespace (flyer-crawler-test) ==="
echo "Total processes: $(pm2 jlist --namespace flyer-crawler-test | jq 'length')"
echo "Online: $(pm2 jlist --namespace flyer-crawler-test | jq '[.[] | select(.pm2_env.status == "online")] | length')"
echo "Stopped: $(pm2 jlist --namespace flyer-crawler-test | jq '[.[] | select(.pm2_env.status == "stopped")] | length')"
echo "Errored: $(pm2 jlist --namespace flyer-crawler-test | jq '[.[] | select(.pm2_env.status == "errored")] | length')"
echo ""
echo "Flyer-crawler TEST processes:"
pm2 jlist --namespace flyer-crawler-test | jq -r '.[] | select(.name | contains("flyer-crawler")) | " \(.name): \(.pm2_env.status)"'
echo ""
echo "=== All Namespaces Summary ==="
echo "Total PM2 processes (all): $(pm2 jlist | jq 'length')"
echo ""
echo "Stock-alert processes (separate project):"
pm2 jlist | jq -r '.[] | select(.name | contains("stock-alert")) | " \(.name): \(.pm2_env.status) [ns: \(.pm2_env.namespace // "default")]"'
echo ""
echo "Other processes:"
pm2 jlist | jq -r '.[] | select(.name | contains("flyer-crawler") | not) | select(.name | contains("stock-alert") | not) | " \(.name): \(.pm2_env.status) [ns: \(.pm2_env.namespace // "default")]"'
echo ""
echo "========================================="
echo "RECOMMENDATIONS"
echo "========================================="
echo "1. Check for missing CWDs (marked with ❌ above)"
echo "2. Review PM2 daemon log for ENOENT errors"
echo "3. Verify no deployments are running rsync --delete while processes are online"
echo "4. Use namespace-specific commands: pm2 list --namespace flyer-crawler-prod"
echo "5. Avoid pm2 restart all - use namespace targeting instead"
@@ -33,22 +33,22 @@ jobs:
cd /var/www/flyer-crawler-test.projectium.com

echo "--- Current PM2 State (Before Restart) ---"
pm2 list
pm2 list --namespace flyer-crawler-test

echo "--- Restarting Test Processes ---"
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test || {
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test --namespace flyer-crawler-test || {
echo "Restart failed, attempting to start processes..."
pm2 start ecosystem-test.config.cjs
pm2 start ecosystem-test.config.cjs --namespace flyer-crawler-test
}

echo "--- Saving PM2 Process List ---"
pm2 save
pm2 save --namespace flyer-crawler-test

echo "--- Waiting 3 seconds for processes to stabilize ---"
sleep 3

echo "=== TEST ENVIRONMENT STATUS ==="
pm2 ps
pm2 ps --namespace flyer-crawler-test

- name: Restart Production Environment
if: gitea.event.inputs.environment == 'production' || gitea.event.inputs.environment == 'both'
@@ -57,30 +57,51 @@ jobs:
|
||||
cd /var/www/flyer-crawler.projectium.com
|
||||
|
||||
echo "--- Current PM2 State (Before Restart) ---"
|
||||
pm2 list
|
||||
pm2 list --namespace flyer-crawler-prod
|
||||
|
||||
echo "--- Restarting Production Processes ---"
|
||||
pm2 restart flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker || {
|
||||
pm2 restart flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker --namespace flyer-crawler-prod || {
|
||||
echo "Restart failed, attempting to start processes..."
|
||||
pm2 start ecosystem.config.cjs
|
||||
pm2 start ecosystem.config.cjs --namespace flyer-crawler-prod
|
||||
}
|
||||
|
||||
echo "--- Saving PM2 Process List ---"
|
||||
pm2 save
|
||||
pm2 save --namespace flyer-crawler-prod
|
||||
|
||||
echo "--- Waiting 3 seconds for processes to stabilize ---"
|
||||
sleep 3
|
||||
|
||||
echo "=== PRODUCTION ENVIRONMENT STATUS ==="
|
||||
pm2 ps
|
||||
pm2 ps --namespace flyer-crawler-prod
|
||||
|
||||
- name: Final PM2 Status (All Processes)
|
||||
run: |
|
||||
echo "========================================="
|
||||
echo "FINAL PM2 STATUS - ALL PROCESSES"
|
||||
echo "========================================="
|
||||
pm2 ps
|
||||
|
||||
echo ""
|
||||
echo "--- PM2 Logs (Last 20 Lines) ---"
|
||||
pm2 logs --lines 20 --nostream || echo "No logs available"
|
||||
if [ "${{ gitea.event.inputs.environment }}" = "test" ]; then
|
||||
echo "--- Test Namespace ---"
|
||||
pm2 ps --namespace flyer-crawler-test
|
||||
echo ""
|
||||
echo "--- PM2 Logs (Last 20 Lines) ---"
|
||||
pm2 logs --namespace flyer-crawler-test --lines 20 --nostream || echo "No logs available"
|
||||
elif [ "${{ gitea.event.inputs.environment }}" = "production" ]; then
|
||||
echo "--- Production Namespace ---"
|
||||
pm2 ps --namespace flyer-crawler-prod
|
||||
echo ""
|
||||
echo "--- PM2 Logs (Last 20 Lines) ---"
|
||||
pm2 logs --namespace flyer-crawler-prod --lines 20 --nostream || echo "No logs available"
|
||||
else
|
||||
echo "--- Test Namespace ---"
|
||||
pm2 ps --namespace flyer-crawler-test
|
||||
echo ""
|
||||
echo "--- Production Namespace ---"
|
||||
pm2 ps --namespace flyer-crawler-prod
|
||||
echo ""
|
||||
echo "--- PM2 Logs - Test (Last 10 Lines) ---"
|
||||
pm2 logs --namespace flyer-crawler-test --lines 10 --nostream || echo "No logs available"
|
||||
echo ""
|
||||
echo "--- PM2 Logs - Production (Last 10 Lines) ---"
|
||||
pm2 logs --namespace flyer-crawler-prod --lines 10 --nostream || echo "No logs available"
|
||||
fi
|
||||
|
||||
100
.gitea/workflows/sync-test-version.yml
Normal file
@@ -0,0 +1,100 @@
|
||||
# .gitea/workflows/sync-test-version.yml
|
||||
#
|
||||
# Lightweight workflow to sync version numbers from production to test environment.
|
||||
# This runs after successful production deployments to update test PM2 metadata
|
||||
# without re-running the full test suite, build, and deployment pipeline.
|
||||
#
|
||||
# Duration: ~30 seconds (vs 5+ minutes for full test deployment)
|
||||
name: Sync Test Version
|
||||
|
||||
on:
|
||||
workflow_run:
|
||||
workflows: ["Deploy to Production"]
|
||||
types:
|
||||
- completed
|
||||
branches:
|
||||
- main
|
||||
|
||||
jobs:
|
||||
sync-version:
|
||||
runs-on: projectium.com
|
||||
# Only run if the production deployment succeeded
|
||||
if: ${{ gitea.event.workflow_run.conclusion == 'success' }}
|
||||
|
||||
steps:
|
||||
- name: Checkout Latest Code
|
||||
uses: actions/checkout@v3
|
||||
with:
|
||||
fetch-depth: 1 # Shallow clone, we only need latest commit
|
||||
|
||||
- name: Update Test Package Version
|
||||
run: |
|
||||
echo "========================================="
|
||||
echo "SYNCING VERSION TO TEST ENVIRONMENT"
|
||||
echo "========================================="
|
||||
|
||||
APP_PATH="/var/www/flyer-crawler-test.projectium.com"
|
||||
|
||||
# Get version from this repo's package.json
|
||||
NEW_VERSION=$(node -p "require('./package.json').version")
|
||||
echo "Production version: $NEW_VERSION"
|
||||
|
||||
# Get current test version
|
||||
if [ -f "$APP_PATH/package.json" ]; then
|
||||
CURRENT_VERSION=$(node -p "require('$APP_PATH/package.json').version")
|
||||
echo "Current test version: $CURRENT_VERSION"
|
||||
else
|
||||
CURRENT_VERSION="unknown"
|
||||
echo "Test package.json not found"
|
||||
fi
|
||||
|
||||
# Only update if versions differ
|
||||
if [ "$NEW_VERSION" != "$CURRENT_VERSION" ]; then
|
||||
echo "Updating test package.json to version $NEW_VERSION..."
|
||||
|
||||
# Update just the version field in test's package.json
|
||||
cd "$APP_PATH"
|
||||
npm version "$NEW_VERSION" --no-git-tag-version --allow-same-version
|
||||
|
||||
echo "✅ Test package.json updated to $NEW_VERSION"
|
||||
else
|
||||
echo "ℹ️ Versions already match, no update needed"
|
||||
fi
|
||||
|
||||
- name: Restart Test PM2 Processes
|
||||
run: |
|
||||
echo "Restarting test PM2 processes to refresh version metadata..."
|
||||
|
||||
# Restart with --update-env to pick up new package.json version
|
||||
pm2 --namespace flyer-crawler-test restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test --update-env && pm2 --namespace flyer-crawler-test save
|
||||
|
||||
echo "✅ Test PM2 processes restarted and saved"
|
||||
|
||||
# Show current state
|
||||
echo ""
|
||||
echo "--- Current PM2 State ---"
|
||||
pm2 --namespace flyer-crawler-test list
|
||||
|
||||
# Verify version in PM2 metadata
|
||||
echo ""
|
||||
echo "--- Verifying Version in PM2 ---"
|
||||
pm2 --namespace flyer-crawler-test jlist | node -e "
|
||||
try {
|
||||
const list = JSON.parse(require('fs').readFileSync(0, 'utf-8'));
|
||||
const testProcesses = list.filter(p => p.name && p.name.endsWith('-test'));
|
||||
testProcesses.forEach(p => {
|
||||
console.log(p.name + ': v' + (p.pm2_env.version || 'unknown') + ' (' + p.pm2_env.status + ')');
|
||||
});
|
||||
} catch(e) {
|
||||
console.error('Failed to parse PM2 output');
|
||||
}
|
||||
"
|
||||
|
||||
- name: Summary
|
||||
run: |
|
||||
echo "========================================="
|
||||
echo "VERSION SYNC COMPLETE"
|
||||
echo "========================================="
|
||||
echo "Test environment version updated to match production"
|
||||
echo "No tests run, no builds performed"
|
||||
echo "Duration: ~30 seconds"
|
||||
135
CLAUDE.md
@@ -45,50 +45,129 @@ Out-of-sync = test failures.
|
||||
- Maximum 3 fix commands at a time (errors may cascade)
|
||||
- Always verify after fixes complete
|
||||
|
||||
### PM2 Process Isolation (Production/Test Servers)
|
||||
### PM2 Namespace Isolation (Production/Test Servers)
|
||||
|
||||
**CRITICAL**: Production and test environments share the same PM2 daemon on the server.
|
||||
|
||||
Flyer-crawler uses PM2 namespaces to isolate test and production processes:
|
||||
|
||||
| Namespace | Purpose | Config File |
|
||||
| -------------------- | ---------------------- | --------------------------- |
|
||||
| `flyer-crawler-prod` | Production environment | `ecosystem.config.cjs` |
|
||||
| `flyer-crawler-test` | Test environment | `ecosystem-test.config.cjs` |
|
||||
| `flyer-crawler-dev` | Development container | `ecosystem.dev.config.cjs` |
|
||||
|
||||
This prevents `pm2 save` race conditions during simultaneous deployments. See [ADR-063](docs/adr/0063-pm2-namespace-implementation.md) for details.
|
||||
|
||||
**See also**: [PM2 Process Isolation Incidents](#pm2-process-isolation-incidents) for past incidents and response procedures.
|
||||
|
||||
| Environment | Processes | Config File |
|
||||
| ----------- | -------------------------------------------------------------------------------------------- | --------------------------- |
|
||||
| Production | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker` | `ecosystem.config.cjs` |
|
||||
| Test | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test` | `ecosystem-test.config.cjs` |
|
||||
| Development | `flyer-crawler-api-dev`, `flyer-crawler-worker-dev`, `flyer-crawler-vite-dev` | `ecosystem.dev.config.cjs` |
|
||||
| Environment | Processes | Namespace |
|
||||
| ----------- | -------------------------------------------------------------------------------------------- | -------------------- |
|
||||
| Production | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker` | `flyer-crawler-prod` |
|
||||
| Test | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test` | `flyer-crawler-test` |
|
||||
| Development | `flyer-crawler-api-dev`, `flyer-crawler-worker-dev`, `flyer-crawler-vite-dev` | `flyer-crawler-dev` |
|
||||
|
||||
**Deployment Scripts MUST:**
|
||||
|
||||
- ✅ Use the `--namespace` flag on all PM2 commands to scope them to the correct environment
|
||||
- ✅ Filter PM2 commands by exact process names or name patterns (e.g., `endsWith('-test')`)
|
||||
- ❌ NEVER use `pm2 stop all`, `pm2 delete all`, or `pm2 restart all`
|
||||
- ❌ NEVER use `pm2 stop all`, `pm2 delete all`, or `pm2 restart all` without a `--namespace` flag
|
||||
- ❌ NEVER delete/stop processes based solely on status without name filtering
|
||||
- ✅ Always verify process names match the target environment before any operation
|
||||
|
||||
**Examples:**
|
||||
|
||||
```bash
|
||||
# ✅ CORRECT - Production cleanup (filter by name)
|
||||
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker
|
||||
# ✅ CORRECT - Production commands with namespace
|
||||
pm2 start ecosystem.config.cjs --namespace flyer-crawler-prod
|
||||
pm2 stop flyer-crawler-api flyer-crawler-worker --namespace flyer-crawler-prod
|
||||
pm2 restart all --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
|
||||
pm2 logs --namespace flyer-crawler-prod
|
||||
|
||||
# ✅ CORRECT - Test cleanup (filter by name pattern)
|
||||
# ✅ CORRECT - Test commands with namespace
|
||||
pm2 start ecosystem-test.config.cjs --namespace flyer-crawler-test
|
||||
pm2 status --namespace flyer-crawler-test
|
||||
pm2 delete all --namespace flyer-crawler-test && pm2 save --namespace flyer-crawler-test
|
||||
|
||||
# ✅ CORRECT - Dev container commands with namespace
|
||||
pm2 start ecosystem.dev.config.cjs --namespace flyer-crawler-dev
|
||||
pm2 logs --namespace flyer-crawler-dev
|
||||
|
||||
# ✅ CORRECT - Test cleanup (filter by namespace + name pattern)
|
||||
# Only delete test processes that are errored/stopped
|
||||
list.forEach(p => {
|
||||
if ((p.pm2_env.status === 'errored' || p.pm2_env.status === 'stopped') &&
|
||||
p.name && p.name.endsWith('-test')) {
|
||||
exec('pm2 delete ' + p.pm2_env.pm_id);
|
||||
p.name && p.name.endsWith('-test') &&
|
||||
p.pm2_env.namespace === 'flyer-crawler-test') {
|
||||
exec('pm2 delete ' + p.pm2_env.pm_id + ' --namespace flyer-crawler-test');
|
||||
}
|
||||
});
|
||||
exec('pm2 save --namespace flyer-crawler-test');
|
||||
|
||||
# ❌ WRONG - Affects all environments
|
||||
# ❌ WRONG - Missing namespace (affects all environments)
|
||||
pm2 stop all
|
||||
pm2 delete all
|
||||
pm2 restart all
|
||||
|
||||
# ❌ WRONG - No name filtering (could delete test processes during prod deploy)
|
||||
# ❌ WRONG - No name/namespace filtering (could delete test processes during prod deploy)
|
||||
if (p.pm2_env.status === 'errored') {
|
||||
exec('pm2 delete ' + p.pm2_env.pm_id);
|
||||
}
|
||||
```
|
||||
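The filtering rule in the examples above can be pulled into a small pure function, which makes it easier to check before it is wired to `pm2 delete`. The following is a minimal sketch, not the project's actual cleanup script: `selectCleanupTargets` and its parameters are hypothetical names, and the input is assumed to have the shape of `pm2 jlist` output (`name`, `pm2_env.status`, `pm2_env.namespace`, `pm2_env.pm_id`) as used elsewhere in this document.

```javascript
// Hypothetical helper: pick only processes that are safe to delete for one environment.
// Assumes entries shaped like `pm2 jlist` output, as used in the examples above.
function selectCleanupTargets(list, namespace, nameSuffix) {
  const deadStatuses = new Set(['errored', 'stopped']);
  return list.filter((p) => {
    const env = p.pm2_env || {};
    return (
      typeof p.name === 'string' &&
      p.name.endsWith(nameSuffix) && // name pattern must match the target environment
      env.namespace === namespace && // namespace must match as well
      deadStatuses.has(env.status) // never touch online processes
    );
  });
}

// Sample data (made up for illustration).
const sample = [
  { name: 'flyer-crawler-api-test', pm2_env: { pm_id: 3, status: 'errored', namespace: 'flyer-crawler-test' } },
  { name: 'flyer-crawler-worker-test', pm2_env: { pm_id: 4, status: 'online', namespace: 'flyer-crawler-test' } },
  { name: 'flyer-crawler-api', pm2_env: { pm_id: 0, status: 'errored', namespace: 'flyer-crawler-prod' } },
];

const targets = selectCleanupTargets(sample, 'flyer-crawler-test', '-test');
console.log(targets.map((p) => p.pm2_env.pm_id)); // [ 3 ]
```

Only the errored test process is selected; the online test worker and the errored production API are left alone. For production, where process names carry no suffix, an explicit list of exact names would likely be a safer filter than a suffix match.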
|
||||
### PM2 Save Requirement (CRITICAL)
|
||||
|
||||
**CRITICAL**: Every `pm2 start`, `pm2 restart`, `pm2 stop`, or `pm2 delete` command MUST be immediately followed by `pm2 save` with the same namespace.
|
||||
|
||||
Without `pm2 save`, processes become ephemeral and will disappear on:
|
||||
|
||||
- PM2 daemon restarts
|
||||
- Server reboots
|
||||
- Internal PM2 reconciliation events
|
||||
|
||||
**Pattern:**
|
||||
|
||||
```bash
|
||||
# ✅ CORRECT - Save after every state change (with namespace)
|
||||
pm2 start ecosystem.config.cjs --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
|
||||
pm2 restart my-app --namespace flyer-crawler-prod && pm2 save --namespace flyer-crawler-prod
|
||||
pm2 stop my-app --namespace flyer-crawler-test && pm2 save --namespace flyer-crawler-test
|
||||
pm2 delete my-app --namespace flyer-crawler-test && pm2 save --namespace flyer-crawler-test
|
||||
|
||||
# ❌ WRONG - Missing save (processes become ephemeral)
|
||||
pm2 start ecosystem.config.cjs --namespace flyer-crawler-prod
|
||||
pm2 restart my-app --namespace flyer-crawler-prod
|
||||
|
||||
# ❌ WRONG - Missing namespace (affects wrong environment)
|
||||
pm2 start ecosystem.config.cjs && pm2 save
|
||||
```
|
||||
|
||||
**In Cleanup Scripts:**
|
||||
|
||||
```javascript
|
||||
// ✅ CORRECT - Save after cleanup loop completes (with namespace)
|
||||
const namespace = 'flyer-crawler-test';
|
||||
targetProcesses.forEach((p) => {
|
||||
exec(`pm2 delete ${p.pm2_env.pm_id} --namespace ${namespace}`);
|
||||
});
|
||||
exec(`pm2 save --namespace ${namespace}`); // Persist all deletions
|
||||
|
||||
// ❌ WRONG - Missing save and namespace
|
||||
targetProcesses.forEach((p) => {
|
||||
exec('pm2 delete ' + p.pm2_env.pm_id);
|
||||
});
|
||||
```
|
||||
|
||||
**Why This Matters:**
|
||||
|
||||
PM2 maintains an in-memory process list. The `pm2 save` command writes this list to `~/.pm2/dump.pm2`, which PM2 uses to resurrect processes after daemon restarts. Without it, your carefully managed process state is lost. Using namespaces ensures that `pm2 save` in one environment does not affect another.
|
||||
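One way to notice a missing `pm2 save` is to compare the live process list with what was last persisted. The sketch below is illustrative only: `findUnsaved` is a hypothetical helper, and it assumes both inputs are arrays of objects with a `name` field (the live list from `pm2 jlist`, the saved list parsed from the dump file). The exact layout of `dump.pm2` is an assumption here and should be checked against the installed PM2 version.

```javascript
// Hypothetical helper: names that are running but absent from the saved dump.
function findUnsaved(liveList, savedList) {
  const saved = new Set(savedList.map((p) => p.name));
  return liveList
    .map((p) => p.name)
    .filter((name) => !saved.has(name))
    .sort();
}

// Made-up sample data.
const live = [{ name: 'flyer-crawler-api-test' }, { name: 'flyer-crawler-worker-test' }];
const savedDump = [{ name: 'flyer-crawler-api-test' }];

console.log(findUnsaved(live, savedDump)); // [ 'flyer-crawler-worker-test' ]
```

A non-empty result suggests that a state change was made without a following `pm2 save`, so those processes would not be resurrected after a daemon restart.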
|
||||
**See Also:**
|
||||
|
||||
- [ADR-014: Containerization and Deployment Strategy](docs/adr/0014-containerization-and-deployment-strategy.md)
|
||||
- [ADR-061: PM2 Process Isolation Safeguards](docs/adr/0061-pm2-process-isolation-safeguards.md)
|
||||
- [ADR-063: PM2 Namespace Implementation](docs/adr/0063-pm2-namespace-implementation.md)
|
||||
|
||||
### Communication Style
|
||||
|
||||
Ask before assuming. Never assume:
|
||||
@@ -102,7 +181,7 @@ Ask before assuming. Never assume:
|
||||
1. **Memory**: `mcp__memory__read_graph` - Recall project context, credentials, known issues
|
||||
2. **Git**: `git log --oneline -10` - Recent changes
|
||||
3. **Containers**: `mcp__podman__container_list` - Running state
|
||||
4. **PM2 Status**: `podman exec flyer-crawler-dev pm2 status` - Process health (API, Worker, Vite)
|
||||
4. **PM2 Status**: `podman exec flyer-crawler-dev pm2 status --namespace flyer-crawler-dev` - Process health (API, Worker, Vite)
|
||||
|
||||
---
|
||||
|
||||
@@ -110,15 +189,17 @@ Ask before assuming. Never assume:
|
||||
|
||||
### Essential Commands
|
||||
|
||||
| Command | Description |
|
||||
| ------------------------------------------------------------ | --------------------- |
|
||||
| `podman exec -it flyer-crawler-dev npm test` | Run all tests |
|
||||
| `podman exec -it flyer-crawler-dev npm run test:unit` | Unit tests only |
|
||||
| `podman exec -it flyer-crawler-dev npm run type-check` | TypeScript check |
|
||||
| `podman exec -it flyer-crawler-dev npm run test:integration` | Integration tests |
|
||||
| `podman exec -it flyer-crawler-dev pm2 status` | PM2 process status |
|
||||
| `podman exec -it flyer-crawler-dev pm2 logs` | View all PM2 logs |
|
||||
| `podman exec -it flyer-crawler-dev pm2 restart all` | Restart all processes |
|
||||
| Command | Description |
|
||||
| --------------------------------------------------------------------------------- | ------------------------ |
|
||||
| `podman exec -it flyer-crawler-dev npm test` | Run all tests |
|
||||
| `podman exec -it flyer-crawler-dev npm run test:unit` | Unit tests only |
|
||||
| `podman exec -it flyer-crawler-dev npm run type-check` | TypeScript check |
|
||||
| `podman exec -it flyer-crawler-dev npm run test:integration` | Integration tests |
|
||||
| `podman exec -it flyer-crawler-dev pm2 status --namespace flyer-crawler-dev` | PM2 process status (dev) |
|
||||
| `podman exec -it flyer-crawler-dev pm2 logs --namespace flyer-crawler-dev` | View PM2 logs (dev) |
|
||||
| `podman exec -it flyer-crawler-dev pm2 restart all --namespace flyer-crawler-dev` | Restart all (dev) |
|
||||
| `pm2 status --namespace flyer-crawler-prod` | PM2 status (production) |
|
||||
| `pm2 status --namespace flyer-crawler-test` | PM2 status (test) |
|
||||
|
||||
### Key Patterns (with file locations)
|
||||
|
||||
@@ -174,11 +255,13 @@ The dev container now matches production by using PM2 for process management.
|
||||
|
||||
| Component | Production | Dev Container |
|
||||
| ---------- | ---------------------- | ------------------------- |
|
||||
| API Server | PM2 cluster mode | PM2 fork mode + tsx watch |
|
||||
| API Server | PM2 fork mode | PM2 fork mode + tsx watch |
|
||||
| Worker | PM2 process | PM2 process + tsx watch |
|
||||
| Frontend | Static files via NGINX | PM2 + Vite dev server |
|
||||
| Logs | PM2 logs -> Logstash | PM2 logs -> Logstash |
|
||||
|
||||
**Note:** PM2 cluster mode is incompatible with using tsx as the script path. See [PM2-CLUSTER-MODE-INCOMPATIBILITY.md](docs/operations/PM2-CLUSTER-MODE-INCOMPATIBILITY.md).
|
||||
|
||||
**PM2 Processes in Dev Container**:
|
||||
|
||||
- `flyer-crawler-api-dev` - API server (port 3001)
|
||||
@@ -319,7 +402,7 @@ Common issues with solutions:
|
||||
|
||||
**Related Documentation**:
|
||||
|
||||
- [PM2 Process Isolation Requirements](#pm2-process-isolation-productiontest-servers) (existing section)
|
||||
- [PM2 Namespace Isolation](#pm2-namespace-isolation-productiontest-servers) (existing section)
|
||||
- [Incident Report 2026-02-17](docs/operations/INCIDENT-2026-02-17-PM2-PROCESS-KILL.md)
|
||||
- [PM2 Incident Response Runbook](docs/operations/PM2-INCIDENT-RESPONSE.md)
|
||||
|
||||
|
||||
20
README.md
@@ -49,8 +49,8 @@ npm run dev
|
||||
|
||||
The application will be available at:
|
||||
|
||||
- **Frontend**: http://localhost:5173
|
||||
- **Backend API**: http://localhost:3001
|
||||
- **Frontend**: <http://localhost:5173>
|
||||
- **Backend API**: <http://localhost:3001>
|
||||
|
||||
See [docs/getting-started/INSTALL.md](docs/getting-started/INSTALL.md) for detailed setup instructions including:
|
||||
|
||||
@@ -88,7 +88,7 @@ See [docs/development/TESTING.md](docs/development/TESTING.md) for testing guide
|
||||
| [⚙️ Installation Guide](docs/getting-started/INSTALL.md) | Local development setup with Podman |
|
||||
| [🏗️ Architecture Overview](docs/architecture/DATABASE.md) | System design, database, authentication |
|
||||
| [💻 Development Guide](docs/development/TESTING.md) | Testing, debugging, code patterns |
|
||||
| [🚀 Deployment Guide](docs/operations/DEPLOYMENT.md) | Production setup, NGINX, PM2 |
|
||||
| [🚀 Deployment Guide](docs/operations/DEPLOYMENT.md) | Production setup, NGINX, PM2 namespaces |
|
||||
| [🤖 AI Agent Guides](docs/subagents/OVERVIEW.md) | Working with Claude Code subagents |
|
||||
|
||||
### Quick References
|
||||
@@ -126,13 +126,13 @@ See [INSTALL.md](INSTALL.md) for the complete list.
|
||||
|
||||
## Scripts
|
||||
|
||||
| Command | Description |
|
||||
| -------------------- | -------------------------------- |
|
||||
| `npm run dev` | Start development server |
|
||||
| `npm run build` | Build for production |
|
||||
| `npm run start:prod` | Start production server with PM2 |
|
||||
| `npm run test` | Run test suite |
|
||||
| `npm run seed` | Seed development user accounts |
|
||||
| Command | Description |
|
||||
| -------------------- | ----------------------------------------------------------- |
|
||||
| `npm run dev` | Start development server |
|
||||
| `npm run build` | Build for production |
|
||||
| `npm run start:prod` | Start production server with PM2 (uses namespace isolation) |
|
||||
| `npm run test` | Run test suite |
|
||||
| `npm run seed` | Seed development user accounts |
|
||||
|
||||
---
|
||||
|
||||
|
||||
@@ -56,7 +56,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -90,7 +90,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -114,7 +114,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -138,7 +138,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -161,7 +161,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -189,7 +189,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -211,7 +211,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -234,7 +234,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -259,7 +259,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -284,7 +284,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -307,7 +307,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -330,7 +330,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -355,7 +355,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -379,7 +379,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -425,7 +425,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -448,7 +448,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -476,7 +476,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -502,7 +502,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -529,7 +529,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -555,7 +555,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -579,7 +579,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -612,7 +612,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -637,7 +637,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -656,7 +656,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -681,7 +681,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -705,7 +705,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -757,7 +757,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Measurements**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Measurements**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -775,7 +775,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -791,7 +791,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -807,7 +807,7 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
**Pass/Fail**: [ ]
|
||||
|
||||
**Notes**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Notes**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
|
||||
---
|
||||
|
||||
@@ -849,8 +849,8 @@ podman exec -it flyer-crawler-dev npm run dev:container
|
||||
|
||||
## Sign-Off
|
||||
|
||||
**Tester Name**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Date Completed**: ****\*\*****\*\*****\*\*****\_\_\_****\*\*****\*\*****\*\*****
|
||||
**Tester Name**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
**Date Completed**: \***\*\*\*\*\***\*\*\***\*\*\*\*\***\_\_\_\***\*\*\*\*\***\*\*\***\*\*\*\*\***
|
||||
**Overall Status**: [ ] PASS [ ] PASS WITH ISSUES [ ] FAIL
|
||||
|
||||
**Ready for Production**: [ ] YES [ ] NO [ ] WITH FIXES
|
||||
|
||||
@@ -47,9 +47,11 @@ Production operations and deployment:
|
||||
- [Logstash Troubleshooting](operations/LOGSTASH-TROUBLESHOOTING.md) - Debugging logs
|
||||
- [Monitoring](operations/MONITORING.md) - Bugsink, health checks, observability
|
||||
|
||||
**Incident Response**:
|
||||
**PM2 Management**:
|
||||
|
||||
- [PM2 Namespace Completion Report](operations/PM2-NAMESPACE-COMPLETION-REPORT.md) - PM2 namespace implementation project summary
|
||||
- [PM2 Incident Response Runbook](operations/PM2-INCIDENT-RESPONSE.md) - Step-by-step procedures for PM2 incidents
|
||||
- [PM2 Crash Debugging](operations/PM2-CRASH-DEBUGGING.md) - Troubleshooting PM2 crashes
|
||||
|
||||
**Incident Reports**:
|
||||
|
||||
|
||||
@@ -49,7 +49,7 @@ Tests that pass on Windows but fail on Linux are considered **broken tests**. Te
|
||||
|
||||
We will standardize the deployment process using a hybrid approach:
|
||||
|
||||
1. **PM2 for Production**: Use PM2 cluster mode for process management, load balancing, and zero-downtime reloads.
|
||||
1. **PM2 for Production**: Use PM2 for process management. ~~Cluster mode~~ Fork mode used due to tsx incompatibility (see [PM2-CLUSTER-MODE-INCOMPATIBILITY.md](../operations/PM2-CLUSTER-MODE-INCOMPATIBILITY.md)).
|
||||
2. **Docker/Podman for Development**: Provide a complete containerized development environment with automatic initialization.
|
||||
3. **VS Code Dev Containers**: Enable one-click development environment setup.
|
||||
4. **Gitea Actions for CI/CD**: Automated deployment pipelines handle builds and deployments.
|
||||
@@ -187,13 +187,13 @@ Located in `ecosystem.config.cjs`:
|
||||
module.exports = {
|
||||
apps: [
|
||||
{
|
||||
// API Server - Cluster mode for load balancing
|
||||
// API Server - Fork mode (tsx incompatible with cluster)
|
||||
name: 'flyer-crawler-api',
|
||||
script: './node_modules/.bin/tsx',
|
||||
args: 'server.ts',
|
||||
max_memory_restart: '500M',
|
||||
instances: 'max', // Use all CPU cores
|
||||
exec_mode: 'cluster', // Enable cluster mode
|
||||
instances: 1, // Fork mode - single instance
|
||||
exec_mode: 'fork', // tsx requires fork mode
|
||||
kill_timeout: 5000, // Graceful shutdown timeout
|
||||
|
||||
// Restart configuration
|
||||
@@ -249,26 +249,39 @@ module.exports = {
|
||||
|
||||
### PM2 Commands Reference
|
||||
|
||||
**CRITICAL**: Every `pm2 start`, `pm2 restart`, `pm2 stop`, or `pm2 delete` command MUST be immediately followed by `pm2 save`. Without this, processes become ephemeral and will disappear on PM2 daemon restarts, server reboots, or internal reconciliation events.
|
||||
|
||||
```bash
|
||||
# Start/reload with environment
|
||||
# ✅ CORRECT - Start/reload with environment and save
|
||||
pm2 startOrReload ecosystem.config.cjs --env production --update-env && pm2 save
|
||||
|
||||
# ✅ CORRECT - Restart and save
|
||||
pm2 restart flyer-crawler-api && pm2 save
|
||||
|
||||
# ✅ CORRECT - Stop and save
|
||||
pm2 stop flyer-crawler-api && pm2 save
|
||||
|
||||
# ✅ CORRECT - Delete and save
|
||||
pm2 delete flyer-crawler-api && pm2 save
|
||||
|
||||
# ❌ WRONG - Missing save (processes become ephemeral)
|
||||
pm2 startOrReload ecosystem.config.cjs --env production --update-env
|
||||
|
||||
# Save process list for startup
|
||||
pm2 save
|
||||
|
||||
# View logs
|
||||
# View logs (read-only operation, no save needed)
|
||||
pm2 logs flyer-crawler-api --lines 50
|
||||
|
||||
# Monitor processes
|
||||
# Monitor processes (read-only operation, no save needed)
|
||||
pm2 monit
|
||||
|
||||
# List all processes
|
||||
# List all processes (read-only operation, no save needed)
|
||||
pm2 list
|
||||
|
||||
# Describe process details
|
||||
# Describe process details (read-only operation, no save needed)
|
||||
pm2 describe flyer-crawler-api
|
||||
```
|
||||
|
||||
**Why This Matters**: PM2 maintains an in-memory process list. The `pm2 save` command writes this list to `~/.pm2/dump.pm2`, which PM2 uses to resurrect processes after daemon restarts. Without it, your carefully managed process state is lost.
|
||||
|
||||
### Resource Limits
|
||||
|
||||
| Process | Memory Limit | Restart Delay | Kill Timeout |
|
||||
@@ -345,6 +358,42 @@ podman-compose -f compose.dev.yml build app
|
||||
|
||||
**Rationale**: Developers and CI systems should never need to run manual setup commands to execute tests. If the container is running, tests should work. Any deviation from this principle indicates an incomplete container setup.
|
||||
|
||||
## Updates
|
||||
|
||||
### 2026-02-19: Cluster Mode Disabled
|
||||
|
||||
**Decision:** Disabled PM2 cluster mode in favor of fork mode (single instance).
|
||||
|
||||
**Reason:** PM2 cluster mode is fundamentally incompatible with using `tsx` as the script path. The configuration pattern:
|
||||
|
||||
```javascript
|
||||
{
|
||||
script: './node_modules/.bin/tsx',
|
||||
args: 'server.ts',
|
||||
exec_mode: 'cluster',
|
||||
}
|
||||
```
|
||||
|
||||
causes 75-87% of cluster instances to fail on startup with no clear error messages. Only 1-2 out of 8 instances successfully start, with the rest showing constant restart attempts.
|
||||
|
||||
**Technical Root Cause:** PM2 requires the `node` binary as the interpreter to properly fork cluster workers using Node.js's native cluster module. When `tsx` is the script path, PM2 cannot create cluster workers correctly.
|
||||
|
||||
**Alternative:** To use cluster mode with TypeScript in the future, the correct configuration is:
|
||||
|
||||
```javascript
|
||||
{
|
||||
script: 'server.ts',
|
||||
interpreter: 'node',
|
||||
interpreter_args: '--import tsx', // Node 18.19+
|
||||
exec_mode: 'cluster',
|
||||
instances: 'max',
|
||||
}
|
||||
```
|
||||
|
||||
**Impact:** Current traffic does not require cluster mode load balancing. Fork mode with a single instance provides reliable process management without the cluster overhead.
|
||||
|
||||
**Documentation:** See [PM2-CLUSTER-MODE-INCOMPATIBILITY.md](../operations/PM2-CLUSTER-MODE-INCOMPATIBILITY.md) for full details.
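The incompatibility described in this update can be caught before deployment with a small check over the `apps` array. The following is a minimal sketch, not part of the project: the function name `findClusterTsxConflicts` is hypothetical, and the worker and `future-api` entries are illustrative values rather than the real configuration.

```javascript
// Sketch: flag app entries that combine a tsx script path with cluster mode,
// the combination this ADR reports as failing on startup.
// `findClusterTsxConflicts` is a hypothetical helper, not an existing API.
function findClusterTsxConflicts(apps) {
  return apps
    .filter(
      (app) =>
        app.exec_mode === 'cluster' && /(^|\/)tsx$/.test(app.script ?? ''),
    )
    .map((app) => app.name);
}

// Illustrative entries only (assumed values, not the real ecosystem file).
const apps = [
  {
    name: 'flyer-crawler-api',
    script: './node_modules/.bin/tsx',
    args: 'server.ts',
    exec_mode: 'cluster',
  },
  {
    name: 'flyer-crawler-worker',
    script: './node_modules/.bin/tsx',
    exec_mode: 'fork',
  },
  {
    name: 'future-api',
    script: 'server.ts',
    interpreter: 'node',
    interpreter_args: '--import tsx',
    exec_mode: 'cluster',
  },
];

console.log(findClusterTsxConflicts(apps));
```

Only the first entry is reported, because it is the only one that sets both `exec_mode: 'cluster'` and a `tsx` script path. A check like this could run in CI so that a future switch back to cluster mode uses the `interpreter: 'node'` form instead.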
|
||||
|
||||
## Related ADRs
|
||||
|
||||
- [ADR-017](./0017-ci-cd-and-branching-strategy.md) - CI/CD Strategy
|
||||
|
||||
@@ -115,6 +115,31 @@ echo "=== END POST-CLEANUP VERIFICATION ==="
|
||||
|
||||
**Purpose**: Immediately identifies cross-environment contamination.
|
||||
|
||||
#### Layer 6: PM2 Process List Persistence
|
||||
|
||||
**CRITICAL**: Save the PM2 process list after every state-changing operation:
|
||||
|
||||
```bash
|
||||
# After any pm2 start/stop/restart/delete operation
|
||||
pm2 save
|
||||
|
||||
# Example (JavaScript, from the workflow's inline Node script): after the cleanup loop completes
|
||||
targetProcesses.forEach(p => {
|
||||
exec('pm2 delete ' + p.pm2_env.pm_id);
|
||||
});
|
||||
exec('pm2 save'); // Persist all deletions
|
||||
```
|
||||
|
||||
**Purpose**: Ensures PM2 process state persists across daemon restarts, server reboots, and internal reconciliation events.
|
||||
|
||||
**Why This Matters**: PM2 maintains an in-memory process list. Without `pm2 save`, processes become ephemeral:
|
||||
|
||||
- Daemon restart → All unsaved processes disappear
|
||||
- Server reboot → Process list reverts to last saved state
|
||||
- PM2 internal reconciliation → Unsaved processes may be lost
|
||||
|
||||
**Pattern**: Every `pm2 start`, `pm2 restart`, `pm2 stop`, or `pm2 delete` MUST be followed by `pm2 save`.
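The delete-then-save pattern can be made harder to forget by wrapping the loop and the save in one function. This is a sketch under stated assumptions: `deleteAndSave` and the injected `run` callback are hypothetical names, and in a real workflow script `run` would be a thin wrapper around something like `child_process.execSync`.

```javascript
// Sketch: delete every targeted process, then persist the list exactly once.
// `run` is an injected command runner (hypothetical) so the logic can be
// exercised without a PM2 daemon.
function deleteAndSave(targetProcesses, run) {
  // Delete each targeted process by its PM2 id.
  for (const p of targetProcesses) {
    run(`pm2 delete ${p.pm2_env.pm_id}`);
  }
  // Persist the resulting process list after all deletions.
  run('pm2 save');
}

// Dry run: collect the commands instead of executing them.
const issued = [];
deleteAndSave(
  [{ pm2_env: { pm_id: 3 } }, { pm2_env: { pm_id: 7 } }],
  (cmd) => issued.push(cmd),
);
console.log(issued.join('\n'));
```

Injecting the runner keeps the save step in the same code path as the deletions, so a caller cannot perform one without the other.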
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
203
docs/adr/0062-lightweight-version-sync-workflow.md
Normal file
@@ -0,0 +1,203 @@
|
||||
# ADR-062: Lightweight Version Sync Workflow
|
||||
|
||||
**Status:** Accepted
|
||||
**Date:** 2026-02-18
|
||||
**Decision Makers:** Development Team
|
||||
**Related:** ADR-061 (PM2 Process Isolation Safeguards)
|
||||
|
||||
## Context
|
||||
|
||||
After successful production deployments, the version number in `package.json` is bumped and pushed back to the `main` branch. This triggered the full test deployment workflow (`deploy-to-test.yml`), which includes:
|
||||
|
||||
- `npm ci` (dependency installation)
|
||||
- TypeScript type-checking
|
||||
- Prettier formatting
|
||||
- ESLint linting
|
||||
- Unit tests (~150 tests)
|
||||
- Integration tests (~50 tests)
|
||||
- React build with source maps
|
||||
- Full rsync deployment
|
||||
- PM2 process restart
|
||||
|
||||
**Problem:** Running the complete 5-7 minute test suite just to update a 10KB `package.json` file was wasteful:
|
||||
|
||||
| Resource | Waste |
|
||||
| -------------- | ------------------------------------- |
|
||||
| **Duration** | 5-7 minutes (vs 30 seconds needed) |
|
||||
| **CPU** | ~95% unnecessary (full test suite) |
|
||||
| **File I/O** | 99.8% unnecessary (only need 1 file) |
|
||||
| **CI Queue** | Blocked other workflows unnecessarily |
|
||||
| **Time/month** | ~20 minutes wasted on version syncs |
|
||||
|
||||
The code being tested had already passed the full test suite when originally pushed to `main`. Re-running tests for a version number change provided no additional value.
|
||||
|
||||
## Decision
|
||||
|
||||
Implement a lightweight **version-sync-only workflow** that:
|
||||
|
||||
1. **Triggers automatically** after successful production deployments (via `workflow_run`)
|
||||
2. **Updates only** the `package.json` version in the test deployment directory
|
||||
3. **Restarts PM2** with `--update-env` to refresh version metadata
|
||||
4. **Completes in ~30 seconds** instead of 5-7 minutes
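Step 2 of the list above, updating only the version field, could look roughly like the following. This is an illustrative sketch, not the workflow's actual step: `syncVersion` is a hypothetical helper, and the package name and the `0.16.1` starting version are made-up sample values.

```javascript
// Sketch: copy only the "version" field from the freshly checked-out
// package.json into the deployed copy, leaving every other field untouched.
// `syncVersion` is a hypothetical helper operating on JSON text.
function syncVersion(deployedJsonText, sourceJsonText) {
  const deployed = JSON.parse(deployedJsonText);
  const source = JSON.parse(sourceJsonText);
  if (typeof source.version !== 'string') {
    throw new Error('source package.json has no version string');
  }
  deployed.version = source.version;
  // Two-space indent plus a trailing newline.
  return JSON.stringify(deployed, null, 2) + '\n';
}

// Sample inputs (assumed values for illustration).
const updated = syncVersion(
  '{"name":"flyer-crawler","version":"0.16.1","private":true}',
  '{"name":"flyer-crawler","version":"0.16.2"}',
);
console.log(updated);
```

Working on the parsed object rather than with a text substitution avoids accidentally rewriting a dependency version that happens to match the old string.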
|
||||
|
||||
### Architecture
|
||||
|
||||
```yaml
|
||||
# .gitea/workflows/sync-test-version.yml
|
||||
on:
|
||||
workflow_run:
|
||||
workflows: ['Deploy to Production']
|
||||
types: [completed]
|
||||
branches: [main]
|
||||
|
||||
jobs:
|
||||
sync-version:
|
||||
if: ${{ gitea.event.workflow_run.conclusion == 'success' }}
|
||||
steps:
|
||||
- Checkout latest main
|
||||
- Update test package.json version
|
||||
- PM2 restart with --update-env
|
||||
- Verify version in PM2 metadata
|
||||
```
|
||||
|
||||
**Key Points:**
|
||||
|
||||
- `deploy-to-test.yml` remains **unchanged** (runs normally for code changes)
|
||||
- `deploy-to-prod.yml` remains **unchanged** (no explicit trigger needed)
|
||||
- `workflow_run` automatically triggers after production deployment
|
||||
- Non-blocking: production success doesn't depend on version sync
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
✅ **90% faster** version synchronization (30 sec vs 5-7 min)
|
||||
✅ **95% less CPU** usage (no test suite execution)
|
||||
✅ **99.8% less file I/O** (only package.json updated)
|
||||
✅ **Cleaner separation** of concerns (version sync vs full deployment)
|
||||
✅ **No workflow file pollution** with conditionals
|
||||
✅ **Saves ~20 minutes/month** of CI time
|
||||
|
||||
### Negative
|
||||
|
||||
⚠️ Test environment version could briefly lag behind production (30 second window)
|
||||
⚠️ Additional workflow file to maintain
|
||||
⚠️ PM2 version metadata relies on `--update-env` working correctly
|
||||
|
||||
### Neutral
|
||||
|
||||
- Full test deployment still runs for actual code changes (as designed)
|
||||
- Version numbers remain synchronized (just via different workflow)
|
||||
- Same end state, different path
|
||||
|
||||
## Alternatives Considered
|
||||
|
||||
### 1. Add Conditionals to `deploy-to-test.yml`
|
||||
|
||||
**Approach:** Detect version bump commits and skip most steps
|
||||
|
||||
**Rejected because:**
|
||||
|
||||
- Pollutes workflow with 20+ `if:` conditionals
|
||||
- Makes workflow complex and brittle
|
||||
- Still queues a workflow run (even if it exits early)
|
||||
- Harder to maintain and debug
|
||||
|
||||
### 2. Manual Production Deployments
|
||||
|
||||
**Approach:** Make production `workflow_dispatch` only (manual trigger)
|
||||
|
||||
**Rejected because:**
|
||||
|
||||
- Removes automation benefits
|
||||
- Requires human intervention for every production deploy
|
||||
- Doesn't align with CI/CD best practices
|
||||
- Slows down deployment velocity
|
||||
|
||||
### 3. Don't Sync Test Versions
|
||||
|
||||
**Approach:** Accept that test version lags behind production
|
||||
|
||||
**Rejected because:**
|
||||
|
||||
- Version numbers become meaningless in test
|
||||
- Harder to correlate test issues with production releases
|
||||
- Loses visibility into what's deployed where
|
||||
|
||||
### 4. Release Tag-Based Production Deployments
|
||||
|
||||
**Approach:** Only deploy production on release tags, not on every push
|
||||
|
||||
**Rejected because:**
|
||||
|
||||
- Changes current deployment cadence
|
||||
- Adds manual release step overhead
|
||||
- Doesn't solve the fundamental problem (version sync still needed)
|
||||
|
||||
## Implementation
|
||||
|
||||
### Files Created
|
||||
|
||||
- `.gitea/workflows/sync-test-version.yml` - Lightweight version sync workflow
|
||||
|
||||
### Files Modified
|
||||
|
||||
- None (clean separation, no modifications to existing workflows)
|
||||
|
||||
### Configuration
|
||||
|
||||
No additional secrets or environment variables required. Uses existing PM2 configuration and test deployment paths.
|
||||
|
||||
## Verification
|
||||
|
||||
After implementation:
|
||||
|
||||
1. **Production deployment** completes and bumps version
|
||||
2. **Version sync workflow** triggers automatically within seconds
|
||||
3. **Test PM2 processes** restart with updated version metadata
|
||||
4. **PM2 list** shows matching versions across environments
|
||||
|
||||
```bash
|
||||
# Verify version sync worked
|
||||
pm2 jlist | jq '.[] | select(.name | endswith("-test")) | {name, version: .pm2_env.version}'
|
||||
```
|
||||
|
||||
Expected output:
|
||||
|
||||
```json
|
||||
{
|
||||
"name": "flyer-crawler-api-test",
|
||||
"version": "0.16.2"
|
||||
}
|
||||
{
|
||||
"name": "flyer-crawler-worker-test",
|
||||
"version": "0.16.2"
|
||||
}
|
||||
```
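The same check can be scripted when a pass/fail result is more useful than eyeballing the output. The sketch below assumes the `pm2 jlist` output has already been parsed into an array; `testProcessVersions` and `allVersionsMatch` are hypothetical helper names, and the sample data is illustrative.

```javascript
// Sketch: select processes whose name ends with "-test" from parsed
// `pm2 jlist` output and report their name and version.
function testProcessVersions(processes) {
  return processes
    .filter((p) => p.name.endsWith('-test'))
    .map((p) => ({ name: p.name, version: p.pm2_env.version }));
}

// True only if at least one test process exists and all report `expected`.
function allVersionsMatch(processes, expected) {
  const found = testProcessVersions(processes);
  return found.length > 0 && found.every((p) => p.version === expected);
}

// Sample data shaped like `pm2 jlist` entries (assumed, trimmed to the
// fields used here).
const sample = [
  { name: 'flyer-crawler-api-test', pm2_env: { version: '0.16.2' } },
  { name: 'flyer-crawler-worker-test', pm2_env: { version: '0.16.2' } },
  { name: 'flyer-crawler-api', pm2_env: { version: '0.16.2' } },
];

console.log(testProcessVersions(sample));
console.log(allVersionsMatch(sample, '0.16.2'));
```

Requiring at least one matching process guards against a false pass when the test processes are not running at all.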
|
||||
|
||||
## Monitoring
|
||||
|
||||
Track workflow execution times:
|
||||
|
||||
- Before: `deploy-to-test.yml` duration after prod deploys (~5-7 min)
|
||||
- After: `sync-test-version.yml` duration (~30 sec)
|
||||
|
||||
## Rollback Plan
|
||||
|
||||
If version sync fails or causes issues:
|
||||
|
||||
1. Remove `sync-test-version.yml`
|
||||
2. Previous behavior automatically resumes (full test deployment on version bumps)
|
||||
3. No data loss or configuration changes needed
|
||||
|
||||
## References
|
||||
|
||||
- Inspired by similar optimization in `stock-alert` project (commit `021f9c8`)
|
||||
- Related: ADR-061 for PM2 process isolation and deployment safety
|
||||
- Workflow pattern: GitHub Actions `workflow_run` trigger documentation
|
||||
|
||||
## Notes
|
||||
|
||||
This optimization follows the principle: **"Don't test what you've already tested."**
|
||||
|
||||
The code was validated when originally pushed to `main`. Version number changes are metadata updates, not code changes. Treating them differently is an architectural improvement, not a shortcut.
|
||||
191
docs/adr/0063-pm2-namespace-implementation.md
Normal file
@@ -0,0 +1,191 @@
|
||||
# ADR-063: PM2 Namespace Implementation
|
||||
|
||||
## Status
|
||||
|
||||
Accepted
|
||||
|
||||
## Context
|
||||
|
||||
### Problem
|
||||
|
||||
The PM2 process isolation safeguards implemented in [ADR-061](./0061-pm2-process-isolation-safeguards.md) successfully prevented cross-application process deletion but introduced operational complexity. Every PM2 command in deployment workflows required:
|
||||
|
||||
1. Process name filtering logic (JavaScript inline scripts)
|
||||
2. Safety abort checks (process count validation)
|
||||
3. Pre/post verification logging
|
||||
|
||||
Additionally, simultaneous test and production deployments created a race condition with `pm2 save`:
|
||||
|
||||
- Test deployment: `pm2 save` writes test processes to dump file
|
||||
- Prod deployment: `pm2 save` writes prod processes to dump file (overwrites test state)
|
||||
- PM2 daemon restart: Restores incomplete process list
|
||||
|
||||
This race condition could cause process loss on PM2 daemon restart.
|
||||
|
||||
### Requirements
|
||||
|
||||
1. Complete isolation between test/prod/dev PM2 processes
|
||||
2. Eliminate `pm2 save` race condition
|
||||
3. Simplify workflow commands
|
||||
4. Maintain backward compatibility during migration
|
||||
|
||||
## Decision
|
||||
|
||||
Implement PM2 namespaces with separate dump files per environment:
|
||||
|
||||
| Namespace | Config File | Use Case |
|
||||
| -------------------- | --------------------------- | ----------------------- |
|
||||
| `flyer-crawler-prod` | `ecosystem.config.cjs` | Production deployment |
|
||||
| `flyer-crawler-test` | `ecosystem-test.config.cjs` | Test/staging deployment |
|
||||
| `flyer-crawler-dev` | `ecosystem.dev.config.cjs` | Local development |
|
||||
|
||||
### Implementation
|
||||
|
||||
#### Ecosystem Config Changes
|
||||
|
||||
Each config file declares its namespace at the module level:
|
||||
|
||||
```javascript
|
||||
// ecosystem.config.cjs (production)
|
||||
module.exports = {
|
||||
namespace: 'flyer-crawler-prod',
|
||||
apps: [
|
||||
/* ... */
|
||||
],
|
||||
};
|
||||
|
||||
// ecosystem-test.config.cjs (test)
|
||||
module.exports = {
|
||||
namespace: 'flyer-crawler-test',
|
||||
apps: [
|
||||
/* ... */
|
||||
],
|
||||
};
|
||||
|
||||
// ecosystem.dev.config.cjs (development)
|
||||
module.exports = {
|
||||
namespace: 'flyer-crawler-dev',
|
||||
apps: [
|
||||
/* ... */
|
||||
],
|
||||
};
|
||||
```
|
||||
|
||||
#### Workflow Command Pattern
|
||||
|
||||
All PM2 commands require the `--namespace` flag:
|
||||
|
||||
```bash
|
||||
# Start/reload
|
||||
pm2 startOrReload ecosystem.config.cjs --update-env --namespace flyer-crawler-prod
|
||||
|
||||
# Process management
|
||||
pm2 stop flyer-crawler-api --namespace flyer-crawler-prod
|
||||
pm2 restart flyer-crawler-api flyer-crawler-worker --namespace flyer-crawler-prod
|
||||
pm2 delete flyer-crawler-api --namespace flyer-crawler-prod
|
||||
|
||||
# Status
|
||||
pm2 list --namespace flyer-crawler-prod
|
||||
pm2 jlist --namespace flyer-crawler-prod
|
||||
pm2 logs flyer-crawler-api --namespace flyer-crawler-prod
|
||||
pm2 describe flyer-crawler-api --namespace flyer-crawler-prod
|
||||
|
||||
# Save (namespace-isolated dump file)
|
||||
pm2 save --namespace flyer-crawler-prod
|
||||
```
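Because every command now needs the flag, scripts that assemble PM2 commands may benefit from a single builder that refuses to produce a command without a namespace. This is a minimal sketch; `pm2Command` is a hypothetical helper, not part of PM2 or of this repository.

```javascript
// Sketch: build a PM2 command string that always carries --namespace.
// `pm2Command` is a hypothetical helper; it only formats strings and does
// not execute anything.
function pm2Command(action, targets, namespace) {
  if (!namespace) {
    throw new Error('namespace is required');
  }
  return ['pm2', action, ...targets, '--namespace', namespace].join(' ');
}

console.log(
  pm2Command(
    'restart',
    ['flyer-crawler-api', 'flyer-crawler-worker'],
    'flyer-crawler-prod',
  ),
);
console.log(pm2Command('save', [], 'flyer-crawler-prod'));
```

Failing loudly on a missing namespace turns the "remember the flag" convention into something a script enforces rather than something reviewers have to spot.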
|
||||
|
||||
#### Migration Script
|
||||
|
||||
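Migration from un-namespaced processes to namespaced processes. The script stops and deletes the old processes before starting the new ones, so expect a brief interruption: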
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# migrate-to-pm2-namespaces.sh
|
||||
|
||||
# 1. Stop old processes (by name)
|
||||
pm2 stop flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker || true
|
||||
pm2 stop flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test || true
|
||||
|
||||
# 2. Delete old processes
|
||||
pm2 delete flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker || true
|
||||
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test || true
|
||||
|
||||
# 3. Save to clear old dump file
|
||||
pm2 save --force
|
||||
|
||||
# 4. Start with namespaces
|
||||
cd /var/www/flyer-crawler.projectium.com
|
||||
pm2 start ecosystem.config.cjs --namespace flyer-crawler-prod
|
||||
pm2 save --namespace flyer-crawler-prod
|
||||
|
||||
cd /var/www/flyer-crawler-test.projectium.com
|
||||
pm2 start ecosystem-test.config.cjs --namespace flyer-crawler-test
|
||||
pm2 save --namespace flyer-crawler-test
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
1. **Complete Process Isolation**: Namespaces create logical boundaries preventing cross-environment process operations
|
||||
2. **No Save Race Condition**: Each namespace maintains a separate dump file at `~/.pm2/dump-<namespace>.pm2`
|
||||
3. **Simplified Commands**: No inline JavaScript filtering; use explicit namespace flag
|
||||
4. **Clear Organization**: `pm2 list --namespace <name>` shows only relevant processes
|
||||
5. **Retained Safeguards**: Defense-in-depth from ADR-061 remains as additional protection layer
|
||||
|
||||
### Negative
|
||||
|
||||
1. **Command Verbosity**: All PM2 commands require the `--namespace` flag
|
||||
2. **Migration Required**: One-time migration to move existing processes into namespaces
|
||||
3. **Learning Curve**: Team must remember to include namespace flag
|
||||
|
||||
### Trade-offs
|
||||
|
||||
| Without Namespace | With Namespace |
|
||||
| --------------------------------- | ------------------------------------------------ |
|
||||
| `pm2 list` | `pm2 list --namespace flyer-crawler-prod` |
|
||||
| `pm2 logs app` | `pm2 logs app --namespace flyer-crawler-prod` |
|
||||
| `pm2 restart app` | `pm2 restart app --namespace flyer-crawler-prod` |
|
||||
| Filter logic in workflows | Explicit namespace declaration |
|
||||
| Single dump file (race condition) | Per-namespace dump files |
|
||||
|
||||
## Files Modified
|
||||
|
||||
| File | Changes |
|
||||
| ------------------------------------------ | ---------------------------------------------------------- |
|
||||
| `ecosystem.config.cjs` | Added `namespace: 'flyer-crawler-prod'` |
|
||||
| `ecosystem-test.config.cjs` | Added `namespace: 'flyer-crawler-test'` |
|
||||
| `ecosystem.dev.config.cjs` | Added `namespace: 'flyer-crawler-dev'` |
|
||||
| `.gitea/workflows/deploy-to-prod.yml` | Added `--namespace flyer-crawler-prod` to all PM2 commands |
|
||||
| `.gitea/workflows/deploy-to-test.yml` | Added `--namespace flyer-crawler-test` to all PM2 commands |
|
||||
| `.gitea/workflows/restart-pm2.yml` | Added `--namespace` flag for both environments |
|
||||
| `.gitea/workflows/manual-deploy-major.yml` | Added `--namespace flyer-crawler-prod` to PM2 commands |
|
||||
|
||||
## Verification
|
||||
|
||||
After migration, verify namespace isolation:
|
||||
|
||||
```bash
|
||||
# Should show only production processes
|
||||
pm2 list --namespace flyer-crawler-prod
|
||||
|
||||
# Should show only test processes
|
||||
pm2 list --namespace flyer-crawler-test
|
||||
|
||||
# Should show only dev processes (if running)
|
||||
pm2 list --namespace flyer-crawler-dev
|
||||
|
||||
# Verify separate dump files exist
|
||||
ls -la ~/.pm2/dump-flyer-crawler-*.pm2
|
||||
```
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [ADR-061: PM2 Process Isolation Safeguards](./0061-pm2-process-isolation-safeguards.md) - Prior safeguards (still active)
|
||||
- [ADR-014: Containerization and Deployment Strategy](./0014-containerization-and-deployment-strategy.md) - Overall deployment architecture
|
||||
- [PM2 Namespace Documentation](https://pm2.keymetrics.io/docs/usage/application-declaration/#namespace)
|
||||
|
||||
## References
|
||||
|
||||
- PM2 Ecosystem File: https://pm2.keymetrics.io/docs/usage/application-declaration/
|
||||
- PM2 Namespaces: https://pm2.keymetrics.io/docs/usage/process-management/#namespaces
|
||||
@@ -57,6 +57,8 @@ This directory contains a log of the architectural decisions made for the Flyer
|
||||
**[ADR-053](./0053-worker-health-checks.md)**: Worker Health Checks and Stalled Job Monitoring (Accepted)
|
||||
**[ADR-054](./0054-bugsink-gitea-issue-sync.md)**: Bugsink to Gitea Issue Synchronization (Proposed)
|
||||
**[ADR-061](./0061-pm2-process-isolation-safeguards.md)**: PM2 Process Isolation Safeguards (Accepted)
|
||||
**[ADR-062](./0062-lightweight-version-sync-workflow.md)**: Lightweight Version Sync Workflow (Accepted)
|
||||
**[ADR-063](./0063-pm2-namespace-implementation.md)**: PM2 Namespace Implementation (Accepted)
|
||||
|
||||
## 7. Frontend / User Interface
|
||||
|
||||
|
||||
@@ -17,15 +17,15 @@ This guide covers deploying Flyer Crawler to a production server.
|
||||
|
||||
### Command Reference Table
|
||||
|
||||
| Task | Command |
|
||||
| -------------------- | ----------------------------------------------------------------------- |
|
||||
| Deploy to production | Gitea Actions workflow (manual trigger) |
|
||||
| Deploy to test | Automatic on push to `main` |
|
||||
| Check PM2 status | `pm2 list` |
|
||||
| View logs | `pm2 logs flyer-crawler-api --lines 100` |
|
||||
| Restart all | `pm2 restart all` |
|
||||
| Check NGINX | `sudo nginx -t && sudo systemctl status nginx` |
|
||||
| Check health | `curl -s https://flyer-crawler.projectium.com/api/health/ready \| jq .` |
|
||||
| Task | Command |
|
||||
| -------------------- | ----------------------------------------------------------------------------------------------- |
|
||||
| Deploy to production | Gitea Actions workflow (manual trigger) |
|
||||
| Deploy to test | Automatic on push to `main` |
|
||||
| Check PM2 status | `pm2 list` |
|
||||
| View logs | `pm2 logs flyer-crawler-api --lines 100` |
|
||||
| Restart all | `pm2 restart flyer-crawler-api flyer-crawler-worker flyer-crawler-analytics-worker && pm2 save` |
|
||||
| Check NGINX | `sudo nginx -t && sudo systemctl status nginx` |
|
||||
| Check health | `curl -s https://flyer-crawler.projectium.com/api/health/ready \| jq .` |
|
||||
|
||||
### Deployment URLs
|
||||
|
||||
@@ -274,7 +274,40 @@ sudo systemctl reload nginx
|
||||
|
||||
---
|
||||
|
||||
## PM2 Log Management
|
||||
## PM2 Process Management
|
||||
|
||||
### Critical: Always Save After State Changes
|
||||
|
||||
**CRITICAL**: Every `pm2 start`, `pm2 restart`, `pm2 stop`, or `pm2 delete` command MUST be immediately followed by `pm2 save`.
|
||||
|
||||
Without `pm2 save`, processes become ephemeral and will disappear on:
|
||||
|
||||
- PM2 daemon restarts
|
||||
- Server reboots
|
||||
- Internal PM2 reconciliation events
|
||||
|
||||
**Correct Pattern:**
|
||||
|
||||
```bash
|
||||
# ✅ CORRECT - Always save after state changes
|
||||
pm2 restart flyer-crawler-api && pm2 save
|
||||
pm2 stop flyer-crawler-worker && pm2 save
|
||||
pm2 delete flyer-crawler-analytics-worker && pm2 save
|
||||
pm2 startOrReload ecosystem.config.cjs --env production --update-env && pm2 save
|
||||
|
||||
# ❌ WRONG - Missing save (processes become ephemeral)
|
||||
pm2 restart flyer-crawler-api
|
||||
pm2 stop flyer-crawler-worker
|
||||
|
||||
# ✅ Read-only operations don't need save
|
||||
pm2 list
|
||||
pm2 logs flyer-crawler-api
|
||||
pm2 monit
|
||||
```
|
||||
|
||||
**Why This Matters**: PM2 maintains an in-memory process list. The `pm2 save` command writes this list to `~/.pm2/dump.pm2`, which PM2 uses to resurrect processes after daemon restarts. Without it, your carefully managed process state is lost.
|
||||
|
||||
### PM2 Log Management
|
||||
|
||||
Install and configure pm2-logrotate to manage log files:
|
||||
|
||||
|
||||