# DevOps Subagent Reference
## Critical Rule: Server Access is READ-ONLY
**Claude Code has READ-ONLY access to production/test servers.** The `claude-win10` user cannot execute write operations directly.

When working with production/test servers:

1. **Provide commands** for the user to execute (do not attempt SSH)
2. **Wait for the user** to report command output
3. **Provide fix commands** 1-3 at a time (errors may cascade)
4. **Verify success** with read-only commands after the user executes fixes
5. **Document findings** in the relevant documentation

Commands in this reference are for the **user to run on the server**, not for Claude to execute.

---
## Critical Rule: Git Bash Path Conversion
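Git Bash (MSYS2) on Windows automatically rewrites arguments that look like Unix paths (for example `/usr/local/bin/script.sh`) into Windows paths under the Git installation directory before passing them to native programs such as `podman`. Paths meant for the container therefore arrive mangled. Use one of these workarounds:
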
| Solution | Example |
| ---------------------------- | -------------------------------------------------------- |
| `sh -c` with single quotes | `podman exec container sh -c '/usr/local/bin/script.sh'` |
| Double slashes | `podman exec container //usr//local//bin//script.sh` |
| `MSYS_NO_PATHCONV=1`         | `MSYS_NO_PATHCONV=1 podman exec ...`                     |
| Windows paths for host files | `podman cp "d:/path/file" container:/tmp/file` |
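
If you run many container commands from Git Bash, a small wrapper can apply the `MSYS_NO_PATHCONV=1` workaround automatically. This is a minimal sketch: the function name `pexec` is hypothetical, and it assumes `uname -s` reports `MINGW*` or `MSYS*` under Git Bash, which is the usual behavior.

```bash
# Hypothetical helper: run `podman exec` with MSYS path conversion disabled
# when the current shell is Git Bash (MINGW/MSYS); elsewhere it passes through unchanged.
pexec() {
  case "$(uname -s)" in
    MINGW* | MSYS*) MSYS_NO_PATHCONV=1 podman exec "$@" ;;
    *) podman exec "$@" ;;
  esac
}

# Example: /app/sql reaches the container unconverted
# pexec flyer-crawler-dev ls /app/sql
```
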
---
## Container Commands (Podman)
### Dev Container Operations
```bash
# List running containers
podman ps
# Container logs
podman logs flyer-crawler-dev
podman logs -f flyer-crawler-dev # Follow
# Execute in container
podman exec -it flyer-crawler-dev bash
podman exec -it flyer-crawler-dev npm run test:unit
# Restart container
podman restart flyer-crawler-dev
# Container resource usage
podman stats flyer-crawler-dev
```
### Test Execution (from Windows)
```bash
# Unit tests - pipe output for AI processing
# (no -t: a pseudo-TTY adds color codes and carriage returns to the captured output)
podman exec flyer-crawler-dev npm run test:unit 2>&1 | tee test-results.txt
# Integration tests
podman exec -it flyer-crawler-dev npm run test:integration
# Type check (CRITICAL before commit)
podman exec -it flyer-crawler-dev npm run type-check
# Specific test file
podman exec -it flyer-crawler-dev npm test -- --run src/hooks/useAuth.test.tsx
```
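
When test output is piped through `tee`, the pipeline's exit status is `tee`'s, so a failing test run can look successful to a script. A sketch that keeps the real exit status by enabling `pipefail` in a subshell (the function name `run_unit_tests` is hypothetical):

```bash
# Hypothetical helper: run unit tests in the dev container, save the output,
# and return the test command's exit status rather than tee's.
run_unit_tests() {
  local out="${1:-test-results.txt}"
  (
    set -o pipefail  # scoped to this subshell only
    podman exec flyer-crawler-dev npm run test:unit 2>&1 | tee "$out"
  )
}

# Usage:
# run_unit_tests || echo "Unit tests failed; see test-results.txt"
```
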
### Database Operations (from Windows)
```bash
# Reset dev database
podman exec -it flyer-crawler-dev npm run db:reset:dev
# Access PostgreSQL
podman exec -it flyer-crawler-dev psql -U postgres -d flyer_crawler_dev
# Run SQL file (use MSYS_NO_PATHCONV to avoid path conversion)
MSYS_NO_PATHCONV=1 podman exec -it flyer-crawler-dev psql -U postgres -d flyer_crawler_dev -f /app/sql/master_schema_rollup.sql
```
---
## PM2 Commands
### Production Server
> **Note**: These commands are for the **user to execute on the server**. Claude Code provides commands but cannot run them directly. See [Server Access is READ-ONLY](#critical-rule-server-access-is-read-only) above.
```bash
# List all apps
pm2 list
# App status
pm2 show flyer-crawler-api
# Logs
pm2 logs flyer-crawler-api
pm2 logs --lines 100
# Restart apps
pm2 restart flyer-crawler-api
pm2 reload flyer-crawler-api # Zero-downtime in cluster mode; fork-mode apps are simply restarted
# Stop/Start
pm2 stop flyer-crawler-api
pm2 start flyer-crawler-api
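# Delete all apps and re-register them from the config
# Note: `pm2 delete all` removes EVERY app in this PM2 instance, including any *-test apps running under it;
# `pm2 delete ecosystem.config.cjs` removes only the apps defined in that file
pm2 delete all
pm2 start ecosystem.config.cjs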
```
### PM2 Config: `ecosystem.config.cjs`
| App | Purpose | Memory | Mode |
| -------------------------------- | ---------------- | ------ | ------- |
| `flyer-crawler-api` | Express server | 500M | Cluster |
| `flyer-crawler-worker` | BullMQ worker | 1G | Fork |
| `flyer-crawler-analytics-worker` | Analytics worker | 1G | Fork |
Test variants use the `*-test` suffix and are defined in `ecosystem-test.config.cjs`.

---
## CI/CD Workflows
### Location: `.gitea/workflows/`
| Workflow | Trigger | Purpose |
| ----------------------------- | ------------ | ----------------------- |
| `deploy-to-test.yml` | Push to main | Auto-deploy to test env |
| `deploy-to-prod.yml` | Manual | Deploy to production |
| `manual-db-backup.yml` | Manual | Database backup |
| `manual-db-reset-test.yml` | Manual | Reset test database |
| `manual-db-reset-prod.yml` | Manual | Reset prod database |
| `manual-db-restore.yml` | Manual | Restore database |
| `manual-deploy-major.yml` | Manual | Major version release |
| `manual-redis-flush-prod.yml` | Manual | Flush Redis cache |
### Deploy to Test Pipeline Steps
1. Checkout code
2. Set up Node.js 20
3. `npm ci`
4. Bump patch version (creates git tag)
5. `npm run type-check`
6. `npm run test:unit`
7. Check schema hash against deployed DB
8. `npm run build`
9. Copy files to `/var/www/flyer-crawler-test.projectium.com/`
10. `pm2 reload ecosystem-test.config.cjs`
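
Before pushing to `main`, you can approximate the pipeline's gating steps (5, 6 and 8) locally in the dev container. This is a sketch, not the actual workflow: it skips the version bump, schema-hash check and file copy, and the function name `ci_precheck` is hypothetical.

```bash
# Hypothetical helper: run the npm scripts the test pipeline gates on,
# in pipeline order, stopping at the first failure.
ci_precheck() {
  local container="${1:-flyer-crawler-dev}" step
  for step in type-check test:unit build; do
    echo "==> npm run $step"
    if ! podman exec "$container" npm run "$step"; then
      echo "FAILED: $step" >&2
      return 1
    fi
  done
  echo "All local CI checks passed"
}
```
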
### Deploy to Production Pipeline Steps
1. Verify confirmation phrase ("deploy-to-prod")
2. Checkout `main` branch
3. `npm ci`
4. Bump minor version (creates git tag)
5. Check schema hash against prod DB
6. `npm run build` (with Sentry source maps)
7. Copy files to `/var/www/flyer-crawler.projectium.com/`
8. `pm2 reload ecosystem.config.cjs`
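
Step 1 guards against accidental production deploys. The idea, sketched in shell; how the workflow actually receives the phrase is not shown here, so passing it as the first argument (and the `CONFIRMATION` variable in the usage line) is an assumption.

```bash
# Hypothetical sketch of the confirmation guard: continue only if the exact phrase was given.
require_confirmation() {
  if [ "${1:-}" != "deploy-to-prod" ]; then
    echo "Confirmation phrase does not match 'deploy-to-prod'; aborting." >&2
    return 1
  fi
}

# Usage (e.g. at the top of a deploy script):
# require_confirmation "$CONFIRMATION" || exit 1
```
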
---
## Deployment Paths
| Environment | Path | Domain |
| ------------- | --------------------------------------------- | ----------------------------------- |
| Production | `/var/www/flyer-crawler.projectium.com/` | `flyer-crawler.projectium.com` |
| Test | `/var/www/flyer-crawler-test.projectium.com/` | `flyer-crawler-test.projectium.com` |
| Dev Container | `/app/` | `localhost:3000` |
---
## Environment Configuration
### Files
| File | Purpose |
| --------------------------- | ----------------------- |
| `ecosystem.config.cjs` | PM2 production config |
| `ecosystem-test.config.cjs` | PM2 test config |
| `src/config/env.ts` | Zod schema for env vars |
| `.env.example` | Template for env vars |
### Required Secrets (Gitea CI/CD)
| Category | Secrets |
| ---------- | -------------------------------------------------------------------------------------------- |
| Database | `DB_HOST`, `DB_USER_PROD`, `DB_PASSWORD_PROD`, `DB_DATABASE_PROD` |
| Test DB | `DB_USER_TEST`, `DB_PASSWORD_TEST`, `DB_DATABASE_TEST` |
| Redis | `REDIS_PASSWORD_PROD`, `REDIS_PASSWORD_TEST` |
| Auth | `JWT_SECRET`, `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`, `GH_CLIENT_ID`, `GH_CLIENT_SECRET` |
| AI | `VITE_GOOGLE_GENAI_API_KEY`, `VITE_GOOGLE_GENAI_API_KEY_TEST` |
| Monitoring | `SENTRY_DSN`, `SENTRY_DSN_TEST`, `SENTRY_AUTH_TOKEN` |
| Maps | `GOOGLE_MAPS_API_KEY` |
### Adding New Secret
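1. Add the secret in Gitea under the repository's **Settings > Actions > Secrets**
2. Reference it in the workflow YAML, e.g. `SENTRY_DSN: ${{ secrets.SENTRY_DSN }}`
3. Pass it through in `ecosystem.config.cjs` (and `ecosystem-test.config.cjs` if the test app needs it)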
4. Update `src/config/env.ts` Zod schema
5. Update `.env.example`
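
The checklist above touches several files, and a missed one usually surfaces only at deploy time. A rough sketch that greps for the secret name in each location; the helper name is hypothetical, and it only checks that the name appears in each file, not that it is wired correctly. Run it from the repository root.

```bash
# Hypothetical helper: report which checklist locations do not mention a secret yet.
# -w matches whole words (so SENTRY_DSN does not match SENTRY_DSN_TEST),
# -F treats the name literally, -s silences errors for missing files.
check_secret_wired() {
  local name="$1" missing=0 f
  if ! grep -qwsF -- "$name" .gitea/workflows/*.yml; then
    echo "missing in .gitea/workflows/*.yml"
    missing=1
  fi
  for f in ecosystem.config.cjs src/config/env.ts .env.example; do
    if ! grep -qwsF -- "$name" "$f"; then
      echo "missing in $f"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# check_secret_wired SENTRY_DSN
```
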
---
## Redis Commands
### Dev Container
```bash
# Access Redis CLI
podman exec -it flyer-crawler-dev redis-cli
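# Common commands (typed at the redis-cli prompt, not in bash)
KEYS *
INFO
# FLUSHALL deletes every key in every database, including any BullMQ queue data
FLUSHALL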
```
### Production
> **Note**: User executes these commands on the server.
```bash
# Access Redis CLI (REDISCLI_AUTH keeps the password off the command line, unlike -a)
REDISCLI_AUTH="$REDIS_PASSWORD" redis-cli
# Flushing the cache is destructive: prefer the manual-redis-flush-prod.yml workflow
```
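
`KEYS *` walks the entire keyspace in one blocking call, so on a busy production Redis the incremental `SCAN` cursor is the safer way to inspect keys. A read-only sketch using `redis-cli --scan`; the function name is hypothetical, and the `bull:*` pattern assumes BullMQ's default key prefix.

```bash
# Hypothetical helper: count keys matching a pattern using SCAN (via --scan),
# which iterates incrementally instead of blocking the server like KEYS.
count_keys() {
  redis-cli --scan --pattern "${1:-*}" | wc -l
}

# Usage on the server (REDISCLI_AUTH supplies the password to redis-cli):
# export REDISCLI_AUTH="$REDIS_PASSWORD"
# count_keys 'bull:*'
```
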
---
## Health Checks
### Endpoints
| Endpoint | Purpose |
| -------------------------- | -------------------------------------- |
| `GET /api/health` | Basic health check |
| `GET /api/health/detailed` | Full system status (DB, Redis, queues) |
### Manual Health Check
```bash
# From Windows
curl http://localhost:3000/api/health
# Or via Podman
podman exec -it flyer-crawler-dev curl http://localhost:3000/api/health
```
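
After a restart or deploy, the API may need a few seconds before it answers. A minimal polling sketch; the function name and defaults are hypothetical, and it only checks the HTTP status of the basic `/api/health` endpoint, not the response body.

```bash
# Hypothetical helper: poll a health endpoint until it answers with a non-error
# HTTP status (curl -f treats 4xx/5xx as failure) or the attempts run out.
wait_for_health() {
  local url="${1:-http://localhost:3000/api/health}"
  local attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    # No need to sleep after the final attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "unhealthy after $attempts attempt(s): $url" >&2
  return 1
}

# Usage:
# wait_for_health http://localhost:3000/api/health 20 2
```
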
---
## Log Locations
### Production Server
```bash
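# PM2 logs (under the home directory of the user that runs PM2)
ls -l ~/.pm2/logs/
# NGINX logs (may require sudo, depending on file permissions)
tail -n 100 /var/log/nginx/access.log
tail -n 100 /var/log/nginx/error.log
# Application logs (via PM2)
pm2 logs flyer-crawler-api --lines 200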
```
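
When the API misbehaves behind NGINX, a quick count of 5xx responses in the access log helps confirm whether errors are reaching clients. A sketch that assumes NGINX's default `combined` log format, where the status code is the ninth whitespace-separated field; a custom `log_format` or malformed request lines would shift that field, so treat the count as approximate.

```bash
# Count 5xx responses in an NGINX access log (default "combined" format assumed:
# the HTTP status is field 9, e.g. ... "GET /api/health HTTP/1.1" 502 ...).
count_5xx() {
  awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }' "${1:-/var/log/nginx/access.log}"
}

# Usage on the server:
# count_5xx /var/log/nginx/access.log
```
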
### Dev Container
```bash
# View container logs
podman logs flyer-crawler-dev
# Follow logs
podman logs -f flyer-crawler-dev
```
---
## Backup/Restore
### Database Backup (Manual Workflow)
Trigger `manual-db-backup.yml` from Gitea Actions UI.
### Manual Backup
> **Note**: User executes these commands on the server.
```bash
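# Backup (plain SQL dump with a dated filename)
PGPASSWORD="$DB_PASSWORD" pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" > "backup_$(date +%Y%m%d).sql"
# Restore into an empty database (a plain dump does not drop existing objects);
# replace backup_YYYYMMDD.sql with the dump to restore. ON_ERROR_STOP aborts at the first error.
PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -U "$DB_USER" -v ON_ERROR_STOP=1 "$DB_NAME" < backup_YYYYMMDD.sql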
```
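
For recurring manual backups, compressing the dump and pruning old files keeps disk usage bounded. A sketch assuming the same `DB_*` variables as above; `BACKUP_DIR`, `KEEP_DAYS` and the function name are hypothetical. `pipefail` ensures a failed `pg_dump` is not masked by `gzip` succeeding.

```bash
# Hypothetical helper: gzip-compressed, timestamped pg_dump with simple retention.
backup_db() {
  local dir="${BACKUP_DIR:-$HOME/backups}" keep="${KEEP_DAYS:-14}" file
  file="$dir/backup_$(date +%Y%m%d_%H%M%S).sql.gz"
  mkdir -p "$dir" || return 1
  if ! (
    set -o pipefail
    PGPASSWORD="$DB_PASSWORD" pg_dump -h "$DB_HOST" -U "$DB_USER" "$DB_NAME" | gzip > "$file"
  ); then
    rm -f "$file"  # do not leave a truncated archive behind
    echo "backup failed" >&2
    return 1
  fi
  # Delete archives older than $keep days
  find "$dir" -name 'backup_*.sql.gz' -mtime +"$keep" -delete
  echo "$file"
}

# Restore a compressed dump (into an empty database):
# gunzip -c "$BACKUP_DIR/backup_YYYYMMDD_HHMMSS.sql.gz" | PGPASSWORD="$DB_PASSWORD" psql -h "$DB_HOST" -U "$DB_USER" -v ON_ERROR_STOP=1 "$DB_NAME"
```
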
---
## Bugsink (Error Tracking)
### Dev Container Token Generation
```bash
MSYS_NO_PATHCONV=1 podman exec \
  -e DATABASE_URL=postgresql://bugsink:bugsink_dev_password@postgres:5432/bugsink \
  -e SECRET_KEY=dev-bugsink-secret-key-minimum-50-characters-for-security \
  flyer-crawler-dev \
  sh -c 'cd /opt/bugsink/conf && DJANGO_SETTINGS_MODULE=bugsink_conf PYTHONPATH=/opt/bugsink/conf:/opt/bugsink/lib/python3.10/site-packages /opt/bugsink/bin/python -m django create_auth_token'
```
### Production Token Generation
> **Note**: User executes this command on the server.
```bash
cd /opt/bugsink && bugsink-manage create_auth_token
```
---
## Common Troubleshooting
### Container Won't Start
```bash
# Check logs
podman logs flyer-crawler-dev
# Inspect container
podman inspect flyer-crawler-dev
# Remove and recreate
podman rm -f flyer-crawler-dev
# Then recreate with docker-compose or podman run
```
### Database Connection Issues
```bash
# Test connection inside container
podman exec -it flyer-crawler-dev psql -U postgres -d flyer_crawler_dev -c "SELECT 1"
# Check if PostgreSQL is running
podman exec -it flyer-crawler-dev pg_isready
```
### PM2 App Keeps Restarting
```bash
# Check logs
pm2 logs flyer-crawler-api --err --lines 100
# Check for memory issues
pm2 monit
# View app details
pm2 show flyer-crawler-api
```
### Redis Connection Issues
```bash
# Test Redis inside container
podman exec -it flyer-crawler-dev redis-cli ping
# Check Redis logs
podman logs flyer-crawler-redis
```