Files
flyer-crawler.projectium.com/docs/adr/0019-data-backup-and-recovery-strategy.md
Torben Sorensen 544eb7ae3c
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 2m1s
still fixin test
2026-01-17 14:59:01 -08:00

228 lines
6.3 KiB
Markdown

# ADR-019: Data Backup and Recovery Strategy
**Date**: 2025-12-12
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
The application's data, stored in PostgreSQL, is critical. Currently, there is no formalized or automated strategy for creating backups or for recovering data in the event of hardware failure, data corruption, or accidental deletion.
## Decision
We will implement a formal data backup and recovery strategy. This will involve using standard PostgreSQL tools (`pg_dump`) to perform **regular, automated backups** (e.g., daily). Backup files will be stored securely in a separate, off-site location (e.g., a cloud storage bucket). The ADR will also define the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and document the step-by-step procedure for restoring from a backup.
## Consequences
- **Positive**: Protects against catastrophic data loss, ensuring business continuity. Provides a clear, tested plan for disaster recovery.
- **Negative**: Requires setup and maintenance of backup scripts and secure storage. Incurs storage costs for backup files.
## Implementation Details
### Backup Workflow
Located in `.gitea/workflows/manual-db-backup.yml`:
```yaml
name: Manual - Backup Production Database
on:
workflow_dispatch:
inputs:
confirmation:
description: 'Type "backup-production-db" to confirm'
required: true
jobs:
backup-database:
runs-on: projectium.com
env:
DB_HOST: ${{ secrets.DB_HOST }}
DB_PORT: ${{ secrets.DB_PORT }}
DB_USER: ${{ secrets.DB_USER_PROD }}
DB_PASSWORD: ${{ secrets.DB_PASSWORD_PROD }}
DB_NAME: ${{ secrets.DB_DATABASE_PROD }}
steps:
- name: Validate Secrets
run: |
if [ -z "$DB_HOST" ] || [ -z "$DB_USER" ]; then
echo "ERROR: Database secrets not configured."
exit 1
fi
- name: Create Database Backup
run: |
TIMESTAMP=$(date +'%Y%m%d-%H%M%S')
BACKUP_FILENAME="flyer-crawler-prod-backup-${TIMESTAMP}.sql.gz"
# Create compressed backup
PGPASSWORD="$DB_PASSWORD" pg_dump \
-h "$DB_HOST" -p "$DB_PORT" \
-U "$DB_USER" -d "$DB_NAME" \
--clean --if-exists | gzip > "$BACKUP_FILENAME"
echo "backup_filename=$BACKUP_FILENAME" >> $GITEA_ENV
- name: Upload Backup as Artifact
uses: actions/upload-artifact@v3
with:
name: database-backup
path: ${{ env.backup_filename }}
```
### Restore Workflow
Located in `.gitea/workflows/manual-db-restore.yml`:
```yaml
name: Manual - Restore Database from Backup
on:
workflow_dispatch:
inputs:
confirmation:
description: 'Type "restore-from-backup" to confirm'
required: true
backup_file:
description: 'Path to backup file on server'
required: true
jobs:
restore-database:
steps:
- name: Verify Confirmation
run: |
if [ "${{ inputs.confirmation }}" != "restore-from-backup" ]; then
exit 1
fi
- name: Restore Database
run: |
# Decompress and restore
gunzip -c "${{ inputs.backup_file }}" | \
PGPASSWORD="$DB_PASSWORD" psql \
-h "$DB_HOST" -p "$DB_PORT" \
-U "$DB_USER" -d "$DB_NAME"
```
### Backup Command Reference
**Manual Backup**:
```bash
# Create compressed backup
PGPASSWORD="password" pg_dump \
-h localhost -p 5432 \
-U dbuser -d flyer-crawler \
--clean --if-exists | gzip > backup-$(date +%Y%m%d).sql.gz
# List backup contents (without restoring)
gunzip -c backup-20260109.sql.gz | head -100
```
**Manual Restore**:
```bash
# Restore from compressed backup
gunzip -c backup-20260109.sql.gz | \
PGPASSWORD="password" psql \
-h localhost -p 5432 \
-U dbuser -d flyer-crawler
```
### pg_dump Options
| Option | Purpose |
| ----------------- | ------------------------------ |
| `--clean` | Drop objects before recreating |
| `--if-exists` | Use IF EXISTS when dropping |
| `--no-owner` | Skip ownership commands |
| `--no-privileges` | Skip access privilege commands |
| `-F c` | Custom format (for pg_restore) |
| `-F p` | Plain text SQL (default) |
### Recovery Objectives
| Metric | Target | Current |
| ---------------------------------- | -------- | -------------- |
| **RPO** (Recovery Point Objective) | 24 hours | Manual trigger |
| **RTO** (Recovery Time Objective) | 1 hour | ~15 minutes |
### Backup Retention Policy
| Type | Retention | Storage |
| --------------- | --------- | ---------------- |
| Daily backups | 7 days | Gitea artifacts |
| Weekly backups | 4 weeks | Gitea artifacts |
| Monthly backups | 12 months | Off-site storage |
### Backup Verification
Periodically test backup integrity:
```bash
# Verify backup can be read
gunzip -t backup-20260109.sql.gz
# Test restore to a temporary database
createdb flyer-crawler-restore-test
gunzip -c backup-20260109.sql.gz | psql -d flyer-crawler-restore-test
# Verify data integrity...
dropdb flyer-crawler-restore-test
```
### Disaster Recovery Checklist
1. **Identify the Issue**
- Data corruption?
- Accidental deletion?
- Full database loss?
2. **Select Backup**
- Find most recent valid backup
- Download from Gitea artifacts or off-site storage
3. **Stop Application**
```bash
pm2 stop all
```
4. **Restore Database**
```bash
gunzip -c backup.sql.gz | psql -d flyer-crawler
```
5. **Verify Data**
- Check table row counts
- Verify recent data exists
- Test critical queries
6. **Restart Application**
```bash
pm2 start all
```
7. **Post-Mortem**
- Document incident
- Update procedures if needed
## Key Files
- `.gitea/workflows/manual-db-backup.yml` - Backup workflow
- `.gitea/workflows/manual-db-restore.yml` - Restore workflow
- `.gitea/workflows/manual-db-reset-test.yml` - Reset test database
- `.gitea/workflows/manual-db-reset-prod.yml` - Reset production database
- `sql/master_schema_rollup.sql` - Current schema definition
## Related ADRs
- [ADR-013](./0013-database-schema-migration-strategy.md) - Schema Migration Strategy
- [ADR-017](./0017-ci-cd-and-branching-strategy.md) - CI/CD Strategy