# ADR-019: Data Backup and Recovery Strategy

**Date**: 2025-12-12

**Status**: Accepted

**Implemented**: 2026-01-09

## Context

The application's data, stored in PostgreSQL, is critical. Currently, there is no formalized or automated strategy for creating backups or for recovering data in the event of hardware failure, data corruption, or accidental deletion.

## Decision

We will implement a formal data backup and recovery strategy. Standard PostgreSQL tooling (`pg_dump`) will perform **regular, automated backups** (e.g., daily). Backup files will be stored securely in a separate, off-site location (e.g., a cloud storage bucket). This ADR also defines the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and documents the step-by-step procedure for restoring from a backup.

## Consequences

- **Positive**: Protects against catastrophic data loss, ensuring business continuity. Provides a clear, tested plan for disaster recovery.
- **Negative**: Requires setup and maintenance of backup scripts and secure storage. Incurs storage costs for backup files.

## Implementation Details

### Backup Workflow

Located in `.gitea/workflows/manual-db-backup.yml`:

```yaml
name: Manual - Backup Production Database

on:
  workflow_dispatch:
    inputs:
      confirmation:
        description: 'Type "backup-production-db" to confirm'
        required: true

jobs:
  backup-database:
    runs-on: projectium.com

    env:
      DB_HOST: ${{ secrets.DB_HOST }}
      DB_PORT: ${{ secrets.DB_PORT }}
      DB_USER: ${{ secrets.DB_USER }}
      DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
      DB_NAME: ${{ secrets.DB_NAME_PROD }}

    steps:
      - name: Validate Secrets
        run: |
          if [ -z "$DB_HOST" ] || [ -z "$DB_USER" ]; then
            echo "ERROR: Database secrets not configured."
            exit 1
          fi

      - name: Create Database Backup
        run: |
          TIMESTAMP=$(date +'%Y%m%d-%H%M%S')
          BACKUP_FILENAME="flyer-crawler-prod-backup-${TIMESTAMP}.sql.gz"

          # Create compressed backup
          PGPASSWORD="$DB_PASSWORD" pg_dump \
            -h "$DB_HOST" -p "$DB_PORT" \
            -U "$DB_USER" -d "$DB_NAME" \
            --clean --if-exists | gzip > "$BACKUP_FILENAME"

          echo "backup_filename=$BACKUP_FILENAME" >> $GITEA_ENV

      - name: Upload Backup as Artifact
        uses: actions/upload-artifact@v3
        with:
          name: database-backup
          path: ${{ env.backup_filename }}
```

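The backup workflow above declares a `confirmation` input, but the excerpt shows no step that checks it. A minimal sketch of such a guard follows, assuming a hypothetical helper named `require_confirmation` and assuming the input reaches the script through an environment variable rather than inline interpolation. Neither is confirmed by the workflow file itself.

```shell
# Sketch only: `require_confirmation` is a hypothetical helper, not part of
# the documented workflow.
# Usage: require_confirmation <given_text> <expected_text>
# Returns 0 when the two strings match exactly, 1 otherwise.
require_confirmation() {
  local given="$1"
  local expected="$2"
  if [ "$given" != "$expected" ]; then
    echo "ERROR: confirmation text did not match; aborting." >&2
    return 1
  fi
}
```

In a workflow step this might be called as `require_confirmation "$CONFIRMATION" "backup-production-db"`, with `CONFIRMATION: ${{ inputs.confirmation }}` set under the step's `env` (the variable name is an assumption). Passing the input through the environment keeps user-supplied text out of the script body.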
### Restore Workflow

Located in `.gitea/workflows/manual-db-restore.yml`:

```yaml
name: Manual - Restore Database from Backup

on:
  workflow_dispatch:
    inputs:
      confirmation:
        description: 'Type "restore-from-backup" to confirm'
        required: true
      backup_file:
        description: 'Path to backup file on server'
        required: true

jobs:
  restore-database:
    steps:
      - name: Verify Confirmation
        run: |
          if [ "${{ inputs.confirmation }}" != "restore-from-backup" ]; then
            exit 1
          fi

      - name: Restore Database
        run: |
          # Decompress and restore
          gunzip -c "${{ inputs.backup_file }}" | \
            PGPASSWORD="$DB_PASSWORD" psql \
            -h "$DB_HOST" -p "$DB_PORT" \
            -U "$DB_USER" -d "$DB_NAME"
```

### Backup Command Reference

**Manual Backup**:

```bash
# Create compressed backup
PGPASSWORD="password" pg_dump \
  -h localhost -p 5432 \
  -U dbuser -d flyer-crawler \
  --clean --if-exists | gzip > backup-$(date +%Y%m%d).sql.gz

# Preview the first 100 lines of the dump (without restoring)
gunzip -c backup-20260109.sql.gz | head -100
```

**Manual Restore**:

```bash
# Restore from compressed backup
gunzip -c backup-20260109.sql.gz | \
  PGPASSWORD="password" psql \
  -h localhost -p 5432 \
  -U dbuser -d flyer-crawler
```

### pg_dump Options

| Option            | Purpose                        |
| ----------------- | ------------------------------ |
| `--clean`         | Drop objects before recreating |
| `--if-exists`     | Use IF EXISTS when dropping    |
| `--no-owner`      | Skip ownership commands        |
| `--no-privileges` | Skip access privilege commands |
| `-F c`            | Custom format (for pg_restore) |
| `-F p`            | Plain text SQL (default)       |

### Recovery Objectives

| Metric                             | Target   | Current        |
| ---------------------------------- | -------- | -------------- |
| **RPO** (Recovery Point Objective) | 24 hours | Manual trigger |
| **RTO** (Recovery Time Objective)  | 1 hour   | ~15 minutes    |

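Because backups are currently triggered manually, the 24-hour RPO target is only met if someone actually runs one each day. One way to monitor this is sketched below, assuming backups are kept as `*.sql.gz` files in a single directory. The function name `backup_within_rpo` and the directory layout are hypothetical.

```shell
# Sketch only: reports whether the newest backup in a directory falls within
# the RPO window. The function name and directory layout are assumptions.
# Usage: backup_within_rpo <backup_dir> [max_age_hours]
# Returns 0 if at least one *.sql.gz file was modified within the window.
backup_within_rpo() {
  local dir="$1"
  local max_hours="${2:-24}"
  local max_minutes=$(( max_hours * 60 ))
  local recent
  # -mmin -N matches files modified less than N minutes ago.
  recent=$(find "$dir" -maxdepth 1 -type f -name '*.sql.gz' \
    -mmin "-${max_minutes}" | head -n 1)
  [ -n "$recent" ]
}
```

A scheduled job could call `backup_within_rpo /path/to/backups 24` and raise an alert on a non-zero exit status.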
### Backup Retention Policy

| Type            | Retention | Storage          |
| --------------- | --------- | ---------------- |
| Daily backups   | 7 days    | Gitea artifacts  |
| Weekly backups  | 4 weeks   | Gitea artifacts  |
| Monthly backups | 12 months | Off-site storage |

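Where backups are also kept as plain files (for example on the database host or a mounted off-site volume), the retention windows above could be enforced with a small pruning helper. This is a sketch under that assumption. The function name `prune_backups` is hypothetical, and artifact expiry inside Gitea itself is configured separately.

```shell
# Sketch only: deletes *.sql.gz files older than the retention window.
# `prune_backups` is a hypothetical helper; it does not touch Gitea artifacts.
# Usage: prune_backups <backup_dir> <retention_days>
prune_backups() {
  local dir="$1"
  local days="$2"
  # -mtime +N matches files whose age, in whole days, is greater than N.
  # Each deleted path is printed so the run leaves an audit trail.
  find "$dir" -maxdepth 1 -type f -name '*.sql.gz' \
    -mtime "+${days}" -print -delete
}
```

For the daily tier this would be invoked as `prune_backups /path/to/daily 7`. Only files matching `*.sql.gz` directly inside the directory are considered.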
### Backup Verification

Periodically test backup integrity:

```bash
# Verify backup can be read
gunzip -t backup-20260109.sql.gz

# Test restore to a temporary database
createdb flyer-crawler-restore-test
gunzip -c backup-20260109.sql.gz | psql -d flyer-crawler-restore-test
# Verify data integrity...
dropdb flyer-crawler-restore-test
```

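A cheap pre-check can catch truncated dumps before a full test restore. The sketch below relies on the fact that a plain-format `pg_dump` ends its output with a `PostgreSQL database dump complete` comment. The helper name `verify_backup` and its exit codes are assumptions for illustration.

```shell
# Sketch only: `verify_backup` is a hypothetical helper.
# Usage: verify_backup <backup.sql.gz>
# Returns 0 if the file looks complete, 1 if the gzip stream is corrupt,
# 2 if the dump appears truncated.
verify_backup() {
  local file="$1"
  # 1. The gzip stream must be intact.
  if ! gunzip -t "$file" 2>/dev/null; then
    echo "corrupt gzip: $file" >&2
    return 1
  fi
  # 2. A plain-format dump ends with a completion comment; a dump that was
  #    cut short will not contain it near the end.
  if ! gunzip -c "$file" | tail -n 20 \
      | grep -q 'PostgreSQL database dump complete'; then
    echo "dump looks truncated: $file" >&2
    return 2
  fi
  echo "ok: $file"
}
```

This does not replace the test restore shown above. It only confirms that the archive is readable and that `pg_dump` reached the end of its output.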
### Disaster Recovery Checklist

1. **Identify the Issue**
   - Data corruption?
   - Accidental deletion?
   - Full database loss?

2. **Select Backup**
   - Find most recent valid backup
   - Download from Gitea artifacts or off-site storage

3. **Stop Application**

   ```bash
   pm2 stop all
   ```

4. **Restore Database**

   ```bash
   gunzip -c backup.sql.gz | psql -d flyer-crawler
   ```

5. **Verify Data**
   - Check table row counts
   - Verify recent data exists
   - Test critical queries

6. **Restart Application**

   ```bash
   pm2 start all
   ```

7. **Post-Mortem**
   - Document incident
   - Update procedures if needed

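Steps 3 and 4 of the checklist could be wrapped in a script with a dry-run mode so the procedure can be rehearsed without touching the database. This is a sketch under stated assumptions: the names `run` and `restore_from_backup` and the `DRY_RUN` variable are hypothetical, and `-v ON_ERROR_STOP=1` is added so that `psql` stops at the first failing statement. The application is deliberately left stopped so that step 5 (Verify Data) still happens by hand.

```shell
# Sketch only: helper names and the DRY_RUN variable are assumptions.
# DRY_RUN=1 (the default here) prints commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: restore_from_backup <backup.sql.gz> [database]
restore_from_backup() {
  local backup="$1"
  local db="${2:-flyer-crawler}"
  if [ ! -f "$backup" ]; then
    echo "backup not found: $backup" >&2
    return 1
  fi
  # Step 3: stop the application.
  run pm2 stop all || return 1
  # Step 4: restore the database from the compressed dump.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ gunzip -c $backup | psql -v ON_ERROR_STOP=1 -d $db"
  else
    gunzip -c "$backup" | psql -v ON_ERROR_STOP=1 -d "$db" || return 1
  fi
  # Steps 5 and 6 stay manual: verify the data before restarting.
  echo "Restore finished. Verify data, then run: pm2 start all"
}
```

A rehearsal would be `restore_from_backup backup.sql.gz`, and a real run `DRY_RUN=0 restore_from_backup backup.sql.gz`.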
## Key Files

- `.gitea/workflows/manual-db-backup.yml` - Backup workflow
- `.gitea/workflows/manual-db-restore.yml` - Restore workflow
- `.gitea/workflows/manual-db-reset-test.yml` - Reset test database
- `.gitea/workflows/manual-db-reset-prod.yml` - Reset production database
- `sql/master_schema_rollup.sql` - Current schema definition

## Related ADRs

- [ADR-013](./0013-database-schema-migration-strategy.md) - Schema Migration Strategy
- [ADR-017](./0017-ci-cd-and-branching-strategy.md) - CI/CD Strategy