ADR-019: Data Backup and Recovery Strategy

Date: 2025-12-12

Status: Accepted

Implemented: 2026-01-09

Context

The application's data, stored in PostgreSQL, is critical. Currently, there is no formalized or automated strategy for creating backups or for recovering data in the event of hardware failure, data corruption, or accidental deletion.

Decision

We will implement a formal data backup and recovery strategy. It uses the standard PostgreSQL tool pg_dump to perform regular, automated backups (e.g., daily). Backup files will be stored securely in a separate, off-site location (e.g., a cloud storage bucket). This ADR also defines the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and documents the step-by-step procedure for restoring from a backup.

Consequences

  • Positive: Protects against catastrophic data loss, ensuring business continuity. Provides a clear, tested plan for disaster recovery.
  • Negative: Requires setup and maintenance of backup scripts and secure storage. Incurs storage costs for backup files.

Implementation Details

Backup Workflow

Located in .gitea/workflows/manual-db-backup.yml:

name: Manual - Backup Production Database

on:
  workflow_dispatch:
    inputs:
      confirmation:
        description: 'Type "backup-production-db" to confirm'
        required: true

jobs:
  backup-database:
    runs-on: projectium.com

    env:
      DB_HOST: ${{ secrets.DB_HOST }}
      DB_PORT: ${{ secrets.DB_PORT }}
      DB_USER: ${{ secrets.DB_USER }}
      DB_PASSWORD: ${{ secrets.DB_PASSWORD }}
      DB_NAME: ${{ secrets.DB_NAME_PROD }}

    steps:
      - name: Validate Secrets
        run: |
          if [ -z "$DB_HOST" ] || [ -z "$DB_USER" ]; then
            echo "ERROR: Database secrets not configured."
            exit 1
          fi

      - name: Create Database Backup
        run: |
          # Fail the step if pg_dump fails, not only if gzip fails
          set -eo pipefail

          TIMESTAMP=$(date +'%Y%m%d-%H%M%S')
          BACKUP_FILENAME="flyer-crawler-prod-backup-${TIMESTAMP}.sql.gz"

          # Create compressed backup
          PGPASSWORD="$DB_PASSWORD" pg_dump \
            -h "$DB_HOST" -p "$DB_PORT" \
            -U "$DB_USER" -d "$DB_NAME" \
            --clean --if-exists | gzip > "$BACKUP_FILENAME"

          # Expose the filename to the upload step
          echo "backup_filename=$BACKUP_FILENAME" >> "$GITEA_ENV"

      - name: Upload Backup as Artifact
        uses: actions/upload-artifact@v3
        with:
          name: database-backup
          path: ${{ env.backup_filename }}
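
The Validate Secrets step above checks only DB_HOST and DB_USER, while the dump command also depends on DB_PORT, DB_PASSWORD, and DB_NAME. A minimal sketch of a stricter check, assuming the step runs under bash; the helper name require_vars is hypothetical and is not part of the existing workflow:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the workflow): report every missing
# variable, then fail once at the end.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name
    if [ -z "${!name:-}" ]; then
      echo "ERROR: required variable $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the step it could be called as `require_vars DB_HOST DB_PORT DB_USER DB_PASSWORD DB_NAME || exit 1`, which lists all missing secrets in one run instead of stopping at the first.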

Restore Workflow

Located in .gitea/workflows/manual-db-restore.yml:

name: Manual - Restore Database from Backup

on:
  workflow_dispatch:
    inputs:
      confirmation:
        description: 'Type "restore-from-backup" to confirm'
        required: true
      backup_file:
        description: 'Path to backup file on server'
        required: true

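jobs:
  restore-database:
    # runs-on and env are not shown in this excerpt; the DB_* variables
    # used below are assumed to be set as in the backup workflow
    steps:
      - name: Verify Confirmation
        run: |
          if [ "${{ inputs.confirmation }}" != "restore-from-backup" ]; then
            echo "ERROR: Confirmation text does not match. Aborting restore."
            exit 1
          fi

      - name: Restore Database
        run: |
          # Fail the step if decompression fails, not only if psql fails
          set -eo pipefail

          # Decompress and restore
          gunzip -c "${{ inputs.backup_file }}" | \
            PGPASSWORD="$DB_PASSWORD" psql \
              -h "$DB_HOST" -p "$DB_PORT" \
              -U "$DB_USER" -d "$DB_NAME"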

Backup Command Reference

Manual Backup:

# Create compressed backup
PGPASSWORD="password" pg_dump \
  -h localhost -p 5432 \
  -U dbuser -d flyer-crawler \
  --clean --if-exists | gzip > backup-$(date +%Y%m%d).sql.gz

# Preview the first 100 lines of the dump (without restoring)
gunzip -c backup-20260109.sql.gz | head -100

Manual Restore:

# Restore from compressed backup
gunzip -c backup-20260109.sql.gz | \
  PGPASSWORD="password" psql \
    -h localhost -p 5432 \
    -U dbuser -d flyer-crawler
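
In the manual backup pipeline above, the shell reports only the exit status of gzip, so a failed pg_dump can still leave behind a small but valid-looking .sql.gz file. The following sketch, assuming bash, writes to a temporary file and publishes it only when both commands succeed. The function name backup_db and the .partial suffix are illustrative, not part of the project; connection settings are read from the standard libpq variables PGHOST, PGPORT, PGUSER, and PGPASSWORD.

```shell
#!/usr/bin/env bash
# Illustrative wrapper (hypothetical name): dump to a temp file and
# rename it into place only if both pg_dump and gzip succeeded.
backup_db() {
  local dest_dir="$1" db_name="$2"
  local stamp final tmp
  local -a status

  stamp="$(date +'%Y%m%d-%H%M%S')"
  final="${dest_dir}/backup-${stamp}.sql.gz"
  tmp="${final}.partial"

  # Host, port, user, and password come from PGHOST, PGPORT, PGUSER,
  # and PGPASSWORD, so no secret appears on the command line.
  pg_dump -d "$db_name" --clean --if-exists | gzip > "$tmp"
  status=("${PIPESTATUS[@]}")   # capture immediately after the pipeline

  if [ "${status[0]}" -ne 0 ] || [ "${status[1]}" -ne 0 ]; then
    rm -f "$tmp"
    echo "backup failed (pg_dump=${status[0]}, gzip=${status[1]})" >&2
    return 1
  fi

  mv "$tmp" "$final"
  echo "$final"   # print the path of the finished backup
}
```

A call such as `backup_db /var/backups flyer-crawler` would print the path of the new file on success; the directory here is only an example.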

pg_dump Options

| Option | Purpose |
| --- | --- |
| --clean | Drop objects before recreating them |
| --if-exists | Use IF EXISTS when dropping |
| --no-owner | Skip ownership commands |
| --no-privileges | Skip access privilege commands |
| -F c | Custom format (for pg_restore) |
| -F p | Plain text SQL (default) |
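
The workflows in this ADR use the plain-text format. As a point of comparison, here is a sketch of how the custom format from the table might be used, assuming bash; the function names are hypothetical. With -F c the --clean, --if-exists, --no-owner, and --no-privileges choices can be made at restore time through pg_restore rather than at dump time.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the custom format (-F c).

# Write a custom-format archive; it is compressed by default,
# so no separate gzip step is needed.
dump_custom() {
  local db_name="$1" out_file="$2"
  pg_dump -d "$db_name" -F c -f "$out_file"
}

# Print the archive's table of contents without restoring anything.
list_dump() {
  pg_restore --list "$1"
}

# Restore into an existing database, dropping objects first and
# skipping ownership and privilege commands.
restore_custom() {
  local db_name="$1" in_file="$2"
  pg_restore --clean --if-exists --no-owner --no-privileges \
    -d "$db_name" "$in_file"
}
```

Unlike `gunzip -c ... | head -100` on a plain-text dump, `list_dump` shows an entry for every object in the archive.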

Recovery Objectives

| Metric | Target | Current |
| --- | --- | --- |
| RPO (Recovery Point Objective) | 24 hours | Manual trigger |
| RTO (Recovery Time Objective) | 1 hour | ~15 minutes |
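
Because backups are currently triggered manually, nothing enforces the 24-hour RPO target. One way to monitor it is sketched below, assuming bash and assuming that backup files are kept in a local directory; the function name rpo_met is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if at least one backup in the directory
# was modified within the last rpo_hours hours (default 24).
rpo_met() {
  local dir="$1" rpo_hours="${2:-24}"
  local recent
  # -mmin -N matches files modified less than N minutes ago
  recent="$(find "$dir" -maxdepth 1 -type f -name '*.sql.gz' \
    -mmin "-$((rpo_hours * 60))" | head -n 1)"
  [ -n "$recent" ]
}
```

A scheduled job could run `rpo_met /path/to/backups 24 || echo "RPO exceeded"` and raise an alert when the newest backup is too old; the path is a placeholder.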

Backup Retention Policy

| Type | Retention | Storage |
| --- | --- | --- |
| Daily backups | 7 days | Gitea artifacts |
| Weekly backups | 4 weeks | Gitea artifacts |
| Monthly backups | 12 months | Off-site storage |
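
The table states how long each backup type is kept but not how expiry is enforced. For copies kept on a local disk (an assumption; the table lists Gitea artifacts and off-site storage), the daily rule could be sketched as follows in bash. The function name prune_backups is hypothetical; the filename pattern matches the one used by the backup workflow.

```shell
#!/usr/bin/env bash
# Hypothetical pruning helper: delete workflow-style backup files older
# than keep_days days (default 7) and print each deleted path.
prune_backups() {
  local dir="$1" keep_days="${2:-7}"
  # -mmin +N matches files modified more than N minutes ago
  find "$dir" -maxdepth 1 -type f \
    -name 'flyer-crawler-prod-backup-*.sql.gz' \
    -mmin "+$((keep_days * 24 * 60))" -print -delete
}
```

Only files matching the backup naming pattern are considered, so unrelated files in the same directory are left alone.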

Backup Verification

Periodically test backup integrity:

# Verify the gzip archive is intact (this does not validate the SQL inside it)
gunzip -t backup-20260109.sql.gz

# Test restore to a temporary database
createdb flyer-crawler-restore-test
gunzip -c backup-20260109.sql.gz | psql -d flyer-crawler-restore-test
# Verify data integrity...
dropdb flyer-crawler-restore-test
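
A cheaper check that can run before the full test restore is sketched below, assuming bash. It relies on the fact that a plain-text pg_dump ends with a "PostgreSQL database dump complete" comment, so a dump cut short by a failed pg_dump lacks it. The function name verify_dump is hypothetical, and this check complements the test restore rather than replacing it.

```shell
#!/usr/bin/env bash
# Hypothetical pre-check: the archive must be valid gzip and the SQL
# must end with pg_dump's completion marker.
verify_dump() {
  local file="$1"

  if ! gunzip -t "$file" 2>/dev/null; then
    echo "corrupt or non-gzip file: $file" >&2
    return 1
  fi

  # Look for the marker near the end of the decompressed dump
  if ! gunzip -c "$file" | tail -n 20 \
      | grep 'PostgreSQL database dump complete' >/dev/null; then
    echo "dump appears truncated: $file" >&2
    return 1
  fi
}
```

For example, `verify_dump backup-20260109.sql.gz && echo OK` prints OK only when both checks pass.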

Disaster Recovery Checklist

  1. Identify the Issue

    • Data corruption?
    • Accidental deletion?
    • Full database loss?
  2. Select Backup

    • Find most recent valid backup
    • Download from Gitea artifacts or off-site storage
  3. Stop Application

    pm2 stop all
    
  4. Restore Database

    gunzip -c backup.sql.gz | psql -d flyer-crawler
    
  5. Verify Data

    • Check table row counts
    • Verify recent data exists
    • Test critical queries
  6. Restart Application

    pm2 start all
    
  7. Post-Mortem

    • Document incident
    • Update procedures if needed
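
Steps 3 and 4 of the checklist can be combined into one guarded command so that the application is stopped only when a usable backup exists, and is not restarted automatically after a failed restore. This is a sketch under stated assumptions, in bash: the function name restore_with_app_stopped is hypothetical, and the psql variable ON_ERROR_STOP is an addition not used elsewhere in this ADR; it makes psql exit with a non-zero status on the first SQL error instead of continuing.

```shell
#!/usr/bin/env bash
# Hypothetical runbook helper covering checklist steps 3 and 4.
restore_with_app_stopped() {
  local backup="$1" db_name="$2"
  local -a status

  # Refuse to stop the application unless the backup is usable
  [ -f "$backup" ] || { echo "backup not found: $backup" >&2; return 1; }
  gunzip -t "$backup" 2>/dev/null \
    || { echo "backup is not a valid gzip file: $backup" >&2; return 1; }

  # Step 3: stop the application
  pm2 stop all || return 1

  # Step 4: restore; stop at the first SQL error
  gunzip -c "$backup" | psql -v ON_ERROR_STOP=1 -d "$db_name"
  status=("${PIPESTATUS[@]}")   # capture immediately after the pipeline
  if [ "${status[0]}" -ne 0 ] || [ "${status[1]}" -ne 0 ]; then
    echo "restore failed; application left stopped" >&2
    return 1
  fi

  # Steps 5 and 6 stay manual on purpose
  echo "restore complete; verify data, then run: pm2 start all"
}
```

Leaving the restart manual keeps step 5 (Verify Data) as an explicit gate before the application serves traffic again.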

Key Files

  • .gitea/workflows/manual-db-backup.yml - Backup workflow
  • .gitea/workflows/manual-db-restore.yml - Restore workflow
  • .gitea/workflows/manual-db-reset-test.yml - Reset test database
  • .gitea/workflows/manual-db-reset-prod.yml - Reset production database
  • sql/master_schema_rollup.sql - Current schema definition