Torben Sorensen 45ac4fccf5
2026-01-28 16:35:38 -08:00

Database Architecture

Version: 0.12.20 Last Updated: 2026-01-28

Flyer Crawler uses PostgreSQL 16 with PostGIS for geographic data, pg_trgm for fuzzy text search, and uuid-ossp for UUID generation. The database contains 65 tables organized into logical domains.

Table of Contents

  1. Schema Overview
  2. Database Setup
  3. Schema Reference
  4. Related Documentation

Schema Overview

The database is organized into the following domains:

Core Infrastructure (6 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| users | Authentication credentials and login data | user_id (UUID) |
| profiles | Public user data, preferences, points | user_id (UUID) |
| addresses | Normalized address storage with geocoding | address_id |
| activity_log | User activity audit trail | activity_log_id |
| password_reset_tokens | Temporary tokens for password reset | token_id |
| schema_info | Schema deployment metadata | environment |

Stores and Locations (4 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| stores | Grocery store chains (Safeway, Kroger) | store_id |
| store_locations | Physical store locations with addresses | store_location_id |
| favorite_stores | User store favorites | user_id, store_id |
| store_receipt_patterns | Receipt text patterns for store ID | pattern_id |

Flyers and Items (7 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| flyers | Uploaded flyer metadata and status | flyer_id |
| flyer_items | Individual deals extracted from flyers | flyer_item_id |
| flyer_locations | Flyer-to-location associations | flyer_location_id |
| categories | Item categorization (Produce, Dairy) | category_id |
| master_grocery_items | Canonical grocery item dictionary | master_grocery_item_id |
| master_item_aliases | Alternative names for master items | alias_id |
| unmatched_flyer_items | Items pending master item matching | unmatched_item_id |

Products and Brands (2 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| brands | Brand names (Coca-Cola, Kraft) | brand_id |
| products | Specific products (master item + brand + size) | product_id |

Price Tracking (3 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| item_price_history | Historical prices for master items | price_history_id |
| user_submitted_prices | User-contributed price reports | submission_id |
| suggested_corrections | Suggested edits to flyer items | correction_id |

User Features (8 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| user_watched_items | Items user wants to track prices for | user_watched_item_id |
| user_alerts | Price alert thresholds | alert_id |
| notifications | User notifications | notification_id |
| user_item_aliases | User-defined item name aliases | alias_id |
| user_follows | User-to-user follow relationships | follower_id, following_id |
| user_reactions | Reactions to content (likes, etc.) | reaction_id |
| budgets | User-defined spending budgets | budget_id |
| search_queries | Search history for analytics | query_id |

Shopping Lists (5 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| shopping_lists | User shopping lists | shopping_list_id |
| shopping_list_items | Items on shopping lists | shopping_list_item_id |
| shared_shopping_lists | Shopping list sharing | shared_shopping_list_id |
| shopping_trips | Completed shopping trips | trip_id |
| shopping_trip_items | Items purchased on trips | trip_item_id |

Recipes (11 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| recipes | User recipes with metadata | recipe_id |
| recipe_ingredients | Recipe ingredient list | recipe_ingredient_id |
| recipe_ingredient_substitutions | Ingredient alternatives | substitution_id |
| tags | Recipe tags (vegan, quick, etc.) | tag_id |
| recipe_tags | Recipe-to-tag associations | recipe_id, tag_id |
| appliances | Kitchen appliances | appliance_id |
| recipe_appliances | Appliances needed for recipes | recipe_id, appliance_id |
| recipe_ratings | User ratings for recipes | rating_id |
| recipe_comments | User comments on recipes | comment_id |
| favorite_recipes | User recipe favorites | user_id, recipe_id |
| recipe_collections | User recipe collections | collection_id |

Meal Planning (3 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| menu_plans | Weekly/monthly meal plans | menu_plan_id |
| shared_menu_plans | Menu plan sharing | share_id |
| planned_meals | Individual meals in a plan | planned_meal_id |

Pantry and Inventory (5 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| pantry_items | User pantry inventory | pantry_item_id |
| pantry_locations | Storage locations (fridge, freezer) | location_id |
| expiry_date_ranges | Reference shelf life data | expiry_range_id |
| expiry_alerts | User expiry notification preferences | expiry_alert_id |
| expiry_alert_log | Sent expiry notifications | alert_log_id |

Receipts (3 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| receipts | Scanned receipt metadata | receipt_id |
| receipt_items | Items parsed from receipts | receipt_item_id |
| receipt_processing_log | OCR/AI processing audit trail | log_id |

UPC Scanning (2 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| upc_scan_history | User barcode scan history | scan_id |
| upc_external_lookups | External UPC API response cache | lookup_id |

Gamification (2 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| achievements | Defined achievements | achievement_id |
| user_achievements | Achievements earned by users | user_id, achievement_id |

User Preferences (3 tables)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| dietary_restrictions | Defined dietary restrictions | restriction_id |
| user_dietary_restrictions | User dietary preferences | user_id, restriction_id |
| user_appliances | Appliances user owns | user_id, appliance_id |

Reference Data (1 table)

| Table | Purpose | Primary Key |
| --- | --- | --- |
| unit_conversions | Unit conversion factors | conversion_id |
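As a quick consistency check on the overview, the sketch below sums the number of tables listed under each domain and compares the result with the stated total of 65. The counts are the number of rows in each listing above; the names `tablesPerDomain` and `totalTables` are illustrative and not part of the project.

```typescript
// Number of tables listed under each domain in the Schema Overview.
const tablesPerDomain: Record<string, number> = {
  "Core Infrastructure": 6,
  "Stores and Locations": 4,
  "Flyers and Items": 7,
  "Products and Brands": 2,
  "Price Tracking": 3,
  "User Features": 8,
  "Shopping Lists": 5,
  "Recipes": 11,
  "Meal Planning": 3,
  "Pantry and Inventory": 5,
  "Receipts": 3,
  "UPC Scanning": 2,
  "Gamification": 2,
  "User Preferences": 3,
  "Reference Data": 1,
};

// Sum the per-domain counts so the total can be compared with the overview.
function totalTables(counts: Record<string, number>): number {
  return Object.values(counts).reduce((sum, n) => sum + n, 0);
}

console.log(totalTables(tablesPerDomain)); // 65
```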

Database Setup

Required Extensions

| Extension | Purpose |
| --- | --- |
| postgis | Geographic/spatial data for store locations |
| pg_trgm | Trigram matching for fuzzy text search |
| uuid-ossp | UUID generation for primary keys |
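A startup or deployment check could confirm that all three extensions are present before the schema is applied. The following is a minimal sketch, assuming the list of installed extensions has already been fetched (for example with `SELECT extname FROM pg_extension;`); the helper name `missingExtensions` is hypothetical.

```typescript
// Extensions this document lists as required.
const REQUIRED_EXTENSIONS = ["postgis", "pg_trgm", "uuid-ossp"] as const;

// Returns the required extensions that are absent from `installed`.
// `installed` is assumed to hold the extname values from pg_extension.
function missingExtensions(installed: readonly string[]): string[] {
  const present = new Set(installed);
  return REQUIRED_EXTENSIONS.filter((name) => !present.has(name));
}
```

An empty result means the database is ready for the schema; any names returned still need `CREATE EXTENSION`.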

Database Users

This project uses environment-specific database users to isolate production and test environments:

| User | Database | Purpose |
| --- | --- | --- |
| flyer_crawler_prod | flyer-crawler-prod | Production |
| flyer_crawler_test | flyer-crawler-test | Testing |
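Note that role names use underscores while database names use hyphens. One way to keep the two from being mixed up in application code is a single lookup keyed by environment. This is a sketch only; the type and function names are assumptions, and only the user and database values come from the table above.

```typescript
type DbEnvironment = "production" | "test";

interface DbIdentity {
  user: string;
  database: string;
}

// Role and database names for each environment, as documented above.
const DB_IDENTITIES: Record<DbEnvironment, DbIdentity> = {
  production: { user: "flyer_crawler_prod", database: "flyer-crawler-prod" },
  test: { user: "flyer_crawler_test", database: "flyer-crawler-test" },
};

// Looks up the role/database pair for the given environment.
function dbIdentityFor(env: DbEnvironment): DbIdentity {
  return DB_IDENTITIES[env];
}
```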

Production Database Setup

Step 1: Install PostgreSQL

sudo apt update
sudo apt install postgresql postgresql-contrib

Step 2: Create Database and User

Switch to the postgres system user:

sudo -u postgres psql

Run the following SQL commands (replace 'a_very_strong_password' with a secure password):

-- Create the production role
CREATE ROLE flyer_crawler_prod WITH LOGIN PASSWORD 'a_very_strong_password';

-- Create the production database
CREATE DATABASE "flyer-crawler-prod" WITH OWNER = flyer_crawler_prod;

-- Connect to the new database
\c "flyer-crawler-prod"

-- Grant schema privileges
ALTER SCHEMA public OWNER TO flyer_crawler_prod;
GRANT CREATE, USAGE ON SCHEMA public TO flyer_crawler_prod;

-- Install required extensions (must be done as superuser)
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Exit
\q
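The database name is wrapped in double quotes because it contains hyphens, which are not valid in an unquoted PostgreSQL identifier. If these statements are ever generated from a script, the quoting rule can be sketched as follows. The helpers `quoteIdent` and `createDatabaseSql` are hypothetical and only illustrate the rule: wrap the name in double quotes and double any embedded double quote.

```typescript
// Quotes a PostgreSQL identifier: wrap it in double quotes and
// double any double-quote characters it contains.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Builds the CREATE DATABASE statement used in this step.
function createDatabaseSql(database: string, owner: string): string {
  return `CREATE DATABASE ${quoteIdent(database)} WITH OWNER = ${quoteIdent(owner)};`;
}

console.log(createDatabaseSql("flyer-crawler-prod", "flyer_crawler_prod"));
```

Quoting the owner as well is harmless here, since the role name is already lowercase.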

Step 3: Apply the Schema

Navigate to your project directory and run:

psql -U flyer_crawler_prod -d "flyer-crawler-prod" -f sql/master_schema_rollup.sql

This creates all tables, functions, and triggers, and seeds essential data (categories, master items).

Step 4: Seed the Admin Account

Set the required environment variables and run the seed script:

export DB_USER=flyer_crawler_prod
export DB_PASSWORD=your_password
export DB_NAME="flyer-crawler-prod"
export DB_HOST=localhost

npx tsx src/db/seed_admin_account.ts
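A script that depends on these variables can fail early with a clear message when one is missing. The sketch below shows one way to do that; it is not taken from `seed_admin_account.ts`, and the helper name `missingDbVars` is an assumption. It accepts any plain object, so in Node it could be called with `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Variables exported in the step above.
const REQUIRED_DB_VARS = ["DB_USER", "DB_PASSWORD", "DB_NAME", "DB_HOST"];

// Returns the names of required variables that are unset or empty.
function missingDbVars(env: Env): string[] {
  return REQUIRED_DB_VARS.filter((name) => !env[name]);
}

// Example: DB_PASSWORD and DB_HOST have not been exported.
const missing = missingDbVars({ DB_USER: "flyer_crawler_prod", DB_NAME: "flyer-crawler-prod" });
if (missing.length > 0) {
  console.error(`Missing environment variables: ${missing.join(", ")}`);
}
```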

Test Database Setup

The test database is used by CI/CD pipelines and local test runs.

Step 1: Create the Test Database

sudo -u postgres psql

Run the following SQL commands (replace 'a_very_strong_password' with a secure password):

-- Create the test role
CREATE ROLE flyer_crawler_test WITH LOGIN PASSWORD 'a_very_strong_password';

-- Create the test database
CREATE DATABASE "flyer-crawler-test" WITH OWNER = flyer_crawler_test;

-- Connect to the test database
\c "flyer-crawler-test"

-- Grant schema privileges (required for test runner to reset schema)
ALTER SCHEMA public OWNER TO flyer_crawler_test;
GRANT CREATE, USAGE ON SCHEMA public TO flyer_crawler_test;

-- Install required extensions
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Exit
\q

Step 2: Configure CI/CD Secrets

Ensure these secrets are set in your Gitea repository settings:

Shared:

| Secret | Description |
| --- | --- |
| DB_HOST | Database hostname (e.g., localhost) |
| DB_PORT | Database port (e.g., 5432) |

Production-specific:

| Secret | Description |
| --- | --- |
| DB_USER_PROD | Production database user (flyer_crawler_prod) |
| DB_PASSWORD_PROD | Production database password |
| DB_DATABASE_PROD | Production database name (flyer-crawler-prod) |

Test-specific:

| Secret | Description |
| --- | --- |
| DB_USER_TEST | Test database user (flyer_crawler_test) |
| DB_PASSWORD_TEST | Test database password |
| DB_DATABASE_TEST | Test database name (flyer-crawler-test) |
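The shared and environment-specific secrets combine into one connection configuration per target. The following sketch shows how that could look, assuming the secrets are exposed to the job as a key/value map; `dbConfigFromSecrets`, `readSecret`, and the `DbConfig` shape are hypothetical, and only the secret names come from the tables above.

```typescript
type Secrets = Record<string, string | undefined>;

interface DbConfig {
  host: string;
  port: number;
  user: string;
  password: string;
  database: string;
}

// Reads one secret and fails loudly if it is unset or empty.
function readSecret(secrets: Secrets, name: string): string {
  const value = secrets[name];
  if (!value) throw new Error(`Missing secret: ${name}`);
  return value;
}

// Combines the shared secrets with the PROD or TEST specific ones.
function dbConfigFromSecrets(secrets: Secrets, target: "PROD" | "TEST"): DbConfig {
  const port = Number(readSecret(secrets, "DB_PORT"));
  if (!Number.isInteger(port)) throw new Error("DB_PORT must be an integer");
  return {
    host: readSecret(secrets, "DB_HOST"),
    port,
    user: readSecret(secrets, `DB_USER_${target}`),
    password: readSecret(secrets, `DB_PASSWORD_${target}`),
    database: readSecret(secrets, `DB_DATABASE_${target}`),
  };
}
```

Selecting the target explicitly keeps a test job from picking up production credentials by accident.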

How the Test Pipeline Works

The CI pipeline uses a permanent test database that gets reset on each test run:

  1. Setup: The vitest global setup script connects to flyer-crawler-test
  2. Schema Reset: Executes sql/drop_tables.sql (DROP SCHEMA public CASCADE)
  3. Schema Application: Runs sql/master_schema_rollup.sql to build a fresh schema
  4. Test Execution: Tests run against the clean database

This approach is faster than creating and dropping a database on every run, and it does not require sudo access.
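Because the reset step drops the whole schema, it is worth guarding against running it on the wrong database. The sketch below returns the ordered list of SQL files from the steps above, but only for the test database. The guard itself is an assumption for illustration and is not described in the source; `resetPlan` is a hypothetical name.

```typescript
const TEST_DATABASE = "flyer-crawler-test";

// SQL files to execute, in order, for a schema reset (steps 2 and 3 above).
const RESET_FILES = ["sql/drop_tables.sql", "sql/master_schema_rollup.sql"];

// Returns the reset plan, refusing any database other than the test one.
function resetPlan(database: string): string[] {
  if (database !== TEST_DATABASE) {
    throw new Error(`Refusing to reset schema on "${database}"`);
  }
  return [...RESET_FILES];
}
```

A global setup script could call `resetPlan` with the configured database name and execute each returned file before the tests start.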


Connecting to Production Database

psql -h localhost -U flyer_crawler_prod -d "flyer-crawler-prod" -W

Checking PostgreSQL and PostGIS Versions

SELECT version();
SELECT PostGIS_Full_Version();

Example output (from a host running PostgreSQL 14; your versions will differ depending on what is installed):

PostgreSQL 14.19 (Ubuntu 14.19-0ubuntu0.22.04.1)
POSTGIS="3.2.0 c3e3cc0" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1"
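If a script needs to compare the running versions with the ones this document expects, the two output strings can be parsed with simple patterns. This is a sketch based only on the sample output above; the function names are illustrative, and other builds may format the strings differently.

```typescript
// Extracts the major version from the output of SELECT version().
function postgresMajorVersion(versionOutput: string): number | null {
  const major = /^PostgreSQL (\d+)/.exec(versionOutput)?.[1];
  return major === undefined ? null : Number(major);
}

// Extracts the PostGIS version from the output of PostGIS_Full_Version().
function postgisVersion(fullVersionOutput: string): string | null {
  return /POSTGIS="(\d+\.\d+\.\d+)/.exec(fullVersionOutput)?.[1] ?? null;
}
```

For the sample output above, these return `14` and `"3.2.0"` respectively; both return `null` when the string does not match.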

Schema Files

| File | Purpose |
| --- | --- |
| sql/master_schema_rollup.sql | Complete schema with all tables, functions, and seed data |
| sql/drop_tables.sql | Drops entire schema (used by test runner) |
| sql/schema.sql.txt | Legacy schema file (reference only) |

Backup and Restore

Create a Backup

pg_dump -U flyer_crawler_prod -d "flyer-crawler-prod" -F c -f backup.dump

Restore from Backup

pg_restore -U flyer_crawler_prod -d "flyer-crawler-prod" -c backup.dump
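For scheduled backups, a dated file name avoids overwriting the previous dump. The sketch below builds such a name and the matching `pg_dump` argument list, using the same flags as the command above. The `backup-YYYYMMDD.dump` naming scheme and both function names are assumptions, not project conventions.

```typescript
// Builds a file name such as backup-20260128.dump from a date (UTC).
function backupFileName(date: Date): string {
  const stamp = date.toISOString().slice(0, 10).replace(/-/g, "");
  return `backup-${stamp}.dump`;
}

// Argument list for pg_dump, mirroring the command shown above.
function pgDumpArgs(file: string): string[] {
  return ["-U", "flyer_crawler_prod", "-d", "flyer-crawler-prod", "-F", "c", "-f", file];
}
```

The resulting array can be passed to a process launcher such as Node's `child_process.spawn("pg_dump", args)`, which avoids shell quoting problems with the hyphenated database name.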