flyer-crawler.projectium.com/docs/adr/0023-database-normalization-and-referential-integrity.md
Torben Sorensen 4022768c03
2026-01-19 13:45:21 -08:00

ADR-023: Database Normalization and Referential Integrity

Date: 2026-01-19
Status: Accepted
Context: API design violates database normalization principles

Problem Statement

The application's API layer currently accepts string-based references (category names) instead of numerical IDs when creating relationships between entities. This violates database normalization principles and creates a brittle, error-prone API contract.

Example of Current Problem:

// API accepts string:
POST /api/users/watched-items
{ "itemName": "Milk", "category": "Dairy & Eggs" }  // ❌ String reference

-- But the database uses normalized foreign keys:
CREATE TABLE master_grocery_items (
  category_id BIGINT REFERENCES categories(category_id)  -- ✅ Proper FK
);

This mismatch forces the service layer to perform string lookups on every request:

// Service must do string matching:
const categoryRes = await client.query(
  'SELECT category_id FROM categories WHERE name = $1',
  [categoryName], // ❌ Error-prone string matching
);

Database Normal Forms (In Order of Increasing Strictness)

1. First Normal Form (1NF) ✅ Currently Satisfied

Rule: Each column contains atomic values; no repeating groups.

Status: Compliant

  • All columns contain single values
  • No arrays or delimited strings in columns
  • Each row is uniquely identifiable

Example:

-- ✅ Good: Atomic values
CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT
);

-- ❌ Bad: Non-atomic values (violates 1NF)
CREATE TABLE items (
  id BIGINT,
  categories TEXT  -- "Dairy,Frozen,Snacks" (comma-delimited)
);

2. Second Normal Form (2NF) ✅ Currently Satisfied

Rule: No partial dependencies; all non-key columns depend on the entire primary key.

Status: Compliant

  • All tables use single-column primary keys (no composite keys)
  • All non-key columns depend on the entire primary key

Example:

-- ✅ Good: All columns depend on full primary key
CREATE TABLE flyer_items (
  flyer_item_id BIGINT PRIMARY KEY,
  flyer_id BIGINT,           -- Depends on flyer_item_id
  master_item_id BIGINT,     -- Depends on flyer_item_id
  price_in_cents INT         -- Depends on flyer_item_id
);

-- ❌ Bad: Partial dependency (violates 2NF)
CREATE TABLE flyer_items (
  flyer_id BIGINT,
  item_id BIGINT,
  store_name TEXT,    -- Depends only on flyer_id, not (flyer_id, item_id)
  PRIMARY KEY (flyer_id, item_id)
);

3. Third Normal Form (3NF) ⚠️ VIOLATED IN API LAYER

Rule: No transitive dependencies; non-key columns depend only on the primary key, not on other non-key columns.

Status: ⚠️ Database is compliant, but API layer violates this principle

Database Schema (Correct):

-- ✅ Categories are normalized
CREATE TABLE categories (
  category_id BIGINT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT REFERENCES categories(category_id)  -- Direct reference
);

API Layer (Violates 3NF Principle):

// ❌ API accepts category name instead of ID
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs"  // String! Should be category_id
}

// The service layer must then resolve the name to an ID on every request:
SELECT category_id FROM categories WHERE name = $1

This creates a transitive dependency in the application layer:

  • watched_item → category_name → category_id
  • Instead of direct: watched_item → category_id

4. Boyce-Codd Normal Form (BCNF) ✅ Currently Satisfied

Rule: Every determinant is a candidate key (stricter version of 3NF).

Status: Compliant

  • All foreign key references use primary keys
  • No non-trivial functional dependencies where determinant is not a superkey

5. Fourth Normal Form (4NF) ✅ Currently Satisfied

Rule: No non-trivial multi-valued dependencies; a record should not contain two or more independent multi-valued facts about the same entity.

Status: Compliant

  • Junction tables properly separate many-to-many relationships
  • Examples: user_watched_items, shopping_list_items, recipe_ingredients

6. Fifth Normal Form (5NF) ✅ Currently Satisfied

Rule: Every non-trivial join dependency is implied by the candidate keys; tables cannot be decomposed further without loss of information.

Status: Compliant (as far as schema design goes)

Impact of API Violation

1. Brittleness

// Test fails because of exact string matching:
addWatchedItem('Milk', 'Dairy'); // ❌ Fails - not exact match
addWatchedItem('Milk', 'Dairy & Eggs'); // ✅ Works - exact match
addWatchedItem('Milk', 'dairy & eggs'); // ❌ Fails - case sensitive
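The three outcomes above follow directly from exact-match semantics. The sketch below reproduces them in memory; `categoryIdsByName` and `findCategoryIdByName` are hypothetical stand-ins for the categories table and the `WHERE name = $1` lookup, not code from this repository.

```typescript
// Hypothetical in-memory stand-in for the categories table (name -> category_id).
const categoryIdsByName = new Map<string, number>([
  ["Fruits & Vegetables", 1],
  ["Meat & Seafood", 2],
  ["Dairy & Eggs", 3],
]);

// Mirrors `WHERE name = $1`: an exact, case-sensitive comparison.
function findCategoryIdByName(categoryName: string): number | undefined {
  return categoryIdsByName.get(categoryName);
}

const partialMatch = findCategoryIdByName("Dairy"); // undefined: not the full name
const exactMatch = findCategoryIdByName("Dairy & Eggs"); // 3
const wrongCase = findCategoryIdByName("dairy & eggs"); // undefined: case differs
```

Any client that sends a name has to reproduce the stored string character for character, which is the coupling this ADR removes.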

2. No Discovery Mechanism

  • No API endpoint to list available categories
  • Frontend cannot dynamically populate dropdowns
  • Clients must hardcode category names

3. Performance Penalty

-- Current: String lookup on every request
SELECT category_id FROM categories WHERE name = $1;  -- Full table scan or index scan

-- Should be: Direct ID reference (no lookup needed)
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2);

4. Impossible Localization

  • Cannot translate category names without breaking API
  • Category names are hardcoded in English
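By contrast, once clients reference categories by ID, display names can vary per locale without touching the API contract. The following is only an illustrative sketch: the `categoryLabelsByLocale` table, the `categoryLabel` helper, and the French label are assumptions made for the example, not existing project code or data.

```typescript
// Hypothetical per-locale display names keyed by category_id.
const categoryLabelsByLocale: Record<string, Record<number, string>> = {
  en: { 1: "Fruits & Vegetables", 2: "Meat & Seafood", 3: "Dairy & Eggs" },
  fr: { 3: "Produits laitiers et oeufs" }, // illustrative, deliberately incomplete
};

// Returns the label for the requested locale, falling back to English.
function categoryLabel(categoryId: number, locale: string): string | undefined {
  const localized = categoryLabelsByLocale[locale]?.[categoryId];
  return localized ?? categoryLabelsByLocale["en"]?.[categoryId];
}

const frenchLabel = categoryLabel(3, "fr"); // "Produits laitiers et oeufs"
const fallbackLabel = categoryLabel(1, "fr"); // "Fruits & Vegetables" (English fallback)
```

The stored `category_id` stays the same in both cases; only the presentation layer changes.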

5. Maintenance Burden

  • Renaming a category breaks all API clients
  • Must coordinate name changes across frontend, tests, and documentation

Decision

We adopt the following principles for all API design:

1. Use Numerical IDs for All Foreign Key References

Rule: APIs MUST accept numerical IDs when creating relationships between entities.

// ✅ CORRECT: Use IDs
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category_id": 3  // Numerical ID
}

// ❌ INCORRECT: Use strings
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs"  // String name
}
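One way to enforce this rule at the route boundary is to validate the body before it reaches the service layer. The sketch below is a minimal, framework-free illustration; `AddWatchedItemBody`, `ParseResult`, and `parseAddWatchedItemBody` are hypothetical names, and the project may prefer to express the same check in its existing validation library.

```typescript
// Hypothetical request body shape for POST /api/users/watched-items.
interface AddWatchedItemBody {
  itemName: string;
  category_id: number;
}

type ParseResult =
  | { ok: true; value: AddWatchedItemBody }
  | { ok: false; error: string };

// Accepts only a non-empty itemName and a positive integer category_id.
function parseAddWatchedItemBody(body: unknown): ParseResult {
  if (typeof body !== "object" || body === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { itemName, category_id } = body as Record<string, unknown>;
  if (typeof itemName !== "string" || itemName.trim() === "") {
    return { ok: false, error: "itemName must be a non-empty string" };
  }
  if (
    typeof category_id !== "number" ||
    !Number.isInteger(category_id) ||
    category_id <= 0
  ) {
    return { ok: false, error: "category_id must be a positive integer" };
  }
  return { ok: true, value: { itemName, category_id } };
}

// The string-based payload is rejected because it carries no category_id.
const legacyPayloadResult = parseAddWatchedItemBody({
  itemName: "Milk",
  category: "Dairy & Eggs",
});
```

The check only verifies the shape of the input; whether the ID refers to an existing category is still left to the foreign key constraint.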

2. Provide Discovery Endpoints

Rule: For any entity referenced by ID, provide a GET endpoint to list available options.

// Required: Category discovery endpoint
GET /api/categories
Response: [
  { "category_id": 1, "name": "Fruits & Vegetables" },
  { "category_id": 2, "name": "Meat & Seafood" },
  { "category_id": 3, "name": "Dairy & Eggs" }
]
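On the client side, the discovery response can feed a dropdown directly, so no category name has to be hardcoded. The sketch below assumes the response shape shown above; `CategoryDto`, `DropdownOption`, and `toDropdownOptions` are hypothetical names introduced for illustration.

```typescript
// Shape of one entry in the GET /api/categories response shown above.
interface CategoryDto {
  category_id: number;
  name: string;
}

// Hypothetical option shape for a form control: the ID is submitted, the name is displayed.
interface DropdownOption {
  value: number;
  label: string;
}

// Sorts a copy of the categories by name and maps each to a dropdown option.
function toDropdownOptions(categories: CategoryDto[]): DropdownOption[] {
  return categories
    .slice()
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((c) => ({ value: c.category_id, label: c.name }));
}

const discoveredCategories: CategoryDto[] = [
  { category_id: 1, name: "Fruits & Vegetables" },
  { category_id: 2, name: "Meat & Seafood" },
  { category_id: 3, name: "Dairy & Eggs" },
];
const dropdownOptions = toDropdownOptions(discoveredCategories);
```

Because the form submits `value` rather than `label`, renaming a category changes only what the user sees.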

3. Support Lookup by Name (Optional)

Rule: If convenient, provide query parameters for name-based lookup, but use IDs internally.

// Optional: Convenience endpoint
GET /api/categories?name=Dairy%20%26%20Eggs
Response: { "category_id": 3, "name": "Dairy & Eggs" }

4. Return Full Objects in Responses

Rule: API responses SHOULD include denormalized data for convenience, but inputs MUST use IDs.

// ✅ Response includes category details
GET /api/users/watched-items
Response: [
  {
    "master_grocery_item_id": 42,
    "name": "Milk",
    "category_id": 3,
    "category": {
      // ✅ Include full object in response
      "category_id": 3,
      "name": "Dairy & Eggs"
    }
  }
]
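A response of this shape can be assembled by attaching the category object to each item after the rows are loaded. The sketch below is one possible approach under assumed names (`CategoryRow`, `WatchedItemRow`, `withCategoryDetails`); in practice the same result may come from a SQL join instead.

```typescript
interface CategoryRow {
  category_id: number;
  name: string;
}

// Assumed row shape; category_id is treated as always present for simplicity.
interface WatchedItemRow {
  master_grocery_item_id: number;
  name: string;
  category_id: number;
}

interface WatchedItemResponse extends WatchedItemRow {
  category: CategoryRow | null;
}

// Attaches the full category object to each item, or null if the ID is unknown.
function withCategoryDetails(
  items: WatchedItemRow[],
  categories: CategoryRow[],
): WatchedItemResponse[] {
  const categoriesById = new Map<number, CategoryRow>();
  for (const category of categories) {
    categoriesById.set(category.category_id, category);
  }
  return items.map((item) => ({
    ...item,
    category: categoriesById.get(item.category_id) ?? null,
  }));
}

const watchedItemsResponse = withCategoryDetails(
  [{ master_grocery_item_id: 42, name: "Milk", category_id: 3 }],
  [{ category_id: 3, name: "Dairy & Eggs" }],
);
```

The nested object is for display only; the `category_id` field remains the value clients send back on writes.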

Affected Areas

Immediate Violations (Must Fix)

  1. User Watched Items (src/routes/user.routes.ts:76)

    • Currently: category: string
    • Should be: category_id: number
  2. Service Layer (src/services/db/personalization.db.ts:175)

    • Currently: categoryName: string
    • Should be: categoryId: number
  3. API Client (src/services/apiClient.ts:436)

    • Currently: category: string
    • Should be: category_id: number
  4. Frontend Hooks (src/hooks/mutations/useAddWatchedItemMutation.ts:9)

    • Currently: category?: string
    • Should be: category_id: number

Potential Violations (Review Required)

  1. UPC/Barcode System (src/types/upc.ts:85)

    • Uses category: string | null
    • May be appropriate if category is free-form user input
  2. AI Extraction (src/types/ai.ts:21)

    • Uses category_name: z.string()
    • AI extracts category names, needs mapping to IDs
  3. Flyer Data Transformer (src/services/flyerDataTransformer.ts:40)

    • Uses category_name: string
    • May need category matching/creation logic
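For the AI extraction and flyer transformer paths, free-form names have to be mapped to IDs somewhere before persistence. The sketch below shows one possible normalization-based match; the helper names are hypothetical, and returning `null` for unmatched names (rather than creating a category) is an assumption that the review should confirm.

```typescript
interface KnownCategory {
  category_id: number;
  name: string;
}

// Lowercases, trims, and collapses internal whitespace so minor variations still match.
function normalizeCategoryName(raw: string): string {
  return raw.trim().toLowerCase().replace(/\s+/g, " ");
}

// Builds a lookup from normalized name to category_id.
function buildCategoryIndex(categories: KnownCategory[]): Map<string, number> {
  const index = new Map<string, number>();
  for (const category of categories) {
    index.set(normalizeCategoryName(category.name), category.category_id);
  }
  return index;
}

// Returns the matching category_id, or null so the caller can decide what to do.
function mapExtractedCategory(
  extractedName: string,
  index: Map<string, number>,
): number | null {
  return index.get(normalizeCategoryName(extractedName)) ?? null;
}

const extractionIndex = buildCategoryIndex([
  { category_id: 1, name: "Fruits & Vegetables" },
  { category_id: 3, name: "Dairy & Eggs" },
]);
const mappedId = mapExtractedCategory("  dairy &  eggs ", extractionIndex); // 3
const unmappedId = mapExtractedCategory("Bakery", extractionIndex); // null
```

Keeping the fuzzy matching in the ingestion path leaves the public API strict while still tolerating noisy extracted text.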

Migration Strategy

See research-category-id-migration.md for a detailed migration plan.

High-level approach:

  1. Phase 1: Add category discovery endpoint (non-breaking)

    • GET /api/categories
    • No changes to existing endpoints yet
  2. Phase 2: Support both formats (non-breaking)

    • Accept both category (string) and category_id (number)
    • Deprecate string format with warning logs
  3. Phase 3: Remove string support (breaking change, major version bump)

    • Only accept category_id
    • Update all clients and tests
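Phase 2 is the step most likely to need care, because both formats must coexist. The sketch below shows one way to resolve either input to an ID while logging a deprecation warning; `resolveCategoryId`, its input type, and the injected `warn` callback are hypothetical and not taken from the existing service layer.

```typescript
interface CategoryRecord {
  category_id: number;
  name: string;
}

// Phase 2 input: either field may be present; category_id wins if both are sent.
interface WatchedItemInput {
  itemName: string;
  category_id?: number;
  category?: string;
}

function resolveCategoryId(
  input: WatchedItemInput,
  categories: CategoryRecord[],
  warn: (message: string) => void,
): number {
  if (input.category_id !== undefined) {
    // Preferred path: no lookup; the foreign key constraint validates the ID.
    return input.category_id;
  }
  const legacyName = input.category;
  if (legacyName !== undefined) {
    warn('Deprecated: "category" (string) was supplied; send "category_id" instead.');
    const match = categories.find((c) => c.name === legacyName);
    if (!match) {
      throw new Error(`Unknown category name: ${legacyName}`);
    }
    return match.category_id;
  }
  throw new Error('Either "category_id" or "category" is required.');
}

// Usage: collect warnings in an array here; a real service would pass its logger.
const deprecationWarnings: string[] = [];
const phaseTwoCategories: CategoryRecord[] = [{ category_id: 3, name: "Dairy & Eggs" }];
const idFromLegacyName = resolveCategoryId(
  { itemName: "Milk", category: "Dairy & Eggs" },
  phaseTwoCategories,
  (message) => deprecationWarnings.push(message),
);
```

In Phase 3 the legacy branch and the `category` field can be deleted, leaving only the direct ID path.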

Consequences

Positive

  • API matches database schema design
  • More robust (no typo-based failures)
  • Better performance (no string lookups)
  • Enables localization
  • Discoverable via REST API
  • Follows REST best practices

Negative

  • ⚠️ Breaking change for existing API consumers
  • ⚠️ Requires client updates
  • ⚠️ More complex migration path

Neutral

  • Frontend must fetch categories before displaying form
  • Slightly more initial API calls (one-time category fetch)

Approval

  • Proposed by: Claude Code (via user observation)
  • Date: 2026-01-19
  • Status: Accepted (pending implementation)