# ADR-023: Database Normalization and Referential Integrity

**Date:** 2026-01-19

**Status:** Accepted

**Context:** API design violates database normalization principles

## Problem Statement

The application's API layer currently accepts string-based references (category names) instead of numerical IDs when creating relationships between entities. This violates database normalization principles and creates a brittle, error-prone API contract.

**Example of Current Problem:**

```typescript
// API accepts string:
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs" // ❌ String reference
}

// But database uses normalized foreign keys:
CREATE TABLE master_grocery_items (
  category_id BIGINT REFERENCES categories(category_id) -- ✅ Proper FK
)
```

This mismatch forces the service layer to perform string lookups on every request:

```typescript
// Service must do string matching:
const categoryRes = await client.query(
  'SELECT category_id FROM categories WHERE name = $1',
  [categoryName], // ❌ Error-prone string matching
);
```

## Database Normal Forms (In Order of Increasing Strictness)

### 1. First Normal Form (1NF) ✅ Currently Satisfied

**Rule:** Each column contains atomic values; no repeating groups.

**Status:** ✅ **Compliant**

- All columns contain single values
- No arrays or delimited strings in columns
- Each row is uniquely identifiable

**Example:**

```sql
-- ✅ Good: Atomic values
CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT
);

-- ❌ Bad: Non-atomic values (violates 1NF)
CREATE TABLE items (
  id BIGINT,
  categories TEXT -- "Dairy,Frozen,Snacks" (comma-delimited)
);
```

### 2. Second Normal Form (2NF) ✅ Currently Satisfied

**Rule:** No partial dependencies; all non-key columns depend on the entire primary key.

**Status:** ✅ **Compliant**

- All tables use single-column primary keys (no composite keys)
- All non-key columns depend on the entire primary key

**Example:**

```sql
-- ✅ Good: All columns depend on full primary key
CREATE TABLE flyer_items (
  flyer_item_id BIGINT PRIMARY KEY,
  flyer_id BIGINT,        -- Depends on flyer_item_id
  master_item_id BIGINT,  -- Depends on flyer_item_id
  price_in_cents INT      -- Depends on flyer_item_id
);

-- ❌ Bad: Partial dependency (violates 2NF)
CREATE TABLE flyer_items (
  flyer_id BIGINT,
  item_id BIGINT,
  store_name TEXT, -- Depends only on flyer_id, not (flyer_id, item_id)
  PRIMARY KEY (flyer_id, item_id)
);
```

### 3. Third Normal Form (3NF) ⚠️ VIOLATED IN API LAYER

**Rule:** No transitive dependencies; non-key columns depend only on the primary key, not on other non-key columns.

**Status:** ⚠️ **Database is compliant, but API layer violates this principle**

**Database Schema (Correct):**

```sql
-- ✅ Categories are normalized
CREATE TABLE categories (
  category_id BIGINT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT REFERENCES categories(category_id) -- Direct reference
);
```

**API Layer (Violates 3NF Principle):**

```typescript
// ❌ API accepts category name instead of ID
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs" // String! Should be category_id
}

// Service layer must resolve the name to an ID on every request:
SELECT category_id FROM categories WHERE name = $1
```

This creates what amounts to a **transitive dependency** in the application layer:

- `watched_item` → `category_name` → `category_id`
- Instead of direct: `watched_item` → `category_id`

### 4. Boyce-Codd Normal Form (BCNF) ✅ Currently Satisfied

**Rule:** Every determinant is a candidate key (stricter version of 3NF).

**Status:** ✅ **Compliant**

- All foreign key references use primary keys
- No non-trivial functional dependencies where determinant is not a superkey

### 5. Fourth Normal Form (4NF) ✅ Currently Satisfied

**Rule:** No non-trivial multi-valued dependencies unless the determinant is a superkey; a record should not contain two or more independent multi-valued facts.

**Status:** ✅ **Compliant**

- Junction tables properly separate many-to-many relationships
- Examples: `user_watched_items`, `shopping_list_items`, `recipe_ingredients`

### 6. Fifth Normal Form (5NF) ✅ Currently Satisfied

**Rule:** Every non-trivial join dependency is implied by the candidate keys; tables cannot be decomposed further without loss of information.

**Status:** ✅ **Compliant** (as far as schema design goes)

## Impact of API Violation

### 1. Brittleness

```typescript
// Test fails because of exact string matching:
addWatchedItem('Milk', 'Dairy'); // ❌ Fails - not exact match
addWatchedItem('Milk', 'Dairy & Eggs'); // ✅ Works - exact match
addWatchedItem('Milk', 'dairy & eggs'); // ❌ Fails - case sensitive
```

### 2. No Discovery Mechanism

- No API endpoint to list available categories
- Frontend cannot dynamically populate dropdowns
- Clients must hardcode category names

### 3. Performance Penalty

```sql
-- Current: String lookup on every request
SELECT category_id FROM categories WHERE name = $1; -- Full table scan or index scan

-- Should be: Direct ID reference (no lookup needed)
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2);
```

### 4. Impossible Localization

- Cannot translate category names without breaking API
- Category names are hardcoded in English

### 5. Maintenance Burden

- Renaming a category breaks all API clients
- Must coordinate name changes across frontend, tests, and documentation

## Decision

**We adopt the following principles for all API design:**

### 1. Use Numerical IDs for All Foreign Key References

**Rule:** APIs MUST accept numerical IDs when creating relationships between entities.
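
One way a request handler might enforce this rule at the API boundary is sketched here. It is a minimal illustration, not the project's implementation: `AddWatchedItemInput`, `ParseResult`, and `parseAddWatchedItem` are hypothetical names, and the sketch assumes `category_id` values are positive integers that fit in a safe JavaScript number. A schema library could express the same checks.

```typescript
// Hypothetical request-body shape for POST /api/users/watched-items.
// Field names follow this ADR; the validator itself is illustrative only.
interface AddWatchedItemInput {
  itemName: string;
  category_id: number;
}

type ParseResult =
  | { ok: true; value: AddWatchedItemInput }
  | { ok: false; error: string };

// Validates an untrusted JSON body and rejects string-based category references.
function parseAddWatchedItem(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Request body must be a JSON object' };
  }
  const { itemName, category_id, category } = body as Record<string, unknown>;

  if (typeof itemName !== 'string' || itemName.trim() === '') {
    return { ok: false, error: 'itemName must be a non-empty string' };
  }
  // A legacy name-only reference gets an error that points at the discovery endpoint.
  if (category !== undefined && category_id === undefined) {
    return {
      ok: false,
      error: 'category (name) is not accepted; send category_id from GET /api/categories',
    };
  }
  // Assumption: BIGINT keys are positive and small enough to be safe JS integers.
  if (
    typeof category_id !== 'number' ||
    !Number.isSafeInteger(category_id) ||
    category_id <= 0
  ) {
    return { ok: false, error: 'category_id must be a positive integer' };
  }
  return { ok: true, value: { itemName: itemName.trim(), category_id } };
}
```

A check like this only validates the shape of the input. Whether the ID refers to an existing category is still decided by the foreign key constraint on `category_id`.
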

```typescript
// ✅ CORRECT: Use IDs
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category_id": 3 // Numerical ID
}

// ❌ INCORRECT: Use strings
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs" // String name
}
```

### 2. Provide Discovery Endpoints

**Rule:** For any entity referenced by ID, provide a GET endpoint to list available options.

```typescript
// Required: Category discovery endpoint
GET /api/categories

Response:
[
  { "category_id": 1, "name": "Fruits & Vegetables" },
  { "category_id": 2, "name": "Meat & Seafood" },
  { "category_id": 3, "name": "Dairy & Eggs" }
]
```

### 3. Support Lookup by Name (Optional)

**Rule:** If convenient, provide query parameters for name-based lookup, but use IDs internally.

```typescript
// Optional: Convenience endpoint
GET /api/categories?name=Dairy%20%26%20Eggs

Response:
{
  "category_id": 3,
  "name": "Dairy & Eggs"
}
```

### 4. Return Full Objects in Responses

**Rule:** API responses SHOULD include denormalized data for convenience, but inputs MUST use IDs.

```typescript
// ✅ Response includes category details
GET /api/users/watched-items

Response:
[
  {
    "master_grocery_item_id": 42,
    "name": "Milk",
    "category_id": 3,
    "category": { // ✅ Include full object in response
      "category_id": 3,
      "name": "Dairy & Eggs"
    }
  }
]
```

## Affected Areas

### Immediate Violations (Must Fix)

1. **User Watched Items** ([src/routes/user.routes.ts:76](../../src/routes/user.routes.ts))
   - Currently: `category: string`
   - Should be: `category_id: number`

2. **Service Layer** ([src/services/db/personalization.db.ts:175](../../src/services/db/personalization.db.ts))
   - Currently: `categoryName: string`
   - Should be: `categoryId: number`

3. **API Client** ([src/services/apiClient.ts:436](../../src/services/apiClient.ts))
   - Currently: `category: string`
   - Should be: `category_id: number`

4. **Frontend Hooks** ([src/hooks/mutations/useAddWatchedItemMutation.ts:9](../../src/hooks/mutations/useAddWatchedItemMutation.ts))
   - Currently: `category?: string`
   - Should be: `category_id: number`

### Potential Violations (Review Required)

1. **UPC/Barcode System** ([src/types/upc.ts:85](../../src/types/upc.ts))
   - Uses `category: string | null`
   - May be appropriate if category is free-form user input

2. **AI Extraction** ([src/types/ai.ts:21](../../src/types/ai.ts))
   - Uses `category_name: z.string()`
   - AI extracts category names, which need mapping to IDs

3. **Flyer Data Transformer** ([src/services/flyerDataTransformer.ts:40](../../src/services/flyerDataTransformer.ts))
   - Uses `category_name: string`
   - May need category matching/creation logic

## Migration Strategy

See [research-category-id-migration.md](../research-category-id-migration.md) for the detailed migration plan.

**High-level approach:**

1. **Phase 1: Add category discovery endpoint** (non-breaking)
   - `GET /api/categories`
   - No changes to existing endpoints yet

2. **Phase 2: Support both formats** (non-breaking)
   - Accept both `category` (string) and `category_id` (number)
   - Deprecate string format with warning logs

3. **Phase 3: Remove string support** (breaking change, major version bump)
   - Only accept `category_id`
   - Update all clients and tests

## Consequences

### Positive

- ✅ API matches database schema design
- ✅ More robust (no typo-based failures)
- ✅ Better performance (no string lookups)
- ✅ Enables localization
- ✅ Discoverable via REST API
- ✅ Follows REST best practices

### Negative

- ⚠️ Breaking change for existing API consumers
- ⚠️ Requires client updates
- ⚠️ More complex migration path

### Neutral

- Frontend must fetch categories before displaying form
- Slightly more initial API calls (one-time category fetch)

## References

- [Database Normalization (Wikipedia)](https://en.wikipedia.org/wiki/Database_normalization)
- [REST API Design Best Practices](https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/)
- [PostgreSQL Foreign Keys](https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK)

## Related Decisions

- [ADR-001: Database Schema Design](./0001-database-schema-design.md) (if exists)
- [ADR-014: Containerization and Deployment Strategy](./0014-containerization-and-deployment-strategy.md)

## Approval

- **Proposed by:** Claude Code (via user observation)
- **Date:** 2026-01-19
- **Status:** Accepted (pending implementation)