ADR-023: Database Normalization and Referential Integrity
Date: 2026-01-19
Status: Accepted
Context: API design violates database normalization principles
Problem Statement
The application's API layer currently accepts string-based references (category names) instead of numerical IDs when creating relationships between entities. This violates database normalization principles and creates a brittle, error-prone API contract.
Example of Current Problem:
// API accepts string:
POST /api/users/watched-items
{ "itemName": "Milk", "category": "Dairy & Eggs" } // ❌ String reference
// But database uses normalized foreign keys:
CREATE TABLE master_grocery_items (
category_id BIGINT REFERENCES categories(category_id) -- ✅ Proper FK
)
This mismatch forces the service layer to perform string lookups on every request:
// Service must do string matching:
const categoryRes = await client.query(
'SELECT category_id FROM categories WHERE name = $1',
[categoryName], // ❌ Error-prone string matching
);
Database Normal Forms (In Order of Importance)
1. First Normal Form (1NF) ✅ Currently Satisfied
Rule: Each column contains atomic values; no repeating groups.
Status: ✅ Compliant
- All columns contain single values
- No arrays or delimited strings in columns
- Each row is uniquely identifiable
Example:
-- ✅ Good: Atomic values
CREATE TABLE master_grocery_items (
master_grocery_item_id BIGINT PRIMARY KEY,
name TEXT,
category_id BIGINT
);
-- ❌ Bad: Non-atomic values (violates 1NF)
CREATE TABLE items (
id BIGINT,
categories TEXT -- "Dairy,Frozen,Snacks" (comma-delimited)
);
2. Second Normal Form (2NF) ✅ Currently Satisfied
Rule: No partial dependencies; all non-key columns depend on the entire primary key.
Status: ✅ Compliant
- All tables use single-column primary keys (no composite keys)
- All non-key columns depend on the entire primary key
Example:
-- ✅ Good: All columns depend on full primary key
CREATE TABLE flyer_items (
flyer_item_id BIGINT PRIMARY KEY,
flyer_id BIGINT, -- Depends on flyer_item_id
master_item_id BIGINT, -- Depends on flyer_item_id
price_in_cents INT -- Depends on flyer_item_id
);
-- ❌ Bad: Partial dependency (violates 2NF)
CREATE TABLE flyer_items (
flyer_id BIGINT,
item_id BIGINT,
store_name TEXT, -- Depends only on flyer_id, not (flyer_id, item_id)
PRIMARY KEY (flyer_id, item_id)
);
3. Third Normal Form (3NF) ⚠️ VIOLATED IN API LAYER
Rule: No transitive dependencies; non-key columns depend only on the primary key, not on other non-key columns.
Status: ⚠️ Database is compliant, but API layer violates this principle
Database Schema (Correct):
-- ✅ Categories are normalized
CREATE TABLE categories (
category_id BIGINT PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE master_grocery_items (
master_grocery_item_id BIGINT PRIMARY KEY,
name TEXT,
category_id BIGINT REFERENCES categories(category_id) -- Direct reference
);
API Layer (Violates 3NF Principle):
// ❌ API accepts category name instead of ID
POST /api/users/watched-items
{
"itemName": "Milk",
"category": "Dairy & Eggs" // String! Should be category_id
}
// Service layer must resolve the name back to an ID with a lookup:
SELECT category_id FROM categories WHERE name = $1
This creates a transitive dependency in the application layer:
watched_item → category_name → category_id
Instead of the direct reference:
watched_item → category_id
4. Boyce-Codd Normal Form (BCNF) ✅ Currently Satisfied
Rule: Every determinant is a candidate key (stricter version of 3NF).
Status: ✅ Compliant
- All foreign key references use primary keys
- No non-trivial functional dependencies where determinant is not a superkey
5. Fourth Normal Form (4NF) ✅ Currently Satisfied
Rule: No multi-valued dependencies; a record should not contain independent multi-valued facts.
Status: ✅ Compliant
- Junction tables properly separate many-to-many relationships
- Examples: user_watched_items, shopping_list_items, recipe_ingredients
6. Fifth Normal Form (5NF) ✅ Currently Satisfied
Rule: No non-trivial join dependencies beyond those implied by the candidate keys; tables cannot be decomposed further without loss of information.
Status: ✅ Compliant (as far as schema design goes)
Impact of API Violation
1. Brittleness
// Test fails because of exact string matching:
addWatchedItem('Milk', 'Dairy'); // ❌ Fails - not exact match
addWatchedItem('Milk', 'Dairy & Eggs'); // ✅ Works - exact match
addWatchedItem('Milk', 'dairy & eggs'); // ❌ Fails - case sensitive
2. No Discovery Mechanism
- No API endpoint to list available categories
- Frontend cannot dynamically populate dropdowns
- Clients must hardcode category names
3. Performance Penalty
-- Current: String lookup on every request
SELECT category_id FROM categories WHERE name = $1; -- Full table scan or index scan
-- Should be: Direct ID reference (no lookup needed)
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2);
4. Impossible Localization
- Cannot translate category names without breaking API
- Category names are hardcoded in English
5. Maintenance Burden
- Renaming a category breaks all API clients
- Must coordinate name changes across frontend, tests, and documentation
Decision
We adopt the following principles for all API design:
1. Use Numerical IDs for All Foreign Key References
Rule: APIs MUST accept numerical IDs when creating relationships between entities.
// ✅ CORRECT: Use IDs
POST /api/users/watched-items
{
"itemName": "Milk",
"category_id": 3 // Numerical ID
}
// ❌ INCORRECT: Use strings
POST /api/users/watched-items
{
"itemName": "Milk",
"category": "Dairy & Eggs" // String name
}
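A minimal sketch of how a route could enforce this rule at the request boundary once string support is removed. It assumes a hand-written validator; the names `AddWatchedItemBody` and `parseAddWatchedItemBody` are hypothetical, and the project may use a schema library instead.

```typescript
// Hypothetical request shape for POST /api/users/watched-items.
interface AddWatchedItemBody {
  itemName: string;
  category_id: number;
}

type ParseResult =
  | { ok: true; value: AddWatchedItemBody }
  | { ok: false; error: string };

// Accepts only a positive integer category_id and rejects the legacy string field.
function parseAddWatchedItemBody(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Body must be a JSON object' };
  }
  const { itemName, category_id, category } = body as Record<string, unknown>;
  if (typeof itemName !== 'string' || itemName.trim() === '') {
    return { ok: false, error: 'itemName must be a non-empty string' };
  }
  if (category !== undefined) {
    return { ok: false, error: 'Use category_id (number) instead of category (string)' };
  }
  if (typeof category_id !== 'number' || !Number.isInteger(category_id) || category_id <= 0) {
    return { ok: false, error: 'category_id must be a positive integer' };
  }
  return { ok: true, value: { itemName, category_id } };
}
```

Returning a result object instead of throwing lets the route map a failed parse to a 400 response without a try/catch.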
2. Provide Discovery Endpoints
Rule: For any entity referenced by ID, provide a GET endpoint to list available options.
// Required: Category discovery endpoint
GET /api/categories
Response: [
{ "category_id": 1, "name": "Fruits & Vegetables" },
{ "category_id": 2, "name": "Meat & Seafood" },
{ "category_id": 3, "name": "Dairy & Eggs" }
]
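The discovery endpoint could be sketched as a framework-free handler. This is an illustration under assumptions: the in-memory `CATEGORIES` array stands in for the categories table, and `handleGetCategories` and `JsonResponse` are hypothetical names, not existing project code.

```typescript
interface Category {
  category_id: number;
  name: string;
}

// Stand-in for the categories table; a real handler would query the database.
const CATEGORIES: readonly Category[] = [
  { category_id: 3, name: 'Dairy & Eggs' },
  { category_id: 1, name: 'Fruits & Vegetables' },
  { category_id: 2, name: 'Meat & Seafood' },
];

interface JsonResponse<T> {
  status: number;
  body: T;
}

// Hypothetical handler for GET /api/categories.
// Sorts a copy by ID so clients receive a stable order.
function handleGetCategories(): JsonResponse<Category[]> {
  const body = [...CATEGORIES].sort((a, b) => a.category_id - b.category_id);
  return { status: 200, body };
}
```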
3. Support Lookup by Name (Optional)
Rule: If convenient, provide query parameters for name-based lookup, but use IDs internally.
// Optional: Convenience endpoint
GET /api/categories?name=Dairy%20%26%20Eggs
Response: { "category_id": 3, "name": "Dairy & Eggs" }
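One way to sketch the optional name lookup, assuming the convenience endpoint should tolerate case and whitespace differences (that tolerance is an assumption, not something this ADR specifies). `findCategoryByName`, `CategoryRecord`, and `KNOWN_CATEGORIES` are hypothetical names.

```typescript
interface CategoryRecord {
  category_id: number;
  name: string;
}

// Stand-in data; a real implementation would query the categories table.
const KNOWN_CATEGORIES: readonly CategoryRecord[] = [
  { category_id: 1, name: 'Fruits & Vegetables' },
  { category_id: 2, name: 'Meat & Seafood' },
  { category_id: 3, name: 'Dairy & Eggs' },
];

// Assumption: the convenience lookup ignores case and surrounding whitespace.
const normalize = (value: string): string => value.trim().toLowerCase();

// Hypothetical helper behind GET /api/categories?name=...
// Takes the raw, still percent-encoded query value and returns the match, if any.
// Note: decodeURIComponent throws a URIError on malformed input.
function findCategoryByName(
  rawName: string,
  categories: readonly CategoryRecord[] = KNOWN_CATEGORIES,
): CategoryRecord | undefined {
  const wanted = normalize(decodeURIComponent(rawName));
  return categories.find((c) => normalize(c.name) === wanted);
}
```

Most web frameworks decode query parameters before the handler runs, in which case the `decodeURIComponent` call would be dropped. Either way, the caller receives an ID and uses it for every subsequent request.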
4. Return Full Objects in Responses
Rule: API responses SHOULD include denormalized data for convenience, but inputs MUST use IDs.
// ✅ Response includes category details
GET /api/users/watched-items
Response: [
{
master_grocery_item_id: 42,
name: 'Milk',
category_id: 3,
category: {
// ✅ Include full object in response
category_id: 3,
name: 'Dairy & Eggs',
},
},
]
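A sketch of how a flat database row could be mapped to this nested response shape. It assumes the query joins master_grocery_items to categories and aliases the category name as `category_name`; that alias and the names `WatchedItemRow` and `toWatchedItemResponse` are hypothetical.

```typescript
// Flat row as a JOIN between master_grocery_items and categories might return it.
// The category_name alias is an assumption made for this sketch.
interface WatchedItemRow {
  master_grocery_item_id: number;
  name: string;
  category_id: number;
  category_name: string;
}

interface WatchedItemResponse {
  master_grocery_item_id: number;
  name: string;
  category_id: number;
  category: { category_id: number; name: string };
}

// Nests the category columns under a `category` object for client convenience.
function toWatchedItemResponse(row: WatchedItemRow): WatchedItemResponse {
  return {
    master_grocery_item_id: row.master_grocery_item_id,
    name: row.name,
    category_id: row.category_id,
    category: { category_id: row.category_id, name: row.category_name },
  };
}
```

Keeping `category_id` at the top level as well means a client can send the same field back in a later write without unpacking the nested object.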
Affected Areas
Immediate Violations (Must Fix)
- User Watched Items (src/routes/user.routes.ts:76)
  - Currently: category: string
  - Should be: category_id: number
- Service Layer (src/services/db/personalization.db.ts:175)
  - Currently: categoryName: string
  - Should be: categoryId: number
- API Client (src/services/apiClient.ts:436)
  - Currently: category: string
  - Should be: category_id: number
- Frontend Hooks (src/hooks/mutations/useAddWatchedItemMutation.ts:9)
  - Currently: category?: string
  - Should be: category_id: number
Potential Violations (Review Required)
- UPC/Barcode System (src/types/upc.ts:85)
  - Uses category: string | null
  - May be appropriate if category is free-form user input
- AI Extraction (src/types/ai.ts:21)
  - Uses category_name: z.string()
  - AI extracts category names, needs mapping to IDs
- Flyer Data Transformer (src/services/flyerDataTransformer.ts:40)
  - Uses category_name: string
  - May need category matching/creation logic
Migration Strategy
See research-category-id-migration.md for the detailed migration plan.
High-level approach:
- Phase 1: Add category discovery endpoint (non-breaking)
  - GET /api/categories
  - No API changes yet
- Phase 2: Support both formats (non-breaking)
  - Accept both category (string) and category_id (number)
  - Deprecate string format with warning logs
- Phase 3: Remove string support (breaking change, major version bump)
  - Only accept category_id
  - Update all clients and tests
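The Phase 2 behavior (accept both formats, warn on the string form) could look roughly like the following. This is a sketch under assumptions: `resolveCategoryId`, `KnownCategory`, and `CategoryInput` are hypothetical names, the category list is passed in instead of being queried, and the preference for `category_id` when both fields are present is a choice made here, not one stated in the migration plan.

```typescript
interface KnownCategory {
  category_id: number;
  name: string;
}

// Phase 2 input: either field may be present.
interface CategoryInput {
  category_id?: number;
  category?: string;
}

// Resolves either format to an ID. The string path is deprecated and emits a warning
// through the supplied logger; a real service would pass its own logger here.
function resolveCategoryId(
  input: CategoryInput,
  categories: readonly KnownCategory[],
  warn: (message: string) => void,
): number {
  const { category_id, category } = input;
  if (category_id !== undefined) {
    // Assumption: category_id wins when both fields are sent.
    if (!categories.some((c) => c.category_id === category_id)) {
      throw new Error(`Unknown category_id: ${category_id}`);
    }
    return category_id;
  }
  if (category !== undefined) {
    warn('Deprecated: send category_id instead of category');
    const match = categories.find((c) => c.name === category);
    if (!match) {
      throw new Error(`Unknown category: ${category}`);
    }
    return match.category_id;
  }
  throw new Error('Either category_id or category is required');
}
```

Counting the warnings emitted by the string path gives a rough signal for when remaining clients have migrated and Phase 3 can proceed.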
Consequences
Positive
- ✅ API matches database schema design
- ✅ More robust (no typo-based failures)
- ✅ Better performance (no string lookups)
- ✅ Enables localization
- ✅ Discoverable via REST API
- ✅ Follows REST best practices
Negative
- ⚠️ Breaking change for existing API consumers
- ⚠️ Requires client updates
- ⚠️ More complex migration path
Neutral
- Frontend must fetch categories before displaying form
- Slightly more initial API calls (one-time category fetch)
Approval
- Proposed by: Claude Code (via user observation)
- Date: 2026-01-19
- Status: Accepted (pending implementation)