Files
flyer-crawler.projectium.com/docs/adr/0023-database-normalization-and-referential-integrity.md
Torben Sorensen 4022768c03
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 18m39s
set up local e2e tests, and some e2e test fixes + docs on more db fixin - ugh
2026-01-19 13:45:21 -08:00

353 lines
10 KiB
Markdown

# ADR-023: Database Normalization and Referential Integrity
**Date:** 2026-01-19
**Status:** Accepted
**Context:** API design violates database normalization principles
## Problem Statement
The application's API layer currently accepts string-based references (category names) instead of numerical IDs when creating relationships between entities. This violates database normalization principles and creates a brittle, error-prone API contract.
**Example of Current Problem:**
```typescript
// API accepts string:
POST /api/users/watched-items
{ "itemName": "Milk", "category": "Dairy & Eggs" } // ❌ String reference
// But database uses normalized foreign keys:
CREATE TABLE master_grocery_items (
category_id BIGINT REFERENCES categories(category_id) -- Proper FK
)
```
This mismatch forces the service layer to perform string lookups on every request:
```typescript
// Service must do string matching:
const categoryRes = await client.query(
'SELECT category_id FROM categories WHERE name = $1',
[categoryName], // ❌ Error-prone string matching
);
```
## Database Normal Forms (In Order of Importance)
### 1. First Normal Form (1NF) ✅ Currently Satisfied
**Rule:** Each column contains atomic values; no repeating groups.
**Status:****Compliant**
- All columns contain single values
- No arrays or delimited strings in columns
- Each row is uniquely identifiable
**Example:**
```sql
-- ✅ Good: Atomic values
CREATE TABLE master_grocery_items (
master_grocery_item_id BIGINT PRIMARY KEY,
name TEXT,
category_id BIGINT
);
-- ❌ Bad: Non-atomic values (violates 1NF)
CREATE TABLE items (
id BIGINT,
categories TEXT -- "Dairy,Frozen,Snacks" (comma-delimited)
);
```
### 2. Second Normal Form (2NF) ✅ Currently Satisfied
**Rule:** No partial dependencies; all non-key columns depend on the entire primary key.
**Status:****Compliant**
- All tables use single-column primary keys (no composite keys)
- All non-key columns depend on the entire primary key
**Example:**
```sql
-- ✅ Good: All columns depend on full primary key
CREATE TABLE flyer_items (
flyer_item_id BIGINT PRIMARY KEY,
flyer_id BIGINT, -- Depends on flyer_item_id
master_item_id BIGINT, -- Depends on flyer_item_id
price_in_cents INT -- Depends on flyer_item_id
);
-- ❌ Bad: Partial dependency (violates 2NF)
CREATE TABLE flyer_items (
flyer_id BIGINT,
item_id BIGINT,
store_name TEXT, -- Depends only on flyer_id, not (flyer_id, item_id)
PRIMARY KEY (flyer_id, item_id)
);
```
### 3. Third Normal Form (3NF) ⚠️ VIOLATED IN API LAYER
**Rule:** No transitive dependencies; non-key columns depend only on the primary key, not on other non-key columns.
**Status:** ⚠️ **Database is compliant, but API layer violates this principle**
**Database Schema (Correct):**
```sql
-- ✅ Categories are normalized
CREATE TABLE categories (
category_id BIGINT PRIMARY KEY,
name TEXT NOT NULL UNIQUE
);
CREATE TABLE master_grocery_items (
master_grocery_item_id BIGINT PRIMARY KEY,
name TEXT,
category_id BIGINT REFERENCES categories(category_id) -- Direct reference
);
```
**API Layer (Violates 3NF Principle):**
```typescript
// ❌ API accepts category name instead of ID
POST /api/users/watched-items
{
"itemName": "Milk",
"category": "Dairy & Eggs" // String! Should be category_id
}
// Service layer must denormalize by doing lookup:
SELECT category_id FROM categories WHERE name = $1
```
This creates a **transitive dependency** in the application layer:
- `watched_item``category_name``category_id`
- Instead of direct: `watched_item``category_id`
### 4. Boyce-Codd Normal Form (BCNF) ✅ Currently Satisfied
**Rule:** Every determinant is a candidate key (stricter version of 3NF).
**Status:****Compliant**
- All foreign key references use primary keys
- No non-trivial functional dependencies where determinant is not a superkey
### 5. Fourth Normal Form (4NF) ✅ Currently Satisfied
**Rule:** No multi-valued dependencies; a record should not contain independent multi-valued facts.
**Status:****Compliant**
- Junction tables properly separate many-to-many relationships
- Examples: `user_watched_items`, `shopping_list_items`, `recipe_ingredients`
### 6. Fifth Normal Form (5NF) ✅ Currently Satisfied
**Rule:** No join dependencies; tables cannot be decomposed further without loss of information.
**Status:****Compliant** (as far as schema design goes)
## Impact of API Violation
### 1. Brittleness
```typescript
// Test fails because of exact string matching:
addWatchedItem('Milk', 'Dairy'); // ❌ Fails - not exact match
addWatchedItem('Milk', 'Dairy & Eggs'); // ✅ Works - exact match
addWatchedItem('Milk', 'dairy & eggs'); // ❌ Fails - case sensitive
```
### 2. No Discovery Mechanism
- No API endpoint to list available categories
- Frontend cannot dynamically populate dropdowns
- Clients must hardcode category names
### 3. Performance Penalty
```sql
-- Current: String lookup on every request
SELECT category_id FROM categories WHERE name = $1; -- Full table scan or index scan
-- Should be: Direct ID reference (no lookup needed)
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2);
```
### 4. Impossible Localization
- Cannot translate category names without breaking API
- Category names are hardcoded in English
### 5. Maintenance Burden
- Renaming a category breaks all API clients
- Must coordinate name changes across frontend, tests, and documentation
## Decision
**We adopt the following principles for all API design:**
### 1. Use Numerical IDs for All Foreign Key References
**Rule:** APIs MUST accept numerical IDs when creating relationships between entities.
```typescript
// ✅ CORRECT: Use IDs
POST /api/users/watched-items
{
"itemName": "Milk",
"category_id": 3 // Numerical ID
}
// ❌ INCORRECT: Use strings
POST /api/users/watched-items
{
"itemName": "Milk",
"category": "Dairy & Eggs" // String name
}
```
### 2. Provide Discovery Endpoints
**Rule:** For any entity referenced by ID, provide a GET endpoint to list available options.
```typescript
// Required: Category discovery endpoint
GET / api / categories;
Response: [
{ category_id: 1, name: 'Fruits & Vegetables' },
{ category_id: 2, name: 'Meat & Seafood' },
{ category_id: 3, name: 'Dairy & Eggs' },
];
```
### 3. Support Lookup by Name (Optional)
**Rule:** If convenient, provide query parameters for name-based lookup, but use IDs internally.
```typescript
// Optional: Convenience endpoint
GET /api/categories?name=Dairy%20%26%20Eggs
Response: { "category_id": 3, "name": "Dairy & Eggs" }
```
### 4. Return Full Objects in Responses
**Rule:** API responses SHOULD include denormalized data for convenience, but inputs MUST use IDs.
```typescript
// ✅ Response includes category details
GET / api / users / watched - items;
Response: [
{
master_grocery_item_id: 42,
name: 'Milk',
category_id: 3,
category: {
// ✅ Include full object in response
category_id: 3,
name: 'Dairy & Eggs',
},
},
];
```
## Affected Areas
### Immediate Violations (Must Fix)
1. **User Watched Items** ([src/routes/user.routes.ts:76](../../src/routes/user.routes.ts))
- Currently: `category: string`
- Should be: `category_id: number`
2. **Service Layer** ([src/services/db/personalization.db.ts:175](../../src/services/db/personalization.db.ts))
- Currently: `categoryName: string`
- Should be: `categoryId: number`
3. **API Client** ([src/services/apiClient.ts:436](../../src/services/apiClient.ts))
- Currently: `category: string`
- Should be: `category_id: number`
4. **Frontend Hooks** ([src/hooks/mutations/useAddWatchedItemMutation.ts:9](../../src/hooks/mutations/useAddWatchedItemMutation.ts))
- Currently: `category?: string`
- Should be: `category_id: number`
### Potential Violations (Review Required)
1. **UPC/Barcode System** ([src/types/upc.ts:85](../../src/types/upc.ts))
- Uses `category: string | null`
- May be appropriate if category is free-form user input
2. **AI Extraction** ([src/types/ai.ts:21](../../src/types/ai.ts))
- Uses `category_name: z.string()`
- AI extracts category names, needs mapping to IDs
3. **Flyer Data Transformer** ([src/services/flyerDataTransformer.ts:40](../../src/services/flyerDataTransformer.ts))
- Uses `category_name: string`
- May need category matching/creation logic
## Migration Strategy
See [research-category-id-migration.md](../research-category-id-migration.md) for detailed migration plan.
**High-level approach:**
1. **Phase 1: Add category discovery endpoint** (non-breaking)
- `GET /api/categories`
- No API changes yet
2. **Phase 2: Support both formats** (non-breaking)
- Accept both `category` (string) and `category_id` (number)
- Deprecate string format with warning logs
3. **Phase 3: Remove string support** (breaking change, major version bump)
- Only accept `category_id`
- Update all clients and tests
## Consequences
### Positive
- ✅ API matches database schema design
- ✅ More robust (no typo-based failures)
- ✅ Better performance (no string lookups)
- ✅ Enables localization
- ✅ Discoverable via REST API
- ✅ Follows REST best practices
### Negative
- ⚠️ Breaking change for existing API consumers
- ⚠️ Requires client updates
- ⚠️ More complex migration path
### Neutral
- Frontend must fetch categories before displaying form
- Slightly more initial API calls (one-time category fetch)
## References
- [Database Normalization (Wikipedia)](https://en.wikipedia.org/wiki/Database_normalization)
- [REST API Design Best Practices](https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/)
- [PostgreSQL Foreign Keys](https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK)
## Related Decisions
- [ADR-001: Database Schema Design](./0001-database-schema-design.md) (if exists)
- [ADR-014: Containerization and Deployment Strategy](./0014-containerization-and-deployment-strategy.md)
## Approval
- **Proposed by:** Claude Code (via user observation)
- **Date:** 2026-01-19
- **Status:** Accepted (pending implementation)