# ADR-023: Database Normalization and Referential Integrity

**Date:** 2026-01-19
**Status:** Accepted
**Context:** API design violates database normalization principles

## Problem Statement

The application's API layer currently accepts string-based references (category names) instead of numerical IDs when creating relationships between entities. This conflicts with the normalized, foreign-key-based database schema and creates a brittle, error-prone API contract.

**Example of Current Problem:**

```text
// API accepts string:
POST /api/users/watched-items
{ "itemName": "Milk", "category": "Dairy & Eggs" }  // ❌ String reference

// But database uses normalized foreign keys:
CREATE TABLE master_grocery_items (
  category_id BIGINT REFERENCES categories(category_id)  -- ✅ Proper FK
)
```

This mismatch forces the service layer to perform a string lookup on every request:

```typescript
// Service must do string matching:
const categoryRes = await client.query(
  'SELECT category_id FROM categories WHERE name = $1',
  [categoryName], // ❌ Error-prone string matching
);
```

## Database Normal Forms (In Order of Importance)

### 1. First Normal Form (1NF) ✅ Currently Satisfied

**Rule:** Each column contains atomic values; no repeating groups.

**Status:** ✅ **Compliant**

- All columns contain single values
- No arrays or delimited strings in columns
- Each row is uniquely identifiable

**Example:**

```sql
-- ✅ Good: Atomic values
CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT
);

-- ❌ Bad: Non-atomic values (violates 1NF)
CREATE TABLE items (
  id BIGINT,
  categories TEXT  -- "Dairy,Frozen,Snacks" (comma-delimited)
);
```

### 2. Second Normal Form (2NF) ✅ Currently Satisfied

**Rule:** No partial dependencies; all non-key columns depend on the entire primary key.

**Status:** ✅ **Compliant**

- All tables use single-column primary keys (no composite keys)
- All non-key columns depend on the entire primary key

**Example:**

```sql
-- ✅ Good: All columns depend on full primary key
CREATE TABLE flyer_items (
  flyer_item_id BIGINT PRIMARY KEY,
  flyer_id BIGINT,        -- Depends on flyer_item_id
  master_item_id BIGINT,  -- Depends on flyer_item_id
  price_in_cents INT      -- Depends on flyer_item_id
);

-- ❌ Bad: Partial dependency (violates 2NF)
CREATE TABLE flyer_items (
  flyer_id BIGINT,
  item_id BIGINT,
  store_name TEXT,  -- Depends only on flyer_id, not (flyer_id, item_id)
  PRIMARY KEY (flyer_id, item_id)
);
```

### 3. Third Normal Form (3NF) ⚠️ VIOLATED IN API LAYER

**Rule:** No transitive dependencies; non-key columns depend only on the primary key, not on other non-key columns.

**Status:** ⚠️ **Database is compliant, but API layer violates this principle**

**Database Schema (Correct):**

```sql
-- ✅ Categories are normalized
CREATE TABLE categories (
  category_id BIGINT PRIMARY KEY,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY,
  name TEXT,
  category_id BIGINT REFERENCES categories(category_id)  -- Direct reference
);
```

**API Layer (Violates 3NF Principle):**

```text
// ❌ API accepts category name instead of ID
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs"  // String! Should be category_id
}

// Service layer must resolve the name to an ID on every request:
SELECT category_id FROM categories WHERE name = $1
```

Normal forms formally apply to relation schemas, not API payloads, so the violation here is by analogy: the API introduces an indirect reference that mirrors a **transitive dependency** in the application layer:

- `watched_item` → `category_name` → `category_id`
- Instead of direct: `watched_item` → `category_id`

### 4. Boyce-Codd Normal Form (BCNF) ✅ Currently Satisfied

**Rule:** Every determinant is a candidate key (stricter version of 3NF).

**Status:** ✅ **Compliant**

- All foreign key references use primary keys
- No non-trivial functional dependencies where the determinant is not a superkey

### 5. Fourth Normal Form (4NF) ✅ Currently Satisfied

**Rule:** No non-trivial multi-valued dependencies; a record should not contain two or more independent multi-valued facts.

**Status:** ✅ **Compliant**

- Junction tables properly separate many-to-many relationships
- Examples: `user_watched_items`, `shopping_list_items`, `recipe_ingredients`

### 6. Fifth Normal Form (5NF) ✅ Currently Satisfied

**Rule:** Every join dependency is implied by the candidate keys; tables cannot be decomposed further without loss of information.

**Status:** ✅ **Compliant** (as far as schema design goes)

## Impact of API Violation

### 1. Brittleness

```typescript
// Test fails because of exact string matching:
addWatchedItem('Milk', 'Dairy'); // ❌ Fails - not exact match
addWatchedItem('Milk', 'Dairy & Eggs'); // ✅ Works - exact match
addWatchedItem('Milk', 'dairy & eggs'); // ❌ Fails - case sensitive
```

### 2. No Discovery Mechanism

- No API endpoint to list available categories
- Frontend cannot dynamically populate dropdowns
- Clients must hardcode category names

### 3. Performance Penalty

```sql
-- Current: String lookup on every request
SELECT category_id FROM categories WHERE name = $1;  -- Extra query per request (index or sequential scan, at the planner's discretion)

-- Should be: Direct ID reference (no lookup needed)
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2);
```

### 4. Impossible Localization

- Cannot translate category names without breaking API
- Category names are hardcoded in English

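To illustrate why ID-based references remove this obstacle, here is a minimal sketch in which display names live in a per-locale table keyed by `category_id`, so translating or renaming a label never changes what clients send. The `categoryLabels` data, the locale codes, and the `localizedCategoryName` helper are all hypothetical and not part of the current codebase.

```typescript
// Hypothetical translation table keyed by locale, then by category_id
// (illustrative data only).
const categoryLabels = new Map<string, Map<number, string>>([
  ['en', new Map([[3, 'Dairy & Eggs']])],
  ['fr', new Map([[3, 'Produits laitiers et œufs']])],
]);

// Return the label for the requested locale, falling back to English,
// then to null when the ID is unknown.
function localizedCategoryName(categoryId: number, locale: string): string | null {
  return (
    categoryLabels.get(locale)?.get(categoryId) ??
    categoryLabels.get('en')?.get(categoryId) ??
    null
  );
}
```

With this shape, the request payload carries only `category_id: 3` regardless of which label the user sees.
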
### 5. Maintenance Burden

- Renaming a category breaks all API clients
- Must coordinate name changes across frontend, tests, and documentation

## Decision

**We adopt the following principles for all API design:**

### 1. Use Numerical IDs for All Foreign Key References

**Rule:** APIs MUST accept numerical IDs when creating relationships between entities.

```text
// ✅ CORRECT: Use IDs
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category_id": 3  // Numerical ID
}

// ❌ INCORRECT: Use strings
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs"  // String name
}
```

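One way the rule above could be enforced at the API boundary is sketched below: a small parser that accepts only a positive integer `category_id` and rejects string references. The `AddWatchedItemRequest` type and `parseAddWatchedItemRequest` function are hypothetical names used for illustration; the real route may use a different validation mechanism.

```typescript
// Hypothetical request shape; field names follow the examples in this ADR.
interface AddWatchedItemRequest {
  itemName: string;
  category_id: number;
}

type ParseResult =
  | { ok: true; value: AddWatchedItemRequest }
  | { ok: false; error: string };

// Validate an untrusted JSON body before it reaches the service layer.
function parseAddWatchedItemRequest(body: unknown): ParseResult {
  if (typeof body !== 'object' || body === null) {
    return { ok: false, error: 'Request body must be a JSON object' };
  }
  const { itemName, category_id } = body as Record<string, unknown>;
  if (typeof itemName !== 'string' || itemName.trim() === '') {
    return { ok: false, error: 'itemName must be a non-empty string' };
  }
  // Reject values such as "3" or "Dairy & Eggs": only positive integers are IDs.
  if (typeof category_id !== 'number' || !Number.isInteger(category_id) || category_id <= 0) {
    return { ok: false, error: 'category_id must be a positive integer' };
  }
  return { ok: true, value: { itemName: itemName.trim(), category_id } };
}
```

A body that still sends `"category": "Dairy & Eggs"` fails with a clear message instead of silently triggering a name lookup.
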
### 2. Provide Discovery Endpoints

**Rule:** For any entity referenced by ID, provide a GET endpoint to list available options.

```text
// Required: Category discovery endpoint
GET /api/categories
Response: [
  { "category_id": 1, "name": "Fruits & Vegetables" },
  { "category_id": 2, "name": "Meat & Seafood" },
  { "category_id": 3, "name": "Dairy & Eggs" }
]
```

### 3. Support Lookup by Name (Optional)

**Rule:** If convenient, provide query parameters for name-based lookup, but use IDs internally.

```text
// Optional: Convenience endpoint
GET /api/categories?name=Dairy%20%26%20Eggs
Response: { "category_id": 3, "name": "Dairy & Eggs" }
```

### 4. Return Full Objects in Responses

**Rule:** API responses SHOULD include denormalized data for convenience, but inputs MUST use IDs.

```text
// ✅ Response includes category details
GET /api/users/watched-items
Response: [
  {
    "master_grocery_item_id": 42,
    "name": "Milk",
    "category_id": 3,
    "category": {  // ✅ Include full object in response
      "category_id": 3,
      "name": "Dairy & Eggs"
    }
  }
]
```

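The response shape above could be produced roughly as follows. This is a sketch under assumptions: the `WatchedItemRow` and `WatchedItemResponse` types and the `toWatchedItemResponse` helper are hypothetical, and in practice the same result may come from a SQL join rather than an in-memory merge.

```typescript
interface Category {
  category_id: number;
  name: string;
}

// Hypothetical row shape as stored (ID only).
interface WatchedItemRow {
  master_grocery_item_id: number;
  name: string;
  category_id: number;
}

// Hypothetical response shape: the stored row plus the embedded category.
interface WatchedItemResponse extends WatchedItemRow {
  category: Category | null;
}

// Attach the full category object to each row, keeping category_id as the
// canonical reference. Unknown IDs yield category: null.
function toWatchedItemResponse(
  rows: WatchedItemRow[],
  categories: Category[],
): WatchedItemResponse[] {
  const byId = new Map(categories.map((c): [number, Category] => [c.category_id, c]));
  return rows.map((row) => ({ ...row, category: byId.get(row.category_id) ?? null }));
}
```

Clients read `category.name` for display but continue to send only `category_id` on writes.
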
## Affected Areas

### Immediate Violations (Must Fix)

1. **User Watched Items** ([src/routes/user.routes.ts:76](../../src/routes/user.routes.ts))
   - Currently: `category: string`
   - Should be: `category_id: number`

2. **Service Layer** ([src/services/db/personalization.db.ts:175](../../src/services/db/personalization.db.ts))
   - Currently: `categoryName: string`
   - Should be: `categoryId: number`

3. **API Client** ([src/services/apiClient.ts:436](../../src/services/apiClient.ts))
   - Currently: `category: string`
   - Should be: `category_id: number`

4. **Frontend Hooks** ([src/hooks/mutations/useAddWatchedItemMutation.ts:9](../../src/hooks/mutations/useAddWatchedItemMutation.ts))
   - Currently: `category?: string`
   - Should be: `category_id: number`

### Potential Violations (Review Required)

1. **UPC/Barcode System** ([src/types/upc.ts:85](../../src/types/upc.ts))
   - Uses `category: string | null`
   - May be appropriate if category is free-form user input

2. **AI Extraction** ([src/types/ai.ts:21](../../src/types/ai.ts))
   - Uses `category_name: z.string()`
   - AI extracts category names, needs mapping to IDs

3. **Flyer Data Transformer** ([src/services/flyerDataTransformer.ts:40](../../src/services/flyerDataTransformer.ts))
   - Uses `category_name: string`
   - May need category matching/creation logic

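For the AI extraction and flyer transformer cases, where the source only ever produces names, the mapping to IDs could happen once at the ingestion boundary. The sketch below assumes a case- and whitespace-insensitive match; that normalization rule and the `normalizeName`, `buildCategoryIndex`, and `matchCategoryId` helpers are hypothetical, and what to do with unmatched names (reject, queue for review, or create a category) is left open.

```typescript
interface Category {
  category_id: number;
  name: string;
}

// Assumed normalization: trim, collapse internal whitespace, lowercase.
const normalizeName = (name: string): string =>
  name.trim().replace(/\s+/g, ' ').toLowerCase();

// Build a lookup from normalized name to category_id.
function buildCategoryIndex(categories: Category[]): Map<string, number> {
  return new Map(
    categories.map((c): [string, number] => [normalizeName(c.name), c.category_id]),
  );
}

// Resolve an extracted name to an ID, or null when nothing matches.
function matchCategoryId(index: Map<string, number>, extractedName: string): number | null {
  return index.get(normalizeName(extractedName)) ?? null;
}
```

Downstream code then works with `category_id` only, and the string never reaches the public API contract.
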
## Migration Strategy

See [research-category-id-migration.md](../research-category-id-migration.md) for detailed migration plan.

**High-level approach:**

1. **Phase 1: Add category discovery endpoint** (non-breaking)
   - `GET /api/categories`
   - No changes to existing endpoints yet

2. **Phase 2: Support both formats** (non-breaking)
   - Accept both `category` (string) and `category_id` (number)
   - Deprecate string format with warning logs

3. **Phase 3: Remove string support** (breaking change, major version bump)
   - Only accept `category_id`
   - Update all clients and tests

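Phase 2 could look roughly like the sketch below, which prefers `category_id` when present and falls back to the deprecated `category` string with a warning. The `CategoryInput` type, the `resolveCategoryId` helper, and the in-memory `categoryIdsByName` map (standing in for the `categories` table) are assumptions for illustration; the precedence rule when both fields are sent is also an assumption.

```typescript
// Hypothetical Phase 2 input: either field may be present.
interface CategoryInput {
  category_id?: number;
  category?: string; // deprecated string reference
}

// Stand-in for the categories table; the real lookup would be a DB query.
const categoryIdsByName = new Map<string, number>([
  ['Fruits & Vegetables', 1],
  ['Meat & Seafood', 2],
  ['Dairy & Eggs', 3],
]);

function resolveCategoryId(
  input: CategoryInput,
  warn: (message: string) => void = console.warn,
): number {
  // Preferred path: the client already sends the ID.
  if (input.category_id !== undefined) {
    return input.category_id;
  }
  // Deprecated path: resolve the name and log a warning.
  if (input.category !== undefined) {
    warn('Deprecated: send category_id instead of category');
    const id = categoryIdsByName.get(input.category);
    if (id === undefined) {
      throw new Error(`Unknown category: ${input.category}`);
    }
    return id;
  }
  throw new Error('Either category_id or category is required');
}
```

Counting the warnings emitted on the deprecated path gives a simple signal for when Phase 3 is safe to ship.
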
## Consequences

### Positive

- ✅ API matches database schema design
- ✅ More robust (no typo-based failures)
- ✅ Better performance (no string lookups)
- ✅ Enables localization
- ✅ Discoverable via REST API
- ✅ Follows REST best practices

### Negative

- ⚠️ Breaking change for existing API consumers
- ⚠️ Requires client updates
- ⚠️ More complex migration path

### Neutral

- Frontend must fetch categories before displaying form
- Slightly more initial API calls (one-time category fetch)

## References

- [Database Normalization (Wikipedia)](https://en.wikipedia.org/wiki/Database_normalization)
- [REST API Design Best Practices](https://stackoverflow.blog/2020/03/02/best-practices-for-rest-api-design/)
- [PostgreSQL Foreign Keys](https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK)

## Related Decisions

- [ADR-001: Database Schema Design](./0001-database-schema-design.md) (if exists)
- [ADR-014: Containerization and Deployment Strategy](./0014-containerization-and-deployment-strategy.md)

## Approval

- **Proposed by:** Claude Code (via user observation)
- **Date:** 2026-01-19
- **Status:** Accepted (pending implementation)