Files
flyer-crawler.projectium.com/docs/research-category-id-migration.md
Torben Sorensen 4022768c03
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 18m39s
set up local e2e tests, and some e2e test fixes + docs on more db fixin - ugh
2026-01-19 13:45:21 -08:00

1030 lines
27 KiB
Markdown

# Research: Migration Plan for Category ID References
**Date:** 2026-01-19
**Status:** Planning
**Related:** [ADR-023: Database Normalization and Referential Integrity](adr/0023-database-normalization-and-referential-integrity.md)
## Executive Summary
The API currently accepts category names (strings) instead of category IDs (numbers) when creating watched items. This violates database normalization principles (3NF) and creates a brittle API. This document outlines a comprehensive, phased migration plan to fix this issue across the entire codebase.
**Estimated Effort:** 2-3 weeks (including testing and deployment)
**Breaking Changes:** Yes (API contract changes)
**Recommended Version Bump:** Major (0.11.14 → 1.0.0 or 0.12.0)
---
## Current State Analysis
### Files Using String-Based Category References
| File | Line | Current Type | Usage |
| -------------------------------------------------- | ------------ | ------------------------------- | ------------------------ |
| `src/routes/user.routes.ts` | 76 | `category: string` | API validation schema |
| `src/services/db/personalization.db.ts` | 175 | `categoryName: string` | Service method parameter |
| `src/services/apiClient.ts` | 436 | `category: string` | API client function |
| `src/hooks/mutations/useAddWatchedItemMutation.ts` | 9 | `category?: string` | Frontend mutation hook |
| `src/types/upc.ts` | 85, 107, 201 | `category: string \| null` | UPC types |
| `src/types/ai.ts` | 21 | `category_name: z.string()` | AI extraction schema |
| `src/services/flyerDataTransformer.ts` | 40 | `category_name: string` | Flyer processing |
| `src/services/db/expiry.db.ts` | 51 | `category_name: string \| null` | Expiry tracking |
| `src/services/db/upc.db.ts` | 36, 347 | `category: string \| null` | UPC database |
### Database Schema (Current - Correct)
```sql
-- Categories table (normalized)
CREATE TABLE categories (
category_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name TEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ DEFAULT now() NOT NULL,
updated_at TIMESTAMPTZ DEFAULT now() NOT NULL
);
-- Master items reference category by ID (correct)
CREATE TABLE master_grocery_items (
master_grocery_item_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
name TEXT NOT NULL,
category_id BIGINT REFERENCES categories(category_id), -- ✅ Proper FK
-- ...
);
```
### Current API Behavior
```typescript
// API Request (current - problematic)
POST /api/users/watched-items
{
"itemName": "Milk",
"category": "Dairy & Eggs" // ❌ String
}
// Service performs string lookup
const categoryRes = await client.query(
'SELECT category_id FROM categories WHERE name = $1',
[categoryName] // ❌ String matching on every request
);
// Then uses ID internally
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2)
```
---
## Migration Plan
### Phase 1: Add Category Discovery (Non-Breaking)
**Duration:** 2-3 days
**Risk:** Low
**Breaking:** No
#### 1.1. Create Categories API Route
**File:** `src/routes/category.routes.ts` (new file)
```typescript
// src/routes/category.routes.ts
import { Router } from 'express';
import { CategoryDbService } from '../services/db/category.db';
import { asyncHandler } from '../middleware/asyncHandler';
import { getLogger } from '../services/logger.server';
const router = Router();
const logger = getLogger();
/**
* @swagger
* /api/categories:
* get:
* summary: List all available grocery categories
* tags: [Categories]
* responses:
* 200:
* description: List of categories
* content:
* application/json:
* schema:
* type: object
* properties:
* success:
* type: boolean
* data:
* type: array
* items:
* type: object
* properties:
* category_id:
* type: integer
* name:
* type: string
*/
router.get(
'/',
asyncHandler(async (req, res) => {
const categories = await CategoryDbService.getAllCategories(logger);
res.json({
success: true,
data: categories,
});
}),
);
/**
* @swagger
* /api/categories/{id}:
* get:
* summary: Get a specific category by ID
* tags: [Categories]
* parameters:
* - in: path
* name: id
* required: true
* schema:
* type: integer
* responses:
* 200:
* description: Category details
* 404:
* description: Category not found
*/
router.get(
'/:id',
asyncHandler(async (req, res) => {
const categoryId = parseInt(req.params.id);
const category = await CategoryDbService.getCategoryById(categoryId, logger);
if (!category) {
return res.status(404).json({
success: false,
error: 'Category not found',
});
}
res.json({
success: true,
data: category,
});
}),
);
/**
* @swagger
* /api/categories/lookup:
* get:
* summary: Lookup category by name (for migration support)
* tags: [Categories]
* parameters:
* - in: query
* name: name
* required: true
* schema:
* type: string
* responses:
* 200:
* description: Category found
* 404:
* description: Category not found
*/
router.get(
'/lookup',
asyncHandler(async (req, res) => {
const name = req.query.name as string;
if (!name) {
return res.status(400).json({
success: false,
error: 'Query parameter "name" is required',
});
}
const category = await CategoryDbService.getCategoryByName(name, logger);
if (!category) {
return res.status(404).json({
success: false,
error: `Category '${name}' not found`,
});
}
res.json({
success: true,
data: category,
});
}),
);
export default router;
```
#### 1.2. Create Category Database Service
**File:** `src/services/db/category.db.ts` (new file)
```typescript
// src/services/db/category.db.ts
import { Logger } from 'pino';
import { getPool } from './connection.db';
import { handleDbError } from '../errorHandler';
export interface Category {
category_id: number;
name: string;
created_at: Date;
updated_at: Date;
}
export class CategoryDbService {
/**
* Get all categories ordered by name
*/
static async getAllCategories(logger: Logger): Promise<Category[]> {
const pool = getPool();
try {
const result = await pool.query<Category>(
`SELECT category_id, name, created_at, updated_at
FROM public.categories
ORDER BY name ASC`,
);
return result.rows;
} catch (error) {
handleDbError(error, logger, 'Error fetching all categories');
throw error;
}
}
/**
* Get category by ID
*/
static async getCategoryById(categoryId: number, logger: Logger): Promise<Category | null> {
const pool = getPool();
try {
const result = await pool.query<Category>(
`SELECT category_id, name, created_at, updated_at
FROM public.categories
WHERE category_id = $1`,
[categoryId],
);
return result.rows[0] || null;
} catch (error) {
handleDbError(error, logger, 'Error fetching category by ID', { categoryId });
throw error;
}
}
/**
* Get category by name (case-insensitive)
*/
static async getCategoryByName(name: string, logger: Logger): Promise<Category | null> {
const pool = getPool();
try {
const result = await pool.query<Category>(
`SELECT category_id, name, created_at, updated_at
FROM public.categories
WHERE LOWER(name) = LOWER($1)`,
[name],
);
return result.rows[0] || null;
} catch (error) {
handleDbError(error, logger, 'Error fetching category by name', { name });
throw error;
}
}
}
```
#### 1.3. Register Category Routes
**File:** `server.ts`
```typescript
// Add to server.ts imports
import categoryRoutes from './src/routes/category.routes';
// Add to route registrations
app.use('/api/categories', categoryRoutes);
```
#### 1.4. Testing Phase 1
**Create:** `src/tests/integration/category.routes.test.ts`
```typescript
// src/tests/integration/category.routes.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import supertest from 'supertest';
import { app } from '../../server';
describe('Category API Routes', () => {
let request: supertest.SuperTest<supertest.Test>;
beforeAll(() => {
request = supertest(app);
});
describe('GET /api/categories', () => {
it('should return list of all categories', async () => {
const response = await request.get('/api/categories');
expect(response.status).toBe(200);
expect(response.body.success).toBe(true);
expect(Array.isArray(response.body.data)).toBe(true);
expect(response.body.data.length).toBeGreaterThan(0);
// Verify category structure
const firstCategory = response.body.data[0];
expect(firstCategory).toHaveProperty('category_id');
expect(firstCategory).toHaveProperty('name');
expect(typeof firstCategory.category_id).toBe('number');
expect(typeof firstCategory.name).toBe('string');
});
it('should return categories in alphabetical order', async () => {
const response = await request.get('/api/categories');
const categories = response.body.data;
for (let i = 1; i < categories.length; i++) {
expect(categories[i].name >= categories[i - 1].name).toBe(true);
}
});
});
describe('GET /api/categories/:id', () => {
it('should return specific category by ID', async () => {
// First get all categories to find a valid ID
const listResponse = await request.get('/api/categories');
const firstCategory = listResponse.body.data[0];
const response = await request.get(`/api/categories/${firstCategory.category_id}`);
expect(response.status).toBe(200);
expect(response.body.success).toBe(true);
expect(response.body.data.category_id).toBe(firstCategory.category_id);
expect(response.body.data.name).toBe(firstCategory.name);
});
it('should return 404 for non-existent category ID', async () => {
const response = await request.get('/api/categories/999999');
expect(response.status).toBe(404);
expect(response.body.success).toBe(false);
});
});
describe('GET /api/categories/lookup', () => {
it('should find category by exact name', async () => {
const response = await request.get('/api/categories/lookup?name=Dairy & Eggs');
expect(response.status).toBe(200);
expect(response.body.success).toBe(true);
expect(response.body.data.name).toBe('Dairy & Eggs');
});
it('should find category by case-insensitive name', async () => {
const response = await request.get('/api/categories/lookup?name=dairy & eggs');
expect(response.status).toBe(200);
expect(response.body.success).toBe(true);
expect(response.body.data.name).toBe('Dairy & Eggs');
});
it('should return 404 for non-existent category name', async () => {
const response = await request.get('/api/categories/lookup?name=NonExistent');
expect(response.status).toBe(404);
expect(response.body.success).toBe(false);
});
it('should return 400 if name parameter is missing', async () => {
const response = await request.get('/api/categories/lookup');
expect(response.status).toBe(400);
expect(response.body.success).toBe(false);
});
});
});
```
**Update E2E test:** `src/tests/e2e/deals-journey.e2e.test.ts`
```typescript
// Add category fetch at start of test
const categoriesResponse = await authedFetch('/categories', {
method: 'GET',
});
const categories = await categoriesResponse.json();
const dairyCategory = categories.data.find((c) => c.name === 'Dairy & Eggs');
const bakeryCategory = categories.data.find((c) => c.name === 'Bakery & Bread');
// Still use string for now (backward compatible)
const watchItem1Response = await authedFetch('/users/watched-items', {
method: 'POST',
token: authToken,
body: JSON.stringify({
itemName: 'E2E Milk 2%',
category: 'Dairy & Eggs', // Still works
}),
});
```
#### 1.5. Documentation
Update `docs/api/README.md` or Swagger docs with new `/api/categories` endpoint.
---
### Phase 2: Support Both Formats (Non-Breaking)
**Duration:** 3-5 days
**Risk:** Medium
**Breaking:** No (backward compatible)
#### 2.1. Update API Validation Schema
**File:** `src/routes/user.routes.ts`
```typescript
// Update schema to accept both formats
const addWatchedItemSchema = z.object({
body: z
.object({
itemName: requiredString("Field 'itemName' is required."),
// Accept either category (string, deprecated) or category_id (number, preferred)
category: z.string().trim().optional(),
category_id: z.number().int().positive().optional(),
})
.refine((data) => data.category || data.category_id, {
message: "Either 'category' or 'category_id' must be provided",
path: ['category'],
}),
});
```
#### 2.2. Update Service Layer
**File:** `src/services/db/personalization.db.ts`
```typescript
/**
* Add a master grocery item to a user's watchlist.
*
* @param userId - UUID of the user
* @param itemName - Name of the grocery item
* @param categoryIdOrName - Either category ID (number) or name (string, deprecated)
* @param logger - Pino logger instance
* @returns A promise that resolves to the MasterGroceryItem
*
* @deprecated Passing category name as string is deprecated. Use category ID instead.
*/
async addWatchedItem(
userId: string,
itemName: string,
categoryIdOrName: number | string,
logger: Logger,
): Promise<MasterGroceryItem> {
try {
return await withTransaction(async (client) => {
let categoryId: number;
// Handle both ID and name (with deprecation warning)
if (typeof categoryIdOrName === 'number') {
categoryId = categoryIdOrName;
} else {
// Legacy string-based lookup (deprecated)
logger.warn(
{ categoryName: categoryIdOrName, userId, itemName },
'DEPRECATED: Using category name instead of ID. Please update to use category_id.'
);
const categoryRes = await client.query<{ category_id: number }>(
'SELECT category_id FROM public.categories WHERE name = $1',
[categoryIdOrName],
);
categoryId = categoryRes.rows[0]?.category_id;
if (!categoryId) {
throw new Error(`Category '${categoryIdOrName}' not found.`);
}
}
// Validate category ID exists
const categoryCheck = await client.query(
'SELECT 1 FROM public.categories WHERE category_id = $1',
[categoryId]
);
if (categoryCheck.rows.length === 0) {
throw new Error(`Category ID ${categoryId} does not exist.`);
}
// Find or create master item
let masterItem: MasterGroceryItem;
const masterItemRes = await client.query<MasterGroceryItem>(
'SELECT * FROM public.master_grocery_items WHERE name = $1',
[itemName],
);
if (masterItemRes.rows.length > 0) {
masterItem = masterItemRes.rows[0];
} else {
const newMasterItemRes = await client.query<MasterGroceryItem>(
'INSERT INTO public.master_grocery_items (name, category_id) VALUES ($1, $2) RETURNING *',
[itemName, categoryId],
);
masterItem = newMasterItemRes.rows[0];
}
// Add to user's watchlist
await client.query(
'INSERT INTO public.user_watched_items (user_id, master_item_id) VALUES ($1, $2) ON CONFLICT (user_id, master_item_id) DO NOTHING',
[userId, masterItem.master_grocery_item_id],
);
return masterItem;
});
} catch (error) {
handleDbError(
error,
logger,
'Transaction error in addWatchedItem',
{ userId, itemName, categoryIdOrName },
);
throw error;
}
}
```
#### 2.3. Update Route Handler
**File:** `src/routes/user.routes.ts` (route handler)
```typescript
router.post(
'/watched-items',
validateRequest(addWatchedItemSchema),
asyncHandler(async (req: RequestWithJwt, res: Response) => {
const { itemName, category, category_id } = req.body;
const userId = req.jwt?.userId;
// Prefer category_id, fallback to category (deprecated)
const categoryIdOrName = category_id ?? category;
if (category && !category_id) {
// Log deprecation warning
req.log.warn(
{ category, userId },
'Client using deprecated "category" field. Should use "category_id".',
);
}
const item = await PersonalizationDbService.addWatchedItem(
userId,
itemName,
categoryIdOrName,
req.log,
);
res.status(201).json({
success: true,
data: item,
});
}),
);
```
#### 2.4. Update Frontend (Gradual)
**File:** `src/services/apiClient.ts`
```typescript
/**
* Add item to watched list
*
* @param itemName - Name of the item
* @param categoryIdOrName - Category ID (preferred) or name (deprecated)
*/
export const addWatchedItem = async (
itemName: string,
categoryIdOrName: number | string,
): Promise<Response> => {
const body: { itemName: string; category_id?: number; category?: string } = {
itemName,
};
if (typeof categoryIdOrName === 'number') {
body.category_id = categoryIdOrName;
} else {
body.category = categoryIdOrName;
}
return authenticatedFetch('/users/watched-items', {
method: 'POST',
body: JSON.stringify(body),
});
};
```
#### 2.5. Update E2E Tests (Support Both)
**File:** `src/tests/e2e/deals-journey.e2e.test.ts`
```typescript
// Fetch categories first
const categoriesResponse = await fetch(`${API_BASE_URL}/categories`);
const categoriesData = await categoriesResponse.json();
const dairyCategory = categoriesData.data.find((c) => c.name === 'Dairy & Eggs');
// Use category_id (new format)
const watchItem1Response = await authedFetch('/users/watched-items', {
method: 'POST',
token: authToken,
body: JSON.stringify({
itemName: 'E2E Milk 2%',
category_id: dairyCategory.category_id, // ✅ Use ID
}),
});
```
#### 2.6. Testing Phase 2
Create integration tests that verify both formats work:
```typescript
describe('POST /api/users/watched-items (dual format)', () => {
it('should accept category_id (new format)', async () => {
const response = await authenticatedRequest.post('/api/users/watched-items').send({
itemName: 'Test Item',
category_id: 3,
});
expect(response.status).toBe(201);
});
it('should accept category name (deprecated format)', async () => {
const response = await authenticatedRequest.post('/api/users/watched-items').send({
itemName: 'Test Item',
category: 'Dairy & Eggs',
});
expect(response.status).toBe(201);
// Should log deprecation warning
});
it('should prefer category_id when both provided', async () => {
const response = await authenticatedRequest.post('/api/users/watched-items').send({
itemName: 'Test Item',
category_id: 3,
category: 'Wrong Name', // Should be ignored
});
expect(response.status).toBe(201);
});
});
```
---
### Phase 3: Remove String Support (Breaking Change)
**Duration:** 2-3 days
**Risk:** High
**Breaking:** Yes (requires version bump)
#### 3.1. Version Bump
Update `package.json`:
```json
{
"version": "1.0.0" // or "0.12.0" depending on semver strategy
}
```
#### 3.2. Remove Deprecated Code
**File:** `src/routes/user.routes.ts`
```typescript
// Remove deprecated category field
const addWatchedItemSchema = z.object({
body: z.object({
itemName: requiredString("Field 'itemName' is required."),
category_id: z.number().int().positive("Field 'category_id' must be a positive integer"),
}),
});
```
**File:** `src/services/db/personalization.db.ts`
```typescript
// Remove string overload
async addWatchedItem(
userId: string,
itemName: string,
categoryId: number, // Only accept number
logger: Logger,
): Promise<MasterGroceryItem> {
try {
return await withTransaction(async (client) => {
// Validate category exists
const categoryCheck = await client.query(
'SELECT 1 FROM public.categories WHERE category_id = $1',
[categoryId]
);
if (categoryCheck.rows.length === 0) {
throw new Error(`Category ID ${categoryId} does not exist.`);
}
// ... rest of implementation (no string lookup)
});
} catch (error) {
handleDbError(error, logger, 'Transaction error in addWatchedItem', {
userId,
itemName,
categoryId,
});
throw error;
}
}
```
#### 3.3. Update All Client Code
**File:** `src/services/apiClient.ts`
```typescript
// Remove string support
export const addWatchedItem = async (
itemName: string,
categoryId: number, // Only number
): Promise<Response> => {
return authenticatedFetch('/users/watched-items', {
method: 'POST',
body: JSON.stringify({
itemName,
category_id: categoryId,
}),
});
};
```
**File:** `src/hooks/mutations/useAddWatchedItemMutation.ts`
```typescript
interface AddWatchedItemParams {
itemName: string;
category_id: number; // Only number
}
```
#### 3.4. Update All Tests
Update all E2E and integration tests to use `category_id`:
```typescript
// All tests must fetch category IDs
const categoriesResponse = await fetch(`${API_BASE_URL}/categories`);
const categories = await categoriesResponse.json();
const watchItem1Response = await authedFetch('/users/watched-items', {
method: 'POST',
token: authToken,
body: JSON.stringify({
itemName: 'E2E Milk 2%',
category_id: categories.data.find((c) => c.name === 'Dairy & Eggs').category_id,
}),
});
```
#### 3.5. Update API Documentation
- Update Swagger/OpenAPI specs
- Update README and API docs
- Add migration guide for API consumers
#### 3.6. Deployment
1. Deploy Phase 3 to test environment
2. Run full E2E test suite
3. Verify all tests pass
4. Deploy to production with major version bump
5. Communicate breaking change to stakeholders
---
## Similar Issues to Address
### 1. UPC/Barcode System
**Current:**
```typescript
// src/types/upc.ts
export interface UpcLookupResult {
category: string | null; // ❌ String
}
```
**Decision Needed:**
- Is this user-provided free-form text? → Keep as string
- Is this from a predefined list? → Change to category_id
**Recommendation:** If categories are predefined, change to `category_id: number | null`
### 2. AI Extraction
**Current:**
```typescript
// src/types/ai.ts
export const FlyerItemSchema = z.object({
category_name: z.string().nullish(),
});
```
**Approach:**
- AI extracts category name (string)
- Service layer maps to category_id using fuzzy matching
- Store category_id in database
**Implementation:**
```typescript
// New: AI Category Mapper Service
export class AiCategoryMapper {
static async mapCategoryName(
categoryName: string | null,
logger: Logger,
): Promise<number | null> {
if (!categoryName) return null;
// Try exact match first
let category = await CategoryDbService.getCategoryByName(categoryName, logger);
// If no match, try fuzzy matching or default to "Other/Miscellaneous"
if (!category) {
logger.warn({ categoryName }, 'AI extracted unknown category, using default');
category = await CategoryDbService.getCategoryByName('Other/Miscellaneous', logger);
}
return category?.category_id || null;
}
}
```
### 3. Flyer Data Transformer
**Current:**
```typescript
// src/services/flyerDataTransformer.ts
category_name: String(item.category_name ?? '').trim() || 'Other/Miscellaneous',
```
**Should map to category_id:**
```typescript
const categoryId = await AiCategoryMapper.mapCategoryName(item.category_name, logger);
```
---
## Testing Strategy
### Unit Tests
- `CategoryDbService` - all CRUD operations
- `AiCategoryMapper` - fuzzy matching logic
- Validation schemas
### Integration Tests
- Category routes (all endpoints)
- Watched items with category_id
- Backward compatibility (Phase 2 only)
### E2E Tests
- Full user journey with category selection
- All 11 E2E tests updated to use category_id
### Manual Testing Checklist
- [ ] Frontend category dropdown populated from API
- [ ] Creating watched item with category_id
- [ ] Viewing watched items shows category name
- [ ] AI extraction maps categories correctly
- [ ] Error handling for invalid category_id
---
## Rollback Plan
### Phase 1 Rollback
**Risk:** Very low
**Steps:**
1. Remove category routes from `server.ts`
2. Delete `src/routes/category.routes.ts`
3. Delete `src/services/db/category.db.ts`
4. Redeploy previous version
### Phase 2 Rollback
**Risk:** Low (backward compatible)
**Steps:**
1. Revert changes to validation schemas
2. Revert service layer to string-only
3. Clients using category_id will break, but old clients still work
### Phase 3 Rollback
**Risk:** High (breaking change)
**Steps:**
1. Revert to Phase 2 code (dual support)
2. Redeploy with version downgrade
3. Communicate rollback to API consumers
---
## Communication Plan
### Before Phase 1
- Announce new `/api/categories` endpoint
- Encourage clients to start fetching categories
### Before Phase 2
- Announce deprecation of `category` field
- Recommend migration to `category_id`
- Provide timeline for Phase 3
### Before Phase 3
- Final warning: 2 weeks before deployment
- Publish migration guide
- Offer support for client updates
### After Phase 3
- Announce breaking change deployed
- Monitor error rates and Bugsink for issues
- Provide rollback plan if needed
---
## Success Criteria
- [ ] All E2E tests pass with category_id
- [ ] No performance regression
- [ ] API response times improved (no string lookups)
- [ ] Zero Bugsink errors related to category issues
- [ ] Frontend displays categories correctly
- [ ] All documentation updated
---
## Timeline
| Phase | Duration | Start | End |
| --------------------------- | -------------- | ------ | ------ |
| Phase 1: Discovery Endpoint | 2-3 days | Week 1 | Week 1 |
| Phase 1: Testing | 1 day | Week 1 | Week 1 |
| Phase 2: Dual Support | 3-5 days | Week 2 | Week 2 |
| Phase 2: Testing | 2 days | Week 2 | Week 2 |
| Phase 3: Remove Deprecated | 2-3 days | Week 3 | Week 3 |
| Phase 3: Testing & Deploy | 2 days | Week 3 | Week 3 |
| **Total** | **12-16 days** | | |
---
## Related Documents
- [ADR-023: Database Normalization and Referential Integrity](adr/0023-database-normalization-and-referential-integrity.md)
- [TESTING.md](TESTING.md)
- API Documentation (Swagger)
---
## Notes
- Consider adding category icons/colors in future
- May want to allow user-defined categories later
- Internationalization of category names needed eventually
- Category ordering/sorting preferences per user