# Research: Migration Plan for Category ID References **Date:** 2026-01-19 **Status:** Planning **Related:** [ADR-023: Database Normalization and Referential Integrity](adr/0023-database-normalization-and-referential-integrity.md) ## Executive Summary The API currently accepts category names (strings) instead of category IDs (numbers) when creating watched items. This violates database normalization principles (3NF) and creates a brittle API. This document outlines a comprehensive, phased migration plan to fix this issue across the entire codebase. **Estimated Effort:** 2-3 weeks (including testing and deployment) **Breaking Changes:** Yes (API contract changes) **Recommended Version Bump:** Major (0.11.14 → 1.0.0 or 0.12.0) --- ## Current State Analysis ### Files Using String-Based Category References | File | Line | Current Type | Usage | | -------------------------------------------------- | ------------ | ------------------------------- | ------------------------ | | `src/routes/user.routes.ts` | 76 | `category: string` | API validation schema | | `src/services/db/personalization.db.ts` | 175 | `categoryName: string` | Service method parameter | | `src/services/apiClient.ts` | 436 | `category: string` | API client function | | `src/hooks/mutations/useAddWatchedItemMutation.ts` | 9 | `category?: string` | Frontend mutation hook | | `src/types/upc.ts` | 85, 107, 201 | `category: string \| null` | UPC types | | `src/types/ai.ts` | 21 | `category_name: z.string()` | AI extraction schema | | `src/services/flyerDataTransformer.ts` | 40 | `category_name: string` | Flyer processing | | `src/services/db/expiry.db.ts` | 51 | `category_name: string \| null` | Expiry tracking | | `src/services/db/upc.db.ts` | 36, 347 | `category: string \| null` | UPC database | ### Database Schema (Current - Correct) ```sql -- Categories table (normalized) CREATE TABLE categories ( category_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, name TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ DEFAULT now() NOT NULL, updated_at TIMESTAMPTZ DEFAULT now() NOT NULL ); -- Master items reference category by ID (correct) CREATE TABLE master_grocery_items ( master_grocery_item_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, name TEXT NOT NULL, category_id BIGINT REFERENCES categories(category_id), -- ✅ Proper FK -- ... ); ``` ### Current API Behavior ```typescript // API Request (current - problematic) POST /api/users/watched-items { "itemName": "Milk", "category": "Dairy & Eggs" // ❌ String } // Service performs string lookup const categoryRes = await client.query( 'SELECT category_id FROM categories WHERE name = $1', [categoryName] // ❌ String matching on every request ); // Then uses ID internally INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2) ``` --- ## Migration Plan ### Phase 1: Add Category Discovery (Non-Breaking) **Duration:** 2-3 days **Risk:** Low **Breaking:** No #### 1.1. Create Categories API Route **File:** `src/routes/category.routes.ts` (new file) ```typescript // src/routes/category.routes.ts import { Router } from 'express'; import { CategoryDbService } from '../services/db/category.db'; import { asyncHandler } from '../middleware/asyncHandler'; import { getLogger } from '../services/logger.server'; const router = Router(); const logger = getLogger(); /** * @swagger * /api/categories: * get: * summary: List all available grocery categories * tags: [Categories] * responses: * 200: * description: List of categories * content: * application/json: * schema: * type: object * properties: * success: * type: boolean * data: * type: array * items: * type: object * properties: * category_id: * type: integer * name: * type: string */ router.get( '/', asyncHandler(async (req, res) => { const categories = await CategoryDbService.getAllCategories(logger); res.json({ success: true, data: categories, }); }), ); /** * @swagger * /api/categories/{id}: * get: * summary: Get a specific category by ID * tags: [Categories] * parameters: * - in: path * name: id * required: true * schema: * type: integer * responses: * 200: * description: Category details * 404: * description: Category not found */ router.get( '/:id', asyncHandler(async (req, res) => { const categoryId = parseInt(req.params.id); const category = await CategoryDbService.getCategoryById(categoryId, logger); if (!category) { return res.status(404).json({ success: false, error: 'Category not found', }); } res.json({ success: true, data: category, }); }), ); /** * @swagger * /api/categories/lookup: * get: * summary: Lookup category by name (for migration support) * tags: [Categories] * parameters: * - in: query * name: name * required: true * schema: * type: string * responses: * 200: * description: Category found * 404: * description: Category not found */ router.get( '/lookup', asyncHandler(async (req, res) => { const name = req.query.name as string; if (!name) { return res.status(400).json({ success: false, error: 'Query parameter "name" is required', }); } const category = await CategoryDbService.getCategoryByName(name, logger); if (!category) { return res.status(404).json({ success: false, error: `Category '${name}' not found`, }); } res.json({ success: true, data: category, }); }), ); export default router; ``` #### 1.2. Create Category Database Service **File:** `src/services/db/category.db.ts` (new file) ```typescript // src/services/db/category.db.ts import { Logger } from 'pino'; import { getPool } from './connection.db'; import { handleDbError } from '../errorHandler'; export interface Category { category_id: number; name: string; created_at: Date; updated_at: Date; } export class CategoryDbService { /** * Get all categories ordered by name */ static async getAllCategories(logger: Logger): Promise { const pool = getPool(); try { const result = await pool.query( `SELECT category_id, name, created_at, updated_at FROM public.categories ORDER BY name ASC`, ); return result.rows; } catch (error) { handleDbError(error, logger, 'Error fetching all categories'); throw error; } } /** * Get category by ID */ static async getCategoryById(categoryId: number, logger: Logger): Promise { const pool = getPool(); try { const result = await pool.query( `SELECT category_id, name, created_at, updated_at FROM public.categories WHERE category_id = $1`, [categoryId], ); return result.rows[0] || null; } catch (error) { handleDbError(error, logger, 'Error fetching category by ID', { categoryId }); throw error; } } /** * Get category by name (case-insensitive) */ static async getCategoryByName(name: string, logger: Logger): Promise { const pool = getPool(); try { const result = await pool.query( `SELECT category_id, name, created_at, updated_at FROM public.categories WHERE LOWER(name) = LOWER($1)`, [name], ); return result.rows[0] || null; } catch (error) { handleDbError(error, logger, 'Error fetching category by name', { name }); throw error; } } } ``` #### 1.3. Register Category Routes **File:** `server.ts` ```typescript // Add to server.ts imports import categoryRoutes from './src/routes/category.routes'; // Add to route registrations app.use('/api/categories', categoryRoutes); ``` #### 1.4. Testing Phase 1 **Create:** `src/tests/integration/category.routes.test.ts` ```typescript // src/tests/integration/category.routes.test.ts import { describe, it, expect, beforeAll } from 'vitest'; import supertest from 'supertest'; import { app } from '../../server'; describe('Category API Routes', () => { let request: supertest.SuperTest; beforeAll(() => { request = supertest(app); }); describe('GET /api/categories', () => { it('should return list of all categories', async () => { const response = await request.get('/api/categories'); expect(response.status).toBe(200); expect(response.body.success).toBe(true); expect(Array.isArray(response.body.data)).toBe(true); expect(response.body.data.length).toBeGreaterThan(0); // Verify category structure const firstCategory = response.body.data[0]; expect(firstCategory).toHaveProperty('category_id'); expect(firstCategory).toHaveProperty('name'); expect(typeof firstCategory.category_id).toBe('number'); expect(typeof firstCategory.name).toBe('string'); }); it('should return categories in alphabetical order', async () => { const response = await request.get('/api/categories'); const categories = response.body.data; for (let i = 1; i < categories.length; i++) { expect(categories[i].name >= categories[i - 1].name).toBe(true); } }); }); describe('GET /api/categories/:id', () => { it('should return specific category by ID', async () => { // First get all categories to find a valid ID const listResponse = await request.get('/api/categories'); const firstCategory = listResponse.body.data[0]; const response = await request.get(`/api/categories/${firstCategory.category_id}`); expect(response.status).toBe(200); expect(response.body.success).toBe(true); expect(response.body.data.category_id).toBe(firstCategory.category_id); expect(response.body.data.name).toBe(firstCategory.name); }); it('should return 404 for non-existent category ID', async () => { const response = await request.get('/api/categories/999999'); expect(response.status).toBe(404); expect(response.body.success).toBe(false); }); }); describe('GET /api/categories/lookup', () => { it('should find category by exact name', async () => { const response = await request.get('/api/categories/lookup?name=Dairy & Eggs'); expect(response.status).toBe(200); expect(response.body.success).toBe(true); expect(response.body.data.name).toBe('Dairy & Eggs'); }); it('should find category by case-insensitive name', async () => { const response = await request.get('/api/categories/lookup?name=dairy & eggs'); expect(response.status).toBe(200); expect(response.body.success).toBe(true); expect(response.body.data.name).toBe('Dairy & Eggs'); }); it('should return 404 for non-existent category name', async () => { const response = await request.get('/api/categories/lookup?name=NonExistent'); expect(response.status).toBe(404); expect(response.body.success).toBe(false); }); it('should return 400 if name parameter is missing', async () => { const response = await request.get('/api/categories/lookup'); expect(response.status).toBe(400); expect(response.body.success).toBe(false); }); }); }); ``` **Update E2E test:** `src/tests/e2e/deals-journey.e2e.test.ts` ```typescript // Add category fetch at start of test const categoriesResponse = await authedFetch('/categories', { method: 'GET', }); const categories = await categoriesResponse.json(); const dairyCategory = categories.data.find((c) => c.name === 'Dairy & Eggs'); const bakeryCategory = categories.data.find((c) => c.name === 'Bakery & Bread'); // Still use string for now (backward compatible) const watchItem1Response = await authedFetch('/users/watched-items', { method: 'POST', token: authToken, body: JSON.stringify({ itemName: 'E2E Milk 2%', category: 'Dairy & Eggs', // Still works }), }); ``` #### 1.5. Documentation Update `docs/api/README.md` or Swagger docs with new `/api/categories` endpoint. --- ### Phase 2: Support Both Formats (Non-Breaking) **Duration:** 3-5 days **Risk:** Medium **Breaking:** No (backward compatible) #### 2.1. Update API Validation Schema **File:** `src/routes/user.routes.ts` ```typescript // Update schema to accept both formats const addWatchedItemSchema = z.object({ body: z .object({ itemName: requiredString("Field 'itemName' is required."), // Accept either category (string, deprecated) or category_id (number, preferred) category: z.string().trim().optional(), category_id: z.number().int().positive().optional(), }) .refine((data) => data.category || data.category_id, { message: "Either 'category' or 'category_id' must be provided", path: ['category'], }), }); ``` #### 2.2. Update Service Layer **File:** `src/services/db/personalization.db.ts` ```typescript /** * Add a master grocery item to a user's watchlist. * * @param userId - UUID of the user * @param itemName - Name of the grocery item * @param categoryIdOrName - Either category ID (number) or name (string, deprecated) * @param logger - Pino logger instance * @returns A promise that resolves to the MasterGroceryItem * * @deprecated Passing category name as string is deprecated. Use category ID instead. */ async addWatchedItem( userId: string, itemName: string, categoryIdOrName: number | string, logger: Logger, ): Promise { try { return await withTransaction(async (client) => { let categoryId: number; // Handle both ID and name (with deprecation warning) if (typeof categoryIdOrName === 'number') { categoryId = categoryIdOrName; } else { // Legacy string-based lookup (deprecated) logger.warn( { categoryName: categoryIdOrName, userId, itemName }, 'DEPRECATED: Using category name instead of ID. Please update to use category_id.' ); const categoryRes = await client.query<{ category_id: number }>( 'SELECT category_id FROM public.categories WHERE name = $1', [categoryIdOrName], ); categoryId = categoryRes.rows[0]?.category_id; if (!categoryId) { throw new Error(`Category '${categoryIdOrName}' not found.`); } } // Validate category ID exists const categoryCheck = await client.query( 'SELECT 1 FROM public.categories WHERE category_id = $1', [categoryId] ); if (categoryCheck.rows.length === 0) { throw new Error(`Category ID ${categoryId} does not exist.`); } // Find or create master item let masterItem: MasterGroceryItem; const masterItemRes = await client.query( 'SELECT * FROM public.master_grocery_items WHERE name = $1', [itemName], ); if (masterItemRes.rows.length > 0) { masterItem = masterItemRes.rows[0]; } else { const newMasterItemRes = await client.query( 'INSERT INTO public.master_grocery_items (name, category_id) VALUES ($1, $2) RETURNING *', [itemName, categoryId], ); masterItem = newMasterItemRes.rows[0]; } // Add to user's watchlist await client.query( 'INSERT INTO public.user_watched_items (user_id, master_item_id) VALUES ($1, $2) ON CONFLICT (user_id, master_item_id) DO NOTHING', [userId, masterItem.master_grocery_item_id], ); return masterItem; }); } catch (error) { handleDbError( error, logger, 'Transaction error in addWatchedItem', { userId, itemName, categoryIdOrName }, ); throw error; } } ``` #### 2.3. Update Route Handler **File:** `src/routes/user.routes.ts` (route handler) ```typescript router.post( '/watched-items', validateRequest(addWatchedItemSchema), asyncHandler(async (req: RequestWithJwt, res: Response) => { const { itemName, category, category_id } = req.body; const userId = req.jwt?.userId; // Prefer category_id, fallback to category (deprecated) const categoryIdOrName = category_id ?? category; if (category && !category_id) { // Log deprecation warning req.log.warn( { category, userId }, 'Client using deprecated "category" field. Should use "category_id".', ); } const item = await PersonalizationDbService.addWatchedItem( userId, itemName, categoryIdOrName, req.log, ); res.status(201).json({ success: true, data: item, }); }), ); ``` #### 2.4. Update Frontend (Gradual) **File:** `src/services/apiClient.ts` ```typescript /** * Add item to watched list * * @param itemName - Name of the item * @param categoryIdOrName - Category ID (preferred) or name (deprecated) */ export const addWatchedItem = async ( itemName: string, categoryIdOrName: number | string, ): Promise => { const body: { itemName: string; category_id?: number; category?: string } = { itemName, }; if (typeof categoryIdOrName === 'number') { body.category_id = categoryIdOrName; } else { body.category = categoryIdOrName; } return authenticatedFetch('/users/watched-items', { method: 'POST', body: JSON.stringify(body), }); }; ``` #### 2.5. Update E2E Tests (Support Both) **File:** `src/tests/e2e/deals-journey.e2e.test.ts` ```typescript // Fetch categories first const categoriesResponse = await fetch(`${API_BASE_URL}/categories`); const categoriesData = await categoriesResponse.json(); const dairyCategory = categoriesData.data.find((c) => c.name === 'Dairy & Eggs'); // Use category_id (new format) const watchItem1Response = await authedFetch('/users/watched-items', { method: 'POST', token: authToken, body: JSON.stringify({ itemName: 'E2E Milk 2%', category_id: dairyCategory.category_id, // ✅ Use ID }), }); ``` #### 2.6. Testing Phase 2 Create integration tests that verify both formats work: ```typescript describe('POST /api/users/watched-items (dual format)', () => { it('should accept category_id (new format)', async () => { const response = await authenticatedRequest.post('/api/users/watched-items').send({ itemName: 'Test Item', category_id: 3, }); expect(response.status).toBe(201); }); it('should accept category name (deprecated format)', async () => { const response = await authenticatedRequest.post('/api/users/watched-items').send({ itemName: 'Test Item', category: 'Dairy & Eggs', }); expect(response.status).toBe(201); // Should log deprecation warning }); it('should prefer category_id when both provided', async () => { const response = await authenticatedRequest.post('/api/users/watched-items').send({ itemName: 'Test Item', category_id: 3, category: 'Wrong Name', // Should be ignored }); expect(response.status).toBe(201); }); }); ``` --- ### Phase 3: Remove String Support (Breaking Change) **Duration:** 2-3 days **Risk:** High **Breaking:** Yes (requires version bump) #### 3.1. Version Bump Update `package.json`: ```json { "version": "1.0.0" // or "0.12.0" depending on semver strategy } ``` #### 3.2. Remove Deprecated Code **File:** `src/routes/user.routes.ts` ```typescript // Remove deprecated category field const addWatchedItemSchema = z.object({ body: z.object({ itemName: requiredString("Field 'itemName' is required."), category_id: z.number().int().positive("Field 'category_id' must be a positive integer"), }), }); ``` **File:** `src/services/db/personalization.db.ts` ```typescript // Remove string overload async addWatchedItem( userId: string, itemName: string, categoryId: number, // Only accept number logger: Logger, ): Promise { try { return await withTransaction(async (client) => { // Validate category exists const categoryCheck = await client.query( 'SELECT 1 FROM public.categories WHERE category_id = $1', [categoryId] ); if (categoryCheck.rows.length === 0) { throw new Error(`Category ID ${categoryId} does not exist.`); } // ... rest of implementation (no string lookup) }); } catch (error) { handleDbError(error, logger, 'Transaction error in addWatchedItem', { userId, itemName, categoryId, }); throw error; } } ``` #### 3.3. Update All Client Code **File:** `src/services/apiClient.ts` ```typescript // Remove string support export const addWatchedItem = async ( itemName: string, categoryId: number, // Only number ): Promise => { return authenticatedFetch('/users/watched-items', { method: 'POST', body: JSON.stringify({ itemName, category_id: categoryId, }), }); }; ``` **File:** `src/hooks/mutations/useAddWatchedItemMutation.ts` ```typescript interface AddWatchedItemParams { itemName: string; category_id: number; // Only number } ``` #### 3.4. Update All Tests Update all E2E and integration tests to use `category_id`: ```typescript // All tests must fetch category IDs const categoriesResponse = await fetch(`${API_BASE_URL}/categories`); const categories = await categoriesResponse.json(); const watchItem1Response = await authedFetch('/users/watched-items', { method: 'POST', token: authToken, body: JSON.stringify({ itemName: 'E2E Milk 2%', category_id: categories.data.find((c) => c.name === 'Dairy & Eggs').category_id, }), }); ``` #### 3.5. Update API Documentation - Update Swagger/OpenAPI specs - Update README and API docs - Add migration guide for API consumers #### 3.6. Deployment 1. Deploy Phase 3 to test environment 2. Run full E2E test suite 3. Verify all tests pass 4. Deploy to production with major version bump 5. Communicate breaking change to stakeholders --- ## Similar Issues to Address ### 1. UPC/Barcode System **Current:** ```typescript // src/types/upc.ts export interface UpcLookupResult { category: string | null; // ❌ String } ``` **Decision Needed:** - Is this user-provided free-form text? → Keep as string - Is this from a predefined list? → Change to category_id **Recommendation:** If categories are predefined, change to `category_id: number | null` ### 2. AI Extraction **Current:** ```typescript // src/types/ai.ts export const FlyerItemSchema = z.object({ category_name: z.string().nullish(), }); ``` **Approach:** - AI extracts category name (string) - Service layer maps to category_id using fuzzy matching - Store category_id in database **Implementation:** ```typescript // New: AI Category Mapper Service export class AiCategoryMapper { static async mapCategoryName( categoryName: string | null, logger: Logger, ): Promise { if (!categoryName) return null; // Try exact match first let category = await CategoryDbService.getCategoryByName(categoryName, logger); // If no match, try fuzzy matching or default to "Other/Miscellaneous" if (!category) { logger.warn({ categoryName }, 'AI extracted unknown category, using default'); category = await CategoryDbService.getCategoryByName('Other/Miscellaneous', logger); } return category?.category_id || null; } } ``` ### 3. Flyer Data Transformer **Current:** ```typescript // src/services/flyerDataTransformer.ts category_name: String(item.category_name ?? '').trim() || 'Other/Miscellaneous', ``` **Should map to category_id:** ```typescript const categoryId = await AiCategoryMapper.mapCategoryName(item.category_name, logger); ``` --- ## Testing Strategy ### Unit Tests - `CategoryDbService` - all CRUD operations - `AiCategoryMapper` - fuzzy matching logic - Validation schemas ### Integration Tests - Category routes (all endpoints) - Watched items with category_id - Backward compatibility (Phase 2 only) ### E2E Tests - Full user journey with category selection - All 11 E2E tests updated to use category_id ### Manual Testing Checklist - [ ] Frontend category dropdown populated from API - [ ] Creating watched item with category_id - [ ] Viewing watched items shows category name - [ ] AI extraction maps categories correctly - [ ] Error handling for invalid category_id --- ## Rollback Plan ### Phase 1 Rollback **Risk:** Very low **Steps:** 1. Remove category routes from `server.ts` 2. Delete `src/routes/category.routes.ts` 3. Delete `src/services/db/category.db.ts` 4. Redeploy previous version ### Phase 2 Rollback **Risk:** Low (backward compatible) **Steps:** 1. Revert changes to validation schemas 2. Revert service layer to string-only 3. Clients using category_id will break, but old clients still work ### Phase 3 Rollback **Risk:** High (breaking change) **Steps:** 1. Revert to Phase 2 code (dual support) 2. Redeploy with version downgrade 3. Communicate rollback to API consumers --- ## Communication Plan ### Before Phase 1 - Announce new `/api/categories` endpoint - Encourage clients to start fetching categories ### Before Phase 2 - Announce deprecation of `category` field - Recommend migration to `category_id` - Provide timeline for Phase 3 ### Before Phase 3 - Final warning: 2 weeks before deployment - Publish migration guide - Offer support for client updates ### After Phase 3 - Announce breaking change deployed - Monitor error rates and Bugsink for issues - Provide rollback plan if needed --- ## Success Criteria - [ ] All E2E tests pass with category_id - [ ] No performance regression - [ ] API response times improved (no string lookups) - [ ] Zero Bugsink errors related to category issues - [ ] Frontend displays categories correctly - [ ] All documentation updated --- ## Timeline | Phase | Duration | Start | End | | --------------------------- | -------------- | ------ | ------ | | Phase 1: Discovery Endpoint | 2-3 days | Week 1 | Week 1 | | Phase 1: Testing | 1 day | Week 1 | Week 1 | | Phase 2: Dual Support | 3-5 days | Week 2 | Week 2 | | Phase 2: Testing | 2 days | Week 2 | Week 2 | | Phase 3: Remove Deprecated | 2-3 days | Week 3 | Week 3 | | Phase 3: Testing & Deploy | 2 days | Week 3 | Week 3 | | **Total** | **12-16 days** | | | --- ## Related Documents - [ADR-023: Database Normalization and Referential Integrity](adr/0023-database-normalization-and-referential-integrity.md) - [TESTING.md](TESTING.md) - API Documentation (Swagger) --- ## Notes - Consider adding category icons/colors in future - May want to allow user-defined categories later - Internationalization of category names needed eventually - Category ordering/sorting preferences per user