Files
flyer-crawler.projectium.com/docs/research-category-id-migration.md
Torben Sorensen 4022768c03
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 18m39s
set up local e2e tests, and some e2e test fixes + docs on more db fixin - ugh
2026-01-19 13:45:21 -08:00

27 KiB

Research: Migration Plan for Category ID References

Date: 2026-01-19 Status: Planning Related: ADR-023: Database Normalization and Referential Integrity

Executive Summary

The API currently accepts category names (strings) instead of category IDs (numbers) when creating watched items. This violates database normalization principles (3NF) and creates a brittle API. This document outlines a comprehensive, phased migration plan to fix this issue across the entire codebase.

Estimated Effort: 2-3 weeks (including testing and deployment) Breaking Changes: Yes (API contract changes) Recommended Version Bump: Major (0.11.14 → 1.0.0 or 0.12.0)


Current State Analysis

Files Using String-Based Category References

File Line Current Type Usage
src/routes/user.routes.ts 76 category: string API validation schema
src/services/db/personalization.db.ts 175 categoryName: string Service method parameter
src/services/apiClient.ts 436 category: string API client function
src/hooks/mutations/useAddWatchedItemMutation.ts 9 category?: string Frontend mutation hook
src/types/upc.ts 85, 107, 201 category: string | null UPC types
src/types/ai.ts 21 category_name: z.string() AI extraction schema
src/services/flyerDataTransformer.ts 40 category_name: string Flyer processing
src/services/db/expiry.db.ts 51 category_name: string | null Expiry tracking
src/services/db/upc.db.ts 36, 347 category: string | null UPC database

Database Schema (Current - Correct)

-- Categories table (normalized)
CREATE TABLE categories (
  category_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name TEXT NOT NULL UNIQUE,
  created_at TIMESTAMPTZ DEFAULT now() NOT NULL,
  updated_at TIMESTAMPTZ DEFAULT now() NOT NULL
);

-- Master items reference category by ID (correct)
CREATE TABLE master_grocery_items (
  master_grocery_item_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
  name TEXT NOT NULL,
  category_id BIGINT REFERENCES categories(category_id),  -- ✅ Proper FK
  -- ...
);

Current API Behavior

// API Request (current - problematic)
POST /api/users/watched-items
{
  "itemName": "Milk",
  "category": "Dairy & Eggs"  // ❌ String
}

// Service performs string lookup
const categoryRes = await client.query(
  'SELECT category_id FROM categories WHERE name = $1',
  [categoryName]  // ❌ String matching on every request
);

// Then uses ID internally
INSERT INTO master_grocery_items (name, category_id) VALUES ($1, $2)

Migration Plan

Phase 1: Add Category Discovery (Non-Breaking)

Duration: 2-3 days Risk: Low Breaking: No

1.1. Create Categories API Route

File: src/routes/category.routes.ts (new file)

// src/routes/category.routes.ts
import { Router } from 'express';
import { CategoryDbService } from '../services/db/category.db';
import { asyncHandler } from '../middleware/asyncHandler';
import { getLogger } from '../services/logger.server';

const router = Router();
const logger = getLogger();

/**
 * @swagger
 * /api/categories:
 *   get:
 *     summary: List all available grocery categories
 *     tags: [Categories]
 *     responses:
 *       200:
 *         description: List of categories
 *         content:
 *           application/json:
 *             schema:
 *               type: object
 *               properties:
 *                 success:
 *                   type: boolean
 *                 data:
 *                   type: array
 *                   items:
 *                     type: object
 *                     properties:
 *                       category_id:
 *                         type: integer
 *                       name:
 *                         type: string
 */
router.get(
  '/',
  asyncHandler(async (req, res) => {
    const categories = await CategoryDbService.getAllCategories(logger);

    res.json({
      success: true,
      data: categories,
    });
  }),
);

/**
 * @swagger
 * /api/categories/{id}:
 *   get:
 *     summary: Get a specific category by ID
 *     tags: [Categories]
 *     parameters:
 *       - in: path
 *         name: id
 *         required: true
 *         schema:
 *           type: integer
 *     responses:
 *       200:
 *         description: Category details
 *       404:
 *         description: Category not found
 */
router.get(
  '/:id',
  asyncHandler(async (req, res) => {
    const categoryId = parseInt(req.params.id);
    const category = await CategoryDbService.getCategoryById(categoryId, logger);

    if (!category) {
      return res.status(404).json({
        success: false,
        error: 'Category not found',
      });
    }

    res.json({
      success: true,
      data: category,
    });
  }),
);

/**
 * @swagger
 * /api/categories/lookup:
 *   get:
 *     summary: Lookup category by name (for migration support)
 *     tags: [Categories]
 *     parameters:
 *       - in: query
 *         name: name
 *         required: true
 *         schema:
 *           type: string
 *     responses:
 *       200:
 *         description: Category found
 *       404:
 *         description: Category not found
 */
router.get(
  '/lookup',
  asyncHandler(async (req, res) => {
    const name = req.query.name as string;

    if (!name) {
      return res.status(400).json({
        success: false,
        error: 'Query parameter "name" is required',
      });
    }

    const category = await CategoryDbService.getCategoryByName(name, logger);

    if (!category) {
      return res.status(404).json({
        success: false,
        error: `Category '${name}' not found`,
      });
    }

    res.json({
      success: true,
      data: category,
    });
  }),
);

export default router;

1.2. Create Category Database Service

File: src/services/db/category.db.ts (new file)

// src/services/db/category.db.ts
import { Logger } from 'pino';
import { getPool } from './connection.db';
import { handleDbError } from '../errorHandler';

export interface Category {
  category_id: number;
  name: string;
  created_at: Date;
  updated_at: Date;
}

export class CategoryDbService {
  /**
   * Get all categories ordered by name
   */
  static async getAllCategories(logger: Logger): Promise<Category[]> {
    const pool = getPool();

    try {
      const result = await pool.query<Category>(
        `SELECT category_id, name, created_at, updated_at
         FROM public.categories
         ORDER BY name ASC`,
      );

      return result.rows;
    } catch (error) {
      handleDbError(error, logger, 'Error fetching all categories');
      throw error;
    }
  }

  /**
   * Get category by ID
   */
  static async getCategoryById(categoryId: number, logger: Logger): Promise<Category | null> {
    const pool = getPool();

    try {
      const result = await pool.query<Category>(
        `SELECT category_id, name, created_at, updated_at
         FROM public.categories
         WHERE category_id = $1`,
        [categoryId],
      );

      return result.rows[0] || null;
    } catch (error) {
      handleDbError(error, logger, 'Error fetching category by ID', { categoryId });
      throw error;
    }
  }

  /**
   * Get category by name (case-insensitive)
   */
  static async getCategoryByName(name: string, logger: Logger): Promise<Category | null> {
    const pool = getPool();

    try {
      const result = await pool.query<Category>(
        `SELECT category_id, name, created_at, updated_at
         FROM public.categories
         WHERE LOWER(name) = LOWER($1)`,
        [name],
      );

      return result.rows[0] || null;
    } catch (error) {
      handleDbError(error, logger, 'Error fetching category by name', { name });
      throw error;
    }
  }
}

1.3. Register Category Routes

File: server.ts

// Add to server.ts imports
import categoryRoutes from './src/routes/category.routes';

// Add to route registrations
app.use('/api/categories', categoryRoutes);

1.4. Testing Phase 1

Create: src/tests/integration/category.routes.test.ts

// src/tests/integration/category.routes.test.ts
import { describe, it, expect, beforeAll } from 'vitest';
import supertest from 'supertest';
import { app } from '../../server';

describe('Category API Routes', () => {
  let request: supertest.SuperTest<supertest.Test>;

  beforeAll(() => {
    request = supertest(app);
  });

  describe('GET /api/categories', () => {
    it('should return list of all categories', async () => {
      const response = await request.get('/api/categories');

      expect(response.status).toBe(200);
      expect(response.body.success).toBe(true);
      expect(Array.isArray(response.body.data)).toBe(true);
      expect(response.body.data.length).toBeGreaterThan(0);

      // Verify category structure
      const firstCategory = response.body.data[0];
      expect(firstCategory).toHaveProperty('category_id');
      expect(firstCategory).toHaveProperty('name');
      expect(typeof firstCategory.category_id).toBe('number');
      expect(typeof firstCategory.name).toBe('string');
    });

    it('should return categories in alphabetical order', async () => {
      const response = await request.get('/api/categories');
      const categories = response.body.data;

      for (let i = 1; i < categories.length; i++) {
        expect(categories[i].name >= categories[i - 1].name).toBe(true);
      }
    });
  });

  describe('GET /api/categories/:id', () => {
    it('should return specific category by ID', async () => {
      // First get all categories to find a valid ID
      const listResponse = await request.get('/api/categories');
      const firstCategory = listResponse.body.data[0];

      const response = await request.get(`/api/categories/${firstCategory.category_id}`);

      expect(response.status).toBe(200);
      expect(response.body.success).toBe(true);
      expect(response.body.data.category_id).toBe(firstCategory.category_id);
      expect(response.body.data.name).toBe(firstCategory.name);
    });

    it('should return 404 for non-existent category ID', async () => {
      const response = await request.get('/api/categories/999999');

      expect(response.status).toBe(404);
      expect(response.body.success).toBe(false);
    });
  });

  describe('GET /api/categories/lookup', () => {
    it('should find category by exact name', async () => {
      const response = await request.get('/api/categories/lookup?name=Dairy & Eggs');

      expect(response.status).toBe(200);
      expect(response.body.success).toBe(true);
      expect(response.body.data.name).toBe('Dairy & Eggs');
    });

    it('should find category by case-insensitive name', async () => {
      const response = await request.get('/api/categories/lookup?name=dairy & eggs');

      expect(response.status).toBe(200);
      expect(response.body.success).toBe(true);
      expect(response.body.data.name).toBe('Dairy & Eggs');
    });

    it('should return 404 for non-existent category name', async () => {
      const response = await request.get('/api/categories/lookup?name=NonExistent');

      expect(response.status).toBe(404);
      expect(response.body.success).toBe(false);
    });

    it('should return 400 if name parameter is missing', async () => {
      const response = await request.get('/api/categories/lookup');

      expect(response.status).toBe(400);
      expect(response.body.success).toBe(false);
    });
  });
});

Update E2E test: src/tests/e2e/deals-journey.e2e.test.ts

// Add category fetch at start of test
const categoriesResponse = await authedFetch('/categories', {
  method: 'GET',
});
const categories = await categoriesResponse.json();
const dairyCategory = categories.data.find((c) => c.name === 'Dairy & Eggs');
const bakeryCategory = categories.data.find((c) => c.name === 'Bakery & Bread');

// Still use string for now (backward compatible)
const watchItem1Response = await authedFetch('/users/watched-items', {
  method: 'POST',
  token: authToken,
  body: JSON.stringify({
    itemName: 'E2E Milk 2%',
    category: 'Dairy & Eggs', // Still works
  }),
});

1.5. Documentation

Update docs/api/README.md or Swagger docs with new /api/categories endpoint.


Phase 2: Support Both Formats (Non-Breaking)

Duration: 3-5 days Risk: Medium Breaking: No (backward compatible)

2.1. Update API Validation Schema

File: src/routes/user.routes.ts

// Update schema to accept both formats
const addWatchedItemSchema = z.object({
  body: z
    .object({
      itemName: requiredString("Field 'itemName' is required."),
      // Accept either category (string, deprecated) or category_id (number, preferred)
      category: z.string().trim().optional(),
      category_id: z.number().int().positive().optional(),
    })
    .refine((data) => data.category || data.category_id, {
      message: "Either 'category' or 'category_id' must be provided",
      path: ['category'],
    }),
});

2.2. Update Service Layer

File: src/services/db/personalization.db.ts

/**
 * Add a master grocery item to a user's watchlist.
 *
 * @param userId - UUID of the user
 * @param itemName - Name of the grocery item
 * @param categoryIdOrName - Either category ID (number) or name (string, deprecated)
 * @param logger - Pino logger instance
 * @returns A promise that resolves to the MasterGroceryItem
 *
 * @deprecated Passing category name as string is deprecated. Use category ID instead.
 */
async addWatchedItem(
  userId: string,
  itemName: string,
  categoryIdOrName: number | string,
  logger: Logger,
): Promise<MasterGroceryItem> {
  try {
    return await withTransaction(async (client) => {
      let categoryId: number;

      // Handle both ID and name (with deprecation warning)
      if (typeof categoryIdOrName === 'number') {
        categoryId = categoryIdOrName;
      } else {
        // Legacy string-based lookup (deprecated)
        logger.warn(
          { categoryName: categoryIdOrName, userId, itemName },
          'DEPRECATED: Using category name instead of ID. Please update to use category_id.'
        );

        const categoryRes = await client.query<{ category_id: number }>(
          'SELECT category_id FROM public.categories WHERE name = $1',
          [categoryIdOrName],
        );

        categoryId = categoryRes.rows[0]?.category_id;
        if (!categoryId) {
          throw new Error(`Category '${categoryIdOrName}' not found.`);
        }
      }

      // Validate category ID exists
      const categoryCheck = await client.query(
        'SELECT 1 FROM public.categories WHERE category_id = $1',
        [categoryId]
      );

      if (categoryCheck.rows.length === 0) {
        throw new Error(`Category ID ${categoryId} does not exist.`);
      }

      // Find or create master item
      let masterItem: MasterGroceryItem;
      const masterItemRes = await client.query<MasterGroceryItem>(
        'SELECT * FROM public.master_grocery_items WHERE name = $1',
        [itemName],
      );

      if (masterItemRes.rows.length > 0) {
        masterItem = masterItemRes.rows[0];
      } else {
        const newMasterItemRes = await client.query<MasterGroceryItem>(
          'INSERT INTO public.master_grocery_items (name, category_id) VALUES ($1, $2) RETURNING *',
          [itemName, categoryId],
        );
        masterItem = newMasterItemRes.rows[0];
      }

      // Add to user's watchlist
      await client.query(
        'INSERT INTO public.user_watched_items (user_id, master_item_id) VALUES ($1, $2) ON CONFLICT (user_id, master_item_id) DO NOTHING',
        [userId, masterItem.master_grocery_item_id],
      );

      return masterItem;
    });
  } catch (error) {
    handleDbError(
      error,
      logger,
      'Transaction error in addWatchedItem',
      { userId, itemName, categoryIdOrName },
    );
    throw error;
  }
}

2.3. Update Route Handler

File: src/routes/user.routes.ts (route handler)

router.post(
  '/watched-items',
  validateRequest(addWatchedItemSchema),
  asyncHandler(async (req: RequestWithJwt, res: Response) => {
    const { itemName, category, category_id } = req.body;
    const userId = req.jwt?.userId;

    // Prefer category_id, fallback to category (deprecated)
    const categoryIdOrName = category_id ?? category;

    if (category && !category_id) {
      // Log deprecation warning
      req.log.warn(
        { category, userId },
        'Client using deprecated "category" field. Should use "category_id".',
      );
    }

    const item = await PersonalizationDbService.addWatchedItem(
      userId,
      itemName,
      categoryIdOrName,
      req.log,
    );

    res.status(201).json({
      success: true,
      data: item,
    });
  }),
);

2.4. Update Frontend (Gradual)

File: src/services/apiClient.ts

/**
 * Add item to watched list
 *
 * @param itemName - Name of the item
 * @param categoryIdOrName - Category ID (preferred) or name (deprecated)
 */
export const addWatchedItem = async (
  itemName: string,
  categoryIdOrName: number | string,
): Promise<Response> => {
  const body: { itemName: string; category_id?: number; category?: string } = {
    itemName,
  };

  if (typeof categoryIdOrName === 'number') {
    body.category_id = categoryIdOrName;
  } else {
    body.category = categoryIdOrName;
  }

  return authenticatedFetch('/users/watched-items', {
    method: 'POST',
    body: JSON.stringify(body),
  });
};

2.5. Update E2E Tests (Support Both)

File: src/tests/e2e/deals-journey.e2e.test.ts

// Fetch categories first
const categoriesResponse = await fetch(`${API_BASE_URL}/categories`);
const categoriesData = await categoriesResponse.json();
const dairyCategory = categoriesData.data.find((c) => c.name === 'Dairy & Eggs');

// Use category_id (new format)
const watchItem1Response = await authedFetch('/users/watched-items', {
  method: 'POST',
  token: authToken,
  body: JSON.stringify({
    itemName: 'E2E Milk 2%',
    category_id: dairyCategory.category_id, // ✅ Use ID
  }),
});

2.6. Testing Phase 2

Create integration tests that verify both formats work:

describe('POST /api/users/watched-items (dual format)', () => {
  it('should accept category_id (new format)', async () => {
    const response = await authenticatedRequest.post('/api/users/watched-items').send({
      itemName: 'Test Item',
      category_id: 3,
    });

    expect(response.status).toBe(201);
  });

  it('should accept category name (deprecated format)', async () => {
    const response = await authenticatedRequest.post('/api/users/watched-items').send({
      itemName: 'Test Item',
      category: 'Dairy & Eggs',
    });

    expect(response.status).toBe(201);
    // Should log deprecation warning
  });

  it('should prefer category_id when both provided', async () => {
    const response = await authenticatedRequest.post('/api/users/watched-items').send({
      itemName: 'Test Item',
      category_id: 3,
      category: 'Wrong Name', // Should be ignored
    });

    expect(response.status).toBe(201);
  });
});

Phase 3: Remove String Support (Breaking Change)

Duration: 2-3 days Risk: High Breaking: Yes (requires version bump)

3.1. Version Bump

Update package.json:

{
  "version": "1.0.0" // or "0.12.0" depending on semver strategy
}

3.2. Remove Deprecated Code

File: src/routes/user.routes.ts

// Remove deprecated category field
const addWatchedItemSchema = z.object({
  body: z.object({
    itemName: requiredString("Field 'itemName' is required."),
    category_id: z.number().int().positive("Field 'category_id' must be a positive integer"),
  }),
});

File: src/services/db/personalization.db.ts

// Remove string overload
async addWatchedItem(
  userId: string,
  itemName: string,
  categoryId: number,  // Only accept number
  logger: Logger,
): Promise<MasterGroceryItem> {
  try {
    return await withTransaction(async (client) => {
      // Validate category exists
      const categoryCheck = await client.query(
        'SELECT 1 FROM public.categories WHERE category_id = $1',
        [categoryId]
      );

      if (categoryCheck.rows.length === 0) {
        throw new Error(`Category ID ${categoryId} does not exist.`);
      }

      // ... rest of implementation (no string lookup)
    });
  } catch (error) {
    handleDbError(error, logger, 'Transaction error in addWatchedItem', {
      userId,
      itemName,
      categoryId,
    });
    throw error;
  }
}

3.3. Update All Client Code

File: src/services/apiClient.ts

// Remove string support
export const addWatchedItem = async (
  itemName: string,
  categoryId: number, // Only number
): Promise<Response> => {
  return authenticatedFetch('/users/watched-items', {
    method: 'POST',
    body: JSON.stringify({
      itemName,
      category_id: categoryId,
    }),
  });
};

File: src/hooks/mutations/useAddWatchedItemMutation.ts

interface AddWatchedItemParams {
  itemName: string;
  category_id: number; // Only number
}

3.4. Update All Tests

Update all E2E and integration tests to use category_id:

// All tests must fetch category IDs
const categoriesResponse = await fetch(`${API_BASE_URL}/categories`);
const categories = await categoriesResponse.json();

const watchItem1Response = await authedFetch('/users/watched-items', {
  method: 'POST',
  token: authToken,
  body: JSON.stringify({
    itemName: 'E2E Milk 2%',
    category_id: categories.data.find((c) => c.name === 'Dairy & Eggs').category_id,
  }),
});

3.5. Update API Documentation

  • Update Swagger/OpenAPI specs
  • Update README and API docs
  • Add migration guide for API consumers

3.6. Deployment

  1. Deploy Phase 3 to test environment
  2. Run full E2E test suite
  3. Verify all tests pass
  4. Deploy to production with major version bump
  5. Communicate breaking change to stakeholders

Similar Issues to Address

1. UPC/Barcode System

Current:

// src/types/upc.ts
export interface UpcLookupResult {
  category: string | null; // ❌ String
}

Decision Needed:

  • Is this user-provided free-form text? → Keep as string
  • Is this from a predefined list? → Change to category_id

Recommendation: If categories are predefined, change to category_id: number | null

2. AI Extraction

Current:

// src/types/ai.ts
export const FlyerItemSchema = z.object({
  category_name: z.string().nullish(),
});

Approach:

  • AI extracts category name (string)
  • Service layer maps to category_id using fuzzy matching
  • Store category_id in database

Implementation:

// New: AI Category Mapper Service
export class AiCategoryMapper {
  static async mapCategoryName(
    categoryName: string | null,
    logger: Logger,
  ): Promise<number | null> {
    if (!categoryName) return null;

    // Try exact match first
    let category = await CategoryDbService.getCategoryByName(categoryName, logger);

    // If no match, try fuzzy matching or default to "Other/Miscellaneous"
    if (!category) {
      logger.warn({ categoryName }, 'AI extracted unknown category, using default');
      category = await CategoryDbService.getCategoryByName('Other/Miscellaneous', logger);
    }

    return category?.category_id || null;
  }
}

3. Flyer Data Transformer

Current:

// src/services/flyerDataTransformer.ts
category_name: String(item.category_name ?? '').trim() || 'Other/Miscellaneous',

Should map to category_id:

const categoryId = await AiCategoryMapper.mapCategoryName(item.category_name, logger);

Testing Strategy

Unit Tests

  • CategoryDbService - all CRUD operations
  • AiCategoryMapper - fuzzy matching logic
  • Validation schemas

Integration Tests

  • Category routes (all endpoints)
  • Watched items with category_id
  • Backward compatibility (Phase 2 only)

E2E Tests

  • Full user journey with category selection
  • All 11 E2E tests updated to use category_id

Manual Testing Checklist

  • Frontend category dropdown populated from API
  • Creating watched item with category_id
  • Viewing watched items shows category name
  • AI extraction maps categories correctly
  • Error handling for invalid category_id

Rollback Plan

Phase 1 Rollback

Risk: Very low Steps:

  1. Remove category routes from server.ts
  2. Delete src/routes/category.routes.ts
  3. Delete src/services/db/category.db.ts
  4. Redeploy previous version

Phase 2 Rollback

Risk: Low (backward compatible) Steps:

  1. Revert changes to validation schemas
  2. Revert service layer to string-only
  3. Clients using category_id will break, but old clients still work

Phase 3 Rollback

Risk: High (breaking change) Steps:

  1. Revert to Phase 2 code (dual support)
  2. Redeploy with version downgrade
  3. Communicate rollback to API consumers

Communication Plan

Before Phase 1

  • Announce new /api/categories endpoint
  • Encourage clients to start fetching categories

Before Phase 2

  • Announce deprecation of category field
  • Recommend migration to category_id
  • Provide timeline for Phase 3

Before Phase 3

  • Final warning: 2 weeks before deployment
  • Publish migration guide
  • Offer support for client updates

After Phase 3

  • Announce breaking change deployed
  • Monitor error rates and Bugsink for issues
  • Provide rollback plan if needed

Success Criteria

  • All E2E tests pass with category_id
  • No performance regression
  • API response times improved (no string lookups)
  • Zero Bugsink errors related to category issues
  • Frontend displays categories correctly
  • All documentation updated

Timeline

Phase Duration Start End
Phase 1: Discovery Endpoint 2-3 days Week 1 Week 1
Phase 1: Testing 1 day Week 1 Week 1
Phase 2: Dual Support 3-5 days Week 2 Week 2
Phase 2: Testing 2 days Week 2 Week 2
Phase 3: Remove Deprecated 2-3 days Week 3 Week 3
Phase 3: Testing & Deploy 2 days Week 3 Week 3
Total 12-16 days


Notes

  • Consider adding category icons/colors in future
  • May want to allow user-defined categories later
  • Internationalization of category names needed eventually
  • Category ordering/sorting preferences per user