Files
flyer-crawler.projectium.com/docs/SCHEMA_RELATIONSHIP_ANALYSIS.md
Torben Sorensen cf476e7afc
All checks were successful
Deploy to Test Environment / deploy-to-test (push) Successful in 18m47s
ADR-022 - websocket notificaitons - also more test fixes with stores
2026-01-19 10:53:42 -08:00

312 lines
11 KiB
Markdown

# Database Schema Relationship Analysis
## Executive Summary
This document analyzes the database schema to identify missing table relationships and JOINs that aren't properly implemented in the codebase. This analysis was triggered by discovering that `WatchedItemDeal` was using a `store_name` string instead of a proper `store` object with nested locations.
## Key Findings
### ✅ CORRECTLY IMPLEMENTED
#### 1. Store → Store Locations → Addresses (3-table normalization)
**Schema:**
```sql
stores (store_id) store_locations (store_location_id) addresses (address_id)
```
**Implementation:**
- [src/services/db/storeLocation.db.ts](src/services/db/storeLocation.db.ts) properly JOINs all three tables
- [src/types.ts](src/types.ts) defines `StoreWithLocations` interface with nested address objects
- Recent fixes corrected `WatchedItemDeal` to use `store` object instead of `store_name` string
**Queries:**
```typescript
// From storeLocation.db.ts
FROM public.stores s
LEFT JOIN public.store_locations sl ON s.store_id = sl.store_id
LEFT JOIN public.addresses a ON sl.address_id = a.address_id
```
#### 2. Shopping Trips → Shopping Trip Items
**Schema:**
```sql
shopping_trips (shopping_trip_id) shopping_trip_items (shopping_trip_item_id) master_grocery_items
```
**Implementation:**
- [src/services/db/shopping.db.ts:513-518](src/services/db/shopping.db.ts#L513-L518) properly JOINs shopping_trips → shopping_trip_items → master_grocery_items
- Uses `json_agg` to nest items array within trip object
- [src/types.ts:639-647](src/types.ts#L639-L647) `ShoppingTrip` interface includes nested `items: ShoppingTripItem[]`
**Queries:**
```typescript
FROM public.shopping_trips st
LEFT JOIN public.shopping_trip_items sti ON st.shopping_trip_id = sti.shopping_trip_id
LEFT JOIN public.master_grocery_items mgi ON sti.master_item_id = mgi.master_grocery_item_id
```
#### 3. Receipts → Receipt Items
**Schema:**
```sql
receipts (receipt_id) receipt_items (receipt_item_id)
```
**Implementation:**
- [src/types.ts:649-662](src/types.ts#L649-L662) `Receipt` interface includes optional `items?: ReceiptItem[]`
- Receipt items are fetched separately via repository methods
- Proper foreign key relationship maintained
---
### ❌ MISSING / INCORRECT IMPLEMENTATIONS
#### 1. **CRITICAL: Flyers → Flyer Locations → Store Locations (Many-to-Many)**
**Schema:**
```sql
CREATE TABLE IF NOT EXISTS public.flyer_locations (
flyer_id BIGINT NOT NULL REFERENCES public.flyers(flyer_id) ON DELETE CASCADE,
store_location_id BIGINT NOT NULL REFERENCES public.store_locations(store_location_id) ON DELETE CASCADE,
PRIMARY KEY (flyer_id, store_location_id),
...
);
COMMENT: 'A linking table associating a single flyer with multiple store locations where its deals are valid.'
```
**Problem:**
- The schema defines a **many-to-many relationship** - a flyer can be valid at multiple store locations
- Current implementation in [src/services/db/flyer.db.ts](src/services/db/flyer.db.ts) **IGNORES** the `flyer_locations` table entirely
- Queries JOIN `flyers` directly to `stores` via `store_id` foreign key
- This means flyers can only be associated with ONE store, not multiple locations
**Current (Incorrect) Queries:**
```typescript
// From flyer.db.ts:315-362
FROM public.flyers f
JOIN public.stores s ON f.store_id = s.store_id // ❌ Wrong - ignores flyer_locations
```
**Expected (Correct) Queries:**
```typescript
// Should be:
FROM public.flyers f
JOIN public.flyer_locations fl ON f.flyer_id = fl.flyer_id
JOIN public.store_locations sl ON fl.store_location_id = sl.store_location_id
JOIN public.stores s ON sl.store_id = s.store_id
JOIN public.addresses a ON sl.address_id = a.address_id
```
**TypeScript Type Issues:**
- [src/types.ts](src/types.ts) `Flyer` interface has `store` object, but it should have `locations: StoreLocation[]` array
- Current structure assumes one store per flyer, not multiple locations
**Files Affected:**
- [src/services/db/flyer.db.ts](src/services/db/flyer.db.ts) - All flyer queries
- [src/types.ts](src/types.ts) - `Flyer` interface definition
- Any component displaying flyer locations
---
#### 2. **User Submitted Prices → Store Locations (MIGRATED)**
**Status**: ✅ **FIXED** - Migration created
**Schema:**
```sql
CREATE TABLE IF NOT EXISTS public.user_submitted_prices (
...
store_id BIGINT NOT NULL REFERENCES public.stores(store_id) ON DELETE CASCADE,
...
);
```
**Solution Implemented:**
- Created migration [sql/migrations/005_add_store_location_to_user_submitted_prices.sql](sql/migrations/005_add_store_location_to_user_submitted_prices.sql)
- Added `store_location_id` column to table (NOT NULL after migration)
- Migrated existing data: linked each price to first location of its store
- Updated TypeScript interface [src/types.ts:270-282](src/types.ts#L270-L282) to include both fields
- Kept `store_id` for backward compatibility during transition
**Benefits:**
- Prices are now specific to individual store locations
- "Walmart Toronto" and "Walmart Vancouver" prices are tracked separately
- Improves geographic specificity for price comparisons
- Enables proximity-based price recommendations
**Next Steps:**
- Application code needs to be updated to use `store_location_id` when creating new prices
- Once all code is migrated, can drop the legacy `store_id` column
- User-submitted prices feature is not yet implemented in the UI
---
#### 3. **Receipts → Store Locations (MIGRATED)**
**Status**: ✅ **FIXED** - Migration created
**Schema:**
```sql
CREATE TABLE IF NOT EXISTS public.receipts (
...
store_id BIGINT REFERENCES public.stores(store_id) ON DELETE CASCADE,
store_location_id BIGINT REFERENCES public.store_locations(store_location_id) ON DELETE SET NULL,
...
);
```
**Solution Implemented:**
- Created migration [sql/migrations/006_add_store_location_to_receipts.sql](sql/migrations/006_add_store_location_to_receipts.sql)
- Added `store_location_id` column to table (nullable - receipts may not have matched store)
- Migrated existing data: linked each receipt to first location of its store
- Updated TypeScript interface [src/types.ts:661-675](src/types.ts#L661-L675) to include both fields
- Kept `store_id` for backward compatibility during transition
**Benefits:**
- Receipts can now be tied to specific store locations
- "Loblaws Queen St" and "Loblaws Bloor St" are tracked separately
- Enables location-specific shopping pattern analysis
- Improves receipt matching accuracy with address data
**Next Steps:**
- Receipt scanning code needs to determine specific store_location_id from OCR text
- May require address parsing/matching logic in receipt processing
- Once all code is migrated, can drop the legacy `store_id` column
- OCR confidence and pattern matching should prefer location-specific data
---
#### 4. Item Price History → Store Locations (Already Correct!)
**Schema:**
```sql
CREATE TABLE IF NOT EXISTS public.item_price_history (
...
store_location_id BIGINT REFERENCES public.store_locations(store_location_id) ON DELETE CASCADE,
...
);
```
**Status:**
-**CORRECTLY IMPLEMENTED** - This table already uses `store_location_id`
- Properly tracks price history per location
- Good example of how other tables should be structured
---
## Summary Table
| Table | Foreign Key | Should Use | Status | Priority |
| --------------------- | --------------------------- | ------------------------------------- | --------------- | -------- |
| **flyer_locations** | flyer_id, store_location_id | Many-to-many link | ✅ **FIXED** | ✅ Done |
| flyers | store_id | ~~store_id~~ Now uses flyer_locations | ✅ **FIXED** | ✅ Done |
| user_submitted_prices | store_id | store_location_id | ✅ **MIGRATED** | ✅ Done |
| receipts | store_id | store_location_id | ✅ **MIGRATED** | ✅ Done |
| item_price_history | store_location_id | ✅ Already correct | ✅ Correct | ✅ Good |
| shopping_trips | (no store ref) | N/A | ✅ Correct | ✅ Good |
| store_locations | store_id, address_id | ✅ Already correct | ✅ Correct | ✅ Good |
---
## Impact Assessment
### Critical (Must Fix)
1. **Flyer Locations Many-to-Many**
- **Impact:** Flyers can't be associated with multiple store locations
- **User Impact:** Users can't see which specific store locations have deals
- **Business Logic:** Breaks core assumption that one flyer can be valid at multiple stores
- **Fix Complexity:** High - requires schema migration, type changes, query rewrites
### Medium (Should Consider)
2. **User Submitted Prices & Receipts**
- **Impact:** Loss of location-specific data
- **User Impact:** Can't distinguish between different locations of same store chain
- **Business Logic:** Reduces accuracy of proximity-based recommendations
- **Fix Complexity:** Medium - requires migration and query updates
---
## Recommended Actions
### Phase 1: Fix Flyer Locations (Critical)
1. Create migration to properly use `flyer_locations` table
2. Update `Flyer` TypeScript interface to support multiple locations
3. Rewrite all flyer queries in [src/services/db/flyer.db.ts](src/services/db/flyer.db.ts)
4. Update flyer creation/update endpoints to manage `flyer_locations` entries
5. Update frontend components to display multiple locations per flyer
6. Update tests to use new structure
### Phase 2: Consider Store Location Specificity (Optional)
1. Evaluate if location-specific receipts and prices provide value
2. If yes, create migrations to change `store_id``store_location_id`
3. Update repository queries
4. Update TypeScript interfaces
5. Update tests
---
## Related Documents
- [ADR-013: Store Address Normalization](../docs/adr/0013-store-address-normalization.md)
- [STORE_ADDRESS_IMPLEMENTATION_PLAN.md](../STORE_ADDRESS_IMPLEMENTATION_PLAN.md)
- [TESTING.md](../docs/TESTING.md)
---
## Analysis Methodology
This analysis was conducted by:
1. Extracting all foreign key relationships from [sql/master_schema_rollup.sql](sql/master_schema_rollup.sql)
2. Comparing schema relationships against TypeScript interfaces in [src/types.ts](src/types.ts)
3. Auditing database queries in [src/services/db/](src/services/db/) for proper JOIN usage
4. Identifying gaps where schema relationships exist but aren't used in queries
Commands used:
```bash
# Extract all foreign keys
podman exec -it flyer-crawler-dev bash -c "grep -n 'REFERENCES' sql/master_schema_rollup.sql"
# Check specific table structures
podman exec -it flyer-crawler-dev bash -c "grep -A 15 'CREATE TABLE.*table_name' sql/master_schema_rollup.sql"
# Verify query patterns
podman exec -it flyer-crawler-dev bash -c "grep -n 'JOIN.*table_name' src/services/db/*.ts"
```
---
**Last Updated:** 2026-01-19
**Analyzed By:** Claude Code (via user request after discovering store_name → store bug)