# Flyer Crawler - System Architecture Overview

**Version**: 0.12.20
**Last Updated**: 2026-01-28
**Platform**: Linux (Production and Development)

---

## Table of Contents

1. [Executive Summary](#executive-summary)
2. [System Architecture Diagram](#system-architecture-diagram)
3. [Technology Stack](#technology-stack)
4. [System Components](#system-components)
5. [Data Flow](#data-flow)
6. [Architecture Layers](#architecture-layers)
7. [Key Entities](#key-entities)
8. [Authentication Flow](#authentication-flow)
9. [Background Processing](#background-processing)
10. [Deployment Architecture](#deployment-architecture)
11. [Design Principles and ADRs](#design-principles-and-adrs)
12. [Key Files Reference](#key-files-reference)

---

## Executive Summary

**Flyer Crawler** is a grocery deal extraction and analysis platform that uses AI-powered processing to extract deals from grocery store flyer images and PDFs. The system provides users with features including watchlists, price history tracking, shopping lists, deal alerts, and recipe management.

### Core Capabilities

| Domain                    | Description                                                                             |
| ------------------------- | --------------------------------------------------------------------------------------- |
| **Deal Extraction**       | AI-powered extraction of deals from grocery store flyer images/PDFs using Google Gemini |
| **Price Tracking**        | Historical price data, trend analysis, and price alerts                                 |
| **User Features**         | Watchlists, shopping lists, recipes, pantry management, achievements                    |
| **Real-time Updates**     | WebSocket-based notifications for price alerts and processing status                    |
| **Background Processing** | Asynchronous job queues for flyer processing, emails, and analytics                     |

---

## System Architecture Diagram

```text
+-----------------------------------------------------------------------------------+
|                                  CLIENT LAYER                                      |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +-------------------+     +-------------------+     +-------------------+        |
|  |   Web Browser     |     |   Mobile PWA      |     |   API Clients     |        |
|  |   (React SPA)     |     |   (React SPA)     |     |   (REST/JSON)     |        |
|  +--------+----------+     +--------+----------+     +--------+----------+        |
|           |                         |                         |                   |
+-------------|-------------------------|-------------------------|------------------+
              |                         |                         |
              v                         v                         v
+-----------------------------------------------------------------------------------+
|                              NGINX REVERSE PROXY                                   |
|  - SSL/TLS Termination    - Rate Limiting    - Static Asset Serving               |
|  - Load Balancing         - Compression      - WebSocket Proxying                 |
|  - Flyer Images (/flyer-images/) with 7-day cache                                 |
+----------------------------------+------------------------------------------------+
                                   |
                                   v
+-----------------------------------------------------------------------------------+
|                              APPLICATION LAYER                                     |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +-----------------------------------------------------------------------------+  |
|  |                         EXPRESS.JS SERVER (Node.js)                         |  |
|  |                                                                             |  |
|  |  +-------------------------+   +-------------------------+                  |  |
|  |  |     Routes Layer        |   |    Middleware Chain     |                  |  |
|  |  |  - API Endpoints        |   |  - Authentication       |                  |  |
|  |  |  - Request Validation   |   |  - Rate Limiting        |                  |  |
|  |  |  - Response Formatting  |   |  - Logging              |                  |  |
|  |  +------------+------------+   |  - Error Handling       |                  |  |
|  |               |                +-------------------------+                  |  |
|  |               v                                                             |  |
|  |  +-------------------------+   +-------------------------+                  |  |
|  |  |    Services Layer       |   |   External Services     |                  |  |
|  |  |  - Business Logic       |   |  - Google Gemini AI     |                  |  |
|  |  |  - Transaction Coord.   |   |  - Google Maps API      |                  |  |
|  |  |  - Event Publishing     |   |  - OAuth Providers      |                  |  |
|  |  +------------+------------+   |  - Email (SMTP)         |                  |  |
|  |               |                +-------------------------+                  |  |
|  |               v                                                             |  |
|  |  +-------------------------+                                                |  |
|  |  |   Repository Layer      |                                                |  |
|  |  |  - Database Access      |                                                |  |
|  |  |  - Query Construction   |                                                |  |
|  |  |  - Entity Mapping       |                                                |  |
|  |  +------------+------------+                                                |  |
|  |               |                                                             |  |
|  +---------------|-------------------------------------------------------------+  |
|                  |                                                                |
+------------------|----------------------------------------------------------------+
                   |
                   v
+-----------------------------------------------------------------------------------+
|                                DATA LAYER                                          |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +---------------------------+           +---------------------------+            |
|  |      PostgreSQL 16        |           |        Redis 7            |            |
|  |      (with PostGIS)       |           |                           |            |
|  |                           |           |  - Session Cache          |            |
|  |  - Primary Data Store     |           |  - Query Cache            |            |
|  |  - Geographic Queries     |           |  - Job Queue Backing      |            |
|  |  - Full-Text Search       |           |  - Rate Limit Counters    |            |
|  |  - Stored Functions       |           |  - Real-time Pub/Sub      |            |
|  +---------------------------+           +---------------------------+            |
|                                                                                   |
+-----------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------+
|                           BACKGROUND PROCESSING LAYER                              |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +---------------------------+           +---------------------------+            |
|  |     PM2 Process           |           |      BullMQ Workers       |            |
|  |     Manager               |           |                           |            |
|  |                           |           |  - Flyer Processing       |            |
|  |  - Process Clustering     |           |  - Receipt Processing     |            |
|  |  - Auto-restart           |           |  - Email Sending          |            |
|  |  - Log Management         |           |  - Analytics Reports      |            |
|  |  - Health Monitoring      |           |  - File Cleanup           |            |
|  +---------------------------+           |  - Token Cleanup          |            |
|                                          |  - Expiry Alerts          |            |
|                                          |  - Barcode Detection      |            |
|                                          +---------------------------+            |
|                                                                                   |
+-----------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------+
|                            OBSERVABILITY LAYER                                     |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +------------------+    +------------------+    +------------------+             |
|  |  Bugsink/Sentry  |    |   Pino Logger    |    |    Logstash      |             |
|  |  (Error Track)   |    |   (Structured)   |    |   (Aggregation)  |             |
|  +------------------+    +------------------+    +------------------+             |
|                                                                                   |
+-----------------------------------------------------------------------------------+
```

---

## Technology Stack

### Core Technologies

| Component              | Technology | Version  | Purpose                          |
| ---------------------- | ---------- | -------- | -------------------------------- |
| **Runtime**            | Node.js    | 22.x LTS | Server-side JavaScript runtime   |
| **Language**           | TypeScript | 5.9.3    | Type-safe JavaScript superset    |
| **Web Framework**      | Express.js | 5.1.0    | HTTP server and routing          |
| **Frontend Framework** | React      | 19.2.0   | UI component library             |
| **Build Tool**         | Vite       | 7.2.4    | Frontend bundling and dev server |

### Data Storage

| Component             | Technology       | Version | Purpose                                        |
| --------------------- | ---------------- | ------- | ---------------------------------------------- |
| **Primary Database**  | PostgreSQL       | 16.x    | Relational data storage                        |
| **Spatial Extension** | PostGIS          | 3.x     | Geographic queries and store location features |
| **Cache & Queues**    | Redis            | 7.x     | Caching, session storage, job queue backing    |
| **File Storage**      | Local Filesystem | -       | Uploaded flyers and processed images           |

### AI and External Services

| Component       | Technology                  | Purpose                                 |
| --------------- | --------------------------- | --------------------------------------- |
| **AI Provider** | Google Gemini               | Flyer data extraction, image analysis   |
| **Geocoding**   | Google Maps API / Nominatim | Address geocoding and location services |
| **OAuth**       | Google, GitHub              | Social authentication                   |
| **Email**       | Nodemailer (SMTP)           | Transactional emails                    |

### Background Processing Stack

| Component           | Technology | Version | Purpose                           |
| ------------------- | ---------- | ------- | --------------------------------- |
| **Job Queues**      | BullMQ     | 5.65.1  | Reliable async job processing     |
| **Process Manager** | PM2        | Latest  | Process management and clustering |
| **Scheduler**       | node-cron  | 4.2.1   | Scheduled tasks                   |

### Frontend Stack

| Component            | Technology     | Version | Purpose                                  |
| -------------------- | -------------- | ------- | ---------------------------------------- |
| **State Management** | TanStack Query | 5.90.12 | Server state caching and synchronization |
| **Routing**          | React Router   | 7.9.6   | Client-side routing                      |
| **Styling**          | Tailwind CSS   | 4.1.17  | Utility-first CSS framework              |
| **Icons**            | Lucide React   | 0.555.0 | Icon components                          |
| **Charts**           | Recharts       | 3.4.1   | Data visualization                       |

### Observability and Quality

| Component             | Technology       | Purpose                       |
| --------------------- | ---------------- | ----------------------------- |
| **Error Tracking**    | Sentry / Bugsink | Error monitoring and alerting |
| **Logging**           | Pino             | Structured JSON logging       |
| **Log Aggregation**   | Logstash         | Centralized log collection    |
| **Testing**           | Vitest           | Unit and integration testing  |
| **API Documentation** | Swagger/OpenAPI  | Interactive API documentation |

---

## System Components

### Frontend (React/Vite)

The frontend is a single-page application (SPA) built with React 19 and Vite.

**Key Characteristics**:

- Server state management via TanStack Query
- Neo-Brutalism design system (ADR-012)
- Responsive design for mobile and desktop
- PWA-capable for offline access

**Directory Structure**:

```text
src/
+-- components/     # Reusable UI components
+-- contexts/       # React context providers
+-- features/       # Feature-specific modules (ADR-047)
+-- hooks/          # Custom React hooks
+-- layouts/        # Page layout components
+-- pages/          # Route page components
+-- services/       # API client services
```

### Backend (Express/Node.js)

The backend is a RESTful API server built with Express.js 5.

**Key Characteristics**:

- Layered architecture (Routes -> Services -> Repositories)
- JWT-based authentication with OAuth support
- Request validation via Zod schemas
- Structured logging with Pino
- Standardized error handling (ADR-001)

**API Route Modules** (all versioned under `/api/v1/*`):

| Route                     | Purpose                                         |
| ------------------------- | ----------------------------------------------- |
| `/api/v1/auth`            | Authentication (login, register, OAuth)         |
| `/api/v1/health`          | Health checks and monitoring                    |
| `/api/v1/system`          | System administration (PM2 status, server info) |
| `/api/v1/users`           | User profile management                         |
| `/api/v1/ai`              | AI-powered features and flyer processing        |
| `/api/v1/admin`           | Administrative functions                        |
| `/api/v1/budgets`         | Budget management and spending analysis         |
| `/api/v1/achievements`    | Gamification and achievement system             |
| `/api/v1/flyers`          | Flyer CRUD and processing                       |
| `/api/v1/recipes`         | Recipe management and recommendations           |
| `/api/v1/personalization` | Master items and user preferences               |
| `/api/v1/price-history`   | Price tracking and trend analysis               |
| `/api/v1/stats`           | Public statistics and analytics                 |
| `/api/v1/upc`             | UPC barcode scanning and product lookup         |
| `/api/v1/inventory`       | Inventory and expiry tracking                   |
| `/api/v1/receipts`        | Receipt scanning and purchase history           |
| `/api/v1/deals`           | Best prices and deal discovery                  |
| `/api/v1/reactions`       | Social features (reactions, sharing)            |
| `/api/v1/stores`          | Store management and location services          |
| `/api/v1/categories`      | Category browsing and product categorization    |

### Database (PostgreSQL/PostGIS)

PostgreSQL serves as the primary data store, with the PostGIS extension for geographic queries.

**Key Features**:

- UUID primary keys for user data
- BIGINT IDENTITY for auto-incrementing IDs
- PostGIS geography types for store locations
- Stored functions for complex business logic
- Triggers for automated updates (e.g., `item_count` maintenance)

### Cache (Redis)

Redis provides caching and backing for the job queue system.

**Usage Patterns**:

- Query result caching (flyers, prices, stats)
- Rate limiting counters
- BullMQ job queue storage
- Session token storage

### AI (Google Gemini)

Google Gemini powers the AI extraction capabilities.

**Capabilities**:

- Flyer image analysis and data extraction
- Store name and logo detection
- Deal item parsing (name, price, quantity)
- Date range extraction
- Category classification

### Background Workers (BullMQ/PM2)

BullMQ workers handle asynchronous processing tasks. PM2 manages both the API server and worker processes in production and development environments (ADR-014).

**Dev Container PM2 Processes**:

| Process                    | Script                             | Purpose                    |
| -------------------------- | ---------------------------------- | -------------------------- |
| `flyer-crawler-api-dev`    | `tsx watch server.ts`              | API server with hot reload |
| `flyer-crawler-worker-dev` | `tsx watch src/services/worker.ts` | Background job processing  |
| `flyer-crawler-vite-dev`   | `vite --host`                      | Frontend dev server        |

**Production PM2 Processes**:

| Process                          | Script             | Purpose                     |
| -------------------------------- | ------------------ | --------------------------- |
| `flyer-crawler-api`              | `tsx server.ts`    | API server (cluster mode)   |
| `flyer-crawler-worker`           | `tsx worker.ts`    | Background job processing   |
| `flyer-crawler-analytics-worker` | `tsx analytics.ts` | Analytics report generation |

**Job Queues**:

| Queue                        | Purpose                          | Retry Strategy                        |
| ---------------------------- | -------------------------------- | ------------------------------------- |
| `flyer-processing`           | Process uploaded flyers with AI  | 3 attempts, exponential backoff (5s)  |
| `receipt-processing`         | OCR and parse receipts           | 3 attempts, exponential backoff (10s) |
| `email-sending`              | Send transactional emails        | 5 attempts, exponential backoff (10s) |
| `analytics-reporting`        | Generate daily analytics         | 2 attempts, exponential backoff (60s) |
| `weekly-analytics-reporting` | Generate weekly reports          | 2 attempts, exponential backoff (1h)  |
| `file-cleanup`               | Remove temporary files           | 3 attempts, exponential backoff (30s) |
| `token-cleanup`              | Expire old refresh tokens        | 2 attempts, exponential backoff (1h)  |
| `expiry-alerts`              | Send pantry expiry notifications | 2 attempts, exponential backoff (5m)  |
| `barcode-detection`          | Process barcode scans            | 2 attempts, exponential backoff (5s)  |
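
The retry column above maps onto the `attempts` and `backoff` job options that BullMQ accepts. The following is a minimal sketch, assuming the value in parentheses is the initial backoff delay; the `RetryPolicy` type, the `exponential` helper, and the `retryPolicies` object are hypothetical names used for illustration and are not taken from the project's code.

```typescript
// Hypothetical sketch: retry settings from the Job Queues table, written in the
// shape of BullMQ job options (`attempts` plus an exponential `backoff`).
type RetryPolicy = {
  attempts: number;
  backoff: { type: 'exponential'; delay: number }; // delay in milliseconds
};

const SECOND = 1000;
const MINUTE = 60 * SECOND;
const HOUR = 60 * MINUTE;

// Small helper so each table row reads as one line.
const exponential = (attempts: number, delay: number): RetryPolicy => ({
  attempts,
  backoff: { type: 'exponential', delay },
});

// One entry per queue; values are copied from the table, not from real config.
const retryPolicies = {
  'flyer-processing': exponential(3, 5 * SECOND),
  'receipt-processing': exponential(3, 10 * SECOND),
  'email-sending': exponential(5, 10 * SECOND),
  'analytics-reporting': exponential(2, 60 * SECOND),
  'weekly-analytics-reporting': exponential(2, 1 * HOUR),
  'file-cleanup': exponential(3, 30 * SECOND),
  'token-cleanup': exponential(2, 1 * HOUR),
  'expiry-alerts': exponential(2, 5 * MINUTE),
  'barcode-detection': exponential(2, 5 * SECOND),
};
```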

---

## Data Flow

### Flyer Processing Pipeline

```text
+-------------+     +----------------+     +------------------+     +---------------+
|   User      |     |   Express      |     |   BullMQ         |     |  PostgreSQL   |
|   Upload    +---->+   Route        +---->+   Queue          +---->+   Storage     |
+-------------+     +-------+--------+     +--------+---------+     +-------+-------+
                            |                       |                       |
                            v                       v                       v
                    +-------+--------+     +--------+---------+     +-------+-------+
                    |   Validate     |     |   Worker         |     |   Cache       |
                    |   & Store      |     |   Process        |     |   Invalidate  |
                    |   Temp File    |     |                  |     |               |
                    +----------------+     +--------+---------+     +---------------+
                                                    |
                                                    v
                                           +--------+---------+
                                           |   Google         |
                                           |   Gemini AI      |
                                           |   Extraction     |
                                           +--------+---------+
                                                    |
                                                    v
                                           +--------+---------+
                                           |   Transform      |
                                           |   & Validate     |
                                           |   Data           |
                                           +--------+---------+
                                                    |
                                                    v
                                           +--------+---------+
                                           |   Persist to     |
                                           |   Database       |
                                           |   (Transaction)  |
                                           +--------+---------+
                                                    |
                                                    v
                                           +--------+---------+
                                           |   WebSocket      |
                                           |   Notification   |
                                           +------------------+
```

### Detailed Processing Steps

1. **Upload**: User uploads a flyer image via `/api/v1/flyers/upload`
2. **Validation**: Server validates file type and size, and generates a checksum
3. **Queueing**: Job is added to the `flyer-processing` queue with the file path
4. **Worker Pickup**: BullMQ worker picks up the job for processing
5. **AI Extraction**: Google Gemini analyzes the image and extracts:
   - Store name
   - Valid date range
   - Store address (if present)
   - Deal items (name, price, quantity, category)
6. **Data Transformation**: Raw AI output is transformed to the database schema
7. **Persistence**: Transactional insert of flyer + items + store
8. **Cache Invalidation**: Redis cache is cleared for affected queries
9. **Notification**: WebSocket message is sent to the user with results
10. **Cleanup**: Temporary files are scheduled for deletion

---

## Architecture Layers

The application follows a strict layered architecture as defined in ADR-035.

```text
+-----------------------------------------------------------------------+
|                           ROUTES LAYER                                 |
|  Responsibilities:                                                    |
|  - HTTP request/response handling                                     |
|  - Input validation (via middleware)                                  |
|  - Authentication/authorization checks                                |
|  - Rate limiting                                                      |
|  - Response formatting (sendSuccess, sendPaginated, sendError)        |
+----------------------------------+------------------------------------+
                                   |
                                   v
+-----------------------------------------------------------------------+
|                          SERVICES LAYER                                |
|  Responsibilities:                                                    |
|  - Business logic orchestration                                       |
|  - Transaction coordination (withTransaction)                         |
|  - External API integration                                           |
|  - Cross-repository operations                                        |
|  - Event publishing                                                   |
+----------------------------------+------------------------------------+
                                   |
                                   v
+-----------------------------------------------------------------------+
|                         REPOSITORY LAYER                               |
|  Responsibilities:                                                    |
|  - Direct database access                                             |
|  - Query construction                                                 |
|  - Entity mapping                                                     |
|  - Error translation (handleDbError)                                  |
+-----------------------------------------------------------------------+
```

### Layer Communication Rules

1. **Routes MUST NOT** directly access repositories (except simple CRUD)
2. **Repositories MUST NOT** call other repositories (use services)
3. **Services MAY** call other services
4. **Infrastructure services MAY** be called from any layer

### Service Types and Naming Conventions

| Type                | Suffix        | Example               | Location           |
| ------------------- | ------------- | --------------------- | ------------------ |
| Business Service    | `*Service.ts` | `authService.ts`      | `src/services/`    |
| Server-Only Service | `*.server.ts` | `aiService.server.ts` | `src/services/`    |
| Database Repository | `*.db.ts`     | `user.db.ts`          | `src/services/db/` |
| Infrastructure      | Descriptive   | `logger.server.ts`    | `src/services/`    |

### Repository Method Naming (ADR-034)

| Prefix  | Behavior                            | Return Type    |
| ------- | ----------------------------------- | -------------- |
| `get*`  | Throws `NotFoundError` if not found | Entity         |
| `find*` | Returns `null` if not found         | Entity or null |
| `list*` | Returns empty array if none found   | Entity[]       |
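
The three prefixes can be illustrated with a small in-memory repository. This is only a sketch of the naming contract: `InMemoryUserRepository`, the `User` shape, the method names, and this local `NotFoundError` class are all hypothetical stand-ins, and a real repository would query PostgreSQL instead of a `Map`.

```typescript
// Hypothetical in-memory repository illustrating the get*/find*/list* contract.
interface User {
  userId: string;
  email: string;
}

// Local stand-in for the project's NotFoundError (assumed, not the real class).
class NotFoundError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'NotFoundError';
  }
}

class InMemoryUserRepository {
  private readonly users = new Map<string, User>();

  add(user: User): void {
    this.users.set(user.userId, user);
  }

  // find*: absence is an expected outcome, so return null.
  findUserById(userId: string): User | null {
    return this.users.get(userId) ?? null;
  }

  // get*: absence is an error, so throw NotFoundError.
  getUserById(userId: string): User {
    const user = this.findUserById(userId);
    if (user === null) {
      throw new NotFoundError(`User ${userId} not found`);
    }
    return user;
  }

  // list*: always returns an array, possibly empty.
  listUsersByEmailDomain(domain: string): User[] {
    return Array.from(this.users.values()).filter((u) => u.email.endsWith(`@${domain}`));
  }
}
```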

---

## Key Entities

### Entity Relationship Overview

```text
+------------------+          +------------------+          +------------------+
|     users        |          |    profiles      |          |    addresses     |
|------------------|          |------------------|          |------------------|
| user_id (PK)     |<-------->| user_id (PK,FK)  |--------->| address_id (PK)  |
| email            |          | full_name        |          | address_line_1   |
| password_hash    |          | avatar_url       |          | city             |
| refresh_token    |          | points           |          | province_state   |
+--------+---------+          | role             |          | latitude         |
         |                    +------------------+          | longitude        |
         |                                                  | location (GIS)   |
         |                                                  +--------+---------+
         |                                                           ^
         v                                                           |
+--------+---------+          +------------------+          +--------+---------+
|     stores       |--------->| store_locations  |--------->|                  |
|------------------|          |------------------|          |                  |
| store_id (PK)    |          | store_location_id|          |                  |
| name             |          | store_id (FK)    |          |                  |
| logo_url         |          | address_id (FK)  |          |                  |
+--------+---------+          +------------------+          +------------------+
         |
         v
+--------+---------+          +------------------+          +------------------+
|     flyers       |--------->|   flyer_items    |--------->| master_grocery_  |
|------------------|          |------------------|          |     items        |
| flyer_id (PK)    |          | flyer_item_id    |          |------------------|
| store_id (FK)    |          | flyer_id (FK)    |          | master_grocery_  |
| file_name        |          | item             |          |   item_id (PK)   |
| image_url        |          | price_display    |          | name             |
| valid_from       |          | price_in_cents   |          | category_id (FK) |
| valid_to         |          | quantity         |          | is_allergen      |
| status           |          | master_item_id   |          +------------------+
| item_count       |          | category_id (FK) |
+------------------+          +------------------+
```

### Core Entities

| Entity                | Table                  | Purpose                                       |
| --------------------- | ---------------------- | --------------------------------------------- |
| **User**              | `users`                | Authentication credentials and login tracking |
| **Profile**           | `profiles`             | Public user data, preferences, points         |
| **Store**             | `stores`               | Grocery store chains (Safeway, Kroger, etc.)  |
| **StoreLocation**     | `store_locations`      | Physical store locations with addresses       |
| **Address**           | `addresses`            | Normalized address storage with geocoding     |
| **Flyer**             | `flyers`               | Uploaded flyer metadata and status            |
| **FlyerItem**         | `flyer_items`          | Individual deals extracted from flyers        |
| **MasterGroceryItem** | `master_grocery_items` | Canonical grocery item dictionary             |
| **Category**          | `categories`           | Item categorization (Produce, Dairy, etc.)    |

### User Feature Entities

| Entity               | Table                 | Purpose                              |
| -------------------- | --------------------- | ------------------------------------ |
| **UserWatchedItem**  | `user_watched_items`  | Items user wants to track prices for |
| **UserAlert**        | `user_alerts`         | Price alert thresholds               |
| **ShoppingList**     | `shopping_lists`      | User shopping lists                  |
| **ShoppingListItem** | `shopping_list_items` | Items on shopping lists              |
| **Recipe**           | `recipes`             | User recipes with ingredients        |
| **RecipeIngredient** | `recipe_ingredients`  | Recipe ingredient list               |
| **PantryItem**       | `pantry_items`        | User pantry inventory                |
| **Receipt**          | `receipts`            | Scanned receipt data                 |
| **ReceiptItem**      | `receipt_items`       | Items parsed from receipts           |

### Gamification Entities

| Entity              | Table               | Purpose                               |
| ------------------- | ------------------- | ------------------------------------- |
| **Achievement**     | `achievements`      | Defined achievements                  |
| **UserAchievement** | `user_achievements` | Achievements earned by users          |
| **ActivityLog**     | `activity_log`      | User activity for feeds and analytics |

---

## Authentication Flow

### JWT Token Architecture

```text
+-------------------+     +-------------------+     +-------------------+
|   Login Request   |     |   Server          |     |   Database        |
|   (email/pass)    +---->+   Validates       +---->+   Verify User     |
+-------------------+     +--------+----------+     +-------------------+
                                   |
                                   v
                          +--------+----------+
                          |   Generate        |
                          |   JWT Tokens      |
                          |   - Access (15m)  |
                          |   - Refresh (7d)  |
                          +--------+----------+
                                   |
                                   v
+-------------------+     +--------+----------+
|   Client Storage  |<----+   Return Tokens   |
|   - Access: Memory|     |   - Access: Body  |
|   - Refresh: HTTP |     |   - Refresh: Cookie|
|     Only Cookie   |     +-------------------+
+-------------------+
```

### Authentication Methods

1. **Local Authentication**: Email/password with bcrypt hashing
2. **Google OAuth 2.0**: Social login via Google account
3. **GitHub OAuth 2.0**: Social login via GitHub account

### Security Features (ADR-016, ADR-048)

- **Rate Limiting**: Login attempts rate-limited per IP
- **Account Lockout**: 15-minute lockout after 5 failed attempts
- **Password Requirements**: Strength validation via zxcvbn
- **JWT Rotation**: Access tokens are short-lived, refresh tokens are rotated
- **HTTPS Only**: All production traffic encrypted
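
The account lockout rule (15 minutes after 5 failed attempts) can be sketched as a small state machine. This is an illustrative sketch only: `LoginAttemptTracker` and its methods are hypothetical, the state is kept in memory rather than in the database or Redis, and keying the counter by email is an assumption the document does not confirm.

```typescript
// Hypothetical in-memory lockout tracker; names and storage are illustrative.
const MAX_FAILED_ATTEMPTS = 5;
const LOCKOUT_MS = 15 * 60 * 1000; // 15 minutes

interface LoginState {
  failedAttempts: number;
  lockedUntil: number | null; // epoch milliseconds, or null when not locked
}

class LoginAttemptTracker {
  private readonly states = new Map<string, LoginState>();

  // True while a lockout window is still open for this account.
  isLocked(email: string, now: number): boolean {
    const state = this.states.get(email);
    return state !== undefined && state.lockedUntil !== null && now < state.lockedUntil;
  }

  // Callers are expected to check isLocked() before attempting a login.
  recordFailure(email: string, now: number): void {
    const previous = this.states.get(email);
    // Once a lockout has expired, counting starts again from zero.
    const expired =
      previous !== undefined && previous.lockedUntil !== null && now >= previous.lockedUntil;
    const carried = previous !== undefined && !expired ? previous.failedAttempts : 0;
    const failedAttempts = carried + 1;
    const lockedUntil = failedAttempts >= MAX_FAILED_ATTEMPTS ? now + LOCKOUT_MS : null;
    this.states.set(email, { failedAttempts, lockedUntil });
  }

  // A successful login clears the failure history.
  recordSuccess(email: string): void {
    this.states.delete(email);
  }
}
```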

### Protected Route Flow

```text
+-------------------+     +-------------------+     +-------------------+
|   API Request     |     |   requireAuth     |     |   JWT Strategy    |
|   + Bearer Token  +---->+   Middleware      +---->+   Validate        |
+-------------------+     +--------+----------+     +--------+----------+
                                   |                         |
                                   |     +-------------------+
                                   |     |
                                   v     v
                          +--------+-----+----+
                          |   req.user        |
                          |   populated       |
                          +--------+----------+
                                   |
                                   v
                          +--------+----------+
                          |   Route Handler   |
                          |   Executes        |
                          +-------------------+
```
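
The flow above can be sketched as an Express-style middleware. This is not the project's `requireAuth`: the `Req`, `Res`, and `TokenVerifier` types are minimal hypothetical stand-ins so the example runs without Express, the JWT strategy is replaced by an injected `verify` function, and the 401 response body is an assumption.

```typescript
// Minimal request/response shapes so the sketch runs without Express installed.
interface AuthUser {
  userId: string;
  role: string;
}
interface Req {
  headers: Record<string, string | undefined>;
  user?: AuthUser;
}
interface Res {
  status(code: number): Res;
  json(body: unknown): void;
}
type Next = () => void;

// Stand-in for the JWT strategy: returns the user for a valid token, else null.
type TokenVerifier = (token: string) => AuthUser | null;

function requireAuth(verify: TokenVerifier) {
  return (req: Req, res: Res, next: Next): void => {
    const header = req.headers['authorization'] ?? '';
    // Expect "Authorization: Bearer <token>".
    const token = header.startsWith('Bearer ') ? header.slice('Bearer '.length) : '';
    const user = token !== '' ? verify(token) : null;
    if (user === null) {
      res.status(401).json({ error: 'Unauthorized' });
      return;
    }
    req.user = user; // route handlers downstream read req.user
    next();
  };
}
```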

---

## Background Processing

### Worker Architecture

```text
+-------------------+     +-------------------+     +-------------------+
|   API Server      |     |   Redis           |     |   Worker Process  |
|   (Queue Producer)|     |   (Job Storage)   |     |   (Consumer)      |
+--------+----------+     +--------+----------+     +--------+----------+
         |                         ^                         |
         |       Add Job           |     Poll/Process        |
         +------------------------>+<------------------------+
                                   |
                                   |
         +-------------------------+-------------------------+
         |                         |                         |
         v                         v                         v
+--------+----------+     +--------+----------+     +--------+----------+
|   Flyer Worker    |     |   Email Worker    |     |   Analytics       |
|   Concurrency: 1  |     |   Concurrency: 10 |     |   Worker          |
+-------------------+     +-------------------+     |   Concurrency: 1  |
                                                    +-------------------+
```

### Job Lifecycle

1. **Queued**: Job added to queue with data payload
2. **Active**: Worker picks up job and begins processing
3. **Completed**: Job finishes successfully
4. **Failed**: Job encounters error, may retry
5. **Delayed**: Job waiting for retry backoff

### Retry Strategy

Jobs use exponential backoff for retries:

```text
Attempt 1: Immediate
Attempt 2: Initial delay (e.g., 5 seconds)
Attempt 3: 2x delay (e.g., 10 seconds)
Attempt 4: 4x delay (e.g., 20 seconds)
...
```
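
The schedule above doubles the delay on each retry. A minimal sketch of that arithmetic follows; `retryDelayMs` is a hypothetical helper written for illustration and is not a BullMQ function.

```typescript
// Delay before a given attempt, following the doubling schedule shown above.
// `attempt` is 1-based: attempt 1 runs immediately, attempt 2 waits the
// initial delay, and each later attempt waits twice as long as the previous one.
function retryDelayMs(attempt: number, initialDelayMs: number): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be a positive integer');
  }
  return attempt === 1 ? 0 : initialDelayMs * Math.pow(2, attempt - 2);
}

// With a 5-second initial delay: 0 ms, 5000 ms, 10000 ms, 20000 ms.
const schedule = [1, 2, 3, 4].map((attempt) => retryDelayMs(attempt, 5000));
```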

### Scheduled Jobs (ADR-037)

| Schedule              | Job              | Purpose                                    |
| --------------------- | ---------------- | ------------------------------------------ |
| Daily 2:00 AM         | Analytics Report | Generate daily usage statistics            |
| Weekly Sunday 3:00 AM | Weekly Analytics | Generate weekly summary reports            |
| Every 6 hours         | Token Cleanup    | Remove expired refresh tokens              |
| Every hour            | Expiry Alerts    | Check and send pantry expiry notifications |
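
These schedules can be written as standard five-field cron expressions (minute, hour, day of month, month, day of week). The sketch below is an assumption-laden illustration: the `scheduledJobs` array is hypothetical, the expressions assume jobs fire at minute 0 in the server's local time zone, and the project's actual expressions may differ.

```typescript
// Hypothetical mapping of the schedule table to 5-field cron expressions
// (minute hour day-of-month month day-of-week).
const scheduledJobs = [
  { job: 'Analytics Report', schedule: 'Daily 2:00 AM', cron: '0 2 * * *' },
  { job: 'Weekly Analytics', schedule: 'Weekly Sunday 3:00 AM', cron: '0 3 * * 0' },
  { job: 'Token Cleanup', schedule: 'Every 6 hours', cron: '0 */6 * * *' },
  { job: 'Expiry Alerts', schedule: 'Every hour', cron: '0 * * * *' },
];

// Sanity helper: a standard cron expression has exactly five fields.
function cronFieldCount(expression: string): number {
  return expression.trim().split(/\s+/).length;
}
```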

---

## Deployment Architecture

### Environment Overview

```text
+-----------------------------------------------------------------------------------+
|                                   DEVELOPMENT                                      |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +-----------------------------------+     +-----------------------------------+  |
|  |        Windows Host Machine       |     |      Linux Dev Container          |  |
|  |  - VS Code                        |     |  (flyer-crawler-dev)              |  |
|  |  - Podman Desktop                 +---->+  - Node.js 22                     |  |
|  |  - Git                            |     |  - PM2 (process manager)          |  |
|  +-----------------------------------+     |  - PostgreSQL 16                  |  |
|                                            |  - Redis 7                        |  |
|                                            |  - Bugsink (local)                |  |
|                                            |  - Logstash (log aggregation)     |  |
|                                            +-----------------------------------+  |
+-----------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------+
|                                   TEST SERVER                                      |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +-----------------------------------+     +-----------------------------------+  |
|  |      NGINX Reverse Proxy          |     |      Application Server           |  |
|  | flyer-crawler-test.projectium.com |     |  - PM2 Process Manager            |  |
|  |  - SSL/TLS (Let's Encrypt)        +---->+  - Node.js 22                     |  |
|  |  - Rate Limiting                  |     |  - PostgreSQL 16                  |  |
|  +-----------------------------------+     |  - Redis 7                        |  |
|                                            +-----------------------------------+  |
+-----------------------------------------------------------------------------------+

+-----------------------------------------------------------------------------------+
|                                   PRODUCTION                                       |
+-----------------------------------------------------------------------------------+
|                                                                                   |
|  +-----------------------------------+     +-----------------------------------+  |
|  |      NGINX Reverse Proxy          |     |      Application Server           |  |
|  | flyer-crawler.projectium.com      |     |  - PM2 Process Manager            |  |
|  |  - SSL/TLS (Let's Encrypt)        +---->+  - Node.js 22 (Clustered)         |  |
|  |  - Rate Limiting                  |     |  - PostgreSQL 16                  |  |
|  |  - Gzip Compression               |     |  - Redis 7                        |  |
|  +-----------------------------------+     +-----------------------------------+  |
|                                                                                   |
|  +-----------------------------------+                                            |
|  |      Monitoring                   |                                            |
|  |  - Bugsink (Error Tracking)       |                                            |
|  |  - Logstash (Log Aggregation)     |                                            |
|  +-----------------------------------+                                            |
+-----------------------------------------------------------------------------------+
```

### Deployment Pipeline (ADR-017)

```text
+------------+     +------------+     +------------+     +------------+
|   Push to  |     |   Gitea    |     |   Build &  |     |   Deploy   |
|   main     +---->+   Actions  +---->+   Test     +---->+   to Prod  |
+------------+     +------------+     +------------+     +------------+
                                             |
                                             v
                                      +------+------+
                                      |   Type      |
                                      |   Check     |
                                      +------+------+
                                             |
                                             v
                                      +------+------+
                                      |   Unit      |
                                      |   Tests     |
                                      +------+------+
                                             |
                                             v
                                      +------+------+
                                      |   Build     |
                                      |   Assets    |
                                      +-------------+
```

### Server Paths

| Environment | Web Root                                      | Data Storage                                          | Flyer Images                                               |
| ----------- | --------------------------------------------- | ----------------------------------------------------- | ---------------------------------------------------------- |
| Production  | `/var/www/flyer-crawler.projectium.com/`      | `/var/www/flyer-crawler.projectium.com/uploads/`      | `/var/www/flyer-crawler.projectium.com/flyer-images/`      |
| Test        | `/var/www/flyer-crawler-test.projectium.com/` | `/var/www/flyer-crawler-test.projectium.com/uploads/` | `/var/www/flyer-crawler-test.projectium.com/flyer-images/` |
| Development | Container-local                               | Container-local                                       | `/app/public/flyer-images/`                                |

Flyer images are served by NGINX as static files at `/flyer-images/` with 7-day browser caching.

---

## Design Principles and ADRs
|
|
|
|
The system architecture is governed by Architecture Decision Records (ADRs). Key decisions include:
|
|
|
|
### Core Infrastructure
|
|
|
|
| ADR | Title | Status |
|
|
| ------- | ------------------------------------ | -------- |
|
|
| ADR-001 | Standardized Error Handling | Accepted |
|
|
| ADR-002 | Standardized Transaction Management | Accepted |
|
|
| ADR-007 | Configuration and Secrets Management | Accepted |
|
|
| ADR-020 | Health Checks and Probes | Accepted |
|
|
|
|
### API and Integration

| ADR     | Title                         | Status           |
| ------- | ----------------------------- | ---------------- |
| ADR-003 | Standardized Input Validation | Accepted         |
| ADR-008 | API Versioning Strategy       | Phase 1 Complete |
| ADR-022 | Real-time Notification System | Proposed         |
| ADR-028 | API Response Standardization  | Implemented      |

**Implementation Guide**: [API Versioning Infrastructure](./api-versioning-infrastructure.md) (Phase 2)

### Security

| ADR     | Title                   | Status                |
| ------- | ----------------------- | --------------------- |
| ADR-016 | API Security Hardening  | Accepted              |
| ADR-032 | Rate Limiting Strategy  | Accepted              |
| ADR-048 | Authentication Strategy | Partially Implemented |

### Architecture Patterns

| ADR     | Title                              | Status   |
| ------- | ---------------------------------- | -------- |
| ADR-034 | Repository Pattern Standards       | Accepted |
| ADR-035 | Service Layer Architecture         | Accepted |
| ADR-036 | Event Bus and Pub/Sub Pattern      | Accepted |
| ADR-041 | AI/Gemini Integration Architecture | Accepted |

### Operations

| ADR     | Title                           | Status                |
| ------- | ------------------------------- | --------------------- |
| ADR-006 | Background Job Processing       | Accepted              |
| ADR-014 | Containerization and Deployment | Partially Implemented |
| ADR-037 | Scheduled Jobs and Cron Pattern | Accepted              |
| ADR-038 | Graceful Shutdown Pattern       | Accepted              |

### Observability

| ADR     | Title                             | Status   |
| ------- | --------------------------------- | -------- |
| ADR-004 | Structured Logging                | Accepted |
| ADR-015 | APM and Error Tracking            | Proposed |
| ADR-050 | PostgreSQL Function Observability | Accepted |

**Full ADR Index**: [docs/adr/index.md](../adr/index.md)

---

## Key Files Reference

### Configuration Files

| File                     | Purpose                                                |
| ------------------------ | ------------------------------------------------------ |
| `server.ts`              | Express application setup and middleware configuration |
| `src/config/env.ts`      | Environment variable validation (Zod schema)           |
| `src/config/passport.ts` | Authentication strategies (Local, JWT, OAuth)          |
| `ecosystem.config.cjs`   | PM2 process manager configuration                      |
| `vite.config.ts`         | Vite build and dev server configuration                |

### Route Files

| File                          | API Prefix     |
| ----------------------------- | -------------- |
| `src/routes/auth.routes.ts`   | `/api/auth`    |
| `src/routes/user.routes.ts`   | `/api/users`   |
| `src/routes/flyer.routes.ts`  | `/api/flyers`  |
| `src/routes/recipe.routes.ts` | `/api/recipes` |
| `src/routes/deals.routes.ts`  | `/api/deals`   |
| `src/routes/store.routes.ts`  | `/api/stores`  |
| `src/routes/admin.routes.ts`  | `/api/admin`   |
| `src/routes/health.routes.ts` | `/api/health`  |

### Service Files

| File                                            | Purpose                                 |
| ----------------------------------------------- | --------------------------------------- |
| `src/services/flyerProcessingService.server.ts` | Flyer processing pipeline orchestration |
| `src/services/flyerAiProcessor.server.ts`       | AI extraction for flyers                |
| `src/services/aiService.server.ts`              | Google Gemini AI integration            |
| `src/services/cacheService.server.ts`           | Redis caching abstraction               |
| `src/services/emailService.server.ts`           | Email sending                           |
| `src/services/queues.server.ts`                 | BullMQ queue definitions                |
| `src/services/queueService.server.ts`           | Queue management and scheduling         |
| `src/services/workers.server.ts`                | BullMQ worker definitions               |
| `src/services/websocketService.server.ts`       | Real-time WebSocket notifications       |
| `src/services/receiptService.server.ts`         | Receipt scanning and OCR                |
| `src/services/upcService.server.ts`             | UPC barcode lookup                      |
| `src/services/expiryService.server.ts`          | Pantry expiry tracking                  |
| `src/services/geocodingService.server.ts`       | Address geocoding                       |
| `src/services/analyticsService.server.ts`       | Analytics and reporting                 |
| `src/services/monitoringService.server.ts`      | Health monitoring                       |
| `src/services/barcodeService.server.ts`         | Barcode detection                       |
| `src/services/logger.server.ts`                 | Structured logging (Pino)               |
| `src/services/redis.server.ts`                  | Redis connection management             |
| `src/services/sentry.server.ts`                 | Error tracking (Sentry/Bugsink)         |

### Database Files

| File                                    | Purpose                                      |
| --------------------------------------- | -------------------------------------------- |
| `src/services/db/connection.db.ts`      | Database pool and transaction management     |
| `src/services/db/errors.db.ts`          | Database error types                         |
| `src/services/db/index.db.ts`           | Repository exports                           |
| `src/services/db/user.db.ts`            | User repository                              |
| `src/services/db/flyer.db.ts`           | Flyer repository                             |
| `src/services/db/store.db.ts`           | Store repository                             |
| `src/services/db/storeLocation.db.ts`   | Store location repository                    |
| `src/services/db/recipe.db.ts`          | Recipe repository                            |
| `src/services/db/category.db.ts`        | Category repository                          |
| `src/services/db/personalization.db.ts` | Master items and personalization             |
| `src/services/db/shopping.db.ts`        | Shopping lists repository                    |
| `src/services/db/deals.db.ts`           | Deals and best prices repository             |
| `src/services/db/price.db.ts`           | Price history repository                     |
| `src/services/db/receipt.db.ts`         | Receipt repository                           |
| `src/services/db/upc.db.ts`             | UPC scan history repository                  |
| `src/services/db/expiry.db.ts`          | Expiry tracking repository                   |
| `src/services/db/gamification.db.ts`    | Achievements repository                      |
| `src/services/db/budget.db.ts`          | Budget repository                            |
| `src/services/db/reaction.db.ts`        | User reactions repository                    |
| `src/services/db/notification.db.ts`    | Notifications repository                     |
| `src/services/db/address.db.ts`         | Address repository                           |
| `src/services/db/admin.db.ts`           | Admin operations repository                  |
| `src/services/db/conversion.db.ts`      | Unit conversion repository                   |
| `src/services/db/flyerLocation.db.ts`   | Flyer locations repository                   |
| `sql/master_schema_rollup.sql`          | Complete database schema (for test DB setup) |
| `sql/initial_schema.sql`                | Fresh installation schema                    |

### Type Definitions

| File                    | Purpose                      |
| ----------------------- | ---------------------------- |
| `src/types.ts`          | Core entity type definitions |
| `src/types/job-data.ts` | BullMQ job payload types     |

---

## Additional Resources

- **API Documentation**: Available at `/docs/api-docs` in development environments
- **Testing Guide**: [docs/tests/](../tests/)
- **Getting Started**: [docs/getting-started/](../getting-started/)
- **Operations Guide**: [docs/operations/](../operations/)
- **Authentication Details**: [docs/architecture/AUTHENTICATION.md](./AUTHENTICATION.md)
- **Database Schema**: [docs/architecture/DATABASE.md](./DATABASE.md)
- **WebSocket Usage**: [docs/architecture/WEBSOCKET_USAGE.md](./WEBSOCKET_USAGE.md)

---

_This document is maintained as part of the Flyer Crawler project documentation. For updates, contact the development team or submit a pull request._