
Flyer Crawler - Grocery AI Analyzer

Flyer Crawler is a web application that uses the Google Gemini AI to extract, analyze, and manage data from grocery store flyers. Users can upload flyer images or PDFs, and the application will automatically identify items, prices, and sale dates, storing the structured data in a PostgreSQL database for historical analysis, price tracking, and personalized deal alerts.

We are building an app to help people save money by finding good deals that are only advertised in store flyers and ads. The primary purpose of the site is therefore to make uploading flyers as easy and accurate as possible, and to store each user's needs so that sales can be matched to them.
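As a rough illustration of that flow, the sketch below sends a flyer image to Gemini through the @google/genai SDK and asks for structured JSON. The model name, prompt wording, and helper name are assumptions for illustration only; the project's actual extraction service may differ.

    // Hypothetical sketch of the extraction step, not the project's actual service.
    import { GoogleGenAI } from "@google/genai";
    import { readFileSync } from "node:fs";

    // Env var name comes from the secrets list below; in a Vite frontend it
    // would be read via import.meta.env instead of process.env.
    const ai = new GoogleGenAI({ apiKey: process.env.VITE_GOOGLE_GENAI_API_KEY });

    async function extractFlyer(imagePath: string): Promise<unknown> {
      const imageBase64 = readFileSync(imagePath).toString("base64");

      const response = await ai.models.generateContent({
        model: "gemini-2.5-flash", // assumed model name; substitute the one the app uses
        contents: [
          {
            role: "user",
            parts: [
              { inlineData: { mimeType: "image/png", data: imageBase64 } },
              {
                text:
                  "Extract the store name, sale start and end dates, and a list of " +
                  "items with prices and quantities from this grocery flyer. Respond as JSON.",
              },
            ],
          },
        ],
      });

      // In @google/genai, response.text holds the model's text output.
      return JSON.parse(response.text ?? "{}");
    }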

Features

  • AI-Powered Data Extraction: Upload PNG, JPG, or PDF flyers to automatically extract store names, sale dates, and a detailed list of items with prices and quantities.
  • Bulk Import: Process multiple flyers at once with a summary report of successes, skips (duplicates), and errors.
  • Database Integration: All extracted data is saved to a PostgreSQL database, enabling long-term persistence and analysis.
  • Personalized Watchlist: Authenticated users can create a "watchlist" of specific grocery items they want to track.
  • Active Deal Alerts: The app highlights current sales on your watched items from all valid flyers in the database.
  • Price History Charts: Visualize the price trends of your watched items over time.
  • Shopping List Management: Users can create multiple shopping lists, add items from flyers or their watchlist, and track purchased items.
  • User Authentication & Management: Secure user sign-up, login, and profile management, including a secure account deletion process.
  • Dynamic UI: A responsive interface with dark mode and a choice between metric/imperial unit systems.

Tech Stack

  • Frontend: React, TypeScript, Tailwind CSS
  • AI: Google Gemini API (@google/genai)
  • Backend: Node.js with Express
  • Database: PostgreSQL
  • Authentication: Passport.js
  • UI Components: Recharts for charts

Required Secrets & Configuration

This project is configured to run in a CI/CD environment and does not use .env files. All configuration and secrets must be provided as environment variables. For deployments using the included Gitea workflows, these must be configured as repository secrets in your Gitea instance.

  • DB_HOST, DB_USER, DB_PASSWORD: Credentials for your PostgreSQL server. The port is assumed to be 5432.
  • DB_DATABASE_PROD: The name of your production database.
  • REDIS_PASSWORD_PROD: The password for your production Redis instance.
  • REDIS_PASSWORD_TEST: The password for your test Redis instance.
  • JWT_SECRET: A long, random, and secret string for signing authentication tokens.
  • VITE_GOOGLE_GENAI_API_KEY: Your Google Gemini API key.
  • GOOGLE_MAPS_API_KEY: Your Google Maps Geocoding API key.
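As a minimal sketch (the module layout and the requireEnv helper are hypothetical, not part of the project), these variables might be read and validated in one place like this:

    // Hypothetical config module: fail fast if a required variable is missing.
    function requireEnv(name: string): string {
      const value = process.env[name];
      if (!value) {
        throw new Error(`Missing required environment variable: ${name}`);
      }
      return value;
    }

    export const config = {
      db: {
        host: requireEnv("DB_HOST"),
        port: 5432, // the port is assumed to be 5432 (see above)
        user: requireEnv("DB_USER"),
        password: requireEnv("DB_PASSWORD"),
        database: requireEnv("DB_DATABASE_PROD"),
      },
      jwtSecret: requireEnv("JWT_SECRET"),
      geminiApiKey: requireEnv("VITE_GOOGLE_GENAI_API_KEY"),
      googleMapsApiKey: requireEnv("GOOGLE_MAPS_API_KEY"),
    };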

Setup and Installation

Step 1: Set Up PostgreSQL Database

  1. Set up a PostgreSQL database instance.
  2. Run the Database Schema:
    • Connect to your database using a tool like psql or DBeaver.
    • Open sql/schema.sql.txt, copy its entire contents, and execute it against your database.
    • This will create all necessary tables, functions, and relationships.

Step 2: Install Dependencies and Run the Application

  1. Install Dependencies:

    npm install
    
  2. Run the Application:

    npm run start:prod
    

Step 3: Seed Development Users (Optional)

To create the initial admin@example.com and user@example.com accounts, you can run the seed script:

npm run seed

After running, you may need to restart your IDE's TypeScript server to pick up the changes.

NGINX MIME Types Issue

If NGINX serves .mjs files with the wrong Content-Type, edit the MIME type map:

sudo nano /etc/nginx/mime.types

Change

application/javascript js;

to

application/javascript js mjs;

Then test the configuration and reload NGINX:

sudo nginx -t
sudo systemctl reload nginx

Note: in practice, the proper fix was to make this same mjs mapping change in the site configuration file at /etc/nginx/sites-available/flyer-crawler.projectium.com rather than in the global mime.types file.

OAuth Setup

  1. Get Google OAuth Credentials

This is a crucial step that you must do outside the codebase:

Go to the Google Cloud Console.

Create a new project (or select an existing one).

In the navigation menu, go to APIs & Services > Credentials.

Click Create Credentials > OAuth client ID.

Select Web application as the application type.

Under Authorized redirect URIs, click ADD URI and enter the URL where Google will redirect users back to your server. For local development, this will be: http://localhost:3001/api/auth/google/callback.

Click Create. You will be given a Client ID and a Client Secret.

  2. Get GitHub OAuth Credentials

You'll need to obtain a Client ID and Client Secret from GitHub:

Go to your GitHub profile settings.

Navigate to Developer settings > OAuth Apps.

Click New OAuth App.

Fill in the required fields:

  • Application name: A descriptive name for your app (e.g., "Flyer Crawler").
  • Homepage URL: The base URL of your application (e.g., http://localhost:5173 for local development).
  • Authorization callback URL: Where GitHub will redirect users after they authorize your app. For local development, this will be: http://localhost:3001/api/auth/github/callback.

Click Register application.

You will be given a Client ID and a Client Secret.
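With both sets of credentials in hand, they get wired into Passport strategies on the backend. The sketch below is a hypothetical wiring, assuming the passport-google-oauth20 and passport-github2 packages and made-up environment variable names (GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GITHUB_CLIENT_ID, GITHUB_CLIENT_SECRET); the project's actual strategy setup and variable names may differ.

    // Hypothetical Passport wiring for the credentials obtained above.
    // Package choices and env var names are assumptions, not the project's confirmed setup.
    import passport from "passport";
    import { Strategy as GoogleStrategy } from "passport-google-oauth20";
    import { Strategy as GitHubStrategy } from "passport-github2";

    passport.use(
      new GoogleStrategy(
        {
          clientID: process.env.GOOGLE_CLIENT_ID!,
          clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
          callbackURL: "http://localhost:3001/api/auth/google/callback",
        },
        (accessToken, refreshToken, profile, done) => {
          // Look up or create the user record for this Google profile here.
          done(null, profile);
        }
      )
    );

    passport.use(
      new GitHubStrategy(
        {
          clientID: process.env.GITHUB_CLIENT_ID!,
          clientSecret: process.env.GITHUB_CLIENT_SECRET!,
          callbackURL: "http://localhost:3001/api/auth/github/callback",
        },
        (accessToken: string, refreshToken: string, profile: unknown, done: (err: unknown, user?: unknown) => void) => {
          // Look up or create the user record for this GitHub profile here.
          done(null, profile);
        }
      )
    );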

Connecting to Postgres on projectium.com

psql -h localhost -U flyer_crawler_user -d "flyer-crawler-prod" -W

To check the PostgreSQL and PostGIS versions:

flyer-crawler-prod=> SELECT version();
PostgreSQL 14.19 (Ubuntu 14.19-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0, 64-bit

flyer-crawler-prod=> SELECT PostGIS_Full_Version();
POSTGIS="3.2.0 c3e3cc0" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"

Production Postgres Setup

Part 1: Production Database Setup

This database will be the live, persistent storage for your application.

Step 1: Install PostgreSQL (if not already installed)

First, ensure PostgreSQL is installed on your server:

sudo apt update
sudo apt install postgresql postgresql-contrib

Step 2: Create the Production Database and User

It's best practice to create a dedicated, non-superuser role for your application to connect with.

Switch to the postgres system user to get superuser access to the database:

sudo -u postgres psql

Inside the psql shell, run the following SQL commands. Remember to replace 'a_very_strong_password' with a secure password that you will manage with your secrets tool (this project does not use .env files).

-- Create a new role (user) for your application
CREATE ROLE flyer_crawler_user WITH LOGIN PASSWORD 'a_very_strong_password';

-- Create the production database and assign ownership to the new user
CREATE DATABASE "flyer-crawler-prod" WITH OWNER = flyer_crawler_user;

-- Connect to the new database to install extensions within it.
\c "flyer-crawler-prod"

-- Install the required extensions as a superuser. This only needs to be done once.
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Exit the psql shell
\q

Step 3: Apply the Master Schema

Now, populate your new database with all the tables, functions, and initial data using the master_schema_rollup.sql file.

Navigate to your project's root directory on the server, then run the following command to execute the master schema script against your new production database. You will be prompted for the password you created in the previous step.

psql -U flyer_crawler_user -d "flyer-crawler-prod" -f sql/master_schema_rollup.sql

This single command creates all tables, extensions (pg_trgm, postgis), functions, and triggers, and seeds essential data like categories and master items.

Step 4: Seed the Admin Account (If Needed)

Your application has a separate script to create the initial admin user. To run it, first set the required environment variables in your shell session:

# Set variables for the current session
export DB_USER=flyer_crawler_user DB_PASSWORD=your_password DB_NAME="flyer-crawler-prod" ...

# Run the seeding script
npx tsx src/db/seed_admin_account.ts

Your production database is now ready!

Part 2: Test Database Setup (for CI/CD)

Note: this part describes the original per-run create/teardown approach. It has since been replaced by the permanent test database described in Part 3 below and is kept here for reference and for manual setup if you ever need to run tests outside the CI pipeline.

The process the Gitea workflow (deploy.yml) followed was:

  1. Setup (sql/test_setup.sql): As the postgres superuser, it runs sql/test_setup.sql, which creates a temporary role named test_runner and a separate database named "flyer-crawler-test" owned by test_runner.
  2. Schema Application (src/tests/setup/global-setup.ts): The test runner (vitest) executes the global-setup.ts file. This script connects to the "flyer-crawler-test" database using the temporary credentials, then runs the same sql/master_schema_rollup.sql file, ensuring your test database has the exact same structure as production.
  3. Test Execution: Your tests run against this clean, isolated "flyer-crawler-test" database.
  4. Teardown (sql/test_teardown.sql): After tests complete (whether they pass or fail), the if: always() step in the workflow ensures that sql/test_teardown.sql is executed. This script terminates any lingering connections to the test database, drops the "flyer-crawler-test" database completely, and drops the test_runner role.

Part 3: Test Database Setup (for CI/CD and Local Testing)

Your Gitea workflow and local test runner rely on a permanent test database. This database needs to be created once on your server. The test runner will automatically reset the schema inside it before every test run.

Step 1: Create the Test Database

On your server, switch to the postgres system user to get superuser access:

sudo -u postgres psql

Inside the psql shell, create a new database. We will assign ownership to the same flyer_crawler_user that your application uses. This user needs to be the owner to have permission to drop and recreate the schema during testing.

-- Create the test database and assign ownership to your existing application user
CREATE DATABASE "flyer-crawler-test" WITH OWNER = flyer_crawler_user;

-- Connect to the newly created test database
\c "flyer-crawler-test"

-- Install the required extensions as a superuser. This only needs to be done once.
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Grant ownership of the public schema within this database to your application user.
-- This is CRITICAL for allowing the test runner to drop and recreate the schema.
ALTER SCHEMA public OWNER TO flyer_crawler_user;

-- Exit the psql shell
\q

Step 2: Configure Gitea Secrets for Testing

Your CI pipeline needs to know how to connect to this test database. Ensure the following secrets are set in your Gitea repository settings:

  • DB_HOST: The hostname of your database server (e.g., localhost).
  • DB_PORT: The port for your database (e.g., 5432).
  • DB_USER: The user for the database (e.g., flyer_crawler_user).
  • DB_PASSWORD: The password for the database user.

The workflow file (.gitea/workflows/deploy.yml) is configured to use these secrets and will automatically connect to the "flyer-crawler-test" database when it runs the npm test command.

How the Test Workflow Works

The CI pipeline no longer uses sudo or creates/destroys the database on each run. Instead, the process is now:

  1. Setup: The vitest global setup script (src/tests/setup/global-setup.ts) connects to the permanent "flyer-crawler-test" database.
  2. Schema Reset: It executes sql/drop_tables.sql (which runs DROP SCHEMA public CASCADE) to completely wipe all tables, functions, and triggers.
  3. Schema Application: It then immediately executes sql/master_schema_rollup.sql to build a fresh, clean schema and seed initial data.
  4. Test Execution: Your tests run against this clean, isolated schema.

This approach is faster, more reliable, and removes the need for sudo access within the CI pipeline.
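As a rough sketch of what such a global setup script does (assuming the pg driver and the file paths named above; the real src/tests/setup/global-setup.ts may differ in detail):

    // Hypothetical sketch of the vitest global setup described above.
    import { readFileSync } from "node:fs";
    import { Client } from "pg";

    export default async function globalSetup(): Promise<void> {
      const client = new Client({
        host: process.env.DB_HOST,
        port: Number(process.env.DB_PORT ?? 5432),
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
        database: "flyer-crawler-test",
      });

      await client.connect();
      try {
        // Schema reset: wipe everything in the public schema.
        await client.query(readFileSync("sql/drop_tables.sql", "utf8"));
        // Schema application: rebuild tables, functions, triggers, and seed data.
        await client.query(readFileSync("sql/master_schema_rollup.sql", "utf8"));
      } finally {
        await client.end();
      }
    }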

PM2 Log Rotation

On the server, the pm2-logrotate module is installed and configured under the gitea-runner user:

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14

This rotates the logs of the PM2-managed processes (flyer-crawler-api, flyer-crawler-worker, and flyer-crawler-analytics-worker) once they reach 10 MB and keeps 14 rotated files. The remaining pm2-logrotate settings (compress false, dateFormat YYYY-MM-DD_HH-mm-ss, workerInterval 30, rotateInterval 0 0 * * *, rotateModule true) are left at their defaults.

Dev Server Setup

Here are the steps to set up the development environment on Windows using Podman with an Ubuntu container:

  1. Install Prerequisites on Windows
    • Install WSL 2: Podman on Windows relies on the Windows Subsystem for Linux. Install it by running wsl --install in an administrator PowerShell.
    • Install Podman Desktop: Download and install Podman Desktop for Windows.

  2. Set Up Podman
    • Initialize Podman: Launch Podman Desktop. It will automatically set up its WSL 2 machine.
    • Start Podman: Ensure the Podman machine is running from the Podman Desktop interface.

  3. Set Up the Ubuntu Container

  • Pull Ubuntu Image: Open a PowerShell or command prompt and pull the latest Ubuntu image:

    podman pull ubuntu:latest

  • Create a Podman Volume: Create a volume to persist node_modules and avoid installing them every time the container starts:

    podman volume create node_modules_cache

  • Run the Ubuntu Container: Start a new container with the project directory mounted and the necessary ports forwarded.
    • Open a terminal in your project's root directory on Windows.
    • Run the following command, replacing D:\gitea\flyer-crawler.projectium.com\flyer-crawler.projectium.com with the full path to your project:

podman run -it -p 3001:3001 -p 5173:5173 --name flyer-dev -v "D:\gitea\flyer-crawler.projectium.com\flyer-crawler.projectium.com:/app" -v "node_modules_cache:/app/node_modules" ubuntu:latest

  • -p 3001:3001: Forwards the backend server port.
  • -p 5173:5173: Forwards the Vite frontend server port.
  • --name flyer-dev: Names the container for easy reference.
  • -v "...:/app": Mounts your project directory into the container at /app.
  • -v "node_modules_cache:/app/node_modules": Mounts the named volume for node_modules.

  4. Configure the Ubuntu Environment (you are now inside the Ubuntu container's shell)
    • Update Package Lists: apt-get update
    • Install Dependencies: Install curl, git, and nodejs (which includes npm):

      apt-get install -y curl git
      curl -sL https://deb.nodesource.com/setup_20.x | bash -
      apt-get install -y nodejs

    • Navigate to Project Directory: cd /app
    • Install Project Dependencies: npm install

  5. Run the Development Server
    • Start the Application: npm run dev

  6. Access the Application
    • The backend is available at http://localhost:3001 and the Vite frontend at http://localhost:5173 (the ports forwarded above).

Managing the Environment

  • Stopping the Container: Press Ctrl+C in the container terminal, then type exit.
  • Restarting the Container: podman start -a -i flyer-dev

For me:

cd /mnt/d/gitea/flyer-crawler.projectium.com/flyer-crawler.projectium.com
podman run -it -p 3001:3001 -p 5173:5173 --name flyer-dev -v "$(pwd):/app" -v "node_modules_cache:/app/node_modules" ubuntu:latest

Rate Limiting

Requests to the Gemini API are rate limited so the app respects the AI service's limits, making it more stable and robust. You can adjust the GEMINI_RPM environment variable in your production environment as needed without changing the code.
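A minimal sketch of how such a limiter could work, assuming GEMINI_RPM means requests per minute; the wrapper name and queueing details are illustrative, not the project's actual implementation:

    // Hypothetical requests-per-minute limiter driven by the GEMINI_RPM env var.
    const rpm = Number(process.env.GEMINI_RPM ?? 15); // assumed default of 15 requests/minute
    const minIntervalMs = 60_000 / rpm;

    let nextAllowedAt = 0;

    // Wraps any async call so that calls are spaced at least minIntervalMs apart.
    export async function withGeminiRateLimit<T>(call: () => Promise<T>): Promise<T> {
      const now = Date.now();
      const waitMs = Math.max(0, nextAllowedAt - now);
      nextAllowedAt = Math.max(now, nextAllowedAt) + minIntervalMs;

      if (waitMs > 0) {
        await new Promise((resolve) => setTimeout(resolve, waitMs));
      }
      return call();
    }

    // Usage example (extractFlyer is hypothetical):
    // const data = await withGeminiRateLimit(() => extractFlyer("flyer.png"));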