
Flyer Crawler - Grocery AI Analyzer

Flyer Crawler is a web application that uses the Google Gemini AI to extract, analyze, and manage data from grocery store flyers. Users can upload flyer images or PDFs, and the application will automatically identify items, prices, and sale dates, storing the structured data in a PostgreSQL database for historical analysis, price tracking, and personalized deal alerts.

We are building this app to help people save money by surfacing good deals that are only advertised in store flyers and ads. The primary goals of the site are therefore to make uploading flyers as easy and accurate as possible, and to store each user's needs so that sales can be matched to them.

Features

  • AI-Powered Data Extraction: Upload PNG, JPG, or PDF flyers to automatically extract store names, sale dates, and a detailed list of items with prices and quantities.
  • Bulk Import: Process multiple flyers at once with a summary report of successes, skips (duplicates), and errors.
  • Database Integration: All extracted data is saved to a PostgreSQL database, enabling long-term persistence and analysis.
  • Personalized Watchlist: Authenticated users can create a "watchlist" of specific grocery items they want to track.
  • Active Deal Alerts: The app highlights current sales on your watched items from all valid flyers in the database.
  • Price History Charts: Visualize the price trends of your watched items over time.
  • Shopping List Management: Users can create multiple shopping lists, add items from flyers or their watchlist, and track purchased items.
  • User Authentication & Management: Secure user sign-up, login, and profile management, including a secure account deletion process.
  • Dynamic UI: A responsive interface with dark mode and a choice between metric/imperial unit systems.

Tech Stack

  • Frontend: React, TypeScript, Tailwind CSS
  • AI: Google Gemini API (@google/genai)
  • Backend: Node.js with Express
  • Database: PostgreSQL
  • Authentication: Passport.js
  • UI Components: Recharts for charts

Required Environment Variables & Setup

This project requires several secret keys to function. Create a .env file in the root of the project; see the env.example file for a complete template.

  • For the AI Service: VITE_GOOGLE_GENAI_API_KEY. This is your public-facing Google Gemini API key.
  • For the Database: You will need to provide connection details for your PostgreSQL database, such as DB_USER, DB_HOST, DB_DATABASE, DB_PASSWORD, and DB_PORT.
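As a sketch, a starter .env covering these variables could be generated like this. Every value below is a placeholder, not a real credential; env.example remains the authoritative template:

```shell
# Sketch: generate a starter .env with placeholder values.
# All values below are examples -- substitute your real key and credentials.
cat > .env <<'EOF'
VITE_GOOGLE_GENAI_API_KEY=your-gemini-api-key
DB_USER=flyer_crawler_user
DB_HOST=localhost
DB_DATABASE=flyer-crawler-prod
DB_PASSWORD=change-me
DB_PORT=5432
EOF
```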

Setup and Installation

Step 1: Set Up PostgreSQL Database

  1. Set up a PostgreSQL database instance.
  2. Run the Database Schema:
    • Connect to your database using a tool like psql or DBeaver.
    • Open sql/schema.sql.txt, copy its entire contents, and execute it against your database.
    • This will create all necessary tables, functions, and relationships.

Step 2: Install Dependencies and Run the Application

  1. Install Dependencies:

    npm install
    
  2. Run the Application:

    npm run dev
    

Step 3: Seed Development Users (Optional)

To create the initial admin@example.com and user@example.com accounts, you can run the seed script:

npm run seed

After running, you may need to restart your IDE's TypeScript server to pick up the changes.

NGINX MIME Types Issue

NGINX may serve .mjs module files with the wrong MIME type, which causes browsers to refuse to execute them. Edit the global MIME map:

```bash
sudo nano /etc/nginx/mime.types
```

Change

```nginx
application/javascript js;
```

to

```nginx
application/javascript js mjs;
```

Then test the configuration and reload NGINX:

```bash
sudo nginx -t
sudo systemctl reload nginx
```

Note: the proper fix turned out to be making this change in /etc/nginx/sites-available/flyer-crawler.projectium.com rather than in the global mime.types file.
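For the per-site approach, a minimal sketch of the override inside the server block of /etc/nginx/sites-available/flyer-crawler.projectium.com might look like this; the location pattern is an illustrative assumption, not the project's actual config:

```nginx
# Illustrative per-site override: map .mjs to the JavaScript MIME type
location ~ \.mjs$ {
    types { application/javascript mjs; }
    # root and other directives are inherited from the enclosing server block
}
```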

OAuth Setup

  1. Get Google OAuth Credentials

This is a crucial step that you must do outside the codebase:

Go to the Google Cloud Console.

Create a new project (or select an existing one).

In the navigation menu, go to APIs & Services > Credentials.

Click Create Credentials > OAuth client ID.

Select Web application as the application type.

Under Authorized redirect URIs, click ADD URI and enter the URL where Google will redirect users back to your server. For local development, this will be: http://localhost:3001/api/auth/google/callback.

Click Create. You will be given a Client ID and a Client Secret.

Add these credentials to your .env file at the project root:

```plaintext
GOOGLE_CLIENT_ID="your-client-id-from-google"
GOOGLE_CLIENT_SECRET="your-client-secret-from-google"
```

  2. Get GitHub OAuth Credentials

You'll need to obtain a Client ID and a Client Secret from GitHub:

Go to your GitHub profile settings.

Navigate to Developer settings > OAuth Apps.

Click New OAuth App.

Fill in the required fields:

  • Application name: A descriptive name for your app (e.g., "Flyer Crawler").
  • Homepage URL: The base URL of your application (e.g., http://localhost:5173 for local development).
  • Authorization callback URL: This is where GitHub will redirect users after they authorize your app. For local development, this will be: http://localhost:3001/api/auth/github/callback.

Click Register application.

You will be given a Client ID and a Client Secret.

Add these credentials to your .env file at the project root:

```plaintext
GITHUB_CLIENT_ID="your-github-client-id"
GITHUB_CLIENT_SECRET="your-github-client-secret"
```

Connect to Postgres on projectium.com

```bash
psql -h localhost -U flyer_crawler_user -d "flyer-crawler-prod" -W
```

Check the PostgreSQL and PostGIS versions:

```
flyer-crawler-prod=> SELECT version();
 PostgreSQL 14.19 (Ubuntu 14.19-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0, 64-bit
(1 row)

flyer-crawler-prod=> SELECT PostGIS_Full_Version();
 POSTGIS="3.2.0 c3e3cc0" [EXTENSION] PGSQL="140" GEOS="3.10.2-CAPI-1.16.0" PROJ="8.2.1" LIBXML="2.9.12" LIBJSON="0.15" LIBPROTOBUF="1.3.3" WAGYU="0.5.0 (Internal)"
(1 row)
```

Production Postgres Setup

Part 1: Production Database Setup

This database will be the live, persistent storage for your application.

Step 1: Install PostgreSQL (if not already installed)

First, ensure PostgreSQL is installed on your server.

```bash
sudo apt update
sudo apt install postgresql postgresql-contrib
```

Step 2: Create the Production Database and User

It's best practice to create a dedicated, non-superuser role for your application to connect with.

Switch to the postgres system user to get superuser access to the database.

```bash
sudo -u postgres psql
```

Inside the psql shell, run the following SQL commands. Remember to replace 'a_very_strong_password' with a secure password that you will manage with a secrets tool or in your .env file.

```sql
-- Create a new role (user) for your application
CREATE ROLE flyer_crawler_user WITH LOGIN PASSWORD 'a_very_strong_password';

-- Create the production database and assign ownership to the new user
CREATE DATABASE "flyer-crawler-prod" WITH OWNER = flyer_crawler_user;

-- Exit the psql shell
\q
```

Step 3: Apply the Master Schema

Now, you'll populate your new database with all the tables, functions, and initial data. Your master_schema_rollup.sql file is perfect for this.

Navigate to your project's root directory on the server, then run the following command to execute the master schema script against your new production database. You will be prompted for the password you created in the previous step.

```bash
psql -U flyer_crawler_user -d "flyer-crawler-prod" -f sql/master_schema_rollup.sql
```

This single command creates all tables, extensions (pg_trgm, postgis), functions, and triggers, and seeds essential data like categories and master items.

Step 4: Seed the Admin Account

Your application has a separate script to create the initial admin user.

Ensure your .env file on the server is configured with the correct production database credentials (DB_USER, DB_PASSWORD, DB_DATABASE="flyer-crawler-prod", etc.), then run the admin seeding script using tsx:

```bash
npx tsx src/db/seed_admin_account.ts
```

Your production database is now ready! Your application can connect to it using the flyer_crawler_user role and the credentials in your .env file.

Part 2: Test Database Setup (for CI/CD)

Your Gitea workflow (deploy.yml) already automates the creation and teardown of the test database during the pipeline run. The steps below are for understanding what the workflow does and for manual setup if you ever need to run tests outside the CI pipeline.

The process your CI pipeline follows is:

  1. Setup (sql/test_setup.sql): As the postgres superuser, it runs sql/test_setup.sql, which creates a temporary role named test_runner and a separate database named "flyer-crawler-test" owned by test_runner.
  2. Schema Application (src/tests/setup/global-setup.ts): The test runner (vitest) executes the global-setup.ts file. This script connects to the "flyer-crawler-test" database using the temporary credentials, then runs the same sql/master_schema_rollup.sql file, ensuring your test database has the exact same structure as production.
  3. Test Execution: Your tests run against this clean, isolated "flyer-crawler-test" database.
  4. Teardown (sql/test_teardown.sql): After tests complete (whether they pass or fail), the if: always() step in your workflow ensures that sql/test_teardown.sql is executed. This script terminates any lingering connections to the test database, drops the "flyer-crawler-test" database completely, and drops the test_runner role.
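The lifecycle above could be sketched as workflow steps like these; the job layout, step names, and superuser invocation are illustrative assumptions and will differ from the real deploy.yml:

```yaml
# Illustrative sketch of the test-database lifecycle (not the actual deploy.yml)
jobs:
  test:
    steps:
      - name: Create test role and database
        run: sudo -u postgres psql -f sql/test_setup.sql

      # global-setup.ts applies sql/master_schema_rollup.sql before tests run
      - name: Run tests
        run: npx vitest run

      - name: Tear down test database
        if: always()  # executes whether tests pass or fail
        run: sudo -u postgres psql -f sql/test_teardown.sql
```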
