# Claude Code Project Instructions

## Session Startup Checklist

**IMPORTANT**: At the start of every session, perform these steps:
1. **Check Memory First** - Use `mcp__memory__read_graph` or `mcp__memory__search_nodes` to recall:
   - Project-specific configurations and credentials
   - Previous work context and decisions
   - Infrastructure details (URLs, ports, access patterns)
   - Known issues and their solutions
2. **Review Recent Git History** - Check `git log --oneline -10` to understand recent changes
3. **Check Container Status** - Use `mcp__podman__container_list` to see what's running
## Project Instructions

### Things to Remember

Before writing any code:
1. State how you will verify this change works (test, bash command, browser check, etc.)
2. Write the test or verification step first
3. Then implement the code
4. Run verification and iterate until it passes
### Git Bash / MSYS Path Conversion Issue (Windows Host)

**CRITICAL ISSUE**: Git Bash on Windows automatically converts Unix-style paths to Windows paths, which breaks Podman/Docker commands.

**Problem Examples:**
```bash
# This FAILS in Git Bash:
podman exec container /usr/local/bin/script.sh
# Git Bash converts the path to: C:/Program Files/Git/usr/local/bin/script.sh

# This FAILS in Git Bash:
podman exec container bash -c "cat /tmp/file.sql"
# Git Bash converts /tmp to C:/Users/user/AppData/Local/Temp
```
**Solutions:**

1. Use `sh -c` instead of `bash -c` for single-quoted commands:

   ```bash
   podman exec container sh -c '/usr/local/bin/script.sh'
   ```

2. Use double slashes to escape path conversion:

   ```bash
   podman exec container //usr//local//bin//script.sh
   ```

3. Set the `MSYS_NO_PATHCONV` environment variable:

   ```bash
   MSYS_NO_PATHCONV=1 podman exec container /usr/local/bin/script.sh
   ```

4. Use Windows paths with forward slashes when referencing host files:

   ```bash
   podman cp "d:/path/to/file" container:/tmp/file
   ```
ALWAYS use one of these workarounds when running Bash commands on Windows that involve Unix paths inside containers.
### Communication Style: Ask Before Assuming

**IMPORTANT**: When helping with tasks, ask clarifying questions before making assumptions. Do not assume:
- What steps the user has or hasn't completed
- What the user already knows or has configured
- What external services (OAuth providers, APIs, etc.) are already set up
- What secrets or credentials have already been created
Instead, ask the user to confirm the current state before providing instructions or making recommendations. This prevents wasted effort and respects the user's existing work.
### Platform Requirement: Linux Only

**CRITICAL**: This application is designed to run exclusively on Linux. See ADR-014 for full details.
### Environment Terminology

- **Dev Container** (or just "dev"): The containerized Linux development environment (`flyer-crawler-dev`). This is where all development and testing should occur.
- **Host**: The Windows machine running Podman/Docker and VS Code.

When instructions say "run in dev" or "run in the dev container", they mean executing commands inside the `flyer-crawler-dev` container.
### Test Execution Rules

- **ALL tests MUST be executed in the dev container** - the Linux container environment
- **NEVER run tests directly on the Windows host** - test results from Windows are unreliable
- Always use the dev container for testing when developing on Windows
- **TypeScript type-check MUST run in the dev container** - `npm run type-check` on Windows does not reliably detect errors

See docs/TESTING.md for comprehensive testing documentation.
### How to Run Tests Correctly

```bash
# If on Windows, first open VS Code and "Reopen in Container"
# Then run tests inside the dev container:
npm test                  # Run all unit tests
npm run test:unit         # Run unit tests only
npm run test:integration  # Run integration tests (requires DB/Redis)
```
### Running Tests via Podman (from Windows host)

Note: This project has 2900+ unit tests. For AI-assisted development, pipe output to a file for easier processing.

The command to run unit tests in the dev container via podman:

```bash
# Basic (output to terminal)
podman exec -it flyer-crawler-dev npm run test:unit

# Recommended for AI processing: pipe to file
podman exec -it flyer-crawler-dev npm run test:unit 2>&1 | tee test-results.txt
```

The command to run integration tests in the dev container via podman:

```bash
podman exec -it flyer-crawler-dev npm run test:integration
```

For running specific test files:

```bash
podman exec -it flyer-crawler-dev npm test -- --run src/hooks/useAuth.test.tsx
```
### Why Linux Only?

- **Path separators**: Code uses POSIX-style paths (`/`) which may break on Windows
- **Shell scripts** in the `scripts/` directory are Linux-only
- **External dependencies** like `pdftocairo` assume Linux installation paths
- **Unix-style file permissions** are assumed throughout
### Test Result Interpretation

- Tests that pass on Windows but fail on Linux = **BROKEN** tests (must be fixed)
- Tests that fail on Windows but pass on Linux = **PASSING** tests (acceptable)
### Development Workflow

1. Open the project in VS Code
2. Use "Reopen in Container" (Dev Containers extension required) to enter the dev environment
3. Wait for dev container initialization to complete
4. Run `npm test` to verify the dev environment is working
5. Make changes and run tests inside the dev container
### Code Change Verification

After making any code changes, always run a type-check to catch TypeScript errors before committing:

```bash
npm run type-check
```

This prevents linting/type errors from being introduced into the codebase.
### Quick Reference

| Command | Description |
|---|---|
| `npm test` | Run all unit tests |
| `npm run test:unit` | Run unit tests only |
| `npm run test:integration` | Run integration tests |
| `npm run dev:container` | Start dev server (container) |
| `npm run build` | Build for production |
| `npm run type-check` | Run TypeScript type checking |
## Database Schema Files

**CRITICAL**: The database schema files must be kept in sync with each other. When making schema changes:

| File | Purpose |
|---|---|
| `sql/master_schema_rollup.sql` | Complete schema used by test database setup and reference |
| `sql/initial_schema.sql` | Base schema without seed data, used as standalone reference |
| `sql/migrations/*.sql` | Incremental migrations for production database updates |
**Maintenance Rules:**

- **Keep `master_schema_rollup.sql` and `initial_schema.sql` in sync** - these files should contain the same table definitions
- **When adding columns via migration**, also add them to both `master_schema_rollup.sql` and `initial_schema.sql`
- **Migrations are for production deployments** - they use `ALTER TABLE` to add columns incrementally
- **Schema files are for fresh installs** - they define the complete table structure
- **The test database uses `master_schema_rollup.sql`** - if the schema files are out of sync with the migrations, tests will fail

Example: When `002_expiry_tracking.sql` adds `purchase_date` to `pantry_items`, that column must also exist in the `CREATE TABLE` statements in both `master_schema_rollup.sql` and `initial_schema.sql`.
## Known Integration Test Issues and Solutions

This section documents common issues encountered in integration tests, their root causes, and solutions. These patterns recur frequently.
### 1. Vitest globalSetup Runs in a Separate Node.js Context

**Problem**: Vitest's `globalSetup` runs in a completely separate Node.js context from test files. This means:

- Singletons created in `globalSetup` are NOT the same instances as those in test files
- `global`, `globalThis`, and `process` are all isolated between contexts
- `vi.spyOn()` on module exports doesn't work cross-context
- Dependency injection via setter methods fails across contexts
**Affected Tests**: Any test trying to inject mocks into BullMQ worker services (e.g., AI failure tests, DB failure tests)

**Solution Options:**

1. Mark tests as `.todo()` until an API-based mock injection mechanism is implemented
2. Create test-only API endpoints that allow setting mock behaviors via HTTP
3. Use file-based or Redis-based mock flags that services check at runtime
Example of the affected code pattern:

```typescript
// This DOES NOT work - different module instances
const { flyerProcessingService } = await import('../../services/workers.server');
flyerProcessingService._getAiProcessor()._setExtractAndValidateData(mockFn);
// The worker uses a different flyerProcessingService instance!
```
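Of the solution options above, a file-based flag (option 3) is the simplest to sketch. The flag path and function names here are hypothetical, not project code; the point is that a file on disk is visible to both Node.js contexts even though module instances are not shared.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical flag file shared between the test context and the worker context.
const FLAG_PATH = path.join(os.tmpdir(), "mock-ai-failure.flag");

// In the worker service: consult the flag at runtime before calling the real AI.
function shouldSimulateAiFailure(): boolean {
  return fs.existsSync(FLAG_PATH);
}

// In the test: set the flag, run the job, then clean up.
fs.writeFileSync(FLAG_PATH, "1");
console.log(shouldSimulateAiFailure()); // true - the flag crosses the context boundary
fs.unlinkSync(FLAG_PATH);
console.log(shouldSimulateAiFailure()); // false
```

A Redis-based flag works the same way, trading filesystem state for a key that the worker checks on each job.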
### 2. BullMQ Cleanup Queue Deleting Files Before Test Verification

**Problem**: The cleanup worker runs in the `globalSetup` context and processes cleanup jobs even when tests spy on `cleanupQueue.add()`. The spy intercepts calls in the test context, but jobs already queued run in the worker's context.

**Affected Tests**: EXIF/PNG metadata stripping tests that need to verify file contents before deletion

**Solution**: Drain and pause the cleanup queue before the test:

```typescript
const { cleanupQueue } = await import('../../services/queues.server');
await cleanupQueue.drain();  // Remove existing jobs
await cleanupQueue.pause();  // Prevent new jobs from processing
// ... run test ...
await cleanupQueue.resume(); // Restore normal operation
```
### 3. Cache Invalidation After Direct Database Inserts

**Problem**: Tests that insert data directly via SQL (bypassing the service layer) don't trigger cache invalidation. Subsequent API calls return stale cached data.

**Affected Tests**: Any test using `pool.query()` to insert flyers, stores, or other cached entities

**Solution**: Manually invalidate the cache after direct inserts:

```typescript
await pool.query('INSERT INTO flyers ...');
await cacheService.invalidateFlyers(); // Clear stale cache
```
### 4. Unique Filenames Required for Test Isolation

**Problem**: Multer generates predictable filenames in test environments, causing race conditions when multiple tests upload files concurrently or in sequence.

**Affected Tests**: Flyer processing tests, file upload tests

**Solution**: Always use unique filenames with timestamps:

```typescript
// In multer.middleware.ts
const uniqueSuffix = `${Date.now()}-${Math.round(Math.random() * 1e9)}`;
cb(null, `${file.fieldname}-${uniqueSuffix}-${sanitizedOriginalName}`);
```
### 5. Response Format Mismatches

**Problem**: API response formats may change, causing tests to fail when expecting old formats.

**Common Issues:**

- `response.body.data.jobId` vs `response.body.data.job.id`
- Nested objects vs flat response structures
- Type coercion (string vs number for IDs)

**Solution**: Always log response bodies during debugging and update test assertions to match actual API contracts.
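One defensive pattern while a contract is in flux is to centralize extraction so a single helper absorbs both shapes and normalizes ID types. The `extractJobId` helper below is hypothetical, not project code:

```typescript
// Hypothetical test helper: accept both the flat shape (data.jobId) and the
// nested shape (data.job.id), and normalize string/number IDs to strings.
type JobResponse = {
  data?: { jobId?: string | number; job?: { id?: string | number } };
};

function extractJobId(body: JobResponse): string | undefined {
  const id = body.data?.jobId ?? body.data?.job?.id;
  return id === undefined ? undefined : String(id); // absorb type coercion
}

console.log(extractJobId({ data: { jobId: 42 } }));         // "42"
console.log(extractJobId({ data: { job: { id: "42" } } })); // "42"
console.log(extractJobId({}));                              // undefined
```

Once the API contract settles, the helper can be collapsed back into a direct assertion against the canonical shape.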
### 6. External Service Availability

**Problem**: Tests depending on external services (PM2, Redis health checks) fail when those services aren't available in the test environment.

**Solution**: Use try/catch with graceful degradation or mock the external service checks.
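A minimal sketch of the graceful-degradation approach, assuming a hypothetical `checkRedisHealth` stand-in for the real service check:

```typescript
// Hypothetical stand-in for an external health check that is unavailable
// in the test environment.
async function checkRedisHealth(): Promise<string> {
  throw new Error("Redis unavailable in this environment");
}

// Wrap the external call so an outage degrades to a known status instead
// of failing the whole test run.
async function safeHealthCheck(): Promise<string> {
  try {
    return await checkRedisHealth();
  } catch {
    return "unavailable";
  }
}

safeHealthCheck().then((status) => console.log(status)); // "unavailable"
```

Tests can then assert on the degraded status (or skip) rather than crashing when the service is down.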
## Secrets and Environment Variables

**CRITICAL**: This project uses Gitea CI/CD secrets for all sensitive configuration. There is NO `/etc/flyer-crawler/environment` file or similar local config file on the server.
### Server Directory Structure

| Path | Environment | Notes |
|---|---|---|
| `/var/www/flyer-crawler.projectium.com/` | Production | NO `.env` file - secrets injected via CI/CD only |
| `/var/www/flyer-crawler-test.projectium.com/` | Test | Has `.env.test` file for test-specific config |
### How Secrets Work

1. **Gitea Secrets**: All secrets are stored in Gitea repository settings (Settings → Secrets)
2. **CI/CD Injection**: Secrets are injected during deployment via `.gitea/workflows/deploy-to-prod.yml` and `deploy-to-test.yml`
3. **PM2 Environment**: The CI/CD workflow passes secrets to PM2 via environment variables, which are then available to the application
### Key Files for Configuration

| File | Purpose |
|---|---|
| `src/config/env.ts` | Centralized config with Zod schema validation |
| `ecosystem.config.cjs` | PM2 process config - reads from `process.env` |
| `.gitea/workflows/deploy-to-prod.yml` | Production deployment with secret injection |
| `.gitea/workflows/deploy-to-test.yml` | Test deployment with secret injection |
| `.env.example` | Template showing all available environment variables |
| `.env.test` | Test environment overrides (only on test server) |
### Adding New Secrets

To add a new secret (e.g., `SENTRY_DSN`):

1. Add the secret to Gitea repository settings
2. Update the relevant workflow file (e.g., `deploy-to-prod.yml`) to inject it:

   ```yaml
   SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
   ```

3. Update `ecosystem.config.cjs` to read it from `process.env`
4. Update the `src/config/env.ts` schema if validation is needed
5. Update `.env.example` to document the new variable
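Steps 3-4 amount to fail-fast reading of the injected variable at startup. The real project validates with a Zod schema in `src/config/env.ts`; the hand-rolled `requireEnv` helper below is only an illustration of the idea, not the project's actual API:

```typescript
// Illustrative only - the project's actual validation lives in the Zod
// schema in src/config/env.ts.
function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (!value) {
    // Fail fast at startup rather than at first use of the secret.
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// The value arrives via the CI/CD workflow -> PM2 environment chain.
const dsn = requireEnv("SENTRY_DSN", { SENTRY_DSN: "https://key@bugsink.example/1" });
console.log(dsn);
```

Failing at startup means a missing secret surfaces in the PM2 logs immediately after deployment instead of as a runtime error later.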
### Current Gitea Secrets

**Shared** (used by both environments):

- `DB_HOST` - Database host (shared PostgreSQL server)
- `JWT_SECRET` - Authentication
- `GOOGLE_MAPS_API_KEY` - Google Maps
- `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET` - Google OAuth
- `GH_CLIENT_ID`, `GH_CLIENT_SECRET` - GitHub OAuth
- `SENTRY_AUTH_TOKEN` - Bugsink API token for source map uploads (create at Settings > API Keys in Bugsink)

**Production-specific:**

- `DB_USER_PROD`, `DB_PASSWORD_PROD` - Production database credentials (`flyer_crawler_prod`)
- `DB_DATABASE_PROD` - Production database name (`flyer-crawler`)
- `REDIS_PASSWORD_PROD` - Redis password (uses database 0)
- `VITE_GOOGLE_GENAI_API_KEY` - Gemini API key for production
- `SENTRY_DSN`, `VITE_SENTRY_DSN` - Bugsink error tracking DSNs (production projects)

**Test-specific:**

- `DB_USER_TEST`, `DB_PASSWORD_TEST` - Test database credentials (`flyer_crawler_test`)
- `DB_DATABASE_TEST` - Test database name (`flyer-crawler-test`)
- `REDIS_PASSWORD_TEST` - Redis password (uses database 1 for isolation)
- `VITE_GOOGLE_GENAI_API_KEY_TEST` - Gemini API key for test
- `SENTRY_DSN_TEST`, `VITE_SENTRY_DSN_TEST` - Bugsink error tracking DSNs (test projects)
### Test Environment

The test environment (flyer-crawler-test.projectium.com) uses both Gitea CI/CD secrets and a local `.env.test` file:

- **Gitea secrets**: Injected during deployment via `.gitea/workflows/deploy-to-test.yml`
- **`.env.test` file**: Located at `/var/www/flyer-crawler-test.projectium.com/.env.test` for local overrides
- **Redis database 1**: Isolates test job queues from production (which uses database 0)
- **PM2 process names**: Suffixed with `-test` (e.g., `flyer-crawler-api-test`)
### Database User Setup (Test Environment)

**CRITICAL**: The test database requires specific PostgreSQL permissions to be configured manually. Schema ownership alone is NOT sufficient - explicit privileges must be granted.

**Database Users:**

| User | Database | Purpose |
|---|---|---|
| `flyer_crawler_prod` | `flyer-crawler-prod` | Production |
| `flyer_crawler_test` | `flyer-crawler-test` | Testing |
**Required Setup Commands** (run as the postgres superuser, via `sudo -u postgres psql`):

```sql
-- Create the test database and user (if not exists)
CREATE DATABASE "flyer-crawler-test";
CREATE USER flyer_crawler_test WITH PASSWORD 'your-password-here';

-- Grant ownership and privileges
ALTER DATABASE "flyer-crawler-test" OWNER TO flyer_crawler_test;
\c "flyer-crawler-test"
ALTER SCHEMA public OWNER TO flyer_crawler_test;
GRANT CREATE, USAGE ON SCHEMA public TO flyer_crawler_test;

-- Create required extension (must be done by a superuser)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```
**Why These Steps Are Necessary:**

- **Schema ownership alone is insufficient** - PostgreSQL requires explicit `GRANT CREATE, USAGE` privileges even when the user owns the schema
- **uuid-ossp extension** - Required by the application for UUID generation; must be created by a superuser before the app can use it
- **Separate users for prod/test** - Prevents accidental cross-environment data access; each environment has its own credentials in Gitea secrets
**Verification:**

```bash
# Check schema privileges (should show 'UC' for flyer_crawler_test)
psql -d "flyer-crawler-test" -c "\dn+ public"

# Expected output:
# Name   | Owner              | Access privileges
# -------+--------------------+------------------------------------------
# public | flyer_crawler_test | flyer_crawler_test=UC/flyer_crawler_test
```
## Dev Container Environment

The dev container runs its own local Bugsink instance - it does NOT connect to the production Bugsink server:

- **Local Bugsink**: Runs at `http://localhost:8000` inside the container
- **Pre-configured DSNs**: Set in `compose.dev.yml`, pointing to the local instance
- **Admin credentials**: `admin@localhost` / `admin`
- **Isolated**: Dev errors stay local and don't pollute production/test dashboards
- **No Gitea secrets needed**: Everything is self-contained in the container
## MCP Servers

The following MCP servers are configured for this project:
| Server | Purpose |
|---|---|
| gitea-projectium | Gitea API for gitea.projectium.com |
| gitea-torbonium | Gitea API for gitea.torbonium.com |
| podman | Container management |
| filesystem | File system access |
| fetch | Web fetching |
| markitdown | Convert documents to markdown |
| sequential-thinking | Step-by-step reasoning |
| memory | Knowledge graph persistence |
| postgres | Direct database queries (localhost:5432) |
| playwright | Browser automation and testing |
| redis | Redis cache inspection (localhost:6379) |
| sentry-selfhosted-mcp | Error tracking via Bugsink (localhost:8000) |
Note: MCP servers work in both Claude CLI and Claude Code VS Code extension (as of January 2026).
## Sentry/Bugsink MCP Server Setup (ADR-015)

To enable Claude Code to query and analyze application errors from Bugsink:

1. **Install the MCP server:**

   ```bash
   # Clone the sentry-selfhosted-mcp repository
   git clone https://github.com/ddfourtwo/sentry-selfhosted-mcp.git
   cd sentry-selfhosted-mcp
   npm install
   ```

2. **Configure Claude Code** (add to `.claude/mcp.json`):

   ```json
   {
     "sentry-selfhosted-mcp": {
       "command": "node",
       "args": ["/path/to/sentry-selfhosted-mcp/dist/index.js"],
       "env": {
         "SENTRY_URL": "http://localhost:8000",
         "SENTRY_AUTH_TOKEN": "<get-from-bugsink-ui>",
         "SENTRY_ORG_SLUG": "flyer-crawler"
       }
     }
   }
   ```

3. **Get the auth token:**

   - Navigate to the Bugsink UI at `http://localhost:8000`
   - Log in with the admin credentials
   - Go to Settings > API Keys
   - Create a new API key with read access

4. **Available capabilities:**

   - List projects and issues
   - View detailed error events
   - Search by error message or stack trace
   - Update issue status (resolve, ignore)
   - Add comments to issues
## SSH Server Access

Claude Code can execute commands on the production server via SSH:

```bash
# Basic command execution
ssh root@projectium.com "command here"

# Examples:
ssh root@projectium.com "systemctl status logstash"
ssh root@projectium.com "pm2 list"
ssh root@projectium.com "tail -50 /var/www/flyer-crawler.projectium.com/logs/app.log"
```

**Use cases:**

- Managing Logstash, PM2, NGINX, Redis services
- Viewing server logs
- Deploying configuration changes
- Checking service status

**Important**: SSH access requires the host machine to have SSH keys configured for `root@projectium.com`.