
Bare-Metal Server Setup Guide

This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.

Target Environment: Ubuntu 22.04 LTS (or newer)


Table of Contents

  1. System Prerequisites
  2. PostgreSQL Setup
  3. Redis Setup
  4. Node.js and Application Setup
  5. PM2 Process Manager
  6. NGINX Reverse Proxy
  7. Bugsink Error Tracking
  8. Logstash Log Aggregation
  9. SSL/TLS with Let's Encrypt
  10. Firewall Configuration
  11. Maintenance Commands

System Prerequisites

Update the system and install essential packages:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
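
A quick sanity check that the toolchain is in place (exact versions will vary):

git --version
python3 --version
gcc --version  # provided by build-essential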

PostgreSQL Setup

Install PostgreSQL 14+ with PostGIS

# Add PostgreSQL APT repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update

# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3

Create Application Database and User

sudo -u postgres psql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;

-- Connect to the database and enable extensions
\c flyer_crawler

CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;

\q

Configure PostgreSQL for Remote Access (if needed)

Edit /etc/postgresql/14/main/postgresql.conf:

listen_addresses = 'localhost'  # Change to '*' for remote access

Edit /etc/postgresql/14/main/pg_hba.conf to add allowed hosts:

# Local connections
local   all   all   peer
host    all   all   127.0.0.1/32   scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql
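
To confirm the database, user, and extensions are usable over TCP (this assumes the scram-sha-256 host rule above; you will be prompted for the password you chose):

psql -U flyer_crawler -h 127.0.0.1 -d flyer_crawler -c 'SELECT postgis_version();'
psql -U flyer_crawler -h 127.0.0.1 -d flyer_crawler -c 'SELECT extname FROM pg_extension;'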

Redis Setup

Install Redis

sudo apt install -y redis-server

Configure Redis Password

Edit /etc/redis/redis.conf:

requirepass YOUR_REDIS_PASSWORD

Restart Redis:

sudo systemctl restart redis-server
sudo systemctl enable redis-server

Test Redis Connection

redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG

Node.js and Application Setup

Install Node.js 20.x

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Verify installation:

node --version  # Should output v20.x.x
npm --version

Install System Dependencies for PDF Processing

sudo apt install -y poppler-utils  # For pdftocairo
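
Verify the PDF tooling is available:

pdftocairo -v  # prints the Poppler version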

Clone and Install Application

# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler

# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .

# Install dependencies
npm install

# Build for production
npm run build

Configure Environment Variables

Important: The flyer-crawler application does not use local environment files in production. All secrets are managed through Gitea CI/CD secrets and injected during deployment.

How Secrets Work

  1. Secrets are stored in Gitea at Repository → Settings → Actions → Secrets
  2. Workflow files (.gitea/workflows/deploy-to-prod.yml) reference secrets using ${{ secrets.SECRET_NAME }}
  3. PM2 receives environment variables from the workflow's env: block
  4. ecosystem.config.cjs passes variables to the application via process.env

Required Gitea Secrets

Before deployment, ensure these secrets are configured in Gitea:

Shared Secrets (used by both production and test):

| Secret Name | Description |
| --- | --- |
| DB_HOST | Database hostname (usually localhost) |
| DB_USER | Database username |
| DB_PASSWORD | Database password |
| JWT_SECRET | JWT signing secret (min 32 characters) |
| GOOGLE_MAPS_API_KEY | Google Maps API key |
| GOOGLE_CLIENT_ID | Google OAuth client ID |
| GOOGLE_CLIENT_SECRET | Google OAuth client secret |
| GH_CLIENT_ID | GitHub OAuth client ID |
| GH_CLIENT_SECRET | GitHub OAuth client secret |

Production-Specific Secrets:

| Secret Name | Description |
| --- | --- |
| DB_DATABASE_PROD | Production database name (flyer_crawler) |
| REDIS_PASSWORD_PROD | Redis password for production (uses database 0) |
| VITE_GOOGLE_GENAI_API_KEY | Gemini API key for production |
| SENTRY_DSN | Bugsink backend DSN (see Bugsink section) |
| VITE_SENTRY_DSN | Bugsink frontend DSN |

Test-Specific Secrets:

| Secret Name | Description |
| --- | --- |
| DB_DATABASE_TEST | Test database name (flyer-crawler-test) |
| REDIS_PASSWORD_TEST | Redis password for test (uses database 1 for isolation) |
| VITE_GOOGLE_GENAI_API_KEY_TEST | Gemini API key for test environment |
| SENTRY_DSN_TEST | Bugsink backend DSN for test (see Bugsink section) |
| VITE_SENTRY_DSN_TEST | Bugsink frontend DSN for test |

Test Environment Details

The test environment (flyer-crawler-test.projectium.com) uses both Gitea CI/CD secrets and a local .env.test file:

| Path | Purpose |
| --- | --- |
| /var/www/flyer-crawler-test.projectium.com/ | Test application directory |
| /var/www/flyer-crawler-test.projectium.com/.env.test | Local overrides for test-specific config |

Key differences from production:

  • Uses Redis database 1 (production uses database 0) to isolate job queues
  • PM2 processes are named with -test suffix (e.g., flyer-crawler-api-test)
  • Deployed automatically on every push to main branch
  • Has a .env.test file for additional local configuration overrides
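
A quick way to confirm the queue isolation described above. This sketch assumes BullMQ-style keys prefixed with bull: (adjust the pattern if the queues use a different prefix) and uses placeholder passwords:

# Test queues should live in Redis database 1...
redis-cli -a YOUR_REDIS_PASSWORD -n 1 --scan --pattern 'bull:*' | head

# ...while production queues live in database 0
redis-cli -a YOUR_REDIS_PASSWORD -n 0 --scan --pattern 'bull:*' | head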

For detailed information on secrets management, see CLAUDE.md.


PM2 Process Manager

Install PM2 Globally

sudo npm install -g pm2

PM2 Configuration Files

The application uses separate ecosystem config files for production and test environments:

| File | Purpose | Processes Started |
| --- | --- | --- |
| ecosystem.config.cjs | Production deployment | flyer-crawler-api, flyer-crawler-worker, flyer-crawler-analytics-worker |
| ecosystem-test.config.cjs | Test deployment | flyer-crawler-api-test, flyer-crawler-worker-test, flyer-crawler-analytics-worker-test |

Key Points:

  • Production and test processes run simultaneously with distinct names
  • Test processes use NODE_ENV=test which enables file logging
  • Test processes use Redis database 1 (isolated from production which uses database 0)
  • Both configs validate required environment variables but only warn (don't exit) if missing

Start Production Application

cd /var/www/flyer-crawler.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables

pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save

This starts three production processes:

  • flyer-crawler-api - Main API server (port 3001)
  • flyer-crawler-worker - Background job worker
  • flyer-crawler-analytics-worker - Analytics processing worker
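
To confirm the API process is actually listening, check the port and hit an endpoint. The /api/health path here is an assumption; substitute a route you know exists:

ss -ltnp | grep 3001
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3001/api/health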

Start Test Application

cd /var/www/flyer-crawler-test.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1  # Use database 1 for isolation
# ... other required variables

pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save

This starts three test processes (running alongside production):

  • flyer-crawler-api-test - Test API server (port 3001 via different NGINX vhost)
  • flyer-crawler-worker-test - Test background job worker
  • flyer-crawler-analytics-worker-test - Test analytics worker

Verify Running Processes

After starting both environments, you should see 6 application processes:

pm2 list

Expected output:

┌────┬───────────────────────────────────┬──────────┬────────┬───────────┐
│ id │ name                              │ mode     │ status │ cpu       │
├────┼───────────────────────────────────┼──────────┼────────┼───────────┤
│ 0  │ flyer-crawler-api                 │ cluster  │ online │ 0%        │
│ 1  │ flyer-crawler-worker              │ fork     │ online │ 0%        │
│ 2  │ flyer-crawler-analytics-worker    │ fork     │ online │ 0%        │
│ 3  │ flyer-crawler-api-test            │ fork     │ online │ 0%        │
│ 4  │ flyer-crawler-worker-test         │ fork     │ online │ 0%        │
│ 5  │ flyer-crawler-analytics-worker-test│ fork     │ online │ 0%        │
└────┴───────────────────────────────────┴──────────┴────────┴───────────┘

Configure PM2 Startup

pm2 startup systemd
# Follow the command output to enable PM2 on boot

pm2 save

PM2 Log Rotation

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
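
To confirm the module picked up these settings (the pm2-logrotate module exposes its configuration via pm2 conf):

pm2 conf pm2-logrotate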

Useful PM2 Commands

# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50

# View environment variables for a process
pm2 env <process-id>

# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test

# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test

NGINX Reverse Proxy

Install NGINX

sudo apt install -y nginx

Create Site Configuration

Create /etc/nginx/sites-available/flyer-crawler.projectium.com:

server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files
    types {
        application/javascript js mjs;
    }
}

Enable the Site

sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
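
With the site enabled and the PM2 processes running, a quick check from the server itself should return an HTTP response from each upstream (even a 404 confirms the proxy path works):

curl -I -H 'Host: flyer-crawler.projectium.com' http://localhost/
curl -I -H 'Host: flyer-crawler.projectium.com' http://localhost/api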

Bugsink Error Tracking

Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. This guide follows the official Bugsink single-server production setup.

See ADR-015 for architecture details.

Step 1: Create Bugsink User

Create a dedicated non-root user for Bugsink:

sudo adduser bugsink --disabled-password --gecos ""

Step 2: Set Up Virtual Environment and Install Bugsink

Switch to the bugsink user:

sudo su - bugsink

Create the virtual environment:

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

You should see (venv) at the beginning of your prompt. Now install Bugsink:

pip install bugsink --upgrade
bugsink-show-version

You should see output like bugsink 2.x.x.

Step 3: Create Configuration File

Generate the configuration file. Replace bugsink.yourdomain.com with your actual hostname:

bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com

This creates bugsink_conf.py in /home/bugsink/. Edit it to customize settings:

nano bugsink_conf.py

Key settings to review:

| Setting | Description |
| --- | --- |
| BASE_URL | The URL where Bugsink will be accessed (e.g., https://bugsink.yourdomain.com) |
| SITE_TITLE | Display name for your Bugsink instance |
| SECRET_KEY | Auto-generated, but verify it exists |
| TIME_ZONE | Your timezone (e.g., America/New_York) |
| USER_REGISTRATION | Set to "closed" to disable public signup |
| SINGLE_USER | Set to True if only one user will use this instance |
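
A quick way to review the values you just edited (run as the bugsink user, from /home/bugsink; setting names as listed above):

grep -E '^(BASE_URL|SITE_TITLE|TIME_ZONE|USER_REGISTRATION|SINGLE_USER)' bugsink_conf.py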

Step 4: Initialize Database

Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:

bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea

Verify the database files were created:

ls *.sqlite3

You should see db.sqlite3 and snappea.sqlite3.

Step 5: Create Admin User

Create the superuser account. Using your email as the username is recommended:

bugsink-manage createsuperuser

Important: Save these credentials - you'll need them to log into the Bugsink web UI.

Step 6: Verify Configuration

Run Django's deployment checks:

bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING

Exit back to root for the next steps:

exit

Step 7: Create Gunicorn Service

Create /etc/systemd/system/gunicorn-bugsink.service:

sudo nano /etc/systemd/system/gunicorn-bugsink.service

Add the following content:

[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target

[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink

Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
    --bind="127.0.0.1:8000" \
    --workers=4 \
    --timeout=6 \
    --access-logfile - \
    --max-requests=1000 \
    --max-requests-jitter=100 \
    bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service

Test that Gunicorn is responding (replace hostname):

curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"

You should see HTML output containing a login form.

Step 8: Create Snappea Background Worker Service

Snappea is Bugsink's background task processor. Create /etc/systemd/system/snappea.service:

sudo nano /etc/systemd/system/snappea.service

Add the following content:

[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target

[Service]
Restart=always
User=bugsink
Group=bugsink

Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service

Verify snappea is working:

sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit

Step 9: Configure NGINX for Bugsink

Create /etc/nginx/sites-available/bugsink:

sudo nano /etc/nginx/sites-available/bugsink

Add the following (replace bugsink.yourdomain.com with your hostname):

server {
    server_name bugsink.yourdomain.com;
    listen 80;

    client_max_body_size 20M;

    access_log /var/log/nginx/bugsink.access.log;
    error_log /var/log/nginx/bugsink.error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the site:

sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

Step 10: Obtain an SSL Certificate and Add Security Headers

Request a certificate for the Bugsink hostname:

sudo certbot --nginx -d bugsink.yourdomain.com

After SSL is configured, update the NGINX config to add security headers. Edit /etc/nginx/sites-available/bugsink and add the following to the location / block:

add_header Strict-Transport-Security "max-age=31536000; preload" always;

Reload NGINX:

sudo nginx -t
sudo systemctl reload nginx

Step 11: Create Projects and Get DSNs

  1. Access Bugsink UI at https://bugsink.yourdomain.com

  2. Log in with the admin credentials you created

  3. Create a new team (or use the default)

  4. Create projects for each environment:

    Production:

    • flyer-crawler-backend (Platform: Node.js)
    • flyer-crawler-frontend (Platform: JavaScript/React)

    Test:

    • flyer-crawler-backend-test (Platform: Node.js)
    • flyer-crawler-frontend-test (Platform: JavaScript/React)
  5. For each project, go to Settings → Client Keys (DSN)

  6. Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)

Note: The dev container runs its own local Bugsink instance at localhost:8000 - no remote DSNs needed for development.

Step 12: Configure Application to Use Bugsink

The flyer-crawler application receives its configuration via Gitea CI/CD secrets, not local environment files. Follow these steps to add the Bugsink DSNs:

1. Add Secrets in Gitea

Navigate to your repository in Gitea:

  1. Go to Settings → Actions → Secrets
  2. Add the following secrets:

Production DSNs:

| Secret Name | Value | Description |
| --- | --- | --- |
| SENTRY_DSN | https://KEY@bugsink.yourdomain.com/1 | Production backend DSN |
| VITE_SENTRY_DSN | https://KEY@bugsink.yourdomain.com/2 | Production frontend DSN |

Test DSNs:

| Secret Name | Value | Description |
| --- | --- | --- |
| SENTRY_DSN_TEST | https://KEY@bugsink.yourdomain.com/3 | Test backend DSN |
| VITE_SENTRY_DSN_TEST | https://KEY@bugsink.yourdomain.com/4 | Test frontend DSN |

Note: The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.

2. Update the Deployment Workflows

Production (deploy-to-prod.yml):

In the Install Backend Dependencies and Restart Production Server step, add to the env: block:

env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
  SENTRY_ENVIRONMENT: 'production'
  SENTRY_ENABLED: 'true'

In the build step, add frontend variables:

VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build

Test (deploy-to-test.yml):

In the Install Backend Dependencies and Restart Test Server step, add to the env: block:

env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking (Test)
  SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
  SENTRY_ENVIRONMENT: 'test'
  SENTRY_ENABLED: 'true'

In the build step, add frontend variables:

VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build

3. Update ecosystem.config.cjs

Add Sentry variables to the sharedEnv object in ecosystem.config.cjs:

const sharedEnv = {
  // ... existing variables ...
  SENTRY_DSN: process.env.SENTRY_DSN,
  SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
  SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};

4. Dev Container (No Configuration Needed)

The dev container runs its own local Bugsink instance at http://localhost:8000. No remote DSNs or Gitea secrets are needed for development:

  • DSNs are pre-configured in compose.dev.yml
  • Admin UI: http://localhost:8000 (login: admin@localhost / admin)
  • Errors stay local and isolated from production/test

5. Deploy to Apply Changes

Trigger deployments via Gitea Actions:

  • Test: Automatically deploys on push to main
  • Production: Manual trigger via workflow dispatch

Note: There is no /etc/flyer-crawler/environment file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time. Dev container uses local .env file. See CLAUDE.md for details.

Step 13: Test Error Tracking

You can test Bugsink is working before configuring the flyer-crawler application.

Switch to the bugsink user and open a Python shell:

sudo su - bugsink
source venv/bin/activate
bugsink-manage shell

In the Python shell, send a test message using the backend DSN from Step 11:

import sentry_sdk
sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()

Exit back to root:

exit

Check the Bugsink UI - you should see the test message appear in the flyer-crawler-backend project.

Step 14: Test from Flyer-Crawler Application (After App Setup)

Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:

cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts

Check the Bugsink UI - you should see a test event appear.

Bugsink Maintenance Commands

| Task | Command |
| --- | --- |
| View Gunicorn status | sudo systemctl status gunicorn-bugsink |
| View Snappea status | sudo systemctl status snappea |
| View Gunicorn logs | sudo journalctl -u gunicorn-bugsink -f |
| View Snappea logs | sudo journalctl -u snappea -f |
| Restart Bugsink | sudo systemctl restart gunicorn-bugsink snappea |
| Run management commands | sudo su - bugsink, then source venv/bin/activate && bugsink-manage <command> |
| Upgrade Bugsink | sudo su - bugsink -c 'source venv/bin/activate && pip install bugsink --upgrade', then sudo systemctl restart gunicorn-bugsink snappea |

Logstash Log Aggregation

Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.

Note: Logstash integration is optional. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.

Step 1: Create Application Log Directory

The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs.

Create the log directories and set appropriate permissions:

# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs

# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs

# Make logs readable by logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs

For the test environment:

sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs

Step 2: Application File Logging (Already Configured)

The flyer-crawler application uses Pino for logging and is configured to write logs to files in production/test environments:

Log File Locations:

| Environment | Log File Path |
| --- | --- |
| Production | /var/www/flyer-crawler.projectium.com/logs/app.log |
| Test | /var/www/flyer-crawler-test.projectium.com/logs/app.log |
| Dev Container | /app/logs/app.log |

How It Works:

  • In production/test: Pino writes JSON logs to both stdout (for PM2) AND logs/app.log (for Logstash)
  • In development: Pino uses pino-pretty for human-readable console output only
  • The log directory is created automatically if it doesn't exist
  • You can override the log directory with the LOG_DIR environment variable
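
For example, to point the test environment at a custom log directory (a sketch, assuming LOG_DIR is read at process start):

sudo mkdir -p /var/log/flyer-crawler-test
export LOG_DIR=/var/log/flyer-crawler-test
pm2 restart flyer-crawler-api-test --update-env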

Verify Logging After Deployment:

After deploying the application, verify that logs are being written:

# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log

# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log

You should see JSON-formatted log entries like:

{ "level": 30, "time": 1704067200000, "msg": "Server started on port 3001", "module": "server" }

Step 3: Install Logstash

# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Update and install
sudo apt update
sudo apt install -y logstash

Verify installation:

/usr/share/logstash/bin/logstash --version

Step 4: Configure Logstash Pipeline

Create the pipeline configuration file:

sudo nano /etc/logstash/conf.d/bugsink.conf

Add the following content:

input {
  # Production application logs (Pino JSON format)
  file {
    path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
  }

  # Test environment logs
  file {
    path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_test"
  }

  # Redis logs (shared by both environments)
  file {
    path => "/var/log/redis/redis-server.log"
    type => "redis"
    tags => ["infra", "redis", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_redis"
  }

  # NGINX error logs (production)
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
  }

  # NGINX access logs - for detecting 5xx errors (production)
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
  }

  # PM2 error logs - Production (plain text stack traces)
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
    exclude => "*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
  }

  # PM2 error logs - Test
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
  }
}

filter {
  # Pino log level detection
  # Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
  if [type] == "pino" and [level] {
    if [level] >= 50 {
      mutate { add_tag => ["error"] }
    } else if [level] >= 40 {
      mutate { add_tag => ["warning"] }
    }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }

  # NGINX error log detection (all entries are errors)
  if [type] == "nginx" {
    mutate { add_tag => ["error"] }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
    }
  }

  # NGINX access log - detect 5xx errors
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    if [response] =~ /^5\d{2}$/ {
      mutate { add_tag => ["error"] }
    }
  }

  # PM2 error log detection - tag lines with actual error indicators
  if [type] == "pm2" {
    if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  # Production app errors -> flyer-crawler-backend (project 1)
  if "error" in [tags] and "app" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
      }
    }
  }

  # Test app errors -> flyer-crawler-backend-test (project 3)
  if "error" in [tags] and "app" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/3/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
      }
    }
  }

  # Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
  if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/5/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
      }
    }
  }

  # Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
  if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/6/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
      }
    }
  }

  # Debug output (uncomment to troubleshoot)
  # stdout { codec => rubydebug }
}

Bugsink Project DSNs:

| Project | DSN Key | Project ID |
| --- | --- | --- |
| flyer-crawler-backend | 911aef02b9a548fa8fabb8a3c81abfe5 | 1 |
| flyer-crawler-frontend | (used by app, not Logstash) | 2 |
| flyer-crawler-backend-test | cdb99c314589431e83d4cc38a809449b | 3 |
| flyer-crawler-frontend-test | (used by app, not Logstash) | 4 |
| flyer-crawler-infrastructure | b083076f94fb461b889d5dffcbef43bf | 5 |
| flyer-crawler-test-infrastructure | 25020dd6c2b74ad78463ec90e90fadab | 6 |

Note: The DSN key is the part before @ in the full DSN URL (e.g., https://KEY@bugsink.projectium.com/PROJECT_ID).
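
If you want to extract the key on the command line, something like this works (the DSN shown is just the placeholder pattern from the note above):

DSN='https://KEY@bugsink.projectium.com/1'
echo "$DSN" | sed -E 's|https://([^@]+)@.*|\1|'   # prints KEY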

Note on PM2 Logs: PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure), test PM2 logs go to project 6 (test-infrastructure).

Step 5: Create Logstash State Directory and Fix Config Path

Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:

# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash

# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config

Step 6: Grant Logstash Access to Application Logs

Logstash runs as the logstash user and needs permission to read log files:

# Add logstash user to adm group (for nginx and redis logs)
sudo usermod -aG adm logstash

# Make application log files readable (created automatically when app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"

# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log

# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log

# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log

# Verify logstash group membership
groups logstash

Note: The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.
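
To confirm the permissions actually work from Logstash's point of view, try reading each input as the logstash user:

sudo -u logstash head -n 1 /var/www/flyer-crawler.projectium.com/logs/app.log
sudo -u logstash head -n 1 /var/log/nginx/error.log
sudo -u logstash head -n 1 /var/log/redis/redis-server.log
sudo -u logstash ls /home/gitea-runner/.pm2/logs/ | head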

Step 7: Test Logstash Configuration

Test the configuration before starting:

sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf

You should see Configuration OK if there are no errors.

Step 8: Start Logstash

sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash

View Logstash logs to verify it's working:

sudo journalctl -u logstash -f
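
For an end-to-end check of the pipeline, you can append a synthetic Pino-style error line (level 50) to the test application log and watch for it in the flyer-crawler-backend-test project. The field names follow the Pino format shown earlier; the module value is arbitrary:

echo '{"level":50,"time":'"$(date +%s000)"',"msg":"Logstash pipeline test","module":"setup-check"}' \
  | sudo tee -a /var/www/flyer-crawler-test.projectium.com/logs/app.log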

Troubleshooting Logstash

| Issue | Solution |
| --- | --- |
| "Permission denied" errors | Check file permissions on log files and the sincedb directory |
| No events being processed | Verify log file paths exist and contain data |
| HTTP output errors | Check that Bugsink is running and the DSN key is correct |
| Logstash not starting | Run the config test: sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/ |

Alternative: Skip Logstash

Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:

  • Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
  • Centralizing all logs in one place
  • Complex log transformations

If you only need application error tracking, the Sentry SDK integration is sufficient.


SSL/TLS with Let's Encrypt

Install Certbot

sudo apt install -y certbot python3-certbot-nginx

Obtain Certificate

sudo certbot --nginx -d flyer-crawler.projectium.com

Certbot will automatically configure NGINX for HTTPS.

Auto-Renewal

Certbot installs a systemd timer for automatic renewal. Verify:

sudo systemctl status certbot.timer
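
You can also exercise the renewal path without changing anything:

sudo certbot renew --dry-run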

Firewall Configuration

Configure UFW

sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow ssh

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable firewall
sudo ufw enable

Important: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.
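
Confirm the resulting rule set; port 8000 should not appear anywhere in it:

sudo ufw status verbose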


Maintenance Commands

Application Management

| Task | Command |
| --- | --- |
| View PM2 status | pm2 status |
| View application logs | pm2 logs |
| Restart all processes | pm2 restart all |
| Restart specific app | pm2 restart flyer-crawler-api |
| Update application | cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all |

Service Management

| Service | Start | Stop | Status |
| --- | --- | --- | --- |
| PostgreSQL | sudo systemctl start postgresql | sudo systemctl stop postgresql | sudo systemctl status postgresql |
| Redis | sudo systemctl start redis-server | sudo systemctl stop redis-server | sudo systemctl status redis-server |
| NGINX | sudo systemctl start nginx | sudo systemctl stop nginx | sudo systemctl status nginx |
| Bugsink | sudo systemctl start gunicorn-bugsink snappea | sudo systemctl stop gunicorn-bugsink snappea | sudo systemctl status gunicorn-bugsink snappea |
| Logstash | sudo systemctl start logstash | sudo systemctl stop logstash | sudo systemctl status logstash |

Database Backup

# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql

# Backup Bugsink databases (Bugsink uses SQLite in this setup, not PostgreSQL)
sudo cp /home/bugsink/db.sqlite3 /var/backups/bugsink_db_$(date +%Y%m%d).sqlite3
sudo cp /home/bugsink/snappea.sqlite3 /var/backups/bugsink_snappea_$(date +%Y%m%d).sqlite3
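
To run the application backup on a schedule, a root crontab entry along these lines could be used (this assumes a ~/.pgpass entry for the flyer_crawler user so pg_dump can authenticate non-interactively; note the escaped % signs, which cron otherwise treats specially):

0 2 * * * pg_dump -U flyer_crawler -h localhost flyer_crawler > /var/backups/flyer_crawler_$(date +\%Y\%m\%d).sql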

Log Locations

| Log | Location |
| --- | --- |
| Application (PM2) | ~/.pm2/logs/ |
| NGINX access | /var/log/nginx/access.log |
| NGINX error | /var/log/nginx/error.log |
| PostgreSQL | /var/log/postgresql/ |
| Redis | /var/log/redis/ |
| Bugsink | journalctl -u gunicorn-bugsink and journalctl -u snappea |
| Logstash | /var/log/logstash/ |