
Bare-Metal Server Setup Guide

This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.

Target Environment: Ubuntu 22.04 LTS (or newer)


Table of Contents

  1. System Prerequisites
  2. PostgreSQL Setup
  3. Redis Setup
  4. Node.js and Application Setup
  5. PM2 Process Manager
  6. NGINX Reverse Proxy
  7. Bugsink Error Tracking
  8. Logstash Log Aggregation
  9. SSL/TLS with Let's Encrypt
  10. Firewall Configuration
  11. Maintenance Commands

System Prerequisites

Update the system and install essential packages:

sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
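
A quick sanity check that the toolchain is in place (exact versions will vary):

git --version
python3 --version
gcc --version  # provided by build-essential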

PostgreSQL Setup

Install PostgreSQL 14+ with PostGIS

# Add PostgreSQL APT repository
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update

# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3

Create Application Database and User

sudo -u postgres psql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;

-- Connect to the database and enable extensions
\c flyer_crawler

CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;

\q

Configure PostgreSQL for Remote Access (if needed)

Edit /etc/postgresql/14/main/postgresql.conf:

listen_addresses = 'localhost'  # Change to '*' for remote access

Edit /etc/postgresql/14/main/pg_hba.conf to add allowed hosts:

# Local connections
local   all   all   peer
host    all   all   127.0.0.1/32   scram-sha-256

Restart PostgreSQL:

sudo systemctl restart postgresql
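
To confirm the database, user, and extensions are usable over TCP (this assumes the scram-sha-256 host rule above; you will be prompted for the password you chose):

psql -U flyer_crawler -h 127.0.0.1 -d flyer_crawler -c 'SELECT postgis_version();'
psql -U flyer_crawler -h 127.0.0.1 -d flyer_crawler -c 'SELECT extname FROM pg_extension;'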

Redis Setup

Install Redis

sudo apt install -y redis-server

Configure Redis Password

Edit /etc/redis/redis.conf:

requirepass YOUR_REDIS_PASSWORD

Restart Redis:

sudo systemctl restart redis-server
sudo systemctl enable redis-server

Test Redis Connection

redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG

Node.js and Application Setup

Install Node.js 20.x

curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs

Verify installation:

node --version  # Should output v20.x.x
npm --version

Install System Dependencies for PDF Processing

sudo apt install -y poppler-utils  # For pdftocairo
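
Verify the PDF tooling is available:

pdftocairo -v  # prints the Poppler version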

Clone and Install Application

# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler

# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .

# Install dependencies
npm install

# Build for production
npm run build

Configure Environment Variables

Important: The flyer-crawler application does not use local environment files in production. All secrets are managed through Gitea CI/CD secrets and injected during deployment.

How Secrets Work

  1. Secrets are stored in Gitea at Repository → Settings → Actions → Secrets
  2. Workflow files (.gitea/workflows/deploy-to-prod.yml) reference secrets using ${{ secrets.SECRET_NAME }}
  3. PM2 receives environment variables from the workflow's env: block
  4. ecosystem.config.cjs passes variables to the application via process.env

Required Gitea Secrets

Before deployment, ensure these secrets are configured in Gitea:

Shared Secrets (used by both production and test):

| Secret Name | Description |
| --- | --- |
| DB_HOST | Database hostname (usually localhost) |
| DB_USER | Database username |
| DB_PASSWORD | Database password |
| JWT_SECRET | JWT signing secret (min 32 characters) |
| GOOGLE_MAPS_API_KEY | Google Maps API key |
| GOOGLE_CLIENT_ID | Google OAuth client ID |
| GOOGLE_CLIENT_SECRET | Google OAuth client secret |
| GH_CLIENT_ID | GitHub OAuth client ID |
| GH_CLIENT_SECRET | GitHub OAuth client secret |

Production-Specific Secrets:

| Secret Name | Description |
| --- | --- |
| DB_DATABASE_PROD | Production database name (flyer_crawler) |
| REDIS_PASSWORD_PROD | Redis password for production (uses database 0) |
| VITE_GOOGLE_GENAI_API_KEY | Gemini API key for production |
| SENTRY_DSN | Bugsink backend DSN (see Bugsink section) |
| VITE_SENTRY_DSN | Bugsink frontend DSN |

Test-Specific Secrets:

| Secret Name | Description |
| --- | --- |
| DB_DATABASE_TEST | Test database name (flyer-crawler-test) |
| REDIS_PASSWORD_TEST | Redis password for test (uses database 1 for isolation) |
| VITE_GOOGLE_GENAI_API_KEY_TEST | Gemini API key for test environment |
| SENTRY_DSN_TEST | Bugsink backend DSN for test (see Bugsink section) |
| VITE_SENTRY_DSN_TEST | Bugsink frontend DSN for test |

Test Environment Details

The test environment (flyer-crawler-test.projectium.com) uses both Gitea CI/CD secrets and a local .env.test file:

| Path | Purpose |
| --- | --- |
| /var/www/flyer-crawler-test.projectium.com/ | Test application directory |
| /var/www/flyer-crawler-test.projectium.com/.env.test | Local overrides for test-specific config |

Key differences from production:

  • Uses Redis database 1 (production uses database 0) to isolate job queues
  • PM2 processes are named with -test suffix (e.g., flyer-crawler-api-test)
  • Deployed automatically on every push to main branch
  • Has a .env.test file for additional local configuration overrides
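
A quick way to confirm the queue isolation described above. This sketch assumes BullMQ-style keys prefixed with bull: (adjust the pattern if the queues use a different prefix) and uses placeholder passwords:

# Test queues should live in Redis database 1...
redis-cli -a YOUR_REDIS_PASSWORD -n 1 --scan --pattern 'bull:*' | head

# ...while production queues live in database 0
redis-cli -a YOUR_REDIS_PASSWORD -n 0 --scan --pattern 'bull:*' | head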

For detailed information on secrets management, see CLAUDE.md.


PM2 Process Manager

Install PM2 Globally

sudo npm install -g pm2

PM2 Configuration Files

The application uses separate ecosystem config files for production and test environments:

| File | Purpose | Processes Started |
| --- | --- | --- |
| ecosystem.config.cjs | Production deployment | flyer-crawler-api, flyer-crawler-worker, flyer-crawler-analytics-worker |
| ecosystem-test.config.cjs | Test deployment | flyer-crawler-api-test, flyer-crawler-worker-test, flyer-crawler-analytics-worker-test |

Key Points:

  • Production and test processes run simultaneously with distinct names
  • Test processes use NODE_ENV=test which enables file logging
  • Test processes use Redis database 1 (isolated from production which uses database 0)
  • Both configs validate required environment variables but only warn (don't exit) if missing

Start Production Application

cd /var/www/flyer-crawler.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables

pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save

This starts three production processes:

  • flyer-crawler-api - Main API server (port 3001)
  • flyer-crawler-worker - Background job worker
  • flyer-crawler-analytics-worker - Analytics processing worker
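
To confirm the API process is actually listening, check the port and hit an endpoint. The /api/health path here is an assumption; substitute a route you know exists:

ss -ltnp | grep 3001
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3001/api/health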

Start Test Application

cd /var/www/flyer-crawler-test.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1  # Use database 1 for isolation
# ... other required variables

pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save

This starts three test processes (running alongside production):

  • flyer-crawler-api-test - Test API server (port 3001 via different NGINX vhost)
  • flyer-crawler-worker-test - Test background job worker
  • flyer-crawler-analytics-worker-test - Test analytics worker

Verify Running Processes

After starting both environments, you should see 6 application processes:

pm2 list

Expected output:

┌────┬───────────────────────────────────┬──────────┬────────┬───────────┐
│ id │ name                              │ mode     │ status │ cpu       │
├────┼───────────────────────────────────┼──────────┼────────┼───────────┤
│ 0  │ flyer-crawler-api                 │ cluster  │ online │ 0%        │
│ 1  │ flyer-crawler-worker              │ fork     │ online │ 0%        │
│ 2  │ flyer-crawler-analytics-worker    │ fork     │ online │ 0%        │
│ 3  │ flyer-crawler-api-test            │ fork     │ online │ 0%        │
│ 4  │ flyer-crawler-worker-test         │ fork     │ online │ 0%        │
│ 5  │ flyer-crawler-analytics-worker-test│ fork     │ online │ 0%        │
└────┴───────────────────────────────────┴──────────┴────────┴───────────┘

Configure PM2 Startup

pm2 startup systemd
# Follow the command output to enable PM2 on boot

pm2 save

PM2 Log Rotation

pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
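
To confirm the module picked up these settings (the pm2-logrotate module exposes its configuration via pm2 conf):

pm2 conf pm2-logrotate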

Useful PM2 Commands

# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50

# View environment variables for a process
pm2 env <process-id>

# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test

# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test

NGINX Reverse Proxy

Install NGINX

sudo apt install -y nginx

Create Site Configuration

Create /etc/nginx/sites-available/flyer-crawler.projectium.com:

server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files
    types {
        application/javascript js mjs;
    }
}

Enable the Site

sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
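
With the site enabled and the PM2 processes running, a quick check from the server itself should return an HTTP response from each upstream (even a 404 confirms the proxy path works):

curl -I -H 'Host: flyer-crawler.projectium.com' http://localhost/
curl -I -H 'Host: flyer-crawler.projectium.com' http://localhost/api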

Bugsink Error Tracking

Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. This guide follows the official Bugsink single-server production setup.

See ADR-015 for architecture details.

Step 1: Create Bugsink User

Create a dedicated non-root user for Bugsink:

sudo adduser bugsink --disabled-password --gecos ""

Step 2: Set Up Virtual Environment and Install Bugsink

Switch to the bugsink user:

sudo su - bugsink

Create the virtual environment:

python3 -m venv venv

Activate the virtual environment:

source venv/bin/activate

You should see (venv) at the beginning of your prompt. Now install Bugsink:

pip install bugsink --upgrade
bugsink-show-version

You should see output like bugsink 2.x.x.

Step 3: Create Configuration File

Generate the configuration file. Replace bugsink.yourdomain.com with your actual hostname:

bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com

This creates bugsink_conf.py in /home/bugsink/. Edit it to customize settings:

nano bugsink_conf.py

Key settings to review:

| Setting | Description |
| --- | --- |
| BASE_URL | The URL where Bugsink will be accessed (e.g., https://bugsink.yourdomain.com) |
| SITE_TITLE | Display name for your Bugsink instance |
| SECRET_KEY | Auto-generated, but verify it exists |
| TIME_ZONE | Your timezone (e.g., America/New_York) |
| USER_REGISTRATION | Set to "closed" to disable public signup |
| SINGLE_USER | Set to True if only one user will use this instance |
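
A quick way to review the values you just edited (run as the bugsink user, from /home/bugsink; setting names as listed above):

grep -E '^(BASE_URL|SITE_TITLE|TIME_ZONE|USER_REGISTRATION|SINGLE_USER)' bugsink_conf.py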

Step 4: Initialize Database

Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:

bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea

Verify the database files were created:

ls *.sqlite3

You should see db.sqlite3 and snappea.sqlite3.

Step 5: Create Admin User

Create the superuser account. Using your email as the username is recommended:

bugsink-manage createsuperuser

Important: Save these credentials - you'll need them to log into the Bugsink web UI.

Step 6: Verify Configuration

Run Django's deployment checks:

bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING

Exit back to root for the next steps:

exit

Step 7: Create Gunicorn Service

Create /etc/systemd/system/gunicorn-bugsink.service:

sudo nano /etc/systemd/system/gunicorn-bugsink.service

Add the following content:

[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target

[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink

Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
    --bind="127.0.0.1:8000" \
    --workers=4 \
    --timeout=6 \
    --access-logfile - \
    --max-requests=1000 \
    --max-requests-jitter=100 \
    bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service

Test that Gunicorn is responding (replace hostname):

curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"

You should see HTML output containing a login form.

Step 8: Create Snappea Background Worker Service

Snappea is Bugsink's background task processor. Create /etc/systemd/system/snappea.service:

sudo nano /etc/systemd/system/snappea.service

Add the following content:

[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target

[Service]
Restart=always
User=bugsink
Group=bugsink

Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service

Verify snappea is working:

sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit

Step 9: Configure NGINX for Bugsink

Create /etc/nginx/sites-available/bugsink:

sudo nano /etc/nginx/sites-available/bugsink

Add the following (replace bugsink.yourdomain.com with your hostname):

server {
    server_name bugsink.yourdomain.com;
    listen 80;

    client_max_body_size 20M;

    access_log /var/log/nginx/bugsink.access.log;
    error_log /var/log/nginx/bugsink.error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Enable the site:

sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

Step 10: Obtain an SSL Certificate and Add Security Headers

Request a certificate for the Bugsink hostname:

sudo certbot --nginx -d bugsink.yourdomain.com

After SSL is configured, update the NGINX config to add security headers. Edit /etc/nginx/sites-available/bugsink and add the following to the location / block:

add_header Strict-Transport-Security "max-age=31536000; preload" always;

Reload NGINX:

sudo nginx -t
sudo systemctl reload nginx

Step 11: Create Projects and Get DSNs

  1. Access Bugsink UI at https://bugsink.yourdomain.com

  2. Log in with the admin credentials you created

  3. Create a new team (or use the default)

  4. Create projects for each environment:

    Production:

    • flyer-crawler-backend (Platform: Node.js)
    • flyer-crawler-frontend (Platform: JavaScript/React)

    Test:

    • flyer-crawler-backend-test (Platform: Node.js)
    • flyer-crawler-frontend-test (Platform: JavaScript/React)
  5. For each project, go to Settings → Client Keys (DSN)

  6. Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)

Note: The dev container runs its own local Bugsink instance at localhost:8000 - no remote DSNs needed for development.

Step 12: Configure Application to Use Bugsink

The flyer-crawler application receives its configuration via Gitea CI/CD secrets, not local environment files. Follow these steps to add the Bugsink DSNs:

1. Add Secrets in Gitea

Navigate to your repository in Gitea:

  1. Go to Settings → Actions → Secrets
  2. Add the following secrets:

Production DSNs:

| Secret Name | Value | Description |
| --- | --- | --- |
| SENTRY_DSN | https://KEY@bugsink.yourdomain.com/1 | Production backend DSN |
| VITE_SENTRY_DSN | https://KEY@bugsink.yourdomain.com/2 | Production frontend DSN |

Test DSNs:

| Secret Name | Value | Description |
| --- | --- | --- |
| SENTRY_DSN_TEST | https://KEY@bugsink.yourdomain.com/3 | Test backend DSN |
| VITE_SENTRY_DSN_TEST | https://KEY@bugsink.yourdomain.com/4 | Test frontend DSN |

Note: The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.

2. Update the Deployment Workflows

Production (deploy-to-prod.yml):

In the Install Backend Dependencies and Restart Production Server step, add to the env: block:

env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
  SENTRY_ENVIRONMENT: 'production'
  SENTRY_ENABLED: 'true'

In the build step, add frontend variables:

VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build

Test (deploy-to-test.yml):

In the Install Backend Dependencies and Restart Test Server step, add to the env: block:

env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking (Test)
  SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
  SENTRY_ENVIRONMENT: 'test'
  SENTRY_ENABLED: 'true'

In the build step, add frontend variables:

VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build

3. Update ecosystem.config.cjs

Add Sentry variables to the sharedEnv object in ecosystem.config.cjs:

const sharedEnv = {
  // ... existing variables ...
  SENTRY_DSN: process.env.SENTRY_DSN,
  SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
  SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};

4. Dev Container (No Configuration Needed)

The dev container runs its own local Bugsink instance at http://localhost:8000. No remote DSNs or Gitea secrets are needed for development:

  • DSNs are pre-configured in compose.dev.yml
  • Admin UI: http://localhost:8000 (login: admin@localhost / admin)
  • Errors stay local and isolated from production/test

5. Deploy to Apply Changes

Trigger deployments via Gitea Actions:

  • Test: Automatically deploys on push to main
  • Production: Manual trigger via workflow dispatch

Note: There is no /etc/flyer-crawler/environment file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time. Dev container uses local .env file. See CLAUDE.md for details.

Step 13: Test Error Tracking

You can test Bugsink is working before configuring the flyer-crawler application.

Switch to the bugsink user and open a Python shell:

sudo su - bugsink
source venv/bin/activate
bugsink-manage shell

In the Python shell, send a test message using the backend DSN from Step 11:

import sentry_sdk
sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()

Exit back to root:

exit

Check the Bugsink UI - you should see the test message appear in the flyer-crawler-backend project.

Step 14: Test from Flyer-Crawler Application (After App Setup)

Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:

cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts

Check the Bugsink UI - you should see a test event appear.

Bugsink Maintenance Commands

| Task | Command |
| --- | --- |
| View Gunicorn status | sudo systemctl status gunicorn-bugsink |
| View Snappea status | sudo systemctl status snappea |
| View Gunicorn logs | sudo journalctl -u gunicorn-bugsink -f |
| View Snappea logs | sudo journalctl -u snappea -f |
| Restart Bugsink | sudo systemctl restart gunicorn-bugsink snappea |
| Run management commands | sudo su - bugsink, then source venv/bin/activate && bugsink-manage <command> |
| Upgrade Bugsink | sudo su - bugsink -c 'source venv/bin/activate && pip install bugsink --upgrade', then sudo systemctl restart gunicorn-bugsink snappea |

Logstash Log Aggregation

Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.

Note: Logstash integration is optional. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.

Step 1: Create Application Log Directory

The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs.

Create the log directories and set appropriate permissions:

# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs

# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs

# Make logs readable by logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs

For the test environment:

sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs

Step 2: Application File Logging (Already Configured)

The flyer-crawler application uses Pino for logging and is configured to write logs to files in production/test environments:

Log File Locations:

| Environment | Log File Path |
| --- | --- |
| Production | /var/www/flyer-crawler.projectium.com/logs/app.log |
| Test | /var/www/flyer-crawler-test.projectium.com/logs/app.log |
| Dev Container | /app/logs/app.log |

How It Works:

  • In production/test: Pino writes JSON logs to both stdout (for PM2) AND logs/app.log (for Logstash)
  • In development: Pino uses pino-pretty for human-readable console output only
  • The log directory is created automatically if it doesn't exist
  • You can override the log directory with the LOG_DIR environment variable
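
For example, to point the test environment at a custom log directory (a sketch, assuming LOG_DIR is read at process start):

sudo mkdir -p /var/log/flyer-crawler-test
export LOG_DIR=/var/log/flyer-crawler-test
pm2 restart flyer-crawler-api-test --update-env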

Verify Logging After Deployment:

After deploying the application, verify that logs are being written:

# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log

# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log

You should see JSON-formatted log entries like:

{ "level": 30, "time": 1704067200000, "msg": "Server started on port 3001", "module": "server" }

Step 3: Install Logstash

# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Update and install
sudo apt update
sudo apt install -y logstash

Verify installation:

/usr/share/logstash/bin/logstash --version

Step 4: Configure Logstash Pipeline

Create the pipeline configuration file:

sudo nano /etc/logstash/conf.d/bugsink.conf

Add the following content:

input {
  # Production application logs (Pino JSON format)
  file {
    path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
  }

  # Test environment logs
  file {
    path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_test"
  }

  # Redis logs (shared by both environments)
  file {
    path => "/var/log/redis/redis-server.log"
    type => "redis"
    tags => ["infra", "redis", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_redis"
  }

  # NGINX error logs (production)
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
  }

  # NGINX access logs - for detecting 5xx errors (production)
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
  }

  # PM2 error logs - Production (plain text stack traces)
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
    exclude => "*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
  }

  # PM2 error logs - Test
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
  }
}

filter {
  # Pino log level detection
  # Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
  if [type] == "pino" and [level] {
    if [level] >= 50 {
      mutate { add_tag => ["error"] }
    } else if [level] >= 40 {
      mutate { add_tag => ["warning"] }
    }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }

  # NGINX error log detection (all entries are errors)
  if [type] == "nginx" {
    mutate { add_tag => ["error"] }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
    }
  }

  # NGINX access log - detect 5xx errors
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    if [response] =~ /^5\d{2}$/ {
      mutate { add_tag => ["error"] }
    }
  }

  # PM2 error log detection - tag lines with actual error indicators
  if [type] == "pm2" {
    if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  # Production app errors -> flyer-crawler-backend (project 1)
  if "error" in [tags] and "app" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
      }
    }
  }

  # Test app errors -> flyer-crawler-backend-test (project 3)
  if "error" in [tags] and "app" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/3/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
      }
    }
  }

  # Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
  if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/5/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
      }
    }
  }

  # Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
  if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/6/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
      }
    }
  }

  # Debug output (uncomment to troubleshoot)
  # stdout { codec => rubydebug }
}

Bugsink Project DSNs:

| Project | DSN Key | Project ID |
| --- | --- | --- |
| flyer-crawler-backend | 911aef02b9a548fa8fabb8a3c81abfe5 | 1 |
| flyer-crawler-frontend | (used by app, not Logstash) | 2 |
| flyer-crawler-backend-test | cdb99c314589431e83d4cc38a809449b | 3 |
| flyer-crawler-frontend-test | (used by app, not Logstash) | 4 |
| flyer-crawler-infrastructure | b083076f94fb461b889d5dffcbef43bf | 5 |
| flyer-crawler-test-infrastructure | 25020dd6c2b74ad78463ec90e90fadab | 6 |

Note: The DSN key is the part before @ in the full DSN URL (e.g., https://KEY@bugsink.projectium.com/PROJECT_ID).
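
If you want to extract the key on the command line, something like this works (the DSN shown is just the placeholder pattern from the note above):

DSN='https://KEY@bugsink.projectium.com/1'
echo "$DSN" | sed -E 's|https://([^@]+)@.*|\1|'   # prints KEY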

Note on PM2 Logs: PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure), test PM2 logs go to project 6 (test-infrastructure).

Step 5: Create Logstash State Directory and Fix Config Path

Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:

# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash

# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config

Step 6: Grant Logstash Access to Application Logs

Logstash runs as the logstash user and needs permission to read log files:

# Add logstash user to adm group (for nginx and redis logs)
sudo usermod -aG adm logstash

# Make application log files readable (created automatically when app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"

# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log

# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log

# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log

# Verify logstash group membership
groups logstash

Note: The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.
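
To confirm the permissions actually work from Logstash's point of view, try reading each input as the logstash user:

sudo -u logstash head -n 1 /var/www/flyer-crawler.projectium.com/logs/app.log
sudo -u logstash head -n 1 /var/log/nginx/error.log
sudo -u logstash head -n 1 /var/log/redis/redis-server.log
sudo -u logstash ls /home/gitea-runner/.pm2/logs/ | head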

Step 7: Test Logstash Configuration

Test the configuration before starting:

sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf

You should see Configuration OK if there are no errors.

Step 8: Start Logstash

sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash

View Logstash logs to verify it's working:

sudo journalctl -u logstash -f
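
For an end-to-end check of the pipeline, you can append a synthetic Pino-style error line (level 50) to the test application log and watch for it in the flyer-crawler-backend-test project. The field names follow the Pino format shown earlier; the module value is arbitrary:

echo '{"level":50,"time":'"$(date +%s000)"',"msg":"Logstash pipeline test","module":"setup-check"}' \
  | sudo tee -a /var/www/flyer-crawler-test.projectium.com/logs/app.log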

Troubleshooting Logstash

| Issue | Solution |
| --- | --- |
| "Permission denied" errors | Check file permissions on log files and the sincedb directory |
| No events being processed | Verify log file paths exist and contain data |
| HTTP output errors | Check that Bugsink is running and the DSN key is correct |
| Logstash not starting | Run the config test: sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/ |

Alternative: Skip Logstash

Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:

  • Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
  • Centralizing all logs in one place
  • Complex log transformations

If you only need application error tracking, the Sentry SDK integration is sufficient.


SSL/TLS with Let's Encrypt

Install Certbot

sudo apt install -y certbot python3-certbot-nginx

Obtain Certificate

sudo certbot --nginx -d flyer-crawler.projectium.com

Certbot will automatically configure NGINX for HTTPS.

Auto-Renewal

Certbot installs a systemd timer for automatic renewal. Verify:

sudo systemctl status certbot.timer
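
You can also exercise the renewal path without changing anything:

sudo certbot renew --dry-run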

Firewall Configuration

Configure UFW

sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow ssh

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable firewall
sudo ufw enable

Important: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.
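
Confirm the resulting rule set; port 8000 should not appear anywhere in it:

sudo ufw status verbose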


Maintenance Commands

Application Management

| Task | Command |
| --- | --- |
| View PM2 status | pm2 status |
| View application logs | pm2 logs |
| Restart all processes | pm2 restart all |
| Restart specific app | pm2 restart flyer-crawler-api |
| Update application | cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all |

Service Management

| Service | Start | Stop | Status |
| --- | --- | --- | --- |
| PostgreSQL | sudo systemctl start postgresql | sudo systemctl stop postgresql | sudo systemctl status postgresql |
| Redis | sudo systemctl start redis-server | sudo systemctl stop redis-server | sudo systemctl status redis-server |
| NGINX | sudo systemctl start nginx | sudo systemctl stop nginx | sudo systemctl status nginx |
| Bugsink | sudo systemctl start gunicorn-bugsink snappea | sudo systemctl stop gunicorn-bugsink snappea | sudo systemctl status gunicorn-bugsink snappea |
| Logstash | sudo systemctl start logstash | sudo systemctl stop logstash | sudo systemctl status logstash |

Database Backup

# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql

# Backup Bugsink databases (Bugsink uses SQLite in this setup, not PostgreSQL)
sudo cp /home/bugsink/db.sqlite3 /var/backups/bugsink_db_$(date +%Y%m%d).sqlite3
sudo cp /home/bugsink/snappea.sqlite3 /var/backups/bugsink_snappea_$(date +%Y%m%d).sqlite3
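
To run the application backup on a schedule, a root crontab entry along these lines could be used (this assumes a ~/.pgpass entry for the flyer_crawler user so pg_dump can authenticate non-interactively; note the escaped % signs, which cron otherwise treats specially):

0 2 * * * pg_dump -U flyer_crawler -h localhost flyer_crawler > /var/backups/flyer_crawler_$(date +\%Y\%m\%d).sql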

Log Locations

| Log | Location |
| --- | --- |
| Application (PM2) | ~/.pm2/logs/ |
| NGINX access | /var/log/nginx/access.log |
| NGINX error | /var/log/nginx/error.log |
| PostgreSQL | /var/log/postgresql/ |
| Redis | /var/log/redis/ |
| Bugsink | journalctl -u gunicorn-bugsink and journalctl -u snappea |
| Logstash | /var/log/logstash/ |