# Bare-Metal Server Setup Guide
This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.
**Target Environment**: Ubuntu 22.04 LTS (or newer)
---
## Table of Contents
1. [System Prerequisites](#system-prerequisites)
2. [PostgreSQL Setup](#postgresql-setup)
3. [Redis Setup](#redis-setup)
4. [Node.js and Application Setup](#nodejs-and-application-setup)
5. [PM2 Process Manager](#pm2-process-manager)
6. [NGINX Reverse Proxy](#nginx-reverse-proxy)
7. [Bugsink Error Tracking](#bugsink-error-tracking)
8. [Logstash Log Aggregation](#logstash-log-aggregation)
9. [SSL/TLS with Let's Encrypt](#ssltls-with-lets-encrypt)
10. [Firewall Configuration](#firewall-configuration)
11. [Maintenance Commands](#maintenance-commands)
---
## System Prerequisites
Update the system and install essential packages:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
```
---
## PostgreSQL Setup
### Install PostgreSQL 14+ with PostGIS
```bash
# Add PostgreSQL APT repository (apt-key is deprecated; use a keyring instead)
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3
```
### Create Application Database and User
```bash
sudo -u postgres psql
```
```sql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;
-- Connect to the database and enable extensions
\c flyer_crawler
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;
\q
```
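Optionally, verify that the application role can connect over TCP and that PostGIS loads (you will be prompted for the password set above):
```bash
psql -U flyer_crawler -h localhost -d flyer_crawler -c "SELECT PostGIS_version();"
```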
### Configure PostgreSQL for Remote Access (if needed)
Edit `/etc/postgresql/14/main/postgresql.conf`:
```conf
listen_addresses = 'localhost' # Change to '*' for remote access
```
Edit `/etc/postgresql/14/main/pg_hba.conf` to add allowed hosts:
```conf
# Local connections
local all all peer
host all all 127.0.0.1/32 scram-sha-256
```
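If you do open remote access, also append a `host` rule for the trusted client network; the subnet below is a placeholder:
```bash
# Example only - replace 10.0.0.0/24 with your actual client subnet
echo "host flyer_crawler flyer_crawler 10.0.0.0/24 scram-sha-256" | sudo tee -a /etc/postgresql/14/main/pg_hba.conf
```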
Restart PostgreSQL:
```bash
sudo systemctl restart postgresql
```
---
## Redis Setup
### Install Redis
```bash
sudo apt install -y redis-server
```
### Configure Redis Password
Edit `/etc/redis/redis.conf`:
```conf
requirepass YOUR_REDIS_PASSWORD
```
Restart Redis:
```bash
sudo systemctl restart redis-server
sudo systemctl enable redis-server
```
### Test Redis Connection
```bash
redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG
```
---
## Node.js and Application Setup
### Install Node.js 20.x
```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
```
Verify installation:
```bash
node --version # Should output v20.x.x
npm --version
```
### Install System Dependencies for PDF Processing
```bash
sudo apt install -y poppler-utils # For pdftocairo
```
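Verify that `pdftocairo` is on the `PATH`:
```bash
pdftocairo -v
```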
### Clone and Install Application
```bash
# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler
# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .
# Install dependencies
npm install
# Build for production
npm run build
```
### Configure Environment Variables
**Important:** The flyer-crawler application does **not** use local environment files in production. All secrets are managed through **Gitea CI/CD secrets** and injected during deployment.
#### How Secrets Work
1. **Secrets are stored in Gitea** at Repository → Settings → Actions → Secrets
2. **Workflow files** (`.gitea/workflows/deploy-to-prod.yml`) reference secrets using `${{ secrets.SECRET_NAME }}`
3. **PM2** receives environment variables from the workflow's `env:` block
4. **ecosystem.config.cjs** passes variables to the application via `process.env`
#### Required Gitea Secrets
Before deployment, ensure these secrets are configured in Gitea:
**Shared Secrets** (used by both production and test):
| Secret Name | Description |
| ---------------------- | --------------------------------------- |
| `DB_HOST` | Database hostname (usually `localhost`) |
| `DB_USER` | Database username |
| `DB_PASSWORD` | Database password |
| `JWT_SECRET` | JWT signing secret (min 32 characters) |
| `GOOGLE_MAPS_API_KEY` | Google Maps API key |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
| `GH_CLIENT_ID` | GitHub OAuth client ID |
| `GH_CLIENT_SECRET` | GitHub OAuth client secret |
**Production-Specific Secrets**:
| Secret Name | Description |
| --------------------------- | -------------------------------------------------------------------- |
| `DB_DATABASE_PROD` | Production database name (`flyer_crawler`) |
| `REDIS_PASSWORD_PROD` | Redis password for production (uses database 0) |
| `VITE_GOOGLE_GENAI_API_KEY` | Gemini API key for production |
| `SENTRY_DSN` | Bugsink backend DSN (see [Bugsink section](#bugsink-error-tracking)) |
| `VITE_SENTRY_DSN` | Bugsink frontend DSN |
**Test-Specific Secrets**:
| Secret Name | Description |
| -------------------------------- | ----------------------------------------------------------------------------- |
| `DB_DATABASE_TEST` | Test database name (`flyer-crawler-test`) |
| `REDIS_PASSWORD_TEST` | Redis password for test (uses database 1 for isolation) |
| `VITE_GOOGLE_GENAI_API_KEY_TEST` | Gemini API key for test environment |
| `SENTRY_DSN_TEST` | Bugsink backend DSN for test (see [Bugsink section](#bugsink-error-tracking)) |
| `VITE_SENTRY_DSN_TEST` | Bugsink frontend DSN for test |
#### Test Environment Details
The test environment (`flyer-crawler-test.projectium.com`) uses **both** Gitea CI/CD secrets and a local `.env.test` file:
| Path | Purpose |
| ------------------------------------------------------ | ---------------------------------------- |
| `/var/www/flyer-crawler-test.projectium.com/` | Test application directory |
| `/var/www/flyer-crawler-test.projectium.com/.env.test` | Local overrides for test-specific config |
**Key differences from production:**
- Uses Redis database **1** (production uses database **0**) to isolate job queues
- PM2 processes are named with `-test` suffix (e.g., `flyer-crawler-api-test`)
- Deployed automatically on every push to `main` branch
- Has a `.env.test` file for additional local configuration overrides
For detailed information on secrets management, see [CLAUDE.md](../CLAUDE.md).
---
## PM2 Process Manager
### Install PM2 Globally
```bash
sudo npm install -g pm2
```
### PM2 Configuration Files
The application uses **separate ecosystem config files** for production and test environments:
| File | Purpose | Processes Started |
| --------------------------- | --------------------- | -------------------------------------------------------------------------------------------- |
| `ecosystem.config.cjs` | Production deployment | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker` |
| `ecosystem-test.config.cjs` | Test deployment | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test` |
**Key Points:**
- Production and test processes run **simultaneously** with distinct names
- Test processes use `NODE_ENV=test` which enables file logging
- Test processes use Redis database 1 (isolated from production which uses database 0)
- Both configs validate required environment variables but only warn (don't exit) if missing
### Start Production Application
```bash
cd /var/www/flyer-crawler.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables
pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save
```
This starts three production processes:
- `flyer-crawler-api` - Main API server (port 3001)
- `flyer-crawler-worker` - Background job worker
- `flyer-crawler-analytics-worker` - Analytics processing worker
### Start Test Application
```bash
cd /var/www/flyer-crawler-test.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1 # Use database 1 for isolation
# ... other required variables
pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save
```
This starts three test processes (running alongside production):
- `flyer-crawler-api-test` - Test API server (proxied via its own NGINX vhost)
- `flyer-crawler-worker-test` - Test background job worker
- `flyer-crawler-analytics-worker-test` - Test analytics worker
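With both environments up, you can spot-check the Redis isolation from the command line; this sketch assumes BullMQ's default `bull:` key prefix:
```bash
# Test queue keys should appear in database 1, production keys in database 0
redis-cli -a YOUR_REDIS_PASSWORD -n 1 --scan --pattern 'bull:*' | head
```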
### Verify Running Processes
After starting both environments, you should see 6 application processes:
```bash
pm2 list
```
Expected output:
```text
┌────┬─────────────────────────────────────┬─────────┬────────┬─────┐
│ id │ name                                │ mode    │ status │ cpu │
├────┼─────────────────────────────────────┼─────────┼────────┼─────┤
│ 0  │ flyer-crawler-api                   │ cluster │ online │ 0%  │
│ 1  │ flyer-crawler-worker                │ fork    │ online │ 0%  │
│ 2  │ flyer-crawler-analytics-worker      │ fork    │ online │ 0%  │
│ 3  │ flyer-crawler-api-test              │ fork    │ online │ 0%  │
│ 4  │ flyer-crawler-worker-test           │ fork    │ online │ 0%  │
│ 5  │ flyer-crawler-analytics-worker-test │ fork    │ online │ 0%  │
└────┴─────────────────────────────────────┴─────────┴────────┴─────┘
```
### Configure PM2 Startup
```bash
pm2 startup systemd
# Follow the command output to enable PM2 on boot
pm2 save
```
### PM2 Log Rotation
```bash
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
```
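Confirm the settings were applied:
```bash
pm2 conf pm2-logrotate
```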
### Useful PM2 Commands
```bash
# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50
# View environment variables for a process
pm2 env <process-id>
# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
```
---
## NGINX Reverse Proxy
### Install NGINX
```bash
sudo apt install -y nginx
```
### Create Site Configuration
Create `/etc/nginx/sites-available/flyer-crawler.projectium.com`:
```nginx
server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files.
    # Note: a `types` block replaces the MIME map inherited from the http
    # context; alternatively, add `mjs` to the `application/javascript`
    # line in /etc/nginx/mime.types.
    types {
        application/javascript js mjs;
    }
}
```
### Enable the Site
```bash
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
```
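Before (or after) pointing DNS at the server, you can smoke-test the vhost locally by spoofing the `Host` header (this assumes the frontend is already being served on port 5173):
```bash
curl -sI http://127.0.0.1 -H "Host: flyer-crawler.projectium.com"
```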
---
## Bugsink Error Tracking
Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. This guide follows the [official Bugsink single-server production setup](https://www.bugsink.com/docs/single-server-production/).
See [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) for architecture details.
### Step 1: Create Bugsink User
Create a dedicated non-root user for Bugsink:
```bash
sudo adduser bugsink --disabled-password --gecos ""
```
### Step 2: Set Up Virtual Environment and Install Bugsink
Switch to the bugsink user:
```bash
sudo su - bugsink
```
Create the virtual environment:
```bash
python3 -m venv venv
```
Activate the virtual environment:
```bash
source venv/bin/activate
```
You should see `(venv)` at the beginning of your prompt. Now install Bugsink:
```bash
pip install bugsink --upgrade
bugsink-show-version
```
You should see output like `bugsink 2.x.x`.
### Step 3: Create Configuration File
Generate the configuration file. Replace `bugsink.yourdomain.com` with your actual hostname:
```bash
bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com
```
This creates `bugsink_conf.py` in `/home/bugsink/`. Edit it to customize settings:
```bash
nano bugsink_conf.py
```
**Key settings to review:**
| Setting | Description |
| ------------------- | ------------------------------------------------------------------------------- |
| `BASE_URL` | The URL where Bugsink will be accessed (e.g., `https://bugsink.yourdomain.com`) |
| `SITE_TITLE` | Display name for your Bugsink instance |
| `SECRET_KEY` | Auto-generated, but verify it exists |
| `TIME_ZONE` | Your timezone (e.g., `America/New_York`) |
| `USER_REGISTRATION` | Set to `"closed"` to disable public signup |
| `SINGLE_USER` | Set to `True` if only one user will use this instance |
### Step 4: Initialize Database
Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:
```bash
bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea
```
Verify the database files were created:
```bash
ls *.sqlite3
```
You should see `db.sqlite3` and `snappea.sqlite3`.
### Step 5: Create Admin User
Create the superuser account. Using your email as the username is recommended:
```bash
bugsink-manage createsuperuser
```
**Important:** Save these credentials - you'll need them to log into the Bugsink web UI.
### Step 6: Verify Configuration
Run Django's deployment checks:
```bash
bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING
```
Exit back to root for the next steps:
```bash
exit
```
### Step 7: Create Gunicorn Service
Create `/etc/systemd/system/gunicorn-bugsink.service`:
```bash
sudo nano /etc/systemd/system/gunicorn-bugsink.service
```
Add the following content:
```ini
[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target
[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
    --bind="127.0.0.1:8000" \
    --workers=4 \
    --timeout=6 \
    --access-logfile - \
    --max-requests=1000 \
    --max-requests-jitter=100 \
    bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service
```
Test that Gunicorn is responding (replace hostname):
```bash
curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"
```
You should see HTML output containing a login form.
### Step 8: Create Snappea Background Worker Service
Snappea is Bugsink's background task processor. Create `/etc/systemd/system/snappea.service`:
```bash
sudo nano /etc/systemd/system/snappea.service
```
Add the following content:
```ini
[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target
[Service]
Restart=always
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d
[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service
```
Verify snappea is working:
```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit
```
### Step 9: Configure NGINX for Bugsink
Create `/etc/nginx/sites-available/bugsink`:
```bash
sudo nano /etc/nginx/sites-available/bugsink
```
Add the following (replace `bugsink.yourdomain.com` with your hostname):
```nginx
server {
    server_name bugsink.yourdomain.com;
    listen 80;

    client_max_body_size 20M;

    access_log /var/log/nginx/bugsink.access.log;
    error_log /var/log/nginx/bugsink.error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Enable the site:
```bash
sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```
### Step 10: Configure SSL with Certbot (Recommended)
```bash
sudo certbot --nginx -d bugsink.yourdomain.com
```
After SSL is configured, update the NGINX config to add security headers. Edit `/etc/nginx/sites-available/bugsink` and add to the `location /` block:
```nginx
add_header Strict-Transport-Security "max-age=31536000; preload" always;
```
Reload NGINX:
```bash
sudo nginx -t
sudo systemctl reload nginx
```
### Step 11: Create Projects and Get DSNs
1. Access Bugsink UI at `https://bugsink.yourdomain.com`
2. Log in with the admin credentials you created
3. Create a new team (or use the default)
4. Create projects for each environment:
**Production:**
- **flyer-crawler-backend** (Platform: Node.js)
- **flyer-crawler-frontend** (Platform: JavaScript/React)
**Test:**
- **flyer-crawler-backend-test** (Platform: Node.js)
- **flyer-crawler-frontend-test** (Platform: JavaScript/React)
5. For each project, go to Settings → Client Keys (DSN)
6. Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)
> **Note:** The dev container runs its own local Bugsink instance at `localhost:8000` - no remote DSNs needed for development.
### Step 12: Configure Application to Use Bugsink
The flyer-crawler application receives its configuration via **Gitea CI/CD secrets**, not local environment files. Follow these steps to add the Bugsink DSNs:
#### 1. Add Secrets in Gitea
Navigate to your repository in Gitea:
1. Go to **Settings** → **Actions** → **Secrets**
2. Add the following secrets:
**Production DSNs:**
| Secret Name | Value | Description |
| ----------------- | -------------------------------------- | ----------------------- |
| `SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/1` | Production backend DSN |
| `VITE_SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/2` | Production frontend DSN |
**Test DSNs:**
| Secret Name | Value | Description |
| ---------------------- | -------------------------------------- | ----------------- |
| `SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/3` | Test backend DSN |
| `VITE_SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/4` | Test frontend DSN |
> **Note:** The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.
#### 2. Update the Deployment Workflows
**Production** (`deploy-to-prod.yml`):
In the `Install Backend Dependencies and Restart Production Server` step, add to the `env:` block:
```yaml
env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
  SENTRY_ENVIRONMENT: 'production'
  SENTRY_ENABLED: 'true'
```
In the build step, add frontend variables:
```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build
```
**Test** (`deploy-to-test.yml`):
In the `Install Backend Dependencies and Restart Test Server` step, add to the `env:` block:
```yaml
env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking (Test)
  SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
  SENTRY_ENVIRONMENT: 'test'
  SENTRY_ENABLED: 'true'
```
In the build step, add frontend variables:
```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build
```
#### 3. Update ecosystem.config.cjs
Add Sentry variables to the `sharedEnv` object in `ecosystem.config.cjs`:
```javascript
const sharedEnv = {
  // ... existing variables ...
  SENTRY_DSN: process.env.SENTRY_DSN,
  SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
  SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};
```
#### 4. Dev Container (No Configuration Needed)
The dev container runs its own **local Bugsink instance** at `http://localhost:8000`. No remote DSNs or Gitea secrets are needed for development:
- DSNs are pre-configured in `compose.dev.yml`
- Admin UI: `http://localhost:8000` (login: `admin@localhost` / `admin`)
- Errors stay local and isolated from production/test
#### 5. Deploy to Apply Changes
Trigger deployments via Gitea Actions:
- **Test**: Automatically deploys on push to `main`
- **Production**: Manual trigger via workflow dispatch
**Note:** There is no `/etc/flyer-crawler/environment` file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time; the dev container uses a local `.env` file. See [CLAUDE.md](../CLAUDE.md) for details.
### Step 13: Test Error Tracking
You can test Bugsink is working before configuring the flyer-crawler application.
Switch to the bugsink user and open a Python shell:
```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage shell
```
In the Python shell, send a test message using the **backend DSN** from Step 11:
```python
import sentry_sdk
sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()
```
Exit back to root:
```bash
exit
```
Check the Bugsink UI - you should see the test message appear in the `flyer-crawler-backend` project.
### Step 14: Test from Flyer-Crawler Application (After App Setup)
Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:
```bash
cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts
```
Check the Bugsink UI - you should see a test event appear.
### Bugsink Maintenance Commands
| Task | Command |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| View Gunicorn status | `sudo systemctl status gunicorn-bugsink` |
| View Snappea status | `sudo systemctl status snappea` |
| View Gunicorn logs | `sudo journalctl -u gunicorn-bugsink -f` |
| View Snappea logs | `sudo journalctl -u snappea -f` |
| Restart Bugsink | `sudo systemctl restart gunicorn-bugsink snappea` |
| Run management commands | `sudo su - bugsink` then `source venv/bin/activate && bugsink-manage <command>` |
| Upgrade Bugsink | `sudo -u bugsink /home/bugsink/venv/bin/pip install --upgrade bugsink` then `sudo systemctl restart gunicorn-bugsink snappea` |
---
## Logstash Log Aggregation
Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.
> **Note:** Logstash integration is **optional**. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.
### Step 1: Create Application Log Directory
The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs.
Create the log directories and set appropriate permissions:
```bash
# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs
# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs
# Make logs readable by logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs
```
For the test environment:
```bash
sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs
```
### Step 2: Application File Logging (Already Configured)
The flyer-crawler application uses Pino for logging and is configured to write logs to files in production/test environments:
**Log File Locations:**
| Environment | Log File Path |
| ------------- | --------------------------------------------------------- |
| Production | `/var/www/flyer-crawler.projectium.com/logs/app.log` |
| Test | `/var/www/flyer-crawler-test.projectium.com/logs/app.log` |
| Dev Container | `/app/logs/app.log` |
**How It Works:**
- In production/test: Pino writes JSON logs to both stdout (for PM2) AND `logs/app.log` (for Logstash)
- In development: Pino uses pino-pretty for human-readable console output only
- The log directory is created automatically if it doesn't exist
- You can override the log directory with the `LOG_DIR` environment variable
**Verify Logging After Deployment:**
After deploying the application, verify that logs are being written:
```bash
# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log
# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log
```
You should see JSON-formatted log entries like:
```json
{ "level": 30, "time": 1704067200000, "msg": "Server started on port 3001", "module": "server" }
```
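For easier reading, pretty-print entries with `jq` (install via `sudo apt install -y jq` if needed):
```bash
tail -n 5 /var/www/flyer-crawler.projectium.com/logs/app.log | jq .
```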
### Step 3: Install Logstash
```bash
# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update and install
sudo apt update
sudo apt install -y logstash
```
Verify installation:
```bash
/usr/share/logstash/bin/logstash --version
```
### Step 4: Configure Logstash Pipeline
Create the pipeline configuration file:
```bash
sudo nano /etc/logstash/conf.d/bugsink.conf
```
Add the following content:
```conf
input {
  # Production application logs (Pino JSON format)
  file {
    path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
  }

  # Test environment logs
  file {
    path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_test"
  }

  # Redis logs (shared by both environments)
  file {
    path => "/var/log/redis/redis-server.log"
    type => "redis"
    tags => ["infra", "redis", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_redis"
  }

  # NGINX error logs (production)
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
  }

  # NGINX access logs - for detecting 5xx errors (production)
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
  }

  # PM2 error logs - Production (plain text stack traces)
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
    exclude => "*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
  }

  # PM2 error logs - Test
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
  }
}

filter {
  # Pino log level detection
  # Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
  if [type] == "pino" and [level] {
    if [level] >= 50 {
      mutate { add_tag => ["error"] }
    } else if [level] >= 40 {
      mutate { add_tag => ["warning"] }
    }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }

  # NGINX error log detection (all entries are errors)
  if [type] == "nginx" {
    mutate { add_tag => ["error"] }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
    }
  }

  # NGINX access log - detect 5xx errors
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    if [response] =~ /^5\d{2}$/ {
      mutate { add_tag => ["error"] }
    }
  }

  # PM2 error log detection - tag lines with actual error indicators
  if [type] == "pm2" {
    if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  # Production app errors -> flyer-crawler-backend (project 1)
  if "error" in [tags] and "app" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
      }
    }
  }

  # Test app errors -> flyer-crawler-backend-test (project 3)
  if "error" in [tags] and "app" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/3/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
      }
    }
  }

  # Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
  if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/5/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
      }
    }
  }

  # Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
  if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/6/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
      }
    }
  }

  # Debug output (uncomment to troubleshoot)
  # stdout { codec => rubydebug }
}
```
**Bugsink Project DSNs:**
| Project | DSN Key | Project ID |
| ----------------------------------- | ---------------------------------- | ---------- |
| `flyer-crawler-backend` | `911aef02b9a548fa8fabb8a3c81abfe5` | 1 |
| `flyer-crawler-frontend` | (used by app, not Logstash) | 2 |
| `flyer-crawler-backend-test` | `cdb99c314589431e83d4cc38a809449b` | 3 |
| `flyer-crawler-frontend-test` | (used by app, not Logstash) | 4 |
| `flyer-crawler-infrastructure` | `b083076f94fb461b889d5dffcbef43bf` | 5 |
| `flyer-crawler-test-infrastructure` | `25020dd6c2b74ad78463ec90e90fadab` | 6 |
**Note:** The DSN key is the part before `@` in the full DSN URL (e.g., `https://KEY@bugsink.projectium.com/PROJECT_ID`).
**Note on PM2 Logs:** PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure), test PM2 logs go to project 6 (test-infrastructure).
### Step 5: Create Logstash State Directory and Fix Config Path
Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:
```bash
# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash
# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config
```
### Step 6: Grant Logstash Access to Application Logs
Logstash runs as the `logstash` user and needs permission to read log files:
```bash
# Add logstash user to adm group (for nginx and redis logs)
sudo usermod -aG adm logstash
# Make application log files readable (created automatically when app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"
# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log
# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log
# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log
# Verify logstash group membership
groups logstash
```
**Note:** The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.
### Step 7: Test Logstash Configuration
Test the configuration before starting:
```bash
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
```
You should see `Configuration OK` if there are no errors.
### Step 8: Start Logstash
```bash
sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash
```
View Logstash logs to verify it's working:
```bash
sudo journalctl -u logstash -f
```
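To exercise the pipeline end to end, append a synthetic Pino-style error entry (level 50 = error, matching the format shown in Step 2) to the production app log, then watch for it in the Bugsink UI:
```bash
echo "{\"level\":50,\"time\":$(date +%s%3N),\"msg\":\"Logstash pipeline test\",\"module\":\"logstash-test\"}" \
  | sudo tee -a /var/www/flyer-crawler.projectium.com/logs/app.log
```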
### Troubleshooting Logstash
| Issue | Solution |
| -------------------------- | -------------------------------------------------------------------------------------------------------- |
| "Permission denied" errors | Check file permissions on log files and sincedb directory |
| No events being processed | Verify log file paths exist and contain data |
| HTTP output errors | Check Bugsink is running and DSN key is correct |
| Logstash not starting | Run config test: `sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/` |
### Alternative: Skip Logstash
Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:
- Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
- Centralizing all logs in one place
- Complex log transformations
If you only need application error tracking, the Sentry SDK integration is sufficient.
---
## SSL/TLS with Let's Encrypt
### Install Certbot
```bash
sudo apt install -y certbot python3-certbot-nginx
```
### Obtain Certificate
```bash
sudo certbot --nginx -d flyer-crawler.projectium.com
```
Certbot will automatically configure NGINX for HTTPS.
### Auto-Renewal
Certbot installs a systemd timer for automatic renewal. Verify:
```bash
sudo systemctl status certbot.timer
```
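You can also exercise the renewal process without issuing a real certificate:
```bash
sudo certbot renew --dry-run
```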
---
## Firewall Configuration
### Configure UFW
```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow ssh
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
```
**Important**: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.
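Confirm that Gunicorn is bound to the loopback interface only:
```bash
# The local address should be 127.0.0.1:8000, not 0.0.0.0:8000
sudo ss -tlnp | grep ':8000'
```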
---
## Maintenance Commands
### Application Management
| Task | Command |
| --------------------- | -------------------------------------------------------------------------------------- |
| View PM2 status | `pm2 status` |
| View application logs | `pm2 logs` |
| Restart all processes | `pm2 restart all` |
| Restart specific app | `pm2 restart flyer-crawler-api` |
| Update application | `cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all` |
### Service Management
| Service | Start | Stop | Status |
| ---------- | ----------------------------------- | ---------------------------------- | ------------------------------------ |
| PostgreSQL | `sudo systemctl start postgresql` | `sudo systemctl stop postgresql` | `sudo systemctl status postgresql` |
| Redis | `sudo systemctl start redis-server` | `sudo systemctl stop redis-server` | `sudo systemctl status redis-server` |
| NGINX | `sudo systemctl start nginx` | `sudo systemctl stop nginx` | `sudo systemctl status nginx` |
| Bugsink | `sudo systemctl start gunicorn-bugsink snappea` | `sudo systemctl stop gunicorn-bugsink snappea` | `sudo systemctl status gunicorn-bugsink snappea` |
| Logstash | `sudo systemctl start logstash` | `sudo systemctl stop logstash` | `sudo systemctl status logstash` |
### Database Backup
```bash
# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql
# Backup Bugsink databases (SQLite in this setup; requires the sqlite3 CLI: sudo apt install -y sqlite3)
sudo -u bugsink sqlite3 /home/bugsink/db.sqlite3 ".backup '/home/bugsink/db_backup_$(date +%Y%m%d).sqlite3'"
sudo -u bugsink sqlite3 /home/bugsink/snappea.sqlite3 ".backup '/home/bugsink/snappea_backup_$(date +%Y%m%d).sqlite3'"
```
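To restore the application database from a dump (assuming an empty, freshly created target database):
```bash
psql -U flyer_crawler -h localhost flyer_crawler < backup_YYYYMMDD.sql
```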
### Log Locations
| Log | Location |
| ----------------- | --------------------------- |
| Application (PM2) | `~/.pm2/logs/` |
| NGINX access | `/var/log/nginx/access.log` |
| NGINX error | `/var/log/nginx/error.log` |
| PostgreSQL | `/var/log/postgresql/` |
| Redis | `/var/log/redis/` |
| Bugsink | `journalctl -u gunicorn-bugsink` / `journalctl -u snappea` |
| Logstash | `/var/log/logstash/` |
---
## Related Documentation
- [DEPLOYMENT.md](../DEPLOYMENT.md) - Container-based deployment
- [DATABASE.md](../DATABASE.md) - Database schema and extensions
- [AUTHENTICATION.md](../AUTHENTICATION.md) - OAuth provider setup
- [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) - Error tracking architecture