# Bare-Metal Server Setup Guide
This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.
**Target Environment**: Ubuntu 22.04 LTS (or newer)
---
## Table of Contents
1. [System Prerequisites](#system-prerequisites)
2. [PostgreSQL Setup](#postgresql-setup)
3. [Redis Setup](#redis-setup)
4. [Node.js and Application Setup](#nodejs-and-application-setup)
5. [PM2 Process Manager](#pm2-process-manager)
6. [NGINX Reverse Proxy](#nginx-reverse-proxy)
7. [Bugsink Error Tracking](#bugsink-error-tracking)
8. [Logstash Log Aggregation](#logstash-log-aggregation)
9. [SSL/TLS with Let's Encrypt](#ssltls-with-lets-encrypt)
10. [Firewall Configuration](#firewall-configuration)
11. [Maintenance Commands](#maintenance-commands)
---
## System Prerequisites
Update the system and install essential packages:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
```
---
## PostgreSQL Setup
### Install PostgreSQL 14+ with PostGIS
```bash
# Add PostgreSQL APT repository (apt-key is deprecated; use a keyring instead)
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3
```
### Create Application Database and User
```bash
sudo -u postgres psql
```
```sql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;
-- Connect to the database and enable extensions
\c flyer_crawler
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;
\q
```
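Optionally, verify that the application role can connect over TCP and that PostGIS loads (you will be prompted for the password set above):
```bash
psql -U flyer_crawler -h localhost -d flyer_crawler -c "SELECT PostGIS_version();"
```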
### Configure PostgreSQL for Remote Access (if needed)
Edit `/etc/postgresql/14/main/postgresql.conf`:
```conf
listen_addresses = 'localhost' # Change to '*' for remote access
```
Edit `/etc/postgresql/14/main/pg_hba.conf` to add allowed hosts:
```conf
# Local connections
local all all peer
host all all 127.0.0.1/32 scram-sha-256
```
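If you do open remote access, also append a `host` rule for the trusted client network; the subnet below is a placeholder:
```bash
# Example only - replace 10.0.0.0/24 with your actual client subnet
echo "host flyer_crawler flyer_crawler 10.0.0.0/24 scram-sha-256" | sudo tee -a /etc/postgresql/14/main/pg_hba.conf
```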
Restart PostgreSQL:
```bash
sudo systemctl restart postgresql
```
---
## Redis Setup
### Install Redis
```bash
sudo apt install -y redis-server
```
### Configure Redis Password
Edit `/etc/redis/redis.conf`:
```conf
requirepass YOUR_REDIS_PASSWORD
```
Restart Redis:
```bash
sudo systemctl restart redis-server
sudo systemctl enable redis-server
```
### Test Redis Connection
```bash
redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG
```
---
## Node.js and Application Setup
### Install Node.js 20.x
```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
```
Verify installation:
```bash
node --version # Should output v20.x.x
npm --version
```
### Install System Dependencies for PDF Processing
```bash
sudo apt install -y poppler-utils # For pdftocairo
```
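Verify that `pdftocairo` is on the `PATH`:
```bash
pdftocairo -v
```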
### Clone and Install Application
```bash
# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler
# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .
# Install dependencies
npm install
# Build for production
npm run build
```
### Configure Environment Variables
**Important:** The flyer-crawler application does **not** use local environment files in production. All secrets are managed through **Gitea CI/CD secrets** and injected during deployment.
#### How Secrets Work
1. **Secrets are stored in Gitea** at Repository → Settings → Actions → Secrets
2. **Workflow files** (`.gitea/workflows/deploy-to-prod.yml`) reference secrets using `${{ secrets.SECRET_NAME }}`
3. **PM2** receives environment variables from the workflow's `env:` block
4. **ecosystem.config.cjs** passes variables to the application via `process.env`
#### Required Gitea Secrets
Before deployment, ensure these secrets are configured in Gitea:
**Shared Secrets** (used by both production and test):
| Secret Name | Description |
| ---------------------- | --------------------------------------- |
| `DB_HOST` | Database hostname (usually `localhost`) |
| `DB_USER` | Database username |
| `DB_PASSWORD` | Database password |
| `JWT_SECRET` | JWT signing secret (min 32 characters) |
| `GOOGLE_MAPS_API_KEY` | Google Maps API key |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
| `GH_CLIENT_ID` | GitHub OAuth client ID |
| `GH_CLIENT_SECRET` | GitHub OAuth client secret |
**Production-Specific Secrets**:
| Secret Name | Description |
| --------------------------- | -------------------------------------------------------------------- |
| `DB_DATABASE_PROD` | Production database name (`flyer_crawler`) |
| `REDIS_PASSWORD_PROD` | Redis password for production (uses database 0) |
| `VITE_GOOGLE_GENAI_API_KEY` | Gemini API key for production |
| `SENTRY_DSN` | Bugsink backend DSN (see [Bugsink section](#bugsink-error-tracking)) |
| `VITE_SENTRY_DSN` | Bugsink frontend DSN |
**Test-Specific Secrets**:
| Secret Name | Description |
| -------------------------------- | ----------------------------------------------------------------------------- |
| `DB_DATABASE_TEST` | Test database name (`flyer-crawler-test`) |
| `REDIS_PASSWORD_TEST` | Redis password for test (uses database 1 for isolation) |
| `VITE_GOOGLE_GENAI_API_KEY_TEST` | Gemini API key for test environment |
| `SENTRY_DSN_TEST` | Bugsink backend DSN for test (see [Bugsink section](#bugsink-error-tracking)) |
| `VITE_SENTRY_DSN_TEST` | Bugsink frontend DSN for test |
#### Test Environment Details
The test environment (`flyer-crawler-test.projectium.com`) uses **both** Gitea CI/CD secrets and a local `.env.test` file:
| Path | Purpose |
| ------------------------------------------------------ | ---------------------------------------- |
| `/var/www/flyer-crawler-test.projectium.com/` | Test application directory |
| `/var/www/flyer-crawler-test.projectium.com/.env.test` | Local overrides for test-specific config |
**Key differences from production:**
- Uses Redis database **1** (production uses database **0**) to isolate job queues
- PM2 processes are named with `-test` suffix (e.g., `flyer-crawler-api-test`)
- Deployed automatically on every push to `main` branch
- Has a `.env.test` file for additional local configuration overrides
For detailed information on secrets management, see [CLAUDE.md](../CLAUDE.md).
---
## PM2 Process Manager
### Install PM2 Globally
```bash
sudo npm install -g pm2
```
### PM2 Configuration Files
The application uses **separate ecosystem config files** for production and test environments:
| File | Purpose | Processes Started |
| --------------------------- | --------------------- | -------------------------------------------------------------------------------------------- |
| `ecosystem.config.cjs` | Production deployment | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker` |
| `ecosystem-test.config.cjs` | Test deployment | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test` |
**Key Points:**
- Production and test processes run **simultaneously** with distinct names
- Test processes use `NODE_ENV=test` which enables file logging
- Test processes use Redis database 1 (isolated from production which uses database 0)
- Both configs validate required environment variables but only warn (don't exit) if missing
### Start Production Application
```bash
cd /var/www/flyer-crawler.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables
pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save
```
This starts three production processes:
- `flyer-crawler-api` - Main API server (port 3001)
- `flyer-crawler-worker` - Background job worker
- `flyer-crawler-analytics-worker` - Analytics processing worker
### Start Test Application
```bash
cd /var/www/flyer-crawler-test.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1 # Use database 1 for isolation
# ... other required variables
pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save
```
This starts three test processes (running alongside production):
- `flyer-crawler-api-test` - Test API server (proxied via its own NGINX vhost)
- `flyer-crawler-worker-test` - Test background job worker
- `flyer-crawler-analytics-worker-test` - Test analytics worker
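With both environments up, you can spot-check the Redis isolation from the command line; this sketch assumes BullMQ's default `bull:` key prefix:
```bash
# Test queue keys should appear in database 1, production keys in database 0
redis-cli -a YOUR_REDIS_PASSWORD -n 1 --scan --pattern 'bull:*' | head
```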
### Verify Running Processes
After starting both environments, you should see 6 application processes:
```bash
pm2 list
```
Expected output:
```text
┌────┬─────────────────────────────────────┬─────────┬────────┬─────┐
│ id │ name                                │ mode    │ status │ cpu │
├────┼─────────────────────────────────────┼─────────┼────────┼─────┤
│ 0  │ flyer-crawler-api                   │ cluster │ online │ 0%  │
│ 1  │ flyer-crawler-worker                │ fork    │ online │ 0%  │
│ 2  │ flyer-crawler-analytics-worker      │ fork    │ online │ 0%  │
│ 3  │ flyer-crawler-api-test              │ fork    │ online │ 0%  │
│ 4  │ flyer-crawler-worker-test           │ fork    │ online │ 0%  │
│ 5  │ flyer-crawler-analytics-worker-test │ fork    │ online │ 0%  │
└────┴─────────────────────────────────────┴─────────┴────────┴─────┘
```
### Configure PM2 Startup
```bash
pm2 startup systemd
# Follow the command output to enable PM2 on boot
pm2 save
```
### PM2 Log Rotation
```bash
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
```
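Confirm the settings were applied:
```bash
pm2 conf pm2-logrotate
```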
### Useful PM2 Commands
```bash
# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50
# View environment variables for a process
pm2 env <process-id>
# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
```
---
## NGINX Reverse Proxy
### Install NGINX
```bash
sudo apt install -y nginx
```
### Create Site Configuration
Create `/etc/nginx/sites-available/flyer-crawler.projectium.com`:
```nginx
server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files.
    # Note: a `types` block replaces the MIME map inherited from the http
    # context; alternatively, add `mjs` to the `application/javascript`
    # line in /etc/nginx/mime.types.
    types {
        application/javascript js mjs;
    }
}
```
### Enable the Site
```bash
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
```
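Before (or after) pointing DNS at the server, you can smoke-test the vhost locally by spoofing the `Host` header (this assumes the frontend is already being served on port 5173):
```bash
curl -sI http://127.0.0.1 -H "Host: flyer-crawler.projectium.com"
```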
---
## Bugsink Error Tracking
Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. This guide follows the [official Bugsink single-server production setup](https://www.bugsink.com/docs/single-server-production/).
See [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) for architecture details.
### Step 1: Create Bugsink User
Create a dedicated non-root user for Bugsink:
```bash
sudo adduser bugsink --disabled-password --gecos ""
```
### Step 2: Set Up Virtual Environment and Install Bugsink
Switch to the bugsink user:
```bash
sudo su - bugsink
```
Create the virtual environment:
```bash
python3 -m venv venv
```
Activate the virtual environment:
```bash
source venv/bin/activate
```
You should see `(venv)` at the beginning of your prompt. Now install Bugsink:
```bash
pip install bugsink --upgrade
bugsink-show-version
```
You should see output like `bugsink 2.x.x`.
### Step 3: Create Configuration File
Generate the configuration file. Replace `bugsink.yourdomain.com` with your actual hostname:
```bash
bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com
```
This creates `bugsink_conf.py` in `/home/bugsink/`. Edit it to customize settings:
```bash
nano bugsink_conf.py
```
**Key settings to review:**
| Setting | Description |
| ------------------- | ------------------------------------------------------------------------------- |
| `BASE_URL` | The URL where Bugsink will be accessed (e.g., `https://bugsink.yourdomain.com`) |
| `SITE_TITLE` | Display name for your Bugsink instance |
| `SECRET_KEY` | Auto-generated, but verify it exists |
| `TIME_ZONE` | Your timezone (e.g., `America/New_York`) |
| `USER_REGISTRATION` | Set to `"closed"` to disable public signup |
| `SINGLE_USER` | Set to `True` if only one user will use this instance |
### Step 4: Initialize Database
Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:
```bash
bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea
```
Verify the database files were created:
```bash
ls *.sqlite3
```
You should see `db.sqlite3` and `snappea.sqlite3`.
### Step 5: Create Admin User
Create the superuser account. Using your email as the username is recommended:
```bash
bugsink-manage createsuperuser
```
**Important:** Save these credentials - you'll need them to log into the Bugsink web UI.
### Step 6: Verify Configuration
Run Django's deployment checks:
```bash
bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING
```
Exit back to root for the next steps:
```bash
exit
```
### Step 7: Create Gunicorn Service
Create `/etc/systemd/system/gunicorn-bugsink.service`:
```bash
sudo nano /etc/systemd/system/gunicorn-bugsink.service
```
Add the following content:
```ini
[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target
[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
    --bind="127.0.0.1:8000" \
    --workers=4 \
    --timeout=6 \
    --access-logfile - \
    --max-requests=1000 \
    --max-requests-jitter=100 \
    bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service
```
Test that Gunicorn is responding (replace hostname):
```bash
curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"
```
You should see HTML output containing a login form.
### Step 8: Create Snappea Background Worker Service
Snappea is Bugsink's background task processor. Create `/etc/systemd/system/snappea.service`:
```bash
sudo nano /etc/systemd/system/snappea.service
```
Add the following content:
```ini
[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target
[Service]
Restart=always
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d
[Install]
WantedBy=multi-user.target
```
Enable and start the service:
```bash
sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service
```
Verify snappea is working:
```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit
```
### Step 9: Configure NGINX for Bugsink
Create `/etc/nginx/sites-available/bugsink`:
```bash
sudo nano /etc/nginx/sites-available/bugsink
```
Add the following (replace `bugsink.yourdomain.com` with your hostname):
```nginx
server {
    server_name bugsink.yourdomain.com;
    listen 80;

    client_max_body_size 20M;

    access_log /var/log/nginx/bugsink.access.log;
    error_log /var/log/nginx/bugsink.error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```
Enable the site:
```bash
sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```
### Step 10: Configure SSL with Certbot (Recommended)
```bash
sudo certbot --nginx -d bugsink.yourdomain.com
```
After SSL is configured, update the NGINX config to add security headers. Edit `/etc/nginx/sites-available/bugsink` and add to the `location /` block:
```nginx
add_header Strict-Transport-Security "max-age=31536000; preload" always;
```
Reload NGINX:
```bash
sudo nginx -t
sudo systemctl reload nginx
```
### Step 11: Create Projects and Get DSNs
1. Access Bugsink UI at `https://bugsink.yourdomain.com`
2. Log in with the admin credentials you created
3. Create a new team (or use the default)
4. Create projects for each environment:
**Production:**
- **flyer-crawler-backend** (Platform: Node.js)
- **flyer-crawler-frontend** (Platform: JavaScript/React)
**Test:**
- **flyer-crawler-backend-test** (Platform: Node.js)
- **flyer-crawler-frontend-test** (Platform: JavaScript/React)
5. For each project, go to Settings → Client Keys (DSN)
6. Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)
> **Note:** The dev container runs its own local Bugsink instance at `localhost:8000` - no remote DSNs needed for development.
### Step 12: Configure Application to Use Bugsink
The flyer-crawler application receives its configuration via **Gitea CI/CD secrets**, not local environment files. Follow these steps to add the Bugsink DSNs:
#### 1. Add Secrets in Gitea
Navigate to your repository in Gitea:
1. Go to **Settings** → **Actions** → **Secrets**
2. Add the following secrets:
**Production DSNs:**
| Secret Name | Value | Description |
| ----------------- | -------------------------------------- | ----------------------- |
| `SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/1` | Production backend DSN |
| `VITE_SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/2` | Production frontend DSN |
**Test DSNs:**
| Secret Name | Value | Description |
| ---------------------- | -------------------------------------- | ----------------- |
| `SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/3` | Test backend DSN |
| `VITE_SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/4` | Test frontend DSN |
> **Note:** The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.
#### 2. Update the Deployment Workflows
**Production** (`deploy-to-prod.yml`):
In the `Install Backend Dependencies and Restart Production Server` step, add to the `env:` block:
```yaml
env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
  SENTRY_ENVIRONMENT: 'production'
  SENTRY_ENABLED: 'true'
```
In the build step, add frontend variables:
```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build
```
**Test** (`deploy-to-test.yml`):
In the `Install Backend Dependencies and Restart Test Server` step, add to the `env:` block:
```yaml
env:
  # ... existing secrets ...
  # Sentry/Bugsink Error Tracking (Test)
  SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
  SENTRY_ENVIRONMENT: 'test'
  SENTRY_ENABLED: 'true'
```
In the build step, add frontend variables:
```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build
```
#### 3. Update ecosystem.config.cjs
Add Sentry variables to the `sharedEnv` object in `ecosystem.config.cjs`:
```javascript
const sharedEnv = {
  // ... existing variables ...
  SENTRY_DSN: process.env.SENTRY_DSN,
  SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
  SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};
```
#### 4. Dev Container (No Configuration Needed)
The dev container runs its own **local Bugsink instance** at `http://localhost:8000`. No remote DSNs or Gitea secrets are needed for development:
- DSNs are pre-configured in `compose.dev.yml`
- Admin UI: `http://localhost:8000` (login: `admin@localhost` / `admin`)
- Errors stay local and isolated from production/test
#### 5. Deploy to Apply Changes
Trigger deployments via Gitea Actions:
- **Test**: Automatically deploys on push to `main`
- **Production**: Manual trigger via workflow dispatch
**Note:** There is no `/etc/flyer-crawler/environment` file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time; the dev container uses a local `.env` file. See [CLAUDE.md](../CLAUDE.md) for details.
### Step 13: Test Error Tracking
You can test Bugsink is working before configuring the flyer-crawler application.
Switch to the bugsink user and open a Python shell:
```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage shell
```
In the Python shell, send a test message using the **backend DSN** from Step 11:
```python
import sentry_sdk
sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()
```
Exit back to root:
```bash
exit
```
Check the Bugsink UI - you should see the test message appear in the `flyer-crawler-backend` project.
### Step 14: Test from Flyer-Crawler Application (After App Setup)
Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:
```bash
cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts
```
Check the Bugsink UI - you should see a test event appear.
### Bugsink Maintenance Commands
| Task | Command |
| ----------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
| View Gunicorn status | `sudo systemctl status gunicorn-bugsink` |
| View Snappea status | `sudo systemctl status snappea` |
| View Gunicorn logs | `sudo journalctl -u gunicorn-bugsink -f` |
| View Snappea logs | `sudo journalctl -u snappea -f` |
| Restart Bugsink | `sudo systemctl restart gunicorn-bugsink snappea` |
| Run management commands | `sudo su - bugsink` then `source venv/bin/activate && bugsink-manage <command>` |
| Upgrade Bugsink | `sudo -u bugsink /home/bugsink/venv/bin/pip install --upgrade bugsink` then `sudo systemctl restart gunicorn-bugsink snappea` |
---
## Logstash Log Aggregation
Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.
> **Note:** Logstash integration is **optional**. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.
### Step 1: Create Application Log Directory
The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs.
Create the log directories and set appropriate permissions:
```bash
# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs
# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs
# Make logs readable by logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs
```
For the test environment:
```bash
sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs
```
### Step 2: Application File Logging (Already Configured)
The flyer-crawler application uses Pino for logging and is configured to write logs to files in production/test environments:
**Log File Locations:**
| Environment | Log File Path |
| ------------- | --------------------------------------------------------- |
| Production | `/var/www/flyer-crawler.projectium.com/logs/app.log` |
| Test | `/var/www/flyer-crawler-test.projectium.com/logs/app.log` |
| Dev Container | `/app/logs/app.log` |
**How It Works:**
- In production/test: Pino writes JSON logs to both stdout (for PM2) AND `logs/app.log` (for Logstash)
- In development: Pino uses pino-pretty for human-readable console output only
- The log directory is created automatically if it doesn't exist
- You can override the log directory with the `LOG_DIR` environment variable
**Verify Logging After Deployment:**
After deploying the application, verify that logs are being written:
```bash
# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log
# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log
```
You should see JSON-formatted log entries like:
```json
{ "level": 30, "time": 1704067200000, "msg": "Server started on port 3001", "module": "server" }
```
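For easier reading, pretty-print entries with `jq` (install via `sudo apt install -y jq` if needed):
```bash
tail -n 5 /var/www/flyer-crawler.projectium.com/logs/app.log | jq .
```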
### Step 3: Install Logstash
```bash
# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update and install
sudo apt update
sudo apt install -y logstash
```
Verify installation:
```bash
/usr/share/logstash/bin/logstash --version
```
### Step 4: Configure Logstash Pipeline
Create the pipeline configuration file:
```bash
sudo nano /etc/logstash/conf.d/bugsink.conf
```
Add the following content:
```conf
input {
  # Production application logs (Pino JSON format)
  file {
    path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
  }

  # Test environment logs
  file {
    path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_test"
  }

  # Redis logs (shared by both environments)
  file {
    path => "/var/log/redis/redis-server.log"
    type => "redis"
    tags => ["infra", "redis", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_redis"
  }

  # NGINX error logs (production)
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
  }

  # NGINX access logs - for detecting 5xx errors (production)
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
  }

  # PM2 error logs - Production (plain text stack traces)
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
    exclude => "*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
  }

  # PM2 error logs - Test
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
  }
}

filter {
  # Pino log level detection
  # Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
  if [type] == "pino" and [level] {
    if [level] >= 50 {
      mutate { add_tag => ["error"] }
    } else if [level] >= 40 {
      mutate { add_tag => ["warning"] }
    }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }

  # NGINX error log detection (all entries are errors)
  if [type] == "nginx" {
    mutate { add_tag => ["error"] }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
    }
  }

  # NGINX access log - detect 5xx errors
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    if [response] =~ /^5\d{2}$/ {
      mutate { add_tag => ["error"] }
    }
  }

  # PM2 error log detection - tag lines with actual error indicators
  if [type] == "pm2" {
    if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  # Production app errors -> flyer-crawler-backend (project 1)
  if "error" in [tags] and "app" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
      }
    }
  }

  # Test app errors -> flyer-crawler-backend-test (project 3)
  if "error" in [tags] and "app" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/3/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
      }
    }
  }

  # Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
  if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/5/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
      }
    }
  }

  # Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
  if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/6/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
      }
    }
  }

  # Debug output (uncomment to troubleshoot)
  # stdout { codec => rubydebug }
}
```
**Bugsink Project DSNs:**
| Project | DSN Key | Project ID |
| ----------------------------------- | ---------------------------------- | ---------- |
| `flyer-crawler-backend` | `911aef02b9a548fa8fabb8a3c81abfe5` | 1 |
| `flyer-crawler-frontend` | (used by app, not Logstash) | 2 |
| `flyer-crawler-backend-test` | `cdb99c314589431e83d4cc38a809449b` | 3 |
| `flyer-crawler-frontend-test` | (used by app, not Logstash) | 4 |
| `flyer-crawler-infrastructure` | `b083076f94fb461b889d5dffcbef43bf` | 5 |
| `flyer-crawler-test-infrastructure` | `25020dd6c2b74ad78463ec90e90fadab` | 6 |
**Note:** The DSN key is the part before `@` in the full DSN URL (e.g., `https://KEY@bugsink.projectium.com/PROJECT_ID`).
**Note on PM2 Logs:** PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure), test PM2 logs go to project 6 (test-infrastructure).
### Step 5: Create Logstash State Directory and Fix Config Path
Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:
```bash
# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash
# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config
```
### Step 6: Grant Logstash Access to Application Logs
Logstash runs as the `logstash` user and needs permission to read log files:
```bash
# Add logstash user to adm group (for nginx and redis logs)
sudo usermod -aG adm logstash
# Make application log files readable (created automatically when app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"
# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log
# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log
# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log
# Verify logstash group membership
groups logstash
```
**Note:** The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.
### Step 7: Test Logstash Configuration
Test the configuration before starting:
```bash
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
```
You should see `Configuration OK` if there are no errors.
### Step 8: Start Logstash
```bash
sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash
```
View Logstash logs to verify it's working:
```bash
sudo journalctl -u logstash -f
```
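To exercise the pipeline end to end, append a synthetic Pino-style error entry (level 50 = error, matching the format shown in Step 2) to the production app log, then watch for it in the Bugsink UI:
```bash
echo "{\"level\":50,\"time\":$(date +%s%3N),\"msg\":\"Logstash pipeline test\",\"module\":\"logstash-test\"}" \
  | sudo tee -a /var/www/flyer-crawler.projectium.com/logs/app.log
```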
### Troubleshooting Logstash
| Issue | Solution |
| -------------------------- | -------------------------------------------------------------------------------------------------------- |
| "Permission denied" errors | Check file permissions on log files and sincedb directory |
| No events being processed | Verify log file paths exist and contain data |
| HTTP output errors | Check Bugsink is running and DSN key is correct |
| Logstash not starting | Run config test: `sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/` |
### Alternative: Skip Logstash
Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:
- Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
- Centralizing all logs in one place
- Complex log transformations
If you only need application error tracking, the Sentry SDK integration is sufficient.
---
## SSL/TLS with Let's Encrypt
### Install Certbot
```bash
sudo apt install -y certbot python3-certbot-nginx
```
### Obtain Certificate
```bash
sudo certbot --nginx -d flyer-crawler.projectium.com
```
Certbot will automatically configure NGINX for HTTPS.
### Auto-Renewal
Certbot installs a systemd timer for automatic renewal. Verify:
```bash
sudo systemctl status certbot.timer
```
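You can also exercise the renewal process without issuing a real certificate:
```bash
sudo certbot renew --dry-run
```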
---
## Firewall Configuration
### Configure UFW
```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow ssh
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
```
**Important**: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.
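Confirm that Gunicorn is bound to the loopback interface only:
```bash
# The local address should be 127.0.0.1:8000, not 0.0.0.0:8000
sudo ss -tlnp | grep ':8000'
```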
---
## Maintenance Commands
### Application Management
| Task | Command |
| --------------------- | -------------------------------------------------------------------------------------- |
| View PM2 status | `pm2 status` |
| View application logs | `pm2 logs` |
| Restart all processes | `pm2 restart all` |
| Restart specific app | `pm2 restart flyer-crawler-api` |
| Update application | `cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all` |
### Service Management
| Service | Start | Stop | Status |
| ---------- | ----------------------------------- | ---------------------------------- | ------------------------------------ |
| PostgreSQL | `sudo systemctl start postgresql` | `sudo systemctl stop postgresql` | `sudo systemctl status postgresql` |
| Redis | `sudo systemctl start redis-server` | `sudo systemctl stop redis-server` | `sudo systemctl status redis-server` |
| NGINX | `sudo systemctl start nginx` | `sudo systemctl stop nginx` | `sudo systemctl status nginx` |
| Bugsink | `sudo systemctl start gunicorn-bugsink snappea` | `sudo systemctl stop gunicorn-bugsink snappea` | `sudo systemctl status gunicorn-bugsink snappea` |
| Logstash | `sudo systemctl start logstash` | `sudo systemctl stop logstash` | `sudo systemctl status logstash` |
### Database Backup
```bash
# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql
# Backup Bugsink databases (SQLite in this setup; requires the sqlite3 CLI: sudo apt install -y sqlite3)
sudo -u bugsink sqlite3 /home/bugsink/db.sqlite3 ".backup '/home/bugsink/db_backup_$(date +%Y%m%d).sqlite3'"
sudo -u bugsink sqlite3 /home/bugsink/snappea.sqlite3 ".backup '/home/bugsink/snappea_backup_$(date +%Y%m%d).sqlite3'"
```
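To restore the application database from a dump (assuming an empty, freshly created target database):
```bash
psql -U flyer_crawler -h localhost flyer_crawler < backup_YYYYMMDD.sql
```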
### Log Locations
| Log | Location |
| ----------------- | --------------------------- |
| Application (PM2) | `~/.pm2/logs/` |
| NGINX access | `/var/log/nginx/access.log` |
| NGINX error | `/var/log/nginx/error.log` |
| PostgreSQL | `/var/log/postgresql/` |
| Redis | `/var/log/redis/` |
| Bugsink | `journalctl -u gunicorn-bugsink` / `journalctl -u snappea` |
| Logstash | `/var/log/logstash/` |
---
## Related Documentation
- [DEPLOYMENT.md](../DEPLOYMENT.md) - Container-based deployment
- [DATABASE.md](../DATABASE.md) - Database schema and extensions
- [AUTHENTICATION.md](../AUTHENTICATION.md) - OAuth provider setup
- [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) - Error tracking architecture