Bare-Metal Server Setup Guide
This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.
Target Environment: Ubuntu 22.04 LTS (or newer)
Table of Contents
- System Prerequisites
- PostgreSQL Setup
- Redis Setup
- Node.js and Application Setup
- PM2 Process Manager
- NGINX Reverse Proxy
- Bugsink Error Tracking
- Logstash Log Aggregation
- SSL/TLS with Let's Encrypt
- Firewall Configuration
- Maintenance Commands
System Prerequisites
Update the system and install essential packages:
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
PostgreSQL Setup
Install PostgreSQL 14+ with PostGIS
# Add PostgreSQL APT repository (using a signed-by keyring; apt-key is deprecated on Ubuntu 22.04)
sudo install -d /usr/share/postgresql-common/pgdg
sudo curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
sudo sh -c 'echo "deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
sudo apt update
# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3
Create Application Database and User
sudo -u postgres psql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;
-- Connect to the database and enable extensions
\c flyer_crawler
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;
\q
Configure PostgreSQL for Remote Access (if needed)
Edit /etc/postgresql/14/main/postgresql.conf:
listen_addresses = 'localhost' # Change to '*' for remote access
Edit /etc/postgresql/14/main/pg_hba.conf to add allowed hosts:
# Local connections
local all all peer
host all all 127.0.0.1/32 scram-sha-256
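If you do enable remote access, add a `host` rule scoped to the specific database, user, and client network rather than opening everything. A hedged example (the `192.168.1.0/24` subnet is a placeholder for your actual client network):

```
# /etc/postgresql/14/main/pg_hba.conf -- example remote-access rule (adjust the subnet)
host    flyer_crawler    flyer_crawler    192.168.1.0/24    scram-sha-256
```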
Restart PostgreSQL:
sudo systemctl restart postgresql
Redis Setup
Install Redis
sudo apt install -y redis-server
Configure Redis Password
Edit /etc/redis/redis.conf:
requirepass YOUR_REDIS_PASSWORD
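A few adjacent settings are worth reviewing while you have the file open. This is an illustrative sketch, not required by the application; the `maxmemory` value is an example, and `noeviction` is the usual recommendation for queue-backed workloads (e.g., BullMQ) so queued jobs are never silently evicted:

```
# /etc/redis/redis.conf -- illustrative hardening sketch
bind 127.0.0.1 ::1            # listen on loopback only
requirepass YOUR_REDIS_PASSWORD
maxmemory 512mb               # cap memory use (example value)
maxmemory-policy noeviction   # queue data should not be evicted under memory pressure
```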
Restart Redis:
sudo systemctl restart redis-server
sudo systemctl enable redis-server
Test Redis Connection
redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG
Node.js and Application Setup
Install Node.js 20.x
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
Verify installation:
node --version # Should output v20.x.x
npm --version
Install System Dependencies for PDF Processing
sudo apt install -y poppler-utils # For pdftocairo
Clone and Install Application
# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler
# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .
# Install dependencies
npm install
# Build for production
npm run build
Configure Environment Variables
Important: The flyer-crawler application does not use local environment files in production. All secrets are managed through Gitea CI/CD secrets and injected during deployment.
How Secrets Work
- Secrets are stored in Gitea at Repository → Settings → Actions → Secrets
- Workflow files (`.gitea/workflows/deploy-to-prod.yml`) reference secrets using `${{ secrets.SECRET_NAME }}`
- PM2 receives environment variables from the workflow's `env:` block
- `ecosystem.config.cjs` passes variables to the application via `process.env`
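Put together, the flow looks roughly like this inside a workflow file. This is a sketch, not the actual `deploy-to-prod.yml` contents; the step name and the particular secrets shown are illustrative:

```yaml
# Sketch: how a deploy step passes Gitea secrets through to PM2
- name: Restart Production Server
  env:
    DB_HOST: ${{ secrets.DB_HOST }}
    JWT_SECRET: ${{ secrets.JWT_SECRET }}
    # ...one env entry per secret the application needs...
  run: pm2 startOrReload ecosystem.config.cjs --update-env
```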
Required Gitea Secrets
Before deployment, ensure these secrets are configured in Gitea:
Shared Secrets (used by both production and test):
| Secret Name | Description |
|---|---|
| `DB_HOST` | Database hostname (usually `localhost`) |
| `DB_USER` | Database username |
| `DB_PASSWORD` | Database password |
| `JWT_SECRET` | JWT signing secret (min 32 characters) |
| `GOOGLE_MAPS_API_KEY` | Google Maps API key |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret |
| `GH_CLIENT_ID` | GitHub OAuth client ID |
| `GH_CLIENT_SECRET` | GitHub OAuth client secret |
Production-Specific Secrets:
| Secret Name | Description |
|---|---|
| `DB_DATABASE_PROD` | Production database name (`flyer_crawler`) |
| `REDIS_PASSWORD_PROD` | Redis password for production (uses database 0) |
| `VITE_GOOGLE_GENAI_API_KEY` | Gemini API key for production |
| `SENTRY_DSN` | Bugsink backend DSN (see Bugsink section) |
| `VITE_SENTRY_DSN` | Bugsink frontend DSN |
Test-Specific Secrets:
| Secret Name | Description |
|---|---|
| `DB_DATABASE_TEST` | Test database name (`flyer-crawler-test`) |
| `REDIS_PASSWORD_TEST` | Redis password for test (uses database 1 for isolation) |
| `VITE_GOOGLE_GENAI_API_KEY_TEST` | Gemini API key for test environment |
| `SENTRY_DSN_TEST` | Bugsink backend DSN for test (see Bugsink section) |
| `VITE_SENTRY_DSN_TEST` | Bugsink frontend DSN for test |
Test Environment Details
The test environment (flyer-crawler-test.projectium.com) uses both Gitea CI/CD secrets and a local .env.test file:
| Path | Purpose |
|---|---|
| `/var/www/flyer-crawler-test.projectium.com/` | Test application directory |
| `/var/www/flyer-crawler-test.projectium.com/.env.test` | Local overrides for test-specific config |
Key differences from production:
- Uses Redis database 1 (production uses database 0) to isolate job queues
- PM2 processes are named with a `-test` suffix (e.g., `flyer-crawler-api-test`)
- Deployed automatically on every push to the `main` branch
- Has a `.env.test` file for additional local configuration overrides
For detailed information on secrets management, see CLAUDE.md.
PM2 Process Manager
Install PM2 Globally
sudo npm install -g pm2
PM2 Configuration Files
The application uses separate ecosystem config files for production and test environments:
| File | Purpose | Processes Started |
|---|---|---|
| `ecosystem.config.cjs` | Production deployment | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker` |
| `ecosystem-test.config.cjs` | Test deployment | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test` |
Key Points:
- Production and test processes run simultaneously with distinct names
- Test processes use `NODE_ENV=test`, which enables file logging
- Test processes use Redis database 1 (isolated from production, which uses database 0)
- Both configs validate required environment variables but only warn (they don't exit) if any are missing
Start Production Application
cd /var/www/flyer-crawler.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables
pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save
This starts three production processes:
- `flyer-crawler-api` - Main API server (port 3001)
- `flyer-crawler-worker` - Background job worker
- `flyer-crawler-analytics-worker` - Analytics processing worker
Start Test Application
cd /var/www/flyer-crawler-test.projectium.com
# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1 # Use database 1 for isolation
# ... other required variables
pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save
This starts three test processes (running alongside production):
- `flyer-crawler-api-test` - Test API server (port 3001 via a different NGINX vhost)
- `flyer-crawler-worker-test` - Test background job worker
- `flyer-crawler-analytics-worker-test` - Test analytics worker
Verify Running Processes
After starting both environments, you should see 6 application processes:
pm2 list
Expected output:
┌────┬───────────────────────────────────┬──────────┬────────┬───────────┐
│ id │ name │ mode │ status │ cpu │
├────┼───────────────────────────────────┼──────────┼────────┼───────────┤
│ 0 │ flyer-crawler-api │ cluster │ online │ 0% │
│ 1 │ flyer-crawler-worker │ fork │ online │ 0% │
│ 2 │ flyer-crawler-analytics-worker │ fork │ online │ 0% │
│ 3 │ flyer-crawler-api-test │ fork │ online │ 0% │
│ 4 │ flyer-crawler-worker-test │ fork │ online │ 0% │
│ 5 │ flyer-crawler-analytics-worker-test│ fork │ online │ 0% │
└────┴───────────────────────────────────┴──────────┴────────┴───────────┘
Configure PM2 Startup
pm2 startup systemd
# Follow the command output to enable PM2 on boot
pm2 save
PM2 Log Rotation
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
Useful PM2 Commands
# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50
# View environment variables for a process
pm2 env <process-id>
# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
NGINX Reverse Proxy
Install NGINX
sudo apt install -y nginx
Create Site Configuration
Create /etc/nginx/sites-available/flyer-crawler.projectium.com:
server {
listen 80;
server_name flyer-crawler.projectium.com;
# Redirect HTTP to HTTPS (uncomment after SSL setup)
# return 301 https://$server_name$request_uri;
location / {
proxy_pass http://localhost:5173;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache_bypass $http_upgrade;
}
location /api {
proxy_pass http://localhost:3001;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_cache_bypass $http_upgrade;
# File upload size limit
client_max_body_size 50M;
}
# MIME type fix for .mjs files
types {
application/javascript js mjs;
}
}
Enable the Site
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
Bugsink Error Tracking
Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. This guide follows the official Bugsink single-server production setup.
See ADR-015 for architecture details.
Step 1: Create Bugsink User
Create a dedicated non-root user for Bugsink:
sudo adduser bugsink --disabled-password --gecos ""
Step 2: Set Up Virtual Environment and Install Bugsink
Switch to the bugsink user:
sudo su - bugsink
Create the virtual environment:
python3 -m venv venv
Activate the virtual environment:
source venv/bin/activate
You should see (venv) at the beginning of your prompt. Now install Bugsink:
pip install bugsink --upgrade
bugsink-show-version
You should see output like bugsink 2.x.x.
Step 3: Create Configuration File
Generate the configuration file. Replace bugsink.yourdomain.com with your actual hostname:
bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com
This creates bugsink_conf.py in /home/bugsink/. Edit it to customize settings:
nano bugsink_conf.py
Key settings to review:
| Setting | Description |
|---|---|
| `BASE_URL` | The URL where Bugsink will be accessed (e.g., `https://bugsink.yourdomain.com`) |
| `SITE_TITLE` | Display name for your Bugsink instance |
| `SECRET_KEY` | Auto-generated, but verify it exists |
| `TIME_ZONE` | Your timezone (e.g., `America/New_York`) |
| `USER_REGISTRATION` | Set to `"closed"` to disable public signup |
| `SINGLE_USER` | Set to `True` if only one user will use this instance |
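For reference, a locked-down single-user configuration might look like the excerpt below. This is a sketch only: the generated `bugsink_conf.py` contains more settings, `SECRET_KEY` is auto-generated for you, and the `SITE_TITLE` value here is a made-up example:

```python
# bugsink_conf.py -- illustrative excerpt (not a complete file)
BASE_URL = "https://bugsink.yourdomain.com"
SITE_TITLE = "Projectium Error Tracking"  # example display name
TIME_ZONE = "America/New_York"
USER_REGISTRATION = "closed"              # disable public signup
SINGLE_USER = True                        # single-user instance
```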
Step 4: Initialize Database
Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:
bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea
Verify the database files were created:
ls *.sqlite3
You should see db.sqlite3 and snappea.sqlite3.
Step 5: Create Admin User
Create the superuser account. Using your email as the username is recommended:
bugsink-manage createsuperuser
Important: Save these credentials - you'll need them to log into the Bugsink web UI.
Step 6: Verify Configuration
Run Django's deployment checks:
bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING
Exit back to root for the next steps:
exit
Step 7: Create Gunicorn Service
Create /etc/systemd/system/gunicorn-bugsink.service:
sudo nano /etc/systemd/system/gunicorn-bugsink.service
Add the following content:
[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target
[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
--bind="127.0.0.1:8000" \
--workers=4 \
--timeout=6 \
--access-logfile - \
--max-requests=1000 \
--max-requests-jitter=100 \
bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service
Test that Gunicorn is responding (replace hostname):
curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"
You should see HTML output containing a login form.
Step 8: Create Snappea Background Worker Service
Snappea is Bugsink's background task processor. Create /etc/systemd/system/snappea.service:
sudo nano /etc/systemd/system/snappea.service
Add the following content:
[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target
[Service]
Restart=always
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service
Verify snappea is working:
sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit
Step 9: Configure NGINX for Bugsink
Create /etc/nginx/sites-available/bugsink:
sudo nano /etc/nginx/sites-available/bugsink
Add the following (replace bugsink.yourdomain.com with your hostname):
server {
server_name bugsink.yourdomain.com;
listen 80;
client_max_body_size 20M;
access_log /var/log/nginx/bugsink.access.log;
error_log /var/log/nginx/bugsink.error.log;
location / {
proxy_pass http://127.0.0.1:8000;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-Proto $scheme;
}
}
Enable the site:
sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
Step 10: Configure SSL with Certbot (Recommended)
sudo certbot --nginx -d bugsink.yourdomain.com
After SSL is configured, update the NGINX config to add security headers. Edit /etc/nginx/sites-available/bugsink and add to the location / block:
add_header Strict-Transport-Security "max-age=31536000; preload" always;
Reload NGINX:
sudo nginx -t
sudo systemctl reload nginx
Step 11: Create Projects and Get DSNs
- Access the Bugsink UI at `https://bugsink.yourdomain.com`
- Log in with the admin credentials you created
- Create a new team (or use the default)
- Create projects for each environment:
  - Production: `flyer-crawler-backend` (Platform: Node.js) and `flyer-crawler-frontend` (Platform: JavaScript/React)
  - Test: `flyer-crawler-backend-test` (Platform: Node.js) and `flyer-crawler-frontend-test` (Platform: JavaScript/React)
- For each project, go to Settings → Client Keys (DSN)
- Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)

Note: The dev container runs its own local Bugsink instance at `localhost:8000` - no remote DSNs are needed for development.
Step 12: Configure Application to Use Bugsink
The flyer-crawler application receives its configuration via Gitea CI/CD secrets, not local environment files. Follow these steps to add the Bugsink DSNs:
1. Add Secrets in Gitea
Navigate to your repository in Gitea:
- Go to Settings → Actions → Secrets
- Add the following secrets:
Production DSNs:
| Secret Name | Value | Description |
|---|---|---|
| `SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/1` | Production backend DSN |
| `VITE_SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/2` | Production frontend DSN |
Test DSNs:
| Secret Name | Value | Description |
|---|---|---|
| `SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/3` | Test backend DSN |
| `VITE_SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/4` | Test frontend DSN |
Note: The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.
2. Update the Deployment Workflows
Production (deploy-to-prod.yml):
In the Install Backend Dependencies and Restart Production Server step, add to the env: block:
env:
# ... existing secrets ...
# Sentry/Bugsink Error Tracking
SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
SENTRY_ENVIRONMENT: 'production'
SENTRY_ENABLED: 'true'
In the build step, add frontend variables:
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build
Test (deploy-to-test.yml):
In the Install Backend Dependencies and Restart Test Server step, add to the env: block:
env:
# ... existing secrets ...
# Sentry/Bugsink Error Tracking (Test)
SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
SENTRY_ENVIRONMENT: 'test'
SENTRY_ENABLED: 'true'
In the build step, add frontend variables:
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build
3. Update ecosystem.config.cjs
Add Sentry variables to the sharedEnv object in ecosystem.config.cjs:
const sharedEnv = {
// ... existing variables ...
SENTRY_DSN: process.env.SENTRY_DSN,
SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};
4. Dev Container (No Configuration Needed)
The dev container runs its own local Bugsink instance at http://localhost:8000. No remote DSNs or Gitea secrets are needed for development:
- DSNs are pre-configured in `compose.dev.yml`
- Admin UI: `http://localhost:8000` (login: `admin@localhost` / `admin`)
- Errors stay local and isolated from production/test
5. Deploy to Apply Changes
Trigger deployments via Gitea Actions:
- Test: Automatically deploys on push to `main`
- Production: Manual trigger via workflow dispatch
Note: There is no /etc/flyer-crawler/environment file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time. Dev container uses local .env file. See CLAUDE.md for details.
Step 13: Test Error Tracking
You can test Bugsink is working before configuring the flyer-crawler application.
Switch to the bugsink user and open a Python shell:
sudo su - bugsink
source venv/bin/activate
bugsink-manage shell
In the Python shell, send a test message using the backend DSN from Step 11:
import sentry_sdk
sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()
Exit back to root:
exit
Check the Bugsink UI - you should see the test message appear in the flyer-crawler-backend project.
Step 14: Test from Flyer-Crawler Application (After App Setup)
Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:
cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts
Check the Bugsink UI - you should see a test event appear.
Bugsink Maintenance Commands
| Task | Command |
|---|---|
| View Gunicorn status | `sudo systemctl status gunicorn-bugsink` |
| View Snappea status | `sudo systemctl status snappea` |
| View Gunicorn logs | `sudo journalctl -u gunicorn-bugsink -f` |
| View Snappea logs | `sudo journalctl -u snappea -f` |
| Restart Bugsink | `sudo systemctl restart gunicorn-bugsink snappea` |
| Run management commands | `sudo su - bugsink`, then `source venv/bin/activate && bugsink-manage <command>` |
| Upgrade Bugsink | `sudo su - bugsink`, then `source venv/bin/activate && pip install bugsink --upgrade`; exit, then `sudo systemctl restart gunicorn-bugsink snappea` |
Logstash Log Aggregation
Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.
Note: Logstash integration is optional. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.
Step 1: Create Application Log Directory
The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs.
Create the log directories and set appropriate permissions:
# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs
# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs
# Make logs readable by logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs
For the test environment:
sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs
Step 2: Application File Logging (Already Configured)
The flyer-crawler application uses Pino for logging and is configured to write logs to files in production/test environments:
Log File Locations:
| Environment | Log File Path |
|---|---|
| Production | /var/www/flyer-crawler.projectium.com/logs/app.log |
| Test | /var/www/flyer-crawler-test.projectium.com/logs/app.log |
| Dev Container | /app/logs/app.log |
How It Works:
- In production/test: Pino writes JSON logs to both stdout (for PM2) and `logs/app.log` (for Logstash)
- In development: Pino uses `pino-pretty` for human-readable console output only
- The log directory is created automatically if it doesn't exist
- You can override the log directory with the `LOG_DIR` environment variable
Verify Logging After Deployment:
After deploying the application, verify that logs are being written:
# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log
# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log
You should see JSON-formatted log entries like:
{ "level": 30, "time": 1704067200000, "msg": "Server started on port 3001", "module": "server" }
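Because the entries are line-delimited JSON, ordinary shell tools can filter them before Logstash is even installed. A small sketch (the sample lines below are fabricated, and real entries carry more fields) that keeps only warn-and-above records, mirroring the level thresholds the Logstash filter uses later:

```shell
# Write two fabricated Pino-style lines, then keep only level >= 40 (warn/error/fatal)
printf '%s\n' \
  '{"level":30,"time":1704067200000,"msg":"Server started on port 3001"}' \
  '{"level":50,"time":1704067201000,"msg":"Redis connection refused"}' \
  > /tmp/app-sample.log

# Pino levels are numeric: 40=warn, 50=error, 60=fatal
grep -E '"level":(4|5|6)[0-9],' /tmp/app-sample.log
```

Only the `"level":50` line is printed; the info-level startup line is filtered out.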
Step 3: Install Logstash
# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update and install
sudo apt update
sudo apt install -y logstash
Verify installation:
/usr/share/logstash/bin/logstash --version
Step 4: Configure Logstash Pipeline
Create the pipeline configuration file:
sudo nano /etc/logstash/conf.d/bugsink.conf
Add the following content:
input {
# Production application logs (Pino JSON format)
file {
path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
codec => json_lines
type => "pino"
tags => ["app", "production"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
}
# Test environment logs
file {
path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
codec => json_lines
type => "pino"
tags => ["app", "test"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_pino_test"
}
# Redis logs (shared by both environments)
file {
path => "/var/log/redis/redis-server.log"
type => "redis"
tags => ["infra", "redis", "production"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_redis"
}
# NGINX error logs (production)
file {
path => "/var/log/nginx/error.log"
type => "nginx"
tags => ["infra", "nginx", "production"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
}
# NGINX access logs - for detecting 5xx errors (production)
file {
path => "/var/log/nginx/access.log"
type => "nginx_access"
tags => ["infra", "nginx", "production"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
}
# PM2 error logs - Production (plain text stack traces)
file {
path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
exclude => "*-test-error.log"
type => "pm2"
tags => ["infra", "pm2", "production"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
}
# PM2 error logs - Test
file {
path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
type => "pm2"
tags => ["infra", "pm2", "test"]
start_position => "end"
sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
}
}
filter {
# Pino log level detection
# Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
if [type] == "pino" and [level] {
if [level] >= 50 {
mutate { add_tag => ["error"] }
} else if [level] >= 40 {
mutate { add_tag => ["warning"] }
}
}
# Redis error detection
if [type] == "redis" {
grok {
match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
}
if [loglevel] in ["WARNING", "ERROR"] {
mutate { add_tag => ["error"] }
}
}
# NGINX error log detection (all entries are errors)
if [type] == "nginx" {
mutate { add_tag => ["error"] }
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
}
}
# NGINX access log - detect 5xx errors
if [type] == "nginx_access" {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
if [response] =~ /^5\d{2}$/ {
mutate { add_tag => ["error"] }
}
}
# PM2 error log detection - tag lines with actual error indicators
if [type] == "pm2" {
if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
mutate { add_tag => ["error"] }
}
}
}
output {
# Production app errors -> flyer-crawler-backend (project 1)
if "error" in [tags] and "app" in [tags] and "production" in [tags] {
http {
url => "http://localhost:8000/api/1/store/"
http_method => "post"
format => "json"
headers => {
"X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
}
}
}
# Test app errors -> flyer-crawler-backend-test (project 3)
if "error" in [tags] and "app" in [tags] and "test" in [tags] {
http {
url => "http://localhost:8000/api/3/store/"
http_method => "post"
format => "json"
headers => {
"X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
}
}
}
# Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
http {
url => "http://localhost:8000/api/5/store/"
http_method => "post"
format => "json"
headers => {
"X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
}
}
}
# Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
http {
url => "http://localhost:8000/api/6/store/"
http_method => "post"
format => "json"
headers => {
"X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
}
}
}
# Debug output (uncomment to troubleshoot)
# stdout { codec => rubydebug }
}
Bugsink Project DSNs:
| Project | DSN Key | Project ID |
|---|---|---|
| `flyer-crawler-backend` | `911aef02b9a548fa8fabb8a3c81abfe5` | 1 |
| `flyer-crawler-frontend` | (used by app, not Logstash) | 2 |
| `flyer-crawler-backend-test` | `cdb99c314589431e83d4cc38a809449b` | 3 |
| `flyer-crawler-frontend-test` | (used by app, not Logstash) | 4 |
| `flyer-crawler-infrastructure` | `b083076f94fb461b889d5dffcbef43bf` | 5 |
| `flyer-crawler-test-infrastructure` | `25020dd6c2b74ad78463ec90e90fadab` | 6 |
Note: The DSN key is the part before @ in the full DSN URL (e.g., https://KEY@bugsink.projectium.com/PROJECT_ID).
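When filling in the `sentry_key` headers, the key and project ID can be split out of a DSN with plain parameter expansion. A sketch using the production backend DSN from the table as the example value:

```shell
# Split a Sentry-style DSN into its key (before the @) and project ID (trailing path segment)
DSN="https://911aef02b9a548fa8fabb8a3c81abfe5@bugsink.projectium.com/1"
key="${DSN#*://}"; key="${key%%@*}"   # strip the scheme, then everything from the @ onward
project_id="${DSN##*/}"               # everything after the last slash
echo "key=$key project=$project_id"
# → key=911aef02b9a548fa8fabb8a3c81abfe5 project=1
```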
Note on PM2 Logs: PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure), test PM2 logs go to project 6 (test-infrastructure).
Step 5: Create Logstash State Directory and Fix Config Path
Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:
# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash
# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config
Step 6: Grant Logstash Access to Application Logs
Logstash runs as the logstash user and needs permission to read log files:
# Add logstash user to adm group (for nginx and redis logs)
sudo usermod -aG adm logstash
# Make application log files readable (created automatically when app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"
# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log
# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log
# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log
# Verify logstash group membership
groups logstash
Note: The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.
Step 7: Test Logstash Configuration
Test the configuration before starting:
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
You should see Configuration OK if there are no errors.
Step 8: Start Logstash
sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash
View Logstash logs to verify it's working:
sudo journalctl -u logstash -f
Troubleshooting Logstash
| Issue | Solution |
|---|---|
| "Permission denied" errors | Check file permissions on log files and sincedb directory |
| No events being processed | Verify log file paths exist and contain data |
| HTTP output errors | Check Bugsink is running and DSN key is correct |
| Logstash not starting | Run config test: sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/ |
Alternative: Skip Logstash
Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:
- Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
- Centralizing all logs in one place
- Complex log transformations
If you only need application error tracking, the Sentry SDK integration is sufficient.
SSL/TLS with Let's Encrypt
Install Certbot
sudo apt install -y certbot python3-certbot-nginx
Obtain Certificate
sudo certbot --nginx -d flyer-crawler.projectium.com
Certbot will automatically configure NGINX for HTTPS.
Auto-Renewal
Certbot installs a systemd timer for automatic renewal. Verify:
sudo systemctl status certbot.timer
Firewall Configuration
Configure UFW
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow ssh
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
Important: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.
Maintenance Commands
Application Management
| Task | Command |
|---|---|
| View PM2 status | pm2 status |
| View application logs | pm2 logs |
| Restart all processes | pm2 restart all |
| Restart specific app | pm2 restart flyer-crawler-api |
| Update application | cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all |
Service Management
| Service | Start | Stop | Status |
|---|---|---|---|
| PostgreSQL | `sudo systemctl start postgresql` | `sudo systemctl stop postgresql` | `sudo systemctl status postgresql` |
| Redis | `sudo systemctl start redis-server` | `sudo systemctl stop redis-server` | `sudo systemctl status redis-server` |
| NGINX | `sudo systemctl start nginx` | `sudo systemctl stop nginx` | `sudo systemctl status nginx` |
| Bugsink | `sudo systemctl start gunicorn-bugsink snappea` | `sudo systemctl stop gunicorn-bugsink snappea` | `sudo systemctl status gunicorn-bugsink snappea` |
| Logstash | `sudo systemctl start logstash` | `sudo systemctl stop logstash` | `sudo systemctl status logstash` |
Database Backup
# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql
# Backup Bugsink databases (Bugsink runs on SQLite in this setup, not PostgreSQL)
sudo -u bugsink sqlite3 /home/bugsink/db.sqlite3 ".backup '/home/bugsink/bugsink_backup_$(date +%Y%m%d).sqlite3'"
sudo -u bugsink sqlite3 /home/bugsink/snappea.sqlite3 ".backup '/home/bugsink/snappea_backup_$(date +%Y%m%d).sqlite3'"
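To run backups on a schedule, a system crontab entry is enough. This is a sketch: the 02:30 run time, the `/var/backups/flyer-crawler` directory, and the 14-day retention are arbitrary choices, and `%` must be escaped as `\%` inside crontab entries:

```
# /etc/cron.d/flyer-crawler-backup -- illustrative schedule
30 2 * * * root pg_dump -U flyer_crawler -h localhost flyer_crawler > /var/backups/flyer-crawler/backup_$(date +\%Y\%m\%d).sql
45 2 * * * root find /var/backups/flyer-crawler -name 'backup_*.sql' -mtime +14 -delete
```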
Log Locations
| Log | Location |
|---|---|
| Application (PM2) | ~/.pm2/logs/ |
| NGINX access | /var/log/nginx/access.log |
| NGINX error | /var/log/nginx/error.log |
| PostgreSQL | /var/log/postgresql/ |
| Redis | /var/log/redis/ |
| Bugsink | `journalctl -u gunicorn-bugsink` (and `journalctl -u snappea`) |
| Logstash | /var/log/logstash/ |
Related Documentation
- DEPLOYMENT.md - Container-based deployment
- DATABASE.md - Database schema and extensions
- AUTHENTICATION.md - OAuth provider setup
- ADR-015 - Error tracking architecture