# Bare-Metal Server Setup Guide
This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). This is the definitive reference for setting up a production environment without containers.
**Target Environment**: Ubuntu 22.04 LTS (or newer)
---
## Table of Contents
1. [System Prerequisites](#system-prerequisites)
2. [PostgreSQL Setup](#postgresql-setup)
3. [Redis Setup](#redis-setup)
4. [Node.js and Application Setup](#nodejs-and-application-setup)
5. [PM2 Process Manager](#pm2-process-manager)
6. [NGINX Reverse Proxy](#nginx-reverse-proxy)
7. [Bugsink Error Tracking](#bugsink-error-tracking)
8. [Logstash Log Aggregation](#logstash-log-aggregation)
9. [SSL/TLS with Let's Encrypt](#ssltls-with-lets-encrypt)
10. [Firewall Configuration](#firewall-configuration)
11. [Maintenance Commands](#maintenance-commands)
---
## System Prerequisites
Update the system and install essential packages:
```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
```
---
## PostgreSQL Setup
### Install PostgreSQL 14+ with PostGIS
```bash
# Add PostgreSQL APT repository (apt-key is deprecated on Ubuntu 22.04; use a signed-by keyring)
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/pgdg-keyring.gpg
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/pgdg-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3
```
### Create Application Database and User
```bash
sudo -u postgres psql
```
```sql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;
-- Connect to the database and enable extensions
\c flyer_crawler
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;
\q
```
### Create Bugsink Database (for error tracking)
```bash
sudo -u postgres psql
```
```sql
-- Create dedicated Bugsink user and database
CREATE USER bugsink WITH PASSWORD 'BUGSINK_SECURE_PASSWORD';
CREATE DATABASE bugsink OWNER bugsink;
GRANT ALL PRIVILEGES ON DATABASE bugsink TO bugsink;
\q
```
### Configure PostgreSQL for Remote Access (if needed)
Edit `/etc/postgresql/14/main/postgresql.conf`:
```conf
listen_addresses = 'localhost' # Change to '*' for remote access
```
Edit `/etc/postgresql/14/main/pg_hba.conf` to add allowed hosts:
```conf
# Local connections
local   all             all                                     peer
host    all             all             127.0.0.1/32            scram-sha-256
```
Restart PostgreSQL:
```bash
sudo systemctl restart postgresql
```
---
## Redis Setup
### Install Redis
```bash
sudo apt install -y redis-server
```
### Configure Redis Password
Edit `/etc/redis/redis.conf`:
```conf
requirepass YOUR_REDIS_PASSWORD
```
Restart Redis:
```bash
sudo systemctl restart redis-server
sudo systemctl enable redis-server
```
### Test Redis Connection
```bash
redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG
```
---
## Node.js and Application Setup
### Install Node.js 20.x
```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
```
Verify installation:
```bash
node --version # Should output v20.x.x
npm --version
```
### Install System Dependencies for PDF Processing
```bash
sudo apt install -y poppler-utils # For pdftocairo
```
### Clone and Install Application
```bash
# Create application directory
sudo mkdir -p /opt/flyer-crawler
sudo chown $USER:$USER /opt/flyer-crawler
# Clone repository
cd /opt/flyer-crawler
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .
# Install dependencies
npm install
# Build for production
npm run build
```
### Configure Environment Variables
Create a systemd environment file at `/etc/flyer-crawler/environment`:
```bash
sudo mkdir -p /etc/flyer-crawler
sudo nano /etc/flyer-crawler/environment
```
Add the following (replace with actual values):
```bash
# Database
DB_HOST=localhost
DB_USER=flyer_crawler
DB_PASSWORD=YOUR_SECURE_PASSWORD
DB_DATABASE_PROD=flyer_crawler
# Redis
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD_PROD=YOUR_REDIS_PASSWORD
# Authentication
JWT_SECRET=YOUR_LONG_RANDOM_JWT_SECRET
# Google APIs
VITE_GOOGLE_GENAI_API_KEY=YOUR_GEMINI_API_KEY
GOOGLE_MAPS_API_KEY=YOUR_MAPS_API_KEY
# Sentry/Bugsink Error Tracking (ADR-015)
SENTRY_DSN=http://BACKEND_KEY@localhost:8000/1
VITE_SENTRY_DSN=http://FRONTEND_KEY@localhost:8000/2
SENTRY_ENVIRONMENT=production
VITE_SENTRY_ENVIRONMENT=production
SENTRY_ENABLED=true
VITE_SENTRY_ENABLED=true
SENTRY_DEBUG=false
VITE_SENTRY_DEBUG=false
# Application
NODE_ENV=production
PORT=3001
```
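The secrets above should be long random values rather than hand-picked strings. One way to generate them, assuming `openssl` is available:

```shell
# 64-character hex value, suitable for JWT_SECRET
openssl rand -hex 32

# 32-character URL-safe string, suitable for DB_PASSWORD or REDIS_PASSWORD_PROD
openssl rand -base64 48 | tr -d '/+=\n' | head -c 32; echo
```

Paste the generated values into the environment file in place of the `YOUR_...` placeholders.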
Secure the file:
```bash
sudo chmod 600 /etc/flyer-crawler/environment
```
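For debugging, you can load this file into an interactive shell. A plain `source` sets the variables without exporting them; wrapping it in `set -a` marks every assignment for export so child processes (such as a manually started Node process) inherit them:

```shell
# Export everything assigned while sourcing the file
set -a
. /etc/flyer-crawler/environment
set +a

# Child processes now inherit the variables
sh -c 'echo "$DB_HOST"'
```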
---
## PM2 Process Manager
### Install PM2 Globally
```bash
sudo npm install -g pm2
```
### Start Application with PM2
```bash
cd /opt/flyer-crawler
npm run start:prod
```
This starts three processes:
- `flyer-crawler-api` - Main API server (port 3001)
- `flyer-crawler-worker` - Background job worker
- `flyer-crawler-analytics-worker` - Analytics processing worker
### Configure PM2 Startup
```bash
pm2 startup systemd
# Follow the command output to enable PM2 on boot
pm2 save
```
### PM2 Log Rotation
```bash
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
```
---
## NGINX Reverse Proxy
### Install NGINX
```bash
sudo apt install -y nginx
```
### Create Site Configuration
Create `/etc/nginx/sites-available/flyer-crawler.projectium.com`:
```nginx
server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files
    types {
        application/javascript js mjs;
    }
}
```
### Enable the Site
```bash
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
```
---
## Bugsink Error Tracking
Bugsink is a lightweight, self-hosted Sentry-compatible error tracking system. See [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) for architecture details.
### Install Bugsink
```bash
# Create application and wrapper-script directories
sudo mkdir -p /opt/bugsink/bin
# Create virtual environment
sudo python3 -m venv /opt/bugsink/venv
# Install Bugsink into the venv (the venv is root-owned, so run pip with sudo)
sudo /opt/bugsink/venv/bin/pip install bugsink
# Create wrapper scripts
sudo tee /opt/bugsink/bin/bugsink-manage << 'EOF'
#!/bin/bash
source /opt/bugsink/venv/bin/activate
exec python -m bugsink.manage "$@"
EOF
sudo tee /opt/bugsink/bin/bugsink-runserver << 'EOF'
#!/bin/bash
source /opt/bugsink/venv/bin/activate
exec python -m bugsink.runserver "$@"
EOF
sudo chmod +x /opt/bugsink/bin/bugsink-manage /opt/bugsink/bin/bugsink-runserver
```
### Configure Bugsink
Create `/etc/bugsink/environment`:
```bash
sudo mkdir -p /etc/bugsink
sudo nano /etc/bugsink/environment
```
```bash
SECRET_KEY=YOUR_RANDOM_50_CHAR_SECRET_KEY
DATABASE_URL=postgresql://bugsink:BUGSINK_SECURE_PASSWORD@localhost:5432/bugsink
BASE_URL=http://localhost:8000
PORT=8000
```
```bash
sudo chmod 600 /etc/bugsink/environment
```
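Bugsink is a Django application, so `SECRET_KEY` must be a long random string. A quick way to generate one — and to percent-encode a database password that contains special characters before embedding it in `DATABASE_URL` (the example password is hypothetical):

```shell
# 50-character random SECRET_KEY
tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 50; echo

# Percent-encode a password for use inside DATABASE_URL
python3 -c 'import urllib.parse, sys; print(urllib.parse.quote(sys.argv[1], safe=""))' 'p@ss/word'
# -> p%40ss%2Fword
```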
### Initialize Bugsink Database
```bash
# "set -a" exports the sourced variables so the manage commands can see them
set -a; source /etc/bugsink/environment; set +a
/opt/bugsink/bin/bugsink-manage migrate
/opt/bugsink/bin/bugsink-manage migrate --database=snappea
```
### Create Bugsink Admin User
```bash
# Run in the same shell session where /etc/bugsink/environment was sourced
/opt/bugsink/bin/bugsink-manage createsuperuser
```
### Create Systemd Service
Create `/etc/systemd/system/bugsink.service`:
```ini
[Unit]
Description=Bugsink Error Tracking
After=network.target postgresql.service
[Service]
Type=simple
User=www-data
Group=www-data
EnvironmentFile=/etc/bugsink/environment
ExecStart=/opt/bugsink/bin/bugsink-runserver 127.0.0.1:8000
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
```
```bash
sudo systemctl daemon-reload
sudo systemctl enable bugsink
sudo systemctl start bugsink
```
### Create Bugsink Projects and Get DSNs
1. Access Bugsink UI at `http://localhost:8000`
2. Log in with admin credentials
3. Create projects:
- **flyer-crawler-backend** (Platform: Node.js)
- **flyer-crawler-frontend** (Platform: React)
4. Copy the DSNs from each project's settings
5. Update `/etc/flyer-crawler/environment` with the DSNs
### Test Error Tracking
```bash
cd /opt/flyer-crawler
npx tsx scripts/test-bugsink.ts
```
Check Bugsink UI for test events.
---
## Logstash Log Aggregation
Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.
### Install Logstash
```bash
# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
sudo apt update
sudo apt install -y logstash
```
### Configure Logstash Pipeline
Create `/etc/logstash/conf.d/bugsink.conf`:
```conf
input {
  # Pino application logs
  file {
    path => "/opt/flyer-crawler/logs/*.log"
    codec => json
    type => "pino"
    tags => ["app"]
  }

  # Redis logs
  file {
    path => "/var/log/redis/*.log"
    type => "redis"
    tags => ["redis"]
  }
}

filter {
  # Pino error detection (level 50 = error, 60 = fatal)
  if [type] == "pino" and [level] >= 50 {
    mutate { add_tag => ["error"] }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  if "error" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_BACKEND_DSN_KEY"
      }
    }
  }
}
```
Replace `YOUR_BACKEND_DSN_KEY` with the key from your backend project DSN.
### Start Logstash
```bash
sudo systemctl enable logstash
sudo systemctl start logstash
```
---
## SSL/TLS with Let's Encrypt
### Install Certbot
```bash
sudo apt install -y certbot python3-certbot-nginx
```
### Obtain Certificate
```bash
sudo certbot --nginx -d flyer-crawler.projectium.com
```
Certbot will automatically configure NGINX for HTTPS.
### Auto-Renewal
Certbot installs a systemd timer for automatic renewal. Verify:
```bash
sudo systemctl status certbot.timer
```
---
## Firewall Configuration
### Configure UFW
```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
# Allow SSH
sudo ufw allow ssh
# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
# Enable firewall
sudo ufw enable
```
**Important**: Do NOT open port 8000 in UFW. Bugsink should only be reachable from localhost.
---
## Maintenance Commands
### Application Management
| Task | Command |
| --------------------- | -------------------------------------------------------------------------------------- |
| View PM2 status | `pm2 status` |
| View application logs | `pm2 logs` |
| Restart all processes | `pm2 restart all` |
| Restart specific app | `pm2 restart flyer-crawler-api` |
| Update application | `cd /opt/flyer-crawler && git pull && npm install && npm run build && pm2 restart all` |
### Service Management
| Service | Start | Stop | Status |
| ---------- | ----------------------------------- | ---------------------------------- | ------------------------------------ |
| PostgreSQL | `sudo systemctl start postgresql` | `sudo systemctl stop postgresql` | `sudo systemctl status postgresql` |
| Redis | `sudo systemctl start redis-server` | `sudo systemctl stop redis-server` | `sudo systemctl status redis-server` |
| NGINX | `sudo systemctl start nginx` | `sudo systemctl stop nginx` | `sudo systemctl status nginx` |
| Bugsink | `sudo systemctl start bugsink` | `sudo systemctl stop bugsink` | `sudo systemctl status bugsink` |
| Logstash | `sudo systemctl start logstash` | `sudo systemctl stop logstash` | `sudo systemctl status logstash` |
### Database Backup
```bash
# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql
# Backup Bugsink database
pg_dump -U bugsink -h localhost bugsink > bugsink_backup_$(date +%Y%m%d).sql
```
### Log Locations
| Log | Location |
| ----------------- | --------------------------- |
| Application (PM2) | `~/.pm2/logs/` |
| NGINX access | `/var/log/nginx/access.log` |
| NGINX error | `/var/log/nginx/error.log` |
| PostgreSQL | `/var/log/postgresql/` |
| Redis | `/var/log/redis/` |
| Bugsink | `journalctl -u bugsink` |
| Logstash | `/var/log/logstash/` |
---
## Related Documentation
- [DEPLOYMENT.md](../DEPLOYMENT.md) - Container-based deployment
- [DATABASE.md](../DATABASE.md) - Database schema and extensions
- [AUTHENTICATION.md](../AUTHENTICATION.md) - OAuth provider setup
- [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) - Error tracking architecture