# Bare-Metal Server Setup Guide

This guide covers the manual installation of Flyer Crawler and its dependencies on a bare-metal Ubuntu server (e.g., a colocation server). It is the definitive reference for setting up a production environment without containers.

**Target Environment**: Ubuntu 22.04 LTS (or newer)

---

## Table of Contents

1. [System Prerequisites](#system-prerequisites)
2. [PostgreSQL Setup](#postgresql-setup)
3. [Redis Setup](#redis-setup)
4. [Node.js and Application Setup](#nodejs-and-application-setup)
5. [PM2 Process Manager](#pm2-process-manager)
6. [NGINX Reverse Proxy](#nginx-reverse-proxy)
7. [Bugsink Error Tracking](#bugsink-error-tracking)
8. [Logstash Log Aggregation](#logstash-log-aggregation)
9. [SSL/TLS with Let's Encrypt](#ssltls-with-lets-encrypt)
10. [Firewall Configuration](#firewall-configuration)
11. [Maintenance Commands](#maintenance-commands)

---

## System Prerequisites

Update the system and install essential packages:

```bash
sudo apt update && sudo apt upgrade -y
sudo apt install -y curl git build-essential python3 python3-pip python3-venv
```

---

## PostgreSQL Setup

### Install PostgreSQL 14+ with PostGIS

```bash
# Add the PostgreSQL APT repository (apt-key is deprecated on Ubuntu 22.04,
# so the signing key goes into /usr/share/keyrings instead)
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update

# Install PostgreSQL and PostGIS
sudo apt install -y postgresql-14 postgresql-14-postgis-3
```

### Create Application Database and User

```bash
sudo -u postgres psql
```

```sql
-- Create application user and database
CREATE USER flyer_crawler WITH PASSWORD 'YOUR_SECURE_PASSWORD';
CREATE DATABASE flyer_crawler OWNER flyer_crawler;

-- Connect to the database and enable extensions
\c flyer_crawler
CREATE EXTENSION IF NOT EXISTS postgis;
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Grant privileges
GRANT ALL PRIVILEGES ON DATABASE flyer_crawler TO flyer_crawler;
\q
```

### Configure PostgreSQL for Remote Access (if needed)

Edit `/etc/postgresql/14/main/postgresql.conf`:

```conf
listen_addresses = 'localhost'  # Change to '*' for remote access
```

Edit `/etc/postgresql/14/main/pg_hba.conf` to add allowed hosts:

```conf
# Local connections
local   all   all                  peer
host    all   all   127.0.0.1/32   scram-sha-256
```

Restart PostgreSQL:

```bash
sudo systemctl restart postgresql
```

---

## Redis Setup

### Install Redis

```bash
sudo apt install -y redis-server
```

### Configure Redis Password

Edit `/etc/redis/redis.conf`:

```conf
requirepass YOUR_REDIS_PASSWORD
```

Restart Redis:

```bash
sudo systemctl restart redis-server
sudo systemctl enable redis-server
```

### Test Redis Connection

```bash
redis-cli -a YOUR_REDIS_PASSWORD ping
# Should output: PONG
```

---

## Node.js and Application Setup

### Install Node.js 20.x

```bash
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
```

Verify installation:

```bash
node --version   # Should output v20.x.x
npm --version
```

### Install System Dependencies for PDF Processing

```bash
sudo apt install -y poppler-utils  # For pdftocairo
```

### Clone and Install Application

```bash
# Create application directory (the same path the PM2 and deployment
# sections below use)
sudo mkdir -p /var/www/flyer-crawler.projectium.com
sudo chown $USER:$USER /var/www/flyer-crawler.projectium.com

# Clone repository
cd /var/www/flyer-crawler.projectium.com
git clone https://gitea.projectium.com/flyer-crawler/flyer-crawler.projectium.com.git .
```
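Before installing dependencies, it can be worth confirming that the services set up above are reachable. A quick sanity check (a sketch; it assumes the passwords chosen earlier in this guide):

```bash
# PostgreSQL: the application role can connect and PostGIS is active
psql -U flyer_crawler -h localhost -d flyer_crawler -c 'SELECT postgis_full_version();'

# Redis: responds with the configured password
redis-cli -a YOUR_REDIS_PASSWORD ping  # expect: PONG

# Toolchain versions
node --version && npm --version && pdftocairo -v
```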
Install dependencies and build the application:

```bash
# Install dependencies
npm install

# Build for production
npm run build
```

### Configure Environment Variables

**Important:** The flyer-crawler application does **not** use local environment files in production. All secrets are managed through **Gitea CI/CD secrets** and injected during deployment.

#### How Secrets Work

1. **Secrets are stored in Gitea** at Repository → Settings → Actions → Secrets
2. **Workflow files** (`.gitea/workflows/deploy-to-prod.yml`) reference secrets using `${{ secrets.SECRET_NAME }}`
3. **PM2** receives environment variables from the workflow's `env:` block
4. **ecosystem.config.cjs** passes variables to the application via `process.env`

#### Required Gitea Secrets

Before deployment, ensure these secrets are configured in Gitea:

**Shared Secrets** (used by both production and test):

| Secret Name            | Description                              |
| ---------------------- | ---------------------------------------- |
| `DB_HOST`              | Database hostname (usually `localhost`)  |
| `DB_USER`              | Database username                        |
| `DB_PASSWORD`          | Database password                        |
| `JWT_SECRET`           | JWT signing secret (min 32 characters)   |
| `GOOGLE_MAPS_API_KEY`  | Google Maps API key                      |
| `GOOGLE_CLIENT_ID`     | Google OAuth client ID                   |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret               |
| `GH_CLIENT_ID`         | GitHub OAuth client ID                   |
| `GH_CLIENT_SECRET`     | GitHub OAuth client secret               |

**Production-Specific Secrets**:

| Secret Name                 | Description                                                           |
| --------------------------- | --------------------------------------------------------------------- |
| `DB_DATABASE_PROD`          | Production database name (`flyer_crawler`)                            |
| `REDIS_PASSWORD_PROD`       | Redis password for production (uses database 0)                       |
| `VITE_GOOGLE_GENAI_API_KEY` | Gemini API key for production                                         |
| `SENTRY_DSN`                | Bugsink backend DSN (see [Bugsink section](#bugsink-error-tracking))  |
| `VITE_SENTRY_DSN`           | Bugsink frontend DSN                                                  |

**Test-Specific Secrets**:

| Secret Name                      | Description                                                                    |
| -------------------------------- | ------------------------------------------------------------------------------ |
| `DB_DATABASE_TEST`               | Test database name (`flyer-crawler-test`)                                      |
| `REDIS_PASSWORD_TEST`            | Redis password for test (uses database 1 for isolation)                        |
| `VITE_GOOGLE_GENAI_API_KEY_TEST` | Gemini API key for test environment                                            |
| `SENTRY_DSN_TEST`                | Bugsink backend DSN for test (see [Bugsink section](#bugsink-error-tracking))  |
| `VITE_SENTRY_DSN_TEST`           | Bugsink frontend DSN for test                                                  |

#### Test Environment Details

The test environment (`flyer-crawler-test.projectium.com`) uses **both** Gitea CI/CD secrets and a local `.env.test` file:

| Path                                                    | Purpose                                   |
| ------------------------------------------------------- | ----------------------------------------- |
| `/var/www/flyer-crawler-test.projectium.com/`           | Test application directory                |
| `/var/www/flyer-crawler-test.projectium.com/.env.test`  | Local overrides for test-specific config  |

**Key differences from production:**

- Uses Redis database **1** (production uses database **0**) to isolate job queues
- PM2 processes are named with a `-test` suffix (e.g., `flyer-crawler-api-test`)
- Deployed automatically on every push to the `main` branch
- Has a `.env.test` file for additional local configuration overrides

For detailed information on secrets management, see [CLAUDE.md](../CLAUDE.md).
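Since `.env.test` carries only local overrides, it typically stays small. A hypothetical sketch (these variable names appear elsewhere in this guide, but the values shown are placeholders, not the real configuration):

```bash
# /var/www/flyer-crawler-test.projectium.com/.env.test
# Hypothetical local overrides; real secrets come from Gitea CI/CD
LOG_DIR=/var/www/flyer-crawler-test.projectium.com/logs
SENTRY_ENVIRONMENT=test
SENTRY_ENABLED=true
```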
---

## PM2 Process Manager

### Install PM2 Globally

```bash
sudo npm install -g pm2
```

### PM2 Configuration Files

The application uses **separate ecosystem config files** for production and test environments:

| File                        | Purpose               | Processes Started                                                                             |
| --------------------------- | --------------------- | --------------------------------------------------------------------------------------------- |
| `ecosystem.config.cjs`      | Production deployment | `flyer-crawler-api`, `flyer-crawler-worker`, `flyer-crawler-analytics-worker`                 |
| `ecosystem-test.config.cjs` | Test deployment       | `flyer-crawler-api-test`, `flyer-crawler-worker-test`, `flyer-crawler-analytics-worker-test`  |

**Key Points:**

- Production and test processes run **simultaneously** with distinct names
- Test processes use `NODE_ENV=test`, which enables file logging
- Test processes use Redis database 1 (isolated from production, which uses database 0)
- Both configs validate required environment variables but only warn (they don't exit) if any are missing

### Start Production Application

```bash
cd /var/www/flyer-crawler.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-api-key
# ... other required variables

pm2 startOrReload ecosystem.config.cjs --update-env && pm2 save
```

This starts three production processes:

- `flyer-crawler-api` - Main API server (port 3001)
- `flyer-crawler-worker` - Background job worker
- `flyer-crawler-analytics-worker` - Analytics processing worker

### Start Test Application

```bash
cd /var/www/flyer-crawler-test.projectium.com

# Set required environment variables (usually done via CI/CD)
export DB_HOST=localhost
export DB_NAME=flyer-crawler-test
export JWT_SECRET=your-secret
export GEMINI_API_KEY=your-test-api-key
export REDIS_URL=redis://localhost:6379/1  # Use database 1 for isolation
# ... other required variables

pm2 startOrReload ecosystem-test.config.cjs --update-env && pm2 save
```

This starts three test processes (running alongside production):

- `flyer-crawler-api-test` - Test API server (port 3001 via a different NGINX vhost)
- `flyer-crawler-worker-test` - Test background job worker
- `flyer-crawler-analytics-worker-test` - Test analytics worker
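To confirm the two environments really are isolated, you can inspect each Redis database separately. A quick check, assuming BullMQ-style `bull:*` key names for the job queues:

```bash
# Production queues live in database 0
redis-cli -a YOUR_REDIS_PASSWORD -n 0 --scan --pattern 'bull:*' | head

# Test queues live in database 1
redis-cli -a YOUR_REDIS_PASSWORD -n 1 --scan --pattern 'bull:*' | head
```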
### Verify Running Processes

After starting both environments, you should see 6 application processes:

```bash
pm2 list
```

Expected output:

```text
┌────┬──────────────────────────────────────┬─────────┬────────┬─────┐
│ id │ name                                 │ mode    │ status │ cpu │
├────┼──────────────────────────────────────┼─────────┼────────┼─────┤
│ 0  │ flyer-crawler-api                    │ cluster │ online │ 0%  │
│ 1  │ flyer-crawler-worker                 │ fork    │ online │ 0%  │
│ 2  │ flyer-crawler-analytics-worker       │ fork    │ online │ 0%  │
│ 3  │ flyer-crawler-api-test               │ fork    │ online │ 0%  │
│ 4  │ flyer-crawler-worker-test            │ fork    │ online │ 0%  │
│ 5  │ flyer-crawler-analytics-worker-test  │ fork    │ online │ 0%  │
└────┴──────────────────────────────────────┴─────────┴────────┴─────┘
```

### Configure PM2 Startup

```bash
pm2 startup systemd
# Follow the command output to enable PM2 on boot
pm2 save
```

### PM2 Log Rotation

```bash
pm2 install pm2-logrotate
pm2 set pm2-logrotate:max_size 10M
pm2 set pm2-logrotate:retain 14
pm2 set pm2-logrotate:compress true
```

### Useful PM2 Commands

```bash
# View logs for a specific process
pm2 logs flyer-crawler-api-test --lines 50

# View environment variables for a process (by id from `pm2 list`)
pm2 env 0

# Restart only test processes
pm2 restart flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test

# Delete all test processes (without affecting production)
pm2 delete flyer-crawler-api-test flyer-crawler-worker-test flyer-crawler-analytics-worker-test
```

---

## NGINX Reverse Proxy

### Install NGINX

```bash
sudo apt install -y nginx
```

### Create Site Configuration

Create `/etc/nginx/sites-available/flyer-crawler.projectium.com`:

```nginx
server {
    listen 80;
    server_name flyer-crawler.projectium.com;

    # Redirect HTTP to HTTPS (uncomment after SSL setup)
    # return 301 https://$server_name$request_uri;

    location / {
        proxy_pass http://localhost:5173;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;
    }

    location /api {
        proxy_pass http://localhost:3001;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_cache_bypass $http_upgrade;

        # File upload size limit
        client_max_body_size 50M;
    }

    # MIME type fix for .mjs files
    types {
        application/javascript js mjs;
    }
}
```

### Enable the Site

```bash
sudo ln -s /etc/nginx/sites-available/flyer-crawler.projectium.com /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
sudo systemctl enable nginx
```
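With the vhost enabled, you can confirm the proxy paths respond before DNS or SSL are in place (a sketch; it assumes the PM2 processes from the previous section are running):

```bash
# Frontend through the proxy; expect a 200 or a redirect
curl -I http://localhost -H "Host: flyer-crawler.projectium.com"

# The /api location should reach the Node API on port 3001 (the exact
# route depends on the application; no specific endpoint is assumed here)
curl -I http://localhost/api -H "Host: flyer-crawler.projectium.com"
```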
---

## Bugsink Error Tracking

Bugsink is a lightweight, self-hosted, Sentry-compatible error tracking system. This guide follows the [official Bugsink single-server production setup](https://www.bugsink.com/docs/single-server-production/). See [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) for architecture details.

### Step 1: Create Bugsink User

Create a dedicated non-root user for Bugsink:

```bash
sudo adduser bugsink --disabled-password --gecos ""
```

### Step 2: Set Up Virtual Environment and Install Bugsink

Switch to the bugsink user:

```bash
sudo su - bugsink
```

Create the virtual environment:

```bash
python3 -m venv venv
```

Activate the virtual environment:

```bash
source venv/bin/activate
```

You should see `(venv)` at the beginning of your prompt. Now install Bugsink:

```bash
pip install bugsink --upgrade
bugsink-show-version
```

You should see output like `bugsink 2.x.x`.

### Step 3: Create Configuration File

Generate the configuration file. Replace `bugsink.yourdomain.com` with your actual hostname:

```bash
bugsink-create-conf --template=singleserver --host=bugsink.yourdomain.com
```

This creates `bugsink_conf.py` in `/home/bugsink/`. Edit it to customize settings:

```bash
nano bugsink_conf.py
```

**Key settings to review:**

| Setting             | Description                                                                      |
| ------------------- | --------------------------------------------------------------------------------- |
| `BASE_URL`          | The URL where Bugsink will be accessed (e.g., `https://bugsink.yourdomain.com`)  |
| `SITE_TITLE`        | Display name for your Bugsink instance                                            |
| `SECRET_KEY`        | Auto-generated, but verify it exists                                              |
| `TIME_ZONE`         | Your timezone (e.g., `America/New_York`)                                          |
| `USER_REGISTRATION` | Set to `"closed"` to disable public signup                                        |
| `SINGLE_USER`       | Set to `True` if only one user will use this instance                             |

### Step 4: Initialize Database

Bugsink uses SQLite by default, which is recommended for single-server setups. Run the database migrations:

```bash
bugsink-manage migrate
bugsink-manage migrate snappea --database=snappea
```

Verify the database files were created:

```bash
ls *.sqlite3
```

You should see `db.sqlite3` and `snappea.sqlite3`.

### Step 5: Create Admin User

Create the superuser account. Using your email as the username is recommended:

```bash
bugsink-manage createsuperuser
```

**Important:** Save these credentials - you'll need them to log into the Bugsink web UI.

### Step 6: Verify Configuration

Run Django's deployment checks:

```bash
bugsink-manage check_migrations
bugsink-manage check --deploy --fail-level WARNING
```

Exit back to root for the next steps:

```bash
exit
```

### Step 7: Create Gunicorn Service

Create `/etc/systemd/system/gunicorn-bugsink.service`:

```bash
sudo nano /etc/systemd/system/gunicorn-bugsink.service
```

Add the following content:

```ini
[Unit]
Description=Gunicorn daemon for Bugsink
After=network.target

[Service]
Restart=always
Type=notify
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/gunicorn \
    --bind="127.0.0.1:8000" \
    --workers=4 \
    --timeout=6 \
    --access-logfile - \
    --max-requests=1000 \
    --max-requests-jitter=100 \
    bugsink.wsgi
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now gunicorn-bugsink.service
sudo systemctl status gunicorn-bugsink.service
```

Test that Gunicorn is responding (replace the hostname):

```bash
curl http://localhost:8000/accounts/login/ --header "Host: bugsink.yourdomain.com"
```

You should see HTML output containing a login form.
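Two quick follow-up checks, confirming the service starts on boot and serves an HTTP 200 (the hostname is a placeholder, as above):

```bash
systemctl is-enabled gunicorn-bugsink.service  # expect: enabled

curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Host: bugsink.yourdomain.com" \
  http://localhost:8000/accounts/login/        # expect: 200
```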
### Step 8: Create Snappea Background Worker Service

Snappea is Bugsink's background task processor. Create `/etc/systemd/system/snappea.service`:

```bash
sudo nano /etc/systemd/system/snappea.service
```

Add the following content:

```ini
[Unit]
Description=Snappea daemon for Bugsink background tasks
After=network.target

[Service]
Restart=always
User=bugsink
Group=bugsink
Environment="PYTHONUNBUFFERED=1"
WorkingDirectory=/home/bugsink
ExecStart=/home/bugsink/venv/bin/bugsink-runsnappea
KillMode=mixed
TimeoutStopSec=5
RuntimeMaxSec=1d

[Install]
WantedBy=multi-user.target
```

Enable and start the service:

```bash
sudo systemctl daemon-reload
sudo systemctl enable --now snappea.service
sudo systemctl status snappea.service
```

Verify snappea is working:

```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage checksnappea
exit
```

### Step 9: Configure NGINX for Bugsink

Create `/etc/nginx/sites-available/bugsink`:

```bash
sudo nano /etc/nginx/sites-available/bugsink
```

Add the following (replace `bugsink.yourdomain.com` with your hostname):

```nginx
server {
    server_name bugsink.yourdomain.com;
    listen 80;

    client_max_body_size 20M;

    access_log /var/log/nginx/bugsink.access.log;
    error_log /var/log/nginx/bugsink.error.log;

    location / {
        proxy_pass http://127.0.0.1:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
```

Enable the site:

```bash
sudo ln -s /etc/nginx/sites-available/bugsink /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
```

### Step 10: Configure SSL with Certbot (Recommended)

```bash
sudo certbot --nginx -d bugsink.yourdomain.com
```

After SSL is configured, update the NGINX config to add security headers. Edit `/etc/nginx/sites-available/bugsink` and add to the `location /` block:

```nginx
add_header Strict-Transport-Security "max-age=31536000; preload" always;
```

Reload NGINX:

```bash
sudo nginx -t
sudo systemctl reload nginx
```

### Step 11: Create Projects and Get DSNs

1. Access the Bugsink UI at `https://bugsink.yourdomain.com`
2. Log in with the admin credentials you created
3. Create a new team (or use the default)
4. Create projects for each environment:

   **Production:**

   - **flyer-crawler-backend** (Platform: Node.js)
   - **flyer-crawler-frontend** (Platform: JavaScript/React)

   **Test:**

   - **flyer-crawler-backend-test** (Platform: Node.js)
   - **flyer-crawler-frontend-test** (Platform: JavaScript/React)

5. For each project, go to Settings → Client Keys (DSN)
6. Copy the DSN URLs - you'll have 4 DSNs total (2 for production, 2 for test)

> **Note:** The dev container runs its own local Bugsink instance at `localhost:8000` - no remote DSNs are needed for development.
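If you want to confirm a DSN ingests events before wiring up the application, you can post a minimal event by hand. A sketch using the same `store` endpoint and auth-header format that the Logstash section below uses; substitute a real key and project ID from Step 11:

```bash
curl -s http://localhost:8000/api/1/store/ \
  -H "Content-Type: application/json" \
  -H "X-Sentry-Auth: Sentry sentry_version=7, sentry_client=curl/1.0, sentry_key=YOUR_BACKEND_DSN_KEY" \
  -d '{"message": "Manual ingest test from curl"}'
```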
### Step 12: Configure Application to Use Bugsink

The flyer-crawler application receives its configuration via **Gitea CI/CD secrets**, not local environment files. Follow these steps to add the Bugsink DSNs:

#### 1. Add Secrets in Gitea

Navigate to your repository in Gitea:

1. Go to **Settings** → **Actions** → **Secrets**
2. Add the following secrets:

**Production DSNs:**

| Secret Name       | Value                                   | Description              |
| ----------------- | --------------------------------------- | ------------------------ |
| `SENTRY_DSN`      | `https://KEY@bugsink.yourdomain.com/1`  | Production backend DSN   |
| `VITE_SENTRY_DSN` | `https://KEY@bugsink.yourdomain.com/2`  | Production frontend DSN  |

**Test DSNs:**

| Secret Name            | Value                                   | Description        |
| ---------------------- | --------------------------------------- | ------------------ |
| `SENTRY_DSN_TEST`      | `https://KEY@bugsink.yourdomain.com/3`  | Test backend DSN   |
| `VITE_SENTRY_DSN_TEST` | `https://KEY@bugsink.yourdomain.com/4`  | Test frontend DSN  |

> **Note:** The project numbers in the DSN URLs are assigned by Bugsink when you create each project. Use the actual DSN values from Step 11.

#### 2. Update the Deployment Workflows

**Production** (`deploy-to-prod.yml`):

In the `Install Backend Dependencies and Restart Production Server` step, add to the `env:` block:

```yaml
env:
  # ... existing secrets ...

  # Sentry/Bugsink Error Tracking
  SENTRY_DSN: ${{ secrets.SENTRY_DSN }}
  SENTRY_ENVIRONMENT: 'production'
  SENTRY_ENABLED: 'true'
```

In the build step, add frontend variables:

```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN }} \
VITE_SENTRY_ENVIRONMENT=production \
VITE_SENTRY_ENABLED=true \
npm run build
```

**Test** (`deploy-to-test.yml`):

In the `Install Backend Dependencies and Restart Test Server` step, add to the `env:` block:

```yaml
env:
  # ... existing secrets ...

  # Sentry/Bugsink Error Tracking (Test)
  SENTRY_DSN: ${{ secrets.SENTRY_DSN_TEST }}
  SENTRY_ENVIRONMENT: 'test'
  SENTRY_ENABLED: 'true'
```

In the build step, add frontend variables:

```yaml
VITE_SENTRY_DSN=${{ secrets.VITE_SENTRY_DSN_TEST }} \
VITE_SENTRY_ENVIRONMENT=test \
VITE_SENTRY_ENABLED=true \
npm run build
```

#### 3. Update ecosystem.config.cjs

Add the Sentry variables to the `sharedEnv` object in `ecosystem.config.cjs`:

```javascript
const sharedEnv = {
  // ... existing variables ...
  SENTRY_DSN: process.env.SENTRY_DSN,
  SENTRY_ENVIRONMENT: process.env.SENTRY_ENVIRONMENT,
  SENTRY_ENABLED: process.env.SENTRY_ENABLED,
};
```

#### 4. Dev Container (No Configuration Needed)

The dev container runs its own **local Bugsink instance** at `http://localhost:8000`. No remote DSNs or Gitea secrets are needed for development:

- DSNs are pre-configured in `compose.dev.yml`
- Admin UI: `http://localhost:8000` (login: `admin@localhost` / `admin`)
- Errors stay local and isolated from production/test

#### 5. Deploy to Apply Changes

Trigger deployments via Gitea Actions:

- **Test**: Automatically deploys on push to `main`
- **Production**: Manual trigger via workflow dispatch

**Note:** There is no `/etc/flyer-crawler/environment` file on the server. Production and test secrets are managed through Gitea CI/CD and injected at deployment time. The dev container uses a local `.env` file. See [CLAUDE.md](../CLAUDE.md) for details.

### Step 13: Test Error Tracking

You can test that Bugsink is working before configuring the flyer-crawler application. Switch to the bugsink user and open a Python shell:

```bash
sudo su - bugsink
source venv/bin/activate
bugsink-manage shell
```

In the Python shell, send a test message using the **backend DSN** from Step 11:

```python
import sentry_sdk

sentry_sdk.init("https://YOUR_BACKEND_KEY@bugsink.yourdomain.com/1")
sentry_sdk.capture_message("Test message from Bugsink setup")
exit()
```

Exit back to root:

```bash
exit
```

Check the Bugsink UI - you should see the test message appear in the `flyer-crawler-backend` project.
### Step 14: Test from Flyer-Crawler Application (After App Setup)

Once the flyer-crawler application has been deployed with the Sentry secrets configured in Step 12:

```bash
cd /var/www/flyer-crawler.projectium.com
npx tsx scripts/test-bugsink.ts
```

Check the Bugsink UI - you should see a test event appear.

### Bugsink Maintenance Commands

| Task                    | Command                                                                                                                                |
| ----------------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| View Gunicorn status    | `sudo systemctl status gunicorn-bugsink`                                                                                                 |
| View Snappea status     | `sudo systemctl status snappea`                                                                                                          |
| View Gunicorn logs      | `sudo journalctl -u gunicorn-bugsink -f`                                                                                                 |
| View Snappea logs       | `sudo journalctl -u snappea -f`                                                                                                          |
| Restart Bugsink         | `sudo systemctl restart gunicorn-bugsink snappea`                                                                                        |
| Run management commands | `sudo su - bugsink`, then `source venv/bin/activate && bugsink-manage <command>`                                                         |
| Upgrade Bugsink         | `sudo su - bugsink`, then `source venv/bin/activate && pip install bugsink --upgrade`; after `exit`, `sudo systemctl restart gunicorn-bugsink snappea` |

---

## Logstash Log Aggregation

Logstash aggregates logs from the application and infrastructure, forwarding errors to Bugsink.

> **Note:** Logstash integration is **optional**. The flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK. Logstash is only needed if you want to aggregate logs from other sources (Redis, NGINX, etc.) into Bugsink.

### Step 1: Create Application Log Directory

The flyer-crawler application automatically creates its log directory on startup, but you need to ensure proper permissions for Logstash to read the logs. Create the log directories and set appropriate permissions:

```bash
# Create log directory for the production application
sudo mkdir -p /var/www/flyer-crawler.projectium.com/logs

# Set ownership to root (since PM2 runs as root)
sudo chown -R root:root /var/www/flyer-crawler.projectium.com/logs

# Make logs readable by the logstash user
sudo chmod 755 /var/www/flyer-crawler.projectium.com/logs
```

For the test environment:

```bash
sudo mkdir -p /var/www/flyer-crawler-test.projectium.com/logs
sudo chown -R root:root /var/www/flyer-crawler-test.projectium.com/logs
sudo chmod 755 /var/www/flyer-crawler-test.projectium.com/logs
```

### Step 2: Application File Logging (Already Configured)

The flyer-crawler application uses Pino for logging and is configured to write logs to files in the production and test environments:

**Log File Locations:**

| Environment   | Log File Path                                              |
| ------------- | ----------------------------------------------------------- |
| Production    | `/var/www/flyer-crawler.projectium.com/logs/app.log`        |
| Test          | `/var/www/flyer-crawler-test.projectium.com/logs/app.log`   |
| Dev Container | `/app/logs/app.log`                                         |

**How It Works:**

- In production/test: Pino writes JSON logs to both stdout (for PM2) AND `logs/app.log` (for Logstash)
- In development: Pino uses pino-pretty for human-readable console output only
- The log directory is created automatically if it doesn't exist
- You can override the log directory with the `LOG_DIR` environment variable

**Verify Logging After Deployment:**

After deploying the application, verify that logs are being written:

```bash
# Check production logs
ls -la /var/www/flyer-crawler.projectium.com/logs/
tail -f /var/www/flyer-crawler.projectium.com/logs/app.log

# Check test logs
ls -la /var/www/flyer-crawler-test.projectium.com/logs/
tail -f /var/www/flyer-crawler-test.projectium.com/logs/app.log
```
You should see JSON-formatted log entries like:

```json
{
  "level": 30,
  "time": 1704067200000,
  "msg": "Server started on port 3001",
  "module": "server"
}
```

### Step 3: Install Logstash

```bash
# Add Elastic APT repository
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list

# Update and install
sudo apt update
sudo apt install -y logstash
```

Verify installation:

```bash
/usr/share/logstash/bin/logstash --version
```

### Step 4: Configure Logstash Pipeline

Create the pipeline configuration file:

```bash
sudo nano /etc/logstash/conf.d/bugsink.conf
```

Add the following content:

```conf
input {
  # Production application logs (Pino JSON format)
  file {
    path => "/var/www/flyer-crawler.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_prod"
  }

  # Test environment logs
  file {
    path => "/var/www/flyer-crawler-test.projectium.com/logs/app.log"
    codec => json_lines
    type => "pino"
    tags => ["app", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pino_test"
  }

  # Redis logs (shared by both environments)
  file {
    path => "/var/log/redis/redis-server.log"
    type => "redis"
    tags => ["infra", "redis", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_redis"
  }

  # NGINX error logs (production)
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_error"
  }

  # NGINX access logs - for detecting 5xx errors (production)
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
    tags => ["infra", "nginx", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_nginx_access"
  }

  # PM2 error logs - Production (plain text stack traces)
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-error.log"
    exclude => "*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "production"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_prod"
  }

  # PM2 error logs - Test
  file {
    path => "/home/gitea-runner/.pm2/logs/flyer-crawler-*-test-error.log"
    type => "pm2"
    tags => ["infra", "pm2", "test"]
    start_position => "end"
    sincedb_path => "/var/lib/logstash/sincedb_pm2_test"
  }
}

filter {
  # Pino log level detection
  # Pino levels: 10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal
  if [type] == "pino" and [level] {
    if [level] >= 50 {
      mutate { add_tag => ["error"] }
    } else if [level] >= 40 {
      mutate { add_tag => ["warning"] }
    }
  }

  # Redis error detection
  if [type] == "redis" {
    grok {
      match => { "message" => "%{POSINT:pid}:%{WORD:role} %{MONTHDAY} %{MONTH} %{YEAR}? ?%{TIME} %{WORD:loglevel} %{GREEDYDATA:redis_message}" }
    }
    if [loglevel] in ["WARNING", "ERROR"] {
      mutate { add_tag => ["error"] }
    }
  }

  # NGINX error log detection (all entries are errors)
  if [type] == "nginx" {
    mutate { add_tag => ["error"] }
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} \[%{WORD:severity}\] %{GREEDYDATA:nginx_message}" }
    }
  }

  # NGINX access log - detect 5xx errors
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    if [response] =~ /^5\d{2}$/ {
      mutate { add_tag => ["error"] }
    }
  }

  # PM2 error log detection - tag lines with actual error indicators
  if [type] == "pm2" {
    if [message] =~ /Error:|error:|ECONNREFUSED|ENOENT|TypeError|ReferenceError|SyntaxError/ {
      mutate { add_tag => ["error"] }
    }
  }
}

output {
  # Production app errors -> flyer-crawler-backend (project 1)
  if "error" in [tags] and "app" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/1/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_PROD_BACKEND_DSN_KEY"
      }
    }
  }

  # Test app errors -> flyer-crawler-backend-test (project 3)
  if "error" in [tags] and "app" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/3/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=YOUR_TEST_BACKEND_DSN_KEY"
      }
    }
  }

  # Production infrastructure errors (Redis, NGINX, PM2) -> flyer-crawler-infrastructure (project 5)
  if "error" in [tags] and "infra" in [tags] and "production" in [tags] {
    http {
      url => "http://localhost:8000/api/5/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=b083076f94fb461b889d5dffcbef43bf"
      }
    }
  }

  # Test infrastructure errors (PM2 test logs) -> flyer-crawler-test-infrastructure (project 6)
  if "error" in [tags] and "infra" in [tags] and "test" in [tags] {
    http {
      url => "http://localhost:8000/api/6/store/"
      http_method => "post"
      format => "json"
      headers => {
        "X-Sentry-Auth" => "Sentry sentry_version=7, sentry_client=logstash/1.0, sentry_key=25020dd6c2b74ad78463ec90e90fadab"
      }
    }
  }

  # Debug output (uncomment to troubleshoot)
  # stdout { codec => rubydebug }
}
```

**Bugsink Project DSNs:**

| Project                             | DSN Key                            | Project ID |
| ----------------------------------- | ---------------------------------- | ---------- |
| `flyer-crawler-backend`             | `911aef02b9a548fa8fabb8a3c81abfe5` | 1          |
| `flyer-crawler-frontend`            | (used by app, not Logstash)        | 2          |
| `flyer-crawler-backend-test`        | `cdb99c314589431e83d4cc38a809449b` | 3          |
| `flyer-crawler-frontend-test`       | (used by app, not Logstash)        | 4          |
| `flyer-crawler-infrastructure`      | `b083076f94fb461b889d5dffcbef43bf` | 5          |
| `flyer-crawler-test-infrastructure` | `25020dd6c2b74ad78463ec90e90fadab` | 6          |

**Note:** The DSN key is the part before `@` in the full DSN URL (e.g., `https://KEY@bugsink.projectium.com/PROJECT_ID`).

**Note on PM2 Logs:** PM2 error logs capture stack traces from stderr, which are valuable for debugging startup errors and uncaught exceptions. Production PM2 logs go to project 5 (infrastructure); test PM2 logs go to project 6 (test-infrastructure).
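Once Logstash is running (Steps 5-8 below), you can exercise the pipeline end to end by appending a synthetic Pino-style error to the production log. A sketch; the JSON shape mirrors the example shown in Step 2:

```bash
# A level-50 (error) entry; Logstash should forward it to the
# flyer-crawler-backend project in Bugsink within a few seconds
echo '{"level":50,"time":'"$(date +%s%3N)"',"msg":"Logstash pipeline test","module":"setup-test"}' \
  | sudo tee -a /var/www/flyer-crawler.projectium.com/logs/app.log
```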
### Step 5: Create Logstash State Directory and Fix Config Path

Logstash needs a directory to track which log lines it has already processed, and a symlink so it can find its config files:

```bash
# Create state directory for sincedb files
sudo mkdir -p /var/lib/logstash
sudo chown logstash:logstash /var/lib/logstash

# Create symlink so Logstash finds its config (avoids "Could not find logstash.yml" warning)
sudo ln -sf /etc/logstash /usr/share/logstash/config
```

### Step 6: Grant Logstash Access to Application Logs

Logstash runs as the `logstash` user and needs permission to read the log files:

```bash
# Add logstash user to the adm group (for nginx and redis logs)
sudo usermod -aG adm logstash

# Make application log files readable (created automatically when the app starts)
sudo chmod 644 /var/www/flyer-crawler.projectium.com/logs/app.log 2>/dev/null || echo "Production log file not yet created"
sudo chmod 644 /var/www/flyer-crawler-test.projectium.com/logs/app.log 2>/dev/null || echo "Test log file not yet created"

# Make Redis logs and directory readable
sudo chmod 755 /var/log/redis/
sudo chmod 644 /var/log/redis/redis-server.log

# Make NGINX logs readable
sudo chmod 644 /var/log/nginx/access.log /var/log/nginx/error.log

# Make PM2 logs and directories accessible
sudo chmod 755 /home/gitea-runner/
sudo chmod 755 /home/gitea-runner/.pm2/
sudo chmod 755 /home/gitea-runner/.pm2/logs/
sudo chmod 644 /home/gitea-runner/.pm2/logs/*.log

# Verify logstash group membership
groups logstash
```

**Note:** The application log files are created automatically when the application starts. Run the chmod commands after the first deployment.

### Step 7: Test Logstash Configuration

Test the configuration before starting:

```bash
sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf
```

You should see `Configuration OK` if there are no errors.

### Step 8: Start Logstash

```bash
sudo systemctl enable logstash
sudo systemctl start logstash
sudo systemctl status logstash
```

View Logstash's own logs to verify it is working:

```bash
sudo journalctl -u logstash -f
```

### Troubleshooting Logstash

| Issue                      | Solution                                                                                                          |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| "Permission denied" errors | Check file permissions on the log files and sincedb directory                                                       |
| No events being processed  | Verify the log file paths exist and contain data                                                                    |
| HTTP output errors         | Check that Bugsink is running and the DSN key is correct                                                            |
| Logstash not starting      | Run the config test: `sudo /usr/share/logstash/bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/bugsink.conf` |

### Alternative: Skip Logstash

Since the flyer-crawler application already sends errors directly to Bugsink via the Sentry SDK (configured in Steps 11-12), you may not need Logstash at all. Logstash is primarily useful for:

- Aggregating logs from services that don't have native Sentry support (Redis, NGINX)
- Centralizing all logs in one place
- Complex log transformations

If you only need application error tracking, the Sentry SDK integration is sufficient.

---

## SSL/TLS with Let's Encrypt

### Install Certbot

```bash
sudo apt install -y certbot python3-certbot-nginx
```

### Obtain Certificate

```bash
sudo certbot --nginx -d flyer-crawler.projectium.com
```

Certbot will automatically configure NGINX for HTTPS.

### Auto-Renewal

Certbot installs a systemd timer for automatic renewal.
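You can also exercise the renewal path end to end without touching the live certificate:

```bash
sudo certbot renew --dry-run
```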
Verify the timer is active:

```bash
sudo systemctl status certbot.timer
```

---

## Firewall Configuration

### Configure UFW

```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing

# Allow SSH
sudo ufw allow ssh

# Allow HTTP and HTTPS
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

# Enable firewall
sudo ufw enable
```

**Important**: Bugsink (port 8000) should NOT be exposed externally. It listens on localhost only.

---

## Maintenance Commands

### Application Management

| Task                  | Command                                                                                                   |
| --------------------- | ----------------------------------------------------------------------------------------------------------- |
| View PM2 status       | `pm2 status`                                                                                                  |
| View application logs | `pm2 logs`                                                                                                    |
| Restart all processes | `pm2 restart all`                                                                                             |
| Restart specific app  | `pm2 restart flyer-crawler-api`                                                                               |
| Update application    | `cd /var/www/flyer-crawler.projectium.com && git pull && npm install && npm run build && pm2 restart all`     |

### Service Management

| Service    | Start                                             | Stop                                             | Status                                             |
| ---------- | -------------------------------------------------- | ------------------------------------------------- | --------------------------------------------------- |
| PostgreSQL | `sudo systemctl start postgresql`                   | `sudo systemctl stop postgresql`                   | `sudo systemctl status postgresql`                   |
| Redis      | `sudo systemctl start redis-server`                 | `sudo systemctl stop redis-server`                 | `sudo systemctl status redis-server`                 |
| NGINX      | `sudo systemctl start nginx`                        | `sudo systemctl stop nginx`                        | `sudo systemctl status nginx`                        |
| Bugsink    | `sudo systemctl start gunicorn-bugsink snappea`     | `sudo systemctl stop gunicorn-bugsink snappea`     | `sudo systemctl status gunicorn-bugsink snappea`     |
| Logstash   | `sudo systemctl start logstash`                     | `sudo systemctl stop logstash`                     | `sudo systemctl status logstash`                     |

### Database Backup

```bash
# Backup application database
pg_dump -U flyer_crawler -h localhost flyer_crawler > backup_$(date +%Y%m%d).sql

# Backup Bugsink databases (SQLite by default; see the Bugsink section above)
sudo cp /home/bugsink/db.sqlite3 bugsink_backup_$(date +%Y%m%d).sqlite3
sudo cp /home/bugsink/snappea.sqlite3 snappea_backup_$(date +%Y%m%d).sqlite3
```

### Log Locations

| Log               | Location                                             |
| ----------------- | ----------------------------------------------------- |
| Application (PM2) | `~/.pm2/logs/`                                        |
| NGINX access      | `/var/log/nginx/access.log`                           |
| NGINX error       | `/var/log/nginx/error.log`                            |
| PostgreSQL        | `/var/log/postgresql/`                                |
| Redis             | `/var/log/redis/`                                     |
| Bugsink           | `journalctl -u gunicorn-bugsink` (and `-u snappea`)   |
| Logstash          | `/var/log/logstash/`                                  |

---

## Related Documentation

- [DEPLOYMENT.md](../DEPLOYMENT.md) - Container-based deployment
- [DATABASE.md](../DATABASE.md) - Database schema and extensions
- [AUTHENTICATION.md](../AUTHENTICATION.md) - OAuth provider setup
- [ADR-015](adr/0015-application-performance-monitoring-and-error-tracking.md) - Error tracking architecture