Files
flyer-crawler.projectium.com/docs/subagents/DATABASE-GUIDE.md
Torben Sorensen 45ac4fccf5
Some checks failed
Deploy to Test Environment / deploy-to-test (push) Failing after 2m15s
comprehensive documentation review + test fixes
2026-01-28 16:35:38 -08:00

432 lines
13 KiB
Markdown

# Database Subagent Guide
This guide covers two database-focused subagents:
- **db-dev**: Database development - schemas, queries, migrations, optimization
- **db-admin**: Database administration - PostgreSQL/Redis admin, security, backups
## Quick Reference
| Aspect | db-dev | db-admin |
| ---------------- | -------------------------------------------- | ------------------------------------------ |
| **Primary Use** | Schemas, queries, migrations | Performance tuning, backups, security |
| **Key Files** | `src/services/db/*.db.ts`, `sql/migrations/` | `postgresql.conf`, `pg_hba.conf` |
| **Key ADRs** | ADR-034 (Repository), ADR-002 (Transactions) | ADR-019 (Backups), ADR-050 (Observability) |
| **Test Command** | `podman exec -it flyer-crawler-dev npm test` | N/A |
| **MCP Tool** | `mcp__devdb__query` | SSH to production |
| **Delegate To** | `coder` (service layer), `db-admin` (perf) | `devops` (infrastructure) |
## Understanding the Difference
| Aspect | db-dev | db-admin |
| --------------- | ----------------------------------- | ------------------------------------- |
| **Focus** | Application database code | Infrastructure and operations |
| **Tasks** | Queries, migrations, repositories | Performance tuning, backups, security |
| **Output** | SQL migrations, repository methods | Configuration, monitoring scripts |
| **When to Use** | Adding features, optimizing queries | Production issues, capacity planning |
## The db-dev Subagent
### When to Use
Use the **db-dev** subagent when you need to:
- Design new database tables or modify existing ones
- Write SQL queries or optimize existing ones
- Create database migrations
- Implement repository pattern methods
- Fix N+1 query problems
- Add indexes for performance
- Work with PostGIS spatial queries
### What db-dev Knows
The db-dev subagent has deep knowledge of:
- Project database schema (`sql/master_schema_rollup.sql`)
- Repository pattern standards (ADR-034)
- Transaction management (ADR-002)
- PostgreSQL-specific features (PostGIS, pg_trgm, etc.)
- Schema synchronization requirements
### Schema Synchronization (Critical)
> **Schema files MUST stay synchronized:**
>
> | File | Purpose |
> | ------------------------------ | --------------------------------- |
> | `sql/master_schema_rollup.sql` | Test DB setup, complete reference |
> | `sql/initial_schema.sql` | Fresh install schema |
> | `sql/migrations/*.sql` | Production incremental changes |
When db-dev creates a migration, it will also update the schema files.
### Example Requests
**Adding a new table:**
```
"Use db-dev to design a table for storing user recipe reviews.
Include fields for rating (1-5), review text, and relationships
to users and recipes. Create the migration and update schema files."
```
**Optimizing a slow query:**
```
"Use db-dev to optimize the query that lists flyers with their
item counts. It's currently doing N+1 queries and takes too long
with many flyers."
```
**Adding spatial search:**
```
"Use db-dev to add the ability to search stores within a radius
of a given location using PostGIS. Include the migration for
adding the geography column."
```
### Repository Pattern Standards
The db-dev subagent follows these naming conventions:
| Prefix | Returns | Behavior on Not Found |
| --------- | ------------------- | ------------------------------------ |
| `get*` | Single entity | Throws `NotFoundError` |
| `find*` | Entity or `null` | Returns `null` |
| `list*` | Array | Returns `[]` |
| `create*` | Created entity | Throws on constraint violation |
| `update*` | Updated entity | Throws `NotFoundError` if not exists |
| `delete*` | `void` or `boolean` | Throws `NotFoundError` if not exists |
### Example Migration
```sql
-- sql/migrations/20260121_add_recipe_reviews.sql
-- Create recipe_reviews table
CREATE TABLE IF NOT EXISTS recipe_reviews (
review_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
recipe_id UUID NOT NULL REFERENCES recipes(recipe_id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(user_id) ON DELETE CASCADE,
rating INTEGER NOT NULL CHECK (rating >= 1 AND rating <= 5),
review_text TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP,
UNIQUE (recipe_id, user_id)
);
-- Add indexes
CREATE INDEX idx_recipe_reviews_recipe_id ON recipe_reviews(recipe_id);
CREATE INDEX idx_recipe_reviews_user_id ON recipe_reviews(user_id);
CREATE INDEX idx_recipe_reviews_rating ON recipe_reviews(rating);
-- Add trigger for updated_at
CREATE TRIGGER update_recipe_reviews_updated_at
BEFORE UPDATE ON recipe_reviews
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
```
### Example Repository Method
```typescript
// src/services/db/recipeReview.db.ts
import { handleDbError, NotFoundError } from './errors.db';
export async function getReviewById(reviewId: string, client?: PoolClient): Promise<RecipeReview> {
const queryable = client || getPool();
try {
const result = await queryable.query<RecipeReview>(
`SELECT * FROM recipe_reviews WHERE review_id = $1`,
[reviewId],
);
if (result.rows.length === 0) {
throw new NotFoundError(`Review with ID ${reviewId} not found.`);
}
return result.rows[0];
} catch (error) {
handleDbError(
error,
log,
'Database error in getReviewById',
{ reviewId },
{
entityName: 'RecipeReview',
defaultMessage: 'Failed to fetch review.',
},
);
}
}
export async function listReviewsByRecipeId(
recipeId: string,
options: { limit?: number; offset?: number } = {},
client?: PoolClient,
): Promise<RecipeReview[]> {
const queryable = client || getPool();
const { limit = 50, offset = 0 } = options;
try {
const result = await queryable.query<RecipeReview>(
`SELECT * FROM recipe_reviews
WHERE recipe_id = $1
ORDER BY created_at DESC
LIMIT $2 OFFSET $3`,
[recipeId, limit, offset],
);
return result.rows;
} catch (error) {
handleDbError(
error,
log,
'Database error in listReviewsByRecipeId',
{ recipeId, limit, offset },
{
entityName: 'RecipeReview',
defaultMessage: 'Failed to list reviews.',
},
);
}
}
```
## The db-admin Subagent
### When to Use
Use the **db-admin** subagent when you need to:
- Debug production database issues
- Configure PostgreSQL settings
- Set up database backups
- Analyze slow query logs
- Configure Redis for production
- Plan database capacity
- Manage database users and permissions
- Handle replication or failover
### What db-admin Knows
The db-admin subagent understands:
- PostgreSQL configuration and tuning
- Redis configuration for BullMQ queues
- Backup and recovery strategies (ADR-019)
- Connection pooling settings
- Production deployment setup
- Bugsink PostgreSQL observability (ADR-050)
### Example Requests
**Performance tuning:**
```
"Use db-admin to analyze why the database is running slow.
Check connection pool settings, identify slow queries, and
recommend PostgreSQL configuration changes."
```
**Backup configuration:**
```
"Use db-admin to set up daily automated backups for the
production database with 30-day retention."
```
**User management:**
```
"Use db-admin to create a read-only database user for
reporting purposes that can only SELECT from specific tables."
```
### Database Users
| User | Database | Purpose |
| -------------------- | -------------------- | ---------------------- |
| `flyer_crawler_prod` | `flyer-crawler-prod` | Production |
| `flyer_crawler_test` | `flyer-crawler-test` | Testing |
| `postgres` | All | Superuser (admin only) |
### Creating Database Users
```sql
-- As postgres superuser
CREATE DATABASE "flyer-crawler-test";
CREATE USER flyer_crawler_test WITH PASSWORD 'secure_password';
ALTER DATABASE "flyer-crawler-test" OWNER TO flyer_crawler_test;
\c "flyer-crawler-test"
ALTER SCHEMA public OWNER TO flyer_crawler_test;
GRANT CREATE, USAGE ON SCHEMA public TO flyer_crawler_test;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
```
### PostgreSQL Configuration Guidance
For production, db-admin may recommend settings like:
```ini
# /etc/postgresql/14/main/conf.d/performance.conf
# Connection settings
max_connections = 100
shared_buffers = 256MB
# Query optimization
effective_cache_size = 768MB
random_page_cost = 1.1
# Write performance
wal_buffers = 16MB
checkpoint_completion_target = 0.9
# Logging
log_min_duration_statement = 1000 # Log queries over 1 second
```
### Redis Configuration Guidance
For BullMQ queues:
```ini
# /etc/redis/redis.conf
# Memory management
maxmemory 256mb
maxmemory-policy noeviction # BullMQ requires this
# Persistence
appendonly yes
appendfsync everysec
# Security
requirepass your_redis_password
```
## Common Database Tasks
### Running Migrations in Production
```bash
# SSH to production server
ssh root@projectium.com
# Run migration
cd /var/www/flyer-crawler.projectium.com
npm run db:migrate
```
### Checking Database Health
```bash
# Connection count
psql -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'flyer-crawler-prod';"
# Table sizes
psql -d "flyer-crawler-prod" -c "
SELECT
tablename,
pg_size_pretty(pg_total_relation_size(schemaname || '.' || tablename)) as size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname || '.' || tablename) DESC
LIMIT 10;"
# Slow queries
psql -d "flyer-crawler-prod" -c "
SELECT
calls,
mean_exec_time::numeric(10,2) as avg_ms,
query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 5;"
```
### Database Backup Commands
```bash
# Manual backup
pg_dump -U flyer_crawler_prod -h localhost "flyer-crawler-prod" > backup_$(date +%Y%m%d).sql
# Restore from backup
psql -U flyer_crawler_prod -h localhost "flyer-crawler-prod" < backup_20260121.sql
```
## N+1 Query Detection
The db-dev subagent is particularly skilled at identifying N+1 query problems:
**Problematic Pattern:**
```typescript
// BAD: N+1 queries
const flyers = await listFlyers();
for (const flyer of flyers) {
flyer.items = await listItemsByFlyerId(flyer.flyer_id); // N queries!
}
```
**Optimized Pattern:**
```typescript
// GOOD: Single query with JOIN or separate batch query
const flyersWithItems = await listFlyersWithItems(); // 1 query
// Or with batching:
const flyers = await listFlyers();
const flyerIds = flyers.map((f) => f.flyer_id);
const allItems = await listItemsByFlyerIds(flyerIds); // 1 query
// Group items by flyer_id in application code
```
## Working with PostGIS
The project uses PostGIS for spatial queries. Example:
```sql
-- Find stores within 10km of a location
SELECT
s.store_id,
s.name,
ST_Distance(
sl.location::geography,
ST_MakePoint(-79.3832, 43.6532)::geography
) / 1000 as distance_km
FROM stores s
JOIN store_locations sl ON s.store_id = sl.store_id
WHERE ST_DWithin(
sl.location::geography,
ST_MakePoint(-79.3832, 43.6532)::geography,
10000 -- 10km in meters
)
ORDER BY distance_km;
```
## MCP Database Access
For direct database queries during development, use the MCP server:
```
// Query the dev database
mcp__devdb__query("SELECT * FROM flyers LIMIT 5")
```
This is useful for:
- Verifying data during debugging
- Checking schema state
- Testing queries before implementing
## Related Documentation
- [OVERVIEW.md](./OVERVIEW.md) - Subagent system overview
- [CODER-GUIDE.md](./CODER-GUIDE.md) - Working with the coder subagent
- [DEVOPS-GUIDE.md](./DEVOPS-GUIDE.md) - DevOps and deployment workflows
- [../adr/0034-repository-pattern-standards.md](../adr/0034-repository-pattern-standards.md) - Repository patterns
- [../adr/0002-standardized-transaction-management.md](../adr/0002-standardized-transaction-management.md) - Transaction management
- [../adr/0019-data-backup-and-recovery-strategy.md](../adr/0019-data-backup-and-recovery-strategy.md) - Backup strategy
- [../adr/0050-postgresql-function-observability.md](../adr/0050-postgresql-function-observability.md) - Database observability
- [../operations/BARE-METAL-SETUP.md](../operations/BARE-METAL-SETUP.md) - Production database setup