Files
flyer-crawler.projectium.com/docs/adr/0033-file-upload-and-storage-strategy.md

197 lines
6.0 KiB
Markdown

# ADR-033: File Upload and Storage Strategy
**Date**: 2026-01-09
**Status**: Accepted
**Implemented**: 2026-01-09
## Context
The application handles file uploads for flyer images and user avatars. Without a consistent strategy, file uploads can introduce security vulnerabilities (path traversal, malicious file types), performance issues (unbounded file sizes), and maintenance challenges (inconsistent storage locations).
Key concerns:
1. **Security**: Preventing malicious file uploads, path traversal attacks, and unsafe filenames
2. **Storage Organization**: Consistent directory structure for uploaded files
3. **Size Limits**: Preventing resource exhaustion from oversized uploads
4. **File Type Validation**: Ensuring only expected file types are accepted
5. **Cleanup**: Managing temporary and orphaned files
## Decision
We will implement a centralized file upload strategy using `multer` middleware with custom storage configurations, file type validation, and size limits.
### Storage Types
| Type | Directory | Purpose | Size Limit |
| -------- | ------------------------------ | ------------------------------ | ---------- |
| `flyer` | `$STORAGE_PATH` (configurable) | Flyer images for AI processing | 100MB |
| `avatar` | `public/uploads/avatars/` | User profile pictures | 5MB |
### Filename Strategy
All uploaded files are renamed to prevent:
- Path traversal attacks
- Filename collisions
- Problematic characters in filenames
**Pattern**: `{fieldname}-{timestamp}-{random}-{sanitized-original}`
Example: `flyer-1704825600000-829461742-grocery-flyer.jpg`
### File Type Validation
Only image files (`image/*` MIME type) are accepted. Non-image uploads are rejected with a structured `ValidationError`.
## Implementation Details
### Multer Configuration Factory
```typescript
import { createUploadMiddleware } from '../middleware/multer.middleware';
// For flyer uploads (100MB limit)
const flyerUpload = createUploadMiddleware({
storageType: 'flyer',
fileSize: 100 * 1024 * 1024, // 100MB
fileFilter: 'image',
});
// For avatar uploads (5MB limit)
const avatarUpload = createUploadMiddleware({
storageType: 'avatar',
fileSize: 5 * 1024 * 1024, // 5MB
fileFilter: 'image',
});
```
### Storage Configuration
```typescript
// Configurable via environment variable
export const flyerStoragePath =
process.env.STORAGE_PATH || '/var/www/flyer-crawler.projectium.com/flyer-images';
// Relative to project root
export const avatarStoragePath = path.join(process.cwd(), 'public', 'uploads', 'avatars');
```
### Filename Sanitization
The `sanitizeFilename` utility removes dangerous characters:
```typescript
// Removes: path separators, null bytes, special characters
// Keeps: alphanumeric, dots, hyphens, underscores
const sanitized = sanitizeFilename(file.originalname);
```
### Required File Validation Middleware
Ensures a file was uploaded before processing:
```typescript
import { requireFileUpload } from '../middleware/fileUpload.middleware';
router.post(
'/upload',
flyerUpload.single('flyerImage'),
requireFileUpload('flyerImage'), // 400 error if missing
handleMulterError,
async (req, res) => {
// req.file is guaranteed to exist
},
);
```
### Error Handling
```typescript
import { handleMulterError } from '../middleware/multer.middleware';
// Catches multer-specific errors (file too large, etc.)
router.use(handleMulterError);
```
### Directory Initialization
Storage directories are created automatically at application startup:
```typescript
(async () => {
await fs.mkdir(flyerStoragePath, { recursive: true });
await fs.mkdir(avatarStoragePath, { recursive: true });
})();
```
### Test Environment Handling
In test environments, files use predictable names for easy cleanup:
```typescript
if (process.env.NODE_ENV === 'test') {
return cb(null, `test-avatar${path.extname(file.originalname) || '.png'}`);
}
```
## Usage Example
```typescript
import { createUploadMiddleware, handleMulterError } from '../middleware/multer.middleware';
import { requireFileUpload } from '../middleware/fileUpload.middleware';
import { validateRequest } from '../middleware/validation.middleware';
import { aiUploadLimiter } from '../config/rateLimiters';
const flyerUpload = createUploadMiddleware({
storageType: 'flyer',
fileSize: 100 * 1024 * 1024,
fileFilter: 'image',
});
router.post(
'/upload-and-process',
aiUploadLimiter,
validateRequest(uploadSchema),
flyerUpload.single('flyerImage'),
requireFileUpload('flyerImage'),
handleMulterError,
async (req, res, next) => {
const filePath = req.file!.path;
// Process the uploaded file...
},
);
```
## Key Files
- `src/middleware/multer.middleware.ts` - Multer configuration and storage handlers
- `src/middleware/fileUpload.middleware.ts` - File requirement validation
- `src/utils/stringUtils.ts` - Filename sanitization utilities
- `src/utils/fileUtils.ts` - File system utilities (deletion, etc.)
## Consequences
### Positive
- **Security**: Prevents path traversal and malicious uploads through sanitization and validation
- **Consistency**: All uploads follow the same patterns and storage organization
- **Predictability**: Test environments use predictable filenames for cleanup
- **Extensibility**: Factory pattern allows easy addition of new upload types
### Negative
- **Disk Storage**: Files stored on disk require backup and cleanup strategies
- **Single Server**: Current implementation doesn't support cloud storage (S3, etc.)
- **No Virus Scanning**: Files aren't scanned for malware before processing
## Future Enhancements
1. **Cloud Storage**: Support for S3/GCS as storage backend
2. **Virus Scanning**: Integrate ClamAV or cloud-based scanning
3. **Image Optimization**: Automatic resizing/compression before storage
4. **CDN Integration**: Serve uploaded files through CDN
5. **Cleanup Job**: Scheduled job to remove orphaned/temporary files
6. **Presigned URLs**: Direct upload to cloud storage to reduce server load