# ADR-046: Image Processing Pipeline

**Date**: 2026-01-09

**Status**: Accepted

**Implemented**: 2026-01-09

## Context

The application handles significant image processing for flyer uploads:

1. **Privacy Protection**: Strip EXIF metadata (location, device info).
2. **Optimization**: Resize, compress, and convert images for web delivery.
3. **Icon Generation**: Create thumbnails for listing views.
4. **Format Support**: Handle JPEG, PNG, WebP, and PDF inputs.
5. **Storage Management**: Organize processed images on disk.

These operations must be:

- **Performant**: Large images should not block the request.
- **Secure**: Prevent malicious file uploads.
- **Consistent**: Produce predictable output quality.
- **Testable**: Support unit testing without real files.

## Decision

We will implement a modular image processing pipeline using:

1. **Sharp**: For image resizing, compression, and format conversion.
2. **EXIF Parsing**: For metadata extraction and stripping.
3. **UUID Naming**: For unique, non-guessable file names.
4. **Directory Structure**: Organized storage for originals and derivatives.

### Design Principles

- **Pipeline Pattern**: Chain processing steps in a predictable order.
- **Fail-Fast Validation**: Reject invalid files before processing.
- **Idempotent Operations**: The same input produces the same image output (file names differ, since each run generates a new UUID).
- **Resource Cleanup**: Delete temp files on error.
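
The principles above can be sketched as a small step runner. This is a minimal illustration, not the project's implementation: `PipelineContext`, `Step`, and `runPipeline` are hypothetical names, and the cleanup callback stands in for whatever deletes temp files in practice.

```typescript
// Hypothetical types for illustration only; they do not appear in the codebase.
interface PipelineContext {
  inputPath: string;
  tempPaths: string[]; // temp files registered by steps so far
}

type Step = (ctx: PipelineContext) => Promise<void>;

export async function runPipeline(
  ctx: PipelineContext,
  steps: Step[],
  cleanup: (tempPath: string) => Promise<void>,
): Promise<PipelineContext> {
  try {
    // Pipeline pattern: steps run strictly in order.
    // Fail-fast: the first step that throws aborts all later steps.
    for (const step of steps) {
      await step(ctx);
    }
    return ctx;
  } catch (error) {
    // Resource cleanup: remove every temp file registered before the failure.
    for (const tempPath of ctx.tempPaths) {
      try {
        await cleanup(tempPath);
      } catch {
        // A failed cleanup must not mask the original error.
      }
    }
    throw error;
  }
}
```

Placing validation first in `steps` gives the fail-fast behaviour: an invalid file is rejected before any expensive step runs or any temp file is created.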

## Implementation Details

### Image Processor Module

Located in `src/utils/imageProcessor.ts`:

```typescript
import sharp from 'sharp';
import path from 'path';
import { v4 as uuidv4 } from 'uuid';
import fs from 'fs/promises';
import type { Logger } from 'pino';

// ============================================
// CONFIGURATION
// ============================================
const IMAGE_CONFIG = {
  maxWidth: 2048,
  maxHeight: 2048,
  quality: 85,
  iconSize: 200,
  allowedFormats: ['jpeg', 'png', 'webp', 'avif'],
  outputFormat: 'webp' as const,
};

// ============================================
// MAIN PROCESSING FUNCTION
// ============================================
export async function processAndSaveImage(
  inputPath: string,
  outputDir: string,
  originalFileName: string,
  logger: Logger,
): Promise<string> {
  const outputFileName = `${uuidv4()}.${IMAGE_CONFIG.outputFormat}`;
  const outputPath = path.join(outputDir, outputFileName);

  logger.info({ inputPath, outputPath, originalFileName }, 'Processing image');

  try {
    // Ensure the output directory exists before writing
    await fs.mkdir(outputDir, { recursive: true });

    // sharp drops input metadata (EXIF, GPS) from the output unless
    // withMetadata() is called, so no explicit strip step is needed
    await sharp(inputPath)
      .rotate() // Auto-rotate based on EXIF orientation
      .resize(IMAGE_CONFIG.maxWidth, IMAGE_CONFIG.maxHeight, {
        fit: 'inside', // Preserve aspect ratio within the bounding box
        withoutEnlargement: true, // Never upscale smaller images
      })
      .webp({ quality: IMAGE_CONFIG.quality })
      .toFile(outputPath);

    logger.info({ outputPath }, 'Image processed successfully');
    return outputFileName;
  } catch (error) {
    logger.error({ error, inputPath }, 'Image processing failed');
    throw error;
  }
}
```
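
To make the resize behaviour concrete, the arithmetic behind `fit: 'inside'` with `withoutEnlargement: true` can be approximated as below. `fitInside` is a hypothetical helper written for this explanation; sharp's own rounding may differ by a pixel in edge cases.

```typescript
interface Size {
  width: number;
  height: number;
}

// Approximates the output size of resize(maxWidth, maxHeight,
// { fit: 'inside', withoutEnlargement: true }). Illustrative only.
export function fitInside(src: Size, maxWidth: number, maxHeight: number): Size {
  // The smaller ratio keeps both sides inside the box;
  // capping at 1 means small images are never enlarged.
  const scale = Math.min(maxWidth / src.width, maxHeight / src.height, 1);
  return {
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}

// A 4000x3000 photo is limited by its width: scale = 2048 / 4000 = 0.512
const large = fitInside({ width: 4000, height: 3000 }, 2048, 2048);
// An 800x600 image already fits, so it keeps its dimensions
const small = fitInside({ width: 800, height: 600 }, 2048, 2048);
console.log(large, small);
```

With the configured 2048x2048 box, the first call yields 2048x1536 and the second leaves 800x600 unchanged.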

### Icon Generation

```typescript
export async function generateFlyerIcon(
  inputPath: string,
  iconsDir: string,
  logger: Logger,
): Promise<string> {
  // Ensure icons directory exists
  await fs.mkdir(iconsDir, { recursive: true });

  const iconFileName = `${uuidv4()}-icon.webp`;
  const iconPath = path.join(iconsDir, iconFileName);

  logger.info({ inputPath, iconPath }, 'Generating icon');

  await sharp(inputPath)
    .resize(IMAGE_CONFIG.iconSize, IMAGE_CONFIG.iconSize, {
      fit: 'cover',
      position: 'top', // Flyers usually have store name at top
    })
    .webp({ quality: 80 })
    .toFile(iconPath);

  logger.info({ iconPath }, 'Icon generated successfully');
  return iconFileName;
}
```
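
The effect of `fit: 'cover'` with `position: 'top'` can be reasoned about as a scale followed by a crop. The sketch below is an approximation for illustration; `planTopCoverCrop` is a hypothetical helper and sharp's internal rounding may differ slightly.

```typescript
interface CropPlan {
  resizedWidth: number;
  resizedHeight: number;
  left: number; // crop offset from the left edge of the resized image
  top: number; // crop offset from the top edge of the resized image
  size: number; // side length of the square icon
}

// Approximates how a square "cover" crop anchored at the top is derived.
export function planTopCoverCrop(srcWidth: number, srcHeight: number, size: number): CropPlan {
  // The larger ratio guarantees both sides are at least `size` after scaling.
  const scale = Math.max(size / srcWidth, size / srcHeight);
  const resizedWidth = Math.max(size, Math.round(srcWidth * scale));
  const resizedHeight = Math.max(size, Math.round(srcHeight * scale));
  return {
    resizedWidth,
    resizedHeight,
    left: Math.floor((resizedWidth - size) / 2), // centred horizontally
    top: 0, // anchored to the top edge, where the store name usually is
    size,
  };
}

// A portrait 1000x2000 flyer scales to 200x400; the icon keeps the top 200 rows.
console.log(planTopCoverCrop(1000, 2000, 200));
```

For a tall flyer, everything below the top square is discarded, which is why the anchor position matters for recognisable thumbnails.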

### EXIF Metadata Extraction

For audit/logging purposes before stripping:

```typescript
import ExifParser from 'exif-parser';

// Shape inferred from the fields read below
interface ExifMetadata {
  make?: string;
  model?: string;
  dateTime?: number;
  gpsLatitude?: number;
  gpsLongitude?: number;
  orientation?: number;
}

export async function extractExifMetadata(
  filePath: string,
  logger: Logger,
): Promise<ExifMetadata | null> {
  try {
    const buffer = await fs.readFile(filePath);
    const parser = ExifParser.create(buffer);
    const result = parser.parse();

    const metadata: ExifMetadata = {
      make: result.tags?.Make,
      model: result.tags?.Model,
      dateTime: result.tags?.DateTimeOriginal,
      gpsLatitude: result.tags?.GPSLatitude,
      gpsLongitude: result.tags?.GPSLongitude,
      orientation: result.tags?.Orientation,
    };

    // Log if GPS data was present (privacy concern).
    // Compare against undefined: a coordinate of 0 is valid but falsy.
    if (metadata.gpsLatitude !== undefined || metadata.gpsLongitude !== undefined) {
      logger.info({ filePath }, 'GPS data found in image, will be stripped during processing');
    }

    return metadata;
  } catch (error) {
    logger.debug({ error, filePath }, 'No EXIF data found or parsing failed');
    return null;
  }
}
```
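
Since the extracted metadata exists for audit purposes, one option is to log only whether sensitive fields were present rather than their values. The sketch below assumes that approach; `ExifFields`, `ExifLogSummary`, and `summarizeExifForLog` are hypothetical names, not part of the module above.

```typescript
// Hypothetical input shape mirroring the fields extracted above.
interface ExifFields {
  make?: string;
  model?: string;
  gpsLatitude?: number;
  gpsLongitude?: number;
  orientation?: number;
}

interface ExifLogSummary {
  hasGps: boolean;
  hasDeviceInfo: boolean;
  orientation: number | null;
}

// Reduces EXIF data to presence flags so that coordinates and
// device identifiers never reach the log output.
export function summarizeExifForLog(fields: ExifFields | null): ExifLogSummary {
  if (fields === null) {
    return { hasGps: false, hasDeviceInfo: false, orientation: null };
  }
  return {
    // Presence is checked with !== undefined because 0 is a valid coordinate.
    hasGps: fields.gpsLatitude !== undefined || fields.gpsLongitude !== undefined,
    hasDeviceInfo: fields.make !== undefined || fields.model !== undefined,
    orientation: fields.orientation ?? null,
  };
}
```

Because the function is pure, it can be unit tested with plain objects and no image files.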

### PDF to Image Conversion

```typescript
import * as pdfjs from 'pdfjs-dist';
import { createCanvas } from 'canvas'; // Assumed source of createCanvas (node-canvas)

export async function convertPdfToImages(
  pdfPath: string,
  outputDir: string,
  logger: Logger,
): Promise<string[]> {
  // pdf.js expects a plain Uint8Array rather than a Node Buffer
  const pdfData = new Uint8Array(await fs.readFile(pdfPath));
  const pdf = await pdfjs.getDocument({ data: pdfData }).promise;

  const outputPaths: string[] = [];

  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale: 2.0 }); // 2x for quality

    // Create canvas and render
    const canvas = createCanvas(viewport.width, viewport.height);
    const context = canvas.getContext('2d');

    await page.render({
      // node-canvas contexts are API-compatible but typed differently from the DOM type
      canvasContext: context as unknown as CanvasRenderingContext2D,
      viewport,
    }).promise;

    // Save as image
    const outputFileName = `${uuidv4()}-page-${i}.png`;
    const outputPath = path.join(outputDir, outputFileName);
    const buffer = canvas.toBuffer('image/png');
    await fs.writeFile(outputPath, buffer);

    outputPaths.push(outputPath);
    logger.info({ page: i, outputPath }, 'PDF page converted to image');
  }

  return outputPaths;
}
```
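
The `scale: 2.0` viewport has a direct cost in pixels and memory. PDF page sizes are expressed in points (72 per inch), so the scale maps to an effective resolution. The helper below is a hypothetical sketch of that arithmetic, assuming an uncompressed RGBA canvas at 4 bytes per pixel.

```typescript
const PDF_POINTS_PER_INCH = 72;

interface RenderEstimate {
  width: number; // canvas width in pixels
  height: number; // canvas height in pixels
  dpi: number; // effective resolution
  rgbaBytes: number; // approximate uncompressed canvas size
}

// Estimates the canvas dimensions and memory for one rendered page.
export function estimateRender(widthPt: number, heightPt: number, scale: number): RenderEstimate {
  const width = Math.round(widthPt * scale);
  const height = Math.round(heightPt * scale);
  return {
    width,
    height,
    dpi: PDF_POINTS_PER_INCH * scale,
    rgbaBytes: width * height * 4, // 4 bytes per pixel (R, G, B, A)
  };
}

// US Letter is 612 x 792 points.
console.log(estimateRender(612, 792, 2.0));
```

At scale 2.0 a US Letter page renders at 1224x1584 pixels (144 DPI) and needs about 7.8 MB of canvas memory per page, which is worth keeping in mind for multi-page flyers.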

### File Validation

```typescript
import { fileTypeFromBuffer } from 'file-type';

const HEADER_BYTES = 4100;

export async function validateImageFile(
  filePath: string,
  logger: Logger,
): Promise<{ valid: boolean; mimeType: string | null; error?: string }> {
  try {
    // fs.readFile has no `length` option, so read only the header via a file handle
    const handle = await fs.open(filePath, 'r');
    let header: Buffer;
    try {
      const { buffer, bytesRead } = await handle.read(Buffer.alloc(HEADER_BYTES), 0, HEADER_BYTES, 0);
      header = buffer.subarray(0, bytesRead);
    } finally {
      await handle.close();
    }

    // Detect the type from file content, not from the extension or client-supplied MIME type
    const type = await fileTypeFromBuffer(header);

    if (!type) {
      return { valid: false, mimeType: null, error: 'Unknown file type' };
    }

    const allowedMimes = ['image/jpeg', 'image/png', 'image/webp', 'image/avif', 'application/pdf'];

    if (!allowedMimes.includes(type.mime)) {
      return {
        valid: false,
        mimeType: type.mime,
        error: `File type ${type.mime} not allowed`,
      };
    }

    return { valid: true, mimeType: type.mime };
  } catch (error) {
    logger.error({ error, filePath }, 'File validation failed');
    return { valid: false, mimeType: null, error: 'Validation error' };
  }
}
```
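
Content-based detection works by comparing the first bytes of a file against known signatures. The sketch below shows the idea for four of the allowed types. It is a deliberately simplified stand-in for what a library such as `file-type` does, and `sniffMimeType` is a hypothetical name.

```typescript
// Simplified signature check; real detectors cover many more formats and edge cases.
export function sniffMimeType(header: Uint8Array): string | null {
  const matches = (signature: number[], offset = 0): boolean =>
    header.length >= offset + signature.length &&
    signature.every((byte, i) => header[offset + i] === byte);

  // JPEG: FF D8 FF
  if (matches([0xff, 0xd8, 0xff])) return 'image/jpeg';
  // PNG: 89 "PNG" CR LF 1A LF
  if (matches([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  // WebP: "RIFF", a 4-byte size, then "WEBP" at offset 8
  if (matches([0x52, 0x49, 0x46, 0x46]) && matches([0x57, 0x45, 0x42, 0x50], 8)) return 'image/webp';
  // PDF: "%PDF-"
  if (matches([0x25, 0x50, 0x44, 0x46, 0x2d])) return 'application/pdf';

  return null;
}
```

A script renamed to `flyer.jpg` fails this check because its leading bytes do not match any signature, regardless of its extension or declared MIME type.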

### Storage Organization

```
flyer-images/
├── originals/          # Uploaded files (if kept)
│   └── {uuid}.{ext}
├── processed/          # Optimized images (or root level)
│   └── {uuid}.webp
├── icons/              # Thumbnails
│   └── {uuid}-icon.webp
└── temp/               # Temporary processing files
    └── {uuid}.tmp
```
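
A small helper can keep path construction consistent with this layout. The sketch below is hypothetical: `buildStoragePaths` is not part of the codebase, it uses Node's built-in `crypto.randomUUID` instead of the `uuid` package, and it assumes one shared ID for all derivatives of an upload, which the functions in this document do not do (each generates its own UUID).

```typescript
import * as path from 'node:path';
import { randomUUID } from 'node:crypto';

interface StoragePaths {
  processed: string;
  icon: string;
  temp: string;
}

// Builds the derivative paths for one upload under the given root.
// path.posix keeps the separators stable regardless of the host OS.
export function buildStoragePaths(rootDir: string, id: string = randomUUID()): StoragePaths {
  return {
    processed: path.posix.join(rootDir, 'processed', `${id}.webp`),
    icon: path.posix.join(rootDir, 'icons', `${id}-icon.webp`),
    temp: path.posix.join(rootDir, 'temp', `${id}.tmp`),
  };
}

console.log(buildStoragePaths('flyer-images', 'example-id'));
```

Sharing one ID would make it possible to find an icon from its image name, at the cost of diverging from the current per-file UUIDs.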

### Cleanup Utilities

```typescript
export async function cleanupTempFiles(
  tempDir: string,
  maxAgeMs: number,
  logger: Logger,
): Promise<number> {
  const files = await fs.readdir(tempDir);
  const now = Date.now();
  let deletedCount = 0;

  for (const file of files) {
    const filePath = path.join(tempDir, file);
    const stats = await fs.stat(filePath);

    // Skip subdirectories: fs.unlink only removes files
    if (!stats.isFile()) continue;

    // Age is measured from the last modification time
    const age = now - stats.mtimeMs;

    if (age > maxAgeMs) {
      await fs.unlink(filePath);
      deletedCount++;
    }
  }

  logger.info({ deletedCount, tempDir }, 'Cleaned up temp files');
  return deletedCount;
}
```
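
The age check is the only real logic in this utility, so it can be separated from the file system for testing, in line with the testability goal. The following is a sketch under that assumption; `TempEntry` and `selectExpired` are hypothetical names.

```typescript
interface TempEntry {
  name: string;
  mtimeMs: number; // last modification time in milliseconds since the epoch
}

// Returns the names of entries older than maxAgeMs, using the same
// strict comparison (age > maxAgeMs) as the cleanup loop.
export function selectExpired(entries: TempEntry[], nowMs: number, maxAgeMs: number): string[] {
  return entries
    .filter((entry) => nowMs - entry.mtimeMs > maxAgeMs)
    .map((entry) => entry.name);
}

const expired = selectExpired(
  [
    { name: 'old.tmp', mtimeMs: 8_000 }, // age 2000 ms
    { name: 'edge.tmp', mtimeMs: 9_000 }, // age exactly 1000 ms
    { name: 'fresh.tmp', mtimeMs: 9_500 }, // age 500 ms
  ],
  10_000,
  1_000,
);
console.log(expired);
```

Only `old.tmp` is selected: a file whose age equals the threshold exactly is kept, because the comparison is strict.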

### Integration with Flyer Processing

```typescript
// In flyerProcessingService.ts
// (imports of the imageProcessor functions and ValidationError omitted)
export async function processUploadedFlyer(
  file: Express.Multer.File,
  logger: Logger,
): Promise<{ imageUrl: string; iconUrl: string }> {
  const flyerImageDir = 'flyer-images';
  const iconsDir = path.join(flyerImageDir, 'icons');

  try {
    // 1. Validate file
    const validation = await validateImageFile(file.path, logger);
    if (!validation.valid) {
      throw new ValidationError([
        { path: 'file', message: validation.error ?? 'Invalid file' },
      ]);
    }

    // 2. Extract and log EXIF before stripping
    await extractExifMetadata(file.path, logger);

    // 3. Process and optimize image
    const processedFileName = await processAndSaveImage(
      file.path,
      flyerImageDir,
      file.originalname,
      logger,
    );

    // 4. Generate icon
    const processedImagePath = path.join(flyerImageDir, processedFileName);
    const iconFileName = await generateFlyerIcon(processedImagePath, iconsDir, logger);

    // 5. Construct URLs
    const baseUrl = process.env.BACKEND_URL || 'http://localhost:3001';
    const imageUrl = `${baseUrl}/flyer-images/${processedFileName}`;
    const iconUrl = `${baseUrl}/flyer-images/icons/${iconFileName}`;

    return { imageUrl, iconUrl };
  } finally {
    // 6. Delete original upload (privacy), whether processing succeeded or failed
    await fs.unlink(file.path).catch((error) => {
      logger.warn({ error, filePath: file.path }, 'Failed to delete uploaded file');
    });
  }
}
```

## Consequences

### Positive

- **Privacy**: EXIF metadata (including GPS) is stripped automatically.
- **Performance**: WebP output typically reduces file sizes by 25-35% compared with JPEG at similar quality.
- **Consistency**: All images are processed to a standard format and maximum dimensions.
- **Security**: File type validation rejects uploads whose content does not match an allowed type.
- **Organization**: Clear directory structure for storage management.

### Negative

- **CPU Intensive**: Image processing can be slow for large files.
- **Storage**: Keeping originals doubles storage requirements.
- **Dependency**: Sharp requires native binaries.

### Mitigation

- Process images in background jobs (BullMQ queue).
- Configure whether to keep originals based on requirements.
- Use pre-built Sharp binaries via npm.

## Key Files

- `src/utils/imageProcessor.ts` - Core image processing functions
- `src/services/flyer/flyerProcessingService.ts` - Integration with flyer workflow
- `src/middleware/fileUpload.middleware.ts` - Multer configuration

## Related ADRs

- [ADR-033](./0033-file-upload-and-storage-strategy.md) - File Upload Strategy
- [ADR-006](./0006-background-job-processing-and-task-queues.md) - Background Jobs
- [ADR-041](./0041-ai-gemini-integration-architecture.md) - AI Integration (uses processed images)