ADR-046: Image Processing Pipeline

Date: 2026-01-09

Status: Accepted

Implemented: 2026-01-09

Context

The application handles significant image processing for flyer uploads:

  1. Privacy Protection: Strip EXIF metadata (location, device info).
  2. Optimization: Resize, compress, and convert images for web delivery.
  3. Icon Generation: Create thumbnails for listing views.
  4. Format Support: Handle JPEG, PNG, WebP, AVIF, and PDF inputs.
  5. Storage Management: Organize processed images on disk.

These operations must be:

  • Performant: Large images should not block the request.
  • Secure: Prevent malicious file uploads.
  • Consistent: Produce predictable output quality.
  • Testable: Support unit testing without real files.

Decision

We will implement a modular image processing pipeline using:

  1. Sharp: For image resizing, compression, and format conversion.
  2. EXIF Parsing: For metadata extraction before stripping (Sharp drops metadata when it writes the output).
  3. UUID Naming: For unique, non-guessable file names.
  4. Directory Structure: Organized storage for originals and derivatives.

Design Principles

  • Pipeline Pattern: Chain processing steps in a predictable order.
  • Fail-Fast Validation: Reject invalid files before processing.
  • Idempotent Operations: Same input produces the same image output (file names differ per run because they are UUID-based).
  • Resource Cleanup: Delete temp files on error.
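
The pipeline and fail-fast principles above can be sketched as a list of named steps run in order, stopping at the first failure. This is a minimal illustration rather than the project's code: `PipelineStep`, `runPipeline`, and the generic context type are hypothetical names introduced here.

```typescript
// Hypothetical sketch: the step and context types are illustrative only.
interface PipelineStep<C> {
  name: string;
  run: (ctx: C) => Promise<C>;
}

// Runs steps strictly in order and threads the context through each one.
async function runPipeline<C>(steps: PipelineStep<C>[], initial: C): Promise<C> {
  let ctx = initial;
  for (const step of steps) {
    try {
      ctx = await step.run(ctx);
    } catch (cause) {
      // Fail fast: once a step rejects, later steps never run.
      const reason = cause instanceof Error ? cause.message : String(cause);
      throw new Error(`Pipeline step "${step.name}" failed: ${reason}`);
    }
  }
  return ctx;
}
```

Placing validation first in the step list means a rejected upload never reaches the CPU-heavy resize step.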

Implementation Details

Image Processor Module

Located in src/utils/imageProcessor.ts:

import sharp from 'sharp';
import path from 'path';
import { v4 as uuidv4 } from 'uuid';
import fs from 'fs/promises';
import type { Logger } from 'pino';

// ============================================
// CONFIGURATION
// ============================================
const IMAGE_CONFIG = {
  maxWidth: 2048,
  maxHeight: 2048,
  quality: 85,
  iconSize: 200,
  allowedFormats: ['jpeg', 'png', 'webp', 'avif'],
  outputFormat: 'webp' as const,
};

// ============================================
// MAIN PROCESSING FUNCTION
// ============================================
export async function processAndSaveImage(
  inputPath: string,
  outputDir: string,
  originalFileName: string,
  logger: Logger,
): Promise<string> {
  const outputFileName = `${uuidv4()}.${IMAGE_CONFIG.outputFormat}`;
  const outputPath = path.join(outputDir, outputFileName);

  logger.info({ inputPath, outputPath }, 'Processing image');

  try {
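    // Ensure the output directory exists before writing
    await fs.mkdir(outputDir, { recursive: true });

    // Sharp drops EXIF and other metadata from the output by default
    // (it is only kept when withMetadata() is called)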
    await sharp(inputPath)
      .rotate() // Auto-rotate based on EXIF orientation
      .resize(IMAGE_CONFIG.maxWidth, IMAGE_CONFIG.maxHeight, {
        fit: 'inside',
        withoutEnlargement: true,
      })
      .webp({ quality: IMAGE_CONFIG.quality })
      .toFile(outputPath);

    logger.info({ outputPath }, 'Image processed successfully');
    return outputFileName;
  } catch (error) {
    logger.error({ error, inputPath }, 'Image processing failed');
    throw error;
  }
}
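
To make the effect of `fit: 'inside'` with `withoutEnlargement: true` concrete, the output dimensions can be approximated with a small pure function. This is a sketch under the assumption that the image is scaled uniformly and never upscaled; `fitInside` and `Size` are hypothetical names, and Sharp's own rounding may differ by a pixel.

```typescript
interface Size {
  width: number;
  height: number;
}

// Approximates resize(maxWidth, maxHeight, { fit: 'inside', withoutEnlargement: true }).
function fitInside(src: Size, maxWidth: number, maxHeight: number): Size {
  // Use the tighter of the two constraints; cap at 1 so small images are not enlarged.
  const scale = Math.min(maxWidth / src.width, maxHeight / src.height, 1);
  return {
    width: Math.round(src.width * scale),
    height: Math.round(src.height * scale),
  };
}
```

With the 2048 x 2048 limits from `IMAGE_CONFIG`, a 4096 x 2048 scan becomes 2048 x 1024, a 3000 x 4000 photo becomes 1536 x 2048, and an 800 x 600 image is left at its original size.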

Icon Generation

export async function generateFlyerIcon(
  inputPath: string,
  iconsDir: string,
  logger: Logger,
): Promise<string> {
  // Ensure icons directory exists
  await fs.mkdir(iconsDir, { recursive: true });

  const iconFileName = `${uuidv4()}-icon.webp`;
  const iconPath = path.join(iconsDir, iconFileName);

  logger.info({ inputPath, iconPath }, 'Generating icon');

  await sharp(inputPath)
    .resize(IMAGE_CONFIG.iconSize, IMAGE_CONFIG.iconSize, {
      fit: 'cover',
      position: 'top', // Flyers usually have store name at top
    })
    .webp({ quality: 80 })
    .toFile(iconPath);

  logger.info({ iconPath }, 'Icon generated successfully');
  return iconFileName;
}
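
The choice of `fit: 'cover'` with `position: 'top'` determines which part of the flyer survives in the square icon. A rough sketch of the geometry, assuming the image is scaled until it covers the square and then cropped from the top edge, centered horizontally (`planCoverTop` and `CropPlan` are hypothetical names):

```typescript
interface CropPlan {
  scaledWidth: number;
  scaledHeight: number;
  left: number; // horizontal offset of the crop window
  top: number; // vertical offset of the crop window
}

// Approximates resize(size, size, { fit: 'cover', position: 'top' }).
function planCoverTop(srcWidth: number, srcHeight: number, size: number): CropPlan {
  // Scale by the larger ratio so both dimensions are at least `size`.
  const scale = Math.max(size / srcWidth, size / srcHeight);
  const scaledWidth = Math.round(srcWidth * scale);
  const scaledHeight = Math.round(srcHeight * scale);
  return {
    scaledWidth,
    scaledHeight,
    left: Math.floor((scaledWidth - size) / 2),
    top: 0, // 'top' anchors the crop window to the upper edge
  };
}
```

For a portrait 1000 x 2000 flyer and the 200 px icon size, the image is scaled to 200 x 400 and only the upper half is kept, which is where the store name usually sits.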

EXIF Metadata Extraction

For audit/logging purposes before stripping:

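import ExifParser from 'exif-parser';

// Assumed shape: the subset of EXIF tags kept for audit logging
interface ExifMetadata {
  make?: string;
  model?: string;
  dateTime?: number;
  gpsLatitude?: number;
  gpsLongitude?: number;
  orientation?: number;
}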

export async function extractExifMetadata(
  filePath: string,
  logger: Logger,
): Promise<ExifMetadata | null> {
  try {
    const buffer = await fs.readFile(filePath);
    const parser = ExifParser.create(buffer);
    const result = parser.parse();

    const metadata: ExifMetadata = {
      make: result.tags?.Make,
      model: result.tags?.Model,
      dateTime: result.tags?.DateTimeOriginal,
      gpsLatitude: result.tags?.GPSLatitude,
      gpsLongitude: result.tags?.GPSLongitude,
      orientation: result.tags?.Orientation,
    };

    // Log if GPS data was present (privacy concern).
    // Compare against null/undefined: 0 is a valid coordinate and would be falsy.
    if (metadata.gpsLatitude != null || metadata.gpsLongitude != null) {
      logger.info({ filePath }, 'GPS data found in image, will be stripped during processing');
    }

    return metadata;
  } catch (error) {
    logger.debug({ error, filePath }, 'No EXIF data found or parsing failed');
    return null;
  }
}
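
Because the extracted metadata exists only for audit and logging, one option is to reduce it to presence flags before it reaches a log line, so coordinates and device identifiers are never written out. The following is a sketch of that idea, not the project's behavior; `AuditableExif`, `ExifAuditSummary`, and `summarizeExifForAudit` are hypothetical names.

```typescript
// Hypothetical input type: only the fields this sketch inspects.
interface AuditableExif {
  make?: string;
  model?: string;
  gpsLatitude?: number;
  gpsLongitude?: number;
  orientation?: number;
}

interface ExifAuditSummary {
  hasGps: boolean;
  hasDeviceInfo: boolean;
  orientation: number | null;
}

// Reduces raw EXIF values to flags that are safe to log.
function summarizeExifForAudit(metadata: AuditableExif | null): ExifAuditSummary {
  if (metadata === null) {
    return { hasGps: false, hasDeviceInfo: false, orientation: null };
  }
  return {
    // Check for undefined rather than truthiness: 0 is a valid latitude or longitude.
    hasGps: metadata.gpsLatitude !== undefined || metadata.gpsLongitude !== undefined,
    hasDeviceInfo: metadata.make !== undefined || metadata.model !== undefined,
    orientation: metadata.orientation ?? null,
  };
}
```

A caller could pass the result of `extractExifMetadata` straight into this function and log only the summary.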

PDF to Image Conversion

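import * as pdfjs from 'pdfjs-dist';
import { createCanvas } from 'canvas'; // assumed: node-canvas supplies the Canvas implementation in Node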

export async function convertPdfToImages(
  pdfPath: string,
  outputDir: string,
  logger: Logger,
): Promise<string[]> {
  // pdf.js expects a Uint8Array; recent versions reject a Node Buffer
  const pdfData = new Uint8Array(await fs.readFile(pdfPath));
  const pdf = await pdfjs.getDocument({ data: pdfData }).promise;

  const outputPaths: string[] = [];

  for (let i = 1; i <= pdf.numPages; i++) {
    const page = await pdf.getPage(i);
    const viewport = page.getViewport({ scale: 2.0 }); // 2x for quality

    // Create canvas and render
    const canvas = createCanvas(viewport.width, viewport.height);
    const context = canvas.getContext('2d');

    await page.render({
      canvasContext: context,
      viewport: viewport,
    }).promise;

    // Save as image
    const outputFileName = `${uuidv4()}-page-${i}.png`;
    const outputPath = path.join(outputDir, outputFileName);
    const buffer = canvas.toBuffer('image/png');
    await fs.writeFile(outputPath, buffer);

    outputPaths.push(outputPath);
    logger.info({ page: i, outputPath }, 'PDF page converted to image');
  }

  return outputPaths;
}
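
As a rough guide to what `scale: 2.0` means: PDF page dimensions are expressed in points (1/72 inch), so the viewport scale maps directly to output resolution and to the memory each rendered page needs. The helper below is an illustrative calculation only; `renderedPageSize` is a hypothetical name, and the byte count assumes an uncompressed 4-bytes-per-pixel canvas.

```typescript
const PDF_POINTS_PER_INCH = 72;

interface RenderedPage {
  width: number; // pixels
  height: number; // pixels
  dpi: number;
  rgbaBytes: number; // approximate raw canvas memory
}

// Estimates the pixel size and raw memory of one page rendered at `scale`.
function renderedPageSize(widthPt: number, heightPt: number, scale: number): RenderedPage {
  const width = Math.floor(widthPt * scale);
  const height = Math.floor(heightPt * scale);
  return {
    width,
    height,
    dpi: PDF_POINTS_PER_INCH * scale,
    rgbaBytes: width * height * 4,
  };
}
```

A US Letter page (612 x 792 pt) at scale 2.0 renders to 1224 x 1584 pixels at 144 DPI and needs about 7.4 MiB of raw pixel data, which matters for multi-page flyers processed one page after another.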

File Validation

import { fileTypeFromBuffer } from 'file-type';

export async function validateImageFile(
  filePath: string,
  logger: Logger,
): Promise<{ valid: boolean; mimeType: string | null; error?: string }> {
  try {
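    // fs.readFile has no `length` option (it always reads the whole file),
    // so read only the header through a file handle
    const handle = await fs.open(filePath, 'r');
    let header: Buffer;
    try {
      const { buffer, bytesRead } = await handle.read(Buffer.alloc(4100), 0, 4100, 0);
      header = buffer.subarray(0, bytesRead);
    } finally {
      await handle.close();
    }
    const type = await fileTypeFromBuffer(header);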

    if (!type) {
      return { valid: false, mimeType: null, error: 'Unknown file type' };
    }

    const allowedMimes = ['image/jpeg', 'image/png', 'image/webp', 'image/avif', 'application/pdf'];

    if (!allowedMimes.includes(type.mime)) {
      return {
        valid: false,
        mimeType: type.mime,
        error: `File type ${type.mime} not allowed`,
      };
    }

    return { valid: true, mimeType: type.mime };
  } catch (error) {
    logger.error({ error, filePath }, 'File validation failed');
    return { valid: false, mimeType: null, error: 'Validation error' };
  }
}
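
Content-based validation works by comparing the first bytes of the file against known signatures rather than trusting the extension or the client-supplied MIME type. The following simplified sketch shows the idea for four of the allowed types; it is an illustration only and not a replacement for `file-type`, which covers many more formats (including AVIF, omitted here). `sniffMime` and `startsWith` are hypothetical helpers.

```typescript
type SniffedMime = 'image/jpeg' | 'image/png' | 'image/webp' | 'application/pdf';

// True when `bytes` contains `signature` starting at `offset`.
function startsWith(bytes: Uint8Array, signature: number[], offset = 0): boolean {
  if (bytes.length < offset + signature.length) return false;
  return signature.every((b, i) => bytes[offset + i] === b);
}

// Identifies a file from its leading bytes, or returns null when unrecognized.
function sniffMime(header: Uint8Array): SniffedMime | null {
  if (startsWith(header, [0xff, 0xd8, 0xff])) return 'image/jpeg';
  if (startsWith(header, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image/png';
  // WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
  if (startsWith(header, [0x52, 0x49, 0x46, 0x46]) && startsWith(header, [0x57, 0x45, 0x42, 0x50], 8)) {
    return 'image/webp';
  }
  // "%PDF-"
  if (startsWith(header, [0x25, 0x50, 0x44, 0x46, 0x2d])) return 'application/pdf';
  return null;
}
```

A script renamed to `flyer.jpg` fails this check because its first bytes are not a JPEG signature, regardless of its name.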

Storage Organization

flyer-images/
├── originals/           # Uploaded files (if kept)
│   └── {uuid}.{ext}
├── processed/           # Optimized images (or root level)
│   └── {uuid}.webp
├── icons/               # Thumbnails
│   └── {uuid}-icon.webp
└── temp/                # Temporary processing files
    └── {uuid}.tmp
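
The layout above can be captured in one helper so that every caller builds paths the same way and a non-UUID identifier can never reach the filesystem. This is a sketch under assumptions: `storagePath` and `StorageKind` are hypothetical names, the `processed/` subdirectory variant of the layout is used, and forward slashes are used for brevity where real code would call `path.join`.

```typescript
type StorageKind = 'processed' | 'icon' | 'temp';

// Accepts canonical UUID strings only, which rules out "../" and other path tricks.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[1-8][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

// Maps a UUID to its relative location in the storage tree.
function storagePath(root: string, kind: StorageKind, id: string): string {
  if (!UUID_RE.test(id)) {
    throw new Error(`Not a UUID: ${id}`);
  }
  switch (kind) {
    case 'processed':
      return `${root}/processed/${id}.webp`;
    case 'icon':
      return `${root}/icons/${id}-icon.webp`;
    case 'temp':
      return `${root}/temp/${id}.tmp`;
  }
}
```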

Cleanup Utilities

export async function cleanupTempFiles(
  tempDir: string,
  maxAgeMs: number,
  logger: Logger,
): Promise<number> {
  const files = await fs.readdir(tempDir);
  const now = Date.now();
  let deletedCount = 0;

  for (const file of files) {
    const filePath = path.join(tempDir, file);
    const stats = await fs.stat(filePath);
    if (!stats.isFile()) continue; // unlink cannot remove directories
    const age = now - stats.mtimeMs;

    if (age > maxAgeMs) {
      await fs.unlink(filePath);
      deletedCount++;
    }
  }

  logger.info({ deletedCount, tempDir }, 'Cleaned up temp files');
  return deletedCount;
}
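
In line with the testability goal, the age check can be separated from the filesystem calls so it can be unit tested with plain objects instead of real files. A minimal sketch, where `TempEntry` and `selectExpired` are hypothetical names:

```typescript
interface TempEntry {
  name: string;
  mtimeMs: number; // last modification time, as reported by fs.stat
  isFile: boolean;
}

// Returns the names of regular files older than maxAgeMs at time nowMs.
function selectExpired(entries: TempEntry[], nowMs: number, maxAgeMs: number): string[] {
  return entries
    .filter((entry) => entry.isFile && nowMs - entry.mtimeMs > maxAgeMs)
    .map((entry) => entry.name);
}
```

The cleanup function would then only need to gather the entries, call this selector, and unlink the returned names.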

Integration with Flyer Processing

// In flyerProcessingService.ts
export async function processUploadedFlyer(
  file: Express.Multer.File,
  logger: Logger,
): Promise<{ imageUrl: string; iconUrl: string }> {
  const flyerImageDir = 'flyer-images';
  const iconsDir = path.join(flyerImageDir, 'icons');

  // 1. Validate file
  const validation = await validateImageFile(file.path, logger);
  if (!validation.valid) {
    throw new ValidationError([{ path: 'file', message: validation.error! }]);
  }

  // 2. Extract and log EXIF before stripping
  await extractExifMetadata(file.path, logger);

  // 3. Process and optimize image
  const processedFileName = await processAndSaveImage(
    file.path,
    flyerImageDir,
    file.originalname,
    logger,
  );

  // 4. Generate icon
  const processedImagePath = path.join(flyerImageDir, processedFileName);
  const iconFileName = await generateFlyerIcon(processedImagePath, iconsDir, logger);

  // 5. Construct URLs
  const baseUrl = process.env.BACKEND_URL || 'http://localhost:3001';
  const imageUrl = `${baseUrl}/flyer-images/${processedFileName}`;
  const iconUrl = `${baseUrl}/flyer-images/icons/${iconFileName}`;

  // 6. Delete original upload (privacy)
  await fs.unlink(file.path);

  return { imageUrl, iconUrl };
}
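
As written, the upload is deleted only when every step succeeds, while the Resource Cleanup principle calls for deleting temp files on error as well. One way to cover both paths is a small wrapper built on `try`/`finally`. This is a sketch, not the project's implementation; `withCleanup` is a hypothetical helper.

```typescript
// Runs `work`, then always runs `cleanup`, whether `work` resolved or rejected.
async function withCleanup<T>(
  work: () => Promise<T>,
  cleanup: () => Promise<void>,
): Promise<T> {
  try {
    return await work();
  } finally {
    // Cleanup errors are swallowed so they cannot mask the result or the original error.
    await cleanup().catch(() => undefined);
  }
}
```

In `processUploadedFlyer`, steps 1 through 5 could be passed as `work` and `() => fs.unlink(file.path)` as `cleanup`, so a failed validation or Sharp error still removes the uploaded file.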

Consequences

Positive

  • Privacy: EXIF metadata (including GPS) is stripped automatically.
  • Performance: WebP output is typically 25-35% smaller than a JPEG of comparable quality.
  • Consistency: All images processed to standard format and dimensions.
  • Security: Content-based file type validation rejects uploads whose bytes do not match an allowed type.
  • Organization: Clear directory structure for storage management.

Negative

  • CPU Intensive: Image processing can be slow for large files.
  • Storage: Keeping originals (optional) roughly doubles storage requirements.
  • Dependency: Sharp requires native binaries.

Mitigation

  • Process images in background jobs (BullMQ queue).
  • Configure whether to keep originals based on requirements.
  • Use pre-built Sharp binaries via npm.

Key Files

  • src/utils/imageProcessor.ts - Core image processing functions
  • src/services/flyer/flyerProcessingService.ts - Integration with flyer workflow
  • src/middleware/fileUpload.middleware.ts - Multer configuration

Related ADRs

  • ADR-033 - File Upload Strategy
  • ADR-006 - Background Jobs
  • ADR-041 - AI Integration (uses processed images)