Welcome to the Frontier
Aegis Photo Voyager is a next-generation desktop application for power users, photographers, and archivists. Unlike traditional photo managers that rely on rigid folder structures or manual tagging, Aegis uses Large Language Models (LLMs) to understand both the content and the mathematical signature of every image.
Because Aegis runs entirely against local LLM environments, user data never leaves the machine. The application rests on two pillars of AI: Metadata Extraction for human-readable discovery, and Embeddings for visual similarity search.
Core Concepts
📝 Semantic Metadata Extraction
This is the "brain" of the application. Using vision-capable models, we analyze pixels to generate descriptive tags, summaries, and structured data.
- ✔ Identifies objects, locations, and text (OCR)
- ✔ Assigns subjective "mood" and "composition" tags
- ✔ Enables complex natural language filtering
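As a minimal sketch of what the extraction stage might produce and how filtering could work on top of it, the snippet below defines a hypothetical `PhotoMetadata` record and a simple tag filter. The field names and the `filter_photos` helper are illustrative, not Aegis APIs:

```python
from dataclasses import dataclass, field

@dataclass
class PhotoMetadata:
    """Hypothetical record for one photo, as the extraction stage might emit it."""
    path: str
    objects: list = field(default_factory=list)  # detected objects
    ocr_text: str = ""                           # text read from the image (OCR)
    mood: str = ""                               # subjective "mood" tag

def filter_photos(photos, required_tags):
    """Return photos whose object tags include every required tag."""
    required = {t.lower() for t in required_tags}
    return [p for p in photos if required <= {o.lower() for o in p.objects}]

library = [
    PhotoMetadata("beach.jpg", objects=["ocean", "umbrella"], mood="calm"),
    PhotoMetadata("market.jpg", objects=["fruit", "sign"], ocr_text="FRESH MANGOES"),
]
print([p.path for p in filter_photos(library, ["ocean"])])  # ['beach.jpg']
```

A natural-language query like "calm beach photos" would be translated by the LLM into structured constraints against fields like these.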
🔢 Visual Embeddings
The "DNA" of an image. We convert visual features into a high-dimensional vector space where similar concepts exist in close mathematical proximity.
- ✔ Powers "Find similar photos" features
- ✔ Enables visual clustering (grouping by style)
- ✔ Fast, mathematical retrieval using Vector DBs
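"Close mathematical proximity" is usually measured with cosine similarity. The toy example below uses tiny 3-dimensional vectors and a plain sorted list as a stand-in for a real vector database (real embeddings have hundreds of dimensions); all names and values are illustrative:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def find_similar(query, library, top_k=2):
    """Rank stored vectors by similarity to the query vector."""
    ranked = sorted(library.items(),
                    key=lambda kv: cosine_similarity(query, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

vectors = {
    "sunset_a.jpg": [0.9, 0.1, 0.0],
    "sunset_b.jpg": [0.8, 0.2, 0.1],
    "cat.jpg":      [0.0, 0.1, 0.9],
}
print(find_similar([0.85, 0.15, 0.05], vectors))  # ['sunset_a.jpg', 'sunset_b.jpg']
```

A vector DB performs the same ranking, but with indexes that avoid comparing the query against every stored vector.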
Recommended Environments
Choosing the right backend is critical for the Aegis Photo Voyager pipeline. We support two primary environments: Ollama (for automation) and LM Studio (for exploration).
| Feature | Ollama | LM Studio |
|---|---|---|
| User Experience | CLI & Background Service. Best for high-volume batch processing. | GUI-First. Best for manually testing different vision models. |
| API Performance | Ultra-fast HTTP endpoint; Python library is native and robust. | Local Server mode (OpenAI-compatible) available via toggle. |
| Windows Support | Fast, but works best when run through WSL2 for GPU access. | Native Windows app with seamless slider for GPU offloading. |
| Apple Silicon | Native Metal support; extremely low overhead. | Visual performance charts; optimized for Apple's MLX backend. |
Performance Optimization
Granite-Vision
For Aegis Photo Voyager, we recommend IBM Granite-Vision. This model is highly specialized for "Visual Document Understanding," making it superior for reading labels, dates, and detailed object lists within photos.
Granite-Vision fits comfortably on machines with 8–16 GB of RAM, providing "Pro"-grade results on consumer hardware.
The 512px Efficiency Protocol
The Aegis pipeline automatically resizes inputs to 512px width. Benchmarks show <1% drop in tag accuracy when scaling from 4K to 512px for semantic extraction, while providing a 4x speed improvement.
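The resize preserves aspect ratio, so only the target width is fixed. A small helper showing the dimension math (the function name is illustrative; a real pipeline would hand these dimensions to an image library such as Pillow):

```python
def resize_dimensions(width, height, target_width=512):
    """Aspect-preserving output size for the 512px protocol.
    Images already narrower than the target are left untouched."""
    if width <= target_width:
        return width, height
    scale = target_width / width
    return target_width, round(height * scale)

print(resize_dimensions(3840, 2160))  # 4K UHD frame -> (512, 288)
```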
Implementation Guide
📦 Ollama (Developer Choice)
Binary Install: Visit ollama.com and download the installer for your platform.
Pull Granite-Vision:
ollama pull granite-vision:3.1-2b
Start API Service: Ollama runs in the background at localhost:11434 by default.
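With the service running, vision requests go to Ollama's `/api/generate` endpoint as JSON, with the image supplied as a base64 string in the `images` array. The sketch below only assembles the request body (the prompt text and model tag mirror this guide; sending is shown commented out so the example runs offline):

```python
import base64
import json

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_tag_request(image_bytes, model="granite-vision:3.1-2b"):
    """Assemble the JSON body for Ollama's /api/generate endpoint."""
    return json.dumps({
        "model": model,
        "prompt": "List the objects, any visible text, and a one-line summary.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # one complete response instead of a token stream
    })

body = build_tag_request(b"\x89PNG...fake bytes for illustration")
print(json.loads(body)["model"])  # granite-vision:3.1-2b

# To actually send the request:
# import urllib.request
# req = urllib.request.Request(OLLAMA_URL, data=body.encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read())
```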
🖥️ LM Studio (Curator Choice)
Visual Interface: Download from lmstudio.ai.
Search for GGUF: Search for "Granite Vision" in the discovery tab and select one of the bartowski quantizations.
Activate Local Server: Click the ↔️ icon on the left to start an OpenAI-compatible server.
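Because the LM Studio server speaks the OpenAI chat-completions dialect, images are sent as `image_url` content parts carrying a base64 data URI. This sketch builds such a request body; the port (1234 is LM Studio's usual default), model name, and prompt are assumptions to adapt to your setup:

```python
import base64
import json

LMSTUDIO_URL = "http://localhost:1234/v1/chat/completions"  # default LM Studio port

def build_vision_request(image_bytes, model="granite-vision"):
    """Assemble an OpenAI-compatible chat request with an inline image."""
    data_uri = "data:image/jpeg;base64," + base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo for an archive index."},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }],
    })

payload = json.loads(build_vision_request(b"fake jpeg bytes"))
print(payload["messages"][0]["content"][0]["type"])  # text
```

Any OpenAI-compatible client library can then POST this body to the local server unchanged.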