DEEP DIVE

LLM Processing

How Aegis Photo Voyager transforms your gallery into an intelligent, searchable experience using local AI.

Aegis Photo Voyager leverages state-of-the-art Large Language Models (LLMs) and Vector Embeddings to transform your local photo library into an intelligent, searchable, and immersive experience. This processing happens entirely under your control, primarily using local or self-hosted models to ensure maximum privacy.

1. Vision-Based Metadata Extraction (The "Brain")

This is the most comprehensive layer of processing. When you run "AI Analysis" on your photos, the application uses a Vision LLM (typically granite3.2-vision:2b) to "look" at your photos and videos and extract rich metadata.

How it Works

  • Model: Integrated via Ollama or LM Studio.
  • Processing: For photos, the image is encoded and sent to the LLM. For videos, the application extracts key frames to analyze the content.
  • Structured Output: The LLM is prompted to return a precise JSON object, which is then parsed into the application's database.
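The three steps above can be sketched as follows. The endpoint is Ollama's standard `/api/generate` API; the prompt, JSON keys, and helper names are illustrative, not the application's actual code:

```python
import base64
import json
from urllib.request import Request, urlopen

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

PROMPT = (
    "Describe this photo. Respond ONLY with a JSON object with the keys: "
    "summary, description, tags, quality."
)

def build_request(image_bytes: bytes, model: str = "granite3.2-vision:2b") -> dict:
    """Build an Ollama payload carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": PROMPT,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "format": "json",   # ask Ollama to constrain the reply to valid JSON
        "stream": False,
    }

def analyze_photo(path: str) -> dict:
    """Send one image to the local vision LLM and parse the structured reply."""
    with open(path, "rb") as f:
        payload = build_request(f.read())
    req = Request(OLLAMA_URL, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    with urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    return json.loads(body["response"])  # the metadata dict written to the database
```

For video, the same call would be made per extracted key frame rather than per file.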

What is Extracted?

  • Executive Summary: A concise one-sentence description of the photo.
  • Narrative Description: A detailed multi-sentence description of the scene, lighting, and composition.
  • Tags & Objects: Automated labeling of items (e.g., "mountain", "bicycle").
  • Technical Quality: AI-driven evaluation of sharpness, graininess, and focus.
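Assembled together, a parsed record covering the four fields above might look like this (field names and values are purely illustrative, not the application's actual schema):

```json
{
  "summary": "A cyclist pauses on a mountain pass at sunset.",
  "description": "Warm golden-hour light falls across a winding road; the rider is framed against distant peaks with shallow depth of field.",
  "tags": ["mountain", "bicycle", "sunset", "road"],
  "quality": {"sharpness": "high", "grain": "low", "focus": "subject in focus"}
}
```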

How to Access: You can find these results in the Sidebar when viewing a photo in Detail View. You can also use the Filter Pills in the main gallery sidebar to narrow down your library by mood, category, or detected objects.

2. Visual Similarity & Clustering (The "Eye")

Beyond understanding text, Aegis Photo Voyager understands visual relationships. It uses CLIP-style models to create "mathematical signatures" (embeddings) for your photos.

How it Works

  • Technology: Local ONNX models (CLIP-ViT-B-32), run directly on your CPU/GPU.
  • Dimensionality Reduction: Since these signatures are high-dimensional (512 values for CLIP-ViT-B-32), the application uses PCA (Principal Component Analysis) to flatten them into 2D or 3D coordinates.

Why it's Useful

  • Similar Photo Search: Right-click any photo and select "Search Similar" to find photos with a similar visual aesthetic or composition.
  • The 2D/3D Map: View your entire library as a "star map" where visually similar photos are physically clustered together.
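Both features above reduce to simple linear algebra over the embeddings, and can be sketched in a few lines of NumPy. The random vectors here stand in for real CLIP-ViT-B-32 outputs; this is an illustrative sketch, not the application's implementation:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two embedding vectors (1.0 = identical direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pca_project(embeddings: np.ndarray, dims: int = 2) -> np.ndarray:
    """Project high-dimensional embeddings down to `dims` map coordinates via PCA."""
    centered = embeddings - embeddings.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dims].T

# Toy stand-ins for a library of 512-d CLIP embeddings:
rng = np.random.default_rng(0)
library = rng.normal(size=(100, 512))
coords = pca_project(library, dims=2)   # (100, 2) "star map" positions
```

"Search Similar" then amounts to ranking the library by `cosine_similarity` against the selected photo's embedding, while the 2D/3D map plots the `coords`.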

How to Access: Switch to the 2D Cluster View or 3D Cluster View from the Views menu at the top of the application.

3. Immersive Photo Journeys

The "Photo Journey" feature uses the combined power of all the AI layers to create an automated, themed slideshow experience.

Available Journeys

  • Person Through Time: Uses face recognition and AI-extracted dates to follow someone's life over time.
  • Activity Journey: Groups photos by AI-detected actions.
  • Semantic Journey: Dynamically builds a slideshow based on a natural language prompt.
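A Semantic Journey essentially ranks photo embeddings against the prompt's text embedding in the shared CLIP space. A minimal sketch with toy vectors (the function name and scoring are illustrative, not the application's code):

```python
import numpy as np

def semantic_journey(query_vec: np.ndarray, photo_vecs: np.ndarray,
                     top_k: int = 10) -> np.ndarray:
    """Order photos by cosine similarity to a text prompt's embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = photo_vecs / np.linalg.norm(photo_vecs, axis=1, keepdims=True)
    scores = p @ q                           # cosine similarity in the shared space
    return np.argsort(scores)[::-1][:top_k]  # best-matching photo indices first

# Toy demo: a 2-d "embedding space" where photo 0 matches the prompt exactly.
order = semantic_journey(np.array([1.0, 0.0]),
                         np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]),
                         top_k=3)
```

The resulting index order becomes the slide order of the generated show.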

How to Access: Click the Slideshow button in the top menu and select "Photo Journey...".

Summary of Benefits

Zero Manual Tagging

The AI handles the "grunt work" of organization, automatically extracting tags, objects, and descriptions.

Privacy First

All processing can run locally; your photos never need to leave your machine.

Discovery

Find hidden gems in your library that you forgot existed using visual and semantic relationships.