GUIDE

How to find and load a vision LLM

Learn how to choose the right AI model for your hardware to power Aegis Photo Voyager.


Aegis Photo Voyager relies on "Vision LLMs" (Large Language Models that can interpret images) to describe your photos. Choosing the right one is key to balancing performance and accuracy.

1. Look for the "Vision" Tag

Not all AI models can "see" images. When browsing the Ollama Library, you must look for models that specifically have the "vision" tag. Standard text models will not work for image analysis.
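If you are unsure whether a model you have already downloaded supports images, recent versions of the Ollama CLI print a "Capabilities" section in `ollama show <model>`. The sketch below checks a sample of that output (the sample text is illustrative, not captured from a real run — exact fields vary by Ollama version and model); a vision-capable model lists "vision" among its capabilities.

```shell
# Sample of what `ollama show` prints for a vision model in recent
# Ollama releases (illustrative; exact fields vary by version/model):
show_output="Model
  architecture    granite
  parameters      2.5B

Capabilities
  completion
  vision"

# Against a real install you would pipe the command itself:
#   ollama show granite3.2-vision:2b | grep -q vision
if echo "$show_output" | grep -q "vision"; then
  result="vision-capable"
else
  result="text-only"
fi
echo "$result"
```

A plain text model would list only "completion" here, and the check would report "text-only".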

2. Check Model Size (Parameters)

Models come in different sizes, usually measured in billions of parameters (e.g., 2b, 7b, 32b).

  • Larger models are generally "smarter" and more detailed, but they require significantly more memory (system RAM or GPU VRAM) and more processing power.
  • Smaller models run faster and work well on laptops without dedicated graphics cards.
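As a rough rule of thumb (a heuristic, not an official figure): a model quantized to 4 bits needs about half a gigabyte of memory per billion parameters for its weights, plus a couple of gigabytes of overhead for the context window and runtime. A quick back-of-the-envelope check in the shell:

```shell
# Heuristic memory estimate for 4-bit quantized models:
# ~0.5 GB per billion parameters, plus ~2 GB runtime/context overhead.
for params in 2 7 32; do
  weights_gb=$(( params * 5 / 10 ))   # integer math: 0.5 GB per 1B params
  total_gb=$(( weights_gb + 2 ))
  echo "${params}b model: ~${total_gb} GB"
done
```

By this estimate, a 2b model fits comfortably on a laptop with 8 GB of RAM, while a 32b model realistically needs a high-end dedicated GPU.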

3. Our Recommendation: granite3.2-vision:2b

For most users, especially those on laptops without a dedicated GPU, we have had excellent results with granite3.2-vision:2b. It is lightweight, fast, and provides surprisingly accurate descriptions. Read more about it here: LLM Vision Models Benchmark

ollama pull granite3.2-vision:2b
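Once the pull finishes, you can smoke-test the model straight from the terminal before opening Aegis Photo Voyager. The Ollama CLI sends image files referenced in the prompt to vision-capable models; the helper function and photo path below are an illustrative sketch, not a built-in command:

```shell
# Ask a local vision model to describe one photo.
# (Requires the ollama CLI; the photo path is a placeholder.)
describe_with() {
  model="$1"
  photo="$2"
  ollama run "$model" "Describe this photo: $photo"
}

# Example usage (substitute a real image path on your machine):
# describe_with granite3.2-vision:2b ./vacation/beach.jpg
```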

4. Test Your Model

Don't be afraid to experiment! You can download multiple vision models. Use the "Test" feature within Aegis Photo Voyager to compare their performance and description quality on your own photos before committing to one for your entire library.
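If you prefer the terminal to the in-app Test feature, the same comparison can be scripted: run each candidate model against one photo and compare the descriptions side by side. The function, photo path, and second model name below are illustrative examples, not part of Aegis Photo Voyager:

```shell
# Print each model's description of the same photo, one after another.
compare_models() {
  photo="$1"
  shift
  for model in "$@"; do
    echo "=== $model ==="
    ollama run "$model" "Describe this photo: $photo"
  done
}

# Example usage (uncomment once the models are pulled):
# compare_models ./sample.jpg granite3.2-vision:2b llava:7b
```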