
Snappy: Your Vision Retrieval Buddy!


From Template to Tool: The Evolution of Snappy

A few months back, I shared The Most Beautiful RAG, a template for building vision-based document retrieval systems using ColPali embeddings. That project was about exploring the possibilities: what happens when you stop treating documents as pure text and start treating them as visual artifacts?

Snappy is the evolution of that template, and it aims to bring the project closer to production readiness.

What Is Snappy?

Snappy is a vision-first document retrieval system. Instead of relying on OCR to extract text from PDFs (and losing all the context that comes with layout, tables, handwriting, and visual markers), Snappy treats each page as an image and searches based on visual similarity.

Think of it this way: when you’re looking for a specific invoice, you’re not just remembering the words on it; you’re remembering the logo, the table layout, maybe a stamp or signature. Snappy searches the same way.

Core pieces:

  - A ColPali embedding service (CPU or GPU)
  - Qdrant for multivector storage and search
  - MinIO for storing page images
  - A FastAPI backend and a Next.js frontend

Why Vision-First Retrieval?

Traditional RAG pipelines rely on text extraction. You parse a PDF, pull out the text, chunk it, embed it, and search. This works fine for clean, text-heavy documents. But what about:

  - Complex layouts and tables
  - Handwriting and annotations
  - Visual markers like logos, stamps, and signatures

OCR-based approaches lose this information. Vision-first retrieval preserves it.

Snappy uses ColPali-style embeddings: multivector representations that capture both textual content and visual structure. Each page is rasterized, embedded as a sequence of patch-level vectors, and stored alongside the original image. When you search, you’re matching against the full visual representation, not just extracted text.
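To make "multivector matching" concrete, here is a minimal NumPy sketch (not Snappy's actual code) of the ColPali-style late-interaction score, often called MaxSim: every query token vector picks its best-matching page patch, and those best matches are summed.

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """ColPali-style late-interaction (MaxSim) score: for each query token
    vector, take its best dot product over all page patch vectors, then sum."""
    sims = query_vecs @ page_vecs.T        # (num_query_tokens, num_patches)
    return float(sims.max(axis=1).sum())   # best patch per token, summed

# Toy example: 2 query-token vectors vs. 3 page-patch vectors (dim 4)
q = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
p = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.0, 0.8, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
score = maxsim_score(q, p)  # 0.9 + 0.8 = 1.7
```

Because each query token can latch onto a different region of the page, layout features like a table header or a stamp can contribute to the score even when the extracted text would have missed them.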

From Template to Production

The original nextjs-fastapi-colpali-template was a proof of concept. It demonstrated that vision-based retrieval could work, but it was minimal: the frontend needed polish, live progress tracking wasn’t robust, there was no configuration UI, and there were no production-ready deployment options.

Snappy takes that foundation and builds on it.

What’s New

  - A more polished frontend experience
  - More robust live progress tracking
  - A configuration UI
  - Production-ready deployment options: Docker Compose profiles and pre-built images on GHCR

What Stayed the Same

The core architecture is still there:

  - ColPali-style multivector embeddings for vision-first retrieval
  - Qdrant for vector storage and search
  - MinIO for page image storage
  - A FastAPI backend paired with a Next.js frontend

If you used the template before, you’ll recognize the structure. But hopefully you’ll find the experience much more polished.

How to Use Snappy

Quick Start (Docker Compose)

# 1) Configure environment
cp .env.example .env
cp frontend/.env.example frontend/.env.local
# Add your OpenAI API key to frontend/.env.local

# 2) Start the ColPali embedding service (pick CPU or GPU)
cd colpali
docker compose --profile cpu up -d --build  # or --profile gpu

# 3) Start all services
cd ..
docker compose up -d --build

# Services will be available at:
# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Qdrant:   http://localhost:6333
# MinIO:    http://localhost:9000

Using Pre-built Images

If you don’t want to build from source, pull the pre-built images:

docker pull ghcr.io/athrael-soju/snappy/backend:latest
docker pull ghcr.io/athrael-soju/snappy/frontend:latest
docker pull ghcr.io/athrael-soju/snappy/colpali-cpu:latest

See docs/DOCKER_IMAGES.md for deployment examples.

Local Development

If you prefer running services locally:

  1. Start Qdrant and MinIO (Docker is fine)
  2. Start the ColPali service (from colpali/)
  3. In backend/, create a venv, install dependencies, and run:
    uvicorn backend:app --host 0.0.0.0 --port 8000 --reload
  4. In frontend/, install and run:
    yarn install --frozen-lockfile
    yarn dev

Real-World Use Cases

Snappy excels at scenarios where visual layout matters.

The Frontend Experience

The Next.js frontend is designed to be fast and intuitive.

Design tokens (text-body-*, size-icon-*) keep the UI consistent, and the responsive layout works on desktop and mobile.

Under the Hood

Indexing Flow

  1. PDF → images (pdf2image.convert_from_path)
  2. Images → embeddings (ColPali API)
  3. Save images to MinIO (public URLs)
  4. Upsert embeddings (original + mean-pooled rows/cols) to Qdrant with payload metadata
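The "mean-pooled rows/cols" in step 4 can be sketched with a hypothetical helper (this assumes the patch embeddings arrive in row-major grid order; Snappy's actual implementation may differ):

```python
import numpy as np

def mean_pool_rows_cols(patch_vecs: np.ndarray, n_rows: int, n_cols: int):
    """Collapse a page's patch grid into per-row and per-column mean vectors.
    These much smaller multivectors act as cheap prefetch targets in Qdrant."""
    grid = patch_vecs.reshape(n_rows, n_cols, -1)  # (rows, cols, dim)
    mean_rows = grid.mean(axis=1)                  # (n_rows, dim)
    mean_cols = grid.mean(axis=0)                  # (n_cols, dim)
    return mean_rows, mean_cols

# A 2x3 patch grid with 4-dim embeddings
patches = np.arange(24, dtype=float).reshape(6, 4)
rows, cols = mean_pool_rows_cols(patches, n_rows=2, n_cols=3)
```

The pooled vectors trade precision for speed: a page with hundreds of patches shrinks to a handful of row/column summaries, which is what makes the first-stage search cheap.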

Retrieval Flow

  1. Query → embedding (ColPali API)
  2. Qdrant multivector prefetch (rows/cols), then rerank with using="original"
  3. Fetch top-k page images from MinIO
  4. Stream an OpenAI-backed answer conditioned on user text + page images
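Conceptually, the prefetch-then-rerank in step 2 works like this toy sketch (plain NumPy standing in for Qdrant; names like `search` are illustrative): a cheap single-vector pass narrows the candidates, then full MaxSim over the "original" multivectors decides the final order.

```python
import numpy as np

def maxsim(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    # Late-interaction score: best patch per query token, summed
    return float((query_vecs @ page_vecs.T).max(axis=1).sum())

def search(query_vecs, pages, prefetch_k=10, top_k=3):
    """pages: list of (n_patches, dim) arrays. Stage 1 ranks pages by a single
    mean-pooled vector (cheap); stage 2 reranks survivors with full MaxSim."""
    q_mean = query_vecs.mean(axis=0)
    coarse = sorted(range(len(pages)),
                    key=lambda i: -float(pages[i].mean(axis=0) @ q_mean))
    candidates = coarse[:prefetch_k]
    reranked = sorted(candidates, key=lambda i: -maxsim(query_vecs, pages[i]))
    return reranked[:top_k]
```

In the real system Qdrant performs both stages server-side in one query, so the full multivectors never leave the database during prefetch.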

Data Model in Qdrant

Each point stores three vectors (multivector):

  - original: the full sequence of patch-level vectors, used for reranking
  - Mean-pooled row vectors, used for fast prefetch
  - Mean-pooled column vectors, used for fast prefetch

Payload includes metadata like filename, page index, image URL, dimensions, and timestamps.
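As a rough illustration of that payload, here is a hypothetical builder; the field names are illustrative and may not match Snappy's exact schema:

```python
from datetime import datetime, timezone

def build_payload(filename: str, page_index: int, image_url: str,
                  width: int, height: int) -> dict:
    # Illustrative payload mirroring the metadata fields listed above;
    # the exact key names in Snappy's schema may differ.
    return {
        "filename": filename,
        "page_index": page_index,
        "image_url": image_url,
        "width": width,
        "height": height,
        "indexed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Keeping the image URL in the payload means a search hit can be rendered immediately, without a second lookup to map vectors back to pages.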

Optional Binary Quantization

Enable binary quantization in Qdrant to reduce memory and speed up search:

# In .env
QDRANT_USE_BINARY=True
QDRANT_BINARY_ALWAYS_RAM=True
QDRANT_SEARCH_RESCORE=True
QDRANT_SEARCH_OVERSAMPLING=2.0

Clear the collection and re-index after changing these settings.
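To give a feel for what QDRANT_SEARCH_OVERSAMPLING does: with binary quantization, the search first retrieves roughly limit × oversampling candidates using the fast 1-bit vectors, then (when rescoring is enabled) re-scores them against the original full-precision vectors. A back-of-the-envelope sketch:

```python
import math

def candidate_budget(limit: int, oversampling: float) -> int:
    # Number of candidates fetched with 1-bit vectors before
    # full-precision rescoring trims the list back down to `limit`
    return math.ceil(limit * oversampling)

budget = candidate_budget(limit=5, oversampling=2.0)  # 10 candidates, rescored to top 5
```

Higher oversampling recovers more of the accuracy lost to quantization at the cost of extra rescoring work, which is why 2.0 is a reasonable middle ground.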

API Overview

The backend exposes a clean REST API.

Chat streaming lives in frontend/app/api/chat/route.ts. The route calls the backend search endpoint, invokes the OpenAI Responses API, and streams Server-Sent Events to the browser. The backend does not proxy OpenAI calls.

Troubleshooting

See backend/docs/configuration.md for advanced troubleshooting.

What’s Next?

I have some ideas I’m planning to explore in the future.

If you have ideas or run into issues, open an issue or PR on GitHub.

I hope you find Snappy useful. If you’re working on document retrieval, especially in domains where visual structure matters, give it a try. And if you build something cool with it, let me know!

Thanks for reading.

