Snappy: Your Vision Retrieval Buddy!

Author: Athos Georgiou

From Template to Tool: The Evolution of Snappy

A few months back, I shared The Most Beautiful RAG, a template for building vision-based document retrieval systems using ColPali embeddings. That project was about exploring the possibilities: what happens when you stop treating documents as pure text and start treating them as visual artifacts?

Snappy is the evolution of that template, and it aims to bring the idea closer to production readiness.

What Is Snappy?

Snappy is a vision-first document retrieval system. Instead of relying on OCR to extract text from PDFs (and losing all the context that comes with layout, tables, handwriting, and visual markers), Snappy treats each page as an image and searches based on visual similarity.

Think of it this way: when you're looking for a specific invoice, you're not just remembering the words on it; you're remembering the logo, the table layout, maybe a stamp or signature. Snappy searches the same way.

Core pieces:

  • FastAPI backend with routes for indexing, search, chat (streaming), and maintenance
  • ColPali embedding service (CPU or GPU) for vision-based embeddings
  • Qdrant for multivector search with optional binary quantization
  • MinIO for object storage of page images
  • Next.js 16 frontend with React 19.2 for a responsive, real-time UI
  • OpenAI Responses API for streaming chat with visual citations

Why Vision-First Retrieval?

Traditional RAG pipelines rely on text extraction. You parse a PDF, pull out the text, chunk it, embed it, and search. This works fine for clean, text-heavy documents. But what about:

  • Legal documents with annotations and stamps
  • Medical records with handwritten notes and diagrams
  • Financial statements with specific table layouts and logos
  • Academic papers where figures, equations, and charts matter as much as the text
  • Historical archives where the visual appearance is part of the context

OCR-based approaches lose this information. Vision-first retrieval preserves it.

Snappy uses ColPali-style embeddings: multivector representations that capture both textual content and visual structure. Each page is rasterized, embedded as a sequence of patch-level vectors, and stored alongside the original image. When you search, you're matching against the full visual representation, not just extracted text.
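To make that matching concrete, here's a minimal sketch of late-interaction (MaxSim) scoring, the comparison style ColPali-family models use. The 2-D vectors are toy stand-ins for real patch embeddings:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between a query and one page.

    query_vecs: (num_query_tokens, dim) -- one vector per query token
    page_vecs:  (num_patches, dim)      -- one vector per page patch
    """
    # Normalize so dot products become cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    sims = q @ p.T  # (num_query_tokens, num_patches)
    # Each query token keeps its best-matching patch; sum over tokens.
    return float(sims.max(axis=1).sum())

# Toy example: a page whose patches align with the query scores higher.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[0.9, 0.1], [0.1, 0.9]])  # matches both query tokens
page_b = np.array([[0.5, 0.5], [0.5, 0.5]])  # matches neither well
assert maxsim_score(query, page_a) > maxsim_score(query, page_b)
```

Because every query token gets to pick its best patch, a query about "the table in the invoice" can match the table region of a page even if the rest of the page looks nothing like the query.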

From Template to Production

The original nextjs-fastapi-colpali-template was a proof of concept. It demonstrated that vision-based retrieval could work, but it was minimal: it needed frontend polish, more robust live progress tracking, and a configuration UI, and it had no production-ready deployment options.

Snappy takes that foundation and builds on it:

What's New

  • Live indexing progress: Server-Sent Events stream real-time updates as PDFs are processed, so you're not left wondering if anything is happening.
  • Configuration UI: A schema-driven settings page lets you tweak everything from search limits to quantization settings without touching .env files. Changes are validated at runtime, and you can reset to defaults or draft changes before committing.
  • Streaming chat with visual inline citations: The chat interface streams responses token-by-token and shows you exactly which pages the AI is referencing, with inline image previews.
  • Pre-built Docker images: Deploy with pre-built containers from GitHub Container Registry, or build from source if you need customization.
  • Advanced configuration: From the UI, you can adjust search parameters, performance settings, storage & retrieval settings, upload limits and more.
  • Modern frontend stack: Next.js 16 with React 19.2, design tokens for consistent styling, and responsive layouts that work on mobile.
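For a sense of what the frontend consumes from the progress stream, here's a small sketch of parsing the `data:` lines of a Server-Sent Events response. The event fields (`job_id`, `page`, `total`, `status`) are illustrative assumptions, not Snappy's exact schema:

```python
import json

def parse_sse_events(stream_text: str) -> list[dict]:
    """Parse the `data:` lines of an SSE stream into dicts.

    A minimal sketch: real SSE also allows `event:`/`id:` fields and
    multi-line data payloads, which this parser ignores.
    """
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Example: the kind of progress updates an indexing job might emit.
sample = (
    'data: {"job_id": "abc", "page": 1, "total": 3, "status": "processing"}\n'
    "\n"
    'data: {"job_id": "abc", "page": 3, "total": 3, "status": "done"}\n'
)
for event in parse_sse_events(sample):
    print(f'{event["page"]}/{event["total"]}: {event["status"]}')
```

In the real app, the browser's built-in `EventSource` handles this parsing; the sketch just shows why SSE is a good fit here: it's plain text over HTTP, trivially inspectable, and streams incrementally.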

What Stayed the Same

The core architecture is still there:

  • Page-level retrieval with ColPali embeddings
  • Qdrant multivector search (rows/cols prefetch + reranking)
  • MinIO for image storage with public URLs
  • FastAPI backend with modular routers
  • Docker Compose for local development

If you used the template before, you'll recognize the structure. But hopefully you'll find the experience much more polished.

How to Use Snappy

Quick Start (Docker Compose)

# 1) Configure environment
cp .env.example .env
cp frontend/.env.example frontend/.env.local
# Add your OpenAI API key to frontend/.env.local

# 2) Start the ColPali embedding service (pick CPU or GPU)
cd colpali
docker compose --profile cpu up -d --build  # or --profile gpu

# 3) Start all services
cd ..
docker compose up -d --build

# Services will be available at:
# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Qdrant:   http://localhost:6333
# MinIO:    http://localhost:9000

Using Pre-built Images

If you don't want to build from source, pull the pre-built images:

docker pull ghcr.io/athrael-soju/snappy/backend:latest
docker pull ghcr.io/athrael-soju/snappy/frontend:latest
docker pull ghcr.io/athrael-soju/snappy/colpali-cpu:latest

See docs/DOCKER_IMAGES.md for deployment examples.

Local Development

If you prefer running services locally:

  1. Start Qdrant and MinIO (Docker is fine)
  2. Start the ColPali service (from colpali/)
  3. In backend/, create a venv, install dependencies, and run:
    uvicorn backend:app --host 0.0.0.0 --port 8000 --reload
    
  4. In frontend/, install and run:
    yarn install --frozen-lockfile
    yarn dev
    

Real-World Use Cases

Snappy excels at scenarios where visual layout matters:

  • Legal document analysis: Search case files, contracts, and briefs by annotations, stamps, and document structure.
  • Medical records retrieval: Find patient charts and diagnostic reports by handwritten notes, stamps, and visual markers.
  • Financial auditing: Locate invoices, receipts, and statements by logos, signatures, and table layouts.
  • Academic research: Search papers by figures, tables, equations, and charts; ideal for literature reviews.
  • Archive management: Retrieve historical documents and scanned archives by visual appearance, preserving context that text extraction destroys.
  • Engineering documentation: Find blueprints, schematics, and technical drawings by visual elements and layout patterns.

The Frontend Experience

The Next.js frontend is designed to be fast and intuitive:

  • Upload: Drag-and-drop PDFs with live progress tracking
  • Search: Type a query and see visually similar pages ranked by relevance
  • Chat: Ask questions and get streaming responses with inline page citations
  • Configuration: Adjust search parameters, quantization settings, and upload limits from the UI
  • Maintenance: Clear collections, delete documents, and reinitialize services without touching the backend

Design tokens (text-body-*, size-icon-*) keep the UI consistent, and the responsive layout works on desktop and mobile.

Under the Hood

Indexing Flow

  1. PDF → images (pdf2image.convert_from_path)
  2. Images → embeddings (ColPali API)
  3. Save images to MinIO (public URLs)
  4. Upsert embeddings (original + mean-pooled rows/cols) to Qdrant with payload metadata
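Step 4 stores mean-pooled row and column vectors alongside the original token sequence. Assuming the patch embeddings form a row-major grid (with any special tokens already stripped — an assumption of this sketch), the pooling looks like:

```python
import numpy as np

def mean_pool_rows_cols(patch_vecs: np.ndarray, n_rows: int, n_cols: int):
    """Build the row/column mean-pooled multivectors for one page.

    patch_vecs: (n_rows * n_cols, dim) patch embeddings in row-major order.
    Returns (rows, cols) with shapes (n_rows, dim) and (n_cols, dim).
    """
    dim = patch_vecs.shape[1]
    grid = patch_vecs.reshape(n_rows, n_cols, dim)
    rows = grid.mean(axis=1)  # average the patches in each row
    cols = grid.mean(axis=0)  # average the patches in each column
    return rows, cols

# Tiny 2x2 grid with 1-D "embeddings" to show the arithmetic.
rows, cols = mean_pool_rows_cols(np.array([[1.0], [2.0], [3.0], [4.0]]), 2, 2)
print(rows.tolist())  # [[1.5], [3.5]]
print(cols.tolist())  # [[2.0], [3.0]]
```

The pooled vectors are much shorter than the full patch sequence, which is what makes them cheap enough to use as a first-pass filter at query time.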

Retrieval Flow

  1. Query → embedding (ColPali API)
  2. Qdrant multivector prefetch (rows/cols), then rerank with using="original"
  3. Fetch top-k page images from MinIO
  4. Stream an OpenAI-backed answer conditioned on user text + page images

Data Model in Qdrant

Each point stores three vectors (multivector):

  • original: full token sequence
  • mean_pooling_rows: pooled by rows
  • mean_pooling_columns: pooled by columns

Payload includes metadata like filename, page index, image URL, dimensions, and timestamps.

Optional Binary Quantization

Enable binary quantization in Qdrant to reduce memory and speed up search:

# In .env
QDRANT_USE_BINARY=True
QDRANT_BINARY_ALWAYS_RAM=True
QDRANT_SEARCH_RESCORE=True
QDRANT_SEARCH_OVERSAMPLING=2.0

Clear the collection and re-index after changing these settings.

API Overview

The backend exposes a clean REST API:

  • GET /health - check dependencies
  • GET /search?q=...&k=5 - top-k results with payload metadata
  • POST /index (multipart files[]) - upload and index PDFs
  • GET /progress/stream/{job_id} - Server-Sent Events for indexing progress
  • POST /index/cancel/{job_id} - cancel an in-progress indexing job
  • GET /status - collection stats and service health
  • POST /initialize - create collections and buckets
  • DELETE /delete - delete specific documents
  • POST /clear/qdrant | /clear/minio | /clear/all - maintenance
  • GET /config/schema - get the configuration schema
  • GET /config/values - get current configuration
  • POST /config/update - update runtime configuration
  • POST /config/reset - reset to defaults

Chat streaming lives in frontend/app/api/chat/route.ts. The route calls the backend search endpoint, invokes the OpenAI Responses API, and streams Server-Sent Events to the browser. The backend does not proxy OpenAI calls.

Troubleshooting

  • ColPali timing out? Increase COLPALI_API_TIMEOUT or run the GPU profile.
  • Progress bar stuck? Ensure Poppler is installed and check backend logs for PDF conversion errors.
  • Missing images? Verify MinIO credentials/URLs and confirm next.config.ts allows the domains you expect.
  • CORS issues? Replace wildcard ALLOWED_ORIGINS entries with explicit URLs before exposing the API publicly.
  • Config changes vanish? /config/update modifies runtime state only; update .env for anything you need to keep after a restart.
  • Upload rejected? The uploader currently accepts PDFs only. Adjust max size, chunk size, or file count limits in the configuration UI.

See backend/docs/configuration.md for advanced troubleshooting.

What's Next?

Some ideas I'm planning to explore in the future:

  • Multi-page understanding: track context across multiple pages, allowing Snappy to get the "big picture".
  • Knowledge graph integration: enhance Snappy with temporal reasoning and help overcome top-k limitations.
  • Snappy agents: introduce a vision-first agentic workflow, simplify ingestion and retrieval, and bring Snappy one step closer to production.

If you have ideas or run into issues, open an issue or PR on GitHub.

I hope you find Snappy useful. If you're working on document retrieval, especially in domains where visual structure matters, give it a try. And if you build something cool with it, let me know!

Thanks for reading.