Snappy: Your Vision Retrieval Buddy!

Author: Athos Georgiou

From Template to Tool: The Evolution of Snappy

A few months back, I shared The Most Beautiful RAG, a template for building vision-based document retrieval systems using ColPali embeddings. That project was about exploring the possibilities: what happens when you stop treating documents as pure text and start treating them as visual artifacts?

Snappy is the evolution of that template, and it aims to bring the idea closer to production readiness.

What Is Snappy?

Snappy is a vision-first document retrieval system. Instead of relying on OCR to extract text from PDFs (and losing all the context that comes with layout, tables, handwriting, and visual markers), Snappy treats each page as an image and searches based on visual similarity.

Think of it this way: when you're looking for a specific invoice, you're not just remembering the words on it; you're remembering the logo, the table layout, maybe a stamp or signature. Snappy searches the same way.

Core pieces:

  • FastAPI backend with routes for indexing, search, chat (streaming), and maintenance
  • ColPali embedding service (CPU or GPU) for vision-based embeddings
  • Qdrant for multivector search with optional binary quantization
  • MinIO for object storage of page images
  • Next.js 16 frontend with React 19.2 for a responsive, real-time UI
  • OpenAI Responses API for streaming chat with visual citations

Why Vision-First Retrieval?

Traditional RAG pipelines rely on text extraction. You parse a PDF, pull out the text, chunk it, embed it, and search. This works fine for clean, text-heavy documents. But what about:

  • Legal documents with annotations and stamps
  • Medical records with handwritten notes and diagrams
  • Financial statements with specific table layouts and logos
  • Academic papers where figures, equations, and charts matter as much as the text
  • Historical archives where the visual appearance is part of the context

OCR-based approaches lose this information. Vision-first retrieval preserves it.

Snappy uses ColPali-style embeddings: multivector representations that capture both textual content and visual structure. Each page is rasterized, embedded as a sequence of patch-level vectors, and stored alongside the original image. When you search, you're matching against the full visual representation, not just extracted text.
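To make that matching concrete, here's a minimal sketch of late-interaction (MaxSim) scoring, the comparison style ColPali-family models use. The 2-D vectors are toy stand-ins for real patch embeddings:

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, page_vecs: np.ndarray) -> float:
    """Late-interaction (MaxSim) score between a query and one page.

    query_vecs: (num_query_tokens, dim) -- one vector per query token
    page_vecs:  (num_patches, dim)      -- one vector per page patch
    """
    # Normalize so dot products become cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = page_vecs / np.linalg.norm(page_vecs, axis=1, keepdims=True)
    sims = q @ p.T  # (num_query_tokens, num_patches)
    # Each query token keeps its best-matching patch; sum over tokens.
    return float(sims.max(axis=1).sum())

# Toy example: a page whose patches align with the query scores higher.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
page_a = np.array([[0.9, 0.1], [0.1, 0.9]])  # matches both query tokens
page_b = np.array([[0.5, 0.5], [0.5, 0.5]])  # matches neither well
assert maxsim_score(query, page_a) > maxsim_score(query, page_b)
```

Because every query token gets to pick its best patch, a query about "the table in the invoice" can match the table region of a page even if the rest of the page looks nothing like the query.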

From Template to Production

The original nextjs-fastapi-colpali-template was a proof of concept. It demonstrated that vision-based retrieval could work, but it was minimal: it needed frontend polish, more robust live progress tracking, and a configuration UI, and it had no production-ready deployment options.

Snappy takes that foundation and builds on it:

What's New

  • Live indexing progress: Server-Sent Events stream real-time updates as PDFs are processed, so you're not left wondering if anything is happening.
  • Configuration UI: A schema-driven settings page lets you tweak everything from search limits to quantization settings without touching .env files. Changes are validated at runtime, and you can reset to defaults or draft changes before committing.
  • Streaming chat with visual inline citations: The chat interface streams responses token-by-token and shows you exactly which pages the AI is referencing, with inline image previews.
  • Pre-built Docker images: Deploy with pre-built containers from GitHub Container Registry, or build from source if you need customization.
  • Advanced configuration: From the UI, you can adjust search parameters, performance settings, storage & retrieval settings, upload limits and more.
  • Modern frontend stack: Next.js 16 with React 19.2, design tokens for consistent styling, and responsive layouts that work on mobile.
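For a sense of what the frontend consumes from the progress stream, here's a small sketch of parsing the `data:` lines of a Server-Sent Events response. The event fields (`job_id`, `page`, `total`, `status`) are illustrative assumptions, not Snappy's exact schema:

```python
import json

def parse_sse_events(stream_text: str) -> list[dict]:
    """Parse the `data:` lines of an SSE stream into dicts.

    A minimal sketch: real SSE also allows `event:`/`id:` fields and
    multi-line data payloads, which this parser ignores.
    """
    events = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Example: the kind of progress updates an indexing job might emit.
sample = (
    'data: {"job_id": "abc", "page": 1, "total": 3, "status": "processing"}\n'
    "\n"
    'data: {"job_id": "abc", "page": 3, "total": 3, "status": "done"}\n'
)
for event in parse_sse_events(sample):
    print(f'{event["page"]}/{event["total"]}: {event["status"]}')
```

In the real app, the browser's built-in `EventSource` handles this parsing; the sketch just shows why SSE is a good fit here: it's plain text over HTTP, trivially inspectable, and streams incrementally.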

What Stayed the Same

The core architecture is still there:

  • Page-level retrieval with ColPali embeddings
  • Qdrant multivector search (rows/cols prefetch + reranking)
  • MinIO for image storage with public URLs
  • FastAPI backend with modular routers
  • Docker Compose for local development

If you used the template before, you'll recognize the structure. But hopefully you'll find the experience much more polished.

How to Use Snappy

Quick Start (Docker Compose)

# 1) Configure environment
cp .env.example .env
cp frontend/.env.example frontend/.env.local
# Add your OpenAI API key to frontend/.env.local

# 2) Start the ColPali embedding service (pick CPU or GPU)
cd colpali
docker compose --profile cpu up -d --build  # or --profile gpu

# 3) Start all services
cd ..
docker compose up -d --build

# Services will be available at:
# Backend:  http://localhost:8000
# Frontend: http://localhost:3000
# Qdrant:   http://localhost:6333
# MinIO:    http://localhost:9000

Using Pre-built Images

If you don't want to build from source, pull the pre-built images:

docker pull ghcr.io/athrael-soju/snappy/backend:latest
docker pull ghcr.io/athrael-soju/snappy/frontend:latest
docker pull ghcr.io/athrael-soju/snappy/colpali-cpu:latest

See docs/DOCKER_IMAGES.md for deployment examples.

Local Development

If you prefer running services locally:

  1. Start Qdrant and MinIO (Docker is fine)
  2. Start the ColPali service (from colpali/)
  3. In backend/, create a venv, install dependencies, and run:
    uvicorn backend:app --host 0.0.0.0 --port 8000 --reload
    
  4. In frontend/, install and run:
    yarn install --frozen-lockfile
    yarn dev
    

Real-World Use Cases

Snappy excels at scenarios where visual layout matters:

  • Legal document analysis: Search case files, contracts, and briefs by annotations, stamps, and document structure.
  • Medical records retrieval: Find patient charts and diagnostic reports by handwritten notes, stamps, and visual markers.
  • Financial auditing: Locate invoices, receipts, and statements by logos, signatures, and table layouts.
  • Academic research: Search papers by figures, tables, equations, and charts; ideal for literature reviews.
  • Archive management: Retrieve historical documents and scanned archives by visual appearance, preserving context that text extraction destroys.
  • Engineering documentation: Find blueprints, schematics, and technical drawings by visual elements and layout patterns.

The Frontend Experience

The Next.js frontend is designed to be fast and intuitive:

  • Upload: Drag-and-drop PDFs with live progress tracking
  • Search: Type a query and see visually similar pages ranked by relevance
  • Chat: Ask questions and get streaming responses with inline page citations
  • Configuration: Adjust search parameters, quantization settings, and upload limits from the UI
  • Maintenance: Clear collections, delete documents, and reinitialize services without touching the backend

Design tokens (text-body-*, size-icon-*) keep the UI consistent, and the responsive layout works on desktop and mobile.

Under the Hood

Indexing Flow

  1. PDF → images (pdf2image.convert_from_path)
  2. Images → embeddings (ColPali API)
  3. Save images to MinIO (public URLs)
  4. Upsert embeddings (original + mean-pooled rows/cols) to Qdrant with payload metadata
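Step 4 stores mean-pooled row and column vectors alongside the original token sequence. Assuming the patch embeddings form a row-major grid (with any special tokens already stripped — an assumption of this sketch), the pooling looks like:

```python
import numpy as np

def mean_pool_rows_cols(patch_vecs: np.ndarray, n_rows: int, n_cols: int):
    """Build the row/column mean-pooled multivectors for one page.

    patch_vecs: (n_rows * n_cols, dim) patch embeddings in row-major order.
    Returns (rows, cols) with shapes (n_rows, dim) and (n_cols, dim).
    """
    dim = patch_vecs.shape[1]
    grid = patch_vecs.reshape(n_rows, n_cols, dim)
    rows = grid.mean(axis=1)  # average the patches in each row
    cols = grid.mean(axis=0)  # average the patches in each column
    return rows, cols

# Tiny 2x2 grid with 1-D "embeddings" to show the arithmetic.
rows, cols = mean_pool_rows_cols(np.array([[1.0], [2.0], [3.0], [4.0]]), 2, 2)
print(rows.tolist())  # [[1.5], [3.5]]
print(cols.tolist())  # [[2.0], [3.0]]
```

The pooled vectors are much shorter than the full patch sequence, which is what makes them cheap enough to use as a first-pass filter at query time.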

Retrieval Flow

  1. Query → embedding (ColPali API)
  2. Qdrant multivector prefetch (rows/cols), then rerank with using="original"
  3. Fetch top-k page images from MinIO
  4. Stream an OpenAI-backed answer conditioned on user text + page images

Data Model in Qdrant

Each point stores three vectors (multivector):

  • original: full token sequence
  • mean_pooling_rows: pooled by rows
  • mean_pooling_columns: pooled by columns

Payload includes metadata like filename, page index, image URL, dimensions, and timestamps.

Optional Binary Quantization

Enable binary quantization in Qdrant to reduce memory and speed up search:

# In .env
QDRANT_USE_BINARY=True
QDRANT_BINARY_ALWAYS_RAM=True
QDRANT_SEARCH_RESCORE=True
QDRANT_SEARCH_OVERSAMPLING=2.0

Clear the collection and re-index after changing these settings.

API Overview

The backend exposes a clean REST API:

  • GET /health - check dependencies
  • GET /search?q=...&k=5 - top-k results with payload metadata
  • POST /index (multipart files[]) - upload and index PDFs
  • GET /progress/stream/{job_id} - Server-Sent Events for indexing progress
  • POST /index/cancel/{job_id} - cancel an in-progress indexing job
  • GET /status - collection stats and service health
  • POST /initialize - create collections and buckets
  • DELETE /delete - delete specific documents
  • POST /clear/qdrant | /clear/minio | /clear/all - maintenance
  • GET /config/schema - get the configuration schema
  • GET /config/values - get current configuration
  • POST /config/update - update runtime configuration
  • POST /config/reset - reset to defaults

Chat streaming lives in frontend/app/api/chat/route.ts. The route calls the backend search endpoint, invokes the OpenAI Responses API, and streams Server-Sent Events to the browser. The backend does not proxy OpenAI calls.

Troubleshooting

  • ColPali timing out? Increase COLPALI_API_TIMEOUT or run the GPU profile.
  • Progress bar stuck? Ensure Poppler is installed and check backend logs for PDF conversion errors.
  • Missing images? Verify MinIO credentials/URLs and confirm next.config.ts allows the domains you expect.
  • CORS issues? Replace wildcard ALLOWED_ORIGINS entries with explicit URLs before exposing the API publicly.
  • Config changes vanish? /config/update modifies runtime state only; update .env for anything you need to keep after a restart.
  • Upload rejected? The uploader currently accepts PDFs only. Adjust max size, chunk size, or file count limits in the configuration UI.

See backend/docs/configuration.md for advanced troubleshooting.

What's Next?

Some ideas I'm planning to explore in the future:

  • Multi-page understanding: track context across multiple pages, allowing Snappy to get the "big picture".
  • Knowledge graph integration: enhance Snappy with temporal reasoning and help overcome top-k limitations.
  • Snappy agents: introduce a vision-first agentic workflow, simplify ingestion and retrieval, and bring Snappy one step closer to production.

If you have ideas or run into issues, open an issue or PR on GitHub.

I hope you find Snappy useful. If you're working on document retrieval, especially in domains where visual structure matters, give it a try. And if you build something cool with it, let me know!

Thanks for reading.