
The Most Beautiful RAG: Starring ColPali, Qdrant, Minio and Friends

Vision RAG Template

A Follow‑up: From Little Scripts to a Full Template

A few weeks ago I shared: The Most Beautiful RAG: Starring Colnomic, Qdrant, Minio and Friends. That project explored late‑interaction retrieval with ColPali‑style embeddings, Qdrant, and MinIO, plus a bunch of performance tricks.

This post levels that up into a reusable template you can clone, run, and build on.

If you want a minimal, API‑first Vision RAG that does page‑level retrieval over PDFs, you’re in the right place.

What It Is

High‑Level Architecture

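The core pieces are a backend API (served with uvicorn), Qdrant for the multivector embeddings, MinIO for page images, an external ColPali embedding API, OpenAI for the final answer, and an optional Next.js frontend.

[Architecture diagram]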

Next.js Frontend

Screens: Home, Upload, Search, Chat, Maintenance, and About.

Indexing Flow

  1. PDF → images (pdf2image.convert_from_path)
  2. Images → embeddings (external ColPali API)
  3. Save images to MinIO (public URLs)
  4. Upsert embeddings (original + mean‑pooled rows/cols) to Qdrant with payload metadata
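
Step 4 stores two pooled variants next to the original embeddings. Here is a minimal sketch of that pooling, assuming the page embedding arrives as a row-major grid of patch vectors. The function name and the grid layout are my illustration, not the template's actual code, and a real ColPali response may also carry special tokens that need to be handled separately.

```python
from typing import List, Tuple

Vector = List[float]


def mean_pool_rows_and_cols(
    patches: List[Vector], n_rows: int, n_cols: int
) -> Tuple[List[Vector], List[Vector]]:
    """Pool a row-major grid of patch vectors into per-row and per-column means.

    Hypothetical helper: assumes exactly n_rows * n_cols patch vectors.
    """
    if len(patches) != n_rows * n_cols:
        raise ValueError("expected n_rows * n_cols patch vectors")
    dim = len(patches[0])

    def mean(vectors: List[Vector]) -> Vector:
        # Element-wise average of equally sized vectors.
        return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

    # Rebuild the 2D grid from the flat, row-major patch list.
    grid = [patches[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]
    rows = [mean(row) for row in grid]
    cols = [mean([grid[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    return rows, cols
```

A 32x32 patch grid shrinks to 32 row vectors and 32 column vectors, which is what makes the prefetch stage cheap compared with scoring all 1024 patches.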

Retrieval Flow

  1. Query → embedding (ColPali API)
  2. Qdrant multivector prefetch (rows/cols), then rerank with using="original"
  3. Fetch top‑k page images from MinIO
  4. Stream an OpenAI‑backed answer conditioned on user text + page images
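
Step 2 can be sketched as a request body for Qdrant's Query API (`POST /collections/<name>/points/query`). Only `using="original"` comes from the flow above. The pooled vector names, the prefetch limit, and the helper name are assumptions for illustration, so adjust them to whatever your collection actually defines.

```python
from typing import Any, Dict, List

# Assumed names for the pooled vectors; only "original" appears in this post.
POOLED_VECTOR_NAMES = ("mean_pooling_rows", "mean_pooling_columns")


def build_query_request(
    query_embedding: List[List[float]], k: int = 5, prefetch_limit: int = 200
) -> Dict[str, Any]:
    """Build a two-stage query: cheap pooled prefetch, then full rerank."""
    prefetch = [
        {"query": query_embedding, "using": name, "limit": prefetch_limit}
        for name in POOLED_VECTOR_NAMES
    ]
    return {
        "prefetch": prefetch,        # stage 1: candidates from pooled vectors
        "query": query_embedding,    # stage 2: rerank the candidates...
        "using": "original",         # ...against the full multivector
        "limit": k,
        "with_payload": True,        # payload carries image_url for step 3
    }
```

The prefetch limit trades recall for speed: a larger shortlist gives the reranker more to work with, at the cost of more full-resolution comparisons.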

Quickstart (Docker Compose)

# 1) Configure env
cp .env.example .env
# Set OPENAI_API_KEY / OPENAI_MODEL
# Choose COLPALI_MODE=cpu|gpu (or set COLPALI_API_BASE_URL to override)

# 2) Start the ColPali Embedding API (separate compose, from colpali/)
# CPU at http://localhost:7001 or GPU at http://localhost:7002
docker compose -f colpali/docker-compose.yml up -d api-cpu  # or api-gpu

# 3) Start all services
docker compose up -d

# Services
# Qdrant:   http://localhost:6333  (Dashboard at /dashboard)
# MinIO:    http://localhost:9000  (Console: http://localhost:9001, user/pass: minioadmin/minioadmin)
# API:      http://localhost:8000  (OpenAPI: http://localhost:8000/docs)
# Frontend: http://localhost:3000   (if enabled)

Open the docs at http://localhost:8000/docs and try the endpoints.

Local Development (without Compose)

cp .env.example .env
# set OPENAI_API_KEY, OPENAI_MODEL, QDRANT_URL, MINIO_URL, COLPALI_API_BASE_URL
uvicorn backend:app --host 0.0.0.0 --port 8000 --reload

Environment Variables (high‑value ones)

See .env.example for a minimal starting point.
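
To show how the variables mentioned in this post fit together, here is a hedged sketch of a settings loader. The variable names and the CPU/GPU ports come from the sections above. The `Settings` class, the `cpu` default, and the localhost fallbacks are my assumptions, not the template's real config module.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Ports taken from the Quickstart: CPU on 7001, GPU on 7002.
COLPALI_DEFAULT_URLS = {
    "cpu": "http://localhost:7001",
    "gpu": "http://localhost:7002",
}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str
    qdrant_url: str
    minio_url: str
    colpali_api_base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # An explicit COLPALI_API_BASE_URL overrides the cpu/gpu mode.
    base_url = env.get("COLPALI_API_BASE_URL")
    if not base_url:
        mode = env.get("COLPALI_MODE", "cpu").lower()  # assumed default
        if mode not in COLPALI_DEFAULT_URLS:
            raise ValueError("COLPALI_MODE must be 'cpu' or 'gpu'")
        base_url = COLPALI_DEFAULT_URLS[mode]
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        openai_model=env.get("OPENAI_MODEL", ""),
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        minio_url=env.get("MINIO_URL", "http://localhost:9000"),
        colpali_api_base_url=base_url,
    )
```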

ColPali API contract (expected)

The backend expects a ColPali-style embedding API that embeds page images at indexing time and query text at retrieval time.

Ensure your embedding server matches this contract to avoid client/runtime errors.

Data model in Qdrant

Each point stores three named vectors (multivector): the original embeddings, plus mean-pooled rows and mean-pooled columns.

Payload example:

{
  "index": 12,
  "page": "Page 3",
  "image_url": "http://localhost:9000/documents/images/<id>.png",
  "document_id": "<id>",
  "filename": "file.pdf",
  "file_size_bytes": 123456,
  "pdf_page_index": 3,
  "total_pages": 10,
  "page_width_px": 1654,
  "page_height_px": 2339,
  "indexed_at": "2025-01-01T00:00:00Z"
}
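
A payload like the one above could be assembled with a small helper. This is a sketch, not the template's code: the function name and signature are hypothetical, and I'm assuming `pdf_page_index` is 1-based because the example pairs `"Page 3"` with `3`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_page_payload(
    index: int,
    document_id: str,
    filename: str,
    file_size_bytes: int,
    pdf_page_index: int,
    total_pages: int,
    page_width_px: int,
    page_height_px: int,
    image_url: str,
    indexed_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build the per-page payload stored alongside the vectors."""
    # Assumption: page indices are 1-based, matching "Page 3" <-> 3 above.
    if not 1 <= pdf_page_index <= total_pages:
        raise ValueError("pdf_page_index must be within 1..total_pages")
    # Expects a timezone-aware datetime; defaults to the current UTC time.
    stamp = (indexed_at or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "index": index,
        "page": f"Page {pdf_page_index}",
        "image_url": image_url,
        "document_id": document_id,
        "filename": filename,
        "file_size_bytes": file_size_bytes,
        "pdf_page_index": pdf_page_index,
        "total_pages": total_pages,
        "page_width_px": page_width_px,
        "page_height_px": page_height_px,
        "indexed_at": stamp.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```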

Binary quantization (optional)

Enable Qdrant binary quantization to reduce memory and speed up search while preserving quality via rescore/oversampling.
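
As a rough illustration, these are the request bodies Qdrant's REST API accepts for a multivector collection with binary quantization, plus the search-time rescore settings. The vector names (other than `original`), the 128-dimension size, and the oversampling factor are assumptions here, so check them against your embedding model and the template's own defaults.

```python
from typing import Any, Dict

# Assumed vector names; only "original" is named in this post.
VECTOR_NAMES = ("original", "mean_pooling_rows", "mean_pooling_columns")


def build_collection_config(dim: int = 128, binary_quantization: bool = True) -> Dict[str, Any]:
    """Body for PUT /collections/<name>: three MaxSim multivectors."""
    body: Dict[str, Any] = {
        "vectors": {
            name: {
                "size": dim,  # assumed embedding width
                "distance": "Cosine",
                "multivector_config": {"comparator": "max_sim"},
            }
            for name in VECTOR_NAMES
        }
    }
    if binary_quantization:
        # Keep the 1-bit vectors in RAM; originals are still used for rescoring.
        body["quantization_config"] = {"binary": {"always_ram": True}}
    return body


def build_search_params(oversampling: float = 2.0) -> Dict[str, Any]:
    """Value for the "params" field of a query: oversample, then rescore."""
    return {"quantization": {"rescore": True, "oversampling": oversampling}}
```

Oversampling fetches more candidates than requested using the quantized vectors, and rescoring then re-ranks them with the original vectors, which is how quality is recovered.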

Using the API

API Examples

# Search
curl "http://localhost:8000/search?q=What%20is%20the%20booking%20reference%3F&k=5"

# Chat (non‑streaming)
curl -X POST http://localhost:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "message": "What is the booking reference for case 002?",
    "k": 5,
    "ai_enabled": true
  }'
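
The same two calls from Python, using only the standard library. The endpoints and fields mirror the curl examples above. The helper names are mine, and since the response shape isn't covered here, `fetch_text` just returns the raw body.

```python
import json
import urllib.parse
import urllib.request
from typing import Union

API_BASE = "http://localhost:8000"


def build_search_url(question: str, k: int = 5, base: str = API_BASE) -> str:
    """URL for GET /search, percent-encoding the question like the curl call."""
    query = urllib.parse.urlencode(
        {"q": question, "k": k}, quote_via=urllib.parse.quote
    )
    return f"{base}/search?{query}"


def build_chat_request(
    message: str, k: int = 5, ai_enabled: bool = True, base: str = API_BASE
) -> urllib.request.Request:
    """POST /chat request with the same JSON body as the curl example."""
    body = json.dumps({"message": message, "k": k, "ai_enabled": ai_enabled})
    return urllib.request.Request(
        f"{base}/chat",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_text(target: Union[str, urllib.request.Request], timeout: float = 30.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(target, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the stack running, `fetch_text(build_search_url("What is the booking reference?"))` performs the search, and `fetch_text(build_chat_request("What is the booking reference for case 002?"))` sends the chat message.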

Why ColPali‑style Retrieval Here?

If you read my previous post, you'll recognize the same approach here: mean-pooled vectors for a fast prefetch, then a final rerank with the full-resolution embeddings.
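
For intuition, here is a toy, pure-Python version of that two-stage idea using late-interaction (MaxSim) scoring. In the template Qdrant does this work server-side. The page dictionaries and function names below are made up for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def max_sim(query_vectors: Sequence[Vector], doc_vectors: Sequence[Vector]) -> float:
    """Late interaction: each query vector keeps its best match, then sum."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


def prefetch_then_rerank(
    query_vectors: Sequence[Vector],
    pages: List[Dict],
    prefetch_limit: int,
    k: int,
) -> List[str]:
    """Shortlist pages on small pooled vectors, rerank on full embeddings.

    Each page is a dict with hypothetical keys "id", "pooled", "original".
    """
    shortlist = sorted(
        pages, key=lambda p: max_sim(query_vectors, p["pooled"]), reverse=True
    )[:prefetch_limit]
    reranked = sorted(
        shortlist, key=lambda p: max_sim(query_vectors, p["original"]), reverse=True
    )
    return [p["id"] for p in reranked[:k]]
```

The pooled pass only has to be good enough to keep the right pages in the shortlist. The expensive full comparison then runs on a handful of candidates instead of the whole collection.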

Troubleshooting

I hope you find this useful! Let me know if you have any questions or run into any issues.

Just kidding, nobody makes it this far, lel

