The Most Beautiful RAG: Starring ColPali, Qdrant, Minio and Friends


Vision RAG Template

A Follow‑up: From Little Scripts to a Full Template

A few weeks ago I shared: The Most Beautiful RAG: Starring Colnomic, Qdrant, Minio and Friends. That project explored late‑interaction retrieval with ColPali‑style embeddings, Qdrant, and MinIO, plus a bunch of performance tricks.

This post levels that up into a reusable template you can clone, run, and build on.

If you want a minimal, API‑first Vision RAG that does page‑level retrieval over PDFs, you’re in the right place.

What It Is

High‑Level Architecture

The core pieces:

(Architecture diagram)

Next.js Frontend

Screens: Home, Upload, Search, Chat, Maintenance, and About (screenshots in the original post).

Indexing Flow

  1. PDF → images (pdf2image.convert_from_path)
  2. Images → embeddings (external ColPali API)
  3. Save images to MinIO (public URLs)
  4. Upsert embeddings (original + mean‑pooled rows/cols) to Qdrant with payload metadata
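Step 4 above stores mean-pooled rows and columns alongside the original embeddings. Here is a minimal sketch of that pooling, assuming ColPali-style patch embeddings laid out row-major over a `grid_h × grid_w` patch grid (the function name and grid convention are mine, not from the template):

```python
def mean_pool_rows_cols(patches, grid_h, grid_w):
    """Reduce a (grid_h * grid_w) list of patch embeddings to row/column
    summaries. Each row vector averages one horizontal strip of patches;
    each column vector averages one vertical strip. Both are far shorter
    than the full multivector, which makes them cheap prefetch targets."""
    dim = len(patches[0])
    rows = []
    for r in range(grid_h):
        strip = patches[r * grid_w:(r + 1) * grid_w]
        rows.append([sum(v[d] for v in strip) / grid_w for d in range(dim)])
    cols = []
    for c in range(grid_w):
        strip = [patches[r * grid_w + c] for r in range(grid_h)]
        cols.append([sum(v[d] for v in strip) / grid_h for d in range(dim)])
    return rows, cols
```

For a 32×32 patch grid this shrinks 1024 vectors down to 32 + 32, which is what makes the prefetch stage in the retrieval flow fast.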

Retrieval Flow

  1. Query → embedding (ColPali API)
  2. Qdrant multivector prefetch (rows/cols), then rerank with using="original"
  3. Fetch top‑k page images from MinIO
  4. Stream an OpenAI‑backed answer conditioned on user text + page images
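Step 2 of the retrieval flow maps onto Qdrant's Query API: prefetch candidates against the cheap pooled vectors, then rerank them with `using="original"`. A sketch of the request body (the vector names `mean_rows` and `mean_cols` are my assumptions; `original` matches the source):

```python
def build_query_body(query_multivector, k=5):
    """Build a Qdrant Query API body: prefetch on the mean-pooled
    vectors, then rerank the candidate set with the full-resolution
    "original" multivector (late interaction)."""
    return {
        "prefetch": [
            {"query": query_multivector, "using": "mean_rows", "limit": 10 * k},
            {"query": query_multivector, "using": "mean_cols", "limit": 10 * k},
        ],
        "query": query_multivector,
        "using": "original",
        "limit": k,
        "with_payload": True,
    }

# POST this body to {QDRANT_URL}/collections/{collection}/points/query
body = build_query_body([[0.1, 0.2], [0.3, 0.4]], k=5)
```

The prefetch limits are deliberately generous (10× `k`) so the expensive reranking stage has enough candidates to work with.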

Quickstart (Docker Compose)

# 1) Configure env
cp .env.example .env
# Set OPENAI_API_KEY / OPENAI_MODEL
# Choose COLPALI_MODE=cpu|gpu (or set COLPALI_API_BASE_URL to override)

# 2) Start the ColPali Embedding API (separate compose, from colpali/)
# CPU at http://localhost:7001 or GPU at http://localhost:7002
docker compose -f colpali/docker-compose.yml up -d api-cpu  # or api-gpu

# 3) Start all services
docker compose up -d

# Services
# Qdrant:   http://localhost:6333  (Dashboard at /dashboard)
# MinIO:    http://localhost:9000  (Console: http://localhost:9001, user/pass: minioadmin/minioadmin)
# API:      http://localhost:8000  (OpenAPI: http://localhost:8000/docs)
# Frontend: http://localhost:3000   (if enabled)

Open the docs at http://localhost:8000/docs and try the endpoints.

Local Development (without Compose)

cp .env.example .env
# set OPENAI_API_KEY, OPENAI_MODEL, QDRANT_URL, MINIO_URL, COLPALI_API_BASE_URL
uvicorn backend:app --host 0.0.0.0 --port 8000 --reload

Environment Variables (high‑value ones)

See .env.example for a minimal starting point.

ColPali API contract (expected)

The backend expects a ColPali‑style embedding API that exposes endpoints for embedding text queries and page images.

Ensure your embedding server matches this contract to avoid client/runtime errors.
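For illustration, a request payload for an image-embedding endpoint might be built like this. Note the endpoint path and field names here are purely hypothetical; check the contract of the actual ColPali API you point `COLPALI_API_BASE_URL` at:

```python
import base64

def build_embed_request(image_bytes_list):
    """Illustrative payload for a hypothetical POST /embed/images
    endpoint: images are base64-encoded so they can travel in JSON.
    The "images" field name is an assumption, not the real contract."""
    return {
        "images": [base64.b64encode(b).decode("ascii") for b in image_bytes_list]
    }
```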

Data model in Qdrant

Each point stores three named vectors (multivector): the original patch embeddings plus the mean‑pooled rows and mean‑pooled columns used for prefetch.

Payload example:

{
  "index": 12,
  "page": "Page 3",
  "image_url": "http://localhost:9000/documents/images/<id>.png",
  "document_id": "<id>",
  "filename": "file.pdf",
  "file_size_bytes": 123456,
  "pdf_page_index": 3,
  "total_pages": 10,
  "page_width_px": 1654,
  "page_height_px": 2339,
  "indexed_at": "2025-01-01T00:00:00Z"
}

Binary quantization (optional)

Enable Qdrant binary quantization to reduce memory and speed up search while preserving quality via rescore/oversampling.
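In Qdrant's REST API this takes two pieces of config: a collection-side quantization setting, and query-time search params that turn on oversampling and rescoring. A sketch (the oversampling factor of 2.0 is an illustrative choice, not the template's value):

```python
# Collection-side: keep 1-bit quantized copies of the vectors in RAM.
quantization_config = {"binary": {"always_ram": True}}

# Query-time: pull extra candidates from the quantized index
# (oversampling), then rescore them with the original full-precision
# vectors so final ranking quality is preserved.
search_params = {
    "quantization": {"ignore": False, "rescore": True, "oversampling": 2.0}
}
```

Binary quantization cuts vector memory roughly 32× (one bit per float32 dimension), which matters with multivectors since every page stores hundreds of patch embeddings.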

Using the API

API Examples

# Search
curl "http://localhost:8000/search?q=What%20is%20the%20booking%20reference%3F&k=5"

# Chat (non‑streaming)
curl -X POST http://localhost:8000/chat \
  -H 'Content-Type: application/json' \
  -d '{
    "message": "What is the booking reference for case 002?",
    "k": 5,
    "ai_enabled": true
  }'

Why ColPali‑style Retrieval Here?

If you read my previous post, you'll recognize the same ideas here: mean‑pooled vectors for fast prefetch, then final reranking with the full‑resolution embeddings.
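The reranking step scores pages with late interaction (MaxSim): every query token embedding picks its best-matching page patch, and those maxima are summed. A minimal sketch of that scoring function, using plain dot products:

```python
def maxsim(query_vectors, doc_vectors):
    """Late-interaction relevance score: for each query token embedding,
    take the max dot product against all document patch embeddings,
    then sum those per-token maxima."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)
```

Because each query token is matched independently, a page scores well if it covers every part of the query somewhere on the page, which is exactly what you want for dense, visually-laid-out PDFs.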

Troubleshooting

I hope you find this useful! Let me know if you have any questions or run into any issues.

Just kidding, nobody makes it this far, lel

