Archives

All the articles I've archived.

2026 ⁴

June ²

The Price of Anarchy in Disaggregated Inference

17 Jun, 2026

I split NVIDIA Dynamo's prefill and decode into three competing games and measured the Price of Anarchy on a 3-node B200 cluster. While the GPUs had headroom, no router tuning moved the needle; the moment they saturated, one parameter was the gap between a 1-second tail and a 28-second one. So I built a 270-line controller that watches for that moment and flips the switch, without touching Dynamo's core.

Mine the Way Your Model Scores: MaxSim Hard-Negative Mining for a Late-Interaction Student

Updated: 7 Jun, 2026

The standard way to mine hard negatives for a late-interaction model uses a single-vector cosine teacher, even though the model itself scores with multi-vector MaxSim. So I rebuilt my miner to score the way my model does. Matched mining clearly beat training with no mined negatives, while the cosine approach was barely doing anything at all.

March ¹

Diminishing Returns and the Art of Knowing When to Stop

15 Mar, 2026

I trained three generations of ColQwen3.5, each with more sophisticated optimization than the last. The most optimized version barely beat the previous one on the primary benchmark (+0.0011 nDCG@5). Individual tasks reshuffled substantially, with per-task swings an order of magnitude larger than the aggregate gain.

January ¹

Closing the AI Value Gap: Insights from Research

25 Jan, 2026

Enterprise AI adoption has reached 88%, yet only 5% of pilots deliver measurable impact. Research from MIT, BCG, and RAND reveals what separates successful implementations from the rest. It's not the technology.

2025 ¹⁰

December ¹

Implementing Spatially-Grounded Document Retrieval via Patch-to-Region Propagation

2 Dec, 2025

A deep dive into my recent research on spatially-grounded document retrieval using ColPali models and OCR bounding boxes, enabling precise region-level retrieval at inference time without additional training.

October ¹

Snappy: Your Vision Retrieval Buddy!

27 Oct, 2025

How Snappy evolved from the nextjs-fastapi-colpali template into a vision-first document retrieval system.

August ³

You too can run the Vidore Benchmark with less than 32GB of GPU VRAM

21 Aug, 2025

Quick, practical notes to run the Vidore benchmark smoothly on a single 32GB GPU: dtype, batch size, and common OOM fixes.

The Most Beautiful RAG: Starring ColPali, Qdrant, Minio and Friends

Updated: 1 Sep, 2025

An end-to-end, page-level Vision RAG template with ColPali-style embeddings, Qdrant multivector retrieval (with optional binary quantization), and MinIO-backed storage — dockerized and API-first.

ColQwen2.5 FastAPI Integration

11 Aug, 2025

A little-script to create a FastAPI server for ColQwen2.5

July ²

Audio RAG with ColQwen2.5-Omni

16 Jul, 2025

An audio RAG system that processes video URLs and answers questions about their content using ColQwen2.5-Omni and OpenAI audio.

The Most Beautiful RAG: Starring Colnomic, Qdrant, Minio and Friends

Updated: 7 Jul, 2025

Introducing the first project in my little-scripts monorepo: a simple yet beautiful RAG implementation using Colnomic, Qdrant and Nomic.

April ¹

Mapping Worlds into Graphs with Qdrant, Neo4j, RF-DETR, BLIP-2 and Kung Fu

29 Apr, 2025

Diving deeper into the GraphRAG rabbit hole, I explore how to transform real-world video data into knowledge graphs using RF-DETR for object detection and BLIP-2 for intelligent entity description - setting the foundation for context-aware retrieval systems.

March ²

Down the Rabbit Hole - One step closer to Production Grade GraphRAG

25 Mar, 2025

After my initial experiment with GraphRAG using Qdrant, Neo4j, and Ollama, I set out to build a more dynamic and context-aware system. This post dives into the details of how I constructed a dynamic ontology for NLP GraphRAG.

GraphRAG with Qdrant, Neo4j, and Ollama (Using Qwen2.5:3b and Nomic text embeddings)

5 Mar, 2025

I've been playing with a new approach to RAG systems - combining vector search with knowledge graphs for more contextual, relationship-aware answers. Here's what I've built, how it works, and why you might want to try it yourself.

2024 ¹⁰

October ¹

Crazy good Observability using Grafana Alloy

9 Oct, 2024

Learn how to quickly set up a complete Grafana Alloy observability stack with just a few commands.

August ¹

Superfast Telemetry Setup for Next.js with OpenTelemetry, Prometheus, and Grafana

26 Aug, 2024

Learn how to quickly set up a super cool telemetry system for your Next.js application using OpenTelemetry, Prometheus, and Grafana.

July ¹

It must've been love. Is it over now?

11 Jul, 2024

It must've been love. But is it over now?

April ¹

Raising Artificial Intelligence

18 Apr, 2024

Artificial Intelligence, especially Large Language Models like GPT-4, can be viewed through the parent-child relationship lens, reflecting the care and responsibility akin to raising a child. This perspective helps balance AI’s capabilities with societal impacts, ethical considerations, and risk management, without implying AI sentience or diminishing human complexities.

March ¹

Is Generative AI the Answer to Everything?

9 Mar, 2024

Is Generative AI the Answer to Everything? No, but it's a powerful tool that can be augmented with traditional methods and new technologies to address a broader spectrum of challenges.

February ⁴

How much would it cost to store a 1 hour, 60fps 4k Video in a RAG model?

Updated: 22 Feb, 2024

A simple experiment to calculate the cost of storing a 1 hour, 60fps 4k Video in a RAG model. For no practical reason whatsoever.

Naive NoSQL Conversational History Retrieval for Dummies

19 Feb, 2024

Persistent memory in Generative AI is a crucial component that allows the AI to remember and recall information from previous interactions. In this article, we'll explore the 'Naive NoSQL Conversational History Retrieval Strategy'.

So, I've been doing stuff...

14 Feb, 2024

2023 was an exciting year for learning. I went from a hibernating bear to a busy bee, and I've gotten back in touch with my long-lost passion: Artificial Intelligence. It's time to reflect on the past and look forward to the future.

A brief analysis on RAG with Pinecone Serverless and Unstructured.io

Updated: 9 Feb, 2024

In this article, I provide a brief analysis of the performance of a RAG model using Pinecone Serverless and Unstructured.io, and of the streaming chat experience. This analysis was done using a Next.js based template named [Titanium](https://github.com/athrael-soju/Titanium), which already incorporates several advanced Generative AI features.

January ¹

Integrating Vision using the latest OpenAI API

30 Jan, 2024

In this article, I'll be integrating Vision into the AI chat assistant I've been building in the previous articles. This work will include creating new UI components for the Vision API, as well as creating new API routes to support the new functionality.

2023 ⁷

December ⁵

Thoughts on the Latest OpenAI APIs and starting a New Project

Updated: 16 Dec, 2023

Lately I've been playing around with the latest OpenAI APIs to see what the buzz is all about. I thought it would be a good time to start a new project from scratch, and I've been working on it for the last month or so. It's still in early stages, but I think I've learned enough to have a decent perspective on what OpenAI has to offer and how it can be used to build interesting applications.

Integrating multi-user Assistants using the latest OpenAI API

Updated: 13 Dec, 2023

In this article, I'll share my experience integrating multi-user assistants with the OpenAI API. This includes building the UI components for the user Assistant, adding functionality to delete all Assistant-related data and files, and handling file uploads in the Assistant.

Integrating Next-Auth in a Streaming AI Chat Assistant Using Material-UI

4 Dec, 2023

This guide covers the step-by-step process of integrating next-auth for authentication in a Next.js project, using Material-UI for styling. It includes acquiring OAuth credentials from GitHub and Google, configuring environment variables, setting up authentication providers, and implementing Material-UI components for a responsive user interface.

Creating an OpenAI Law Copilot - A Guide to Building an AI Legal Assistant

2 Dec, 2023

A guide to building an AI legal assistant using OpenAI's API to handle various legal tasks efficiently.

Creating a Customized Input Component in a Streaming AI Chat Assistant Using Material-UI

30 Nov, 2023

Creating a customized input component for a streaming AI chat assistant using Material-UI and React.

November ²

Integrating Markdown in Streaming Chat for AI Assistants

29 Nov, 2023

A guide to integrating Markdown in streaming chat for AI assistants.

Athrael.net - A Tech Blog about everything, everywhere, all at once. 'Pun intended'

28 Nov, 2023

Inaugural post introducing Athrael.net, a space where AI meets real-world applications, driven by a seasoned software engineer’s journey and insights.
