Tag: llm-serving

All the articles with the tag "llm-serving".

The Price of Anarchy in Disaggregated Inference

17 Jun, 2026

I split NVIDIA Dynamo's prefill and decode into three competing games and measured the Price of Anarchy on a 3-node B200 cluster. While the GPUs had headroom, no router tuning moved the needle; the moment they saturated, one parameter was the gap between a 1-second tail and a 28-second one. So I built a 270-line controller that watches for that moment and flips the switch, without touching Dynamo's core.
