Tag: llm-serving
All the articles with the tag "llm-serving".
-
The Price of Anarchy in Disaggregated Inference
I split NVIDIA Dynamo's prefill and decode into three competing games and measured the Price of Anarchy on a 3-node B200 cluster. While the GPUs had headroom, no router tuning moved the needle; the moment they saturated, one parameter was the gap between a 1-second tail and a 28-second one. So I built a 270-line controller that watches for that moment and flips the switch, without touching Dynamo's core.
Athrael.net