Tag: optimization
All the articles with the tag "optimization".
-
The Price of Anarchy in Disaggregated Inference
I split NVIDIA Dynamo's prefill and decode into three competing games and measured the Price of Anarchy on a 3-node B200 cluster. While the GPUs had headroom, no router tuning moved the needle; the moment they saturated, a single parameter made the difference between a 1-second tail latency and a 28-second one. So I built a 270-line controller that watches for that moment and flips the switch, without touching Dynamo's core.
-
Diminishing Returns and the Art of Knowing When to Stop
I trained three generations of ColQwen3.5, each with more sophisticated optimization than the last. The most optimized version barely beat the previous one on the primary benchmark (+0.0011 nDCG@5). Individual task scores, however, reshuffled substantially, with per-task swings an order of magnitude larger than the aggregate gain.
Athrael.net