Engineering notes on inference systems, latency, and hardware trends.
Blog

Inference systems notes for practical AI teams

Technical essays on time to first token (TTFT), cache design, model serving, hardware trends, low precision, and the operational shape of frontier inference APIs.

All · Latency · KV Cache · Hardware · Serving · Agents