Nvidia has broken through prior barriers with their B200 GPUs. In our independent benchmarking, we are seeing >1,000 output tokens/s on Llama 4 Maverick, more than 10X the speed of some other providers. This is the fastest Maverick endpoint we have benchmarked to date. Exciting times ahead for developers once B200-based APIs are publicly available.
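For context on how a number like this is typically measured: output tokens/s is computed from a streamed response by counting generated tokens and dividing by the wall-clock time from first to last token. A minimal sketch of that calculation; the trace below is hypothetical, not data from the benchmark above:

```python
def output_tokens_per_second(token_timestamps: list[float]) -> float:
    """Throughput from per-token arrival times (in seconds),
    measured over the window from the first token to the last."""
    if len(token_timestamps) < 2:
        raise ValueError("need at least two tokens to measure a rate")
    elapsed = token_timestamps[-1] - token_timestamps[0]
    # Tokens delivered after the first one, over the elapsed window.
    return (len(token_timestamps) - 1) / elapsed

# Hypothetical trace: 1,001 tokens arriving at a steady 1 ms interval.
trace = [i * 0.001 for i in range(1001)]
print(round(output_tokens_per_second(trace)))  # 1000
```

Real harnesses also report time-to-first-token separately, since streaming speed and initial latency are different user experiences.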
I love NVIDIA, but are these numbers really all about FP8? They have run misleading comparisons in the past. See the image here... they compared FP8 on a 4090 with FP4 on a new 5090. Blackwell is surely better, but 5x-6x? I doubt it.
Cerebras missing?
Who needs one of these going live soon? 😉
Congrats! 🎉
Artificial Analysis - what is the number of #concurrentusers on each vendor's machine for these results? Without that, this is a useless metric, as the economics of #lowlatency do not work.
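The concurrency question matters because a vendor can trade per-user speed against aggregate throughput: the same hardware serving one request at full speed looks very different under heavy batching. A toy illustration of that trade-off, assuming aggregate decode throughput is shared evenly across requests (all figures hypothetical):

```python
def per_user_tokens_per_second(aggregate_tps: float, concurrent_users: int) -> float:
    """Rough per-user throughput when a node's aggregate decode
    throughput is split evenly across concurrent requests."""
    if concurrent_users < 1:
        raise ValueError("need at least one user")
    return aggregate_tps / concurrent_users

# Hypothetical: 50,000 tokens/s aggregate across the whole node.
print(per_user_tokens_per_second(50_000, 1))    # 50000.0 tokens/s for a lone user
print(per_user_tokens_per_second(50_000, 500))  # 100.0 tokens/s each at 500 users
```

In practice the split is not perfectly even (batching usually raises aggregate throughput while lowering per-user speed), which is exactly why a headline tokens/s figure needs its concurrency level stated.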