Nvidia's B200 GPUs break speed records

Nvidia has broken through prior barriers with its B200 GPUs. We have conducted independent benchmarking and are seeing >1,000 output tokens/s on Llama 4 Maverick, >10x the speed of some other providers. This is the fastest Maverick endpoint that we have benchmarked so far. Exciting times ahead for developers once B200-based APIs are publicly available.

Artificial Analysis, what is the number of #concurrentusers on each vendor’s machine for these results? Without that, this is a useless metric, as the economics of #lowlatency do not work.

I love NVIDIA, but are the numbers really all about FP8? They have also used misleading advertising in the past. See the image here... they compared FP8 on a 4090 with FP4 on a new 5090. Blackwell is surely better, but 5x-6x... I doubt it.

Who needs some going live soon? 😉
