Nvidia has broken through prior barriers with their B200 GPUs. In our independent benchmarking, we are seeing >1,000 output tokens/s on Llama 4 Maverick, more than 10X the speed of some other providers. This is the fastest Maverick endpoint we have benchmarked to date. Exciting times ahead for developers once B200-based APIs are publicly available.
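For context on how a number like this is typically measured: output tokens/s is computed from a streamed response by counting generated tokens and dividing by the wall-clock time from first to last token. A minimal sketch of that calculation; the trace below is hypothetical, not data from the benchmark above:

```python
def output_tokens_per_second(token_timestamps: list[float]) -> float:
    """Throughput from per-token arrival times (in seconds),
    measured over the window from the first token to the last."""
    if len(token_timestamps) < 2:
        raise ValueError("need at least two tokens to measure a rate")
    elapsed = token_timestamps[-1] - token_timestamps[0]
    # Tokens delivered after the first one, over the elapsed window.
    return (len(token_timestamps) - 1) / elapsed

# Hypothetical trace: 1,001 tokens arriving at a steady 1 ms interval.
trace = [i * 0.001 for i in range(1001)]
print(round(output_tokens_per_second(trace)))  # 1000
```

Real harnesses also report time-to-first-token separately, since streaming speed and initial latency are different user experiences.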
I love NVIDIA, but are these numbers really all about FP8? They have run misleading comparisons in the past. See the image here... they compared FP8 on a 4090 with FP4 on a new 5090. Blackwell is surely better, but 5x-6x? I doubt it.
Cerebras missing?
Who needs one of these going live soon? 😉
Congrats! 🎉
Artificial Analysis - what is the number of #concurrentusers on each vendor's machine for these results? Without that, this is a useless metric, as the economics of #lowlatency do not work.
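The concurrency question matters because a vendor can trade per-user speed against aggregate throughput: the same hardware serving one request at full speed looks very different under heavy batching. A toy illustration of that trade-off, assuming aggregate decode throughput is shared evenly across requests (all figures hypothetical):

```python
def per_user_tokens_per_second(aggregate_tps: float, concurrent_users: int) -> float:
    """Rough per-user throughput when a node's aggregate decode
    throughput is split evenly across concurrent requests."""
    if concurrent_users < 1:
        raise ValueError("need at least one user")
    return aggregate_tps / concurrent_users

# Hypothetical: 50,000 tokens/s aggregate across the whole node.
print(per_user_tokens_per_second(50_000, 1))    # 50000.0 tokens/s for a lone user
print(per_user_tokens_per_second(50_000, 500))  # 100.0 tokens/s each at 500 users
```

In practice the split is not perfectly even (batching usually raises aggregate throughput while lowering per-user speed), which is exactly why a headline tokens/s figure needs its concurrency level stated.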