NVIDIA Shares Blackwell GPU Compute Stats: 30% More FP64 Than Hopper, 30x Faster In Simulation & Science, 18X Faster Than CPUs

erek

[H]F Junkie
Joined: Dec 19, 2005
Messages: 11,015

"NVIDIA has shared more performance statistics of its next-gen Blackwell GPU architecture which has taken the industry by storm. The company shared several metrics including its science, AI, & simulation results versus the outgoing Hopper chips and competing x86 CPUs when using Grace-powered Superchip modules.

NVIDIA's Monumental Performance Gains With Blackwell GPUs Aren't Just Limited To AI; Science & Simulation See Huge Boost Too

In a new blog post, NVIDIA has shared how Blackwell GPUs are going to add more performance to the research segment which includes Quantum Computing, Drug Discovery, Fusion Energy, Physics-based simulations, scientific computing, & more. When the architecture was originally announced at GTC 2024, the company showcased some big numbers but we have yet to get a proper look at the architecture itself. While we wait for that, the company has more figures for us to consume."



Source: https://wccftech.com/nvidia-blackwe...0x-faster-simulation-science-18x-faster-cpus/
 
Must be a bit of a strange feeling to be a customer, the same day that:
https://nvidianews.nvidia.com/news/nvidia-grace-hopper-ignites-new-era-of-ai-supercomputing
Driving a fundamental shift in the high-performance computing industry toward AI-powered systems, NVIDIA today announced nine new supercomputers worldwide are using NVIDIA Grace Hopper™ Superchips to speed scientific research and discovery. Combined, the systems deliver 200 exaflops, or 200 quintillion calculations per second, of energy-efficient AI processing power.

they show how much better the replacement, due to be released in a matter of months, is going to be....
 
Meanwhile, GTA 6 will launch on consoles at 30fps and will still sell 500 million copies in pre-orders....but every RTX-enabled game will still sell small because Call of Duty and its ilk are on version 14, and people get enough of a dopamine hit out of the freebie versions of Warzone and Fortnite and Counter-Strike and so forth, which they run at low detail settings because the internet tells them this is what the cool kids do to be competitive.

Hrrrrrrmmmm....

But on the upside, people who run multi-monitor for sims (the backbone of the PC gaming community since time immemorial) will be that much closer to achieving consistent 60 or 120fps at 4k x 3 screens.......so there's that.
 
But on the upside, people who run multi-monitor for sims (the backbone of the PC gaming community since time immemorial) will be that much closer to achieving consistent 60 or 120fps at 4k x 3 screens.......so there's that.

Still praying to the Korean chaebol gods for an OLED version of the 57” G9 mini-LED.
 
Meanwhile, GTA 6 will launch on consoles at 30fps and will still sell 500 million copies in pre-orders....but every RTX-enabled game will still sell small because Call of Duty and its ilk are on version 14, and people get enough of a dopamine hit out of the freebie versions of Warzone and Fortnite and Counter-Strike and so forth, which they run at low detail settings because the internet tells them this is what the cool kids do to be competitive.

Hrrrrrrmmmm....

But on the upside, people who run multi-monitor for sims (the backbone of the PC gaming community since time immemorial) will be that much closer to achieving consistent 60 or 120fps at 4k x 3 screens.......so there's that.
Gaming Blackwell won't be based on enterprise Blackwell, which this article is about.
Int8 vs floating point...
Surprised they're not pushing int4 in this comparison. We can double the speed by further cutting the size of the data type in half!

https://developer.nvidia.com/blog/int4-for-ai-inference/
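For the curious, here's a rough sketch of what int4 actually means in practice (plain C, purely illustrative, nothing to do with NVIDIA's kernels): two signed 4-bit values packed per byte, so quantized weights take half the storage and memory bandwidth of int8.

Code:
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Illustrative only: symmetric int4 quantization with one scale per tensor.
   Values are clamped to [-8, 7] and packed two per byte. */
static void quantize_int4(const float *x, int n, uint8_t *packed, float *scale)
{
    float max_abs = 0.0f;
    for (int i = 0; i < n; i++)
        if (fabsf(x[i]) > max_abs) max_abs = fabsf(x[i]);
    *scale = (max_abs > 0.0f) ? max_abs / 7.0f : 1.0f;

    for (int i = 0; i < n; i += 2) {
        int lo = (int)lrintf(x[i] / *scale);
        int hi = (i + 1 < n) ? (int)lrintf(x[i + 1] / *scale) : 0;
        if (lo < -8) lo = -8; if (lo > 7) lo = 7;
        if (hi < -8) hi = -8; if (hi > 7) hi = 7;
        packed[i / 2] = (uint8_t)((lo & 0x0F) | ((hi & 0x0F) << 4));
    }
}

/* Recover the i-th value: sign-extend the 4-bit nibble, multiply by the scale. */
static float dequantize_int4(const uint8_t *packed, int i, float scale)
{
    int nib = (i & 1) ? (packed[i / 2] >> 4) : (packed[i / 2] & 0x0F);
    if (nib & 0x08) nib -= 16;
    return (float)nib * scale;
}

int main(void)
{
    float w[4] = { 0.9f, -0.33f, 0.05f, -0.7f };
    uint8_t packed[2];
    float scale;
    quantize_int4(w, 4, packed, &scale);
    for (int i = 0; i < 4; i++)
        printf("%+.3f -> %+.3f\n", w[i], dequantize_int4(packed, i, scale));
    return 0;
}

The headline multipliers aren't just the smaller footprint, of course; they come from hardware that can do math directly on the packed int4 values.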
 
If they changed the precision of the AI model used for the simulation being run, they wrote it so small below the low-res image that I'm not even able to read it.
 
Gaming Blackwell won't be based on enterprise Blackwell, which this article is about.

Surprised they're not pushing int4 in this comparison. We can double the speed by further cutting the size of the data type in half!

https://developer.nvidia.com/blog/int4-for-ai-inference/
Last time I checked, Nvidia's int4 was upwards of 120x faster than their nearest competitor's; that's a pretty boring bar chart.
 
Surprised they're not pushing int4 in this comparison. We can double the speed by further cutting the size of the data type in half!
Such marketing fluff. SIMD within a register is a thing. You can pack two int4s into an int8, two int8s into an int16, and so on and so forth. We've been doing it for years. Not sure what data paths Nvidia are taking to achieve those order-of-magnitude speed claims.
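To make the packing point concrete, here's the classic trick in plain C (just an illustration of SWAR, not a claim about how any GPU actually does it): eight 4-bit lanes in a 32-bit register, added lane-wise with one integer add by masking the top bit of each lane so carries can't spill across lanes.

Code:
#include <stdint.h>
#include <stdio.h>

/* SWAR add of eight packed 4-bit lanes in one 32-bit word.
   Clearing the top bit of every lane before the add keeps carries from
   crossing lane boundaries; the XOR restores each lane's top bit. */
static uint32_t add_nibbles(uint32_t a, uint32_t b)
{
    const uint32_t H = 0x88888888u;   /* top bit of each 4-bit lane */
    return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
}

int main(void)
{
    uint32_t a = 0x12345678u;   /* lanes: 1 2 3 4 5 6 7 8 */
    uint32_t b = 0x11111111u;   /* add 1 to every lane     */
    printf("%08x\n", add_nibbles(a, b));   /* prints 23456789 */
    return 0;
}

A dedicated narrow-precision datapath skips the masking dance entirely and feeds more operands per cycle, which is presumably where the big vendor multipliers come from.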

The only scenario I can think of where you'd need that fine-grained a data type is when thread synchronization costs precious cycles. Then again, I'm not a hardware optimization expert.

Floating point is different, though. SWAR'ing floats is more about making lemonade. FP8 already has three or four different widely used formats, at least within the topic of AI inference.
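Since the FP8 zoo came up: the two variants you see most, E4M3 and E5M2, split the same 8 bits differently (more mantissa vs. more exponent). A tiny decoder makes the difference obvious; this sketch ignores the NaN/Inf conventions, which also differ between the published specs.

Code:
#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Decode an 8-bit float with the given exponent/mantissa split.
   Handles normals and subnormals only; special values are out of scope. */
static float fp8_to_float(uint8_t v, int exp_bits, int man_bits)
{
    int bias = (1 << (exp_bits - 1)) - 1;
    int sign = (v >> (exp_bits + man_bits)) & 1;
    int exp  = (v >> man_bits) & ((1 << exp_bits) - 1);
    int man  = v & ((1 << man_bits) - 1);

    float frac, e;
    if (exp == 0) {   /* subnormal: no implicit leading 1 */
        frac = (float)man / (float)(1 << man_bits);
        e = (float)(1 - bias);
    } else {          /* normal: implicit leading 1 */
        frac = 1.0f + (float)man / (float)(1 << man_bits);
        e = (float)(exp - bias);
    }
    return (sign ? -1.0f : 1.0f) * frac * exp2f(e);
}

int main(void)
{
    uint8_t bits = 0x48;  /* the same byte decodes to two different values */
    printf("E4M3: %g\n", fp8_to_float(bits, 4, 3));   /* 4 */
    printf("E5M2: %g\n", fp8_to_float(bits, 5, 2));   /* 8 */
    return 0;
}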
 