AMD says its FPGA is ready to emulate your biggest chips

erek

[H]F Junkie
Joined
Dec 19, 2005
Messages
11,011
Might be able to play Crysis too

“In addition to doubling the gate density, AMD says the part also offers twice the bandwidth, which translates into a higher effective clock rate when emulating silicon. Meanwhile, the chip features a new chiplet architecture that places four FPGA tiles in quadrants, which Bauer says helps to reduce latency and congestion as data moves through the chips.


While all of this might sound impressive, anyone who has spent any time playing with emulation will know it tends to be highly inefficient, slow, and expensive compared to running on native hardware, and the situation is no different here.

Emulating modern SoCs with billions of transistors is a pretty resource intensive process to begin with. Depending on the size and complexity of the chip, Bauer says dozens or even hundreds of FPGAs spanning multiple racks may be required, and even then clock speeds are severely limited compared to what you'd find in hard silicon.

According to AMD, while just 24 devices are required to emulate a billion logic gates, it can be scaled out to support up to 60 billion gates at clock speeds in excess of 50MHz.

Bauer notes that the effective clock rate does depend on the number of FPGAs involved. "For example, if you had a piece of IP that can live in a single VP1902, you're gonna see much higher performance," he said.

While AMD's latest FPGA is largely aimed at chipmakers, the company says the chips are also well suited to companies doing firmware development and testing, IP block and subsystem prototyping, peripheral validation, and other test use cases.

As for compatibility, we're told the new chip will take advantage of the same underlying Vivado ML software development suite as the company's previous FPGAs. AMD says it's also working in collaboration with leading EDA vendors, like Cadence, Siemens and Synopsys, to add support for the chip's more advanced features.

AMD's VP1902 is slated to start sampling to customers in Q3 with general availability beginning in early 2024. ®”


Source: https://www.theregister.com/2023/06/27/amd_versal_fpga_emulation/
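
For a rough sense of scale, here's a quick back-of-envelope sketch using only the figures quoted above (24 devices per billion gates, scaling out to 60 billion gates). The linear scaling and the FPGAs-per-rack count are my own guesses for illustration, not anything AMD has stated.

Code:
# Back-of-envelope sketch using the figures quoted in the article.
# Assumptions (mine, not AMD's): capacity scales linearly with device
# count, and a hypothetical 32 FPGAs fit in one rack.
DEVICES_PER_BILLION_GATES = 24      # from the article
TARGET_GATES_BILLIONS = 60          # max design size from the article
FPGAS_PER_RACK = 32                 # purely illustrative guess

devices = DEVICES_PER_BILLION_GATES * TARGET_GATES_BILLIONS
racks = -(-devices // FPGAS_PER_RACK)   # ceiling division

print(f"~{devices} FPGAs across ~{racks} racks for a {TARGET_GATES_BILLIONS}B-gate design")
# -> ~1440 FPGAs across ~45 racks

Which lines up with the article's point that a big SoC can swallow hundreds of these things across multiple racks.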
 
Yea 👀 presumably

I mean, it would be really cool if they included a programmable FPGA on the package with consumer CPUs which could do things like video encoding and offload other compute tasks.

It would be a little bit like Apple's approach in the M1.
 
I mean, it would be really cool if they included a programmable FPGA on the package with consumer CPUs which could do things like video encoding and offload other compute tasks.

It would be a little bit like Apple's approach in the M1.
They've done this with EPYC and Ryzen with FPGAs for AI
 
They've done this with EPYC and Ryzen with FPGAs for AI

Meh, AI. Lame

They should do it for something that is actually useful instead.

Or, since FPGAs are fully programmable, ship them with drivers that install a choice of firmware: AI firmware for those who are into that sort of thing, encoding firmware for others, crypto mining for yet others, and probably other applications I am not thinking of right now.

They could even allow for reprogramming the FPGA on the fly.

That would be really cool and useful.
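
Something like that on-the-fly reprogramming already exists in rough form on embedded parts: Linux has an FPGA manager framework, and some vendor kernels (Zynq/Versal-style boards) expose a sysfs attribute that programs the device with a bitstream from /lib/firmware at runtime. A minimal sketch, assuming such a platform; the bitstream file names and the selection menu are hypothetical.

Code:
# Minimal sketch of runtime FPGA reprogramming on a Linux system whose
# kernel exposes the FPGA manager's "firmware" attribute (common on
# Zynq/Versal-style vendor kernels; mainline often goes through
# device-tree overlays instead).  The bitstream names are hypothetical.
from pathlib import Path

BITSTREAMS = {
    "encode": "encode.bit.bin",   # hypothetical video-encode design
    "ai": "ai.bit.bin",           # hypothetical AI design
    "mining": "mining.bit.bin",   # hypothetical crypto design
}

FPGA_MGR = Path("/sys/class/fpga_manager/fpga0")

def load_design(name: str) -> None:
    """Ask the kernel to program the FPGA with a file from /lib/firmware."""
    fw = BITSTREAMS[name]
    if not (Path("/lib/firmware") / fw).exists():
        raise FileNotFoundError(fw)
    # Writing the firmware file name triggers programming.
    (FPGA_MGR / "firmware").write_text(fw)
    print("state:", (FPGA_MGR / "state").read_text().strip())

if __name__ == "__main__":
    load_design("encode")

A consumer driver would obviously need to sign and validate the bitstreams before anything like this, which is where the next post comes in.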
 
I mean, it would be really cool if they included a programmable FPGA on the package with consumer CPUs which could do things like video encoding and offload other compute tasks.

It would be a little bit like Apple's approach in the M1.
No. That is a horror story!
It’s already hard to stop consumers from clicking yes on things that install crap to hidden boot sectors and disk locations that survive formats, and now you instead want them to have hardware access??? No no no.

There are a few threads in here about the Intel FPGAs going on the upcoming Xeons, and while that is fine for enterprise because any bad actor who gets to that stage has you utterly defeated, consumers for the most part are the bad actors.
 
No. That is a horror story!
It’s already hard to stop consumers from clicking yes on things that install crap to hidden boot sectors and disk locations that survive formats, and now you instead want them to have hardware access??? No no no.

There are a few threads in here about the Intel FPGAs going on the upcoming Xeons, and while that is fine for enterprise because any bad actor who gets to that stage has you utterly defeated, consumers for the most part are the bad actors.

I believe there has to be a way to design it securely, such that only the proper drivers can flash the firmware, and when they do they give you like 5-10 options, with a hidden "I know what I'm doing" option somewhere that allows you to develop your own.
 
I mean, it would be really cool if they included a programmable FPGA on the package with consumer CPUs which could do things like video encoding and offload other compute tasks.

It would be a little bit like Apple's approach in the M1.
The Versal parts, like the Zynq parts before them, have an onboard ARM CPU that is capable of running Linux. Does that count?
 
I kinda
I mean, it would be really cool if they included a programmable FPGA on the package with consumer CPUs which could do things like video encoding and offload other compute tasks.

It would be a little bit like Apple's approach in the M1.

Video encoding would be sweet. Revive the All-In-Wonder line of Radeon cards with some HDMI capture inputs and a little FPGA just for video capture, encoding & streaming tasks. Could make for a killer product.
 
But the question becomes: do you really need the CPU and programmable logic to be on the same die or the same package to do video encoding/decoding? They make PCIe add-in FPGA boards; you can plug them into a PCIe slot on your x86 machine and program them to do whatever you want. Is video encode/decode latency-sensitive enough that PCIe is too slow?
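
For what it's worth, a rough back-of-envelope suggests PCIe probably isn't the bottleneck for video encode: raw 4K60 4:2:0 frames are well under 1 GB/s, a small fraction of even a Gen 4 x4 link, and a per-frame round trip is tiny next to the 16.7 ms frame budget. The numbers below are illustrative, not measurements.

Code:
# Rough, illustrative numbers only -- not measurements.
width, height, fps = 3840, 2160, 60
bytes_per_pixel = 1.5                       # 8-bit 4:2:0 (NV12)

raw_rate = width * height * bytes_per_pixel * fps    # bytes/s
pcie_gen4_x4 = 8e9                                   # ~8 GB/s, roughly

print(f"raw 4K60 feed: {raw_rate / 1e9:.2f} GB/s "
      f"({raw_rate / pcie_gen4_x4:.1%} of a Gen4 x4 link)")
print(f"frame budget: {1000 / fps:.1f} ms vs a PCIe round trip of a few microseconds")
# -> raw 4K60 feed: 0.75 GB/s (9.3% of a Gen4 x4 link)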
 
FPGAs could be a great thing to have next to a CPU die, if the packaging and architecture allow it. You can see this with Apple devices and HPC solutions. Many useful algorithms (such as encoding) can be run on an FPGA very efficiently, even compared to ASICs for memory-limited applications. This could also let the CPU itself be tuned as a leaner, more efficient architecture, like ARM, with x86 workloads routed to the FPGA through software if developers choose to integrate the FPGA logic in their design.

The bandwidth on a modern SoC is tremendous, and FPGAs can be connected very closely to memory and directly accelerate some workloads. It will be interesting to see what AMD and Intel can do, as both chose to acquire the largest FPGA manufacturers and could gain from integrating and advancing that technology.

Having a high-performance PCIe FPGA can also be tremendously useful, from high-performance networking to mining crypto and many other algorithms. It is easy to develop a PCIe card with an FPGA on it, which is why that form factor has become prominent. However, integrating them into the CPU package could accelerate many more workloads.
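
To put the bandwidth point in perspective, here's a quick illustrative comparison of what an on-package FPGA sitting on the memory bus could see versus an add-in card. The figures are approximate theoretical peaks, not benchmarks.

Code:
# Approximate theoretical peak bandwidth, GB/s, for illustration only.
links = {
    "DDR5-5600, dual channel": 2 * 5600e6 * 8 / 1e9,   # ~89.6 GB/s
    "PCIe 4.0 x16": 16 * 16e9 / 8 / 1e9 * 0.985,       # ~31.5 GB/s after encoding overhead
    "PCIe 5.0 x16": 16 * 32e9 / 8 / 1e9 * 0.985,       # ~63.0 GB/s after encoding overhead
}
for name, gbps in links.items():
    print(f"{name:28s} ~{gbps:5.1f} GB/s")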
 
I think there are also FPGAs for CXL now, not just PCIe, which could have interesting latency:
https://www.intel.ca/content/www/ca...tual-property/interface-protocols/cxl-ip.html

I feel there is a real question here: what is something a CPU can't do fast enough, yet not important enough to get dedicated silicon, and that matters to regular customers rather than just during a dev cycle?

Crypto had the risk of changing before your chips were done, so FPGAs outside Bitcoin made a lot of sense there. But we will have 32-core CPUs for people who want them, so for many workloads the FPGA has to beat something already quite capable while still being expensive itself.

It's not uninteresting: when AV2 encoding arrives, you update your FPGA instead of needing new hardware, or a game/app ships specialized HDL for something specific. Back in the day it was still possible to beat the CPU at things like IK solving and other fairly common tasks.

But at the same time we could start to wonder how to fully use all the CPU threads we can already have on a regular personal desktop, so it would probably need a very large performance gap to justify it.
 
I think there are also FPGAs for CXL now, not just PCIe, which could have interesting latency:
https://www.intel.ca/content/www/ca...tual-property/interface-protocols/cxl-ip.html

I feel there is a real question here: what is something a CPU can't do fast enough, yet not important enough to get dedicated silicon, and that matters to regular customers rather than just during a dev cycle?

Crypto had the risk of changing before your chips were done, and we will have 32-core CPUs for people who want them, so for many workloads the FPGA has to beat something already quite capable while still being expensive itself.

It's not uninteresting: when AV2 encoding arrives, you update your FPGA instead of needing new hardware, or a game/app ships specialized HDL for something specific. Back in the day it was still possible to beat the CPU at things like IK solving and other fairly common tasks.

But at the same time we could start to wonder how to fully use all the CPU threads we can already have on a regular personal desktop, so it would probably need a very large performance gap to justify it.
I would rather we start seeing FPGAs as “common” PCIe upgrades, like sound cards used to be. I would prefer it if CPUs tried to do less and we had more specialized hardware available.

Imagine, if you would, a system with a shitload of high-clocking E-cores (either the Intel ones or AMD’s 4c cores), an FPGA to handle the specialized functions, and a decent GPU.
Where game or application devs could essentially “install” custom physics engines or audio engines or whatever on the FPGA to enhance their games?
Video editors or sound mixers could load their preferred codecs and accelerators just for them. Could be fun.

Maybe I’m way off my rocker, but in my old cranky age I feel CPUs and GPUs are trying to do too much of everything and end up too complicated, with too many bits for one-offs that apply to too few users.
 
I would rather we start seeing FPGAs as “common” PCIe upgrades, like sound cards used to be. I would prefer it if CPUs tried to do less and we had more specialized hardware available.

I would love this, but unfortunately we seem to get less and less expansion over time in non-HEDT systems.

While I prefer the flexibility and configurability of general purpose systems with lots of PCIe lanes, one benefit from the on package approach is that there would be wide scale general availability, and that can drive all sorts of applications that otherwise might be overlooked if encountering a PC with an FPGA PCIe board is a relatively rare occurrence.

If these things wind up in every PC, who knows what software makers might use them for. Maybe a specialized physics accelerator that can be used in games? Or encoding accelerators etc. etc.

As long as the programming process isn't too long, you could even do it at runtime. Program the FPGA with the appropriate design right before using it.
 
I would love this, but unfortunately we seem to get less and less expansion over time in non-HEDT systems.

While I prefer the flexibility and configurability of general purpose systems with lots of PCIe lanes, one benefit from the on package approach is that there would be wide scale general availability, and that can drive all sorts of applications that otherwise might be overlooked if encountering a PC with an FPGA PCIe board is a relatively rare occurrence.

If these things wind up in every PC, who knows what software makers might use them for. Maybe a specialized physics accelerator that can be used in games? Or encoding accelerators etc. etc.

As long as the programming process isn't too long, you could even do it at runtime. Program the FPGA with the appropriate design right before using it.
Honestly we wouldn’t even need many more PCIe slots; 2 additional memory channels would be a bigger deal. PCIe 5.0 is already too fast for the silicon to keep up with as is. I’m sure they could find the happy tipping point of diminishing returns and float around there to keep costs lower, and offer some expanded options for those who really need it.

But the FPGA programming might as well happen while it’s compiling shaders on launch.
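
Hiding it behind a loading screen looks plausible on paper, too. A rough illustrative estimate of full-device reconfiguration time; both numbers below are assumed round figures, not specs for any particular part:

Code:
# Illustrative estimate of full-FPGA configuration time.
# Both numbers are assumptions, not specs for any particular device.
bitstream_mb = 100          # assumed full-device bitstream size
config_mb_per_s = 400       # assumed configuration-port throughput

print(f"~{bitstream_mb / config_mb_per_s:.2f} s to load "
      f"a {bitstream_mb} MB bitstream at {config_mb_per_s} MB/s")
# -> ~0.25 s, easily hidden behind a shader-compile / loading screen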
 
Honestly we wouldn’t even need many more PCIe slots; 2 additional memory channels would be a bigger deal. PCIe 5.0 is already too fast for the silicon to keep up with as is. I’m sure they could find the happy tipping point of diminishing returns and float around there to keep costs lower, and offer some expanded options for those who really need it.

But the FPGA programming might as well happen while it’s compiling shaders on launch.

That may be true if you actually use all the Gen5 bandwidth that is available, but there is a lot of PCIe bandwidth waste for many reasons.

Sticking an x8 card in an x16 slot, or sticking a previous-gen card in a latest-gen slot, etc.

Then there's the fact that you aren't always loading up all installed PCIe devices at the same time, so they don't necessarily HAVE to be "non-blocking".

It would be nice if the technology were more flexible. Some sort of PCIe switch built into the chipset on the board. Maybe you don't need 24 Gen 5 lanes, but what if you could turn them into 16 Gen 4 lanes for the GPU, a few Gen 3 and Gen 4 slots for M.2 drives, and a bunch of x8 Gen 2 slots for older expansion cards.

That kind of flexibility would be a beautiful thing, and could be achieved without needing to increase the total bandwidth.
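
The "total bandwidth budget" idea is easy to sanity-check: 24 Gen 5 lanes carry nearly twice the aggregate bandwidth of the mix described above. Illustrative math only, using approximate usable per-lane rates:

Code:
# Approximate usable bandwidth per lane, GB/s (after encoding overhead).
PER_LANE = {"gen2": 0.5, "gen3": 0.985, "gen4": 1.969, "gen5": 3.938}

budget = 24 * PER_LANE["gen5"]                 # 24 Gen 5 lanes from the CPU
wishlist = (16 * PER_LANE["gen4"]              # x16 Gen 4 for the GPU
            + 4 * PER_LANE["gen3"]             # one x4 Gen 3 M.2
            + 4 * PER_LANE["gen4"]             # one x4 Gen 4 M.2
            + 2 * 8 * PER_LANE["gen2"])        # two x8 Gen 2 legacy slots

print(f"budget ~{budget:.0f} GB/s, wishlist ~{wishlist:.0f} GB/s")
# -> budget ~95 GB/s, wishlist ~51 GB/s: plenty of headroom if lanes
#    could be split and downshifted behind a switch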
 