Here in Berlin, Germany, at IFA 2024, AMD’s Jack Huynh, the senior vice president and general manager of the Computing and Graphics Business Group, announced that the company will unify its consumer-focused RDNA and data center-focused CDNA architectures into one microarchitecture, named UDNA, that will set the stage for the company to tackle Nvidia’s entrenched CUDA ecosystem more effectively. The announcement comes as AMD has decided to deprioritize high-end gaming graphics cards to accelerate market share gains.
When AMD moved on from its GCN microarchitecture back in 2019, the company split its graphics designs into two microarchitectures: RDNA, built to power gaming graphics products for the consumer market, and CDNA, designed specifically for compute-centric AI and HPC workloads in the data center.
Huynh explained the reasoning behind the split in a Q&A session with the press and the rationale for moving forward with a new unified design. We also followed up for more details about the forthcoming architecture. Here’s a lightly edited transcript of the conversations:
Jack Huynh [JH], AMD: So, part of a big change at AMD is today we have a CDNA architecture for our Instinct data center GPUs and RDNA for the consumer stuff. It’s forked. Going forward, we will call it UDNA. There’ll be one unified architecture, both Instinct and client [consumer]. We’ll unify it so that it will be so much easier for developers versus today, where they have to choose and value is not improving.
We forked it because then you get the sub-optimizations and the micro-optimizations, but then it’s very difficult for these developers, especially as we’re growing our data center business, so now we need to unify it. That’s been a part of it. Because remember what I said earlier? I’m thinking about millions of developers; that’s where we want to get to. Step one is to get to the hundreds, thousands, tens of thousands, hundreds of thousands, and hopefully, one day, millions. That’s what I’m telling the team right now. It’s that scale we have to build now.
Tom’s Hardware [TH], Paul Alcorn: So, with UDNA bringing those architectures back together, will all of that still be backward compatible with the RDNA and the CDNA split?
JH: So, one of the things we want to do is… we made some mistakes with the RDNA side; each time we change the memory hierarchy, the subsystem, it has to reset the matrix on the optimizations. I don’t want to do that.
So, going forward, we’re thinking about not just RDNA 5, RDNA 6, RDNA 7, but UDNA 6 and UDNA 7. We plan the next three generations because once we get the optimizations, I don’t want to have to change the memory hierarchy, and then we lose a lot of optimizations. So, we’re kind of forcing that issue about full forward and backward compatibility. We do that on Xbox today; it’s very doable but requires advanced planning. It’s a lot more work to do, but that’s the direction we’re going.
TH: When you bring this back to a unified architecture, this means, just to be clear, a desktop GPU would have the same architecture as an MI300X equivalent in the future? Correct?
JH: It’s a cloud-to-client strategy. And I think it will allow us to be very efficient, too. So, instead of having two teams do it, you have one team. It’s not doing something that’s that crazy, right? We forked it because we wanted to micro-optimize in the near term, but now that we have scale, we have to unify back, and I believe it’s the right approach. There might be some little bumps.
TH: So, this merging back together, how long will that take? How many more product generations before we see that?
JH: We haven’t disclosed that yet. It’s a strategy. Strategy is very important to me. I think it’s the right strategy. We’ve got to make sure we’re doing the right thing. In fact, when we talk to developers, they love it because, again, they have all these other departments telling them to do different things, too. So, I need to reduce the complexity.
[…] From the developer’s standpoint, they love this strategy. They actually wish we did it sooner, but I can’t change the engine when a plane’s in the air. I have to find the right way to setpoint that so I don’t break things.
[End of Huynh’s comments]

Yes, high-end silicon can build markets, but ultimately, software support tends to define the winners and losers. Nvidia has taught a master class in how to build a seemingly impenetrable moat with its unparalleled proprietary CUDA ecosystem.
Nvidia began laying the foundation of its empire when it started with CUDA eighteen long years ago, and perhaps one of its most fundamental advantages is signified by the ‘U’ in CUDA, the Compute Unified Device Architecture. Nvidia has but one CUDA platform for all uses, and it leverages the same underlying microarchitectures for AI, HPC, and gaming.
Huynh told me that CUDA has four million developers, and his goal is to pave the way for AMD to see similar success. That’s a tall order. AMD continues to rely on the open source ROCm software stack to counter Nvidia, but that requires buy-in from both users and the open source community, which will shoulder some of the burden of optimizing the stack. Anything AMD can do to simplify that work, even if it comes at the cost of some micro-optimizations for certain types of applications/games, will help accelerate that ecosystem.
AMD has taken its fair share of criticism for the uneven state of the ROCm stack. When it bought Xilinx in 2022, AMD even announced that it would put Victor Peng, the then-CEO of Xilinx, in charge of a unified ROCm team to bring the project under tighter control (Peng recently retired). That effort has yielded at least some fruit, but the criticism persists; it’s clear the company has plenty of work ahead before it’s fully positioned to take on Nvidia’s CUDA.
The company also remains focused on ROCm despite the emergence of the UXL Foundation, an open software ecosystem for accelerators that is getting broad support from other players in the industry, like Qualcomm, Samsung, Arm, and Intel.
What precisely will UDNA change compared to the current RDNA and CDNA split? Huynh didn’t go into a lot of detail, and obviously there’s still plenty of groundwork to be laid. But one clear potential pain point has been the lack of dedicated AI acceleration units in RDNA. Nvidia brought tensor cores to the entire RTX line starting in 2018. AMD has only limited AI acceleration in RDNA 3, basically accessing the FP16 units in a more optimized fashion via WMMA instructions, while RDNA 2 depends purely on the GPU shaders for such work.
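To make that distinction concrete, here’s a minimal HIP sketch of the RDNA 3 WMMA path, modeled on AMD’s public GPUOpen example: it multiplies a single pair of 16x16 FP16 matrices on one wave32 wavefront. The builtin name and fragment layout are specific to RDNA 3 (gfx11) GPUs, and this is an illustration of the mechanism rather than production code.

```cpp
// Minimal RDNA 3 WMMA sketch (HIP), adapted from AMD's GPUOpen sample.
// Computes C = A * B for one 16x16 FP16 tile on a single wave32 wavefront.
// Launch with one block of 32 threads: wmma_16x16<<<1, 32>>>(a, b, c);
#include <hip/hip_runtime.h>

// One lane's 16-element FP16 fragment of a 16x16 input matrix.
typedef _Float16 half16 __attribute__((ext_vector_type(16)));

__global__ void wmma_16x16(const _Float16* a, const _Float16* b, _Float16* c)
{
    const int lIdx = threadIdx.x;  // lane index within the wave, 0..31
    // RDNA 3 replicates the input fragments across both halves of the
    // wave, so lanes 0-15 and 16-31 load identical A and B data.
    const int lane = lIdx % 16;

    half16 a_frag;
    half16 b_frag;
    half16 c_frag = {};  // accumulator, zero-initialized

    for (int ele = 0; ele < 16; ++ele) {
        a_frag[ele] = a[16 * lane + ele];  // row `lane` of A (row-major)
        b_frag[ele] = b[16 * ele + lane];  // column `lane` of B
    }

    // One builtin maps to one V_WMMA_F16_16X16X16_F16 instruction: a full
    // 16x16x16 FP16 multiply-accumulate issued across the wavefront. The
    // trailing argument is OPSEL; `false` places the FP16 results in the
    // low half of each 32-bit register.
    c_frag = __builtin_amdgcn_wmma_f16_16x16x16_f16_w32(a_frag, b_frag,
                                                        c_frag, false);

    // Each lane ends up with eight results in the even elements of c_frag;
    // lanes 0-15 write the even rows of C, lanes 16-31 the odd rows.
    for (int ele = 0; ele < 8; ++ele) {
        const int r = ele * 2 + (lIdx / 16);
        c[16 * r + lane] = c_frag[ele * 2];
    }
}
```

Note that this still runs on the shader cores’ FP16 pipes, just packed far more efficiently than issuing the equivalent math by hand; there are no dedicated tensor units underneath, which is exactly the kind of gap a unified UDNA design could close.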
Our assumption is that, at some point, AMD will bring full stack support for tensor operations to its GPUs with UDNA. CDNA has had such functional units since 2020, with increased throughput and number format support being added with CDNA 2 (2021) and CDNA 3 (2023). Given the preponderance of AI work being done on both data center and client GPUs these days, adding tensor support to client GPUs seems like a critical need.
The unified UDNA architecture is a logical next step on the journey to competing with CUDA, but AMD has a mountain to climb. Huynh wouldn’t commit to a release date for the new architecture, but given the billions of dollars at stake in the AI market, executing the new microarchitectural strategy is obviously going to be a top priority. Still, with what we’ve heard about AMD RDNA 4, it appears UDNA is at least one more generation away.