AMD’s Next-Gen UDNA: Four Die Sizes, One Potential 96-CU Flagship

New block diagrams from well-known hardware leaker @Kepler_L2 sketch out a highly modular AMD UDNA family that could span four die sizes, with the largest rumored to reach 96 compute units (CUs). The images show repeating shader arrays and shader engines, with each engine holding a handful of compute units and its own render backend. Those engines feed a central SoC block that contains the graphics command processor, the graphics engine, hardware schedulers, and a shared L2 cache. On the memory side, the flagship diagram shows sixteen unified memory controllers, each 32 bits wide, which would add up to a 512-bit external memory interface. The leak also raises the possibility of a much larger on-die Infinity Cache for the top part.

If you put the pieces together, the flagship’s numbers line up with the diagram. Eight shader arrays with two shader engines each give 16 shader engines, and at six compute units per engine, that comes to exactly 96 CUs. The mid-tier design pares things back to four shader arrays and eight shader engines with five compute units per engine, for 40 CUs, paired with an estimated six memory controllers for a 192-bit bus. Below that sit 24 CU and 12 CU designs built from smaller arrays. Oddly, the 24 CU design is shown with as many as eight memory controllers, more than the mid-tier part, which could mean the controllers differ in width depending on whether the part uses a traditional GDDR-style interface or an LPDDR5X-like arrangement.
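To make the arithmetic explicit, here is a small Python sketch that multiplies out the rumored configurations. The unit counts come from the leak as described above; the helper names `total_cus` and `bus_width_bits` are hypothetical, and the 16-bit per-controller width used for the LPDDR5X-style case is an illustrative assumption, not something the diagrams confirm.

```python
# Hypothetical helpers that reproduce the leak's arithmetic. Configuration
# values come from the rumored diagrams; function names are illustrative only.


def total_cus(shader_arrays: int, engines_per_array: int, cus_per_engine: int) -> int:
    """Multiply out the hierarchy: shader arrays -> shader engines -> CUs."""
    return shader_arrays * engines_per_array * cus_per_engine


def bus_width_bits(controllers: int, bits_per_controller: int) -> int:
    """External memory interface width = controller count x per-controller width."""
    return controllers * bits_per_controller


# Flagship: 8 arrays x 2 engines x 6 CUs, with 16 x 32-bit controllers.
flagship_cus = total_cus(8, 2, 6)
flagship_bus = bus_width_bits(16, 32)

# Mid-tier: 4 arrays feeding 8 engines (2 per array), 5 CUs per engine,
# plus an estimated 6 x 32-bit controllers.
mid_cus = total_cus(4, 2, 5)
mid_bus = bus_width_bits(6, 32)

# 24 CU part: 8 controllers of unknown width. The 32-bit (GDDR-style) and
# 16-bit (LPDDR5X-style) widths below are ASSUMPTIONS for illustration.
small_bus_gddr = bus_width_bits(8, 32)
small_bus_lpddr = bus_width_bits(8, 16)

print(f"Flagship: {flagship_cus} CUs, {flagship_bus}-bit bus")
print(f"Mid-tier: {mid_cus} CUs, {mid_bus}-bit bus")
print(f"24 CU part: {small_bus_gddr}-bit (32-bit controllers) "
      f"or {small_bus_lpddr}-bit (16-bit controllers)")
```

The flagship works out to 96 CUs on a 512-bit bus and the mid-tier part to 40 CUs on a 192-bit bus, while the same eight controllers on the 24 CU part could mean anything from a 128-bit to a 256-bit interface depending on the assumed controller width.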

The diagrams also hint at more than compute counts and bus widths. One interesting detail is the suggestion that some data-center-focused parts could get much larger per-CU local caches. The documents also imply some convergence of Radeon and Instinct features under the UDNA label, which could let AMD reuse the same building blocks for both gaming and AI or accelerator roles. A modular SoC layout of this kind would make it easier to mix and match engine counts, cache sizes, and memory controller counts across a single family of parts.

All of this remains unofficial and speculative. The diagrams offer a plausible way for AMD to scale a single architecture up and down, but they do not prove that silicon has taped out or that final specifications are fixed. AMD could still change CU counts, memory widths, cache sizes, or SKU names before production begins.