Future Arm CPU Roadmaps

Although not directly part of the v9 announcement, Arm also shared some points on the technology roadmap behind its upcoming v9 designs, including their projected performance over the next two years.

Arm noted that the mobile space has seen a performance increase of 2.4x (we’re talking purely ISO-process design IPC here) from the Cortex-A73 of 2016 to this year’s X1 devices.

Interestingly, Arm also talked about Neoverse V1 designs and how they’re achieving 2.4x the performance of A72-class designs, and disclosed that it expects the first V1 devices to be released later this year.

For the next-generation mobile IP cores, code-named Matterhorn and Makalu, the company is disclosing an aggregate expected 30% IPC gain across these two generations, excluding frequency or any other additional performance gains which could be reached by SoC designers. Compounded, this represents roughly a 14% generational increase for each of the two new designs (1.14 × 1.14 ≈ 1.30), and as showcased in the performance curve in the slide, would indicate that improvements are slowing down relative to what Arm had managed over the past few years since the A76. Still, the company states that the rate of advancement remains well beyond the industry average, although admittedly that average is being dragged down by some players.
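To make the arithmetic behind the per-generation figure explicit, here is a minimal Python sketch. It assumes the 30% aggregate gain compounds equally across the two generations, which is our reading of the slide rather than something Arm stated; the function name `per_generation_gain` is purely illustrative.

```python
def per_generation_gain(aggregate_gain: float, generations: int) -> float:
    """Return the equal per-generation gain that compounds to aggregate_gain.

    Assumption: every generation improves by the same factor, so
    (1 + per_gen) ** generations == 1 + aggregate_gain.
    """
    return (1.0 + aggregate_gain) ** (1.0 / generations) - 1.0


# 30% aggregate IPC gain across Matterhorn and Makalu (two generations)
gain = per_generation_gain(0.30, 2)
print(f"{gain:.1%}")  # prints 14.0%
```

Simply halving 30% would give 15% per generation; compounding is what brings the figure down to about 14%.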

Oddly enough, Arm also included a slide focusing on the system-side impact on performance, rather than just CPU IP performance. Some of the figures presented here, such as 1% of performance per 5ns of memory latency, are ones that we have talked about extensively for a few generations now, but Arm here also points out that there’s a whole generation of CPU performance that can be squeezed out by improving other aspects of an implementation: the memory path, larger caches, or better frequency capabilities. I consider this to be a veiled shot at the current conservative approaches of SoC vendors, which are not fully utilising the expected performance headroom of the X1 cores, and subsequently also not reaching the performance projections for the new core.
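As a rough illustration of the memory latency rule of thumb, the following Python sketch applies the 1% per 5ns figure linearly. The linear extrapolation, the function name `latency_uplift_pct` and the 20ns example value are our assumptions for illustration; actual sensitivity depends heavily on the workload and the rest of the memory subsystem.

```python
def latency_uplift_pct(latency_saved_ns: float,
                       pct_per_step: float = 1.0,
                       step_ns: float = 5.0) -> float:
    """Estimate the performance uplift (in percent) from lower memory latency.

    Assumption: the "1% per 5ns" rule of thumb scales linearly, which is
    only a first-order approximation.
    """
    return latency_saved_ns / step_ns * pct_per_step


# Hypothetical example: an SoC design that trims 20ns off its memory latency
print(latency_uplift_pct(20.0))  # prints 4.0
```

Under this approximation, a handful of such system-level improvements stacked together is how one arrives at gains on the order of a full CPU generation.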

Arm continues to see the CPU as the most versatile compute block for the future. While dedicated accelerators or GPUs will have their place, they have a hard time addressing important points such as programmability, protection, pervasiveness (essentially the ability to run on any device), and a proven ability to work correctly. Currently, the compute ecosystem is extremely fragmented in how things are run, differing not only between device types, but also between device vendors and operating systems.

SVE2 and Matrix multiplication can vastly simplify the software ecosystem, and allow compute workloads to take a step forward with a more unified approach that will be able to run on any device in the future.

Lastly, Arm had a nugget of new information on the future of Mali GPUs, disclosing that the company is working on new technologies such as VRS and in particular Ray Tracing. The latter point is quite surprising to hear, and signals that the push towards RT in the desktop and console ecosystem, following its introduction by AMD and Nvidia, is also expected to move the mobile GPU ecosystem in the same direction.

Armv9 designs to be unveiled soon, devices in early 2022

Today’s announcement came in an extremely high-level format, and we expect Arm to talk more about the various details of Armv9 and new features such as CCA in the company’s usual yearly tech disclosures in the coming months.

In general, Armv9 appears to be a mix between a more fundamental ISA shift, which SVE2 can be seen as, and a general re-baselining for the software ecosystem to aggregate the last decade of v8 extensions, and build the foundation for the next decade of the Arm architecture.

Arm had already talked about the Neoverse V1 and N2 late last year, and I do expect the N2 at least to be eventually unveiled as a v9 design. Arm further says to expect more Armv9 CPU designs, likely the mobile-side Cortex-A78 and X1 successors, to be unveiled this year, with the new CPUs likely to have already been taped-in by the usual SoC vendors, and expected to be seen in commercial devices in early 2022.


  • mdriftmeyer - Thursday, April 1, 2021 - link

    Considering EPYC Genoa is 96 cores /192 threads and will include Xilinx specialty processors for Zen 4 I would have just left that as the comment. Intel's new CEO will ratchet up specialty processing onto future Intel solutions as well.
  • mdriftmeyer - Thursday, April 1, 2021 - link

    Sorry, but that's actually not even remotely close. Just head over to Phoronix and see how badly Milan whips the competition across the board. And yes, Phoronix has a much larger suite of applications than Anandtech.
  • Wilco1 - Friday, April 2, 2021 - link

    Anandtech is one of the few sites that produces accurate benchmark results across different ISAs. SPEC is an industry standard benchmark to compare servers, and I don't see anything like it on Phoronix. Phoronix just runs a bunch of mostly unknown benchmarks without even checking that the results are meaningful across ISAs (they are not in many cases). Quantity does not imply quality.
  • RSAUser - Saturday, April 3, 2021 - link

    Spec is quite flawed, you can go read up on it, it basically only cares about cache and cache latency, it is not an accurate representation of how stuff performs between different architectures.

    It's actually quite difficult to compare between architectures unless you know the specific use case, and Apple has done really well with the interpretation layer. I think dotnet core/5 from MS will also help MS quite a bit with that over the next few years when they start moving a lot of their products to their own architecture.
  • Wilco1 - Saturday, April 3, 2021 - link

    SPEC consists of real applications like the GCC compiler. More cache, lower latency memory and higher IPC*frequency give better scores just like any other code. SPEC is not perfect by any means, but it is the best cross-ISA benchmark that exists today.

    What Phoronix does is testing how well code is optimized. If you see x86 being much faster than AArch64 then clearly that code hasn't been optimized for AArch64. SimdJson treated AArch64 as first-class from the start and thus has had similar optimization effort as x86, and you can see that in the results. But that's not the case for many other random projects that are not popular (yet) on AArch64. So Phoronix results are completely useless if you are interested in comparing CPU performance.
  • Wilco1 - Saturday, April 3, 2021 - link

    Genoa is 2022, Altra Max has 128 cores in 2021.
  • abufrejoval - Wednesday, March 31, 2021 - link

    I just hope they put CCA also in client side SoCs. So far all those 'realm', 'enclave' or VM encryption enhancements have only targeted server-side chips, but I don't think the vendor-favored walled garden approach has much of a future, there is an urgent need for more federation.
  • bobwya - Wednesday, March 31, 2021 - link

    "The benefit of SVE and SVE2 beyond addition various modern SIMD capabilities is in their variable vector size" - que?!! :-)
  • Matthias B V - Thursday, April 1, 2021 - link

    Glad to see. At least with the new arch they finally have to update their small cores. Was so tired of A55... Where only big cores are in focus though in my opinion the small ones are as or even more important.

    SVE 2 is great wonder how Intel and AMD react to this. They should work on similar features and also create a Lean86 getting rid of legacy if they want to defend market share. That and more flexible features like SVE would benefit them a lot.

    I am quite excited what ARM v9.x can do in tablets and Ultrabooks etc.
