Arm Announces Neoverse V1, N2 Platforms & CPUs, CMN-700 Mesh: More Performance, More Cores, More Flexibility
by Andrei Frumusanu on April 27, 2021 9:00 AM EST- Posted in
- CPUs
- Arm
- Servers
- Infrastructure
- Neoverse N1
- Neoverse V1
- Neoverse N2
- CMN-700
2020 was an extremely successful year for Arm’s infrastructure and enterprise endeavours: it was the year in which the company’s “Neoverse” line of CPU microarchitectures came to fruition and hit the market, in the form of Amazon’s new Graviton2 design as well as Ampere’s Altra server processor. Arm first introduced the Neoverse N1 back in early 2019, and if you weren’t convinced of the Arm server promise by the Graviton2, the more powerful and super-sized Altra certainly should have turned some heads.
These are inarguably the first generation of Arm servers that are truly competitive at the top end of performance, and with them Arm is now finally achieving a goal the company has had in its sights for several years: gaining real market share against the x86 incumbents.
Fast-forward to 2021, and the Neoverse N1 core employed in designs such as the Ampere Altra is still competitive with, or even beating, the newest generation of AMD and Intel designs, a situation that seemed far-fetched only a few years ago. We recommend catching up on these important review pieces from the last two years to get an accurate picture of today’s market:
- Arm Announces Neoverse N1 & E1 Platforms & CPUs: Enabling A Huge Jump In Infrastructure Performance
- Amazon's Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute
- The Ampere Altra Review: 2x 80 Cores Arm Server Performance Monster
- AMD 3rd Gen EPYC Milan Review: A Peak vs Per Core Performance Balance
- Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
(Note: Y axis left chart starts at 50%)
Arm is very open that its main priority with the Neoverse line of products is gaining market share in cloud deployments. An example of this new-found success is an estimate of Amazon’s own AWS instance additions throughout 2020, in which the new Arm-based Graviton2 is said to be the dominant hardware deployment, picking up the majority of the share being lost by Intel.
Looking towards 2022 and Beyond
Today, we’re pivoting towards the future and the new Neoverse V1 and Neoverse N2 generation of products. Arm had already teased the new products last September, revealing a few characteristics of the new designs but stopping short of disclosing more concrete details about the new microarchitectures. Following last month’s announcement of the Armv9 architecture, we’re now finally ready to dive into the two new CPU microarchitectures as well as the new CMN-700 mesh network.
As presented back in September, this generation of Neoverse CPU microarchitectures differs in that we’re talking about two quite different products, aimed at different goals and market segments. The Neoverse V1 represents a new line-up for Arm, with a CPU microarchitecture aimed at more HPC-like workloads and designs oriented towards such markets, while the Neoverse N2 is more of a straight-up successor to the Neoverse N1, targeting infrastructure and cloud deployments in the same way that the N1 does today in products such as the Graviton or Altra processors.
For readers who are familiar with Arm’s mobile CPU microarchitectures, there are definitely very large similarities between the designs, even though Arm’s marketing seems oddly reluctant to make such comparisons. This is why I made the above chart, which tries to depict the similarities between design generations more clearly.
The original Neoverse N1 as seen in the Graviton2 and Altra Q processors had been a derivative, or better said, a sibling microarchitecture, to the Cortex-A76, which had been employed in the 2019 generation of Cortex-A76 mobile SoCs such as the Snapdragon 855. Naturally, the Neoverse designs had server-oriented features and changes that aren’t present in the mobile counterparts.
Similarly to how the N1 was related to the A76, the new generation V1 and N2 microarchitectures are related to newer designs in the Cortex-portfolio. The V1 is related to the Cortex-X1 which we’ve seen in this year’s new mobile SoCs such as the Snapdragon 888 or Exynos 2100. The Neoverse N2 on the other hand is related to an upcoming new Cortex-A microarchitecture which we expect to hear more about in the following few months. Throughout the piece today we’ll make a few more references to this generational disconnect between the V1 and N2, and it’s important to remember that the N2 is a newer design, albeit aimed at different performance and efficiency points.
This decoupling of design goals between the V1 and N2 comes from Arm’s attempt to target more specific markets where the end products might have different priorities, much like how in the mobile space the new Cortex-X series prioritises per-core performance while the Cortex-A series continues to focus on the best PPA. Similarly, the V1 focuses on maximised performance at lower efficiency, with features such as wider SIMD units (2x256b SVE), while the N2 continues the scale-out philosophy of offering the best power efficiency while still moving performance forward through generational IPC improvements.
In today’s piece, we’ll be diving into the new microarchitectural changes of the V1, N2, as well as Arm’s newest generation mesh interconnect IP, the CMN-700, which is expected to serve as the foundation of the next-generation Arm infrastructure processors.
Table of contents:
- A Successful 2020 for Arm - Looking Towards 2022
- The Neoverse V1 Microarchitecture: X1 with SVE?
- The Neoverse V1 Microarchitecture: Platform Enhancements
- The Neoverse N2 Microarchitecture: First Armv9 For Enterprise
- The SVE Factor - More Than Just Vector Size
- PPA & ISO Performance Projections
- The CMN-700 Mesh Network - Bigger, More Flexible
- Eventual Design Performance Projections
- First Thoughts & End Remarks
95 Comments
nandnandnand - Tuesday, April 27, 2021 - link
Looking at Cortex-X-next. It seems like Arm can put out a new Cortex-X for every new Cortex-A78 successor, since the Cortex-X is very similar but bigger.
mode_13h - Tuesday, April 27, 2021 - link
From an earlier article:
> The Cortex-X1 was designed within the frame of a new program at Arm,
> which the company calls the “Cortex-X Custom Program”.
> The program is an evolution of what the company had previously
> already done with the “Built on Arm Cortex Technology” program
> released a few years ago. As a reminder, that license allowed
> customers to collaborate early in the design phase of a new
> microarchitecture, and request customizations to the configurations,
> such as a larger re-order buffer (ROB), differently tuned prefetchers,
> or interface customizations for better integrations into the SoC designs.
> Qualcomm was the predominant benefactor of this license,
Alistair - Tuesday, April 27, 2021 - link
I just want to be able to use ARM in standard DIY with an Asus motherboard and a socket, just like AMD and Intel.
mode_13h - Tuesday, April 27, 2021 - link
I wonder if Nvidia will put out a Jetson-style board in something like a mini-ITX form factor.
Alistair - Wednesday, April 28, 2021 - link
i sure hope so, and something not massively overpriced like right now
mode_13h - Thursday, April 29, 2021 - link
Yeah, because Nvidia is known for their bargain pricing!
; )
Although, if they wanted to create a whole new product segment, it's conceivable they might keep prices rather affordable for a couple generations.
nandnandnand - Wednesday, April 28, 2021 - link
I want it. You want it. Some people seem to want it. Maybe demand is forming? Get on it, China.
16-core Cortex-X2 please.
mode_13h - Wednesday, April 28, 2021 - link
They already did, sort of. See: https://e.huawei.com/us/products/servers/kunpeng/k...
Whoops! Had to get this out of Google cache, because the page 404'd:
Board Model: D920S10
Processors: 1 Kunpeng 920 processor, 4/8 cores, 2.6 GHz
Internal Storage: 6 SATA 3.0 hard drive interfaces, 2 M.2 SSD slots
Memory: 4 DDR4-2666 UDIMM slots, up to 64 GB
PCIe Expansion: 1 PCIe 3.0 x16, 1 PCIe 3.0 x4, and 1 PCIe 3.0 x1 slots
LOM Network Ports: 2 LOM NIC, supporting GE network ports or optical ports
USB: 4 USB 3.0 and 4 USB 2.0
mode_13h - Tuesday, April 27, 2021 - link
Do any of the current x86 cores pair up SSE operations for >= 4x throughput per cycle?
AVX2 has been around for long enough that a lot of the code which could benefit from it has already been written to do so, yet *most* people are still compiling to baseline x86-64 (or just above that), since Intel is still making low-power cores without any AVX. So, I'm sure there's still *some* code that could benefit from >= 4x SSEn execution.
AntonErtl - Wednesday, April 28, 2021 - link
Zen has 4 128-bit FP units (2 FMA and 2 FADD). Not sure if that's what you are interested in.