80-Core N1 Next-Gen Ampere, ‘QuickSilver’: The Anti-Graviton2

by Dr. Ian Cutress on December 23, 2019 3:00 PM EST

55 Comments | Add A Comment

55 Comments

The drive to putting Arm into the server space has had its ups and downs. We’ve seen the likes of AppliedMicro/Ampere, Calxeda, Broadcom/Cavium/Marvell, Qualcomm, Huawei, Fujitsu, Annapurna/Amazon, and even AMD, deal with Arm-based silicon in the server market. Some of these designs have successful, others not so much, but Arm is pushing its new Neoverse N1 roadmap of cores into this space, aiming for high performance and for scale. We’ve already seen Amazon come into the market with its N1-based Graviton2 for its cloud services, but there’s going to be a counter product for every other cloud provider, with the new N1-based Next-Gen Ampere CPU, codenamed QuickSilver. We have some details ahead of the official release announcement in Q1 2020.

Ampere, technically Ampere Computing, is currently in the market with its eMAG processors. Using custom Arm cores derived from its acquisition of AppliedMicro, it has achieved mild success in second tier cloud providers (such as Packet) as well as Android in the cloud-type services for a number of Chinese smartphone game providers. There’s also some success with telecommunications at the edge, but ultimately nothing with extreme demands. This is where the Next-Gen Ampere product comes in.

Updated Roadmap for Jan 2020

New Product, No (Marketing) Name Yet

The new product doesn’t yet have a marketing name – we asked and they preferred it to be called ‘Next-Gen Ampere’ for now, or to use its SoC codename ‘QuickSilver’. What we were told is that the new product is a brand new ground-up design from Ampere, separate from the AppliedMicro IP acquisition. It plans to compete in the same space that Amazon’s Graviton2 currently sits at AWS, but as the main alternative to the other cloud providers that won’t have access to Graviton2.

What we are getting today is some rough details of the new chip. Other features, such as exact SKUs to be launched, exact TDPs, exact frequencies, and pricing, are going to be disclosed at the official release announcement in 2020. Nonetheless, Ampere has exposed a lot of details.

Next-Gen Ampere will be a monolithic chip built on TSMC’s 7nm process and featuring 80 cores. These cores are not custom like eMAG, but are built on Arm’s Neoverse N1 design, using paired clusters of cores connected by an Arm mesh IP (CMN-600). This is the same core as Graviton2, and as expected Ampere is keen to promote that their design is optimized for power, performance, latency, and throughput, as well as offering more cores and more of other things as well.

N1 cores are single threaded (not multi-threaded), and the QuickSilver SoC is built with containers in mind: chip will support container level QoS commands such that shared resources are not taken by one particular customer, and additional RAS features will be in play to ensure consistent performance. Ampere actually hammered on this point a fair bit, stating that QuickSilver is built to ensure consistency in a multi-tenant environment to the point where deterministic performance is required, which suggests a container based frequency and cache control. Ampere says that the turbo frequency of its new chip will be consistent in order to enable this.

Aside from 80 cores, QuickSilver will also have over 128 PCIe 4.0 lanes. Ampere wouldn’t state at this point exactly how many (details to come in Q1), but did state that it would have more on the market than any other server chip, x86 or Arm, will have in 2020. The current leader in this race is AMD, whose Rome EPYC processors have 128, and we confirmed that Quicksilver will have more than 128. This suddenly got very interesting, as this opens up a lot of possibilities for connectivity in specific markets, such as accelerators and storage. Ampere is a key partner in NVIDIA’s CUDA-on-Arm strategy in 2020, and so expect to see cloud instances using Quicksilver with access to lots of NVIDIA GPUs.

Also on the new chip, there will be eight DDR4 memory channels, although an exact frequency support was not stated. Ampere said it would be more than the current eMAG support, which is DDR4-2666, and that the IP they are using for the DDR4 memory controller allows them to push up to the limits of the JEDEC DDR4 specification, so realistically here we are looking at DDR4-3200 memory out of the gate. It will be interesting if this includes 2 DIMMs per channel support as well.

QuickSilver will support up to dual socket configurations, and Ampere states that it will use the CCIX protocol over PCIe 4.0 to enable socket-to-socket communications. Beyond this, the new chips will also support off-chip CCIX, for coherent accelerator attachments, or to add coherent storage class memory tiers. Ampere stated that their desire to use standardized interconnect protocols is helping drive their time-to-market, and with use of tried-and-tested IP blocks (like the DDR4 memory controller), that helps as well, along with taking advantage of the software and infrastructure that already circulates these standards. We were also told that all the work done to improve the performance on eMAG was transferable to QuickSilver, and that going forward Ampere’s strategy of building on the previous will be a key element for its customer base.

At this point in time Ampere did not see the need to add-in any specific acceleration units to the design – aside from the limited work they could do on the core, they decided that there was no need for specific cryptography or AI acceleration engines at this time, and they will wait until such compute blocks become standardized. The amount of connectivity and coherency, according to Ampere, will assist any customer needing additional acceleration.

On performance, Ampere is targeting a big leap over eMAG. The N1 cores from Arm are A76-like in their performance, and offer some configurability, such as a 512 KB to 1 MB L2 option. Ampere didn’t state exactly what size L2 they are using, but did state that they optimized for performance over die area in order to reduce memory/cache misses, which would imply that the full 1 MB L2 configuration is in play here. Ampere currently has silicon in house and early silicon sampling with key customers, and will provide a more rounded vision of performance with its full launch announcement in 2020.

We were told that the TDP of Next-Gen Ampere will range from 200W+ for the full 80-core model for servers, down to 45W with silicon that will offer an aggressive core count and performance per watt for particular use cases at the edge, for fanless designs. We were able to confirm that Ampere is building a monolithic die with 80 cores, so the halo chip will be a fully enabled part, with other chips being cut down versions of this. Ampere wouldn’t commit to any smaller silicon designs at this stage, stating that their customers are mostly requesting high-core count and high-performance parts.

Ampere’s Jeff Wittich, SVP of Products and ex-Intel, did state that the company is very thorough in its silicon design work, stating that he’s never seen so much pre-silicon design work for a product before. He cites an extensive amount of emulation for QuickSilver, especially when it comes to features of the SoC, and the company also does a lot of test chip designs as well to make sure what comes out at the right time ends up being what the company needs in a product with as few stepping adjustments as possible. I was surprised to hear this, given the relative size of Ampere and Wittich’s history, but given the recent praise on Graviton2 with its launch, QuickSilver does seem best poised to offer a direct competitor with higher-performance to other cloud providers in 2020.

Arm-based Server CPU Offerings
AnandTech	Ampere QuickSilver	Amazon Graviton2	Marvell ThunderX2	Huawei Kunpeng 920	Ampere eMAG
Launched	Q1 2020	Q4 2019	Q2 2018	Q3 2019	Q3 2018
Arm arch µarch	v8.2 Neoverse N1	v8.2 Neoverse N1	v8.1 Vulkan	v8.4 TSV110	v8.0 Skylark
Cores	80	64	32	64	32
Node	TSMC 7nm	TSMC 7nm	TSMC 16nm	TSMC 7nm	TSMC 16nm
Freq	?	3.1 GHz	2.5 GHz	2.6 GHz	3.3 GHz
TDP	200W+	100W ?	?	180 W	125 W
Memory	8x DDR4-3200?	8x DDR4-3200	8x DDR4-2666	8x DDR4-2933	8x DDR4-2666
PCIe	>128 4.0	?	56x 3.0	40x 4.0	32x 3.0
CCIX	Yes	?	-	Yes	-
Multi Socket	2	?	2	4	1

Ampere’s Future and Roadmap

We were able to also ask about Ampere’s future in our briefing. It’s no secret that questions are being asked as to Ampere’s future, having done two rounds of funding but not making a serious dent in the Arm server space while also being suspiciously quiet. We were told that ultimately Ampere has had its head down recently driving the new Next-Gen Ampere product, hence the relative quietness, even when workstation products came to market but not a peep was heard from the company.

As part of our discussions, Ampere did state that it is very secure in its funding. We were told that Ampere is working on an annual cadence with its processor portfolio, with products coming out in 2020, 2021, and 2022. High-volume manufacturing with Quicksilver will start at the end of Q1/Q2 2020, and the company actually has the next two generations of hardware completely funded. This allows the company to be candid with old and new customers as to where the product is going, but also allows them to work with customers to examine future workloads and determine the requirements for future products.

Ampere's Roadmap
Ampere eMag	Ampere Next-Gen	NG+1	NG+2
Shipped	Sampling	In Development	Defined
16nm	7nm	7nm	5nm
Skylark	QuickSilver	QS+1	QS+2
ARM v8.0	ARM v8.2
Up to 32 Cores	Up to 80 Cores
8 x DDR4-2667	8 x DDR4-3200?
42x PCIe 3.0	>128 PCIe 4.0
75 W to 125 W	45 W to 200+ W
3.3 GHz Turbo	? Frequency
	More IO
	CCIX Attach
	Dual Socket
	Improved IPC

As to future products, Ampere did state that as they are now using LGA socketed processors for QuickSilver, then the next generation after this (which I’ll call QuickSilver+1) will be socket compatible and offer drop in support. Ampere sees this as a benefit for a quicker time-to-market, which is true, but also means that we should expect parity with memory and PCIe support for two generations, which is often welcomed in the server space. The generation after this, QuickSilver+2, is funded and is already in the definition phase. Beyond this Ampere is doing research and pathfinding, but based on the need to be Agile in an aggressive market space, Ampere doesn’t feel the need to start defining specifications in stone for products 3+ years away just yet.

From an AnandTech point of view, we’re glad that Ampere reached out to us to talk about this so far in advance of the official launch in Q1: it looks like only a couple of other media were offered briefings. We’ve seen workstation versions of eMAG hit the market, so we’re hoping to be sampled one of those ahead of the launch of the Next-Gen Product, if only to act as a reference point for performance claims. Then hopefully we can put it against the other competition in the market. Arm servers just got interesting.

The carousel image for this article is the current eMAG product as part of the Packet bare-metal cloud offering

55 Comments

View All Comments

Vince789 - Monday, December 23, 2019 - link
The 80-Core in the title is a clear sign it's not a GPU article
mode_13h - Tuesday, December 24, 2019 - link
The GV100, featured on the Titan V and Tesla V100, has 80 streaming multiprocessors. Those are basically equivalent to CPU cores.
Vince789 - Tuesday, December 24, 2019 - link
Yep, but Nvidia never refers to them "GPU cores" lol
micklevin - Monday, December 23, 2019 - link
LOL, what's VMware screen doing on ARM server?
name99 - Tuesday, December 24, 2019 - link
You do realize that VMware runs on ARM, right?
https://blogs.vmware.com/vsphere/2019/10/esxi-on-a...

This is why it’s so hard to take the anti-ARM carping seriously, when the people engaged in it don’t even know the most basic facts about what they are criticizing...
Harekm - Tuesday, December 24, 2019 - link
Why would you buy this over a 64 core Epyc?
yeeeeman - Tuesday, December 24, 2019 - link
Better price maybe? Performance is probably lower and power is in the same ballpark.
Wilco1 - Tuesday, December 24, 2019 - link
I would expect it to have higher performance than Rome given Neoverse N1 has higher IPC and 25% extra cores. It should run at a higher frequency than Graviton2 since it uses far more than 25% extra power.
milli - Tuesday, December 24, 2019 - link
Higher IPC than Rome? I'll believe it when I see it.
Wilco1 - Tuesday, December 24, 2019 - link
Arm having higher IPC is non-controversial - it's well known that recent Arm micorarchitectures are as wide or wider than x86. Of course they clock lower than the fastest desktop chips, however clocks are similar for server parts, hence the IPC advantage matters. You should be able to try it out for yourself soon of course.

80-Core N1 Next-Gen Ampere, ‘QuickSilver’: The Anti-Graviton2

New Product, No (Marketing) Name Yet

Ampere’s Future and Roadmap

Related Reading

Post Your Comment

55 Comments

View All Comments

Vince789 - Monday, December 23, 2019 - link

mode_13h - Tuesday, December 24, 2019 - link

Vince789 - Tuesday, December 24, 2019 - link

micklevin - Monday, December 23, 2019 - link

name99 - Tuesday, December 24, 2019 - link

Harekm - Tuesday, December 24, 2019 - link

yeeeeman - Tuesday, December 24, 2019 - link

Wilco1 - Tuesday, December 24, 2019 - link

milli - Tuesday, December 24, 2019 - link

Wilco1 - Tuesday, December 24, 2019 - link

Log in

Don't have an account? Sign up now