The drive to put Arm into the server space has had its ups and downs. We’ve seen the likes of AppliedMicro/Ampere, Calxeda, Broadcom/Cavium/Marvell, Qualcomm, Huawei, Fujitsu, Annapurna/Amazon, and even AMD, deal with Arm-based silicon in the server market. Some of these designs have been successful, others not so much, but Arm is pushing its new Neoverse N1 roadmap of cores into this space, aiming for high performance and for scale. We’ve already seen Amazon come into the market with its N1-based Graviton2 for its cloud services, but there’s going to be a counter product for every other cloud provider, with the new N1-based Next-Gen Ampere CPU, codenamed QuickSilver. We have some details ahead of the official release announcement in Q1 2020.

Ampere, technically Ampere Computing, is currently in the market with its eMAG processors. Using custom Arm cores derived from its acquisition of AppliedMicro, it has achieved mild success in second tier cloud providers (such as Packet) as well as Android in the cloud-type services for a number of Chinese smartphone game providers. There’s also some success with telecommunications at the edge, but ultimately nothing with extreme demands. This is where the Next-Gen Ampere product comes in.


Updated Roadmap for Jan 2020

 

New Product, No (Marketing) Name Yet

The new product doesn’t yet have a marketing name: we asked, and Ampere preferred it to be called ‘Next-Gen Ampere’ for now, or to use its SoC codename ‘QuickSilver’. What we were told is that the new product is a brand new ground-up design from Ampere, separate from the AppliedMicro IP acquisition. It plans to compete in the same space that Amazon’s Graviton2 currently occupies at AWS, but as the main alternative for the other cloud providers that won’t have access to Graviton2.

What we are getting today is some rough details of the new chip. Other features, such as exact SKUs to be launched, exact TDPs, exact frequencies, and pricing, are going to be disclosed at the official release announcement in 2020. Nonetheless, Ampere has exposed a lot of details.

Next-Gen Ampere will be a monolithic chip built on TSMC’s 7nm process and featuring 80 cores. These cores are not custom like eMAG, but are built on Arm’s Neoverse N1 design, using paired clusters of cores connected by an Arm mesh IP (CMN-600). This is the same core as Graviton2, and as expected Ampere is keen to promote that its design is optimized for power, performance, latency, and throughput, as well as offering more cores and more IO.

N1 cores are single threaded (not multi-threaded), and the QuickSilver SoC is built with containers in mind: the chip will support container-level QoS commands such that shared resources are not taken by one particular customer, and additional RAS features will be in play to ensure consistent performance. Ampere actually hammered on this point a fair bit, stating that QuickSilver is built to ensure consistency in a multi-tenant environment to the point where deterministic performance is required, which suggests container-based frequency and cache control. Ampere says that the turbo frequency of its new chip will be consistent in order to enable this.

Aside from 80 cores, QuickSilver will also have over 128 PCIe 4.0 lanes. Ampere wouldn’t state at this point exactly how many (details to come in Q1), but did state that it would have more than any other server chip on the market in 2020, x86 or Arm. The current leader in this race is AMD, whose Rome EPYC processors have 128, and we confirmed that QuickSilver will have more than 128. This suddenly got very interesting, as this opens up a lot of possibilities for connectivity in specific markets, such as accelerators and storage. Ampere is a key partner in NVIDIA’s CUDA-on-Arm strategy in 2020, and so expect to see cloud instances using QuickSilver with access to lots of NVIDIA GPUs.
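To put the lane count in perspective, here is a rough sketch of the raw bandwidth it implies. It uses the standard PCIe 4.0 figures (16 GT/s per lane, 128b/130b encoding) and takes 128 lanes as a lower bound only, since Ampere has not disclosed the real number. The function names are ours, and the result ignores packet and protocol overhead, so real-world throughput will be lower.

```python
# Back-of-the-envelope PCIe bandwidth estimate (per direction, raw link rate).
# PCIe 4.0 signals at 16 GT/s per lane with 128b/130b line encoding.

PCIE4_GT_PER_S = 16.0            # giga-transfers per second, per lane
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line code


def lane_bandwidth_gb_s(gt_per_s: float = PCIE4_GT_PER_S) -> float:
    """Usable GB/s for one lane in one direction, before protocol overhead."""
    return gt_per_s * ENCODING_EFFICIENCY / 8  # 8 bits per byte


def aggregate_bandwidth_gb_s(lanes: int) -> float:
    """Aggregate GB/s across `lanes` lanes, one direction."""
    return lanes * lane_bandwidth_gb_s()


def x16_slots(lanes: int) -> int:
    """How many full x16 links (e.g. for GPUs) fit in a lane budget."""
    return lanes // 16


if __name__ == "__main__":
    lanes = 128  # assumption: lower bound only, the real count is undisclosed
    print(f"{lane_bandwidth_gb_s():.3f} GB/s per lane")
    print(f"{aggregate_bandwidth_gb_s(lanes):.1f} GB/s across {lanes} lanes")
    print(f"{x16_slots(lanes)} x16 links")
```

At 128 lanes that works out to roughly 252 GB/s in each direction, or eight full x16 links, before any lanes are set aside for socket-to-socket traffic.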

Also on the new chip, there will be eight DDR4 memory channels, although the exact supported memory speed was not stated. Ampere said it would be faster than the current eMAG support, which is DDR4-2666, and that the IP they are using for the DDR4 memory controller allows them to push up to the limits of the JEDEC DDR4 specification, so realistically here we are looking at DDR4-3200 memory out of the gate. It will be interesting to see if this includes 2 DIMMs per channel support as well.
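As a rough illustration of what eight channels would deliver, the sketch below computes theoretical peak memory bandwidth for eMAG (DDR4-2666) and for QuickSilver under the assumption that it ships with DDR4-3200, which Ampere has not confirmed. It assumes the standard 64-bit (8-byte) DDR4 channel width; the helper names are ours.

```python
# Theoretical peak DRAM bandwidth: channels x transfer rate x bus width.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide, ECC bits excluded.


def peak_bandwidth_gb_s(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s (decimal) for a given channel count and speed."""
    return channels * mt_per_s * bus_bytes / 1000  # MB/s -> GB/s


def per_core_gb_s(total_gb_s: float, cores: int) -> float:
    """Peak bandwidth available per core if all cores share it evenly."""
    return total_gb_s / cores


if __name__ == "__main__":
    # (name, channels, MT/s, cores); the DDR4-3200 figure is an assumption
    chips = [
        ("eMAG", 8, 2666, 32),
        ("QuickSilver (assumed DDR4-3200)", 8, 3200, 80),
    ]
    for name, channels, speed, cores in chips:
        total = peak_bandwidth_gb_s(channels, speed)
        share = per_core_gb_s(total, cores)
        print(f"{name}: {total:.1f} GB/s peak, {share:.2f} GB/s per core")
```

Under these assumptions the total rises from about 170.6 GB/s to 204.8 GB/s, but the per-core share falls from about 5.3 GB/s to 2.6 GB/s, which is one reason a larger L2 per core would matter.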

QuickSilver will support up to dual socket configurations, and Ampere states that it will use the CCIX protocol over PCIe 4.0 to enable socket-to-socket communications. Beyond this, the new chips will also support off-chip CCIX, for coherent accelerator attachments, or to add coherent storage class memory tiers. Ampere stated that its desire to use standardized interconnect protocols is helping drive its time-to-market. The use of tried-and-tested IP blocks (like the DDR4 memory controller) helps as well, as does taking advantage of the software and infrastructure that already exists around these standards. We were also told that all the work done to improve the performance on eMAG was transferable to QuickSilver, and that going forward Ampere’s strategy of building on the previous generation will be a key element for its customer base.

At this point in time Ampere did not see the need to add any specific acceleration units to the design. Aside from the limited work they could do on the core, they decided that there was no need for specific cryptography or AI acceleration engines at this time, and they will wait until such compute blocks become standardized. The amount of connectivity and coherency, according to Ampere, will assist any customer needing additional acceleration.

On performance, Ampere is targeting a big leap over eMAG. The N1 cores from Arm are A76-like in their performance, and offer some configurability, such as a 512 KB to 1 MB L2 option. Ampere didn’t state exactly what size L2 they are using, but did state that they optimized for performance over die area in order to reduce memory/cache misses, which would imply that the full 1 MB L2 configuration is in play here. Ampere currently has silicon in house and early silicon sampling with key customers, and will provide a more rounded vision of performance with its full launch announcement in 2020.

We were told that the TDP of Next-Gen Ampere will range from 200W+ for the full 80-core model for servers, down to 45W with silicon that will offer an aggressive core count and performance per watt for particular use cases at the edge, for fanless designs. We were able to confirm that Ampere is building a monolithic die with 80 cores, so the halo chip will be a fully enabled part, with other chips being cut down versions of this. Ampere wouldn’t commit to any smaller silicon designs at this stage, stating that their customers are mostly requesting high-core count and high-performance parts.
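For a sense of scale, the sketch below divides the quoted TDP figures by core count. This is only a crude comparison: TDP covers the whole SoC (memory controllers, PCIe, mesh), the 200 W figure for QuickSilver is a lower bound ("200W+"), and vendors define TDP differently. The function name is ours.

```python
# Crude TDP-per-core comparison using the figures quoted in this article.
# Caveat: TDP is an SoC-level number, so this overstates per-core power.


def watts_per_core(tdp_watts: float, cores: int) -> float:
    """SoC TDP divided evenly across cores."""
    return tdp_watts / cores


if __name__ == "__main__":
    # (name, TDP in watts, cores); QuickSilver's 200 W is a lower bound
    chips = [
        ("QuickSilver (200W+)", 200, 80),
        ("Kunpeng 920", 180, 64),
        ("eMAG", 125, 32),
    ]
    for name, tdp, cores in chips:
        print(f"{name}: {watts_per_core(tdp, cores):.2f} W per core")
```

On these numbers the 80-core part lands around 2.5 W per core against roughly 3.9 W for the 16nm eMAG, though the real QuickSilver figure will be higher if the top SKU exceeds 200 W.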

Ampere’s Jeff Wittich, SVP of Products and ex-Intel, did state that the company is very thorough in its silicon design work, stating that he’s never seen so much pre-silicon design work for a product before. He cites an extensive amount of emulation for QuickSilver, especially when it comes to features of the SoC, and the company also does a lot of test chip designs to make sure that what comes out arrives at the right time and ends up being what the company needs in a product, with as few stepping adjustments as possible. I was surprised to hear this, given the relative size of Ampere and Wittich’s history, but given the recent praise on Graviton2 with its launch, QuickSilver does seem best poised to offer a direct competitor with higher performance to other cloud providers in 2020.

Arm-based Server CPU Offerings

| AnandTech | Ampere QuickSilver | Amazon Graviton2 | Marvell ThunderX2 | Huawei Kunpeng 920 | Ampere eMAG |
|---|---|---|---|---|---|
| Launched | Q1 2020 | Q4 2019 | Q2 2018 | Q3 2019 | Q3 2018 |
| Arm arch | v8.2 | v8.2 | v8.1 | v8.4 | v8.0 |
| µarch | Neoverse N1 | Neoverse N1 | Vulcan | TSV110 | Skylark |
| Cores | 80 | 64 | 32 | 64 | 32 |
| Node | TSMC 7nm | TSMC 7nm | TSMC 16nm | TSMC 7nm | TSMC 16nm |
| Freq | ? | 3.1 GHz | 2.5 GHz | 2.6 GHz | 3.3 GHz |
| TDP | 200W+ | 100W? | ? | 180 W | 125 W |
| Memory | 8x DDR4-3200? | 8x DDR4-3200 | 8x DDR4-2666 | 8x DDR4-2933 | 8x DDR4-2666 |
| PCIe | >128 4.0 | ? | 56x 3.0 | 40x 4.0 | 32x 3.0 |
| CCIX | Yes | ? | - | Yes | - |
| Multi Socket | 2 | ? | 2 | 4 | 1 |

Ampere’s Future and Roadmap

We were also able to ask about Ampere’s future in our briefing. It’s no secret that questions are being asked as to Ampere’s future, having done two rounds of funding but not making a serious dent in the Arm server space while also being suspiciously quiet. We were told that ultimately Ampere has had its head down recently driving the new Next-Gen Ampere product, hence the relative quietness, even when workstation products came to market without a peep from the company.

As part of our discussions, Ampere did state that it is very secure in its funding. We were told that Ampere is working on an annual cadence with its processor portfolio, with products coming out in 2020, 2021, and 2022. High-volume manufacturing with QuickSilver will start at the end of Q1/Q2 2020, and the company actually has the next two generations of hardware completely funded. This allows the company to be candid with old and new customers as to where the product is going, but also allows them to work with customers to examine future workloads and determine the requirements for future products.

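Ampere's Roadmap

| | Ampere eMAG | Ampere Next-Gen | NG+1 | NG+2 |
|---|---|---|---|---|
| Status | Shipped | Sampling | In Development | Defined |
| Process | 16nm | 7nm | 7nm | 5nm |
| SoC codename | Skylark | QuickSilver | QS+1 | QS+2 |
| Architecture | ARM v8.0 | ARM v8.2 | | |
| Cores | Up to 32 Cores | Up to 80 Cores | | |
| Memory | 8 x DDR4-2667 | 8 x DDR4-3200? | | |
| PCIe | 42x PCIe 3.0 | >128 PCIe 4.0 | | |
| TDP | 75 W to 125 W | 45 W to 200+ W | | |
| Frequency | 3.3 GHz Turbo | ? Frequency | | |
| Other | | More IO, CCIX Attach, Dual Socket, Improved IPC | | |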

As to future products, Ampere did state that because it is now using LGA socketed processors for QuickSilver, the next generation after this (which I’ll call QuickSilver+1) will be socket compatible and offer drop-in support. Ampere sees this as a benefit for a quicker time-to-market, which is true, but it also means that we should expect parity with memory and PCIe support for two generations, which is often welcomed in the server space. The generation after this, QuickSilver+2, is funded and is already in the definition phase. Beyond this Ampere is doing research and pathfinding, but based on the need to be agile in an aggressive market space, Ampere doesn’t feel the need to start setting specifications in stone for products 3+ years away just yet.

From an AnandTech point of view, we’re glad that Ampere reached out to us to talk about this so far in advance of the official launch in Q1: it looks like only a couple of other media were offered briefings. We’ve seen workstation versions of eMAG hit the market, so we’re hoping to be sampled one of those ahead of the launch of the Next-Gen Product, if only to act as a reference point for performance claims. Then hopefully we can put it against the other competition in the market. Arm servers just got interesting.

The carousel image for this article is the current eMAG product as part of the Packet bare-metal cloud offering


55 Comments


  • Vince789 - Monday, December 23, 2019

    The 80-Core in the title is a clear sign it's not a GPU article
  • mode_13h - Tuesday, December 24, 2019

    The GV100, featured on the Titan V and Tesla V100, has 80 streaming multiprocessors. Those are basically equivalent to CPU cores.
  • Vince789 - Tuesday, December 24, 2019

    Yep, but Nvidia never refers to them as "GPU cores" lol
  • micklevin - Monday, December 23, 2019

    LOL, what's VMware screen doing on ARM server?
  • name99 - Tuesday, December 24, 2019

    You do realize that VMware runs on ARM, right?
    https://blogs.vmware.com/vsphere/2019/10/esxi-on-a...

    This is why it’s so hard to take the anti-ARM carping seriously, when the people engaged in it don’t even know the most basic facts about what they are criticizing...
  • Harekm - Tuesday, December 24, 2019

    Why would you buy this over a 64 core Epyc?
  • yeeeeman - Tuesday, December 24, 2019

    Better price maybe? Performance is probably lower and power is in the same ballpark.
  • Wilco1 - Tuesday, December 24, 2019

    I would expect it to have higher performance than Rome given Neoverse N1 has higher IPC and 25% extra cores. It should run at a higher frequency than Graviton2 since it uses far more than 25% extra power.
  • milli - Tuesday, December 24, 2019

    Higher IPC than Rome? I'll believe it when I see it.
  • Wilco1 - Tuesday, December 24, 2019

    Arm having higher IPC is non-controversial - it's well known that recent Arm microarchitectures are as wide or wider than x86. Of course they clock lower than the fastest desktop chips, however clocks are similar for server parts, hence the IPC advantage matters. You should be able to try it out for yourself soon of course.
