Intel Launches Cooper Lake: 3rd Generation Xeon Scalable for 4P/8P Servers
by Dr. Ian Cutress on June 18, 2020 9:00 AM EST

We’ve known about Intel’s Cooper Lake platform for a number of quarters. Initially planned, as far as we understand, as a custom silicon variant of Cascade Lake for Intel’s high-profile customers, it was subsequently productized to fill a gap in Intel’s roadmap caused by the development of 10nm for Xeon. Originally set to be a full update to the product stack, in the last quarter Intel declared that Cooper Lake would end up solely in the hands of its priority customers, as a quad-socket or higher platform only. Today Intel launches Cooper Lake, and confirms that Ice Lake, aimed at the 1P/2P markets, is set to come out later this year.
Count Your Coopers: BFloat16 Support
Cooper Lake Xeon Scalable is officially designated as Intel’s 3rd Generation Xeon Scalable for high socket count servers. Ice Lake Xeon Scalable, when it launches later this year, will also be called 3rd Generation Xeon Scalable, but will cover the one and two socket markets.
For Cooper Lake, Intel has made three key additions to the platform. First is the addition of AVX-512 based BF16 instructions, allowing users to take advantage of the BF16 number format. A number of key AI workloads, typically done in FP32 or FP16, can now be performed in BF16, achieving almost the same throughput as FP16 while retaining almost the full dynamic range of FP32. Facebook made a big deal about BF16 in its presentation at Hot Chips last year, where it forms a critical part of its Zion platform. At the time the presentation was made, there was no CPU on the market that supported BF16.
BF16 (bfloat16) is a way of encoding a number in binary that attempts to take advantage of the range of a 32-bit number, but in a 16-bit format such that double the compute can be packed into the same number of bits. The simple table looks a bit like this:
Data Type Representations

| Type | Bits | Exponent | Fraction | Precision | Range | Speed |
|----------|------|----------|----------|-----------|-------|---------|
| float32 | 32 | 8 | 23 | High | High | Slow |
| float16 | 16 | 5 | 10 | Low | Low | 2x Fast |
| bfloat16 | 16 | 8 | 7 | Lower | High | 2x Fast |
By using BF16 numbers rather than FP32 numbers, memory bandwidth requirements as well as system-to-system network requirements can be halved. At the scale of a Facebook, an Amazon, or a Tencent, that is a significant saving. At the time of the Hot Chips presentation last year, Facebook confirmed that it already had silicon working on its datasets.
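As a sketch of how the format trades precision for range, the snippet below converts a float32 value to bfloat16 by simply zeroing the low 16 bits of its bit pattern. This is truncation for clarity only; real hardware typically rounds to nearest even.

```python
import struct

def to_bf16(x: float) -> float:
    """Truncate a float32 bit pattern to bfloat16: keep the sign bit,
    the full 8-bit exponent, and the top 7 fraction bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# The 8-bit exponent survives, so magnitudes far beyond FP16's ~65504
# ceiling remain finite in bfloat16...
print(to_bf16(3.0e38))      # a finite value close to 3e38

# ...but only 7 fraction bits remain, so precision is coarse.
print(to_bf16(1.2345678))   # 1.234375
```

This is why BF16 suits training workloads: gradients that would overflow FP16 stay representable, while the lost mantissa bits matter less than the preserved range.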
Doubling Socket-to-Socket Interconnect Bandwidth
The second upgrade that Intel has made to Cooper Lake over Cascade Lake is in socket-to-socket interconnect. Traditionally Intel’s Xeon processors have relied on a form of QPI/UPI (Ultra Path Interconnect) in order to connect multiple CPUs together to act as one system. In Cascade Lake Xeon Scalable, the top end processors each had three UPI links running at 10.4 GT/s. For Cooper Lake, we have six UPI links, also running at 10.4 GT/s; however, these links are still driven by only three controllers, so each CPU can still connect to at most three other CPUs, but with double the bandwidth per connection.
This means that in Cooper Lake, each CPU-to-CPU connection consists of two UPI links, each running at 10.4 GT/s, for a combined 20.8 GT/s. Because the gain comes from doubling the number of links rather than evolving the standard, there are no power efficiency improvements beyond anything Intel has done to the manufacturing process. Note that double the bandwidth between sockets is still a good thing, even if latency and power per bit stay the same.
Intel still uses the double pinwheel topology for its eight socket designs, ensuring at most two hops to any required processor in the set. Eight sockets is the limit for a glueless network; we have already seen companies like Microsoft build servers with 32 sockets using additional glue logic.
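The two-hop property can be checked with a breadth-first search over a hypothetical three-link-per-socket wiring (a "twisted cube" on eight nodes; Intel's actual pinwheel wiring may differ, but it illustrates that three links per socket suffice for a two-hop diameter across eight sockets):

```python
from collections import deque

# Hypothetical wiring: two four-socket rings (0-3 and 4-7) joined by
# twisted cross-links (0-4, 1-5, 2-7, 3-6). Each socket uses exactly
# three links, matching the three UPI controllers per CPU.
LINKS = {
    0: (1, 3, 4), 1: (0, 2, 5), 2: (1, 3, 7), 3: (2, 0, 6),
    4: (5, 7, 0), 5: (4, 6, 1), 6: (5, 7, 3), 7: (6, 4, 2),
}

def hops(src: int, dst: int) -> int:
    """Minimum hop count between two sockets, via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in LINKS[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    raise ValueError("unreachable socket")

worst = max(hops(a, b) for a in LINKS for b in LINKS)
print(worst)  # 2: no socket is ever more than two hops away
```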
Memory and 2nd Gen Optane
The third upgrade for Cooper Lake is the memory support. Intel is now supporting DDR4-3200 with the Cooper Xeon Platinum parts, however only in a 1 DIMM per channel (1 DPC) scenario. 2 DPC is supported, but only at DDR4-2933. Support for DDR4-3200 technically gives the system a boost from 23.46 GB/s per channel to 25.60 GB/s, an increase of 9.1%.
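The per-channel figures follow directly from the 64-bit (8-byte) width of a DDR4 channel, as a quick sanity check shows:

```python
# Peak DDR4 channel bandwidth is transfer rate (MT/s) times 8 bytes
# per transfer, since each channel is 64 bits wide.
def channel_gbps(mt_per_s: int) -> float:
    return mt_per_s * 8 / 1000  # MT/s * 8 B = MB/s, then -> GB/s

bw_3200 = channel_gbps(3200)   # 25.6 GB/s
bw_2933 = channel_gbps(2933)   # ~23.46 GB/s
gain = (bw_3200 / bw_2933 - 1) * 100
print(f"{bw_3200:.2f} GB/s vs {bw_2933:.2f} GB/s: +{gain:.1f}%")
```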
The base models of Cooper Lake will also be updated to support 1.125 TiB of memory, up from 1 TB. This allows for a 12 DIMM scenario where six modules are 64 GB and six modules are 128 GB. One of the complaints about Cascade Lake Xeons was that in the 1 TB configuration, a fully populated system could not have an even capacity per memory channel, so Intel has rectified this situation. In this scenario, it means that the six 128 GB modules could also be Optane. Why Intel didn’t go for the full 12 * 128 GB scenario, we’ll never know.
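The arithmetic behind the new ceiling, assuming the mixed 64 GB / 128 GB population spread so that every one of the six channels carries the same capacity:

```python
# Six memory channels, two DIMMs per channel: one 64 GB plus one
# 128 GB module each, so every channel holds an equal 192 GB.
CHANNELS = 6
per_channel_gib = 64 + 128                  # 192 per channel
total_tib = CHANNELS * per_channel_gib / 1024
print(total_tib)  # 1.125
```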
The higher memory capacity processors will support 4.5 TB of memory, and be listed as ‘HL’ processors.
Cooper Lake will also support Intel’s second generation 200-series Optane DC Persistent Memory, codenamed Barlow Pass. 200-series Optane DCPMM will still be available in 128 GB, 256 GB, and 512 GB modules, same as the first generation, and will also run at the same DDR4-2666 memory speed. Intel claims that this new generation of Optane offers 25% higher memory bandwidth than the previous generation, which we assume comes down to a new generation of Optane controller on the memory and software optimization at the system level.
Intel states that the 25% performance increase is when they compare 1st gen Optane DCPMM to 2nd gen Optane DCPMM at 15 W, both operating at DDR4-2666. Note that the first-gen could operate in different power modes, from 12 W up to 18 W. We asked Intel if the second generation was the same, and they stated that 15 W is the maximum power mode offered in the new generation.
99 Comments
JayNor - Thursday, June 18, 2020 - link
The SSD speed depends on the block sizes and whether the data is restored serially. The other issue is that the data may not have been stored to the SSD.
schujj07 - Thursday, June 18, 2020 - link
I have and you are very wrong. SAP itself isn't an in-RAM program; it is a set of different types of programs. SAP can run on multiple different DBs (Sybase, MSSQL, Oracle, MaxDB, DB2, or HANA), and with the exception of S4 HANA you need a separate system for your application server. Of those only HANA is an in-RAM DB. Shutting down SAP itself doesn't take that long; shutting down HANA on the other hand can take a while depending on the storage subsystem you have. A 128GB RAM HANA DB can take up to 20 minutes to shut down or restart on an 8Gb Fibre Channel SAN with 10k spinning disks. However, moving to a Software Defined Storage (SDS) array with NVMe disks and dual-port 25Gb iSCSI interfaces changed that same shutdown & restart to less than 2 minutes. I have started a 1000GB HANA DB on that same SDS array in about 5 minutes. When you are restarting a physical HANA appliance, the thing that takes the most time is the RAM check. I've restarted appliances with 2TB RAM and the RAM check itself can take about 10-20 minutes.

Cramming more cores onto an Intel CPU is very difficult. The 28 core CPU is already near the reticle limit with an estimated size of 698mm2. https://www.anandtech.com/show/11550/the-intel-sky... That right there means that they cannot add more cores to their monolithic die. I can guarantee you that they would if it would fit.
Deicidium369 - Thursday, June 18, 2020 - link
Those systems are running on large multi socket systems... so the individual socket core count is not really that big of a deal. Most ERP is more IO intensive than purely compute intensive.

I haven't dealt with a large SAP install - last one I was involved with was SAP R/3 on a Sun Starfire server... and my SAP HANA is well handled by available RAM, and we don't need to worry about scheduling downtime across multiple world time zones.
1TB is not that large of an install - but larger than what I run... You have much more up to date experience than I do - I left the day to day years ago.
Spunjji - Friday, June 19, 2020 - link
"Those systems are running on large multi socket systems... so the individual socket core count is not really that big of a deal."

It is if it means they can run the same sized instance on cheaper systems with fewer sockets. :|
"Most ERP is more IO intensive than purely compute intensive."
Then having that compute power attached to the fastest *and* widest IO available surely counts for something? Especially if, once again, it means you can get the same IO bandwidth from fewer sockets.
You're basically saying "AMD is bad for this" with a bunch of faux-authoritative statements based on outdated or inaccurate information, and then when you're called on it, you dissemble with a bunch of reasons which imply that in reality AMD could probably be quite a good fit for some people.
eek2121 - Thursday, June 18, 2020 - link
I wish AMD had a quad socket offering available via DIY for EPYC. I wish both AMD and Intel would consider a dual socket offering for HEDT. I suppose the power/cooling requirements might be too high.

Deicidium369 - Thursday, June 18, 2020 - link
I was an Intel HEDT user - when the time came to replace our engineering workstations I looked into the HEDT socket 2066 offerings, and ultimately decided on going to a dual socket 3647 Xeon Scalable motherboard and CPU. More memory channels and the ability (in our case never used, due to the upgrade from a small Pascal based DGX to 2 large Volta DGX-2s) to add a second socket. So the people who have needed the additional power have moved to Xeon already. So HEDT is largely dead - the i9900K/i10900K can handle the lower end parts of the market - and if ISV certifications are required for support (Autodesk/etc), then Intel/Nvidia is really the only game in town.

A dual socket AMD or Intel isn't really that power intensive - and active CPU coolers are available for both - so chilled datacenter air would not be required (most servers use a passive heat sink, due to DC air). So if your use case requires dual socket, it's not that hard to accomplish.
schujj07 - Thursday, June 18, 2020 - link
"So HEDT is largely dead - the i9900K/i10900K can handle the lower end parts of the market - and if ISV certifications are required for support (Autodesk/etc) - then the Intel/Nvidia is really the only game in town."

HEDT all depends on what you are doing. If you are running applications that can be done with max 256GB RAM and scale above 10c/20t, then HEDT is still viable, especially if you need maximum CPU performance. https://www.servethehome.com/amd-ryzen-threadrippe...

The ISV certification claim you make is total BS. https://www.amd.com/en/support/certified-drivers (that is just Radeon Pro) For CPU that is simple since it is x86-64, and anything that runs x86-64 will work with it just fine.
Deicidium369 - Thursday, June 18, 2020 - link
Sigh. The vendor we used back then would only offer support on end-to-end systems they supplied - so it was Intel and Nvidia; at that point AMD was not in a competitive position. Looking at the dates of the drivers, they were not certified at the time. We used Win 7 at that time.
I don't know what you want - sorry if my experience is different from what the AMD website has to say. I chose Intel and will continue to choose Intel. I don't care what you choose as I don't care.
Sorry that Intel has a dominant position in almost every single segment, also sorry that Nvidia has been destroying AMD in GPUs. Sorry that at the time when I purchased a system for the then new Windows 7, AMD was not an option for CPU or GPU. My businesses run off of Intel and Nvidia. When making the decisions for the now current system I evaluated TR and it came up short, WAY SHORT. I don't expect to eval TR or Epyc for the replacements early next year. Sorry that it somehow affects you.
schujj07 - Friday, June 19, 2020 - link
I don't care what you choose, just don't come in and state things as fact when your information is 10 years old. Remember that when you make false and misleading claims, people will call you out. There are a lot of IT Pros who read this website for the new tech that is coming out or because they are system builders as well. We know what we are talking about because our job is to stay on top of the trends.

Deicidium369 - Saturday, June 20, 2020 - link
Yeah and you work for people like me. And I don't care what you think, believe or do.