Intel 3rd Gen Xeon Scalable (Ice Lake SP) Review: Generationally Big, Competitively Small
by Andrei Frumusanu on April 6, 2021 11:00 AM EST
Section by Ian Cutress
Ice Lake Xeon Processor List
Intel is introducing around 40 new processors across the Xeon Platinum (8300 series), Xeon Gold (6300 and 5300 series) and Xeon Silver (4300 series). Xeon Bronze no longer exists with Ice Lake. Much like the previous generation, the leading 8/6/5/4 digit signifies the series, and the 3 that follows indicates the generation. Beyond that, the final two digits are, as before, largely arbitrary.
That being said, there is a significant change. In the past, Platinum/Gold/Silver also indicated socket support, with Platinum supporting up to 8P configurations. This time around, as Ice Lake does not support 8P, all the processors will support only up to 2P, with a few select models being uniprocessor only. This makes the Platinum/Gold/Silver segmentation largely arbitrary, serving only to indicate what sort of performance/price bracket the processors are in.
On top of this, Intel is adding in more suffixes to the equation. If you work with Xeon Scalable processors day in and day out, there is now a need to differentiate the Q processor from a P processor, and an S processor from an M processor. There’s a handy list down below.
The easiest way to approach this is to jump into the deep end with the processor list. RCP stands for recommended customer price, and SGX GB stands for how large Software Guard Extension enclaves can be – either 8 GB, 64 GB, or 512 GB. Cells highlighted in green show highlights in the stack.
Intel 3rd Gen Xeon Scalable: Ice Lake Xeon Only
[Processor table, grouped under: Xeon Platinum (8x DDR4-3200), Xeon Gold 6300 (8x DDR4-3200), Xeon Gold 5300 (8x DDR4-2933), Xeon Silver (8x DDR4-2666)]

Suffix key:
Q = Liquid Cooled SKU
Y = Supports Intel SST-PP 2.0
P = IaaS Cloud Specialised Processor
V = SaaS Cloud Specialised Processor
N = Networking/NFV Optimized
M = Media Processing Optimized
T = Long-Life and Extended Thermal Support
U = Uniprocessor (1P Only)
S = 512 GB SGX Enclave per CPU Guaranteed (...but not all 512 GB are labelled S)
The peak turbo on these processors is 3.7 GHz, which is much lower than what we saw with the previous generation. Despite this, Intel seems to be keeping prices reasonable, and is enabling Optane support through most of the stack, except for the Silver processors (where a single model is the exception).
New suffixes include Q, for a liquid cooled processor model with higher all-core frequencies at 270 W; Intel said this part came about based on customer demand. The T processors are extended life / extended thermal support, which usually means -40°C to 125°C support – useful when working at the poles or in other extreme conditions. M/N/P/V specialized processors, according to our chat with Lisa Spelman, GM of the Xeon and Memory Group, are the focal points for software stack optimizations. Users who want focused hardware that can get 2-10%+ more performance on their specific workload can get these processors, for which the software will be specifically tuned. Lisa stated that while all processors will receive uplifts, the segmented parts are the ones at which those uplifts will be targeted. This means managing turbo vs use case and adapting code for that, which can only really be optimized for a known turbo profile.
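To make the naming scheme concrete, here is a minimal Python sketch that splits a model number into series, generation, the two remaining digits, and suffix, using the suffix key from this page. The `decode_model` function and the `XeonModel` type are hypothetical names made up for illustration, and the sketch assumes a four-digit number followed by at most one suffix letter; it is not an official Intel decoder.

```python
import re
from dataclasses import dataclass

# Leading digit -> series, per the 8/6/5/4 segmentation described above.
SERIES = {"8": "Xeon Platinum", "6": "Xeon Gold", "5": "Xeon Gold", "4": "Xeon Silver"}

# Suffix letters copied from the suffix key on this page.
SUFFIXES = {
    "Q": "Liquid Cooled SKU",
    "Y": "Supports Intel SST-PP 2.0",
    "P": "IaaS Cloud Specialised Processor",
    "V": "SaaS Cloud Specialised Processor",
    "N": "Networking/NFV Optimized",
    "M": "Media Processing Optimized",
    "T": "Long-Life and Extended Thermal Support",
    "U": "Uniprocessor (1P Only)",
    "S": "512 GB SGX Enclave per CPU Guaranteed",
}


@dataclass(frozen=True)
class XeonModel:
    series: str
    generation: int
    sku_digits: str       # the last two digits, which carry no fixed meaning
    suffix: str           # "" when the model has no suffix letter
    suffix_meaning: str


def decode_model(model: str) -> XeonModel:
    """Split a model number such as '8352Y' into its parts.

    Assumption: four digits plus an optional single suffix letter.
    """
    match = re.fullmatch(r"([4568])(\d)(\d{2})([A-Z]?)", model.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised Xeon Scalable model number: {model!r}")
    series_digit, generation, sku_digits, suffix = match.groups()
    if not suffix:
        meaning = "no suffix"
    else:
        # Letters outside the key above are reported as unknown, not guessed.
        meaning = SUFFIXES.get(suffix, "unknown (not in this page's suffix key)")
    return XeonModel(SERIES[series_digit], int(generation), sku_digits, suffix, meaning)


if __name__ == "__main__":
    for name in ("8380", "8352Y", "6330"):
        print(name, decode_model(name))
```

The suffix table is a plain dictionary so that letters not covered by the key fall through to "unknown" instead of being silently misread.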
It’s hard not to notice that the server market over the last couple of years has become more competitive. Not only is Intel competing with its own high market share, but x86 alternatives from AMD have scored big wins when it comes to per-core performance, and Arm implementations such as the Ampere Altra can enable unprecedented density at competitive performance as well. Here’s how they all stand, looking at top-of-stack offerings.
| uArch | Zen 3 | N1 | N1 | Sunny Cove |
|---|---|---|---|---|
| TDP | 280 W | ? | 250 W | 270 W |
| L3 Cache | 256 MB | 32 MB | 32 MB | 60 MB |
| PCIe | 4.0 x128 | ? | 4.0 x128 | 4.0 x64 |
| Chipset | On CPU | ? | On CPU | External |
| DDR4 | 8 x 3200 | 8 x 3200 | 8 x 3200 | 8 x 3200 |
| DRAM Cap | 4 TB | ? | 4 TB | 4 TB |
At 40 cores, Intel does look a little behind, especially as Ampere is currently at 80 cores and a higher frequency, and will come out with a 128-core Altra Max version here very shortly. This means Ampere will be able to enable more cores in a single socket than Intel can in two sockets. Intel’s competitive advantage however will be the large current install base and decades of optimization, as well as new security features and its total offering to the market.
On a pure x86 level, AMD launched Milan only a few weeks ago, with its new Zen 3 core which has been highly impressive. Using a chiplet based approach, AMD has over 1000 mm² of silicon to spread across 64 high performance cores and massive amounts of IO. Compared to Intel, which is around 660 mm² and monolithic, AMD has the chipset onboard in its IO die, whereas Intel keeps it external which saves a good amount of idle power. Top of stack pricing between AMD and Intel is similar now; however, AMD is also focusing on the mid-range with products like the 7F53, which really impressed us. We’ll see what Intel can respond with.
In our numbers today, we’ll be comparing Intel’s top-of-stack to everyone else. The battle royale of behemoths.
Gen on Gen Improvements: ISO Power
It is also important to look at what Intel is offering generationally in a like-for-like comparison. Intel’s 28-core 205 W point for the previous generation Cascade Lake is a good stake in the ground, and the Intel Xeon Gold 6258R is the dual socket equivalent of the Platinum 8280. We reviewed the two and they performed identically.
For this review, we’ve put the 40-core Xeon Platinum 8380 down to 205 W to see the effect on performance. But perhaps more in line, we also have the Xeon Gold 6330, which is a direct 28-core and 205 W replacement.
Intel Xeon Comparison: 3rd Gen vs 2nd Gen (2P, 205 W vs 205 W)

| Xeon Gold 6330 | Xeon Platinum 8352Y | | Xeon Gold 6258R |
|---|---|---|---|
| 28 / 56 | 32 / 64 | Cores / Threads | 28 / 56 |
| 2000 MHz Base, 3100 MHz ST, 2600 MHz MT | 2200 MHz Base, 3400 MHz ST, 2800 MHz MT | Frequency | 2700 MHz Base, 4000 MHz ST, 3300 MHz MT |
| 35 MB + 42 MB | 40 MB + 48 MB | L2 + L3 Cache | 28 MB + 38.5 MB |
| 205 W | 205 W | TDP | 205 W |
| PCIe 4.0 x64 | PCIe 4.0 x64 | PCIe | PCIe 3.0 x48 |
| 8 x DDR4-3200 | 8 x DDR4-3200 | DRAM Support | 6 x DDR4-2933 |
| 4 TB | 4 TB | DRAM Capacity | 1 TB |
| 4 TB Optane + 2 TB DRAM | 4 TB Optane + 2 TB DRAM | Optane | 1 TB DDR4-2666 + 1.5 TB |
| 64 GB | 64 GB | SGX Enclave | None |
| 1P, 2P | 1P, 2P | Socket Support | 1P, 2P |
| 3 x 11.2 GT/s | 3 x 11.2 GT/s | UPI Links | 3 x 10.4 GT/s |
The 6330 might seem like a natural fit; however, the 8352Y feels like the better comparison given that it is closer in price and offers more performance. Intel is promoting a +20% raw performance boost with the new generation, which is important here, because the 8352Y still loses 500 MHz to the previous generation in all-core frequency. The 8352Y and 6330 do make up for it with extra features, such as more DDR4 channels, memory support, PCIe 4.0, Optane support, SGX enclave support, and faster UPI links.
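As a rough illustration of why the +20% figure matters against a 500 MHz all-core deficit, the sketch below multiplies cores, all-core (MT) frequency, and a per-clock uplift factor. This is a deliberately naive model and not a measurement: it assumes Intel's +20% can be read as a per-core, per-clock uplift, that throughput scales perfectly with cores and frequency, and that the MT frequencies in the comparison table belong to the SKUs named here. The `Sku` type and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sku:
    name: str
    cores: int
    all_core_mhz: int        # "MT" frequency from the comparison table
    per_clock_uplift: float  # 1.0 = previous generation; 1.2 is an ASSUMED reading of "+20%"


def naive_throughput(sku: Sku) -> float:
    """Cores x all-core frequency x per-clock uplift; ignores memory, cache and turbo behaviour."""
    return sku.cores * sku.all_core_mhz * sku.per_clock_uplift


def relative_to(sku: Sku, baseline: Sku) -> float:
    """Naive throughput of `sku` as a multiple of `baseline`."""
    return naive_throughput(sku) / naive_throughput(baseline)


# Figures taken from the comparison table; the column-to-SKU mapping is an assumption.
gold_6258r = Sku("Xeon Gold 6258R", cores=28, all_core_mhz=3300, per_clock_uplift=1.0)
plat_8352y = Sku("Xeon Platinum 8352Y", cores=32, all_core_mhz=2800, per_clock_uplift=1.2)
gold_6330 = Sku("Xeon Gold 6330", cores=28, all_core_mhz=2600, per_clock_uplift=1.2)

if __name__ == "__main__":
    for sku in (plat_8352y, gold_6330):
        print(f"{sku.name}: {relative_to(sku, gold_6258r):.2f}x the {gold_6258r.name}")
```

Under these assumptions the 8352Y comes out at roughly 1.16x the 6258R and the 6330 at roughly 0.95x, which is one way to see why the 8352Y reads as the closer generational comparison. Real results depend on workload, memory bandwidth, and actual turbo residency, so the benchmark pages remain the reference.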
This review has a few of our 6330 numbers that we’ve been able to run in the short time we’ve had the system.
mode_13h - Monday, April 12, 2021
With regard specifically to testing AVX-512, perhaps the best method is to include results both with and without it. This serves the dual-role of informing customers of the likely performance for software compiled with more typical options, as well as showing how much further performance is to be gained by using an AVX-512 optimized build.
KurtL - Wednesday, April 7, 2021
GCC the industry standard in real world? Maybe in that part of the world where you live, but not everywhere. It is only true in a part of the world. HPC centres have relied on icc for ages for much of the performance-critical code, though GCC is slowly catching up, at least for C and C++ but not at all for Fortran, an important language in HPC (I just read it made it back in the top-20 of most used languages after falling back to position 34 a year or so ago). In embedded systems and the non-x86-world in general, LLVM derived compilers have long been the norm. Commercial compiler vendors and CPU manufacturers are all moving to LLVM-based compilers or have been there for years already.
Wilco1 - Wednesday, April 7, 2021
Yes GCC is the industry standard for Linux. That's a simple fact, not something you can dispute.
In HPC people are willing to use various compilers to get best performance, so it's certainly not purely ICC. And HPC isn't exclusively Intel or x86 based either. LLVM is increasing in popularity in the wider industry but it still needs to catch up to GCC in performance.
mode_13h - Wednesday, April 7, 2021
GCC is the only supported compiler for building the Linux kernel, although Google is working hard to make it build with LLVM. They seem to believe it's better for security.
From the benchmarks that Phoronix routinely publishes, each has its strengths and weaknesses. I think neither is a clear winner.
Wilco1 - Thursday, April 8, 2021
Plus almost all distros use GCC - there is only one I know that uses LLVM. LLVM is slowly gaining popularity though.
They are fairly close for general code, however recent GCC versions significantly improved vectorization, and that helps SPEC.
Wilco1 - Tuesday, April 6, 2021
ICC and AMD's AOCC are SPEC trick compilers. Neither is used much in the real world since for real code they are typically slower than GCC or LLVM.
Btw are you equally happy if I propose to use a compiler which replaces critical inner loops of the benchmarks with hand-optimized assembler code? It would be foolish not to take advantage of the extra performance you get only on those benchmarks...
ricebunny - Tuesday, April 6, 2021
They are not SPEC tricks. You can use these compilers for any compliant C++ code that you have. In the last 10 years, the only time I didn’t use icc with Intel chips was on systems where I had no control over the sw ecosystem.
Wilco1 - Tuesday, April 6, 2021
They only exist because of SPEC. The latest ICC is now based on LLVM since it was falling further behind on typical code.
ricebunny - Tuesday, April 6, 2021
From my experience icc consistently produced better vectorized code.
Anandtech again didn’t publicize the compiler flags they used to build the benchmark code. By default, gcc will not generate avx512 optimized code.
Wilco1 - Tuesday, April 6, 2021
Maybe compared to old GCC/LLVM versions, but things have changed. There is now little difference between ICC and GCC when running SPEC in terms of vectorized performance. Note the amount of code that can benefit from AVX-512 is absolutely tiny, and the speedups in the real world are smaller than expected (see eg. SIMDJson results with hand-optimized AVX-512).
And please read the article - the setup is clearly explained in every review: "We compile the binaries with GCC 10.2 on their respective platforms, with simple -Ofast optimisation flags and relevant architecture and machine tuning flags (-march/-mtune=Neoverse-n1 ; -march/-mtune=skylake-avx512 ; -march/-mtune=znver2 (for Zen3 as well due to GCC 10.2 not having znver3). "