Migrating from Broadwell to Skylake-SP:

More Cores, More Memory, Cache, AVX-512

The way that Intel is approaching the Xeon D product family is changing. Previously, the Broadwell-based Xeon D parts were compact, with HEDT-level core counts and a nice feature set. For this generation, Intel has decided to migrate to the server Skylake-SP core, rather than the standard Skylake-S core. Along with the generational enhancements over Broadwell, this brings an adjusted cache hierarchy, use of Intel’s new core-to-core mesh technology, and the addition of AVX-512 units. It also means that the new Xeon D-2100 series is, in silicon, a big 18-core Skylake-SP behemoth. We have reached out to Intel to confirm whether this is the case for all the processors in the stack, and to ask for die sizes and transistor counts. Stay tuned for that information.

As we reported in depth in our analysis of the Skylake-SP core, the implementation of the cache, mesh, and AVX-512 changes has a significant impact on how software has to approach memory accesses and core-to-core communication when running on these cores. Here’s a brief primer on the changes:

Migrating from Broadwell to Skylake-SP

|               | D-1500 | D-2100 | Overall |
|---------------|--------|--------|---------|
| Cache: L2     | 256 KB L2 | 1 MB L2 | Rule-of-thumb 2x hit-rate in L2; overall, L3 not as useful |
| Cache: L3     | 1.5 MB L3/core; L3 inclusive of L2 | 1.375 MB L3/core; L3 is victim cache | |
| Mesh / Uncore | Ring bus | Each node (PCIe / core / DRAM) has crossbar partition for x/y routing | Better scaling and better average node-to-node latency, but more complex and more power at same frequency |
| AVX-512       | AVX, AVX 2.0 | AVX512F, AVX512CD, AVX512BW, AVX512DQ, AVX512VL | Substantial vector performance improvement; increase in peak TDP and silicon die area |
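To make the ring-versus-mesh scaling point a little more concrete, here is a rough sketch that compares the average hop count between nodes on a bidirectional ring and on a 2D mesh with x/y routing. The node counts and grid shape are purely illustrative assumptions and are not the actual Xeon D-2100 layout. Hop count is also only a proxy for latency, since real designs differ in per-hop cost and clocking.

```python
from itertools import product


def ring_avg_hops(n: int) -> float:
    """Average shortest-path hop count between distinct nodes on a bidirectional ring."""
    total = 0
    for a in range(n):
        for b in range(n):
            if a != b:
                d = abs(a - b)
                total += min(d, n - d)  # travel in whichever direction is shorter
    return total / (n * (n - 1))


def mesh_avg_hops(cols: int, rows: int) -> float:
    """Average hop count on a cols x rows mesh with x-then-y routing (Manhattan distance)."""
    nodes = list(product(range(cols), range(rows)))
    total = 0
    for (x1, y1) in nodes:
        for (x2, y2) in nodes:
            if (x1, y1) != (x2, y2):
                total += abs(x1 - x2) + abs(y1 - y2)
    n = len(nodes)
    return total / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical 24-node fabric: one ring versus a 6 x 4 grid.
    print(f"ring, 24 nodes: {ring_avg_hops(24):.2f} hops on average")
    print(f"mesh, 6 x 4:    {mesh_avg_hops(6, 4):.2f} hops on average")
```

With only a handful of nodes the two topologies come out about the same, and the gap opens up as the node count grows, which is the sense in which the mesh scales better.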

A note on the AVX-512 support: Intel’s current consumer and enterprise Skylake-SP processors come in two variants. Some processors, most notably the cheaper ones, have only one 512-bit FMA (fused multiply-add) execution unit for AVX-512, while the bigger and more expensive processors have two. Having two FMA units keeps the AVX-512 silicon fed and increases throughput, at the potential expense of power and longevity. For Xeon D-2100, Intel has stated that all of the processors have only a single 512-bit FMA unit.
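As a back-of-the-envelope illustration of what one versus two FMA units means, the sketch below computes theoretical peak floating-point operations per core per cycle. It uses the usual convention of counting each fused multiply-add as two operations. The function name is ours, and the figures ignore clock speed, AVX-512 frequency behavior, and memory bandwidth, so treat them as upper bounds rather than measured results.

```python
def peak_flops_per_cycle(fma_units: int, vector_bits: int = 512, element_bits: int = 64) -> int:
    """Theoretical peak FLOPs per core per cycle from the FMA units alone.

    Each FMA unit processes vector_bits / element_bits elements per cycle,
    and each fused multiply-add counts as two floating-point operations.
    """
    lanes = vector_bits // element_bits
    return fma_units * lanes * 2


if __name__ == "__main__":
    for units in (1, 2):
        dp = peak_flops_per_cycle(units, element_bits=64)
        sp = peak_flops_per_cycle(units, element_bits=32)
        print(f"{units} x 512-bit FMA: {dp} DP FLOPs/cycle, {sp} SP FLOPs/cycle")
```

On this simple model, a single-FMA part such as the Xeon D-2100 tops out at half the per-cycle vector throughput of a dual-FMA Skylake-SP part.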

Also on the mesh: compared to the old ring bus, our tests on the consumer line make it clear that Intel is running the mesh (or the ‘uncore’) at a lower frequency overall, which may cause a drop in core-to-core bandwidth in the newer processors. We have asked Intel to confirm what mesh frequency is being used in the D-2100 series, and we are waiting to see if that information will be disclosed. It is worth noting that once we get access to these parts we can probe for the frequency very easily, so it would help if Intel officially disclosed the value.

With the core migration come other feature changes. The new Xeon D-2100 will be rated for quadruple the memory capacity of the Xeon D-1500, as the number of memory channels doubles from two to four, and RDIMM/LRDIMM modules of up to 64 GB each are supported. This means a single processor can now support 512 GB using eight 64 GB RDIMMs. The previous generation only supported 32 GB RDIMMs.
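The capacity arithmetic can be laid out as a short sketch. Two DIMMs per channel is inferred from the eight-DIMM, four-channel figure for the D-2100. Applying the same two DIMMs per channel to the D-1500 is our assumption, chosen because it is consistent with the "quadruple" claim.

```python
def max_memory_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Maximum memory capacity as channels x DIMMs per channel x DIMM size."""
    return channels * dimms_per_channel * dimm_gb


# D-2100: four channels, eight DIMMs in total (so two per channel), 64 GB RDIMMs.
d2100_gb = max_memory_gb(channels=4, dimms_per_channel=2, dimm_gb=64)

# D-1500: two channels, 32 GB RDIMMs; two DIMMs per channel is an assumption.
d1500_gb = max_memory_gb(channels=2, dimms_per_channel=2, dimm_gb=32)

if __name__ == "__main__":
    print(f"D-2100: {d2100_gb} GB, D-1500: {d1500_gb} GB, ratio: {d2100_gb // d1500_gb}x")
```

Doubling the channel count and doubling the supported DIMM size each contribute a factor of two, which is where the fourfold increase comes from.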

 

Up to 32 PCIe 3.0 Lanes, 20 HSIO Lanes, Storage and VROC

For this generation, Intel has kept the number of publicly available PCIe lanes for add-in controllers at 32, which means we are likely to see implementations with x16/x8 PCIe links, but it also opens up opportunities in cold/warm storage for more RAID cards, or in communications for additional transceivers or accelerators. When we say ‘publicly’ available here, it is clear that the chip has more lanes than the previous version: the presence of QuickAssist means that there are likely at least 48 as part of the design (or 64 if the silicon is identical to the Xeon SP XCC chip design), but due to product segmentation and items like QAT, the number of lanes for other controllers is kept constant. To a certain extent, this allows Intel to offer the D-2100 series almost as a drop-in replacement for those who want to upgrade to Skylake cores.


The Lewisburg Chipset with 26 HSIO lanes, found in Xeon SP

As Xeon D is marketed as a system-on-chip, the traditional chipset is integrated into the platform. Intel has integrated one of its latest series of chipsets, and is offering 20 PCIe 3.0 High-Speed IO (HSIO) lanes for this. As with the chipsets, there will be limits as to where the lanes can go: these are typically limited to a PCIe 3.0 x4 connection at most, and some network controllers are limited to certain HSIO slots, but it does allow for intricate systems to be built.
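To put the x4 limit into perspective, the sketch below estimates the usable one-direction bandwidth of a PCIe 3.0 link from the standard 8 GT/s per-lane signaling rate and 128b/130b line encoding. It ignores packet and protocol overhead, so real-world throughput will be somewhat lower. The function name is ours.

```python
def pcie3_bandwidth_gbytes(lanes: int) -> float:
    """Approximate one-direction PCIe 3.0 bandwidth in GB/s, before protocol overhead."""
    transfer_rate_gt = 8.0          # PCIe 3.0 signaling rate per lane, in GT/s
    encoding_efficiency = 128 / 130  # 128b/130b line encoding
    bits_per_byte = 8
    return lanes * transfer_rate_gt * encoding_efficiency / bits_per_byte


if __name__ == "__main__":
    for width in (1, 4, 8, 16):
        print(f"PCIe 3.0 x{width}: ~{pcie3_bandwidth_gbytes(width):.2f} GB/s per direction")
```

An HSIO-attached device capped at x4 therefore has a ceiling of just under 4 GB/s in each direction, which is enough for a single NVMe drive or a 10GbE controller but a quarter of what a CPU-attached x16 link can offer.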

One of the benefits of the number of PCIe lanes, as well as PCIe switch support, is for storage. Intel is targeting both long-term backup (cold storage) and content delivery networks/CDNs (warm storage) with this product line, and so is keen to promote its PCIe storage and NVMe support. We confirmed that the Xeon D-2100 will support Intel’s Virtual RAID on CPU (VROC), which means that hardware-based RAID 0 and RAID 1 configurations will have additional support benefits, but will be limited to specific NVMe drives and require a hardware-based VROC key provided by the OEM. Intel also states that Xeon D-2100 will have fourteen SATA ports integrated, an increase from six SATA ports on the previous generation, although Intel has not disclosed how many AHCI controllers this is, or the SATA RAID support for these controllers on the platform. The platform also supports some legacy IO: eSPI, LPC and SMBus.

Either way we slice it, Xeon D-2100 is coming across as a Skylake-SP HCC die and a Lewisburg chipset, either melded into one or as two chips on the same package. Lewisburg has options available for different levels of QuickAssist and 10GbE, just like the Xeon D-2100 series. In order to get QAT and 10GbE, the Xeon SP platform has to provide 16 PCIe lanes from the CPU to the chipset for bandwidth. We know the Xeon SP HCC die has 64 PCIe lanes in total, so if 16 each are used for QAT and 10GbE, that leaves 32 in play, which is what Xeon D-2100 has. Lewisburg also supports the same IO: 14 SATA ports. It wouldn't make sense for Intel to create a completely new silicon die just for Xeon D, right? If it did not, that makes Xeon D a multi-chip package with Xeon Gold and Lewisburg.
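The lane budget behind this speculation can be written out as a quick sketch. The 64-lane total and the two 16-lane reservations are the figures quoted above. The split is our reading of the reasoning, not a lane map confirmed by Intel.

```python
# Lane budget for the speculated Skylake-SP HCC + Lewisburg arrangement.
total_lanes = 64

# Lanes assumed to be routed from the CPU to the integrated chipset functions.
reserved_lanes = {
    "QuickAssist (QAT)": 16,
    "10GbE": 16,
}

remaining_lanes = total_lanes - sum(reserved_lanes.values())

if __name__ == "__main__":
    for name, lanes in reserved_lanes.items():
        print(f"{name}: {lanes} lanes")
    print(f"Left for add-in controllers: {remaining_lanes} lanes")
```

The remainder matches the 32 publicly available lanes on the Xeon D-2100, which is what makes the multi-chip-package theory plausible.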

22 Comments

  • Threska - Wednesday, February 7, 2018

    "From a pure price perspective, this jump from the top core count part down to the one just below it is sizable, although Intel does have a history with this, such as the E3-1200 Xeon line where the top processor, with a 100 MHz higher frequency than the second best, was 30%+ higher in cost."

    Must be nice having a monopoly.
  • Qwertilot - Wednesday, February 7, 2018

    That's not a monopoly thing as, by definition, they provide very, very strong competition to themselves :) Some customers are presumably truly price insensitive for whatever reason.
  • Elstar - Wednesday, February 7, 2018

    If all you care about is upfront costs, then yes, Intel's high-end parts are expensive. But if you run a data center where "performance/watt" is critical, then the cost of the top parts is reasonable.
  • tamalero - Sunday, February 11, 2018

    I'm confused how 4 cores less but 600 MHz less per core on base frequency and all turbo is "better performance per watt" while being almost 1400 USD more per processor.
  • HStewart - Wednesday, February 7, 2018

    "Must be nice having a monopoly."

    Well, anybody who states that Intel has a monopoly should rethink that. Even Apple could be considered a monopoly, because they don't allow others to manufacture products on iOS. But the one that comes to mind the most is Qualcomm, with the recent announcements of Windows 10 for ARM, which only works on Qualcomm. Can we say Windows 10 for Qualcomm? Sorry, no thanks.

    But the real thing that makes Qualcomm a real monopoly is its telecommunications.
  • prisonerX - Friday, February 9, 2018

    I guess you can argue what a monopoly is, but Intel irrefutably abuses their dominant position in the marketplace. The former is not a sin, the latter is illegal. Intel is repugnant.
  • Yorgos - Wednesday, February 7, 2018

    "Living on the Edge"
    Right on.
    Will it work? Will it get infested due to the various security holes? Will it get bricked like their C2000 cousins?
    You can never tell what's going to come tomorrow when you use Intel.
    Living on the Edge.
  • HStewart - Wednesday, February 7, 2018

    I think you are trying to refer to Atom-based servers. They have been replaced with C3xxx versions, such as this 16-core part:

    https://ark.intel.com/products/97927/Intel-Atom-Pr...

    But if these new D series Xeons have lower power, I could see them replacing their C2000 cousins, or this Atom-based server.
  • DanNeely - Wednesday, February 7, 2018

    Looks like only 14 D2xxx CPUs (in all the tables/charts) not 15 as stated in the section header.
