Migrating from Broadwell to Skylake-SP:

More Cores, More Memory, Cache, AVX-512

The way that Intel is approaching the Xeon D product family is changing. Previously, the Broadwell-based Xeon parts were compact, with HEDT level core counts and a nice feature set. For this generation, Intel has decided to migrate to the server Skylake-SP core, rather than the standard Skylake-S core. Along with the generational enhancements over Broadwell, this means an adjusted cache hierarchy, use of Intel’s new core-to-core mesh technology, and the addition of AVX-512 units. This means that the new Xeon D-2100 series is, in silicon, a big 18-core Skylake-SP behemoth. We have reached out to Intel to confirm if this is the case for all the cores in the stack, as well as die sizes and transistor counts. Stay tuned for that information.

As we reported in depth in our analysis of the Skylake-SP core, the implementation of the cache, mesh, and AVX-512 changes has a significant impact on how software has to approach memory accesses and core-to-core communication on these cores. Here’s a brief primer on the changes:

Migrating from Broadwell to Skylake-SP

| | D-1500 | D-2100 | Overall |
|---|---|---|---|
| Cache: L2 | 256 KB L2 | 1 MB L2 | Rule-of-thumb 2x hit-rate in L2; overall L3 not as useful |
| Cache: L3 | 1.5 MB L3/core; L3 inclusive of L2 | 1.375 MB L3/core; L3 is victim cache | |
| Mesh / Uncore | Ring bus | Each node (PCIe / core / DRAM) has crossbar partition for x/y routing | Better scaling and better average node-to-node latency, but more complex and more power at same frequency |
| AVX-512 | AVX, AVX 2.0 | AVX512F, AVX512CD, AVX512BW, AVX512DQ, AVX512VL | Substantial vector performance improvement; increase in peak TDP and silicon die area |

A note on the AVX-512 support: Intel’s current consumer and enterprise-based Skylake-SP processors offer two different variants for this. Some processors, most notably the cheaper ones, only have one 512-bit FMA (fused multiply-add) execution unit for AVX-512 support, while the bigger and more expensive processors have two FMA units. Having two FMA units in action keeps the AVX-512 silicon fed and increases throughput, at the potential expense of power and longevity. For Xeon D-2100, Intel has stated that all of the processors only have a single 512-bit FMA unit.
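To put the one-versus-two FMA difference in numbers, here is a rough sketch of theoretical peak floating-point throughput per core per cycle. It assumes each 512-bit FMA unit retires one fused multiply-add per lane per cycle and counts that as two operations. The function name is ours for illustration, and real throughput also depends on clocks under AVX-512 load, which this ignores.

```python
# Theoretical peak FLOPs per core per cycle for 512-bit FMA units.
# Assumption: one FMA issued per unit per cycle, counted as 2 FLOPs per lane.
VECTOR_BITS = 512


def peak_flops_per_cycle(fma_units: int, element_bits: int) -> int:
    """Return peak FLOPs/cycle/core for the given number of 512-bit FMA units."""
    lanes = VECTOR_BITS // element_bits  # 8 lanes for FP64, 16 for FP32
    return fma_units * lanes * 2         # multiply + add per lane


if __name__ == "__main__":
    for units in (1, 2):
        fp64 = peak_flops_per_cycle(units, 64)
        fp32 = peak_flops_per_cycle(units, 32)
        print(f"{units} FMA unit(s): {fp64} FP64 FLOPs/cycle, {fp32} FP32 FLOPs/cycle")
```

By this count a single-FMA part such as the D-2100 has half the per-cycle peak of a dual-FMA part at the same clock.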

Also on the mesh: compared to the old ring bus methodology, from our tests on the consumer line, it is clear that Intel is running the mesh (or the ‘uncore’) at a lower frequency overall, which may cause a drop in core-to-core bandwidth in the newer processors. We have asked Intel to confirm what mesh frequency is being used in the D-2100 series, and we are waiting to see if that information will be disclosed. It is worth noting that when we get access to these parts, we can probe for the frequency very easily, so it would help if Intel officially disclosed the value.

With the core migration come other feature changes. The new Xeon D-2100 will be rated for quadruple the memory capacity of the Xeon D-1500, as the number of memory channels doubles from two to four, and RDIMM/LRDIMM modules up to 64 GB each are supported. This means a single processor can now support 512 GB using eight 64 GB RDIMMs. The previous generation only supported 32 GB RDIMMs.
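The capacity arithmetic can be checked with a short sketch. The two-DIMMs-per-channel figure is inferred from eight RDIMMs across four channels, and we assume the same ratio for the D-1500. The helper name is hypothetical.

```python
# Maximum memory capacity = channels x DIMMs per channel x DIMM size.
# Assumption: two DIMMs per channel on both generations (inferred, not stated).
def max_memory_gb(channels: int, dimms_per_channel: int, dimm_gb: int) -> int:
    """Return the maximum supported memory in GB for one processor."""
    return channels * dimms_per_channel * dimm_gb


d1500_gb = max_memory_gb(channels=2, dimms_per_channel=2, dimm_gb=32)
d2100_gb = max_memory_gb(channels=4, dimms_per_channel=2, dimm_gb=64)

if __name__ == "__main__":
    print(f"Xeon D-1500: {d1500_gb} GB")
    print(f"Xeon D-2100: {d2100_gb} GB ({d2100_gb // d1500_gb}x)")
```

Doubling both the channel count and the largest supported module is what produces the fourfold increase.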

 

Up to 32 PCIe 3.0 Lanes, 20 HSIO Lanes, Storage and VROC

For this generation, Intel has kept the number of publicly available PCIe lanes for add-in controllers at 32. That means we are likely to see implementations with x16/x8 PCIe configurations, and it also opens up opportunities in cold/warm storage for more RAID cards, or in communications for additional transceivers or accelerators. We say ‘publicly’ available because the chip clearly has more lanes than the previous version: the presence of QuickAssist means that the design likely has at least 48 (or 64 if the silicon is identical to the Xeon SP XCC chip design). Due to product segmentation and features like QAT, however, the number of lanes for other controllers is kept constant. To a certain extent, this allows Intel to offer the D-2100 series almost as a drop-in replacement for those that want to upgrade to Skylake cores.


The Lewisburg Chipset with 26 HSIO lanes, found in Xeon SP

As Xeon D is marketed as a system-on-chip, the traditional chipset is integrated into the platform. Intel has integrated one of its latest series of chipsets, and is offering 20 PCIe 3.0 High-Speed IO (HSIO) lanes for this. As with the chipsets, there will be limits as to where the lanes can go: these are typically limited to a PCIe 3.0 x4 connection at most, and some network controllers are limited to certain HSIO slots, but it does allow for intricate systems to be built.

One of the benefits of the number of PCIe lanes, as well as PCIe switch support, is for storage. Intel is targeting both long-term backup (cold storage) and content delivery networks/CDNs (warm storage) with this product line, and so Intel is keen to promote its PCIe storage and NVMe support. We confirmed that the Xeon D-2100 will be supporting Intel’s Virtual RAID on CPU (VROC), which means that hardware-based RAID 0 and RAID 1 configurations will have additional support benefits, but will be limited to specific NVMe drives and require a hardware-based VROC key provided by the OEM. Intel also states that Xeon D-2100 will have fourteen integrated SATA ports, an increase from six on the previous generation, although Intel has not disclosed how many AHCI controllers this involves, or the SATA RAID support for these controllers on the platform. The platform also supports some legacy IO: eSPI, LPC and SMBus.

Either way we slice it, Xeon D-2100 is coming across as a Skylake-SP HCC core and a Lewisburg chipset either melded into one, or two chips on the same package. Lewisburg has options available for different levels of QuickAssist and 10GbE, just as the Xeon D-2100 series does. In order to get QAT and 10GbE, the Xeon SP platform has to provide 16 PCIe lanes from the CPU to the chipset for bandwidth. We know the Xeon SP HCC core has 64 PCIe lanes total, so if 16 each are used for QAT and 10GbE, that leaves 32 in play, which is what Xeon D-2100 has. Lewisburg also supports the same IO: 14 SATA ports. It wouldn't make sense for Intel to create a completely new silicon die just for Xeon D, right? If not, that makes Xeon D a multi-chip package with Xeon Gold and Lewisburg.
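The lane budget behind that speculation is simple enough to write down. This sketch takes the figures quoted above at face value (64 total, 16 each reserved for QAT and 10GbE). The function and dictionary names are hypothetical, and none of this is an Intel-confirmed allocation.

```python
# Speculative PCIe lane budget, using only the figures quoted in the text.
def remaining_lanes(total_lanes: int, reserved: dict[str, int]) -> int:
    """Return the lanes left over after internal reservations are subtracted."""
    leftover = total_lanes - sum(reserved.values())
    if leftover < 0:
        raise ValueError("reservations exceed the total lane count")
    return leftover


# Assumed internal uplinks to the integrated chipset functions.
internal_uplinks = {"QAT": 16, "10GbE": 16}
public_lanes = remaining_lanes(64, internal_uplinks)

if __name__ == "__main__":
    print(f"Lanes left for add-in controllers: {public_lanes}")
```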


22 Comments


  • Elstar - Wednesday, February 7, 2018

    I'm not sure what/who the target market is for the D-2191. The core count says "high end", but the TDP, base frequency, DDR frequency, and unique lack of integrated Ethernet is weird. It feels more like an "embedded Xeon-W" than a "Xeon-D".
  • IntelUser2000 - Wednesday, February 7, 2018

    Here's what one article had to say:

    "Looking back to the previous generation, Facebook utilized Mellanox multi-host adapters along with a custom version of the original Xeon D to lower networking costs and improve performance. We suspect that Intel is keenly aware of this and that is a part of the reason for that de-feature move."
  • Elstar - Wednesday, February 7, 2018

    That explains it. And after a few quick searches, I found Open Compute Project PDFs that explain the setup where integrated networking would be pointless. Thanks!
  • Lakados - Wednesday, February 7, 2018

    Always read the fine print:
    Benchmark results were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown". Implementation of these updates may make these results inapplicable to your device or system.

    While I can see uses for these, until I see how they run with the patches in place this announcement is garbage.
  • pavag - Wednesday, February 7, 2018

    So, you pay $2400 for Meltdown and Spectre?
  • Hurr Durr - Thursday, February 8, 2018 - link

    You`ve been paying for it for 20 years now without a single peep. You'll buy your Mossad processor and you will like it, goy. Reply
  • prisonerX - Friday, February 9, 2018 - link

    It's strange, I had to change to my AMD system to type "Palestinian genocide/Apartheid" it wouldn't work on my i5 box. Reply
  • Hurr Durr - Saturday, February 10, 2018 - link

    My i5 box always tries to inject something about toxic masculinity and opressive whiteness into every text I type in Word! Reply
  • none12345 - Thursday, February 8, 2018

    Showcasing benchmark results without applying critical patches seems wrong on every level.
  • prisonerX - Friday, February 9, 2018

    Just subtract 30% and you've got it.
