A Modest Tick

Broadwell is a tick, a die shrink of an existing architecture rather than a new architecture, so you should expect only modest IPC improvements. Most Xeon E5 v4 SKUs have slightly lower clockspeeds than their Haswell-based v3 brethren, so overall single-threaded performance has hardly improved. Clock for clock, Intel tells us that its simulation tools show Broadwell delivering about 5% better performance per clock in non-AVX2 traces.
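To see why a 5% IPC gain can be cancelled by a small clock deficit, here is a rough back-of-the-envelope sketch. It assumes single-threaded performance scales as IPC times clockspeed, and the 2.3 GHz and 2.2 GHz figures are illustrative placeholders, not measurements from this review.

```python
def relative_performance(ipc_gain: float, old_clock_ghz: float, new_clock_ghz: float) -> float:
    """Estimate new/old single-threaded performance, assuming perf ~ IPC x clock.

    ipc_gain is the fractional per-clock improvement (0.05 means 5%).
    """
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)


# Illustrative clocks only (assumed, not taken from the article's tables).
ratio = relative_performance(ipc_gain=0.05, old_clock_ghz=2.3, new_clock_ghz=2.2)
print(f"Estimated single-threaded performance ratio: {ratio:.3f}")
```

With those assumed clocks the ratio lands very close to 1.0, which is the "hardly improved" outcome described above.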

First Y-axis + bars: simulated single threaded performance improvement. Blue line + second Y-axis is the cumulative improvement.

In that sense, Broadwell is basically a Haswell made on Intel's 14nm second generation tri-gate transistor process. Intel did make a few subtle improvements to the micro-architecture:

  • Faster divider: lower latency & higher throughput
  • AVX multiply latency has decreased from 5 to 3 cycles
  • Bigger TLB (1.5k vs 1k entries)
  • Slightly improved branch prediction (as always)
  • Larger scheduler (64 vs 60)

None of these changes will yield large performance gains. The larger improvements must come from other features.

New Features

Compared to Haswell-EP, Broadwell-EP also includes some new features. The first one is the improved power control unit. 

On Haswell, a single AVX instruction on one core forced all cores on the same socket to lower their clockspeed by around 2 to 4 speed bins (-200 to -400 MHz) for at least 1 ms, as AVX has a higher power requirement that reduces how much a CPU can turbo. On Broadwell, only the cores that run AVX code reduce their clockspeed, allowing the other cores to run at higher speeds.

The other performance feature is the vastly improved PCLMULQDQ (carry-less multiplication) instruction: throughput has been doubled, and latency reduced from 7 cycles to 5.
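For readers unfamiliar with carry-less multiplication, the following is a minimal software sketch of the operation. It is only a bit-by-bit illustration of the arithmetic (multiplication of polynomials over GF(2), where additions become XORs), not how the hardware instruction is implemented, and the function name `clmul` is our own.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers.

    Works like schoolbook binary multiplication, except the partial
    products are combined with XOR instead of addition, so no carries
    propagate between bit positions.
    """
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR in the shifted partial product
        b >>= 1
        shift += 1
    return result


# Ordinary multiplication gives 3 * 3 = 9; without carries the result differs.
print(clmul(3, 3))
```

The hardware instruction does this for two 64-bit operands in one go, producing a 128-bit result, which is why it is useful for CRC and GCM-style computations.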

The faster PCLMULQDQ increases AES (symmetric) encryption performance by 20-25%, and CRCs (Cyclic Redundancy Checks) are up to 90% faster. Broadwell also has new ADCX/ADOX instructions to speed up asymmetric encryption algorithms such as the popular RSA. These improvements are implemented in OpenSSL 1.0.2-beta3. But don't expect too much from them. The compute-intensive asymmetric encryption is mostly used to initiate a secure connection. Most modern web applications keep their sessions "alive", and as a result, events that require asymmetric encryption happen a lot less frequently. Symmetric encryption (like AES), which is used to send encrypted data, is a lot lighter, so even on a fully encrypted website with long encrypted data streams, encryption is only a small percentage (<5%) of the total computing load.
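The "don't expect too much" point can be made concrete with Amdahl's law. The sketch below assumes encryption is exactly 5% of the total load (the upper bound quoted above) and that only that portion speeds up by 25%; both inputs are simplifications for illustration.

```python
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-workload speedup when only a fraction gets faster.

    accelerated_fraction: share of total time spent in the accelerated part (0..1).
    local_speedup: how much faster that part runs (1.25 means 25% faster).
    """
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / local_speedup)


# Assumed inputs: 5% of the load is AES, and AES runs 25% faster.
gain = overall_speedup(accelerated_fraction=0.05, local_speedup=1.25)
print(f"Overall speedup: {gain:.4f}x")
```

Under these assumptions the whole workload gets roughly 1% faster, so the encryption improvements matter mainly for workloads that are dominated by cryptography.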


  • Casper42 - Thursday, March 31, 2016 - link

    HPE just dropped the 64GB LRDIMMs a week or two back.
    They are now exactly 2x the 32GB LRDIMM as far as List Price goes.
    LRDIMMs across the board are 31% more expensive than RDIMMs.
  • wishgranter - Tuesday, April 5, 2016 - link

    While introducing a wide array of 10nm-class DDR4 modules with capacities ranging from 4GB for notebook PCs to 128GB for enterprise servers, Samsung will be extending its 20nm DRAM line-up with its new 10nm-class DRAM portfolio throughout the year.
  • nathanddrews - Thursday, March 31, 2016 - link

    Perf/W is obviously a very exciting metric for server farmers and it's generally exciting from a basic technology perspective, but its absolute performance isn't amazing. Anyway, it's not like I'll be buying one anyway. LOL
  • asendra - Thursday, March 31, 2016 - link

    This interests me insofar as these would be the updated processors in a supposedly-coming-this-year Mac Pro refresh. Not that I would personally fork out that much cash, but I'm interested to see how much of a jump they will make.

    But things seem rather bleak. No wonder they decided to wait 3 years for a refresh.
  • MrSpadge - Thursday, March 31, 2016 - link

    Not sure which years you're counting in, but for the majority of us it takes 1.5 years from 09/2014 to today.
  • asendra - Thursday, March 31, 2016 - link

    Apple didn't update the MacPros with Haswell-EP. They are still using Ivy Bridge
  • tipoo - Thursday, March 31, 2016 - link

    Wonder what they'll do on the GPU side though. Too early for next generation 14nm FF GPUs from anyone, if Nvidia was even a choice due to OpenCL politics. Another GCN 1.0 part in 2016 would be...A bag of hurt.

    Still waiting on the high end 15" rMBP to have something better than GCN 1.0...The performance, shockingly, hasn't come all that far from even my Iris Pro model. Maybe double, which is something, but I'd like larger than that to upgrade from integrated...
  • extide - Thursday, March 31, 2016 - link

    Nah, if they refresh it late this year, like in august or something like that, then 14/16nm FF GPU's will be available.

    At worst we would get GCN 1.2, but yeah it would suck to see 28nm GPU's put in there...
  • mdriftmeyer - Thursday, March 31, 2016 - link

    On what planet do you not grasp FinFET 14nm end of June from AMD?
