The Intel Xeon E5 v4 Review: Testing Broadwell-EP With Demanding Server Workloads
by Johan De Gelas on March 31, 2016 12:30 PM EST
A Modest Tick
Broadwell is a tick - a die shrink of an existing architecture rather than a new architecture - so you should expect only modest IPC improvements. Most Xeon E5 v4 SKUs have slightly lower clock speeds than their Haswell-based v3 brethren, so overall single-threaded performance has hardly improved. Clock for clock, Intel tells us that its simulation tools show Broadwell delivering about 5% better performance per clock in non-AVX2 traces.
First Y-axis + bars: simulated single threaded performance improvement. Blue line + second Y-axis is the cumulative improvement.
In that sense, Broadwell is basically a Haswell made on Intel's 14nm second generation tri-gate transistor process. Intel did make a few subtle improvements to the micro-architecture:
- Faster divider: lower latency & higher throughput
- AVX multiply latency has decreased from 5 to 3 cycles
- Bigger TLB (1.5k vs 1k entries)
- Slightly improved branch prediction (as always)
- Larger scheduler (64 vs 60)
None of these changes will yield large performance gains on its own. The larger gains must come from other features.
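As a rough illustration of why a per-clock gain of about 5% can be cancelled out by slightly lower clock speeds, single-threaded performance can be approximated as IPC times clock speed. The sketch below is only a back-of-the-envelope model: the two clock speeds are hypothetical values chosen for illustration and do not correspond to any specific SKU. Only the 5% figure comes from Intel's claim.

```python
def relative_single_thread_perf(ipc_gain, old_clock_ghz, new_clock_ghz):
    """Return new/old single-threaded performance, modelled as IPC x clock."""
    return (1.0 + ipc_gain) * (new_clock_ghz / old_clock_ghz)

# Hypothetical clock speeds, for illustration only (not real SKU data).
haswell_clock_ghz = 2.3
broadwell_clock_ghz = 2.2

# 0.05 is the ~5% per-clock improvement Intel quotes for non-AVX2 traces.
ratio = relative_single_thread_perf(0.05, haswell_clock_ghz, broadwell_clock_ghz)
print(f"Relative single-threaded performance: {ratio:.3f}")
```

With these assumed clocks, the 100 MHz deficit eats almost the entire per-clock gain, which matches the observation that single-threaded performance has hardly improved.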
New Features
Compared to Haswell-EP, Broadwell-EP also includes some new features. The first one is the improved power control unit.
On Haswell, a single AVX instruction on one core forced all cores on the same socket to lower their clock speed by around 2 to 4 speed bins (-200 to -400 MHz) for at least 1 ms, because AVX has a higher power requirement that reduces how far a CPU can turbo. On Broadwell, only the cores that run AVX code reduce their clock speed, allowing the other cores to run at higher speeds.
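The difference between the two policies can be sketched with a toy model. The core count, turbo clock and number of bins below are hypothetical, chosen only to make the arithmetic easy to follow. The 100 MHz bin size follows from the "2 to 4 bins = 200 to 400 MHz" figure. This is a simplification, not a description of how the power control unit actually decides.

```python
BIN_MHZ = 100  # one speed bin, as implied by "2 to 4 bins = 200 to 400 MHz"

def core_clocks(num_cores, avx_cores, turbo_mhz, avx_bins, per_core_avx):
    """Return the clock (MHz) of every core on one socket.

    per_core_avx=False: any AVX activity slows down all cores (Haswell-EP).
    per_core_avx=True:  only the cores running AVX slow down (Broadwell-EP).
    """
    penalty = avx_bins * BIN_MHZ
    clocks = []
    for core in range(num_cores):
        if per_core_avx:
            slowed = core in avx_cores
        else:
            slowed = len(avx_cores) > 0
        clocks.append(turbo_mhz - penalty if slowed else turbo_mhz)
    return clocks

# Hypothetical socket: 8 cores, 2600 MHz turbo, 3-bin AVX penalty, AVX on core 0.
haswell = core_clocks(8, {0}, 2600, 3, per_core_avx=False)
broadwell = core_clocks(8, {0}, 2600, 3, per_core_avx=True)

print("Haswell-style average:  ", sum(haswell) / len(haswell), "MHz")
print("Broadwell-style average:", sum(broadwell) / len(broadwell), "MHz")
```

In this toy case the seven cores that do not run AVX code keep their full turbo clock under the per-core policy, which is where mixed workloads stand to benefit.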
The other performance feature is the vastly improved PCLMULQDQ (carry-less multiplication) instruction: throughput has been doubled, and latency reduced from 7 cycles to 5.
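For readers unfamiliar with the operation: carry-less multiplication is ordinary binary long multiplication with the additions replaced by XOR, which makes it polynomial multiplication over GF(2). The function below is a plain software sketch of that operation on Python integers, meant only to show what is being computed. The name `clmul` is ours, and the hardware instruction itself multiplies two 64-bit operands into a 128-bit result in a single operation rather than looping over bits.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less multiply of two non-negative integers (GF(2) polynomials)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: partial products never carry
        a <<= 1
        b >>= 1
    return result

# (x + 1) * (x + 1) = x^2 + 1 over GF(2), because the two "x" terms cancel.
print(bin(clmul(0b11, 0b11)))   # 0b101, whereas ordinary 3 * 3 = 9 = 0b1001
# (x^2 + x) * (x + 1) = x^3 + x
print(bin(clmul(0b110, 0b11)))  # 0b1010
```

This kind of polynomial arithmetic is what CRC calculations are built on, which is why a faster PCLMULQDQ shows up directly in CRC throughput.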
This increases AES (symmetric) encryption performance by 20-25%, and CRCs (cyclic redundancy checks) are up to 90% faster. Broadwell also adds the new ADCX/ADOX instructions to speed up asymmetric encryption algorithms such as the popular RSA. These improvements are implemented in OpenSSL 1.0.2-beta3. But don't expect too much from them. Compute-intensive asymmetric encryption is mostly used to initiate a secure connection. Most modern web applications keep their sessions "alive", and as a result, events that require asymmetric encryption happen a lot less frequently. Symmetric encryption (like AES), which is used to send encrypted data, is a lot lighter, so even on a fully encrypted website with long encrypted data streams, encryption is only a small percentage (<5%) of the total computing load.
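To put numbers on that, Amdahl's law gives the overall speedup when only a fraction of the work gets faster. The sketch below plugs in the upper bounds quoted here (encryption at 5% of the load, AES 25% faster). It assumes "25% faster" means 1.25x the throughput and that all of the encryption work is AES, both of which are simplifications on our part.

```python
def overall_speedup(fraction, component_speedup):
    """Amdahl's law: speedup of the whole when `fraction` of it speeds up."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)

encryption_share = 0.05  # upper bound from the text: <5% of total compute
aes_speedup = 1.25       # assumption: "25% faster" read as 1.25x throughput

gain = overall_speedup(encryption_share, aes_speedup)
print(f"Overall speedup: {gain:.4f}x")
```

Under these assumptions the whole workload gets roughly 1% faster, which is why the crypto improvements are welcome but will not move most server benchmarks.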
112 Comments
ltcommanderdata - Friday, April 1, 2016 - link
Does anyone know the Windows support situation for Broadwell-EP for workstation use? Microsoft said Broadwell is the last fully supported processor for Windows 7/8.1 with Skylake getting transitional support and Kaby Lake will not be supported. So how does Broadwell-EP fit in? Is it lumped in with Broadwell and is fully supported or will it be treated like Skylake with temporary support until 2018 and only critical security updates after that? And following on will Skylake-EP see any Windows 7/8.1 support at all or will it not be supported since it'll presumably be released after Kaby Lake?
extide - Friday, April 1, 2016 - link
When MS says they are not supporting Skylake on Windows 7 DOES NOT MEAN it won't work. It just means they are not going to add any specific support for that processor in the older OS's. They are not adding in the speed shift support, essentially.
For some reason the press has not made this very clear, and many people are freaking out thinking that there will be a hard break here where stuff will straight up not work. That is not the case.
Broadwell has no new OS level features over Haswell (unlike Skylake with speed shift) so there is nothing special about Broadwell to the OS. As the poster above mentions, they are all x86 cpu's and will all still work with x86 OS's.
The difference here is between "Fully Supported" and Compatible. Skylake and even Kaby Lake will be compatible with Windows 7/8/8.1.
aryonoco - Friday, April 1, 2016 - link
Johan, this is yet again by far the best Enterprise CPU benchmark that's available anywhere on the net.
Thank you for your detailed, scientific and well documented work. Works like this are not easy, I can only imagine how many man hours (weeks?) compiling this article must have taken. I just want you to know that it's hugely appreciated.
JohanAnandtech - Friday, April 1, 2016 - link
Great to read this after weeks of hard work! :-D
fsdjmellisse - Friday, April 1, 2016 - link
hello, i want to buy E5-2630L v4
any one can give me website for buy it ?
Best regards
HrD - Friday, April 1, 2016 - link
I'm confused by the following:
"The following compiler switches were used on icc:
-fast -openmp -parallel
The results are expressed in GB per second. The following compiler switches were used on icc:
-O3 -fopenmp -static"
Shouldn't one of these refer to icc and the other to gcc?
JohanAnandtech - Friday, April 1, 2016 - link
Pretty sure I did not mix them up. "-fast" does not work on gcc, neither does -fopenmp work on icc.
patrickjp93 - Friday, April 1, 2016 - link
Um, wrong and wrong. -Ofast works with GCC 4.9 and later for sure. And -fopenmp is a valid ICC flag post-ICC 13.
JohanAnandtech - Saturday, April 2, 2016 - link
"-fast" is a typical icc flag. (I did not write "-Ofast", which works on gcc 4.8 too)
extide - Friday, April 1, 2016 - link
Johan, if you read the comment, you can see that you mention icc for BOTH.