Announcement Four: AVX-512 & Favored Core

To complete the set, there are a couple of other points worth discussing. First up is that AVX-512 support coming to Skylake-X. Intel has implemented AVX-512 (or at least a variant of it) in the last generation of Xeon Phi processors, Knights Landing, but this will be the first implementation in a consumer/enterprise core.

Intel hasn’t given many details on AVX-512 yet, regarding whether there is one or two units per CPU, or if it is more granular and is per core. We expect it to be enabled on day one, although I have a suspicion there may be a BIOS flag that needs enabling in order to use it.

As with AVX and AVX2, the goal here is so provide a powerful set of hardware to solve vector calculations. The silicon that does this is dense, so sustained calculations run hot: we’ve seen processors that support AVX and AVX2 offer decreased operating frequencies when these instructions come along, and AVX-512 will be no different. Intel has not clarified at what frequency the AVX-512 instructions will run at, although if each core can support AVX-512 we suspect that the reduced frequency will only effect that core.

With the support of AVX-512, Intel is calling the Core i9-7980X ‘the first TeraFLOP CPU’. I’ve asked details as to how this figure is calculated (software, or theoretical), but it does make a milestone in processor design. We are muddying the waters a bit here though: an AVX unit does vector calculations, as does a GPU. We’re talking about parallel compute processes completed by dedicated hardware – the line between general purpose CPU and anything else is getting blurred.

Favored Core

For Broadwell-E, the last generation of Intel’s HEDT platform, we were introduced to the term ‘Favored Core’, which was given the title of Turbo Boost Max 3.0. The idea here is that each piece of silicon that comes off of the production line is different (which is then binned to match to a SKU), but within a piece of silicon the cores themselves will have different frequency and voltage characteristics. The one core that is determined to be the best is called the ‘Favored Core’, and when Intel’s Windows 10 driver and software were in place, single threaded workloads were moved to this favored core to run faster.

In theory, it was good – a step above the generic Turbo Boost 2.0 and offered an extra 100-200 MHz for single threaded applications. In practice, it was flawed: motherboard manufacturers didn’t support it, or they had it disabled in the BIOS by default. Users had to install the drivers and software as well – without the combination of all of these at work, the favored core feature didn’t work at all.

Intel is changing the feature for Skylake-X, with an upgrade and for ease-of-use. The driver and software are now part of Windows updates, so users will get them automatically (if you don’t want it, you have to disable it manually). With Skylake-X, instead of one core being the favored core, there are two cores in this family. As a result, two apps can be run at the higher frequency, or one app that needs two cores can participate.

Availability

Last but not least, let's talk about availability. Intel will likely announce availability during the keynote at Computex, which is going on at the same time as this news post goes live. The launch date should be sooner rather than later for the LCC parts, although the HCC parts are unknown. But no matter what, I think it's safe to say that by the end of this summer, we should expect a showdown over the best HEDT processor around.

Announcement Three: Skylake-X's New L3 Cache Architecture
Comments Locked

203 Comments

View All Comments

  • shady28 - Tuesday, May 30, 2017 - link

    Looks like a marketing stunt to me. I welcome the 6c/12t part, but most applications can't even effectively use 4c/8t processors. It is a complete waste for 99% of buyers and even the remaining 1% are likely to rarely see a benefit.
  • Maleorderbride - Tuesday, May 30, 2017 - link

    Your statement just betrays your ignorance and your lack of imagination. Computers are tools for quite a few people, so they will pay considerable sums for better tools which in turn earn them more money.

    Video editing and 3D work can and will use all cores. While I am not going to claim they are a large percentage of the market, they routinely purchase 8/10 core options. I have quite a few customers running X99 boards with a single E5-2696 V4 dropped in ($1400 on ebay) and it excels in some workflows.

    They are not "rarely" use these extra cores--they are using them every single day and it is the primary reason for purchase.
  • shady28 - Tuesday, May 30, 2017 - link


    Lol! The childish insults aside, you think those thoughts you regurgitated are new? Professional video editors make a tiny fraction of a tiny fraction of the market, and if they are smart they aren't using CPUs for much. Most people who profess this 'need' to do 3D video editing are playing anyway, not working. Like I already said, a fraction of a 1% use case.

    Common sense says Intel did not release these for the 0.1% of users who might be able to take advantage of it. They released it to make suckers of the other 99.9%. Your comments indicate they are once again succeeding.
  • Maleorderbride - Wednesday, May 31, 2017 - link

    Your post made a claim about 100% of the market. Obviously you over-claimed. You can't edit posts here, so your "like I said," followed by a watered down version of your post is just a transparent attempt to save your ego. Your assumptions about whether people who claim to be video editors are really "working" is irrelevant.

    As for blaming video professionals for even using a CPU, you obviously are unaware that some codecs are entirely CPU bound when transcoding, and that these professionals (DITs especially) are under pressure to complete transcodes as quickly as possible on location. Every other person there is waiting for them.

    Are many things GPU accelerated? Yes, but being "smart" has nothing to do with it. Sometimes one can use those 2x 1080 Ti's, but sometimes you need 18+ cores, or both. But I guess you got me, I'm a "sucker" if I buy the best tool for a job that makes money.
  • shady28 - Friday, June 2, 2017 - link

    First sentence in your post is a lie, else you're reading comprehension is challenged. My first post is just a few lines up, it said :
    "It is a complete waste for 99% of buyers and even the remaining 1% are likely to rarely see a benefit."
  • prisonerX - Wednesday, May 31, 2017 - link

    You use applications that are highly parallel everyday and you don't even know it. Maleorderbride is right: you're ignorant and unimaginative.
  • Meteor2 - Saturday, June 3, 2017 - link

    No shady28 is correct here. People who *truly* need HCC on desktop are a vanishingly small minority. This is about headlines and marketing.
  • Namisecond - Wednesday, May 31, 2017 - link

    Welcome to the 1%?
  • helvete - Friday, September 8, 2017 - link

    Have you ever tried to run more than one application at a time? /s
  • Bulat Ziganshin - Tuesday, May 30, 2017 - link

    i can give you details about avx-512 - they are pretty obvious from analysis of skylake execution ports. so

    1) avx-512 is mainly single-issue. all the avx commands that now are supported BOTH on port 0 & port 1, will become avx-512 commands supported on joined port 0+1

    2) a few commands that are supported only on port 5 (this are various bit exchanges), will be also single-issued in avx-512, which still means doubled perfromance - from single-issued avx-256 to single-issued avx-512

    3) a few commands that can be issued on any of 3 ports (0,1,5), including booleans and add/sub/cmp - so-lcalled PADD group, will be double-issued in avx-512, so they will get 33% uplift

    overall, ports 0&1 will join when executing 512-bit commands, while port 5 is extended to 512-bit operands. joined port 0&1 can execute almost any avx-512 command, except for a bit exchange ones, port 5 can execute bit exchanges and PADD group

    when going from sse to avx, intel sacrificed easy of programming for easy of hardware implemenation, resulting in almost fuull lack of commands that can exchane data between upper&lower parts of ymm register. avx-512 was done right, but this means that bit exchange commands require a full 512-bit mesh. so, intel mobed all these commands to port 5 providing full 512 bit implementation, while most remaining commands were moved into ports 0&1 where 512-bit command can be implemented as simple pair of 256-bit ones

    lloking at power budgets, it's obvious that simple doubling of execution resources (i.e. support of 512 bit commands instead of 256-bit ones) is impossible. in previous cpu generation, even avx commands increased energy usage by 40%, so it's easy to predict that extending each executed command to 512 bits will require another 80% increase

    of course, m/a analysis can't say anything about commands absent in avx2 set, so my guess that predicate register manipulations will also go to port 5, just to make the m/a a bit less asymmetric

    also it's easy to predict that in the next generations the first "improvement" will be to add FMAD capability to port 5, further doubling the marketing perfromance figures

    finally, their existing 22-core cpus are already perfrom more than SP teraflop, but this time teraflop will go into HEDT class (while 10 broadwell cores at 3 GHz are only 0.9 tflops capable)

Log in

Don't have an account? Sign up now