A new update to the Intel document for software developers indicates that the company will begin to introduce various AVX-512 instruction set extensions to its consumer CPUs soon. This will start from the codenamed Cannon Lake (CNL) and Ice Lake (ICL) processors, made using 10 nm process technologies. The new extensions will enable future chips to improve performance in certain applications. One of the main questions on AVX-512 is which consumer programs will actually support the AVX-512 when these CNL and ICL processors hit the market. In addition to the AVX-512, the upcoming processors will introduce a host of other new non-AVX-512 instructions.

AVX-512 Coming to Consumer CPUs

According to the Intel Architecture Instruction Set Extensions and Future Features Programming Reference document, Intel’s Cannon Lake CPUs will support AVX512F, AVX512CD, AVX512DQ, AVX512BW, and AVX512VL. This will bring the feature set of these CPUs to the current level of the Skylake-SP based processors. In addition, the Cannon Lake microarchitecture will support the AVX512_IFMA and AVX512_VBMI commands, but at this point, it is unclear whether the support will be limited to servers, or will also be featured in the consumer processors (the latter scenario is likely based on the document wording, but remains unclear).

Intel originally promised to release Cannon Lake processors in 2016 – 2017 timeframe, but delayed introduction of its 10 nm process technology to 2018, thus postponing the CPU launch as well. Initially it was expected that the Cannon Lake CPUs would generally resemble the Kaby Lake and Coffee Lake chips with some refinements, but the addition of the AVX-512 support means a rather tangible architecture improvement. For AVX-512, large the chunks of data require massive memory bandwidth, which the Skylake-SP cores get due to large caches and more memory controllers. Keeping in mind memory bandwidth and power consumption factors, the AVX-512 might not be supported by all Cannon Lake client CPUs, but only by those aimed at higher-performance machines (i.e., no AVX-512 for ULP mobile parts as well as entry-level desktop SKUs, but this is a speculation at this point). Meanwhile, a good news is that by the time AVX-512-supporting Cannon Lake processors arrive, programs for client PCs that take advantage of the latest extensions will likely be available.

The evolution of the AVX-512 on general-purpose CPUs is not going to stop. Intel’s Ice Lake processors will support AVX512_VPOPCNTDQ (which will also be supported by the Xeon Phi ‘Knights Mill’) commands as well as AVX512_VNNI, AVX512_VBMI2, AVX512+VPCLMULQDQ and AVX512_BITALG instructions. The ICL chips will also feature AVX-512 versions of known AES and GFNI algorithms for encryption and error corrections — AVX512+VAES and AVX512+GFNI.

Meanwhile, the Knights Mill will exclusively support AVX512_4FMAPS and AVX512_4VNNI (at least for a while, because an Intel filing with the Linux kernel states that the upcoming Xeon Phi and Xeon CPUs will support both commands, but descriptions of Linux patches are not always accurate, plus, plans tend to change).

AVX-512 Support Propogation by Various Intel CPUs
  Xeon, Core X General Xeon Phi  
Skylake-SP AVX512BW
AVX512DQ
AVX512VL
AVX512F
AVX512CD
AVX512ER
AVX512PF
Knights Landing
Cannon Lake AVX512VBMI
AVX512IFMA
AVX512_4FMAPS
AVX512_4VNNIW
Knights Mill
Ice Lake AVX512_VNNI
AVX512_VBMI2
AVX512_BITALG
AVX512+VAES
AVX512+GFNI
AVX512+VPCLMULQDQ
AVX512_VPOPCNTDQ
Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (pages 12 and 13)

As it turns out from Intel’s document, the Cannon Lake and Ice Lake processors will have an up-to-date AVX-512 support. It is unknown whether the CNL and the ICL cores will be used inside the future server processors (remember that Intel has server-specific 'Cascade Lake' product incoming), but if this is the case, then it looks like Intel’s cores for server and client computers will have the same feature-set going forward, at least when it comes to the AVX-512 support.

Adding the AVX-512 to consumer processors looks like an important development even though the instruction set was primarily designed to process large amounts of data common for servers and, to a degree, workstations (such as encoding, rendering, cryptography, deep learning, etc.). Apparently, Intel believes that 512-bit INT/FP calculations will be important for mainstream PCs as well. A big question is how exactly Intel plans to implement the AVX-512 in various Cannon Lake and Ice Lake processors going forward. Keep in mind that Intel’s six and eight-core Skylake-X CPUs officially support one fused FMA for AVX-512-F, but the chips with 10+ cores officially support dual 512-bit AVX-512-F ports and can offer up to two times higher performance. So in that respect, there is potential for further differentiation between products.

In the meantime, Intel’s Cannon Lake and Ice Lake CPUs will have a number of other new instructions for various matters and they are certainly worth looking at.

New Instructions to Improve Security, Performance of Upcoming CPUs

In a bid to speed up certain cryptography algorithms, Cannon Lake will feature the SHA-NI instruction set that is already supported by the Goldmont cores. SHA-NI is of a similar base to AES-NI, that was added several generations prior. Based on Intel’s publications, SHA-NI can speed up SHA1, SHA256 and SHA224 algorithms. In addition, the new CPUs will also support the UMIP security mechanism that prevents the execution of certain instructions in if their privilege level is insufficient for that, preventing certain apps from accessing the OS settings.

The Ice Lake chips will bring support for Fast Short REP MOV instruction that will enable fast moves of large amounts of data from one location to another, which will benefit optimized memory-intensive applications. Keep in mind that we are moving towards persistent memory for a number of server applications and therefore large amounts of data located in DRAM and/or NVDIMMs will be more common in the future.

Another interesting feature supported by the Ice Lake consumer processors is CLWB (Cache Line Write Back) command for NVMe programming. The feature is already supported by the Skylake-SP cores and is required to better handle SSDs connected to the processor, but will come into consumer products with Ice Lake. CLWB flushes the write caches, but does not invalidate the data, making it available if it is needed after the line is flushed, thus improving performance in certain situations. Given the Purley/Skylake-SP context, CLWB is something required for upcoming NVDIMMs (based on 3D XPoint), but it is not completely clear how Intel expects to use it in case of consumer platforms (they make sense for certain workstation applications and for that reason CLWB is supported by SKL-SP). In any case, the addition of CLWB will add some speed in certain cases when very fast SSDs are used and cache miss is an issue.

There are other features coming in the Goldmont Plus (the heart of upcoming Gemini Lake SoCs) and Ice Lake processors, namely PTWRITE and RDPID, which seem to be aimed mostly at software developers and which purpose may not benefit end users right away.

Instruction Set Extensions of Cannon Lake, Ice Lake and Goldmont+ CPUs
  Instruction Purpose Description
Cannon Lake SHA-NI Security Cryptography acceleration.
UMIP

User-Mode Instruction Prevention
Security Prevents execution of certain instructions if the Current Privilege Level (CPL) is greater than 0. If these instructions were executed while in CPL > 0, user space applications could have access to system-wide settings such as the global and local descriptor tables, the task register and the interrupt descriptor table.
Ice Lake CLWB

Cache Line
Write Back
Performance Writes back modified data of a cache line similar to CLFLUSHOPT, but avoids invalidating the line from the cache (and instead transitions the line to non-modified state). CLWB attempts to minimize the compulsory cache miss if the same data is accessed temporally after the line is flushed if the same data is accessed temporally after the line is flushed.
Fast Short REP MOV Performance Enables fast moves of data from one location to another.
RDPID

Read Processor ID
General Quickly reads processor ID to discover its feature set and apply optimizations/use specific code path if possible.
Goldmont Plus PTWRITE

Write Data to a Processor Trace Packet
Debugging Unclear.
UMIP Security See above
RDPID General See above
Source: Intel Architecture Instruction Set Extensions and Future Features Programming Reference (pages 12 and 13)

Some History

Intel and AMD have been adding various instruction set extensions to the x86 architecture since the mid-1990s. Throughout the recent 20 years, both companies have brought in hundreds of new instructions designed to improve performance in various applications by SIMD instructions and feeding CPU cores large amounts of data at once or by using special-purpose hardware. Intel’s latest mainstream extensions are called the AVX/AVX2 and their main purposes were increasing the width of the register file (both SIMD and integer) to 256 bits and the introduction of commands like the FMA3 (that serves the same purpose — does relatively complex computations in one instruction). To perform 256-bit AVX2 operations, CPUs have to lower their frequency to maintain stability, as cores tend to draw a lot of power under such workloads, but even at lower clock rates AVX/AVX2 make a lot of sense and increase overall throughput.

The next step in the evolution of the instruction set extensions that Intel made was the AVX-512. With AVX-512 the company decided to introduce different sets of instructions for different applications and implemented them in different products. Some of the AVX-512 extensions are aimed primarily at enterprise workloads, whereas the others are needed for supercomputers or high performance compute. Implementing all of them in in all products hardly makes a lot of sense for Intel and its customers, so the latest Skylake-SP Xeons (and the high-end desktop processors) support one set of AVX-512 commands and the Xeon Phis support another one. In the meantime, contemporary mainstream consumer CPUs do not support AVX-512 at all.  One of the reasons for this is because the physical implementation significantly increases die size (by up to 15% in case of the Skylake core). Other factors such as the cost associated with a die increase, and partly because client applications today cannot take advantage of such instructions, are also in the mix. In the future, this is going to change as Intel plans to enable support of certain AVX-512 variations in its future Cannon Lake and Ice Lake processors for mainstream consumers.

Wrapping Up

The addition of the AVX-512 to the future consumer CPUs is a good news for those who use such processors for things like video encoding, rendering or other applications that are common for workstations. Meanwhile, with the Ice Lake consumer chips, Intel is adding a deep learning-specific (AVX512_VNNI) 512-bit instructions as well as the NV-DIMM-oriented features such as CLWB, although immediate advantages for this market segment are unclear. Intel is opening this information up to allow developers to prepare for these processors and develop software in advance. In any case, all new features are always welcome by many because at some point they start to bring certain advantages.

Related Reading

Source: Intel (via WikiChip Twitter).

Comments Locked

50 Comments

View All Comments

  • MrSpadge - Thursday, October 19, 2017 - link

    Mobo support won't change with AVX512. Either they need a new one for Cannon Lake or not.
  • Flunk - Thursday, October 19, 2017 - link

    Considering Intel is requiring new motherboards for Coffee Lake without any real reason I think we can be assured they'll require new ones for Canon Lake as well.
  • smilingcrow - Thursday, October 19, 2017 - link

    There is a reason but not one that everyone accepts as being valid.
    Personally I don't know enough about how power delivery to chips impacts the requirement to break compatibility with a new CPU design so I just don't know what the truth is.
    Some people prefer to believe their own 'story' about the scenario even though they know no more than me on the subject.
    That just tells you something about their levels of bias and intelligence rather than the underlying issue.
    But this is the age we live in where people believe their own fake 'truths' so are also suggestible to the fake news of others.
  • edzieba - Thursday, October 19, 2017 - link

    Intel have stuck to two-gens-per-socket for the last decade for the consumer socket line (Socket Hx), so I expect that to continue. Whatever the successor to Coffee Lake is, it will likely use the same socket as Coffee Lake. If that's Cannon Lake, then we'd see Cannon Lake on Socket H5 ('LGA 1151-2') and Ice Lake on Socket H6.
  • smilingcrow - Thursday, October 19, 2017 - link

    Historically that is correct but there is strong evidence pointing to CL being a platform that is stand alone so is neither forward or backward compatible.
  • gchernis - Thursday, October 19, 2017 - link

    Very exciting developments! (A copy-paste bug: "if the same data is accessed temporally after the line is flushed")
  • HStewart - Thursday, October 19, 2017 - link

    First of all, these next generation chips sound like a major CPU architecture change. My first job was almost 7 years of assembly language program and one thing is for sure REP MOV type of memory moves is used a lot - so Icy Lake looks quite fast chip especially if they implemented so existing code runs with new micro code.

    But one thing about Cannon Lake, I thought it was primary aim at low power mobile chips and also that was planned before year end. Maybe the desktop chips are 2018.

    This could mean that AVX-512 is coming to laptops which is going to be significant - because it means more applications are likely will used it since it is bigger market than desktop now a days
  • 29a - Thursday, October 19, 2017 - link

    I would like to see benchmarks with AVX enabled vs disabled if that is possible.
  • quaz0r - Thursday, October 19, 2017 - link

    Sweet! It's about time locusts were able to leverage the power of AVX-512. Honestly though "consumer" is the worst term EVER. EVER. Did I mention EVER? It maybe works to refer to people as locusts within the context of buying food perhaps, but CPUs? Why can't we ever just say "customers" or something that is at least neutral?
  • Elstar - Thursday, October 19, 2017 - link

    Just FYI – Intel has been incrementally optimizing REP MOV since at least Nehalem. This particular optimization should make it possible to use REP MOV in cases where the amount of data copied is "short".

Log in

Don't have an account? Sign up now