Improvements to the Cache Hierarchy

The biggest under-the-hood change for the Ryzen 2000-series processors is in the cache latency. AMD claims it was able to shave one cycle from the L1 and L2 caches, several cycles from the L3, and improve DRAM performance. Because pure core IPC is intimately intertwined with the caches (their size, latency, and bandwidth), these numbers lead AMD to claim that the new processors offer a +3% IPC gain over the previous generation.

The numbers AMD gives are:

  • 13% Better L1 Latency (1.10ns vs 0.95ns)
  • 34% Better L2 Latency (4.6ns vs 3.0ns)
  • 16% Better L3 Latency (11.0ns vs 9.2ns)
  • 11% Better Memory Latency (74ns vs 66ns at DDR4-3200)
  • Increased DRAM Frequency Support (DDR4-2666 vs DDR4-2933)

It is interesting that in the official slide deck AMD quotes latency measured as time, although in private conversations at our briefing it was discussed in terms of clock cycles. Ultimately, latency measured as time can take advantage of other internal enhancements, such as higher clock speeds, whereas a pure engineer prefers to discuss clock cycles.
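To illustrate the difference with some hypothetical numbers of our own (not figures AMD quoted): quoting latency as time folds the clock speed into the result, because the time per access is simply the cycle count divided by the core frequency.

    latency (ns) = cycles / frequency (GHz)

    11 cycles at 3.7 GHz  ->  11 / 3.7  ≈  2.97 ns
    11 cycles at 4.3 GHz  ->  11 / 4.3  ≈  2.56 ns

The same 11-cycle cache looks roughly 14% 'faster' in nanoseconds purely from a higher clock, which is why the cycle counts are the more interesting engineering metric.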

Naturally we went ahead and tested both aspects of this equation: are the cache latencies actually lower, and do we get an IPC uplift?

Cache Me Ousside, How Bow Dah?

For our testing, we use a memory latency checker swept across the stride range of a single core's cache hierarchy (a simplified sketch of how such a test works follows the list below). For this test we used the following:

  • Ryzen 7 2700X (Zen+)
  • Ryzen 5 2400G (Zen APU)
  • Ryzen 7 1800X (Zen)
  • Intel Core i7-8700K (Coffee Lake)
  • Intel Core i7-7700K (Kaby Lake)
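For those curious what such a tool does under the hood, the core of a cache latency test is a chain of dependent loads walked through buffers of increasing size, so each step up in buffer size spills into the next level of the hierarchy. The following is a simplified sketch of our own in C (not the actual tool we used); a real test would also pin the thread to a core and convert the measured nanoseconds into cycles using the core clock:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Build a random cyclic pointer chain through n slots so that the
     * hardware prefetchers cannot guess the next address. */
    static void **build_chain(size_t n)
    {
        void **buf = malloc(n * sizeof(void *));
        size_t *order = malloc(n * sizeof(size_t));
        for (size_t i = 0; i < n; i++) order[i] = i;
        for (size_t i = n - 1; i > 0; i--) {            /* Fisher-Yates shuffle */
            size_t j = rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
        for (size_t i = 0; i < n; i++)                  /* link each slot to the next */
            buf[order[i]] = &buf[order[(i + 1) % n]];
        free(order);
        return buf;
    }

    int main(void)
    {
        /* Sweep buffer sizes from 4 KB up to 64 MB to walk L1 -> L2 -> L3 -> DRAM. */
        for (size_t bytes = 4096; bytes <= (64u << 20); bytes *= 2) {
            size_t n = bytes / sizeof(void *);
            void **chain = build_chain(n);
            void **p = chain;
            const size_t iters = 20 * 1000 * 1000;

            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (size_t i = 0; i < iters; i++)
                p = (void **)*p;                        /* each load depends on the previous one */
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
            /* print p so the compiler cannot optimise the dependent-load loop away */
            printf("%8zu KB : %6.2f ns per load  (%p)\n", bytes / 1024, ns / iters, (void *)p);
            free(chain);
        }
        return 0;
    }

Because every load depends on the result of the previous one, the average time per iteration approximates the load-to-use latency of whichever level of the hierarchy the buffer currently fits in.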

The most obvious comparison is between the AMD processors. Here we have the Ryzen 7 1800X from the initial launch, the Ryzen 5 2400G APU that pairs Zen cores with Vega graphics, and the new Ryzen 7 2700X processor.

This graph is logarithmic in both axes.

This graph shows that at every level of the cache design, the newest Ryzen 7 2700X requires fewer core clocks. The biggest difference is in the L2 cache latency, but the L3 shows a sizeable gain as well. The reason the L2 gain is so large, especially between the 1800X and the 2700X, is an interesting story.

When AMD first launched the Ryzen 7 1800X, the L2 latency was tested and listed at 17 clocks. This was a little high – it turns out that the engineers had intended for the L2 latency to be 12 clocks initially, but ran out of time to tune the firmware and layout before sending the design off to be manufactured, leaving 17 cycles as the best compromise the design was capable of without causing issues. With Threadripper and the Ryzen APUs, AMD tweaked the design enough to hit an L2 latency of 12 cycles, which was not specifically promoted at the time despite the benefits it provides. Now with the Ryzen 2000-series, AMD has reduced it further to 11 cycles. We were told that this was due both to the new manufacturing process and to additional tweaks made to ensure signal coherency. In our testing, we actually saw an average L2 latency of 10.4 cycles, down from 16.9 cycles on the Ryzen 7 1800X.

The L3 difference is a little unexpected: AMD stated a 16% better latency, from 11.0 ns down to 9.2 ns, but we saw a change from 10.7 ns to 8.1 ns, which was a drop from 39 cycles to 30 cycles.
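For reference, those cycle counts line up with the time figures if we assume (our assumption, not something AMD confirmed) that the cores were running near their base clocks during the test:

    Ryzen 7 1800X : 39 cycles / 3.6 GHz ≈ 10.8 ns  (we measured 10.7 ns)
    Ryzen 7 2700X : 30 cycles / 3.7 GHz ≈  8.1 ns  (we measured  8.1 ns)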

Of course, we could not go without comparing AMD to Intel, and this is where it got very interesting. The cache configurations of the Ryzen 7 2700X and Core i7-8700K are different:

CPU Cache uArch Comparison

                 AMD                        Intel
                 Zen (Ryzen 1000)           Kaby Lake (Core 7000)
                 Zen+ (Ryzen 2000)          Coffee Lake (Core 8000)
    L1-I Size    64 KB/core                 32 KB/core
    L1-I Assoc   4-way                      8-way
    L1-D Size    32 KB/core                 32 KB/core
    L1-D Assoc   8-way                      8-way
    L2 Size      512 KB/core                256 KB/core
    L2 Assoc     8-way                      4-way
    L3 Size      8 MB/CCX (2 MB/core)       2 MB/core
    L3 Assoc     16-way                     16-way
    L3 Type      Victim                     Write-back
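For readers who want to relate the size and associativity figures in the table to the actual cache geometry, the set count falls out of size / (ways × line size); both architectures use 64-byte cache lines. A minimal sketch of our own (illustrative only):

    #include <stdio.h>

    /* Sets in a set-associative cache = total size / (ways * line size).
     * Both Zen/Zen+ and Kaby/Coffee Lake use 64-byte cache lines. */
    static unsigned sets(unsigned size_bytes, unsigned ways, unsigned line_bytes)
    {
        return size_bytes / (ways * line_bytes);
    }

    int main(void)
    {
        printf("Zen L1-D (32 KB,  8-way): %u sets\n", sets(32 * 1024, 8, 64));   /* 64   */
        printf("Zen L2   (512 KB, 8-way): %u sets\n", sets(512 * 1024, 8, 64));  /* 1024 */
        printf("CFL L2   (256 KB, 4-way): %u sets\n", sets(256 * 1024, 4, 64));  /* 1024 */
        return 0;
    }

Interestingly, because the 2700X's L2 is twice as large but also twice as associative as Coffee Lake's, both end up with the same 1024 sets; AMD's extra capacity comes from additional ways rather than more sets.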

AMD has a larger L2 cache; however, the AMD L3 cache is a non-inclusive victim cache, which means that, unlike the Intel L3 cache, it cannot be prefetched into and is instead filled by lines evicted from the L2.

This was an unexpected result, but we can see clearly that AMD has a latency advantage across the L2 and L3 caches. There is a sizable difference in DRAM latency in Intel's favor; however, the metrics that matter most for core performance are found here in the lower-level caches.

We can expand this out to include the three AMD chips, as well as Intel’s Coffee Lake and Kaby Lake cores.

This graph uses cycles rather than time: Intel has a small L1 advantage, but the larger L2 caches in AMD's Zen designs mean that Intel has to fall back to its higher-latency L3 earlier. Intel makes quick work of DRAM cycle latency, however.


545 Comments


  • utmode - Saturday, April 21, 2018 - link

    Thank you Ian for testing on the latest Win10 patch.
  • eva02langley - Saturday, April 21, 2018 - link

    All this because of 1080p gaming benchmarks...

    All this circus for a benchmark,

    1. with an enthusiast GPU
    2. with a high end CPU
    3. at 1080p
    4. and affecting only 144Hz users

    This bench needs to go. It is misleading and inaccurate depending on whether the GPU is the bottleneck or not. Joe Blow looking at these doesn't understand that buying an RX 580 is not going to give him the same thing these stupid CPU benchmarks at 1080p suggest.

    Joe Blow is not going to know until he sees budget, high-end and enthusiast GPUs in play with his intended CPU purchase. WE KNOW, but they don't.

    All this for a stupid bench that impacts 1080p @ 144Hz users.

    I have a 1080 Ti @ 2160p, and I can tell you that this stupid bench doesn't do jack in my situation... but the multi-threaded results do.
  • wizyy - Saturday, April 21, 2018 - link

    Well, although admittedly there are users that aren't interested in 1080p 144Hz performance numbers, there is a LARGE number of players that need exactly that.
    The cybercafe that I'm administering, for one, has 40 PCs with 40 144Hz monitors.
  • eva02langley - Monday, April 23, 2018 - link

    My point is that by looking at numbers, you can get the wrong idea.

    Unless you test a budget, mid-range and high-end GPU at 1080p, 1440p and 2160p with a specified CPU, you don't get a clear picture.

    As of today, this bench is only specific to 1080p @ 144Hz, which represents a really small % of potential users.

    Like I was saying, I am at 2160p, and this renders this bench totally useless. The GPU bottleneck is going to be more and more present in the future because resolutions just keep increasing.
  • mapesdhs - Monday, May 14, 2018 - link

    There aren't large numbers at all. The no. of gamers who use high-frequency monitors of any kind is a distinct minority. The irony is that they're resensitising their visual system to *need* higher refresh rates, and they can't go back (ref a New Scientist article last year IIRC). IMO this whole push for high refresh rates is just a sideways move by the industry because nobody bothers introducing new visual features anymore; there's been nothing new on that front for many years. Nowadays it's just performance, and encouraging refresh is one way of pushing it. How convenient that these things came along just as GPU tech had become way more than powerful enough to run any game at 60Hz locked.
  • mapesdhs - Monday, May 14, 2018 - link

    (I was replying to wizyy btw; why does the reply thing put messages in the wrong place? This forum system is so outdated, and still can't edit)
  • aliquis - Saturday, April 21, 2018 - link

    You are simply wrong.
    Doing, say, 4K benchmarks would just make people think it doesn't matter which CPU you have and that all are the same for gaming, which is totally wrong and inaccurate.
    Benchmarks for CPU game performance should definitely be done at a low resolution and with a powerful graphics card. Sweclockers still did 720p medium. The problem with medium is that you may lower the load on the CPU for things such as physics and reflections. That's still valid for high-fps gamers, but it should maybe be combined with a higher setting too in case that uses more CPU, with ultra as the worst-case scenario.

    The opposite of your suggestion should be done: your suggestion basically results in no data, so you could just as well not benchmark games whatsoever. Instead, do it at a low resolution and then simply conclude something like "Even an i3 or Ryzen 3 is enough to achieve a 60 fps avg experience", for instance, if that were the case. Then it would still be accurate and useful, and people could decide for themselves how much they care about 140 or 180 fps.

    All these idiots who claim the Intel lead is only there in low resolutions are wrong and fool others. The Intel lead in executing game code is always there; it's just that you of course need a strong enough GPU versus the settings and resolution to be able to appreciate / get it too. But that runs in both directions. On YouTube someone tested, for instance, Project Cars on the new CPUs and had a little above 50% GPU load, so he obviously didn't use it all and was bottlenecked by CPU performance, yet only used just over 20% of the CPU. To the uneducated that may seem like, ooh, the Ryzen is so powerful for games with so much headroom, but it's not, because clearly one (virtual) core was fully utilised and couldn't offer more performance, and the rest of the unused capacity was and is irrelevant for game performance because the game isn't using it anyway. It doesn't help to have unused cores. It does help to have more powerful ones, though.
  • aliquis - Saturday, April 21, 2018 - link

    And an average of 144 fps isn't enough. Lows matter, and even then, even shorter frame times would give you more up-to-date images with vsync off.

    But sure, if all frame times are below 1/144 s you're likely fine in that situation. But a few at 1/50 s isn't perfect.
  • mapesdhs - Monday, May 14, 2018 - link

    144 isn't enough? :D That's hilarious. Welcome to the world of gamers who have sensitised themselves away from normal refresh rates, and now they can't go back. You're chasing moving goal posts. Look up the article in New Scientist about this.
  • Singuy888 - Saturday, April 21, 2018 - link

    It doesn't matter which CPU you choose for gaming. That's the point, but people like to dig up tech from 10 years ago to prove one vs the other. Games are going multithreaded and even Intel is pushing this. So a 1080p, heavily single-threaded gaming benchmark is misleading unless you like living in the past. You win with Ryzen @ 1440p or above, and you win with future highly multithreaded games. But nope... let's just test World of Warcraft at 720p to show Intel's dominance, because that's the future?
