OoOE

You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.

At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.

The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers) and the combination of instruction + operands are issued for execution. The results are committed to memory (registers/cache/DRAM) and it’s on to the next one.

In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.

Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in-order, but in an OoO architecture instructions can be executed out-of-order to improve overall utilization of execution resources.

The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to go up and with new, smaller/lower power transistors, all of the players here started introducing OoO variants of their architectures. Although often referred to as out of order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are mildly OoO compared to Cortex A15. Intel’s Silvermont joins the ranks of the Cortex A15 as a fully out of order design by modern day standards. The move to OoO alone should be good for around a 30% increase in single threaded performance vs. Bonnell.

Pipeline

Silvermont changes the Atom pipeline slightly. Bonnell featured a 16 stage in-order pipeline. One side effect to the design was that all operations, including those that didn’t have cache accesses (e.g. operations whose operands were in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 - 17 stages.

Branch prediction improves tremendously with Silvermont, a staple of any progressive microprocessor architecture. Silvermont takes the gshare branch predictor of Bonnell and significantly increased the size of all associated data structures. Silvermont also added an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.

Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.

Introduction & 22nm Sensible Scaling: OoO Atom Remains Dual-Issue
Comments Locked

174 Comments

View All Comments

  • Spunjji - Wednesday, May 8, 2013 - link

    +1
  • chubbypanda - Monday, May 6, 2013 - link

    The article is about yet to be relased platform. Obviously you could get better information if you work for Intel or its OEM partners. If you don't, Anand's writing is as good as they get.
  • Thrill92 - Tuesday, May 7, 2013 - link

    But what's your point?
  • raptorious - Monday, May 6, 2013 - link

    It seems like every subsequent Anandtech article about Intel that I read sounds more and more like an Intel Marketing slide deck. I think I'd believe that the absolute performance of Silvermont is better than Cortex A15, but I'm very skeptical that the perf/watt will actually be better at the TDP that we care about for a tablet. I have a very hard time believing that a 2-wide OoO architecture will get better IPC than a 3-wide one. In order to achieve better performance, you'd have to very aggressively scale frequency, and as we all know, perf/watt usually decreases as you scale frequency up (C*V^2*F). It MIGHT be better perf/watt in a phone, simply because with a 2-wide architecture, you can scale dynamic power much lower, but of course, then you can't make the ridiculous claims of 1.6x performance.
  • JarredWalton - Monday, May 6, 2013 - link

    FWIW, Intel is willing to provide these detailed slide decks long in advance of the launch of their hardware. The other SoC vendors are far less willing to share information. If Apple, Qualcomm, or some other vendor put together a nice slide deck, I can guarantee we'd be writing about it.
  • B - Monday, May 6, 2013 - link

    @JarredWalton, I completely agree with your assessment. I have listened to every Anandtech Podcast and repeatedly hear Anand and Brian Klug lament the lack of transparency with the other SOC vendors. Those two go through great lengths to get any meaningful information on the roadmaps of Apple, Qualcomm, et al. The bottom line is that currently Intel is accustomed to sharing more information than its peers in the mobile industry and I suspect your readership wants to know what's coming long before the product is released, and this will always include a speculative component.
  • beginner99 - Monday, May 6, 2013 - link

    The intel slides basically say intel will have 8x better performance/watt. Now if you don't believe them, just half the numbers and you are at 4x, which is still huge...I believe it.

    Medfield uses a basically 5 year old design on an older process!!! than current ARM offerings and is competitive in performance/watt (it's actually better already). The only thing is how efficient the GPU will be and even more important how expensive the whole SOC will be. So even if the performance and power data is correct, not guarantee it will succeed.

    I do see why some don't like the article but I think Anand is just enthusiastic and lets be honest, AMD has no delivered anything to be enthusiastic about in years and has a history of misinformation on slides What intel disclosed on slides was usually more or less true in the past so they have more credit than AMD.
  • raptorious - Monday, May 6, 2013 - link

    Showing 8x perf/watt or even 4x perf/watt from generation to generation might be possible by milking numbers, but across the board that is laughably impossible. You're talking about defying the laws of physics. This architecture isn't radically different from A15 or other designs, and the process improvements of 22 nm over 32 nm don't just magically give you 4x perf/watt. If you want to live in Intel's fairy tale land, go ahead.
  • JDG1980 - Monday, May 6, 2013 - link

    Intel has far better fabs than anyone else. That alone gives them a huge advantage. The reason they've been doing so poorly up until now is that (as the article mentions) they've basically been stagnating with an Atom design dating back to 2004. Now that they've updated to a modern design, they should be able to beat their competitors decisively on the hardware side. Whether that will lead to design wins or not, who can say... they're pretty late to this particular game. But they can give it a good shot.
  • t.s. - Tuesday, May 7, 2013 - link

    Yeah, right. Same with AMD. After they 'upgrade' their architecture from star to bulldozer, they automagically have a huge advantage. Remember, changing architecture doesn't necessary a good thing. Moreover for the 1st time you do the change.

Log in

Don't have an account? Sign up now