Intel’s Silvermont Architecture Revealed: Getting Serious About Mobile
by Anand Lal Shimpi on May 6, 2013 1:00 PM EST - Posted in CPUs, Intel, Silvermont, SoCs
OoOE
You’re going to come across the phrase out-of-order execution (OoOE) a lot here, so let’s go through a quick refresher on what that is and why it matters.
At a high level, the role of a CPU is to read instructions from whatever program it’s running, determine what they’re telling the machine to do, execute them and write the result back out to memory.
The program counter within a CPU points to the address in memory of the next instruction to be executed. The CPU’s fetch logic grabs instructions in order. Those instructions are decoded into an internally understood format (a single architectural instruction sometimes decodes into multiple smaller instructions). Once decoded, all necessary operands are fetched from memory (if they’re not already in local registers), and the instruction, together with its operands, is issued for execution. The results are committed to memory (registers/cache/DRAM) and it’s on to the next one.
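To make those steps concrete, here’s a minimal Python sketch of a strictly sequential fetch/decode/execute/commit loop. The two-instruction ISA (`MOVI`, `ADDM`), the micro-op names and the `tmp` register are all hypothetical, chosen only to show one architectural instruction decoding into several smaller operations; this is not how Bonnell or Silvermont actually decode x86.

```python
from dataclasses import dataclass


@dataclass
class MicroOp:
    op: str       # "mov", "load", "add" or "store" (hypothetical internal format)
    dst: object   # destination register name or memory address
    srcs: tuple   # source register names, memory addresses or immediates


def decode(instr):
    """Decode one hypothetical architectural instruction into micro-ops."""
    name, *args = instr
    if name == "MOVI":               # MOVI reg, imm  -> reg = imm
        reg, imm = args
        return [MicroOp("mov", reg, (imm,))]
    if name == "ADDM":               # ADDM addr, reg -> mem[addr] += reg
        addr, reg = args
        return [
            MicroOp("load", "tmp", (addr,)),      # fetch the operand from memory
            MicroOp("add", "tmp", ("tmp", reg)),  # execute
            MicroOp("store", addr, ("tmp",)),     # commit the result to memory
        ]
    raise ValueError(f"unknown instruction: {name}")


def run(program, memory):
    """Fetch, decode, execute and commit each instruction strictly in order."""
    regs = {}
    pc = 0                               # program counter: next instruction to fetch
    while pc < len(program):
        instr = program[pc]              # fetch
        for uop in decode(instr):        # decode into micro-ops
            if uop.op == "mov":
                regs[uop.dst] = uop.srcs[0]
            elif uop.op == "load":
                regs[uop.dst] = memory[uop.srcs[0]]
            elif uop.op == "add":
                regs[uop.dst] = regs[uop.srcs[0]] + regs[uop.srcs[1]]
            elif uop.op == "store":
                memory[uop.dst] = regs[uop.srcs[0]]
        pc += 1                          # on to the next one
    return regs, memory


program = [("MOVI", "r1", 5), ("ADDM", 0x10, "r1"), ("ADDM", 0x10, "r1")]
regs, memory = run(program, {0x10: 100})
print(memory[0x10])  # 110
```

Note that nothing here overlaps: each micro-op finishes before the next one starts, which is exactly the limitation the next paragraphs discuss.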
In-order architectures complete this pipeline in order, from start to finish. The obvious problem is that many steps within the pipeline are dependent on having the right operands immediately available. For a number of reasons, this isn’t always possible. Operands could depend on other earlier instructions that may not have finished executing, or they might be located in main memory - hundreds of cycles away from the CPU. In these cases, a bubble is inserted into the processor’s pipeline and the machine’s overall efficiency drops as no work is being done until those operands are available.
Out-of-order architectures attempt to fix this problem by allowing independent instructions to execute ahead of others that are stalled waiting for data. In both cases instructions are fetched and retired in order, but in an OoO architecture instructions can be executed out of order to improve overall utilization of execution resources.
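A toy timing model helps show why this matters. The Python sketch below rests on loudly simplified assumptions (one issue per cycle, perfect register renaming so only true dependences matter, an unbounded instruction window, and made-up latencies including a 100-cycle load miss) and compares a strictly in-order machine against one that issues the oldest ready instruction each cycle. It is not a model of Silvermont or any real core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dst: str
    srcs: tuple
    latency: int  # cycles from issue until the result can be used (hypothetical)


def true_deps(program):
    """For each instruction, list the earlier instructions producing its sources.
    Only read-after-write dependences are tracked (perfect renaming assumed)."""
    last_writer, deps = {}, []
    for i, ins in enumerate(program):
        deps.append([last_writer[s] for s in ins.srcs if s in last_writer])
        last_writer[ins.dst] = i
    return deps


def in_order(program):
    """Issue at most one instruction per cycle, strictly in program order.
    A stalled instruction blocks everything behind it (a pipeline bubble)."""
    deps = true_deps(program)
    done, next_issue = [], 0
    for i, ins in enumerate(program):
        start = max([next_issue] + [done[d] for d in deps[i]])
        done.append(start + ins.latency)
        next_issue = start + 1
    return max(done)


def out_of_order(program):
    """Each cycle, issue the oldest instruction whose operands are ready.
    Fetch and retirement stay in order; only execution is reordered."""
    deps = true_deps(program)
    done = [None] * len(program)
    waiting = list(range(len(program)))  # not yet issued, oldest first
    cycle = 0
    while waiting:
        for i in waiting:
            if all(done[d] is not None and done[d] <= cycle for d in deps[i]):
                done[i] = cycle + program[i].latency
                waiting.remove(i)
                break  # at most one issue per cycle
        cycle += 1
    return max(done)  # in-order retirement ends once the last result is ready


program = [
    Instr("load", "r1", ("addr",), 100),  # hypothetical miss to main memory
    Instr("add", "r2", ("r1", "r0"), 1),  # depends on the load
    Instr("mul", "r3", ("r4", "r5"), 3),  # independent
    Instr("add", "r6", ("r4", "r5"), 1),  # independent
]
print(in_order(program), out_of_order(program))  # 104 101
```

On the OoO machine the two independent instructions execute in the shadow of the cache miss, so it finishes in 101 cycles instead of 104; with more independent work queued behind the stall, the gap grows.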
The move to an OoO paradigm generally comes with penalties to die area and power consumption, which is one reason the earliest mobile CPU architectures were in-order designs. The ARM11, ARM’s Cortex A8, Intel’s original Atom (Bonnell) and Qualcomm’s Scorpion core were all in-order. As performance demands continued to rise and newer, smaller, lower-power transistors became available, all of the players here started introducing OoO variants of their architectures. Although often referred to as out-of-order designs, ARM’s Cortex A9 and Qualcomm’s Krait 200/300 are only mildly OoO compared to Cortex A15. Intel’s Silvermont joins the Cortex A15 as a fully out-of-order design by modern-day standards. The move to OoO alone should be good for around a 30% increase in single-threaded performance vs. Bonnell.
Pipeline
Silvermont changes the Atom pipeline slightly. Bonnell featured a 16-stage in-order pipeline. One side effect of the design was that all operations, including those that didn’t access the cache (e.g. operations whose operands were in registers), had to go through three data cache access stages even though nothing happened during those stages. In going out-of-order, Silvermont allows instructions to bypass those stages if they don’t need data from memory, effectively shortening the mispredict penalty from 13 stages down to 10. The integer pipeline depth now varies depending on the type of instruction, but you’re looking at a range of 14 to 17 stages.
Branch prediction, a staple of any progressive microprocessor architecture, improves tremendously with Silvermont. Silvermont takes Bonnell’s gshare branch predictor and significantly increases the size of all associated data structures. Silvermont also adds an indirect branch predictor. The combination of the larger predictors and the new indirect predictor should increase branch prediction accuracy.
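For readers unfamiliar with gshare, here’s a minimal Python sketch of the textbook scheme: the branch address is XORed with a global history of recent branch outcomes to index a table of 2-bit saturating counters. The table sizes, counter initialization and test pattern are assumptions for illustration only; nothing here reflects Silvermont’s actual structure sizes, and the indirect branch predictor isn’t modeled.

```python
class GShare:
    """Minimal gshare predictor: branch address XOR global history indexes a
    table of 2-bit saturating counters (0-1 = not taken, 2-3 = taken)."""

    def __init__(self, history_bits):
        self.mask = (1 << history_bits) - 1
        self.counters = [1] * (1 << history_bits)  # start weakly not-taken
        self.history = 0                            # recent outcomes, 1 = taken

    def _index(self, pc):
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # shift the newest outcome into the global history register
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(history_bits, periods=1000, pc=0x40):
    """Hypothetical loop branch: taken 7 times, then falls through, repeatedly."""
    predictor = GShare(history_bits)
    pattern = [True] * 7 + [False]
    correct = total = 0
    for _ in range(periods):
        for taken in pattern:
            correct += predictor.predict(pc) == taken
            predictor.update(pc, taken)
            total += 1
    return correct / total


print(f"{accuracy(2):.3f} {accuracy(8):.3f}")
```

With only 2 bits of history the predictor can’t tell the loop’s final iteration apart from the ones before it, so that not-taken branch is mispredicted every time (at best 7/8 correct). With 8 bits (and a 256-entry table) the whole 8-iteration pattern fits in the history and accuracy approaches 100% after warm-up, a toy version of why larger predictor structures help.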
Couple better branch prediction with a lower mispredict latency and you’re talking about another 5 - 10% increase in IPC over Bonnell.
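A back-of-envelope calculation shows how the two effects combine: cycles lost to branches per instruction are roughly branch frequency × mispredict rate × mispredict penalty. The 13- and 10-stage penalties come from the article; the branch frequency, the two mispredict rates and the base CPI are hypothetical, and holding the base CPI fixed isolates the branch effect from the rest of the OoO gains.

```python
def branch_cpi(branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction lost to branch mispredictions."""
    return branch_fraction * mispredict_rate * penalty_cycles


BASE_CPI = 1.0         # hypothetical CPI with perfect branch prediction
BRANCH_FRACTION = 0.2  # hypothetical: one instruction in five is a branch

# Penalties (13 vs. 10) from the article; mispredict rates are hypothetical.
bonnell = BASE_CPI + branch_cpi(BRANCH_FRACTION, 0.06, 13)
silvermont = BASE_CPI + branch_cpi(BRANCH_FRACTION, 0.04, 10)

# IPC is 1/CPI, so the IPC ratio is the inverse of the CPI ratio.
ipc_gain = bonnell / silvermont - 1
print(f"{ipc_gain:.1%}")  # 7.0%
```

The result swings a lot with the assumed rates, so treat it as an illustration of the mechanism rather than a check on the 5 - 10% figure.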
174 Comments
Jumangi - Monday, May 6, 2013
Let me know when Intel has something of actual substance to show and not just a bunch of marketing/hype-focused PowerPoint slides. ARM continues to deliver solid performance gains year after year with low power usage... Intel says yeah, we'll get around to updating our 5-year-old design... eventually, we promise... yawn...
Krysto - Monday, May 6, 2013
Great point. Intel keeps promising how awesome they will be when they launch their new "mobile" chip, and as always it's ALWAYS disappointing, because in the meantime ARM chips keep shipping on their merry way and keep improving. Fast.
A5 - Monday, May 6, 2013
Eh. A15 wasn't exactly a home run. The performance is good for what it is, but they overshot their TDP targets big time.
saurabhr8here - Monday, May 6, 2013
A15 wasn't a home run because it was developed on early, bleeding-edge process technology. As the process technology matures and the design is optimized for the process, the power/performance numbers will improve.
DanNeely - Tuesday, May 7, 2013
A15's problem isn't overshooting TDP targets; it's that it was originally designed for use in entry-level NASes and other similar embedded systems/micro servers. A few extra watts for better CPU performance isn't a big problem there.
xTRICKYxx - Tuesday, May 7, 2013
Exactly. A15 was not initially designed for smartphones.
Wilco1 - Tuesday, May 7, 2013
That's not correct; ARM has said since the early announcements that it would go into mobiles at lower frequencies and core counts. Of course, both core counts and frequencies turned out to be higher than originally expected, so power consumption is higher too. The Exynos 5250 appears to have been released quickly in order to be first to market. The Octa core is far better tuned and will do better. NVidia has stated Tegra 4 uses 40% less power than Tegra 3 at equivalent performance levels.
Krysto - Monday, May 6, 2013
Let's do a recap. Performance is as high as Cortex A15... a chip launched in 2012.
GPU performance is where iPad 4 was... in 2012.
They are doing their benchmarks against last-gen ARM chips...okay.
Intel Silvermont is expected late 2013/early 2014.
Yeah...it's obviously so competitive! NOT.
By the time Intel Silvermont arrives in smartphones (Merrifield), we will see 20nm ARMv8 chips in smartphones, already shipping. Good luck, Intel. Another hit-and-miss.
As for what you said, that Silvermont is conservative because they basically don't want to cannibalize Haswell: that's EXACTLY Intel's biggest problem right now, the conflict of interest between the low-end, unprofitable Atom division and the high-end, very profitable Core division.
This is exactly what killed their XScale division, too, and it's what will kill Intel in the end, because Intel will have to make Atom compete *whether they want to or not*. ARM chips are going to keep climbing in performance and become "good enough" for almost everything. What is Intel going to do then? They'll have to keep up, which will slowly eliminate their *profitable* Core chips from the market. And what then? Survive on $20 chips with a dozen competitors? This is going to be very interesting for Intel in the next few years - and not in a good way, especially with a brand-new CEO.
Kjella - Monday, May 6, 2013
It's been four months of 2013; how many quad-core ARM processors have launched since 2012? They're comparing against what is out now (if they were able to compare against unreleased ARM processors, there'd be something very wrong) and beating them; not sure where your reading comprehension failed there. Looks to me like they're ready for a clash of the titans around year's end. Also, 1-5W chips don't compete much with 15-85W Haswells no matter what. AMD is dying fast and people need their x86 computers, so whatever. Reminds me of all the posts that say Windows is sooooooo dead.
xTRICKYxx - Tuesday, May 7, 2013
AMD is making a lot of money right now.