Exploring DirectX 12: 3DMark API Overhead Feature Test
by Ryan Smith & Ian Cutress on March 27, 2015 8:00 AM EST
Other Notes
Before jumping into our results, let’s quickly talk about testing.
For our test we are using the latest version of the Windows 10 Technical Preview – build 10041 – and the latest drivers from AMD, Intel, and NVIDIA. In fact, for DirectX 12 testing these latest packages are the minimum versions the test supports. Meanwhile 3DMark does of course also run on Windows Vista and later; however, on Windows Vista/7/8 only the DirectX 11 and Mantle tests are available, since those are the only APIs those operating systems offer.
From a test reliability standpoint the API Overhead Feature Test (or as we'll call it from now on, AOFT) is generally reliable under DirectX 12 and Mantle; however, we would like to note that we have found it to be somewhat unreliable under DirectX 11. DirectX 11 scores have varied widely at times, and we've seen one configuration flip between 1.4 million and 1.9 million draw calls per second based on indeterminable factors.
Our best guess right now is that the variability comes from the much greater overhead of DirectX 11, and with it all of the work that the API, video drivers, and OS are undertaking in the background. Consequently the DirectX 11 results are good enough for what the AOFT has set out to do – showcase just how much faster DX12 and Mantle are – but they carry a much higher degree of variability than our standard tests and should be treated accordingly.
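To build some intuition for what a score like this measures, here is a toy model of an overhead test. This is purely our sketch, not Futuremark's implementation: draw calls per frame are ramped up until the simulated frame rate falls below 30 fps, and the sustained rate at that point becomes the score. The 2ms fixed frame cost and the per-call CPU costs are illustrative assumptions.

```python
def frame_time_ms(calls_per_frame, per_call_cost_us, base_cost_ms=2.0):
    # Frame time = fixed per-frame cost + per-draw-call CPU submission cost
    return base_cost_ms + calls_per_frame * per_call_cost_us / 1000.0

def overhead_test(per_call_cost_us, fps_floor=30.0, step=10_000):
    # Ramp up draw calls per frame until fps would fall below the floor,
    # then report the sustained draw call throughput at the last good step.
    calls = step
    while 1000.0 / frame_time_ms(calls + step, per_call_cost_us) >= fps_floor:
        calls += step
    fps = 1000.0 / frame_time_ms(calls, per_call_cost_us)
    return calls * fps  # sustained draw calls per second

# Hypothetical per-call CPU costs: a heavyweight API vs. a lightweight one
dx11_score = overhead_test(0.6)   # ~0.6us of CPU work per call, DX11-like
dx12_score = overhead_test(0.06)  # an order of magnitude cheaper, DX12-like
print(f"{dx11_score / 1e6:.1f}M vs {dx12_score / 1e6:.1f}M draw calls/s")
```

With these assumed costs the DX11-like configuration lands around 1.6M calls per second and the DX12-like one around 15.7M – the same order-of-magnitude gap the AOFT reports, driven entirely by per-call CPU cost rather than GPU horsepower.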
Meanwhile Futuremark, for their part, is looking to make it clear that this is first and foremost a test to showcase API differences, and is not a hardware test designed to showcase how different components perform.
The purpose of the test is to compare API performance on a single system. It should not be used to compare component performance across different systems. Specifically, this test should not be used to compare graphics cards, since the benefit of reducing API overhead is greatest in situations where the CPU is the limiting factor.
We have of course gone and benchmarked a number of configurations to showcase how they benefit from DirectX 12 and/or Mantle; however, as per Futuremark's guidelines we are not looking to directly compare video cards, especially since we're often hitting the throughput limits of the command processor – a bottleneck a real-world workload would not run into.
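That command processor caveat can be expressed as a simple min() relationship: the throughput you observe is the lesser of what the CPU can submit and what the GPU front-end can consume. The rates below are purely illustrative assumptions on our part, not measured figures for any card.

```python
def observed_calls_per_sec(cpu_submit_rate, gpu_cp_rate):
    # Whichever side is slower sets the observed draw call throughput
    return min(cpu_submit_rate, gpu_cp_rate)

GPU_CP_RATE = 20e6  # hypothetical GPU command processor limit (calls/s)

# Under a DX11-like API the CPU-side submission rate is the wall...
dx11_observed = observed_calls_per_sec(1.5e6, GPU_CP_RATE)

# ...while under a DX12-like API the GPU front-end can become the limit,
# which is why a synthetic test can saturate it while real games would not.
dx12_observed = observed_calls_per_sec(40e6, GPU_CP_RATE)
print(dx11_observed, dx12_observed)
```

In other words, once API overhead drops far enough, a pure draw call test stops measuring the API at all and starts measuring the GPU front-end, which is exactly why Futuremark warns against using it to compare video cards.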
The Test
Moving on, we also want to quickly point out the clearly beta state of the current WDDM 2.0 drivers. Of note, the DX11 results with NVIDIA's 349.90 driver are notably lower than the results with their WDDM 1.3 driver, and show much greater variability. Meanwhile AMD's drivers have stability issues, with our dGPU testbed locking up a couple of different times. So these drivers are clearly not at production status.
DirectX 12 Support Status

| GPU Architecture | Current Status | Supported At Launch |
|---|---|---|
| AMD GCN 1.2 (285) | Working | Yes |
| AMD GCN 1.1 (290/260 Series) | Working | Yes |
| AMD GCN 1.0 (7000/200 Series) | Working | Yes |
| NVIDIA Maxwell 2 (900 Series) | Working | Yes |
| NVIDIA Maxwell 1 (750 Series) | Working | Yes |
| NVIDIA Kepler (600/700 Series) | Working | Yes |
| NVIDIA Fermi (400/500 Series) | Not Active | Yes |
| Intel Gen 7.5 (Haswell) | Working | Yes |
| Intel Gen 8 (Broadwell) | Working | Yes |
On that note, the OS and drivers are all still in development, so performance results are subject to change as Windows 10 and the WDDM 2.0 drivers get closer to finalization.
One bit of good news is that DirectX 12 support on AMD GCN 1.0 cards is up and running here, as opposed to the issues we ran into last month with Star Swarm. So other than NVIDIA's Fermi cards, which aren't enabled in the current beta drivers, we are able to test all of the major x86-paired GPU architectures that support DirectX 12.
For our actual testing, we've broken our benchmarks down into dGPU and iGPU groups. Given the vast performance difference between the two, and the fact that the CPU and GPU are bound together in the latter, this helps to better control for relative performance.
On the dGPU side we are largely reusing our Star Swarm test configuration, meaning we’re testing the full range of working DX12-capable GPU architectures across a range of CPU configurations.
DirectX 12 Preview dGPU Testing CPU Configurations (i7-4960X)

| Configuration | Emulating |
|---|---|
| 6C/12T @ 4.2GHz | Overclocked Core i7 |
| 4C/4T @ 3.8GHz | Core i5-4670K |
| 2C/4T @ 3.8GHz | Core i3-4370 |
Meanwhile on the iGPU side we have a range of Haswell and Kaveri processors from Intel and AMD respectively.
| Component | dGPU Testbed |
|---|---|
| CPU | Intel Core i7-4960X @ 4.2GHz |
| Motherboard | ASRock Fatal1ty X79 Professional |
| Power Supply | Corsair AX1200i |
| Hard Disk | Samsung SSD 840 EVO (750GB) |
| Memory | G.Skill RipjawZ DDR3-1866 4 x 8GB (9-10-9-26) |
| Case | NZXT Phantom 630 Windowed Edition |
| Monitor | Asus PQ321 |
| Video Cards | AMD Radeon R9 290X, AMD Radeon R9 285, AMD Radeon HD 7970, NVIDIA GeForce GTX 980, NVIDIA GeForce GTX 750 Ti, NVIDIA GeForce GTX 680 |
| Video Drivers | NVIDIA Release 349.90 Beta, AMD Catalyst 15.200.1012.2 Beta |
| OS | Windows 10 Technical Preview (Build 10041) |
| Component | iGPU Testbed |
|---|---|
| CPUs | AMD A10-7850K, AMD A10-7700K, AMD A8-7600, AMD A6-7400L, Intel Core i7-4790K, Intel Core i5-4690, Intel Core i3-4360, Intel Core i3-4130T, Pentium G3258 |
| Motherboards | GIGABYTE F2A88X-UP4 for AMD, ASUS Maximus VII Impact for Intel LGA-1150, Zotac ZBOX EI750 Plus for Intel BGA |
| Power Supply | Rosewill Silent Night 500W Platinum |
| Hard Disk | OCZ Vertex 3 256GB OS SSD |
| Memory | G.Skill 2x4GB DDR3-2133 9-11-10 for AMD, G.Skill 2x4GB DDR3-1866 9-10-9 at 1600 for Intel |
| Video Cards | AMD APU Integrated, Intel CPU Integrated |
| Video Drivers | AMD Catalyst 15.200.1012.2 Beta, Intel Driver Version 10.18.15.4124 |
| OS | Windows 10 Technical Preview (Build 10041) |
113 Comments
silverblue - Saturday, March 28, 2015 - link
Well, varying results aside, I've heard of scores in the region of eight million. That would theoretically (if other results are anything to go off) put it around the level of a mildly-overclocked i3 (stock about 7.5m). Definitely worth bearing in mind the more-than-six-cores scaling limitation showcased by this test - AMD's own tests show this happening to the 8350, meaning that the Mantle score - which can scale to more cores - should be higher. Incidentally, the DX11 scores seem to be in the low 600,000s with a slight regression in the MT test. I saw these 8350 figures in some comments somewhere but forgot where so I do apologise for not being able to back them up, however the Intel results can be found here: http://www.pcworld.com/article/2900814/tested-dire...
I suppose it's all hearsay until a site actually does a CPU comparison involving both Intel and AMD processors. Draw calls are also just a synthetic; I can't see AMD's gaming performance leaping through the stratosphere overnight, and Intel stands to benefit a lot here as well.
silverblue - Saturday, March 28, 2015 - link
Sorry, stock i3 about 7.1m.
oneb1t - Saturday, March 28, 2015 - link
my fx-8320 @ 4.7GHz + R9 290X does 14.4mil :) in Mantle
Laststop311 - Friday, March 27, 2015 - link
I think AMD APUs are the biggest winners here. Since draw calls help lift CPU bottlenecks, and the APUs have 4 weaker cores, DX11's inability to really utilize multiple cores for draw calls means the weak single threaded performance of the APUs could really hold things back. DX12 will be able to shift the bottleneck back to the iGPU of the APUs for a lot of games and really help make more games playable at 1080p with higher settings, or at least the same settings and smoother.

If only AMD would release an updated version of the 20 CU design for the PS4 using GCN 1.3 cores + 16GB of 2nd generation 3D HBM memory directly on top that the CPU or GPU could use, not only would you have a really fast 1080p-capable gaming chip, you could design radically new motherboards that omit RAM slots entirely. Could have new mini-ITX boards that have room for more SATA ports and USB headers and fan headers, with more room available for VRMs, and cool it with good water cooling like the Thermaltake 3.0 360mm rad AIO and good TIM like the Coollaboratory Liquid Metal Ultra. Or you could take it in the super compact direction and create a board even smaller than mini-ITX and turn it into an ultimate HTPC. And as well as the reduced size, your whole system would benefit from the massive bandwidth (1.2TB/sec) and reduced latency. The memory pool could respond in real time to add more space for the GPU as necessary, and since APUs are really only for 1080p that will never be a problem. I know this will probably never happen, but if it did I would 100% build my HTPC with an APU like that.
Laststop311 - Saturday, March 28, 2015 - link
As a side question, is there some contractual agreement that will not allow AMD to sell these large 20 CU APU designs on the regular PC market? Does Sony have exclusive rights to the chip and the techniques used to make such a large iGPU? Or is it die size and cost that scares AMD away from making the chip for the PC market, as there would be a much higher price compared to current APUs? I'm sure 4 Excavator cores can't be much bigger than 8 Jaguar, so if it's doable with 8 Jaguar it should be doable with 4 Excavator, especially if they put it on the 16/14nm FinFET node?
silverblue - Saturday, March 28, 2015 - link
I'm sure Sony would only be bothered if AMD couldn't fulfill their orders. A PC built to offer exactly the same as the PS4 would generally cost more anyway. They can't very well go from an eight-FPU design to one with two/four depending on how you look at it, even if the clocks are much higher. I think you'd need to wait for the next generation of consoles.
FriendlyUser - Saturday, March 28, 2015 - link
I really hope the developers put this to good use. I am also particularly excited about multicore scaling, since single threaded performance has stagnated (yes, even in the Intel camp).
jabber - Saturday, March 28, 2015 - link
I think this shows that AMD has got a big boost from being the main partner with Microsoft on the Xbox. It's meant that AMD got a major seat at the top DX12 table from day one for a change. I hope to see some really interesting results now that, finally, AMD hardware has been given some optimisation love and not just Intel.
Tigran - Saturday, March 28, 2015 - link
>>> Finally with 2 cores many of our configurations are CPU limited. The baseline changes a bit – DX11MT ceases to be effective since 1 core must be reserved for the display driver – and the fastest cards have lost quite a bit of performance here. None the less, the AMD cards can still hit 10M+ draw calls per second with just 2 cores, and the GTX 980/680 are close behind at 9.4M draw calls per second. Which is again a minimum 6.7x increase in draw call throughput versus DirectX 11, showing that even on relatively low performance CPUs the draw call gains from DirectX 12 are substantial. <<<
Can you please explain how this can be? I thought the main advantage of the new APIs is spreading the workload across all CPU cores (instead of just one in DX11). If so, shouldn't the performance double in 2-core mode? Why is there a 6.7x increase in draw calls instead of 2x?
Tigran - Saturday, March 28, 2015 - link
Just to make it clear: I know the new APIs, Mantle and DX12, also offer more direct access to the GPU, without going through the CPU as much. But this test is about draw calls issued from the CPU to the GPU. How can we boost the number of draw calls other than by using additional CPU cores?