On our forums and elsewhere over the past couple of weeks there has been quite a bit of chatter on the subject of VRAM allocation on the GeForce GTX 970. To quickly summarize a more complex issue, various GTX 970 owners had observed that the GTX 970 was prone to topping out its reported VRAM allocation at 3.5GB rather than 4GB, and that meanwhile the GTX 980 was reaching 4GB allocated in similar circumstances. This unusual outcome was at odds with what we know about the cards and the underlying GM204 GPU, as NVIDIA’s specifications state that the GTX 980 and GTX 970 have identical memory configurations: 4GB of 7GHz GDDR5 on a 256-bit bus, split amongst 4 ROP/memory controller partitions. In other words, there was no known reason that the GTX 970 and GTX 980 should be behaving differently when it comes to memory allocation.
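To put the quoted specification in numbers, here is a minimal sketch of the peak-bandwidth arithmetic. It assumes that "7GHz" means an effective data rate of 7 Gbps per pin and that the bus is split evenly across the 4 partitions; the function and constant names are illustrative only.

```python
def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    Assumes data_rate_gbps is the effective per-pin data rate, so every
    bus line moves that many gigabits per second.
    """
    return bus_width_bits / 8 * data_rate_gbps


BUS_WIDTH_BITS = 256   # from the published specification
DATA_RATE_GBPS = 7.0   # "7GHz GDDR5", read as 7 Gbps effective per pin (assumption)
PARTITIONS = 4         # ROP/memory controller partitions

total_gbs = peak_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GBPS)
# Assumes an even split of the bus across partitions.
per_partition_gbs = total_gbs / PARTITIONS

print(f"Total: {total_gbs:.0f} GB/s, per partition: {per_partition_gbs:.0f} GB/s")
```

On paper this arithmetic is identical for both cards, which is why the differing allocation behavior was unexpected.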


GTX 970 Logical Diagram


GTX 970 Memory Allocation (Image Courtesy error-id10t of Overclock.net Forums)

Since then there has been some further investigation into the matter using various tools written in CUDA in order to try to systematically confirm this phenomenon and to pinpoint what is going on. Those tests seemingly confirm the issue – the GTX 970 has something unusual going on after 3.5GB VRAM allocation – but they have not come any closer to explaining just what is going on.

Finally, more or less the entire technical press has been pushing NVIDIA on the issue, and this morning NVIDIA released a statement on the matter, which we are republishing in full:

The GeForce GTX 970 is equipped with 4GB of dedicated graphics memory.  However the 970 has a different configuration of SMs than the 980, and fewer crossbar resources to the memory system. To optimally manage memory traffic in this configuration, we segment graphics memory into a 3.5GB section and a 0.5GB section.  The GPU has higher priority access to the 3.5GB section.  When a game needs less than 3.5GB of video memory per draw command then it will only access the first partition, and 3rd party applications that measure memory usage will report 3.5GB of memory in use on GTX 970, but may report more for GTX 980 if there is more memory used by other commands.  When a game requires more than 3.5GB of memory then we use both segments.

We understand there have been some questions about how the GTX 970 will perform when it accesses the 0.5GB memory segment.  The best way to test that is to look at game performance.  Compare a GTX 980 to a 970 on a game that uses less than 3.5GB.  Then turn up the settings so the game needs more than 3.5GB and compare 980 and 970 performance again.

Here’s an example of some performance data:

GeForce GTX 970 Performance

Settings                                                      GTX980         GTX970

Shadows of Mordor
  <3.5GB setting = 2688x1512 Very High                        72fps          60fps
  >3.5GB setting = 3456x1944                                  55fps (-24%)   45fps (-25%)

Battlefield 4
  <3.5GB setting = 3840x2160 2xMSAA                           36fps          30fps
  >3.5GB setting = 3840x2160 135% res                         19fps (-47%)   15fps (-50%)

Call of Duty: Advanced Warfare
  <3.5GB setting = 3840x2160 FSMAA T2x, Supersampling off     82fps          71fps
  >3.5GB setting = 3840x2160 FSMAA T2x, Supersampling on      48fps (-41%)   40fps (-44%)

On GTX 980, Shadows of Mordor drops about 24% on GTX 980 and 25% on GTX 970, a 1% difference.  On Battlefield 4, the drop is 47% on GTX 980 and 50% on GTX 970, a 3% difference.  On CoD: AW, the drop is 41% on GTX 980 and 44% on GTX 970, a 3% difference.  As you can see, there is very little change in the performance of the GTX 970 relative to GTX 980 on these games when it is using the 0.5GB segment.
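The percentages in the statement can be reproduced from the frame rates in NVIDIA's table. The following is a small sketch of that arithmetic; the data layout and function names are our own, and only the fps values come from the statement.

```python
# Frame rates from NVIDIA's table: (below 3.5GB setting, above 3.5GB setting)
RESULTS = {
    "Shadows of Mordor": {"GTX 980": (72, 55), "GTX 970": (60, 45)},
    "Battlefield 4": {"GTX 980": (36, 19), "GTX 970": (30, 15)},
    "Call of Duty: Advanced Warfare": {"GTX 980": (82, 48), "GTX 970": (71, 40)},
}


def percent_drop(before: float, after: float) -> float:
    """Relative frame-rate loss, as a percentage of the starting frame rate."""
    return (before - after) / before * 100


def summarize(results):
    """Map each game to (980 drop %, 970 drop %, gap in percentage points)."""
    summary = {}
    for game, cards in results.items():
        # Round to whole percentages, as the statement does.
        drops = {card: round(percent_drop(*fps)) for card, fps in cards.items()}
        gap = drops["GTX 970"] - drops["GTX 980"]
        summary[game] = (drops["GTX 980"], drops["GTX 970"], gap)
    return summary


for game, (drop_980, drop_970, gap) in summarize(RESULTS).items():
    print(f"{game}: GTX 980 -{drop_980}%, GTX 970 -{drop_970}%, gap {gap} points")
```

Note that the "difference" quoted for each game is a gap in percentage points between two rounded drops, not a measured 1% or 3% loss in frame rate.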

Before discussing the content of the message, it’s probably best to explain its nature. As is almost always the case when issuing blanket technical statements to the wider press, NVIDIA has opted for a simpler, high-level message that’s light on technical details in order to make the content accessible to more users. For NVIDIA and their customer base this makes all the sense in the world (and we don’t resent them for it), but it goes without saying that “fewer crossbar resources to the memory system” does not come close to fully explaining the issue at hand, why it’s happening, or how in detail NVIDIA is handling VRAM allocation. Technical users and technical press such as ourselves would like more information, and while we can’t speak for NVIDIA, rarely is NVIDIA’s first statement their last in these matters, so we do not believe this is the last we will hear on the subject.

In any case, NVIDIA’s statement affirms that the GTX 970 does materially differ from the GTX 980. Despite the outward appearance of identical memory subsystems, there is an important difference here that makes a 512MB partition of VRAM less performant or otherwise decoupled from the other 3.5GB.

Because this is a high-level statement, NVIDIA’s focus is on the performance ramifications – mainly, that there generally aren’t any – and while we’re not prepared to affirm or deny NVIDIA’s claims, it’s clear that this only scratches the surface. VRAM allocation is a multi-variable process; drivers, applications, APIs, and OSes all play a part here, and just because VRAM is allocated doesn’t necessarily mean it’s in use, or that it’s being used in a performance-critical situation. Using VRAM for an application-level resource cache and actively loading 4GB of resources per frame are two very different scenarios, for example, and would certainly be impacted differently by NVIDIA’s split memory partitions.

For the moment, with so few answers in hand, we’re not going to spend too much time trying to guess what it is NVIDIA has done, but from NVIDIA’s statement it’s clear that there’s some additional investigating left to do. If nothing else, what we’ve learned today is that we know less than we thought we did, and that’s never a satisfying answer. To that end we’ll keep digging, and once we have the answers we need we’ll be back with a deeper explanation of how the GTX 970’s memory subsystem works and how it influences the performance of the card.

93 Comments

  • Gothmoth - Sunday, January 25, 2015 - link

    can we agree that the gtx970 is the best card for the money at the moment... so what?

    in some cases you have a 1-3% drop in performance... get a life guys.

    should i buy a crappy radeon with crappy openGL drives instead?
    no thanks....
  • anubis44 - Sunday, January 25, 2015 - link

    No, we can't agree that the GTX970 is the best card for the money when I was able to buy a Gigabyte Radeon R9 290 for $259 Canadian and flash the bios on it for free so it runs at 1050MHz instead of 943MHz stock. The GTX970s were all $380+ Canadian, so no, the GTX970 is NOT the best bang for buck card in the mid-high end.

    Now that nVidia has once again shown how willing it is to rip off its own customers, I am especially glad I stayed away from them.
  • Gothmoth - Sunday, January 25, 2015 - link

    i don´t live in canada.. so i don´t care about your prices.

    here the nvidia is the better bang for the buck......

    + i have working openGL drivers.. not the faulty ATI ones....
  • Black Obsidian - Sunday, January 25, 2015 - link

    And the R9 290 doesn't have significant problems when you use all of its RAM, as opposed to what some GTX970 owners are reporting.

    They're both fairly minor problems, but I'd personally take "ATI's" (solvable) software issues over a suboptimal hardware design that might NOT be fixable with software.
  • chizow - Monday, January 26, 2015 - link

    1-3% drop in perf constitutes significant problems on a part that is already 15-20% slower, but also costs about 40% less than the 980? If anything I'd say that premium on the 980 is really showing its value, for anyone who cares that much about that last 1-3% bit of performance.
  • Pork@III - Monday, January 26, 2015 - link

    Just 980 is too high priced. The price of 970 is close to normal. We need of 980 Ti to pushed down price of all ordinary 980th.
  • chizow - Monday, January 26, 2015 - link

    @anubis44, idk as I've tried to explain many times to you and others like Creig, performance is not the only factor that goes into a buying decision. Luckily for Nvidia fans however, that still mostly binds Nvidia to price against AMD based on performance alone.

    In any case, I'd say the rest of the marketplace disagrees with you. Despite AMD slashing prices in Q4, they still got slaughtered in the marketplace, so while they measured up well against Nvidia in terms of price and performance, they still got killed going up against the 970. Because additional support and features are worth the premium for many, and the main reason the majority will go with Nvidia if price and performance are close.
  • Vayra - Monday, January 26, 2015 - link

    That one was a bit different though, and the stutter was considerably less than with this GTX 970.

    GTX 660 had 0,5GB on a smaller bus if I remember correctly, but it did not handicap the card as much. You could use 1900 MB and still have a playable game.
  • Klimax - Sunday, January 25, 2015 - link

    Actually, we got all technical information we need:
    "and fewer crossbar resources to the memory system" is very condensed but technical explanation. There likely won't be anything more.

    See:
    http://citeseerx.ist.psu.edu/viewdoc/download?doi=...
    "Butterfly networks offer minimal hop count for a given
    router radix while having no path diversity and requiring very
    long wires. A crossbar interconnect can be seen as a 1-stage
    butterfly and scales quadratically in area as the number of
    ports increase."
    http://web.eecs.umich.edu/~twenisch/papers/ispass1...

    GT200:
    http://www.realworldtech.com/gt200/10/
    "Loads are then issued across a whole warp and sent over the intra-chip crossbar bus to the GDDR3 memory controller. Store instructions are handled in a similar manner, first addresses are calculated and then the stores are sent across the intra-chip crossbar to the ROP units and then to the GDDR3 memory controller."

    From all that in conjunction with NVidia's statement we can conclude that there are likely fewer ports on crossbar -> smaller effective bandwidth as SMX have to wait for their memory access.
  • chizow - Monday, January 26, 2015 - link

    Nice find, and I agree, its about as detailed as it is going to get without going overly technical. In essence, the missing performance was probably always there for any culled/cut SKUs based on a specific ASIC. At some point, the cut functional units are going to have to interface some other functional unit, so if one end isn't there, its only natural to assume the other interconnect will go unutilized or underutilized as a result.
