Segmented Memory Allocation in Software

With the hardware basis of segmented memory explained, we can now turn to the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is the domain of the combination of the operating system and the video drivers. Strictly speaking Windows controls video memory management – this being one of the big changes of Windows Vista and the Windows Display Driver Model – while the video drivers get a significant amount of input in hinting at how things should be laid out.

Meanwhile from an application’s perspective all video memory and its address space is virtual. This means that applications are writing to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to allocate memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case scenario system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)

Without going quite so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only for allocations beyond 3.5GB does it turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
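As a rough illustration of the fast-first policy described above, here is a minimal sketch in Python. It is a toy model, not NVIDIA's driver logic: the class name `SegmentedVram`, the first-fit rule, and the fixed 512MB request size are all assumptions made for the example; only the segment sizes and the fallback to system memory come from the text.

```python
FAST_MB = 3584  # the 3.5GB segment
SLOW_MB = 512   # the 512MB segment


class SegmentedVram:
    """Toy model (hypothetical): fill the fast segment first, then the slow one."""

    def __init__(self, fast_mb=FAST_MB, slow_mb=SLOW_MB):
        self.free = {"fast": fast_mb, "slow": slow_mb}

    def allocate(self, size_mb):
        # Prefer the fast segment whenever the request still fits there.
        for segment in ("fast", "slow"):
            if self.free[segment] >= size_mb:
                self.free[segment] -= size_mb
                return segment
        # Worst case: neither segment has room, so fall back to system memory.
        return "system"


vram = SegmentedVram()
placements = [vram.allocate(512) for _ in range(9)]
print(placements)
```

In this model the first seven 512MB requests land in the fast segment, the eighth spills into the slow segment, and the ninth falls back to system memory.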

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers for example are the last things you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. The way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
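NVIDIA's actual heuristics are not public, so the following is only a sketch of the general idea: rank resources by type and by how recently they were used, then hand out fast-segment space in that order. The `SENSITIVITY` ranks, the `Resource` fields, the `place` function, and the example workload are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical sensitivity ranks (lower = more important to keep in the fast
# segment). These are illustrative values, not NVIDIA's.
SENSITIVITY = {
    "frame_buffer": 0,
    "render_target": 0,
    "uav": 0,
    "texture": 1,
    "cached": 2,
    "inactive_app": 3,
}


@dataclass
class Resource:
    name: str
    kind: str
    size_mb: int
    frames_since_use: int


def place(resources, fast_mb=3584, slow_mb=512):
    """Assign each resource to 'fast', 'slow', or 'system' memory."""
    # Most sensitive types first; among equals, most recently used first.
    ordered = sorted(
        resources, key=lambda r: (SENSITIVITY[r.kind], r.frames_since_use)
    )
    free = {"fast": fast_mb, "slow": slow_mb}
    placement = {}
    for res in ordered:
        for segment in ("fast", "slow"):
            if free[segment] >= res.size_mb:
                free[segment] -= res.size_mb
                placement[res.name] = segment
                break
        else:
            # Neither VRAM segment has room left.
            placement[res.name] = "system"
    return placement


# Made-up workload totalling exactly 4GB.
workload = [
    Resource("background_app", "inactive_app", 128, 5000),
    Resource("old_textures", "cached", 384, 600),
    Resource("level_textures", "texture", 2752, 1),
    Resource("particle_uav", "uav", 256, 0),
    Resource("gbuffer", "render_target", 512, 0),
    Resource("back_buffer", "frame_buffer", 64, 0),
]
placement = place(workload)
print(placement)
```

With this made-up workload, the intermediate buffers and active textures fill the fast segment, while the cached textures and the inactive application's data end up in the slow segment.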

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, the segments being abstracted. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will be in the fast segment and only finally spill over to the slow segment once the fast segment is full.
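The piecemeal benchmark behavior can be mimicked with a small simulation. This sketch does not touch a real GPU: the 128MB chunk size is an arbitrary choice, and the per-segment bandwidth figures in `BANDWIDTH_GBPS` are illustrative placeholders assumed for the example, not measurements.

```python
CHUNK_MB = 128
FAST_MB, SLOW_MB = 3584, 512

# Illustrative placeholder bandwidths in GB/s (assumed, not measured here).
BANDWIDTH_GBPS = {"fast": 196, "slow": 28}


def benchmark_chunks():
    """Simulate allocating and timing 4GB of VRAM one chunk at a time."""
    results = []
    used_mb = 0
    for index in range((FAST_MB + SLOW_MB) // CHUNK_MB):
        # Earlier chunks land in the fast segment; later ones spill over.
        segment = "fast" if used_mb + CHUNK_MB <= FAST_MB else "slow"
        used_mb += CHUNK_MB
        results.append((index, segment, BANDWIDTH_GBPS[segment]))
    return results


results = benchmark_chunks()
slow_chunks = [index for index, segment, _ in results if segment == "slow"]
print(slow_chunks)
```

Under these assumptions only the final four of the 32 chunks report the lower figure, which matches the pattern of slow areas appearing only at the end of a run.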

GeForce GTX 970 Addressable VRAM
API Memory
Direct3D 4GB
OpenGL 4GB
CUDA 4GB
OpenCL 4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments is abstracted from the application, with the physical memory allocation handled by the OS and driver. Only after 3.5GB is requested (enough to fill the entire 3.5GB segment) does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.


398 Comments


  • CX71 - Thursday, January 29, 2015 - link

    Has nothing to do with damages, I doubt anyone has lost income as a result of buying a 970, it's all about consumer rights and the advertised specs which weren't accurate. Particularly for those who bought two or more 970s (which I was planning to do) with the intention of using SLI to push pixels on screen(s) above 1080, that's the issue for nVidia. If they had stated right from launch that the card would lose performance if VRAM usage went above 3.5GB, and people still bought it then they'd be fine.
  • Magictoaster - Thursday, January 29, 2015 - link

    Litigation has everything to do with damages. This was the point I was trying to make. Your rights as a consumer are to have the card returned and your money refunded. If you want to bring suit, class action or otherwise, against Nvidia there need to be damages. If you have no damages, you have no case. If Nvidia gave you a hard time, and wouldn't issue an RMA/refund, you could argue they are acting in bad faith and ask for treble damages (a punitive tripling of any damages awarded). Again, you still need to prove that you were damaged by Nvidia's misrepresentations.

    Seeing as Nvidia made no false representations (Nvidia's listed specs only state the amount of VRAM, not ROPs, not partition size, not performance scaling, etc) you would be hard pressed to even prove that Nvidia deliberately misrepresented the specs of the card.

    Legal action is based on damages, you have to show Nvidia misrepresented their card, you relied on that representation, and as a result you suffered damages. If you can't show that, then don't mention litigation.

    With regards to your consumer rights, ask for a refund, you are probably entitled to it. You are not entitled to a free upgrade, a pile of gold, a unicorn, or any other non-sense.
  • Magictoaster - Thursday, January 29, 2015 - link

    Just so we are clear, I'm not a lawyer and this is not legal advice.

    I did work for a law firm, and have seen countless cases where people sued "on principle," or to "correct the system" and every single one of them was a loss/dismissed at considerable expense to the plaintiff.
  • Elixer - Thursday, January 29, 2015 - link

    Looks like it was too good to be true.

    While some people got help, others are getting the shaft.
  • AnnonymousCoward - Thursday, January 29, 2015 - link

    It could just as well have 1024 ROPs and 2GB L2; who gives a shit?
  • M1cha3l - Friday, January 30, 2015 - link

    excellent article is there any chance to
    make or link explanation about ROP's and SMMs ??

    Thanks :D
  • Fishman44 - Friday, January 30, 2015 - link

    This is a big deal. The most disturbing thing about this story is that Nvidia knew, and took the calculated risk that it wouldn't get noticed.
  • Ballist1x - Friday, January 30, 2015 - link

    I guess the question is now:

    Is Anandtech going to change the way they review GFX cards in the future to avoid this debacle in future?

    Maybe test the memory bandwidth, test with memory usage etc instead of trotting out the exact same synthetics every time and claiming that there were some unknown results - and then not revisiting the tests as they did for the GTX 970 launch?

    Were Anand complicit?
  • Nfarce - Friday, January 30, 2015 - link

    I go on vacation for a week, then come back and catch up on my tech news and then THIS happens! In any event, EVGA Superclocked 970 owner here. I have been extremely happy with the card running 1440p on all my games. Even if the specs were reduced, I still would have bought it over the 980 for $200 less. I did not see the 10-15% better performance with the 980 worth the 55% increase in cost, especially when I can safely overclock the already factory overclocked card to within a few frames per second of the 980.

    But yes, I am not happy with Nvidia's massive fall down here. If anything, I have diminished respect and trust for the company. It will not go forgotten.
  • Dal Makhani - Friday, January 30, 2015 - link

    You just said that the card is great for you, so why have less respect and trust with them? Contradictions don't help arguments, if anything this will only affect users who are maxing out VRAM and that's probably only a few of most owners who are probably on 1080p or 1440p without cranking modded textures/effects on games.

    PR issues like this aren't really a big deal because companies make mistakes all the time and this is far from major and I can see miscommunication like this happening between parties at large companies all the time.
