Segmented Memory Allocation in Software

So far we have covered the hardware. With the hardware basis of the GTX 970’s segmented memory explained, we can now look at the role software plays, and how software allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is shared between the operating system and the video driver. Strictly speaking, Windows controls video memory management – this being one of the big changes introduced with Windows Vista and the Windows Display Driver Model – while the video driver provides a significant amount of input by hinting at how resources should be laid out.

Meanwhile, from an application’s perspective, all video memory and its address space is virtual. This means that applications write to their own private address space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video driver to decide where in physical VRAM to place each allocation – and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case scenario system memory over PCIe.
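This virtual-to-physical indirection can be sketched with a toy page-table model. To be clear, this is purely illustrative – the page size, names, and data structures here are our own assumptions, not NVIDIA’s or Microsoft’s actual implementation:

```python
# Toy model of virtual-to-physical indirection: the application sees a
# flat virtual address space, while the OS/driver's page table decides
# which physical segment (or system RAM) each page actually maps to.
PAGE_SIZE = 64 * 1024  # hypothetical 64KB pages

page_table = {}  # virtual page number -> (segment name, physical page number)

def map_page(vpn, segment, ppn):
    page_table[vpn] = (segment, ppn)

def translate(vaddr):
    """Resolve a virtual address to (segment, physical address)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    segment, ppn = page_table[vpn]
    return segment, ppn * PAGE_SIZE + offset

# The application just writes to virtual address 100; it cannot tell
# (or control) that the driver backed that page with the fast segment.
map_page(0, "fast", 7)
segment, paddr = translate(100)  # -> ("fast", 7 * 65536 + 100)
```

The key point the model captures is that the mapping lives entirely outside the application: remapping a page between segments changes `page_table`, not anything the application sees.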


Virtual Address Space (Image Courtesy Dysprosia)

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources across the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only once allocations exceed 3.5GB does it turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
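This fill-the-fast-segment-first policy can be sketched as follows. This is a simplified model of the placement order described above, not NVIDIA’s actual driver logic:

```python
FAST_CAPACITY = 3584  # MB: the fast 3.5GB segment
SLOW_CAPACITY = 512   # MB: the slow 512MB segment

class SegmentedAllocator:
    """Toy model: place allocations in the fast segment until it is
    full, then spill into the slow segment, then into system memory."""

    def __init__(self):
        self.fast_used = 0
        self.slow_used = 0

    def allocate(self, size_mb):
        if self.fast_used + size_mb <= FAST_CAPACITY:
            self.fast_used += size_mb
            return "fast"
        if self.slow_used + size_mb <= SLOW_CAPACITY:
            self.slow_used += size_mb
            return "slow"
        return "system"  # worst case: spill over PCIe to system RAM

alloc = SegmentedAllocator()
placements = [alloc.allocate(512) for _ in range(9)]  # nine 512MB requests
# The first seven requests (3.5GB) land in the fast segment, the eighth
# spills into the slow segment, and the ninth falls back to system memory.
```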

The complex part of this process begins once both memory segments are in use, at which point NVIDIA’s heuristics come into play to determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level, identifying the type of resource and when it was last used are good ways to decide where to send it. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications are great candidates to send off to the slower segment. Given the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
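A heuristic of this kind could look something like the sketch below. The resource categories, scores, and recency penalty are entirely hypothetical – NVIDIA has not disclosed its actual heuristics – but the sketch shows how resource type and last-use time might combine into a placement decision:

```python
# Hypothetical placement priorities: higher scores should stay in the
# fast segment; lower scores are candidates for the slow segment.
TYPE_PRIORITY = {
    "render_target": 100,     # written every frame; keep fast
    "uav": 90,                # intermediate compute/graphics buffer
    "texture": 50,            # read-heavy, somewhat more tolerant
    "cached_resource": 10,    # allocated but not in active use
    "inactive_app": 0,        # belongs to a backgrounded application
}

def placement_score(kind, idle_seconds):
    """Combine resource type with recency: long-idle resources score lower."""
    recency_penalty = min(idle_seconds, 60)
    return TYPE_PRIORITY.get(kind, 50) - recency_penalty

def pick_demotion_candidate(resources):
    """Choose the resource best suited to demotion to the slow segment."""
    return min(resources, key=lambda r: placement_score(r["kind"], r["idle"]))

resources = [
    {"name": "backbuffer", "kind": "render_target", "idle": 0},
    {"name": "terrain_tex", "kind": "texture", "idle": 2},
    {"name": "old_level_tex", "kind": "cached_resource", "idle": 45},
]
victim = pick_demotion_candidate(resources)  # -> the idle cached texture
```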

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of applications using these APIs the 4GB of memory is uniform, with the segments abstracted away. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run: their earlier allocations land in the fast segment, and only spill over to the slow segment once the fast segment is full.
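The piecemeal-benchmark behavior falls directly out of the fill-fast-first policy, and can be simulated with a toy model. The segment sizes follow from the article; the GB/s figures here are merely illustrative placeholders to show the shape of the result, not measured numbers:

```python
# Simulate a naive VRAM benchmark that allocates 128MB blocks one at a
# time and records which physical segment each block lands in, assuming
# fill-fast-first placement as described in the article.
FAST_MB, SLOW_MB = 3584, 512

def simulate_benchmark(block_mb=128):
    fast_used = slow_used = 0
    results = []
    while True:
        if fast_used + block_mb <= FAST_MB:
            fast_used += block_mb
            results.append(("fast", 196))   # illustrative bandwidth, GB/s
        elif slow_used + block_mb <= SLOW_MB:
            slow_used += block_mb
            results.append(("slow", 28))    # illustrative bandwidth, GB/s
        else:
            break
    return results

blocks = simulate_benchmark()
# 28 blocks (3.5GB) test fast before the final 4 blocks reveal the slow segment
```

This is exactly the pattern early GTX 970 memory benchmarks showed: full speed until roughly the 3.5GB mark, then a sharp drop for the remainder.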

GeForce GTX 970 Addressable VRAM

  API        Memory
  Direct3D   4GB
  OpenGL     4GB
  CUDA       4GB
  OpenCL     4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments is abstracted from the application, with physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.

398 Comments

  • bigboxes - Monday, January 26, 2015 - link

    Agreed. How can AnandTech just blindly accept this excuse? We can only hope that there is a price drop that follows this outing.
  • JarredWalton - Monday, January 26, 2015 - link

    People buy these cards based on performance, not on raw specs. For their part, NVIDIA doesn't even publicly list ROP counts for the various parts. I can't imagine any credible lawyer trying to start a class action suit over this information.
  • bigboxes - Monday, January 26, 2015 - link

    I'd say they do both. No one is saying that the 970 is a bad card. It's just not as good as advertised. I've been seriously considering purchasing a 970. The last couple of weeks I have been researching the different models. I think that Newegg and Amazon keep on directing web pages and e-mails directly relating to that fact. I'm now waiting to see how this whole thing settles.
  • Thepotisleaking - Monday, January 26, 2015 - link

    On point! All metrics are considered for purchases and there is no doubt somewhere out there are those that bought based on specs, particularly VRAM, that were defrauded.

    Johntseng is likely an nvidia employee or just a fanboy simpleton. Nice to see this issue acquiring traction leading to another payout from nvidia.

    not worth the lie is what they are likely thinking down in Santa Clara today :)
  • Thepotisleaking - Monday, January 26, 2015 - link

    Further many sites are on this issue now, a lot more data on the performance impact of this issue will be coming to light.

    .

    #ROPGATE 2014
  • Jon Tseng - Tuesday, January 27, 2015 - link

    >Johntseng is likely an nvidia employee

    Incorrect. You can look me up on Linkedin if you want to know where I work. Unlike some people I choose not to hide behind anonymous pseudonyms when I post online.

    >or just a fanboy

    Incorrect. From 2009-2014 I used a 4870x2 (scan.co.uk invoice #AQX84951 if you don't believe me) which was an incredible card for the price (£240). From 2003-06 I used a 9800 Pro (another great piece of silicon). I also had a 4850 in the HTPC and used an XT1950 Pro for a while while I was weaning myself off of AGP. Strange behaviour for an NVDA fanboy.

    >simpleton.

    Maybe if you tried engaging with my arguments rather than conducting silly ad hominem (https://yourlogicalfallacyis.com/ad-hominem) attacks you might have more headway.

    GTX970 is still a great card at an awesome price. Nothing that has come out today has changed either the performance benchmarks we have all seen or the price point we can buy it at. Actually I'm secretly hoping all the dumb publicity makes the price come down a bit so I can grab a second one to SLI...
  • Kutark - Tuesday, January 27, 2015 - link

    Don't waste your time responding to these morons. The irony is he called you a fanboy, when he is clearly an ATI fanboy.
  • Fishman44 - Friday, January 30, 2015 - link

    The issue isn't how great the card is. The issue is that they knowingly deceived their customers.
  • Oxford Guy - Tuesday, January 27, 2015 - link

    That's empty rhetoric.

    Performance depends on specs. 4 GB of VRAM performs better than 3.5 GB of VRAM when the game needs more than 3.5 GB of VRAM.
  • nevertell - Tuesday, January 27, 2015 - link

    Outside of situations where the GPU is used for general purpose computation, the performance implications of partitioned memory are not noteworthy given proper memory management in the driver. The reason games utilize as much memory as possible is because they cache textures, vertices, and intermediate results. The reason the GTX 970 utilizes 3.5 gigabytes in most scenarios is that storing a cache in the partitioned memory would deliver lower performance. I can assure you that a game that would need 4 gigabytes of data to perform a non-trivial draw call would be bound by compute, not I/O. Research sparse texture arrays and all the other nice things that are being introduced in OpenGL (via per-vendor extensions) and the next-gen OpenGL spec (and are sort of implemented in Mantle and DX12), understand how caches work, and this will come as common sense.
    The real question is, why didn't they just sell a 3.5GB card instead and save on memory chips?
