Segmented Memory Allocation in Software

So far we’ve talked about the hardware. Having finally explained the hardware basis of segmented memory, we can now look at the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking, Windows controls video memory management (one of the big changes introduced with Windows Vista and the Windows Display Driver Model), while the video drivers provide significant input by hinting at how things should be laid out.

Meanwhile, from an application’s perspective, all video memory and its address space are virtual. This means that applications write to their own private address space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing to. As a result of this memory virtualization, it falls to the OS and video drivers to decide where in physical memory to place each allocation request, and, for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or, in the worst-case scenario, system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)
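To make the virtualization point concrete, here is a minimal Python sketch, assuming a simple per-application page table with 1MB pages and a first-touch mapping rule. The class names, the page size, and the mapping rule are hypothetical illustrations, not how WDDM or NVIDIA's driver actually track memory. It shows two applications both using virtual page 0 while the OS/driver maps each one to a different physical page that neither application ever sees.

```python
# Toy model of per-application virtual video memory (illustrative only).
# Assumptions: 1MB pages and a first-touch mapping rule; these are NOT the
# actual WDDM or NVIDIA driver data structures.
from dataclasses import dataclass, field

PAGE_MB = 1  # hypothetical page size


@dataclass
class PhysicalVRAM:
    """Hands out physical page numbers; only the OS/driver sees these."""
    total_pages: int
    next_free: int = 0

    def take_page(self) -> int:
        if self.next_free >= self.total_pages:
            raise MemoryError("physical VRAM exhausted")
        page = self.next_free
        self.next_free += 1
        return page


@dataclass
class AppAddressSpace:
    """An application's private view: virtual page -> physical page."""
    vram: PhysicalVRAM
    page_table: dict = field(default_factory=dict)

    def translate(self, vpage: int) -> int:
        # On first touch, the OS/driver picks a physical page for this virtual page.
        if vpage not in self.page_table:
            self.page_table[vpage] = self.vram.take_page()
        return self.page_table[vpage]


vram = PhysicalVRAM(total_pages=4096 // PAGE_MB)  # 4GB expressed in 1MB pages
game = AppAddressSpace(vram)
browser = AppAddressSpace(vram)
print(game.translate(0))     # 0
print(browser.translate(0))  # 1: same virtual page, different physical page
print(game.translate(0))     # 0: the mapping is stable once made
```

Neither application can tell from its own addresses where its data physically lives, which is exactly what leaves the OS and driver free to choose between segments.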

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this, NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only for allocations beyond 3.5GB does it turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster one.
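The fill-the-fast-segment-first order can be sketched as below. This is a simplified model, assuming sizes in MB, whole allocations that are never split across segments, and no fragmentation or eviction; the class and method names are hypothetical, and NVIDIA's real driver logic is not public.

```python
# Sketch of a "fill the fast segment first" placement policy (illustrative).
# Assumptions: sizes in MB, allocations are never split across segments,
# no fragmentation or eviction. Not NVIDIA's actual implementation.
FAST_MB = 3584  # the 3.5GB segment
SLOW_MB = 512   # the 512MB segment


class SegmentedVRAM:
    def __init__(self, fast_mb=FAST_MB, slow_mb=SLOW_MB):
        # Segments are listed in preference order: fastest first.
        self.free = {"fast": fast_mb, "slow": slow_mb}

    def allocate(self, size_mb):
        """Return the name of the segment that receives this allocation."""
        for segment, free_mb in self.free.items():
            if size_mb <= free_mb:
                self.free[segment] -= size_mb
                return segment
        # Nothing fits in VRAM: the worst case is system memory over PCIe.
        return "system (PCIe)"


vram = SegmentedVRAM()
placements = [vram.allocate(256) for _ in range(17)]
print(placements.count("fast"), placements.count("slow"), placements.count("system (PCIe)"))
# 14 2 1
```

Allocating in 256MB chunks, the first 14 land in the fast segment, the next 2 in the slow segment, and only the 17th spills over to system memory. This mirrors why a benchmark that allocates memory piece by piece only reaches the slow segment near the end of its run.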

The complex part of this process begins once both memory segments are in use, at which point NVIDIA’s heuristics come into play to determine which resources to allocate to which segment. How NVIDIA does this is very much a “secret sauce” scenario for the company, but at a high level, the type of a resource and when it was last used are good indicators of where to send it. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last things you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached resources), and resources belonging to inactive applications are good candidates for the slower segment. From the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
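Since NVIDIA's actual heuristics are secret, the following is only a toy sketch of the idea described above: rank resources by whether their application is active, whether they are a latency-critical kind (frame buffers, render targets, UAVs, intermediate buffers), and how recently they were used, then greedily fill the fast segment in that order. The resource kinds, the priority rule, the greedy packing, and all names are assumptions made for illustration.

```python
# Toy placement heuristic for when both segments are in use (illustrative).
# Assumptions: the kinds, the priority rule, and greedy packing are invented
# for this sketch; NVIDIA's actual heuristics are not public.
from dataclasses import dataclass

FAST_MB = 3584
SLOW_MB = 512

# Resource kinds that should stay in the fast segment whenever possible.
CRITICAL_KINDS = {"framebuffer", "render_target", "uav", "intermediate"}


@dataclass
class Resource:
    name: str
    kind: str             # e.g. "render_target", "texture", "cached"
    size_mb: int
    last_used_frame: int  # higher means more recently used
    app_active: bool = True


def priority(r: Resource):
    # Compared left to right: active app, then critical kind, then recency.
    return (r.app_active, r.kind in CRITICAL_KINDS, r.last_used_frame)


def place(resources, fast_mb=FAST_MB, slow_mb=SLOW_MB):
    """Greedily fill the fast segment with the highest-priority resources."""
    placement = {}
    for r in sorted(resources, key=priority, reverse=True):
        if r.size_mb <= fast_mb:
            placement[r.name] = "fast"
            fast_mb -= r.size_mb
        elif r.size_mb <= slow_mb:
            placement[r.name] = "slow"
            slow_mb -= r.size_mb
        else:
            placement[r.name] = "system (PCIe)"
    return placement


resources = [
    Resource("backbuffer", "framebuffer", 32, last_used_frame=100),
    Resource("gbuffer", "render_target", 256, last_used_frame=100),
    Resource("terrain_tex", "texture", 3200, last_used_frame=99),
    Resource("old_level_tex", "cached", 300, last_used_frame=10),
    Resource("browser_tex", "texture", 100, last_used_frame=100, app_active=False),
]
print(place(resources))
```

In this example the frame buffer, render target, and actively used texture fill the fast segment, while the stale cached texture and the inactive application's texture end up in the slow segment, matching the kind of split described above.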

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory on the GTX 970, and to the applications using these APIs that 4GB appears as one uniform pool, with the segments abstracted away. This is also why applications that benchmark memory in a piecemeal fashion will not find the slow memory area until the end of their run: their earlier allocations land in the fast segment and only spill over to the slow segment once the fast segment is full.

GeForce GTX 970 Addressable VRAM
API         Memory
Direct3D    4GB
OpenGL      4GB
CUDA        4GB
OpenCL      4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970 but reach 4GB on a GTX 980. Again, from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall, then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and use of the memory segments are hidden from the application, with physical memory allocation handled by the OS and driver. Only after more than 3.5GB has been requested (enough to fill the entire 3.5GB segment) does the 512MB segment come into use, at which point NVIDIA attempts to place the least sensitive or important data in the slower segment.

