Segmented Memory Allocation in Software

So far we’ve talked about the hardware. Having explained the hardware basis of segmented memory, we can now turn to the role software plays, and how software allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking, Windows controls video memory management – this was one of the big changes introduced with Windows Vista and the Windows Display Driver Model – while the video drivers provide a significant amount of input by hinting at how things should be laid out.

Meanwhile, from an application’s perspective, all video memory and its address space is virtual. This means that applications are writing to their own private address space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization, it falls to the OS and video drivers to decide where in physical VRAM to allocate memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or, in the worst case, system memory over PCIe.
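To put this in concrete terms, here is a minimal CUDA sketch of what an application sees; this is our own illustration, not NVIDIA’s code. The runtime reports a single flat pool of memory, and the application simply requests allocations without any say in (or knowledge of) which segment ends up backing them.

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        size_t freeBytes = 0, totalBytes = 0;
        // The API reports one flat pool; the 3.5GB/512MB split is invisible here.
        cudaMemGetInfo(&freeBytes, &totalBytes);
        printf("free: %zu MB, total: %zu MB\n", freeBytes >> 20, totalBytes >> 20);

        // The application just asks for memory. Whether the backing pages land in
        // the fast segment, the slow segment, or system memory is the OS/driver's
        // decision, not the application's.
        void* buf = nullptr;
        if (cudaMalloc(&buf, 256ull << 20) == cudaSuccess)  // a 256MB request
            cudaFree(buf);
        return 0;
    }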


Virtual Address Space (Image Courtesy Dysprosia)

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this, NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only turns to the 512MB segment for allocations beyond 3.5GB, as there’s no benefit to using the slower segment so long as there’s space available in the faster segment.
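As a rough sketch of this first-order policy – our own illustration with hypothetical names, not NVIDIA’s actual driver logic – the placement decision at allocation time boils down to a simple waterfall:

    // Hypothetical two-segment placement: prefer the fast segment until it is
    // exhausted, then the slow segment, and spill to system memory last.
    #include <cstddef>

    enum class Placement { FastSegment, SlowSegment, SystemMemory };

    struct SegmentState {
        size_t fastFree = 3584ull << 20;  // 3.5GB fast segment
        size_t slowFree = 512ull << 20;   // 512MB slow segment
    };

    Placement place(SegmentState& s, size_t bytes) {
        if (bytes <= s.fastFree) { s.fastFree -= bytes; return Placement::FastSegment; }
        if (bytes <= s.slowFree) { s.slowFree -= bytes; return Placement::SlowSegment; }
        return Placement::SystemMemory;  // over PCIe, the worst case
    }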

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but at a high level, identifying the type of resource and when it was last used are good ways to figure out where to send it. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. Based on the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
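While the real heuristics remain NVIDIA’s secret, a toy version of that kind of scoring – entirely our guesswork, with hypothetical names – might look like this:

    // Hypothetical eviction scoring: hot, latency-sensitive resources stay in the
    // fast segment; cold or cacheable resources are the first candidates for the
    // slow segment. Higher score = better candidate for the slow segment.
    enum class ResourceType { RenderTarget, UAV, IntermediateBuffer, Texture, CachedResource };

    int slowSegmentScore(ResourceType type, bool ownerAppActive, int framesSinceLastUse) {
        int score = framesSinceLastUse;          // colder resources score higher
        if (!ownerAppActive) score += 100;       // inactive application: strong candidate
        switch (type) {
        case ResourceType::RenderTarget:
        case ResourceType::UAV:
        case ResourceType::IntermediateBuffer: score -= 50; break;  // keep these fast
        case ResourceType::Texture:            score += 10; break;
        case ResourceType::CachedResource:     score += 50; break;
        }
        return score;
    }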

From an API perspective this is applicable to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, with the segments abstracted away. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run: their earlier allocations will be in the fast segment, only finally spilling over to the slow segment once the fast segment is full.
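This behavior is straightforward to demonstrate with a simple CUDA probe in the spirit of the community VRAM benchmarks: allocate the card in fixed-size chunks, then time a copy within each chunk. Only the last chunks, which spill into the slow segment, should show reduced bandwidth. A minimal sketch, with the chunk size and timing method being our own choices:

    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    int main() {
        const size_t chunk = 128ull << 20;  // 128MB per allocation
        std::vector<char*> chunks;
        char* p = nullptr;

        // Early allocations land in the fast 3.5GB segment; the last few land in
        // the slow 512MB segment.
        while (cudaMalloc(&p, chunk) == cudaSuccess)
            chunks.push_back(p);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        // Copy between adjacent chunks and report effective bandwidth per pair.
        for (size_t i = 0; i + 1 < chunks.size(); i += 2) {
            cudaEventRecord(start);
            cudaMemcpy(chunks[i], chunks[i + 1], chunk, cudaMemcpyDeviceToDevice);
            cudaEventRecord(stop);
            cudaEventSynchronize(stop);
            float ms = 0.0f;
            cudaEventElapsedTime(&ms, start, stop);
            // Bytes read + bytes written, divided by elapsed time.
            printf("chunks %zu-%zu: %.1f GB/s\n", i, i + 1, 2.0 * chunk / (ms * 1e6));
        }

        for (char* c : chunks) cudaFree(c);
        return 0;
    }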

GeForce GTX 970 Addressable VRAM
API       Memory
Direct3D  4GB
OpenGL    4GB
CUDA      4GB
OpenCL    4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then, the role of software in memory allocation is relatively straightforward, since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments is abstracted from the application, with physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.

Comments

  • Wesleyrpg - Tuesday, January 27, 2015 - link

    This review has been sponsored by nvidia

    ;)
  • Exchequer - Tuesday, January 27, 2015 - link

    Obviously a lot of time and effort went into writing this article.

    However, the one thing I do not get is why there are no frametime performance figures. Nvidia has commented that the performance degradation is only 1-3%, however this is measured in 'old school' average fps.

    It is possible (maybe even likely, if I understand the story correctly) that a game running on 3.6GB of VRAM might show 100 100 30 100 100 100 100 30 100 100 100 100. Meaning that in terms of average FPS you will see nothing worrying here, but in terms of percentile performance you will see annoying lag spikes dropping from 100 to 30 fps.

    So instead of knowing the reduction of average fps for 3.5GB vs >3.5GB performance, we NEED to know the increase in "worst" percentile frametimes (or framerates). Only then can we be sure that no annoying micro stutter is introduced at 3.5GB+ loads.
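
    To illustrate with my numbers above (just a back-of-the-envelope sketch, not real benchmark data):

        #include <algorithm>
        #include <cstdio>
        #include <vector>

        int main() {
            // Hypothetical per-frame fps for a game using ~3.6GB of VRAM.
            std::vector<double> fps = {100, 100, 30, 100, 100, 100,
                                       100, 30, 100, 100, 100, 100};
            double avg = 0;
            for (double f : fps) avg += f;
            avg /= fps.size();

            std::sort(fps.begin(), fps.end());
            double worstPercentile = fps[fps.size() / 12];  // roughly the worst 8%

            // Prints an ~88.3 fps average, but a 30 fps worst percentile.
            printf("average: %.1f fps, worst percentile: %.1f fps\n",
                   avg, worstPercentile);
            return 0;
        }

    The average looks perfectly healthy; only the percentile exposes the stutter.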
  • Ryan Smith - Tuesday, January 27, 2015 - link

    "However the one thing I do not get is why there are no frametimes performance figures."

    We only had 12 hours (overnight no less) to prepare the article, which meant there wasn't time to do anything more than this.
  • Bytales - Tuesday, January 27, 2015 - link

    Now it's interesting to see how the GeForce 750 is compartmentalized compared to the full chip, the 750 Ti!
    Does it have the same issue, one that wasn't discovered only because these are cheaper cards, and anyone getting such a cheap card will probably get the 750 Ti anyway!?
  • snouter - Tuesday, January 27, 2015 - link

    I had two 2GB GTX760 in SLI. I got tired of fussing with the SLI and sold those cards and got a 4GB* GTX970.

    I play games, I don't really sit around benchmarking and blah blah. I knew I was taxing my 2GB cards though and SLI does not pool memory, so it was not like I was in a 2GB+2GB situation.

    The GTX 970 works fine. Except... when I do grow into it, my ceiling won't be 4GB, it will be 3.5GB. When I go to sell it used, it will "be that crippled card."

    It's not the end of the world. My girl loves me and I have meat in the fridge, but... there is no way around it. This is not the video card I thought I bought.
  • Quad5Ny - Tuesday, January 27, 2015 - link

    @Ryan Smith
    Do you think it would be possible to have a driver option to use 3.75GB and forgo the split partitioning? Or would that not be possible because of the 1K stripe size?
  • Ryan Smith - Tuesday, January 27, 2015 - link

    3.5GB you mean? In theory I don't see why that shouldn't be possible. But keep in mind if you were to do that, you'd start spilling into system memory instead of the second segment.
  • Quad5Ny - Thursday, January 29, 2015 - link

    Yup 3.5GB. For some reason I was thinking each chip was 256MB while writing that.
  • 3ricss - Tuesday, January 27, 2015 - link

    At $329 the GTX970 is a compromise I'm willing to take. And did.
  • gudomlig - Tuesday, January 27, 2015 - link

    I own an MSI Gaming 970. It runs everything at 1080p smooth as butter and runs most of my games in 1080p 3D (Vizio passive HDTV) with no trouble. The 3D solution by NVIDIA is a bit weak; I had to do some driver workarounds to get around the 30fps lock at 1080p. My 7950 with TriDef seems better on some games, but given there is like none of us trying to use 3D I guess I can't complain that much. So what if they screwed up the specs by a tad? It's not like this isn't still a serious kick-a$$ card. The benchmarks speak for themselves. Find me a practical application where this matters and maybe then I'd care, but probably not.
