Segmented Memory Allocation in Software

So far we’ve talked about the hardware, and having explained the hardware basis of segmented memory we can now turn to the role software plays, and how software allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is handled jointly by the operating system and the video drivers. Strictly speaking, Windows controls video memory management – this being one of the big changes introduced with Windows Vista and the Windows Display Driver Model – while the video drivers provide a significant amount of input by hinting at how things should be laid out.

Meanwhile from an application’s perspective all video memory and its address space is virtual. This means that applications are writing to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to allocate memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case scenario system memory over PCIe.
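
The virtualization described above can be sketched as a simple mapping layer: the application receives flat virtual addresses while the allocator privately decides which physical pool backs each request. This is purely an illustrative model – the class, pool names, and page-table structure are assumptions for demonstration, not NVIDIA's or Windows' actual implementation.

```python
# Illustrative sketch: an app sees a flat virtual address space, while the
# "driver" picks physical backing from pools in order of preference.
# All names here are hypothetical, not real driver internals.

class VirtualAllocator:
    """Maps an app's flat virtual address space onto physical backing pools."""

    def __init__(self, pools):
        # pools: list of (name, capacity_bytes) in order of preference,
        # e.g. fast VRAM segment, slow VRAM segment, system memory
        self.pools = [{"name": n, "capacity": c, "used": 0} for n, c in pools]
        self.page_table = {}   # virtual address -> (pool name, physical offset)
        self.next_vaddr = 0

    def alloc(self, size):
        """Return a virtual address; physical placement is hidden from the caller."""
        for pool in self.pools:
            if pool["used"] + size <= pool["capacity"]:
                vaddr = self.next_vaddr
                self.page_table[vaddr] = (pool["name"], pool["used"])
                pool["used"] += size
                self.next_vaddr += size
                return vaddr
        raise MemoryError("all pools exhausted")
```

From the application's point of view every address returned by `alloc` looks the same; only the page table (which the app never sees) records which pool actually holds the data.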


Virtual Address Space (Image Courtesy Dysprosia)

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only once allocations exceed 3.5GB does the driver turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level, identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers, for example, are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. The way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.

From an API perspective this is applicable to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, the segments being abstracted away. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will be in the fast segment and only finally spill over to the slow segment once the fast segment is full.

GeForce GTX 970 Addressable VRAM
API        Addressable Memory
Direct3D   4GB
OpenGL     4GB
CUDA       4GB
OpenCL     4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall, then, the role of software in memory allocation is relatively straightforward since it’s layered on top of the segments. Applications have access to the full 4GB, and because application memory space is virtualized, the existence and usage of the memory segments is abstracted from the application, with the physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.


398 Comments


  • Will Robinson - Wednesday, January 28, 2015 - link

    You're going to love this then...
    http://gamenab.net/2015/01/26/truth-about-the-g-sy...
  • Oxford Guy - Thursday, January 29, 2015 - link

    Fascinating link, for sure.
  • mudz78 - Wednesday, January 28, 2015 - link

    "we have also been working on cooking up potential corner cases for the GTX 970 and have so far come up empty"

    Riiight.

    "As part of our discussion with NVIDIA, they laid out the fact that the original published specifications for the GTX 970 were wrong, and as a result the “unusual” behavior that users had been seeing from the GTX 970 was in fact expected behavior for a card configured as the GTX 970 was."

    Nvidia has already admitted they had complaints about performance.

    If you want to come up with scenarios where the 970 shits its pants you should really try harder:

    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    http://forums.guru3d.com/showthread.php?t=396064

    https://www.reddit.com/r/hardware/comments/2s333r/...

    http://www.reddit.com/r/pcgaming/comments/2s2968/g...

    All of those threads had been around for weeks before Nvidia's announcement.

    Who cares what Nvidia's take on the situation is? It was an accident? Oh, no worries, mate!

    They are a business that lied, there's consequences to that. Nobody cares that they didn't mean it.

    Refunds will start rolling out in coming weeks.
  • Yojimbo - Wednesday, January 28, 2015 - link

    Hey, can you link to the actual relevant part of those threads where someone is posting his methodology and results for creating a performance problem? The overclocker link seems to be a link to a 106-page thread whose first message is just a link to the other 3 threads you posted. The first message in the guru3d thread claims that the card can't use more than 3.5GB at all, which we now know to be completely false. It's like you're throwing us a cookbook and flour and saying "Here, there's a pie in here somewhere." If it's somewhere in there, and you have seen it before, could you please find and point to the methodology and claimed results so that people can try to repeat it rather than you just saying "you really should try harder"?
  • mudz78 - Wednesday, January 28, 2015 - link

    I think a more fitting analogy would be, somebody is complaining they can't spell and I am handing them a dictionary. I'm telling you the information is in there, so have a read and find it.

    Maybe if you bothered to read beyond the first post in each thread you would have some answers?

    " The first message in the guru3d thread claims that the card can't use more than 3.5GB at all,"

    No it doesn't.

    "I think (maybe) is here a little problem with GTX 970. If I run some games, for example Far Cry 4, GTX 970 allocate only around 3500MB video memory, but in same game and same scene GTX 980 allocate full 4000MB video memory.
    But if I change resolution to higher - 3840x2160, then all memory is allocated.
    Same problem exist in many other games like Crysis 3, Watch Dogs etc..

    Where is problem?? I really dont know..."
    http://forums.guru3d.com/showthread.php?t=396064

    "I didn't believe this at first, but I just decided to try and test it myself with texture modded Skyrim and my SLI 970s. I tried to push the 3.5 GBs barrier by downsampling it from 5120x2880 with the four following experimental conditions:

    1. No MSAA applied on top
    2. 2xMSAA applied on top
    3. 4xMSAA applied on top
    4. 8xMSAA applied on top

    Since MSAA is known to be VRAM heavy, it made sense. I also kept a close eye on GPU usage and FPS with the Rivatuner overlay as well as VRAM usage. All of this was done running around Whiterun to minimize GPU usage. My results were as follows.

    1. Skyrim peaked at about 3600 MBs in usage with occasional brief hitching while loading new textures in and out of VRAM. GPU usage remained well below 99% on each card.

    2. Skyrim once again peaked at about 3600 MBs with the mentioned hitching, this time somewhat more frequently. Once again, GPU usage remained well below 99%.

    3. Skyrim yet again peaked at about 3600 MBs and hitched much more prominently and frequently at the same time as VRAM usage dropped down 100-200 MBs. GPU usage was below 99% again with FPS still at 60 aside from those hitches.

    4. Now Skyrim was using the full 4 GB framebuffer with massive stuttering and hitching from a lack of VRAM. This time, I had to stare at the ground to keep GPU usage below 99% and retain 60 FPS. I ran around Whiterun just staring at the ground and it remained at 60 FPS except with those massive hitches where GPU usage and framerate temporarily plummeted. This last run merely indicated that Skyrim can indeed use more VRAM than it was with the previous 3 settings and so the issue seems to be with the 970s themselves rather than just the game in this example. The performance degradation aside from VRAM was severe, but that could just be 8xMSAA at 5K taking its calculative toll.

    So it seems to me that my 970s refuse to utilize above ~3600 MBs of VRAM unless they absolutely need it, but I've no idea why. Nvidia didn't gimp the memory bus in any overly obvious way from the full GM204 chip therefore the 970s should have no issue using the same VRAM amount as the 980s. I don't like what I see, it's like the situation with the GTX 660 that had 2 GBs but could only effectively use up 1.5 without reducing its bandwidth to a third, so it tried to avoid exceeding 1.5. The difference is that was predictable due to the GK106's 192-bit memory bus, there's nothing about the 970's explicit specifications that indicates the same situation should apply.

    A similar shortcoming was noticed sometime back regarding the 970's ROPs and how the cutting-down of 3 of GM204's 16 SMM units affected the effective pixel fillrate of the 970s despite retaining the full 64 ROPs. It's possible that Maxwell is more tightly-connected to shader clusters and severing them affects a lot about how the chip behaves, but that doesn't really make sense. If this is an issue, it's almost certainly software-related. I'm not happy regardless of the reason and I'll try more games later. Anecdotally, I have noticed recent demanding games peaking at about 3500-3600 MBs and can't actually recall anything going beyond that. I didn't pay attention to it or change any conditions to test it."
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "I can reproduce this issue in Hitman: Absolution.
    Once more than 3.5GB get allocated, there is a huge frametime spike.
    The same scene can be tested to get reproducible results.
    In 4k, memory usage stays below 3.5GB and there is no extreme spike. But in 5k (4x DSR with 1440p), at the same scene, there is a huge fps drop once the game wants to allocate 2-300MB at once and burst the 3.5GB.
    It happens in the tutorial mission when encountering the tennis field.

    With older driver (344.11 instead of 347.09), memory usage is lower, but you can enable MSAA to get high VRAM usage and thus be able to reproduce by 100%.

    Could a GTX 980 owner test this?"
    http://www.overclock.net/t/1535502/gtx-970s-can-on...

    "Without AA or just FXAA, I have around 3.5GB used in AC: U and mostly no stuttering. With 2xMSAA it rises to ~3.6-3.7GB and performance is still ok. But when I enable 4xMSAA and it needs ~3.8GB, I often have severe stuttering.
    When I set resolution to 720p and enable 8xMSAA, VRAM usage is well below 3GB and there is no stuttering at all."
    http://forums.guru3d.com/showpost.php?p=4991141&am...

    "In Far Cry 4 @ 1440p
    No AA: 3320MB Max Vram, locked at 60 fps
    2x MSAA: 3405MB Max Vram, locked at 60fps
    4x MSAA: 3500MB Max Vram, 45-60fps
    8x MSAA, starts around 3700-3800MB @ 4-5fps, stabilizes at 3500MB @ 30-40fps."
    http://forums.guru3d.com/showpost.php?p=4991210&am...

    There's plenty more evidence supporting the acknowledged (by Nvidia) fact that the GTX970 has performance issues with VRAM allocation above 3.5GB.

    And all those people posting "my games run fine at 1080p", you are clearly missing the point.
  • aoshiryaev - Wednesday, January 28, 2015 - link

    Why not just disable the slow 512mb of memory?
  • SkyBill40 - Wednesday, January 28, 2015 - link

    Why not just have the full 4GB at the rated speed as advertised?
  • Oxford Guy - Thursday, January 29, 2015 - link

    Ding ding ding.
  • MrWhtie - Wednesday, January 28, 2015 - link

    I can run 4 games at 100+ fps on 1080p simultaneously (MSI GTX 970). Power like this used to always cost $500+. I have no complaints; I didn't have $500 to spend on a GTX 980.

    I feel Nvidia is doing us a favor by significantly undercutting AMD.
  • mudz78 - Wednesday, January 28, 2015 - link

    Yeah, a huge favour. By lying about their product specs, undercutting the competition and concreting market share, they set themselves up to hike prices in the future.
