Segmented Memory Allocation in Software

So far we’ve talked about the hardware. Having explained the hardware basis of segmented memory, we can now turn to the role software plays and how it allocates memory between the two segments.

From a low-level perspective, video memory management under Windows is the domain of the combination of the operating system and the video drivers. Strictly speaking Windows controls video memory management – this being one of the big changes of Windows Vista and the Windows Display Driver Model – while the video drivers get a significant amount of input in hinting at how things should be laid out.

Meanwhile from an application’s perspective all video memory and its address space is virtual. This means that applications are writing to their own private space, blissfully unaware of what else is in video memory and where it may be, or for that matter where in memory (or even which memory) they are writing. As a result of this memory virtualization it falls to the OS and video drivers to decide where in physical VRAM to place memory requests, and for the GTX 970 in particular, whether to put a request in the 3.5GB segment, the 512MB segment, or in the worst case, system memory over PCIe.


Virtual Address Space (Image Courtesy Dysprosia)

Without going so far as to rehash the entire theory of memory management and caching, the goal of memory management in the case of the GTX 970 is to allocate resources over the entire 4GB of VRAM such that high-priority items end up in the fast segment and low-priority items end up in the slow segment. To do this NVIDIA directs the first 3.5GB of memory allocations to the faster 3.5GB segment, and only for allocations beyond 3.5GB do they turn to the 512MB segment, as there’s no benefit to using the slower segment so long as there’s available space in the faster segment.
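To make the fast-first policy concrete, here is a minimal Python sketch of it. This is a toy model built only from the description above, not NVIDIA's actual driver logic; the class and method names are our own invention, and the capacities are treated as binary megabytes for simplicity.

```python
MIB = 1024 * 1024

# Segment capacities for the GTX 970 as described in the article
# (assumed here to be binary units).
FAST_CAPACITY = 3584 * MIB   # the 3.5GB segment
SLOW_CAPACITY = 512 * MIB    # the 512MB segment


class SegmentedVram:
    """Toy model of fast-first placement (hypothetical, for illustration)."""

    def __init__(self):
        self.free = {"fast": FAST_CAPACITY, "slow": SLOW_CAPACITY}

    def allocate(self, size):
        """Return the name of the segment that receives the request."""
        # Prefer the fast segment; use the slow one only when fast is full.
        for segment in ("fast", "slow"):
            if size <= self.free[segment]:
                self.free[segment] -= size
                return segment
        # Worst case: nothing fits in VRAM, so spill to system memory over PCIe.
        return "system"


vram = SegmentedVram()
placements = [vram.allocate(512 * MIB) for _ in range(9)]
print(placements)
```

With nine 512MB requests, the first seven fill the fast segment, the eighth lands in the slow segment, and the ninth has nowhere to go but system memory.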

The complex part of this process occurs once both memory segments are in use, at which point NVIDIA’s heuristics come into play to try to best determine which resources to allocate to which segments. How NVIDIA does this is very much a “secret sauce” scenario for the company, but from a high level, identifying the type of resource and when it was last used are good ways to figure out where to send a resource. Frame buffers, render targets, UAVs, and other intermediate buffers for example are the last thing you want to send to the slow segment; meanwhile textures, resources not in active use (e.g. cached), and resources belonging to inactive applications would be great candidates to send off to the slower segment. From the way NVIDIA describes the process, we suspect there are even per-application optimizations in use, though NVIDIA can clearly handle generic cases as well.
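Since NVIDIA's heuristics are not public, the following Python sketch is purely speculative. It only illustrates the two signals named above (resource type and time since last use) plus whether the owning application is active. The ranking table, the `Resource` fields, and the greedy fill are all assumptions of ours.

```python
from dataclasses import dataclass

FAST_MB = 3584   # the 3.5GB segment
SLOW_MB = 512    # the 512MB segment

# Hypothetical ranking: a lower number means "keep this in fast memory".
KIND_RANK = {"frame_buffer": 0, "render_target": 0, "uav": 0,
             "texture": 1, "cached": 2}


@dataclass
class Resource:
    name: str
    kind: str
    size_mb: int
    last_used_frame: int
    app_active: bool = True


def place(resources, current_frame):
    """Greedily fill the fast segment with the most sensitive resources."""
    def sort_key(r):
        # Inactive applications go last, then by resource type, then by staleness.
        return (not r.app_active, KIND_RANK[r.kind],
                current_frame - r.last_used_frame)

    free = {"fast": FAST_MB, "slow": SLOW_MB}
    placement = {}
    for r in sorted(resources, key=sort_key):
        for segment in ("fast", "slow"):
            if r.size_mb <= free[segment]:
                free[segment] -= r.size_mb
                placement[r.name] = segment
                break
        else:
            # Neither segment has room: fall back to system memory.
            placement[r.name] = "system"
    return placement


resources = [
    Resource("old_textures", "texture", 400, last_used_frame=10),
    Resource("gbuffer", "render_target", 512, last_used_frame=100),
    Resource("scene_textures", "texture", 3000, last_used_frame=100),
    Resource("background_app", "texture", 100, last_used_frame=100,
             app_active=False),
]
result = place(resources, current_frame=100)
print(result)
```

In this example the render target and the recently used textures claim the fast segment, while the stale textures and the inactive application's data are pushed to the slow segment.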

From an API perspective this applies to both graphics and compute, though it’s a safe bet that graphics is the more easily and accurately handled of the two thanks to the rigid nature of graphics rendering. Direct3D, OpenGL, CUDA, and OpenCL all see and have access to the full 4GB of memory available on the GTX 970, and from the perspective of the applications using these APIs the 4GB of memory is identical, the segments being abstracted. This is also why applications attempting to benchmark the memory in a piecemeal fashion will not find slow memory areas until the end of their run, as their earlier allocations will be in the fast segment and only spill over to the slow segment once the fast segment is full.
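The piecemeal benchmark behavior can be worked through with a little arithmetic. The Python sketch below assumes a benchmark that allocates the card in fixed 128MB chunks (the chunk size is our assumption) and that the driver fills the fast segment first, as described above. It is a simulation of placement only and measures nothing.

```python
CHUNK_MB = 128                  # hypothetical benchmark chunk size
FAST_MB, SLOW_MB = 3584, 512    # the 3.5GB and 512MB segments


def segment_for_chunk(index):
    """Segment a chunk lands in when the fast segment is filled first."""
    offset = index * CHUNK_MB
    return "fast" if offset + CHUNK_MB <= FAST_MB else "slow"


total_chunks = (FAST_MB + SLOW_MB) // CHUNK_MB
chunks = [segment_for_chunk(i) for i in range(total_chunks)]
first_slow = chunks.index("slow")
print(f"{len(chunks)} chunks, first slow chunk at index {first_slow}")
# prints "32 chunks, first slow chunk at index 28"
```

Under these assumptions only the last 4 of 32 chunks would ever touch the slow segment, which matches such tools reporting a drop only at the very end of their run.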

GeForce GTX 970 Addressable VRAM
API         Memory
Direct3D    4GB
OpenGL      4GB
CUDA        4GB
OpenCL      4GB

The one remaining unknown element here (and something NVIDIA is still investigating) is why some users have been seeing total VRAM allocation top out at 3.5GB on a GTX 970, but go to 4GB on a GTX 980. Again from a high-level perspective all of this segmentation is abstracted, so games should not be aware of what’s going on under the hood.

Overall then the role of software in memory allocation is relatively straightforward since it’s layered on top of the segments. Applications have access to the full 4GB, and due to the fact that application memory space is virtualized the existence and usage of the memory segments is abstracted from the application, with the physical memory allocation handled by the OS and driver. Only after 3.5GB is requested – enough to fill the entire 3.5GB segment – does the 512MB segment get used, at which point NVIDIA attempts to place the least sensitive/important data in the slower segment.

398 Comments

  • Harry Lloyd - Thursday, January 29, 2015

    This card needs 8 GiB of VRAM with eight 8-Gbit GDDR5 chips (instead of eight 4-Gbit ones). The price would not be much higher, but we would get 7 GiB of full bandwidth. That would be enough for pretty much anything until Pascal comes along.
  • Ranger101 - Thursday, January 29, 2015

    Nvidia HAS BEEN LYING and they should be ROASTED not meekly forgiven as Ryan Smith suggests. It remains a mystery as to why Anandtech should be so keen to absolve them....
  • Man_Of_Steele - Thursday, January 29, 2015

    I agree with you there. No one seems to really be giving them a hard time... the mistake isn't as simple as Anandtech is trying to make it seem IMO.
  • TEAMSWITCHER - Thursday, January 29, 2015

    What has changed in light of this information? Did the GTX 970 benchmarks suddenly decline? Will AMD raise the price of the R9 290 and R9 290X now that the GTX 970 scandal has been "exposed"? No...all around.

    Same Process, More Transistors, More Performance, Lower Power, and Lower cost all the while using a non-symmetric memory partitioning scheme to maximize high speed VRAM. NVIDIA's only fault was not telling us about it. If it bothers you that much spend another 40% and get the GTX 980, but know this...it will NOT get 40% more performance.
  • itproflorida - Thursday, January 29, 2015

    With GTX 970 SLI, there is frame time lag when enabling 2x or 4x MSAA or TXAA @ 4K, and in some games at 1440p. Even though most games run fine @ 4K with no AA, FXAA, or 1x SMAA enabled, some like AC Unity have hitching or frame time lag with no AA at maxed settings. It's not just a VRAM issue like this site and others are proposing.

    I have a video of AC Unity @ 1440p Native resolution. Ultra settings, HBAO+ and soft shadows using FXAA. Very intense action scenes with no lag and averaging 60+ fps. With vram at 3990MB.

    Yet I can experience frame time lag in FC4 with VRAM at only 3436MB @ 4K with 2x MSAA enabled.

    CODAW @4k Ultra, Maxed settings cached textures with 1xsmaa is fine also while it goes over 3500 MB vram..

    So I am not convinced that it is just the vram segmentation and slower speed of the cache and how the drivers handle memory allocation.

    Are they still great cards, yes. As long as you know how to tweak each game
  • piiman - Saturday, January 31, 2015

    Or just return my 2 970's for 1 980 and save 40%
  • wolfman3k5 - Thursday, January 29, 2015

    NVIDIA, a company that has the engineering talent to produce highly complex GPUs with billions of transistors wants us to somehow believe that they made a mistake when they publicized the specs for the GTX 970? And now they are apologizing like that will make everything okay? I am so so sorry, however no amount of "mea culpa" will make things right. "I am sorry" doesn't pay the bills, doesn't feed the kids, and most certainly, doesn't make up for the deception. I own two GTX 970s, and while I have never-ever been satisfied with their performance, now I know why. I have purchased them both from NewEgg.com and they are in mint condition. I would like to return them and at least get the GTX 980, which is more in line with the specs that they published originally, minus some CUDA cores. No NVIDIA, you will not lose me as a customer, however I want what I "thought" I paid for. I will foot the bill for the price difference. Please, someone from NVIDIA, if you are reading this, please contact me at wolfman3k5_at_gmail_dot_com, and tell me how you can help me return my GTX 970 cards for a refund. Thank you.
  • piiman - Saturday, January 31, 2015

    " I will foot the bill for the price difference."

    If you bought 2 970's they will owe you money. My 2 970s cost almost 700.00 just for the cards; the 980 is going for 550.00
  • piiman - Saturday, January 31, 2015

    Oh, and have you tried calling NewEgg? They are very understanding and will work with you/us.
  • Magictoaster - Thursday, January 29, 2015

    The reason no one is giving NVIDIA a hard time is because most people don't buy a card based purely on its published specifications. They look at the benchmarks, for multiple games, and the price, and if it's a good fit, they buy it.

    I bought a GTX 970. I love the card. It plays all my games at 1080p (my monitor's max resolution) flawlessly. It performs exactly like the benchmarks said it would. I don't really care that the internal workings of the card are not the exact same as a 980. They are not supposed to be, that's why I paid $200 less than a 980.

    Those suggesting litigation would need to consider what the damages are, and really, there are very few. NVIDIA's published specs are accurate (though incomplete), NewEgg's specs were accurate (and incomplete), and the card performs as expected. I don't see any deliberate attempt at fraud. I got what I paid for, and the card works as expected.
