A lot of those additional transistors seem to go to many more Tensor cores; the 19.5 TFLOPS figure is for FP64 on the tensor cores. This thing is an accelerator, after all, and apparently geared to deep learning and related applications. @Ryan: could you do a deep(er) dive into the use and usefulness of tensor cores for graphics, especially for games? I believe that a lot of the "so what?" here is about the otherwise underwhelming increase in non-tensor speeds compared to Volta.
Disappointed in the small FP performance improvement, since I'm assuming desktop variants will also only get a minor bump. It seems like people are still wary of adopting tensor cores, since even AI upscalers like Gigapixel still rely solely on FP32.
It could be that it's just how people are used to doing things, but it could also be that Nvidia's claims about no loss of accuracy aren't applicable in as many circumstances as they'd like people to believe.
Right now I think they are mainly being used for two things, with a third on the way. 1st: denoising in-game images, because RTX uses a low number of rays. 2nd: DLSS 2.0 (and probably future iterations). 3rd: RTX Voice, once it is fully released. Tensor cores are useful mostly for matrix multiplication; if you can get your calculation done efficiently as operations on matrices, you can get a big speed-up. The reason we see a big increase in tensor core speed is that Nvidia is doubling down on AI calculation (be it inference or training) as the major need for accelerators in HPC. While there are many accelerators made specifically for AI computation, Nvidia is assuming that clients would also want their accelerators to do other general-purpose computation.
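To make the matrix-multiplication point above concrete, here is a conceptual sketch in plain Python. A tensor core executes a fused matrix multiply-accumulate, D = A×B + C, on small fixed-size tiles (on Volta, a 4×4×4 tile, i.e. 64 FMAs per clock). The code only models how a larger matrix product decomposes into those tile operations; it does not model FP16/FP32 precision, CUDA, or real hardware scheduling, and the function names are hypothetical.

```python
# Conceptual model only: a matrix product built from small tile
# multiply-accumulates (D = A @ B + C), the operation a tensor core performs.
# FP16/FP32 precision and hardware timing are not modeled.

TILE = 4  # Volta-style 4x4x4 tile, chosen here for illustration


def tile_mma(A, B, C, i0, j0, k0, tile=TILE):
    """Accumulate the product of one A tile and one B tile into C in place.

    Returns the number of fused multiply-adds performed (tile**3).
    """
    fmas = 0
    for i in range(i0, i0 + tile):
        for j in range(j0, j0 + tile):
            acc = C[i][j]
            for k in range(k0, k0 + tile):
                acc += A[i][k] * B[k][j]
                fmas += 1
            C[i][j] = acc
    return fmas


def tiled_matmul(A, B, tile=TILE):
    """Compute A @ B by issuing tile MMAs; every dimension must be a multiple of tile."""
    n, m, p = len(A), len(B), len(B[0])
    if n % tile or m % tile or p % tile:
        raise ValueError("matrix dimensions must be multiples of the tile size")
    C = [[0.0] * p for _ in range(n)]
    calls = fmas = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, p, tile):
            for k0 in range(0, m, tile):
                fmas += tile_mma(A, B, C, i0, j0, k0, tile)
                calls += 1
    return C, calls, fmas


# Usage: an 8x8 product needs 2*2*2 = 8 tile MMAs of 64 FMAs each.
A = [[float(8 * i + j) for j in range(8)] for i in range(8)]
I = [[1.0 if i == j else 0.0 for j in range(8)] for i in range(8)]
C, calls, fmas = tiled_matmul(A, I)
print(calls, fmas)  # 8 512
```

Any workload that can be phrased as large matrix products reduces to many of these fixed-size tile operations, which is what makes it a good fit for dedicated hardware.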
You kiddies have no point of reference... the Tensor cores make up a small part of the die, and the new-architecture Tensor cores are smaller and much more powerful. So no, most of the die is NOT tensor cores.
Weird claim when the article above says this: "A single Ampere tensor core has 4x the FMA throughput as a Volta tensor core, which has allowed NVIDIA to halve the total number of tensor cores per SM – going from 8 cores to 4 – and still deliver a functional 2x increase in FMA throughput. In essence, a single Ampere tensor core has become an even larger massive matrix multiplication machine"
So, fewer cores, but each core is capable of 4X the output in a greater number of formats and at slightly lower clock speeds. Does that really sound like a smaller unit to you?
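The per-SM arithmetic behind the article's quote can be sketched as follows, assuming NVIDIA's published figure of 64 FMAs per clock for a Volta tensor core and taking the "4x" from the quote at face value. It counts FMAs per clock only and ignores clock speed and numeric format.

```python
# Dense tensor-core throughput per SM, in FMAs per clock.
VOLTA_FMA_PER_TC = 64                     # 4x4x4 tile per clock (Volta figure)
AMPERE_FMA_PER_TC = 4 * VOLTA_FMA_PER_TC  # "4x the FMA throughput" per the article

volta_per_sm = 8 * VOLTA_FMA_PER_TC       # 8 tensor cores per Volta SM
ampere_per_sm = 4 * AMPERE_FMA_PER_TC     # 4 tensor cores per Ampere SM

print(volta_per_sm, ampere_per_sm, ampere_per_sm / volta_per_sm)  # 512 1024 2.0
```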
What he's saying is that you have no data about how the die is laid out compared to Volta, so those are just assumptions, which may be right or wrong.
Right now, we don't know what kind of display output hardware is on die. You probably won't see a Quadro graphics card featuring this chip until the end of the year, at the earliest.
And as noted elsewhere, the improvement in general-purpose compute is fairly lackluster. So, unless you plan on using DLSS 3.0 at max settings, you'll probably be disappointed with its gaming performance.
I would think the Quadro/Tesla cards arrive at the start of next year - MAYBE this year with some of the less-than-perfect dies. The specs released today are not for the "full" A100 - we will see all 128 SMs and 6x8GB in a refresh once yields get better (this is a new node and a new architecture for Nvidia - usually only one of those is tackled at a time). The GeForce cards will be even less "perfect" than the Quadro/Tesla.
If the picture is a render of the real product, then this is a cut-down version. For 40GB of VRAM on a 5120-bit bus there would need to be 5 stacks (or a multiple of that number). The full version will probably come later as yields go up and they stockpile good dies.
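The stack-count reasoning above works out as follows, assuming the standard 1024-bit interface per HBM2 stack: a 5120-bit bus implies five active stacks, and 40 GB across five stacks is 8 GB per stack. The helper function is a hypothetical illustration.

```python
HBM2_BITS_PER_STACK = 1024  # standard HBM2 interface width per stack


def hbm_config(bus_width_bits, capacity_gb):
    """Return (active stacks, GB per stack) implied by a bus width and capacity."""
    stacks, rem = divmod(bus_width_bits, HBM2_BITS_PER_STACK)
    if rem:
        raise ValueError("bus width is not a whole number of HBM2 stacks")
    return stacks, capacity_gb / stacks


stacks, gb_per_stack = hbm_config(5120, 40)
print(stacks, gb_per_stack)         # 5 8.0
print((stacks + 1) * gb_per_stack)  # 48.0 with a sixth 8 GB stack enabled
```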
Maybe the 6th stack is simply to improve yield? Titan V had that weird thing where one of the 4 stacks was disabled, so maybe they frequently encounter defects in one of the stacks or the bus connecting it.
Weird though, as you'd have thought they'd make sure they had good dies before the packaging process. Unless the interposer has some issues? But I wouldn't think the litho on that would be stupidly complex. Hmm.
They expect to have poor yields of perfect chips. They intentionally design the die with the expectation of disabling parts of it to get better yields on all chips from the wafer. The rest of the yield losses are offset by the higher product cost and by binning further disabled chips for high-end Quadros. The performance increase relative to even the losses from the disabled parts of the chip is still worth it. That's not to say that an MCM design wouldn't be much more economical, but we aren't quite there yet with interposer design.
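The harvesting argument above can be illustrated with the simple Poisson defect model, in which the chance of a die having exactly k random defects is e^(-λ)·λ^k/k!, with λ = defect density × die area. The defect density of 0.1 per cm² below is a purely hypothetical placeholder, not a TSMC figure, and the model optimistically assumes every defect lands in a unit that can be fused off.

```python
import math


def poisson_yield(defect_density_per_cm2, die_area_mm2, max_defects):
    """Probability that a die has at most `max_defects` random defects.

    Simple Poisson model; assumes each defect can be absorbed by disabling
    one redundant unit (an optimistic simplification).
    """
    lam = defect_density_per_cm2 * die_area_mm2 / 100.0  # mm^2 -> cm^2
    return sum(math.exp(-lam) * lam**k / math.factorial(k)
               for k in range(max_defects + 1))


D0 = 0.1      # hypothetical defect density, defects per cm^2
AREA = 826.0  # A100 die area in mm^2

for k in range(4):
    print(k, round(poisson_yield(D0, AREA, k), 3))
```

Under these assumptions, fewer than half of the dies would be defect-free, while tolerating even one or two disabled units recovers most of the rest, which is the economic case for shipping a partly disabled chip.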
Volta was 300W, then later 350W. Our first DGX-2 had the 300W parts and drew ~10kW - when we added the 2nd unit we got the 350W version (1st unit swapped out), and each then went to 12kW draw. Near-linear scaling - 20% perf increase on 20% higher power draw.
Volta was a lot more power efficient than the Pascal units it replaced.
It seems likely; you also have the ability to slice it into 7 instances. Why not 8? Something's disabled. They're probably saving up the perfect chips because of yield, and for some niche they can charge an extra premium for. I mean, you won't care if you put 1000 of these into a data center, but you might if you want to make one extra-pricey workstation chip or whatever.
This is not the full 128-SM chip - this is a massive die and not only a new process but a new architecture - so it will be a while before we get the full 128-SM and 48 or 64GB units.
More than double the transistor count going from the 12nm to the 7nm process, but only a 25% TFLOPS improvement? I think there is much room for design optimization remaining on this process.
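The ratio in the comment above can be checked against NVIDIA's published figures: V100 has 21.1 billion transistors and 15.7 TFLOPS of FP32, and A100 has 54.2 billion transistors and 19.5 TFLOPS of FP32. Much of the extra transistor budget went to tensor cores, a much larger L2 cache and other features rather than plain FP32 units, so this ratio alone is not a measure of design efficiency.

```python
# Published figures (V100 SXM2 vs. A100 SXM4), used only for ratios.
v100 = {"transistors_bn": 21.1, "fp32_tflops": 15.7}
a100 = {"transistors_bn": 54.2, "fp32_tflops": 19.5}

transistor_ratio = a100["transistors_bn"] / v100["transistors_bn"]
fp32_ratio = a100["fp32_tflops"] / v100["fp32_tflops"]

print(f"transistors: {transistor_ratio:.2f}x, FP32: {fp32_ratio:.2f}x")
```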
The sixth stack might just be a dummy to get a level surface for the cooler. (Like the nonfunctional chips used to balance some of AMD's Threadripper packages.)
"sideband/native HBM ECC"? That's not the lingo of a millionaire private investor and business owner. In fact, I'm pretty sure those fellows don't have the time or interest to write comments on AnandTech's articles.
Hopefully the RTX lineup won’t have all the transistor space taken up by tensor cores (and RT cores). I feel like at least the xx80 Ti version should have general FP32 performance above 20 TFLOPS at this point.
Swap the FP64 cores for FP32 and you're at roughly 30 TFLOPS. I'd much rather keep the tensor and RT cores; AMD has the progress-hating crowd covered very well.
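The "roughly 30 TFLOPS" estimate holds up under simple assumptions: A100 has 6912 FP32 and 3456 FP64 CUDA cores at a 1410 MHz boost clock, and each core retires one FMA (2 FLOPs) per clock. Treating every FP64 unit as if it were an FP32 unit is a hypothetical that ignores the area and power differences between the two.

```python
def peak_tflops(cores, boost_ghz, flops_per_core_per_clock=2):
    """Peak throughput assuming one FMA (2 FLOPs) per core per clock."""
    return cores * flops_per_core_per_clock * boost_ghz / 1000.0


FP32_CORES, FP64_CORES, BOOST_GHZ = 6912, 3456, 1.41

actual = peak_tflops(FP32_CORES, BOOST_GHZ)
hypothetical = peak_tflops(FP32_CORES + FP64_CORES, BOOST_GHZ)
print(round(actual, 1), round(hypothetical, 1))
```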
More RT cores is going to be great, and badly needed to make the tech worthwhile. Not so sure about the Tensor cores - they're still not doing anything particularly useful at the moment. Perhaps with more of them they can make their DLSS algorithms more complex?
Tensor cores can be, and sometimes (most/all of the time?) are, used for denoising sparse ray samples. I'm still not sold on DLSS, but 2.0 is a big improvement over 1.0, so maybe it'll get decent?
Of course, I'd forgotten about that. I'm not aware of any literature on which parts of the RT pipeline impose the greatest bottleneck, but I guess if they've improved that hardware then an increase in Tensor performance would also be necessary to get a net performance gain.
I also thought they used tensor cores for denoising, but recently I've not been able to find any evidence of this. At least, not in *games* that use global illumination.
20 SMs are disabled on the A100, so it only uses 84.4% of its CUDA cores. V100, by comparison, only had 4 SMs disabled, using 95.2% of its CUDA cores.
In addition, two 512-bit memory controllers are disabled on the A100. The V100 did not disable any memory controllers.
So I think that we can expect an updated A100 GPU in the future with more compute performance, more memory bandwidth, and higher memory capacity. Perhaps one with 120 SMs enabled (11% more CUDA cores) and 48 GB of memory (or even 96GB if they want to), as well as 1.92 TB/s of memory bandwidth.
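The percentages and the 1.92 TB/s figure in the comment above follow from simple ratios. The sketch assumes 108 of 128 SMs enabled on A100 and 80 of 84 on V100, and roughly 1.6 TB/s for A100's five active HBM2 stacks (the figure the 1.92 TB/s estimate implies). A sixth stack at the same clock and 8 GB capacity is speculation, not an announced product.

```python
def pct(enabled, total):
    return 100.0 * enabled / total


print(f"A100: {pct(108, 128):.1f}% of SMs, V100: {pct(80, 84):.1f}% of SMs")
print(f"120 vs 108 SMs: +{pct(120, 108) - 100:.0f}%")

# Hypothetical sixth HBM2 stack at unchanged clock and 8 GB per stack.
BW_5_STACKS_TBPS = 1.6
print(f"6 stacks: {BW_5_STACKS_TBPS * 6 / 5:.2f} TB/s, {6 * 8} GB")
```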
New node (from 12nm to 7nm) and new architecture - so it's not so much about anything being disabled as being not functional. Expect the full 128-SM unit, with 6x8GB or even 6x16GB, in a refresh down the road. Tackling both the new node and the new arch in a single step is aggressive.
Of course they are disabled for yield reasons. They never disable them just for fun. But, as I said, later they will come out with a more enabled version.
And NVIDIA always comes out with a new architecture and a new node at the same time.
They avoided it for a long time after the GeForce FX series, but IIRC that changed when the 32nm nodes disappeared and they had to jump right to 28nm with Maxwell. They've gone to a new architecture with each new node since then.
Maxwell was on 28 nm just like Kepler. Maxwell was most likely originally planned to go to 20 nm, though, but that node didn't work out well for high-power chips. Kepler was also the first thing NVIDIA released on 28 nm, so it was a new architecture debuting on a new node. From what I can see, the last time NVIDIA put an older architecture onto a new process before a brand-new architecture used that process was when they put Tesla on 40 nm before Fermi came out. That was in 2009.
Thanks for the correction - I'd forgotten about Kepler, just remembered that the 28nm transition was done on a new architecture. Kepler was a fairly dramatic change from Fermi, too - IIRC that's where they went from separate CUDA/core clock domains to shared clocks across the whole chip, but with double the shader resources of Fermi to compensate.
It's interesting that Argonne is getting the first DGX-A100s since it has an option to procure a supercomputer under the Coral-2 program. That supercomputer would not use Ampere-generation hardware, but rather post-Ampere, with a 2022 delivery, but perhaps they are evaluating the new technologies in NVIDIA's architecture post-Volta. The A100 does not increase base FP32 or FP64 performance that much, but includes new features, some of which potentially benefit HPC in terms of architectural efficiency, ease of programming, and acceleration of parts of traditional HPC code using tensor cores.
I think it's pretty safe to assume that the A100 will be iterated several times like the V100. As yields improve they will be able to enable more parts of the die to get performance uplift. Improved memory capacity is an easy win later on too. Meanwhile consumer and workstation GPUs will continue to refine certain features of the architecture in preparation for the replacement of the A100. This would be similar to Nvidia releasing Volta consumer cards alongside the V100, waiting a bit, releasing the Pascal cards, waiting a little longer, and releasing Turing. This lets them iterate consumer features quickly while giving their huge datacenter GPUs more time to be developed. People seem to forget that Nvidia has still been earning much more money from Gaming sales than Datacenter.
"This would be similar to Nvidia releasing Volta consumer cards alongside the V100, waiting a bit, releasing the Pascal cards, waiting a little longer, and releasing Turing."
There were no Volta consumer cards, and Pascal was before Volta?
I admit to having a brainfart about Pascal, but I was more making a hypothetical example of a progression rather than making up architecture names. I more meant *if* they had released consumer Volta, not when. My point was more that I think Nvidia will stick to Ampere in the datacenter for a couple of generations while consumer and workstation cards move on from Ampere to Hopper to whatever comes next. I'm basing this on the V100 getting multiple releases.
Agreed. I was hoping that Big Navi was going to bring AMD's high-end gaming on par with Nvidia's latest, but really, AMD haven't even caught up with the 3-year-old 1080 Ti yet.
It looks like it is going to take a quantum leap from 5700XT to Big Navi for that to come true.
If AMD's claims of 1.5x performance per watt over RDNA and the rumours about increased die size bear out, then they could be in the running at the high end again - but almost certainly not for the top-end halo card. Personally I'd be fine with that, as long as *something* happens to stem the attrition in the cost/performance ratio of high-end GPUs.
Quite an impressive beast. It would have been even more impressive if Nvidia had not disabled one of the HBM2 stacks (apparently for binning purposes and/or to keep the TDP from blowing past 400W) since -assuming the memory retained the same clock- that would result in a memory bandwidth of 1.92 TB/sec and 48 GB!
Unashamed_unoriginal_username_x86 - Thursday, May 14, 2020
Hype! Can't wait to hear more!
Unashamed_unoriginal_username_x86 - Thursday, May 14, 2020
Looking forward to gaming on the TITAN A :) :) :) :)... I hope they're good in SLI; NVLink bridges seem like a waste of money
Zingam - Thursday, May 14, 2020
Imagine playing Minesweeper in glorious 8K! What a treat!
mode_13h - Thursday, May 14, 2020
You mean Minesweeper RTX!
eek2121 - Thursday, May 14, 2020
You mean Dwarf Fortress. ;)
Deicidium369 - Thursday, May 14, 2020
I can do that now - do you even have an 8K display?
Korguz - Thursday, May 14, 2020
I doubt you can; after all, you can't even get your own facts straight. https://imgur.com/a/s9Ift1p
Did banks even give business loans to 8-year-old kids to start a "complete wood shop"? Did you drop out of elementary school to start this?
Deicidium369 - Thursday, May 14, 2020
So you have a problem with my wood shop or my machine shop? That was a response to someone talking about having a woodshop and wanting to build things. I have several businesses - the wood shop is a hobby. My machine shop is over 40K sq ft and has close to $35M in machines from DMG Mori, Mazak, Haas, etc. The machine shop is part of an engineering company I own: 16 engineers, 5 production supervisors and about 5 other people doing whatever needs to be done. We work for large companies - most recently a major aftermarket parts supplier, and more specifically parts for the new Supras. We have worked for numerous national racing teams to develop parts and to build and deliver everything from simple components to full chassis assemblies. Our process starts virtually, and any new parts or assemblies are tested using our current 2 x 16xV100 DGX-2s. That was detailed in the paragraph above the one you highlighted.
I have been working with wood since before I took industrial arts in school. I can make anything from cabinets to furniture. It's something I enjoy doing. My dad was a union machinist, and he had a small hobby wood shop that I learned in. I had my own set of hand tools by the time I was 8 - and knew how to use them - all the machinery in the world is useless if you don't know how to put something together. You need to get your facts straight. And BTW - I never once got a business loan in my life - never needed it.
It's more than a little creepy that you are stalking me and taking screenshots - you think you have some sort of "gotcha" moment? Kid, I also own 2 other companies, one with well over 1000 employees and over $320M in gross revenues - we have production facilities in 10 states. 100% of my workforce is idled at the moment and all are still drawing their full 40hr paycheck - most are installers making $22.50 per hour - so they are getting paid $900 to sit on their butts. No cuts to benefits.
The 3rd company is a private equity company I am a 50% partner in. My business partner, who is the godfather to my kids, was a major VC in Cali even before the internet - he invested in little companies such as Netscape, Silicon Graphics, Sun and quite a few others. He was a major investor in Cisco and later Juniper Networks, and was an early angel to several companies that have gone public in the last few years.
While you were not even born, I was building and in some cases selling businesses. In 1994 I started the first ISP in the Houston, TX area - in 1995 we had over 25K dial-up customers. I sold my interest and started another ISP focusing mostly on big bandwidth: OC3 and OC12 as well as various SONET/SDH services. We had 50K dial-up, 8K DSL (1st DSL testbed in Texas) as well as hundreds of lines to clients ranging from a single T1 up to an OC12. We sold to a company that would become Level 3 Communications - I walked out with close to $43M in the bank - that was invested over the course of 20 years and is worth many, many multiples of that. I was 28 when I sold the 2nd ISP - I retired from doing anything I didn't want to do to make a living. To me, retiring is not sitting on a beach somewhere drinking margaritas.
I don't know what your infatuation with me is, but it's creepy as hell. I am sorry you come from a disadvantaged background where even hand tools were out of reach, but that is not my problem. I feel bad for you that you had no examples of successful people for you to emulate and become successful yourself - instead you are a warrior who thinks he pulled off some kind of Gotcha!!
Well kid, I am off - the Silver Salmon are starting to run on the Copper River in Alaska - so have fun, I am sure you have tons of my posts screen shotted - so GL with that
Korguz - Friday, May 15, 2020
Yeah, right you do. YOU said you RETIRED 20 years ago when YOU were 28, and YOU said YOU started that woodshop 40 YEARS ago. YOU weren't talking about them, YOU were talking about you: " I started 40 years ago with a next to nothing " " The engineering is the same whether it's in my metal / composites shop or the wood shop. " That is YOU talking about YOU starting the business, not the person YOU are replying to. What's the matter, Deicidium369, got caught in a LIE and now have to lie even more to try to get out of it? Most of your posts are pure BS and you know it. You rarely, IF EVER, post any links as proof of your BS. When confronted or called out on your BS, you seem to do two things: run away with your tail between your legs, or reply with insults, name calling or condescending comments, just like your replies to me and ANYONE else that calls you out on your made-up BS, even those that write about computer-related stuff, like Jarred W, Ian and Ryan on here. That seems to be why you were banned on Tom's.
Blah blah blah blah is all I read in this BS post, all fiction.
Going by this BS post, you are either around 45 years old or 60+, but 'cause you can't get your own facts straight, who knows which is the truth and which is fiction, like your posts.
PeterCollier - Friday, May 15, 2020
Your obsession with a successful person is not aiding your success in life.
Farwalker2u - Friday, May 15, 2020
The post of Deicidium369 you linked on imgur, specifically the highlighted portion, is consistent with his post above.
Deicidium369 noted he started a woodworking hobby at age 8, he stated he started with next to nothing and built up a complete workshop over a period of 40 years. That is all consistent.
I do not know whether you suffer from some form of dyslexia or what, but you seem to jumble a lot of stuff together without reason, then your version makes no sense.
Rοb - Friday, May 15, 2020
He said he pays cheap, so it's probably not all BS.
Gastec - Sunday, May 17, 2020
Deicidium369 looks like a troll. This is what it posted further down in this comments section: "6th [stack] might be for sideband/native HBM ECC. ECC on HBM is not usually inline like on DDR"
Do you know of many millionaires and business owners who use that language, or who even have the time or interest to write comments on tech sites?
mode_13h - Sunday, May 17, 2020
Eh, he said his businesses are idled, which is plausible. His activity on this site started about when the lockdowns hit.
Many tech entrepreneurs probably started out as tech enthusiasts.
mrvco - Sunday, May 17, 2020
Seek help. Seriously. Call a friend, family member or a professional.
mode_13h - Sunday, May 17, 2020
@Korguz, drop the fixation with his personal & business life. Please stick to commenting on the technical content of his posts, or pointing out if/when he displays a lack of decorum.
dersteffeneilers - Saturday, May 16, 2020
ok this is epic
Gastec - Sunday, May 17, 2020
Let me guess, you are also a programmer and if you are not yet, you could be at any moment :)
Gastec - Sunday, May 17, 2020
Now I've read better between the lines of Delirium369 and I can only ROFL! Good trolling but I suspect mental disturbances :)
tamalero - Sunday, May 17, 2020
Deicidium's post reads similar to the infamous gorilla warfare meme.
Gigaplex - Friday, May 15, 2020
Get a room, you two
Spunjji - Friday, May 15, 2020
Dude, Deicidium is a pain in the ass but this is getting silly now...
Smell This - Thursday, May 14, 2020
Minesweeper RTX ... @16K !!
Is the A100 (826mm² die size) a somewhat 'modern' record for an IC on a 300mm wafer? I understand these are 'artistic renderings', but the WxH almost looks like 2:1, further reducing the wafer 'cull' (maybe 75 dies per wafer?)
AND ... it is interesting that the HBM from the top image (with the nVidia logo) has been rotated 90 degrees on the PCB on other images.
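The "75 dies per wafer" guess can be sanity-checked with the common gross-dies-per-wafer approximation, DPW ≈ π(d/2)²/S − πd/√(2S), where d is the wafer diameter and S the die area. It ignores scribe lines, edge exclusion and the die's aspect ratio, so treat the result as a rough count of candidate dies before any yield loss.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Common approximation: wafer area / die area, minus an edge-loss term."""
    d, s = wafer_diameter_mm, die_area_mm2
    return math.floor(math.pi * (d / 2) ** 2 / s - math.pi * d / math.sqrt(2 * s))


print(gross_dies_per_wafer(300, 826))  # 62 for an A100-sized die on a 300 mm wafer
```

By this estimate the figure is closer to the low 60s than 75, before any defective dies are discarded.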
name99 - Thursday, May 14, 2020
Cerebras looks at your puny 826mm^2 and laughs!
But yeah, sticking to "standard" chips, one hopes that nV has a plan for chiplets. 7nm and 5nm still offer standard reticle, but TSMC 3nm is probably going to be high NA, probably meaning reticle size halves...
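For scale: the standard full-field scanner reticle is 26 mm × 33 mm (858 mm²), and high-NA EUV's anamorphic optics halve that field to 26 mm × 16.5 mm. Whether a given TSMC node adopts high-NA is the commenter's speculation; the arithmetic below only shows why an 826 mm² die sits near today's limit and would not fit a half-field exposure.

```python
FULL_FIELD_MM = (26.0, 33.0)  # standard full-field reticle
HALF_FIELD_MM = (26.0, 16.5)  # high-NA EUV (anamorphic) field


def area(field):
    w, h = field
    return w * h


A100_DIE_MM2 = 826.0
# Area comparison only; a real die must also fit within both field dimensions.
for name, field in (("full field", FULL_FIELD_MM), ("high-NA half field", HALF_FIELD_MM)):
    limit = area(field)
    print(f"{name}: {limit:.0f} mm^2, A100 fits: {A100_DIE_MM2 <= limit}")
```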
mode_13h - Sunday, May 17, 2020
Yeah, no joke. I honestly wonder how things will play out, for Cerebras. It seems like they could own the cloud AI market, if all their claims are 100% legit.
Spunjji - Monday, May 18, 2020
I'm looking forward to finding that out. It's certainly a bold strategy, but I have a strong suspicion that any benefits they get from having everything "all on one chip" will be wiped out by the sheer expense of fabricating, packaging and building a platform around that 300mm-wafer-sized "chip".
lilkwarrior - Friday, May 15, 2020
You would want to use NVLink, since it's way faster and probably the only way to do mGPU now that SLI is dead.
willis936 - Thursday, May 14, 2020
That transistor count and memory throughput is astounding for such a modest increase in die size. Exciting times.
CrazyElf - Thursday, May 14, 2020
They went to 7nm, keep that in mind. Previously they were on 12nm, which is really an updated version of TSMC's 16/20nm hybrid process.
Jon Tseng - Thursday, May 14, 2020
yeah its basically a what 1.5 node shrink (give TSMC is like 1.7-1.8 ish scaling factors?). Guess it was worth waiting around for the yields to finally support a full reticle die!Kangal - Friday, May 15, 2020 - link
Yeah, it does look impressive.As much as I like to jump on the Nvidia hate-wagon, I have to give them kudos here.
Based on initial impressions, it looks like the RX 5700XT will be competing against the lowly RTX 3060. So Nvidia have a big lead once more. And with the newer hardware (lots more Tensor cores) and (standardised RT/DX 12.2 optimised) software, it looks like "RTX On" is no longer going to be a joke. With the PS5/XBX supporting it crudely, we should see many studios/games finally adopt it too.
...sucks to be an RTX 2000-series owner right about now, heh.
Spunjji - Friday, May 15, 2020 - link
Yeah, I'm feeling mighty smug about my decision to keep sweating my Maxwell GPU and wait out the RTX 20 series!
Of course, part of that is because I'm a laptop gamer and I was waiting for RX 5700 to make it to a notebook design so that I could get better performance without breaking the bank... so I guess AMD did me a favour by never following through?! xD
It'll be interesting to see if AMD can actually meet Nvidia out of the gate this time - RDNA 2 vs Ampere - or if it'll be another long wait...
Koenig168 - Thursday, May 14, 2020 - link
54 billion transistors! 😮
North01 - Thursday, May 14, 2020 - link
In their GTC 2020 video, NVIDIA claimed ~20 TFLOPS of FP64:
https://youtu.be/onbnb_D1wC8?t=347
North01 - Thursday, May 14, 2020 - link
They also claim that the DGX A100 (8x A100) puts out 156 TFLOPS of FP64 (8*19.5):
https://youtu.be/onbnb_D1wC8?t=992
SarahKerrigan - Thursday, May 14, 2020 - link
In their specification graphics, they say general FP64 is still 9.7 TFLOPS, but the Tensor Core matrix multiplication units can bump it to ~19.5 TFLOPS. So the truth is in between.
https://3s81si1s5ygj3mzby34dq6qf-wpengine.netdna-s...
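The 9.7 vs 19.5 TFLOPS split falls out of simple peak-rate arithmetic: SMs × FP64 units per SM × 2 FLOPs per FMA × clock. The sketch below assumes the publicly listed A100 configuration (108 enabled SMs, 32 FP64 units per SM, ~1.41 GHz boost) and models the tensor-core FP64 path as simply twice the vector rate. Those constants are assumptions, not numbers stated in this thread.

```python
def peak_tflops(sms: int, units_per_sm: int, clock_ghz: float,
                flops_per_unit_per_clock: int = 2) -> float:
    # One fused multiply-add per unit per clock counts as 2 floating-point operations.
    return sms * units_per_sm * flops_per_unit_per_clock * clock_ghz / 1000

A100_SMS = 108          # assumption: enabled SMs on the shipping A100
FP64_UNITS_PER_SM = 32  # assumption: FP64 FMA units per SM
BOOST_GHZ = 1.41        # assumption: boost clock in GHz

vector_fp64 = peak_tflops(A100_SMS, FP64_UNITS_PER_SM, BOOST_GHZ)
tensor_fp64 = 2 * vector_fp64       # assumption: tensor-core FP64 path at 2x the vector rate
dgx_tensor_fp64 = 8 * tensor_fp64   # DGX A100 = 8 GPUs

print(f"vector {vector_fp64:.1f}, tensor {tensor_fp64:.1f}, DGX {dgx_tensor_fp64:.0f} TFLOPS")
```

Both are peak figures. The 19.5 TFLOPS only applies to work that can be expressed as tensor-core matrix multiplies, which is why real FP64 code lands somewhere between the two numbers.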
North01 - Thursday, May 14, 2020 - link
Thanks for clearing that up. They're hardly ever straightforward with these things.
SarahKerrigan - Thursday, May 14, 2020 - link
No indeed. Generalized 1:1 FP32 or FP64 performance would have been surprising, so I felt like it was a good idea to dive deeper, and as usual it was Nvidia Marketing Fluff.
eastcoast_pete - Thursday, May 14, 2020 - link
Depends on the intended use. As a deep learning-type accelerator, it will probably do quite well. I would have liked some information on pricing though, too. NVIDIA has a tendency to jack up prices a lot when moving from one generation to the next.
eastcoast_pete - Thursday, May 14, 2020 - link
A lot of those additional transistors seem to go to many more Tensor cores, and the 19.5 TFLOPS figure is for FP64 on the tensor cores. This thing is an accelerator, after all, and apparently geared to deep learning and related applications.
@Ryan: could you do a deep(er) dive into the use and usefulness of tensor cores for graphics, especially for games? I believe that a lot of the "so what?" here is about the otherwise underwhelming increase in non-tensor speeds compared to Volta.
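To put numbers on the "underwhelming increase in non-tensor speeds", here is a quick ratio check. Apart from the ~54 billion transistor figure quoted in this thread, the transistor counts, die sizes and FP32 rates below are the commonly published V100/A100 headline specs and are assumptions here.

```python
# Commonly published headline specs (assumed; check NVIDIA's datasheets for exact figures).
specs = {
    "V100": {"transistors_b": 21.1, "die_mm2": 815, "fp32_tflops": 15.7},
    "A100": {"transistors_b": 54.2, "die_mm2": 826, "fp32_tflops": 19.5},
}

def density_mtr_per_mm2(chip: dict) -> float:
    # Billions of transistors -> millions, divided by die area in mm^2.
    return chip["transistors_b"] * 1000 / chip["die_mm2"]

v, a = specs["V100"], specs["A100"]
transistor_ratio = a["transistors_b"] / v["transistors_b"]
density_ratio = density_mtr_per_mm2(a) / density_mtr_per_mm2(v)
fp32_ratio = a["fp32_tflops"] / v["fp32_tflops"]

print(f"transistors x{transistor_ratio:.2f}, density x{density_ratio:.2f}, FP32 x{fp32_ratio:.2f}")
```

With these inputs it comes out at roughly 2.6x the transistors and 2.5x the density for about 1.24x the plain FP32 rate, which is the gap several comments below are reacting to.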
brucethemoose - Thursday, May 14, 2020 - link
AI, AI, and more AI... Nvidia are basically inventing uses for them, like DLSS.
Also potentially out-of-game uses, like image and video enhancement.
Devs could theoretically come up with interesting gameplay uses, but there are quite a few barriers in the way of that.
whatthe123 - Thursday, May 14, 2020 - link
Disappointed in the small FP performance improvement, since I'm assuming desktop variants will also only get a minor bump. Seems like people are still wary of adopting tensor cores, since even AI upscalers like Gigapixel still rely solely on FP32.
brucethemoose - Friday, May 15, 2020 - link
Gigapixel is OpenCL anyway, AFAIK.
But yeah, even the "good" CUDA projects use FP32 input/output for some reason.
Spunjji - Monday, May 18, 2020 - link
It could be that it's just how people are used to doing things, but it could also be that Nvidia's claims about no loss of accuracy aren't applicable in as many circumstances as they'd like people to believe.
brucethemoose - Friday, May 15, 2020 - link
*Other than DLSS, which I bet is low precision.
Eliadbu - Monday, May 18, 2020 - link
Right now I think they are mainly being used for 2 things, with one more coming:
1st use is denoising of in-game images, due to the low number of rays when using RTX
2nd use is DLSS 2.0 (and probably future iterations too)
3rd use will be RTX Voice once it's fully released
Tensor cores are mostly useful for matrix multiplication: if you can get your calculation done efficiently as operations on matrices, you can get a big speedup. The reason we see a big increase in tensor core speed is that Nvidia is doubling down on AI calculation (be it inference or training) being the major need for accelerators in HPC. While there are many accelerators made specifically for AI computation, Nvidia is assuming that clients would also want their accelerators to do other GP computation.
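As a rough illustration of what a tensor core does, the sketch below emulates one Volta-style 4x4x4 tile operation, D = A×B + C, with the inputs rounded to FP16 and an FP32 accumulator, and compares the result with a float64 reference. It is a plain-Python emulation under those assumptions (real hardware's internal rounding and summation order may differ), but it shows why the operation maps so naturally onto matrix math and where the small accuracy loss discussed above comes from.

```python
import random
import struct

def to_fp16(x: float) -> float:
    # Round a Python float (binary64) to IEEE half precision and back.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x: float) -> float:
    # Round a Python float to IEEE single precision and back.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mma_fp16_fp32(a, b, c):
    """D = A @ B + C with FP16 inputs and an FP32 accumulator (rough emulation)."""
    n, k, m = len(a), len(b), len(b[0])
    d = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = to_fp32(c[i][j])
            for p in range(k):
                # FP16 x FP16 products are exact in binary64; round the running sum to FP32.
                acc = to_fp32(acc + to_fp16(a[i][p]) * to_fp16(b[p][j]))
            d[i][j] = acc
    return d

def mm_reference(a, b, c):
    # Same operation entirely in binary64, used as the "ground truth".
    n, k, m = len(a), len(b), len(b[0])
    return [[c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

random.seed(0)
N = 4  # one 4x4x4 tile
A = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(N)]
C = [[0.0] * N for _ in range(N)]

D = mma_fp16_fp32(A, B, C)
ref = mm_reference(A, B, C)
max_err = max(abs(D[i][j] - ref[i][j]) for i in range(N) for j in range(N))
print(f"max abs error vs float64: {max_err:.2e}")
```

The error is typically on the order of 1e-4 to 1e-3 here, coming almost entirely from rounding the inputs to FP16. That is harmless for many neural-network workloads but not automatically acceptable for everything that is written in FP32 today.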
peevee - Thursday, May 14, 2020 - link
Looks like most of the die is tensor cores now. Very small increase in regular FP performance for 2.5x more transistors.
And of course there is a lie on the slide right there - "20X VOLTA". Marketoids are evil.
mode_13h - Thursday, May 14, 2020 - link
It's not a lie, for a _very specific_ workload. So, they call it "spin". And yes, it's evil.
eastcoast_pete - Thursday, May 14, 2020 - link
Those folks in marketing gotta eat, too. Unfortunately, NVIDIA marketing eats only the best, and that gets added to their prices, too.
Deicidium369 - Thursday, May 14, 2020 - link
You kiddies have no point of reference... the Tensor cores make up a small part of the die, and they are the new-architecture Tensor cores, which are smaller and much more powerful. So no, most of the die is NOT tensor cores.
p1esk - Thursday, May 14, 2020 - link
Where can I see the shot of the die, and how do you know which areas are tensor cores and which are not?
Spunjji - Friday, May 15, 2020 - link
Weird claim, when the article above says this:
"A single Ampere tensor core has 4x the FMA throughput as a Volta tensor core, which has allowed NVIDIA to halve the total number of tensor cores per SM – going from 8 cores to 4 – and still deliver a functional 2x increase in FMA throughput. In essence, a single Ampere tensor core has become an even larger massive matrix multiplication machine"
So, fewer cores, but each core is capable of 4X the output in a greater number of formats and at slightly lower clock speeds. Does that really sound like a smaller unit to you?
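The article's numbers can be checked with a couple of multiplications. Per SM, 4 cores at 4x the Volta per-core rate versus 8 Volta cores gives 2x. Scaling that to the whole chip needs SM counts and clocks, which are assumptions here (80 SMs at ~1.53 GHz for V100, 108 SMs at ~1.41 GHz for A100).

```python
# Relative dense FMA throughput per SM, in units of "one Volta tensor core per clock".
VOLTA_CORES_PER_SM = 8
AMPERE_CORES_PER_SM = 4
AMPERE_PER_CORE_SPEEDUP = 4  # per the article: 4x the FMA throughput of a Volta tensor core

volta_per_sm = VOLTA_CORES_PER_SM * 1
ampere_per_sm = AMPERE_CORES_PER_SM * AMPERE_PER_CORE_SPEEDUP
per_sm_ratio = ampere_per_sm / volta_per_sm

# Whole-chip scaling (assumed SM counts and boost clocks, not from the article text above).
V100_SMS, V100_GHZ = 80, 1.53
A100_SMS, A100_GHZ = 108, 1.41
chip_ratio = per_sm_ratio * (A100_SMS / V100_SMS) * (A100_GHZ / V100_GHZ)

print(f"per SM: {per_sm_ratio:.1f}x, whole chip: {chip_ratio:.2f}x")
```

Neither figure says anything about how much die area the new cores occupy, which is the actual point being argued; it only confirms the throughput claims are internally consistent.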
Eliadbu - Monday, May 18, 2020 - link
What he's saying is that you have no data on how the die is laid out compared to Volta, so those are just assumptions, which may or may not be true.
Jon Tseng - Thursday, May 14, 2020 - link
dare we ask if it can run the Crysis remaster at 16K? :-p
mode_13h - Thursday, May 14, 2020 - link
Right now, we don't know what kind of display output hardware is on die. You probably won't see a Quadro graphics card featuring this chip until the end of the year, at the earliest.
And as noted elsewhere, the improvement in general-purpose compute is fairly lackluster. So, unless you plan on using DLSS 3.0 at max settings, you'll probably be disappointed with its gaming performance.
Jon Tseng - Thursday, May 14, 2020 - link
dammit, why do you have to be such a spoilsport! :-p
Although to be fair, I guess we won't get driver support for quite some time anyhow!
Deicidium369 - Thursday, May 14, 2020 - link
I would expect the Quadro/Tesla cards at the start of next year - MAYBE this year with some of the less-than-perfect dies. The specs released today are not for the "full" A100 - we will see all 128 SMs and 8x8GB in a refresh once yields get better (new node and new architecture for Nvidia - usually only 1 of those is tackled at a time). The GeForce cards will be even less "perfect" than the Quadro/Tesla ones.
philehidiot - Thursday, May 14, 2020 - link
Sod off, I wanted to ask about Crysis!
Hurrrrrumph, I say.
Jon Tseng - Friday, May 15, 2020 - link
lol. Anyhow, the original version will still likely be CPU-limited unless you have LN2 cooling on your processor! :-p
qap - Thursday, May 14, 2020 - link
If the picture is a render of the real product, then this is a cut-down version. For 40GB of VRAM with a 5120-bit bus there would need to be 5 chips (or a multiple of that number). The full version will probably come later as yields go up and they stockpile good dies.
mode_13h - Thursday, May 14, 2020 - link
Maybe the 6th stack is simply to improve yield? Titan V had that weird thing where one of the 4 stacks was disabled, so maybe they frequently encounter defects in one of the stacks or the bus connecting it.
Deicidium369 - Thursday, May 14, 2020 - link
Or the 6th die is for sideband/native HBM ECC.
CrazyElf - Thursday, May 14, 2020 - link
It's a huge die, so it is not a surprise that yields will be imperfect.
Jon Tseng - Thursday, May 14, 2020 - link
Weird though, as you'd have thought they'd make sure they had good dies before the packaging process. Unless the interposer has some issues? But I wouldn't think the litho on that would be stupidly complex. Hmm.
SaberKOG91 - Thursday, May 14, 2020 - link
They expect to have poor yields of perfect chips. They intentionally design the die with the expectation of disabling parts of it to get better yields on all chips from the wafer. The rest of the yield losses are offset by the higher product cost and by binning further disabled chips for high-end Quadros. The performance increase relative to even the losses from the disabled parts of the chip is still worth it. That's not to say that an MCM design wouldn't be much more economical, but we aren't quite there yet with interposer design.
nandnandnand - Thursday, May 14, 2020 - link
https://www.geeks3d.com/20200514/nvidia-ampere-ga1...
Full version is 8192 cores, 48 GB VRAM, 6144-bit bus, etc.
Kevin G - Friday, May 15, 2020 - link
Fully functional yields can't be great with an 826 mm^2 die. This does leave room for a good refresh as yields improve over time.
This also ignores any sort of power binning, which is a different axis.
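A toy way to see why fusing off SMs helps so much is the simple Poisson yield model, where the chance of a defect-free die is exp(-area × defect density). The defect density and the share of the die covered by SMs below are hypothetical placeholder values, not real TSMC N7 figures; only the 826 mm^2 area and the 108-of-128 SM configuration come from this discussion.

```python
import math

def perfect_die_yield(area_cm2: float, d0_per_cm2: float) -> float:
    # Poisson model: probability that a die catches zero random defects.
    return math.exp(-area_cm2 * d0_per_cm2)

def harvested_yield(area_cm2: float, d0_per_cm2: float,
                    redundant_fraction: float, max_spares: int) -> float:
    """Probability that a die is sellable when up to `max_spares` defects may land
    in redundant blocks (e.g. SMs that can be fused off).

    Simplification: each defect in the redundant area kills one distinct block;
    any defect outside that area kills the whole die.
    """
    lam = area_cm2 * d0_per_cm2
    total = 0.0
    for n in range(max_spares + 1):
        p_n_defects = math.exp(-lam) * lam ** n / math.factorial(n)
        total += p_n_defects * redundant_fraction ** n  # all n defects hit SMs
    return total

AREA_CM2 = 8.26     # 826 mm^2
D0 = 0.1            # hypothetical defect density, defects per cm^2
SM_FRACTION = 0.7   # hypothetical share of the die covered by SMs
SPARE_SMS = 20      # 128 SMs on the full die, 108 enabled

perfect = perfect_die_yield(AREA_CM2, D0)
harvested = harvested_yield(AREA_CM2, D0, SM_FRACTION, SPARE_SMS)
print(f"perfect: {perfect:.2f}, harvested: {harvested:.2f}")
```

With these made-up inputs only about 44% of dies would be fully perfect, but roughly 78% would be sellable once up to 20 bad SMs can be disabled. The absolute numbers mean little; the gap between the two is the point.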
GreenReaper - Saturday, May 16, 2020 - link
The more on die, the more you can shave.
Valrandir - Thursday, May 14, 2020 - link
Most Excellent
mode_13h - Thursday, May 14, 2020 - link
For deep learning? Yes, indeed.
mode_13h - Thursday, May 14, 2020 - link
> On a generation-over-generation basis, power consumption has once again gone up, which is probably fitting for a generation called Ampere.
Oooh, good one! It was almost too obvious.
GC2:CS - Thursday, May 14, 2020 - link
Does that mean “Volta” was supposed to lower the power draw?
What about next gen? Are they Joule or (kilo)Watt? :D
Deicidium369 - Thursday, May 14, 2020 - link
Volta was 300W, then later 350W. Our first DGX-2 had the 300W parts and drew ~10kW - when we added the 2nd unit we got the 350W version (1st unit swapped out) and each then went to 12kW draw. Near-linear scaling - 20% perf increase on 20% higher power draw.
Volta was a lot more power efficient than the Pascal units it replaced.
mode_13h - Monday, May 18, 2020 - link
It was a joke, and you seem to have missed it.
jabbadap - Thursday, May 14, 2020 - link
40GB and a 5120-bit bus? Is one of those HBM stacks disabled or what?
Kjella - Thursday, May 14, 2020 - link
It seems likely; you also have the ability to slice it into 7 instances. Why not 8? Something's disabled. They're probably saving up the perfect chips because of yield, and for some niche they can charge an extra premium for. I mean, you won't care if you put 1000 of these into a data center, but you might if you want to make one extra-pricey workstation chip or whatever.
Deicidium369 - Thursday, May 14, 2020 - link
This is not the full 128SM chip - this is a massive die on not only a new process, but a new architecture - so it will be a while before we get the full 128SM and 48 or 64GB units.
Ryan Smith - Thursday, May 14, 2020 - link
Correct. For yield reasons only 10 of the 12 MC partitions are enabled.
Deicidium369 - Thursday, May 14, 2020 - link
Amazing that they tackled both the new node and new arch at once - the move from TSMC 16 to 12 was really nothing compared to this.
Spunjji - Friday, May 15, 2020 - link
Didn't they also do that for Pascal? It seems like something they've been more liable to do since node progress began stalling.
Spunjji - Friday, May 15, 2020 - link
(Not to say it's not impressive, it sure as hell is)
mmusto - Thursday, May 14, 2020 - link
More than double the transistor count going from the 12nm to the 7nm process, but only a 25% TFLOPS improvement? I think there is much room for design optimization remaining on this process.
Gigaplex - Friday, May 15, 2020 - link
Adding support for more formats takes more transistors.
Duncan Macdonald - Thursday, May 14, 2020 - link
The sixth stack might just be a dummy to get a level surface for the cooler. (Like the nonfunctional chips used to balance some of AMD's Threadripper packages.)
Deicidium369 - Thursday, May 14, 2020 - link
The 6th might be for sideband/native HBM ECC. ECC on HBM is not usually inline like on DDR.
Gastec - Sunday, May 17, 2020 - link
"sideband/native HBM ECC"? That's not the lingo of a millionaire private investor and business owner. In fact, I'm pretty sure those fellows don't have the time or interest to write comments on AnandTech's articles.
zamroni - Thursday, May 14, 2020 - link
How much is the yield for an 800+ mm^2 die?
del42sa - Thursday, May 14, 2020 - link
not good
Deicidium369 - Thursday, May 14, 2020 - link
this is not the fully enabled chip - so one can imagine - not great
Notgeralt - Thursday, May 14, 2020 - link
Hopefully the RTX lineup won’t have all the transistor space taken up by tensor cores (and RT cores). I feel like at least the xx80 Ti version should have general FP32 performance above 20 TFLOPS at this point.
BenSkywalker - Friday, May 15, 2020 - link
Swap the FP64 cores for FP32 and you're at roughly 30 TFLOPS. I'd much rather keep the tensor and RT cores; AMD has the progress-hating crowd covered very well.
Spunjji - Friday, May 15, 2020 - link
More RT cores are going to be great, and badly needed to make the tech worthwhile. Not so sure about the Tensor cores - they're still not doing anything particularly useful at the moment. Perhaps with more of them they can make their DLSS algorithms more complex?
BenSkywalker - Saturday, May 16, 2020 - link
Tensor cores can be, and sometimes (most/all of the time?) are, used for denoising sparse ray samples. I'm still not sold on DLSS, but 2.0 is a big improvement over 1.0, so maybe it'll get decent?
Spunjji - Monday, May 18, 2020 - link
Of course, I'd forgotten about that. I'm not aware of any literature on which parts of the RT pipeline impose the greatest bottleneck, but I guess if they've improved that hardware then an increase in Tensor performance would also be necessary to get a net performance gain.
mode_13h - Monday, May 18, 2020 - link
I also thought they used tensor cores for denoising, but recently I've not been able to find any evidence of this. At least, not in *games* that use global illumination.
Yojimbo - Thursday, May 14, 2020 - link
20 SMs are disabled on the A100, so it only uses 84.4% of its CUDA cores. V100, by comparison, only had 4 SMs disabled, using 95.2% of its CUDA cores.
In addition, two 512-bit memory controllers are disabled on the A100. The V100 did not disable any memory controllers.
So I think that we can expect an updated A100 GPU in the future with more compute performance, more memory bandwidth, and higher memory capacity. Perhaps one with 120 SMs enabled (11% more CUDA cores) and 48 GB of memory (or even 96GB if they want to), as well as 1.92 TB/s of memory bandwidth.
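The percentages and the hypothetical full-memory configuration above can be reproduced as follows. The 1.6 TB/s figure for the shipping 5-stack A100 is an assumption, as is 8 GB per stack (implied by 40 GB over 5 stacks); 1024 bits per HBM2 stack is the standard interface width, so 6 stacks give the same 6144 bits as 12 × 512-bit controllers.

```python
# Enabled vs. full configurations discussed above.
A100_SMS_ENABLED, GA100_SMS_FULL = 108, 128
V100_SMS_ENABLED, GV100_SMS_FULL = 80, 84

HBM_STACK_BITS = 1024   # interface width per HBM2 stack
GB_PER_STACK = 8        # assumption: 8 GB stacks, as implied by 40 GB over 5 stacks
A100_BW_TBPS = 1.6      # assumption: shipping A100 bandwidth with 5 active stacks

def pct(enabled: int, full: int) -> float:
    return 100 * enabled / full

def memory_config(stacks: int, bw_per_stack_tbps: float):
    # Returns (bus width in bits, capacity in GB, bandwidth in TB/s), assuming equal clocks.
    return stacks * HBM_STACK_BITS, stacks * GB_PER_STACK, stacks * bw_per_stack_tbps

bw_per_stack = A100_BW_TBPS / 5
print(f"A100 SMs enabled: {pct(A100_SMS_ENABLED, GA100_SMS_FULL):.1f}%")
print(f"V100 SMs enabled: {pct(V100_SMS_ENABLED, GV100_SMS_FULL):.1f}%")
print(f"120 vs 108 SMs: +{pct(120, A100_SMS_ENABLED) - 100:.0f}%")
for stacks in (5, 6):
    bus, gb, bw = memory_config(stacks, bw_per_stack)
    print(f"{stacks} stacks: {bus}-bit, {gb} GB, {bw:.2f} TB/s")
```

Enabling the sixth stack at the same memory clock is what produces the 6144-bit, 48 GB, 1.92 TB/s configuration mentioned in several comments.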
Deicidium369 - Thursday, May 14, 2020 - link
New node (from 12nm to 7nm) and new architecture - so it's not so much that anything is disabled as that it's not functional. Expect the full 128SM unit with 6x8GB or even 6x16GB in a refresh down the road. Tackling both the new node and the new arch in a single step is aggressive.
Yojimbo - Thursday, May 14, 2020 - link
Of course they are disabled for yield reasons. They never disable them just for fun. But, as I said, later they will come out with a more enabled version.
And NVIDIA always comes out with a new architecture and a new node at the same time.
Spunjji - Friday, May 15, 2020 - link
They avoided it for a long time after the GeForce FX series, but IIRC that changed when the 32nm nodes disappeared and they had to jump right to 28nm with Maxwell. They've gone to a new architecture with each new node since then.
Yojimbo - Friday, May 15, 2020 - link
Maxwell was on 28 nm just like Kepler. Maxwell was most likely originally planned to go to 20 nm, though, but that node didn't work out well for high-power chips. Kepler was also the first thing NVIDIA released on 28 nm. From what I can see, the last time NVIDIA put an older architecture onto a new process prior to a brand new architecture using the process was when they put Tesla on 40 nm before Fermi came out. That was in 2009.
Spunjji - Monday, May 18, 2020 - link
Thanks for the correction - I'd forgotten about Kepler, just remembered that the 28nm transition was done on a new architecture. Kepler was a fairly dramatic change from Fermi, too - IIRC that's where they went from separate CUDA/core clock domains to shared clocks across the whole chip, but with double the shader resources of Fermi to compensate.
Yojimbo - Thursday, May 14, 2020 - link
It's interesting that Argonne is getting the first DGX-A100s since it has an option to procure a supercomputer under the Coral-2 program. That supercomputer would not use Ampere-generation hardware, but rather post-Ampere, with a 2022 delivery, but perhaps they are evaluating the new technologies in NVIDIA's architecture post-Volta. The A100 does not increase base FP32 or FP64 performance that much, but includes new features, some of which potentially benefit HPC in terms of architectural efficiency, ease of programming, and acceleration of parts of traditional HPC code using tensor cores.
Deicidium369 - Thursday, May 14, 2020 - link
Could be Hopper, based on the timeline.
Yojimbo - Thursday, May 14, 2020 - link
I don't think anyone knows what "Hopper" is.
SaberKOG91 - Thursday, May 14, 2020 - link
I think it's pretty safe to assume that the A100 will be iterated several times like the V100. As yields improve they will be able to enable more parts of the die to get a performance uplift. Improved memory capacity is an easy win later on too. Meanwhile consumer and workstation GPUs will continue to refine certain features of the architecture in preparation for the replacement of the A100. This would be similar to Nvidia releasing Volta consumer cards alongside the V100, waiting a bit, releasing the Pascal cards, waiting a little longer, and releasing Turing. This lets them iterate consumer features quickly while giving their huge datacenter GPUs more time to be developed. People seem to forget that Nvidia has still been earning much more money from Gaming sales than from Datacenter.
PopinFRESH007 - Thursday, May 14, 2020 - link
"This would be similar to Nvidia releasing Volta consumer cards alongside the V100, waiting a bit, releasing the Pascal cards, waiting a little longer, and releasing Turing."
There were no Volta consumer cards, and Pascal was before Volta?
SaberKOG91 - Thursday, May 14, 2020 - link
I admit to having a brainfart about Pascal, but I was more making a hypothetical example of a progression rather than making up architecture names. I more meant *if* they had released consumer Volta, not when. My point was more that I think Nvidia will stick to Ampere in the datacenter for a couple of generations while consumer and workstation cards move on from Ampere to Hopper to whatever comes next. I'm basing this on the V100 getting multiple releases.
Whiteknight2020 - Thursday, May 14, 2020 - link
Or hot spare block with remapping, no need to swap out the card?
yeeeeman - Thursday, May 14, 2020 - link
Come on Ryan, you have all the Ampere details here: https://devblogs.nvidia.com/nvidia-ampere-architec...
ksec - Thursday, May 14, 2020 - link
Feeling a little bit sorry for AMD. Sigh.
Well Done Nvidia.
blppt - Thursday, May 14, 2020 - link
Agreed. I was hoping that Big Navi was going to bring high-end gaming parity with Nvidia's latest, but really, AMD haven't even caught up with the 3-year-old 1080 Ti yet.
It looks like it is going to take a quantum leap from 5700XT to Big Navi for that to come true.
Spunjji - Friday, May 15, 2020 - link
Didn't the Radeon VII catch up with the 1080 Ti?
If AMD's claims of 1.5x PPW over RDNA and the rumours about increased die size bear out, then they could be in the running at the high end again - but almost certainly not for the top-end halo card. Personally I'd be fine with that, as long as *something* happens to stem the attrition in the cost/performance ratio of high-end GPUs.
blppt - Sunday, May 17, 2020 - link
Most of the time the 5700XT provides similar performance to the VII, except at 4K, where the HBM bandwidth wins a lot of the time.
Both are usually behind the 1080 Ti in benchmarks I've seen.
zepi - Friday, May 15, 2020 - link
Each of these has two top-of-the-line EPYCs. AMD is getting their fair share.
zentwo - Friday, May 15, 2020 - link
So is this on PCIe 4?
Yojimbo - Friday, May 15, 2020 - link
Yes, this supports PCIe 4.
ANORTECH - Friday, May 15, 2020 - link
Next Event: Quadros!!!! yeah
Santoval - Saturday, May 16, 2020 - link
Quite an impressive beast. It would have been even more impressive if Nvidia had not disabled one of the HBM2 stacks (apparently for binning purposes and/or to keep the TDP from blowing past 400W), since, assuming the memory retained the same clock, that would result in a memory bandwidth of 1.92 TB/sec and 48 GB!
mode_13h - Monday, May 18, 2020 - link
If it were simply for power reasons, they could just dial clocks down further and still probably come out ahead. I'm 99% sure it's for yield.
PrayForDeath - Sunday, May 17, 2020 - link
It says "Greatest generational leap - 20X Volta" in one of the slides. How does Nvidia get away with blatantly lying like this?
Spunjji - Monday, May 18, 2020 - link
By being very specific in the small print. Also, America.