Today at their 2016 GPU Technology Conference, NVIDIA announced the first of their Pascal architecture powered Tesla cards, the Tesla P100. The P100 is the first major update to the Tesla HPC family since the launch of the first Kepler cards in late 2012, and represents a very sizable performance increase for the Tesla family thanks to the combination of the smaller 16nm manufacturing process and the Pascal architecture.

NVIDIA Tesla Family Specification Comparison
                        Tesla P100         Tesla K80          Tesla K40          Tesla M40
Stream Processors       3584               2 x 2496           2880               3072
Core Clock              1328MHz            562MHz             745MHz             948MHz
Boost Clock(s)          1480MHz            875MHz             810MHz, 875MHz     1114MHz
Memory Clock            1.4Gbps HBM2       5Gbps GDDR5        6Gbps GDDR5        6Gbps GDDR5
Memory Bus Width        4096-bit           2 x 384-bit        384-bit            384-bit
Memory Bandwidth        720GB/sec          2 x 240GB/sec      288GB/sec          288GB/sec
VRAM                    16GB               2 x 12GB           12GB               12GB
Half Precision          21.2 TFLOPS        8.74 TFLOPS        4.29 TFLOPS        6.8 TFLOPS
Single Precision        10.6 TFLOPS        8.74 TFLOPS        4.29 TFLOPS        6.8 TFLOPS
Double Precision        5.3 TFLOPS         2.91 TFLOPS        1.43 TFLOPS        213 GFLOPS
                        (1/2 rate)         (1/3 rate)         (1/3 rate)         (1/32 rate)
GPU                     GP100 (610mm2)     GK210              GK110B             GM200
Transistor Count        15.3B              2 x 7.1B(?)        7.1B               8B
TDP                     300W               300W               235W               250W
Cooling                 N/A                Passive            Active/Passive     Passive
Manufacturing Process   TSMC 16nm FinFET   TSMC 28nm          TSMC 28nm          TSMC 28nm
Architecture            Pascal             Kepler             Kepler             Maxwell 2

Powering the Tesla P100 is a partially disabled version of NVIDIA's new GP100 GPU, with 56 of 60 SMs enabled. GP100 is a whale of a GPU, measuring 610mm2 on TSMC's 16nm FinFET process and composed of 15.3B transistors. It is remarkable in and of itself that NVIDIA and TSMC are in volume production of such a large 16nm die at this time, as everything else we've seen with a confirmed die size is at best a quarter of this size. GP100 is also being packaged using TSMC's Chip-On-Wafer-On-Substrate (CoWoS) technology, which is apparently what allows the GPU and its HBM2 DRAM stacks to sit on the same interposer.

We'll dive into the full Pascal architecture (as implemented by GP100) at a later time, but it's worth noting that a Pascal SM here contains 64 FP32 CUDA cores, versus 128 on Maxwell. Each of those SMs also contains 32 FP64 CUDA cores - giving us the 1/2 rate for FP64 - and new to the Pascal architecture is the ability to pack 2 FP16 operations inside a single FP32 CUDA core under the right circumstances. With a boost clock of 1.48GHz, Tesla P100 will altogether offer 10.6 TFLOPS of FP32 performance or 5.3 TFLOPS of FP64 performance, more than doubling and tripling Tesla K40's rated throughput on these metrics respectively. NVIDIA has been happy to crow about the performance of Tesla P100, and for good reason, as this stands to be a very powerful processor.
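
As a quick sanity check on those numbers, here's a minimal back-of-the-envelope sketch in C++ that reproduces the quoted peak figures from the published SM count, per-SM core counts, and boost clock. As with all peak ratings, it assumes every core retires one FMA (counted as two FLOPs) per clock, and that every FP32 core issues a packed pair of FP16 operations for the half precision figure:

```cpp
// Back-of-the-envelope peak throughput for Tesla P100, using NVIDIA's published figures.
#include <cstdio>

int main() {
    const int    sms         = 56;    // enabled SMs on Tesla P100 (of 60 on a full GP100)
    const int    fp32_per_sm = 64;    // FP32 CUDA cores per Pascal SM
    const int    fp64_per_sm = 32;    // FP64 CUDA cores per Pascal SM (1/2 rate)
    const double boost_ghz   = 1.48;  // boost clock in GHz

    // Peak = cores x 2 FLOPs (FMA) x clock; FP16 doubles the FP32 rate via vec2 packing.
    double fp32_tflops = sms * fp32_per_sm * 2 * boost_ghz / 1000.0;
    double fp64_tflops = sms * fp64_per_sm * 2 * boost_ghz / 1000.0;
    double fp16_tflops = fp32_tflops * 2;

    printf("FP32: %.1f TFLOPS\n", fp32_tflops);  // ~10.6
    printf("FP64: %.1f TFLOPS\n", fp64_tflops);  // ~5.3
    printf("FP16: %.1f TFLOPS\n", fp16_tflops);  // ~21.2
    return 0;
}
```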

Paired with the GP100 GPU on Tesla P100 is 16GB of HBM2 VRAM, laid out in 4 stacks for a 4096-bit memory bus. NVIDIA quotes P100 as offering 720GB/sec of memory bandwidth, which works out to a memory clock of 1.4Gbps. As we've seen with other HBM products, this marks a significant increase in memory bandwidth, more than doubling NVIDIA's last generation of cards.
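
The same arithmetic works for the memory subsystem; a short sketch, again just multiplying the published bus width by the per-pin data rate and converting bits to bytes:

```cpp
// HBM2 bandwidth check for Tesla P100: 4096-bit bus at an effective 1.4Gbps per pin.
#include <cstdio>

int main() {
    const int    bus_width_bits = 4096;  // four HBM2 stacks x 1024-bit each
    const double gbps_per_pin   = 1.4;   // effective per-pin data rate

    double gb_per_sec = bus_width_bits * gbps_per_pin / 8.0;  // bits -> bytes
    printf("Peak bandwidth: %.0f GB/s\n", gb_per_sec);        // ~717, quoted as 720
    return 0;
}
```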

In their announcement, NVIDIA also confirmed that Tesla P100 will support NVLink, with 4 NVLink controllers. Previously announced, NVLink will allow GPUs to connect either to each other or to supporting CPUs (OpenPOWER), offering a higher-bandwidth, cache-coherent link than PCIe 3. This link will be important for NVIDIA for a number of reasons, as their scalability and unified memory plans are built around its functionality.
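
NVLink itself is largely transparent to software; from a programmer's point of view it backs the existing CUDA peer-to-peer and unified memory machinery rather than exposing a dedicated API. As a rough illustration, here's a minimal host-side sketch (standard CUDA runtime calls, nothing NVLink-specific, with error checking omitted for brevity) of how direct GPU-to-GPU access is enabled in a multi-GPU node:

```cpp
// Sketch: enabling direct GPU-to-GPU access with the standard CUDA runtime.
// The runtime uses whatever link connects the two GPUs (PCIe, or NVLink where present).
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);

    // Enable peer access between every pair of GPUs that supports it.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);
            if (ok) {
                cudaSetDevice(a);                  // subsequent calls apply to device a
                cudaDeviceEnablePeerAccess(b, 0);  // flags must currently be 0
            }
        }
    }
    printf("Peer access configured across %d device(s)\n", count);
    return 0;
}
```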

Speaking of functionality, Tesla P100 and the underlying GP100 GPU are full-featured HPC parts. They support all of the HPC-centric functionality that the Tesla K20/40/80 embodied, including ECC memory protection for the register file, caches, and HBM2 DRAM. Coupled with the very high FP64 rate, it's clear that this is the successor to the GK110/GK210 GPU.
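
For completeness, whether ECC is active on a given board is visible through the standard CUDA device properties (a minimal sketch; the same information is also reported by nvidia-smi):

```cpp
// Sketch: reporting memory size and ECC status for each CUDA device in the system.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("%s: %.1f GB, ECC %s\n",
               prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
               prop.ECCEnabled ? "on" : "off");
    }
    return 0;
}
```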

NVIDIA's pictures also confirm that this is using their new mezzanine connector, with the board mounting flat in the system rather than standing perpendicular like a traditional PCIe card. This is a very HPC-centric design (I'd expect to see plenty of PCIe cards in time as well), but again it was previously announced and is well suited to the market NVIDIA is going after, where these cards will be installed in a manner very similar to LGA CPUs. The P100 is rated for a TDP of 300W, so the cooling requirements are a bit higher than for last-generation cards, most of which were in the 230W-250W range.

Finally, in its initial implementation NVIDIA is focusing on customers that need extreme scaling capabilities, and I wouldn't be too surprised if this was in part due to the margins of that market and the demand these initial cards are likely to see. NVLink of course plays a big part here, with NVIDIA able to go up to 8-way configurations thanks to it.

Source: NVIDIA

Comments

  • frostyfiredude - Tuesday, April 5, 2016 - link

    Maybe it wasn't clear, but I'm not expecting the GTX1080 Ti to use the PM200 die since it's so huge. I expect a derivative die like PM202 will be made for it that's significantly smaller, has similar FP32 performance but cut down FP64 performance.
  • frostyfiredude - Tuesday, April 5, 2016 - link

    Oops used Samsung's naming; GP100 and GP102 not PM200 and PM202
  • T1beriu - Tuesday, April 5, 2016 - link

    As far as I observed, SM architecture doesn't change from die to die, so I think FP64 ratio will be the same. Looking back and comparing Quadro and Tesla with consumer cards I can't see a FP64 ratio reduction.

    Because it is huge and HBM2 is expensive, I think GP100 will be an HPC exclusive until Q2/Q3'17, when it will be released as the '80 Ti.

    I expect the smaller GP104 to be released at Computex '16 @ 350-400 mm2 with GDDR5.

    GTX1070 @ 5.5 TFLOPS and GTX1080 @ 7 TFLOPS to replace the GM200.

    GM204 will live for another season.
  • dragonsqrrl - Tuesday, April 5, 2016 - link

    "Looking back and comparing Quadro and Tesla with consumer cards I can't see a FP64 ratio reduction."
    Consumer cards based on GF100/110 and GK110 had reduced double precision performance. This was accomplished by either limiting clocks under double precision workloads through firmware and drivers, or fusing off FP64 cores during binning. GF100/110 is capable of 1/2 FP32, but the 580/570 were limited to 1/8. GK110 is capable of 1/3, but the 780/Ti were limited to 1/24.
  • T1beriu - Wednesday, April 6, 2016 - link

    It seems I haven't done my research properly. Thanks for the explanation.
  • extide - Thursday, April 7, 2016 - link

    Actually the lower tier dies typically have less FP64 h/w anyways. GK110 had 1/3 rate FP64, but GK104 and down had less than that.
  • ZeDestructor - Tuesday, April 5, 2016 - link

    Actually, they'll use the same die with FP64 shut off most likely, like the 780 and 780Ti.
  • ImSpartacus - Tuesday, April 5, 2016 - link

    GK210 in the K80?

    Wasn't GK210 only ever used in the K80 and nothing else?
  • dragonsqrrl - Tuesday, April 5, 2016 - link

    True, an exception to the rule... sort of. Even in the case of GK210, it was still very much GK110. If I remember right they made some compute specific enhancements like larger caches and registers. But I think the point is that the top tier compute oriented GPU for a given architecture, like GF110 or GK110, has always been used in both consumer and pro cards. The 'big die' is never pro exclusive.
  • ImSpartacus - Wednesday, April 6, 2016 - link

    Yeah, I think we'll see a gp100-based Titan, but the question is: will we see gp100 throughout the rest of the consumer lineup?

    There are still rumors of the existence of a gp102 part. I wonder if that could be a slow-dp gddr5x part that fills out most of the consumer lineup above gp104.

    The idea is that it'd be a much smaller die that uses a less costly memory technology to get nearly the same gaming performance.
