GPU microarchitecture by Nvidia
Nvidia Volta
Release date: December 7, 2017
Codename: Volta
Fabrication process: TSMC 12 nm (FinFET)
Enthusiast products: Tesla V100, Tesla V100S PCIe, Titan V, Titan V CEO Edition, Quadro GV100
Predecessor: Pascal
Variant: Turing (consumer, professional)
Successor: Ampere (consumer, professional)
Support status: Supported
Painting of Alessandro Volta, eponym of architecture
Volta is the codename, but not the trademark,[1] for a GPU microarchitecture developed by Nvidia, succeeding Pascal. It was first announced on a roadmap in March 2013,[2] although the first product was not announced until May 2017.[3] The architecture is named after the 18th–19th century Italian chemist and physicist Alessandro Volta. It was Nvidia's first architecture to feature Tensor Cores, specially designed cores that deliver higher deep learning performance than regular CUDA cores.[4] The architecture is produced with TSMC's 12 nm FinFET process. The Ampere microarchitecture is the successor to Volta.
The first graphics card to use it was the datacenter Tesla V100, for example as part of the Nvidia DGX-1 system.[3] It has also been used in the Quadro GV100 and Titan V. There were no mainstream GeForce graphics cards based on Volta.
After two USPTO proceedings,[5][6] Nvidia lost its Volta trademark application in the field of artificial intelligence on July 3, 2023. The owner of the Volta trademark[7] remains Volta Robots, a company specializing in AI and vision algorithms for robots and unmanned vehicles.
Details
Architectural improvements of the Volta architecture include the following:
CUDA Compute Capability 7.0
concurrent execution of integer and floating point operations
TSMC's 12 nm FinFET process,[8] allowing 21.1 billion transistors.[9]
High Bandwidth Memory 2 (HBM2)[8][10]
NVLink 2.0: a high-bandwidth bus between the CPU and GPU, and between multiple GPUs. Allows much higher transfer speeds than those achievable by using PCI Express; estimated to provide 25 Gbit/s per lane.[11] (Disabled for Titan V)
Tensor cores: A tensor core is a unit that multiplies two 4×4 FP16 matrices, and then adds a third FP16 or FP32 matrix to the result by using fused multiply–add operations, and obtains an FP32 result that could be optionally demoted to an FP16 result.[12] Tensor cores are intended to speed up the training of neural networks.[12] Volta's Tensor cores are first-generation, while Ampere has third-generation Tensor cores.[13][14]
PureVideo Feature Set I hardware video decoding
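The tensor core operation described in the list above (D = A × B + C on 4×4 matrices, FP16 inputs, FP32 accumulation, optional FP16 result) can be sketched in plain Python. This is an illustrative numerical model only, not Nvidia's implementation: the function names are hypothetical, the FP16 and FP32 rounding is emulated with the standard `struct` module, and the order and rounding of the intermediate additions in real hardware may differ.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c, fp16_output=False):
    """Model D = A x B + C on 4x4 matrices given as lists of rows.

    A and B are rounded to FP16, products are accumulated in FP32,
    and the result is optionally demoted to FP16.
    """
    n = 4
    d = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = to_fp32(c[i][j])  # the accumulator starts from C
            for k in range(n):
                # The product of two FP16 values is exact in a Python float;
                # rounding to FP32 after each add is a modelling assumption.
                acc = to_fp32(acc + to_fp16(a[i][k]) * to_fp16(b[k][j]))
            row.append(to_fp16(acc) if fp16_output else acc)
        d.append(row)
    return d


identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
c = [[0.5] * 4 for _ in range(4)]
d = tensor_core_mma(identity, b, c)  # each element equals b[i][j] + 0.5
```

One such operation involves 4 × 4 × 4 = 64 multiply-adds, which is why FP16 inputs that cannot be represented exactly (for example 0.1) are rounded before the product is formed.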
Comparison of Compute Capability: GP100 vs GV100 vs GA100[15]

| GPU features | Nvidia Tesla P100 | Nvidia Tesla V100 | Nvidia A100 |
|---|---|---|---|
| GPU codename | GP100 | GV100 | GA100 |
| GPU architecture | Nvidia Pascal | Nvidia Volta | Nvidia Ampere |
| Compute capability | 6.0 | 7.0 | 8.0 |
| Threads / warp | 32 | 32 | 32 |
| Max warps / SM | 64 | 64 | 64 |
| Max threads / SM | 2048 | 2048 | 2048 |
| Max thread blocks / SM | 32 | 32 | 32 |
| Max 32-bit registers / SM | 65536 | 65536 | 65536 |
| Max registers / block | 65536 | 65536 | 65536 |
| Max registers / thread | 255 | 255 | 255 |
| Max thread block size | 1024 | 1024 | 1024 |
| FP32 cores / SM | 64 | 64 | 64 |
| Ratio of SM registers to FP32 cores | 1024 | 1024 | 1024 |
| Shared memory size / SM | 64 KB | Configurable up to 96 KB | Configurable up to 164 KB |
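Several rows of the comparison above follow from other rows by simple arithmetic. The sketch below recomputes them from the table's own figures; the dictionary layout and function name are hypothetical, and the "registers per thread at full occupancy" value is a derived illustration rather than a figure from the table.

```python
# Per-SM limits copied from the comparison table (identical for all three GPUs).
SM_LIMITS = {
    "GP100": {"threads_per_warp": 32, "max_warps": 64, "registers": 65536, "fp32_cores": 64},
    "GV100": {"threads_per_warp": 32, "max_warps": 64, "registers": 65536, "fp32_cores": 64},
    "GA100": {"threads_per_warp": 32, "max_warps": 64, "registers": 65536, "fp32_cores": 64},
}


def derived_limits(limits):
    """Recompute the derived rows of the table from the basic per-SM limits."""
    max_threads = limits["threads_per_warp"] * limits["max_warps"]
    return {
        # Max threads / SM = threads per warp x max warps per SM
        "max_threads_per_sm": max_threads,
        # Ratio of SM registers to FP32 cores
        "registers_per_fp32_core": limits["registers"] // limits["fp32_cores"],
        # Registers available to each thread when every thread slot is occupied
        "registers_per_thread_at_full_occupancy": limits["registers"] // max_threads,
    }


gv100 = derived_limits(SM_LIMITS["GV100"])
```

With all 2048 thread slots in use, each thread can hold only 32 registers, well below the per-thread maximum of 255, so kernels that use many registers per thread cannot reach full occupancy.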
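Comparison of Precision Support Matrix[16][17]

Supported CUDA Core Precisions

| GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
|---|---|---|---|---|---|---|---|---|
| Nvidia Tesla P4 | No | Yes | Yes | No | No | Yes | No | No |
| Nvidia P100 | Yes | Yes | Yes | No | No | No | No | No |
| Nvidia Volta | Yes | Yes | Yes | No | No | Yes | No | No |
| Nvidia Turing | Yes | Yes | Yes | No | No | No | No | No |
| Nvidia A100 | Yes | Yes | Yes | No | No | Yes | No | Yes |

Supported Tensor Core Precisions

| GPU | FP16 | FP32 | FP64 | INT1 | INT4 | INT8 | TF32 | BF16 |
|---|---|---|---|---|---|---|---|---|
| Nvidia Tesla P4 | No | No | No | No | No | No | No | No |
| Nvidia P100 | No | No | No | No | No | No | No | No |
| Nvidia Volta | Yes | No | No | No | No | No | No | No |
| Nvidia Turing | Yes | No | No | Yes | Yes | Yes | No | No |
| Nvidia A100 | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes |
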
Legend:
FPnn: floating point with nn bits
INTn: integer with n bits
INT1: binary
TF32: TensorFloat32
BF16: bfloat16
Comparison of Decode Performance

| Concurrent streams | H.264 decode (1080p30) | H.265 (HEVC) decode (1080p30) | VP9 decode (1080p30) |
|---|---|---|---|
| V100 | 16 | 22 | 22 |
| A100 | 75 | 157 | 108 |
Products
Volta was announced as the GPU microarchitecture within the Xavier generation of Tegra SoC, focusing on self-driving cars.[18][19]
At Nvidia's annual GPU Technology Conference keynote on May 10, 2017, Nvidia officially announced the Volta microarchitecture along with the Tesla V100.[3] The Volta GV100 GPU is built on a 12 nm process and uses HBM2 memory with 900 GB/s of bandwidth.[20]
Nvidia officially announced the Nvidia TITAN V on December 7, 2017.[21][22]
Nvidia officially announced the Quadro GV100 on March 27, 2018.[23]
| Model | Nvidia Titan V[24] | Nvidia Quadro GV100[25] | Nvidia Titan V CEO Edition[26][27] |
|---|---|---|---|
| Launch | December 7, 2017 | March 27, 2018 | June 21, 2018 |
| Code name(s) | GV100-400-A1 | GV100 | GV100 |
| Fab (nm) | TSMC 12 nm | TSMC 12 nm | TSMC 12 nm |
| Transistors (billion) | 21.1 | 21.1 | 21.1 |
| Die size (mm²) | 815 | 815 | 815 |
| Bus interface | PCIe 3.0 ×16 | PCIe 3.0 ×16 | PCIe 3.0 ×16 |
| Core config: CUDA core[c] | 5120:320:96 | 5120:320:128 | 5120:320:128 |
| Core config: Tensor core[d] | 640 | 640 | 640 |
| SM count[a] | 80 | 80 | 80 |
| Graphics Processing Clusters[b] | 6 | 6 | 6 |
| L2 cache size (MiB) | 4.5 | 6 | 6 |
| Base core clock (MHz) | 1200 | 1132 | 1200 |
| Boost clock (MHz) | 1455 | 1628 | 1455 |
| Memory clock (MT/s) | 1700 | 1696 | 1700 |
| Pixel fillrate (GP/s) | 139.7 | 208.4 | 186.2 |
| Texture fillrate (GT/s) | 465.6 | 521 | 465.6 |
| Memory size (GiB) | 12 | 32 | 32 |
| Memory bandwidth (GB/s) | 652.8 | 868.4 | 870.4 |
| Memory bus type | HBM2 | HBM2 | HBM2 |
| Memory bus width (bit) | 3072 | 4096 | 4096 |
| Single precision (boost), GFLOPS | 12288 (14899) | 11592 (16671) | 12288 (14899) |
| Double precision (boost), GFLOPS | 6144 (7450) | 5796 (8335) | 6144 (7450) |
| Half precision (boost), GFLOPS | 24576 (29798) | 23183 (33341) | 24576 (29798) |
| TDP (watts) | 250 | 250 | 250 |
| NVLink support | No | Yes | Yes |
| Launch price (USD, MSRP) | $2,999 | $8,999 | N/A |
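The fillrate, bandwidth and processing power figures in the product table can be reproduced from the core configuration and clock speeds. The sketch below assumes the conventional reading of the core config as shaders : texture units : render output units, two floating-point operations (one fused multiply-add) per CUDA core per clock, and fillrates quoted at the boost clock; the article does not state these assumptions, and the helper names are hypothetical.

```python
def peak_gflops(cuda_cores, clock_mhz):
    """Single-precision GFLOPS, assuming 2 FLOPs (one FMA) per core per clock."""
    return cuda_cores * 2 * clock_mhz / 1000


def fill_rate(units, clock_mhz):
    """Fillrate in G(pixels or texels)/s for a number of ROPs or texture units."""
    return units * clock_mhz / 1000


def memory_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Memory bandwidth in GB/s: bus width in bytes times transfers per second."""
    return bus_width_bits * data_rate_mt_s / 8 / 1000


# Titan V figures taken from the table: core config 5120:320:96.
titan_v_single_base = peak_gflops(5120, 1200)            # table: 12288
titan_v_single_boost = peak_gflops(5120, 1455)           # table: 14899
titan_v_double_boost = titan_v_single_boost / 2          # table: 7450
titan_v_half_boost = titan_v_single_boost * 2            # table: 29798
titan_v_pixel = fill_rate(96, 1455)                      # table: 139.7
titan_v_texture = fill_rate(320, 1455)                   # table: 465.6
titan_v_bandwidth = memory_bandwidth_gb_s(3072, 1700)    # table: 652.8

# Quadro GV100 figures: core config 5120:320:128, 4096-bit bus at 1696 MT/s.
quadro_single_boost = peak_gflops(5120, 1628)            # table: 16671
quadro_bandwidth = memory_bandwidth_gb_s(4096, 1696)     # table: 868.4
```

The same formulas show why the Titan V CEO Edition matches the Titan V in compute throughput (same core count and clocks) but exceeds it in pixel fillrate and memory bandwidth (128 render output units and a 4096-bit bus).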
Application
Volta is used in the Summit and Sierra supercomputers for GPGPU compute.[28][29] The Volta GPUs connect to the POWER9 CPUs via NVLink 2.0, which supports cache coherency and is intended to improve GPGPU performance.[30][11][31]
V100 accelerator and DGX V100
Comparison of accelerators used in DGX:[32][33][34]

| Model | P100 | V100 16GB | V100 32GB | A100 40GB | A100 80GB | H100 | H200 | B100 | B200 |
|---|---|---|---|---|---|---|---|---|---|
| Architecture | Pascal | Volta | Volta | Ampere | Ampere | Hopper | Hopper | Blackwell | Blackwell |
| Socket | SXM/SXM2 | SXM2 | SXM3 | SXM4 | SXM4 | SXM5 | SXM5 | SXM6 | SXM6 |
| FP32 CUDA cores | N/A | 5120 | 5120 | 6912 | 6912 | 16896 | 16896 | N/A | N/A |
| FP64 cores (excl. tensor) | 1792 | 2560 | 2560 | 3456 | 3456 | 4608 | 4608 | N/A | N/A |
| Mixed INT32/FP32 cores | 3584 | N/A | N/A | 6912 | 6912 | 16896 | 16896 | N/A | N/A |
| INT32 cores | N/A | 5120 | 5120 | N/A | N/A | N/A | N/A | N/A | N/A |
| Boost clock | 1480 MHz | 1530 MHz | 1530 MHz | 1410 MHz | 1410 MHz | 1980 MHz | 1980 MHz | N/A | N/A |
| Memory clock | 1.4 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 1.75 Gbit/s HBM2 | 2.4 Gbit/s HBM2 | 3.2 Gbit/s HBM2e | 5.2 Gbit/s HBM3 | 6.3 Gbit/s HBM3e | 8 Gbit/s HBM3e | 8 Gbit/s HBM3e |
| Memory bus width | 4096-bit | 4096-bit | 4096-bit | 5120-bit | 5120-bit | 5120-bit | 6144-bit | 8192-bit | 8192-bit |
| Memory bandwidth | 720 GB/sec | 900 GB/sec | 900 GB/sec | 1.52 TB/sec | 1.52 TB/sec | 3.35 TB/sec | 4.8 TB/sec | 8 TB/sec | 8 TB/sec |
| VRAM | 16 GB HBM2 | 16 GB HBM2 | 32 GB HBM2 | 40 GB HBM2 | 80 GB HBM2e | 80 GB HBM3 | 141 GB HBM3e | 192 GB HBM3e | 192 GB HBM3e |
| Single precision (FP32) | 10.6 TFLOPS | 15.7 TFLOPS | 15.7 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | N/A | N/A |
| Double precision (FP64) | 5.3 TFLOPS | 7.8 TFLOPS | 7.8 TFLOPS | 9.7 TFLOPS | 9.7 TFLOPS | 34 TFLOPS | 34 TFLOPS | N/A | N/A |
| INT8 (non-tensor) | N/A | 62 TOPS | 62 TOPS | N/A | N/A | N/A | N/A | N/A | N/A |
| INT8 dense tensor | N/A | N/A | N/A | 624 TOPS | 624 TOPS | 1.98 POPS | 1.98 POPS | 3.5 POPS | 4.5 POPS |
| INT32 | N/A | 15.7 TOPS | 15.7 TOPS | 19.5 TOPS | 19.5 TOPS | N/A | N/A | N/A | N/A |
| FP4 dense tensor | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 7 PFLOPS | 9 PFLOPS |
| FP16 | 21.2 TFLOPS | 31.4 TFLOPS | 31.4 TFLOPS | 78 TFLOPS | 78 TFLOPS | N/A | N/A | N/A | N/A |
| FP16 dense tensor | N/A | 125 TFLOPS | 125 TFLOPS | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS |
| bfloat16 dense tensor | N/A | N/A | N/A | 312 TFLOPS | 312 TFLOPS | 990 TFLOPS | 990 TFLOPS | 1.98 PFLOPS | 2.25 PFLOPS |
| TensorFloat-32 (TF32) dense tensor | N/A | N/A | N/A | 156 TFLOPS | 156 TFLOPS | 495 TFLOPS | 495 TFLOPS | 989 TFLOPS | 1.2 PFLOPS |
| FP64 dense tensor | N/A | N/A | N/A | 19.5 TFLOPS | 19.5 TFLOPS | 67 TFLOPS | 67 TFLOPS | 30 TFLOPS | 40 TFLOPS |
| Interconnect (NVLink) | 160 GB/sec | 300 GB/sec | 300 GB/sec | 600 GB/sec | 600 GB/sec | 900 GB/sec | 900 GB/sec | 1.8 TB/sec | 1.8 TB/sec |
| GPU | GP100 | GV100 | GV100 | GA100 | GA100 | GH100 | GH100 | GB100 | GB100 |
| L1 cache | 1344 KB (24 KB × 56) | 10240 KB (128 KB × 80) | 10240 KB (128 KB × 80) | 20736 KB (192 KB × 108) | 20736 KB (192 KB × 108) | 25344 KB (192 KB × 132) | 25344 KB (192 KB × 132) | N/A | N/A |
| L2 cache | 4096 KB | 6144 KB | 6144 KB | 40960 KB | 40960 KB | 51200 KB | 51200 KB | N/A | N/A |
| TDP | 300 W | 300 W | 350 W | 400 W | 400 W | 700 W | 1000 W | 700 W | 1000 W |
| Die size | 610 mm² | 815 mm² | 815 mm² | 826 mm² | 826 mm² | 814 mm² | 814 mm² | N/A | N/A |
| Transistor count | 15.3 B | 21.1 B | 21.1 B | 54.2 B | 54.2 B | 80 B | 80 B | 208 B | 208 B |
| Process | TSMC 16FF+ | TSMC 12FFN | TSMC 12FFN | TSMC N7 | TSMC N7 | TSMC 4N | TSMC 4N | TSMC 4NP | TSMC 4NP |
| Launched | Q2 2016 | Q3 2017 | Q3 2017 | Q1 2020 | Q1 2020 | Q3 2022 | Q3 2023 | Q4 2024 (expected) | Q4 2024 (expected) |
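The V100 throughput figures in the comparison above can be related to its unit counts and boost clock. The following sketch is a back-of-the-envelope check, not an official derivation: it assumes one fused multiply-add (2 operations) per core per clock, and 64 fused multiply-adds per tensor core per clock, which corresponds to one 4×4 by 4×4 matrix product as described under Details. The 640 tensor cores come from the product table; the function name is hypothetical.

```python
def peak_tflops(units, ops_per_unit_per_clock, clock_mhz):
    """Peak throughput in TFLOPS for a number of units at a given clock."""
    return units * ops_per_unit_per_clock * clock_mhz * 1e6 / 1e12


BOOST_MHZ = 1530  # V100 boost clock from the table

# One FMA counts as 2 floating-point operations.
v100_fp32 = peak_tflops(5120, 2, BOOST_MHZ)         # table: 15.7 TFLOPS
v100_fp64 = peak_tflops(2560, 2, BOOST_MHZ)         # table: 7.8 TFLOPS
# Assumed: each tensor core completes 64 FMAs (128 operations) per clock.
v100_tensor = peak_tflops(640, 64 * 2, BOOST_MHZ)   # table: 125 TFLOPS

tensor_speedup = v100_tensor / v100_fp32             # ratio of the two peak rates
```

Under these assumptions the FP16 tensor path has eight times the peak rate of the FP32 CUDA cores, which matches the ratio between the 125 TFLOPS and 15.7 TFLOPS entries.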
References
1. "Nvidia Volta Trademark Status". United States Patent and Trademark Office. 14 August 2023. Retrieved 14 August 2023.
2. Gasior, Geoff (19 March 2013). "Nvidia's Volta GPU to feature on-chip DRAM". The Tech Report. Retrieved 14 March 2017.
3. Smith, Ryan (2017-05-10). "The NVIDIA GPU Tech Conference 2017 Keynote Live Blog". Retrieved 2018-11-03.
4. "NVIDIA Volta AI Architecture | NVIDIA". NVIDIA. Retrieved 2018-04-11.
5. "Volta trademark Cancellation Proceeding". United States Patent and Trademark Office.
6. "Volta trademark Exparte Appeal Proceeding". United States Patent and Trademark Office.
7. "Volta Trademark status". United States Patent and Trademark Office.
8. Killian, Zak (14 March 2017). "Report: TSMC set to fabricate Volta and Centriq on 12-nm process". The Tech Report. Retrieved 14 March 2017.
9. Durant, Luke; Giroux, Olivier; Harris, Mark; Stam, Nick (May 10, 2017). "Inside Volta: The World's Most Advanced Data Center GPU". Nvidia developer blog.
10. Gasior, Geoff (March 19, 2013). "Nvidia's Volta GPU to feature on-chip DRAM". The Tech Report.
11. Shah, Agam (22 August 2016). "Nvidia's NVLink 2.0 will first appear in Power9 servers next year". PC World. Retrieved 14 March 2017.
12. Harris, Mark (May 11, 2017). "CUDA 9 Features Revealed: Volta, Cooperative Groups and More". Retrieved August 12, 2017.
13. "NVIDIA Ampere Architecture In-Depth". 14 May 2020.
14. "NVIDIA A100 Tensor Core GPU Architecture" (PDF). Retrieved 2023-12-15.
15. "NVIDIA A100 Tensor Core GPU Architecture: Unprecedented Acceleration at Every Scale" (PDF). Nvidia. Retrieved September 18, 2020.
16. "NVIDIA Tensor Cores: Versatility for HPC & AI". NVIDIA.
17. "Abstract". docs.nvidia.com.
18. Cutress, Ian; Tallis, Billy (4 January 2016). "CES 2017: Nvidia Keynote Liveblog". AnandTech. Retrieved 9 January 2017.
19. "NVIDIA DRIVE Xavier, World's Most Powerful SoC, Brings Dramatic New AI Capabilities | NVIDIA Blog". The Official NVIDIA Blog. 2018-01-07. Retrieved 2018-11-03.
20. Smith, Ryan (10 May 2017). "Nvidia Volta Unveiled". AnandTech. Retrieved 2 June 2017.
21. "NVIDIA TITAN V Transforms the PC into AI Supercomputer".
22. "Introducing NVIDIA TITAN V: The World's Most Powerful PC Graphics Card".
23. "NVIDIA Reinvents the Workstation with Real-Time Ray Tracing".
24. "Introducing NVIDIA TITAN V: The World's Most Powerful PC Graphics Card". NVIDIA. Retrieved 2017-12-08.
25. "NVIDIA Quadro GV100". Retrieved 2018-03-27.
26. Smith, Ryan. "NVIDIA Unveils & Gives Away New Limited Edition 32GB Titan V "CEO Edition"". Retrieved 2018-07-06.
27. "NVIDIA TITAN V CEO Edition". TechPowerUp. Retrieved 2018-07-07.
28. Shankland, Steven (14 September 2015). "IBM, Nvidia land $325M supercomputer deal". CNET. Retrieved 29 December 2015.
29. Noyes, Katherine (16 March 2015). "IBM, Nvidia rev HPC engines in next-gen supercomputer push". PC World. Retrieved 29 December 2015.
30. Smith, Ryan (17 November 2014). "Nvidia Volta, IBM Power9 Land Contracts for New US Government Supercomputers". AnandTech. Retrieved 14 March 2017.
31. Lilly, Paul (January 25, 2017). "NVIDIA 12nm FinFET Volta GPU Architecture Reportedly Replacing Pascal In 2017". HotHardware.
32. Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech.
33. Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
34. "NVIDIA Tesla V100 tested: near unbelievable GPU power". TweakTown. September 17, 2017.