Published at 2023-10-25 | Last Update 2023-10-25
This post provides a concise reference for the performance of popular GPU models from NVIDIA and Huawei/HiSilicon, primarily intended for personal use.
The first letter in a GPU model name denotes its architecture:

- T for Turing;
- A for Ampere;
- V for Volta;
- H for Hopper (2022);
- L for Ada Lovelace.

| | T4 | A10 | A10G | A30 | V100 PCIe/SXM2 |
|---|---|---|---|---|---|
| Designed for | Data center workloads | (Desktop) Graphics-intensive workloads | Desktop | Desktop | Data center |
| Year | 2018 | 2020 | | | 2017 |
| Manufacturing | 12nm | 12nm | 12nm | ||
| Architecture | Turing | Ampere | Ampere | Ampere | Volta |
| Max Power | 70 watts | 150 watts | | 165 watts | 250/300 watts |
| GPU Mem | 16GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB HBM2 | 16/32GB HBM2 |
| GPU Mem BW | 400 GB/s | 600 GB/s | | 933 GB/s | 900 GB/s |
| Interconnect | PCIe Gen3 32GB/s | PCIe Gen4 64GB/s | | PCIe Gen4 64GB/s, NVLINK 200GB/s | PCIe Gen3 32GB/s, NVLINK 300GB/s |
| FP32 | 8.1 TFLOPS | 31.2 TFLOPS | | 10.3 TFLOPS | 14/15.7 TFLOPS |
| BFLOAT16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| FP16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| INT8 TensorCore | | 250 TOPS | | 330 TOPS | |
| INT4 TensorCore | | | | 661 TOPS | |

Datasheets:

| | A800 (PCIe/SXM) | A100 (PCIe/SXM) | Huawei Ascend 910B | H800 (PCIe/SXM) | H100 (PCIe/SXM) |
|---|---|---|---|---|---|
| Year | 2022 | 2020 | 2023 | 2022 | 2022 |
| Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm |
| Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper |
| Max Power | 300/400 watt | 300/400 watt | 400 watt | | 350/700 watt |
| GPU Mem | 80G HBM2e | 80G HBM2e | 64G HBM2e | 80G HBM3 | 80G HBM3 |
| GPU Mem BW | | 1935/2039 GB/s | | | 2/3.35 TB/s |
| Interconnect | NVLINK 400GB/s | PCIe Gen4 64GB/s, NVLINK 600GB/s | HCCS 392GB/s | NVLINK 400GB/s | PCIe Gen5 128GB/s, NVLINK 900GB/s |
| FP32 | | 19.5 TFLOPS | | | 51/67 TFLOPS |
| TF32 (TensorFloat) | | 156/312 TFLOPS | | | 756/989 TFLOPS |
| BFLOAT16 TensorCore | | 312/624 TFLOPS | | | |
| FP16 TensorCore | | 312/624 TFLOPS | 320 TFLOPS | | 1513/1979 TFLOPS |
| FP8 TensorCore | Not supported | Not supported | | | 3026/3958 TFLOPS |
| INT8 TensorCore | | 624/1248 TOPS | 640 TOPS | | 3026/3958 TOPS |

H100 vs. A100 in one sentence: 3x the performance at 2x the price.
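
As a rough sanity check of the "3x" figure, the sketch below divides the H100 peak numbers by the A100 peak numbers from the table above, taking the larger value of each pair. These are datasheet peaks, not measured throughput. The `PRICE_RATIO` constant simply encodes the "2x price" rule of thumb from this post and is not a quoted price. The names `PEAK`, `PRICE_RATIO`, and `speedups` are made up for illustration.

```python
# Larger figure of each pair in the table above, as (A100, H100).
PEAK = {
    "FP32": (19.5, 67.0),
    "TF32": (312.0, 989.0),
    "FP16 TensorCore": (624.0, 1979.0),
    "INT8 TensorCore": (1248.0, 3958.0),
}

# Assumption: the post's "2x price" rule of thumb, not an actual price list.
PRICE_RATIO = 2.0


def speedups(peak):
    """Return {metric: H100 peak / A100 peak} for each table row."""
    return {name: h100 / a100 for name, (a100, h100) in peak.items()}


if __name__ == "__main__":
    for name, ratio in speedups(PEAK).items():
        per_price = ratio / PRICE_RATIO
        print(f"{name:16s} {ratio:.2f}x peak, {per_price:.2f}x peak per unit price")
```

Every row lands a little above 3x, so under the 2x price assumption the H100 offers roughly 1.5x or more peak throughput per unit of price.
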
Datasheets: