GPU Performance (Data Sheets) Quick Reference (2023)

Author: arthurchiao.github.io

Published at 2023-10-25 | Last Update 2023-10-25

This post provides a concise reference for the performance of popular GPU models from NVIDIA and Huawei/HiSilicon, primarily intended for personal use.

1 Introduction
- Naming convention of NVIDIA GPUs
2 Comparison of T4/A10/A10G/V100
3 Comparison of A100/A800/H100/H800/Ascend 910B

Naming convention of NVIDIA GPUs

The first letter of an NVIDIA GPU model name denotes its architecture:

T for Turing;
A for Ampere;
V for Volta;
H for Hopper (introduced in 2022);
L for Ada Lovelace;
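
The mapping above can be expressed as a small lookup. This is a minimal sketch: the function name `architecture_of` is hypothetical, and the helper only makes sense for NVIDIA model names such as those in this post (a non-NVIDIA name like "Ascend 910B" also starts with "A" and would be mislabeled).

```python
from typing import Optional

# Letter-to-architecture mapping copied from the list above.
ARCHITECTURES = {
    "T": "Turing",
    "A": "Ampere",
    "V": "Volta",
    "H": "Hopper",
    "L": "Ada Lovelace",
}


def architecture_of(model: str) -> Optional[str]:
    """Return the architecture implied by the first letter of an NVIDIA model name.

    Returns None for an empty name or a letter that is not in the mapping.
    Hypothetical helper: it does not check that the name is a real NVIDIA model.
    """
    name = model.strip().upper()
    if not name:
        return None
    return ARCHITECTURES.get(name[0])


if __name__ == "__main__":
    # Model names taken from the tables in this post.
    for model in ["T4", "A10G", "V100", "H100"]:
        print(model, "->", architecture_of(model))
```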

| | T4 | A10 | A10G | A30 | V100 PCIe/SXM2 |
|---|---|---|---|---|---|
| Designed for | Data center workloads | (Desktop) Graphics-intensive workloads | Desktop | Desktop | Data center |
| Year | 2018 | 2020 | | | 2017 |
| Manufacturing | 12nm | 12nm | 12nm | | |
| Architecture | Turing | Ampere | Ampere | Ampere | Volta |
| Max Power | 70 watts | 150 watts | | 165 watts | 250/300 watts |
| GPU Mem | 16GB GDDR6 | 24GB GDDR6 | 48GB GDDR6 | 24GB HBM2 | 16/32GB HBM2 |
| GPU Mem BW | 400 GB/s | 600 GB/s | | `933 GB/s` | `900 GB/s` |
| Interconnect | PCIe Gen3 32 GB/s | PCIe Gen4 64 GB/s | | PCIe Gen4 64 GB/s, NVLINK 200 GB/s | PCIe Gen3 32 GB/s, NVLINK `300 GB/s` |
| FP32 | 8.1 TFLOPS | 31.2 TFLOPS | | 10.3 TFLOPS | 14/15.7 TFLOPS |
| BFLOAT16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| FP16 TensorCore | | 125 TFLOPS | | 165 TFLOPS | |
| INT8 TensorCore | | 250 TOPS | | 330 TOPS | |
| INT4 TensorCore | | | | 661 TOPS | |

Datasheets:

| | A800 (PCIe/SXM) | A100 (PCIe/SXM) | Huawei Ascend 910B | H800 (PCIe/SXM) | H100 (PCIe/SXM) |
|---|---|---|---|---|---|
| Year | 2022 | 2020 | 2023 | 2022 | 2022 |
| Manufacturing | 7nm | 7nm | 7+nm | 4nm | 4nm |
| Architecture | Ampere | Ampere | HUAWEI Da Vinci | Hopper | Hopper |
| Max Power | 300/400 watts | 300/400 watts | 400 watts | | 350/700 watts |
| GPU Mem | 80GB HBM2e | 80GB HBM2e | 64GB HBM2e | 80GB HBM3 | 80GB HBM3 |
| GPU Mem BW | | 1935/2039 GB/s | | | 2/3.35 TB/s |
| Interconnect | NVLINK 400 GB/s | PCIe Gen4 64 GB/s, NVLINK 600 GB/s | HCCS 392 GB/s | NVLINK 400 GB/s | PCIe Gen5 128 GB/s, NVLINK `900 GB/s` |
| FP32 | | 19.5 TFLOPS | | | 51/67 TFLOPS |
| TF32 (TensorFloat) | | 156/312 TFLOPS | | | 756/989 TFLOPS |
| BFLOAT16 TensorCore | | 312/624 TFLOPS | | | |
| FP16 TensorCore | | 312/624 TFLOPS | 320 TFLOPS | | 1513/1979 TFLOPS |
| FP8 TensorCore | Not supported | Not supported | | | 3026/3958 TFLOPS |
| INT8 TensorCore | | 624/1248 TOPS | 640 TOPS | | 3026/3958 TOPS |

H100 vs. A100 in one sentence: roughly 3x the performance at 2x the price.
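
The "3x" figure can be sanity-checked against the table. The sketch below divides H100 peak numbers by the matching A100 numbers, taking the second value listed in each cell for both GPUs; that pairing is an assumption made here for illustration. `PRICE_RATIO` is simply the rough "2x" from the sentence above, not a quoted price.

```python
# Peak figures copied from the comparison table in this post
# (second value of each cell; TFLOPS, or TOPS for INT8).
A100 = {"FP32": 19.5, "TF32": 312, "FP16": 624, "INT8": 1248}
H100 = {"FP32": 67, "TF32": 989, "FP16": 1979, "INT8": 3958}

# Rough price ratio stated in the post ("2x price"); an assumption, not a quote.
PRICE_RATIO = 2.0


def speedups(new: dict, old: dict) -> dict:
    """Return new/old peak-throughput ratios for every precision present in both."""
    return {k: new[k] / old[k] for k in old if k in new}


if __name__ == "__main__":
    for precision, ratio in speedups(H100, A100).items():
        print(
            f"{precision}: {ratio:.2f}x peak performance, "
            f"{ratio / PRICE_RATIO:.2f}x peak performance per unit price"
        )
```

With these inputs every ratio falls between 3.1x and 3.5x, which agrees with the rule of thumb. These are peak datasheet numbers, so real workloads may see a different gap.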

Datasheets:

Source: https://arthurchiao.github.io/blog/gpu-data-sheets/