Model Performance Data for Intel® Gaudi® 2 AI Accelerators
These performance numbers are measured using Intel Gaudi software (SynapseAI*) release version 1.19.0, unless otherwise noted.
Note: All models, for both training and inference, use the PyTorch* 2.5.1 framework. Any other framework used for training or inference is noted for each model.
Large Language Model (LLM) Throughput with the Intel Gaudi 2 Accelerator
Max Throughput [tokens/sec; higher is better]. A worked example of the metric follows the table.

Model | # HPU | Precision | Input Length | Output Length | Batch Size | Throughput (tokens/sec) |
---|---|---|---|---|---|---|
LLaMA 2 7b | 1 | fp8 | 128 | 128 | 1230 | 13583 |
LLaMA 2 7b | 1 | fp8 | 128 | 2048 | 163 | 4802 |
LLaMA 2 7b | 1 | fp8 | 2048 | 128 | 94 | 1447 |
LLaMA 2 7b | 1 | fp8 | 2048 | 2048 | 81 | 1956 |
LLaMA 2 70b | 2 | fp8 | 128 | 128 | 1750 | 2943 |
LLaMA 2 70b | 2 | fp8 | 128 | 2048 | 327 | 3312 |
LLaMA 2 70b | 2 | fp8 | 2048 | 128 | 95 | 316 |
LLaMA 2 70b | 2 | fp8 | 2048 | 2048 | 159 | 1755 |
LLaMA 3.1 8B | 1 | fp8 | 128 | 128 | 2816 | 19875 |
LLaMA 3.1 8B | 1 | fp8 | 128 | 2048 | 512 | 14784 |
LLaMA 3.1 8B | 1 | fp8 | 2048 | 128 | 179 | 2011 |
LLaMA 3.1 8B | 1 | fp8 | 2048 | 2048 | 256 | 6083 |
LLaMA 3.1 70B | 2 | fp8 | 128 | 128 | 1792 | 2895 |
LLaMA 3.1 70B | 2 | fp8 | 128 | 2048 | 256 | 3816 |
LLaMA 3.1 70B | 2 | fp8 | 2048 | 128 | 142 | 316 |
LLaMA 3.1 70B | 2 | fp8 | 2048 | 2048 | 139 | 1648 |
LLaMA 3.1 70B | 8 | fp8 | 128 | 128 | 4000 | 10012 |
LLaMA 3.1 70B | 8 | fp8 | 128 | 2048 | 600 | 12538 |
LLaMA 3.1 70B | 8 | fp8 | 2048 | 128 | 383 | 1083 |
LLaMA 3.1 70B | 8 | fp8 | 2048 | 2048 | 476 | 6623 |
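The throughput column reports generated tokens per second aggregated over the whole batch. As a quick sanity check (assuming the common definition of throughput as new tokens generated divided by generation wall time, which may differ in detail from the exact measurement methodology), the table values imply per-batch latencies like this:

```python
# Back-of-the-envelope check of the throughput metric, assuming
# throughput = (batch size * output length) / generation wall time.
# Values are taken from the LLaMA 2 7b, 128-in / 128-out row above.
batch_size = 1230      # sequences generated concurrently
output_length = 128    # new tokens generated per sequence
reported_tps = 13583   # reported throughput, tokens/sec

total_new_tokens = batch_size * output_length        # 157,440 new tokens per batch
implied_latency_s = total_new_tokens / reported_tps  # ~11.6 s to complete the batch
print(f"{total_new_tokens} tokens / {reported_tps} tok/s ~= {implied_latency_s:.1f} s per batch")
```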
System Configuration
Intel Gaudi 2 Platform
System: HLS-Gaudi2 with eight Intel Gaudi 2 HL-225H mezzanine cards, two Intel Xeon Platinum 8380 CPUs at 2.30 GHz, and 1 TB of system memory
Common Software
- Ubuntu* v22.04
- Intel Gaudi software v1.19.0 (see full software support details)
- PyTorch: models run with PyTorch v2.5.1 (a quick environment check is sketched below)
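To confirm that a comparable environment exposes the Gaudi devices to PyTorch, a check along the following lines can help. This is a minimal sketch; it assumes the habana_frameworks PyTorch bridge that ships with the Intel Gaudi software is installed, and it is not the benchmarking harness used for the table above.

```python
import torch
import habana_frameworks.torch.hpu as hthpu  # PyTorch bridge installed with the Intel Gaudi software

print("PyTorch version :", torch.__version__)      # expected to match the bundled PyTorch build
print("HPU available   :", hthpu.is_available())   # True once the driver and bridge are set up
print("HPU device count:", hthpu.device_count())   # 8 on an HLS-Gaudi2 system like the one above
```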