The AMD Instinct MI250X (128GB HBM2e, OAM module, 560W) accelerator, designed on the AMD CDNA 2 6nm FinFET process with a 1,700 MHz peak boost engine clock, delivers 47.9 TFLOPS peak theoretical single-precision (FP32) floating-point performance.
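The 47.9 TFLOPS figure can be reproduced from the stated boost clock. A minimal sketch of that arithmetic follows, assuming the MI250X's publicly listed 220 compute units and 128 FP32 operations per clock per CU (these two values are not stated in the text above):

```python
# Peak theoretical FP32 throughput = CUs x FP32 ops/clock/CU x boost clock.
# 220 CUs and 128 FP32 ops/clock/CU are assumed from AMD's published
# MI250X specifications; only the 1,700 MHz clock appears in the text.
compute_units = 220
fp32_ops_per_clock_per_cu = 128
boost_clock_hz = 1_700e6  # 1,700 MHz peak boost engine clock

peak_flops = compute_units * fp32_ops_per_clock_per_cu * boost_clock_hz
peak_tflops = peak_flops / 1e12
print(f"{peak_tflops:.1f} TFLOPS")  # matches the quoted 47.9 TFLOPS
```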
6 Includes AMD high-performance CPU and GPU accelerators used for AI training and high-performance computing in a 4-Accelerator, CPU-hosted configuration. Goal calculations are based on performance scores as measured by standard performance metrics (HPC: Linpack DGEMM kernel FLOPS with 4k matrix size. AI training: lower precision training-focused floating-point math GEMM kernels such as FP16 or BF16 FLOPS operating on 4k matrices) divided by the rated power consumption of a representative accelerated compute node, including the CPU host + memory and 4 GPU accelerators.
7 MI300-33: Text generated with Llama2-70b chat using an input sequence length of 4096 and 32 output tokens, compared using a custom docker container for each system, based on AMD internal testing as of 11/17/2023.
Configurations:
2P Intel Xeon Platinum CPU server using 4x AMD Instinct MI300X (192GB, 750W) GPUs, ROCm® 6.0 pre-release, PyTorch 2.2.0, vLLM for ROCm, Ubuntu® 22.04.2.
Vs.
2P AMD EPYC 7763 CPU server using 4x AMD Instinct MI250 (128 GB HBM2e, 560W) GPUs, ROCm® 5.4.3, PyTorch 2.0.0, HuggingFace Transformers 4.35.0, Ubuntu 22.04.6.
Four GPUs on each system were used in this test.
Server manufacturers may vary configurations, yielding different results. Performance may vary based on use of latest drivers and optimizations.
A photo accompanying this announcement is available at https://www.globenewswire.com/NewsRoom/AttachmentNg/793bf8fd-eec4-4460-bec2-cc6620bdd6c7
Contact:
Aaron Grabein
AMD Communications
(512) 602-8950
Aaron.grabein@amd.com

Suresh Bhaskaran
AMD Investor Relations
(408) 749-2845
Suresh.Bhaskaran@amd.com