STAC-AI™ LANG6 on NVIDIA GH200 Grace Hopper Superchip

Type: Unaudited

Specs: STAC-AI™ LANG6

NVIDIA recently performed two STAC-AI™ LANG6 (Inference-Only) benchmark runs on a QuantaGrid S74G-2U server equipped with the GH200 Grace Hopper Superchip.

Stack under test:

  • Llama-3.1-70B-Instruct
  • STAC-AI Pack for NVIDIA TensorRT-LLM
  • TensorRT-LLM release v0.17.0
  • Hardware stack – NVIDIA GH200 Grace Hopper Superchip

This report covers the Llama-3.1-70B-Instruct model.

The companion report for Llama-3.1-8B-Instruct can be found here: https://www.STACresearch.com/NVDA250610a

Note: None of the results have been audited by STAC.

Premium subscribers have access to extensive visualizations of all test results, the detailed configuration information for the solutions tested, the code used in this testing, and the ability to run these same benchmarks, as is or with other models and data sets, in the privacy of their own labs. To learn about subscription options, please contact us.

The STAC-AI Working Group focuses on benchmarking artificial intelligence (AI) technologies in finance. This includes deep learning, large language models (LLMs), and other AI-driven approaches that help firms unlock new efficiencies and insights.