Vault Report: STAC-AI™ LANG6 on NVIDIA GH200 Grace Hopper Superchip

NVIDIA publish unaudited STAC-AI inferencing benchmark results for GH200
3 July 2025
NVIDIA recently submitted unaudited results for STAC-AI™ LANG6 (Inference-Only) benchmark runs on a GH200 Grace Hopper Superchip.
The Stack Under Test (SUT) was a QuantaGrid S74G server featuring an NVIDIA GH200 Grace Hopper Superchip. Two separate tests were performed, one for the Llama-3.1-8B-Instruct model and one for the Llama-3.1-70B-Instruct model. Note that STAC has not audited these reports and NVIDIA is solely responsible for these results.
The EDGAR4a/b Data Sets used in the benchmark model a Retrieval Augmented Generation (RAG) workload based on EDGAR securities filings, with a median initial context size of approximately 1,200 words. The EDGAR5a Data Set represents question answering against an entire EDGAR 10-K filing, with a median initial context size of 44,000 words.
https://www.STACresearch.com/NVDA250610a (Llama-3.1-8B-Instruct)
https://www.STACresearch.com/NVDA250610b (Llama-3.1-70B-Instruct)
The reports and detailed configuration information are now available to eligible subscribers at the links above. To learn more about subscription options, please contact us.