STAC Report: Dell with NVIDIA H100 SXM5 GPUs under STAC-A2 (derivatives risk)

New records in performance and energy efficiency

13 November 2023

STAC recently performed STAC-A2 Benchmark tests on a stack involving NVIDIA H100 SXM5 GPUs, which provide more FLOPS and memory bandwidth than the NVIDIA H100 PCIe GPUs that STAC has tested in previous systems. The “stack under test” (SUT) was a Dell PowerEdge XE9680 server with 8 x NVIDIA H100 SXM5 80 GiB GPUs supporting CUDA 12. The server was configured to mitigate the full range of Spectre/Meltdown threats.

The STAC Report is available here.

STAC-A2 is the technology benchmark standard based on financial market risk analysis. Designed by quants and technologists from some of the world's largest banks, STAC-A2 reports the performance, scaling, quality, and resource efficiency of any technology stack that is able to handle the workload (Monte Carlo estimation of Heston-based Greeks for a path-dependent, multi-asset option with early exercise).
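To give a feel for the kind of computation that workload description refers to, here is a minimal sketch of Monte Carlo pricing and a finite-difference delta under Heston stochastic volatility. It is an illustration only and not the STAC-A2 specification or any vendor's implementation: it assumes a single asset and a European call (no path dependence, no early exercise, no correlated basket), uses an Euler scheme with full truncation, and every function name and parameter value below is hypothetical.

```python
import math
import random


def heston_call_price(s0, v0, strike, rate, kappa, theta, xi, rho,
                      maturity, n_steps, n_paths, seed):
    """Monte Carlo price of a European call under Heston dynamics.

    Euler scheme with full truncation: the variance is floored at zero
    wherever it enters a drift or diffusion term.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_bar = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)                        # drives the asset
            z2 = rho * z1 + rho_bar * rng.gauss(0.0, 1.0)   # drives the variance
            v_pos = max(v, 0.0)
            log_s += (rate - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    return math.exp(-rate * maturity) * payoff_sum / n_paths


def heston_delta(s0, bump=0.01, **model):
    """Central finite-difference delta; both bumps reuse the same seed."""
    up = heston_call_price(s0 * (1 + bump), **model)
    down = heston_call_price(s0 * (1 - bump), **model)
    return (up - down) / (2 * s0 * bump)


# Hypothetical parameters, chosen only so the example runs quickly.
model = dict(v0=0.04, strike=100.0, rate=0.02, kappa=1.5, theta=0.04,
             xi=0.3, rho=-0.7, maturity=1.0, n_steps=50, n_paths=2000, seed=42)
print("price:", heston_call_price(100.0, **model))
print("delta:", heston_delta(100.0, **model))
```

Reusing one seed for the bumped-up and bumped-down runs (common random numbers) keeps the finite-difference estimate from being swamped by sampling noise. The benchmark workload is far heavier than this sketch, since it involves many correlated assets, path-dependent payoffs, and early exercise.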

Dell wished to highlight several results from this report:

  • Compared to all publicly reported solutions to date, this Dell PowerEdge XE9680 system featuring NVIDIA H100 SXM5 80 GiB GPUs set numerous performance and efficiency records, including:
    • The highest throughput (561 options / second) [1]
    • The fastest warm time (7.40 ms) in the baseline Greeks benchmark [2]
    • The fastest warm (160 ms) and cold (598 ms) times in the large Greeks benchmarks [3]
    • The most correlated assets (440) and Monte Carlo paths (316,000,000) simulated in 10 minutes [4]
    • The best energy efficiency (364,945 options / kWh) [5]
  • Compared to a liquid-cooled solution using 4 GPUs (INTC230927), this 8-GPU solution:
    • Was 16% more energy-efficient [5]
    • Was 2.5x / 1.8x the speed in the warm / cold runs of the large Greeks benchmarks [3]
    • Was 1.2x / 7.5x the speed in the warm / cold runs of the baseline Greeks benchmarks [2]
    • Simulated 2.4x the correlated assets and 316x the Monte Carlo paths in 10 minutes [4]
    • Demonstrated 2.0x the throughput [1]
  • Compared to a solution using 8 x NVIDIA H100 PCIe GPUs, as well as previous versions of the NVIDIA STAC Pack and CUDA (SUT ID NVDA230721), this solution using NVIDIA H100 SXM5 GPUs:
    • Demonstrated 1.59x the throughput [1]
    • Was 1.17x the speed in the warm runs of the baseline Greeks benchmark [2]
    • Was 3.1x / 3.0x the speed in the warm / cold runs of the large Greeks benchmarks [3]
    • Simulated 10% more correlated assets in 10 minutes [4]
    • Was 17% more energy-efficient [5]
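The throughput and energy-efficiency figures in the first list can be related by simple unit conversion. The sketch below converts options / kWh into joules per option and, under the assumption that the throughput and energy-efficiency figures describe the same portfolio run (the report is the authority on that), derives an implied average power draw. The function names are hypothetical and the derived values are back-of-envelope estimates, not numbers taken from the report.

```python
JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ by definition


def joules_per_option(options_per_kwh):
    """Energy spent per option, converted from options / kWh."""
    return JOULES_PER_KWH / options_per_kwh


def implied_average_power_watts(options_per_second, options_per_kwh):
    """Implied average power, assuming both metrics describe the same run."""
    return options_per_second * joules_per_option(options_per_kwh)


throughput = 561.0       # options / second, as quoted above
efficiency = 364_945.0   # options / kWh, as quoted above

print("J per option:", joules_per_option(efficiency))
print("implied W:", implied_average_power_watts(throughput, efficiency))
```

With the quoted figures this works out to roughly 9.9 J per option and an implied draw of about 5.5 kW, which should be read as an estimate under the stated assumption.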

For details, please see the report at the link above. Premium subscribers have access to the code used in this project as well as the micro-detailed configuration information for the solution. To learn about subscription options, please contact us.

---------------
[1] STAC-A2.β2.HPORTFOLIO.SPEED
[2] STAC-A2.β2.GREEKS.TIME.[WARM | COLD]
[3] STAC-A2.β2.GREEKS.10-100K-1260.TIME.[WARM | COLD]
[4] STAC-A2.β2.GREEKS.[MAX_ASSETS | MAX_PATHS]
[5] STAC-A2.β2.HPORTFOLIO.ENERG_EFF
