SUT ID: STAC221007a
STAC-ML

STAC-ML™ Markets (Inference) Naive Implementation with ONNX on an Azure E104is v5 VM (104 Intel® Xeon® Platinum 8370C vCPUs, 672 GiB memory) Latency-Optimized Configuration

STAC-ML™ Markets (Inference) Benchmarks (Sumaco suite)

STAC-ML Markets (Inference) Naive Implementation (Compatibility Rev B)
Driver and Inference Engine
- Python 3.8.10
- ONNX Runtime 1.12.1
- NumPy 1.23.3
Ubuntu Linux 20.04.5 LTS
- Based on a standard image provided by Microsoft® Azure
- No OS tuning performed
A Microsoft® Azure Standard E104is v5 VM
- Isolated Instance – No other VMs on the system
- 104 Intel® Xeon® Platinum 8370C (Ice Lake) vCPUs @ 2.8 GHz
- 672 GiB of memory
- 256 GiB Premium SSD LRS

Though no vendors had a hand in optimizing the system's performance, one vendor did help make the project happen: Microsoft provided credits in Azure so that this research could be completed. We are grateful for their help.

This report is just one in a series that explores latency and throughput optimization of ML inference workloads across different processor architectures in Microsoft Azure, all under similar software stacks. Together, these STAC Reports illustrate the kinds of insights STAC-ML benchmarks can provide while underscoring the sensitivity of performance results to the objectives of the solution architect.

The full set of reports in this series also includes:

Ampere Altra (latency optimized): www.STACresearch.com/STAC221006a
Ampere Altra (throughput optimized): www.STACresearch.com/STAC221006b
Intel Ice Lake (throughput optimized): www.STACresearch.com/STAC221007b
AMD Milan (latency optimized): www.STACresearch.com/STAC221008a
AMD Milan (throughput optimized): www.STACresearch.com/STAC221008b

A research note that compares the SUTs, details their performance differences, and explores the latency-throughput-cost trade-offs is also available.

