STAC-ML™ Markets (Inference): Naive Implementation with ONNX on an Azure E104is v5 VM (104 Intel® Xeon® Platinum 8370C vCPUs, 672 GiB memory), Latency-Optimized Configuration
STAC-ML™ Markets (Inference) Benchmarks (Sumaco suite)
- STAC-ML Markets (Inference) Naive Implementation (Compatibility Rev B)
- Driver and Inference Engine
- Python 3.8.10
- ONNX Runtime 1.12.1
- NumPy 1.23.3
- Ubuntu Linux 20.04.5 LTS
- Based on a standard image provided by Microsoft® Azure
- No OS tuning performed
- A Microsoft® Azure Standard E104is v5 VM
- Isolated instance: no other VMs on the system
- 104 Intel® Xeon® Platinum 8370C (Ice Lake) vCPUs @ 2.8 GHz
- 672 GiB of memory
- 256 GiB Premium SSD LRS
While no vendor was involved in optimizing the system's performance, one vendor did help make the project happen: Microsoft provided Azure credits so that this research could be completed. We are grateful for their help.
This report is just one in a series that explores latency and throughput optimization of ML inference workloads across different processor architectures in Microsoft Azure, all under similar software stacks. Together, these STAC Reports illustrate the kinds of insights STAC-ML benchmarks can provide while underscoring the sensitivity of performance results to the objectives of the solution architect.
The full set of reports in this series also includes:
- Ampere Altra (latency optimized): www.STACresearch.com/STAC221006a
- Ampere Altra (throughput optimized): www.STACresearch.com/STAC221006b
- Intel Ice Lake (throughput optimized): www.STACresearch.com/STAC221007b
- AMD Milan (latency optimized): www.STACresearch.com/STAC221008a
- AMD Milan (throughput optimized): www.STACresearch.com/STAC221008b
A research note that compares the SUTs, details their performance differences, and explores the latency-throughput-cost trade-offs is available here.