STAC-ML™ Markets (Inference) Naive Implementation with ONNX on an Azure D64plds v5 VM (64 Ampere® Altra® vCPUs, 128 GiB memory), Latency-Optimized Configuration
STAC-ML™ Markets (Inference) Benchmarks (Sumaco suite)
- STAC-ML Markets (Inference) Naive Implementation (Compatibility Rev B)
- Driver and Inference Engine:
  - Python 3.8.10
  - ONNX Runtime 1.12.1
  - NumPy 1.23.3
- Ubuntu Linux 20.04.5 LTS
  - Based on a standard image provided by Microsoft® Azure
  - No OS tuning performed
- A Microsoft® Azure Standard D64plds v5 VM:
  - 64 Ampere® Altra® vCPUs @ 3.0 GHz
  - 128 GiB of memory
  - 256 GiB Premium SSD (LRS)
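To illustrate what a latency-optimized measurement of a stack like the one above involves, here is a minimal, hypothetical sketch of per-inference latency timing with percentile reporting. The workload below is a stand-in NumPy matrix product, not the actual STAC-ML models or harness; in the benchmarked configuration the timed call would be a single ONNX Runtime `InferenceSession.run()` invocation.

```python
import time
import numpy as np

def measure_latencies(fn, n_warmup=100, n_iters=1000):
    """Time individual invocations of fn and return latency percentiles.

    Latency-optimized STAC-ML reporting focuses on per-inference
    response times, so each call is recorded separately rather than
    amortized over a batch.
    """
    for _ in range(n_warmup):          # warm caches / JIT paths first
        fn()
    samples = np.empty(n_iters)
    for i in range(n_iters):
        t0 = time.perf_counter()
        fn()
        samples[i] = time.perf_counter() - t0
    return {p: float(np.percentile(samples, p)) for p in (50, 99, 99.9)}

# Stand-in workload (an assumption for illustration only); the real
# benchmark times onnxruntime InferenceSession.run() on one input window.
x = np.random.rand(1, 64).astype(np.float32)
w = np.random.rand(64, 64).astype(np.float32)
stats = measure_latencies(lambda: x @ w)
```

Reporting tail percentiles (99th, 99.9th) rather than a mean is what distinguishes a latency-oriented configuration from the throughput-oriented runs elsewhere in this series.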
Though no vendor had a hand in optimizing the system's performance, one vendor did help make the project happen: Microsoft provided Azure credits so that this research could be completed. We are grateful for their help.
This report is just one in a series that explores latency and throughput optimization of ML inference workloads across different processor architectures in Microsoft Azure, all under similar software stacks. Together, these STAC Reports illustrate the kinds of insights STAC-ML benchmarks can provide while underscoring the sensitivity of performance results to the objectives of the solution architect.
The full set of reports in this series also includes:
- Ampere Altra (throughput optimized): www.STACresearch.com/STAC221006b
- Intel Ice Lake (latency optimized): www.STACresearch.com/STAC221007a
- Intel Ice Lake (throughput optimized): www.STACresearch.com/STAC221007b
- AMD Milan (latency optimized): www.STACresearch.com/STAC221008a
- AMD Milan (throughput optimized): www.STACresearch.com/STAC221008b
A research note that compares the SUTs, details their performance differences, and explores the latency-throughput-cost trade-offs is also available.