

Intel® Xeon Phi™ Processor Codenamed Knights Landing (KNL)

## **KNL Architecture Overview**

x4 DMI2 to PCH

36 Lanes PCIe\* Gen3 (x16, x16, x4)

#### ISA

Intel® Xeon® Processor Binary-Compatible (w/Broadwell)

### On-package memory

Up to 16GB, ~460 GB/s STREAM at launch

### **Platform Memory**

Up to 384GB (6 channels of DDR4-2400)
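To put the two memory figures side by side, here is a back-of-the-envelope sketch of the theoretical peak bandwidth of the platform memory. It assumes a standard 64-bit (8-byte) DDR4 channel and 2400 mega-transfers per second per channel; the helper name `ddr_peak_gb_per_s` is hypothetical. It compares a theoretical DDR4 peak against the approximate STREAM figure quoted for the on-package memory, so the ratio is indicative only.

```python
# Rough peak-bandwidth arithmetic for the platform memory quoted in the slide.
# Assumption: each DDR4 channel is 64 bits (8 bytes) wide.

def ddr_peak_gb_per_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9

# 6 channels of DDR4-2400, as listed under Platform Memory.
ddr4_peak = ddr_peak_gb_per_s(channels=6, mega_transfers_per_s=2400)

# Approximate STREAM bandwidth quoted for the on-package memory.
mcdram_stream = 460.0

print(f"DDR4 theoretical peak: {ddr4_peak:.1f} GB/s")
print(f"On-package STREAM vs. DDR4 peak: {mcdram_stream / ddr4_peak:.1f}x")
```

Under these assumptions the DDR4 peak comes out at roughly 115 GB/s, so the on-package memory offers about four times the bandwidth of the platform memory.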

### **Fixed Bottlenecks**

- ✓ 2D Mesh Architecture
- ✓ Out-of-Order Cores
- ✓ 3X single-thread vs. KNC

TILE: (up to 36)

| 2 VPU |         | HUB    |         | 2 VPU |
|-------|---------|--------|---------|-------|
| Core  | 32K L1D | 1MB L2 | 32K L1D | Core  |
|       | 32K L1I |        | 32K L1I |       |

Enhanced Intel® Atom™ cores based on Silvermont Microarchitecture



EDC (embedded DRAM controller)



IMC (integrated memory controller)



IIO (integrated I/O controller)



## Integrated On-Package Memory Usage Models

The usage model is configured at boot time and exposed to software through NUMA<sup>1</sup>

Platform memory (DDR4) is available only on the bootable KNL host processor





# **Knights Landing Overview**



Source: Intel. All products, computer systems, dates, and figures specified are preliminary based on current expectations and are subject to change without notice. KNL data are preliminary based on current expectations and are subject to change without notice. 1. Binary compatible with Intel Xeon processors using the Haswell instruction set (except TSX). 2. Bandwidth numbers are based on a STREAM-like memory access pattern when MCDRAM is used as flat memory. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware and software design or configuration may affect actual performance.



## **KNL ISA**



### **KNL** implements all legacy instructions

- Existing binaries run w/o recompilation
- KNC binaries require recompilation

### **KNL** introduces AVX-512 Extensions

- 512-bit FP/Integer Vectors
- 32 vector registers and 8 mask registers
- Gather/Scatter


**Conflict Detection**: Improves Vectorization

**Prefetch**: Gather and Scatter Prefetch

**Exponential and Reciprocal Instructions** 
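To illustrate why conflict detection helps vectorization, the sketch below emulates in scalar Python what a per-lane conflict check reports and how a histogram update can use it. It is modelled on the documented behaviour of the AVX-512 conflict-detection instruction, where each lane receives a bitmask of the earlier lanes holding the same value; the function names `conflict_masks` and `histogram_update` are hypothetical, and the code is an emulation, not intrinsics. The default of 16 lanes assumes 32-bit elements in a 512-bit vector (8 lanes for 64-bit elements).

```python
# Scalar emulation of a vector conflict check and a conflict-safe histogram.
# Assumption: each lane's mask has bit j set when earlier lane j holds the same index.

def conflict_masks(indices):
    """Return, per lane, a bitmask of earlier lanes with an equal index."""
    masks = []
    for lane, idx in enumerate(indices):
        mask = 0
        for earlier in range(lane):
            if indices[earlier] == idx:
                mask |= 1 << earlier
        masks.append(mask)
    return masks

def histogram_update(bins, indices, lanes=16):
    """Increment bins[i] for every i in indices, one 'vector' of lanes at a time."""
    for start in range(0, len(indices), lanes):
        chunk = indices[start:start + lanes]
        masks = conflict_masks(chunk)
        pending = (1 << len(chunk)) - 1  # bit i set means lane i is not done yet
        while pending:
            # A lane is ready when no still-pending earlier lane shares its index,
            # so all ready lanes touch distinct bins and can be updated together.
            ready = [l for l in range(len(chunk))
                     if (pending >> l) & 1 and (masks[l] & pending) == 0]
            gathered = [bins[chunk[l]] for l in ready]   # gather
            for l, value in zip(ready, gathered):
                bins[chunk[l]] = value + 1               # scatter
            for l in ready:
                pending &= ~(1 << l)
    return bins

example_indices = [3, 1, 3, 3, 0, 1]
example_masks = conflict_masks(example_indices)
example_bins = histogram_update([0, 0, 0, 0], example_indices)
```

Without the conflict check, a plain gather, add, scatter over lanes with duplicate indices would lose updates, which is why loops of this shape are otherwise hard to vectorize.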

1. Previous code name Intel® Xeon® processors
2. Xeon Phi = Intel® Xeon Phi™ processor



# Case Study: The STAC-A2 benchmark

STAC-A2 evaluates Monte Carlo over 5 assets and the Greeks

- 5 assets, 25K paths, 252 time steps
- For American-style options using the Heston Model
- Compute Greeks: Theta, Rho, Delta, Gamma, Cross-Gamma, Model Vega, Correlation Vega
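As a rough illustration of the kind of computation listed above, the sketch below simulates Heston paths for a single asset and estimates one Greek (Delta) by bump-and-revalue. It is not the STAC-A2 implementation: it prices a European call rather than an American-style option, uses one asset instead of five, uses far fewer paths than 25K so that pure Python finishes quickly, and assumes an Euler scheme with the variance floored at zero. All function names and parameter values are hypothetical; only the 252 time steps come from the list above.

```python
import math
import random

def heston_terminal_prices(s0, v0, r, kappa, theta, xi, rho, t, steps, paths, seed):
    """Simulate terminal prices under Heston (Euler scheme, variance floored at 0)."""
    rng = random.Random(seed)
    dt = t / steps
    terminal = []
    for _ in range(paths):
        s, v = s0, v0
        for _ in range(steps):
            z1 = rng.gauss(0.0, 1.0)
            # Correlate the variance shock with the price shock.
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)
            s *= math.exp((r - 0.5 * v_pos) * dt + math.sqrt(v_pos * dt) * z1)
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos * dt) * z2
        terminal.append(s)
    return terminal

def european_call(s0, strike, r, t, **model):
    """Discounted mean payoff of a European call over the simulated paths."""
    terminal = heston_terminal_prices(s0=s0, r=r, t=t, **model)
    mean_payoff = sum(max(st - strike, 0.0) for st in terminal) / len(terminal)
    return math.exp(-r * t) * mean_payoff

def delta_bump(s0, bump=0.01, **kwargs):
    """Central-difference Delta; the shared seed gives common random numbers."""
    up = european_call(s0 * (1.0 + bump), **kwargs)
    down = european_call(s0 * (1.0 - bump), **kwargs)
    return (up - down) / (2.0 * s0 * bump)

# Hypothetical parameters chosen only for illustration.
params = dict(strike=100.0, r=0.02, t=1.0, v0=0.04, kappa=1.5, theta=0.04,
              xi=0.3, rho=-0.7, steps=252, paths=1000, seed=42)
price = european_call(100.0, **params)
delta = delta_bump(100.0, **params)
print(f"price={price:.3f} delta={delta:.3f}")
```

Each path is independent of the others, and each Greek needs additional revaluations, which is what makes this workload a natural fit for many cores and wide vectors.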



27x overall improvement






## STAC-A2 on KNL – Background

- Tests ran on a pre-release product; the first configuration is listed below.
- Hardware
  - 1 x Intel Xeon Phi 7250 (Knights Landing)
    - 68 physical cores
    - 272 logical cores
  - 96GB DRAM, 16GB MCDRAM
  - Intel white box, effectively 0.5U



- Software
  - STAC-A2 Pack for Intel Composer XE Rev H
    - Derived from Rev F. Ideal for homogeneous systems
  - Intel Composer XE, Intel Threading Building Blocks
- First STAC-A2 results using just one socket

## STAC-A2 on KNL – Results Highlights

