

# THE LATENCY RACE:

# **IT'S ONLY JUST BEGUN**



## Every 100-150 Years We Hit an Accelerated Wave That Drives Innovation



Source: Internetlivestats.com; Intel 4004 photo: Hellisp / Wikipedia; Compaq portable photo: Geni / Wikipedia; iPad photo: mama\_mia / Shutterstock.com; iPhone photo: Zeynep Demir / Shutterstock.com

## Today's Trading Landscape

Latency still matters... a lot.

- Lower is still better, but ROI is smaller than before
- Can't be slower than competitors as the bar has been raised
- There's a hierarchy of needs: either nanoseconds or microseconds
- There's tension between algorithm sophistication and speed, from both a computational and a programming perspective
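Whether a system sits in the nanosecond or the microsecond tier is usually settled by measuring its latency distribution, not its average. The sketch below is a minimal illustration, not taken from the deck: `hypothetical_pricing_step` is a stand-in workload, and Python's interpreter overhead makes it unsuitable for real nanosecond-scale measurement. It only shows how per-call timing and nearest-rank percentiles fit together.

```python
import math
import time


def measure_latencies_ns(fn, iterations=10_000):
    """Time each call to fn with a monotonic, nanosecond-resolution clock."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[min(rank, len(ordered)) - 1]


def hypothetical_pricing_step():
    """Stand-in workload (assumption); a real system would time its own critical path."""
    return sum(i * i for i in range(100))


if __name__ == "__main__":
    samples = measure_latencies_ns(hypothetical_pricing_step)
    for pct in (50, 99, 99.9):
        print(f"p{pct}: {percentile(samples, pct)} ns")
```

Reporting the median alongside the 99th and 99.9th percentiles shows whether a slow tail, rather than typical latency, is what raises the bar against competitors.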

## Tech Trends to Consider in the Wake of the Wave

- Moore's Law
- Cloud
- Heterogeneity
- New materials

# Are We There Yet?

"Are attempts to reduce... latency becoming a case of ever diminishing returns and ever increasing investment?"

- Automated Trader April 5, 2007

## Tech Trend #1: Moore's Law

The key enabler of:

- New Levels of Performance & Integration
- Higher functionality and complexity
- Control over power, cost, and size

(Figure label: Strained Silicon)





#### Recently Released Intel<sup>®</sup> Xeon<sup>®</sup> Processor E5-2600 v4

Broadwell microarchitecture, Built on 14nm process technology

- More Cores, More LL Cache, Larger Registers, Faster Memory
- Integrated: Memory Controller, I/O Controller, Voltage Regulator
- Enhanced ISA, AES-NI, Realtime Processor Trace

| Feature                                  | Xeon E5-2600 v3<br>(Haswell-EP)                            | Xeon E5-2600 v4<br>(Broadwell-EP)      |
|------------------------------------------|------------------------------------------------------------|----------------------------------------|
| Cores Per Socket                         | Up to 18                                                   | Up to 22                               |
| Threads Per Socket                       | Up to 36 threads                                           | Up to 44 threads                       |
| Last-level Cache (LLC)                   | Up to 45 MB                                                | Up to 55 MB                            |
| QPI Speed (GT/s)                         | 2x QPI 1.1 channels: 6.4, 8.0, 9.6                         | Same as v3                             |
| PCIe* Lanes / Controllers / Speed (GT/s) | 40 / 10 / PCIe* 3.0 (2.5, 5, 8)                            | Same as v3                             |
| Memory Population                        | 4 channels of up to 3<br>RDIMMs or 3 LRDIMMs               | + 3DS LRDIMM<sup>&amp;</sup>           |
| Max Memory Speed (MT/s)                  | Up to 2133                                                 | Up to 2400                             |
| TDP (W)                                  | 160 (Workstation only), 145, 135, 120, 105, 90, 85, 65, 55 | Same as v3                             |

(Block diagram labels, Broadwell-EP: 4 channels DDR4, 2x QPI 1.1, cores with shared cache, 40 lanes PCIe\* 3.0, DMI2)

<sup>#</sup> Requires BIOS and firmware update. <sup>&amp;</sup> Depends on market availability.

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel may make changes to specifications and product descriptions at any time, without notice
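To put the "Max Memory Speed" row in context, theoretical peak memory bandwidth per socket can be estimated as channels x transfers per second x bytes per transfer. The sketch below assumes the standard 64-bit (8-byte) DDR4 data bus per channel and reads the table's 2133 and 2400 figures as MT/s. These are upper bounds; sustained bandwidth in practice is lower.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bytes=8 assumes a 64-bit data bus per DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


haswell_ep = peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=2133)
broadwell_ep = peak_bandwidth_gb_s(channels=4, mega_transfers_per_s=2400)
uplift_pct = (broadwell_ep / haswell_ep - 1) * 100

if __name__ == "__main__":
    print(f"E5-2600 v3: {haswell_ep:.1f} GB/s per socket")
    print(f"E5-2600 v4: {broadwell_ep:.1f} GB/s per socket")
    print(f"Uplift:     {uplift_pct:.1f}%")
```

Under these assumptions the move from 2133 to 2400 is roughly a 12.5% increase in peak bandwidth, from about 68.3 GB/s to 76.8 GB/s per socket.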

## **KNL Architecture Overview**

#### ISA

Intel<sup>®</sup> Xeon<sup>®</sup> Processor Binary-Compatible (w/Broadwell)

#### **On-package memory**

Up to 16GB, ~460 GB/s STREAM at launch

#### **Platform Memory**

Up to 384GB (6ch DDR4-2400 MHz)

#### **Fixed Bottlenecks**

- ✓ 2D Mesh Architecture
- ✓ Out-of-Order Cores

(Diagram labels: Tile (up to 36); EDC (embedded DRAM controller); x4 DMI2 to PCH; 36 Lanes PCIe\* Gen3 (x16, x16, x4))







## Tech Trend #2: The Journey to the Hybrid Cloud On-Demand





#### Cache/Memory Quality of Service

Built for virtualization and cloud, applicable for trading

#### New Features for Broadwell Server

- Cache QoS Monitoring on L3
  - Double the RMIDs over Haswell Server for Class of Service Control over resources
- Cache QoS Enforcement on L3
- Memory BW Monitoring
  - Leverages same architecture as Cache QoS Monitoring

Using Cache QoS (CQoS) gives trading models another level of control
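One way to picture Cache QoS Enforcement is as a partition of L3 cache ways among classes of service, with each class receiving a contiguous capacity bitmask. The sketch below only computes such masks. The 20-way cache, the class names, and the use of the Linux `resctrl` schemata line format (`L3:<cache_id>=<hex mask>`) are assumptions for illustration and are not part of the deck; nothing here writes to the system.

```python
def contiguous_way_masks(total_ways, ways_per_class):
    """Give each class a non-overlapping, contiguous run of cache ways.

    Returns {class_name: bitmask}, allocating from the lowest bit upward.
    """
    if sum(ways_per_class.values()) > total_ways:
        raise ValueError("requested more ways than the cache provides")
    masks = {}
    next_bit = 0
    for name, ways in ways_per_class.items():
        if ways < 1:
            raise ValueError(f"class {name!r} needs at least one way")
        masks[name] = ((1 << ways) - 1) << next_bit
        next_bit += ways
    return masks


def schemata_line(mask, cache_id=0):
    """Format a mask in the style of a resctrl L3 schemata entry."""
    return f"L3:{cache_id}={mask:x}"


if __name__ == "__main__":
    # Hypothetical split of a 20-way L3 among three workloads.
    plan = contiguous_way_masks(
        20, {"market_data": 8, "strategy": 8, "housekeeping": 4}
    )
    for name, mask in plan.items():
        print(f"{name:>12}: {schemata_line(mask)}")
```

Keeping a latency-critical strategy thread in its own partition is the kind of control the slide refers to: a noisy neighbor can no longer evict its working set from L3.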



"Up to 1/3 of cloud service provider nodes to use FPGAs by 2020"



© Intel Corporation 2016

## Tech Trend #3: Heterogeneity

Assembling the right parts to get the job done



FPGAs have high-speed connections to CPUs, software CPUs, and ARM CPUs. Integration with Xeon is a logical recognition of the value of FPGAs.



## **INTEGRATION OF SILICON SOLUTIONS AND IP & TOOL-FLOW DRIVES PERFORMANCE**

#### Manufacturing Is a Key Ingredient in FPGA



## Tech Trend #4: New Memory Economics

Breaks the memory storage barrier



Latency is everywhere: in data, in calculations, and in storage, while the time to act is shrinking.

# **Computer Architecture Reoptimization**



Source: Intel 2011 & David Patterson, "Latency Lags Bandwidth," Communications of the ACM, October 2004, Vol. 47, No. 10


NVMe\* with 3D XPoint<sup>™</sup> Technology



Technology claims are based on comparisons of latency, density and write cycling metrics amongst memory technologies recorded on published specifications of in-market memory products against internal Intel specifications.

#### Intel® Xeon® Processor E7 v4 Product Family

#### Kx Systems\*

#### kdb+ 3.1\* running the STAC-M3\* workload (High-speed tick analysis)

"For the longest time, we were held back by slow discs and minimal amounts of memory. Kx customers require performance. As memory configurations have increased to six terabytes or larger, game-changing data strategies are becoming possible and changing how our customers do business when they can load their data into memory."<sup>1</sup>

#### Simon Garland – Chief Strategist, Kx Systems

- The STAC-M3\* benchmark characterizes analysis of time-series data such as tick-by-tick quote and trade histories, which are crucial to many trading functions, from algorithm development to risk management. The key metric in STAC-M3\* is response time.
- SK Hynix\* DDR4 memory, 60% more cores along with Intel<sup>®</sup> AVX2 instructions enabled significant growth over the past two generations in analytic capabilities using the same Brickland platform.
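As an illustration of the kind of tick-by-tick analysis described above, the sketch below computes a volume-weighted average price (VWAP) per time bucket over an in-memory list of trades. It is a minimal Python example under assumed inputs; it is not a STAC-M3\* operation and not kdb+ code, and the `(timestamp_ns, price, size)` tuple layout is hypothetical.

```python
from collections import defaultdict


def vwap_by_bucket(trades, bucket_ns):
    """Volume-weighted average price per time bucket.

    trades: iterable of (timestamp_ns, price, size) tuples (assumed layout).
    Returns {bucket_start_ns: vwap}, ordered by bucket start.
    """
    notional = defaultdict(float)
    volume = defaultdict(int)
    for timestamp_ns, price, size in trades:
        if size <= 0:
            continue  # skip records that carry no traded volume
        bucket = timestamp_ns - timestamp_ns % bucket_ns
        notional[bucket] += price * size
        volume[bucket] += size
    return {b: notional[b] / volume[b] for b in sorted(volume)}


if __name__ == "__main__":
    one_second = 1_000_000_000
    sample_trades = [
        (0, 100.0, 10),
        (500_000_000, 101.0, 30),
        (1_200_000_000, 102.0, 20),
    ]
    for bucket, vwap in vwap_by_bucket(sample_trades, one_second).items():
        print(f"bucket starting at {bucket} ns: VWAP {vwap:.2f}")
```

A query like this touches every tick in the requested range, which is why holding the history in memory instead of on disk has such a direct effect on response time.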

Run up to 2.8x more theoretical profit-and-loss analyses per day to improve financial portfolio results



#### **Financial Services**

Improve financial portfolio analysis with the 4-Socket Servers using the Intel<sup>®</sup> Xeon<sup>®</sup> processor E7-8890 v4



Workload: STAC-M3\* B1.10T.THEOPL.TIME high-speed analytics on time series, tick-by-tick market data.

<sup>1</sup> Testing conducted on ISV\* software comparing 4S Intel<sup>®</sup> Xeon<sup>®</sup> Processor E7-8890 v4 with 4S Intel<sup>®</sup> Xeon<sup>®</sup> Processor E7-4890 v2. Testing done by ISV/Intel. For complete testing configuration details, SEE SLIDE xx. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit <a href="http://www.intel.com/performance">http://www.intel.com/performance</a>.



## Intel NVM DIMM Technology



\* DIMM population shown as an example only.



## NAND BASED NVME SSD

## **3D XPOINT<sup>™</sup> BASED NVME SSD**



**NVM Solutions Group** 



## Conclusion

Trading will continue to benefit from technology trends driven by other industries

- There will be first mover advantages for trading firms
  - CPUs will provide not just more cores but also more control
  - FPGAs will continue to get faster and more capable
  - CPUs and FPGAs will integrate
  - New solid-state storage will shrink latencies
- Trading firms will be able to compute ever larger problems
  - CPU-FPGA integration, faster interconnects, and faster solid-state media will enable more sophisticated low-latency algorithms
- What Can You Do?
  - Be prepared to rethink your application architecture
  - Experiment with these new technologies
  - Use them to their fullest (e.g., leverage high-performance APIs)
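As one hedged illustration of rethinking application architecture around faster media, the sketch below accesses fixed-size records through a memory-mapped file instead of per-record read and write calls. The deck does not name a specific API, so this is an assumed example: the record layout (`int64` timestamp, `float64` price) and function names are hypothetical, and an ordinary temporary file stands in for low-latency storage.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: little-endian int64 timestamp_ns + float64 price.
RECORD = struct.Struct("<qd")


def write_records(path, records):
    """Store records by packing them directly into a memory-mapped file."""
    if not records:
        raise ValueError("need at least one record")
    size = RECORD.size * len(records)
    with open(path, "wb") as f:
        f.truncate(size)  # pre-size the file so it can be mapped
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as mapped:
        for index, record in enumerate(records):
            RECORD.pack_into(mapped, index * RECORD.size, *record)
        mapped.flush()  # ask the OS to push dirty pages to the backing store


def read_record(path, index):
    """Fetch one record by offset without reading the rest of the file."""
    with open(path, "rb") as f, mmap.mmap(
        f.fileno(), 0, access=mmap.ACCESS_READ
    ) as mapped:
        return RECORD.unpack_from(mapped, index * RECORD.size)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "ticks.bin")
        write_records(path, [(1, 100.5), (2, 100.75), (3, 101.0)])
        print(read_record(path, 1))
```

The design point is that access becomes a load or store at an offset rather than a system call per record, which matters more as the media underneath gets faster.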