
HP Gen8 technologies for low latency, high performance trading and exchanges

Patrick Greene
Solution Architect – HP
HPC on Wall Street
9/19/12

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Experience matters

HP ProLiant
#1 in x86 server market share
16+ years straight – 65 consecutive quarters in
both factory revenue and units

#1 in blade server market share


5 ¾ years straight – 23 consecutive quarters in
both factory revenue and units

HP’s leadership in the datacenter has been built over years of innovation and experience.
Source: IDC Worldwide Quarterly Server Tracker, August 2012. Includes Compaq ProLiant from Q196 through Q202

FSI-HPC Solutions for Capital Markets

• Ultra low latency systems for High Frequency Trading
  • Fastest Xeon performance
  • Tuning White Paper and HP-TimeTest utility
  • HP/Mellanox TCP/UDP kernel bypass
• Low power choices for grid computing
• Open reference architecture for unstructured data
• Quality infrastructure for IT cost reduction
Low Latency Systems Require Optimization at every layer in the Solution Stack

Use Cases
• Exchange Matching Engines
• Market Data Distribution
• High Frequency Algorithmic Trading
• Pre/Post Trade Analytics
• Real Time Enterprise Risk Management

Low Latency FSI Solution Stack (layers as listed on the slide; side labels: Mgmt, Fab.)
• Use Cases / Lines of Business
• Application Environment
• Messaging Middleware
• Precision Timing
• Server I/O Fabric
• High Speed Storage
• Integrated Acceleration
• Firmware and Operating System
• X86-64 Server Architecture

Definitions:
Solution - includes messaging middleware; in-house apps; design services
System - integrated server/networking/storage infrastructure
Components - specific servers/OS/switches/file system in the “system”
Optimized Form Factors to meet a variety of needs

• DL rack-mount servers for expandability
  • All top bin E5-2600 processors offered with 3DPC
  • DL380p option for 25 disks in 2U 2P Gen8
• BL systems with integrated networking
  • Integrated chassis system for redundancy & TCO
  • Gen8 NIC/Switch options leveraging PCIe Gen3
• SL multi-node systems for scale-out grids
  • Optimized for performance, power and price at scale
• ML mini-tower for ultimate expandability
  • ML350 model (rack mount or mini tower) for even more disk, 9 PCI slots!

HP Gen8 Servers (Sandy Bridge E5-2600)

DL380p 8SFF Model w/optional 8SFF hot swap drives

Three top bin Processors circled


• 8c 3.1GHz in HP Z820 workstation (4U with racking kit; no iLO4)
• 8c 2.9GHz and 4c 3.3GHz in DL380p (2U) and DL360p (1U)
• 130 watt 8c & 6c in BL460c (16 in 10U), SL230 (8 in 4U), and SL250 (4 in 4U)
• Turbo Boost deserves a fresh look (e.g. +400 MHz)
Fastest Memory: ProLiant Gen8 DIMMs
Intel E5 (SB) = 4 memory channels, so 2p servers have 8 channels with 2 or 3 DPC
8 Dual Rank DIMMs are optimum if they meet your memory capacity requirements
Explanation: The memory bus is forced to idle for one clock when switching between ranks on the same DIMM, and for 2 clocks when switching between ranks on different DIMMs. So 1 DPC outperforms 2 DPC at the same capacity and same number of ranks on the channel.
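
The rank-switching penalty in the explanation above can be illustrated with a toy model. This is only a sketch, not a memory-controller simulation: the function name `idle_clocks` and the two access patterns are hypothetical, and the model counts nothing except the 1-clock and 2-clock idle penalties quoted on the slide.

```python
# Toy model of the rank-switch penalties quoted on the slide (assumption:
# no other DDR3 timing effects are modeled).
SAME_DIMM_PENALTY = 1   # idle clocks when switching ranks on the same DIMM
CROSS_DIMM_PENALTY = 2  # idle clocks when switching ranks on different DIMMs


def idle_clocks(accesses):
    """Count idle bus clocks for a sequence of (dimm, rank) accesses on one channel."""
    total = 0
    for prev, cur in zip(accesses, accesses[1:]):
        if prev == cur:
            continue  # same rank again: no switch penalty in this model
        if prev[0] == cur[0]:
            total += SAME_DIMM_PENALTY
        else:
            total += CROSS_DIMM_PENALTY
    return total


# Two ranks on the channel in both cases, with accesses alternating between them.
one_dpc_dual_rank = [(0, 0), (0, 1)] * 4     # one dual-rank DIMM (1 DPC)
two_dpc_single_rank = [(0, 0), (1, 0)] * 4   # two single-rank DIMMs (2 DPC)

print(idle_clocks(one_dpc_dual_rank))    # 7 rank switches at 1 clock each
print(idle_clocks(two_dpc_single_rank))  # 7 rank switches at 2 clocks each
```

With the same number of ranks and the same access pattern, the 2 DPC layout accumulates twice the idle clocks in this model, which is the slide's argument for populating 1 DPC when capacity allows.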

UDIMMs offer a 1 clock latency advantage when there is only 1 DIMM per Channel (DPC)
Unregistered DIMM (UDIMM) failure rates are higher, so use these judiciously
DIMM Description                                 1DPC (DDR3-)   2DPC (DDR3-)   3DPC (DDR3-)
8GB 2Rx4 PC3-12800R 1.5V DDR3-1600 RDIMM         1600           1600           1333 1
16GB 2Rx4 PC3-12800R 1.5V DDR3-1600 RDIMM        1600           1600           1333 1
4GB 2Rx8 PC3-12800E 1.5V DDR3-1600 UDIMM         1600
8GB 2Rx8 PC3-12800E 1.5V DDR3-1600 UDIMM         1600
New 4 June, 2012

Why dual rank?

For the same memory speed and DIMM type, more ranks will result in lower loaded latency. We enable rank interleaving when dual-rank DIMMs are installed on a channel. So more ranks give the memory controller a greater capability to parallelize the processing of memory requests. This results in shorter request queues and therefore lower latency.
Platform Tuning Advice for Low Latency
Updated White Paper: Configuring and Tuning HP ProLiant Servers for Low-Latency Applications
Posted at http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01804533/c01804533.pdf

Disable Power and CPU Monitoring SMI


Eliminates an 8x/sec latency spike of magnitude >200 μsec on managed servers caused by this System Management Interrupt (SMI)
Turns off P-state monitoring so server always runs at full speed

Consider Disabling Memory Pre-Failure Notification SMI


Eliminates an SMI that occurs once per 5 min for Gen8 and once per hour for G7.
Correctable and uncorrectable memory error handling is unaffected by turning off notification of the number of correctable errors.

Do this with the new HPRCU, Conrep scripting tool or RBSU Advanced Menu
Conrep now available for Solaris too

See User Guide for ROM-Based Setup Utility (RBSU) for explanation of BIOS settings
Pub #347563-405 June, 2012 at: http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00191707/c00191707.pdf

Run HP-TimeTest utility v7.2 for a quick jitter check


Request the free utility via e-mail to low.latency@hp.com. Include your company name, city/country, and HP sales rep/reseller if known so that the right regional person can respond.
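
HP-TimeTest itself is not shown in these slides. As a rough illustration of what a jitter check does, the sketch below spins on a monotonic clock and records any gap between consecutive readings that exceeds a threshold. The function name `find_spikes` and its defaults are hypothetical, and because the Python interpreter adds its own overhead, the numbers are not comparable to HP-TimeTest results.

```python
import time


def find_spikes(duration_s=0.2, threshold_us=50.0):
    """Spin on a monotonic clock; return (elapsed_s, gap_us) for gaps above threshold_us."""
    spikes = []
    threshold_ns = int(threshold_us * 1000)
    start = prev = time.perf_counter_ns()
    deadline = start + int(duration_s * 1e9)
    while prev < deadline:
        now = time.perf_counter_ns()
        gap = now - prev
        if gap > threshold_ns:
            # The loop was held off the CPU (or delayed) for longer than the threshold.
            spikes.append(((now - start) / 1e9, gap / 1000.0))
        prev = now
    return spikes


if __name__ == "__main__":
    for elapsed_s, gap_us in find_spikes():
        print(f"{elapsed_s:8.3f} s  spike {gap_us:8.1f} usec")
```

Running a loop like this pinned to one core before and after a tuning change gives a quick, qualitative view of whether periodic interruptions such as SMIs have gone away.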
The Benefit of Low Latency Tuning – minimized jitter
Plots of HP-TimeTest output:

With current LL tuning on SNB, we observe spikes <9 μsec
[Plot: Latency Spikes: Time History, DL380p Gen8, E5-2643 @ 3.300 GHz, RHEL 6.2/2.6.32-220.el6.x86_64, HP-TimeTest7.2. Axes: latency (cycles) and latency (μsecs) vs. Elapsed Time (seconds), 0 to 600. Threshold set to 3 μsec.]
Jitter observed in 7-8 microsecond range

With prototype HP BIOS option for SNB memory power refresh, we observe spikes <3 μsec! (to be released mid-Oct’12)
[Plot: Latency Spikes: Time History, DL380p Gen8, E5-2643 @ 3.300 GHz, RHEL 6.2/2.6.32-220.el6.x86_64, HP-TimeTest7.2, bootleg BIOS (06/22/2012). Axes: latency (cycles) and latency (μsecs) vs. Elapsed Time (seconds), 0 to 600. Threshold set to 1.5 μsec.]
Jitter observed in 1.5 – 2.5 microsecond range!
Why PCIe Gen3 matters...

ProLiant Gen8 servers with ConnectX-3 based adapters and VMA acceleration enable a 2 μsec trading advantage!

VMA v6 - TCP – Improved Capability In ConnectX-3
Feature: Connection Steering
  CX-2: MAC+IP per process in addition to Server MAC+IP
  CX-3: No additional MAC+IP. Use Server’s MAC+IP
  Description: ConnectX-3 implements Flow Steering

Feature: Multithread support
  CX-2: QP per process. Multi-threaded applications will share same CQ
  CX-3: QP per thread/socket
  Description: ConnectX-3 Flow Steering enables finer performance tuning and optimizations

Feature: DHCP
  CX-2: Not supported
  CX-3: Supported

Feature: Bonding & HA
  CX-2: Not supported
  CX-3: Supported (Q1’12)

Feature: VLAN
  CX-2: Not supported
  CX-3: Supported

Feature: IP routing gateway
  CX-2: Single default GW is supported per process and requires per process configuration
  CX-3: Host stack routing table is supported
  Description: ConnectX-3 Flow Steering enables utilizing the host IP stack

CX-3 Introduces 40GbE!


HP/Mellanox Solution now accelerates TCP as well as UDP protocols
[Chart: TCP Latency Improvement (Netperf 10GbE). ½ round-trip latency (μsec, axis 0 to 10) vs. Message Size (Bytes, 1 to 1024). Series: G7 X5687 3.6GHz ConnectX-2; G7 X5687 3.6GHz ConnectX-3/VMA; Gen8 E5-2690 2.9GHz ConnectX-3/VMA.]
Back-to-back configuration (no Switch), ½ Round Trip; Netperf v2.5.0; MTU size = 1470 Bytes
RHEL 6.1; ConnectX-3 FW 2.10.2220; Driver: OFED-VMA 1.5.3-0008; VMA 6.1.6
Command Line: netperf -n 16 -H <peer ip> -c -C -P 0 -t TCP_RR -l 10 -T 2,2 -- -r <message size>
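
To show how a ½ round-trip figure is derived from a request/response loop like the TCP_RR test above, here is a minimal sketch. It is an assumption-laden stand-in: it uses a local socket pair inside one process rather than netperf, a NIC, or VMA, and the names `half_round_trip_us`, `_echo` and `_recv_exact` are hypothetical. Its absolute numbers say nothing about the hardware in the chart.

```python
import socket
import threading
import time


def _recv_exact(sock, n):
    """Read exactly n bytes (stream sockets may return short reads)."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def _echo(sock, msg_size, count):
    """Responder side: send every request straight back."""
    for _ in range(count):
        sock.sendall(_recv_exact(sock, msg_size))


def half_round_trip_us(msg_size=64, count=1000):
    """Mean half round-trip time in microseconds over a local socket pair."""
    a, b = socket.socketpair()
    responder = threading.Thread(target=_echo, args=(b, msg_size, count))
    responder.start()
    payload = b"x" * msg_size
    t0 = time.perf_counter_ns()
    for _ in range(count):
        a.sendall(payload)          # request
        _recv_exact(a, msg_size)    # wait for the echoed response
    elapsed_ns = time.perf_counter_ns() - t0
    responder.join()
    a.close()
    b.close()
    # One transaction is a full round trip; halve it for the one-way estimate.
    return elapsed_ns / count / 2 / 1000.0


if __name__ == "__main__":
    for size in (1, 64, 1024):
        print(f"{size:5d} bytes: {half_round_trip_us(size):.2f} usec")
```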
Application Accelerator Options
FSI customers use accelerators for faster feed handlers, order execution engines, and compute-intensive risk &
pricing calculations

ISS/HPC team helps certify accelerators in ProLiant

Computational accelerator partner FPGAs:


• NVIDIA (SL2X0 Gen8 with Tesla cards)

HFT accelerator solution partners:


• ActivFinancial (OEMs DL380)

• Tervela (OEMs DL380)

• Ulink (OEMs DL160)

Gen8 servers enhance our support for accelerators


• DL380p risers now support double-wide HL PCIe cards with aux power cable options at PCIe Gen3 speeds!

Rapid changes underway: FPGA vendors adding 10GbE; 10GbE vendors adding FPGAs; switches adding FPGAs…
Application Programming for Low Latency
Determine how many cores your trading strategy requires
Can it run on 8 cores? If so, match up CPU+NIC per strategy

Maximize your Application resources by doing the following:


1. Bind threads, interrupts and processes to cores using CPU_ID
/usr/bin/taskset -c 0,1 /usr/bin/numactl --localalloc …. (other command line options)
or use Red Hat “tuna” to do this with GUI (in RHEL 5.5 MRG and RHEL 6.0 standard)
Beginning with SandyBridge on-chip PCIe controllers, bind NICs to cores for minimum QPI latencies
2. Program memory accesses for NUMA awareness
See: http://bizsupport2.austin.hp.com/bc/docs/support/SupportManual/c03261871/c03261871.pdf
3. Place threads for “communication” functions on adjacent cores
4. Use PCM to determine L3 Cache misses & keep data in L3 Cache
http://software.intel.com/file/41604
5. Compile with Performance Settings, Use PGO, Evaluate IPP / SSE 4.2 Strings
http://software.intel.com/en-us/articles/using-avx-without-writing-avx-code/
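
Core binding can also be done from inside the application rather than with taskset. The sketch below is one way to do it on Linux, assuming Python's `os.sched_setaffinity` is available; the helper name `pin_to_cpu` is hypothetical, and choosing the lowest allowed core is only a placeholder for a deliberate core/NIC mapping.

```python
import os


def pin_to_cpu(cpu=None):
    """Pin the caller (pid 0) to a single CPU and return the resulting affinity set.

    Linux-only: os.sched_setaffinity is not available on every platform.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("CPU affinity calls are not available on this platform")
    allowed = os.sched_getaffinity(0)
    if cpu is None:
        cpu = min(allowed)  # assumption: any allowed core will do for the demo
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    print("now running on CPUs:", sorted(pin_to_cpu()))
```

In a real deployment the core would be chosen to sit on the same socket as the NIC's PCIe controller, in line with the advice above about minimizing QPI latencies.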

Implement application-transparent multicast acceleration between nodes
Link Mellanox’s VMA v6 library to the application for kernel bypass over Ethernet and IB (HP now resells VMA)

THE GOLDEN TICKET: Above the noise.
FSI-HPC Solutions for Capital Markets

• Ultra low latency systems for High Frequency Trading
• Low power choices for grid computing
  • SL200s servers with GPU options
  • Moonshot program for ARM, Atom, Phi
• Open reference architecture for unstructured data
• Quality infrastructure for IT cost reduction
Demonstrating the value of SL6500 servers
Built on ProActive Insight Architecture

SL230s: HPC optimized for maximum performance, efficiency and density
• Purpose-built for HPC performance at scale
• Up to 1 integrated I/O Accelerator
• Maximum speed FDR IB FlexibleLOM
• Multi-node 1/2U density and efficiency
• Enhanced, simple front serviceability
• Rack level power management
• Industry Leading Mgmt with Insight Control*

SL250s: HPC optimized for efficiency and density, with balanced GPU performance
• Purpose-built for HPC performance at scale
• Up to 3 integrated GPUs
• Maximum speed FDR IB FlexibleLOM
• Multi-node 1U density and efficiency
• Enhanced, simple front serviceability
• Rack level power management
• Industry Leading Mgmt with Insight Control*
“GPUDirect RDMA” for Peer-to-Peer I/O
GPU Direct RDMA (previously known as GPU Direct 3.0)
Enables peer to peer communication directly between HCA and GPU
Dramatically reduces overall latency for GPU to GPU communications
by bypassing the host CPU’s memory
[Diagram: two hosts, each with System Memory, a CPU, a GPU with GDDR5 memory, and a Mellanox HCA connected over PCI Express 3.0]
Mellanox VPI
Availability: GPUDirect RDMA requires CUDA 5.0 and MLNX_OFED driver changes (beta 9/12 with expected GA by 12/12).
HP/Nvidia Gen 8 GPU Starter Kit V2.0 in Americas
– Configuration:
• 1 DL380 control node w/ E5-2670 8 core 2.6GHz 115W CPUs, 64 GB RAM and 2x 600 GB HDD
• 1 SL6500 enclosure
• 4 SL250s 2U server trays w/ E5-2670 8 core 2.6GHz 115W CPUs, 64 GB RAM, 600 GB HDD, 2 Nvidia M2090 GPU modules
• Mellanox IB 4x QDR 36 port managed switch
• HPN ProCurve 2910 24 port 10/100/1000 Ethernet switch
• RHEL
• CMU
• Linux Value Pack
• Rack and infrastructure
• Hardware/Software Integration

– Development Environment for commercial, enterprise, Higher Ed, ISVs


– CUDA Programming Environment
– Proof-of-concept environment for channel partners
– End-user Price ~$70K
– Contact HP Sales for detailed BOM
Throughput-bound applications pervade the trading lifecycle

• Post Trade Analysis and Compliance
  - Full trade history logs and analytics
  - Venue latencies
  - Transaction Costs
  - Risk Analytics
• Data Storage and Analysis
  - Historical market data
  - Firm-wide log consolidation
  - Data Publishing for on-demand large analytics
• Strategy Execution
  - Matching
  - Execution
  - Online Risk Management
• Other latency sensitive apps
• Strategy Development and Testing
  - Back Testing
  - Optimization
  - Search
A New Era of Extreme Scale Computing
From tens of servers per rack sharing nothing to thousands sharing everything

HP Project Moonshot Infrastructure

HP Converged Infrastructure: Federated Management, Fabric, Storage, Networking, Power/Cooling

• HP Redstone: Server Development Platform
• HP Discovery Lab: Proof of Concept Lab
• HP Pathfinder Program: Partner Collaboration

HP ‘Redstone’ Server Development Platform
Perfect for development and testing with unparalleled density, flexibility, and simplicity

ProLiant SL 6500s chassis
• Up to 72 servers in a single 1U tray
• 4 trays in a single 4U chassis

Shared SL 6500 scalable system enclosure
• Pooled power: 4 common slot power supplies
• Shared cooling: 8 shared fans, N+1, rear-serviceable
• Integrated, configurable network fabric with up to 16 10Gb uplinks

Up to 288 servers: 18 quad node compute cartridges per server tray
• Calxeda EnergyCore™ quad-core ARM SoCs w/4MB L2 cache
• Up to 4GB ECC (up to 1333 MHz) memory per server
• Integrated management

Shared and configurable storage
• Diskless or up to 4 SATA drives (1 drive cartridges) per server
• Up to 192 SSD or 96 2.5” SFF HDD per enclosure

[Image: HP ‘Redstone’ Development Platform server tray]
Breakthrough Savings and Simplicity
Energy, cost and space savings move the industry to new infrastructure

            Traditional x86    HP ‘Redstone’ Server
Cost        $3.3M              $1.2M
Servers     400                1,600
Racks       10                 1/2
Switches    20                 2
Cables      1,600              41
Power       91 kilowatts       9.9 kilowatts

Savings with HP ‘Redstone’: 89% less energy, 94% less space, 63% less cost, 97% less complexity
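
The headline percentages can be cross-checked against the raw figures on this slide. The sketch below is plain arithmetic on those numbers; the helper name `percent_reduction` is hypothetical, and treating "complexity" as cable count and "space" as rack count are assumptions, since the slide does not say how those two percentages were derived.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


# (Traditional x86, HP 'Redstone') pairs as printed on the slide.
figures = {
    "cost ($M)": (3.3, 1.2),
    "racks": (10, 0.5),
    "switches": (20, 2),
    "cables": (1600, 41),
    "power (kW)": (91, 9.9),
}

for name, (before, after) in figures.items():
    print(f"{name:12s} {percent_reduction(before, after):5.1f}% less")
```

Under those assumptions the power, cost and cable reductions truncate to the slide's 89%, 63% and 97%. The rack count alone gives a slightly different space figure than the quoted 94%, so the slide presumably used a finer-grained footprint measure.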

Select hyperscale web, and data analytics applications show tremendous promise
Based on weighted average performance projections for workloads such as web serving, memcached, and Data Analytics.
Cost estimates include infrastructure, space, and power and cooling costs over three years.
FSI-HPC Solutions for Capital Markets

• Ultra low latency systems for High Frequency Trading
• Low power choices for grid computing
• Open reference architecture for unstructured data
  • Scalable Hadoop clusters with CMU
  • Analysis with Vertica and Autonomy
• Quality infrastructure for IT cost reduction
What is Hadoop?
Your data is going unstructured

The digital universe will expand by almost half in 2012 - 90% of that data is unstructured

Traditional systems are not designed to analyze unstructured data

Hadoop is designed specifically to extract business value from unstructured data

Risk Modeling Fraud Detection Sentiment Analysis Customer Retention Web Mining

Financial Services Government Retail Telecom Media

How does Hadoop fit into existing BI ecosystems?
Click Stream Analysis using Hadoop, Vertica and Autonomy

• Unstructured Click Stream Data: navigation paths, time per page, products browsed
• Hadoop Distributions (Cloudera, MapR, Hortonworks) with HP Insight CMU: data assimilation; data consolidation and aggregation; transformation into structured data
• Vertica (ad hoc SQL compliant analytics) and Autonomy IDOL (meaning based analytics): multi-dimensional analysis, predictive analysis, geographical analysis
• Business Users: user segmentation, software testing, market research

Underlying layers: Operating System, HP Converged Infrastructure, Consulting Services

HP offers the shortest route to Hadoop success
Open strategy that combines Hadoop with advanced analytics and management
• Deploy in days, not months
• Scale to thousands of nodes with the push of a button
• Manage with single pane of glass
• Optimize with real time and 3-D historical views of compute resources
• Perform end to end analytics

Diagram labels: Seamless analytics; Leading Distributions; Choice of solutions; Insight cluster management utility; Consulting Services

HP HyperStorage Server
Address the explosion of data permeating the data center
ProLiant SL 4500 configurations
• Single node: 180TB Storage
• Dual node: 2 x 75TB Storage
• Triple node: 3 x 45TB Storage

Shared SL 4500 HyperStorage chassis
• Pooled power: 4 HP common slot power supplies
• Shared cooling: 10 shared fans, N+1, rear-serviceable
• Shared management: reduced cabling with single iLO port

Most dense storage available in market today
• Up to 60 LFF drives in a single chassis giving a total of 180 TB of available storage

Multiple configurations available
• Single server model gives the most dense storage solution for massive data stores
• Dual server provides an optimal mix of high density storage and compute
• Triple server gives users an optimal mix of storage and compute for working inside large unstructured datasets
HP ProLiant SL4500 Solution Efficiency
Three Node vs. Traditional Similar Deployment

SL45xx Overview and Features
Designed for Density
First HP ProLiant server built purely with storage-intensive applications in mind

Densest HDD option in HP ProLiant portfolio

Various configurations let customers optimize for their unique data center needs

FSI-HPC Solutions for Capital Markets

• Ultra low latency systems for High Frequency Trading
• Low power choices for grid computing
• Open reference architecture for unstructured data
• Quality infrastructure for IT cost reduction
  • ProActive Insight Architecture
  • Performance Optimized Datacenters
HP ProLiant Gen8:
The World’s Most Self-Sufficient Servers

With HP ProActive Insight architecture:


• Integrated Lifecycle Automation: 3X admin productivity improvement
• Dynamic Workload Acceleration: 6X performance increase for the most demanding workloads
• Automated Energy Optimization: 70% more compute per watt
• Proactive Service & Support: 66% faster time to problem resolution

Serviceability with Quality:


www.youtube.com/watch?v=AZw-LG-oyDU
HP ProActive Insight Architecture
Designed to Simplify, Integrate and Automate your Infrastructure

iLO4 Management Engine


HP Smart Storage

HP FlexNet Adapters
Insight Online Sea of Sensors 3D

Virtual Connect “ProLiant” Operating Environment Datacenter Smart Grid

Integrated Lifecycle Automation / Dynamic Workload Acceleration / Automated Energy Optimization / ProActive Service and Support
Gen8 Smart Array Innovations
Increased performance, data availability and storage capacity
Faster access to data
• Up to 2X performance improvement*
• 2X Write Cache (up to 2 GB)

Address explosive data growth
• 2X # of Drives supported (up to 227)

Minimize data loss
• Long term data retention with Flash Backed Write Cache standard

Reduce initial setup time
• 95% reduction in parity initialization from several days to 5 hours**

External model with SAS cable connectors for extending the RAID set to JBODs

*256KiB, Sequential write, RAID 5 with 15K SAS drives, performance will vary based on configuration
** HP R & D, Validation information TBD
Thank you

Low.latency@hp.com
