
SOLID STATE DRIVES IN HIGH PERFORMANCE COMPUTING

REDUCING THE I/O BOTTLENECK


Lawrence McIntosh, Systems Engineering Solutions Group
Michael Burke, Ph.D., Strategic Applications Engineering

Sun BluePrints Online
Part No. 821-0125-10, Revision 1.0, 06/25/09

Sun Microsystems, Inc.

Table of Contents
Introduction .......................................................... 3
    Motivation ........................................................ 3
SSD technology review ................................................. 4
Single system application performance ................................. 6
    The ABAQUS benchmark application .................................. 7
        Hardware configuration ........................................ 8
        Software configuration ........................................ 8
    The NASTRAN benchmark application ................................. 9
        Hardware configuration ....................................... 11
        Software configuration ....................................... 11
    The ANSYS benchmark application .................................. 12
        Hardware configuration ....................................... 13
        Software configuration ....................................... 13
    Summary for single system application performance ................ 14
SSD usage with the Lustre parallel file system ....................... 14
    Lustre file system design ........................................ 14
    IOZone file system testing ....................................... 18
        Hardware configuration ....................................... 20
        Software configuration ....................................... 20
    Summary for SSD usage with the Lustre parallel file system ....... 20
    Future directions ................................................ 20
Conclusion ........................................................... 21
Appendix: Benchmark descriptions and parameters ...................... 22
    ABAQUS standard benchmark test cases ............................. 22
        Hardware configuration ....................................... 26
        Software configuration ....................................... 26
    NASTRAN benchmark test cases ..................................... 27
        Hardware configuration ....................................... 28
        Software configuration ....................................... 28
    ANSYS 12.0 (prel. 7) with ANSYS 11.0 distributed benchmarks ...... 29
        Hardware configuration ....................................... 30
        Software configuration ....................................... 30
About the authors .................................................... 30
References ........................................................... 31
Ordering Sun Documents ............................................... 31
Accessing Sun Documentation Online ................................... 31


Introduction
This Sun BluePrints article focuses on a comparison between traditional hard disk drives (HDDs) and the newer solid state drive (SSD) technology in high-performance computing (HPC) applications. SSD devices can help correct the imbalance between processor and storage speed while also reducing energy usage and environmental impact. This comparison was performed using two approaches:

• Application-based benchmarking was performed using the ABAQUS, NASTRAN, and ANSYS finite-element analysis (FEA) applications, in order to evaluate the effect of SSD technology in realistic HPC applications. These applications are commonly used to benchmark HPC systems.
• Benchmark testing of storage performance using the Lustre parallel file system and the popular IOZone benchmark application was performed, in order to evaluate large sequential I/O operations typical for the Lustre file system employed as a compute cluster data cache.

The Lustre tests were performed using three system configurations:

• A baseline test using the Lustre file system with a single HDD-based Object Storage Server (OSS)
• A Lustre file system configuration using a single SSD-based OSS, otherwise similar to the baseline test
• A comparison test using the Lustre file system and two SSD-based OSSs in parallel

The results of these tests demonstrate the potential for significant benefits in the use of SSD devices for HPC applications with large I/O components.

Motivation
Processor performance, especially in high-performance clustered multiprocessor systems, has grown much more quickly than the performance of I/O systems and large-scale storage devices. At the same time, high-performance computing tasks in particular have come to be dominated by the need to manage and manipulate very large data sets, such as sensor data for meteorology and climate models. In combination, the need to manage large data sets while meeting the data demands of fast processors has led to a growing imbalance between computation and I/O: the I/O bottleneck. This I/O bottleneck constrains the overall performance of HPC systems. It has become essential to look for HPC performance improvements somewhere other than increased processor speed. It has become equally essential to reduce the energy requirements of HPC systems. Many HPC datacenters are up against hard limits of available power and cooling capacity. Reducing energy cost and cooling load can


provide increases in capacity that would otherwise not be feasible. The use of solid state devices to replace traditional HDDs can allow HPC systems to both improve I/O performance and reduce energy consumption and cooling load. This Sun BluePrints article is divided into the following sections:

• SSD technology review on page 4 provides an introduction to SSD technology.
• Single system application performance on page 6 compares HDD and SSD technology using well-known HPC applications.
• SSD usage with the Lustre parallel file system on page 14 compares an HDD baseline configuration with SSD-based configurations for the Lustre file system.
• Appendix: Benchmark descriptions and parameters on page 22 details specifics of the benchmarks used in this study.

SSD technology review


SSD devices are already familiar to most, in the form of the flash technology used in PDAs, digital cameras, mobile phones, and USB thumb drives for portable storage and data transfer. With no moving parts, high-speed data transfer, low power consumption, and cool operation, SSD devices have become a popular choice to replace HDDs.

Two SSD technologies are available: multi-level cell (MLC) SSDs, as found in laptops and thumb drives, and single-level cell (SLC) SSDs, as used in enterprise servers. MLC storage stores two bits in each storage cell, while SLC storage stores a single bit per cell, so MLC devices store twice as much data as SLC devices in the same storage footprint. SLC devices, however, are faster and have ten times the life expectancy of MLC devices. Sun enterprise SSD devices use SLC technology.

The experiments described in this article used the Intel X25-E Extreme SATA Solid-State Drive mounted in either a 3.5-inch SATA carrier or a 2.5-inch SAS carrier, similar to that shown in Figure 1. These SSDs deliver very good performance while improving system responsiveness over traditional HDDs in some of the most demanding applications. Sun SSDs are available in both 2.5-inch and 3.5-inch carriers to support a wide variety of Sun rack mount and blade servers. In these two formats, SSDs can be used as drop-in replacements for HDDs, while delivering enhanced performance, reliability, ruggedness, and power savings.


Figure 1. Solid state drive mounted in a 3.5-inch SATA carrier

SSDs provide access to stored data via traditional read operations and store data via traditional write operations, so no modifications are required to applications that access data on HDDs. SSDs are much faster and provide greater data throughput than HDDs, because there are no rotating platters, moving heads, fragile actuators, spin-up times, or positional seek times. The SSDs employed by Sun use native SATA interface connections, so they do not require any modification to the hardware interface when placed in Sun servers. Behind the native SATA interface, each SSD provides built-in parallel NAND channels to the flash memory cells (Figure 2). This architecture provides much greater performance than traditional HDDs without modification to applications. SSDs also support native command queueing (NCQ), lowering latency and increasing I/O bandwidth. Sun SSDs incorporate a wear-leveling algorithm for higher data reliability and provide a life expectancy of two million hours Mean Time Between Failures (MTBF).
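As a quick sanity check before benchmarking, NCQ support and raw sequential throughput can be verified from the operating system. This is a minimal sketch, not part of the original test procedure: the device name /dev/sdc and mount point /mnt/ssd are hypothetical and should be replaced with the actual SSD device and file system.

# Confirm that the drive advertises NCQ (device name is an assumption)
hdparm -I /dev/sdc | grep -i queue
# Check the command queue depth the kernel is currently using
cat /sys/block/sdc/device/queue_depth
# Rough sequential-write check that bypasses the page cache
dd if=/dev/zero of=/mnt/ssd/testfile bs=1M count=2048 oflag=direct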


Figure 2. SSDs use a native SATA interface, but provide fast parallel NAND channels (the SATA interface connects through an Intel system-on-a-chip (SOC) to flash memory channels 0 through n, each attached to NAND flash memory)

Single system application performance


To evaluate the use of SSDs in HPC environments, Sun first compared HPC application run times on Sun Fire servers using traditional HDDs and SSDs. These comparisons were made using FEA applications from the mechanical computer-aided engineering (MCAE) domain. These FEA applications run computer models of the designs and materials used in real engineering analysis. More important for this study, these applications are the basis for a number of well-known application benchmarks that are widely used to evaluate HPC systems. In this study, the results of similar computations using HDD and SSD configurations were compared. If these application benchmarks run significantly faster in the SSD configuration, then running these applications with data from real user models might show similar gains. The FEA benchmark applications used in this report are:

• ABAQUS
• NASTRAN
• ANSYS
Note: For details of benchmarks and benchmark configurations, see the Appendix beginning on page 22.


The ABAQUS benchmark application


Runs were made on a Sun Fire X4450 server [1] using from one to four cores per job with the ABAQUS standard test suite. (Please see ABAQUS standard benchmark test cases on page 22 for details.) This suite features "large" models that require:

• A large number of degrees of freedom
• Large amounts of memory
• A substantial I/O component

The Sun Fire X4450 server with four 2.93 GHz quad-core Intel Xeon Processor X7350 CPUs demonstrated a substantial performance improvement using Sun SSDs as compared to traditional HDDs. The performance advantage generally increased in concert with the system load (increased number of active cores). Table 1 illustrates the overall comparisons for the ABAQUS standard test suite runs.
Table 1. ABAQUS standard benchmark test suite: HDDs vs. SSDs (test names are suffixed with the number of sockets and cores used)

Test       HDD time (sec)  SSD time (sec)  Ratio HDD:SSD  Improvement
S2a-1,1     2787            2464           1.13            12%
S2a-2,2     1659            1298           1.28            22%
S2a-4,4      949             709           1.34            25%
S2b-1,1     3074            3111           0.99            -1%
S2b-2,2     1684            1753           0.96            -4%
S2b-4,4     1608            1606           1.00             0%
S4a-1,1      679             613           1.11            10%
S4a-2,2      628             419           1.50            33%
S4a-4,4      480             303           1.58            37%
S4b-1,1    11698            8115           1.44            31%
S4b-2,2     6162            4520           1.36            27%
S4b-4,4     3734            2655           1.41            29%
S4c-1,1     6608            6743           0.98            -2%
S4c-2,2     5499            4571           1.20            17%
S4c-4,4     4073            3509           1.16            14%
S5-1,1      1708            1051           1.63            38%
S5-2,2      1345             675           1.99            50%
S5-4,4      1069             456           2.34            57%
S6-1,1      9040            7175           1.26            21%
S6-2,2      6128            4741           1.29            23%
S6-4,4      4864            3520           1.38            28%

As shown in Figure 3, the use of SSDs improves performance in nearly all cases, with increases as great as two times the HDD baseline in the S5 test.
[1] For details of the Sun Fire X4450 server, please see http://www.sun.com/servers/x64/x4450/


Figure 3. SSDs provided up to two times the performance of HDDs in the ABAQUS tests (bar chart of run time in seconds for each test - # sockets, # cores)

Sun used the following configuration for the ABAQUS test comparisons.

Hardware configuration:
• Sun Fire X4450 server
• Four 2.93 GHz quad-core Intel Xeon Processor X7350 CPUs
• Four 15,000 RPM 500 GB SAS drives
• Three 32 GB SSDs

The system was set up to boot off of one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across three SAS HDDs. For comparative purposes, the SSD-based file system was configured across three SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 1
• ABAQUS V6.8-1 Standard Module
• ABAQUS 6.7 Standard Benchmark Test Suite
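For reference, ABAQUS directs its scratch I/O with the scratch parameter on the command line. A minimal sketch, assuming the striped SSD file system is mounted at /ssd (the mount point is hypothetical; the job name matches one of the benchmark input files):

# Run the S4b benchmark on four cores with scratch I/O on the SSD file system
abaqus job=s4b cpus=4 scratch=/ssd/scratch interactive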



The NASTRAN benchmark application


NASTRAN is an FEA program that was originally developed for NASA (National Aeronautics and Space Administration) in the late 1960s under United States government funding for the aerospace industry. NASTRAN is widely used throughout the world in the aerospace, automotive, and maritime industries. The MSC/NASTRAN test suite was used to compare the performance of a Sun Fire server using either HDDs or SSDs.

Runs were made on a Sun Fire X2270 server [2] using from one to eight cores per job with the MSC/NASTRAN Vendor_2008 benchmark test suite. (Please see NASTRAN benchmark test cases on page 27 for details.) In some cases only one core was used, since some test cases do not scale well beyond that point. A few scaled well up to four cores, and the rest scaled well up to the eight cores that were used for this report. The test cases for the MSC/NASTRAN module have a substantial I/O component, with from 15% to 25% of the total run times associated with I/O activity (primarily scratch files).

The Sun Fire X2270 server equipped with two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs demonstrated a substantial performance improvement using Sun SSDs as compared to HDDs. The performance increased in concert with the system load (increased number of active cores). As Table 2 shows, the eight-core xx0cmd2 and xl0tdf1 MSC/NASTRAN benchmark tests ran at more than twice the overall speed with SSDs. In general, as the number of cores increased, the overall clock time of the runs decreased for the majority of the test suite. The xx0cmd2 and xl0tdf1 eight-core runs each improved performance by more than 54%.

[2] For more information on the Sun Fire X2270 server, please see http://www.sun.com/servers/x64/x2270/


Table 2. MSC/NASTRAN Vendor_2008 test suite: HDD vs. SSD (Sun Fire X2270 server; test names are suffixed with the number of cores used)

Test-cores   HDD time (sec)  SSD time (sec)  Ratio HDD:SSD  Improvement
vl0sst1-1     127             126            1.01            0.79%
xx0cmd2-1     895             884            1.01            1.23%
xx0cmd2-2     614             583            1.05            5.05%
xx0cmd2-4     631             404            1.56           35.97%
xx0cmd2-8    1554             711            2.19           54.25%
xl0tdf1-1    2000            1939            1.03            3.05%
xl0tdf1-2    1240            1189            1.04            4.11%
xl0tdf1-4     833             751            1.11            9.84%
xl0tdf1-8    1562             712            2.19           54.42%
sol400_1-1   2479            2402            1.03            3.11%
sol400_S-1   2450            2262            1.08            7.67%
getrag-1      843             817            1.03            3.08%

The testing shows a significant gain in productivity for MSC/NASTRAN when using SSDs. As seen in Figure 4, the MSC/NASTRAN test suite demonstrates significant improvement in clock time in nearly all cases, with a gain of more than two times in the xx0cmd2 and xl0tdf1 tests with eight cores.


Figure 4. SSDs improved performance by more than 54% in the xx0cmd2 and xl0tdf1 MSC/NASTRAN tests (bar chart of run time in seconds for each test - # cores)

Sun used the following configuration for the NASTRAN test comparisons:

Hardware configuration:
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Three 7200 RPM SATA HDDs
• Two 32 GB SSDs

The system was set up to boot off of one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across both SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 1
• MSC/NASTRAN MD 2008
• MSC/NASTRAN Vendor_2008 Benchmark Test Suite
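For reference, an MSC/NASTRAN job's scratch files can be directed at the SSD-backed file system with the sdirectory keyword on the command line. A minimal sketch, assuming the striped SSD file system is mounted at /ssd (the mount point and memory setting are hypothetical):

# Run the xl0tdf1 benchmark with scratch files on the SSD file system
nastran xl0tdf1 scratch=yes sdirectory=/ssd/scratch memory=520mb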


The ANSYS benchmark application


ANSYS is a general-purpose FEA modeling package used widely in industry. The ANSYS BMD test suite was used to acquire this data. (Please see ANSYS 12.0 (prel. 7) with ANSYS 11.0 distributed benchmarks on page 29 for details.) This test suite was used to compare the performance of a Sun Fire server equipped with HDDs and SSDs. Runs were made on a Sun Fire X2270 server using eight cores per job. The test cases have a substantial I/O component, with 15% to 20% of the total run times associated with I/O activity (primarily scratch files).

The Sun Fire X2270 server equipped with two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs demonstrated a substantial performance improvement using Sun SSDs as compared to traditional HDDs. One of the most I/O-intensive cases in the ANSYS BMD test suite is the bmd-4 case, and this test case in particular showed a significant increase in overall performance and productivity: the same test running with HDDs took 2.78 times longer to complete than when the system was equipped with SSDs. Table 3 illustrates the overall comparisons for the ANSYS BMD test suite runs:
Table 3. The HDD-based system required as much as 2.78 times longer than the SSD-based system on the ANSYS BMD test suite (eight-core runs, Sun Fire X2270 server)

Test    HDD time (sec)  SSD time (sec)  Ratio HDD:SSD  Improvement
bmd-1    39              26             1.50           33.33%
bmd-2   117              84             1.39           28.21%
bmd-3    68              66             1.03            2.94%
bmd-4   703             253             2.78           64.01%
bmd-5   298             285             1.05            4.36%
bmd-6   297             292             1.02            1.68%
bmd-7   293             212             1.38           27.65%

As shown in Figure 5, the ANSYS BMD test suite bmd-4 runs using SSDs yield 2.78 times the overall speed and productivity of the same test using HDDs. The bmd-4 eight-core run improved by 64.01% simply by using SSDs.


Figure 5. The bmd-4 test eight-core run improves performance by 64.01% (bar chart of run time in seconds for tests bmd-1 through bmd-7)

The testing shows a substantial boost in productivity for ANSYS when using SSDs. Sun used the following configuration for the ANSYS test comparisons described in this report:

Hardware configuration:
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Two 32 GB SSDs
• Three 7200 RPM SATA 500 GB HDDs

The system was set up to boot from one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across two SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 2
• ANSYS V 12.0 Prerelease 7
• ANSYS 11 Distributed BMD Benchmark Test Suite
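For reference, a distributed ANSYS batch run of one of these benchmarks might be launched as follows, with the working directory placed on the SSD-backed file system. A minimal sketch: the executable name ansys120, the input file name, and the /ssd mount point are assumptions that vary by installation.

# Run the bmd-4 benchmark distributed across eight cores,
# with scratch and result files written to the SSD file system
cd /ssd/scratch
ansys120 -b -dis -np 8 -i bmd-4.inp -o bmd-4.out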


Summary for single system application performance


These tests have demonstrated that the use of SSDs can lead to overall improvement in performance for HPC MCAE applications when compared to the same applications run on HDDs. This improvement was seen using both SAS-based and SATA-based configurations, and was most marked in I/O-bound applications. It is also important to note that the use of SSDs can result in a reduction of overall power consumption: the systems run cooler and have less of an impact on the environment.

This testing demonstrated that the greatest reduction in wall-clock time, and improvement in productivity, is associated with benchmark applications that have the most significant I/O component. In cases where the I/O load is lighter, as expected, the performance improvement is more limited. In the case of the ANSYS bmd-5 benchmark, for example, if sufficient memory is available, the solver can run in memory. In this case, no I/O is required at all, and the improvement in I/O bandwidth has little or no effect on the performance of the benchmark application. Thus, it is important to consider the I/O requirements of a particular application when considering the use of SSDs to improve performance.

SSD usage with the Lustre parallel file system


The Lustre parallel file system is an open source, shared file system designed to address the I/O needs of the largest and most demanding compute clusters. The Lustre parallel file system is best known for powering the largest HPC clusters in the world, with tens of thousands of client systems, petabytes of storage, and hundreds of gigabytes per second of I/O throughput. A number of HPC sites use the Lustre file system as a site-wide global file system, servicing clusters on an exceptional scale. The Lustre file system is used by over 40% of the Top 100 supercomputers in the November 2008 ranking by top500.org. Additionally, IDC lists the Lustre file system as the file system with the largest market share in HPC [3].

With the mass adoption of clusters and the explosive growth of data storage needs, I/O bandwidth challenges are becoming common in a variety of public and private sector environments. The Lustre file system is a natural fit for situations where traditional shared file systems, such as NFS, do not scale to the required aggregate throughput. Sectors struggling with this challenge include oil and gas, manufacturing, government, and digital content creation (DCC).

[3] HPC User Forum Survey, 2007. HPC Storage and Data Management: User/Vendor Perspectives and Survey Results.

Lustre file system design


The Lustre file system (Figure 6) is a software-only architecture that allows a number of different hardware implementations. The main components of the Lustre file system architecture are Lustre file system clients (Lustre clients), Metadata Servers (MDS), and Object Storage Servers (OSS). Lustre clients are typically compute nodes in HPC clusters. These nodes run Lustre client software, and access the Lustre file system via InfiniBand, Gigabit Ethernet, or 10 Gigabit Ethernet connections. The Lustre file system client software presents a native POSIX file interface to the client nodes on which it runs. The Lustre file system is then mounted like any other file system. Metadata Servers and Object Storage Servers implement the file system and communicate with the Lustre clients.

Figure 6. The Lustre file system (clients connect over InfiniBand or Ethernet, with multiple networks supported simultaneously, to active/standby Metadata Servers (MDS) backed by commodity storage and to failover pairs of Object Storage Servers (OSS) backed by direct-connect storage arrays or enterprise storage arrays and SAN fabrics)

The Lustre file system uses an object-based storage model, and provides several abstractions designed to improve both performance and scalability. At the file system level, Lustre file system technology treats files as objects that are located through metadata servers. Metadata servers support all file system name space operations, such as file lookups, file creation, and file and directory attribute manipulation. File data is stored in objects on the OSSs. The MDS directs actual file I/O requests from Lustre file system clients to OSSs, which ultimately manage


the storage that is physically located on underlying storage devices. Once the MDS identifies the storage location of a file, all subsequent file I/O is performed between the client and the OSS. This design divides file system updates into two distinct types of operations: file system metadata updates on the MDS, and actual file data updates on the OSS. Separating file system metadata operations from file data operations not only improves immediate performance, but also improves long-term aspects of the file system such as recoverability and availability.

The Lustre file system implementation supports InfiniBand or Gigabit Ethernet interconnects, redundant metadata servers, and a choice of commodity storage for use on the Object Storage Servers. This can include:

• Simple disk storage devices (colloquially, "just a bunch of disks", or JBOD)
• High-availability direct storage
• Enterprise SANs

Because the Lustre file system is so flexible, it can be used in place of a shared SAN for enterprise storage requirements. However, the Lustre file system is also well suited to use in a traditional SAN environment. Lustre file system clusters are composed of rack mount or blade server clients, metadata servers, and object storage servers. The Lustre file system runs on Sun's Open Network Systems Architecture.

Note: For more information on the Lustre file system, see http://wiki.lustre.org/ and http://www.sun.com/software/products/lustre/.

With the previously documented success on single system runs with SSDs, Sun has begun to explore using SSDs with the Lustre file system. Testing within Sun has been performed with a cluster deployed through the use of Sun HPC Software, Linux Edition [4]. This software fully integrates the Lustre file system as an open software component. It also includes OFED (OpenFabrics Enterprise Distribution) software for Mellanox InfiniBand support. The test cluster configuration included:

• One Sun Fire X2250 server configured as a Lustre file system client
• One Sun Fire X2250 server configured as an MDS
• Two Sun Blade X6250 server modules [5] configured with HDDs and SSDs as OSSs, in a Sun Blade 6000 Modular System enclosure [6]
• One Dual Data Rate (DDR) InfiniBand network

[4] http://www.sun.com/software/products/hpcsoftware/index.xml
[5] http://www.sun.com/servers/blades/x6250/
[6] http://www.sun.com/servers/blades/6000/


After the systems were provisioned with the Sun HPC Software, Linux Edition, a Lustre file system was created using the commands below. The following commands were executed to configure the MDS server.
mkfs.lustre --fsname=testfs --mgs --mdt /dev/sdb
mkdir /mdt
mount -t lustre /dev/sdb /mdt

The first Sun Blade X6250 server module acting as an OSS was configured with file systems based on both an HDD and an SSD.
mkfs.lustre --fsname=testfs --ost --mgsnode=v6i@o2ib /dev/sda
mkfs.lustre --fsname=testfs --ost --mgsnode=v6i@o2ib /dev/sdc
mkdir /ostsdahdd
mkdir /ostsdcssd
mount -t lustre /dev/sda /ostsdahdd
mount -t lustre /dev/sdc /ostsdcssd

The second Sun Blade X6250 server module acting as an OSS was configured with a single SSD-based file system.
mkfs.lustre --fsname=testfs --ost --mgsnode=v6i@o2ib /dev/sdc
mkdir /ostsdcssd
mount -t lustre /dev/sdc /ostsdcssd

The Lustre file system client was then configured to access the various HDD-based and SSD-based file systems for testing.
mkdir /lustrefs
mount -t lustre v6i@o2ib:/testfs /lustrefs
mkdir /lustrefs/st1hdd
mkdir /lustrefs/st1ssd
mkdir /lustrefs/st2ssd
lfs setstripe /lustrefs/st1hdd -c 1 -s 1m -i 1
lfs setstripe /lustrefs/st1ssd -c 1 -s 1m -i 2
lfs setstripe /lustrefs/st2ssd -c 2 -s 1m -i 2

Note: The Lustre file system command lfs setstripe was used on specific directories (st1hdd, st1ssd, st2ssd) to direct I/O to specific HDDs and SSDs for the data contained in this report.

In addition, the lfs getstripe command was used to verify that the proper striping was in force and that the specific object storage targets (OSTs) needed to support each test were assigned. The Java Performance statistics monitor (JPerfmeter [7]) was also used to observe which OSS/OST was in use.

[7] http://jperfmeter.sourceforge.net/
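For example, a minimal check of the stripe settings on the test directories created above might look like this (output format varies by Lustre release):

# Report stripe count, stripe size, and the OST indices assigned to each directory
lfs getstripe /lustrefs/st1hdd
lfs getstripe /lustrefs/st2ssd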


IOZone file system testing


The IOZone file system benchmark tool [8] is used to perform broad-based performance testing of file systems, using a synthetic workload with a wide variety of file system operations. IOZone is an independent, portable benchmark that is used throughout the industry. Runs were first made with the HDD-based OSS and the IOZone benchmark in order to establish baseline performance. Similar runs were then made using the SSD-based configuration, again recording performance with IOZone. From the Lustre file system client, several IOZone commands were used to gather data for these tests. The following IOZone command was used to direct traffic to the HDD-based OSS on the first Sun Blade X6250 server module.
iozone -r 1024k -s 2G -i 0 -i 1 -f /lustrefs/st1hdd/iozone2ghdd \
  -Rb /lustrefs/st1hdd/iozone2ghdd.xls \
  -+m /lustrefs/scripts/iozone/iozone3_311/src/current/client_list

The following IOZone command was used to direct traffic to the SSD-based OSS on the first Sun Blade X6250 server module.
iozone -r 1024k -s 2G -i 0 -i 1 -f /lustrefs/st1ssd/iozone2gssd \
  -Rb /lustrefs/st1ssd/iozone2gssd.xls \
  -+m /lustrefs/scripts/iozone/iozone3_311/src/current/client_list

The following IOZone command was used to direct traffic to the two separate SSD-based OSSs on each of the Sun Blade X6250 server modules. This testing was done to verify that scaling would occur with the Lustre file system and multiple SSD-based OSSs.
iozone -r 1024k -s 2G -i 0 -i 1 -f /lustrefs/st2ssd/iozone2g-2-ssd \
  -Rb /lustrefs/st2ssd/iozone2g-2-ssd.xls \
  -+m /tmp/lustrefs/scripts/iozone/iozone3_311/src/current/client_list

Note: These are single command lines, broken here with backslash continuations to fit the page.
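The -+m option names a file describing the clients that participate in the run; each line lists a client hostname, the working directory on that client, and the path to the IOZone executable there. A minimal sketch of such a client_list file, with a hypothetical hostname:

# client_list format: hostname  working-directory  path-to-iozone
node01 /lustrefs/st1hdd /lustrefs/scripts/iozone/iozone3_311/src/current/iozone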

[8] http://www.iozone.org/


Figure 7 shows data write performance with the results of:

• A single HDD-based OSS as the baseline
• A single SSD-based OSS for initial comparison
• Two SSD-based OSSs to verify scaling

Figure 7. Data write performance was greater with SSDs, and scaled with multiple SSD-based OSSs

A Lustre file system using SSDs shows significant advantages over a similar Lustre file system using the baseline HDD configuration:

• Using a Lustre file system configuration with a single OSS using SSDs, runs required only 77.5% of the time required using the baseline HDD configuration. I/O bandwidth was 1.37 times greater using the Lustre file system in a single SSD OSS configuration, compared to the baseline HDD OSS.
• Using a Lustre file system configuration with two OSSs using SSDs, run time was reduced to 41.65% of the time required using the baseline HDD configuration. I/O bandwidth was 2.39 times greater using the Lustre file system with two OSS/SSD devices, compared to the baseline HDD OSS.

These results demonstrate that SSDs provide improved performance when used in an OST for the Lustre file system. Two OSS/SSD devices show further improvement, demonstrating that OSS/SSD performance scales with the number of OSSs.


Sun used the following configuration for the OSS servers in the IOZone test comparisons described in this report:

Hardware configuration:
• Sun Blade X6250 server module
• Two 3.00 GHz quad-core Intel Xeon Processor E5450 CPUs
• 16 GB memory
• One SSD
• One 10,000 RPM SAS HDD

Software configuration:
• CentOS 5.1 x86_64 (Red Hat Enterprise Linux 5.1 compatible)
• Lustre-patched kernel 2.6.18-53.1.14.el5_lustre.1.6.5smp
• The IOZone file system benchmarking tool

Summary for SSD usage with the Lustre parallel file system
This report has shown that SSD-based OSSs can drive I/O faster than traditional HDD-based OSSs. Testing showed, further, that the Lustre file system can scale with the use of multiple SSD-based OSSs. Not only can I/O bandwidth be increased with the use of the Lustre file system and SSDs, but it is anticipated that run times of other applications using a Lustre file system equipped with SSDs can also be reduced.

Future directions
New technology included in the Lustre file system version 1.8 allows pools of storage to be configured based on technology and performance, and then allocated according to the needs of specific jobs. For example, an elite pool of extremely fast SSD storage could be defined along with pools of slower, but higher-capacity, HDD storage. Other pools might be defined to use local devices, SAN devices, or networked file systems. The Lustre file system then allows these pools to be allocated as needed to specific jobs in order to optimize performance based upon service level objectives; a sketch of defining such a pool appears below. Performance studies of a production Lustre file system have been performed at the Texas Advanced Computing Center (TACC), using the scaling capabilities of the Lustre file system to obtain higher performance and thereby reduce the I/O bottleneck. (This work is described in the Sun BluePrints article Solving the HPC I/O Bottleneck: Sun Lustre Storage System [9].) Future work will explore the use of SSDs integrated with new versions of the Lustre file system, Quad Data Rate (QDR) InfiniBand, and Sun's new servers and blades.
[9] http://wikis.sun.com/display/BluePrints/Solving+the+HPC+IO+Bottleneck+-+Sun+Lustre+Storage+System
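A minimal sketch of how such a pool might be defined with the Lustre 1.8 administrative commands; the pool name ssdpool, the OST index, and the client directory are hypothetical:

# On the MGS: create a pool in file system "testfs" and add an SSD-backed OST
lctl pool_new testfs.ssdpool
lctl pool_add testfs.ssdpool testfs-OST0002
# On a client: direct new files under /lustrefs/fast to the SSD pool
lfs setstripe --pool ssdpool /lustrefs/fast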


Conclusion
Use of SSDs with Sun servers and blades has demonstrated significant performance improvements both in single-system runs of FEA HPC application benchmarks and with the Lustre parallel file system. There is significant promise that other applications with similar data throughput needs and workloads will also obtain increased bandwidth as well as reduced run times.


Appendix: Benchmark descriptions and parameters


The results reported in this article make use of a collection of benchmark numerical applications. Each benchmark suite specifies particular data that should be made available so that the benchmarks can be evaluated fairly. This Appendix notes the required details for each of the benchmarks used.

ABAQUS standard benchmark test cases


The problems described below provide an estimate of the performance that can be expected when running ABAQUS/Standard on different computers. The jobs are representative of typical ABAQUS/Standard applications, including linear statics, nonlinear statics, and natural frequency extraction.

S1: Plate with gravity load
This benchmark is a linear static analysis of a plate with gravity loading. The plate is meshed with second-order shell elements of type S8R5 and uses a linear elastic material model. Edges of the plate are fixed. There is no contact.
Input file name: s1.inp
Increments: 1
Iterations: 1
Degrees of freedom: 1,085,406
Floating point operations: 1.89E+011
Minimum memory requirement: 587 MB
Memory to minimize I/O: 2 GB
Disk space requirement: 2 GB

S2: Flywheel with centrifugal load
This benchmark is a mildly nonlinear static analysis of a flywheel with centrifugal loading. The flywheel is meshed using first-order hexahedral elements of type C3D8R and uses an isotropic hardening Mises plasticity material model. There is no contact. The nonlinearity in this problem arises from localized yielding in the vicinity of the bolt holes. Two versions of this benchmark are provided; both are identical except that one uses the direct sparse solver and the other uses the iterative solver.

S2a: Direct solver version
Input file name: s2a.inp
Increments: 6
Iterations: 12
Degrees of freedom: 474,744
Floating point operations: 1.86E+012
Minimum memory requirement: 733 MB
Memory to minimize I/O: 849 MB
Disk space requirement: 4.55 GB


S2b: Iterative solver version
Input file name: s2b.inp
Increments: 6
Iterations: 11
Degrees of freedom: 474,744
Floating point operations: 8.34E+010
Minimum memory requirement: 2.8 GB
Memory to minimize I/O: NA
Disk space requirement: 387 MB

S3: Impeller frequencies
This benchmark extracts the natural frequencies and mode shapes of a turbine impeller. The impeller is meshed with second-order tetrahedral elements of type C3D10 and uses a linear elastic material model. Frequencies in the range from 100 Hz to 20,000 Hz are requested. Three versions of this benchmark are provided: a 360,000 DOF version that uses the Lanczos eigensolver, a 1,100,000 DOF version that uses the Lanczos eigensolver, and a 1,100,000 DOF version that uses the AMS eigensolver.

S3a: 360,000 DOF Lanczos eigensolver version
Input file name: s3a.inp
Degrees of freedom: 362,178
Floating point operations: 3.42E+11
Minimum memory requirement: 384 MB
Memory to minimize I/O: 953 MB
Disk space requirement: 4.0 GB

S3b: 1,100,000 DOF Lanczos eigensolver version
Input file name: s3b.inp
Degrees of freedom: 1,112,703
Floating point operations: 3.03E+12
Minimum memory requirement: 1.33 GB
Memory to minimize I/O: 3.04 GB
Disk space requirement: 23.36 GB

S3c: 1,100,000 DOF AMS eigensolver version
Input file name: s3c.inp
Degrees of freedom: 1,112,703
Floating point operations: 3.03E+12
Minimum memory requirement: 1.33 GB
Memory to minimize I/O: 3.04 GB
Disk space requirement: 19.3 GB


S4: Cylinder head bolt-up
This benchmark is a mildly nonlinear static analysis that simulates bolting a cylinder head onto an engine block. The cylinder head and engine block are meshed with tetrahedral elements of types C3D4 or C3D10M, the bolts are meshed using hexahedral elements of type C3D8I, and the gasket is meshed with special-purpose gasket elements of type GK3D8. Linear elastic material behavior is used for the block, head, and bolts, while a nonlinear pressure-overclosure relationship with plasticity is used to model the gasket. Contact is defined between the bolts and head, the gasket and head, and the gasket and block. The nonlinearity in this problem arises both from changes in the contact conditions and from yielding of the gasket material as the bolts are tightened. Three versions of this benchmark are provided: a 700,000 DOF version that is suitable for use with the direct sparse solver on 32-bit systems, a 5,000,000 DOF version that is suitable for use with the direct sparse solver on 64-bit systems, and a 5,000,000 DOF version that is suitable for use with the iterative solver on 64-bit systems.

S4a: 700,000 DOF direct solver version
Input file name: s4a.inp
Increments: 1
Iterations: 5
Degrees of freedom: 720,059
Floating point operations: 5.77E+11
Minimum memory requirement: 895 MB
Memory to minimize I/O: 3 GB
Disk space requirement: 3 GB

S4b: 5,000,000 DOF direct solver version
Input file name: s4b.inp
Increments: 1
Iterations: 5
Degrees of freedom: 5,236,958
Floating point operations: 1.14E+13
Minimum memory requirement: 4 GB
Memory to minimize I/O: 20 GB
Disk space requirement: 23 GB

S4c: 5,000,000 DOF iterative solver version
Input file name: s4c.inp
Increments: 1
Iterations: 3
Degrees of freedom: 5,248,154
Floating point operations: 3.74E+11


Minimum memory requirement: 16 GB
Memory to minimize I/O: NA
Disk space requirement: 3.3 GB

S5: Stent expansion
This benchmark is a strongly nonlinear static analysis that simulates the expansion of a medical stent device. The stent is meshed with hexahedral elements of type C3D8 and uses a linear elastic material model. The expansion tool is modeled using surface elements of type SFM3DR. Contact is defined between the stent and expansion tool. Radial displacements are applied to the expansion tool, which in turn cause the stent to expand. The nonlinearity in this problem arises from large displacements and sliding contact.
Input file name: s5.inp
Increments: 21
Iterations: 91
Degrees of freedom: 181,692
Floating point operations: 1.80E+009
Minimum memory requirement: NA
Memory to minimize I/O: NA
Disk space requirement: NA

Note: Abaqus, Inc. would like to acknowledge Nitinol Devices and Components for providing the original finite element model of the stent. The stent model used in this benchmark is not representative of current stent designs.

S6: Tire footprint
This benchmark is a strongly nonlinear static analysis that determines the footprint of an automobile tire. The tire is meshed with hexahedral elements of types C3D8, C3D6H, and C3D8H. Linear elastic and hyperelastic material models are used. Belts inside the tire are modeled using rebar layers and embedded elements. The rim and road surface are modeled as rigid bodies. Contact is defined between the tire and wheel and between the tire and road surface. The analysis sequence consists of three steps: during the first step the tire is mounted to the wheel, during the second step the tire is inflated, and during the third step a vertical load is applied to the wheel. The nonlinearity in the problem arises from large displacements, sliding contact, and hyperelastic material behavior.
Input file name: s6.inp
Increments: 41
Iterations: 177
Degrees of freedom: 729,264
Floating point operations: NA


Minimum memory requirement: 397 MB
Memory to minimize I/O: 940 MB
Disk space requirement: NA

Hardware configuration:
• Sun Fire X4450 server
• Four 2.93 GHz quad-core Intel Xeon Processor X7350 CPUs
• Four 15,000 RPM 500 GB SAS drives
• Three 32 GB SSDs

The system was set up to boot off of one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across three SAS HDDs. For comparative purposes, the SSD-based file system was configured across three SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 1
• ABAQUS V6.8-1 Standard Module
• ABAQUS 6.7 Standard Benchmark Test Suite


NASTRAN benchmark test cases


The problems described below are representative of typical MSC/NASTRAN applications, including both SMP and DMP runs involving linear statics, nonlinear statics, and natural frequency extraction.

vl0sst1
Degrees of freedom: 410,889
Solver: SOL 101
Memory usage: 7.3 MB
Maximum disk usage: 4.33 GB
Run time is sensitive to the memory allocated to the job:
  2:04:36 elapsed w/ mem=37171200
  4:35:26 elapsed w/ mem=160mb sys1=32769
  5:20:12 elapsed w/ mem=80mb sys1=32769
  1:11:58 elapsed w/ mem=1600mb bpool=40000
(This job does extensive post-solution processing of GPSTRESS I/O.)

xx0cmd2
Degrees of freedom: 1,315,562
Solver: SOL 103 normal modes with ACMS - DOMAINSOLVER ACMS (Automated Component Modal Synthesis)
Memory usage: 1800 MB
Maximum disk usage: 14.422 GB

xl0tdf1
Degrees of freedom: 529,257
Solver: SOL 108, fluid/solid interaction, car cabin noise - full vehicle system model; eigenvalue extraction - direct frequency response
Memory usage: 520 MB
Maximum disk usage: 5.836 GB

xl0imf1
Degrees of freedom: 468,233
Fluid/solid interaction, frequency response
Memory usage: 503 MB
Maximum disk usage: 10.531 GB


md0mdf1
Degrees of freedom: 42,066
Exterior acoustics modal frequency response analysis with UMP Pack; fluid/solid interaction
Memory usage: 1 GB
Maximum disk usage: 414.000 MB

sol400_1 & sol400_S
Degrees of freedom: 437,340
Solver: SOL 400 (MARC module), nonlinear static analysis
Memory usage: 1.63 GB
Maximum disk usage: 3.372 GB
(The S model sets aside 3 GB of physical memory for I/O buffering.)

getrag (contact model)
Degrees of freedom: 2,450,320
Solver: SOL 101, with the PCGLSS 6.0 linear equations solver
Memory usage: 8.0 GB
Maximum disk usage: 17.847 GB
Total I/O: 139 GB

Hardware configuration:
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Three 7200 RPM SATA 500 GB HDDs
• Two 32 GB SSDs

The system was set up to boot from one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across both SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 1
• MSC/NASTRAN MD 2008
• MSC/NASTRAN Vendor_2008 Benchmark Test Suite


ANSYS 12.0 (prel. 7) with ANSYS 11.0 distributed benchmarks


bmd-1
Dsparse (distributed sparse) solver, 400K DOF, static analysis. A medium-sized job that should run in-core on all systems.

bmd-2
1M DOF iterative solver job. Shows good scaling due to a simple preconditioner.

bmd-3
2M DOF static analysis. Shows good parallel performance for the iterative solver. Uses the PCG iterative solver and the cache-friendly msave,on feature.

bmd-4
Larger dsparse solver job, 3M DOF; a tricky job for dsparse when memory is limited. Shows I/O as well as CPU performance. Good for showing the benefit of large memory.

bmd-5
5.8M DOF large PCG solver job. Good parallel performance for the iterative solver on a larger job. Uses cache-friendly msave,on elements.

bmd-6
1M DOF lanpcg: uses an assembled matrix with the PCG preconditioner. New iterative modal analysis solver chosen to maximize speedups.

bmd-7
5M DOF static analysis using solid45 elements, which are not msave,on elements. The best test of memory bandwidth performance; a lower mflop rate is expected because of the sparse matrix/vector kernel.


Hardware configuration:
• Sun Fire X2270 server
• Two 2.93 GHz quad-core Intel Xeon Processor X5570 CPUs
• 24 GB memory
• Two 32 GB SSDs
• Three 7200 RPM SATA 500 GB HDDs

The system was set up to boot from one of the hard disk drives. The baseline hard-disk-based file system was set to stripe across two SATA HDDs. For comparative purposes, the SSD-based file system was configured across both SSDs.

Software configuration:
• 64-bit SUSE Linux Enterprise Server (SLES) 10 SP 2
• ANSYS V 12.0 Prerelease 7
• ANSYS 11 Distributed BMD Benchmark Test Suite

About the authors


Larry McIntosh is a Principal Systems Engineer at Sun Microsystems and works within Sun's Systems Engineering Solutions Group. He is responsible for designing and implementing high performance computing technologies at Sun's largest customers. Larry has 35 years of experience in the computer, communications, and storage industries, and has been a software developer and consultant in the commercial, government, education, and research sectors, as well as a computer science college professor. Larry's recent work has included the deployment of the Ranger system serving the National Science Foundation and researchers at the Texas Advanced Computing Center (TACC) in Austin, Texas.

Michael Burke obtained his Ph.D. from Stanford University. Since then he has spent over 35 years in the development and application of MCAE software. He was the principal developer of the MARC code, now owned by MSC Software. Following the Space Shuttle Challenger disaster, he developed FANTASTIC (Failure Analysis Thermal and Structural Integrated Code) for NASA and its suppliers and contractors for the analysis of rocket nozzles. More recently he has been involved with the benchmarking of state-of-the-art HPC platforms using the more prominent commercial ISV MCAE/CFD/CRASH and other scientific applications. He has performed this benchmarking for Fujitsu and Hewlett-Packard, and is currently in the Strategic Applications Engineering group at Sun Microsystems.


References
Web sites:
• Sun Fire X4450 server: http://www.sun.com/servers/x64/x4450/
• Sun Fire X2270 server: http://www.sun.com/servers/x64/x2270/
• Sun HPC Software, Linux Edition: http://www.sun.com/software/products/hpcsoftware/index.xml
• Sun Blade X6250 server module: http://www.sun.com/servers/blades/x6250/
• Sun Blade 6000 Modular System chassis: http://www.sun.com/servers/blades/6000/
• JPerfMeter: http://jperfmeter.sourceforge.net/
• IOZone benchmark: http://www.iozone.org/

Sun BluePrints articles:
• Solving the HPC I/O Bottleneck: Sun Lustre Storage System: http://wikis.sun.com/display/BluePrints/Solving+the+HPC+IO+Bottleneck+-+Sun+Lustre+Storage+System

Ordering Sun Documents


The SunDocsSM program provides more than 250 manuals from Sun Microsystems, Inc. If you live in the United States, Canada, Europe, or Japan, you can purchase documentation sets or individual manuals through this program.

Accessing Sun Documentation Online



The docs.sun.com Web site enables you to access Sun technical documentation online. You can browse the docs.sun.com archive or search for a specific book title or subject. The URL is http://docs.sun.com. To reference Sun BluePrints Online articles, visit the Sun BluePrints Online Web site at http://www.sun.com/blueprints/online.html


Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 USA Phone 1-650-960-1300 or 1-800-555-9SUN (9786) Web sun.com
© 2009 Sun Microsystems, Inc. All rights reserved. Sun, Sun Microsystems, the Sun logo, Java, Sun Blade, and Sun Fire are trademarks or registered trademarks of Sun Microsystems, Inc. or its subsidiaries in the United States and other countries. Intel Xeon is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries. Information subject to change without notice. Printed in USA 06/2009