
Copyright IBM Corp. 2013. All rights reserved.

Draft Document for Review February 7, 2014 7:33 pm 8189intro.fm
Chapter 1. FlashSystem Storage
Introduction
Flash technology in the data center is too relevant to be ignored for a few simple reasons:
- Since its introduction, flash storage has improved across all metrics, offering higher
  performance, density, and reliability, all of which translate to improved business efficiency.
- Flash cost per capacity and cost per transaction relative to hard disk based storage make
  it very attractive to businesses that are attempting to keep pace in a 24x7 competitive
  marketplace.
- Flash is easily integrated into existing data center environments and provides an instant
  boost to mission-critical applications.
While flash in storage is pervasive in the data center, its implementation varies considerably
among competitors and technologies. Some use it as a simple cache accelerator, while
others implement it as yet another permanent data tier. The reality is that flash only matters
when two conditions in the data center are met:
- Flash eliminates I/O bottlenecks while generating higher levels of application efficiency
  (improved performance).
- Storage economics are improved by its use; that is, it provides lower TCO (cost
  reduction) and faster ROI to the existing environment (enabling new business
  opportunities).
IBM FlashSystem storage delivers high performance, efficiency, and reliability for shared
enterprise storage environments. It helps clients address performance issues with their most
important applications and infrastructure.
This chapter provides an introduction to the IBM FlashSystem storage system and its core value,
benefits, and technological advantages.
Implementing IBM FlashSystem 840 Storage
1.1 FlashSystem Storage Overview
Flash technology has fundamentally changed the paradigm for IT systems, enabling new use
cases and unlocking the scale of enterprise applications. Flash technology enhances the
performance, efficiency, reliability, and design of essential enterprise applications and
solutions by addressing the bottleneck in the IT process (data storage), enabling a truly
optimized information infrastructure.
The IBM FlashSystem shared flash storage systems offer affordable, high-density, ultra
low-latency, high-reliability, and scalable performance in a storage device that is both space
and power efficient. IBM Flash products, which can either augment or replace traditional
hard-drive storage systems in enterprise environments, empower applications to work faster
and scale further.
In addition to optimizing performance, the IBM FlashSystem family brings enterprise reliability
and macro efficiency to the most demanding data centers, allowing businesses to:
- Reduce customer complaints by improving application response time
- Service more users with less hardware
- Reduce I/O wait and response times of critical applications
- Simplify solutions
- Reduce power and floor space requirements
- Speed up applications, thereby enhancing pace of business
- Improve utilization of existing infrastructure
- Complement existing infrastructure
- Mitigate risk
From the client business perspective, an IBM FlashSystem provides focused benefits and value
in four essential areas:
- Extreme Performance: Enable businesses to unleash the power of performance, scale,
  and insight to drive services and products to market faster.
- MicroLatency: Achieve competitive advantage through applications that enable
  faster decision making due to microsecond response times.
- Macro Efficiency: Decrease costs by getting more from efficient use of the IT staff,
  IT applications, and IT equipment due to the efficiencies flash brings to the data center.
- Enterprise Reliability: Durable and reliable designs that use enterprise-class flash and
  patented data protection technology.
1.2 Why Flash matters
A recent International Data Corporation (IDC) study expects the flash market to grow at about
a 27% compound annual rate from 2011 to 2016, and expects the amount of flash memory
shipped to grow by a factor of 20, reaching almost 3,000 PB (3 exabytes) by 2016.
Clearly, this is a vibrant and fast-growing technology. Clients are looking to solve data
center problems, optimize applications, reduce costs, and grow their businesses.
Here are some reasons why flash is a must in every data center, and why an IBM
FlashSystem changes the storage economics:
- Reduce application and server licensing costs, especially those related to databases and
  virtualization solutions.
- Improve application efficiency, that is, an application's ability to process, analyze, and
  manipulate more information, faster.
- Improve server efficiency. Get more out of your existing processors, use less RAM per
  server, and consolidate operations by having server resources spend more time
  processing data as opposed to waiting for data.
- Improve storage operations. Eliminate costly application tuning, wasted developer cycles,
  storage array hot spots, array tuning, and complex troubleshooting. Decrease floor space
  usage and energy consumption by improving overall storage environment performance.
- Get the best performance for critical applications by providing the lowest latency in the market.
Almost all technological components in the data center are getting faster: central processing
units, networks, SAN, and memory. All of them have improved their speeds by a minimum of
10X; some, such as data networks, by 100X. However, spinning disk has increased its
performance by only 1.2 times.
IBM FlashSystem 840 provides benefits that include better user experience, server and
application consolidation, development cycle reduction, application scalability, data center
footprint savings, and improved price/performance economics.
Flash improves the performance of applications that are critical to the user experience, such as
market analytics and research applications, trading and data analysis interfaces, simulation,
modeling, rendering, and so forth. Server and application consolidation is possible due to the
increased process utilization resulting from the low latency of flash memory, which enables a
server to load more users, more databases, and more applications. Flash gives back time for
further processing within the existing resources of such servers. Clients soon realize that
there is no need to acquire or expand server resources as often or as soon as was
previously expected.
Development cycle reduction is possible as developers spend less time designing an
application to work around the inefficiencies of hard disk drives and less time tuning for
performance.
Data center footprint savings are due to high density and high performance per density flash
solutions replacing racks of spinning hard disk drives. Reducing the data center footprint also
translates into power and cooling savings, making flash one of the greenest technologies for
the data center.
Note: Improved price/performance economics are due to the very low cost for performance
from FlashSystem. The cost savings come from deploying fewer storage enclosures, fewer
disk drives, and fewer servers with fewer processors and less RAM, while using less power,
space, and cooling. Flash is one of the best tools the data center manager has for
improving data center economics.
1.3 IBM FlashSystem family: Product Differentiation
Flash is used widely in the data center, either within a server (PCIe cards or internal SSDs), in
storage arrays (hybrid or all-flash), or in appliances or platform bundles/solutions
(hardware/software/network). Flash can be used as cache or as a data tier. The bottom line is
that due to the wide adoption of flash, there are a number of different flash
architectures, and therefore a number of criteria that can be applied to compare flash options. See
Figure 1-1.
Figure 1-1 The different deployments of Flash
Most storage vendors use and promote flash. The difference is how it is implemented, and the
impact that such implementation has on the economics (cost reduction and revenue
generation) for clients.
Flash technology is used to eliminate the storage performance bottleneck. The IBM
FlashSystem family is the shared-storage market leader, and it provides the lowest latency
and consistent response times. It is purpose-built and designed from the ground up for flash.
Many of IBM's competitors in the market create flash appliances based on commodity server
platforms and use software-heavy stacks. Many of these suppliers use hardware technologies
designed and created for disk, not flash. Some hybrid arrays combine legacy storage designs,
spinning disk (HDD), and solid state disk (SSD). Solutions that rely on SSD are by no
means inferior, because the IBM storage portfolio also includes SSD and flash on a variety of
storage platforms. However, these alternate solutions do not have the same low latency
(MicroLatency) as the hardware-accelerated FlashSystem.
IBM FlashSystem family versus SSD-based storage arrays
Flash memory technologies appeared in traditional storage systems some time ago.
These SSD-based storage arrays help to successfully address the challenge of increasing
I/Os per second needed by applications, and the demand for lower response times in
particular tasks. An implementation example is the IBM Easy Tier technology. Refer to
"Easy Tier" on page 247 for an overview of this technology.
However, these technologies typically rely on flash in the format of Fibre Channel (FC),
serial-attached SCSI (SAS), or Serial Advanced Technology Attachment (SATA) disks, placed in
the same storage system as traditional spinning disks, and using the same resources and
data paths. This approach can limit the advantages of flash technology due to the limitations
of traditional disk storage systems.
[Figure 1-1 labels: comparison criteria are hardware-based versus software-based architecture;
purpose-built for flash or not; flash chip choices; data protection schemes; scalability,
reliability, availability; method of deployment; functionality and features; hybrid or all
flash. IBM FlashSystem: hardware-only data path, FPGA.]
For more information on IBM Easy Tier, refer to the following IBM Redbooks publications:
- Implementing the IBM System Storage SAN Volume Controller V6.3, SG24-7933, Chapter
  7, Easy Tier
- IBM System Storage SAN Volume Controller Best Practices and Performance Guidelines,
  SG24-7521, Chapter 11, IBM System Storage Easy Tier function
- IBM System Storage DS8000 Easy Tier, REDP-4667
IBM FlashSystem storage provides a hardware-only data path that realizes all of the potential
of flash memory. These systems are different from traditional storage systems, both in
technology and usage.
An SSD with an HDD form factor has flash memory that is put into a carrier or tray.
This carrier is inserted into an array like a hard disk drive. The speed of storage access is
limited by the following technologies, because they add latency and cannot keep pace with flash
technology:
- Array controllers and software layers
- SAS controllers and shared bus
- Tiering and shared data path
- Form factor enclosure
IBM FlashSystem products are fast and efficient. The hardware-only data path has a
minimum number of software layers, which are mostly firmware components, and the
management software is separated from the data path (out-of-band). The only other
family of products that has hardware-only access to flash technology is PCI Express (PCIe)
flash products, which are installed into a dedicated server. With the appearance of IBM
FlashSystem, the benefits that PCIe flash products bring to a single server can now be shared by
many servers.
1.4 Technology and Architectural Design Overview
IBM FlashSystem storage systems, with an all-hardware data path using field-programmable gate
array (FPGA) modules, are engineered to deliver the lowest possible latency. They incorporate
proprietary flash controllers and leverage numerous patented technologies. FlashSystem
controllers have proprietary logic design, firmware, and system software. There are no
commodity 2.5-inch SSDs, PCIe cards, or any other significant non-IBM assemblies within the
system. The flash chips, FPGA chips, processors, and other semiconductors in the system
are carefully selected to be consistent with the purpose-built design, which is designed from
the ground up for high performance, reliability, and efficiency. Notable architectural concepts
of the IBM FlashSystem storage systems are:
- Hardware-only data path
- Leverage FPGAs extensively
- Field-upgradable hardware logic
- Less expensive design cycle
- Extremely high degree of parallelism
- Intelligent flash modules
- Distributed computing model
- Low-power PPC processors
- Interface and flash processors run thin real-time operating systems
- Management processor communicates with the interface and flash processors via an internal
  network
- Minimal management communication
Hardware-only data path
The hardware-only data path design of IBM FlashSystem eliminates the software layer
latency that is found in other vendor products. To achieve such extremely low
latencies, IBM FlashSystem advanced software functions are carefully assessed and
implemented on a limited basis. For environments requiring advanced storage services,
implementing IBM FlashSystem with IBM SAN Volume Controller can offer an unmatched
combination of performance, low latency, and rich software functionality.
In IBM FlashSystem, data traverses the array controllers through FPGAs and dedicated,
low-power CPUs. No cycles are wasted on interface translation, protocol control, or
tiering.
IBM FlashSystem, with an all-hardware data path design, has an internal architecture that is
different from hybrid (SSD + HDD) or SSD-only based disk systems.
Flash chips
The flash chip is the basic storage component of the flash module. There can be a maximum
of 80 enterprise multi-level cell (eMLC) flash chips per flash module. Combining flash chips of
different flash technologies is not supported in the same flash module or storage system to
maintain consistent wearing and reliability.
Gateway interface FPGA
The gateway interface FPGA is responsible for providing I/O to the flash module and the direct
memory access (DMA) path. It is located on the flash module and has two connections to the
backplane.
Flash controller FPGA
The flash controller FPGA of the flash module is used to provide access to the flash chips and
is responsible for the following functions:
- Provide data path, hardware I/O logic
- Look up tables and write buffer
- Control 20 flash chips
- Operate independently of other controllers
- Maintain write ordering and layout
- Provide write setup
- Maintain garbage collection
- Provide error handling
Figure 1-2 shows the flash controller design details.
Figure 1-2 IBM FlashSystem controller details
The concurrent operations performed on the flash chips include moving data in and out of the
chip via DMA, internally moving data, and performing erases. This means that while
actively transferring user data in the service of host-initiated I/O, the system can
simultaneously run garbage collection activities without impacting the I/O. The ratio of
transparent background commands running concurrently to active data transfer commands is
seven to one.
There is a maximum of four flash controllers per flash module: two per primary board and
two per expansion board.
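The module geometry described in this section can be cross-checked with a few lines of
arithmetic. The following Python sketch is illustrative only: the figures are taken from the
text above, while the constant and function names are assumptions made for this example and
do not correspond to any IBM interface.

```python
# Illustrative arithmetic only. Figures come from the text; all names are hypothetical.
CONTROLLERS_PER_BOARD = 2    # two flash controllers per board
BOARDS_PER_MODULE = 2        # primary board plus expansion board
CHIPS_PER_CONTROLLER = 20    # each flash controller FPGA controls 20 flash chips


def controllers_per_module() -> int:
    """Maximum number of flash controllers in one flash module."""
    return CONTROLLERS_PER_BOARD * BOARDS_PER_MODULE


def max_chips_per_module() -> int:
    """Maximum number of flash chips in one flash module."""
    return controllers_per_module() * CHIPS_PER_CONTROLLER


print(controllers_per_module())  # 4
print(max_chips_per_module())    # 80
```

The result agrees with the stated maximum of 80 eMLC flash chips per flash module.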
1.4.1 Variable Stripe RAID and two dimensional Flash RAID Overview
Storage systems of any kind are typically designed to perform two main functions: to store
and protect data. IBM FlashSystem includes the following options for data protection. Table 1-1
shows the various methods of protection.
- RAID data protection
  - Variable Stripe RAID
  - Two dimensional (2D) Flash RAID
- Flash memory protection methods
- Optimized RAID rebuild times
[Figure 1-2 labels: flash controller FPGA (data path, hardware I/O logic, lookup tables and
write buffer, controls 20 flash chips); DRAM lookup tables; DRAM write buffer; control PPC and
DRAM (out of data path operations: garbage collection, error handling, system health, wear
leveling, statistics, and so on); gateway interface FPGA (I/O and direct memory access);
NAND flash memory.]
Table 1-1 Various IBM FlashSystem protection
1.5 Variable Stripe RAID
Variable Stripe RAID (VSR) is a unique IBM technology that provides data protection of
the memory page, block, or whole chip, which eliminates the need to replace a whole
flash module in the case of a single memory chip failure or plane failures. This, in turn, extends
the life and endurance of flash modules and considerably reduces maintenance events
throughout the life of the system.
VSR provides high redundancy across chips within a flash module. RAID is implemented at
multiple addressable segments within chips, in a 9+1 RAID 5 fashion, and it is controlled at
the flash controller level (four in each flash module). Due to the massive parallelism of DMA
operations controlled by each FPGA and parallel access to chip sets, dies, planes, blocks, and
pages, the implementation of VSR has minimal impact on performance.
The following information describes some of the most important aspects of VSR
implementation:
- VSR is managed and controlled by each of the four flash controllers within a single
  module.
- A flash controller is in charge of only 20 flash chips.
- Data is written on flash pages of 8 KB and erased in 1 MB flash blocks.
- VSR is implemented and managed at the flash chip plane level.
- There are 16 planes per chip.
- Before a plane fails, at least 256 flash blocks within a plane must be deemed failed.
- A plane can also fail in its entirety.
- Up to 64 planes can fail before a whole module is considered failed.
- Up to 4 chips can fail before a whole module is considered failed.
- When a flash module is considered failed, 2D Flash RAID takes control of data protection
  and recovery.
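The figures in the list above are mutually consistent, which a short calculation makes visible.
The following Python sketch is a reading aid under stated assumptions, not a description of the
FlashSystem firmware: all names are hypothetical, 1 MB is taken as 1024 KB, and "up to N can
fail" is interpreted as "the module is considered failed once more than N have failed".

```python
# Figures from the list above. Names and the threshold interpretation are assumptions.
PAGE_SIZE_KB = 8
BLOCK_SIZE_KB = 1024            # assumes 1 MB = 1024 KB
PLANES_PER_CHIP = 16
CHIPS_PER_CONTROLLER = 20
CONTROLLERS_PER_MODULE = 4
MAX_FAILED_PLANES = 64
MAX_FAILED_CHIPS = 4

pages_per_block = BLOCK_SIZE_KB // PAGE_SIZE_KB
planes_per_module = PLANES_PER_CHIP * CHIPS_PER_CONTROLLER * CONTROLLERS_PER_MODULE
planes_in_failed_chips = MAX_FAILED_CHIPS * PLANES_PER_CHIP


def module_failed(failed_planes: int, failed_chips: int) -> bool:
    """Assumed rule: a module fails once either tolerance is exceeded."""
    return failed_planes > MAX_FAILED_PLANES or failed_chips > MAX_FAILED_CHIPS


print(pages_per_block)          # 128 pages are written before one block is erased
print(planes_per_module)        # 1280 planes in a fully populated module
print(planes_in_failed_chips)   # 64, the same as the plane tolerance
print(module_failed(64, 4))     # False under the assumed interpretation
print(module_failed(65, 0))     # True
```

Under these assumptions, the two tolerances describe the same amount of flash: four whole chips
hold 64 planes, which is 5% of the planes in a fully populated module.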
LAYER                              MANAGED BY                     PROTECTION
System-level RAID 5                Centralized RAID controllers   Module failure
Module-level RAID 5                Each module across the chips   Chip failure, page failure
Module-level Variable Stripe RAID  Each module across the chips   Subchip, chip, or multi-chip failure
Chip-level ECC                     Each module using the chips    Bit and block error
Note: The proprietary two dimensional (2D) Flash RAID data protection scheme of the IBM
FlashSystem 840 storage system combines system-level RAID 5 and module-level
Variable Stripe RAID (not just module-level RAID).
When a plane or a chip fails, VSR activates to protect data while maintaining system-level
performance and capacity.
1.6 How VSR works
Variable Stripe RAID is an IBM patented technology. It includes, but is more advanced than, a
simple RAID of flash chips. Variable Stripe RAID introduces two key concepts:
1. The RAID stripe is not solely across chips; it actually spans across flash layers.
2. The RAID stripe can automatically vary based on observed flash plane failures within a
flash module. For example, stripes are not fixed at 9+1 RAID 5 stripe members, but can
go down to 8+1, 7+1, or even 6+1 based on plane failures.
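To see what a narrower stripe costs, the share of each stripe that holds data rather than
parity can be computed directly. The following Python sketch assumes single-parity stripes, as
in the N+1 notation above; the function name is hypothetical.

```python
def data_fraction(data_members: int, parity_members: int = 1) -> float:
    """Share of a stripe that stores data rather than parity."""
    return data_members / (data_members + parity_members)


# Stripe widths named in the text: 9+1 down to 6+1.
for members in (9, 8, 7, 6):
    print(f"{members}+1: {data_fraction(members):.3f}")
# 9+1: 0.900
# 8+1: 0.889
# 7+1: 0.875
# 6+1: 0.857
```

Each step down gives up only a small share of the affected stripes to parity, and only the
stripes touched by a failed plane are narrowed.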
This ability to protect the data at variable stripes effectively maximizes flash capacity even
after flash component failures. Figure 1-3 shows an overview of IBM FlashSystem
Variable Stripe RAID (VSR).
Figure 1-3 IBM FlashSystem Variable Stripe RAID (VSR)
Figure 1-4 on page 10 shows the benefits of IBM VSR.
- Patented VSR allows RAID stripe sizes to vary.
- If one die fails in a ten-chip stripe, only the failed die is bypassed, then data is
  restriped across the remaining nine chips.
- VSR reduces maintenance intervals caused by flash failures.
[Figure 1-3 labels: 16 planes, 10 chips; failed planes are marked FAIL.]
Figure 1-4 The Value of IBM FlashSystem Variable Stripe RAID.
It is important to emphasize that VSR only has an effect at the plane level. This means that
only the planes affected by a plane failure are converted to (N-1); the current stripe member
count (9+1) layout is maintained through the rest of the areas of all other planes and
chips that are not involved in the plane failure.
To illustrate how VSR functions, assume that a plane fails within a flash chip and is no longer
available to store data. This might occur as a result of a physical failure within the chip, or
some damage being inflicted on the address or power lines to the chip. The plane failure
is detected, and the system changes the format of the page stripes that are used.
The data that was previously stored in physical locations across chips in all ten lanes, using a
page stripe format with ten pages, is now stored across chips in only nine lanes, using a page
stripe format with nine pages. Thus, no data stored in the memory system is
lost, and the memory system can self-adapt to the failure and continue to perform and
operate by processing read and write requests from host devices.
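The lane-reduction step described above can be illustrated with textbook XOR parity. The
following Python sketch is a simplified model under stated assumptions, not the patented VSR
implementation, which runs in FPGA hardware: the function names, the 4-byte page size, and the
choice of 72 pages are all made up for the example.

```python
from functools import reduce
from operator import xor


def xor_parity(pages):
    """Return the byte-wise XOR of equally sized pages."""
    return bytes(reduce(xor, column) for column in zip(*pages))


def build_stripe(data_pages):
    """Append one parity page to a group of data pages."""
    return list(data_pages) + [xor_parity(data_pages)]


def recover_lane(stripe, lost_lane):
    """Rebuild the page in lost_lane from the surviving pages of the stripe."""
    survivors = [page for lane, page in enumerate(stripe) if lane != lost_lane]
    return xor_parity(survivors)


def restripe(data_pages, lanes):
    """Regroup data pages into stripes of (lanes - 1) data pages plus parity."""
    per_stripe = lanes - 1
    return [
        build_stripe(data_pages[start:start + per_stripe])
        for start in range(0, len(data_pages), per_stripe)
    ]


# 72 hypothetical 4-byte pages: divisible by both 9 and 8 data pages per stripe.
data = [bytes([n] * 4) for n in range(72)]

ten_lane = restripe(data, lanes=10)                # 9+1 layout before the failure
rebuilt = recover_lane(ten_lane[0], lost_lane=3)   # page lost with the failed plane
nine_lane = restripe(data, lanes=9)                # 8+1 layout after the failure

print(rebuilt == data[3])             # True
print(len(ten_lane), len(nine_lane))  # 8 9
```

In this model the same 72 data pages occupy 80 physical pages in the ten-lane format and 81 in
the nine-lane format, so the narrower layout spends slightly more space on parity while every
data page remains recoverable.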
This ability of the system to automatically self-adapt, when needed, to chip and intra-chip
failures makes the FlashSystem flash module extremely rugged and robust, and capable of
operating despite the failure of one or more chips or intra-chip regions. It also makes the
system very user-friendly in that the failure of one, two, or even more individual memory chips
or devices does not require the removal and potential disposal of previously used memory
storage components. The reconfiguration or reformatting of the data to change the page
stripe formatting to account for chip or intra-chip failures might reduce the amount of physical
memory space that is held in reserve by the system and available to the system for
background operation. Note that in all but the most extreme of circumstances (in
which case the system creates alerts), it does not impact usable capacity or performance.
Reliability, availability, and serviceability
The previous explanation leads to another important topic: reliability, availability, and
serviceability (RAS) levels, and in particular how IBM FlashSystem RAS levels compare with
those of other technologies.
Following is a summary of the capabilities of VSR:
- Patented Variable Stripe RAID allows RAID stripe sizes to vary.
- If one plane fails in a ten-chip stripe, only the failed plane is bypassed, and then data is
  restriped across the remaining nine chips. No system rebuild is needed.
- VSR reduces maintenance intervals caused by flash failures.
[Figure 1-4 labels: Form factor SSD (flash failure = disk failure; requires top-level RAID;
relatively frequent hot-swaps). Enterprise flash drive or memory module (flash failure =
degraded state within module; performance impact on RAID set; hot-swap to resolve).
FlashSystem with Variable Stripe RAID (preserves flash life; preserves performance; re-parity
data in microseconds). Value of Variable Stripe RAID: fewer maintenance touches while still
preserving the life, protection, and performance of the Day-1 experience.]
1.7 Two dimensional (2D) Flash RAID
Two dimensional (2D) Flash RAID refers to the combination of Variable Stripe RAID (at the
flash module level) and system-level RAID 5.
The second dimension of data protection, hence the name two-dimensional Flash RAID, is
implemented across flash modules as RAID 5 protection. This system-level
RAID 5 is striped across the appropriate number of flash modules in the system based on the
selected configuration. System-level RAID 5 can stripe across four (2D+1P+1S), eight
(6D+1P+1S), or twelve flash modules (10D+1P+1S).
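The three configurations can be compared by the share of modules that hold data. The following
Python sketch reads D, P, and S as data, parity, and hot spare modules; the class and attribute
names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemRaid5Layout:
    """One system-level RAID 5 configuration (hypothetical helper type)."""
    data: int
    parity: int = 1
    spare: int = 1

    @property
    def modules(self) -> int:
        """Total number of flash modules in the configuration."""
        return self.data + self.parity + self.spare

    @property
    def data_fraction(self) -> float:
        """Share of the modules that store data."""
        return self.data / self.modules


# The three configurations named in the text.
LAYOUTS = [SystemRaid5Layout(2), SystemRaid5Layout(6), SystemRaid5Layout(10)]

for layout in LAYOUTS:
    print(layout.modules, f"{layout.data_fraction:.3f}")
# 4 0.500
# 8 0.750
# 12 0.833
```

Because one parity module and one spare module are set aside regardless of system size, the
larger configurations devote a greater share of their modules to data.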
The architecture allows a flash module to be designated as a dynamic hot spare. Figure 1-5
shows IBM FlashSystem 2D RAID.
Figure 1-5 IBM FlashSystem 2D-RAID
Two dimensional (2D) Flash RAID technology within IBM FlashSystem provides two
independent layers of RAID 5 data protection within each system, as mentioned earlier: the
module-level Variable Stripe RAID technology, and an additional system-level RAID 5 across
flash modules. When operating in system-level RAID 5 mode, redundant centralized RAID
controllers create a stripe arrangement across the four, eight, or twelve flash modules in the
system. The system-level RAID 5 complements the Variable Stripe RAID technology
implemented within each flash module, and provides protection against data loss and data
unavailability resulting from flash module failures. It also allows data to be rebuilt onto a hot
spare flash module, so flash modules can be replaced without data disruption.
[Figure 1-5 labels: 2D Flash RAID; external interfaces (FC, IB), Interface A and Interface B;
RAID Controller A and RAID Controller B; RAID 5 across flash modules (10 data + 1 parity +
1 hot spare); VSR within flash modules (9 data + 1 parity).]
In addition to 2D Flash RAID and Variable Stripe RAID data protection, IBM FlashSystem
family storage systems incorporate other reliability features, including:
- Error correcting codes to provide bit-level reconstruction of data from flash chips.
- Checksums and data integrity fields designed to protect all internal data transfers within
  the system.
- Overprovisioning to enhance write endurance and decrease write amplification.
- Wear leveling algorithms to balance the number of writes among flash chips throughout the
  system.
- Sweeper algorithms to help ensure all data within the system is read periodically to avoid
  data fade issues.
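As an illustration of the wear leveling idea in the list above, the following Python sketch
always hands out the block with the fewest erase cycles. This is a generic textbook policy
chosen for the example; the text does not describe the algorithm that FlashSystem actually
uses, and the class and method names are hypothetical.

```python
import heapq


class WearLeveler:
    """Generic least-worn-first allocator (illustrative, not the IBM algorithm)."""

    def __init__(self, block_count: int) -> None:
        # Min-heap of (erase_count, block_id) pairs, least-worn block on top.
        self._heap = [(0, block) for block in range(block_count)]
        heapq.heapify(self._heap)

    def allocate(self) -> int:
        """Pick the least-worn block and record one more erase cycle for it."""
        erases, block = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (erases + 1, block))
        return block

    def erase_counts(self) -> dict:
        """Map each block id to its recorded erase count."""
        return {block: erases for erases, block in self._heap}


leveler = WearLeveler(block_count=4)
for _ in range(10):
    leveler.allocate()

counts = leveler.erase_counts()
print(max(counts.values()) - min(counts.values()))  # 1
```

With this policy, the erase counts of any two blocks never differ by more than one, which is
the balancing effect that wear leveling aims for.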
Understanding 2D Flash RAID allows the reader to visualize the advantage over
other flash storage solutions. Both VSR and 2D Flash RAID are implemented and controlled
at the FPGA hardware level, and not via software-heavy, external controllers and legacy
architectures. Two-dimensional Flash RAID eliminates single points of failure and provides
enhanced system-level reliability.
