
What is RAID?

Using Multiple Hard Drives for Performance and Reliability

Introduction

Back in the late 1980s and early 1990s, computer information servers were
encountering a dramatic increase in the amount of data they needed to serve
and store. Placing a large number of high-capacity hard drives in a server
was becoming very expensive. A solution was needed, and thus RAID was born.

So what exactly is RAID? First of all, the acronym stands for Redundant Array
of Inexpensive Disks. It is a system in which a large number of low-cost hard
drives are linked together to form a single large-capacity storage device
that offers better performance, storage capacity and reliability than older
storage solutions. It has long been widely deployed in the enterprise and
server markets, but over the past 5 years it has become much more common in
end-user systems.

Advantages of RAID

There are three primary reasons that RAID was implemented:

• Redundancy
• Increased Performance
• Lower Costs

Redundancy was the most important factor in the development of RAID for
server environments. It provides a form of backup for the data in the storage
array in the event of a failure. If one of the drives in the array fails, it
can either be swapped out for a new drive without turning the system off
(referred to as hot swapping) or the redundant drive can be used. The method
of redundancy depends on which version of RAID is used.

Increased performance is only found when specific versions of RAID are used.
Performance will also depend upon the number of drives used in the array and
on the controller.

All managers of IT departments like low costs. When the RAID standards were
being developed, cost was also a key issue. The point of a RAID array is to
provide the same or greater storage capacity for a system compared to using
individual high capacity hard drives. A good example of this can be seen in
the price differences between the highest capacity hard drives and lower
capacity drives. Three drives of a smaller size could cost less than an
individual high-capacity drive but provide more capacity.

There are typically three forms of RAID used for desktop computer systems:
RAID 0, RAID 1 and RAID 5. In most cases, only the first two of these
versions are available, and one of the two is technically not a form of RAID.

RAID 0

The lowest designated level of RAID, level 0, is actually not a true form of
RAID. It was given the designation of level 0 because it fails to provide any
level of redundancy for the data stored in the array. Thus, if one of the
drives fails, all of the data in the array is lost.

RAID 0 uses a method called striping. Striping takes a single chunk of data,
like a graphic image, and spreads that data across multiple drives. The
advantage of striping is improved performance. With two drives, up to twice
the amount of data can be written in a given time frame compared to that same
data being written to a single drive.

Below is an example of how data is written in a RAID 0 implementation. Each
row in the chart represents a physical block on the drive and each column is
the individual drive. The numbers in the table represent the data blocks.
Duplicate numbers indicate a duplicated data block.

          Drive 1   Drive 2
Block 1      1         2
Block 2      3         4
Block 3      5         6

Thus, if the 6 blocks of data above constitute a single data file, it can be
read from and written to the array much faster than if it were on a single
drive. Working in parallel, each drive only has to read 3 physical blocks,
while a single drive would take twice as long because it has to read all 6.
The drawback of course is that if one drive fails, the file can no longer be
recovered. All 6 data blocks are needed for the file, but only three are
accessible.
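
The striping pattern in the chart can be sketched in a few lines of Python.
This is only an illustration of the round-robin placement described above,
not how any particular controller is implemented; the function names
stripe_layout and locate are made up for this example.

```python
def stripe_layout(num_blocks, num_drives):
    """Return one row per physical block, listing the data block on each drive."""
    rows = []
    for start in range(1, num_blocks + 1, num_drives):
        end = min(start + num_drives, num_blocks + 1)
        rows.append(list(range(start, end)))
    return rows


def locate(block_number, num_drives):
    """Return (drive, physical_block) for a 1-based data block number."""
    index = block_number - 1
    return index % num_drives + 1, index // num_drives + 1


# Reproduce the two-drive chart: six data blocks spread over two drives.
for physical, row in enumerate(stripe_layout(6, 2), start=1):
    print(f"Block {physical}: {row}")

# Data block 4 sits on drive 2, in physical block 2.
print(locate(4, 2))
```

Because consecutive data blocks land on different drives, both drives can
work on the same file at once, which is where the speed-up comes from.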

Advantages:

• Increased storage performance
• No loss in data capacity

Disadvantages:

• No redundancy of data

RAID 1

RAID version 1 was the first real implementation of RAID. It provides a
simple form of redundancy for data through a process called mirroring. This
form typically requires two individual drives of similar capacity. One drive
is the active drive and the secondary drive is the mirror. When data is
written to the active drive, the same data is written to the mirror drive.

The following is an example of how the data is written in a RAID 1
implementation. Each row in the chart represents a physical block on the
drive and each column is the individual drive. The numbers in the table
represent the data blocks. Duplicate numbers indicate a duplicated data
block.

          Drive 1   Drive 2
Block 1      1         1
Block 2      2         2
Block 3      3         3

This provides a full level of redundancy for the data on the system. If one
of the drives fails, the other drive still has all the data that existed in
the system. The big drawback of course is that the capacity of the RAID will
only be as big as the smaller of the two drives, which effectively halves the
storage capacity compared with using the two drives independently.
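
As a rough sketch of the mirroring idea, the toy Python class below keeps two
in-memory "drives" and copies every write to both. The class and method names
(Mirror, write, read, fail) are invented for this illustration and do not
correspond to any real RAID software.

```python
class Mirror:
    """Toy two-drive mirror: every write goes to both drives."""

    def __init__(self, num_blocks):
        self.drives = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]

    def write(self, block, data):
        # The same data is written to every drive that is still working.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                drive[block] = data

    def read(self, block):
        # Read from the first working drive; the mirror takes over on failure.
        for index, drive in enumerate(self.drives):
            if not self.failed[index]:
                return drive[block]
        raise IOError("both drives have failed")

    def fail(self, index):
        self.failed[index] = True


array = Mirror(num_blocks=3)
array.write(0, "block 1 data")
array.fail(0)                 # the active drive dies
print(array.read(0))          # the data is still served by the mirror
```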

Advantages:

• Provides full redundancy of data

Disadvantages:

• Storage capacity is only as large as the smallest drive
• No performance increases
• Some downtime to change active drive during a failure

RAID 0+1

This is a hybrid form of RAID that some manufacturers have implemented to try
and give the advantages of each of the two versions combined. Typically this
can only be done on a system with a minimum of 4 hard drives. It combines the
methods of mirroring and striping to provide both performance and redundancy.
The first set of drives will be active and have the data striped across them
while the second set of drives will be a mirror of the data on the first two.

Below is an example of how data is written in a RAID 0+1 implementation. Each
row in the chart represents a physical block on the drive and each column is
the individual drive. The numbers in the table represent the data blocks.
Duplicate numbers indicate a duplicated data block.

          Drive 1   Drive 2   Drive 3   Drive 4
Block 1      1         2         1         2
Block 2      3         4         3         4
Block 3      5         6         5         6

In this case, the data blocks are striped across the drives within each of
the two sets while the data is mirrored between the sets. This gives the
increased performance of RAID 0, because writing the data takes about half
the time it would on a single drive, and it provides redundancy. The major
drawback of course is the cost. This implementation requires a minimum of 4
hard drives.

Advantages:

• Increased performance
• Data is fully redundant

Disadvantages:

• Large number of drives required
• Effective data capacity is halved

RAID 10 or 1+0

RAID 10 is effectively a similar version to RAID 0+1. Rather than striping
data across a set of disks and then mirroring that set, the first two drives
are mirrored together. The second two drives form another pair that are
mirrors of one another but store data striped with the first pair. This is a
form of nested RAID setup. Drives 1 and 2 are a RAID 1 mirror and drives 3
and 4 are also a mirror. These two sets are then set up as a striped array.

Below is an example of how data is written in a RAID 10 implementation. Each
row in the chart represents a physical block on the drive and each column is
the individual drive. The numbers in the table represent the data blocks.
Duplicate numbers indicate a duplicated data block.

          Drive 1   Drive 2   Drive 3   Drive 4
Block 1      1         1         2         2
Block 2      3         3         4         4
Block 3      5         5         6         6

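Just like the RAID 0+1 setup, RAID 10 requires a minimum of four hard drives
to function. Performance is pretty much the same, but the data is a bit
better protected: because each pair is mirrored independently, RAID 10 can
survive some two-drive failures that would take a RAID 0+1 array offline.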
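
One way to see the difference in protection is to count which two-drive
failures each four-drive layout survives. The sketch below assumes, as is
common but not guaranteed for every controller, that a striped set in RAID
0+1 becomes unusable as soon as any one of its members fails. The function
names are hypothetical.

```python
from itertools import combinations

DRIVES = (1, 2, 3, 4)
STRIPE_SETS = [(1, 2), (3, 4)]    # RAID 0+1: each set is striped, sets mirror each other
MIRROR_PAIRS = [(1, 2), (3, 4)]   # RAID 10: each pair is mirrored, pairs are striped


def raid01_survives(failed):
    # Assumption: at least one striped set must be completely intact.
    return any(set(s).isdisjoint(failed) for s in STRIPE_SETS)


def raid10_survives(failed):
    # Every mirrored pair must still have at least one working drive.
    return all(not set(p) <= set(failed) for p in MIRROR_PAIRS)


two_drive_failures = list(combinations(DRIVES, 2))
print("RAID 0+1 survives:", [f for f in two_drive_failures if raid01_survives(f)])
print("RAID 10 survives: ", [f for f in two_drive_failures if raid10_survives(f)])
```

Under that assumption, RAID 0+1 survives two of the six possible two-drive
failures and RAID 10 survives four of them. Both survive any single-drive
failure.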

Advantages:

• Increased performance
• Data is fully redundant

Disadvantages:

• Large number of drives required
• Effective data capacity is halved

RAID 5

This is the most powerful form of RAID that can be found in a desktop
computer system. Typically it requires a hardware controller card to manage
the array, but some desktop operating systems can create these arrays in
software. This method uses a form of striping with parity to maintain data
redundancy. A minimum of three drives is required to build a RAID 5 array,
and they should be identical drives for the best performance.

Parity is essentially a form of binary math that compares two blocks of data
and forms a third data block based upon the first two. The easiest way to
explain it is even and odd. If the sum of two data bits is even, the parity
bit is 0. If the sum is odd, the parity bit is 1. So 0+0 and 1+1 both give 0
while 0+1 or 1+0 give 1 (the exclusive-or, or XOR, operation). Because of
this, if one drive in the array fails, its contents can be reconstructed from
the parity and the data on the remaining drives once the drive is replaced.
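
The even/odd rule above is the XOR operation applied bit by bit. As a minimal
sketch (the helper name xor_blocks and the sample data are made up for this
example), the following Python shows a parity block being computed from two
data blocks and then used to rebuild one of them after it is lost:

```python
def xor_blocks(block_a, block_b):
    """XOR two equal-length byte blocks together, bit by bit."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must be the same length")
    return bytes(a ^ b for a, b in zip(block_a, block_b))


data_1 = b"RAID"                      # stored on the first drive
data_2 = b"demo"                      # stored on the second drive
parity = xor_blocks(data_1, data_2)   # stored on the third drive

# Suppose the second drive fails: XOR the survivors to get its block back.
rebuilt = xor_blocks(data_1, parity)
print(rebuilt == data_2)
```

The same trick works whichever of the three blocks is lost, because XOR-ing
any two of them yields the third.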

With that information in mind, here is an example of how a RAID 5 array would
work. Each row in the chart represents a physical block on the drive and each
column is the individual drive. The numbers in the table represent the data
blocks. Duplicate numbers indicate a duplicated data block. A "P" indicates
the parity block for the two data blocks in that row.

          Drive 1   Drive 2   Drive 3
Block 1      1         2         P
Block 2      3         P         4
Block 3      P         5         6
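
The way the parity position moves from row to row in the chart can be
generated with a short loop. This sketch simply reproduces the pattern shown
above, with parity starting on the last drive and moving one drive to the
left on each row. Real controllers may rotate parity in a different order,
and the function name raid5_layout is hypothetical.

```python
def raid5_layout(num_rows, num_drives):
    """Return rows of cells: data block numbers as strings, 'P' for parity."""
    rows = []
    next_block = 1
    for row in range(num_rows):
        # Parity starts on the last drive and shifts left on each new row.
        parity_drive = (num_drives - 1 - row) % num_drives
        cells = []
        for drive in range(num_drives):
            if drive == parity_drive:
                cells.append("P")
            else:
                cells.append(str(next_block))
                next_block += 1
        rows.append(cells)
    return rows


for physical, cells in enumerate(raid5_layout(3, 3), start=1):
    print(f"Block {physical}: " + "  ".join(cells))
```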

The parity block rotates between the drives to increase the performance and
reliability of the array. The drive array will still have increased
performance over a single drive because the multiple drives can write the
data faster than a single drive. The data is also protected against the
failure of any one drive because of the parity blocks. In the case of drive 2
failing, the data can be rebuilt from the data and parity blocks on the two
remaining drives. Data capacity is reduced due to the parity blocks. In
practice the capacity of the array is based on the following equation, where
n is the number of drives and z is the capacity of each drive:

(n-1)z = Array Capacity

In the case of three 500 gigabyte hard drives, the total capacity would be
(3-1) x 500GB, or 1000 gigabytes.
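
The formula is simple enough to check directly. The Python below is just the
equation above written as a function (the names raid5_capacity and
usable_fraction are invented here). It also shows that the share of raw
space lost to parity shrinks as drives are added.

```python
def raid5_capacity(n, z):
    """Usable capacity of n drives of capacity z each: (n - 1) * z."""
    if n < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return (n - 1) * z


def usable_fraction(n):
    """Fraction of the raw capacity that is available for data."""
    return (n - 1) / n


print(raid5_capacity(3, 500))          # three 500GB drives
for n in (3, 4, 8):
    print(n, "drives:", round(usable_fraction(n) * 100), "% usable")
```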

Hardware RAID 5 implementations can also have a function called hot swap.
This allows drives to be replaced while the array is still functioning,
either to increase the drives' capacity or to replace a damaged drive. The
drive controller then takes time while the array is running to rebuild the
data across the drives. This is a valuable feature for systems that require
24x7 operation.

Advantages:

• Increased storage array performance
• Full data redundancy
• Ability to run 24x7 with hot swap

Disadvantages:

• High costs to implement
• Performance degrades during rebuilding

Software vs. Hardware RAID

In order for RAID to function, there needs to be software either through the
operating system or via dedicated hardware to properly handle the flow of
data from the computer system to the drive array. This is particularly
important when it comes to RAID 5 due to the large amount of computing
required to generate the parity calculations.

In the case of software implementations, CPU cycles are taken away from the
general computing environment to perform the necessary tasks for the RAID
interface. Software implementations are very low cost monetarily because all
that is necessary to implement one is the hard drives. The problem with
software RAID implementations is the performance drop of the system. In
general, this performance hit can be 5% or more, depending upon the
processor, memory, drives used and the level of RAID implemented. Most people
no longer use software RAID because the cost of hardware RAID controllers has
fallen over the years.

Hardware RAID has the advantage of dedicated circuitry to handle all the RAID
drive array calculations outside of the processor. This provides excellent
performance for the storage array. The drawback of hardware RAID has been its
cost. In the case of RAID 0/1 controllers, those costs have become so low
that many chipset and motherboard manufacturers are including these
capabilities on the motherboards. The real costs rest with RAID 5 hardware,
which requires more circuitry for added computing ability.

Drive Selection

What a lot of people don't realize is that the performance and capacity of a
RAID array are heavily dependent upon the hard drives used in the array. For
the best results, all hard drives in the array should be the same brand and
model. This means that all of the hard drives will have the same capacity and
performance levels. It is not a requirement that the drives be matched, but
mismatching the drives can actually hurt the RAID array.

The capacity of the RAID array will depend upon the method implemented. In
the case of RAID 0, the striping can only be done across an equal amount of
space on the two drives. As a result, if an 80GB and a 100GB drive are used
to make the array, the final capacity of the array would only be 160GB.
Similarly, in RAID 1 the drives can only mirror data equal to the smallest
size. Thus, based on the two drives mentioned before, the final data size
would only be 80GB. RAID 5 is a bit more complicated because of the formula
mentioned before. Once again the smallest capacity would be used. So if an
80GB, a 100GB and a 120GB drive were used to make a RAID 5 array, the final
capacity would be 160GB of data.
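
The three mismatched-drive examples above all follow from one rule: every
drive is treated as if it were the size of the smallest one. A small sketch,
using a made-up helper named usable_capacity_gb and covering only the levels
discussed here:

```python
def usable_capacity_gb(level, sizes_gb):
    """Usable capacity when every drive is cut down to the smallest size."""
    smallest = min(sizes_gb)
    count = len(sizes_gb)
    if level == 0:
        return count * smallest          # striping: equal space on every drive
    if level == 1:
        return smallest                  # mirroring: one copy's worth of space
    if level == 5:
        return (count - 1) * smallest    # one drive's worth goes to parity
    raise ValueError("only RAID 0, 1 and 5 are covered in this sketch")


print(usable_capacity_gb(0, [80, 100]))        # 160
print(usable_capacity_gb(1, [80, 100]))        # 80
print(usable_capacity_gb(5, [80, 100, 120]))   # 160
```

In each case the extra space on the larger drives goes unused, which is one
reason matched drives are recommended.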

Performance of the array is also dependent upon the drives. In order for the
array to function properly, it must wait for the data to be written to each
of the drives before it can continue. This means that in the example charts
for the RAID arrays, the controller must wait until all physical data has
been written to block 1 across all the drives in the array before it can
continue to the next set of data for the drives. As a result, an array where
one drive has half the performance of the other two will be slowed down to
the pace of that one drive.
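
A very simplified model makes the point: if the controller waits for every
drive to finish its block before moving on, the time per row is set by the
slowest drive. The function name and the drive speeds below are hypothetical
numbers chosen only for illustration, and the model ignores caching and
other real-world effects.

```python
def row_write_time(block_mb, drive_speeds_mb_per_s):
    """Seconds to write one block to every drive, waiting for the slowest."""
    return max(block_mb / speed for speed in drive_speeds_mb_per_s)


matched = row_write_time(1, [100, 100, 100])
mismatched = row_write_time(1, [100, 100, 50])

# The array with one half-speed drive takes twice as long per row.
print(matched, mismatched, mismatched / matched)
```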

Conclusions

Overall, RAID provides systems with a variety of benefits depending upon the
version implemented. Most consumers will likely opt for RAID 0 to gain
performance without losing storage space, primarily because redundancy is
not an issue for the average user. In fact, most computer systems will only
offer either RAID 0 or 1. The cost of implementing a RAID 0+1 or RAID 5
system is generally too high for the average consumer, so these are only
found in high-end workstation or server-level systems.
