
INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA

ASSIGNMENT 1

RAID
(REDUNDANT ARRAY OF INDEPENDENT DISKS)

Group Members : Nur Hidayah Binti Mohd Din 0822614
: Nur Farhana Binti Noordin 0823844
: Nur Khairunnisa Binti Juarah 0824780
: Nor Iffah Binti Md Najib 0828770

Group : COAD 09
Section : 1
Lecturer : Moaiad Ahmad Khder
Subject : CSC1401 - Introduction to Computer Organization

Introduction

An Overall description

The purpose of this paper is to give an overview of the history of RAID, what RAID
actually is, why we need it, how it works, the different levels of RAID, and the
advantages and disadvantages of each level.

History
Back in the mid-1980s, SLED (Single Large Expensive Disk) was the most
popular medium for storing data. At that time, disk drives had nowhere near the
storage capacity or the performance of the disks available today. Therefore,
saving a large amount of data required a whole collection of disks, which could
become messy and was an inconvenient way of handling data. On top of that, the
disks were very expensive, which explains the name SLED.

Another big problem was, and still is, loss of data because of disk failure.
A solution was badly needed. Hence IBM, which had received a patent on a disk
array subsystem in 1978, co-sponsored the University of California, Berkeley to
build one in order to help solve these problems. In 1987 Randy Katz and Dave
Patterson, both working at the University of California, Berkeley, succeeded. They
called the solution RAID, which stands for “Redundant Array of Inexpensive Disks”,
although some people prefer to replace the original word ‘Inexpensive’ with
‘Independent’. Katz and Patterson had clustered multiple smaller and less
expensive disks into an array.

By doing this, all the disks appear to the rest of the system as one single
large disk. The result was compared with SLED in terms of cost versus
performance. It turned out that RAID had the same or better performance than
SLED, but with a theoretical Mean Time Before Data Loss (MTBDL) that was reduced to
an unacceptably low level, because an array of many disks has more chances for one
of them to fail. The search for a way of increasing the MTBDL then started. There
was a need for a way to prevent single disk drive failures from causing data loss
within the array of disks. The result was the seven RAID levels 0 through 6. In
addition, further variants have since been developed, such as RAID 7, 10, 12,
and 15.

In computing, RAID is a system which uses multiple hard drives to share or
replicate data among the drives. Depending on the version chosen, the benefit of
RAID is one or more of increased data integrity, fault-tolerance, throughput or
capacity compared to single drives. In its original implementations (in which it was an
abbreviation for "redundant array of inexpensive disks"), its key advantage was the
ability to combine multiple low-cost devices using older technology into an array that
offered greater capacity, reliability, speed, or a combination of these things, than was
affordably available in a single device using the newest technology.

At the very simplest level, RAID combines multiple hard drives into a single
logical unit. Thus, instead of seeing several different hard drives, the operating system
sees only one. RAID is typically used on server computers, and is usually (but not
necessarily) implemented with identically-sized disk drives. With decreases in hard
drive prices and wider availability of RAID options built into motherboard chipsets,
RAID is also being found and offered as an option in more advanced user computers.
This is especially true in computers dedicated to storage-intensive tasks, such as video
and audio editing.

The original RAID specification suggested a number of prototype "RAID
levels", or combinations of disks. Each had theoretical advantages and disadvantages.
Over the years, different implementations of the RAID concept have appeared. Most
differ substantially from the original idealized RAID levels, but the numbered names
have remained.

Why do we need RAID?

For a long time RAID was something found only in large server systems, but
cheaper RAID controller cards have lately made it possible to get a RAID system
even for small servers and home computers. These will of course not have all the
features that the more expensive ones have. Different levels of RAID have
different advantages and disadvantages. Therefore one must analyse the
workload before deciding what to buy. The choice also depends heavily on the quality
attributes needed. Some examples of quality attributes one can get by using a RAID
system are data redundancy, fault tolerance, increased capacity and increased
performance.

How does RAID work?


The main idea behind RAID is, as mentioned in the introduction, to take some
inexpensive disks and group them together so that the system sees them as
one single disk. This is done by using a RAID controller card that handles all I/O to
the disks and knows where the stored data can be found. RAID uses three
different techniques to provide the quality attributes mentioned above. These are
mirroring, striping and parity, each of which can be used either separately or mixed
with one or more of the others. This is why RAID is divided into different levels.

• Mirroring
The easiest way to get both availability and fault tolerance is to make a copy
of all data on a second disk. This is called mirroring, and you normally get one
MB of usable space for every two MB of physical disk space. You will always
have the second disk to read from if the other disk fails. The disadvantages of
this method are the wasted disk space and that you will not get higher write
performance. You can however get higher read performance because reads can
occur simultaneously on every drive.

• Striping

Whereas mirroring and parity deal with improving reliability, striping is
used to get higher performance. The idea is to split data into small pieces,
which are then distributed across the disks. This way the disks can work in
parallel on different pieces of data. You will not lose any disk space as with
mirroring and parity. One big disadvantage of this method is that if one
disk breaks, all data will be lost. Therefore it is most often not used alone but
in combination with mirroring or parity. Only data that can be recreated
by the application, such as cache or other temporarily stored data, is
recommended for storage using striping alone.

• Parity

Mirroring and striping are fairly easy to understand. Parity however is a bit
more complicated. In the same way as mirroring, it is used to improve
availability, but without the waste of space. If you have X data
elements, they can be used to create a parity element. Then you end up with X+1
elements. It is always possible to recover one lost element by using the others.
The advantage of parity is of course that you have no single point of failure.
However, achieving this costs a lot of computing power.
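
The parity idea described above can be illustrated with a minimal sketch. It assumes
that parity is computed as a byte-wise XOR over equally sized data elements, which is
a common choice but is not stated in the text; the function names xor_parity and
recover_missing are made up for this illustration.

```python
from functools import reduce

def xor_parity(blocks):
    """Return the byte-wise XOR of equally sized blocks (assumed parity scheme)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_missing(surviving_blocks):
    """Rebuild the single missing element by XOR-ing every surviving element,
    data and parity alike."""
    return xor_parity(surviving_blocks)

data = [b"RAID", b"disk", b"demo"]   # X = 3 data elements
parity = xor_parity(data)            # the extra element, giving X+1 in total

# Pretend the second data element is lost and rebuild it from the others.
rebuilt = recover_missing([data[0], data[2], parity])
print(rebuilt == data[1])
```

Because XOR is its own inverse, the same routine that creates the parity element also
recovers a lost one, which is why only one extra element is needed for X data elements.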

Different levels of RAID


1) RAID 0

RAID 0 was created early in the development of RAID technology. This level is
also known as striping. It is accomplished by taking two or more disk drives,
preferably five, and striping them together to create one virtual disk.
Data is then written to what is known as the stripe set and is spanned across the
volume, where each drive operates in parallel with the others. RAID 0 is commonly
used in environments where files are large and the data is sequential.

The benefits of RAID 0 are fairly straightforward. Data access performance is
increased because data request queues are shortened for each disk drive. The
utilization of each disk is decreased because there are more drives to help take on
the load of data access. This is achieved by writing data sequentially across the
drive set so that the data can later be retrieved from all drives simultaneously.

However, the increased performance of RAID 0 only applies to applications using
sequential access because it involves no indexing of the data. Furthermore, striping
the drives together does nothing to protect the information stored on the drives;
therefore there is no data redundancy. In spite of this, RAID 0 can be combined with
other levels of RAID to not only increase performance, but also employ data
redundancy and fault tolerance.
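
As a rough illustration of how data is spanned across a stripe set, the sketch below
distributes a byte string round-robin over several in-memory "disks" and reads it
back. The functions stripe and unstripe and the strip_size parameter are hypothetical
names chosen for this example; a real controller works on disk blocks, not Python
byte strings.

```python
def stripe(data, num_disks, strip_size):
    """Distribute data round-robin across num_disks in strip_size-byte strips."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), strip_size):
        strip_index = i // strip_size
        disks[strip_index % num_disks] += data[i:i + strip_size]
    return [bytes(d) for d in disks]

def unstripe(disks, strip_size, total_len):
    """Reassemble the original data by reading strips in round-robin order."""
    out = bytearray()
    offsets = [0] * len(disks)
    disk = 0
    while len(out) < total_len:
        out += disks[disk][offsets[disk]:offsets[disk] + strip_size]
        offsets[disk] += strip_size
        disk = (disk + 1) % len(disks)
    return bytes(out)

payload = b"ABCDEFGHIJKL"
disks = stripe(payload, num_disks=3, strip_size=2)
print(disks)                                  # [b'ABGH', b'CDIJ', b'EFKL']
print(unstripe(disks, 2, len(payload)) == payload)
```

Every disk holds a part of the payload and no part is stored twice, so losing any one
disk leaves the data incomplete, which matches the lack of redundancy noted above.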
2) RAID 1

RAID 1 provides data redundancy and is commonly known as mirroring; it is the
answer to the reliability issues of RAID 0. Instead of writing the data across the
set of drives, as in RAID 0, mirroring duplicates the data across the set. For
example, in the simplest of cases, a system may have two hard disk drives
operating on the same controller. The same data written to disk 0 would be
simultaneously written to disk 1.

The RAID 1 scenario grants the user data protection, in that when one drive fails,
there is a replica which can be brought online immediately, depending upon the
sophistication of the environment, to eliminate any downtime. Additionally, the failed
disk drive can be replaced at a more convenient time. Common uses for RAID 1
include very sensitive data or data that is mandatory for a system to operate, such as
the boot drive, and cases where data is not sequential. Because data is written twice
with RAID 1, it may seem that writes to the drive set would take twice as long, but
this is a myth. On the contrary, writes to a mirrored set generally take only 15% to
20% longer than writes to a single member. Some write performance to the mirrored
array may be lost; however, as in RAID 0, lowering disk utilization increases
performance. One other drawback of implementing RAID 1 is the higher cost it demands,
since disk drive requirements double. Implementing RAID 1 and RAID 0 is a fairly
simple task, but they only lay the groundwork for the full potential of RAID.
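
The mirroring behaviour described above can be sketched with a toy two-disk model.
The class MirroredPair and its methods are hypothetical and only mimic the idea that
every write goes to both members and a read can be served by whichever member is
still healthy; they do not represent any real controller interface.

```python
class MirroredPair:
    """Toy two-disk mirror: each disk is a dict of block number -> bytes."""

    def __init__(self):
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block_no, data):
        # The same data goes to every healthy member.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def read(self, block_no):
        # Any healthy member can satisfy the read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both mirror members have failed")

    def fail(self, disk_index):
        # Simulate a drive failure: its contents are gone.
        self.failed[disk_index] = True
        self.disks[disk_index] = {}

mirror = MirroredPair()
mirror.write(0, b"boot sector")
mirror.fail(0)                       # disk 0 dies
survivor_copy = mirror.read(0)       # served from disk 1
print(survivor_copy)
```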

3) RAID 2
Level 2 is the "black sheep" of the RAID family, because it is the only RAID
level that does not use one or more of the "standard" techniques of mirroring, striping
and/or parity. RAID 2 uses something similar to striping with parity, but not the same
as what is used by RAID levels 3 to 7. It is implemented by splitting data at the bit
level and spreading it over a number of data disks and a number of redundancy disks.
The redundant bits are calculated using Hamming codes, a form of error correcting
code (ECC).

RAID Level 2, which uses Hamming error correction codes, is intended for use
with drives which do not have built-in error detection. Disks are synchronized and
striped in very small stripes, often in single bytes/words. Hamming codes contain
parity for distinct overlapping subsets of components. Each data word has its
Hamming Code ECC word recorded on the ECC disks.

Each time something is to be written to the array, these codes are calculated and
written alongside the data to dedicated ECC disks; when the data is read back, these
ECC codes are read as well to confirm that no errors have occurred since the data was
written. If a single-bit error occurs, it can be corrected "on the fly".

In one version of this scheme, four data disks require three redundant disks, one
less than mirroring. Hamming code error correction is calculated across corresponding
bits on the disks and is stored on multiple parity disks. All SCSI drives support
built-in error detection, so this level is of little use with SCSI drives. RAID 2 is
seldom used today since ECC is embedded in almost all modern disk drives.

For a number of reasons, including the fact that modern disk drives contain
their own internal ECC, RAID 2 is not a practical disk array scheme. If a single
component fails, several of the parity components will have inconsistent values, and
the failed component is the one held in common by each incorrect subset. The lost
information is recovered by reading the other components in a subset, including the
parity component, and setting the missing bit to 0 or 1 to create proper parity value for
that subset. Thus, multiple redundant disks are needed to identify the failed disk, but
only one is needed to recover the lost information.
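
The "four disks plus three redundant disks" arrangement mentioned above matches the
classic Hamming(7,4) code, so a small sketch of that code may help. It assumes the
standard layout with parity bits at positions 1, 2 and 4, and treats each of the seven
bit positions as one disk; the function names are invented for this example and the
text does not say which exact code a given RAID 2 product used.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into 7 bits (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_correct(code):
    """Return (corrected codeword, 1-based error position, or 0 if none)."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # overlapping subsets point at the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the bit "on the fly"
    return c, syndrome

word = [1, 0, 1, 1]                   # one bit per data disk
stored = hamming74_encode(word)       # 4 data bits + 3 ECC bits
damaged = list(stored)
damaged[4] ^= 1                       # corrupt the bit at position 5
repaired, position = hamming74_correct(damaged)
print(repaired == stored, position)   # True 5
```

Each parity bit covers a distinct overlapping subset of positions, so the set of
failing checks identifies the faulty position, as the paragraph above describes.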

4) RAID 3
RAID 3 implements byte-level striping with parity. It requires a minimum of 3
disks. Data to be written is divided into stripes, and stripe parity is
calculated for every write operation. The stripe parity is stored on a separate parity
disk. RAID 3 provides fault tolerance, and its disk usage is better than that of
mirroring. Controller design is quite complex. Write operations are slow because of
the overhead of calculating parity and writing it to a separate disk. Read operations
are faster than writes.

RAID 3 can be used in data intensive or single-user environments which access
long sequential records to speed up data transfer. However, RAID-3 does not allow
multiple I/O operations to be overlapped and requires synchronized-spindle drives in
order to avoid performance degradation with short records. Byte-level striping
requires hardware support for efficient use.

RAID 3 is also described as a striped set with dedicated parity, bit-interleaved
parity, or byte-level parity. This mechanism provides improved performance and fault
tolerance similar to RAID 5, but with a dedicated parity disk rather than rotated
parity stripes. The single parity disk is a bottleneck for writing since every write
requires updating the parity data. One minor benefit of the dedicated parity disk is
that if the parity drive itself fails, operation continues without parity and without
a performance penalty.

The parity information is sent to a dedicated parity disk, but the failure of any
disk in the array can be tolerated (i.e., the dedicated parity disk doesn't represent a
single point of failure in the array). The dedicated parity disk does generally serve
as a performance bottleneck, especially for random writes, because it must be accessed
any time anything is sent to the array. This is in contrast to distributed-parity
levels such as RAID 5, which improve write performance by using distributed parity
(though they still suffer from large overheads on writes). RAID 3 differs from RAID 4
only in the size of the stripes sent to the various disks.

One can improve upon memory-style ECC disk arrays by noting that, unlike
memory component failures, disk controllers can easily identify which disk has failed.
Thus, one can use a single parity disk rather than a set of parity disks to recover
lost information.

5) RAID 4

RAID Level 4 stripes data at a block level across several drives, with parity
stored on one drive. This makes it in some ways the "middle sibling" in a family of
close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it uses blocks
instead of bytes for striping, and like RAID 5 except that it uses dedicated parity
instead of distributed parity. Going from byte to block striping improves random
access performance compared to RAID 3, but the dedicated parity disk remains a
bottleneck, especially for random write performance. Fault tolerance, format
efficiency and many other attributes are the same as for RAID 3 and RAID 5.

Each entire block is written onto a data disk. Parity for same-rank blocks is
generated on writes, recorded on the parity disk and checked on reads. RAID 4
requires a minimum of 3 drives to implement.

This type uses large stripes, which means you can read records from any single
drive. In this setup, files can be distributed between multiple disks. Each disk operates
independently which allows I/O requests to be performed in parallel, though data
transfer speeds can suffer due to the type of parity. The error detection is achieved
through dedicated parity and is stored in a separate, single disk unit.

The parity information allows recovery from the failure of any single drive.
The performance of a level 4 array is very good for reads (the same as level 0).
Writes, however, require that parity data be updated each time. This slows small
random writes in particular, though large writes or sequential writes are fairly
fast. The controller design is quite complex. Because only one drive in the array
stores redundant data, the cost per megabyte of a level 4 array can be fairly low. It
is difficult to rebuild data in case of a disk failure. RAID 4 offers no advantages
over RAID-5 and does not support multiple simultaneous write operations.
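
The reason every small write touches the parity disk can be shown with a short
sketch. It assumes XOR parity and the usual read-modify-write shortcut (new parity =
old parity XOR old data XOR new data), which the text does not spell out; xor_bytes
and small_write are hypothetical helper names.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(data_disks, parity, disk_index, new_block):
    """Update one data block and patch the parity block to match.

    Only the target data disk and the parity disk are touched, but the
    parity disk is touched on every write, whichever data disk is hit.
    """
    old_block = data_disks[disk_index]
    new_parity = xor_bytes(xor_bytes(parity, old_block), new_block)
    data_disks[disk_index] = new_block
    return new_parity

data_disks = [b"\x0f\x0f", b"\xf0\x01", b"\xaa\x55"]
parity = b"\x00\x00"
for block in data_disks:
    parity = xor_bytes(parity, block)      # initial full-stripe parity

parity = small_write(data_disks, parity, 1, b"\x12\x34")

# Cross-check against parity recomputed from scratch.
check = b"\x00\x00"
for block in data_disks:
    check = xor_bytes(check, block)
print(parity == check)
```

Two writes aimed at different data disks still both need the single parity disk, which
is one way to see why simultaneous writes are not supported at this level.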

6) RAID 5

RAID 5 is similar to RAID 4 except that it exchanges the dedicated parity
drive for a distributed parity algorithm, writing data and
parity blocks across all the drives in the array. This removes the bottleneck that the
dedicated parity drive represents, improving write performance slightly and allowing
better parallelism in a multiple-transaction environment, though the overhead
necessary in dealing with the parity continues to bog down writes. Fault tolerance is
maintained by ensuring that the parity information for any given block of data is
placed on a drive separate from those used to store the data itself. The performance of
a RAID 5 array can be adjusted by trying different stripe sizes until one is found that
is well-matched to the application being used.
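
One possible way to distribute parity across all drives is sketched below. The
rotation rule used here (parity moves one disk to the left on each successive stripe)
is only an assumption for illustration; real implementations use several different
layouts, and raid5_layout is a made-up function name.

```python
def raid5_layout(num_disks, num_stripes):
    """Return a grid of labels; each row is a stripe, each column a disk."""
    layout = []
    block = 0
    for stripe in range(num_stripes):
        # Assumed rotation: parity starts on the last disk and moves left.
        parity_disk = (num_disks - 1 - stripe) % num_disks
        row = []
        for disk in range(num_disks):
            if disk == parity_disk:
                row.append("P%d" % stripe)
            else:
                row.append("D%d" % block)
                block += 1
        layout.append(row)
    return layout

for row in raid5_layout(num_disks=4, num_stripes=4):
    print("  ".join(row))
# D0  D1  D2  P0
# D3  D4  P1  D5
# D6  P2  D7  D8
# P3  D9  D10  D11
```

Since the parity block of each stripe sits on a different drive, writes to different
stripes do not all queue up behind one parity disk as they do in RAID 4.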

7) RAID 6
RAID 6 stripes blocks of data and parity across an array of drives like RAID
5, except that it calculates two sets of parity information for each parcel of data. The
goal of this duplication is solely to improve fault tolerance; RAID 6 can handle the
failure of any two drives in the array while other single RAID levels can handle at
most one fault. Performance-wise, RAID 6 is generally slightly worse than RAID 5 in
terms of writes due to the added overhead of more parity calculations, but may be
slightly faster in random reads due to spreading of data over one more disk. As with
RAID levels 4 and 5, performance can be adjusted by experimenting with different
stripe sizes.
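
The redundancy overheads described for the levels above can be compared with a small
calculation. This is a simplified sketch: it assumes equally sized disks, two-way
mirroring in pairs for RAID 1, and leaves out RAID 2 because its overhead depends on
the Hamming code chosen. The helper raid_summary is hypothetical.

```python
def raid_summary(level, n):
    """Return (usable_disks, guaranteed_tolerated_failures) for n equal disks."""
    if level == 0:
        return n, 0            # pure striping, no redundancy
    if level == 1:
        return n // 2, 1       # two-way mirroring: half the raw space
    if level in (3, 4, 5):
        return n - 1, 1        # one disk's worth of parity
    if level == 6:
        return n - 2, 2        # two disks' worth of parity
    raise ValueError("level not covered by this sketch")

for level in (0, 1, 3, 4, 5, 6):
    usable, failures = raid_summary(level, n=6)
    print("RAID %d: %d of 6 disks usable, survives %d failure(s)"
          % (level, usable, failures))
```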

Additional Information

8) RAID 7

Unlike the other RAID levels, RAID 7 isn't an open industry standard. It is a
trademarked marketing term of Storage Computer Corporation, used to describe their
proprietary RAID design. RAID 7 is based on concepts used in RAID levels 3 and 4,
but greatly enhanced to address some of the limitations of those levels. Of particular
note is the inclusion of a great deal of cache arranged into multiple levels, and a
specialized real-time processor for managing the array asynchronously. This hardware
allows the array to handle many simultaneous operations, greatly improving
performance of all sorts while maintaining fault tolerance. In particular, RAID 7
offers much improved random read and write performance over RAID 3 or RAID 4
because the dependence on the dedicated parity disk is greatly reduced through the
added hardware. The increased performance of RAID 7 of course comes at a cost.

The advantages and disadvantages of each RAID


As each RAID level is unique in its design architecture, each has its own
specialties and drawbacks.

RAID 0

Specialties:

• RAID 0 has a very simple design and is easy to implement.
• It offers the best performance and is cheap because no parity is used.
• It has a very high data transfer capacity.
• It reduces I/O request queuing time compared with a single large disk. For
instance, if there are two I/O requests for two different blocks of data, there
is a good chance that the blocks are on different disks. The two requests can
then be issued simultaneously, which shortens the queuing time.

Drawbacks:

• Since RAID 0 is non-redundant, there is a high risk that data will be lost if any
disk fails. RAID 0 therefore does not provide a fault-tolerant environment, so
critical files or workloads are simply not suitable for this level of RAID.

RAID 1

Specialties:

• RAID 1 provides 100% data redundancy. It duplicates all the data in the
logical disk, which is then mapped to two separate strips on the physical
disks. Hence, every piece of data has its mirror copy.
• RAID 1 also offers a fault-tolerant environment, with a real-time back-up of
all data.
• It also increases data transfer performance for reads, as requests can be
split between both disks that contain the desired data (the disk and its
mirror). Both disks can therefore participate in serving requests, which
increases performance.

Drawbacks:

• It is expensive, as it needs twice the space of the logical disk it supports (for
the mirror disks). Therefore RAID 1 is usually used only to store system
software or other critical files.

RAID 2
Specialties:

• Extremely high data transfer rates, the highest compared to all other
RAID levels.
• It requires fewer disks compared to RAID 1.
• The data availability is very good, as it employs "on the fly" error
correction.

Drawbacks:

• It is very expensive.

RAID 3

Specialties:

• RAID 3 is very good at sequential reads and writes. It is even faster than RAID
5.
• It offers almost the same level of striping as RAID 0, but with the additional
capability of data protection.
• Very high data transfer rates.
• A disk failure has an insignificant impact on the application process.

Drawbacks:

• RAID 3 is very poor in random reads and writes.
• RAID 3 does not allow multiple I/O operations to be overlapped.

RAID 4

Specialties:

• RAID 4 has a high I/O request rate.
• Since it uses an independent access technique, each member disk is able to
operate independently. Hence, separate I/O requests can be satisfied in
parallel.

Drawbacks:

• It is not really suitable for applications that require a high data transfer rate.
• Data rebuild in the event of disk failure is difficult and inefficient.
• The block read transfer rate is equal to that of a single disk.
• RAID 4 does not support multiple simultaneous write operations.

RAID 5
Specialties:

• RAID 5 has high data availability.
• Has the highest read data transaction rate.

Drawbacks:

• Has the most complex controller design.
• Difficult to rebuild if a disk failure occurs (compared to RAID 1).

RAID 6

Specialties:

• Offers the highest data availability: three disks need to fail in order to cause
data to be lost.
• RAID 6 allows extra fault tolerance by using a second independent parity
scheme.
• A very good solution for mission-critical applications.

Drawbacks:

• More complex controller design.
• Controller overhead to compute parity addresses is extremely high.

RAID 7

Specialties:

• Secure storage system
• Hugely reduced manual handling


The best RAID
There are two basic design goals in using RAID technology: one is
performance, and the other is data protection. Here are some of the
characteristics that distinguish the levels of RAID, with a suggestion of the best
RAID level available for each specific characteristic or attribute:

No.  Attributes                Best RAID

1    Data transfer rate        RAID 2 and RAID 3. Both have the highest data
                               transfer rate.

2    Data protection and       RAID 6, as it has dual disk-drive-failure
     reliability               protection. At least 3 disks need to fail in
                               order for the data to be lost.

3    Price                     RAID 0 is the cheapest compared to the other
                               levels of RAID.

4    I/O request rate          RAID 0 has the highest I/O request rate, as it
                               balances the I/O load across multiple disks.

5    Minimum disk drives       RAID 0 and RAID 1 need the fewest disk drives,
     needed                    which is only 2. RAID 0 needs at least 2 disks
                               for striping, or else it would be no different
                               from a single disk in any way. As for RAID 1,
                               one disk will be the mirror of the other disk.

6    Performance               In RAID 0, data is split across drives,
                               resulting in higher data throughput, and because
                               no redundant information is stored, performance
                               is very good.

To determine which RAID level is the best, a few questions need to be answered
about what is most important to the person:

• Cost of disk storage?
• Data protection or data availability?
• Or high performance?

As a student, I think RAID 0 would be the best choice among all the levels of
RAID, as it is high in performance. RAID 0 has the fastest read and write
performance, as no redundant data is stored. It also offers the highest I/O request
rate, as it balances the I/O load across multiple disks. Besides, it needs only a
minimum of two drives to operate. On top of that, it is also lower in cost compared to
the other levels of RAID. Although RAID 0 offers no fault tolerance, as it has no
redundant data or back-up disks, this should not be a major problem nowadays,
as we can easily get secondary storage (to act as back-up storage) at
an affordable price. Furthermore, as a student I don’t think there are any so-called
critical files that need a back-up in case of disk failure other than
assignments and projects, which are usually copied to removable secondary
storage such as an external hard disk or pen drive anyway. That is why I think RAID
0 really suits students best.