ASSIGNMENT 1
RAID
(REDUNDANT ARRAY OF INDEPENDENT DISKS)
Group : COAD 09
Section : 1
Lecturer : Moaiad Ahmad Khder
Subject : CSC1401---Introduction to Computer
Organization.
Introduction
An Overall description
The purpose of this paper is to give an overview of the history of RAID, what
RAID actually is, why we need it, how it works, the different levels of RAID, and
the advantages and disadvantages of each level.
History
Back in the mid-1980s, the SLED (Single Large Expensive Disk) was the
most popular medium for storing data. At that time, disk drives had nowhere near
the storage capacity or the performance of the disks available today. Therefore,
in order to store a large amount of data, one needed a collection of disks, which
could become a complete mess and was an inconvenient way of handling data.
On top of that, disks were very expensive, which explains the name SLED.
Another big problem was, and still is, loss of data because of disk failure.
A solution for this was badly needed. Hence, IBM co-sponsored the University of
California, Berkeley, to build a disk array subsystem to help solve these
problems, a concept for which IBM had received a patent in 1978. In 1987 Randy
Katz and Dave Patterson, both working at the University of California, Berkeley,
succeeded. They called the solution RAID, which stands for “Redundant Array of
Inexpensive Disks”, although some people prefer to replace the original word
‘Inexpensive’ with ‘Independent’. Katz and Patterson had clustered multiple
smaller and less expensive disks into an array.
By doing this, all the disks appear to the rest of the system as if they were one
single large disk. The result was compared with SLED in terms of cost versus
performance. It turned out that RAID had the same or better performance than SLED,
but with a theoretical Mean Time Before Data Loss (MTBDL) that was reduced to an
unacceptably low level, because an array of many disks is more likely to suffer a
drive failure than a single disk. The search for a way of increasing the MTBDL now
started. There was a need for a way to prevent single disk drive failures from
causing data loss within the array of disks. The result was the seven RAID levels
0 through 6. In addition, further variants continue to appear, such as RAID 7,
10, 12, and 15.
For a long time RAID was something found only in large server systems, but lately
cheaper RAID controller cards have made it possible to get a RAID system even for
small servers and home computers. These will of course not have all the features
that the more expensive ones have. Different levels of RAID have different
advantages and disadvantages. Therefore one must analyse the workload before
deciding what to buy. The choice also depends heavily on the quality attributes
needed. Some examples of quality attributes one can get by using a RAID system
are data redundancy, fault tolerance, increased capacity and increased
performance.
• Mirroring
The easiest way to get both availability and fault tolerance is to make a copy
of all data on a second disk. This is called mirroring, and you normally get one
MB of usable space for every two MB of physical disk space. You will always have
the second disk to read from if the first disk fails. The disadvantages of this
method are the wasted disk space and the lack of any gain in write
performance. You can, however, get higher read performance because reads can
occur simultaneously on every drive.
• Striping
• Parity
Mirroring and striping are fairly easy to understand. Parity, however, is a bit
more complicated. In the same way as mirroring, it is used to improve
availability, but without the waste of space. If you have X data elements, they
can be used to create one parity element, so you end up with X+1 elements. If any
one element is lost, it can always be recovered by using the others.
The advantage of parity is of course that you have no single point of failure.
However, achieving this costs a lot of computing power.
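The parity idea above can be illustrated with a minimal sketch. It assumes XOR parity over equal-length blocks, which is the usual choice but is not stated in the text; the names make_parity and recover are hypothetical helpers for illustration only.

```python
from functools import reduce

def make_parity(blocks):
    """XOR equal-length byte blocks, column by column, into one parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover(surviving_blocks):
    """Rebuild the single missing element from all surviving ones (data + parity).

    With XOR parity, the missing block is simply the XOR of everything that is left.
    """
    return make_parity(surviving_blocks)

data = [b"RAID", b"disk", b"demo"]   # X = 3 data elements (illustrative values)
parity = make_parity(data)           # the extra element, X + 1 in total

# Pretend the second element was lost and rebuild it from the other three.
rebuilt = recover([data[0], data[2], parity])
print(rebuilt)
```

The same function serves for both creating the parity and rebuilding a lost element, which is why only one lost element per group can be recovered this way.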
1) RAID 0
RAID 0 was created in the early days of RAID technology. This level is
also known as striping. It is accomplished by taking two or more disk drives and
striping them together to create one virtual disk.
Data is then written to what is known as the stripe set and is spread across the
volume, where each drive operates in parallel with the others. RAID 0 is commonly
used in environments where files are large and the data is sequential.
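As a rough sketch of how a stripe set spreads data, the following assumes simple round-robin placement of fixed-size blocks. The function names and the block size are illustrative assumptions, not taken from any particular controller.

```python
def locate(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks

def stripe(data, num_disks, block_size):
    """Split data into blocks and deal them out to the disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), block_size):
        disk, _ = locate(start // block_size, num_disks)
        disks[disk].append(data[start:start + block_size])
    return disks

layout = stripe(b"ABCDEFGHIJKL", num_disks=3, block_size=2)
for index, blocks in enumerate(layout):
    print("disk", index, blocks)
```

Consecutive blocks land on different disks, so a large sequential transfer keeps every drive busy at the same time.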
2) RAID 1
RAID 1 provides data redundancy and is commonly known as mirroring. It is the
answer to the reliability issues of RAID 0. Instead of spreading the data across
the set of drives, as in RAID 0, mirroring duplicates the data across the
set. For example, in the simplest case, a system may have two hard disk drives
operating on the same controller. The same data written to disk 0 would be
simultaneously written to disk 1.
The RAID 1 scenario gives the user data protection: when one drive fails, there
is a replica that can be brought online immediately, depending on the
sophistication of the environment, to eliminate any downtime. Additionally, the
failed disk drive can be replaced at a more convenient time. Common uses for
RAID 1 include very sensitive data or data that a system needs in order to
operate, such as the boot drive, and cases where data is not sequential. Because
data is written twice with RAID 1, it may seem that writes to the drive set would
take twice as long, but this is a myth. In fact, writes to a mirrored set
generally take only 15% to 20% longer than writes to a single member. Some write
performance to the mirrored array may be lost; however, as in RAID 0, lowering
disk utilization increases performance. One other drawback of implementing RAID 1
is the higher cost it demands, since disk drive requirements double. Implementing
RAID 1 and RAID 0 is a fairly simple task, but they only lay the groundwork for
the full potential of RAID.
3) RAID 2
Level 2 is the "black sheep" of the RAID family, because it is the only RAID
level that does not use one or more of the "standard" techniques of mirroring, striping
and/or parity. RAID 2 uses something similar to striping with parity, but not the same
as what is used by RAID levels 3 to 7. It is implemented by splitting data at the bit
level and spreading it over a number of data disks and a number of redundancy disks.
The redundant bits are calculated using Hamming codes, a form of error correcting
code (ECC).
RAID Level 2, which uses Hamming error correction codes, is intended for use
with drives which do not have built-in error detection. Disks are synchronized and
striped in very small stripes, often in single bytes/words. Hamming codes contain
parity for distinct overlapping subsets of components. Each data word has its
Hamming Code ECC word recorded on the ECC disks.
Each time something is written to the array, these codes are calculated and
written alongside the data to dedicated ECC disks; when the data is read back,
the ECC codes are read as well to confirm that no errors have occurred since the
data was written. If a single-bit error occurs, it can be corrected "on the fly".
In one version of this scheme, four data disks require three redundant disks, one
less than mirroring. Hamming-code error correction is calculated across
corresponding bits on the disks and is stored on multiple parity disks. All SCSI
drives support built-in error detection, so this level is of little use with SCSI
drives. RAID 2 is seldom used today since ECC is embedded in almost all modern
disk drives.
For a number of reasons, including the fact that modern disk drives contain
their own internal ECC, RAID 2 is not a practical disk array scheme. If a single
component fails, several of the parity components will have inconsistent values, and
the failed component is the one held in common by each incorrect subset. The lost
information is recovered by reading the other components in a subset, including the
parity component, and setting the missing bit to 0 or 1 to create proper parity value for
that subset. Thus, multiple redundant disks are needed to identify the failed disk, but
only one is needed to recover the lost information.
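The four-data-disk, three-redundant-disk arrangement mentioned above matches the classic Hamming(7,4) code. The sketch below is only an illustration under that assumption, treating each list entry as the bit held on one disk; the function names are hypothetical, and a real RAID 2 controller would do this in hardware.

```python
def hamming_encode(data_bits):
    """Encode four data bits as a seven-bit codeword, positions 1..7.

    Parity bits sit at positions 1, 2 and 4; data bits at positions 3, 5, 6 and 7.
    """
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4        # checks positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4        # checks positions 3, 6, 7
    p4 = d2 ^ d3 ^ d4        # checks positions 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming_correct(codeword):
    """Return (corrected codeword, position of the flipped bit, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # subset containing positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # subset containing positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]   # subset containing positions 4, 5, 6, 7
    position = s1 + 2 * s2 + 4 * s4  # the failing subsets point at the bad position
    if position:
        c[position - 1] ^= 1         # flip the bit back "on the fly"
    return c, position

codeword = hamming_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[5] ^= 1                      # simulate a single-bit error at position 6
fixed, bad_position = hamming_correct(damaged)
print(bad_position, fixed == codeword)
```

The set of parity checks that fail identifies the faulty position, which is the "held in common by each incorrect subset" idea from the paragraph above.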
4) RAID 3
RAID 3 implements byte-level striping with parity. It requires a minimum of 3
disks to be implemented. Data to be written is divided into stripes, and stripe
parity is calculated for every write operation. The stripe parity is stored on a
separate parity disk. RAID 3 provides fault tolerance, and its disk usage is
better than that of mirroring. Controller design is quite complex. Write
operations are slow because of the overhead of calculating parity and writing it
to a separate disk. Read operations are faster than writes.
RAID 3 is also described as a striped set with dedicated parity, bit-interleaved
parity or byte-level parity. This mechanism provides improved performance and
fault tolerance similar to RAID 5, but with a dedicated parity disk rather than
rotated parity stripes. The single parity disk is a bottleneck for writing since
every write requires updating the parity data. One minor benefit is that, if the
dedicated parity drive fails, operation continues without parity and without a
performance penalty.
The parity information is sent to a dedicated parity disk, but the failure of any
disk in the array can be tolerated (i.e., the dedicated parity disk doesn't
represent a single point of failure in the array). The dedicated parity disk does
generally serve as a performance bottleneck, especially for random writes,
because it must be accessed any time anything is written to the array. This
contrasts with levels such as RAID 5, which improve write performance by using
distributed parity (though they still suffer from large overheads on writes).
RAID 3 differs from RAID 4 only in the size of the stripes sent to the various
disks.
One can improve upon memory-style ECC disk arrays by noting that, unlike
memory component failures, disk controllers can easily identify which disk has failed.
Thus, one can use a single parity disk rather than a set of parity disks to
recover lost information.
5) RAID 4
RAID Level 4 stripes data at a block level across several drives, with parity
stored on one drive. This makes it in some ways the "middle sibling" in a family of
close relatives, RAID levels 3, 4 and 5. It is like RAID 3 except that it uses blocks
instead of bytes for striping, and like RAID 5 except that it uses dedicated parity
instead of distributed parity. Going from byte to block striping improves random
access performance compared to RAID 3, but the dedicated parity disk remains a
bottleneck, especially for random write performance. Fault tolerance, format
efficiency and many other attributes are the same as for RAID 3 and RAID 5.
Each entire block is written onto a data disk. Parity for same-rank blocks is
generated on writes, recorded on the parity disk and checked on reads. RAID 4
requires a minimum of 3 drives to implement.
This type uses large stripes, which means you can read records from any single
drive. In this setup, files can be distributed between multiple disks. Each disk operates
independently which allows I/O requests to be performed in parallel, though data
transfer speeds can suffer due to the type of parity. The error detection is achieved
through dedicated parity and is stored in a separate, single disk unit.
The parity information allows recovery from the failure of any single drive.
The performance of a level 4 array is very good for reads (the same as level 0).
Writes, however, require that parity data be updated each time. This slows small
random writes in particular, though large writes or sequential writes are fairly
fast. The controller design is quite complex. Because only one drive in the array
stores redundant data, the cost per megabyte of a level 4 array can be fairly
low. It is difficult to rebuild data in case of a disk failure. RAID 4 offers no
advantages over RAID 5 and does not support multiple simultaneous write
operations.
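To see why every write touches the parity disk, here is a minimal sketch of a small write. It assumes XOR parity and the common read-modify-write shortcut (new parity = old parity XOR old data XOR new data); the names and the in-memory lists standing in for disks are hypothetical.

```python
def xor_blocks(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def small_write(data_blocks, parity, index, new_block):
    """Overwrite one data block and return the updated parity block.

    Only the old data block and the old parity need to be read, but the
    dedicated parity disk is involved in every single write.
    """
    old_block = data_blocks[index]
    new_parity = xor_blocks(xor_blocks(parity, old_block), new_block)
    data_blocks[index] = new_block
    return new_parity

blocks = [b"AAAA", b"BBBB", b"CCCC"]            # one block per data disk
parity = xor_blocks(xor_blocks(blocks[0], blocks[1]), blocks[2])

parity = small_write(blocks, parity, 1, b"ZZZZ")

# The shortcut gives the same parity as recomputing it over the whole stripe.
full = xor_blocks(xor_blocks(blocks[0], blocks[1]), blocks[2])
print(parity == full)
```

Because two independent small writes would both need the same parity disk, they cannot proceed at the same time, which is the bottleneck described above.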
6) RAID 5
7) RAID 6
RAID 6 stripes blocks of data and parity across an array of drives like RAID
5, except that it calculates two sets of parity information for each parcel of data. The
goal of this duplication is solely to improve fault tolerance; RAID 6 can handle the
failure of any two drives in the array while other single RAID levels can handle at
most one fault. Performance-wise, RAID 6 is generally slightly worse than RAID 5 in
terms of writes due to the added overhead of more parity calculations, but may be
slightly faster in random reads due to spreading of data over one more disk. As with
RAID levels 4 and 5, performance can be adjusted by experimenting with different
stripe sizes.
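The redundancy overheads described for the levels above can be compared with a small calculation. This is a sketch that assumes equal-sized disks, two-way mirroring for RAID 1, one disk's worth of parity for RAID 3, 4 and 5, and two for RAID 6; usable_disks is a hypothetical helper, not part of any tool.

```python
# Number of disks' worth of capacity spent on parity, per RAID level.
PARITY_DISKS = {0: 0, 3: 1, 4: 1, 5: 1, 6: 2}

def usable_disks(level, total_disks):
    """Return how many disks' worth of space is left for data."""
    if level == 1:
        return total_disks // 2          # every disk has a mirror copy
    return total_disks - PARITY_DISKS[level]

for level in (0, 1, 3, 4, 5, 6):
    print("RAID", level, "with 6 disks ->", usable_disks(level, 6), "usable")
```

For six disks this gives 6, 3, 5, 5, 5 and 4 usable disks respectively, which shows the extra space RAID 6 trades for its second set of parity.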
Additional Information
8) RAID 7
Unlike the other RAID levels, RAID 7 isn't an open industry standard. It is a
trademarked marketing term of Storage Computer Corporation, used to describe their
proprietary RAID design. RAID 7 is based on concepts used in RAID levels 3 and 4,
but greatly enhanced to address some of the limitations of those levels. Of particular
note is the inclusion of a great deal of cache arranged into multiple levels, and a
specialized real-time processor for managing the array asynchronously. This hardware
allows the array to handle many simultaneous operations, greatly improving
performance of all sorts while maintaining fault tolerance. In particular, RAID 7
offers much improved random read and write performance over RAID 3 or RAID 4
because the dependence on the dedicated parity disk is greatly reduced through the
added hardware. The increased performance of RAID 7 of course comes at a cost.
RAID 0
Specialties:
Drawbacks:
• Since RAID 0 is non-redundant, there is a high risk that data will be lost if
any disk fails. RAID 0 therefore does not provide a fault-tolerant environment.
Hence, critical files or workloads are simply not suitable to be stored at this
level of RAID.
RAID 1
Specialties:
• RAID 1 provides 100% data redundancy. It duplicates all the data in the
logical disk, which is then mapped to two separate strips on the physical
disks. Hence, every disk has its mirror disk.
• RAID 1 also offers a fault-tolerant environment -- it keeps a real-time back-up
of all data.
• It also increases read performance -- a read request can be served by either
of the two disks that contain the desired data (the disk and its mirror), so
both disks can share the read load.
Drawbacks:
• It is expensive, as it needs twice the space of the logical disk it supports
(for the mirror disks). Therefore RAID 1 is usually used to store only system
software or other critical files.
RAID 2
Specialties:
• Extremely high data transfer rates, among the highest of all the RAID
levels.
• It requires fewer disks compared to RAID 1.
• The data availability is very good as it employs "on the fly" error
correction.
Drawbacks:
• It is very expensive.
RAID 3
Specialties:
• RAID 3 is very good at sequential reads and writes. It is even faster than
RAID 5.
• Its striping performance is almost at the same level as RAID 0, but with the
additional capability of data protection.
• Very high data transfer rates.
• Disk failure has an insignificant impact on the application process.
Drawbacks:
RAID 4
Specialties:
Drawbacks:
• It is not really suitable for applications that require a high data transfer rate.
• Data rebuild in the event of disk failure is difficult and inefficient.
• Block read transfer rate is equal to that of a single disk.
• RAID 4 does not support multiple simultaneous write operations.
RAID 5
Specialties:
Drawbacks:
RAID 6
Specialties:
• Offers the highest data availability -- three disks need to fail in order for
data to be lost.
• RAID 6 allows extra fault tolerance by using a second independent parity
scheme.
• Well suited to mission-critical applications.
Drawbacks:
RAID 7
Specialties:
1. Data transfer rate: RAID 2 and RAID 3. Both have the highest data
transfer rate.
4. I/O request rate: RAID 0 has the highest I/O request rate as it
balances the I/O load across multiple disks.
5. Minimum disk drives needed: RAID 0 and RAID 1 need the smallest
number of disk drives, which is only 2.
RAID 0 needs at least 2 disks for striping, or else
it would not differ from the logical disk (a single
disk) in any way. As for RAID 1, one disk will be
the mirror of the other disk.
As a student, I think RAID 0 would be the best choice among all the levels of
RAID, as it is high in performance. RAID 0 has the fastest read and write
performance because no redundant data is stored. It also offers the highest I/O
request rate, as it balances the I/O load across multiple disks. Besides, it needs
only a minimum of two drives to operate. On top of that, it is also lower in cost
than the other levels of RAID. Although RAID 0 offers no fault tolerance, as it
has no redundant data or back-up disks, this should not be a major problem
nowadays, since secondary storage (to act as back-up storage) can easily be had
at an affordable price. Furthermore, as a student I don’t think there are any
so-called critical files that need a back-up in case of disk failure other than
assignments and projects, which are usually copied onto removable secondary
storage such as an external hard disk or a pen drive anyway. That is why I think
RAID 0 suits students best.