
Chapter 13: Disk Storage, Basic File Structures, and Hashing

CS 6360.501 Database Design (Fall 2009)
Instructor: Sunan Han
The University of Texas at Dallas

Chapter Outline
 Disk Storage Devices
 Files of Records
 Operations on Files
 Ordered and Unordered Files
 Hashed Files
 Extendible and Linear Hashing Techniques
 RAID Technology
 Storage Area Networks

Slide 13- 2

Primary and Secondary Storage Devices

 Primary storage: random access memory (RAM) with the fastest access time. Volatile in nature – stored data is gone when power is off. Technology limits its size (Moore's Law)
 Main memory: stores data being processed by the CPU: programs and system data
 Cache memory: stores the most frequently used data to reduce the number of accesses to the secondary storage or the main memory
 Stores dynamic databases, such as network routing tables
 Secondary storage: magnetic disks or tapes: massive capacity. Data persists regardless of power. Slower access time than RAM. Inexpensive
 Databases are usually stored in secondary storage

Disk Storage Devices

 Preferred secondary storage device for high storage capacity, non-volatility and low cost
 Data stored as magnetized areas on magnetic disk surfaces
Disk Storage Devices (contd.)

 A track is divided into smaller sectors
 because it usually contains a large amount of information
 The division of a track into sectors is hard-coded on the disk surface and cannot be changed (see example on next slide)
 One type of sector organization calls a portion of a track that subtends a fixed angle at the center a sector
 Note: the number of blocks per track may vary: outer tracks can hold more blocks than inner tracks

Disk Storage Devices (contd.)

 A track is divided into blocks by the computer operating system
 A block may be a subdivision of a sector or a sector itself
 The block size is fixed for each system
 Typical block sizes range from 512 bytes to 4096 bytes
 Whole blocks are transferred between disk and main memory for processing (blocks are the units of data transfer at the operating system's level)

Disk Storage Devices (contd.)

 A read-write head moves to the track that contains the block to be transferred
 Disk rotation moves the block under the read-write head for reading or writing
 A physical disk block (hardware) address consists of:
 a cylinder number (imaginary collection of tracks of the same radius from all recorded surfaces): decides the head position
 the track number or surface number (within the cylinder): decides the disk/surface
 and the block number (within the track): decides the actual location
 Reading or writing a disk block is time consuming because of the seek time and rotational delay (latency)
 Double buffering can be used to speed up the transfer of contiguous disk blocks


Buffering of Blocks

 Storing some blocks of data from the disk in the main memory may accelerate the entire data retrieval process
 The buffering technique is used so that, while transferring data from the disk, the CPU can simultaneously process data already in the memory
 Double buffering: reading and processing consecutive blocks from disk (see example on next slide)

Double Buffering

 Note that the data transfer is directly between the memory and the disk, after being set up by the CPU
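The idea above can be sketched in Python as a toy model (not the slides' code): a reader thread "transfers" blocks from disk into one of two buffers while the main thread processes the other. The 4-byte blocks, the buffer count, and the queue-based handoff are illustrative assumptions.

```python
import threading
import queue

def read_blocks(blocks, filled, free):
    """Producer: simulate the disk filling whichever buffer is free."""
    for block in blocks:
        buf = free.get()          # wait for an empty buffer
        buf[:] = block            # simulate the disk-to-memory transfer
        filled.put(buf)           # hand it to the consumer
    filled.put(None)              # signal end of file

def process_all(blocks):
    """Consumer: process one buffer while the other is being filled."""
    filled, free = queue.Queue(), queue.Queue()
    for _ in range(2):            # exactly two buffers: "double" buffering
        free.put(bytearray(4))
    t = threading.Thread(target=read_blocks, args=(blocks, filled, free))
    t.start()
    out = []
    while (buf := filled.get()) is not None:
        out.append(bytes(buf))    # "process" the block (here: just copy it)
        free.put(buf)             # return the buffer for reuse
    t.join()
    return out
```

With only one buffer, the CPU would have to wait for each transfer to finish; with two, transfer and processing overlap.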

Typical Disk Parameters

(Table of typical disk drive parameters; courtesy of Seagate Technology)

Records

 Fixed and variable length records (for entities)
 Records contain fields (for attributes) which have values of a particular type
 E.g., amount, date, time, age
 Fields themselves may be fixed length or variable length
 Variable length fields can be mixed into one record:
 Separator characters or lengths of fields are needed so that the record can be "parsed."
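The separator approach can be sketched in a few lines of Python; the field values and the choice of the ASCII unit separator character are demo assumptions, not from the slides.

```python
SEP = "\x1f"  # ASCII "unit separator" used as the field delimiter (assumed)

def pack_record(fields):
    """Join variable-length field values into one record string."""
    return SEP.join(fields)

def parse_record(record):
    """Split ("parse") a record back into its field values."""
    return record.split(SEP)
```

The alternative the slide mentions, storing each field's length in front of its value, avoids reserving a separator character that can never appear in data.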


Blocking

 Blocking: refers to storing a number of records in one block on the disk
 Blocking factor (bfr) refers to the number of records per block
 There may be empty space in a block if an integral number of records do not fit in one block
 Unspanned records: records cannot be stored across two or more blocks
 Spanned records: records that are stored in more than one block (to make use of the empty space), or records that exceed the size of one or more blocks and hence span a number of blocks
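The bfr arithmetic for unspanned blocking can be written out directly; the block size, record size, and record count below are example values, not from the slides.

```python
import math

def blocking_factor(block_size, record_size):
    """Unspanned blocking: whole records per block, bfr = floor(B / R)."""
    return block_size // record_size

def blocks_needed(num_records, bfr):
    """Blocks needed to hold r records: b = ceil(r / bfr)."""
    return math.ceil(num_records / bfr)

# e.g. B = 512 bytes, R = 100 bytes: bfr = 5, leaving 512 - 5*100 = 12
# bytes of empty space per block (the space spanned records could reuse)
```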

Spanned and Unspanned Records

(Figure: record layouts under spanned and unspanned organization)

Files of Records

 A file is a sequence of records, where each record is a collection of data values (or data items). A file is a good unit to represent a table, but the mapping is not limited to one file per table
 A file descriptor (or file header) includes information that describes the file, such as the field names and their data types, and the addresses of the file blocks on disk. (It may contain other information such as hashing structures)
 Records are stored in disk blocks
 The blocking factor (bfr) for a file is the (average) number of records of the file stored in a disk block
 A file can have fixed-length records or variable-length records


Files of Records (contd.)

 The physical disk blocks that are allocated to hold the records of a file can be contiguous, linked, or indexed
 In a file of fixed-length records, all records have the same format. Usually, unspanned blocking is used with such files
 Files of variable-length records require additional information to be stored in each record, such as separator characters and field types
 Usually spanned blocking is used with such files

Operations on Files

 OPEN: Readies the file for access, and associates a pointer that will refer to a current file record at each point in time
 FIND: Searches for the first file record that satisfies a certain condition, and makes it the current file record
 FINDNEXT: Searches for the next file record (from the current record) that satisfies a certain condition, and makes it the current file record
 READ: Reads the current file record into a program variable
 INSERT: Inserts a new record into the file and makes it the current file record
 DELETE: Removes the current file record from the file, usually by marking the record to indicate that it is no longer valid
 MODIFY: Changes the values of some fields of the current file record
 CLOSE: Terminates access to the file
 REORGANIZE: Some files need to be reorganized periodically (e.g. sorting)
 READORDERED: Reads the file blocks in order of a specific field of the file
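A few of these operations, and in particular the current-record pointer they share, can be sketched as a toy in-memory Python class. This only models the interface; a real DBMS performs these operations over disk blocks.

```python
class RecordFile:
    """Toy file of records illustrating OPEN / FIND / FINDNEXT / READ."""

    def __init__(self, records):
        self.records = list(records)   # the "file"
        self.current = None            # the current-record pointer

    def open(self):                    # OPEN: ready the file, reset pointer
        self.current = -1

    def find(self, predicate):         # FIND: first record satisfying condition
        for i, rec in enumerate(self.records):
            if predicate(rec):
                self.current = i
                return rec
        return None

    def find_next(self, predicate):    # FINDNEXT: search from current record
        for i in range(self.current + 1, len(self.records)):
            if predicate(self.records[i]):
                self.current = i
                return self.records[i]
        return None

    def read(self):                    # READ: current record into a variable
        return self.records[self.current]
```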

Unordered Files

 Also called a heap or a pile file
 New records are inserted at the end of the file
 A linear search through the file records is necessary to search for a record
 This requires reading and searching half the file blocks on the average, and is hence quite expensive
 Record insertion is quite efficient
 Reading the records in order of a particular field requires sorting the file records

Ordered Files

 Also called a sequential file
 File records are kept sorted by the values of an ordering field
 Insertion is expensive: records must be inserted in the correct order
 It is common to keep a separate unordered overflow (or transaction) file for new records to improve insertion efficiency; this is periodically merged with the main ordered file
 A binary search can be used to search for a record on its ordering field value
 This requires reading and searching log2 of the file blocks on the average, an improvement over linear search
 Reading the records in order of the ordering field is quite efficient


Ordered Files

 Binary search can be applied to an ordered list:
 1. Start from the middle of the list
 2. Compare: stop if found; otherwise select the half of the list that may contain the record, make it the current list, and go to 1

Average Access Times

 The following table shows the average access time to access a specific record for a given type of file (b is the number of blocks the file uses on the disk)

 File type                       Average blocks read to access a record
 Unordered (heap) file           b/2 (linear search)
 Ordered file (binary search)    log2 b
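The two averages above can be written out as a trivial sketch; the value of b used below is just an example.

```python
import math

def avg_blocks_linear(b):
    """Unordered (heap) file: linear search reads b/2 blocks on average."""
    return b / 2

def avg_blocks_binary(b):
    """Ordered file: binary search reads about log2(b) blocks."""
    return math.log2(b)

# For a file of b = 1024 blocks, linear search averages 512 block reads
# while binary search needs only about 10.
```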

Hashing Techniques – Internal Hashing

 Hashing is a special type of file organization for fast file access (vs. ordered log2(n) and unordered (n/2)). The search condition must be an equality on a single field
 Internal hashing is used for fast access of the records in a table in the memory
 A key field (attribute) in a table uniquely identifies the records. If it is used as the address of the table, then no search is necessary: as long as the key of a record is given, the record can be accessed right away
 The process of mapping the key field (K) to the address is called hashing
 A hashing function does the mapping. A simple approach is the remainder operator MOD (note: it generates collisions)
 h(K) = K MOD M, where M is the number of records in the file
 h(K) generates integer numbers from 0 to M-1, referring to the M entries/addresses in the table

Hashing Algorithms

 Algorithm 13.2. The key field (K) assumes a data type of 20 characters. code returns the ASCII value of a character

(a) temp = 1;
    for i = 1 to 20 do temp = temp * code(K[i]) MOD M;
    hash_address = temp MOD M;

(b) i = hash_address; a = i; new_hash_address = i;
    if (location i is occupied)
    then { i = (i+1) MOD M;
           while (i <> a and location i is occupied)
               { i = (i+1) MOD M; }
           if (i == a)
           then { print "List full"; exit; }
           else { new_hash_address = i; }
         }
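Algorithm 13.2 can be transcribed into Python roughly as follows. This is a sketch: the table size M and the demo key are assumptions, and the slide's fixed 20-character key is relaxed to a string of any length.

```python
M = 6  # table size (number of addresses); a demo assumption

def hash_address(key):
    """Algorithm 13.2(a): fold the characters of the key field into an
    address in 0..M-1; ord() plays the role of the slide's code()."""
    temp = 1
    for ch in key:
        temp = (temp * ord(ch)) % M
    return temp % M

def resolve_collision(table, addr):
    """Algorithm 13.2(b): open addressing. Probe forward circularly from
    the hash address until a free slot is found; None means the table is
    full (the slide's "List full")."""
    i = addr
    if table[i] is None:
        return i
    i = (i + 1) % M
    while i != addr and table[i] is not None:
        i = (i + 1) % M
    if i == addr:
        return None
    return i
```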
Internal Hashing Example

h(K) = K MOD 6, hashing on Ssn. Algorithm 13.2 (a) is used to calculate the hash function; obviously, address collisions occur, and Algorithm 13.2 (b) is used to resolve them:

 Name  Ssn  Job  Salary   h(K) = K MOD 6         new_hash
 John  23   A    100      h(23) = 5              ==> 5
 Kyle  14   B    200      h(14) = 2              ==> 2
 Jay   32   A    150      h(32) = 2 (collision)  ==> 3
 Sue   98   C    100      h(98) = 2 (collision)  ==> 4
 Deb   39   B    200      h(39) = 3 (collision)  ==> 0
 Sean  18   C    150      h(18) = 0 (collision)  ==> 1

Resulting memory storage (address: record):
 0: Deb 39 B 200
 1: Sean 18 C 150
 2: Kyle 14 B 200
 3: Jay 32 A 150
 4: Sue 98 C 100
 5: John 23 A 100

Discussion on Collision Resolution

 Open addressing: from the hash address, check the subsequent positions in order until an unused (open) position is found (Algorithm 13.2 (b))
 Chaining: additional overflow space is provided for collided hash addresses. All collided records are chained together (see example on next slide)
 Multiple hashing: apply a second hash function if the first results in a collision. If another collision results, use open addressing or apply a third hash function and then use open addressing if necessary
 Trade-off of simplicity, space and computation time
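Chaining can be sketched with a Python dict of lists standing in for the overflow space; M and the demo keys below are assumptions.

```python
M = 6  # number of main hash addresses (demo value)

def insert_chained(table, key, value):
    """Collision resolution by chaining: each address holds a chain (list),
    and all records that collide on the same address are linked into it."""
    table.setdefault(key % M, []).append((key, value))

def lookup_chained(table, key):
    """Walk the chain at the key's hash address."""
    for k, v in table.get(key % M, []):
        if k == key:
            return v
    return None
```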

Collision Resolution by Chaining

(Figure: collided records chained together in an overflow area)

External Hashing – Hashed Files

 Hashing for disk files is called external hashing
 The file blocks are divided into M equal-sized buckets, numbered bucket0, bucket1, ..., bucketM-1
 Typically, a bucket corresponds to one (or a fixed number of) disk block
 One of the file fields is designated to be the hash key of the file
 The record with hash key value K is stored in bucket i, where i = h(K), and h is the hashing function
 Search is very efficient on the hash key
 Collisions occur when a new record hashes to a bucket that is already full
 An overflow file is kept for storing such records
 Overflow records that hash to each bucket can be linked together
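External hashing with a per-bucket overflow chain can be sketched as follows; the bucket count, bucket capacity, and demo keys are assumptions.

```python
M = 4            # number of buckets (demo value)
BUCKET_CAP = 2   # records per bucket, i.e. one "disk block" here

def h(key):
    return key % M

def insert(buckets, overflow, key):
    """Store the record in bucket h(K); if that bucket (block) is full,
    the record goes to the overflow area, chained per bucket."""
    b = h(key)
    if len(buckets[b]) < BUCKET_CAP:
        buckets[b].append(key)
    else:
        overflow.setdefault(b, []).append(key)

def search(buckets, overflow, key):
    """Check the home bucket first, then its overflow chain."""
    b = h(key)
    return key in buckets[b] or key in overflow.get(b, [])
```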
Bucket Number to Block Matching

(Figure: h(K) yields a bucket number, which is matched to a disk block address via a table stored in the file header)

Hashed Files – Overflow Handling

(Figure: overflow records from the same bucket are chained together in the overflow area)

Hashed Files Discussions

 To reduce overflow records, a hash file is typically kept 70-80% full
 The hash function h should distribute the records uniformly among the buckets
 Otherwise, search time will be increased because many overflow records will exist
 Main disadvantages of static external hashing:
 Fixed number of buckets M is a problem if the number of records in the file grows or shrinks
 Ordered access on the hash key is quite inefficient (requires sorting the records)

Dynamic File Hashing

 Dynamic hashing techniques
 Hashing techniques are adapted to allow the dynamic growth and shrinking of the number of file records
 Extendible hashing
 Linear hashing


Extendible Hashing

 Extendible hashing uses the binary representation of the hash value h(K) in order to access a directory
 The directory is an array of size 2^d, where d is called the global depth and is the number of binary digits used for the directory addresses; it determines the number of entries of the directory
 d can be increased or decreased by one at a time, doubling or halving the size of the directory
 Assume a record's key field value is K. Then the first d binary digits of h(K) determine which directory entry it belongs to, and therefore which bucket it belongs to

Extendible Hashing (contd.)

 The directories can be stored on disk, and they expand or shrink dynamically
 Each directory entry points to a bucket, which points to the disk block/blocks that contain the stored records
 Each bucket has a local depth d', d' ≤ d
 An insertion in a disk block that is full causes the block/bucket to split into two by increasing the local depth (e.g. 01 becomes 010 and 011), and the records are redistributed among the two blocks based on their hashed keys with the new local depth d'
 Extendible hashing does not require an overflow area
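A compact, illustrative Python sketch of these rules follows. The bucket capacity and using K itself as h(K) are demo assumptions, and the sketch indexes the directory on the low-order d bits of the key rather than the leading bits, which keeps the arithmetic simple but is otherwise the same mechanism (directory of 2^d entries, local depth d' per bucket, bucket split on overflow, directory doubling when d' = d).

```python
class ExtendibleHash:
    """Toy extendible hashing: directory of 2**d entries, buckets carry a
    local depth d' <= d, no overflow area."""

    CAP = 2  # records per bucket (demo value)

    def __init__(self):
        self.d = 1                                 # global depth
        b0 = {"depth": 1, "keys": []}
        b1 = {"depth": 1, "keys": []}
        self.dir = [b0, b1]                        # 2**d directory entries

    def _index(self, key):
        return key % (2 ** self.d)                 # low-order d bits pick entry

    def insert(self, key):
        while True:
            b = self.dir[self._index(key)]
            if len(b["keys"]) < self.CAP:
                b["keys"].append(key)
                return
            if b["depth"] == self.d:               # no spare bit left:
                self.d += 1                        # double the directory
                self.dir = self.dir + self.dir
            self._split(b)                         # then split the full bucket

    def _split(self, b):
        nd = b["depth"] + 1                        # new local depth d' + 1
        lo = {"depth": nd, "keys": []}
        hi = {"depth": nd, "keys": []}
        for k in b["keys"]:                        # redistribute on one more bit
            (hi if (k >> (nd - 1)) & 1 else lo)["keys"].append(k)
        for i, entry in enumerate(self.dir):       # repoint directory entries
            if entry is b:
                self.dir[i] = hi if (i >> (nd - 1)) & 1 else lo

    def find(self, key):
        return key in self.dir[self._index(key)]["keys"]
```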

Extendible Hashing (example)

(Figure: a directory with global depth d = 3 and buckets of local depth d' = 2 and d' = 3.
 A bucket with d' = 2, e.g. prefix 01, before expansion: if full, expand to d' = 3 and redistribute records based on 010 and 011.
 A bucket with d' = 3, e.g. prefix 110, before expansion: if full, expand d to 4 and redistribute records based on 1100 and 1101; the directory then has 16 entries.)

Linear Hashing

 Linear hashing does require an overflow area but does not use a directory
 Initial hashing is h1(K) = K MOD M, M = number of buckets
 Buckets are split in linear order as overflows occur: when the first overflow occurs, bucket0 is split into two; when the second overflow occurs, bucket1 is split into two, ...
 A variable n keeps track of which bucket has just been split (i.e. how many have been split). When a bucket is split, its records are redistributed between the two buckets based on a new hash function h_{i+1}(K) = K MOD 2^i·M (this is possible because of the property h_{i+1}(K) = either h_i(K) or h_i(K) + 2^(i-1)·M)
 At each bucket, an overflow chain is needed for records that cause overflows while the bucket's turn to split is yet to come
Linear Hashing (example)

(Figure: M = 5 buckets, h1(K) = K MOD 5. Buckets 0, 1 and 2 have been split in turn, creating buckets 5, 6 and 7, so n = 3. When h1(K) < n, or when splitting, h2(K) = K MOD 2M is used.)

Linear Hashing (contd.)

 In retrieving a record with field value K, if h(K) < n, then re-hash it using h_{i+1}(K), because the record was in a bucket which has been split. So n divides the records as far as which hash function is used
 When n = M, all original M buckets have been split and h_{i+1} applies to all buckets
 n can then be reset to zero, and any new overflow leads to the use of a new hash function h_{i+2}(K) = K MOD 4M
 In general, during pass j of splitting (j = 0, 1, 2, ...), the hash function h_{i+j}(K) = K MOD 2^j·M is applied first; records that hash to a bucket number < n are re-hashed with h_{i+j+1}(K) = K MOD 2^(j+1)·M

Linear Hashing Search Algorithm

 Algorithm 13.3. Given field value K of a record, decide its hash value (a is the hash value, a zero or positive integer pointing to a bucket):

a = hj(K);
if (a < n)
then { a = hj+1(K); }

Parallelizing Disk Access using RAID Technology

 Secondary storage technology must take steps to keep up in performance and reliability with processor technology
 A major advance in secondary storage technology is represented by the development of RAID, which stands for Redundant Arrays of Inexpensive Disks
 The main goal of RAID is to even out the widely different rates of performance improvement of disks against those in memory and microprocessors. (Disk performance – access speed, capacity, etc. – has been found to improve at a much slower rate than memory and CPU)
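The split rule and Algorithm 13.3's addressing can be sketched together in Python. This is a toy model: M, the bucket capacity, and using K itself as the hash input are demo assumptions, and overflow chains are modeled implicitly by letting a bucket list grow past its capacity.

```python
class LinearHash:
    """Toy linear hashing: buckets split in linear order as overflows
    occur; n tracks the next bucket to split."""

    CAP = 2  # records per bucket before an overflow triggers a split

    def __init__(self, M=4):
        self.M = M                        # current base modulus for h_i
        self.n = 0                        # next bucket to split
        self.buckets = [[] for _ in range(M)]

    def _addr(self, key):
        a = key % self.M                  # Algorithm 13.3: a = h_i(K)
        if a < self.n:                    # bucket a was already split:
            a = key % (2 * self.M)        # re-hash with h_{i+1}(K)
        return a

    def insert(self, key):
        a = self._addr(key)
        self.buckets[a].append(key)
        if len(self.buckets[a]) > self.CAP:
            self._split()                 # split the NEXT bucket in line,
                                          # not necessarily the one that overflowed

    def _split(self):
        self.buckets.append([])           # image bucket n + M
        old, self.buckets[self.n] = self.buckets[self.n], []
        self.n += 1
        for k in old:                     # redistribute with h_{i+1}
            self.buckets[k % (2 * self.M)].append(k)
        if self.n == self.M:              # pass complete: h_{i+1} everywhere
            self.n = 0
            self.M *= 2

    def find(self, key):
        return key in self.buckets[self._addr(key)]
```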


RAID Technology

 A natural solution is a large array of small independent disks acting as a single higher-performance logical disk
 A concept called data striping is used, which utilizes parallelism to improve disk performance
 Bit-level striping and block-level striping
 Data striping distributes data transparently over multiple disks to make them appear as a single large and fast disk
 Redundancy is necessary. One technique is called mirroring (shadowing): data is stored redundantly on two identical disks, but they are treated as one logical disk
 The reliability is significantly improved: for each pair of mirrored disks, the MTBF becomes 190,304 years

Improving Reliability

 Assume that a disk's mean time between failures (MTBF) is about 200,000 hours (22.8 years)
 For 100 disks running in RAID, the MTBF becomes 2,000 hours, or 83.3 days. This is very bad: if the mean time to repair (MTTR) is 24 hours, the downtime per year is (365/83.3)*24 = 105 hours, or 4.38 days, which means a lot of data loss. The unavailability (U) is 24/(2000+24) = 0.012, so the availability (A) = 1 – 0.012 = 0.988. The industry standard usually requires five 9's (0.99999 = 99.999%, about 5 minutes downtime/year)
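The reliability arithmetic above can be checked in a few lines of Python; all the numbers are taken from the slide itself.

```python
# A disk's MTBF, from the slide: 200,000 hours (about 22.8 years).
disk_mtbf_hours = 200_000

# 100 independent disks: the array fails roughly 100x as often.
array_mtbf_hours = disk_mtbf_hours / 100      # 2,000 hours, about 83.3 days

# With a 24-hour mean time to repair (MTTR):
mttr_hours = 24
downtime_per_year = (365 / (array_mtbf_hours / 24)) * 24   # about 105 hours

unavailability = mttr_hours / (array_mtbf_hours + mttr_hours)   # about 0.012
availability = 1 - unavailability                               # about 0.988
```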

RAID Organizations and Levels

 Different RAID organizations were defined based on different combinations of two factors: the granularity of data interleaving (striping) and the redundancy pattern
 RAID level 0 has no redundant data and hence has the best write performance, at the risk of data loss
 RAID level 1 uses mirrored disks
 RAID level 2 uses memory-style redundancy by using Hamming codes, which contain parity bits for distinct overlapping subsets of components. Level 2 includes both error detection and correction
 RAID level 3 uses a single parity disk, relying on the disk controller to figure out which disk has failed
 RAID levels 4 and 5 use block-level data striping, with level 5 distributing data and parity information across all disks
 RAID level 6 applies the so-called P + Q redundancy scheme using Reed-Solomon codes to protect against up to two disk failures by using just two redundant disks

(Figure: multiple levels of RAID)
Direct-Attached Storage (DAS)

(Figure: servers with directly attached disks; JBOD = Just a Bunch Of Disks, with no redundancy and no parallelism)

Network-Attached Storage (NAS)

(Figure: storage device attached to an Ethernet LAN; LAN = Local Area Network)

Storage Area Networks

(Figure: servers and storage interconnected over Ethernet, Ethernet/Fibre Channel, SONET/DWDM/fiber, and Ethernet over fiber links)

Storage Area Networks (contd.)

 The demand for higher storage has risen considerably in recent times
 Organizations have a need to move from a static, fixed data-center-oriented operation to a more flexible and dynamic infrastructure for information processing
 Thus they are moving to the concept of Storage Area Networks (SANs)
 A SAN is a network of interconnected computers and data storage devices
 In a SAN, online storage peripherals are configured as nodes on a high-speed network and can be attached to and detached from servers in a very flexible manner
 This allows storage systems to be placed at longer distances from the servers and provides different performance and connectivity options
Storage Area Networks (contd.)

 Advantages of SANs are:
 Flexible many-to-many connectivity among servers and storage devices using Fibre Channel hubs and switches
 Up to 10 km separation between a server and a storage system using appropriate fiber optic cables (this can be significantly extended by such technology as dense wavelength division multiplexing)
 Better isolation capabilities allowing non-disruptive addition of new peripherals and servers
 SANs face the problem of combining storage options from multiple vendors and dealing with evolving standards of storage management software and hardware

Summary

 Disk Storage Devices
 Files of Records
 Operations on Files
 Ordered and Unordered Files
 Hashed Files
 Extendible and Linear Hashing Techniques
 RAID Technology
 NAS and SAN

Assignment #11

 Page 507: 13.27, 13.28
 Due 11/16/09
