
File Organization and Storage Structures

o Storage of data
– Primary Storage = Main Memory
• Fast
• Volatile
• Expensive
– Secondary Storage = Files on disks or tapes
• Non-Volatile

=> Secondary Storage is preferred for storing data


Basic Concepts

o Information is stored in data files
o Each file is a sequence of records
o Each record consists of one or more fields

Sno   Lname  Position  NIN        Bno
SL21  White  Manager   WK440211B  B5
SG37  Beech  Snr Asst  WL432514C  B3
SG14  Ford   Deputy    WL220658D  B3

Logical Record Vs Physical Record

o Logical record
– "A record"
– Eg. The record of a staff member (SG37).
o Physical record
– The unit of transfer between disk and primary storage.
– "A page", "A block"

Generally, a physical record consists of more than one logical record.


CS3462 Introduction to Database Systems


Helena Wong, 2001
Logical Record Vs Physical Record

Sno   Lname  Position   NIN        Bno  Page
SL21  White  Manager    WK440211B  B5   1
SG37  Beech  Snr Asst   WL432514C  B3   1
SG14  Ford   Deputy     WL220658D  B3   1
SA9   Howe   Assistant  WM532187D  B7   2
SG5   Brand  Manager    WK588932E  B3   2
SL41  Lee    Assistant  WA290573K  B5   2

File Organization & Access Method

o File Organization means the physical arrangement of data in a file into records and pages on secondary storage
– Eg. Ordered files, indexed sequential files etc.
o Access Method means the steps involved in storing and retrieving records from a file.
– Eg. Using an indexed access method to retrieve a record from an indexed sequential file.
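To make the difference between logical and physical records concrete, the following is a minimal Python sketch that groups the six staff records into pages. The page size of three records is read off the table and should be treated as an assumption; the names `paginate` and `page_of` are hypothetical helpers, not part of the course material.

```python
# Logical records: one tuple per staff member (values from the staff table).
STAFF = [
    ("SL21", "White", "Manager", "WK440211B", "B5"),
    ("SG37", "Beech", "Snr Asst", "WL432514C", "B3"),
    ("SG14", "Ford", "Deputy", "WL220658D", "B3"),
    ("SA9", "Howe", "Assistant", "WM532187D", "B7"),
    ("SG5", "Brand", "Manager", "WK588932E", "B3"),
    ("SL41", "Lee", "Assistant", "WA290573K", "B5"),
]

RECORDS_PER_PAGE = 3  # assumption: three logical records fit in one page


def paginate(records, per_page=RECORDS_PER_PAGE):
    """Group logical records into physical records (pages), numbered from 1."""
    starts = range(0, len(records), per_page)
    return {
        page_no: records[start:start + per_page]
        for page_no, start in enumerate(starts, start=1)
    }


def page_of(sno, pages):
    """Return the number of the page that holds staff number `sno`, or None."""
    for page_no, records in pages.items():
        if any(record[0] == sno for record in records):
            return page_no
    return None


pages = paginate(STAFF)
print("SG37 is on page", page_of("SG37", pages))
print("SG5 is on page", page_of("SG5", pages))
```

Reading a single logical record such as SG37 transfers the whole page that contains it, which is why the page (not the record) is the unit of disk I/O.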


Heap Files

o Heap files are files of unordered records.
o Quick insertion (no particular ordering)
– When a new record is created, it is put in the last page of the file if there is sufficient space. Otherwise a new page is added to the file.
o Slow retrieval (only allows linear search)
– Pages are read from the file until the required record is found.
o To delete a record, the record is marked as deleted. Space is reclaimed during periodic reorganization.

Ordered Files

o Ordered Files: Records are sorted on field(s) => Key
o Allow Binary Searching

Suppose one page stores one record.

To search for SG37, search the middle page (6/2 = 3) first. SG37 is not in this page (it holds SG14). Since SG37 is greater than SG14, we next search the middle page of the half of the file that follows page 3, and so on.
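The contrast between linear search on a heap file and binary search on an ordered file can be sketched in Python as follows. This is an illustration under stated assumptions: one record per page (as in the slide), and staff numbers ordered by letter prefix and then by numeric suffix, which is what places SG14 on page 3. The helper names are hypothetical.

```python
import re


def sort_key(sno):
    """Split a staff number such as 'SG14' into ('SG', 14).

    Assumption: the ordered file is sorted by prefix, then by number.
    """
    prefix, number = re.fullmatch(r"([A-Z]+)(\d+)", sno).groups()
    return prefix, int(number)


HEAP_FILE = ["SL21", "SG37", "SG14", "SA9", "SG5", "SL41"]  # arrival order
ORDERED_FILE = sorted(HEAP_FILE, key=sort_key)  # one record per page


def linear_search(pages, target):
    """Heap file: read pages in sequence. Returns (page_no, pages_read)."""
    for page_no, sno in enumerate(pages, start=1):
        if sno == target:
            return page_no, page_no
    return None, len(pages)


def binary_search(pages, target):
    """Ordered file: halve the range on every page read.

    Pages are numbered from 1. Returns (page_no, pages_read).
    """
    low, high = 1, len(pages)
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        reads += 1
        sno = pages[mid - 1]
        if sno == target:
            return mid, reads
        if sort_key(target) > sort_key(sno):
            low = mid + 1   # continue in the pages after `mid`
        else:
            high = mid - 1  # continue in the pages before `mid`
    return None, reads


print("ordered file:", ORDERED_FILE)
print("binary search for SG37:", binary_search(ORDERED_FILE, "SG37"))
print("linear search for SL41:", linear_search(HEAP_FILE, "SL41"))
print("binary search for SL41:", binary_search(ORDERED_FILE, "SL41"))
```

For SG37 the first page read is page 3 (SG14), matching the walkthrough; the search then moves to the pages after it.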

Ordered Files

o Inserting a record
– If the appropriate page is full, the whole file may have to be re-organized => Time consuming
– Solution: use a temporary unsorted file (transaction file). Merge it into the sorted file periodically.
o Rarely used unless combined with an index => Indexed Sequential File
o Both Heap Files and Ordered Files are also called Sequential Files.

Direct Files

o Direct Files are also called Hash Files or Random Files
o No need to write records sequentially
o Use a hash function to calculate the number of the page (bucket) in which a record should be located
o Eg., use the division-remainder calculation method:

bucket_no = Record_key mod 3
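The division-remainder method (bucket_no = Record_key mod 3) can be sketched in Python as below. The slides do not say how a staff number such as SG37 becomes a numeric key; this sketch assumes the numeric part of the staff number is used, and `record_key` is a hypothetical helper.

```python
import re

NUM_BUCKETS = 3  # from the slide: bucket_no = Record_key mod 3


def record_key(sno):
    """Assumption: the digits of the staff number form the hash key."""
    return int(re.search(r"\d+", sno).group())


def bucket_no(sno, num_buckets=NUM_BUCKETS):
    """Division-remainder hashing: key modulo the number of buckets."""
    return record_key(sno) % num_buckets


for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    print(sno, "-> bucket", bucket_no(sno))
```

Because the bucket is computed directly from the key, a record can be located without scanning or sorting the file.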


Direct Files

o Problem: If a new record SL41 is created, which bucket should it go to?
o Collision Management
– Open addressing
– Unchained overflow
– Chained overflow
– Multiple hashing

Direct Files: Open Addressing

o Upon a collision, the system performs a linear search to find the first available slot.
o When the last bucket has been searched, the search starts again from the first bucket.
o SL41 will be inserted into: Bucket 1
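Open addressing can be sketched in Python as follows. The figure that shows the bucket contents is not reproduced here, so two things are assumptions: each bucket holds two records, and the hash key is the numeric part of the staff number taken mod 3. The function names are hypothetical.

```python
import re

NUM_BUCKETS = 3
SLOTS_PER_BUCKET = 2  # assumption: each bucket (page) holds two records


def bucket_no(sno):
    """Assumption: hash on the digits of the staff number, mod 3."""
    return int(re.search(r"\d+", sno).group()) % NUM_BUCKETS


def insert_open_addressing(buckets, sno):
    """Store `sno` in its home bucket, or in the next bucket with a free slot.

    After the last bucket the search wraps around to the first bucket.
    Returns the number of the bucket that received the record.
    """
    home = bucket_no(sno)
    for step in range(NUM_BUCKETS):
        candidate = (home + step) % NUM_BUCKETS
        if len(buckets[candidate]) < SLOTS_PER_BUCKET:
            buckets[candidate].append(sno)
            return candidate
    raise RuntimeError("no free slot in any bucket")


buckets = [[] for _ in range(NUM_BUCKETS)]
for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    insert_open_addressing(buckets, sno)

print("before:", buckets)
print("SL41 goes to bucket", insert_open_addressing(buckets, "SL41"))
print("after: ", buckets)
```

Under these assumptions SL41 hashes to bucket 2, which is full; the search wraps to bucket 0 (also full) and ends at bucket 1, which agrees with the slide.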
Direct Files: Unchained Overflow

o An overflow area is maintained for collisions.
o SL41 will be inserted into: Bucket 3

Direct Files: Chained Overflow

o Each bucket has a synonym pointer
o Value of the synonym pointer:
– Zero: no collision occurred
– Non-zero: the overflow bucket used
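A chained overflow area with synonym pointers might look like the Python sketch below. As before, the bucket capacity of two records and the use of the staff number's digits as hash key are assumptions, and the `Bucket` and `ChainedFile` classes are hypothetical illustrations rather than the course's implementation.

```python
import re

NUM_BUCKETS = 3       # primary buckets 0..2
SLOTS_PER_BUCKET = 2  # assumption: each bucket holds two records


def bucket_no(sno):
    """Assumption: hash on the digits of the staff number, mod 3."""
    return int(re.search(r"\d+", sno).group()) % NUM_BUCKETS


class Bucket:
    def __init__(self):
        self.records = []
        self.synonym = 0  # 0 = no collision; otherwise an overflow bucket number


class ChainedFile:
    def __init__(self):
        self.buckets = {n: Bucket() for n in range(NUM_BUCKETS)}
        self.next_overflow = NUM_BUCKETS  # overflow buckets start at 3

    def insert(self, sno):
        """Insert `sno`, following synonym pointers when a bucket is full."""
        n = bucket_no(sno)
        while True:
            bucket = self.buckets[n]
            if len(bucket.records) < SLOTS_PER_BUCKET:
                bucket.records.append(sno)
                return n
            if bucket.synonym == 0:  # first collision here: chain a new bucket
                bucket.synonym = self.next_overflow
                self.buckets[self.next_overflow] = Bucket()
                self.next_overflow += 1
            n = bucket.synonym

    def find(self, sno):
        """Return the bucket number that holds `sno`, or None."""
        n = bucket_no(sno)
        while True:
            bucket = self.buckets[n]
            if sno in bucket.records:
                return n
            if bucket.synonym == 0:
                return None
            n = bucket.synonym


staff_file = ChainedFile()
for sno in ["SL21", "SG37", "SG14", "SA9", "SG5"]:
    staff_file.insert(sno)

print("SL41 stored in bucket", staff_file.insert("SL41"))
print("synonym pointer of bucket 2:", staff_file.buckets[2].synonym)
```

A lookup only follows the chain when the home bucket's synonym pointer is non-zero, so records that never collided cost a single bucket read.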


Direct Files: Multiple Hashing

o Upon collision, apply a second hashing function to produce a new hash address in an overflow area.

Direct Files: Limitation (of Hashing)

Inappropriate for some retrievals:
– Based on pattern matching
eg. Find all students with ID like 98xxxxxx.
– Involving ranges of values
eg. Find all students from 50100000 to 50199999.
– Based on a field other than the hash field
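The range-query limitation can be illustrated with a short Python sketch. It assumes the same division-remainder hash with three buckets, applied directly to numeric student IDs; both the hash choice and the function names are assumptions made for illustration.

```python
NUM_BUCKETS = 3


def bucket_no(student_id):
    """Assumption: division-remainder hashing on the numeric student ID."""
    return student_id % NUM_BUCKETS


def buckets_for_exact_match(student_id):
    """An exact-match lookup reads one home bucket."""
    return [bucket_no(student_id)]


def buckets_for_range(low, high):
    """Buckets that may hold IDs in low..high (inclusive)."""
    return sorted({bucket_no(k) for k in range(low, high + 1)})


print("exact match:", buckets_for_exact_match(50100000))
print("range of 3 :", buckets_for_range(50100000, 50100002))
```

Consecutive IDs hash to different buckets, so even a very small range already touches every bucket. The hash function preserves no ordering, which is why range and pattern queries degrade to scanning the whole file.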

Indexes

Index: A data structure that allows particular records in a file to be located more quickly
~ Index in a book

An index can be sparse or dense:

Sparse: record for only some of the search key values (eg. Staff Ids: CS001, EE001, MA001). Applicable to ordered data files only.

Dense: record for every search key value. (eg. Staff Ids: CS001, CS002, .. CS089, EE001, EE002, ..)

Indexes

TERMINOLOGY

Data file: a file containing the logical records

Index file: a file containing the index records

Indexing field: the field used to order the index records in the index file

Key: One or more fields which can uniquely identify a record (eg. No 2 students have the same student ID).
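The difference between a dense and a sparse index can be sketched in Python as follows. The data file contents are hypothetical (three staff IDs per page, modelled on the CS/EE/MA examples), and the sparse index is assumed to hold the first key of each page.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each inner list is one page, sorted by ID.
DATA_PAGES = [
    ["CS001", "CS002", "CS003"],
    ["EE001", "EE002", "EE003"],
    ["MA001", "MA002", "MA003"],
]

# Dense index: one entry for every search key value -> page number.
dense_index = {
    key: page_no
    for page_no, page in enumerate(DATA_PAGES)
    for key in page
}

# Sparse index: one entry per page (assumption: the first key on that page).
sparse_keys = [page[0] for page in DATA_PAGES]


def lookup_dense(key):
    """Return the page number of `key` straight from the index, or None."""
    return dense_index.get(key)


def lookup_sparse(key):
    """Find the last index entry <= key, then scan that one page."""
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first page
    return pos if key in DATA_PAGES[pos] else None


print("dense entries :", len(dense_index))
print("sparse entries:", len(sparse_keys))
print("EE002 via dense :", lookup_dense("EE002"))
print("EE002 via sparse:", lookup_sparse("EE002"))
```

The sparse lookup only works because the data file is ordered: the index narrows the search to one page, and the record is then found by scanning that page.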


Indexes

TYPES OF INDEXES

Primary Index: An index ordered in the same way as the data file, which is sequentially ordered according to a key. (The indexing field is equal to this key.)

Secondary Index: An index that is defined on a non-ordering field of the data file. (The indexing field need not contain unique values.)

=> A data file can be associated with at most one primary index plus several secondary indexes.

Indexed Sequential Files

What are Indexed Sequential Files?
= A sorted data file with a primary index

Advantage of an Indexed Sequential File
Allows both sequential processing and individual record retrieval through the index.

Structure of an Indexed Sequential File
o A primary storage area
o A separate index or indexes
o An overflow area
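A primary index and two secondary indexes over the same data file can be sketched in Python as below. The staff records come from the earlier table; ordering the file on Sno, storing record positions in the index, and the helper name `build_secondary_index` are assumptions made for the illustration.

```python
from collections import defaultdict

# Data file assumed to be ordered on Sno (the key): (Sno, Lname, Position, Bno)
DATA_FILE = [
    ("SA9", "Howe", "Assistant", "B7"),
    ("SG5", "Brand", "Manager", "B3"),
    ("SG14", "Ford", "Deputy", "B3"),
    ("SG37", "Beech", "Snr Asst", "B3"),
    ("SL21", "White", "Manager", "B5"),
    ("SL41", "Lee", "Assistant", "B5"),
]

# Primary index: built on the ordering key, one position per unique Sno.
primary_index = {record[0]: pos for pos, record in enumerate(DATA_FILE)}


def build_secondary_index(field_no):
    """Index a non-ordering field; values may repeat, so keep position lists."""
    index = defaultdict(list)
    for pos, record in enumerate(DATA_FILE):
        index[record[field_no]].append(pos)
    return dict(index)


by_position = build_secondary_index(2)  # secondary index on Position
by_branch = build_secondary_index(3)    # secondary index on Bno

print("SG37 at position", primary_index["SG37"])
print("Managers:", [DATA_FILE[p][0] for p in by_position["Manager"]])
print("Branch B3:", [DATA_FILE[p][0] for p in by_branch["B3"]])
```

The file can be sorted in only one order, so there is a single primary index, while any number of secondary indexes can be added for the other fields.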

B+-Trees

In a B+-Tree, data or indexes are stored in a hierarchy of nodes.

TERMINOLOGY

Degree (Order): The maximum number of children allowed per parent.

Depth: The maximum number of levels between the root node and a leaf node in the tree.

B+-Trees

o B => Balanced
o Consistent access time (for each access, the same number of nodes is searched)

(Figure omitted: B+-tree; the leaf-level pointers point to data.)

B+-Trees

In practice, each node in the tree is actually a page, so we can store many pointers and keys. Eg. For a page size of 4KB, the B+-Tree can be of order 512.

Access time depends more often upon depth than on breadth => Shallow trees are preferred.

RULES

o The root (if not a leaf node) must have at least 2 children.
o For a tree of order n, each node (except root and leaf) must have between n/2 and n pointers and children. If n/2 is not an integer, the result is rounded up.

B+-Trees

RULES (Cont'd):

o For a tree of order n, the number of key values in a leaf node must be between (n-1)/2 and (n-1). If (n-1)/2 is not an integer, the result is rounded up.
o The number of key values contained in a nonleaf node is 1 less than the number of pointers.
o The tree must always be balanced: every path from the root node to a leaf must have the same length.
o Leaf nodes are linked in order of key values.
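The occupancy rules for a B+-tree of order n can be turned into a small Python calculation. This is a sketch of the arithmetic only, not a B+-tree implementation; the function names are hypothetical, and `max_keys` assumes that `levels` counts every level of nodes including the root and the leaves.

```python
import math


def bplus_bounds(n):
    """Return (min, max) occupancy limits for a B+-tree of order n."""
    min_pointers = math.ceil(n / 2)  # n/2, rounded up
    return {
        # nonleaf nodes other than the root
        "internal_pointers": (min_pointers, n),
        # a nonleaf node has one key fewer than it has pointers
        "internal_keys": (min_pointers - 1, n - 1),
        # leaf nodes: (n-1)/2 rounded up, to n-1
        "leaf_keys": (math.ceil((n - 1) / 2), n - 1),
    }


def max_keys(n, levels):
    """Upper bound on key values stored in the leaves.

    Assumption: `levels` counts all node levels, root and leaves included,
    and every node is completely full.
    """
    max_leaves = n ** (levels - 1)
    return max_leaves * (n - 1)


for order in (4, 5, 512):
    print(order, bplus_bounds(order))

print("order 512, 3 levels:", max_keys(512, 3), "keys at most")
```

With a large order the leaf capacity grows very quickly with each extra level, which is why a shallow tree is enough for large files.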
B+-Trees

Balancing can be costly to maintain.

Example: Adding SG14 (figure omitted)

B+-Trees

Example: Adding SA9 (figure omitted)


B+-Trees

Example: Adding SA9 (figure omitted)

Summary

o Basic concepts (Files, Records, Fields)
o Primary storage vs secondary storage
o Logical record vs physical record
o File Organization (and access methods)
– Heap files
– Ordered Files (Binary Search)
– Direct Files (Hashing)
– Indexes
– Indexed Sequential Files
– B+-Trees