TUTORIAL ON FILE ORGANIZATION
• Clustered B+ index
  ◦ Balanced tree with high fanout
Heap Files
• Inserting a row:
  ◦ Access path is a scan of the file's N pages
  ◦ Avg. N/2 page transfers if the row already exists
  ◦ N+1 page transfers if the row does not already exist
• Deleting a row:
  ◦ Access path is a scan
  ◦ Avg. N/2+1 page transfers if the row exists
  ◦ N page transfers if the row does not exist
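The heap-file costs above can be written down as a small sketch. The function names are hypothetical, and the model assumes (as the bullets do) that every insert first scans for a duplicate and that a found row sits, on average, halfway through the file.

```python
def heap_insert_cost(n_pages: int, row_exists: bool) -> float:
    """Average page transfers to insert a row into a heap file of n_pages pages."""
    if row_exists:
        # The duplicate is found, on average, halfway through the scan.
        return n_pages / 2
    # Every page is read to rule out a duplicate, then one page is written.
    return n_pages + 1


def heap_delete_cost(n_pages: int, row_exists: bool) -> float:
    """Average page transfers to delete a row from a heap file of n_pages pages."""
    if row_exists:
        # Half the file is read on average, then the modified page is written back.
        return n_pages / 2 + 1
    # The whole file is scanned and nothing is written.
    return n_pages


insert_missing = heap_insert_cost(1000, row_exists=False)
delete_existing = heap_delete_cost(1000, row_exists=True)
```

Under these assumptions, a 1000-page file costs 1001 transfers to insert a new row and about 501 to delete an existing one.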
SELECT T.Grade
FROM Transcript T
WHERE T.StudId = 12345 AND T.CrsCode = 'CS305'
  AND T.Semester = 'S2000'
CSD Univ. of Crete Fall 2005
Sorted File
• Rows are sorted based on some attribute(s)
  ◦ Access path is binary search
  ◦ An equality or range query on that attribute costs log₂ N page transfers to retrieve the page containing the first row
  ◦ Successive rows are in the same (or successive) page(s), so cache hits are likely
  ◦ By storing all pages on the same track, seek time can be minimized
• Problem: after the correct position for an insert has been determined, inserting the row requires (on average) N/2 reads and N/2 writes, because the rows that follow that position must be shifted
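The binary-search access path can be illustrated with a minimal sketch. It assumes the file is modelled as a list of pages, each page a sorted list of keys; `binary_search_page` is a hypothetical helper, and each probe of a page is counted as one page transfer.

```python
import math


def binary_search_page(pages, key):
    """Return (page_index, pages_read) for the first page that may hold `key`."""
    lo, hi = 0, len(pages) - 1
    reads = 0
    while lo < hi:
        mid = (lo + hi) // 2
        reads += 1                    # one page transfer to inspect page `mid`
        if pages[mid][-1] < key:      # largest key on this page is too small
            lo = mid + 1
        else:
            hi = mid
    reads += 1                        # read the final candidate page
    return lo, reads


# 16 pages of 4 keys each: keys 0..63 in sorted order.
pages = [[p * 4 + j for j in range(4)] for p in range(16)]
idx, reads = binary_search_page(pages, 37)
bound = math.ceil(math.log2(len(pages))) + 1
```

In this model the number of pages read stays within roughly log₂ N of the file size, which is the cost the slide quotes. The sketch does not model the N/2 reads and N/2 writes needed to shift rows on insert.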
Overflow

Page 0 (pointer to overflow chain: 3)
  111111 MGT123 F1994 4.0
  111111 CS305 S1996 4.0
  111111 ECO101 F2000 3.0
  122222 REL211 F2000 2.0

Page 1 (pointer to overflow chain: -)
  123456 CS315 S1997 4.0
  123456 EE101 S1998 3.0
  232323 MAT123 S1996 2.0
  234567 EE101 F1995 3.0

Page 2 (pointer to overflow chain: -)
  234567 CS305 S1999 4.0

Page 3 (chain pointer: 7)
  111654 CS305 F1995 2.0
  111233 PSY 220 S2001 3.0
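One way to picture the overflow mechanism in the figure is the sketch below. It is only an illustration under stated assumptions: a page holds four rows (as in the figure), `Page` and `insert` are hypothetical names, and a full page sends new rows down its overflow chain instead of shifting the rest of the file.

```python
from dataclasses import dataclass, field
from typing import Optional

PAGE_CAPACITY = 4  # assumption: four rows per page, as drawn in the figure


@dataclass
class Page:
    rows: list = field(default_factory=list)
    overflow: Optional["Page"] = None  # pointer to overflow chain (None means "-")


def insert(page: Page, row: tuple) -> int:
    """Insert `row` starting at `page`; return the number of pages visited."""
    visited = 1
    while len(page.rows) >= PAGE_CAPACITY:
        if page.overflow is None:
            page.overflow = Page()     # allocate a new overflow page
        page = page.overflow
        visited += 1
    page.rows.append(row)
    return visited


page0 = Page(rows=[
    ("111111", "MGT123", "F1994", 4.0),
    ("111111", "CS305", "S1996", 4.0),
    ("111111", "ECO101", "F2000", 3.0),
    ("122222", "REL211", "F2000", 2.0),
])
new_row = ("111654", "CS305", "F1995", 2.0)
visited = insert(page0, new_row)
```

The insert touches only the full page and its chain, so it avoids the N/2 reads and writes of a strict sorted file. The trade-off is that rows in the overflow chain are no longer in sorted position, so a lookup must follow the chain as well.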
NOTATION:
• B: The number of data pages
• R: Number of records per page
• F: Fanout of B-tree (number of children for each non-leaf node)
• S: I/O's for equality search
NOTES:
• Average-case analysis, based on several simplistic assumptions (see text for more detail)
• CPU costs are ignored
Delete:  S + 1 | S + B | S + 1 | S + 2 | 4
Questions 8-11
8. How long will it take to find all employee numbers in the range
K0011200..K0011299?
9. How does the cost here compare with that in sorted and heap files?
10. How long will it take to find all employee numbers beginning with K?
11. How does the cost here compare with that in sorted and heap files?
Q8 – Hash Search
• Range: K0011200..K0011299
• How many pages will this be distributed over?
• Not known. We really need an idea of the key density: does every possible key exist?
• Maximum is over pages 200, 201, ..., 299, so 100 pages; minimum is over 0 pages (no key in the range exists)
• Because of this uncertainty, a range search on a hashed file (even on the hash key) involves a complete scan
• Cost = 1000*(D + RC) = 1000*(0.015 + (5000*10^-7)) = 15.5 secs (very long)
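The full-scan arithmetic can be checked with a short sketch. The values of D and C are those substituted on the slide (0.015 s per page transfer and 10^-7 s per record); the function name is hypothetical.

```python
D = 0.015  # seconds per page transfer (value substituted on the slide)
C = 1e-7   # seconds to process one record (value substituted on the slide)


def full_scan_cost(b_pages: int, r_records_per_page: int) -> float:
    """Cost in seconds of reading every page and examining every record."""
    return b_pages * (D + r_records_per_page * C)


# Hashed file as used in Q8: 1000 pages holding 5000 records each.
hashed_scan = full_scan_cost(1000, 5000)
print(f"hashed full scan: {hashed_scan:.1f} s")
```

The page-transfer term dominates: each page costs 0.015 s to fetch but only 0.0005 s to examine.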
Q9 – Sorted/Heap
• Heap cost: search all of the file. Cost = B(D + RC) = 833*(0.015 + 6,000*10^-7) = 13.0 secs (very long; B = 833 because 6,000 records can be packed per page at 100% full)
• Sorted cost: find the initial page by binary chop, then find all records in one, possibly two, accesses. Cost = D log₂B + C log₂R = (0.015*10) + (10^-7*13) = 0.15 + (1.3*10^-6) ≈ 0.15 secs (much the fastest)
• There is a minor extra cost for sorted. What is it?
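The heap and sorted costs can be compared side by side in a small sketch. It assumes the slide's values for D and C and rounds each logarithm up to a whole number of probes, which is how the figures 10 and 13 arise; the function names are hypothetical.

```python
import math

D = 0.015  # seconds per page transfer (value substituted on the slide)
C = 1e-7   # seconds to process one record (value substituted on the slide)


def heap_scan_cost(b_pages: int, r_records_per_page: int) -> float:
    """Read every page and examine every record."""
    return b_pages * (D + r_records_per_page * C)


def sorted_search_cost(b_pages: int, r_records_per_page: int) -> float:
    """Binary chop over pages, then binary chop within the located page."""
    page_probes = math.ceil(math.log2(b_pages))               # 10 for B = 833
    record_probes = math.ceil(math.log2(r_records_per_page))  # 13 for R = 6000
    return D * page_probes + C * record_probes


heap = heap_scan_cost(833, 6000)
sorted_cost = sorted_search_cost(833, 6000)
print(f"heap: {heap:.1f} s, sorted: {sorted_cost:.2f} s")
```

The record-level term is negligible next to the page term, so the sorted file's cost is essentially ten page transfers.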
Q10/Q11 – Cost K?
• Cannot say how many records there are
• Hashed: cost will be a search of the whole file = B*(D + RC) = 1000*(0.015 + (5000*10^-7)) = 15.5 secs (very long, longer than heap as pages are <100% full)
• Heap: cost will be a search of the whole file = B(D + RC) = 833*(0.015 + 6,000*10^-7) = 13.0 secs (very long)
• Sorted: cost will be D log₂B + C log₂R = (0.015*10) + (10^-7*13) ≈ 0.15 secs
• So sorted is very much faster
• Note that hashing gives fast single-record access but is slow for searching on ranges