B Trees and Its Variants

B-Trees & its
Variants
Advanced Data Structure
Spring 2007
Zareen Alamgir
Motivation
Yet another Tree!
Why do we need another
Tree-Structure ?
Data Retrieval from

External Storage
In database programs, the data is too large to fit in

memory, therefore, it is stored on secondary storage
(disks or tapes).
Disk access is very expensive, the disk I/O operation

takes milliseconds while CPU processes data on the
order of nanoseconds, one million times faster.
When dealing with external storage the disk accesses

dominate the running time.
Balanced Binary Search

Trees
Balanced binary search trees (AVL & Red-Black) have

good performance if the entire data can fit in the main
memory.
These trees are not optimized for external storage and

require many disk accesses, thus give poor
performance.
Reduce Disk Accesses
Data is transfer to and from the disk in block. (typically block

are of 512, 2048, 4096 or 8192 bytes)
We can reduce disk accesses by

Storing multiple records in a block on the disk.
Reducing the height of the tree by increasing the
number of subtrees of a node.
To achieve above goals we use Multiway (m-way) search

tree, which is a generalization of BST, binary search tree.
25 62
12 19
15 17
32 39
21 23
30 31
34 37
73 84
45 51
69 71
75 79
90 94
Multiway(m-way)
Search Trees
Multiway(m-way) Search
Trees
In an m-way tree all the nodes have degree m.

Each node has the following structure:
P1
keys<K1
K1
P2
Ki-1
Pi
Ki
Ki-1<keys<Ki
Ki are the keys, where 1 i q-1, q<=m
Pi are pointers to subtrees, where 1 i q, q<=m
Kq-1
Pq
Kq-1<keys
The keys in each node are in ascending order K1 K2 ... Ki

The key Ki is larger than keys in subtree pointed by P i and
smaller than keys in subtree pointed by Pi+1 .
The subtrees are the m-way trees.
Multiway(m-way) Search
Trees
M-way tree is a generalization of BST, its working, benefits

& issues are same.
Benefits
Fast information
retrieval.
Fast update.
Problems
M-way tree
The tree is not balanced.

Leaf nodes are on different
levels.
Bad space usage, tree can
become skew.
B-Trees
B-Trees
B-Tree is a balanced m-way tree that is tuned to

minimize disk accesses.
The node size of B-Tree is usually equal to the disk

block size and the number of keys in a node depends on
Key size
Disk block size
Data organization (keys or entire data records are store in nodes)
Access paths from root to leaf nodes are small.

25 62
12 19
15 17
32 39
21 23
30 31
34 37
73 84
45 51
69 71
75 79
90 94
B-Tree: Definition
A B-tree of order m has following properties
The root has at least two subtrees unless it is a leaf.

Each non-root and each non-leaf node holds q-1 keys
and q pointers to subtrees, where m / 2 q m .
Each leaf node holds q-1 keys where m / 2 q m.
All leaves are on the same level.
It is clear that B-tree is always at least half full, has

fewer levels and is perfectly balanced.
P1
Fr
keys<K1
K1
P2
Ki-1
Pi
Ki
Ki-1<keys<Ki
Kq-1
Pq
Kq-1<keys
B-Tree
B-Tree can have a field KeyTally, KT, to indicate the number of keys
currently stored in the node.
B-Tree node usually contains key and data pointer pair. The data
pointer points to the data record which is not stored in the node, with
this scheme we can pack more keys & pointers in a B-Tree node.
KT
P1
keys<K1
K1
D1
Data pointer
Ki-1 D
Ki-1
Pi
Ki
Ki-1<keys<Ki
Di
Kq-1 KD
q-1
q-1
Data pointer
Pq
Kq-1<keys
Height of B-Tree
Maximum height of the B-Tree with n keys is important as it

bound the number of disk accesses.
The height of the tree is maximum when each node has

minimum number of the subtree pointers, q m / 2 .
Height of B-Tree
The height of B-tree is maximum if all nodes have minimum

number of keys.
1 key in the root + 2(q-1) keys on the second level ++ 2q h-2(q-1) keys in
the leaves (level h).
1 2(q - 1) 2q(q - 1) 2q h-2 (q - 1)
1 ( q 1)
h 2
2qi
i 0
Applying the formula of geometric progression

q h 1 1
1 2( q 1)
q
1 2q h 1
Thus, the number of keys in B - Tree of height h is given as :
n 1 2q h 1
h log q
n 1
1
2
Height of B-Tree
The height of B-tree is minimum if all nodes are full, thus we have
m-1 keys in the root + m(m-1) keys on the second level ++ m h-1(m-1) keys in the leaf nodes
(m - 1) m(m - 1) m 2 (m - 1) m h-1 (m - 1)
h 1
h 1
( m 1)m (m 1) mi
i
i 0
i 0
Applying the formula of geometric progression

mh 1
(m 1)
m
mh 1
Thus, the number of keys in B - Tree of height h is given as :
n mh 1
h logm ( n 1)
logm (n 1) h log q
n 1
1
2
Height of B-Tree
If number of nodes in B-tree equal 2,000,000 (2 million)

and m=200 then maximum height of B-tree is 3, where
as the binary tree would be of height 20.
Note: Order m is chosen so that B-tree node size is

nearly equal to the disk block size.
Search in a B-Tree
Search in a B-tree is similar to the search in BST

except that in B-tree we make a multiway
branching decision instead of binary branching in
BST.
25 62
12 19
15 17
32 39
21 23
30 31
34 37
73 84
45 51
Search key 71
69 71
75 79
90 94
B-Tree Insert Operation
Insertion in B-tree is more complicated than in BST.
In BST, the keys are added in top down fashion

resulting in a unbalanced tree.
B-tree is built bottom up, the keys are added in the

leaf node, if the leaf node is full another node is
created, keys are evenly distributed and middle key
is promoted to the parent. If parent is full, the
process is repeated.
B-tree can also be built in top down fashion using

pre-splitting technique.
Basic Idea
Find position for the key
in the appropriate leaf node
Is node
full ?
Split node:
Create a new node
Move half of the keys from the full node to
the new node
Promote the median key (before split)
to the parent.
Split guarantees that each node has
m / 2 1
keys.
If parent is full
Cases in B-Tree Insert

Operation
In B-tree insertion we have the following cases:
Case 1: The leaf node has room for the new key.
Case 2: The leaf in which key is to be placed is full.
This case can lead to the increase in tree height.
Now we explain these cases in detail.
Case 1: The leaf node has room for the new key.
Find appropriate leaf
node for key 3
Insert 3
3
10 25
Insert 3 in order
14 19 20 23
32 38

node for key 16
Insert 16
16
10 25
19
14 19 20 23
32 38
No room for key 16 in leaf node
Insert key 19 in parent node in order
19 20 23
14 16
Move median key 19 up and

Split node: create a new node and
move keys to the new node.

Case 2: The leaf in which key is to be placed is full
and this lead to the increase in tree height.
Case 2: The height of the tree increases.
Insert 16
55
Insert 27 in parent in order

16
45
55
55
67
No room for 27 in parent, Split node

81
No room for 19 in parent,

Split parent node
48
13
12
14
19
20
23
27
19
27
29
33
31
52
57
61
72
19
20
23
92
35
36
41
42
No room for key 16,

Move median key 19 up & Split node
16
86
38
Insert 19 in parent node in order
14
77
B-Tree Delete Operation
Deletion is analogous to insertion, but a little

more complicated.
Two major cases
Deletion from leaf node

Deletion from nonleaf node
Apply delete by copying technique used in BST to

reduce this case to deleting a key from leaf.
In delete by copying, the key to be deleted is

replaced by the largest key in the right subtree or
smallest in left subtree (which is always a leaf).
Leaf node deletion cases:
After deletion node is at least half full.

After deletion underflow occurs
Redistribute: if number of keys in siblings >
Merge nodes if number of keys in siblings <
Merging leads to decrease in tree height.
m
2 1
m
2
.
1 .
After deletion node is at least half full. (inverse of insertion

case 1)
Search key 3
10 25
14 19
32 38 40 45
Key found, delete key 3.

Move others keys in the node to eliminate the gap.
Underflow occurs, evenly redistribute the keys if left or right sibling

has keys
.
m / 2 1
Search key 14
Delete 14
10 25
14 19
32 38 40 45
Underflow occurs, evenly redistribute keys

in the underflow node, in its sibling and the
separator key.
Underflow occurs and the keys in the left & right sibling are
m / 2 1
. Merge the underflow node and a sibling.

Move separator key down.
Delete 25
10 32
19 25
Move the keys to underflow

node and discard the sibling.
38 40
Underflow occurs, merge nodes.

Underflow occurs, height decreases after merging.
Delete 21
70
Underflow occurs,
merge nodes
8
32
21 27
79 85
47 66
Underflow occurs, merge nodes

by moving separator key and
the keys in sibling node to the
underflow node.
73 75 78
81 83
88 90 92
Issues in B-tree
In B-tree, accessing data in sequential order is not

efficient. In-order traversal of the B-tree requires many
disk accesses.
In B-tree, data pointers are stored in each node, thus

resulting in less subtree pointers per node and more tree
levels.
KT
P1
keys<K1
K1
D1
Data pointer
Ki-1
D
Ki-1
Pi
Ki
Ki-1<keys<Ki
Di
Kq-1free
KD
space
Pq
q-1
q-1
Data pointer
Kq-1<keys
B+-Trees
B+ -Tree- Variant of BTree
Resolves the issues in B-tree.
In B+ -tree
Pointers to data is stored only in leaf nodes.
Internal nodes contain only keys and subtree pointers
Can accommodate more keys in internal nodes.

Less disk accesses due to fewer levels in the tree.
B+ -tree provides faster sequential access of data.
B+-Tree Structure
B+-tree consist of two parts

Index Set
Provides indexes for fast access of data.

Consist of internal nodes that store only key & tree pointers.
Sequence Set
Consist of leaf nodes that contain data pointers.

Provide efficient sequential access of data(using doubly linked list).
Index Set
Sequential
Search
Sequence Set
B+-Tree: Index Node

Structure
The basic structure of the B+-tree index node of

order m is same as that of B-tree node of order m.
The root has at least two subtrees unless it is a leaf.

Each non-root index node holds q-1 keys and q subtree
pointers, where m / 2 q m .
Only difference is that index node do not contain

data pointers.
P1
Fr
keys<K1
K1
P2
Ki-1
Pi
Ki
Ki-1<keys<Ki
Kq-1
Pq
Kq-1<keys
B+-Tree: Sequence Node

Structure
The structure of the B+-tree sequence node is as

follows:
K1
D1
Ki-1 Di-1
Ki
Di
Kq-1
Dq-1
Pointer to next
leaf node in tree
Data pointer
Data pointer
Data pointer
Data pointer
Search in a B+-Tree
The search in B+-tree works similar to B-tree but it

always ends at the leaf node because
Pointer to the data is stored in the leaf node.

Existence of the key in index set does not guarantee that the
particular record is present in the tree.
A key can occur multiple time in the index set. (this does not
create problem because the key in index set node act only as a
separator).
Note 62 is not present
in the sequence set
30 62
15 21
15 17
34 45
21 23
30 31
34 37
75 90
45 51
B+-tree of order m=5
69 71
75 79
90 94
B+-Tree Insert
Case: The leaf node has room for the key to be inserted.
Find appropriate leaf node for
key 3, and insert in order.
Insert 3
3
14 32
14 19 20 23
32 38
B+-Tree Insert Operation

node for key 16
Insert 16
16
10 25
19
14 19 20 23
No room for key 16,

Split node: create a new
node and move m / 2
keys to the new node.
32 38
Insert a copy of the first key of the

new node in the parent node in order.
14 16
19 20 23
19
Modify Sequence Set next node links
B+ -Tree Insert
Operation
Case: Only root node exists, and it is full.

Find appropriate
position for key 18
Insert 18
18
No room for key 18,
Split node: create a new
sequence set node and
move m / 2 keys to the
new node.
18
10 17 20 23
18
Create a new index set node and
make it a root node.
Insert the first key of the new
sequence set node in the new root.
B+-Tree Delete Operation
B+-tree deletion follows same rules as that of Btree deletion but the separator in index set node is
not removed when a key is deleted.
Deletion cases:
After deletion node is at least half full.

Merging lead to decrease in tree height.
m
2 1
m
2
.
1 .
B+-Tree Delete
Case: After deletion node is at least half full.
Search key 14
Delete 14
14 32
14 19 21
Note: key 14 in the parent node is

still a valid separator key. It is not
modified.
32 38 40
Key found, delete key 14.

Move others keys in the node to eliminate the gap.
Underflow occurs, evenly redistribute the keys if left or right sibling

has keys
.
m / 2 1
Separator
key
the parent
is
Search
key 14
& in
delete
it.
no longer valid
Delete 14
14 38
32
14 19
Insert a copy of the first key of

the sibling node in parent in order.
32 38 40
Underflow occurs, evenly redistribute keys

in the underflow node and in the sibling node.
Note: unlike B-tree, the separator key in the
parent node is not included. Why?
Underflow occurs and the keys in the left & right sibling are
m / 2 1
. Merge the underflow node and a sibling.

Search key 32 & delete it.
Delete 32
14 38
19 32
38 40
Underflow occurs, merge nodes.

Move keys in sibling to underflow
node and discard the sibling node.
Efficient Sequential Access in

For efficient sequential access, start from the beginning of
B+-Tree
the sequence set and traverse the sequence set using the
next pointers in sequence set nodes
45
13
27
33
38
Sequence Set Start
48
52
55
67
57
81
61
72
77
86
92
Comparison B-Tree & B+Tree

B-Tree
Data pointers stored in all

nodes
No redundant keys
Search can end at any
node
Slow sequential access
Higher trees
B+ -Tree
Data pointers stored in leaf

nodes (sequence set)
Redundant keys exist
Search ends always at
leaves
Efficient sequential access
Flatter trees (no data pointers
in index set nodes)
B* -Trees
B* -Tree -- Variant of BTree
Each node of a B-tree represent a block of secondary memory,

therefore, accessing a node is expensive operation. Thus, the
fewer nodes that are created, the better.
In B*-tree is a variant of B-tree introduced by Donald Knuth and

named by Douglas Comer.
In a B*-tree, all nodes except the root are required to be at least

two-thirds full, not just half full as in B-tree.
The number of keys in all non-root nodes in a B*-tree of order m

2m 1
is k, where 3 k m 1 .
Average utilization of B*-tree is 81%.
B* -Tree Insert Operation
In B*-tree, the frequency of node splitting is decreased by

delaying a split.
A split in a B*-tree is delayed by attempting to redistribute the

keys between node and its sibling when node overflows.
In B*-tree split operation two nodes are split into three instead of
one into two as in B-tree.
All three nodes participating in the split are guaranteed to be twothirds full after split.
B*-Tree Insert Operation
Overflow occurs, evenly redistribute the keys between node

and its sibling.
Insert 22
22
32
10 12 15 16 21 24 25 29
35 42 47 51 53
Overflow occurs, evenly redistribute keys

in the overflow node, in its sibling including
the separator key in parent and new key.
B*-Tree Insert Operation
Overflow occurs, sibling is full, split node.
Insert 72
72
32
10 12 15 16 21 24 25 29
35 42 47 51 53 55 57 59
Overflow occurs, sibling is full, split node.
B* -Tree Delete Operation
B*-tree deletion follows same rules as that of B-tree deletion.
Deletion cases:
After deletion node is at least two third full.
2m 1
3 .
2m 1
.
3
Examples of deletion are omitted as they are similar to B-tree

deletion.
Questions ?
References
Data Structure and Algorithms in C++, Adam Drozek.
Introduction to Algorithms, T.H.Cormen, C.E.Leiserson, R.L.Rivest,

and C.Stein.
Fundamentals of Database Systems, Elmasri Navathe.
Fundamentals of Data Structures in C++, E.Horowitz, S.Sahni and

D.Mehta
The Ubiquitous B-Tree, DOUGLAS COMER, ACM Computing

Surveys (CSUR), Volume 11 ,Issue 2(June 1979).
The End

B Trees and Its Variants

Transféré par

Informations du document

Description originale:

Titre original

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

B Trees and Its Variants

Transféré par

Droits d'auteur :

Formats disponibles

B-Trees & its

Data Retrieval from

In database programs, the data is too large to fit in

Disk access is very expensive, the disk I/O operation

When dealing with external storage the disk accesses

Balanced Binary Search

Balanced binary search trees (AVL & Red-Black) have

These trees are not optimized for external storage and

Reduce Disk Accesses

Data is transfer to and from the disk in block. (typically block

We can reduce disk accesses by

To achieve above goals we use Multiway (m-way) search

In an m-way tree all the nodes have degree m.

Ki are the keys, where 1 i q-1, q<=m

Pi are pointers to subtrees, where 1 i q, q<=m

The keys in each node are in ascending order K1 K2 ... Ki

M-way tree is a generalization of BST, its working, benefits

The tree is not balanced.

B-Tree is a balanced m-way tree that is tuned to

The node size of B-Tree is usually equal to the disk

Access paths from root to leaf nodes are small.

A B-tree of order m has following properties

The root has at least two subtrees unless it is a leaf.

It is clear that B-tree is always at least half full, has

Maximum height of the B-Tree with n keys is important as it

The height of the tree is maximum when each node has

The height of B-tree is maximum if all nodes have minimum

1 2(q - 1) 2q(q - 1) 2q h-2 (q - 1)

Applying the formula of geometric progression

Applying the formula of geometric progression

If number of nodes in B-tree equal 2,000,000 (2 million)

Note: Order m is chosen so that B-tree node size is

Search in a B-tree is similar to the search in BST

B-Tree Insert Operation

Insertion in B-tree is more complicated than in BST.

In BST, the keys are added in top down fashion

B-tree is built bottom up, the keys are added in the

B-tree can also be built in top down fashion using

Cases in B-Tree Insert

In B-tree insertion we have the following cases:

Now we explain these cases in detail.

B-Tree Insert Operation

B-Tree Insert Operation

Case 2: The leaf in which key is to be placed is full.

Insert key 19 in parent node in order

Move median key 19 up and

B-Tree Insert Operation

B-Tree Insert Operation

Case 2: The height of the tree increases.

Insert 27 in parent in order

No room for 27 in parent, Split node

No room for 19 in parent,

No room for key 16,

Insert 19 in parent node in order

B-Tree Delete Operation

Deletion is analogous to insertion, but a little

Two major cases

Deletion from leaf node

Apply delete by copying technique used in BST to

In delete by copying, the key to be deleted is

B-Tree Delete Operation

Leaf node deletion cases: