
CSM Final Review, Part 2

Post-Midterm 2 Material
Graph Traversals
All Traversals
● All graph traversals can be written
as variations of the baseline
traversal with a fringe
○ A fringe is any data structure that
supports insert(), poll(), remove()
○ This includes stacks and queues
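As a sketch (Python, with the graph as an adjacency dict — the representation and names are illustrative choices, not from the slides), the baseline traversal only differs in which end of the fringe it pops from:

```python
from collections import deque

def traverse(graph, start, as_queue):
    """Baseline graph traversal; the fringe policy decides the search.

    graph: dict mapping each vertex to a list of neighbors.
    as_queue: True pops from the front (queue -> BFS),
              False pops from the back (stack -> DFS).
    """
    fringe = deque([start])
    marked = {start}          # mark on insert so nothing enters the fringe twice
    order = []
    while fringe:
        v = fringe.popleft() if as_queue else fringe.pop()
        order.append(v)
        for w in graph[v]:
            if w not in marked:
                marked.add(w)
                fringe.append(w)
    return order
```

With as_queue=True this visits layer by layer; with as_queue=False it follows a path until it dead-ends, then backtracks to the most recent node with an unmarked neighbor.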
BFS
● Replace the COLLECTION_TYPE with
a QUEUE
○ This search will traverse a graph “layer
by layer”, first the start node, then the
start node’s neighbors, then the start
node’s second degree neighbors, etc.
DFS
● Replace the COLLECTION_TYPE with
a STACK
○ This search will traverse a graph by
traversing the root, then picking a path
to follow down as long as it can (until it
hits a dead end).
○ After it hits a dead end, it finds the most
recent node which has an unmarked
neighbor and continues searching from
there.
Orderings
● Pre Order
○ Parent then its children
○ First node will be the start node
● In Order
○ Left subtree, then the node itself, then the right subtree
○ Only really relevant in trees
○ Fun off-topic fact: an in-order traversal processes a search tree’s keys in sorted order
● Post Order
○ Children then the parent
○ Last node will be the start node
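The three orderings as a minimal Python sketch (tree stored as a dict from node to a (left, right) pair — an illustrative representation, not the slides’):

```python
def ordering(tree, node, when):
    """Traverse a binary tree stored as {node: (left, right)}.

    when: 'pre'  -> visit the parent before its children
          'in'   -> visit left child, then parent, then right child
          'post' -> visit the children before the parent
    """
    if node is None:
        return []
    left, right = tree[node]
    if when == 'pre':
        return [node] + ordering(tree, left, when) + ordering(tree, right, when)
    if when == 'in':
        return ordering(tree, left, when) + [node] + ordering(tree, right, when)
    return ordering(tree, left, when) + ordering(tree, right, when) + [node]
```

On a binary search tree, the 'in' ordering yields the keys in sorted order.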
BFS Example
Start at A and break ties alphabetically

[Figure: the example graph and the step-by-step QUEUE/VISITED frames are lost in the export.]

Dequeue order: A, B, D, C, E, F

Distances from the start:
● A : Start
● B : 1 Away
● D : 1 Away
● C : 2 Away
● E : 2 Away
● F : 3 Away

DFS (PreOrder) Example
Start at A and break ties alphabetically

[Figure: the example graph and the step-by-step STACK/SEEN/VISITED frames are lost in the export.]

Visit order: A, B, C, E, D, F

DFS Postorder?
Start at A and break ties alphabetically

Visit order: C, D, F, E, B, A

Rapidfire (1 / 3):
Runtime of DFS / BFS?

We add each vertex to the fringe one time, and we check all edges two times,
so it runs in O(V+E) time. (also called linear time)

This is the fastest way to traverse a graph


Rapidfire (2 / 3):
Given vertices A and B which are “close” to each other, which traversal should
we use to find a path from A to B?

BFS. DFS generally reaches far vertices first, while BFS generally traverses
close vertices first.
Rapidfire (3 / 3):
Say I represent a graph in real life with string,
glue, and blocks as shown. If I grab the light
green block and lift the whole graph ...

What block(s) will be the lowest?
● Red, Orange, Purple

Which traversal would I use to return the blocks
from the top to the bottom?
● BFS - It returns the first ‘layer’ first, the
second ‘layer’ second, etc.
SP15 Fall Q. 2b
Draw a directed graph whose
● DFS pre-order traversal is {A, B, C, D, E}
● DFS post-order traversal is {C, B, D, E, A}

Hint: Simplify -- try to find a tree that fulfills these requirements


SP15 Fall Q. 2b
Draw a directed graph whose
● DFS pre-order traversal is {A, B, C, D, E}
● DFS post-order traversal is {C, B, D, E, A}

One answer: root A with children B, D, E (in that order), and C as B’s child.

Technique 1: Remember Rules
1. A must be the root (first in pre-order, last in post-order)
2. C must be a leaf (first in post-order)
3. E must be a leaf (last in pre-order)
4. B must be in between A and C (before C in pre-order, after C in post-order)
5. D must be on the same layer as E (D is right before E in both orderings)
Minimum Spanning Trees
Review: Trees and Spanning Trees
● Trees
○ Trees are connected, acyclic graphs
■ Connected: a path exists between any two vertices
■ Acyclic: the tree contains no cycles
● N vertices, N-1 edges
● Spanning Trees
○ A spanning tree of a graph is a connected, acyclic subgraph that covers all N vertices using N-1 of the graph’s edges
● What are Minimum Spanning Trees...
Minimum Spanning Trees
● Given a graph with weighted edges (assume positive edge weights)
● Find the MST by finding N-1 edges that cover all N vertices
○ Create a spanning tree (connected, acyclic graph) with minimum total edge weight
● There may be multiple MSTs possible for any given graph
● Example
○ Vertices are places, edges are roads (edge weights are distances)
○ MST is a set of connecting roads of minimum total distance
○ Allows travel between any two places
■ Note: The route in the MST between two places A and B may not be the shortest
path possible between A and B.
The Cut Property
● A “cut” of a graph is a partition of the vertices into
two disjoint sets
● Cut Property: The min-weight edge in a cut must be
in the MST
● How does the cut property work?
○ Consider some cut of the graph, and let e be its min-weight edge.
○ Suppose we construct an MST where edge e is not a member
of the MST.
○ The MST must still connect the two sides of the cut, so it contains
some other edge e’ across the cut, with weight(e) < weight(e’)
(assuming distinct edge weights).
○ Replacing e’ with e keeps the tree spanning and gives
us a smaller total edge weight, meaning the original MST we
constructed was not actually an MST.
The Cut Property
● Why do we need the cut property?
○ We can use it to construct an MST
● We will go over two MST algorithms
○ Prim’s Algorithm
○ Kruskal’s Algorithm
Prim’s Algorithm
● Construct an MST using Prim’s
● Start at an arbitrary node and create a tree from there
● Iteratively expand the cut until it includes every vertex
○ Add the shortest edge connecting some node already in the tree to one that isn’t yet
○ If the edge weights are non-unique, Prim’s may result in different MSTs depending on the
starting node and the tie-breaking algorithm between edges of the same weight.
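A sketch of Prim’s with a heap-based fringe (Python). The example edges in the usage below are taken from the Kruskal’s slide’s sorted edge list; the weight-6 edge is assumed to connect A and D:

```python
import heapq

def prim_mst(graph, start):
    """Grow a tree from `start`, always adding the lightest edge that
    crosses the cut between tree vertices and everything else.

    graph: dict vertex -> list of (weight, neighbor) pairs (undirected).
    Returns the MST as a list of (weight, u, v) edges.
    """
    in_tree = {start}
    fringe = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(fringe)
    mst = []
    while fringe and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(fringe)
        if v in in_tree:              # stale edge: no longer crosses the cut
            continue
        in_tree.add(v)
        mst.append((w, u, v))
        for w2, x in graph[v]:
            if x not in in_tree:
                heapq.heappush(fringe, (w2, v, x))
    return mst
```

On that example graph (A-B 5, A-D 6, B-C 9, B-D 1, C-D 2, C-E 4, D-E 3), the MST has 4 edges of total weight 11 regardless of the start vertex.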
Prim’s Visualized
Starting at Vertex A

[Figure: step-by-step run of Prim’s on a five-vertex example graph (A–E); at
each step the lightest edge leaving the tree is added, until the MST spans all
five vertices.]
Kruskal’s Algorithm
● Construct an MST using Kruskal’s
● Sort the graph edges by increasing weight
● Starting with the min-weight edge, iteratively pick edges that won’t form a
cycle until we have N-1 edges in the MST
● Always produces the same MST assuming same sort and Weighted Quick
Union tie-breaking
○ WQU is used to make sure the MST is acyclic
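A Kruskal’s sketch in Python, with a small union-find standing in for Weighted Quick Union (path halving only, for brevity):

```python
def kruskal_mst(vertices, edges):
    """Kruskal's: take edges in sorted order, skipping any edge whose
    endpoints are already connected (it would form a cycle).

    edges: list of (weight, u, v). Returns the MST edge list.
    """
    parent = {v: v for v in vertices}

    def find(x):                      # union-find root, with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge connects two components: keep it
            parent[ru] = rv
            mst.append((w, u, v))
            if len(mst) == len(vertices) - 1:
                break                 # N-1 edges: spanning tree complete
    return mst
```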
Kruskal’s Visualized

[Figure: step-by-step run of Kruskal’s on the same five-vertex example graph.]

Sorted Edges:
1. B - D, 1 (added)
2. C - D, 2 (added)
3. D - E, 3 (added)
4. C - E, 4 (skipped - would form a cycle)
5. A - B, 5 (added - 4 edges, MST complete)
6. A - D, 6
7. B - C, 9
Practice MSTs

[Figure: practice problem and its MST solution - not recoverable from the export.]
Shortest Paths Trees
Shortest Paths Trees
● Graph (directed or undirected) has non-negative edge weights
● Compute shortest paths from a given source node s to all other vertices in
the graph
○ “Shortest” = sum of weights along path is smallest
Dijkstra’s Algorithm
● Find the shortest paths from the source node using Dijkstra’s
● Use a minimum priority queue to visit vertices in increasing path cost
○ Start at the source node (cost = 0)
○ Push fringe nodes onto the minPQ
■ Keep track of the distance to a node (cost)
■ Keep track of which node led to this node in the tree (prev)
○ Pop a node from the minPQ when visiting the node
● May give an incorrect solution if the graph has negative edges
○ With negative edges, popping a node off of the priority queue no longer
guarantees that we have found the shortest path to it - and Dijkstra’s
correctness depends on that guarantee.
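A Dijkstra’s sketch in Python, using a binary heap with lazy deletion (stale queue entries are skipped when popped instead of being removed):

```python
import heapq

def dijkstra(graph, source):
    """graph: dict vertex -> list of (weight, neighbor) directed edges.
    Returns (cost, prev): shortest distances from source, and each
    reached vertex's predecessor in the shortest paths tree."""
    cost = {source: 0}
    prev = {source: None}
    pq = [(0, source)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:              # stale entry: u already finalized
            continue
        visited.add(u)
        for w, v in graph[u]:         # relax every edge out of u
            if v not in cost or d + w < cost[v]:
                cost[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (cost[v], v))
    return cost, prev
```

On a graph consistent with the visualized trace (edge weights inferred from the table: S→B 2, S→C 2, B→A 1, C→A 2, A→G 4), this yields cost(G) = 7 via S-B-A-G.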
Prim’s vs. Dijkstra’s
● The two algorithms are exactly the same, except for the following
● Visit order
○ Dijkstra’s algorithm visits vertices in order of distance from the source.
○ Prim’s algorithm visits vertices in order of distance from the MST under construction.
● Visitation
○ Visiting a vertex in Dijkstra’s algorithm means to relax all of its edges.
○ Visiting a vertex in Prim’s algorithm means relaxing all of its edges, but under the metric
of distance from tree instead of distance from source.
Dijkstra’s Visualized
Each vertex tracks (Visited, Cost, Prev); initially every Cost is Inf and every
Prev is NULL. Push the source S with cost 0.

[Figure: the example graph is lost in the export; from the trace, its edges are
S→B (2), S→C (2), B→A (1), C→A (2), A→G (4).]

Pop order and relaxations:
1. Pop S (cost 0): B gets cost 2 (Prev S), C gets cost 2 (Prev S).
Priority Queue : {(B, 2), (C, 2)}
2. Pop C (cost 2): A gets cost 4 (Prev C). Priority Queue : {(B, 2), (A, 4)}
3. Pop B (cost 2): A improves to cost 3 (Prev B). Priority Queue : {(A, 3)}
4. Pop A (cost 3): G gets cost 7 (Prev A). Priority Queue : {(G, 7)}
5. Pop G (cost 7). Priority Queue : {} - done.

Final state:

Vertex  Visited  Cost  Prev
S       T        0     NULL
A       T        3     B
B       T        2     S
C       T        2     S
G       T        7     A

Following the Prev pointers for each vertex, reconstruct the shortest paths
from source S to the nodes B, C, A, G.
S → B = S-B
S → C = S-C
S → A = S-B-A
S → G = S-B-A-G

● Above is the shortest paths tree from source S
● Note: C-A is not in the shortest paths tree because we only popped A
from the queue after visiting it from B. (S-B-A offered a shorter path
than S-C-A)
A* Search Algorithm
● Given a starting vertex S and an ending vertex G, find the shortest path
between the two nodes S → G
● Optimization of Dijkstra’s
○ Break once you dequeue G
○ Visit nodes closer to G first
● Use a heuristic function h to estimate a node’s distance to G
○ minPQ priority = cost + h
○ priority(y) = minPathCost(S → y) + h(y)
○ priority(y) = [priority(x) - h(x) + cost(x → y)] + h(y)
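The same loop as Dijkstra’s, with the priority changed to cost + h and an early exit when G is dequeued (Python sketch; the heuristic is passed in as a function):

```python
import heapq

def a_star(graph, start, goal, h):
    """A*: Dijkstra's, but ordered by cost-so-far + h(v), stopping as
    soon as the goal is dequeued. h must never overestimate the true
    remaining distance (admissible) for the answer to be correct."""
    cost = {start: 0}
    prev = {start: None}
    pq = [(h(start), start)]
    done = set()
    while pq:
        _, u = heapq.heappop(pq)
        if u == goal:                     # goal dequeued: reconstruct path
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return cost[goal], path[::-1]
        if u in done:
            continue
        done.add(u)
        for w, v in graph[u]:
            if v not in cost or cost[u] + w < cost[v]:
                cost[v] = cost[u] + w
                prev[v] = u
                heapq.heappush(pq, (cost[v] + h(v), v))
    return None
```

The usage below assumes the graph from the visualized trace, with h values read off the trace’s priorities (h(S)=4, h(B)=2, h(C)=6, h(A)=1, h(G)=0).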
Admissible and Consistent Heuristics
● Heuristic for A* search must be admissible and consistent
● An admissible heuristic function never overestimates the minimum path
distance from a given vertex A to destination G.
○ In other words, h(A) must never overestimate d(A, G).
● A consistent heuristic satisfies the following inequality
○ Given neighboring vertices A and B and a destination G
○ h(A) ≤ h(B) + d(A, B)
● All consistent heuristics are admissible
A* Visualized
Same graph and source S as the Dijkstra’s trace, with destination G. From the
priorities in the trace, the heuristic is h(S) = 4, h(B) = 2, h(C) = 6, h(A) = 1,
h(G) = 0, and priority = cost + h.

Pop order and relaxations:
1. Push S with priority 0 + h(S) = 4, then pop it: B gets cost 2 (priority 4),
C gets cost 2 (priority 8). Priority Queue : {(B, 4), (C, 8)}
2. Pop B: A gets cost 3 (priority 4). Priority Queue : {(C, 8), (A, 4)}
3. Pop A: G gets cost 7 (priority 7). Priority Queue : {(C, 8), (G, 7)}
4. Pop G - the destination.

Final state:

Vertex  Visited  Cost  Prev
S       T        0     NULL
A       T        3     B
B       T        2     S
C       F        2     S
G       T        7     A

● We dequeued G, so we can quit!
● Notice we find the same paths as Dijkstra’s
○ True because our h is relatively accurate, and admissible
(doesn’t overestimate the node’s distance to G)
● However, we never visited C, and therefore saved runtime
○ In larger graphs, the effect is much more obvious
Self-Balancing Trees
BSTs
● Tree: A data structure with a root value and
child subtrees
● BST: Binary Search Tree
● To the LEFT of a node are elements smaller than the
node
● To the RIGHT are elements larger than the
node
● For a balanced tree, search/insert/delete is
O(log N)

[Figure: a BST with root 3 and children 1 and 5]
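A minimal BST sketch (Python, nodes as dicts — the representation is an illustrative choice):

```python
def bst_insert(node, key):
    """Insert key into a BST; smaller keys go left, larger go right."""
    if node is None:
        return {"key": key, "left": None, "right": None}
    if key < node["key"]:
        node["left"] = bst_insert(node["left"], key)
    elif key > node["key"]:
        node["right"] = bst_insert(node["right"], key)
    return node                       # equal key: already present

def bst_contains(node, key):
    """Walk down from the root, going left or right by comparison."""
    while node is not None:
        if key == node["key"]:
            return True
        node = node["left"] if key < node["key"] else node["right"]
    return False
```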
BSTs
● What happens when the BST isn’t balanced?
● Worst case runtime for search/insert/delete: O(N)
○ Could look like a linked list (depending on
the order that you add elements)
● How do we make the runtime better?

[Figure: a degenerate BST forming the chain 1 → 2 → 3 → 4]
B-Trees
● If our tree only grows at the root, the two sides will always have the same
height
● Nodes can have multiple elements and are sorted
● All keys in subtrees to the left of a key, K, are <K, and all to the right are >K
● Leaves are all empty (don’t really exist) and are equidistant from the root
B-Tree Insertion
● Insertion: add just above bottom; split overfull nodes as needed, moving
one key up to parent
● We create new nodes when we split other nodes, not when we add
elements!
B-Tree Insertion
● Insert 7 into the (2, 4) Tree
● Insert 27 into the (2, 4) Tree

[Step-by-step figures lost in the export.]
B-Trees
● When we perform operations on a B-Tree, we maintain a balanced
structure
● Runtime of operations: O(log n)
Practice Question A
● True or false: If A and B are 2-4 trees with the exact same elements, they
must be identical.

False! We ended up with different trees. A: insert 1, 2, 3, 4, 5, 6. B: insert 2, 3,
4, 5, 6, 1.

A (splits when 1 2 3 4 and then 3 4 5 6 overfill):

      2 4
    1   3   5 6

B (splits when 2 3 4 5 overfills):

       3
   1 2   4 5 6
Red Black Trees
● A red-black tree is a binary search tree with additional constraints that
limit how unbalanced it can be
○ Thus, searching is always O(lg N)
● Nodes are colored either red or black
● Red black trees are rotated and recolored to restore balance
Red Black Trees
1. Each node is (conceptually)
colored red or black
2. Root is black
3. Every leaf node contains no data
(as for B-trees) and is black
4. Every leaf has same number of
black ancestors
5. Every internal node has two
children
6. Every red node has two black
children
Red Black Trees and B-Trees
● Every red-black tree corresponds to a (2,4) tree, and the operations on
one correspond to those on the other
● Each node of a (2,4) tree corresponds to a cluster of 1–3 red-black nodes
in which the top node is black and any others are red
Red Black Trees
● A node in a (2,4) or (2,3) tree with three children may be represented in
two different ways in a red-black tree
● We can simplify the trees by only using the version on the left, which is
referred to as a left-leaning red black tree
Red Black Trees
● Inserting nodes into a red black tree will involve rotating our tree as well
as re-coloring it to maintain the red black tree properties
● We will focus specifically on red black trees that correspond to (2,3) trees
● In (2,3) trees, no node has more than 3 children, which means no
red-black node will have two red children
Tree Rotations
● In order to perform operations on a red black tree, we may have to rotate
our tree
● Rotating a normal binary tree to the right will promote the left child, and
vice versa
Red Black Tree Rotations
● For red black trees, we also need to recolor our tree
● When rotating, transfer the color from the original root to the new root,
and color the original root red
Red Black Tree Recoloring
● When we rotate our tree, sometimes we will temporarily create nodes
with too many children, so we need to be able to split them up
● We can do this by recoloring our tree
Red Black Tree Insertions
● When inserting into a red black tree, first insert into the bottom just like a
binary tree
○ Color as a red node (unless the tree is empty)
● We do not want any right-leaning trees or nodes with 4 children, so we
will perform any necessary rotations or recolorings to “fix up” the tree
Red Black Tree Insertions
Fixup 1: Convert right-leaning trees to left-leaning by rotating the tree to the
left using the rotation technique described earlier

● Nodes with a red right subtree and black left subtree need to be rotated
Red Black Tree Insertions
Fixup 2: Rotate linked red nodes into a normal 4-node (temporarily)
Red Black Tree Insertions
Fixup 3: Break up 4-nodes into 3-nodes or 2-nodes by recoloring
Red Black Tree Insertions
Fixup 4: As a result of other fixups, or of insertion into the empty tree, the root
may end up red, so color the root black after the rest of insertion and fixups
are finished

Note: This is only done at the end


Red Black Tree Insertions
● Insert 0 into the Red Black Tree
○ Insert a red 0 node at the bottom of the tree - no fixups needed!
● Insert 85 into the Red Black Tree
○ Insert a red 85 node at the bottom of the tree - this is right leaning!
○ Apply fixup 1! There are now two linked red nodes!
○ Apply fixup 2! This gives us a 4-node!
○ Apply fixup 3! This gives us another 4-node!
○ Apply fixup 3! This is a right-leaning tree!
○ Apply fixup 1! And we’re done!

[Step-by-step tree figures lost in the export.]
Practice Question B
● Draw the LLRB that results from adding 9 to the following LLRB
(colors lost in the export; 1 and 8 are the red nodes):

      7
    3   10
   1   8

1. First, add 9 as a red node (the right child of 8)
2. Apply fixup 1! (rotate left at 8: 9 now has red left child 8)
3. Apply fixup 2! (rotate right at 10: 8, 9, 10 form a 4-node)
4. Apply fixup 3! (recolor: 9 becomes a red right child of 7, so the tree is
right-leaning)
5. Apply fixup 1! (rotate left at the root)

Result:

      9
    7   10
  3   8
 1
Balanced Search Structures
● A trie is a data structure that takes advantage of using multiple branches
for each node
● Each node inside a trie corresponds to a possible prefix

[Figure: a trie storing {a, abase, abash, abate, abbas, axolotl, axe, fabric, facet}]
Balanced Search Structures
● A skip list is a sorted linked-list structure where we put items at random heights
● We search along the top layer until we hit a node that is larger than our
target, then move down a layer and repeat
Balanced Search Structures
● Search for 125

[Figure lost in the export: the search scans right along level 3, then drops to
level 2, level 1, and level 0, scanning right at each level until it reaches
125. Success!]
Sorting
Sort Qualities
Stable sort - a sort that does not change the relative order of equivalent
entries (compared to the input) is called stable.

Internal sorts - keep all data in primary memory.

External sorts - process large amounts of data in batches, keeping what
won’t fit in secondary storage (in the old days, tapes).

Comparison-based sorting - assumes the only thing we know about keys is
their order.

Radix sorting - uses more information about key structure.
Selection Sort
Algorithm Selection_Sort(list l):

1. Find the smallest item, swap with the first item
2. Find the second-smallest item, swap with the second
3. Find the third-smallest item, swap with the third
4. Repeat finding the next smallest and performing swaps until the list
is sorted
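The steps above as a Python sketch:

```python
def selection_sort(lst):
    """In-place selection sort: swap the smallest remaining entry into
    the next slot of the sorted prefix. Always Θ(N²) comparisons."""
    for i in range(len(lst)):
        smallest = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[smallest]:
                smallest = j
        lst[i], lst[smallest] = lst[smallest], lst[i]
    return lst
```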
Selection Sort Runtime
Runtime for finding the minimum entry in an
unsorted list: Θ(N)
Find the minimum entry Θ(N) times

Runtime: Θ(N²)

Not stable as written (a long-range swap can reorder equal entries); internal,
comparison-based sort
Inversions
Inversions - Number of pairs that are out of order.

- Number of swaps of adjacent entries it takes to sort the data.

How many inversions are there in this list?

32 15 2 17 19 21 41 17 52
Inversions
Inversions - Number of pairs that are out of order.

- Number of swaps of adjacent entries it takes to sort the data.

How many inversions are there in this list?

C B A D F G H E I

(C,B), (C,A), (B,A), (F,E), (G,E), (H,E) - 6 inversions
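A brute-force inversion counter (Python sketch, Θ(N²) — fine for checking small examples by hand):

```python
def count_inversions(lst):
    """Count pairs (i, j) with i < j but lst[i] > lst[j]."""
    return sum(1
               for i in range(len(lst))
               for j in range(i + 1, len(lst))
               if lst[i] > lst[j])
```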
Insertion Sort
In-place insertion sort:

1. Pick an entry at the start of the list


2. Compare this entry to the item to the
left of it
3. If the entry is less than the adjacent
entry, swap the two entries
4. Repeat until the entry is no longer less
than its neighbor or it reaches the
front of the list
5. Repeat the above steps with the next
entry that has not yet been sorted
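The in-place version above, sketched in Python:

```python
def insertion_sort(lst):
    """Swap each entry leftward until it is no longer smaller than its
    neighbor. Runtime is Θ(N + inversions): one swap per inversion."""
    for i in range(1, len(lst)):
        j = i
        while j > 0 and lst[j] < lst[j - 1]:
            lst[j], lst[j - 1] = lst[j - 1], lst[j]
            j -= 1
    return lst
```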
Insertion Sort Runtime
What is the runtime of insertion sort?

A. Ω(1), O(N)
B. Ω(N), O(N)
C. Ω(1), O(N²)
D. Ω(N), O(N²)
E. Ω(N²), O(N²)

Answer: D. The runtime is Θ(N + Inversions): Ω(N) on an already-sorted list,
and O(N²) because the max number of inversions is N(N-1)/2, when the list is
reversed.
Insertion Sort
1. Is insertion sort stable?

Yes

2. Is insertion sort an internal or external sort?

Internal sort (the way we implemented it)

3. Is insertion sort comparison-based or a radix sort?

Comparison-based
Heapsort
Algorithm HeapSort(List L):

1. Max-Heapify the array (in place


with most implementations)
2. While the heap is not empty,
swap the max element to the
end of the list.
3. Bubble down the element that
was swapped to the top of the
heap each time to maintain the
invariant.

https://docs.google.com/presentation/d/1z1lCiLSVLKoyUOIFspy1vxyEbe329ntLAVDQP3xj
mnU/pub?start=false&loop=false&delayms=3000&slide=id.g12a2a1b52f_0_1333
Heapsort Runtime
Runtime for removing an element from the
heap: Θ(log N)
Remove Θ(N) items from the heap

Runtime: Θ(N log N)
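An in-place heapsort sketch (Python, 0-indexed array heap):

```python
def heapsort(lst):
    """In-place heapsort: max-heapify bottom-up, then repeatedly swap
    the max to the end and sink the new root to restore the invariant."""
    n = len(lst)

    def sink(i, size):
        """Bubble lst[i] down within lst[:size] until heap order holds."""
        while True:
            left, right = 2 * i + 1, 2 * i + 2
            largest = i
            if left < size and lst[left] > lst[largest]:
                largest = left
            if right < size and lst[right] > lst[largest]:
                largest = right
            if largest == i:
                return
            lst[i], lst[largest] = lst[largest], lst[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):   # bottom-up heapify
        sink(i, n)
    for end in range(n - 1, 0, -1):       # extract max, shrink the heap
        lst[0], lst[end] = lst[end], lst[0]
        sink(0, end)
    return lst
```

On the practice array [5, 6, 10, 17, 14, 12, 13], heapification produces [17, 14, 13, 6, 5, 12, 10], and one remove-max yields the [14, 10, 13, 6, 5, 12, 17] shown above.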
Heapsort
1. Is heapsort stable?

No

2. Is heapsort an internal or external sort?

Internal sort

3. Is heapsort comparison-based or a radix sort?

Comparison-based
Heap Sort Practice
Suppose we want to sort the array [5, 6, 10, 17, 14, 12, 13] using in-place
heapsort. Give the array after heapification and a single remove-max
operation.

[14, 10, 13, 6, 5, 12, 17]


Merge Sort
Algorithm MergeSort(List L):

1. Split the list in half and sort


both halves of the list
2. Once we have two sorted
halves, compare their first
element and add the smaller
one to the sorted list.
3. Repeat step two until all
elements from both halves
have been added to the sorted
list.
https://docs.google.com/presentation/d/1h-gS13kKWSKd_5gt2FPXLYigFY4jf5rBkNFl3qZz
RRw/edit#slide=id.g12a3009c32_0_269
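A top-down mergesort sketch in Python; stability comes from taking from the left half on ties (the <= comparison):

```python
def merge_sort(lst):
    """Split in half, sort each half recursively, then merge by
    repeatedly taking the smaller front element. Θ(N log N)."""
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    left, right = merge_sort(lst[:mid]), merge_sort(lst[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:       # ties taken from the left: stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```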
Merge Sort Runtime
Intuitive explanation:
● Every level does N work
○ Top level does N work.
○ Next level does N/2 + N/2 = N.
○ One more level down: N/4 + N/4 + N/4 + N/4 = N.
● Thus work is just Nk, where k is the number of levels.
○ How many levels? We halve until we get to size 1, so k = lg(N).
● Overall runtime is N log N.

[Figure: recursion tree with levels N; N/2, N/2; N/4 …; N/8 …]
Merge Sort
1. Is merge sort stable?

Yes

2. Is merge sort an internal or external sort?

External

3. Is merge sort comparison-based or a radix sort?

Comparison-based
Quicksort
Algorithm QuickSort(List L):

If L contains at most one element:

return L

Else:

Select a pivot P and partition L with P

return QuickSort(left_half) + [P] + QuickSort(right_half)

https://docs.google.com/presentation/d/1QjAs-zx1i0_XWlLqsKtexb-iueao9jNLkN-gW9QxA
D0/edit#slide=id.g3655bd8207_1_60
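That algorithm as a Python sketch, partitioning on the leftmost item (copy-based rather than in-place, for clarity):

```python
def quicksort(lst):
    """Partition on the leftmost item, then recursively sort each side.
    This copy-based version is simple but not in-place."""
    if len(lst) <= 1:
        return lst
    pivot = lst[0]
    left = [x for x in lst[1:] if x < pivot]    # items strictly below pivot
    right = [x for x in lst[1:] if x >= pivot]  # ties go right
    return quicksort(left) + [pivot] + quicksort(right)
```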
Quick Sort
Quick sorting N items:
● Partition on the leftmost item (32): everything ≤ 32 moves to its left,
everything ≥ 32 to its right, and 32 lands in its place.
● Quicksort the left half.
● Quicksort the right half.

Input:               32 15 2 17 19 26 41 17 17
After partition(32): 15 2 17 19 26 17 17 32 41
Fully sorted:        2 15 17 17 17 19 26 32 41

[Figure: the recursion sorts the halves via partition(15), partition(2),
partition(17), partition(19), partition(17), partition(26), and partition(17),
until every item is in its place.]
Quick Sort
1. Is quick sort stable?

No

2. Is quick sort an internal or external sort?

Depends on implementation

3. Is quick sort comparison-based or a radix sort?

Comparison-based
Quicksort Runtime

datastructur.es
Best Case: Pivot Always Lands in the Middle

Only size 1 problems remain, so we’re done.

datastructur.es
Best Case Runtime?

Only size 1 problems remain, so we’re done.

What is the best case runtime?


datastructur.es
Best Case Runtime?

Total work at each level:


≈ N

≈ N/2 + ≈N/2 = ≈N

≈ N/4 * 4 = ≈N

Only size 1 problems remain, so we’re done.


Overall runtime:
Θ(NH) where H = Θ(log N)

so: Θ(N log N) datastructur.es


Worst Case: Pivot Always Lands at Beginning of Array
Give an example of an array that
would follow the pattern to the right.

What is the runtime Θ(·)?

Worst Case: Pivot Always Lands at Beginning of Array
Give an example of an array that
would follow the pattern to the right.
● 1 2 3 4 5 6

What is the runtime Θ(·)?


● Θ(N²)

Quicksort Performance
Theoretical analysis:
● Best case: Θ(N log N)
● Worst case: Θ(N²)

Compare this to Mergesort.


● Best case: Θ(N log N)
● Worst case: Θ(N log N)

Recall that Θ(N log N) vs. Θ(N²) is a really big deal. So how can Quicksort be the
fastest sort empirically? Because on average it is Θ(N log N).

Counting Sort
1. Count number of
occurrences of each entry
and allocate space.
2. Walk through the list and
add items to the new
sorted list in the correct
spaces.
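The two steps above can be sketched in Java. This minimal sketch assumes the keys are ints drawn from the alphabet {0, ..., R-1}:

```java
public class CountingSort {
    // Sort non-negative integer keys drawn from the alphabet {0, ..., R-1}.
    static int[] countingSort(int[] keys, int R) {
        // 1. Count occurrences of each entry.
        int[] counts = new int[R];
        for (int key : keys) counts[key]++;

        // Compute each key's target start position from the counts.
        int[] starts = new int[R];
        for (int r = 1; r < R; r++) starts[r] = starts[r - 1] + counts[r - 1];

        // 2. Walk through the list, placing items in their correct spots.
        int[] sorted = new int[keys.length];
        for (int key : keys) {
            sorted[starts[key]] = key;
            starts[key]++;                  // the next equal key goes one slot later
        }
        return sorted;
    }
}
```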
Counting Sort Runtime
Total runtime on N keys with alphabet of size R: Θ(N+R)
● Create an array of size R to store counts: Θ(R)
● Counting number of each item: Θ(N)
● Calculating target positions of each item: Θ(R)
● Creating an array of size N to store ordered data: Θ(N)
● Copying items from original array to ordered array: Do N times:
○ Check target position: Θ(1)
○ Update target position: Θ(1)
● Copying items from ordered array back to original array: Θ(N)

Memory usage: Θ(N+R)


Bottom line: if N ≥ R, then we expect reasonable performance.
LSD Radix Sort
● Perform counting sort on the
least significant digit of the
keys, then perform counting
sort on the next digit and so
on until fully sorted.
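The description above can be sketched in Java; this minimal sketch assumes non-negative int keys and base-10 digits (any radix R works the same way):

```java
public class LSDRadixSort {
    // LSD radix sort: counting sort on the ones digit, then tens, and so on.
    static void lsdSort(int[] a, int maxDigits) {
        for (int d = 0, div = 1; d < maxDigits; d++, div *= 10) {
            countingSortByDigit(a, div);
        }
    }

    // Stable counting sort keyed on (x / div) % 10. Stability is what lets
    // earlier (less significant) passes survive the later ones.
    private static void countingSortByDigit(int[] a, int div) {
        final int R = 10;
        int[] counts = new int[R];
        for (int x : a) counts[(x / div) % R]++;
        int[] starts = new int[R];
        for (int r = 1; r < R; r++) starts[r] = starts[r - 1] + counts[r - 1];
        int[] out = new int[a.length];
        for (int x : a) out[starts[(x / div) % R]++] = x;   // in-order copy keeps it stable
        System.arraycopy(out, 0, a, 0, a.length);
    }
}
```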
MSD Radix Sort
● Same as LSD except start
sorting from the most
significant digit.
● This means we need
separate sections to sort
within after sorting based
on the first digit.
● We can stop MSD sort once
each sorted section is only
one element long (stop early
when the list is sorted)
Radix Sort Runtime
Let W be the number of digits in each key

We run the counting sort over digits of the keys W times

Runtime: Θ(W(N+R)) = Θ(WN + WR)


Dynamic Programming
Dynamic Programming
● Avoid repeated computations by memoizing our intermediate results
○ Ex: Fibonacci
■ From CS61A:
Dynamic Programming
● Instead, memoize our intermediate results
○ Create an array to hold intermediate results
■ Use these intermediate results along the way to calculate final result
Demo

Fib Table (filled in left to right, one entry per step):

0 1 2 3 5 8
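The table-filling demo above can be sketched as a bottom-up Java method. This sketch uses the usual fib(0) = 0, fib(1) = 1 convention; the slide's table indexing may differ:

```java
public class Fib {
    // Bottom-up dynamic programming: fill a table of intermediate results
    // instead of recomputing overlapping subproblems recursively.
    static long fib(int n) {
        if (n < 2) return n;
        long[] table = new long[n + 1];     // table[i] holds fib(i)
        table[0] = 0;
        table[1] = 1;
        for (int i = 2; i <= n; i++) {
            table[i] = table[i - 1] + table[i - 2];   // reuse earlier results
        }
        return table[n];
    }
}
```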
Practice: Dynamic Programming
Find the length of the longest increasing subsequence in a sequence of
numbers:

Ex: Given the list [10, 22, 9, 33, 21, 50, 41, 60, 80], the longest
increasing subsequence is [10, 22, 33, 50, 60, 80], so the length is 6
Solution
First, we initialize a list to hold
our intermediate results (the length
of the longest increasing subsequence
ending at each index of our list).

Then, we compute the longest
subsequences from left to right
(in a bottom-up manner).

Finally, we pick the max value.
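The solution steps above translate to the classic O(N²) dynamic program; a minimal Java sketch:

```java
public class LIS {
    // lis[i] = length of the longest increasing subsequence ending at index i.
    static int longestIncreasingSubsequence(int[] nums) {
        if (nums.length == 0) return 0;
        int[] lis = new int[nums.length];
        int best = 1;
        for (int i = 0; i < nums.length; i++) {
            lis[i] = 1;                               // the element by itself
            for (int j = 0; j < i; j++) {
                if (nums[j] < nums[i] && lis[j] + 1 > lis[i]) {
                    lis[i] = lis[j] + 1;              // extend the best run ending at j
                }
            }
            best = Math.max(best, lis[i]);            // finally, pick the max value
        }
        return best;
    }
}
```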


Compression
Compression
● Huffman Encoding
○ Prefix Free Codes - No code is a prefix of any other code
■ Removes ambiguity
○ Assign each symbol to a node with weight = relative frequency
○ Take the two smallest nodes and merge them into a node equal to the sum of their
weights
○ Continue until every node is in the tree
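The merge procedure above can be sketched with a priority queue. A minimal sketch — tie-breaking between equal weights is left to the queue, so equally-optimal trees may differ in shape:

```java
import java.util.*;

public class Huffman {
    // A node's weight is its relative frequency; leaves carry symbols.
    static class Node {
        char symbol; double weight; Node left, right;
        Node(char s, double w) { symbol = s; weight = w; }
        Node(Node l, Node r) { weight = l.weight + r.weight; left = l; right = r; }
        boolean isLeaf() { return left == null; }
    }

    // Repeatedly merge the two smallest-weight nodes until one tree remains.
    static Node buildTree(Map<Character, Double> freqs) {
        PriorityQueue<Node> pq =
            new PriorityQueue<>(Comparator.comparingDouble((Node n) -> n.weight));
        for (Map.Entry<Character, Double> e : freqs.entrySet()) {
            pq.add(new Node(e.getKey(), e.getValue()));
        }
        while (pq.size() > 1) {
            pq.add(new Node(pq.poll(), pq.poll()));
        }
        return pq.poll();
    }

    // Read codes off the tree: left edge = 0, right edge = 1 (prefix-free).
    static void codes(Node n, String prefix, Map<Character, String> out) {
        if (n.isLeaf()) { out.put(n.symbol, prefix); return; }
        codes(n.left, prefix + "0", out);
        codes(n.right, prefix + "1", out);
    }
}
```

On the A/B/C/D/E example that follows, A (the most frequent symbol) gets a 1-bit code and the other four symbols get 3-bit codes.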
Delta Compression
● Delta Compression
○ In a Git repository, if two files have the same name and roughly the same size, store one of
them as a pointer to the other plus a list of changes
■ Delta = Differences
● Git uses a combination of something similar to Delta Compression and
Huffman Encoding
○ LZ77 and Huffman Encoding
Huffman Encoding
Example:

Symbol Frequency
A      .35
B      .17
C      .17
D      .16
E      .15

(Figure: build the tree by repeatedly merging the two smallest nodes:
D + E → .31, then B + C → .34, then .31 + .34 → .65, then A + .65 → 1.0.)
Practice
Solution
(Reconstructed Huffman tree; each internal node shows its total weight:)

(1.00)
├── b (.38)
└── (.62)
    ├── e (.25)
    └── (.37)
        ├── (.15)
        │   ├── f (.06)
        │   └── (.09)
        │       ├── (.04)
        │       │   ├── g (.01)
        │       │   └── h (.03)
        │       └── d (.05)
        └── (.22)
            ├── c (.10)
            └── a (.12)
Pseudorandom Sequences
Pseudorandom Sequences
● Deterministic sequences that satisfy statistical criteria
● Linear congruential method used by Java:
○ X(n+1) = (a · X(n) + c) mod m
● Java uses a = 25214903917, c = 11, m = 2^48 to compute 48-bit
pseudorandom numbers
○ Not cryptographically secure
● Cryptographic pseudorandom number generators
○ Given k bits of a sequence, no polynomial-time algorithm can guess the next bit >50% of
the time
○ It is infeasible to reconstruct the bits generated prior to the current state
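The update above is exactly java.util.Random's 48-bit state transition, so a hand-rolled version can be checked against it. A minimal sketch (including the seed scrambling Random applies in its constructor):

```java
public class LCG {
    // java.util.Random's linear congruential update: seed' = (a*seed + c) mod 2^48.
    static final long A = 25214903917L;       // 0x5DEECE66DL
    static final long C = 11L;
    static final long MASK = (1L << 48) - 1;  // "mod 2^48" via a bit mask

    long seed;

    // Same initial scrambling java.util.Random applies to the user's seed.
    LCG(long seed) { this.seed = (seed ^ A) & MASK; }

    // Advance the state and return the top `bits` bits, as Random.next(int) does.
    int next(int bits) {
        seed = (A * seed + C) & MASK;
        return (int) (seed >>> (48 - bits));
    }
}
```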
Enumeration Types
Enumeration Types
● Want to represent a group of named constants

● The names are called enumeration constants (or enumerals)


○ Static and final, and of the enum type itself (Piece in the example)
● Can do things like: “import static Piece.*”
Enumeration Types
● Enum types are classes
○ Can define extra things like methods and constructors
● Constructors are used only in creating enumeration constants
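The slides reference a Piece enum; the exact constants are not shown, so the names and values below are illustrative. A minimal sketch of an enum with a constructor and a method:

```java
// Illustrative enum type: the constant names and values are hypothetical,
// not from the original slides.
public enum Piece {
    PAWN(1), KNIGHT(3), BISHOP(3), ROOK(5), QUEEN(9);

    private final int value;          // extra state carried by each enumeration constant

    // Constructors run only while the enumeration constants are created.
    Piece(int value) { this.value = value; }

    public int value() { return value; }
}
```

With `import static Piece.*;` the constants can then be referenced as bare names like `QUEEN`.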
Threads
Threads
● Want to use threads so asynchronous events can be divided into
subprograms
○ Threads allow us to insulate subprograms from each other
● Threads support concurrently running programs
○ If two threads access shared data simultaneously without coordination, the program may not behave as expected
○ Can use Java primitive facilities to avoid interference
■ Ex: Wait method on Objects
● Coroutines
○ Synchronous thread that explicitly hands off control to other coroutines so that only one
thread runs at a time
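The interference problem above can be demonstrated with Java's built-in synchronization. A minimal sketch (the Counter class is illustrative): without `synchronized`, the two threads' read-modify-write cycles could interleave and lose increments.

```java
public class Counter {
    private int count = 0;

    // synchronized prevents two threads from interleaving the read-modify-write.
    public synchronized void increment() { count++; }
    public synchronized int get() { return count; }

    public static void main(String[] args) throws InterruptedException {
        Counter c = new Counter();
        Runnable work = () -> { for (int i = 0; i < 100_000; i++) c.increment(); };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start(); t2.start();        // the two threads run concurrently
        t1.join();  t2.join();         // wait for both to finish
        System.out.println(c.get());   // always 200000 thanks to synchronization
    }
}
```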
Garbage Collection
Garbage Collection
● Scope: Portion of program in which something is visible
● Lifetime: Portion of program duration for which something exists
○ Static: Entire duration
○ Local: Duration of call block or execution
○ Dynamic: From time it was allocated to time it was deallocated
■ x = new int[50];
■ Java has no explicit means to free dynamically stored variables
● Garbage collector recycles objects automatically at runtime
● Pointers to objects are actually integer addresses
○ In C, we explicitly free and allocate our memory
○ In Java, we do not have to worry about freeing our references and memory
■ Slower, but less error-prone
Garbage Collection
● Free Lists
○ Explicit allocator (new keyword) hands storage obtained from the OS to applications
■ Or hands out recycled storage if available
○ When storage is freed, it is added to a free list and reused for both explicit and automatic
freeing
○ Strategies for free lists:
■ Memory requests come in different sizes
● Might have to break up requests if they’re too big for free list chunks
● Sequential fits - Link blocks in order, coalesce adjacent blocks, search for fit
● Segregated fits - Separate free lists for different chunk sizes
● Buddy Systems - A kind of segregated fit where adjacent blocks are easy to
attach and combine into original chunks
Garbage Collection Strategies
● Reference Counting
○ Count number of pointers to each object, release the object when pointers go to 0
● Mark and Sweep
○ Traverse and mark the graph of reachable objects, then sweep through memory freeing unmarked objects
● Copying Garbage Collection
○ Traverse group of active objects and copy into contiguous open space (“to-space”) and
mark with forwarding pointer
○ Next time we have to copy an already marked object, just use forwarding pointer
○ The space we copied from becomes the next “to-space”
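The mark-and-sweep strategy above can be sketched on a toy heap model. This is a simplification: the heap is a map from object ids to their outgoing pointers, which stands in for real object headers and pointer fields.

```java
import java.util.*;

public class MarkAndSweep {
    // Mark: traverse the object graph from the roots, marking everything reachable.
    static Set<Integer> reachable(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> marked = new HashSet<>();
        Deque<Integer> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            int obj = stack.pop();
            if (marked.add(obj)) {                      // first visit: follow its pointers
                stack.addAll(heap.getOrDefault(obj, List.of()));
            }
        }
        return marked;
    }

    // Sweep: every unmarked object in the heap is garbage and can be freed.
    static Set<Integer> garbage(Map<Integer, List<Integer>> heap, List<Integer> roots) {
        Set<Integer> dead = new HashSet<>(heap.keySet());
        dead.removeAll(reachable(heap, roots));
        return dead;
    }
}
```

Note that an unreachable cycle (two objects pointing at each other) is collected here, whereas reference counting would leak it — the counts never drop to 0.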
Git Internals
Git Internals
● Blobs: Files
● Trees: Directory structures of files
● Commits: References to trees and other commits (their parents), plus metadata
● Tags: Refer to commits or other objects, used to provide information
● Branches: Refer to commits and are updated to keep track of the most
recent commits in various versions
Git Internals
Git Diagram
Git Internals
● Want a content addressable file system
● To be able to address objects between repositories, Git uses SHA1 Hash
to give keys with low probability of collisions
○ Cryptographic Hash Function
○ Object names in Git are 160 bit hash codes of contents
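Git's content addressing can be reproduced with the JDK's MessageDigest: a blob's 160-bit name is the SHA-1 of a small header ("blob", a space, the content length, a NUL byte) followed by the file contents. A minimal sketch:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitHash {
    // Git names a blob by the SHA-1 of "blob <size>\0" + contents,
    // yielding a 160-bit (40 hex digit) content-addressed key.
    static String blobName(byte[] contents) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] header = ("blob " + contents.length + "\0")
                    .getBytes(StandardCharsets.UTF_8);
            sha1.update(header);
            sha1.update(contents);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e);   // SHA-1 is always available in the JDK
        }
    }
}
```

The result matches `git hash-object` for the same contents, which is what lets any two repositories agree on an object's name.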
