
Hashing

General Information - The basic idea behind hashing is to take a field in a record, known as the key, and convert it through some fixed process to a numeric value, known as the hash key, which represents the position to either store or find an item in the table. The numeric value will be in the range of 0 to N-1, where N is the maximum number of slots (or buckets) in the table.

Division Hashing (Susceptible to collisions)
Typical hashing involves taking an incoming key and performing a modulo operation on it with the size of the array. This ensures that your index is bounded (0 to N-1). The best table sizes are prime numbers. If the keys are strings you typically add the hex values of the chars.

Chain Hashing
When a collision occurs you can chain nodes together by having an array of linked lists. When a value is given the key of another value already in the hash map, you simply link it to the previous member.
Pseudocode for inserting: Compute the hash key -> IF slot at hash key is null THEN insert as first node in the chain ELSE search the chain for a duplicate key -> IF a duplicate key is found, don't insert, ELSE insert into the chain.
Pseudocode for deleting: Compute the hash key -> IF slot at hash key is null THEN there is nothing to delete, ELSE search the chain for the desired key -> IF the key is not found there is nothing to delete, ELSE remove the node from the chain.

Linear-Probe Hashing
Items are stored in the next open slot in the table, assuming that the table is not already full. This is implemented via linear search for an empty spot from the point of collision. The search will wrap around if the end of the table is met. This might cause primary clustering around values with the same hash.

Quadratic-Probe Hashing
Similar to linear-probe hashing except that instead of moving one spot over we move i^2 spots over, where i is the number of attempts to resolve the collision. At most half of the table can be used as alternative locations to resolve a collision; once the table is more than half full it becomes difficult to resolve collisions. This is known as secondary clustering.

Double Hashing
Apply a second hash when a collision occurs. The new value is the number of positions from the point of collision at which to place the item. It must never evaluate to zero, and all cells must be probeable. A common function is Hash2(key) = R - (key % R), where R is a prime number smaller than the size of the table.

Balanced Trees

General Information - We want to create trees that are as close to balanced as possible. This ensures that our search time complexity is closer to log(N) as opposed to N.

AVL Tree
A self-balancing binary tree. If at any point the heights of a node's two child subtrees differ by more than one, rebalancing is done to restore the balance.

Randomized Binary Search Tree
With a randomized binary search tree you randomize the root node from your given data set. This is very similar to how quicksort works. A randomized BST permutes the input, then builds a BST from that data set. The expected average depth is log(N), the tree can be constructed in N*log(N) time, and find takes log(N). By removing random elements from it you preserve the tree's "randomness." Without randomization, a nearly sorted input converges to a list with depth N and is extremely unbalanced.

Splay Binary Search Tree
A splay tree is a binary search tree. Whenever an element is looked up in the tree, the splay tree reorganizes to move that element to the root of the tree, without breaking the binary search tree invariant. If the next lookup request is for the same element, it can be returned immediately. In general, if a small number of elements are being heavily used, they will tend to be found near the top of the tree and are thus found quickly. (Good for instances where we have to decide on things very quickly.) All splay tree operations run in O(log(N)) time on average; the worst case can be O(N). They depend on left and right rotations in order to stay relatively well balanced (splay trees are not perfectly balanced).

B-Trees

General Information - In B-trees, internal (non-leaf) nodes can have a variable number of child nodes within some pre-defined range. When data is inserted or removed from a node, its number of child nodes changes. In order to maintain the pre-defined range, internal nodes may be joined or split. Because a range of child nodes is permitted, B-trees do not need rebalancing as frequently as other self-balancing search trees, but may waste some space, since nodes are not entirely full.
Characteristics: Trees are ALWAYS balanced, but we tolerate some wasted space to achieve this.
Behavior: When a node is full, we split the node into two and propagate the middle value upwards. The two new nodes are at the same level as the one we just split. The height changes ONLY when the root splits. This is what keeps the tree balanced. When we delete, two adjacent nodes are recombined if both are less than half full.

2-3-4 Tree
A 2-3-4 tree is a perfectly balanced tree, and because of this find, insert, and remove operations take O(log(N)) time, even in the worst case. Each node stores 1, 2, or 3 entries, and has 2, 3, or 4 children, respectively. Top-down 2-3-4 trees are usually faster than bottom-up ones.
Insertion: Insert the new entry at the lowest internal node reached in the search (a 2-node becomes a 3-node, a 3-node becomes a 4-node). Whenever we reach a 4-node we break it up into 2-nodes and move the middle entry up into the parent node. We want to keep things in the middle.

Graphs, Adjacency Lists, Matrices

General Information (Graphs) - A graph consists of a set of vertices V, called nodes, and a set of edges E. It can be notated as G = (V, E). For example, V = {v1, v2, v3, v4, v5, v6} and E = {(v1, v2), (v2, v3), (v1, v3), (v3, v4), (v3, v5), (v5, v6)}; there are six edges and six vertices in this graph.
*A graph is said to be complete (or fully connected) if there is an edge from every vertex to every other vertex. (It is simple if it has no loops.)
1. Directed Graph - A graph, or set of nodes connected by edges, where the edges have a direction associated with them.
2. Weighted Graph - Every edge and/or vertex in the graph is assigned some weight or value. A weighted graph can be defined as G = (V, E, We, Wv), where V is the set of vertices, E is the set of edges, We is a weight function on the edges whose domain is E, and Wv is a weight function on the vertices whose domain is V.

How do we represent a graph? There are two main ways of representing a graph as a data structure:
1. Sequential representation of a graph using an adjacency matrix
2. Linked representation of a graph using adjacency lists

Adjacency Matrix
For a directed graph we can build the matrix using the following rules: Aij = 1 if there is an edge from Vi to Vj, i.e. the edge (i, j) is a member of E; Aij = 0 if there is no edge from Vi to Vj.

Minimum Spanning Tree

Given a connected, undirected graph, a spanning tree of that graph is a subgraph that is a tree and connects all the vertices together. A single graph can have many different spanning trees. We can also assign a weight to each edge, which is a number representing how unfavorable it is, and use this to assign a weight to a spanning tree by computing the sum of the weights of the edges in that spanning tree. A minimum spanning tree (MST), or minimum weight spanning tree, is then a spanning tree with weight less than or equal to the weight of every other spanning tree.
If there are n vertices in the graph, then each spanning tree has n-1 edges. If each edge has a distinct weight then there will be only one, unique minimum spanning tree.
Two algorithms can be used to obtain a minimum spanning tree of a connected, weighted, undirected graph: 1. Kruskal's Algorithm 2. Prim's Algorithm

Kruskal's Algorithm (Generates MST)
A minimum-spanning-tree algorithm where the algorithm finds an edge of the least possible weight that connects any two trees in the forest. It is a greedy algorithm in graph theory, as it builds a minimum spanning tree for a connected weighted graph one least-cost edge at a time. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component). Use Kruskal's when you have a sparse graph:
1. Find the edge with the lowest weight.
2. Continue repeating this pattern; it does not matter where the edges are, unless an edge connects to a previously connected vertex (which would create a cycle). If there is more than one edge with the same weight, we pick whichever one we want. We need N-1 edges to form a complete MST.
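The chain-hashing insert/delete pseudocode above can be sketched in Python. The class name, table size (an illustrative prime, 11), and use of plain lists as chains are assumptions for the sketch, not from the original:

```python
class ChainHashTable:
    def __init__(self, size=11):
        self.size = size
        self.slots = [None] * size  # each slot is a chain (list) or None

    def _hash(self, key):
        return hash(key) % self.size          # compute the hash key

    def insert(self, key, value):
        i = self._hash(key)
        if self.slots[i] is None:             # slot empty: start the chain
            self.slots[i] = [(key, value)]
            return
        for k, _ in self.slots[i]:            # search chain for a duplicate
            if k == key:
                return                        # duplicate found: don't insert
        self.slots[i].append((key, value))    # otherwise link into the chain

    def delete(self, key):
        i = self._hash(key)
        if self.slots[i] is None:
            return                            # nothing to delete
        self.slots[i] = [(k, v) for k, v in self.slots[i] if k != key]

    def find(self, key):
        i = self._hash(key)
        for k, v in (self.slots[i] or []):
            if k == key:
                return v
        return None
```

Collisions simply grow a chain, so the table never "fills up", at the cost of following links on lookup.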
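The difference between linear and quadratic probing above is just the probe sequence. A small sketch, with an illustrative prime table size of 11 and a hypothetical home slot h:

```python
def linear_probes(h, N, attempts):
    # try h, h+1, h+2, ... (mod N): prone to primary clustering
    return [(h + i) % N for i in range(attempts)]

def quadratic_probes(h, N, attempts):
    # try h, h+1, h+4, h+9, ... (mod N), i.e. offsets of i^2,
    # where i counts collision-resolution attempts
    return [(h + i * i) % N for i in range(attempts)]

# linear_probes(3, 11, 4)    -> [3, 4, 5, 6]
# quadratic_probes(3, 11, 4) -> [3, 4, 7, 1]
```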
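The double-hashing step function Hash2(key) = R - (key % R) from the text can be sketched directly; R = 7 and table size N = 11 are illustrative primes:

```python
def hash2(key, R=7):
    # result is in 1..R, so the step never evaluates to zero
    return R - (key % R)

def double_hash_probes(key, N, attempts, R=7):
    h, step = key % N, hash2(key, R)
    # each failed attempt advances by the key-dependent step (mod N)
    return [(h + i * step) % N for i in range(attempts)]
```

Because the step size depends on the key, two keys with the same home slot follow different probe sequences, avoiding the clustering of linear and quadratic probing.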
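Kruskal's steps above — repeatedly take the cheapest edge that does not reconnect already-connected vertices, until N-1 edges are chosen — can be sketched with a union-find structure. The example graph and weights are made up for illustration:

```python
def kruskal(n, edges):
    """edges: list of (weight, u, v) with vertices numbered 0..n-1."""
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0
    for w, u, v in sorted(edges):     # consider edges cheapest-first
        ru, rv = find(u), find(v)
        if ru != rv:                  # skip edges that would form a cycle
            parent[ru] = rv
            mst.append((u, v, w))
            total += w
            if len(mst) == n - 1:     # a spanning tree needs n-1 edges
                break
    return mst, total

example_edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
```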
Hashing with Rehashing
Once the hash table gets too full, the running time for hash operations takes too long, so a table twice as large as the original is created. The size of the new table should also be prime. This is expensive though: it will take O(N) operations. Rehashing should be done when the table becomes full, an insertion fails, or the load factor has been reached.

Hashing Functions
Universal hashing functions refer to a set of hashing functions from which one is selected depending on the data type. You typically use a pseudorandom number generator. Code for a string-based universal hash function is:

    int hashU(char *v, int M) {
        int h = 0, a = 31415, b = 27183;
        for (; *v != '\0'; v++, a = a * b % (M - 1))
            h = (a * h + *v) % M;
        return (h < 0) ? (h + M) : h;
    }

Modular hashing is a simple modulo operation. It is very fast, but you have to be careful with the implementation; you may have too many collisions.

Additional Notes
1. The table for linear probing is larger than one with chaining since there are no nodes to keep track of, but the total amount of space may be smaller since no links are used.

BST/Hash-table Advantages/Disadvantages:
A hash table supports searching, inserting, and deleting in O(1) average time, but a self-balancing tree guarantees an upper bound of O(log(N)). Use a BST in the following scenarios:
1. Hash tables cannot give us an in-order traversal of the data in a natural manner.
2. Looking for statistics, i.e. finding lower and greater elements, is not natural in hash tables.
3. BSTs are easier to implement without depending on external libraries.
4. With BSTs, all operations are guaranteed to work in O(log(N)) time. With hashing, Θ(1) is the average time, and some particular operations may be costly, especially when table resizing happens.

Heaps
General Information - A binary tree that is complete, where the data stored at each node is ordered with respect to the data stored in its children (smaller in a min-heap, larger in a max-heap). Typically stored in arrays or vectors. The array properties are the following: 1. The root of the heap is at [1]. 2. The left child is located at [2 * parent's position]. 3. The right child is located at [2 * parent's position + 1]. 4. The parent is located at [either child's position / 2]. 5. The next free location is [number of elements + 1].

Priority Queue/Heaps
Heap Property (max-heap): For every node n, the value in n is greater than or equal to the values in its children (and thus is also greater than or equal to all of the values in its subtrees).
Shape Property: All leaves are either at depth d or d-1 (for some value d). All of the leaves at depth d-1 are to the right of the leaves at depth d. (a) There is at most 1 node with just 1 child, (b) that child is the left child of its parent, and (c) it is the rightmost leaf at depth d.
Used in event-based simulations.

Red-Black Trees (Preferred over a Splay)
A balanced BST in which each node is colored red or black. A red node cannot have a red child, and in any path from the root to null the number of black nodes remains the same. The root node is a black node. Think of this as an abstraction of the 2-3-4 tree. The structure has O(log(N)) search time. When an element is placed into the tree, we travel down the tree and find the correct location for it; the new node is inserted red. If one red node is connected to another, then we will have to rebalance the tree. With this tree we are GUARANTEED TO HAVE A BALANCED TREE: when going from the root to any leaf we traverse the same number of black links.

Skip Lists
An ordered linked list where each node contains a variable number of links, with the level-i links in the nodes implementing singly linked lists that skip the nodes with fewer than i links. This is a probabilistic data structure.
Space: O(N) avg, O(N*log(N)) worst
Search: O(log(N)) avg, O(N) worst
Insert: O(log(N)) avg, O(N) worst
Delete: O(log(N)) avg, O(N) worst

Radix Search
Searching is based on the binary representation of the search key. If the keys are uniformly distributed, the average time per operation is O(log(N)). The worst case is O(b), where b is the number of bits in the search key. Duplicates have to be handled some other way.

Digital Search Trees
Similar to the binary search tree, except that branching is based on the bits of the search key. Search keys are given in the fashion 0bXXX, e.g. A: 000, B: 001, C: 010, D: 011, E: 100, F: 101, G: 110, H: 111. From the example figure: A. Keys are placed to the left of H, while values with MSBs equal to H's are placed to the right. B. E has a second bit value that is less than F's, so it is placed to the left.

Ternary Search Tree
Nodes are arranged in a manner similar to a binary search tree, but with up to 3 children rather than the binary tree's limit of 2. However, ternary search trees are more space efficient compared to standard prefix trees, at the cost of speed. Common applications for ternary search trees include spell-checking and auto-completion. They are better suited for large data sets than a trie, and use less memory than a hash table.

Trie
Advantages: Can replace a hash table; looking up information in the worst case is O(m), where m is the length of the search string. There are no collisions with tries, unless one key is associated with multiple values. There is no need to provide a hash function or to change a hash function as more keys are added.
Disadvantages: Can be slower than hash tables for looking up data, especially if it is stored somewhere with high read times. Some tries can require more space than a hash table, as memory may be allocated for each character in the search string rather than a single chunk of memory for the whole entry.

Multiway Trie
Multiway tries have strong key ordering: at a node X, all keys in X's leftmost subtree are smaller than the keys in X's next-to-leftmost subtree. A preorder traversal will give you the keys in sorted order. If there are r bits per digit (R = 2^r) and keys contain at most B bits, then the worst-case height of an R-ary trie containing N keys is B/r.

Cycle Detection
An easy way to detect if we have entered a loop is by having a fastRunner and a slowRunner iterate through the elements in the list. FastRunner will be at i + 1 when slowRunner is at i. When fastRunner enters the loop, eventually slowRunner will collide with it. To summarize: create two pointers; fastPointer moves at 2x the rate of slowPointer. They meet within LOOP_SIZE - k turns.

The adjacency matrix shows the edges: when there is a directed edge from i to j, that matrix element is a 1. For an undirected graph we have to place a value for both Vij and Vji, indicating that the connection goes both ways. It takes O(N^2) space to represent a graph with N vertices, even for a sparse graph, and it takes O(N^2) time to solve graph problems.

Adjacency List (Linked-List Representation)
In this representation (also called the adjacency list representation) we store the graph as a linked structure. First we store all the vertices of the graph in a list, and then each adjacent vertex is represented using a linked list node.
Although the linked list representation requires much less memory than the adjacency matrix, the simplicity of the adjacency matrix makes it preferable when graphs are reasonably small.
Most graph algorithms use the adjacency list to run through the graph; traversing the graph with an adjacency matrix is inefficient by comparison.

Prim's Algorithm (Generates MST)
Similar to Kruskal's, except that we are looking for the next closest and smallest edge from the vertex. Our graph is always connected and has no discontinuities. Kruskal's will instead look for the shortest edges, which will eventually link up. Use Prim when you have a very dense graph with a lot of connections.

BFS (Breadth-First Search) [Searches across nodes]
The breadth-first search systematically traverses the edges of G to explore every vertex that is reachable from S. We examine all the vertices neighboring the source vertex S, then we traverse all the neighbors of the neighbors of S, and so on. A queue is used to keep track of the progress of traversing the neighbor nodes.
The time complexity can be expressed as O(|V|+|E|), since every vertex and every edge will be explored in the worst case. Note: O(|E|) may vary between O(1) and O(|V|^2), depending on how sparse the input graph is.
Pseudocode:
procedure BFS(G, v) is:
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    while Q is not empty
        v ← Q.dequeue()
        for all edges from v to w in G.adjacentEdges(v) do
            if w is not labeled as discovered
                Q.enqueue(w)
                label w as discovered

DFS (Depth-First Search) [Searches down nodes]
General Information - An algorithm for traversing or searching tree or graph data structures. One starts at the root (selecting some arbitrary node as the root in the case of a graph) and explores as far as possible along each branch before backtracking. A typical strategy for solving mazes. Related to preorder traversal of a tree. By keeping track of which nodes have been visited we can prevent infinite loops. Complexity is the same as BFS.

Dijkstra's Algorithm (Generates shortest path)
NOTE: Dijkstra selects as the next edge the one that leads out from the tree to a node, not yet chosen, closest to the starting node. (Then, with this choice, distances are recalculated.) Prim chooses as its edge the shortest one leading out of the tree constructed so far. Dijkstra's algorithm will find the shortest path from one node to all of the other nodes, individually. (Greedy algorithm.)

Floyd-Warshall Algorithm (Generates shortest path)
An algorithm for finding shortest paths in a weighted graph with positive or negative edge weights. The Floyd–Warshall algorithm compares all possible paths through the graph between each pair of vertices. It is able to do this with Θ(|V|^3) comparisons. This is remarkable considering that there may be up to Ω(|V|^2) edges in the graph, and every combination of edges is tested.
Johnson's algorithm also solves all-pairs shortest paths, and may be faster than Floyd–Warshall on sparse graphs.
Bellman–Ford: Computes shortest paths from a single source vertex to all of the other vertices in a weighted digraph. It is slower than Dijkstra's algorithm for the same problem, but more versatile, as it is capable of handling graphs in which some of the edge weights are negative.

Strongly Connected
In the mathematical theory of directed graphs, a graph is said to be strongly connected if every vertex is reachable from every other vertex, i.e. there is a path in each direction between each pair of vertices of the graph.
Tarjan's Algorithm: Performs a single pass of depth-first search. It maintains a stack of vertices that have been explored by the search but not yet assigned to a component, and calculates "low numbers" of each vertex (an index number of the highest ancestor reachable in one step from a descendant of the vertex), which it uses to determine when a set of vertices should be popped off the stack into a new component.
Kosaraju's Algorithm: Uses two passes of depth-first search. The first, in the original graph, is used to choose the order in which the outer loop of the second depth-first search tests vertices for having been visited already and recursively explores them if not. The second depth-first search is on the transpose graph of the original graph, and each recursive exploration finds a single new strongly connected component.
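The 1-based array layout for heaps described above (root at [1], children at [2p] and [2p+1], parent at [c / 2]) can be sketched with a minimal max-heap insert; the class name is illustrative:

```python
class MaxHeap:
    def __init__(self):
        self.a = [None]  # index 0 unused so the root sits at [1]

    def insert(self, x):
        self.a.append(x)          # place at the next free location [n + 1]
        i = len(self.a) - 1
        # percolate up: swap with the parent at [i // 2] while larger
        while i > 1 and self.a[i // 2] < self.a[i]:
            self.a[i // 2], self.a[i] = self.a[i], self.a[i // 2]
            i //= 2

    def peek_max(self):
        return self.a[1]          # root of the heap is at [1]
```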
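The trie lookup cost of O(m) for a search string of length m can be illustrated with a minimal dict-based trie; the node layout and the `END` marker are assumptions of this sketch:

```python
END = "$"  # marker key: "a word ends at this node"

def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})   # one node per character
    node[END] = True

def trie_contains(root, word):
    node = root
    for ch in word:                      # walks at most len(word) nodes
        if ch not in node:
            return False
        node = node[ch]
    return END in node

root = {}
for w in ("car", "cart", "cat"):
    trie_insert(root, w)
```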
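The fast/slow runner cycle check above can be sketched on a hand-built linked list (the `Node` class is illustrative):

```python
class Node:
    def __init__(self, val):
        self.val, self.next = val, None

def has_cycle(head):
    slow = fast = head
    while fast is not None and fast.next is not None:
        slow = slow.next           # slowPointer moves 1 node per turn
        fast = fast.next.next      # fastPointer moves at 2x the rate
        if slow is fast:           # collision can only happen in a loop
            return True
    return False                   # fast ran off the end: no loop

a, b, c = Node(1), Node(2), Node(3)
a.next, b.next = b, c              # a -> b -> c, no cycle yet
```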
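The two graph representations above can be built side by side from the six-edge example graph (v1..v6 mapped to indices 0..5 here for illustration), treating the edges as directed:

```python
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (2, 4), (4, 5)]
n = 6

# Adjacency matrix: A[i][j] = 1 iff edge (i, j) is in E.  O(n^2) space.
matrix = [[0] * n for _ in range(n)]
# Adjacency list: each vertex keeps only its own neighbors.
adj = [[] for _ in range(n)]

for i, j in edges:
    matrix[i][j] = 1   # directed: row i, column j only
    adj[i].append(j)   # an undirected graph would also set [j][i] / adj[j]
```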
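The BFS pseudocode above translates almost line-for-line into Python, with a deque as the queue and a set for the "discovered" labels; the example graph is made up:

```python
from collections import deque

def bfs(adj, v):
    order, discovered = [], {v}
    Q = deque([v])                 # let Q be a queue; enqueue v
    while Q:                       # while Q is not empty
        u = Q.popleft()
        order.append(u)
        for w in adj[u]:           # all edges from u to w
            if w not in discovered:
                discovered.add(w)  # label w as discovered
                Q.append(w)
    return order

example = {0: [1, 2], 1: [3], 2: [3], 3: []}
```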
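A recursive DFS sketch matching the description above — explore as far as possible along each branch before backtracking, and track visited vertices so a cycle cannot cause an infinite loop (the example graph is made up and contains a cycle):

```python
def dfs(adj, v, visited=None, order=None):
    visited = visited if visited is not None else set()
    order = order if order is not None else []
    visited.add(v)
    order.append(v)                 # preorder: visit before descending
    for w in adj[v]:
        if w not in visited:        # skip visited nodes: prevents loops
            dfs(adj, w, visited, order)
    return order

example = {0: [1, 2], 1: [3], 2: [3], 3: [0]}  # cycle: 3 -> 0
```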
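Prim's rule above — always take the cheapest edge leading out of the tree built so far — can be sketched with a binary heap. The undirected example graph is made up for illustration:

```python
import heapq

def prim(adj, start=0):
    """adj: {u: [(weight, v), ...]} for an undirected graph."""
    in_tree, total = {start}, 0
    heap = list(adj[start])
    heapq.heapify(heap)
    while heap and len(in_tree) < len(adj):
        w, v = heapq.heappop(heap)       # smallest edge out of the tree
        if v in in_tree:
            continue                     # both endpoints already in tree
        in_tree.add(v)
        total += w
        for edge in adj[v]:              # new frontier edges
            heapq.heappush(heap, edge)
    return total

example = {0: [(1, 1), (4, 2)], 1: [(1, 0), (3, 2), (2, 3)],
           2: [(4, 0), (3, 1), (5, 3)], 3: [(2, 1), (5, 2)]}
```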
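Dijkstra's selection rule above — settle the not-yet-chosen node closest to the start, then recalculate distances — can be sketched with a heap; the weights are made up and must be non-negative:

```python
import heapq

def dijkstra(adj, start):
    """adj: {u: [(weight, v), ...]}; returns {vertex: distance}."""
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)         # closest unsettled node
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for w, v in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd               # distances are recalculated
                heapq.heappush(heap, (nd, v))
    return dist

example = {0: [(1, 1), (4, 2)], 1: [(2, 2), (6, 3)], 2: [(3, 3)], 3: []}
```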
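The Θ(|V|^3) comparison count for Floyd–Warshall comes from a triple loop over intermediate vertices; a sketch, with a made-up directed example:

```python
INF = float("inf")

def floyd_warshall(n, edges):
    """edges: list of (u, v, weight) for a directed graph."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):            # allow k as an intermediate vertex
        for i in range(n):
            for j in range(n):    # is i -> k -> j shorter than i -> j?
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

example = [(0, 1, 3), (1, 2, 1), (0, 2, 7), (2, 0, 2)]
```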
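Bellman–Ford's tolerance of negative edge weights comes from relaxing every edge |V|-1 times instead of greedily settling vertices; a sketch with made-up weights, including a negative one:

```python
def bellman_ford(n, edges, start):
    """edges: list of (u, v, weight); returns distances,
    or None if a negative cycle is reachable."""
    INF = float("inf")
    dist = [INF] * n
    dist[start] = 0
    for _ in range(n - 1):                 # n-1 relaxation rounds
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:                  # one extra pass: any further
        if dist[u] + w < dist[v]:          # improvement means a (-) cycle
            return None
    return dist

example = [(0, 1, 4), (0, 2, 5), (1, 2, -2)]
```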