Académique Documents
Professionnel Documents
Culture Documents
5.2 T RIES
R-way tries
ternary search tries
Algorithms
F O U R T H
E D I T I O N
character-based operations
typical case
implementation
red-black BST
hash table
search
insert
delete
log N
log N
log N
ordered
operations
operations
on keys
compareTo()
equals()
hashCode()
Q. Can we do better?
A. Yes, if we can avoid examining the entire key, as with string sorting.
2
dedup
implementation
search
hit
search
miss
insert
space
(references)
moby.txt
actors.txt
red-black BST
L + c lg 2 N
c lg 2 N
c lg 2 N
4N
1.40
97.4
hashing
(linear probing)
4N to 16N
0.76
40.6
Parameters
N = number of strings
L = length of string
R = radix
file
size
words
distinct
moby.txt
1.2 MB
210 K
32 K
actors.txt
82 MB
11.4 M
900 K
5.2 T RIES
R-way tries
ternary search tries
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
character-based operations
Tries
Tries
Tries. [from retrieval, but pronounced "try"]
root
b
y
key
value
by
sea
sells
she
shells
shore
the
s
e
t
h
l
s
5
value for she in node
corresponding to last
key character
3
7
Search in a trie
Follow links corresponding to each character in the key.
b
y
s
e
t
h
h
o
7
return value associated
with last key character
(return 3)
Search in a trie
Follow links corresponding to each character in the key.
b
y
s
e
t
h
o
l
s
3
9
Search in a trie
Follow links corresponding to each character in the key.
b
y
s
e
t
h
h
o
7
no value associated
with last key character
(return null)
10
Search in a trie
Follow links corresponding to each character in the key.
b
y
s
e
t
h
h
o
7
no link to t
(return null)
11
b
y
s
e
t
h
h
o
3
12
trie
trie
h
o
s
e
a
a
0
e
0
s
1
l
l
Trie representation
15
extended ASCII
16
cast needed
Trie performance
Search hit. Need to examine all L characters for equality.
Search miss.
s
e
a
a
0
s
1
l
l
h
l
Trie representation
Bottom line. Fast search hit and even
faster search miss, but wastes space.
18
b
y
s
e
t
h
h
o
b
y
s
e
h
o
s
null value and links
(delete node)
s
20
dedup
implementation
search
hit
search
miss
insert
space
(references)
moby.txt
actors.txt
red-black BST
L + c lg 2 N
c lg 2 N
c lg 2 N
4N
1.40
97.4
hashing
(linear probing)
4N to 16N
0.76
40.6
R-way trie
log R N
(R+1) N
1.12
out of
memory
R-way trie.
5.2 T RIES
R-way tries
ternary search tries
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
character-based operations
Abstract
We present theoretical algorithms for sorting and
searching multikey data, and derive from them practical C
implementations for applications in which keys are character strings. The sorting algorithm blends Quicksort and
radix sort; it is competitive with the best known C sort
codes. The searching algorithm blends tries and binary
search trees; it is faster than hashing and other commonly
used search methods. The basic ideas behind the algorithms date back at least to the 1960s but their practical
utility has been overlooked. We also present extensions to
more complex string problems, such as partial-match
searching.
1. Introduction
Section 2 briefly reviews Hoares [9] Quicksort and
binary search trees. We emphasize a well-known isomorphism relating the two, and summarize other basic facts.
The multikey algorithms and data structures are presented in Section 3. Multikey Quicksort orders a set of II
vectors with k components each. Like regular Quicksort, it
partitions its input into sets less than and greater than a
Robert Sedgewick#
23
12
s
e
14
h
o
l
s
11
10
15
b
y
12
14
13
11
l
y
10
15
e
7
l
y
13
24
s
b
l
6
a
l
3
25
s
b
l
6
a
l
no link to t
(return null)
26
27
s
b
l
6
a
l
3
28
Search in a TST
Follow links corresponding to each character in the key.
r
e
12
14
e
a
return value
associated with
last key character
11
u
10
15
e
7
l
y
13
29
now
for
tip
ilk
dim
tag
jot
sob
nob
sky
hut
ace
bet
men
egg
few
jay
owl
joy
rap
gig
wee
was
cab
wad
caw
cue
fee
tap
ago
tar
jam
dug
and
30
A character c.
A reference to a left TST.
A reference to a middle TST.
A reference to a right TST.
A value.
u
link for keys
that start with su
Trie node representations
31
32
d)
33
dedup
implementation
search
hit
search
miss
insert
space
(references)
moby.txt
actors.txt
red-black BST
L + c lg 2 N
c lg 2 N
c lg 2 N
4N
1.40
97.4
hashing
(linear probing)
4N to 16N
0.76
40.6
R-way trie
log R N
(R+1) N
1.12
out of
memory
TST
L + ln N
ln N
L + ln N
4N
0.72
38.7
Each of R
aa
TST
ab
ac
TST
zy
TST
TST
zz
TST
dedup
implementation
search
hit
search
miss
insert
space
(references)
moby.txt
actors.txt
red-black BST
L + c lg 2 N
c lg 2 N
c lg 2 N
4N
1.40
97.4
hashing
(linear probing)
4N to 16N
0.76
40.6
R-way trie
log R N
(R+1) N
1.12
out of
memory
TST
L + ln N
ln N
L + ln N
4N
0.72
38.7
TST with R2
L + ln N
ln N
L + ln N
4 N + R2
0.51
32.7
TSTs.
5.2 T RIES
R-way tries
ternary search tries
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
character-based operations
value
by
sea
sells
she
shells
shore
the
Prefix match. Keys with prefix sh: she, shells, and shore.
Wildcard match. Keys that match .he: she and the.
Longest prefix. Key that is the longest prefix of shellsort: shells.
39
Iterable<String> keys()
Iterable<String> keysWithPrefix(String s)
Iterable<String> keysThatMatch(String s)
String longestPrefixOf(String s)
all keys
keys having s as a prefix
keys that match s (where . is a wildcard)
longest key that is a prefix of s
Remark. Can also add other ordered ST methods, e.g., floor() and rank().
40
by
by sea
by sea sells
42
Prefix matches
Find all keys in a symbol table starting with a given prefix.
Ex. Autocomplete in a cell phone, search bar, text editor, or shell.
43
b
y
b
ye 4
s
eh
t b
h h
h y
a 6 l a 6 el0 o e 0e o5
find subtrie
find for
subtrie
all for all
ll r l
r
l
keys beginning
keys beginning
with "sh"
with "sh"
s 1 ls 1 e l7 e
s
ye 4
h h
o e 0e o5
eh
la
6 el0
ll
l
7
ls
s
qkey
sh
sh
she
she
she
shel shel
shell shell
shellsshells
she shells
she sh
sho
sho
shor shor
shore shore
she shells
she shs
rl
e l7
e 7collect keys
collect keys
in that subtrie
in that subtrie
Prefix match
Prefix in
match
a triein a trie
keysWithPrefix("sh");
l
s
l
s
key
key
sh
she
shel
shell
shells
sho
shor
shore
collect keys
in that subtrie
queue
q
she
she shells
she shells shore
44
Longest prefix
Find longest key in symbol table that is a prefix of query string.
Ex. To send packet toward destination IP address, router chooses IP
address in routing table that is longest prefix match.
"128"
"128.112"
represented as 32-bit
binary number for IPv4
(instead of string)
"128.112.055"
"128.112.055.15"
"128.112.136"
"128.112.155.11"
"128.112.155.13"
longestPrefixOf("128.112.136.11") = "128.112.136"
longestPrefixOf("128.112.100.16") = "128.112"
longestPrefixOf("128.166.123.45") = "128"
"128.222"
"128.222.136"
floor("128.112.100.16") = "128.112.055.15"
45
"she"
"she"
"she"
s s s
s s s
e e e h h h
a a
a l2 le e
2 2l
0 0e
l l ll l l
"shellsort"
"shellsort"
"shellsort"
"shell"
"shell"
"shell"
s s s
e e e h h h
0
search
ends
atends
search
ends
at at
search
s s
ofend
string
of string
of string
1 1s
l 1l l endend
value
is
not
null
value
is not
nullnull
value
is not
she
sheshe
return
return
s s
3 3s 3return
ends
atends
search
ends
at at
search
a a
a l2 le e
2 2l
0 0e 0search
endend
ofend
string
of string
of string
value
is
null
value
is null
value
is null
l l ll l l return
she
sheshe
return
return
(last
key
onkey
path)
(last
key
on
path)
(last
on path)
s s
1 1s
l 1l l
s s
3 3s
e e e h h h
a a
a l2 le e
2 2l
0 0e
l l ll l l
s s
1 1s
l 1l l
s s
3 3s
search
ends
atends
search
ends
at at
search
null
link
null
linklink
null
shells
shells
shells
return
return
return
(last
key
onkey
path)
(last
key
on
path)
(last
on path
3
Possibilities
for longestPrefixOf()
Possibilities
forfor
longestPrefixOf()
Possibilities
longestPrefixOf()
Possibilities
for
longestPrefixOf()
46
47
T9 texting
Goal. Type text messages on a phone keypad.
Multi-tap input. Enter a letter by repeatedly pressing a key.
Ex. hello: 4 4 3 3 5 5 5 5 5 5 6 6 6
"a much faster and more fun way to enter text"
T9 text input.
Ex. hello: 4 3 5 5 6
Q. How to implement?
www.t9.com
48
Patricia trie
Patricia trie. [Practical Algorithm to Retrieve Information Coded in Alphanumeric]
put("shells", 1);
put("shellfish", 2);
standard
trie
no one-way
branching
shell
Applications.
Database search.
P2P network search.
IP routing tables: find longest prefix match.
Compressed quad-tree for N-body simulation.
Efficiently storing and querying XML documents.
fish
h
e
internal
one-way
branching
l
l
s
f
external
one-way
branching
i
s
51
Suffix tree
Suffix tree.
NA
NAS
NA
NAS
Applications.
52
Hash tables.
Bottom line. You can get at anything by examining 50-100 bits (!!!)
53