PDP-9
PDP-9: 18-bit words, 16K memory, switches+lights, paper tape reader/punch, display w/ lightpen
write cross-assembler in IBM 360/50 assembly
write Illustrator application in PDP-9 assembly
load paper tape into PDP-9
run application
M searches, N symbols, M >> N
Improved running time from ~MN to ~M lg N
IBM 360/50
Lesson 1: Good algorithms matter
Lesson 2: Not many programmers appreciate that fact
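The gap between ~MN and ~M lg N can be made concrete by counting compares. The sketch below assumes the improvement came from replacing a sequential scan of the symbol table with binary search on a sorted table; the slide does not say so explicitly. The table contents, sizes, and function names are hypothetical.

```python
def sequential_compares(table, key):
    """Compares used by a left-to-right scan that stops at the first match."""
    for count, symbol in enumerate(table, start=1):
        if symbol == key:
            return count
    return len(table)


def binary_compares(table, key):
    """Probes used by binary search in a sorted table."""
    lo, hi, probes = 0, len(table) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if table[mid] == key:
            break
        if table[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


def total_compares(search, table, keys):
    """Total compares over all M searches."""
    return sum(search(table, key) for key in keys)


N = 256                                   # hypothetical number of symbols
M = 20 * N                                # hypothetical number of searches, M >> N
table = list(range(N))                    # stand-in for a sorted symbol table
keys = [(7 * i) % N for i in range(M)]    # every symbol is looked up equally often

seq_total = total_compares(sequential_compares, table, keys)
bin_total = total_compares(binary_compares, table, keys)
print("sequential:", seq_total, "binary:", bin_total)
```

The sequential total grows in proportion to MN, while the binary total is bounded by M(floor(lg N) + 1).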
Context
1982: 1st (compiles)
1986: 2nd (runs)
1994: 3rd
2011: 4th [this talk]
Four challenges
I. II.
Many algorithms implemented/tested in back rooms, not open literature Need appropriate mathematical models
I. Scientific method
millions or billions of inputs
10^12 nanoseconds is 15+ minutes
10^18 nanoseconds is 31+ years
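The two conversions can be checked with a few lines of arithmetic. This is a minimal sketch; the constant names are illustrative and a year is taken as 365.25 days.

```python
NS_PER_SECOND = 10**9
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumption: average year length


def ns_to_minutes(ns):
    """Convert nanoseconds to minutes."""
    return ns / NS_PER_SECOND / 60


def ns_to_years(ns):
    """Convert nanoseconds to years."""
    return ns / NS_PER_SECOND / SECONDS_PER_YEAR


print(ns_to_minutes(10**12))   # a bit over 15 minutes
print(ns_to_years(10**18))     # a bit over 31 years
```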
indexing and search Bose-Einstein model N-body signal processing string matching for genomics natural language analysis [ very long list ]
Important lessons of the past several decades
1. Efficient algorithms enable solution of problems that could not otherwise be addressed.
2. Scientific method is essential in understanding program performance.
Important lessons for beginners engineers scientists programmers
create a model describing natural world
use model to develop hypotheses
run experiments to validate hypotheses
refine model and repeat
model experiment
Algorithm designer who does not experiment gets lost in abstraction
Software developer who ignores cost risks catastrophic consequences
Scientist/engineer needs to control costs of experiments/simulations
Goal: compare performance of two basic implementations
  shortest augmenting path
  maximum capacity augmenting path
Key steps in analysis
  How many augmenting paths? (research literature)
  What is the cost of finding each path? (this talk)
Number of augmenting paths (V vertices, E edges, C max capacity)
  shortest augmenting path: upper bounds VE/2 and VC
  maximum capacity augmenting path: upper bound 2E lg C
  for example, with E = 2000 and C = 100: actual counts 37 (shortest), 7 (maximum capacity)
Cost of finding each path: E (upper bound)
Warning: Such analyses are useless for predicting performance or comparing algorithms
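A quick calculation shows how far such a bound can sit from what is observed. The sketch below evaluates only the 2E lg C bound for the example values E = 2000 and C = 100, since V is not given in the surviving text. It assumes the observed count of 7 paths belongs to the maximum-capacity algorithm.

```python
import math


def max_capacity_path_bound(edges, max_capacity):
    """Upper bound 2E lg C on the number of augmenting paths."""
    return 2 * edges * math.log2(max_capacity)


E = 2000            # edges, from the example
C = 100             # max capacity, from the example
actual_paths = 7    # assumption: the observed count for maximum-capacity augmenting path

bound = max_capacity_path_bound(E, C)
print(f"bound = {bound:.0f}, actual = {actual_paths}, ratio = {bound / actual_paths:.0f}x")
```

A bound that overshoots by three orders of magnitude says little about running time, and nothing about which of the two algorithms is faster.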
predict performance (running time) or guarantee that cost is below specified bounds
worst-case bounds
Common wisdom
  random graph models are unrealistic
  average-case analysis of algorithms is too difficult
  worst-case performance bounds are the standard
Unfortunate truth about worst-case bounds
  often useless for prediction (fictional)
  often useless for guarantee (too high)
  often misused to compare algorithms
which ones??
O-notation is useful for many reasons, BUT
Common error: Thinking that O-notation is useful for predicting performance.
RS (in a talk):
Q: ?? O(N log N) surely beats O(N^2)
RS: Not by the definition. O expresses upper bound.
Q: So, use Theta.
RS: Still (typically) bounding the worst case. Is the input a worst case?
Q:
Galactic algorithms
R.J. Lipton: A galactic algorithm is one that will never be used in practice
Why? Any effect would never be noticed in this galaxy
  theoretical tour-de-force
  too complicated to implement
  cost of implementing would exceed savings in this galaxy, anyway
One blogger's conservative estimate: 75% SODA, 95% STOC/FOCS are galactic
OK for basic research to drive agenda, BUT
Common error: Thinking that a galactic algorithm is useful in practice.
It is not optimal. It has an extra O(log log N) factor. But Algorithm B is very complicated, lg lg N is less than 6 in this universe, and that is just an upper bound. Algorithm A is certainly going to run 10 to 100 times faster in any conceivable real-world situation. Why should Google care about Algorithm B?
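The remark that lg lg N is less than 6 can be checked directly: lg lg N reaches 6 only at N = 2^64. A minimal sketch follows; the sample values of N are arbitrary.

```python
import math


def lg_lg(n):
    """Base-2 logarithm of the base-2 logarithm of n."""
    return math.log2(math.log2(n))


for n in (10**6, 10**9, 10**18, 2**64):
    print(n, round(lg_lg(n), 2))
```

Any input with fewer than 2^64 (about 1.8 x 10^19) items keeps the extra factor below 6.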
TCS:
Analytic Combinatorics
is a modern basis for studying discrete structures
Developed by Philippe Flajolet and many coauthors (including RS), based on classical combinatorics and analysis
Generating functions (GFs) encapsulate sequences
Symbolic methods treat GFs as formal objects
Complex asymptotics treat GFs as functions in the complex plane
Study them with singularity analysis and other techniques
Accurately approximate original sequence
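[Equations lost in extraction. Classical derivation steps: quadratic equation, binomial theorem, Stirling's approximation.]
[Analytic combinatorics: from the symbolic specification <B> = ... + <B><B>, obtain the GF equation and treat it as a function in the complex plane, then directly approximate via singularity analysis.]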
Complexification
Assigning complex values to the variable z in a GF gives a method of analysis to estimate the coefficients. The singularities of the function determine the method.
singularity type
  meromorphic (just poles)
  fractional powers
  logarithmic
  none (entire function)
First Principle. Exponential growth of a function's coefficients is determined by the location of its singularities.
Second Principle. Subexponential factor in a function's coefficients is determined by the nature of its singularities.
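The two principles can be checked numerically on a standard example. The choice of example is an assumption, since the talk's own equations did not survive extraction: binary trees counted by the Catalan numbers, with GF B(z) = (1 - sqrt(1 - 4z)) / (2z). The singularity at z = 1/4 gives exponential growth 4^N, and its square-root nature gives the subexponential factor 1 / sqrt(pi N^3). The sketch compares exact coefficients with that approximation.

```python
import math


def catalan(n):
    """Exact coefficient [z^n] B(z): the n-th Catalan number."""
    return math.comb(2 * n, n) // (n + 1)


def asymptotic(n):
    """Singularity-analysis estimate 4^n / sqrt(pi n^3)."""
    return 4.0**n / math.sqrt(math.pi * n**3)


def ratio(n):
    """Exact value divided by the estimate; tends to 1 as n grows."""
    return catalan(n) / asymptotic(n)


for n in (10, 100, 400):
    print(n, round(ratio(n), 4))
```

The ratio climbs toward 1, so the location and the nature of the singularity together account for both factors of the growth.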
Analytic combinatorics
Q. Wait, didn't you say that the masses don't need to know all that math?
RS. Well, there is one thing...
Why?
In the form a b^N N^c (lg N)^d:
  the constant a depends on both complex functions and properties of machine and implementation
  the exponential growth factor b should be 1
  the exponent c depends on singularities
  the log factor d is reconciled in detailed studies
data structures evolve from combinatorial constructions
universal laws from analytic combinatorics have this form
Plenty of caveats, but provides, in conjunction with the scientific method, a basis for studying program performance
develop a mathematical model for the frequency of execution of each instruction in the program
determine the time required to execute each instruction
multiply and sum
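The "multiply and sum" step is a weighted sum. In the sketch below the instruction names, the frequency model, and the per-instruction costs are all hypothetical, chosen only to show the shape of the calculation.

```python
def predicted_time(frequency, cost_ns):
    """Sum over instructions of (times executed) x (time per execution)."""
    return sum(frequency[op] * cost_ns[op] for op in frequency)


n = 1000
# Hypothetical frequency model for some quadratic-time program on n items.
frequency = {"compare": n * n / 4, "exchange": n * n / 4, "loop overhead": n}
# Hypothetical machine-dependent costs, in nanoseconds.
cost_ns = {"compare": 2.0, "exchange": 5.0, "loop overhead": 1.0}

print(predicted_time(frequency, cost_ns), "ns")
```

The frequencies come from the mathematical model and the costs come from the machine, which is why the constant in front has two separate sources.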
Hypothesis: T(N) ~ a N^c
... cycle time instruction set ... cache structure code GFs model ...
engineer's part of the constant (harder to determine now than in the 1970s)
mathematician's part of the constant (easier to determine now than in the 1970s)
2. Run it for N_0, 2N_0, 4N_0, 8N_0, ...
3. Ratio of running times: T(2N_0) / T(N_0) ~ a (2N_0)^c / (a N_0^c) = 2^c as N_0 grows
4. Multiply by 2^c to predict next value
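The doubling procedure can be sketched in a few lines. To keep the example deterministic it counts inner-loop iterations of a brute-force triple loop instead of measuring wall-clock time. That substitution is an assumption made for illustration, and the function names are hypothetical.

```python
import math


def count_triples(n):
    """Inner-loop iterations of a brute-force triple loop over n items (stand-in for T(N))."""
    ops = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                ops += 1
    return ops


def doubling_estimate(cost, n0, doublings=3):
    """Run cost() for n0, 2*n0, 4*n0, ... and estimate c in T(N) ~ a N^c."""
    sizes = [n0 * 2**i for i in range(doublings + 1)]
    costs = [cost(n) for n in sizes]
    ratio = costs[-1] / costs[-2]        # approaches 2^c as the size grows
    c = math.log2(ratio)
    predicted_next = costs[-1] * ratio   # multiply by 2^c to predict the next value
    return sizes, costs, c, predicted_next


sizes, costs, c, predicted_next = doubling_estimate(count_triples, 16)
print("sizes:", sizes)
print("estimated exponent c:", round(c, 2))
print("prediction for N =", 2 * sizes[-1], ":", round(predicted_next))
```

With real timings, `cost` would run the program on an input of size N and return elapsed time; the ratio and prediction steps stay the same.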
Plenty of caveats, but provides a basis for teaching the masses about program performance
borders on malpractice not to do so!
III. Introduction to CS
The masses
Scientists, engineers and modern programmers need
They also need to know how to write programs design and analyze algorithms
Do they need to know? Detailed analysis Galactic algorithms Overly simple input models
They do need to know Classic algorithms Realistic input models and randomization How to predict performance and compare algorithms
Unfortunate facts
Many scientists/engineers lack basic knowledge of computer science
Many computer scientists lack basic knowledge of science/engineering
1970s: Want to use the computer? Take intro CS. 1990s: Intro CS course relevant only to future cubicle-dwellers
identify fundamentals teach them to all students who need to know them as early as possible
CS for CS majors
CS for physicists
CS for EE
CS for poets
CS for idiots
CS for everyone
Original motivation (1992) Why not? Works for biology, math, physics, economics. Responsibility to identify and teach fundamental tenets of discipline.
modern programming models the scientific method in understanding program behavior fundamental precepts of computer science computation in a broad variety of applications preparing for a lifetime of engaging with computation
it is easier than most challenges you're facing
you cannot be successful in any field without it
Performance matters
in support of encapsulation
data abstraction functions and modules graphics, sound, and image I/O arrays conditionals and loops
Math text I/O
StdIn StdOut StdDraw StdAudio Picture
Basic requirements
CS in scientific context
teaches a basic CS concept solves an important problem intellectually engaging and appealing illustrates modular programming is open-ended
functions libraries 1D arrays 2D arrays recursion strings I/O streams OOP data structures
sqrt(), log()
I/O, data analysis sound images fractal models genomes web resources Brownian motion small-world
teaches a basic CS concept solves an important problem intellectually engaging and appealing illustrates modular programming is open-ended
Bouncing ball
Simulation is easy
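A bouncing-ball simulation needs little more than a position, a velocity, and a reflection rule. The sketch below leaves out drawing and uses a [-1, 1] x [-1, 1] box; the radius, starting values, and function name are assumptions for illustration, not the course's code.

```python
def step(rx, ry, vx, vy, radius=0.05):
    """Advance the ball one time step, reversing velocity when it would cross a wall."""
    if abs(rx + vx) + radius > 1.0:
        vx = -vx                      # bounce off the left or right wall
    if abs(ry + vy) + radius > 1.0:
        vy = -vy                      # bounce off the top or bottom wall
    return rx + vx, ry + vy, vx, vy


# Hypothetical starting position and velocity.
rx, ry, vx, vy = 0.480, 0.860, 0.015, 0.023
for t in range(5):
    rx, ry, vx, vy = step(rx, ry, vx, vy)
    print(t, round(rx, 3), round(ry, 3))
```

Adding graphics amounts to drawing a filled circle at (rx, ry) after each step.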
teaches a basic CS concept solves an important problem appeals to students' intellectual interest illustrates modular programming is open-ended
Bouncing balls
OOP is helpful
teaches a basic CS concept solves an important problem appeals to students' intellectual interest illustrates modular programming is open-ended
N-body
appeal to familiar concepts from HS science and math saves room broad coverage provides real choice for students choosing major modular organization gives flexibility to adapt to legacy courses detailed examples useful throughout curriculum
Incorrect perceptions about CS? scientific basis gives students the big picture students are enthusiastic about addressing real applications
Excessive focus on programming?
  careful introduction of essential constructs
  nonessential constructs left for later CS courses
  library programming restricted to key abstractions
  taught in context with plenty of other material
few students get adequate CS in high school nowadays 90+ percent on level playing field by midterms open-ended assignments appeal even to experienced programmers not harmful for CS students to learn scientific context before diving into abstraction
CS is for cubicle-dwellers?
"learned more in this course than in any other"
"came here to study physics/math/bio/econ, now I want to do CS"
"cool"
relevant CS concepts
Understanding of the costs Fundamental data types Computer architecture Computability and Intractability
scientific content
Scientific method Data analysis Simulation Applications
Goals
Progress report
2008: Enrollments are up. Is this another bubble?
[Chart: enrollments by year, 1995-2011; vertical axis ticks at 175, 350, 525]
Progress report
2009: Maybe.
[Chart: enrollments by year, 1995-2011; vertical axis ticks at 175, 350, 525]
Progress report
2012: Enrollments are skyrocketing.
[Chart: enrollments by year, 1995-2011; vertical axis ticks at 175, 350, 525]
enrollments now are twice what they were at the height of the bubble
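[Pie charts; values are listed in extraction order and are not matched to labels]
none / some / lots: 3%, 35%, 62%
CLASS (First-year / Sophomore / Junior / Senior): 8%, 4%, 70%, 18%
INTENDED MAJOR: 18%, 11%, 24%, 10%, 37%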
Books?
Libraries?
Textbooks?
Robert Sedgewick
December 10, 2007
Future of libraries?
1990: Every student spent significant time in the library
2010: Every student spends significant time online
Few faculty members in the sciences use the library at all for research
YET, the library's budget continues to grow!
2020?
A few book museums (for Grafton) Digital library infrastructure (for everyone else)
Scientific papers?
Alan Kay: The best way to predict the future is to invent it.
Scientific papers
When is the last time you visited a library to find a paper? Did you print the papers to read the last time you refereed a conference?
Question: If it will not be read on paper, why write it as if it will? Prediction: Someone will soon invent the future (should be easy)
Textbooks
A road to ruin
prices continue to escalate students now rent, not own books planned obsolescence? walled garden?
Is there room for a good textbook? Will free web resources prevail?
Textbook traditional look-and-feel builds on 500 years of experience for use while learning
Booksite supports search has code, test data, animations links to references a living document for use while programming, exploring
Textbook
Part I: Programming (2009)
Prolog
1 Elements of Programming: Your First Program, Built-in Types of Data, Conditionals and Loops, Arrays, Input and Output, Case Study: Random Surfer
3 Data Abstraction: Data Types, Creating Data Types, Designing Data Types, Case Study: N-body
4 Algorithms/Data Structures: Performance, Sorting and Searching, Stacks and Queues, Symbol Tables, Case Study: Small World
7 Theory of Computation: Formal Languages, Turing Machines, Universality, Computability, Intractability
8 Systems: Library Programming, Compilers and Interpreters, Operating Systems, Networks, Applications Systems
9 Scientific Computation: Precision and Accuracy, Differential Equations, Linear Algebra, Optimization, Data Analysis, Simulation
Booksite
introcs.cs.princeton.edu
Text digests Ready-to-use code Supplementary exercises/answers Links to references and sources Modularized lecture slides Programming assignments Demos for lecture and precept Simulators for self-study Scientific applications
10000+ files
2000+ Java programs
50+ animated demos
1.2 million unique visitors in 2011
intellectually challenging pervasive in modern life critical to modern science and engineering
Barriers
no room in curriculum need to implement all the algorithms (!) need to analyze all the algorithms (!) need to pick the most important ones
data abstraction and modular programming 50+ classic and important algorithms and data structures historical context, applications relationships to OR, theory of algorithms
Booksite
algs4.cs.princeton.edu
Text digests Ready-to-use code Supplementary exercises/answers Links to references and sources Modularized lecture slides Programming assignments Demos for lecture and precept Simulators for self-study Scientific applications
one-click download test data variants robust library versions typical clients
Algorithms are important and useful in scientific, engineering, and commercial applications of all sorts
Performance matters
Classic algorithms for sorting, searching, graphs and strings have enabled the development of the computational infrastructure that surrounds us
teaches a basic CS concept solves an important problem intellectually engaging modular program is open-ended
union-find
teaches a basic CS concept solves an important problem intellectually engaging modular program is open-ended
graph search
teaches a basic CS concept solves an important problem intellectually engaging modular program is open-ended
priority queue
Algorithms enrollments
[Chart: Algorithms enrollments by year, 1995-2011; vertical axis ticks at 75, 150, 225, 300]
enrollments now are three times what they were at the height of the bubble
25+% of all Princeton students
Key factor in increase: All students in "CS for everyone" can take Algorithms
Summary
The scientific method is an essential ingredient in programming. Embracing, supporting, and leveraging science in intro CS and algorithms courses can serve large numbers of students.
50+% of Princeton students in a single intro course 25+% of Princeton students in a single algorithms course
Next goals:
50+% of all college students in an intro CS course 25+% of all college students in an algorithms course ALGORITHMS FOR THE MASSES