
Program Efficiency and Complexity

What will we be looking at:

What is an "efficient" program?


How can we measure efficiency?
The Big O, Big Theta and Big Omega Notation
Asymtotic Analysis



Why Measure Efficiency
There are many ways to solve a problem
- but which way (i.e. which algorithm) is better?

Moore's Law:
- The number of transistors on a CPU doubles every year
  (well, more precisely, every 18 months)
- i.e. system performance (almost) doubles every 18 months or so.

If this is so, and given the speed of today's CPUs, why bother worrying about
how efficient our code is?



A Classic Optimisation Problem
Many optimisation problems can be formulated as a Travelling Salesman
problem.

The Travelling Salesman problem:

- Stated: a travelling salesman has to visit 100 different locations in a town;
  what is the shortest route that he can take?
- Total number of distinct routes possible: 100! ≈ 9.3 × 10¹⁵⁷

What does this mean in terms of running time?

- A supercomputer capable of checking 100 billion routes per second can
  check roughly 3 × 10¹⁸ routes in the space of one year.
- Many millions of years would be needed to check all routes!
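
To put numbers on this, here is a quick back-of-the-envelope check (using
100! ≈ 9.3 × 10¹⁵⁷ and roughly 3.15 × 10⁷ seconds per year):

    100! / (10¹¹ routes/s × 3.15 × 10⁷ s/year)
    ≈ 9.3 × 10¹⁵⁷ / (3 × 10¹⁸ routes/year)
    ≈ 3 × 10¹³⁹ years

So "millions of years" is, if anything, a vast understatement.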



How do we Analyse an Algorithm?
A simple example:

// Input: int A[N], array of N integers
// Output: sum of all numbers in array A

int Sum(int A[], int N)
{
    int s = 0;
    for (int i = 0; i < N; i++)
        s = s + A[i];
    return s;
}

How should we analyse this?



Analysis of Sum Method
Niklaus Wirth (who created Pascal and Modula) once said that any
algorithm can be written using only three programming statement constructs:
- Sequence
- Selection
- Loop

Sequence:
- A series of statements that do not alter the path of execution within the algorithm.
- A call to another method is also considered a sequence statement.

Selection:
- Our if-else statements.

Loop:
- Our familiar for, while and do-while loops.



Analysis of Sum Method (2)
The efficiency of an algorithm is a function of the number of elements to be
processed.

If an algorithm contains no loops, its code executes in a straight line, so its
efficiency is simply a function of the number of instructions.

Algorithms with loops vary in efficiency.

Let's get back to our Sum method:

- Describe the size of the input in terms of one or more parameters:
  + The input to Sum is an array of N ints, so the size is N.

- Then, count how many steps are used for an input of that size:
  + A step is an elementary operation such as +, <, =, or A[i]


Analysis of Sum Method (3)
// Input: int A[N], array of N integers
// Output: sum of all numbers in array A

int Sum(int A[], int N)
{
    int s = 0;                   // step 1
    for (int i = 0; i < N; i++)  // steps 2 (init), 3 (test), 4 (increment)
        s = s + A[i];            // steps 5, 6, 7  (+, =, A[i])
    return s;                    // step 8
}

Steps 1, 2 and 8 (i.e. the sequence statements): executed once each.
Steps 3, 4, 5, 6 and 7: executed once per iteration of the for loop, N iterations.

Total: 5N + 3

The complexity function of the algorithm is: f(n) = 5n + 3



How 5n+3 Grows
Estimated running time for different values of n:

n = 10 => 53 steps
n = 100 => 503 steps
n = 1,000 => 5003 steps
n = 1,000,000 => 5,000,003 steps

As n grows, the number of steps grows in linear proportion to n for this
Sum function.

This makes sense, since f(n) = 5n + 3 is a linear function of n.
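
As a quick sanity check, here is a minimal instrumented sketch (my own, not
from the original slides) that tallies the elementary steps exactly the way
the slide counts them, reproducing 5n + 3:

public class SumSteps {
    static long steps;

    static int sum(int[] a, int n) {
        steps += 2;              // steps 1-2: int s = 0 and int i = 0
        int s = 0;
        for (int i = 0; i < n; i++) {
            steps += 5;          // steps 3-7 per iteration: i < N, i++, +, =, A[i]
            s = s + a[i];
        }
        steps += 1;              // step 8: return s
        return s;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 1000000}) {
            steps = 0;
            sum(new int[n], n);
            System.out.println("n = " + n + ": " + steps
                + " steps (5n+3 = " + (5L * n + 3) + ")");
        }
    }
}

(Like the slide's tally, this charges the final failed i < N test to no
iteration, so the counts match the table above exactly.)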



Asymptotic Complexity
What term in the previous complexity function dominates?

What about the 5 in 5n+3? What about the +3?

As n gets large, the +3 becomes insignificant

The 5 is inaccurate as different operations require varying amounts of time

What is fundamental is that the time is linear in n.

Asymptotic Complexity: As n gets large, ignore all lower order terms and
concentrate on the highest order term only:

i.e. :
- Drop lower order terms such as +3
- Drop the constant coefficient of the highest order term.
Asymptotic Complexity (2)
The 5n+3 time bound is said to "grow asymptotically" like n.

This gives us an approximation of the complexity of the algorithm (i.e. f(n) ~ n).

Ignores lots of (machine dependent) details, concentrates on the bigger picture.

Why is this useful?
- As inputs get larger, any algorithm of a smaller order will be more efficient
  than an algorithm of a larger order.

[Figure: time (steps) against input size, plotting 0.05n² = O(n²) against
3n = O(n); the quadratic curve overtakes the linear one at n = 60.]



Big O Notation
What do the O(n) and O(n²) on the previous slide mean?

This is known as the Big O notation.

It is used to express an upper bound on a function.

If f(n) and g(n) are two complexity functions then we can say:

f(n) = O(g(n)) if there exist constants c > 0 and n₀ such that
0 ≤ f(n) ≤ c·g(n) for all n ≥ n₀

The above is read "f(n) is order g(n)", or "f(n) is big-O of g(n)".

[Figure: the function c·g(n) always dominates f(n) to the right of n₀.]
Big O Notation (2)
Think of f(n) = O(g(n)) as:
- "f(n) grows at most like g(n)", or
- "f grows no faster than g"
(ignoring constant factors, for large enough n)

Important:
- Big-O is not a function!
- Never read = as "equals"
- Examples:
  5n + 3 = O(n)
  7n² - 2n + 1 = O(n²)
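
For the first example we can exhibit concrete witnesses (one valid choice
among many): take c = 6 and n₀ = 3. Then 5n + 3 ≤ 6n holds exactly when
n ≥ 3, so 0 ≤ 5n + 3 ≤ 6·n for all n ≥ 3, and hence 5n + 3 = O(n).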

Big O is used as an upper-bound estimate, so it is OK to say something like
"the running time is at worst O(n)".

It is not OK to say the running time is "at least O(n)".



Big Omega Notation
If we want to say that a running time is at least some bound, we use Big Omega.

Big Omega notation, Ω, is used to express a lower bound on a function.

If f(n) and g(n) are two complexity functions then we can say:

f(n) = Ω(g(n)) if there exist constants c > 0 and n₀ such that
0 ≤ c·g(n) ≤ f(n) for all n ≥ n₀

[Figure: in this instance the function c·g(n) is dominated by f(n) to the
right of n₀.]

Example: 3n + 2 = Ω(n)
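
Witnesses for this example (again, one choice among many): take c = 3 and
n₀ = 1. Then 0 ≤ 3·n ≤ 3n + 2 for all n ≥ 1, hence 3n + 2 = Ω(n).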



Big Theta Notation
If we wish to express tight bounds we use the Theta notation, Θ.

f(n) = Θ(g(n)) means that f(n) = O(g(n)) and f(n) = Ω(g(n))

[Figure: f(n) lies between c₁·g(n) below and c·g(n) above, to the right of
some n₀.]



What does this all mean?
If f(n) = Θ(g(n)) we say that f(n) and g(n) grow at the same rate,
asymptotically.

If f(n) = O(g(n)) but f(n) ≠ Θ(g(n)), then we say that f(n) is
asymptotically slower growing than g(n).

If f(n) = Ω(g(n)) but f(n) ≠ O(g(n)), then we say that f(n) is
asymptotically faster growing than g(n).

Mathematically we can express these in terms of limits as n tends to infinity.



Limit as n tends to Infinity

1. If lim_{n→∞} f(n)/g(n) = 0, then f(n) = O(g(n))

2. If lim_{n→∞} f(n)/g(n) = ∞, then f(n) = Ω(g(n))

3. If lim_{n→∞} f(n)/g(n) = c (a constant, 0 < c < ∞), then f(n) = Θ(g(n))
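
For instance, with f(n) = 5n + 3 and g(n) = n from earlier:
lim_{n→∞} (5n + 3)/n = 5, a constant with 0 < 5 < ∞, so by rule 3 we get
5n + 3 = Θ(n) (and therefore also O(n) and Ω(n)).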



Which Notation do we use?

To express the efficiency of our algorithms, which of the three notations
should we use?

As computer scientists we generally like to express our algorithms in big O,
since we would like to know the upper bounds of our algorithms.

Why?

If we know the worst case then we can aim to improve it and/or avoid it.



Common Orders of Growth
Let n be the input size, and b and k be constants.

In order of increasing complexity:

O(k) = O(1)            Constant time
O(log_b n) = O(log n)  Logarithmic time
O(n)                   Linear time
O(n log n)
O(n²)                  Quadratic time
O(n³)                  Cubic time
...
O(kⁿ)                  Exponential time
O(n!)                  Factorial time



Why Avoid Exponential Time Algorithms?

Suppose a program has run time O(n!) and the run time for
n = 10 is 1 second.

- For n = 12, the run time is 2 minutes
- For n = 14, the run time is 6 hours
- For n = 16, the run time is 2 months
- For n = 18, the run time is 50 years
- For n = 20, the run time is 200 centuries!



Comparing Complexity

What happens if we double the input size n?

n     log₂ n   5n     n log₂ n   n²       2ⁿ
8     3        40     24         64       256
16    4        80     64         256      65536
32    5        160    160        1024     ~10⁹
64    6        320    384        4096     ~10¹⁹
128   7        640    896        16384    ~10³⁸
256   8        1280   2048       65536    ~10⁷⁷
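
Reading down each column as n doubles: log₂ n grows by just 1, 5n doubles,
n log₂ n slightly more than doubles, n² quadruples, and 2ⁿ squares
(note that 65536 = 256²). That squaring is what makes the last column explode.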



Constant Time Statements
Simplest case: O(1) time statements

Assignment statements of simple data types:
    int x = y;
Arithmetic operations:
    x = 5 * y + 4 - z;
Array referencing:
    A[j] = 5;
Most conditional tests:
    if (x < 12) ...



Analysing Loops: Linear Loops
Example (have a look at this code segment):

int i = 1;                              // executed once
while (i < n)                           // about n comparisons
{
    <<list of sequence statements>>     // presume a constant c steps here,
                                        // giving c*(n-1) steps in total
    i++;                                // executed n-1 times
}

Efficiency is proportional to the number of iterations.

The efficiency time function is:

f(n) = 1 + (n-1) + c*(n-1) + (n-1)
     = (c+2)*(n-1) + 1
     = (c+2)*n - (c+2) + 1

Asymptotically, efficiency is O(n).



Analysing Loops: Logarithmic Loops
Example (have a look at this code segment):

int i = 1;                              // executed once
while (i < 1000)                        // 1000 comparisons? not quite!
{                                       // it depends on the variable i
    <<list of sequence statements>>     // again presume a constant c steps here;
    i = i * 2;                          // the number of executions also
}                                       // depends on the variable i

What is i for each iteration?

In the code segment above we cannot say that we iterate 1000 times, as the
number of times we iterate is governed by the variable i.

The variable i does not change in a linear fashion in this case.

Let's have a look at what happens to i during each iteration.



Analysing Loops: Logarithmic Loops (2)
i is initially 1.

Iteration   i in the (i < 1000) check   i after the statement i = i * 2;
1           1                           1*2 = 2
2           2                           2*2 = 4
3           4                           4*2 = 8
4           8                           8*2 = 16
5           16                          16*2 = 32
6           32                          32*2 = 64
7           64                          64*2 = 128
8           128                         128*2 = 256
9           256                         256*2 = 512
10          512                         512*2 = 1024
11          1024                        EXIT LOOP!

On inspection we can see that after x iterations the value of i is 2 to the
power of x (i.e. i = 2ˣ).

Expressing this in terms of logarithms, for a loop bound of n we can say that
there are log₂ n iterations.
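
As a quick check, here is a throwaway sketch (mine, not from the slides) that
counts the iterations for any bound n; for n = 1000 it reports 10, matching
the table above:

public class LogLoop {
    public static void main(String[] args) {
        int n = 1000;
        int iterations = 0;
        int i = 1;
        while (i < n) {          // same loop shape as above
            i = i * 2;           // i doubles, so i = 2^x after x iterations
            iterations++;
        }
        // prints: iterations = 10, log2(1000) ~ 9.97
        System.out.println("iterations = " + iterations
            + ", log2(" + n + ") ~ " + (Math.log(n) / Math.log(2)));
    }
}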
Analysing Loops: Logarithmic Loops (3)

So, we know that we loop log₂ n times.

This means that all the steps within our loop get done log₂ n times.

So our time complexity function is:

f(n) = 1 + log₂ n + c*log₂ n + log₂ n
     = 1 + (c+2)*log₂ n

Expressing this asymptotically, efficiency is O(log₂ n).



Analysing Loops: Nested Loops
Example (have a look at this code segment; the slide's annotations analyse it
for a general bound n rather than the literal constant 20):

int i = 1;               // executed once
while (i <= n)           // n comparisons
{
    int j = 1;           // initialisation done n times
    while (j <= n)       // n comparisons, done n times over
    {
        j++;             // executed n times, n times over
    }
    i++;                 // executed n times
}

Treat it just like a single loop and evaluate each level of nesting as needed.

The total number of iterations is the product of the number of inner loop
iterations and the number of outer loop iterations.
Analysing Loops: Nested Loops (2)
Our time complexity function for this bit of code is then:

f(n) = 1 + 3n + n*(2n)
       (outer loop: comparison, j initialisation and i++, n times each;
        inner loop: 2n steps, repeated for each of the n outer iterations)
     = 1 + 3n + 2n²

Asymptotically, efficiency is O(n²).



Analysing Loops: Nested Loops (3)
What if the number of iterations of one loop depends on the counter of the
other?

int j, k;
for (j = 0; j < N; j++)
    for (k = 0; k < j; k++)
        sum += k + j;

Solution:
Analyse the inner and outer loop together:
- Number of iterations of the outer and inner loop together:
  0 + 1 + 2 + ... + (N-1) = O(N²)



How Did We Get This Answer?
When doing Big-O analysis, we sometimes have to compute
a series like: 1 + 2 + 3 + ... + (n-1) + n

i.e. the sum of the first n numbers. What is the complexity of this?

Gauss figured out that the sum of the first n numbers is always:

1 + 2 + ... + n = n*(n+1)/2 = (n² + n)/2 = O(n²)

If we had analysed each loop separately in the previous code, we would have
worked out that the outer loop iterates n times and the inner loop iterates
(n-1)/2 times on average, so the total number of iterations is n*(n-1)/2:
Gauss's equation with n-1 in place of n, and still O(n²).
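
As an empirical check, here is a small sketch (mine, not from the slides)
that counts how many times the inner statement of the previous slide's loop
executes and compares it with N*(N-1)/2:

public class GaussCheck {
    public static void main(String[] args) {
        int N = 1000;
        long count = 0, sum = 0;
        for (int j = 0; j < N; j++)
            for (int k = 0; k < j; k++) {
                sum += k + j;    // the work from the slide
                count++;         // tally inner-loop executions
            }
        // prints: count = 499500, N*(N-1)/2 = 499500
        System.out.println("count = " + count
            + ", N*(N-1)/2 = " + ((long) N * (N - 1) / 2));
    }
}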
Sequence of Statements
For a sequence of statements, compute their complexity functions
individually and add them up
for (j=0; j < N; j++)
for (k =0; k < j; k++) O(N2)
sum = sum + j*k;
for (l=0; l < N; l++)
sum = sum -l; O(N)
System.out.print("sum is now+sum); O(1)

Total cost is O(n2) + O(n) +O(1) = O(n2)



Conditional Statements
What about conditional statements such as
if (condition)
    statement1;
else
    statement2;

where statement1 runs in O(n) time and statement2 runs in O(n²) time?

We use "worst case" complexity: among all inputs of size n, what is the
maximum running time?

The analysis for the example above is O(n²).



Recurrence Relations
So far, all the algorithms we have been analysing have been non-recursive.

The running time of a recursive algorithm is naturally given by a recurrence
relation.

Recurrence relation: expresses the running time of a recursive algorithm for
inputs of size N in terms of smaller sized inputs.

We must solve the recurrence to derive the running time.

There are many ways to solve a recurrence equation once derived. We will look
at three here:
- Iteration method
- Recursion tree
- Master method



Deriving A Recurrence Equation
Let's have a look at a simple example of deriving a recurrence equation.

Example: a recursive power method

double power(double x, int n) {
    if (n == 0)
        return 1.0;                // base case
    return power(x, n-1) * x;      // recursive case
}

If N = 0, the running time T(N) is 2 steps (the test and the return),
i.e. T(0) = 2.

However, if N ≥ 1, the running time T(N) is the constant cost of the current
call (again 2 steps) plus the time required to compute power(x, n-1),
i.e. T(N) = 2 + T(N-1) for N ≥ 1.

How do we solve this? One way is to use the iteration method.



Iteration Method
This is sometimes known as Back Substituting.

Involves expanding the recurrence in order to see a pattern.

Solving formula from previous example using the iteration method :

Solution: expand and apply to itself, remembering that T(0) = 2:

T(N) = 2 + T(N-1)
     = 2 + 2 + T(N-2)
     = 2 + 2 + 2 + T(N-3)
     = 2 + 2 + 2 + ... + 2 + T(0)    (N twos)
     = 2N + 2

So T(N) = 2N + 2, which is O(N), for the last example.

There are some common recurrence relations that appear time and time again which we will
introduce on the next slide along with the actual solution.



Common Recurrence Relations
The actual solving of the recurrence relations is left as an exercise for you to do in your tutorial.

T(1) = 1 for N = 1
T(N) = T(N-1) + N for N ≥ 2
This recurrence arises for a recursive algorithm that loops through the input
to eliminate one item.
Solution: T(N) = N*(N+1)/2 = O(N²)

T(1) = 1 for N = 1
T(N) = T(N/2) + 1 for N ≥ 2
This recurrence arises for a recursive algorithm that halves the input in one
step.
Hint for solving this: assume that N = 2ⁿ, so that the recurrence is always
defined (note that this means n = log₂ N).
Solution: T(N) = lg N + 1 = O(lg N)

T(1) = 0 for N = 1
T(N) = T(N/2) + N for N ≥ 2
This recurrence arises for a recursive program that halves the input, but
perhaps must examine every item in the input.
Hint for solving this: expand out to a geometric series and reason it out.
Solution: T(N) = 2N = O(N)

T(1) = 0 for N = 1
T(N) = 2T(N/2) + N for N ≥ 2
This recurrence applies to a family of standard divide-and-conquer algorithms.
Solution: T(N) = N lg N = O(N lg N)
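
To make the second recurrence concrete, here is a sketch (my own, not from
the slides) of the earlier power method rewritten to halve n at each step:
each call does a constant number of steps plus one recursive call on an input
of half the size, so its running time satisfies T(N) = T(N/2) + c, i.e. O(lg N).

// Divide-and-conquer power: uses x^n = (x^(n/2))^2, times x when n is odd
double power(double x, int n) {
    if (n == 0)
        return 1.0;                  // base case
    double half = power(x, n / 2);   // one recursive call on half the input
    double result = half * half;
    if (n % 2 == 1)
        result = result * x;         // odd n needs one extra factor of x
    return result;
}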



Recursion Tree
The last formula is an example of an algorithm that splits the problem
into two halves and seeks to solve both.

These algorithms sometimes lend themselves to a pictorial method of solution
using recursion trees.

T(1) = 0 for N = 1
T(N) = 2T(N/2) + N for N ≥ 2

                                     Size of remaining argument    Additions

T(N)                                 N                             => N

T(N/2)      T(N/2)                   N/2   N/2                     => N

T(N/4) T(N/4) T(N/4) T(N/4)          N/4   N/4   N/4   N/4         => N



Recursion Tree Solution
In this case we can see that every level of the tree contributes N additions,
and since the tree has height lg N the solution is:

T(N) = N lg N

With all of these methods we are seeking a pattern which makes the solution
simple.

If applying a method does not lead to a simplification of the solution then
we may need to apply a different one.
The Master Method
The Master Theorem: in general, the Master Theorem says that the recurrence

T(1) = d
T(n) = a*T(n/b) + c*n

has solution:

T(n) = O(n)             if a < b
T(n) = O(n lg n)        if a = b
T(n) = O(n^(log_b a))   if a > b



Example
Take the following recurrence equation:

T(1) = 0 for n = 1
T(n) = 4T(n/2) + c*n for n ≥ 2

This gives a = 4, b = 2.

Therefore a > b, and:

T(n) = O(n^(log₂ 4))
     = O(n²)
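
For comparison (an example of the a = b case, not on the slide): mergesort's
recurrence T(n) = 2T(n/2) + c*n has a = b = 2, so the theorem gives
T(n) = O(n lg n), agreeing with the divide-and-conquer recurrence solved
earlier by the recursion tree.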

