
Performance Evaluation

Lecture - Module 2

What is a Good Algorithm?


Efficient:

- Running time
- Space used

Efficiency as a function of input size:

- The number of bits in an input number
- The number of data elements (numbers, points)

Measuring the Running Time


How should we measure the running time of an algorithm? One approach is an experimental study:

- Write a program that implements the algorithm.
- Run the program with data sets of varying size and composition, as in the sketch below.
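As a minimal sketch of such an experiment in C (the algorithm timed here, sum_to_n, is just an illustrative stand-in for whatever algorithm is under study):

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the algorithm under study: sum of the first n integers. */
    static long long sum_to_n(long long n) {
        long long sum = 0;
        for (long long i = 1; i <= n; i++)
            sum += i;
        return sum;
    }

    int main(void) {
        long long sizes[] = {1000000LL, 10000000LL, 100000000LL};
        for (int k = 0; k < 3; k++) {
            clock_t start = clock();
            volatile long long result = sum_to_n(sizes[k]);  /* volatile keeps the call from being optimized away */
            clock_t end = clock();
            (void)result;
            printf("n = %10lld   time = %.6f s\n",
                   sizes[k], (double)(end - start) / CLOCKS_PER_SEC);
        }
        return 0;
    }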

Limitations of Experimental Studies


It is necessary to implement and test the algorithm in order to determine its running time.
Experiments can be done only on a limited set of inputs, and may not be indicative of the running time on inputs not included in the experiment.
In order to compare two algorithms, the same hardware and software environments must be used.

Beyond Experimental Studies


We will develop a general methodology for analyzing the running time of algorithms. This approach:

- Uses a high-level description of the algorithm instead of testing one of its implementations.
- Takes into account all possible inputs.
- Allows one to evaluate the efficiency of any algorithm in a way that is independent of the hardware and software environment.

Performance Evaluation of Algorithm


Two algorithms may perform the same task, but one may be more "efficient" than the other.
Efficiency means "uses fewer resources".

What is a resource? Resources include:

- CPU cycles (time)
- Computer memory (space)

Our concern is with significant differences in efficiency.

Cont

Most "coding tricks" provide insignificant improvements in efficiency. Focus must be on the underlying approach to solving the problem as opposed to the implementation.
Otherwise we might end up with an efficient implementation of an inefficient algorithm.

It is very convenient to classify algorithms based on the relative amount of time or space they require. We have to specify the growth of time /space requirements as a function of the input size.

Types of Complexity

There are two types of complexity:

- Time complexity: the running time of the program as a function of the size of the input.
- Space complexity: the amount of computer memory required during program execution, as a function of the input size.

Time and space are both important, but we are usually more interested in time efficiency than in space efficiency.

These days, memory prices are quite low compared to the past.

Cont

The performance of an algorithm can be affected by:

- the language in which it is written,
- the machine on which it is executed, and
- the data it processes.

If we confine our analysis to the algorithm itself, we can avoid these problems. We can analyze the time efficiency of an algorithm by estimating how many "high-level" instructions will be executed as a function of the amount of data it is supposed to process.

Example: Sum of first n values


Algorithm:

    read(n);
    i = 1; sum = 0;
    while (i < n + 1) {
        sum = sum + i;
        i = i + 1;
    }
    output(sum);

This algorithm requires the following operations:

- check the loop condition, n times
- add the value of i to sum, n times
- increment i, n times
- plus the initial assignments, input, and output (a constant number of operations)

Cont

So the time taken by the above algorithm depends on the amount of input data, n.
The time taken by the algorithm is (3 * n + 3) operations. For large values of n, the extra three instructions are insignificant:

    n = 100     ->    303
    n = 1,000   ->  3,003
    n = 10,000  -> 30,003

We can even ignore the coefficient 3: the difference between 3,000 and 3,003 is not really significant. So when dealing with large problem sizes, small differences are not an issue. The "size of the problem" is usually expressed in terms of the "number of items to process". The instrumented version below makes the count concrete.
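Here is an instrumented version of the loop in C; following the slides' simplified model, the final failing condition check is not tallied, so the counter lands exactly on 3*n + 3.

    #include <stdio.h>

    int main(void) {
        long n = 10000;
        long ops = 0;           /* operation counter */

        long i = 1, sum = 0;
        ops += 3;               /* initial assignments, input and output:
                                   the constant "+3" in 3*n + 3 */
        while (i < n + 1) {
            ops += 3;           /* one condition check, one addition,
                                   one increment per iteration */
            sum = sum + i;
            i = i + 1;
        }

        printf("n = %ld  sum = %ld  ops = %ld  (3*n + 3 = %ld)\n",
               n, sum, ops, 3 * n + 3);
        return 0;
    }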

Few Terminologies

Growth rate: the growth of time as the size of the problem grows.

- The growth rate can usually be expressed as a function f(n).

Express the algorithm's growth function in terms of the order of magnitude of the growth rate:
For example, if f(n) = 3 * n + 3, then the order of magnitude is O(n).

This is known as "Big O" notation.

Cont

Notice that the low-order terms and multiplicative constants of an algorithm's growth-rate function can be ignored.
Performance-wise, for sufficiently large n, the following functions both have quadratic growth rates:

    f(n) = 300 * n^2 + 3 * n + 42   and   f(n) = 20 * n^2

Obviously, the performance of many algorithms depends on the data.

Big Oh Notation

It is a convenient way of describing the growth rate of a function, and hence the time complexity of an algorithm.
Let n be the size of the input, and let f(n) and g(n) be positive functions of n.
Mathematical definition of Big O: a function f(n) is of the order of another function g(n) if and only if there exist a real positive constant c and a positive integer k such that |f(n)| <= c * |g(n)| for all n >= k. We write f(n) = O(g(n)). A small numerical check of this definition follows.
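For instance, with f(n) = 3n + 3 and g(n) = n, the witnesses c = 4 and k = 3 satisfy the definition, since 3n + 3 <= 4n for all n >= 3. A quick check in C:

    #include <stdio.h>

    int main(void) {
        const long c = 4, k = 3;        /* chosen witnesses for f(n) = O(n) */
        for (long n = k; n <= 1000000; n++) {
            long f = 3 * n + 3;         /* f(n) */
            long cg = c * n;            /* c * g(n), with g(n) = n */
            if (f > cg) {
                printf("definition violated at n = %ld\n", n);
                return 1;
            }
        }
        printf("3*n + 3 <= 4*n holds for all tested n >= 3\n");
        return 0;
    }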

Various Orders of Magnitude

Constant: O(1)
- f(n) = 300; it does not depend on the size of the problem.

Logarithmic: O(log2 n)
- f(n) = log2(n) + 20; time increases in proportion to the logarithm of the size of the problem.

Linear: O(n)
- f(n) = 2 * n + 10; time is proportional to the size of the problem.

Quadratic: O(n^2)
- f(n) = 5 * n^2 + 100; time is proportional to the square of the size of the problem.

Cont

Cubic: O(n^3)
- f(n) = 4 * n^3 + 200

N-logarithmic: O(n log2 n)
- f(n) = n * log2(n) + 10

Exponential: O(2^n)
- f(n) = 300 * 2^n; time is proportional to some constant raised to the power of the problem size.

Recurrence Relation

Complexity can be expressed by a recurrence relation. For example, the following recurrence expresses linear complexity:

    T(k) = b + T(k-1),  if k > 1
    T(k) = a,           if k = 1

Cont

The solution of the above recurrence can be found as follows:

    T(n) = b + T(n-1)
         = b + (b + T(n-2))
         = 2b + T(n-2)
         ...
         = (n-1)b + T(1)
         = (n-1)b + a
         = n*b + (a - b)
         = O(n)

A numerical check of this closed form is sketched below.
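As a sanity check, the recurrence can be evaluated step by step in C and compared against the closed form n*b + (a - b); the constants a and b below are arbitrary sample values.

    #include <stdio.h>

    int main(void) {
        const long a = 5, b = 2;            /* arbitrary sample constants */
        long t = a;                         /* T(1) = a */
        for (long k = 2; k <= 10; k++) {
            t = b + t;                      /* T(k) = b + T(k-1) */
            printf("T(%2ld) = %3ld   closed form n*b + (a-b) = %3ld\n",
                   k, t, k * b + (a - b));
        }
        return 0;
    }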

Recurrence for Exponential Complexity

A class of algorithms has complexity given by the following recurrence relation:

    T(k) = 2*T(k-1) + a,  if k > 0
    T(k) = b,             if k = 0

Cont

The solution for T(n) is:

    T(n) = 2*T(n-1) + a
         = 2*(2*T(n-2) + a) + a
         = 2*(2*(2*T(n-3) + a) + a) + a
         ...
         = 2^n * T(0) + (2^(n-1) + ... + 2 + 1)*a
         = b*2^n + (2^n - 1)*a
         = (a + b)*2^n - a
         = O(2^n)

Logarithmic Complexity

The following recurrence relation expresses logarithmic complexity:

    T(k) = T(k/2) + a,  if k > 1
    T(k) = b,           if k = 1

Cont.

Assume n is a power of 2. The solution should be no surprise: when the problem size is doubled from k/2 to k, the resource used increases only by a constant a. Repeated doubling, i.e. 1, 2, 4, 8, 16, 32, ..., reaches any limit n in about log2(n) steps. Unless otherwise stated, logarithms are to base two. The solution of the above recurrence is:

    T(n) = a*log2(n) + b

Worst case Running Time


The behavior of the algorithm with respect to the worst possible case of the input instance. The worst-case running time of an algorithm is an upper bound on the running time for any input. Knowing this gives us a guarantee that the algorithm will never take any longer. There is no need to make an educated guess about the running time.

Average case Running Time


The expected behavior when the input is randomly drawn from a given distribution. The average-case running time of an algorithm is an estimate of the running time for an "average" input. Computing the average-case running time entails knowing:

- all possible input sequences,
- the probability distribution of occurrence of these sequences, and
- the running times for the individual sequences.

Often it is assumed that all inputs of a given size are equally likely.

Best/Worst/Average Case

Worst case is usually used:

- It is an upper bound, and in certain application domains (e.g., air traffic control, surgery) knowing the worst-case time complexity is of crucial importance.
- For some algorithms, the worst case occurs fairly often.
- The average case is often as bad as the worst case.
- Finding the average case can be very difficult.

Amortized Running Time


Here the time required to perform a sequence of (related) operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small when one averages over a sequence of operations, even though a single operation might be expensive.

Amortized analysis guarantees the average performance of each operation in the worst case. A classic illustration follows.
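A standard illustration (not from the lecture itself) is a dynamic array that doubles its capacity when full: a single append may trigger an O(n) copy, yet any sequence of n appends does O(n) total work, so each append costs amortized O(1). A minimal sketch in C:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int *data;
        size_t size, cap;
    } Vec;

    /* Append one element, doubling the capacity when full. The occasional
       realloc copies O(size) elements, but over n appends the total copying
       work is O(n), so the amortized cost per append is O(1). */
    static void vec_push(Vec *v, int x) {
        if (v->size == v->cap) {
            v->cap = v->cap ? 2 * v->cap : 1;
            v->data = realloc(v->data, v->cap * sizeof *v->data);
            if (!v->data) exit(1);      /* out of memory */
        }
        v->data[v->size++] = x;
    }

    int main(void) {
        Vec v = {0};
        for (int i = 0; i < 1000; i++)
            vec_push(&v, i);
        printf("size = %zu  capacity = %zu\n", v.size, v.cap);
        free(v.data);
        return 0;
    }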

Analysis of the Sequential Search


Recurrence relation:

    T(n) = T(n-1) + c
    T(1) = c

Various cases (a C implementation is sketched below):

- Best case: O(1)
- Average case: we might expect to find the key halfway through the array, after n/2 comparisons: O(n)
- Worst case: O(n)
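A straightforward C implementation matching the recurrence T(n) = T(n-1) + c:

    #include <stdio.h>

    /* Scan the array from left to right; returns the index of key,
       or -1 if absent. One comparison per element visited, so O(1)
       in the best case and O(n) in the average and worst cases. */
    static int seq_search(const int a[], int n, int key) {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }

    int main(void) {
        int a[] = {7, 3, 9, 1, 4};
        printf("%d\n", seq_search(a, 5, 9));   /* prints 2 */
        return 0;
    }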

Analysis of Binary Search

Recurrence relation:

    T(n) = T(n/2) + c
    T(1) = c

Various cases (a C implementation is sketched below):

- Worst case: O(log2 n)
- Best case: O(1), when the first comparison finds the key
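An iterative C implementation; each pass halves the remaining interval, which is exactly the T(n) = T(n/2) + c behaviour:

    #include <stdio.h>

    /* Binary search on a sorted array; returns the index of key, or -1.
       Each iteration halves the search interval, giving O(log2 n)
       comparisons in the worst case. */
    static int bin_search(const int a[], int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   /* avoids integer overflow */
            if (a[mid] == key)
                return mid;
            if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    int main(void) {
        int a[] = {1, 3, 4, 7, 9};
        printf("%d\n", bin_search(a, 5, 7));   /* prints 3 */
        return 0;
    }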

Cont

Consider the difference between O(n) and O(log n):

    n        O(n)    O(log n)
    8           8       3
    16         16       4
    32         32       5
    64         64       6
    128       128       7
    256       256       8
    512       512       9
    1024     1024      10
    2048     2048      11
    4096     4096      12
    8192     8192      13

Asymptotic Notation

Simple rule: drop lower-order terms and constant factors.

- 50 n log n is O(n log n)
- 7n - 3 is O(n)
- 8n^2 log n + 5n^2 + n is O(n^2 log n)

Use O-notation to express the number of primitive operations executed as a function of input size.
Comparing asymptotic running times: an algorithm that runs in O(n) time is better than one that runs in O(n^2) time. Similarly, O(log n) is better than O(n).

big-Omega

The big-Omega notation gives an asymptotic lower bound:

- f(n) is Ω(g(n)) if there exist constants c and n0 such that c * g(n) <= f(n) for all n >= n0.
- It is used to describe best-case running times or lower bounds for algorithmic problems.
- E.g., a lower bound for searching in an unsorted array is Ω(n).

Exhaustive Search

Many problems have complexity worse than polynomial, such as the recursive Towers of Hanoi, which is O(2^n). Some are inherently expensive, such as finding all permutations of a string of n characters, which is O(n!). Faced with an expensive algorithm, one should look for techniques to reduce the work done. The Towers of Hanoi move count is illustrated below.
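Counting the moves of the recursive Towers of Hanoi in C shows the 2^n - 1 growth directly:

    #include <stdio.h>

    /* Towers of Hanoi move count: T(n) = 2*T(n-1) + 1, T(0) = 0,
       which solves to 2^n - 1 moves. */
    static long hanoi(int n, char from, char to, char via) {
        if (n == 0)
            return 0;
        long moves = hanoi(n - 1, from, via, to);   /* n-1 disks out of the way */
        moves += 1;                                 /* move the largest disk */
        return moves + hanoi(n - 1, via, to, from); /* n-1 disks back on top */
    }

    int main(void) {
        for (int n = 1; n <= 10; n++)
            printf("n = %2d   moves = %4ld\n", n, hanoi(n, 'A', 'C', 'B'));
        return 0;
    }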

NP and P problems

A P-problem is one whose solution time is bounded by a polynomial. An NP-problem is one which is solvable in polynomial time by a nondeterministic Turing machine. Is P = NP?

- Obviously P ⊆ NP.
- It is not known whether P = NP.

NP-hard Problem

A problem is said to be NP-hard if an algorithm for solving it can be translated into one for solving any other NP-problem. It is much easier to show that a problem is in NP than to show that it is NP-hard. An example of an NP-hard problem is the traveling salesman problem. A problem which is both in NP and NP-hard is called an NP-complete problem.
