
Performance Evaluation

Lecture - Module 2

What is a Good Algorithm?


Efficient:

- Running time
- Space used

Efficiency as a function of input size:

- The number of bits in an input number
- The number of data elements (numbers, points)

Measuring the Running Time


How should we measure the running time of an algorithm? One approach is an experimental study:

- Write a program that implements the algorithm.
- Run the program with data sets of varying size and composition, as in the sketch below.
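As a minimal sketch of such an experiment in C (the algorithm timed here, sum_to_n, is just an illustrative stand-in for whatever algorithm is under study):

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the algorithm under study: sum of the first n integers. */
    static long long sum_to_n(long long n) {
        long long sum = 0;
        for (long long i = 1; i <= n; i++)
            sum += i;
        return sum;
    }

    int main(void) {
        long long sizes[] = {1000000LL, 10000000LL, 100000000LL};
        for (int k = 0; k < 3; k++) {
            clock_t start = clock();
            volatile long long result = sum_to_n(sizes[k]);  /* volatile keeps the call from being optimized away */
            clock_t end = clock();
            (void)result;
            printf("n = %10lld   time = %.6f s\n",
                   sizes[k], (double)(end - start) / CLOCKS_PER_SEC);
        }
        return 0;
    }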

Limitations of Experimental Studies


It is necessary to implement and test the algorithm in order to determine its running time.
Experiments can be done only on a limited set of inputs, and may not be indicative of the running time on inputs not included in the experiment.
In order to compare two algorithms, the same hardware and software environments must be used.

Beyond Experimental Studies


We will develop a general methodology for analyzing the running time of algorithms. This approach:

- Uses a high-level description of the algorithm instead of testing one of its implementations.
- Takes into account all possible inputs.
- Allows one to evaluate the efficiency of any algorithm in a way that is independent of the hardware and software environment.

Performance Evaluation of Algorithm


Two algorithms may perform the same task, but one may be more "efficient" than the other.
Efficiency means "uses fewer resources".

What is a resource? Resources include:

- CPU cycles (time)
- Computer memory (space)

Our concern is with significant differences in efficiency.

Cont

Most "coding tricks" provide insignificant improvements in efficiency. Focus must be on the underlying approach to solving the problem as opposed to the implementation.
Otherwise we might end up with an efficient implementation of an inefficient algorithm.

It is very convenient to classify algorithms based on the relative amount of time or space they require. We have to specify the growth of time /space requirements as a function of the input size.

Types of Complexity

There are two types of complexity:

- Time complexity: the running time of the program as a function of the size of the input.
- Space complexity: the amount of computer memory required during program execution, as a function of the input size.

Time and space are both important, but we are usually more interested in time efficiency than in space efficiency.

These days, memory prices are quite low compared to the past.

Cont

The performance of an algorithm can be affected by:

- the language in which it is written,
- the machine on which it is executed, and
- the data it processes.

If we confine our analysis to the algorithm itself, we can avoid these problems. We can analyze the time efficiency of an algorithm by estimating how many "high-level" instructions will be executed as a function of the amount of data it is supposed to process.

Example: Sum of first n values


Algorithm:

    read(n);
    i = 1; sum = 0;
    while (i < n + 1) {
        sum = sum + i;
        i = i + 1;
    }
    output(sum);

This algorithm requires the following operations:

- check the loop condition, n times
- add the value of i to sum, n times
- increment i, n times
- plus the initial assignments, input, and output (a constant number of operations)

Cont

So the time taken by the above algorithm depends on the amount of input data, n.
The time taken by the algorithm is (3 * n + 3) operations. For large values of n, the extra three instructions are insignificant:

    n = 100     ->    303
    n = 1,000   ->  3,003
    n = 10,000  -> 30,003

We can even ignore the coefficient 3: the difference between 3,000 and 3,003 is not really significant. So when dealing with large problem sizes, small differences are not an issue. The "size of the problem" is usually expressed in terms of the "number of items to process". The instrumented version below makes the count concrete.
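Here is an instrumented version of the loop in C; following the slides' simplified model, the final failing condition check is not tallied, so the counter lands exactly on 3*n + 3.

    #include <stdio.h>

    int main(void) {
        long n = 10000;
        long ops = 0;           /* operation counter */

        long i = 1, sum = 0;
        ops += 3;               /* initial assignments, input and output:
                                   the constant "+3" in 3*n + 3 */
        while (i < n + 1) {
            ops += 3;           /* one condition check, one addition,
                                   one increment per iteration */
            sum = sum + i;
            i = i + 1;
        }

        printf("n = %ld  sum = %ld  ops = %ld  (3*n + 3 = %ld)\n",
               n, sum, ops, 3 * n + 3);
        return 0;
    }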

Few Terminologies

Growth rate: the growth of time as the size of the problem grows.

- The growth rate can usually be expressed as a function f(n).

Express the algorithm's growth function in terms of the order of magnitude of the growth rate:
For example, if f(n) = 3 * n + 3, then the order of magnitude is O(n).

This is known as "Big O" notation.

Cont

Notice that the low-order terms and multiplicative constants of an algorithm's growth-rate function can be ignored.
Performance-wise, for sufficiently large n, the following functions both have quadratic growth rates:

    f(n) = 300 * n^2 + 3 * n + 42   and   f(n) = 20 * n^2

Obviously, the performance of many algorithms depends on the data.

Big Oh Notation

It is a convenient way of describing the growth rate of a function, and hence the time complexity of an algorithm.
Let n be the size of the input, and let f(n) and g(n) be positive functions of n.
Mathematical definition of Big O: a function f(n) is of the order of another function g(n) if and only if there exist a real positive constant c and a positive integer k such that |f(n)| <= c * |g(n)| for all n >= k. We write f(n) = O(g(n)). A small numerical check of this definition follows.
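For instance, with f(n) = 3n + 3 and g(n) = n, the witnesses c = 4 and k = 3 satisfy the definition, since 3n + 3 <= 4n for all n >= 3. A quick check in C:

    #include <stdio.h>

    int main(void) {
        const long c = 4, k = 3;        /* chosen witnesses for f(n) = O(n) */
        for (long n = k; n <= 1000000; n++) {
            long f = 3 * n + 3;         /* f(n) */
            long cg = c * n;            /* c * g(n), with g(n) = n */
            if (f > cg) {
                printf("definition violated at n = %ld\n", n);
                return 1;
            }
        }
        printf("3*n + 3 <= 4*n holds for all tested n >= 3\n");
        return 0;
    }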

Various Orders of Magnitude

Constant: O(1)
- f(n) = 300; it does not depend on the size of the problem.

Logarithmic: O(log2 n)
- f(n) = log2(n) + 20; time increases in proportion to the logarithm of the size of the problem.

Linear: O(n)
- f(n) = 2 * n + 10; time is proportional to the size of the problem.

Quadratic: O(n^2)
- f(n) = 5 * n^2 + 100; time is proportional to the square of the size of the problem.

Cont

Cubic: O(n^3)
- f(n) = 4 * n^3 + 200

N-logarithmic: O(n log2 n)
- f(n) = n * log2(n) + 10

Exponential: O(2^n)
- f(n) = 300 * 2^n; time is proportional to some constant raised to the power of the problem size.

Recurrence Relation

Complexity can be expressed by a recurrence relation. For example, the following recurrence expresses linear complexity:

    T(k) = b + T(k-1),  if k > 1
    T(k) = a,           if k = 1

Cont

The solution of the above recurrence can be found as follows:

    T(n) = b + T(n-1)
         = b + (b + T(n-2))
         = 2b + T(n-2)
         ...
         = (n-1)b + T(1)
         = (n-1)b + a
         = n*b + (a - b)
         = O(n)

A numerical check of this closed form is sketched below.
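As a sanity check, the recurrence can be evaluated step by step in C and compared against the closed form n*b + (a - b); the constants a and b below are arbitrary sample values.

    #include <stdio.h>

    int main(void) {
        const long a = 5, b = 2;            /* arbitrary sample constants */
        long t = a;                         /* T(1) = a */
        for (long k = 2; k <= 10; k++) {
            t = b + t;                      /* T(k) = b + T(k-1) */
            printf("T(%2ld) = %3ld   closed form n*b + (a-b) = %3ld\n",
                   k, t, k * b + (a - b));
        }
        return 0;
    }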

Recurrence for Exponential Complexity

A class of algorithms has complexity given by the following recurrence relation:

    T(k) = 2*T(k-1) + a,  if k > 0
    T(k) = b,             if k = 0

Cont

The solution for T(n) is:

    T(n) = 2*T(n-1) + a
         = 2*(2*T(n-2) + a) + a
         = 2*(2*(2*T(n-3) + a) + a) + a
         ...
         = 2^n * T(0) + (2^(n-1) + ... + 2 + 1)*a
         = b*2^n + (2^n - 1)*a
         = (a + b)*2^n - a
         = O(2^n)

Logarithmic Complexity

The following recurrence relation expresses logarithmic complexity:

    T(k) = T(k/2) + a,  if k > 1
    T(k) = b,           if k = 1

Cont.

Assume n is a power of 2. The solution should be no surprise: when the problem size is doubled from k/2 to k, the resource used increases only by a constant a. Repeated doubling, i.e. 1, 2, 4, 8, 16, 32, ..., reaches any limit n in about log2(n) steps. Unless otherwise stated, logarithms are to base two. The solution of the above recurrence is:

    T(n) = a*log2(n) + b

Worst case Running Time


The behavior of the algorithm with respect to the worst possible case of the input instance. The worst-case running time of an algorithm is an upper bound on the running time for any input. Knowing this gives us a guarantee that the algorithm will never take any longer. There is no need to make an educated guess about the running time.

Average case Running Time


The expected behavior when the input is randomly drawn from a given distribution. The average-case running time of an algorithm is an estimate of the running time for an "average" input. Computing the average-case running time entails knowing:

- all possible input sequences,
- the probability distribution of occurrence of these sequences, and
- the running times for the individual sequences.

Often it is assumed that all inputs of a given size are equally likely.

Best/Worst/Average Case

Worst case is usually used:

- It is an upper bound, and in certain application domains (e.g., air traffic control, surgery) knowing the worst-case time complexity is of crucial importance.
- For some algorithms, the worst case occurs fairly often.
- The average case is often as bad as the worst case.
- Finding the average case can be very difficult.

Amortized Running Time


Here the time required to perform a sequence of (related) operations is averaged over all the operations performed. Amortized analysis can be used to show that the average cost of an operation is small when one averages over a sequence of operations, even though a single operation might be expensive.

Amortized analysis guarantees the average performance of each operation in the worst case. A classic illustration follows.
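A standard illustration (not from the lecture itself) is a dynamic array that doubles its capacity when full: a single append may trigger an O(n) copy, yet any sequence of n appends does O(n) total work, so each append costs amortized O(1). A minimal sketch in C:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        int *data;
        size_t size, cap;
    } Vec;

    /* Append one element, doubling the capacity when full. The occasional
       realloc copies O(size) elements, but over n appends the total copying
       work is O(n), so the amortized cost per append is O(1). */
    static void vec_push(Vec *v, int x) {
        if (v->size == v->cap) {
            v->cap = v->cap ? 2 * v->cap : 1;
            v->data = realloc(v->data, v->cap * sizeof *v->data);
            if (!v->data) exit(1);      /* out of memory */
        }
        v->data[v->size++] = x;
    }

    int main(void) {
        Vec v = {0};
        for (int i = 0; i < 1000; i++)
            vec_push(&v, i);
        printf("size = %zu  capacity = %zu\n", v.size, v.cap);
        free(v.data);
        return 0;
    }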

Analysis of the Sequential Search


Recurrence relation:

    T(n) = T(n-1) + c
    T(1) = c

Various cases (a C implementation is sketched below):

- Best case: O(1)
- Average case: we might expect to find the key halfway through the array, after n/2 comparisons: O(n)
- Worst case: O(n)
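A straightforward C implementation matching the recurrence T(n) = T(n-1) + c:

    #include <stdio.h>

    /* Scan the array from left to right; returns the index of key,
       or -1 if absent. One comparison per element visited, so O(1)
       in the best case and O(n) in the average and worst cases. */
    static int seq_search(const int a[], int n, int key) {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }

    int main(void) {
        int a[] = {7, 3, 9, 1, 4};
        printf("%d\n", seq_search(a, 5, 9));   /* prints 2 */
        return 0;
    }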

Analysis of Binary Search

Recurrence relation:

    T(n) = T(n/2) + c
    T(1) = c

Various cases (a C implementation is sketched below):

- Worst case: O(log2 n)
- Best case: O(1), when the first comparison finds the key
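An iterative C implementation; each pass halves the remaining interval, which is exactly the T(n) = T(n/2) + c behaviour:

    #include <stdio.h>

    /* Binary search on a sorted array; returns the index of key, or -1.
       Each iteration halves the search interval, giving O(log2 n)
       comparisons in the worst case. */
    static int bin_search(const int a[], int n, int key) {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   /* avoids integer overflow */
            if (a[mid] == key)
                return mid;
            if (a[mid] < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return -1;
    }

    int main(void) {
        int a[] = {1, 3, 4, 7, 9};
        printf("%d\n", bin_search(a, 5, 7));   /* prints 3 */
        return 0;
    }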

Cont

Consider the difference between O(n) and O(log n):

    n        O(n)    O(log n)
    8           8       3
    16         16       4
    32         32       5
    64         64       6
    128       128       7
    256       256       8
    512       512       9
    1024     1024      10
    2048     2048      11
    4096     4096      12
    8192     8192      13

Asymptotic Notation

Simple rule: drop lower-order terms and constant factors.

- 50 n log n is O(n log n)
- 7n - 3 is O(n)
- 8n^2 log n + 5n^2 + n is O(n^2 log n)

Use O-notation to express the number of primitive operations executed as a function of input size.
Comparing asymptotic running times: an algorithm that runs in O(n) time is better than one that runs in O(n^2) time. Similarly, O(log n) is better than O(n).

big-Omega

The big-Omega notation gives an asymptotic lower bound:

- f(n) is Ω(g(n)) if there exist constants c and n0 such that c * g(n) <= f(n) for all n >= n0.
- It is used to describe best-case running times or lower bounds for algorithmic problems.
- E.g., a lower bound for searching in an unsorted array is Ω(n).

Exhaustive Search

Many problems have complexity worse than polynomial, such as the recursive Towers of Hanoi, which is O(2^n). Some are inherently expensive, such as finding all permutations of a string of n characters, which is O(n!). Faced with an expensive algorithm, one should look for techniques to reduce the work done. The Towers of Hanoi move count is illustrated below.
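Counting the moves of the recursive Towers of Hanoi in C shows the 2^n - 1 growth directly:

    #include <stdio.h>

    /* Towers of Hanoi move count: T(n) = 2*T(n-1) + 1, T(0) = 0,
       which solves to 2^n - 1 moves. */
    static long hanoi(int n, char from, char to, char via) {
        if (n == 0)
            return 0;
        long moves = hanoi(n - 1, from, via, to);   /* n-1 disks out of the way */
        moves += 1;                                 /* move the largest disk */
        return moves + hanoi(n - 1, via, to, from); /* n-1 disks back on top */
    }

    int main(void) {
        for (int n = 1; n <= 10; n++)
            printf("n = %2d   moves = %4ld\n", n, hanoi(n, 'A', 'C', 'B'));
        return 0;
    }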

NP and P problems

A P-problem is one whose solution time is bounded by a polynomial. An NP-problem is one which is solvable in polynomial time by a nondeterministic Turing machine. Is P = NP?

- Obviously P ⊆ NP.
- It is not known whether P = NP.

NP-hard Problem

A problem is said to be NP-hard if an algorithm for solving it can be translated into one for solving any other NP-problem. It is much easier to show that a problem is in NP than to show that it is NP-hard. An example of an NP-hard problem is the traveling salesman problem. A problem which is both in NP and NP-hard is called an NP-complete problem.
