
MH 1401

ALGORITHMS &
COMPUTING I
Asst. Prof. Axel POSCHMANN
AY 2012/13 Semester 1

23.10.2012 Lecture 10: Basic Statistics, Sets, Sorting, and Indexing


Remaining Lab Schedule


This Friday, 26.10.2012, is Hari Raya Haji
No make-up lab sessions for LA4, LA5, LA6; no lab session for LA3 on Wednesday, 24.10.2012
Next week will be Lab 8 (no marks, but part of GL2)
The week after is Graded Lab Session 2 (GL2)
The week after that is the presentation of the final project
Group  Day        Lab 8       GL2         Project
LA1    Monday     29.10.2012  05.11.2012  12.11.2012
LA2    Tuesday    30.10.2012  06.11.2012  20.11.2012
LA3    Wednesday  31.10.2012  07.11.2012  14.11.2012
LA4    Friday     02.11.2012  09.11.2012  16.11.2012
LA5    Friday     02.11.2012  09.11.2012  16.11.2012
LA6    Friday     02.11.2012  09.11.2012  16.11.2012


Final Project
Information is available in edventure
Groups of 5 have been (randomly) assembled
Meet your team and split the work, discuss your approach, schedule meetings etc.
Remember: everybody is responsible for one part
More explanations during next week's lecture
Deadline (sharp): 11.11.2012 23:59h SGT


Quiz
Use a pen
Closed Book
Move your bags, materials etc far away
5 minutes
1% possible


MH 1401 Algorithms & Computing I


Outline
Statistical Functions
Set Operations
Sorting
Index Vectors
Lessons Learned


Statistical Functions in MATLAB


Statistical functions in MATLAB are in the data analysis help topic datafun
>> help datafun
In general we will write a data set of n values as
X = {x1, x2, x3, x4, ..., xn}
In MATLAB this will generally be represented as a row vector called x


Motivating Example
Statistics can be used to characterize properties of a data set
Consider a set of exam grades
x = {33, 75, 77, 82, 83, 85, 85, 91, 100}
What is a normal, expected or average exam grade?
There are several ways to interpret this:
Mean: sum the grades, then divide by n (79)
Mode: the most frequently occurring grade (85)
Median: the value in the middle of the sorted list (83)

Another useful property to know is how spread out the data values are within the data set


Example: UK Income Distributions (figure not reproduced; annotated mode ~290)


Min and Max


MATLAB has many built-in functions for statistics
min and max also return the index of the smallest/largest value; if there is more than one occurrence, they return the index of the first
Example
>> x=[9 10 10 9 8 7 3 10 9 8 5 10];
>> [maxval, maxind] = max(x)
maxval =
10
maxind =
2
Only the first index is returned
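When every position of the maximum is needed rather than only the first, one possible approach is to compare the vector against its maximum and pass the result to find. This is a minimal sketch using the same x as above; the variable name all_maxind is only illustrative.

```matlab
x = [9 10 10 9 8 7 3 10 9 8 5 10];
maxval = max(x);                 % largest value (10)
all_maxind = find(x == maxval);  % indices of every occurrence: 2 3 8 12
disp(all_maxind)
```

The same pattern works for min by replacing max with min.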


Min and Max (ctd)


For matrices min and max operate columnwise
Example

>> mat=[9 10 17 5; 19 9 11 14]


mat =
9 10 17 5
19 9 11 14
>> [minval, minind] = min(mat)
minval =
9 9 11 5
minind =
1 2 2 1


Min and Max (ctd)


min and max can also compare vectors/matrices with the same dimensions
Example
>> x=[3 5 8 2 11];
>> y=[2 6 4 5 10];
>> min(x,y)
ans =
2 5 4 2 10

The second argument is the second vector/matrix


Rowwise Min and Max (ctd)


To find the minimum/maximum for each row, the dimension 2 can be specified as the third argument
For min and max the second argument must then be the empty vector []
Example
>> mat=[9 10 17 5; 19 9 11 14]
mat =
9 10 17 5
19 9 11 14
>> [minval, minind] = min(mat,[],2)
minval =
5
9
minind =
4
2


The Arithmetic Mean


The arithmetic mean of a data set is also usually called the average of the values
It is the sum of the values divided by the number of values

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

>> x=[33, 75, 77, 82, 83, 85, 85, 91, 100];
>> mean(x)
ans =
79


The Arithmetic Mean (ctd)


For a matrix, mean operates columnwise

>> mat = [8 9 3; 10 2 3; 6 10 9]
mat =
8 9 3
10 2 3
6 10 9
>> mean(mat)
ans =
8 7 5
To find the mean of each row, the second argument is 2

>> mean(mat,2)
ans =
6.6667
5.0000
8.3333

For mean the second argument does not need to be []


Outliers
Sometimes a value that is much larger or smaller than the rest of the data, called an outlier, can throw off the mean
Example
>>ybig=[9 10 10 9 8 100 7 3 10 9 8 5 10];
>>mean(ybig)
ans =
15.2308
Typically, an outlier represents an error of some kind (data collection etc.)
In this example, the maximum and minimum could be
removed using logical indexing (how?)
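One possible answer to the question above, as a sketch rather than the only solution: build a logical vector that is true wherever the value is neither the maximum nor the minimum, and use it to index into the data. The names keep and trimmed are illustrative. Note that this version removes every occurrence of the extreme values, not just one.

```matlab
ybig = [9 10 10 9 8 100 7 3 10 9 8 5 10];
% true for every element that is neither the largest nor the smallest value
keep = (ybig ~= max(ybig)) & (ybig ~= min(ybig));
trimmed = ybig(keep);        % 100 and 3 are dropped, 11 values remain
trimmed_mean = mean(trimmed);
disp(trimmed_mean)           % 95/11, displayed as 8.6364
```

With the outlier 100 removed, the mean drops from 15.2308 to about 8.64, which is much closer to the bulk of the data.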


Variance and Standard Deviation


The standard deviation and variance are ways of determining the spread of the data
The variance is usually defined in terms of the arithmetic mean as

$$\mathrm{var} = \frac{\sum_{i=1}^{n} (x_i - \mathrm{mean})^2}{n-1}$$

(Sometimes the denominator is defined as n, but MATLAB uses n-1)


Variance and Standard Deviation (ctd)


Example: x = [8 7 5 4 6]. The mean of the n=5 elements is 6, so

$$\mathrm{var} = \frac{(8-6)^2 + (7-6)^2 + (5-6)^2 + (4-6)^2 + (6-6)^2}{4} = 2.5$$

MATLAB has a built-in function var

>> x = [8 7 5 4 6];
>> var(x)
ans =
2.5000
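To connect the definition with the built-in function, the variance can also be computed step by step. This is a small sketch with illustrative variable names (v_manual, sd_manual); it uses the n-1 denominator, as MATLAB's var does by default.

```matlab
x = [8 7 5 4 6];
n = numel(x);
dev = x - mean(x);                 % deviations from the mean: 2 1 -1 -2 0
v_manual = sum(dev.^2) / (n - 1);  % 10/4 = 2.5
sd_manual = sqrt(v_manual);        % standard deviation, same as std(x)
disp([v_manual, var(x)])
```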


Variance and Standard Deviation (ctd)


The standard deviation is the square root of the variance

$$\mathrm{sd} = \sqrt{\mathrm{var}}$$

MATLAB has a built-in function std

>> x = [8 7 5 4 6];
>> std(x)
ans =
1.5811


The Mode
The mode of a data set is the value that appears most frequently
MATLAB has a built-in function mode
>> x=[9 10 10 9 8 7 3 10 9 8 5 10];
>> mode(x)
ans =
10
If there is more than one value with the same (highest)
frequency, the smaller value is the mode
>> x=[3 8 5 3 4 1 8];
>> mode(x)
ans =
3


The Median
The median is defined in terms of the sorted data set, meaning that the values are in order
The median of a sorted set of n data values is defined as
The value in the middle if n is odd
The average of the two values in the middle if n is even

MATLAB has a built-in function median
median also works for unsorted vectors

Odd case
>> x=[1 4 5 9 12];
>> median(x)
ans =
5

Even case
>> x=[1 4 5 9 12 33];
>> median(x)
ans =
7
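The odd/even definition can be written out directly. The following is only a sketch of the definition with illustrative names (s, med), not a description of how the built-in median is implemented. The input is deliberately unsorted to show that sorting comes first.

```matlab
x = [12 1 9 5 4 33];        % unsorted on purpose
s = sort(x);                % 1 4 5 9 12 33
n = numel(s);
if mod(n, 2) == 1
    med = s((n + 1) / 2);               % odd n: the middle value
else
    med = (s(n/2) + s(n/2 + 1)) / 2;    % even n: average of the two middle values
end
disp(med)                   % 7, the same as median(x)
```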


MH 1401 Algorithms & Computing I


Outline
Statistical Functions
Set Operations
Sorting
Index Vectors
Lessons Learned


Set Functions in MATLAB


MATLAB has several built-in functions that perform set operations
Examples are:
union
intersect
setdiff
setxor
unique

All return vectors that are sorted from lowest to highest (ascending order)
There are also two "is" functions: issorted and ismember
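The two "is" functions are only named here, so a short sketch may help. It uses the vectors v1 and v2 from the set examples in this lecture; the result variable names are illustrative.

```matlab
v1 = [6 5 4 3 2];
v2 = [1 3 5 7];
sorted1 = issorted(v1);     % false: v1 is in descending order
sorted2 = issorted(v2);     % true: v2 is in ascending order
in_v1 = ismember(v2, v1);   % one logical per element of v2: 0 1 1 0
disp(in_v1)
```

ismember answers, element by element, whether each value of the first argument occurs in the second, so v2(in_v1) gives the same values as intersect(v1,v2) for these inputs.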


Union
The union function returns a vector that contains all of the values from the two input argument vectors, without duplicates
Example
>>v1=[6 5 4 3 2];
>>v2=[1 3 5 7];
>>union(v1,v2)
ans =
1 2 3 4 5 6 7


Intersect
The intersect function returns a vector that contains all of the values that can be found in both of the two input argument vectors
Example
>>v1=[6 5 4 3 2];
>>v2=[1 3 5 7];
>>intersect(v1,v2)
ans =
3 5


Intersect (ctd)
The intersect function also returns an index vector into v1 and an index vector into v2, such that outvec is the same as v1(index1) and also v2(index2)
Example
>>v1=[6 5 4 3 2];
>>v2=[1 3 5 7];
>>[outvec,index1,index2]= intersect(v1,v2)
outvec =
3 5

index1 =
4 2

index2 =
2 3


Setdiff
The setdiff function returns a vector consisting of all of the values that are contained in the first input argument vector but not the second
The order of the input arguments is important!
Example
>>v1=[6 5 4 3 2];
>>v2=[1 3 5 7];
>>setdiff(v1,v2)
ans =
2 4 6
>>setdiff(v2,v1)
ans =
1 7


Setxor
The setxor function returns a vector consisting of all of the values from the two input vectors that are not in the intersection of these two vectors
It is the union of the two vectors obtained by applying setdiff in both directions
Example
>>v1=[6 5 4 3 2];
>>v2=[1 3 5 7];
>>setxor(v1,v2)
ans =
1 2 4 6 7
>>union(setdiff(v1,v2),setdiff(v2,v1))
ans =
1 2 4 6 7


Unique
The unique function returns all of the unique values from a set argument
Example

>>v3=[1 2 3 4 5 3 4 5 6];
>>unique(v3)
ans =
1 2 3 4 5 6


MH 1401 Algorithms & Computing I


Outline
Statistical Functions
Set Operations
Sorting
Index Vectors
Lessons Learned


Sorting
Sorting is the process of putting a list in order
Either descending (highest to lowest)
Or ascending (lowest to highest)
By default, sort sorts in ascending order
Example

>>vec=[85 70 100 95 80 91];
>>vec=sort(vec)
vec =
70 80 85 91 95 100
>>vec=sort(vec,'descend')
vec =
100 95 91 85 80 70


Sorting (ctd)
For matrices the sort function will sort each column
To sort by rows, dimension 2 is specified
Example

>>mat=[8 9 3; 10 2 3; 6 10 9];
>>sort(mat)
ans =
6 2 3
8 9 3
10 10 9
(columnwise)
>>sort(mat,2)
ans =
3 8 9
2 3 10
6 9 10
(rowwise)


Sorting rows
The sortrows function sorts the rows as blocks, or groups: each row is kept intact, and the rows are ordered by the values in the first column
Example

>>mat=[8 9 3; 10 2 3; 6 10 9];
>>sortrows(mat)
ans =
6 10 9
8 9 3
10 2 3


Sorting rows (ctd)


The sortrows function also works on strings
Example

>>words=char('Hello','Hi','Goodbye','Ciao')
words =
Hello
Hi
Goodbye
Ciao
>>sortrows(words)
ans =
Ciao
Goodbye
Hello
Hi


MH 1401 Algorithms & Computing I


Outline
Statistical Functions
Set Operations
Sorting
Index Vectors
Lessons Learned


Index Vectors
Using index vectors is an alternative to sorting a vector
Indexing leaves the vector in its original order; the index vector just points to the elements in the desired order
Example
>>grades=[85 70 100 95 80 91];
>>grade_index=[2 5 1 6 4 3];
>>grades(grade_index)
ans =
70 80 85 91 95 100
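The index vector in this example was written out by hand. As a sketch of one way to obtain it, the built-in sort function returns the index vector as an optional second output; the name sorted_grades is illustrative.

```matlab
grades = [85 70 100 95 80 91];
% second output: positions in grades of the elements in ascending order
[sorted_grades, grade_index] = sort(grades);
disp(grade_index)            % 2 5 1 6 4 3
disp(grades(grade_index))    % 70 80 85 91 95 100, grades itself is unchanged
```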


Index Vectors (ctd)


General algorithm to create an index vector
Initialize the values in the index vector to be the indices 1, 2, 3, ..., up to the length of the vector
Use any sort algorithm, but compare the elements in the original vector using the index vector to index into it (e.g. using grades(grade_index(i))) as previously shown
When the sort algorithm calls for exchanging values, exchange the elements in the index vector, not in the original vector.


Index Vectors (ctd)


createind.m
function indvec = createind(vec)
% CREATEIND returns an index vector that puts vec in ascending order
% (selection sort applied to the indices; vec itself is not modified)
% Initialize the index vector
len = length(vec);
indvec = 1:len;
for i = 1:len-1
    low = i;
    for j = i+1:len
        % Compare values in the original vector
        if vec(indvec(j)) < vec(indvec(low))
            low = j;
        end
    end
    % Exchange elements in the index vector
    temp = indvec(i);
    indvec(i) = indvec(low);
    indvec(low) = temp;
end
end


MH 1401 Algorithms & Computing I


Outline
Statistical Functions
Set Operations
Sorting
Index Vectors
Lessons Learned


Lessons learned
Common Pitfalls:
Forgetting that max and min return the index of only the
first occurrence of the maximum or minimum value
Not realizing that a data set has outliers that can
drastically alter the results obtained from the statistical
functions


Lessons learned
Programming Style Guidelines:
Remove the largest and the smallest numbers from a
large data set before performing statistical analysis to
handle the problem of outliers
Use sortrows to sort strings stored in a matrix
alphabetically


The Geometric Mean


The geometric mean of the n values in a vector x is defined as the nth root of the product of the data set values

$$G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$$
>> x = [33, 75, 77, 82, 83, 85, 91, 100];
>> mean(x)
ans =
78.2500
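The call to mean above gives the arithmetic mean of x. As a sketch of how the geometric mean itself could be computed from its definition using only basic functions, two equivalent forms are shown; the names G and G_log are illustrative, and the logarithm form assumes all values are positive.

```matlab
x = [33, 75, 77, 82, 83, 85, 91, 100];
n = numel(x);
G = prod(x)^(1/n);            % nth root of the product, straight from the definition
G_log = exp(mean(log(x)));    % same quantity via logarithms, avoids a very large product
disp([G, G_log, mean(x)])
```

For positive data the geometric mean is never larger than the arithmetic mean, so G comes out below 78.25 here.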
