
Data Analysis in Decision Making Environment

Lecture 7-8

Contents

Describing and Summarising Data
Getting the Right Data
Types of Data Distributions
Decision Making Under Uncertainty
Concepts of Probability
Probability Distributions

Describing and Summarising Data (Basic Concepts)

Population: Includes all the objects of interest in a study, whether they are people, households or machines.
Sample: A subset of a population, often randomly chosen and preferably representative of the population as a whole. A representative sample of reasonable size can give a lot of information about the population.
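
As a minimal sketch of drawing a random sample from a population, the Python snippet below uses the standard library. The population of 1,000 household ID numbers, the sample size of 50 and the seed are all made up for illustration.

```python
import random

# Hypothetical population: ID numbers of 1,000 households (illustrative only).
population = list(range(1, 1001))

# Draw a simple random sample of 50 households without replacement.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=50)

print(len(sample), "of", len(population), "households sampled")
```

Because the households are chosen at random, no part of the population is favoured, which is what makes the sample likely to be representative.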

Describing and Summarising Data (Basic Concepts)

Variable: Each column represents a variable. A variable is an attribute or measurement on members of the population, such as height, age or gender.
Observation: Each row corresponds to an observation, that is, a list of variable values for a single member of the population.
The terms fields and records may also be used for variables and observations.


Describing and Summarising Data (Basic Concepts)

Age  Gender  State  Children  Opinion  Salary (USD million/year)
35   Male    Texas  2         1        0.25
45   Female  NY     3         2        0.75
63   Female  NJ     2         5        1.2
55   Male    Ohio   4         3        0.9
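
One possible way to hold this table in Python, sketched here for illustration, is a list of records: each dictionary is one observation (row) and each key is a variable (column). The helper name `column` is hypothetical.

```python
# Each dict is one observation (row); each key is a variable (column).
observations = [
    {"Age": 35, "Gender": "Male", "State": "Texas", "Children": 2, "Opinion": 1, "Salary": 0.25},
    {"Age": 45, "Gender": "Female", "State": "NY", "Children": 3, "Opinion": 2, "Salary": 0.75},
    {"Age": 63, "Gender": "Female", "State": "NJ", "Children": 2, "Opinion": 5, "Salary": 1.2},
    {"Age": 55, "Gender": "Male", "State": "Ohio", "Children": 4, "Opinion": 3, "Salary": 0.9},
]


def column(rows, variable):
    """Return the values of one variable across all observations."""
    return [row[variable] for row in rows]


ages = column(observations, "Age")
print(ages)
```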

Describing and Summarising Data (Basic Concepts)

Numerical: A variable is numerical if meaningful arithmetic can be performed on it.
Categorical: A variable is categorical if meaningful arithmetic cannot be performed on it, e.g. the Opinion column, which is based on responses such as agree, disagree and neutral.

Describing and Summarising Data (Basic Concepts)

Ordinal: Categorical variables whose values have a definite order, such as strongly agree, agree, neutral, disagree and strongly disagree, are termed ordinal.
Nominal: If there is no natural order in the values of the variable, as with male and female, the variable is called nominal.
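
The difference matters in practice: ordinal values can be sorted by their natural order, while nominal values cannot. The sketch below assumes the five-point opinion scale named above; the list of responses is made up for illustration.

```python
# Ordinal scale: the position in this list encodes the natural order.
OPINION_ORDER = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]

# Hypothetical survey answers (illustrative only).
responses = ["agree", "neutral", "strongly agree", "disagree"]

# Sorting by position on the scale is meaningful for an ordinal variable.
ranked = sorted(responses, key=OPINION_ORDER.index)
print(ranked)
```

For a nominal variable such as gender there is no comparable scale, so any sort order (alphabetical, for instance) carries no meaning.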

Describing and Summarising Data (Basic Concepts)

Age  Gender  State  Children  Opinion  Salary (USD million/year)
35   Male    Texas  2         1        0.25
45   Female  NY     3         2        0.75
63   Female  NJ     2         5        1.2
55   Male    Ohio   4         3        0.9

Comments on the Opinion column (Irfan, 6/25/2007): the coded values stand for responses such as strongly agree, strongly disagree and neutral.

Describing and Summarising Data (Basic Concepts)

Categorical variables may be coded or uncoded. Opinion, for example, was coded in the previous example. Gender and age can be coded in the same way.

Describing and Summarising Data (Basic Concepts)

Age  Gender  State  Children  Opinion  Salary (USD million/year)
Y    1       Texas  2         1        0.25
M    2       NY     3         2        0.75
E    2       NJ     2         5        1.2
E    1       Ohio   4         3        0.9

Comments (Irfan, 6/25/2007 and 6/26/2007):
Opinion: the coded values stand for responses such as strongly agree, strongly disagree and neutral.
Gender: 1 = male, 2 = female.
Age: 35 and below = young (Y), 36-45 = middle-aged (M), 46 and above = elderly (E).
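
The coding rules for age and gender given in the comments can be sketched in Python as follows. The function and variable names are hypothetical; the cut-offs and codes are the ones stated in the comments.

```python
def code_age(age):
    """Code an age as Y (35 and below), M (36-45) or E (46 and above)."""
    if age <= 35:
        return "Y"
    if age <= 45:
        return "M"
    return "E"


# Gender codes from the comments: 1 = male, 2 = female.
GENDER_CODES = {"Male": 1, "Female": 2}

ages = [35, 45, 63, 55]
genders = ["Male", "Female", "Female", "Male"]

coded_ages = [code_age(a) for a in ages]
coded_genders = [GENDER_CODES[g] for g in genders]
print(coded_ages, coded_genders)
```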

Describing and Summarising Data (Basic Concepts)

Discrete variable: A numerical variable whose values can be counted.
Continuous variable: A variable whose values result from continuous measurement.
Cross-sectional data: Data on a population at a distinct point in time, such as opinions about an election.
Time series data: Data collected across time, e.g. the value of stocks on a daily basis.

Describing and Summarising Data (Basic Concepts)

Frequency table: A list of the number of observations in each of various categories.
Histogram: A bar chart of these frequencies.

Describing and Summarising Data (Basic Concepts)

All monetary values are in million USD.

Name            Gender  Domestic gross  Foreign gross  Salary
Angela Bassett  F       32              17             2.5
Jessica Lange   F       21              27             2.5
Julia Roberts   F       57              47             12
Nicole Kidman   F       55              57             13
Mel Gibson      M       91              95             19
Bruce Willis    M       55              99             16
Brad Pitt       M       57              124            10
Tom Hanks       M       166             182            17.5
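
As a small illustration of a frequency table for a categorical variable, the sketch below counts the Gender column of the eight stars listed above using Python's standard `collections.Counter`.

```python
from collections import Counter

# Gender column of the eight stars, in the order listed.
genders = ["F", "F", "F", "F", "M", "M", "M", "M"]

# A frequency table is simply the count of observations per category.
frequency_table = Counter(genders)

for category, count in sorted(frequency_table.items()):
    print(category, count)
```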

Describing and Summarising Data (Basic Concepts)

[Bar chart: Monetary data for stars. Domestic gross, foreign gross and salary (vertical axis: Earnings, 0 to 200) by name and gender.]

Describing and Summarising Data (Basic Concepts)

Steps for creating a histogram and frequency table:
Start Excel and fill in the required data in row and column format.
Place the cursor anywhere on the data.
Open the Tools menu and select Data Analysis.

Describing and Summarising Data (Basic Concepts)

Select the Histogram option and press OK.
Select the data input range for which the frequency table and histogram are required.
The bin range is optional; if it is left blank, the software itself chooses a suitable range between the lowest and highest values.
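
The counting behind such a frequency table can be sketched in Python. This is an illustration, not Excel's own code: it assumes each bin is closed at its upper limit (a value is counted in the first bin whose limit it does not exceed) and that larger values fall into a final "More" bin. The bin limits 5, 10, 15 and 20 are made up; the salaries are those of the eight stars.

```python
def frequency_table(values, upper_limits):
    """Count values per bin; values above the last limit go in 'More'."""
    counts = [0] * (len(upper_limits) + 1)
    for v in values:
        for i, limit in enumerate(upper_limits):
            if v <= limit:  # first bin whose upper limit is not exceeded
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # larger than every limit: the "More" bin
    labels = [str(limit) for limit in upper_limits] + ["More"]
    return list(zip(labels, counts))


salaries = [2.5, 2.5, 12, 13, 19, 16, 10, 17.5]  # million USD
table = frequency_table(salaries, [5, 10, 15, 20])  # hypothetical bins

for label, count in table:
    print(label, count)
```

A quick check on any frequency table is that the counts add up to the number of observations.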

Describing and Summarising Data (Basic Concepts)

[Histogram: Frequency (vertical axis, 0 to 6) by Bin]

Bin    Frequency
2.5    1
10.75  1
More   5

More Examples of Data, Frequency and Histogram: Diameters of Parts (Normal Distribution)

[Histogram: Frequency (vertical axis, 0 to 45) by Bin]

Bin limits are shown as displayed, rounded to two decimal places.

Bin   Frequency
0.45  1
0.45  2
0.46  1
0.47  8
0.47  10
0.48  14
0.48  21
0.49  29
0.49  39
0.50  33
0.50  42
0.51  38
0.51  39
0.52  36
0.52  34
0.53  13
0.53  19
0.54  9
0.54  5
More  6

More Examples of Data, Frequency and Histogram: Arrivals at a Bank (Skewed to the Right)

[Histogram: Frequency (vertical axis, 0 to 120) by Bin]

Bin     Frequency
0.008   1
1.713   106
3.419   70
5.124   36
6.830   30
8.535   20
10.241  14
11.946  4
13.652  6
15.357  6
17.063  1
18.768  1
20.474  1
22.179  2
23.885  0
25.590  0
27.296  1
More    1

More Examples of Data, Frequency and Histogram: Distribution of Accounting Midterm Scores (Skewed to the Left)

[Histogram: Frequency (vertical axis, 0 to 25) by Bin]

Bin   Frequency
43    1
50    2
57    3
64    3
71    4
78    17
85    11
92    23
More  15
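
The direction of skew can be checked numerically. The sketch below computes a skewness coefficient from grouped data (third central moment divided by the cubed standard deviation): a positive value means the long tail is on the right, a negative value means it is on the left. It is a rough approximation under stated assumptions: each bin's upper limit stands in for the values in that bin, and the values 99 and 29.001 used for the open-ended "More" bins are made up.

```python
def grouped_skewness(values, frequencies):
    """Skewness of grouped data, weighting each value by its frequency."""
    n = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / n
    m2 = sum(f * (v - mean) ** 2 for v, f in zip(values, frequencies)) / n
    m3 = sum(f * (v - mean) ** 3 for v, f in zip(values, frequencies)) / n
    return m3 / m2 ** 1.5


# Midterm scores; 99 for the "More" bin is an assumption.
score_bins = [43, 50, 57, 64, 71, 78, 85, 92, 99]
score_freq = [1, 2, 3, 3, 4, 17, 11, 23, 15]

# Bank arrivals; 29.001 for the "More" bin is an assumption.
arrival_bins = [0.008, 1.713, 3.419, 5.124, 6.830, 8.535, 10.241, 11.946,
                13.652, 15.357, 17.063, 18.768, 20.474, 22.179, 23.885,
                25.590, 27.296, 29.001]
arrival_freq = [1, 106, 70, 36, 30, 20, 14, 4, 6, 6, 1, 1, 1, 2, 0, 0, 1, 1]

score_skew = grouped_skewness(score_bins, score_freq)
arrival_skew = grouped_skewness(arrival_bins, arrival_freq)

print("midterm scores:", "left" if score_skew < 0 else "right")
print("bank arrivals:", "left" if arrival_skew < 0 else "right")
```

The midterm scores pile up at the high end with a tail of low scores, while the arrival data pile up at the low end with a tail of large values.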

More Examples of Data, Frequency and Histogram (Output of Two Machines)

[Histogram: Frequency (vertical axis, 0 to 80) by Bin]

Bin    Frequency
0.486  1
0.495  14
0.504  75
0.513  16
0.522  1
0.531  0
0.540  0
0.549  0
0.558  0
0.567  0
0.576  0
0.586  0
0.595  12
0.604  73
More   22
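
When the output of two machines is pooled, the histogram shows two separate clusters. One simple way to locate them, sketched here for illustration only, is to look for bins whose frequency is strictly higher than that of both neighbours. The function name `find_peaks` is hypothetical.

```python
def find_peaks(labels, frequencies):
    """Return labels of bins that are strictly higher than both neighbours."""
    peaks = []
    last = len(frequencies) - 1
    for i, f in enumerate(frequencies):
        left = frequencies[i - 1] if i > 0 else -1
        right = frequencies[i + 1] if i < last else -1
        if f > left and f > right:
            peaks.append(labels[i])
    return peaks


labels = ["0.486", "0.495", "0.504", "0.513", "0.522", "0.531", "0.540",
          "0.549", "0.558", "0.567", "0.576", "0.586", "0.595", "0.604",
          "More"]
frequencies = [1, 14, 75, 16, 1, 0, 0, 0, 0, 0, 0, 0, 12, 73, 22]

peaks = find_peaks(labels, frequencies)
print(peaks)
```

Two well-separated peaks with empty bins between them suggest that the data come from two different sources and are better analysed separately.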

Learning the use of StatPro

Install all elements of the decision-making software.
Select StatPro/Charts/Histogram.

Using StatPro

Install StatPro.
Check StatPro as an add-in.
StatPro then appears on the command menu.
Open the actors and actresses file in Excel.
Select StatPro.
Select the StatPro/Charts/Histogram menu.

Using StatPro

A list of the numerical variables in the data set appears.
Enter values for the categories, the minimum and the category length.
The histogram appears in a chart, along with its data.

Learning the use of StatPro

[Histogram for Salary: frequency (vertical axis, 0 to 16) by Category, from <=2 through 18-20 to >20]

Learning the use of StatPro

Frequency table for Salary

Upper limit  Category  Frequency
2            <=2       2
4            2-4       15
6            4-6       11
8            6-8       12
10           8-10      9
12           10-12     3
14           12-14     3
16           14-16     2
18           16-18     3
20           18-20     6
             >20       0
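
The categories in this table follow from three inputs: a minimum, a category length and the number of categories. The sketch below shows one plausible reading of those inputs and is not StatPro's own logic; the function name `build_categories` is hypothetical. With a minimum of 2, a length of 2 and 11 categories it reproduces the upper limits and category labels above.

```python
def build_categories(minimum, length, n_categories):
    """Return (upper_limit, label) pairs; the last category is open-ended."""
    # Upper limits of every category except the open-ended last one.
    limits = [minimum + i * length for i in range(n_categories - 1)]
    rows = [(limits[0], f"<={limits[0]}")]
    for low, high in zip(limits, limits[1:]):
        rows.append((high, f"{low}-{high}"))
    rows.append((None, f">{limits[-1]}"))  # no upper limit for the last one
    return rows


categories = build_categories(minimum=2, length=2, n_categories=11)

for upper_limit, label in categories:
    print(upper_limit, label)
```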

Using StatPro examples (50 categories for both machines)

[Histogram for Both: frequency (vertical axis, 10 to 50) by Category]

Using StatPro examples (50 categories for both machines)

Frequency table for Both

Upper limit  Category   Frequency
0.48         <=0.48     0
0.485        .48-.485   0
0.49         .485-.49   3
0.495        .49-.495   12
0.5          .495-.5    39
0.505        .5-.505    42
0.51         .505-.51   8
0.515        .51-.515   3
0.52         .515-.52   0
0.525        .52-.525   0
0.53         .525-.53   0
0.535        .53-.535   0

Using StatPro examples (50 categories for both machines)

Frequency table for Both

Upper limit  Category   Frequency
0.58         .575-.58   0
0.585        .58-.585   0
0.59         .585-.59   1
0.595        .59-.595   16
0.6          .595-.6    43
0.605        .6-.605    30
0.61         .605-.61   14
0.615        .61-.615   3
0.62         .615-.62   0
0.625        .62-.625   0
0.63         .625-.63   0

Using StatPro examples (50 categories for both machines)

Frequency table for Both

Upper limit  Category   Frequency
0.705        .7-.705    0
0.71         .705-.71   0
0.715        .71-.715   0
0.72         .715-.72   0
             >0.72      0

Using StatPro examples (35 categories for both machines)

[Histogram for Both: frequency (vertical axis, 10 to 50) by Category]

Analysing relationships with scatter plots

Scatter plot: Contains a point for each observation, positioned according to the values of two selected variables. The resulting plot shows the relationship between the two variables.

Analysing relationships with scatter plots

[Scatter plot: Sales (vertical axis, 44 to 64) against YrsExper (horizontal axis, 0 to 28). Correlation = 0.502]

Analysing relationships with scatter plots

Note: All monetary values are in $ thousands.

YrsExper  Sales
24        54
8         57
2         45
12        61
8         57
4         50
6         54
6         54

Analysing relationships with scatter plots

Note: All monetary values are in $ thousands.

YrsExper  Sales
10        60
11        60
16        62
14        62
10        60
18        61
22        57
20        60
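
The correlation reported on a scatter plot is the usual Pearson correlation coefficient. As a hedged sketch, the code below computes it in plain Python for the 16 (YrsExper, Sales) rows listed on these two slides. The result for these rows, roughly 0.51, need not equal the figure quoted on the chart, which may be based on a larger data set.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# The 16 rows shown on the slides (Sales in $ thousands).
yrs_exper = [24, 8, 2, 12, 8, 4, 6, 6, 10, 11, 16, 14, 10, 18, 22, 20]
sales = [54, 57, 45, 61, 57, 50, 54, 54, 60, 60, 62, 62, 60, 61, 57, 60]

r = correlation(yrs_exper, sales)
print(round(r, 3))
```

A value between 0 and 1 indicates that sales tend to rise with experience, but only loosely: the points do not lie close to a straight line.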

Analysing relationships with scatter plots: salary vs domestic gross

[Scatter plot: Salary (vertical axis, 0 to 24) against DomesticGross (horizontal axis, 0 to 175). Correlation = 0.610]

Analysing relationships with scatter plots: domestic and foreign gross

[Scatter plot: DomesticGross (vertical axis, 0 to 175) against ForeignGross (horizontal axis, 0 to 200). Correlation = 0.643]

Analysing relationships with time series plots

[Time series plot of Revenue: Revenue (vertical axis, 600 to 4800) against Observation Number (1 to 16)]

Analysing relationships with time series plots

[Time series plot of Revenue: Revenue (vertical axis, 600 to 4800) by Quarter, Q1-92 to Q4-95]

Analysing relationships with time series plots

[Time series chart of Product1 and Product2: Product1 (axis from 100 to 200) and Product2 (axis from 3.00 to 8.00) against Observation Number (1 to 35)]

Assignment No. 5
Do any four examples and discuss their results
