Lecture 7-8
Contents
Describing and Summarising Data
Getting the Right Data
Types of Data
Distributions
Decision Making Under Uncertainty
Concepts of Probability
Probability Distributions
Population: Includes all the objects of interest in a study, whether they are people, households, or machines.
Sample: A subset of a population, often randomly chosen and preferably representative of the population as a whole. A representative sample of reasonable size can give a lot of information about the population.
Numerical: A variable is numerical if meaningful arithmetic can be performed on it.
Categorical: A variable is categorical if meaningful arithmetic cannot be performed on it, e.g. an opinion column with values such as agree, disagree, and neutral.
Ordinal: A categorical variable whose values have a definite order, such as strongly agree, agree, neutral, disagree, and strongly disagree.
Nominal: A categorical variable whose values have no natural order, such as male and female.
Categorical variables may be coded or uncoded; for instance, opinion was coded in the previous example. Gender and age could be coded in the same way.
Y M E E
1 2 2 1
Texas NY NJ Ohio
2 3 2 4
1 2 5 3
0.25 0.75
1.2 0.9
Gender coding: 1 = Male, 2 = Female
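The idea of coding a categorical variable can be sketched in a few lines of Python. This is only an illustration: the gender key (1 = male, 2 = female) comes from the slides, while the five opinion codes and the helper names `encode` and `decode` are assumptions made for the example.

```python
# Code books map each category label to a numeric code.
# GENDER_CODES follows the key on the slide; OPINION_CODES is an assumed
# ordering (1 = strongly disagree ... 5 = strongly agree) for illustration.
GENDER_CODES = {"male": 1, "female": 2}
OPINION_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode(values, code_book):
    """Replace each label with its code (labels are matched case-insensitively)."""
    return [code_book[v.strip().lower()] for v in values]


def decode(codes, code_book):
    """Replace each code with its label by inverting the code book."""
    reverse = {code: label for label, code in code_book.items()}
    return [reverse[c] for c in codes]


opinions = ["Neutral", "Strongly disagree", "Agree"]
print(encode(opinions, OPINION_CODES))
print(decode([1, 2], GENDER_CODES))
```

Because the opinion codes follow the natural order of the categories, comparisons such as "code 4 is more favourable than code 2" are meaningful; for a nominal variable such as gender the codes are only labels, and arithmetic on them is not meaningful.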
Discrete: A variable is discrete if its numerical values can be counted.
Continuous: A variable whose values result from continuous measurement.
Cross-sectional data: Data on a population at a distinct point in time, such as opinions about an election.
Time series data: Data collected across time, e.g. the value of a stock on a daily basis.
Frequency table: A list of the number of observations in each category.
Histogram: A bar chart of these frequencies.
Name             Gender  Domestic gross  Foreign gross  Salary
Mel Gibson       M       91              95             19
Bruce Willis     M       55              99             16
Angela Bassett           32              17             2.5
Jessica Lange            21              27             2.5
Julia Roberts            57              47             12
Nicole Kidman            55              57             13
Steps for creating a histogram and frequency table:
1. Start Excel and enter the required data in row-and-column format.
2. Place the cursor anywhere in the data.
3. Open the Tools menu and select Data Analysis.
4. Select the Histogram option and press OK.
5. Select the input range of the data for which the frequency table and histogram are required.
6. The bin range is optional; if it is omitted, the software chooses suitable bins between the lowest and highest values.
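The same frequency table can be sketched outside Excel. The Python below is a minimal illustration, not the Excel tool itself: it assumes each bin is identified by its upper limit, counts a value in the first bin whose limit is greater than or equal to it, and collects anything above the last limit under "More". The function name `frequency_table` and the bin limits 5, 10, 15 are assumptions; the salaries are the six values from the actors table.

```python
from bisect import bisect_left


def frequency_table(data, bin_limits):
    """Count observations per bin, where each bin is named by its upper limit.

    A value v falls in the first bin whose upper limit is >= v; values above
    the last limit are counted under "More".
    """
    limits = sorted(bin_limits)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # bisect_left returns the index of the first limit that is >= value,
        # or len(limits) when the value exceeds every limit ("More").
        counts[bisect_left(limits, value)] += 1
    labels = [str(limit) for limit in limits] + ["More"]
    return list(zip(labels, counts))


# Salaries taken from the actors table; the bin limits are illustrative.
salaries = [19, 16, 2.5, 2.5, 12, 13]
table = frequency_table(salaries, [5, 10, 15])

# Print the table with a crude text bar for each frequency.
for label, count in table:
    print(f"{label:>5} {count:>3} {'#' * count}")
```

Choosing the bin limits by hand, as here, corresponds to supplying the optional bin range; leaving it out in Excel lets the software pick the limits.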
More examples of data, frequency, and histogram: diameters of parts (normal distribution)
Bin (limits shown rounded to two decimals)   Frequency
0.45    1
0.45    2
0.46    1
0.47    8
0.47    10
0.48    14
0.48    21
0.49    29
0.49    39
0.50    33
0.50    42
0.51    38
0.51    39
0.52    36
0.52    34
0.53    13
0.53    19
0.54    9
0.54    5
More    6
[Histogram: Frequency (axis 0 to 45) versus Bin]
More examples of data, frequency, and histogram: arrivals to a bank (skewed to the right)
Bin       Frequency
0.008     1
1.713     106
3.419     70
5.124     36
6.830     30
8.535     20
10.241    14
11.946    4
13.652    6
15.357    6
17.063    1
18.768    1
20.474    1
22.179    2
23.885    0
25.590    0
27.296    1
More      1
Most observations fall in the lowest bins and the long tail extends toward the larger values, so this distribution is skewed to the right.
[Histogram: Frequency (axis 0 to 120) versus Bin]
More examples of data, frequency, and histogram: distribution of accounting midterm scores (skewed to the left)
Bin     Frequency
43      1
50      2
57      3
64      3
71      4
78      17
85      11
92      23
More    15
Most scores fall in the highest bins and the long tail extends toward the lower scores, so this distribution is skewed to the left.
[Histogram: Frequency (axis 0 to 25) versus Bin]
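The direction of skew can be checked numerically from a frequency table. The sketch below computes a moment-based skewness coefficient from grouped data; a positive result indicates a longer right tail and a negative result a longer left tail. It rests on stated assumptions: each bin is represented by its upper limit, the "More" bin is given a value one bin width beyond the last limit (29.001 and 99 here), and the function name `grouped_skewness` is illustrative.

```python
def grouped_skewness(values, frequencies):
    """Skewness of grouped data: third central moment divided by sd cubed."""
    n = sum(frequencies)
    mean = sum(v * f for v, f in zip(values, frequencies)) / n
    m2 = sum(f * (v - mean) ** 2 for v, f in zip(values, frequencies)) / n
    m3 = sum(f * (v - mean) ** 3 for v, f in zip(values, frequencies)) / n
    return m3 / m2 ** 1.5


# Bank arrivals: bin limits from the table; 29.001 is an assumed stand-in for "More".
bank_bins = [0.008, 1.713, 3.419, 5.124, 6.830, 8.535, 10.241, 11.946, 13.652,
             15.357, 17.063, 18.768, 20.474, 22.179, 23.885, 25.590, 27.296, 29.001]
bank_freq = [1, 106, 70, 36, 30, 20, 14, 4, 6, 6, 1, 1, 1, 2, 0, 0, 1, 1]

# Midterm scores: bin limits from the table; 99 is an assumed stand-in for "More".
score_bins = [43, 50, 57, 64, 71, 78, 85, 92, 99]
score_freq = [1, 2, 3, 3, 4, 17, 11, 23, 15]

print("bank arrivals :", grouped_skewness(bank_bins, bank_freq))     # positive
print("midterm scores:", grouped_skewness(score_bins, score_freq))   # negative
```

The exact values depend on the stand-in chosen for the "More" bin, but the signs do not change for any reasonable choice, since they are driven by where the bulk of the observations sits relative to the tail.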
[Histogram: Frequency (axis 0 to 80) versus Bin, with bin limits from 0.486 to 0.604 plus More]
Using StatPro
1. Install StatPro and check StatPro in the list of Excel add-ins; StatPro then appears on the command menu.
2. Open the actors and actresses file in Excel.
3. Select StatPro, then the StatPro/Charts/Histogram menu.
4. A list of the numerical variables in the data set appears.
5. Enter the number of categories, the minimum, and the category length.
6. The histogram appears in a chart, together with the frequency data.
Category   Upper limit   Frequency
<=2        2             2
2-4        4             15
4-6        6             11
6-8        8             12
8-10       10            9
10-12      12            3
12-14      14            3
14-16      16            2
16-18      18            3
18-20      20            6
>20                      0
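The category labels in this table follow directly from three numbers: the upper limit of the first category, the category length, and the number of categories. The Python sketch below shows that construction; the function name `category_labels` and its parameter names are hypothetical and are not StatPro's own, and the frequencies are copied from the table.

```python
def category_labels(first_upper, length, n_categories):
    """Build labels such as '<=2', '2-4', ..., '>20' from equal-width categories.

    first_upper  : upper limit of the leftmost (open-ended) category
    length       : width of each interior category
    n_categories : total number of categories, including both open-ended ones
    """
    uppers = [first_upper + i * length for i in range(n_categories - 1)]
    labels = [f"<={uppers[0]}"]
    labels += [f"{lo}-{hi}" for lo, hi in zip(uppers, uppers[1:])]
    labels.append(f">{uppers[-1]}")
    return labels


labels = category_labels(first_upper=2, length=2, n_categories=11)
frequencies = [2, 15, 11, 12, 9, 3, 3, 2, 3, 6, 0]  # from the table

for label, freq in zip(labels, frequencies):
    print(f"{label:>6} {freq:>3}")
print("total observations:", sum(frequencies))
```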
[Histogram charts: Frequency versus Category]
Scatter plot: Contains a point for each observation, positioned according to the values of two selected variables. The resulting plot shows the relationship between the two variables.
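To make the construction concrete, the sketch below places one point per observation on a coarse character grid, using only the Python standard library. It is an illustration rather than a charting tool: the function name `text_scatter` and the grid size are assumptions, it requires that each variable takes at least two distinct values, and the data are the domestic and foreign gross figures from the actors table.

```python
def text_scatter(xs, ys, width=40, height=10):
    """Return a list of text rows with a '*' for each (x, y) observation.

    Assumes xs and ys each contain at least two distinct values, so that
    both axes have a non-zero range to scale against.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]


# Domestic gross (x) and foreign gross (y) from the actors table.
domestic = [91, 55, 32, 21, 57, 55]
foreign = [95, 99, 17, 27, 47, 57]

lines = text_scatter(domestic, foreign)
for line in lines:
    print("|" + line)
print("+" + "-" * 40)
```

Points that drift from the lower left to the upper right suggest that the two variables tend to increase together; with only six observations this is a visual hint, not a conclusion.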
[Time series plot: Revenue (axis 600 to 4800) by Quarter, Q1-92 through Q4-95]
Assignment No. 5
Do any four examples and discuss their results