Attribution Non-Commercial (BY-NC)

91 vues

Attribution Non-Commercial (BY-NC)

- Cochin Port Employee Engagement
- BA201 Engineering Mathematic UNIT2 - Measures of Central Tendency
- Transportation Statistics: table b 12 a
- Transportation Statistics: table b 12 b
- Learning Activity 5.19 Psy
- IJER_2014_1004.pdf
- F06101finalass
- 10 Math Statistic
- Introduction to psychology
- 3
- TInspire Core
- Arman_Abdul_Razak
- SEC_case_study
- phase III
- 6-27
- Handout
- Theoritical Statistics Assignment Help
- 3.Random Variables Expectation Transformation
- presentation 3.pptx
- scan.313150343.pdf

Vous êtes sur la page 1sur 66

Numerical Descriptive

Techniques

Introduction…

Recall Chapter 2, where we used graphical techniques to

describe data:

questions (e.g. what is the class average? what is the mark spread?) go

unanswered.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.2

Numerical Descriptive Techniques…

Measures of Central Location

Mean, Median, Mode

Measures of Variability

Range, Standard Deviation, Variance, Coefficient of Variation

Percentiles, Quartiles

Covariance, Correlation, Least Squares Line

Measures of Central Location…

The arithmetic mean, a.k.a. average, shortened to mean, is

the most popular & useful measure of central location.

dividing by the total number of observations:

Mean =

Number of observations

Notation…

When referring to the number of observations in a

population, we use uppercase letter N

sample, we use lower case letter n

letter “mu”:

“x-bar”:

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.5

Statistics is a pattern language…

Population Sample

Size N n

Mean

Arithmetic Mean…

Sample Mean

Population Mean

Statistics is a pattern language…

Population Sample

Size N n

Mean

The Arithmetic Mean…

…is appropriate for describing measurement data, e.g.

heights of people, marks of student papers, etc.

E.g. as soon as a billionaire moves into a neighborhood, the

average household income increases beyond what it was

previously!

Measures of Central Location…

The median is calculated by placing all the observations in

order; the observation that falls in the middle is the median.

Sort them bottom to top, find the middle:

0 0 5 7 8 9 12 14 22

simple average between 8 & 9:

0 0 5 7 8 9 12 14 22 33

median = (8+9)÷2 = 8.5

Sample and population medians are computed the same way.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.10

Measures of Central Location…

The mode of a set of observations is the value that occurs

most frequently.

A set of data may have one mode (or modal class), or two, or

more modes.

Mode is a useful for all data types, though mainly used for

nominal data.

For large data sets the modal class is much more relevant

than a single-value mode.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.11

Mode…

E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10

The mode for this data set is 0. How is this a measure of

“central” location?

A modal class

Frequency

Variable

=MODE(range) in Excel…

Note: if you are using Excel for your data analysis and your

data is multi-modal (i.e. there is more than one mode), Excel

only calculates the smallest one.

determine if your data is bimodal, trimodal, etc.

Mean, Median, Mode…

If a distribution is symmetrical,

the mean, median and mode may coincide…

median

mode

mean

Mean, Median, Mode…

If a distribution is asymmetrical, say skewed to the left or to

the right, the three measures may differ. E.g.:

median

mode

mean

Mean, Median, Mode…

If data are symmetric, the mean, median, and mode will be

approximately the same.

for each subgroup.

Mean, Median, & Modes for Ordinal & Nominal Data…

not valid.

determining highest frequency but not “central location”.

Geometric Mean…

The geometric mean is used when the variable is a growth

rate or rate of change, such as the value of an investment

over periods of time.

then

The geometric mean Rg of the returns R1, R2, … Rn is defined

such that:

Finance Example…

Suppose a 2-year investment of $1,000 grows by 100% to $2,000 in

the first year, but loses 50% from $2,000 back to the original $1,000 in

the second year. What is your average return?

investment, not $1,000.

Solving for the geometric mean yields a rate of 0%, which is correct.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.19

Measures of Central Location •

Summary…

Compute the Mean to

• Describe the central location of a single set of interval data

• Describe the central location of a single set of interval or ordinal

data

• Describe a single set of nominal data

• Describe a single set of interval data based on growth rates

Measures of Variability…

Measures of central location fail to tell the whole story about

the distribution; that is, how much are the observations

spread out around the mean value?

For example, two sets of

class grades are shown. The

mean (=50) is the same in

each case…

greater variability than the

blue class.

Range…

The range is the simplest measure of variability, calculated

as:

E.g.

Data: {4, 4, 4, 4, 50} Range = 46

Data: {4, 8, 15, 24, 39, 50} Range = 46

The range is the same in both cases,

but the data sets have very different distributions…

Range…

Its major advantage is the ease with which it can be

computed.

the dispersion of the observations between the two end

points.

all the data and not just two observations. Hence…

Variance…

Variance and its related measure, standard deviation, are

arguably the most important statistics. Used to measure

variability, they also play a vital role in almost all statistical

inference procedures.

(Lower case Greek letter “sigma” squared)

(Lower case “S” squared)

Statistics is a pattern language…

Population Sample

Size N n

Mean

Variance

Variance…

population mean

population size

sample mean

Variance…

As you can see, you have to calculate the sample mean (x-

bar) in order to calculate the sample variance.

sample variance directly from the data without the

intermediate step of calculating the mean. Its given by:

Application…

Example 4.7. The following sample consists of the number of

jobs six students applied for: 17, 15, 23, 7, 9, 13.

Finds its mean and variance.

students applied for: 17, 15, 23, 7, 9, 13.

Finds its mean and variance.

…as opposed to µ or σ 2

Sample Mean & Variance…

Sample Mean

Sample Variance

Standard Deviation…

The standard deviation is simply the square root of the

variance, thus:

Statistics is a pattern language…

Population Sample

Size N n

Mean

Variance

Standard

Deviation

Standard Deviation…

Consider Example 4.8 where a golf club manufacturer has

designed a new club and wants to determine if it is hit more

consistently (i.e. with less variability) than with an old club.

Using Tools > Data Analysis… > Descriptive Statistics in

Excel, we produce the following tables for interpretation…

consistent

distance with the

new club.

Interpreting Standard Deviation…

The standard deviation can be used to compare the variability of

several distributions and make a statement about the general shape

of a distribution. If the histogram is bell shaped, we can use the

Empirical Rule, which states:

deviation of the mean.

2) Approximately 95% of all observations fall within two standard

deviations of the mean.

3) Approximately 99.7% of all observations fall within three standard

deviations of the mean.

The Empirical Rule…

Approximately 68% of all observations fall

within one standard deviation of the mean.

within two standard deviations of the mean.

within three standard deviations of the mean.

Chebysheff’s Theorem…

A more general interpretation of the standard deviation is

derived from Chebysheff’s Theorem, which applies to all

shapes of histograms (not just bell shaped).

within k standard deviations of the mean is at least:

For k=2 (say), the theorem

states that at least 3/4 of all

observations lie within 2

standard deviations of the

mean. This is a “lower bound”

compared to Empirical Rule’s

approximation (95%).

Coefficient of Variation…

The coefficient of variation of a set of observations is the

standard deviation of the observations divided by their mean,

that is:

Statistics is a pattern language…

Population Sample

Size N n

Mean

Variance

Standard S

Deviation

Coefficient of CV cv

Variation

Coefficient of Variation…

This coefficient provides a

proportionate measure of variation, e.g.

the mean value is 100, but only moderately large when the

mean value is 500.

Measures of Variability…

If data are symmetric, with no serious outliers, use range and

standard deviation.

of variation.

used only for interval data.

Measures of Relative Standing & Box

Plots

Measures of relative standing are designed to provide

information about the position of particular values relative to

the entire data set.

are less than that value and (100-P)% are greater than that

value.

Suppose you scored in the 60th percentile on the GMAT,

that means 60% of the other scores were below yours,

while 40% of scores were above yours.

Quartiles…

We have special names for the 25th , 50th , and 75th percentiles,

namely quartiles.

median).

(tenths).

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.41

Commonly Used Percentiles…

First (lower) decile= 10th percentile

First (lower) quartile, Q1, = 25th percentile

Second (middle)quartile,Q2, = 50th percentile

Third quartile, Q3, = 75th percentile

Ninth (upper) decile = 90th percentile

that doesn’t mean you scored 80% on the exam – it means

that 80% of your peers scored lower than you on the exam; its

about your position relative to others.

Location of Percentiles…

The following formula allows us to approximate the location

of any percentile:

Location of Percentiles…

Recall the data from example 4.1:

0 0 5 7 8 9 12 14 22 33

which point are 25% of the values lower and 75% of the

values higher?

0 0 5 7 8 9 12 14 22 33

L25 = (10+1)(25/100) = 2.75

The 25th percentile is three-quarters of the distance between the

second (which is 0) and the third (which is 5) observations. Three-

quarters of the distance is: (.75)(5 – 0) = 3.75

Because the second observation is 0, the 25th percentile is 0 + 3.75

= 3.75

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.44

Location of Percentiles…

What about the upper quartile?

0 0 5 7 8 9 12 14 22 33

ninth observations, which are 14 and 22, respectively. One-quarter

of the distance is: (.25)(22 - 14) = 2, which means the 75th

percentile is at: 14 + 2 = 16

Location of Percentiles…

Please remember…

position

2.75 16

0 0 | 5 7 8 9 12 14 | 22 33

position

3.75 8.25

value lies, not the value of the percentile itself.

Interquartile Range…

The quartiles can be used to create another measure of

variability, the interquartile range, which is defined as

follows:

Interquartile Range = Q3 – Q1

50% of the observations.

Large values of this statistic mean that the 1st and 3rd

quartiles are far apart indicating a high level of variability.

Box Plots…

The box plot is a technique that graphs five statistics:

• the minimum and maximum observations, and

Whisker

Whisker (1.5*(Q3–Q1))

The lines extending to the left and right are called whiskers. Any points that

lie outside the whiskers are called outliers. The whiskers extend outward to

the smaller of 1.5 times the interquartile range or to the most extreme point

that is not an outlier.

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.48

Box Plots…

These box plots are based on

data in Xm04-15.

shortest and least variable.

variability, while Jack-in-

the-Box has the longest

service times.

Measures of Linear Relationship…

We now present two numerical measures of linear

relationship that provide information as to the strength &

direction of a linear relationship between two variables (if

one exists).

Covariance - is there any pattern to the way two variables

move together?

Coefficient of correlation - how strong is the linear

relationship between two variables?

Covariance…

population mean of variable X, variable Y

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.51

Covariance…

In much the same way there was a “shortcut” for

calculating sample variance without having to calculate the

sample mean, there is also a shortcut for calculating sample

covariance without having to first calculate the mean:

Statistics is a pattern language…

Population Sample

Size N n

Mean

Variance S2

Standard S

Deviation

Coefficient of CV cv

Variation

Covariance Sxy

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.53

Covariance Illustrated…

Consider the following three sets of data (textbook §4.5)…

In each set, the values of X are the same, and the value for Y are the

same; the only thing that’s changed is the order of the Y’s.

In set #1, as X increases so does Y; Sxy is large & positive

In set #2, as X increases, Y decreases; Sxy is large & negative

In set #3, as X increases, Y doesn’t move in any particular way; Sxy is

“small”

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.54

Covariance… (Generally speaking)

When two variables move in the same direction (both increase or both

decrease), the covariance will be a large positive number.

large negative number.

Coefficient of Correlation…

The coefficient of correlation is defined as the covariance

divided by the standard deviations of the variables:

Greek

letter

“rho”

How strong is the association between X and

Y?

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.56

Statistics is a pattern language…

Population Sample

Size N n

Mean

Variance S2

Standard S

Deviation

Coefficient of CV cv

Variation

Covariance Sxy

Coefficient of r

Correlation

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.57

Coefficient of Correlation…

The advantage of the coefficient of correlation over covariance is that

it has fixed range from -1 to +1, thus:

If the two variables are very strongly positively related, the coefficient

value is close to +1 (strong positive linear relationship).

If the two variables are very strongly negatively related, the coefficient

value is close to -1 (strong negative linear relationship).

Coefficient of Correlation…

Strong

+1 positive linear relationship

ρ or

No linear relationship

0

r =

Strong

-1 negative linear relationship

Coefficient of Correlation

(Application)

Consider Example 4.16, where MBA grade point averages

are compared with GMAT scores. Is the GMAT score a

good predictor of MBA success?

Excel:

Tools > Data Analysis… > Covariance

Tools > Data Analysis… > Correlation

Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 4.60

GMAT & GPA Interpretation…

These two statistics tell us that there is a positive linear relationship

between GMAT score and GPA.

“moderately” strong.

Least Squares Method…

Recall, the slope-intercept equation for a line is expressed in these

terms:

y = mx + b

Where:

m is the slope of the line

b is the y-intercept.

variables with covariance and the coefficient of correlation, can we

determine a linear function of the relationship?

The Least Squares Method…

…produces a straight line drawn through the points so that

the sum of squared deviations between the points and the

line is minimized. This line is represented by the equation:

b1 is the slope, and

(“y” hat) is the value of y determined by the line.

The Least Squares Method…

The coefficients b0 and b1 are given by:

Least Squares Line…

Example 4.17 :: find the least squares line for the previous

example (e.g. MBA / GMAT).

Least Squares Line…

- Cochin Port Employee EngagementTransféré parRaheena.N
- BA201 Engineering Mathematic UNIT2 - Measures of Central TendencyTransféré parAh Tiang
- Transportation Statistics: table b 12 aTransféré parBTS
- Transportation Statistics: table b 12 bTransféré parBTS
- Learning Activity 5.19 PsyTransféré paryolanda Sitepu
- IJER_2014_1004.pdfTransféré parInnovative Research Publications
- F06101finalassTransféré parapi-26315128
- 10 Math StatisticTransféré parAjay Anand
- Introduction to psychologyTransféré parShakeel Khan
- 3Transféré parapi-476696603
- TInspire CoreTransféré parthor1s
- Arman_Abdul_RazakTransféré parap060108
- SEC_case_studyTransféré parSaeesh Prabhu Desai
- phase IIITransféré parAngelina Diaz
- 6-27Transféré parSunu Pradana
- HandoutTransféré parFariza Makhay
- Theoritical Statistics Assignment HelpTransféré parStatisticsAssignmentExperts
- 3.Random Variables Expectation TransformationTransféré parSurendra Swami
- presentation 3.pptxTransféré parLevi Pogi
- scan.313150343.pdfTransféré parphisitlai
- Stats assessmentTransféré parJames Lamb
- Rocas - Abrasividad 3Transféré parGabriel1976ipc
- NITKclass1Transféré par'-MubashirBaig
- odisha_tourism.pdfTransféré parsatyadeep mohanty
- ch03Transféré parTanya Yablonskaya
- January 2010 QP - S1 EdexcelTransféré parWambui Kahende
- CharacterizationofHeavyMetalAirPollutioninRomaniaUsingMossBiomonitoringNeutronActivationAnalysisandAtomicAbsorptionSpectrometryTransféré parAnonymous bjD4fCi
- interpreting categorical quantitative data unit mapTransféré parapi-343090158
- titleTransféré parMaysie Angily Tapia
- Chapter4Transféré parJhudy Phot

- 11 Economics Impq Ch05 Measures of Central TendencyTransféré parnivedita_h42404
- Palma Inequality 2011Transféré parmruz7712
- MTH410 S14 Lecture 01 May 12 -Mo-WedTransféré parzaid
- ASTM D 1329 – 88 (Reapproved 1998) Evaluating Rubber Property—Retraction at LowerTransféré paralin2005
- biostatistics-140127003954-phpapp02Transféré parNitya Krishna
- Chapter 07Transféré parEstefania Robles
- Particle technologyTransféré parYu Gen Xin
- 131-0105Transféré parapi-27548664
- Chapter 3 Descriptive StatisticsTransféré parPhillipBattista
- Handout 04 Data DescriptionTransféré parakbar
- 2- Disruptive Methods for Assessing Soil StructureTransféré parAlireza
- V1I8_IJERTV1IS8003Transféré parIwan S. Gaurifa
- Lowest Value of Variance Can BeTransféré parRameez Abbasi
- Com Med CompiledTransféré parbharati adhikari
- Quantitative and Qualitative ApproachesTransféré parBerry Cheung
- Chap 001 COMPLETE BUSINESS STATISTICSTransféré parShekhar Saurabh Biswal
- Fingerprint Biometric Attendance SystemTransféré parRose Kimberly Bagtas Lucban-Carlon
- SLC _ OPT Math _ Arithmetic ProgressionTransféré parwww.bhawesh.com.np
- Microsoft_Word_-_P6LabManual1a1-libre.pdfTransféré parabdala67
- Lecture NoteTransféré parOlaolu Olalekan
- 2004Transféré parMohammad Salim Hossain
- Engineering MathematicsTransféré parAsif Mohammed
- bus173_chap03Transféré parKaziRafi
- Antibiotice Si SCCTransféré parDaniela Cirnatu
- QTB Study Pack Final FinalTransféré parQaiser Iqbal
- Dcom203 Dmgt204 Quantitative Techniques iTransféré parKabutu Chuunga
- Measures of Central TendencyTransféré parPrabhakar Dash
- PHYSICAL PROPERTIES OF RED KIDNEY BEANS.docxTransféré parskittler jj
- lesson 1.pdfTransféré parShisi Shishi
- BcaTransféré parVishal Shinde