
TMATH 390: Chapter 2 A: Numerical summaries for data.

In chapter 1 we covered a few graphical summaries of distributions.


In this chapter we'll cover some numerical summaries of distributions.
For every summary, there are two cases to consider:
are we using the number to summarize a data distribution or a theoretical population distribution?
In the former case, we're dealing with a statistic.

In the latter case, we're dealing with a parameter.

First we'll cover the numerical summaries for data distributions.


(2.1: Measures of Center)
Question: what tool(s) did we have for graphically displaying the distribution of a quantitative
variable?

Question: which of these can be defined for categorical variables? Explain.

Example: The data from the class heights appear below. Find the mean, median and mode.

Definition: A measure is resistant if it is not sensitive to extreme observations.

Example: Consider two data sets: A = {1, 2, 3, 3, 3, 4, 6, 7} and B = {1, 2, 3, 3, 3, 4, 6, 15}.


1. Find the mean, median, and mode of A and B.

2. Based on your previous work, of the three measures of center, are any resistant? Explain.

3. Do the class heights appear to have any extreme observations?
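As a way to check hand calculations for data sets A and B, here is a minimal sketch using Python's standard `statistics` module. The helper name `center_summary` is made up for this illustration and is not part of the course materials.

```python
from statistics import mean, median, multimode

# Data sets A and B from the example above.
A = [1, 2, 3, 3, 3, 4, 6, 7]
B = [1, 2, 3, 3, 3, 4, 6, 15]

def center_summary(data):
    """Return the mean, median, and list of modes of a data set."""
    return {
        "mean": mean(data),
        "median": median(data),
        "modes": multimode(data),  # a list, since a data set can have several modes
    }

summary_A = center_summary(A)
summary_B = center_summary(B)
for name, summary in (("A", summary_A), ("B", summary_B)):
    print(name, summary)
```

Comparing the two printed summaries shows which measures change when the largest value 7 is replaced by 15.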


The trimmed mean is a compromise between the mean x̄ and the median x̃. To find it:
(a) Order the observations smallest to largest.
(b) Choose a trimming percentage 100r%, where 0 ≤ r ≤ 0.5. Trim the most extreme 100r% of the n
values from the data, half from the top and half from the bottom. For example, let r = 0.20
and suppose n = 40. We have (40)(0.20) = 8, so 8/2 = 4 observations are trimmed from each of
the top and the bottom of the distribution. The mean of the remaining observations is the
trimmed mean.
(c) The mean and median are special cases of the trimmed mean. Why?
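Steps (a) and (b) can be sketched in a few lines of Python. This follows the convention in these notes, where r is the total fraction trimmed and half of it comes off each end; other references and libraries often define the trimming fraction per end instead. The function name `trimmed_mean` and the choice to round down when r·n/2 is not a whole number are assumptions made for this sketch.

```python
def trimmed_mean(data, r):
    """Trimmed mean with total trimming fraction r, half taken from each end."""
    if not 0 <= r <= 0.5:
        raise ValueError("r must satisfy 0 <= r <= 0.5")
    ordered = sorted(data)                 # step (a): order the observations
    n = len(ordered)
    # Observations dropped from EACH end. Rounding down is an assumption;
    # the tiny tolerance guards against floating-point error in r * n.
    k = int(r * n / 2 + 1e-9)
    kept = ordered[k:n - k]                # step (b): trim k values from each end
    return sum(kept) / len(kept)

# The example from the notes: n = 40, r = 0.20 trims 4 values from each end.
values = list(range(1, 41))
print(trimmed_mean(values, 0.20))

# Data set B from section 2.1, trimming one value from each end.
B = [1, 2, 3, 3, 3, 4, 6, 15]
print(trimmed_mean(B, 0.25))
```

With r = 0 nothing is trimmed and the function returns the ordinary mean.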

(2.2: Measures of Variability)

Question: Why the n − 1 in the denominator?


This is called Bessel's correction for the sample variance. We'll talk more about unbiased estimators
later, but suffice it to say, it's the right thing to divide by when we want to do inference with the
sample variance.
Question: Why the square root? Why not always use the variance?

Defining formula versus computational formula:
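The formulas themselves did not survive in this copy of the notes, so the sketch below uses the standard textbook forms as an assumption: the defining formula s² = Σ(xᵢ − x̄)² / (n − 1) and the computational formula s² = (Σxᵢ² − (Σxᵢ)²/n) / (n − 1). The function names are made up for this illustration. The code computes both for data set A from section 2.1 to show that they agree.

```python
from math import sqrt

def variance_defining(data):
    """Sample variance from squared deviations about the mean (n - 1 denominator)."""
    n = len(data)
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

def variance_computational(data):
    """Sample variance from the sum and the sum of squares (n - 1 denominator)."""
    n = len(data)
    total = sum(data)
    total_of_squares = sum(x * x for x in data)
    return (total_of_squares - total ** 2 / n) / (n - 1)

A = [1, 2, 3, 3, 3, 4, 6, 7]
s2_def = variance_defining(A)
s2_comp = variance_computational(A)
s = sqrt(s2_def)  # the sample standard deviation, in the same units as the data

print(s2_def, s2_comp, s)
```

The two formulas are algebraically identical; the computational form is convenient by hand, while the defining form tends to be less prone to rounding error on a computer.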

Which of the above measures are resistant?



(2.3: More Detailed Summary Quantities)


For p in [0, 100], the pth percentile of a quantitative distribution is a value such that p% of the
observations are less than or equal to it.
For q in [0, 1], the q quantile of a quantitative distribution is a value such that a proportion q of
the observations is less than or equal to it.
So percentiles and quantiles are related!
The 0.5 quantile is the 50th percentile, which is also called the median.

Common percentiles are the quartiles (quarter = one-fourth):


Q1 is the first quartile, aka the 25th percentile, or the 0.25 quantile.

Q2 is the second quartile, aka the 50th percentile.

Q3 is the third quartile, aka the 75th percentile.

Note: for data distributions, there is no universally agreed upon definition of quartile. This isn't a
problem if n is even, but it is a problem if n is odd.
Your book: includes the median in the lower half and upper half (this is the class definition).
Your calculator: the Texas Instruments calculators do not include the median in the upper and
lower halves.
What does R do? Find out.
Example: S = {1, 1, 3, 5, 6}.
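To see how the two conventions in the note differ on S, here is a minimal Python sketch. The function `quartiles` and its `include_median` flag are hypothetical names for this illustration; the "calculator" branch only implements the rule as it is described above and has not been checked against an actual calculator. The sketch says nothing about what R does.

```python
from statistics import median

def quartiles(data, include_median):
    """Return (Q1, median, Q3), with Q1 and Q3 the medians of the lower and upper halves.

    include_median only matters when n is odd: True puts the median in both
    halves (the class definition), False leaves it out of both.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[n - half:]
    return median(lower), median(ordered), median(upper)

S = [1, 1, 3, 5, 6]
book = quartiles(S, include_median=True)    # halves {1, 1, 3} and {3, 5, 6}
calc = quartiles(S, include_median=False)   # halves {1, 1} and {5, 6}
print("class definition:", book)
print("median excluded: ", calc)
```

For this S the two conventions agree on Q1 but give different values of Q3.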

Definition: The distance between the first and third quartiles tells us how
spread out the middle 50% of the data are. This is a commonly
used measure of spread. So common, in fact, it has a name:
the interquartile range. It is denoted by IQR, and IQR = Q3 − Q1.
Question: is the IQR resistant? Explain.

Definition: The five-number summary consists of the minimum (min), Q1, the median x̃, Q3,
and the maximum (max).
Definition: An outlier is an observation that appears to fall
outside the overall pattern of the data.
Possible causes:

It is important to look at your data to identify potential outliers.


A numerical rule-of-thumb is sometimes employed to identify potential outliers as well.
This rule of thumb says: any observation more than 1.5 × IQR below Q1 or above Q3 is a potential outlier.
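The rule of thumb can be sketched in Python as follows. The function name `iqr_rule` is made up for this illustration, and the quartiles use the class definition (for odd n, the median is included in both halves). The class height data are not reproduced here, so the sketch uses data sets A and B from section 2.1 instead.

```python
from statistics import median

def iqr_rule(data):
    """Five-number summary plus potential outliers under the 1.5 x IQR rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    # Class definition: when n is odd, the median belongs to both halves.
    lower = ordered[:half + n % 2]
    upper = ordered[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "five_number": (ordered[0], q1, median(ordered), q3, ordered[-1]),
        "fences": (low_fence, high_fence),
        "potential_outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

A = [1, 2, 3, 3, 3, 4, 6, 7]
B = [1, 2, 3, 3, 3, 4, 6, 15]
result_A = iqr_rule(A)
result_B = iqr_rule(B)
print(result_A)
print(result_B)
```

Observations flagged this way are only candidates: the rule is a screening device, and each flagged value still needs to be examined in context.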

Example: Find the five-number summary of the class height data and apply the IQR rule to it.

Boxplots versus modified boxplots


The five-number summary gives another way of visually summarizing data, using the so-called boxplot.
Boxplot of class heights:

Do the following worksheets:


Ch2-3 comparing histograms and boxplots.pdf
Ch2-3 using boxplots to compare dists.pdf
