Statistics (Chapter 3)
Lecture Objectives:
Review approaches to displaying data visually.
Review graphics that display key statistical features of measurements from a sample.
Define the distribution of a set of data.
Review common basic statistics.
• Extremes (Minimum and Maximum)
• Central Tendency (Mean, Median)
• Spread (Range, Variance, Standard Deviation)
Review less common basic statistics.
• Extremes (upper and lower quartiles)
• Central Tendency (Mode, Winsorized Mean)
• Spread (Interquartile Range)
STA6166-2-1
Graphics
“A picture is worth a
thousand words…”
STA6166-2-2
Objectives
As you create graphics, keep the following in mind.
STA6166-2-4
Example Data
STA6166-2-5
Candy data as Excel spreadsheet
STA6166-2-6
Column chart
[Figure: column chart of calories for each candy, vertical axis from 0 to 250; the candy labels on the horizontal axis overlap.]
Display the data table
STA6166-2-7
Alternate Display
Sorting and expanding the scale of the graph allows all
labels to be seen as well as displaying a characteristic of
the data.
Calories in Common Candies
[Figure: column chart of calories for each candy, sorted by calories, vertical axis from 0 to 250, with every candy label visible.]
STA6166-2-8
Vertical Display of Data
Calories in Common Candies
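[Figure: horizontal bar chart of calories. Labeled bars, top to bottom: Milk Chocolate Bar, Dark Chocolate Bar, Milk Chocolate Malted Milk Balls, Milk Chocolate Covered Raisins, Caramels, After Dinner Mint, Licorice Twists, Semi-Sweet Chocolate Chips, Starlight Mints, Lollipop, Chewing Gum.]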
[Figure: pie chart of SatFat, slices labeled as value (count, percent): 0 (14, 63.6%); 1 (3, 13.6%); 3 (3, 13.6%); 4 (1, 4.5%); 6 (1, 4.5%). A further label reads (9, 40.9%).]
10 60 60 60 60 60 70 130 140 140 160 160 160 160 160 160 180 180 200 210 210 210
STA6166-2-13
Range
Extremes
• Minimum(calories) = 10
• Maximum(calories) = 210
Range = 210 - 10 = 200
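The extremes and the range can be checked with a few lines of Python. This is a minimal sketch; the variable names are chosen here for illustration and the data are the 22 sorted calorie values shown on the slide.

```python
# Sorted calorie values copied from the slide (n = 22).
calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

minimum = min(calories)
maximum = max(calories)
data_range = maximum - minimum  # range = maximum - minimum

print(minimum, maximum, data_range)  # 10 210 200
```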
Trimmed mean = mean of the data after a fixed fraction of the smallest and
largest values has been removed. Usually the smallest 5% and the largest 5%
of the values (each count rounded to the nearest integer) are removed.
For the candy data, 5% of n=22 is 1.1, so one value is dropped from each tail (10 and 210).
Trimmed mean = 136.0 (with 10% trimmed, 5% in each tail).
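The trimmed mean described above can be sketched as follows. The function name `trimmed_mean` and its `tail_fraction` parameter are illustrative, not taken from any package, and the rounding rule follows the slide (count per tail rounded to the nearest integer).

```python
calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

def trimmed_mean(values, tail_fraction=0.05):
    """Mean after dropping a fraction of the values from each tail."""
    ordered = sorted(values)
    # Number of values dropped per tail, rounded to the nearest integer.
    # Note: Python's round() sends exact .5 cases to the nearest even integer.
    k = round(tail_fraction * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

plain_mean = sum(calories) / len(calories)
print(trimmed_mean(calories))  # 136.0 (drops 10 and 210)
```

For comparison, the untrimmed mean of the same data is 2940/22, a little under 134, so trimming the very low value 10 pulls the estimate up slightly.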
Here n=22 and (n+1)/4 = 23/4 = 5.75, so Q1 lies three quarters of the way between the 5th and 6th
observations in the sorted list. The 5th value is 60 and the 6th
value is 60, thus
Q1 = 60 + .75(60-60) = 60.
For Q2, (n+1)/2 = 23/2 = 11.5, i.e. halfway between the 11th and 12th observations.
Q2 = 160 + .5(160-160) = 160.
For Q3, 3(n+1)/4 = 3(23)/4 = 69/4 = 17.25, i.e. a quarter of the way between the 17th
and 18th observations.
Q3 = 180 + .25(180-180) = 180.
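The position rule used above, p(n+1) with linear interpolation between neighboring observations, can be sketched in Python. The helper name `quantile_n_plus_1` is hypothetical. As a cross-check, the standard library's `statistics.quantiles` with `method="exclusive"` (Python 3.8 or later) uses the same (n+1) positions.

```python
import statistics

calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

def quantile_n_plus_1(sorted_values, p):
    """Value at position p*(n+1) in a sorted list, interpolating linearly."""
    n = len(sorted_values)
    position = p * (n + 1)      # 1-based position, e.g. 5.75 for Q1
    k = int(position)           # observation just below the position
    fraction = position - k     # how far to move toward the next observation
    if k < 1:
        return sorted_values[0]
    if k >= n:
        return sorted_values[-1]
    lower, upper = sorted_values[k - 1], sorted_values[k]
    return lower + fraction * (upper - lower)

q1, q2, q3 = (quantile_n_plus_1(calories, p) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3)  # 60.0 160.0 180.0

library_quartiles = statistics.quantiles(calories, n=4, method="exclusive")
print(library_quartiles)  # [60.0, 160.0, 180.0]
```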
10 60 60 60 60 60 70 130 140 140 160 160 160 160 160 160 180 180 200 210 210 210
STA6166-2-17
Percentiles
100pth Percentile: the value in a sorted list of the data that
has approximately 100p% of the measurements below it
and approximately 100(1-p)% above it, for 0<p<1. (Also called the p quantile.)
[Figure: distribution function.]
Examples:
Q1 = 25th percentile
Q2 = 50th percentile
Q3 = 75th percentile
STA6166-2-18
Simplified Quartiles
A simpler way to find Q1 & Q3 is as follows:
1. Order the data from the lowest to the highest value, and find the
median.
2. Divide the ordered data into the lower half and the upper half, using
the median as the dividing value. (Always exclude the median itself
from each half.)
3. Q1 is just the median of the lower half.
4. Q3 is just the median of the upper half.
Ex: For the candy data we still get Q1=60 and Q3=180.
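The four steps above can be sketched in Python using `statistics.median` from the standard library. The function name `simplified_quartiles` is illustrative. When n is odd the median is an actual observation and is left out of both halves; when n is even, as here, the data split cleanly into two halves.

```python
from statistics import median

calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

def simplified_quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation (the median itself).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = simplified_quartiles(calories)
print(q1, q2, q3)  # 60 160.0 180
```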
STA6166-2-19
Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Quartiles:
Q1 = 25th percentile = 60
Q2 = 50th percentile = median = 160
Q3 = 75th percentile = 180
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$$
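As a worked check of the spread measures for the candy data, the sketch below computes the interquartile range (Q3 minus Q1, using the quartiles 60 and 180 found earlier), the sample variance with the n-1 divisor, and the standard deviation as its square root. Variable names are illustrative; `statistics.variance` from the standard library is used only as a cross-check.

```python
import math
import statistics

calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

n = len(calories)
mean = sum(calories) / n

# Sample variance: sum of squared deviations from the mean, divided by n - 1.
sum_sq_dev = sum((y - mean) ** 2 for y in calories)
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

# Interquartile range from the quartiles computed for these data.
q1, q3 = 60, 180
iqr = q3 - q1

print(iqr)                 # 120
print(variance, std_dev)   # roughly 3662.3 and 60.5
print(math.isclose(variance, statistics.variance(calories)))  # True
```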
STA6166-2-22
Excel Data Analysis Tool
Select the Data Analysis Tool
Select Descriptive Statistics
The menu below appears.
Enter the Input Range and
check the output options
desired.
STA6166-2-23
Excel Descriptive Statistics Output
STA6166-2-24
Importing a text data file in standard format into Minitab
[Figure: Minitab window with three labeled areas: pull-down menus, a session worksheet with script commands, and a spreadsheet-like data area.]
STA6166-2-25
Computing Descriptive Stats
Descriptive Statistics
• A printer graph of the frequency table.
• Easy to do by hand.
• Quick visualization of the data.
Histogram of calories   N = 22
Midpoint  Count
      20      1  *
      40      0
      60      5  *****
      80      1  *
     100      0
     120      0
     140      3  ***
     160      6  ******
     180      2  **
     200      1  *
     220      3  ***
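The printer histogram above can be reproduced with a short Python sketch. One assumption is made here that the slide does not state: each bin of width 20 includes its lower edge and excludes its upper edge, so the bin with midpoint 60 covers 50 up to but not including 70. With that rule the counts match the output shown.

```python
from collections import Counter

calories = [10, 60, 60, 60, 60, 60, 70, 130, 140, 140, 160,
            160, 160, 160, 160, 160, 180, 180, 200, 210, 210, 210]

WIDTH = 20
FIRST_MIDPOINT = 20
LAST_MIDPOINT = 220

def midpoint_of(value):
    """Midpoint of the bin [midpoint - 10, midpoint + 10) containing value."""
    lowest_edge = FIRST_MIDPOINT - WIDTH // 2
    index = (value - lowest_edge) // WIDTH
    return FIRST_MIDPOINT + index * WIDTH

counts = Counter(midpoint_of(y) for y in calories)
midpoints = range(FIRST_MIDPOINT, LAST_MIDPOINT + 1, WIDTH)

print(f"Histogram of calories   N = {len(calories)}")
print("Midpoint  Count")
for m in midpoints:
    c = counts.get(m, 0)
    print(f"{m:8d}  {c:5d}  {'*' * c}")
```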
STA6166-2-28
Box Plot for Calories
[Figure: box plot of calories produced with SAS Proc Insight, with the maximum and minimum labeled.]
STA6166-2-29
Percentiles
100pth Percentile: the value in a sorted list of the data that
has approximately 100p% of the measurements below it
and approximately 100(1-p)% above it, for 0<p<1. (Also called the p quantile.)
[Figure: smoothed histogram.]
Examples:
Q1 = 25th percentile
Q2 = 50th percentile
Q3 = 75th percentile
STA6166-2-30
Frequency Histogram
A graphical presentation of the frequency table where the relative
areas of the bars are in proportion to the frequencies.
[Figure: frequency histogram of calories; vertical axis Frequency, horizontal axis calories, with the bin width marked.]
STA6166-2-31
Density Histogram
Histograms have
important ties to
probability.
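One way to see the tie to probability is to rescale the bar heights so that the total area of the bars equals 1. The sketch below assumes the usual definition, density = count / (n × bin width), which the slide itself does not spell out, and reuses the bin counts from the printer histogram of calories (bin width 20). The area over any set of bins is then the proportion of observations falling in those bins.

```python
# Bin midpoints and counts taken from the printer histogram of calories.
WIDTH = 20
counts = {20: 1, 40: 0, 60: 5, 80: 1, 100: 0, 120: 0,
          140: 3, 160: 6, 180: 2, 200: 1, 220: 3}

n = sum(counts.values())

# Bar height on the density scale: relative frequency divided by bin width.
density = {m: c / (n * WIDTH) for m, c in counts.items()}

# Area of each bar is height * width; the areas add up to 1.
total_area = sum(d * WIDTH for d in density.values())

# Area over the bins with midpoint 160 or more estimates the proportion
# of candies in those bins (12 of the 22 observations).
area_upper = sum(d * WIDTH for m, d in density.items() if m >= 160)

print(n, round(total_area, 6), round(area_upper, 3))
```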
STA6166-2-32
Number of Bins for Histograms
[Figure: smoothed histogram or density curve.]
The lengths of the axes can change how the relationship is perceived.
[Figure: two scatter plots of calories versus totfat (totfat axis from 0 to 15), drawn with different axis lengths.]
STA6166-2-34
Matrix Plot
STA6166-2-35
Brushing the plot to identify interesting points.
Three-D Views
STA6166-2-36
Chernoff Faces
Displaying multiple variables symbolically.
STA6166-2-37