
Course : STAT6081 Statistics

Year : 2014

Session 2-3

DESCRIPTIVE STATISTICS
Topics
1. Introduction : Data and Statistics
2. Descriptive Statistics
3. Introduction to Probability
4. Discrete Probability Distributions
5. Continuous Probability Distributions
6. Sampling and Sampling Distributions
7. Interval Estimation
8. Hypothesis Tests
9. Analysis of Variance
10. Simple Linear Regression

Bina Nusantara University 3


References
Anderson, David R., Sweeney, Dennis J., Williams, Thomas A. (2011).
Statistics for Business and Economics. 11th ed. Cengage Learning. USA.
ISBN: 978-0538481649.



Evaluating and Grading



Learning Outcomes

LO 1: Explain the statistical data
LO 2: Interpret the results of statistical measurements
LO 3: Apply statistical methods to real problems
LO 4: Analyze the results obtained from statistical methods



IMPORTANT GSLC FOR MEETING 3

Find articles about descriptive statistics from news sources or
journals, then summarize their contents.
Submit the assignment in Binusmaya under meeting 3.



(1) Explain the statistical data

(2) Interpret the results of statistical measurements



1. Summarizing Qualitative Data

2. Summarizing Quantitative Data

3. Exploratory Data Analysis

4. Measures of Location

5. Measures of Variability
Frequency Distribution

Relative Frequency

Percent Frequency Distribution

Bar Graph

Pie Chart



(a) Frequency Distribution

A frequency distribution is a tabular summary of data showing the
frequency (or number) of items in each of several nonoverlapping
classes.

The objective is to provide insights about the data that cannot be
quickly obtained by looking only at the original data.



Example : Marada Inn
Guests staying at Marada Inn were asked to rate the quality of their
accommodations as being excellent, above average, average,
below average, or poor. The ratings provided by a sample of 20
guests are shown below.

Below Average, Average, Above Average,
Above Average, Above Average, Above Average,
Above Average, Below Average, Below Average,
Average, Poor, Poor,
Above Average, Excellent, Above Average,
Average, Above Average, Average,
Above Average, Average

(b) Relative Frequency Distribution

The relative frequency of a class is the fraction or proportion of
the total number of data items belonging to the class.

A relative frequency distribution is a tabular summary of a set
of data showing the relative frequency for each class.



(c) Percent Frequency Distribution

The percent frequency of a class is the relative frequency
multiplied by 100.

A percent frequency distribution is a tabular summary of a set of
data showing the percent frequency for each class.



Example : Marada Inn
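
The frequency, relative frequency, and percent frequency distributions can be tabulated directly from the raw ratings. The following is a minimal Python sketch, assuming the 20 ratings are transcribed from the Marada Inn sample; the function name `frequency_table` is an illustrative choice and not part of the course material.

```python
from collections import Counter

# The 20 guest ratings, transcribed in the order shown in the sample.
ratings = [
    "Below Average", "Average", "Above Average",
    "Above Average", "Above Average", "Above Average",
    "Above Average", "Below Average", "Below Average",
    "Average", "Poor", "Poor",
    "Above Average", "Excellent", "Above Average",
    "Average", "Above Average", "Average",
    "Above Average", "Average",
]

# Classes listed from worst to best so the table reads in a natural order.
classes = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]


def frequency_table(data, class_labels):
    """Return {label: (frequency, relative frequency, percent frequency)}."""
    counts = Counter(data)
    n = len(data)
    return {
        label: (counts[label], counts[label] / n, 100 * counts[label] / n)
        for label in class_labels
    }


table = frequency_table(ratings, classes)
for label, (freq, rel, pct) in table.items():
    print(f"{label:<15}{freq:>3}{rel:>7.2f}{pct:>6.0f}%")
```

Each percent frequency is the relative frequency multiplied by 100, so the three columns always describe the same split of the 20 guests.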



(d) Bar Graph
A bar graph is a graphical device for depicting qualitative data that
have been summarized in a frequency, relative frequency, or percent
frequency distribution.
On the horizontal axis we specify the labels that are used for each of
the classes.
A frequency, relative frequency, or percent frequency scale can be
used for the vertical axis.
Using a bar of fixed width drawn above each class label, we extend the
height appropriately.
The bars are separated to emphasize the fact that each class is a
separate category.



Example : Marada Inn

[Bar graph of the Marada Inn quality ratings. Horizontal axis: Rating (Poor, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency (scale 1 to 9).]
(e) Pie Chart
The pie chart is a commonly used graphical device for
presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative frequencies to subdivide
the circle into sectors that correspond to the relative frequency
for each class.
Since there are 360 degrees in a circle, a class with a relative
frequency of .25 would consume .25(360) = 90 degrees of the circle.
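
The sector rule can be checked with a few lines of Python. This is a sketch only, using the Marada Inn relative frequencies as input; the helper name `sector_angles` is hypothetical.

```python
# Relative frequencies of the Marada Inn quality ratings
# (10%, 15%, 25%, 45%, and 5% of the 20 guests).
relative_frequencies = {
    "Poor": 0.10,
    "Below Average": 0.15,
    "Average": 0.25,
    "Above Average": 0.45,
    "Excellent": 0.05,
}


def sector_angles(rel_freqs):
    """Map each class to the central angle (in degrees) of its pie sector."""
    return {label: rel * 360 for label, rel in rel_freqs.items()}


angles = sector_angles(relative_frequencies)
for label, angle in angles.items():
    print(f"{label:<15}{angle:6.1f} degrees")
```

Because the relative frequencies sum to 1, the sector angles sum to 360 degrees and the sectors fill the circle exactly.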



Example : Marada Inn

[Pie chart of quality ratings: Excellent 5%, Poor 10%, Below Average 15%, Average 25%, Above Average 45%.]
Insights Gained from the Preceding Pie Chart

One-half of the customers surveyed gave Marada a quality rating
of above average or excellent (looking at the left side of the
pie). This might please the manager.

For each customer who gave an excellent rating, there were
two customers who gave a poor rating (looking at the top of the
pie). This should displease the manager.



Frequency Distribution

Relative Frequency and Percent Frequency Distributions

Dot Plot

Histogram

Cumulative Distributions

Ogive



Example : Hudson Auto Repair
The manager of Hudson Auto would like to get a better picture of
the distribution of costs for engine tune-up parts. A sample of 50
customer invoices has been taken and the costs of parts, rounded
to the nearest dollar, are listed below.

91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73



(a) Frequency Distribution

Guidelines for Selecting Number of Classes

Use between 5 and 20 classes.

Data sets with a larger number of elements usually require a
larger number of classes.

Smaller data sets usually require fewer classes.



Guidelines for Selecting Width of Classes

Use classes of equal width.

Approximate Class Width =
(Largest Data Value - Smallest Data Value) / Number of Classes



Example : Hudson Auto Repair
Frequency Distribution
If we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5, rounded up to 10

Cost ($) Frequency


50-59 2
60-69 13
70-79 16
80-89 7
90-99 7
100-109 5
Total 50
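
The class-width guideline and the resulting table can be reproduced from the raw invoices. The sketch below assumes six classes and a first lower class limit of 50, as in the table above; the variable names are illustrative.

```python
import math

# Parts costs (in dollars) from the 50 sampled Hudson Auto invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

number_of_classes = 6
approx_width = (max(costs) - min(costs)) / number_of_classes  # (109 - 52) / 6
class_width = math.ceil(approx_width)  # round 9.5 up to a convenient width of 10

first_lower_limit = 50  # assumed starting point, matching the 50-59 class

frequencies = {}
for k in range(number_of_classes):
    lower = first_lower_limit + k * class_width
    upper = lower + class_width - 1  # whole-dollar data, so both limits are inclusive
    frequencies[f"{lower}-{upper}"] = sum(lower <= c <= upper for c in costs)

n = len(costs)
for label, freq in frequencies.items():
    print(f"{label:<8}{freq:>3}{100 * freq / n:>6.0f}%")
```

The printed percent column gives the percent frequency distribution for the same classes.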

Insights Gained from the Percent Frequency Distribution

Only 4% of the parts costs are in the $50-59 class.

30% of the parts costs are under $70.

The greatest percentage (32% or almost one-third) of the parts
costs are in the $70-79 class.

10% of the parts costs are $100 or more.



(b) Dot Plot

One of the simplest graphical summaries of data is a dot plot.

A horizontal axis shows the range of data values.

Then each data value is represented by a dot placed above the axis.



Example : Hudson Auto Repair

[Dot plot of the 50 parts costs. Horizontal axis: Cost ($), from 50 to 110.]



(c) Histogram
Another common graphical presentation of quantitative data is a
histogram.
The variable of interest is placed on the horizontal axis and the
frequency, relative frequency, or percent frequency is placed on the
vertical axis.
A rectangle is drawn above each class interval with its height
corresponding to the interval's frequency, relative frequency, or percent
frequency.
Unlike a bar graph, a histogram has no natural separation between
rectangles of adjacent classes.



Example : Hudson Auto Repair
[Histogram of the parts costs. Horizontal axis: Parts Cost ($), from 50 to 110. Vertical axis: Frequency (scale 2 to 18).]
(d) Cumulative Distribution
The cumulative frequency distribution shows the number of
items with values less than or equal to the upper limit of each
class.
The cumulative relative frequency distribution shows the
proportion of items with values less than or equal to the upper
limit of each class.
The cumulative percent frequency distribution shows the
percentage of items with values less than or equal to the upper
limit of each class.



Example : Hudson Auto Repair
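
The three cumulative distributions are running totals of the class frequencies. A minimal Python sketch, assuming the six-class frequency distribution of the Hudson Auto parts costs (2, 13, 16, 7, 7, 5):

```python
from itertools import accumulate

class_labels = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

# Running total of the class frequencies.
cumulative_frequency = list(accumulate(frequencies))
cumulative_relative = [cf / n for cf in cumulative_frequency]
cumulative_percent = [100 * cf / n for cf in cumulative_frequency]

for label, cf, crf, cpf in zip(
    class_labels, cumulative_frequency, cumulative_relative, cumulative_percent
):
    upper_limit = label.split("-")[1]
    print(f"<= {upper_limit:<5}{cf:>4}{crf:>7.2f}{cpf:>6.0f}%")
```

The last row always equals the sample size (50), a relative frequency of 1, and 100 percent.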



(e) Ogive
An ogive is a graph of a cumulative distribution.

The data values are shown on the horizontal axis.

Shown on the vertical axis are the:

cumulative frequencies, or

cumulative relative frequencies, or

cumulative percent frequencies

The cumulative frequency (one of the above) of each class is plotted as a point.

The plotted points are connected by straight lines.



Example : Hudson Auto Repair

Ogive
Because the class limits for the parts-cost data are 50-59, 60-69,
and so on, there appear to be one-unit gaps from 59 to 60, 69 to
70, and so on.
These gaps are eliminated by plotting points halfway between
the class limits.
Thus, 59.5 is used for the 50-59 class, 69.5 is used for the 60-69
class, and so on.
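
The plotting points of the ogive can be computed as follows. This is a sketch under stated assumptions: the class limits and frequencies are those of the parts-cost data, and the starting point at (49.5, 0) is a common convention that is not stated in the text above.

```python
from itertools import accumulate

class_limits = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99), (100, 109)]
frequencies = [2, 13, 16, 7, 7, 5]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]  # 60 - 59 = 1

# Plot each cumulative frequency halfway across the gap, e.g. at 59.5.
upper_boundaries = [upper + gap / 2 for _, upper in class_limits]
cumulative = list(accumulate(frequencies))

# Assumed convention: begin the ogive at the lower boundary with frequency 0.
start = (class_limits[0][0] - gap / 2, 0)
points = [start] + list(zip(upper_boundaries, cumulative))

for x, y in points:
    print(f"{x:6.1f}  {y:3d}")
```

Connecting these points with straight lines gives the ogive; passing them to any plotting library is enough to draw it.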

The techniques of exploratory data analysis consist of simple
arithmetic and easy-to-draw pictures that can be used to
summarize data quickly.

One such technique is the stem-and-leaf display.



(a) Stem-and-Leaf Display

A stem-and-leaf display shows both the rank order and shape of
the distribution of the data.
It is similar to a histogram on its side, but it has the advantage of
showing the actual data values.
The first digits of each data item are arranged to the left of a
vertical line.
To the right of the vertical line we record the last digit for each
item in rank order.
Each line in the display is referred to as a stem.
Each digit on a stem is a leaf.



Example : Hudson Auto Repair
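
A stem-and-leaf display for the parts costs can be built by splitting each value into its leading digits and its last digit. The sketch below assumes whole-number data with a leaf unit of 1; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Parts costs (in dollars) from the 50 sampled Hudson Auto invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stem_and_leaf(values):
    """Group whole numbers by stem (leading digits) with leaves in rank order."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # leaf unit = 1, so the leaf is the last digit
        display[stem].append(leaf)
    return dict(display)


display = stem_and_leaf(costs)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Reading the printed rows sideways gives the same shape as the histogram, while each leaf still shows the actual data value.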



(b) Stretched Stem-and-Leaf Display

If we believe the original stem-and-leaf display has condensed the
data too much, we can stretch the display by using two or more
stems for each leading digit(s).

Whenever a stem value is stated twice, the first value corresponds
to leaf values of 0-4, and the second value corresponds to leaf
values of 5-9.



Example : Hudson Auto Repair
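
The stretched display can be sketched by giving each stem two rows, one for leaves 0-4 and one for leaves 5-9. This is an illustrative Python version for the parts costs; the function name and the `(stem, half)` keys are assumptions made for the example.

```python
# Parts costs (in dollars) from the 50 sampled Hudson Auto invoices.
costs = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]


def stretched_stem_and_leaf(values):
    """Return {(stem, half): leaves}; half 0 holds leaves 0-4, half 1 holds 5-9."""
    display = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1
        display.setdefault((stem, half), []).append(leaf)
    return display


stretched = stretched_stem_and_leaf(costs)
for (stem, _half), leaves in stretched.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

A half-stem with no leaves is simply omitted in this sketch; a fuller version might print it as an empty row.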



Stem-and-Leaf Display

Leaf Units

A single digit is used to define each leaf.

In the preceding example, the leaf unit was 1.

Leaf units may be 100, 10, 1, 0.1, and so on.

Where the leaf unit is not shown, it is assumed to equal 1.



Example: Leaf Unit = 0.1



(c) Crosstabulations and Scatter Diagrams
Thus far we have focused on methods that are used to
summarize the data for one variable at a time.

Often a manager is interested in tabular and graphical methods
that will help understand the relationship between two variables.

Crosstabulation and a scatter diagram are two methods for
summarizing the data for two (or more) variables
simultaneously.



Crosstabulation
Crosstabulation is a tabular method for summarizing the data for
two variables simultaneously.

Crosstabulation can be used when:

One variable is qualitative and the other is quantitative

Both variables are qualitative

Both variables are quantitative

The left and top margin labels define the classes for the two
variables.



Example : Finger Lakes Homes

Crosstabulation

The number of Finger Lakes homes sold for each style and
price for the past two years is shown below.



Insights Gained from the Preceding Crosstabulation

The greatest number of homes in the sample (19) are a split-level
style and priced at less than or equal to $99,000.

Only three homes in the sample are an A-Frame style and priced
at more than $99,000.



Crosstabulation: Row or Column Percentages

Converting the entries in the table into row percentages or column
percentages can provide additional insight about the relationship
between the two variables.



Example : Finger Lakes Homes

Row Percentages

Price           Home Style
Range           Colonial   Ranch   Split-Level   A-Frame   Total

<= $99,000      32.73      10.91   34.55         21.82     100
>  $99,000      26.67      31.11   35.56          6.67     100

Note: row totals are actually 100.01 due to rounding.
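
Row percentages divide each cell count by its row total. In the sketch below the cell counts are an assumption: they are reconstructed from the row percentages together with the two counts quoted earlier (19 split-level homes at or below $99,000 and 3 A-Frame homes above $99,000), not copied from the original table.

```python
styles = ["Colonial", "Ranch", "Split-Level", "A-Frame"]

# Assumed cell counts, reconstructed from the published row percentages.
counts = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

row_percentages = {}
for price_range, row in counts.items():
    row_total = sum(row)
    # Each entry is the share of the row total, rounded to two decimals.
    row_percentages[price_range] = [round(100 * c / row_total, 2) for c in row]

print(f"{'Price Range':<12}" + "".join(f"{s:>13}" for s in styles))
for price_range, percentages in row_percentages.items():
    print(f"{price_range:<12}" + "".join(f"{p:>13.2f}" for p in percentages))
```

Column percentages follow the same pattern with each count divided by its column total instead.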



(d) Scatter Diagram

A scatter diagram is a graphical presentation of the relationship
between two quantitative variables.
One variable is shown on the horizontal axis and the other variable
is shown on the vertical axis.
The general pattern of the plotted points suggests the overall
relationship between the variables.



A Positive Relationship



A Negative Relationship

No Apparent Relationship



Example : Panthers Football Team



Example : Panthers Football Team
The preceding scatter diagram indicates a positive relationship
between the number of interceptions and the number of points
scored.
Higher points scored are associated with a higher number of
interceptions.
The relationship is not perfect: not all plotted points in the scatter
diagram lie on a straight line.



Tabular and Graphical Procedures



Example : Apartment Rents
Mean: x̄ = Σxi / n = 34,356 / 70 = 490.80
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
Median

The median is the measure of location most often reported for
annual income and property value data.

A few extremely large incomes or property values can inflate the
mean.

The median of a data set is the value in the middle when the data
items are arranged in ascending order.

For an odd number of observations, the median is the middle value.

For an even number of observations, the median is the average of the
two middle values.
Example : Apartment Rents
Median = 50th percentile
i = (p/100)n = (50/100)70 = 35
Because i is an integer, average the 35th and 36th data values:
Median = (475 + 475)/2 = 475



Mode

The mode of a data set is the value that occurs with greatest
frequency.

The greatest frequency can occur at two or more different
values.

If the data have exactly two modes, the data are bimodal.

If the data have more than two modes, the data are multimodal.



Example : Apartment Rents
450 occurred most frequently (7 times)
Mode = 450
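
The mean, median, and mode of the 70 apartment rents can be checked with Python's standard `statistics` module. This is a minimal sketch; the list `rents` is transcribed from the example data.

```python
import statistics

# Monthly rents (in dollars) for the sample of 70 apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = sum(rents) / len(rents)        # 34,356 / 70
median = statistics.median(rents)     # average of the 35th and 36th values
modes = statistics.multimode(rents)   # every value tied for the greatest frequency

print(f"mean   = {mean:.2f}")
print(f"median = {median}")
print(f"modes  = {modes}")
```

`multimode` returns a list, so bimodal or multimodal data would show up as more than one value.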



Percentiles

A percentile provides information about how the data are spread
over the interval from the smallest value to the largest value.

Admission test scores for colleges and universities are frequently
reported in terms of percentiles.



Example : Apartment Rents
90th Percentile
i = (p/100)n = (90/100)70 = 63
Averaging the 63rd and 64th data values:
90th Percentile = (580 + 590)/2 = 585
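
The index rule used in these examples can be written as a short function. The sketch below assumes the convention shown in the examples: compute i = (p/100)n, average the i-th and (i+1)-th values when i is a whole number, and otherwise round i up. The function name `percentile` is illustrative.

```python
import math

# Monthly rents (in dollars) for the sample of 70 apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(values, p):
    """p-th percentile using the index rule i = (p/100)n."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    i = p * len(ordered) / 100  # same as (p/100)n, ordered to limit rounding error
    if i.is_integer():
        # Whole number: average the i-th and (i+1)-th values (1-based positions).
        i = int(i)
        return (ordered[i - 1] + ordered[i]) / 2
    # Otherwise round i up and take the value in that position.
    return ordered[math.ceil(i) - 1]


for p in (25, 50, 75, 90):
    print(f"{p}th percentile = {percentile(rents, p)}")
```

Statistical libraries often interpolate between neighboring values instead, so their percentiles may differ slightly from the ones obtained with this rule.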


Quartiles

Quartiles are specific percentiles

First Quartile = 25th Percentile

Second Quartile = 50th Percentile = Median

Third Quartile = 75th Percentile



Example : Apartment Rents
Third Quartile
Third quartile = 75th percentile
i = (p/100)n = (75/100)70 = 52.5, rounded up to 53
Third quartile = 525


It is often desirable to consider measures of variability (dispersion),
as well as measures of location.

For example, in choosing supplier A or supplier B we might consider
not only the average delivery time for each, but also the variability in
delivery time for each.



Measures of Variability

Range

Interquartile Range

Variance

Standard Deviation

Coefficient of Variation



(a) Range

The range of a data set is the difference between the largest
and smallest data values.

It is the simplest measure of variability.

It is very sensitive to the smallest and largest data values.



Example : Apartment Rents
Range
Range = largest value - smallest value
Range = 615 - 425 = 190



(b) Interquartile Range

The interquartile range of a data set is the difference between
the third quartile and the first quartile.

It is the range for the middle 50% of the data.

It overcomes the sensitivity to extreme data values.



Example : Apartment Rents
Interquartile Range
3rd Quartile (Q3) = 525
1st Quartile (Q1) = 445
Interquartile Range = Q3 - Q1 = 525 - 445 = 80
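
The range and the interquartile range of the rents can be verified together. This sketch assumes the quartiles are the 25th and 75th percentiles found with the index rule i = (p/100)n; the helper `percentile` is an illustrative name.

```python
import math

# Monthly rents (in dollars) for the sample of 70 apartments, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """p-th percentile (0 < p < 100) of already sorted data, index rule i = (p/100)n."""
    i = p * len(sorted_values) / 100
    if i.is_integer():
        i = int(i)
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[math.ceil(i) - 1]


ordered = sorted(rents)
data_range = ordered[-1] - ordered[0]   # largest value - smallest value
q1 = percentile(ordered, 25)            # i = 17.5, rounded up to the 18th value
q3 = percentile(ordered, 75)            # i = 52.5, rounded up to the 53rd value
iqr = q3 - q1

print(f"range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

The range depends only on the two extreme rents, while the interquartile range ignores the lowest and highest quarters of the data.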


(c) Variance

The variance is a measure of variability that utilizes all the data.

It is based on the difference between the value of each observation
(xi) and the mean (x̄ for a sample, μ for a population).



(d) Standard Deviation
The standard deviation of a data set is the positive square root
of the variance.

It is measured in the same units as the data, which makes it easier
than the variance to compare with the mean.

If the data set is a sample, the standard deviation is denoted s:
s = √(s²)

If the data set is a population, the standard deviation is denoted
σ (sigma):
σ = √(σ²)
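
As an illustration, the sample variance and standard deviation of the 70 apartment rents can be computed as follows. The sketch assumes the usual sample formula, s² = Σ(xi - x̄)² / (n - 1), which is stated here as an assumption because the formula itself does not appear in the text above.

```python
import math
import statistics

# Monthly rents (in dollars) for the sample of 70 apartments.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

n = len(rents)
mean = sum(rents) / n

# Sample variance: squared deviations from the mean, divided by n - 1 (assumed formula).
sample_variance = sum((x - mean) ** 2 for x in rents) / (n - 1)

# Standard deviation: positive square root of the variance.
sample_std = math.sqrt(sample_variance)

print(f"s^2 = {sample_variance:.2f}")  # about 2996.16
print(f"s   = {sample_std:.2f}")       # about 54.74

# Cross-check against the standard library's sample variance.
assert math.isclose(sample_variance, statistics.variance(rents))
```

The variance is in squared dollars, whereas the standard deviation is back in dollars and can be read against the mean rent of 490.80.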
Exercises
1. The five top-selling vehicles during 2003 were the Chevrolet
Silverado/C/K pickup, Dodge Ram pickup, Ford F-Series pickup, Honda
Accord, and Toyota Camry (Motor Trend, 2003). Data from a sample of 50
vehicle purchases are presented in the table.



a. Develop a frequency and percent frequency distribution.

b. What is the best-selling pickup truck, and what is the best-selling
passenger car?

c. Show a chart or table that summarizes the qualitative data in the table.



2. According to the 2003 Annual Consumer Spending Survey, the
average monthly Bank of America Visa credit card charge was
$1838 (U.S. Airways Attache Magazine, December 2003). A sample
of monthly credit card charges provides the following data.



a. Compute the mean and median.

b. Compute the first and third quartiles.

c. Compute the range and interquartile range.

d. Compute the variance and standard deviation.
