
MGT1051

Business Analytics for Engineers

Normal Distribution

© 2018 C. Gangatharan – VIT Dec 11, 2018 – Tue MGT1051 – Business Analytics for Engineers
Data Distribution
• Data can be “distributed” (spread out) in different ways

What is the Normal (Gaussian) Distribution?

• The normal distribution is a descriptive model that describes real-world situations.

• It is defined as a continuous frequency distribution of infinite range (it can take any value, not just integers as in the case of the binomial and Poisson distributions).

• This is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and in management science.

Types of Distribution

• Frequency Distribution
• Normal (Gaussian) Distribution
• Probability Distribution
• Poisson Distribution
• Binomial Distribution
• Sampling Distribution
• t distribution
• F distribution

A Bell Curve

What are some examples of things that follow a Normal Distribution?

• Heights of people
• Size of things produced by machines
• Errors in measurements
• Blood Pressure
• Test Scores

Standard Normal Distribution
• mean=median=mode
• Symmetry about the center
• 50% of the values are less than the mean and 50% are greater than the mean

Characteristics of Normal Distribution

• It links frequency distribution to probability distribution

• Has a Bell Shape Curve and is Symmetric

• It is Symmetric around the mean: the two halves of the curve are the same (mirror images)

The Standard Deviation

• 68% of values are within 1 standard deviation of the mean
• 95% of values are within 2 standard deviations of the mean
• 99.7% of values are within 3 standard deviations of the mean

Why do we need to know Standard Deviation?

• Any value is
  • likely to be within 1 standard deviation of the mean
  • very likely to be within 2 standard deviations
  • almost certainly within 3 standard deviations
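
The 68 / 95 / 99.7 percentages can be checked numerically. The sketch below is not from the slides: it assumes the data are exactly normal and uses the standard identity P(|X - mean| <= k SD) = erf(k / sqrt(2)). The function name `prob_within` is illustrative.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean (assumes an exact normal model)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints the probability as a percentage for 1, 2 and 3 SDs.
    print(f"within {k} SD: {100 * prob_within(k):.1f}%")
```

The printed values round to the percentages quoted on the slide, which is why "likely", "very likely" and "almost certainly" are reasonable labels for 1, 2 and 3 standard deviations.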

How good is the rule for real data?

Check some example data (weights of 120 women runners):

• The mean weight of the women = 127.8 lb

• The standard deviation (SD) = 15.5 lb

68% of 120 = 0.68 × 120 ≈ 82 runners
In fact, 79 runners fall within 1 SD (15.5 lb) of the mean.

[Figure: histogram of runner weights (percent vs. pounds, 80 to 160 lb), with 112.3, 127.8 and 143.3 lb marked (mean ± 1 SD)]

95% of 120 = 0.95 × 120 = 114 runners
In fact, 115 runners fall within 2 SDs of the mean.

[Figure: histogram of runner weights (percent vs. pounds, 80 to 160 lb), with 96.8, 127.8 and 158.8 lb marked (mean ± 2 SD)]

99.7% of 120 = 0.997 × 120 ≈ 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.

[Figure: histogram of runner weights (percent vs. pounds, 80 to 160 lb), with 81.3, 127.8 and 174.3 lb marked (mean ± 3 SD)]
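
The three checks on the runner data can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the numbers quoted on the slides (mean 127.8 lb, SD 15.5 lb, 120 runners, and the observed counts 79, 115 and 120); the variable and function names are illustrative, and the raw weights themselves are not available here.

```python
# Values quoted on the slides; names are illustrative.
MEAN_LB = 127.8
SD_LB = 15.5
N_RUNNERS = 120

RULE = {1: 0.68, 2: 0.95, 3: 0.997}   # share of values expected within k SDs
OBSERVED = {1: 79, 2: 115, 3: 120}    # counts reported on the slides


def interval(k: int) -> tuple:
    """Lower and upper bound of mean +/- k standard deviations, in pounds."""
    return (MEAN_LB - k * SD_LB, MEAN_LB + k * SD_LB)


def expected_count(k: int) -> float:
    """Number of runners the rule predicts within k SDs of the mean."""
    return RULE[k] * N_RUNNERS


for k in (1, 2, 3):
    low, high = interval(k)
    print(f"{k} SD: {low:.1f} to {high:.1f} lb, "
          f"expected {expected_count(k):.1f}, observed {OBSERVED[k]}")
```

The expected and observed counts differ by only a few runners at each level, which is the sense in which the rule works well for this data set.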

The Normal Distribution:
as mathematical function (pdf)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

This is a bell shaped curve with different centers and spreads depending on μ and σ.

Note constants: π = 3.14159, e = 2.71828

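
The density function can be evaluated directly. The sketch below is an illustration, not part of the slides: `normal_pdf` is a hypothetical helper name, `mu` is the mean (the center of the curve) and `sigma` is the standard deviation (its spread). The runner mean and SD are reused only as sample inputs.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma                      # distance from the mean in SD units
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# The curve peaks at the mean and is symmetric around it.
peak = normal_pdf(127.8, mu=127.8, sigma=15.5)
left = normal_pdf(112.3, mu=127.8, sigma=15.5)    # one SD below the mean
right = normal_pdf(143.3, mu=127.8, sigma=15.5)   # one SD above the mean
print(peak, left, right)
```

Changing `mu` shifts the curve left or right, and changing `sigma` makes it wider and flatter or narrower and taller.
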
Outliers?

Bill Gates makes $500 million a year.

He's in a room with 9 teachers, 4 of whom make $40k, 3 make $45k, and 2 make $55k a year. What is the mean salary of everyone in the room? What would be the mean salary if Gates wasn't included?

Mean With Gates: $50,040,500
Mean Without Gates: $45,000
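
The two means can be verified with a short calculation. This is a minimal sketch using the salaries stated on the slide; the variable names are illustrative.

```python
from statistics import mean

# Salaries from the slide: 4 teachers at $40k, 3 at $45k, 2 at $55k.
teachers = [40_000] * 4 + [45_000] * 3 + [55_000] * 2
gates = 500_000_000

mean_without_gates = mean(teachers)
mean_with_gates = mean(teachers + [gates])

print(f"Mean with Gates:    ${mean_with_gates:,.0f}")
print(f"Mean without Gates: ${mean_without_gates:,.0f}")
```

A single extreme value moves the mean from $45,000 to more than $50 million, which is why outliers need to be identified before summarizing data with a mean.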

What is an outlier?

• Observations inconsistent with the rest of the dataset – Global Outlier

• Special outliers – Local Outlier
  • Observations inconsistent with their neighborhoods
  • A local instability or discontinuity

Outlier Detection

Find the mean and median of the following set of numbers:

3 12 7 40 9 14 18 15 17

Mean is 15

Median is 14

Outlier

In a set of numbers, a number that is much LARGER or much SMALLER than the rest of the numbers is called an Outlier.

To find any outliers in a set of data, we need to find the 5 Number Summary of the data.

Outlier Detection
To find any outliers in a set of data, we need
to find the 5 Number Summary of the data.
Find the 5 Number Summary of the following numbers:
Step 1: Sort the numbers from lowest to highest

Step 2: Identify the Median

Step 3: Identify the Smallest and Largest numbers

Step 4: Identify the Median between the smallest number and the Median for the entire set of data, and between that Median and the largest number in the set.

3 7 9 12 14 15 17 18 40

Outlier Detection

3 - Smallest number in the set

9 - Median between the smallest number and the median (counting the median, 14, in the lower half: 3 7 9 12 14)

14 - Median of the entire set

17 - Median between the largest number and the median (counting the median, 14, in the upper half: 14 15 17 18 40)

40 - Largest number in the set

These are the five numbers in the 5 Number Summary

3 7 9 12 14 15 17 18 40

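
The four steps can be sketched in code. This is one possible implementation, not the only convention: it assumes that when the data set has an odd number of values, the overall median is counted in both halves, which is the choice that reproduces the quartiles 9 and 17 shown on the slide. Other quartile conventions give slightly different values (for example 8 and 17.5 when the median is excluded). The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest).

    Assumption: for an odd number of values the overall median is
    included in both the lower and the upper half.
    """
    s = sorted(values)                       # Step 1: sort lowest to highest
    n = len(s)
    mid = n // 2
    lower = s[:mid + 1] if n % 2 else s[:mid]
    upper = s[mid:]
    return (s[0],                            # Step 3: smallest
            median(lower),                   # Step 4: median of the lower half
            median(s),                       # Step 2: median of the whole set
            median(upper),                   # Step 4: median of the upper half
            s[-1])                           # Step 3: largest


print(five_number_summary([3, 12, 7, 40, 9, 14, 18, 15, 17]))
```
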
Outlier Detection
A 5 Number Summary divides your data into four quarters.

3 7 9 12 14 15 17 18 40

1st Quarter | 2nd Quarter | 3rd Quarter | 4th Quarter

Outlier Detection
25% of all the numbers in the set are smaller than Q1

3 7 9 12 14 15 17 18 40

The Lower Quartile (Q1) is the second number in the 5 Number Summary

The Upper Quartile (Q3) is the fourth number in the 5 Number Summary

25% of all the numbers in the set are larger than Q3

Outlier Detection

What percent of all the numbers are between Q1 and Q3?

3 7 9 12 14 15 17 18 40

50% of all the numbers are between Q1 and Q3

This is called the Inter-Quartile Range (IQR)
The size of the IQR is the distance between Q1 and Q3

17 - 9 = 8

Outlier Detection

3 7 9 12 14 15 17 18 40

IQR = 8

To determine if a number is an outlier, multiply the IQR by 1.5

8 • 1.5 = 12

An outlier is any number that is more than 12 below Q1 or more than 12 above Q3

Outlier Detection
Q1 - 12 = 9 - 12 = -3
Q3 + 12 = 17 + 12 = 29

3 7 9 12 14 15 17 18 40

IQR = 8

No number is below -3, but 40 is above 29, so 40 is an OUTLIER.

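
The 1.5 × IQR check can be written as a short script. This is a minimal sketch: it uses `statistics.quantiles` with `method="inclusive"`, chosen here because it yields Q1 = 9 and Q3 = 17 for this data, matching the slides. That method choice is an assumption about the convention used, and the names `lower_fence` and `upper_fence` are illustrative.

```python
from statistics import quantiles

data = [3, 12, 7, 40, 9, 14, 18, 15, 17]

# Quartiles; the "inclusive" method matches the Q1 and Q3 used on the slides.
q1, _median, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1              # distance between Q1 and Q3
step = 1.5 * iqr           # how far beyond the quartiles a value may lie
lower_fence = q1 - step
upper_fence = q3 + step

# Any value outside the two fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"fences: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```
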
Outlier Detection

Find the mean and median of the following set of numbers, with and without the outlier (40):

3 12 7 40 9 14 18 15 17

With the outlier: Mean is 15, Median is 14
Without the outlier: Mean is 11.875, Median is 13
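
The comparison can be reproduced as follows. This is a small sketch that removes the value 40 by hand, since it was already identified as the outlier; the variable names are illustrative.

```python
from statistics import mean, median

data = [3, 12, 7, 40, 9, 14, 18, 15, 17]
without_outlier = [x for x in data if x != 40]   # drop the identified outlier

print("with outlier:   ", mean(data), median(data))
print("without outlier:", mean(without_outlier), median(without_outlier))
```

Removing one value shifts the mean by more than 3 but the median by only 1, so the median is the more robust of the two summaries when outliers are present.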
