IENG 455
Feng Yang
West Virginia University
Population Sample
Statistics
Basic Concepts
Population: a collection of all units of interest (people, products, ...).
Sample: a subset of a population that is actually observed.
Variable: a measurable property or attribute associated with each unit in the population.
Parameters: numeric characteristics of the population defined for each variable of interest.
An Example:
Consider a lot of 100 items in manufacturing.
Population: 100 items. Sample: a subset of 10 items.
Variable X: defectiveness of products,
X = 1 if the item is defective, X = 0 otherwise.
Parameter: defective rate of the lot
(number of defectives / number of items in the lot)
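The lot example can be sketched in a few lines of Python. The count of 7 defectives and the seed are made-up values for illustration; the slide gives only the lot size (100) and the sample size (10).

```python
import random

def defective_rate(items):
    """Fraction of items with X = 1 (defective)."""
    return sum(items) / len(items)

# Hypothetical lot: 100 items, 7 of them defective (X = 1), the rest good (X = 0).
lot = [1] * 7 + [0] * 93

rng = random.Random(455)          # fixed seed so the draw is repeatable
sample = rng.sample(lot, 10)      # observe 10 items without replacement

population_rate = defective_rate(lot)     # the parameter: 7 / 100
sample_rate = defective_rate(sample)      # an estimate computed from the sample

print(f"population defective rate = {population_rate:.2f}")
print(f"sample defective rate     = {sample_rate:.2f}")
```

The population rate is fixed, while the sample rate changes from draw to draw, which is why it is treated as a random quantity.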
Probability Basics
Random experiment: a specific procedure whose outcome is uncertain.
Random variable X: a numeric quantity whose value is determined by the outcome of a random experiment.
Sample space S: the collection of all possible outcomes of a random experiment.
Event E: any collection of outcomes contained in the sample space.
Probability of an event: the relative likelihood that it will occur when you do the experiment.
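One way to make "relative likelihood" concrete is to repeat an experiment many times and record how often an event occurs. The sketch below assumes a fair six-sided die and the event "the roll is even"; the function name `relative_frequency` and the seed are illustrative choices.

```python
import random

def relative_frequency(event, n_trials, seed=0):
    """Estimate P(event) by repeating a die-roll experiment n_trials times."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        outcome = rng.randint(1, 6)   # one run of the random experiment
        if outcome in event:          # did the event E occur?
            hits += 1
    return hits / n_trials

sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}                      # event E: the roll is even

estimate = relative_frequency(even, 100_000)
print(f"estimated P(even) = {estimate:.3f}  (exact value is 1/2)")
```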
Random Variables
Quantifies the random outcome.
Assigns a numerical value to every outcome of the experiment.
Probabilistic behavior is described by a distribution function.
A RV can only take values in its sample space ($X \in S$).
Examples of sample spaces:
1. Toss a coin: S = { H, T }
2. Roll a single die: S = { 1, 2, ..., 6 }
3. Count the number of customers entering a store during one day: S = { 0, 1, 2, 3, ... }
4. Observe the lifetime of a car battery: $S = [0, \infty)$ (hours)
5. Measure the outdoor temperature: S = [23, 104] (degrees Fahrenheit)
Discrete vs. continuous RVs:
Discrete: can take on only certain separated values.
Continuous: can take on any real value in some range.
Discrete Distributions
Let X be a discrete RV with $S = \{x_1, x_2, x_3, \ldots\}$.
Probability mass function (pmf):
$p(x_i) = P(X = x_i)$ for i = 1, 2, 3, ...
Toss a die: S = { 1, 2, ..., 6 }, with pmf
$P(X = 1) = 1/6,\ P(X = 2) = 1/6,\ \ldots,\ P(X = 6) = 1/6$
[Figure: pmf of a die roll, bars of height 1/6 at x = 1, 2, ..., 6.]
Discrete Distributions (contd)
Cumulative distribution function (cdf):
$$F(x) = P\{X \le x\} = \sum_{\text{all } i \text{ such that } x_i \le x} p(x_i)$$
Toss a die: [Figure: step-function cdf of a die roll; for example, F(3) = 3/6.]
Properties of a discrete cdf:
$0 \le F(x) \le 1$ for all x
As $x \to -\infty$, $F(x) \to 0$; as $x \to +\infty$, $F(x) \to 1$
F(x) is nondecreasing in x
F(x) is a step function, continuous from the right, with jumps at the $x_i$'s of height equal to the pmf at that $x_i$
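The cdf of the die can be built directly from its pmf. This is a minimal sketch using exact fractions; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction

# pmf of a fair die: p(x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x):
    """F(x) = sum of p(x_i) over all x_i <= x."""
    return sum((p for xi, p in pmf.items() if xi <= x), Fraction(0))

print(cdf(0))     # 0: below the smallest outcome
print(cdf(3))     # 1/2: P(X <= 3)
print(cdf(3.5))   # 1/2: flat between jumps
print(cdf(6))     # 1: all the probability mass
```

The jump of the cdf at an outcome equals the pmf there; for instance, `cdf(3) - cdf(2.999)` is 1/6.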
Some discrete distributions
Bernoulli, Binomial, Poisson Distribution
Continuous Distributions
Now let X be a continuous RV with sample space
$S = [x_L, x_U]$ (possibly bounded on the left, on the right, or on both sides).
[Figure: a pdf curve f(x) plotted against x.]
Fun facts about the pdf f(x):
Observed X's are denser in regions where f(x) is high.
The height of a density, f(x), is not the probability of anything; it can even be > 1.
Continuous Distributions (contd.)
Cumulative distribution function (cdf): the probability that the RV will be $\le$ a fixed value x:
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$$
[Figure: the pdf f(t), with F(x) as the area up to x, next to the cdf F(x), which rises to 1.]
Properties of a continuous cdf:
$0 \le F(x) \le 1$ for all x
As $x \to -\infty$, $F(x) \to 0$; as $x \to +\infty$, $F(x) \to 1$
F(x) is nondecreasing in x
F(x) is a continuous function with slope equal to the pdf: $f(x) = F'(x)$
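Exponential pdf:
$$f(x) = \begin{cases} \dfrac{1}{\beta} \exp(-x/\beta) & x \ge 0 \\ 0 & \text{elsewhere} \end{cases}$$
Normal pdf:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \quad -\infty < x < \infty$$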
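The two densities can be checked numerically. The sketch below names the exponential scale parameter `beta` (an assumed name) and uses a simple midpoint rule, `integrate`, which is an illustrative helper and not a library routine. It shows that a density value can exceed 1 and that integrating the pdf reproduces the cdf.

```python
import math

def exponential_pdf(x, beta):
    """Exponential density with mean beta (beta > 0)."""
    return math.exp(-x / beta) / beta if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# A density value is not a probability: with beta = 0.5, f(0) = 2 > 1.
print(exponential_pdf(0.0, beta=0.5))

# F(1) for beta = 0.5, by integrating the pdf, versus the closed form 1 - exp(-x/beta).
approx_cdf = integrate(lambda t: exponential_pdf(t, 0.5), 0.0, 1.0)
exact_cdf = 1 - math.exp(-1.0 / 0.5)
print(approx_cdf, exact_cdf)

# Total area under a normal density is (numerically) 1.
normal_area = integrate(lambda t: normal_pdf(t, 0.0, 1.0), -10.0, 10.0)
print(normal_area)
```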
Parameters of a Distribution
Expected value / Mean (measure of center)
Discrete RV: $E(X) = \sum_{\text{all } i} x_i\, p(x_i)$
Continuous RV: $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$
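Both expected-value formulas can be evaluated directly. This sketch uses the fair die for the discrete case and, for the continuous case, an exponential density whose mean is set to an assumed value `beta = 2.0`; the integral is truncated at 60 and approximated with a midpoint rule.

```python
from fractions import Fraction
import math

# Discrete case: fair die, E(X) = sum of x_i * p(x_i)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
die_mean = sum(x * p for x, p in pmf.items())
print(die_mean)            # 7/2

# Continuous case: exponential density with (assumed) mean beta = 2.0,
# E(X) = integral of x * f(x) dx, approximated by the midpoint rule on [0, 60].
beta = 2.0

def f(x):
    return math.exp(-x / beta) / beta

steps = 60_000
h = 60.0 / steps
exp_mean = h * sum((i + 0.5) * h * f((i + 0.5) * h) for i in range(steps))
print(round(exp_mean, 4))  # close to beta
```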
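Quantile: the value $x_\gamma$ with $F(x_\gamma) = P(X \le x_\gamma) = \gamma$, that is, $x_\gamma = F^{-1}(\gamma)$.
[Figure: the pdf f(t) and the cdf F(x), with the quantile $F^{-1}(\gamma)$ marked on the horizontal axis.]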
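Inverting a cdf gives the value below which a given fraction of the probability lies. As a hedged illustration, the exponential cdf $F(x) = 1 - \exp(-x/\beta)$ can be inverted in closed form; `beta = 2.0` and the function names are assumptions made for this sketch.

```python
import math

def exponential_cdf(x, beta):
    """F(x) = 1 - exp(-x / beta) for x >= 0, else 0."""
    return 1 - math.exp(-x / beta) if x >= 0 else 0.0

def exponential_quantile(q, beta):
    """Solve F(x) = q for x, with 0 <= q < 1: x = -beta * ln(1 - q)."""
    return -beta * math.log(1 - q)

beta = 2.0                      # assumed mean of the distribution
median = exponential_quantile(0.5, beta)
print(median)                   # beta * ln 2
print(exponential_cdf(median, beta))   # recovers the probability level
```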
What is Statistics?
[Diagram: sampling draws observations X1, X2, X3, ... from the population X; statistics are computed from the sample.]
Random sample: a set of n independent and identically distributed (i.i.d.) observations from the population:
$X_1, X_2, \ldots, X_n$
Sample statistic: a numeric function of the sample data,
$h(X_1, X_2, \ldots, X_n)$
Used to estimate population parameters.
Sample statistics are random variables themselves.
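Sample variance: $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ is the sample mean.
Population variance: $\sigma^2 = \mathrm{Var}(X)$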
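A short sketch of the sample mean and the sample variance with the n - 1 divisor follows. The five observations are made up for illustration, and the helper names are not from the slides.

```python
import statistics

def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 with the n - 1 divisor."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# Made-up observations, for illustration only.
data = [4.0, 7.0, 5.0, 6.0, 3.0]

print(sample_mean(data))           # 5.0
print(sample_variance(data))       # 2.5
print(statistics.variance(data))   # the standard library uses the same n - 1 divisor
```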
Distribution of a Statistic
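A sample statistic is a RV and thus has its own distribution, called the sampling distribution.
Some sampling distribution results:
Draw i.i.d. observations ($X_1, X_2, \ldots, X_n$) from a population (distribution) with unknown parameters $\mu$ and $\sigma^2$.
Sample mean and variance:
(a) $E(\bar{X}) = \mu$, $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
(b) $E(S^2) = \sigma^2$
(c) $\dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ ~ Student's t-distribution with n-1 degrees of freedom (exact when the population is normal)
[Figure: pdf of the t-distribution, centered at 0, with $-t_{n-1,1-\alpha/2}$ and $t_{n-1,1-\alpha/2}$ marked.]
$$P\left(-t_{n-1,1-\alpha/2} \le \frac{\bar{X} - \mu}{S/\sqrt{n}} \le t_{n-1,1-\alpha/2}\right) = 1 - \alpha$$
$$P\left(\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$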
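Results (a) and (b) can be checked by simulation. The sketch below assumes a normal population with `mu = 10`, `sigma = 2` and samples of size `n = 5`; these values, the seed, and the number of replications are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2024)        # fixed seed for repeatability
mu, sigma, n = 10.0, 2.0, 5      # assumed population parameters and sample size
replications = 20_000

means, variances = [], []
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    variances.append(statistics.variance(sample))

avg_of_means = statistics.fmean(means)          # should be near mu
var_of_means = statistics.variance(means)       # should be near sigma^2 / n
avg_of_variances = statistics.fmean(variances)  # should be near sigma^2

print(f"average of sample means     = {avg_of_means:.3f}  (mu = {mu})")
print(f"variance of sample means    = {var_of_means:.3f}  (sigma^2/n = {sigma**2 / n})")
print(f"average of sample variances = {avg_of_variances:.3f}  (sigma^2 = {sigma**2})")
```

Each replication produces a different sample mean and sample variance, which is the sense in which a statistic has a sampling distribution.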
Point Estimation and CIs
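$$P\left(\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$$
CI for the population mean $\mu$:
$$[\mathrm{LCL}, \mathrm{UCL}] = \left[\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right]$$
where $t_{n-1,1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of Student's t distribution with n-1 degrees of freedom (the Excel function TINV can be used to compute $t_{n-1,1-\alpha/2}$, as TINV($\alpha$, n-1)).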
Prediction Intervals
CI: an estimated interval for the mean of the population.
A CI is a measure of the estimation error; its length shrinks to 0 as we get more data.
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \text{Student's t-distribution}$$
$$[\mathrm{LCL}, \mathrm{UCL}] = \left[\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right]$$
Many practical applications call for an interval estimate of an individual (future) observation sampled from a population rather than of the mean of the population.
e.g., a company buying a new machine would like to estimate the performance of that machine, not the average performance of all the machines produced by the manufacturer.
Prediction Intervals (contd)
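Suppose that a random sample $X_1, X_2, \ldots, X_n$ is drawn from an approximately normal distribution $N(\mu, \sigma^2)$, where $\mu$ and $\sigma^2$ are both unknown parameters.
Estimate the interval such that, with probability $1-\alpha$, a random outcome X will fall within it.
Prediction Interval (PI):
$$\left[\bar{X} - t_{n-1,1-\alpha/2}\, S\sqrt{1 + \frac{1}{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\, S\sqrt{1 + \frac{1}{n}}\right]$$
CI & PI:
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \text{Student's t-distribution}$$
[Figure: pdf of the t-distribution, centered at 0, with $-t_{n-1,1-\alpha/2}$ and $t_{n-1,1-\alpha/2}$ marked.]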
Example
Let's assume that the time it takes for a pumpkin candle to burn itself out (burning time) is normally distributed. I bought 10 candles, burned them, and found that the sample mean of the burning time is 5 hours and the sample standard deviation is 1.2 hours.
RV X: candle's burning time.
n = 10
$\bar{X} = 5$ hours
s = 1.2 hours
Q1: Provide a 95% confidence interval for the mean burning time of the candles.
$$[\mathrm{LCL}, \mathrm{UCL}] = \left[\bar{X} - t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\frac{S}{\sqrt{n}}\right]$$
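One way to work Q1 is to plug the sample values into the CI formula. The slides do not give the answer; this sketch takes the critical value 2.262 for 9 degrees of freedom and a 0.975 tail level from a standard t table, so treat that constant as an input assumption.

```python
import math

n = 10          # sample size
xbar = 5.0      # sample mean (hours)
s = 1.2         # sample standard deviation (hours)

# t_{n-1, 1-alpha/2} for n - 1 = 9 and alpha = 0.05, read from a standard t table.
T_9_0975 = 2.262

half_width = T_9_0975 * s / math.sqrt(n)
lcl, ucl = xbar - half_width, xbar + half_width
print(f"95% CI for the mean: [{lcl:.2f}, {ucl:.2f}] hours")
```

With these inputs the interval comes out to roughly [4.14, 5.86] hours.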
Example (contd)
Q2: Now I buy another candle. Write down the interval estimate such that, with probability 0.95, the burning time of this particular candle will fall into that interval.
$$\left[\bar{X} - t_{n-1,1-\alpha/2}\, S\sqrt{1 + \frac{1}{n}},\ \bar{X} + t_{n-1,1-\alpha/2}\, S\sqrt{1 + \frac{1}{n}}\right]$$
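Q2 can be worked the same way with the prediction-interval formula. As before, the slides give no numeric answer; the critical value 2.262 (9 degrees of freedom, 0.975 level) is taken from a standard t table and is an input assumption of this sketch.

```python
import math

n = 10          # sample size
xbar = 5.0      # sample mean (hours)
s = 1.2         # sample standard deviation (hours)

# t_{n-1, 1-alpha/2} for n - 1 = 9 and alpha = 0.05, read from a standard t table.
T_9_0975 = 2.262

# The extra "1 +" under the square root accounts for the variability
# of the single new observation on top of the uncertainty in the mean.
pi_half_width = T_9_0975 * s * math.sqrt(1 + 1 / n)
pi_low, pi_high = xbar - pi_half_width, xbar + pi_half_width
print(f"95% PI for one new candle: [{pi_low:.2f}, {pi_high:.2f}] hours")
```

The result, roughly [2.15, 7.85] hours, is much wider than the confidence interval for the mean built from the same data, and it does not shrink to zero width as n grows.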
Summary Statistics
Stochastic system: the random output X follows a certain distribution with UNKNOWN parameters.
1. Take a random sample: independent and identically distributed (i.i.d.) observations of size n,
$X_1, X_2, \ldots, X_n$
2. Calculate sample statistics (functions of the RVs $X_1, X_2, \ldots, X_n$), such as the sample mean $\bar{X}$ and the sample variance $S^2$.
A sample statistic itself is a random variable.
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim \text{Student's t-distribution}$$