
Measures of Location: mean, median, mode

Measures of Variability: range, variance, standard deviation, IQR

Sample Variance & Standard Deviation:
s² = (1/(n−1)) Σ_{i=1}^n (x_i − x̄)², and s = √(s²)
If you scale every value in a data set by a constant c, then the sample mean is scaled by c, the sample variance is scaled by c², and the sample standard deviation is scaled by |c|.
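A minimal sketch of these formulas in Python; the data set and the constant c are arbitrary illustrations:

```python
import math

def sample_stats(xs):
    """Sample mean, sample variance (n-1 denominator), and standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var, math.sqrt(var)

data = [2.0, 4.0, 4.0, 5.0, 7.0]
c = -3.0
mean, var, sd = sample_stats(data)
mean_c, var_c, sd_c = sample_stats([c * x for x in data])
assert abs(mean_c - c * mean) < 1e-9   # mean scales by c
assert abs(var_c - c**2 * var) < 1e-9  # variance scales by c^2
assert abs(sd_c - abs(c) * sd) < 1e-9  # standard deviation scales by |c|
```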
De Morgan's Laws: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

A probability function is a function P(·) which assigns a number between 0 and 1 to every event and satisfies the axioms of probability:
(1) P(A) ≥ 0, for every event A
(2) if the events are mutually exclusive, then the probability of their union is equal to the sum of the individual probabilities
(3) P(S) = 1

Properties of Probability Functions:
(1) P(∅) = 0
(2) if A_1, ..., A_n are disjoint events, P(A_1 ∪ ... ∪ A_n) = P(A_1) + ... + P(A_n)
(3) P(Aᶜ) = 1 − P(A)
(4) if A ⊆ B then P(A) ≤ P(B)
(5) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

Picking a Probability Function: (1) symmetry, (2) estimating the probability from data, (3) theory.

Counting (choosing k items from n):
- ordered, with replacement: n^k
- ordered, without replacement: n!/(n−k)!
- unordered, without replacement: n!/(k!(n−k)!)
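The three counting formulas, checked with Python's built-in combinatorics (n = 5, k = 3 are arbitrary):

```python
import math

n, k = 5, 3
print(n ** k)           # ordered, with replacement: n^k = 125
print(math.perm(n, k))  # ordered, without replacement: n!/(n-k)! = 60
print(math.comb(n, k))  # unordered, without replacement: n!/(k!(n-k)!) = 10
```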
Conditional Probability:
Let A and B be events and suppose P(B) > 0. The conditional probability that A occurs given that B occurs is
P(A|B) = P(A ∩ B) / P(B)

Multiplication rules:
P(A ∩ B) = P(A)P(B|A) = P(B)P(A|B)
P(A ∩ B ∩ C) = P(A)P(B|A)P(C|A ∩ B)

Independent Events: P(A ∩ B) = P(A)P(B), equivalently P(A|B) = P(A).

Law of Total Probability:
Let the events A_1, ..., A_n be a partition of S, and let B be an event where P(B) > 0. If P(A_i) > 0, then
P(B) = Σ_{i=1}^n P(A_i ∩ B) = Σ_{i=1}^n P(B|A_i)P(A_i)

Bayes' Rule:
Let A_1, ..., A_n be a partition of S and let B be an event where P(B) > 0. If P(A_i) > 0, then
P(A_i|B) = P(A_i ∩ B) / P(B) = P(B|A_i)P(A_i) / Σ_{j=1}^n P(B|A_j)P(A_j)
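A minimal sketch of the law of total probability and Bayes' rule on a two-event partition; the prior and likelihood numbers are purely hypothetical:

```python
# Hypothetical numbers, chosen only to illustrate the formulas above.
prior = {"A1": 0.01, "A2": 0.99}       # partition of S (e.g. disease / no disease)
likelihood = {"A1": 0.95, "A2": 0.05}  # P(B | A_i) for an event B (e.g. positive test)

# Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(likelihood[a] * prior[a] for a in prior)

# Bayes' Rule: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posterior = {a: likelihood[a] * prior[a] / p_b for a in prior}
print(p_b, posterior)  # P(B) = 0.059; P(A1 | B) is roughly 0.161
```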
Discrete Random Variables:
PMF: p_X(x) = P(X = x) = P({s ∈ S : X(s) = x})
CDF: F_X(x) = P(X ≤ x) = Σ_{y ≤ x} p_X(y)
Expectation: E(X) = Σ_x x·P(X = x)
In general E(h(X)) ≠ h(E(X)), but expectation is linear: E(aX + bY) = aE(X) + bE(Y)
Variance: Var(X) = E[(X − μ)²] = E(X²) − (E(X))²
Standardized version of X: X* = (X − μ)/σ, so that E(X*) = 0 and Var(X*) = 1
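A sketch of these definitions on a PMF stored as a dict; the PMF values are illustrative:

```python
import math

pmf = {0: 0.2, 1: 0.5, 2: 0.3}  # illustrative PMF as {value: probability}
assert abs(sum(pmf.values()) - 1.0) < 1e-12

mean = sum(x * p for x, p in pmf.items())               # E(X)
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = E[(X - mu)^2]
sd = math.sqrt(var)

# The standardized version X* = (X - mu)/sigma has mean 0 and variance 1.
std_pmf = {(x - mean) / sd: p for x, p in pmf.items()}
print(sum(z * p for z, p in std_pmf.items()))       # ~0
print(sum(z ** 2 * p for z, p in std_pmf.items()))  # ~1
```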
Multiple Discrete Random Variables:
Joint PMFs: let X and Y be r.v.'s defined on the same sample space S. The joint PMF of X and Y is
p_{X,Y}(x, y) = P(X = x, Y = y)
For any joint PMF, p_{X,Y}(x, y) ≥ 0 and Σ_x Σ_y p_{X,Y}(x, y) = 1.
For any event A, P((X, Y) ∈ A) = Σ_{(x,y) ∈ A} p_{X,Y}(x, y)

Marginal PMFs: given a joint PMF p_{X,Y}(x, y), the marginal PMFs are
p_X(x) = Σ_y p_{X,Y}(x, y) and p_Y(y) = Σ_x p_{X,Y}(x, y)
where p_X(x) is the PMF of X and p_Y(y) is the PMF of Y.

Expectation: E(g(X, Y)) = Σ_x Σ_y g(x, y)·p_{X,Y}(x, y)

Conditional PMFs: p_{Y|X}(y|x) = P(Y = y | X = x) = p_{X,Y}(x, y) / p_X(x)

Independence: for any events A and B,
P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B), a.k.a. p_{X,Y}(x, y) = p_X(x)·p_Y(y)
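A sketch of marginals, E(g(X, Y)), a conditional PMF, and an independence check on a small illustrative joint PMF:

```python
from collections import defaultdict

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}  # illustrative

# Marginals: p_X(x) = sum_y p(x, y) and p_Y(y) = sum_x p(x, y)
p_x, p_y = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

e_xy = sum(x * y * p for (x, y), p in joint.items())  # E(g(X, Y)) with g(x, y) = xy

cond = {y: joint[(0, y)] / p_x[0] for y in p_y}  # conditional PMF of Y given X = 0

# Independence holds iff p(x, y) = p_X(x) p_Y(y) for every (x, y).
independent = all(abs(p - p_x[x] * p_y[y]) < 1e-12 for (x, y), p in joint.items())
print(dict(p_x), dict(p_y), e_xy, cond, independent)
```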

Continuous Random Variables:
PDF: a function f: R → R where:
(1) f(x) ≥ 0 for all x ∈ R
(2) ∫_{−∞}^{∞} f(x) dx = 1
CDF: F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(u) du
Expectation: E(X) = ∫_{−∞}^{∞} x·f(x) dx
Variance: Var(X) = ∫_{−∞}^{∞} (x − μ)²·f(x) dx
Percentiles: let p be in [0, 1] and let f be the PDF of the r.v. X. The (100p)th percentile of X is the number η(p) satisfying F(η(p)) = p (solve for η(p)).
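When the CDF has a closed form, η(p) can be solved for directly; a sketch for the Exponential(λ) case, where F(x) = 1 − e^{−λx} gives η(p) = −ln(1 − p)/λ (λ = 2 is arbitrary):

```python
import math

def exp_percentile(p, lam):
    """(100p)th percentile of an Exponential(lam) r.v.: solve F(eta) = p."""
    return -math.log(1.0 - p) / lam

lam = 2.0
median = exp_percentile(0.5, lam)  # 50th percentile
p95 = exp_percentile(0.95, lam)    # 95th percentile
# Plugging back into the CDF recovers p.
assert abs((1 - math.exp(-lam * median)) - 0.5) < 1e-12
print(median, p95)
```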
Multiple Continuous Random Variables:
Joint PDF: the joint PDF of a pair of continuous r.v.'s X, Y is a function f_{X,Y}(x, y) on R × R where:
f_{X,Y}(x, y) ≥ 0 for all x, y ∈ R, and
∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1
For any event A, P((X, Y) ∈ A) = ∬_{(x,y) ∈ A} f_{X,Y}(x, y) dx dy
Note: f_{X,Y}(x, y) is not a probability.
P(X ∈ [a, b], Y ∈ [c, d]) = ∫_c^d ∫_a^b f_{X,Y}(x, y) dx dy

Marginal PDFs:
PDF of X: f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy
PDF of Y: f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx
ex) P(Y ∈ A) = P(X ∈ R, Y ∈ A) = ∫_A ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = ∫_A f_Y(y) dy

Conditional PDF: let X, Y be continuous r.v.'s with joint PDF f_{X,Y}(x, y). Then for any x ∈ R where f_X(x) > 0, the conditional PDF of Y given X = x is
f_{Y|X}(y|x) = f_{X,Y}(x, y) / f_X(x)
For fixed x, f_{Y|X}(y|x) is a PDF because f_{Y|X}(y|x) ≥ 0 for all y and
∫_{−∞}^{∞} f_{Y|X}(y|x) dy = ∫_{−∞}^{∞} (f_{X,Y}(x, y) / f_X(x)) dy = f_X(x) / f_X(x) = 1

Independence: two continuous r.v.'s are independent if
f_{X,Y}(x, y) = f_X(x)·f_Y(y), a.k.a. P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)
If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).

Q-Q Plots:
Sort the observations x_1, x_2, ..., x_n in increasing order: x_(1), ..., x_(n). Then x_(i) is the q_i-quantile of the dataset, where q_i = (i − 0.5)/n.
The q-quantile is defined as follows:
- if there is an i where q = q_i, then the q-quantile is x_(i)
- if q is between q_i and q_(i+1), the q-quantile is (x_(i) + x_(i+1))/2
- if q ≤ q_1, the q-quantile is x_(1)
To check a guessed distribution:
1) Plot the histogram of the data
2) Select a prob. distribution (e.g. uniform, exponential, Weibull, ...) based on the shape of the histogram
3) Letting F(x) be the CDF of the selected distribution, compute the theoretical quantiles F^{−1}(q_i)
4) Plot the points (F^{−1}(q_i), x_(i))
An indicator of whether your guess is a good one is whether x_(i) ≈ F^{−1}(q_i) (sample vs. theoretical quantile): the closer the points are to the line y = x, the better the fit.
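A minimal Q-Q sketch that prints sample/theoretical quantile pairs instead of plotting them, checking simulated data against an Exponential guess (the seed and λ are arbitrary):

```python
import math
import random

random.seed(0)
lam = 1.5
data = sorted(random.expovariate(lam) for _ in range(200))  # x_(1) <= ... <= x_(n)

n = len(data)
points = []
for i, x_i in enumerate(data, start=1):
    q_i = (i - 0.5) / n                # q_i = (i - 0.5)/n
    theo = -math.log(1.0 - q_i) / lam  # theoretical quantile F^{-1}(q_i)
    points.append((theo, x_i))

# A good guess makes these two columns track each other (the line y = x).
for theo, x_i in points[::40]:
    print(f"{theo:.3f}  {x_i:.3f}")
```

With matplotlib one would scatter `points` and overlay the line y = x.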
Special Discrete Random Variables:

BERNOULLI
Bern(p): models random experiments whose outcomes are either success or failure.
p_X(1) = p, p_X(0) = 1 − p
E(X) = p, Var(X) = p(1 − p)

BINOMIAL
B(n, p): # of "successes" in n Bernoulli trials (with replacement).
p_X(x) = C(n, x)·p^x·(1 − p)^(n−x), where C(n, k) = n!/(k!(n−k)!)
E(X) = np, Var(X) = np(1 − p)

POISSON
Pois(λ): # of times that an event occurs during a fixed length of time.
p_X(x) = e^{−λ}·λ^x / x!
E(X) = λ, Var(X) = λ
Poisson approximation of a Binomial r.v., for n > 50 and p < 0.1 (see the comparison sketch after this table):
p_X(x) ≈ e^{−np}·(np)^x / x!

GEOMETRIC
Geom(p): # of independent Bernoulli trials needed to obtain the first success. A success occurs with probability p.
p_X(x) = (1 − p)^(x−1)·p
E(X) = 1/p, Var(X) = (1 − p)/p²

HYPERGEOMETRIC
Hyp(N, N_s, n): # of successes when sampling w/o replacement.
p_X(x) = C(N_s, x)·C(N − N_s, n − x) / C(N, n)
E(X) = n·N_s/N
Var(X) = ((N − n)/(N − 1))·n·(N_s/N)·(1 − N_s/N)
If we let p = N_s/N, the mean and variance are similar to a Binomial r.v.'s.
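A quick comparison of the exact Binomial PMF with its Poisson approximation; n = 100 and p = 0.05 are illustrative values in the stated regime:

```python
import math

n, p = 100, 0.05
lam = n * p  # the approximating Poisson rate

for x in range(6):
    binom = math.comb(n, x) * p**x * (1 - p) ** (n - x)  # exact PMF
    pois = math.exp(-lam) * lam**x / math.factorial(x)   # approximation
    print(x, round(binom, 5), round(pois, 5))
```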
COVARIANCE AND CORRELATION
The covariance of two r.v.'s X and Y is Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
Linearity of expectation implies Cov(X, Y) = E(XY) − E(X)E(Y).
Positive correlation if Cov(X, Y) > 0; negative correlation if Cov(X, Y) < 0; uncorrelated if Cov(X, Y) = 0.
Cov(X, X) = Var(X)
If X and Y are independent, then Cov(X, Y) = 0, because independence implies E(XY) = E(X)E(Y). Independence implies Cov(X, Y) = 0, but not the other way around!

Variance and Independence: for any X and Y (independent or not),
Var(X + Y) = Var(X) + Var(Y) + 2·Cov(X, Y)
If X and Y are independent, then Var(X + Y) = Var(X) + Var(Y).

Covariance and Scaling: given constants a, b,
Cov(aX + b, Y) = a·Cov(X, Y) and Var(aX + b) = a²·Var(X)

Correlation: ρ(X, Y) = Cov(X, Y) / (SD(X)·SD(Y))
Under scaling, ρ(aX + b, cY + d) = ρ(X, Y) for ac > 0 (the sign flips if ac < 0).
Correlation only detects linear relationships between X and Y.
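A sketch computing Cov(X, Y) and ρ(X, Y) from the same style of small joint PMF used earlier; the PMF is illustrative:

```python
import math

joint = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.4}  # illustrative

e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

cov = e_xy - e_x * e_y  # Cov(X, Y) = E(XY) - E(X)E(Y)
var_x = sum((x - e_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - e_y) ** 2 * p for (x, y), p in joint.items())
rho = cov / (math.sqrt(var_x) * math.sqrt(var_y))  # correlation
print(cov, rho)
```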
SAMPLING DISTRIBUTIONS
Simple random sample: when we have a simple random sample X_1, ..., X_n, (1) the X_i's are independent r.v.'s and (2) every X_i has the same probability distribution (the same PMF or PDF).
Sample mean: X̄ = (1/n) Σ_{i=1}^n X_i
Sample variance: S² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)² is the point estimate of the population variance σ².

Sums and Averages: if X_1, ..., X_n are iid with mean μ and variance σ², and S_n = X_1 + ... + X_n with X̄ = S_n/n, then
E(S_n) = nμ, Var(S_n) = nσ², SD(S_n) = σ√n
E(X̄) = μ, Var(X̄) = σ²/n, SD(X̄) = σ/√n

Central Limit Theorem: for any x ∈ R,
lim_{n→∞} P((S_n − nμ)/(σ√n) ≤ x) = Φ(x)
where Φ(x) is the standard normal CDF. The CLT says that for large n, the r.v. (S_n − nμ)/(σ√n) is approximately normally distributed with mean 0 and variance 1, i.e. S_n ≈ N(nμ, nσ²). In terms of the sample mean, the CLT says that for large n,
(X̄ − μ)/(σ/√n) ≈ N(0, 1), a.k.a. X̄ ≈ N(μ, σ²/n)
With a sample size of at least 30, the distribution of the sample mean is approximately normal regardless of the shape of the population.

Normal Approximation of the Binomial: suppose Y ~ B(n, p). Then we can write Y as a sum of n iid Bernoulli r.v.'s with μ = p and σ² = p(1 − p). Recall E(Y) = np and Var(Y) = np(1 − p). The CLT says that for large n, Y ≈ N(np, np(1 − p)). The Poisson approximation of the Binomial may not work that well outside its regime, but by the CLT
P(Y ≤ y) ≈ Φ((y − np)/√(np(1 − p)))
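A simulation sketch of the CLT for sample means drawn from a (non-normal) Exponential(1) population; the sample sizes are arbitrary:

```python
import random
import statistics

# Exponential(1) has mu = sigma = 1, so X-bar should be roughly
# N(mu, sigma^2 / n) even though the population is skewed.
random.seed(0)
n = 30

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(10_000)]

print(statistics.fmean(means))  # ~ mu = 1.0
print(statistics.stdev(means))  # ~ sigma / sqrt(n) = 0.183
```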
Confidence Intervals: suppose we want a 95% CI for μ. Since (X̄ − μ)/(σ/√n) ≈ N(0, 1),
P(−1.96 ≤ (X̄ − μ)/(σ/√n) ≤ 1.96) = 0.95, which means P(X̄ − 1.96·σ/√n ≤ μ ≤ X̄ + 1.96·σ/√n) = 0.95.
The random interval [X̄ − 1.96·σ/√n, X̄ + 1.96·σ/√n] contains μ with probability 0.95. The probability that μ lies in a realized interval such as [45, 65] is either 0 or 1.
A CI with confidence level 1 − α is [X̄ − z_{α/2}·σ/√n, X̄ + z_{α/2}·σ/√n], with width 2·z_{α/2}·σ/√n. If X_1, ..., X_n aren't normally distributed, the CLT says that for large n this CI is still approximately valid.

If we don't know σ, replace it with the sample standard deviation S. Then (X̄ − μ)/(S/√n) is no longer a standard normal r.v. (its distribution is more spread out).
t-distribution: when X_1, ..., X_n are iid Normal r.v.'s with mean μ and variance σ², the r.v. T = (X̄ − μ)/(S/√n) has a t-distribution with n − 1 degrees of freedom. The t-distribution is more spread out than the standard normal; for n ≥ 30, we can approximate it by the standard normal. The quantiles t_{α,n−1} = (1 − α)-quantile of the t-distribution with n − 1 degrees of freedom are the critical values.

Unknown Mean, Unknown Variance: the CI for μ is [X̄ − t_{α/2,n−1}·S/√n, X̄ + t_{α/2,n−1}·S/√n]. BUT if n ≥ 30, you can use the standard normal distribution instead of the t-distribution, and σ² can be replaced with the approximation S².

CI for a Proportion (with Bernoulli r.v.'s): if X_1, ..., X_n ~ Bern(p), then μ = p and σ² = p(1 − p). Let p̂ = X̄ = (number of successes)/n. Since we're dealing with Bernoulli r.v.'s, S² = (n/(n−1))·p̂(1 − p̂) ≈ p̂(1 − p̂). So we get a CI of p̂ ± z_{α/2}·√(p̂(1 − p̂)/n).
Margin of Error: no standard definition; usually 1 or 2 times the standard error SD(p̂) = √(p̂(1 − p̂)/n).

Estimating Differences between Means: assume X_1, ..., X_{n1} ~ N(μ_1, σ_1²) and Y_1, ..., Y_{n2} ~ N(μ_2, σ_2²), assuming independence between X and Y. Estimate μ_1 − μ_2 by X̄ − Ȳ, where E(X̄ − Ȳ) = μ_1 − μ_2 and Var(X̄ − Ȳ) = σ_1²/n_1 + σ_2²/n_2. So a CI is (X̄ − Ȳ) ± z_{α/2}·√(σ_1²/n_1 + σ_2²/n_2). Safe to use when n_1, n_2 ≥ 30; if σ_1², σ_2² are unknown, replace them with S_1², S_2².

Estimating Differences between Proportions: if X_1, ..., X_{n1} ~ Bern(p_1) and Y_1, ..., Y_{n2} ~ Bern(p_2), assuming independence between X and Y, then
p̂_1 − p̂_2 ≈ N(p_1 − p_2, p_1(1 − p_1)/n_1 + p_2(1 − p_2)/n_2)
and the CI is (p̂_1 − p̂_2) ± z_{α/2}·√(p̂_1(1 − p̂_1)/n_1 + p̂_2(1 − p̂_2)/n_2).

Paired Samples: when X and Y are not independent, let D_1, ..., D_n where D_i = X_i − Y_i for i = 1...n, and E(D) = E(X) − E(Y). The CI is D̄ ± t_{α/2,n−1}·S_D/√n, where S_D is computed from the actual data points D_1, ..., D_n.
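A minimal sketch of the basic CI from the top of this section, using S in place of σ (reasonable here since n = 30); the data are purely illustrative:

```python
import math
import statistics

data = [52.1, 48.3, 55.0, 49.7, 51.2, 53.8, 50.5, 47.9,
        54.2, 52.7, 49.1, 51.9, 50.3, 53.1, 48.8, 52.4,
        51.0, 49.5, 54.7, 50.9, 52.2, 48.5, 53.4, 51.6,
        50.1, 49.9, 52.9, 51.3, 50.7, 53.6]
n = len(data)                # n = 30
xbar = statistics.fmean(data)
s = statistics.stdev(data)   # sample standard deviation S
z = 1.96                     # z_{alpha/2} for alpha = 0.05

half_width = z * s / math.sqrt(n)
print(f"[{xbar - half_width:.2f}, {xbar + half_width:.2f}]")
```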
Hypothesis Testing: the p-value is the probability of obtaining data at least as extreme as those in our sample, assuming H_0 is true. It is not the probability that H_0 is true; that is either 0 or 1. We reject the null hypothesis if the p-value ≤ α, the significance level. Otherwise, we fail to reject.

One-Sided: H_0: μ = μ_0, H_1: μ < μ_0. We reject H_0 if the sample mean X̄_n is too small. If the sample mean is x̄_n, then the p-value is p = P(X̄_n ≤ x̄_n | H_0).
Test Statistic: T = (X̄_n − μ_0)/(σ/√n). If T = t, then p = P(Z ≤ t). However, if n < 30, then p = P(T_{n−1} ≤ t).
Two-Sided: use T = (X̄_n − μ_0)/(σ/√n), or replace σ with S. The p-value is p = P(|Z| ≥ |t|) or p = P(|T_{n−1}| ≥ |t|).

Type I and II Error: a Type I error is when H_0 is true and you reject H_0 (false rejection). A Type II error is when H_1 is true and you fail to reject H_0 (false accept). α = P(test makes a Type I error) and β = P(test makes a Type II error). By picking α, we pick our Type I error rate.

Bernoulli Population: let X_1, ..., X_n be iid Bern(p). We conduct the hypothesis test H_0: p = p_0, H_1: p ≠ p_0. When n is large, by the CLT, X̄_n ≈ N(p_0, p_0(1 − p_0)/n) under H_0, and the test statistic is
T = (X̄_n − p_0)/√(p_0(1 − p_0)/n)

Two Independent Samples: if X_1, ..., X_m are iid N(μ_x, σ_x²) and Y_1, ..., Y_n are iid N(μ_y, σ_y²), we conduct the test H_0: μ_x − μ_y = c, H_1: μ_x − μ_y ≠ c. The test statistic is
T = (X̄_m − Ȳ_n − c)/√(σ_x²/m + σ_y²/n)
Can be adapted to situations where σ_x², σ_y² are unknown (replace them with S_x², S_y²), or X and Y are non-normal, with m, n ≥ 30.
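A sketch of the one-sample test statistic and its one- and two-sided p-values with σ known; all numbers are hypothetical:

```python
import math
from statistics import NormalDist

mu0, sigma, n = 50.0, 8.0, 36  # hypothetical H0 mean, known sigma, sample size
xbar = 52.9                    # hypothetical observed sample mean

t = (xbar - mu0) / (sigma / math.sqrt(n))         # test statistic T = t
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))  # p = P(|Z| >= |t|)
p_lower = NormalDist().cdf(t)                     # p = P(Z <= t) for H1: mu < mu0

alpha = 0.05
print(t, p_two_sided, p_lower)
print("reject H0" if p_two_sided <= alpha else "fail to reject H0")
```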
Special Continuous Random Variables:

UNIFORM
Unif(A, B): a kind of Beta r.v.; for when any number in a range is equally likely.
PDF: f(x) = 1/(B − A) if A ≤ x ≤ B, 0 otherwise
CDF: F(x) = (x − A)/(B − A) if A ≤ x ≤ B; 0 if x ≤ A; 1 otherwise
E(X) = (A + B)/2, Var(X) = (B − A)²/12

EXPONENTIAL
exp(λ): a kind of Weibull r.v.; useful for modeling waiting and interarrival times.
f(x) = λe^{−λx} if x ≥ 0, 0 otherwise
F(x) = 1 − e^{−λx} if x ≥ 0, 0 otherwise
E(X) = 1/λ, Var(X) = 1/λ²

WEIBULL
Weibull(α, β): generalization of the exponential r.v.; useful for reliability theory.
f(x) = (α/β)·(x/β)^(α−1)·e^{−(x/β)^α} if x > 0, 0 otherwise
F(x) = 1 − e^{−(x/β)^α} if x > 0, 0 otherwise
E(X) = (β/α)·Γ(1/α) = β·Γ(1 + 1/α)
Var(X) = β²·(Γ(1 + 2/α) − Γ²(1 + 1/α))
where Γ(z) = ∫_0^∞ u^(z−1)·e^{−u} du

BETA
Beta(α, β): useful for modeling r.v.'s whose set of possible values is a finite interval.
f(x) = x^(α−1)·(1 − x)^(β−1) / B(α, β) if x ∈ (0, 1), 0 otherwise
B(α, β) = Γ(α)Γ(β)/Γ(α + β)
F(x) = B(x; α, β)/B(α, β), where B(x; α, β) is the incomplete beta function
E(X) = α/(α + β), Var(X) = αβ/((α + β)²(α + β + 1))

GAMMA
Gamma(α, β): generalization of the exponential r.v.; useful for sums of exponential r.v.'s.
f(x) = x^(α−1)·e^{−x/β} / (β^α·Γ(α)) if x ≥ 0, 0 otherwise
F(x) = γ(α, x/β)/Γ(α), where γ is the lower incomplete gamma function
E(X) = αβ, Var(X) = αβ²

NORMAL
N(μ, σ²): the most widely used. A sum of a finite number of independent & identically distributed (iid) r.v.'s is (approximately) normally distributed.
f(x) = (1/(σ√(2π)))·e^{−(x−μ)²/(2σ²)}
F(x) = P(X ≤ x) = P(Z ≤ (x − μ)/σ) = Φ((x − μ)/σ)
E(X) = μ, Var(X) = σ²
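A numerical sanity check of the Weibull mean formula above, integrating x·f(x) with a simple midpoint rule; α = 2 and β = 3 are arbitrary:

```python
import math

alpha, beta = 2.0, 3.0  # illustrative shape and scale

def weibull_pdf(x):
    return (alpha / beta) * (x / beta) ** (alpha - 1) * math.exp(-((x / beta) ** alpha))

# Midpoint-rule integration of x * f(x); the tail is negligible past x = 30 here.
dx, total, x = 1e-3, 0.0, 1e-3 / 2
while x < 30.0:
    total += x * weibull_pdf(x) * dx
    x += dx

print(total)                             # numerical E(X)
print(beta * math.gamma(1 + 1 / alpha))  # closed form, roughly 2.6587
```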
