
Statistics: the science of decision making in the face of uncertainty.
Descriptive statistics: summarize the information in a data set by numerical or graphical methods.
Inferential statistics: generalize from a small set of data (a sample) to a larger set of data (the population) and measure how accurate that generalization is. Example: sample 1's percentage is higher than sample 2's, so infer that population 1's percentage is higher than population 2's.
Population: all units of interest to the researcher; it may be finite or infinite.
Variable: any characteristic of a unit in a population.
1. Qualitative
2. Quantitative: discrete or continuous
Dependent variable: the variable in a population that the researcher is interested in studying.
Independent variable: a variable that influences the dependent variable.
Census: a census of a variable occurs when the variable is measured for every unit in the population.
Parameter: a number computed from the entire population; it is a fixed constant.
Statistic: a number computed from a sample; its value changes from sample to sample.
Observational study: observes units without controlling the conditions.
Experiment: manipulates the conditions under which observations are made.
Random sample: a sample selected so that every unit in the population has the same chance of being selected. Equal chances of selection make the sample as likely as possible to be representative of the population, and random sampling is the only way to measure the accuracy of results based on the sample.
Inference: using a statistic to estimate the value of a parameter, then measuring the reliability of that estimate.
Logic of inference: assume a condition is true, then determine the probability of the observed outcome given that the condition is true.
Confounding variables: variables not controlled in the study whose effects are mixed up with those of the variables being studied, making it impossible to tell which variable caused the observed effect.
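The sketch below makes the parameter/statistic distinction concrete. It assumes a made-up population of 1,000 units with values 1 to 1000. The names POPULATION, MU and sample_mean are illustrative and do not come from the notes. The parameter is computed once from every unit. The statistic is recomputed from each simple random sample and changes from sample to sample.

```python
import random
import statistics

# Hypothetical population: 1,000 units whose measurements are 1, 2, ..., 1000.
POPULATION = list(range(1, 1001))

# Parameter: uses every unit, so it is a fixed constant.
MU = statistics.mean(POPULATION)


def sample_mean(n, seed):
    """Statistic: mean of a simple random sample of size n drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(POPULATION, n)  # each unit has the same chance of being chosen
    return statistics.mean(sample)


if __name__ == "__main__":
    print("parameter mu =", MU)  # always 500.5
    for seed in range(3):
        print("statistic x_bar =", sample_mean(50, seed))  # varies with the sample
```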
Histograms
Relative frequency histogram: proportions on the vertical axis.
Frequency histogram: counts on the vertical axis.
Shape: symmetric, skewed (named for the side where the long tail lies), or bimodal (two local maxima).
Spread: large or small.
Gaps: intervals where no observations occur.
Outliers: values that deviate from the overall shape of the data. Goal: explain outliers, and drop them from the data only when justifiable.
Distribution of a quantitative variable X: gives all possible values of X and the proportion of values at each value, or in each interval of values.
A continuous variable has a PDF (probability density function); a discrete variable has a PMF (probability mass function).
PDF: $f(x) \ge 0$ for all $x$, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
PMF: $f(x) \ge 0$ for all $x$, and $\sum_x f(x) = 1$.
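This sketch computes the counts behind a frequency histogram and the proportions behind a relative frequency histogram, using made-up data. It also checks the two PMF conditions, which the relative frequencies of any data set satisfy. The helper names frequency_tables and is_valid_pmf are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def frequency_tables(data):
    """Return (counts, relative frequencies) for each distinct value in data."""
    counts = Counter(data)
    n = len(data)
    rel = {value: Fraction(c, n) for value, c in counts.items()}
    return dict(counts), rel


def is_valid_pmf(pmf):
    """A PMF needs f(x) >= 0 for every x and the f(x) values to sum to exactly 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


if __name__ == "__main__":
    data = [2, 3, 3, 4, 4, 4, 5, 5, 9]  # made-up data; 9 sits apart from the rest
    counts, rel = frequency_tables(data)
    for x in sorted(counts):
        print(x, counts[x], rel[x])  # value, frequency, relative frequency
    print(is_valid_pmf(rel))
```

Exact fractions are used so that the "sums to 1" check is not disturbed by floating-point rounding.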

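Normal distribution: notation $X \sim N(\mu, \sigma^2)$. For a given x value, the standard normal z value (used in the table) is found by $z = \frac{x - \mu}{\sigma}$. Bell-shaped curve, continuous.
Lognormal distribution: notation $\ln(X) \sim N(\mu, \sigma^2)$. For a given x value, the standard normal z value (used in the table) is found by $z = \frac{\ln x - \mu}{\sigma}$. Right-skewed curve, continuous, defined for x > 0.
Binomial distribution:
Bernoulli process: a process in which each trial has two outcomes, S (success) or F (failure), with the same p = P(S) on every trial. When each outcome does not depend on the previous ones, the trials are independent.
X: the number of successes in n independent trials.
P(x): the proportion of repetitions in which X = x, over a large number of repetitions of the process: $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \dots, n$, where $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$.
Notation: $X \sim B(n, p)$.
Measures of center: $\bar{x}$ is the sample mean (average); $\tilde{x}$ is the sample median. Skewed left: typically $\bar{x} < \tilde{x}$. Skewed right: typically $\bar{x} > \tilde{x}$.
Measuring the center of a distribution (expected value):
Discrete: $E(X) = \mu = \sum_x x f(x)$ and $E(g(X)) = \sum_x g(x) f(x)$.
Continuous: $E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx$ and $E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)\,dx$.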
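The formulas above translate directly into a few helpers. This is a sketch using Python's standard library: statistics.NormalDist stands in for the z table, and math.comb supplies the binomial coefficient. The function names and example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean, median


def z_score(x, mu, sigma):
    """Standardize an x value from N(mu, sigma^2): z = (x - mu) / sigma."""
    return (x - mu) / sigma


def lognormal_z(x, mu, sigma):
    """If ln(X) ~ N(mu, sigma^2), standardize ln(x): z = (ln x - mu) / sigma."""
    return (math.log(x) - mu) / sigma


def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p), i.e. C(n, x) * p^x * (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


if __name__ == "__main__":
    z = z_score(130, mu=100, sigma=15)
    print(z, NormalDist().cdf(z))  # 2.0 and P(Z <= 2.0), about 0.9772
    print(binomial_pmf(3, n=10, p=0.5))  # 120 / 1024 = 0.1171875

    # Rule of thumb: a long right tail pulls the mean above the median.
    right_skewed = [1, 2, 2, 3, 3, 3, 20]  # made-up data
    print(mean(right_skewed), median(right_skewed))
```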

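Example, uniform distribution on [a, b]: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$, or $f(x) = 0$ otherwise, so $E(X) = \frac{a+b}{2}$.
Measures of variation:
$s^2$: sample variance, $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
$s$: sample standard deviation, $s = \sqrt{s^2}$
$\sigma^2$: population variance. Definition: $\sigma^2 = E[(X - \mu)^2]$. Computing formula: $\sigma^2 = E(X^2) - \mu^2$. Shortcuts: uniform, $\sigma^2 = \frac{(b-a)^2}{12}$; binomial, $\sigma^2 = np(1-p)$.
$\sigma$: population standard deviation, $\sigma = \sqrt{\sigma^2}$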
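This sketch checks the variance formulas numerically with made-up data. The n - 1 divisor matches statistics.variance. The computing formula E(X^2) - mu^2, applied to a B(10, 0.3) PMF, reproduces the shortcut np(1 - p) = 2.1 up to floating-point error. The helper names are hypothetical.

```python
import math
import statistics


def sample_variance(xs):
    """s^2 = sum((x_i - x_bar)^2) / (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def pmf_mean_var(pmf):
    """Mean sum(x f(x)) and variance E(X^2) - mu^2 of a discrete distribution."""
    mu = sum(x * p for x, p in pmf.items())
    ex2 = sum(x * x * p for x, p in pmf.items())
    return mu, ex2 - mu**2


if __name__ == "__main__":
    data = [4, 8, 6, 5, 3, 7]  # made-up data
    print(sample_variance(data), statistics.variance(data))  # both 3.5
    print(math.sqrt(sample_variance(data)), max(data) - min(data))  # s and the range

    n, p = 10, 0.3
    binom = {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    print(pmf_mean_var(binom), n * p * (1 - p))  # variance about 2.1 both ways
```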

Range = (largest value) - (smallest value)
Experimental outcomes: union $A \cup B$, intersection $A \cap B$. Disjoint (mutually exclusive): no outcomes in common, $A \cap B = \emptyset$.
Probability axioms:
1. $P(A) \ge 0$ for every event A.
2. $P(S) = 1$, where S is the sample space.
3. If $A_1, A_2, \dots$ is a sequence of pairwise disjoint events in S, then $P(A_1 \cup A_2 \cup \cdots) = \sum_i P(A_i)$.
Assigning probability:
Classical assignment: if the N outcomes are equally likely, then P(outcome) = 1/N, where N is the total number of outcomes. Interpret: as the experiment is repeated indefinitely, the proportion of times the outcome occurs converges to P(outcome).
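This sketch shows the classical assignment and its long-run interpretation side by side. It assumes two fair dice as the experiment, which is not an example from the notes. The classical assignment counts equally likely outcomes. The simulated proportion should settle near the same value as the number of trials grows.

```python
import random
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: N = 36 equally likely outcomes.
S = list(product(range(1, 7), repeat=2))


def sum_is_7(outcome):
    return outcome[0] + outcome[1] == 7


def classical_prob(event):
    """Classical assignment: (number of outcomes in the event) / N."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def relative_freq(event, trials, seed=0):
    """Proportion of simulated rolls in which the event occurs."""
    rng = random.Random(seed)
    hits = sum(event((rng.randint(1, 6), rng.randint(1, 6))) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    print(classical_prob(sum_is_7))  # 1/6
    for trials in (100, 10_000, 100_000):
        print(trials, relative_freq(sum_is_7, trials))  # tends toward 1/6 as trials grow
```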

Relative frequency assignment: P(outcome) = the proportion of the population with that outcome, or the proportion of times the outcome occurs as the experiment is repeated indefinitely. Interpret: as the experiment is repeated indefinitely, drawing/sampling with replacement, the proportion of times the outcome occurs converges to P(outcome).
Subjective assignment: a measure of belief in an outcome.
Rules:
Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Complement rule: $P(A') = 1 - P(A)$
Conditional probability, definition: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$.
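These rules can be verified by brute-force counting. The sketch assumes two fair dice, with A = "first die shows 6" and B = "total is at least 10". Both events are hypothetical and chosen only for illustration. It computes P(A | B) both from the reduced sample space B and from the definition.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: 36 equally likely outcomes (first die, second die).
S = list(product(range(1, 7), repeat=2))


def P(event):
    """Classical probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def A(s):
    return s[0] == 6  # first die shows 6


def B(s):
    return s[0] + s[1] >= 10  # total is at least 10


def A_and_B(s):
    return A(s) and B(s)


def A_or_B(s):
    return A(s) or B(s)


def not_A(s):
    return not A(s)


def cond_by_reduced_space():
    """P(A | B) by keeping only the outcomes in B and counting those also in A."""
    reduced = [s for s in S if B(s)]
    return Fraction(sum(1 for s in reduced if A(s)), len(reduced))


def cond_by_definition():
    """P(A | B) = P(A and B) / P(B)."""
    return P(A_and_B) / P(B)


if __name__ == "__main__":
    print(P(A_or_B), P(A) + P(B) - P(A_and_B))  # addition rule: both 1/4
    print(P(not_A), 1 - P(A))  # complement rule: both 5/6
    print(cond_by_reduced_space(), cond_by_definition())  # both 1/2
```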

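Errors in hypothesis testing:
1. Type 1 error: Ho is true but we conclude Ha. $P(\text{conclude Ha} \mid \text{Ho is true}) = \alpha$
2. Type 2 error: Ha is true but we fail to reject Ho. $P(\text{fail to reject Ho} \mid \text{Ha is true}) = \beta$
Power: the probability of correctly concluding Ha, $P(\text{conclude Ha} \mid \text{Ha is true}) = 1 - \beta$
Goal: low $\alpha$ and $\beta$, and high power.
p-values, for a test statistic T with observed value t:
Ha: $\mu > \mu_0$: p-value $= P(T \ge t \mid \text{Ho is true})$
Ha: $\mu < \mu_0$: p-value $= P(T \le t \mid \text{Ho is true})$
Ha: $\mu \ne \mu_0$: p-value $= 2P(T \ge |t| \mid \text{Ho is true})$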
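This sketch illustrates the p-value formulas under one assumption: the test statistic is standard normal under Ho. That is a z test with known sigma, swapped in for a t test because Python's standard library has no t CDF. The simulation draws data with Ho true and checks that the fraction of false rejections is close to alpha, the Type 1 error probability. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()  # standard normal, assumed to be the null distribution of the statistic


def p_value(z_obs, alternative):
    """p-value of an observed z statistic for Ha: 'greater', 'less' or 'two-sided'."""
    if alternative == "greater":  # Ha: mu > mu0
        return 1 - Z.cdf(z_obs)
    if alternative == "less":  # Ha: mu < mu0
        return Z.cdf(z_obs)
    if alternative == "two-sided":  # Ha: mu != mu0
        return 2 * (1 - Z.cdf(abs(z_obs)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


def type1_error_rate(alpha=0.05, n=25, reps=20_000, seed=1):
    """Simulate data with Ho true (mu = mu0 = 0, sigma = 1 known); return the rejection rate."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(xs) / n - 0) / (1 / math.sqrt(n))  # (x_bar - mu0) / (sigma / sqrt(n))
        if p_value(z, "two-sided") <= alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print(p_value(1.96, "two-sided"))  # about 0.05
    print(type1_error_rate())  # close to alpha = 0.05
```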

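Two ways to calculate a conditional probability P(A | B): 1. use the reduced (conditional) sample space B; 2. use the definition.
Independence: two events A and B are independent if $P(A \cap B) = P(A)P(B)$, equivalently $P(A \mid B) = P(A)$. Draws with replacement are independent; draws without replacement are dependent.
Multiplication rule: $P(A \cap B) = P(A)P(B \mid A) = P(B)P(A \mid B)$
Sampling distribution:
Parameter: a constant number based on all units of the population: $\mu$, the population mean; $\sigma^2$, the population variance; $\sigma$, the population standard deviation.
Unbiased estimator: its sampling distribution has a mean equal to the parameter being estimated.
Biased estimator: the mean of its sampling distribution is not equal to the parameter.
If no unbiased estimator exists, use the biased estimator with the smallest standard deviation.
Sampling distribution of $\bar{X}$: if the population is normal, $X \sim N(\mu, \sigma^2)$, then $\bar{X} \sim N(\mu, \sigma^2/n)$. If the population is not normal but $n \ge 30$, then $\bar{X}$ is approximately $N(\mu, \sigma^2/n)$.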
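Unbiasedness can be checked exactly on a tiny population by averaging an estimator over every possible sample. The sketch assumes sampling with replacement, so draws are independent, from the hypothetical population [1, 2, 6], where mu = 3 and sigma^2 = 14/3. Under that assumption, x_bar and s^2 (divisor n - 1) average to their parameters. The divisor-n estimator averages to (n - 1)/n times sigma^2, and the variance of x_bar comes out to sigma^2/n.

```python
from fractions import Fraction
from itertools import product

POPULATION = [1, 2, 6]  # hypothetical tiny population
N = len(POPULATION)
MU = Fraction(sum(POPULATION), N)  # parameter: 3
SIGMA2 = sum((x - MU) ** 2 for x in POPULATION) / N  # parameter: 14/3


def x_bar(sample):
    return Fraction(sum(sample), len(sample))


def s2(sample):
    """Sample variance with divisor n - 1."""
    m = x_bar(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def s2_divisor_n(sample):
    """Alternative variance estimator with divisor n."""
    m = x_bar(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def mean_of_sampling_distribution(statistic, n):
    """Average a statistic over all N**n equally likely samples drawn with replacement."""
    samples = list(product(POPULATION, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)


if __name__ == "__main__":
    n = 2
    print(mean_of_sampling_distribution(x_bar, n), MU)  # 3 and 3: x_bar is unbiased
    print(mean_of_sampling_distribution(s2, n), SIGMA2)  # 14/3 and 14/3: s^2 is unbiased
    print(mean_of_sampling_distribution(s2_divisor_n, n))  # 7/3: biased low
```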

[Figure: normal sampling distribution of the test statistic; shaded area = p-value]

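t-distribution: $T \sim t(df)$, where df = degrees of freedom. As $df \to \infty$, the t distribution approaches the standard normal N(0, 1). $t_\alpha(df)$ denotes the value with upper-tail area $\alpha$: $P(T > t_\alpha(df)) = \alpha$.
ex. Interpretation: if t values are randomly selected from a t(df) distribution indefinitely, with replacement, the proportion of values greater than 2.4 converges to $P(T > 2.4)$.
Confidence interval (measure of reliability): a $100(1-\alpha)\%$ CI for $\mu$ is $\bar{x} \pm t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}}$, where s is the sample standard deviation.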
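This is a minimal sketch of the t interval with made-up data. The critical value t_{0.025}(9) = 2.262 is a table value hard-coded for n = 10. That is an assumption of the sketch, since Python's standard library has no t quantile function.

```python
import math
import statistics
from statistics import NormalDist

# t_{0.025}(9) = 2.262 from a t table (df = n - 1 = 9); hard-coded because the
# standard library has no t quantile function (an assumption of this sketch).
T_CRIT_95_DF9 = 2.262


def t_interval(xs, t_crit):
    """100(1 - alpha)% CI for mu: x_bar +/- t_crit * s / sqrt(n)."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (divisor n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


if __name__ == "__main__":
    data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 10.1]  # made-up, n = 10
    print(t_interval(data, T_CRIT_95_DF9))
    # The t critical value exceeds the normal one (about 1.96); the gap shrinks as df grows.
    print(T_CRIT_95_DF9, NormalDist().inv_cdf(0.975))
```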

Interpret: the method used to obtain a 95% interval has a 95% chance of producing an interval that contains $\mu$.
CI ex: if the whole CI for $\mu$ lies above (below) #, conclude $\mu > \#$ ($\mu < \#$).
CI ex: if the CI for $\mu$ contains #, we can't reject the claim $\mu = \#$.
Hypothesis test:
Let $\mu$ = mean.
Ho = null hypothesis, or status quo.
Ha = alternative hypothesis.
p-value = the probability, assuming Ho is true, of obtaining a test statistic at least as extreme as the one observed. The smaller the p-value, the stronger the evidence against Ho.
- If p-value ≤ α, conclude: reject Ho in favor of Ha.
- If p-value > α, conclude: fail to reject Ho.
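The "method has a 95% chance" interpretation can be checked by simulation. This sketch assumes normal data with a known sigma, so that a z interval (critical value from statistics.NormalDist) can stand in for the t interval. The fraction of intervals that capture mu should be close to 0.95. The decide helper is a hypothetical encoding of the p-value decision rule.

```python
import math
import random
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # about 1.96


def z_interval(xs, sigma):
    """95% CI for mu when sigma is known (assumed to keep the sketch stdlib-only)."""
    n = len(xs)
    x_bar = sum(xs) / n
    margin = Z975 * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def coverage(mu=50.0, sigma=10.0, n=20, reps=10_000, seed=2):
    """Fraction of intervals, over repeated samples, that contain the true mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)], sigma)
        hits += lo <= mu <= hi
    return hits / reps


def decide(p_value, alpha=0.05):
    """Decision rule: reject Ho in favor of Ha when p-value <= alpha."""
    return "reject Ho" if p_value <= alpha else "fail to reject Ho"


if __name__ == "__main__":
    print(coverage())  # close to 0.95
    print(decide(0.03), decide(0.20))
```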