
Preliminaries (Review)

Basics

The purpose of statistics is to design studies and analyze the data that those studies produce. In other words, it is the science of collecting and learning from data. Examples include:

- Predicting the outcome of an election based on a survey of 500 registered voters
- Designing and properly analyzing clinical trials to get FDA approval for an experimental drug
- Clustering online shoppers based on their purchasing behavior so that advertising can be targeted to specific groups likely to be interested in a product
- Analyzing data produced in an fMRI study to ascertain how the brain responds to certain types of tasks or stimuli

In short, any field that requires the collection of data and drawing subsequent conclusions uses statistics. The idea is that we are interested in some numerical summary (e.g. average) or characteristic of some extremely large group, but we can't observe the whole group directly. We must observe what we can, describe it, and decide what we think is true.

We learn about populations using samples.


Population: The total set of subjects in which we are interested. E.g., the entire electorate in a certain state, all people who would possibly ever take a drug, every human brain, etc.

Sample: A subset of the population that we observe and collect data on. E.g., 200 voters who participate in a poll, volunteer patients in a clinical study, 25 randomly selected Amazon.com users, etc.

Subject: An individual element of the sample. E.g., each individual voter, a single individual brain from an fMRI study, an individual rat that was exposed to some treatment, etc.

Parameter: A numerical value summarizing the entire population. E.g., the true proportion of voters in Georgia who will vote for Barack Obama, the true average gas mileage of every Cadillac Escalade on the road, etc.

Statistic: A numerical value summarizing the sample. E.g., the proportion of our 200 voters who said they would vote for Obama, the maximum time to finish a race out of 15 selected athletes, etc.

A note on notation
The convention in statistics is to (usually) use Greek letters to denote parameters and Latin letters to denote statistics:

Population mean μ / Sample mean x̄
Population standard deviation σ / Sample standard deviation s
Population correlation ρ / Sample correlation r

Of course, there are exceptions. We will often use p to denote a population proportion and p̂ when referring to a sample proportion. (π is also used for population proportions, but this can be confusing when the context is not clear.)

Aspects of Statistics
Statistics, broadly speaking, can be divided into three categories: 1. Design - Deciding how to collect data so that the most information possible can be extracted to answer the specific questions of interest

2. Descriptive Statistics - Methods of graphically displaying and numerically summarizing collected data

3. Inferential Statistics - Using the collected information to draw conclusions and/or make predictions about the population as a whole

Experimental design primarily falls under (1). However, we will also be considering how to analyze data after it has been collected in an experiment to draw appropriate conclusions. This aspect falls under (3).

Categorical versus Quantitative Variables


Categorical data arise when what we are observing or recording is a member of some category. For example, a person may be short or tall; a car is red, black, or other; a person responds to a questionnaire with strongly disagree, disagree, neutral, agree, or strongly agree. Such data are usually displayed and analyzed via frequency tables, contingency tables, or bar charts. Pie charts are also used, but these are pretty much useless from a statistics perspective.

Quantitative variables, on the other hand, are numerical measures of subjects, e.g. height, weight, test score, etc. We further make a distinction between discrete and continuous variables:

Discrete - The variable can only take on a countable number of values. (Note that countable is not the same as finite.) If you can count the possible values that could be observed, then the (quantitative) variable is considered discrete. Example: Discrete variables include the number of students in a randomly selected class on campus, the number of times a 6 is rolled out of 15 rolls of a die, the number of words on a page of notes, etc.

Continuous - If a variable can (theoretically) take on any value in a continuum of possible values, then it is considered continuous. For continuous variables, the number of possibilities that can be observed is uncountable. Example: Height, weight, and temperature may all be considered to be continuous variables. For example, there is no way to count all the possible weights of cars we could observe.

When dealing with quantitative data (as will usually be the case in this course), it is generally a good idea to plot your data. There are several ways in which we can graphically depict such data, including histograms, stem-and-leaf plots, and dotplots.

Often, at least one of the purposes of plotting your data in, e.g., a histogram is to get some idea of the shape of the population from which our observations came. A common assumption in statistical inference is that our data are normally distributed, so that the population as a whole follows a normal (symmetric and bell-shaped) distribution. Certainly this assumption is not tenable if we plot a reasonably large sample and find that it is skewed.

Skewed Right - The right tail of the distribution is stretched out more than the left tail

Skewed Left - The left tail of the distribution is stretched out more than the right tail

Symmetric - When split down the middle, the distribution is a mirror image of itself

As our sample gets larger and larger, and the bins of the histogram get smaller and smaller, we end up with a smooth curve that describes how values in the population are distributed. The actual smoothed histogram of the entire population (called a probability density function) gives a complete picture of how the values in the population are distributed, so that we can evaluate probabilities of observing events within this population. In a sense, most of the science of statistics boils down to questions of what this probability distribution (or density) looks like. Arguably the two most often asked questions of probability distributions are (1) what is the center of this distribution, or a typical value we're likely to observe, and (2) how spread out is the distribution?

Measures of Center for Quantitative Data


Questions of a typical value are akin to asking where we think the center of a distribution lies. The most common values used as a measure of this center are the mean, median, and mode.

Mean - This is just the average. For a sample, we calculate the mean as

x̄ = (Σ xᵢ) / n

Remember that we denote the sample mean with x̄ and the population mean with μ.

Median - This value occupies the middle position when the data are arranged in ascending order. In other words, the median separates the top 50% of the data from the bottom 50%. To compute the median:
1. Arrange the data in ascending order
2. (a) If n is odd, the median is the (n + 1)/2 value in the ordered list
   (b) If n is even, we take the median to be the average of the middle two observations. That is, we average the n/2 and n/2 + 1 values.

Mode - The mode is just the value that occurs most often in a dataset; i.e. it is the value that corresponds with the highest frequency. Note that the mode does not have to be unique. We can have bimodal datasets:
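As a quick illustration, the three measures of center can be computed with Python's standard `statistics` module. The data below are made up; note how the single outlier (40) drags the mean well above the median:

```python
import statistics

# hypothetical sample with one large outlier (40)
data = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(data)      # average; sensitive to the outlier
median = statistics.median(data)  # middle value; robust to the outlier
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)  # 10 5 3
```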

When thinking about the best number to use to describe a typical value, you want to think about the spread of the data and whether there are any outliers. A mean is sensitive to every value in the dataset, so even one outlier can pull the average quite far off from what we would consider a typical value. The median is much more robust (resistant) to extreme values. Further, we can compare the mean and the median to get some idea of the shape of the distribution:

Mean > Median - The data are skewed right
Mean < Median - The data are skewed left
Mean = Median - The data are symmetric

Measures of Spread for Quantitative Data


To measure how spread out a dataset is, we would consider values such as the standard deviation, variance, and range.

Standard Deviation - This is a measure of the average distance away from the average. It is simply a way of quantifying the typical deviation for an observation in the data, where

deviation for observation i = xᵢ − x̄.

Variance - While not as easily interpretable as the standard deviation, it is always computed as a step toward finding the standard deviation. It can be thought of as the typical squared distance away from the average. It is used because it is more mathematically convenient to work with. The sample variance also has many desirable properties as far as estimating the population variance. The sample variance is given by

s² = Σ(xᵢ − x̄)² / (n − 1),

whence the sample standard deviation is simply s = √s².

Range - Measures the spread with the distance between the largest and smallest values:

range = maximum − minimum

Sometimes, we are also interested in the position of a particular value relative to the other values in the dataset. This is found via sample percentiles or z-scores.

Percentile - The pth percentile is a value such that p% of the data are less than or equal to that value. Special cases are the first quartile (Q1 = 25th percentile), the median (50th percentile), and the third quartile (Q3 = 75th percentile). Another quantity that can be used to describe the spread of data is the inter-quartile range,

IQR = Q3 − Q1
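A sketch of these spread measures in Python, stdlib only. The dataset is hypothetical; `statistics.variance`/`stdev` use the n − 1 denominator above, and `quantiles(..., method="inclusive")` is one of several common quartile conventions, so other software may give slightly different quartiles:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8

s2 = statistics.variance(data)   # sample variance, divides by n - 1
s = statistics.stdev(data)       # sample standard deviation, sqrt of s2
rng = max(data) - min(data)      # range = maximum - minimum

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                    # inter-quartile range

print(s2, rng, iqr)
```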

Boxplots are graphical representations of the five-number summary (Min, Q1, Median, Q3, Max).

Z-scores measure how far a value is away from its mean in terms of the number of standard deviations:

z-score of observation i = (xᵢ − x̄) / s

Recall that z-scores (i) scale all of our data to a common metric so that they can be compared, (ii) should usually be between -3 and 3, if we're using the right mean and standard deviation, and (iii) follow a standard normal distribution if the data are themselves normally distributed.
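The z-score formula translates directly to code; a minimal sketch with a made-up sample:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
xbar = statistics.mean(data)       # sample mean
s = statistics.stdev(data)         # sample standard deviation

# standardize each observation: (x_i - xbar) / s
z_scores = [(x - xbar) / s for x in data]

# standardized data have mean 0, and here every value falls within (-3, 3)
print(round(max(z_scores), 2))  # 1.87, the z-score of the observation 9
```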

Probability Distributions

Probability
Probability, simply stated, is how we quantify uncertainty. For our purposes, we can think of the probability of a particular outcome as the long-run relative frequency of its occurrence. In other words, when I repeat a trial a large number of times, the ratio of the number of times I observe that particular outcome to the total number of trials is approximately its probability. The larger the number of trials, the closer the ratio is to the actual probability. This is a case of the Law of Large Numbers. Another way of thinking about probability is the proportion of the population that satisfies the event of interest. Example: 57% of all college students in the United States are female. So, if the population is all college students in the United States, then the probability of randomly selecting a female is P(female) = .57.
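The long-run-frequency idea is easy to see in a simulation. The sketch below flips a fair coin many times and watches the proportion of heads approach 0.5 (the seed is arbitrary, fixed only for reproducibility):

```python
import random

random.seed(42)  # arbitrary seed, for reproducibility

for n_trials in (100, 10_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    print(n_trials, heads / n_trials)  # ratio approaches P(heads) = 0.5
```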


A few basic rules about probability:

- 0 ≤ P(A) ≤ 1, for any event A.
- If A₁, ..., Aₙ partition the sample space so that each possible outcome is in one and only one event Aᵢ, then Σᵢ P(Aᵢ) = 1.
- P(Aᶜ) = 1 − P(A)
- P(A or B) = P(A) + P(B) − P(A and B)
- P(A|B) = P(A and B) / P(B)

Two events are said to be independent if knowing that one occurred does not affect any probability statements we make about the other. This is another assumption we will usually make about any data we analyze. This is actually an extremely important assumption and why it is important to make sure we have a random sample. Example: If I get heads on the first flip of a coin, what does that tell me about the probability of getting heads on the second flip?


Example: Suppose we draw an ace from an ordinary 52-card deck and don't replace it. What is the probability of getting another ace on the second draw? Is it the same as the probability of an ace on the first draw? Are the two draws independent or dependent?

Lastly, note that if two events A and B are independent, then P(A and B) = P(A)P(B).

Probability Distributions
We have already noted that distributions can be described with histograms for an entire population. Specifically, probability distributions are a way of specifying all possible values and the probabilities with which any of those values may occur. The mean of the probability distribution of X is called the expected value of X, denoted E(X). We usually describe the probability distribution of continuous random variables with a curve. In such cases, the probability of a range of values occurring is just the area under the curve between the values specified.

Note that calculating an area always involves length × height. For probability curves, the length is the length of the interval of values we're interested in and the (varying) height is determined by the function. Note that when we consider something like P(X = 0), the length of this interval is 0. Hence, for any continuous random variable, the probability that it is exactly equal to any number is 0. By far the most important distribution we deal with in statistics is the normal distribution. This distribution is characterized by the usual symmetric, bell-shaped curve. It's very difficult to even approximate areas under a normal curve, so we have to use a normal table or a computer to calculate probabilities associated with normal distributions. However, there are some special cases we can figure out via the Empirical Rule.

Example: Suppose X follows a normal distribution with mean μ and standard deviation σ, for which we will use the notation X ~ N(μ, σ²). Using the Empirical Rule, find P(μ − σ < X < μ + 2σ).


Recall that if X ~ N(μ, σ²), then Z = (X − μ)/σ ~ N(0, 1). Example: For a standard normal random variable Z, find P(0 < Z < 2). Compare this to the previous example.
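Both probabilities can be checked with `statistics.NormalDist` from the Python standard library, which computes normal CDF values for us (a sketch, in place of a normal table):

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)  # standard normal

# P(0 < Z < 2): Empirical Rule suggests roughly 95% / 2 = 47.5%
p1 = Z.cdf(2) - Z.cdf(0)

# P(-1 < Z < 2), i.e. P(mu - sigma < X < mu + 2*sigma) for any normal X
p2 = Z.cdf(2) - Z.cdf(-1)

print(round(p1, 4), round(p2, 4))  # 0.4772 0.8186
```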

Sampling Distributions and The Central Limit Theorem


When we take a sample from a population, we are just observing repeated realizations of a random variable from its probability distribution. As such, the sample is random, and thus any quantity we calculate from the sample (i.e. any statistic) is also random. This means that values such as X̄ and S² are themselves random variables and have their own probability distributions. The probability distribution of a statistic such as X̄ is called a sampling distribution. The sampling distribution of a sample statistic describes all possible values the statistic could take over all possible samples of size n. The standard deviation of a sampling distribution is called a standard error. So why is the normal distribution so prevalent in statistics? Because of the Central Limit Theorem.

The Central Limit Theorem (CLT) says that the sampling distribution of the sample mean will be well approximated by a normal distribution as long as the mean is based on adding up n independent random variables, provided n is large. The CLT is useful because it tells us what the sampling distribution of X̄ looks like regardless of the population distribution. Of course, if we have a small sample (or dependent observations!), the result is of little use. However, if the population itself is normal, then X̄ is automatically normally distributed regardless of the sample size. So, concerning the sampling distribution of X̄, we have two things that are always true, and one that is sometimes true:


The Sampling Distribution of X̄

Regardless of how the population is distributed:

- E(X̄) = μ, i.e. the mean of the distribution of the sample mean is the same as the mean of the population
- se(X̄) = σ/√n, i.e. the standard error of the sample mean is the population standard deviation divided by the square root of the sample size

If the sample size is large, then X̄ ~ N(μ, σ²/n), by the CLT.

Example: The average fat content of hot dogs is 18 grams. Suppose the standard deviation is 3. What is the probability that the average of 47 randomly selected hot dogs exceeds 19.2 g?

Now suppose that the fat content of hot dogs is normally distributed. What is the probability that the fat content of a randomly selected hot dog exceeds 19.2?
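A sketch of both calculations, using the stated mean of 18 g and standard deviation of 3 g (stdlib only):

```python
import math
from statistics import NormalDist

mu, sigma, n = 18, 3, 47

# Sampling distribution of the mean: Xbar ~ N(mu, sigma^2 / n) by the CLT
se = sigma / math.sqrt(n)
p_mean = 1 - NormalDist(mu, se).cdf(19.2)       # P(Xbar > 19.2)

# Single hot dog, assuming fat content itself is normal: X ~ N(18, 9)
p_single = 1 - NormalDist(mu, sigma).cdf(19.2)  # P(X > 19.2)

print(round(p_mean, 4), round(p_single, 4))
```

Note how much smaller the first probability is: averaging 47 hot dogs shrinks the spread by a factor of √47.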


Inference

Point vs. Interval Estimation


A general problem with which statistics is concerned is that we rarely have knowledge of the true parameters governing the observations that we get from collecting data. In fact, one could argue that the whole point of collecting data is to learn as much as we can about what those parameters are. This is what makes sampling distributions so useful - by knowing what probabilities we have of observing certain statistics, given the parameters, we can work backward and make educated guesses about what the parameter(s) must be, given what we have observed. These educated guesses comprise what we call estimation and are generally categorized as either point estimation or interval estimation.

Point Estimate - A single number used as our best guess for the value of an unknown parameter

Interval Estimate - An interval of numbers that we use as a set of plausible values of the parameter of interest. These intervals are constructed from properties that we know to be true of sampling distributions

Confidence Intervals
Interval estimation is done using confidence intervals (CI). These are just intervals that we say contain the true value of the parameter with a certain level of confidence. The general form for most (but not all!) confidence intervals we will be concerned with is given by

point estimate ± (critical value) × standard error.

The product on the right-hand side of the expression is called the margin of error. The critical value is totally determined by our desired level of confidence. That is, for a C% confidence interval, the critical value is the number of standard errors away from the mean within which the middle C% of the probability is contained under the sampling distribution of the point estimate.


Example: Under certain conditions, the sampling distribution of the sample proportion (p̂) is normal. So, the critical value for 95% confidence intervals about proportions is approximately 1.96.
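For instance, a 95% interval for a proportion can be assembled straight from the general form above. The sample values here are hypothetical:

```python
import math

p_hat, n = 0.57, 500   # hypothetical sample proportion and sample size
z_star = 1.96          # 95% critical value from the normal distribution

se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
moe = z_star * se                         # margin of error
ci = (p_hat - moe, p_hat + moe)

print(round(ci[0], 3), round(ci[1], 3))  # 0.527 0.613
```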

NOTE: We must be very careful about how we interpret confidence intervals. Let's say a 95% CI for the average number of peaches produced by a tree in some orchard is given by (112, 148). We would then say, "We are 95% confident that the true average number of peaches produced by trees in this orchard is between 112 and 148." This is not the same as saying the true average has a 95% chance of being between 112 and 148. Since the true average is (assumed to be) a fixed number, it's either in that interval or it's not. We're saying that we have no idea if the true value was captured in that particular interval, but we know that 95% of all the intervals we could ever construct using this method would capture it.


The (Student's) t Distribution


When dealing with means of quantitative variables, the most common point estimate is x̄. So, how do we find the necessary critical value for confidence intervals about x̄? Assuming the necessary criteria are met, the sampling distribution is normal. However, the standard error of X̄ is σ/√n, which obviously involves the population standard deviation. We usually don't know what the true standard deviation is, so it has to be estimated with the sample standard deviation, s.
Again, in a perfect world, we would like to start with Z = (X̄ − μ)/(σ/√n) ~ N(0, 1) to derive interval estimates for μ. Since we are now estimating σ with s, our starting point is the quantity (X̄ − μ)/(s/√n). This introduces additional uncertainty into the sampling distribution, which is now called a t distribution.

The t distribution is used as an approximation to the normal distribution, accounting for the additional uncertainty introduced by estimating the standard deviation.

Properties of the t distribution: 1. It is bell-shaped and symmetric about 0 2. Its spread and shape, and hence the probabilities associated with it, depend on the degrees of freedom, df . 3. The t distribution has more probability in the tails than a standard normal, but its shape gets closer to a normal as df increases


With the t distribution in place of the standard normal, the general form for a confidence interval for means then becomes:

x̄ ± t* × s/√n,

where t* refers to the appropriate critical value from a t distribution with df = n − 1. Strictly speaking, the t distribution only holds exactly when the population is normally distributed (i.e. the CLT doesn't quite work). However, using this distribution still works quite well with mild departures from normality, so if we believe the population is close enough to being normal, we can still use the t as a basis for inference.
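As a sketch, here are hypothetical summary statistics that (approximately) reproduce the peach-orchard interval (112, 148) used earlier; the critical value t* ≈ 2.064 for 95% confidence with df = 24 is taken from a t table:

```python
import math

xbar, s, n = 130.0, 43.5, 25   # hypothetical sample mean, sd, and size
t_star = 2.064                 # 95% critical value, t distribution with df = n - 1 = 24

moe = t_star * s / math.sqrt(n)
ci = (xbar - moe, xbar + moe)

print(round(ci[0]), round(ci[1]))  # 112 148
```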


Hypothesis Testing
The goal of interval estimation is to produce a set of plausible values for a parameter, based on the data that we have observed. Alternatively, we may be particularly interested in one specific value for a parameter and want to see if this is a plausible value. This is the realm of hypothesis testing, a.k.a. significance testing. The logic of hypothesis testing is to assume some value is true, then assess how probable our actual observations would be under this assumed distribution. If the results are extremely unlikely under the assumption, then either (a) we have observed something that rarely occurs, or (b) our assumption was wrong to begin with, and we reject the assumed value of the parameter as being implausible.

Steps of a Hypothesis Test

1. Assumptions: We must make certain assumptions about our data for the testing procedure to be valid. The conditions must hold to make sure that we're using the correct reference distribution for our test statistic.

2. Hypotheses: The null and alternative hypotheses must be specified so that the test will, in fact, answer the question we have. The null hypothesis, H0, gives us the assumed value with which probabilities will be evaluated. The alternative hypothesis, HA (sometimes denoted H1), is what we actually believe to be true or hope to show. It also dictates how the p-value will be calculated.

3. Test Statistic: This is the quantity we calculate to quantify how extreme our observed data would be if H0 were true. It is (hopefully) chosen to be a random quantity whose probability distribution (sampling distribution) we know exactly under H0, so we can determine the probability of observing it or observing something more extreme.

4. p-value: The p-value is the probability that our chosen test statistic takes on the value we observed, or something more extreme, if the null were true.
The direction of what we mean by extreme is determined by our alternative of interest: Right-tailed Test - The alternative is that the true parameter is greater than the null value, e.g. HA: μ > μ₀.


Left-tailed Test - The alternative is that the parameter is less than the null value, e.g. HA: μ < μ₀.

Two-tailed Test - The alternative is that the parameter is not equal to the null value (just something else), e.g. HA: μ ≠ μ₀.

Since the p-value is representative of how likely our observations would be assuming H0, small values are evidence against the null. That is, the smaller the p-value, the less plausible H0 is. If the p-value is too small, we reject H0 in favor of the alternative.

5. Conclusions: To determine how small is too small, we usually compare the p-value to some pre-specified significance level, α. The most common choices are α = .01, .05, or .1. If p-value < α, we reject H0 and say that the results are statistically significant. Otherwise, we fail to reject H0. Failing to reject the null hypothesis is not the same as claiming the null to be true. It simply says we don't have enough evidence to think otherwise. Remember: "Absence of evidence is not evidence of absence." - Carl Sagan


NOTE: It is important to draw the appropriate conclusions in the context of the problem. If a psychologist is interested in whether or not attention span decreases in children after watching Spongebob Squarepants, you don't go back to them and say, "we rejected the null hypothesis." Make sure you are able to translate the results of the significance test back into a context meaningful to the problem at hand.

Significance Tests for Means


The general significance test about the mean of a single population proceeds as follows:

1. Assumptions: We assume that (i) the data are randomly collected so that they are an independent sample, and (ii) the population is normal OR the sample size is at least 30. In other words, what we want is for X̄ to be normally distributed.

2. Hypotheses: The null hypothesis will always be of the form H0: μ = μ₀, where we want to see if μ₀ is a plausible value of the population mean. The alternative will be one of three types: (i) HA: μ < μ₀, (ii) HA: μ > μ₀, or (iii) HA: μ ≠ μ₀. Again, whichever is appropriate depends on the context of the problem.

3. Test Statistic: The test statistic we use is

t = (x̄ − μ₀) / (s/√n),

which follows a t distribution with df = n − 1 if H0 is true. Hence the name, t test.

4. p-value: We always use the null distribution to evaluate the p-value: (a) HA: μ < μ₀:

(b) HA: μ > μ₀:


(c) HA: μ ≠ μ₀:

5. Conclusions: Again, make sure that you can interpret the results in the context of the problem. There is no fixed rule for this; just always keep in mind the big picture of what you're trying to do.
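A sketch of the whole procedure on hypothetical data, stdlib only. The two-sided critical value 2.365 for α = .05 with df = 7 comes from a t table; since Python's standard library has no t distribution, we compare |t| to the critical value rather than computing a p-value:

```python
import math
import statistics

data = [16.4, 17.1, 19.0, 18.2, 17.5, 16.9, 18.8, 17.7]  # hypothetical sample
mu0 = 18.0                      # null value: H0: mu = 18 vs. HA: mu != 18

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t = (xbar - mu0) / (s / math.sqrt(n))   # test statistic

t_star = 2.365                  # two-sided critical value, alpha = .05, df = n - 1 = 7
reject = abs(t) > t_star

print(round(t, 3), reject)  # t is about -0.926, so we fail to reject H0
```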


The Connection Between Confidence Intervals and Hypothesis Tests


There is an intimate relationship between confidence intervals and significance testing. It turns out that every hypothesis test has a corresponding confidence interval, and vice versa. One can actually start with a hypothesis test to derive the formula for a confidence interval. A (1 − α)100% confidence interval is the set of all null values such that a two-sided level-α hypothesis test would not reject H0. For example, in the case of population means, a 93% confidence interval is the set of all numbers μ₀ such that H0: μ = μ₀ would not be rejected at the α = .07 level using a two-sided alternative. Example: A 95% CI for the average number of peaches produced by a tree in a particular orchard is (112, 148). If μ represents this true average and we were to perform a hypothesis test of H0: μ = 130 vs. HA: μ ≠ 130, would we find a significant difference at the α = .05 level? What if we tested H0: μ = 110 vs. HA: μ ≠ 110?

Evaluating Hypothesis Tests


When we perform a hypothesis test, we (usually) have no way of knowing what the truth actually is. So our testing procedure allows for four possibilities: (i) We reject H0 when the null really is false, so we make the right decision (ii) We reject H0 when, in fact, H0 is true. This is a Type I Error. (iii) We fail to reject H0 when the null is true. Again, this is the right decision. (iv) We fail to reject when H0 is actually false. This is a Type II Error.


The two things statisticians really consider when evaluating a testing procedure are the probabilities of committing each type of error, α := P(Type I Error) and β := P(Type II Error). The α used here is the exact same α used in deciding to reject H0. That is, the chosen significance level is the probability of committing a Type I error. This follows directly from how we constructed the testing procedure to begin with. For example, when using α = .05, we reject when there is less than a 5% chance of observing what we did under H0. But there is a 5% chance of getting a result like this when H0 is, in fact, true, so P(Type I Error) = .05. Ideally, we want both α and β to be small. However, these can be competing goals. For example, letting α get really small obviously decreases the chance of committing a Type I error. However, if α is too small, it will be hard to ever reject H0, even when it's false, so the chance of a Type II error increases. Statisticians generally consider a Type I error to be worse than a Type II error (i.e. hanging an innocent man is worse than letting a guilty one go free). Thus, we set α to be some desired level, then do what we can with β subject to that constraint.

Comparing Two Groups

All the techniques discussed thus far involve the analysis of a single population. That is, there has only been one variable with which we were concerned, and we wanted to estimate or perform tests about parameters governing that variable's distribution. There are many practical situations, though, where we wish to analyze samples from two or more separate populations to see how they compare to each other. This is certainly true in the analysis of experiments, where each treatment condition has its own population of possible observations. Fortunately, many statistical procedures for the analysis of a single population can be extended to answer analogous questions about two or more groups. Exactly which procedure is appropriate depends on what we know about the samples and what we're willing to assume about them.

Testing for Equality of Two Means


Suppose we are interested in comparing the number of hours students spend studying per week during the Spring semester. In particular, we want to see if their study habits are different before and after Spring Break. If we let μ₁ be the average number of study hours per week prior to the break and μ₂ be the average number of study hours after the break, then this is a question of how μ₁ compares to μ₂. Since we're only interested in a change of habits, the appropriate test is H0: μ₁ = μ₂ vs. HA: μ₁ ≠ μ₂, or, equivalently, H0: μ₁ − μ₂ = 0 vs. HA: μ₁ − μ₂ ≠ 0.

To test this, we randomly select students and ask them to estimate their average number of study hours per week. We do this prior to Spring Break and after, so that we are comparing two populations. The first population is that of the study hours per week for all students prior to Spring Break, which has its own mean (μ₁) and its own standard deviation (σ₁). Likewise, the second population is that of the study hours per week for all students after Spring Break, with mean μ₂ and standard deviation σ₂. For comparing two groups, we can always use a test statistic of the general form,

test stat. = (point estimate − H0 value) / (s.e. of point estimate).

The details (the appropriate standard error, really) depend on (i) whether the two sampled groups are independent, and (ii) whether we believe σ₁² = σ₂².

Two Independent Samples


If the students selected prior to Spring Break are a completely different group from those randomly selected after Spring Break, then the two groups are independent of one another, i.e. what we know about the first group of students doesn't really tell us much about the second group. So this is a case of comparing two independent samples. It makes sense to use the difference of the averages of the two groups, x̄₁ − x̄₂, as a point estimate of the true difference μ₁ − μ₂. So, it's just a question of finding the standard error of x̄₁ − x̄₂. This is where we need to decide whether or not the two variances (equivalently, the two standard deviations) are equal.
(i) σ₁² = σ₂²

We can take the sample variance of each group to get s₁² and s₂². If the variances of the two populations are the same, though, then both of these statistics are estimating the same quantity, σ² = σ₁² = σ₂². It then makes sense to combine the information contained in both of these via a weighted average to get a pooled estimate of the variance:

s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / (n₁ + n₂ − 2)

We can then write the standard error of x̄₁ − x̄₂ as

s.e.(x̄₁ − x̄₂) = √(s_p²(1/n₁ + 1/n₂)),

so that our test statistic is

t = (x̄₁ − x̄₂) / √(s_p²(1/n₁ + 1/n₂)).

We compare this to a t distribution with df = n₁ + n₂ − 2 to get the appropriate p-value. The corresponding confidence interval for this test is

x̄₁ − x̄₂ ± t* √(s_p²(1/n₁ + 1/n₂)), with t* from a t distribution with df = n₁ + n₂ − 2.
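A sketch of the pooled procedure on two hypothetical (independent) samples of study hours:

```python
import math
import statistics

before = [12, 15, 14, 10, 13, 16]   # hypothetical hours/week, sampled before break
after = [11, 9, 12, 10, 8, 13]      # different students, sampled after break

n1, n2 = len(before), len(after)
v1, v2 = statistics.variance(before), statistics.variance(after)

# pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t = (statistics.mean(before) - statistics.mean(after)) / se

print(round(sp2, 3), round(t, 3))
# compare |t| to a t critical value with df = n1 + n2 - 2 = 10
```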

(ii) σ₁² ≠ σ₂²

In this case, we still have the two sample variances from each group. However, there's no sense in combining them because they are estimating two different variances. Rather than pooling, we use the unpooled estimate of the standard error,

s.e.(x̄₁ − x̄₂) = √(s₁²/n₁ + s₂²/n₂).

The test statistic in this case becomes

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂).

It turns out that this test statistic isn't exactly distributed as a t. Its exact distribution is unknown, but it can be approximated with a t distribution with degrees of freedom given by

ν := (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]

The corresponding confidence interval still fits the general form:

x̄₁ − x̄₂ ± t*_ν √(s₁²/n₁ + s₂²/n₂)
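The same two hypothetical samples, analyzed without the equal-variance assumption; note that the approximate degrees of freedom come out non-integer:

```python
import math
import statistics

before = [12, 15, 14, 10, 13, 16]   # hypothetical data, as in the pooled sketch
after = [11, 9, 12, 10, 8, 13]

n1, n2 = len(before), len(after)
v1, v2 = statistics.variance(before), statistics.variance(after)

se = math.sqrt(v1 / n1 + v2 / n2)   # unpooled standard error
t = (statistics.mean(before) - statistics.mean(after)) / se

a, b = v1 / n1, v2 / n2             # approximate (Welch-Satterthwaite) df
df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(round(t, 3), round(df, 2))  # with equal n, t matches the pooled statistic
```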

Two Dependent Samples


Rather than randomly selecting two different groups of students, suppose we asked the same group of students about their study habits before and after Spring Break. Then we certainly cannot treat the two groups of observations as if they're independent, because knowing what students said the first time will give us some idea of what the data will look like after the break. However, recall that the hypothesis of interest is H0: μ₁ − μ₂ = 0, so it's really the difference of the means that we're interested in, not the two separate means themselves. Let μ_d = μ₁ − μ₂ be the average difference of study times before and after Spring Break. Then we can think of μ_d as being the mean of the population of differences in study times. We can rewrite the hypothesis test as H0: μ_d = 0 vs. HA: μ_d ≠ 0 and find the difference in study time for each student selected.

Notice that by collapsing each pair of observations down to a single observed difference, we have reduced this to a one-sample testing problem. We can simply use another one-sample t test for this:

t = x̄_d / (s_d/√n),

where x̄_d is the sample average of the differences and s_d is the sample standard deviation of those differences. We compare this to a t distribution with df = n − 1 to find the p-value. Note that n is the number of pairs of observations (e.g. the number of students), not the n₁ + n₂ total number of observations. The corresponding confidence interval follows:

x̄_d ± t* s_d/√n, with t* from a t distribution with df = n − 1.
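A sketch of the paired procedure, where the same (hypothetical) students are measured before and after the break:

```python
import math
import statistics

before = [14, 12, 16, 10, 13, 15, 11, 12]  # hypothetical hours/week, same 8 students
after = [12, 11, 15, 10, 10, 14, 10, 10]

# collapse each pair to one difference; n is the number of pairs
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)

dbar = statistics.mean(diffs)   # average difference, estimates mu_d
sd = statistics.stdev(diffs)    # sd of the differences
t = dbar / (sd / math.sqrt(n))  # one-sample t statistic, df = n - 1 = 7

print(round(dbar, 3), round(t, 3))
```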

