
# Some More Thoughts about Thinking Through the AP Exam

## Top 5 Reminders About the AP Exam

1. There is a high probability you will get a FRAPPY (or two) about experimental design, probability, and/or LSRL.
2. On all inference questions, assumptions MUST always be checked and stated.
3. Students must always define any symbols used on their papers, regardless of how obvious it may seem. For example: a = b. Students must define what a and b represent.
4. Remember to allow 25-30 minutes for FRAPPY #6. It is long and counts for more, but it is often the easiest question.
5. State your case and STOP, for goodness' sake! You don't need to fill all the white space allocated. Adding more to an [...]

## Highlighting What You Should Already Know

Students should:

- read the question carefully.
- be able to describe how to randomize.
- answer the question asked. Volunteering extra information can be disastrous.
- be calculator-efficient. It is certainly possible to waste time punching numbers into a calculator. Entering lists of numbers into a calculator can be time-consuming, and certainly doesn't represent a display of statistical intelligence. If, upon reading an AP question, you think you will have to enter many numbers into a calculator, you are probably overlooking something. Reread the problem, and look for a quicker path to a solution.
- understand how to set up and run a simulation.
- understand that a normal curve is for continuous data only. A discrete distribution may be modeled by a normal curve, or it can be approximately normal, but it can never be exactly normal (e.g., the normal approximation to the binomial).
- understand the meaning of a sampling distribution and its behavior.
- understand the difference between the sample mean, the mean of the sample means (the sampling distribution), and the population mean.
- understand the difference between standard deviation and standard error.
- understand when to use a Chi-Square Test and when to use a Difference of Proportions Test.
- understand the difference between a sampling distribution and a t or z distribution.
- know how to read generic computer output.
- know the difference between a scatter plot and a residual plot.
- realize that a model that produces predicted values isn't providing actual data values. If the equation for a least-squares regression line is y = 1.5x + 3.34, then the slope and y-intercept need to be interpreted properly. For instance, one might say that "on the average, a unit change in x results in a change of 1.5 units in y" and that "the predicted value of y is 3.34 when x = 0."
- not be sloppy in choice of words. For instance, on a residual graph, the phrase "half are above and half are below" is not equivalent to "randomly scattered."
- give answers that make sense in the context of the problem. For instance, it generally makes no sense to talk about "1/3 of an airplane."
- realize that null and alternate hypotheses are stated in terms of population parameters, not sample statistics. Also, students need to be careful not to reverse the null and alternate hypotheses.
- know the distinction between a test for homogeneity of proportions and one for independence.
- actually check necessary assumptions instead of just saying something like "it is assumed...." For instance, in a chi-square test where cell counts are known, if all expected counts are greater than or equal to 5, this should be noted, as contrasted to just stating the assumptions for chi-square.
- interpret p-values correctly.
- understand the difference between a simple random sample and the random assignment of treatment to subjects.
- understand that there are two types of replication in experiments: (1) replication within the experiment quantifies variability within the experiment, and (2) replication of the experiment helps achieve validation.
- understand terms like confounding, lurking variables, etc.
- be careful when using "calculator language." It is important for a reader to understand what is written and that the student really knows what he or she is writing.

## Thoughts about the Multiple-Choice Questions

Answer ALL of the questions. NEW! There is NO penalty for wrong answers.

If an answer is "obvious," think about it. If it's so obvious to you, it's probably obvious to others... and the chances are good that it is not the correct response.

For example, suppose one set of test scores has a mean of 80, and another set of scores on the same test has a mean of 90. If the two sets are combined, what is the mean of the combined scores? The "obvious" answer is 85 (and will certainly appear among the answer choices), but you, as an intelligent statistics student, realize that 85 is not necessarily the correct response.

## Last-Minute Tips for AP Statistics Exam

Relax, and think. As cliché as that sounds, remember that everyone else taking the exam is in a situation identical to yours. You've seen these types of problems all year. Just think.

Make sure your calculator is functioning properly. Insert new batteries a day or so before the exam, and make sure all systems are "go." Bring a backup calculator, if possible.

Don't confuse median and mean. They are both measures of center, but, for a given data set, they may differ by a considerable amount. Remember that the mean is pulled in the direction of the skew.
mean > median <===> distribution skewed right
mean < median <===> distribution skewed left

Don't confuse the correlation coefficient and the slope of the least-squares regression line.
A slope close to 1 or -1 doesn't mean strong correlation.
An r value close to 1 or -1 doesn't mean the slope of the least-squares regression line is close to 1 or -1.
The relation between b (slope of the regression line) and r (coefficient of correlation) is b = r(Sy/Sx).

Remember that r^2 > 0 doesn't mean r > 0. For instance, if r^2 = 0.81, then r = 0.9 or r = -0.9.

Remember that the least-squares regression line contains the point (mean x, mean y), where mean x is the mean of the x-values, and mean y is the mean of the y-values.

A coefficient of correlation near 0 doesn't necessarily mean there are no meaningful relationships to be observed. For example,

| x | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| y | 6 | 30 | 8 | 50 | 10 | 70 | 12 | 90 | 14 | 110 | 16 |

In this case, r = .38, but a scatterplot displays something quite interesting.
Moral of story: Whenever possible, look at the "shape" of the data.

Be careful with the concept of simple random sample (SRS). For instance, if each individual in a group has an equal probability of being chosen in a sample, it doesn't follow that the sample is an SRS. Consider a class of 6 boys and 6 girls. I want to randomly pick a committee of two students from this group. I decide to flip a coin. If "heads," I will choose two girls by a random process. If "tails," I will choose two boys by a random process. Now, each student has an equal probability (1/6) of being chosen for the committee. However, the chosen two students do not represent an SRS of size two picked from members of the class, for the selection process does not allow for a committee consisting of one boy and one girl. To have an SRS of size two from this class of 6 boys and 6 girls, each committee of two students would have to have an equal probability of being chosen.

Look at graphs and displays carefully. For graphs, note carefully what is represented on the axes, and be aware of the number scale. Some questions that provide tables of numbers and graphs relating to the numbers can be answered simply by "reading" the graphs.

Don't confuse standard deviation and variance. Remember that variance is the square of standard deviation.

Be aware of the following, since they represent simplified versions of sophisticated concepts:
(a) When combining two independent sets by addition,
- means add;
- standard deviations do NOT add;
- variances add.

(b) When combining two independent sets by subtraction,
- means subtract;
- standard deviations do NOT subtract;
- variances ADD.

Simple examples:
Let S = {5, 9} and T = {1, 3}.
Then set S+T = {5+1, 5+3, 9+1, 9+3} = {6, 8, 10, 12}, and set S-T = {5-1, 5-3, 9-1, 9-3} = {2, 4, 6, 8}.

| | Set S | Set T | Set S+T | Set S-T |
|---|---|---|---|---|
| Mean | 7 | 2 | 9 | 5 |
| St. Dev. | 2 | 1 | 2.2361 | 2.2361 |
| Variance | 4 | 1 | 5 | 5 |

Note that:
mean(S+T) = mean(S) + mean(T)
mean(S-T) = mean(S) - mean(T)
variance(S+T) = variance(S-T) = variance(S) + variance(T)

Recognize a binomial distribution situation when it arises. Thinking in terms of slots, if you have a set number of slots, the slots are filled independently of one another, and the probability of getting a "success" in each slot is constant, then you have a binomial setting. Consider, for instance, rolling a die ten times. There are ten slots to be filled, and the probability of filling each slot with a 6 is 1/6.
Using the TI-83, the probability of getting exactly three 6's is (10C3)*(1/6)^3*(5/6)^7 = binomPDF(10,1/6,3) = 0.1551.
The probability of getting fewer than four 6's is binomCDF(10,1/6,3) = 0.93027, or about 93%. The probability of getting four or more 6's in 10 rolls of a single die is about 7%. If x is the number of 6's obtained when ten dice are rolled, then mean(x) = 10(1/6) = 1.6667, and st.dev(x) = sqrt[10(1/6)(5/6)] = 1.1785.

The binomial distribution approaches the normal distribution as the number of trials increases. If N is the number of trials in a binomial setting, and if p represents the probability of "success" in each trial, then a general rule of thumb states that a normal distribution can be used to approximate the binomial distribution if Np > 5 and N(1-p) > 5.

Recognize a discrete random variable situation when it arises (and don't confuse it with a binomial situation).
Let x = the number of heads obtained when five coins are tossed.

| Value of x | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Probability | 1/32 = .03125 | 5/32 = .15625 | 10/32 = .3125 | 10/32 = .3125 | 5/32 = .15625 | 1/32 = .03125 |

mean(x) = 0(.03125) + 1(.15625) + 2(.3125) + 3(.3125) + 4(.15625) + 5(.03125) = 2.5.
var(x) = .03125(0-2.5)^2 + .15625(1-2.5)^2 + .3125(2-2.5)^2 + .3125(3-2.5)^2 + .15625(4-2.5)^2 + .03125(5-2.5)^2 = 1.25.
st.dev(x) = sqrt[var(x)] = sqrt(1.25) = 1.118.
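
The die-rolling binomial figures and the five-coin mean, variance, and standard deviation above can be checked without a TI-83. The following is a minimal sketch in Python, offered only as an illustration; the helper name `binom_pmf` and the variable names are assumptions of this sketch, not calculator commands.

```python
from math import comb, sqrt


def binom_pmf(n, p, k):
    """P(X = k) when X counts successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Ten rolls of a fair die; a "success" is rolling a 6.
n, p = 10, 1 / 6
p_exactly_3 = binom_pmf(n, p, 3)                          # plays the role of binomPDF(10,1/6,3)
p_at_most_3 = sum(binom_pmf(n, p, k) for k in range(4))   # plays the role of binomCDF(10,1/6,3)
binom_mean = n * p
binom_sd = sqrt(n * p * (1 - p))

# x = number of heads in five fair coin tosses, treated as a general
# discrete random variable given by its probability table.
dist = {k: comb(5, k) / 32 for k in range(6)}
mean_x = sum(k * pr for k, pr in dist.items())
var_x = sum(pr * (k - mean_x) ** 2 for k, pr in dist.items())
sd_x = sqrt(var_x)

print(round(p_exactly_3, 4), round(p_at_most_3, 5))
print(round(binom_mean, 4), round(binom_sd, 4))
print(mean_x, var_x, round(sd_x, 3))
```

The mean and variance of the coin example are computed directly from the probability table, which is the approach that works for any discrete random variable, binomial or not.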
Simpson's Paradox: This usually involves percentages.
Example:
| Period | A: WIN | A: TOTAL | A: % WIN | B: WIN | B: TOTAL | B: % WIN |
|---|---|---|---|---|---|---|
| First half | 80 | 100 | 80% | 78 | 100 | 78% |
| Second half | 20 | 40 | 50% | 2 | 5 | 40% |
| Both halves | 100 | 140 | 71.4% | 80 | 105 | 76.2% |

Here, A's winning percentage exceeds B's in each of the two periods, but B has a better overall winning percentage.
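
The reversal can be reproduced by recomputing the percentages from the win and total counts in the example. This is a small Python sketch for illustration; the dictionary layout and the names `records` and `pct` are assumptions of the sketch.

```python
# (wins, games played) for each team in each half, taken from the example.
records = {
    "A": {"first": (80, 100), "second": (20, 40)},
    "B": {"first": (78, 100), "second": (2, 5)},
}


def pct(wins, total):
    """Winning percentage."""
    return 100 * wins / total


# Percentage within each half.
by_half = {
    team: {half: pct(w, t) for half, (w, t) in halves.items()}
    for team, halves in records.items()
}

# Overall percentage: pool the wins and the totals, do NOT average the percentages.
overall = {
    team: pct(sum(w for w, _ in halves.values()), sum(t for _, t in halves.values()))
    for team, halves in records.items()
}

print(by_half)
print({team: round(value, 1) for team, value in overall.items()})
```

B comes out ahead overall because almost all of B's games fall in the first half, where both teams win about 80% of the time, while a large share of A's games fall in the weaker second half.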
Realize that logarithmic transformations can be practical and useful. Among other things, taking logs cuts down the
magnitude of numbers. Also, if {(x,y)} has an exponential pattern, then {(x,log y)} has a linear pattern.
Logarithmic Transformation Example:
| x | y | log y |
|---|---|---|
| 1 | 24 | 1.3802 |
| 2 | 192 | 2.2833 |
| 3 | 1,536 | 3.1864 |
| 4 | 12,188 | 4.0859 |
| 7 | 6,290,000 | 6.7987 |
| 8 | 49,900,000 | 7.6981 |

An exponential fit to (x, y) on the TI-83 yields y = 3(8^x), with r = .9999.
If we attempt to extrapolate and predict a value for y when x = 9, we get y = 3(8^9) = 402,653,184.
A linear fit to (x, log y) on the TI-83 yields log y = .9027286x + 0.477395, with r = .9999.
If x = 9, then log y = .9027286(9) + 0.477395 = 8.6019524. Hence y = 10^8.6019524 = 399,900,917.
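
The linear fit to (x, log y) can be reproduced with the ordinary least-squares formulas. The sketch below uses plain Python rather than a TI-83, assumes base-10 logarithms as in the table, and uses illustrative variable names of its own.

```python
from math import log10

xs = [1, 2, 3, 4, 7, 8]
ys = [24, 192, 1536, 12188, 6_290_000, 49_900_000]

# Transform y so that an exponential pattern becomes a linear one.
log_ys = [log10(y) for y in ys]

n = len(xs)
mean_x = sum(xs) / n
mean_ly = sum(log_ys) / n

# Least-squares slope and intercept for log y = slope * x + intercept.
sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_ly - slope * mean_x

# Extrapolate to x = 9, then undo the log transformation.
log_y_at_9 = slope * 9 + intercept
y_at_9 = 10**log_y_at_9

print(round(slope, 4), round(intercept, 4))
print(round(log_y_at_9, 4), round(y_at_9))
```

The fitted line passes through (mean x, mean log y), which is why the intercept is obtained from the two means once the slope is known.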
Type I/II Error & Power:
Type I error: Rejecting a null hypothesis when it is true.
Type II error: Failing to reject (that is, accepting) a null hypothesis when it is false.
Power of a test: Probability of correctly rejecting a false null hypothesis = 1 - Probability(Type II error).
Simple example:
Population #1: A A A A A A A A B B
Population #2: A B B B B B B B B B
Without knowing which of the populations is represented, an element is randomly chosen. After viewing the
element, the chooser must guess the population from which it came.
Null hypothesis (Ho): The element came from population #1.
Alternate hypothesis (Ha): The element came from population #2.
Test decision: Accept Ho if the element is A; otherwise reject Ho and accept Ha.
Here is a probability chart:
| Decision | Ho TRUE | Ho FALSE |
|---|---|---|
| ACCEPT Ho | 80% | 10% (Type II error) |
| REJECT Ho | 20% (Type I error) | 90% (Power of the test) |

In hypothesis testing, the level of significance is the probability of making a Type I error.
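
The four entries of the probability chart follow directly from the two populations and the decision rule. A short Python sketch, with variable names chosen here purely for illustration:

```python
population_1 = list("AAAAAAAABB")  # Ho true: the element came from population #1
population_2 = list("ABBBBBBBBB")  # Ho false: the element came from population #2


def prob_reject(population):
    """Probability that a randomly chosen element leads to rejecting Ho (element is not A)."""
    return sum(1 for element in population if element != "A") / len(population)


alpha = prob_reject(population_1)  # Type I error: rejecting Ho when it is true
power = prob_reject(population_2)  # correctly rejecting Ho when it is false
beta = 1 - power                   # Type II error: accepting Ho when it is false

print(alpha, beta, power)
```

Here `alpha` is the level of significance of this test, and `power` and `beta` always sum to 1.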
Independent events are NOT the same as mutually exclusive (i.e. disjoint) events.
Two events, A and B, are independent if the occurrence or non-occurrence of one of the events has NO effect on the
probability that the other event occurs. Events A and B are mutually exclusive if they cannot happen simultaneously.
Remember that P(A|B) = P(A) and P(B|A) = P(B) if and only if the events A and B are independent.
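
To make the distinction concrete, consider one roll of a fair die. The example events below are not from the original handout; they are assumptions chosen for this sketch, which uses exact fractions so that the equality checks are not affected by rounding.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & outcomes), len(outcomes))


def independent(a, b):
    """A and B are independent when P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """A and B are mutually exclusive when they share no outcome."""
    return len(a & b) == 0


even = {2, 4, 6}
at_most_two = {1, 2}
odd = {1, 3, 5}

# Independent but NOT mutually exclusive: both occur when a 2 is rolled.
print(independent(even, at_most_two), mutually_exclusive(even, at_most_two))

# Mutually exclusive but NOT independent: knowing the roll is even rules out odd.
print(independent(even, odd), mutually_exclusive(even, odd))

# Conditional probability check: P(A|B) = P(A and B) / P(B) equals P(A) for the independent pair.
p_even_given_at_most_two = prob(even & at_most_two) / prob(at_most_two)
print(p_even_given_at_most_two == prob(even))
```

Two events that both have positive probability cannot be mutually exclusive and independent at the same time, since the first condition forces P(A and B) = 0 while the second forces P(A and B) = P(A)P(B) > 0.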