
Name______________________________________

AP Statistics
Final Review
Data Analysis
Use the following information for questions 1-7.
Quiz Scores
1   4   5   5   6   7   7   7   8   9
3   4   5   6   6   7   7   8   8   9
3   4   5   6   6   7   7   8   8   10

1. Are these quiz scores qualitative or quantitative?


Quantitative
2. Make a dot plot or histogram (your choice) of the quiz scores.
[Figure: dot plot and histogram of the quiz scores (Collection 1); horizontal axis: Scores]

3. Find the mean of the quiz scores.


6.2
4. Find the standard deviation of the quiz scores.
2.024
5. Make a boxplot of the quiz scores. Be sure to label the 5-number summary on your boxplot.
[Figure: box plot of the quiz scores (Collection 1)]
Min = 1, Q1 = 5, Med = 6.5, Q3 = 8, Max = 10

6. Find the range of the quiz scores.
Range = 10 - 1 = 9
7. Find the interquartile range of the quiz scores.
IQR = Q3 - Q1 = 8 - 5 = 3
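
The summary statistics for questions 3-7 can be checked with a short script. This is a minimal sketch using only Python's standard library; the helper name five_number_summary is hypothetical, and it assumes quartiles are taken as the medians of the lower and upper halves of the sorted data.

```python
import statistics

# Quiz scores from the table for questions 1-7 (30 values).
scores = [1, 4, 5, 5, 6, 7, 7, 7, 8, 9,
          3, 4, 5, 6, 6, 7, 7, 8, 8, 9,
          3, 4, 5, 6, 6, 7, 7, 8, 8, 10]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the two halves."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]          # values below the median position
    upper = s[(n + 1) // 2:]     # values above the median position
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])

mean = statistics.mean(scores)
sd = statistics.stdev(scores)    # sample standard deviation (divides by n - 1)
minimum, q1, med, q3, maximum = five_number_summary(scores)
data_range = maximum - minimum
iqr = q3 - q1

print(f"mean = {mean:.1f}, s = {sd:.3f}")
print(f"five-number summary: {minimum}, {q1}, {med}, {q3}, {maximum}")
print(f"range = {data_range}, IQR = {iqr}")
```

Other quartile conventions exist, so software may report slightly different Q1 and Q3 values on other data sets.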
8. What is the difference between x̄ and μ?
x̄ is a statistic: the mean of a sample.
μ is a parameter: the mean of the entire population.

9. What is the difference between s and σ?
s is a statistic: the standard deviation computed from a sample.
σ is a parameter: the standard deviation of the entire population.
In both cases, the standard deviation measures how far values typically vary from the mean.

10. What is the area under a density curve?


The area under a density curve is equal to 1.

Normal Distribution
Use the following info for questions 11-16, be sure to sketch the curve for each problem.
IQ is distributed normally with a mean of 100 and a standard deviation of 15.5.

X ~ N(100, 15.5)
Syntax: normcdf(lower, upper, mean, sd) and invnorm(area, mean, sd)


10. What is the percentile rank of someone with an IQ of 112?
P(X ≤ 112) = normcdf(-99999, 112, 100, 15.5) = .78

11. What IQ do you need to have to be in the top 5%?


P(X ≤ x) = .95
x = invnorm(.95, 100, 15.5) = 125.5

12. What percent of the people have an IQ below 90?


P(X < 90) = normcdf(-9999, 90, 100, 15.5) = .26

13. What percent of the people have an IQ between 75 and 115?


P(75 ≤ X ≤ 115) = normcdf(75, 115, 100, 15.5) = .78
Or using z-scores:
z = (75 - 100)/15.5 = -1.613 and z = (115 - 100)/15.5 = .968
normcdf(-1.613, .968) = .78
14. What IQ scores are in the bottom 10%?

P(X ≤ x) = .10
x = invnorm(.10, 100, 15.5) = 80, so IQ scores at or below about 80 are in the bottom 10%.
15. What IQ scores are in the middle 50% of the population?
P(X ≤ x) = .25: x = invnorm(.25, 100, 15.5) = 89.5
P(X ≤ x) = .75: x = invnorm(.75, 100, 15.5) = 110.5
The middle 50% of IQ scores lie between about 89.5 and 110.5.

16. What percent of the people have an IQ of at least 132?


P(X ≥ 132) = normcdf(132, 99999, 100, 15.5) = .02
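
The normcdf and invnorm calculations above can be reproduced without a calculator. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; cdf plays the role of normcdf with a lower bound of negative infinity, and inv_cdf plays the role of invnorm. The variable names are illustrative only.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15.5)        # X ~ N(100, 15.5)

p_below_112 = iq.cdf(112)                  # percentile rank of an IQ of 112
top_5_cutoff = iq.inv_cdf(0.95)            # IQ needed to be in the top 5%
p_below_90 = iq.cdf(90)                    # proportion below 90
p_75_to_115 = iq.cdf(115) - iq.cdf(75)     # proportion between 75 and 115
bottom_10_cutoff = iq.inv_cdf(0.10)        # upper bound of the bottom 10%
middle_50 = (iq.inv_cdf(0.25), iq.inv_cdf(0.75))
p_at_least_132 = 1 - iq.cdf(132)           # proportion at 132 or above

print(f"P(X <= 112) = {p_below_112:.2f}")
print(f"top 5% cutoff = {top_5_cutoff:.1f}")
print(f"P(75 <= X <= 115) = {p_75_to_115:.2f}")
print(f"middle 50%: {middle_50[0]:.1f} to {middle_50[1]:.1f}")
```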

Linear Regression Analysis


For questions 17-26, use the following information. Include any formulas you used to receive full credit.
Big Ten Average Scores
School           Football Players SAT   All Students SAT
Illinois         872                    1140
Indiana          741                    1007
Michigan         826                    1190
Michigan State   788                    998
Minnesota        838                    1050
Northwestern     1034                   1250
Ohio State       820                    986
Penn State       897                    1083
Purdue           881                    1009
Wisconsin        825                    1090
Note: Iowa scores were not available.
17. Which university's scores would be influential on a scatterplot? Why?
Northwestern. It has the greatest influence on the line; if you take it away, the linear relationship weakens.
18. Using football players SAT as your explanatory variable, find the LSRL for this data.
ŷ = 406 + .791x
predicted all-students SAT = 406 + .791(football players SAT)
19. What is the correlation coefficient for this data?
r = .70
20. What two things does this correlation tell us about the scatterplot of the data?
It is moderately strong and positive.
The correlation suggests that schools with higher-achieving athletes tend to have a higher-achieving
student body overall.
21. What is the correlation without the influential point?
The correlation without Northwestern is .35, significantly reducing the linear relationship between student athlete
score and overall school score.
22. Correlation only applies to what type(s) of relationship(s)?
Linear relationships
23. Give an example of two things that are highly correlated but are not necessarily a cause-and-effect
relationship.
Reading levels and shoe sizes, smoking and cancer
24. Iowa's football players have an average SAT score of 814. What score would you predict for the entire
student body? Is this a good prediction? Why or why not?
ŷ = 406 + .791(814) = 1049.87. Because 814 falls within the range of the observed football SAT scores
(741 to 1034), this is not an extrapolation, so the prediction is reasonable. It is not very precise, however:
only about 49% of the variation in all-students SAT is explained by the line.
25. Find the residual for Penn State.
residual = actual - predicted
1083 - 1115.83 = -32.83
26. What is the coefficient of determination for this data, and what does it tell you about the data?
r^2 = .49. This means that approximately 49% of the variation in all-students SAT can be explained by the linear
relationship between football players' SAT and all-students SAT.
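
The regression answers for questions 18-26 can be checked from the table. The following is a minimal sketch in plain Python; fit_line and predict are hypothetical helper names, and because the script uses unrounded coefficients, its predictions differ slightly from values computed with the rounded equation ŷ = 406 + .791x.

```python
from math import sqrt

# (school, football players SAT, all students SAT), copied from the Big Ten table
big_ten = [
    ("Illinois", 872, 1140), ("Indiana", 741, 1007), ("Michigan", 826, 1190),
    ("Michigan State", 788, 998), ("Minnesota", 838, 1050),
    ("Northwestern", 1034, 1250), ("Ohio State", 820, 986),
    ("Penn State", 897, 1083), ("Purdue", 881, 1009), ("Wisconsin", 825, 1090),
]

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)

football = [row[1] for row in big_ten]
everyone = [row[2] for row in big_ten]
intercept, slope, r = fit_line(football, everyone)

# Refit without Northwestern to see how much that one point drives r.
rest = [row for row in big_ten if row[0] != "Northwestern"]
_, _, r_without = fit_line([row[1] for row in rest], [row[2] for row in rest])

def predict(x):
    """Predicted all-students SAT for a given football players SAT."""
    return intercept + slope * x

iowa_prediction = predict(814)
penn_state_residual = 1083 - predict(897)   # residual = actual - predicted

print(f"y-hat = {intercept:.0f} + {slope:.3f}x, r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"r without Northwestern = {r_without:.2f}")
```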

Use the following information for questions 27-29. Be sure to include all formulas used.
Shipping Box Length (inches)   Shipping Cost ($)
10                             4.99
12                             8.59
15                             16.79
18                             28.99
24                             68.99

                               Coefficients   Standard Error   t Stat   P-value
Intercept                      -46.8530       10.3134          -4.542   0.019
Shipping Box Length (inches)   4.5900         0.62328          7.364    0.005
R-Squared = .948

27. Perform a logarithmic transformation for an exponential model for this data. Show your work.
shipping cost = 163.081*log(box length) - 166.4634

r = .9351, r^2 = .8745

28. Perform a logarithmic transformation for a power model for this data. Show your work.
log(shipping cost) = 3.001*log(box length) - 2.303

r = .999, r^2 = .9999

29. Which model, exponential or power, is a better fit for this data? Justify.
Power model, because there is no pattern in the residual plot and approximately 99.9% of the variation in
log(shipping cost) can be explained by the linear relationship between log(box length) and log(shipping cost).
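
The two transformations can be compared numerically. This sketch assumes the usual linearizations: log(cost) against length for an exponential model, and log(cost) against log(length) for a power model, both with base-10 logs. The helper fit_line is a hypothetical name, not part of the worksheet.

```python
from math import log10, sqrt

lengths = [10, 12, 15, 18, 24]                 # box length (inches)
costs = [4.99, 8.59, 16.79, 28.99, 68.99]      # shipping cost ($)

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope, sxy / sqrt(sxx * syy)

log_cost = [log10(c) for c in costs]
log_length = [log10(x) for x in lengths]

# Exponential model: log(cost) is linear in length.
exp_a, exp_b, exp_r = fit_line(lengths, log_cost)
# Power model: log(cost) is linear in log(length).
pow_a, pow_b, pow_r = fit_line(log_length, log_cost)

print(f"exponential: log(cost) = {exp_a:.3f} + {exp_b:.4f}*length, r^2 = {exp_r ** 2:.4f}")
print(f"power:       log(cost) = {pow_a:.3f} + {pow_b:.3f}*log(length), r^2 = {pow_r ** 2:.4f}")
```

Whichever transformed fit has the higher r^2 and a patternless residual plot is the better candidate.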
Use the following table to answer questions 30-32
A researcher suspected a relationship between people's preferences in movies and their preferences in pizza. A
random sample of 100 people produced the following two-way table:
Favorite Movie   Pepperoni   Veggie   Cheese   Total
The Matrix       20          5        10       35
Ever After       8           15       12       35
American Pie     15          2        13       30
Total            43          22       35       100
30. What percent of these people prefer pepperoni pizza?
43/100 = 43%
31. What percent of people who prefer veggie pizza like The Matrix?
5/22 = .23, or about 23%
32. What percent of those who like Ever After prefer cheese pizza?
12/35 = .34, or about 34%
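
Questions 30-32 differ only in which total goes in the denominator. A minimal sketch of the marginal and conditional percentages, with the table stored as a nested dictionary (the variable names are illustrative):

```python
# Counts from the two-way table: movie -> pizza -> number of people
counts = {
    "The Matrix":   {"Pepperoni": 20, "Veggie": 5,  "Cheese": 10},
    "Ever After":   {"Pepperoni": 8,  "Veggie": 15, "Cheese": 12},
    "American Pie": {"Pepperoni": 15, "Veggie": 2,  "Cheese": 13},
}

grand_total = sum(sum(row.values()) for row in counts.values())
pepperoni_total = sum(row["Pepperoni"] for row in counts.values())
veggie_total = sum(row["Veggie"] for row in counts.values())
ever_after_total = sum(counts["Ever After"].values())

# Marginal: divide by everyone sampled.
p_pepperoni = pepperoni_total / grand_total
# Conditional: divide by the total of the group named after "who".
p_matrix_given_veggie = counts["The Matrix"]["Veggie"] / veggie_total
p_cheese_given_ever_after = counts["Ever After"]["Cheese"] / ever_after_total

print(f"{p_pepperoni:.0%}, {p_matrix_given_veggie:.0%}, {p_cheese_given_ever_after:.0%}")
```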

Experimental Design and Sampling


33. What is the difference between an observational study and an experiment?
Experiments involve deliberately manipulating some factor (imposing treatments, assigned at random) to observe the response.
Observational studies involve observing a population without imposing any treatment.
34. Explanatory variables in experiments are often called factors
35. If you test three different strengths of a drug, you are testing three different levels.
The combination of specific levels and factors is called a treatment.
36. What are the 4 principles of experimental design?
Control- making conditions as similar as possible
Randomization- equalize the effects of unknown or uncontrollable sources of variation.
Replication- repeat the experiment on numerous subjects, and numerous times.
Blocking- reduce variation by grouping similar experimental units together and randomizing within each group.
37. What is a bias? What kinds of biases are there?
A bias is a systematic tendency for a method to overestimate or underestimate a population parameter.
38. What is the difference between an experimental unit and a subject?
An experimental unit is what we experiment on. A subject is a human that is being experimented on. The term
subject is considered to be more ethically correct.
39. What is the purpose of a control group?
It provides a baseline to the group being experimented on. With the control group we are able to tell if a
treatment is actually providing any change.
40. What is sampling? What does an SRS, systematic, clustering, and stratified sample consist of. What are their
similarities, and what are their differences?
Sampling is the process of looking at smaller subsets of an entire population to get a generalization about the
entire population of interest.
SRS or simple random sample- a sample chosen so that every possible sample of size n is equally likely
to be selected.
Systematic sampling- take a sampling frame (a list of the individuals in the population) and select every kth
individual after starting at a randomly chosen position.
Cluster- break the population into similar groups called clusters, randomly select some clusters, and sample every
individual in the selected clusters. Each cluster is heterogeneous.
Stratified sample- break the population into similar groups called strata and sample a portion of each stratum to
reach the number you need for your sample. The strata are homogeneous groups.
41. An ad for OptiGro plants states that you'll grow juicier and tastier tomatoes using their products. You want to
test this claim and wonder if you can get by using only half a dose. You go down to your local nursery and pick up
24 tomato plants. Explain how you would perform this experiment using a completely randomized experiment.
24 plants, randomly assigned to two groups:
    Group 1: 12 plants- Full dose
    Group 2: 12 plants- Half dose
Compare the two groups to see which is juicier/tastier.
***Randomly assign by flipping a coin, where
Heads = full dose
Tails = half dose
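
The random assignment can also be done in software. The sketch below swaps the coin flip for a shuffle, which is a different mechanism: shuffling the plant numbers and splitting the list guarantees exactly 12 plants per group, whereas independent coin flips usually give unequal groups. The function name assign_treatments and the plant numbering 1-24 are assumptions for illustration.

```python
import random

def assign_treatments(units, treatments, seed=None):
    """Shuffle the units and split them evenly among the treatments."""
    rng = random.Random(seed)        # seed only to make the example repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

plants = list(range(1, 25))          # label the 24 plants 1 through 24
groups = assign_treatments(plants, ["full dose", "half dose"], seed=1)

for treatment, members in groups.items():
    print(treatment, members)
```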

42. Instead of testing 24 plants you decide to do only 18. However, when you try to purchase 18 at nursery A, you
find out they only have 12. Explain how you would run a randomized experiment that utilizes blocking.
(You are blocking because you need to account for the differences in the two stores.)

43. Do cars get better gas mileage with premium instead of regular unleaded gasoline? While it might be possible
to test some engines in a laboratory setting, we'd rather use real cars and real drivers in real day-to-day driving,
so we get 20 volunteers. Design the experiment.

a. I want to test the effects of aerobic exercise on resting heart rate. I want to test two different levels
of exercise, 30 minutes 3 times per week and 30 minutes 5 times per week. I have a group of 20 people to
test, 10 men and 10 women. I will take heart rates before and after the experiment. Draw a diagram for
this experimental design. Explain how you

Simulations
44. Design and perform a simulation of how many children a couple must have to get at least one girl and at least
one boy. Include a description and perform 10 trials.
17868 95034 27754 90056 19233 01927 82226
24943 05756 42648 52711 95034 27754 90056
61790 28713 82425 38889 05756 42648 52711
90656 96409 36290 93074 28713 82425 38889
87964 12531 45467 60227 96409 36290 93074
18883 42544 71709 40011 12531 45467 60227
41979 82853 77558 85848 42544 71709 40011
83485 73676 00095 48767 82853 77558 85848
46816 47150 32863 52573 73676 00095 48767
85435 99400 29485 95592 47150 32863 52573
19233 01927 82226 94007 99400 29485 95592
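
One possible way to run the simulation in question 44 by computer rather than from the digit table is sketched below. It assumes boys and girls are equally likely and births are independent; each trial keeps adding children until both sexes have appeared. The function name and the seed are arbitrary choices for illustration.

```python
import random

def children_until_both(rng):
    """Simulate one family: count children until there is a boy and a girl."""
    seen = set()
    count = 0
    while len(seen) < 2:
        seen.add(rng.choice("BG"))   # each child is B or G with probability 1/2
        count += 1
    return count

rng = random.Random(44)              # fixed seed so the 10 trials are repeatable
trials = [children_until_both(rng) for _ in range(10)]
average = sum(trials) / len(trials)

print("children needed in each trial:", trials)
print("average over 10 trials:", average)
```

With the digit table, the same scheme could be carried out by reading one digit per child, for example even = boy and odd = girl.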

Probability
Use the following for questions 45-48
Probability of winning certain prizes in my fake raffle (tickets ARE replaced after each draw):
Prize         Car    Boat   TV     Can Opener
Probability   0.03   0.07   0.12   0.33

45. What is the probability of winning nothing?


P(N)=1-(.03+.07+.12+.33)=.45
46. What is the probability of winning the car or the TV?
P(C U TV)= P(C)+P(TV)-P(C intersect TV)= .03+.12-(.03*.12)=.1464
47. What is the probability of winning the boat and the can opener?
P(B intersect CO)= (.07*.33)=.0231
48. What is the probability of not winning the car or the can opener?
1-P(C U CO)=1-(.03+.33-(.03*.33))=.6499
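
The arithmetic for questions 45-48 can be checked as follows. This sketch follows the worksheet's approach: "or" uses the general addition rule, and "and" multiplies the two probabilities, which assumes the two wins are independent.

```python
p = {"car": 0.03, "boat": 0.07, "tv": 0.12, "can opener": 0.33}

p_nothing = 1 - sum(p.values())                                   # complement rule
p_car_or_tv = p["car"] + p["tv"] - p["car"] * p["tv"]             # addition rule
p_boat_and_opener = p["boat"] * p["can opener"]                   # multiplication rule
p_car_or_opener = p["car"] + p["can opener"] - p["car"] * p["can opener"]
p_not_car_or_opener = 1 - p_car_or_opener

print(round(p_nothing, 2), round(p_car_or_tv, 4),
      round(p_boat_and_opener, 4), round(p_not_car_or_opener, 4))
```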
Use the following to answer questions 49-53
Number of family members (X)   2      3      4      5      6      7 or more
P(X)                           0.05   0.12   0.39   0.26   0.15   0.03

49. Find P(X > 4).


P(x=5)+P(x=6)+P(x=7 or more)=.26+.15+.03=.44
50. Find P(X < 5).
1-P(x>4)=1-.44=.56
51. Find P(X ≠ 3).
1-P(x=3)=1-.12=.88
52. Find the expected number of family members. (Use 7 for 7 or more.)
E(x)=2(.05)+3(.12)+...+7(.03)=4.43
53. Find the standard deviation for the number of family members.
SD(x)=SQRT[(2-4.43)^2(.05)+(3-4.43)^2(.12)+...+(7-4.43)^2(.03)]=SQRT(1.2651)=1.125
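
The expected value and standard deviation of a discrete random variable follow directly from the probability table. A minimal sketch, using 7 for "7 or more" as question 52 instructs:

```python
from math import sqrt

# Number of family members -> probability
dist = {2: 0.05, 3: 0.12, 4: 0.39, 5: 0.26, 6: 0.15, 7: 0.03}

mean = sum(x * p for x, p in dist.items())                     # E(X)
variance = sum((x - mean) ** 2 * p for x, p in dist.items())   # Var(X)
sd = sqrt(variance)                                            # SD(X)
p_more_than_4 = sum(p for x, p in dist.items() if x > 4)

print(f"E(X) = {mean:.2f}, Var(X) = {variance:.4f}, SD(X) = {sd:.3f}")
```

Note that each squared deviation is weighted by its probability before summing, and the square root is taken only at the end.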

Random Variables
Use the following for 54-57
Liz can run the 400 meter dash in an average of 60 seconds with a standard deviation of 4 seconds. Paul can run it
in 70 seconds with a standard deviation of 8 seconds.
μ_L = 60, σ_L^2 = 16
μ_P = 70, σ_P^2 = 64
54. If Liz and Paul are the first two legs of a 1600 m relay team, what are the mean and standard deviation of their
times together?
μ_(L+P) = μ_L + μ_P = 60 + 70 = 130
σ_(L+P) = SQRT(Var(L) + Var(P)) = SQRT(16 + 64) = SQRT(80) = 8.94

55. Liz and Paul race each other. What are the mean and standard deviation of the difference in their times?
μ_(L-P) = μ_L - μ_P = 60 - 70 = -10
σ_(L-P) = SQRT(Var(L) + Var(P)) = SQRT(16 + 64) = SQRT(80) = 8.94

56. Paul drinks a 2-liter of Mountain Dew, so he now runs twice as fast. What are his new mean and standard
deviation?
New mean = (1/2)μ_P = (1/2)(70) = 35
New SD = SQRT(.5^2 Var(P)) = SQRT(.25 * 64) = SQRT(16) = 4
57. Liz is penalized 10 seconds for jumping the gun. What are her new mean and standard deviation?
μ_(L+10) = μ_L + 10 = 60 + 10 = 70
σ_(L+10) = SQRT(Var(L)) = SQRT(16) = 4
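
The rules used in questions 54-57 can be written as two small functions. This is a sketch with hypothetical function names; the rule for sums and differences assumes the two times are independent, since only then do the variances add.

```python
from math import sqrt

def sd_of_sum_or_difference(sd_x, sd_y):
    """SD of X + Y or X - Y for independent X and Y: variances add in both cases."""
    return sqrt(sd_x ** 2 + sd_y ** 2)

def linear_transform(mean, sd, a, b):
    """Mean and SD of a*X + b: the shift b moves the mean only; |a| scales the SD."""
    return a * mean + b, abs(a) * sd

liz_mean, liz_sd = 60, 4
paul_mean, paul_sd = 70, 8

relay = (liz_mean + paul_mean, sd_of_sum_or_difference(liz_sd, paul_sd))
race_gap = (liz_mean - paul_mean, sd_of_sum_or_difference(liz_sd, paul_sd))
paul_fast = linear_transform(paul_mean, paul_sd, 0.5, 0)    # twice as fast: half the time
liz_penalized = linear_transform(liz_mean, liz_sd, 1, 10)   # add 10 seconds

print(relay, race_gap, paul_fast, liz_penalized)
```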

For problems 58-66, use the following situation: For Test 1, the class average was 80 with a standard
deviation of 10. For Test 2, the class average was 70 with a standard deviation of 12.
μ_1 = 80, σ_1^2 = 100
μ_2 = 70, σ_2^2 = 144
58. What is the average for the two tests added together?

E(T1) + E(T2) = 80 + 70 = 150

59. What is the standard deviation for the two tests added together?

SQRT(VAR(T1) + VAR(T2)) = SQRT(100 + 144) = SQRT(244) = 15.62

60. What is the difference in the test averages?

E(T1) - E(T2) = 80 - 70 = 10

61. What is the standard deviation for the difference in the test averages?

SQRT(VAR(T1) + VAR(T2)) = SQRT(100 + 144) = SQRT(244) = 15.62

62. If I cut the test scores on Test 2 in half and add 50, what is the new average?

(1/2)E(T2) + 50 = (1/2)(70) + 50 = 85

63. What is the new standard deviation for Test 2 in problem 62?

SQRT((1/2)^2 VAR(T2)) = SQRT((1/4)(144)) = SQRT(36) = 6

64. If I add 7 points to every Test 1, what is the new standard deviation?

SD(T1) = 10

65. If I multiply every Test 1 by 2 and subtract 80, what is the new mean?

2E(T1) - 80 = 2(80) - 80 = 80

66. If I multiply every Test 1 by 2 and subtract 80, what is the new standard deviation?

SQRT(2^2 VAR(T1)) = SQRT(4 * 100) = SQRT(400) = 20

Use the following for 67-70


A young woman works two jobs and receives tips from both jobs. As a hairdresser, her weekly tips are
normally distributed with a mean of $65 and a standard deviation of $5.75. As a waitress, her weekly tips are
normally distributed with a mean of $164 and a standard deviation of $8.02.
Let X = Hairdresser tips
Y = Waitress tips
X~N(65, 5.75) Y~N(164,8.02)
67. What is her expected take home for both jobs?
E(X + Y) = E(X) + E(Y) = 65 + 164 = $229.00
68. What is the standard deviation of her tips from both jobs combined?
VAR(X + Y) = VAR(X) + VAR(Y) = 5.75^2 + 8.02^2 = 97.3829
SD(X + Y) = SQRT(97.3829) = $9.87
69. Find the probability she earns at least $250 in any given week.
z = (250 - 229)/9.87 = 2.128
P(z ≥ 2.128) = 0.017
There is about a 1.7% chance she earns at least $250.
70. What assumption must be made in order to answer problem 69?
The tips of each job are independent of each other.
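
Questions 67-69 can be checked with statistics.NormalDist from Python's standard library (3.8 or later), which supports adding two NormalDist objects. That addition is valid only under the independence assumption stated in question 70; the variable names below are illustrative.

```python
from statistics import NormalDist

hairdresser = NormalDist(mu=65, sigma=5.75)    # X ~ N(65, 5.75)
waitress = NormalDist(mu=164, sigma=8.02)      # Y ~ N(164, 8.02)

# Adding NormalDist objects treats X and Y as independent:
# the means add and the variances add.
total = hairdresser + waitress

p_at_least_250 = 1 - total.cdf(250)

print(f"E(X + Y) = {total.mean:.2f}")
print(f"SD(X + Y) = {total.stdev:.2f}")
print(f"P(X + Y >= 250) = {p_at_least_250:.3f}")
```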
Suppose our archer shoots 10 arrows:
71. Find the mean and standard deviation of the number of bulls-eyes she may get.
μ = 8
σ = SQRT(1.6) = 1.26
72. What's the probability that she never misses?
.107
73. What's the probability that there are no more than 8 bulls-eyes?
.624
74. What's the probability that she hits the bulls-eye more often than she misses?
Use the following situation for questions 75-82: The probability that a child born to a certain set of
parents will have blood type AB is 25%.
75. The parents have four children. X is the number of those children with blood type AB. Does this situation
meet the requirements for a binomial experiment? Explain.
Yes. There is a fixed number of trials (four children), and each trial (having a child) is independent of the
others. The probability of success (blood type AB) is the same for every child. Each trial has only two outcomes:
success (blood type AB) or failure (blood type not AB).
76. Using the situation in problem 75, find P(X = 2).
.211
77. Using the situation in problem 75, find P(X < 3).
P(X = 0) + P(X = 1) + P(X = 2) = .316 + .422 + .211 = .949
78. Using the situation in problem 75, find P(X > 1).
1 - P(X = 0) - P(X = 1) = 1 - .316 - .422 = .262
79. Using the situation in problem 75, find P(1 < X < 3).
P(X = 2) = .211
80. Using the situation in problem 75, find P(2 < X < 4).
.047
81. What is the mean of the situation in problem 75?
μ = np = 4(.25) = 1
82. What is the standard deviation of the situation in problem 75?
σ = SQRT(np(1 - p)) = SQRT(.75) = .866
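
The binomial probabilities for the blood-type situation (n = 4, p = .25) can be computed from the binomial formula. A minimal sketch using math.comb (Python 3.8 or later); binom_pmf is a hypothetical helper name, and the strict inequalities are read exactly as written in the questions.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 4, 0.25
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]   # P(X = 0), ..., P(X = 4)

p_exactly_2 = pmf[2]
p_less_than_3 = sum(pmf[:3])      # X = 0, 1, 2
p_more_than_1 = sum(pmf[2:])      # X = 2, 3, 4
p_between_2_and_4 = pmf[3]        # 2 < X < 4 means X = 3
mean = n * p
sd = sqrt(n * p * (1 - p))

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")
print(f"mean = {mean}, sd = {sd:.3f}")
```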

Multiple Choice
83) You measure the age, marital status and earned income of an SRS of 1463 women. The number and type of
variables you have measured is
A) 1463
B) Four; two categorical and two quantitative
C) Four; one categorical and three quantitative
D) Three; two categorical and one quantitative
E) Three; one categorical and two quantitative

84) If your score on a test is at the 60th percentile, you know that your score lies
A) Below the lower quartile
B) Between the lower quartile and the median
C) Between the median and the upper quartile
D) Above the upper quartile
E) Can't say where it lies relative to the quartiles

85) When dealing with financial data (such as salaries or lawsuit settlements), we often find that
the shape of the distribution is _________. When the distribution has this shape, the _________
is pulled toward the long tail of the distribution, but the _________ is less affected. The sequence
of words to correctly complete this passage is
A) Right skewed, median, mean.
B) Left skewed, mean, median.
C) Right skewed, mean, standard deviation.
D) Right skewed, mean, median.
E) Roughly symmetric, mode, mean.

86) Items produced by a manufacturing process are supposed to weigh 90 grams. The
manufacturing process is such, however, that there is variability in the items produced and they do
not all weigh exactly 90 grams. The distribution of weights can be approximated by a normal
distribution with mean 90 grams and a standard deviation of 1 gram. What percentage of the items
will either weigh less than 87 grams or more than 93 grams?
A) 6%
B) 94%
C) 99.7%
D) 0.3%
E) 0.15%

87) The correlation coefficient measures


A) Whether there is a relationship between two variables.
B) The strength of the relationship between two quantitative variables.
C) Whether or not a scatterplot shows an interesting pattern.
D) Whether a cause and effect relation exists between two variables.
E) The strength of the linear relationship between two quantitative variables.
88) Which of the following is true of the correlation coefficient r ?
A) It is a resistant measure of association.
B) -1 ≤ r ≤ 1
C) If r is the correlation between X and Y, then r is the correlation between Y and X.
D) The correlation coefficient will be +1.0 only if all the data lie on a perfectly straight-line.
E) All of the above.
89) The least-squares regression line is
A) The line that makes the square of the correlation in the data as large as possible.

B) The line that makes the sum of the squares of the vertical distances of the data points from the
line as small as possible.
C) The line that best splits the data in half, with half of the points above the line and half below
the line.
D) The line that makes the sum of the squares of the residuals 0.
E) All of the above.
90) The fraction of the variation in the values of a response y that is explained by the least-squares
regression of y on x is the
A) Correlation coefficient
B) Slope of the least-squares regression line
C) Square of the correlation coefficient
D) Intercept of the least-squares regression line
E) Sum of the squared residuals

91) Suppose we fit the least-squares regression line to a set of data. If a plot of the residuals
shows a curved pattern,
A) A straight line is not a good summary for the data.
B) The correlation must be 0.
C) The correlation must be positive.
D) Outliers must be present.
E) r^2 = 0.

92) Which of the following statements concerning residuals is true?


A) The sum of the residuals is always 0.
B) A plot of the residuals is useful for assessing the fit of the least-squares regression line.
C) The value of a residual is the observed value of the response minus the value of the response
that one would predict from the least-squares regression line.
D) An influential point on a scatterplot is not necessarily the point with the largest residual.
E) All of the above.
93) If changes in a response variable are due to the effects of the explanatory variable as well as
the effects of lurking variables, and we cannot distinguish between these effects, we are said to
have
A) A cause-and-effect relation between the explanatory and response variable.
B) Common response
C) Confounding
D) Correlation
E) Extrapolated
F) None of the above.
94) Using least-squares regression, I determined that the logarithm of the population of a country
is approximately described by the equation: log(population) = -13.5 + 0.01 × (year)
Based on this equation, the population of the country in the year 2000 should be about
A) 6.5
B) 665
C) 6,665
D) 2,000,000
E) 3,162,278
95) The reason that blocking (as in a randomized block design) is sometimes used in experimentation
is to
A) Prevent the placebo effect
B) Allow double blinding
C) Eliminate confounding with another factor
D) Eliminate sampling variability

96) An experiment compares the taste of a new spaghetti sauce with the taste of a successful
sauce. Each of a number of tasters tastes both sauces (in random order) and says which tastes
better. This is called a
A) Simple Random Sample
B) Stratified Random Sample
C) Completely Randomized Design

D) Matched Pairs Design


E) Double-Blind Design

97) In a certain town, 50% of the households own a cellular phone, 40% own a pager, and 20% own
both a cellular phone and a pager. The proportion of households that own neither a cellular phone
nor a pager is
A) 0%
B) 10%
C) 30%
D) 70%
E) 90%
98) If the knowledge that an event A has occurred implies that a second event B cannot occur, the
events A and B are said to be
A) Independent
B) Disjoint
C) Mutually Exhaustive
D) The Sample Space
E) Complementary
99) A deck of cards contains 52 cards, of which 4 are aces. You are offered the following wager:
Draw one card at random from the deck. You win $10 if the card drawn is an ace. Otherwise you
lose $1. If you make this wager very many times, what will be the mean outcome?
A) About -$1, because you will lose most of the time.
B) About $9, because you win $10 but lose only $1.
C) About -$0.15, that is, on the average you lose about 15 cents.
D) About $0.77, that is, on the average you win about 77 cents.
E) About $0, because the random draw gives you a fair bet.
100) All bags entering a research facility are screened. Ninety-seven percent of the bags that
contain forbidden material trigger an alarm. Fifteen percent of the bags that do not contain
forbidden material also trigger the alarm. If 1 out of every 1,000 bags entering the building
contains forbidden material, what is the probability that a bag that triggers the alarm will actually
contain forbidden material?
A) 0.00097
B) 0.00640
C) 0.03000
D) 0.14550
E) 0.9700