
STAT 2300: Review Problems for Test #1 Spring 2019

Your Test 1 will cover Sections 1.1 – 5.1 & 5.5. This set of review problems includes all problems over
these sections from the released exams (Fall 2018 & Spring 2018) posted to the course website. Thus,
this set of review problems does not necessarily reflect the length and/or difficulty of Test 1. This review
cannot be guaranteed to be inclusive or exclusive of all that is covered on Test 1; it is merely meant to
aid you in the review process.

Part I: Multiple Choice. Circle the letter corresponding to the best answer of the choices given.

1. Which of the following summary measures are resistant to outliers?


(A) The mean and standard deviation
(B) The median and standard deviation
(C) The median and range
(D) The median and interquartile range

2. On February 7, 2010, Drew Brees and Peyton Manning played quarterback for their respective
football teams in the Super Bowl. Manning was the Most Valuable Player (MVP) of the regular
season. Drew Brees was the MVP of the Super Bowl, as the New Orleans Saints beat the
Indianapolis Colts. The boxplots below display the distribution of passing yards for each quarterback
during the regular season games that year. Note that Peyton Manning did not finish the game in
which he threw for less than 100 yards.

Which of the following statements is NOT justified by the boxplots?


(A) Drew Brees had 350 or more passing yards in one-fourth of the games in which he played.
(B) For the complete games played, Peyton Manning had less variability in the number of passing
yards per game than Drew Brees.
(C) Both quarterbacks had 300 or more passing yards in about half of the games they played.
(D) For both quarterbacks, the distribution of number of passing yards per game is roughly bell-shaped.
3. The STAT 2300 syllabus states that the final course letter grade will be calculated according to the
usual 10-point grading scale, where the minimum grade for an A is 90, for a B is 80, for a C is 70,
for a D is 60, and below 60 is an F. What is the level of measurement of final course letter grade?
(A) Ratio
(B) Interval
(C) Ordinal
(D) Nominal

4. The following dotplots show the distribution of lab grades on Lab 1 for three sections (A, B, and C)
of STAT 2301. Each section consists of 16 students and has the same mean and range in lab grades.

Which lab section has the largest standard deviation in lab grades?
(A) Lab section A
(B) Lab section B
(C) Lab section C
(D) The standard deviation is the same for all 3 lab sections

5. For a class project, a student surveys 30 Biology majors. Each student is asked "How many hours do
you work at a part-time job in a typical week?" The results are summarized below.

Variable: Number of Hours


n: 30
Mean: 7.20
Median: 6.00
StDev: 5.3
Min: 0
Max: 24

Based on these summary statistics, the distribution of number of hours worked is most likely:
(A) Bell-shaped
(B) Skewed left
(C) Skewed right
(D) Uniform

Use the following information to answer questions 6 – 7.


The Centers for Disease Control and Prevention (CDC) conducted a nationwide survey of 6,000 U.S.
adults age 65 and older. Based on the survey responses, the CDC reported that only 36% of adults 65
years and older are physically active, as defined by exercising 75 minutes or more per week.

6. Which of the following statements about this study is correct?


(A) The population of interest is all U.S. adults.
(B) The sample is all U.S. adults 65 years and older.
(C) The value 36% is a statistic because it was calculated from sample data.
(D) The value 36% can be considered a parameter because the number of respondents is so large.

7. Identify the variable of interest in this study and classify it as qualitative or quantitative.
(A) The variable of interest is whether or not an older adult is physically active, which is qualitative.
(B) The variable of interest is whether or not an older adult is physically active, which is quantitative.
(C) The variable of interest is the percentage of older adults that are physically active, which is
qualitative.
(D) The variable of interest is the percentage of older adults that are physically active, which is
quantitative.

8. In the previous academic year, Clemson awarded degrees to 6,032 students. The bar graph below
displays the number of degrees awarded by each of the 7 colleges: CECAS (College of Engineering,
Computing & Applied Sciences), Business, BSHS (Behavioral, Social & Health Science), Science,
AAH (Architecture, Arts & Humanities), CAFLS (College of Agriculture, Forestry & Life
Sciences), and Education.

Which of the following statements is correct?


(A) The distribution is skewed left.
(B) The distribution is skewed right.
(C) The majority of degrees awarded were from CECAS.
(D) After CECAS, Business, and BSHS, there is a large drop in number of degrees awarded for the
other colleges.

9. Before opening a new restaurant, a chef wants to gather information about the eating habits of the
local residents. He randomly selects 1000 households from all households in the area and mails a
questionnaire to them. Of the 1000 surveys mailed, 150 are returned. Which of the following is the
most obvious concern with how this information is gathered?
(A) Mailing questionnaires instead of conducting in-person interviews produces a convenience
sample.
(B) Those who chose to respond to the survey may have different eating habits from those who did
not respond.
(C) Only residents from the local area were polled.
(D) The chef must conduct a census in order to avoid all sources of bias.

10. The stem-and-leaf plot below displays the distribution of the number of points that Clemson scored
in each of its football games in the 2017 season. Selected summary statistics are also given.

Summary Statistics

Mean: 33.29
Std Dev: 14.79
Min: 6.00
Q1: 24.00
M: 32.50
Q3: 40.25
Max: 61.00

Key: 0|6 = 6 points

Which of the following best describes the distribution of points scored?


(A) The number of points scored is approximately symmetric and unimodal. The center is best
summarized by the median number of points scored (M = 32.50 points) and the spread is best
summarized by the range in number of points scored (R = 55 points).
(B) The number of points scored is approximately symmetric and unimodal. The center is best
summarized by the mean number of points scored (x̄ = 33.29 points) and the spread is best
summarized by the standard deviation of number of points scored (s = 14.79 points).
(C) The number of points scored is strongly skewed right. The center is best summarized by the
median number of points scored (M = 32.5 points) and the spread is best summarized by the
range in number of points scored (R = 55 points).
(D) The number of points scored is strongly skewed right. The center is best summarized by the
mean number of points scored (x̄ = 33.29 points) and the spread is best summarized by the
standard deviation of number of points scored (s = 14.79 points).

11. Jack and Jill want to know whether running up a hill once a day improves lung capacity and decided
to conduct an experiment. Both Jack and Jill measured their lung capacity before the experiment. By
the flip of a coin, it was decided that Jack would not exercise for six weeks while Jill ran up the hill
once a day for the same six week period. At the end of six weeks, Jack and Jill again measured their
lung capacity and compared the measure to the initial value for potential improvement. What key
feature of a well-designed experiment did Jack and Jill not incorporate?
(A) Random assignment of subjects to treatments
(B) Replication within each treatment group
(C) A control group for comparison
(D) This experiment includes all key features of a well-designed experiment

12. A medical administrative assistant has been asked to randomly sample 100 patients' office visit
records from a file of 2000 patient records and record the amount of time the attending physician
spent with each patient. The assistant is concerned the time of year during which the patient was
seen may be a lurking variable and plans to divide the year into quarters (Jan–Mar, Apr–Jun, Jul–
Sep, Oct–Dec) and randomly select 25 patient records from each quarter. Which of the following
best describes this sampling plan?
(A) Simple random sampling
(B) Systematic sampling
(C) Stratified sampling
(D) Cluster sampling

13. An educational researcher will conduct an experiment to determine whether children learn their
multiplication facts better by practicing with flash cards or by practicing on a computer. Twenty
children who volunteer for the experiment will be randomly assigned to one of the two treatments.
After practice, the children will be given a test on their multiplication facts. Why will it be
impossible to conduct a double-blind experiment?
(A) The person who grades the tests will know whether the child used flash cards or the computer.
(B) The child will know whether he or she used flash cards or the computer.
(C) The design does not include a control comparison group.
(D) It is not possible to apply each treatment to more than one subject.

14. Reading and understanding nutrition labels on foods may be an important precursor to dietary
change. Researchers randomly selected four medical clinics in Missouri. A total of 885 patients were
asked to complete a survey while waiting to see their physicians. The study found that those who
read labels tended to have better diets. Can the researchers conclude that reading nutrition labels on
foods causes patients to have better diets?
(A) Yes, because the use of randomization in the study allows for cause and effect conclusions.
(B) The researchers can only claim a cause and effect relationship among individuals similar to the
patients who completed the survey.
(C) No, because this was an observational study the researchers can only claim an association.
(D) No, because the 885 respondents do not represent a simple random sample.

15. According to gasbuddy.com, the mean price for a gallon of gas in South Carolina is currently $2.57
with a standard deviation of $0.07. Assume the distribution of gas prices is roughly symmetric and
unimodal. If we were to randomly select 100 gas stations in South Carolina, how many of these gas
stations does the Empirical Rule predict are selling a gallon of gas for less than $2.50?
(A) 0
(B) 3
(C) 16
(D) 68
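
The Empirical Rule arithmetic behind a question like this can be sketched in a few lines. This is a minimal illustration, not part of the original review; the helper name `share_below` is hypothetical, and it assumes the cutoff sits exactly 1, 2, or 3 standard deviations from the mean.

```python
# Empirical Rule (68-95-99.7) for a roughly bell-shaped distribution:
# share of observations within k standard deviations of the mean.
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

def share_below(mean, sd, cutoff):
    """Approximate share of values below `cutoff` (hypothetical helper).

    Only handles cutoffs that are 1, 2, or 3 SDs away from the mean.
    """
    exact = (cutoff - mean) / sd
    k = round(exact)
    if abs(k) not in WITHIN or abs(exact - k) > 1e-9:
        raise ValueError("cutoff must be 1, 2, or 3 SDs from the mean")
    tail = (1 - WITHIN[abs(k)]) / 2  # one tail outside +/- k SDs
    return tail if k < 0 else 1 - tail

# Gas price setup from the question: mean $2.57, SD $0.07, 100 stations.
stations = 100
expected = stations * share_below(mean=2.57, sd=0.07, cutoff=2.50)
print(round(expected))
```

The same helper applies to any bell-shaped setup where the cutoff is a whole number of standard deviations from the mean.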
16. The Cancer Prevention Study II (CPS-II) examines the relationship among lifestyle factors and cancer
mortality by tracking approximately 1.2 million men and women. Study participants completed an
initial study questionnaire in 1982 providing information on a range of lifestyle factors such as diet,
alcohol and tobacco use, occupation, and medical history. These data have been examined extensively
in relation to cancer mortality. What is the response variable of interest in the study?
(A) the number of study participants completing the questionnaire
(B) the relationship among lifestyle factors and cancer mortality
(C) lifestyle factors such as diet, alcohol and tobacco use, occupation, and medical history
(D) cancer mortality

17. A statistics student at Pleasantville High School looked at seat belt use by drivers. Customers were
observed driving into a local convenience store. After the drivers left their cars, the student asked
each driver several questions about seat belt use. In all, 80% of the drivers said that they always use
seat belts. However, the student observed that only 61.5% of these same drivers were actually
wearing a seat belt when they pulled into the store parking lot. Which of the following best explains
the difference in the two percentages?
(A) The difference is due to sampling variability. We shouldn't expect the results of a sample to
match the truth about the population every time.
(B) The difference is due to response bias. Drivers who don't use seat belts are likely to lie and say
they do use seat belts.
(C) The difference is due to sampling bias. The study included only customers of the convenience
store and did not include all drivers in the population.
(D) The difference is due to nonresponse bias. Drivers who don't use seat belts are less likely to
respond to the student's questions.

18. What does a correlation coefficient value of r = −1 indicate?


(A) The points on the scatterplot are collinear (i.e., they lie on a straight line).
(B) There is a strong relationship between the two variables but the relationship is not linear.
(C) An increase of one unit of the explanatory variable leads to an equal increase of the response
variable.
(D) There is no discernible relationship of any kind between the two variables.

19. In 2013 two Harvard researchers set out to analyze student behavior in college libraries. They
randomly chose two locations to sit in the main library on the Harvard campus, and then observed
the behavior of over 700 students. Approximately 80% of the students' behaviors were study related.
Can the researchers conclude the majority of college students are studying while in the library?
(A) Yes, because a majority of the students observed were studying.
(B) Yes, because the use of randomization and replication make this a well-designed experiment.
(C) No, because the results of observational studies can never be generalized to broader populations.
(D) No, because students at Harvard may not be representative of all college students.
20. Clemson administrators are interested in the average teaching effectiveness rating of professors as
reported by students in the course evaluations. They plan to find this by taking a representative
sample of 200 professors and looking at the course evaluations for each professor. The administrators
suspect teaching effectiveness may vary by department and want to be sure to include professors from
each department on campus in their sample. Which sampling method should the administrators use?
(A) Simple random sampling because it is the easiest and most reliable sampling method
(B) Stratified random sampling with departments for strata
(C) Cluster sampling with departments for clusters
(D) Voluntary response sampling where a few volunteers are requested from each department

21. It has been observed that trade and war between countries are positively correlated. That is, countries
that trade with each other also tend to fight with each other. What is the most likely explanation of
this phenomenon?
(A) Disagreements over trade often lead to war.
(B) Rebuilding after war often leads to increased trade.
(C) The geographical proximity of the two countries is a lurking variable.
(D) The relative wealth of the two countries is a lurking variable.

22. The following stem-and-leaf plot displays the daily high temperatures (℉) in Clemson for the 31
days of January 2017 (source: weather.com).

3 0 3 4
3 5 6 7 8
4 0 1 1 1
4 6 7 8
5 1 3 3
5 5 5 6 6 6 6 7 9
6 0 1 1 2 3
6 9

Key: 3|0 = 30℉

What was the range in daily high temperatures in Clemson this January?
(A) 69℉
(B) 39℉
(C) 35℉
(D) 3.9℉
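
Reading a stem-and-leaf plot back into raw values is mechanical, so a short sketch may help when checking work. The rows are transcribed from the plot above; the variable names are illustrative only.

```python
# Each row is (stem, leaves). Stems repeat because each stem is split in two.
rows = [
    (3, [0, 3, 4]),
    (3, [5, 6, 7, 8]),
    (4, [0, 1, 1, 1]),
    (4, [6, 7, 8]),
    (5, [1, 3, 3]),
    (5, [5, 5, 6, 6, 6, 6, 7, 9]),
    (6, [0, 1, 1, 2, 3]),
    (6, [9]),
]

# Key: 3|0 = 30 degrees F, so value = 10 * stem + leaf.
temps = [10 * stem + leaf for stem, leaves in rows for leaf in leaves]

# Range = maximum minus minimum.
temp_range = max(temps) - min(temps)
print(len(temps), min(temps), max(temps), temp_range)
```

Counting the leaves is a quick sanity check that no day was dropped while transcribing the plot.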

23. Assuming births are equally likely to occur on any day of the year, the probability that a randomly
selected person has a birthday on the first or last day of a month is __________ than .05 so it
_________ unusual to meet someone with a birthday on the first or last day of a month.
(A) Greater, is
(B) Greater, is not
(C) Less, is
(D) Less, is not
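
One way to check a probability like this is to count the favorable days directly. The sketch below assumes a 365-day (non-leap) year, which the question does not state explicitly; `YEAR` and `first_or_last` are illustrative names.

```python
from datetime import date, timedelta

YEAR = 2019  # assumption: any non-leap year works the same way

def first_or_last(d):
    """True if d is the first or the last day of its month."""
    return d.day == 1 or (d + timedelta(days=1)).day == 1

days = [date(YEAR, 1, 1) + timedelta(days=i) for i in range(365)]
favorable = sum(first_or_last(d) for d in days)

# Equally likely outcomes: probability = favorable / total.
p = favorable / len(days)
print(favorable, round(p, 4))
```

Comparing `p` with the .05 cutoff in the question then settles whether the event counts as unusual.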

24. The correlation coefficient for the relationship between an athlete's weight and how much he can
bench press, calculated from data found online, is r = .67. Which of the following statements must
be true?
(A) Being heavier causes athletes to gain strength.
(B) Gaining strength makes athletes heavier.
(C) Heavier athletes tend to be able to lift more weight.
(D) Heavier athletes tend to be able to lift less weight.

25. One of the questions on the STAT 2300 Student Survey asked students to report the web browser
they primarily use. The relative frequency bar graph below summarizes the 639 student responses.

How many more respondents primarily use Chrome than primarily use Firefox?
(A) About 54
(B) About 166
(C) About 345
(D) Cannot be determined from the given information

26. One retail store asks customers to provide their zip code at check out in order to determine locations
where it would be most profitable to open new stores. What type of variable is customer zip code?
(A) Qualitative because the observations take on numerical values that represent different
magnitudes of the variable.
(B) Qualitative because each observation belongs to one of a set of categories
(C) Quantitative because the observations take on numerical values that represent different
magnitudes of the variable
(D) Quantitative because each observation belongs to one of a set of categories

27. A local pizza shop is offering any medium, two-topping pizza for $5. Customers can pick from 15
different toppings, but double toppings are not allowed. How many different two-topping pizzas are
possible?
(A) 30
(B) 105
(C) 210
(D) 225
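
A counting rule for this kind of question can be checked by brute force. The sketch assumes that the order of the two toppings does not matter (so this is a combination, not a permutation); the topping labels are placeholders.

```python
import math
from itertools import combinations

toppings = [f"topping_{i}" for i in range(1, 16)]  # 15 placeholder names

# Counting rule: choose 2 distinct toppings out of 15, order ignored.
n_pizzas = math.comb(len(toppings), 2)

# Brute-force check: list every unordered pair of distinct toppings.
pairs = list(combinations(toppings, 2))
print(n_pizzas, len(pairs))
```

If order did matter, `math.perm(15, 2)` would be the matching rule instead.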

28. The histograms below summarize the heights, in centimeters, of 200 pine seedlings six years after
they were planted at a center for environmental study. Half of the trees were fertilized yearly, and
the remaining trees were never fertilized.

[Histograms: Fertilized Trees; Unfertilized Trees]

Which of the two height distributions has the larger standard deviation? Explain.
(A) The fertilized trees because the heights have less variability.
(B) The unfertilized trees because the heights have more variability.
(C) The fertilized trees because they are centered at a larger value.
(D) The unfertilized trees because they are centered at a smaller value.

29. A survey questioned 2300 adults 18 years of age or older living in South Carolina. Of the 2300
surveyed, 54% agreed with the statement, "I don’t like coffee, but I drink it anyway." Which of the
following statements is correct?
(A) The value 54% is a parameter because it summarizes the sample of 2300 adults.
(B) The value 54% is a parameter because it summarizes the population of all South Carolina adults.
(C) The value 54% is a statistic because it summarizes the sample of 2300 adults.
(D) The value 54% is a statistic because it summarizes the population of all South Carolina adults.

30. Each dotplot below displays the distribution of 35 observations of a variable. For which distribution
is it most likely that the value of the mean is noticeably less than the value of the median?

(A) (B)

(C) (D)

31. The weights of Sunkist navel oranges follow a bell-shaped distribution with a mean of 140 grams
and a standard deviation of 15 grams. In a box containing 100 Sunkist navel oranges, approximately
how many of them would you expect to be between 110 and 170 grams?
(A) 50
(B) 68
(C) 95
(D) 99

32. A survey was administered to a nationwide random sample of 200 adults who were asked to report
their monthly cell phone bill (in dollars). A histogram of the results is shown below.

Which of the following statements about the distribution of monthly cell phone bills is true?
(A) The distribution of monthly cell phone bills is skewed to the left.
(B) Approximately 30% of respondents reported monthly cell phone bills less than $50.
(C) There were no reported monthly cell phone bills between $300 and $350.
(D) The largest reported monthly cell phone bill was $70.

33. The 600 students enrolled in an introductory statistics course at a large university were surveyed and
asked to report how many hours they participate in sports or other physical exercise in a typical
week. Selected summary statistics are provided below.

Mean: 6.5 hours


Standard Deviation: 5.5 hours
First Quartile: 3 hours
Median: 5 hours
Third Quartile: 8 hours

About 300 of the surveyed students typically exercise


(A) less than 6.5 hours each week
(B) between 1 and 12 hours each week
(C) between 3 and 8 hours each week
(D) more than 8 hours each week

Part II: Free Response. Show all your work. Indicate clearly the methods you use, because you will be
graded on the correctness of your methods as well as on the accuracy and completeness of your results
and explanations. Answers with no justification will receive no credit.

1. There have been many studies recently concerning coffee drinking and cholesterol level. While it is
known that several coffee-bean components can elevate cholesterol level, it is thought that a new
type of paper coffee filter may reduce the presence of some of these components in brewed coffee.

The effect of the new filter on cholesterol level will be studied over a 10-week period using 300
volunteers who each drink 4 cups of caffeinated coffee per day. Each of these 300 volunteers will be
randomly assigned to one of two groups: the experimental group, who will drink only coffee that has been
made with the new filter, or the control group, who will drink only coffee that has been made with
the standard filter. The change in cholesterol level over the treatment period will be used to measure
the effectiveness of the new filter.

(a) Identify the explanatory variable (i.e., the factor) and the response variable in the study. (4 pts)

Explanatory variable:

Response variable:

(b) Based on the design of the study, is it possible to draw a cause-and-effect conclusion between the
type of filter used and change in cholesterol level? Explain why or why not. (2 pts)

(c) Based on the design of the study, would it be reasonable to generalize the results beyond the 300
participants? Explain why or why not. (2 pts)

2. The weights, in pounds, of the 22 members of the 2012 U.S. Women’s Olympic Rowing Team are
listed below.

106 123 130 152 157 160 160 160 165 170 172

175 175 175 175 175 175 175 178 178 180 185

(a) Calculate the five-number summary for this distribution and enter your responses in the table
below. (3 pts)

MIN Q1 MEDIAN Q3 MAX

(b) Are there any outliers in the data set? Show your calculations. (3 pts)

(c) Draw a boxplot of the distribution of weights using the number line below. (4 pts)

(d) Describe the shape of this distribution. (2 pts)
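
The five-number summary and the outlier check can be sketched as follows. Quartile conventions vary between textbooks and software; this sketch assumes the "median of each half" rule and the 1.5 × IQR fences, which may or may not match the convention used in the course. The function name is illustrative.

```python
from statistics import median

weights = [106, 123, 130, 152, 157, 160, 160, 160, 165, 170, 172,
           175, 175, 175, 175, 175, 175, 175, 178, 178, 180, 185]

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd count, the overall median is left out of both halves.
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return min(data), median(lower), median(data), median(upper), max(data)

minimum, q1, med, q3, maximum = five_number_summary(weights)

# 1.5 * IQR rule: anything outside the fences is flagged as an outlier.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [w for w in weights if w < lower_fence or w > upper_fence]

print(minimum, q1, med, q3, maximum)
print(lower_fence, upper_fence, outliers)
```

Library routines such as `statistics.quantiles` interpolate differently and can return slightly different quartiles, which is why the halves are computed by hand here.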

3. The Iowa Farm Cooperative owns and leases prime farmland in the Midwest. Most of its acres are
used to plant corn. The Cooperative performs a substantial amount of testing to determine what types
of corn seed produce the greatest yields. Recently, the Cooperative planted three types of corn seed
(A, B, and C) on its plots of land used for testing. The following table gives the mean and standard
deviation of the yield (in bushels per acre) after one year.

Yield (bushels per acre)   Seed Type A   Seed Type B   Seed Type C

Mean                            88            56           100

Standard Deviation              25            15            11

Histograms of the data revealed that the yield distribution for each type of seed is approximately
bell-shaped. One particular farmer needs to obtain at least 135 bushels per acre to avoid bankruptcy
and will use these test results to determine what type of seed to plant.

(a) Based on the results of the testing, which seed type tends to produce the greatest yield per acre?
Justify your answer. (2 pts)

(b) Determine the z-score for an observation of 135 bushels per acre for each seed type. (3 pts)

Seed Type A:

Seed Type B:

Seed Type C:

(c) What type of seed would you recommend the farmer plants? Justify your answer. (3 pts)
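
The z-score calculation in part (b) follows z = (x − mean) / SD. A minimal sketch using the table values; the dictionary layout and names are illustrative, not part of the original problem.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

# (mean, standard deviation) in bushels per acre, from the table above.
seeds = {"A": (88, 25), "B": (56, 15), "C": (100, 11)}
target = 135  # yield the farmer needs, in bushels per acre

z = {name: z_score(target, mean, sd) for name, (mean, sd) in seeds.items()}
for name, value in z.items():
    print(f"Seed Type {name}: z = {value:.2f}")
```

A smaller z-score means the target is fewer standard deviations above that seed's mean, which is the quantity to weigh in part (c).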

4. A professional sports team evaluates potential players for a certain position based on two main
characteristics, speed and strength.
Speed is measured by the time required to run a distance of 40 yards, with smaller times indicating
more desirable (faster) speeds. From previous speed data for all players in this position, the times to
run 40 yards have a mean of 4.60 seconds and a standard deviation of 0.15 seconds.
Strength is measured by the number of times (reps) a player can bench press 225 pounds, with more
reps indicating more desirable (greater) strength. From previous strength data for all players in this
position, the number of 225 lb bench presses has a mean of 21 reps and a standard deviation of 5 reps.
The measurements for two players in this position, Players A and B, are given below.

                                 Player A        Player B
Time to run 40 yards             4.42 seconds    4.57 seconds
Number of 225 lb bench presses   32 reps         33 reps

(a) Calculate the z-score for each player's speed measurement. (2 pts)

Player A:

Player B:

(b) Calculate the z-score for each player's strength measurement. (2 pts)

Player A:

Player B:

(c) The characteristics of speed and strength are considered to be of equal importance to the team in
selecting a player for the position. Based on the z-scores calculated in (a) and (b), which player
should the team select if the team can only select one of the two players? Justify your answer. (3 pts)
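
When one measure is "smaller is better" and the other is "larger is better", the z-scores need a common direction before they can be compared. The sketch below flips the sign of the speed z-score and adds the two; summing is only one possible way to weigh the two characteristics equally and is an assumption here, not a method stated in the problem.

```python
def z_score(x, mean, sd):
    return (x - mean) / sd

SPEED = (4.60, 0.15)   # mean and SD of 40-yard times, in seconds
STRENGTH = (21, 5)     # mean and SD of 225 lb bench press reps

# (time in seconds, reps) for each player, from the table above.
players = {"A": (4.42, 32), "B": (4.57, 33)}

scores = {}
for name, (time_s, reps) in players.items():
    z_speed = z_score(time_s, *SPEED)
    z_strength = z_score(reps, *STRENGTH)
    scores[name] = {
        "z_speed": z_speed,
        "z_strength": z_strength,
        # Faster times give negative z-scores, so negate speed before adding.
        "combined": -z_speed + z_strength,
    }

for name, s in scores.items():
    print(name, round(s["z_speed"], 2), round(s["z_strength"], 2),
          round(s["combined"], 2))
```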

5. The January 2015 issue of The Lancet reported on a study of the effectiveness of Tamiflu for
reducing the duration of flu symptoms. Researchers recruited 30 subjects who agreed to report to the
researchers' lab within 24 hours of getting flu symptoms. Half of the subjects were randomly
assigned to take a full dosage of Tamiflu and the other half received a placebo.
(a) The following table summarizes the number of days to alleviation of all flu symptoms for the 15
subjects assigned to take a placebo.
Values below Q1   Q1   Median   Q3   Values above Q3
2, 3, 3           4    5        7    10, 12, 13
Are there any outliers in the placebo group? Show your calculations. (4 pts)

(b) The graph below displays a boxplot for the distribution of number of days to alleviation of all flu
symptoms for the 15 subjects assigned to take Tamiflu. Add a boxplot to display the distribution
for the placebo group to this graph. (4 pts)

Time to Alleviation (in days)

(c) Write a sentence to compare the shapes of the two distributions. (3 pts)

(d) Based on the boxplots, is there evidence to suggest that Tamiflu helps to reduce the time until
alleviation of all flu symptoms? Explain. (3 pts)

6. The least-squares linear regression equation below is the result of an analysis that modeled the
annual salary of public school teachers with a bachelor's degree in Greenville County as a function
of the number of years of teaching experience they have.

ŷ = 34,769.78 + 910.55x

The teachers in the analysis had teaching experience ranging from 0 years to 28 years.

(a) If it makes sense to do so, interpret the y-intercept in this regression equation. If it does not make
sense to do so, explain why it does not make sense. (3 pts)

(b) Interpret the slope in this regression equation. (3 pts)

(c) A teacher with 5 years of experience whose salary is $39,002.53 was included in the sample used
to fit the linear model. Calculate the residual for this teacher. (3 pts)
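
A residual is the observed value minus the value predicted by the fitted line. A minimal sketch for part (c), using the coefficients from the regression equation; the function name is illustrative.

```python
INTERCEPT = 34769.78  # predicted salary at 0 years of experience, in dollars
SLOPE = 910.55        # predicted change in salary per additional year

def predicted_salary(years):
    """Predicted annual salary from the fitted line (valid for 0 to 28 years)."""
    return INTERCEPT + SLOPE * years

observed = 39002.53
predicted = predicted_salary(5)

# Residual = observed - predicted; negative means the line over-predicts.
residual = observed - predicted
print(round(predicted, 2), round(residual, 2))
```

Rounding to cents at the end avoids small floating-point noise in the printed values.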

7. A computer-administered test consists of three components: a quantitative section (Q), a verbal
section (V), and an essay section (E). The computer randomly assigns the order of the components to
each test taker.

(a) Use a counting rule to determine the total number of possible orders of test components. Show
your calculations. (3 pts)

(b) List all the possible orderings of the test components using the letters Q, V, and E. (3 pts)

(c) Calculate the probability that a particular test taker gets the quantitative section first or the verbal
section last. (3 pts)
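
The counting rule, the list of orderings, and the "or" probability can all be checked by enumeration. The sketch assumes every ordering is equally likely, as implied by the random assignment.

```python
import math
from itertools import permutations

components = ["Q", "V", "E"]

# Counting rule: 3 components can be ordered in 3! ways.
n_orders = math.factorial(len(components))

# List every ordering explicitly, e.g. "QVE".
orders = ["".join(p) for p in permutations(components)]

# Event: quantitative section first OR verbal section last.
# Counting orderings directly avoids double-counting the overlap.
favorable = [o for o in orders if o[0] == "Q" or o[-1] == "V"]
probability = len(favorable) / len(orders)

print(n_orders, orders)
print(favorable, probability)
```

The same result follows from the addition rule: P(Q first) + P(V last) − P(Q first and V last).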

