
STA 100: Midterm II Practice Questions

For all questions you must show your working. This enables us to understand your
thought process, give partial credit and prevent crude cheating.

Notes About These Practice Questions:


• These practice questions are not a practice midterm; that is, you would not be expected to
complete all of these questions in the time given for the midterm.

• Solutions to these questions will be posted after you have had sufficient time to practice them.

1. HDL Cholesterol, otherwise known as ‘Good’ Cholesterol, has received much attention in recent
years. Higher levels of HDL are considered to help reduce the risk of heart disease.
HDL Cholesterol levels are measured in milligrams per deciliter of blood (mg/dl). There are
variations in HDL levels between men and women. HDL levels in adult men in the general
population are normally distributed with mean 52 mg/dl, and standard deviation 8 mg/dl.
For women, HDL levels are normally distributed with mean 50 mg/dl and standard deviation
6 mg/dl. HDL levels for men and women are assumed to be independent.
(Information from the American Heart Association).
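
Normal probabilities such as those asked for below are usually read from a z-table. As a rough cross-check, the following is a minimal Python sketch of the normal CDF built on the standard library's `math.erf`. The function name `normal_cdf` and the example values (mean 0, standard deviation 1) are illustrative assumptions and are not taken from the question.

```python
import math

def normal_cdf(x, mean, sd):
    """Return P(X <= x) for X ~ N(mean, sd^2), using the error function."""
    z = (x - mean) / sd  # standardize to a z-score
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Illustrative check on a standard normal (values NOT taken from the question)
print(round(normal_cdf(1.96, 0.0, 1.0), 3))  # 0.975
```

An upper-tail probability P(X > x) is then `1 - normal_cdf(x, mean, sd)`.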

(a) Among adult males, an HDL level below 40 mg/dl indicates an increased risk of heart
disease. What proportion of men in the general population will be at increased
risk of heart disease?

(b) Among adult females, an HDL level below 50 mg/dl indicates an increased risk of heart
disease. What proportion of women in the general population will be at increased
risk of heart disease?

(c) Higher HDL levels are considered to be preventive of heart disease. An HDL level of
above 60 mg/dl, for both males and females, reduces your risk of heart disease. In
the general population, is there a higher proportion of men or of women with reduced
risk of heart disease? Show your working.

(d) You have two friends, Bob and Jane. What is the probability that Bob has a higher
level of HDL cholesterol than Jane?

2. The last decade has seen a large increase in the number of biotechnology companies. As a
venture capitalist with an interest in the industry, you decide to invest some money in some
biotech startups. The log rate of return on your investment after 5 years, for a randomly
chosen biotech startup, is normally distributed with mean 0.02 and standard deviation 0.10,
i.e., Y ∼ N(0.02, 0.10^2).

(Note: The log rate of return is defined as Y = log(X_5yr / X_0yr), where X_5yr is the value of
your investment after 5 years and X_0yr is the value of the initial investment. The important
thing to note is that if Y is bigger than zero then you made some money, and if Y is less than
zero then you lost money.)
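
The link between a log rate of return and an ordinary percentage return can be sketched in a few lines of Python. This assumes the natural logarithm, which matches the 12.75% figure quoted for Y = 0.12 in part (a); the function name `percent_return` is an illustrative choice.

```python
import math

def percent_return(log_return):
    """Convert a log rate of return Y = log(X_end / X_start) to a percentage return."""
    # X_end / X_start = exp(Y), so the fractional gain is exp(Y) - 1
    return (math.exp(log_return) - 1.0) * 100.0

print(round(percent_return(0.12), 2))  # 12.75
print(percent_return(0.0))             # 0.0 (break-even)
```

A negative Y gives a negative percentage, which corresponds to losing money.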

(a) If you randomly pick a startup and invest in it, what is P(Y > 0.12)?
(Note: Y = 0.12 means a 12.75% return on your investment. So P(Y > 0.12) is the
probability that you make more than a 12.75% return on your investment in 5 years.)

(b) What is the probability that you make money if you invest all of your money in a ran-
domly chosen startup? i.e., what is P(Y > 0)?

(c) If you invest half of your money in each of two randomly chosen startups, with log returns
Y_1 and Y_2, what is the probability that your total investment T_2 = (1/2) Y_1 + (1/2) Y_2
will make you money?

(d) By comparing your answers for (2b) and (2c), do you think it would be a good idea to
invest in 1 startup or many startups? (One / Many)

(e) In 2-3 sentences, explain the statistical reasoning to justify your answer.

3. You are working for an environmental conservation group, and are asked to investigate the
birth rate among Bonobos (an endangered species of chimpanzee) living in the jungle in
the Democratic Republic of the Congo (DRC). Your group has tagged 200 Bonobos, and over
the course of the last 12 years has recorded the number of Bonobo babies born each quarter
(Q1=Jan-Mar, Q2=Apr-Jun, Q3=Jul-Sep, Q4=Oct-Dec). So, you have a dataset that tells
you the number of Bonobo births every quarter since 1998:

Date Number.of.Births
"1998,Q1" 6
"1998,Q2" 2
"1998,Q3" 7
...etc...
"2010,Q4" 5

Since the number of births each quarter is ‘count’ data, you decide to use the Poisson distribution
that you learned in STA 100. Let X_i = the number of Bonobo births in quarter i. You
start by assuming that X_i ∼ Pois(λ), with each quarter assumed to be independent.
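
For reference, the Poisson probability mass function behind this model can be sketched in Python as follows. The function name `poisson_pmf` and the rate λ = 2.0 are hypothetical choices made only for illustration; they are not taken from the Bonobo data.

```python
import math

def poisson_pmf(k, lam):
    """Return P(X = k) for X ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 2.0  # hypothetical rate, chosen only for illustration
probs = [poisson_pmf(k, lam) for k in range(50)]

print(round(poisson_pmf(0, lam), 4))  # 0.1353
print(round(sum(probs), 6))           # 1.0 (probabilities sum to one)
```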

(a) Do you think your Poisson assumption is reasonable here? (Yes/No).
Justify your answer in 1-2 sentences.

(b) The average number of births per quarter in the 48 quarters of data you have is 5.3.
Provide an estimate of λ.

(c) Provide a 95% Confidence Interval for λ. Show your working.

(d) State whether your confidence interval is exact or approximate. If it is an approximation,
state which theorem justifies the approximation.

You remember that your Professor in STA 100 always emphasized that, in addition to
your analysis, you should also (i) plot the data, and (ii) check your assumptions. You
are currently in the jungle in the DRC without access to your trusted R Commander
software. Nevertheless, as a check...

(e) Write down the assumptions you have made about Bonobo births.

(f) If X_i ∼ Pois(5.3), what is the probability of seeing no births in a given quarter?

(g) Again, if X_i ∼ Pois(5.3) for i = 1, ..., 48, then in roughly how many of the 48 quarters
would you expect to see no births?

(h) In your real data, there were actually 7 quarters that had no Bonobo births. Do you
think your assumptions were valid? If so, justify. If not, state which you think were
violated, and why. Keep your answers brief.

4. Back in the lab, you and your colleagues are again interested in the cell mutation process.
Whether or not a given cell mutates is assumed to be independent of other cells, and each cell
is assumed to have an equal probability, p, of mutating. There are a fixed number of cells, n,
in each culture.

(a) Let X denote the number of cells in the culture that mutate. What is the distribution
of X?

(b) In the first experiment you conduct, there are 200 cells in the culture, and a total of 12
of them mutate. Write down n and an estimate of p.

(c) Provide a 95% Confidence Interval for p.

(d) What requirements must your values for n and your estimate of p from (4b) satisfy in
order for your confidence interval to be reliable?

(e) Your lab requests that you repeat the experiment another 99 times. Begrudgingly, you
do so. For each of the 100 total experiments you compute an estimate p̂ and a 95%
confidence interval for p. Assuming experimental conditions are exactly replicated, so
that n and p are the same each time, how many of your 100 confidence intervals do you
expect to contain the true value of p?

(f) Just after you finish your experiments, another rival laboratory reveals that they have
determined the true mutation probability, p. Upset that you were scooped, but still
statistically curious, you look at your 100 confidence intervals, and see that 93 of them
in fact contained the true value. Do you think you made any mistakes in your analysis,
or does this indicate you probably did a good job? Explain your answer.

5. Following on from your analysis of the costs of owning dogs and cats, you decide to broaden
your scope to include rabbits. The annual cost of owning a dog was normally distributed
with mean $100, and standard deviation $30, the annual cost of owning a cat was normally
distributed with mean $120 and standard deviation $40, and the annual cost of owning a
rabbit is normally distributed with mean $70 and standard deviation $10. Ownership costs
for dogs, cats and rabbits may be assumed to be independent.

(a) How much would you expect to pay per year, on average, if you owned a dog and a cat
and a rabbit?

(b) What is the standard deviation of the yearly cost of owning a dog and a cat and a rabbit?

(c) Now, suppose that costs of owning a dog, cat and rabbit are not independent. Would
your answers to (5a) and/or (5b) still hold? Justify your answers.

For the remaining parts of this question you can again assume that the costs
of owning dogs, cats and rabbits are independent.

(d) You check your pet fund bank account and see that you have $382. What is the probability
that you would be able to afford to pay for a dog and a cat and a rabbit for one year?

(e) You consider other options, such as owning two cats. If the costs of each cat are
independent, what are the mean, standard deviation and distribution of the yearly cost of
owning two randomly chosen cats?

(f) Another option you consider is owning a dog and two rabbits. What is the probability
that owning a dog and two rabbits is cheaper than owning two cats?

6. Blood pressure is usually described by two numbers: systolic and diastolic blood pressure.
Systolic blood pressure among the general population is normally distributed with mean 110
and standard deviation 18, and diastolic blood pressure is normally distributed with mean 65
and standard deviation 12.5. All measurements are in millimeters of mercury (mmHg).

Note: Later parts of this question (c)-(f) depend on answers to earlier parts (a)
and (b). Should you get any of the earlier parts wrong, you will still be given
FULL CREDIT if your answer to later questions would have been correct if your
answer to the earlier part had been correct. If you do not know the answer to
a question, you should make up an answer. You can then still receive full credit
when using that answer in later parts of the problem.

(a) What is the probability of a randomly chosen person having systolic blood pressure
above 140 mmHg?

(b) What is the probability of a randomly chosen person having diastolic blood pressure
above 90 mmHg?

(c) One particular type of hypertension (or high blood pressure) occurs when a person has a
systolic blood pressure above 140 mmHg and diastolic blood pressure above 90 mmHg.
Assuming diastolic and systolic blood pressure readings for the same person are inde-
pendent, what is the probability that a randomly chosen person has this particular form
of hypertension?

(d) A more general form of hypertension is diagnosed when a person has systolic blood pres-
sure above 140 mmHg or diastolic blood pressure above 90 mmHg. Using your answers
to (6a), (6b) and (6c), what is the probability that a randomly chosen person has this
type of hypertension?

(e) Unsurprisingly, systolic and diastolic blood pressure are not independent. In fact, if a
person has high systolic blood pressure, we usually expect them to have high diastolic
blood pressure. If you do not assume independence and instead use this new informa-
tion, would you expect the probability in (6c) to increase, decrease or stay the same?
Explain your answer.

(f) Using your answer to (6e), would the probability of having the more general form of
hypertension in (6d) increase, decrease, or stay the same? Compare your answer to your
answer for (6e), and comment on what you notice.
