# Biostatistics

RR calculation
                Disease +    Disease -
Exposed         A            B
Unexposed       C            D
o Relative risk is the risk of an outcome in the exposed divided by the
risk of that outcome in the unexposed
o RR = [A/(A+B)] / [C/(C+D)]
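A minimal Python sketch of the RR formula above. The function name and the counts are made up for illustration; A and B are assumed to be the exposed subjects with and without the disease, C and D the unexposed subjects with and without the disease.

```python
def relative_risk(a, b, c, d):
    """Relative risk from 2x2 counts (a, b: exposed; c, d: unexposed)."""
    risk_exposed = a / (a + b)      # risk of the outcome in the exposed
    risk_unexposed = c / (c + d)    # risk of the outcome in the unexposed
    return risk_exposed / risk_unexposed


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed get the disease.
print(relative_risk(20, 80, 10, 90))
```

With these counts the exposed group carries twice the risk of the unexposed group (RR of 2).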
Sensitivity
o Probability of a positive test result in a person with the disease
o High sensitivity is important for screening tests
o Given a test with a high sensitivity, a negative result would help
rule out a diagnosis
Correlation Coefficient
o Ranges from -1 to +1 and describes the strength and direction of
a linear association
o The closer the values are to the margins (-1, +1), the stronger
the association
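The notes do not name a specific coefficient. The sketch below assumes Pearson's r, the usual choice for a linear association, and uses made-up data to show the two extremes of the range.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: a perfectly increasing and a perfectly decreasing line.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))
```

The first call sits at the +1 margin and the second at the -1 margin; both are equally strong associations with opposite direction.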
Calculations using specificity
o Specificity is the number of true negatives divided by the total
number of subjects without the disease
o True negatives= specificity X number of patients without the
disease
o False positives= (1- specificity) X number of patients without the
disease
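The two specificity formulas above can be sketched as follows. The function name and the numbers (90% specificity, 200 people without the disease) are hypothetical.

```python
def split_disease_free(specificity, n_without_disease):
    """Return (true negatives, false positives) among people without the disease."""
    true_negatives = specificity * n_without_disease
    false_positives = (1 - specificity) * n_without_disease
    return true_negatives, false_positives


# Hypothetical test: 90% specificity applied to 200 people without the disease.
tn, fp = split_disease_free(0.90, 200)
print(round(tn), round(fp))
```

Here about 180 people are correctly labelled negative and about 20 are false positives; the two counts always add up to the number of people without the disease.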
Relationship between prevalence and incidence
o Incident cases- new cases diagnosed in a given period
o Prevalent cases- total number of cases both new and old
o Any treatment that prolongs survival but does not cure a disease
will increase prevalence (increases the number of living
individuals with the disease)
Hawthorne effect
o Tendency of study subjects to change their behavior as a result
of their awareness that they are being studied
Power
o Probability of seeing a difference when there is one
o Power = 1 - β (β = type II error rate)
Reliability of a test
o Reliable test is reproducible in that it gives similar results on
repeat measurements.
o Reliability is maximal when random error is minimal
Specificity
o Correctly identify individuals without the disease
o Should be high in a confirmatory test to decrease false positives

Risk calculation
o Probability of developing a disease over a certain period of time
o Divide the number of affected subjects by the total number of
subjects in the corresponding exposure group
Relative risk reduction
o Relative risk reduction = (absolute risk in control - absolute risk
in treatment) / absolute risk in control
Absolute risk reduction
o Absolute risk reduction = event rate in the control group - event
rate in the treatment group
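A short sketch tying together the risk, relative risk reduction and absolute risk reduction entries above. Function names and trial numbers are invented for illustration.

```python
def risk(events, total):
    """Risk = affected subjects / all subjects in that group."""
    return events / total


def absolute_risk_reduction(risk_control, risk_treatment):
    """ARR = event rate in control - event rate in treatment."""
    return risk_control - risk_treatment


def relative_risk_reduction(risk_control, risk_treatment):
    """RRR = ARR expressed as a fraction of the control risk."""
    return (risk_control - risk_treatment) / risk_control


# Hypothetical trial: 20/100 events on control, 15/100 on treatment.
control = risk(20, 100)
treatment = risk(15, 100)
print(round(absolute_risk_reduction(control, treatment), 3))
print(round(relative_risk_reduction(control, treatment), 3))
```

In this made-up trial the absolute risk reduction is 5 percentage points, which is a 25% reduction relative to the control risk.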
Health promotion
o Primary- preventing a disease process from becoming
established. Health promotion (exercise, don't smoke, lose
weight)
o Secondary- detecting a disease process before it causes
symptoms. Individual case finding (cervical cancer screening),
community screening (blood pressure screening at state fair)
o Tertiary- treating a disease to prevent progression/complications.
Disability limitation (blood sugar and blood pressure control in
diabetes), rehabilitation (physical therapy after stroke)
Mean calculation
o Average (mean) of a dataset of values is the sum of the values
divided by the total number of values
Cross sectional study
o Exposure and outcome are measured simultaneously at a
particular point in time
o In other studies, a certain time period separates the exposure
from the outcome
Case control study and odds ratio
o Case control study- people with the disease of interest (cases)
and people without the disease (controls) are compared with
respect to previous exposure to the variable being studied
o Measure of association is the exposure odds ratio
o Exposure odds ratio = odds of exposure in people with the disease
(cases) / odds of exposure in people without the disease (controls)
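The exposure odds ratio described above, as a minimal sketch. The function name, argument names and counts are hypothetical.

```python
def exposure_odds_ratio(cases_exposed, cases_unexposed,
                        controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
print(round(exposure_odds_ratio(40, 60, 20, 80), 2))
```

The result is algebraically the same as the cross-product (40 x 80) / (60 x 20), roughly 2.67 for these counts.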
Hepatitis graph interpretation
o Incidence is the number of new cases of a disease in a certain
population at risk over a given time period
o Prevalence is the total number of cases (new and existing) in the
population at a given point in time or over a given period of time
Median
o Median- value in the precise center of an ordered dataset
o Separates the upper half of the ordered data from the lower half
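A small sketch of finding the median by hand. Averaging the two middle values for an even-sized dataset is the usual convention and is an addition to what the notes state; the data are made up.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))      # odd number of values
print(median([4, 1, 3, 2]))   # even number of values
```

The data must be sorted first; the median of the unsorted list [7, 1, 5] is 5, not the value that happens to sit in the middle position.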
Odds ratio calculations
o Odds ratio- measure of association between an exposure and an
outcome
Probabilities
o If the events are independent, the probability that all events will
turn out the same is the product of the separate probabilities for
each event
o The probability of at least 1 event turning out different is given
as 1-P (all events being the same)
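The two probability rules above, sketched for a made-up scenario: five independent tests, each with a 95% chance of coming back negative in a healthy person. Function names are hypothetical.

```python
from math import prod


def prob_all(probabilities):
    """Probability that every independent event has the stated outcome."""
    return prod(probabilities)


def prob_at_least_one_different(probabilities):
    """Complement: at least one event does not have the stated outcome."""
    return 1 - prob_all(probabilities)


# Hypothetical: five independent tests, each negative with probability 0.95.
per_test = [0.95] * 5
print(round(prob_all(per_test), 3))
print(round(prob_at_least_one_different(per_test), 3))
```

Even though each test is unlikely to be falsely positive, the chance of at least one positive result across five tests is a little over 22%.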
Sample size
o A study's power increases as the sample size increases
o The larger the sample, the greater the ability of a study to detect
a difference when one truly exists
Meta analysis confidence interval interpretation
o Meta analysis groups results of several trials to increase
statistical power and provide an overall pooled effect estimate
o If a CI includes the null value (1 for a ratio such as RR or OR, 0
for a difference), then there is no statistically significant
difference between the two groups
Attributable risk percent
o Attributable risk percent (ARP) in the exposed represents the
excess risk in the exposed population that can be attributed to
the risk factor
o Derived from the relative risk formula:
ARPexposed = 100 X [(RR-1)/RR]
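A one-function sketch of the ARP formula above. The function name and the example relative risk are hypothetical.

```python
def attributable_risk_percent(relative_risk):
    """ARP in the exposed = 100 x (RR - 1) / RR."""
    return 100 * (relative_risk - 1) / relative_risk


# Hypothetical: exposure carries a relative risk of 4.
print(attributable_risk_percent(4))
```

With an RR of 4, 75% of the risk in the exposed group is attributable to the exposure; an RR of 1 gives 0%, since there is no excess risk.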
Effect modification
o Present when the effect of the main exposure on the outcome is
modified by the presence of another variable
o Not a bias
Latent period
o The concept of a latent period can be applied to both disease
pathogenesis and exposure to risk modifiers
o The initial steps in pathogenesis and/or exposure to a risk factor
sometimes occur years before clinical manifestations of a
disease are evident
o Additionally, exposure to risk modifiers may need to be
continuous over a certain period of time before influencing the
outcome
Measure of center, outliers
o Outlier- extreme and unusual observed value in a dataset
o Can affect measures of central tendency (especially the mean)
as well as measures of dispersion (standard deviation, variance)
o Medians and modes tend to be resistant to outliers
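To illustrate the effect of an outlier, the sketch below compares the three measures of center on a made-up dataset before and after adding one extreme value. It uses Python's standard statistics module; the helper name is hypothetical.

```python
from statistics import mean, median, mode


def central_tendency(values):
    """Return the mean, median and mode of a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}


# Made-up data: the second list is the first plus one extreme value.
baseline = [2, 3, 3, 3, 4, 5]
with_outlier = baseline + [100]

print(central_tendency(baseline))
print(central_tendency(with_outlier))
```

In this example the mean jumps from about 3.3 to about 17.1, while the median and the mode both stay at 3.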
Number needed to harm (NNH)
o NNH = 1/attributable risk
o Attributable risk = adverse event rate in treatment group -
adverse event rate in placebo group
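A minimal sketch of the NNH calculation, taking attributable risk as the adverse event rate on treatment minus the rate on placebo. Function name and rates are invented for illustration.

```python
def number_needed_to_harm(rate_treatment, rate_placebo):
    """NNH = 1 / (adverse event rate on treatment - rate on placebo)."""
    attributable_risk = rate_treatment - rate_placebo
    return 1 / attributable_risk


# Hypothetical: 8% adverse events on treatment versus 3% on placebo.
print(round(number_needed_to_harm(0.08, 0.03)))
```

With a 5 percentage point excess in adverse events, roughly 20 patients need to be treated for one additional patient to be harmed.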
Confounding
o Confounding bias occurs when the exposure-disease relationship
is muddied by the effect of an extraneous factor that has
correlations with both the exposure and the disease
o Confounding bias can result in the false association of an
exposure with a disease
Attrition bias
o In prospective studies, disproportionate loss to follow up between
the exposed and unexposed groups creates the potential for
attrition bias, which is a form of selection bias
o Investigators try to achieve high patient follow up rates in
prospective studies
Statistical power
o Power (1-beta) is the probability of rejecting the null hypothesis
when it is truly false
o It is typically set at 80% and depends on the sample size and
difference between outcomes
Positive predictive value
o Positive predictive value represents the probability of truly
having a disease given a positive test result.
o It increases with increasing disease prevalence and decreases
with decreasing disease prevalence
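The dependence of PPV on prevalence can be sketched with Bayes' rule. The formula in terms of sensitivity, specificity and prevalence is not written out in these notes and is supplied here as an assumption; the test characteristics are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical test (90% sensitive, 90% specific) at rising prevalence.
for prevalence in (0.01, 0.10, 0.50):
    print(prevalence, round(ppv(0.90, 0.90, prevalence), 2))
```

The same test yields a PPV of about 50% at 10% prevalence and about 90% at 50% prevalence, which is why a positive result means less in a low-prevalence population.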
Positively skewed distribution
o Mean > median > mode
P-value
o A result is statistically significant if the 95% confidence interval
does not cross the null value, which corresponds to a p-value <
0.05
Confidence interval
o The standard deviation reflects the spread of individual values in
a normal distribution
o The standard error of the mean reflects the variability of means
and helps estimate the true mean of the underlying population
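A sketch of how the standard error of the mean leads to a confidence interval. The formulas SEM = SD / sqrt(n) and mean ± 1.96 x SEM for an approximate 95% interval are standard but are not spelled out in these notes, so treat them as assumptions; the data are made up.

```python
from math import sqrt
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def mean_confidence_interval(values, z=1.96):
    """Approximate 95% CI for the mean (normal approximation, z = 1.96)."""
    center = mean(values)
    margin = z * standard_error(values)
    return center - margin, center + margin


# Made-up measurements.
sample = [4, 6, 8, 10, 12]
low, high = mean_confidence_interval(sample)
print(round(low, 2), round(high, 2))
```

The SD describes how spread out the individual values are, while the SEM shrinks as n grows, so larger samples give narrower intervals around the mean. For very small samples a t-based multiplier would normally replace 1.96.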
Positive predictive value calculation
o Answers the question: if the test result is positive, what is the
probability that a patient has the disease?
o PPV = true positives / (true positives + false positives)
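The PPV formula above applied to raw counts, as a minimal sketch with hypothetical numbers.

```python
def ppv_from_counts(true_positives, false_positives):
    """PPV = true positives / all positive test results."""
    return true_positives / (true_positives + false_positives)


# Hypothetical: 120 positive results, of which 90 are true positives.
print(ppv_from_counts(90, 30))
```

Here 75% of the patients with a positive result actually have the disease.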
Ecological study
o Unit of study is populations (not individuals)
Sensitivity
o Sensitivity = true positives / (true positives + false negatives)
o Screening tests should have a high sensitivity
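The sensitivity formula above, sketched with invented counts. The false negative rate shown alongside it is simply 1 - sensitivity.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity = true positives / all people with the disease."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical: 100 people with the disease, 95 of whom test positive.
sens = sensitivity(95, 5)
print(sens, round(1 - sens, 2))
```

A screening test like this one misses only 5% of diseased people, which is why a negative result is reassuring.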

Attack rate
o Attack rate = number of people who contracted an illness /
number of people who are at risk of contracting that illness
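The attack rate formula above as a short sketch. The outbreak numbers are hypothetical.

```python
def attack_rate(n_ill, n_at_risk):
    """Attack rate = people who became ill / people at risk of the illness."""
    return n_ill / n_at_risk


# Hypothetical outbreak: 30 of 120 people who ate a dish became ill.
print(attack_rate(30, 120))
```

The denominator counts only those at risk (for example, people who ate the dish), not everyone present.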
Incidence and prevalence
o An increasing prevalence and stable incidence can be attributed
to factors which prolong the duration of a disease
o Example- improved quality of care
Observer bias
o Occurs when the investigator's decision is affected by prior
knowledge of the exposure status
Recall bias
o Results from inaccurate recall of past exposure by people in the
study and applies mostly to retrospective studies such as case
control studies
o People who have suffered an adverse event are more likely to
recall risk factors than those without adverse experiences
o Recall bias is a threat to the validity of a study
Matching
o Used in case control studies in order to control confounding
o Matching variables should always be the potential confounders of
the study (age, race)
o Cases and controls are then selected based on the matching
variables so that both groups have similar distribution in
accordance with the variables
Normal distribution
o In a normal (bell shaped) distribution curve, 68% of observations
lie within 1 standard deviation of the mean, 95% of observations
lie within 2 standard deviations of the mean, and 99.7% of
observations lie within 3 standard deviations of the mean
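The 68/95/99.7 figures above can be checked numerically. The sketch uses the identity P(|Z| < k) = erf(k / sqrt(2)) for a standard normal variable, which is a standard result and not part of these notes; the function name is hypothetical.

```python
from math import erf, sqrt


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(fraction_within(k) * 100, 1))
```

The exact values are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68, 95 and 99.7.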
Case control study
o Selection of control subjects in case control studies is intended to
provide an accurate estimation of exposure frequency among
the non-diseased general population
o Cases and controls are often matched to decrease confounding
o Matching must be carefully performed so as to not introduce
selection bias