Portfolio Part 1

Pauravi Shippen-How
Public Health Statistics

Portfolio

Part 1:

Definitions-

Sample vs. population
-Population-includes all the possible people with the characteristic to be studied
Ex: All the students attending Guilford College
-Sample-a subgroup of the population with the characteristic to be studied
Ex: The freshman students attending Guilford College

Different types of sampling
-Simple random sample- To take away common bias when choosing people for a study,
this method gives each person in the entire population an identical chance of being
selected.
Ex: Place all freshman student ID numbers in a computer database similar to a
lottery and then randomly select 50 ID numbers of students who would then
participate in the study.
-Stratified Random Sampling-Divides the population into sections or levels (strata) created
along lines of impact (race, gender, age, etc.), then randomly selects participants separately
from each one. (Ensures every subgroup is represented, which a simple random sample does
not guarantee.)
Ex: Divide Guilford College freshmen by their sex (male, female, etc.). Then pull
at random a sample from each of those groups.
-Random Cluster Sampling-Separates the population into already naturally occurring
subgroups, then randomly selects a few of those subgroups to study. All of the participants
belonging to the chosen subgroups will participate in the study.
Ex: At Guilford College all entering freshmen belong to an FYE homeroom. If a
researcher is interested in studying the average alcohol consumption of freshmen,
she would randomly select a few (a specific number) of these FYE homerooms to study.
Every student in the selected FYE homerooms would then participate in the study.
-Convenience Sample- The most biased method of creating a sample for study. This
method relies on who happens to be accessible and willing to participate in the study.
The data yielded do not necessarily represent the population of study.
Ex: The researcher is sitting outside the Guilford College campus cafeteria
asking students to fill out surveys on alcohol consumption. The students who
are not normally around the cafeteria or who refuse to complete the survey will not be
considered.
-Sample Bias- Occurs whenever some members of the population have a greater chance of
participating in a study than others. All methods of collecting a sample have a risk
of bias.
Ex: A researcher selects five students from the Guilford College cafeteria to
represent the entire student body. Not all students have a meal plan or eat in the
cafeteria.
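
The random sampling methods above can be compared side by side in a short Python sketch. Everything in it is an assumption made for illustration: the population of 200 made-up ID numbers, the two sex categories, the ten homerooms, and the sample sizes are not taken from any real Guilford College data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 200 freshman IDs, 100 per sex, 20 per FYE homeroom.
population = [
    {"id": i,
     "sex": "female" if (i // 10) % 2 == 0 else "male",
     "homeroom": i % 10}
    for i in range(200)
]

# Simple random sample: every student has the same chance of selection.
simple_sample = random.sample(population, 50)

# Stratified random sample: draw separately from each stratum (here, sex).
stratified_sample = []
for sex in ("male", "female"):
    stratum = [s for s in population if s["sex"] == sex]
    stratified_sample.extend(random.sample(stratum, 25))

# Random cluster sample: pick whole homerooms, then keep everyone in them.
chosen_homerooms = random.sample(range(10), 3)
cluster_sample = [s for s in population if s["homeroom"] in chosen_homerooms]

print(len(simple_sample), len(stratified_sample), len(cluster_sample))
```

Note the difference in what is randomized: individuals in the simple random sample, individuals within each group in the stratified sample, and whole groups in the cluster sample.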

Independent vs. dependent variable
-Independent- The variable you have control over to manipulate or change. Changes in it are
expected to affect the dependent variable. Also known as the explanatory or predictor variable.
Ex: At Guilford College, a researcher wants to test the theory that vegetable and
fruit consumption can decrease the number of classes students miss. The
independent variable is the amount of fruits and vegetables that the students
consume.
-Dependent- The variable you measure in a study, the response. Dependent variable
responds to the independent variable.
Ex: At Guilford College, a researcher wants to test the theory that vegetable and
fruit consumption can decrease the number of classes students miss. The
dependent variable is the number of classes missed.

Levels of measurement
-Nominal-Categorical. No hierarchy or ranking.
Ex: The freshmen at Guilford College were asked to name their favorite movie
genre.
-Ordinal-Ordered categories, in a hierarchy. May consist of ranges. Not equally spaced
intervals.
Ex: The researcher asks freshmen at Guilford College to report on the amount of
alcohol they consume per week. A) 0-3, B) 4-6, C) 7-10, D) 11 or more
-Interval-Same distance between the values. No real 0.
Ex: The researcher asks freshmen at Guilford College to report the temperature of
their dorm room in degrees Fahrenheit. (0 degrees does not mean no temperature.)
-Ratio- Same distance between the numbers. True 0.
Ex: The researcher asks freshmen at Guilford College to report on the amount of
alcohol they consume per week. Values can range from 0 upward.

Statistical distributions (negative skew; positive skew; normal; bimodal) VISUAL
Normal-When the peak of the curve is in the middle of the graph, with equal tails on both sides. The
mean, median, and mode are the same or very similar.

Ex: [Figure: normal curve on a number line from 1 to 13, with the mean, median, and mode marked together at the center]
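Positive skew- When the peak of the curve is on the left side, with the tail streaming to the right. The
mean is larger than the median.

Ex: [Figure: positively skewed curve on a number line, with the mode, median, and mean marked]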

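Negative Skew-When the peak of the distribution curve is on the right side, with the tail streaming to
the left. The mean is smaller than the median.

Ex: [Figure: negatively skewed curve on a number line, with the mean, median, and mode marked]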

Bimodal-When there are two peaks within the data's range. The peaks do not have to be
equal in height; the distribution contains two modes.

Ex: [Figure: bimodal curve on a number line, with the two modes, the median, and the mean marked]

Measures of central tendency- Measurements used to help describe data and its distribution.
Mean-Average of a data set. Use only for interval and ratio data. It is calculated by adding all
the individual values, then dividing by the number of values.
Ex: Five students at Guilford College were surveyed on their weekly consumption
of alcoholic servings. Results: 0, 2, 4, 10, 4. To calculate: 0+2+4+10+4 = 20, and 20/5 = 4
(Mean)
Median-After lining up the individual values from lowest to highest, the researcher
identifies the middle number. If there is an even number of values (resulting in two numbers
in the middle), take the mean of the two middle values as the median. Can be used
for ordinal, interval, and ratio data.
Ex: 0, 2, 4, 4, 10 Median = 4
Mode-Category or number that appears most often. Can be used for nominal, ordinal, interval,
and ratio data.
Ex: Mode = 4
In a normal distribution: The mean, median, and mode are all the same
In a positively skewed distribution: Mean > median
In a negatively skewed distribution: Median > mean
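
The three measures above can be checked with a short Python sketch. It uses the standard-library statistics module on the five survey values from the example. The second list, skewed_right, is a made-up data set (not from the survey), added only to illustrate the mean > median rule for a positive skew.

```python
import statistics

# Weekly alcoholic servings reported by the five students in the example.
servings = [0, 2, 4, 10, 4]

mean_value = statistics.mean(servings)      # sum of values / number of values
median_value = statistics.median(servings)  # middle value after sorting
mode_value = statistics.mode(servings)      # most frequent value

print(mean_value, median_value, mode_value)

# Hypothetical right-skewed data (an assumption for illustration only).
skewed_right = [0, 1, 1, 2, 10]
print(statistics.mean(skewed_right) > statistics.median(skewed_right))
```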

Percentages vs. proportions (how to calculate each)
Percentage: Says how many out of 100. Helpful when comparing two or more large
groups. To calculate, divide the part (the smaller number) by the total (the larger number),
then multiply by 100.
Ex: If 146 of the Guilford freshman sample group (N=400) reported that they do not drink
alcohol, then the percent (%) would be 146/400 = .365, and .365 x 100 = 36.5% of the freshman
sample group do not drink.
Proportion: The number as a part of one. To calculate, divide (same directions as obtaining
a percentage), but do NOT multiply by 100 at the end. The result is most likely a
decimal.
Ex: If 146 of the Guilford freshman sample group (N=400) reported that they do not drink
alcohol, then the proportion would be 146/400 = .365 (three hundred sixty-five thousandths)
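
The two calculations differ only in the final multiplication, which a minimal Python sketch makes visible. The variable names are illustrative; the numbers are the ones from the example.

```python
# Figures from the example: 146 of 400 sampled freshmen report not drinking.
non_drinkers = 146
sample_size = 400

proportion = non_drinkers / sample_size   # part of one
percentage = proportion * 100             # out of 100

print(f"proportion = {proportion:.3f}")   # prints "proportion = 0.365"
print(f"percentage = {percentage:.1f}%")  # prints "percentage = 36.5%"
```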

Variation (range; standard deviation) VISUAL
Range: The difference between the highest number and the lowest (largest number-
smallest number = Range). Sensitive to outliers.

Ex: [Figure: data on a number line from 1 to 13, with Low=3 marked]

The range is calculated as follows: 9 - 3 = 6
Standard Deviation: A number that indicates how much the values in a data set typically differ from
the mean of the set. Describes a single data set. Also known as: spread or
dispersion. Most used measure of variability. In a normal distribution, about 68% of the data
fall within 1SD of the mean, 95% within 2SD, and 99.7% within 3SD.

Ex: Guilford College students were asked to report how many times they
ate in the cafeteria per week, between 0 and 11. The resulting graph shows the distribution
of responses. The mean of this sample was 7. The standard deviation is about 2. It can
be concluded that about 95% of the data fall within 2 standard deviations of the mean: 7 ± 2(2), or between 3 and 11.
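
As a rough check on the range and standard deviation ideas, here is a minimal Python sketch. The visits list is hypothetical data, chosen so that the mean is 7 and the standard deviation is 2 as in the cafeteria example; the original survey responses are not available. It uses statistics.pstdev, which divides by n; statistics.stdev, which divides by n - 1, would be the choice if the data are treated as a sample.

```python
import statistics

# Hypothetical cafeteria visits per week (assumed data: mean = 7, SD = 2).
visits = [3, 5, 7, 7, 7, 7, 7, 7, 9, 11]

data_range = max(visits) - min(visits)   # highest value minus lowest value
mean_visits = statistics.mean(visits)
sd_visits = statistics.pstdev(visits)    # population standard deviation

# About 95% of values in a normal distribution lie within 2 SD of the mean.
low = mean_visits - 2 * sd_visits
high = mean_visits + 2 * sd_visits

print(data_range, mean_visits, sd_visits, low, high)
```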

Standard error (SE)- The margin of error for a sample mean: the standard deviation of the means of
repeated samples. It indicates how well a sample mean can be generalized to a wider group. The
larger the sample, the smaller the SE of the mean. The less variability in a population, the
smaller the SE of the mean. To calculate, divide the standard deviation by the square root of the
sample size: SE = SD / √n.
When sample means are normally distributed, about 68% fall within 1SE of the population mean,
95% within 2SE, and 99.7% within 3SE.
Ex: Given the standard deviation is 4, and the sample size is 100: SE = 4 / √100 = 4 / 10 (the
square root of 100 is 10). It can be determined that the standard error is 0.4.
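
The standard error calculation in the example can be sketched in Python as follows. The function name standard_error is illustrative, and the second call (n = 400) is an added assumption used only to show that a larger sample gives a smaller SE.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

se = standard_error(4, 100)
print(se)  # 0.4

# Hypothetical larger sample with the same SD: the SE shrinks.
se_larger_sample = standard_error(4, 400)
print(se_larger_sample < se)
```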

Confidence intervals- (SD and SE) A range used to express the level of confidence a
researcher has when using a statistic from a sample to estimate a parameter of the
population, when the data is normally distributed. To calculate a 68% CI, the statistician would
subtract 1SD from the data mean (or 1SE from the sample mean, when estimating the population
mean) to get the low number of the confidence interval, and add 1SD (or 1SE) to the mean to get
the high number of the CI. A CI for the values in a sample follows 1SD=68%, 2SD=95%, and 3SD=99.7%.
The same rule applies to means, using SE in place of SD.
Ex: Given the mean of the data is 20, and the standard deviation is 2. The 68%
confidence interval for 1SD would be 18-22 (20 - 2 = 18 and 20 + 2 = 22).
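
The subtract-and-add rule above can be sketched in Python. The helper name interval is hypothetical. The first call uses the numbers from the example; the second uses assumed values (sample mean 20, SD 4, n = 100) to show the same rule applied with SE instead of SD.

```python
import math

def interval(center: float, spread: float, k: int = 1) -> tuple:
    """Return (low, high) = center minus and plus k times the spread."""
    return (center - k * spread, center + k * spread)

# Example from the text: mean 20, SD 2, one SD on either side.
print(interval(20, 2, k=1))  # (18, 22)

# Hypothetical case for a mean: sample mean 20, SD 4, n = 100, so SE = 0.4.
se = 4 / math.sqrt(100)
low, high = interval(20, se, k=2)  # roughly a 95% interval for the mean
print(round(low, 1), round(high, 1))
```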

(ALL images by Pauravi Shippen-How)

[Figure: number line from 0 to 11 with the mean marked; High=11; Mean ± 2SD = 7 ± 2(2) = 3, 11]

Primary Learning Strategy:
My primary learning strategy is visual. I chose to draw the statistical distributions, including the normal curve,
positive skew, negative skew, and bimodal. Additionally, I included a visual for range and standard
deviation. I chose to represent the statistical distributions visually because I can better identify not
only the shape of each distribution according to the data given, but also the relationship of the mean,
median, and mode depending on the shape of the curve. Moreover, the range and standard deviation
were initially difficult for me to understand in terms of their differences. However, physically
drawing a number line, and seeing where each deviation falls on the line, helped me to better
distinguish between the highest and lowest values in the range. Additionally, the differences between the
confidence interval and the standard deviation or standard error are more apparent on a number line when
distinguishing between 68%, 95%, and 99.7%.
