SPH 245
Homework 1
PROBLEM 1:
(a) Normality: After combining the three factors into a single treatment variable (with COMPRESS)
and sorting the data by treatment, I used proc univariate to test for normality. The tests of
normality (specifically Shapiro-Wilk) show that none of the treatment combinations violates the
normality assumption, since all p-values are non-significant (p>0.05). We therefore cannot reject
the null hypothesis that the data are normally distributed at the 0.05 level, and we proceed under
the assumption of normality. Appendix A at the end of this report shows the p-values associated
with each treatment combination (drug, exercise, and high-fat diet).
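The normality check described above does not appear in the code listing at the end of this report. A minimal sketch of how it could be run, assuming the data set CHOLESTEROL_HMK1 already contains the combined TREATMENTS variable and the response CHOLESTEROL (names taken from the listing):

```sas
* Sort by the combined treatment variable so that BY-group processing works;
PROC SORT DATA=CHOLESTEROL_HMK1;
BY TREATMENTS;
RUN;

* Show only the table of normality tests for each treatment combination;
ODS SELECT TestsForNormality;
* The NORMAL option requests the tests for normality, including Shapiro-Wilk;
PROC UNIVARIATE DATA=CHOLESTEROL_HMK1 NORMAL;
BY TREATMENTS;
VAR CHOLESTEROL;
RUN;
```

Each BY group produces its own table, which would correspond to the 12 tables in Appendix A.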
Equal Variances: Because the data appear normally distributed, I used the Bartlett test (below), a
parametric test, to check for homogeneity of variances. The Bartlett test also gives a p-value
> 0.05 (p=0.9045), so we cannot reject the null hypothesis of equal variances and can assume that
the variances are equal.
***Because the treatments were randomized and we can assume both normality and equal
variances, it DOES appear appropriate to use ANOVA for data analysis.
Alternative Hypotheses:
1. HA: There is an interaction effect between exercise and drug on cholesterol levels.
2. HA: There is an interaction effect between diet and drug on cholesterol levels.
3. HA: There is an interaction effect between exercise and diet on cholesterol levels.
4. HA: Controlling for exercise and drug assignment, there is a difference in mean cholesterol
between patients randomly assigned to a high-fat diet or not assigned to a high-fat diet.
5. HA: Controlling for diet and drug assignment, there is a difference in mean cholesterol
between patients randomly assigned to exercise or not assigned to exercise.
6. HA: Controlling for diet and exercise assignment, there is a difference in mean cholesterol
among patients randomly assigned to the different drug treatments.
Madeline Lasell
SPH 245
Homework 1
Our factorial ANOVA (GLM) output indicates an overall p-value of less than 0.05 (p<0.0001), so
we reject the null hypothesis that mean cholesterol levels are the same across all treatment
combinations.
The following table shows that the diet, exercise, and drug factors each have significant
p-values. This indicates that each appears to have an individual main effect when controlling for
the others. In contrast, the interactions are not significant (p-values greater than 0.05).
Results:
At a significance level of 0.05:
1) The interactions between diet and exercise (p=0.6649), diet and drug (p=0.0770), and
exercise and drug (p=0.4690) are not significant. This means we have no evidence that:
i. the effect of diet on cholesterol levels depends on exercise or the drug
ii. the effect of exercise on cholesterol levels depends on diet or the drug
iii. the effect of the drug on cholesterol levels depends on diet or exercise
2) Diet treatment had a significant main effect on reducing mean cholesterol levels
(p<0.0001).
3) Exercise treatment had a significant main effect on reducing mean cholesterol levels
(p=0.0009).
4) Drug treatment had a significant main effect on reducing mean cholesterol levels
(p<0.0001).
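The factorial model behind these results does not appear in the code listing at the end of this report. A minimal sketch of a PROC GLM call that fits the three main effects and the three two-way interactions is given here. The factor names DRUG, EXERCISE, and DIET are assumptions and must match the column names in the data set:

```sas
* Factorial ANOVA with three main effects and all two-way interactions;
* The variable names below are assumed and may differ in the actual data;
PROC GLM DATA=CHOLESTEROL_HMK1;
CLASS DRUG EXERCISE DIET;
MODEL CHOLESTEROL = DRUG EXERCISE DIET
                    DRUG*EXERCISE DRUG*DIET EXERCISE*DIET;
RUN;
QUIT;
```

The Type III sums of squares in the output give the test of each term adjusted for the others, which is the sense of "controlling for" used in the hypotheses.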
PROBLEM 2:
(a) Normality: After combining age and process into a single variable (with COMPRESS) and sorting
the data by that variable, I again used proc univariate to test for normality. The normality test
(Shapiro-Wilk) showed that one of the age-process combinations (Older, Imagery) violates the
normality assumption, with a significant p-value (p=0.0251). Therefore, for that group we reject
the null hypothesis that the data are normal at the 0.05 level, and we cannot assume that this
data set is normally distributed. Appendix B at the end of this report shows the p-values
associated with each combination of age (younger or older) and memorization process (5 different
processes).
Equal Variances: If we look at the Brown and Forsythe test (a test of equal variances that is
robust to departures from normality, and so appropriate for this non-normal dataset), we can see
that the p-value is greater than 0.05 (p=0.9045). This means that we don't have significant
evidence to reject the null hypothesis that the variances are homogeneous, or equal. So, we can
assume that the data maintains equal variances.
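The Brown and Forsythe test for this problem does not appear in the code listing at the end of this report. A minimal sketch, assuming MEMORY_HMK1 already contains the combined TECHNIQUES variable created in step 2A of the listing. A one-way model is used because PROC GLM only offers HOVTEST= for one-way models:

```sas
* One-way model on the combined age-process groups;
PROC GLM DATA=MEMORY_HMK1;
CLASS TECHNIQUES;
MODEL WORDS = TECHNIQUES;
* HOVTEST=BF requests the Brown and Forsythe test of equal variances;
MEANS TECHNIQUES / HOVTEST=BF;
RUN;
QUIT;
```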
***However, although the data have equal variances, they do not show normality. Therefore, we
can only use a 2-way ANOVA if we take the square root of the response variable (words) in
order to correct for the non-normality.
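One way to check whether the square-root transformation helps is to rerun the normality tests on the transformed response. This is a sketch only, assuming the variables SQRTWORDS and TECHNIQUES have been created as in step 2A of the code listing. The report does not state that this recheck was performed:

```sas
* Sort by the combined age-process variable for BY-group processing;
PROC SORT DATA=MEMORY_HMK1;
BY TECHNIQUES;
RUN;

* Shapiro-Wilk and the other normality tests on the transformed response;
PROC UNIVARIATE DATA=MEMORY_HMK1 NORMAL;
BY TECHNIQUES;
VAR SQRTWORDS;
RUN;
```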
Alternative Hypotheses:
• HA: Controlling for age, process type has an effect on the mean number of words
memorized.
• HA: Controlling for process, age has an effect on the mean number of words memorized.
• HA: There is an interaction effect between age and process on the mean number of words
memorized.
Our 2-way ANOVA (GLM) output indicates an overall p-value below 0.05 (p<0.0001), so we
reject the null hypothesis that the mean number of words memorized is the same across all
combinations of age and process.
The following table shows that the age and process factors each have significant p-values. This
indicates that both appear to have an individual main effect when controlling for the other. In
addition, the interaction between them is significant (p=0.0014).
Results:
So, at a significance level of 0.05:
1) The interaction between age and process has a significant effect on the mean number of words
remembered (p=0.0014)
2) Age had a significant main effect on the mean number of words remembered
(p<0.0001)
3) Processes had a significant main effect on the mean number of words remembered
(p<0.0001)
a. Because there are 5 processes, we compare each pair of processes to see which ones differ
from one another. According to Tukey's post-hoc test, the following pairs of processes were
significantly different from one another:
1. Adjective, Counting
2. Adjective, Rhyming
3. Counting, Imagery
4. Counting, Intentional
5. Imagery, Rhyming
6. Intentional, Rhyming
***In conclusion, we reject all 3 null hypotheses because the p-values are significant.
FULL CODE:
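*PROBLEM 1;
FILENAME REFFILE '/folders/myshortcuts/SAS_Scripts/Cholesterol.csv';
* Read the CSV into a data set. This import step is an assumed addition because the
  original listing never creates CHOLESTEROL_HMK1 before using it;
PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=CHOLESTEROL_HMK1 REPLACE;
GETNAMES=YES;
RUN;
*1A: combine the three factors into one treatment label;
DATA CHOLESTEROL_HMK1;
SET CHOLESTEROL_HMK1;
* EXERCISE is assumed to be the intended spelling of EXERCIES and must match the CSV header;
TREATMENTS = COMPRESS(DRUG||EXERCISE||DIET);
RUN;
*1B: one-way model on the combined treatments, used for the equal-variance tests;
PROC GLM DATA=CHOLESTEROL_HMK1;
CLASS TREATMENTS;
MODEL CHOLESTEROL=TREATMENTS;
* Request each homogeneity-of-variance test in its own MEANS statement;
MEANS TREATMENTS / HOVTEST=BARTLETT;
MEANS TREATMENTS / HOVTEST=BF;
RUN;
QUIT;
*PROBLEM 2;
FILENAME REFFILE '/folders/myshortcuts/SAS_Scripts/MemoryA.csv';
* Read the CSV into a data set. As above, this import step is an assumed addition;
PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=MEMORY_HMK1 REPLACE;
GETNAMES=YES;
RUN;
*2A: combine age and process into one label and take the square root of the response;
DATA MEMORY_HMK1;
SET MEMORY_HMK1;
TECHNIQUES = COMPRESS(AGE||PROCESS);
SQRTWORDS=SQRT(WORDS);
RUN;
*2B: two-way ANOVA on the transformed response with Tukey-adjusted comparisons;
PROC GLM DATA=MEMORY_HMK1;
CLASS AGE PROCESS;
MODEL SQRTWORDS=AGE PROCESS AGE*PROCESS;
LSMEANS AGE PROCESS AGE*PROCESS / CL ADJUST=TUKEY;
RUN;
QUIT;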
APPENDIX A: 12 tables sorted in order by drug treatment type (A, B, C), exercise (yes, no), diet (yes, no)
APPENDIX B: 10 tables sorted by age (young, older), process (counting, rhyming, adjective, imagery,
intentional)
Older, Adjective
Older, Counting
Older, Intentional
Older, Rhyming
Younger, Adjective
Younger, Counting
Younger, Imagery
Younger, Intentional
Younger, Rhyming