
The Influencing Factors of High School Students’ Writing Scores

March 2010

Abstract: Two multivariate models are used to determine how high school students’ writing
scores are affected by independent variables, which are recorded along with individuals’
scores. There are 200 observed high school students with their details about race, socio-
economic status, type of school and program they attend, and four other test scores in
reading, math, science, and social studies subjects. The following explanation will show a
linear regression model and a logit model, respectively. The dataset that is provided,
however, shows an imbalance in the number of observations in race and types of school. This
could lead to the misinterpretation of the data.

Introduction

The unique identification number is an arbitrary number assigned to each student. It runs from
1 to 200 and should have no influence on individuals’ writing scores.

Gender is sometimes considered an important factor that affects writing scores. Males and
females have different physical bodies naturally. Female students tend to get higher scores
than male students do.

Race is a categorical variable that may also influence students’ scores. The observations fall
into four groups: Hispanic, Asian, African-American, and Caucasian. Only a small share of the
students surveyed are Asian, compared to the other races.

Socio-economic status is subdivided into three levels: low, medium, and high. The dataset
does not explain how students were assigned to each level.

Type of school is divided into two groups, private school and public school.

Study programs consist of general, vocational, and academic school programs.

Test scores that are recorded are from reading, math, science, and social studies subjects.

Overview

The dataset contains 200 observations of writing scores and is analyzed with two multivariate
models. The first is a linear regression model that relates writing scores to gender, race,
socio-economic status, type of school, type of program, and the other subject scores each
student has obtained, as shown in Equation 1.
The Influencing Factors of High School Students’ Writing Scores Panichpol

Where:
WRITING Respondent’s writing scores
FEMALE Dichotomous variable (1 = female, 0 = male)
READING Respondent’s reading scores
MATH Respondent’s math scores
SCIENCE Respondent’s science scores
SOCST Respondent’s social studies scores
WHITE Dichotomous variable (1 = white, 0 = not white)
LOWSES Dichotomous variable (1 = low socio-economic status, 0 = not low)
MIDSES Dichotomous variable (1 = medium socio-economic status, 0 = not
medium)
PRIVATE Dichotomous variable (1 = private school, 0 = not private school)
ACAD Dichotomous variable (1 = academic school program, 0 = not
academic)
GENERAL Dichotomous variable (1 = general school program, 0 = not general)
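
The variable definitions above suggest the shape of Equation 1. The following is only a sketch of a standard linear specification with one coefficient per listed variable; the coefficient symbols and the error term are assumed notation, not taken from the paper.

```latex
\begin{equation}
\begin{aligned}
\text{WRITING}_i = {} & \beta_0 + \beta_1\,\text{FEMALE}_i + \beta_2\,\text{READING}_i
  + \beta_3\,\text{MATH}_i + \beta_4\,\text{SCIENCE}_i \\
& + \beta_5\,\text{SOCST}_i + \beta_6\,\text{WHITE}_i + \beta_7\,\text{LOWSES}_i
  + \beta_8\,\text{MIDSES}_i \\
& + \beta_9\,\text{PRIVATE}_i + \beta_{10}\,\text{ACAD}_i
  + \beta_{11}\,\text{GENERAL}_i + \varepsilon_i
\end{aligned}
\end{equation}
```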

Data: Definitions and Limitations

The 200 high school students in the sample have writing scores ranging from 31 to 67. There
are 91 male students (about 45%) and 109 female students. The minimum score is 31 among male
students and 35 among female students. The majority of students are Caucasian: 145 in total,
of whom 68 are male and 77 are female. 24 students are Hispanic and 20 are African-
American. Only 11 are Asian; however, their minimum score is 44, which is about 10 points
higher than the minimum scores of the other races. Almost half of all observations are from
middle socio-economic status, while the low and high groups together make up the other half.
168 students, 84% of observations, study at public schools. The average writing score is about
53 points and the standard deviation is about 9.5 (Table 1). A high writing score is defined as
52 points or above. There are 126 students, 80 female and 46 male, who obtain high scores.
While 73% of female students obtain high scores, only about half of male students do. Among
Asian and Caucasian students, 82% and 70%, respectively, receive high scores on writing. 60%
of public school students and 78% of private school students obtain high scores as well.
Table 1: Characteristics of the Factors Affecting Writing Scores
Writing Scores
Mean 52.78
Standard Deviation 9.48
Variance 89.84
Maximum 67.00
Minimum 31.00

Bivariate Analyses

To explain the relationships between pairs of variables, we consider the direction and
magnitude of their correlation coefficients (Table 2). Writing scores and reading scores have
a positive correlation of 0.60, which means that students with higher reading scores tend to
have higher writing scores. Writing scores are also positively correlated with science scores
(0.57). A correlation coefficient measures the strength of a linear association on a scale
from -1 to 1; it is not the number of points by which writing scores change for a one-point
change in the other score. Surprisingly, identification number and writing scores have a
positive correlation of 0.19, although an arbitrarily assigned identification number should
not influence students’ scores. No pair of variables has a negative correlation.
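
To illustrate the difference between a correlation and a per-point effect, here is a minimal sketch using only the Python standard library. The two score lists are hypothetical and are not taken from the dataset; the function names are likewise illustrative.

```python
from math import sqrt
from statistics import stdev

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (points of y per point of x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical scores, NOT from the paper's dataset.
reading = [40, 45, 50, 55, 60, 65]
writing = [44, 52, 47, 58, 51, 60]

r = pearson_r(reading, writing)
slope = ols_slope(reading, writing)
sd_ratio = stdev(writing) / stdev(reading)

# The slope equals r rescaled by the ratio of standard deviations,
# so r and the slope coincide only when the two spreads are equal.
print(f"r = {r:.3f}, slope = {slope:.3f}, r * sd ratio = {r * sd_ratio:.3f}")
```

The per-point effect of one score on another comes from a regression slope, which is what the regression results later in the paper report.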

Table 2: Correlation Coefficients between Variables


                         Writing   Identification   Reading   Math     Science   Social Studies
                         Scores    Number           Scores    Scores   Scores    Scores
Writing Scores            1.00
Identification Number   **0.19      1.00
Reading Scores            0.60     *0.15             1.00
Math Scores               0.62    **0.22             0.66      1.00
Science Scores            0.57      0.32             0.63      0.63     1.00
Social Studies Scores     0.60    **0.18             0.62      0.54     0.47      1.00
All correlations significant at p ≤ 0.001, except * p ≤ 0.05 and ** p ≤ 0.01.

Writing Scores and Gender

Because gender consists of two groups, male and female, we first run an F-test for equality
of variances (F = 1.61, dfn = 90, dfd = 108, p ≤ 0.05). The probability is 0.02, which means
that the variances of the two groups differ significantly. We therefore use a t-test with
unequal variances to compare the group means (t = -3.66, df = 169.71, p ≤ 0.001). Because the
probability is less than 0.05, there is a significant difference in mean writing scores
between male and female students. On average, male students score 50 and female students
score 55. The 95% confidence interval for the difference (male minus female) is
[-7.50, -2.24].
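
The fractional degrees of freedom of the unequal-variance t-test can be approximately checked from the reported figures. The sketch below assumes the Welch-Satterthwaite formula and assumes that the F value of 1.61 is the male variance divided by the female variance (the numerator has 90 degrees of freedom, matching the 91 male students). The function name is illustrative.

```python
def welch_df(var_ratio, n1, n2):
    """Welch-Satterthwaite degrees of freedom.

    var_ratio is s1^2 / s2^2. Only the ratio is needed because a common
    scale factor cancels between numerator and denominator.
    """
    v1 = var_ratio / n1  # s1^2 / n1, measured in units of s2^2
    v2 = 1.0 / n2        # s2^2 / n2, measured in units of s2^2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Reported values: F = 1.61, 91 male and 109 female students.
df = welch_df(1.61, 91, 109)
print(round(df, 2))
```

The result lands close to the reported 169.71; a small gap is expected because the variance ratio is rounded to two decimals.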

Writing Scores and Race

There are four categories of race: Hispanic, Asian, African-American, and White. Instead of
running a t-test, we use a one-way analysis of variance (ANOVA) to see whether at least one
group differs from the others (F = 7.83, total df = 199, p ≤ 0.001). This test compares the
variance between groups to the variance within groups. To compare writing scores between
pairs of races, a Bonferroni post hoc test is used with the one-way ANOVA. It shows
significant differences in scores between Asian and Hispanic, between Asian and African-
American, between Caucasian and Hispanic, and between Caucasian and African-American
students (Table 3). Asian students receive the highest scores and Caucasian students rank
second.

Table 3: Differences of Writing Scores among Races


                    Hispanic   White   African-American
Asian                11.54      3.94        9.80
African-American      1.74      5.86
White                 7.60
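
As a rough illustration of the Bonferroni comparison, the sketch below counts the pairwise comparisons among four groups and checks that the differences in Table 3 are consistent with one another. The family-wise level of 0.05 is an assumption; the paper does not state the level used.

```python
from itertools import combinations

races = ["Hispanic", "Asian", "African-American", "White"]
pairs = list(combinations(races, 2))  # every unordered pair of groups

alpha = 0.05                          # assumed family-wise significance level
per_test_alpha = alpha / len(pairs)   # Bonferroni: split alpha across all pairs

# Mean differences relative to Hispanic students, copied from Table 3.
diff_vs_hispanic = {"Asian": 11.54, "White": 7.60, "African-American": 1.74}

# The remaining differences should follow by subtraction.
implied = {
    ("Asian", "White"): diff_vs_hispanic["Asian"] - diff_vs_hispanic["White"],
    ("Asian", "African-American"): diff_vs_hispanic["Asian"] - diff_vs_hispanic["African-American"],
    ("White", "African-American"): diff_vs_hispanic["White"] - diff_vs_hispanic["African-American"],
}
reported = {
    ("Asian", "White"): 3.94,
    ("Asian", "African-American"): 9.80,
    ("White", "African-American"): 5.86,
}

print(len(pairs), "comparisons, per-test alpha =", round(per_test_alpha, 4))
for pair, value in implied.items():
    print(pair, round(value, 2), "reported:", reported[pair])
```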

Socio-economic Status and Type of Program

The dataset has three levels of socio-economic status and three types of program, so a
chi-square test of independence is used (chi2 = 16.60, df = 4, p ≤ 0.01). The result indicates
that a student’s socio-economic status and type of program are not independent of each
other.
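
A minimal sketch of how a chi-square test of independence is computed follows. The 3x3 table of counts is hypothetical, chosen so the arithmetic is easy to follow; it is not the paper's cross-tabulation, which is not reported.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = three status levels, columns = three programs.
counts = [
    [10, 20, 30],
    [20, 20, 20],
    [30, 20, 10],
]
stat, df = chi_square(counts)
print(f"chi2 = {stat:.2f}, df = {df}")
```

With three rows and three columns the degrees of freedom are (3 - 1) x (3 - 1) = 4, which matches the df reported above.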

Regression Results

To see how the independent variables jointly relate to writing scores, we consider a multiple
regression model as shown in Equation 1 (F = 26.75, df = 199, p ≤ 0.001). The R-squared value
and adjusted R-squared value are equal to 0.6102 and 0.5874, respectively. The R-squared
indicates that the model explains about 61 percent of the variation of writing scores around
their mean. The adjusted R-squared indicates that, after adjusting for the number of
predictors, the model explains around 59 percent of the variance in writing scores. The
F-value is 26.75 (Table 4).

To decide which variables to drop, we consider the t-values and betas, and run a joint test on
the candidate variables. In this case six variables (READ, WHITE, LOWSES, MIDSES, PRIVATE,
and GENERAL) are dropped, and the rest are kept for a new regression model as shown in
Equation 2 (F = 58.25, df = 199, p ≤ 0.001). The F-value is 58.25. The R-squared value and
adjusted R-squared value are equal to 0.6002 and 0.5899, respectively. The R-squared
indicates that the model explains about 60 percent of the variation of writing scores around
their mean. The adjusted R-squared shows that the model explains around 59 percent of the
variance in writing scores. We consider this the final model (Table 5).
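
The reported adjusted R-squared and F values can be cross-checked from R-squared, the sample size, and the number of predictors. The sketch below uses the standard textbook formulas; the predictor counts (11 and 5) are read off Tables 4 and 5, and the function names are illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k predictors (plus a constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

def f_statistic(r2, n, k):
    """Overall F statistic of a regression, expressed through R-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

n = 200
models = {
    "first (11 predictors)": (0.6102, 11),
    "final (5 predictors)": (0.6002, 5),
}
for name, (r2, k) in models.items():
    print(name, round(adjusted_r2(r2, n, k), 4), round(f_statistic(r2, n, k), 2))
```

Both models reproduce the figures quoted in the text to within rounding.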

Table 4: First Regression Results for Writing Scores


            Coefficient   Standard Error   t-value         Lower    Upper
Constant        6.79           3.34          2.03  *         0.20    13.37
FEMALE          5.37           0.89          6.02  ***       3.61     7.13
READ            0.11           0.07          1.73           -0.02     0.24
MATH            0.21           0.07          2.93  **        0.07     0.35
SCIENCE         0.26           0.07          3.98  ***       0.13     0.39
SOCST           0.22           0.06          3.89  ***       0.11     0.33
WHITE           0.01           1.07          0.01           -2.10     2.13
LOWSES          0.85           1.34          0.64           -1.78     3.49
MIDSES         -0.15           1.06         -0.14           -2.24     1.94
PRIVATE         1.07           1.23          0.87           -1.35     3.49
ACAD            1.74           1.28          1.36           -0.78     4.26
GENERAL         0.67           1.32          0.51           -1.93     3.26
Significant: * p ≤ 0.05 ** p ≤ 0.01 *** p ≤ 0.001

Table 5: New Regression Results for Writing Scores


Coefficient Standard Error t-value Lower Upper
Constant 8.06 2.93 2.75 ** 2.27 13.84
FEMALE 5.45 0.88 6.22 *** 3.72 7.18
MATH 0.24 0.07 3.54 *** 0.11 0.37
SCIENCE 0.30 0.06 5.05 *** 0.18 0.41
SOCST 0.25 0.05 4.85 *** 0.15 0.35
ACAD 1.75 1.01 1.73 -0.24 3.74
Significant: ** p ≤ 0.01 *** p ≤ 0.001

Female gender, math scores, science scores, and social studies scores each have a significant
positive association with writing scores. Holding the other variables constant, a one-point
increase in math scores is associated with a 0.24-point increase in writing scores. The
coefficient on science scores is 0.30, which means that a student with a one-point higher
science score is expected to score 0.30 points higher in writing. The coefficient on FEMALE
is 5.45, so female students are expected to score about 5.45 points higher in writing than
male students with the same values of the other variables.
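
To make the coefficient interpretation concrete, the sketch below plugs the Table 5 coefficients into the fitted equation. The two student profiles are hypothetical, and the function name is illustrative.

```python
# Coefficients copied from Table 5 (final regression model).
COEF = {
    "const": 8.06,
    "FEMALE": 5.45,
    "MATH": 0.24,
    "SCIENCE": 0.30,
    "SOCST": 0.25,
    "ACAD": 1.75,
}

def predict_writing(female, math, science, socst, acad):
    """Predicted writing score from the final regression model."""
    return (COEF["const"]
            + COEF["FEMALE"] * female
            + COEF["MATH"] * math
            + COEF["SCIENCE"] * science
            + COEF["SOCST"] * socst
            + COEF["ACAD"] * acad)

# Hypothetical students with identical scores who differ only in gender.
female_student = predict_writing(1, 50, 50, 50, 1)
male_student = predict_writing(0, 50, 50, 50, 1)
print(round(female_student, 2), round(male_student, 2))
```

The gap between the two predictions equals the FEMALE coefficient, which is what "holding the other variables constant" means in practice.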
Logit Results

The second model is a logit model, which relates a dichotomous writing outcome to the
independent variables as shown in Equation 3.

Where:

HIWRITE Dichotomous variable (1 = writing scores are equal or higher than 52,
0 = writing scores are lower than 52)
FEMALE Dichotomous variable (1 = female, 0 = male)
READ Respondent’s reading scores
MATH Respondent’s math scores
SCIENCE Respondent’s science scores
SOCST Respondent’s social studies scores
WHITE Dichotomous variable (1 = white, 0 = not white)
MIDSES Dichotomous variable (1 = medium socio-economic status, 0 = not
medium)
HIGHSES Dichotomous variable (1 = high socio-economic status, 0 = not high)
PRIVATE Dichotomous variable (1 = private school, 0 = not private school)
ACAD Dichotomous variable (1 = academic school program, 0 = not
academic)
GENERAL Dichotomous variable (1 = general school program, 0 = not general)
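
Based on the variable list above, Equation 3 would take the usual log-odds form. This is only a sketch of a standard logit specification; the coefficient symbols are assumed notation, not taken from the paper.

```latex
\begin{equation}
\begin{aligned}
\ln\!\left(\frac{P(\text{HIWRITE}_i = 1)}{1 - P(\text{HIWRITE}_i = 1)}\right) = {}
& \beta_0 + \beta_1\,\text{FEMALE}_i + \beta_2\,\text{READ}_i + \beta_3\,\text{MATH}_i
  + \beta_4\,\text{SCIENCE}_i \\
& + \beta_5\,\text{SOCST}_i + \beta_6\,\text{WHITE}_i + \beta_7\,\text{MIDSES}_i
  + \beta_8\,\text{HIGHSES}_i \\
& + \beta_9\,\text{PRIVATE}_i + \beta_{10}\,\text{ACAD}_i + \beta_{11}\,\text{GENERAL}_i
\end{aligned}
\end{equation}
```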

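Instead of using WRITE as the dependent variable, HIWRITE is used to run the first logit
model with all the other variables in Equation 3 (LR chi2 = 122.21, df = 11, p ≤ 0.001). The
pseudo R-squared value equals 0.4637. The LOWSES and VOCATIONAL variables are left out
because of collinearity: including a dummy for every category alongside the constant would
make the dummies perfectly collinear, so these two categories serve as the reference groups.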
Table 6: First Logit Results for Writing Scores
Coefficient Standard Error z-value Lower Upper
Constant -15.16 2.36 -6.43 *** -19.78 -10.54
FEMALE 2.28 0.52 4.40 *** 1.26 3.30
READ 0.05 0.03 1.37 -0.20 0.11
MATH 0.09 0.04 2.62 ** 0.02 0.16
SCIENCE 0.08 0.03 2.38 * 0.01 0.14
SOCST 0.05 0.03 1.80 0.00 0.10
WHITE 0.17 0.53 0.33 -0.86 1.21
MIDSES 0.21 0.57 0.37 -0.90 1.32
HIGHSES 0.49 0.69 0.71 -0.86 1.84
PRIVATE 0.21 0.60 0.36 -0.96 1.39
ACAD 0.83 0.60 1.38 -0.35 2.01
GENERAL 0.51 0.61 0.85 -0.67 1.70
Significant: * p ≤ 0.05 ** p ≤ 0.01 *** p ≤ 0.001

Judging by the z-values and significance levels in Table 6, seven variables (READ, WHITE,
MIDSES, HIGHSES, PRIVATE, ACAD, and GENERAL) are not significant. To decide whether to
drop them, we run a joint test on them to see whether the probability is equal to or less than
0.05. The probability is more than 0.05, so they can be dropped, and only four variables,
FEMALE, MATH, SCIENCE, and SOCST, are kept for a new logit model as shown in Equation 4
(LR chi2 = 116.60, df = 4, p ≤ 0.001). The pseudo R-squared value is 0.4424.

In Table 7, all independent variables are positively and significantly related to high writing
scores. Holding the other factors constant, a one-point increase in math scores raises the
log-odds of a high writing score by 0.12. Likewise, a one-point increase in science scores
raises the log-odds of a high writing score by 0.09. Female students are more likely than
male students to obtain writing scores at the high level (52 points or more).

Table 7: New Logit Results for Writing Scores


Coefficient Standard Error z-value Lower Upper
Constant -15.23 2.24 -6.80 *** -19.63 -10.84
FEMALE 2.10 0.48 4.34 *** 1.15 3.05
MATH 0.12 0.03 3.60 *** 0.05 0.19
SCIENCE 0.09 0.03 3.26 *** 0.04 0.15
SOCST 0.08 0.02 3.22 *** 0.03 0.12
Significant: *** p ≤ 0.001
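
Because log-odds are hard to read directly, the following sketch converts the Table 7 coefficients into predicted probabilities with the logistic function. The student profiles are hypothetical, and the function name is illustrative.

```python
from math import exp

# Coefficients copied from Table 7 (final logit model).
LOGIT = {
    "const": -15.23,
    "FEMALE": 2.10,
    "MATH": 0.12,
    "SCIENCE": 0.09,
    "SOCST": 0.08,
}

def prob_high_writing(female, math, science, socst):
    """Predicted probability of a writing score of 52 or more."""
    log_odds = (LOGIT["const"]
                + LOGIT["FEMALE"] * female
                + LOGIT["MATH"] * math
                + LOGIT["SCIENCE"] * science
                + LOGIT["SOCST"] * socst)
    return 1.0 / (1.0 + exp(-log_odds))  # inverse of the logit link

# Hypothetical students scoring 50 in every subject, differing only in gender.
p_female = prob_high_writing(1, 50, 50, 50)
p_male = prob_high_writing(0, 50, 50, 50)
print(round(p_female, 3), round(p_male, 3))
```

For these profiles the predicted probability is roughly 0.80 for the female student and roughly 0.33 for the male student. The size of the gap depends on the other scores, since the logistic curve is not linear.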
Although the coefficients can be interpreted as explained above, odds ratios make it easier to
understand how each variable affects the chance of a high writing score (Table 8). An odds
ratio is the exponential of the corresponding logit coefficient, and it is especially
convenient for interpreting dummy (dichotomous) variables. A one-point increase in math
scores multiplies the odds of obtaining a writing score of 52 points or more by 1.13. A
one-point increase in social studies scores multiplies those odds by 1.08. In addition, the
odds of a high writing score for female students are 8.20 times those for male students.

Table 8: New Logit Results for Writing Scores with Odds Ratio
            Odds Ratio   Standard Error   z-value        Lower    Upper
FEMALE         8.20           3.98          4.34  ***     3.17    21.21
MATH           1.13           0.04          3.60  ***     1.06     1.20
SCIENCE        1.10           0.03          3.26  ***     1.04     1.16
SOCST          1.08           0.03          3.22  ***     1.03     1.13
Significant: *** p ≤ 0.001
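
The link between Table 7 and Table 8 can be sketched in a few lines: each odds ratio is the exponential of the matching logit coefficient. The inputs below are the two-decimal coefficients from Table 7, so small rounding differences from Table 8 are expected.

```python
from math import exp

# Logit coefficients from Table 7 and odds ratios reported in Table 8.
coefficients = {"FEMALE": 2.10, "MATH": 0.12, "SCIENCE": 0.09, "SOCST": 0.08}
reported = {"FEMALE": 8.20, "MATH": 1.13, "SCIENCE": 1.10, "SOCST": 1.08}

# An odds ratio is exp(coefficient): the factor by which the odds change
# for a one-unit increase in the variable.
odds_ratios = {name: exp(beta) for name, beta in coefficients.items()}

for name, value in odds_ratios.items():
    print(f"{name:8s} exp(b) = {value:.2f}   reported = {reported[name]:.2f}")
```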

We also run a goodness-of-fit test to see whether the model fits the data well. The
Hosmer-Lemeshow chi-square test gives a probability of 0.7442, which is more than 0.05. The
result is not significant, so there is no evidence of lack of fit. Moreover, this logit model
is preferred to the first logit model because both its AIC and BIC values are lower. While
AIC is 165.3687 and BIC is 204.9485 for the first model, AIC and BIC are equal to 156.9845
and 173.4761, respectively, for the new model. Thus, this model is taken as the final logit
model. The variables also work well as a group.
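
The AIC and BIC figures can be tied back to the other reported statistics. The sketch below assumes the usual definitions (AIC = 2k - 2 lnL, BIC = k ln(n) - 2 lnL) and assumes that the pseudo R-squared is McFadden's; neither is stated in the paper. Function names are illustrative.

```python
from math import log

def implied_parameters(aic, bic, n):
    """Number of estimated parameters k, using BIC - AIC = k * (ln(n) - 2)."""
    return (bic - aic) / (log(n) - 2)

def log_likelihood(aic, k):
    """Model log-likelihood recovered from AIC = 2k - 2 lnL."""
    return (2 * k - aic) / 2

def mcfadden_r2(ll_model, lr_chi2):
    """McFadden pseudo R-squared; the null log-likelihood is lnL - LR/2."""
    ll_null = ll_model - lr_chi2 / 2
    return 1 - ll_model / ll_null

n = 200
k_first = implied_parameters(165.3687, 204.9485, n)  # first logit model
k_final = implied_parameters(156.9845, 173.4761, n)  # final logit model

r2_first = mcfadden_r2(log_likelihood(165.3687, round(k_first)), 122.21)
r2_final = mcfadden_r2(log_likelihood(156.9845, round(k_final)), 116.60)

print(round(k_first, 2), round(k_final, 2))
print(round(r2_first, 4), round(r2_final, 4))
```

The implied parameter counts (a constant plus 11 and 4 predictors) and the pseudo R-squared values agree with the figures reported for the two logit models.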

Comparison of Two Multivariate Models

Table 9 below compares the final regression and logit models. Although the regression model
includes the ACAD variable while the logit model does not, the two models are quite similar.
All four shared independent variables are significant and positively related to writing
scores. In the regression model, a one-unit increase in a variable raises the expected writing
score by the coefficient shown in the table, holding the other variables constant; in the
logit model, it raises the log-odds of a high writing score by the coefficient. In both
models, female students tend to obtain higher scores than male students do. Gender therefore
has a significant influence on writing scores, as do math, science, and social studies scores.

Table 9: Comparison between Regression and Logit Results for Writing Scores
                     FEMALE               MATH                 SCIENCE              SOCST
                     Regression   Logit   Regression   Logit   Regression   Logit   Regression   Logit
Coefficient             5.45      2.10       0.24      0.12       0.30      0.09       0.25      0.08
Standard Error          0.88      0.48       0.07      0.03       0.06      0.03       0.05      0.02
t-value / z-value       6.22      4.34       3.54      3.60       5.05      3.26       4.85      3.22
Significant             ***       ***        ***       ***        ***       ***        ***       ***
Lower                   3.72      1.15       0.11      0.05       0.18      0.04       0.15      0.03
Upper                   7.18      3.05       0.37      0.19       0.41      0.15       0.35      0.12
Significant: *** p ≤ 0.001

Conclusion

The a priori expectation was that not only gender but also race and type of program would
play an important role and be significant factors relating to individuals’ writing scores.
Other subject scores were expected not to affect writing scores because of the different
course contents. Asian and Caucasian students were expected to get higher scores than
Hispanic and African-American students. Because of the higher educational level, students who
attend academic school programs were expected to have an opportunity to obtain higher scores
than those who attend general and vocational programs. In contrast, both the regression and
logit models show that gender, math scores, science scores, and social studies scores are the
factors significantly related to high school students’ writing scores. Nonetheless, only a
small portion of the observations are either Asian (about 6%) or from private schools (16%).
This could lead to misinterpretation, since the sample is unbalanced.
