
School Psychology Quarterly
2011, Vol. 26, No. 2, 97–107
© 2011 American Psychological Association
1045-3830/11/$12.00 DOI: 10.1037/a0022987

Investigating Early Literacy and Numeracy: Exploring the Utility of the Bifactor Model

Joseph Betts
Renaissance Learning, Madison, Wisconsin, and Center for Cultural Diversity and Minority Education, Madison, Wisconsin

Mary Pickart and Dave Heistad
Minneapolis Public Schools, Minneapolis, Minnesota

Previous research has provided evidence for the utility of the Minneapolis Kindergarten
Assessment (MKA), which is a measure of early literacy and numeracy skills. The present
research was undertaken to replicate previous factorial results and evaluate the relative
strength of an alternative parameterization of the measurement model, the bifactor model,
which was posited to correct for anomalies found in the research literature. In addition,
predictive validity evidence was ascertained to evaluate the extent to which two different
factorial structures differed when making predictions about later reading and mathematics
outcomes. Results suggested the bifactor model provided a useful measurement model
conceptualization and a strong predictive model for later reading and
mathematics.
Keywords: early literacy, numeracy, confirmatory factor analysis, bifactor model, structural equation models

This article was published Online First April 11, 2011.
Joseph Betts, Renaissance Learning, Madison, Wisconsin, and Center for Cultural Diversity and Minority Education, Madison, Wisconsin; Mary Pickart and Dave Heistad, Minneapolis Public Schools, Minneapolis, Minnesota.
Correspondence concerning this article should be addressed to Joseph Betts, 8409 Elderberry Road, Madison, WI 53717. E-mail: jbetts5118@aol.com

The early evaluation of student learning in reading and mathematics and their related precursor skills has been identified as a necessary
component in providing all students a firm
foundation for later learning (National Association for the Education of Young Children &
National Council for Teachers of Mathematics,
2003; National Institute of Child Health and
Human Development [NICHHD], 2000; National Research Council, 2001). A key factor in
any approach to improve early learning is the
use of reliable and valid assessments that provide useful information to educators. The Minneapolis Kindergarten Assessment (MKA; Minneapolis Public Schools, 2004) is one measure
that provides an evaluation of both early reading
and early numeracy skills during the kindergarten year.
Previous research on the MKA has provided
strong psychometric evidence of the instrument's intended internal validity and score reliability, along with evidence to support predictive validity inferences for later reading and
mathematics outcomes (Betts, Pickart, & Heistad, 2009; Betts et al., 2008; Pickart, Betts,
Sheran, & Heistad, 2005). This previous research found evidence to support the general
structure of the MKA with two dominant and
highly correlated (r = .88) factors measuring
early literacy and early numeracy. High levels
of score reliability (r > .80) were established
for the assessment as a whole and also across
diverse subgroups of students of differing racial/ethnic backgrounds and diverse non-English home languages. In addition to providing evidence of predictive validity, the early
literacy and numeracy measures have also been
shown to provide significant incremental validity when predicting both reading and mathematics outcomes at the end of second grade.
The MKA was originally developed to have
two correlated factors with simple structure
measuring early literacy and early numeracy
skills (Pickart et al., 2005). Recent evidence has confirmed that the intended simple structure is supported; however, it has also suggested an unaccounted-for level of residual covariation between subtests within each of the main factors (literacy and numeracy), in the form of significant residual correlations (Betts et al., 2009). This finding of significant residual variation, over and above that accounted for by the intended main factors of early literacy and numeracy, is potentially problematic: it might suggest that other factors are influencing the relationship between scores and potentially degrading measurement qualities. The unaccounted-for residual variation can have numerous consequences.
One consequence concerns the computation of score reliability (Lucke, 2005). Measurement precision could be attenuated because this residual variation is treated as random error when there might actually be overlooked structure in the residual variation. This would imply a more complex measurement structure than is presently portrayed. However, it is also possible that, if there is not a more complex measurement model at work, score reliabilities would be biased because the residual covariance is not introduced into the computation of the reliability coefficient (Haertel, 2006; Zimmerman & Williams, 1977).
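To see the reliability issue concretely, consider the reliability coefficient for a congeneric composite; the expression below is a standard textbook formulation (shown for illustration, not a computation reported by the authors) in which correlated residuals enter the observed-score variance:

```latex
% Reliability of a composite of p congeneric indicators,
% x_i = \lambda_i F + \epsilon_i with Var(F) = 1 and residual
% covariances \theta_{ij}. If the \theta_{ij} (i \neq j) are
% ignored, the denominator is misstated and the reliability
% estimate is biased.
\omega =
  \frac{\left(\sum_{i=1}^{p} \lambda_i\right)^{2}}
       {\left(\sum_{i=1}^{p} \lambda_i\right)^{2}
        + \sum_{i=1}^{p} \theta_{ii}
        + 2\sum_{i<j} \theta_{ij}}
```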
Another consequence of failing to handle the residual covariation appropriately concerns the validity of scores (Kane, 2006; Zimmerman & Williams, 1977). Failure to account for the potential relationships implied by the correlated residuals would compromise evidence of validity, as a potentially significant set of relationships between the variables is treated as random error rather than as an important structural feature of the measurement model. This might obfuscate potentially important relationships between the early literacy and numeracy factors and external variables of interest. In addition, if correlated residuals were found to be a legitimate aspect of the measurement process, accounting for them would have direct effects on research (Reddy, 1992) and on evaluating the outcomes of instruction based on the MKA.
The existence of the correlated residuals suggests the potential for a more complex measurement model than was originally conceived. However, it is also possible
that the correlated residuals found in previous
research could potentially be the result of anomalous sample-specific characteristics related to a
single study. It is important to evaluate the

extent to which the residual variation was an


anomalous event related to a specific sample, or
whether its existence can be observed again in
another independent sample. If the more complex structure is found in an independent sample, then this would suggest that the residual
correlations are potentially an important aspect
of the measurement model that would need to
be taken account. This research will utilize a
cross-validation sample to examine the extent to
which correlated residuals can be replicated for
the MKA.
Accounting for the Residual Variation: The Bifactor Model
If the residual variation is replicated in a
cross-validation sample, the question arises
about how best to conceptualize the results. As
the indicator variables, subtests in the MKA, are
quite distinct, the correlated residuals were not
thought to represent redundancy between indicators, one common cause of correlated residuals. Previous research has consistently
found a high level of covariation between the
two factors of the MKA. In addition, all the
correlated residuals in the previous research
(Betts et al., 2009) were found only between
subtests of a common factor, that is, no correlated residuals were found between subtests
measuring the literacy and numeracy factors.
Taken together, these phenomena could suggest
the existence of an unaccounted latent factor
within the structure of the MKA. Thus, it might
be possible that the high correlation between
factors is masking a more general factor that
describes variation in responses across all the
observed indicators, that is, literacy and numeracy subtests, while the residual covariation
within each factor could suggest more specific group factors related to literacy and numeracy independently. This research posited the bifactor model (Harman, 1967; Holzinger & Swineford, 1937; Jöreskog, 1969; Thomson,
1948; Yung, Thissen, & McLeod, 1999) as an
alternative measurement structure that can account for these circumstances.
The bifactor model was originally described
as one of the early competing models for measuring general intelligence (Holzinger & Swineford, 1937), and much of the subsequent use of
the model has been within that theoretical purview (Carroll, 1993; Jensen, 1998). However, there have been applications in academics (Gustafsson & Balke, 1993) and in the general
analysis of items comprising a test (Gibbons &
Hedeker, 1992). In addition, recent inquiry into
the place of intelligence in academics and education has emerged (Buckhalt, 2001; Mayer,
2000), and supports the potential for a general
factor underlying academic achievement that is
distinct from domain-specific achievement related to instruction, which is similar to recent
intelligence theories (Sternberg, 1997).
The bifactor model posits a single, general
factor accounting for variation in all the indicator variables (see the General factor in Figure
1B), and then conceives independent (uncorrelated) group factors underlying independent sets
of indicator variables (see the Literacy and Numeracy factors in Figure 1B). Given the large correlation between the early literacy and numeracy factors on the MKA, it would be reasonable to assert that rather than two highly
correlated literacy and numeracy factors, there
might be a single common factor (the General
factor) underlying responses to all the indicator
variables (the literacy and numeracy subtests).
Furthermore, the correlated residuals within each
factor found in previous research might suggest
that there could still be distinct group factors (Literacy and Numeracy factors) in the measurement
model that represent those specific skill-related
factors. Thus, the high correlation between factors
might have been obscuring the effect of a general
factor and the previously found within-factor correlated residuals might be masking the group-specific factors related to literacy and numeracy, independent of each other and independent of the general factor. Therefore, the bifactor model might account for the anomalies in the previous research.

Figure 1. Factorial model for the two-factor, correlated model (A) and the bifactor model (B) with standardized factor loadings.
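In equation form, the bifactor structure described here can be sketched as follows (the notation is illustrative rather than taken from the original model syntax): each subtest loads on the general factor and on exactly one group factor, and all factors are mutually uncorrelated.

```latex
% Bifactor measurement model for subtest i belonging to
% group k (k = Literacy or Numeracy).
x_{i} = \lambda_{Gi}\, G + \lambda_{ki}\, F_{k} + \epsilon_{i},
\qquad
\operatorname{Cov}(G, F_{k}) = 0,
\quad
\operatorname{Cov}(F_{\mathrm{Lit}}, F_{\mathrm{Num}}) = 0
```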
The bifactor model, while potentially accounting for previous results, could also have a
useful implication for research in numerous areas (Lohman, 2000; Mayer, 2000; Wagner,
2000). An interesting theoretical application
could be the evaluation of environmental effects on ability (Carroll, 1997), such as
education (Buckhalt, 2001). This would be
quite important, as education has been identified
as the second most important variable in the
nexus of g-related variables (Jensen, 1998).
Using the bifactor model allows for the representation of early literacy and numeracy on
their own as group factors and also as composed
of a single general factor underlying all skills
being assessed. This representation of a single
factor could be similar to the metacomponent
(Sternberg, 1997) or general intellectual ability
in previous research (Carroll, 1993; Jensen,
1998; Rummel, 1970). For a general factor to be
supported there would need to be a significant,
positive correlation between all or most of the
skills being evaluated (Carroll, 1997). A general
factor could also be conceived as a component
of the individual learning differences that students bring to the classroom.
Furthermore, the group factors related specifically to literacy and numeracy could relate to
the lower-order factors similar to the performance and knowledge-acquisition components
in the Triarchic theory (Sternberg, 1997) or the
Stratum II components in Carroll's three-stratum theory (Carroll, 1993, 1996). These
unique group factors, that is, literacy and numeracy, might then be related to the instructional aspects of literacy and numeracy skills,
specifically, for which the environmental variable of education or schooling has an effect.
Such theories as those of Sternberg and Carroll
are well supported in the literature and would
provide interesting applications to early literacy
and numeracy assessments, but are not immediately applicable to the MKA.
A major reason that theories like Carroll's
would be inappropriate for evaluating the
anomalies found in the MKA is that the model
cannot be specified with only two lower-level,
or stratum II factors, for example, literacy and

numeracy. As Carroll states, there should be "at least three lower-order factors (as stratum II) in order adequately to define a general factor at stratum III on the basis of the correlations among lower-order stratum II factors" (p. 144).
Because the MKA has only literacy and numeracy factors, which would relate to the stratum II factors, the models are inappropriate for
the MKA. However, future research might build
off the present findings to attempt to incorporate
additional lower-order factors into their models.
In addition, the higher-order variables, like
stratum III in Carroll's model (1993, 1996, 1997), are usually posited as explaining the covariation in the lower-order factors. Thus, two
issues come to the fore with respect to the
MKA. First, the MKA is presently structured as
two lower-order factors with a substantial, positive correlation between the factors. Applying a
model that attempted to explain the covariation
between these two factors would in effect only
be restating the correlation. This can also be
seen as one of the statistical reasons for Carroll's (1997) advice, stated above, that a minimum of three lower-order factors is needed.
Thus, nothing new would be gleaned about
the measurement model of the MKA and the
possible determination of the correlated error
components.
Second, the bifactor model posits a model
with independent general and group factors.
There is no correlation between any of the factors in the bifactor model. Therefore, after extracting the positive correlation manifold that
underlies responses to all the literacy and numeracy variables, the leftover covariation in responses for the literacy variables is unique to only that factor. The leftover covariation for the numeracy variables would likewise be unique to
the numeracy factor. Thus, the bifactor model
structure is appropriate for measurement instruments that only have two lower-order factors,
but conceive of those factors as independent
after extracting the positive correlation manifold across all the lower-level factors.
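The contrast between the two parameterizations can also be stated in terms of their implied covariance matrices; the following is a standard formulation included for clarity, not notation used by the authors. The correlated-factors model places the within-factor dependence in a non-diagonal residual matrix, whereas the bifactor model absorbs it into the group-factor loadings and leaves the residual matrix diagonal.

```latex
% Correlated two-factor model with correlated residuals:
% \Phi contains the factor correlation and \Theta is
% non-diagonal for the residual pairs.
\Sigma_{\mathrm{2F}} = \Lambda \Phi \Lambda^{\top} + \Theta
% Bifactor model: general loadings \Lambda_{G}, block-structured
% group loadings \Lambda_{S}, diagonal residual matrix.
\Sigma_{\mathrm{BF}} = \Lambda_{G} \Lambda_{G}^{\top}
  + \Lambda_{S} \Lambda_{S}^{\top} + \Theta_{\mathrm{diag}}
```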
Research Purpose
Evidence for making validity arguments
should be considered an ongoing process that
strives to ensure that valid inferences can be
made from test scores (American Educational
Research Association [AERA], American Psychological Association [APA], & National Council on Measurement in Education
[NCME], 1999). Thus, it is necessary to continually evaluate psychological and educational
tests to provide evidence in support of those
inferences. Given the anomalous results found
for the MKA, further research is warranted. In
general, this research will attempt to replicate
previous research in an independent sample of
students to investigate the expected factorial
structure of the MKA. In addition, the salience
of the previously found correlated residuals
will be evaluated to help clarify whether these previous results were anomalous or whether there was some unaccounted-for structure in the MKA. The bifactor model will
be evaluated as a potential structure that accounts for anomalies.
Finally, as the predictive utility of the MKA
for identifying struggling students was a major
purpose for the development of the test, further
evaluation of the predictive validity was undertaken. This aspect of the research had two purposes. First, replication of
previous predictive results in an independent
sample was needed to help evaluate previous
predictive results. Second, because a new and
potentially more complex measurement model
was being investigated, it was important to evaluate the extent to which the new model provided improvements in making predictions
about student outcomes in reading and mathematics at the end of second grade. If the more
complex measurement model provides a better
fit to the data, it should also provide useful
predictions about future achievement.
Method
The data for this research came from an ongoing longitudinal research project. The object
of this larger project was to measure students'
early literacy and numeracy skills during the
kindergarten year and their relationship to later
reading and mathematics achievement. All students in the district were evaluated using the
MKA when they entered and exited the kindergarten year. There was an additional midyear check-up assessment that was used to identify any students who appeared to be falling behind.
Only the end-of-kindergarten testing results were used in the present analysis. Follow-up
testing was completed at the end of second grade on reading comprehension and mathematics tests.


Participants
Students participating in this research
(N = 2,103) attended schools (N = 60) in a
large, urban school district in the Midwestern
United States. The racial/ethnic distribution was
as follows: 39% Black/African; 31% Caucasian/European; 15% Hispanic; 11% Asian; and
4% Native American. A large percentage (25%)
of students were Limited English Proficient students whose home language was not English
and were being served in English as a Second
Language programs as English Language
Learners (ELLs). There were over 60 foreign
languages identified as non-English dominant
home languages; however, there were three
groups that represented the vast majority of the
ELLs: Spanish (51%), Hmong (8%), and Somali (14%). Approximately 60% of the students
were eligible for free and reduced-price lunches,
suggesting a majority of the students were of
lower socioeconomic status.
Measures
MKA.
The MKA (Minneapolis Public
Schools, 2004; Pickart et al., 2005) is a standardized, individually administered assessment
of early literacy and numeracy skills for kindergarten students. The MKA was developed to
measure key early literacy variables known to
predict later reading outcomes (NICHHD,
2000) and key variables known to affect later
mathematics outcomes (National Research
Council, 2001). In addition, the end-of-kindergarten standards for the state of Minnesota were used to guide development. The early literacy
subtests, rhyming (Rh), alliteration (Al), letter
naming (LN), and letter sounds (LS), are fluency-based tests scored by the total number of correct
responses within a specific time span (either 1
or 2 min). The early numeracy subtests, number
sense (NS), patterning/functions (PF), and spatial sense/measurement (SM), are untimed subtests where the total score consists of the total
number of correct answers. Further information
on administration and scoring can be found in
the technical manual (Pickart et al., 2005). Evidence for reliability was reported to be strong
with coefficients greater than .80 for both test-retest and internal consistency across multiple ethnic groups. Concurrent and predictive validity evidence was reported to be strong with
correlations of .75 and .84, respectively, for
later mathematics and reading achievement.
Northwest Achievement Levels Test (NALT). The NALT is a group-administered, adaptive test with specific academic
achievement tests of reading and mathematics.
Good evidence of validity is reported in the
technical manual for second-grade students in
reading and mathematics, r = .86 and r = .80,
respectively. Marginal reliability for second
grade was above .90, and test-retest reliabilities
were consistently above .70 for both reading
and mathematics.
Procedures
Trained assessors evaluated kindergarten students during May of the kindergarten academic
year. Standardized procedures were used and
are outlined in the technical manual (Minneapolis Public Schools, 2004; Pickart et al., 2005).
The cohort of assessors was composed of retired
teachers who completed two hours of administration training on the MKA. Training consisted
of both group instruction and didactic modules
comprising videotaped MKA administration examples. In addition, all assessors score the videotaped examples and results were checked for
reliability with a criterion of 90% agreement to
the standard. The NALT reading and math tests
were administered during April of the second-grade year. All teachers were responsible for
complying with administration rules and procedures. Special days were set aside for all of the
students to participate in the assessment.
Statistical Analyses
Confirmatory factor analytic (CFA) methods
for congeneric tests (Bollen, 1989; Jöreskog,
1969, 1971; McDonald, 1985) were used to
evaluate the measurement models. Structural
equation modeling (SEM; Bollen, 1989; Schumacker & Lomax, 1996) was used to evaluate
the predictive validity model. All models were
estimated using the Mplus software (Muthén & Muthén, 1998–2005).
To evaluate the results of the CFA and
SEM, the following common model fit indices
were used (Bollen, 1989; Hu & Bentler, 1999; Schumacker & Lomax, 1996): the χ2 test of model fit; the Comparative Fit Index (CFI); the Tucker-Lewis Index (TLI); the Root Mean Square Error of Approximation (RMSEA); and Closeness of Fit (Cfit). For this analysis, the following a priori magnitudes were set to judge the fit of the data to the hypothesized model: CFI > .95; TLI > .95; RMSEA < .05; and Cfit > .05. Because of the large sample size in the present research, little weight was placed on the overall χ2 tests for the measurement and structural models, because the test tends to have excess power with large samples (Bollen, 1989). This excess power can result in finding significant differences for small and substantively unimportant differences between the data and hypothesized model. However, the statistic is reported for completeness of results and because a χ2 difference test will be used to evaluate differences between nested models.
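For readers who want to reproduce this kind of nested-model comparison, a minimal sketch of the χ2 difference test follows; the function and the example values are illustrative placeholders, not the article's reported statistics.

```python
from scipy.stats import chi2

def chi_square_difference(chi2_restricted, df_restricted,
                          chi2_full, df_full):
    """Chi-square difference (likelihood-ratio) test for nested models.

    The restricted model (e.g., residual correlations fixed to zero)
    must be nested within the full model (residual correlations freed).
    """
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    p_value = chi2.sf(delta_chi2, delta_df)  # upper-tail probability
    return delta_chi2, delta_df, p_value

# Hypothetical placeholder values, not results from Table 1:
d, ddf, p = chi_square_difference(350.0, 13, 20.0, 8)
print(f"delta chi2({ddf}) = {d:.2f}, p = {p:.4f}")
```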
The first analysis was related to the first research question: Were the correlated residuals
replicated in this cross-validation sample, and if
so, does acknowledging them in the measurement model improve the fit of the model? To
evaluate this question, two factor analytic models were proposed. Both models started with a
two-factor model with correlations between the
literacy and numeracy factors and simple structure related to the indicator variables. However,
the models differed with respect to the relationship of the residual correlations of the indicator
variables. One model did not allow for correlated residuals and the other model replicated
the previous research finding of the following
residual correlations on the indicator variables:
rhyming/alliteration; rhyming/letter names;
rhyming/letter sounds; letter names/letter
sounds; and number sense/patterning. As the
models were nested, basic nested-model hypothesis testing was undertaken, using the χ2 difference test, Δχ2 (Bollen, 1989; Schumacker & Lomax, 1996), and the change in CFI, ΔCFI (Cheung & Rensvold, 2002), to evaluate the
importance of adding the residual covariation in
the measurement model.
To evaluate the second research question
pertaining to the utility of the bifactor model,
the bifactor model was compared with the
best-fitting model from the result of the first
research question. The bifactor model was
specified as having a general factor influencing all the indicator variables. Then two group factors were specified,
literacy and numeracy, with simple structure.
The general factor and group factors were
constrained to be independent with the correlation fixed at zero between all the factors.
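As a concrete illustration of this specification, a sketch in lavaan-style syntax follows, assuming the third-party Python package semopy and a pandas DataFrame whose columns use the article's subtest abbreviations (Rh, Al, LN, LS, NS, PF, SM). The authors used Mplus; this sketch is an assumed translation, including the `0 *` multiplier syntax for fixing factor covariances to zero.

```python
import pandas as pd
import semopy  # third-party SEM package; an assumption, not the authors' tool

# Bifactor specification: a general factor on all subtests plus
# two group factors with simple structure, all mutually uncorrelated.
BIFACTOR_DESC = """
G =~ Rh + Al + LN + LS + NS + PF + SM
Lit =~ Rh + Al + LN + LS
Num =~ NS + PF + SM
G ~~ 0 * Lit
G ~~ 0 * Num
Lit ~~ 0 * Num
"""

def fit_bifactor(data: pd.DataFrame) -> pd.DataFrame:
    """Fit the bifactor CFA and return fit statistics (CFI, RMSEA, AIC, ...)."""
    model = semopy.Model(BIFACTOR_DESC)
    model.fit(data)  # maximum likelihood by default
    return semopy.calc_stats(model)
```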
Because the bifactor and correlated factor
models are not nested, to evaluate the relative
fit between the models, the following information indices were used (Bollen, 1989; Schumacker
& Lomax, 1996): Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). For the AIC and BIC, the lower of the
values across model comparisons was deemed
to represent a relatively better fit.
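Since the comparison leans on these indices, their standard definitions are worth stating; the helper functions below compute them from a model's maximized log-likelihood (textbook formulas, shown for clarity).

```python
import math

def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike's Information Criterion: -2 ln L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2 ln L + k ln N."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Lower values indicate relatively better fit; BIC penalizes
# additional parameters more heavily as the sample size grows.
```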
The evaluation of the relationship between
the latent factors of the MKA and the second
grade academic achievement in reading and
mathematics was done using SEM (Bollen,
1989; Schumacker & Lomax, 1996). For this
analysis, the results of the previous measurement models were used to fix the factor loadings
in the measurement portion of the model. Then
the SEM was structured with both the reading
and mathematics variables regressed on all the
latent factors from the previous models. Thus,
two SEMs were run, one with the achievement
variables regressed on the two-factor model and
one with the achievement variables regressed on
the three uncorrelated factors of the bifactor
model. Figure 2 provides a visualization of the
different predictive validity models. To evaluate
the differences between the models, AIC and
BIC were used.
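In equation form, the two structural models can be sketched as follows (γ denotes a structural regression coefficient and ζ a disturbance term; the notation is illustrative):

```latex
% SEM 1: outcomes regressed on the correlated two-factor model.
\mathrm{Read} = \gamma_{1}\,\mathrm{Lit} + \gamma_{2}\,\mathrm{Num} + \zeta_{R},
\qquad
\mathrm{Math} = \gamma_{3}\,\mathrm{Lit} + \gamma_{4}\,\mathrm{Num} + \zeta_{M}
% SEM 2: outcomes regressed on the three uncorrelated bifactor factors.
\mathrm{Read} = \gamma_{5}\,G + \gamma_{6}\,\mathrm{Lit} + \gamma_{7}\,\mathrm{Num} + \zeta_{R},
\qquad
\mathrm{Math} = \gamma_{8}\,G + \gamma_{9}\,\mathrm{Lit} + \gamma_{10}\,\mathrm{Num} + \zeta_{M}
```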
Results
An important assumption of maximum likelihood estimation for CFA and SEM is that the data approximate a multivariate normal distribution (Bollen, 1989; Browne, 1984). Both Mardia's and Srivastava's tests for multivariate kurtosis were nonsignificant, p = .13 and p = .94, respectively, for all the subtests of the MKA. These results suggested estimates for standard errors should not
be biased.
Evaluating the Measurement Models: Is Residual Variation Important?
The CFA results for the models with and without residual covariation between indicator variables can be seen in Table 1. Results indicated that the model with no correlated residuals provided moderate (CFI = .94) to poor (RMSEA = .12) fit. The model with correlated errors provided excellent fit to the data on all fit indices. When comparing the two models, there was a statistically better model fit with the correlated errors in the model, Δχ2(4) = 355.84, p < .01, ΔCFI = .05. In addition, the information criteria, AIC
and BIC, were lower for the model with correlated residuals. These results suggested that
the addition of the correlated residuals was
probably not an anomalous function of the
sample used in previous research and that the
measurement model would be significantly
degraded when these important relations are
left out of the model. Therefore, the two-factor model with correlated residuals will be
the base model when evaluating the bifactor
model as an alternative measurement model.
Evaluating the Bifactor Model as an Alternative Measurement Model
Table 1 provides the results of the model
comparison between the two-factor model with
correlated residuals and the bifactor model.

Table 1
Model Fit Results for Measurement Model Confirmatory Factor Analyses (CFAs) and Predictive Validity Structural Equation Modeling (SEM)

              Correlated residuals    2-factor vs. bifactor    Predictive validity
Fit index     No         Yes          2-factor    Bifactor     2-factor    Bifactor
χ2            373.09     17.25        17.25       19.64        317.67      165.77
df            13         8            8           7            28          27
p             .01        .03          .03         .01          .01         .01
CFI           .94        .99          .99         .99          .97         .99
TLI           .91        .99          .99         .99          .96         .98
RMSEA         .12        .02          .02         .03          .07         .05
Cfit          .01        .99          .99         .99          .01         .54
Δχ2(4)                   355.84
ΔCFI                     .05
AIC           114,872    114,526      114,526     114,530      145,416     145,266
BIC           114,956    114,639      114,639     114,649      145,512     145,368

Note. CFI = Comparative Fit Index; TLI = Tucker-Lewis Index; RMSEA = Root Mean Square Error of Approximation; Cfit = closeness of fit; AIC = Akaike's Information Criterion; BIC = Bayesian Information Criterion. The "Yes" column under Correlated residuals and the "2-factor" column under 2-factor vs. bifactor contain the same results because they represent the same model; the results are repeated to allow for ease of comparison.

Figure 1A provides a visualization of the standardized factor loadings for the correlated residuals model. There was a high correlation (r = .88)
between the literacy and numeracy factors, and
all factor loadings were significant. In addition,
all residual correlations between variables were
significant (all p < .05). Results were similar to
previous research.
Figure 1B provides the standardized solution
for the bifactor model. All model fit indices
(Table 1) suggested excellent fit of the model,
and were quite comparable to the correlated
factor model (Figure 1A). These results indicated that a general factor accounting for covariation among all the indicator variables was salient.
Overall results suggested that both models fit
the data very well. Little difference was noted
in the model fit indices between the two models. With respect to the information criteria, AIC and BIC, the two-factor model with correlated residuals appeared to be a slightly better
fitting model. These results suggested that
both the two-factor model with correlated residuals and the bifactor model adequately accounted for the relations found in the data and
provided excellent fit of the data. In addition,
both models appeared to have reasonable structure to account for the underlying covariation
between the indicators.

Predictive Validity of the Two Models for Second-Grade Reading and Math Achievement

Figure 2. Results of the predictive validity models with distal reading and mathematics outcomes regressed on the two-factor model (A) and the bifactor model (B).
Standardized regression estimates from the
regression of the reading and mathematics
achievement scores on the Literacy and Numeracy factors from the correlated factors measurement model with correlated residuals were
presented in Figure 2A. Model fit indices for the
predictive validity model were provided in Table 1. Model fit indices varied from poor (Cfit = .01) to good (RMSEA = .07) to excellent (CFI and TLI > .95). These results were similar to
previous findings in a number of ways. First,
both early literacy and numeracy were significantly related to reading and mathematics outcomes two school years in the future. Overall,
the literacy and numeracy factors accounted for
59% of variance in reading scores and 51% in
mathematics scores. There was a significant
level of correlation between the reading and
mathematics residuals (r = .25, p < .01), suggesting the potential for other covariates to explain some of this residual covariation.
Standardized regression coefficients using
the bifactor measurement model were presented
in Figure 2B. Overall the model fit was excellent on all indices (Table 1). The three factors of
the bifactor model accounted for 58% and 52% of the variance in second-grade reading and mathematics scores, respectively. It is interesting to note that this was very similar to the
previous model with only two factors. However, looking at the standardized parameter estimates for the regression (Figure 2B), it was
noticeable that the general factor was salient
and accounted for about 50% of the variance in
reading and about 44% in mathematics. Thus,
much of the variation in second-grade achievement was predicted by the general factor, with
both the literacy and numeracy factors contributing small but significant incremental validity.
The information measures (AIC and BIC)
were both smaller for the predictive validity
model using the bifactor measurement model
when compared with the previous predictive
model using the two-factor structure with correlated residuals. Overall, these results suggested the hypothesized bifactor model fit the
data well and was relatively better than the
original measurement model using the intended
factor structure of the MKA when used to predict later achievement outcomes. Therefore,
while the intended structure of the MKA was
replicated in the present sample, there was also
evidence that the bifactor model should be considered in future research.
Discussion
The outcomes of this research provided positive replication results for the MKA. The intended measurement model of the MKA with
two, correlated factors was replicated along
with the correlated residuals between subtests
found in previous studies. These results suggested that the intended measurement model
was somewhat more complex than originally
conceived and that the previous findings of residual correlations were probably not a result of
sampling error. The bifactor model was explored and provided excellent fit to the data along with strong predictive validity.
The results found here suggest future research directions that expand beyond the specifics of the MKA. Assessment developers
might think about the bifactor model as a measurement model for early literacy and numeracy
assessments. It is important to note that the
present research suggests that there is the potential for a common general factor accounting
for score variations over and above the intended factors of literacy or numeracy. Further research should investigate the extent to which the general factor found here operates on different measures of literacy and numeracy, which could
help to identify whether this result is specific to
the MKA or whether it might be a more general
phenomenon for investigation. In addition, it
would be useful to attempt to utilize more
group, or lower-order factors related to early
literacy and numeracy to evaluate the extent to
which other theories might apply, such as the
three-stratum theory.
Future research should seek to elicit the relationship between the general factor found here
and measures of general intelligence. It is quite
possible that the general factor found in this
research could represent a general cognitive
ability. This could also make sense, as students'
mathematics and reading skills are still inchoate
during the kindergarten year, and the lack of
differentiation of skills in these areas could be
related to a general ability. The bifactor model
provides some evidence that even if the general
factor is a proxy for some general cognitive
attribute there is still an impact on later achievement from early skill development related to the
uncorrelated group factors.
One important limitation of the present research is the generalizability of results to student populations somewhat different from the
present sample. While the present sample was
similar to the population for which the MKA is
commonly used, districts with very different
demographic characteristics might show differing results. For instance, the sample in this
research has a high level of minority students
and many students for whom English is not their
primary home language. Furthermore, the students with non-English primary home languages (ELLs) will potentially be quite different from other populations with high levels of
non-English primary language students. For instance, only about half the ELL students were
from Spanish-speaking homes, but some districts might have high ELL percentages in which almost all of the students come from Spanish-speaking backgrounds. Also, the present sample has students with Somali and
Hmong primary languages, which might not be
represented to any great extent in non-English
primary language groups in other communities.
Moreover, some communities might have high
percentages of additional languages, such as

106

BETTS, PICKART, AND HEISTAD

Russian, Japanese, and so forth. When investigating instruments measuring language-related
educational aspects, it is very important to evaluate the impact of language status, and this is
not just related to whether or not the students
are of a general ELL category, as different
languages and different levels of exposure to
English instruction can have substantial impacts.
Future research should attempt to evaluate
and replicate the results found here in distinctly
different student population groups in a cross-validation methodology. This research would at
least entail an evaluation of (a) the invariance of
measurement properties for students of different
home languages, (b) predictive bias related to
later mathematics and reading outcomes, and
(c) any bias related to using scores on the MKA
to assign students to at-risk categories with student populations markedly different from the
present sample in both ethnicity and language
status. These three issues are especially important as they relate to the proper use of test scores
for their intended purposes.
Overall, this research provides evidence to
support the use of the MKA for its intended
purposes. Evidence was also found that suggested a more complex measurement model
might be useful in organizing student responses.
The use of the bifactor model provides a possible direction for organizing and conceptualizing
early literacy and numeracy measurement.

References
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Betts, J., Pickart, M., & Heistad, D. (2009). Construct and predictive validity evidence for curriculum-based measures of early literacy and numeracy skills in kindergarten. Journal of Psychoeducational Assessment, 27, 83–95.
Betts, J., Reschly, A., Pickart, M., Heistad, D., Sheran, C., & Marston, D. (2008). An examination of predictive bias for second grade reading outcomes from measures of early literacy skills in kindergarten with respect to ELL and ethnic subgroups. School Psychology Quarterly, 23, 553–570.
Bollen, K. (1989). Structural equations with latent variables. New York, NY: Wiley.
Browne, M. (1984). Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Buckhalt, J. (2001). Overview of special issue: Is g a viable construct for school psychology? Learning and Individual Differences, 13, 97–99.
Carroll, J. (1993). Human cognitive abilities: A survey of factor-analytic studies. Cambridge, United Kingdom: Cambridge University Press.
Carroll, J. (1996). A three-stratum theory of intelligence: Spearman's contribution. In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 1–18). Mahwah, NJ: Erlbaum.
Carroll, J. (1997). Theoretical and technical issues in identifying a factor of general intelligence. In B. Devlin, S. Fienberg, D. Resnick, & K. Roeder (Eds.), Intelligence, genes, & success: Scientists respond to The Bell Curve. New York: Springer.
Cheung, G., & Rensvold, R. (2002). Evaluating goodness-of-fit indexes for testing measurement invariance. Structural Equation Modeling, 9, 233–255.
Gibbons, R., & Hedeker, D. (1992). Full-information item bi-factor analysis. Psychometrika, 57, 423–436.
Gustafsson, J., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407–434.
Haertel, E. (2006). Reliability. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110). Westport, CT: Praeger.
Harman, H. (1967). Modern factor analysis (2nd ed.). Chicago, IL: The University of Chicago Press.
Holzinger, K., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41–54.
Hu, L., & Bentler, P. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Jensen, A. (1998). The g factor: The science of mental ability. Westport, CT: Praeger Press.
Jöreskog, K. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34, 183–202.
Jöreskog, K. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Praeger.
Lohman, D. (2000). Complex information processing and intelligence. In R. Sternberg (Ed.), Handbook of intelligence (pp. 285–340). Cambridge, United Kingdom: Cambridge University Press.
Lucke, J. (2005). "Rassling the hog": The influence of correlated item error on internal consistency, classical reliability, and congeneric reliability. Applied Psychological Measurement, 29, 106–125.
Mayer, R. (2000). Intelligence and education. In R. J. Sternberg (Ed.), Handbook of intelligence (pp. 519–533). Cambridge, United Kingdom: Cambridge University Press.
McDonald, R. (1985). Factor analysis and related methods. Hillsdale, NJ: Erlbaum.
Minneapolis Public Schools. (2004). Minneapolis kindergarten assessment. Minneapolis, MN: Minneapolis Public School Research, Evaluation and Assessment Division.
Muthén, L., & Muthén, B. (1998–2005). Mplus statistical analysis with latent variables: User's guide. Los Angeles, CA: Muthén & Muthén.
National Association for the Education of Young Children & National Council for Teachers of Mathematics. (2003). Early childhood mathematics: Promoting good beginnings. Retrieved from http://www.naeyc.org/about/positions/pdf/Mathematics_Exec.pdf
National Institute of Child Health and Human Development. (2000). Report of the National Reading Panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction: Reports of the subgroups (NIH Publication No. 00-4754). Washington, DC: U.S. Government Printing Office.
National Research Council. (2001). Adding it up: Helping children learn mathematics. Washington, DC: National Academy Press.
Pickart, M., Betts, J., Sheran, C., & Heistad, D. (2005). Minneapolis Kindergarten Assessment. Beginning and end of kindergarten assessment: Technical manual. Minneapolis, MN: Minneapolis Public Schools.
Reddy, S. (1992). Effects of ignoring correlated measurement error in structural equation models. Educational and Psychological Measurement, 52, 549–570.
Rummel, R. (1970). Applied factor analysis. Evanston, IL: Northwestern University Press.
Schumacker, R., & Lomax, R. (1996). A beginner's guide to structural equation modeling. Mahwah, NJ: Erlbaum.
Sternberg, R. (1997). Educating intelligence: Infusing the Triarchic theory into school instruction. In R. Sternberg & E. Grigorenko (Eds.), Intelligence, heredity, and environment (pp. 343–362). Cambridge, United Kingdom: Cambridge University Press.
Thomson, G. (1948). The factorial analysis of human ability. New York: Houghton Mifflin Co.
Wagner, R. (2000). Practical intelligence. In R. Sternberg (Ed.), Handbook of intelligence (pp. 380–395). Cambridge, United Kingdom: Cambridge University Press.
Yung, Y., Thissen, D., & McLeod, L. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64, 113–128.
Zimmerman, D., & Williams, R. (1977). The theory of test validity and correlated errors of measurement. Journal of Mathematical Psychology, 16, 135–152.
