
Child Development, March/April 2005, Volume 76, Number 2, Pages 397–416

Academic Self-Concept, Interest, Grades, and Standardized Test Scores:
Reciprocal Effects Models of Causal Ordering
Herbert W. Marsh
SELF Research Centre, University of Western Sydney

Ulrich Trautwein and Oliver Ludtke
Max Planck Institute for Human Development, Berlin, Germany

Olaf Koller
University of Erlangen-Nuremberg, Germany

Jurgen Baumert
Max Planck Institute for Human Development, Berlin, Germany

Reciprocal effects models of longitudinal data show that academic self-concept is both a cause and an effect of
achievement. In this study this model was extended to juxtapose self-concept with academic interest. Based on
longitudinal data from 2 nationally representative samples of German 7th-grade students (Study 1: N = 5,649, M
age = 13.4; Study 2: N = 2,264, M age = 13.7 years), prior self-concept significantly affected subsequent math
interest, school grades, and standardized test scores, whereas prior math interest had only a small effect on
subsequent math self-concept. Despite stereotypic gender differences in means, linkages relating these constructs were invariant over gender. These results demonstrate the positive effects of academic self-concept on a
variety of academic outcomes and integrate self-concept with the developmental motivation literature.

Academic self-concept, interest, and achievement are
interrelated, and stereotypic gender differences are
found in specific domains such as English and
mathematics. In the present investigation we went
beyond merely observing correlations at a single
point in time to attempt to disentangle the causal
mechanisms relating these constructs across multiple
waves of data collection. In a growing body of research covering a range of developmental periods,
researchers have used reciprocal effects models to
explore the causal ordering of academic achievement
and academic self-concept. The overarching rationale of this work is that people who perceive themselves to be more effective, more confident, and more
able will accomplish more than people who have less
positive self-beliefs (e.g., Marsh & Craven, in press).

These data come from two large-scale German projects directed
by Jurgen Baumert of the Max Planck Institute for Human Development: Learning Processes, Educational Careers and Psychosocial Development in Adolescence and the German component of
the Third International Mathematics and Science Study. The present investigation was conducted while Herbert Marsh was a
visiting scholar at the Center for Educational Research at the Max
Planck Institute for Human Development and was supported in
part by the University of Western Sydney, the Max Planck Institute, and the Australian Research Council.
Correspondence concerning this article should be addressed to
Herbert W. Marsh, Director, SELF Research Centre, University of
Western Sydney, Bankstown Campus, Locked Bag 1797 Penrith
South DC NSW 1797, Australia. Electronic mail may be sent to
h.marsh@uws.edu.au.

Unlike prior research that has focused on academic
self-concept as a causal factor, we also examined effects of academic interest, thus aligning our interests
more closely to mainstream motivation research in
developmental psychology.
Specifically, we focused on the role of gender on
self-concept, interest, and achievement in mathematics. There are substantial gender differences in mean
levels of these constructs (Beaton et al., 1996; Koller,
Baumert, & Schnabel, 2001; Marsh & Yeung, 1998;
Watt, 2004). However, a more complicated question is
the extent to which the relations among these constructs vary with gender over time. Thus, for example,
are high levels of prior math self-concept and math
interest more likely to lead to higher levels of subsequent attainment for girls or for boys? In our research
we integrated these issues from different research
traditions into a common methodological framework
of structural equation modeling (SEM) that has broad
applicability in development research.
Development of Academic Self-Concept and Its Relation
to Achievement
Developmental perspectives of self-concept. Self-concept, self-perceived competence, self-beliefs, and the
role of gender are important in developmental perspectives of motivation such as expectancy-value
© 2005 by the Society for Research in Child Development, Inc.
All rights reserved. 0009-3920/2005/7602-0007


Marsh, Trautwein, Ludtke, Koller, and Baumert

theory. Consistent themes have emerged from reviews of the development of competence self-beliefs
(Harter, 1990, 1992, 1998; Jacobs, Lanza, Osgood,
Eccles, & Wigfield, 2002; Marsh, 1989; Marsh, Craven,
& Debus, 1991, 1998; Marsh, Debus, & Bornholt, 2005;
Watt, 2004; Wigfield, 1994; Wigfield & Eccles, 1992).
With improved methodology (better measurement,
stronger applications of confirmatory factor analyses
[CFA]), researchers have demonstrated that even very
young children are able to differentiate between
different domains of self-concept (e.g., verbal, mathematics, physical ability, physical appearance, peer
relations, relations with parents). There is clear evidence for increasing differentiation among these domains through age 12 (Marsh, 1989; Marsh & Ayotte,
2003), but not for older children (Marsh, 1989).
Age and gender differences in mean levels of self-concept are generally small but systematic. Self-concept declines from a young age through adolescence, levels out, and then increases at least through
early adulthood (Marsh, 1989, 1993b; see also Crain,
1996; Jacobs et al., 2002; Marsh & Craven, 1997; Wigfield et al., 1997). There are also counterbalancing
gender differences consistent with gender stereotypes.
Consistent across preadolescent, adolescent, late-adolescent/young adult periods, males report higher physical ability, physical appearance, and math
self-concepts, whereas females report higher verbal
self-concepts (Marsh, 1989; see also Crain, 1996;
Wigfield et al., 1997). Contrary to gender intensification hypotheses, gender differences did not vary
substantially with age. Based on longitudinal growth
trajectories of children in Grades 1 through 12, Jacobs
et al. (2002) reported gender stereotypic differences
and age-related declines in competence perceptions
but concluded that their results were broadly consistent with Marsh's (1993b) findings of no age-related changes in gender differences in self-concept.
Most self-concept studies have focused on gender
and age differences in mean levels of self-concept but
not on factor structure differences, including relations among key constructs. Byrne and Shavelson
(1987), for example, concluded, "Clearly, interpretations of mean differences in SC [self-concept] between males and females are problematic unless the
underlying construct has the same structure in the
two groups" (p. 369). Hattie (1992) also emphasized
that the differences in means may not be as critical
in the development of self-concept as changes in
factor structure (pp. 177–178). Testing how relations among these constructs vary with gender and
age is even more complicated. Thus, for example,
Marsh (1993b) tested the gender-stereotypic model
that hypothesized that: (a) math self-concept would

be more highly correlated with academic self-concept and global self-esteem for boys than for girls, (b)
verbal self-concept would be more highly correlated
with academic self-concept and global self-esteem
for girls than for boys, and (c) the contrasting pattern
of results would intensify and increase with age.
Instead, however, he found support for the gender-invariant model in which relations among math,
verbal, academic, and general self-concepts did not
vary as a function of gender or age. More recently,
Watt (2004; see also Jacobs et al., 2002) demonstrated
that gender differences favoring boys for math and
English for girls showed little support for either
gender-intensification or -convergence hypotheses.
Academic self-concept and achievement: A reciprocal
effects model. The causal ordering of academic self-concept and academic achievement has important
theoretical and practical implications, and has been
the focus of considerable research. Byrne (1996) emphasized that much of the interest in the self-concept/achievement relation stems from the belief that
academic self-concept has motivational properties
such that changes in academic self-concept will lead
to changes in subsequent academic achievement.
Calsyn and Kenny (1977) contrasted self-enhancement and skill development models. According to
the self-enhancement model, academic self-concept
is a primary determinant of academic achievement
(ASC → ACH), whereas the skill development model
implies that academic self-concept emerges principally as a consequence of academic achievement
(ACH → ASC). However, Marsh and colleagues
(Marsh, 1990, 1993a; Marsh, Byrne, & Yeung, 1999;
Marsh & Craven, in press) argued that much of the
early research was methodologically unsound and
inconsistent with the academic self-concept theory.
Based on theory, a review of empirical research, and
methodological advances in SEM, Marsh argued for a
reciprocal effects model in which prior self-concept
affects subsequent achievement and prior achievement affects subsequent self-concept. In their meta-analysis of self-belief measures, Valentine, Dubois,
and Cooper (2004) also found clear support for a reciprocal effects model. They concluded that the effects of self-beliefs on subsequent performance were
stronger when the measure of self-belief was based
on domain-specific measures rather than global
measures, such as self-esteem, and when self-belief
and achievement measures were matched in terms of
subject area (e.g., mathematics achievement and math
self-concept) as is typical in self-concept research.
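
The three causal-ordering positions described above can be summarized as a pair of cross-lagged regressions over two waves. The notation here (the β coefficients, the residual terms, and the subscripts 1 and 2 for the waves) is an illustrative assumption rather than the article's own. It is a simplified sketch in observed-variable form, whereas the models discussed in the text are estimated among latent constructs.

```latex
\begin{align}
\mathrm{ASC}_2 &= \beta_{11}\,\mathrm{ASC}_1 + \beta_{12}\,\mathrm{ACH}_1 + \varepsilon_{\mathrm{ASC}} \\
\mathrm{ACH}_2 &= \beta_{21}\,\mathrm{ASC}_1 + \beta_{22}\,\mathrm{ACH}_1 + \varepsilon_{\mathrm{ACH}}
\end{align}
% Self-enhancement model:   \beta_{21} > 0,\ \beta_{12} = 0
% Skill development model:  \beta_{12} > 0,\ \beta_{21} = 0
% Reciprocal effects model: \beta_{21} > 0 \text{ and } \beta_{12} > 0
```

In this form, β11 and β22 are stability paths, and the cross-lagged paths β21 and β12 carry the causal-ordering question: each is the effect of one prior construct on the other subsequent construct after controlling for the prior level of the outcome.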
Academic achievement: School grades and standardized
test scores. Academic self-concept, interest, and related motivation constructs should be substantially

A Reciprocal Effects Model

correlated with both school grades and standardized
test scores. However, Wylie (1979) posited that self-concept should be more strongly related to school
grades than to test scores because school grades are a
more salient source of feedback to students that also
reflect motivational properties likely to be related to
self-concept (see also Hattie, 1992; Marsh, 1987, 1990,
1993a). Marsh (1987) extended this proposal to longitudinal causal modeling studies, suggesting that
paths from self-concept to achievement should be
stronger for school-based performance measures
than for low-stakes standardized achievement
measures. For low-stakes standardized tests, students have no opportunity and little incentive to
study for the tests. Hence, characteristics such as
study habits, effort, and persistence are unlikely to
affect test performance. In contrast, these characteristics are likely to have more impact on examination
performance when students are highly motivated to
perform well on an examination and know the content of the examination, or when these characteristics
are an actual part of the grading process, as is typically the case with school grades (e.g., students are
penalized for sloppy work habits or not completing
assignments on time but are rewarded for conscientious effort). Thus, the effects of prior self-concept on
subsequent achievement should be stronger when
achievement is based on high-stakes school grades
rather than low-stakes standardized tests (Marsh,
1987, 1990, 1993a; Marsh & Yeung, 1997a, 1998; but
see also Helmke & van Aken, 1995). Here, we extend
this hypothesis to include academic interest and
evaluate whether this pattern of results generalizes
over responses by boys and girls. This distinction
between school grades and test scores is also relevant
to the study of gender differences, as girls typically
do better than boys on school grades, which reinforce conscientious effort and penalize poor work
habits, compared with standardized test scores,
which are purer measures of learning (e.g., Marsh &
Yeung, 1998).
Developmental perspectives on the reciprocal effects
model. Young children's understanding of competence changes with age and, compared with older
children, their academic self-concepts are more positive and less related to objective outcomes (e.g.,
Marsh, 1989; Marsh & Craven, 1997; Marsh et al.,
1998). Wigfield and Karpathian (1991; see also Wigfield, 1994) further argued that once ability perceptions are more firmly established the relation likely
becomes reciprocal: Students with high perceptions of
ability would approach new tasks with confidence,
and success on those tasks is likely to bolster their
confidence in their ability (p. 255). Consistent with


these suggestions, Skaalvik and Hagtvet (1990) found
support for a reciprocal effects model for older students (sixth and seventh grades) but a skill development (ACH → ASC) model for younger students.
Marsh et al. (1999) also argued that although relations
between academic self-concept and achievement become stronger with age, there was insufficient evidence to determine whether the causal relations
between these variables change with age or whether
differences reflect underlying processes or researchers' inability to measure these constructs with young
children (see Marsh et al., 2005).
Guay, Marsh, and Boivin (2003) took this up, using
a multicohort multioccasion design (i.e., three age
cohorts: students in Grades 2, 3, and 4, each with
three measurement waves separated by 1-year intervals). They found that as children grew older,
their academic self-concept responses became more
reliable, more stable, and more strongly correlated
with academic achievement. However, the magnitude of these developmental differences was small. It
is important that there was stronger support for the
self-enhancement model (ASC → ACH) than for the
skill development model (ACH → ASC) for all three
age cohorts, and support for the reciprocal effects
model was invariant over age. This study provides
good support for the generalizability of reciprocal
effects to young children as well as adolescents.
Academic Interest and Achievement
Individual interest is hypothesized to be a relatively enduring predisposition to attend to certain
objects and activities, and is associated with positive
affect, persistence, and learning (Hidi & Ainley, 2002;
Koller et al., 2001; Krapp, 2000; Renninger, 2000).
Academic interests are postulated to be dispositions
based on mental schemata associating the objects of
interest with positive experiences and a personal
value system that are activated in the form of interest-driven actions. Whereas there is a theoretical distinction between the value (affective) and commitment
(importance) components of interest, researchers have
been unable to distinguish between these components
empirically (Koller et al., 2001). Interest-driven activities are characterized by the experience of competence and personal control; feelings of autonomy and
self-determination; positive emotional states; and,
under optimal circumstances, an experience of flow
whereby the person and the object of interest merge
(Csikszentmihalyi & Schiefele, 1993). Other motivational researchers (e.g., Wigfield & Eccles, 1992) posit
interest as one of the components of task value in an
expectancy-value framework.


Like academic self-concept, academic interest is
domain specific; there are stereotypic gender differences such that boys have more interest in math and
science whereas girls have more interest in verbal
areas, and there are gradual declines in interest levels before adolescence and in early adolescence (e.g.,
Eccles, Wigfield, & Schiefele, 1998). It is surprising,
however, that there is little research incorporating
both academic self-concept and academic interest
into longitudinal SEMs evaluating the reciprocal effects of these constructs on each other and on other
academic outcomes.
On the basis of their meta-analysis, Schiefele,
Krapp, and Winteler (1992) concluded that the overall
correlation between interest and academic achievement was about .30 but that this relation was heterogeneous across different school subjects and
indicators of achievement. In subsequent experimental studies, Schiefele (1996) demonstrated that
interest was a significant predictor of subsequent
achievement, mediated in part by activation; that is,
interest increased activation, which in turn led to
greater achievement. However, because most studies
in this area are cross-sectional studies based on correlations, Schiefele (1998) concluded that there is no
basis for drawing causal conclusions from this research or even to claim that interest predicts subsequent achievement beyond what can be predicted by
prior achievement.
Several authors have proposed that academic
achievement or academic self-concept affect interest
(e.g., Koller et al., 2001; Krapp, 2000). Marsh, Craven,
and Debus (2000) demonstrated that cognitive and
affective self-perceptions were highly correlated. In
her theoretical model of self-concept development,
Harter (1992, 1998) posited that students feel more
intrinsically motivated in domains in which they feel
competent. However, because her results were based
on cross-sectional data, stronger tests of her causal
ordering hypothesis require longitudinal designs
like those in tests of the reciprocal effects model. In
cognitive evaluation theory, Deci and Ryan (1985)
also hypothesized that increased perceptions of
competency lead to increased levels of intrinsic
motivation. Hence, Baumert, Schnabel, and Lehrke
(1998) suggested that the effect of achievement
on interest might be mediated through academic
self-concept. Based on responses by elementary
school children, Bouffard, Marcoux, Vezeau, and
Bordeleau (2003) reported that self-concept was
consistently related to achievement in reading and
mathematics at each year in school, whereas intrinsic
motivation did not contribute to the prediction of
achievement.

In her original expectancy-value model, Eccles
(1983) posited no links between expectations of
success and task value (including interest). However,
she hypothesized academic self-concept to affect
both expectations and value directly, and to affect
achievement-related choices indirectly through its
influences on expectations and value. Complicating
these predictions further, subsequent research (Eccles
& Wigfield, 1995; Wigfield & Eccles, 2002) indicated
that academic self-concept and expectations for success could not be distinguished empirically. Putting
together these different perspectives, expectancy-value theory posits academic self-concept to have a
causal effect on both academic interest and achievement, and academic interest to have an effect on academic achievement. However, reciprocal effects in
which prior achievement also affects subsequent interest and self-concept (mediated, perhaps, by other
constructs such as causal attributions) are also apparently consistent with expectancy-value theory.
Expectancy-value theory apparently does not, however, posit a direct effect of interest on self-concept.
In empirical research based on expectancy-value
theory, Eccles, Wigfield, and colleagues (Eccles, 1983;
Wigfield, 1994; Wigfield et al., 1997) showed that
correlations between self-perceived competency and
interest were evident for even very young children
but that the size of this relation increased with age
during early school years. Although self-perceived
competence was related to several different value
constructs in the expectancy-value model, the relations with interest were consistently strongest (Wigfield & Eccles, 2002). In particular, Wigfield et al.
(1997; see also Wigfield & Eccles, 2002) evaluated
patterns of relations between competence and interest with a multiwave multicohort study for children
ranging from second to sixth grades. Whereas competence perceptions were linked over time, as were
interest ratings, there were few cross-construct links
relating these two constructs over time. Where these
links did occur, they tended to be from prior competence perceptions to subsequent interest, thus
supporting expectancy-value predictions. For longitudinal growth trajectories of children in Grades 1
through 12, Jacobs et al. (2002) found positive relations between competency beliefs and task values
that generalized over domains and age. Consistent
with the expectancy-value assumption that competence causes task value, much of the variance in task
values was explained by competence perceptions.
However, Jacobs et al. also noted there might be reciprocal (bidirectional) effects between these two
constructs over time and argued for longitudinal
research that simultaneously considered competency


beliefs and task values. Hence, a primary aim of the
present investigation was to address this limitation in
existing research.
Koller et al. (2001) argued that the role of interest
was particularly relevant in mathematics because it
is perceived to be a difficult subject; thus, motivational factors are important for enhancing academic
achievement. In longitudinal research covering the
high school years (Grades 7, 10, and 12), mathematics interest in Grade 7 had no direct effect on
achievement in Grade 10. However, interest did have
an effect on coursework selection, which in turn had
an effect on achievement in Grade 12. Koller et al.
also found that interest in Grade 10 did have a direct
effect on achievement in Grade 12, suggesting that
interest became more important in later school years,
when the learning environment was not so highly
structured and intrinsic motivation played a more
important role in academic choice. The results may
also be consistent with Wigfield's (1994; Wigfield &
Eccles, 2002) conclusion that whereas prior competence perceptions are strong predictors of subsequent achievement, task values such as interest are
the better predictors of decisions to enroll in mathematics and English classes.
The Present Investigation
Despite the substantial overlap in the historical
development and juxtaposition of theoretical issues,
there have been few longitudinal causal-ordering
studies of relations among academic self-concept,
academic interest, and academic achievement, and
how these relations vary with gender. Marsh et al.
(1999) noted methodological limitations of tests of
the reciprocal effects model in self-concept research.
Particularly relevant to the present investigation,
they encouraged researchers to: (a) explore how effects associated with school grades and standardized
achievement test scores differed by contrasting both
achievement constructs in the same study, (b) incorporate additional motivational variables in these
SEMs to determine their role in the reciprocal effects
model, (c) consider a sufficiently large and diverse
sample to justify the use of SEMs, (d) evaluate the
generality of the findings across different subgroups
of respondents based on characteristics such as
gender, and (e) explore the implications of lags between different waves of data collection that varied
in length of time, particularly those within a single
academic year against those that span more than one
academic school year. Reciprocal effects between academic self-concept and achievement have been established but this research has not included academic


interest even though these constructs have been
juxtaposed in several theories. Hence, it is important
to evaluate the reciprocal effects among constructs
and academic achievement, using longitudinal designs and tests of causal ordering (Jacobs et al., 2002).
In the present investigation we pursued these theoretically important issues in two large studies of
math self-concept, interest, school grades, and
standardized test scores for nationally representative
samples of German Grade 7 students. Study 1 was
based on one cohort of students using two waves of
data collected in a single school year. Study 2 consisted of a different cohort of Grade 7 students collected approximately 3 years later, comprised two
waves spanning two different school years, and included both the interest measure used in Study 1 and
a new, potentially stronger measure of interest.
Study 1
Method
Data. Study 1 is based on the longitudinal study
Learning Processes, Educational Careers and Psychosocial Development in Adolescence and Young
Adulthood, which was conducted by the Max Planck
Institute for Human Development in Berlin, Germany. Data were collected from large, representative
samples from four German states in which schools
were selected randomly and two classes within each
school were sampled randomly. Most students (93%)
were German citizens and most were Caucasian
(95%). Because the sample was representative, it was
heterogeneous in relation to socioeconomic status.
The final sample used here consists of 5,649 seventh
graders (M age = 13.4 years, 54% females) who were
tested at two points during the same school year.
Classes were excluded that participated in the study
on only one of the two occasions or that had fewer than
10 responding students. (For more detailed descriptions of the study and resulting database, see Koller,
1998; see also http://www.biju.mpg.de).
Measures. Math self-concept and math interest
were measured on each occasion (see the Appendix
for the wording of the items). Scores on both measures were reliable (αs > .8) and have been shown in
previous research to have convergent and discriminant validity in relation to classroom-based performance in different school subjects (Baumert et al.,
1998). Standardized achievement in mathematics
was measured at Time 1 (T1) and Time 2 (T2). Math
achievement test items were taken from previous
national and international studies, in particular
the International Association for the Evaluation of


Educational Achievement (IEA) First and Second
International Mathematics Study. Although most of
the items had a multiple-choice format, approximately 15% had a short-answer format. The content of
the test items was based on mathematics curricula
covered in Grades 5 to 7. Previous analyses (see Koller, 1998) based on item response theory revealed that
a unidimensional model was appropriate for describing the latent variable underlying the test results.
Therefore, we used the total mathematics scores at
both points of measurement. The coefficient alpha
estimates of reliability were greater than .75 for both
measurement points. We characterized the test as a
low-stakes test because the results did not contribute
to the formal evaluation of individual students or
school grades, students did not expect to receive
feedback on their individual performance, and students had no incentive to study for the examination.
School grades in mathematics were based on self-reports at the end of sixth grade (but reported at the
start of seventh grade) and in the middle of seventh
grade (T2).
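
The coefficient alpha values reported above follow the standard formula α = [k / (k − 1)] × [1 − (sum of item variances) / (variance of the total score)], where k is the number of items. The following is a minimal sketch of that computation; the function name `cronbach_alpha` and the response matrix are hypothetical and are not data from the study.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the summed (total) score across respondents.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 students x 4 items (illustration only).
responses = [
    [4, 4, 3, 4],
    [2, 2, 1, 2],
    [3, 3, 3, 3],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because the same variance estimator is used in the numerator and the denominator, the choice between population and sample variance does not change the result.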
Statistical analysis. SEMs were estimated with
LISREL (version 8.54) using maximum likelihood estimation (for further discussion of SEM, see Bollen,
1989; Byrne, 1998; Joreskog & Sorbom, 1993; Kaplan,
2000). Following recommendations by Marsh and
Hau (1996; see also earlier discussion by Joreskog,
1979), correlated uniquenesses were included for the
matching items collected at T1 and T2 because their
exclusion would positively bias the corresponding
correlation estimates. Their inclusion, however, had
no substantively important effect on the pattern of
parameter estimates, suggesting that the inclusion of
correlated uniquenesses was not a critical issue. To
facilitate the substantive import of the results, only the
models with correlated uniquenesses are presented.
In most educational research based on school settings, individual student characteristics are potentially confounded with those associated with classes
or schools because individuals are not assigned
randomly to groups (Raudenbush & Bryk, 2002). Our
data had a multilevel or hierarchical structure in that
students were nested within classes. Because students within the same class were likely to be more
homogeneous than a truly random sample of students, standard errors of parameter estimates were
likely to be biased in the direction of being too small
and to result in inflated levels of Type I errors (i.e.,
false positives). Fortunately, this problem is typically
less serious for parameter estimates based on relations between variables than means and typically not
a serious problem for self-concept data where there
is little variation among classes (e.g., Marsh, Hau, &

Kong, 2002; Marsh & Rowe, 1996). Consistent with
this previous research, preliminary analyses in the
present investigation indicated that variance components associated with the self-concept and interest
scores were very small, varying between .05 and .10 in
both Studies 1 and 2. In the present investigation, we
dealt with this problem by constructing a pooled
within-class covariance matrix in which between-class
differences were controlled. We did this by centering
the means of all variables at the mean of the class from
which the case came (i.e., computing the deviation
between the raw score and the corresponding class
mean for that score; for further discussion, see Goldstein, 2003; Raudenbush & Bryk, 2002).
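
The class-mean centering step described above can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' procedure: it assumes complete data (the article handles missing data with FIML), and the field names `class_id`, `asc`, and `interest`, the helper names, and the records are all hypothetical.

```python
from collections import defaultdict


def center_within_class(records, variables):
    """Express each variable as a deviation from its class mean."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for rec in records:
        counts[rec["class_id"]] += 1
        for var in variables:
            sums[rec["class_id"]][var] += rec[var]
    centered = []
    for rec in records:
        cid = rec["class_id"]
        new = dict(rec)
        for var in variables:
            new[var] = rec[var] - sums[cid][var] / counts[cid]
        centered.append(new)
    return centered


def pooled_within_covariance(centered, x, y, n_classes):
    """Sum of within-class cross-products divided by N minus number of classes."""
    cross = sum(rec[x] * rec[y] for rec in centered)
    return cross / (len(centered) - n_classes)


# Hypothetical students in two classes (illustration only).
students = [
    {"class_id": "A", "asc": 3.0, "interest": 2.0},
    {"class_id": "A", "asc": 5.0, "interest": 4.0},
    {"class_id": "B", "asc": 1.0, "interest": 1.0},
    {"class_id": "B", "asc": 2.0, "interest": 3.0},
    {"class_id": "B", "asc": 3.0, "interest": 2.0},
]
centered = center_within_class(students, ["asc", "interest"])
cov = pooled_within_covariance(centered, "asc", "interest", n_classes=2)
print(cov)
```

After centering, every class has a mean of zero on each variable, so differences between classes no longer contribute to the covariances among the variables.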
Particularly for longitudinal data, the inevitable
missing data are a potentially important problem. In
Study 1, for example, approximately 12% of the responses were missing for the total sample of students who responded at either T1 or T2. In the
methodological literature on missing data (e.g.,
Graham & Hoffer, 2000; Little & Rubin, 1987), there is
a growing consensus that the imputation of missing
observations has several advantages over traditional
listwise, and particularly pairwise, deletion methods. In the present investigation, we explored a variety of alternative approaches to this problem but
chose to emphasize results based on the full information maximum likelihood (FIML) approach to
missing data (e.g., Allison, 2001). This approach
better represents the entire sample rather than just
the subsample of students who have no missing data
while still providing appropriate tests of statistical
significance that reflect the amount of missing data
for each variable. Whereas this distinction may be
less important for studies based on samples of convenience, it is important for samples specifically selected to be representative of a larger population, as
in the present investigation. For FIML analyses,
LISREL provides only the root mean square error of
approximation (RMSEA) to evaluate goodness of fit.
Whereas tests of statistical significance and indexes
of fit aid in evaluating the fit of a model, there is
ultimately a degree of subjectivity and professional
judgment in the selection of a best model (Marsh,
Balla, & McDonald, 1988). Hence, our main focus is
on the evaluation of parameter estimates. Although
we argue a priori for the superiority of the FIML
approach that is the focus of our presentation, it is
important to emphasize that the results from the
FIML analyses were very similar to unreported
analyses based on imputation with expectation
maximization, as well as to other approaches that we
explored. In particular, both analyses resulted in
fully proper solutions, well-defined factors with


substantial factor loadings, standardized parameter
estimates that were very similar, and satisfactory
goodness of fit.
In CFA studies with multiple groups, it is possible
to test the invariance of any one, any set, or all parameter estimates across the multiple groups. Tests
of factorial invariance (see Bollen, 1989; Byrne, 1998;
Joreskog & Sorbom, 1993; Marsh, 1994) traditionally
posit a series of nested models in which the endpoints are the least restrictive model with no invariance constraints and the most restrictive (total
invariance) model with all parameters constrained to
be the same across all groups. Testing for factor invariance essentially involves comparing a number of
models in which aspects of the factor structure are
systematically held invariant across groups (males
and females in the present investigation), and assessing fit indexes when elements of these structures
are constrained. If the introduction of increasingly
stringent invariance constraints results in little or no
change in goodness of fit, there is evidence in support of the invariance of the factor structure. In
general, the minimal condition for factorial invariance is the equivalence of all factor loadings in the
multiple groups (e.g., Bollen, 1989; Byrne, 1998;
Joreskog & Sorbom, 1993; Marsh, 1994), but our main
focus is on the invariance of correlations among the
latent constructs and path coefficients relating T1
constructs to T2 constructs.
Within this framework, it is also possible to extend
tests of invariance of mean and covariance structures
to include measured variable intercepts and latent
mean differences in the constructs (Byrne, 1998;
Kaplan, 2000; Joreskog & Sorbom, 1993; Marsh &
Grayson, 1994). Adapting terminology from item
response theory (Marsh & Grayson, 1994), each
measured variable (t) is related to the latent construct
(T) by the equation t = a + bT, where b is the slope (or
discrimination) parameter that reflects how changes
in the observed variable are related to changes in the
latent construct and a is the intercept (or difficulty)
parameter that reflects the ease or difficulty of getting high manifest scores for a particular measured
variable. Unless there is complete or at least partial
invariance of both the a and b parameters across the
multiple groups, the comparison of mean differences
across the groups may be unwarranted. Following
Meredith (1993), we adapt the terms strong and strict
invariance. Strong invariance holds when factor
loadings and measured variable intercepts are invariant across groups so that between-group differences in average item scores reflect differences in
latent means. Strict invariance holds when measured
variable uniquenesses also are invariant across
groups so that item and scale variances are comparable across groups. Although latent mean differences are typically tested with CFAs, it is also
possible to test for differences with an SEM in which
subsequent mean differences are corrected for differences in variables occurring earlier in the causal
ordering of longitudinal data (Marsh & Grayson,
1990). In the present investigation, for example, we
tested for latent mean differences between responses
by boys and girls in four constructs (self-concept,
interest, grades, test scores) at T1 and T2 in a CFA
model, and then evaluated differences at T2 controlling for differences at T1 in an SEM.
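The role of the intercept and slope parameters in comparing group means can be made concrete with a small numerical sketch of the equation t = a + bT. All values below are hypothetical and chosen only for illustration; they are not estimates from either study, and `item_mean` is an illustrative helper name.

```python
def item_mean(a: float, b: float, latent_mean: float) -> float:
    """Expected score on a measured variable under t = a + b*T."""
    return a + b * latent_mean


# Hypothetical parameter values (not taken from the study).
b = 0.8                      # slope (factor loading), equal in both groups
latent_boys = 0.4            # latent mean for boys
latent_girls = 0.0           # latent mean for girls

# Strong invariance: intercept a and slope b are the same in both groups,
# so the gap in item means is exactly b times the latent mean gap.
gap_invariant = item_mean(2.0, b, latent_boys) - item_mean(2.0, b, latent_girls)

# Non-invariant intercepts with identical latent means: the item-level gap
# reflects only the intercept difference, not a difference in the construct.
gap_biased = item_mean(2.3, b, 0.0) - item_mean(2.0, b, 0.0)

print(f"gap with invariant intercepts:     {gap_invariant:.2f}")
print(f"gap with non-invariant intercepts: {gap_biased:.2f}")
```

In the second case the groups do not differ on the latent construct at all, which is why invariance of both a and b is treated as a precondition for interpreting mean differences.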
Results and Discussion
We began by evaluating the complete SEM that
included T1 and T2 measures of math self-concept,
interest, school grades, and test scores (see Table 1).
Math self-concept and interest were both positively
correlated with math grades and test scores (Table 1).
However, consistent with a priori predictions, both
self-concept and interest were systematically more
highly correlated with school grades (about .40 for
self-concept and .22 for interest) than with standardized test scores (about .30 for self-concept and .15
for interest). It is also important to note that correlations of self-concept with achievement were more
positive than the corresponding correlations between interest and achievement. Finally, the pattern
of these results and actual sizes of the correlations
were consistent across T1 and T2 responses.
In the evaluation of path coefficients relating T1
and T2 constructs, we juxtaposed the results of several models. In each case we began with the overall
model that contained all four constructs (self-concept, interest, grades, test scores). However, because
there were positive correlations among all four constructs, multicollinearity might obscure the pattern
of results. Thus, for example, it would be possible for
both self-concept and interest to have significant effects on achievement when considered separately
but for the effects of neither construct to be statistically significant when both are considered simultaneously (i.e., the unique effect of neither is significant
when the effects of each are controlled for the effects
of the other). For this reason, we also conducted a
series of supplemental models in which we evaluated the causal ordering among various pairs of
constructs (see Table 2 and Figure 1) that are more
like traditional causal models used in previous reciprocal effects research.
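The way correlated predictors can mask one another may be illustrated with the textbook two-predictor formula for standardized regression weights, applied to three latent correlations from Table 1 (Study 1). This is a deliberately simplified sketch: it ignores prior grades and test scores, so it is not the authors' structural equation model and its weights are not expected to reproduce the reported path coefficients.

```python
def standardized_betas(r_y1: float, r_y2: float, r_12: float) -> tuple[float, float]:
    """Standardized weights for two correlated predictors of one outcome."""
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Latent correlations as read from Table 1 (Study 1).
r_selfconcept_grade = 0.44   # T1 math self-concept with T2 math grades
r_interest_grade = 0.25      # T1 math interest with T2 math grades
r_selfconcept_interest = 0.56

b_sc, b_int = standardized_betas(
    r_selfconcept_grade, r_interest_grade, r_selfconcept_interest
)
print(f"self-concept: {b_sc:.2f}, interest: {b_int:.2f}")
```

In this simplified setup, interest correlates .25 with later grades, yet its unique weight is close to zero once self-concept is controlled, which is the kind of pattern the supplemental pairwise models are meant to expose.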
Math self-concept and achievement. Consistent with
a priori predictions and previous research, reciprocal


Table 1
Factor Solution Relating Academic Self-Concept, Interest, School Grades, and Test Scores at Times 1 and 2 in Study 1: Full Information Maximum Likelihood Estimation

          Time 1 constructs           Time 2 constructs
          MASC   MInt   MGrd   MTst   MASC   MInt   MGrd   MTst
Factor loadings
T1MASC1   .63a
T1MASC2   .77
T1MASC3   .80
T1MASC4   .63
T1MASC5   .80
T1MINT1          .62a
T1MINT2          .75
T1MINT3          .80
T1MINT4          .62
T1MGrade                1.00a
T1Mtest                        1.00a
T2MASC1                               .72a
T2MASC2                               .84
T2MASC3                               .86
T2MASC4                               .73
T2MASC5                               .84
T2MINT1                                      .65a
T2MINT2                                      .78
T2MINT3                                      .82
T2MINT4                                      .65
T2MGrade                                            1.00a
T2MTst                                                     1.00a
Path coefficients
T2MASC    .57    .04    .03    .06
T2MInt    .07    .55    .00    .02
T2MGrd    .24    .01    .35    .15
T2MTst    .09    .02    .17    .40
Residual variances/covariances
T1MASC    1.00
T1MInt    .56    1.00
T1MGrd    .41    .22    1.00
T1MTst    .32    .15    .35    1.00
T2MASC                                .61
T2MInt                                .16    .65
T2MGrd                                .09    .06    .66
T2MTst                                .07    .03    .10    .71
Correlations (b)
T1MASC    1.00
T1MInt    .56    1.00
T1MGrd    .41    .22    1.00
T1MTst    .32    .15    .35    1.00
T2MASC    .62    .37    .29    .25    1.00
T2MInt    .39    .59    .16    .12    .41    1.00
T2MGrd    .44    .25    .50    .35    .38    .23    1.00
T2MTst    .29    .17    .35    .49    .28    .15    .37    1.00

Note. All variables were given a label that identifies the Time (T1 or T2), the construct (MASC = math self-concept, MInt = math interest,
MGrd = math grade, or MTst = math test), and, for the multiple indicators of each latent construct, the item number. All parameter estimates are presented in completely standardized form. Not presented are the uniquenesses and correlated uniquenesses. Although the
full information maximum likelihood chi-square of 3516.3 (df = 176) was highly significant because of the large sample size, the root mean
square error of approximation (RMSEA) of .058 demonstrated that the model was able to fit the data well. For comparison purposes, the
same model was estimated, in which missing values were imputed using the expectation maximization algorithm. The parameter estimates were nearly identical to those presented here and the goodness-of-fit statistics indicated a good fit to the data (normal theory
weighted least squares χ² = 4250.681, df = 176, RMSEA = .064, non-normed fit index = .965, comparative fit index = .973, standardized root
mean square residual = .0383).
a. In the unstandardized model, the first indicator of each construct was fixed to 1.0 to fix the metric of the factor.
b. Factor correlations are based on the equivalent confirmatory factor analysis model in which all constructs are correlated. Time 1 correlations are the same as in the structural equation model (see residual variance and covariance estimates) but differ from Time 2 estimates in
that the effects of Time 1 constructs are not partialed out of the correlations, whereas they are partialed out for the Time 2 residual
variances and covariances.
p < .05. p < .001.



Table 2
Path Coefficients Relating Time 1 Constructs (Math Self-Concept, Interest, School Grades, Standardized Test Scores) to Corresponding Time 2 Constructs for Alternative Structural Equation Models Considered in Studies 1 and 2

            Time 1 constructs
Time 2      T1MASC   T1MInt   T1MGrd   T1MTst
1. Study 1
T2MASC      .57      .04      .03      .06
T2MInt      .07      .55      .00      .02
T2MGrd      .24      .01      .35      .15
T2MTst      .09      .02      .17      .40
2. Study 2, subject-specific interest
T2MASC      .55      .07      .08      .03
T2MInt      .12      .51      .04      -.03
T2MGrd      .26      .01      .37      .06
T2MTst      .16      -.01     .14      .35
3. Study 2, domain-specific interest
T2MASC      .55      .09      .08      .03
T2MInt      .16      .51      .01      .00
T2MGrd      .26      -.01     .37      .07
T2MTst      .16      -.01     .14      .35
4. Study 2, combined interest
T2MASC      .54      .09      .08      .03
T2MInt      .13      .52      .04      -.02
T2MGrd      .26      .00      .37      .06
T2MTst      .16      -.01     .14      .35

Note. All variables were given a label that identifies the Time (T1 or
T2), the construct (MASC = math self-concept, MInt = math interest, MGrd = math grade, or MTst = math test). All models were
based on full information maximum likelihood. All parameter
estimates are presented in completely standardized form. Results
for Model 1 (see Table 1) and Model 3 (see Table 3) are presented in
more detail and are the main focus of the present investigation,
whereas critical parameter estimates for the alternative models are
presented here for comparison purposes.
p < .05. p < .001.

effects were found between math self-concept and
achievement in the overall model (see Table 1 and
Figure 1). It is not surprising that the strongest effect
of T1 math self-concept was on T2 math self-concept.
However, the effect of T1 math self-concept was also
statistically significant for both T2 math grades (.24)
and T2 math test scores (.09), even after controlling
for the effects of other T1 measures (interest, grades,
test scores). Also consistent with a priori predictions,
the effects of T1 math self-concept were greater for
T2 school grades than for T2 test scores. The effects
of T1 achievement on T2 self-concept were
smaller than the effects of T1 self-concept on T2
achievement. Whereas the effects of T1 math test
scores on T2 math self-concept were small (.06) but
highly significant (p < .001), the effects of T1 math
grades were even smaller (.03) and marginally
nonsignificant (.10 > p > .05). Although these results
support the reciprocal effects model, the effects of
prior self-concept on subsequent achievement were
stronger than the corresponding effects of prior
achievement on subsequent self-concept.

Figure 1. Structural equation model paths relating Time 1 (T1) to
Time 2 (T2) constructs. Stability (horizontal) paths between
matching T1 and T2 constructs are presented in gray and all paths
between nonmatching constructs are presented in black. The first
coefficient in each box is based on Study 1 results (see also Table 1)
and the second is based on Study 2 results (see also Table 5). Only
statistically significant paths are presented (except where a path
was significant in only one of the two studies, in which case the
nonsignificant path is presented with an asterisk). MSC = math
self-concept, MInt = math interest, MGrd = math grade, MTest =
math test scores.
In supplemental analyses, we constructed separate
models to evaluate the reciprocal effects of math self-concept with math grades and with math test scores
(excluding math interest; see Figures 2.1 and 2.2 for
Study 1). Although the pattern of results in these
supplemental models was the same as in the overall
model, the effects were stronger. Thus, for example,
the effects of T1 math self-concept were .15 and .28
for the models of the test scores and grades, respectively (compared with .09 and .24 in the overall
model). Also, the .04 effect of T1 grades on T2 math
self-concept was statistically significant (p < .01),
whereas it was marginally nonsignificant in the
overall analysis. Hence, results of the supplemental
analyses were consistent with those based on the
overall model and supported predictions based on
the reciprocal effects model of self-concept and
achievement.

Figure 2. Structural equation model paths relating Time 1 (T1) to Time 2 (T2) constructs. Separate models were fitted to selected pairs of
constructs. Stability (horizontal) paths are presented in gray and all statistically significant paths between different constructs are presented in black. The first coefficient in each box is based on Study 1 results (see also Table 1) and the second is based on Study 2 results (see
also Table 5). Only statistically significant paths are presented (except where a path was significant in only one of the two studies, in which
case the nonsignificant path is presented with an asterisk). MSC = math self-concept, MInt = math interest, MGrd = math grade,
MTest = math test scores.
Panels: 1: Self-Concept and Grades; 2: Self-Concept and Test Scores; 3: Interest and Grades; 4: Interest and Test Scores; 5: Self-Concept and Interest.

Math interest and achievement. Although math interest was correlated with math achievement, there
was no support for any reciprocal effects between the
two constructs based on the overall model (Table 1).
Whereas T1 math interest had a substantial effect on
T2 math interest, it had no significant effects on either T2 math test scores (.02) or T2 math grades (.01).
Similarly, influences on T2 math interest were not
statistically significant for either T1 math test scores
(.02) or T1 math grades (.00). However, in different
supplemental analyses that considered only pairs of
the four constructs, some of these effects were statistically significant. In particular, there were statistically significant effects of T1 math interest on T2
math grades (.15; Figure 2.3) and T2 math test scores
(.09; Figure 2.4). Hence, some of the small effects
associated with math interest in these supplemental
analyses were lost in the more demanding overall
model in which the effects of all four T1 constructs
were controlled. Although the effects were small, the
supplemental analyses suggest that the effects of T1
interest on subsequent achievement were stronger
than the effects of T1 achievement on subsequent
T2 interest.
Math self-concept and interest. The evaluation of the
causal ordering of academic self-concept and interest
is apparently unique to the present investigation.
Based on the full model (Table 1), there was some
support for the reciprocal effects of math self-concept
and math interest. The effect of T1 math self-concept
on T2 math interest (.07) was highly significant,
whereas the effect of T1 math interest on T2 math
self-concept (.04) was marginally significant. In the
separate analysis of the self-concept and interest
constructs (Figure 2.5), the effect of T1 math self-concept
on T2 math interest (.09) was slightly more
positive than in the overall analysis (Table 1). It is
interesting that the effect of T1 math interest on T2
math self-concept was slightly lower (.036 vs. .039)
and was marginally nonsignificant (p = .051). Hence,
whereas there was consistent support for the effect of
prior self-concept on subsequent interest, the evidence
supporting the effect of prior interest on subsequent
self-concept was marginal.
Gender differences. We now extend the results to
test the generalizability of the results across gender,
exploring two sets of related questions. First, do the
reciprocal effects of math self-concept, interest, and
achievement vary as a function of gender? To evaluate
this question we tested the invariance of the full
factor model with all four constructs (math self-concept,
interest, grades, test scores), but our main
focus was on the invariance of the path coefficients
relating T1 and T2 constructs. Second, are there
gender differences in mean levels of the four constructs?
As emphasized earlier, advances in SEM
allow us to incorporate both questions into a latent
factor model such that inferences about means are
based on latent mean differences derived from an
appropriate factor structure.
First, to evaluate the invariance of the SEM across
gender, we pursued a traditional two-group analysis
in which we constrained various sets of parameter
estimates to be invariant over gender. The invariance
of factor loadings is typically considered the minimal
condition for factorial invariance. In the present
investigation, we compared RMSEA indexes for models
with a variety of different sets of invariance constraints
ranging from model MG1 (no invariance
constraints for any parameter estimates) to the most
restrictive model MG9 (all parameter estimates,
that is, factor loadings, factor variances and covariances,
factor path coefficients, and measured variable
uniquenesses, constrained to be the same in solutions
for males and females). Although many models
were considered, the results are easy to summarize.
RMSEA values improved progressively for each set of
invariance constraints such that the best model (with
lowest RMSEA value) was the model imposing complete
invariance across gender (MG9 in Table 3).

Table 3
Structural Equation Models of Gender Invariance of Factor Structure and Latent Means: Fit of Alternative Models

               Study 1                        Study 2
Model   df     χ²     RMSEA   95% CI          χ²     RMSEA   95% CI          Invariance constraints
Invariance of factor structure
MG1     352    3759   .060    (.058-.061)     1387   .051    (.048-.054)     FL = Free, FV/CV = Free, Uniq = Free, PC = Free
MG2     366    3825   .059    (.057-.061)     1432   .051    (.048-.054)     FL = Inv, FV/CV = Free, Uniq = Free, PC = Free
MG3     382    3882   .058    (.057-.060)     1467   .050    (.047-.053)     FL = Inv, FV/CV = Free, Uniq = Free, PC = Inv
MG4     386    4018   .059    (.057-.061)     1548   .051    (.049-.054)     FL = Inv, FV/CV = Inv, Uniq = Free, PC = Free
MG5     393    3978   .058    (.056-.060)     1517   .050    (.048-.053)     FL = Inv, FV/CV = Free, Uniq = Inv, PC = Free
MG6     402    4071   .058    (.056-.060)     1582   .051    (.048-.054)     FL = Inv, FV/CV = Inv, Uniq = Free, PC = Inv
MG7     409    4040   .057    (.056-.059)     1553   .050    (.047-.052)     FL = Inv, FV/CV = Free, Uniq = Inv, PC = Inv
MG8     413    4188   .058    (.057-.060)     1650   .051    (.049-.054)     FL = Inv, FV/CV = Inv, Uniq = Inv, PC = Free
MG9     429    4244   .057    (.056-.059)     1684   .051    (.048-.054)     FL = Inv, FV/CV = Inv, Uniq = Inv, PC = Inv
Invariance of latent means
MG10    429    4244   .057    (.056-.059)     1684   .051    (.048-.054)     FL = Free, FV/CV = Free, Uniq = Free, PC = Free, Inter = Free, LFMD = zero
MG11    443    4286   .056    (.055-.058)     1737   .051    (.048-.053)     FL = Free, FV/CV = Free, Uniq = Free, PC = Free, Inter = Inv, LFMD = Free
MG12    451    4665   .059    (.057-.060)     1919   .054    (.051-.056)     FL = Free, FV/CV = Free, Uniq = Free, PC = Free, Inter = Inv, LFMD = zero

Note. In each model, factor structures for responses by males and females were compared subject to constraints that some parameter
estimates were the same (Inv = invariant) in the two solutions or were unconstrained and freely estimated in the two solutions (Free).
χ² = full information chi-square; RMSEA = root mean square error of approximation; 95% CI = 95% confidence interval about the RMSEA;
FL = factor loading; FV/CV = factor variance/covariance matrix; Uniq = measured variable uniqueness; PC = path coefficient;
LFMD = latent factor mean differences (between males and females).


Hence, consistent with a priori predictions, the results provide strong support for the generalizability
of results of Study 1 over gender.
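The comparison logic described above can be sketched by tabulating the Study 1 RMSEA values from Table 3 and checking whether any constrained model fits worse than the unconstrained baseline (MG1). The decision rule used here, that a constrained model is acceptable when its RMSEA does not exceed the baseline, is a simplifying assumption for illustration rather than a criterion stated by the authors.

```python
# RMSEA values for Study 1, models MG1 to MG9, as reported in Table 3.
rmsea_study1 = {
    "MG1": 0.060, "MG2": 0.059, "MG3": 0.058,
    "MG4": 0.059, "MG5": 0.058, "MG6": 0.058,
    "MG7": 0.057, "MG8": 0.058, "MG9": 0.057,
}

baseline = rmsea_study1["MG1"]  # no invariance constraints

# Change in RMSEA relative to the baseline; negative means better fit.
changes = {model: round(value - baseline, 3) for model, value in rmsea_study1.items()}

# Assumed rule: flag any constrained model whose RMSEA exceeds the baseline.
worse = [model for model, delta in changes.items() if delta > 0]

for model, delta in changes.items():
    print(f"{model}: RMSEA change vs. MG1 = {delta:+.3f}")
print("models fitting worse than MG1:", worse)
```

No constrained model is flagged, and the fully invariant model (MG9) shows a change of -.003, consistent with the conclusion that the factor structure generalizes over gender.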
Second, although tangential to our main focus, we
evaluated gender differences in the latent means for
factors representing our four constructs at T1 and T2.
In pursuing this issue, we began with the model of
complete invariance of the latent factor structure and
evaluated whether item intercepts were invariant
over gender. Item intercepts reflect the difficulty of
an item in the sense that students at a given level of
the underlying latent construct (e.g., math self-concept) will give systematically more positive responses to easy items and systematically less
positive responses to difficult items. At least reasonable support for the invariance of item intercepts
across males and females is typically taken to be the
minimal condition for valid comparisons of mean
differences. Unless there is such support, differences
on latent means are not consistent across items used
to define the factor. Fortunately, a comparison of the
RMSEAs for models MG10 and MG11 provided
good support for the invariance of item intercepts.
Next, we compared models in which latent means
are constrained to be equal across responses by
males and females. In contrast to all other tests of
invariance over gender, comparison of these models
(MG11 and MG12 in Table 3) indicated that the latent
means differed for males and females.
To evaluate the nature of these latent mean differences (Table 4), we compared responses for males
and females on the four T1 constructs, the corresponding four T2 constructs without controlling for
T1 constructs, and the corresponding four T2 constructs after controlling for T1 constructs. In general,
males had substantially higher math self-concepts
and interest at T1 and T2. Controlling for T1 constructs largely, but not completely, eliminated differences in these two constructs at T2. For math test
scores, males scored higher than did females at T1,
but their advantage was much smaller at T2 so that
after controlling for T1 scores females actually did
slightly better than males at T2. For math grades,
there were no significant differences between males
and females at T1, and girls performed marginally
better than did males at T2.
In summary, tests of gender differences largely
supported a priori predictions that males would have
substantially higher math self-concept and interest
scores and moderately higher math test scores.
However, also consistent with a priori predictions
based on the gender-invariant model, the pattern of
path coefficients linking T1 constructs to T2 constructs
was remarkably similar for male and female students.

Table 4
Latent Mean Differences for Males and Females in Four Math Constructs (Positive Values Reflect Higher Scores for Males)

                     Time 1           Time 2           Time 2
                     (no control)     (no control)     (control)
                     Mn      SE       Mn      SE       Mn      SE
Study 1
Math self-concept    .46     .03      .39     .03      .12     .03
Math interest        .29     .03      .29     .03      .10     .03
Math grades          .00     .03      -.08    .03      -.08    .02
Math test scores     .26     .03      .10     .03      -.06    .02
Study 2
Math self-concept    .46     .05      .34     .04      .06     .04
Math interest        .27     .05      .35     .05      .21     .05
Math grades          .16     .04      .06     .05      -.17    .04
Math test scores     .33     .04      .27     .04      .06     .02

Note. Completely standardized mean differences (based on model
MG12 with factor loading, factor variances/covariances, path coefficients, measured variable uniquenesses, and measured variable intercepts all invariant over solutions by males and females,
but latent factor means freely estimated). Results based on Time 2
responses are presented without controlling for corresponding
Time 1 factors (no control) and controlling for Time 1 responses
consistent with the a priori path model (control).
p < .05.

Study 2
The purpose of Study 2 was to evaluate the replicability of results from Study 1 and the generalizability
of results across responses by students in two different school years to two different interest measures
based on data from the German component of the
Third International Mathematics and Science Study
(TIMSS; Baumert et al., 1997). Unlike the international TIMSS study, the German component was
longitudinal in that math achievement, interest, and
self-concept were collected in both Grades 7 and 8.
Study 2 contained the same math self-concept and
interest items used in Study 1 and similar measures
of math achievement (standardized test scores and
school grades). It differed in that data were based on
responses by students in Grades 7 and 8 so that the
data collection waves were separated by a full academic year (Study 1 was based on responses from
two occasions in Grade 7) and data in Study 2 were
collected about 3 years after those in Study 1. In
addition, two measures of interest were used in
Study 2. To maintain comparability with Study 1, the
same interest items were included in Study 2.
However, a separate measure of interest was also
included based on subsequent theoretical work by
Krapp, Schiefele, and colleagues (e.g., Krapp, 2000;
Krapp, Prenzel, & Schiefele, 1986; Schiefele, 1996,
1998; Schiefele et al., 1992) and others (e.g., Wigfield
& Eccles, 1992), as described by Koller et al. (2001).
The interest measure used in both Studies 1 and 2
referred to the mathematics course in which students
were currently enrolled (class-specific interest),
whereas interest in the second measure referred to
interest in the mathematics domain more generally
(domain-specific interest; see the Appendix for the
wording of items from both measures). Particularly
given the weak effects associated with math interest
in Study 1, it is important to evaluate the generalizability of these effects with potentially stronger
measures of interest.

Method
Data. Study 2 was based on a sample of German
Grade 7 students who participated in the TIMSS
(Baumert et al., 1997; Beaton et al., 1996). The sample
was nationally representative with respect to region,
school type, and gender. The German TIMSS study
in Grades 7 and 8 contained some national extensions compared to the international design. The
longitudinal, nationally representative sample consisted of 128 randomly selected schools in which one
class per school was sampled randomly. Most students (90%) were born in Germany, including 94%
with German citizenship (including 3.4% with dual
citizenship). Students spoke German at home always
or almost always (88%) or sometimes (10%), and
most were Caucasian (95%). Because the sample was
representative, it was heterogeneous in relation to
socioeconomic status. The final sample in the present
investigation included a total of 2,264 students (50%
female, M age in Grade 7 = 13.7 years) who were
tested at two points. Excluded from this final sample
were schools that participated in the study on only
one of the two occasions, or for which there were fewer
than 10 students who responded.
Measures. The math self-concept and class-specific interest measures were the same as those used
in Study 1, whereas the domain-specific interest
measure was based on a new five-item scale (see the
Appendix). Math achievement in Grade 8 was
measured by 158 items that were part of the official
TIMSS item base. The items were distributed over
eight booklets. Each booklet contained between 30
and 40 items, some specific to that booklet and some
anchor items common to all booklets. Students
worked on one booklet each, thus allowing broad
subject matter coverage without student exhaustion.
All items were checked for curricular validity. Six
content areas and four performance categories were
covered. All responses were scaled using item response techniques (see Ludtke, Koller, Marsh, &
Trautwein, in press; see also Beaton et al., 1996). The
36 math items in Grade 7 (T1) were taken from
previous studies by the International Association for
the Evaluation of Educational Achievement, in particular from the First and Second International
Mathematics Study (cf. Husen, 1967; Robitaille &
Garden, 1989) and from an earlier investigation
conducted by the Max Planck Institute for Human
Development. Several items used at T1 were also
administered in the official TIMSS study 1 year later.
We were thus able to build a common achievement
metric for T1 and T2 using item response theory
applications.
Statistical analysis. As in Study 1, SEMs included
correlated uniquenesses for matching items in the
longitudinal data, data were centered within each
class, FIML was used because of small amounts of
missing data (approximately 3.4% of the responses
were missing for the total sample of students who
responded at either Time 1 or Time 2), and gender
differences were evaluated in the pattern of causal
relations among the four constructs measured at T1
and T2 and in the latent mean responses by boys and
girls.
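Centering within each class can be sketched as follows. The exact procedure the authors used is not described here, so this assumes ordinary group-mean centering, in which each student's score is expressed as a deviation from the mean of his or her class. The function name and the data are hypothetical.

```python
from collections import defaultdict


def center_within_class(records):
    """Subtract each class mean from the scores of students in that class.

    records: list of (class_id, score) pairs.
    Returns a list of (class_id, centered_score) pairs in the same order.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for class_id, score in records:
        totals[class_id] += score
        counts[class_id] += 1
    means = {class_id: totals[class_id] / counts[class_id] for class_id in totals}
    return [(class_id, score - means[class_id]) for class_id, score in records]


# Hypothetical scores for two classes (illustration only).
data = [("A", 3.0), ("A", 5.0), ("B", 1.0), ("B", 2.0), ("B", 3.0)]
centered = center_within_class(data)
print(centered)
```

After centering, every class has a mean of zero, so relations among the variables reflect differences between students within the same class rather than differences between classes.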
Results and Discussion
As in Study 1, we began with an evaluation of the
complete SEM that included T1 and T2 measures of
math self-concept, math interest, math school grades,
and math test scores. Separate models were evaluated using class-specific measures of math interest
that are directly comparable to Study 1 (Table 5),
domain-specific measures of math interest, and
models with both class and domain-specific measures of interest. In preliminary analyses, results
based on the class- and domain-specific measures of
interest were similar; therefore, we focus on the
class-specific measures of interest that are most
comparable to those used in Study 1. In a subsequent
model that contained both class- and domain-specific measures of interest as separate constructs, the
two latent interest constructs were so highly correlated at each occasion (approximately .9 at T1 and T2)
that their separate effects could not be reliably distinguished (because of multicollinearity). On this
basis we also tested a model in which both class- and
domain-specific interest items reflected a single interest factor, but the results were essentially the same
as the separate models based on class- and domain-specific measures of interest. To facilitate presentation


Table 5
Factor Solution Relating Academic Self-Concept, Interest, School Grades, and Test Scores at Times 1 and 2 in Study 2: Full Information Maximum
Likelihood Estimation
Time 1 constructs
MASC
Factor loadings
T1MASC1
.57a
T1MASC2
.74
T1MASC3
.82
T1MASC4
.51
T1MASC5
.83
T1MINT1
T1MINT2
T1MINT3
T1MINT4
T1MGrade
T1Mtest
T2MASC1
T2MASC2
T2MASC3
T2MASC4
T2MASC5
T2MINT1
T2MINT2
T2MINT3
T2MINT4
T2MGrade
T2MTst
Path coefficients
T2MASC
.55
T2Mint
.12
T2MGrd
.26
T2MTst
.16
Residual variances/covariances
T1MASC
1.00
T1Mint
.58
T1MGrd
.52
T1MTst
.36
T2MASC
T2Mint
T2MGrd
T2MTst
Correlationsb
T1MASC
1.00
T1Mint
.58
T1MGrd
.52
T1MTst
.36
T2MASC
.64
T2MInt
.43
T2MGrd
.47
T2MTst
.35

MInt

MGrd

Time 2 constructs
MTst

MASC

MInt

MGrd

MTst

.52a
.76
.80
.54
1.00a
1.00a
.62a
.79
.84
.64
.85
.62a
.76
.81
.62
1.00a
1.00a
.07
.51
.01
 .01

.08
.04
.37
.14

1.00
.24
.17

1.00
.40

1.00
.24
.17
.42
.58
.25
.18

1.00
.40
.39
.22
.53
.36

.03
 .03
.06
.35

1.00
.58
.25
.21
.14

.64
.13
.06

.66
.13

.73

1.00
.27
.12
.31
.46

1.00
.55
.54
.39

1.00
.33
.20

1.00
.38

1.00

Note. All variables were given a label that identifies the time (T1 or T2), the construct (MASC = math self-concept, MInt = math interest,
MGrd = math grade, or MTst = math test), and, for the multiple indicators of each latent construct, the item number. All parameter estimates are presented in completely standardized form. Not presented are the uniquenesses and correlated uniquenesses. Although the
full information maximum likelihood chi-square of 1106.477 (df = 176) was highly significant because of the large sample size, the root
mean square error of approximation (RMSEA) of .054 demonstrated that the model was able to fit the data well. For comparison purposes,
the same model was estimated with missing values imputed using the expectation maximization algorithm. The parameter
estimates were nearly identical to those presented here, and the goodness-of-fit statistics indicated a good fit to the data (normal theory
weighted least squares χ² = 1181.9, df = 176, RMSEA = 0.0563, non-normed fit index = 0.970, comparative fit index = 0.977, standardized
root mean square residual = 0.0378).
a In the unstandardized model, the first indicator of each construct was fixed to 1.0 to fix the metric of the factor.
b Factor correlations are based on the equivalent confirmatory factor analysis model in which all constructs are correlated. Time 1 correlations are the same as in the structural equation model (see residual variance/covariance estimates) but differ from Time 2 estimates in
that the effects of Time 1 constructs are not partialed out of the correlations, whereas they are partialed out for the Time 2 residual
variances and covariances.
p < .05. p < .001.
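
The RMSEA values quoted in the note are derived from the model chi-square, its degrees of freedom, and the sample size. As a rough illustration, the commonly used point-estimate formula can be sketched as follows. The function name and the example inputs are assumptions introduced here; the result depends on the sample size actually used in estimation and on software conventions (N versus N - 1), so it need not reproduce the reported values exactly.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA: sqrt(max(chi2 - df, 0) / (df * (n - 1))).

    chi_square: model chi-square statistic
    df:         model degrees of freedom
    n:          number of cases used in estimation (an input the caller must supply)
    """
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    # A chi-square below its degrees of freedom is truncated to zero misfit.
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Illustrative inputs only; these are not values taken from the study.
example = rmsea(chi_square=300.0, df=100, n=501)
print(round(example, 4))
```

Because the chi-square grows with sample size while RMSEA divides by it, a highly significant chi-square can coexist with an acceptable RMSEA, which is the point the note makes.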

of the results, we focus on the model with class-specific measures of interest (Table 5) but also
briefly summarize results based on the other models
(see Table 2).
Math self-concept and class-specific interest were
both positively correlated with math grades and test
scores (Table 5), but both self-concept and interest
were more highly correlated with school grades than
with standardized test scores. Also, correlations between self-concept and both achievement constructs
(grades and test scores) were more positive than the
corresponding correlations between interest and
achievement. These patterns of results were consistent across T1 and T2 responses.
In evaluating path coefficients relating T1 and T2
constructs, we focused on the overall model that
contained all four constructs (self-concept, interest,
grades, test scores) and was based on the class-specific measure of interest. However, we also juxtaposed the results of models in which interest was
based on responses to the class-specific items, the
domain-specific items, or both sets of interest items.
As in Study 1, we also conducted supplemental
analyses in which we evaluated various pairs of constructs separately (see Figure 2); these models more closely resemble the
traditional causal models used in previous research.
Math self-concept and achievement. Consistent with
previous self-concept research (and Study 1), there
were reciprocal effects between math self-concept
and achievement in the overall model that included
all constructs (see Table 5 and Figure 1). Of particular
relevance, the effect of T1 math self-concept was
statistically significant for both T2 math grades (.26)
and T2 math test scores (.16), and the effects of T1
math self-concept were greater for T2 school grades
than for T2 test scores. The effects of T1 math grades
on T2 math self-concept were small (.08) but highly
significant (p < .001), whereas the effects of T1 math
test scores were even smaller (.03) and not statistically significant. Hence, although the effects were
reciprocal, the effects of self-concept on achievement
were stronger than the effects of achievement on self-concept. In additional models based on domain-specific measures of interest or on both class- and domain-specific measures of interest (Table 2), these
path coefficients were nearly identical to those in
Table 5 based on the class-specific measure of interest (none of the path coefficients differed by more
than .05, and the pattern of significant and nonsignificant effects was the same in all three models).
As in Study 1, we constructed separate models to
evaluate the reciprocal effects of math self-concept
with math school grades and with math test scores
(excluding math interest; see Figures 2.1 and 2.2 for
Study 2). The pattern of results in these supplemental models was the same as in the overall model,
but the effects tended to be stronger, particularly
the effect of T1 math self-concept on T2 math grades
(.27; Figure 2.1) and on T2 math test scores (.21;
Figure 2.2). The effects of T1 math grades and T1
math test scores on T2 math self-concept, although
clearly smaller than the effect of T1 math self-concept
on T2 math achievement, were also statistically significant. Hence, these supplemental analyses supported the reciprocal effects model of self-concept
and achievement.
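
The logic of these cross-lagged paths can be illustrated with a small simulation. The study estimated the paths within a latent-variable structural equation model; the sketch below substitutes ordinary least squares on simulated manifest scores, so it is a simplified stand-in rather than the authors' procedure. The function names, the data-generating path values, and the sample size are all hypothetical.

```python
import numpy as np

NAMES = ["MASC", "MInt", "MGrd", "MTst"]  # construct labels borrowed from the table note


def standardize(x: np.ndarray) -> np.ndarray:
    """Center each column and scale it to unit variance."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def cross_lagged_paths(t1: np.ndarray, t2: np.ndarray) -> np.ndarray:
    """Regress every T2 variable on all T1 variables simultaneously.

    Returns a matrix whose entry [i, j] is the standardized path
    from T1 variable j to T2 variable i.
    """
    z1, z2 = standardize(t1), standardize(t2)
    coef, *_ = np.linalg.lstsq(z1, z2, rcond=None)  # shape: (n_T1, n_T2)
    return coef.T


# Hypothetical data-generating paths (rows: T2 outcome, columns: T1 predictor).
# These numbers are invented for the simulation; they are not the study's estimates.
TRUE_PATHS = np.array([
    [0.50, 0.05, 0.08, 0.00],  # T2 MASC
    [0.10, 0.50, 0.00, 0.00],  # T2 MInt
    [0.25, 0.00, 0.50, 0.05],  # T2 MGrd
    [0.15, 0.00, 0.10, 0.50],  # T2 MTst
])

rng = np.random.default_rng(0)
n = 5000
t1 = rng.standard_normal((n, 4))
t2 = t1 @ TRUE_PATHS.T + 0.8 * rng.standard_normal((n, 4))

paths = cross_lagged_paths(t1, t2)
sc_to_grades = paths[NAMES.index("MGrd"), NAMES.index("MASC")]
grades_to_sc = paths[NAMES.index("MASC"), NAMES.index("MGrd")]
print(f"T1 MASC -> T2 MGrd: {sc_to_grades:.2f}")
print(f"T1 MGrd -> T2 MASC: {grades_to_sc:.2f}")
```

Because each T2 outcome is regressed on all T1 constructs at once, a path such as T1 self-concept to T2 grades is estimated while holding prior grades constant, which is what allows the two directions of effect to be compared.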
Math interest and achievement. Math interest was
correlated with both measures of math achievement.
However, in the overall model, the effects of T1 math
interest were nonsignificant for both T2 school
grades and T2 test scores, as were the effects of T1
test scores and T1 school grades on T2 math interest
(Table 5). Furthermore, this pattern of
near-zero, nonsignificant effects between interest and
achievement was consistent across different models
based on the class-specific, domain-specific, and
combined (class- and domain-specific) measures of
math interest (see Table 2). However, in supplemental analyses that considered math interest in
combination with either grades or test scores (excluding math self-concept), some effects involving
math interest were statistically significant. In particular, math interest had statistically significant effects
on both math grades (Figure 2.3) and math test scores
(Figure 2.4), whereas the effects of math grades on
math interest (Figure 2.3) were also significant.
Math self-concept and interest. Based on the full
model (Table 5), there was also some support for the
reciprocal effects of math self-concept and math interest; the effects of T1 math self-concept on T2 math
interest (.12) and of T1 math interest on T2 math self-concept (.07) were both statistically significant. This
same pattern of results was evident in additional
models using class-specific, domain-specific, and
combined (class- and domain-specific) measures of
math interest (see Table 2). However, even in the
supplemental analyses that excluded the math test
scores and math grades (see Figure 2.5), the reciprocal effects of self-concept on interest and interest
on self-concept did not exceed .10.
Gender differences. As in Study 1, we tested the
generalizability of the results across gender. To facilitate comparisons between Studies 1 and 2, we
only considered the subject-specific measure of interest in Study 2 that was the same as the interest
measure in Study 1. We began with two-group
invariance tests in which we constrained various
sets of parameter estimates to be invariant over
responses by males and females. Based on comparison of RMSEA values across models MG1 to MG9,
there were almost no differences in fit between any
of the models. In particular, the most restrictive
model (in which all parameters were constrained to
be the same for males and females) had an RMSEA
of .051, equal to that in the least restrictive model in
which there were no invariance constraints. Hence,
consistent with a priori predictions, the results provide strong support for the generalizability of results
over gender.
Next, we evaluated gender differences in the
means of latent factors representing our four constructs at T1 and T2. A comparison of the RMSEAs
for models MG10 and MG11 provided good support
for the invariance of item intercepts, whereas comparison of models MG11 and MG12 (Table 3, Study
2) indicated that the latent means did differ. In general, males had substantially higher latent means for
math self-concept, math interest, and math tests at T1
and T2 (Table 4, Study 2). Whereas males had
slightly higher math grades at T1, there was no significant difference between males and females at T2
so that females had slightly higher school grades at
T2 after controlling for T1 constructs. A comparison
of the results for Studies 1 and 2 (see Figure 1) shows
that the size and pattern of statistically significant
and nonsignificant path coefficients from each study
are similar.
General Discussion
Although developmental and educational psychologists posit interest and self-concept as primary determinants of outcomes such as achievement,
performance, and choice of behavior, as emphasized
by Wigfield and Eccles (2002), there is a need to integrate the developmental and educational psychology research traditions to provide a more complete
picture. Even though there is substantial overlap of
research into academic self-concept and academic
interest, there have been few if any longitudinal
causal-ordering studies of relations among self-concept, interest, achievement, and gender differences.
We began by arguing that a critical question in
self-concept and motivation research is whether
there are causal links from prior measures of academic self-concept, academic interest, and academic
achievement to subsequent measures of these same
constructs. Our results provide clear evidence that
prior academic self-concept does predict subsequent
academic achievement beyond what can be explained in terms of prior measures of academic interest,
school grades, and standardized achievement test
scores. Whereas the contribution of prior academic
self-concept was stronger for school grades, the effects were also highly significant for standardized
test scores. Although the causal effects of academic
interest on subsequent achievement were largely
nonsignificant, it is possible that some shared variance in subsequent measures of academic achievement could not be uniquely explained by either
self-concept or interest. There were stereotypic gender differences in the mean levels of math motivation
constructs, but in contrast to predictions from gender-intensification and gender-stereotypic models
(and in support of gender-invariance models), support for the reciprocal effects relating math self-concept, interest, school grades, and test scores was
similar for boys and girls. Thus, for example, even
though girls had lower math self-concepts than did
boys, the positive effect of a high math self-concept
on subsequent math achievement was similar for
both boys and girls.
In relation to the generalizability of the results and
future research, several characteristics of the present
investigation warrant further consideration. It is
important to evaluate the generalizability of these
results in different settings, countries, and age
groups. Our results showed similar findings in different studies where the data-collection waves
fell within a single academic school year (Study 1) or spanned
different school years (Study 2), suggesting that this
methodological issue may not be as important as
suggested by Marsh et al. (1999). We note, however,
that the mathematics classes in Germany are reasonably similar for students in Grades 7 and 8.
Hence, the critical feature might not be the length of
the interval per se, but the amount of change that
takes place between consecutive data collections.
Future research should more fully attend to differences in the contextual characteristics as well as the
length of time between waves. Also, because our
study was based only on math constructs, it is important to test the generalizability of the results to
other academic (and perhaps nonacademic) domains.
More complicated is the generalizability of our
results to other age groups. Although some researchers have suggested that effects are stronger for
older samples, Guay et al. (2003) have demonstrated
strong support for the reciprocal effects model that
generalized over elementary school years. However,
Koller et al. (2001) speculated that the effect of interest would be stronger during later school years,
when the curriculum was not so highly structured
and students had more freedom in selecting courses.
Marsh and Yeung (1997a, 1997b) similarly showed
that whereas both achievement and academic
self-concept were substantially related to coursework selection in different school subjects, domain-specific components of self-concept were much better predictors of course selection.
The focus of reciprocal effects models in academic
self-concept research has been primarily on achievement. However, Wigfield and Eccles (1992, 2002) have
suggested that whereas self-concept is more strongly
related to actual achievement, task values may be
more strongly related to choice behavior (e.g.,
coursework selection). There is a need for further research juxtaposing the combined effects of academic
self-concept, interest, and achievement on a more
varied set of academic choice behaviors. Theoretically,
our research helps bridge gaps between the educational and developmental research literatures, and
among large bodies of self-concept research, interest
research, and more general approaches to motivation
such as expectancy-value theory.
Of course, the direction of causality among academic self-concept, academic interest, and achievement has important practical implications for
educators, as well as for educational and developmental psychologists who work in school settings.
If the direction of causality were from academic
self-concept and interest to achievement (a self-enhancement model), teachers should concentrate
more effort on enhancing students' self-concepts and
intrinsic interest rather than focusing on achievement. On the other hand, if causality were from
achievement to self-concept and interest (a skill development model), teachers should focus on improving academic skills as the best way to improve
self-concept and interest. In contrast to both these
apparently simplistic (either-or) models, the reciprocal effects model implies that academic self-concept, interest, and academic achievement are
reciprocally related and mutually reinforcing. Improved academic self-concepts and interest will lead
to better achievement, and improved achievement
will lead to better academic self-concepts and interest. Thus, for example, if teachers enhance students'
academic self-concepts and interest without
improving achievement, the gains in self-concept
and interest are likely to be short-lived. However,
if teachers improve students' academic achievement
without fostering students' self-beliefs in their academic capabilities and intrinsic interest, the
achievement gains are also unlikely to be long lasting. If teachers focus on one construct to the exclusion of the other, both are likely to suffer. The
reciprocal effects model suggests that the most effective strategy is to improve academic self-concept,
interest, and achievement simultaneously.
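
The three models contrasted in this paragraph can be written compactly for a two-wave design. The following is a simplified sketch with a single self-concept construct (SC) and a single achievement construct (ACH); the symbols are notation introduced here for illustration and are not taken from the analyses reported above.

```latex
\begin{align}
\mathrm{SC}_{2}  &= \beta_{11}\,\mathrm{SC}_{1} + \beta_{12}\,\mathrm{ACH}_{1} + \varepsilon_{1},\\
\mathrm{ACH}_{2} &= \beta_{21}\,\mathrm{SC}_{1} + \beta_{22}\,\mathrm{ACH}_{1} + \varepsilon_{2}.
\end{align}
```

In this notation, a pure self-enhancement model corresponds to $\beta_{21} > 0$ with $\beta_{12} = 0$, a pure skill development model to $\beta_{12} > 0$ with $\beta_{21} = 0$, and the reciprocal effects model to both cross-lagged paths being positive. Adding interest extends the system with a third equation of the same form.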

References
Allison, P. D. (2001). Missing data [Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-136]. Thousand Oaks, CA: Sage.
Baumert, J., Lehmann, R. H., Lehrke, M., Schmitz, B., Clausen, M., Hosenfeld, I., et al. (1997). TIMSS: Mathematisch-Naturwissenschaftlicher Unterricht im internationalen Vergleich [TIMSS: Mathematics and science instruction in an international comparison]. Opladen, Germany: Leske & Budrich.
Baumert, J., Schnabel, K., & Lehrke, M. (1998). Learning math in school: Does interest really matter? In L. Hoffmann, A. Krapp, K. A. Renninger, & J. Baumert (Eds.), Interest and learning (pp. 327-336). Kiel, Germany: Institut für die Pädagogik der Naturwissenschaften (IPN).
Beaton, A. E., Mullis, I. V. S., Martin, M. O., Gonzales, E. J., Kelly, D. L., & Smith, T. A. (1996). Mathematics achievement in the middle school years: IEA's Third International Mathematics and Science Study. Chestnut Hill, MA: Boston College.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bouffard, T., Marcoux, M., Vezeau, C., & Bordeleau, L. (2003). Changes in self-perceptions of competence and intrinsic motivation among elementary schoolchildren. British Journal of Educational Psychology, 73, 171-186.
Byrne, B. M. (1996). Academic self-concept: Its structure, measurement, and relation to academic achievement. In B. A. Bracken (Ed.), Handbook of self-concept (pp. 287-316). New York: Wiley.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS, and SIMPLIS: Basic concepts, applications and programming. Mahwah, NJ: Erlbaum.
Byrne, B. M., & Shavelson, R. J. (1987). Adolescent self-concept: Testing the assumption of equivalent structure across gender. American Educational Research Journal, 24, 365-385.
Calsyn, R., & Kenny, D. (1977). Self-concept of ability and perceived evaluations by others: Cause or effect of academic achievement? Journal of Educational Psychology, 69, 136-145.
Crain, R. M. (1996). The influence of age, race, and gender on child and adolescent multidimensional self-concept. In B. A. Bracken (Ed.), Handbook of self-concept: Developmental, social, and clinical considerations (pp. 395-420). Oxford, England: Wiley.
Csikszentmihalyi, M., & Schiefele, U. (1993). Die Qualität des Erlebens und der Prozeß des Lernens [The quality of experience and the process of learning]. Zeitschrift für Pädagogik, 39, 207-221.
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Eccles, J. S. (1983). Expectancies, values, and academic choice: Origins and changes. In J. Spence (Ed.), Achievement and achievement motivation (pp. 87-134). San Francisco: Freeman.
Eccles, J. S., & Wigfield, A. (1995). In the mind of the actor: The structure of adolescents' achievement task values and expectancy-related beliefs. Personality & Social Psychology Bulletin, 21, 215-225.
Eccles, J. S., Wigfield, A., & Schiefele, U. (1998). Motivation to succeed. In W. Damon (Series Ed.) & N. Eisenberg (Vol. Ed.), Handbook of child psychology: Vol. 3. Social, emotional, and personality development (5th ed., pp. 1017-1095). New York: Wiley.
Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London: Hodder Arnold.
Graham, J. W., & Hofer, S. M. (2000). Multiple imputation in multivariate research. In T. D. Little, K. U. Schnabel, & J. Baumert (Eds.), Modeling longitudinal and multilevel data: Practical issues, applied approaches, and specific examples (pp. 201-218). Mahwah, NJ: Erlbaum.
Guay, F., Marsh, H. W., & Boivin, M. (2003). Academic self-concept and academic achievement: Developmental perspectives on their causal ordering. Journal of Educational Psychology, 95, 124-136.
Harter, S. (1990). Processes underlying adolescent self-concept formation. In R. Montemayor, G. Adams, & T. Gullotta (Eds.), From childhood to adolescence: A transitional period? (pp. 205-239). Thousand Oaks, CA: Sage.
Harter, S. (1992). The relationship between perceived competence, affect, and motivational orientation within the classroom: Processes and patterns of change. In A. K. Boggiano & T. S. Pittman (Eds.), Achievement and motivation: A social-developmental perspective (pp. 77-113). New York: Cambridge University Press.
Harter, S. (1998). Developmental perspectives on the self-system. In W. Damon (Series Ed.) & N. Eisenberg (Vol. Ed.), Handbook of child psychology: Vol. 3. Social, emotional, and personality development (5th ed., pp. 553-618). New York: Wiley.
Hattie, J. (1992). Self-concept. Hillsdale, NJ: Erlbaum.
Helmke, A., & van Aken, M. A. G. (1995). The causal ordering of academic achievement and self-concept of ability during elementary school: A longitudinal study. Journal of Educational Psychology, 87, 624-637.
Hidi, S., & Ainley, M. (2002). Interest and adolescence. In F. Pajares & T. Urdan (Eds.), Academic motivation of adolescents (pp. 247-275). Greenwich, CT: Information Age.
Husen, T. (1967). International study of achievement in mathematics. A comparison of 12 countries (Vols. I and II). Stockholm: Almqvist & Wiksell.
Jacobs, J. E., Lanza, S., Osgood, D. W., Eccles, J. S., & Wigfield, A. (2002). Changes in children's self-competence and values: Gender and domain differences across grades one through twelve. Child Development, 73, 509-527.
Joreskog, K. G. (1979). Statistical estimation of structural models in longitudinal investigations. In J. R. Nesselroade & P. B. Baltes (Eds.), Longitudinal research in the study of behavior and development (pp. 303-351). New York: Academic Press.
Joreskog, K. G., & Sorbom, D. (1993). LISREL 8: User's reference guide. Chicago: Scientific Software.
Kaplan, D. (2000). Structural equation modeling: Foundations and extensions. Newbury Park, CA: Sage.
Koller, O. (1998). Zielorientierungen und schulisches Lernen [Goal orientations and academic learning]. Münster, Germany: Waxmann.
Koller, O., Baumert, J., & Schnabel, K. (2001). Does interest matter? The relationship between academic interest and achievement in mathematics. Journal for Research in Mathematics Education, 32, 448-470.
Krapp, A. (2000). Interest and human development during adolescence: An educational-psychological approach. In J. Heckhausen (Ed.), Motivational psychology of human development (pp. 109-128). London: Elsevier.
Krapp, A., Prenzel, M., & Schiefele, H. (1986). Grundzüge einer pädagogischen Interessentheorie [Basic principles of an educational theory of interest]. Zeitschrift für Pädagogik, 32, 163-173.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.
Ludtke, O., Koller, O., Marsh, H. W., & Trautwein, U. (in press). Teacher feedback and the big-fish-little-pond effect. Contemporary Educational Psychology.
Marsh, H. W. (1987). The big-fish-little-pond effect on academic self-concept. Journal of Educational Psychology, 79, 280-295.
Marsh, H. W. (1989). Age and sex effects in multiple dimensions of self-concept: Preadolescence to early adulthood. Journal of Educational Psychology, 81, 417-430.
Marsh, H. W. (1990). The causal ordering of academic self-concept and academic achievement: A multiwave, longitudinal panel analysis. Journal of Educational Psychology, 82, 646-656.
Marsh, H. W. (1993a). Academic self-concept: Theory, measurement, and research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp. 59-98). Hillsdale, NJ: Erlbaum.
Marsh, H. W. (1993b). The multidimensional structure of academic self-concept: Invariance over gender and age. American Educational Research Journal, 30, 841-860.
Marsh, H. W. (1994). Confirmatory factor analysis models of factorial invariance: A multifaceted approach. Structural Equation Modeling, 1, 5-34.
Marsh, H. W., & Ayotte, V. (2003). Do multiple dimensions of self-concept become more differentiated with age? The differential distinctiveness hypothesis. Journal of Educational Psychology, 95, 687-706.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness of fit indexes in confirmatory factor analysis: The effect of sample size. Psychological Bulletin, 103, 391-410.
Marsh, H. W., Byrne, B. M., & Yeung, A. S. (1999). Causal ordering of academic self-concept and achievement: Reanalysis of a pioneering study and revised recommendations. Educational Psychologist, 34, 154-157.
Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131-198). Orlando, FL: Academic Press.
Marsh, H. W., & Craven, R. G. (in press). What comes first? A reciprocal effects model of the mutually reinforcing effects of academic self-concept and achievement. In H. W. Marsh, R. G. Craven, & D. M. McInerney (Eds.), International advances in self research (Vol. 2). Greenwich, CT: Information Age.
Marsh, H. W., Craven, R. G., & Debus, R. (1991). Self-concepts of young children aged 5 to 8: Their measurement and multidimensional structure. Journal of Educational Psychology, 83, 377-392.
Marsh, H. W., Craven, R. G., & Debus, R. (1998). Structure, stability, and development of young children's self-concepts: A multicohort-multioccasion study. Child Development, 69, 1030-1053.
Marsh, H. W., Craven, R. G., & Debus, R. (2000). Separation of competency and affect components of multiple dimensions of academic self-concept: A developmental perspective. Merrill-Palmer Quarterly, 45, 567-560.
Marsh, H. W., Debus, R., & Bornholt, L. (2005). Validating young children's self-concept responses: Methodological ways and means to understand their responses. In D. M. Teti (Ed.), Handbook of research methods in developmental science (pp. 138-160). Oxford, England: Blackwell.
Marsh, H. W., & Grayson, D. (1990). Public/Catholic differences in the High School and Beyond data: A multigroup structural equation modeling approach to testing mean differences. Journal of Educational Statistics, 15, 199-235.
Marsh, H. W., & Grayson, D. (1994). Longitudinal stability of latent means and individual differences: A unified approach. Structural Equation Modeling, 1, 317-359.
Marsh, H. W., & Hau, K.-T. (1996). Assessing goodness of fit: Is parsimony always desirable? Journal of Experimental Education, 64, 364-390.
Marsh, H. W., Hau, K.-T., & Kong, K. W. (2002). Multilevel causal ordering of academic self-concept and achievement: Influence of language of instruction (English vs. Chinese) for Hong Kong students. American Educational Research Journal, 37, 245-282.
Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel modeling. Australian Journal of Education, 40, 65-87.
Marsh, H. W., & Yeung, A. S. (1997a). Causal effects of academic self-concept on academic achievement: Structural equation models of longitudinal data. Journal of Educational Psychology, 89, 41-54.
Marsh, H. W., & Yeung, A. S. (1997b). Coursework selection: The effects of academic self-concept and achievement. American Educational Research Journal, 34, 691-720.
Marsh, H. W., & Yeung, A. S. (1998). Longitudinal structural equation models of academic self-concept and achievement: Gender differences in the development of math and English constructs. American Educational Research Journal, 35, 705-738.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance. Psychometrika, 58, 525-543.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods (2nd ed.). Thousand Oaks, CA: Sage.
Renninger, K. A. (2000). How might the development of individual interest contribute to the conceptualization of intrinsic motivation? In C. Sansone & J. M. Harackiewicz (Eds.), Intrinsic and extrinsic motivation: The search for optimal motivation and performance. New York: Academic Press.
Robitaille, D., & Garden, R. (1989). The IEA Study of Mathematics II. Contents and outcomes of school mathematics. Oxford, England: Pergamon.
Schiefele, U. (1996). Motivation und Lernen mit Texten [Motivation and learning with texts]. Göttingen, Germany: Hogrefe.
Schiefele, U. (1998). Individual interest and learning: What we know and what we do not know. In L. Hoffmann, A. Krapp, K. A. Renninger, & J. Baumert (Eds.), Interest and learning (pp. 91-104). Kiel, Germany: Institut für die Pädagogik der Naturwissenschaften (IPN).
Schiefele, U., Krapp, A., & Winteler, A. (1992). Interest as predictor of academic achievement: A meta-analysis of research. In K. A. Renninger, S. Hidi, & A. Krapp (Eds.), The role of interest in learning and development (pp. 183-212). Hillsdale, NJ: Erlbaum.
Skaalvik, E. M., & Hagtvet, K. A. (1990). Academic achievement and self-concept: An analysis of causal predominance in a developmental perspective. Journal of Personality & Social Psychology, 58, 292-307.
Valentine, J. C., DuBois, D. L., & Cooper, H. (2004). The relations between self-beliefs and academic achievement: A systematic review. Educational Psychologist, 39, 111-133.
Watt, H. M. G. (2004). Development of adolescents' self-perceptions, values, and task perceptions according to gender and domain in 7th- through 11th-grade Australian students. Child Development, 75, 1556-1574.
Wigfield, A. (1994). Expectancy-value theory of achievement motivation: A developmental perspective. Educational Psychology Review, 6, 49-78.
Wigfield, A., & Eccles, J. S. (1992). The development of achievement task values: A theoretical analysis. Developmental Review, 12, 265-310.
Wigfield, A., & Eccles, J. S. (2002). The development of competence beliefs, expectancies for success, and achievement values from childhood through adolescence. In A. Wigfield & J. S. Eccles (Eds.), Development of achievement motivation (pp. 173-195). San Diego, CA: Academic Press.
Wigfield, A., Eccles, J. S., Yoon, K. S., Harold, R. D., Arbreton, A., Freedman-Doan, K., et al. (1997). Changes in children's competence beliefs and subjective task values across the elementary school years: A three-year study. Journal of Educational Psychology, 89, 451-469.
Wigfield, A., & Karpathian, M. (1991). Who am I and what can I do? Children's self-concepts and motivation in achievement situations. Educational Psychologist, 26, 233-261.
Wylie, R. C. (1979). The self-concept (Vol. 2). Lincoln: University of Nebraska Press.

Appendix
Self-Concept and Interest Items Used in Studies 1 and 2

Math Self-Concept (Studies 1 and 2)
• I would much prefer math if it weren't so hard. (1 = strongly disagree to 4 = strongly agree)
• Although I make a real effort, math seems to be harder for me than for my fellow students. (1 = strongly disagree to 4 = strongly agree)
• Nobody's perfect, but I'm just not good at math. (1 = strongly disagree to 4 = strongly agree)
• Some topics in math are just so hard that I know from the start I'll never understand them. (1 = strongly disagree to 4 = strongly agree)
• Math just isn't my thing. (1 = strongly disagree to 4 = strongly agree)

Math Class-Specific Interest (Studies 1 and 2)
• How important is it for you to learn a lot in mathematics classes? (1 = not at all important to 5 = very important)
• Would you like mathematics classes to be taught more often? (1 = not at all to 5 = very much)
• How much do you look forward to mathematics classes? (1 = not at all to 5 = very much)
• How important is it for you to remember what you have learned in mathematics classes? (1 = not at all important to 5 = very important)

Math Domain-Specific Interest (Study 2 Only)
• It is important to me to be a good mathematician. (1 = strongly disagree to 4 = strongly agree)
• I enjoy working on mathematical problems. (1 = strongly disagree to 4 = strongly agree)
• Mathematics is one of the things that is important to me personally. (1 = strongly disagree to 4 = strongly agree)
• I would even give up some of my spare time to learn new topics in mathematics. (1 = strongly disagree to 4 = strongly agree)
• While working on a mathematical problem, it sometimes happens that I don't notice time passing. (1 = strongly disagree to 4 = strongly agree)
