ARTICLE INFO
Article history:
Available online 29 October 2014
Keywords:
Teachers' expectancies
Pygmalion effect
Students' self-concept
Multilevel modeling
Math achievement
ABSTRACT
According to the Pygmalion effect, teachers' expectancies affect students' academic progress. Many empirical studies have supported the predictions of the Pygmalion effect, but the effect sizes have tended to be small to moderate. Furthermore, almost all existing studies have examined teacher expectancy effects on students' achievement at the student level only (does a specific student improve?) rather than at the classroom level (do classes improve when teachers have generally high expectations of their students?). The present study scrutinized the Pygmalion effect in a longitudinal study by using a large sample in regular classrooms and by differentiating between two achievement outcomes (grades and an achievement test) and two levels of analysis (the individual and classroom levels). Furthermore, students' self-concept was studied as a possible mediator of the teacher expectancy effect on achievement. Data come from a study with 73 teachers and their 1289 fifth-grade students. Multilevel regression analyses yielded three main results. First, Pygmalion effects were found at the individual level for both achievement outcomes. Second, multilevel mediation analyses showed that teacher expectancy effects were partly mediated by students' self-concept. Third, teachers' average expectancy effects at the class level were found to be nonsignificant when students' prior achievement was controlled.
© 2014 Elsevier Inc. All rights reserved.
1. Introduction
Teachers form expectancies of their students' achievements. Teachers' expectancies are based on the knowledge they have about their students, such as previous grades and perceptions of in-class performance, but are also based on teachers' prejudices or stereotypes (Good, 1987; Jussim, Eccles, & Madon, 1996; Reyna, 2000, 2008). The expectancies teachers form about their students have been shown to impact students' future achievement, an effect that is often labeled the Pygmalion effect (Rosenthal, 2010). Pygmalion effects have high scientific and practical relevance due to their potentially positive or negative effects on important student outcomes. Not surprisingly, Pygmalion effects have been the subject of many empirical studies (for meta-analyses and reviews, see Jussim & Harber, 2005; Rosenthal & Rubin, 1978; Tenenbaum & Ruck, 2007), which have documented, by and large, the existence of expectancy effects.
Smaller coefficients are not surprising given that teachers' expectancies were not pervasive and enduring per se, but rather flexible and open to change as soon as more information about individual student achievement was available (Brophy, 1983).
Which mechanisms account for teachers' expectancy effects? Brophy and Good (1970) described a possible mechanism behind teachers' expectancies in a comprehensive model: (a) Teachers form differential expectancies for their students. (b) Teachers' beliefs about those students begin to lead to differential treatment, such as providing more attention and support (climate), offering more challenging learning materials (input), interacting more often and longer (output), and being more responsive to the work (feedback) of the students for whom they hold high expectations (Rosenthal, 1974). (c) Students in turn recognize the teacher's high expectancies and react to them: They may work harder and develop higher motivation and interest in schoolwork. (d) This more engaged student behavior will, in the long run, improve their academic achievement. These changes may also affect students' self-concept and motivation (Harris & Rosenthal, 1985). (e) The teacher recognizes the positive changes in the students' behavior, feels supported in his or her former expectancies, and the self-fulfilling cycle is complete and reinforced. To conclude, there seems to be reasonable theoretical support for the effects of teachers' expectancies on students' achievement.
However, longitudinal field studies concerning teacher expectancy effects have thus far rarely taken into account different achievement outcomes.
army and found support for the Pygmalion effect for entire work groups. In their meta-analysis, Kierein and Gold (2000) summarized 13 studies on Pygmalion effects in work organizations; some of them also had groups as the unit of analysis, for which they found an effect of d = 0.83. Yet, the study by Eden and the studies in the meta-analysis manipulated expectancies rather than employing naturally occurring expectancies and were conducted in the work organizational context, thus leaving their generalizability to educational settings unclear.
Smith et al. (1998) analyzed expectancy effects on students' achievement for students grouped by ability within and between classrooms and for students in heterogeneous classrooms (i.e., in which no ability grouping took place). Drawing on aggregated data, they did not find teacher expectancy effects (measured as perceptions of performance, talent, and effort) on students' achievement at the class level. Yet, they analyzed whether the ability grouping of classes moderated the relation between teacher expectations and class achievement and found evidence for this in classes that used within-class grouping.
Thus, although it is theoretically reasonable to assume whole-group effects, these effects have seldom been analyzed empirically in the educational setting. In line with the few existing former studies, we assumed that teachers might form evaluations not only for a single student or a subgroup of students but also for whole classes. Such class-level teacher expectancies could be operationalized in two ways: First, teachers could be asked directly about their expectancies for the class as a whole (e.g., Hastings & Bham, 2003; Lorenz, 2005). A second way is to aggregate teachers' expectancies for the individual students in their class. In the literature on multilevel analyses, aggregating student or teacher variables, grades, or test scores at the class level is a common method of separating and analyzing student- and class-level effects (e.g., Croninger, Rice, Rathbun, & Nishio, 2003; Trautwein, Lüdtke, Marsh, Köller, & Baumert, 2006; Trautwein, Lüdtke, Marsh, & Nagy, 2009). Previous studies investigating expectancies for groups did not use global assessments when exploring expectancy effects for groups (Smith et al., 1998). In more detail, prior studies on expectancy effects for groups followed the assumption that groups of people consist of different individuals and that their differences account for the perception of the whole group (e.g., see Eden, 1990). For example, in the study by Eden (1990) conducted in a military context, leaders were not told that the group had high potential on average, but that the people in the group had high potential on average. Second, with the multilevel analysis, we took into account the nested structure of the data and were able to separate student- and classroom-level effects, which can be seen as a strength of the present manuscript (e.g., Miller & Murdock, 2007). Predictor and outcome measures were assessed on the same (individual) level and aggregated before the analyses, and therefore were more comparable. Indeed, this approach ensured that each student was taken into account to the same extent. Therefore, in particular to increase consistency and comparability with prior expectancy research, and following Smith et al.'s (1998) considerations, we used the second method and refer to this aggregated teacher measure as teachers' average expectancies.
When a teacher holds rather low average expectancies for a class, this could result in the selection of less difficult tasks, repeated problem talk, and less appreciation by the teacher. In the long term, these actions may result in lower self-concepts or achievements of the students in this class. By contrast, if a teacher has rather high average expectancies for a class, this teacher might select challenging tasks, focus more on the strengths of the students, and give more reinforcement, and these actions may all have a positive effect on students' self-concept or achievement. This positive effect might be especially helpful for low-achieving students with poor self-perceptions (the population examined in the present study). However, little empirical evidence exists on the effects of teachers' average expectancies for entire classes on students' achievement; thus, we conducted a rather exploratory analysis.
2. Method
2.1. Sample
The participants were math teachers (N = 73) and their fifth-grade students (N = 1289) attending the lowest school track in Germany (Haupt- and Werkrealschule) who took part in a larger study on self-regulated learning.1 Teachers and students participated voluntarily. The core prerequisite for participation was that the teachers taught a fifth-grade math class. The study was conducted in 2012 during regular school hours. Data from 73 classes in different schools were collected by trained research assistants at three time points. The first measurement was in February (T1), the second in April (T2), and the third in June (T3). Teachers were asked to report their expectancies concerning their fifth graders' math competence at T1. Students reported their math self-concept at T2. Students' math achievement was assessed at T3. In addition, control variables such as students' prior self-concept and prior achievement were assessed at T1. There were two rationales behind our decision to conduct the first measurement in the middle of the school year in February. First, as teacher judgments about student performance "are likely to be influenced by the amount of academic exposure to a student" (Begeny, Eckert, Montarello, & Storie, 2008, p. 53; see also Jussim, 1989; Kenny, 2004), teachers should have had an extended time period to observe their students (Jussim, 1989). Following this idea, several previous studies on expectancy effects in the classroom context have also used a long time lag, which allows teachers to become acquainted with their students before rating them (e.g., Brattesani et al., 1984; Marsh & Craven, 1991; Praetorius, Karst, Dickhäuser, & Lipowsky, 2011; Smith et al., 1998). Second, the decision to implement a time lag of 6 months was also driven by our interest in explaining achievement changes in students. Therefore, we wanted to assess two grades assigned by the same teacher within a school year. Consequently, in our study's design, the measurement of grades took place at the time points when the half-year and end-of-year grade cards were assigned. In summary, following theoretical considerations and previous expectancy research, teachers' expectancies were surveyed in February, about 6 months after teachers had started teaching their students. To reward their participation, students received sweets and teachers received a written report of the main findings; the report was sent to each school after the study was completed. As studies have shown that most motivational and affective constructs are domain-specific in nature (e.g., Bong, 2001; Goetz, Frenzel, Pekrun, Hall, & Lüdtke, 2007), we decided to focus on one subject (i.e., mathematics). This focus reduced complexity
1 The present data were derived from an intervention study with two experimental groups and one control group. In order to test whether it would be appropriate to treat the three groups as one dataset for the present analyses, we tested whether the effects of the independent variables on the outcomes differed across the three groups (two experimental groups vs. one control group). In more detail, we compared a model in which the effects of the central independent variables on students' math test scores, grades, and self-concept were constrained to be invariant across groups to a model in which the effects of the independent variables were allowed to vary freely. We conducted separate analyses for each outcome. For model evaluation, the common fit indices χ², CFI, and RMSEA were used. If the more restrictive models exhibited fit indices that were similar to those of the unconstrained model, measurement equivalence of the constructs across the groups was assumed (see Little, 1997). Fit indices for all models were satisfactory. In addition, to evaluate comparative model fit, Satorra–Bentler-scaled chi-squares were used for chi-square difference tests. The chi-square difference test indicated that the invariance constraints did not yield a significantly worse model fit. Consequently, the analyses were conducted on the whole dataset.
Table 1
Descriptive statistics of teachers' math expectancies of their students' math competence and students' math self-concept.

                               T1                      T2
Construct        Item No.   M      SD     ICC       M      SD     ICC
Teachers'        1          2.29   0.86   0.19      –      –      –
expectancies     2          2.57   0.83   0.13      –      –      –
Students'        1          2.49   0.91   0.03      2.67   0.94   0.02
self-concept     2          2.93   0.81   0.03      2.90   0.79   0.05
                 3          –      –      –         3.11   0.88   0.05
                 4          –      –      –         2.57   1.15   0.04

Note: N = 1145 to 1281. (r) = reverse coded. M = mean, SD = standard deviation, ICC = intraclass correlation. Measurement time points: T1 = February, T2 = April; M, SD, and ICC were calculated on uncentered items.
in the interpretation of our results. The study procedure was approved by the responsible institutional review board.
On average, teachers were 46 years old (SD = 11.71), with an average service length of 17.33 years (SD = 11.32); 68% were female. Grade 5 is the first year of secondary school. Seventy-eight percent of the math teachers were also the principal class teacher who taught the students in other subjects in addition to math.
Students were 10 to 14 years old (M = 10.95, SD = 0.77) and
equally distributed with respect to gender (52% boys). Class sizes
varied from nine to 29 students. Only students with active parental consent participated in the study. Nevertheless, the participation
rate was high (91.2%).
2.2. Measures
2.2.1. Teacher reports
Teachers were asked to report their expectancies of their fifth-grade students' math competence at T1. In general, teachers' expectancies can be assessed by asking teachers about their expectancies of students' future success (e.g., "How good will this student be in swimming?", Trouilloud et al., 2002) or by asking teachers for their present opinions of students' competences (e.g., "How talented is this student?", Jussim & Eccles, 1992), which is also associated with future-directed expectancies about competences and achievement. In line with Jussim and Eccles (1992) as well as Friedrich, Jonkmann, Nagengast, Schmitz, and Trautwein (2013), we used the second approach in the present study. Teachers received two items to rate on a 4-point Likert-type scale ranging from 1 (completely disagree) to 4 (completely agree) and were asked to assess each student without reference to the other students in the classroom. The items ("The student can solve even difficult mathematical tasks" and "The student does well in mathematics"; α = .86; see Table 1) were adapted from the mathematics abilities subscale of the German version (Schwanzer, Trautwein, Lüdtke, & Sydow, 2005) of the Self-Description Questionnaire (SDQ; Marsh, 1990). The full scale consists of four items: two positively and two negatively worded. Due to limited space, we assessed only two items in the teacher questionnaire at the first measurement. As negatively worded items may have undesired method effects (e.g., Marsh, 1996), we selected the two positively worded items. In research on teacher expectancies and teachers' judgments, single-item measures are commonly used (e.g., Hoge & Butcher, 1984; Kuklinski & Weinstein, 2001; Pohlmann, Möller, & Streblow, 2004; Praetorius, Berner, Zeinz, Scheunpflug, & Dresel, 2013; Praetorius, Greb, Lipowsky, & Gollwitzer, 2010; Spinath, 2005). Therefore, we assumed that a two-item measure was acceptable. The reliability of the resulting measure was very satisfactory (α = .86).
2.2.2. Student self-reports
Students' math self-concept was assessed by two items at T1 and four items at T2. The items (e.g., "I can solve even difficult mathematical tasks" and "I do well in mathematics"; T1: α = .61; T2: α = .79; see Table 1) were also adapted from the SDQ with identical wording.
2.2.3. Students achievement
Students' scholastic achievement was assessed by students' math grades and students' scores on a standardized math achievement test. The math grade was obtained from school records at the end of the school year (in July). In the German school system, teachers evaluate their students with numerical grades that range from 1 to 6. We recoded the grades so that higher values indicate better achievement. In addition to collecting grades, we conducted a standardized math test. At both measurement points (T1 and T3), the test consisted of 34 items with varying response formats (e.g., multiple-choice items, open questions, drawing tasks, i.e., plot the result in a coordinate system). The items measured a broad array of students' math competences, such as logical inference, division, transformation, or use of the rule of three. Students had 35 minutes to complete the test. We applied two parallel tests (Forms A and B) at each time point and used a rotation design with a set of fixed items (anchor items), so that the test results of Times 1 and 3 could be compared, and a set of items that varied between the time points, to reduce possible influences of training effects. The test content was based on the school curriculum of fifth graders in Germany.2 Although our sample consisted of students from the lowest school track only, there was a representative amount of variability in grades and test scores.
Item response theory (IRT) was used to scale students' math achievement test scores. Model fit was checked using confirmatory factor analysis models based on polychoric correlations. An analysis of dimensionality indicated a good fit for a one-dimensional model at each time point (Time 1: RMSEA = .025; Time 3: RMSEA = .018). In addition, combined IRT models also resulted in good model fit, thereby indicating measurement invariance. We used expected a posteriori person-parameter estimates (EAPs) calculated with Mplus for further analyses.
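The EAP scores reported here were computed with Mplus; purely as an illustration of the idea, the following sketch estimates an EAP ability score under a simple Rasch model with a standard normal prior via grid quadrature. The item difficulties are hypothetical and not the parameters of the test used in this study.

```python
import numpy as np

def eap_theta(responses, difficulties, grid=np.linspace(-4, 4, 81)):
    """EAP ability estimate under a Rasch model with a N(0, 1) prior,
    approximated by grid quadrature (illustrative sketch only)."""
    responses = np.asarray(responses)
    difficulties = np.asarray(difficulties, dtype=float)
    # P(correct | theta) for each grid point (rows) and item (columns)
    p = 1.0 / (1.0 + np.exp(-(grid[:, None] - difficulties[None, :])))
    # Likelihood of the observed 0/1 response pattern at each grid point
    like = np.prod(np.where(responses[None, :] == 1, p, 1 - p), axis=1)
    prior = np.exp(-0.5 * grid**2)  # standard normal kernel (normalizing constant cancels)
    post = like * prior
    post /= post.sum()
    return float(np.sum(grid * post))  # posterior mean of theta

# Hypothetical three-item test with difficulties -1, 0, 1:
diffs = np.array([-1.0, 0.0, 1.0])
eap_theta([1, 1, 1], diffs)  # all correct: positive ability estimate
eap_theta([0, 0, 0], diffs)  # all incorrect: negative ability estimate
```

Because the prior shrinks estimates toward zero, EAP scores remain finite even for perfect or zero response patterns, which is one reason they are convenient as person-parameter estimates.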
2.2.4. Control variables
As control variables, we assessed students' sex, age, figural reasoning ability, and prior achievement. For the figural reasoning score, we used the Figure Analogies subscale of the Cognitive Ability Test 4–12+R (Heller & Perleth, 2000), a German adaptation of the Cognitive Abilities Test developed by Thorndike
2 Germany consists of 16 federal states, each of which has its own specific educational system and curricula. We focused on one federal state (Baden-Württemberg), thereby ensuring that all public schools in this state, i.e., in our sample, followed the same curriculum. All schools teach the same content in the same order during the school year. In addition, we asked the schools in advance to focus on a specific mathematical topic for the timeframe of the study (a topic that complied with the planned curriculum). By ensuring that similar mathematical content was taught and by focusing on one federal state, the use of a standardized achievement test was justified, and the results can be considered comparable.
and Hagen (1971). The subscale is an efficient and often-used nonverbal measure of students' cognitive abilities, tapping highly g-loaded ability components, for which norm data for fifth graders in Germany exist. The subscale consists of 25 figural items in a multiple-choice format and takes 8 minutes. Again, we had two parallel tests (Forms A and B).
IRT was used to scale students' test scores. For students' cognitive abilities, the Rasch model was chosen as the measurement model. Item and person parameters were estimated using ConQuest 2.0 (Wu, Adams, Wilson, & Haldane, 2007). The model fit statistics were satisfactory, with no signs of floor or ceiling effects. Weighted maximum likelihood estimates (WLEs) were used as person-parameter estimates in further analyses. Marginal reliabilities for WLEs reached acceptable values (Rel. = .835 for Form A and Rel. = .827 for Form B).
2.3. Statistical analyses
2.3.1. Multilevel structure
As we were interested in the effects of teachers' expectancies on outcomes of individual students and at the class level, we used a multilevel framework with students at the within level (Level 1) and classes at the between level (Level 2). Besides our thematic interest, multilevel analysis is the appropriate method due to the clustering of students in a class that is assessed by one teacher, a structure that violates the assumption of independence of observations (Snijders & Bosker, 2012). A multilevel approach (a) yields correct standard errors and (b) allows the user to separate the variance between the two levels of analysis (Raudenbush & Bryk, 2002; Snijders & Bosker, 2012). The degree of clustering and the amount of between-class variance are usually measured by the intraclass correlation (ICC). The ICC is the proportion of total variance explained by variation at the class level: It compares variance components between and within clusters and ranges from 0, meaning total independence of observations, to 1.00, meaning maximum dependency within clusters (Snijders & Bosker, 2012). In the present study, the ICCs ranged from 0.13 to 0.19 for teacher reports at T1. For students' self-reports, the ICCs ranged from 0.02 to 0.05 at T2, thus implying that they were only moderately affected by the nesting in classrooms (see Table 1).
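The variance decomposition behind the ICC can be sketched as follows; this is a generic one-way random-effects ANOVA estimator of ICC(1), not the exact estimator used by the Mplus analyses reported here.

```python
import numpy as np

def icc_oneway(y, cls):
    """ICC(1): share of total variance located at the class level,
    estimated from a one-way random-effects ANOVA decomposition."""
    y, cls = np.asarray(y, dtype=float), np.asarray(cls)
    groups = [y[cls == g] for g in np.unique(cls)]
    n, k = len(y), len(groups)
    grand = y.mean()
    # Between- and within-class sums of squares and mean squares
    ss_b = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ms_b, ms_w = ss_b / (k - 1), ss_w / (n - k)
    # Effective class size (corrects for unequal class sizes)
    n0 = (n - sum(len(g) ** 2 for g in groups) / n) / (k - 1)
    var_between = max((ms_b - ms_w) / n0, 0.0)  # truncate negative estimates at 0
    return var_between / (var_between + ms_w)
```

With identical class means the estimate is 0 (total independence of observations); with no within-class variation it is 1 (maximum dependency within clusters), matching the range described above.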
To examine Research Questions 1 and 2, we used the grand-mean-centering option in Mplus, in which overall means were subtracted from each variable (Muthén & Muthén, 1998–2012). To examine the effects of teachers' average expectancies for a whole class (Research Question 3), we aggregated teachers' expectancies of individual students. We averaged those expectancies for each class using the between-level function in Mplus. We further calculated contextual effects. Following the instructions by Nagengast and Marsh (2012), we used the latent aggregation procedure in Mplus, in which all Level 1 variables are implicitly grand-mean centered. Estimates of contextual effects, which represent the effect of Level 2 variables after controlling for Level 1 differences, can be obtained by subtracting the Level 1 effect from the Level 2 effect (Enders & Tofighi, 2007; Kreft, de Leeuw, & Aiken, 1995; Marsh et al., 2012).
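The logic of a contextual effect can be illustrated with a minimal constructed example (hypothetical numbers, ordinary least squares rather than the latent aggregation used in Mplus): the outcome is built so that the within effect is 1 and the between effect is 3, so the contextual effect, the between effect minus the within effect, is 2.

```python
import numpy as np

# Hypothetical data: 6 students in 2 classes; x = teacher rating, y = outcome
x   = np.array([1., 2., 3., 4., 5., 6.])
cls = np.array([0, 0, 0, 1, 1, 1])

x_mean = np.array([x[cls == g].mean() for g in cls])  # class mean, per student
x_dev  = x - x_mean                                   # within-class deviation
y = 1.0 * x_dev + 3.0 * x_mean                        # constructed: within = 1, between = 3

# Regress y on [intercept, within-deviation, class mean]
X = np.column_stack([np.ones_like(x), x_dev, x_mean])
b = np.linalg.lstsq(X, y, rcond=None)[0]

contextual = b[2] - b[1]  # between effect minus within effect = 2.0
```

The negative (nonsignificant) contextual effects reported later for the math test and math grade were obtained by exactly this subtraction, applied to the latent Level 1 and Level 2 estimates.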
2.3.2. Regression analyses
To examine Research Questions 1 and 3, we conducted two separate multilevel regression analyses. First, to investigate the effects of teachers' expectancies for individual students, we analyzed teacher expectancy effects at the student level (Level 1) for both outcome variables. Second, we conducted another multilevel regression analysis in which teachers' expectancies were entered as a predictor at the class level (Level 2) in order to test the effects of teachers' average expectancies for a whole class on students' achievement. Both multilevel regression analyses, at the student level and at the class level, were calculated separately for the two outcomes (math grade and math test score).
Table 2
Two regression models analyzing teachers' expectancies of students' math competences as predictors of students' math achievement (Research question 1).

                                Outcome: math test (T3)   Outcome: math grade (T3)
Variable                        B        SE               B        SE
Within-students level
  Sex                           0.00     0.04             0.05     0.03
  Age (T1)                      0.06**   0.02             0.01     0.03
  Figural reasoning (T1)        0.08**   0.01             0.04**   0.01
  Math test (T1)                0.35**   0.03             –        –
  Math grade (T1)               –        –                0.68**   0.04
  Teachers' expectancies (T1)   0.11**   0.03             0.13**   0.03
Variance components
  Within-student R²             0.26                      0.62

Note: Unstandardized regression coefficients are reported. Grand-mean centering was used. Measurement time points: T1 = February, T3 = June. Students' sex was coded 0 = male, 1 = female. Grades were reverse coded, with 1 indicating the worst and 6 the best grade. R² = explained variance.
** p < .01.
Table 3
Two mediation models analyzing the role of students' self-concept as a possible mediator between teachers' expectancies of students' math competences and students' math achievement (Research question 2).

Research Question 2 (outcome math test)
                                 Mediator: self-concept (T2)   Outcome: math test (T3)
Variable                         B        SE                    B        SE
Within-students level
  Sex                            0.14*    0.03                  0.02     0.04
  Age (T1)                       0.02     0.02                  0.06**   0.02
  Figural reasoning (T1)         0.01     0.01                  0.09**   0.01
  Math test score (T1)           0.02     0.03                  0.34**   0.03
  Students' self-concept (T1)    0.55**   0.03                  0.04     0.03
  Teachers' expectancies (T1)    0.17**   0.03                  0.11**   0.03
  Students' self-concept (T2)                                   0.08*    0.04
Variance components
  Within-student R²              0.46                           0.27

Research Question 2 (outcome math grade)
                                 Mediator: self-concept (T2)   Outcome: math grade (T3)
Variable                         B        SE                    B        SE
Within-students level
  Sex                            0.17**   0.04                           0.03
  Age (T1)                       0.02     0.02                           0.03
  Figural reasoning (T1)         0.01     0.01                           0.01
  Math grade (T1)                0.15**   0.04                           0.04
  Students' self-concept (T1)    0.52**   0.03                           0.03
  Teachers' expectancies (T1)    0.09*    0.04                  0.15**   0.04
  Students' self-concept (T2)                                   0.07**   0.03
Variance components
  Within-student R²              0.46                           0.63

Note: In the upper part of the table, results of the mediation model for the math-test outcome are summarized; in the lower part, results for the math-grade outcome are summarized. Unstandardized regression coefficients are reported. Grand-mean centering was used. Measurement time points: T1 = February, T2 = April, T3 = June. Students' sex was coded 0 = male, 1 = female. Grades were reverse coded, with 1 indicating the worst and 6 the best grade. R² = explained variance.
* p < .05. ** p < .01.
Table 4
Two regression models analyzing teachers' aggregated expectancies of their class's math competence (at the between-students level) as a predictor of students' math achievement (Research question 3).

                                       Outcome: math test (T3)   Outcome: math grade (T3)
Variable                               B        SE                B        SE
Within-students level
  Sex                                  0.01     0.03              0.05     0.03
  Age (T1)                             0.06**   0.02              0.01     0.03
  Figural reasoning (T1)               0.08**   0.01              0.04**   0.01
  Math test (T1)                       0.34**   0.03              –        –
  Math grade (T1)                      –        –                 0.67**   0.04
  Teachers' expectancies (T1)          0.13**   0.03              0.14**   0.04
Between-students level
  Teachers' average expectancies (T1)  0.04     0.11              0.04     0.08
Variance components
  Within-student R²                    0.27                       0.62
  Between-student R²                   0.41                       0.71
  Snijders & Bosker's R²               0.31                       0.64

Note: Unstandardized regression coefficients are reported. Grand-mean centering was used for the calculation of contextual effects (model constraints). Measurement time points: T1 = February, T3 = June. Students' sex was coded 0 = male, 1 = female. Grades were reverse coded, with 1 indicating the worst and 6 the best grade. R² = explained variance.
** p < .01.
p = .00, in the math test model vs. b = .09, p = .01, in the math grade model). We also found significant direct effects of students' self-concept (T2) on students' achievement (b = .08, p = .08, for the math test; b = .07, p = .01, for the math grade). The indirect effect of teachers' expectancies on the math test mediated by students' self-concept was not significant (b = .01, p = .07). The indirect effect of teachers' expectancies on the math grade mediated by students' self-concept was small and significant (b = .01, p = .05). We calculated the effect-size measure κ², which reflects the size of the indirect effect relative to the maximum possible indirect effect given the constraints of the variance–covariance matrix of the three variables involved in the analysis. The Level 1 mediation effect size was rather small, with κ² = 0.03 for the math test and κ² = 0.09 for the math grade. We further calculated the within-student R² values, which indicated 27% explained variance in students' math test scores and 63% explained variance in students' math grades at the within-students level. The main results of the two multilevel mediation models are summarized in Fig. 1.
To summarize the findings for Research Question 2: The association between teachers' expectancies of their students' competences and students' achievement was partially mediated by students' self-concept in math for the math-grade outcome but not for the math-test outcome.
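The product-of-coefficients logic behind these indirect effects can be sketched with the reported Level 1 paths for the math-test model. The Sobel-type standard error below is one common delta-method approximation for illustration only; the estimates and p values reported above come from the multilevel models in Mplus.

```python
import math

# Reported Level 1 paths for the math-test model (see Table 3 / Fig. 1):
a, se_a = 0.17, 0.03   # teachers' expectancies (T1) -> self-concept (T2)
b, se_b = 0.08, 0.04   # self-concept (T2) -> math test score (T3)

# Product-of-coefficients indirect effect; rounds to the reported b = .01
indirect = a * b

# Sobel-type (delta-method) standard error and z statistic
se_indirect = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
z = indirect / se_indirect  # close to, but short of, the z = 1.96 cutoff
```

Consistent with the nonsignificant indirect effect reported for the math test (p = .07), this rough approximation also leaves the z statistic just below the conventional significance threshold.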
3.4. Expectancy effects at the class level (research question 3)
Research Question 3 addressed whether teachers' average expectancies of their students' math competence would be reflected in the students' achievements. To this end, teachers' average expectancies were applied at the class level to predict students' two math achievement outcomes (see Table 4). In both models, we controlled for students' sex, age, figural reasoning score, prior math achievement, and teachers' expectancies of individual students' math competence at the within-students level.
In addressing Research Question 3, we found no significant association between teachers' average expectancies and students' later
[Fig. 1 shows two path diagrams at the within-students level (Level 1). Math test model: teachers' expectancies (T1) → students' self-concept (T2) = 0.17**; students' self-concept (T2) → math test score (T3) = 0.08*; direct effect of teachers' expectancies (T1) on math test score (T3) = 0.11**. Math grade model: teachers' expectancies (T1) → students' self-concept (T2) = 0.09*; students' self-concept (T2) → math grade (T3) = 0.07**; direct effect of teachers' expectancies (T1) on math grade (T3) = 0.15**.]
Fig. 1. Results of multilevel mediation models at the within-students level (Research Question 2): Students' self-concept was assumed to mediate the effect of teachers' expectancies on students' math achievement. Measurement time points: T1 = February, T2 = April, T3 = June. Although not illustrated, models were calculated controlling for students' sex, age, figural reasoning score, prior math achievement, and prior self-concept (T1) in math.
* p < .05. ** p < .01.
math test score (b = .04, p = .70) or students' later math grade (b = .04, p = .66). We analyzed contextual effects by subtracting the Level 1 effects from the Level 2 effects. The contextual effect was negative for the math test (b = −0.17, p = .14) as well as for the math grade (b = −0.11, p = .23), but the coefficients were not significant. Therefore, for students with equal preconditions, being in a class environment with students for whom their teacher held generally high or low expectancies in math had no association with their individual math achievement. We further calculated the explained variance components, resulting in within-student R² values of 27% and 62%, between-student R² values of 41% and 71%, and Snijders and Bosker's R² values of 31% and 64%. The large and significant between-student R² values can be explained by the impact of the control variables, especially prior achievement, on students' later achievement (the results for the control variables at the between-students level are not displayed in Table 4).
To summarize the findings for Research Question 3: Teachers' average expectancies were found to have no association with students' test scores or math grades after controlling for students' sex, age, figural reasoning score, teachers' expectancies at the within-students level, and prior achievement.
4. Discussion
Teachers form expectancies of their students' achievements. According to the Pygmalion effect, students perceive and react to their teachers' expectancies, and these perceptions and reactions result in more or less positive learning outcomes, regardless of whether the teachers' prior expectancies were accurate. And indeed, in studying the expectancies of 73 teachers and the achievements of their fifth-grade students, we found that teachers' expectancies were positively associated with students' math achievement at the end of the school year.
4.1. Achievement measures
In the present study, we focused on two indicators of students' achievement, thus allowing us to compare effects. In Research Question 1, teachers' expectancies predicted grades and test scores equally well, and the coefficients for the math grade were only descriptively slightly higher. Regarding the mediation analyses, the direct effect of teachers' expectations on student outcomes was again nearly identical, being slightly higher for the math-grade outcome than for the math-test outcome. For Research Question 3, the results for teachers' expectancies at the within level (those at the between level were not significant) were identical. Thus, we found almost identical coefficients for teachers' expectancies on students' math grades and on their test scores. As the teachers were responsible for reporting the data used in the study as well as for giving the grades, we might have expected higher coefficients for the math-grade outcome, as found by Jussim and Eccles (1992), for example.
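The mediation comparison above rests on the standard decomposition of a total expectancy effect into a direct path and an indirect path through the mediator (here, students' self-concept). A single-level sketch of that logic with simulated data follows; the study's analyses were multilevel, and every coefficient below is an illustrative assumption rather than one of the reported estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Hypothetical paths (assumptions for illustration, not the study's estimates):
# path a: expectancy -> self-concept; path b: self-concept -> grade;
# path c': the direct expectancy -> grade effect.
expectancy = rng.normal(size=n)
self_concept = 0.4 * expectancy + rng.normal(size=n)                  # a = 0.4
grade = 0.2 * expectancy + 0.3 * self_concept + rng.normal(size=n)    # c' = 0.2, b = 0.3

def ols_slopes(y, *xs):
    """Return the slope estimates from an OLS regression of y on xs (with intercept)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

a = ols_slopes(self_concept, expectancy)[0]
c_prime, b = ols_slopes(grade, expectancy, self_concept)
total = ols_slopes(grade, expectancy)[0]

# For OLS with a single mediator, total = c' + a*b holds as an algebraic identity
print(f"indirect (a*b): {a * b:.3f}  direct (c'): {c_prime:.3f}  total: {total:.3f}")
```

The near-equal direct effects reported for the two outcomes correspond to the c' path in this decomposition, estimated once per outcome.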
One explanation for this finding might be that although teachers reported their expectancies of students' competences, and we could expect those expectations to be highly correlated with students' actual performance, school grades are not just objective indicators of students' performance; they serve other functions as well (i.e., pedagogical, informational, selectional; Lorenz & Artelt, 2009). Tests have the advantage of focusing more on students' real achievement. However, one-time achievement tests are influenced more by situational events during test taking than grades are, as grades incorporate achievement assessments from several occasions, in written and verbal form, over a whole school year. All in all, as both methods (grades and tests) have their strengths and weaknesses, we profited from combining the two in the present study and suggest combining them in future studies as well.
achieving good test results). With this assessment method, the differential impact of the perceived teachers' expectancies for individuals and for the class on individual students' outcomes, especially under control of students' competences and self-concept, could be analyzed. To the best of our knowledge, the studies conducted to date have assessed only students' perceptions of their teachers' individual expectancies (Dickhäuser & Stiensmeier-Pelster, 2003; Freiberger et al., 2012; Weinstein, Marshall, Brattesani, & Middlestadt, 1982); therefore, examining students' views of both types of teacher expectancies might be an interesting addition to current expectancy research. Possible differential and interaction effects of teachers' individual and whole-class expectancies on students' outcomes, as well as classroom variability in the perception of these expectancies (e.g., Weinstein et al., 1982), should be examined in future studies.
Finally, we recommend that future studies investigate whether assessing teachers' whole-class expectancies directly (e.g., "This class is capable of achieving good test results") is better suited to studying class-level effects of teacher expectancies than using averaged scores of teachers' expectancies for individual students (e.g., "This student is capable of achieving good test results"). Future research needs to explore which strategy for assessing the expectancies for whole classes yields more insight, especially regarding differences in students' achievement.
Acknowledgments
Alena Friedrich, Barbara Flunger, Benjamin Nagengast, Kathrin Jonkmann, and Ulrich Trautwein, Hector Research Institute of Education Sciences and Psychology, Europastraße 6, 72072 Tübingen, Germany. This research was supported by a grant from the German Federal Ministry of Education and Research to Bernhard Schmitz and Ulrich Trautwein (01JH0918). Alena Friedrich was a member of the Graduate School "Empirical Educational Research," which is supported by the Ministry of Science, Research, and the Arts in Baden-Württemberg.
References
Bauer, D. J., Preacher, K. J., & Gil, K. M. (2006). Conceptualizing and testing random indirect effects and moderated mediation in multilevel models: New procedures and recommendations. Psychological Methods, 11(2), 142–163. doi:10.1037/1082-989X.11.2.142.
Begeny, J. C., Eckert, T. L., Montarello, S. A., & Storie, M. S. (2008). Teachers' perceptions of students' reading abilities: An examination of the relationship between teachers' judgments and students' performance across a continuum of rating methods. School Psychology Quarterly, 23(1), 43–55. doi:10.1037/1045-3830.23.1.43.
Bong, M. (2001). Between- and within-domain relations of academic motivation among middle and high school students: Self-efficacy, task value, and achievement goals. Journal of Educational Psychology, 93(1), 23–34. doi:10.1037/0022-0663.93.1.23.
Brattesani, K. A., Weinstein, R. S., & Marshall, H. H. (1984). Student perceptions of differential teacher treatment as moderators of teacher expectation effects. Journal of Educational Psychology, 76(2), 236–247. doi:10.1037/0022-0663.76.2.236.
Brennan, R. T., Kim, J., Wenz-Gross, M., & Siperstein, G. N. (2001). The relative equitability of high-stakes testing versus teacher-assigned grades: An analysis of the Massachusetts Comprehensive Assessment System (MCAS). Harvard Educational Review, 71(2), 173–216.
Brophy, J. E. (1983). Research on the self-fulfilling prophecy and teacher expectations. Journal of Educational Psychology, 75(5), 631–661. doi:10.1037/0022-0663.75.5.631.
Brophy, J. E., & Good, T. L. (1970). Teachers' communication of differential expectations for children's classroom performance: Some behavioral data. Journal of Educational Psychology, 61(5), 365–374. doi:10.1037/h0029908.
Cooper, H. M. (1979). Pygmalion grows up: A model for teacher expectation communication and performance influence. Review of Educational Research, 49(3), 389–410. doi:10.3102/00346543049003389.
Croninger, R. G., Rice, J. K., Rathbun, A., & Nishio, M. (2003). Teacher qualifications and first-grade achievement: A multilevel analysis. Paper presented at the 2nd International Symposium on Educational Attainment and School Reform: Policy, Evaluation, and Classroom Practices. Center for Research on Core Academic Competence, The University of Tokyo, Tokyo, Japan.
Dickhäuser, O., & Stiensmeier-Pelster, J. (2003). Wahrgenommene Lehrereinschätzungen und das Fähigkeitsselbstkonzept von Jungen und Mädchen in der Grundschule [Perceived teachers' ability evaluations and boys' and girls' concepts of their mathematical ability in elementary school]. Psychologie in Erziehung und Unterricht, 50(2), 182–190.
Eden, D. (1990). Pygmalion without interpersonal contrast effects: Whole groups gain from raising manager expectations. The Journal of Applied Psychology, 75(4), 394–398. doi:10.1037/0021-9010.75.4.394.
Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12(2), 121–138. doi:10.1037/1082-989X.12.2.121.
Freiberger, V., Steinmayr, R., & Spinath, B. (2012). Competence beliefs and perceived ability evaluations: How do they contribute to intrinsic motivation and achievement? Learning and Individual Differences, 22(4), 518–522. doi:10.1016/j.lindif.2012.02.004.
Friedrich, A., Jonkmann, K., Nagengast, B., Schmitz, B., & Trautwein, U. (2013). Teachers' and students' perceptions of self-regulation and math competence: Differentiation and agreement. Learning and Individual Differences, 27, 26–34. doi:10.1016/j.lindif.2013.06.005.
Frome, P. M., & Eccles, J. S. (1998). Parents' influence on children's achievement-related perceptions. Journal of Personality and Social Psychology, 74(2), 435–452. doi:10.1037/0022-3514.74.2.435.
Goetz, T., Frenzel, A. C., Pekrun, R., Hall, N. C., & Lüdtke, O. (2007). Between- and within-domain relations of students' academic emotions. Journal of Educational Psychology, 99(4), 715–733. doi:10.1037/0022-0663.99.4.715.
Good, T. L. (1987). Two decades of research on teacher expectations: Findings and future directions. Journal of Teacher Education, 38(4), 32–47. doi:10.1177/002248718703800406.
Harris, M. J., & Rosenthal, R. (1985). Mediation of interpersonal expectancy effects: 31 meta-analyses. Psychological Bulletin, 97(3), 363–386. doi:10.1037/0033-2909.97.3.363.
Hastings, R. P., & Bham, M. S. (2003). The relationship between student behaviour patterns and teacher burnout. School Psychology International, 24(1), 115–127. doi:10.1177/0143034303024001905.
Heller, K. A., & Perleth, C. (2000). Kognitiver Fähigkeitstest für 4. bis 12. Klassen, Revision (KFT 4–12+ R) [Cognitive ability test for class levels 4 to 12, revised version]. Göttingen, Germany: Hogrefe.
Hoge, R. D., & Butcher, R. (1984). Analysis of teacher judgments of pupil achievement levels. Journal of Educational Psychology, 76(5), 777–781.
Jussim, L. (1989). Teacher expectations: Self-fulfilling prophecies, perceptual biases, and accuracy. Journal of Personality and Social Psychology, 57(3), 469–480. doi:10.1037/0022-3514.57.3.469.
Jussim, L., Eccles, J., & Madon, S. (1996). Social perception, social stereotypes, and teacher expectations: Accuracy and the quest for the powerful self-fulfilling prophecy. In M. P. Zanna (Ed.), Advances in experimental social psychology (Vol. 28, pp. 281–388). San Diego, CA: Academic Press.
Jussim, L., & Eccles, J. S. (1992). Teacher expectations II: Construction and reflection of student achievement. Journal of Personality and Social Psychology, 63(6), 947–961. doi:10.1037/0022-3514.63.6.947.
Jussim, L., & Eccles, J. S. (1995). Naturally occurring interpersonal expectancies. In N. Eisenberg (Ed.), Social development (pp. 74–108). Thousand Oaks, CA: Sage Publications.
Jussim, L., & Harber, K. D. (2005). Teacher expectations and self-fulfilling prophecies: Knowns and unknowns, resolved and unresolved controversies. Personality and Social Psychology Review, 9(2), 131–155. doi:10.1207/s15327957pspr0902_3.
Kenny, D. A. (2004). PERSON: A general model of interpersonal perception. Personality and Social Psychology Review, 8, 265–280. doi:10.1207/s15327957pspr0803_3.
Kierein, N. M., & Gold, M. A. (2000). Pygmalion in work organizations: A meta-analysis. Journal of Organizational Behavior, 21(8), 913–928. doi:10.1002/1099-1379(200012)21:8<913::AID-JOB62>3.0.CO;2-#.
Kimball, M. M. (1989). A new perspective on women's math achievement. Psychological Bulletin, 105(2), 198–214.
Kreft, I. G. G., de Leeuw, J., & Aiken, L. S. (1995). The effect of different forms of centering in hierarchical linear models. Multivariate Behavioral Research, 30(1), 1–21. doi:10.1207/s15327906mbr3001_1.
Krull, J. L., & MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated effects. Multivariate Behavioral Research, 36(2), 249–277. doi:10.1207/S15327906MBR3602_06.
Kuklinski, M. R., & Weinstein, R. S. (2001). Classroom and developmental differences in a path model of teacher expectancy effects. Child Development, 72(5), 1554–1578. doi:10.1111/1467-8624.00365.
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2001). A comprehensive meta-analysis of the predictive validity of the Graduate Record Examinations: Implications for graduate student selection and performance. Psychological Bulletin, 127(1), 162–181. doi:10.1037/0033-2909.127.1.162.
Kuncel, N. R., Wee, S., Serafin, L., & Hezlett, S. A. (2010). The validity of the Graduate Record Examination for master's and doctoral programs: A meta-analytic investigation. Educational and Psychological Measurement, 70(2), 340–352. doi:10.1177/0013164409344508.
Little, T. D. (1997). Mean and Covariance Structures (MACS) analyses of cross-cultural data: Practical and theoretical issues. Multivariate Behavioral Research, 32(1), 53–76. doi:10.1207/s15327906mbr3201_3.
Lorenz, C., & Artelt, C. (2009). Fachspezifität und Stabilität diagnostischer Kompetenz von Grundschullehrkräften in den Fächern Deutsch und Mathematik [Domain specificity and stability of diagnostic competence among primary school teachers in the domains of German and mathematics].
Preacher, K. J., & Kelley, K. (2011). Effect size measures for mediation models: Quantitative strategies for communicating indirect effects. Psychological Methods, 16(2), 93–115. doi:10.1037/a0022658.
Preacher, K. J., Zyphur, M. J., & Zhang, Z. (2010). A general multilevel SEM framework for assessing multilevel mediation. Psychological Methods, 15(3), 209–233. doi:10.1037/a0020141.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models (2nd ed.). Thousand Oaks, CA: Sage.
Reyna, C. (2000). Lazy, dumb, or industrious: When stereotypes convey attribution information in the classroom. Educational Psychology Review, 12(1), 85–110. doi:10.1023/A:1009037101170.
Reyna, C. (2008). Ian is intelligent but Leshaun is lazy: Antecedents and consequences of attributional stereotypes in the classroom. European Journal of Psychology of Education, 23(4), 439–458. doi:10.1007/BF03172752.
Rosenthal, R. (1974). On the social psychology of the self-fulfilling prophecy: Further evidence for Pygmalion effects and their mediating mechanism. New York: MSS Modular Publications.
Rosenthal, R. (2010). Pygmalion effect. In The Corsini encyclopedia of psychology (Vol. 3). Hoboken, NJ: John Wiley & Sons.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the classroom. New York: Holt, Rinehart & Winston.
Rosenthal, R., & Jacobson, L. (1992). Pygmalion in the classroom (expanded ed.). New York: Irvington.
Rosenthal, R., & Rubin, D. B. (1978). Interpersonal expectancy effects: The first 345 studies. The Behavioral and Brain Sciences, 1(3), 377–415. doi:10.1017/S0140525X00075506.
Rubie-Davies, C. M. (2007). Classroom interactions: Exploring the practices of high- and low-expectation teachers. The British Journal of Educational Psychology, 77(2), 289–306. doi:10.1348/000709906X101601.
Rubie-Davies, C. M., Hattie, J., & Hamilton, R. (2006). Expecting the best for students: Teacher expectations and academic outcomes. The British Journal of Educational Psychology, 76(3), 429–444. doi:10.1348/000709905X53589.
Rubie-Davies, C. M., Peterson, E. R., Sibley, C. G., & Rosenthal, R. (2014). A teacher expectation intervention: Modelling the practices of high expectation teachers. Contemporary Educational Psychology. doi:10.1016/j.cedpsych.2014.03.003.
Schwanzer, A. D., Trautwein, U., Lüdtke, O., & Sydow, H. (2005). Entwicklung eines Instruments zur Erfassung des Selbstkonzepts junger Erwachsener [Development of a measure for the assessment of young adults' self-concept]. Diagnostica, 51(4), 183–194. doi:10.1026/0012-1924.51.4.183.
Shulman, L. S. (1986). Paradigms and research programs in the study of teaching: A contemporary perspective. In M. C. Wittrock (Ed.), Handbook of research on teaching (pp. 3–36). New York: Macmillan.
Smith, A. E., Jussim, L., Eccles, J., VanNoy, M., Madon, S., & Palumbo, P. (1998). Self-fulfilling prophecies, perceptual biases, and accuracy at the individual and group levels. Journal of Experimental Social Psychology, 34(6), 530–561. doi:10.1006/jesp.1998.1363.
Snijders, T. A. B., & Bosker, R. J. (1994). Modeled variance in two-level models. Sociological Methods & Research, 22(3), 342–363. doi:10.1177/0049124194022003004.