SUMMARY. The object of the study was to examine the statistical techniques available
for the analysis of process-product studies involving non-randomised quasi-experimental designs, and to demonstrate the practical effects of their use on the data from the
Teaching Styles study (Bennett, 1976). Of particular concern were the 'unit of analysis'
or aggregation problem, and the differential effects of treatment grouping by cluster
and factor methods.
The original grouping of teachers into formal, informal and mixed styles was
investigated using a latent class model for the 38 binary questionnaire items. Convincing
evidence of three overlapping latent classes was found. The comparison of latent classes
in terms of pre-test gain scores was examined using a series of variance component
models, allowing for correlation of children within the same class. Differences among
classes were altered by the probabilistic clustering of the latent class model compared
to the original findings, and the significance of the differences was reduced when the
correlation among children was allowed for.
INTRODUCTION
In the four years since the publication of Teaching Styles and Pupil Progress (subsequently abbreviated to TS) there have been rapid developments in the statistical
methods available for the analysis of complex data. While these developments are
still in their early stages, it is already clear that they will have an important influence
on the analysis of large-scale educational research studies. Two of these developments are particularly important for the analysis of educational data from surveys and
observational studies: the development of latent class models for clustering nonhomogeneous populations, and the development of unbalanced variance component
('mixed') models for nested and cluster sampling structures.
The objects of this article are to describe the application of these modelling
procedures to the Teaching Styles data, to report the conclusions drawn, and to
compare these conclusions with those found in the original analysis. Implications for
future research studies are also discussed (for statistical detail see Aitkin et al., 1981).
In the re-analysis, two main questions were considered:
(1) Is there convincing statistical evidence of distinguishable teaching styles? If
so, how many styles can be convincingly identified, and how can these be characterised?
(2) Is teaching style, as determined statistically above, related to overall pupil
progress?
THE EXISTENCE OF DISTINGUISHABLE TEACHING STYLES
Cluster analysis
The use of cluster analysis in educational research is increasing as researchers
recognise the utility of grouping people rather than grouping variables. Barker
Lunn's (1970) study of streaming in the primary school was the first major investigation to use this approach, delineating two 'types' of teaching closely conforming
to the progressive-traditional dichotomy. The two most recent studies (Bennett,
M. AITKIN,S. N. BENNETT
and JANE HESKETH
1976; Galton et al., 1980) used identical cluster methods to delineate both teacher and
pupil types although the data base was different. The method chosen was based on
iterative relocation using a Euclidean distance metric. Nevertheless it was recognised
in both studies that uncertainties about the method itself, for example the most
appropriate similarity coefficient, should be reflected in cluster interpretation (cf.
Bennett and Jordan, 1975; Galton et al., 1980, appendix 2c).
Uncertainty about technique is perhaps best illustrated by the most recent American study to adopt this approach. Solomon and Kendall (1979) cluster analysed data
from 50 teachers and 1,200 pupils and in so doing they tried several cluster techniques
- Q factor analysis, Linear Typal analysis, Cluster build-up, Elementary Linkage
analysis and a hierarchical method. They reported that although most provided six
teacher clusters they produced somewhat different results. In order to overcome this
they developed several sets of 'core clusterings', each started from the vantage point
of one of the clustering methods. They then identified for each cluster those classes
which also fell into the same group by at least two of the other clustering methods.
Discriminant function analysis was then used to complete the cluster assignments.
While researchers have been struggling with the practical application of clustering
methods, statisticians have been considering their statistical foundations. Everitt
(1977), for example, argued that 'A fundamental problem in this area is the lack of a
satisfactory definition of exactly what constitutes a cluster. Because of this, most
clustering techniques cannot be formulated in terms of a satisfactory model . . . Most
cluster analysis methods are essentially non-statistical in the sense that they have no
associated distribution theory or significance tests, and so are unable to relate from
sample to population . . .' Hartigan (1977) pointed out the sparsity of methods for
establishing the reality of clustering: 'The very large growth in clustering techniques and applications is not yet supported by development of statistical theory by
which the clustering results may be evaluated . . . There are many guesses, conjectures, analogies, and hopes, and only a few hard results.' Aitkin (1979) has also
pointed out the unsatisfactory nature of clustering methods which are not based on a
probability model: 'How do we know that a particular configuration of clusters
produced by a numerical algorithm would also have been produced by a different
random sample from the same population, or by a different algorithm on the same
sample? What confidence can be placed in the existence of real clusters? . . . The
only methods of cluster analysis which allow formal statistical tests for the actual
existence of clusters . . . are those based on mixture models . . .'
Clustering methods based on probability models allow estimation and hypothesis
testing within the framework of standard statistical theory. Though theoretical
difficulties remain in deciding on the number of clusters, for a given number of clusters
the assignment of individuals to clusters is based on standard likelihood ratio methods
analogous to those used in discriminant analysis.
Re-analysis
In re-examining the existence of distinguishable teaching styles, a mixture or latent
class probability model was adopted using the original 38 binary items from the
teacher questionnaire. It is assumed that the population consists of k homogeneous
subpopulations or latent classes of teachers, each class having a distinct teaching style.
Each class is characterised by a set of 38 response probabilities, the probabilities of
responding YES to each of the 38 binary items in Table 1. Given these probabilities,
the probability that a teacher belongs to the j-th class is calculated from Bayes'
theorem using the pattern of Yes and No responses for the teacher. Full details
of the model and the method of estimation of the response probabilities are given in
Aitkin and Bennett (1980). It is an important feature of this form of probabilistic
clustering that it does not produce assignments of individuals to classes, but gives
Teaching Styles
instead the probability that each individual belongs to each latent class. This is
preferable to a formal assignment rule (as in discriminant analysis) which assigns each
individual to the class to which he or she has the greatest probability of belonging,
since this overstates the information available about cluster membership.
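The Bayes' theorem step described above can be sketched as follows. This is a minimal illustration, not the authors' estimation program: it assumes the usual latent class assumption that items are answered independently within a class, and the function name `class_posteriors` and the three-item numbers are hypothetical. Response probabilities must lie strictly between 0 and 1 for the logarithms to be defined.

```python
import math

def class_posteriors(responses, class_probs, class_weights):
    """Posterior P(class j | response pattern) under a latent class model.

    responses     : list of 0/1 answers (1 = YES), one per item
    class_probs   : class_probs[j][l] = P(YES on item l | class j), strictly in (0, 1)
    class_weights : prior class proportions, summing to 1
    """
    log_joint = []
    for theta_j, weight in zip(class_probs, class_weights):
        # Assumption: items are independent within a class.
        total = math.log(weight)
        for answer, theta in zip(responses, theta_j):
            total += math.log(theta if answer else 1.0 - theta)
        log_joint.append(total)
    # Normalise on the log scale for numerical stability.
    top = max(log_joint)
    unnormalised = [math.exp(v - top) for v in log_joint]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]

# Hypothetical three-item, two-class example (values invented for illustration).
probs = [[0.9, 0.8, 0.2],   # class 1
         [0.3, 0.4, 0.7]]   # class 2
post = class_posteriors([1, 1, 0], probs, [0.5, 0.5])
```

The result is a vector of membership probabilities rather than a single label, which is the feature of probabilistic clustering emphasised in the text.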
Parameter estimates
The parameter estimates (i.e., the maximum likelihood estimates of the response
probabilities) for the two- and three-latent class models are shown in Table 1. The
item number corresponds to that in TS (pp. 166-9), the number in parentheses next
to the item number being the number of this item in Table 2 of Bennett and Jordan
(1975). For the two-class model, the response probabilities marked † show large
TABLE 1
TWO- AND THREE-LATENT CLASS PARAMETER ESTIMATES ($100 \times \hat{\theta}_{jl}$) FOR TEACHER DATA
Two-Class Model
Item
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16 (ij
(11)
(6)
(7)
(8)
(iii) (9)
(iv) (10)
(v) (11)
17
18
19
20
21
22
23
24
25
26
27 (i)
(12)
(13)
(14)
(15)
(ii) (16)
(iii)
(IV)
(v) (17)
28 (i) (18)
(ii)
(iii) (19)
Class 1
22
60
35
91
97
89
97
82
85
32
90
19
92
29
35
71
29
15
55
28
18
87
43
84
57
59
73
66
09
97
70
65
86
24
19
85
55
22
0.538
Class 2
43
87f
23
54t
63t
48t
76t
gt
60
66t
49
76
37
22
44
42
gt
50
;
it
14t
68
29
38
51
44
09
95
53
42
77
17
15
+62: 5 t
0.462
Three-Class Model
Class 1
Class 2
Class 3
20
54
36
91
100
94
96
92
90
33
95
20
97
28
45
73
24
13
57
29
14
87
50
86
65
68
83
75
07
98
69
64
85
21
15
87
53
21
44
88
22
52
53
50
69
39
70
70
62
56
80
39
29
37
45
59
32
62
95
16
64
30
43
56
48
01
99
49
33
74
13
08
43
61
75
33
79
..
30
89
74
61
95
56
69
35
77
26
75*
34
12*
62
38
20
50
26*
34
90
20
78
34
35*
46*
42*
18*
91*
67
63
85
28*
27*
73
63*
33
0.366
0.312
† Indicates an item with large differences in response probability between Classes 1 and 2.
$\hat{\theta}_{jl}$ is the estimated probability that a teacher in class j responds YES to item l.
60
0.322
Figure 1 (a). Probability of class membership, two latent-class model.
Figure 1 (b). Probability of class membership, three latent-class model.
Latent
Class
Mixed
(Class 3)
Informal
(Class 2 )
Total
10
11
11
2
7 2 6 1 9
6 1 4
6
1 1 3
8
(3) (41) (46) (6) (27) (69) (63) (20) (39) (20) (21)
1
22
34
19
8
15
10
2
4
3
(97) (59) (33) (67) (58) (26) (7) (13) (8) (3) (0)
33
26
38
30
35
32
24
30
36
31
39
12
(0)
-
(0)
36
Unclass
31
(39)
31
(40)
78
Total
144
149
468
The top entry is the number of teachers in each latent class who fall in the corresponding TS cluster, and the bottom
entry is the percentage of teachers out of the total in this cluster.
latent classes (2 and 1 respectively). About 40 per cent of TS Cluster 2 teachers are in
latent class 3, the mixed class, as are 20 per cent of TS Cluster 11 teachers. The
remaining TS clusters are split across all three classes to varying degrees, the proportion of Class 1 teachers increasing, and of Class 2 teachers decreasing, fairly steadily
from Cluster 1 to Cluster 12. Clusters 6 and 7 contain the greatest proportion of
Class 3 teachers.
The general pattern of Table 2 supports the ordering in TS from Cluster 1 to
Cluster 12 of increasing formality, though as noted there (p. 47), clusters other
than 1 and 12 contain both formal and informal elements.
It was noted above that the formal assignment of teachers to latent classes overstates the information available from the probabilistic clustering. Since the conclusions drawn about pupil progress in TS depend critically on the cluster membership
of the 37 teachers, Table 3 considers the actual latent class membership for these
teachers. Table 3 shows the probabilities of latent class membership for 36 of the
teachers (one mixed TS style teacher could not be identified, and has been omitted
from this table) for the three-class model.
TABLE 3
LATENT CLASS PROBABILITY AND TS STYLE CATEGORY FOR 36 TEACHERS: THREE CLASSES
TS Style
Latent
Class
Formal
Mixed
Informal
100
100
99
99
100
100
92
100
98
100
71
94
01
01
-
08
02
29
06
100
100
70
12
44
01
100
85
11
-
3
~
- - - - _ - 30
- 01 85
8 8 49
07
98
01
14
86
1 5 8 9 01
99
_
_
_
-
_
-
03
03
73
36
93
2
~
100
100
14
100
97
100
91
100
100
100
27
64
07
The entries are the probabilities of latent class membership (× 100) for the
three-class model for 36 of the 37 teachers in TS Chapter 5.
The formal TS teachers, with one exception, have very high probabilities of
belonging to Class 1. The one exception has a probability of 0.29 of belonging to
Class 3, the mixed class. Nine of the 13 informal TS teachers have very high probabilities of belonging to Class 2, but three of the remaining four have high probabilities
of belonging to Class 3, and the fourth is essentially unidentified. The mixed TS
teachers are poorly identified: three clearly belong to Class 1, one to Class 2, and one
to Class 3, while the remainder have substantial probabilities of belonging to two
classes.
Conclusion
There is convincing statistical evidence, based on the latent class model, of three
distinguishable but overlapping teaching styles. Two of these correspond closely to
the broad classes formal and informal as these terms were used in TS. The third
class, called 'mixed' here as in TS, is characterised by a low frequency of testing and
assessment, and a relatively high frequency of disciplinary problems. The classification of the 36 teachers used in TS corresponds closely to the class membership
probabilities for formal teachers, less so for informal teachers and poorly for mixed
teachers.
THE RELATION OF TEACHING STYLE TO PUPIL PROGRESS
In Chapter 5 of TS the relation between teaching style and pupil progress was
investigated using an analysis of covariance model. The analysis was based on the
individual pre-test and test scores of each child, the children being classified by the
teaching style (formal, mixed, informal) of the teacher.
There has been considerable discussion in the educational research literature of
the 'unit of analysis' question: should the child or the classroom be treated as the
'unit' on which statistical analysis is based? Gray and Satterly (1976) raised this
question in their discussion of TS, and Bennett and Entwistle (1976) referred to it
briefly in their reply. Satterly and Gray (1976) gave a more detailed discussion of some
of the statistical issues involved, and recognised the need for a variance component
model for the data. A 'mixed' or variance component model for 'clustered' or
'nested' sample designs is developed below for the one-way analysis of covariance
for pre-test/test situations. This model is then applied to the latent class membership
assignment for the 36 teachers described in Section 1. The model is then adapted for
the probability of latent class membership of the teacher.
Variance component model for the analysis of covariance
Let $y_{pqr}$ denote the achievement test score, and $x_{pqr}$ the pre-test score, of the
r-th child in the q-th classroom, taught by method p, where $r = 1, \ldots, n_q$, $q = 1, \ldots, 36$, $p = 1, 2, 3$, and $N = \sum_q n_q$. All subsequent analyses will be based on extensions or
contractions of the variance component model:
$$y_{pqr} = \mu + \gamma x_{pqr} + \alpha_p + T_q + E_{pqr}.$$
Here $T_q$ and $E_{pqr}$ are mutually independent random variables, assumed to be normally
distributed:
$$T_q \sim N(0, \sigma^2_T), \qquad E_{pqr} \sim N(0, \sigma^2_E),$$
so that the scores of two children $r$ and $r'$ in the same classroom are correlated:
$$\mathrm{corr}(y_{pqr}, y_{pqr'}) = \rho = \sigma^2_T / (\sigma^2_T + \sigma^2_E).$$
The intraclass correlation $\rho$ may be large if $\sigma^2_T$ is large compared with $\sigma^2_E$, and is
zero only when $\sigma^2_T = 0$, that is when there is no variation among teachers in the
teacher population, which will rarely happen in practice.
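As a small worked illustration of the intraclass correlation formula, the sketch below applies it to the pre-test variance component estimates reported in Table 4. The function name is hypothetical; the input values are the teacher and child variance components as read from that table.

```python
def intraclass_correlation(var_teacher, var_child):
    """rho = sigma_T^2 / (sigma_T^2 + sigma_E^2)."""
    return var_teacher / (var_teacher + var_child)

# (sigma_T^2, sigma_E^2) estimates for the pre-test scores, as read from Table 4.
components = {
    "reading": (61, 185),
    "mathematics": (53, 120),
    "english": (59, 157),
}
rho = {subject: round(intraclass_correlation(t, e), 2)
       for subject, (t, e) in components.items()}
```

The rounded values agree with the correlations printed in Table 4 (0.25, 0.31 and 0.27), none of which is close to zero.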
The above model may be extended to allow for pre-test by method interactions:
it may happen that the slope of the regression of test score y on pre-test score x is
different for different methods. A comparison of the methods then depends on the
covariate value considered, and one method may be superior for low pre-test scores,
while another is superior for high pre-test scores. The extended model is
$$y_{pqr} = \mu + \gamma_p x_{pqr} + \alpha_p + T_q + E_{pqr},$$
and the regressions are now $\mu + \gamma_1 x_{1qr} + \alpha_1$ for method 1, $\mu + \gamma_2 x_{2qr} + \alpha_2$ for method 2,
and $\mu + \gamma_3 x_{3qr} + \alpha_3$ for method 3.
Unconditional conclusions about the relative superiority of one treatment to
another are not possible in general with this extended model. Although methods are
available for drawing conditional conclusions given the value of the pre-test score, this
is not pursued further, as the interaction model will not be found necessary.
In general, efficient (maximum likelihood) estimation of the parameters in the
above models requires extensive iterative computation, even when the class sizes are
equal. Several simpler methods are available which give consistent, but not efficient,
estimates, and for which approximate ANOVA tables can be constructed. Three of
these were applied to the TS data, both for internal comparisons, and for comparison
with the efficient method. Discussion of these methods is given in Aitkin and Bennett
(1980). The methods are summarised in terms of their estimation of the fixed effects
as follows: I, ignore the random effects; II, unweighted class means; III, class
means weighted by sample size.
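The three simple estimation methods can be sketched roughly as below. This is an illustrative reading of the one-line summaries above, not the authors' procedure: it assumes Method I is ordinary least squares on the individual children, Method II is least squares on the classroom means, and Method III is least squares on the classroom means weighted by class size. The function names and the simulated data are hypothetical.

```python
import numpy as np

def design_matrix(x, style, styles):
    """Columns: intercept, pre-test score, one dummy per style except the last (baseline)."""
    dummies = [(style == s).astype(float) for s in styles[:-1]]
    return np.column_stack([np.ones(len(x)), x] + dummies)

def fit_three_methods(y, x, style, classroom):
    y, x = np.asarray(y, float), np.asarray(x, float)
    style, classroom = np.asarray(style), np.asarray(classroom)
    styles = np.unique(style)

    # Method I: ordinary least squares on the children, ignoring the teacher effect.
    beta_1 = np.linalg.lstsq(design_matrix(x, style, styles), y, rcond=None)[0]

    # Collapse the data to one row per classroom.
    rooms = np.unique(classroom)
    size = np.array([np.sum(classroom == c) for c in rooms], float)
    y_bar = np.array([y[classroom == c].mean() for c in rooms])
    x_bar = np.array([x[classroom == c].mean() for c in rooms])
    s_bar = np.array([style[classroom == c][0] for c in rooms])
    X_bar = design_matrix(x_bar, s_bar, styles)

    # Method II: unweighted regression on the class means.
    beta_2 = np.linalg.lstsq(X_bar, y_bar, rcond=None)[0]

    # Method III: class means weighted by class size (weighted least squares).
    w = np.sqrt(size)
    beta_3 = np.linalg.lstsq(X_bar * w[:, None], y_bar * w, rcond=None)[0]
    return beta_1, beta_2, beta_3

# Hypothetical noise-free data: six classrooms of unequal size, two per style.
rng = np.random.default_rng(0)
sizes = [20, 25, 30, 22, 28, 35]
classroom = np.repeat(np.arange(6), sizes)
style = np.repeat([0, 0, 1, 1, 2, 2], sizes)
x = rng.normal(100, 15, size=classroom.size)
effect = np.array([2.0, -1.0, 0.0])          # style 2 is the baseline
y = 5.0 + 0.8 * x + effect[style]
b1, b2, b3 = fit_three_methods(y, x, style, classroom)
```

With noise-free data all three methods recover the same coefficients; they differ only in how they weight the information once teacher and child variation are present.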
For each method, an ANOVA table can be presented as follows.
Source                                                            SS    df      MS
Regression on pre-test x                                          SS    1
Among methods, adjusted for pre-test                              SS    2       MS
Method × pre-test interaction, adjusted for methods and pre-test  SS    2       MS
Residual variation among teachers                                 SS    31      MS
Within teachers, adjusted for pre-test                            SS    N-37    MS
The first four sums of squares are obtained by successive differencing of the residual
sum of squares among teachers after fitting the appropriate parameters. This
procedure is fully described in Aitkin and Bennett (1980).
It should be noted that the sums of squares do not have distributions which are
multiples of $\chi^2$, even when the class sizes are equal. If there are no covariates, the
sums of squares have multiples of $\chi^2$ distributions if the class sizes are equal, and
approximate $\chi^2$ distributions if the class sizes are not too unequal.
Before describing the results of these methods, consideration of the effect of
classroom formation on the conclusions to be drawn is needed.
recognised as formal were not systematically assigned to classes which were below
(or above) average on the pre-test.
If there were evidence of such an assignment bias, it would be very difficult to
draw general conclusions about differences in achievement between formal and informal teaching styles used on pupils of the same initial achievement, for teaching style
and initial achievement would be at least partly confounded.
Since pupils were not randomly assigned to classes, it may be expected that the
36 classes will differ systematically in their mean scores on the pre-test, such differences
reflecting variation in the school populations, previous teachers and other systematic
effects. The adjustment for the pre-test should then reduce the residual variation
among teachers, and thus increase the sensitivity of the test for teaching style differences, since the variation among teaching styles would not be reduced by the pre-test
adjustment, if initial achievement and teaching style are not confounded.
Thus we may expect that the ANOVA variance component model, when applied
to the TS study, will give interpretable results only if there are no systematic differences
among teaching styles on the pre-test score. Even in this case, considerable care is
needed in interpreting different styles as a cause of differential achievement. The data
do not come from a randomised experiment, and there are many possible confounding
variables. Discussions of such variables were given in TS, Gray and Satterly (1976)
and Bennett and Entwistle (1976).
With these cautions in mind, the results of the variance component models
applied to the TS data are considered in the next section.
A further difficulty, referred to several times previously, is that latent class
membership is probabilistic, since class membership is not observable. An extended
ANCOVA model incorporating latent variables is necessary to properly model the full
data: such a model is considered later.
ANCOVA results for the TS data
The pre-test scores for reading, mathematics and English are first considered.
A one-way classification variance component model is fitted to each of the pre-test
scores, using the approximate ANOVA method for unequal class sizes. The ANOVA
tables are shown in Table 4, based on complete data for 921 children (although 950
children were analysed in TS, one complete classroom of 29 children was omitted in
the re-analysis because the teacher's style could not be identified).
TABLE 4
ANOVA OF PRE-TEST SCORES

                                        Reading             Mathematics            English
Source                           df     SS        MS        SS        MS           SS        MS
Among styles                     2      2,826     1,413     1,203     [illegible]  2,853     1,427
Among classrooms within styles   33     57,649    1,747     50,355    1,526        55,224    1,683
(Pooled, 35 df)                                   1,728               1,473                  1,659
Within teachers                  885    163,540   185       106,227   120          139,293   157

$\hat{\sigma}^2_E$                                185                 120                    157
$\hat{\sigma}^2_T$                                61                  53                     59
$\hat{\rho}$                                      0.25                0.31                   0.27

Means
Formal                                  101.1               99.9                   102.8
Mixed                                   97.4                97.3                   99.7
Informal                                97.7                97.9                   99.1
In all three cases, the among-styles mean square is less than the among-teacher
within-styles mean square, so there is no evidence of association of style with pre-test
score. The variance component estimates are also given in Table 4, based on the
pooled style and within-style sums of squares. The correlation between children's
pre-test scores within classrooms is moderate, and certainly not zero.
This conclusion differs from that in TS and arises from the use of a different
denominator in the F-tests used. In TS, the within-styles mean square was used as the
denominator for the F-test of among-style differences. Here the residual among
classrooms within-styles mean square is used. In the variance component model the
ratio of among-styles mean square to within-styles mean square does not have an
F-distribution unless $\sigma^2_T = 0$, and its distribution depends on the ratio of the variance
components $\sigma^2_T/\sigma^2_E$ (see Aitkin and Bennett, 1980, for details). Since for all three
test scores $\sigma^2_T = 0$ is not tenable, the test for style differences must be based on the
ratio of among-styles mean square to the among-classrooms within-styles mean square.
The former ratios are all about 10, the latter are all less than 1.0.
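The effect of the choice of denominator can be checked with a short calculation. The sketch below uses the reading and English mean squares as they can be read from Table 4; the function and variable names are hypothetical, and mathematics is left out because its among-styles mean square is not legible in this copy.

```python
# Mean squares read from Table 4 (pre-test scores):
# (among styles, among classrooms within styles, within teachers)
mean_squares = {
    "reading": (1413, 1747, 185),
    "english": (1427, 1683, 157),
}

def variance_ratios(ms_styles, ms_classrooms, ms_within):
    """Return (ratio with children as the unit, ratio with classrooms as the unit)."""
    return ms_styles / ms_within, ms_styles / ms_classrooms

ratios = {subject: tuple(round(r, 2) for r in variance_ratios(*ms))
          for subject, ms in mean_squares.items()}
```

Using the child-level mean square as denominator gives ratios of roughly 8 to 9, whereas the classroom-level denominator gives ratios below 1, which is the contrast drawn in the text.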
The ANCOVA variance component model was fitted to each of the test scores, and
the ANOVA tables are shown in Table 5. Three tables are presented comparing the
three methods on each test score. The analyses of variance and parameter estimates in
Table 6 are fairly consistent over the three methods.
It is clear that the variation among styles is quite small compared with that
among teachers within styles. There are negligible style-by-pre-test interactions,
though it is notable that Methods II and III consistently find larger interactions than
Method I. These interactions are therefore pooled with the error term as indicated in
Table 5. As expected, the residual variation among teachers on the test score has been
substantially reduced compared with that on the pre-test after adjustment for the
pre-test. However, the teaching style sums of squares are such that the small effects of
TABLE 5
ANCOVA OF TEST SCORES: LATENT CLASS ASSIGNMENT
Method I
Source
Method I1
df
ss
MS
ss
1
2
21
132,985
527
114
263
3871
0.39
1,778
17.5
99.4
MS
Method 111
ss
0.34
38,590
530
2.160
MS
(a) Reading
Pre-test
Among styles
P x S interaction
Residual
among teachers
Within teachers
(b) Mathematics
Pre-test
Styles
Residual
P
XS
among teachers
Within teachers
3 1 ) ~ ~
882
1
2
21,644
38,305
144,555
972
356
486
2}33
31
882
15,892
44,376
50
1
2
116,457
1,186
26
312}33
882
9,675
41,285
(c) English
Pre-test
Styles
PXS
Residual
among teachers
Within teachers
6983679
43
0.99
513
178}492
726.8
8.8
49.71
2 4 . ~ ) ~ ~ ~20;180
2,975
479.7
-
0.63
16.0
16.4
265
1.0801
673]698
69,450
670
335
870
12,500
417
435)418
0.38
0.80
TABLE 6
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES: LATENT CLASS ASSIGNMENT

                              Reading                 Mathematics             English
Method                    I       II      III     I       II      III     I       II      III
Formal                    0.10   -0.35    0.03    1.09    0.61    0.79    1.49    1.24    1.34
Mixed                    -1.12   -0.66   -1.08   -1.62   -1.28   -1.41   -1.41   -1.25   -1.35
Informal                  1.01    1.02    1.04    0.52    0.66    0.61   -0.08    0.00    0.01
Regression coefficient    0.77    0.85    0.80    0.95    1.18    1.15    0.76    0.87    0.82

(These estimates are obtained from $\hat{\alpha}_1$ and $\hat{\alpha}_2$ in the ANCOVA model of §2.3, $\alpha_3$ being set to zero, by
subtracting $(\hat{\alpha}_1 + \hat{\alpha}_2)/3$, to give estimates which sum to zero.)
different teaching styles are swamped by the variation among classrooms due to other
systematic effects. The largest style effects are in English using Method I, but the
F-value for among-styles compared with among-teachers is only 2.02, which is not
significant. Table 6 shows the intercept differences, or adjusted mean differences,
between styles on each test, from the model with no interactions.
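The centring used for the adjusted mean differences in Table 6 (subtracting one third of the sum of the two estimated contrasts, the third being fixed at zero) can be written out as a short sketch. The function name and the input contrasts are hypothetical and are not taken from the paper.

```python
def centred_style_effects(alpha_1, alpha_2):
    """Centre (alpha_1, alpha_2, alpha_3 = 0) so that the three effects sum to zero."""
    shift = (alpha_1 + alpha_2) / 3.0
    return alpha_1 - shift, alpha_2 - shift, 0.0 - shift

# Hypothetical contrasts of styles 1 and 2 against the baseline style 3.
effects = centred_style_effects(1.5, 0.3)
```

Differences between any two styles are unchanged by the centring; only the reference point moves from the baseline style to the average of the three.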
The direction of the differences is not consistent with those reported in TS, due
to the change in class membership of teacher resulting from the different class assignment by the latent class model. The formal classrooms do best in English and slightly
better in mathematics but the informal classes do best in reading. The mixed classrooms do worst on all tests. These differences, though interesting, are not statistically
significant.
Latent class model for change
Teaching style is not observable, but is estimated probabilistically from the 38
binary behaviour variables. It has been shown that the mixed style, Class 3, was the
lowest on all three tests, and Table 3 shows that there are very few teachers who are
unequivocally assigned to this class: only two teachers have a probability of 0.9 or
more of belonging to it, and three others a probability of 0.85 or more, though there
are seven teachers assigned to this style by the assignment rule.
The assessment of teaching style differences must allow for the certainty with
which these style assignments are made. A reasonable procedure is to fit the
ANCOVA model replacing the implicit (0, 1) dummy variables for teaching style
membership by the latent class membership probabilities. Thus if $z_p$ takes the value 1 if
the child is taught by a teacher in latent class p, and 0 otherwise, then $z_1$ is replaced
by $P(\text{class } 1 \mid X = x)$, and $z_2$ by $P(\text{class } 2 \mid X = x)$, where these probabilities are given
in Table 3. Thus for the first teacher in the table, the dummy variables $z_1$ and $z_2$ take
the values 1.00 and 0.00, as they do in the previous analysis. For the last teacher,
$z_1$ and $z_2$ take the values 0.00 and 0.07, these being the probabilities of membership in
Classes 1 and 2 for this teacher.
In the resulting ANCOVA model, $\alpha_1$ and $\alpha_2$ still have the same interpretations as
the (Class 1 - Class 3) and (Class 2 - Class 3) mean differences; the change is only in the
certainty of the identification of the class membership of each teacher.
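The replacement of (0, 1) dummy variables by membership probabilities can be illustrated as follows. This is a sketch only: the two probability rows correspond to the first and last teachers mentioned in the text, while the function name `style_columns` and the allocation of children to teachers are hypothetical.

```python
import numpy as np

# Latent class membership probabilities (Classes 1, 2, 3) for two teachers.
# Row 0: first teacher in the text (z1 = 1.00, z2 = 0.00).
# Row 1: last teacher in the text (z1 = 0.00, z2 = 0.07); Class 3 takes the remainder.
teacher_probs = np.array([[1.00, 0.00, 0.00],
                          [0.00, 0.07, 0.93]])

def style_columns(teacher_of_child, probs, hard=False):
    """Return the z1 and z2 design-matrix columns, one row per child."""
    p = np.asarray(probs, float)
    if hard:
        # Assignment rule: 1 for the most probable class, 0 elsewhere.
        p = np.eye(p.shape[1])[p.argmax(axis=1)]
    return p[np.asarray(teacher_of_child), :2]

children = [0, 0, 1, 1, 1]   # hypothetical teacher index for each child
soft = style_columns(children, teacher_probs)
hard = style_columns(children, teacher_probs, hard=True)
```

For the first teacher the two codings coincide; for the last teacher the hard coding gives (0, 0) while the probabilistic coding gives (0.00, 0.07), carrying the residual uncertainty into the regression.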
The use of the probabilities of style membership instead of the (0, 1) dummy
variables, though reasonable, is only an approximation to the efficient maximum
likelihood analysis. It is analogous to the use of estimated factor scores as predictors
of a response variable, instead of the full maximum likelihood estimation of the
parameters in the combined factor and regression model (such models are discussed
in Jöreskog and Goldberger (1975) and can be analysed using the LISREL package).
TABLE 7
ANCOVA OF TEST SCORES: LATENT CLASS PROBABILITIES
Method I
Method I1
Method I11
ss
MS
ss
MS
SS
132,985
693
629
346
3141
0.51
1,778
18.5
115.0
9.2
57.5
0.39
2.43t
38,590
660
2,470
330
1,235
19,740
658
69,450
1.170
585
11,830
11040
394
41,550
1,554
777
424
8,359
279
df
Source
MS
(a) Reading
Pre test
Styles
PXS
Residual
among teachers
Within teachers
(6) Mathematics
Pre-test
Styles
Pxs
Residual
among teachers
Within teachers
1
2
31
2}33
21,623
698)674
710.2
23.7
38,305
43
† interaction significant for informal group
882
1
2
31
z>33
144,555
1,731
298
15,191
865
1491
1.84
490)~~'
2,975
36.0
524
18.0
26.2
445.4
14.7
1.23
1.78
882
"1-
0.50
I48t
1.46
(c) English
11
1
2
Pre-test
Styles
PXS
Residual
among teachers
Within teachers
1.93
1*42
-7---
17
31
.>33
9,013
291jLr3
353.9
11.4
882
* P<0.10
2121276
**P<0.05
2.82*

TABLE 8
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES: LATENT CLASS PROBABILITIES

                              Reading                     Mathematics             English
Method                    I       II      III         I       II      III     I       II      III
Formal                    0.57    (see text for       1.65    0.91    1.20    2.09    1.62    1.93
Mixed                    -1.75     these:            -2.78   -2.08   -2.36   -2.44   -1.92   -2.33
Informal                  1.19     interaction)       1.13    1.16    1.15    0.35    0.31    0.40
Regression coefficient    0.77                        0.95    1.18    1.14    0.76    0.86    0.81
The results of fitting the model are shown in Tables 7 and 8. The F-distributions
for variance ratio tests may be regarded as rough approximations since the probabilistic dummy variables are determined independently of the regression data.
The significance of all style differences is substantially increased. For mathematics, the differences still do not reach significance, though the contrast between the
mixed style and the formal and informal styles is more pronounced. For English, the
differences by Method I reach significance at the 5 per cent level of $F_{2,33}$, but are not as
large by the other two methods. For reading, the style by pre-test interaction with two
degrees of freedom (df) contains one component with almost all the sum of squares:
the contrast of informal with formal slopes. This single df term is significant at the
5 per cent level of $F_{1,31}$ for Method II, and is almost significant for Method III. It is
not significant by Method I.
TABLE 9
READING TEST-PRE-TEST REGRESSIONS: LATENT CLASS PROBABILITIES

            Method II              Method III
Formal      ŷ = 4.1 + 1.02x        ŷ = 10.5 + 0.96x
Mixed       ŷ = 25.5 + 0.80x       ŷ = 22.9 + 0.82x
Informal    ŷ = 54.7 + 0.52x       ŷ = 58.9 + 0.48x
The estimated regressions for the three groups for reading by Methods II and III
are shown in Table 9. The slope is greatest for formal, and least for informal styles,
and the parameter estimates are very similar for the two methods. For classes scoring
low on the pre-test, the informal style has a much higher mean test score: an eight-point difference for the lowest class. The cross-over between formal and informal
occurs at about x = 102, and classes scoring high on the pre-test do better under a
formal style; for the highest class, the difference is six points. The mixed regression
is in between, but is closer to the formal regression.
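The cross-over point between two fitted lines follows directly from their intercepts and slopes. The sketch below applies this to the formal and informal reading regressions as read from Table 9; the function name is hypothetical.

```python
def crossover(line_a, line_b):
    """Pre-test score at which two regression lines (intercept, slope) meet."""
    (a0, a1), (b0, b1) = line_a, line_b
    return (b0 - a0) / (a1 - b1)

# (intercept, slope) for the formal and informal styles, as read from Table 9.
method_2 = crossover((4.1, 1.02), (54.7, 0.52))
method_3 = crossover((10.5, 0.96), (58.9, 0.48))
```

Both methods place the cross-over at a pre-test score of about 101, close to the figure quoted in the text; below it the informal line is higher, above it the formal line is higher.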
Comparison with maximum likelihood estimation
We consider finally the estimation of the parameters of the model by maximum likelihood. Programmes for ML-estimation in the unbalanced mixed model are not widely
available (BMDP has such a programme, but it was not implemented on UMRCC)
so a GENSTAT macro was developed (by Dorothy Anderson). Tests of the hypotheses of no interaction or no style main effects are based on the likelihood ratio test, and
Table 10 gives an 'analysis of deviance' table in which the entries have $\chi^2$ distributions
under the appropriate null hypothesis. This table should be compared to Table 7
where the approximate ANOVA methods were used.
None of the pre-test by style interactions is significant at the 10 per cent level.
Reading, which had the largest interaction in the class mean models, has the smallest
here. The main effect of English is significant at the 10 per cent level. No other
effects are significant. Parameter estimates for the main effect models are given in
Table 11. The pattern of mean differences is similar to that found by the ordinary
least squares Method I, but the differences are smaller.
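The significance statements for the likelihood ratio tests can be checked with a short calculation. The sketch below assumes each term in Table 10 has 2 degrees of freedom (three styles), in which case the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2). The deviances are those read from Table 10; the names are hypothetical.

```python
import math

def chi2_upper_tail_2df(deviance):
    """Upper-tail probability of a chi-squared variable with 2 df: exp(-x / 2)."""
    return math.exp(-deviance / 2.0)

# Deviances as read from Table 10; 2 df per term is an assumption.
styles = {"reading": 0.8, "mathematics": 2.8, "english": 5.2}
interaction = {"reading": 0.2, "mathematics": 1.0, "english": 4.0}

p_styles = {k: chi2_upper_tail_2df(v) for k, v in styles.items()}
p_interaction = {k: chi2_upper_tail_2df(v) for k, v in interaction.items()}
significant_at_10 = [k for k, p in p_styles.items() if p < 0.10]
```

Under this assumption only the English style effect falls below the 10 per cent level, and none of the interactions does, in line with the summary above.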
It should be noted that the class mean method of parameter estimation results in
a serious loss of efficiency. In the case of reading a misleading interaction appears,
and the conclusions about the relative differences between styles are incorrect. Since
the class mean method is based on only 36 ' observations ', the possibility of random
fluctuations among classes producing misleading results is quite high, and this method
cannot be regarded as a satisfactory substitute for ML estimation when the number of
classrooms is small.
TABLE 10
ANALYSIS OF DEVIANCE OF TEST SCORES: LATENT CLASS PROBABILITIES

                                                    Deviance
Source                           df    (a) Reading   (b) Mathematics   (c) English
Styles, adjusted for pre-test    2     0.8           2.8               5.2*
P × S                            2     0.2           1.0               4.0

* P < 0.10
TABLE 11
ADJUSTED MEAN DIFFERENCES FOR TEACHING STYLES BY RESTRICTED MAXIMUM LIKELIHOOD: LATENT CLASS PROBABILITIES

            Reading    Mathematics    English
Formal       0.15       1.33           1.91
Mixed       -1.29      -2.56          -2.18
Informal     1.14       1.22           0.27
Conclusion
The teaching style differences in achievement which were found in TS are modified
by the re-analysis. There are two reasons for this. First, the analysis of covariance
model which includes the random effect of teachers results in greatly reduced significance of any differences, because of the large random variation among teachers.
Second, the clustering of teachers by the latent class model changes the nature of the
differences between teaching styles.
The only significant teaching style differences are in English, where the formal
style has the highest mean, mixed the lowest, and informal is in the middle. In
mathematics, the formal and informal styles are close, and substantially above the
mixed style. In reading informal has the highest mean, mixed the lowest, and formal
is in the middle. Though the differences may appear small, the four-point difference
between formal and mixed in reading corresponds to a 6 to 8 months difference in
reading age. It is of interest that the mixed style, which was distinguished in the
cluster analysis by a relatively high frequency of disciplinary problems and by the
lowest use of formal testing, gives consistently the worst results in the achievement
model.
RECOMMENDATIONS
The re-analysis of the TS data discussed in this paper raises important issues for
the design and analysis of future educational research studies of this kind.
First, research designs using multi-stage sampling of schools and classrooms are
natural and administratively feasible. The examination of intact classrooms for
teacher or pupil differences does raise difficult statistical problems, but the formidable
difficulties of randomised experiments in a school administrative context mean that
non-randomised observational studies will remain important in educational research.
When intact classrooms are the effective experimental or quasi-experimental unit,
but outcomes are measured on pupils in the classroom, the correlation between pupils
within a classroom must be allowed for by a suitable variance component model.
Such models are necessary for multi-stage sampling procedures of all kinds in general
survey designs. It should be clear from the discussion in Section 2 that the effective
sample size for testing the significance of effects at the teacher level is the number of
classrooms in the study, and this number should therefore be as large as possible:
many classrooms with few pupils in each will give much greater power than few
classrooms with many pupils. Financial constraints obviously impose a limit on the
possible number of classrooms, but a small number of classrooms is likely to result
in low power and the failure to find differences. Only the four-point difference in
reading was statistically significant, but smaller differences than this are educationally
significant.
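The point about effective sample size can be illustrated with a minimal sketch. It assumes a balanced one-way variance component model with J classrooms of n pupils each, between-class variance σ_b² and within-class variance σ_w², so that the variance of a style mean is σ_b²/J + σ_w²/(Jn). The function names and the variance values are hypothetical and chosen only for illustration; they are not estimates from the TS data.

```python
def variance_of_style_mean(n_classes, pupils_per_class, var_between, var_within):
    """Variance of a style mean under a balanced variance component model."""
    return (var_between / n_classes
            + var_within / (n_classes * pupils_per_class))


def effective_sample_size(n_classes, pupils_per_class, var_between, var_within):
    """Number of independent pupils giving the same precision.

    Uses the design effect 1 + (n - 1) * rho, where rho is the
    intra-class correlation.
    """
    rho = var_between / (var_between + var_within)
    design_effect = 1.0 + (pupils_per_class - 1) * rho
    return n_classes * pupils_per_class / design_effect


# Hypothetical variance components (intra-class correlation 0.2).
VAR_BETWEEN, VAR_WITHIN = 20.0, 80.0

# Two designs with the same total of 360 pupils.
few_large = variance_of_style_mean(12, 30, VAR_BETWEEN, VAR_WITHIN)
many_small = variance_of_style_mean(36, 10, VAR_BETWEEN, VAR_WITHIN)
ess_few_large = effective_sample_size(12, 30, VAR_BETWEEN, VAR_WITHIN)
ess_many_small = effective_sample_size(36, 10, VAR_BETWEEN, VAR_WITHIN)

print(f"12 classes x 30 pupils: variance {few_large:.3f}, ESS {ess_few_large:.1f}")
print(f"36 classes x 10 pupils: variance {many_small:.3f}, ESS {ess_many_small:.1f}")
```

With these hypothetical values, 36 classes of 10 pupils give a much smaller variance than 12 classes of 30 pupils, although both designs test 360 pupils.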
The approximate methods of analysis described in Section 2 are not satisfactory
alternatives to full maximum likelihood estimation in the variance component model.
In particular, the use of class means results in a very serious loss of efficiency in
estimating effects at the pupil level (for example, the regression of test on pre-test, or
the size of sex differences).
A major gap exists in statistical packages in this area: BMDP is the only package
available which has a maximum likelihood programme, and implementation of this
programme at UMRCC has been substantially delayed. Efficient methods for the
analysis of multi-stage sample designs cannot become generally used without good
general programmes.
In non-randomised observational studies, many sources of potential bias are
present. We cannot interpret effects of interest (like teaching style differences) in
such studies as though they had arisen from properly randomised experimental studies.
The best that can be done is to measure other possible confounding variables,
and to allow for their effects through covariance analysis. This ' statistical control '
is never a substitute for ' experimental control ' through randomisation. In our
interpretation of teaching style effects, we noted that confounding of pre-test score
with teaching style did not occur, and so tentative conclusions could be drawn about
the 'effects' of different styles. However, there are many other possible confounding
variables, some of which were discussed by Gray and Satterly (1976) and Bennett and
Entwistle (1976), and in TS itself, and so the interpretation of different teaching styles
as a cause of differential achievement should not be pushed too far. An important
implication of non-randomised studies is the need for measurement of a large number
of possible confounding variables, and the resulting complexity of the statistical
models which need to be fitted.
In the discussion of clustering in Section 1, great emphasis was placed on the
latent class model. Latent variable models are essential in the analysis of studies of
this kind, in which a large amount of information (38 items here) is available about
each teacher, but the number of teachers used in the second stage is relatively small.
It would not be possible to use the 38 items as explanatory variables in a regression of
test score on pre-test and the items, for there are more items than teachers in the second
stage. The items are treated as indicators of an underlying latent style of teaching,
and so the dimensionality of the teacher information is reduced from 38 to two (the
two dummy variables needed for three styles).
It is worth emphasising again that clustering methods must be based on statistical
models if they are to have any validity. Cluster algorithms based on distance functions
which bear no relation to the type of data considered, or to any probability considerations, cannot be expected to produce clusters which have any statistical validity. The
probabilistic nature of cluster membership is an essential feature of the statistical
model, and formal assignments to clusters from standard algorithms overstate the
real information available from clustering. (In the re-analysis, the addition of the
random error involved in producing a formal assignment actually reduces the differences among styles.) It is important to note that effective clustering requires a sample
size which is large relative to the number of descriptive items used. If the sample size
is not large relative to the number of items, the occurrence of multiple local maxima
of the likelihood function indicates that there may be several different configurations
of clusters which are equally well supported by the data. It would have been pointless
to attempt to cluster a small sample of, say, 50 teachers using 38 binary items, or even
ten items. Again financial constraints limit the possible sample size in observational
studies. There are very few programmes available for probabilistic clustering. The
normal mixture model was described by Wolfe (1970), and a FORTRAN listing for
this model is given in the book on cluster analysis by Hartigan (1975). Goodman (1978, p. 468) gives a brief reference to a programme for maximum likelihood
latent structure analysis.
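The probabilistic nature of cluster membership can be made concrete with a small sketch. Under a latent class model with class proportions π_k and item probabilities θ_kj, the posterior probability that a teacher with binary responses y belongs to class k is proportional to π_k ∏_j θ_kj^(y_j) (1 − θ_kj)^(1 − y_j). The parameter values below are hypothetical (three classes, four items) and are not estimates from the 38-item questionnaire; the function name is also illustrative.

```python
import math


def posterior_class_probabilities(responses, class_weights, item_probs):
    """Posterior class membership probabilities for one respondent.

    responses     : list of 0/1 item responses
    class_weights : class proportions, summing to one
    item_probs    : item_probs[k][j] = P(item j = 1 | class k)
    """
    log_joint = []
    for weight, probs in zip(class_weights, item_probs):
        value = math.log(weight)
        for y, p in zip(responses, probs):
            value += math.log(p if y == 1 else 1.0 - p)
        log_joint.append(value)
    # Subtract the maximum before exponentiating, for numerical stability.
    peak = max(log_joint)
    unnormalised = [math.exp(v - peak) for v in log_joint]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical parameters for classes labelled formal, mixed, informal.
class_weights = [0.4, 0.3, 0.3]
item_probs = [
    [0.9, 0.8, 0.9, 0.2],  # formal
    [0.6, 0.5, 0.5, 0.5],  # mixed
    [0.2, 0.3, 0.1, 0.8],  # informal
]

posterior = posterior_class_probabilities([1, 1, 0, 1], class_weights, item_probs)
hard_assignment = posterior.index(max(posterior))
print([round(p, 3) for p in posterior], "-> class", hard_assignment)
```

In this example the most probable class receives only a little over half of the posterior probability, so a formal assignment to that class would discard the substantial uncertainty about membership.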
Simple macros in GLIM and GENSTAT can be written for ML estimation of
the parameters in general mixture models, including the latent class model of Section
1. Programme listings can be obtained from the Centre for Applied Statistics at
Lancaster.
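For readers without access to those macros, the following is a minimal sketch of ML estimation for a latent class model with binary items, using the EM algorithm. It is written in Python, not GLIM or GENSTAT, and is not the authors' programme: the function names, the starting values, the clipping constant and the simulated data are all assumptions made for illustration. Because the likelihood may have several local maxima, in practice the fit would be repeated from different starting values (here, different values of `seed`) and the solutions compared.

```python
import math
import random


def simulate_responses(n, class_weights, item_probs, seed=1):
    """Simulate binary item responses from a latent class model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        k = rng.choices(range(len(class_weights)), weights=class_weights)[0]
        rows.append([1 if rng.random() < p else 0 for p in item_probs[k]])
    return rows


def fit_latent_class(data, n_classes, n_iter=50, seed=0):
    """EM for a latent class model with binary items.

    Returns (class_weights, item_probs, log_likelihood_trace).
    """
    rng = random.Random(seed)
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    item_probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
                  for _ in range(n_classes)]
    trace = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities for every respondent.
        posteriors = []
        log_lik = 0.0
        for row in data:
            joint = []
            for k in range(n_classes):
                prob = weights[k]
                for y, p in zip(row, item_probs[k]):
                    prob *= p if y == 1 else 1.0 - p
                joint.append(prob)
            total = sum(joint)
            log_lik += math.log(total)
            posteriors.append([j / total for j in joint])
        trace.append(log_lik)
        # M-step: weighted proportions, clipped away from 0 and 1.
        for k in range(n_classes):
            mass = sum(post[k] for post in posteriors)
            weights[k] = mass / len(data)
            for j in range(n_items):
                hits = sum(post[k] * row[j]
                           for post, row in zip(posteriors, data))
                item_probs[k][j] = min(max(hits / mass, 1e-6), 1.0 - 1e-6)
    return weights, item_probs, trace


# Simulated data: two well-separated classes, six items (all hypothetical).
data = simulate_responses(400, [0.5, 0.5], [[0.9] * 6, [0.1] * 6])
weights, item_probs, trace = fit_latent_class(data, n_classes=2)
print("class proportions:", [round(w, 3) for w in weights])
print("log-likelihood, first and last:", round(trace[0], 2), round(trace[-1], 2))
```

The log-likelihood recorded at each E-step should not decrease from one iteration to the next, which gives a simple check on the implementation.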
The final comments are on the importance of statistical computing and statistical
modelling in the analysis of educational studies of this kind, and indeed of any studies
involving multi-stage sampling. The statistical theory of the analysis of unbalanced
mixed models has been established for at least 10 years, but only recently have any
computer programmes been developed which are suitable for the analysis of such
studies. Such programmes are still not generally available, and a pressing need
exists for the development of general-purpose programmes or sub-routines which can
handle the latent variable models on which clustering methods should be based. Such
programmes are under development at Lancaster in the Complex Social Data research
programme supported by the SSRC.
The importance of statistical modelling is more general. The discussion of
'class' versus 'pupil' as the 'unit of measurement' can be resolved by answering
the question, "What is the appropriate statistical model for data from a multi-stage
sample?" This is even more important for cluster models, which are much less
developed statistically.
ACKNOWLEDGMENT.-This
REFERENCES
AITKIN, M. A. (1979). Dealing with survey data. Br. J. educ. Psychol., 49, 198-205.
AITKIN, M. A., and BENNETT, S. N. (1980). A Theoretical and Practical Investigation into the Analysis of Change in Classroom Based Research. Final report on SSRC grant HR5710.
AITKIN, M. A., ANDERSON, D. A., and HINDE, J. P. (1981). Statistical modelling of data on teaching styles. J. Roy. statist. Soc., Series A & B, 144, (in press).
BARKER LUNN, J. C. (1970). Streaming in the Primary School. Slough: NFER.
BENNETT, S. N. (1976). Teaching Styles and Pupil Progress. London: Open Books.
BENNETT, S. N., and ENTWISTLE, N. J. (1976). Rite and wrong. A reply to 'A Chapter of Errors'. Educ. Res., 19, 217-222.
BENNETT, S. N., and JORDAN, J. (1975). A typology of teaching styles in primary schools. Br. J. educ. Psychol., 45, 20-28.
EVERITT, B. S. (1977). Cluster analysis. In O'MUIRCHEARTAIGH, C. A., and PAYNE, C. (Eds.), The Analysis of Survey Data. Vols. I and II. Chichester: Wiley.
GALTON, M., SIMON, B., and CROLL, P. (1980). Inside the Primary School. London: Routledge and Kegan Paul.
GOODMAN, L. A. (1978). Analyzing Qualitative/Categorical Data. London: Addison-Wesley.
GRAY, J., and SATTERLY, D. (1976). A chapter of errors. Educ. Res., 19, 45-56.
HARTIGAN, J. A. (1975). Clustering Algorithms. New York: Wiley.
HARTIGAN, J. A. (1977). Distribution problems in clustering. In VAN RYZIN, J. (Ed.), Classification and Clustering. New York: Academic Press.
JÖRESKOG, K. G., and GOLDBERGER, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. J. American statist. Assoc., 70, 631-639.
SATTERLY, D., and GRAY, J. (1976). Two Statistical Problems in Classroom Research. Unpublished.
SEARLE, S. R. (1971). Linear Models. New York: Wiley.
SOLOMON, D., and KENDALL, A. J. (1979). Children in Classrooms. New York: Praeger.
WOLFE, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multiv. Behav. Res., 5, 329-350.
(Manuscript received 10th October, 1980)