Report. Not included are chapter 12 (Directions per subtest), chapter 13 (The record
form, norm tables and computer program) and the appendices.
The reference for this text is:
Tellegen, P.J., Winkel, M., Wijnberg-Williams, B.J. & Laros, J.A. (1998).
Snijders-Oomen Nonverbal Intelligence Test. SON-R 2½-7 Manual and Research
Report. Lisse: Swets & Zeitlinger B.V.
This English manual is a translation of the Dutch manual, published in 1998 (SON-R
2½-7 Handleiding en Verantwoording). The German translation was also published
in 1998 (SON-R 2½-7 Manual). In 2007 a German manual was published with
German norms (SON-R 2½-7 Non-verbaler Intelligenztest. Testmanual mit deutscher
Normierung und Validierung).
CONTENTS
Foreword
PART I: THE CONSTRUCTION OF THE SON-R 2½-7
1. Introduction
3. Description of the SON-R 2½-7
   3.1 The subtests
5. Psychometric characteristics
8. Immigrant children
   8.1 The test results of immigrant children
   8.2 Relationship with the SES level
   8.3 Differentiation according to country of birth
   8.4 Comparison with other tests
   8.5 The test performances of children participating in OPSTAP(JE)
   10.5 Conclusions
   11.1 Preparation
   12.1 Mosaics
   12.2 Categories
   12.3 Puzzles
   12.4 Analogies
   12.5 Situations
   12.6 Patterns
References
Appendix A Norm tables
Appendix B
Appendix C
Appendix D
Description of the SON-R 2½-7
   Table 3.1 Tasks in the subtests of the SON-R 2½-7
   Figure 3.1 Items from the subtest Mosaics
   Figure 3.2 Items from the subtest Categories
   Figure 3.3 Items from the subtest Puzzles
   Figure 3.4 Items from the subtest Analogies
   Figure 3.5 Items from the subtest Situations
   Figure 3.6 Items from the subtest Patterns
   Table 3.2 Classification of the subtests
FOREWORD
Nan Snijders-Oomen (1916-1992)
Jan Snijders (1910-1997)
The publication of the SON-R 2½-7 completes the third revision of the Snijders-Oomen Nonverbal Intelligence Tests. Over a period of fifty years Nan Snijders-Oomen and Jan Snijders
were responsible for the publication of the SON tests. We feel honored to be continuing their
work. They were interested in this revision and supported us with advice until their death.
The present authors played different roles in the production of this test and the manual. Peter
Tellegen, as project manager, was responsible for the revision of the test and supervised the
research. Marjolijn Winkel made a large contribution to all phases of the project in the context of
her PhD research. Her thesis on the revision of the test will be published at the end of 1998. Jaap
Laros, at present working at the University of Brasilia, participated in the construction of the
subtests, in particular Mosaics and Analogies. Barbara Wijnberg-Williams, drawing on her experience as a practicing psychologist at the University Hospital of
Groningen, made a large contribution to the manner in which the test can be administered nonverbally to children with
communicative handicaps.
The research was carried out at the department for Personality and Educational Psychology of
the University of Groningen. Wim Hofstee, head of the department, supervised the project.
Jannie van den Akker and Christine Boersma made an important contribution to the organization of the research.
The research was made financially possible by a subsidy from SVO, the Institute for Educational Research (project 0408), by a subsidy from the Foundation for Behavioral Sciences, a section
[...]
research. We are, therefore, interested in the experiences of users, and we would appreciate
being informed of their research results when these become available as internal or external
publications. We intend to inform users and other interested parties about the developments and
further research with the SON tests via Internet. The address of the homepage will be:
www.ppsw.rug.nl/hi/tests/sonr.
In recent years the need to carry out diagnostic research on children at a young age has greatly
increased. Furthermore, the realization has grown that the more traditional intelligence tests are
less suitable for important groups of children because they do not take sufficient account of the
limitations of these children, or of their cultural background. In these situations the SON tests
are frequently used. We hope that this new version of the test will also contribute to reliable and
valid diagnostic research with young children.
Heymans Institute
University of Groningen
Grote Kruisstraat 2/1
9712 TS Groningen
The Netherlands
tel. +31 50 363 6353
fax +31 50 363 6304
e-mail: p.j.tellegen@rug.nl
http://www.testresearch.nl
Review of the SON-R 2½-7
The test has been reviewed by COTAN, the test commission of the Netherlands
Institute for Psychologists. The categories used are insufficient, sufficient and good.
The rating is as follows:
Basics of the construction of the test:   good
Execution of the materials:               good
Execution of the manual:                  good
Norms:                                    good
Reliability:                              good
Construct validity:                       good
Criterion validity:                       good
INTRODUCTION
The new version of the Snijders-Oomen Nonverbal Intelligence Test for children from two-and-a-half to seven years, the SON-R 2½-7, is an instrument that can be individually administered to
young children for diagnostic purposes. The test makes a broad assessment of mental functioning possible without being dependent upon language skills.
1.1 CHARACTERISTICS OF THE SON-R 2½-7
The SON-R 2½-7, like the previous version of the test, the SON 2½-7 (Snijders & Snijders-Oomen, 1976), provides a standardized assessment of intelligence. The child's scores on six
different subtests are combined to form an intelligence score that represents the child's ability
relative to his or her age group. Separate norm tables allow total scores to be calculated for the
performance tasks and for the tasks mainly requiring reasoning ability.
A distinctive feature of the SON-R 2½-7 is that feedback is given during administration of
the test. After the child has given an answer, the examiner tells the child whether it is correct or
incorrect. If the answer is incorrect, the examiner demonstrates the correct answer. When
possible, the correction is made together with the child. The detailed directions provided in the
manual also make the test suitable for the assessment of very young children. In general, the
examiner demonstrates the first items of each subtest in part or in full. Examples are included in
the test directions and items.
The items on the subtests of the SON-R 2½-7 are arranged in order of increasing difficulty.
This way a procedure for determining a starting point appropriate to the age and ability of each
individual child can be used. By using the starting point and following the rules for discontinuing the test, the administration time is limited to fifty to sixty minutes.
The test can be administered nonverbally or with verbal directions. The spoken text does not
give extra information. The manner of administration can thus be adapted to the communication
ability of each individual child, allowing the test to proceed as naturally as possible.
Because the test can be administered without the use of written or spoken language, it is
especially suitable for use with children who are handicapped in the areas of communication
and language. For the same reason it is also suitable for immigrant children who have little or no
command of the language of the examiner.
The testing materials do not need to be translated, making the test suitable for international
and cross-cultural research. The SON-tests are used in various countries. The names of the
various subtests are shown on the test booklets in the following languages: English, German,
Dutch, French, and Spanish. The manual has been published in English and German as well as
in Dutch.
A similarity between the SON-R 2½-7 and other intelligence tests for (young) children, such
as the BAS (Elliott, Murray & Pearson, 1979-82), the K-ABC (Kaufman & Kaufman, 1983), the
RAKIT (Bleichrodt, Drenth, Zaal & Resing, 1984) and the WPPSI-R (Wechsler, 1989), is that
intelligence is assessed on the basis of performance on a number of quite diverse tasks. However, verbal test items are not included in the SON-R 2½-7. Such items are often dependent to a
great extent on knowledge and experience. The SON-R 2½-7 can therefore be expected to be
focused more on the measurement of fluid intelligence and less on the measurement of crystallized intelligence (Cattell, 1971) than are the other tests.
The subtests of the SON-R 2½-7 differ from the nonverbal subtests in other intelligence tests in
two important ways. First, the nonverbal part of other tests is generally limited to typical
performance tests. The SON-R 2½-7, however, includes reasoning tasks that take a verbal form
in the other tests. Second, while the testing material of the performance part of the other tests is
admittedly nonverbal, the directions are given verbally (Tellegen, 1993).
An important difference with regard to other nonverbal intelligence tests such as the CPM
(Raven, 1962) and the TONI-2 (Brown, Sherbenou & Johnsen, 1990) is that the latter tests
consist of only one item-set and are therefore greatly dependent on the specific ability that is
measured by that test. Nonverbal intelligence tests such as the CTONI (Hammill, Pearson &
Wiederholt, 1996) and the UNIT (Bracken & McCallum, 1998) consist of various subtests, like
the SON-R 2½-7. A fundamental difference, however, is that the directions for these tests are
given exclusively with gestures, whereas the directions with the SON-R 2½-7 are intended to
create as natural a test situation as possible.
An important way in which the SON-R 2½-7 differs from all the above-mentioned tests is
that the child receives assistance and feedback if he or she cannot do the task. In this respect the
SON-R 2½-7 resembles tests for learning potential that determine to what extent the child
profits from the assistance offered (Tellegen & Laros, 1993a). The LEM (Hessels, 1993) is an
example of this kind of test.
In sum, the SON-R 2½-7 differs from other tests for young children in its combination of a
friendly approach to children (in the manner of administration and the attractiveness of the
materials), a large variation in abilities measured, and the possibility of testing intelligence
regardless of the level of language skill.
Table 1.1
Overview of the Versions of the SON-Tests
Test                       Year   Authors                                       Standardization group       Age range
SON                        1943   Snijders-Oomen                                Deaf children               4-14 years
SON-58                     1958   Snijders & Snijders-Oomen                     Deaf and hearing children   4-16 years
SON 2½-7 (Preschool SON)   1975   Snijders & Snijders-Oomen                     Hearing and deaf children   3-7 years
SSON                       1975   Starren                                       Hearing and deaf children   7-17 years
SON-R 2½-7                 1998   Tellegen, Winkel, Wijnberg-Williams & Laros   General norms               2;6-8;0 years
SON-R 5½-17                1988   Snijders, Tellegen & Laros                    General norms               5;6-17;0 years
Note: for each test are listed the year of publication of the Dutch manual, the authors of the
manual, and the group and the age range for which the test was standardized.
The form and contents of the SSON strongly resembled the SON-58, except that the SSON
consisted entirely of multiple choice tests. After the publication of the SSON in 1975, the SON-58 remained in production because it was still in demand. In comparison to the SSON, the
SON-58 contained more stimulating tasks and provided more opportunity for observation of
behavior, because it consisted of tests in which children were asked to manipulate a large variety
of test materials. The subtests in the Preschool SON maintained this kind of performance test to
provide opportunities for the observation of behavior.
The third revision of the test for older children, the SON-R 5½-17, was published in 1988
(Snijders, Tellegen & Laros, 1989; Laros & Tellegen, 1991; Tellegen & Laros, 1993b). This test
replaces both the SON-58 and the SSON, and is meant for use with hearing and deaf children
from five-and-a-half to seventeen years of age. In constructing the SON-R 5½-17 an effort was
made to combine the advantages of the SSON and the SON-58. On the one hand, a range of
diverse testing materials was included. On the other hand, a high degree of standardization in
the administration and scoring procedures as well as a high degree of reliability of the test was
achieved.
The SON-R 5½-17 is composed of abstract and concrete reasoning tests, spatial ability tests
and a perceptual test. A few of these tests are newly developed. A memory test was excluded
because memory can be examined better by a specific and comprehensive test battery than by a
single subtest. In the SON-R 5½-17, the standardization for the deaf is restricted to conversion
of the IQ score to a percentile score for the deaf population. The test uses an adaptive procedure
in which the items are arranged in parallel series. This way, fewer items that are either too easy
or too difficult are administered. Feedback is given in all subtests; this consists of indicating
whether a solution is correct or incorrect. The standardized scores are calculated and printed by
a computer program.
The SON-R 5½-17 has been reviewed by COTAN, the commission of the Netherlands Institute
for Psychologists responsible for the evaluation of tests. All aspects of the test (Basics of the
construction of the test, Execution of the manual and test materials, Norms, Reliability and
Validity) were judged to be good (Evers, Van Vliet-Mulder & Ter Laak, 1992). This means the
SON-R 5½-17 is considered to be among the most highly accredited tests in the Netherlands
(Sijtsma, 1993).
After completing the SON-R 5½-17, a revision of the Preschool SON was started, resulting in
the publication of the SON-R 2½-7. The test was published in 1996, together with a manual
consisting of the directions and the norm tables (Tellegen, Winkel & Wijnberg-Williams, 1997).
In the present Manual and Research report, the results of research done with the test are also
presented: the method of revision, the standardization and the psychometric characteristics, as
well as the research concerning the validity of the test. Norm tables allowing the calculation of
separate standardized total scores for the performance tests and the reasoning tests have been
added. Also, the reference age for the total score can be determined. Norms for experimental
usage have been added for the ages of 2;0 to 2;6 years. All standardized scores can easily be
calculated and printed using the computer program.
[...]
of the Preschool SON showed that the subtests differentiated too little at these ages. The range of
possible raw scores had a mean of 12 points. In the youngest age group, 20% of the children
received the lowest score on the subtests and in the oldest age group, 43% received the highest
score (Hofstee & Tellegen, 1991). In other words, the Preschool SON was appropriate for children of four or five years old, but it was often too difficult for younger children and too easy for
older children. Further, there was no standardization at the subtest level, only at the level of the
total score; this meant that it was not possible to calculate the IQ properly if a subtest had not been
administered. Finally, the norms were presented per age group of half a year. This could lead to a
deviation of six IQ points if the age did not correspond to the middle of the interval.
Correspondence with the SON-R 5½-17
To be able to compare the results of the SON-R 2½-7 with those of the SON-R 5½-17, the new
test for young children should be highly similar to the test for older children. An overlap in the
age ranges of the tests was also considered desirable. This way, the choice of a test can be based
on the level of the child, or on other specific characteristics that make one test more suitable
than the other. Various new characteristics of the SON-R 5½-17, such as the adaptive test
procedure, the standardization model and the use of a computer program, were implemented as
far as possible in the construction of the SON-R 2½-7.
Preparatory study
In the preparatory study the Preschool SON was evaluated. This started in 1990. The aim of the
preparatory study was to decide how the testing materials of the Preschool SON could best be
adapted and expanded. To this end, users of the Preschool SON were interviewed, the literature
was reviewed, other intelligence tests were analyzed and a secondary analysis of the data of the
standardization research of the Preschool SON was performed.
In this chapter, the test construction phase is described. In this phase, the research necessary to
construct a provisional version of the test was carried out. Successive improvements resulted in
the final test battery.
Evaluation by users
An inventory of the comments received from ten users of the Preschool SON was made.
These were psychologists employed by school advisory services, audiological centers, institutes for the deaf, medical preschool daycare centers, and in the care for the mentally
deficient.
On the whole, the Preschool SON was given a positive assessment as a test to which children
respond well and that affords plenty of opportunity to observe the child's behavior. The users
did, however, have the impression that the IQ score of the Preschool SON overrated the level of
the children. Clear information about administering and scoring the various subtests was lacking in the manual. The users followed the directions accurately but not literally. Furthermore,
they thought the subtests contained too few examples. They were inclined to provide extra help,
especially to young and to mentally deficient children. The discontinuation criterion, used in the
Preschool SON, was three consecutive mistakes per subtest. This discontinuation rule was
considered too strict, particularly for the youngest children, and, in practice, this rule was not
always applied.
The subtest Memory was administered in different ways. Some users administered it as a
game, playing a kind of hide and seek, whereas others tried to avoid doing this. The users had
the impression that this subtest was given too much weight in the total score of the Preschool
SON. Also, some doubt existed about the relationship between this subtest and the other ones.
Comparative research on the Preschool SON, the Stanford-Binet and parts of the WPPSI was
conducted by Harris in the United States of America. In general, her assessment of the test was
positive. Her criticism focused on some of the materials and the global norm tables (Harris, 1982).
For the SON-R 5½-17, with an administration time of about one-and-a-half hours, the mean
reliability of the total score is .93 and the generalizability is .85. If the administration of the
SON-R 2½-7 was to be limited to one hour, a reliability of .90 and a generalizability of .80 seemed
to be realistic goals. The improvement of these characteristics could be achieved by adding very
easy and very difficult items to each subtest, and by increasing the number of subtests.
An important objective during the revision of the Preschool SON was to obtain a good match
with the early items of the SON-R 5½-17. As the age ranges of the two tests overlapped, the idea
was to take the easy items of the SON-R 5½-17 as a starting point for the new, most difficult
items of the SON-R 2½-7.
These considerations led to a plan for the revision of the Preschool SON in which the subtest
Memory was dropped. The subtest Memory (the Cat House) had a low level of reliability and,
what is more, a low correlation with age and the remaining subtests. The interviews with users
of the Preschool SON showed that children enjoyed doing the Cat House subtest, but that the
directions for administration were often not followed correctly. Another consideration was that
assessment of memory can be carried out more effectively with a specific and comprehensive
test battery. The results from a single subtest are insufficient to draw valid conclusions about
memory. On the basis of similar considerations, no memory subtest had been included in the
SON-R 5½-17. The four remaining subtests of the Preschool SON were expanded to six subtests by dividing two existing subtests:
- The subtest Sorting was divided into two subtests: the section Sorting Disks was expanded
  with simple analogy items consisting of geometrical forms similar to the SON-R 5½-17; the
  section Sorting Pictures was expanded with easy items from the subtest Categories of the
  SON-R 5½-17.
- The section of the subtest Combining, in which two halves of a picture had to be combined,
  was expanded with items from the subtest Situations from the SON-R 5½-17; the section
  Puzzles was expanded and implemented as a separate subtest.
- The subtest Mosaics was expanded with simple items and with items from the SON-R 5½-17.
- The subtest Copying was adapted to increase its similarity to the subtest Patterns of the
  SON-R 5½-17.
The relationship between the subtests of the Preschool SON and the SON-R 2½-7 is presented
schematically in table 2.1.
Table 2.1
Relationship Between the Subtests of the Preschool SON and the SON-R 2½-7
Preschool SON                                SON-R 2½-7
Subtest       Task                           Subtest      Task
Sorting       Sorting disks                  Analogies    Sorting disks; Analogies SON-R 5½-17
              Sorting figures                Categories   Sorting figures; Categories SON-R 5½-17
Mosaics       Mosaics with/without a frame   Mosaics      Mosaics in a frame; Mosaics SON-R 5½-17
Combination   Two halves of a picture        Situations
              Puzzles                        Puzzles      Puzzles in a frame; separate puzzles
Copying                                      Patterns     Copying patterns
Memory        Finding cats
Testing materials
From the first experimental version on, the test consisted of the following subtests: Mosaics,
Categories, Puzzles, Analogies, Situations and Patterns. This sequence was maintained throughout the three versions. Tests that are spatially oriented are alternated with tests that require
reasoning abilities, and abstract testing materials are alternated with materials using concrete
(reasoning) pictures. Mosaics is a suitable test to begin with as it requires little direction, the
child works actively at a solution, and the task corresponds to activities that are familiar to the
child.
The items of the experimental versions consisted of (adapted) items from the Preschool SON
and the SON-R 5½-17 and of newly constructed items. Most of the new items were very simple
items that would make the test better suited to young children. Table 2.2 shows the origin of the
items in the final version of the test. Of a total of 96 items, five of which are example items, 45%
are new, 25% are adaptations of Preschool SON items, and 30% are adaptations from the SON-R
5½-17.
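The percentages quoted above can be checked against the row totals of Table 2.2 (24, 29 and 43 items out of 96). The following is a minimal sketch of that arithmetic; the variable names are illustrative and are not taken from the manual.

```python
# Item counts by origin, taken from the Total column of Table 2.2.
origin_counts = {
    "new": 43,
    "adapted from the Preschool SON": 24,
    "adapted from the SON-R 5½-17": 29,
}

total_items = sum(origin_counts.values())

# Share of each origin as a percentage of all items in the final version.
shares = {
    origin: 100 * count / total_items
    for origin, count in origin_counts.items()
}

for origin, share in shares.items():
    print(f"{origin}: {share:.1f}%")
```

Rounded to whole numbers, the three shares agree with the 45%, 25% and 30% given in the text.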
In the first experimental version the original items of the Preschool SON and the SON-R
5½-17 were used. In the following versions all items of the subtests were redrawn and
reformed to improve the uniformity of the material and to simplify the directions for the
tasks. In the pictures of people the emphasis was on pictures of children and care was taken
to have an even distribution of boys and girls. More children with a non-western appearance
were included.
An effort was made to make the material colorful and attractive, durable and easy to store. A
mat was used to prevent the material from sliding around, to facilitate picking up the pieces and
to increase the standardization of the test situation.
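Table 2.2
Origin of the Items
                                 Subtests of the SON-R 2½-7
Origin                           Mos   Cat   Puz   Ana   Sit   Pat   Total
Adapted from the Preschool SON                                         24
Adapted from the SON-R 5½-17                                           29
New items                                                              43
Total                            16    16    15    18    15    16      96
Note: of the per-subtest counts, only two entries of 10 in the New items row survive; their
columns cannot be determined.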
are much too difficult is very frustrating and demotivating for children. When older children are
given items that are much too easy, they very quickly consider these childish and may then be
inclined not to take the next, more difficult items seriously.
In the Preschool SON a discontinuation rule of three consecutive mistakes was used.
Because the mistakes had to be consecutive, children sometimes had to make many mistakes
before the test could be stopped. In practice this meant that, especially with young children,
examiners often stopped too early. In the SON-R 5½-17 the items are arranged in two or three
parallel series and in each series the test is discontinued after a total of two mistakes. In the first
series the first item is taken as a starting point; in the following series the starting point depends
on the performance in the previous series. This method has great advantages: everyone starts the
test at the same point, but tasks that are too easy as well as tasks that are too difficult are skipped.
Further, returning to an easier level in the next series is pleasant for the child after he or she has
done a few tasks incorrectly.
Research was carried out with the first experimental version to see if the adaptive method of
the SON-R 5½-17 could also be applied with the SON-R 2½-7. The problem was, however, that
the subtests consist of two different parts. This makes a procedure with parallel series confusing
and complicated because switching repeatedly from one part of the test to the other may be
necessary. In the subsequent construction research, only one series of items of progressive
difficulty was used. However, the discontinuation criterion was varied and research was done on
the effect of using an entry procedure in which the item taken as a starting point depended on the
age of the child.
Finally, on the basis of the results of this research, a procedure was chosen in which the
first, third or fifth item is taken as a starting point and each subtest is discontinued after a
total of three mistakes. The performance subtests can also be discontinued when two subsequent mistakes are made in the second section of these tests. The items in these subtests have
a high level of discrimination, and the children require a fair amount of time to complete the
tasks. They become frustrated if they have to continue when the next item is clearly too
difficult for them.
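The entry and discontinuation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the scoring procedure of the manual: the function name, the `part_two_start` parameter and the item numbers in the usage example are hypothetical, "two subsequent mistakes" is read as two consecutive mistakes, and the feedback given after each mistake and the handling of skipped items are not modeled.

```python
from typing import Callable, List, Optional


def administer_subtest(
    answer_is_correct: Callable[[int], bool],
    n_items: int,
    start_item: int,
    part_two_start: Optional[int] = None,
) -> List[int]:
    """Return the 1-based numbers of the items administered before stopping.

    answer_is_correct: hypothetical callback, True if the child solves the item.
    part_two_start: first item of part II; given only for performance subtests.
    """
    if start_item not in (1, 3, 5):
        raise ValueError("the starting point is the first, third or fifth item")

    administered: List[int] = []
    total_mistakes = 0
    consecutive_part_two_mistakes = 0

    for item in range(start_item, n_items + 1):
        administered.append(item)
        if answer_is_correct(item):
            consecutive_part_two_mistakes = 0
            continue
        total_mistakes += 1
        if part_two_start is not None and item >= part_two_start:
            consecutive_part_two_mistakes += 1
        # General rule: stop after a total of three mistakes.
        if total_mistakes >= 3:
            break
        # Performance subtests: also stop after two consecutive mistakes in part II.
        if consecutive_part_two_mistakes >= 2:
            break
    return administered


# Hypothetical child who solves items 1-8 and fails everything after that.
reasoning_items = administer_subtest(lambda item: item <= 8, n_items=15, start_item=3)
performance_items = administer_subtest(
    lambda item: item <= 8, n_items=15, start_item=3, part_two_start=8
)
```

In this example the reasoning subtest stops after item 11 (three mistakes in total), whereas the performance subtest already stops after item 10 (two consecutive mistakes in part II), so the child is given one item fewer that is clearly too difficult.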
As a result of the adaptive procedure, the number of items to be administered is strictly
limited, and the mean duration of the test is less than an hour, but very little information is lost
by skipping a few items. Further, the children's motivation remains high during this procedure
because only a very few items above their level are administered.
[...]
IRT-model was used because the adaptive administration procedure makes it difficult to evaluate these characteristics on the basis of p-values and item-total correlations. The parameter for
difficulty indicates a level of ability at which 50% of the children solve the item correctly; the
parameter for discrimination indicates how, at this level, the probability that the item will be
answered correctly increases as ability increases.
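The two item parameters can be illustrated with a small sketch. The surviving text does not name the IRT model that was used; the example assumes a two-parameter logistic model, and the function names and parameter values are illustrative only.

```python
import math


def prob_correct(ability: float, difficulty: float, discrimination: float) -> float:
    """Probability of a correct answer under an assumed two-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


def slope_at_difficulty(discrimination: float) -> float:
    """Rate at which the probability rises with ability where ability equals difficulty."""
    # For the logistic curve the derivative is a * p * (1 - p), and p = 0.5 at this point.
    return discrimination / 4.0


# Two hypothetical items of equal difficulty but different discrimination.
p_at_difficulty = prob_correct(ability=1.0, difficulty=1.0, discrimination=0.8)
p_weak_item = prob_correct(ability=1.5, difficulty=1.0, discrimination=0.8)
p_strong_item = prob_correct(ability=1.5, difficulty=1.0, discrimination=2.0)
```

At an ability equal to the difficulty parameter the probability of a correct answer is 50% for any item; just above that level the probability rises faster for the item with the higher discrimination parameter.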
Because of the use of an adaptive procedure, it was important that the items were
administered in the correct order of progressive difficulty; the examiner had to be reasonably
certain that items skipped at the beginning would have been solved correctly, and that items
skipped at the end would have been solved incorrectly. Also important was a balanced
distribution in the difficulty of the items, and sufficient numbers of easy items for young
children and difficult items for older ones. On the basis of the results of the IRT-analysis,
new items were constructed, some old items were adapted and others were removed from the
test. In some cases the order of administration was changed. A problem arising from this was
that items may become more difficult when administered early in the test. The help and
feedback given after an incorrect solution may benefit the child so that the next, more
difficult item becomes relatively easier.
Scoring Patterns
In the subtest Patterns lines and figures must be copied, with or without the help of preprinted
dots. Whether the child can draw neatly or accurately is not important when copying, but
whether he or she can see and reproduce the structure of the example is. This makes high
demands on the assessment and a certain measure of subjectivity cannot be excluded. During
the construction research, a great deal of attention was paid to elucidating the scoring rules, and
inter-assessor discrepancies were used to determine which drawings were difficult to evaluate.
On this basis, drawings that help to clarify the scoring rules were selected. These drawings are
included in the directions for the administration of Patterns.
DESCRIPTION OF THE SON-R 2½-7
The SON-R 2½-7 is a general intelligence test for young children. The test assesses a broad
spectrum of cognitive abilities without involving the use of language. This makes it especially
suitable for children who have problems or handicaps in language, speech or communication,
for instance, children with a language, speech or hearing disorder, deaf children, autistic children, children with problems in social development, and immigrant children with a different
native language.
A number of features make the test particularly suitable for less gifted children and children
who are difficult to test. The materials are attractive, the tasks diverse. The child is given the
chance to be active. Extensive examples are provided. Help is available on incorrect responses,
and the discontinuation rules restrict the administration of items that are too difficult for the
child.
The SON-R 2,-7 differs in various aspects from the more traditional intelligence tests, in
content as well as in manner of administration. Therefore, this test can well be administered as
a second test in cases where important decisions have to be taken on the basis of the outcome of
a test, or if the validity of the first test is in doubt.
Although the reasoning tests in the SON-R 2,-7 are an important addition to the typical
performance tests, the nonverbal character of the SON tests limits the range of cognitive
abilities that can be tested. Other tests will be required to gain an insight into verbal
development and abilities. However, for those groups of children for whom the SON-R
2,-7 has been specifically designed, a clear distinction must be made between intelligence
and verbal development.
After describing the composition of the subtests, the most important characteristics of the test
administration are presented in this chapter.
Table 3.1
Tasks in the Subtests of the SON-R 2,-7
Task part I
Task part II
Mosaics
Categories
Puzzles
Analogies
Situations
Patterns
Mosaics (Mos)
The subtest Mosaics consists of 15 items. In Mosaics, part I, the child is required to copy several
simple mosaic patterns in a frame using three to five red squares. The level of difficulty is
determined by the number of squares to be used and whether or not the examiner first demonstrates the item.
In Mosaics II, diverse mosaic patterns have to be copied in a frame using red, yellow and red/
yellow squares. In the easiest items of part II, only red and yellow squares are used, and the
pattern is printed in the actual size. In the most difficult items, all of the squares are used and the
pattern is scaled down.
Categories (Cat)
Categories consists of 15 items. In Categories I, four or six cards have to be sorted into two
groups according to the category to which they belong. In the first few items, the drawings on
the cards belonging to the same category strongly resemble each other. For example, a shoe or a
flower is shown in different positions. In the last items of part I, the child must identify the
concept underlying the category himself or herself: for example, vehicles with or without an engine.
Categories II is a multiple choice test. In this part, the child is shown three pictures of objects
that have something in common. Two more pictures that have the same thing in common have
then to be chosen from another column of five pictures. The level of difficulty is determined by
the level of abstraction of the shared characteristic.
Puzzles (Puz)
The subtest Puzzles consists of 14 items. In part I, puzzle pieces must be laid in a frame to
resemble the given example. Each puzzle has three pieces. The first few puzzles are first
demonstrated by the examiner. The most difficult puzzles in part I have to be solved independently.
In Puzzles II, a whole must be formed from three to six separate puzzle pieces. No directions
are given as to what the puzzles should represent; no example or frame is used. The number of
puzzle pieces partially determines the level of difficulty.
Figure 3.1
Items from the Subtest Mosaics: Item 3 (Part I), Item 9 (Part II), Item 14 (Part II)
Figure 3.2
Items from the Subtest Categories: Item 4 (Part I), Item 11 (Part II)
Figure 3.3
Items from the Subtest Puzzles: Item 3 (Part I), Item 11 (Part II)
Analogies (Ana)
The subtest Analogies consists of 17 items. In Analogies I, the child is required to sort three,
four or five blocks into two compartments on the basis of either form, color or size. The child
must discover the sorting principle himself or herself on the basis of an example. In the first few
items, the blocks to be sorted are the same as those pictured in the test booklet. In the last items
of part I, the child must discover the underlying principle independently: for example, large
versus small blocks.
Analogies II is a multiple choice test. Each item consists of an example-analogy in which a
geometric figure changes in one or more aspect(s) to form another geometric figure. The
examiner demonstrates a similar analogy, using the same principle of change. Together with the
child, the examiner chooses the correct alternative from several possibilities. Then, the child has
to apply the same principle of change to solve another analogy independently. The level of
difficulty of the items is related to the number and complexity of the transformations.
Situations (Sit)
The subtest Situations consists of 14 items. Situations I consists of items in which one half of
each of four pictures is shown in the test booklet. The child has to place the missing halves
beside the correct pictures. The first item is printed in color in order to make the principle clear.
The level of difficulty is determined by the degree of similarity between the different halves
belonging to an item.
Situations II is a multiple choice test. Each item consists of a drawing of a situation with one
or two pieces missing. The correct piece (or pieces) must be chosen from a number of alternatives to make the situation logically consistent. The number of missing pieces determines the
level of difficulty.
Patterns (Pat)
The subtest Patterns consists of 16 items. In this subtest the child is required to copy an
example. The first items are drawn freely, then pre-printed dots have to be connected to make
the pattern resemble the example. The items of Patterns I are first demonstrated by the examiner
and consist of no more than five dots.
The items in Patterns II consist of five, nine or sixteen dots and have to be copied by the child
without help. The level of difficulty is determined by the number of dots and whether or not the
dots are pictured in the example pattern.
Figure 3.4
Items from the Subtest Analogies: Item 8 (Part I), Item 9 (Part I), Item 16 (Part II)
Figure 3.5
Items from the Subtest Situations: Item 5 (Part I), Item 10 (Part II)
Figure 3.6
Items from the Subtest Patterns: Item 6 (Part I), Item 13 (Part II), Item 16 (Part II)
Spatial tests
Spatial tests correspond to concrete reasoning tests in that, in both cases, a relationship within a
spatial whole must be constructed. The difference lies in the fact that concrete reasoning tests
concern a meaningful relationship between parts of a picture, and spatial tests concern a form
relationship between pieces or parts of a figure (see Snijders, Tellegen & Laros, 1989; Carroll,
1993). Spatial tests have long been integral components of intelligence tests. The spatial subtests included in the SON-R 2,-7 are Mosaics and Patterns. The subtest Puzzles is more
difficult to classify, as the relationship between the parts concerns form as well as meaning. We
expected the performance on Puzzles and Situations to relate to concrete reasoning ability.
However, the correlations and factor analysis show that Puzzles is more closely associated with
Mosaics and Patterns (see section 5.3).
Performance tests
An important characteristic that Puzzles, Mosaics and Patterns have in common is that the item
is solved while manipulating the test stimuli. That is why these three subtests are called performance tests. In the three reasoning tests (Situations, Categories and Analogies), in contrast,
the correct solution has to be chosen from a number of alternatives. Otherwise, the six subtests
are very similar in that perceptual and spatial aspects as well as reasoning ability play a role in
all of them.
The performance subtests of the SON-R 2,-7 can be found in a similar form in other
intelligence tests. However, only verbal directions are given in these tests. Reasoning tests can
also regularly be found in other intelligence tests, but then they often have a verbal form (such as
verbal analogies).
In table 3.2 the classification of the subtests is presented. The empirical classification, in which
a distinction is made between performance tests and reasoning tests, is based on the results of
principal components analysis of the test scores of several different groups of children (see
section 5.4). In table 3.2, the number of each subtest indicates the sequence of administration;
the sequence of the subtests in the table is based on similarities of content. This sequence is used
in the following chapters when presenting the results.
Table 3.2
Classification of the Subtests
No   Abbr   Subtest      Content              Empirical
6    Pat    Patterns     Spatial insight      Performance test
1    Mos    Mosaics      Spatial insight      Performance test
3    Puz    Puzzles      Concrete reasoning   Performance test
5    Sit    Situations   Concrete reasoning   Reasoning test
2    Cat    Categories   Abstract reasoning   Reasoning test
4    Ana    Analogies    Abstract reasoning   Reasoning test
test administration, otherwise an unnatural situation would arise. The manner of administration
of the test depends on the communication abilities of the child. The directions can be given
verbally, nonverbally with gestures or using a combination of both. Care must be taken when
giving verbal directions that no extra information is given.
No knowledge of a specific language is required to solve the items being presented. However, level of language development, for example, being able to name objects, characteristics
and concepts, can influence the ability to solve the problems correctly. Therefore the SON-R
2,-7 should be considered a nonverbal test for intelligence rather than a test for nonverbal
intelligence.
Directions
An important part of the directions to the child is the demonstration of (part of) the solution to a
problem. An example item is included in the administration of the first item on each subtest, and
detailed directions are given for all first items. Once the child understands the nature of the task,
the examiner can shorten the directions for the following items. If the child does not understand
the directions, they can be repeated.
In the second part of each subtest an example is given in advance. Once the child understands
this example, he or she can do the following items independently.
Feedback
The examiner gives feedback after each item. In the SON-R 5,-17, feedback is limited to
telling the child whether his or her answer is correct or incorrect. In the SON-R 2,-7 the
examiner indicates whether the solution is correct or incorrect, and, if the answer is incorrect,
he/she also demonstrates the correct solution for the child. The examiner tries to involve the
child when correcting the answer, for instance, by letting him or her perform the last action.
However, the examiner does not explain why the answer was incorrect.
By giving feedback, a more normal interaction between the examiner and the child occurs,
and the child gains a clearer understanding of the task. The child is given the opportunity to
learn and to correct himself or herself. In this respect a similarity exists between the SON-tests and
tests for learning potential (Tellegen & Laros, 1993a).
Time factor
The speed with which the problems are solved plays only a minor role in the SON-R
2,-7. A time limit for completing the items is used only in the second part of the performance
tests. The time limit is generous. Its goal is to allow the examiner to end the item. The construction research showed that children who go beyond the time limit are seldom able to find a
correct solution when given more time.
during administration). During the standardization research the administration took between
forty and sixty minutes in 60% of the cases. For children with a specific handicap, the administration takes about five minutes longer. For children two years of age, administration time is
shorter; nearly 50% of the two-year-olds complete the test in less than forty minutes.
Standardization
The SON-R 2,-7 is meant primarily for children in the age range from 2;6 to 7;0 years. The
norms were constructed using a mathematical model in which performance is described as a
continuous function of age. An estimate is made of the development of performance in the
population, on the basis of the results of the norm groups (see chapter 4). These norms run from
2;0 to 8;0 years. In the age group from 2;0 to 2;6 years, the test should only be used for
experimental purposes. In many cases the test is too difficult for children younger than 2;6
years. Often, they are not sufficiently motivated or able to concentrate well enough to do the
test. However, in the age group from 7;0 to 8;0 years, the test is eminently suitable for children
who have a cognitive delay or who are difficult to test. The easy starting level and the help and
feedback given can benefit these children. For seven-year-old children who are developing
normally, the SON-R 5,-17 is generally more appropriate.
The scaled subtest scores are presented as standard scores with a mean of 10 and a standard
deviation of 3. The scores range from 1 to 19. The SON-IQ, based on the sum of the scaled
subtest scores, has a mean of 100 and a standard deviation of 15. The SON-IQ ranges from 50 to
150. Separate total scores can be calculated for the three performance tests (SON-PS) and the
three reasoning tests (SON-RS). These have the same distribution characteristics as the IQ
score. When using the computer program, the scaled scores are based on the exact age; in the
norm tables age groups of one month are presented. With the computer program, a scaled total
score can be calculated for any combination of subtests.
In addition to the scaled scores, based on a comparison with the population of children of the
same age, a reference age can be determined for the subtest scores and the total scores. This
shows the age at which 50% of the children in the norm population perform better, and 50%
perform worse. The reference age ranges from 2;0 to 8;0 years. It provides a different framework for the interpretation of the test results, and can be useful when reporting to persons who
are not familiar with the characteristics of deviation scores. The reference age also makes it
possible to interpret the performance of older children or adults with a cognitive delay, for
whom administration of a test, standardized for their age, is practically impossible and not
meaningful.
As with the SON-R 5,-17, no separate norms for deaf children were developed for the
SON-R 2,-7. Our basic assumption is that separate norms for specific groups are only required
when a test discriminates against a special group of children because of its contents or the
manner in which it is administered. Research using the SON-R 2,-7 and the SON-R 5,-17
with deaf children (see chapter 7) shows that this is not the case for deaf children tested
with the SON tests.
Properly standardized test norms are necessary for the interpretation of the results of a test. The
test norms make it possible to assess how well or how badly a child performed in comparison to
the norm population. The norm population of the SON-R 2,-7 includes all children residing in
the Netherlands in the relevant age group, except those with a severe physical and/or mental
handicap. The standardization process transforms the raw scores into normal distributions with
a fixed mean and standard deviation. This allows comparisons to be made between children,
including children of different ages. Intra-individual comparisons between performances on
different subtests are also possible. As test performances improve very strongly in the age range
from two to seven years, the norms should ideally be related to the exact age of the child and not
to an age range, as is the case for most intelligence tests for children.
Regions of research
To ensure a good regional distribution, the research was carried out in ten regions, five of which
are in the West, three in the North/East, and two in the South of the Netherlands. The regions
were chosen to reflect specific demographic characteristics of the Netherlands. In nine of the ten
regions, one examiner administered all the tests. In one region, two examiners shared the test
administration. Approximately the same number of children was tested in each region in five
separate two-week periods. The test was administered to 22 children, one boy and one girl from
each age group in each region in each period. The sample to be tested consisted of 1100
children, i.e., 10 (regions) x 5 (periods) x 11 (age groups) x 2 (one boy and one girl).
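The planned sample size follows directly from the design. The short check below only restates
the arithmetic given in the text; the variable names are illustrative.

```python
regions = 10
periods = 5
age_groups = 11
sexes = 2  # one boy and one girl per age group

children_per_region_per_period = age_groups * sexes
planned_sample = regions * periods * children_per_region_per_period

print(children_per_region_per_period)  # children tested in one region in one period
print(planned_sample)                  # planned size of the sample
```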
Communities
The second phase of the standardization research concerned the selection of the communities in
the ten research regions where the test administrations were to take place. In total, 31 communities were selected. Depending on the size of the community, the research was carried out during
one, two or three periods. The selected communities were representative of the Netherlands
with regard to number of inhabitants and degree of urbanization.
Schools
Children four years and older were tested at primary schools. Research at schools was carried
out in the same communities as the research with younger children. One, two or three schools
were selected in each community, depending on the number of periods in which research was to
be done in that community. To select the schools, a sample was drawn from the schools in each
community. The chance of inclusion was proportional to the number of pupils at the school.
Fifty schools were approached, of which 25 were prepared to participate. Schools that were not prepared
to participate were replaced by other schools in the same community. The socio-economic
status of the parents was taken into account in the choice of replacement schools.
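Drawing schools with a chance of inclusion proportional to the number of pupils can be sketched
as follows. This is an illustration under stated assumptions, not the procedure actually used in
the standardization research: the function name, the school names and the pupil counts are
hypothetical, and drawing one school at a time without replacement makes the inclusion chances
only approximately proportional to size when more than one school is drawn.

```python
import random


def draw_schools(pupils_by_school, number_of_schools, rng):
    """Draw schools one at a time, each draw weighted by the number of pupils."""
    remaining = dict(pupils_by_school)
    chosen = []
    for _ in range(min(number_of_schools, len(remaining))):
        names = list(remaining)
        weights = [remaining[name] for name in names]
        pick = rng.choices(names, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # a school can be selected only once
    return chosen


# Hypothetical community with three schools of different sizes.
community = {"School A": 300, "School B": 100, "School C": 50}
print(draw_schools(community, 2, random.Random(1)))
```

A school that declines to participate would then be replaced by another school from the same
community, as described above.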
Practical implementation
The department of Orthopedagogics of the University of Groningen, responsible for the standardization in the Netherlands of the Reynell Test for Language Understanding and the Schlichting Test for Language Production (Lutje Spelberg & Sj. van der Meulen, 1990), collaborated in
the design and execution of the standardization research. In three of the five research periods,
children who were tested with the SON-R 2,-7 had also participated in the standardization
research of the language tests six months earlier. To validate both the language tests and the
SON-R 2,-7, a third test was administered to some of the children in the intervening period.
Eleven examiners, eight women and three men, administered most of the tests. Most were
psychology graduates, with extensive experience in testing young children, some of which had
been gained in the previous research they had carried out with the language tests.
Children below four years old were tested in a local primary health care center, in the
presence of one of the parents. In a few cases the child was tested at home. Older children were
tested at school in a separate room. An effort was made to administer the whole test in one
session. However, a short break between the subtests was allowed. At the schools, breaking off
the test for longer periods, or even continuing a test the next day, was sometimes necessary
because of school hours and breaks.
In a few cases the test could not be administered correctly. If no more than four subtests
could be administered, the test was considered invalid and was not used in the analyses. This
situation occurred in the case of ten children, eight of whom were two years old.
                      2;3    2;9    3;3    3;9    4;3    4;9    5;3    5;9    6;3    6;9    7;3
Total                  98     99     99    100    102    101    105    105    102    107    106
Phase
  1993                 94     89     86     90     99    101    104    104    101    105    103
  Addition: 1994, Immigrant, Spec. Educ.
                     3, 1   9, 1  11, 2   7, 3   2, 1
Sex
  Boys                 47     50     48     52     52     50     52     53     49     53     55
  Girls                51     49     51     48     50     51     53     52     53     54     51
Age
  Mean (years)       2.24   2.76   3.25   3.75   4.25   4.74   5.24   5.74   6.25   6.74   7.24
  SD (days)            14     16     16     14     15     22     22     24     18     23     21
Table 4.2
Demographic Characteristics of the Norm Group in Comparison with the Dutch Population
(N=1124)
Region                             Norm Group   Population
North/East-Netherlands                31%          31%
South-Netherlands                     19%          22%
West-Netherlands                      50%          47%
Size of Community                  Norm Group   Population
Less than 10,000 inhabitants          12%          11%
10,000 to 20,000 inhabitants          22%          20%
20,000 to 100,000 inhabitants         44%          42%
More than 100,000 inhabitants         22%          27%
Degree of Urbanization             Norm Group   Population
(Urbanized) Rural Communities         37%          34%
Commuter Communities                  16%          15%
Urban Communities                     47%          51%
Table 4.3 presents the level of education and country of birth of the mother, before and after
weighting, for three age groups. As can be seen, the differences between the age groups were
much smaller after weighting. The level of education of the mothers corresponded well to the
level of education in the population of women between 25 and 45 years of age (CBS, 1994). The
percentages for low, middle and high levels of education in the population are 27%,
54% and 19%, respectively. The percentage of children whose mother was born abroad also corresponded to
the national percentage of 10% immigrant children in the age range from zero to ten years
(Roelandt, Roijen & Veenman, 1992).
Table 4.3
Education and Country of Birth of the Mother in the Weighted and Unweighted Norm Group
                    Education Mother          Country of Birth Mother
Unweighted          Low    Middle   High      Netherlands   Abroad
2 and 3 years       26%    57%      17%       91%            9%
4 and 5 years       32%    51%      17%       90%           10%
6 and 7 years       40%    45%      15%       86%           14%
Total               32%    51%      17%       89%           11%
Weighted            Low    Middle   High      Netherlands   Abroad
2 and 3 years       28%    54%      18%       89%           11%
4 and 5 years       32%    52%      16%       89%           11%
6 and 7 years       33%    50%      17%       87%           13%
Total               31%    52%      17%       89%           11%
After the stepwise fitting procedure, the number of selected parameters in the subtests varied
from six to ten. The cumulative proportion in the population, in the age range from two to eight,
could then be estimated for every possible combination of age and score. Normally distributed
z-values were then determined by calculating the mean z-value for the normal distribution
interval that corresponded to the upper limit and the lower limit of each raw score. The averaging procedure caused a slight loss of dispersion, for which we corrected.
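The mean z-value of a normal distribution interval can be written in closed form. For an
interval bounded by the cumulative proportions p_lower and p_upper, the mean of a standard
normal variable within that interval is the difference between the normal densities at the two
boundaries, divided by p_upper - p_lower. The sketch below illustrates this step only; the
function names and the example proportions are hypothetical, and the correction for the loss of
dispersion is not shown because the text does not specify it.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()


def _density_at_proportion(p):
    """Normal density at the z-value that cuts off the cumulative proportion p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the density vanishes at minus and plus infinity
    return _STANDARD_NORMAL.pdf(_STANDARD_NORMAL.inv_cdf(p))


def mean_z_in_interval(p_lower, p_upper):
    """Mean z-value of the part of the normal distribution between two proportions."""
    if not 0.0 <= p_lower < p_upper <= 1.0:
        raise ValueError("expected 0 <= p_lower < p_upper <= 1")
    return (_density_at_proportion(p_lower) - _density_at_proportion(p_upper)) / (p_upper - p_lower)


# Hypothetical cumulative proportions at the limits of four successive raw scores.
limits = [0.0, 0.10, 0.45, 0.85, 1.0]
for lower, upper in zip(limits, limits[1:]):
    print(lower, upper, round(mean_z_in_interval(lower, upper), 2))
```

Because every raw score is replaced by the mean of its interval, the variance of the resulting
z-values is somewhat smaller than 1, which is the loss of dispersion mentioned above.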
This model may seem to be complicated. However, for simple linear transformations per age
group, twenty-two parameters for each subtest would have to be estimated, and in the case of
nonlinear transformations based on the cumulative proportions, more than one hundred parameters would have to be estimated.
Reliability
For each subtest and age group the reliability was calculated with the formula for lambda2
(Guttman, 1945). This is, like lambda3 (coefficient alpha; Cronbach, 1951), a measure of internal consistency. However, lambda2 is preferable if the number of items is limited, and if the
covariance between the items is not constant (Ten Berge & Zegers, 1978).
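The difference between the two coefficients can be made concrete with a small sketch that
computes both from the covariance matrix of the item scores. The formulas are the standard
definitions of Guttman's lambda2 and coefficient alpha; the function names and the example
scores are hypothetical, and the sketch ignores the fact that, in an adaptive administration,
not all items are presented to every child.

```python
import math


def item_covariances(scores):
    """Covariance matrix of item scores; rows are children, columns are items."""
    n_children = len(scores)
    n_items = len(scores[0])
    means = [sum(row[j] for row in scores) / n_children for j in range(n_items)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / (n_children - 1)
            for j in range(n_items)
        ]
        for i in range(n_items)
    ]


def _parts(cov):
    """Number of items, variance of the sum score, and the off-diagonal covariances."""
    n = len(cov)
    total = sum(sum(row) for row in cov)
    off_diagonal = [cov[i][j] for i in range(n) for j in range(n) if i != j]
    return n, total, off_diagonal


def lambda3(cov):
    """Coefficient alpha, written in terms of the summed item covariances."""
    n, total, off_diagonal = _parts(cov)
    return n / (n - 1) * sum(off_diagonal) / total


def lambda2(cov):
    """Guttman's lambda2; never smaller than lambda3."""
    n, total, off_diagonal = _parts(cov)
    squared = sum(c * c for c in off_diagonal)
    return (sum(off_diagonal) + math.sqrt(n / (n - 1) * squared)) / total


# Hypothetical item scores (1 = correct, 0 = incorrect) of six children on four items.
scores = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 1]]
cov = item_covariances(scores)
print(round(lambda2(cov), 3), round(lambda3(cov), 3))
```

When all covariances between the items are equal the two coefficients coincide; the more the
covariances differ, the more lambda2 exceeds lambda3.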
The reliability for each subtest was fitted as a third degree function of the transformed age
(Y), using the method of stepwise multiple regression. In a few cases, when extrapolating to the
ages of 2;0 and 8;0, extreme values occurred for the estimate of reliability. In these cases, the
lower limit for the estimated value was set at .30 and the upper limit at .85.
Standard scores
Scaled subtest scores are presented on a normally distributed scale with a mean of 10 and a
standard deviation of 3. These so-called Wechsler scores have a range of 1 to 19. As a result of
floor and ceiling effects, the most extreme scores will not occur in all age groups. The raw
scores of the subtests are less differentiated than the standard scores. As a result, only some of
the values in the range of 1 to 19 are used in each age group. However, the values show the
position in the normal distribution with more precision than would be possible with a less
differentiated scale.
The sum of the six scaled subtest scores is the basis of the IQ score. This SON-IQ has a mean
of 100 and a standard deviation of 15. The range extends from 50 to 150.
The sum of the scaled scores of Mosaics, Puzzles and Patterns is transformed to provide the
Performance Scale (SON-PS), and the sum of Categories, Situations and Analogies forms the
Reasoning Scale (SON-RS). Both scales, like the IQ-distribution, have a mean of 100 and a
standard deviation of 15. The range extends from 50 to 150.
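The relation between a normally distributed z-value and the two score scales can be sketched as
a linear transformation that is rounded and limited to the range of the scale. This is a minimal
illustration: the z-values are assumed to come from the norm model, the rounding rule (half up)
is an assumption, and the function names are hypothetical.

```python
import math


def to_scale(z, mean, sd, lowest, highest):
    """Transform a z-value linearly, round it, and limit it to the range of the scale."""
    value = math.floor(mean + sd * z + 0.5)  # round half up (assumed rule)
    return max(lowest, min(highest, value))


def subtest_standard_score(z):
    """Scaled subtest score: mean 10, standard deviation 3, range 1 to 19."""
    return to_scale(z, 10, 3, 1, 19)


def total_standard_score(z):
    """SON-IQ, SON-PS or SON-RS: mean 100, standard deviation 15, range 50 to 150."""
    return to_scale(z, 100, 15, 50, 150)


for z in (-4.0, -1.0, 0.0, 1.0, 2.0):
    print(z, subtest_standard_score(z), total_standard_score(z))
```

For the total scores, the z-value would be that of the sum of the scaled subtest scores within
the norm population, not the mean of the subtest z-values.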
In the Appendix, the norm tables for the subtests are shown for each month of age, for the age
range 2;0 to 8;0 years. The tables for calculating the standardized total scores are presented per
four month period. When the computer program is used, all the standardized scores are based on
the exact age.
Reference age
The reference age is derived from the raw score(s). The actual age of the child is not important.
For the age range of 2;0 to 8;0 years, the reference age is presented in years and months. The
reference age for the subtests can be found in the norm tables. The reference age for the total
score is the age at which a child with these raw scores would receive an IQ score of 100. This age
is determined iteratively, with the help of the computer program, for the Total Score on the test,
the Performance Scale and the Reasoning Scale. An approximation of the reference age for the
total score is presented in the norm tables in the appendix. This approximation is based on the
sum of the six raw subtest scores.
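The iterative determination of the reference age can be sketched as a bisection search for the
age at which the standardized score equals 100. This is an illustration under stated
assumptions, not the algorithm of the computer program: the norm function toy_iq is a
hypothetical stand-in for the norm model, and all names are illustrative. The search relies on
the standardized score for a fixed raw score decreasing as age increases.

```python
def reference_age(iq_at_age, youngest=2.0, oldest=8.0, tolerance=1.0 / 24.0):
    """Age (in years) at which the given raw scores correspond to a score of 100."""
    if iq_at_age(youngest) <= 100.0:
        return youngest  # at or below the lower limit of the reference age
    if iq_at_age(oldest) >= 100.0:
        return oldest  # at or above the upper limit of the reference age
    low, high = youngest, oldest
    while high - low > tolerance:
        middle = (low + high) / 2.0
        if iq_at_age(middle) > 100.0:
            low = middle  # still above average at this age: the reference age is higher
        else:
            high = middle
    return (low + high) / 2.0


def format_age(years):
    """Present an age in years and months, for example 2;6."""
    months = round(years * 12)
    return f"{months // 12};{months % 12}"


def toy_iq(age, raw_sum=36.0):
    """Hypothetical norm function: the expected raw sum rises by 12 points per year."""
    expected = 12.0 * (age - 2.0)
    return 100.0 + 15.0 * (raw_sum - expected) / 8.0


print(format_age(reference_age(toy_iq)))
```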
For use of the norm tables and the computer program, we refer to chapter 13 (The record form,
norm tables and computer program). Directions on the procedure to be used when the test has
not been fully administered can also be found in that chapter.
PSYCHOMETRIC CHARACTERISTICS
Important psychometric characteristics of the SON-R 2,-7 will be discussed in this chapter.
These are the distribution characteristics of the scores, the reliability and generalizability of the
test, the relationship between the test scores and the stability of the scores. In general, these
results are based on the weighted norm group (N=1122). In several analyses comparisons have
been made between the results in three age groups, namely:
two- and three-year-olds (the norm groups of 2;3, 2;9, 3;3 and 3;9 years),
four- and five-year-olds (the norm groups of 4;3, 4;9, 5;3 and 5;9 years),
six- and seven-year-olds (the norm groups of 6;3, 6;9 and 7;3 years).
The results in this chapter are relevant for the internal structure of the test. Research on validity,
carried out in the norm group, will be discussed in chapter 6 (Relationships with other variables)
and in chapter 9 (Relationship with cognitive tests).
Table 5.1
           Pat     Mos     Puz     Sit     Cat     Ana
item 1     .90     .95     .97     .95     .91     .96
item 2     .88*    .81     .90     .91     .89     .93
item 3     .90     .77     .89     .87     .89     .84*
item 4     .88     .76     .79     .86     .82     .86
item 5     .86     .73     .76     .80     .75     .73
item 6     .79     .70     .72     .67     .69     .52*
item 7     .77     .64     .64     .56     .64     .58
item 8     .62     .58     .59     .54     .51     .57
item 9     .60     .46     .37*    .46     .49     .45
item 10    .43     .33     .44     .32     .33     .28
item 11    .33     .23     .25     .17     .30     .28
item 12    .30     .14     .19     .12     .17     .23
item 13    .21     .08*    .13     .07     .10     .15
item 14    .20     .10     .05     .06     .09     .13
item 15    .13     .06                     .05     .04*
item 16    .04                                     .06
item 17                                            .04
For two items, item 9 of Puzzles and item 6 of Analogies, the difference was larger. The six
deviating items are marked with an asterisk in table 5.1.
IRT model
As in the construction research, the item characteristics for the definitive test were estimated
with the 2-parameter model from item response theory. The computer program BIMAIN
(Zimowski et al., 1994) was used for these calculations. This program does not require all
subjects to have completed all the items. The two item parameters estimated for the items of
each subtest are the a-parameter and the b-parameter. The a-parameter shows how well the item
discriminates and the b-parameter shows how difficult the item is. To obtain a reliable estimate
of the item parameters, the analysis was carried out on the test results of 2498 children, almost
all the children who were tested during the standardization and the validation research. The
estimate is based on the items that were actually administered.
In figure 5.1 the item characteristics are represented in a graph. The distribution of the b-parameters is similar to the results obtained on the basis of the p-values. Except for a few small
deviations, the items increase in difficulty. The difficulty of the items is also distributed evenly
over the range from -2 to +2.
The mean of the discrimination parameter is highest for Patterns (mean=4.8) and Mosaics
(mean=3.8). For Puzzles, Situations, Categories and Analogies, the means are 2.8, 2.4, 2.9 and
2.2 respectively. Within the subtests, however, the discrimination values of the items can diverge
strongly.
Initially, we considered basing the scoring and standardization of the SON-R 2,-7 on the
estimated latent abilities as represented in the IRT model. A good method for doing this with
incomplete test data was described by Warm (1989). Such a method of scoring has important
advantages: items which clearly discriminate have more weight in the evaluation, no assumptions need to be made about scores on items that were not administered, and the precision of
statements about the ability of a person can be shown more clearly. However, the disadvantages
are that this scoring method can only be done with a computer, and that important differences
can occur between the standardized computer results and the results obtained with norm tables.
The main factor in the decision not to apply the IRT model when standardizing the test, however, was the fact that the data did not fit the model. This is not surprising. The IRT model
assumes that the item scores are obtained independently. However, the feedback and the help
given with the SON-R 2,-7 create an interdependence among the scores. This works out
positively for the test and its validity, but it limits the psychometric methods that can be applied
successfully. IRT models that take learning effects into account are being developed (see
Verhelst & Glas, 1995), but programs with which the item parameters can be estimated in
combination with an adaptive test administration, are not yet available.
Figure 5.1
Plot of the Discrimination (a) and Difficulty (b) Parameter of the Items
(Six panels, one per subtest: Patterns, Mosaics, Puzzles, Situations, Categories and Analogies.
In each panel the item numbers are plotted by difficulty b on the horizontal axis, from -3 to 2,
and by discrimination a on the vertical axis.)
Table 5.2
Mean and Standard Deviation of the Raw Scores
Age     Pat          Mos          Puz          Sit          Cat          Ana          Sum Subt.
        Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)
2;3      1.1 (1.5)    1.3 (0.9)    2.0 (1.0)    1.6 (1.7)    1.2 (1.6)    1.8 (1.4)    9.0 (5.3)
2;9      3.8 (2.3)    1.8 (1.1)    2.7 (1.3)    3.6 (2.2)    2.8 (2.0)    3.5 (1.9)   18.1 (6.8)
3;3      5.7 (1.8)    3.3 (2.1)    4.3 (2.0)    5.1 (1.8)    4.7 (1.9)    5.1 (2.0)   28.3 (8.4)
3;9      7.1 (1.3)    5.3 (2.2)    6.3 (1.9)    6.2 (1.6)    6.1 (1.9)    6.2 (2.1)   37.3 (7.6)
4;3      8.4 (1.3)    7.3 (1.9)    7.7 (1.7)    7.3 (1.7)    7.6 (1.9)    6.9 (2.1)   45.2 (7.2)
4;9      9.5 (1.5)    8.3 (1.8)    8.4 (2.0)    7.8 (1.7)    8.5 (2.0)    8.2 (2.1)   50.7 (7.9)
5;3     10.4 (1.6)    9.0 (1.5)    9.4 (1.7)    8.4 (1.7)    8.6 (1.7)    8.4 (2.4)   54.3 (6.7)
5;9     11.2 (1.9)    9.8 (1.8)   10.0 (2.0)    9.1 (1.7)    9.9 (1.9)    9.6 (2.6)   59.8 (8.6)
6;3     12.9 (2.0)   11.1 (2.0)   11.0 (1.9)   10.2 (1.8)   11.0 (1.7)   10.5 (3.3)   66.6 (9.1)
6;9     13.2 (1.8)   11.4 (1.9)   11.2 (1.5)   10.5 (1.7)   11.2 (1.9)   11.3 (3.1)   68.8 (8.3)
7;3     14.1 (1.7)   12.2 (2.0)   11.5 (1.5)   11.1 (1.6)   11.9 (1.6)   12.7 (3.0)   73.6 (8.3)
Total    8.9 (4.3)    7.4 (4.1)    7.7 (3.7)    7.4 (3.4)    7.6 (3.9)    7.7 (4.0)   46.5 (21.7)
ranges from 50 to 150. A distribution with a mean of 100 and a standard deviation of 15 is also
used for the Performance Scale (SON-PS), based on the sum of the scores of Mosaics, Puzzles
and Patterns, and for the Reasoning Scale (SON-RS), based on the sum of the scores of Categories, Analogies and Situations.
In table 5.3, the mean and the standard deviation of the standardized scores are presented for
the entire weighted norm group and for three age groups. Only very small deviations from the
planned distribution were found for the entire group. No significant deviations from the normal
distribution were found in tests for skewness and kurtosis. Deviations in mean and dispersion
sometimes differed slightly across the three separate age groups, but an analysis of variance
showed that the differences between the means were not significant. A test for the homogeneity
of the variances also failed to show any significant differences. The kurtosis was not significant
in the different groups. The distribution was positively skewed for Puzzles and for the
Reasoning Scale in the oldest group. However, the values for skewness were small, .4 and .3,
respectively. A variance analysis was also carried out over the eleven original age groups. No
significant differences in mean and variance between the groups were established for any of the
variables.
Table 5.3
Distribution Characteristics of the Standardized Scores in the Weighted Norm Group
              Total            2-3 years        4-5 years        6-7 years
              Mean   (SD)      Mean   (SD)      Mean   (SD)      Mean   (SD)
Patterns      10.0   (2.9)      9.9   (2.8)     10.0   (2.9)     10.1   (3.1)
Mosaics       10.1   (3.0)     10.0   (3.0)     10.0   (3.1)     10.2   (3.0)
Puzzles       10.0   (3.0)     10.0   (2.9)     10.0   (3.0)     10.1   (3.0)
Situations    10.0   (2.9)     10.0   (2.8)      9.9   (3.1)     10.0   (2.8)
Categories    10.0   (2.9)     10.0   (2.9)     10.0   (3.0)     10.1   (2.9)
Analogies     10.0   (2.9)     10.0   (2.7)     10.0   (3.0)      9.8   (3.1)
SON-PS       100.2  (15.1)    100.1  (15.2)     99.9  (15.0)    100.6  (15.2)
SON-RS        99.9  (15.0)    100.1  (14.5)    100.0  (15.6)    100.0  (14.9)
SON-IQ       100.1  (15.0)    100.1  (14.8)     99.9  (15.2)    100.5  (15.0)
Table 5.4
Floor and Ceiling Effects at Different Ages
Floor Effect (lowest possible standardized score)
Age     Pat   Mos   Puz   Sit   Cat   Ana     PS    RS    IQ
2;0       9     6     4     8     9     7     70    86    73
2;3       8     6     3     7     8     6     68    80    68
2;6       6     5     3     5     7     5     62    72    63
2;9       4     4     2     4     5     3     52    61    51
3;0       3     3     2     3     3     2     52    52    50
3;3       1     2     1     2     2     1     50    50    50
3;6       1     1     1     1     1     1     50    50    50
Ceiling Effect (highest possible standardized score)
Age     Pat   Mos   Puz   Sit   Cat   Ana     PS    RS    IQ
5;0      19    19    19    19    19    19    150   150   150
5;6      19    19    19    19    18    19    150   150   150
6;0      18    18    18    18    18    19    149   150   150
6;6      16    17    17    17    17    18    141   149   150
7;0      15    16    16    16    16    17    137   140   143
7;6      14    15    16    16    16    16    132   138   139
8;0      13    14    15    16    15    15    126   134   133
These results indicate that the standardization model is adequate and gives a good estimate of
the distribution of the scores in the population; the deviations in the samples can be seen as
chance deviations from the population values resulting from sample fluctuations.
have no effect on the scores. In the case of the SON-R 2½-7, this condition is not fulfilled for
two reasons. First, the entry and discontinuation rules mean that scores on some items determine whether other items are or are not administered. The latter items are, however, scored as
correct or incorrect. When item scores become interdependent in this way, reliability is
inflated. In the case of the SON-R 5½-17, where this was investigated, the mean overestimation
of the reliability of the subtests as a result of the adaptive procedure was .11 (Snijders, Tellegen
& Laros, 1989, pp. 46-51). The item scores are not independent for a second reason. After every
item that a child cannot solve independently, extensive help and feedback are given. This often
leads to the next, more difficult item being solved correctly. These inconsistencies, which have
a valid cause, lead to an underestimation of reliability.
The net effect of the underestimation of reliability (as a result of valid inconsistencies) on
the one hand, and the overestimation of reliability (as a result of artificial consistencies) on the
other hand, cannot be determined. Therefore, the reliability of the subtests of the SON-R
2½-7 was based on the formulas for internal consistency and no correction for under- or overestimation was applied. The uncertainty about the correctness of the estimate of reliability is a
reason to be reticent about the individual interpretation of results on the subtest level. It was also
the reason why the standardized subtest scores were not presented, as was done with the
SON-R 5½-17, in such a way that the reliability was taken into account in the score.
Table 5.5
Reliability, Standard Error of Measurement and Generalizability of the Test Scores
Reliability
Age     Pat   Mos   Puz   Sit   Cat   Ana   Mean     PS    RS    IQ
2;6     .79   .41   .45   .79   .81   .75   .67     .68   .89   .86
3;6     .73   .76   .75   .66   .73   .73   .73     .86   .84   .90
4;6     .72   .77   .75   .62   .70   .74   .72     .88   .81   .90
5;6     .74   .74   .70   .62   .68   .78   .71     .87   .81   .90
6;6     .76   .78   .69   .66   .68   .83   .73     .87   .84   .91
7;6     .79   .84   .69   .69   .69   .85   .76     .88   .86   .92
Mean    .75   .73   .69   .67   .71   .78   .72     .85   .84   .90
Standard Error of Measurement
Age     Pat   Mos   Puz   Sit   Cat   Ana            PS    RS    IQ
2;6     1.4   2.3   2.2   1.4   1.3   1.5           8.5   5.0   5.6
3;6     1.6   1.5   1.5   1.7   1.6   1.6           5.6   6.1   4.7
4;6     1.6   1.5   1.5   1.9   1.7   1.5           5.3   6.6   4.7
5;6     1.5   1.5   1.6   1.8   1.7   1.4           5.4   6.5   4.7
6;6     1.5   1.4   1.7   1.8   1.7   1.2           5.4   6.0   4.5
7;6     1.4   1.2   1.7   1.7   1.7   1.2           5.2   5.5   4.2
Generalizability
Age      PS    RS    IQ          Age      PS     RS     IQ
2;6     .45   .74   .71          2;6    11.1    7.7    8.0
3;6     .67   .66   .77          3;6     8.7    8.8    7.1
4;6     .77   .57   .78          4;6     7.3    9.8    7.1
5;6     .78   .56   .78          5;6     7.0    9.9    7.0
6;6     .75   .63   .80          6;6     7.5    9.1    6.7
7;6     .71   .71   .82          7;6     8.1    8.1    6.4
Mean    .69   .64   .78
The calculated values of lambda2 have been fitted in the standardization model as a function of
age. The results for a number of ages are presented in table 5.5. The mean reliability of the
subtests is .72; it increases, though not regularly, with age. Very low reliabilities were found for
Mosaics and Puzzles at the age of 2;6 years. A learning effect may occur with these subtests at
a young age when help is offered, and this may result in an underestimation of reliability.
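As an illustration of an internal-consistency coefficient of this kind, the following is a minimal sketch of Guttman's lambda-2, on the assumption that this is the coefficient meant by lambda2. The function name and the small 0/1 score matrix are hypothetical and are not data from the standardization research.

```python
import math

def guttman_lambda2(scores):
    """Guttman's lambda-2 for a list of rows (one row of item scores per child)."""
    n_children = len(scores)
    n_items = len(scores[0])
    means = [sum(row[j] for row in scores) / n_children for j in range(n_items)]
    # Item covariance matrix; the divisor cancels out in the final ratio.
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in scores) / n_children
            for j in range(n_items)] for i in range(n_items)]
    total_var = sum(cov[i][j] for i in range(n_items) for j in range(n_items))
    item_var = sum(cov[i][i] for i in range(n_items))
    off_diag_sq = sum(cov[i][j] ** 2
                      for i in range(n_items) for j in range(n_items) if i != j)
    lambda1 = 1.0 - item_var / total_var
    return lambda1 + math.sqrt(n_items / (n_items - 1) * off_diag_sq) / total_var

# Hypothetical item scores (1 = correct, 0 = incorrect) for six children on four items.
example_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]
print(round(guttman_lambda2(example_scores), 2))
```

The sketch treats every item as administered to every child; it does not model the entry and discontinuation rules or the feedback that make the item scores of the test interdependent.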
In the second part of table 5.5 the standard errors of measurement are presented. The standard error of measurement is the standard deviation of the standardized scores that would be
received by an individual child, if the subtest could be administered to him or her many times. It
indicates how strongly the test results of a child can fluctuate. Section 13.4 describes how to use
the standard error of measurement to test the differences between scores statistically.
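The relation between reliability and the standard error of measurement can be sketched as follows. This assumes the classical formula SD x sqrt(1 - reliability) and a standard deviation of 3 for the standardized subtest scores and 15 for the IQ score; the function name is illustrative. Under these assumptions the values agree with those reported in table 5.5 for Mosaics at 2;6 years and for the IQ score at 3;6 years.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

# Mosaics at 2;6 years: reliability .41, subtest scores assumed to have SD 3.
sem_mosaics = standard_error_of_measurement(3, 0.41)
# IQ score at 3;6 years: reliability .90, IQ scale with SD 15.
sem_iq = standard_error_of_measurement(15, 0.90)

print(round(sem_mosaics, 1), round(sem_iq, 1))
```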
estimation was used to construct the interval in which the domain score will, with a certain
probability, be found. This interval is not situated symmetrically around the given score. When
the point of departure is the distribution of the scores in the norm population, the middle of the
interval equals 100 + α(IQ - 100), where α is the generalizability coefficient. The standard error of estimation equals 15√(α(1 - α)). In the
norm tables, this interval is presented for each IQ score with a latitude of 1.28 times the standard
error of estimation. This means that the probability that the domain score is in the interval is
80%. When using the computer program, these intervals are also presented for the Performance
Scale and the Reasoning Scale.
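The construction of such an interval can be sketched as follows. The sketch assumes the usual regression-based form, with the midpoint at 100 + alpha x (IQ - 100) and a standard error of estimation of 15 x sqrt(alpha x (1 - alpha)), where alpha is the generalizability coefficient. The function name and the example values (IQ 120, alpha .78) are illustrative; the published norm tables may round differently.

```python
import math

def domain_score_interval(iq, alpha, z=1.28, mean=100.0, sd=15.0):
    """80% interval (z = 1.28) for the domain score, given an observed IQ."""
    midpoint = mean + alpha * (iq - mean)           # regressed towards the mean
    se_estimation = sd * math.sqrt(alpha * (1.0 - alpha))
    return midpoint - z * se_estimation, midpoint + z * se_estimation

low, high = domain_score_interval(120, 0.78)
print(round(low, 1), round(high, 1))
```

Because the midpoint is regressed towards 100, the interval for an IQ of 120 is centred below 120, which is why it is not symmetrical around the observed score.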
For individual assessments, the interval gives a good indication of the accuracy with which a
statement, based on the test results, can be made about the level of intelligence. The interval is
broader than the intervals that are based, as is customary, on the reliability of the test. When
interpreting the results of an intelligence test, one will, in general, not want to limit oneself to
the specific abilities included in the test. The interval, based on generalizability, takes into
account the facts that the number of items per subtest is necessarily limited, and that the choice
of the subtests also denotes a limitation.
Given the problems in correctly determining the reliability of the subtests of the SON-R
2½-7, it is fortunate that the calculation of the generalizability of the total scores depends
exclusively on the number of subtests and the strength of the correlations between the subtests,
and not on the reliability of the subtests.
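A coefficient that depends only on the number of subtests and on the strength of their intercorrelations can be illustrated with the standardized form of coefficient alpha. Whether the generalizability coefficients in this chapter were computed with exactly this formula is an assumption, and the mean correlation of .37 is an illustrative value, not a figure quoted from the tables.

```python
def standardized_alpha(n_subtests, mean_correlation):
    """Standardized coefficient alpha from the mean inter-subtest correlation."""
    return (n_subtests * mean_correlation
            / (1.0 + (n_subtests - 1) * mean_correlation))

# Six subtests (total score) versus three subtests (one scale),
# with the same illustrative mean correlation between the subtests.
alpha_total = standardized_alpha(6, 0.37)
alpha_scale = standardized_alpha(3, 0.37)
print(round(alpha_total, 2), round(alpha_scale, 2))
```

With the same mean correlation, a total score based on six subtests generalizes better than a scale based on three, which is the pattern shown for the IQ score and the two scales in table 5.5.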
Comparison with the Preschool SON and the SON-R 5½-17
The reliability and generalizability of the IQ score of the SON-R 2½-7 were compared with the
previous version of the test, the Preschool SON, and with the revision of the SON for older
children, the SON-R 5½-17. In the manual for the Preschool SON, reliabilities based on calculations over combined age groups were presented. The combination of age groups leads to a
high overestimation of reliability. Therefore, new calculations were carried out on the original
normalization material, and the reliability and the generalizability for homogeneous age groups
were determined (Tellegen et al., 1992).
The reliability and the generalizability of the SON-R 2½-7 were greatly improved with
respect to the Preschool SON. This is especially so for the more extreme age groups. However,
an improvement can also be seen for the four-year-olds, for whom the reliability and generalizability of the old Preschool SON were highest (table 5.6).
Table 5.6
Reliability and Generalizability of the IQ Score of the Preschool SON, the SON-R 2½-7 and the
SON-R 5½-17
              Reliability                            Generalizability
Age           P-SON   SON-R 2½-7   SON-R 5½-17       P-SON   SON-R 2½-7   SON-R 5½-17
2;6 years      .78       .86                          .54       .71
3;6 years                .90                          .69       .77
4;6 years      .86       .90                          .74       .78
5;6 years      .82       .90          .90             .71       .78          .79
6;6 years                .91          .92             .62       .80          .81
7;6 years                .92          .93             .52       .82          .83
In comparison with the SON-R 5½-17, the results of similar age groups for reliability and
generalizability are practically the same. However, for the total age range of the SON-R
5½-17, the mean reliability (.93) and the generalizability (.85) are higher than for the
SON-R 2½-7.
.50
.39
.35
.35
.34
Mos Puz
.45
.36
.36
.37
.34
.30
.28
Sit
.39
.31
Cat
.39
Ana
Pat
Mos
Puz
Sit
Cat
Ana
Pat
Mos
Puz
Sit
Cat
Ana
Pat
Mos Puz
Sit
Cat
Ana
.36
.24
.33
.32
.28
.39
.30
.39
.31
.31
.31
.22
.51
.29
.45
Pat
Mos Puz
Sit
Cat
Ana
.56
.44
.41
.37
.43
.47
.48
.33
.39
.33
.36
.41
Pat
Mos Puz
.60
.50
.33
.36
.32
.49
.34
.34
.40
.32
.32
.28
Sit
.33
.28
Cat
.33
Ana
Pat
Mos
Puz
Sit
Cat
Ana
.38
.26
.35
Table 5.8
Correlations of the Subtests with the Rest Total Score and the Square of the Multiple
Correlations
Correlation with Rest Total
        2-7 years    2-3    4-5    6-7
Pat        .56       .44    .61    .63
Mos        .59       .52    .63    .63
Puz        .50       .42    .55    .53
Sit        .49       .51    .45    .54
Cat        .51       .59    .47    .46
Ana        .47       .45    .45    .54
Square of the Multiple Correlation
        2-7 years    2-3    4-5    6-7
Pat        .33       .20    .43    .41
Mos        .37       .28    .45    .43
Puz        .27       .20    .33    .30
Sit        .25       .31    .20    .30
Cat        .27       .40    .23    .24
Ana        .24       .24    .22    .30
To determine how important the differences in loadings between the three age groups were, a
Simultaneous Components Analysis was carried out on these data sets (Millsap & Meredith,
1988; Kiers & Ten Berge, 1989). This was done to examine whether a uniform solution of
component weights explained (substantially) less of the variance than the solutions that were
optimal for the separate age groups. The analysis with the SCA program (Kiers, 1990) showed
that this was not the case: the uniform solution over the three age groups explained 61.1% of the
variance and the separate optimal solutions explained 61.4% of the variance. Also important
was the fact that the simple weights, being 1 or 0 (depending on the scale to which the subtest
belongs), were almost as effective as the optimal uniform solution. Using simple weights, as is
done in the construction of the Performance Scale and the Reasoning Scale, the percentage of
explained variance was 60.8%.
Table 5.9
Results of the Principal Components Analysis in the Various Age and Research Groups
Eigenvalue and Percentage of the Explained Variance by the first two Main Components
              Eigenvalue       Explained Variance
              F1      F2       F1      F2
2-7 years     2.8     .8       47%     13%
2-3 years     2.7     .8       45%     14%
4-5 years     2.9     .8       48%     14%
6-7 years     3.0     .8       50%     14%
Loadings on the first two rotated components
        2-7 years      2-3 years      4-5 years      6-7 years
        F1    F2       F1    F2       F1    F2       F1    F2
Pat     .72   .29      .44   .43      .82   .23      .69   .37
Mos     .75   .29      .72   .30      .78   .31      .79   .23
Puz     .80   .12      .85   .07      .78   .19      .79   .07
Sit     .35   .59      .30   .65      .22   .68      .65   .29
Cat     .17   .80      .25   .78      .20   .74      .13   .88
Ana     .18   .75      .05   .78      .21   .70      .35   .70

        Boys           Girls          low SES        high SES
        F1    F2       F1    F2       F1    F2       F1    F2
Pat     .84   .13      .71   .31      .81   .15      .62   .39
Mos     .79   .28      .72   .33      .76   .24      .70   .31
Puz     .59   .38      .84   .07      .76   .10      .84   .08
Sit     .24   .70      .39   .55      .36   .59      .23   .71
Cat     .16   .79      .16   .81      .00   .88      .12   .84
Ana     .25   .66      .17   .75      .46   .46      .33   .64

        Immigrant      Tested outside     Gen./Perv.       Speech/language/
                       the Netherlands    Dev. Disorder    Hearing Disorder
        F1    F2       F1    F2           F1    F2         F1    F2
Pat     .90   .03      .85   .28          .80   .37        .78   .28
Mos     .71   .39      .82   .36          .80   .33        .79   .26
Puz     .52   .34      .75   .39          .82   .24        .82   .18
Sit     .10   .87      .30   .78          .42   .66        .57   .32
Cat     .22   .72      .41   .71          .29   .80        .28   .79
Ana     .38   .60      .28   .80          .23   .78        .24   .83
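The link between the eigenvalues and the percentages of explained variance in table 5.9 can be sketched as follows. The sketch assumes the usual situation in a principal components analysis of a correlation matrix, where each of the six standardized subtests contributes one unit of variance, so that a component explains eigenvalue / 6 of the total. The function name is illustrative.

```python
def explained_variance_pct(eigenvalue, n_variables=6):
    """Percentage of the total variance explained by one component."""
    return 100.0 * eigenvalue / n_variables

# First and second component for the total group (2-7 years): eigenvalues 2.8 and .8.
pct_first = explained_variance_pct(2.8)
pct_second = explained_variance_pct(0.8)
print(round(pct_first), round(pct_second))
```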
In the second part of table 5.9, the loadings on the first two components are shown for different
samples of the norm group. These are the boys (N=561), the girls (N=563), and the children
whose parents had either a low (N=233) or a high SES level (N=202). The SES level and its
correlation with the test performances are described in section 6.6. In the four groups the
loadings of the subtests are consistent with a distinction between performance and reasoning
tests, with one exception: the loading of Analogies was the same for both components for the
children with a low SES level.
The last part of table 5.9 presents the component loadings for a number of groups who were
not, or only partially, tested in the context of the standardization research. The first group
consisted of immigrant children (N=118). These were children who lived in the Netherlands and
whose parents were both born abroad. About two thirds of this group was tested in the context of
the standardization research. The remaining one third was tested at primary schools in the
context of the validation research (see chapter 8). The second group consisted of children who
were tested in other countries (N=440). The research was conducted in Australia, the United
States of America and Great Britain, mainly with children without specific problems or handicaps, although some children with impaired hearing, bilingual children and children with a
learning handicap were included (see sections 9.5, 9.6 and 9.7). The third and fourth groups
consisted of children with specific problems and handicaps, who were examined in the Netherlands in the context of the validation of the test (see chapter 7). The third group consisted of
children with a general developmental delay and children with a pervasive developmental
disorder (N=328). The fourth group consisted of children with a language/speech disorder,
children with impaired hearing and deaf children (N=346). In these four groups, with one exception, the
loadings on the first two rotated components corresponded to the distinction between performance and reasoning tests. In the group of children with language/speech disorders and/or impaired
hearing, the subtest Situations had its highest loading on the first (performance) component.
The distinction made by the SON-R 2½-7 between the Performance Scale and the Reasoning
Scale, is supported to a large extent by these results in very different groups. Though the
reliability of the difference between scores on the two scales is moderate, this distinction is the
most relevant one for the intra-individual interpretation of the test results. The empirical validity
of the distinction between the Performance Scale and the Reasoning Scale will be discussed in
section 9.9.
Table 5.10
Test-Retest Results with the SON-R 2½-7 (N=141)
               r       Admin. I          Admin. II         Difference
                       Mean   (SD)       Mean   (SD)
Patterns      .56      10.2   (3.0)      10.8   (2.6)         .6
Mosaics       .64      10.6   (2.8)      11.6   (3.1)        1.0
Puzzles       .60      10.2   (2.9)      11.4   (2.8)        1.1
Situations    .48      10.4   (2.5)      11.6   (3.1)        1.2
Categories    .64      10.5   (2.8)      11.2   (2.9)         .7
Analogies     .49      10.6   (2.9)      11.1   (3.0)         .5
SON-PS        .74     102.5  (14.3)     107.9  (14.3)        5.5
SON-RS        .69     103.5  (13.7)     108.7  (15.2)        5.2
SON-IQ        .79     103.4  (13.7)     109.4  (14.7)        6.0
1.2 (Situations). The scores on the Performance Scale and the Reasoning Scale increased by
more than 5 points. The IQ score increased, on average, by 6 points. All differences in mean
scores were significant at the 1% level, except for the subtests Patterns and Analogies.
A distinction was made between the children who were younger than 4;6 years (mean age
3;4 years; N=67) at the first administration, and children who were older (mean age 5;7 years;
N=74). In the younger group the test-retest correlation for the IQ was .78, in the older group .81.
The correlation for the Reasoning Scale decreased slightly with age (from .71 to .69). For the
Performance Scale it increased clearly (from .65 to .80). The increase in the mean IQ in both
groups was practically equal.
Profile analysis
A profile analysis was carried out to determine the meaning of the intra-individual differences
between the subtest scores of a single subject. One of the characteristics of the profile is the
dispersion of the scores. This was calculated as the standard deviation of the six scores (the
square root of the mean square of the deviations of the six subtests from the individual mean). In
the entire norm group the mean of the dispersion was 2.0. For 24% of the children, the intra-individual dispersion was 2.5 or higher, and for 9% the dispersion was 3.0 or higher.
The mean individual dispersion for the 141 children who were tested twice with the SON-R
2½-7 was 2.0 on both occasions. Remarkably, the correlation between the dispersion on the first
and second administration was weak (.17) and not significant.
Another important characteristic of the profile is the relative position of the subtest scores.
To determine whether this was stable, the six subtest scores from the first administration were
correlated, for each child, with the six scores of the second administration. The mean correlation
was .32. The strength of the correlation depends very much on the dispersion of the scores on
the first administration. Clearly, if the differences are small, they are determined largely by
errors of measurement and are therefore unstable. Where the dispersion on the first administration was less than 2.0 (N=69), the mean correlation was .22; where the dispersion was 2.0 to 3.0,
the mean correlation was .38, and for the twelve children who had a dispersion of 3.0 or more,
the mean correlation was .61. This indicates that the differences between the subtest scores must
be substantial before we can conclude that they will remain stable over a period of some months.
When using the computer program, the dispersion is calculated and printed.
The difference between the scores on the Performance Scale and the Reasoning Scale in the
first administration correlated .46 with the difference between the two scores in the second
administration. For the children younger than 4;6 years, the correlation was .43 and for the older
children it was .50.
Table 5.11
Examples of Test Scores from Repeated Test Administrations (I and II)
               Example A      Example B      Example C      Example D
               I      II      I      II      I      II      I      II
SON-IQ         97    108     109    116     106    110     121    120
SON-PS        100    105     108    110     100     94     122    123
SON-RS         93    113     107    118     113    126     116    113
Patterns       11     14      12     10      11      9      14     12
Mosaics         8     11      13     17       9      9       9     12
Puzzles        11      7       9      8      10      9      18     17
Situations      9     14      13     13       8     13      10     12
Categories      9     12      12     11      14     16      15     13
Analogies       9     10       8     14      14     13      12     11
Dispersion    1.1    2.4     2.0    2.9     2.3    2.7     3.1    2.0
Correlation      -.18           .32            .56            .78
As an example, the scores of a few children on the two administrations are presented in table
5.11. The dispersion and the correlation between the six scores are also shown. The examples
illustrate that important changes can take place in the intra-individual order of the subtest
scores.
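The two profile characteristics described above, the dispersion and the correlation between the six scores of the two administrations, can be sketched as follows. The subtest scores are those of Examples A and B in table 5.11, in the order Patterns, Mosaics, Puzzles, Situations, Categories, Analogies; the assignment of the scores to the two administrations is read from the flattened table and should be treated as an assumption. The function names are illustrative.

```python
import math

def dispersion(scores):
    """Square root of the mean squared deviation from the individual mean."""
    mean = sum(scores) / len(scores)
    return math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))

def profile_correlation(first, second):
    """Pearson correlation between the subtest scores of two administrations."""
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cross = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    ss_1 = sum((a - mean_1) ** 2 for a in first)
    ss_2 = sum((b - mean_2) ** 2 for b in second)
    return cross / math.sqrt(ss_1 * ss_2)

example_a_first = [11, 8, 11, 9, 9, 9]
example_a_second = [14, 11, 7, 14, 12, 10]
example_b_first = [12, 13, 9, 13, 12, 8]
example_b_second = [10, 17, 8, 13, 11, 14]

print(round(dispersion(example_a_first), 1), round(dispersion(example_a_second), 1))
print(round(profile_correlation(example_b_first, example_b_second), 2))
```

Note that the dispersion divides by the number of subtests (six) rather than by five, in line with the definition given in the text.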
In this chapter the relationship is discussed between test performance and a number of variables
that are important in order to judge the validity of the test. The analyses are based on the results
of the standardization research. Other tests were also administered to a large number of the
children in order to validate the SON-R 2½-7. The results are described in chapter 9. A comparison is made in section 9.11 between the SON-R 2½-7 and other tests, with respect to their
relationship with a number of variables that are discussed in this chapter, i.e. SES index,
parents' country of birth, evaluation by the examiner, and the school's evaluation of language
skills and intelligence.
Duration        2-7 years    2 yrs    3-7 yrs
- 40 min           16%        49%        9%
41 - 50 min        32%        30%       32%
51 - 60 min        32%        17%       36%
61 - 70 min        14%         3%       16%
> 70 min            6%         1%        7%

Subtest         Mean    (SD)
Patterns         7.0    (3.0)
Mosaics         10.3    (3.9)
Puzzles          8.5    (3.1)
Situations       6.3    (2.3)
Categories       8.4    (3.3)
Analogies        8.8    (2.8)
Total           49.2   (10.7)
The duration of the administration of the separate subtests was known for 1014 children (table
6.1). Situations had the shortest duration of administration with a mean of 6.3 minutes and also
the narrowest dispersion in duration. Mosaics had the longest duration of administration with a
mean of 10.3 minutes, and the widest dispersion.
The duration of administration was also recorded for children who participated in other
validation research projects (see chapter 7). The mean duration (including short breaks) for these
children, who had varying problems and handicaps in cognitive development and communication,
was 57 minutes. This was 5 minutes longer than for the children in the standardization research.
The duration of administration was relatively short for children with a general developmental
delay (a mean of 53 minutes) and relatively long for deaf children (a mean of 66 minutes).
dev.
Day of Week
108
231
240
139
162
115
70
1.9
.3
.4
1.9
.6
2.4
1.2
Monday
Tuesday
Wednesday
Thursday
Friday
dev.
198
287
162
262
156
.8
.1
.5
1.2
.3
Period
I
II
III
IV
dev.
305
302
178
280
.6
.2
.6
.8
Table 6.3
Examiner Effects (N=1073)
Mean Scores as Deviation from the Total Mean
Examiner      N     dev.
A           104     4.81
B            98     3.14
C           115     2.58
D            50     2.45
E            92     0.21
F            61     0.20
G            97     1.24
H           110     1.76
I           123     2.46
J           115     2.78
K           108     2.99

Score         eta    beta
Patterns      .15    .15
Mosaics       .22    .20
Puzzles       .14    .15
Situations    .13    .12
Categories    .21    .19
Analogies     .16    .17
SON-PS        .17    .16
SON-RS        .18    .18
SON-IQ        .18    .18
The strength of the examiner influence showed no clear relation to age; the beta for the two- and
three-year-olds was .23, the beta for the four- and five-year-olds was .18 and the beta coefficient
for the six- and seven-year-olds was .28.
The mean IQ score of the children who were tested by three male examiners was 2.2 points
lower than the mean IQ score of the children who were tested by the female examiners. The p-value of the difference was .02. There was no interaction effect between the sex of the child and
the sex of the examiner. The number of male examiners was too small to assess whether the
difference between the male and the female examiners was based on their sex or whether this
was caused by personal characteristics unrelated to their sex.
With the exception of Situations, the examiner influence was significant in the various
subtests. The influence was greatest for Mosaics and Categories, the tests that are administered first. This may indicate that the differences between the examiners were related to the
manner in which the child was put at ease and motivated at the beginning of the test
administration.
Table 6.4
Regional and Local Differences (N=1102)
Deviations of the IQ Scores in relation to the Total Mean
I: without controlling for other variables
II: after controlling for other variables
Region                       N      I     II
North/East                 342    1.0     .7
South                      212    2.3    1.5
West                       548     .2     .1

Community Size (x 1000)      N      I     II
< 20                       375     .4     .4
<100                       489     .8     .3
>100                       238    2.1    1.4

Degree of Urbanization       N      I     II
Rural Community            164     .4     .5
Urbanized Rural Comm.      250     .6     .0
Commuter Community         183     .7     .6
Urban Community            505     .1     .1
exception was that with the SON-R 5½-17, relatively high performances were found for children in commuter communities, both before and after controlling for other variables.
Score          Boys              Girls             Difference     t
               Mean   (SD)       Mean   (SD)
Patterns        9.7   (2.8)      10.3   (3.0)         0.5        3.06 *
Mosaics         9.9   (3.0)      10.1   (3.0)         0.2        1.08
Puzzles        10.0   (3.0)      10.0   (3.0)         0.0        -.04
Situations      9.7   (2.9)      10.2   (2.9)         0.5        2.83 *
Categories      9.7   (2.9)      10.4   (3.0)         0.7        3.82 *
Analogies       9.5   (2.9)      10.4   (2.8)         0.9        5.07 *
SON-PS         99.3  (14.9)     101.0  (15.2)         1.7        1.86
SON-RS         97.6  (14.8)     102.2  (14.9)         4.6        5.19 *
SON-IQ         98.4  (14.8)     101.9  (14.9)         3.5        3.94 *
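The size of a test statistic for the difference between the mean IQ of the boys (98.4, SD 14.8) and the girls (101.9, SD 14.9) can be approximated from these summary statistics. The following is a minimal sketch under two assumptions: the group sizes are those reported for the norm group in chapter 5 (561 boys and 563 girls), and the standard error is computed without pooling the variances. Which test was actually used is not stated here, and rounding of the published means makes an exact match unlikely.

```python
import math

def two_sample_t(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """t statistic for the difference between two independent group means."""
    standard_error = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / standard_error

# Girls versus boys on the SON-IQ (group sizes assumed from the norm group).
t_iq = two_sample_t(101.9, 14.9, 563, 98.4, 14.8, 561)
print(round(t_iq, 2))
```

The result is close to 4, the same order of magnitude as the value of 3.94 listed with the IQ difference.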
On the basis of other research data, the decrease or disappearance of the difference between
boys and girls with age is plausible. No sex difference was found during the standardization
research of the SON-R 5½-17 during which 1350 children from 6;6 to 14;6 years were tested.
The mean IQ score of the boys was 100.1 and of the girls 100.0. During the American standardization research of the K-ABC (Kaufman & Kaufman, 1983), a positive difference was found of
4.4 points on the total score for girls in the age group from 2½ to 5 years, whereas this difference
was only .2 in the age group from 5 to 12½ years. During the standardization of the GOS
2½-4½, the Dutch version of the K-ABC for young children, the total score for the girls proved
to be 4.7 points higher than the total score for the boys (Neutel, Van der Meulen & Lutje
Spelberg, 1996). The results with regard to sex-related differences found in the SON-R tests and
the K-ABC are thus very similar. In the case of adolescents and adults, however, males appear to
perform better on intelligence tests (Lynn, 1994).
Occupational Level                      Father                      Mother
                                        Pct    Mean   (SD)          Pct    Mean   (SD)
0  (Un)Skilled Worker/Housewife         33%    96.1  (14.3)         39%    96.1  (14.8)
1  Lower Empl/Sm. Ind. Business         32%    99.6  (14.2)         44%   101.5  (14.0)
2  Intermediate Employee                19%   102.6  (14.8)         14%   106.3  (15.1)
3  Professional                         16%   108.4  (14.8)          3%   110.4  (14.7)

Educational Level                       Father                      Mother
                                        Pct    Mean   (SD)          Pct    Mean   (SD)
0  Primary Education                     7%    92.9  (12.5)          6%    92.4  (12.3)
1  General Secondary Education          38%    96.9  (14.7)         43%    96.6  (14.3)
2  Higher Gen.Secondary Education       29%   101.2  (13.4)         34%   102.7  (14.2)
3  Tertiary: Non-University             19%   104.7  (14.9)         14%   107.7  (14.5)
4  Tertiary: University                  7%   111.6  (15.0)          3%   111.5  (15.8)
tional level of both parents, and the mean IQ for each category, is presented for these children.
The table shows that the IQ score clearly increased with the occupational and educational level
of the father and the mother. The correlation with the occupational level of the father was .28
and with the occupational level of the mother .27. The correlation with the educational level of
the father was .31 and with the educational level of the mother .32. All these correlations were
significant at the 1% level.
2-3 years
(N=396)
4-5 years
(N=409)
6-7 years
(N=313)
Pct
Mean
(SD)
Mean
(SD)
Mean
(SD)
Mean
(SD)
21%
32%
29%
18%
92.6
98.4
102.8
107.9
(13.7)
(14.0)
(14.0)
(14.9)
93.6
98.9
101.6
105.5
(15.2)
(14.8)
(14.5)
(12.5)
92.6
98.0
102.1
108.1
(13.2)
(14.1)
(13.3)
(17.1)
92.0
98.4
105.7
111.1
(13.2)
(12.6)
(13.8)
(13.9)
Table 6.8
Relationship Between IQ and Country of Birth of the Parents (N=1116)
                      Both Parents            One Parent             Both Parents
                      Native Dutch            Foreign                Foreign
Country of Birth      N      Mean   (SD)      N     Mean   (SD)      N     Mean   (SD)
The Netherlands      969    100.7  (14.9)
Surinam/Antilles                              11    103.6  (12.8)    27     91.8  (13.5)
Turkey                                         2     79.5  (13.4)    18     94.1  (13.0)
Morocco                                        1     82              21     88.8  (10.7)
Other Western                                 27    103.0  (17.2)     3    107.3  (17.9)
Other Non-Western                             25    101.0  (14.5)    12     98.9  (16.9)
Total                969    100.7  (14.9)     66    101.3  (15.7)    81     93.2  (13.8)
group (one parent born abroad). The eight immigrant children who were later added to the norm
group are not included in this analysis. With reference to the country of birth of the immigrants,
a distinction was made between the three most important groups, i.e. Surinam or the Antilles,
Turkey and Morocco. The remaining countries were subdivided into Western (Europe, North
America, Australia) and non-Western countries.
The mean IQ score of the immigrant children was 93.2, 7.5 points lower than the mean IQ of
native Dutch children. The difference was significant at the 1% level. The mean IQ of the mixed
group, with one foreign parent, was slightly higher than that of the native Dutch children.
However, this difference was not significant. In the mixed and immigrant groups, the performances of the Turkish and Moroccan children were low; the performances of the Surinam and the
Antillean children were above average in the mixed group and low in the immigrant group. The
remaining Western children scored above average in both groups and the remaining non-Western children had an average score in both groups.
Table 6.9
Relationship Between the Evaluation by the Examiner and the IQ
Motivation
Poor
Mediocre/Varying
Good
Pct
(SD)
Pct
(SD)
Pct
90.3 (14.9)
98.2 (13.3)
101.4 (15.0)
1%
9%
89%
85.0 ( 4.2)
96.5 (13.6)
100.4 (15.4)
2%
98%
3%
24%
73%
Mean
Correlation
Concentration
Poor
Mediocre/Varying
Good
.15*
Pct
7%
32%
61%
Mean
Pct
89.3 (17.8)
97.0 (12.7)
103.2 (14.5)
1%
17%
82%
Poor
Mediocre/Varying
Good
Mean
3%
20%
77%
Mean
Pct
89.3 (14.7)
97.8 (13.4)
101.4 (14.9)
0.2%
7%
93%
Correlation
Pct
77.0 (11.9)
95.4 (12.4)
101.2 (15.4)
9%
91%
Mean
Pct
Poor
Mediocre/Varying
Good
7%
27%
66%
Correlation
Mean
Pct
Pct
87
93.9 (12.4)
100.3 (15.4)
0.3%
4%
96%
87.3 (11.7)
95.8 (14.1)
103.5 (14.1)
9%
91%
Mean
Mean
(SD)
112
89.5 ( 9.2)
100.7 (14.9)
.11
(SD)
Pct
91.4 (13.2)
100.7 (15.2)
3%
97%
.17*
(SD)
91.8 (16.5)
101.1 (14.5)
.11
(SD)
.33*
Mean
.18*
(SD)
.16*
Comprehension
of directions
106.9 (15.4)
100.1 (14.9)
.21*
(SD)
(SD)
.07
(SD)
.28*
Pct
Mean
.12*
(SD)
Correlation
Cooperation
Mean
Mean
(SD)
88.9 (13.9)
100.6 (14.8)
.14*
group. In this group the correlations of motivation and cooperation with intelligence were also
significant.
The four evaluations were also combined. Zero was the lowest possible combined score (all
four evaluations poor) and eight the highest possible combined score (all four evaluations
good). The combined score, which gives an indication of how well the child responds to being
tested, increased greatly with age until the age of four years. In the age groups of 2;3, 2;9, 3;3
and 3;9 the means were respectively 5.3, 6.3, 7.2, 7.5. From four years onwards the mean
gradually increased to 7.9 at the age of 7;3 years.
corresponds to first grade or form, and class four to second grade or form of primary schools).
At all schools, an evaluation was requested of motivation, concentration and work tempo of the
child, and on intelligence, motor development and language development. In classes 3 and 4 an
evaluation of the level of reading, writing and arithmetic was also requested. The evaluation was
given on a 5-point scale, ranging from low via average to high.
Table 6.10 presents the correlations between the schools' evaluations of these characteristics
and the Performance Scale, the Reasoning Scale and the SON-IQ. Correlations are presented for
the entire group (N=616), and for the pupils of classes 1 and 2 (N=344, mean age 5;2 years) and
the pupils of classes 3 and 4 (N=272, mean age 6;9 years) separately. All correlations were
significant at the 1% level with one-tailed testing.
In classes 1 and 2, the evaluations of intelligence, concentration and language development
had strong relationships with the IQ score (the correlations are .47, .47 and .44 respectively). The
evaluations of motivation and work tempo also had a reasonably strong correlation with the IQ.
The weakest correlation was found for the evaluation of motor development (r=.28). After a
stepwise regression analysis, the multiple correlation of the evaluations of intelligence, concentration and language development with the IQ score was .53. The correlations of the evaluations
with the Performance Scale were higher than with the Reasoning Scale, except for the evaluation of motor development where little difference was found.
In classes 3 and 4, the correlations of the IQ score with the evaluations of intelligence and
language development were slightly weaker than in classes 1 and 2 (.44 and .42 respectively).
The correlations with motivation, concentration and work tempo decreased more, as did the
correlation with the evaluation of motor development. In classes 3 and 4 an evaluation was also
given of the level of reading, writing and arithmetic. Of these, arithmetic had the highest
correlation with the IQ score (r=.36). After stepwise regression analysis, the multiple correlation of the evaluations of intelligence, language development and writing skills with the IQ
score was .48. In classes 3 and 4 the evaluation of writing skills clearly had a stronger correlation with the Performance Scale than with the Reasoning Scale; this was less so for arithmetic
and work tempo. The other evaluations had stronger correlations with the Reasoning Scale.
For all classes combined, the correlation between the SON-IQ and the evaluation of intelligence was .46; the correlations with language development (r=.44) and with the evaluation of
concentration by the teacher (r=.40) were also high.
In table 6.11 the correlations of the subtests with the teachers' evaluations are presented.
With the exception of the correlation between writing and Categories, all correlations were
significant at the 1% level. Of all the subtests, Mosaics had the strongest correlation with the
evaluation of intelligence (r=.38) and Situations the weakest (r=.24). Situations also had a weak
Table 6.10
Correlations of the Total Scores with the Evaluation by the Teacher

                        Classes 1 and 2     Classes 3 and 4     Classes 1-4
                        (N=344)             (N=272)             (N=616)
Evaluation              PS    RS    IQ      PS    RS    IQ      PS    RS    IQ
Motivation              .34   .24   .34     .23   .28   .28     .30   .27   .32
Concentration           .45   .37   .47     .27   .31   .32     .37   .35   .40
Tempo                   .33   .27   .34     .26   .23   .27     .30   .26   .31
Intelligence            .44   .37   .47     .37   .42   .44     .42   .40   .46
Motor Development       .25   .24   .28     .17   .18   .19     .22   .22   .24
Language Development    .41   .34   .44     .36   .39   .42     .40   .37   .44
Reading                 -     -     -       .26   .29   .31     -     -     -
Writing                 -     -     -       .31   .23   .30     -     -     -
Arithmetic              -     -     -       .34   .31   .36     -     -     -
Table 6.11
Correlations of the Subtest Scores with the Evaluation by the Teacher
Groups 1-4 (N=616)

Evaluation              Pat   Mos   Puz   Sit   Cat   Ana
Motivation              .25   .24   .25   .15   .21   .24
Concentration           .31   .30   .29   .24   .24   .30
Tempo                   .24   .27   .22   .14   .22   .22
Intelligence            .33   .38   .31   .24   .33   .33
Motor Development       .22   .14   .18   .12   .15   .21
Language Development    .33   .33   .32   .26   .29   .28

                        Pat   Mos   Puz   Sit   Cat   Ana
Reading                 .21   .17   .24   .18   .24   .24
Writing                 .28   .18   .28   .17   .13   .23
Arithmetic              .27   .33   .23   .19   .19   .30
correlation with the other evaluations. Patterns and Analogies had the strongest correlations
with the evaluation of motor development. The three performance subtests correlated most
highly with the evaluation of language development. Situations and Categories (multiple choice
tests) correlated less strongly with the evaluations of motivation, concentration and work tempo
than did the other subtests. Puzzles, Categories and Analogies had relatively strong correlations
with the evaluation of reading, Patterns and Puzzles with the evaluation of writing, and Patterns
and Analogies with the evaluation of arithmetic.
For the group as a whole, the evaluation of intelligence was low for six children, below
average for 63 children, average for 343 children, above average for 107 children, and high
for 33 children. The mean IQ scores were 74.2, 89.2, 97.7, 107.0 and 114.6 respectively. This
shows a difference of more than 40 IQ points between the children who were evaluated, more
than half a year after administration of the test, by the teacher as being either less or highly
intelligent.
When the fact is taken into account that the relationships examined here refer to subjective
evaluations by a large number of different teachers, and not to standardized measurements of
school achievement, the correlation between the evaluation of intelligence and the SON-IQ can
certainly be called good.
In practice, intelligence tests are administered mainly to children with a cognitive developmental delay and to children with specific handicaps. Many of these children have a handicap in
communicative skills such as language or speech, and/or hearing problems. With these children,
the use of a nonverbal intelligence test that does not depend on the use of language is a
prerequisite for an independent evaluation of their cognitive skills. In this chapter the results are
discussed of the research carried out with the SON-R 2½-7 on a number of groups of special
children. In chapter 9 the correlations between the SON-IQ and the scores on other tests
administered to these children will be discussed.
children and children who did not have multiple handicaps. The results of the pupils at one
institute for the deaf were not taken into account in the presentation because of a strong
examiner effect. At the four other institutes for the deaf 95 children were tested with the
SON-R 2½-7.
Autism teams
Three different autism teams tested 44 children who were diagnosed as autistic or as having a
developmental disorder related to autism. Autism teams are ambulatory institutions concerned with the diagnosis and guidance of children with these disorders. Autism and autism-related
disorders belong to the category of pervasive developmental disorders (APA, 1987).
General
developm.
delay
Pervasive
developm.
disorder
100
89
11
162
149
183
Language/
speech
disorder
Hearing
impaired
Deaf
13
21
116
44
90
95
44
1
44
63
27
2
92
674
238
90
179
73
94
Hearing impaired
This group consisted of 73 hearing-impaired children with a hearing loss of more than 30 dB
and less than 90 dB. The children were mainly pupils from the schools for children with
language, speech and hearing disorders, and the outpatients department for nose, throat and
ear surgery. Two pupils who had been tested at the Institute for the deaf were also included in
this group.
Deaf
The deaf children had a hearing loss of at least 90 dB. The group of 94 deaf children consisted
mainly of children who had been tested at the Institutes for the deaf. Two pupils from the
schools for children with language, speech and hearing disorders, with a hearing loss of more
than 90 dB were also included in this group.
                     General      Pervasive    Speech/      Hearing      Deaf
                     developm.    developm.    language     impaired
                     delay        disorder     disorder
Sex
  Boys               72%          79%          70%          63%          60%
  Girls              28%          21%          30%          37%          40%
Age
  Mean               5;2          5;6          5;1          5;3          5;3
  (SD)               (1;2)        (1;2)        (1;1)        (1;3)        (1;3)
  2 years            4%           1%           3%           1%           1%
  3 years            13%          11%          17%          18%          19%
  4 years            24%          21%          25%          25%          20%
  5 years            31%          27%          34%          21%          25%
  6 years            24%          30%          20%          29%          28%
  7 years            5%           10%          1%           7%           7%
SES Index
  Mean               3.1          5.0          3.9          4.4          5.2
  (SD)               (2.1)        (2.7)        (2.3)        (2.2)        (2.8)
  Unknown            8%           4%           17%          23%          7%
Country of birth
  Native Dutch       86%          88%          96%          95%          94%
  Mixed              8%           5%           3%           2%           6%
  Immigrant          6%           8%           1%           4%           0%
  Unknown            9%           1%           18%          23%          10%
Children with one parent who was born outside the Netherlands belong to the mixed category.
Boys were over-represented in all groups. This is the case, in particular, in the groups with a
general developmental delay, a pervasive developmental disorder and with speech or language
disorders. In these groups the percentage of boys varied between 70% and 79%. On a national
scale, the percentage of boys in the age range up to 8 years in special education is also twice as
high as the percentage of girls (CBS, 1993). In the groups of hearing-impaired and deaf children, the ratio of boys to girls was lower, with the percentage of boys approximately 60%.
The age distribution was very similar in the various groups. The mean age varied from 5;1 to
5;6 years. Most children were between 3 and 6 years old at the time of the test administration. A
small number of two-year-olds (most older than 2;6 years) and a small number of seven-year-olds (most younger than 7;6 years) were tested.
In the norm group the mean SES level, based on the educational and occupational level of the
parents, was 4.8 with a standard deviation of 2.6. The mean SES level of the children with a
pervasive developmental disorder and of the deaf children was slightly higher; for the hearing-impaired children it was slightly lower. The SES level of the speech or language disabled
children was clearly lower with a mean of 3.9, and the mean SES index of 3.1 of the children
with a general developmental delay was very low. In view of the relationship between the level
of intelligence and the SES level in the norm population, a developmental delay may be expected to occur more frequently in children with a low SES level.
The percentage of native Dutch children in the groups of children with a speech or language
disorder, of hearing-impaired children and of deaf children was relatively high. This was
because, in the group of deaf children, immigrant children did not meet the selection criteria,
and in the other two groups, the research was carried out in the North of the Netherlands where
relatively few immigrants live.
Table 7.3
Test Scores per Group
Mean and Standard Deviation

        General        Pervasive      Speech/        Hearing        Deaf
        developm.      developm.      language       impaired
        delay          disorder       disorder
        (N=238)        (N=90)         (N=179)        (N=73)         (N=94)
        Mean (SD)      Mean (SD)      Mean (SD)      Mean (SD)      Mean (SD)
Pat      6.3 (3.3)      5.6 (3.6)      7.7 (3.0)      8.2 (2.9)      9.9 (3.0)
Mos      6.6 (3.5)      7.1 (3.9)      8.3 (3.2)      8.6 (3.2)      9.9 (2.7)
Puz      8.0 (3.3)      7.5 (3.3)      8.6 (3.0)      9.3 (2.9)     10.3 (3.3)
Sit      7.8 (3.3)      7.0 (3.4)      8.6 (3.0)      9.5 (3.3)     10.5 (2.8)
Cat      7.2 (3.5)      6.5 (3.7)      7.8 (2.9)      8.9 (3.2)      8.4 (2.6)
Ana      7.6 (2.7)      7.8 (3.4)      8.6 (3.1)      9.1 (3.0)      9.2 (3.1)
PS      81.4 (17.8)    80.2 (19.1)    88.8 (15.8)    91.9 (15.5)   100.0 (15.3)
RS      83.2 (17.3)    80.9 (17.8)    88.8 (15.7)    94.4 (16.9)    95.9 (13.6)
IQ      80.3 (17.6)    78.3 (18.7)    87.5 (15.9)    92.2 (16.6)    97.9 (14.4)
1.0
.6
.8
.5
.1
.3
Pervasive
developm.
disorder
Speech/
language
disorder
Hearing
impaired
Deaf
.6
.0
.3
.4
.5
.4
.7
.4
.3
.6
.1
.2
.2
.2
.6
.8
1.3
.5
1.3
.2
.6
.0
.5
.9
Interval    Norm      General      Pervasive    Speech/      Hearing      Deaf
            group     developm.    developm.    language     impaired
                      delay        disorder     disorder
 50- 69      2%       28%          32%          12%           8%           1%
 70- 89     23%       40%          43%          46%          34%          32%
 90-110     49%       28%          19%          36%          48%          46%
111-130     24%        3%           6%           6%           8%          20%
131-150      2%        0%           0%           1%           1%           1%
The children with the diagnosis of autism had a lower IQ score (mean=73.3, N=38) than the
children with the diagnosis of autism-related disorder (mean=82.0, N=52; t[88]=2.23, p=.03).
The largest difference between the autistic children and the children with an autism-related
disorder was found in the subtests Categories and Situations. Apparently the autistic children
had difficulty completing reasoning tests that use concrete pictures and situations.
The mean IQ score of the children tested by the autism teams did not differ from the mean
scores of the children with a pervasive developmental disorder who were tested at other schools/
institutes.
Hearing-impaired children
The mean IQ score of hearing-impaired children was 92.2 with a standard deviation of 16.6. The
mean score on the Reasoning Scale (mean=94.4) was slightly higher than the score on the
Performance Scale (mean=91.9). However, the difference was not significant (t[72]=1.58,
p=.12). The differences between the mean scores on the subtests were also small.
No difference in IQ scores occurred between the children with a loss of hearing of 30-59 dB
(mean=92.5, N=36) and the children with a loss of hearing of 60-89 dB (mean=92.7, N=43).
Hardly any difference in mean IQ scores was found between the children who were tested at
the schools for children with speech, language and hearing disorders, and the children from the
outpatients department.
Deaf Children
The research with deaf children was restricted to native Dutch children who were not multiply
handicapped. A few children with one parent who was born outside the Netherlands were
included in the analysis. The mean IQ score of the deaf children was 97.9 with a standard
deviation of 14.4. As in the norm group, nearly half the children had an IQ score between 90 and
110. A clear difference was found between the scores on the Performance Scale (mean=100.0)
and on the Reasoning Scale (mean=95.9; t[93]=2.82, p=.01). Deaf children obtained the lowest
scores on the subtests Categories (mean=8.4) and Analogies (mean=9.2). The scores on the
other subtests deviated only slightly from the mean of 10 found in the norm group.
These results were very similar to those of the research carried out using the SON-R 5½-17
with the entire population of older deaf children (Snijders, Tellegen & Laros, 1989). The native
Dutch deaf children, who were not multiply handicapped (three quarters of the deaf population),
had a mean score on the SON-R 5½-17 of 97.0 and, as on the SON-R 2½-7, the lowest score was
on the subtests Categories and Analogies. In the research with the SON-R 5½-17, these abstract
reasoning tests also appeared to have the most substantial relationship with the STADO-R, a
written language test for deaf children (De Haan & Tellegen, 1986).
Figure 7.1
Distribution of the 80% Frequency Interval of the IQ Scores of the Various Groups
[Figure: one horizontal interval per group on an IQ axis running from 50 to 120. Groups, from top to bottom: Primary Education, Deaf, Hearing Impaired, Speech/language Disorder, Pervasive Developmental Disorder, General Developmental Delay]
ences between the groups. The children with a general developmental delay and the children
with pervasive developmental disorders had low performance levels. Deaf children were very
similar to children in primary education. The children with impaired hearing and the children
with a speech or language disorder took an intermediate position.
Besides these differences, the figure also shows a large overlap in the distributions of the
groups. The mean scores of the children with a developmental disorder or delay were low, but
in both groups a good 10% of the children had a score higher than 100, which is the mean of
the norm population. In contrast, 10% of the children in these groups had a score of 50 or
thereabouts, which means that they performed at such a low level that the test did not differentiate further.
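As a rough illustration of what an 80% frequency interval represents, the sketch below approximates the 10th and 90th percentiles from the group means and standard deviations in table 7.3. It assumes normally distributed scores and clips the result to the 50-150 range of the IQ scale; both are simplifying assumptions made here, and the intervals in the figure were presumably based on the observed score distributions. The function name is hypothetical.

```python
from statistics import NormalDist

# Range of the IQ scale, taken from the lowest and highest IQ intervals reported.
SCALE_MIN, SCALE_MAX = 50.0, 150.0


def interval_80(mean, sd):
    """Approximate the 10th and 90th percentile under a normal assumption."""
    dist = NormalDist(mu=mean, sigma=sd)
    low = dist.inv_cdf(0.10)
    high = dist.inv_cdf(0.90)
    # Scores outside the scale range cannot occur, so clip the bounds.
    return max(low, SCALE_MIN), min(high, SCALE_MAX)


# Mean IQ and standard deviation per group, as reported in table 7.3.
groups = {
    "General developmental delay": (80.3, 17.6),
    "Pervasive developmental disorder": (78.3, 18.7),
    "Speech/language disorder": (87.5, 15.9),
    "Hearing impaired": (92.2, 16.6),
    "Deaf": (97.9, 14.4),
}

for name, (mean, sd) in groups.items():
    low, high = interval_80(mean, sd)
    print(f"{name}: {low:.0f} to {high:.0f}")
```

Under this approximation the upper bound for the group with a general developmental delay falls slightly above 100, in line with the observation that a good 10% of these children scored above the mean of the norm population.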
In all the groups, children performed relatively poorly on the subtests Categories and (with
the exception of the deaf) Patterns. In all the groups, children performed relatively well on Puzzles,
Situations and (with the exception of the deaf) Analogies. The results on Mosaics varied (see
table 7.3).
When evaluating the differences between the groups, the manner in which the groups were
selected must be taken into account. Most of the children examined attended special schools and
institutes that had strict selection procedures for admittance. Children who had, for example, a
pervasive developmental disorder or impaired hearing, but who were in regular education
were strongly under-represented. In their case, a cognitive delay is less likely to occur. On the
other hand, autistic children in daycare centers for the mentally disabled were not included in
the research. The results are only representative for the children at the kinds of schools and
institutes listed above, and then only to a limited extent due to the small number of schools and
institutes involved. No statement can be made on the basis of this research about the intelligence of autistic children, or the intelligence of children with impaired hearing. Only in the
case of the deaf children was an effort made to obtain a representative picture of the intelligence
level of (native Dutch) deaf children who are not multiply handicapped.
Boys
Girls
Age
N
Dev
470
204
.2
.5
2-3 j.
4-5 j.
6-7 j.
SES Level
Country of birth
N Dev
Dev
Dev
121 1.3
354 .5
199 .2
Low
172
Below aver. 233
Above aver. 115
High
77
2.7
.5
3.3
2.6
.2
5.0
2.8
Table 7.5
Reasons for Referral of Children at Schools for Special Education and Medical Daycare Centers for Preschoolers (N=238), with mean IQ scores

                          Normal          Fairly           Very
                                          Unfavorable      Unfavorable
                          Pct   Mean      Pct   Mean       Pct   Mean
Home situation            29%   80.1      48%   80.6       23%   82.9

                          None            Light            Severe
                          Pct   Mean      Pct   Mean       Pct   Mean
Emotional problems        17%   79.9      59%   83.3       24%   75.9
Behavioral problems       14%   80.5      51%   81.5       35%   80.3
Communicative handicap    60%   83.7      30%   79.6       10%   67.7

                          Normal          Small            Large
                                          Delay            Delay
                          Pct   Mean      Pct   Mean       Pct   Mean
Motor development         40%   91.8      48%   73.4       12%   74.1
Language development      24%   93.6      44%   80.2       32%   72.2
Cognitive development     32%   95.6      48%   78.1       20%   64.3
handicaps was -.26. The correlation with both motor and language development was .46. The
SON-IQ correlated most strongly, .66, with the evaluation of cognitive development. The mean
IQ score of the children whose cognitive development had been evaluated as normal was 95.6,
whereas the mean IQ score of the children with a large delay was more than 30 points lower, i.e.,
64.3. With a stepwise multiple regression the correlation with the IQ increased slightly, from .66
to .67, when motor development was also taken into account.
The Performance Scale and the Reasoning Scale both correlated strongly with the evaluation
of cognitive development (.59 and .61). The Performance Scale had a stronger correlation with
the evaluation of motor development (r=.43) than with the evaluation of language development
(r=.40). The Reasoning Scale had a higher correlation with the evaluation of language development (r=.44) than with the evaluation of motor development (r=.39).
Table 7.6
Relationship between IQ and Evaluation by the Examiner
Motivation
General
developm.
delay
(N=238)
Pervasive
developm.
disorder
(N=90)
Speech/
language
disorder
(N=179)
Hearing
impaired
(N=73)
Deaf
(N=94)
Pct Mean
Pct Mean
Pct Mean
Pct Mean
Pct Mean
Poor
2%
Mediocre/Varying 33%
Good
65%
63.7
73.4
84.4
Correlation
.33*
.37*
.33*
.41*
Pct Mean
Pct Mean
Pct Mean
Pct Mean
Concentration
2%
29%
69%
Poor
6%
Mediocre/Varying 44%
Good
50%
62.9
78.1
84.1
Correlation
.28*
.39*
Pct Mean
Pct Mean
Cooperation
Poor
1%
Mediocre/Varying 24%
Good
75%
61.7
78.6
81.1
Correlation
.11
Comprehension
of directions
Pct Mean
Poor
3%
Mediocre/Varying 22%
Good
75%
55.9
75.0
82.9
Correlation
.31*
4%
43%
52%
54.5
69.7
82.8
8%
14%
78%
61.3
72.0
85.1
60.7
72.2
81.3
2%
22%
75%
5%
39%
56%
76.3
78.1
90.6
21%
79%
92.6
99.3
.19
Pct Mean
1% 68
26% 92.2
73% 100.3
.44*
.49*
.31*
Pct Mean
Pct Mean
Pct Mean
66.3
76.5
89.4
6%
38%
56%
71.5
84.2
96.7
74.0
84.7
99.1
2%
12%
86%
72.3
80.7
93.4
3%
30%
67%
1%
12%
86%
86
78.7
94.3
13%
87%
86.1
99.6
.32*
.32*
.28*
.32*
Pct Mean
Pct Mean
Pct Mean
Pct Mean
2%
28%
70%
50.0
71.3
82.0
.34*
3%
15%
82%
79.2
76.6
89.8
.28*
3%
20%
77%
63.5
84.9
95.2
.38*
1%
9%
90%
83
90.5
98.8
.19
Sixty-five percent of the entire group of 674 children were rated good on all four aspects, or on
three aspects, with the fourth rated as mediocre/varying. Eleven percent had a mean rating of
mediocre/varying, or lower.
In comparison to the standardization research, the evaluations of the children from these
special groups were most similar to the evaluations of children two and three years of age.
However, children from special groups received much higher ratings for comprehension of the
directions than did the two- and three-year-olds in the standardization research.
The ratings of motivation, cooperation, and comprehension of the directions correlated
significantly with the IQ score in most groups. The correlations were strongest in the group with
impaired hearing and for the evaluation of concentration. The correlations were substantially
stronger than in the norm group. The main cause for this is that a negative evaluation was more
frequently given in the special groups.
Distribution
Mean
SD
correlation
Speech/language disorder
Hearing impaired/deaf (N=241)
Intell.
2.4
(.9)
2.3
(.8)
2.9
(.9)
3.0
(.9)
2.9
(.7)
Intell.
3.2
(1.1)
3.5
(1.0)
Patterns
Mosaics
Puzzles
Situations
Categories
Analogies
.56
.60
.37
.50
.58
.43
.42
.41
.19
.39
.45
.27
.44
.36
.36
.25
.31
.36
.24
.15
.19
.19
.28
.15
.53
.51
.45
.37
.45
.31
.35
.27
.22
.28
.12
.10
.34
.21
.24
.20
.22
.17
.21
.21
.18
.10
.20
.06
SON-PS
SON-RS
.59
.64
.40
.47
.45
.38
.22
.26
.59
.50
.33
.22
.32
.26
.24
.15
SON-IQ
.68
.48
.46
.27
.61
.31
.32
.23
both IQ scores and the evaluation of intelligence was narrower, the correlation was also weaker,
i.e., .61.
In the group of children with a developmental delay, the correlation of the Reasoning Scale
with the evaluation of intelligence was higher than that of the Performance Scale. The correlations with the subtests Puzzles and Analogies were relatively weak. In the group of children
with language/speech/hearing disorders, the Performance Scale had the highest correlations
with the evaluation of intelligence; Situations and Analogies had the lowest correlations.
Patterns and Mosaics had strong correlations with the evaluation of intelligence in both groups.
Reasonably strong correlations with the evaluation of language development and fine motor
development were also found in both groups. Patterns had the strongest correlation with the
evaluation of motor skills. The Performance Scale correlated more strongly than the Reasoning
Scale with motor skills. The correlations between the test scores and the evaluation of the
communicative orientation of the child were positive but weak.
Using a stepwise multiple regression analysis, the extent of the influence of the other evaluations on the correlation between the evaluation of intelligence and the SON-IQ was examined.
In both groups the correlation increased when the evaluation of motor skills was included; in the
first group from .68 to .74, and in the second group from .61 to .65.
Sit
Cat
Pat
Mos
Ana
Pat
Mos
Puz
Sit
Cat
Ana
.71
.62
.53
.53
.48
.58
.49
.48
.48
.50
.45
.39
.56
.42
.47
Subt.
Rest
.75
.71
.66
.64
.63
.57
Pat
Mos
Puz
Sit
Cat
Ana
Pat
Mos
Puz
Sit
Cat
Ana
.63
.54
.40
.43
.43
.55
.37
.42
.42
.46
.40
.37
.37
.34
.46
Subt.
Rest
.66
.66
.63
.51
.55
.54
tion between performance and reasoning tests. However, the subtest Situations had its highest
loading on the first (performance) factor.
Individual profile
The intra-individual differences among the subtest scores of the children in the special groups
were not exceptionally large. In the standardization research the mean dispersion of the six
scores was 2.0 with a standard deviation of .7. The mean for children from the special groups
was 2.1 with a standard deviation of .7. The means varied from 1.9 for children with impaired
hearing to 2.2 for children with a pervasive developmental disorder.
IMMIGRANT CHILDREN
In this chapter a study is made of the test performances of children one or both of whose parents
were born outside the Netherlands. These children were tested in the standardization research
(N=147), or attended a preschool playgroup (N=8), or a primary school where complementary
research projects were carried out (N=54). Of these 209 children, 118 were immigrant children
(both parents were born outside the Netherlands) and the remaining 91 children belonged to the
mixed group (one parent born outside the Netherlands). In section 8.5 the results of the immigrant children will be compared with the results of 90 children participating in OPSTAP(JE), a
program to stimulate the development of immigrant children.
Score          Native Dutch     Mixed            Immigrant
               (N=969)          (N=91)           (N=118)
               Mean (SD)        Mean (SD)        Mean (SD)
Patterns        10.1 (2.9)       10.0 (2.6)        8.9 (2.8)
Mosaics         10.1 (3.0)        9.7 (2.9)        8.7 (3.1)
Puzzles         10.1 (3.0)       10.3 (3.0)        9.2 (2.4)
Situations      10.1 (2.9)        9.8 (2.9)        9.1 (2.8)
Categories      10.2 (2.9)       10.0 (3.1)        8.8 (3.0)
Analogies       10.0 (2.9)       10.5 (3.2)        9.3 (3.2)
SON-PS         100.7 (15.2)     100.2 (14.3)      93.4 (13.6)
SON-RS         100.4 (14.8)     100.5 (15.2)      93.8 (15.6)
SON-IQ         100.7 (14.9)     100.6 (15.2)      92.8 (14.4)
occurred between Mosaics (9.7) and Analogies (10.5) is noteworthy. The results show that the
lower performances of the immigrant children were not caused or worsened specifically by the
subtests Categories, Situations and Puzzles. These subtests use meaningful picture materials
and might therefore have a culture specific meaning. The mean score on these three subtests was
equal to the mean score on Patterns, Mosaics and Analogies. These last three subtests use non-meaningful picture materials such as geometrical forms. No differences were found between the
mean scores on the Performance Scale and the Reasoning Scale in the immigrant or in the mixed
group.
Mixed
(N=90)
Immigrant
(N=117)
Pct Mean (SD)
61% 90.3
22% 94.6
11% 99.1
6% 102.9
(13.5)
(12.3)
(17.8)
(16.6)
Table 8.3
Differentiation of Mean IQ Scores According to Country of Birth
Country of birth of parents
Country
One or both
abroad
One parent
abroad
Both parents
abroad
of birth
of child
land
Mean
Mean
Mean
Mean
Surinam
Antilles
Morocco
Turkey
Indonesia
Other Africa
Other Asia
Other S-America
Other Western
49
22
26
26
15
14
11
8
38
92.3
99.3
88.7
91.0
102.6
96.1
101.4
96.8
103.9
13
7
1
4
12
10
4
7
33
97.8
102.9
82
86.3
103.6
99.3
106.0
97.6
102.8
36
15
25
22
3
4
7
1
5
90.3
97.6
88.9
91.8
98.7
88.0
98.7
91
111.8
4
9
3
2
4
6
4
7
79.8
93.1
97.3
93.0
99.5
88.7
88.3
107.7
209
96.2
91
100.6
118
92.8
39
94.2
Total
were slight. Only a small group with one Turkish or Moroccan parent scored clearly below
average.
A total of 39 children were not born in the Netherlands. In the case of seven of these
children, both parents were born in the Netherlands; these children were presumably adopted. The mean IQ score of these seven children was 4 points lower than of the native Dutch
children. Six of the children in the mixed group were born outside the Netherlands. Their
mean IQ score was nearly 2 points higher than the score of the children in the mixed group
who were born in the Netherlands. Of the 107 children whose parents were both born
outside the Netherlands, and whose country of birth was known, 26 were also born outside
the Netherlands. Their mean IQ score was more than 1 point lower than the IQ score of
the immigrant children who were born in the Netherlands. This indicates that whether the
child was born in the Netherlands or in another country had little effect on the test performance.
means that their mean score on the LEM was approximately 6 points lower than the mean IQ
score of Turkish and Moroccan children on the SON-R 2½-7.
The conclusion on the basis of these comparisons is that immigrant children get better results
on the SON-R 2½-7 than on the RAKIT and the LEM. Comparisons of the SON-R 2½-7 and
another test (see section 9.11), administered to the same children, also indicate that the SON-R
2½-7 is much less dependent on culture-specific knowledge and skills.
Country of birth     OPSTAP(JE)              Comparison Group
of parents                                   Immigrant               Native Dutch
                     N     Mean (SD)         N     Mean (SD)         N     Mean (SD)
Surinam              33     98.4 (17.6)      36    90.3 (15.0)
Morocco              22    106.5 (11.3)      25    88.9 (10.1)
Turkey               35    104.7 (11.1)      22    91.8 (14.0)
The Netherlands                                                      969   100.7 (14.9)
Total                90    102.8 (14.2)      83    90.3 (13.3)       969   100.7 (14.9)
children from the comparison group. The difference according to country of birth was largest
for Moroccan children and least for Surinam children. A variance analysis carried out with
country of birth and participation in OPSTAP(JE) as factors, showed that neither the interaction
effect nor the main effect for country of birth was significant. However, the main effect for
participation in OPSTAP(JE) was highly significant (F[1,167]=33.77, p<.01).
The possibility exists that factors other than participation in OPSTAP(JE) contributed to
these differences, such as, for instance, the SES level of the parents. A selection effect may have
occurred in the decision for parents to participate in OPSTAP(JE), or when parents agreed to
participate in this research. Another difference is that the test was administered at home in the
OPSTAP(JE) research, and at school in most of the other research projects. The ethnic background of the examiners appeared to have had no influence. The scores of the children who were
tested by the two immigrant examiners were on average two points lower than the scores of the
children who were tested by the two Dutch examiners. Furthermore, the scores of the children
who were tested by an examiner from their own ethnic group were no higher than those of the
other children. What could, of course, have played a role is that all four examiners had a great
deal of experience with immigrant children and were therefore well able to motivate and stimulate the children. In order to be able to give an unambiguous evaluation of the effect of
OPSTAP(JE), research needs to be done with a pretest, post-test, control group design, with the
examiner as variable to be controlled for.
Within the framework of the validation of the test, the relationship between the IQ scores on the
SON-R 2½-7 and the performances on a large number of cognitive tests was examined. The
validation measures, here referred to as criterion tests, were mostly general development and
intelligence tests like the BOS 2-30, the Stutsman, the GOS 2½-4½, the LDT, the RAKIT,
various versions of the Wechsler tests, the BAS, the MSCA and the TONI-2, and tests for
language development and verbal intelligence like the Reynell Test and the Schlichting Test, the
TvK, the PPVT-R and the PLS-3. More specific tests were also administered, including a
memory test (TOMAL) and a test for visual perception (DTVP-2). In the text the tests will be
described and the acronyms explained.
The administration of the SON-R 2½-7 and the criterion tests was carried out within the
framework of a number of different research projects. In sections 9.1 through 9.7 the results of
each research project are described. These projects were:
1. The nationwide standardization research.
2. Research in the Netherlands on pupils at second year kindergarten level (5-6 years) at
primary schools.
3. Research in the Netherlands at OVB-schools. These are schools with a policy of educational
priority in certain areas designated as low SES areas.
4. The Dutch research at schools and institutes for children with special problems and handicaps.
5. Research in Australia on non-handicapped children and on children with impaired hearing or
a developmental delay.
6. Research in the United States of America on children in regular education.
7. Research in Great Britain on children without specific problems, children with learning
problems and children growing up bilingually.
Table 9.1 presents the tests that were used in the different research projects. In a number of cases
only some sections of the criterion test were administered. In order to be able to compare the
correlations of the research projects better, they have all been corrected for the dispersion of the
IQ scores of the SON-R 2½-7 (Guilford & Fruchter, 1978, p. 325). This correction is not
comparable to the correction for attenuation by which correlations are systematically strengthened. When correcting for dispersion, the correlations are strengthened if the standard deviation
of the SON-IQ in the research group is smaller than 15, and they are weakened if the standard
deviation is larger than 15. As an example we will give a few corrected correlations for an
observed correlation of .60. This becomes: .65 (sd=13); .63 (sd=14); .58 (sd=16) or .55 (sd=17).
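The example values can be reproduced with the standard formula for correcting a correlation for restriction (or enlargement) of range on one variable. That this is the formula on the cited page of Guilford & Fruchter is an assumption; it is used here because it yields the four corrected values quoted above. The function name is hypothetical.

```python
import math


def correct_for_dispersion(r_observed, sd_sample, sd_norm=15.0):
    """Correct a correlation for the dispersion of the SON-IQ in the sample.

    Assumed formula (standard correction for range restriction):
        r_c = r * k / sqrt(1 - r**2 + (r * k)**2),  with k = sd_norm / sd_sample
    """
    k = sd_norm / sd_sample
    return r_observed * k / math.sqrt(1.0 - r_observed**2 + (r_observed * k) ** 2)


# Observed correlation of .60 at the four standard deviations from the text.
for sd in (13, 14, 16, 17):
    print(sd, round(correct_for_dispersion(0.60, sd), 2))
# prints .65, .63, .58 and .55, matching the example in the text
```

When the standard deviation in the research group equals 15, the factor k is 1 and the correlation is left unchanged.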
In section 9.8 a summary is presented of the correlations between the SON-IQ and the
criterion tests that are discussed in this chapter. A distinction is made between general intelligence tests, nonverbal cognitive tests, and language and verbal intelligence tests. Approximately half of the correlations with general intelligence tests ranged from .59 to .70. With nonverbal
cognitive tests they ranged from .59 to .75, and with verbal (intelligence) measures half of the
correlations ranged from .45 to .54.
Section 9.9 examines whether important differences were found between the correlations of
the Performance Scale and the Reasoning Scale with the criterion tests, and whether these
differences were systematic. When differences were found, the Performance Scale of the
SON-R 2½-7 had a relatively strong correlation with the performance part of other intelligence
Table 9.1
Overview of the Criterion Tests Used and the Number of Children to Whom Each Test Was
Administered
Netherlands
Criterion
Test
P-SON/SON-R 5½-17
BOS 2-30 (BSID)
GOS 2½-4½/K-ABC
RAKIT
WPPSI(-R)/WISC-R
LDT
Stutsman
MSCA (MOS)
BAS
TONI-2
DTVP-2
TOMAL
Reynell (TB)
Schlichting (ZO/WO)
TvK
PPVT-R (Peabody)
PLS-3
Special
groups
Australia
USA
GB
119
50
115
165
153
73
41
73
206
26
70
112
80
42
155
31
75
26
58
153
153
558
558
108
179
49
29
47
tests and with visual perception, whereas the Reasoning Scale had a strong correlation with the
verbal part of other intelligence tests and with language comprehension.
In section 9.10 the differences between the mean scores of the SON-IQ and mean total scores
of the criterion tests are presented. The problems that occur when making these comparisons are
also examined. Large differences between standardized scores may occur as a result of norms
becoming obsolete, or as a result of differences in populations used for standardization. If
obsolescence of the norms was not taken into account, the scores on the SON-R 2½-7 were
generally lower than on the other tests. If scores on the other tests were corrected for
obsolescence, the mean score of the SON-IQ corresponded with the corrected American and English
norms. The scores on the SON-R 2½-7 were relatively high in comparison to the corrected
Dutch test scores. However, the scores corresponded well with the most recently standardized
test in the Netherlands, the GOS.
Because of the amount of research described in this chapter, it may be easier for the reader to
read the summarizing sections 9.8 through 9.10 first, and then the separate descriptions of each
research project.
Finally, in section 9.11 the SON-R 2½-7 and the criterion tests are compared with respect to
their relationship with a number of external variables. These are the testability of the
child, the correlation with SES level and native country of the parents, and external assessments
of intelligence and language skills.
The results of the research described in this chapter can clarify the extent to which the scores
on the SON-R 2½-7 are comparable with the scores on other intelligence tests, and give insight
into the relationships between the nonverbal measure of intelligence provided by the
SON-R 2½-7 and other aspects of cognitive development such as language skill, memory and
perception. In chapter 10, the results of this correlational research are worked out in more detail
together with the results of the previous chapters, with particular attention to
the implications of the research results for the use of the test in practice.
Table 9.2
                   Retest      SON-R                                   Reynell-Schlichting
                   SON-R 2½-7  5½-17     BOS       GOS       RAKIT     LC/SD/WD  Lexi      AM        TvK
Total              141         119       50        115       165       558       56        241       108
Age group
  2;3 years        12          -         28        11        -         41        47        -         -
  2;9 years        13          -         22        19        -         51        9         -         -
  3;3 years        12          -         -         19        -         54        -         24        -
  3;9 years        9           -         -         25        -         57        -         53        -
  4;3 years        21          -         -         26        14        54        -         50        11
  4;9 years        30          -         -         14        23        55        -         55        16
  5;3 years        7           22        -         1         25        64        -         59        18
  5;9 years        12          23        -         -         27        66        -         -         17
  6;3 years        9           23        -         -         25        58        -         -         16
  6;9 years        8           27        -         -         27        58        -         -         15
  7;3 years        8           24        -         -         24        -         -         -         15
Sex
  Boys             72          63        24        59        80        269       23        117       50
  Girls            69          56        26        56        85        289       33        124       58
SES Index
  Mean             5.3         4.1       4.9       5.5       4.9       4.9       4.5       5.0       4.4
  (SD)             (2.7)       (2.2)     (2.7)     (2.7)     (3.0)     (2.5)     (2.3)     (2.5)     (2.3)
Country of birth (%): Native Dutch / Mixed / Immigrant
                   84/6/9      83/7/10   94/2/4    90/6/4    87/4/9    92/5/3    96/4      93/4/3    90/5/5
the age group is the age at the time of administration of the SON-R 2½-7
previously with another test, either the GOS or the RAKIT was administered approximately
three months after administration of the SON-R 2½-7, or the children were tested again with
either the SON-R 2½-7 or the SON-R 5½-17.
In table 9.2 the background of the children to whom a criterion test was administered is
presented. The age groups refer to the age at which the SON-R 2½-7 was administered. The
results are presented in table 9.3. The age in this table is based on the mean age at administration
of the SON-R 2½-7 and the criterion test. The interval (in months) is the period between the
administration of the tests. The results of the research are discussed for each test.
SON-R 2½-7
To determine the stability of the test results, the SON-R 2½-7 was administered a second time to
141 children after a delay of three to four months. The results were presented in section 5.5.
They will be discussed briefly here as they may serve as a basis for the assessment of the
correlations of the SON-R 2½-7 with other tests. The age at the first administration ranged from
two to seven years with a mean of 4;6 years. The correlation between the IQ scores was .79. This
correlation increased slightly with age. With children up to 4;6 years (N=67), the correlation
was .78 and with older children the correlation was .81 (N=74). On the basis of these retest
correlations, correlations with criterion tests were not expected to exceed .80 if the period
between administrations was a few months or more.
Table 9.3
Correlations with Other Tests in the Standardization Research
                                        Scores, Mean (SD)              Age (years)  Interval (months)
Criterion Test              N     r     Criterion      SON-R 2½-7     Mean (SD)    Mean (SD)
SON-R 2½-7
  IQ-score on retest        141   .79   109.4 (14.7)   103.4 (13.7)   4.7 (1.4)    3.5 (0.7)
SON-R 5½-17
  Standard IQ               119   .76   103.6 (12.2)   98.2 (12.6)    6.4 (0.7)    3.6 (0.7)
BOS 2-30
  Mental Scale              50    .59   100.5 (17.7)   103.0 (15.9)   2.4 (0.3)    2.7 (0.7)
  Nonverbal Scale                 .53   98.6 (17.3)
GOS 2½-4½
  Cognitive DI              115   .65   104.4 (15.7)   102.9 (15.5)   3.6 (0.7)    3.2 (0.7)
  Simultaneous DI                 .63   102.8 (17.4)
  Sequential DI                   .49   105.1 (13.0)
RAKIT
  Shortened version IQ      165   .60   102.2 (15.6)   102.4 (14.6)   5.8 (1.0)    3.0 (0.7)
REYNELL/SCHLICHTING
  Mean LC, SD and WD        558   .48   100.8 (12.8)   101.4 (15.4)   4.4 (1.4)    6.3 (1.4)
  Lang.comprehension (LC)         .46   101.1 (15.0)
  Sentence Developm. (SD)         .35   100.3 (14.2)
  Word Development (WD)           .45   100.9 (14.8)
  Auditive Memory           241   .27   100.6 (14.3)   101.9 (15.8)   4.1 (0.7)    6.1 (1.0)
  Lexilist                  56    .54   102.4 (15.8)   101.7 (16.5)   2.0 (0.1)    6.9 (2.5)
TvK
  Mean of 5 subtests        108   .59   4.7 (1.6)      101.2 (15.4)   5.7 (1.0)    3.0 (0.8)
the correlations have been corrected for the variance of the SON-IQ
the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test
SON-R 5½-17
After a delay of at least three months, the SON-R 5½-17 was administered to 119 children 5
years and older (mean age was 6;3 years). The more difficult items of the subtests Mosaics,
Categories, Analogies and Situations of the SON-R 2½-7 are very similar in content to the
easier items of these subtests of the SON-R 5½-17. The subtest Puzzles does not have an
equivalent in the SON-R 5½-17. Two new subtests in this test are Stories and Hidden Pictures.
Both tests have a subtest Patterns; however, the subtests differ in content.
The correlation between the IQ scores of the two tests was .76. The correlation was as high
with children younger than 6;6 years (N=68; r=.75) as with the older children (N=51; r=.75). As
was the case with the retest of the SON-R 2½-7, there was a noticeable learning effect with the
administration of the SON-R 5½-17. The mean scores were more than 5 points higher with the
SON-R 5½-17.
BOS 2-30
The BOS 2-30 (Bayley Developmental Scales; Van der Meulen & Smrkovsky, 1983) is a test for
the mental and motor development of children in the age range from two to thirty months. This
test is the Dutch version of the Bayley Scales of Infant Development (BSID; Bayley, 1969). A
developmental index is calculated for the Mental Scale and the Motor Scale with a mean of 100
and a standard deviation of 16. A nonverbal score for the Mental Scale can be determined by
excluding the items with a verbal content in the scoring (Van der Meulen & Smrkovsky, 1987;
Le Coultre-Martin et al., 1988).
Fifty children (24 boys and 26 girls) were tested. In the case of 47 children, both parents
were born in the Netherlands. The SES level corresponded to that of the norm group. The mean
age at the time of administration of the BOS was 2;3 years. The SON-R 2½-7 was administered
two to four months later. The administration of the BOS was limited to the Mental Scale.
The correlation of the developmental index of the Mental Scale of the BOS with the SON-IQ
was .59. The correlation of the Nonverbal Scale of the BOS with the SON-IQ was slightly lower
(r=.53). On average, the children scored more than two points lower on the BOS than on the
SON-R 2½-7.
GOS 2½-4½
The GOS 2½-4½ (Groningen Developmental Scales; Neutel, Van der Meulen & Lutje Spelberg,
1996) is the Dutch version, for children from 2½ to 4½ years, of the Kaufman Assessment
Battery for Children (K-ABC; Kaufman & Kaufman, 1983). Two new subtests were added to
the GOS (Motor Skills and Copying Figures). In contrast to the K-ABC, the GOS does not make
a distinction between a Mental Scale and an Achievement Scale. The number of subtests administered is 9, 11 or 13, depending on age. The total of all subtests forms the Cognitive Scale.
Furthermore, the subtests are subdivided into a Simultaneous Scale and a Sequential Scale.
Three subtests from the Achievement Scale of the K-ABC have been added to the Simultaneous
Scale. The subtest Arithmetic and the two new subtests are part of the Sequential Scale. The
three developmental indexes have a mean of 100 and a standard deviation of 15.
The GOS was administered to 115 children (59 boys and 56 girls). The mean SES index was
5.5. In the case of 103 children, both parents were born in the Netherlands. The period between
the administration of the tests was on average three months. The GOS was administered first to
64 children, and the SON-R 2½-7 was administered first to 51 children. The mean age at the
time of administration of the tests was 3;7 years.
The correlation between the Cognitive Developmental Scale of the GOS and the SON-IQ
was .65. The mean and the standard deviation of both tests were very similar. The correlation for
three age groups, based on the age at the time of administration of the GOS was .64 (younger
than 3;2 years; N=39), .62 (age between 3;2 and 4;2 years; N=51) and .77 (older than 4;2 years;
N=25).
In the entire group the correlation between the Simultaneous Scale and the SON-IQ was .63,
and between the Sequential Scale and the SON-IQ .49. The dispersion of the Simultaneous Scale
(sd=17.4) was significantly larger than that of the Sequential Scale (sd=13.0). When the correlations
were corrected not for the standard deviation of the SON-IQ, but instead for the standard
deviation of the two subscales of the GOS, the correlation of the Simultaneous Scale with the
SON-IQ was .59 and the correlation of the Sequential Scale with the SON-IQ was .56.
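The alternative correction for the two GOS subscales can be sketched as a two-step calculation. The example assumes that the usual dispersion formula is applied twice: first to undo the correction for the standard deviation of the SON-IQ in this group (15.5, table 9.3), then to rescale to a standard deviation of 15 on the GOS subscale. This procedure and the function names are assumptions and are not taken from the manual. Because the input correlations are rounded to two decimals, agreement with the reported .59 and .56 can only be expected after rounding.

```python
import math


def rescale_correlation(r, sd_from, sd_to):
    """Assumed dispersion formula: the correlation expected when the SD
    of one variable changes from sd_from to sd_to."""
    k = sd_to / sd_from
    return r * k / math.sqrt(1.0 - r * r + (r * k) ** 2)


SON_SD_IN_GROUP = 15.5  # SD of the SON-IQ in the GOS group (table 9.3)


def recorrect_for_subscale(r_reported, sd_subscale):
    """Undo the SON-IQ correction, then correct for the subscale SD instead."""
    r_observed = rescale_correlation(r_reported, 15.0, SON_SD_IN_GROUP)
    return rescale_correlation(r_observed, sd_subscale, 15.0)


if __name__ == "__main__":
    print(round(recorrect_for_subscale(0.63, 17.4), 2))  # Simultaneous Scale
    print(round(recorrect_for_subscale(0.49, 13.0), 2))  # Sequential Scale
```

The two steps partly cancel, so the net effect depends only on the ratio of the SON-IQ standard deviation to the subscale standard deviation: the scale with the larger spread loses and the scale with the smaller spread gains.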
RAKIT
The RAKIT (Revision of the Amsterdam Intelligence Test for Children; Bleichrodt, Drenth,
Zaal & Resing, 1984) is a general intelligence test, developed in the Netherlands, for children in
the age range four to eleven years. There are twelve subtests which tap spatial-perceptual as
well as verbal abilities. In the age range six to ten years, the RAKIT IQ has a correlation of .81
with the IQ score on the WISC-R (Bleichrodt, Resing, Drenth & Zaal, 1987). In our research
project the shortened version of the RAKIT, five or six subtests, depending on age, was administered. The IQ score of the shortened RAKIT has a mean of 100 and a standard deviation of 15.
Research was done with 165 children (80 boys and 85 girls). The mean SES index was 4.9.
Thirteen percent of the children had one or both parents born outside the Netherlands. The
RAKIT was administered first to 111 children and the SON-R 2½-7 was administered first to 54
children. The mean interval between the two administrations was three months. The age at the
time of administration was on average 5;10 years.
The correlation between the SON-IQ and the shortened version RAKIT IQ was .60. The
mean and the dispersion of both tests corresponded well. Three age groups were distinguished
on the basis of the combination of RAKIT subtests administered. In the first group (mean age
4;8 years at the time of administration of the RAKIT; N=53) the correlation was .50. In the
second group (mean age 5;8 years; N=48) the correlation was .62. In the oldest age group (mean
age 6;10 years; N=64) the correlation between the SON-IQ and the RAKIT IQ was .65.
year-olds .36 (N=124), for the four-year-olds .51 (N=119), for the five-year-olds .52 (N=124)
and for the six-year-olds .72 (N=58).
The correlation between the SON-IQ and the score on the Lexilist was .54 (N=56). The age
at the time of administration of the Lexilist was 1;9 years. The mean age at the time of
administration of the SON-R 2½-7 was 2;4 years.
The correlation of the SON-IQ with the Auditive Memory section of the Schlichting Test was
.27 (N=241). For children less than four years at the time of administration of the Schlichting
Test, the correlation was .25 (N=127) and for the older children the correlation was .28 (N=114).
TvK
The TvK (Language Tests for Children; Van Bon, 1982) is a test battery consisting of ten tests
for receptive and productive language development in children in the age range four to ten
years, developed in the Netherlands. The TvK is an adaptation of the Illinois Test of Psycholinguistic Abilities (ITPA). During the research two receptive tests (Choice of Sentence Structure
and the Choice of Vocabulary) and three productive tests (Word Form Production, Sentence
Structure Production 0 and Vocabulary Production 1) were administered. The scaled scores of
the tests have a mean of 5 and a standard deviation of 2.
The TvK was administered to 108 children (50 boys and 58 girls). In the case of 97 children,
both parents were born in the Netherlands. The SES index had a mean of 4.4. The age at the time
of administration of the TvK was on average 5;6 years. The SON-R 2½-7 was administered on
average three months later.
The correlations of the SON-IQ with the subtests of the TvK ranged from .39 (Choice of
Sentence Structure) to .52 (Choice of Vocabulary). The correlation of the SON-IQ with the
mean score on the five subtests of the TvK was .59. For the younger children (age at the time of
administration of the TvK less than 5;6 years; N=53) the correlation was .50; for the older
children the correlation was .68 (N=55).
Table 9.4
Correlations with Nonverbal Cognitive Tests in the Second Year of Kindergarten, 5 to 6 Years of
Age (N=153)
                                        Correlation
Test          Score                     with SON-IQ    Mean (SD)
SON-R 2½-7    IQ                                       102.4 (15.8)
TONI-2        Form A                    .51            103.5 (14.1)
TOMAL         Nonverbal Memory Index    .45            97.5 (11.7)
DTVP-2        Total score               .73            109.2 (14.4)
              Non-motor tasks           .70            100.8 (13.8)
              Visual motor tasks        .66            116.7 (15.7)
the correlations have been corrected for the variance of the SON-IQ
The testing materials did not have to be adapted for the research in the Netherlands. The
directions of the TOMAL and the DTVP-2 were translated. The directions of the TONI-2 are
given nonverbally. American norms were used in the research. The standardized total scores
have a mean of 100 and a standard deviation of 15. The norms for the TONI-2 and the TOMAL
are given for each year of age and are therefore very rough for the young age groups. The
standardized scores for the age in months were therefore calculated by interpolation and extrapolation.
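How such an interpolation might work can be shown with a small sketch. The manual does not describe the procedure, so everything in the example is an assumption: each yearly norm table is taken to apply at the middle of that year of age, a straight line is drawn between neighbouring tables, and the line through the nearest two tables is extended for ages outside their range. The function name and the scores used are hypothetical.

```python
def standard_score_for_age(age_months, score_by_year):
    """Linear interpolation (or extrapolation) between yearly norm tables.

    score_by_year maps a whole year of age to the standard score that the
    yearly table gives for one fixed raw score. Each table is assumed to
    apply at the midpoint of its year (for example 5;6 for the 5-year table).
    """
    points = sorted((year * 12 + 6, score) for year, score in score_by_year.items())
    if len(points) < 2:
        raise ValueError("at least two yearly norm tables are needed")
    # Use the first pair of tables whose upper midpoint is at or beyond the
    # child's age. If the child is older than every midpoint, the loop runs
    # to the end and the last pair is kept, which gives extrapolation.
    for (m0, s0), (m1, s1) in zip(points, points[1:]):
        if age_months <= m1:
            break
    return s0 + (s1 - s0) * (age_months - m0) / (m1 - m0)


if __name__ == "__main__":
    # Hypothetical: the same raw score gives 104 in the 5-year table
    # and 96 in the 6-year table.
    tables = {5: 104, 6: 96}
    print(standard_score_for_age(5 * 12 + 10, tables))  # child aged 5;10
    print(standard_score_for_age(5 * 12, tables))       # child aged 5;0
```

With yearly tables alone, two children who differ by one month in age can differ by a full table step in standard score; interpolation spreads that step evenly over the twelve months.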
The research was carried out on 153 children (64 boys and 89 girls). The mean age of the
children was 5;10 years with a standard deviation of 5 months. The SES index had a mean of 6.6
(sd=3.0) and was clearly higher than the mean of the norm group. The percentage of native
Dutch children was 86%.
All four tests were administered to the children in three sessions at school. The administration
of the TONI-2 and the TOMAL was combined, the TONI-2 being administered first. The
sequence of administration of the SON-R 2½-7, the TONI/TOMAL and the DTVP-2 varied.
The mean interval between the administration of the SON-R 2½-7 and one of the other tests was
21 days with a standard deviation of 12 days.
The mean scores and the correlations between the SON-IQ and the total scores on the other
tests are presented in table 9.4. The correlation with the IQ score on the TONI-2 was .51; the
correlation with the Nonverbal Memory Index of the TOMAL was .45. The highest correlation,
.73, was found with the total score on the DTVP-2. The correlation with the tasks that do not
require motor skills was somewhat stronger (r=.70) than the correlation with the visual motor
tasks (r=.66).
At the time of administration of the SON-R 2½-7 and the second administration of the LDT and
the RAKIT, most of the children were in first grade (approximately 6-7 years of age). The mean
age at the time of administration of the SON-R 2½-7 was 6;8 years with a standard deviation of
four months. The SES index of the children (34 boys and 39 girls) had a mean of 2.0 (sd=1.9).
The SES level of 75% of the children was low, 16% were below average and 9% were above
average or high. Approximately half the children had one or both parents born outside the
Netherlands, mainly in Surinam or the Antilles.
The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) is a general
intelligence test for children in the age range four to eight years. The test has eight subtests,
some taken from other tests. During the research the performance subtest Block Patterns and
three verbal subtests (Repeating Sentences, Questions about a Story and Comprehension and
Insight) were administered. The standardized subtest scores have a mean of 100 and a standard
deviation of 15. Four verbal subtests from the RAKIT (Bleichrodt et al., 1984) were administered (Meaning of Words, Learning Names, Production of Ideas and Story Pictures). These are
all components of the verbal learning and fluency factor. The standardized subtest scores have a
mean of 15 and a standard deviation of 5. The WISC-R is the Dutch edition (Van Haasen et al.,
1986) of the American test with the same name (Wechsler, 1974). The scores for the performance IQ (PIQ), the verbal IQ (VIQ) and the total IQ (FSIQ) have a mean of 100 and a standard
deviation of 15. Two reading tests that had been developed by the CITO (Central Institute for
Test Development) were administered in the school year 1995/96. These were the Cito
Three-Minute-Test for the level of Technical Reading and the Cito Test for Textual Reading.
The subtests of the LDT and the RAKIT were administered in 1991 (N=69) and in 1993
(N=73) in one session. The period between administration of the LDT/RAKIT and the SON-R
2½-7 varied in 1993 from several days up to several weeks. The WISC-R and the Test for
Technical Reading were administered at the beginning of the school year 95/96 (N=41); the
Test for Textual Reading was administered later that year (N=35).
Table 9.5
Correlations with Cognitive Tests Completed by Children at Low SES Schools Given Educational
Priority (OVB-Schools)
Year                                             Crit. Test     SON-R 2½-7
adm.   Criterion Test              N     r       Mean (SD)      Mean (SD)
91     LDT
         Block patterns            69    .54     99.7 (13.7)    92.7 (15.4)
         Mean 3 verbal tests             .44     96.3 (11.0)
       RAKIT
         Mean 4 verbal tests             .61     12.7 (3.6)
93     LDT
         Block patterns            73    .66     95.8 (13.7)    92.0 (15.0)
         Mean 3 verbal tests             .54     98.6 (10.8)
       RAKIT
         Mean 4 verbal tests             .42     13.5 (3.5)
95     WISC-R
         Total IQ                  41    .74     90.5 (13.1)    92.2 (14.1)
         Performance IQ                  .73     91.5 (12.8)
         Verbal IQ                       .60     91.2 (13.6)
       CITO-test
         Technical reading               .38     39.0 (21.6)
         Textual reading                 .52     16.4 (16.1)
the correlations have been corrected for the variance of the SON-IQ
the SON-R 2½-7 was administered in 1993
The correlations of the SON-IQ with the various test scores are presented in table 9.5. The
correlation with the performance subtest Block Patterns of the LDT, administered two years
earlier, was .54. When administered in the same period as the SON-R 2½-7, the correlation
increased to .66. The correlation with the three verbal subtests of the LDT also increased from
.44 to .54. It is noteworthy that the strong correlation of the SON-IQ with the four verbal
subtests of the RAKIT decreased from .61 to .42. The two subtests that had weaker correlations
with the SON-R 2½-7 in 1993 (Word Meaning and Production of Ideas) also had weaker
correlations with the LDT when administered in 1993.
The strongest correlation was found between the SON-IQ and the WISC-R, which was
administered two years later. The correlation with the total IQ was .74; the correlations with the
PIQ and the VIQ were .73 and .60 respectively. On average the SON-IQ score was slightly
higher than the IQ score on the WISC-R.
The correlation of the SON-IQ with the Test for Textual Reading was .52 and the correlation
with the Test for Technical Reading was .38.
Preschool SON
In the case of 188 children, the IQ scores were known on the predecessor of the SON-R
2½-7, the Preschool SON. More than half of this group were children with language/speech
and hearing problems. Additionally, a large number of children with a general developmental
delay and/or a pervasive developmental disorder were tested with the Preschool SON.
The IQ scores of the deaf children that were based on the separate standardization for the
deaf, were transformed into IQ scores based on the standardization for the hearing. Data
from the Preschool SON were only used in the analysis if the test had been administered in
full.
The mean age at the time of administration of the Preschool SON was 3;10 years, the mean
age at the time of administration of the SON-R 2½-7 was 5;3 years. The period between the
administration of the tests was, on average, nearly a year and a half. In a few cases the interval
was more than four years. In the case of 95% of the children, the SON-R 2½-7 was administered
after the Preschool SON.
The correlation between the IQ scores on both tests was .65. This correlation increased
greatly as the age at which the Preschool SON was administered increased. In the age group up
to 3;5 years the correlation was .57 (N=60); in the age range 3;5 to 4;1 years the correlation was
.64 (N=64) and in the age range from 4;1 onwards the correlation was .77 (N=64). The interval
between the administration of the two tests may have influenced the increase in the correlations
with age. In the youngest group the average interval was 22 months and in the oldest group 10
months. A relatively large difference, 13 IQ points, was found between the mean scores of the
two tests. A substantial decrease in IQ scores can be expected in view of the interval of more
than 20 years between the two standardizations.
Table 9.6
Characteristics of the Children in the Special Groups to Whom a Criterion Test Was
Administered
Total
P-SON
SON-R
5½-17
BOS
Stutsman
WPPSI
WPPSI-R
WISC-R
LDT
RAKIT
Reynell
TvK
206
26
42
112
80
70
179
49
21
63
73
41
8
4
11
6
4
1
2
5
17
8
9
1
2
16
42
46
6
1
9
20
39
11
8
23
36
3
2
39
47
60
31
6
21
21
1
57
22
58
23
46
9
13
4
12
3
27
61
8
4
39
44
7
20
9
26
4
24
16
64
26
74
15
12
28
9
140
66
11
15
30
12
86
26
55
25
45
25
128
51
33
16
4.2
(2.6)
3.9
(2.1)
4.7
(2.9)
4.0
(2.5)
3.4
(2.4)
3.6
(2.5)
3.5
(2.0)
3.7
(2.3)
89%
7%
4%
95%
5%
90%
5%
5%
94%
5%
1%
91%
4%
5%
97%
2%
2%
92%
5%
2%
96%
2%
2%
Age
2 years
3 years
4 years
5 years
6 years
7 years
Group
Gen.Dev.Disorder
Perv.Dev.Disorder
Speech/lang.Disord.
Hearing impaired
Deaf
Sex
Boys
Girls
SES Index
Mean
(SD)
Country of birth
Native Dutch
Mixed
Immigrant
the age is the age at the time of administration of the SON-R 2½-7
SON-R 5½-17
The children at one institute for the deaf were not included in the analysis of the special groups,
because of the probability that the low scores of these children were the result of an examiner
effect (see section 7.7). Most of these children (N=18) were tested again three years later with
the SON-R 5½-17, the revision of the SON for older children. The mean age at the time of
administration of the SON-R 2½-7 was 5;5 years. The mean age at the time of administration of
the SON-R 5½-17 was 8;6 years. The correlation between the IQ scores was .66.
Table 9.7
Correlations with Criterion Tests in the Special Groups
                                        Scores, Mean (SD)              Age (years)  Interval (months)
Criterion Test              N     r     Criterion      SON-R 2½-7     Mean (SD)    Mean (SD)
P-SON
  IQ-score                  188   .65   97.9 (16.4)    84.8 (18.2)    4.6 (0.7)    17.0 (12.6)
SON-R 5½-17
  Standard IQ               18    .66   100.2 (20.4)   83.5 (14.5)    7.0 (0.8)    36.8 (7.0)
BOS 2-30
  Nonverbal scale           26    .50   95.5 (18.1)    84.1 (13.9)    3.5 (0.5)    35.4 (14.5)
STUTSMAN
  Total IQ                  42    .57   106.7 (21.6)   92.7 (18.7)    4.1 (0.6)    21.1 (15.3)
WPPSI-R
  Performance scale         19    .82   104.2 (15.6)   80.6 (13.2)    5.5 (0.8)    4.7 (2.8)
WPPSI
  Performance scale         20    .82   111.4 (12.6)   102.0 (11.2)   5.5 (0.5)    10.7 (5.8)
WPPSI
  Total IQ                  53    .60   87.9 (17.4)    83.0 (16.0)    5.5 (0.6)    8.2 (8.9)
  Verbal scale                    .49   87.7 (16.4)
  Performance scale               .59   90.3 (18.8)
WISC-R
  Total IQ                  20    .62   85.9 (16.7)    82.1 (14.7)    6.6 (0.4)    2.4 (2.7)
  Verbal scale                    .47   91.8 (17.9)
  Performance scale               .76   82.6 (14.9)
LDT
  Total IQ                  80    .58   85.0 (14.8)    81.6 (14.0)    5.9 (0.7)    8.4 (6.1)
RAKIT
  (Shortened version) IQ    40    .46   80.0 (16.6)    79.7 (15.0)    5.8 (0.6)    6.5 (5.0)
RAKIT
  Mean of 4 subtests        30    .64   14.9 (3.2)     93.8 (14.6)    5.9 (0.6)    8.5 (5.5)
REYNELL
  Language comp. A          179   .44   1.4 (1.2)      83.0 (17.5)    4.9 (1.0)    5.3 (6.2)
TvK
  Mean of 4 subtests        49    .53   3.4 (1.5)      86.1 (14.3)    5.9 (0.7)    3.5 (2.8)
the correlations have been corrected for the variance of the SON-IQ
the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test
BOS 2-30
The scores of 26 children on the nonverbal developmental index of the BOS 2-30 were known
(Bayley Scales of Infant Development; Van der Meulen & Smrkovsky, 1983, 1987). All the
children had a language/speech or hearing disorder. The mean age at the time of administration
of the BOS was 2;0 years. The administration of the SON-R 2½-7 took place between one and
five years later. The period between administrations was, on average, nearly three years. The
mean age at the time of administration of the SON-R 2½-7 was 4;11 years. The correlation
between the nonverbal developmental index of the BOS and the SON-IQ was .50.
STUTSMAN
The Stutsman Test (Stutsman, 1931) uses toys and utensils. The tasks to be performed are
different for each age group. The test was adapted for the Netherlands (Smulders, 1963).
However, the old American norms were maintained.
In this investigation, the test was mainly administered to deaf children and children with a
general developmental delay. The mean age at the time of administration of the Stutsman was
3;3 years and the mean age at the time of administration of the SON-R 2½-7 was 5;0 years. In 40
of the 42 cases the Stutsman was administered first. The correlation between the IQ scores on
the two tests was .57. The norms of the Stutsman are obsolete; the scores have a mean that is 14
points higher than the SON-IQ.
WPPSI-R
The performance scale of the WPPSI-R (Wechsler Preschool and Primary Scale of Intelligence
- Revised; Wechsler, 1989) was administered to 19 children at one institute for the deaf. This is
the institute that was not taken into account in the analysis of the results of deaf children,
because of an examiner effect on the administration of the SON-R 2½-7 (see section 7.7). At the
time the WPPSI-R was administered it had not been translated and standardized for the
Netherlands. A translation done by the institute was used, and the directions for the performance
subtests were adapted for use with deaf children. The scores were based on American norms.
The SON-R 2½-7 was administered first to 8 children and the WPPSI-R was administered
first to 11 children. The mean age at the time of administration of the tests was 5;6 years. The
interval between the tests was, on average, 5 months. The correlation between the performance
IQ of the WPPSI-R and the SON-IQ was .82.
WPPSI
A Dutch manual of the WPPSI (Wechsler, 1967) in which the American norms are used, was
published in 1973 (Berger, Creuwels & Peters, 1973). In 1981 a Flemish adaptation of the test
was published with Flemish norms (Stinissen & Vander Steene, 1981). The test data for the
WPPSI do not always show clearly which directions and norms were used.
In the case of 20 deaf children the administration of the WPPSI was limited to the performance
scale. The SON-R 2½-7 was administered first to six children and the WPPSI was administered
first to 14 children. The mean age at the time of administration of the WPPSI was 5;3
years and of the SON-R 2½-7 5;8 years. The interval between the tests was, on average, 11
months. The correlation between the WPPSI PIQ and the SON-IQ was .82.
The WPPSI was administered in full to 53 children. These were nearly all children with a
developmental disorder. In 70% of the cases the WPPSI was administered first. The mean ages
at the time of administration of the WPPSI and the SON-R 2½-7 were 5;3 and 5;9 years
respectively. The interval between administration of the tests was, on average, 8 months. The
correlation with the total IQ of the WPPSI was .60. The correlations of the SON-IQ with the
verbal scale and the performance scale of the WPPSI were .49 and .59 respectively.
WISC-R
In the case of 20 children, scores were available on the WISC-R (Van Haasen et al., 1986), the
Dutch language version of the Wechsler Intelligence Scale for Children - Revised, (Wechsler,
1974), that has been standardized for the Netherlands. This test was administered mainly to
children with a general developmental delay or with pervasive developmental disorder and to a
few children with a speech or language disorder. The SON-R 2½-7 was administered first to 15
children. The mean ages at the time of administration of the SON-R 2½-7 and the WISC-R were
6;6 and 6;8 years respectively. The interval between administration was, on average, a little
more than two months.
The correlation with the WISC-R total IQ was .62. With the verbal scale the correlation was
.47 and with the performance scale .76. The mean score of the SON-IQ was more than 3 points
lower than the WISC-R total IQ. The score on the verbal scale of the WISC-R was 9 points
higher than the score on the performance scale; the mean score on the performance scale was
practically the same as the SON-IQ.
Correlations with the SON-IQ were also calculated for the combined data of the WPPSI, the
WPPSI-R and the WISC-R. As the norms differ, the mean scores of the tests were equated to 0
for each test combination. Subsequently the correlations were calculated for the combined
group. Using this procedure, the correlation of the SON-IQ with the performance scale of the
Wechsler tests could be calculated for 112 children; this was .69.
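The pooling procedure can be sketched as follows. The example assumes that both the SON-IQ and the criterion score are centred on their group mean within each test combination before the groups are combined. The data, group names and function names are hypothetical and serve only to show why centring is needed when the norms of the criterion tests differ.

```python
import math


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_correlation(groups):
    """Centre both scores within each group, then correlate the pooled data.

    groups maps a label to a list of (son_iq, criterion_score) pairs.
    """
    xs, ys = [], []
    for pairs in groups.values():
        mean_x = sum(p[0] for p in pairs) / len(pairs)
        mean_y = sum(p[1] for p in pairs) / len(pairs)
        xs.extend(p[0] - mean_x for p in pairs)
        ys.extend(p[1] - mean_y for p in pairs)
    return pearson(xs, ys)


if __name__ == "__main__":
    # Hypothetical data: same relationship in both groups, but the criterion
    # test in group "A" has easier norms (scores 20 points higher).
    groups = {
        "A": [(80, 90), (100, 110), (120, 130)],
        "B": [(80, 70), (100, 90), (120, 110)],
    }
    raw_x = [p[0] for pairs in groups.values() for p in pairs]
    raw_y = [p[1] for pairs in groups.values() for p in pairs]
    print(round(pearson(raw_x, raw_y), 2))        # pooled without centring
    print(round(pooled_correlation(groups), 2))   # pooled after centring
```

Without centring, the difference in norm level between the tests adds variance to the criterion scores that is unrelated to the SON-IQ, which lowers the pooled correlation.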
In the case of 73 children to whom the WPPSI and the WISC-R were administered in full, the
correlation with the total IQ was .62. The correlations of the SON-IQ with the verbal scale and
the performance scale for these children were .49 and .63 respectively.
LDT
The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) consists of eight subtests
which tap verbal and performance skills, and memory. The subtests are partially adapted subtests from other tests, including subtests of the WPPSI and the WISC. The test has been
standardized for the Netherlands.
The LDT was administered in full to 80 children, most of whom had a general developmental
delay or a speech or language disorder. In the case of 53 children, the LDT was administered
first. The mean ages at the time of administration of the LDT and the SON-R 2½-7 were 5;7 and
6;1 years respectively. The average interval between the tests was 8 months.
The correlation of the SON-IQ with the LDT IQ was .58. The correlation of the SON-IQ with
the mean score on three performance subtests (Block Patterns, Folding Papers and CopyTapping) was .67; the correlation with the mean score on two memory tests (Vocabulary Length
and Indicating Pictures) was .43 and the correlation with the mean score on three verbal tests
(Repeating Sentences, Questions about a Story and Comprehension and Insight) was .20. In the
case of children younger than 5;6 years at the time of administration of the LDT (N=38), the
correlation with the LDT IQ was .53; in children older than 5;6 it was .61. The correlation with
the performance tests of the LDT increased with age from .59 to .74.
RAKIT
The administration of the RAKIT (Revision of the Amsterdam Intelligence Test for Children;
Bleichrodt et al., 1984) takes so long that usually only a few subtests were administered. The
RAKIT was administered to all groups except the deaf children. In the case of 54% of the
children the SON-R 2½-7 was administered first.
The shortened version of the RAKIT was administered to 27 children and the test was
administered in full to 13 children. In this group of 40 children, the mean ages at the time of
administration of the RAKIT and the SON-R 2½-7 were 5;7 and 5;11 years respectively. The
period between the administrations was, on average, just over half a year. The correlation of the
SON-IQ with the RAKIT IQ was .46; the mean scores were practically the same.
In the case of 30 other children, the administration of the RAKIT was limited to the first
four subtests (Figure Recognition, Exclusion, Memory and Word Meaning). The mean ages
at the time of administration of the RAKIT and the SON-R 2½-7 were 5;10 and 6;0 years
respectively. The period between administrations was, on average, just over 8 months. The
correlation between the SON-IQ and the mean standard score on the four subtests of the
RAKIT was .64.
REYNELL
In these research groups, the scores on the RDLS (Reynell Developmental Language Scales;
Reynell, 1977) relate to the Dutch translation by Bomers and Mugge (1985), which uses the old
English norms. In most cases only the subtest Language Comprehension A was administered.
The standardized scores have a mean of 0 and a standard deviation of 1.
The Reynell Test was administered to 179 children. The mean age at the time of administration
of the Reynell was 4;10 years and the mean age at the time of administration of the SON-R
2½-7 was 5;0 years. The interval between tests was on average a little more than 5 months; in
52% of the cases the Reynell was administered first.
The correlation between the score on Language Comprehension and the SON-IQ was .44. In
the group of children with a general developmental delay or with a pervasive developmental
disorder (N=90), the correlation was .55; in the group of children with a speech or language
disorder, or with impaired hearing (N=89), the correlation was .35. A distinction was made in
both groups between the children who were younger than five years at the time of administration
of the Reynell and the older children. The correlation in the youngest group of children with
general or pervasive developmental problems was .63 (N=54) and in the oldest group the
correlation was .46 (N=36). In the group of children with speech or language disorders, or with
impaired hearing, the correlation was .24 in the youngest group (N=44) and .49 in the oldest
group (N=45).
TvK
The scores of 49 children were known on at least three of the following four subtests of the TvK
(Language Tests for Children; Van Bon, 1982): Word-Form Production, Choice of Sentence
Structure, Choice of Vocabulary and Vocabulary Production.
The TvK was administered mainly to children with a speech or language disorder. Children
with a pervasive developmental disorder and hearing impaired children were also tested. In 84%
of the cases the TvK was administered after the SON-R 2½-7. The mean interval between the
tests was a little more than three months. The mean age at the time of administration of the
SON-R 2½-7 was 5;10 years and the mean age at the time of administration of the TvK was 6;0
years. The standardized scores on the TvK have a mean of 5 and a standard deviation of 2. The
correlation of the mean standard score on the subtests of the TvK with the SON-IQ was .53.
this correlation was .74 or .75. The correlation with the verbal IQ was clearly lower in the
control group (r=.54). The correlation with the full scale IQ of the WPPSI-R (r=.75) was slightly
higher in the control group than the correlation with the PIQ. On average, the scores on the
SON-R 2½-7 were five points lower than those of the PIQ.
The mean differences between the groups were very similar for the SON-IQ and the PIQ: the
difference between the hearing-disabled group and the control group was 13.3 for the SON-IQ
and 10.3 for the PIQ. For the group with a developmental delay the difference was 40.5 for the
SON-IQ and 38.5 for the PIQ.
Table 9.8
Correlations with the WPPSI-R in Australia

Mean and Standard Deviation
                  Entire         Control        Hearing        Developm.
                  group          group          impairment     delay
                  (N=155)        (N=59)         (N=59)         (N=37)
SON-IQ            94.2 (22.3)    108.9 (14.5)   95.6 (15.6)    68.4 (19.0)
WPPSI-R
  PIQ             99.1 (21.8)    112.2 (13.5)   101.9 (17.2)   73.7 (17.5)
  VIQ             -              109.1 (11.1)   -              -
  FSIQ            -              112.2 (12.4)   -              -

Correlation with the SON-IQ
                  Entire         Control        Hearing        Developm.
                  group          group          impairment     delay
WPPSI-R
  PIQ             .78            .74            .74            .75
  VIQ             -              .54            -              -
  FSIQ            -              .75            -              -

the correlations have been corrected for the variance of the SON-IQ
WPPSI-R
The WPPSI-R was administered to 75 children whose mean age at the time the SON-R 2½-7
was administered was 5;1 years. The correlation of the SON-IQ with the total IQ (FSIQ) of the
WPPSI-R was .59; the correlations with the performance and verbal scales were .60 and .43
respectively. The mean score on the SON-IQ was more than two points lower than the FSIQ and
nearly four points lower than the PIQ.
K-ABC
The original American edition of the Kaufman Assessment Battery for Children differs in
several respects from the Dutch edition for young children (the GOS 2½-4½). The simultaneous
scale of the K-ABC consists of seven parts, and the sequential scale of six parts. However, the
number of subtests administered depends on age. The subtests of the simultaneous and sequential
scales form the mental scale. A number of subtests of the mental scale, in which no verbal
abilities are required, form the nonverbal scale. The mean of all scale scores is 100 and the
standard deviation is 15.
The mean age of the 31 children to whom the K-ABC was administered was 4;7 years. The
SON-IQ had the highest correlation with the total Mental Score of the K-ABC, r=.66. The
correlation with the Sequential Scale (r=.29) was considerably lower than the correlation with
the Simultaneous Scale (r=.58). This corresponds with the results of Dutch research with the
GOS 2½-4½, but with the K-ABC, as with the GOS, the distribution of scores on the Sequential
Scale was considerably narrower than the distribution of scores on the Simultaneous Scale. The
correlation with the Achievement Scale was .58 and the correlation with the Nonverbal Score of
the mental scale was .61.
MSCA
The McCarthy Scales of Children's Abilities, published in the Netherlands as the MOS 2½-8½
(Van der Meulen & Smrkovsky, 1986), consists of eighteen subtests. The administration was
limited to the subtests of the Verbal Scale, the Perceptual Performance Scale and the Quantitative
Scale, which, together, form the General Cognitive Index. The scale scores have a mean of 50 and
a standard deviation of 10. The general index has a mean of 100 and a standard deviation of 16.
The test was administered to 26 children with a mean age of 4;7 years. The correlation with
the General Cognitive Index of the MSCA was .61. The highest correlation was with the
Perceptual Performance Scale (r=.61). The correlation with the Verbal Scale was .48 and the
correlation with the Quantitative Scale was .40.
PPVT-R
The Peabody Picture Vocabulary Test requires the child to choose from four pictures the one that
best represents the meaning of a word that has been presented verbally. The standard score on
the test has a mean of 100 and a standard deviation of 15.
The PPVT-R scores of 29 children to whom the SON-R 2½-7 was administered were known
to the school. The mean age at the time of administration of the SON-R 2½-7 was 5;6 years.
The correlation of the Peabody Standard Score with the SON-IQ was .47.
Table 9.9
Age and Sex Distribution of the Children in the American Validation Research
                          Sex
Criterion Test    N       Boys    Girls
WPPSI-R           75      38      37
K-ABC             31      16      15
MSCA              26      12      14
PPVT-R            29      15      14
PLS-3             47      26      21
28
31
24
3
47
45
1
25
Table 9.10
Correlations with Criterion Tests in the American Research

                                            Scores
                                            Criterion      SON-R 2½-7     Age       Interv.
                                                                          (years)   (days)
Criterion Test                  N     r     Mean (SD)      Mean (SD)      Mean      Mean
WPPSI-R                         75                         94.5 (16.6)    5.1       14
  Full Scale IQ                       .59   96.8 (13.9)
  Performance IQ                      .60   98.3 (14.9)
  Verbal IQ                           .43   96.1 (13.0)
K-ABC                           31                         86.1 (20.9)    4.6       16
  Mental Processing Composite         .66   97.3 (16.0)
  Simultaneous Processing             .58   96.2 (19.3)
  Sequential Processing               .29   98.3 (13.3)
  Achievement Scale                   .58   96.0 (13.9)
  Nonverbal Scale                     .61   96.5 (15.5)
MSCA                            26                         95.0 (19.1)    4.6       13
  General Cognitive Index             .61   102.3 (19.3)
  Verbal Scale                        .48   50.8 (13.3)
  Perceptual-Perform. Scale           .61   52.2 (10.2)
  Quantitative Scale                  .40   49.9 (10.9)
PPVT-R                          29                         95.5 (15.3)    5.5       -
  Standard Score Equivalent           .47   95.7 (19.7)
PLS-3                           47                         91.4 (18.3)    4.6       -
  Total Language Score                .61   102.7 (19.8)
  Auditory Comprehension              .59   103.6 (18.8)
  Expressive Communication            .56   101.3 (18.6)

the age is the age at the time of administration of the SON-R 2½-7
the correlations have been corrected for the variance of the SON-IQ
PLS-3
The Preschool Language Scale-3 is a test for the receptive and expressive language ability of
young children. Separate scores are calculated for Auditory Comprehension and Expressive
Communication. Together they form the Total Language Ability Score. The three standardized
scores have a mean of 100 and a standard deviation of 15.
The test was administered to 47 children. The mean age at the time of administration of the
SON-R 2½-7 was 4;7 years. The correlation with the Total Score of the PLS-3 was .61. The
correlation with the Receptive Language Ability (r=.59) was slightly higher than the correlation
with the Expressive Language Ability (r=.56).
corresponds to group three in Dutch primary education. The mean age was 6;3 years with a
standard deviation of 3 months. The group consisted of 34 boys and 24 girls. The schools
selected children belonging to one of the following three groups: the control group (children
without specific problems and handicaps, N=20); the ESL group (English as a Second
Language, N=22) and the LD-group (Learning Disabled, N=16).
The shortened version of the BAS was administered. This consists of four subtests (Naming
Vocabulary, Digit Recall, Similarities and Matrices), supplemented by two nonverbal subtests
(Block Design and Visual Recognition). In addition to the IQ score for the shortened version and
the combination of six subtests, the mean score for the three verbal tests (Naming Vocabulary,
Digit Recall and Similarities) and the three nonverbal tests (Matrices, Block Design and Visual
Recognition) were also calculated. The IQ scores have a mean of 100 and a standard deviation
of 15; the subtest scores have a mean of 50 and a standard deviation of 10.
In table 9.11, the mean scores for the entire group and for the different subgroups are
presented, together with the correlations of the scores on the BAS with the SON-IQ. The
correlation with the shortened version of the BAS was .80 in the entire group. When two
nonverbal subtests were added to the shortened version of the BAS, the correlation increased to
.87. The correlation with the three nonverbal tests (r=.78) was higher than with the three verbal
tests (r=.71), but even the latter was high.
Within the three subgroups the correlations of the SON-IQ with the BAS IQ, based on six
subtests, and with the nonverbal tests, had comparably high values. In the control group, however, the correlations of the SON-IQ with the shortened version of the BAS, and with the three
verbal subtests, were clearly lower than in the other groups.
In the entire group, the IQ scores on the SON-R 2½-7 were, on average, 7 points lower than
on the shortened version of the BAS. The difference in IQ scores between the control group and
the ESL group was slightly less for the SON-R 2½-7 (20.8 points) than for the shortened form of
the BAS (23.9 points). When the two nonverbal tests were added to the BAS IQ, the difference
on the BAS between the two groups decreased to 19.5 points. The difference between the
Table 9.11
Correlations with the BAS in Great Britain

Mean and Standard Deviation
                             Entire         Control        English        Learning
                             group          group          2nd language   problems
                             (N=58)         (N=20)         (N=22)         (N=16)
SON-IQ                       83.6 (20.4)    102.7 (14.4)   81.9 (11.2)    61.9 (11.9)
BAS IQ (shortened version)   90.6 (20.0)    111.5 (10.0)   87.6 (11.7)    68.7 ( 9.8)
BAS IQ (6 Subtests)          92.4 (18.8)    111.2 ( 8.7)   91.7 (10.8)    69.9 ( 8.8)
Mean of 3 verbal tests       44.1 ( 9.4)    54.3 ( 5.6)    41.4 ( 5.6)    35.0 ( 4.2)
Mean of 3 nonverbal tests    49.2 ( 9.8)    56.4 ( 7.0)    51.1 ( 6.9)    37.6 ( 4.8)

Correlation with the SON-IQ
                             Entire         Control        English        Learning
                             group          group          2nd language   problems
BAS IQ (shortened version)   .80            .56            .76            .78
BAS IQ (6 Subtests)          .87            .83            .85            .87
Mean of 3 verbal tests       .71            .35            .60            .73
Mean of 3 nonverb. tests     .78            .69            .81            .81

the correlations have been corrected for the variance of the SON-IQ
control group and the LD group was 40.8 points for the SON-IQ and 42.8 points for the
shortened BAS. For the BAS IQ based on six subtests, the difference was 41.3 points.
Table 9.12
Overview of the Correlations with the Criterion Tests
Test
Country
Group
P-SON
SON-R 2,-7
SON-R 5,-17
SON-R 5,-17
NL
NL
NL
NL
special groups
stand.research
special groups
stand.research
188
141
18
119
Stutsman
NL
special groups
TONI-2
NL
BOS 2-30
BOS 2-30
Intelligence/Development
General
Nonverbal
Verbal
IQ
IQ
IQ
IQ
9.4
9.1
9.4
9.1
42
.57 IQ
9.4
prim.education
153
.51 IQ
9.2
NL
NL
stand.research
special groups
50
26
.59 MS
.53 Nonv.
.50 Nonv.
9.1
9.4
K-ABC (GOS)
K-ABC
NL
US
stand.research
prim.education
115
31
.65 GCI
.66 GCI
.61 Nonv.
9.1
9.6
WPPSI/WPPSI-R
WPPSI/WISC-R
WISC-R
WPPSI-R
WPPSI-R
WPPSI-R
NL
NL
NL
AU
AU
US
special groups
special groups
OVB-schools
special groups
prim.education
prim.education
39
73
41
96
59
75
.75 FSIQ
.59 FSIQ
.83
.63
.73
.77
.74
.60
MSCA
US
prim.education
26
.61
BAS (shortened)
GB
mixed group
58
LDT
LDT
NL
NL
OVB-schools
special groups
71
80
RAKIT (short)
RAKIT (short)
RAKIT
NL
NL
NL
stand.research
special groups
OVB-schools
165
70
71
DTVP-2
NL
prim.education
153
.73 GVP
9.2
TOMAL
NL
prim.education
153
.45 NMI
9.2
PPVT-R
US
prim.education
29
.47
9.6
PLS-3
US
prim.education
47
.61
9.6
TvK
TvK
NL
NL
stand.research
special groups
108
49
.59 (5s)
.53 (4s)
9.1
9.4
Reynell (old)
Reynell (new)
NL
NL
special groups
stand.research
179
558
.44 LC
.48 LC
9.4
9.1
Schlichting
Schlichting
Schlichting
Schlichting
NL
NL
NL
NL
stand.research
stand.research
stand.research
stand.research
558
558
56
241
.35
.45
.54
.27
9.1
9.1
9.1
9.1
.65
.79
.66
.76
sec.
.54 VIQ
.43 VIQ
9.4
9.4
9.3
9.5
9.5
9.6
.61
.48
9.6
.87 (6s)
.78 (3s)
.71 (3s)
9.7
.58 IQ
.60 BP
.67 (3s)
.49 (3s)
.20 (3s)
9.3
9.4
.51 (4s)
9.1
9.4
9.3
.62 FSIQ
.74 FSIQ
PIQ
PIQ
PIQ
PIQ
PIQ
PIQ
.49 VIQ
.60 VIQ
.60
.54
the correlations have been corrected for the variance of the SON-IQ
(3s) signifies score based on 3 subtests
NL (Netherlands); GB (Great Britain); US (United States of America); AU (Australia)
sec: the section in which the research has been described
SD
WD
Lex
AM
Table 9.13
Difference in Scores between SON-IQ and PIQ of the WPPSI-R (N=230)

Frequency Distribution of the Absolute Difference in Scores
                    0-9     10-19   20-29   30-39   40-49   50-59
No correction       54%     34%     10%     0.4%    0.9%    0.4%
Correction mean     66%     25%     7%      0.4%    1.3%    0%

Child   Sex     Age     Country   SON-IQ   PIQ
A       boy     4;4     Aust.     79       130
B       girl    4;8     US        68       110
C       girl    4;11    US        124      86
D       girl    5;1     US        113      71
are tested decreases and as the period between the test administrations increases (Bayley, 1949).
As described in sections 9.1 and 9.4, on the basis of various analyses, the correlation of the
SON-IQ with the criterion tests increased greatly as the age at which the tests were administered
increased. The facts that part of the research was done with children who were difficult to test
and that a shortened version of the criterion tests was often administered are also factors
contributing to the weakening of the correlations.
In order to illustrate the occasionally very large discrepancies between the scores on the SON-R
2½-7 and tests that correspond closely in content, we shall make a further comparison of the
differences in scores between the PIQ of the WPPSI-R and the SON-IQ. The comparison is
based on the results of the 155 children who were tested in Australia and the 75 children who
were tested in West Virginia with the WPPSI-R. In these research projects the interval between
the two test administrations was generally limited to a few weeks. In table 9.13 the frequency
distribution of the absolute differences between the PIQ and the SON-IQ is presented. These
scores were also calculated after first correcting for the difference in means, so that possible
discrepancies in standardization of the tests do not play a role; five points were deducted from
the PIQ for this.
After correcting for the means, the differences in scores for two thirds of the children were
slight (less than 10 points). For a quarter of the children, the differences ranged from 10 to 19
points. In the case of 9% of the children the differences were larger than 20 points and for four
of these children the differences were quite extreme, i.e. 30 points or more. The scores of these
four children are presented in the second part of table 9.13.
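The classification into 10-point bands, with and without the five-point correction of the PIQ, can be sketched as follows. This is a hedged illustration: the function name `band` and the dictionary layout are hypothetical, and the scores are those listed for children A to D in the second part of table 9.13.

```python
def band(son_iq, piq, piq_correction=0):
    """Lower bound of the 10-point band of the absolute score difference.

    piq_correction is the number of points deducted from the PIQ before
    the difference is taken (5 in the text, 0 for the uncorrected case).
    """
    diff = abs(son_iq - (piq - piq_correction))
    return (diff // 10) * 10

# (SON-IQ, PIQ) for the four children with extreme differences
children = {"A": (79, 130), "B": (68, 110), "C": (124, 86), "D": (113, 71)}

uncorrected = {k: band(s, p) for k, (s, p) in children.items()}
corrected = {k: band(s, p, piq_correction=5) for k, (s, p) in children.items()}

print(uncorrected)  # A falls in 50-59, B and D in 40-49, C in 30-39
print(corrected)    # B falls in 30-39, A, C and D in 40-49
```

With 230 children, one child corresponds to about 0.4% and three children to about 1.3%, which agrees with the percentages given for the bands of 30 points and higher in the first part of the table.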
Two children, a boy with impaired hearing tested in Australia and a girl tested in West
Virginia, scored substantially lower on the SON-R 2½-7 than on the performance section of the
WPPSI-R. In the latter case the performance on the SON-R 2½-7 was possibly influenced
negatively by the fact that the child had been tested on the WPPSI-R earlier that day. Two girls,
both tested in West Virginia, scored substantially higher on the SON-R 2½-7 than on the
WPPSI-R. Neither of these girls functioned well socially and both were difficult to test.
These extreme cases, in which a child performed far below his or her potential on one of the
tests, had a strong negative influence on the correlations. In the Australian research the
correlation increased from .78 to .80 if the deviating subject was left out of the calculation. In the
American research the correlation increased from .60 to .74 if the three children with deviating
scores were left out.
The examples show that extreme underperformance can occur with the SON-R 2½-7 as well
as with the WPPSI-R. This certainly also applies to other intelligence tests. In chapter 10 the
significance of this for diagnostic work with young children is examined.
Test                        N      PS     RS     Diff.   Country   Group             sec.
WPPSI/WISC-R                73                           NL        special groups    9.4
  FSIQ                             .54    .58    .04
  PIQ                              .59    .53    .06
  VIQ                              .39    .50    .11
WPPSI-R                     75                           US        prim.education    9.6
  FSIQ                             .59    .53    .06
  PIQ                              .64    .51    .13
  VIQ                              .42    .42    .00
WPPSI-R                     59                           AU        control group     9.5
  FSIQ                             .67    .62    .05
  PIQ                              .77    .53    .24
  VIQ                              .38    .53    .15
WPPSI-R                     96                           AU        special groups    9.5
  PIQ                              .90    .74    .16
BAS                         58                           GB        entire group      9.6
  IQ Shortened vers.               .77    .86    .09
  3 Nonverbal subt.                .82    .79    .03
  3 Verbal subtests                .68    .82    .14
LDT                         80                           NL        special groups    9.4
  Total IQ                         .47    .56    .09
  3 Performance subt.              .66    .49    .17
  2 Memory tests                   .30    .47    .17
  3 Verbal subtests                .07    .32    .25
LDT                         71                           NL        OVB schools       9.3
  Block Patterns                   .62    .45    .17
  3 Verbal subtests                .39    .47    .08
RAKIT                       165                          NL        norm group        9.1
  IQ Shortened vers.               .58    .46    .12
RAKIT                       70                           NL        special groups    9.4
  IQ Shortened vers.               .42    .52    .10
DTVP-2                      153    .78    .52    .26     NL        prim.education    9.2
Reynell                            .36    .54    .18     NL        special groups    9.4

codes for the countries: NL (The Netherlands); GB (Great Britain); US (United States of America);
AU (Australia)
Australian research with the WPPSI-R, the children with impaired hearing were combined with
the children with learning problems when calculating the correlations.
The Performance Scale of the SON-R 2½-7 clearly had a stronger correlation than the
Reasoning Scale with:
- the performance scale of the Wechsler tests,
- the performance subtests of the LDT,
- the DTVP-2, the test for visual perception.
The Reasoning Scale of the SON-R 2½-7 clearly had a stronger correlation than the
Performance Scale with:
- the verbal scale of the Wechsler tests,
- the verbal subtests of the BAS,
- the verbal subtests and the memory tests of the LDT,
- the Reynell Test for Language Comprehension.
The results in two research projects with the shortened version of the RAKIT were contradictory. In the case of the DTVP-2, the large difference in correlations was caused mainly by the
subtests of the scale for Visual Motor Integration; the difference here was .34. The difference
between the correlations with the scale for Motor Reduced Visual Perception was .14. In the
standardization research the difference in correlations with the Reynell Test for Language
Comprehension and the Schlichting Test for Language Production was slight. The difference in
two research projects with the TvK had a mean of -.08.
The results support the distinction that was made on the basis of the analysis of the internal
structure of the test (section 5.4). They indicate that two aspects of general intelligence are
represented in the SON-R 2½-7: on the one hand the perceptual performance tasks, related to
spatial understanding and visual motor skills, and on the other hand the tasks that require
abstract and concrete reasoning. These latter tasks have a stronger relationship with verbal
intelligence and language skills. Because of this, the SON-R 2½-7 is more versatile than a
nonverbal intelligence test that is limited to specific performance tasks.
from country to country and from test to test. As a result of a general improvement in performance, the norms will become stricter for a new test, and the scores will be lower than on tests
standardized some time ago. A similar effect was observed in the Netherlands during the revision
of the WISC-R (Harinck & Schoorl, 1987) and the SON-R 5½-17 (Snijders, Tellegen & Laros,
1989). In an American comparison of the WISC-III with the WISC-R (Wechsler, 1991), and of
the WPPSI-R with the WPPSI (Wechsler, 1989), the increase for the FSIQ was 3.4 points per 10
years (averaged over both tests); for the PIQ this was 4.3 points and for the VIQ 1.9 points.
In table 9.15 the mean scores on the SON-R 2½-7 and the most important criterion tests are
presented. When possible, the results of different research groups were combined (the section in
which the research is described is referred to in the table). Neither criterion tests that were
administered to fewer than 50 children, nor specific verbal tests are shown in this table.
Furthermore, a distinction was made between criterion tests that were scored according to Dutch,
American and English norms.
The differences between the mean scores were also corrected for the interval between the
publication of the criterion test and the publication of the SON-R 2½-7 (1996). Unfortunately
Table 9.15
Comparison Between the Mean Test Scores of the SON-R 2½-7 and the Criterion Tests

                                    Mean              Year   Difference
Dutch Norms                                           Type   without/with
Criterion Test               N      SON-R    Crit.           correction       section
P-SON
  Total IQ                   188    84.9     97.9     75 p   13.0    4.0      [9.4]
BOS
  Mental Scale               50     103.0    100.5    83 g   2.4     6.8      [9.1]
  Nonverbal Scale            76     96.5     97.5     83 p   1.0     4.6      [9.1/9.4]
GOS                          115    102.9    104.4    93 g   1.5     .5       [9.1]
RAKIT
  Short.version IQ           205    98.0     97.9     84 g   0.0     4.1      [9.1/9.4]
LDT
  Total IQ                   80     81.6     85.0     76 g   3.3     3.5      [9.4]
  Block patterns             71     92.3     97.8     76 p   5.5     3.1      [9.3]
WISC-R
  Total IQ                   61     88.9     89.0     86 g   .1      3.3      [9.3/9.4]
  Performance IQ             61     88.9     88.6     86 p   .3      4.6      [9.3/9.4]

                                    Mean              Year   Difference
American Norms                                        Type   without/with
Criterion Test               N      SON-R    Crit.           correction       section
TONI-2
  IQ Form A                  153    102.4    103.5    90 p   1.2     1.4      [9.2]
DTVP-2                       153    102.4    109.2    93 p   6.8     5.5      [9.2]
TOMAL                        153    102.4    97.5     94 p   4.9     5.7      [9.2]
WPPSI-R
  Total IQ                   134    100.8    103.6    89 g   2.7     .3       [9.5/9.6]
  Performance IQ             230    94.3     98.8     89 p   4.6     1.6      [9.5/9.6]

                                    Mean              Year   Difference
English Norms                                         Type   without/with
Criterion Test               N      SON-R    Crit.           correction       section
BAS
  Short.version IQ           58     83.6     90.6     79 g   7.1     1.3      [9.7]
most test manuals do not give any information about the period in which the norm data were
gathered. If the interval between gathering the norm data and the publication of the test was
known to be much longer than three years, this was taken into account (in the case of the GOS
the interval was six years). In the absence of reliable data about the obsolescence of the norms in
relation to country and test, the strength of the correction was based on the aforementioned
American results of the WPPSI-R and the WISC-III. For each year between the publication of
the criterion test and the SON-R 2½-7, .34 point was deducted from the mean scores for general
intelligence measures and .43 point for performance and nonverbal measures.
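The correction rule can be written out as a small calculation. The sketch below is an illustration only: the function and constant names are hypothetical, the sign convention (SON-R mean minus corrected criterion mean, so that a negative value means a lower SON-R score) is an assumption, and the extra adjustment for a long interval between norm collection and publication is not modelled. The example means and publication years are taken from table 9.15.

```python
SON_R_YEAR = 1996  # publication year of the SON-R 2½-7, as given in the text

# Points deducted from the criterion mean per year of norm age:
# "g" = general intelligence measure, "p" = performance or nonverbal measure
RATE_PER_YEAR = {"g": 0.34, "p": 0.43}

def corrected_difference(son_mean, crit_mean, crit_year, kind):
    """SON-R mean minus the criterion mean after the obsolescence correction."""
    years = SON_R_YEAR - crit_year
    adjusted_crit = crit_mean - RATE_PER_YEAR[kind] * years
    return son_mean - adjusted_crit

# Preschool SON (published 1975) and DTVP-2 (published 1993), both type "p"
p_son = corrected_difference(84.9, 97.9, 1975, "p")
dtvp2 = corrected_difference(102.4, 109.2, 1993, "p")
print(round(p_son, 1), round(dtvp2, 1))
```

For these two rows the rounded results have the same size as the corrected differences listed in the table (4.0 and 5.5). For other rows the result can deviate by about a tenth of a point, because the table was presumably computed from unrounded means.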
When no correction was performed, the differences in means between the SON-R 2½-7 and
the criterion tests were slight for the tests that were standardized in the Netherlands after 1980.
The scores on the SON-R 2½-7, however, were considerably lower than scores on the Preschool
SON (published in 1975) and also clearly lower than the scores on the LDT (1976). After
correction for the year of publication, the scores on the SON-R 2½-7 were, in general, 3 to 4
points higher than scores on the other tests that were standardized in the Netherlands. However,
even after correction, the scores on the SON-R 2½-7 were 4 points lower than the scores on the
Preschool SON. This supports the impression gained from practical experience that the norms
of the Preschool SON were much too easy. The reason for the relatively large difference with
the mean scores on the BOS may be the fact that both tests were administered at two years of
age. A ceiling effect occurs on the BOS at this age and a floor effect occurs on the SON-R
2½-7.
The fact that the scores on the SON-R 2½-7, after correction, were generally higher than on
the other tests could mean that the increase in the intelligence scores of Dutch children in the
last ten to fifteen years is less than we have assumed on the basis of the American data. It could
also mean that a number of children from the special groups and the immigrant group, with
whom part of this research was carried out, profited more from the specific characteristics of the
SON-R 2½-7, such as the nonverbal character and the feedback. When comparing the SON-R
2½-7 with the most recently standardized test, the GOS, which was administered to 115
children during the standardization research, little difference was found in the mean scores.
When comparing tests, using American norms, the scores on the SON-R 2½-7 were lower
than the scores on the American tests (with the exception of the TOMAL). However, after
correction, no differences were found, on average, with the different tests. The difference with
the total score on the WPPSI-R was minimal. With the PIQ the difference was -1.6 and with the
IQ score on the TONI-2 the difference was 1.4. However, a large negative difference occurred
with the DTVP-2 and an equally large positive difference occurred with the TOMAL.
When the English norms for the BAS were used, a large difference, 7 points, was found.
After correction for obsolescence of the norms, this difference practically disappeared.
These results indicate that a strong similarity exists in the development of the (nonverbal)
intelligence of children in the Netherlands, the United States of America and Great Britain, and
that the Dutch age norms of the SON-R 2½-7 can be used in Western countries for a broad
assessment of intelligence. However, standardizations conducted on a national level remain
preferable, in order to arrive at more precise norms at the subtest level, and at a better
determination of the dispersion and the form of the score distributions.
In the comparisons, the correlations of the SON-R 2½-7 and of a number of criterion tests with
other variables were calculated. The comparison between the SON and the other test was
always based on the same group of children. As these correlations were examined within a
group, and not between groups, they were not corrected for the variance of the SON-IQ.
Evaluation of testability
As with the SON-R 2½-7, the children who completed the GOS 2½-4½ or the RAKIT in the
framework of the standardization research were evaluated, after the test, by the examiner on
motivation, concentration and understanding of the directions. In table 9.16 the percentage of
children who were given the evaluation 'good' on these aspects is presented for the
children who were evaluated on the SON and the GOS (N=107), and for the children who were
evaluated on the SON and the RAKIT (N=169).
The children were more frequently evaluated as being well motivated and well concentrated
during the administration of the SON-R 2½-7 than during the administration of the GOS and the
RAKIT. The difference in percentages varied from 10% to 18%. The evaluation of comprehension
of directions was also more often positive with the SON-R 2½-7; the difference with both
other tests was about 6%.
The percentages were lower in the comparison of the SON and the GOS than in the comparison
of the SON and the RAKIT. This was the result of the younger ages at which the SON-GOS
combination was administered.
The results are an indication that the attractiveness and variety of the testing materials of the
SON-R 2½-7, the opportunity for the child to be active, the help and feedback given, the limits
on the administration of difficult items, the absence of the necessity to talk, and the extensive
directions, have been successful in allowing the children to do the test in the best possible
circumstances.
Table 9.16
Comparisons between Tests of the Evaluation of the Subjects' Testability

Percentage of the children with evaluation 'good'
                       SON-R 2½-7    GOS 2½-4½     Difference
                       (N=107)       (N=107)
Motivation             79%           64%           15%
Concentration          77%           62%           15%
Compr. directions      79%           74%            5%

                       SON-R 2½-7    RAKIT         Difference
                       (N=169)       (N=169)
Motivation             91%           81%           10%
Concentration          85%           67%           18%
Compr. directions      88%           81%            7%
Background variables
The correlations of a number of criterion tests with the SES index and with the distinction
between native Dutch and other subjects have been compared with the correlation between the
SON-R 2½-7 and these variables. The analyses were always carried out in the same group. Due
to missing values, small differences in numbers occur in the correlation with SES index and
native country; in table 9.17 the mean number is shown. Country of origin was dichotomised to
form a native Dutch group (children whose parents were both born in the Netherlands) versus a
group of children one or both of whose parents were born abroad. A positive correlation means
that the
native Dutch children scored higher on the test. The comparisons were limited to the
standardization research and the research at primary schools. In table 9.17 the column headed by
'difference' shows the difference between the correlation of the SON-R 2½-7 and the correlation
of the criterion test with the variable. A positive difference means that the SON-R 2½-7 had
a stronger correlation with the background variable.
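How such a comparison works can be illustrated with a short sketch. Everything in it is hypothetical: the four records, the helper names and the coding of the native Dutch group as 1 and the other group as 0. It assumes that the correlation with the dichotomised variable is an ordinary Pearson correlation with this 0/1 coding, which the text does not state explicitly.

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def code_native(parents_born_in_nl):
    """1 if both parents were born in the Netherlands, otherwise 0."""
    return 1 if all(parents_born_in_nl) else 0

# Hypothetical records: (mother born in NL, father born in NL), SON-IQ, criterion score
records = [
    ((True, True), 105, 110),
    ((True, False), 98, 92),
    ((False, False), 95, 85),
    ((True, True), 102, 108),
]

native = [code_native(parents) for parents, _, _ in records]
r_son = pearson(native, [son for _, son, _ in records])
r_crit = pearson(native, [crit for _, _, crit in records])

# A negative difference means the SON-R is less strongly related to the
# background variable than the criterion test, within the same group.
difference = r_son - r_crit
print(round(r_son, 2), round(r_crit, 2), round(difference, 2))
```

Because both correlations are computed on the same children, the difference is not affected by differences in group composition between studies.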
Most of the comparisons indicated that the SON-R 2,-7 correlated less strongly with the
SES level of the parents than the other tests. In nine of the thirteen comparisons involving
absolute differences of .05 or more, the differences were negative. The correlation of the
SON-R 2,-7 with the SES index was considerably weaker (.10 or more) than the GOS, the total
IQ on the WISC-R, the DTVP (visual perception), the verbal subtests of the RAKIT, the verbal
scale of the WISC-R and the TvK (language test). On the other hand, the correlation of the
SON-R 2,-7 with the SES index was considerably higher (.10 or more) than those of the BOS,
the TOMAL (nonverbal memory) and the performance scale of the WISC-R.
Nearly all comparisons showed that the differences in performance between the native Dutch
and the immigrant children were smaller on the SON-R 2½-7 than on the other tests. This was
particularly so for the BOS, the GOS, and the DTVP, the total score on the WISC-R and the
verbal scale of the WISC-R, and for the language tests (the Reynell/Schlichting Test and the
TvK). Only on the TONI were the differences between the native Dutch and the immigrant
children clearly smaller than on the SON-R 2½-7.
The differences between the SON-R 2½-7 and the SON-R 5½-17 were slight for both
background variables.
These comparisons demonstrate that the performance on the SON-R 2½-7 is less dependent on
social and cultural differences than the performance on tests that (partially) require verbal
knowledge and skills, like general and verbal intelligence tests and language tests.

Table 9.17
Comparisons Between Tests in Relation to Socioeconomic and Ethnic Background

                                     Correlation with          Correlation with
                                     SES index                 Dutch/Immigrant
                                N    Crit.   SON-R   Diff.     Crit.   SON-R   Diff.
                                     test    2½-7              test    2½-7
Standardization Research
SON-R 5½-17                   118    .21     .24     .04       .11     .17     .06
BOS 2-30                       50    .12     .23     .11       .30     .03    -.27
GOS 2½-4½                     115    .54     .39    -.15       .17     .06    -.11
RAKIT (shortened version)     168    .48     .43    -.05       .16     .16     .00
REYNELL/SCHLICHTING
  Mean LC, SD and WD          557    .39     .34    -.05       .16     .04    -.12
TvK                           108    .52     .40    -.11       .23     .05    -.18

TONI-2                        141    .39     .48     .09       .06     .24     .18
TOMAL                         141    .32     .48     .16       .16     .24     .07
DTVP-2                        141    .59     .48    -.11       .36     .24    -.12

OVB-Schools
[label missing]                65    .47     .49     .03       .06     .01    -.05
[label missing]                65    .57     .49    -.07       .07     .01    -.06
[label missing]                65    .59     .49    -.10       .07     .01    -.06
WISC-R
  Total IQ                     40    .49     .38    -.11       .27     .13    -.14
  Verbal Scale                 40    .56     .38    -.18       .26     .13    -.13
  Performance Scale            40    .28     .38     .10       .22     .13    -.09
Table 9.18
Comparisons Between Tests in Relation to Evaluation of Intelligence and Language Skills

                                     Correlation with          Correlation with
                                     evaluation of             evaluation of
                                     intelligence              language developm.
                                N    Crit.   SON-R   Diff.     Crit.   SON-R   Diff.
                                     test    2½-7              test    2½-7
Standardization Research
SON-R 5½-17                   116    .47     .47     .00       .41     .34    -.07
RAKIT (shortened version)     158    .62     .42    -.20       .58     .43    -.14
REYNELL/SCHLICHTING
  Mean LC, SD and WD          285    .50     .47    -.03       .54     .49    -.05
TvK (5 subtests)               95    .56     .62     .06       .51     .56     .05

Special Groups
P-SON                         152    .75     .74     .00       .33     .35     .02
STUTSMAN                       40    .61     .79     .18       .64     .70     .06
WPPSI/WPPSI-R
  Performance Scale            39    .45     .49     .03       .41     .35    -.06
WPPSI/WISC-R
  Total IQ                     68    .74     .65    -.09       .74     .55    -.19
  Verbal Scale                 68    .63     .65     .01       .73     .55    -.18
  Performance Scale            68    .70     .65    -.05       .62     .55    -.06
LDT
  Total IQ                     77    .69     .53    -.16       .49     .23    -.26
  Performance subtests         77    .50     .53     .03       .19     .23     .05
  Memory subtests              77    .55     .53    -.02       .39     .23    -.15
  Verbal subtests              77    .52     .53     .01       .52     .23    -.29
RAKIT
  Shortened version or
  mean of 4 subtests           69    .49     .60     .11       .27     .41     .14
REYNELL
  Language compreh. A         141    .55     .75     .21       .29     .36     .08
TvK
  Mean of 4 subtests           49    .45     .72     .27       .29     .27    -.02
In the previous chapters a detailed description was given of the results of the research carried
out with the SON-R 2,-7 to date. A summary of important results is presented here. In the
summary, the following questions will be answered:
- Have the objectives of the revision been realized?
- Does the test provide a valid measurement of intelligence?
- For whom is the test suitable?
- How should the results be interpreted?
Testing materials
New testing materials were developed, and existing material was completely renewed. The
number of items almost doubled and the number of subtests was increased from five, as in the
Preschool SON, to six. Our experience with the test suggests that the materials are attractive for
children and that the drawings and directions are clear. The storage system has been greatly
improved and the materials are very manageable and durable.
Directions
The description of the directions is much more detailed than in the previous version of the test.
This requires a greater effort from the examiner in learning how to administer the test. However,
it prevents the examiner from giving a personal interpretation of the directions, which would
result in the test not being administered in a standardized manner. The directions leave sufficient
room to adapt the administration to specific characteristics of the child.
The directions show clearly how to provide feedback and help. This is important as the way
in which directions are given differs from that of most intelligence tests. In comparison with the
Preschool SON, feedback and help are offered more consistently and are described in more
detail.
Norms
In the Preschool SON, age norms were given only for the total score on the test. The SON-R
2,-7 has norms at the level of the subtests, the scale scores (SON-PS and SON-RS), and the
total score (SON-IQ). Furthermore, the general norms are based on a large sample of 1124
children, instead of 500 children as with the Preschool SON. The statistical fitting procedure
used with the SON-R 2½-7 increases the accuracy of the norms still further. Weighting the
sample with respect to a number of variables related to intelligence (SES level, mother's country
of birth, and sex) prevents differences between age groups, with regard to these variables, from
influencing the accuracy of the norms.
The age range of the norms has been extended, for practical purposes, from 2;6 to 8;0 years.
The norms are rather precisely differentiated according to age. In the norm tables monthly norm
groups are distinguished, whereas the computer program bases the norms on the exact age.
Differentiated norms are very important for testing young children. For each age group in the
standardization research, the change in the IQ score that would result from using the norms for
children who were one month older was determined (table 10.1). For the two and three-year-olds the difference was approximately 3 IQ points, for the four and five-year-olds it was 2 IQ
points and for the older children 1 IQ point. If three-monthly norm tables had been used (as is
the case in WPPSI-R), then the administration of the test one day earlier or later could result in
a difference of 9 IQ points for a child on the border between two age groups. The systematic
deviation (upwards or downwards) on the borderline between two age groups of the tables is
then 4 to 5 points. In the Preschool SON, with age groups of half a year, these deviations were
even larger. By using monthly age groups, the systematic deviations for the youngest children
are at most 1 or 2 IQ points with the SON-R 2,-7.
The most precise results, certainly for the youngest age groups, are obtained with the computer program. Furthermore, the program makes an accurate calculation of the scaled test results
possible when the test has not been administered in full. Finally, the program can calculate the
reference age for the total scores. This can only be approximated using the tables.
Table 10.1
Mean Change in IQ Score Over a Period of One Month

Age          Diff.     Age          Diff.     Age          Diff.
2;3 years    3.9       4;3 years    2.3       6;3 years    1.4
2;9 years    3.3       4;9 years    2.0       6;9 years    1.1
3;3 years    2.9       5;3 years    1.8       7;3 years     .9
3;9 years    2.6       5;9 years    1.6
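
The effect of the width of the norm groups, as discussed above, can be illustrated with a small calculation. This is only a sketch under stated assumptions: the monthly changes are copied from table 10.1, the IQ score is assumed to change linearly with age within a norm group, and the tabled norms are assumed to be exact at the midpoint of each group. The function names are hypothetical and do not refer to the SON-R computer program.

```python
# Mean change in IQ per month of age, from table 10.1 (age in years;months).
MONTHLY_CHANGE = {
    "2;3": 3.9, "2;9": 3.3, "3;3": 2.9, "3;9": 2.6,
    "4;3": 2.3, "4;9": 2.0, "5;3": 1.8, "5;9": 1.6,
    "6;3": 1.4, "6;9": 1.1, "7;3": 0.9,
}


def border_jump(age: str, band_months: int) -> float:
    """Approximate IQ jump when a child moves from one norm group to the next."""
    return MONTHLY_CHANGE[age] * band_months


def max_systematic_deviation(age: str, band_months: int) -> float:
    """Approximate bias at the edge of a norm group (half of the jump)."""
    return border_jump(age, band_months) / 2


for months in (1, 3, 6):
    jump = border_jump("3;3", months)
    bias = max_systematic_deviation("3;3", months)
    print(f"{months}-month norm groups at 3;3 years: jump {jump:.1f}, bias up to {bias:.1f}")
```

With three-monthly groups the jump at 3;3 years comes to roughly 9 IQ points and the deviation to 4 to 5 points, whereas monthly groups keep the deviation below 2 points even for the youngest children, in line with the figures given in the text.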
generalizability, this would imply that the subtests have no specific reliable variance and that a
uniform level of ability determines the performance on all test items. Research with the SON-R
5,-17, however, has shown that the proportion of specific reliable variance of the subtests
actually increases as the age of the children being examined decreases.
Adaptive procedure
The adaptive procedure, in which the entry and discontinuation rules are applied, was developed
to limit the duration of the administration of the test, and to improve the motivation of the
children. The administration of childish items, ones that are much too easy, as well as the
administration of items that are too difficult, has a demotivating effect. Especially for children
who are uncertain and often feel that they are failing, being confronted with tasks that are above
their level can be very frustrating. Because the administration of a subtest is discontinued after
a maximum of three mistakes, it was often possible to administer the test to children who were
otherwise difficult or impossible to test.
The mean duration of administration, in the different groups of children who have been
examined, was less than one hour. For very young children the duration of administration was
much shorter and for older children the duration of administration was somewhat related to the
level of their ability: children who performed relatively well completed more items and this
took more time.
Correspondence with the SON-R 5½-17
In the first two versions of the SON tests no distinction was made between a test for the younger
and a test for the older children. This distinction was first made in the construction of the
Preschool SON and the SSON. However, there were large differences between the Preschool
SON and the SSON in both content and manner of administration (see section 1.2). One
objective during the construction of the SON-R 2,-7 was to achieve a good correspondence
with the SON-R 5,-17, which was published in 1988. A strong similarity in content now exists
between the difficult items of the subtests Mosaics, Categories, Analogies and Situations of the
SON-R 2,-7 and the easy items of the corresponding subtests of the SON-R 5,-17. In both
tests an adaptive procedure is used and feedback is given. In the case of the SON-R 5,-17,
however, feedback is limited to indicating whether a solution is correct or incorrect. Both tests
also use highly differentiated norms that can be calculated with a computer program.
Strong similarities exist between the materials and the procedures used on the two tests. The
correlation between the SON-R 2,-7 and the SON-R 5,-17, with an interval of three to four
months between tests, was considerable (r=.76) and not much lower than the retest correlation
of the SON-R 2,-7 for children from 4;6 years onwards (r=.81). Sattler (1992) considers an
overlap in the content of tests such as the WPPSI-R and the WISC-III not very desirable,
because the tests can no longer be administered within a short period as independent tests. The
reason for the overlap of the age norms in the SON tests, however, is not to make retests within
a short period possible, but to offer a choice that is optimal with respect to the age, skills and
specific problems of the child.
Congruent validity
The relationship with other indicators of intelligence was investigated by examining the correlations with evaluations by other persons, and with performance on other intelligence tests.
The correlation between the SON-IQ and primary school teachers' evaluations of the
children's intelligence was .46. For children from special education programs and at medical
pre-school daycare centers, the SON-IQ had a correlation of .66 with the evaluation of the
cognitive development given with the referral to the school in question. The evaluation of
intelligence, generally given after the administration of the test for the children from these
groups, had a correlation of .68 with the SON-IQ. For the children with a
language/speech/hearing disorder, the correlation between the SON-IQ and the subsequent evaluation of intelligence was .61. The fact that the correlation within mainstream primary education was lower
than within the special groups, is not surprising. In the first years of primary education, cognitive development is not studied systematically. However, children in the special groups are
given an extensive psychological examination at the time of admission, and subsequently,
intelligence tests are administered at regular intervals to follow their development.
In comparison with general, partially verbal intelligence tests, the correlations between the
SON-R 2,-7 and the evaluation of intelligence are slightly lower. In comparison with the
performance section of such tests and the Stutsman, the correlations are practically similar or
higher.
The mean correlation of the SON-R 2,-7 with general intelligence tests and with nonverbal
tests was .65. Approximately half the correlations with general intelligence tests lay between
.59 and .70 and approximately half the correlations with nonverbal (intelligence) tests lay
between .59 and .75.
Correlations higher than .70 were found with the total score on the WPPSI-R (.75), the
WISC-R (.74) and the shortened version of the BAS (.87); with the performance section of the
WPPSI(-R) and WISC-R (.73 and .83) and the BAS (.78); with the DTVP-2 (.73) and with the
SON-R 5,-17 (.76). These results indicate that a reasonably strong correlation existed between
the score on the SON-R 2,-7 and a large number of very diverse (nonverbal) intelligence tests.
However, they also indicate that the child's performance on the SON-R 2½-7 can differ greatly
from his or her performance on another test. These differences can be much larger than may be
expected solely on the basis of the reliability of the tests. Four different causes can be indicated:
- differences in content and procedure between the tests,
- fluctuations in performance,
- stable changes,
- limitations of the research.
Differences in content
Large differences in content can exist between the SON-R 2,-7 and other intelligence tests.
Specifically verbal subtests are absent in the SON, as are memory tests and tests in which a
series of actions must be imitated, such as the sequential subtests of the K-ABC. Further, the
subtests of the SON-R 2,-7 do not have tempo characteristics, whereby simple tasks must be
completed as quickly as possible. In addition to these differences in the content of the items,
there are differences in the manner of administration that may influence the results: for instance,
the help and feedback given during the administration of the SON-R 2,-7 and the limited
number of mistakes before the test is discontinued.
Fluctuations in performance
Fluctuations in the performances of (young) children can also be an important cause of the
difference in scores. The retest correlation of the SON-R 2,-7, with an interval of three to four
months, was .79. This was clearly lower than the reliability of the test (.90), which is based on
the internal consistency. It is not plausible that large stable changes occurring in this short
period account for this difference. The retest correlation may have been lowered slightly by the fact that the learning effect that occurs with a retest is not the same for each child.
The study of differences in scores between the performance scale of the WPPSI-R and the SON-IQ shows that large differences in performance may occur that cannot be explained solely on the
basis of content, stable changes, or errors of measurement. For the younger children, the relationship between the performances on the different subtests of the test was less strong, and the correlations of the SON-IQ
with other test scores were lower. Problems with concentration and
motivation during the administration of the test occurred more often at younger ages. In the age
range for which the SON-R 2,-7 is intended (and especially for the youngest children), fluctuations in performance that are difficult to predict will have to be taken into account. All kinds of
factors can influence performance: how much at ease a child is with a specific examiner,
something that happened on the day of the test administration, feelings of anticipation, physical
condition, like tiredness or beginning influenza, etc. Fluctuations in performance, not related to
characteristics of the subtests, can also occur during the test administration. This could be due to
factors like tiredness during the course of the test administration, physical discomfort, or
increasing motivation as the child begins to feel more at ease in the test situation.
Stable changes
More stable changes in ability may lead to differences in scores when a relatively large interval
occurs between the administration of the tests. The rate at which children develop is not the
same for everyone and will fluctuate. Various factors influence the cognitive development of
children. Large changes in the circumstances in which a child grows up, and important events
may slow development down or, alternately, remove impediments to development. In various
correlational research projects with the SON-R 2,-7, the interval between the administration of
the tests was more than one year and differences in rate of development definitely affected part
of the correlations negatively.
Limitations of the research
In the different phases of the research (administering the tests, scoring, calculating the age,
determining the scaled scores, recording and processing the data), mistakes can also be
made that influence the results. One example is the switching of subjects when matching the
data. This happened during the comparative research of the SON-R 2½-7 and the WPPSI-R
(Tellegen, 1997). In large scale research such mistakes can be, and are, made. Also, knowing the
specific conditions under which each test was administered, and evaluating whether each administration occurred according to the standard directions, becomes more difficult. This problem occurred, for instance, with the test results supplied by the special schools. The extent to
which the standard test administration procedure may have had to be adapted to the specific
problems of different children is not known for these results.
The inaccuracy of norm tables can also be a source of differences between scores. However,
in the case of tests with very broad norm tables (half-yearly or yearly), this was corrected for
during this research. When comparing the test scores, poor correspondence between norms
because of obsolete norms or because the norm groups are not comparable, can lead to large
differences between scores. However, this has generally no effect on the correlations.
During this research, the administration of other tests was, for practical purposes, sometimes
limited to a shortened version, or, in connection with handicaps of the children, to the nonverbal
or performance section. This often limited the reliability and validity of the criterion tests and
also the strength of the correlations.
Conclusions
The limitations of the reliability of the test, the specific characteristics of the contents of the
SON-R 2,-7, and the instability of the test performances of young children seem to us to be the
most important causes of the differences in scores between the SON-R 2,-7 and other (nonverbal) intelligence tests. These factors, which lead to lower correlations, play a smaller part
when children are older. When the influence of stable changes and the limitations of the
research are taken into account, it is realistic to take a value of approximately .70 for the
correlation of the SON-R 2,-7 with other intelligence tests as a point of departure, if the
interval between the administration of the tests is not longer than one year.
Based on this evaluation of the correlation with other intelligence tests, and the data on the
reliability (based on internal consistency) and the stability, the variance of the SON-IQ can be
roughly described as follows (see figure 10.1):
- 10% measurement error variance,
- 10% reliable unstable variance,
- 10% reliable test-specific variance,
- 70% stable reliable variance that is generalizable to other tests.
The last component, the variance that the SON-R 2,-7 has in common with other intelligence
tests administered at a different time, is the most relevant for the evaluation of intelligence, and
the value .70 can be seen as an indication of the validity. However, for very young children, the
validity will be lower due to the lower reliability of the test and greater instability; in older
children the validity will, in accordance, be higher. The proportion of test-specific variance will,
of course, also depend on the extent to which the criterion tests correspond in content and
procedure with the SON-R 2,-7. The validity in this case is based on the correlation with a
(nonverbal) intelligence test administered at another time. However, if we could correlate the
scores on the SON-R 2½-7 with the ideal score based on a large number of other tests,
administered at different times within one year, then the correlation would equal √.70 and the
validity coefficient would be .84.
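
The rough decomposition of the variance described above can be retraced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are assumptions, and the three input values (.90 for reliability, .80 for stability and .70 for the proportion of valid variance) are the rounded figures given in figure 10.1.

```python
import math


def variance_components(reliability: float, stability: float, validity: float) -> dict:
    """Split the total variance (1.0) into the four components named in the text."""
    return {
        "measurement error": round(1.0 - reliability, 2),
        "reliable unstable": round(reliability - stability, 2),
        "reliable test-specific": round(stability - validity, 2),
        "stable generalizable": round(validity, 2),
    }


components = variance_components(reliability=0.90, stability=0.80, validity=0.70)

# Correlation with an error-free 'ideal' score: square root of the valid proportion.
validity_coefficient = round(math.sqrt(0.70), 2)

for name, share in components.items():
    print(f"{name}: {share:.0%}")
print(f"validity coefficient: {validity_coefficient}")
```

The four shares sum to the total variance, and the square root of the valid proportion reproduces the validity coefficient of .84 mentioned in the text.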
Construct Validity
The objective during the development of the first SON test was to overcome the one-sidedness
of the existing performance tests, and to incorporate tasks related to abstract and concrete
reasoning in the nonverbal test. The aim of the SON-test was, and is, to measure general
intelligence as precisely as possible, within the limitations of a nonverbal administration. The
results of the factor analyses on the subtests of the SON-R 2,-7, carried out with very diverse
groups of children, in the Netherlands as well as abroad, support the distinction between three
performance tests (Mosaics, Puzzles and Patterns) and three reasoning tests (Categories,
Analogies and Situations). This is a relative distinction, as the largest part of the common
variance of the subtests can be reduced to one general factor.
Figure 10.1
The Components of the Variance of the SON-R 2½-7 IQ Score
[Figure: the total variance divided into 10% measurement error variance, 10% unstable variance, 10% test-specific variance and 70% stable generalizable variance, with the cumulative values reliability (.90), stability (.80) and proportion of valid variance (.70)]
The significance of the two scales is confirmed by the correlations with other tests. The
Performance Scale had a relatively strong correlation with the performance scale of the Wechsler
tests, the performance section of the LDT and with the DTVP-2. The Reasoning Scale had a
relatively strong correlation with the verbal section of the Wechsler tests, the verbal section of
the LDT and the BAS, and with the Reynell Test for Language Comprehension. These results
show that a broader domain of intelligence is measured by the SON-R 2,-7 than by tests that
consist exclusively of a performance section.
Memory
The SON-R 2,-7, like the SON-R 5,-17, has no specific memory tests. The correlations of the
SON-IQ with two memory tests of the LDT (.43), the TOMAL (.45), the sequential development index of the K-ABC/GOS (.29 and .49), and with auditory memory in the Schlichting Test
(.27), were moderately positive. These relatively low correlations argue strongly in favor of
examining intelligence and memory separately. Incorporating a few memory subtests in an
intelligence test is too restricted a basis for a valid assessment of memory. In addition, the
interpretation of the intelligence score becomes more difficult because memory is a separate
factor.
Visual perception
The correlation of .73 with the DTVP-2, a test for visual perception, shows that visual perception is strongly represented in the SON-R 2,-7. The Performance Scale in particular was
strongly related to the DTVP-2. Perception, in this context, is not passively seeing, but comprises structuring, evaluating and comparing visual information.
Motor skills
The subtests of the Performance Scale of the SON-R 2,-7 in particular require visual-motor
skills. In primary education, the correlation of the SON-IQ with the teacher's evaluation of
motor development was low (.24). However, for children with a developmental delay or with a
language/speech/hearing disorder, these correlations were higher (.46 and .32).
Verbal skills
Knowing the relationship between the SON-R 2,-7 and verbal intelligence and language skills
is important if we are to be able to judge to what extent the domain of intelligence is restricted
by the nonverbal character of the test. However, verbal intelligence is not a clearly defined
concept. On the basis of factor-analytical research, a distinction is made by Kaufman (1975) in
the verbal scale of the WISC-R between the factors Verbal Comprehension and Freedom from
Distractibility. Further, the factor Verbal Comprehension includes quite diverse subtests, for
instance, Similarities, a verbal subtest of abstract reasoning, and Vocabulary, a test tapping
verbal knowledge. The skills required for the subtest Similarities belong to the intelligence
domain that the SON-R 2,-7 is intended to measure. The performance on a subtest like Vocabulary is so dependent on the circumstances in which a child grows up that a nonverbal alternative for this test is not feasible for the SON-R 2,-7. On the K-ABC, subtests which clearly tap
verbal knowledge are scored separately and are not included in the calculation of the mental
development index.
The fact that a precise distinction between intelligence, verbal intelligence and language
skills cannot be made is shown by the correlations of the SON-IQ with evaluations of intelligence and language skills. In the case of children in primary education, the correlation with the
evaluation of intelligence was .46 and the correlation with the evaluation of language development was only slightly lower, .44. However, clear differences in the correlations with these
evaluations were found for children with a developmental delay (.68 versus .48) and for children
with a language/speech/hearing disorder (.61 versus .31).
The correlations of the SON-IQ with the verbal section of intelligence tests, and with tests of
language skills and language development were in the order of .50. Taking into account the fact
that the SON-R 2,-7 can be administered completely without using any language, these correlations are considerable. The Reasoning Scale of the SON-R 2,-7 contributed most to these
correlations.
Socio-economic differences
The SON-IQ had approximately the same association with the SES level of the parents as other
(nonverbal) intelligence tests. However, the correlation with SES level was less strong than for
language tests and the verbal section of general intelligence tests. In comparison with most
other tests, smaller differences were found between immigrant and native Dutch children when
using the SON-R 2,-7.
Conclusions
These results support the conclusion that the concept of intelligence measured by the SON-R
2,-7 corresponds broadly with what is considered to be general intelligence. The SON-R 2,-7
emphasises visual-motor and perceptual skills, spatial insight, and the ability to reason abstractly and concretely. This corresponds with the factors Fluid Intelligence and Broad Visual
Perception of Carroll's classification (1993). Memory, knowledge, and language skills have an
indirect association with performance, but the measurement is not based on these skills. The test
is less dependent on socio-economic factors than are verbal tests, and can best be defined as a
nonverbal, general intelligence test with an emphasis on fluid intelligence and visual perception.
Communicative handicaps
The research with native Dutch children who were deaf but not multiply handicapped found that
the mean IQ score (97.9) deviated only slightly from the score of the hearing population. As
with the SON-R 5,-17, the lower scores in the group of deaf children related only to Categories
and Analogies, the subtests for abstract reasoning.
The mean score of the children with a language/speech and/or hearing disorder was approximately 90. However, this group of children cannot be compared very well with the deaf
children. On one hand, the group included children with multiple handicaps. On the other hand,
children with a language/speech/hearing disorder, who were functioning well in regular education were not included in the research group.
Administration of the SON-R 2,-7 appears to be quite possible for children with communicative handicaps. Cooperation and comprehension of the directions were judged to be good by
the examiner for 80%-90% of these children. Motivation was judged to be good in approximately 70%. Problems with concentration were mentioned most frequently: concentration was rated as moderate or fluctuating for about 40% of
the children. The test could be administered in full to practically all the children.
In the case of the above mentioned children a nonverbal test is necessary for a valid evaluation of the level of intelligence, as the delay in verbal development may result completely or
partially from this handicap, and bear little relation to other aspects of cognitive development.
In this group the SON-IQ correlated clearly with the evaluation of intelligence (r=.61), whereas
the correlation with the evaluation of language development was much lower (r=.31).
Developmental delay/disorder
The research on children with a developmental delay and developmental disorders was carried
out with children at schools for special education with a pre-school department, at medical preschool daycare centers, and with children with pervasive developmental disorders. With these
children, multiple social, emotional and behavioral problems often occur, as well as delays in
cognitive, verbal and motor development.
The mean SON-IQ for this group was approximately 80. A considerable delay was found
on all subtests. Large differences in scores were found within the group. Approximately 10%
of the children had a score close to 50, and slightly more than 10% had a score higher than
100. Performance on the test corresponded strongly with the diagnostic evaluation of cognitive development at the time of admittance to the school/institute (r=.66), and with the
evaluation of intelligence that was made later by other professionals involved with the
children (r=.68).
The children in this group were more difficult to test. Motivation, concentration, cooperation
or comprehension of the directions were more often rated as moderate or fluctuating by the
examiner than in the group of children with communicative handicaps.
more natural interaction between examiner and child. Strictly limiting the number of items
completed incorrectly also prevents the child from quickly becoming demotivated. Furthermore, the SON-R 2,-7 has very varied test items, with which the child is constantly and
actively involved.
These qualities make the SON-R 2,-7 attractive for use with children who do not have a
specific communicative handicap, but whose social, emotional and behavioral problems may
interfere with the administration of a more traditional intelligence test.
Immigrant children
Testing immigrant children with traditional intelligence tests can lead to an underestimation of
their cognitive potential. This occurs because no account is taken of the fact that lack of
knowledge of, and skill in the language of the examiner does not necessarily indicate that the
verbal capacities of these children are lower. A lower level of performance on the verbal section
can, but does not necessarily, indicate a lower level of intelligence. The performance on the
performance section of these tests can also be biased because the directions are usually given
verbally. Correlational comparisons between the SON-R 2,-7 and a number of other tests
showed that, in most cases, the differences between native Dutch and immigrant children were
smaller when the SON-R 2,-7 was used. On the SON-R 2,-7, the difference in IQ scores
between native Dutch and immigrant children was 7.5 IQ points, half a standard deviation.
Children with one parent born outside the Netherlands scored as high as native Dutch children.
Turkish and Moroccan children scored approximately 10 points higher on the SON-R 2,-7 than
comparable groups tested with the RAKIT, and 6 points higher than comparable groups tested
with the LEM.
The delay that was found in the group of immigrant children was comparable to the delay
found in native Dutch children with parents of the same SES level. Research on immigrant
children who participated in OPSTAP(JE) showed that, after a two-year stimulation program,
these children performed at the mean level of native Dutch children. However, selection for
participation in OPSTAP(JE) and/or the research may have contributed to these relatively good
performances.
There were no indications that the contents of the pictures in the different subtests caused extra
problems for the children with a different cultural background. We assume that depicting children with a non-western appearance contributed to making the SON-R 2,-7 test materials
recognizable to these children.
All results indicate that the test can be used effectively with immigrant children. Of course,
an evaluation of these children's language skills, and of the extent of their knowledge of the
Dutch language, can also be important. However, this must not be confused with intelligence,
nor should language skills directly influence the evaluation of intelligence.
Visual handicaps
The SON-R 2,-7 has a strong visual orientation. All subtests use pictures. When vision is
greatly impaired and not compensated by glasses or other means, use of the SON-R 2,-7 must
be strongly discouraged. Adapted tests are available for these children (Dekker, 1987). Slight
limitations of vision will probably not be a problem. The pictures in the test are large and clear
and do not require the ability to discriminate small visual differences.
Motor handicaps
Various tasks of the SON-R 2,-7 require motor skills and eye-hand coordination. During the
construction of the test an effort was made to minimize the influence of this on the evaluation of
performance. In the subtest Patterns wide criteria for the evaluation of the drawings are used. In
the subtests Puzzles and Mosaics, frames are used to make it easier for young and poorly
coordinated children to perform the tasks well. Furthermore, the time limits, insofar as they are
applied, are broad and speed is not scored. In the case of children with more serious motor
handicaps, the possibility that these handicaps may influence performance negatively should be
considered. In chapter 11, possibilities for adapting the administration procedure to the child's
level of motor skill are discussed.
Ages
The SON-R 2½-7 was originally intended for the age range 2½ to 7 years. However, the norms
of the tests were constructed for the age range 2 to 8 years. In the following section we will show
how the test can be used for a number of different age groups. The question when the SON-R
5½-17 should be preferred to the SON-R 2½-7 will also be discussed here.
family doctors. An intelligence test is also frequently administered in combination with other
developmental tests. The administration of the test will often take place as part of a cycle in
which formulating questions and gathering relevant information are alternated (Kievit & Tak,
1996).
The SON-R 2½-7 supplies information about performance on different levels (subtests, scale
scores and total scores), and in different ways (reference age, deviation scores, observations). The
following section discusses the value of this information for the diagnostic process, and the risks
that arise when a single result on one test is interpreted as the level of intelligence.
Level of scores
The objective of the SON-R 2,-7 is to give an impression of the general intelligence level of the
child. Diverse subtests are used not to determine differences in performance among the subtests,
but rather because the influence of the specific characteristics of the subtests on the total score
decreases when the test is made up of several subtests. The accuracy of the IQ score is not
primarily judged by the reliability of the test, but by its generalizability. All the variance that is
specific to the subtests is considered irrelevant for the generalizability. The SON-IQ, with the
80% probability interval that is based on the generalizability should, in our opinion, be the basis
for the evaluation of the test results.
Subtest scores
The differences between the scores on the subtests have the lowest reliability and stability. The
retest research shows that differences between subtest scores are also unstable. When the differences between subtests are relatively large, the chance is greater that the order of the differences
is largely maintained. If one wants to interpret the differences between the subtest scores
further, one must therefore first determine whether the differences are relatively large. This can
be done by the computer program.
Although conclusions should not be drawn on the basis of the results at the subtest level,
evaluating the differences between subtest scores in relation to other information available
about the child, or to impressions gained during the administration of the test, may be worthwhile. Such an evaluation may allow specific ideas to be developed about the child's strengths
and weaknesses, which can subsequently be examined further. The explorative use of the subtest
data can be of value when the intertest differences are sufficiently large.
Scale scores
The possibilities for using the scores on the Performance Scale and the Reasoning Scale are
greater than the possibilities of the subtest scores, but are still more limited than the possibilities
for the score on the SON-IQ. The scale scores are more reliable than the subtest scores, with a
mean reliability of .85 and a retest stability of .72. An important difference with respect to the subtest
scores is that the scale scores are based on several subtests. This means that generalizable
statements can be made on the basis of these scores. However, the correlation between the two
scale scores is rather high (.56), which means that the reliability of the difference between the
two scores is limited to .65. The stability of the difference score is even lower, i.e., .46. Before
interpreting differences between the two scores, one should certainly determine whether the
difference is significant. Both the norm tables and the computer program supply information on
this. General statements, for example about a possible difference between the development of
performance and reasoning ability, can be made only if the probability intervals of the two
scores do not overlap. This information is supplied when using the computer program.
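The arithmetic behind the limited reliability of the difference can be illustrated with the classical formula for the reliability of a difference between two scores that have equal variances. This is a sketch only: the manual does not state which formula was used, and the function name is illustrative.

```python
def difference_reliability(rel_a: float, rel_b: float, corr_ab: float) -> float:
    """Reliability of the difference A - B (classical test theory, equal variances)."""
    mean_reliability = (rel_a + rel_b) / 2
    return (mean_reliability - corr_ab) / (1 - corr_ab)


# Values from the text: mean reliability of the scale scores .85,
# correlation between the two scale scores .56.
r_diff = difference_reliability(0.85, 0.85, 0.56)
print(f"reliability of the difference: {r_diff:.3f}")
```

With these rounded inputs the result lies close to the reported value of .65; the small remaining gap presumably reflects rounding of the inputs.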
The diagnostic possibilities of the two scale scores need to be studied in further detail. For
the time being, this information should be used exploratively.
IQ score
The SON-IQ, the scaled and standardized total score of the SON-R 2½-7, is the most usable,
generalizable, reliable, and stable result of the test. Combined with the 80% probability interval,
the SON-IQ gives a good indication of the level of intelligence of the child. The categories as
shown in table 10.2 can be used to give a rough definition of the test result. The first column is
neutral and descriptive: this shows whether the child's performance on the test was high, low or
average. This classification is also used by the DTVP-2 and, with slightly different limits, by the
WPPSI-R. The two other classifications give a description of the level of intelligence of the
child in qualitative terms, related to the IQ score.
Table 10.2
Classification of IQ Scores and Intelligence Levels
IQ        Description           IQ        Description (1)     IQ        Description (2)
>130      Very high      2%     >130      Highly gifted       >129      Very superior
121–130   High           7%     121–130   Gifted              120–129   Superior
111–120   Above aver.    16%    111–120   Above average       110–119   High average
90–110    Average        50%    90–110    Average             90–109    Average
80–89     Below aver.    16%    80–89     Less gifted         80–89     Low average
70–79     Low            7%     60–79     Learning probl.     70–79     Borderline
<70       Very low       2%     <60       Learning disorder   <70       Mentally deficient
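The first, neutral classification of table 10.2 can be expressed as a simple lookup. The sketch below is illustrative only; the function name is hypothetical and whole-number IQ scores are assumed.

```python
def describe_iq(iq: int) -> str:
    """Neutral description of an IQ score (first classification of table 10.2)."""
    if iq > 130:
        return "Very high"
    if iq >= 121:
        return "High"
    if iq >= 111:
        return "Above average"
    if iq >= 90:
        return "Average"
    if iq >= 80:
        return "Below average"
    if iq >= 70:
        return "Low"
    return "Very low"


for score in (135, 112, 100, 85, 65):
    print(score, describe_iq(score))
```

Such a label is only a rough description of the test result and should be read together with the 80% probability interval.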
can be instructive when reporting the results, for instance, to the parents. Furthermore, the
reference age provides information about the level of tasks that the child comprehends, and this
can be used to show at which level learning materials or training can be given. One can,
naturally, not only depend on the reference age. When a 7-year-old child has a reference age of
3;5, this child is in a completely different situation, and has completely different learning
abilities, from a 3-year-old child with a reference age of 3;5 years. For that matter, a 3-year-old
with an IQ of 80 is of course not equal to a 7-year-old with an IQ of 80.
The reference age can best be described as "the child's performance on the test corresponds
to the mean performance of children of .. years old". This is better than saying "the child
functions at the level of a .. year-old". The latter formulation suggests, unjustly, that the complete cognitive or mental level of the child is described by the test.
tion of the SON-R 2,-7 in a heterogeneous age group is as high as the generalizability coefficient. Whether this is valid for all age groups is not known. However, it means that the 80%
probability interval can also be interpreted in another way, namely as the expected interval for
the hypothetical IQ score if we could administer these six subtests many times with an intervening period of several months. In this interpretation, allowance has been made, in the 80%
interval, for the reliability as well as the instability, but no longer for the specific variance of the
subtests.
Generalizing across tests and time
The 80% interval of the SON-IQ takes (approximately) two of the three limitations of a
single administration of the test into account, in either of two ways. Even so, the interval is
too wide, in many practical situations, to support important decisions. If all three
aspects are taken into consideration (unreliability, instability and test-specific characteristics),
an assessment of the level of intelligence, based on the test result, can be made with even less
certainty. The real danger of drawing completely incorrect conclusions based on a single test
result for young children is demonstrated by the comparison of the scores on the SON-IQ with
the PIQ of the WPPSI-R (see section 9.8). In the case of four of the 230 children, a difference
of around 40 points occurred. In two cases the child had a low score on the SON-R 2,-7, and in
two cases on the WPPSI-R. If the evaluation is to be used to make important decisions, with far
reaching consequences for the child and his or her surroundings, the administration of a single
intelligence test is unlikely to be sufficient. The risk that a distorted idea of the intelligence will
be formed, due to a combination of unreliability, fluctuations in the performance and specific
characteristics of the test, is too great.
Administration of several tests
Based on the research on the congruent validity of the SON-R 2,-7, the variance of the test has
been described as follows in section 10.2 (see figure 10.1):
measurement error variance (10%)
unstable variance (10%)
test-specific variance (10%)
valid generalizable variance (70%)
The proportion of valid generalizable variance is based on correlations of approximately .70
with other (nonverbal) intelligence tests. If we assume that there are other intelligence tests,
with similar variance compositions, and with correlations of .70 with each other and with the
SON-R 2,-7, then the composition of the variance of the mean score when two or three
different tests are administered can be calculated (see table 10.3). The assumption here is that
the interval between the test administrations is between several weeks and several months.
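The calculation described above can be sketched as follows. The sketch assumes that the valid variance is common to all tests, whereas the three undesired components are independent between tests and therefore shrink in proportion to the number of tests averaged. The function name and the dictionary keys are illustrative.

```python
def mean_score_composition(n_tests: int, error: float = 10.0, unstable: float = 10.0,
                           specific: float = 10.0, valid: float = 70.0) -> dict:
    """Percentage composition of the variance of the mean of n_tests test scores.

    Assumes the valid variance is shared by all tests and that each undesired
    component is independent between tests (so it is divided by n_tests).
    """
    # All terms are multiplied by n_tests, which leaves the proportions unchanged.
    total = n_tests * valid + error + unstable + specific
    return {
        "measurement error": 100 * error / total,
        "unstable": 100 * unstable / total,
        "test-specific": 100 * specific / total,
        "valid": 100 * n_tests * valid / total,
    }


for n in (1, 2, 3):
    shares = mean_score_composition(n)
    print(n, {name: round(value, 1) for name, value in shares.items()})
```

Under these assumptions the valid share for two and three tests comes out at roughly 82% and 88%, in line with the percentages given in the manual.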
When two tests are administered, the share of the undesired sources of variance is reduced by
40%. The proportion of valid variance increases from 70% to 82%. When three tests are administered, the share of the undesired sources of variance is reduced by 60%; the proportion of valid
variance now becomes 88%. For young children, the share of undesired variance is larger than
for the older children. Therefore, in the last part of table 10.3, an estimate has been made of the
components of the variance, when three tests are administered to children from 2 to 4 years of
age, and when two tests are administered to children from 5 to 7 years of age. This leads to an
estimate of 85% valid variance for both groups. The reliability of the mean score is .95 and the
stability is .90.
Table 10.3
Composition of the Variance When Several Tests Are Administered
                             SON-R 2½-7   Average of   Average of    Estimate (three tests, 2-4 years;
                                          two tests    three tests   two tests, 5-7 years)
Measurement error variance   10%          6%           4%            5%
Unstable variance            10%          6%           4%            5%
Test-specific variance       10%          6%           4%            5%
Valid variance               70%          82%          88%           85%
Table 10.4
Correction of Mean IQ Score Based on Administration of Two or Three Tests
Mean  IQ     Mean  IQ     Mean  IQ     Mean  IQ     Mean  IQ     Mean  IQ     Mean  IQ
 50   45      65   61      80   78      95   94     110  111     125  128     140  144
 55   50      70   67      85   83     100  100     115  117     130  133     145  150
 60   56      75   72      90   89     105  106     120  122     135  139     150  155
Both the mean IQ score (Mean) and the newly standardized IQ score (IQ) are presented.
When the scores on two or three tests are averaged, the dispersion becomes narrower. Table
10.4 shows how to correct the mean score for this. This correction is based on a standard
deviation of the mean score of 13.6.
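The correction in table 10.4 appears to be a linear rescaling of the mean score from a standard deviation of 13.6 back to the usual IQ scale. The sketch below assumes a mean of 100 and a standard deviation of 15 for the IQ scale; the function name is illustrative.

```python
def restandardize_mean_iq(mean_iq: float, sd_of_mean: float = 13.6,
                          scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Rescale the mean of two or three IQ scores to the standard IQ scale."""
    return scale_mean + (mean_iq - scale_mean) * scale_sd / sd_of_mean


for mean_iq in (50, 70, 100, 130):
    print(mean_iq, round(restandardize_mean_iq(mean_iq)))
```

Mean scores close to 100 hardly change, while extreme mean scores are moved further from 100, which compensates for the narrower dispersion of averaged scores.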
To calculate the mean of the scores, the norms of the different tests must be comparable. For
a number of reasons, including obsolescence of the test norms, this is often not the case. This
problem was discussed in section 9.10. In table 10.5 the expected obsolescence of the norms of
the SON-R 2,-7, based on an estimate of obsolescence of one IQ point per 3 years, is presented.
This means that, in theory, 3 years after the standardization 1 IQ point should be subtracted from
the IQ score. The same holds true for other intelligence tests.
Table 10.5
Obsolescence of the Norms of the SON-IQ
[Table values not recoverable; the columns were headed "Year of administration".]
The obsolescence has been calculated from 1993/94, the year in which the standardization was carried
out. An obsolescence rate of 1 IQ point per three years was used.
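The stated rate can be turned into a rough correction as sketched below. The sketch assumes that obsolescence accrues linearly from the midpoint of 1993/94 (taken here as 1993.5); the function names are illustrative and the result is not a substitute for the values in table 10.5.

```python
def norm_obsolescence(year: float, standardization_year: float = 1993.5,
                      points_per_year: float = 1 / 3) -> float:
    """Estimated number of IQ points to subtract because the norms have aged."""
    years_elapsed = max(0.0, year - standardization_year)
    return years_elapsed * points_per_year


def corrected_iq(iq: float, year: float) -> float:
    """IQ score after subtracting the estimated norm obsolescence."""
    return iq - norm_obsolescence(year)


print(round(norm_obsolescence(1996.5), 2))
print(round(corrected_iq(100, 1999.5), 2))
```

Applying the same kind of correction to every test involved makes the scores more comparable before they are averaged.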
An improved estimate of the level of intelligence can also be gained by administering the
test again. This also leads to higher reliability and improved stability. However, a retest
within a short period brings with it the problem of learning effects. Further, the specific
characteristics of the test will still influence the mean score. Administering a different test in
combination with the SON-R 2,-7 is a much more attractive alternative, because this
reduces the influence of various undesired sources of variance at the same time. Naturally
the alternative test must be suitable for the target group. For various groups of children with
whom the SON is used, only nonverbal tests, or the performance section of more general
tests will be considered as alternatives. Diversity in materials and method of testing is, as
far as possible, desirable. Furthermore, having different examiners administer the tests is
recommended.
Because the SON-R 2,-7 differs so clearly from many other tests in the method of administration and the materials used, it is very suitable to be administered as an extra test for children
to whom a (partially) verbal intelligence test can also be administered.
An IQ score based on two test administrations, and for young children preferably on three
test administrations, can be interpreted with much more confidence as the level of intelligence.
Such scores can be evaluated qualitatively according to the descriptions presented in the second
and third parts of table 10.2.
Other information
The question whether another test besides the SON-R 2,-7 should be administered depends
primarily on the consequences of an incorrect evaluation of the child. If these are not very
serious, and if the evaluation can easily be revised, a relatively large margin of uncertainty is
acceptable. The risk of an incorrect evaluation will also decrease if the result of the test can be
interpreted in combination with information from the parents, teachers and others concerned
with the child. The observations of the examiner may also give an indication of the desirability
of administering an extra test.
Manner of administration
A condition for the validity of the test result is that the test is administered in the correct manner
and according to the directions. Experience in administering tests is very important in this
respect, as is experience in interacting with young children and, if relevant, with children with
specific problems or handicaps. The administration of the test does not necessarily have to be
done by a psychologist. However, the interpretation and recording of the results remain the
domain of qualified experts.
An important aspect of the SON-R 2,-7 is its friendly approach to children and the interaction between examiner and child. This makes completing the test enjoyable, for both child and
examiner. However, this also means that the examiner is closely involved in the administration
and hence that the risks of examiner effects are greater. With one exception, systematic examiner effects were restricted to a few IQ points during our research. To reduce the risk of such
effects, it is advisable, in addition to closely following the directions, to be present on some
occasions when someone else administers the test and to allow someone else to be present when
administering the test oneself. If possible, comparing the results of different examiners now and
then is also useful.
10.5 CONCLUSIONS
The research has shown that the SON-R 2,-7 is a valid, reliable intelligence test that can be
used to good effect with children with problems and handicaps in language development and
communication, with children with a foreign language or bilingual background, with children
with a developmental delay and developmental disorders, and with mentally deficient children
and adults. When more traditional, verbal intelligence tests are used with these groups, the
evaluation of intelligence can be distorted by the language skills of the child. The test results of
the deaf children who were not multiply handicapped, and whose performance was almost equal
to that of the children in the norm group, demonstrate the importance of a nonverbal test
administration. The same holds true for the performance of the immigrant children. This was
much better than the performance on the traditional tests, and was comparable to the results of
the children in the norm group with a similar SES level.
Young children, in particular young children with a problem or handicap, are still frequently
difficult to test. In addition to the nonverbal character of the test, which allows, but does not
require, the child to speak, a child-oriented test situation is established by the help given by the
examiner, the attractiveness of the materials and the manner in which the child is actively
involved. Comparisons with two other tests showed that the children were more motivated and
concentrated with the SON-R 2,-7 and that, according to the examiners, they understood the
directions better.
The interaction between examiner and child offers extra opportunities for observation. However, the manner of administering the test does require the examiner to be thoroughly prepared
and to follow the directions.
The scores on the test correlated strongly with various evaluations of intelligence. The
performance of children with a developmental delay (in the Netherlands and in Australia) and
learning problems (in Great Britain) was low, as was expected. The correlations with other
nonverbal intelligence tests were reasonable. However, due to differences in content between
the tests and to fluctuations in the performances, the correlations were lower than would be
considered possible on the basis of the reliability of the test. The score on the test gives an
indication of the intelligence of the child; the score is not the level of intelligence. When
decisions with far-reaching consequences have to be made, the diagnosis should be based on the
administration of two or three intelligence tests.
The IQ score, for which the reference age can also be determined, is of prime importance for
the interpretation of the test results. The distinction between a Performance Scale and a Reasoning Scale was supported by the Principal Components Analysis and by the patterns of correlations with other tests. This is important because it, in turn, supports the multifaceted nature of
the concept of intelligence as it is measured with the SON-R 2,-7; however, the reliability of
the difference between the two scale scores is relatively low and of less practical importance.
The norms for the test scores are based on the exact age of the child, so avoiding systematic
distortions in the presentation of the results, and probability intervals are presented, allowing
the user to take the uncertainty about the results into account.
The difficulties which arise when testing young children, and the great diversity of problems
and handicaps of young children for which psychological assessments are requested, make it
extremely important that a number of well-constructed, standardized and validated intelligence
tests are available. The SON-R 2,-7 complies with these criteria.
11 GENERAL DIRECTIONS
In this chapter the general characteristics of the procedure for the test administration and
scoring are presented. In chapter 12 the directions for each separate subtest will be described.
11.1 PREPARATION
Before the test is administered for the first time, the examiner should become familiar with the
materials, the directions and scoring of the items. We strongly advise trying out the test a
number of times before using it. In our experience, administration of the test is not difficult. In
order to administer the test correctly the examiner must have a good command of the directions
so that he or she does not need to consult the manual during the administration. Learning to
administer the test is facilitated by observing a test administration or watching a video recording
of it.
If attention is not continually focused on the child, he or she can easily be distracted and
lose interest in the test. Specific characteristics of the administration of each subtest are
described on the record form so that these are always immediately available during the administration of the test.
A valid test administration of young children, certainly when they have problems and handicaps, requires a high level of expertise from the examiner. Experience in testing children is
essential. When a child has specific problems or handicaps, experience in interacting with these
children is desirable in order to be able to communicate easily with the child, and to deal with
any problems that may arise. Administration of the test is not restricted to psychologists
and (ortho)educationalists; experience in testing of, and interaction with young children is of
paramount importance. However, interpreting and reporting on the test results remains the
prerogative of experts.
The directions should be followed as closely as possible. Deviating from the directions may
influence the test results. In general, sufficient latitude is allowed in the directions for adapting
to the comprehension and skills of the individual child. Because of specific problems of a child,
e.g., motor handicaps, adapting the administration of the test may be necessary. This will be
discussed in section 11.6.
Set-up
The examiner sits at a table opposite the child. The table should not be too broad. Otherwise,
the examiner cannot easily help the child. The height of the table and chair should be
adjusted to the child. The child should be able to sit comfortably, to easily see what is on the
table and what the examiner does. Preferably, the examiner should sit so that the light falls
on his or her face.
Only the material the child needs at that moment should be on the table. The child works
on a large anthracite-colored mat. The mat stops the material sliding around, makes it easier
for the child to pick up items, and supplies a uniform background. The record form and the
material needed by the examiner are placed on another table, preferably outside the reach of
the child.
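[Figure: seating arrangement, showing the positions of CHILD and EXAMINER on opposite sides of the table, the test materials with their left/right and top/bottom orientation, and the record form and storage box.]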
In the directions, left and right are given from the examiner's position; top and bottom are given from the child's perspective. We have
chosen this top-bottom perspective because referring to the top of the test booklet as the bottom,
when it is lying the other way round according to the examiner's perspective, may be confusing.
The test booklets are presented so that the title page is facing the child. The page numbers
and numbers on the cards are always legible from the examiner's perspective. When studying
the directions, this test situation should be taken into account.
The examiner should always be sure that the child's view of the material is not blocked while
presenting materials, giving directions, or correcting answers. The examiner should consider
right or left handedness when placing materials on the table or giving them to the child.
Introduction
Before starting the test, the examiner should allow time for the child to get used to the setting.
The child should not have the impression that he or she has to achieve, but that he or she will be
playing with different materials.
The time needed to administer the SON-R 2½-7 varies from 45 to 75 minutes.
The entire test should, preferably, be administered in one sitting. The examiner can
allow a short break between subtests now and then, so that the child can have a drink or go to the
bathroom.
When using nonverbal directions, the gesture for "together" is often used. This should be done in
the following manner: move both hands together (slowly) as if to catch a large ball.
The examiner points to the picture, block, puzzle, or card. The examiner corrects the mistake,
when possible with the child.
LOOK, IF WE DO IT LIKE THIS, IT'S BETTER.
The examiner tries to involve the child in actively correcting the mistakes by letting him or her
perform the last activity. The examiner does not explain why the child's answer was wrong.
When the child does not react despite encouragement:
The examiner completes the item while trying to actively involve the child in the solution.
Nod encouragingly.
When a child does not comprehend the directions, these may be repeated.
The second part of each subtest, with the exception of Patterns, is preceded by an example. This
example is always completed when the child reaches the items of the second part.
Every time the child has given an answer to an item, one must ascertain whether the child has
finished.
ARE YOU READY? or THAT'S IT? or READY?
The child may immediately correct his or her answer him/herself. In such a case the examiner
should ask what the final answer is. Make sure that the child does not consider the question,
whether he or she is ready or whether the final answer has been given, to express doubt about the
correctness of the answer. Varying the questions might be advisable (for example: show me the
correct picture again, or which picture matches this best).
Withhold feedback until the answer is complete (this is very important when more than one
choice must be made).
Sometimes a child comments on slight differences in color between the testing material and the
pictures in the test booklets, or about the space remaining in the frame of Mosaics. Reassure
the child and tell him or her that it does not matter.
Time limits
In part II of the performance subtests (Mosaics, Puzzles and Patterns) a maximum amount of
time is allowed per item. The examiner uses a stopwatch for these items. The time limit is 2½
minutes. Experience has shown that items are hardly ever completed correctly after this amount
of time has passed. The examiner may stop earlier when the child clearly cannot finish the item
successfully. When the child is almost finished after 2½ minutes, the examiner allows the child
to finish the item.
The following situations can arise:
When the child is finished before the time is up, the examiner scores the item as being either
correct (1) or incorrect (0).
When it is clear before the time is up that the child will not succeed, the examiner can offer
help. The item is then scored as being incorrect (0).
When the child is not finished and the time limit has been reached, the examiner can help. The
item is scored as being incorrect (0).
When the time limit has been reached and the child can finish the item independently in a
short time, the child is allowed to do so. The item is scored as being incorrect (0).
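The four situations above reduce to one rule: a timed item is scored 1 only when it is completed correctly, independently and within the time limit. A minimal sketch, assuming a limit of 2½ minutes (150 seconds); the function and parameter names are illustrative.

```python
TIME_LIMIT_SECONDS = 150  # 2½ minutes, the limit assumed for part II of the performance subtests


def score_timed_item(correct: bool, seconds_used: float, helped: bool) -> int:
    """Return 1 for a correct, independent solution within the time limit, else 0."""
    within_limit = seconds_used <= TIME_LIMIT_SECONDS
    return 1 if (correct and within_limit and not helped) else 0


print(score_timed_item(correct=True, seconds_used=90, helped=False))   # solved in time
print(score_timed_item(correct=True, seconds_used=170, helped=False))  # finished after the limit
print(score_timed_item(correct=True, seconds_used=120, helped=True))   # solved with help
```

Allowing the child to finish after the limit is a matter of motivation only; it does not change the score.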
Refusal
When the child does not wish to continue halfway through an item, the examiner encourages the
child to go on. When this has no effect, the examiner offers help. The item is then scored as
being incorrect (0).
When the child refuses to do an item in advance, or even to begin with a subtest, and
encouragement does not help, the examiner completes the item and tries to involve the child.
The item is then scored as being a refusal (–). The child is then encouraged to complete the next
item. The administration of the subtest is discontinued when two consecutive items have
been refused. This subtest cannot be used for the evaluation of the test performances or for
the calculation of the IQ score.
When the child does not want to continue, a break may be called for, and in this extreme
situation changing the sequence of administration of the subtests may be considered. If, for
instance, the child does not want to continue doing Analogies, Patterns can be administered first
followed by Situations. Patterns, during which the child draws, is more attractive to do for some
children than Situations, during which the child must make choices.
Entry procedure
The first item of a subtest to be completed depends on the age and level of the child. Based on
age and class in primary education, the following rule holds:
Entry-item 1:
Entry-item 3:
Entry-item 5:
When a discrepancy exists between the age of the child and the level in primary education, the
entry level corresponding to the lower level is chosen. A six-year-old who is still in his second
year of school will begin with entry-item 3. Children of 2 and 3 years always begin with entry-item 1.
When a child is suspected of having a substantial cognitive developmental delay, the entry
level can be adapted. When the examiner receives the impression that a five-year-old functions
at the level of a three-year-old child, he or she will begin with entry-item 1.
However, when a child is suspected of having only a slight developmental delay (roughly
corresponding to an IQ of 85 to 100), beginning at a lower level than is suggested on the basis of
age and level at school is not necessary or desirable. When the child has a fear of failure or is
difficult to test for another reason, beginning at a lower level may be wise.
In the subtest directions, the administration procedure is always described starting with item 1.
At the end of the description of part I of each subtest, changes in the directions due to beginning
with entry-item 3 or 5 are described.
The skipped items are scored as + on the record form. In the calculation of the subtest score
these items are reckoned as being correct.
PAY ATTENTION: The number of mistakes includes the items completed incorrectly (score
0) as well as the items that were refused (score –).
Examples for discontinuation rule A:
The subtest is discontinued when a total of three items have been scored as being incorrect. The
entry procedure does not affect the discontinuation rule.
The meaning of the scores is:
+ Item skipped (based on the entry procedure; scored as being correct for the total score),
1 Item correct (completed entirely, independently and within the time limit when in effect),
0 Item incorrect (completed incorrectly, not independently, incompletely or not within the time limit),
Item refused (scored as being incorrect for the total score).
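The way these marks feed into the totals can be illustrated as follows. This sketch assumes that the subtest score is the number of items counted as correct; the function name is hypothetical, and "R" is again an assumed stand-in for the refusal mark.

```python
from collections import Counter

CORRECT_MARKS = {"+", "1"}   # skipped items are counted as correct
MISTAKE_MARKS = {"0", "R"}   # "R" is a hypothetical placeholder for "refused"


def tally(marks):
    """Return (number counted as correct, number counted as mistakes)."""
    counts = Counter(marks)
    unknown = set(counts) - CORRECT_MARKS - MISTAKE_MARKS
    if unknown:
        raise ValueError(f"unknown marks: {sorted(unknown)}")
    correct = sum(counts[m] for m in CORRECT_MARKS)
    mistakes = sum(counts[m] for m in MISTAKE_MARKS)
    return correct, mistakes


# Two skipped items, three correct items, two incorrect items and one refusal.
print(tally(["+", "+", "1", "1", "0", "R", "1", "0"]))
```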
[Figure: example record forms showing the item marks and scores for the subtests Mos, Cat, Puz, Ana, Sit and Pat.]
[Figure: record form rows for Puz and Ana.]
Item 3, the first item to be completed has been solved incorrectly. One goes straight
back to item 1.
[Figure: record form row for Sit; score 0.]
Entry-item 5
Children of 6 years and older start with item 5. When either item 5 or item 6 is scored as being
incorrect, one goes straight back to item 3 and does item 3 as well as item 4.
When items 3 and 4 are both scored as being correct, one goes on to the more difficult items
until the discontinuation criterion has been reached.
When either item 3 or 4, or both items 3 and 4 are scored as being incorrect, one goes back to
item 1 and does item 1 as well as item 2. When the discontinuation criterion has been met as
calculated from item 1 on, the score is calculated on the basis of the item at which the discontinuation criterion has been reached. When the discontinuation criterion has not yet been met, one
goes on to the more difficult items.
When the situation occurs that one has to go back in a subtest, the subsequent subtests are
started at the lowest entry level reached, i.e., at entry-item 3 or at entry-item 1.
[Figure: record form row for Mos; score 6.]
Item 6 has been completed incorrectly, so items 3 and 4 are administered. Then one
continues with item 7 until a total of three items have been completed incorrectly.
[Figure: record form row for Cat; score 5.]
Item 5 has been completed incorrectly, so items 3 and 4 are administered. Because item
3 is completed incorrectly, items 1 and 2 are administered. The total number of mistakes
is still less than three, so one continues with item 6 until three mistakes have been made.
[Figure: record form row for Puz; score 1.]
After completing item 5 and subsequently items 3 and 4, three mistakes have been made.
Nevertheless, one still goes back to item 1.
[Figure: record form row for Ana.]
Pay attention: The discontinuation criterion was reached at item 4 of Analogies. Item
5, which was completed first, is no longer counted for the score.
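The procedure for entry-item 5 and the examples above can be summarised in a short simulation. This is a sketch under stated assumptions, not the scoring routine of the computer program: the names `administer_from_entry_5`, `solve` and `n_items` are hypothetical, refusals are treated like incorrect answers, only discontinuation rule A is applied, and the score is taken to be the number of items counted as correct up to the item at which the criterion is reached.

```python
from typing import Callable, Dict, Tuple


def administer_from_entry_5(
    solve: Callable[[int], bool], n_items: int
) -> Tuple[Dict[int, str], int, int]:
    """Simulate one subtest started at entry-item 5.

    solve(item) returns True when the child completes the item correctly.
    Returns (marks per item, score, lowest entry level reached).
    """
    if n_items < 6:
        raise ValueError("the sketch assumes a subtest with at least 6 items")
    marks: Dict[int, str] = {}

    def give(item: int) -> bool:
        marks[item] = "1" if solve(item) else "0"
        return marks[item] == "1"

    lowest = 5
    # Go straight back to item 3 as soon as item 5 or item 6 is incorrect.
    if not give(5) or not give(6):
        lowest = 3
        ok3, ok4 = give(3), give(4)      # items 3 and 4 are both administered
        if not (ok3 and ok4):
            lowest = 1
            give(1)
            give(2)                      # items 1 and 2 are both administered
    # Items below the lowest entry level reached are skipped and marked "+".
    for item in range(1, lowest):
        marks[item] = "+"

    # Count mistakes from item 1 on, administering harder items as needed.
    mistakes = 0
    item = 1
    while item <= n_items:
        if item not in marks:
            give(item)
        if marks[item] == "0":
            mistakes += 1
            if mistakes == 3:            # rule A: three mistakes in total
                break
        item += 1
    last = min(item, n_items)
    # Items beyond the discontinuation point are no longer counted.
    score = sum(1 for i in range(1, last + 1) if marks[i] in ("+", "1"))
    return marks, score, lowest


# Item 5 incorrect, then item 3 incorrect, so items 1 and 2 are administered;
# testing continues with item 6 until the third mistake (here at item 8).
marks, score, lowest = administer_from_entry_5(lambda i: i not in {3, 5, 8}, 15)
print(score, lowest)
```

The third return value corresponds to the lowest entry level reached, at which the subsequent subtests would be started.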
Various aspects of the adaptive procedure are also illustrated on this record form. If this is not
yet entirely clear, sections 11.3 and 11.4 should be studied again. The scores that are calculated
here are the raw subtest scores. Using the norm tables or the computer program, they may be
transformed into the scaled standard scores.
[Figure: completed example record form with the item marks and raw scores for the subtests Mos, Cat, Puz, Ana, Sit and Pat.]
The subtest Categories has not been scored in this example because the child refused to complete two consecutive items. This subtest is not used when calculating the IQ score.
obviated, for example, by always offering the cards one by one during the subtests Situations
and Categories, by putting the blocks on the table instead of having the subject take them out of
the box when doing Mosaics, by adding an extra non-slip layer under the mat, or by allowing the
child to give the examiner directions (this does assume good verbal skills). Problems in
handling the materials also make adapting the time limits desirable, and possibly administering
the test over a period of a few days. However, our experience is limited and the diversity of
motor handicaps is so large, that giving set rules for administering the test to these children is
difficult. The examiner will have to discover what the limiting factors are and whether, and in
which manner, these can be compensated. When one works mainly with motor handicapped
children, administering the test a few times to children who are not handicapped is advisable.
This way one can get a clear idea of problems that occur during the administration that are
specific to the handicap.
Adapting the manner of administration of the test can also be desirable when children are
very fidgety or find it hard to focus on the test (for example autistic children). In such a case,
sitting at a corner of the table may be preferable to sitting opposite the child, as one can then
draw the child's attention by touching him or her.
When one deviates from the standard directions during the administration of the test, this
should be mentioned on the record form so that others can take this into account when interpreting the results.
205
REFERENCES
Akker, J. van den & Boecop, A. van (1976). Test voor visuele waarneming van Marianne Frostig.
Handleiding. Amsterdam: Swets & Zeitlinger.
Alexander, P.A., Willson, V.L., White, C.S., Fuqua, J.D., Clark, G.D., Wilson, A.F. & Kulowich, J.M.
(1989). Development of analogical reasoning in 4- and 5-year-old children. Cognitive Development, 4, 65-88.
APA (1987). Diagnostic and Statistical Manual of Mental Disorders, DSM III-R. Washington: American Psychiatric Association.
Bayley, N. (1949). Consistency and variability in the growth of intelligence from birth to eighteen
years. Journal of Genetic Psychology, 75, 165-196.
Bayley, N. (1969). Manual for the Bayley Scales of Infant Development. New York: The Psychological
Corporation.
Beek, C. van de (1995). De toepasbaarheid van de SON-R 2,-7 bij kinderen met een motorische
handicap. RU Groningen: intern verslag.
Berg, W. van den, Heide, L. van der, Kamminga, J., Meeder, S. & Paredes, M.G. de (1994). Slim
gezien! een vergelijking tussen de SON-R 2,-7 (intelligentietest) en de DTVP-2 (visuele perceptietest). RU Groningen: intern verslag.
Berge, J.M.F. ten & Kiers, H.A.L. (1991). A numerical approach to the approximate and the exact
minimum rank of a covariance matrix. Psychometrika, 56, 309-315.
Berge, J.M.F. ten & Zegers, F.E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 4, 575-579.
Berger, H.J.Chr., Creuwels, J.M.P. & Peters, H.F.M. (1973). Nederlandse handleiding bij het gebruik
van Wechslers intelligentie-schaal voor kleuters, de W.P.P.S.I. Amsterdam: Swets & Zeitlinger.
Bleichrodt, N., Drenth, P.J.D., Zaal, J.N. & Resing, W.C.M. (1984). RAKIT Revisie Amsterdamse
Kinder Intelligentie Test. Instructie, normen, psychometrische gegevens. Lisse: Swets &
Zeitlinger.
Bleichrodt, N., Resing, W.C.M., Drenth, P.J.D. & Zaal, J.N. (1987). Intelligentie-meting bij kinderen.
Lisse: Swets & Zeitlinger.
Bollen, N. (1991). Cognitief aanvangsniveau jongste kleuters basisonderwijs. OVG Groningen: intern
verslag.
Bollen, N. (1996). De cognitieve ontwikkeling van kleuter tot achtjarige in het basisonderwijs. OVG
Groningen: intern verslag.
Bomers, A.J.A.M. & Mugge, A.M. (1985). Reynell Taalontwikkelingstest: Nederlandse instructie.
Nijmegen: Berkhout.
Bon, W.H.J. van (1982). TvK Taaltests voor Kinderen. Handleiding. Lisse: Swets & Zeitlinger.
Bracken, B.A. & McCallum, R.S. (1998). UNIT Universal Nonverbal Intelligence Test. Itaska, IL:
Riverside Publishing.
Brouwer, A., Koster, M. & Veenstra, B. (1995). Validation of the Snijders-Oomen test
(SON-R 2,-7) for Dutch and Australian children with disabilities. RU Groningen: intern
verslag.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1982). Test of Nonverbal Intelligence. Austin, TX:
Pro-Ed.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1990). TONI-2 Test of Nonverbal Intelligence.
Examiners manual. Second Edition. Austin, TX: Pro-Ed.
Carroll, J.B. (1993). Human cognitive abilities. A survey of factor-analytic studies. Cambridge:
Cambridge University Press.
206
SON-R 2,-7
Cattell, R.B. (1971). Abilities; their structure, growth, and action. Boston: Houghton Mifflin.
CBS (1993). Centraal bureau voor de statistiek: Statistisch Jaarboek 1993. s-Gravenhage: SDU/
uitgeverij.
CBS (1994). Centraal bureau voor de statistiek: de leefsituatie van de nederlandse bevolking 1993,
kerncijfers. s-Gravenhage: SDU/uitgeverij.
Coultre-Martin, J.P. le, Wijnberg-Williams, B.J., Meulen, B.F. van der & Smrkovsky, M. (1988). BOS
2-30. Normen voor kinderen met een vermoede hoorstoornis of met een spraak- of taalstoornis.
Tijdschrift voor Orthopedagogiek, 27, 75-84.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297334.
Cronbach, L.J., Schnemann, P. & McKie, D. (1965). Alpha coefficients for stratified parallel tests.
Educational and Psychological Measurement, 25, 291-312.
Dekker, R. (1987). Intelligentie van visueel gehandicapte kinderen in de leeftijd van 6 tot 15 jaar.
Amsterdam: VU Uitgeverij.
Drenth, P.J.D. (1966). De psychologische test. Deventer: Van Loghum Slaterus.
Driesens, N., Horn, J. ten, Paro, I., Schoemaker, M. & Swartberg, D. (1994). De mogelijke samenhang
tussen twee niet-verbale intelligentietests: SON-R 2,-7 en de TONI-2. RU Groningen: intern
verslag.
Dunn, L.M. & Dunn, L.M. (1981). PPVT Peabody Picture Vocabulary Test Revised. Manual for
Forms L and M. Circle Pines, MN: American Guidance Service.
Eldering, L. & Vedder, P. (1992). OPSTAP: een opstap naar meer schoolsucces? Amsterdam/Lisse:
Swets & Zeitlinger.
Eldik, M.C.M. van, Schlichting, J.E.P.T., Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F.
van der (1995). Reynell Test voor Taalbegrip. Handleiding. Nijmegen: Berkhout.
Elliott, C.D., Murray, D.J. & Pearson, L.S. (1979-82). British ability scales: Manuals. Windsor:
National Foundation for Educational Research.
Elsjan, M., Kooi, M. van de, Kuiper, M., Raaijmakers, M. & Wensink, J. (1994). SON-R 2,-7 en
TOMAL: samenhang tussen een niet-verbale intelligentietest en een geheugentest. RU Groningen: intern verslag.
Evers, A., Vliet-Mulder, J.C. van & Laak, J. ter (1992). Documentatie van Tests en Testresearch in
Nederland. Assen: Van Gorcum.
Flynn, J.R. (1987). Massive IQ Gains in 14 Nations: What IQ tests Really Measure. Psychological
Bulletin, 2, 171-191.
Frostig, M., Lefever, D.W. & Whittlesey, J.R.B. (1966). Administration and scoring manual for the
Marianne Frostig Developmental Test of Visual Perception. Palo Alto, CA: Consulting Psychologists Press.
Goswami, U. (1991). Analogical reasoning: what develops? A review of research and theory. Child
Development, 62, 1-22.
Guilford, J.P. & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.).
New York: McGraw-Hill.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.
Haan, N. de & Tellegen, P.J. (1986). De herziening van de schriftelijke taaltest voor doven. RU
Groningen: intern verslag.
Haasen, P.P. van, Bruyn, E.E.J. de, Pijl, Y.J., Poortinga, Y.H., Lutje Spelberg, H.C., Steene,
G. vander, Coetsier, P., Spoelders-Claes & Stinissen, J. (1986). WISC-R, Wechsler
Intelligence Scale for Children Revised. Nederlandstalige uitgave. Lisse: Swets & Zeitlinger.
Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications.
Boston, MA: Kluwer-Nijhoff.
Hammill, D.D., Pearson, N.A. & Voress, J.K. (1993). DTVP-2 Developmental Test of Visual Perception. Examiners manual. Second Edition. Austin, TX: Pro-Ed.
Hammill, D.D., Pearson, N.A. & Wiederholt, J.D. (1996). CTONI Comprehensive Test of Nonverbal
Intelligence. Examiners Manual. Austin, TX: Pro-Ed.
Harinck, F. & Schoorl, P. (1987). Wast vernieuwde WISC-R werkelijk witter? Kind en adolescent, 3,
109-118.
REFERENCES
207
Harris, S.H. (1982). An evaluation of the Snijders-Oomen Nonverbal Intelligence Scale for Young
Children. Journal of Pediatric Psychology, 7, 3, 239-251.
Hessels, M.G.P. (1993). Leertest voor Etnische Minderheden. Theoretische en Empirische
Verantwoording. Rotterdam: RISBO.
Hofstee, W.K.B. (1990). Toepasbaarheid van psychologische tests bij allochtonen. Rapport van de
testscreeningscommissie ingesteld door het LBR in overleg met het NIP. Utrecht: Landelijk
Bureau Racismebestrijding.
Hofstee, W.K.B. & Tellegen, P.J. (1991). SON 2,-7, subsidie-aanvraag NWO 560-267-033. Groningen: RUG Persoonlijkheids- en Onderwijspsychologie.
Horn, J. ten (1996). Amerikaanse validering van de Snijders-Oomen niet-verbale intelligentietest voor
jonge kinderen, de SON-R 2,-7. RU Groningen: intern verslag.
Jenkinson, J., Roberts, S., Dennehy, S. & Tellegen, P. (1996). Validation of the Snijders-Oomen
Nonverbal Intelligence Test Revised 2,-7 for Australian Children with Disabilities. Journal
of Psychoeducational Assessment, 14, 276-286.
Kaufman, A.S. (1975). Factor Analysis of the WISC-R at 11 age levels between 6, and 16, years.
Journal of Consulting and Clinical Psychology, 43, 135-147.
Kaufman, A.S. & Kaufman, N.L. (1983). K-ABC Kaufman Assessment Battery for Children.
Interpretive Manual. Circle Pines, MN: American Guidance Service.
Kiers, H.A.L. (1990). SCA: een programma voor simultane component analyse. Groningen: IEC,
ProGamma.
Kiers, H.A.L. & ten Berge, J.M.F. (1989). Alternating least squares algoritms for simultaneous
components analysis with equal weight matrices in two or more populations. Psychometrika,
54, 467-473.
Kievit, Th. & Tak, J.A. (1996). De praktijk van de hulpverlening en het gebruik van de regulatieve
cyclus. In: Kievit, Th., Wit, J de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan kinderen. Utrecht: De Tijdstroom.
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the SON-R 5,-17, the SnijdersOomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Lienert, G.A. (1961). Testaufbau und Testanalyse. Weinheim: Verlag Julius Beltz.
Lombard, A.D. (1981). Success begins at Home. Educational Foundations of Pre-schoolers.
Massachusetts, Toronto: Lexington Books.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ:
Lawrence Erlbaum.
Lord, F.M. & Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading, MA:
Addison-Wesley Publishing Company.
Lutje Spelberg, H.C. & Van der Meulen, Sj. (1990). Het meten van taalbegrip en taalproductie,
subsidie-aanvraag NWO 560-256-040. Groningen: RUG afd. Orthopedagogiek.
Lynn, R. (1994). Sex differences in intelligence and brain size: a paradox resolved. Personality and
Individual Differences, 17, 2, 257-271.
Lynn, R. & Hampson, S. (1986). The rise of national intelligence: evidence from Britain, Japan and the
U.S.A.. Personality and Individual Differences, 1, 23-32.
McCarthy, D. (1972). Manual for the McCarthy Scales of Childrens Abilities. San Antonio: The
Psychological Corporation.
Meulen, B.F. van der & Smrkovsky, M. (1983). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding.
Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1986). MOS 2,-8, McCarthy Ontwikkelingsschalen.
Handleiding. Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1987). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding
bij de niet-verbale versie. Lisse: Swets & Zeitlinger.
Millsap, R.E. & Meredith, W.M. (1988). Component analysis in cross-sectional and longitudinal data.
Psychometrika, 53, 123-134.
Mislevy, R.J. & Bock, R.D. (1990). BILOG 3: Item Analysis and Test Scoring with Binary Logistic
Models. Mooresville, IN: Scientific Software.
Neutel, R.J., Meulen, B.F. van der & Lutje Spelberg, H.C. (1996). GOS 2,-4, Groningse
OntwikkelingsSchalen. Handleiding. Lisse: Swets & Zeitlinger.
208
SON-R 2,-7
Nunnally, J.C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill.
Nunnally, J.C. & Bernstein, I.H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill.
Raven, J.C. (1962). Coloured Progressive Matrices. London: Lewis.
Rekveld, I. (1994). De cognitieve ontwikkeling van kleuters in het basisonderwijs. OVG
Groningen: Intern verslag.
Resing, W.C.M., Bleichrodt, N. & Drenth, P.J.D. (1986). Het gebruik van de RAKIT bij allochtoon
etnische groepen. Nederlands Tijdschrift voor de Psychologie, 41, 179-188.
Reynell, J.K. (1977). Reynell Developmental Language Scales. Windsor: NFER-Nelson.
Reynell, J.K. (1985). Reynell Developmental Language Scales, second revision. Windsor: NFERNelson.
Reynolds, C.R. & Bigler, E.D. (1994). TOMAL Test of Memory and Learning. Examiners manual.
Austin, TX: Pro-Ed.
Roelandt, Th., Roijen, J.H.M. & Veenman, J. (1992). Minderheden in Nederland: statistisch vademecum 1992. s-Gravenhage: SDU/uitgeverij.
Sattler, J.M. (1992). Assessment of Children. Revised and Updated Third Edition. San Diego, CA:
J.M. Sattler, Publisher, Inc.
Schlichting, J.E.P.T., Eldik, M.C.M. van, Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F.
van der (1995). Schlichting Test voor Taalproduktie. Handleiding. Nijmegen: Berkhout.
Schroots, J.J.F. & Alphen de Veer, R.J. van (1976). LDT Leidse Diagnostische Test, deel 1
Handleiding. Amsterdam: Swets & Zeitlinger.
Sijtsma, K. (1993). Kaf en koren onder Nederlandse tests. De Psycholoog, 28, 12, 502-503.
Smulders, F.J.H. (1963). STUTSMAN intelligentietest voor kleuters. Nederlandstalige bewerking.
Nijmegen: Berkhout.
Snijders-Oomen, N. (1943). Intelligentieonderzoek van doofstomme kinderen. Nijmegen: Berkhout.
Snijders, J.Th. & Snijders-Oomen, N. (1958) eerste editie, (1970) tweede editie. Snijders-Oomen
niet-verbale intelligentieschaal SON-58. Groningen: Wolters-Noordhoff.
Snijders, J.Th. & Snijders-Oomen, N. (1976). Snijders-Oomen Non-verbal Intelligence Scale, SON
2,-7. Groningen: Tjeenk Willink BV.
Snijders, J.Th., Tellegen, P.J. & Laros J.A. (1989). Snijders-Oomen non-verbal intelligence test,
SON-R 5,-17. Manual and research report. Groningen: Wolters-Noordhoff.
Snippe, M.D. (1996). Prestaties van kinderen met autisme en aan autisme verwante stoornissen op de
SON-R 2,-7. RU Groningen: Intern verslag.
SPSS Inc. (1990). SPSS/PC+ 4.0 Advanced Statistics. Chicago, Illinois: SPSS Inc.
Starren, J. (1975). SSON 7-17. De ontwikkeling van een nieuwe versie van de SON voor 7-17 jarigen.
Verantwoording en handleiding. Groningen: Wolters-Noordhoff.
Stinissen, J. & Steene, G. vander (1981). WPPSI Wechsler Preschool and Primary Scale of
Intelligence. Handleiding bij de Vlaamse aanpassing. Lisse: Swets & Zeitlinger.
Struiksma, A.J.C. & Geelhoed, J.W. (1996). Intelligentieonderzoek. In: Kievit, Th., Wit, J de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan
kinderen. Utrecht: De Tijdstroom.
Stutsman, R. (1931). Mental measurement of preschool children. Yonkers-on-Hudson, NY: World
Book.
Tellegen, P.J. (1993). A nonverbal alternative to the Wechsler Scales: The Snijders-Oomen Nonverbal
Intelligence Tests. In First Annual South Padre Island International Interdisciplinary
Conference on Cognitive Assessment of Children and Youth in School and Clinical Settings, A
Compendium of Proceedings. Fort Worth, TX: CyberSpace Publishing Corporation.
Tellegen, P. (1997). An Addition and Correction to the Jenkinson et al. (1996) Australian SON-R
2,-7 Validation Study. Journal of Psychoeducational Assessment, 15, 67-69.
Tellegen, P.J. & Laros, J.A. (1993a). The Snijders-Oomen Nonverbal Intelligence Tests: General
Intelligence Tests or Tests for Learning Potential? In: Hamers, J.H.M., Sijtsma, K. &
Ruijssenaars, A.J.J.M. (eds.), Learning Potential Assessment: Theoretical, Methodological and
Practical Issues. Amsterdam/Lisse: Swets & Zeitlinger.
Tellegen, P.J. & Laros, J.A. (1993b). The Construction and Validation of a Nonverbal Test of
Intelligence: The Revision of the Snijders-Oomen Tests. European Journal of Psychological
Assessment, Vol 9, 2, 147-157.
REFERENCES
209
Tellegen, P.J., Winkel, M. & Wijnberg-Williams, B.J. (1997). Snijders-Oomen Nonverbal Intelligence
Test SON-R 2,-7. Manual. Lisse: Swets & Zeitlinger
Tellegen, P.J., Wijnberg, B.J., Laros, J.A. & Winkel, M. (1992). Evaluatie van de SON 2,-7 ten
behoeve van de revisie. RU Groningen: intern verslag.
Verhelst, N.D. & Glas, C.A.W. (1995). Dynamic Generalizations of the Rasch Model. In: Fischer,
G.H. & Molenaar, I.W. (eds.), Rasch Models: Foundations, Recent Developmemts, and
Applications. New York: Springer-Verlag.
Warm, T.A. (1989). Weighted Likelihood Estimation of Ability in Item Response Theory.
Psychometrika, 54, 3, 427-450.
Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. San
Antonio, TX: The Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children Revised. San Antonio,
TX: The Psychological Corporation.
Wechsler, D. (1989). WPPSI-R, Wechsler Preschool and Primary Scale of Intelligence Revised.
Manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1991). WISC-III Manual. San Antonio, TX: The Psychological Corporation.
Westerlaak, J.M. van, Kropman, J.A. & Collaris, J.W.M. (1975). Beroepenklapper. Nijmegen:
Instituut voor Toegepaste Sociologie (ITS).
Wijnands, A. (1997). De SON-R tests: verkennend onderzoek van de SON-R tests bij kinderen en
volwassenen met een verstandelijke handicap. RU Groningen: intern verslag.
Zimmerman, I.L., Steiner, V.G. & Pond, R.E. (1992). PLS-3 Preschool Language Scale-3. Examiners
Manual. San Antonio, TX: The Psychological Corporation.
Zimowski, M.F., Muraki, E., Mislevy, R.J. & Bock, R.D. (1994). BIMAIN 2, Multiple-group IRT
Analysis and Test Maintenance for Binary Items. Chicago, IL: Scientific Software International.