Handbook of Normative Data for Neuropsychological Assessment
Second Edition

Maura Mitrushina, Kyle B. Boone, Jill Razani, Louis F. D'Elia

Oxford University Press

Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education.

Oxford New York Auckland Cape Town Dar es Salaam Hong Kong Karachi Kuala Lumpur Madrid Melbourne Mexico City Nairobi New Delhi Shanghai Taipei Toronto

With offices in Argentina Austria Brazil Chile Czech Republic France Greece Guatemala Hungary Italy Japan Poland Portugal Singapore South Korea Switzerland Thailand Turkey Ukraine Vietnam

Copyright © 2005 by Maura Mitrushina, Kyle B. Boone, Jill Razani, and Louis F. D'Elia

Published by Oxford University Press, Inc., 198 Madison Avenue, New York, New York 10016
www.oup.com

Oxford is a registered trademark of Oxford University Press.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Handbook of normative data for neuropsychological assessment / Maura Mitrushina ... [et al.]. 2nd ed.
p. ; cm.
Includes bibliographical references and indexes.
ISBN-13 978-0-19-516930-0
ISBN 0-19-516930-1
1. Neuropsychological tests--Handbooks, manuals, etc. 2. Reference values (Medicine)--Handbooks, manuals, etc.
[DNLM: 1. Neuropsychological Tests. 2. Reference Values. WL 141 H23654 2005]
RC386.6.N48M58 2005
616.8'0475--dc22 2004054724

9 8 7 6 5 4 3 2 1
Printed in the United States of America on acid-free paper

Contents

I. BACKGROUND

1. Introduction
  Test-Taking Environment
  Test Norms
  Tests
  Standard and Experimental
  When Is a Test Considered Experimental?
  What Determines Whether a Test Is Considered "Standard"?
2. Use of Methodological Concepts in Neuropsychology Practice
  Interface of Neuropsychology with Other Clinical Disciplines
  Applications of Neuropsychological Evaluation
  Different Levels of Data Integration in Neuropsychology Practice
  Judgment and Decision Making in Clinical Neuropsychology
  Strategies in Test Selection
  Normative References and Interpretation of Clinical Data
  Alternative Methods for Interpretation of Clinical Data
  Factors Influencing Performance on Neuropsychological Tests
  Effort and Motivation
  Issues in Cross-Cultural and Multicultural Neuropsychological Assessment
  Final Caveats
  Data Inclusion in Neuropsychological Reports
3. Statistical and Psychometric Issues
  Measurement and Interpretation of Numerical Values
  Standardization of Raw Scores
  Standard Scores and Normal Distribution
  Interpretation of Infrequent (Outlying) Scores
  Interpretation of Scores That Are Not Normally Distributed
  Psychometric Properties of Tests
  Reliability
  Methods of Estimating Test Reliability
  Standard Error of Measurement
  Validity
  Decision Theory
  Base Rates
  Selection Ratio
  Incremental Validity
  Cutoffs and Diagnostic Accuracy of a Test or Interpretive Strategy
  Synthesis of Results of Different Studies in a Meta-Analysis
  Historical Overview and the Rationale for Using Meta-Analysis in This Book
  Application of Meta-Analysis in Clinical Practice
  Advantages
  Sources of Bias
  Selection of Studies and Procedures for Meta-Analyses Presented in This Book
  Literature Search and Selection of Studies
  Procedures Used in the Analyses
  Data Editing
  Regression
  Prediction
  Standard Deviations
  Testing Model Fit and Parameter Specifications
  Effect of Demographic Variables
  Comments on the Applicability of the Meta-Analyses Presented in This Book

II. TESTS OF ATTENTION AND CONCENTRATION: VISUAL AND AUDITORY

4. Trailmaking Test
  Brief History of the Test
  Contributions of Cognitive Mechanisms and Physical Layout Differences to Performance on Parts A and B
  Utility of the Derived Measures, Which Are Based on Differences in Performance Times for Parts A and B
  Utility of the Error Analysis
  Utility of the Cutoffs for Impairment
  Effect of the Order of Presentation and Practice Time, Practice Effect, and Alternate Versions of the TMT
  Culture-Specific Sets of Normative Data and Cultural Adaptations for the TMT
  Modified Versions of the TMT
  Relationship Between TMT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Trailmaking Test Data
  Conclusions
5. Color Trails Test
  Brief History of the Test
  Relationship Between CTT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
6. Stroop Test
  Brief History of the Test
  Current Administration Procedures
  Relationship Between Stroop Test Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Stroop Test Data
  Conclusions
7. Auditory Consonant Trigrams
  Brief History of the Test
  Administration Procedures
  Psychometric Properties
  Relationship Between ACT Performance, Demographic Factors, and Vascular Status
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
8. Paced Auditory Serial Addition Test
  Brief History of the Test
  Modifications and Alternate Formats of the PASAT
  Psychometric Properties of the Test
  Relationship Between PASAT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
9. Cancellation Tests
  Brief History of the Tests
  Ruff 2&7 Selective Attention Test
  Brief Overview of the Ruff 2&7
  Psychometric Properties of the Ruff 2&7
  Relationship Between Ruff 2&7 Performance and Demographic Factors
  Digit Vigilance Test
  Brief Overview of the DVT
  Psychometric Properties of the DVT
  Relationship Between DVT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions

III. LANGUAGE

10. Boston Naming Test
  Brief History of the Test
  Studies Using BNT Error Quality Analyses
  Current Views on the Mechanisms Underlying Confrontation Naming Deficits
  Modifications and Short Versions of the BNT
  Cultural Adaptations and Culture-Specific Normative Data for the BNT
  Psychometric Properties of the Test
  Relationship Between BNT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Boston Naming Test Data
  Conclusions
11. Verbal Fluency Test
  Brief History of the Test
  Psychometric Properties of the Test
  Cognitive Mechanisms Underlying Word Generation
  Biochemical and Anatomical Correlates and Effect of Brain Pathology on Verbal Fluency
  Assessment of Verbal Fluency in Different Languages
  Relationship Between VFT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Verbal Fluency Data
  Conclusions

IV. PERCEPTUAL ORGANIZATION: VISUOSPATIAL AND TACTILE

12. Rey-Osterrieth Complex Figure
  Brief History of the Test
  Administration Procedures
  Alternate Versions
  Scoring Systems
  Reliability
  Clinical Utility
  Culture-Specific Studies and Normative Data for the ROCF
  Relationship Between ROCF Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the ROCF Data
  Conclusions
13. Hooper Visual Organization Test
  Brief History of the Test
  Construct Validity
  Psychometric Properties of the Test
  Relationship Between HVOT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
14. Visual Form Discrimination Test
  Brief History of the Test
  Relationship Between VFDT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
15. Judgment of Line Orientation
  Brief History of the Test
  Psychometric Properties of the Test
  Alternate Brief Forms of the JLO
  Relationship Between JLO Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
16. Design Fluency Tests
  Brief History of the Tests
  Psychometric Properties of the Design Fluency Tests
  Ruff Figural Fluency Test
  Design Fluency Test (Jones-Gotman/Milner Version)
  Relationship Between Design Fluency Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
17. Tactual Performance Test
  Brief History of the Test
  Psychometric Properties of the TPT
  Relationship Between TPT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions

V. VERBAL AND VISUAL LEARNING AND MEMORY

18. Wechsler Memory Scale (WMS-R, WMS-III, and WMS-IIIA)
  Brief History of the Test
  Relationship Between Test Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions
19. List-Learning Tests
  Rey Auditory-Verbal Learning Test
  Variability in Administration of the Rey AVLT
  Functioning of Different Memory Mechanisms, as Assessed by the Rey AVLT
  Practice Effect and Alternate Forms of the Rey AVLT
  Assessment of Auditory Verbal Learning with the Rey AVLT in Different Languages and Cultures
  California Verbal Learning Test, Second Edition
  Structure of the CVLT-II and Description of the Normative Data Provided in the Test Manual
  Alternate and Short Forms of the CVLT-II
  Review of the Recent Literature on the CVLT and CVLT-II
  Effect of Semantic Organization on Recall
  Anatomical Correlates
  Assessment of Learning and Memory in Traumatic Brain Injury
  Assessment of Serial Position Effect in Dementias
  Repeated Administration and Practice Effects
  Assessment of Effort with the CVLT
  Use of the CVLT in Other Languages and Cultures
  Adaptations and Alternate Versions of the CVLT
  Hopkins Verbal Learning Test
  WHO-UCLA Auditory Verbal Learning Test
  CERAD List-Learning Test
  Selective Reminding Test
  Other Verbal and Nonverbal List-Learning Tests
  Relationship Between List-Learning Test Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Rey AVLT Data
  Conclusions
20. Benton Visual Retention Test
  Brief History of the Test
  Psychometric Properties of the Test
  Relationship Between BVRT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions

VI. MOTOR FUNCTIONS

21. Finger Tapping Test
  Brief History of the Test
  Relationship Between FTT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Finger Tapping Test Data
  Conclusions
22. Grip Strength Test (Hand Dynamometer)
  Brief History of the Test
  Relationship Between Hand Dynamometer Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Hand Dynamometer Test Data
  Conclusions
23. Grooved Pegboard Test
  Brief History of the Test
  Relationship Between GPT Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the GPT Data
  Conclusions

VII. CONCEPT FORMATION AND REASONING

24. Category Test
  Brief History of the Test
  Alternate Formats
  Relationship Between Category Test Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Results of the Meta-Analyses of the Category Test Data
  Conclusions
25. Wisconsin Card Sorting Test
  Brief History of the Test
  Anatomical Correlates and Effect of Brain Pathology on the WCST
  Brief Overview of Clinical Findings Using the WCST
  Modifications and Alternate Formats of the WCST
  Psychometric Properties of the Test
  Relationship Between WCST Performance and Demographic Factors
  Method for Evaluating the Normative Reports
  Summary of the Status of the Norms
  Summaries of the Studies
  Conclusions

References

Appendices

1. Where to Buy the Tests
2a. Subject Instructions for ACT According to Boone et al. (1990) and Boone (1999)
2b. Auditory Consonant Trigrams (Boone et al., 1990; Boone, 1999)
2c. Subject Instructions for ACT According to Stuss et al. (1987, 1988)
2d. Auditory Consonant Trigrams (Stuss et al., 1987, 1988)
3. WHO-UCLA Auditory Verbal Learning Test: Instructions and Test Forms
4. Locator and Data Tables for the Trailmaking Test (TMT)
4m. Meta-Analysis Tables for the Trailmaking Test (TMT)
5. Locator and Data Tables for the Color Trails Test
6. Locator and Data Tables for the Stroop Test
6m. Meta-Analysis Tables for the Stroop Test (Golden Version, Interference Version)
7. Locator and Data Tables for Auditory Consonant Trigrams
8. Locator and Data Tables for the Paced Auditory Serial Addition Test
9. Locator and Data Tables for the Cancellation Tests
10. Locator and Data Tables for the Boston Naming Test (BNT)
10m. Meta-Analysis Tables for the Boston Naming Test (BNT)
11. Locator and Data Tables for the Verbal Fluency Test
11m. Meta-Analysis Tables for the Verbal Fluency Test
12. Locator and Data Tables for the Rey-Osterrieth Complex Figure (ROCF)
12m. Meta-Analysis Tables for the Rey-Osterrieth Complex Figure (ROCF)
13. Locator and Data Tables for the Hooper Visual Organization Test (HVOT)
14. Locator and Data Tables for the Visual Form Discrimination Test
15. Locator and Data Tables for the Judgment of Line Orientation Test
16. Locator and Data Tables for the Design Fluency Tests
17. Locator and Data Tables for the Tactual Performance Test
18. Locator and Data Tables for the Wechsler Memory Scale (WMS-R, WMS-III, and WMS-IIIA)
19. Locator and Data Tables for the List-Learning Tests
19m. Meta-Analysis Tables for the Rey Auditory-Verbal Learning Test (Rey AVLT)
20. Locator and Data Tables for the Benton Visual Retention Test
21. Locator and Data Tables for the Finger Tapping Test (FTT)
21m. Meta-Analysis Tables for the Finger Tapping Test (FTT)
22. Locator and Data Tables for the Grip Strength Test (Hand Dynamometer)
22m. Meta-Analysis Tables for the Grip Strength Test (Hand Dynamometer)
23. Locator and Data Tables for the Grooved Pegboard Test (GPT)
23m. Meta-Analysis Tables for the Grooved Pegboard Test (GPT)
24. Locator and Data Tables for the Category Test (CT)
24m. Meta-Analysis Tables for the Category Test (CT)
25. Locator and Data Tables for the Wisconsin Card Sorting Test

Copyright Acknowledgments
Index

[...] the examination. A formal evaluation of the patient's emotional functioning and personality characteristics is also an intrinsic part of neuropsychological evaluation. All this information taken together provides a framework for an accurate understanding of a patient's cognitive strengths and limitations. To illustrate the relationship between different sources of information, consider Figure 1.1. As can be seen, no single section of the pyramid in Figure 1.1 should be used alone to form an opinion about neuropsychological functioning. All interpretive elements (raw data/norms, observations of test-taking behaviors, and medical history/presenting symptoms) must come into play in building evidence that forms the basis for a professional opinion. Further shaping the interpretive process, of course, is the neuropsychologist's clinical judgment, which is influenced by his or her education, professional experience, and research knowledge base.

The history (including medical, psychiatric, educational, vocational, and avocational) and presenting symptoms are important in understanding test data. A detailed medical/psychiatric history is an especially important source of information given that neuropsychological test performance can be greatly influenced by medical or psychiatric conditions.
Documenting risk factors known to affect neuropsychological test performance is essential since an important task of neuropsychologists is to properly attribute the contribution of peripheral nervous system, central nervous system, and medical and/or emotional dysfunction to the clinical picture.

It is also important to observe the qualitative process of performance leading to a specific score on a test. Reporting a score without revealing how it was obtained can sometimes be misleading. For illustrative purposes, assume that the criterion for passing a particular component of a driving test is that the car is parked in the garage. We look and, yes, the car is in the garage: criterion met. The driver passed the test. Or did he? We interview an observer of the event and sadly learn that the person drove through the garage door to get there! Obviously not a stellar performance.

Similarly, with neuropsychological testing, noting how an individual obtained a score can be quite illuminating. Consider two 75-year-old architects (patient A and patient B) who each obtain a score of 36/36 on the Rey-Osterrieth test. On a normative basis alone, performance scores for both patients would be considered within normal limits, but how did these two architects obtain the score? Patient A quickly recognized the overall gestalt of the drawing, drew a box, and filled in the details. Patient B failed to appreciate the drawing's overall shape and built up the figure by accretion, taking 8 minutes to do so, with numerous erasures. Although patient B produced the same score as patient A, the dramatically different approach that patient B followed to complete the design [...]

Figure 1.1. Graphic representation of the relationship between different sources of information contributing to the decision-making process in neuropsychology. (Pyramid sections: observations, raw data/norms, and history; interpretation leads to the report.)

Table 1.1. Published Rey Auditory-Verbal Learning Test Normative Reports

Investigator                   Age group (years)  n            Education (years)  Trial 5 (# of words)  Delayed recall (# of words)  Test location
Rey, 1964                      70-90              15           No info            9.5 (2.2)                                          Switzerland
Query & Megran, 1983           70-81              [illegible]  14                 5.86 (2.04)           3.45 (2.92)                  N. Dakota (all Veterans Administration inpatients with physical complaints)
Cohen et al., pers. comm.      73-89              [illegible]  13.8               9.35 (1.89)           8.25 (2.63)                  Peoria, IL
Bleecker et al., 1988          80-89              [illegible]  13.48              9.2 (2.1)                                          Maryland
Geffen et al., 1990            70-86              [illegible]  na                 8.2 (2.5)             5.6 (2.6)                    S. Australia (all male subjects)
Ivnik et al., 1990             80-84              49           >12                9.0 (2.5)             5.5 (3.3)                    Minnesota
Mitrushina et al., 1991        76-85              [illegible]  13.3 (3.6)         9.7 (2.8)                                          S. California
Mitrushina & Satz, 1991a       76-85              16           14.0 (3.6)         10.3 (2.4)                                         S. California

[...] (53rd percentile). However, close reading of that normative report would reveal that the data were collected on a sample of Veterans Administration patients hospitalized for a variety of physical complaints. Thus, overall performance scores of the comparison sample were probably artificially lowered because of hospitalization effects, chronic pain effects, and dysphoria. Therefore, applying the Query and Megran data would lead the examiner to conclude that the patient's performance was better than it probably was.
Depending on which other remaining normative reports were used, the patient's score would fall in the low average (Geffen et al., 1990, 19th percentile; Ivnik et al., 1990, 12th percentile) or borderline (Rey, 1964, 6th percentile; Cohen et al., personal communication, 4th percentile; Bleecker et al., 1988, 6th percentile; Mitrushina et al., 1991, 9th percentile; Mitrushina & Satz, 1991a, 4th percentile) range. Unfortunately, all the studies reporting trial 5 data for this age group suffer from small sample size (n < 50).

In terms of selecting the "best" study for comparison purposes, those with the smallest sample size should probably be first rejected. This would eliminate the studies of Cohen et al., Bleecker et al., Geffen et al., Rey, and Mitrushina and Satz. As noted in Spreen and Strauss (1998), use of Rey's norms (reported in 1964 but collected in 1944) should also be avoided because of test content and administration differences. These data were collected over 50 years ago in Switzerland, raising serious concerns about cohort and cultural effects. Similarly, data from the Geffen et al. (1990) report should be avoided due to cultural differences in comparing North American vs. Australian samples and the fact that the educational level of the samples was low. We had previously eliminated the Query and Megran (1983) study for the reasons mentioned earlier. Of the two normative reports remaining (Ivnik et al., 1990; Mitrushina et al., 1991), that by Ivnik et al. would be selected because of the larger sample size. The subject and procedural characteristics of these two studies are otherwise nearly identical. Finally, and most importantly, the demographic characteristics of the patient being evaluated matched well with the demographic characteristics of the participants in this normative study.
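The dependence of a percentile rank on the choice of normative report can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Trial 5 means and standard deviations are taken as read from Table 1.1 for three of the reports, the normative scores are assumed to be normally distributed, and the raw score of 6 words is a hypothetical value chosen for the example (the patient's actual score is not given in the surviving text). The function names are likewise made up for this sketch.

```python
from math import erf, sqrt


def percentile_rank(raw: float, mean: float, sd: float) -> float:
    """Percentile rank of a raw score, assuming normally distributed norms."""
    z = (raw - mean) / sd
    # Standard normal cumulative distribution function, expressed via erf.
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


# Trial 5 mean and SD (# of words) as read from Table 1.1; illustrative only.
NORMS = {
    "Query & Megran, 1983": (5.86, 2.04),
    "Geffen et al., 1990": (8.2, 2.5),
    "Ivnik et al., 1990": (9.0, 2.5),
}

RAW_SCORE = 6  # hypothetical Trial 5 score, not taken from the text


def compare_norms(raw: float, norms: dict) -> dict:
    """Rounded percentile rank of the same raw score under each normative report."""
    return {
        study: round(percentile_rank(raw, mean, sd))
        for study, (mean, sd) in norms.items()
    }


if __name__ == "__main__":
    for study, pct in compare_norms(RAW_SCORE, NORMS).items():
        print(f"{study}: percentile {pct}")
```

Under these assumptions the same raw score lands near the middle of the hospitalized Query and Megran sample but well below average against the other two samples, which is the pattern the passage describes.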
If one were interested in examining the delayed recall performance of this patient, four studies would be available for normative [...]

2. Use of Methodological Concepts in Neuropsychology Practice

INTERFACE OF NEUROPSYCHOLOGY WITH OTHER CLINICAL DISCIPLINES

The notion of mental status has three components:

1. Mood and affect
2. Perception and content/process of thought
3. Cognitive status

Factors such as appearance, motor activity, insight, and motivation can be incorporated into these three components of mental status. Clinicians specializing in different disciplines approach the mental status evaluation from different perspectives. For psychiatrists, the presence of the following symptomatology is of primary importance: affective symptoms (e.g., depression, mania, rapid cycling), perceptual disturbances (e.g., hallucinations), disturbance in the content of thought (e.g., delusions), and disturbed process of thought (e.g., tangentiality, loosening of associations, flight of ideas). Assessment of cognitive status represents one of the aspects of psychiatric evaluation and includes brief appraisal of the level of consciousness, orientation, attention, memory, language, ability to follow verbal commands, calculations, abstract reasoning, fund of general information, and judgment.

Assessment of cognitive and affective components of mental status also constitutes part of a neurological evaluation that addresses higher cortical and limbic system functions. In addition, the neurological evaluation focuses on the integrity of the lower levels of the nervous system through assessing functions of the cranial nerves, motor systems, sensory systems, reflexes, coordination, station, and gait.
Psychiatric and neurological approaches are redefined in the context of neuropsychiatry, a recently rediscovered medical discipline that has its roots in the notion of "psychosomatics." Neuropsychiatry is concerned with both neurological and psychiatric symptoms of brain-related disorders and is supported by advances in neuroimaging, psychopharmacology, genetics, and molecular biology.

In the context of psychiatric, neurological, and neuropsychiatric evaluations, cognitive status is assessed by unstructured questioning or through administration of structured screening instruments, allowing quantification of a patient's cognitive status (e.g., Mini-Mental State Examination: Folstein et al., 1975). This assessment is brief (limited to 10-20 minutes) and, therefore, yields only a gross estimate of [...]

[...] (Kaplan et al., 1991) attests to the importance of the qualitative performance indices even in the context of structured battery assessment. This movement toward attending to performance quality without compromising standardization of test administration procedures (or with minimal modification of the procedures) is also reflected in Lezak et al.'s (2004) distinction between "optimal" and "standard" testing conditions.
Based on a comprehensive review of recent developments in this area, Caplan and Shechter (1995) formulated the distinction between testing and evaluation as follows:

We view the former as a largely mechanical enterprise that, because of its rigidity, lends itself well to group or computer-based applications. Evaluation is, by contrast, an art applied on an individual basis that involves not only testing skills, but also professional creativity, observational expertise, flexibility, and ingenuity in the service of developing a multidimensional understanding of patients: their abilities and deficits, their emotional state, self-regulatory functions, the impact of environmental variables on test performance, and so forth. (pp. 359-360)

The authors passionately advocate flexibility in testing procedures, specifically in the rehabilitation setting, to allow patients to maximally express their potential in test performance.

A similar appeal to "see beyond the test data" in offering an opinion on psychological functioning in the litigation setting was voiced by Matarazzo (1990). In forensic evaluations, it is especially important to address numerous sources of bias in the test data (see van Gorp & McMullen, 1997), which affect the accuracy of interpretations. Matarazzo proposed a distinction between psychological testing and psychological assessment, where the latter incorporates historical information, medical history, and other relevant information in clinical decision making. Meyer et al. (2001) refined the distinction with an emphasis on usage of multiple test methods in the latter, incorporated in the context of historical information and behavioral observations, and addressed applications of the obtained information.

According to this model, the following two levels of data integration should be considered in neuropsychological practice:

1. Testing refers to the psychometric aspects and addresses the quantitative appraisal of a patient's performance on different measures. It yields a score or a set of scores that allow comparison with normative data or with a patient's own scores across different tests and over time.
2. Assessment incorporates qualitative aspects of test interpretation in addition to psychometric determination of a patient's relative standing in reference to the normative data. It is reliant on behavioral observations to allow better understanding of the nature of difficulties in test performance and of dysfunctional cognitive mechanisms contributing to low test scores. The clinician integrates various sources of information to place the interpretation of a patient's psychometric profile in the context of his or her history and current condition. Information is based on behavioral observations and an interview with the patient, in addition to the patient's test performance. Additional information can be obtained from medical and school records, interviews with significant others, schoolteachers, nursing staff, etc.

The following issues constitute the essence of a neuropsychological assessment: the psychometric aspect of a patient's performance across cognitive domains; qualitative interpretation of dysfunctional mechanisms; the patient's behavior and interaction with the clinician; effort/motivation to perform on the tasks; other aspects of mental status, including affective state; personality characteristics impacting information processing; demographic information, including educational and occupational history; medical and psychiatric history; family history; current symptomatology, progression of symptoms, and treatment; sources of social/financial support and living conditions; motivation to improve and future plans.
Neuropsychological evaluations based on sound assessment techniques, with proper [...]

[...] average, or just average relative to the normative group data (cf. Anastasi, 1988). For instance, knowing that a medically healthy 76-year-old male obtained 18 out of 36 possible points on 3-minute delayed recall of the Rey-Osterrieth Complex Figure has little meaning by itself because the raw score conveys no information regarding the expected performance score. We have no idea whether this is a good, bad, or average score. Even knowing that 50% of the figure was recalled has little meaning because there is no way to discern what percent recall would be expected. In this example, the subject (i.e., medically healthy 76-year-old male) and procedural (i.e., direct copy of drawing followed by 3-minute delayed recall without warning, with scoring following Taylor's method) variables are known and used to locate an appropriate normative sample. When the raw performance score is contrasted with the range of scores obtained by the normative sample, one can determine that a recall score of 18/36 is in the high average range (80th percentile; in reference to the norms reported by Boone et al., 1992). In other words, we now know the subject's relative standing compared to the normative group (namely, that performance is better than 80% of all normals who took the test).

To more precisely judge the nature of performance on a test relative to the reference normative (e.g., standard) group, the raw score is converted to a standard score (typically a z or T score, which is expressed in terms of standard deviation units from the mean; see Chapter 3).
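The conversion just described can be sketched in Python as follows. This is a minimal illustration, not a procedure taken from the book: it assumes normally distributed normative scores, uses the conventional T-score metric (mean 50, SD 10) and deviation-IQ metric (mean 100, SD 15), and all function names, the `higher_is_better` flag, and the numbers in the usage lines are hypothetical.

```python
from math import erf, sqrt


def z_score(raw: float, mean: float, sd: float, higher_is_better: bool = True) -> float:
    """Distance of a raw score from the normative mean, in SD units.

    For timed tests, where a larger raw score means poorer performance,
    pass higher_is_better=False so that a negative z always means
    "worse than the normative mean".
    """
    z = (raw - mean) / sd
    return z if higher_is_better else -z


def t_score(z: float) -> float:
    """T score: mean 50, standard deviation 10."""
    return 50 + 10 * z


def iq_equivalent(z: float) -> float:
    """Deviation-IQ equivalent: mean 100, standard deviation 15."""
    return 100 + 15 * z


def percentile_rank(z: float) -> float:
    """Percentile rank under the standard normal distribution."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))


if __name__ == "__main__":
    # Hypothetical test: normative mean 50, SD 10, raw score 40.
    z = z_score(40, 50, 10)
    print(z, t_score(z), iq_equivalent(z), round(percentile_rank(z), 1))
```

Because every scale is a linear function of z, scores from tests recorded in different units can be placed on one common metric before they are compared.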
Such conversion permits not only determination of the subject's relative standing compared with the normative group but also direct comparison of scores across different tests. The development of “standard” measurement scales is especially important to neuropsychologists since test scores collected while assessing the same functional domain are often expressed in different units of measurement. For instance, when assessing motor functioning, the Grooved Pegboard Test score is based on the number of seconds to complete placing metal pegs in all the grooved slots on the pegboard, the PIN Test score is based on the number of holes punched, the Dynamometer score is expressed in kilograms, and the Finger Tapping Test score is expressed in terms of the number of taps made in 10 seconds. The ability to convert each of these various scores to a standard score equivalent, regardless of the previously expressed units of measurement (seconds, number of holes punched, kilograms, etc.) allows determination of a subject's relative standing in one distribution and permits its comparison with relative standing in another. The underlying assumption when using z or T scores is that the distribution of scores obtained by the normative sample follows what is known as the “standard normal distribution,” which approximates the bell-shaped normal curve (see Chapter 3). Therefore, there is a fixed relationship between the standardized test scores, z scores, and percentile ranks. Table 2.2 illustrates the interrelationship between z scores, percentile ranks, and corresponding WAIS-III IQ equivalents. A positive z score will translate to a percentile rank of 50 or greater (refer to left side of the percentile rank column) and to a WAIS-III IQ of 100 or greater (left side of the WAIS-III IQ column). A negative z score will translate to a percentile rank below 50 (use right side of column) and a WAIS-III IQ below 100 (right side of column). Consider the following example.
You have just assessed Mr. Smith's right (dominant) hand performance on the Grooved Pegboard Test, and you note that it took him 68 seconds to complete. Mr. Smith is 35 years old and has finished 11 years of formal schooling. He has lived almost his entire life in a large western Canadian city and only recently moved to the city where you evaluated his performance. After surveying the available normative data for possible comparison purposes (see Chapter 23), you decide that use of Bornstein’s (1985) normative data for the Grooved Pegboard performance would be optimal. Examining the normative table, you note that males in his age and education group performed the test with their dominant hand in 65.3 (8.5) seconds.

$$z = \frac{68 - 65.3}{8.5} = 0.32$$

Considering that higher scores on this test reflect poorer performance (since it took

a cutoff involves computation of the area under the ROC curve, which represents the most useful index of diagnostic accuracy (Swets, 1996). The score associated with the largest area under the curve is the most sensitive cutting score. In spite of their popularity, the use of cutoff scores has been criticized for imposing an artificial dichotomy on a continuous distribution and for subjective judgment involved in the selection of cutoff points (Dwyer, 1996). It does not allow consideration of a spectrum of related disorders among diagnostic possibilities. According to a number of studies, use of a single cutoff for a specific test, without considering other clinical information and demographic factors, results in a large number of false-positive misclassifications.
A similar effect on classification accuracy is rendered by the failure to account for the base rates (see Chapter 3) of the criterion condition in the normative sample. In addition, an ROC curve yields reliable cutoff scores for diagnostic classifications only when an external criterion measure provides a reliable basis for diagnosis in the clinical group. Conditions resulting in subtle cognitive dysfunction frequently do not have a reliable external diagnostic criterion, which undermines the accuracy of classification. Some investigators suggest that many of the current cutoffs are too conservative (Fromm-Auch & Yeudall, 1983), thereby generating too many false negatives. However, their work has been done primarily with highly educated, high-IQ samples; and of course, cutoffs based on average performers would generate a high false-negative rate. Unfortunately, a large number of studies document unacceptably high false-positive misclassification rates, placing normal subjects into impaired ranges across different tests which are interpreted using a cutoff criterion (see chapters on Halstead-Reitan Battery tests in this book).

The authors of the Mayo Older American Normative Studies (MOANS) used the overlapping interval strategies described by Pauker (1988) to maximize the sample size of the normative distribution at each midpoint age interval (Ivnik et al., 1996; Smith & Ivnik, 2003). Ten-year age bands were staggered at successive 3-year midpoint intervals. For example, all participants between 62 and 71 years of age contributed to the 67 year midpoint interval, whereas part of this sample also contributed to the 65-74 year age band with the midpoint of 70 years and to the 68-77 year age band with the midpoint of 73 years. The data were co-normed on the same normative cohort for a large battery of tests.
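The overlapping age-band scheme described above can be sketched in a few lines of Python. This is only an illustration, not the MOANS authors' procedure: the function name and the grid of midpoints are hypothetical, and the band limits (midpoint minus 5 to midpoint plus 4) are inferred from the three bands quoted in the text.

```python
def contributing_midpoints(age, midpoints, years_below=5, years_above=4):
    """Return every midpoint whose 10-year band contains `age`.

    Band limits are inferred from the text's examples
    (62-71 -> midpoint 67, 65-74 -> 70, 68-77 -> 73).
    """
    return [m for m in midpoints if m - years_below <= age <= m + years_above]


# Hypothetical grid of midpoints spaced 3 years apart.
midpoints = list(range(61, 80, 3))  # 61, 64, 67, 70, 73, 76, 79

# A 69-year-old falls inside three overlapping bands.
print(contributing_midpoints(69, midpoints))  # [67, 70, 73]
```

Because each participant contributes to several bands, every midpoint is backed by a larger sample than non-overlapping 3-year bins would provide.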
The raw score distribution for each test at each midpoint age was normalized by assigning standard scores with a mean of 10 and SD of 3, based on actual percentile ranks. Formulas based on linear regressions were generated for each test in the battery to be applied to the normalized, age-corrected MOANS Scaled Scores to adjust for education. Duff et al. (2003) used the overlapping midpoint age interval technique with 5-year midpoint intervals to report normative data for the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; Randolph, 1998). Age-corrected scaled scores were further converted into education-corrected scaled scores using the same method across four education levels. In spite of the complexity of the procedures used in these studies for derivation of the normative data, these techniques hold great promise as they allow maximization of the sample size for each age interval, “smoothening” of the transition in normative expectations as the patient passes from one age group to another, and direct comparisons between various tests as they are co-normed on the same sample.

Crawford and colleagues proposed a single-case approach, where an individual is treated as a sample of n=1. Crawford and Howell (1998a) described a modified t-test method for interpreting,

Assessment of effort performs the important gatekeeper function of ensuring that patients receive compensation only for actual disabilities and that neuropsychological assessments are not misused by individuals perpetrating fraud.
In the case of somatoform/conversion disorder patients, identification of noncredible cognitive complaints can steer treatment strategies away from reinforcing the medical patient role to addressing the underlying issues/concerns driving the symptom creation.

Issues in Cross-Cultural and Multicultural Neuropsychological Assessment

Issues of ethnic diversity pose difficult and serious challenges for the field of neuropsychology, particularly as clinicians are more frequently being requested to assess functioning in patients from varied ethnic, cultural, socioeconomic, and linguistic backgrounds (see Ferraro et al., 2002; Fletcher-Janzen et al., 2000; Nell, 2000, for review). Practitioners are faced with a number of problems when conducting such assessments. One problem may be finding standardized neuropsychological instruments in languages other than English. Alternatively, if the patient speaks English, the problem may be finding ethnicity-specific normative data that take into account issues such as culture and bilingualism. Some of the critical issues in using these approaches in cross-cultural and multicultural neuropsychological assessment will briefly be discussed below. Due to the lack of availability of neuropsychological instruments in other languages, clinicians often have to translate tests or find translated versions of standardized tests in the literature. However, Puente and Ardila (2000) describe a number of methodological problems with such an approach. These authors point out that “translation and adaptation require much time and expertise.” For example, translation of neuropsychological tests from English to Spanish is quite complex given that there are a number of Hispanic subgroups that use varied idioms and expressions of the language.
The issue of test translation is just as difficult in other ethnic groups, such as Asians (Wong, 2000) and Middle Easterners (Escandell, 2002), where there is large diversity in the languages or dialects and cultures. Another methodological problem in adapting English-standardized tests into other languages involves the validity of such measures when used with other cultural groups (Ponton & Ardila, 1999; Puente & Ardila, 2000; Nell, 2000). Simple language translations may fail to take into account the impact of familiarity and relevance of the test items in different ethnic groups (Escandell, 2002; Puente & Ardila, 2000; Wong, 2000), possibly compromising the validity of the results. Similarly, once a neuropsychological test has been translated into another language, it may no longer measure the same cognitive functions it was once thought to measure in its standard form. For example, Escandell (2002) points out that according to a study by Loewenstein et al. (1995), a pattern of correlations for a measure of daily functioning, the Direct Assessment of Functional Status (DAFS) test, was different when it was translated into Spanish relative to its original English version. Similarly, Puente and Ardila suggest that tests such as Digit Span in the WAIS or WISC may require other cognitive processes when administered in Spanish than in English since naming the digits requires a different number of syllables. While developing cross-cultural tests can be challenging, when specific procedures and guidelines are used, adequate outcomes can be achieved. Along with other approaches, Puente and Ardila (2000) recommend using Brislin’s (1983) three-step procedure when translating tests, particularly for the Hispanic population. These steps include the initial translation, back translation, and resolving differences between the original version and the resulting translated version (for a more detailed discussion of this approach, see Puente & Ardila, 2000).
Additionally, translation and test development for specific ethnic groups require a thorough understanding of the group's culture and a familiarity with the language. Another issue is that normative data need to take into account various cultural factors in addition to the usual demographic factors.

Practitioners who are confident in their neuropsychological skills and knowledge are not reticent to provide raw test data and to reveal how interpretations were derived. Matarazzo (1995) asserts that, in fact, the inclusion of test scores serves to minimize and clarify any interpretation biases or idiosyncrasies on the part of the writer. The inclusion of raw test scores in reports is also critical for comparing the results of initial and subsequent neuropsychological evaluations. Further, access to some services (e.g., Regional Center resources for individuals with extremely low intelligence) and some criminal sentencing decisions (e.g., ineligibility for the death penalty in mentally retarded individuals) require the

Pieniadz and Kelland (2001), summarizing the results of a survey completed in the 1990s by 81 directors of neuropsychology training programs, indicated that only 35% of respondents routinely appended test scores to reports. The reasons most commonly given for including numerical data were “thoroughness” (100%) and “facilitation of comparison” of test records (96%). Of those who did not append actual test results, 80% indicated that their decision was based on a desire to avoid misinterpretation by unqualified persons. In contrast, Donders (2001), summarizing the results of a survey completed by 414 U.S.
members of American Psychological Association Division 40, revealed that 88% of respondents included numeric data in their reports, although raw scores were provided less frequently. It is of interest that although the most commonly stated reason for omitting actual test scores from reports is protection of the patient from misinterpretation of results by nonpsychologists, no empirical data have emerged in the decades of psychological testing showing any harmful effects caused by inclusion of scores in test reports. In fact, Freides (1995) and Matarazzo (1995) assert that there is a greater potential for harm from interpretations of scores without the scores than from scores without interpretations. The interested reader is directed to more thorough discussions of this topic by Freides (1993, 1995), Pieniadz and Kelland (2001), Donders (2001), Matarazzo (1995), and Nangle and McSweeny (1995, 1996). We recommend that neuropsychological reports contain (1) all raw scores and percentiles, standard scores, and/or T scores; (2) the normative studies used to derive percentiles if other than the published test manual; and (3) which demographic factors were adjusted for in each test. Reporting of raw scores is important for various reasons. For example, a superior normative data set may emerge after an initial neuropsychological assessment; on retesting, the examiner might want to score both sets of test scores according to the more recent norms, which would not be possible if the initial raw scores were not available.
Further, inclusion of the raw scores enables the reader to check that the scores were in fact converted and interpreted properly,

much poorer performance in relation to distribution B than in relation to distribution A.

[Figure: score distributions with markers for X, −1 SD, and M; horizontal axis: Number of words]

Thus, to account for the variability within the normative distribution, raw scores are standardized, i.e., converted into z scores that relate the difference between an individual score and the group mean (X − M) to the SD for the reference group:

$$z = \frac{X - M}{SD}$$

A negative z score indicates that the raw score lies below the mean for the reference group, a positive z score represents higher performance than the mean for the group, and a z score of 0 indicates that the raw score is equal to the mean of the reference group.¹ The z score (SD units) shows not only how much an individual performance deviates from the mean of the sample but also how likely it is that other individuals in the sample would achieve scores as high or as low as the person being tested. Standardization of raw scores, e.g., their conversion into z scores, allows comparison of the relative standing of individuals across different tests in spite of the differences in the measurement scales or the means and SDs for these tests. A standardized distribution of z scores has a mean of 0 and an SD of 1 because the mean is subtracted from each score and the result is divided by the SD. It preserves the same shape as the distribution of the raw scores from which it was derived. Therefore, differences in standard scores are proportional to the differences in the corresponding raw scores.
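A minimal sketch of the standardization just described may help. The numbers are hypothetical (a raw score of 10 words judged against two reference distributions that share a mean of 12 but differ in SD), and the percentile helper assumes the reference scores are normally distributed, which the text treats as a working assumption rather than a given.

```python
import math


def z_score(raw, mean, sd):
    """Standardize a raw score: (X - M) / SD."""
    return (raw - mean) / sd


def percentile_from_z(z):
    """Percentile rank of z, ASSUMING a standard normal distribution."""
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical reference groups: same mean, different spread.
z_a = z_score(10, mean=12, sd=4)  # wide distribution A
z_b = z_score(10, mean=12, sd=1)  # narrow distribution B

print(z_a, z_b)  # -0.5 -2.0
print(percentile_from_z(z_a) > percentile_from_z(z_b))  # True
```

The same raw score sits half an SD below the mean in the wide distribution but two SDs below it in the narrow one, which is why the raw difference from the mean is uninformative on its own.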
¹For those tests that measure performance in terms of time or number of errors, where the higher scores reflect lower performance, z scores represent an inverse of the obtained score. Mathematically, in these cases the numerator should be multiplied by −1, i.e., −(X − M).

In spite of the obvious advantages of using z scores over raw scores, some of the properties of z scores are viewed as undesirable: (1) z scores have fractional values, which are carried to at least one decimal place; (2) half of the z scores in the standardized distribution are negative and half are positive, which leads to the zero-sum problem (i.e., corresponding values on both sides of the distribution cancel each other when totaled). Parameter values of the standard distribution are arbitrarily designated. Therefore, they can be easily changed through simple arithmetic transformations of z scores. T-score transformations overcome these disadvantages through multiplying z scores by 10 (thus eliminating fractional values) and adding a constant of 50 (which eliminates negative values and places all the scores on a scale of 0-100 with a mean of 50 and SD of 10):

$$T = 10z + 50$$

For example, a z score of −1.6 can be expressed in T scores as follows:

$$T = 10(-1.6) + 50 = 34$$

An example of a test which uses T-score conversion is the Minnesota Multiphasic Personality Inventory (MMPI) and its recent revision. Clinically significant elevations on the scales are judged relative to a mean of 50 and an SD of 10, which equates the scale of measurement across all validity and clinical scales on this test.

STANDARD SCORES AND NORMAL DISTRIBUTION

Many biological measures and human characteristics are distributed so that the highest frequency of scores is observed around the distribution mean, with a gradual decrease in the frequency further away from the mean, which eventually tails off on both sides.
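Returning briefly to the T-score transformation given earlier in this passage, the arithmetic can be sketched as a general rescaling of z scores. The helper name `rescale` is hypothetical; the T-score parameters (mean 50, SD 10) come from the text, and the second call uses the IQ-style metric (mean 100, SD 15) only as a further illustration of the same transformation.

```python
def rescale(z, new_mean, new_sd):
    """Map a z score onto a scale with the chosen mean and SD."""
    return new_sd * z + new_mean


def t_score(z):
    """T score: z multiplied by 10, plus a constant of 50."""
    return rescale(z, new_mean=50.0, new_sd=10.0)


print(round(t_score(-1.6), 1))  # 34.0, the worked example in the text
print(rescale(1.0, new_mean=100.0, new_sd=15.0))  # 115.0
```

Because the transformation is linear, it changes only the units of the scale and leaves each person's relative standing untouched.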
Score distributions of many psychological tests approximate this model, which in its ideal hypothetical form represents a normal distribution. It

3. The split-half method involves splitting the test into two equivalent halves after a single administration. There are different ways of splitting a test. The highest comparability of the two halves is achieved by an odd-even split in which one form contains all odd-numbered items and the other form, all even-numbered items.

4. The internal consistency method estimates the reliability of a test based on the number of items and the averaged intercorrelations among them. This method is mathematically related to the split-half method. Coefficient α is the most general form of this method and represents the mean reliability coefficient obtained from all possible split-half comparisons. In essence, internal consistency estimates compare each item on a test to every other item.

There is no universally agreed best method to evaluate test reliability. Each method has its advantages and disadvantages. The split-half reliability method overcomes theoretical and practical problems associated with the test-retest and alternate forms methods, such as difficulty in developing two equivalent forms of a test, carry-over effects, reactivity effects, and the effect of random variability on two test probes. However, the reliability estimate obtained by the split-half method varies depending on the arbitrarily chosen method of splitting.
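The odd-even split described above can be sketched as follows. This is an illustration under stated assumptions, not the handbook's procedure: the item responses are invented, the function names are hypothetical, and the step that projects the half-test correlation to full test length uses the Spearman-Brown formula, which the text does not name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)


def split_half_reliability(item_scores):
    """Odd-even split-half estimate from one administration.

    item_scores holds one list of item scores per examinee.
    Returns (half-test correlation, Spearman-Brown estimate).
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown (assumed here)
    return r_half, r_full


# Invented pass/fail responses: 5 examinees by 6 items.
responses = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
]
r_half, r_full = split_half_reliability(responses)
print(r_half < r_full)  # True
```

A different assignment of items to halves would generally yield a different estimate, which is the arbitrariness the text points out.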
In addition, the split-half reliability coefficient underestimates the reliability of the full test and requires the use of a correction formula. The level of reliability varies for different tests. Ideally, a highly reliable test would be preferred to a test with low reliability. However, many practical considerations might influence a clinician’s test selection. The cost of error in a decision-making situation is another factor which needs to be considered in selecting an appropriate test for the given situation. Test reliability should be high when a patient's test performance is considered as one of the factors in making a final diagnostic determination. Tests with lower reliability might be acceptable in preliminary screening situations. Typical levels of reliability attained by neuropsychological tests range from 0.95 to 0.80, which represents a high to moderate level of reliability. For a test with a reliability estimate of 0.80, 20% of the variability in scores is due to measurement error. Thus, tests with reliability below 0.80 introduce a considerable proportion of “noise” in scores, which compromises their interpretability. For screening tests, reliability between 0.80 and 0.60 would be acceptable, whereas reliability estimates below 0.60 are usually judged as unacceptably low.

Standard Error of Measurement

The reliability estimate provides a relative measure of the accuracy of test scores. As any correlation, it is influenced by the variability of scores. In a sample with a heterogeneous score distribution, reliability will be higher than in a more homogeneous sample. The reliability estimate does not indicate how much variability should be expected due to measurement error and how accurate the individual test scores are. Therefore, in addition to reporting reliability coefficients, test developers report the size of the standard error of measurement (SEM), which is useful in interpreting the observed scores of each individual patient.
The SEM is determined by the reliability of the test ($r_{xx}$) and the variability of test scores ($\sigma_x$):

$$SEM = \sigma_x \sqrt{1 - r_{xx}}$$

Since no test provides a perfect measure of ability, a certain degree of variability in the scores obtained by the same subject is expected. The SEM indicates how much an individual’s score might vary if he or she is retested repeatedly with the same test (assuming that there is no practice effect or fatigue effect). According to measurement theory, the scores obtained by one subject across an infinite number of retests with the same test would result in a normal distribution, with the mean equal to this subject's “true” score and the SD equal to the SEM. Since in most clinical situations we obtain only one score on a test, we may treat it as an estimate of the theoretical “true” score. Using

clinical situation. Test usefulness depends largely on the context in which the test is used.

Cutoffs and diagnostic accuracy of a test or interpretive strategy

As pointed out above, in the framework of a decision theory approach, both the predictor (test) and criterion values are reduced to only two outcomes. Thus, the continuous nature of test scores is reduced to categories of pass/fail, impaired/unimpaired, etc. Selection of a cutoff point dividing a sequence of test scores into these two categories is another factor that affects the accuracy of decisions. Through manipulating the cutoff, the frequency of a certain type of correct decision can be maximized at the expense of increasing the frequency of another type of error.
For example, test sensitivity, or the ability to correctly identify impaired individuals (expressed as the ratio of TP to all impaired individuals [TP + FN]), can be increased by fixing the cutoff at a small number of incorrect responses. This will reduce the frequency of FN errors but increase the proportion of FP errors. In other words, this will assure correct identification of the majority of individuals with even mild impairment and very few misidentifications of impaired individuals as being intact. At the same time, this will yield a large number of intact individuals who will be misidentified as impaired. The costs of such misidentification include inappropriate treatment, psychological distress, and adverse social/economic consequences. On the other hand, test specificity, or the ability to correctly identify the absence of impairment (expressed as the ratio of TN to all intact individuals [TN + FP]), can be increased by setting the cutoff at a large number of incorrect responses. This will reduce the proportion of FP errors but result in a large number of FN errors. In other words, only those patients who have pronounced impairment will be identified as impaired, and very few intact individuals will be misidentified. However, many individuals with mild symptomatology will be missed. This will preclude timely therapeutic intervention which otherwise would allow stabilization or reversal of these patients’ symptomatology. Thus, manipulation of the cutoff affects the balance between sensitivity and specificity and results in different cost-benefit ratios. Based on the empirical evidence, the cutoff is usually set at a value that ensures a reasonable balance between sensitivity and specificity so that only “borderline” patients will likely be misidentified.
Setting the optimal cutoff yields the highest Hit Rate, i.e., ability of the test to correctly identify the presence and absence of impairment (expressed as the ratio of [TP + TN] to all individuals in the sample [TP + FP + FN + TN]). In making a diagnostic decision, the clinician is concerned with the utility of a test in correctly identifying impairment in an individual patient, i.e., in the test’s predictive value, rather than in its accuracy in discriminating between groups. Positive Predictive Value represents the probability that the patient is indeed impaired, given an impaired test score (expressed as the ratio of TP to all individuals identified by the test as impaired [TP + FP]). Negative Predictive Value represents the probability that the patient is intact given a non-impaired test score (expressed as the ratio of TN to all individuals identified by the test as non-impaired [FN + TN]). The probability of the condition based on the test result (predictive value) is referred to as the posttest probability. However, the usefulness of a test in aiding diagnostic decisions is also determined by the base rates (prevalence) of the condition in a given setting (see above), which represents the pretest probability. These probabilities can be converted into odds of having the condition, which are expressed as the ratio of the probability of having the condition to (1 − probability of having the condition). Posttest odds (which represent the likelihood that the individual who obtained a score X on the test has the condition) take into account the pretest odds and likelihood ratio:

Pretest odds × Likelihood ratio = Posttest odds

where the likelihood ratio represents the odds of a specific test result occurring in an individual who has a condition over the odds of that test result occurring in an individual who does not have the condition.
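The indices defined above can be computed directly from a 2 × 2 table of counts. The sketch below is illustrative only: the counts are hypothetical, the function name is an assumption, and the likelihood ratio shown is the one for a positive (impaired) result on a dichotomized test, sensitivity / (1 − specificity), which is one common instance of the definition in the text.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Accuracy indices from counts of TP, FP, FN, and TN."""
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    base_rate = (tp + fn) / total                 # pretest probability
    pretest_odds = base_rate / (1 - base_rate)
    lr_positive = sensitivity / (1 - specificity)  # positive-result LR
    posttest_odds = pretest_odds * lr_positive
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "hit_rate": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "pretest_odds": pretest_odds,
        "lr_positive": lr_positive,
        "posttest_odds": posttest_odds,
        "posttest_probability": posttest_odds / (1 + posttest_odds),
    }


# Hypothetical sample: 50 impaired and 150 intact individuals.
summary = diagnostic_summary(tp=40, fp=20, fn=10, tn=130)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

When the base rate is taken from the same sample, the posttest probability after a positive result equals the Positive Predictive Value; the odds form becomes useful when the test is applied in a setting with a different prevalence.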
In other words, it

homogeneous studies that are based on the same version of the test or the same administration format. For those chapters that contain the meta-analytic tables, not all studies available in the literature were necessarily included in the database for the analyses. Those data sets that are based on clinical groups not well identified in terms of methodology or on administration of the tests by medical staff, rather than by trained examiners, were not reviewed. Among studies that were reviewed, those that do not contain test means and SDs (or data that can be converted into these statistics), do not have demographic descriptions of the sample, or are based on idiosyncratic samples (e.g., data collected in China) or nonstandard administration procedures were not included in the meta-analyses. An effort was made to identify multiple publications based on the same study and to include a data set from each study only once, to avoid overlapping data sets. Similarly, when data are presented in overlapping age groups, only nonoverlapping data points were used. Data sets based on medical patients and on patients referred for neuropsychological evaluation which yielded no neurological findings were not included. The resulting data sets include data collected primarily (but not exclusively) in the United States and Canada, and the vast majority of the participants across the studies are Caucasian.
Procedures Used in the Analyses

Data were analyzed using Stata, which is a general-purpose, command line-driven statistical package for data management and analysis. It reads data into storage memory and is programmable, allowing the user to add new commands. This package was used for our purposes because it contains a comprehensive set of user-written commands for meta-analysis, in addition to commonly used ordinary regression analysis tools. Data in all analyses were inversely weighted on standard errors for the means since such weighting allows one to account for both sample size and the dispersion around the mean for each data point (calculated as a square root of the ratio of squared SD to the sample size). Data points that have a larger sample size and a smaller variance contribute more to the analysis. This helps to control for study quality since higher-quality studies tend to have larger weights. Stata’s analytic weights were used, which represent the number of elements that gave rise to the statistic representing the data point. Fixed effect with a cluster option was used for all regressions. A cluster option was used to identify data points that were derived from the same study, to account for a lack of independence of data points within each study. Ordinary least square regressions (“regress” command) were used, as opposed to the meta-analysis regression (“metareg” command), because “metareg” does not allow for the cluster option. We opted for the fixed effect based on an assumption that all data came from the same population. Preliminary tests with the “meta” command for all data sets revealed that pooled estimates of the fixed and random effects were comparable (e.g., 42.42 and 42.22, respectively, for the FAS).
Tables of predicted values are based on the parameters identified in the above regressions and include 95% CIs (expected to include 95 out of 100 estimated values if the trials were replicated 100 times), calculated according to the following formula:

$$95\%\ \text{CI} = \text{value} \pm 1.96\sqrt{\operatorname{var}(\text{value})}$$

Data Editing

After the relevant literature was selected, mean test scores with their respective SDs, demographic variables, and study characteristics were recorded in the Stata database partitioned by age and/or education group or for the entire sample as reported by study authors. When data allowed gender comparisons in addition to the overall scores, they were also recorded in a separate file to avoid double sampling. Every entry in the database is viewed as a data point. For example, a study that provides test performance data stratified into 4 age × 2 education groups would generate eight data points. Data were examined for consistency and for outlying scores. To aid us in this examination, we used the “meta” test, which tests data for

significance tests for regressions on SDs are reported. Tests for model fit for the solutions on SDs were performed using the same approach as for the performance scores. The results of these tests were used for decision-making purposes, but they are not presented in the meta-analytic tables in the appendices, to avoid information overload. When the results suggested that age does not account for any notable amount of variability in SD, as reflected in a very low R², mean SDs derived from the original data are listed in the tables as they are applicable for all age groups.
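Two of the quantities described in this section, the standard error of a reported mean and the 95% CI around a value, reduce to short calculations. The sketch below uses hypothetical numbers and hypothetical function names. It applies the interval to a single mean purely for illustration; in the analyses described here the variance term belongs to a regression-based predicted value, which this sketch does not reproduce.

```python
import math


def standard_error(sd, n):
    """SE of a mean: square root of (SD squared / sample size)."""
    return math.sqrt(sd ** 2 / n)


def ci95(value, variance):
    """95% CI as value +/- 1.96 * sqrt(var(value))."""
    half_width = 1.96 * math.sqrt(variance)
    return value - half_width, value + half_width


# Hypothetical data point: mean 42.0, SD 10.0, n = 100.
se = standard_error(sd=10.0, n=100)
low, high = ci95(42.0, variance=se ** 2)
print(se)  # 1.0
print(round(low, 2), round(high, 2))  # 40.04 43.96
```

A data point from a larger or less variable sample has a smaller SE, so weighting inversely on the SE gives it more influence on the fitted values.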
Testing Model Fit and Parameter Specifications

Postestimation tests of parameter specifications were performed to ensure accuracy of the prediction. Though violation of the normality of the residuals would not affect estimates of regression coefficients and predicted values, it would affect the validity of hypothesis testing; in other words, significant deviation from normality would affect the validity of p values for the t-test and F-test. The Shapiro-Wilk W test was used to assess the normality of residuals for the variables used in the regressions. The p value for the W statistic is based on the assumption that the distribution is normal. Thus, high values of p indicate that we cannot reject the hypothesis that the variable is normally distributed. The normality of residuals was also assessed using the “kdensity” plot (Kernel Density Estimate), which approximates the probability density of a variable, and through visual inspection of residuals regressed on age. Close approximation of the estimated curve to the normal density overlaid on the plot and no pattern in the dispersion of residuals support the results of the Shapiro-Wilk test. Kernel Density Estimate and plot of the residuals regressed on age for the FAS are reproduced in Figures 3.3 and 3.4 for illustration purposes (the size of the bubbles in Fig. 3.4 reflects the size of the SEs of the data points, reciprocal to their weights). However, they are not included in the meta-analytic tables in the appendices.

Homoscedasticity, or homogeneity of variances of the residuals, is one of the main assumptions of the regression analysis. We used White's general test for heteroscedasticity, which regresses the squared residuals on all distinct regressors, cross-products, and squares of regressors. It tests the null hypothesis that the variance of the residuals is homogeneous.
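For a single regressor such as age, the auxiliary regression in White's test reduces to regressing the squared residuals on the regressor and its square, and the Lagrange multiplier statistic is n times the R² of that regression. The following pure-Python sketch illustrates the idea under simplifying assumptions: it is unweighted, ignores clustering, handles one regressor only, and uses constructed data; the helper names `solve`, `ols`, and `white_test` are invented for this illustration and are not Stata commands.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols(columns, y):
    """Least-squares fit of y on the columns plus an intercept.
    Returns (residuals, r_squared)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)]
    xty = [sum(X[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    resid = [y[i] - sum(b * v for b, v in zip(beta, X[i])) for i in range(n)]
    mean_y = sum(y) / n
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum(e ** 2 for e in resid)
    # A constant outcome has nothing to explain, so R^2 is taken as 0.
    r2 = 0.0 if ss_tot == 0 else 1.0 - ss_res / ss_tot
    return resid, r2

def white_test(x, y):
    """LM = n * R^2 from regressing squared residuals on x and x^2.
    With 2 degrees of freedom the chi-square p value is exp(-LM / 2)."""
    resid, _ = ols([x], y)
    squared = [e ** 2 for e in resid]
    _, r2 = ols([x, [v ** 2 for v in x]], squared)
    lm = len(y) * r2
    return lm, math.exp(-lm / 2.0)

# Constructed data: the spread of y grows with x (heteroscedastic by design).
x = [-2.0, -2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
y = [1.0, -1.0, 2.0, -2.0, 3.0, -3.0, 4.0, -4.0, 5.0, -5.0]
lm, p = white_test(x, y)
print(lm, p)
```

Because the spread in this constructed data is an exact quadratic function of x, the auxiliary regression fits perfectly and the statistic reaches its maximum, giving a small p value; with a constant spread the statistic would be 0 and p would be 1, the pattern that supports homoscedasticity.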
Low values of the derived Lagrange multiplier statistic and high values of p indicate that we

[...]

Figure 3.3. Kernel Density Estimate, which compares the estimated curve to the normal density (data for the Verbal Fluency-FAS test were used).

[...]

limited variance across the data sets. Only a few studies reported IQ. Levels of education and IQ for the majority of data sets are high. Therefore, the predicted values overestimate expected performance for individuals with a high school education or below and with average or lower than average range of intelligence.

We cannot describe our aggregate sample in terms of ethnic distribution because of scarcity of information on participants’ ethnicity in the individual articles. We believe that the underlying samples are not representative of the mixture of ethnic groups according to U.S. Census figures since many samples were dominated by Caucasian participants. Those data that were collected exclusively on representatives of specific ethnic groups (e.g., Chinese, African American, or Hispanic) were not included in the meta-analyses as they increase the heterogeneity of the data. Ideally, separate analyses on data for different ethnic groups should be conducted in the future, provided that a sufficient number of studies reporting normative data specifically for different ethnic groups will be generated.

Increments in the values of predictor or moderator variables extracted from the literature are uneven.
As reflected in scatterplots depicting the distribution of data points around the regression line for each relevant neuropsychological test, available data seem to cluster at the young and advanced ages, with more scarce data points in between. Further investigations are needed to assure consistency in the relationship between predictor and outcome variables across all ages. However, large gaps in the ranges of predictor or moderator variables were avoided by eliminating extreme scores from the analyses. As a consequence of such adherence to empirically supported data, ranges of demographic categories covered in prediction tables are restricted; e.g., age groups are limited from both ends, and lower levels of education are not represented.

The suggested predictions for age (and education in a few cases) are based on the data for largely intact samples. It is unknown if the same relationship between demographic variables and test performance holds for individuals with brain pathology. Ultimately, normative databases should be expanded to include meta-analyses based on various clinical samples across test batteries, to acquire information on expected performance profiles for different diagnostic categories.

In spite of the weaknesses addressed above, we hope that the predictions presented in this book will facilitate the process of clinical decision making, which encompasses historical, clinical, and psychometric information.

[...]

TESTS OF ATTENTION AND CONCENTRATION

TMT is one of the most frequently used tests.
The TMT is a standard component of screening batteries designed to detect cognitive impairment in different neuropsychological conditions. For example, in 1990, the TMT was adopted as a measure of cognitive impairment by the Drug Abuse Treatment Outcome Study (DATOS), sponsored by the National Institute on Drug Abuse of the National Institutes of Health. The DATOS was a naturalistic, prospective cohort study of adults enrolled in drug abuse treatment programs, which collected data on 10,010 adults in 96 programs across 11 cities in the United States between 1991 and 1993 (Horton & Roberts, 2003). The TMT data for a subsample of 8,521 adults were analyzed and presented by Horton and Roberts in a series of 19 articles published by the International Journal of Neuroscience between 2001 and 2003. The findings reflected in these publications point to significant effects of age, education, and ethnicity on many indices of TMT performance across various groups of drug users. However, the authors emphasized that these demographic effects are weak.

Contributions of Cognitive Mechanisms and Physical Layout to Performance on Parts A and B

The TMT is described as a measure of visual conceptual and visuomotor tracking (Lezak et al., 2004); complex visual scanning with a motor component (Shum et al., 1990) with a contribution of motor speed and agility (Schear & Sato, 1989); simple motor-spatial skills and basic sequencing abilities (Lamberty et al., 1994); visual tracking, mental flexibility, and attention (Crowe, 1998b); visual perceptual abilities (Groff & Hubble, 1981); motor speed and visual attention (Gaudino et al., 1995); attention, simple motor and spatial skills, and sequencing abilities (Martin et al., 2003); and executive function (Burgess, 2003). Based on the results of a neuroimaging study exploring cognitive correlates of brain aging, Coffey et al.
(2001) concluded that the neural substrates for the functions measured with the TMT part B involve multiple systems distributed throughout the brain. They attributed age-related slowing on part B to reduced motor speed, impaired working memory, poor visual scanning, or a combination of several cognitive deficits.

Factor analytic studies indicated that both parts A and B load on a visual perceptual factor (Groff & Hubble, 1981), a spatial factor (Moehle et al., 1990), a visuomotor scanning factor (Shum et al., 1990), a visuomotor speed and coordination factor (Swiercinsky, 1979), a motor problem-solving factor (Goldstein & Shelly, 1975), and a sustained attention and mental tracking factor (Lamar et al., 2002). Because of the complexity of mechanisms contributing to TMT performance, poor performance on this test is a nonspecific finding, which can be attributable to visual perceptual, motor, executive, motivational, or other factors (Anderson et al., 1995; Crowe, 1998b; Heilbronner et al., 1991; Iverson et al., 2002; Lezak et al., 2004; Lorig et al., 1986; Reitan & Wolfson, 1995b).

To tease out a contribution of executive functioning to TMT performance, investigators turned to part B as a more complex measure requiring sequence alternation. According to the literature, several factors contribute to greater difficulty of part B in comparison to part A, which include cognitive demands and physical layout. Part B was found to place additional demands on the ability to alternate (Crowe, 1998b; Gaudino et al., 1995; Salthouse et al., 2000) and to flexibly modify a course of action (Arbuthnott & Frank, 2000; Kortte et al., 2002; Lamar et al., 2002; Lamberty et al., 1994; Pontius & Yudowitz, 1980) with a task-set inhibition component (Arbuthnott & Frank, 2000).
Conversely, several investigators have identified additional demands on the ability to maintain two response sets simultaneously as the cognitive mechanism contributing to the greater difficulty of part B (Eson et al., 1978; Lezak et al., 2004; Reitan, 1971).

Recent studies suggest that differences in physical layout further contribute to the greater difficulty of part B. Rossini and Karl (1994) reported that part B is 32% longer than part A. According to Gaudino et al. (1995), mean distances for parts A and B are 7.8 (3.2) and 10.2 (4.5) cm, respectively, which increases trail length for part B by 56 cm in

[...]

The utility of the cutoff scores was further emphasized by Soukup et al. (1998), who recommended reporting cutoff scores that represent borderline (15th percentile) and defective (<5th percentile) performance in addition to the means and standard deviations in future studies, to offset problems associated with the positive skew in the distribution of TMT scores.

Effect of the Order of Presentation and Practice Time, Practice Effect, and Alternate Versions of the TMT

The effect of the order of presentation on performance on parts A and B was examined by Taylor (1998a) in a sample of patients with neurological disorders and by Miner and Ferraro (1998) in a sample of undergraduate students. Both studies revealed a significant time x order interaction, with time to completion being lower for part A and higher for part B, for the reverse order of presentation.
Taylor (1998a) explained this trend in terms of a slight effect of practice in visual scanning and noted that part B can be used in isolation as omission of part A will not lead to serious distortion of part B performance.

Thompson et al. (1999) examined the utility of practice times in predicting success or failure on the full version of the test. The authors presented tables of classification accuracy for various practice times. They found that 20- and 30-second cutoffs on practice times for parts A and B, respectively, optimized the prediction of successful completion of the full version (within <180 seconds for part A and <300 seconds for part B). The authors underscored the usefulness of practice times in decision making regarding discontinuation of the full version.

Significant practice effects over repeated administrations of the test were reported by Craddick and Stern (1963), Dye (1979), and Mitrushina and Satz (1991a), although Dodrill and Troupin’s (1975) data did not indicate a practice effect in their sample. McCaffrey et al. (1992, 1993) reported significant practice effects for part B within 7-10 days and then 3 months following initial assessment in their group of chronic cigarette smokers with a mean age of 59.1 years (standard deviation [SD] = 9.3). At the fourth testing probe 6 months after the initial assessment, practice effect gains were partially lost. Practice effects, specifically on part B, were reported by DesRosiers and Kavanagh (1987). Frank et al. (1996) also reported a significant practice effect for part B on the 2-year retest for a sample of 380 elderly over the age of 65.

To minimize the practice effect over repeated administrations, several alternate versions of the TMT were developed. Lewis and Rennick (1979) developed alternate forms for part B, which were included in the Repeatable Cognitive-Perceptual-Motor Battery.
Further discussion of the comparability of these forms to part B can be found in Kelland and Lewis (1994) and Lezak et al. ([...]).

DesRosiers and Kavanagh (1987) developed Trail C (TMC) and Trail D (TMD) versions, which retained the same relative position of each circle but inverted the labeled sequences respective to their equivalents, TMA and TMB. Administration of both sets to 16 normal adults in the pilot study yielded high correlations between alternate forms (r = 0.73 and 0.80 for TMA/TMC and TMB/TMD comparisons, respectively). The equivalence of the alternate forms was further investigated in an orthopedic control group and in a sample of closed head injury patients. Alternate forms for both conditions were stable and consistent in both groups.

The equivalence of these alternate versions was further tested by McCracken and Franzen (1992) and Franzen et al. (1996). Based on the data from clinical samples and patients referred for neuropsychological evaluation, the correlation analyses as well as a comparison of solutions for two runs of the principal component analysis provide support for the equivalence of the standard and alternate test versions.

On a sample of healthy adults, LoSasso et al. (1998) found that the TMT-D version is somewhat more difficult than TMT-B. Therefore, the authors concluded that TMT-D can serve as an excellent alternate form to the TMT-B, if it is administered on the retest after the TMT-B.
In the same study, the authors reported that there is no clinically meaningful difference in scores with respect to whether the test is

[...]

Ganguli et al., 1991, 1996; Giovagnoli, 1996; Gordon, 1972; Goul & Brown, 1970; Heaton et al., 1986, 1991, 1999, 2004; Horton & Roberts, 2001, 2002, 2003 [see Brief History of the Test for the description of the study]; Ivnik et al., 1996; Kennedy, 1981; Lamberty et al., 1994; Lannoo & Vingerhoets, 1997; Lee & Chan, 2000a; Libon et al., 1994; Lu & Bigler, 2002; Lyness et al., 1994; Matthews et al., 1999; Parsons et al., 1964; Rasmusson [...], 2000; Siegert & [...]; Small et al., 2000; Soukup et al., 1998; Stanton et al., 1984; Stuss et al., 198[...]; Vlahou & Kosmidis, 2002; Wahlin et al., 1996; Wiederholt et al., 1993). In two other studies, which do not formally acknowledge the association between age and TMT performance, TMT data are presented by age groupings, and the mean scores of the groups obviously increase with age (Fromm-Auch & Yeudall, 1983; Harley et al., 1980).

Coffey et al. (2001) report a significant relationship between TMT part B performance and age-related brain changes documented on quantitative MRI in a sample of 320 elderly nonclinical volunteers, with poorer performance being related to cerebral atrophy (reflected in decreased cerebral hemisphere volume and increased peripheral cerebrospinal fluid volume) and ventricular enlargement.
The authors comment on negative findings in similar neuroimaging studies and suggest that differences in subject characteristics and sample sizes, brain imaging methods, measurement techniques, and approach to statistical analysis might account for this discrepancy.

Yeudall et al. (1987) and Boll and Reitan (1973) detected no association between age and part A or B. Yeudall et al.'s (1987) negative findings may be due to the restricted age range of their sample (15-40); examination of other data sets suggests that declines with age appear to occur after age 40 (Goul & Brown, 1970; Stuss et al., 1987) or age 50 (Kennedy, 1981) for part A. The reason for Boll and Reitan’s (1973) failure to document age effects is less obvious, but it could involve small sample sizes in the very young and very old groups and problems with the data (e.g., very young and very old participants performing poorly). In spite of the reported nonsignificant correlations, the percent of participants correctly classified as normal for the oldest age group (60-64) did fall precipitously compared to other age groups, suggesting that there was a decline with age in test performance at least in this age group.

Gonzalez et al. (2001) did not find a relationship between age and TMT part B performance in a sample of homeless individuals who were receiving medical care. The variance for the test scores was very large (probably due to varied degree of impairment in mental status), which possibly obscured the relationship between age and performance time.
Many studies have documented a significant relationship between education and TMT scores in normal individuals, with higher education levels being tied to better test performance (Alekoumbides et al., 1987; Anthony et al., 1980; Bornstein, 1985; Bornstein & Giovagnoli, 1996; Gonzalez et al., 2001; Gordon, 1972; Heaton et al., 1986, 1991, 1999, 2004; Horton & Roberts, 2001, 2002, 2003a, 2003b; Kennedy, 1981; Lamberty et al., 1994; Lannoo & Vingerhoets, 1997; Lee & Chan, 2000a; Lu & Bigler, 2002; Matthews et al., 1999; Parsons et al., 1964; Portin et al., 1995; Saxton et al., 2000; Stanton et al., 1984; Stuss et al., 1987; Vlahou & Kosmidis, 2002; Wiederholt et al., 1993); however, a few studies did not find a significant correlation between education and TMT scores (Fastenau, 1998; Ivnik et al., 1996; Wahlin et al., 1996; Yeudall et al., 1987).

Heaton and colleagues (1986), assessing the combined effect of age and education on TMT part B in normal participants, documented a significant interaction, suggesting that for individuals less than 60 years old lower levels of education are associated with greater amounts of age-associated impairment and for those more than 60 years old level of education has less of an effect than for younger individuals.

Another aspect of the age/education interaction in reference to part B performance was presented by Richardson and Marottoli (1996). The mean performance for community-residing elderly participants with less than

[...]

provided in the studies described in this chapter.¹
SUMMARIES OF THE STUDIES

Reitan and Wolfson, 1985

The authors provided general guidelines for TMT score interpretation in the form of test completion times (in seconds), which correspond to “severity ranges” for part B only:

0-60 sec: perfectly normal (or better than average)
61-72 sec: normal
73-105 sec: mildly impaired
≥106 sec: seriously impaired

No other information was provided, such as score means, SDs, or any data regarding the normative sample on which these guidelines were developed. These cutoffs represent a substantial departure from cutoffs published earlier; the definition of normal performance here is approximately 20 seconds less than in the 1958 and 1979 guidelines.

Considerations regarding use of the study

The authors argued that these norms were meant as “general guidelines” and that “exact percentile ranks corresponding with each possible score are hardly necessary because the other methods of inference are used to supplement normative data in clinical interpretation of results of individual participants” (p. 97). However, we maintain that more precise scores as well as separate normative data for different age, IQ, and educational levels are necessary to avoid false-positive errors in diagnosis.

Gilandas, Touyz, Beumont, and Greenberg, 1984 (p. 102)

The authors provided the percentile ranks associated with Davies’ (1968) TMT normative data and concluded that a percentile rank of 25 is “mildly suggestive of brain damage” and scores at the 10th percentile and lower are “moderately suggestive of brain damage.”

¹Norms for children are available in Baron (2004) and Spreen and Strauss (1998).

Golden, Osmon, Moses, and Berg, 1981b (pp.
22-23)

The authors provided recommendations regarding the detection of laterality of brain damage:

A is generally considered more a measure of right hemisphere integrity (i.e., visual scanning skills), where part B is more indicative of left hemisphere intactness (i.e., language symbol manipulation and direction of behavior according to a complex plan). Therefore when one part indicates impairment relative to the other part, a lateralized injury may be present. . . . Part A is considered to indicate greater impairment if the score on part B is less than twice the score on part A. Part B indicates greater impairment if its score is more than three times the score on part A. Tests in which the part B score lies between two times and three times the part A score suggest that performances on the two parts are essentially equal.

However, lateralizing properties of performance time ratios for two conditions have been repeatedly refuted in the literature (Hom & Reitan, 1990; Salthouse et al., 1996).

[TMT.1] Davies, 1968 (Table A4.2)

The author published TMT data on 540 British participants as a part of her investigation of the influence of age on TMT performance. Test scores were obtained on 50 men and 40 women in each of six decade age groups. The reference Davies cited as containing a further description of her subject sample could not be located. Mean times in seconds corresponding to 10th, 25th, 50th, 75th, and 90th percentile ranks for parts A and B are provided for each age decade, with the exception that the data on the participants in their 20s and 30s were collapsed. Davies also reports optimal cutoff points for young vs. middle-aged individuals. No significant gender differences were observed within any specific decade, although in the group as a whole men performed slightly but significantly more quickly on part B.

Study strengths

1.
Presentation of the data in 10- or 20-year age intervals.

[...]

Study strengths

1. Large sample size, although the individual cells had only 30 participants per cell.
2. Presentation of the data in terms of age groupings.
3. Reporting of education, IQ estimates, gender, and geographic area.
4. Means and SDs are reported.

Considerations regarding use of the study

1. Very high mean intelligence scores.
2. Some variability in educational level across groups, which may have led to some unusual findings; inexplicably, those 60-69 years old performed either as well as or slightly better than those 50-59 years old.
3. Vague exclusion criteria.
4. Lack of reference to ethnicity/language issues and the fact that data were obtained on Canadians, possibly reducing its generalizability for clinical interpretation in the United States.

[TMT.9] Fromm-Auch and Yeudall, 1983 (Table A4.10)

The authors obtained TMT data on 193 Canadian participants (111 male, 82 female) recruited through posted advertisements and personal contacts. Participants are described as “nonpsychiatric” and “nonneurological.” Eighty-three percent of the sample were right-handed. Mean (SD) age was 25.4 (8.2) years (range = 15-64). Mean (SD) education was 14.8 (3.0) years (range = 8-26) and included technical and university training. Mean (SD) WAIS FSIQ, VIQ, and PIQ were 119.1 (8.8, range = 98-142), 119.8 (9.9, range = 95-143), and 115.6 (9.8, range = 89-146), respectively. Of note, no subject obtained an FSIQ which was lower than the average range.
Mean time in seconds, SDs, and ranges for parts A and B are reported for five age groupings: 15-17, 18-23, 24-32, 33-40, and 41-64 years. Sample sizes range from 10 to 75. The two oldest age groupings had sample sizes less than 20. No gender differences were documented, and male and female data were collapsed.

Study strengths

1. The large overall sample size.
2. Data are partitioned into five age groups.
3. Sample composition is described in terms of IQ, educational level, age, gender, handedness, recruitment procedures, and geographic area.
4. Some psychiatric and neurological exclusion criteria are used.
5. Means and SDs are reported.

Considerations regarding use of the study

1. High intellectual and educational levels of the sample.
2. Sample size for some age groups is very small.
3. Data were collected in Canada, which may limit their usefulness for clinical interpretation in the United States.
4. Essentially no differences in performance were noted between those 18-23 years old and those 24-32 years old, suggesting that use of a single age grouping for 18-32 would have been appropriate.

[TMT.10] Bornstein, 1985 (Table A4.11)

The author collected data on 365 Canadian individuals (178 males and 187 females) recruited through posted notices on college campuses and unemployment offices, newspaper ads, and senior-citizen groups. Participants were paid for their participation. Participants ranged in age from 18 to 69 years, with a mean of 43.3 (17.1) years, and had completed 5-20 years of education, with a mean of 12.3 (2.7) years. Ninety-one and a half percent of the sample were right-handed. No other demographic data or exclusion criteria are reported.

Mean time in seconds and SDs for parts A and B are reported for three age groupings (20-39, 40-59, and 60-69 years), two educational levels (less than high school, greater than or equal to high school), and gender, resulting in a total of 12 separate groups. Individual group sample sizes ranged from 13 to 86.
Significant correlations were obtained between TMT scores and age and education, suggesting that better performance was associated with younger age and more years of

[...]

[TMT.16] Stuss, Stethem, and Poirier, 1987 (Tables A4.17 and A4.18)

The authors collected normative data on 60 Canadian English- or French-speaking participants, who were recruited through personal contacts or employment agencies and paid for their participation. Tests were administered in each subject's native language. Participants were tested twice at 1-week intervals. Exclusion criteria were abnormal vision (even after correction); history of substance abuse; presence of medical, neurological, and/or psychiatric disorders; and current use of psychotropic medication (Stuss, personal communication). Ten participants were assigned to each of six age ranges: 16-19, 20-29, 30-39, 40-49, 50-59, and 60-69. Fifty-five percent of the sample were male, and 18% were left-handed. Mean education was 14.3 (2.62) years. Data are provided regarding handedness, gender distribution, and education.

Mean time in seconds and SDs for the two parts of the TMT for the first, second, and combined testing sessions are reported for each age interval. Mean time and SDs are also provided for males, females, those with less than or equal to 12 years of education, and those with greater than 12 years of education, collapsed across age groupings.
Older participants and those with a high school education or less performed significantly poorer than younger participants or those with some college or university education. Educational level was somewhat irregularly distributed across age groups, and the authors suggest that the normative data be used with caution. A practice effect was present, but the authors question the clinical relevance of the improvement. No significant gender differences in performance were present.

Study strengths

1. Presentation of the data by age groupings, education groupings, and gender.
2. Extensive information on educational level.
3. Sample composition is described in terms of age, gender, handedness, geographic location, and recruitment procedures.
4. Adequate exclusion criteria.
5. Information regarding practice effect.
6. Means and SDs are reported.

Considerations regarding use of the study

1. Small sample sizes within each age group.
2. Variability in mean educational levels across age groups; of importance, those 50-59 years old had the lowest mean educational level, the lowest mean test scores, and the largest SDs relative to the other age groups.
3. Lack of IQ data.
4. Unknown influence of language differences.
5. Data were obtained in Canada and may be of limited usefulness for clinical interpretation in the United States.

[TMT.17] Yeudall, Reddon, Gill, and Stefanyk, 1987 (Table A4.19)

The authors obtained TMT data on 225 Canadian participants recruited from posted advertisements in workplaces and personal solicitations. The participants included meat packers, postal workers, transit employees, hospital lab technicians, secretaries, ward aides, student interns, student nurses, and summer students. In addition, high school teachers identified for participation average students in grades 10-12.
The participants (127 males and 98 females) did not report any history of forensic involvement, head injury, neurological insult, prenatal or birth complications, psychiatric problems, or substance abuse. Data were gathered by experienced testing technicians who “motivated the participants to achieve maximum performance” partially through the promise of detailed explanations of their test performance.

Means and SDs for time in seconds to complete parts A and B are presented for four age groupings (15-20, 21-25, 26-30, and 31-40) for males and females combined and separately. Information regarding percent right-handers, mean years of education, and mean WAIS/WAIS-R FSIQ, VIQ, and PIQ is reported for each age grouping and age-by-gender grouping. For the sample as

[...]

[TMT.23] Elias, Robbins, Walter, and Schultz, 1993 (Table A4.24)

The authors explored the influence of gender and age on performance on tests included in the HRB. The sample consisted of 427 community-dwelling volunteers. As per medical interview and self-report on the Cornell Medical Index, none of the participants had a history of treatment for neurological disorder, senility, alcoholism, brain trauma, mental illness, cerebral vascular or catastrophic disease, or a diagnosis of senile dementia. To achieve equivalence between age groups in terms of education, the lower and upper limits for education were set at 12 and 19 years, respectively. All participants had normal or corrected-to-normal vision. Occupations ranged from blue-collar to professional.
Non-age-corrected WAIS Vocabulary scaled scores ranged from 13.9 to 14.7, and Information scores ranged from 13.2 to 13.7. Mean time in seconds and SDs to complete parts A and B were reported for six age groups (15-24, 25-34, 35-44, 45-54, 55-64, and >65) for males and females separately. The authors found significant linear trends across age cohorts for parts A and B.

Study strengths
1. Large overall sample and adequate sample size for individual cells.
2. The sample composition is well described in terms of age, education, gender, and WAIS Vocabulary and Information scaled scores.
3. Rigorous exclusion criteria.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Education and estimated intelligence level for the sample are high.
2. Age range for the oldest group is not reported.

[TMT.24] Cahn, Salmon, Butters, Wiederholt, Corey-Bloom, Edelstein, and Barrett-Connor, 1995 (Table A4.25)
The study examined the accuracy of neuropsychological measures at detecting Dementia of the Alzheimer's Type (DAT) in a community-dwelling elderly sample. The participants are stable, upper middle-class, retired older adults who entered the Rancho Bernardo Study, surveying for heart disease risk factors, between 1972 and 1974. The initial sample included 5,052 adults between 30 and 79 years of age, who have been followed until the present. Participants over the age of 65 who returned for a reexamination in 1988 and later and screened positive for cognitive impairment were seen in clinic for diagnostic purposes (n = 199). A matched control sample of 203 normal elderly participants who screened negative for cognitive impairment was randomly selected for the comprehensive evaluation, which included neurological examination, neuropsychological assessment, standard medical history and examination, and, in some cases, CT scans of the brain. On the basis of the diagnostic evaluation, the group composition was re-assessed.
The final sample of normal elderly included 238 participants (97 males, 141 females), with a mean age of 78.4 (6.8), education of 13.8 (2.6), and Dementia Rating Scale (DRS) score of 136.8 (5.4). The TMT was administered as part of a larger battery by a trained psychometrist who was blind to the participants' group assignment. Time to completion was reported for the entire sample. In addition, the authors provided optimal cutoff scores and sensitivity/specificity of the TMT for the diagnosis of DAT: 69%/90% for part A at the cutoff of 66 seconds and 87%/88% for part B at the cutoff of 172 seconds.

Study strengths
1. Large sample size.
2. The sample composition is well described in terms of age, education, gender, DRS score, geographic area, history of the project, and recruitment procedures.
3. Rigorous exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.
6. Sensitivity and specificity for optimal cutoff scores for the two parts of the test are reported.

[...] gender, geographic area, and recruitment procedures.
3. Mental status was assessed with MMSE and Blessed Mental Status Exam.
4. Adequate exclusion criteria.
5. Performance for very old group (90-96 years) is reported.
6. Test administration procedures are thoroughly described.
7. Means and SDs for the test scores and the percentage of participants who made errors on parts A and B are reported.
8. Data are partitioned by four age groups.

Considerations regarding use of the study
1. Education level for the sample is very high.
2. No information on IQ is reported.
[TMT.30] Miner and Ferraro, 1998 (Table A4.32)
The study examined the role of different information-processing factors and presentation order in TMT performance. The sample consisted of 110 undergraduate students (88 females and 22 males) from the University of North Dakota, with a mean age of 21.7 (5.24) years, who received a course credit for their participation. Their health was assessed with a background information questionnaire and with the Geriatric Depression Scale. The TMT was administered in a counterbalanced order as part of a larger battery. Those participants who received the test in the part B-part A order demonstrated considerably slower performance on part B in comparison to the group tested in the standard order.

Study strengths
1. Relatively large sample.
2. The sample composition is described in terms of age, education, gender, and incentive for participation.
3. Minimally adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Exclusion criteria are not described.
2. No information on IQ is reported.

[TMT.31] Crowe, 1998b (Table A4.33)
The TMT and a series of measures derived from it were administered to 98 undergraduate students from La Trobe University in Melbourne, Australia, in order to examine cognitive mechanisms contributing to performance on both parts. Participants were screened for a history of loss of consciousness or other neuropathology. The mean age for the sample was 23.4 (3.1) years, mean education 14.0 (2.3) years, and mean Wide Range Achievement Test (WRAT) Reading score 101.0 (9.0). The authors developed modified procedures in an effort to separate cognitive mechanisms contributing to TMT performance. They concluded that visual search and motor speed contributed to performance on part A, whereas visual search and cognitive alternation contributed to performance on part B.
The latter was further influenced by reading level, ability to mentally maintain two simultaneous sequences, attention, and working memory. Time to completion for both TMT parts is provided.

Study strengths
1. Large sample size.
2. The sample composition is described in terms of age, education, gender, WRAT Reading score, and geographic area.
3. Minimally adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. No information on IQ is reported.
2. High educational level of the sample.
3. The data were obtained on Australian participants, which may limit their usefulness for clinical interpretation in the United States.

[TMT.32] Tremont, Hoffman, Scott, and Adams, 1998 (Table A4.34)
The authors challenged Dodrill's (1997) findings of no relationship between level of [...]

[...] a multicenter observational study of heart disease and stroke in Washington County, Maryland, and Pittsburgh, Pennsylvania. No selection criteria were used. Data were analyzed for a sample of 989 participants (444 males and 545 females), who completed all of the cognitive tests included in the battery. The mean age for the sample was 73.63 (4.45) years, and mean education was 13.23 (2.85) years; 93.9% of the sample were white. This sample was divided into two clinical groups and a “no disease” group, based on cardiovascular status. Times to completion for the TMT for the “no disease” sample of 357 participants are reproduced in Table A4.38. Demographic characteristics for this sample are not reported by the authors.
However, we assume that they are similar to the demographics for the entire sample described above.

Study strengths
1. Large sample size.
2. The sample composition is described in terms of age, education, gender, setting, geographic area, and recruitment procedures.
3. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. No exclusion criteria.
2. The data are not partitioned by age group.
3. No information on IQ is reported.
4. Demographic characteristics for the “no disease” group are not reported.

[TMT.39] Chen, Ratcliff, Belle, Cauley, DeKosky, and Ganguli, 2000 (Table A4.41)
A control sample of 483 elderly nondemented individuals was derived from a community-based multiwave prospective study, the Monongahela Valley Independent Elders Survey (MoVIES), in southwestern Pennsylvania. The purpose of the study was to identify cognitive measures that are most accurate in discriminating between individuals with presymptomatic DAT and nondemented individuals. The control participants remained nondemented over a 10-year follow-up period. The study protocol included a standardized general medical history and physical examination; a detailed neurological and mental status examination; hematological, metabolic, and serological tests; and neuroimaging when appropriate. Relevant medical records were abstracted. The sample included 302 females and 181 males, with a mean age of 74.9 (4.4) years; 31.9% of participants had less than a high school education. Times to completion for the two parts of the TMT were reported for the entire sample. Results of the ROC analysis suggested that TMT part B was one of the tests that had the highest accuracy in discriminating between nondemented participants and those who were in the preclinical stages of DAT (area under the curve = 0.773).

Study strengths
1. Large sample size.
2.
The sample composition is well described in terms of age, education, gender, history of the project, and geographic area.
3. Rigorous exclusion criteria.
4. Means and SDs for the test scores are reported.
5. Information on the diagnostic accuracy of part B is provided.

Considerations regarding use of the study
1. The data are not partitioned by age group.
2. No information on IQ is reported.
3. The number of participants with less than a high school education is reported. However, mean education and SD are not reported.

[TMT.40] Small, Graves, McEvoy, Crawford, Mullan, and Mortimer, 2000 (Table A4.42)
The authors examined the relationship between APOE genotype and cognitive functioning in normal aging based on a sample of 413 adults between 60 and 85 years of age, with a mean age of 72.90 years, who were randomly selected from a larger sample of participants in the community-based, cross-sectional Charlotte County Healthy Aging Study conducted in south Florida. The sample was stratified into two age groups, young-old [...]

[TMT.47] Tombaugh, 2004 (Table A4.49)
The author provided normative data for 911 community-dwelling adults between 18 and 89 years of age. The data for volunteers who participated in earlier studies were analyzed. Out of this sample, 823
participants were recruited through booths at shopping centers, social organizations, places of employment, psychology classes, and word of mouth. Exclusion criteria were history of neurological disease, psychiatric illness, head injury, or stroke, per self-report; the remaining 85 participants represent a subset of individuals who had received a consensus diagnosis of “no cognitive impairment” made by physicians and clinical neuropsychologists, based on history, clinical and neurological examination, and an extensive battery of neuropsychological tests, over two successive evaluations separated by approximately 5 years. The author pointed out that all participants 18-24 years old were university students. Mean age for the sample was 58.5 (21.7) years, mean education was 12.6 (2.6) years, and the male/female ratio was 408/503. All participants scored above 23 on the MMSE, with a mean of 28.6 (1.5), and below 14 on the Geriatric Depression Scale, with a mean of 4.1 (3.4). Elderly participants were also excluded on the basis of a clinical evaluation of depression. Trails A and B were administered as part of a larger battery according to the Spreen and Strauss (1998) guidelines. The results indicated that test performance for both Trails A and B was affected by age. Performance on Trails B was also related to education, particularly in individuals over 54 years of age. Therefore, tables of raw data and percentiles are stratified into 11 age groups. For ages 55 and above, they are further partitioned into two education levels (0-12 and 12+ years).

Study strengths
1. Large sample size.
2. The sample composition is well described in terms of age, education, gender, setting, and recruitment procedures.
3. Rigorous exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.
6. Data are stratified by age × education.
Considerations regarding use of the study
1. As the author pointed out, the sample size of the oldest group is small.
2. No information on the intellectual level of the sample is reported.
3. The data were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.

RESULTS OF THE META-ANALYSES OF THE TRAILMAKING TEST DATA (See Appendix 4m)
Data collected from the studies reviewed in this chapter were combined in regression analyses in order to describe the relationship between age and test performance and to predict expected test scores for different age groups. Effects of other demographic variables were explored in follow-up analyses. The general procedures for data selection and analysis are described in Chapter 3. Detailed results of the meta-analyses and predicted test scores across adult age groups for parts A and B are provided in Appendix 4m. Educational range was unevenly represented, with a large gap between 8.5 and 11.59 years at the lower extreme. Based on the preliminary analyses, the data point with 8.5 years of education was retained in the main analyses but dropped in the analyses generating an education-correction factor (see below). After data editing for consistency and for outlying scores, 28 studies for Trails A and 29 for Trails B, which generated 89 data points for each part based on totals of 6,317 and 6,360 participants, respectively, were included in the analyses.
Quadratic regressions of the test scores on age yielded R² values of 0.905 for Trails A and 0.876 for Trails B, indicating that 91% and 85% of the variance in test scores for the two parts, respectively, is accounted for by the [...]

[...] on English (or any other) alphabet letters as part of the test stimuli. Instructions for the CTT may be administered verbally or nonverbally, using only visual cues. Both the TMT and CTT are paper-and-pencil tests that are administered in two parts on an 8½ × 11" page. However, for the CTT1, the numbers 1-25 are printed within colored circles. All even-numbered circles are printed with a bright yellow background and all odd-numbered circles, with a vivid pink background. These background color differences are perceptible even to color-blind individuals. The individual is instructed to quickly draw a continuous line that connects the numbers in consecutive/sequential order. The incidental fact that color alternates with each succeeding number is not highlighted or discussed with the subject since attention to color sequence is not necessary for completion of the CTT1. The CTT2 introduces a divided attentional component, requiring attention to the alternating and sequencing of the stimuli. For the CTT2, the number 1 circle is printed against a vivid pink background; however, the numbers 2-25 are presented twice: once with a vivid pink background and once with a bright yellow background.
The subject has to again quickly connect the numbers in sequence; however, the task requires alternation of colors as the sequence of numbers advances, so the subject must ignore distracter circles that contain the correct number but are printed in the wrong color background (e.g., start with pink 1 and avoid pink 2, select yellow 2, avoid yellow 3, select pink 3, avoid pink 4, select yellow 4, etc.). Therefore, there is always a distracter number that must be avoided because it is printed against a color background that is not appropriate to the sequence. Before the CTT1 and CTT2 are administered, nontimed practice trials are administered to ensure that the subject understands the task. When the CTT1 and CTT2 forms are administered, however, the time required to complete each form is noted. Subjects must complete each form of the test in <240 seconds, or that part of the test is discontinued. The CTT1 is a less cognitively demanding task because it requires the subject to perceptually track only a single specified sequence (number), whereas the CTT2 requires the subject to simultaneously track both a specified number sequence and a separate color sequence. Therefore, an interference index was developed to quantify and highlight the relative difference regarding the effects of visual attention and perceptual tracking required on the CTT1 from the more demanding sustained, divided attention and more complex perceptual tracking required by the CTT2.

\[ \text{Interference Index} = \frac{\text{CTT2 time raw score} - \text{CTT1 time raw score}}{\text{CTT1 time raw score}} \]

The interference index reflects the comparison of the subject's performance on the CTT1 relative to the CTT2. This index is expressed as a function of the level of performance on the CTT1. Therefore, the index score is a relatively “pure” measure of the extent of interference (if any) attributable to the more complex divided attention and the alternating sequencing tasks required by the CTT2.
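The interference index defined above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `interference_index` and the argument names are hypothetical (they do not come from the CTT manual), and the check for positive times is an added assumption rather than a published scoring rule.

```python
def interference_index(ctt1_seconds: float, ctt2_seconds: float) -> float:
    """Return (CTT2 time raw score - CTT1 time raw score) / CTT1 time raw score.

    Hypothetical helper for illustration; both arguments are completion
    times in seconds.
    """
    # Assumption for this sketch: a completion time must be positive,
    # otherwise the ratio is undefined or meaningless.
    if ctt1_seconds <= 0 or ctt2_seconds <= 0:
        raise ValueError("completion times must be positive")
    return (ctt2_seconds - ctt1_seconds) / ctt1_seconds


if __name__ == "__main__":
    # Equal times on both parts: no interference.
    print(interference_index(60, 60))
    # CTT2 takes twice as long as CTT1.
    print(interference_index(60, 120))
    # CTT2 takes four times as long as CTT1.
    print(interference_index(50, 200))
```

Because the difference is divided by the CTT1 time, two examinees with very different overall speeds can obtain the same index if their CTT2 times are the same multiple of their CTT1 times.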
For example, an interference index score of 0 indicates that the subject's time to complete the CTT1 was the same as that to complete the CTT2 (i.e., no interference). An interference index score of 1.0 indicates that the subject required twice as long to complete the CTT2 as the CTT1, whereas a score of 3.0 indicates that it took the subject four times as long to complete the CTT2 relative to the CTT1 (i.e., significant interference). As the interference index score increases, the increasing score suggests the presence of greater susceptibility to cognitive interference from alternating and sequencing demands (i.e., decreased cognitive flexibility).

The WHO's request for a test that would allow broader application in cross-cultural contexts seems quite reasonable. Ideally, neuropsychological procedures that assess the effects of conditions affecting neurological functioning, including brain injury, infectious diseases (e.g., HIV) and other pathologies, should be as culture-free as possible; but is it possible to develop a totally culture-free neuropsychological test? Perhaps not. If this is the case, then procedures should be developed that allow, at minimum, enhanced assessment. [...]

[...] after the effects of age were removed, explaining between 0.4% to 2.4% of the variance. Therefore, the relatively small proportion of women in the normative sample does not constitute a threat to either the validity or the utility of the CTT. (D'Elia et al., 1996)

Spanish-language administration instructions and preliminary normative data for Hispanics are provided in the manual.
The preliminary normative data are from a sample of healthy, normal Hispanics living in southern California, participating in a large, ongoing normative study. The Hispanic data are reported separately since all participants in this subsample were educated outside the United States and were primarily Spanish-speaking or had Spanish as their first language. Data for Hispanics are presented by four age categories: 17-29, 30-39, 40-49, and 50-75 years. The normative data contained in the standardization manual are not reproduced here, and the interested reader is referred directly to the publication for further information.

Study strengths
1. Sample composition is well described in terms of exclusion criteria.
2. Performance is reported by age and education intervals.
3. Data reporting includes means and SD scores for each age/education interval.
4. Age group intervals are generally adequate.

Considerations regarding use of the study
1. Sample size within each of the 30 age/education categories is not indicated.
2. No information on the IQ of participants is reported, although the data are presented by age/education intervals.

[CTT.2] Ponton, Satz, Herrera, Ortiz, Urrutia, Young, D'Elia, Furst, and Namerow, 1996 (Tables A5.2 and A5.3)
This study presents normative data stratified by age and education for Spanish-speaking adults' performance on the Neuropsychological Screening Battery for Hispanics (NeSBHIS), which contains the CTT. This is the initial report from an ongoing project. The sample consists of 300 volunteers (180 female, 120 male) recruited from fliers and advertisements posted at community centers and churches in Los Angeles County, California (Santa Ana, Pasadena, Pacoima, Montebello, and Van Nuys). The sample was primarily right-handed (95%). Regarding language, 210 were monolingual Spanish and 90 were rated by the examiner to be bilingual.
The average (SD) duration of residence in the United States was 16.4 (14.4) years; however, 55% of the total sample had lived in the United States less than 15 years, and half of those participants had less than 6 years of residence in this country. Sixty-two percent of the sample were born in Mexico, 15% in Central America, and 23% in other Latin countries. Exclusion criteria included a history of neurological disease, psychiatric disorder, alcohol or drug abuse, or head trauma. Participants ranged in age from 16 to 75 years (mean = 38.4 [13.5] years). Whereas the 30-39 and 40-49 age groupings are adequately narrow, the 16-29 and 50-75 age groupings are somewhat broad. The data are reported by age and education groupings. The tables separately present data for males and females.

Study strengths
1. Sample composition is well described in terms of exclusion criteria.
2. Educational levels are reported.
3. Mean and SD scores are reported.
4. Age group intervals are generally adequate for younger samples (<50 years).

Considerations regarding use of the study
1. Sample size is generally small per age/education interval.
2. The age group interval is too broad for the older sample (50-75 years).

Other comments
1. IQ scores are not reported; however, scores are reported for Raven's Standard Progressive Matrices Test at each age/education level. Raven's test is used to provide an estimate of nonverbal intelligence.

[...] language and reading disorders. Further research is needed to compare CTT performance in cross-cultural settings. Four equivalent forms have been developed for the CTT (A, B, C, and D). Currently, only form A has been normed for clinical use.
Even though all four are physically equivalent, future research is needed to establish the psychometric and normative equivalence of the alternate forms. Future research should also focus on establishing the reliability and equivalence of the alternate forms in samples of both normal participants and patients with specific neurobehavioral dysfunctions (e.g., clinical comparison data; a.k.a. abnorms). Future normative studies with any form of the test should also report base rate data regarding error and near-miss responses, data for prompts, as well as information regarding the interference index. For instance, no age- and education-corrected normative data are available for Hispanic samples regarding the occurrence of near-miss and error responses, nor is there information regarding prompts and the interference index. The one Chinese normative study reports information regarding time to complete the CTT1, CTT2, and the interference index but no information regarding prompts, near-misses, or errors.

Normative data are needed for English-speaking and non-English-speaking individuals with low or no education. Further normative studies of different ethnic/cultural groups are needed. Reporting the data by age/education categories would allow performance comparison across cultures. In general, the age categories need to be narrowed for reporting data on older adults. We recommend that future studies follow the WAIS-III age category groupings as an example.

Although some excellent work has been done, further normative work still needs to be done regarding the performance of Hispanic individuals of Mexican descent on the CTT above age 75 years. Fortunately, Ponton and colleagues continue to collect normative information on the NeSBHIS; therefore, a larger normative database will accumulate, allowing a sample size more appropriate for inferential purposes with Hispanics.
In their initial report of Spanish-speaking individuals, the sample size for the age- and education-corrected groups was quite small. Yet, comparison of these preliminary performance data with those found in the U.S. standardization manual for the CTT at the same age and education levels does not suggest a significant difference. This finding coupled with the findings of Maj et al. (1991, 1993) further supports the notion that the CTT may allow enhanced application in cross-cultural contexts. How many cultures this effect transcends remains to be discovered.¹

¹Meta-analyses were not performed on the CTT due to a lack of sufficient data.

6 Stroop Test

BRIEF HISTORY OF THE TEST
The Stroop Test measures the relative speed of reading names of colors, naming colors, and naming colors used to print an incongruous color name (e.g., the color red used to print the word blue). The last task requires one to override a reading response. This conflict interference situation has come to be called the Stroop Effect. The interference section of the Stroop Test has traditionally been viewed as a measure of executive functioning involving cognitive inhibition (Boone et al., 1990) and, specifically, the ability to inhibit an overlearned response in favor of an unusual one (Spreen & Strauss, 1998) and “to maintain a course of action in the face of intrusion by other stimuli” (Comalli et al., 1962, p. 47). Factor analyses of sets of executive measures suggest that the Stroop interference trial has more in common with timed executive measures, such as verbal fluency (FAS), and measures of information-processing speed, such as Digit Symbol, than executive tests involving set shifting (Wisconsin Card Sorting Test) or divided attention/working memory (Auditory Consonant Trigrams) (Boone et al., 1998).
Initial lesion studies indicated that poor performance on the interference section of the Stroop Test was associated with left frontal lobe pathology (Perret, 1974), while subsequent functional imaging studies have found the Stroop interference effect to be associated with activation of anterior cingulate and/or frontal cortex (Bench et al., 1993; Brown et al., 1999; Carter et al., 1995, 1997; George et al., 1994; Pantelis et al., 1996; Pardo et al., 1990; Peterson et al., 1996; Taylor et al., 1997).

Poor performance on the Stroop Test has been associated with frontal system dysfunction secondary to closed head injury (Trenerry et al., 1989), discrete frontal lobe lesions (especially left frontal lobe; Perret, 1974; see Regard, 1981, cited in Spreen & Strauss, 1998), frontotemporal dementia (Pachana et al., 1996), frontal lobe seizures (Boone et al., 1988), white-matter hyperintensities (Fukui et al., 1994; Ylikoski et al., 1993), Klinefelter's syndrome (Boone et al., 2001), age-associated memory impairment (Hanninen et al., 1997), transient global amnesia (Stillhard et al., 1990), depression (Boone et al., 1995; Trichard et al., 1995), schizophrenia (Brebion et al., 1996; Buchanan et al., 1994; Schreiber et al., 1995), late-life psychosis (Miller et al., 1991), attention-deficit hyperactivity disorder (ADHD; Seidman et al., 1997; Rapport et al., 2001), and exposure to alcohol in utero (Connor et al., 2000); and Stroop scores have been observed to predict aggression (Foster et al., 1993).

In addition, Stroop scores are lowered in cases of brain dysfunction not necessarily confined to anterior brain areas, such as left and right cerebrovascular accident (Trenerry et al.,
1989), Alzheimer's disease (Binetti et al., 1996; Koss et al., 1984; Pachana et al., 1996), and myotonic dystrophy (Palmer et al., 1994). Stroop performance is impaired in both left and right cerebral damage but may be particularly pronounced with left-sided damage (Perret, 1974; Trenerry et al., 1989), although this may be an artifact of coexistent aphasia. Specifically, Nehemkis and Lewinsohn (1972) found that patients with left cerebral damage with aphasia performed particularly poorly on the Stroop, while patients with left cerebral damage without aphasia actually performed better than patients with right hemisphere dysfunction. Finally, there is evidence that Stroop performance is unaffected by chronic caffeine use (Hameleers et al., 2000) but is influenced by endogenous cholesterol synthesis (Teunissens et al., 2003).

The Stroop Test paradigm is among the oldest in experimental psychology. Interest in the relative speed of color naming and reading color-words has been active for over a century. In 1883, as a result of a suggestion by Wilhelm Wundt (who founded the first psychological laboratory in Leipzig, Germany), America's first psychologist, James Cattell (then a student of Wundt), began conducting what would later become the earliest published study (1886) examining the relative speeds of color naming and color-word reading. Over 40 years later, the first published report of the conflict/interference situation (e.g., where one must name the color of the ink used to print the word when the color and color name are incongruous) originated in the Marburg, Germany, laboratory of Erich Rudolf Jaensch (Jensen & Rohwer, 1966). Some years later, John Ridley Stroop, then a graduate student working in the Jesup Psychological Laboratory at George Peabody College for Teachers, began his doctoral research, examining interference in serial verbal reactions in which he developed and used the color-word interference test that now bears his name (Stroop, 1935).
Stroop's original studies employed three cards, all with white backgrounds:
1. An achromatic color-word reading card, consisting of a series of 100 words for colors printed in black ink.
2. A chromatic color-word naming card, consisting of a series of 100 color names printed in a color of ink incongruent with the word.
3. A pure color card, consisting of a series of 100 squares printed in different solid colors.

For all cards, five colors and/or color-words were used (red, blue, green, purple, and brown). The words and the colors were generally arranged in a 10 × 10 grid of evenly spaced rows and columns. As Stroop notes, “The colors were arranged so as to avoid any regularity of occurrence and so that each color would appear twice in each column and in each row, and that no color would immediately succeed itself in either column or row. The words were also arranged so that the name of each color would appear twice in each line.” For the chromatic color-word naming cards, “no word was printed in the color it names but an equal number of times in each of the other four colors: i.e., the word ‘red’ was presented in blue, green, brown, and purple inks; the word ‘blue’ was printed in red, green, brown, and purple inks, etc. No word immediately succeeded itself in either column or row” (p. 648). An alternate form was also created by printing all the cards in the reverse order.

In three experiments, Stroop examined four different tasks using the above-mentioned three cards. Using cards 1 and 2, experiment 1 examined the differences in rates of reading color-word names (task 1) when the word was printed in black ink vs. an incongruous ink color (task 2). Using only cards 2 and 3, experiment 2 examined the differences in rates of verbally identifying squares of color (task 3) vs. naming ink colors against the distraction of incongruous color-words (task 4).
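The countable constraints Stroop states for the color arrangement (five colors, each appearing twice in every row and every column of a 10 × 10 grid, with no color immediately succeeding itself) can be checked mechanically. The sketch below is illustrative only: `example_grid` and `satisfies_constraints` are hypothetical names, and the cyclic layout it builds is not Stroop's actual card. It is deliberately regular, so it would not meet his further aim of avoiding "any regularity of occurrence", which is not tested here.

```python
COLORS = ["red", "blue", "green", "purple", "brown"]


def example_grid(size: int = 10) -> list[list[str]]:
    """Build a hypothetical cyclic layout (not Stroop's published card)."""
    return [
        [COLORS[(row + col) % len(COLORS)] for col in range(size)]
        for row in range(size)
    ]


def satisfies_constraints(grid: list[list[str]]) -> bool:
    """Check the countable constraints quoted from Stroop (1935).

    Every row and every column must contain each color exactly twice,
    and no color may immediately follow itself within a row or column.
    """
    size = len(grid)
    columns = [[row[col] for row in grid] for col in range(size)]
    for line in grid + columns:
        # Each of the five colors appears exactly twice per row/column.
        if any(line.count(color) != 2 for color in COLORS):
            return False
        # No color immediately succeeds itself.
        if any(first == second for first, second in zip(line, line[1:])):
            return False
    return True


if __name__ == "__main__":
    print(satisfies_constraints(example_grid()))
```

A checker of this kind could also be applied to a hand-randomized layout, which is closer in spirit to the irregular arrangement Stroop describes.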
For experiment 3, Stroop modified his test, shortening the cards to 10 columns and five rows (so that there were only 50 responses required per card instead of 100) and using colored swastikas on the pure color card instead of solid square color patches. For experiment 3, Stroop administered each of the four tasks separately on different days. Stroop never administered all three cards in the same testing period; this procedure did not become standard until Thurstone’s (1944) investigations of perception using the Stroop paradigm.

As testimony to its popularity, the Stroop has been translated into several languages, including Spanish (Rosselli et al., 2002b; Armengol, 2002), Chinese (e.g., Chen & Ho, 1986), Czechoslovakian (e.g., Soveikova & Bronis, 1985), German (e.g., Perret, 1974), Hebrew (e.g., Ingraham et al., 1988), Swedish (e.g., Hugdahl & Franzon, 1985), Japanese (e.g., Fukui et al., 1994; Toshima et al., 1992, 1996; Yamazaki, 1985), Vietnamese (Doan & Swerdlow, 1999), and Italian (Barbarotto et al., 1998). In addition, a version employing numbers rather than words for a “language-neutral” test has been examined in various populations, including those of low socioeconomic status and/or education, low reading level, and Mandarin and Spanish speakers (Sedo, 1998, personal communication).

The major problem with the Stroop literature is the presence of numerous versions of the task. Following is a summary of the variations.

Stimulus Cards

1. Color and shape of items: Cards have contained three, four, or five colors presented as either squares, rectangles, circles, dots, or swastikas.
2. Number of items: Various stimulus cards have contained 17, 20, 22, 24, 27, 50, 100, 112, and 176 items.
3. Size and presentation of stimuli cards: Stimulus cards have varied from small flash cards to wall charts, and some studies have used a tachistoscopic, slide, or computer presentation.
4. Stimuli background: Although most investigators have used stimulus cards with a white background, others have used cards with a black background or a color different from both the color ink of the word and the color name (“Super-Stroop”; Dyer, 1973).
5. Number of stimuli cards: Various versions of the test require the use of two, three, or four cards.

Administration Procedures

1. Scanning orientation: Some versions require the examinee to scan across rows from left to right, whereas others require the examinee to read down columns.
2. Stimuli sequence: Some versions present word reading followed by color naming and vice versa.

Method of Scoring

Determination of the total score has ranged from the number of correct responses made in 45 or 120 seconds to the total time to complete each card to a difference score (color interference minus color naming or reading) to the total number of errors made in 45 seconds.

Current Administration Procedures

At present, there is no one recognized standard version of the Stroop Test. There are, however, three versions that are commercially published: Charles Golden's (1978), Max Trenerry et al.’s (1989), and that contained in the Delis-Kaplan Executive Function System (Delis et al., 2001). The first two are available from Psychological Assessment Resources, and the Delis-Kaplan Executive Function System can be purchased from Psychological Corporation. Edith Kaplan's Stroop version can be used as a Comalli et al. (1962) version (reading words, naming colors, naming colors with incongruous color names) or as the Comalli and Kaplan version (naming colors, reading words, naming colors with incongruous color names).
Carl Dodrill (1978a) and Otfried Spreen and Esther Strauss (1998) have also developed versions of the Stroop, which can be obtained by writing to their respective laboratories (see Appendix 1 for ordering information).

[...] the words red, green, and blue presented randomly and printed in black ink. Page 2 contains blocks of Xs printed in either red, green, or blue ink. Page 3 is the Stroop effect card and contains color-words printed in a noncongruent color (i.e., the word blue printed in red ink, etc.). For each page, the examinee is required to scan the columns vertically, starting on the left side and moving to the right. The score is the number of correctly identified items per page within 45 seconds. Errors are not counted.

Dodrill Stroop, 1978a

The Dodrill version of the Stroop consists of two alternate administrations of one stimulus card containing 176 color-words (red, green, blue, and orange) randomly printed in 11 columns of 16 color-words. Each color-word is printed in an incongruous color (e.g., the color-word blue is printed in green ink, etc.). In the first administration, participants read the color-words as they scan down the columns. In the second administration, participants name the color of ink in which the words are printed. Time to complete each card is noted. Two scores are generated: the total time to complete part I (which is essentially an estimate of the examinee’s reading speed) and the total time for part II minus that for part I, “which reflects an estimate of the degree of interference induced by the test” (Dodrill, 1987, p. 6).

Victoria Stroop, 1991 (Reported by Spreen & Strauss, 1991, 1998)

The Spreen and Strauss version of the Stroop (also known as the Victoria version) uses three 21.5 × 14 cm cards presented in the following order: part D, part W, and part C.
Each card has six rows of four items. Part D contains colored dots (red, green, blue, and yellow), and on this card the task is to name the colors as quickly as possible. Part W has the words when, and, over, and hand printed in red, green, blue, or yellow ink; and the examinee must name the color of ink in which each word is printed as quickly as possible. On part C, the color-words red, green, blue, and yellow are printed in incongruous-colored ink (e.g., the word red is printed in green ink, etc.); and the examinee must name the color ink in which the color-word is printed as quickly as possible. Rows are scanned from left to right as the subject works down the page. Time to completion and the number of errors are recorded for each card.

Jensen and Rohwer (1966) provide a detailed and fascinating review of the Stroop Test and its many reincarnations; and Dyer (1973), Golden (1978), and MacLeod (1991) review applications and research findings subsequent to Jensen and Rohwer's report. Lezak et al. (2004) also provide an overview of the test for the interested reader.

Trenerry et al. Stroop, 1989

This version of the Stroop consists of two cards: form C and form C-W. Form C contains 112 color-words (red, blue, green, and tan) randomly arranged in four columns of 28 color-words. Each color-word is printed in an incongruous ink color (e.g., the word tan is printed in red, etc.). Form C-W follows the same format as form C; however, there is a different random order of color-words. For form C, the examinee is requested to read the words as quickly as possible while scanning down the columns. For form C-W, the examinee is instructed to name the color ink in which the color-word is printed as quickly as possible, again while scanning down the columns. A maximum of 120 seconds is allowed to complete each task. The score for each task is the number of correct responses (or number of items completed) minus any incorrect responses.
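The scoring rules described above for the Dodrill and Trenerry et al. versions reduce to simple arithmetic. The following Python sketch is illustrative only: the function names, constants, and example values are hypothetical and do not come from either manual, and the Trenerry rule is read here as items completed minus incorrect responses, which is one possible reading of the description.

```python
# Illustrative scoring helpers; names and example values are hypothetical.

TRENERRY_ITEMS = 112        # items per form, as described in the text
TRENERRY_TIME_LIMIT = 120   # seconds allowed per task, as described in the text


def dodrill_scores(part1_seconds, part2_seconds):
    """Return (part I time, part II time minus part I time).

    Part I time approximates reading speed; the difference is the
    interference estimate described for the Dodrill version.
    """
    if part1_seconds < 0 or part2_seconds < 0:
        raise ValueError("completion times must be non-negative")
    return part1_seconds, part2_seconds - part1_seconds


def trenerry_score(items_completed, incorrect):
    """Return items completed minus incorrect responses for one form.

    Assumption: 'items completed' counts every response given within the
    time limit, so incorrect responses cannot exceed it.
    """
    if not 0 <= incorrect <= items_completed <= TRENERRY_ITEMS:
        raise ValueError("counts are outside the range allowed by the form")
    return items_completed - incorrect


if __name__ == "__main__":
    # Hypothetical examinee: 95 s on part I and 210 s on part II.
    print(dodrill_scores(95.0, 210.0))
    # Hypothetical examinee: 104 items attempted on form C-W, 3 of them wrong.
    print(trenerry_score(104, 3))
```

Keeping the two rules in separate functions mirrors the point made in the text: the Dodrill version is interpreted through a difference score, whereas the Trenerry et al. version is interpreted through the form C-W score alone.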
Although the Dodrill version relies upon a difference score between the reading and interference cards, a discriminant analysis conducted by Trenerry and colleagues demonstrated that the data from form C-W alone provided the sharpest classification accuracy; thus, the score from form C-W is the only one used for interpretation purposes.

RELATIONSHIP BETWEEN STROOP TEST PERFORMANCE AND DEMOGRAPHIC FACTORS

While some studies have found no significant age effect on the Stroop Test (Graf et al., [...]

[...] 2002b; Miller, 2003; Strickland et al., 1997), occupational status is indicated in only five publications (Anstey et al., 2000; Daigneault et al., 1992; D’Elia, Satz, & Uchiyama, unpublished data; Dodrill, 1978a; Ivnik et al., 1996), and handedness is described in only four reports (Boone et al., 1991; Doan & Swerdlow, 1999; Ivnik et al., 1996; Regard, 1981, cited in Spreen & Strauss, 1991). Language and/or fluency in English is reported in 10 studies (Boone et al., 1990, 1991, 2001; Boone, 1999; Daigneault et al., 1992; Doan & Swerdlow, 1999; Ingraham et al., 1988; Rosselli et al., 2000; Miller, 2003; Stuss et al., 1985), and recruitment procedures are specified in 16 data sets (Anstey et al., 2000; Boone et al., 1990, 1991; Boone, 1999; [...] 1988; Daigneault et al., 1992; D’Elia, Satz, & Uchiyama, unpublished data; Dodrill, 1978; Fisher et al., 1990; Ingraham et al., 1988; Ivnik et al., 1996; Lopez et al., 2003; Moering et al., 2004; Rosselli et al., 2000; [...] personal communication; Swerdlow et [...]).

Regarding procedural variables, test stimuli and procedures are described in all reports.
Means are presented in all data sets, although SDs are not reported in three studies (Anstey et al., 2000; Comalli et al., 1962; Golden, 1978). Percentiles corresponding to raw scores, stratified by age and education, are provided by Anstey et al. (2000) and Moering et al. (2004). One study provides data only for the color-naming trial (Stuss et al., 1985), and six studies report data only for the color-interference trial (Boone et al., 1991; Boone, 1999; Cohen et al., 2003; Daigneault et al., 1992; Sacks et al., 1991; Schiltz, personal communication). Two studies provide cutoff scores (Dodrill, 1978a; Trenerry et al., 1989), and one study presents data for the first half and the second half of the color-interference trial separately (Schiltz, personal communication). Several studies report error scores as well as time scores (Boone et al., 1990; D'Elia, Satz, & Uchiyama, unpublished data; Regard, 1981, cited in Spreen & Strauss, 1991; Spreen & Strauss, 1998), and one presents information on alternate forms (Sacks et al., 1991).

The text of study descriptions contains references to the corresponding tables identified by number in Appendix 6. Table A6.1, the locator table, summarizes information provided in the studies described in this chapter.

SUMMARIES OF THE STUDIES

Published manuals for the Stroop Test are reviewed first, followed by normative studies and control groups from clinical comparison studies presented in ascending chronological order for each version of the test separately. Studies using the Comalli version are presented first, followed by those using the Kaplan, Golden, Dodrill, Victoria, and Trenerry versions.

Manuals

[STROOP.1] Golden, 1978 (Golden Version)

The test stimuli and administration procedures developed by Golden are well specified.
Primarily utilizing previously published normative reports, the norms presented in this manual have largely been empirically derived by calculating how many items the participants in other studies would have obtained if the test were discontinued after 45 seconds. In addition to including data from his own studies (sample sizes unknown), Golden utilized normative data provided by Stroop (1935), Jensen (1965), and Comalli et al. (1962) to generate the norms. No information is provided regarding demographic, IQ, or other characteristics of Golden’s own normative samples.

Using the tables in the manual, all raw scores can be converted to T scores. For participants younger than 17 and older than 45, age corrections need to be applied before the T-score conversion can be made. The manual cautions that the age corrections for adults over age 65 and children under age 17 are considered to be “experimental.”

The normative data contained in Golden’s manual are not reproduced here, and the interested reader is referred directly to this publication for further information.

Study strengths
1. The Stroop cards developed by Golden and test administration procedures are well described in the manual.

[...]

3. High educational level; two participants had a history of learning disability.
4. No data reported for word-reading and color-naming trials.

[STROOP.8] Demick and Harkins, 1997 (Comalli Version) (Tables A6.7-A6.10)

The sample consists of 231 individuals recruited in Massachusetts who participated in a study assessing the relationship between field dependence-independence (FDI) cognitive style and driving behavior.
Participants were community-dwelling individuals who in telephone screening denied any history of major impairment in perception, cognition, or motor execution and described themselves as having good overall health; corrected visual problems were allowed. The average educational level of the sample was high school plus some college courses completed.

The Comalli cards and Kaplan administration procedures (i.e., color naming, word reading, color interference) were employed. Means, SDs, and ranges for time in seconds, errors, color difficulty factor (total time on B/total time on A), and interference factor (total time on C − total time on B) are provided for four age groupings (20-39, 40-59, 60-74, 75+ years).

Study strengths
1. Overall sample size is large, with individual cell sizes exceeding 50.
2. Data are presented by age groupings.
3. Probably adequate exclusion criteria.
4. Information regarding gender, overall educational level, and geographic region is provided.
5. Test stimuli and procedures are indicated.
6. Means, SDs, and ranges are provided.

Consideration regarding use of the study
1. No information regarding intellectual level.

Other comments
1. Theoretical issues concerning the Stroop (e.g., process vs. achievement measures, identification of a cognitive style) are discussed.

[STROOP.9] Boone, 1999 (Comalli Version) (Table A6.11)

The author obtained Stroop data on 155 middle-aged and older individuals (age range 45-84) recruited as described by Boone et al. (1990); data from the 1990 study were included in the 1999 publication. Mean age of the sample was 63.07 (9.29), mean years of education was 14.57 (2.55), and mean WAIS-R FSIQ was 115.41 (14.11). Fifty-three were male and 102 were female. Medical and psychiatric exclusion criteria are the same as in the 1990 publication, with the exception that participants with significant white-matter hyperintensities documented on MRI were retained in the sample.
All participants considered themselves healthy, although 51 had some evidence of vascular illness (defined as cardiovascular disease and/or significant white-matter hyperintensities on MRI) based on self-report or evidence on examination of at least one of the following: current or past history of hypertension (n = 39), arrhythmia (n = 8), large area of white-matter hyperintensities on MRI (e.g., 10 cm²; n = 7), coronary artery bypass graft (n = 3), angina (n = [...]), and old myocardial infarction (n = 1). Twenty-four participants were currently on cardiac and/or antihypertensive medications.

The Comalli version of the Stroop was administered. Means and SDs for time in seconds to complete the color-interference portion of the test are provided. A stepwise regression analysis revealed that age and FSIQ were significant contributors to Stroop color-interference performance, accounting for 15% and 13%, respectively, of test score variance; educational level, gender, and vascular status did not account for a significant amount of unique test score variance. Stroop normative data are presented for color-interference time in seconds stratified by IQ and age (<65 and >65; average, high average, and superior IQ).

Study strengths
1. Large overall sample size.
2.
Presentation of the data by IQ and age groupings.

[...] as detailed in the test manual or standard instructions plus six suggestions (“looking at no more than three words at a time; focusing on only one letter in the word; remembering that the same color never occurs twice consecutively; going at an even, steady pace; trying not to become distracted or lose one’s place; and not repeating an already-correct answer when correcting a mistake”). Participants were administered the Stroop at baseline (pretest), following five practice sessions (post-test), and at a 1-week follow-up. No effect of gender or instruction format was documented. A significant effect of practice was found between the pre- and post-test but not between the post-test and follow-up. Data are presented in means and SDs for number of items completed for the pretest, post-test, and follow-up sessions.

Study strengths
1. Information on the effects of practice, gender, and alternative instructions on Stroop performance is provided.
2. Information on age, gender, and geographic area, with some information on education and recruitment procedures, is reported.
3. Test stimuli and procedures are specified.
4. Data are presented in means and SDs for number of items completed.

Considerations regarding use of the study
1. Relatively small sample size.
2. Undifferentiated age range, although it is somewhat restricted.
3. No information on exclusion criteria or IQ.
4. Data are not broken down by gender or education.
[STROOP.17] Fisher, Freed, and Corkin, 1990 (Golden Version) (Table A6.19)

The authors collected Stroop data on 36 older controls (typically spouses of patients) from southern California as part of an investigation of Stroop performance in Alzheimer’s disease. Mean age was 72.9 (8.3) years, mean educational level was 14.6 (2.7) years, and mean Blessed Dementia Scale score was 1.5 (6.1). The sample included 13 males and 23 females. Participants had no history (as judged through medical records) of color blindness, cataracts, or glaucoma.

The Golden Stroop Test stimuli and administration procedures were employed. Means and SDs are reported for number of items completed on each trial. Some participants (five female, three male) had difficulty discriminating between the colors blue and green on the color trial.

Study strengths
1. Data presented in a homogeneous age grouping.
2. Information is given regarding mean age, mean educational level, gender, mean Blessed Dementia Scale score, geographic area, and recruitment procedures.
3. Information is given on test stimuli and administration procedures.
4. Means and SDs reported for number of items completed.
5. Test administration format was described.

Considerations regarding use of the study
1. Relatively small sample size.
2. No information regarding IQ.
3. High educational level of the sample.
4. Unclear exclusion criteria.

[STROOP.18] Daigneault, Braun, and Whitaker, 1992 (Golden Version) (Table A6.20)

Stroop data were obtained on 125 French-speaking participants in Canada as part of a study investigating the effects of aging on prefrontal lobe skills. Participants were recruited through ads, trade union collaboration, and the help of a large sports center.
Exclusion criteria included consumption of more than 24 beers, five bottles of wine, or 15 ounces of spirits per week; consumption of cocaine, LSD, or psychostimulants; any neurological or psychiatric consultation, psychoactive medication, head trauma with hospitalization, or major surgery (e.g., cardiac). Participants were divided into two age groupings: 20-35, with a mean of 27.71 (4.05) years (n = 70), and 45-65, with a mean of 56.62 (5.29) years (n = 58). The younger group contained 38 men and 32 women; they were primarily specialized blue-collar [...]

[...] Spanish were significantly slower on the English Stroop trials, while bilinguals more fluent in English were slower on the Spanish Stroop.

Study strengths
1. Data on Spanish-language and English-language Stroop performance in a large sample of bilingual participants (n = 71) as well as a smaller group of monolingual Spanish speakers are provided.
2. Adequate exclusion criteria.
3. Information provided regarding age, education, gender, and handedness, as well as comprehensive information on language characteristics.

Considerations regarding use of the study
1. The administration format was altered (time to finish the stimuli rather than number of responses at 45 seconds).
2. No information regarding IQ.
3. Data are not stratified by age.
4. High educational level.

[STROOP.24] Lopez-Carlos, Salazar, Villaseñor Saucedo, and Peña, 2003 (Golden Version) (Tables A6.26-A6.29)

The Golden version of the Stroop was used in a study investigating the effects of demographic variables on cognitive abilities in Spanish-speaking individuals with low education.
The total sample included 115 volunteer monolingual Latino men with <10 years of formal education, who worked at manual labor in the Los Angeles area (n = 65) and Jalisco, Mexico (n = 50). Volunteers were recruited from posted advertisements in workplaces and personal solicitations. The mean age for the sample was 28.23 (8.74) years and mean education was 6.66 (2.54) years. Exclusion criteria consisted of any self-report of head injury, neurological insults, prenatal or birth complications, learning disabilities, psychiatric problems, or substance abuse. Scores on the Beck Depression Inventory-II-Spanish Version (Mean = 12.92, SD = 8.94) and the Beck Anxiety Inventory-Spanish Version (Mean = 6.60, SD = 6.03) are also reported.

Standard administration procedures were used. Participants were tested in Spanish. Selected subtests from the WAIS-III (Mexican version) were included in the battery. WAIS-III Block Design raw scores are included in Tables A6.26-A6.29. Mean performance on the Marin Marin Acculturation Scale for the Los Angeles sample was 17.61 (6.19). For the Los Angeles group, Picture Vocabulary subscale scores from the Woodcock-Johnson-III Tests of Achievement (Mean = 5.36, SD = 6.01) and the Batería Woodcock-Muñoz-R, Pruebas de habilidad cognitiva-R (Mean = 29.77, SD = 5.37) were used to assess level of English and Spanish word expressive abilities.

The results are presented by years of education (0-6, 7-10), age (18-29, 30-49 years), and education and age (18-29 years old, 0-6 and 7-10 years of education; 30-49 years old, 0-6 and 7-10 years of education). The authors found a significant difference (p < 0.05) in performance on the Stroop (color and color/word interference) between the two education groups. However, the two age groups did not differ significantly on any of the sections of the Stroop. No significant differences in scores between individuals from Los Angeles and Mexico were noted.
Study strengths
1. Large sample for age and education groups.
2. Data availability for a healthy, employable, monolingual Spanish-speaking group with low education level.
3. The sample is stratified into two education groups, two age groups, and four age × education groups. Additionally, data are available for United States and Mexico.
4. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures.
5. Adequate exclusion criteria.
6. Means and SDs are reported.
7. WAIS-III Block Design subtest scores are presented.

Considerations regarding use of the study
1. All-male sample.
2. Small sample sizes for the combined age and education groups.

[...]

Study strengths
1. Large overall sample size, although the sizes of individual cells are not reported.
2. Stratification of data by age and education.
3. Information regarding age, education, gender, health status, residential setting, and occupational status is provided.
4. Test stimuli and procedures are reported.
5. Percentiles for raw scores are provided for 12 subgroupings, as well as overall means and SDs for the sample as a whole.

Considerations regarding use of the study
1. Questionable adequacy of exclusion criteria (17% had MMSE < 24).
2. Participants age 62 or older are included.
3. No information regarding IQ.
4. Data obtained in Australia, which may limit applicability in the United States.

RESULTS OF THE META-ANALYSES OF THE STROOP TEST DATA (GOLDEN VERSION, INTERFERENCE CONDITION) (See Appendix 6m)

Data collected from the studies reviewed in this chapter were examined.
Only the data for the Golden version had a sufficient aggregate sample to be included in the analyses. Data were combined in regression analyses in order to describe the relationship between age and test performance and to predict expected test scores for different age groups. Effects of other demographic variables were explored in follow-up analyses. The general procedures for data selection and analysis are described in Chapter 3. Detailed results of the meta-analysis and predicted test scores across adult age groups are provided in Table A6m.1.

After initial data editing for consistency and for outlying scores, six studies, which generated 10 data points based on a total of 490 participants, were included in the analyses for the Interference condition. Data for the Word Reading and Color Naming conditions included only eight data points collected from five studies. Due to scarcity of the data for these conditions, they were not analyzed.

A linear regression of the Stroop scores on age yielded an R² of 0.791, indicating that 79% of the variance in scores is accounted for by the model. Based on this model, we estimated test scores for age intervals between 25 and 74 years. If predicted scores are needed for age ranges outside the reported age boundaries, with proper caution (see Chapter 3) they can be calculated using the regression equations included in the tables, which underlie calculations of the predicted scores.

Linear regression of SDs for the Stroop scores on age suggests that age does not account for a significant amount of variability in SDs (R² = 0.015). Though some increase in variability with advancing age is expected, this trend was not present in the collected data. Therefore, we suggest that the mean standard deviations for the aggregate sample be used across all age groups.
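The regression step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for Table A6m.1: it fits an unweighted ordinary least-squares line, whereas the actual procedures (described in Chapter 3) may weight or edit the data differently, and every data value below is invented purely to show the mechanics. The names fit_line, predict, and mean_sd are hypothetical.

```python
# Minimal sketch: regress study-level mean scores on mean age, report R^2,
# and predict scores for chosen ages. All data values are hypothetical.

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x; returns R^2 too."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1.0 - ss_resid / ss_total
    return intercept, slope, r_squared


def predict(intercept, slope, age):
    """Predicted mean score at a given age from the fitted line."""
    return intercept + slope * age


def mean_sd(sds):
    """Simple average of study SDs, used when SDs show no trend with age."""
    return sum(sds) / len(sds)


if __name__ == "__main__":
    # Hypothetical data points (one per study subgroup), NOT values from the book.
    ages = [25.0, 35.0, 45.0, 55.0, 65.0, 75.0]
    interference_means = [46.0, 44.0, 41.0, 38.0, 36.0, 32.0]
    interference_sds = [9.0, 8.5, 9.5, 9.0, 10.0, 9.0]

    b0, b1, r2 = fit_line(ages, interference_means)
    print("intercept, slope, R^2:", b0, b1, r2)
    print("predicted score at age 50:", predict(b0, b1, 50.0))
    print("SD applied to all age groups:", mean_sd(interference_sds))
```

Extrapolating with predict beyond the ages actually represented in the data is exactly the situation in which the text advises caution.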
Means and SDs for the Word Reading, Color Naming, and Interference conditions for four studies (seven data points) that report data for all three conditions are summarized in Table A6m.2.

Examination of the effect of education on Stroop scores indicated that education did not contribute to the test scores beyond its association with age in the data available for analyses. Effects of IQ and gender on the test scores were not examined as data were not available for these analyses.

Strengths of the analyses
1. Postestimation tests for parameter specifications did not indicate problems with normality or homoscedasticity.

Limitations of the analyses
1. R² of 0.791 is acceptable. However, this value indicates that only 79% of the variance in Stroop scores is accounted for by the model.
2. The number of studies available for the analyses is small.
3. It should be pointed out that the data points available for the analyses are [...]

To adequately evaluate the ACT normative reports, six key criterion variables were deemed critical. The first four of these relate to subject variables, and the remaining two relate to procedural variables. Minimal criteria for meeting the criterion variables were as follows.

Subject Variables

Sample Size

Fifty cases are considered a desirable sample size.
Although this criterion is somewhat arbitrary, a large number of studies suggest that data based on small sample sizes are highly influenced by individual differences and do not provide a reliable estimate of the population mean.

Sample Composition Description

Given the evidence that ACT performance may be significantly impacted by medical status (e.g., vascular illness), information regarding medical exclusion criteria is critical. In addition, as discussed previously, information should probably also be provided regarding educational level, gender, psychiatric exclusion criteria, geographic region, ethnicity, occupation, handedness, and recruitment procedures, even though there are as yet no data indicating that these factors influence test performance.

Reporting of Age

Given the equivocal and modest relationship between age and ACT performance, ACT normative data probably do not need to be presented by age group intervals, but information on the ages of the normative samples should be provided.

IQ Group Intervals

Given the evidence that IQ may account for more unique test score variance than do demographic factors, information regarding IQ level should be reported for each subgroup, and preferably normative data should be presented by IQ intervals.

Procedural Variables

Description of the Administration Format Used

Given that different test administration formats involve differing lengths of distraction intervals, specific information regarding the delays should be provided.

Data Reporting

Means and standard deviations, and preferably ranges, for total score out of 60 are important. In addition, it is advantageous for data to be provided for each of the distraction intervals separately.
SUMMARY OF THE STATUS OF THE NORMS

In terms of subject variables, only one study provides data by IQ level (Boone, 1999), although IQ data are reported in a second study (Boone et al., 1990). Information on age, gender, education level, geographic area, and recruitment procedures is reported for all studies. In addition, medical, psychiatric, neurologic, and substance abuse exclusion criteria are described and judged to be adequate for all studies. Ethnic composition was indicated in two studies (Anil et al., 2003; Boone et al., 1990). Handedness data were provided only in the investigations conducted by Stuss and colleagues (Stuss et al., 1987, 1988). While all studies exceeded a total sample size of 50, only one study reached the criterion of 50 participants per individual grouping cell (Anil et al., 2003).

In terms of procedural variables, information is available regarding the precise administration formats for all studies. Means and SDs are reported for total score in all but one study (Anil et al., 2003), and means and SDs for individual distractor delays are provided in all but one study (Boone, 1999). Practice effects are investigated in the reports by Stuss and colleagues (Stuss et al., 1987, 1988), and data on qualitative performance variables (perseverations, errors in letter sequence) are provided in Boone et al. (1990).

[...]

4. Information regarding educational level, gender, geographic area, recruitment procedures, and fluency in English.
5. Though not stated, test administration procedures are the same as those in Boone et al. (1990).
6. Means and SDs for the test scores are reported.
Considerations regarding use of the study
1. Individual IQ-by-age groupings have sample sizes ranging from 16 to 37.
2. Data are presented in terms of total score rather than separately for each distraction interval.

[ACT.6] Anil, Kivircik, Batur, Kabakci, Kitis, Güven, Basar, Turgut, and Arkar, 2003 (Table A7.8)

ACT data were collected on 236 individuals in Turkey, who were recruited from hospital staff or through personal contacts. Exclusion criteria included neurological or psychiatric conditions. The sample was stratified into three age groups (16-25, 26-45, and 46-65) and three education groups (8-10, 11-14, and >14 years). The youngest age group consisted of 40 males and 22 females, who averaged 22 (2.7) years of age. The middle age group was composed of 70 males and 55 females, who averaged 34.1 (5.9) years. The oldest group included 28 males and 21 females, who averaged 53.8 (4.7) years.

The ACT was translated with the consultation of a linguist, and consonants from the Turkish alphabet showing similar phonetic characteristics to the original ACT were employed. Participants were instructed to count backward by 1s rather than the standard 3s. Means and SDs are reported for each delay interval. Analyses revealed no gender effects, although better performance was associated with younger age and more years of education.

Study strengths
1. Large overall sample size, although the sizes of the nine subcells are not reported.
2. Information provided for age, gender, education, geographic area, language, and recruitment procedures.
3. Data stratified by both age and education.
4. Adequate exclusion criteria.
5. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Test was translated into Turkish and data were collected in Turkey, rendering use problematic for English-speaking patients.
2. Test administration was not standard (subjects counted backward by 1s rather than 3s).
3.
No information regarding IQ.

CONCLUSIONS

ACT has been underutilized as a clinical measure of executive dysfunction despite evidence that it may be particularly sensitive to white-matter disturbance. Given emerging interest in working-memory paradigms, the consonant trigrams task may experience an increase in popularity. Most working-memory paradigms have been used in experimental studies, and normative data are typically not available. The fact that a normative data pool of upward of 500 participants has been collected for ACT may make it an attractive working-memory procedure for clinical practice. In addition, the fact that the ACT task does not involve a timed response makes it a desirable executive measure in that test performance is not confounded by declines in mental speed. For tasks such as Trails B, Stroop Color Interference, and word and design generation, poor scores may reflect slowing in information-processing speed rather than executive dysfunction per se.

Future research is needed to determine which delay intervals (i.e., 3, 9, and 18 seconds vs. 9, 18, and 36 seconds) are most sensitive and appropriate for clinical use. Also, normative data need to be obtained on populations with less than average IQs.
Meta-analyses were not performed on ACT due to a lack of sufficient data.

substantially less education than the other age groups.
3. Overall sample size is adequate, but individual cells are small.
4. The data were obtained on Canadian subjects, sometimes in French, which may limit their usefulness for clinical interpretation in the United States.

[PASAT.4] Rao, Mittenberg, Bernardin, Haughton, and Leo, 1989 (Gronwall Version) (Table A8.5)

This study examined the effects of focal periventricular white-matter changes on cognitive functioning in healthy adults. The authors selected 40 participants (10 males, 30 females) who had normal brain imaging to serve as controls. Participants ranged in age from 25 to 60 years, with an average age of 42.8 (8.1), average educational level of 14.0 (2.3), and average Verbal IQ of 106.5 (5.8). All participants were recruited from newspaper advertisements in the Milwaukee, Wisconsin, area.
Additional exclusion criteria were a history of hypertension, cardiac or cerebrovascular disease, neurological illness, head injury, substance abuse, or psychiatric illness. Participants underwent physical and neurological exams.

Gronwall's 61-digit test administration version of the PASAT was employed, but only two trials, at 3- and 2-second pacing rates, were used. Total correct responses for both trials are reported.

Study strengths
1. The sample composition is well described in terms of age, education, gender, and recruitment procedures.
2. Exclusion criteria are provided.
3. Test administration procedures are described.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Relatively small sample size.
2. The data are not stratified by age, gender, or education.
3. Data for only two pacing rates for the PASAT are provided.

[PASAT.5] Stuss, Stethem, Hugenholtz, and Richard, 1989 (Gronwall Version) (Table A8.6)

The authors compared the performance of two groups of head-injured patients to controls on three neuropsychological tests. Twenty-six control participants (20 males, 6 females) with no history of neurological or psychiatric disorder were recruited. Participants were matched with head-injured patients on age (±2 years), education (±2 years), and gender. Thus, control subjects ranged in age from 17 to 57, with an average of 29.7 (12.4), and ranged in educational level from 7 to 20 years, with an average of 13.2 (3.0).

The standard 61-digit version using four trials (2.4, 2.0, 1.6, and 1.2 seconds) was administered at two different points in study 1 and at five different points in study 2. Testing and retesting sessions were separated by approximately 1 week. Data for study 1 are reported in this review.

Study strengths
1. The sample composition is well described in terms of age, education, gender, and recruitment procedures.
2. Adequate exclusion criteria.
3. Test administration procedures are specified.
4.
Means and SDs for the test scores are provided.

Considerations regarding use of the study
1. The geographic location where participants were recruited is not provided; however, it may be assumed that they were from the Ottawa, Canada, region, which may limit the usefulness of the data for clinical interpretation in the United States. While not mentioned in this study, in previous studies the authors have administered the test in French or English, depending on the participant's language preference.
2. Small sample size.

Other comments
1. Test data for two testing sessions (from study 1) have been reproduced in this chapter. In addition, the authors provide data for five testing probes (study 2), which can be found in the original study.

Other comments
1. The exercisers scored significantly higher on the 1.6-second trial of the PASAT relative to the nonexercisers.

[PASAT.13] Wingenfeld, Holdwick, Davis, and Hunter, 1999 (Gronwall Version) (Table A8.14)

This study was designed to develop normative data for a computerized version of Gronwall's PASAT. The authors recruited 168 (80 males, 88 females) college students between the ages of 17 and 48 with an average age of 21 (5.1) years at the University of Arkansas, Fayetteville. The sample was 88% Caucasian, 4% African American, 4% Asian American, and 4% other ethnic group. The data were first stratified by gender and then by two age groups (17-29, 30-48 years). Exclusion criteria were any history of neurological illness, emotional problems, learning disability, attentional problems, or uncorrected hearing difficulty. Only native English speakers were included.
Subjects were given course credit for participation.

The testing procedures are similar to those of Gronwall, except that the digits are presented by the computer via speaker and responses are recorded through an external speaker. Additionally, while all four trials are delivered (2.4-, 2.0-, 1.6-, and 1.2-second pacing), a new random series of the 61 digits is presented during each trial.

Study strengths
1. Adequate sample sizes, except for the 30-48 age group.
2. The data are stratified first by gender and then by two age groups (17-29, 30-48 years).
3. The sample composition is well described in terms of age, gender, ethnicity, and recruitment procedures.
4. Adequate exclusion criteria.
5. Test administration procedures are specified.
6. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Cell size for the 30-48 age group is relatively small (n = 12).

Other comments
1. Additional outcome measures, such as number of errors committed and number of "no" responses, are reported in the original article but have not been reproduced in this chapter.

[PASAT.14] Bate, Mathias, and Crawford, 2001 (Gronwall Version) (Table A8.15)

This study examined the relationship between the Test of Everyday Attention and various neuropsychological measures in patients with severe head injury. The study was conducted in Australia, where 35 controls (20 males, 15 females) who were native English speakers with no history of psychiatric illness, neurological disorders, intellectual disability, substance abuse, or hemiplegia of the dominant hand were recruited. Participants were an average of 30.2 (10.3) years of age, obtained an average of 12.6 (2.0) years of education, and had an average premorbid IQ of 101.1 (9.1) based on the National Adult Reading Test-Revised (NART-R). The exact location and procedures for participant recruitment are not specified.
Also, it is unclear whether the participants were patients with non-brain injury-related illness or healthy individuals from the community.

The Gronwall 61-digit version of the PASAT was presented with all four trials (2.4-, 2.0-, 1.6-, 1.2-second pacing).

Study strengths
1. The sample composition is well described in terms of age, education, gender, and IQ.
2. Adequate exclusion criteria.
3. Test administration procedures are specified.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. The sample size is small.
2. Recruitment procedures are not well described. Controls may be non-head-injured medical patients.
3. The data were obtained on Australian subjects, which may limit their usefulness for clinical interpretation in the United States.

[PASAT.21] Wiens, Fuller, and Crossen, 1997 (Levin Version) (Tables A8.23 and A8.24)

This is a normative study for Levin et al.'s (1987) version of the PASAT. The authors selected 821 (672 male, 149 female) participants aged 20-49 years who were administered neuropsychological and psychological tests as part of a civil service job selection process. There were 699 Caucasians, 46 African Americans, 31 Hispanics, 32 Asians, and 13 Native Americans in the sample. The data were stratified by gender. Male participants were an average of 29.2 (6.1) years of age, with an average education of 14.6 (1.5) years and an average WAIS-R full-scale IQ (FSIQ) of 106.6 (11.0). Female participants were an average of 29.2 (5.6) years of age, with an average education of 14.5 (1.6) years and an average WAIS-R FSIQ of 105.4 (11.1).
They were all from the Pacific Northwest of the United States. All participants had passed physical and medical health screening prior to test administration. All had passed a test of basic academic skills, and none had alcohol or substance abuse.

All four trials of Levin's version of the PASAT were administered.

Study strengths
1. The sample composition is well described in terms of age, education, gender, IQ, ethnicity, geographic location, and recruitment procedures.
2. The data are stratified by gender and by age × IQ.
3. Adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Overall sample size is adequate, but some of the individual cells are relatively small.

Other comments
1. The authors found differences between the ethnic groups, but the sample sizes were too small to make any definitive conclusions.

[PASAT.22] Tiersky, Cicerone, Natelson, and DeLuca, 1998 (Levin Version) (Table A8.25)

Information-processing speed was compared among patients with chronic fatigue syndrome, mild head injury, and normal controls. All 20 normal control participants were females, who were recruited from advertisements in the local community of New Jersey and paid for their participation. Participants were an average of 37.1 (2.4) years of age, with an average education of 15.0 (0.55) years. Exclusion criteria were current medical illnesses, a history of loss of consciousness > 5 minutes, psychiatric illness, use of medication, or participation in a regular exercise program.

The Levin et al. (1987) version of the PASAT was used, and the total number of correct responses for all four trials was reported.

Study strengths
1. The sample composition is well described in terms of age, education, gender, geographic area, and recruitment procedures.
2. Adequate exclusion criteria.
3. Reference is provided for test administration procedures.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Small sample size.
2. Female participants only.
3. Education level is high.
4. Total scores are reported instead of individual scores for each of the four trials.

[PASAT.23] Stein, Kennedy, and Twamley, 2002 (Levin Version) (Table A8.26)

The authors examined the difference in neuropsychological test performance of female victims of partner violence with posttraumatic stress disorder (PTSD) compared to victims without PTSD and nonvictimized controls. Twenty-two female control participants were recruited through posted advertisements and personal contacts in the San Diego, California, community. They were an average of 29.4 (10.7) years of age, had an average of 13.9 (1.5) years of education, and had an average raw WAIS-III Verbal subtest score of 45.9

9 Cancellation Tests

BRIEF HISTORY OF THE TESTS

A number of cancellation tests have been developed over the years. Such tests are primarily designed to assess aspects of attention, such as sustained and selective attention. Sustained attention "refers to the ability to maintain a consistent level of performance over an extended period of time," while selective attention entails selection of relevant target stimuli while avoiding distractors (Ruff & Allen, 1996). Some cancellation tests are also referred to as "vigilance tests" (Lezak, 1995; Lezak et al., 2004) and typically involve measures of both speed and accuracy of performance. A number of cancellation tests using letters, numbers, or symbols as target stimuli are available to clinicians.
The Ruff 2&7 (Ruff et al., 1986a), Digit Vigilance (Lewis & Rennick, 1979), Digit Cancellation Test (Della Sala et al., 1998), Visual Search and Attention Test (Trenerry et al., 1990), Verbal and Nonverbal Cancellation Tasks (Mesulam, 1985), Letter and Symbol Cancellation Task (Caplan, 1985), and Star Cancellation (Halligan et al., 1991; Wilson et al., 1987) are among the many cancellation tests available to clinicians and researchers (see Lezak, 1995, and Lezak et al., 2004, for more details on these tests). The Ruff 2&7 Selective Attention Test and Digit Vigilance Test are the two most commonly used cancellation tests with the most available literature and have been selected for review in this chapter.

RUFF 2&7 SELECTIVE ATTENTION TEST

Brief Overview of the Ruff 2&7

The Ruff 2&7 Selective Attention Test was developed by Ruff and colleagues and is included in the San Diego Neuropsychological Test Battery (Baser & Ruff, 1987; Ruff & Crouch, 1991). The test is designed to examine both sustained and selective attention using two distractor conditions. The test consists of 20 blocks, each containing three lines of 50 characters. Within each line, 10 target digits (2s and 7s) are intermixed with either other number distractors or capital letter distractors. Ruff distinguished two test conditions: (1) blocks in which the target numbers are embedded among letters, referred to as the "Automatic Detection" condition, and (2) blocks in which the target stimuli are embedded among other numbers, referred to as the "Controlled Search" condition. The presentation of the conditions (blocks of all digits or blocks of digits and letters) is alternated. Following brief practice trials, the examinee is given 15 seconds to complete each of the 20 blocks.
He or she is

and do not provide a reliable estimate of the population mean.

Sample Composition Description

Information regarding medical and psychiatric exclusion criteria is important. It is unclear if gender, intellectual level, handedness, geographic recruitment region, socioeconomic status, occupation, ethnicity, or recruitment procedures are relevant. Until this is determined, it is best that this information be provided.

Age Group Interval

This criterion refers to grouping of the data into limited age intervals. This requirement is especially relevant for this test since a strong effect of age on cancellation test performance has been demonstrated in the literature.

Reporting of Educational Levels

Given the possible association between education and cancellation test scores, information regarding educational level should be reported for each subgroup.

Procedural Variable

Data Reporting

For the Ruff 2&7, group means and standard deviations for the number of items correctly cancelled should be reported for the Automatic Detection and Controlled Search conditions separately. For the DVT, the mean and SD for time in seconds taken to complete the task should be reported. Additional useful information for the cancellation tests includes the number of omissions (target numbers not cancelled) and the number of commissions (numbers other than the target digits cancelled).

SUMMARY OF THE STATUS OF THE NORMS

Information presented in the studies reporting data for the cancellation tests differs across studies. Some of these differences will be summarized below.
Only one study was designed to provide normative information on the Ruff 2&7 (Ruff et al., 1986a). Other data on the Ruff 2&7 come from control groups in clinical comparison studies. Ruff et al. (1986a) partition normative data for the two conditions by four age groups and three educational levels; the other studies report demographic information. Another study by Ruff et al. (1992) provides normative data for speed and accuracy for normal controls. Finally, Bate et al. (2001) provide Ruff 2&7 data on a small sample of healthy controls. Most of these studies report either speed or speed and accuracy data summed across the two Ruff 2&7 conditions. Additional normative information, particularly tables for converting raw scores into T scores and percentiles based on age and educational level, is provided in the Ruff 2&7 professional manual (Ruff & Allen, 1996).

There are very few normative studies on the DVT. Most of the studies have small sample sizes (10-40), with the exception of Heaton et al.'s (1991, 2004) normative manuals, which include data for 210 participants with standardized scores adjusting for age, education, and gender, presented for African-American and Caucasian participants separately in the 2004 edition.

In this chapter, we review studies which use Ruff 2&7, followed by DVT studies. Published manuals are reviewed first, followed by normative studies and control groups from clinical comparison studies presented in ascending chronological order for each test separately.

The text of study descriptions contains references to the corresponding tables identified by number in Appendix 9. Table A9.1, the locator table, summarizes information provided in the studies described in this chapter.

SUMMARIES OF THE STUDIES
Ruff 2&7 Manual

[Ruff 2&7.1] Ruff and Allen, 1996

The normative information in this manual is primarily based on previous studies by Ruff

Children's norms for various cancellation tests are available in Baron (2004) and Spreen and Strauss (1998).

Oklahoma City, Oklahoma. Standard administration procedures were used.

Study strengths
1. The sample composition is well described in terms of age, education, geographic location, and recruitment procedures.
2. Adequate exclusion criteria.
3. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Small sample size.
2. Wide age range for the sample. Data are not presented by age group.
3. The data for over half of the sample were obtained on Canadian participants, which may limit their usefulness for clinical interpretation in the United States.
4. Low educational level.

[DVT.3] Grant, Prigatano, Heaton, McSweeny, Wright, and Adams, 1987 (Table A9.6)

The authors examined neuropsychological functioning in COPD patients with mild, moderate, and severe hypoxemia. They selected 99 "nonpatient" participants (75 male, 24 female) who did not have COPD, a history of "significant" head injury, a history of substance abuse, heart disease that required treatment, or neurological or metabolic illnesses. Participants were an average of 63.1 years of age and had obtained an average of 10.2 (3.6) years of education.

The authors do not specify testing procedures but do mention the larger battery from which the DVT is drawn (i.e., the Rennick-Lafayette Repeatable Battery).

Study strengths
1. Relatively large sample size.
2.
The sample composition is well described in terms of age, education, and gender.
3. Adequate exclusion criteria.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Test administration procedures are not specifically described.
2. Data are not partitioned by age.
3. Low educational level.

[DVT.4] Kelland and Lewis, 1994 (Table A9.7)

This study was designed to assess the test-retest reliability and validity of the DVT, as well as to measure the single-dose effects of diazepam in groups of college students. The authors selected 20 college students (10 male, 10 female) from a "large urban university" to serve as controls (who were administered a placebo rather than diazepam). Participants ranged in age from 18 to 30, with an average age of 20.0 (2.8) and an average educational level of 13.1 (1.3) years. Participants were excluded from the study if they reported taking medications; had a history of substance abuse; had a medical history that required central nervous system-depressant medication use; had a history of neurological, cardiac, renal, or hepatic disease; or drank more than two cups of coffee a day.

The DVT, along with other neuropsychological tests, was administered two times to each participant, with each session separated by 1 week. Standard administration procedures were used. Data are reported for both the standard (crossing out 9s) and the alternate (crossing out 6s) administrations.

These data were later reanalyzed by Kelland and Lewis (1996), who found a practice effect from week 1 to week 2 of test administration but no differences between week 2 and week 3. The Kelland and Lewis (1996) data for weeks 1 and 2 are the same as those reported in this study and, thus, will not be reproduced in this chapter.

Study strengths
1. The sample composition is well described in terms of age, gender, education, and recruitment procedures.
2. Adequate exclusion criteria.
3.
Means and SDs for the test scores are reported.
4. Test-retest data are reported.

Consideration regarding use of the study
1. Small sample size.

10 Boston Naming Test

BRIEF HISTORY OF THE TEST

The Boston Naming Test (BNT) is a test of confrontation naming consisting of simple line-drawn pictures. Its experimental version includes 85 drawings (Kaplan et al., 1978). The modified version of the BNT, published in 1983, is limited to 60 of the original 85 drawings, arranged in order of ascending difficulty (Kaplan et al., 1983). Participants are allowed 20 seconds to name each item. Stimulus cues are offered to correct for misperception errors. They are followed by phonemic cues, which provide the first phonemes of the word, facilitating lexical retrieval. The total score on the test is the number of correct responses produced spontaneously (SR) and with the aid of stimulus cues (SC). The basal rule is eight consecutive pictures correctly named without any assistance, and the discontinuation rule is six consecutive failures. (For detailed administration and scoring instructions, see Lezak et al., 2004; Spreen & Strauss, 1998; and instructions in the test stimulus booklet.)

The authors provide normative data on the 60-item version for children 5.5-10.5 years of age, broken down into six age groups based on five participants in each group; for normal adults aged 18-59 years, broken down into two educational groups and five age groups based on a total of 84 participants; and for 82 aphasic patients partitioned by aphasia severity level.

In 2000, Kaplan et al.
published the second edition of the BNT, which includes a 15-item short form, four multiple-choice options for each of the same 60 items that were used in the previous edition of the test, and error codes to categorize incorrect responses. The ceiling was changed to eight items. The normative data included in the record booklet are partitioned by 15 age groups for children, spanning in age between 5-0 and 12-5 years, and five age groups for adults between 18 and 79 years. In addition to being used as a stand-alone test, the BNT, second edition, is included in the Boston Diagnostic Aphasia Examination (BDAE) published by Psychological Assessment Resources (Goodglass et al., 2001).

All studies listed in this chapter are based on the original 60-item version of the BNT since no studies based on the second edition of the test were published by the time this book went into production.

Thompson and Heaton (1989) and Heaton et al. (1991) reported high correlation between the 85-item and the 60-item versions (r = 0.96). However, the mean percent of correct responses was somewhat lower for the original version (85.1% vs. 87.8%) in their sample of clinical referrals for neuropsychological evaluation.

decline in naming ability (Beatty et al., 2002; Huff et al., 1986b; Margolin et al., 1990; Storandt & Hill, 1989). Goldman et al. (1998) reported decline in naming ability as a function of severity of Parkinson's disease.

Several studies have explored the mechanisms of naming deficits in different types of aphasia. According to Nicholas et al.
(1985), aphasic participants (across all major aphasic groups, except for anomics) have difficulty in the phonological encoding of words. Kohn and Goodglass (1985) support this finding by demonstrating considerable similarity in error types across different aphasic groups. In addition, they provide a more specific analysis of anomic errors associated with different types of aphasia: "Negated responses were associated with Broca's aphasia, whole-part errors (hose for nozzle) were associated with frontal anomia, and poor phonemic cuing was associated with Wernicke's aphasia" (p. 266). The authors also reported that anomic aphasics produced the highest frequency of multiword circumlocutions and the lowest number of phonemic errors, which they relate to minimal word production difficulty in anomic aphasia relative to other aphasia syndromes. On the other hand, Lewis and Soares (2000) showed that pre-language conceptual organization deficit might underlie naming difficulties in aphasic patients who present with semantic paraphasias as a leading feature of their language disturbance.

Investigation of the neuroanatomical substrates of naming may shed light on its cognitive mechanisms. The hippocampus of the dominant hemisphere has been widely implicated in naming function (Davies et al., 1; Martin et al., 1999; Sawrie et al., 2000; Seidenberg et al., 1998). Other aspects of the dominant temporal lobe are also involved in naming function, according to Ojemann et al. (1993) and Wiggs et al. (1999). Bell et al. (2000) showed that a decline in naming ability is a frequent sequela of left anterior temporal lobectomy. These findings consistently point to the involvement of the dominant temporal area in naming function. In addition, areas of the superior parietal and frontal cortices are implicated by Wiggs et al. (1999).
In addition to the obvious use of the BNT in assessing word retrieval, Kaplan (1988) observed that analysis of misperception errors allows identification of perceptual fragmentation and inattention to a part of the visual field, which are associated with nondominant hemisphere dysfunction.

Modifications and Short Versions of the BNT

An attempt to create two shorter equivalent forms of the BNT for repeated testing was undertaken by Huff et al. (1986a). Based on the experimental 85-item version, these authors developed two 42-item versions that proved to be reliable (r = 0.71-0.81 for controls and r = 0.97 for AD patients) and equivalent in difficulty. Both versions were standardized on normal and brain-damaged participants.

The different forms of the test were compared by Thompson and Heaton (1989). They administered an 85-item version of the BNT to a clinical group of participants; data were then rescored according to the criteria for the 60- and 42-item forms. Although certain differences between forms were found, there were high correlations among different versions of the test (ranging 0.82-0.96) and between BNT scores and other language measures.

Heaton et al. (1991, 2004) published normative data for the 85-item version. The revised set of norms (2004) is based on a sample of over 1,000 normal adults, stratified by age, education, gender, and race/ethnicity (African American and Caucasian).

Farmer (1990) proposed modifications of the administration, response coding, and scoring procedures for the full version of the BNT, which were used by the author to assess non-brain-damaged adults.

Eight short versions of the test were compared by Mack et al. (1992) and Williams et al. (1989) in patients suffering from AD and neurologically intact elderly. The short forms were four 15-item versions developed by these authors, one 15-item version used by the Consortium to Establish a Registry for Alzheimer's Disease (CERAD), and three 30-item versions.
Scores on each version could be extrapolated to a complete 60-item BNT score. Franzen et al. (1995) compared different short forms on a sample of 320 individuals

BOSTON NAMING TEST

in BNT performance was observed in groups with lower educational levels. In contrast, Cruice et al. (2000), Farmer (1990), Fastenau et al. (1998), Ivnik et al. (1996), and LaBarge et al. (1986) did not find any association between BNT performance and educational level, which might be related in part to restricted ranges of educational levels in some samples. A combined effect of age and education should be taken into consideration, according to Heaton et al. (1999), as older individuals with lower educational levels are more likely to be misidentified as dysnomic. Similarly, an interaction of age and education was reported by Borod et al. (1980), Farmer (1990), and Welch et al. (1996). Manly et al. (1999) showed that illiterates scored significantly lower than literate participants with up to 3 years of education in their sample of Spanish-speaking, non-demented elders.

Albert et al. (1988), Killgore and Adams (1999), Thompson and Heaton (1989), and Tombaugh and Hubley (1997) found that verbal intelligence, as measured by WAIS-R Vocabulary score, strongly affected BNT performance in their samples of neurologically normal participants. Similarly, Hawkins et al. (1993) found reading vocabulary score to be strongly correlated with BNT performance in a sample of psychiatric and normal participants. The authors presented BNT performance expectation guidelines based on the Gates-MacGinitie Reading Vocabulary Test for use as a complement to the published norms.
Based on a review of the relevant literature, Hawkins and Bender (2002) emphasized the contribution of premorbid vocabulary to BNT performance and made recommendations on moderator variables to be considered in further research.

Gender was shown in several studies to be unrelated to naming efficiency in normal samples (Cruice et al., 2000; Fastenau et al., 1998; Henderson et al., 1998; Ivnik et al., 1996; LaBarge et al., 1986). However, based on an analysis of BNT performance, Ripich et al. (1995) suggest that naming skills are poorer for women than for men with similar clinical dementia rating (CDR) scores and demographic characteristics in their sample of 60 early AD participants. Similarly, Lansing et al. (1999), Marien et al. (1998), Randolph et al. (1999), Saxton et al. (2000), and Welch et al. (1996) reported males outperforming females in normal samples. Randolph et al. (1999) suggested that the gender effect is due to performance on specific items that are more familiar to men.

Reports of the effect of ethnicity and culture on BNT performance yield contradictory findings. In the study by Henderson et al. (1998), comparing healthy African-American and Caucasian participants, ethnicity was unrelated to BNT performance. Similarly, Manly et al. (2002) did not find a notable difference in naming ability between African-American and Caucasian elders. In contrast, Lichtenberg et al. (1994) and Ross et al. (1995) report higher scores for Caucasian compared to African-American participants in a group of medical inpatients. Similar findings are reported by Kimbarrow et al. (1996) on a sample of geriatric rehabilitation patients and by Whitfield et al. (2000) on a sample of healthy elderly. These findings are discussed by Henderson et al. (1998) in the context of lower educational level of African-American participants. Kimbarrow et al. (1996) emphasize socioeconomic status, ethnicity, and cultural factors, and Whitfield et al.
(2000) point to the cultural appropriateness of the material as explanatory variables. Furthermore, Manly et al. (1998) found that participants who reported less acculturation obtained lower scores on the BNT in their sample of medically healthy African Americans. Similarly, Touradji et al. (2001) reported lower naming performance in foreign-born Caucasian elders compared to those born in the United States. Further moderating variables were acculturation level and language use.

Qualitative differences in BNT performance as a function of ethnic background and geographical region were reported by Goldstein et al. (2000). Participants in their study tended to use alternative responses to several BNT items that are specific to their region, with further differences between black and white participants in which alternative responses were used.

Utility of the standard version of the BNT in assessment of monolingual Spanish-speaki

Study strengths
1. Information regarding age, education, gender, geographic area, IQ, and fluency in English is reported.
2. Adequate exclusion criteria were used.
3. The data are partitioned into four age groups.
4. Test-retest data are provided.
5. Overall sample size is large, with some cells approaching 50, although some cells are rather small.
6. Means and SDs for the test scores are reported.

Consideration regarding use of the study
1. Mean education and intelligence levels are high.

[BNT.5] Neils, Baris, Carter, Dell’aira, Nordloh, Weiler, and Weisiger, 1995 (Table A10.7)

The study addresses the effects of demographic factors on BNT performance.
Participants were 323 normal elderly (244 females, 79 males) aged 65-97 residing in northern Kentucky and the greater Cincinnati, Ohio, area; 167 participants were living independently and 156 were institutionalized in extended-care facilities for at least 1 month. All participants were carefully screened for neurological disorders and had adequate vision, language comprehension, and attention.

The administration procedure differed from standard in that the stimulus cues were offered after any error was made, irrespective of whether it was a visual-perceptual error.

The data are presented in an age-by-education-by-living environment matrix. The combination of age, education, and living environment accounted for 32% of the performance variance. The results suggest that scores for low-education and high-education groups are less affected by age and living environment than scores for participants with 10-12 years of education. Correlation between BNT score and education was r=0.38, whereas the correlation of BNT with age was r=-0.33.

Study strengths
1. Information regarding age, education, gender, and geographic area is provided.
2. Data across wide ranges of different demographic characteristics are presented.
3. Strict selection criteria were used for neurological disorders and cognitive dysfunction.
4. Overall very large sample size.
5. The data are presented in an age-by-education-by-living environment matrix.
6. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. No information regarding intellectual level.
2. Sample sizes in individual cells are small.
3. The administration procedure somewhat differed from standard instructions.

[BNT.6] Ross, Lichtenberg, and Christensen, 1995 (Table A10.8)

This article represents an expansion on the previously reported data in Lichtenberg et al. (1994).
In study 1, the authors provide data for 123 geriatric medical inpatients at an urban rehabilitation hospital in Michigan (60% African American, 40% Caucasian, 62% female, 38% male). Mean age was 75.87 (7.42) years, with mean education of 11.05 (3.38) years. Rigorous exclusion criteria for neurological disorders and depression were used. Mean Mattis Dementia Rating Scale (DRS) score for the sample was 132.76 (4.93). Patients treated for hypertension, diabetes, and hypothyroidism were included if their conditions were well controlled with medications and without neurological complication. Some participants were tested 2-3 weeks after orthopedic surgery and were not on narcotic medications at the time of assessment.

In study 2, participants from study 1 were compared as a “normative” group to a “cognitively impaired” group of 151 participants with Mattis DRS scores below 123 (61% African American, 39% Caucasian, 30% male, 70% female). Mean age for this group was 79.7 years, with mean education of 8.9 years. Participants from this group presented with a wide variety of physical disorders which are likely to affect cognitive status. Twenty-four

a mean age of 32.1 (9.7) years and mean education of 15.4 (2.4) years. The sample included 48 white, 4 black, and 2 Hispanic participants. Exclusion criteria were a history of medical, neurological, or psychiatric problems; more than moderate use of alcohol (12 oz/week); history of intravenous drug use; and self-reported history of learning disability (with enrollment in special education classes).

Study strengths
1. Relatively large sample size.
2.
The sample composition is described in terms of age, education, and ethnicity.
3. Rigorous exclusion criteria.
4. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. Wide age and education range. No information on IQ or gender distribution.
2. Recruitment procedures are not reported.
3. Educational level for the sample is high.

[BNT.12] Ponton, Satz, Herrera, Ortiz, Urrutia, Young, D’Elia, Furst, and Namerow, 1996 (Table A10.16)

The Ponton-Satz BNT was administered to Spanish-speaking volunteers as part of a larger battery in a project designed to provide a standardization of the Neuropsychological Screening Battery for Hispanics (NeSBHIS). Volunteers were recruited through fliers and advertisements in community centers of the greater Los Angeles area over a period of 2 years. Exclusion criteria were a history of neurological or psychiatric disorder, drug or alcohol abuse, and head trauma.

Data for a sample of 300 participants, with a median educational level of 10 years, were analyzed. Participants ranged in age from 16 to 75 years, with a mean of 38.4 (13.5) years. Education ranged from 1 to 20 years, with a mean of 10.7 (5.1) years. Male to female ratio was 40/60%. The average duration of residence in the United States was 16.4 (14.4) years. Seventy percent of the sample were monolingual Spanish-speaking, and 30% were bilingual. The proportion of the sample respective to their country of origin closely approximates the 1992 U.S. Census distribution. Correlations between Marin and Marin (1991) acculturation scale scores and neuropsychological variables are provided.

The Ponton-Satz BNT is an adaptation of Kaplan's BNT, consisting of 30 items that are based on the original test but presented in different order. The selection of items was based on the ratings of expert judges in terms of cultural appropriateness and difficulty.
In the follow-up study on the factor structure of the NeSBHIS (Ponton et al., 2000), which extracted five factors, the BNT primarily loaded on the Language factor, with a varimax-rotated factor loading of 0.84.

Study strengths
1. Large overall sample with acceptable size for most of the cells.
2. The sample composition is well described in terms of age, education, gender, acculturation information, geographic area, and recruitment procedures.
3. Adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.
6. Data are partitioned by gender x age x education.

Considerations regarding use of the study
1. Thirty-item version of the test adapted for use with Spanish-speaking population was administered.
2. No information on IQ is reported.
3. It is unclear which of the two educational groups included participants with 10 years of education.

[BNT.13] Tombaugh and Hubley, 1997 (Tables A10.17, A10.18)

The study provides age- and education-stratified norms for 219 community-dwelling, cognitively intact volunteers, who participated in a large study on the effect of aging on acquisition and retention of information. They were recruited through booths at shopping centers, social organizations, places of employment, psychology classes, and word of mouth. The sample included participants aged

Study strengths
1. Large sample.
2. The sample composition is well described in terms of age, education, gender, factors influencing language proficiency, incentives for participation, and setting.
3. Adequate exclusion criteria.
4. Test administration procedures are specified in detail.
5.
Means and SDs for the test scores are reported.
6. Comparison of performance in Spanish and English is provided.

Consideration regarding use of the study
1. No information on IQ is reported.

[BNT.19] Randolph, Lansing, Ivnik, Cullum, and Hermann, 1999 (Table A10.24)

The effects of age, education, gender, and diagnostic group with respect to overall BNT performance; the influence of phonemic cuing; and performance on individual items were examined on samples of neurologically normal elderly, AD patients, and temporal lobe epilepsy patients. The control group included 719 paid and unpaid volunteers for studies on neuropsychological function in normal aging. Procedures for recruitment and sample description are provided in earlier articles based on this study. The follow-up articles provide additional information (Lansing et al., 1999). The sample was almost exclusively white and 60% female, with a mean age of 73.6 (10.3) years and mean education of 13.4 (2.9) years.

Spontaneously correct responses or responses correct with stimulus cue were scored as correct. The test was discontinued after six consecutive failures.

The authors present means and SDs for the data broken down by age groups using the overlapping midpoints technique. They also present data for three education groups: <12, 12, and >12 years.

The authors found that age and education systematically influenced BNT scores. Gender had a significant effect across diagnostic groups, with males outperforming females, which was interpreted by the authors as an item-related effect.

Study strengths
1. Information regarding age, education, IQ, gender, ethnicity, handedness, and geographic area is reported in the original articles based on this study (Ivnik et al., 1992a,b).
2. The data were stratified by age group based on the overlapping midpoints technique.
3. The sample sizes for each group are large.

Considerations regarding use of the study
1.
To interpret the data presented in age groups broken down by the overlapping midpoints technique, the reader is referred to the original articles by Ivnik et al.
2. Participants with a prior history of psychiatric or chronic medical illnesses were included.

[BNT.20] Killgore and Adams, 1999 (Table A10.25)

The study investigates the relationship between BNT performance and WAIS-R Vocabulary score and derives regression-based expected BNT scores from Vocabulary scaled scores. The sample consisted of patients consecutively referred for neuropsychological evaluation at a large midwestern medical center over a 26-month period who were found to be without demonstrable neurological impairment. All patients had negative neurological evaluations and negative neuroimaging and laboratory studies. Patients were excluded if there was evidence of mild dementia or a history of alcohol abuse, learning disability, or seizure disorder. The sample consisted of 28 males and 34 females, ranging in age from 17 to 85 years, with a mean age of 45.7 (15.1) years, mean education of 13.1 (2.7) years, and mean WAIS-R FSIQ of 95.1 (12.0); 34% had a psychiatric diagnosis such as Major Depressive Episode or Adjustment Disorder.

The BNT was administered by trained personnel, according to the standardized instructions.

The results did not suggest a relationship between BNT performance and age. However, the regression of BNT scores on Vocabulary scaled scores accounted for 42% of the variance in BNT scores and was used to

multicenter, population-based Cardiovascular Health Study.
Data for a slightly larger subsample from this project are presented in Saxton et al. (2000) (see review above). Volunteers were excluded from the study if they were not right-handed, had a lifetime history of psychiatric illness or any illness or injury referable to the brain, MR images revealing structural abnormalities, or MMSE scores of <24. Mean age for the sample was 74.85 (4.95) years, mean education 12.98 (2.87) years, mean MMSE score 28.29 (1.5), and mean WAIS-R Vocabulary score 47.52 (13.26). Of this sample, 71% were taking medications for one or more medical conditions. None was taking medication known to affect brain size (e.g., steroids).

The BNT was administered according to standard instructions. Data are presented for the whole sample and for males and females separately.

Study strengths
1. Large sample size.
2. The sample composition is well described in terms of age, education, gender, MMSE score, WAIS-R Vocabulary score, geographic area, and research setting.
3. Adequate exclusion criteria.
4. Test administration procedures are specified.
5. Means and SDs for the test scores are reported.

Consideration regarding use of the study
1. The data are not partitioned by age group.

[BNT.28] Giovannetti, Goldstein, Schullery, Barr, and Bilder, 2003 (Table A10.33)

The BNT was administered to 31 control participants in order to assess basic language skills associated with temporal lobe functions in a study on the mechanisms of verbal fluency deficits in first-episode schizophrenia. Participants were recruited from the hospital community through announcements in local newspapers and within the medical center. They had no history of substance abuse or neurological/psychiatric/medical illness, per self-report and per Schedule for Affective Disorders and Schizophrenia Interview, physical examination, and urinalysis. Mean age for the group was 25.2 (6.07) years, mean education 15.0 (1.48) years, mean WAIS-R IQ 109.3 (11.51), and male/female ratio 21/10.
The sample is further described in the articles by Bilder et al. and Lieberman et al., published between 1991 and 2000.

Study strengths
1. The sample composition is well described in terms of age, education, gender, FSIQ, geographic area, and recruitment procedures.
2. Stringent exclusion criteria.
3. Means and SDs for the test scores are reported.

Considerations regarding use of the study
1. The sample is small.
2. Educational level for the sample is high.

RESULTS OF THE META-ANALYSES OF THE BOSTON NAMING TEST DATA (See Appendix 10m)

Data collected from the studies reviewed in this chapter were combined in regression analyses in order to describe the relationship between age and test performance and to predict expected test scores for different age groups. Effects of other demographic variables were explored in follow-up analyses. The general procedures for data selection and analysis are described in Chapter 3. Detailed results of the meta-analysis and predicted test scores across adult age groups are provided in Appendix 10m.

After initial data editing for consistency and for outlying scores, 14 studies, which generated 43 data points based on a total of 1,684 participants, were included in the analyses. Normative data from Kaplan's standardization sample were not included in the database.

A quadratic regression of the BNT scores on age yielded an R² of 0.850, indicating that 85% of the variance in BNT scores is accounted for by the model. Based on this

VERBAL FLUENCY TEST

neuropsychological dysfunctions yielded correlation coefficients of 0.87-0.94 for different samples (Lacy et al., 1996).
The authors concluded that these intercorrelations even surpass correlations between CFL and PRW. In spite of such an optimistic view, the norms for the FAS test should be used with caution in application to the COWA sets (CFL and PRW) due to different levels of letter difficulty (Ruff et al., 1996).

Other combinations of letters have been used in several studies. Cavalli et al. (1981) used P, F, and L in a study on lateralized deficits in linguistic processing. Nielsen et al. (1989) used S, N, and F on a large neurologically intact Danish sample. S and P were used by Barr and Brandt (1996) in a study on fluency deficits in dementia; Ganguli et al. (1991, 1993, 1996) in a study on cognitive impairment in an elderly rural population; Goldman et al. (1998) in a study examining cognitive deficits associated with Parkinson's disease; and Coffey et al. (2001) in a study exploring cognitive correlates of human brain aging. Lannoo and Vingerhoets (1997) used N, A, and K in a project collecting normative data for a Dutch version of the phonemic fluency test. Lopez-Carlos et al. (2003) used P, M, and R in their project collecting normative data for monolingual Spanish-speaking individuals with a low level of education. The authors noted that illiterate individuals tend to make errors in the words that start with A and S, given that many words that start with a silent H begin with the A sound and words that start with a C sound like S.

Other versions of the fluency tests involve generation of words from certain semantic categories (Category Naming), such as animal naming (Acevedo et al.,
2000; Barr & Brandt, 1996; Beatty et al., 1997; Brady et al., 2001; Crossley et al., 1997; Epker et al., 1999; Fama et al., 2000; Ganguli et al., 1991, 1993; Giovannetti et al., 2003; Kempler et al., 1998; Kozora & Cullum, 1995; Lopez-Carlos et al., 2003; Monsch et al., 1992; Morris et al., 1989; Rosen, 1980; Rosselli et al., 2002a; Selnes et al., 1991; Simkins-Bullock et al., 1994; Tombaugh et al., 1999; Troyer, 2000; Ylikoski et al., 1993); types of transportation and parts of a car (Weingartner et al., 1984); items found in a supermarket (Barr & Brandt, 1996; Kozora & Cullum, 1995; Monsch et al., 1992; Troyer, 2000); fruits and vegetables, foods, and things people drink (Acevedo et al., 2000; Miller, 2003; Monsch et al., 1992; Randolph et al., 1993; Simkins-Bullock et al., 1994); first names (Kozora & Cullum, 1995; Monsch et al., 1992); tools and clothing (Huff et al., 1986b); U.S. states (Kozora & Cullum, 1995); and inanimate objects (Fama et al., 2000).

Fuld (1981) used category naming tasks, such as proper names of people (same gender as the examinee), foods, vegetables, things that make people happy, and things that make people sad, as distractor trials for delayed recall of the originally presented stimuli in her Fuld Object-Memory Evaluation (see also Marcopulos et al., 1997). Food, clothing, animals, and things to ride categories are included in the McCarthy Scales for Children’s Abilities (McCarthy, 1972). A version of the VF task tapping semantic switching is included in the Delis-Kaplan Executive Functions System (Delis et al., 2001). This test assesses letter fluency (F, A, S), category fluency (animals and boys’ names), and category switching (fruits and furniture). The test has standard and alternate forms. Two parallel forms of a semantic fluency test are included in the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS; Randolph, 1995).
One version of a category naming test is the Set Test (Isaacs & Kennie, 1973), which involves generating items from four successive categories: colors, animals, fruits, and towns. According to this version, examinees are to recall up to 10 items from each category, after which they are instructed to shift to the next category. The score is the total number of items recalled for all categories. The versions proposed by Newcombe (1969), used in assessing patients with lateralized missile wounds, involved naming objects and animals and alternating between naming birds and colors over 1 minute for each of the three trials. The number of correctly generated items for the first two conditions and correct alternations for the third condition are recorded. Villardita et al. (1985) used a modification of the Set Test to assess a group of normal elderly, employing the categories proper names of

predominantly left frontal localization of activation appeared to be established by middle childhood.

A number of studies suggest that in addition to the left frontal structures commonly viewed as neural bases for word generation, phonemic vs. semantic fluency might involve different neural systems. Right (or bilateral) cerebellum has been found to participate in phonemic, but not in semantic, processing (Leggio et al., 2000; Ravnkilde et al., 2002; Schloesser et al., 1998), whereas hippocampi were found to play a role in semantic, but not in phonemic, fluency performance (Gleissner and Elger, 2001). Similarly, Pihlajamaeki et al.
(2000) suggested that the left medial temporal lobe (hippocampal formation or posterior parahippocampal gyrus) is activated in retrieval by category. Stuss et al. (1998) suggested that in addition to the left hemisphere centers participating in phonemic processing, semantic processing involves right dorsolateral and inferior medial regions. This view is supported by N’Kaoua et al.’s (2001) finding that phonemic processing involves the left temporal lobe, whereas semantic processing involves the left and right temporal lobes. Differential rates of deterioration in phonemic vs. semantic fluency in dementia support the notion that these two types of fluency are subserved by different neural mechanisms.

Animal naming has been shown in some studies to be performed at higher levels than word-generation tasks (Ober et al., 1986; Rosen, 1980). Similar findings are reported with reference to other semantic category naming tests (e.g., fruits: Ober et al., 1986; Randolph et al., 1993). In contrast, Bayles et al. (1989), Monsch et al. (1994), and Sherman and Massman (1999) did not find differences in efficiency of word generation for semantic vs. phonemic tasks. Furthermore, greater impairment of semantic fluency in comparison to phonemic fluency tasks in clinical samples was documented by Barr and Brandt (1996), Butters et al. (1987), Cahn et al. (1995), Cerhan et al. (2002), Crossley et al. (1997), Mickanin et al. (1994), Monsch et al. (1992), Rosser and Hodges (1994), and other authors. This pattern of greater deterioration in semantic fluency, which, according to Butters et al. (1987), is based on the ability to access and retrieve semantic knowledge, than in phonemic fluency, which is based on phonological/lexical retrieval mechanisms, is viewed as being mostly due to disruption in the structure of semantic memory early in the course of dementia.

Coen et al. (1996) related the distinction in the efficiency of phonemic vs. semantic fluency to the rate of cognitive decline.
The authors showed that shorter duration of illness in their sample of patients suffering from dementia of Alzheimer’s type (DAT) is associated with greater impairment in letter fluency, whereas longer duration resulted in predominance of the category fluency impairment. Considering that there was no difference between the groups in dementia severity, these findings were viewed as a function of the rate of disease progression, with greater impairment in phonemic fluency being associated with more rapid cognitive decline.

Effects of different types of brain pathology, including brain injuries, aphasias, some amnesiac conditions, and degenerative dementing conditions on verbal production, are addressed in a number of studies, many of which provide information allowing comparison of word generation in clinical and control groups (Barr & Brandt, 1996; Carew et al., 1997; Cerhan et al., 2002; Clark et al.; Coen et al., 1996; Dalrymple-Alford et al., 1994; Elvevag et al., 2001; Eslinger et al., 1984; Geffen et al., 1993; Goethe et al., 19; Goldman et al., 1998; Gurd, 2000; Huff et al., 1986b; Joyce et al., 1996; Klimczak et al., 1997; Lafleche and Albert, 1995; Locascio et al., 1995; Margolin et al., 1990; Miller, 1985; Piatt et al., 1999a; Poreh et al., 1; Robert et al., 1998; Shogeirat et al., 1990; Zec et al., 1999).

Assessment of Verbal Fluency in Different Languages

Spanish versions of the verbal fluency test are available, which are based on different sets of letters (Artiola i Fortuny et al., 1999; Lopez-Carlos et al., 2003; Rey & Benton, 1991). Artiola i Fortuny et al. (1999) used letters P, M, and R to assess phonemic fluency as part
