
Neurologic Critical Care

A multicenter prospective study of interobserver agreement using the Full Outline of Unresponsiveness score coma scale in the intensive care unit
Andrew A. Kramer, PhD; Eelco F. M. Wijdicks, MD, PhD; Vicki L. Snavely, RN; Jessica R. Dunivan, CCRN;
Linda L. Smitz Naranjo, DNP, RN; Shonna Bible, BSN; Thomas Rohs, MD; Stacy M. Dickess, BSN
Objective: The classification of the comatose patient has been greatly improved with the use of coma scales. The Full Outline of Unresponsiveness score has emerged as an alternative to the Glasgow Coma Scale in that it incorporates essential information needed to assess the depth of coma. One set of patients for which the Full Outline of Unresponsiveness score could be particularly beneficial is those admitted to an intensive care unit, where approximately 30%–35% of all patients are intubated or ventilated. This manuscript reports on a study that examined the inter-rater reliability of the Full Outline of Unresponsiveness score in five intensive care units.
Setting: Seven intensive care units at five U.S. hospitals participated.
Subjects: Patients admitted during parts of 2010 and 2011 had their Full Outline of Unresponsiveness score assessed independently by two nurses within 1 hr of admission.
Design: We evaluated the weighted kappa statistic of the Full Outline of Unresponsiveness score over all patients and stratified by mechanical ventilation status. Finally, we looked for evidence of heterogeneity in Full Outline of Unresponsiveness score agreement across hospitals.
Measurements and Main Results: A total of 907 adult critically ill patients had Full Outline of Unresponsiveness score assessments by two evaluators. The overall weighted kappa statistic was 0.92, and this did not differ by whether or not a patient was on a ventilator. Among hospitals there was modest heterogeneity for the weighted kappa; however, all of the values were >0.80.
Conclusions: The Full Outline of Unresponsiveness score showed excellent inter-rater agreement overall and at each of the five hospitals. This demonstrates that the Full Outline of Unresponsiveness score can be utilized reliably in critically ill patients. (Crit Care Med 2012; 40:2671–2676)
Key Words: assessment; brain injury; coma; reliability

From the Cerner Corporation (AAK, VLS), Vienna, VA; Mayo Clinic (EFMW), Rochester, MN; Memorial University Medical Center (JRD), Savannah, GA; University of Maryland Medical Center (LLSN), Baltimore, MD; Spartanburg Regional Health System (SB), Spartanburg, SC; Borgess Hospital (TR), Kalamazoo, MI; and St. Mary's Medical Center (SMD), Huntington, WV.
Dr. Kramer has stock ownership in Cerner Corporation. The remaining authors have not disclosed any potential conflicts of interest.
For information regarding this article, E-mail: wijde@mayo.edu
Copyright © 2012 by the Society of Critical Care Medicine and Lippincott Williams & Wilkins
DOI: 10.1097/CCM.0b013e318258fd88

The degree of coma can be captured in a scale or score that allows better communication between healthcare workers and attending physicians. Such a scale may even, despite heterogeneity of patient case-mix, predict outcome. The Glasgow Coma Scale (GCS) (1) has been universally used, as it was the first method to measure level of consciousness in a manner that was associated with eventual outcome. This allowed its inclusion into predictive models, such as the
World Federation of Neurologic Surgeons
scale (2), as well as in intensive care unit
(ICU) scoring systems, such as the Acute
Physiology and Chronic Health Evaluation (APACHE) IV (3).
While widely used, the GCS might
be replaced by a scale better suited to
address level of consciousness in critically ill patients. It is likely that the GCS
is less informative in acute brain injury
that results in a change of brainstem
reflexes and breathing pattern. In the
medical ICU, the disadvantage of GCS is
its inability to identify early changes in
consciousness (e.g., failure to track the
examiner's fingers and to follow a series
of very specific commands). Loss of one
of its pillars, the verbal response,
occurs with the simple act of intubation,
which accounts for around 35% of all U.S.
admissions to ICUs (3, 4).
More recently, a new coma scale, the Full Outline of Unresponsiveness (FOUR) score, has been introduced, and several prospective studies have validated the score as a reliable tool in assessing comatose patients in medical intensive care, neurointensive care units, and in the emergency department (5–9). The FOUR
score incorporates eye opening and eye
movements, brainstem reflexes, respiration patterns, several motor responses,
and specific ways to assess comprehension to a command. A schematic diagram
of the FOUR score is given in Appendix 1.
In order to prove wide applicability,
any new scale must be able to be reliably assessed. The FOUR score has been
successfully tested in several European
countries and in one general ICU (8–13).
However, these studies were small and
skewed toward maximal scores. Therefore, we embarked on a prospective study
of inter-rater agreement in seven ICUs
within five hospitals across the United
States. In particular, we were interested in
the level of inter-rater agreement, as well
as whether or not inter-rater agreement
was good in patients who were mechanically ventilated.

Crit Care Med 2012 Vol. 40, No. 9

MATERIALS AND METHODS


Five U.S. hospitals elected to participate in
the study. These hospitals were ascertained as
their ICUs were using the APACHE IV database. APACHE is a clinical information system (Cerner Corporation, Kansas City, MO)
that provides information for benchmarking
and quality improvement (3). Sites received
approval from their institutional review board
to engage in this study and collect data. One
or two ICUs in each hospital collected data on
consecutive admissions for 2 months or 250
patients, whichever came first. Although the
initiation of collection was staggered among
the sites, all data were collected between
September 2010 and February 2011.
Each patient received their normal care as
well as information collected for the APACHE
system, with one exception: within 1 hr after
admission, a patient had a FOUR score assessment by two caregivers in the unit. These individuals did not initially have any experience
in recording the FOUR score but underwent
training as follows: The initial letter from one
of the authors (E.F.M.W.) described the FOUR
score. This was followed up with education
provided by an experienced critical care nurse
(V.L.S.). The educational session was on-site
and web-based, focusing on assessment parameters for eye and motor response, brainstem reflexes, and patterns of respiration. Each
site was provided with FOUR score evaluation
cards and access to an educational compact
disc. Any questions were directed back to one
of the authors (E.F.M.W.) for clarification.
Consecutive, adult patients admitted to a
participating ICU received a FOUR score evaluation by the study nurse for that site. The initial assessment was performed upon patient
admission to the ICU. This was followed by a
second assessment by a different nurse on the
unit or a physician within the first hour after
admission. The second nurse was blinded to
the first nurse's score. FOUR score data were
entered manually on a score sheet for the unit
and later put into a Microsoft Excel file.
The data were sent back to a central repository, where they were imported into SAS (version 9.2, Cary, NC). Because the number of possible FOUR score values was so large, 17 in all, a raw κ score would not adequately account for the ordinal nature of the data. Thus, a weighted κ was calculated for the FOUR score over all sites, at each site individually, and for each component of the FOUR score. The weighted κ (κw) is a modification of the κ score in that it gives partial weight to those instances where the evaluators almost agree (14). The formula for the weights for this study is:
$w_{ij} = 1 - (|C_i - C_j| / 16)$,
where $w_{ij}$ is the weight applied to row $i$ and column $j$, $C_i$ is the value for row $i$, and $C_j$ is the value for column $j$. Thus, the weight for exact agreement is 1.0, with decreasing weights for increasing disparity between scores. For example, if one assessor calculated a FOUR score of 3 and another assessor calculated a FOUR score of 5 on the same patient, $w_{ij} = 1 - (2/16) = 0.875$.
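The weighting scheme above can be illustrated with a short Python sketch. This is not the SAS procedure used in the study: the function names `linear_weight` and `weighted_kappa` are hypothetical, the scores are assumed to be integers from 0 to 16, and the statistic is assumed to follow the standard form κw = (Po − Pe) / (1 − Pe), where Po and Pe are the weighted observed and chance-expected agreement.

```python
from collections import Counter


def linear_weight(ci, cj, max_diff=16):
    """Agreement weight w_ij = 1 - |Ci - Cj| / max_diff (1.0 for exact agreement)."""
    return 1.0 - abs(ci - cj) / max_diff


def weighted_kappa(rater_a, rater_b, levels=range(17)):
    """Linearly weighted kappa for two raters scoring the same patients.

    Assumes the standard definition (Po - Pe) / (1 - Pe); this is a sketch,
    not the exact routine used by the authors.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    levels = list(levels)
    max_diff = levels[-1] - levels[0]

    pairs = Counter(zip(rater_a, rater_b))   # joint frequencies
    marg_a = Counter(rater_a)                # first rater's marginal counts
    marg_b = Counter(rater_b)                # second rater's marginal counts

    # Weighted observed agreement.
    observed = sum(
        linear_weight(i, j, max_diff) * count / n
        for (i, j), count in pairs.items()
    )
    # Weighted agreement expected by chance from the marginals.
    expected = sum(
        linear_weight(i, j, max_diff) * (marg_a[i] / n) * (marg_b[j] / n)
        for i in levels
        for j in levels
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return (observed - expected) / (1.0 - expected)
```

For the worked example in the text, `linear_weight(3, 5)` evaluates to 0.875.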
Although the aim was to evaluate the FOUR score's reproducibility, we retrospectively obtained additional data on 937 admissions that could be linked to the APACHE database: day 1 values for worst GCS, age, and whether or not the patient was ventilated. Using the latter two variables, we constructed multivariable logistic regression models of hospital mortality: one that included the FOUR score and another that contained the GCS. The two models were then compared for area under the receiver operating characteristic curve and the Hosmer-Lemeshow chi-square test.
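The two comparison measures named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' SAS analysis: the function names are hypothetical, the Hosmer-Lemeshow statistic is assumed to use 10 equal-size risk groups (hence 8 degrees of freedom), and the tail probability uses a closed form that is valid only for an even number of degrees of freedom.

```python
import math


def auc(labels, scores):
    """Area under the ROC curve: P(score of a death > score of a survivor).

    labels are 1 (died) or 0 (survived); tied scores count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(labels, probs, groups=10):
    """Hosmer-Lemeshow statistic over equal-size groups sorted by predicted risk."""
    ranked = sorted(zip(probs, labels))
    n = len(ranked)
    stat = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        variance = expected * (1.0 - expected / len(chunk))
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
```

Under the assumed 8 degrees of freedom, `chi2_sf_even_df(7.66, 8)` is approximately 0.47 and `chi2_sf_even_df(15.84, 8)` is approximately 0.04, which matches the p values reported in the Results for the two models.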

Table 1. Characteristics of the patients (n = 1014) receiving Full Outline of Unresponsiveness score assessments

Characteristic                                         Value
Gender, male                                           56.9%
Race, White                                            75.3%
Age, median (interquartile range)                      60 (48, 72)
Mechanically ventilated at some point during day 1     51.6%
Postoperative admission                                33.4%
Diagnostic group
  Primarily neurological                               25.3%
    Head trauma                                        6.0%
    Stroke                                             3.0%
    Central nervous system infection                   0.3%
    Other neurological                                 16.0%

RESULTS
Seven ICUs from five hospitals participated: two trauma units, two neurologic
units, one neurologic/trauma unit, one
general medical unit, and one general
surgical unit. The hospitals had the following characteristics: one was a nonteaching hospital, one was a hospital that
is a member of the Coalition of Teaching
Hospitals, and three were teaching hospitals that were not members of the Coalition of Teaching Hospitals. Hospital bed
size ranged from 288 to 900. There were
a total of 1,014 patients seen by the seven
ICUs. Of this number, three (< 1%) did
not have FOUR score assessments and a
further 105 (10.4%) were only assessed
by one individual. The number of missing
FOUR score evaluations was not consistent over sites. Two sites had no pairs with
missing data, two sites had approximately
18% missing data, and one site had 33%
missing data (48/138). The latter was due to the primary nurse being ill for part of the period of data collection.
Characteristics of the patients are given
in Table 1. About half of the patients were
on a ventilator during day 1 after admission, and one third of the admissions to
the ICU were postoperative. Approximately
25% of admissions were due to a diagnosis
categorized as primary neurological. The
neurologic diagnosis was further subdivided into traumatic head injury (6%),
stroke (3%), and other (16%).
Figure 1 gives the frequency distribution of the FOUR score levels for the first assessor. About half of the patients received the best score (i.e., 16), 30% had moderate scores (9–15), and 20% had poor scores (0–8). This distribution of FOUR score levels was almost identical to that recorded by the second assessor (Bowker's Qb = 55.2, df = 136, p > .90).
Table 2 lists the frequency distributions for each component of the FOUR score as recorded by the first assessor. The largest amount of variability was seen in the respiratory component, followed by the eye and motor components, and the least amount of variation was seen in the brainstem component. Weighted κ scores for each component ranged from 0.88 to 0.94.
For the FOUR score itself, the two independent assessors agreed exactly on 758 of 907 (83.6%) patients. Figure 2 shows a scatterplot of the FOUR score values recorded by the two independent assessors. Almost all values were either completely concordant between the two assessors or nearly so. Table 3 presents the weighted κ values and 95% confidence intervals for inter-rater agreement on the FOUR score: across all patients and by hospital. Across all patients, κw was 0.92 (95% confidence interval = 0.90, 0.93). Since two of the sites did not collect a sufficient number of patients so that all levels of the FOUR score had a frequency >0, their data were combined for purposes of examining interhospital variation. Although the heterogeneity of κw across hospitals was statistically significant (χ² for heterogeneity = 16.0, df = 3, p = .001), all hospitals showed excellent inter-rater agreement.
Table 4 gives the value for κw stratified by various factors that might impact the FOUR score: patients being on a mechanical ventilator on day 1, patients admitted postoperatively, and patients receiving active therapy on day 1. (To receive active therapy, a patient must have had at least one of 32 life-sustaining therapies, as described in [15]. These patients are more severely ill than patients only requiring extensive monitoring.) In each instance, there was no significant difference in κw between patients having a factor and those not having it.
The results of the multivariable logistic regression were quite close between the FOUR score and the GCS. Area under the receiver operating characteristic curve for the FOUR score was 0.77 vs. 0.80 for the GCS-included model (p > .05). The Hosmer-Lemeshow chi-square was 7.66 for the model that included the FOUR score (p = .47) and 15.84 for the GCS-included model (p = .04).

Figure 1. Frequency distribution of Full Outline of Unresponsiveness (FOUR) score values.

Table 2. Frequency distribution of the Full Outline of Unresponsiveness score components as assessed by the primary evaluator (n = 1011) and weighted κ (primary and secondary assessors, n = 906)

Component      0 (Worst)   1        2       3        4 (Best)   κw
Eye            17.4%       6.2%     7.7%    4.5%     64.2%      0.88
Motor          9.3%        1.1%     5.0%    19.6%    65.0%      0.91
Brainstem      2.4%        1.3%     3.9%    1.0%     91.4%      0.89
Respiratory    15.7%       25.8%    3.9%    0.1%     54.5%      0.94

Figure 2. Scatterplot of the Full Outline of Unresponsiveness score values recorded by two independent assessors. A bubble's size is proportional to the frequency of the recorded values at that point.

Table 3. Weighted κ values and 95% confidence intervals for inter-rater agreement on the Full Outline of Unresponsiveness score: across all patients and by hospital

Hospital     # Patients   Weighted κ   95% Confidence Interval
Overall      907          0.92         (0.90, 0.93)
#1           300          0.91         (0.89, 0.93)
#2           250          0.92         (0.89, 0.95)
#3           206          0.84         (0.79, 0.89)
#4 and #5    151          0.96         (0.93, 0.99)

χ² for heterogeneity = 16.0, degrees of freedom = 3, p = .001.

DISCUSSION
The GCS has fixed its position as the gold standard for rating levels of consciousness. Its shortcomings, albeit at the time specifically designed to avoid complexity, have been well recognized (16). One of these, the inability to assess ventilated patients, is particularly problematic when assessing critically ill patients. The FOUR score has been designed to overcome the limitations of the GCS while maintaining reliability and ease of measurement.
This study is the largest prospective investigation in the ICU to date. The results indicate that the FOUR score assessment is reliable, easy to obtain, and simple to implement at a site. The FOUR score can be obtained in a few minutes. Patterns of brainstem injury can be easily interpreted by nurses (6–8). Weighted κ values overall and at each site were excellent. Of importance is that each site was not aware of the identity of the other sites participating in the study. Thus, the similar results cannot be attributed to communication between sites. Within a site, raters were blinded to each other's scores. When broken down into its four components, inter-rater agreement was consistently high. The component with the largest amount of variation, respiratory level, also had the highest weighted κ score. This component is not included in the GCS and conveys important information.
Table 4. Weighted κ values stratified by factors that might influence the Full Outline of Unresponsiveness score (n = 840^a)

Factor                              Yes/No   # Patients   Weighted κ   95% Confidence Interval
Patient on mechanical ventilation   Yes      432          0.87         (0.84, 0.89)
                                    No       408          0.87         (0.81, 0.93)
Postoperative admission             Yes      273          0.89         (0.85, 0.92)
                                    No       567          0.92         (0.90, 0.94)
Receiving active therapy on day 1   Yes      558          0.89         (0.87, 0.91)
                                    No       282          0.88         (0.82, 0.95)

^a Clinical information was not available for 67 (7%) of 907 patients.
None of the factors had levels that were significantly different from each other (p > .05).

Data were not available to compare the predictive capability of the worst FOUR score vs. the worst GCS on day 1 after admission. However, we were able to compare multivariable logistic regression models that included the first (only) FOUR score measurement and the worst GCS, respectively. There was no statistical difference between the two models in discriminating hospital mortality. However, the GCS model had a statistically significant Hosmer-Lemeshow chi-square value, which indicates poor calibration and is notable because the sample comprised only 937 admissions. To further investigate the predictive abilities of these two scores, a larger cohort is needed that contains the worst GCS and the worst FOUR score measurement on day 1.
One of the advantages that the FOUR
score presents as opposed to the GCS
is that the former can be evaluated in
patients placed on a mechanical ventilator. The results from this study indicate
that FOUR score agreement for ventilated patients was as good as for nonventilated patients. Furthermore, admission
source (operative vs. medical) and need
for active therapy (i.e., life-sustaining
treatment) did not diminish the interrater agreement for the FOUR score. This
stability of measurement across patient
groups and, as noted above, across hospitals indicates that the FOUR score is a
robust level of consciousness metric for
patients in the ICU.
The amount of training required for
learning how to assess the FOUR score
was modest, requiring <1 hr for the clinicians to complete. This is important
as adoption of a new severity measure
depends to some extent on it having an
unimposing learning curve. Another factor in adoption of a new measure is the
amount of effort required to assess it.
The results from this study suggest that
assessing the FOUR score (as opposed to
the GCS) appears to impose no additional
burden. Of the 1,011 patients assessed, only 10.3% were missing an evaluation by the second assessor, with most of the missing data resulting from the illness of one of the data collectors.
Previous studies have found excellent κ values and have included neurointensivists, neurology residents, emergency physicians, medical intensivists, and ICU and emergency nurses. One study (2) specifically tested the influence of nurses' experience on FOUR score inter-rater
agreement and found none. Other studies have translated the FOUR score and
tested it successfully in large numbers
of patients with a variety of neurologic
disorders. All studies found the FOUR
score inter-rater agreement similar to
GCS or slightly better. A recent pooled
analysis in 381 prospectively studied
neurologic patients showed that both the
FOUR score and GCS could be useful in
predicting patient outcome. Severely ill
cases (i.e., patients with a FOUR score
of 3, patients with a GCS of 3) were
more likely to die than those with a score
of 4 (9).
Recently, the inter-rater agreement
was computed for both the FOUR score
and GCS in a medical ICU (17). In that
study, 267 patients had both measures
assessed. The results showed that the
FOUR score had better exact agreement than the GCS, and the two scores' agreement within 1 score point was roughly equivalent. The FOUR score performed as well as the GCS and APACHE II in predicting 1-month outcome.
Based on the FOUR score's reliability,
strong association with hospital mortality, and the capability of being measured
on ventilated patients, its use as a measure of level of consciousness in critically
ill patients should be seriously considered. Some potential benefits include 1)
better decision making, for example the
need for neurosurgical intervention, 2)
better outcome prediction in patients
affected the most, and 3) significantly
more information in clinical trials involving brain injury.

There are some limitations for this study. First, there were only five hospitals participating, and these were ascertained from clients using a clinical information support system. Thus the results found here cannot be extrapolated to the entire United States. Second, while reliability of a severity measure is a prerequisite for its use, the measure's validity for predicting outcomes is of prime importance. We plan on addressing this topic in a separate analysis. Finally, it is possible for the weighted κ to be large when true agreement is modest, due to a lack of dispersion among the possible values. However, the raw κ value for exact agreement here was 0.78, indicating that the FOUR score was reliably measured.

CONCLUSIONS
This is the first multi-ICU study in the
United States that evaluates the feasibility of the FOUR score in non-neurologic
and neurologic critically ill patients.
The FOUR score was found to be reliably assessed by two independent evaluators. This reliability varied only slightly
among sites and did not differ by patient
subgroup, including ventilator status.
These results, coupled with the relative
ease of assessing the FOUR score, warrant
considering that evaluation as a measure
for level of consciousness in critically ill
patients.

REFERENCES
1. Teasdale G, Jennett B: Assessment of coma and impaired consciousness. A practical scale. Lancet 1974; 2:81–84
2. Drake CG: Report of World Federation of Neurological Surgeons Committee on a Universal Subarachnoid Haemorrhage Grading Scale. J Neurosurg 1988; 68:985–986
3. Zimmerman JE, Kramer AA, McNair DS, et al: Acute Physiology and Chronic Health Evaluation (APACHE) IV: Hospital mortality assessment for today's critically ill patients. Crit Care Med 2006; 34:1297–1310
4. Higgins TL, Teres D, Copes WS, et al: Assessing contemporary intensive care unit outcome: An updated Mortality Probability Admission Model (MPM0-III). Crit Care Med 2007; 35:827–835
5. Wijdicks EF, Bamlet WR, Maramattom BV, et al: Validation of a new coma scale: The FOUR score. Ann Neurol 2005; 58:585–593
6. Wolf CA, Wijdicks EF, Bamlet WR, et al: Further validation of the FOUR score coma scale by intensive care nurses. Mayo Clin Proc 2007; 82:435–438
7. Stead LG, Wijdicks EF, Bhagra A, et al: Validation of a new coma scale, the FOUR score, in the emergency department. Neurocrit Care 2009; 10:50–54
8. Iyer VN, Mandrekar JN, Danielson RD, et al: Validity of the FOUR score coma scale in the medical intensive care unit. Mayo Clin Proc 2009; 84:694–701
9. Wijdicks EF, Rabinstein AA, Bamlet WR, et al: FOUR score and Glasgow Coma Scale in predicting outcome of comatose patients: A pooled analysis. Neurology 2011; 77:84–85
10. Marcati E, Ricci S, Casalena A, et al: Validation of the Italian version of a new coma scale: The FOUR score. Intern Emerg Med 2012; 7:145–152
11. Idrovo L, Fuentes B, Medina J, et al: Validation of the FOUR Score (Spanish Version) in acute stroke: An interobserver variability study. Eur Neurol 2010; 63:364–369
12. Akavipat P: Endorsement of the FOUR score for consciousness assessment in neurosurgical patients. Neurol Med Chir (Tokyo) 2009; 49:565–571
13. Bruno MA, Ledoux D, Lambermont B, et al: Comparison of the full outline of unresponsiveness and Glasgow Liege Scale/Glasgow Coma Scale in an intensive care population. Neurocrit Care 2011; 15:447–453
14. Cicchetti DV, Allison T: A new procedure for assessing reliability of scoring EEG sleep recordings. Am J of EEG Technology 1971; 11:101–109
15. Zimmerman JE, Wagner DP, Knaus WA, et al: The use of risk predictions to identify candidates for intermediate care units. Implications for intensive care utilization and cost. Chest 1995; 108:490–499
16. Kornbluth J, Bhardwaj A: Evaluation of coma: A critical appraisal of popular scoring systems. Neurocrit Care 2011; 14:134–143
17. Fischer M, Rüegg S, Czaplinski A, et al: Inter-rater reliability of the Full Outline of UnResponsiveness score and the Glasgow Coma Scale in critically ill patients: A prospective observational study. Crit Care 2010; 14:R64

APPENDIX 1. COMPONENTS OF THE FULL OUTLINE OF UNRESPONSIVENESS SCORE

Eye Response
Grade the best possible response after at least three trials in an attempt to elicit the best level of alertness. A score of eye response (E) 4 indicates at least three voluntary excursions. If eyes are closed, the examiner should open them and examine tracking of a finger or object. Tracking with the opening of one eyelid will suffice in cases of eyelid edema or facial trauma. If tracking is absent horizontally, examine vertical tracking. Alternatively, two blinks on command should be documented. This will recognize a locked-in syndrome (patient is fully aware). A score of E3 indicates the absence of voluntary tracking with open eyes. A score of E2 indicates eyelids opening to loud voice. A score of E1 indicates eyelids open to pain stimulus. A score of E0 indicates no eyelids opening to pain.

Motor Response
Grade the best possible response of the arms. A score of motor response (M) 4 indicates that the patient demonstrated at least one of three hand positions (thumbs-up, fist, or peace sign) with either hand. A score of M3 indicates that the patient touched the examiner's hand after a painful stimulus compressing the temporomandibular joint or supraorbital nerve (localization). A score of M2 indicates any flexion movement of the upper limbs. A score of M1 indicates extensor posturing. A score of M0 indicates no motor response or myoclonus status epilepticus.

Brainstem Reflexes
Grade the best possible response. Examine pupillary and corneal reflexes. Preferably, corneal reflexes are tested by instilling two to three drops of sterile saline on the cornea from a distance of 4–6 inches (this minimizes corneal trauma from repeated examinations). Cotton swabs can also be used. The cough reflex to tracheal suctioning is tested only when both of these reflexes are absent. A score of brainstem reflexes (B) 4 indicates pupil and cornea reflexes are present. A score of B3 indicates one pupil wide and fixed. A score of B2 indicates either pupil or cornea reflexes are absent, B1 indicates both pupil and cornea reflexes are absent, and a score of B0 indicates pupil, cornea, and cough reflex (using tracheal suctioning) are absent.

Respiration
Determine spontaneous breathing pattern in a nonintubated patient, and grade simply as regular respiration (R) 4, irregular R2, or Cheyne-Stokes R3 breathing. In mechanically ventilated patients, assess the pressure waveform of spontaneous respiratory pattern or the patient triggering of the ventilator (R1). The ventilator monitor displaying respiratory patterns is used to identify the patient-generated breaths on the ventilator. No adjustments are made to the ventilator while the patient is graded, but grading is done preferably with PaCO2 within normal limits. A standard apnea (oxygen-diffusion) test may be needed when the patient breathes at ventilator rate (R0).
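How the four component grades combine into the total can be sketched in Python. This is an illustration only: the class name `FourScore` and the method `band` are hypothetical, the total is assumed to be the simple sum of the four grades (each 0 to 4, giving the 17 possible values from 0 to 16 noted in the Methods), and the bands follow the grouping used in the Results (16 best, 9 to 15 moderate, 0 to 8 poor).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourScore:
    """Component grades of one assessment; each grade is an integer 0-4."""
    eye: int
    motor: int
    brainstem: int
    respiration: int

    def __post_init__(self):
        # Reject grades outside the 0-4 range described in the appendix.
        for name in ("eye", "motor", "brainstem", "respiration"):
            value = getattr(self, name)
            if not isinstance(value, int) or not 0 <= value <= 4:
                raise ValueError(f"{name} must be an integer from 0 to 4, got {value!r}")

    @property
    def total(self):
        """Total score, assumed here to be the sum of the four grades (0-16)."""
        return self.eye + self.motor + self.brainstem + self.respiration

    def band(self):
        """Grouping used in the Results: best (16), moderate (9-15), poor (0-8)."""
        if self.total == 16:
            return "best"
        return "moderate" if self.total >= 9 else "poor"
```

For example, a ventilated patient graded E4, M4, B4, R1 would have `FourScore(4, 4, 4, 1).total == 13`, which falls in the moderate band.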
