
Journal of Clinical Epidemiology 63 (2010) 1145–1155

Logistic regression had superior performance compared with regression trees for predicting in-hospital mortality in patients hospitalized with heart failure
Peter C. Austina,c,*, Jack V. Tua,b,c,d,e, Douglas S. Leea,e,f
a Institute for Clinical Evaluative Sciences, Toronto, Ontario, Canada
b Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
c Department of Health Management, Policy and Evaluation, University of Toronto, Toronto, Ontario, Canada
d Schulich Heart Centre, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
e Department of Medicine, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
f Division of Cardiology, University Health Network, Toronto, Ontario, Canada

Accepted 22 December 2009

Abstract

Objective: To compare the predictive accuracy of regression trees with that of logistic regression models for predicting in-hospital mortality in patients hospitalized with heart failure.

Study Design and Setting: Models were developed in 8,236 patients hospitalized with heart failure between April 1999 and March 2001. Models included the Enhanced Feedback for Effective Cardiac Treatment and Acute Decompensated Heart Failure National Registry (ADHERE) regression models and tree. Predictive accuracy was assessed using 7,608 patients hospitalized between April 2004 and March 2005.

Results: The area under the receiver operating characteristic curve for five different logistic regression models ranged from 0.747 to 0.775, whereas the corresponding values for three different regression trees ranged from 0.620 to 0.651. For the regression trees grown in 1,000 random samples drawn from the derivation sample, the number of terminal nodes ranged from 1 to 6, whereas the number of variables used in specific trees ranged from 0 to 5. Three different variables (blood urea nitrogen, dementia, and systolic blood pressure) were used for defining the first binary split when growing regression trees.

Conclusion: Logistic regression predicted in-hospital mortality in patients hospitalized with heart failure more accurately than did the regression trees. Regression trees grown in random samples from the same data set can differ substantially from one another. © 2010 Elsevier Inc. All rights reserved.
Keywords: Logistic regression; Regression trees; Classification trees; Predictive model; Validation; Recursive partitioning; Congestive heart failure

* Corresponding author. Institute for Clinical Evaluative Sciences, G1 06, 2075 Bayview Avenue, Toronto, Ontario M4N 3M5, Canada. Tel.: 416-480-6131; fax: 416-480-6048. E-mail address: peter.austin@ices.on.ca (P.C. Austin).
0895-4356/$ - see front matter © 2010 Elsevier Inc. All rights reserved. doi: 10.1016/j.jclinepi.2009.12.004

What is new?

Key finding
Logistic regression had superior performance compared with regression trees for predicting in-hospital mortality in patients hospitalized with heart failure.

What this adds to what was known?
Logistic regression has superior performance for predicting in-hospital mortality in patients hospitalized with heart failure because it can account for the underlying linear relationships between key continuous covariates and the log-odds of in-hospital mortality.

What is the implication and what should change now?
Logistic regression should be used for predicting patient-specific probabilities of in-hospital mortality in patients hospitalized with heart failure.

1. Introduction

There is an increasing interest in using classification and regression trees to predict the probability of adverse events in patients receiving medical or surgical treatment. Accurately predicting the probability of adverse events allows for effective patient risk stratification, thus permitting more appropriate medical care to be delivered to patients [1-5].

Classification and regression trees use binary recursive partitioning methods to partition the sample into distinct subsets [6]. Within each subset, the predicted probability of the event can be estimated. At the first step, all possible dichotomizations of all continuous variables (above vs. below a given threshold) and of all categorical variables are considered. Using each possible dichotomization, all possible ways of partitioning the sample into two distinct subsets are considered. The binary partition that results in the greatest reduction in impurity is selected. Each of the two resultant subsets is then partitioned recursively until predefined stopping rules are met.

Although logistic regression is the most commonly used method for predicting the probability of an adverse outcome in the medical literature, methods such as classification and regression trees are increasingly being used to identify subjects at increased risk of adverse outcomes. Advocates for classification and regression trees have suggested that these methods allow for the construction of easily interpretable decision rules that can be readily applied in clinical practice. Furthermore, classification and regression tree methods are adept at identifying important interactions in the data [7-9] and at identifying clinical subgroups of subjects at very high or very low risk of adverse outcomes [10]. An advantage of tree-based methods is that they do not require one to parametrically specify the nature of the relationship between the predictor variables and the outcome. Additionally, assumptions of linearity that are frequently made in conventional regression models are not required for tree-based regression methods.

Classification and regression trees are data-driven methods of analysis: the data dictate both the variables that are used in the resultant tree and the values at which splits are made on those variables. This is in contrast to a classical regression model, in which the analyst dictates both the variables that are entered into the model and how those variables are treated. In a classical regression model, the data are used only to estimate the regression coefficients for the prespecified regression model. Automated variable-selection methods, such as backward variable elimination or forward variable selection, are examples of data-driven methods of analysis that are familiar to many medical researchers. Earlier studies have shown that automated variable-selection methods result in nonreproducible models and have a low probability of correctly identifying the true predictors of an outcome [11-14]. The reproducibility and stability of regression trees have not been well examined.

Both logistic regression models and regression trees have been developed for predicting mortality in heart failure patients. Fonarow et al. [2] derived a regression tree using data from the ADHERE Registry for predicting the probability of in-hospital mortality in patients hospitalized with acutely decompensated heart failure. The regression tree used binary splits on blood urea nitrogen (BUN),

systolic blood pressure, and serum creatinine to partition the sample into five distinct subgroups, each with its own predicted probability of in-hospital mortality [2]. Predictive accuracy, as measured using the area under the receiver operating characteristic (ROC) curve, was 0.668 in an independent validation sample. Lee et al. derived a logistic regression model (the Enhanced Feedback for Effective Cardiac Treatment in Heart Failure [EFFECT-HF] mortality prediction model) for predicting the probability of death within 30 days and 1 year of admission in patients hospitalized with heart failure [1]. For 30-day mortality, this model had an area under the ROC curve of 0.79 in an independent validation sample. The ability of the EFFECT-HF model to predict in-hospital mortality has not been fully reported.

The current study had three objectives: first, to compare the predictive accuracy of regression trees with that of logistic regression models for predicting in-hospital mortality in a sample of patients hospitalized with acute decompensated heart failure; second, to examine the stability or reproducibility of regression trees derived for predicting in-hospital mortality in this sample of patients; and third, to explore the nature of the relationship between several important clinical variables and the likelihood of in-hospital mortality after hospitalization with acute decompensated heart failure.

2. Methods

2.1. Data sources

The EFFECT Study is an initiative intended to improve the quality of care of patients with cardiovascular disease in Ontario, Canada [15,16]. The EFFECT Study consisted of two phases. During the first phase, detailed clinical data on patients hospitalized with heart failure between April 1, 1999 and March 31, 2001 at 103 acute care hospitals in Ontario, Canada, were obtained by retrospective chart review.
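The structure of a tree like the ADHERE tree is easy to state as a nested decision rule: three binary splits define five terminal nodes, each carrying its own mortality estimate. The sketch below (in Python; the study itself used R) illustrates this shape only; the thresholds and probabilities are placeholders invented for illustration, not the published ADHERE values.

```python
# Illustrative sketch of a regression tree with three binary splits and five
# terminal nodes. All cut-points and mortality rates below are PLACEHOLDERS,
# not the published ADHERE tree values.
def tree_predict(bun, sbp, creatinine):
    """Return a predicted in-hospital mortality probability for one patient."""
    if bun < 15.0:                 # first split: blood urea nitrogen
        # lower-risk branch, subdivided by systolic blood pressure
        return 0.02 if sbp >= 115.0 else 0.06
    # higher-risk branch: elevated BUN
    if sbp >= 115.0:
        return 0.07
    # lowest blood pressure and elevated BUN: split on serum creatinine
    return 0.13 if creatinine < 200.0 else 0.22
```

Every patient falls into exactly one leaf, so the tree is equivalent to a five-category risk classification.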
During the second phase, data were abstracted on patients hospitalized with heart failure between April 1, 2004 and March 31, 2005 at 96 Ontario hospitals. Data on patient demographics, vital signs and physical examination at presentation, medical history, and results of laboratory tests were collected for this sample. Subjects with missing data on key continuous baseline covariates were excluded from the current study.

In the EFFECT Study, detailed clinical data were available on 9,945 and 8,339 patients hospitalized with a diagnosis of heart failure during the first and second phases of the study, respectively. After excluding subjects with missing data on key variables, 8,240 and 7,609 subjects were available from the first and second phases, respectively, for inclusion in the current study. The first phase of the EFFECT Study (hereafter referred to as EFFECT baseline) was used as the derivation sample for the current study. The second phase of the EFFECT


Study (hereafter referred to as EFFECT follow-up) was used as the validation sample in which the predictive accuracy of the different statistical methods was assessed.

2.2. Comparing regression trees with logistic regression models for predicting in-hospital mortality in patients hospitalized with heart failure

We compared the predictive accuracy of logistic regression models with that of regression trees for predicting in-hospital mortality. We considered two previously derived logistic regression models and one previously derived regression tree. We also developed three new logistic regression models and one new regression tree for predicting in-hospital mortality.

2.2.1. Logistic regression models for predicting in-hospital mortality

2.2.1.1. Enhanced Feedback for Effective Cardiac Treatment in Heart Failure mortality prediction model. The EFFECT-HF mortality prediction model is a model for predicting the probability of 30-day mortality in patients hospitalized with congestive heart failure. This prediction model uses a logistic regression model with the following predictor variables: age (continuous), systolic blood pressure (continuous), respiratory rate (continuous), serum concentration of sodium (<136 vs. ≥136 mEq/L), serum concentration of BUN (continuous), history of cerebrovascular disease, history of dementia, history of chronic obstructive pulmonary disease, history of hepatic cirrhosis, and history of cancer. The EFFECT-HF model was initially developed in a subset of the EFFECT baseline sample. Further details of its derivation and validation are provided elsewhere [1]. Although this model was developed for predicting mortality within 30 days of admission, it was used in the current study for predicting in-hospital mortality. For this study, the regression coefficients for the EFFECT-HF model were estimated in the EFFECT baseline sample.
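Once such a model's coefficients are fixed, producing a patient-specific probability only requires evaluating the linear predictor and applying the inverse logit. A minimal Python sketch follows (the authors used R); the coefficient values and covariate names are invented for illustration and are not the published EFFECT-HF estimates, which are given in [1].

```python
import math

# Hypothetical coefficients for an EFFECT-HF-style logistic model.
# These numbers are illustrative only, NOT the published estimates.
COEF = {
    "intercept": -5.0,
    "age": 0.04,            # per year of age
    "sbp": -0.01,           # per mm Hg (lower pressure -> higher risk)
    "resp_rate": 0.02,      # per breath per minute
    "bun": 0.05,            # per mmol/L
    "sodium_lt_136": 0.30,  # indicator: serum sodium < 136 mEq/L
    "dementia": 0.60,       # indicator variables for comorbidities
}

def predict_mortality(patient):
    """patient: dict of covariate values (indicator variables are 0/1).

    Missing keys default to 0, so only nonzero covariates need be supplied.
    """
    lp = COEF["intercept"] + sum(COEF[k] * patient.get(k, 0.0)
                                 for k in COEF if k != "intercept")
    return 1.0 / (1.0 + math.exp(-lp))  # inverse logit of linear predictor
```

In contrast to a tree, which assigns one of a handful of node-level probabilities, this model yields a smoothly varying probability for each covariate pattern.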
Predictions of the probability of in-hospital mortality were then obtained for each subject in the EFFECT follow-up sample using the model coefficients estimated in the EFFECT baseline sample.

In addition, we considered two modifications of the original EFFECT-HF mortality model. The first modification was to fit the EFFECT-HF mortality prediction model with the inclusion of all possible two-way interactions between main effects. The second modification was to use a generalized additive model (GAM) based on the EFFECT-HF model. A GAM is an additive regression model in which continuous predictor variables can be modeled using nonparametric scatterplot smoothers or regression splines [17]. This allows one to relax the linearity assumption between the predictor variable and the outcome. Furthermore, one does not have to specify the nature of the relationship but can allow the data to dictate it. We fit a GAM in which the four continuous variables (age, systolic blood pressure, respiratory rate, and BUN) that were modeled as linear effects in the EFFECT-HF mortality prediction model were modeled using smoothing splines, each with five degrees of freedom. The remaining dichotomous variables were treated as such.

2.2.1.2. Logistic regression models developed using backward variable elimination. We developed a logistic regression model to predict in-hospital mortality using backward variable elimination in the EFFECT baseline sample. The initial model contained the 28 predictor variables listed in Table 1. Variables were sequentially eliminated until all variables retained in the final model were significant at P < 0.05. The predictive accuracy of the resultant model was assessed in the EFFECT follow-up validation sample.

2.2.1.3. ADHERE logistic regression model. Using the ADHERE Registry, Fonarow et al. developed a logistic regression model to predict the probability of in-hospital mortality [2]. This logistic regression model used the following four predictor variables: BUN, systolic blood pressure, heart rate, and age. Each of these four variables was assumed to have a linear relationship with the log odds of in-hospital mortality. We first used the ADHERE logistic regression model with the coefficients derived in the ADHERE Registry [2]. We then recalibrated the model by estimating the regression coefficients for the ADHERE logistic regression model in the EFFECT baseline derivation sample. Predictions of the probability of in-hospital mortality were obtained for each subject in the EFFECT follow-up validation sample.

2.2.2. Regression trees for predicting in-hospital mortality

We considered three different regression trees for predicting in-hospital mortality. Predicted probabilities of in-hospital mortality were obtained for each subject in the EFFECT follow-up validation sample.

2.2.2.1. ADHERE regression tree. The first regression tree was the one previously developed in the ADHERE Registry for predicting in-hospital mortality in patients hospitalized with acutely decompensated heart failure [2]. This regression tree, whose derivation is described in greater detail elsewhere, uses BUN, systolic blood pressure, and serum creatinine to predict in-hospital mortality [2]. The regression tree has five terminal nodes or leaves, with predicted probabilities of in-hospital mortality ranging from 2.14% to 21.94%. This tree used the predicted probabilities of mortality obtained by Fonarow et al. in their derivation sample of 32,046 hospitalization episodes. For this previously derived regression tree, predictions were obtained directly for each subject in the EFFECT follow-up validation sample.

2.2.2.2. Recalibrated Fonarow regression tree. The second regression tree was the ADHERE regression tree


Table 1
Demographic and clinical characteristics of the 8,236 heart failure patients in the EFFECT baseline derivation study sample

Variable | EFFECT baseline sample (N = 8,236) | In-hospital death: No (N = 7,613) | In-hospital death: Yes (N = 623) | P-value
Demographic characteristics
  Age, years | 77 (70-84) | 77 (69-83) | 82 (76-88) | <0.001
  Female | 4,154 (50.4%) | 3,820 (50.2%) | 334 (53.6%) | 0.099
Vital signs on admission
  Systolic blood pressure, mm Hg | 146 (126-170) | 148 (128-171) | 130 (110-150) | <0.001
  Heart rate, beats per minute | 92 (76-110) | 92 (76-110) | 92 (78-110) | 0.664
  Respiratory rate, breaths per minute | 24 (20-30) | 24 (20-30) | 26 (20-32) | <0.001
Presenting signs and physical examination
  Neck vein distension | 4,516 (54.8%) | 4,202 (55.2%) | 314 (50.4%) | 0.021
  S3 | 785 (9.5%) | 750 (9.9%) | 35 (5.6%) | <0.001
  S4 | 302 (3.7%) | 293 (3.8%) | 9 (1.4%) | 0.002
  Rales >50% of lung field | 902 (11.0%) | 791 (10.4%) | 111 (17.8%) | <0.001
Findings on chest X-ray
  Pulmonary edema | 4,215 (51.2%) | 3,909 (51.3%) | 306 (49.1%) | 0.285
  Cardiomegaly | 2,944 (35.7%) | 2,737 (36.0%) | 207 (33.2%) | 0.172
Past medical history
  Diabetes | 2,871 (34.9%) | 2,675 (35.1%) | 196 (31.5%) | 0.064
  Cerebrovascular accident (CVA)/transient ischemic attack (TIA) | 1,372 (16.7%) | 1,220 (16.0%) | 152 (24.4%) | <0.001
  Previous MI | 3,021 (36.7%) | 2,804 (36.8%) | 217 (34.8%) | 0.319
  Atrial fibrillation | 2,402 (29.2%) | 2,205 (29.0%) | 197 (31.6%) | 0.161
  Peripheral vascular disease | 1,082 (13.1%) | 986 (13.0%) | 96 (15.4%) | 0.081
  Chronic obstructive pulmonary disease | 1,404 (17.0%) | 1,265 (16.6%) | 139 (22.3%) | <0.001
  Dementia | 642 (7.8%) | 513 (6.7%) | 129 (20.7%) | <0.001
  Cirrhosis | 63 (0.8%) | 54 (0.7%) | 9 (1.4%) | 0.043
  Cancer | 948 (11.5%) | 854 (11.2%) | 94 (15.1%) | 0.004
Electrocardiogram (first available within 48 hr)
  Left bundle branch block | 1,232 (15.0%) | 1,127 (14.8%) | 105 (16.9%) | 0.168
Laboratory tests
  Hemoglobin, g/L | 124 (110-138) | 125 (110-138) | 121 (105-136) | <0.001
  White blood count, x10^9/L | 9 (7-12) | 9 (7-12) | 10 (8-13) | <0.001
  Sodium, mmol/L | 139 (136-141) | 139 (136-141) | 138 (134-141) | <0.001
  Potassium, mmol/L | 4 (4-5) | 4 (4-5) | 4 (4-5) | <0.001
  Glucose, mmol/L | 8 (6-11) | 8 (6-11) | 8 (6-11) | 0.007
  BUN, mmol/L | 8 (6-12) | 8 (6-12) | 12 (9-18) | <0.001
  Creatinine, µmol/L | 106 (83-145) | 105 (83-142) | 129 (95-185) | <0.001

Continuous variables are reported as medians (25th-75th percentiles) and dichotomous variables are reported as N (%).
Abbreviations: EFFECT, Enhanced Feedback for Effective Cardiac Treatment; BUN, blood urea nitrogen.

recalibrated to our EFFECT baseline derivation sample. Using this approach, the predicted probability of in-hospital mortality for each terminal node of the ADHERE regression tree was replaced by the estimated probability of in-hospital mortality for subjects in the EFFECT baseline sample who lay within that terminal node. Predicted probabilities of in-hospital mortality were then obtained for each subject in the EFFECT follow-up validation sample.

2.2.2.3. Regression tree derived using Enhanced Feedback for Effective Cardiac Treatment baseline data. The third regression tree was derived in the EFFECT baseline derivation sample. To do so, the EFFECT baseline derivation sample was randomly divided into two components: EFFECT baseline (A) and EFFECT baseline (B). EFFECT baseline (A) contained two-thirds of the subjects in the EFFECT baseline sample, whereas EFFECT baseline (B) contained the remaining one-third of the subjects. An initial regression tree was grown in EFFECT baseline (A) using all 28 candidate predictor variables listed in Table 1. Binary recursive partitioning was used to grow the regression tree. Criteria for growing the tree included the following: at a given node, the partition was chosen that maximized the reduction in deviance; the smallest permitted node size was 10; and a node was not subsequently partitioned if its within-node deviance was less than 0.01 of that of the root node. Once the initial regression tree had been grown, the tree was pruned. The optimal number of leaves was determined by identifying the tree size that minimized the tree deviance when the EFFECT baseline (B) sample was used as a validation sample [18]. The initial regression tree was then pruned so as to produce a final tree of the desired size. The resultant regression tree was then used to predict the


probability of in-hospital mortality for each subject in the EFFECT follow-up validation sample.

2.2.3. Measuring predictive accuracy

To determine the predictive accuracy of each method, predicted probabilities of in-hospital mortality were obtained for subjects in the EFFECT follow-up validation sample using each of the different statistical methods. The predictive accuracy of each method was summarized by the area under the ROC curve [19]. This is equivalent to the c-statistic [19]. It has been suggested that the predictive ability of models be quantified using the ROC curve area [19]. Harrell has suggested that the predictive ability of models can also be quantified using the generalized R^2_N index of Nagelkerke [20] and Cragg and Uhler [21] and by Brier's score [19]. Brier's score is defined as follows:

B = (1/n) Σ_{i=1}^{n} (P̂_i − Y_i)²

where P̂_i is the predicted probability and Y_i is the observed response for the ith subject. Although these indices are less commonly used than the area under the ROC curve, we include them in this study for comparative purposes. Accordingly, we computed the generalized R^2_N index and Brier's score in the validation sample.

A limitation of the area under the ROC curve is that it does not take into account the magnitude of the disagreement between observed and predicted responses. The area under the ROC curve is equivalent to the proportion of all pairs, consisting of one subject who experienced the outcome and one subject who did not, in which the subject who experienced the outcome had a higher predicted probability of experiencing the outcome than did the subject who did not. As such, the magnitude of the disagreement is not taken into account. Indices such as Brier's score take into account the magnitude of the difference between observed and predicted responses.

2.3. Examining the stability of data-driven methods of analysis

In Section 2.2, we described two different data-driven methods of deriving a predictive model: backward variable elimination and regression trees. In this section, we examine the stability of models derived using these approaches.

2.3.1. The stability of regression trees for predicting heart failure in-hospital mortality

We randomly split the EFFECT baseline derivation sample into two components. The first component consisted of two-thirds of the subjects, whereas the second component consisted of the remaining one-third. Using methods similar to those described in Section 2.2.2.3, a regression tree was grown in the first component and pruned so as to minimize deviance in the second component. The predictive accuracy of the resultant regression tree was determined using the EFFECT follow-up validation sample. We noted the following characteristics of the regression tree grown in the random derivation sample drawn from the EFFECT baseline sample: the number of leaves or terminal nodes; the number of variables used in growing the tree; the first variable on which a binary split was made in the resultant tree; and the value at which a binary split was made on this first variable. This process was then repeated 1,000 times, and the results were summarized across the 1,000 randomly drawn derivation components.

2.3.2. Stability of logistic regression models developed using backward variable elimination

We used bootstrap methods to assess the stability of logistic regression models developed using backward variable elimination. We drew 1,000 bootstrap samples from the EFFECT baseline derivation sample. Within each bootstrap sample, we used backward variable elimination to develop a parsimonious model for predicting in-hospital mortality. We noted the variables that had been selected for inclusion in the final model. We also assessed the predictive accuracy of the final model in the EFFECT follow-up validation sample. Results were averaged across the 1,000 bootstrap samples.

2.4. Characterizing the relationship between important continuous variables and in-hospital heart failure mortality

We used GAMs to describe the relationship between continuous predictor variables and the log odds of in-hospital mortality. One thousand bootstrap samples were drawn from the EFFECT baseline derivation sample, and the GAM described in Section 2.2.1.1 was fitted in each of these 1,000 bootstrap samples. For each value of the four continuous predictor variables, we determined the predicted log odds of in-hospital mortality, holding the other continuous variables fixed at the EFFECT baseline sample median and the dichotomous predictor variables set to the EFFECT baseline sample mode. Allowing each of the four continuous variables to increase incrementally across its observed range in the EFFECT baseline sample, we then computed the mean log odds of in-hospital mortality and the 2.5th and 97.5th percentiles of the log odds of in-hospital mortality across the 1,000 models fit to the bootstrap samples. This approach to determining the nature of the relationship between continuous covariates and binary outcomes has been described elsewhere [22]. All model fitting and model validation were done using the R statistical programming language (Vienna, Austria) [23].
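The two summary measures described in Section 2.2.3 are straightforward to compute directly. Below is a Python sketch (the study itself used R): the c-statistic as the proportion of concordant event/non-event pairs, and Brier's score as the mean squared difference between predicted probabilities and observed outcomes.

```python
# Validation metrics from Section 2.2.3, computed from first principles.
def c_statistic(y, p):
    """c-statistic / area under the ROC curve.

    y: list of 0/1 outcomes; p: list of predicted probabilities.
    Counts, over all (event, non-event) pairs, how often the event subject
    received the higher prediction; ties count as one half.
    """
    pairs = concordant = ties = 0
    for i, yi in enumerate(y):
        if yi != 1:
            continue
        for j, yj in enumerate(y):
            if yj != 0:
                continue
            pairs += 1
            if p[i] > p[j]:
                concordant += 1
            elif p[i] == p[j]:
                ties += 1
    return (concordant + 0.5 * ties) / pairs

def brier_score(y, p):
    """Brier's score: B = (1/n) * sum_i (p_i - y_i)^2."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)
```

Note that `c_statistic` depends only on the ranking of the predictions, which is why, as the text observes, it ignores the magnitude of the disagreement that `brier_score` captures.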

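The deviance-based split search used to grow the trees (Section 2.2.2.3) can be sketched as follows for a binary outcome: each candidate threshold on a continuous predictor is scored by how much it reduces the binomial deviance, subject to a minimum node size. This is an illustrative single-step sketch in Python (the study used R's tree-growing routines), not the full recursive grower and pruner.

```python
import math

def node_deviance(outcomes):
    """Binomial deviance of a node containing 0/1 outcomes."""
    n, k = len(outcomes), sum(outcomes)
    if k == 0 or k == n:
        return 0.0  # a pure node contributes no deviance
    p = k / n
    return -2.0 * (k * math.log(p) + (n - k) * math.log(1 - p))

def best_split(values, outcomes, min_node_size=10):
    """Find the threshold on one continuous variable that maximizes the
    reduction in deviance, honouring the minimum node size from Section
    2.2.2.3. Returns (deviance_reduction, threshold), or None if no
    admissible split exists. A full tree grower repeats this over all
    candidate variables and recurses on each child node.
    """
    parent = node_deviance(outcomes)
    best = None
    for cut in sorted(set(values)):
        left = [y for x, y in zip(values, outcomes) if x <= cut]
        right = [y for x, y in zip(values, outcomes) if x > cut]
        if len(left) < min_node_size or len(right) < min_node_size:
            continue
        gain = parent - node_deviance(left) - node_deviance(right)
        if best is None or gain > best[0]:
            best = (gain, cut)
    return best
```

Because the chosen variable and threshold are whatever happens to maximize this criterion in the sample at hand, small perturbations of the data can change them, which is the instability the study sets out to quantify.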

Table 2
Predictive accuracy of the different models in the EFFECT follow-up validation sample

Model | ROC curve area | Generalized R^2_N index | Brier score
Logistic regression models
  EFFECT-HF mortality model | 0.772 | 0.154 | 0.062
  EFFECT-HF model with two-way interactions | 0.765 | 0.141 | 0.063
  Generalized additive model | 0.773 | 0.154 | 0.062
  Logistic regression (backward variable elimination) | 0.775 | 0.161 | 0.061
  Logistic regression (ADHERE model, original coefficients) | 0.747 | 0.136 | 0.063
  Logistic regression (ADHERE model, recalibrated) | 0.751 | 0.142 | 0.061
Regression trees
  Regression tree (ADHERE tree) | 0.651 | 0.065 | 0.066
  Regression tree (recalibrated ADHERE tree) | 0.620 | 0.043 | 0.059
  Regression tree (grown to sample) | 0.633 | 0.055 | 0.064

Abbreviations: ROC, receiver operating characteristic; EFFECT-HF, Enhanced Feedback for Effective Cardiac Treatment in Heart Failure.

3. Results

The demographic and clinical characteristics of patients in the EFFECT baseline derivation sample are described in Table 1. Prevalences of dichotomous variables and medians and 25th and 75th percentiles of continuous variables considered in the current study are reported for the entire EFFECT baseline sample and separately for those who died before hospital discharge and those who survived to hospital discharge. The median patient age was 77 years (interquartile range: 70-84), and 50.4% of the sample members were female. Overall, 623 (7.6%) patients died before hospital discharge. The Kruskal-Wallis test and the chi-squared test were used to compare continuous and categorical characteristics, respectively, between patients who died before hospital discharge and those who survived to hospital discharge. There were statistically significant differences in several baseline characteristics between those who died before hospital discharge and those who were discharged alive from hospital.

3.1. Comparison of the predictive accuracy of logistic regression models with that of regression trees for predicting in-hospital mortality

The area under the ROC curve for each model in the EFFECT follow-up validation sample is reported in Table 2. In the EFFECT follow-up validation sample, the ROC curve area for the ADHERE regression tree was 0.650, whereas it was 0.620 for the recalibrated ADHERE regression tree. The regression tree grown in the EFFECT baseline derivation sample is described in Fig. 1. It had an ROC curve area of 0.633 when applied to the EFFECT follow-up validation sample. In contrast, the ROC curve area for the EFFECT-HF mortality prediction model was 0.772. Using a GAM resulted in a negligible increase of the ROC curve area, to 0.773. When the EFFECT-HF mortality prediction model was modified by including all two-way interactions between the variables in the model, the ROC curve area of this modified EFFECT-HF model was 0.765. The recalibrated ADHERE logistic regression model had an ROC curve area of 0.751 in the EFFECT follow-up validation sample, whereas the model with the original regression coefficients had an ROC curve area of 0.747.

Fig. 1. Regression tree grown in Enhanced Feedback for Effective Cardiac Treatment baseline sample.

The generalized R^2_N index and Brier's score in the validation sample are also reported in Table 2 for each of the predictive models. The ADHERE regression tree, the recalibrated ADHERE regression tree, and the newly developed regression tree had the lowest generalized R^2_N indices in the EFFECT follow-up validation sample. The generalized R^2_N index for the EFFECT-HF mortality prediction model was 0.154, whereas it was 0.065 for the ADHERE regression tree and 0.043 for the recalibrated ADHERE regression tree. The generalized R^2_N index of the ADHERE logistic regression model was 0.142 in the validation sample. When assessed using Brier's score, the recalibrated ADHERE regression tree had marginally better performance compared with the EFFECT-HF mortality prediction model and the GAM. However, the original ADHERE regression tree and the regression tree grown to the study sample had a greater prediction error compared with the other methods.

3.2. Reproducibility of data-driven methods of analysis

3.2.1. The reproducibility of regression trees for predicting in-hospital mortality

There was substantial heterogeneity in the regression trees that were grown in the 1,000 derivation samples


randomly drawn from the EFFECT baseline derivation sample. The number of terminal nodes or leaves ranged from a minimum of 1 to a maximum of 6 (a tree with only one terminal node implies that the regression tree had no binary splits and that all subjects are in the same terminal node, which is the root of the tree). The percentages of trees that had 1, 2, 3, 4, 5, and 6 terminal nodes were 0.1%, 8.0%, 49.2%, 32.2%, 8.9%, and 1.6%, respectively. The number of variables used in constructing the regression trees ranged from a minimum of 0 to a maximum of 5. The percentages of trees that used 0, 1, 2, 3, 4, or 5 variables were 0.1%, 8.0%, 50.4%, 31.7%, 8.4%, and 1.4%, respectively. In 96.9% of the regression trees, the first variable used to determine a binary split was BUN; in 0.5% of the trees, it was dementia; in 2.5% of the trees, it was systolic blood pressure; whereas in 0.1% of trees, no binary splits were made. In those 969 regression trees in which the first split was on BUN, the value at which the split was made ranged from a minimum of 8.75 to a maximum of 17.05 (25th percentile and median: 12.15; 75th percentile: 12.15). In those 25 regression trees in which the first split was on systolic blood pressure, the value at which the split was made ranged from a minimum of 120.5 to a maximum of 121.5 (25th percentile: 121.5; median and 75th percentile: 121.5). The variable age was used in 17.8% of the regression trees grown on the 1,000 random samples drawn from the EFFECT baseline derivation sample. Other variables that were used in at least one of the regression trees were as follows: systolic blood pressure (in 80.9% of regression trees), rales (2.9%), dementia (18.7%), hemoglobin (0.4%), white blood count (1.3%), potassium (22.5%), BUN (99.6%), and sodium (0.4%). Thus, although both systolic blood pressure and BUN were used in most derived regression trees, no variable was used in all derived regression trees.
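The bookkeeping behind these percentages (the procedure of Section 2.3.1) amounts to a simple resampling loop: repeatedly draw a random two-thirds derivation sample, grow a tree, and tally which variable defines the first split. The Python sketch below mocks the tree grower (`grow_tree` is a stand-in that simply tends to pick BUN, roughly mimicking the observed frequencies) so that only the tallying logic is shown; a real run would plug in the deviance-based grower and pruner.

```python
import random

def grow_tree(sample):
    """Placeholder for the real tree-growing routine. A real implementation
    would fit a pruned regression tree to `sample` and report its first
    split; here the choice is mocked for illustration."""
    return {"first_split_var": random.choice(["BUN"] * 97 + ["SBP"] * 3)}

def stability_summary(data, n_reps=1000, seed=1):
    """Tally, over repeated two-thirds subsamples, how often each variable
    is chosen for the first binary split. Returns variable -> proportion."""
    random.seed(seed)
    counts = {}
    for _ in range(n_reps):
        # draw a random two-thirds derivation sample without replacement
        derivation = random.sample(data, k=2 * len(data) // 3)
        var = grow_tree(derivation)["first_split_var"]
        counts[var] = counts.get(var, 0) + 1
    return {v: c / n_reps for v, c in counts.items()}
```

The same loop, with the split threshold and leaf count recorded alongside the split variable, yields all of the summaries reported in this subsection.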
Four of the derived regression trees are illustrated in Fig. 2. The regression tree depicted in Fig. 2a had two terminal nodes and classified patients into low-risk patients (BUN < 12.15 mmol/L, with a predicted in-hospital mortality rate of 4.6%) and high-risk patients (BUN ≥ 12.15 mmol/L, with a predicted in-hospital mortality rate of 16.0%). The regression tree depicted in Fig. 2b has three terminal nodes and uses two variables. In this tree, the first split is on BUN (BUN < 12.15 mmol/L vs. BUN ≥ 12.15 mmol/L). Predicted probabilities of mortality for the three subgroups of patients were 5.0%, 10.2%, and 24.7%. The two regression trees depicted in Fig. 2c and d had five and six terminal nodes, respectively. In the latter regression tree, predicted probabilities of in-hospital mortality ranged from a minimum of 4.2% to a maximum of 79.0%. The 1,000 regression trees derived in the random samples drawn from the EFFECT baseline derivation sample were used to obtain predictions of in-hospital mortality in the EFFECT follow-up validation sample. The mean ROC curve area in the EFFECT follow-up validation sample was

0.637. The ROC curve areas ranged from 0.5 to 0.676. The 25th and 75th percentiles were 0.633 and 0.637, respectively. The mean generalized R2 N index across the 1,000 regression trees was 0.056. The generalized R2 N indices ranged from 0 to 0.080. The 25th and 75th percentiles were 0.052 and 0.058, respectively. The mean Brier score across the 1,000 regression trees was 0.065. The Brier scores ranged from 0.064 to 0.067. The 25th and 75th percentiles were 0.064 and 0.065, respectively. Thus, even the derived regression tree with the greatest predictive accuracy in the EFFECT follow-up validation sample had less predictive accuracy than the EFFECT-HF mortality prediction model. 3.2.2. Backward variable elimination for developing predictive models The 1,000 logistic regression models derived using backward variable elimination in bootstrap samples drawn from the EFFECT baseline derivation sample were used to obtain predictions of in-hospital mortality in the EFFECT follow-up validation sample. The mean ROC curve area in the EFFECT follow-up validation sample was 0.772. The ROC curve areas ranged from 0.756 to 0.783. The 25th and 75th percentiles were 0.769 and 0.774, respectively. The mean generalized R2 N index across the 1,000 logistic regression models was 0.159. The generalized R2 N indices ranged from 0.143 to 0.175. The 25th and 75th percentiles were 0.155 and 0.162, respectively. The mean Brier score across the 1,000 logistic regression models was 0.062. The Brier scores ranged from 0.061 to 0.064. The 25th and 75th percentiles were 0.061 and 0.062, respectively. Thus, even the derived logistic regression model with the lowest ROC curve area in the EFFECT follow-up validation sample had an ROC curve area which exceeded that of the three different regression tree methods. 
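For reference, the three performance measures used throughout this section can each be computed directly from predicted probabilities and observed outcomes. A minimal stdlib-Python sketch (the six-patient data set is invented purely to exercise the functions):

```python
import math

def roc_auc(probs, ys):
    """Area under the ROC curve via the Mann-Whitney formulation: the
    probability that a randomly chosen death received a higher predicted
    risk than a randomly chosen survivor (ties count 1/2)."""
    pos = [p for p, y in zip(probs, ys) if y == 1]
    neg = [p for p, y in zip(probs, ys) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier(probs, ys):
    """Brier score: mean squared difference between the predicted
    probability and the observed 0/1 outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, ys)) / len(ys)

def nagelkerke_r2(probs, ys):
    """Generalized R^2 of Cragg-Uhler/Nagelkerke: the likelihood-ratio
    R^2 rescaled so that its maximum attainable value is 1."""
    n = len(ys)
    pbar = sum(ys) / n
    ll_null = sum(y * math.log(pbar) + (1 - y) * math.log(1 - pbar)
                  for y in ys)
    ll_model = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                   for p, y in zip(probs, ys))
    r2_cox = 1 - math.exp(2 * (ll_null - ll_model) / n)
    return r2_cox / (1 - math.exp(2 * ll_null / n))

# Six hypothetical patients: predicted risk of death and observed status.
probs = [0.05, 0.10, 0.20, 0.40, 0.70, 0.90]
ys = [0, 0, 0, 1, 0, 1]
print(round(roc_auc(probs, ys), 3))        # 0.875
print(round(brier(probs, ys), 3))          # 0.152
print(round(nagelkerke_r2(probs, ys), 3))  # 0.462
```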
Despite the relative consistency of the ROC curve areas of the 1,000 logistic regression models derived using backward variable elimination, the regression models differed in terms of the retained variables. The number of retained variables ranged from 9 to 21, with a median of 14 (25th and 75th percentiles: 13 and 16, respectively). Each of the 28 candidate predictor variables was included in at least 2.9% of the final regression models. Age, systolic blood pressure, dementia, and BUN were included in all 1,000 logistic regression models derived using backward variable elimination. Atrial fibrillation, female sex, diabetes, left bundle branch block, and pulmonary edema were retained in 9.7%, 8.9%, 6.9%, 3.5%, and 2.9% of the final regression models, respectively.

Fig. 2. Four different regression trees for predicting in-hospital mortality in patients with heart failure. BUN, blood urea nitrogen.

3.3. Relationship between key continuous predictor variables and in-hospital CHF mortality

The relationship between age, systolic blood pressure, respiratory rate, and BUN and the mean log odds of in-hospital mortality is described in Fig. 3, along with the empirical 2.5th and 97.5th percentiles of the predicted log odds of in-hospital mortality across the 1,000 bootstrap samples drawn from the EFFECT baseline derivation sample. Superimposed on each of the four panels is a density function describing the distribution of the given variable in the EFFECT baseline sample. The relationship between age and the log odds of in-hospital mortality was approximately linear over the entire range of age. For the remaining three continuous variables, nonlinear relationships were evident. However, for systolic blood pressure, respiratory rate, and BUN, the relationships were predominantly linear over the range of the distribution in which most of the subjects lay.
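A simple way to carry out this kind of linearity check is to bin the continuous predictor and examine the empirical log odds of the outcome per bin. The sketch below (Python; the age-mortality slope and cohort are invented for illustration, not EFFECT data) recovers an approximately linear trend from synthetic data:

```python
import math
import random

def binned_log_odds(xs, ys, n_bins=5):
    """Split a continuous predictor into quantile bins and return
    (bin median, empirical log odds of the outcome) pairs -- a quick
    check of linearity on the logit scale before entering the variable
    untransformed into a logistic model."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    size = len(xs) // n_bins
    out = []
    for b in range(n_bins):
        idx = order[b * size:(b + 1) * size] if b < n_bins - 1 else order[b * size:]
        deaths = sum(ys[i] for i in idx)
        # Add 0.5 to each cell so a bin with no deaths still has finite log odds.
        p = (deaths + 0.5) / (len(idx) + 1.0)
        mid = sorted(xs[i] for i in idx)[len(idx) // 2]
        out.append((mid, math.log(p / (1 - p))))
    return out

random.seed(3)
# Synthetic cohort in which the log odds of death is truly linear in age.
age = [random.uniform(40, 90) for _ in range(4000)]
ys = [1 if random.random() < 1 / (1 + math.exp(-(-6 + 0.06 * a))) else 0
      for a in age]
for mid, lo in binned_log_odds(age, ys):
    print(round(mid, 1), round(lo, 2))  # log odds climb roughly linearly
```

In practice a smoother (as used for Fig. 3) gives a finer picture, but the binned version makes the idea concrete: if the binned log odds track a straight line, a linear term on the logit scale is a reasonable specification.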

4. Discussion

There is increasing interest in using classification and regression trees to predict the probabilities of adverse outcomes for patients undergoing medical or surgical treatment. The current study had three primary findings. First, regression trees did not predict in-hospital mortality in patients hospitalized with acute decompensated heart failure as accurately as logistic regression models. Second, different regression trees may be grown in samples that do not differ systematically from one another. Third, several important baseline covariates had relationships with in-hospital mortality that were approximately linear over

Fig. 3. Relationship between continuous variables and log odds of in-hospital mortality in 1,000 bootstrap samples (Enhanced Feedback for Effective Cardiac Treatment baseline). (a) Age and log odds of in-hospital mortality; (b) systolic blood pressure and log odds of in-hospital mortality; (c) respiratory rate and log odds of in-hospital mortality; (d) blood urea nitrogen and log odds of in-hospital mortality.

a range of that variable in which most of the subjects lay. We discuss implications of each of these ndings. We found that all the logistic regression models considered had greater predictive accuracy compared with the regression trees that we considered. Arguably, one of the primary benets of regression trees is their simplicity. However, our results suggest that mortality prediction using a simple tree is potentially fallible because of poor discriminative ability. The area under the ROC curve of the EFFECT-HF mortality prediction model was 0.772 in the EFFECT follow-up validation sample, whereas it was 0.651 for the ADHERE regression tree. These ndings suggest that modeling death from heart failure (and potentially other diseases) is represented by a trade-off between model simplicity and predictive accuracy. However, these factors are not equivalent in their importance, because predictive accuracy is essential for mortality models, and model simplicity is a convenience that is conducive to wider clinical use. We found that regression trees grown on different random samples drawn from the same original derivation sample differed from one another. The number of terminal nodes or leaves ranged from 0 to 6 across the 1,000 regression trees. Similarly, three different variables were used as the rst variable for partitioning subjects across the 1,000 regression trees. Finally, even among the trees that used

BUN as the first variable for binary partitioning, there were clinically important differences in the values of this variable that were used for partitioning the sample. For instance, the value of BUN that was used for partitioning the sample ranged from 8.75 to 17.05 mmol/L. The nonreproducibility of models derived using data-driven methods of analysis has been observed previously. For instance, Austin and Tu examined the use of automated variable-selection methods for identifying variables associated with mortality after hospitalization with acute myocardial infarction (AMI) [11]. Using backward variable selection in 1,000 bootstrap samples, they identified 940 unique regression models. No model was selected in more than four bootstrap samples. Similar results were obtained when forward variable selection or stepwise variable selection was used. The findings from the current study corroborate the observation that random variation between samples can result in different models being selected or derived when data-driven methods of analysis are used. When backward variable elimination was used in 1,000 bootstrap samples drawn from the derivation sample, we found that the resultant models differed across the bootstrap samples. However, all the resultant models had larger areas under the ROC curve in the validation sample than the regression trees did, with areas under the ROC curve similar to that of the EFFECT-HF mortality model.
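The flavor of such a variable-selection experiment can be conveyed with a small sketch. The Python code below (our own illustration on invented synthetic data; it implements a likelihood-ratio backward-elimination rule and is not the procedure or software used in the studies cited) subjects a pure-noise predictor to elimination alongside two truly predictive ones:

```python
import math
import random

def fit_logistic(X, y, steps=1000, lr=0.5):
    """Fit a logistic regression by gradient ascent on the average
    log-likelihood; returns (coefficients incl. intercept, log-likelihood).
    Adequate for small, standardized toy data; not production-grade."""
    k = len(X[0])
    beta = [0.0] * (k + 1)  # beta[0] is the intercept
    n = len(y)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - 1 / (1 + math.exp(-eta))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        p = 1 / (1 + math.exp(-eta))
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return beta, ll

def backward_eliminate(X, y, names, crit=3.84):
    """Likelihood-ratio backward elimination: repeatedly refit without each
    remaining variable and drop the one whose removal costs the least
    log-likelihood, stopping once any further drop would exceed the 1-df
    chi-square cutoff (3.84, i.e., p < 0.05)."""
    keep = list(range(len(names)))
    _, ll_full = fit_logistic([[row[j] for j in keep] for row in X], y)
    while keep:
        trials = []
        for j in keep:
            cols = [c for c in keep if c != j]
            _, ll = fit_logistic([[row[c] for c in cols] for row in X], y)
            trials.append((2 * (ll_full - ll), j, ll))
        stat, j, ll = min(trials)
        if stat > crit:
            break
        keep.remove(j)
        ll_full = ll
    return [names[j] for j in keep]

random.seed(2)
# Toy cohort: "age" and "bun" carry signal; "noise" is unrelated to death.
X, y = [], []
for _ in range(150):
    row = [random.gauss(0, 1) for _ in range(3)]  # standardized predictors
    eta = -1.0 + 1.2 * row[0] + 1.0 * row[2]
    y.append(1 if random.random() < 1 / (1 + math.exp(-eta)) else 0)
    X.append(row)

print(backward_eliminate(X, y, ["age", "noise", "bun"]))
```

Rerunning this in repeated bootstrap samples would, in the spirit of [11], occasionally retain the noise variable or drop a weakly predictive one, which is exactly how sampling variability produces many distinct selected models from one data set.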

We found that the relationship between the log odds of in-hospital mortality and age, systolic blood pressure, respiratory rate, and BUN was approximately linear over most of the distribution of each of these variables. This finding helps explain the superior performance of the EFFECT-HF mortality prediction model compared with that of the regression trees. The EFFECT-HF mortality prediction model was able to exploit the strong underlying linear relationships in the data. Regression trees, by contrast, rely on partitioning the sample using binary decision rules and therefore cannot capture relationships that are linear rather than piecewise constant.

One of the described advantages of regression tree methods is their ability to identify interactions among the predictor variables. We fit two separate logistic regression models: one with only main effects and one that incorporated two-way interactions. The predictive performance of the two models in the validation samples was very similar. Thus, although important interactions may exist, their inclusion does not improve the prediction of CHF mortality. Regression trees also have difficulty in capturing additive relationships [24], which may have contributed to the poor performance of this method in our sample.

Our findings are similar to those of an earlier study comparing the performance of logistic regression with that of modern data-driven methods of regression for predicting mortality after hospitalization for an AMI [22]. It was shown that the predictive accuracy of logistic regression substantially exceeded that of regression trees. However, the predictive accuracy of logistic regression was, at most, only slightly less than that of generalized additive models (GAMs) or multivariate adaptive regression spline models.

There are certain limitations to the current study. First, our objective was not to compare the predictive accuracy of regression trees and logistic regression models in general.
Instead, our objective was to compare the predictive accuracy of regression trees with that of logistic regression models for predicting in-hospital mortality in patients hospitalized with heart failure. Our findings may not be applicable to patients with other diseases or to other outcomes. Second, we focused on predicting each patient's probability of in-hospital mortality, rather than on classifying each patient's vital status. Although regression trees can be used for prediction or classification, logistic regression results in a predicted probability of mortality for each subject. Deriving a classification from a logistic regression model would require a rule for dichotomizing the predicted probability. Given the lack of agreement on the best method for dichotomizing a predicted probability, our focus was on comparing the accuracy of each method for predicting a patient's probability of mortality. Furthermore, a limitation of binary classification schemes is that, in some clinical scenarios, investigators may want to stratify patients into more than two levels of risk based on the predicted probability of mortality, whereas binary classification schemes classify subjects into only two strata: those predicted to die and those predicted to survive. A third limitation of the current study was the deletion of subjects with missing data on covariates used in the regression models. Subjects with missing data may have been systematically different from those with complete data. However, our study was primarily methodological in nature; our objective was to compare the relative performance of two different methods for estimating patient-specific probabilities of mortality. We then conducted a series of additional analyses to examine why, in this particular sample, logistic regression had superior performance compared with regression trees.

In the current study, we included predictive models developed in prior studies: the ADHERE mortality prediction models and the EFFECT-HF mortality prediction model. The ADHERE models were included because the ADHERE regression tree is, to the best of our knowledge, the only regression tree-based method for predicting outcomes in patients hospitalized with heart failure. The focus of the ADHERE models on in-hospital mortality limited which other previously developed models could be considered for inclusion in the current article. The EFFECT-HF model can be used for predicting short-term mortality in patients hospitalized with heart failure. The EFFECT-HF and ADHERE models use a small set of variables that are readily available soon after hospitalization; in particular, neither requires left ventricular ejection fraction. There are several risk-prediction models for mortality in heart failure patients that were not considered in this study. The MUSIC (MUerte Súbita en Insuficiencia Cardíaca) risk score predicts mortality in ambulatory patients with heart failure [25], whereas the EFFECT and ADHERE models are for use in hospitalized patients.
The OPTIMIZE-HF (Organized Program to Initiate Lifesaving Treatment in Hospitalized Patients with Heart Failure) model for in-hospital mortality requires knowledge of the presence or absence of left ventricular systolic dysfunction [26], whereas this information is available for only a subset of the EFFECT patients. The Digitalis Investigation Group (DIG) models are for use in patients with subtypes of systolic function and predict long-term mortality (rather than in-hospital mortality) [27,28]. Finally, the Seattle Heart Failure model predicts mortality at 1, 2, and 3 years (rather than in-hospital mortality) [29]. For these reasons, none of these prediction models was considered in the current article.

In conclusion, we demonstrated that logistic regression had superior predictive ability compared with regression trees for predicting in-hospital mortality in patients hospitalized with heart failure. Furthermore, regression trees grown in a specific patient sample may not be reproducible in samples that do not differ systematically from the original derivation sample.

Acknowledgments

This study was supported by the Institute for Clinical Evaluative Sciences (ICES), which is funded by an annual grant from the Ontario Ministry of Health and Long-Term

Care (MOHLTC). The opinions, results, and conclusions reported in this article are those of the authors and are independent from the funding sources. No endorsement by ICES or the Ontario MOHLTC is intended or should be inferred. This research was supported by an operating grant from the Canadian Institutes of Health Research (CIHR) (MOP 86508). Dr. Austin is supported in part by a Career Investigator award from the Heart and Stroke Foundation of Ontario. Dr. Tu is supported by a Tier 1 Canada Research Chair in Health Services Research and a Career Investigator award from the Heart and Stroke Foundation of Ontario. Dr. Lee is a clinician-scientist of the CIHR. The data used in this study were obtained from the EFFECT Study. The EFFECT Study was supported by a Canadian Institutes of Health Research team grant in cardiovascular outcomes research to the Canadian Cardiovascular Outcomes Research Team; it was initially funded by a Canadian Institutes of Health Research Interdisciplinary Health Research Team grant and a grant from the Heart and Stroke Foundation of Canada. The study funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; or preparation, review, or approval of the manuscript.

References

[1] Lee DS, Austin PC, Rouleau JL, Liu PP, Naimark D, Tu JV. Predicting mortality among patients hospitalized for heart failure: derivation and validation of a clinical model. JAMA 2003;290:2581–7.
[2] Fonarow GC, Adams KF Jr, Abraham WT, Yancy CW. Risk stratification for in-hospital mortality in acutely decompensated heart failure. JAMA 2005;293:572–80.
[3] Tu JV, Jaglal SB, Naylor CD. Multicenter validation of a risk index for mortality, intensive care unit stay, and overall hospital length of stay after cardiac surgery. Steering Committee of the Provincial Adult Cardiac Care Network of Ontario. Circulation 1995;91:677–84.
[4] Lee KL, Woodlief LH, Topol EJ, Weaver WD, Betriu A, Col J, et al. Predictors of 30-day mortality in the era of reperfusion for acute myocardial infarction. Circulation 1995;91:1659–68.
[5] Sullivan LM, Massaro JM, D'Agostino RB. Presentation of multivariate data for clinical use: the Framingham Study risk score functions. Stat Med 2004;23:1631–60.
[6] Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Boca Raton, FL: Chapman & Hall/CRC; 1998.
[7] Sauerbrei W, Madjar H, Prompeler HJ. Differentiation of benign and malignant breast tumors by logistic regression and a classification tree using Doppler flow signals. Methods Inf Med 1998;37:226–34.
[8] Gansky SA. Dental data mining: potential pitfalls and practical issues. Adv Dent Res 2003;17:109–14.
[9] Nelson LM, Bloch DA, Longstreth WT Jr, Shi H. Recursive partitioning for the identification of disease risk subgroups: a case-control study of subarachnoid hemorrhage. J Clin Epidemiol 1998;51:199–209.
[10] Lemon SC, Roy J, Clark MA, Friedmann PD, Rakowski W. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann Behav Med 2003;26:172–81.
[11] Austin PC, Tu JV. Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. J Clin Epidemiol 2004;57:1138–46.
[12] Austin PC. The large-sample performance of backwards variable elimination. J Appl Stat 2008;35:1355–70.
[13] Derkson S, Keselman HJ. Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. Br J Math Stat Psychol 1992;45:265–82.
[14] Flack VF, Chang PC. Frequency of selecting noise variables in subset regression analysis: a simulation study. Am Stat 1987;41:84–6.
[15] Tu JV, Donovan LR, Lee DS, Austin PC, Ko DT, Wang JT, et al. Quality of cardiac care in Ontario. Ontario, Canada: Institute for Clinical Evaluative Sciences; 2004.
[16] Tu JV, Donovan LR, Lee DS, Wang JT, Austin PC, Alter DA, et al. Effectiveness of public report cards for improving the quality of cardiac care: the EFFECT study: a randomized trial. JAMA 2009;302:2330–7.
[17] Hastie TJ, Tibshirani RJ. Generalized additive models. London, UK: Chapman & Hall; 1990.
[18] Clark LA, Pregibon D. Tree-based methods. In: Chambers JM, Hastie TJ, editors. Statistical models in S. New York, NY: Chapman & Hall; 1993. pp. 377–419.
[19] Harrell FE Jr. Regression modeling strategies. New York, NY: Springer-Verlag; 2001.
[20] Nagelkerke NJD. A note on a general definition of the coefficient of determination. Biometrika 1991;78:691–2.
[21] Cragg JG, Uhler R. The demand for automobiles. Can J Econ 1970;3:386–406.
[22] Austin PC. A comparison of classification and regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Stat Med 2007;26:2937–57.
[23] R Core Development Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2005.
[24] Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data mining, inference, and prediction. New York, NY: Springer-Verlag; 2001.
[25] Vazquez R, Bayes-Genis A, Cygankiewicz I, Pascual-Figal D, Grigorian-Shamagian L, Pavon R, et al; MUSIC Investigators. The MUSIC Risk score: a simple method for predicting mortality in ambulatory patients with chronic heart failure. Eur Heart J 2009;30:1088–96.
[26] Abraham WT, Fonarow GC, Albert NM, Stough WG, Gheorghiade M, Greenberg BH, et al; OPTIMIZE-HF Investigators and Coordinators. Predictors of in-hospital mortality in patients hospitalized for heart failure: insights from the Organized Program to Initiate Lifesaving Treatment in Hospitalized Patients with Heart Failure (OPTIMIZE-HF). J Am Coll Cardiol 2008;52:347–56.
[27] Jones RC, Francis GS, Lauer MS. Predictors of mortality in patients with heart failure and preserved systolic function in the Digitalis Investigation Group trial. J Am Coll Cardiol 2004;44:1025–9.
[28] Brophy JM, Dagenais GR, McSherry F, Williford W, Yusuf S. A multivariate model for predicting mortality in patients with heart failure and systolic dysfunction. Am J Med 2004;116:300–4.
[29] Levy WC, Mozaffarian D, Linker DT, Sutradhar SC, Anker SD, Cropp AB, et al. The Seattle Heart Failure Model: prediction of survival in heart failure. Circulation 2006;113:1424–33.
