
Guidelines in the Analysis Phase

Analysis plan (page 2)
Data analysis in general (page 5)
Initial data analysis (page 9)
Post-hoc and sensitivity analyses (page 11)
Data analysis documentation (page 13)
Reporting results in tables and figures (page 14)
Guidelines for reporting specific types of studies (page 17)
Prognostic models (page 19)
Handling Missing Data (page 26)

Updated: 5 July 2012

Title of the document:

Page 2 of 32 | Rev. Nr.: 1.2 | Effective date: 1 Jan 2010 | avs

Analysis plan

HB Nr.: 1.4-01

1. Aim
To promote structured and targeted data analysis.

2. Definitions
An analysis plan is a stepwise plan created prior to the actual data analysis.

3. Keywords
Research questions, population, variables, analysis methods, stepwise plan.

4. Description
An analysis plan should be created prior to the data analysis. It contains a description of the research question and of the various steps the analysis will take. The analysis plan is intended as a starting point: it ensures that the analysis can be undertaken in a targeted manner. However, both the research questions and the analyses may be revised during the data analysis, and certain choices may not yet be clear before the analysis starts; exploratory data analysis is also possible. The findings and decisions made during the analyses may be documented at a later stage in the analysis plan, making the analysis plan a dynamic document. Alternatively, findings and decisions made during the data analysis can be documented in SPSS syntax (see guideline 1.4-05, Documentation of data analysis); in that case the analysis plan only serves as the starting point.

The concrete research question, i.e. the question the analyses are intended to answer, needs to be formulated first within the analysis plan. Concrete research questions may be defined using the acronym PICO: Population, Intervention, Comparison, Outcomes. A question such as "What are the risk factors for back pain?" is too general. An example of a concrete question is: "Does frequent bending at work lead to an elevated risk of lower back pain occurring in employees?" (Population = employees; Intervention = frequent bending; Comparison = infrequent bending; Outcome = occurrence of back pain). Concrete research questions are essential for determining the analyses required.

The analysis plan should then describe which statistical techniques are to be used to analyse the data. The following issues need to be considered in this process and described where applicable:
- Which (subgroup of the) population is to be included in the analyses?
- Data from which endpoint (T1, T2, etc.) will be used?
- Which (dependent and independent) variables are to be used in the analyses, and how are the variables to be analysed (e.g. continuous or in categories)?
- Which variables are to be investigated as potential confounders or effect modifiers, and how are these variables to be analysed? There are different ways of dealing with confounders. Often variables are only included as confounders if they actually influence the relationship between the determinant and the outcome (i.e. when they modify the regression coefficient of the determinant, see example). Another frequently used method is to include all variables that have a significant relationship with the outcome, even if they are perhaps not (strong) confounders.
- How to deal with missing values?
- Which analyses are to be carried out, and in which order (e.g. univariate analyses, multivariate analyses, analysis of confounders, analysis of interaction effects, analysis of sub-populations, etc.)?

A statistician may need to be consulted regarding the choice of statistical techniques. It can be quite efficient to create a number of empty tables, to be included in the article, prior to the start of data analysis. This is often very helpful in deciding exactly which analyses are required in order to analyse the data in a targeted manner.
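The change-in-coefficient check described above can be sketched in a few lines of Python. The data are simulated and purely hypothetical, and the guideline's own analyses use SPSS, so this is only an illustration of the logic: fit the model with and without the candidate confounder and compare the determinant's coefficient.

```python
# Sketch of the "change in regression coefficient" confounder check:
# a variable is treated as a confounder when adjusting for it shifts
# the determinant's coefficient by roughly 10% or more.
# Simulated, purely illustrative data.
import random

random.seed(1)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]               # potential confounder
x = [0.8 * zi + random.gauss(0, 1) for zi in z]          # determinant (related to z)
y = [0.5 * xi + 0.7 * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)

# Crude model: y ~ x
b_crude = cov(x, y) / cov(x, x)

# Adjusted model: y ~ x + z (closed-form two-predictor OLS slope for x)
sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
sxy, szy = cov(x, y), cov(z, y)
b_adjusted = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

change = abs(b_crude - b_adjusted) / abs(b_crude)
print(f"crude={b_crude:.3f} adjusted={b_adjusted:.3f} change={change:.1%}")
if change >= 0.10:
    print("adjusting for z changes the coefficient by >= 10%: treat z as a confounder")
```

The same comparison can of course be made in SPSS by running the regression twice and inspecting the coefficient of the determinant.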

5. Details
Audit questions:
- Has an analysis plan been created prior to the start of analysis?
- Has a concrete research question been formulated in the analysis plan?
- Have the points described under section 4 been considered, and have the most important options been decided?
- Has a stepwise description of the analyses to be applied been provided in the analysis plan?

6. Appendices/references/links

7. Amendments
V1.2 1 Jan 2010: English translation.
V1.1 21 Jan 2008: Text in guideline has been re-written with more emphasis on a flexible approach.

EXAMPLE OF AN ANALYSIS PLAN
Work-related psychosocial risk factors in relation to the occurrence of neck complaints.

Research question
What is the influence of the following psychosocial factors on the occurrence of neck complaints within 1 year in symptom-free employees?
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Population
All 977 individuals who were symptom-free at the baseline measurement and had a full follow-up.

Outcome measure (dependent variable)
Dichotomous variable: presence (1) or absence (0) of neck complaints.
Time variable: time until the neck complaint arises (minimum of 1 day), in days.

Independent variables
All independent variables and confounders are dimensions of the Job Content Questionnaire (Karasek questionnaire).
1. Quantitative job demands
2. Skill discretion
3. Decision authority
4. Supervisor support
5. Co-worker support

Confounders
1. Qualitative job demands
2. Job security
For each analysis with 1 central psychosocial factor, the other 4 will be analysed as potential confounders.

Other potential confounders
- Age
- Sex
- Coping styles (3 variables): avoidance behaviour, seeking social support, approaching problems actively
- Life events

- Physical factors in leisure time (9 variables): intensive sport/heavy physical activity during the last 4 months requiring a lot of exertion; long-term sitting; computer screen work; working with hands above shoulder height; exertion with hands/arms; having to work in the same position for long periods of time; having to make the same hand/arm movements numerous times per minute; driving a vehicle; bending/twisting the upper body numerous times per hour.
- Work-related physical factors (11 variables): percentage of work time with neck flexion >45 degrees; percentage of work time seated; percentage of work time with neck rotation >45 degrees; frequency of lifting >25 kg per working day; percentage of work time making repetitive movements with arms/hands at a frequency >4 times per minute; percentage of work time with upper arm elevation >60 degrees; working with hands above shoulder height; computer screen work; working with vibrating or pulsating objects; driving a vehicle at work; bending/twisting of the upper body numerous times per hour.

Statistical analysis
One regression model for each psychosocial factor:
- First, univariate Cox regressions: dependent variable is neck complaints, independent variable is the central psychosocial factor.

Confounding
- Univariate Cox regressions of all potential confounders. Potential confounders with p > 0.25 will no longer be considered as confounders.
- Multivariate Cox regressions of the central psychosocial factor together with 1 potential confounder (selected using p < 0.25) at a time. When the change in the regression coefficient of the central psychosocial factor is around 10% or greater, the potential confounder should be viewed as a true confounder, and this confounder should then be included in the multivariable analysis.
- Add 1 potential confounder at a time: if the change in the regression coefficient is greater than 10%, the confounder should be kept in the model; otherwise it can be excluded.

Effect modification
- Sex: create a sex x psychosocial factor interaction and add the interaction to the final model (with confounders). If the interaction is significant, effect modification is present.


Data analysis in general

HB Nr.: 1.4-02

1. Aim
Outline of quality aspects of data analysis (principal analyses).

2. Definitions
Modelling: Finding a statistical model that works well with the data.
Cross-validation: Method where the sample is split in two. One half is used to develop the models, the other to test the models developed.
Stepwise modelling: Modelling method involving stepwise procedures: a term is removed or added to the model at each step. A distinction is made between forward stepwise, backward stepwise and stepwise.
Imputing: Method of filling in missing values in a dataset.
Multilevel analysis: Type of regression analysis where a distinction can be made between more than one level: for instance, data collected from patients within a general practice, whereby both the patients' and the general practitioners' data play a role.
GEE: Generalized Estimating Equations: a specific type of multilevel analysis.
Logistic regression analysis: Type of regression analysis where the dependent variable is dichotomous.
Dichotomy: A variable that can only assume one of two values.
Cox regression: Type of survival analysis. The dependent variable reflects length of survival.
Normality: Property of a variable's distribution: the underlying distribution is normal (Gaussian).
Resampling method: Method where samples are repeatedly taken from the available data, either to by-pass the distribution requirements of a test (for instance in bootstrapping), or to increase the precision of an estimate.

3. Keywords
Data analysis, modelling, regression analysis, (co-)variance analysis, multilevel analysis, GEE, analysis of longitudinal data, factor analysis, structural models, exact testing, non-parametric testing, bootstrapping.

4. Description
The variety of methods used in data analysis for medical/epidemiological research is enormous. This note provides an overview of the classes of frequently used methods and, here and there, discusses factors that may influence the quality of interpretation, and therefore the conclusions as well. No attempt has been made to provide an exhaustive list: the field is simply much too large for this. We discuss the following topics in brief:
- General modelling
- Regression analysis
- (Co)-variance analysis
- Multilevel analysis
- Methods for longitudinal data
- Factor analysis
- Methods for analysing structural models
- Special methods: exact tests, non-parametric tests, bootstrapping

General modelling
Not very much has been written about the general principles of statistical modelling, although there is literature on modelling within specific academic areas. A general book about statistical modelling is Dobson's [1]; Edwards discusses the advantages and disadvantages of

iterative (stepwise) methods in detail [2] (compare this with the article by Adèr, Kuik, Hoeksma and Mellenbergh [3], which is available here as a handout). A number of issues frequently occur in modelling:

Reliability of the models determined
Models may be specific to the data provided; this means that they may not be found in follow-up studies. A remedy for this is cross-validation. This method involves randomly splitting the sample into two halves: one half of the sample is used to develop the model, and the other to verify the models. In general, this requires a great number of observations.

Stepwise analysis
In this method the models are built in a stepwise manner: at each step a term is removed from, or added to, the model. Although there are a number of arguments against this procedure (see Edwards [2]), it is still used frequently. In general it is recommended that only forward stepwise methods are used, preferably in a variation in which the user can confirm or prevent the removal or addition of a term suggested by the programme. Methods may also be used which, instead of stepwise procedures, run through and detail the competing models [2, 4]. The results of this type of analysis therefore consist of more than one model.

Misspecification
If terms are missing from a statistical model (for example confounders), or if the specified model does not represent certain essential aspects (for instance, the use of linear regression analysis while the data have a hierarchical structure, meaning multilevel analysis would have been more appropriate), the results may be influenced dramatically.

Missing observations
Many multivariate methods (such as multiple regression analysis) are sensitive to missing values, as they apply listwise deletion by default: if one observation is missing for a respondent, the respondent is not used in estimating the model parameters.
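This listwise-deletion behaviour, contrasted with the simplest remedy (mean imputation, one of the options discussed below), can be sketched as follows. The records are hypothetical and serve only to show the mechanics:

```python
# Sketch: listwise deletion (the regression default) versus simple
# mean imputation. None marks a missing value; data are hypothetical.
rows = [
    {"age": 34, "pain": 2}, {"age": 51, "pain": None},
    {"age": 47, "pain": 5}, {"age": None, "pain": 3},
    {"age": 60, "pain": 4},
]

# Listwise deletion: drop every respondent with any missing value.
complete = [r for r in rows if None not in r.values()]

# Mean imputation: fill each missing value with the variable's mean.
def mean_impute(rows, var):
    observed = [r[var] for r in rows if r[var] is not None]
    m = sum(observed) / len(observed)
    return [dict(r, **{var: r[var] if r[var] is not None else m}) for r in rows]

imputed = rows
for var in ("age", "pain"):
    imputed = mean_impute(imputed, var)

print(len(complete), "of", len(rows), "respondents survive listwise deletion")
print("imputed ages:", [r["age"] for r in imputed])
```

Mean imputation is shown here only because it is short; as the text notes, such "safe value" methods are not always the right option.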
Possible remedies: (i) apply multiple imputation [5], even if this is often impractical; (ii) imputation using the EM algorithm or imputation using regression analysis, which can be carried out using the Missing Value Analysis (MVA) module within SPSS; (iii) enter a "safe" value for the missing values, i.e. a value that is not expected to disrupt the estimates of the coefficients (mean imputation, last observation carried forward and similar methods); although this appears simple, it is not always the right option; (iv) for models with repeated measures where values are missing at certain time points, multilevel analysis can be used.

Violations of model assumptions
Both linear regression analysis and analysis of variance require that residuals are normally distributed. It is therefore good practice to calculate diagnostics for both analyses: probability plots and other diagnostic plots [6]. However, both methods are relatively robust against violations of the assumptions, so the main purpose of examining the diagnostics is to get an impression of the reliability of the results.

Regression analysis
A number of methods fall into this category, each with specific properties and assumptions: (multiple) linear regression analysis; logistic regression analysis; Poisson regression; Cox regression analysis (survival analysis). Some comments:
- A multilevel variant exists for all of these methods, which can be applied when the data have a hierarchical structure. It should be pointed out that specific assumptions need to be met for Cox regression analysis, and that GEE is a preferred method for logistic multilevel analysis.
- The diagnostics used for these methods differ greatly. The diagnostics for linear regression have already been described above. There are also diagnostics for logistic regression analysis: see Hosmer and Lemeshow's book [7]. Diagnostic assessment is less common for the other two methods.

A special type of logistic regression analysis is produced by calculating ROC curves: the result is a table and plot of sensitivity against (1 - specificity) at different thresholds for the predictor. In Cox regression analysis the time dependency of the covariates can be taken into consideration; it is standard practice, however, to assume that covariates are constant over time.

(Co)-variance analysis
Often a one-way ANOVA is used when the averages of more than two groups need to be compared (for two groups this equates to a t-test). If, in addition, a number of covariates (both categorical and continuous) need to be included in the model, then an analysis of (co)variance needs to be carried out. The advantage of covariance analysis over regression analysis is that all covariates can be specified in a single model: all interactions are included in the model automatically. A disadvantage is that the analysis of variance imposes strict requirements on the (continuous) covariates (the regression coefficients need to be equal in all subgroups), which are not always met. Analysis of variance is useful in the exploratory phase to get an impression of the influential covariates/confounders.

Multilevel analysis
Multilevel analysis is used if the data are nested: for instance, patient data collected from various GP practices, where some practices are group practices in which each doctor has his/her own patients. The methodology is complicated: it is advisable to take a course on the topic and to ask for advice prior to the analysis phase.

Methods for longitudinal data
Multilevel analysis can also be used if the lowest level contains observations over time. The GEE [8] programme can be used in this situation. The GEE estimating procedures are particularly reliable when the dependent variable is dichotomous (the equivalent of logistic regression analysis).
In a methodological sense, the use of GEE is recommended when comparisons between groups need to be made and the researcher is not interested in the variability between individual patients.

Factor analysis
Factor analysis is often used in validating questionnaires (see also guideline 1.1B-08, Selecting, translating and validating questionnaires), particularly when there is an assumption that the questionnaire contains more than one dimension. A distinction can be made between exploratory and confirmatory factor analysis. Principal Components Analysis (PCA) is often used in the former (as well as Common Factor Analysis); the latter often makes use of software for deriving structural models (see below). The use of factor analysis is anything but trivial: there are various pitfalls to avoid. The same advice applies here as for multilevel analysis: take a course and ask for advice prior to the analysis phase.

Methods for analysing structural models
Two programmes are often used for this: EQS and Lisrel. The standard reference text for SEM (Structural Equation Modelling) is Bollen [9]. Lisrel is obtainable through EMGO.

Special methods: exact tests
In many instances a choice can be made (in SPSS) between asymptotic and exact tests, for instance in calculating chi-square tests on a cross-tabulation. A specific statistical package has been developed for this purpose (StatXact), which can also be used to calculate exact odds ratios. Consult one of the EMGO+ biostatisticians.

Special methods: non-parametric methods, bootstrapping
Methods such as regression analysis and analysis of variance impose relatively strict requirements on the data they are applied to: it is important that the distribution of the data is unimodal and, more generally, that the data are normally distributed. Various options are available if these requirements cannot be met.

Non-parametric methods
SPSS has a (large) range of non-parametric tests available, for instance: Mann-Whitney U, Kruskal-Wallis, Wilcoxon and Friedman's test. These tests do not impose all of the requirements regarding the data distribution and in most cases use the rank order of the dependent variable.

Bootstrapping
This is a so-called resampling method, which allows the distribution requirements for parametric tests to be by-passed. Bootstrapping is frequently used in cost-effectiveness analyses these days. The standard reference text is Efron and Tibshirani [10].

5. Details
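As an illustration of the bootstrap described under section 4, a minimal pure-Python sketch with hypothetical (skewed) cost data: the statistic of interest, here the median, is recomputed on repeated resamples drawn with replacement, and a percentile confidence interval is read off.

```python
# Bootstrap percentile confidence interval for a median.
# The cost values are hypothetical; cost data are typically skewed,
# which is why the bootstrap is popular in cost-effectiveness work.
import random

random.seed(2)
costs = [120, 95, 400, 210, 80, 1500, 60, 310, 250, 130]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

boot = []
for _ in range(2000):
    resample = random.choices(costs, k=len(costs))  # draw with replacement
    boot.append(median(resample))

boot.sort()
lo, hi = boot[int(0.025 * len(boot))], boot[int(0.975 * len(boot)) - 1]
print(f"median={median(costs)}, 95% bootstrap CI = ({lo}, {hi})")
```

The number of resamples (2000) and the percentile method are common but not mandatory choices; see Efron and Tibshirani [10] for alternatives such as bias-corrected intervals.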

Appendices/references/links
[1] Dobson AJ. Introduction to Statistical Modelling. London/New York: Chapman and Hall, 1983.
[2] Edwards D. Introduction to Graphical Modelling. New York: Springer, 2nd edn., 2000. ISBN 0-387-95054-0.
[3] Adèr HJ, Kuik DJ, Hoeksma JB, Mellenbergh GJ. Methodological aspects of statistical modelling: Some new perspectives. In: Stasinopoulos M, Touloumi G, eds., Statistical Modelling in Society. Proceedings of the 17th International Workshop on Statistical Modelling, Chania, Crete, Greece, July 8-12, 2002. Athens: National & Kapodistrian University of Athens and University of North London, 2002; 59-68.
[4] Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. ??????, 2nd edn., 2002.
[5] Little RJA, Rubin DB. Statistical Analysis with Missing Data. New York: Wiley, 1987.
[6] Judd. Statistical methods in the social sciences. To be sorted out.
[7] Hosmer DW, Lemeshow S. Applied Logistic Regression. New York: John Wiley & Sons, 1989.
[8] Liang K, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
[9] Bollen KA. Structural Equations with Latent Variables. New York: John Wiley and Sons, 1989.
[10] Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York/London: Chapman & Hall, 1993.

6. Amendments
V1.1 1 Jan 2010: English translation.


Initial data analysis

HB Nr.: 1.4-03

1. Aim
To get a first impression of the data; to evaluate the randomisation procedures; to evaluate, and potentially impute, missing values and outliers; to get an impression of the distributional properties of the continuous variables and of the numbers in the subgroups; to explore the validity and reliability of the measurement instruments.

2. Definitions

3. Keywords
Randomisation, missing values, distribution of continuous variables, subgroups, scale scores, measurement level of variables.

4. Description
Exploratory analyses
This type of analysis helps in assessing whether there are missing values and/or outliers and whether categories need to be combined.

First impression
It is advisable to always review the distribution of all the variables to be used. Frequencies are reviewed for all categorical variables (e.g. marital status, education). Descriptive statistics (percentage of missing values, average, trimmed average, standard deviation, median, other percentiles where relevant, minimum, maximum, skewness and kurtosis) are calculated for continuous variables (e.g. body weight, blood pressure). It is advisable to create figures, e.g. boxplots or histograms, in order to review the distribution.

Outliers
So-called outliers may occur in continuous variables. These are values that, theoretically, are not out of range, but are extremely unlikely given the observed distribution. Reviewing averages and standard deviations is not enough to discover outliers; a frequency table or boxplot needs to be generated for this.

Odd combinations
Cross-tabulations can be generated for categorical variables (e.g. gender x ADL limitations) in order to assess whether odd combinations are present. Scatterplots can be created for continuous variables to reveal any unlikely combinations (simply reviewing correlations is not sufficient). For instance: a weight of 120 kg combined with a height of 1.50 metres will be an outlier in most populations.
When it has been decided that a certain value or combination of values is an outlier and the true value cannot be recovered from the raw data, these values need to be recoded as missing.

Missing values
Also carefully review missing values when evaluating the distributions. Often specific codes (e.g. -1 or 9) are used for missing values; check whether these codes have been defined as missing values. If there are missing values, consider whether these need to be imputed (filled in). There are a number of methods for this: please consult a statistician.

Normal distribution
If a given analysis requires that the variables are normally distributed, it is advisable to evaluate whether a variable is in fact normally distributed. Graphs can be used for this, such as histograms or Q-Q plots. If it is apparent that the variable is not normally distributed, a transformation could be considered (for instance a logarithm transformation) to see whether this improves matters.
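The outlier screening described above can be sketched with the common boxplot rule: flag values further than 1.5 times the interquartile range outside the quartiles. The weights below are hypothetical, and the 1.5 x IQR cut-off is one convention among several, not a prescription:

```python
# Boxplot-rule outlier screening on hypothetical body weights (kg).
from statistics import quantiles

weights = [62, 70, 68, 75, 81, 59, 73, 66, 77, 250]  # 250 kg: likely a data error

q1, _, q3 = quantiles(weights, n=4)          # quartiles (default exclusive method)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [w for w in weights if w < low or w > high]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}; flagged as outliers: {outliers}")
```

A flagged value is a candidate for checking against the raw data, not automatically an error; only when the true value cannot be recovered should it be recoded as missing.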

Distribution of categories
Categories can be combined if the numbers in one or more categories are too small. The need for this is not always evident from an ordinary frequency distribution, but it can be apparent from a cross-tabulation. For instance, in a study with stratification by gender and education, the cross-tabulation of education by gender may show that for men the lowest category ("not completed primary education") rarely occurs, whereas for women the highest category ("completed university education") rarely occurs. The lowest and second lowest categories can then be combined, as can the highest and second highest.

Evaluating the randomisation procedure
In order to evaluate whether the randomisation has been successful, the distribution of all the relevant (prognostic) variables needs to be reviewed separately for each treatment arm. Descriptive statistics (percentages, averages, median, standard deviation, range) can be used for this. Differences between groups can be tested (e.g. chi-square or t-test), although it needs to be remembered that, owing to the randomisation procedure, any differences found are by definition due to chance.

Scale scores
Before the items in a scale are summed, the way in which the items behave in the sample needs to be evaluated. The first step in this process is a frequency plot of the items in the scale. Usually there are positively and negatively worded items. It may be necessary to reverse-score the positive or negative items prior to summing them to a sum score, to ensure all items are scored in the same direction. The items can then be summed, possibly after response categories have been combined (e.g. "very severe" and "severe"). The second step is a reliability and/or principal components analysis. Principal components or factor analysis can be used to explore which items belong to which (sub)scales. Cronbach's alpha can be used to determine the internal consistency (homogeneity) of a scale.
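Cronbach's alpha can be computed directly from the item variances and the variance of the sum score. The item scores below are hypothetical responses of 6 people to a 4-item scale; in practice the same figure comes out of the SPSS reliability procedure:

```python
# Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / var(sum score)).
# Rows = respondents, columns = items; hypothetical data.
from statistics import variance

scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
]

k = len(scores[0])                                            # number of items
item_vars = [variance([row[j] for row in scores]) for j in range(k)]
total_var = variance([sum(row) for row in scores])            # variance of sum score
alpha = k / (k - 1) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Note that negatively worded items must be reverse-scored, as described above, before this calculation; otherwise alpha is artificially low.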
It is advisable to always determine Cronbach's alpha for the scales and, where possible, to carry out a principal components analysis (refer to the guideline Selecting, translating and validating questionnaires) to evaluate whether the expected scales are also evident in the data. If it is apparent from the study that a given item does not fit the scale (e.g. the item-total correlation is too low, or the item does not load adequately onto the principal component), it needs to be considered whether this item should be excluded from the sum score. This does, of course, have consequences for the comparability of scores with other studies and should therefore be considered carefully. In general it is not advisable to modify frequently used scales; it is better to use the original scales and report the findings (e.g. low alpha or low item-total correlations) in the discussion section of the article.

5. Details
Audit questions:
- Has the distribution of all the variables been reviewed?
- Were there variables with a high percentage of missing values? If so, how were these dealt with?
- Have outliers been explored? If so, how?
- Have the cell numbers for central variables been taken into consideration?
- Where relevant: how were (large) deviations from normality dealt with?
- Has it been assessed whether the items belonging to a scale actually fit the scale?

6. Appendices/references/links

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 23 Apr 2007.



Post-hoc and sensitivity analyses

HB Nr.: 1.4-05

1. Aim
Specifying and correctly implementing post-hoc and sensitivity analyses.

2. Definitions

3. Keywords
Post-hoc analyses, sensitivity analyses

4. Description
Post-hoc analyses
Post-hoc analyses are required when a significant relationship has been found between the dependent variable and a categorical independent variable with more than two categories. They allow researchers to ascertain to which categories the significance can be ascribed. For logistic or Cox regressions the output provides both the overall significance for categorical variables and the significance of the OR or RR of the separate categories with respect to the reference category. The latter are, in fact, post-hoc analyses (albeit not corrected for repeated testing). However, there are also analysis methods in which the output does not automatically provide this specification. It is tempting to decide which categories differ significantly by eyeballing the results; however, additional analyses need to be undertaken to determine this. An example is analysis of variance, in which so-called post-hoc tests can be used (examples include the Tukey, Duncan, Scheffé or Bonferroni tests).

Sensitivity analyses
There is always more than one way to carry out an analysis. In order to be more certain about the results it is advisable to redo the analyses in a slightly different way, often by changing one or more (external) parameters. There are a number of cases where a sensitivity analysis is almost always desirable; these are discussed here. Firstly, when a cut-off has been selected for the dependent or independent variable for which there is, as yet, no consensus. Even if there is a consensus, there is the question of whether this cut-off is applicable to the study population. It is advisable to repeat the analyses with different cut-off values. Secondly, there may be variables with uncertain values: either missing values, or variables composed of data from various sources between which there may occasionally be conflicts.
An example of the latter is a disease diagnosis based on data provided by both the general practitioner and the respondent. Missing values can be substituted, meaning the respondent can be retained for the analysis. Advanced statistical imputation methods can be used for this; substitution can also be based on a best guess. It is good practice to carry out the analyses both with and without the respondents with missing values, and to compare the results. An example of this is an uncertain diagnosis, where all uncertain cases are set to "no disease" in one analysis and to "diseased" in another; all uncertain cases can be omitted in a third analysis. A third situation in which sensitivity analysis is desirable is with longitudinal data. For instance, there may be data at two time points and the analysis concerns the definition of change in the dependent variable. There is an ongoing lively discussion on this topic in the literature: whether the choice is a difference score or one or another definition of relevant change, it is advisable to carry out the analyses using different definitions. A similar strategy is recommended in all cases in which there is uncertainty regarding the best choice of statistical measure or procedure. Finally, sensitivity analyses are a standard component of economic evaluations. The opportunities for multivariate analysis in economic evaluations are very limited, owing to the fact that the distribution of cost data is skewed. Sensitivity analyses are used to study the effect on the outcomes of, for instance, the cost prices used. Often subgroup analyses and analyses with imputed missing values are carried out as sensitivity analyses (see Drummond et al., 1997).
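The cut-off sensitivity analysis described above can be sketched as follows: dichotomise a continuous score at several cut-offs and check whether the exposure-outcome association is stable. The data, and the 0.5 continuity (Haldane) correction for empty cells, are illustrative choices rather than prescriptions:

```python
# Repeat a 2x2 odds-ratio analysis at several cut-offs for the
# (hypothetical) continuous outcome score.
data = [  # (pain score, exposed 0/1)
    (1, 0), (2, 0), (3, 0), (2, 0), (5, 0), (4, 0), (1, 0), (3, 0),
    (4, 1), (6, 1), (5, 1), (2, 1), (7, 1), (5, 1), (3, 1), (6, 1),
]

def odds_ratio(cut):
    # 2x2 table with case = score >= cut; 0.5 added to every cell
    # (Haldane correction) so empty cells cannot break the ratio.
    a = sum(1 for s, e in data if e == 1 and s >= cut) + 0.5  # exposed cases
    b = sum(1 for s, e in data if e == 1 and s < cut) + 0.5   # exposed non-cases
    c = sum(1 for s, e in data if e == 0 and s >= cut) + 0.5  # unexposed cases
    d = sum(1 for s, e in data if e == 0 and s < cut) + 0.5   # unexposed non-cases
    return (a * d) / (b * c)

for cut in (3, 4, 5):
    print(f"cut-off >= {cut}: OR = {odds_ratio(cut):.2f}")
```

If the odds ratios point in the same direction across plausible cut-offs, the conclusion is robust to the dichotomisation; large swings would be reported and discussed.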


5. Details

Audit questions
- Have post-hoc tests been carried out following the omnibus tests? If not, why not?
- Have sensitivity analyses been carried out? Would it still be useful to do this for some of the variables?
- Are cost variables being used? Are sensitivity analyses needed for these?

6. Appendices/references/links
Drummond MF, O'Brien BJ, Stoddart GL and Torrance GW. Methods for the Economic Evaluation of Health Care Programmes. 2nd ed. Oxford, New York, Toronto: Oxford University Press, 1997.

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 23 Apr 2007.


Rev. 1.1, effective 1 Jan 2010 (avs)

Data analysis documentation

HB Nr.: 1.4-05

1. Aim
To ensure that the analyses can be properly reproduced.

2. Definitions

3. Keywords

4. Description
For the reproducibility and efficiency of data analysis it is important that the analysis is clearly documented. This may be done by creating a text file for all the relevant analyses, for instance in Word. This text file needs to include both the relevant control file (with clear information about all the steps taken) and the output (with clear information on all results). The text file needs to start with the research question to be answered and the date of the analysis, and should end with a (provisional) answer to the question. See the details for an example.

5. Details
SPSS syntax can be used to document your analyses (e.g. for an article) so that you and others can easily retrieve and reproduce everything. Text can be included in syntax files (as a kind of analysis logbook). Place your analyses in a logical order (e.g. first all the analyses for Table 1, then Table 2, etc.). Don't forget to always include GET FILE, so that you know which data file belongs to your analysis (and where it is stored). A Dutch example of this can be found here. Tip: a PDF writer can be used to store the output in PDF format; this saves on paper.

Audit question
Does the documentation of the analysis contain the following elements: research question, control file with clear explanations, output with clear explanations, and an answer to the research question?

6. Appendices/references/links

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 21 Apr 2004: Title modified: Documentation instead of Report. Details added with an example of documented syntax.
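The logbook structure recommended above is not specific to SPSS syntax; the same elements can be recorded in any scripting language. Below is an illustrative Python sketch (not from the guideline): the file name, research question and baseline numbers are hypothetical.

```python
# Illustrative sketch: the logbook structure the guideline recommends
# for SPSS syntax files, written as a self-documenting Python script.
# The research question, data file name and data are all hypothetical.
from datetime import date
from statistics import mean

log = []
log.append("Research question: Does frequent bending at work increase "
           "the risk of low back pain?")
log.append(f"Date of analysis: {date.today().isoformat()}")

# Equivalent of the SPSS GET FILE step: record which data file is used.
log.append("Data file: backpain_T1.sav (hypothetical)")

# Analyses in a logical order: first Table 1 (baseline), then Table 2, etc.
ages = [34, 41, 29, 50, 38]            # made-up baseline data
log.append(f"Table 1: mean age = {mean(ages):.1f}")

# End with a (provisional) answer to the research question.
log.append("Provisional answer: to be completed after the main analysis.")

print("\n".join(log))
```

The point is the order of the elements (question, date, data source, analyses, answer), not the language used.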


Rev. 1.1, effective 1 Jan 2010

Reporting results in tables and figures

HB Nr. : 1.4-06

1. Aim
To present the results of your analysis in a clear and well-organised way.

2. Definitions

3. Keywords
Graphs, figures, tables

4. Description

Graphs and tables
It is important to present your results in a clear and well-organised way in tables and graphs, since this will contribute significantly to the attractiveness of your article, poster or PowerPoint presentation. The choice between presenting results in a table or a graph depends on the aim, the number of variables, the analysis methods and personal preferences. Some journals have a fixed policy on the number and design of tables and graphs, usually a maximum of 5 to 6 tables or figures; this should be taken into consideration when writing your article. See examples of guidelines in the details.

Tables and graphs need to be produced in such a way that the reader is able to understand them without having to read any additional text. The title needs to be informative, and the rows and columns of the tables or the axes of the graphs need to be properly labelled. All abbreviations used need to be explained in full in a footnote below the table or graph. In general, tables are appropriate when you want to display the exact numbers from your analyses; graphs are more appropriate for displaying trends or associations.

It is common practice to have the tables and figures follow a specific order in an article. Table 1 is the baseline table with the most important features of the study population. The results of the analyses of the primary outcome measures are usually displayed in Table 2 (or Figure 1). The remaining tables/figures follow after this.


5. Details
Almost every results section in an article starts with a paragraph about the recruitment of research participants. These days, when describing an RCT, the majority of medical journals require a patient flow chart to be included in the article. This represents how many patients were approached, which ones were selected and excluded (and the exclusion criteria), the dropouts, and the number of patients ultimately remaining who participated in the trial. This will usually be Figure 1 in the article. For other articles these details can be presented in the text. Ensure that the numbers add up and that no participants appear to have disappeared (always ask someone to read through the article to check whether it is clear).

A flow chart is also recommended for a systematic review, reflecting how many articles have been scanned, how many full-text articles have been requested and how many articles have been included (see the systematic review guideline). A flow chart can also be useful in clarifying a complex treatment protocol.

The baseline table (usually Table 1) is intended as a description of your research population. This will include the sociodemographic variables of your research population, such as age, gender and educational level. It will also contain the most important clinical characteristics describing your population, such as the severity of the disorder and general health status. Finally, all baseline values of the determinants, outcomes and potential prognostic variables will be included as well. The mean, number of observations and standard deviation can be displayed in the baseline table (or the median and range for ordinal data or data that are not normally distributed). When including effect estimates (e.g. when comparing two study populations in a trial) the effect estimate (e.g. mean difference, relative risk or odds ratio) should always be accompanied by its 95% confidence interval.
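For example, an odds ratio and its 95% confidence interval can be obtained from a 2x2 table via the standard error on the log-odds scale (Woolf's method). A minimal sketch with made-up counts (not from the guideline):

```python
# Illustrative sketch: odds ratio with a 95% confidence interval from a
# 2x2 table, using the standard error of ln(OR) (Woolf's method).
# The counts below are invented for illustration.
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """a, b = cases/non-cases in the exposed group; c, d = in the unexposed group."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)   # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

or_, lo, hi = odds_ratio_ci(30, 70, 15, 85)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

In a table, this would be reported as a single cell, e.g. "2.43 (1.21 to 4.87)", which makes a separate p-value column unnecessary.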
For (multiple) linear regression analysis the regression coefficient(s) (B) should always be reported, along with the standard error(s) or a confidence interval. The p-value may also be included, although this is not necessary if you present confidence intervals. For (multiple) logistic regression analyses the odds ratio(s) and the 95% confidence interval are usually reported. For an association model (e.g. what is the effect of alcohol use on developing a cardiac infarction?) it is advisable to include both the crude effect estimates (e.g. odds ratio with 95% confidence interval) and the corrected effect estimates (e.g. corrected for age and gender). For a prognostic model (e.g. what predicts the level of recovery after 6 months?) a measure of how well the model works needs to be reported along with the regression coefficients, e.g. the percentage of variance explained or the discriminative power (area under the ROC curve). For a prognostic model it is also

necessary to properly describe the strategy used in selecting the variables and the criteria for including variables in the model. N.B.: please refer to the postgraduate course in logistic regression for more information about the difference between association and prognostic models.

6. Appendices/references/links
Scientific style and format: the CBE manual for authors, editors, and publishers. 6th ed. Style Manual Committee, Council of Biology Editors. New York: Cambridge University Press, 1994.
Iverson C, Flanagin A, Fontanarosa PB, et al. American Medical Association manual of style: a guide for authors and editors. 9th ed. Hagerstown, Maryland: Lippincott Williams & Wilkins; 1997.

7. Amendments
V1.1: 1 Jan 2010: English translation.
V1.0: 31 Jan 2008.


Guidelines for reporting specific types of studies

Rev. 1.3, effective 1 Dec 2011 (mp)

HB Nr. : 1.4-07

1. Aim
To present the necessary details for correct interpretation of the published results.

2. Definitions

3. Keywords
RCTs, meta-analyses, diagnostic study, observational study

4. Description
For each type of study it is strongly advised to follow the international standards or statements. There are statements for RCTs, meta-analyses, diagnostic and observational studies:

CONSORT (www.consort-statement.org)
The CONSORT statement is intended to improve the reporting of RCTs, to enable readers to understand the trial design and correctly interpret the results.

PRISMA (http://www.prisma-statement.org/)
PRISMA stands for Preferred Reporting Items for Systematic Reviews and Meta-Analyses. It is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. The aim of the PRISMA statement is to help authors improve the reporting of systematic reviews and meta-analyses.

QUOROM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=10584742&dopt=Citation)
The QUOROM (Quality Of Reporting Of Meta-analyses) statement is specifically intended for the reporting of meta-analyses of RCTs.

STARD (http://www.consort-statement.org/stardstatement.htm)
The STARD statement is specifically intended for the accurate reporting of diagnostic studies.

MOOSE (http://www.meduohio.edu/lib/instr/pdf/MOOSE.pdf)
The MOOSE (Meta-analysis Of Observational Studies in Epidemiology) statement is intended for the reporting of meta-analyses of observational studies.

STROBE statement
The STROBE statement (Strengthening the Reporting of Observational Studies in Epidemiology) is a good checklist for preparing a publication of an observational study. The statement has been developed for cohort, case-control and cross-sectional study designs.
Anybody using this type of design is advised to employ the STROBE checklist: Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): guidelines for reporting observational studies. The Lancet, vol 370, 20 Oct 2007, p. 1453-1457. The explanation of the checklist items is described in a separate publication: STROBE explanation and elaboration. See also www.strobe-statement.org

COREQ
Consolidated criteria for reporting qualitative research (COREQ): a 32-item checklist for interviews and focus groups. International Journal for Quality in Health Care 2007;19(6):349-357.


TREND statement
The TREND statement (Transparent Reporting of Evaluations with Nonrandomized Designs) is intended for the reporting of the theories used and descriptions of intervention and comparison conditions, research design, and methods of adjusting for possible biases in evaluation studies that use nonrandomized designs (Am J Public Health 2004;94:361-366).

In addition to these statements based on research designs, there are also statements developed for research in a specific field:

APA statement
The APA statement of the American Psychological Association includes a) standards for all journal articles, b) more specific standards for reports of studies with experimental manipulations or evaluations of interventions using research designs involving random or non-random assignment, and c) standards for articles reporting meta-analyses. American Psychologist 2008;63:839-51.

AERA statement
The AERA statement of the American Educational Research Association provides guidelines for reporting on empirical social science research in AERA publications. These guidelines apply to reports of education research grounded in the empirical traditions of the social sciences. They cover, but are not limited to, qualitative and quantitative methods. Educational Researcher 2006;35:33-40.

GRIPS
A checklist of 25 items recommended for strengthening the reporting of Genetic RIsk Prediction Studies (GRIPS): Strengthening the Reporting of Genetic RIsk Prediction Studies: the GRIPS statement, and Strengthening the Reporting of Genetic RIsk Prediction Studies (GRIPS): explanation and elaboration.

5. Details

6. Appendices/references/links

7. Amendments
V1.4: 1 Dec 2011: Addition of COREQ and GRIPS.
V1.3: 14 Feb 2011: Addition of the PRISMA statement.
V1.2: 11 Oct 2010: Addition of statements: TREND, APA and AERA.
V1.1: 1 Jan 2010: English translation and separation of statements from graphs and tables.
V1.0: 31 Jan 2008: Addition of the STROBE statement for observational studies.


Rev. 1.1, effective 1 Mar 2011 (mh)

Prognostic models

HB Nr.: 1.4-08
Authors: Martijn W Heymans, Tobias van den Berg, Danielle van der Windt, Caroline Terwee

1. Aim
To describe how a prognostic model can be developed and validated as thoroughly as possible.

2. Definitions
A prognostic model is a multivariable model consisting of a combination of predictors that is as strongly as possible associated with the outcome.

3. Keywords
Prediction, prognosis/prognostic, model, regression, validity

4. Description
This guideline describes the methods and techniques that are used to develop and validate prognostic models. The aim of a prognostic model is to estimate the probability of a particular outcome based on as few variables as possible. This may involve prognostic (risk or outcome) prediction (predicting the course of a disease), as well as aetiological models (predicting who will get the disease on the basis of risk factors) or diagnostic models (predicting the presence of the disease). The various steps to develop a prognostic model are provided in summary, from the selection of predictors to the testing of the external validity. For a few steps there is a choice between a basic yet simple approach and the use of more complex techniques. These options are summarised briefly in this guideline.

Contents of this guideline
- Introduction
- Preparation
  - Choice of predictors
  - Defining the outcome measure
  - Choice of model
  - Sample size and number of predictors
  - Linearity
  - Correlation between predictors
  - Handling missing values
- Developing a prognostic model
  - Preselecting predictors and building the model
    - Univariate and stepwise regression analysis
    - Least absolute shrinkage and selection operator (Lasso)
- The performance of the prognostic model
- Creating a prediction rule
- Validity
  - Internal validity
  - External validity


5. Details
A. Introduction
The aim of a prognostic model is to estimate (predict) the probability of a particular outcome as well as possible, and not just to explore the causality of the association between a specific factor and the outcome (explanatory). The way in which a prognostic model is developed therefore differs from the method for building an explanatory model. For an explanatory (causal) model there is normally a single central determinant and correction for confounding; when building a prognostic model the focus is on the search for a combination of factors that is as strongly as possible related to the outcome. Prognostic models are often developed for clinical practice, where the risk of disease development or disease outcome (e.g. recovery from a specific disease) can be calculated for individuals by combining information across patients. The model can then be presented in the form of a clinical prediction rule (1). It is often preferable for the variables in the model to be easily determined in practice, in order to ensure that the prognostic model is applicable in (clinical) practice.

B. Preparation

Choice of predictors
Prognostic models can be developed using a broad variety of biological, psychological and social predictors. The correct predictors need to be carefully selected. It is advisable to include all predictors which have been shown to be strongly associated with the outcome in previous research, or those which can be expected to show an association on the basis of conceptual or theoretical models. A proper systematic literature review and expert advice are important in this step. When the practical applicability of the prognostic model is important, it is preferable for predictors to be determined quickly and simply (e.g. no complex or invasive tests and no extensive questionnaires).

Defining the outcome measure
The outcome is central to the prognostic model and needs to be carefully selected.
Think carefully about the nature of the outcome (which concept), the method for determining the outcome (which measurement instrument, by whom) and the length of follow-up (which measurement time points). The outcome of a prognostic model is often dichotomous (e.g. ill or not ill), but it may also be a continuous outcome (for instance, the severity of functional limitations) or the time until a certain event occurs (time to event; for instance, the time until work is resumed or the time until death). When defining a dichotomous outcome, a cut-off point on a continuous scale is occasionally chosen. Bear in mind that this leads to a loss of information and should therefore only be considered if there are strong arguments for it. If the outcome is dichotomised, the cut-off needs to be carefully selected, preferably on substantive grounds and using a conceptual or theoretical model. For instance, at what point do we define whether or not there is a case of depression?

Choice of model
The choice of the statistical model to be used in creating the prognostic model depends on the definition of the outcome measure. A logistic regression model should be chosen for a dichotomous outcome. A Cox regression model can be used for a time-to-event outcome and a linear regression model for a continuous outcome measure. There are various other options, but these will not be discussed in this guideline.

Sample size and number of predictors
The precision of the estimates in the prognostic model is highly dependent on the size of the study population. There are different ways of generating power calculations for determining the minimal sample size of the study population. This, in particular, will determine the number of variables that can be included in the regression model. A rule of thumb is that for a continuous outcome measure (linear regression) you will need at least 10-15 participants per variable in the model.
For a dichotomous outcome (logistic regression) at least 10-15 "events" or "non-events" (whichever has the lowest number of participants) are needed per variable in the model (2). Events and non-events refer to whether or not the outcome

occurs, for instance, disease/no disease. The logistic regression rule also applies to Cox regression models. When dealing with the external validation of a prognostic model, the validation cohort (the cohort used to externally test the model) also needs a sufficient number of participants; the 10-15 participants rule applies here as well.

Linearity
The regression models discussed in this guideline presuppose a linear relationship between predictor and outcome. However, more often than not this relationship is non-linear. An example is the relationship between alcohol consumption and the risk of developing a cardiac infarction, which is U-shaped. One therefore needs to consider investigating, for all potential predictors (with the exception of nominal or dichotomous variables; nominal variables should always be included as dummy variables), whether the relationship with the outcome measure is indeed linear. However, a balance must be sought between a data-driven search for non-linearity that is idiosyncratic to the sample and forms that genuinely apply to the population. Most important is not the exact form of the relationship but the gain in predictive performance. There are various options for investigating non-linearity, including spline functions. More information about the various methods for investigating linearity will be available in the EpidM course Prediction modelling that will start in 2012.

Spline functions
Spline functions are mathematical functions that can be used to explore the relationship between a predictor and the outcome in more detail when it is non-linear. They do not assume a linear relationship if this is not present, but follow the pattern of the data more closely.
If there is a non-linear relationship between the predictor and the outcome, this can be included as a function in the regression model. The advantage is that this reduces the power of the regression model less than categorising the variable and including it as dummy variables, which is often done when the relationship is non-linear. Contact Martijn W Heymans for more information about spline functions and how to apply them.

Correlation between predictors
A strong correlation between two variables will affect the selection of both predictors. It is therefore sensible to generate a correlation table including all potential predictors. When variables are strongly correlated (e.g. >0.70), it is sensible to choose which of the variables you are going to use in building the model, or whether you intend to combine them into a single variable. For instance, you could choose the variable most strongly associated with the outcome measure, or the one that is easiest to measure. N.B.: a strong correlation between the dependent variable and an independent variable is not a problem in a single model. Problems arise when forward or backward selection takes place in combination with strongly correlated (independent) variables.

Handling missing values
There will be dropouts and missing values in virtually every cohort study.¹ Dropouts are participants who do not (or no longer) take part in follow-up assessments and whose outcome measures are therefore missing. The number of and reasons for dropouts need to be described. If possible, the personal characteristics of the dropouts should also be described and compared with those of the participants who did take part in the follow-up assessments, in order to investigate whether selective dropout took place. In addition to dropouts there are often also (incidental) missing values, where the results of one or more predictors are missing for some of the participants.
There are various strategies for dealing with missing values. One of these is to use only the data of participants with a complete dataset (complete case analysis). In the most ideal case, where values are missing completely at random, this only means that the coefficients are estimated less precisely. In less ideal cases, i.e. missing at random or missing not at random, this method will have a negative effect on the composition of the model and the regression coefficient estimates. This method is therefore strongly discouraged. It is possible to impute missing values in a dataset. There are various methods available for this, including imputing an average value or imputing a value estimated from regression methods; however, the use of these simple techniques is also strongly discouraged. Multiple imputation is considered to be one of the best methods. It is common practice for an expert or a statistician to be consulted when applying these techniques (Martijn W Heymans can be consulted for this). Make sure that the numbers of dropouts and missing values are always described in your study. For detailed information on techniques to evaluate and handle missing data, please refer to the missing data guideline in the quality handbook.

¹ Dropouts will not arise during the research in patient-controlled studies. However, there may of course be missing values. The same solutions as described above apply.

Developing the model

Preselecting predictors and building the model
Once a set of predictors has been selected, the next step is to create the prognostic model. It is important in this process to distinguish between relevant and less relevant predictors, so that the final model can be developed with as few predictors as possible while still producing reliable predictions. The following techniques can be used for developing a prognostic model.

1. Univariate and stepwise regression analysis

Selecting variables
Firstly, the relationship of each individual predictor with the outcome measure is investigated in a model that only includes that predictor and the outcome measure (univariate). The relationship between the predictor and the outcome is evaluated against a specific p-value: 0.20, or lower, is often used for this. If the predictor has a lower p-value, it can be considered relevant and included in the next step. The importance of each predictor to the prognostic model can be explored in this way.
Should too many variables be retained in this pre-selection phase, then you can be stricter in the level of selection, i.e. choose a lower p-value, e.g. p < 0.1 or p < 0.05. An important note is that the pre-selection of predictors based on univariate statistical significance is arbitrary. It is better to make use of previous research and expert opinion for the first selection of predictors, without relying too much on statistical pre-selection alone.

You may also choose to work with groups of variables. For instance, you could first generate the model on the basis of all easily obtainable variables (e.g. details from the case history). The most important predictors can then be selected from this group of variables (see Building the model). You can then add the next group of variables (e.g. details from the physical examination), select the most important predictors from this group plus the variables retained from the previous group, and so on.

Building the model
The options here are to use a forward or backward selection method, or a combination of the two (stepwise regression). Forward and backward selection methods select the predictors for the model step by step: in forward selection you add variables to the model, whereas in backward selection you remove variables from the model. The backward selection method is preferred, as it leads to fewer errors in the estimates for the predictors and in selecting the most relevant predictors; for these reasons this method is discussed in more detail here. N.B.: selecting predictors by using forward or backward selection techniques will always generate more problems than selecting variables on the basis of previous research (prospective or systematic literature reviews) or by consulting clinical experts, for instance choosing important variables on the basis of a Delphi procedure.
It is therefore advisable to use forward and backward selection techniques as little as possible.
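If backward selection is nevertheless used, its logic can be sketched as a simple loop: refit the model, drop the predictor with the highest p-value, and repeat until every remaining p-value is below the chosen threshold. The Python sketch below is illustrative only; `fit_and_get_pvalues` is a hypothetical stand-in for refitting the regression model (the guideline assumes SPSS), with made-up p-values that, unlike in a real refit, do not change between steps.

```python
# Hedged sketch of manual backward selection: repeatedly drop the
# predictor with the highest p-value until all remaining p-values are
# below the chosen threshold (0.10 or 0.20 in prognostic models).
# fit_and_get_pvalues() is a hypothetical stub; real p-values would be
# recomputed by refitting the model after each removal.

def fit_and_get_pvalues(variables):
    fake_p = {"age": 0.01, "gender": 0.18, "smoking": 0.04,
              "bmi": 0.35, "income": 0.62}
    return {v: fake_p[v] for v in variables}

def backward_selection(variables, threshold=0.20):
    variables = list(variables)
    while variables:
        pvals = fit_and_get_pvalues(variables)   # re-run the model
        worst = max(variables, key=lambda v: pvals[v])
        if pvals[worst] < threshold:             # all p-values below threshold
            break
        variables.remove(worst)                  # drop the weakest predictor
    return variables

print(backward_selection(["age", "gender", "smoking", "bmi", "income"]))
```

With a stricter threshold of 0.10, "gender" (p = 0.18) would also be removed, which illustrates how sensitive the final model is to this arbitrary choice.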


In backward selection all the selected variables are first entered into the model at the same time. Subsequently, the variable with the highest p-value (i.e. the variable contributing the least) is manually removed on the basis of the Wald test (which allows you to calculate the significance level of a predictor), and the model is re-run. This step is repeated until there are no variables left with a p-value larger than 0.10 or 0.20. A p-value of 0.10 or 0.20 is commonly used in prognostic models, as variables that are less strongly associated with the outcome may still make a relevant contribution to the prediction. Following this procedure, it may sometimes be informative to add specific variables that did not end up in the final model (but perhaps were expected to), to assess whether they make a significant contribution to the final model; this is occasionally successful. It may also be interesting to interchange correlated variables (e.g. replacing a variable with one that is easier to measure), to assess whether this generates an equivalent but more easily applicable model.

2. Least absolute shrinkage and selection operator (Lasso)
The Lasso is an advanced technique for the selection of variables. The Lasso is able to shrink regression coefficients to zero, which is the same as not selecting those variables in a multivariable analysis. The Lasso combines this shrinkage with variable selection and so does not need a separate shrinkage step (for more on shrinkage see paragraph G below). Furthermore, with the Lasso the number of potential prognostic variables to select from can be much larger than with normal backward selection. To learn more about this technique and how to apply it, contact Martijn W Heymans. The method is promising but has not yet been applied much in epidemiological studies.
E. The performance of the prognostic model
Once you have developed a prognostic model, it is also important to investigate how well the model works, that is to say, how well the model predicts the outcome. The section below describes which techniques, depending on the choice of model, can be used to test how well your prognostic model works (1).

Linear regression
The percentage of variance explained (R2): this indicates the percentage of the total variance of the outcome measure that is explained by the predictors in the prognostic model.

Logistic and Cox regression
Calibration: calibration can be used to assess how well the observed probability of the outcome agrees with the probability predicted by the model. This can also be presented graphically in a calibration plot, in which groups of predicted probabilities of the outcome are plotted against groups of observed probabilities (groups of 10 are often used). You can then assess the extent to which these groups lie along the perfect calibration line, which forms a 45-degree angle with the horizontal axis. The Hosmer-Lemeshow test can also be used to investigate how well the predicted probabilities agree with the observed probabilities. This test should not be statistically significant (null hypothesis: there is no difference between predicted and observed values).

Discrimination: this indicates how well the model discriminates between people with and without the outcome. If there are few predictors in the model, many people will fall into the same group of predicted probabilities and the model will not be able to discriminate very well between groups. If there are numerous predictors in the model, few people will fall into the same group and the model will have better discriminatory power. An ROC curve can be generated for the predicted probabilities to determine the level of discrimination.
The Area Under the Curve (AUC) of the ROC curve is a measure of the discriminatory power of the model, that is, how well the model is able to discriminate between people with and without the outcome based on the predicted probabilities (3). An AUC of 0.5 indicates that the model does not discriminate at all (no better than tossing a coin); an AUC of 1.0 indicates perfect discrimination.
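The AUC can also be understood as the probability that a randomly chosen person with the outcome receives a higher predicted probability than a randomly chosen person without it. A minimal sketch of this pairwise interpretation, with invented predictions (not from the guideline):

```python
# Illustrative sketch: AUC computed by comparing every case/non-case
# pair (equivalent to the Mann-Whitney U statistic). The predicted
# probabilities and outcomes below are made up for illustration.

def auc(y_true, y_prob):
    """Probability that a random case outranks a random non-case; ties count 0.5."""
    cases    = [p for y, p in zip(y_true, y_prob) if y == 1]
    controls = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if c > nc else 0.5 if c == nc else 0.0
               for c in cases for nc in controls)
    return wins / (len(cases) * len(controls))

y_true = [1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(auc(y_true, y_prob))   # 1.0 = perfect discrimination, 0.5 = coin toss
```

Statistical packages compute the same quantity from the ROC curve; the pairwise formulation merely makes its meaning explicit.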

Reclassification tables: this is a novel method to evaluate the performance of a prediction model and can be seen as a refinement of the discrimination obtained by the ROC curve (4). The method is especially useful for detecting an improvement in discrimination when a new variable is added to an existing prediction model. It makes use of the reassignment of subjects with and without the outcome to their corresponding risk categories. When a new variable is added to the model and prediction improves, subjects with the outcome are reassigned to a higher risk category; this means improved reclassification. When subjects with the outcome are reassigned to lower risk categories, reclassification has worsened. For subjects without the outcome it works in the opposite direction. The Net Reclassification Improvement (NRI) and the Integrated Discrimination Improvement (IDI) can be used to test the significance of the reclassification and to create confidence intervals.

F. Creating a prediction rule
For logistic and Cox regression models the regression coefficients can be used to calculate the outcome (predicted probabilities) based on individual patient characteristics (the values of the determinants). The regression coefficients can be transformed into risk scores in order to facilitate use of the prediction rule in practice. A frequently used method is to divide the regression coefficients by the lowest value, or to multiply the coefficients by a constant, for instance 10. A score card containing these scores can then be generated to allow the probability of the outcome to be easily calculated for a given individual; this is easy to use in practice. Refer to the article by Kuijpers et al. (2006) for an example (5). Another option is to create a mathematical algorithm and install it on a website.

G. Validity
This is perhaps the most important part of developing a prediction rule.
Prediction models commonly perform better in the dataset used to develop the model than in new datasets (subjects). This means that the model's regression coefficients and performance measures are too optimistic and have to be adapted to new situations (1, 6). One way to adapt a prediction model is to shrink (i.e. make smaller) the regression coefficients before the model is applied to new subjects. Internal and external validation are used to estimate the amount of optimism. In other words, validating the model explores how well the predictions generated by the prognostic model agree with predictions for future patients, or for comparable patients who were not part of the study population. Determining the validity of a prediction rule can be achieved in a number of ways, which are discussed briefly below. A good reference for a more comprehensive overview is Vergouwe et al. (7). A distinction is made between internal and external validity when validating a prediction rule.

Internal validity
For internal validity the model is developed and validated using exactly the same dataset of patients. Techniques that can be used to determine internal validity include data-splitting (where the dataset is split in two at random), cross-validation (where the dataset is split into more than two parts at random) and bootstrapping (a type of simulation technique). The last method is recommended, as it makes efficient use of all the data.

External validity
For external validity a model is developed in one cohort of patients and its validity is determined using another cohort of comparable patients. The previously described measures, such as explained variance (R2), calibration and discrimination, are used to determine validity.

Contacts: If you would like more information about developing and/or validating prediction rules, please contact Martijn W Heymans. A new EpidM course on prediction modelling will also start in 2012.

Audit questions

1. Was the selection of the predictors based on a literature search and advice from experts?
2. Has the outcome measure been clearly defined?
3. Have dropouts and missing values been described, and have their potential consequences been discussed in the research report (have missing values been dealt with in a sensible way, e.g. multiple imputation)?
4. Is the sample size of the study population sufficient?
5. Has linearity been assessed for all potential predictors?
6. Has a correlation table been created for all potential predictors?
7. Has (manual) backward selection been used for building the model?
8. Has the model quality been assessed? If possible, have calibration and discrimination been assessed?
9. Was the prediction model validated?
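The bootstrap internal validation recommended in section G can be sketched as follows: for each bootstrap sample, refit the model, then compare its apparent performance (on the bootstrap sample) with its performance on the original data; the average gap estimates the optimism. This scikit-learn sketch uses simulated data and illustrative variable names:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = (X[:, 0] + rng.normal(size=150) > 0).astype(int)

# Apparent performance: model fitted and evaluated on the same data.
apparent = roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1])

optimism = []
for _ in range(100):
    idx = rng.integers(0, len(y), len(y))  # sample with replacement
    m = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_orig = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_orig)

corrected = apparent - np.mean(optimism)  # optimism-corrected AUC
```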

6. Appendices/references/links
1. Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer, 2001. (A new edition will be available in June/July 2011.)
2. Peduzzi P, Concato J, Feinstein AR, Holford TR. Importance of events per independent variable in proportional hazards regression analysis. II. Accuracy and precision of regression estimates. J Clin Epidemiol 1995;48(12):1503-10.
3. Harrell F, Lee K, Mark D. Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 1996;15:361-87.
4. Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology 2010;21(1):128-38.
5. Kuijpers T, van der Windt DA, Boeke AJ, Twisk JW, Vergouwe Y, Bouter LM, van der Heijden GJ. Clinical prediction rules for the prognosis of shoulder pain in general practice. Pain 2006;120(3):276-85.
6. Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. New York: Springer Science+Business Media, 2009.
7. Vergouwe Y, Steyerberg EW, Eijkemans MJ, Habbema JD. Validity of prognostic models: when is a model clinically useful? Semin Urol Oncol 2002;20:96-107.

7. Amendments
V1.0: 1 Jan 2010. English translation.
V1.1: 1 Mar 2011. Several textual changes and additions; replacement of bootstrapping by the Lasso technique; addition of reclassification tables; more emphasis on validating the model; updated references.



Handling Missing Data

HB Nr. : 1.4-09 1. Aim To give researchers a structured guideline for handling missing data 2. Definitions

3. Keywords Missing data, Missing completely at random, Missing at random, Missing not at random, Imputation.

4. Description
4.1 Introduction
Missing data is a common problem in all kinds of research. The way you deal with it depends on how much data is missing, the kind of missing data (single items, a full questionnaire, a measurement wave), and the reasons that the data are missing. Handling missing data is an important step in several phases of your study.

4.2 Why do you need to do something with missing data?
The default option in SPSS is that cases with missing values are not included in the analyses. Deleting cases or persons results in a smaller sample size and larger standard errors. As a result, the power to find a significant result decreases and the chance that you correctly accept the alternative hypothesis of an effect (as opposed to the null hypothesis of no effect) is smaller. Secondly, you may introduce bias in effect estimates such as mean differences (from t-tests) or regression coefficients (from regression analyses). When the group of non-responders is large and you delete them, your sample characteristics will differ from your original sample and from the population you study, because responders and non-responders may differ in their characteristics. Therefore you need to inspect the missing data before doing further analyses. Thus, always check the missing data in your dataset before starting your analyses, and never simply delete persons with missing values (the default option in SPSS).

4.3 What to do with missing data in different phases of your study
Data preparation: If you work with questionnaires, make sure that all questions are clear and applicable to your respondents. If necessary, provide a 'not applicable' answer option.
To decrease the chance of missing data, use digital applications to collect your data, such as web-based questionnaires where you can make answering a question required. You can also use these applications for sending reminders and tracking the respondents' progress. If you work with physical or physiological data, the most frequent cause of missing data is a technical problem with the instruments. Testing the instruments in a pilot study will partly protect you from these problems.

Data collection: Closely monitor the completeness of the data when you receive or obtain them. When you detect missing data during data collection, try to complete your data: look back in the raw data (questionnaires), or ask your respondents to fill out the missing items. Describe in your logbook why data are missing; this helps you decide later whether data are missing at random or not.

Data processing: Investigate how much data is missing (see 4.4), estimate the need for imputation, and think about the most adequate imputation method (see 4.5 and further).

Data analyses: If you have missing values in your dataset when starting your analyses, remember that casewise and listwise deletion (the default in SPSS regression and ANOVA procedures) may hamper the reliability of your results (see 4.2).
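Investigating the amount of missing data (the subject of section 4.4 below) can also be done outside SPSS. A small pandas sketch with made-up data, showing the per-variable, per-case and overall percentages:

```python
import numpy as np
import pandas as pd

# Tiny illustrative dataset; column names are hypothetical.
df = pd.DataFrame({
    "age":    [34, 51, np.nan, 45, 29],
    "income": [np.nan, 2800, 3100, np.nan, 2200],
    "pain":   [1, 0, 1, 1, np.nan],
})

per_variable = df.isna().mean() * 100              # % missing per variable
per_case = df.isna().any(axis=1).mean() * 100      # % cases with any missing value
overall = df.isna().to_numpy().mean() * 100        # % of all values missing
print(per_variable, per_case, overall)
```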

4.4 How much data is missing?
SPSS can help you identify the amount of missing data. When you are interested in the percentage of missing values for each variable separately (e.g. each item on a questionnaire), use the Frequencies option in SPSS:
1. Select Analyze > Descriptive Statistics > Frequencies.
2. Move all variables into the Variable(s) window.
3. Click OK. The Statistics box tells you the number of missing values for each variable.
However, be aware that this only gives you information about the percentage of missing values for each variable separately. It is more important to study the overall percentage of missing data, especially when you use more variables in your analysis. When you are interested in the overall percentage of missing data, use the following option:
1. Select Analyze > Multiple Imputation > Analyze Patterns.
2. Move all variables into the Variable(s) window.
3. Click OK. The output tells you the percentage of variables with missing data, the percentage of cases with missing data, and the number of missing values. The final pie chart tells you the overall percentage of missing data; note the 5% borderline. Patterns of missing data are also presented.
4. Tip: use the Help button and click 'show me' for more information about the options and output in SPSS.
When you want to find out more about the patterns of missing data and the relation between missing data on different variables, use the following option:
1. Select Analyze > Missing Value Analysis.
2. Move all variables of interest into the Quantitative or Categorical Variable(s) window.
3. Use the Patterns button to get information about the relation between missing data on several variables.
4. A tutorial on the Missing Value Analysis procedures in SPSS (version 16 and later) can be found via the Help button. A user's guide can be downloaded freely from the internet.

4.5 What kind of data is missing?
The next step is to identify the kind of data that is missing; you can derive this information from the steps described in 4.4.
1. A single item, or several items of a questionnaire.
2. A full questionnaire or a single variable (such as blood pressure).
3. A measurement wave (in longitudinal / randomized studies).
The way you deal with missing data depends on the type of missing data.

4.6 What type of missings do you have?
Missing values are either random or non-random. Random missing values may occur because the subject accidentally did not answer some questions; for example, the subject may be tired and/or not paying attention and misses the question. Random missing values may also result from data entry mistakes. Non-random missing values may occur because subjects purposefully do not answer some questions. For example, the question may be confusing, so respondents do not answer it. The question may also fail to provide appropriate answer choices, such as 'no opinion' or 'not applicable', so the subject chooses not to answer. Subjects may also be reluctant to answer some questions because of social desirability concerns about the content, such as questions about sensitive topics like income, past crimes, sexual history, or prejudice toward certain groups. Think about your dataset: is it possible that the missing values are non-random?


Rubin developed a typology for missing data in 1976:

MCAR (Missing Completely At Random): The data are MCAR when the probability that a value for a certain variable is missing is unrelated to the values of other observed variables and unrelated to the variable with missing values itself. An example is when respondents accidentally skip questions. In other words, the observed values in your dataset are just a random sample of what your dataset would have been had it been complete.

MAR (Missing At Random): The data are MAR when the probability that a value for a certain variable is missing is related to observed values on other variables. An example is when older respondents have more missing values than younger respondents; within the groups of older and younger respondents, the data are then still MCAR. Another example is when respondents with low scores on the first wave are not invited for a second wave. MAR is the mechanism most often seen in practice.

MNAR (Missing Not At Random): The data are MNAR when the probability that a value for a certain variable is missing is related to the scores on that variable itself. An example is respondents with a low income intentionally skipping the income question because they feel it violates their privacy. In that case the probability that an observation is missing depends on information that is not observed (the value of the income score), because only low values are missing. MNAR is a serious problem, which cannot be solved with a technique such as multiple imputation.

How do you know what kind of missings you have? There are three kinds of methods.
1. First, you can inspect the data yourself. Are the missings equally distributed across the data? Are low and/or high scores missing? If the missings are not equally spread, this may be an indication that the data are MNAR. With this method you must know a priori what the distribution of the variable normally looks like (normal or skewed) before you can judge which part of the data suffers from missing values. This method only applies if your dataset is large.
2. Second, SPSS can test whether the respondents with missing data differ from the respondents without missing data on important variables (Analyze > Missing Value Analysis > select important variables > Descriptives > t-tests formed by indicator variables). A significant difference is an indication for MAR. Be aware that if your sample size is large (>500), this t-test may be significant even when the actual difference is trivial, so simply looking at the means and their difference may be good enough. If the mean difference is very small, this may be an indication of MCAR.
3. In SPSS (Analyze > Missing Value Analysis > EM button) it is also possible to test for MCAR data; this is called Little's test. A tutorial on the Missing Value Analysis procedures in SPSS (version 16 and later) can be found via the Help button.
It is important to note that you are not able to test whether your missing data are MAR or MNAR; procedures 1 and 2 above only give an indication. Pay attention to the possibility of MNAR, because all analyses have serious problems when your missing data are MNAR.

4.7 How to handle missing data?
Missing data is random:

For MCAR and MAR, many missing data methods have been developed over the last two decades (Schafer & Graham, 2002). Although MCAR seems to be the least problematic mechanism, deleting cases can still reduce the power to find an effect. It is argued that the MAR mechanism is most frequently seen in practice: most research concerns multifactorial or multivariable problems, so when data on a variable are missing, this is usually related to other variables in the dataset.

Missing data is not random:
For MNAR, imputation is not sufficient, because the missing data are systematically different from the available data, i.e. your complete cases have become a selective group of persons. If you think your data are MNAR, it is wise to contact a statistician from EMGO+ who is willing to help you.

For MCAR and MAR there are roughly two kinds of imputation techniques: single and multiple imputation. Single imputation is possible in SPSS and is an easy way to handle missings when just a few values are missing (less than 5%) and you think your missing values are MCAR or MAR. However, after single imputation the cases are more similar to each other, which may result in an underestimation of the standard errors, i.e. confidence intervals that are too narrow. This increases the chance of a type I error (the null hypothesis of no effect is rejected while there truly is no effect). This method is therefore less adequate when you have more than 5% missing data. Multiple imputation is more complex, but is also implemented in SPSS 17.0 and later versions. Multiple imputation takes the uncertainty of the missing values into account and is therefore preferred over single imputation. When your missingness is high (exceeds 5% in several variables and different persons), multiple imputation is more adequate.
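Before choosing an imputation strategy, the indicator t-test from section 4.6 (comparing cases with and without a missing value on other observed variables) can also be run outside SPSS. A sketch with simulated data that has a built-in MAR pattern; all variable names are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
age = rng.normal(50, 10, 300)
income = rng.normal(3000, 500, 300)

# Make income missing more often for older respondents (a MAR pattern).
p_missing = (age - age.min()) / (age.max() - age.min())
missing = rng.random(300) < p_missing
income[missing] = np.nan

# Missingness indicator for income, then a t-test on the observed age.
indicator = np.isnan(income)
t, p = stats.ttest_ind(age[indicator], age[~indicator])
# With this MAR pattern, mean age differs between the two groups.
```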
Imputation techniques

Single imputation
Single imputation techniques are based on the idea that in a random sample every person can be replaced by a new person, provided that this new person is randomly chosen from the same source population as the original person. In that case you can use the observed data of the other persons to estimate the distribution of the measurement in the source population. It is called single imputation because each missing value is imputed once. There are many methods for single imputation, such as replacement by the mean, regression imputation, and expectation maximization (EM). Expectation maximization is preferred, because with the other methods the variance and standard error are reduced and the chance of type II errors increases. Expectation maximization forms a missing data correlation matrix by assuming a distribution for the missing data and imputes missing values based on the likelihood under that distribution. Single imputation is possible in SPSS (Analyze > Missing Value Analysis > EM button). Contact a statistician from EMGO+ who is willing to help you with this procedure.

For the imputation of a missing score on a single item of a questionnaire (see 4.5), SPSS syntaxes can be found at http://www.tilburguniversity.edu/nl/over-tilburguniversity/schools/socialsciences/organisatie/departementen/mto/onderzoek/software/: tw.zip, software for two-way imputation in SPSS (Van Ginkel & Van der Ark, 2003a), and rf.zip, software for response function imputation in SPSS (Van Ginkel & Van der Ark, 2003b).
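SPSS's EM-based single imputation has no direct scikit-learn equivalent; as a loose analogue (an assumption on my part, not the guideline's procedure), a single run of a regression-based imputer fills in each missing value once. Simulated data:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
X[rng.random((100, 3)) < 0.05] = np.nan   # ~5% of values missing

# One deterministic run = single imputation (each missing filled once).
X_imp = IterativeImputer(random_state=0).fit_transform(X)
```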


Multiple imputation (MI)
The difference from single imputation is that in MI each missing value is imputed several times, so that several imputed datasets are created. The different imputations are based on random draws from different estimations of the underlying distribution in the source population. In this way the imputed values come from different distributions and are therefore less alike: more uncertainty is built into the dataset, and the standard error increases accordingly. The number of imputations needed depends on the amount of missing data, but mostly 5 to 10 imputations are enough. A drawback of this method is that several imputed datasets are created and the statistical analysis has to be repeated in each dataset; finally, the results have to be pooled into a summary measure. Most statistical packages can do this automatically. Multiple imputation is possible in recent versions (17 and later) of SPSS (Analyze > Multiple Imputation > Impute Missing Data Values). For more information see the references. Contact a statistician from EMGO+ who is willing to help you with this procedure.

Sensitivity analysis
After imputation, a sensitivity analysis is needed to determine how your substantive results depend on how you handled the missing data. Follow these steps:
1. Do a complete case analysis (the default option in SPSS; cases with missings are not included).
2. Do the same analysis on the imputed data.
3. Compare the substantive conclusions and decide how to report them.

When is imputation of missing data not necessary?
1) When your missing data are MCAR or MAR and you use maximum likelihood estimation techniques in analyses such as Structural Equation Modelling (SEM) or Linear Mixed Models (LMM), imputation of missing data is not necessary. These techniques use the available data, ignore the missing values, and still give correct results. In such situations you do not have to use an extra imputation technique to handle your missing values.
Missing data that are MNAR remain a problem for these methods.
2) A different approach may be used for descriptive studies. If you want to show the (observed) study data (means and standard deviations), for example to compare them with other countries/settings, without directly linking them to a conclusion, imputation is not immediately needed. The evaluative statistics (t-tests, regressions, etc.), however, will by default be run as complete case analyses. So if you use statistical tests to compare the descriptives, imputation is needed (depending, of course, on the amount and type of missing data): in that case you link your descriptives to a conclusion and want a corrected p-value / 95% CI, and therefore you need to use the data with imputed values. Do not forget the reviewer, who may have problems with the use of imputed and non-imputed data in one paper; be clear about imputation and point out why you chose to present imputed/non-imputed data.

Summary
- Make every effort to avoid missing data or, failing that, to understand how much data is missing and why.
- Understand the missing data mechanisms (MCAR, MAR, MNAR) and their implications.
- Avoid the default methods (listwise deletion, pairwise deletion).
- Avoid default fix-ups (mean imputation, etc.) where possible.
- Use multiple imputation to take proper account of missings.
- Do a sensitivity analysis.
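The multiple imputation and pooling workflow described above can be sketched with scikit-learn's IterativeImputer (with sample_posterior=True, each run gives a different stochastic imputation) and Rubin's rules; the data and the estimate of interest (a simple mean) are made up for illustration:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
X[rng.random((200, 2)) < 0.10] = np.nan   # ~10% of values missing

m = 5                                     # number of imputations
estimates, variances = [], []
for i in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=i)
    Xi = imp.fit_transform(X)
    col = Xi[:, 0]
    estimates.append(col.mean())              # estimate of interest
    variances.append(col.var(ddof=1) / len(col))  # its sampling variance

# Rubin's rules: total variance = within + (1 + 1/m) * between
qbar = np.mean(estimates)                 # pooled estimate
within = np.mean(variances)
between = np.var(estimates, ddof=1)
total_var = within + (1 + 1 / m) * between
```

The between-imputation component is what single imputation ignores, which is why single imputation understates the standard error.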


5. Details
The decision scheme below summarises the guideline (originally presented as a flowchart):

I have missings in my data: what is the type of missing?
- MNAR: try to complete your dataset, and ask a statistician from EMGO+ to help you.
- MCAR/MAR:
  - If you use SEM or LMM for your analyses: use ML estimation; no imputation is needed.
  - If you do not use SEM or LMM for your analyses, imputation is needed. How much data is missing?
    - Less than 5%: use single imputation.
    - More than 5%: use multiple imputation.
When in doubt, ask a statistician from EMGO+ to help you.

6. Appendices/references/links
Multiple Imputation Methods, Niels Smits (technical literature).
http://www2.chass.ncsu.edu/garson/pa765/missing.htm
http://www.ssc.upenn.edu/~allison/MultInt99.pdf (especially for multiple imputation)
Ask EMGO+ statisticians for help via: http://www.emgo.nl/kc/preparation/research%20design/3%20Advice%20and%20support.html


EMGO+ experts on Missing Data: Martijn Heymans (mw.heymans@vumc.nl), Jos Twisk (jwr.twisk@vumc.nl).

Recommended (non-technical) literature:
1. Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 2009;338:b2393. doi: 10.1136/bmj.b2393.
2. Allison PD (2001). Missing Data (Sage University Papers Series on Quantitative Applications in the Social Sciences, series no. 07-136). Thousand Oaks: Sage.
3. Schafer JL & Graham JW (2002). Missing data: our view of the state of the art. Psychological Methods, 7, 147-177.
4. Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006;59(10):1087-91.
5. http://www.stat.psu.edu/~jls/mifaq.html (Multiple Imputation FAQ page with explanations)
6. Van Ginkel JR & Van der Ark LA (2003a). SPSS syntax for two-way imputation of missing test data [computer software and manual]. Retrieved from http://www.tilburguniversity.edu/nl/over-tilburguniversity/schools/socialsciences/organisatie/departementen/mto/onderzoek/software/
7. Van Ginkel JR & Van der Ark LA (2003b). SPSS syntax for response function imputation of missing test data [computer software and manual]. Retrieved from http://www.tilburguniversity.edu/nl/over-tilburguniversity/schools/socialsciences/organisatie/departementen/mto/onderzoek/software/

7. Amendments
V1.0: 1-12-2011.
V1.1: 5-7-2012. Addition to the section 'When is imputation of missing data not necessary?'.

