PSYCHOLOGICAL MEDICINE

Analysis of Two Independent Samples with Non Normality Using Non Parametric Method, Data Transformation and Bootstrapping Method

Ng Chong Guan 1), Muhamad Saiful Bahri Yusoff 2), Nor Zuraida Zainal 1), Low Wah Yun 3)

ABSTRACT
Objective: To compare the use of a non parametric test, data transformation and the bootstrapping method in the analysis of two independent samples with non normal distributions.
Method: A total of 202 patients who were discharged from the psychiatric ward, University Malaya Medical Centre (UMMC), from 27th August 2007 to 15th April 2008 were recruited. General psychopathology was measured with the Brief Psychiatric Rating Scale (BPRS-24). On follow up, the patients who had early readmission (< 6 months) were identified. The BPRS scores of the patients with and without early readmission were compared using the Mann-Whitney test, logarithm data transformation and the bootstrapping method.
Result: All three methods rejected the null hypothesis (p < 0.05). An effect estimate for the means difference was obtained with the bootstrapping method (95% confidence interval: 6.62-6.98).
Conclusion: The non parametric method, logarithm data transformation and the bootstrapping method can all be used for hypothesis testing in the analysis of two independent samples with non normality. Effect estimation is only provided by the bootstrapping method.

KEY WORDS
non parametric, transformation, bootstrapping, non normality

Received on July 21, 2011 and accepted on October 19, 2011
1) Department of Psychological Medicine, Faculty of Medicine, University Malaya
2) Medical Education Department, School of Medical Sciences, Universiti Sains Malaysia
3) Health Research Development Unit, Faculty of Medicine, University Malaya
Correspondence to: Ng Chong Guan (e-mail: chong_guan1975@yahoo.co.uk)

INTRODUCTION
Parametric statistical methods used for the analysis of two independent samples are required to fulfill certain assumptions.
One of the requirements is that the distribution of the two samples is approximately normal 1). Violation of this assumption can increase the chance of Type I or Type II error 1). Unfortunately, observations with non normality are frequently encountered 3). Generally, the solution for this problem is either to transform the data so that they meet the normality condition or to use alternative methods that do not rely on the normality assumption 4).

The non parametric statistical method used for two independent samples is the Wilcoxon rank sum test or Mann-Whitney test. It is based on the idea of replacing the original data by rank numbers. The ranks of the two groups are added up. The sums of the ranks are compared with a critical range and the appropriate p value is calculated 4).

Data transformation is the application of a mathematical modification to the values of the original dataset. One of the common modifications is converting the data to a logarithmic scale. This transformation improves the data normality by compressing the extreme part of a distribution. The logarithm of a number to a given base is the power or exponent to which the base must be raised in order to produce that number. Base 10, 2 and e (natural logarithm) are the frequently used options 2).

The bootstrap was first introduced by Efron as a method of estimating a test statistic by resampling the data in hand 5). The original sample is expanded by resampling with or without replacement. The expanded sample, with its larger size, improves in normality and is treated as a virtual population. Statistical inferences are made on the sampling distribution of the expanded sample 6-8).

In this study, we aim to compare the use of three statistical methods (Mann-Whitney test, logarithm data transformation and bootstrapping method) in the analysis of a dataset extracted from a study conducted in the psychiatric ward, University Malaya Medical Centre (UMMC) in 2008.
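As a minimal sketch of the three approaches described above, the following Python code (standard library only) may help make the ideas concrete. It is an illustration under stated assumptions and not the procedure used in this study. The data are synthetic, and the function names (rank_sum_test, log_mean_ratio, bootstrap_mean_difference) are hypothetical. The rank sum test uses a normal approximation without tie or continuity correction. The bootstrap shown is the percentile bootstrap, a common variant that may differ from the resampling scheme applied to the study data.

```python
import math
import random
import statistics


def rank_sum_test(x, y):
    """Wilcoxon rank sum (Mann-Whitney) test, normal approximation.

    Returns the rank sum of x and a two-sided p value. Tied values get the
    average rank; no tie or continuity correction is applied (simplification).
    """
    pooled = sorted((value, group) for group, data in enumerate((x, y)) for value in data)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / sd
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return rank_sum_x, p_value


def log_mean_ratio(x, y):
    """exp(mean(ln x) - mean(ln y)), i.e. the ratio of the geometric means."""
    mean_log_x = statistics.fmean(math.log(v) for v in x)
    mean_log_y = statistics.fmean(math.log(v) for v in y)
    return math.exp(mean_log_x - mean_log_y)


def bootstrap_mean_difference(x, y, n_resamples=500, seed=0):
    """Percentile bootstrap 95% interval for mean(x) - mean(y)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_x = rng.choices(x, k=len(x))  # sampling with replacement
        resample_y = rng.choices(y, k=len(y))
        diffs.append(statistics.fmean(resample_x) - statistics.fmean(resample_y))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic right-skewed scores; these are NOT the study data.
    rng = random.Random(42)
    group_a = [rng.lognormvariate(3.7, 0.3) for _ in range(65)]
    group_b = [rng.lognormvariate(3.6, 0.25) for _ in range(137)]
    print("rank sum, p value:", rank_sum_test(group_a, group_b))
    print("ratio of geometric means:", log_mean_ratio(group_a, group_b))
    print("bootstrap 95% interval:", bootstrap_mean_difference(group_a, group_b))
```

Note that the interval returned by bootstrap_mean_difference is taken directly from the percentiles of the resampled differences, so its width reflects the sampling variability of the original groups.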
METHOD

Study sample
A series of 202 non-duplicated, consecutive patients who were discharged from the psychiatric ward, UMMC, from 27th August 2007 to 15th April 2008 were included in the study. Prior to discharge, the general psychopathology of the patients was assessed using the BPRS-24. On follow up assessment, the patients who had early readmission (less than 6 months) were identified. The BPRS scores of the patients with and without early readmission were compared.

Brief Psychiatric Rating Scale - Expanded Version (BPRS-E)
The BPRS was developed by Overall and Gorham 9-11). It is the most established questionnaire scale for rapid clinical assessment that measures major psychotic and non-psychotic symptoms in individuals with major psychiatric disorders. The rating is based upon observation made by the clinician or rater during a 15 to 30 minute interview (items which measure tension, emotional withdrawal, mannerisms and posturing, motor retardation and uncooperativeness), and the subject's verbal report (items which measure conceptual disorganization, unusual thought content, anxiety, guilt feeling, grandiosity, depressive mood, hostility, somatic concern, hallucinatory behavior, suspiciousness and blunted affect). Each item is defined by 1-2 sentences of clinical description with anchor points clearly described. Its reliability has proved to be good in several studies 12,13). An expanded standardized version (BPRS-E) was developed by Lukoff et al. in 1984. Six extra items (suicidality, elevated mood, bizarre behaviour, self neglect, distractibility and motor hyperactivity) were included in order to increase the coverage of the instrument. This version was adapted by Ventura et al. in the early 1990s. It was demonstrated that symptoms assessed in the BPRS-E are rather stable cross-culturally 14).

© 2012 Japan International Cultural Exchange Foundation & Japan Health Sciences University
Kolmogorov-Smirnov test (KS test)
We used the KS test as the inferential test of normality. The KS distance to normality is a numerical measure of the maximal distance between the empirical cumulative frequency plot and the associated normal cumulative distribution function. It is always bounded between 0 and 1. If the KS distance is small, one can assume that there is sufficient normality. A significance level (alpha) of 0.1 was used for the KS test in this study.

Wilcoxon rank sum test or Mann-Whitney test
As the distributions of the two samples (BPRS scores of patients with and without early readmission) did not meet normality, they were analyzed with a non parametric test. The Mann-Whitney test is the non parametric test used to compare the medians of the two samples in this study. A two sided significance level of 0.05 was used in the analysis.

Data transformation
The values of the original datasets were logarithm transformed. The base e (natural logarithm) was used in this study. The reason is that higher bases will pull the extreme values more drastically than the e base. The normality of the two logarithm transformed datasets was checked with the KS test. The means of the logarithm transformed datasets were compared using the independent t-test. We tried to estimate the 95% confidence interval of the means difference by exponentiating the result from the t-test.

Bootstrapping method
The two samples were each resampled with replacement 500 times. The mean of each resampled surrogate dataset was determined. This generated two expanded samples, each of size 500. The normality of the two expanded samples was checked with the KS test. The means of the two samples were compared using the independent t-test.

RESULT
The characteristics of the study subjects are shown in Table 1. The mean age of the subjects was around 40 years. There were slightly more females (54.5%). Chinese was the largest ethnic group (45.5%) and most subjects had education of at least secondary level.
Affective disorder was the most common diagnosis. Table 2 shows that the rate of readmission is 48% for the sample. The mean BPRS score is higher for the early readmission group.

The original distributions of the two samples did not meet normality based on the KS test (p < 0.1). The samples in both the logarithm transformation and the bootstrapping method were normally distributed (Table 3).

Table 4 shows the statistical inferences using the three statistical methods. All three methods rejected the null hypothesis at the 0.05 significance level (p < 0.05). The 95% confidence interval of the means difference of the two samples was demonstrated in the bootstrapping method.

Table 1. Characteristics of the study subjects (N = 202)

Variable                      Value
Age, mean (SD)                39.12 (13.64)
Gender, n (%)
  Male                        92 (45.5)
  Female                      110 (54.5)
Race, n (%)
  Malay                       45 (22.3)
  Chinese                     92 (45.5)
  Indian                      57 (28.2)
  Others                      8 (4.0)
Marital status, n (%)
  Married                     101 (50.0)
  Never married               101 (50.0)
Educational level, n (%)
  Secondary and below         132 (65.3)
  Tertiary                    70 (34.7)
Diagnosis, n (%)
  Psychotic disorder          73 (33.1)
  Affective disorder          93 (40.0)
  Others                      36 (26.9)

Table 2. Comparison of the BPRS score between the patients with and without early readmission

Early readmission   N (%)       Mean     SD
Yes                 65 (48)     43.72    14.57
No                  137 (52)    36.86    9.42

Table 3. Kolmogorov-Smirnov test result for the BPRS score distribution of each model

Model                       KS distance (absolute)   P value
Original
  Early readmission         0.166                    0.06*
  No early readmission      0.113                    0.06*
Logarithm transformed
  Early readmission         0.109                    0.42
  No early readmission      0.066                    0.59
Bootstrapping sampling
  Early readmission         0.033                    0.63
  No early readmission      0.027                    0.86
*p value < 0.1; KS = Kolmogorov-Smirnov

Table 4. Statistical inferences using Mann-Whitney test, logarithm transformation and bootstrapping method

Method                               Readmission          P value   95% confidence interval
                                     Yes        No
Mann-Whitney test (sum of rank)      7791       12712     0.002     N/A
Logarithm transformation* (mean)     3.723      3.577     0.002     0.05-0.24
  (exponential)                      (41.39)    (35.76)             (1.05-1.27)
Bootstrapping* (mean)                43.68      36.88     <0.001    6.62-6.98
* analysed with independent t-test; N/A = not applicable

DISCUSSION
In the case where normality is not met, the use of parametric methods in analysis becomes inappropriate. Non parametric methods with no or limited assumptions on the distribution provide alternatives for the data analysis. It is also intuitive to apply various methods to improve normality in the data. However, it is crucial to establish the reason for the non normality of the data. The statistical inferences based on the 'modified' data may be misleading if the original dataset is incorrect 2).

In addition to the advantage of limited assumptions on the format of the data, the non parametric method is simple to use. It is useful for the analysis of ordered categorical data. The disadvantage of the non parametric method is its lack of power as compared to parametric approaches. The statistical inference from a non parametric method is geared more toward hypothesis testing than estimation of effect 4). For instance, the Mann-Whitney test in this study rejected the null hypothesis (median of BPRS (readmission) = median of BPRS (no readmission)) but provided no estimate of the effect.

Logarithm transformation reduces non normality by compressing the spacing of measurements on the right side of the distribution more than the left side. Although it retains the order of the measurements, it changes the relative distance between adjacent values that were originally equidistant. As a result, the price paid to achieve normality using logarithm transformation is the conversion of measurement into ordinal (rank) data 2).
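What exponentiating a log-scale difference yields can be made explicit with a short worked equation based on the values in Table 4. The symbols are introduced here for illustration only and are not part of the original analysis: x_{ki} denotes the i-th BPRS score in group k, n_k the group size, and G_k the geometric mean of group k.

```latex
\begin{align*}
\frac{1}{n_1}\sum_{i=1}^{n_1}\ln x_{1i} \;-\; \frac{1}{n_2}\sum_{j=1}^{n_2}\ln x_{2j}
  &= \ln G_1 - \ln G_2 = \ln\frac{G_1}{G_2} \\
\exp(3.723 - 3.577) &= \exp(0.146) \approx 1.157 \approx \frac{41.39}{35.76} \\
\bigl(\exp(0.05),\ \exp(0.24)\bigr) &\approx (1.05,\ 1.27)
\end{align*}
```

Under this reading, the back-transformed quantities are on a ratio scale (the geometric mean of one group relative to the other) and not on the scale of a difference between means.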
Due to the nature of logarithm transformation, the true effect estimate cannot be read directly from the exponential of the effect estimate of the transformed data. For instance, the 95% confidence interval obtained by exponentiating the effect estimate in this study was 1.05 to 1.27. Because exponentiating a difference of log means yields a ratio of geometric means, these values are not on the scale of the means difference and are much smaller than the crude difference of the back-transformed means (41.39 - 35.76 = 5.63).

The use of the bootstrap method, based on resampling data with replacement, is increasing rapidly in clinical research 15-18). It aims to create an empirical sampling distribution of the test statistic without additional assumptions. It is simple and straightforward to derive estimates of standard errors and confidence intervals of the statistic 6-8). In this study, we used a simple bootstrap method in which the datasets were resampled with replacement 500 times to create two expanded samples. The expanded samples met the normality condition. Thus, the parameters of the expanded samples were used in the parametric analysis. The result of the bootstrap method rejected the null hypothesis and provided reasonable effect estimates (95% confidence interval: 6.62-6.98). One of the limitations of the bootstrap sample is that it under-represents the true variability if the sample size of the original dataset is too small. It is suggested that the sample size should be greater than 50 for non parametric problems 19).

CONCLUSION
The non parametric test, logarithm data transformation and the bootstrapping method give similar results in hypothesis testing for the comparison of two independent samples with non normality. The advantage of the bootstrapping method is that it allows researchers to estimate confidence intervals for the statistic.

REFERENCES
1) Whitley E, Ball J. Statistics review 5: comparison of means. Crit Care 2002; 6: 424-428.
2) Osborne JW. Notes on the use of data transformations. Pract Assess Res Eval 2002; 8.
3) Micceri T. The unicorn, the normal curve, and other improbable creatures. Psychol Bull 1989; 105: 156-166.
4) Whitley E, Ball J. Statistics review 6: nonparametric methods. Crit Care 2002; 6: 509-513.
5) Efron B. Bootstrap methods: another look at the jackknife. Ann Statist 1979; 7: 1-26.
6) Haukoos JS, Lewis RJ. Advanced statistics: bootstrapping confidence intervals for statistics with "difficult" distributions. Acad Emerg Med 2005; 12: 360-365.
7) Base O, Crown WH, Pollicino C. Guidelines for selecting among different types of bootstraps. Curr Med Res Opin 2006; 22: 799-808.
8) Wasserman S, Bockenholt U. Bootstrapping: application to pathophysiology. Psychophysiology 1989; 26: 208-221.
9) Overall JE, Gorham DR. The Brief Psychiatric Rating Scale (BPRS): a comprehensive review. J Operat Psychiatr 1962; 11: 48-65.
10) Overall JE, Gorham DR. The Brief Psychiatric Rating Scale (BPRS): recent developments in ascertainment and scaling. Psychopharmacol Bull 1988; 24: 97-99.
11) Overall JE, Gorham DR. The Brief Psychiatric Rating Scale. In: Guy W, ed. ECDEU Assessment manual for psychopharmacology. Rockville, MD: U. S. Department of Health, Education, and Welfare, 1976; 157-169.
12) Dingemans PMAJ, Linszen DH, Lenior ME, Smeets RMW. Component structure of the expanded Brief Psychiatric Rating Scale (BPRS-E). Psychopharmacology 1995; 122(3): 263-267.
13) Schützwohl M, Jarosz-Nowak J, Briscoe J, Szajowski K, Kallert T. Inter-rater reliability of the Brief Psychiatric Rating Scale and the Groningen Social Disabilities Schedule in a European multi-site randomized controlled trial on the effectiveness of acute psychiatric day hospitals. Int J Methods Psychiatr Res 2003; 12(4): 197-207.
14) Ventura MA, Green MF, Shaner A, Liberman RP. Training and quality assurance with the brief psychiatric rating scale: "The drift buster". Int J Methods Psychiatr Res 1993; 3: 221-244.
15) Aegerter P, Muller F, Nakache JP, et al. Evaluation of screening methods for Down's syndrome using bootstrap comparison of ROC curves. Comput Methods Prog Biomed 1994; 43: 151-157.
16) Baker SG, Chu KC. Evaluating screening for the early detection and treatment of cancer without using a randomized control group. J Am Statist Assoc 1990; 85: 321-327.
17) Shen CF, Aglewicz B. Robust and bootstrap testing procedures for bioequivalence. J Biopharmacol Stat 1994; 4: 65-90.
18) Tsodikov A, Hasenclever D, Loeffler M. Regression with bounded outcome score: estimation of power by bootstrap and simulation in a chronic myelogenous leukemia clinical trial. Stat Med 1998; 17: 1909-1922.
19) Chernick MR. Bootstrap methods: a practitioner's guide. New York: Wiley, 1999.