An introduction to computing and interpreting Cronbach Coefficient Alpha in SAS
Chong Ho Yu, Ph.D., Arizona State University, Tempe, AZ

ABSTRACT

Although Cronbach Coefficient Alpha is easy to compute, misconceptions and misapplications of it remain widespread, such as the confusion of consistency with dimensionality and the confusion of raw Alpha with standardized Alpha. To clarify these misconceptions, this paper illustrates the computation of Cronbach Coefficient Alpha in a step-by-step manner and explains the meaning of each component of the SAS output.

INTRODUCTION

Reliability can be expressed in terms of stability, equivalence, and consistency. A consistency check, which is commonly expressed in the form of Cronbach Coefficient Alpha (Cronbach, 1951), is a popular method. Unlike test-retest for stability and alternate form for equivalence, only a single test is needed for estimating internal consistency. In spite of its ease of computation, misconceptions and misapplications of Cronbach Coefficient Alpha are widespread. The following problems are frequently observed:

1. Researchers neglect the assumptions of Cronbach Alpha, and as a result over-estimation and under-estimation of reliability are not taken into consideration.
2. Some researchers believe that the standardized Alpha is superior to the raw Alpha because they believe standardization can normalize skewed data. This problem also reflects the confusion of the covariance matrix with the correlation matrix.
3. Some people throw out difficult or easy items based on the simple statistics of each item without taking the entire test into account.
4. When a survey or test contains different latent dimensions, some researchers compute the overall Alpha only and jump to the wrong conclusion that the entire test or survey is poorly written.
5. On the other hand, when a high overall Alpha is obtained, many researchers assume a single dimension and do not further investigate whether the test carries subscales.
6. Many people believe that Cronbach Alpha ranges from 0 to 1.
   In fact, Cronbach Alpha can have a negative value when the item covariances are extremely poor.
7. Several researchers use a pretest as the baseline or as a covariate. However, a low Alpha in the pretest may result from random guessing when the subjects have not yet been exposed to the treatment (e.g., training on the test content). Judging the reliability of the instrument based on the pretest scores is premature.
8. Last but not least, quite a few researchers adopt a validated instrument but skip computing Cronbach Coefficient Alpha with their own sample. They fail to realize that reliability information attaches to the test scores rather than to the test. This practice makes subsequent meta-analysis of mean differences and Alphas impossible.

To clarify these misconceptions, this paper illustrates the computation of Cronbach Coefficient Alpha in a step-by-step manner and explains the meaning of each component of the SAS output.

WHICH RELIABILITY INFORMATION SHOULD I USE?

One could compute Cronbach Coefficient Alpha, the Kuder Richardson (KR) Formula, or the Split-half Reliability Coefficient to examine internal consistency within a single test. Cronbach Alpha is recommended over the other two for the following reasons:

1. Cronbach Alpha can be used for both binary-type and large-scale data. KR, on the other hand, can be applied to dichotomously scored data only.
2. Split-half can be viewed as a one-test equivalent to alternate form and test-retest, which use two tests. In split-half, you treat one single test as two tests by dividing the items into two subsets. Reliability is estimated by computing the correlation between the two subsets. The drawback is that the outcome is affected by how you group the items, so the reliability coefficient may vary from one grouping to another. Cronbach Alpha, on the other hand, is the mean of all possible split-half coefficients computed by the Rulon method (Crocker & Algina, 1986).

WHAT IS CRONBACH ALPHA?

Cronbach Alpha is a measure of the squared correlation between observed scores and true scores. Put another way, reliability is measured in terms of the ratio of true score variance to observed score variance. The theory behind it is that the observed score is equal to the true score plus the measurement error (Y = T + E). For example, I know 80% of the material but my score is 85% because of guessing. In this case, my observed score is 85 while my true score is 80. The additional five points are due to the measurement error.

It is assumed that a reliable test should minimize the measurement error so that the error is not highly correlated with the true score. On the other hand, the relationship between true score and observed score should be strong.

In addition, it is assumed that the mean of the measurement error should be zero. In other words, the error scores should be random and uncorrelated with each other. Failure to meet this assumption may lead to an over-estimation of Cronbach Alpha, though in practice this assumption cannot be fully met.

It is also assumed that the items are essentially tau-equivalent, meaning that the true scores for any two items must be within a constant of each other for an examinee. If this assumption is violated, Alpha may underestimate reliability. For this reason, it is generally agreed that Cronbach Alpha is a lower bound estimate of reliability, because perfect essential tau-equivalence is seldom achieved (Cortina, 1993). Using simulations, Zimmerman and Zumbo (1993) found that violations of these assumptions lead to substantive over-estimation and under-estimation of Cronbach Alpha.

HOW TO COMPUTE CRONBACH ALPHA

The following is an example of SAS code to run Cronbach Alpha:

    /* Six examinees, five dichotomously scored post-test items */
    data one;
       input post_em1-post_em5;
       cards;
    1 1 1 0 0
    1 1 0 1 1
    1 0 1 1 1
    1 1 1 0 0
    0 1 1 1 0
    1 0 1 0 1
    ;
    run;

    /* ALPHA requests Cronbach Alpha; NOCORR suppresses the correlation matrix */
    proc corr data=one alpha nocorr nomiss;
       var post_em1-post_em5;
    run;

In this example, the "nocorr" option suppresses the item correlation information. Although the correlation matrix can be used to examine whether particular items are negatively correlated with others, a more efficient way is to check the table entitled "if items are deleted..." This table tells you whether particular items are negatively correlated with the total, and thus it is recommended to suppress the correlation matrix from the output. "If items are deleted..." will be explained in a later section.

It is important to include the "nomiss" option in the procedure statement. If the tester did not answer several questions, Cronbach Alpha will not be computed. In surveys, it is not unusual for respondents to skip questions that they don't want to answer. Also, if you use a scanning device to record responses, slight pencil marks may not be detected by the scanner. In both cases, you will have "holes" in your data set and the Cronbach Alpha procedure will be halted. To prevent this problem, the "nomiss" option tells SAS to ignore cases that have missing values.

However, with the preceding approach, even if the tester skips only one question, the entire test record will be ignored by SAS. In a speeded test where testers may not be able to complete all items, the use of "nomiss" will lead to some loss of information. One way to overcome this problem is to set a criterion for a valid test response. Assume that 80 percent of test items must be answered in order for a record to be included in the analysis. The following SAS code should be implemented:

    data one;
       infile "c:\data";
       input x1-x10;
       /* drop records with more than two of the ten items unanswered */
       if nmiss(of x1-x10) > 2 then delete;
       /* score the remaining missing answers as wrong (0) */
       array item{10} x1-x10;
       do i = 1 to 10;
          if item{i} = . then item{i} = 0;
       end;
       drop i;
    run;

    proc corr data=one alpha nocorr nomiss;
       var x1-x10;
    run;

In the preceding SAS code, if a record has more than two unanswered questions (that is, fewer than 80% of the ten items were answered), the record will be deleted. In the remaining records, the missing values will be replaced by a zero, and thus these records will be counted in the analysis.

It is acceptable to count missing responses on a test as wrong answers and assign a value of "zero" to them. But it is not appropriate to do so if the instrument is a survey such as an attitude scale. One of the popular approaches for dealing with missing data in surveys is the mean replacement method (Afifi & Elashoff, 1966), in which means are used to replace missing data. The SAS source code for the replacement is the same as the preceding one except for the following line, in which mean(of x1-x10) is the mean of the non-missing responses within the same record:

    if item{i} = . then item{i} = mean(of x1-x10);

HOW TO INTERPRET THE SAS OUTPUT

Descriptive statistics

The mean output shown in Figure 1 tells you how difficult the items are. Because in this case the answer is either right (1) or wrong (0), the mean ranges from 0 to 1. A mean of 0.9 indicates that the question is fairly easy: 90% of the testers answered it correctly. It is a common mistake to look at each item individually and throw out the item that appears to be too difficult or too easy. Instead, you should take the entire test into consideration. This point will be discussed later.

Figure 1. Simple statistics of Cronbach Coefficient Alpha's output

Raw and standardized Alphas

As shown in Figure 1, the Cronbach Alpha procedure returns two coefficients:

1. Raw: It is based upon item covariance. Covariance is not a difficult concept. Variance is a measure of how the distribution of a single variable (item) spreads out. Covariance is simply a measure of how the distributions of two variables vary together. For given item variances, the higher the correlation coefficient is, the higher the covariance is.
2. Standardized: It is based upon item correlation. The stronger the items are inter-related, the more likely the test is consistent.

Some researchers mistakenly believe that the standardized Alpha is superior to the raw Alpha because they think that standardization normalizes skewed data. Actually, standardization is a linear transformation, and thus it never normalizes data. Standardized Alpha is not superior to its raw counterpart. The raw Alpha is appropriate when the item scales are comparable because, as mentioned before, item variances and covariances are taken into account in its computation; the standardized Alpha is used when the item scales differ.

Variance and covariance

The concepts of variance and covariance are better illustrated graphically. With one variable, the distribution is a bell curve if it is normal. In a two-variable case, the normal distribution appears as a mountain, as shown in Figure 2. In this example, both item1 and item2 have a mean of zero because the scores have been standardized (z-scores). From the shape of the "mountain," we can tell whether the response patterns of testers to item1 and item2 are consistent. If the mountain peak is at or near zero and the slopes spread out evenly in all directions, we can conclude that the response pattern of these items is consistent.

Figure 2. Covariance as a "mountain"

However, in order to determine whether the response pattern to the entire test is consistent, we must go beyond viewing just one pair. The Cronbach Alpha computation examines the covariance matrix (all possible pairs) to draw a conclusion. It is noteworthy that not all the information in the matrix is usable. For example, the pairing of an item with itself, such as (item1, item1), can be omitted. Also, the order of the pair doesn't matter, i.e., the covariance of the pair (item1, item2) is the same as that of (item2, item1) (see Table 1).

Table 1. Covariance matrix table

           Item1       Item2       Item3       Item4       Item5
    Item1              Covariance  Covariance  Covariance  Covariance
    Item2                          Covariance  Covariance  Covariance
    Item3                                      Covariance  Covariance
    Item4                                                  Covariance
    Item5

Consistency and dimensionality

Generally speaking, the higher the Alpha is, the more reliable the test is. There isn't a commonly agreed cut-off. Usually 0.7 and above is acceptable (Nunnally, 1978). It is a common misconception that if the Alpha is low, it must be a bad test. Actually, your test may measure several latent attributes/dimensions rather than one, and thus the Cronbach Alpha is deflated. For example, it is expected that the scores of GRE-Verbal, GRE-Quantitative, and GRE-Analytical may not be highly correlated because they evaluate different types of knowledge.

If your test is not internally consistent, you may want to perform factor analysis or principal component analysis to combine items into a few factors/components. You may also drop the items that affect the overall consistency, which will be discussed in a later section. If you know what the subscales are, you should compute the Cronbach Alpha for each subscale.

On the other hand, when the Cronbach Alpha is larger than .70, researchers may go to another extreme. Cortina (1993) observed that many people accept a high Alpha as adequate and thus seldom make further scale modifications. Cortina explicitly criticized this as an improper use of statistics.

It is important to note that a low overall Alpha may indicate the existence of latent constructs, but a high overall Alpha does not necessarily imply the absence of multiple latent dimensions. One may argue that when a high Cronbach Alpha indicates a high degree of internal consistency, the test or the survey must be uni-dimensional, and thus there is no need to further investigate its subscales. This is a common misconception. Actually, consistency and dimensionality must be assessed separately.
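
The arithmetic behind this warning can be sketched with the standard formula for the standardized Alpha, written in terms of the number of items k and the mean inter-item correlation \bar{r}. The numbers below are hypothetical and chosen only for illustration; they do not come from any data set in this paper. Assume a ten-item instrument made of two five-item subscales, with every within-subscale correlation equal to .70 and every between-subscale correlation equal to zero.

```latex
% Standardized Alpha for k items with mean inter-item correlation \bar{r}
\alpha_{std} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}

% Hypothetical instrument: 45 item pairs in total,
% 20 within-subscale pairs (r = .70) and 25 between-subscale pairs (r = 0)
\bar{r} = \frac{20(.70) + 25(0)}{45} = \frac{14}{45} \approx .31

% Overall Alpha across all ten items
\alpha_{std} = \frac{10\,(14/45)}{1 + 9\,(14/45)} = \frac{140}{171} \approx .82

% Alpha of either five-item subscale taken alone
\alpha_{std} = \frac{5(.70)}{1 + 4(.70)} = \frac{3.5}{3.8} \approx .92
```

Under these assumed values the overall Alpha is about .82, above the usual .70 guideline, even though the instrument measures two completely unrelated dimensions. Each subscale taken alone is more consistent still (about .92), which is why a high overall Alpha cannot be read as evidence of a single dimension.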

The relationship between consistency and uni-dimensionality is illustrated in Figure 3. Uni-dimensionality is a subset of consistency. If a test is uni-dimensional, then it will show internal consistency. But if a test is internally consistent, it does not necessarily entail one construct (Gardner, 1995; 1996). The logic works like this: If I am a man, I must be a human. But if I am a human, I may not be a man (I could be a woman). The logical fallacy "if A then B; if B then A" is termed "affirming the consequent" (Kelley, 1998). This fallacy often occurs in the misinterpretation of Cronbach Alpha.

Figure 3. Uni-dimensionality and consistency

Gardner (1995) used a nine-item scale as an example to explain why a high Alpha does not necessarily indicate one dimension: Cronbach Alpha is a measure of common variance shared by test items. The Cronbach Alpha could be high when each test item shares variance with at least some other items; it does not have to share variance with all items.

Different possible scenarios are illustrated in Figure 4a-c. As mentioned before, Cronbach Alpha can be calculated based upon item correlation. When the correlation coefficient is squared, it becomes the coefficient of determination, which indicates variance explained. Variance explained is often visualized by sets. When two sets intersect, the overlapped portion denotes common variance. It can be understood as how much of the individual differences in the responses to one item can be explained by another item, and vice versa. The non-overlapped portion indicates independent information.

Figure 4a. Inconsistent and no uni-dimension

In Figure 4a, the nine sets have no overlapped area, and thus the nine items share no common variance. They are neither internally consistent nor uni-dimensional. In this situation, interpreting a low Alpha as the absence of uni-dimensionality is correct.

Figure 4b. Consistent and uni-dimensional

The scenario presented in Figure 4b is exactly opposite to that in Figure 4a. It shows the presence of a high degree of internal consistency and uni-dimensionality because all items share common variance with each other. Interpreting a high Alpha as an indication of the presence of one single construct could be accepted.

Figure 4c. Consistent but not uni-dimensional

Unlike the above two situations, the last scenario is misleading. In Figure 4c, several items share variance with some others but not with all of them. In other words, subscales exist inside the instrument even though the Alpha is high and the instrument is internally consistent. Interpreting a high Alpha as a sign of uni-dimensionality is problematic.

Since consistency and dimensionality should be examined by different procedures, it is recommended that PROC FACTOR be used in addition to PROC CORR ALPHA. A real-life example is found in sociological research concerning cognitive and affective attitudes toward sex (Apostolopoulos, Sonmez, Yu, Mattila, & Yu, under submission). Originally, the cognitive scale and the affective scale were treated as two scales. The Cronbach Alphas of the two scales are high (cognitive = .87, affective = .85), which gave the illusion that the two scales represent two constructs only. However, factor analysis indicated that there are in fact two subscales in each scale. After the two scales were divided into four, the Alphas improved (see Table 2).

Table 2. Subscales of attitudes toward sex

    Scales                          Alphas   Sub-scales                                   Alphas
    Cognitive attitude toward sex   .87      Cognitive attitude toward sexual behaviors   .92
                                             Cognitive attitude toward using condoms      .95
    Affective attitude toward sex   .85      Affective attitude toward sexual behaviors   .9[?]
                                             Affective attitude toward using condoms      .88

More on variance and dispersion

Since Cronbach Alpha takes variance into account, it is important to interpret the data in the context of dispersion.
For example, when you compare the mean scores in the following two tables, the pre-test responses and the post-test responses each appear consistent. However, the Alpha of the post-test is only .30 (raw) and .29 (standardized), while the Alpha of the pre-test is as high as .60 (raw and standardized). This is because the standard deviations (SD) of the post-test items range from .17 to .28, whereas the SDs of the pre-test items are larger and more uniform (.42-.48) (see Figure 5).

Figure 5. Simple statistics

If the item is deleted...

As mentioned before, a good analysis of test items should take the whole test into consideration. The following table tells you how each item is correlated with the entire test and what the Alpha would be if that variable were deleted. For example, the first line shows the correlation coefficient between post-test item 1 and the composite score of post-test item1-item5. The first item is negatively correlated with the total score. If it is deleted, the Alpha will improve to .41 (raw) or .42 (standardized). Question 5 has the strongest relationship with the entire test. If this item is removed, the Alpha will drop to -.01 (raw) or .04 (standardized). This approach helps you to spot the bad apple and retain the good one (see Figure 6).

Figure 6. If the item is deleted...

Figure 7. Simple statistics when there is no variance

Once again, variance plays a vital role in the Cronbach Alpha calculation. Without variance there will be no sensible result. The following questions are from another post-test. Some of the questions, such as Questions 3 and 4, have no variance at all: either everybody answered the question correctly (mean of 1.00) or everybody missed it (mean of 0.00). Because there is no variance, the correlations involving these items are undefined, and the standardized Cronbach Alpha, which is based on the correlation matrix, cannot be computed at all. Although the raw Cronbach Alpha, which is based on item covariance, can be computed, its value is -.30. This clearly demonstrates that Cronbach Alpha can have a negative value (see Figure 7).

"I DON'T KNOW" OPTION IN THE PRETEST

In the pretest, where subjects have not been exposed to the treatment and thus are unfamiliar with the subject matter, a low reliability caused by random guessing is expected. One way to alleviate this problem is to include "I don't know" as an option in multiple-choice items. In experiments where students' responses would not affect their final grades, the experimenter should explicitly instruct students to choose "I don't know" instead of making a guess if they really don't know the answer. Low reliability is a signal of high measurement error, which reflects a gap between what students actually know and what scores they receive. The choice "I don't know" can help in narrowing this gap. Nonetheless, this proactive measure cannot solve the problem entirely when too many subjects choose "I don't know" too many times, because lack of variance would lead to a low Cronbach Alpha.

MORE THAN ONE SHOT

Another common misconception about Cronbach Alpha is that if someone adopts a validated instrument, he/she does not need to check the reliability and validity with his/her own data. Imagine this: When I buy a drug that has been approved by the FDA and my friend asks me whether it heals me, I tell him, "I am taking a drug approved by the FDA and therefore I don't need to know whether it works for me or not!" A responsible evaluator should still check the instrument's reliability and validity with his/her own subjects and make any modifications if necessary.

Henson and Thompson (2001) are critical of reliability induction, in which the Cronbach Alpha reported in the test manual is used as evidence to support the adoption of the scale in another study. They argued that when the characteristics of the samples may be very different, generalization of reliability is inappropriate.

Further, when researchers report the reliability information of their own data, it helps subsequent researchers to conduct meta-analyses. Meta-analysis is a research tool for synthesizing results obtained from previous research.
Effect size, which is expressed in terms of the standardized mean difference across groups, is used in meta-analysis. However, the effect size may be affected by measurement error. To counteract this problem, Hunter and Schmidt (1990) adjust the effect size for measurement error by dividing the effect size by the square root of the reliability coefficient (r) of the dependent variable. The formula is as follows:

\[
\text{Measurement error correction} = \frac{\text{Effect size}}{\sqrt{r}}
\]

Unfortunately, the absence of reliability information in many studies makes this type of meta-analysis impossible. For example, after reviewing articles in three psychological journals from 1990-1997, Vacha-Haase, Ness, Nilsson, & Reetz (1999) found that one-third of those articles did not mention reliability information.

Further, different studies yield not only different effect sizes, but also different Cronbach coefficient Alphas. Besides comparing mean differences, meta-analysis could also be employed to examine whether a particular instrument is consistently consistent. There are several approaches to accomplish this goal. One way is to transform the Cronbach Alphas of the same instrument reported in past research via Fisher's Z transformation, and then to compute the Q statistics (Behrens, 1997). Another way is the UX test developed by Feldt, Woodruff, and Salih (1987). The UX test can be used for comparing Alphas obtained from different independent samples, as well as from the same sample.

In recent years, several psychologists and educational researchers developed a methodology called the "reliability generalization" (RG) study (Vacha-Haase, 1998). In RG studies, variables that would affect the reliability estimation of test scores, such as sample size, gender, and number of items, are identified. These variables are used as regressors to predict Cronbach coefficient Alphas in a generalized linear model.
In this way researchers could find out what factors contribute to reliability variation across different samples.

Discussion of Fisher's Z transformation, Q statistics, the UX test, and RG studies is beyond the scope of this paper. The important point here is that researchers should consider going beyond the Cronbach Alpha reported in one particular study and look for a broader inference.

CONCLUSION

Although Cronbach Alpha is fairly easy to compute, its application requires a conceptual understanding of true score, observed score, measurement error, variance, covariance matrix, consistency, and dimensionality. It is hoped that this paper clarifies common misconceptions about Cronbach Alpha and improves the effective use of SAS procedures.

REFERENCES

Afifi, A. A., & Elashoff, R. M. (1966). Missing observations in multivariate statistics. Part I. Review of the literature. Journal of the American Statistical Association, 61, 595-604.

Apostolopoulos, Y., Sonmez, S., Yu, C. H., Mattila, A., & Yu, L. C. (under submission). Alcohol abuse and HIV risk behaviors of American spring-break travelers.

Behrens, J. (1997). Does the White Racial Identity Attitude Scale measure racial identity? Journal of Counseling Psychology, 44, 3-12.

Cortina, J. M. (1993). What is Coefficient Alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.

Crocker, L. M., & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart, and Winston.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of the tests. Psychometrika, 16, 297-334.

Feldt, L., Woodruff, D. J., & Salih, F. A. (1987). Statistical inference for Coefficient Alpha. Applied Psychological Measurement, 11, 93-103.

Gardner, P. L. (1995). Measuring attitudes to science: Unidimensionality and internal consistency revisited. Research in Science Education, 25, 283-9.

Gardner, P. L. (1996). The dimensionality of attitude scales: A widely misunderstood idea. International Journal of Science Education, 18, 913-9.

Henson, R. K. (2001, April). Characterizing measurement error in test scores across studies: A tutorial on conducting "reliability generalization" analyses. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis: Correcting error and bias in research findings. Newbury Park, CA: Sage.

Kelley, D. (1998). The art of reasoning (3rd ed.). New York: W. W. Norton & Company.

Nunnally, J. C. (1978). Psychometric theory (2nd ed.). New York: McGraw-Hill.

Vacha-Haase, T. (1998). Reliability generalization: Exploring variance in measurement error affecting score reliability across studies. Educational and Psychological Measurement, 58, 6-20.

Vacha-Haase, T., Ness, C., Nilsson, J., & Reetz, D. (1999). Practices regarding reporting of reliability coefficients: A review of three journals. The Journal of Experimental Education, 67, 335-341.

Zimmerman, D. W., & Zumbo, B. D. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational & Psychological Measurement, 53, 33-50.

ACKNOWLEDGMENTS

Special thanks to Mr. Shawn Stockford for reviewing this paper and Professor Samuel Green for providing valuable input to the author.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Chong Ho Yu, Ph.D.
Educational Data Communication, Assessment, Research and Evaluation
Arizona State University
302 Payne Hall, Tempe AZ 85287-0611
(480)965-3475
Email: alex@asu.edu
Web: http://seamonkey.ed.asu.edu/~alex/