Statistics Corner
Questions and answers about language testing statistics:
Point-biserial correlation coefficients
James Dean Brown
University of Hawai'i at Manoa
QUESTION: Recently on the email forum LTEST-L there was a discussion about point-biserial correlation coefficients, and I was not familiar with this term. Could you explain what point-biserial correlation coefficients are and how they are important for language testers?
ANSWER: To adequately explain the point-biserial correlation coefficient, I will need to address four questions: (a) What is the point-biserial correlation coefficient? (b) How is the point-biserial correlation coefficient related to other correlation coefficients? (c) How is the point-biserial correlation coefficient calculated? And (d) how is the point-biserial correlation coefficient used in language testing?
As I defined it in Brown (1988, p. 150), the point-biserial correlation coefficient (symbolized as rpbi) is a statistic used to estimate the degree of relationship between a naturally occurring dichotomous nominal scale and an interval (or ratio) scale. For example, a researcher might want to investigate the degree of relationship between gender (that is, being male or female, a naturally occurring dichotomous nominal scale) and achievement in English as a second language as measured by scores on the end-of-the-year departmental examination (an interval scale).
Aside from the types of scales involved, the interpretation of the resulting coefficient is very similar to that for the more commonly reported Pearson product-moment correlation coefficient (also referred to as Pearson r, or simply r). In brief, like the Pearson r, the rpbi can range from 0 to +1.00 if the two scales are related positively (that is, in the same direction) and from 0 to -1.00 if the two scales are related negatively (that is, in opposite directions). The farther the value of rpbi is from zero (positive or negative), the stronger the relationship between the two variables. [For more detailed explanations of the interpretation and assumptions of Pearson r and rpbi, see Brown, 1996, 1999.]
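The close resemblance to Pearson r is not a coincidence: rpbi has the same value as a Pearson r computed with the dichotomy coded 1 and 0 (using divide-by-N standard deviations). The Python sketch below assumes invented data (the group codes and scores are hypothetical, not from this column) and shows that reversing which group is coded 1 only flips the sign, never the size.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson product-moment r using population (divide-by-N) moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical data: a natural dichotomy coded 1/0 and exam scores.
group = [1, 1, 1, 0, 0, 0]
scores = [78, 85, 70, 72, 65, 60]

r = pearson_r(group, scores)
# Recode so the other group is 1: same strength, opposite sign.
r_flipped = pearson_r([1 - g for g in group], scores)
print(round(r, 2), round(r_flipped, 2))
```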
In distinguishing the point-biserial from other correlation coefficients, I must first point out that the point-biserial and biserial correlation coefficients are different. The biserial correlation coefficient (or rbi) is appropriate when you are interested in the degree of relationship between two interval (or ratio) scales but for some logical reason one of the two is more sensibly interpreted as an artificially created dichotomous nominal scale. For instance, you might be interested in determining the degree of relationship between passing or failing a first-year university ESL course and language aptitude test scores. To do this, grades at the end of the course (A, B, C, D, and F, often converted to a 4.00, 3.00, 2.00, 1.00, and 0.00 interval scale) might be artificially separated into a nominal scale made up of two groups: pass (A to D, or 1.00 to 4.00) and fail (F, or 0.00). The degree of relationship between this new, artificially created dichotomy and the interval scores on the language aptitude test could then be determined by using the rbi coefficient. Thus the biserial correlation coefficient is appropriately applied when the nominal variable is artificially created (as in the pass-fail variable created from grade points), while the point-biserial correlation coefficient is appropriately applied when the nominal variable occurs naturally (as in the naturally occurring male-female gender distinction).
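For readers who want to see how the two coefficients differ numerically, a standard textbook identity (not given in this column) links them: rbi = rpbi × sqrt(pq) / y, where p is the proportion in the group coded 1, q = 1 - p, and y is the height of the standard normal curve at the point that splits it into proportions p and q. The Python sketch below assumes that identity; the function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def biserial_from_point_biserial(r_pbi, p):
    """Convert rpbi to rbi for an artificial dichotomy (requires 0 < p < 1).

    p is the proportion of cases in the group coded 1 (e.g., "pass").
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    q = 1 - p
    z = NormalDist().inv_cdf(q)   # cut point: area q below it, p above it
    y = NormalDist().pdf(z)       # ordinate (height) of the normal curve there
    return r_pbi * sqrt(p * q) / y

# Hypothetical values: rpbi = .40 with 80% of students passing.
print(round(biserial_from_point_biserial(0.40, 0.80), 3))
```

At p = .50 the factor sqrt(pq)/y is about 1.25, so for the same data rbi comes out larger in magnitude than rpbi.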
A variety of different correlation coefficients have been developed over the years for various combinations of scale types, as summarized in Table 1. The point-biserial is just one of these statistical tools (see the fifth row of correlation coefficients).
The data in Table 2 are set up with some obvious examples to illustrate the calculation of rpbi between items on a test and total test scores. Notice that the items have been coded 1 for correct and 0 for incorrect (a natural dichotomy) and that the total scores in the last column are based on a total of 50 items (most of which are not shown).
Table 2
Student       Item 1   Item 2   Item 3   Total
Hachiko         1        0        1        50
Kazuko          1        0        1        45
Toshi           1        0        1        45
Yoshi           1        0        1        40
Tomoko          0        1        1        35
Yasuhiro        0        1        1        30
Yuichi          0        1        1        30
Masa            0        1        1        25
Mp             45       30       37.50
Mq             30       45         .00
Standard Deviation                           8.29
The formula for the point-biserial correlation coefficient is:

$$r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq}$$

Where:
rpbi = point-biserial correlation coefficient
Mp = whole-test mean for students answering the item correctly (i.e., those coded as 1s)
Mq = whole-test mean for students answering the item incorrectly (i.e., those coded as 0s)
St = standard deviation for the whole test
p = proportion of students answering correctly (i.e., those coded as 1s)
q = proportion of students answering incorrectly (i.e., those coded as 0s)
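The formula translates almost line for line into code. This minimal Python sketch assumes items coded 1 (correct) and 0 (incorrect) and uses the population standard deviation (dividing by N) for St, which is the version that reproduces the 8.29 reported for Table 2. The function name point_biserial is hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, total):
    """rpbi = (Mp - Mq) / St * sqrt(p * q) for an item coded 1 (correct) / 0 (incorrect)."""
    p = sum(item) / len(item)   # proportion answering correctly
    q = 1 - p                   # proportion answering incorrectly
    if p == 0 or q == 0:
        # No variation in answers: sqrt(pq) = 0, so the formula gives .00.
        return 0.0
    mp = mean(t for i, t in zip(item, total) if i == 1)   # Mp
    mq = mean(t for i, t in zip(item, total) if i == 0)   # Mq
    st = pstdev(total)          # St, population (divide-by-N) SD
    return (mp - mq) / st * sqrt(p * q)

# Usage with a tiny hypothetical data set (not from the column):
print(round(point_biserial([1, 1, 0, 0], [4, 3, 2, 1]), 3))   # 0.894
```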
For example, let's apply the formula for rpbi to the data for Item 1 in Table 2 (which we would expect to correlate highly with the total scores), where the whole-test mean for students answering correctly is 45; the whole-test mean for students answering incorrectly is 30; the standard deviation for the whole test is 8.29; the proportion of students answering correctly is .50; and the proportion answering incorrectly is .50.

$$r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{45 - 30}{8.29}\sqrt{(.50)(.50)} = \frac{15}{8.29}(.50) = 1.81(.50) = .905 \approx .91$$
Thus the correlation between item 1 and the total scores is a very high .91, and this item appears to be spreading the students out in very much the same way as the total scores are. In this sense, the point-biserial correlation coefficient indicates that item 1 discriminates well among the students in this group (at least in terms of the way the overall test discriminates).
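Running the Table 2 data through the formula is a quick way to check the hand calculation. In this Python sketch (variable names are hypothetical), carrying full precision gives about .90; the .91 above results from rounding 15/8.29 to 1.81 before multiplying by .50 (1.81 × .50 = .905).

```python
from math import sqrt
from statistics import mean, pstdev

# Table 2: Item 1 responses (1 = correct, 0 = incorrect) and total scores.
item1 = [1, 1, 1, 1, 0, 0, 0, 0]
total = [50, 45, 45, 40, 35, 30, 30, 25]

mp = mean(t for i, t in zip(item1, total) if i == 1)   # Mp = 45
mq = mean(t for i, t in zip(item1, total) if i == 0)   # Mq = 30
st = pstdev(total)                                     # St, about 8.29
p = sum(item1) / len(item1)                            # .50
q = 1 - p                                              # .50

exact = (mp - mq) / st * sqrt(p * q)                   # no intermediate rounding
rounded = round((mp - mq) / 8.29, 2) * 0.50            # hand-calculation route
print(f"{exact:.4f} {rounded:.3f}")                    # 0.9045 0.905
```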
As another example, let's apply the formula for rpbi to the data for Item 2 in Table 2 (which we would expect to be highly but negatively correlated with the total scores), where the whole-test mean for students answering correctly is 30; the whole-test mean for students answering incorrectly is 45; the standard deviation for the whole test is still 8.29; the proportion of students answering correctly is still .50; and the proportion answering incorrectly is still .50.

$$r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{30 - 45}{8.29}\sqrt{(.50)(.50)} = \frac{-15}{8.29}(.50) = -1.81(.50) = -.905 \approx -.91$$
Thus the correlation between item 2 and the total scores is a very high negative value of -.91, and this item appears to be spreading the students out opposite to the way the total scores are. In other words, the point-biserial correlation coefficient shows that item 2 discriminates in a very different way from the total scores, at least for the students in this group.
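Item 2 is the mirror image of Item 1 (every 1 became a 0 and vice versa), so Mp and Mq trade places and only the sign of rpbi changes. A short Python check under the same assumptions (population SD; the helper name is hypothetical):

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, total):
    """rpbi = (Mp - Mq) / St * sqrt(p * q); assumes both codes 1 and 0 occur."""
    mp = mean(t for i, t in zip(item, total) if i == 1)
    mq = mean(t for i, t in zip(item, total) if i == 0)
    p = sum(item) / len(item)
    return (mp - mq) / pstdev(total) * sqrt(p * (1 - p))

total = [50, 45, 45, 40, 35, 30, 30, 25]   # Table 2 totals
item1 = [1, 1, 1, 1, 0, 0, 0, 0]
item2 = [0, 0, 0, 0, 1, 1, 1, 1]

r1 = point_biserial(item1, total)
r2 = point_biserial(item2, total)
print(round(r1, 2), round(r2, 2))   # same size, opposite sign
```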
As one last example, let's apply the formula for rpbi to the data for Item 3 in Table 2 (which we would expect to have no correlation with the total scores), where the whole-test mean for students answering correctly is 37.5; the whole-test mean for students answering incorrectly is 0.00 because it is non-existent; the standard deviation for the whole test is still 8.29; the proportion of students answering correctly is 1.00; and the proportion answering incorrectly is .00.
$$r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{37.5 - 0.00}{8.29}\sqrt{(1.00)(.00)} = \frac{37.5}{8.29}\sqrt{.00} = 4.52(.00) = .00$$
Thus the correlation between item 3 and the total scores is zero, and this item does not appear to be spreading the students out in the same way as the total scores. In other words, item 3 is not discriminating at all among the students in this particular group, in this case because there is no variation in their answers.
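Item 3 is a case that code has to handle explicitly. When every student answers the same way, pq = 0 and Mq does not exist; the calculation above treats the coefficient as .00, whereas a Pearson r computed directly would be undefined because the item has zero variance. This Python sketch (helper name hypothetical) adopts the column's convention:

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, total):
    """rpbi for a 1/0-coded item; returns 0.0 when the item has no variance."""
    p = sum(item) / len(item)
    if p == 0 or p == 1:
        # Everyone right (or everyone wrong): sqrt(pq) = 0, so rpbi = .00.
        return 0.0
    mp = mean(t for i, t in zip(item, total) if i == 1)
    mq = mean(t for i, t in zip(item, total) if i == 0)
    return (mp - mq) / pstdev(total) * sqrt(p * (1 - p))

total = [50, 45, 45, 40, 35, 30, 30, 25]   # Table 2 totals
item3 = [1] * 8                            # every student answered correctly
print(point_biserial(item3, total))        # 0.0
```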
As mentioned above, the point-biserial correlation coefficient can be used in any research where you are interested in understanding the degree of relationship between a naturally occurring nominal scale and an interval (or ratio) scale. For instance, you might be interested in the degree of relationship between being male or female and language aptitude as measured by scores on the Modern Language Aptitude Test (or MLAT; Carroll & Sapon, 1958). The point-biserial correlation coefficient could help you explore this or any similar question. For examples of other uses for this statistic, see Guilford and Fruchter (1973).
However, language testers most commonly use rpbi to calculate the item-total score correlation as another, more accurate, way of estimating item discrimination. The correlation coefficient being calculated here is between a naturally occurring dichotomous nominal scale (the correct or incorrect answer on each item, usually coded as 1 or 0) and an interval scale (total scores on the test). Such item-total correlations are often used to estimate item discrimination. Consider the item analysis results shown in Table 3.
[Table 3: item analysis results]
... a test that is neither too difficult nor too easy. This strategy is very similar to the way the discrimination index is used (for more on this statistic, see Brown, 1996, pp. 66-70). Such statistics can even be useful if what you need is a longer test: simply examine those items that appear to be discriminating well and write more items like them.
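In practice the item-total rpbi is computed for every item at once and the items are then compared. The Python sketch below assumes a hypothetical response matrix (rows are students, columns are items, 1 = correct); it is not the analysis behind Table 3, and it only ranks items, leaving any keep-or-revise cutoff to your own judgment.

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, total):
    """rpbi for a 1/0-coded item; 0.0 when the item has no variance."""
    p = sum(item) / len(item)
    if p == 0 or p == 1:
        return 0.0
    mp = mean(t for i, t in zip(item, total) if i == 1)
    mq = mean(t for i, t in zip(item, total) if i == 0)
    return (mp - mq) / pstdev(total) * sqrt(p * (1 - p))

def item_total_correlations(responses):
    """Return {item_number: rpbi}, correlating each item with total scores."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return {
        k + 1: point_biserial([row[k] for row in responses], totals)
        for k in range(n_items)
    }

# Hypothetical 6-student, 4-item response matrix.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

rpbis = item_total_correlations(responses)
# Most discriminating items first.
for item, r in sorted(rpbis.items(), key=lambda kv: kv[1], reverse=True):
    print(f"Item {item}: rpbi = {r:.2f}")
```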
One important caveat: remember that item analysis statistics, like the rpbi, are only tools that can help you in selecting the best items for a norm-referenced test, but they should never be used to replace the common-sense notions involved in developing sound language test items. In other words, use these statistics to help you understand how students perform on your test items and use that information to help you design a better test next time, while always keeping in mind your theoretical and practical reasons for writing the items you did and designing the test the way you did.
References
Brown, J. D. (1988). Understanding research in second language learning: A teacher's guide to statistics and research design. Cambridge: Cambridge University Press.
Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall.
Brown, J. D. (trans. by M. Wada). (1999). Gengo tesuto no kisochishiki [Basic knowledge of language testing]. Tokyo: Taishukan Shoten.
Brown, J. D. (2000a). Statistics Corner. Questions and answers about language testing statistics: How can we calculate item statistics for weighted items? Shiken: JALT Testing & Evaluation SIG Newsletter, 3(2), 19-21. Also retrieved March 1, 2001 from the World Wide Web: http://jalt.org/test/bro_6.htm.
Brown, J. D. (2000b). Statistics Corner. Questions and answers about language testing statistics: What issues affect Likert-scale questionnaire formats? Shiken: JALT Testing & Evaluation SIG Newsletter, 4(1), 18-21. Also retrieved March 1, 2001 from the World Wide Web: http://jalt.org/test/bro_7.htm.
Brown, J. D. (2001). Using surveys in language programs. Cambridge: Cambridge University Press.
Carroll, J. B., & Sapon, S. M. (1958). Modern language aptitude test. New York: The Psychological Corporation.
Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education (5th ed.). New York: McGraw-Hill.