
Questions and answers about language tes... https://jalt.org/test/bro_12.htm

Shiken: JALT Testing & Evaluation SIG Newsletter
Vol. 5 No. 3. Oct. 2001 (p. 12-16) [ISSN 1881-5537]

Statistics Corner
Questions and answers about language testing statistics:
Point-biserial correlation coefficients

James Dean Brown
University of Hawai'i at Manoa

QUESTION: Recently on the email forum LTEST-L, there was a discussion about point-biserial
correlation coefficients, and I was not familiar with this term. Could you explain what point-biserial correlation
coefficients are and how they are important for language testers?

ANSWER: To adequately explain the point-biserial correlation coefficient, I will need to address four
questions: (a) What is the point-biserial correlation coefficient? (b) How is the point-biserial correlation
coefficient related to other correlation coefficients? (c) How is the point-biserial correlation coefficient
calculated? And, (d) how is the point-biserial correlation coefficient used in language testing?

What Is the Point-Biserial Correlation Coefficient?

As I defined it in Brown (1988, p. 150), the point-biserial correlation coefficient (symbolized as rpbi) is a
statistic used to estimate the degree of relationship between a naturally occurring dichotomous nominal scale
and an interval (or ratio) scale. For example, a researcher might want to investigate the degree of relationship
between gender (that is, being male or female, a naturally occurring dichotomous nominal scale) and
achievement in English as a second language as measured by scores on the end-of-the-year departmental
examination (an interval scale).

Aside from the types of scales involved, the interpretation of the resulting coefficient is very similar to that for
the more commonly reported Pearson product-moment correlation coefficient (sometimes referred to as
Pearson r, or simply r). In brief, like the Pearson r, the rpbi can range from 0 to +1.00 if the two scales are
related positively (that is, in the same direction) and from 0 to -1.00 if the two scales are related negatively
(that is, in opposite directions). The higher the value of rpbi (positive or negative), the stronger the relationship
between the two variables. [For more detailed explanations of the interpretation and assumptions of Pearson
r and rpbi, see Brown, 1996, 1999.]

How Is the Point-Biserial Correlation Coefficient Related to Other Correlation Coefficients?

In distinguishing the point-biserial from other correlation coefficients, I must first point out that the
point-biserial and biserial correlation coefficients are different. The biserial correlation coefficient (or rbi) is
appropriate when you are interested in the degree of relationship between two interval (or ratio) scales but for
some logical reason one of the two is more sensibly interpreted as an artificially created dichotomous nominal
scale. For instance, you might be interested in determining the degree of relationship between passing or
failing a first year university ESL course and language aptitude test scores. To do this, grades at the end of the
course (A, B, C, D and F, often converted to a 4.00, 3.00, 2.00, 1.00, & 0.00 interval scale) might be
artificially separated into a nominal scale made up of two groups: pass (A to D, or 1.00 to 4.00) and fail (F or
0.00). The degree of relationship between this new, artificially created dichotomy and the interval scores on
the language aptitude test could then be determined by using the rbi coefficient. Thus the biserial correlation
coefficient is appropriately applied when the nominal variable is artificially created (as in the pass-fail variable
created from grade points), while the point-biserial correlation coefficient is appropriately applied when the
nominal variable occurs naturally (as in the naturally occurring male-female gender distinction).
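
The artificial pass-fail split described above can be sketched in a few lines of Python. This is only an illustration: the function name `to_pass_fail`, the letter-to-points mapping as a dictionary, and the six sample grades are hypothetical and do not come from the article. The cut-off follows the text (A to D, or 1.00 to 4.00, is a pass; F, or 0.00, is a fail).

```python
# Grade-point conversion mentioned in the text (A-F mapped to 4.00-0.00).
LETTER_TO_POINTS = {"A": 4.00, "B": 3.00, "C": 2.00, "D": 1.00, "F": 0.00}


def to_pass_fail(grade_points):
    """Collapse interval grade points into an artificial dichotomy.

    1 = pass (1.00 to 4.00), 0 = fail (0.00), as in the example in the text.
    """
    return [1 if gp >= 1.00 else 0 for gp in grade_points]


# Hypothetical course grades, invented purely for illustration.
letters = ["A", "C", "F", "B", "D", "F"]
points = [LETTER_TO_POINTS[letter] for letter in letters]
pass_fail = to_pass_fail(points)
print(pass_fail)
```

Because this dichotomy is manufactured from an underlying interval scale, the text's recommendation is the biserial coefficient (rbi) rather than the point-biserial.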


A variety of different correlation coefficients have been developed over the years for various combinations of
scale types, as summarized in Table 1. The point-biserial is just one of these statistical tools (see the fifth row
of correlation coefficients).

Table 1. Types of Correlation Coefficients

Correlation Coefficient    Types of Scales

Pearson product-moment     Both scales interval (or ratio)
Spearman rank-order        Both scales ordinal
Phi                        Both scales are naturally dichotomous (nominal)
Tetrachoric                Both scales are artificially dichotomous (nominal)
Point-biserial             One scale naturally dichotomous (nominal), one scale interval (or ratio)
Biserial                   One scale artificially dichotomous (nominal), one scale interval (or ratio)
Gamma                      One scale nominal, one scale ordinal
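
Table 1 can also be read as a lookup from a pair of scale types to a coefficient. The sketch below encodes only the seven rows of the table. The scale labels (such as "natural dichotomy") and the function name `choose_coefficient` are hypothetical conveniences, and combinations the table does not list simply return `None`.

```python
# Each key is the unordered pair of scale types from one row of Table 1.
# A frozenset with a single element means both scales are of the same type.
TABLE_1 = {
    frozenset(["interval"]): "Pearson product-moment",
    frozenset(["ordinal"]): "Spearman rank-order",
    frozenset(["natural dichotomy"]): "Phi",
    frozenset(["artificial dichotomy"]): "Tetrachoric",
    frozenset(["natural dichotomy", "interval"]): "Point-biserial",
    frozenset(["artificial dichotomy", "interval"]): "Biserial",
    frozenset(["nominal", "ordinal"]): "Gamma",
}


def choose_coefficient(scale_a, scale_b):
    """Return the Table 1 coefficient for two scale types, or None if unlisted.

    "interval" stands for interval (or ratio) scales, as in the table.
    """
    return TABLE_1.get(frozenset([scale_a, scale_b]))


print(choose_coefficient("natural dichotomy", "interval"))
print(choose_coefficient("interval", "artificial dichotomy"))
```

Using an unordered pair as the key means the order in which the two scales are named does not matter.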

How Is the Point-Biserial Correlation Coefficient Calculated?

The data in Table 2 are set up with some obvious examples to illustrate the calculation of rpbi between items
on a test and total test scores. Notice that the items have been coded 1 for correct and 0 for incorrect (a
natural dichotomy) and that the total scores in the last column are based on a total of 50 items (most of which
are not shown).


Table 2. Example Student Data

Student     Item 1   Item 2   Item 3   Item 4, 5, 6...       Total Score

Hachiko     1        0        1                              50
Kazuko      1        0        1                              45
Toshi       1        0        1                              45
Yoshi       1        0        1                              40
Tomoko      0        1        1                              35
Yasuhiro    0        1        1                              30
Yuichi      0        1        1                              30
Masa        0        1        1                              25

Mp          45       30       37.5     Total mean            37.50
Mq          30       45       .00      Standard deviation    8.29
p           .50      .50      1.00
q           .50      .50      .00
rpbi        .91      -.91     .00

To calculate the rpbi for each item use the following formula:

$$ r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} $$

Where:
rpbi = point-biserial correlation coefficient
Mp = whole-test mean for students answering item correctly (i.e., those coded as 1s)
Mq = whole-test mean for students answering item incorrectly (i.e., those coded as 0s)
St = standard deviation for whole test
p = proportion of students answering correctly (i.e., those coded as 1s)
q = proportion of students answering incorrectly (i.e., those coded as 0s)
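
As a minimal sketch, the definitions above translate into Python as the mean difference (Mp minus Mq) divided by St, multiplied by the square root of p times q. The function name `point_biserial` is hypothetical. Two choices here are assumptions rather than anything the article specifies: St is computed as a population standard deviation (dividing by N), because that reproduces the 8.29 shown in Table 2, and an item with no variation (p or q equal to zero) returns .00 directly, since the square root of pq is then zero.

```python
from math import sqrt


def point_biserial(item_codes, total_scores):
    """r_pbi = (Mp - Mq) / St * sqrt(p * q) for one item coded 1/0."""
    n = len(total_scores)
    correct = [t for c, t in zip(item_codes, total_scores) if c == 1]
    incorrect = [t for c, t in zip(item_codes, total_scores) if c == 0]
    p = len(correct) / n    # proportion answering correctly
    q = len(incorrect) / n  # proportion answering incorrectly
    if p == 0 or q == 0:
        # No variation in the item: sqrt(p * q) is 0, so the coefficient is .00
        return 0.0
    m_p = sum(correct) / len(correct)
    m_q = sum(incorrect) / len(incorrect)
    mean_t = sum(total_scores) / n
    # Population SD (divide by N); an assumption that matches Table 2's 8.29
    s_t = sqrt(sum((t - mean_t) ** 2 for t in total_scores) / n)
    if s_t == 0:
        raise ValueError("total scores have no variation")
    return (m_p - m_q) / s_t * sqrt(p * q)


# Item 1 and the total scores from Table 2
totals = [50, 45, 45, 40, 35, 30, 30, 25]
item1 = [1, 1, 1, 1, 0, 0, 0, 0]
r_item1 = point_biserial(item1, totals)
print(round(r_item1, 3))
```

Carried at full precision, Item 1 comes out at roughly .905. Table 2 reports .91 because the hand calculation rounds 15/8.29 to 1.81 before multiplying by .50.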

For example, let's apply the formula for rpbi to the data for Item 1 in Table 2 (which we would expect to
correlate highly with the total scores), where the whole-test mean for students answering correctly is 45; the
whole-test mean for students answering incorrectly is 30; the standard deviation for the whole test is 8.29; the
proportion of students answering correctly is .50; and the proportion answering incorrectly is .50.

$$ r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{45 - 30}{8.29}\sqrt{(.50)(.50)} = \frac{15}{8.29}\sqrt{.2500} = 1.81(.50) = .91 $$

Thus the correlation between item 1 and the total scores is a very high .91, and this item appears to be
spreading the students out in very much the same way as the total scores are. In this sense, the point-biserial
correlation coefficient indicates that item 1 discriminates well among the students in this group (at least in
terms of the way the overall test discriminates).

As another example, let's apply the formula for rpbi to the data for Item 2 in Table 2 (which we would expect
to be highly but negatively correlated with the total scores), where the whole-test mean for students answering
correctly is 30; the whole-test mean for students answering incorrectly is 45; the standard deviation for the
whole test is still 8.29; the proportion of students answering correctly is still .50; and the proportion answering
incorrectly is still .50.

$$ r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{30 - 45}{8.29}\sqrt{(.50)(.50)} = \frac{-15}{8.29}\sqrt{.2500} = -1.81(.50) = -.91 $$

Thus the correlation between item 2 and the total scores is a very high negative value of -.91, and this item
appears to be spreading the students out opposite to the way the total scores are. In other words, the
point-biserial correlation coefficient shows that item 2 discriminates in a very different way from the total scores, at
least for the students in this group.

As one last example, let's apply the formula for rpbi to the data for Item 3 in Table 2 (which we would expect
to have no correlation with the total scores), where the whole-test mean for students answering correctly is
37.5; the whole-test mean for students answering incorrectly is 0.00 because it is non-existent; the standard
deviation for the whole test is still 8.29; the proportion of students answering correctly is 1.00; and the
proportion answering incorrectly is .00.

$$ r_{pbi} = \frac{M_p - M_q}{S_t}\sqrt{pq} = \frac{37.5 - .00}{8.29}\sqrt{(1.00)(.00)} = \frac{37.5}{8.29}\sqrt{.00} = 4.52(.00) = .00 $$

Thus the correlation between item 3 and the total scores is zero, and this item does not appear to be
spreading the students out in the same way as the total scores. In other words, item 3 is not discriminating at
all among the students in this particular group, in this case because there is no variation in their answers.
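
One way to cross-check the three worked examples is to compute an ordinary Pearson r between the 1/0 item codes and the total scores, since the point-biserial formula gives the same number as Pearson r when population standard deviations are used throughout. The sketch below does this for the Table 2 data. The helper name `pearson_r` and the decision to return `None` when a variable has no variation are assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r using population (divide-by-N) moments; None if undefined."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        return None  # no variation in one variable, so r cannot be computed
    return cov / (sd_x * sd_y)


# Data copied from Table 2
totals = [50, 45, 45, 40, 35, 30, 30, 25]
items = {
    "Item 1": [1, 1, 1, 1, 0, 0, 0, 0],
    "Item 2": [0, 0, 0, 0, 1, 1, 1, 1],
    "Item 3": [1, 1, 1, 1, 1, 1, 1, 1],
}

results = {name: pearson_r(codes, totals) for name, codes in items.items()}
for name, r in results.items():
    print(name, "undefined (no variation)" if r is None else round(r, 3))
```

Items 1 and 2 come out at about .905 and -.905 at full precision, in line with the rounded .91 and -.91 above. For Item 3 the Pearson r is strictly undefined because every student answered correctly, and the formula used in the worked example resolves this case to .00.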

How Is the Point-Biserial Correlation Coefficient Used in Language Testing?

As mentioned above, the point-biserial correlation coefficient can be used in any research where you are
interested in understanding the degree of relationship between a naturally occurring nominal scale and an
interval (or ratio) scale. For instance, I might be interested in the degree of relationship between being male or
female and language aptitude as measured by scores on the Modern Language Aptitude Test (or MLAT;
Carroll & Sapon, 1958). The point-biserial correlation coefficient could help you explore this or any other
similar question. For examples of other uses for this statistic, see Guilford and Fruchter (1973).


However, language testers most commonly use rpbi to calculate the item-total score correlation as another,
more accurate, way of estimating item discrimination. The correlation coefficient being calculated here is
between a naturally occurring dichotomous nominal scale (the correct or incorrect answer on each item,
usually coded as 1 or 0) with an interval scale (total scores on the test). Such item-total correlations are often
used to estimate item discrimination. Consider the item analysis results shown in Table 3.

Table 3. Example Item Analysis (for 32 students)

Item #    IF       rpbi

1         0.930    0.153
2         0.656    0.295 *
3         0.882    0.122
4         0.738    0.189
5         0.455    0.310 *
6         0.838    0.394 *
7         0.684    0.469 *
8         0.552    0.231
9         0.581    0.375 *
10        0.398    0.399 *
11        0.926    0.468 *
12        0.774    0.468 *
13        0.663    0.414 *
14        0.862    0.276
15        0.624    0.205

* = p < .05

The goal of the analysis shown in Table 3 is to estimate how difficult each item is (the IF, or item
facility, shown in the second column) and how highly each item is correlated with the total scores
(the rpbi shown in the third column). The item facility, as estimated by the IF, ranges from 0.00
(everybody answered incorrectly) to 1.00 (everyone answered correctly) and shows how
easy (or difficult) each item is. The rpbi shows the degree to which each item is separating the better
students on the whole test from the weaker students. Thus the higher the rpbi, the better the
item is discriminating. Notice in Table 3 that asterisks refer to the p < .05 at the bottom of the
table and thereby indicate the items with point-biserial correlation coefficients that are significant at
the .05 level (in other words, those items that have only a five percent chance of having occurred for
chance reasons alone). [For more information on how to determine these p values for rpbi, see
Brown, 1996, p. 178; for more information on item analysis for norm-referenced testing purposes,
see Brown, 1996 (pp. 64-74), or 2000a]

Certainly, if you are interested in creating a shorter, more efficient, norm-referenced version of the test,
you might be wise to select those items with the highest point-biserial correlation coefficients from
among those that are significant (numbers 2, 5-7, & 9-13) for the new revised version of the test. At
the same time, you should keep an eye on the item facility index shown in the second column so that you
select a balance of items that average out to make a test that is neither too difficult nor too easy. This
strategy is very similar to the way the discrimination index is used (for more on this statistic, see Brown,
1996, pp. 66-70). Such statistics can even be useful if what you need is a longer test: simply examine
those items that appear to be discriminating well and write more items like them.
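
The selection strategy just described can be sketched in Python using the values from Table 3. This is an illustration under stated assumptions, not a prescribed procedure: the significance flags are copied from the table's asterisks rather than computed, the function name `shortlist` is hypothetical, and the mean IF of the shortlisted items is used here only as a rough check on overall difficulty.

```python
# (item number, IF, rpbi, significant at p < .05), copied from Table 3
TABLE_3 = [
    (1, 0.930, 0.153, False), (2, 0.656, 0.295, True),
    (3, 0.882, 0.122, False), (4, 0.738, 0.189, False),
    (5, 0.455, 0.310, True), (6, 0.838, 0.394, True),
    (7, 0.684, 0.469, True), (8, 0.552, 0.231, False),
    (9, 0.581, 0.375, True), (10, 0.398, 0.399, True),
    (11, 0.926, 0.468, True), (12, 0.774, 0.468, True),
    (13, 0.663, 0.414, True), (14, 0.862, 0.276, False),
    (15, 0.624, 0.205, False),
]


def shortlist(rows):
    """Keep the significant items and order them by rpbi, highest first."""
    significant = [row for row in rows if row[3]]
    return sorted(significant, key=lambda row: row[2], reverse=True)


selected = shortlist(TABLE_3)
selected_numbers = [row[0] for row in selected]
# Average item facility of the shortlist, a rough guide to overall difficulty
mean_if = sum(row[1] for row in selected) / len(selected)

print(selected_numbers)
print(round(mean_if, 3))
```

A ranking like this is only a starting point. Which of the shortlisted items to keep still depends on how their facility values balance out and on the content considerations behind each item.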

One important caveat: remember that item analysis statistics, like the rpbi, are only tools that can help you in
selecting the best items for a norm-referenced test, but they should never be used to replace the common
sense notions involved in developing sound language test items. In other words, use these statistics to help
you understand how students perform on your test items and use that information to help you design a
better test next time, while always keeping in mind your theoretical and practical reasons for writing the items
you did and designing the test the way you did.

References
Brown, J. D. (1988). Understanding research in second language learning: A teacher's guide to statistics and research
design. Cambridge: Cambridge University Press.

Brown, J. D. (1996). Testing in language programs. Upper Saddle River, NJ: Prentice Hall.

Brown, J. D. (trans. by M. Wada). (1999). Gengo tesuto no kisochishiki. [Basic knowledge of language testing]. Tokyo:
Taishukan Shoten.

Brown, J. D. (2000a). Statistics Corner. Questions and answers about language testing statistics: How can we calculate item
statistics for weighted items?. Shiken: JALT Testing & Evaluation SIG Newsletter, 3 (2), 19-21. Also retrieved March 1,
2001 from the World Wide Web: http://jalt.org/test/bro_6.htm.

Brown, J. D. (2000b). Statistics Corner. Questions and answers about language testing statistics: What issues affect
Likert-scale questionnaire formats?. Shiken: JALT Testing & Evaluation SIG Newsletter, 4 (1), 18-21. Also retrieved March 1, 2001
from the World Wide Web: http://jalt.org/test/bro_7.htm.

Brown, J. D. (2001). Using surveys in language programs. Cambridge: Cambridge University Press.

Carroll, J. B., & Sapon, S. M. (1958). Modern language aptitude test. New York: The Psychological Corporation.

Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education (5th ed.). New York:
McGraw-Hill.



HTML: http://jalt.org/test/bro_12.htm / PDF: http://jalt.org/test/PDF/Brown12.pdf
