

Using secondary data

After reading this chapter, you should be able to:

know what secondary data are, and how they can be incorporated into your research;
understand the different types of secondary data;
be aware of the main electronic secondary sources;
know the advantages and disadvantages of secondary data;
know how to access secondary data;
be aware of the possible usage of foreign language sources;
appreciate how to evaluate secondary data;
know how to link primary and secondary data; and
understand how to present secondary data.





In the preceding chapter we looked at primary data and associated collection methods. In this chapter we continue the theme of data collection, by examining secondary data.
In contrast to primary data, secondary data are data that have been collected by other researchers. Secondary data encompass a range of different sources. We explored some of these sources in Chapter 3 - general reports, theses, newspapers, academic journals, textbooks, Internet websites, abstracts, catalogues, dictionaries, bibliographies, encyclopaedias and citation indices.
Most Business and Management students rely heavily on secondary data when conducting their research. To be sure, in some cases it can be used exclusively within a research project. Conversely, other students may prefer their project to be dominated by primary data collection, with secondary data receiving limited attention. How does one make a decision on the application of secondary data? Well, a key aim of this chapter is to help you to evaluate the extent to which secondary data may feature in your research.




This chapter begins by defining the nature of secondary data. I then examine reasons that may lead you to base your project entirely on secondary data. At one level, this is often determined by the assessment regulations laid down by your university or college. However, other factors may also influence your decision. We examine these later on in the chapter. Next, the advantages and disadvantages of secondary data are presented. Unsurprisingly, time is cited as a major advantage. The plethora of electronic sources available has made searching for secondary data all the easier. Nevertheless, there are notable disadvantages, and these receive similar attention.
Non-native English speakers, and those who are able to read languages other than English, will be able to use foreign language sources. This of course applies to many international students. Thus, I have included a section in the chapter that highlights some of the issues surrounding the usage of foreign language sources.
The availability of secondary data is a real concern to student researchers. Undoubtedly, your institution's library will contain a wealth of sources. The degree to which these correspond with your research depends on the nature of your topic. In some cases, you may need to access more specialized data. I will therefore make recommendations as to how this can be achieved.
The concluding part of the chapter examines the evaluation and presentation of secondary data. The ability to evaluate secondary data is essential, not least to determine the degree of reliability. Also, if you have amassed a huge amount of data, how do you know what to include and what to omit? One way is to consider the following data evaluation factors - purpose, scope, authority, audience and format. Addressing questions associated with each of these five factors should make the evaluation process easier. Next, I provide a relatively brief overview of how to present secondary data. Often, it can be presented in its original form, or you may wish to present the data in your own way. As there is very little distinction between presenting secondary and primary data, I pay greater attention to presenting data in Chapters 9 and 10. Finally, we examine how to link primary data with secondary data.

What are Secondary Data?

As noted earlier, secondary data are data that have been collected by other researchers. Of course, the researchers could be an individual, a group or a body working on behalf of an organization. Secondary data include everything from annual reports, promotional material, parent company documentation, published case descriptions, magazines, journal articles, and newspaper reports to government printed sources.
Most research begins with secondary data analysis. The outcome of this analysis usually dictates whether or not the researcher will engage in primary research. For example, if you determine that there is a limited amount of secondary data on your chosen topic, you may be more inclined to conduct primary research. Conversely, if a plethora of data exists, then you may not feel the need to engage in primary data collection. The amount of existing data available is just one reason that may influence your decision to focus solely on secondary data. I discuss other reasons in the next section.

Reasons for Basing Your Research Project Entirely on Secondary Data
There are two main reasons that may lead you to base your entire research project on secondary data - the nature of your topic and your institution's assessment regulations.
Certain research topics are more likely to warrant a greater emphasis on secondary data. For example, let us say that you intend conducting a comparative study into gross domestic product (GDP) growth rates among European Union states. Given the large amount of secondary data on this particular topic, the likelihood is that you may not feel the necessity to collect primary data. You may argue that you are perfectly capable of producing a comprehensive piece of research without the need for primary research. This may be true. Nevertheless, your intentions may not be workable due to your institution's assessment regulations.
Some universities and colleges insist that primary data must feature within a research project, although this tends to be more applicable to postgraduate rather than undergraduate programmes. Therefore, if your institution permits projects solely based on secondary data, this may influence you to go down this route. If in doubt, check with your academic institution.
There are other reasons that may influence your decision. Arguably, these are less significant, but they still need to be considered. These can be summarized as:
your choice of research design;
whether you are undertaking international or cross-cultural research; and
whether you are unable to conduct primary research.

First, your choice of research design may influence your decision whether or not to conduct a project based exclusively on secondary data. If, for example, you plan on undertaking a longitudinal study over several years, this will not be feasible using primary research. However, you may find that there are existing longitudinal studies relevant to your chosen topic. This would allow analysis, perhaps even comparative analysis, of existing studies.
Another example is using a case study research design based entirely on secondary data. Let us say that you are concerned with a comparative analysis of the internationalization strategies of two of the UK's leading supermarkets - Tesco and Sainsbury's. You may argue that existing comparative studies have already been undertaken by other researchers, in which case the nature of your research is to compare and contrast existing data.
Realistically, financial and time constraints make undertaking international or cross-cultural research difficult for student researchers. Nevertheless, this does not mean that it has to be ruled out altogether. An abundance of existing secondary sources (perhaps across different countries) may mean that you are in a position to conduct your research. However, you need to be cautious of potential differences in how studies are conducted and analyzed across cultures.
You may feel that an inability to collect primary data means that you have no option but to focus entirely on secondary data. For example, a primary study into pay awards among multinational company directors is likely to be beyond even the most dedicated student! Yet, if several organizations publish these data, you might ask yourself - 'Why do I need to conduct primary research?'
Finally, I am of the view that a perfectly good undergraduate research project can be written based solely on secondary data. True, certain topics are best suited to primary data. But if a suitable amount of secondary data exists, there is no reason why a project based exclusively on secondary data cannot be undertaken. This does not mean that using completely secondary sources is a 'soft option'. The real challenge for students is collecting, analyzing and interpreting someone else's data so that it corresponds to their own research problem.

Business and Secondary Data

Organizations also need to decide how to incorporate secondary data into their research. For example, if a company is about to launch a new product, it will run a market survey to collect primary data and gauge customer reactions; if it wants to evaluate general economic activity in an area, it will use secondary data prepared by the government (Waters, 1997: 73).
In business, there are two broad classifications of secondary data - internal and external data. Examples of internal sources include customer records, sales invoices, previous market research reports and minutes from board meetings. External sources tend to be more varied. These include everything from competitors' promotional brochures to government reports.
Classifying secondary data on the basis of internal and external sources makes sense when considering an organizational perspective. Yet, how does this relate to the student researcher? Arguably, a one-size-fits-all definition is neither practical nor possible. The majority of students do not have the privilege of access to internal company data. Moreover, not all students engage in organizationally based research. Therefore, I propose a 'student-based' classification of secondary data later in this chapter.
Small organizations have a propensity to use mainly secondary data when conducting market research. For example, this may include information from trade magazines, articles in the local press and internal data. One of the reasons for this is that many do not have the resources to engage in primary research. Where primary research is undertaken, this tends to be conducted on an informal basis.
Unlike small firms, large organizations often buy in secondary data, or mailing lists, from specialized agencies. To illustrate, let us say that a tyre manufacturer wishes to promote their tyres to French car dealers. If no published data are available, then one option is to buy the data from a specialized agency. This is costly, of course, and does not guarantee a high response rate. Other potential problems are associated with buying in data. First, secondary data soon become out of date. This is also a concern with internal data such as customer records. Relying on such data can help maintain close customer relationships, but it needs to be updated regularly.
Second, although buying in data can be potentially rewarding, the data have to be correct. There are likely to be literally thousands of data lists available. Companies that fail to buy from a reputable source may find that the data do not meet their expectations, particularly in terms of reliability and validity.
Third, secondary data bought for the purpose of direct marketing activities are unlikely to be exclusive to one organization. This is especially true in business-to-business markets, where several companies compete for a small number of customers.
In sum, for many organizations intent on promoting their products and services, secondary data can be a useful way to target potential customers. However, such data have several limitations: they soon become dated; it is sometimes difficult to verify the credibility of a source; and competitors have access to the same data.

Reliance on the Internet as a Secondary Data Source

In the last section we looked at how businesses might use secondary data. Of course, often an important source of secondary data for both businesses and students is the Internet. First, businesses may use it as a vehicle for gathering information from competitors' websites, accessing industry data or assessing potential environmental threats to the business. Most businesses are also aware of the importance of not overly relying on the Internet. Other, more traditional, secondary sources are equally important. These may include printed business directories, government reports and, of course, internal data.
Similarly, I am certain that the Internet is already an important source of information in your research. An increasingly common theme I have witnessed in research projects is overemphasis on Internet-based sources. This overreliance is likely to have a negative impact on the reliability of your research, especially if your chosen sources are unknown and cannot be tested for their credibility. As we have established, the Internet has brought many benefits to researchers, but the increasing emphasis on Internet sources by some students is a concern. Not only does it mean that attention is being paid to potentially unreliable websites, but also that more traditional sources, such as books, are sometimes being excluded from their research.

The Distinction Between Literature Review and Secondary Data Analysis

By now, you should be aware of the role that secondary sources play in conducting your literature review. However, secondary data can also form a major, if not exclusive, part of your analysis. For those of you focusing your research entirely on secondary data, you need to make a clear distinction in your research between your literature review and analysis. Students sometimes find this difficult. In essence, your literature review is likely to be one or two chapters and will come before your secondary analysis. Unlike your literature review, secondary data analysis may involve using previously published survey data as the focal point for your analysis. Alternatively, the main focus might be on a particular case or multiple cases which could also feature published survey data. In short, either of these approaches is helpful in maintaining a distinction between the literature review and analysis chapters.
Whereas your literature review may describe your chosen survey, and compare and contrast similar studies, your secondary analysis is likely to involve a detailed analysis of your chosen survey. Typically, this might form the basis of one chapter, namely your analysis and results.
Remember to justify your choice of survey. Reasons might include: the reputation of the source, the contemporary nature of the study, the sample size, or simply that it is the only recognizable study conducted in your area of research.

Classifying Secondary Data



Secondary data can be classified in a number of different ways. A distinction can be made on the basis of format and intended audience. First, secondary data can be classified into electronic and written formats. Although this distinction has become blurred in recent years, it still applies to the majority of students engaged in research. Second, these groups can be further divided into subgroups according to their intended target audience. Usually, this means an exclusively academic or commercial audience. Figure 7.1 illustrates the classification of secondary data. Although it is by no means exhaustive, it classifies the main sources of secondary data used by students. Now, let us begin to look at each of these classifications in turn, starting with electronic data.

FIGURE 7.1 Classification of secondary data
Electronic format - commercial audience: government websites, company websites
Electronic format - academic audience: academic journals, conference papers, book reviews
Written format - commercial audience: business directories, trade magazines, company reports
Written format - academic audience: lecture notes

Electronic format
'Electronic data' refers to data presented in electronic format, such as Internet websites. As noted in this chapter, Internet websites are an increasingly popular source of literature for students, but the Internet is by no means the only source of electronic data. Other potential sources for you to consider are DVDs, videos and audio media. These can include an organization's in-house training video, promotional materials for a multinational company, an audio recording of a radio interview with a company director or even an audio book. Although the latter may seem an unlikely source for your research, occasionally you may find invaluable data that are only available in one particular format.
The main advantages of electronic data are that they save time because they are easily accessible, and can be easily stored. Indeed, gone are the days when student researchers were forced to keep several lever arch files of relevant articles. I remember them well!
Table 7.1 provides a list of useful government websites, multilateral organizations and general business-related sites that prove popular with business students. In addition, the American Marketing Association (2013) provides a comprehensive list of electronic data sources.
Electronic format - commercial audience
Electronic sources geared toward a commercial audience include multilateral organization and government websites. The former are particularly useful for students engaged in research on economics, the business environment and international trade. For instance, the WTO site contains a wealth of information on these topics and many others.

TABLE 7.1

Organization                        Web address             Summary of information
World Trade Organization                                    Data on WTO member states
(organization name not legible)                             Statistics on member states
International Monetary Fund                                 Publishes a range of commercial and financial data
European Union                                              Statistical information on member states
Financial Times                                             Information on financial markets
The Economist                                               Economic data and articles
Business Week                       www.businessweek.com    Provider of global business news
British Broadcasting Corporation    www.bbc.co.uk/news      News on the UK and worldwide
Dun & Bradstreet                                            Provides global information on businesses
World Bank                                                  Country data and analysis on the global economy
The Guardian                                                A leading UK newspaper

Electronic format - academic audience

Electronic data targeted at a mainly academic audience include academic journal articles and conference papers. These are an invaluable source of data for students for the simple reason that they are aimed at a predominantly academic audience and are likely to contain many of the sections that will feature in your research project, e.g. literature review, methodology, conclusion, and so on.

Written format

Written data refer to data that are printed in hard-copy format. The main examples of secondary data are the more 'traditional' published sources, such as textbooks and academic journals. Ostensibly, we will continue to see a shift away from traditional publishing to electronic format. This has certainly been the case in terms of publishing and accessing academic journals in recent years. Although your university or college library is likely to hold both electronic and hard copies of academic journals, once again accessing them electronically saves time and is easier to organize.

Written format - commercial audience

Written data produced for a commercial audience are generally published for convenient and functional purposes. For example, a building firm is likely to produce sales invoices, customer records and sales figures. Although the company is legally required to keep such data for accounting purposes, it is also a convenient way to develop customer relationships through direct marketing and advertising. Furthermore, such internal data can aid the organization with strategic development over the short, medium and long term.
Business data are also produced by publishers such as Dun & Bradstreet. Although such data tend to be produced in both hard-copy and electronic format, many companies still prefer hard copy. Organizations and governments can use the data produced to target potential customers and/or business partners.

Written format - academic audience

Academic data in written format still tend to include textbooks and lecture notes. The former are an essential source of data for students, as modules are often structured on a particular key text. Both textbooks and lecture notes can make worthy contributions to your research. For example, many textbooks feature contemporary case studies. You may decide that information from these cases can be used as part of your secondary analysis.

The Advantages of Secondary Data

There are several advantages of using secondary data. First, you will find that the majority of data are available through your institution's library at no or very little cost. Second, in contrast to primary data, secondary data can be relatively straightforward to collect. Detailed advantages include:

Less resource-intensive

In general, secondary data are a convenient and cost-effective source of information for the student researcher. Given that much of your secondary data are likely to be readily available, accessing information this way will save time when it comes to analyzing and interpreting your findings. Obviously, conducting primary research involves a great deal of time to prepare, implement, collect and interpret results. By focusing on secondary data, you may be able to collect and analyze much larger data sets, like those published by an organization such as Mintel.

Can allow for comparative analysis

Another advantage of secondary data is that they can be compared to your primary findings. By comparing your primary data with your secondary sources, you can determine the extent to which you agree or disagree with existing studies. For example, if you consider a study into car ownership, you may find that a national study suggests a possible downturn in the market. However, a questionnaire survey of car owners in your area may generate contradictory data.
Secondary data also enable cross-cultural or international comparative research, as they help to overcome the obvious limitations associated with primary data collection. The Economist Intelligence Unit (EIU) provides country data that can be used exclusively or on a comparative basis.

Ideal for longitudinal studies

I have already noted that secondary sources provide students with an opportunity to engage in longitudinal research. This is a clear advantage. Much of the data collected by governments are compiled over several decades. For instance, census data and data published on the Retail Price Index (RPI) can be analyzed over many years. This lends itself well to a longitudinal study. On the other hand, you still need to be wary of how such data relate to your research problem. For example, census data are ideal if the nature of your research is to examine demographic change and/or population growth, but are unsuitable if you want to explore customers' purchasing decisions, for instance.
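
To illustrate how a published series might be analyzed over several years, here is a minimal Python sketch that calculates year-on-year percentage change. The index values, the years and the function name `yearly_change` are hypothetical and chosen purely for illustration; they are not real RPI or census figures.

```python
# Hypothetical index values by year (illustrative only, not real RPI data).
index_by_year = {2001: 100.0, 2002: 102.0, 2003: 105.06}


def yearly_change(series):
    """Return the percentage change from each year to the next."""
    years = sorted(series)
    changes = {}
    # Pair each year with the one that follows it.
    for previous, current in zip(years, years[1:]):
        difference = series[current] - series[previous]
        changes[current] = round(difference / series[previous] * 100, 2)
    return changes


changes = yearly_change(index_by_year)
for year, change in changes.items():
    print(f"{year}: {change:+.2f}%")
```

A calculation of this kind only describes how the series moves over time; whether the series relates to your research problem is still a matter of judgement.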


Accessible for other researchers

Finally, secondary data facilitate access for other researchers interested in your area of research. Many researchers rely on secondary data. Therefore, by making reference to secondary sources you are likely to aid other researchers engaged in developing their own research.

The Disadvantages of Secondary Data

There are numerous disadvantages associated with secondary data. The main thing to consider is that secondary data should not be used exclusively simply as a means of saving time and money! Also: data may be outdated, there may be a dearth of data relating to your study, and the data may be unreliable.

Access is difficult and costly





Often, you will find that high-quality and reliable secondary data are difficult to access. Examples include certain types of government data and internally produced organizational data. The main reason that access to this type of data is generally restricted is its sensitive nature. Normally, the only way to get hold of such valuable data is if you work for the organization that produces them. Even then, accessing such data can still prove difficult!
The cost of accessing data is also a disadvantage to student researchers. On many occasions students have told me that they have found an excellent report online, but obtaining a copy usually involves a subscription fee or a sizeable one-off payment.

May not match your research problem

You may find it problematic to find secondary data that correspond to your study. For instance, I remember supervising a student who wished to analyze the development of ecotourism in Zanzibar. Needless to say, the amount of secondary data on the subject can best be described as 'narrow'. In circumstances like these, where the nature of the topic is highly specialized, it is often essential to conduct primary research.
Another problem is that although data may appear to correspond to your research, sometimes you may find quite distinct differences in how key variables have been defined. Similarly, a different set of measures may have been employed.

You may believe you have a sufficient amount of data, but remember that secondary data are data that have been collected by other individuals or organizations for their own purposes. Such data may not therefore answer your research questions.

Difficult to determine reliability


In general, the ability to determine whether or not secondary data are reliable is largely down to the source. Certainly, academic journals offer high levels of reliability, as do established business publications such as the Harvard Business Review and The Economist. The main problem tends to be with the more obscure publications and websites.
Sometimes, it can be argued, the limitation associated with reliability can be overcome by using a varied range of secondary sources. By way of illustration, in their article into annual working hours in Britain, Gall and Allsop (2007: 801) justify the use of secondary data by making reference to the absence of standardized and longitudinal data. In addition, the authors also include a wide range of secondary sources:

The material for this research is derived from a number of secondary sources such as the publications of the Advisory, Conciliation and Arbitration Service (ACAS), Incomes Data Services (IDS), Industrial Relations Services (IRS), Labour Research Department (LRD) and the Institute of Personnel Management (IPM)/Chartered Institute of Personnel Management (CIPD) as well as coverage of salient developments in the quality press like the Financial Times and The Guardian and among regional daily broadsheets. Whilst there are a number of weaknesses in the robustness of the data generated using such a method, this data can help supplement other data - which itself is not without weaknesses - so that a fuller, multi component picture of annual hours can be built up.

The secondary data cited in Gall and Allsop's article may not be familiar to you. These include publications from public sector bodies, professional bodies, and regional and national newspapers.

May not be in a manageable form

Data that have experienced little, if any, processing are referred to as raw data, whereas data that have received some form of processing or summarizing are known as compiled or cooked data. For example, a successful dot.com company selling books and music CDs online might collect huge volumes of sales-related raw data each day, but such data are not very useful until they have been analyzed, interpreted and presented in a manageable form. Once processed, this type of data might be used for analyzing sales trends, launching a targeted sales promotion campaign or simply analyzing the most profitable lines. Tables 7.2 and 7.3 show raw data and cooked data respectively.
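
As a rough illustration of the step from raw to cooked data, the following Python sketch totals individual sales transactions by day and compares each total with a daily target. All transaction numbers, values, targets and the function name `cook` are hypothetical; they are assumptions made for the example and are not taken from the tables in this chapter.

```python
from collections import defaultdict

# Hypothetical raw data: (transaction number, day, value in pounds).
raw_transactions = [
    (1001, "Mon", 12.50),
    (1002, "Mon", 7.25),
    (1003, "Tue", 30.00),
    (1004, "Tue", 4.75),
    (1005, "Wed", 18.00),
]

# Hypothetical daily sales targets in pounds.
daily_targets = {"Mon": 15.00, "Tue": 40.00, "Wed": 18.00}


def cook(transactions, targets):
    """Summarize raw transactions as (day, achieved, target, difference)."""
    totals = defaultdict(float)
    for _number, day, value in transactions:
        totals[day] += value
    rows = []
    for day, target in targets.items():
        achieved = round(totals[day], 2)
        rows.append((day, achieved, target, round(achieved - target, 2)))
    return rows


summary = cook(raw_transactions, daily_targets)
weekly_difference = round(sum(row[3] for row in summary), 2)

for day, achieved, target, difference in summary:
    print(f"{day}: achieved {achieved:.2f}, target {target:.2f}, difference {difference:+.2f}")
print(f"Difference for the period: {weekly_difference:+.2f}")
```

The raw list says little on its own, whereas the summary shows at a glance which days fell short of target and which exceeded it.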


TABLE 7.2 An example of raw data
Columns: Transaction no.; value of transaction

TABLE 7.3 An example of cooked data
Columns: Day; Daily sales achieved (£); Daily sales target (£); Difference (+/-)
Rows: Mon 23 March; Tues 24 March; Weds 25 March; Thurs 26 March; Fri 27 March; Sat 28 March; Sun 29 March; Total for week
Surviving values: 21,750 (daily sales target); 2,484 (difference)

A disadvantage of raw data is that researchers need to allocate time to processing and summarizing the data. On the other hand, a major advantage is that the data can be processed in a way that suits the researcher. Table 7.2 is an extract from a daily list of sales transactions for an independent food retailer. Obviously, in its current, unprocessed, form it provides very little information for the retailer. Clearly, the only information that the list does provide is the value of each transaction, and its respective transaction number. Table 7.3 shows how the food retailer might process weekly sales transactions. As you can see, the data have been presented in a much more manageable form. Moreover, it features some interesting information that can be used to aid inventory levels, marketing and budgeting. The most notable feature is the extent to which sales have fluctuated over the course of the week, and of course the fact that the retailer has exceeded its weekly target.

Comparability

One final disadvantage with secondary data is comparability. I have already mentioned that an advantage of using secondary data is that data can be compared with your primary findings. However, if you are engaged in exclusively secondary data, then this option is not open to you. Of course, your secondary data may include both qualitative and quantitative data, e.g. a country report published by the World Trade Organization that includes quantitative statistical data such as GDP figures and qualitative analysis in the form of quotations from leading economics experts.
In one sense, applying a wide range of secondary sources is a good thing, but you must ensure that you can compare your findings on an equal footing. Comparability is often a problem when integrating and examining data from different sources. Differences may occur in the following aspects:

The reliability of the information. In developing countries, where a substantial proportion of the population may be illiterate or difficult to access, population or economic data may be based on estimates or rudimentary data collection procedures.
The frequency of studies. The frequency with which surveys are undertaken may also vary from country to country. While in the UK a population census is undertaken every 10 years, in some countries it may be more than 30 years since a complete census was undertaken.
Measurement units. These are not necessarily equivalent from country to country.
Differences in circumstance. Even where data may seem comparable, there may be differences in the circumstances that lie behind the data. If a researcher was to undertake a comparison of GNP (gross national product) per capita data from Sweden and the UK, the information may prove misleading. The high per capita income figures for Sweden, which suggest a high standard of living, do not take account of the much higher levels of Swedish taxation linked to the state's provision of social services.

(Wilson, 2006: 58)

The Usage of Foreign Language Sources

If you have the capability of reading foreign language sources, then this is clearly an
advantage as it allows you to consult a wider range of secondary sources. In addition,
if you are an international student, then you are likely to be familiar with secondary
sources in your home country that might prove useful for your research. For example,
given that one of my areas of research is the internationalization of Chinese
brands, I tend to supervise several Chinese students. I always stress to them the
importance of consulting both English and Chinese sources. This is partly because
some secondary sources from different countries make for an interesting comparison,
but also because using secondary data from different countries can help to improve the
validity of your results.
How do I reference foreign language sources? To some extent, conventions here
vary depending on the referencing system. Often, non-English sources are treated
the same, with direct quotes translated into English. If you are translating the text
yourself, one way to illustrate this is by inserting 'translation by the author' in
brackets after the quote. For example: 'Wong, 2012: 14, translation by the author.' Again,
you will probably find that there are subtle variations of this approach, so if in
doubt check with your project supervisor. In terms of how to reference the source
in your reference list, typically, the reference will include the title of the book or
article in the original language, followed by the English translation in brackets.
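
If you keep your references in electronic form, the convention just described can be applied consistently with a small helper. The sketch below is a hypothetical illustration only: the function names, the use of round brackets and the exact punctuation are my assumptions, and your own referencing system may differ.

```python
def format_translated_citation(surname, year, page):
    """In-text citation for a quote that you have translated yourself."""
    return f"{surname}, {year}: {page}, translation by the author"


def format_foreign_reference(author, year, original_title, english_title,
                             place, publisher):
    """Reference-list entry for a book: the original-language title first,
    followed by the English translation in brackets (assumed layout)."""
    return (f"{author} ({year}) {original_title} "
            f"({english_title}). {place}: {publisher}.")


# Placeholder values only; substitute the details of your own source.
print(format_translated_citation("Wong", 2012, 14))
print(format_foreign_reference("Surname, I.", 2012, "Original title",
                               "English title", "Place", "Publisher"))
```

Whatever layout you adopt, the important point is to apply it in the same way to every foreign language source.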




Evaluating Secondary Data


Asian states, there may be cultural similarities. Hence, you may not wish to discard
the research in the first instance.

You may be in the position of having gathered an abundance of secondary data.
Yet, on what basis do you decide what to include and what to omit from your
research? Blumberg et al. (2008) list five factors that should be taken into
account when evaluating secondary data - purpose, scope, authority, audience and
format. Table 7.4 summarizes the critical questions that a researcher might ask
when evaluating secondary information sources. The questions are associated with
each of these factors.

Scope covers such qualities as the age and the amount of data available, whether the
information is up to date, how frequently data are updated, what period of time
they cover, how information is presented, etc.



The main point to consider here is the extent to which the purpose relates to your
own research. It does not necessarily have to be a 'perfect fit'. If, for example, you are
conducting research into cultural differences between Japanese and US consumers,
you may find that a similar study focusing on US and South Korean consumers may
be of relevance. One might argue that as South Korea and Japan are both South-East

A more well-known and credible authority on a topic is likely to be more reliable
than an unknown source. Assessing the credibility of the authority will allow you to
determine whether or not the data warrant inclusion in your research.






TABLE 7.4

Evaluating factor and questions

Purpose
Why does the information exist?
What is its purpose?
Does it achieve its purpose?
How does its purpose affect the type and bias of the information presented?
How does it relate to the purpose of my own research?

Scope
How old is the information?
How often is it updated?
How much information is available?
What are the criteria for inclusion?
If applicable, what geographic area, time period or language does it cover?
How does the information presented compare with similar information?

Authority
What are the credentials of the author, institution or organization sponsoring
the information?

Audience
To whom is the information targeted?
What level of knowledge or experience is assumed?
How does the intended audience affect the type and bias of the information?

Format
How quickly can you find the required information?
How easy to use is the information source?
Is there an index?
Is the information downloadable into a spreadsheet or word-processing
program if desired?


The intended audience is a good indicator of the nature and quality of the data.
Classifying your data on the basis of commercial and academic content can help you
to form a judgement as to its appropriateness for your research.

Format
The format of the data (e.g. whether on hard copy or in an electronic version)
dictates the ease with which you can access and interpret the data. Does the layout
make it easy for you to find what you need, for example? Can you consult an index?
Of course, if the data are vital to your research, you will want to use them
irrespective of format.
In sum, it is sometimes difficult to evaluate secondary data. The source may be
relatively unknown - even your supervisor may not be aware of its existence. My
advice is to try to use secondary data from established sources, which can be verified
in some way. In general, if a source is unknown and cannot be verified, then it is
probably best avoided.
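
One way to apply the five factors consistently across many sources is to record a judgement for each factor and then apply a simple decision rule. The sketch below is an illustration under stated assumptions: the 1 to 5 rating scale, the threshold of 3.0 and the function name are hypothetical and do not come from Blumberg et al. (2008).

```python
FACTORS = ("purpose", "scope", "authority", "audience", "format")


def evaluate_source(ratings, verified, threshold=3.0):
    """Return (mean rating, include?) for one secondary source.

    ratings  -- dict mapping each factor to the researcher's 1-5 judgement
    verified -- True if the source can be verified in some way
    A source that cannot be verified is excluded whatever its ratings,
    mirroring the advice that such sources are best avoided.
    """
    missing = [factor for factor in FACTORS if factor not in ratings]
    if missing:
        raise ValueError(f"Missing ratings for: {', '.join(missing)}")
    mean = sum(ratings[factor] for factor in FACTORS) / len(FACTORS)
    return mean, verified and mean >= threshold


# Hypothetical ratings for an established, verifiable source.
ratings = {"purpose": 4, "scope": 3, "authority": 5, "audience": 4, "format": 4}
print(evaluate_source(ratings, verified=True))
```

The numbers themselves matter less than the discipline of asking the same questions of every source before deciding what to include.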

Presenting Secondary Data

Essentially, two factors are likely to determine how you present your secondary data
within your research project. First, whether your data are qualitative or quantitative.
Second, whether your data are raw data, or cooked data. If the latter, then you may
be in a position to present the data in their original format. For example, if the


TABLE 7.5 [table body not recoverable; surviving column headings: Type of investment, No. of projects, Value (m); surviving row label: Joint venture]


Financial Times published a pie chart showing a breakdown of the UK's exports by
industry, you could probably reproduce this in its original format, providing it is
properly sourced, of course. However, if the article only quotes export figures, then
you would need to consider presenting the data in your own way.
Table 7.5 illustrates the use of secondary data in its original format. You do not
need to concern yourself with the actual topic. The point I am trying to make here
is that tables, charts and graphs are ideally suited to many research projects. Indeed,
they can form an important part of your secondary analysis.
Presenting secondary data also includes qualitative data. Let us say that you
intend analyzing secondary data in the form of leading economists' views on the
global economy. As you are analyzing qualitative secondary data, you may illustrate
their views by quoting them directly.
There are numerous ways of analyzing qualitative and quantitative secondary
data. In reality, many of these techniques are equally applicable to primary
data analysis. I will therefore devote greater attention to this topic in Chapters
9 and 10.

Your Project Supervisor and Secondary Data

When a student wishes to see me to discuss concerns over secondary data, it is
usually for one of two reasons. First, the student has been unable to locate sufficient
sources. Second, they have gathered copious amounts of data but are
unsure what to give prominence to in their research. In this chapter, I have covered
both these points at some length.

By now, you should be aware of just how important your supervisor is to your
research project. This includes their advice on overcoming potential difficulties
with secondary data. Your supervisor can provide invaluable advice on how to
evaluate your data, and may also be able to recommend secondary sources that
you have not yet considered. Moreover, if they feel that your topic provides
access to a sparse amount of secondary data, they may be able to offer
suggestions about primary data collection. Once again, when you feel yourself 'hitting
a brick wall' with your research, do not forget to consult your project supervisor.

Linking Primary and Secondary Data

If you intend including primary and secondary data in your research project, then it
can sometimes be difficult to know how to link these two main sources of data. As
previously noted, one way is to compare your primary data with secondary data
from books, journal articles, published statistics and other sources. Part of this
comparison should involve the analysis and interpretation of data. For example, let us
say a UK Government report states that the main reason why joint ventures in
China fail is cultural differences. Yet, analysis of your primary findings
indicates that the leading cause is a failure to agree strategic objectives. The key
question here is: why is there a difference between the primary and secondary data? To be
a good researcher you must compare, analyze and interpret both primary and
secondary data and not fall into the trap of simply describing the data.
Secondary data can also be used to increase the credibility of your primary
research findings. For instance, if your primary findings support the view of leading
authors in the field, one argument that you could make is that your results are likely
to be both valid and credible.

Research in Action

Examining sources of secondary data

In the academic article detailed below, an overview is provided on the sources of secondary data
used in business ethics research. Irrespective of the topic, much of the content is very useful as it
examines sources of secondary data which could equally apply to other areas of business research.
Cowton, C.J. (1998) 'The use of secondary data in business ethics research', Journal of Business Ethics,
17 (4): 423-434.
The aim of Cowton's (1998) paper is to promote the interest of business ethics researchers in using
secondary data, either in place of or as a complement to primary data. The article examines both
sources and usage of secondary data, although the details below centre on sources of secondary
data. In short, the author is making a call to business ethicists to pay greater consideration to
secondary data.
In the article, Cowton highlights a number of sources of secondary data, namely: governmental
and regulatory bodies, the press, companies, other academic researchers and private sources.
First, the author suggests that a source of material of considerable interest to business ethicists
is the legal system. Examples given here include published legal judgments, while regulatory
bodies such as the Advertising Standards Authority (ASA) in the UK publish reports which are
often easily accessible to the researcher.
In terms of companies, the researcher stresses that much material is publicly available,
particularly in the annual report and accounts. This includes the Chairman's statement that can be
analyzed using tools such as content analysis (this is something we cover in Chapter 10).
Reference is made to quantitative, particularly company financial, data as an easily accessible source,
as this information is now published on a number of databases.




The researcher suggests that both the press and media in general are useful sources of data.
Newspaper articles can be used as sources. Also, in some articles you will find opinion polls: the
advantage of these is that they distance the researcher from the evaluations. However, it is worth
noting that newspaper articles should be treated with care as they have not gone through a peer
review process.
Other academic researchers include data by other scholars. Unlike newspaper articles,
academic articles are peer reviewed. In addition, there is scope to undertake meta-analysis. This is
'quantitative analysis of a group of studies that investigates the same research questions'. One
of the benefits of meta-analysis is that it generates a large sample. One issue for the researcher,
though, is the need to perhaps negotiate with the original researcher in order to gain access to
the dataset.
Finally, private sources are another potential source of secondary data. Here, the author
suggests that these include companies' internal reports and memoranda; this is in addition to
organizational archives. However, for the student researcher there are clearly obvious difficulties
in accessing such sources. Cowton makes a similar point by acknowledging the potential
difficulty of negotiating access.
Cowton's (1998) article is helpful to researchers as it considers a number of
different sources of secondary data. A useful exercise would be to look at the entire
article, as it also discusses using secondary data. Once again, although the paper
focuses on business ethics, many of the author's points are equally applicable to
other areas of business.
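
The point made above about meta-analysis generating a large sample can be illustrated with a short calculation. The sketch below assumes a fixed-effect, inverse-variance approach, which is one common technique and not necessarily the one Cowton has in mind; the function name and the study figures are hypothetical.

```python
def pool_studies(studies):
    """Combine several studies that address the same research question.

    studies -- list of (effect estimate, standard error, sample size)
    Returns (pooled effect, pooled standard error, total sample size),
    weighting each study by the inverse of its variance (fixed-effect model).
    """
    weights = [1.0 / se ** 2 for _effect, se, _n in studies]
    total_weight = sum(weights)
    pooled = sum(
        weight * effect
        for weight, (effect, _se, _n) in zip(weights, studies)
    ) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    total_n = sum(n for _effect, _se, n in studies)
    return pooled, pooled_se, total_n


# Three hypothetical studies of the same question.
studies = [(0.20, 0.10, 100), (0.35, 0.08, 150), (0.25, 0.12, 80)]
pooled, pooled_se, total_n = pool_studies(studies)
print(f"pooled effect {pooled:.3f}, standard error {pooled_se:.3f}, n = {total_n}")
```

Notice that the pooled standard error is smaller than that of any single study, which is the practical benefit of the larger combined sample.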





Conclusion

This chapter has examined how secondary data can be incorporated into your study.
It defined what is meant by secondary data and considered the advantages and
disadvantages of using secondary data in your research. It also examined how to
evaluate and present secondary data. Here are the key points from this chapter:
Secondary data are data that have been collected by other researchers.
Secondary data can be classified into electronic and written formats, and subdivided into
commercial and academic purposes.
The main advantage of secondary data is that such data save time and money for the
researcher. The main disadvantages include the potential difficulty in verifying the reliability
of the data and whether the data are applicable to your research problem.
If you have the capability of reading foreign language sources, then this is clearly an
advantage as it allows you to consult a wider range of secondary sources.
Secondary data can be evaluated by considering its purpose, scope, authority, audience and
format.
Your university or college library is likely to contain the majority of secondary data you will
need. However, consult your supervisor if you need more specialized data.
Cooked data can often be presented in its original format, whereas raw data needs to be
processed by the researcher.


Helen's chosen topic for her research project is mergers and acquisitions in the European
banking sector. As a BSc (Hons) Finance student, Helen has learned that the market has experienced
consolidation in recent years, and she is keen to examine the impact of mergers and acquisitions
(M&A) on the banking sector workforce. She has chosen a case study research design, and
intends to analyze two high-profile mergers and one acquisition. All of these took place within
the last 12 months.
Helen is fortunate in that her research supervisor's main area of research is similar to
her own proposed research topic. She has therefore sought advice from her supervisor on
numerous occasions. One of the suggestions her supervisor has made is for her to include a
question that examines whether or not the bout of recent mergers and acquisitions looks set
to continue.
Fortunately, Helen's topic has received detailed coverage across both the business press
and wider media. In addition, in recent months several empirical studies have been published
in leading academic journals. Helen also has a list of all the mergers and acquisitions that have
taken place in Europe since 2000, including those in the banking sector. The main problem she
faces is that this is simply a list from a specialist trade magazine; it provides no detailed
analysis or discussion on the companies involved.

Case study questions

Suggest an approach that Helen could adopt to evaluate the suitability of the secondary data
she has collected.
Discuss the potential problems Helen might encounter by only using secondary sources.


Pauline has decided to base her research topic on 'key changes in British family life over the
last 30 years'. Although this may not seem like a business-related topic, she is interested to
see how key changes, particularly household demands, might influence the marketing of
certain goods and services. In the first instance, Pauline was very keen to adopt a longitudinal
research design, but she now realizes that this is not a realistic option given that her project
needs to be fully completed inside 12 months.
Consequently, Pauline has decided to base her entire project on secondary data. She is
especially pleased to have discovered the General Household Survey (GHS), and intends using
this as her exclusive secondary source of analysis. The GHS includes figures on car ownership,
pensions, sport participation and the use of health services. Pauline believes the GHS is ideal
for her research as it fully addresses her research problem. The GHS is described thus:
The General Household Survey (GHS) has been providing key data on life in modern
Britain for over 30 years. The annual Living in Britain report, the printed and online
document that brings its main findings to a wide audience of researchers, students,
decision-makers, media and more, adds an extra dimension of explanation, insight and
analysis. (National Statistics, 2002)



Much as Pauline would like to carry out primary research, she feels that this is not a viable
option due to time constraints. She believes it would be impossible to generate a sample the
size of the GHS. Pauline has also ruled out using other secondary data analysis because she
believes the GHS fully addresses her research problem. Still, as her research supervisor, Pauline
has met you to seek clarification that you share her views in relation to the GHS.

Do you agree with Pauline that the GHS should be used exclusively? Give reasons for your
answer.


1. You have decided to base your research project on the current economic crisis. Your
leading research question is 'How has the current economic crisis impacted on trade
between the UK and USA?' Suggest possible sources of secondary data that you might
use to answer this question.
Answer: Figure 7.1 provides an eclectic range of secondary sources. The majority of these are
likely to be ideal for the above topic. Moreover, you may also find that each country's
respective Chamber of Commerce and Embassy websites offer useful information. The
important thing here is to try to consult a wide range of secondary data. At the time of writing,
the global financial crisis continues to dominate daily news. Consequently, you should have no
trouble in finding an abundance of information. Remember that the main advantage in
consulting a wide range of secondary data is the likelihood of greater reliability.
2. Can I base my research project entirely on secondary data?
Answer: Some universities and colleges insist that primary data must also feature in a research
project. However, this tends to be more applicable to postgraduate rather than undergraduate
programmes. If in doubt, check with your research supervisor. If you decide to only refer to
secondary data, then it is worth considering explaining to the reader why you have opted not to
undertake primary research. For instance, one reason might be that there is already a plethora
of data on your chosen subject so you do not feel the need to include primary data.
If you are permitted to conduct your research using purely secondary data, consider the
following questions:
Can my research problem be addressed by simply including secondary data?
How significant might the contribution of primary data be to my research?
3. Can I use foreign language sources in my research project?
Answer: The short answer is yes. This is clearly an advantage for the student researcher as it
allows access to a wider range of sources. However, one drawback is the time it takes to
consult both English and foreign language sources. Hence the importance attached to
considering the five factors when evaluating secondary data - purpose, scope, authority,
audience and format.
4. Suggest the possible advantages of using secondary data in an electronic format.
Answer: There are two key advantages associated with using secondary data in electronic
format: time and organization. The ability to access an extensive range of articles
electronically will save you a great deal of time, which you can then spend on accessing a
wider range of sources or analyzing your findings in greater depth. The organization of your
findings can also be more easily achieved using electronic sources. For example, a vast amount
of data can now be transported on a single USB memory stick.

References
American Marketing Association (2013) 'Secondary data sources', http://www.marketingpower.com/Community/ARC/Pages/Research/SecondaryData/default.aspx, accessed 7 August 2013.
Blumberg, B., Cooper, D.R. and Schindler, P.S. (2008) Business Research Methods
(2nd edn). Maidenhead: McGraw-Hill.
Cowton, C.J. (1998) 'The use of secondary data in business ethics research',
Journal of Business Ethics, 17 (4): 423-434.
Gall, G. and Allsop, D. (2007) 'Annual hours working in Britain', Personnel Review,
36 (5): 800-814.
National Statistics (2002) 'Living in Britain: Results from the General Household
Survey', online source: http://webarchive.nationalarchives.gov.uk/20100520011438/statistics.gov.uk/lib2002/default.asp, accessed 19 September 2013.
Waters, D. (1997) Quantitative Methods for Business (2nd edn). Harlow: Addison
Wesley Longman.
Wilson, A. (2006) Marketing Research: An Integrated Approach (2nd edn). Harlow:
Prentice Hall.

Further Reading
Bryman, A. and Bell, E. (2011) Business Research Methods (3rd edn). Oxford:
Oxford University Press.
Cooper, D.R. and Schindler, P. (2008) Business Research Methods (10th edn).
Maidenhead: McGraw-Hill.
Quinlan, C. (2011) Business Research Methods. Andover: Cengage Learning.
Saunders, M.N.K., Lewis, P. and Thornhill, A. (2012) Research Methods for Business
Students (6th edn). Harlow: Prentice Hall.
Stewart, D.W. and Kamins, M.A. (1993) Secondary Research: Information Sources
and Methods (2nd edn). Newbury Park, CA: Sage.



After reading this chapter, you should be able to:


know the stages in the sampling process;
understand the reasons for sampling;
recognize the differences between probability and non-probability sampling;
apply a range of sampling techniques;
appreciate the combining of sampling techniques;
know how to determine sample size; and
recognize issues associated with response and non-response.

In the preceding two chapters we have concentrated our attention on data collection
methods. An important part of primary data collection is sampling. In general,
there are two fundamental questions often associated with sampling - 'How do I
know which sampling technique to use?' and 'What is an appropriate sample size?'
This chapter sets out to answer these questions.
As noted in Chapter 6, when undertaking your research, the likelihood is that you
will need to collect primary data. However, in order to answer your research questions,
it is doubtful that you will be able to collect data from all cases. Therefore, you will
need to select a sample. The entire set of cases from which your sample is drawn is
called the population. In reality, most researchers neither have the time nor the
resources to analyze the entire population. Fortunately, a range of sampling techniques
allows you to reduce the number of cases taking part in your study.
The sampling technique(s) you select largely depend on whether or not you wish
to infer that your findings apply to the wider population. However, you may not wish
to generalize, but aim to provide a 'snapshot' of one particular case, e.g. asking business
customers what they think of one particular delivery process to one particular supplier,
rather than to all suppliers.

This chapter begins by describing the series of stages involved in the sampling
process. During the early stages I make clear the distinction between commonly
used terms in sampling; these include population, sampling frame and sampling
itself. Then we explore the reasons for engaging in sampling and how you set
about selecting your sampling frame. Next, we examine sampling techniques.
There are basically two broad types of sampling: probability (also referred to as
random) sampling and non-probability (also referred to as non-random) sampling.
Each sampling method is clearly defined and the main methods are illustrated
with case examples.
You do not necessarily have to restrict yourself to one type of sampling
technique. Many students combine different types of sampling technique to answer
their research problem. Yet, whatever method you select, it is essential that you
explain the rationale behind your choice. A number of factors are explored that
might lead you to combine sampling techniques. This section concludes with a
summary of the strengths and weaknesses of each sampling technique.
Following this, I deal with a key issue in sampling - that of sample size. Various
factors influence sample size. In essence, these factors can be divided on the basis of
either subjective or statistical methods.
Your choice of sampling technique can directly impact on your response rate.
Usually, non-probability sampling techniques are associated with higher response
rates. However, this has to be balanced with the quality of the response, and not just
numbers. The concluding part of this chapter addresses how to deal with poor
response rates, and how in some instances it may require the introduction of a
substitute sample.

Clearly define your target population
Select your sampling frame
Choose your sampling technique(s)
Determine your sample size
Collect your data
Assess your response rate

Stages in the Sampling Process

Figure 8.1 illustrates the stages that you are likely to go through when conducting your
sampling. In reality, sampling is not always a straightforward linear process, as will
become apparent later on in this chapter. Still, Figure 8.1 provides a useful template
from which to work when choosing and implementing your sampling method(s).
Throughout the chapter I will discuss each respective stage in the sampling
process, concluding with advice on how to assess your response rates.

Stage 1: Clearly Define Your Target Population

Your first stage in the sampling process is to clearly define your target population.
Population is commonly related to the number of people living in a particular
country. However, population does not just relate to individuals. A population is
also a clearly defined group of research subjects that is being sampled. For example,
in the electronics sector, the population might include all electronics manufacturers
in Japan (one particular country). Similarly, if you considered doing research on
tourism development in London, you may consider the population as all tourist
attractions located within the London area.

FIGURE 8.1 Stages in the sampling process

Defining a population is not always straightforward. It largely depends on your
research questions and the context you wish to study. When defining
your population, you need to establish the types of case that make up your
population, e.g. individuals, firms, households, etc.
In essence, a population can often be broken down by moving from the general to the
more specific. For instance, the population of a country such as the UK can be
subdivided as follows: East Anglia, Cambridge, area in Cambridge. Most students do not
have the time or resources to target a large population, and therefore need to target a
smaller population or consider sampling. However, you might find that the size of your
population is small enough for you to target the entire population. Let us say that you
wanted to target manufacturers of large passenger aircraft. In the main, the population
consists of two companies - Airbus and Boeing. So, you may decide to target the entire
population. For this reason, Figure 8.1 shows an arrow leading from 'clearly define your
target population' to 'collect data'. Effectively, you are bypassing the stages concerned
with sampling, as you are targeting the whole population.



Stage 2: Select Your Sampling Frame



A sampling frame is a list of the actual cases from which your sample will be drawn.
The sampling frame must be representative of the population. However, in reality
it is difficult to know when a sample is unrepresentative and should not be used.
An important aspect of your sampling frame is to consider how such a list of
people or organizations can be located. For example, if you intend surveying local
building firms in your area, you may identify your population as all those firms
within a ten-mile radius of your address. At first, locating a list of building firms
may seem relatively simple. An obvious source is your local telephone directory,
although bear in mind that not all firms are likely to publish their details in
the phone book. Therefore, how do you know that those firms in the directory
are representative of the population? We will come back to this point later in this
chapter.
Some researchers do not have access to a sampling frame such as a definitive
directory providing company details. If you find yourself in this position, you will
need to develop your own sampling frame. This can be a time-consuming and
challenging process. Moreover, you are faced with the problem of trying to ensure that
your sampling frame is representative of the population. In some respects, this can
be achieved by referring to earlier studies that might illustrate key characteristics of
the population.
Once you have established your sampling frame, the next stage is to consider
sampling and sampling techniques.

Stage 3: Choose Your Sampling Technique(s)

Prior to examining the various types of sampling method, it is worth noting what is
meant by 'sampling', along with reasons why you are likely to select a sample.
Taking a subset from your chosen sampling frame or entire population is called
sampling. Sampling can be used to make inferences about a population or to make
generalizations in relation to existing theory. In essence, this depends on your choice
of sampling technique.
Figure 8.2 illustrates the relationship between population, sampling frame and
sample. The large circle represents the entire population, the inner circle the
sampling frame, and the centre circle the sample. The size of each circle represents the
number of cases. For example, the population might be 1,000, a list of some of the
cases might represent 500, while the sample might be 250 (a subset of the sampling
frame). Each case is represented by a small circle.
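
The relationship between the three sets can also be expressed in a few lines of Python, using the same figures of 1,000, 500 and 250 cases. This is only a sketch: the case labels are invented, and drawing the sampling frame at random is a stand-in for whatever list (such as a directory) happens to be available in practice.

```python
import random

# Hypothetical population of 1,000 labelled cases.
population = [f"case-{i:04d}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed so that the sketch is repeatable

# Stand-in for the list of cases you can actually obtain (assumption:
# in real research this comes from a directory or similar, not a draw).
sampling_frame = rng.sample(population, 500)

# The sample is a subset of the sampling frame, selected without replacement.
sample = rng.sample(sampling_frame, 250)

print(len(population), len(sampling_frame), len(sample))
```

Because every case in the sample comes from the sampling frame, any case missing from the frame has no chance of being selected, which is why the representativeness of the frame matters.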



FIGURE 8.2 Relationship between population, sampling frame and sample (figure label: individual cases)

Why You May Need to Select a Sample



Before we examine factors that might lead you to engage in sampling, it is worth noting
that businesses also use sampling techniques when conducting research. Suppose, for
instance, a car manufacturer is in the process of launching a new model in Germany
and wants some data about possible sales. Essentially, there are two ways of finding this:
1 It could ask every person in Germany who might buy the car whether or not they will actually
buy it, and how much they would be prepared to pay.
2 It could take a sample of people, ask them whether or not they will buy the new model, and
then estimate the likely demand from the population as a whole.
Obviously, the first option is both costly and time-consuming. Furthermore, the car
manufacturer would be unlikely to contact the entire population of Germany, as not
all people drive or are interested in buying a new car. In short, it is not feasible to
conduct research on such a large scale, especially as it is not targeted to those potential
purchasers of the new car. Option 2 is far more feasible as sampling can be used to
generate dependable results, rather than focusing on the whole population.
ln a<ldition, it is impracticable to even attempt to survey the entire population.
These impracticalities include the following:
there is no time to survey the entire population:
there are not the resources, particularly the budget. lo survey the entire population;
the skills to c:ollec:t all or the data and interpret findings are lacking:
the size of the population is too large;
it is too ditricult to access the entire population: and
you wish to compare your research to that of previous studies.


First, an obvious impracticality that prevents you from surveying the entire population is time. Even if your population is within close geographical proximity, it can still be time-consuming contacting, analyzing and interpreting the findings of each case.
Second, a general lack of resources is likely to prevent you targeting the entire population, in particular the cost involved. The main cost is associated with administering your data collection. Whether you opt for interviews or a questionnaire survey, associated costs might include travel, accommodation, postage, etc.
Third, your lack of skills as a researcher may hamper your ability to survey the entire population. If, for example, your study involves international research, then your lack of language skills may restrict your ability to target the entire population. Similarly, you may not have the skills required to analyze large volumes of data or to allow for the production of a simple, clearly interpreted set of findings.
Fourth, the size of the population means that it is wholly unrealistic to target the entire population. Examples include populations of countries or regions within a country. Moreover, targeting companies within certain sectors, e.g. small businesses in France, is similarly unrealistic.
Fifth, you might find it difficult to access the entire population. Instances that might restrict your access include: a sensitive topic, a lack of published data on the whole population, and a general unwillingness of participants to take part in your research.
Finally, perhaps not so much an impracticality but a preferred choice is your intention to compare your research with that of earlier studies. In order to be able to compare like-for-like data your intention might be to adopt the same sampling method as other researchers.
In this section we have explored a number of reasons for selecting a sample. The next stage is to consider what constitutes a suitable sample size.

Sampling techniques

Probability sampling: simple random; stratified random; cluster sampling; systematic sampling; multi-stage sampling
Non-probability sampling: quota sampling; snowball sampling; judgement sampling; convenience sampling

Figure 8.3 Sampling techniques

In general, sampling techniques can be divided into two types:

probability or random sampling; and
non-probability or non-random sampling.
Before choosing your specific type(s) of sampling technique, you need to decide your broad sampling technique. The reason that I have referred to sampling in both the singular and the plural is because in some cases you may wish to adopt more than one sampling method. We explore reasons for this later on.
Figure 8.3 shows the various types of sampling technique. These can be categorized on the basis of probability or non-probability sampling.

Probability sampling
Probability sampling means that every item in the population has an equal chance of being included in your sample. One way to undertake random sampling would be if you were to construct a sampling frame first and then use a random number generation computer program to pick a sample from the sampling frame.
'Probability or random sampling has the greatest freedom from bias but may represent the most costly sample in terms of time and energy for a given level of sampling error' (Brown, 1947: 337). In addition, it is not always appropriate for those students who are interested in doing case study research. There are several different types of probability sampling technique. We shall now examine each one in turn.

Simple random sampling
The simple random sample means that every case of the population has an equal probability of inclusion in your sample. For example, let us say that you want to survey 250 members of a sports club. If the total membership is 1,000, the probability of inclusion in the sample is:

$$P(\text{inclusion}) = \frac{\text{sports club members sampled}}{\text{total membership}} = \frac{250}{1{,}000} = 0.25 \quad \text{(i.e. 1 in 4)}$$
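The sports club example can be sketched in a few lines of Python. This is only an illustration, assuming the sampling frame is a simple list of membership numbers (the names `members` and `simple_random_sample` are hypothetical, not taken from the text):

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n cases without replacement so that every case in the
    frame has the same chance, n / len(frame), of being included."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    return rng.sample(frame, n)


# Hypothetical sampling frame: membership numbers 1 to 1,000
members = list(range(1, 1001))
sample = simple_random_sample(members, 250, seed=42)

# Probability of inclusion for any one member
p_inclusion = len(sample) / len(members)
print(p_inclusion)  # 0.25, i.e. 1 in 4
```

Fixing the seed is a convenience for checking your work; in a real study you would normally record the seed so that the draw can be audited.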

Disadvantages associated with simple random sampling include:

a complete frame (a list of all units in the whole population) is needed;
in some studies, e.g. surveys by personal interviews, the costs of obtaining the sample can be high if the units are geographically widely scattered; and
the standard errors of estimators can be high.
(Ghauri and Grønhaug, 2005: 149)

Systematic sampling
Systematic sampling is where every nth case after a random start is selected. For example, if surveying a sample of consumers, every fifth consumer may be selected from your sample. You need to ensure that there is no regular pattern in the population which, if it coincided with the sampling interval, or in other words the distance between your chosen cases, might lead to a biased sample. In addition, the subjects in the population need to be ordered in some way. For example, supermarket stock that is ordered on a shelf; names of businesses ordered in a business directory; names of health club members ordered alphabetically on the company database; or addresses ordered on the basis of postcodes.
Referring to the earlier sports club example, we know that we are to select one member in four. With a systematic sample, we would make a random start between one and four inclusively, possibly by using the last two digits in a table of random numbers.
The advantage of this sampling technique is its simplicity. Moreover, a sampling frame is not always required. The main downside is the potential for a regularly occurring anomaly impacting on the quality of the data. By way of illustration, let us say that a manufacturer of luxury ornaments decides to implement a system of total quality management (TQM). A main feature of their TQM strategy is regular quality control checks of their finest bone china range. Adopting a systematic sampling method may sound fine in principle. Yet, what if the machine painting the ornaments develops a fault? If this fault leads to the paint supply running out at regular intervals, these may generate wildly inaccurate data as to the overall quality of the production output. One way to overcome this problem is to have random rather than systematic checks.
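As a rough sketch of the sports club example, the following Python picks a random start between one and four and then takes every fourth member. It assumes an ordered list of membership numbers; the function and variable names are illustrative only:

```python
import random


def systematic_sample(frame, interval, seed=None):
    """Select every `interval`-th case from an ordered frame,
    beginning at a random start between 1 and `interval` inclusive."""
    rng = random.Random(seed)
    start = rng.randint(1, interval)   # randint includes both end points
    return frame[start - 1::interval]  # convert the 1-based start to a 0-based index


# Hypothetical ordered frame: membership numbers 1 to 1,000
members = list(range(1, 1001))
sample = systematic_sample(members, interval=4, seed=1)

print(len(sample))  # 250 members, one in four
```

Note that only the start is random. Once it is chosen, every other selection is fixed, which is why a regular pattern in the ordering of the frame can bias the sample.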
Stratified random sampling
Stratified sampling is where the population is divided into strata (or subgroups) and a random sample is taken from each subgroup. A subgroup is a natural set of items. Subgroups might be based on company size, gender or occupation (to name but a few).
Stratified sampling is often used where there is a great deal of variation within a population. Its purpose is to ensure that every stratum is adequately represented. Hence, the number of items chosen from each subgroup may be proportionate to the size of the stratum in relation to the population. For this reason, when making inferences in relation to the wider population, stratified sampling usually has a smaller sampling error than simple random sampling. This higher level of precision is a key advantage of stratified sampling. Still, it can sometimes be difficult to get hold of detailed information on your entire sampling frame, thereby making it problematic when it comes to identifying your strata.
A systematic approach can be taken when conducting stratified sampling. The stages involved are shown in Figure 8.4.
The following example illustrates how stratified sampling might work in practice. A retailer employs a total of 2,000 people. Using a sampling frame of 1,000 (50% of the total workforce), your intention is to conduct research into absence in the workplace. You have identified the following strata: gender and occupation. The total workforce is made up of 300 male and 400 female managers, and 800 male and 500 female employees respectively. Using stratified sampling, your sample of 1,000 would consist of 150 male and 200 female managers, and 400 male and 250 female employees (see Figure 8.5).
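The proportionate allocation in the retailer example can be checked with a short calculation. The sketch below assumes simple proportional allocation with rounding to the nearest whole case; the dictionary keys and the function name are illustrative:

```python
def proportional_allocation(strata_sizes, sample_size):
    """Allocate the sample across strata in proportion to each
    stratum's share of the total. Rounding is to the nearest case,
    so with awkward numbers the parts may not sum exactly to sample_size."""
    total = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in strata_sizes.items()
    }


# Workforce figures from the retailer example (2,000 people in total)
workforce = {
    "male managers": 300,
    "female managers": 400,
    "male employees": 800,
    "female employees": 500,
}
allocation = proportional_allocation(workforce, sample_size=1000)
print(allocation)
```

Here each stratum contributes exactly half of its members, because the sample of 1,000 is 50% of the workforce of 2,000.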

Choose stratification variables
Divide sampling frame into discrete strata
Number each of the cases within each stratum with a unique number
Select sample using simple random or systematic sampling

Figure 8.4 Stages of stratified sampling























Figure 8.5 An example of stratified sampling (S = sample; R = response rate)

Cluster sampling
Cluster sampling is where the whole population is divided into clusters or groups. Subsequently, a random sample is taken from these clusters, all of which are used in the final sample.
Cluster sampling is advantageous for those researchers whose subjects are fragmented over a large geographical area, as it saves time and money. If, for example, you wish to examine healthcare provision among hospitals in the UK, rather than targeting the entire population, you can use cluster sampling. You may start by recognizing regional NHS (National Health Service) Trusts as clusters. Subsequently, a sample of these Trusts can be chosen at random, with all hospitals in each Trust included in the final sample.

The stages to cluster sampling can be summarized as follows:

Choose cluster grouping for your sampling frame, e.g. type of company or geographical region.
Number each of the clusters.
Select your sample using random sampling.
In the earlier cited example, the main advantage of cluster sampling is that it saves the researcher the time and money of having to travel to every hospital in the UK.
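The three stages above might be sketched as follows. The Trust and hospital names are invented placeholders, not real NHS data, and the function name is hypothetical:

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Choose n_clusters clusters at random and keep every case
    inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample cluster names
    return {name: clusters[name] for name in chosen}


# Hypothetical clusters: each Trust maps to the hospitals it contains
trusts = {
    "Trust A": ["Hospital A1", "Hospital A2"],
    "Trust B": ["Hospital B1"],
    "Trust C": ["Hospital C1", "Hospital C2", "Hospital C3"],
    "Trust D": ["Hospital D1", "Hospital D2"],
}
selected = cluster_sample(trusts, n_clusters=2, seed=7)

# The final sample is every hospital in the selected Trusts
hospitals = [h for members in selected.values() for h in members]
print(sorted(selected), len(hospitals))
```

The randomness applies to the clusters, not to the cases within them. This is what distinguishes cluster sampling from stratified sampling, where a sample is drawn from every subgroup.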
Multi-stage sampling
Multi-stage sampling is a process of moving from a broad to a narrow sample, using a step-by-step process. If, for example, a French publisher of a pet food magazine were to conduct a survey, it could simply take a random sample of dog owners within the entire French population. Obviously, this is both expensive and time-consuming. A cheaper alternative would be to use multi-stage sampling. In essence, this would involve dividing France into a number of geographical regions. Subsequently, some of these regions are chosen at random, and then subdivisions are made, perhaps based on local authority areas. Next, some of these are again chosen at random and then divided into smaller areas, e.g. towns or cities. The main purpose of multi-stage sampling is to select samples which are concentrated in a few geographical regions. Once again, this saves time and money.
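The narrowing from regions to local authority areas to towns can be illustrated with a nested random draw. The sketch below uses made-up placeholder names ("Region 1", "Area 1.1" and so on) rather than real French geography, and the numbers chosen at each stage are arbitrary assumptions:

```python
import random


def multi_stage_sample(regions, n_regions, n_areas, n_towns, seed=None):
    """Stage 1: pick regions at random. Stage 2: pick areas within each
    chosen region. Stage 3: pick towns within each chosen area."""
    rng = random.Random(seed)
    towns = []
    for region in rng.sample(sorted(regions), n_regions):
        areas = regions[region]
        for area in rng.sample(sorted(areas), min(n_areas, len(areas))):
            area_towns = areas[area]
            towns.extend(rng.sample(area_towns, min(n_towns, len(area_towns))))
    return towns


# Hypothetical hierarchy: 6 regions, each with 4 areas of 5 towns
regions = {
    f"Region {r}": {
        f"Area {r}.{a}": [f"Town {r}.{a}.{t}" for t in range(1, 6)]
        for a in range(1, 5)
    }
    for r in range(1, 7)
}
selected = multi_stage_sample(regions, n_regions=2, n_areas=2, n_towns=3, seed=3)
print(len(selected))  # 2 regions x 2 areas x 3 towns = 12 towns
```

Fieldwork is then confined to the towns in `selected`, which all fall within just two regions.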
Non-probability sampling
If, for example, you decided to use a non-probability sample in your research, then the probability of each case being selected from your total population is not known. Thus, it is not possible for you to make statistical inferences in relation to the wider population.
Non-probability sampling is often associated with case study research design and qualitative research. With regards to the latter, case studies tend to focus on small samples and are intended to examine a real-life phenomenon, not to make statistical inferences in relation to the wider population. A sample of participants or cases does not need to be representative, or random, but a clear rationale is needed for the inclusion of some cases or individuals rather than others.
There are several advantages and disadvantages associated with non-probability sampling. We shall examine these in relation to each of the respective non-probability sampling techniques. These are now explored.
Quota sampling
Quota sampling is a non-random sampling technique in which participants are chosen on the basis of predetermined characteristics so that the total sample will have the same distribution of characteristics as the wider population. One problem associated with quota sampling is the difficulty of gaining information on the characteristics of those in your intended sample. In addition, as a non-probability sampling method you are unable to infer from your results to the wider population.
Snowball sampling
Snowball sampling is a non-random sampling method that uses a few cases to help encourage other cases to take part in the study, thereby increasing sample size. 'This approach is most applicable in small populations that are difficult to access due to their "closed" nature, e.g. secret societies and inaccessible professions' (Brewerton and Millward, 2001).
If, for example, you want to interview managing directors of local companies, you may find it difficult accessing a sufficient number of directors. Snowball sampling allows you to perhaps overcome this problem by using the directors you do know to contact other directors within their business network. How can this be achieved? Well, often you tend to find that people in senior management positions are members of various professional bodies, business associations and, in general, of a wider business network. In essence, you can use a few of these key individuals to contact their wider business network about your research.
The advantage of snowball sampling is that other individuals can do much of the work for you in developing your sample size. In addition, some of these individuals may be crucial to your research. The disadvantage with snowball sampling is that it provides the researcher with very little control over the cases within the sample. For example, individuals are likely to contact close friends, who may share similar views. Hence, there is the possibility of obtaining a biased, unrepresentative number of cases.
Convenience sampling
Convenience sampling is selecting participants because they are often readily and easily available. Typically, convenience sampling tends to be a favoured sampling technique among students as it is inexpensive and an 'easy option' compared to other sampling techniques. Convenience sampling often helps to overcome many of the limitations associated with research. For example, using friends or family as part of your sample is easier than targeting unknown individuals. A key limitation connected with convenience sampling is the potential for high levels of bias as well as the inability to make generalizations across the wider population. In short, convenience sampling can be defined as a non-random sampling technique that is chosen by the researcher for its practicality.
Purposive or judgemental sampling
Purposive or judgemental sampling is a strategy in which particular settings, persons or events are selected deliberately in order to provide important information that cannot be obtained from other choices (Patton, 1990; Maxwell, 1996). It is where the researcher includes cases or participants in the sample because they believe that they warrant inclusion. In other words, inclusion is based on your own personal judgement. A number of reasons may exist for your deciding to include certain cases but exclude others. These can be summarized as:
a unique case;
a critical case; and
a focus on heterogeneous or homogeneous groups.
A unique case is one that displays a characteristic that is not shared by other cases. To illustrate, let us say that you are conducting a study into the world's leading brands. In general, these are dominated by the USA and Japan. However, if one European brand (let's say Nokia) was the only one representing Europe in the top ten, this might be interpreted as a unique case. By including a unique case, the intention is to compare and contrast findings with other cases in your sample. The inclusion of a unique case can often provide interesting findings that can be further explored in later studies.
A critical case is one that is essential to your research. For example, the USA would be viewed as a critical case in the context of research into the world's leading economies. In other words, failure to include the USA in your research is likely to have a negative impact on the credibility of your results.
Finally, you may wish to examine either heterogeneous or homogeneous groups. The former might comprise student participants from different countries, whereas the latter would focus on a group of students from one particular country.
One of the drawbacks associated with purposive sampling is the potential for sampling bias. Moreover, you are unable to generalize your research findings to the wider population.


You do not necessarily have to restrict yourself to one type of sampling technique. Many researchers combine various methods when conducting their research. Of course, you will need to justify your reasons for combining your chosen methods within your research. First, you might argue that similar studies have combined sampling techniques. Therefore, in order to compare findings, you wish to make a case for combining methods. Second, you might be using methodological triangulation as part of your study. Therefore, in order to increase levels of reliability and validity, you may wish to make the case for combining sampling methods. Finally, you may simply argue that combining methods means that you are better placed to answer your research problem.
In general, certain sampling techniques are better combined than others. For instance, if using convenience sampling, it seems natural to perhaps include cases within your sample that are crucial to your research (judgement sampling). While using probability and non-probability methods for the same population can be undertaken, it is often time-consuming and costly for the researcher.
Table 8.1 illustrates strengths and weaknesses associated with each respective sampling technique. By now, you should be aware that there is no one 'best' sampling method. As noted, a number of factors are likely to dictate your choice of sampling technique. However, Table 8.1 acts as a useful starting point.

TABLE 8.1 Strengths and weaknesses of sampling techniques

Technique | Strengths | Weaknesses
Convenience sampling | Least expensive, least time-consuming, most convenient | Selection bias, sample not representative, not recommended for descriptive or causal research
Judgement sampling | Low-cost, convenient, not time-consuming, ideal for exploratory research designs | Does not allow generalization, subjective
Quota sampling | Sample can be controlled for certain characteristics | Selection bias, no assurance of representativeness
Snowball sampling | Can estimate rare characteristics | Time-consuming
Simple random sampling | Easily understood, results projectable | Difficult to construct sampling frame, expensive, lower precision, no assurance of representativeness
Systematic sampling | Can increase representativeness, easier to implement than simple random sampling, sampling frame not always necessary | Can decrease representativeness
Stratified sampling | Includes all important sub-populations, precision | Difficult to select relevant stratification variables, not feasible to stratify on many variables, expensive
Cluster sampling | Easy to implement, cost-effective | Imprecise, difficult to compute and interpret results

Stage 4: Determine Your Sample Size

'How large does my sample size need to be?' Students have confronted me with this question on countless occasions. In short, there is no easy answer. A number of factors influence your choice of sample size. However, remember that before you make your choice, you should have already established two important sets of criteria. First, you are able to clearly define your population and, second, you have determined your sampling frame.

In some respects, the size of your sample is likely to be determined by the nature of your research philosophy. If you have adopted a positivist stance, then the likelihood is that you are interested in generating a large sample that permits statistical analysis. On the other hand, if you decide to engage in an interpretivist approach to your research, then you are likely to be concerned with a smaller sample. In summary, your choice of sample size depends on:
the confidence you need to have in your data, i.e. the level of certainty that the characteristics of the data collected will represent the characteristics of the total population;
the comparative sample size of earlier studies;
the margin of error that you can tolerate, i.e. the precision you require for any estimates made from your sample;
the types of analysis you are going to undertake, in particular the number of categories into which you wish to subdivide your data, as many statistical techniques have a minimum threshold of data cases for each cell;
the size of the total population from which your sample is being drawn; and
using formulas and published tables.


First, if you wish to make an inference to the wider population, as a 'rule of thumb' the sample size should consist of at least 30 cases. Ideally, where the population is less than 30, the entire population should be included in the study, although this also depends on the extent to which a sample is homogeneous. In general, the more heterogeneous a population, the larger the sample required to acquire a representative sample, while the more homogeneous, the less variability in the distribution of the characteristics in the population. The confidence level is expressed as a percentage, typically 95% or 99%. It is how confident you are that a parameter falls within a specific range of values. A parameter is a population characteristic such as a proportion (P) or mean. For example, if you adopt a 95% confidence level, then you expect 95 out of 100 samples will have the true population value within the range of precision, e.g. 5 per cent.
Second, you may wish to base your sample size on earlier, similar studies in your chosen field. The advantage of this is that it can allow for direct comparative analysis with previous research. Yet the downside is that reviewing these earlier studies does not always reveal the sampling techniques used by researchers. Thus, although your sample may be of similar size, it is unlikely that you will adopt the same sampling method and/or cases.
Third, the level of precision or sampling error is the range in which the 'true' population value is estimated to be. This range is typically illustrated using percentages, e.g. 5 per cent. For example, if a researcher finds that 80% of consumers engage in recycling activities with a precision rate of 5%, then the researcher can conclude that between 75% and 85% of consumers in the population have engaged in recycling activities.
Every sample statistic you calculate from a sample will have sampling error, no matter which sampling technique you use. Ultimately, the lower the sampling error required, the larger the probability sample needs to be. The only way to eliminate sampling error is to undertake a census of the entire population.
Fourth, if you are adopting a case study research design, then the likelihood is that you are examining a real-life phenomenon, perhaps an in-depth case. Hence, the sample size may be one. Conversely, if you are conducting quantitative research and are intending to use inferential statistics, then you need to consider your sample size carefully. In particular, subdividing your data into various categories will impact on the types of data analysis technique you can use.
Fifth, if the population is small, e.g. less than 150, then you may be in a position to carry out a census of the entire population. The advantage of being able to do this is that it eliminates sampling error and provides data on all individuals. Still, as we have established, it can be both costly and time-consuming targeting the entire population.
Finally, there are several different formulas and tables used for determining sample size. The range of methods is beyond the scope of this book. For a more in-depth discussion on sampling methods see Cochran (1977). However, below is an example from Yamane (1967). This formula is used to calculate the sample sizes in Table 8.2. The table shows the size of population and the required sample size for 5% precision levels, where the confidence level is 95% and P = .5, which is the estimated proportion of a characteristic that is present in the population.

Yamane (1967: 886) suggests the following way to calculate sample size:

$$n = \frac{N}{1 + N(e)^2}$$

where:
n = sample size
N = population size
e = precision (sampling error)
By way of illustration, if you know the population size to be 300 managers and wish to know the sample size in order that 95% of the sample values are within 2 standard deviations of the true population mean, the above formula would read as follows:

$$n = \frac{300}{1 + 300(.05)^2} = \frac{300}{1.75} \approx 171.4, \text{ rounded up to } 172$$

A sample size of 172 managers from a population of 300 corresponds to the final set of figures given in Table 8.2.
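Yamane's formula is easy to check with a short calculation. The sketch below assumes that the result is always rounded up to the next whole case, which reproduces the figure of 172 for 300 managers; the function name is illustrative:

```python
import math


def yamane_sample_size(population_size, precision=0.05):
    """Yamane (1967): n = N / (1 + N * e**2).
    Rounding up to a whole case is an assumption made here."""
    n = population_size / (1 + population_size * precision ** 2)
    return math.ceil(n)


print(yamane_sample_size(300))  # 172
```

Trying a few different population sizes shows that the required sample grows much more slowly than the population itself.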
As mentioned, there are several methods used to calculate sample size. The criteria of precision (sampling error), variability and confidence level are important considerations when thinking about sample size in your own research project. Remember that your calculated sample size does not allow for non-responses. Ideally, you need to increase your sample size by at least 30% in order to overcome this problem.
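To see what a 30% allowance means in practice, the calculated sample size can simply be scaled up. This is a minimal sketch, assuming the uplift is applied to the calculated size and rounded up; the function name and default are illustrative:

```python
import math


def inflate_for_non_response(calculated_size, uplift=0.30):
    """Increase a calculated sample size by a fixed proportion
    to allow for cases that do not respond."""
    return math.ceil(calculated_size * (1 + uplift))


# Using the 172 managers from the Yamane illustration
print(inflate_for_non_response(172))  # 224
```

In other words, you would approach 224 managers in the hope of ending up with at least 172 usable responses.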
Calculating sample size is typically associated with probability sampling, involving quantitative analysis. Deciding on sample size for a qualitative study might seem simple, as you are not necessarily making inferences in relation to the wider population. However, in some respects it is more difficult because there are no rules that govern sample selection. In some respects, the final sample size is almost always a matter of judgement rather than calculation (Hoinville and Jowell, 1985).

TABLE 8.2 Sample size for ±5% precision levels where confidence level is 95% and P = .5

Size of population (N) | Sample size (n) for precision (e) of ±5%
300 | 172

Stage 5: Collect Your Data

Once you have established your target population, sampling frame, sampling technique(s) and sample size, the next step is to collect your data. As noted in Chapter 6, there are numerous primary data collection methods. Your chosen method(s) are likely to impact on your choice of sampling techniques. Participant observation is typically associated with non-probability sampling, whereas you are more likely to use probability techniques if you are conducting a questionnaire survey.
Once you start to collect your data, a key concern is achieving a suitable number of responses. Ideally, your sample size should be large enough to accommodate non-responses. We examine the issue of response rates in the next section.

Stage 6: Assess Your Response Rate

Your response rate is the number of cases agreeing to take part in your study. These cases are taken from your original sample. Your response rate can, of course, be represented as a percentage or actual number. For example, if we consider the former, let us say that you have 200 cases in your sample and a total of 40 participants take part. Then your response rate equates to:
40/200 x 100 = 20% (response rate)
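The same calculation can be written as a small helper, which is handy if you are tracking responses as they come in. The function name is hypothetical:

```python
def response_rate(responses, sample_size):
    """Return the response rate as a percentage of the original sample."""
    if sample_size <= 0:
        raise ValueError("sample_size must be a positive number of cases")
    return responses * 100 / sample_size


print(response_rate(40, 200))  # 20.0
```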
In reality, most researchers never achieve a 100% response rate. Reasons for this might include refusal to respond, ineligibility to respond, inability to respond, or the respondent has been located but you are unable to make contact. Response rate is important because each non-response is liable to bias your final sample. If, for example, you decided to conduct an email survey into working practices and received a response rate of 60%, this could be interpreted as an excellent response. Yet, closer examination of your findings might indicate that the majority of respondents are those in full-time employment and working from home, while those in part-time employment generated a low number of responses. You might then conclude that this bias is possibly due to the fact that those in part-time employment have limited access to the Internet. Therefore, your sample is not representative of the entire population.
A high response rate is essential if you wish to infer from your results to the wider population. If you do generate a low response rate, then there is a greater likelihood of sample bias. For example, response rates as low as 5-20% increase the likelihood of the sample being unrepresentative. To illustrate, suppose you conduct research into keeping pets using a random sampling method of people in your area. If you receive a response rate of only 25%, this might be because only those respondents who have a pet or an interest in pets responded. This is because those people with an interest in pets are more likely to return their completed questionnaire.
For these reasons, it is vital to keep track of responses. If you do experience a low response rate, do not feel too despondent. A low response rate may be in keeping with previous research studies. If so, this can make for interesting comparative analysis. Through conducting your literature review, you should be familiar with what constitutes a 'typical' response rate in your chosen subject.
A key factor in increasing your response rate is always to involve a 'chase-up' stage. This basically involves contacting participants again so as to 'encourage' them to take part in your research. If all else fails, a reserve sample can always be placed on standby, if your population is large enough to allow you to have a reserve sample, that is!
In sum, response rate is important because each non-response is liable to bias your final sample. Clearly defining your sample, employing the right sampling technique, and generating a large sample in some respects can help to reduce the likelihood of sample bias.
Choosing an inappropriate sampling technique is something that you will probably realize in hindsight. A key indication of a poor choice of sampling method is a low response rate. Certain methods, such as convenience sampling, are unlikely to pose a problem in terms of response. Conversely, stratified random sampling is often more problematic because of its very nature.
Generally, you should be confident that you have chosen the right sampling method prior to collecting your data. Your choice of sampling method may be dependent on how much accuracy is needed. Let us say that you have an important problem to address. There is therefore a greater need for an unbiased sample with a measurable sampling error. The best technique would be a probability sample. On the other hand, if the findings relate to a less important decision, a judgement sample may be justified (Peterson and O'Dell, 1950).

Innocent Drinks

Four years after finishing university, former Cambridge graduates Richard Reed, Adam Balon
and Jon Wright decided to work together on pursuing their business idea. With a background
in management consultancy and advertising, the three budding entrepreneurs certainly had
the knowledge required to start their own business. Their intention was to develop the 'best
fresh drinks in the whole world'. Yet, although they believed in the essence of their idea,
they still did not have a product.
In 1998, after six months of trying out recipes on friends, they spent £500 on fruit, turned
it into smoothies, and sold them at a small music festival in London. Next to their stall, they
put up a large sign saying 'Do you think we should give up our jobs to make these smoothies?'
They put out a bin saying 'YES' and a bin saying 'NO' and asked festival goers to put their empty
bottle in one of the bins. Richard, Adam and Jon were no doubt pleased to find that the 'YES' bin
was full because they did not hesitate in resigning from their jobs the following day! Innocent
Drinks was born.
Given their financial constraints, they continued with their sampling by taking several bottles at
a time into local retailers one day and returning the next day to see if the shop owner wanted to
sell more. With the help of a capital injection of £250,000 from an American investor, the
company has grown from strength to strength. Products include a wide range of fruit drinks, all of
which are made of 100% natural ingredients.
Interestingly, although not particularly scientific, some might argue that a key factor in the
company's success was the relentless sampling undertaken by the entrepreneurs at the
beginning of the venture.

Questions
1. What type of sampling technique do you think Richard, Adam and Jon adopted?
2. What are the advantages and disadvantages of their chosen technique?
3. Discuss the alternative sampling techniques the three entrepreneurs could have adopted. Give
reasons for your answer.

Sources: Innocent Drinks (2009); Marketing Week (2005)

Summary and Conclusion

In this chapter we have examined a number of sampling-related issues, in particular
why sampling is often undertaken by researchers, the process involved in selecting
a sample, and the range of sampling techniques available to researchers. Here are
the key points from this chapter:

Miriam, a final year Finance student, has decided to base her research on the relationship
between the marketing and finance function in accounting practices. Miriam is a part-time
student, and divides her time between studying and working in a local accountancy practice.
Several of her colleagues are members of the professional accountancy organization, the
Association of Chartered Certified Accountants (ACCA). Miriam has asked her colleagues to
speak to fellow ACCA members to see if they would be interested in participating in her study.
In essence, Miriam has opted for a non-probability sampling method - snowball sampling. She
is confident of generating a suitably sized sample as she has the full support of her colleagues.
Miriam's rationale for choosing snowball sampling is that several existing studies have used
the same sampling technique. Hence, she wishes to compare her findings with those of existing

Supervisor questions
1. Discuss the advantages and disadvantages associated with snowball sampling.
2. Describe the sampling design process Miriam is likely to go through.
3. What are the options available to Miriam if she fails to achieve a suitably sized sample?
4. In general, why are non-probability samples popular among students?

Determining sample size

Richard's research focuses on student satisfaction within his own university. He has chosen to
undertake a single case study research design. Given that Richard is interested in the opinions of
his fellow students, he believes that a convenience sampling technique is the obvious choice for
his study. As an active member of his Student Union, Richard is in the fortunate position of having
access to a large number of his peer group. However, although the number of participants does
not appear to be a problem, Richard is unsure how large a sample size is required. Richard has
arranged a meeting with you to discuss your views as to an appropriate sample size.
Case study question
Can you suggest a suitably sized sample for Richard?

The sampling method(s) you choose largely depend on whether or not you wish to infer that
your findings apply to the wider population.
A population is a clearly defined group of research subjects that is being sampled.
You need to consider a number of factors when considering your sample size. These include:
the confidence that you have in your data, earlier studies, the margin of error you can tolerate,
the types of analysis you are going to undertake, the size of the total population, and using
formulas and published tables.
In general, sampling techniques can be divided into two types: probability or random sampling
and non-probability or non-random sampling.
A high response rate is essential if you wish to infer from your results to the wider population.
If you generate a low response rate, then there is a greater likelihood of sample bias.



1. What sampling technique shall I use for my research?

Answer: You need to consider a number of factors when considering your sample size. These
include the confidence that you have in your data, earlier studies, the margin of error you can
tolerate, the types of analysis you are going to undertake, the size of the total population, and
using formulas and published tables. In addition, if opting for a qualitative research strategy, then
the size of your sample may not be such an issue. This is because the aim of your research is not
to make inferences in relation to the wider population, but to analyze a real-life phenomenon.
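The answer above mentions 'formulas' without stating one. As a hedged illustration only, the sketch below uses Cochran's well-known formula for estimating a proportion, with an optional finite population correction. The chapter itself does not prescribe this formula; the 95% confidence value z = 1.96, the worst-case proportion p = 0.5 and the population of 4,000 are all assumptions chosen for the example.

```python
import math


def cochran_sample_size(margin_of_error, z=1.96, p=0.5, population=None):
    """Estimate a sample size for a proportion using Cochran's formula.

    z = 1.96 corresponds to 95% confidence and p = 0.5 is the most
    cautious assumption about the proportion (both are assumptions).
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction for a known, limited population.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Hypothetical inputs: a 5% margin of error, first with a very large
# population and then with a population of 4,000.
large_population_n = cochran_sample_size(0.05)
finite_population_n = cochran_sample_size(0.05, population=4000)
print(large_population_n, finite_population_n)
```

A tighter margin of error or a higher confidence level increases the required sample, which is why these factors appear in the list above.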
2. Can I use more than one sampling technique?
Answer: Do not feel that you have to restrict your research to one sampling technique.
However, bear in mind that certain sampling techniques are perhaps better suited than others.
Combining judgement and convenience is not unusual. Both are non-probability sampling methods,
and there is a clear similarity between the two. If, for example, you were concerned with studying
the financial performance of SMEs in your region, you may decide to choose those within a
five-mile radius of your home (convenience). However, if you are in the fortunate position of having
many businesses within close proximity, you would choose those that, in your judgement, were
perhaps likely to provide the most interesting findings (judgement sampling).

Cochran, W.G. (1977) Sampling Techniques (3rd edn). New York: Wiley.
Ghauri, P. and Grønhaug, K. (2005) Research Methods in Business Studies. Harlow:
FT/Prentice Hall.
Hoinville, G. and Jowell, R. (1985) Survey Research Practice. Aldershot: Gower.
Innocent Drinks (2009) 'Our story', online source: www.innocentdrinks.co.uk/
us/?Page=our_story, accessed 10 April 2009.
Malhotra, N.K. and Birks, D.F. (2006) Marketing Research: An Applied Approach
(2nd edn). Harlow: FT/Prentice Hall.
Marketing Week (2005) 'A loss of innocence', online source: www.marketingweek.
co.uk/home/a-loss-of-innocence/2004552.article, accessed 10 April 2009.
Maxwell, J.A. (1996) Qualitative Research Design: An Interactive Approach,
Applied Social Research Methods Series. London: Sage.
Patton, M.Q. (1990) Qualitative Evaluation and Research Methods. Newbury Park,
CA: Sage.
Peterson, P.G. and O'Dell, W.F. (1950) 'Selecting sampling methods in commercial
research', Journal of Marketing, 15 (2): 182-189.
Yamane, T. (1967) Statistics: An Introductory Analysis. New York: Harper & Row.

3. How do I identify my population?

Answer: This can be difficult. You may find that there is no one definitive directory of all
companies, consumers or managers that make up your population. This is a problem I found
when researching UK companies that had established joint ventures in China. Although the
number stands at somewhere around 4,000, at the time of my research, there did not exist
one definitive directory containing all 4,000 joint ventures. Needless to say, I spent a great deal
of time trying to compile my own sampling frame. This involved using a wide range of sources.
In order to ensure that my sample was representative of the population, I took into account a
number of factors, namely company size, type of joint venture, number of years established
and type of industry. These were largely based on earlier studies.

Further Reading

Barnett, V. (1991) Sample Survey Principles and Methods. London: Edward Arnold.
Bryman, A. and Bell, E. (2007) Business Research Methods (3rd edn). Oxford:
Oxford University Press.
Cochran, W.G. (1977) Sampling Techniques (3rd edn). New York: John
Wiley & Sons.
Henry, G.T. (1990) Practical Sampling. Newbury Park, CA: Sage.
Robson, C. (2011) Real World Research (3rd edn). Chichester: John Wiley & Sons.

4. I am finding it difficult to generate a suitably sized sample. Can I use friends and family?
Answer: To some extent, this depends on your research questions and chosen methodological
approach. If you decide to use convenience sampling, then note the lack of representative people
within the study. Also, you may get a biased set of answers because your respondents do not come
from different demographic backgrounds. Students do sometimes include friends and family as part
of their chosen sample. If you decide to do this, it is important that you highlight your reasons for
adopting convenience sampling and address the issue of possible bias in your methodology. One
argument for using a small sample based on convenience sampling is that your research is
intended as a pilot study, prior to conducting a large-scale piece of research.


Brewerton, P. and Millward, L. (2001) Organisational Research Methods. London: Sage.

Brown, G.H. (1947) 'A comparison of sampling methods', Journal of Marketing,
April, XI (4): 331-337.



Analyzing quantitative data

After reading this chapter, you should be able to:

know what is meant by quantitative data analysis;
understand how to summarize data;
be able to apply measures of central tendency;
be able to apply measures of dispersion;
understand inferential statistics;
recognize statistical software packages;
understand the basics of using SPSS; and
appreciate the role of the research supervisor in relation to quantitative data analysis.

In the preceding three chapters, we have looked at issues surrounding the gathering
of data. Once you have completed your data collection, the next step is to begin
analyzing your data.
This chapter is the first of two that explore data analysis. The chapter aims to
provide you with a solid grounding in the different methods you can use to analyze
quantitative data. An in-depth discussion of the numerous methods associated with
quantitative data analysis is beyond the scope of this book, although I have included a
number of sources dedicated to quantitative methods in the 'Further reading' section
at the end of this chapter.
The chapter begins by considering the nature of quantitative analysis and goes on
to describe the various methods you may consider when analyzing your data. The
methods you choose largely depend on the purpose of your research. Moreover,
certain conditions need to be met before choosing your methods of analysis. For example,
the number and types of variable you are looking to analyze will ultimately influence
your choice of quantitative method.
A good starting point in your analysis is the summarizing of your data.
Frequency tables can be very useful here. They provide you with a brief overview
of your findings, and can help you to determine your approach in undertaking
more complex statistical analyses such as tests of association. Illustrative examples
show how you might wish to incorporate frequency tables into your study.
I then explore the various ways of describing your data. Possible methods here
come under the broad headings of 'measures of central tendency' or 'measures of
dispersion'. Although business students tend to have a varying degree of mathematical
ability, many are likely to be familiar with at least some of these methods. This
section provides a useful overview, including examples, along with the advantages
and disadvantages of each measure.
The next section examines slightly more complex methods of quantitative data
analysis. These relate to measures of association, measures of difference, and
regression analysis. Once again, your choice of analysis depends on the characteristics
associated with your data. We shall explore these characteristics, and how to
determine your choice of methods, later on.
Data analysis is now typically undertaken using statistical software. Thus, the
final part of the chapter provides a brief guide to using the Statistical Package for
Social Sciences (SPSS). This covers naming and defining the properties of your
variables, entering data and presenting descriptive statistics.

What is Quantitative Data Analysis?

Statistics is a branch of mathematics that is applied to quantitative data in order to
draw conclusions and make predictions. Statistics are used in all walks of life. In fact,
you probably come across a whole range of statistics every day without realizing it.
Examples include government surveys, inflation figures, company sales figures,
unemployment figures, interest rates, and so on. Statistics can be used in connection
with economic, political, environmental and social issues. Moreover, they can be
used to analyze past and current data, and forecast future projections.
If you have undertaken a positivist research philosophy, you will have gathered
mainly, if not exclusively, quantitative data. Your quantitative data involves data that
is numerical in nature. Those researchers who have adopted an interpretivist
philosophy may also use quantitative data. For example, content analysis, which is a
quantitative form of analysis, is typically associated with qualitative data.
A range of quantitative analytical techniques can be used to analyze and interpret
your data. These include everything from simple tables to summarize your data, to
multivariate tests to determine the strength of relationships between variables.
Thankfully, the introduction of statistical software packages such as SPSS means
that the time taken to prepare, conduct and interpret quantitative data has been
markedly reduced. This is important for those students who are concerned about
having to manually carry out calculations and who perhaps cite their 'fear' of statistics as
reasons for not incorporating quantitative analysis into their study. In some cases,
students may recognize the value of statistics, but they fail to consider quantitative
data analysis because they believe that the following concerns apply to them:


1 I am no good at mathematics, and do not have the time to learn.

2 I need a large sample for quantitative analysis.
With regard to the first point, certainly some statistical methods used to analyze
quantitative data are extremely difficult to learn, although methods used to describe
data and make inferences to the wider population are mostly basic. Do not feel
obligated to use complex methods to analyze your data. Essentially, the quality of your
data and the clarity of your analysis are more important than using complicated
analytical tools. Moreover, it is essential that you understand the rules of application and
know how to interpret your results.
The second point tends to be more important for a quantitative research strategy as you
are likely to make inferences in relation to the wider population. In order to do this, you
need a representative sample, although this does not necessarily need to be large in size
(see Chapter 8). It is not always necessary to have large quantities of data for analysis.
In Figure 9.1 we return to the Honeycomb of Research Methodology. The
highlighted element (6) shows the data analysis techniques that you may consider when
undertaking your research methodology. In this chapter we look at quantitative analysis,
while Chapter 10 examines qualitative analysis. The statistical procedures used to
analyze quantitative data can be divided into two branches, namely: descriptive statistics and
inferential statistics.
[Figure 9.1 The Honeycomb of Research Methodology (© Jonathan Wilson). Labels legible in
the figure: data analysis (descriptive statistics, inferential statistics, visual analysis, grounded
theory, narrative analysis, discourse analysis, content analysis); data collection (interviews,
secondary data); epistemology (positivism, interpretivism, pragmatism); ontology (objectivism,
subjectivism); axiology (value-free, biased); combining quantitative and qualitative strategies
(multi-strategy research); action research; case study; archival analysis.]

The majority of textbooks on quantitative analysis tend to make a distinction
between descriptive statistics and inferential statistics. The former is used to summarize
and describe data, while the latter is used to make inferences in relation to a wider
population. Inferential statistics can also be subdivided on the basis of non-parametric
and parametric tests.
Parametric tests are regarded as more powerful as they assume that the observed
data follows a normal distribution (we examine this later in the chapter). Parametric
methods are used when you are able to estimate the parameters of distribution in the
population. Two of the main parameters are the mean and standard deviation.
Non-parametric methods are used where a normal distribution cannot be ascertained. In
other words, when you know nothing about the parameters and have a small sample
size, then non-parametric tests must be used.
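Since the mean and standard deviation are named above as the two main parameters, the following minimal sketch shows how they might be estimated from a sample using Python's standard library. The exam scores are hypothetical, and the sample standard deviation (dividing by n - 1) is an assumed choice for the illustration.

```python
from statistics import mean, stdev

# Hypothetical sample of five exam scores (illustrative data only).
exam_scores = [52, 61, 58, 70, 64]

sample_mean = mean(exam_scores)
# stdev() returns the sample standard deviation (divides by n - 1).
sample_sd = stdev(exam_scores)

print("mean:", sample_mean)
print("standard deviation:", round(sample_sd, 2))
```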
When conducting your quantitative analysis, it can be viewed as a process that
involves the following stages:
preparing your data for analysis;
summarizing and presenting your data using tables and graphs;
describing your data using suitable statistical methods; and
examining relationships and trends between variables.

We shall now explore each of the above stages in turn.

Preparing Your Data for Analysis

The first step in quantitative data analysis is organizing your data so that it is ready
for analysis. Typically, this involves entering your data into a specialized software
package such as SPSS. When entering your data, you will start by creating a
spreadsheet or matrix. Each column in your spreadsheet should represent a variable, and
each row represents a case (see Figure 9.2).

[Figure 9.2: data spreadsheet with columns for case number, age, gender, nationality and
highest level of qualification]

Figure 9.2 is only a brief example. In reality, you may have a large number of
cases, and thus a very large spreadsheet. The first column indicates the 'case
number'. As noted in Chapter 5, a single case might be a company, individual or possibly
an event. The 'age' column is self-explanatory. The final three columns representing
'gender', 'nationality' and 'highest level of qualification' are given numerical values
rather than text. This is because most software packages are programmed to analyze
data in this way. Hence, all data should be given a numerical value.
Ideally, you should have assigned your codes when collecting your data. For
example, allocating codes to respective questions within a questionnaire will make
the task of preparing your data for analysis much easier. When assigning your codes
it is important to consider the following points (Bryman, 2004: 146):
the categories that are produced must not overlap;
the list of data must take into account all possibilities. This includes missing data and answers
to open questions that might come under the heading 'other'; and
there should be a clear set of rules governing how codes are applied. This is to ensure that
coding is consistent over time.

Types of Data

The likelihood is that not all of the data that you have collected will be the same.
In order to select appropriate methods for analyzing your data, it is important that
you understand the different types of data. There are four main types of data:
nominal;
ordinal;
interval; and
ratio.
Nominal data are data that cannot be measured numerically. In other words, it is named
data and includes values that can be classified into categories. If, for example, you
conduct a questionnaire survey into employee promotion, you may be interested in placing
employees into categories. Thus, 'trainee' may be coded as 1, 'supervisor' 2, 'manager' 3,
and so on. There are a limited number of methods that can be used to analyze nominal
data. Typical methods include frequency counts and finding the mode.
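As an illustration of the two methods just named, the sketch below counts frequencies and finds the mode for the coded employee categories. The codes follow the example in the text; the list of responses is hypothetical.

```python
from collections import Counter
from statistics import mode

# Codes taken from the example above.
POSITION_LABELS = {1: "trainee", 2: "supervisor", 3: "manager"}

# Hypothetical coded responses from eight employees.
responses = [1, 2, 1, 3, 1, 2, 2, 1]

frequencies = Counter(responses)      # frequency count per code
most_common_code = mode(responses)    # the most frequently occurring code

for code in sorted(frequencies):
    print(POSITION_LABELS[code], frequencies[code])
print("mode:", POSITION_LABELS[most_common_code])
```

Note that the codes are only labels: averaging them would be meaningless, which is why the mode is the appropriate summary here.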

Like nominal, ordinal data are another type of categorical data. However, the
main difference is that unlike nominal data, ordinal data can be rank-ordered. Let
us say that you are interested in finding out the extent to which customer service
is important among a sample of consumers. Using a Likert-scale question,
perceived importance may be ranked from 1 to 5, where 1 = very important, 2 =
important, 3 = neither important nor unimportant, 4 = unimportant, 5 = very
unimportant. Examples of types of analysis suitable for ordinal data include
frequency counts and percentages from a set of ranked data. Note that the distance
across your set of categories might not be equal. Regarding the customer service
example, we cannot say that those consumers who consider customer service as
very important (1) judge it to be 5 times more important than those who give a 5,
although we can say what percentage of respondents tick each box on our 5-point
Likert-scale question.


Interval data have been achieved when the distances between the numbers are equal
across the range. For example, the difference between 5 and 6 is 1. This is equal to
the difference between 6 and 7. The temperature scales of Fahrenheit and Celsius
are typical examples of interval scales in that the zero in both scales is arbitrary. So,
you cannot say that 30 °C is twice as warm as 15 °C. When dealing with interval
data you need to be very careful not to make such claims. The mean, mode and
median can be used to describe interval data.
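The temperature claim can be checked with a short worked example. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the variable names are illustrative. If 30 °C really were 'twice as warm' as 15 °C, the ratio would stay at 2 after changing the unit, but it does not, whereas the difference between the two readings remains meaningful.

```python
def celsius_to_fahrenheit(celsius):
    """Standard conversion between the two interval scales."""
    return celsius * 9 / 5 + 32


warm_c, cool_c = 30.0, 15.0

# Ratios depend on where the (arbitrary) zero sits, so they change with the scale.
ratio_celsius = warm_c / cool_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cool_c)

# Differences are preserved up to the fixed scale factor of 9/5.
difference_c = warm_c - cool_c
difference_f = celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cool_c)

print(ratio_celsius, round(ratio_fahrenheit, 2))
print(difference_c, difference_f)
```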

Ratio data are very similar to interval data. The distinction between the two is that
ratio data have a fixed zero point. Examples of ratio data include income, weight and
height. Interval and ratio data allow for more precise levels of measurement than
categorical data (nominal and ordinal). For example, a director's salary might be
given in exact figures (ratio), or listed in relation to other directors within the
company (ranked). Interval and ratio data also offer a greater number of options when it
comes to data analysis. We explore some of these options later in the chapter.

Number of variables

Just as the types of data influence your choice of analytical tool, the same can be said
of the number of variables. In essence, statistical methods can be based on the
following numbers of variables: univariate (one), bivariate (two) or multivariate (three or
more). The majority of methods associated with descriptive statistics are based on
univariate data. Conversely, inferential statistics are typically associated with bivariate
or multivariate data.

Coding

We came across coding in Chapter 6. All types of data should be coded numerically.
For example, the dichotomous variable 'gender' is usually coded '1' and '2' when
using statistical analysis software. The advantage of coding is that it will ultimately
make your analysis easier and less confusing. Moreover, it is essential for most
statistical software packages.
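To make the idea of numerical coding concrete, here is a minimal sketch of a codebook applied to two cases. The variable names, the assignment of 1 and 2 to particular genders, and the 'other' code of 9 are assumptions made for illustration; the position codes follow the nominal data example earlier in the chapter.

```python
# Hypothetical codebook: every text answer maps to one numerical code.
CODEBOOK = {
    "gender": {"male": 1, "female": 2},
    "position": {"trainee": 1, "supervisor": 2, "manager": 3, "other": 9},
}


def check_codebook(codebook):
    """Raise an error if two categories of one variable share a code."""
    for variable, mapping in codebook.items():
        codes = list(mapping.values())
        if len(codes) != len(set(codes)):
            raise ValueError(f"duplicate code in {variable!r}")


def encode(case, codebook):
    """Turn one case (text answers) into a row of numerical codes."""
    return {variable: codebook[variable][answer] for variable, answer in case.items()}


check_codebook(CODEBOOK)

# Two hypothetical questionnaire responses, one row per case.
raw_cases = [
    {"gender": "female", "position": "manager"},
    {"gender": "male", "position": "trainee"},
]
data_matrix = [encode(case, CODEBOOK) for case in raw_cases]
print(data_matrix)
```

Keeping the codebook in one place is one way of ensuring that codes are applied consistently over time.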

Coding during data collection

If you have adopted a deductive approach to your research and already have a set
of predetermined categories, then you will most probably code your categories on
your questionnaire survey. Certainly the advantage of coding at this stage is that it
will save time later when carrying out your analysis.

Coding after data collection

If you are uncertain of the number and complexity of your responses, you may
decide to implement your coding scheme after data collection, although for a
collection tool such as a questionnaire survey, a predetermined set of codes for each
question can make data entry and analysis less time-consuming.

Coding missing data

It is essential that you also code any missing data. Failure to do so is likely to impact
on the interpretation of your results. A missing data code can be used to illustrate
why data are missing. For example, a non-response might be indicated by a '0'.
Coding missing data is required if you wish them to be excluded from your analysis.
Missing data may arise for a number of reasons. These include: a question is
irrelevant to a respondent; a question is left blank as the respondent did not wish to
complete it; or a question is not answered because the respondent did not
understand it. The latter is sometimes a problem when conducting international or
cross-cultural research.

Summarizing your data

Summarizing and presenting your data is likely to be the first step in your analysis.
It comes under the broad heading of descriptive statistics (see Table 9.1).
Undertaking descriptive statistics not only allows you to describe your data, but also to
present it in a number of different ways. Almost certainly you will be familiar with
some of the techniques used to present descriptive statistics. These include
frequency tables, bar charts, pie charts and graphs. Many studies that engage in
statistical analysis use descriptive statistics as a starting point. The main advantage of
summarizing the data in this way is that it provides the reader with a simple
overview of your data prior to more detailed analysis.
Table 9.1 shows the various methods that can be used when describing your data.
Column two highlights the purpose of each method, while column three shows a
brief example of how each method might be applied. The application of each of the
methods depends on a number of factors. These include the number of variables, the
type of data and the purpose. I will now discuss each of the methods listed in
Table 9.1 in more detail, including the advantages and disadvantages of each one.

Frequency tables

A good starting point when analyzing your data is to look at the frequency
distribution for each variable in your study. A frequency is a numerical value that illustrates
the number of counts for an observed variable. For instance, you might be interested
in the number of cars company directors have.





Table 9.1 Examples of descriptive statistics

Method                         Purpose                      Examples of application
Frequency tables               Summarizing data             Number and percentage of employees in each firm
Graphs and charts              Summarizing data             Advertising spend on different types of media
Mean, median, mode             Measuring central tendency   Analyzing exam scores from a finance exam
Standard deviation             Measuring dispersion         Analyzing the standard deviations from a finance exam
Range and interquartile range  Measuring dispersion         Analyzing the range from a finance exam
Index numbers                  Describing change            Changes to retail prices
[...]                          Frequency distribution       A preference for a brand of cereal based on gender
Scatter diagrams               Frequency distribution       Exploring the link between car mileage and petrol
Multiple bar charts            Frequency distribution       Comparing the output for three different computer
                                                            manufacturers over a five-year period

When constructing a frequency table, data are arranged in rows and columns.
There are numerous ways to present a frequency table. Indeed, you probably come
across a variety of examples on an almost daily basis. Examples include everything
from school to football league tables!
A table is also a good starting point for both presenting and summarizing your
quantitative findings. Table 9.2 shows an example of a frequency distribution table
based on 15 company directors and the number of cars they have. Of course, a
response is not always guaranteed. If some directors had not responded, our table
would also feature non-response frequencies.
Table 9.2 not only shows the frequency distribution for cars, it also shows the
percentage frequency distribution. This shows that 33.3% of directors have one car,
26.7% have two cars, 26.7% have three cars and 13.3% have four cars. A percentage
frequency distribution makes a useful addition to a frequency table.
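The percentages quoted above can be reproduced with a few lines of code. In the sketch below, the individual counts (5, 4, 4 and 2 directors) are not stated in the text; they are inferred from the quoted percentages and the total of 15 directors, and the function name is illustrative.

```python
from collections import Counter

# Number of cars for each of the 15 directors. The split 5/4/4/2 is inferred
# from the percentages quoted in the text (33.3%, 26.7%, 26.7%, 13.3%).
cars_per_director = [1] * 5 + [2] * 4 + [3] * 4 + [4] * 2


def frequency_table(values):
    """Return (value, frequency, percentage frequency) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, counts[value], round(100 * counts[value] / total, 1))
        for value in sorted(counts)
    ]


table = frequency_table(cars_per_director)
for number_of_cars, frequency, percentage in table:
    print(number_of_cars, frequency, percentage)
```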
Table 9.2 Frequency distribution showing the number of cars for 15 company directors

Number of cars    Frequency    Percentage frequency
1                 5            33.3
2                 4            26.7
3                 4            26.7
4                 2            13.3
Total             15           100.0

A small sample such as that for Table 9.2 is straightforward to incorporate
into a frequency table. However, adopting the same approach for large
samples is simply not practical. To overcome this problem it is a good idea to group
your observed values into classes. For instance, if undertaking a large survey on
employees' salaries, you can group the respondents into classes based on intervals
of £5,000: for example, £10,000 or more, but less than £15,000; £15,000 or more,
but less than £20,000; and so on. Using intervals of £5,000 is reasonable given the
likely variation in salaries. However, where you are likely to have a wide range of
observed values, you may wish to group them into slightly broader classes.
Conversely, a narrower range would best suit a smaller number of classes.
The obvious advantage of grouping your data is that it will make your table and
results look more presentable. A potential problem is failing to recognize the types
of data you are using when forming classes. For example, when using continuous
data such as salary, it is sometimes easy to allocate data wrongly due to poorly
defined intervals between classes. Let us look at an example of a frequency
distribution table showing grouped data (see Table 9.3).

[Table 9.3: grouped frequency distribution with columns for frequency, cumulative frequency,
% frequency and cumulative % frequency; classes < 10,000, 10,000 < 20,000, 20,000 < 30,000,
30,000 < 40,000 and 40,000 < 50,000]
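The grouping described above can be sketched in code. The example below places each salary in a class defined as 'lower bound or more, but less than upper bound', so that a boundary value such as 15,000 can only fall into one class. The salary figures are hypothetical and the function name is illustrative; the class width of 5,000 follows the example in the text.

```python
from collections import Counter


def group_into_classes(salaries, width=5000):
    """Count salaries per class [lower, lower + width), sorted by lower bound."""
    # Integer division finds the lower bound of the class each salary belongs to.
    counts = Counter((salary // width) * width for salary in salaries)
    return [(lower, lower + width, counts[lower]) for lower in sorted(counts)]


# Hypothetical salaries, including values that sit exactly on a class boundary.
salaries = [10000, 12500, 14999, 15000, 19250, 20000]

grouped = group_into_classes(salaries)
for lower, upper, frequency in grouped:
    print(f"{lower:,} or more, but less than {upper:,}: {frequency}")
```

Defining each class with an inclusive lower bound and an exclusive upper bound is one way to avoid the poorly defined intervals mentioned above.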
As well as showing grouped data, Table 9.3 also includes a cumulative frequency
and cumulative percentage frequency distribution column. A cumulative figure is
obtained by simply adding the observations from the previously stated (lower)
classes. For example, when establishing the cumulative frequency for the third class
in Table 9.3, we simply add the two lower classes, 3 + 11 + 7 = 21. The same
principles apply for the cumulative percentage frequency distribution.
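The running total described above can be sketched as follows. Only the first three class frequencies (3, 11 and 7) are quoted in the text, so the example is limited to those; the cumulative percentages are therefore calculated relative to these three classes alone, purely for illustration, and will not match the full table.

```python
from itertools import accumulate

# Frequencies of the first three classes, as quoted in the text.
class_frequencies = [3, 11, 7]

# Each cumulative figure adds the current class to all lower classes.
cumulative = list(accumulate(class_frequencies))

# Illustration only: percentages relative to these three classes.
total = sum(class_frequencies)
cumulative_percentages = [round(100 * value / total, 1) for value in cumulative]

print(cumulative)
print(cumulative_percentages)
```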
So far, we have examined frequency tables in relation to univariate data.
Frequency tables are also useful for examining bivariate data. A table that allows you to
examine the relationship between two variables is called a cross-tabulation. We shall
address this later in the chapter.

Diagrams used for presenting data

Diagrams or illustrations can be a useful way of presenting your data. Like tables, they help to break up your text and can make for very interesting reading. The main types of diagram are graphs and charts.


[Table 9.4: Market share (%) of construction companies, first quarter 2008. Companies listed include MMP Builds, JP Construction, Al Developments, TML Build, Global Plex and Z Build.]

Graphs

A graph is a type of diagram used to present data. A graph can be used to analyze bivariate or multivariate data. Waters (1997: 107) offers the following advice when producing graphs: as graphs give a very strong initial impact, the choice of scale for the axes is clearly important, with a bad choice giving a false view of the data. Although the choice of scale is largely subjective, some guidelines for good practice can be given:


Always label the axes clearly and accurately.
Show the scales on both axes.
The maximum of the scale should be slightly above the maximum observation.
Wherever possible, the scale on the axis should start at zero. If this cannot be done, the scale must be shown clearly, perhaps with a zigzag on the axis to indicate a break.
Where appropriate, give the source of data.
Where appropriate, give the graph a title.
Line graphs are ideal for analyzing trends over time (longitudinal data). The data should be of at least interval, ordinal or ratio status. Figure 9.3 shows a line graph. A line to illustrate the trend over the entire timeframe joins the data value for each respective time period - in this case, four months (Jan. to Apr. 2008).
The advantage of graphs is that they can clearly illustrate the relationship between two variables. A downside is that graphs tend not to be as visually appealing as charts.

Pie charts
A pie chart is used for summarizing categorical data. A pie chart is divided into segments. Each segment represents a particular category. The size of each category is proportional to the number of cases it represents. Typically, the number of cases is represented by a percentage. Each segment has a different colour or pattern to clearly distinguish each category.




[Figure 9.4: A pie chart showing market share figures for construction companies]

Pie charts are straightforward diagrams that compare a limited amount of data. By way of illustration, let us say that you are concerned with the presentation of market share figures for seven construction companies. You may begin by summarizing your data in the form of a simple table (see Table 9.4). Figure 9.4 shows the data from Table 9.4 incorporated into a pie chart.
There are two main advantages associated with pie charts. First, they are a clear way of highlighting proportional differences. Second, the use of different colours makes it easy to distinguish between categories. Conversely, sometimes it can be difficult to divide categories into segments. This is especially true with fragmented data or when there are a large number of categories.
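To see how each segment is sized in proportion to its share, the short sketch below converts percentages into segment angles. The company names and shares are hypothetical placeholders, not the figures from Table 9.4.

```python
def segment_angles(shares):
    """Convert category shares into pie chart angles (degrees out of 360)."""
    total = sum(shares.values())
    return {name: 360 * value / total for name, value in shares.items()}

# Hypothetical market shares (%), for illustration only.
market_share = {"Company A": 40, "Company B": 35, "Company C": 25}
angles = segment_angles(market_share)
```

With many small categories the angles become too narrow to distinguish, which is the practical form of the disadvantage noted above.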


[Table 9.5 and Figure 9.5: number of employees by department in four advertising agencies. Agencies listed include Hall Media, BC Global and T Media.]
[Table 9.6: customer waiting times, grouped into the classes 0-10, 11-20, 21-30, 31-40 and 41-50 minutes.]



A bar chart is similar to a pie chart in that it compares a simple set of observations. However, instead of sectors of a circle, bars are used to represent the data. A bar chart is a straightforward way of summarizing either ordinal or nominal data. It involves bivariate analysis of the main characteristics of the distribution of the data.
Essentially, there are two main types of bar chart: horizontal and vertical. Each bar is the same width, whereas the length represents the number of cases. A bar chart has a gap between each bar, while a histogram does not have any gaps because it represents continuous data. To illustrate, there are four advertising agencies located in a small town. The number of employees in each of their respective departments is highlighted in Table 9.5. Now, let us put these data into a simple bar chart (see Figure 9.5).
The main advantage of a bar chart is that it is a simple way of illustrating the relationship between two variables. On the other hand, a disadvantage is that a large number of cases representing small values can make the chart look ...


A histogram is a type of bar chart that shows a frequency distribution of a set of data. It provides a clear indication of the nature of your distribution, in particular whether or not you have a normal or skewed distribution. It is an excellent way of summarizing data that are on an interval scale (either discrete or continuous). The height of each bar represents each observed frequency.
Before compiling your histogram you need to divide the range of values from your data set into groups. On the x-axis of your chart, each group is represented by a rectangle with a base length equal to the range of values within that particular group, and an area proportional to the total number of observations applicable to that group.
When compiling your histogram it is important that you allocate a suitable number of groups. Too few or too many will negatively impact on the presentation of your frequency distribution.
An advantage of a histogram is that it is easy to interpret the data. In addition, it is ideal where the class divisions are not the same. The main disadvantage is that it cannot provide precise individual values. By way of illustration, let us say that a bank is interested in reducing customer waiting times. It conducts research on its busiest day of the week (Saturday) in order to establish a 'typical' waiting time. Following its findings, the bank intends to take steps to reduce waiting times in order to improve levels of customer service. The results from the study are summarized in a frequency table (see Table 9.6), while the histogram of the same set of data is illustrated in Figure 9.6. The y-axis is the relative frequency, while the x-axis shows the waiting times.

[Figure 9.6: histogram of customer waiting times. x-axis classes: 0-10 minutes, 11-20 minutes, 21-30 minutes, 31-40 minutes, 41-50 minutes]
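The grouping step behind a histogram like this can be sketched as follows. The class boundaries are the ones shown on the x-axis; the waiting times themselves are hypothetical, and the sketch assumes times are recorded in whole minutes so that every value falls into exactly one class.

```python
from collections import Counter

# Class boundaries as labelled on the x-axis (whole minutes assumed).
CLASSES = [(0, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

def relative_frequencies(waiting_times):
    """Count observations per class and express each count as a share of the total."""
    counts = Counter()
    for minutes in waiting_times:
        for low, high in CLASSES:
            if low <= minutes <= high:
                counts[(low, high)] += 1
                break
    n = sum(counts.values())
    return {f"{low}-{high} minutes": counts[(low, high)] / n for low, high in CLASSES}

# Hypothetical waiting times in minutes, for illustration only.
waits = [4, 8, 12, 15, 18, 22, 25, 33, 37, 45]
rel = relative_frequencies(waits)
```

If times were recorded with decimals, a value such as 10.5 would fall between two classes, which is the kind of poorly defined interval warned about earlier in the chapter.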

Describing Your Data

In the last section we looked at numerous ways of presenting and summarizing your data. These include tables, charts and graphs. The next step in your data analysis is likely to involve the use of descriptive statistics. In essence, there are two broad approaches to describing your data: measuring central tendency and measuring dispersion. We shall now examine these approaches.

Measuring central tendency

Summarizing and presenting data is fine if we want a general overview of our data. However, often we like to summarize data in a more concise way. A common way of doing this is to discuss the 'average'.
Measures of central tendency are used to illustrate a typical outcome in a set of data. The main ways of measuring central tendency are the mean, median and mode. First, let us look at the mean.

The mean ($\bar{x}$) is the arithmetical average of a frequency distribution. The method of calculating the arithmetic mean is:

add all observations together; and
divide by the total number of observations.

The mean formula for calculating single data is as follows:

$\bar{x} = \frac{\sum x}{n}$

x = each observation
n = the total number of observations
Σ = the sum of

For example, if a car dealership wished to find the mean number of cars sold over the course of a week, the process would be as follows:

Number of cars sold over six working days: 12, 16, 14, 11, 9, 10.

$\bar{x} = \frac{12 + 16 + 14 + 11 + 9 + 10}{6} = \frac{72}{6} = 12$

The mean number of cars sold per day over the course of the week is 12.

If we are dealing with a set of grouped data, then we need to use a different formula for calculating the mean:

$\bar{x} = \frac{\sum fx}{\sum f}$

x = each observation (the mid-value of each class)
f = the frequency
Σ = the sum of

For example, Table 9.7 shows the monthly bonuses among sales staff and is grouped into classes. Using this formula, the mean monthly sales bonus is:

$\bar{x} = \frac{\sum fx}{\sum f} = \frac{7800}{30} = £260$

The answer is that the mean monthly sales bonus is £260.

[Table 9.7: Calculating the mean in a set of grouped data (sales staff monthly bonus). Columns: Monthly bonus (£), Frequency (f), Mid-values (x). Totals: Σf = 30, Σfx = 7,800]

One of the advantages of the mean is that it includes every score in a set of data. Also, if taking several samples from a population, the means are likely to be similar. The disadvantages are that the mean is sensitive to outliers (extreme values) and can only be used with interval or ratio data.

The median is also sometimes referred to as an average. It is the middle number in a set of numbers. The position of the median can be found using the following formula:

$M = \frac{n + 1}{2}$

n = number of observations


For example, suppose you wanted to find the median in the following set of numbers: 42, 16, 38, 11, 9. The first step is to put them in ascending order: 9, 11, 16, 38, 42. Next, apply the formula:

$M = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$

So, the median is the third number in the sequence: 16.
If our list is made up of an even number of values, we can simply take the mid-value between the two middle values (here, the third and fourth). For example:

8, 15, 18, 27, 39, 45

$\frac{18 + 27}{2} = 22.5$

Our median is halfway between the third and fourth values in our set of data (18 and 27). Therefore the median is 22.5.
The median has two main advantages. First, it can be used with ordinal, interval and ratio data (it cannot be used with nominal data). Second, the median is unaffected by outliers. The disadvantages are that the median may not be characteristic of the distribution if it does not follow a normal distribution, and it cannot be used for further statistical analysis.

The mode is the value that occurs the most often in your set of data. If, for example, we have the set of data 15, 15, 8, 9, 12, 15, 24, 2, 6, 15, the mode is 15, as it appears the most often: 4 times.
The advantage of the mode is that, unlike the mean and median, it can be used with nominal data. Moreover, it is very straightforward to determine and is unaffected by outliers. The disadvantage of the mode is that you can end up with more than one mode value. Moreover, it does not indicate the variation in a set of data and is sensitive to additional observations.
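The three measures of central tendency can be checked with Python's standard `statistics` module. The sketch below reuses the numbers from the worked examples (cars sold, the two median lists and the mode list). The `grouped_mean` helper and its mid-values and frequencies are hypothetical additions, since the rows of Table 9.7 are not reproduced here.

```python
from statistics import mean, median, mode

# Mean: cars sold over six working days (from the example).
cars_sold = [12, 16, 14, 11, 9, 10]
mean_cars = mean(cars_sold)  # 72 / 6

# Median: odd and even numbers of observations (from the examples).
odd_median = median([42, 16, 38, 11, 9])
even_median = median([8, 15, 18, 27, 39, 45])

# Mode: the most frequent value (from the example).
most_common = mode([15, 15, 8, 9, 12, 15, 24, 2, 6, 15])

def grouped_mean(mid_values, frequencies):
    """Mean of grouped data: sum of f * x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(mid_values, frequencies))
    return total_fx / sum(frequencies)

# Hypothetical classes, for illustration only.
bonus_mean = grouped_mean([150, 250, 350], [10, 15, 5])
```

Note that `median` sorts the data itself, so the values do not need to be placed in ascending order first.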

Measuring dispersion
A key limitation of measuring central tendency is that it does not give us an indication of the shape of a frequency distribution. A measure that allows us to describe the spread of values in a distribution is referred to as a measure of dispersion. By combining measures of central tendency and dispersion you can gain a useful description of your set of data. Typical methods used to measure dispersion include the standard deviation, range and interquartile range.


The standard deviation is represented by S for sample standard deviation and σ (lower case sigma) for population standard deviation. The standard deviation measures the spread of data around the mean value.
The steps in finding the standard deviation can be summarized as follows:
Find the mean in your set of data.
Find the deviation from the mean for each value (i.e. the difference between a particular value and the mean - see Table 9.8).
Square the deviations from the mean to get rid of negative values (failure to do so will lead to an answer of zero).
Find the sum of these values.
Divide by the number of values in order to get an average (also known as the variance).
Find the square root of the variance in order to find the standard deviation.

The standard deviation is expressed by the formula:

$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$

S = the sample standard deviation
x = an observation
$\bar{x}$ = the mean
n = the total number of observations
√ = the square root
Σ = the sum of

Please note, when calculating the standard deviation of a small sample, a better estimate is gained by dividing by (n - 1) rather than n.
The formula for a set of grouped data is as follows:

$S = \sqrt{\frac{\sum f x^2}{\sum f} - \left(\frac{\sum f x}{\sum f}\right)^2}$

S = the sample standard deviation
x = the mid-point of each data class
f = the frequency of each class
√ = the square root
Σ = the sum of

By way of illustration, let us say that you are researching the number of times (x) a printing press breaks down over a period of six months (see Table 9.8).

[Table 9.8: number of printing press breakdowns (x) over six months]

$S = 2.58$

The mean is 7 and the standard deviation is 2.58.

An advantage of the standard deviation is that it uses every value in the population or group of sample data. However, because all items in a data set are used it can be influenced by extreme values.
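The six steps listed earlier translate almost line for line into code. In the sketch below the breakdown counts are hypothetical: they were chosen so that the mean is 7 and the standard deviation rounds to 2.58, matching the result quoted above, but they are not the values from Table 9.8.

```python
from math import sqrt
from statistics import pstdev

def standard_deviation(values, sample=False):
    """Follow the steps in the text; divide by n - 1 when sample=True."""
    n = len(values)
    mean_value = sum(values) / n                              # step 1: the mean
    squared = [(x - mean_value) ** 2 for x in values]         # steps 2-3: squared deviations
    variance = sum(squared) / (n - 1 if sample else n)        # steps 4-5: the variance
    return sqrt(variance)                                     # step 6: the square root

# Hypothetical monthly breakdown counts with a mean of 7.
breakdowns = [3, 5, 7, 7, 9, 11]

sd_n = standard_deviation(breakdowns)                  # divides by n
sd_n_minus_1 = standard_deviation(breakdowns, sample=True)  # divides by n - 1
```

Dividing by n - 1 gives a slightly larger figure, and the gap between the two versions narrows as the sample grows.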

The range is found by subtracting the lowest value from the highest value in a set of data. If, for example, we have the set of data 5, 6, 7, 8, 8, 12, 13, 15, the lowest value is 5, while the highest is 15. By subtracting the lowest from the highest we get: 15 - 5 = 10. So our range is 10. The main advantage of the range is that it is easy to calculate and provides a clear indication as to the broadness or narrowness of a set of data. A key disadvantage is that a range based on a small sample size is likely to exclude extreme values. Conversely, the greater the sample size the greater the likelihood that extreme values will be included. Another disadvantage is that it does not tell us anything about the values within the range.
Interquartile range

As noted earlier, one criticism of the range is that it can be greatly affected by extreme values. The interquartile range helps to overcome this problem by measuring the spread between the upper and lower quartiles of a set of data (the middle 50%). As the interquartile range only focuses on the middle 50% of a range of data, it is not as sensitive as the range and is less susceptible to extreme values.
Finding the interquartile range requires the following steps:
List your data in order of size, beginning with the smallest first.
Find the position of the median.
Find the median in the data to the left of your median (lower quartile).
Find the median in the data to the right of your median (upper quartile).
Find the difference between the medians for the upper and lower quartiles. This gives you the interquartile range.

The following example shows the interquartile range for a set of data. A small electrical retailer has recorded the number of returned items over a 12-month period (see Table 9.9).

[Table 9.9: number of returned items over a 12-month period]

First, let us place the values in ascending order (Table 9.10).

[Table 9.10: the values in ascending order: 6, 7, 9, 10, 12, 13, 14, 16, 18, 22, 23, 26]

Next, find the median for the lower quartile (mid-value Q1):

Q1 = 6, 7, 9, 10, 12, 13.

$\text{Median} = \frac{\text{3rd} + \text{4th observations}}{2} = \frac{9 + 10}{2} = 9.5$

Now, find the median for the upper quartile (mid-value Q3):

Q3 = 14, 16, 18, 22, 23, 26.

$\text{Median} = \frac{\text{3rd} + \text{4th observations}}{2} = \frac{18 + 22}{2} = 20$

Interquartile range (IR) = Q3 - Q1 = 20 - 9.5 = 10.5

The interquartile range and median from our set of data can be shown as follows:
Lower quartile 9.5
Median 13.5
Upper quartile 20

Now, let us presume that the company only operated for 11 months of the year, thereby giving us an odd number of 11 observations (see Table 9.11).

[Table 9.11: number of returned items over an 11-month period]

Find the lower quartile (mid-value Q1) = 6, 7, 9, 10, 12
Median = 3rd observation = 9
Median for the upper quartile (mid-value Q3) = 14, 16, 18, 22, 23
Median = 3rd observation = 18
Interquartile range (IR) = Q3 - Q1 = 18 - 9 = 9

If we wished to measure the spread of data based on the semi-interquartile range, our answer would be half of the interquartile range. Based on the 12-month example, half of 10.5 is 5.25.
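The median-of-halves method used in both examples can be sketched as below. The function name `quartiles_by_halves` is a hypothetical helper; the data are the returned-item counts from the 12-month and 11-month examples. Other quartile conventions exist (statistical packages often interpolate), so results from software may differ slightly from this method.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median of each half of the ordered data.

    With an odd number of observations the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

twelve_months = [6, 7, 9, 10, 12, 13, 14, 16, 18, 22, 23, 26]
eleven_months = twelve_months[:-1]  # the 11-observation case

q1, med, q3 = quartiles_by_halves(twelve_months)
iqr_12 = q3 - q1
semi_iqr_12 = iqr_12 / 2

q1_odd, med_odd, q3_odd = quartiles_by_halves(eleven_months)
iqr_11 = q3_odd - q1_odd
```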

Describing change
Measuring dispersion allows us to examine the spread of data. However, this is based on a fixed point in time. How can we examine data that change over time? For example, the percentage changes in fuel prices, house prices or interest rates? One method is to produce a simple index.

Index numbers
An index number shows how a quantity changes over time. Usually, the base period equals 100. Two of the most widely recognized indices are the FTSE 100, which is the list of the UK's top-performing 100 companies, and the Retail Price Index (RPI). The latter examines how people spend their income. It measures the fluctuation in the cost of a representative basket of goods and services. The RPI commenced in 1947. This therefore represents the base year. The base year can be represented by any given year, although it is typically a decade or more in the past. This is for two reasons. First, historical data are often widely available. Second, when analyzing changes over time, we need a sufficient number of years to identify any possible trends.
It is also worth noting that the base year does not have to be 100. For example, the FTSE 100 of leading shares has a base of 1,000.
The formula for calculating an index is as follows:

$I = \frac{c/p}{b} \times 100$

c/p = cost/price
b = base value

Let us look at an example of an index. Table 9.12 shows how the average price of a new car has changed over time. The base year (the first year when the data were collected) is 1978. Remember that 100 represents the base year.

[Table 9.12: average price of a new car (£) over time, base year 1978]

The next step is to apply the formula so that we can clearly compare the extent to which the data have changed over time (see Table 9.13).

[Table 9.13: index numbers for the average price of a new car. Columns: Average price of new car (£), Index number (I)]

Weighted index numbers
A simple index such as car prices is fine if we are concerned about one observation over time. Yet, if we have a number of items, it is unlikely that we would assign equal importance to each item. In order to address this, we can allocate a weighting to each one.
Typically, a weighted price index is calculated at the end of each year, then comparisons are made over a given time period. A simple price index can be used to calculate the end year weighted price index. This is known as Paasche's Price Index and is represented by the formula:

$\text{Paasche's Price Index} = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100$

$P_0$ = base year prices
$P_n$ = the current year prices
$Q_n$ = current year quantity

Paasche's Index can only be calculated at the end of the current year as the weights are current year quantities in a price index. To illustrate, let us say that a specialist car manufacturer buys the following products from one of its suppliers during 2004 and 2005 (see Table 9.14).

[Table 9.14: products bought by a specialist car manufacturer from a supplier. Columns: 2004 Unit price, 2004 Quantity, 2005 Unit price, 2005 Quantity]

Sum of $P_n \times Q_n$ = (25 × 42) + (20 × 36) + (35 × 20) + (45 × 22) = 1050 + 720 + 700 + 990 = 3460
Sum of $P_0 \times Q_n$ = (20 × 42) + (15 × 36) + (30 × 20) + (40 × 22) = 840 + 540 + 600 + 880 = 2860

Thus, Paasche's Price Index = $\frac{3460}{2860} \times 100 = 120.98$
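Both index calculations are easy to reproduce. In the sketch below the Paasche figures are the unit prices and 2005 quantities used in the worked calculation, while the two prices passed to `simple_index` are hypothetical values chosen only to show the formula at work.

```python
def simple_index(value, base_value):
    """Simple index: value relative to the base period, with base = 100."""
    return value / base_value * 100

def paasche_index(base_prices, current_prices, current_quantities):
    """Paasche's Price Index: both sums are weighted by current-year quantities."""
    numerator = sum(p * q for p, q in zip(current_prices, current_quantities))
    denominator = sum(p * q for p, q in zip(base_prices, current_quantities))
    return numerator / denominator * 100

# Figures from the worked Paasche example.
prices_2004 = [20, 15, 30, 40]      # base year prices (P0)
prices_2005 = [25, 20, 35, 45]      # current year prices (Pn)
quantities_2005 = [42, 36, 20, 22]  # current year quantities (Qn)
paasche = paasche_index(prices_2004, prices_2005, quantities_2005)

# Hypothetical prices: a rise from 800 in the base year to 1,200.
car_index = simple_index(1200, 800)
```

An index of 120.98 means that the 2005 basket costs about 21% more than the same basket would have cost at 2004 prices.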


















Frequency distribution

Earlier in the chapter we looked at frequency tables to gain an insight into the frequency distribution of univariate data. A frequency table can also be used to examine bivariate or multivariate data. This type of table is called a cross-tabulation.

Cross-tabulations

A cross-tabulation is a table that shows the joint distribution of bivariate or multivariate data. In Table 9.15, the cross-tabulation shows the nationality by gender for business school students.
The advantage of cross-tabulations is that they are simple to produce and allow for easy comparison between data. However, care should be taken with the number and types of variable. For example, too many variables will have a negative impact on the presentation of your table, and probably include several low values.
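A cross-tabulation is simply a count of every combination of two variables. The sketch below builds one from a list of (nationality, gender) records using only the standard library; the records are hypothetical and do not reproduce Table 9.15.

```python
from collections import Counter

def crosstab(records):
    """Count each (row, column) pair and arrange the counts as a nested dict."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    cols = sorted({col for _, col in records})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}

# Hypothetical student records, for illustration only.
students = [
    ("British", "Male"), ("British", "Female"), ("British", "Female"),
    ("French", "Male"), ("French", "Male"),
    ("Chinese", "Female"),
]
table = crosstab(students)
```

Cells with no matching records appear as zero, which makes the low values mentioned above easy to spot.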

Scatter diagrams
A scatter diagram (sometimes referred to as a scatter plot) is essentially a graph used to assess the relationship between two variables. These are the independent variable using the x-axis, and the dependent variable the y-axis. The two variables are plotted on the graph to see whether or not a relationship exists.

TABLE 9.15 Cross-tabulation showing nationality by gender for business school students
[Columns: Male students, Female students]


Figure 9.7 shows the relationship between the distance travelled by car and petrol consumption. This is a strong positive linear correlation. In other words, an increase in the value of one variable is associated with an increase in the value of the other. In Figure 9.7, an increase in distance travelled (x) is associated with an increase in petrol consumption (y).
A negative correlation happens when an increase in the value of one variable is associated with a decrease in the value of the other. For example, higher levels of unemployment might be associated with lower levels of car sales. If the points are scattered randomly throughout the graph, there is no correlation between the two variables. Another possibility is that the variables show a non-linear relationship. For example, this might be the case when examining the relationship between age and weight.
An outlier is a value from a set of data that is inconsistent with other values. It can be much larger or much smaller than other values. You should not ignore an outlier, as it can impact on the results of descriptive statistics such as the mean. An outlier can be caused by one of two reasons - an error in measurement or radical behaviour from one of the participants. Either way, you need to establish reasons for the inconsistency before progressing with your research.
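The direction and strength of a linear relationship seen on a scatter diagram can be put into a single number with the Pearson correlation coefficient, one of the methods listed in Table 9.16. The sketch below is a minimal illustration; both data sets are hypothetical and simply mirror the two examples in the text (distance and petrol, unemployment and car sales).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient, between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: distance travelled (miles) and petrol used (gallons).
distance = [10, 20, 30, 40, 50]
petrol = [1.1, 1.9, 3.2, 3.9, 5.1]
r_positive = pearson_r(distance, petrol)

# Hypothetical data: unemployment rate (%) and car sales (thousands).
unemployment = [4, 5, 6, 7, 8]
car_sales = [90, 80, 75, 60, 50]
r_negative = pearson_r(unemployment, car_sales)
```

A value near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little or no linear relationship. A single outlier can shift the coefficient noticeably, which is one reason to investigate outliers before going further.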
Multiple bar charts
A multiple bar chart is a little more complex than a simple bar chart. It is a chart illustrating two or more variables in the form of bars of length proportional to the magnitudes of the variables. For example, Figure 9.8 shows the output of a drinks factory over four years.
Figure 9.8 clearly shows the changes in production output for cola, apple juice and lemonade over a period of three years. One might assume that the factory is performing well as the output for each type of drink has increased year on year.



[Figure 9.7: Scatter diagram showing a strong positive linear correlation. x-axis: Distance travelled by car in miles (x)]

[Figure 9.8: The output of a drinks factory. Legend includes Apple juice and Lemonade]

The advantage of a multiple bar chart is that it allows you to compare variables. Conversely, as in the case of our drinks factory example, they only describe data and do not provide an explanation of why variables are of a certain value.

Inferential statistics are used to draw inferences about a population from a given sample. As noted earlier, inferential statistics can also be subdivided on the basis of non-parametric and parametric tests. A parametric test should only be applied if the following conditions are met: you have interval or ratio data, your sample is randomly drawn from the population, and your sample is from a population that is normally distributed. The normal distribution or bell curve is a graph that shows data scores that accumulate around the middle. The normal distribution proposes that the mean, mode and median are all equal. For example, the results of an exam would show that the majority of exam scores would centre round the middle when the frequency curve is symmetrical. When the frequency curve is skewed, the mean, mode and median will have different values.
Determining whether or not your sample is from a population that is normally distributed is important as it will influence your choice of statistical tests. One can make this assumption using something called the Central Limit Theorem. The Central Limit Theorem is based on the notion that the average in a set of sample data drawn from a wider population is approximately distributed as a normal distribution if certain conditions are met. The main condition is that when a random, independent sample has at least 30 observations, the distribution of all sample means of the same-sized samples closely approaches a normal distribution centred on the mean of the population from which the sample is drawn. Hence, for sample sizes of above 30, you can assume that the sampling distribution of means will approximate to a normal distribution.
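A small simulation can make the Central Limit Theorem more concrete. The sketch below repeatedly draws samples of 30 observations from a strongly skewed (exponential) population whose mean and standard deviation are both 1, then looks at how the sample means behave. The choice of population, the number of repetitions and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev

def sample_means(draw, sample_size, repetitions, seed=0):
    """Draw many samples of the same size and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        mean(draw(rng) for _ in range(sample_size))
        for _ in range(repetitions)
    ]

# Skewed population: exponential with mean 1 and standard deviation 1.
means = sample_means(lambda rng: rng.expovariate(1.0), sample_size=30, repetitions=2000)

centre = mean(means)       # should sit close to the population mean of 1
spread = pstdev(means)     # should sit close to 1 / sqrt(30), about 0.18
```

Even though the individual observations are far from normal, the sample means cluster symmetrically around the population mean, with a spread close to the population standard deviation divided by the square root of the sample size.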
When a parametric test cannot be used, non-parametric tests should be applied. One advantage of these tests is that they make no assumptions about the distribution of the population.
A number of parametric and non-parametric methods are associated with inferential statistics. Although by no means exhaustive, Table 9.16 shows the methods that we shall examine in this chapter. It also includes the purpose of each method (a 'P' or 'NP' alongside each method indicates parametric or non-parametric respectively) and briefly illustrates how each method might be applied.
At this stage, do not be too concerned if you are unable to recognize the conditions required in relation to each of the examples in Table 9.16. By the end of this chapter, you should be in a position to develop your own examples for all of the methods listed here.
TABLE 9.16 Examples of inferential statistics

Method; Purpose; Examples of application

Hypothesis testing; ; H0 - There is no difference in the mean exam marks between male and female managers. H1 - There is a difference in the mean exam marks between male and female managers.
Confidence intervals; ; Calculating a 95% confidence interval for the proportion of small firms in London that do business with Europe.
Time series analysis; ; One-month moving averages of retail sales data.
Pearson product moment correlation coefficient (P); Measuring association; Correlating gender with height.
Spearman's rank correlation coefficient (NP); Measuring association; Comparing two managers' ranked assessment of ten employees.
Chi-squared test (NP); Measuring difference; Do some manufacturers produce more faulty goods than others?
Student's t test; Measuring difference; Comparing the sample means of ages of female finance and marketing managers (independent ...
Simple regression (P); Assessing the strength of relationship between ...; Strength of relationship between advertising spend and sales.
Multiple regression (P); Assessing the strength of relationship between ...; Strength of relationship between advertising spend and training spend on sales.



Estimation refers to estimating a population parameter from samples. It is unlikely that you will be in the fortunate position to have access to the entire population. If you have adopted a positivist approach to your research, the likelihood is that your intention is to estimate from your sample characteristics of the population.
Two methods commonly used to estimate from samples are hypothesis testing and confidence intervals.

Hypothesis testing
Hypothesis testing is one of the main methods used in inferential statistics. It involves making a statement about some aspect of the population, then generating a sample to see if the hypothesis can or cannot be rejected. To test a hypothesis you need to formulate a null hypothesis (H0) and an alternative hypothesis (H1). The former makes the assumption that there is no change in the value being tested. For example, 'no change' could relate to 'no difference' or 'no correlation'. For each null hypothesis there is an alternative hypothesis (H1). An alternative hypothesis tends to be vaguer than a null hypothesis, as will become apparent later on. The probability of rejecting the null hypothesis when it is true is referred to as the significance level. If the p-value from your test is below the chosen significance level (p < 0.05 at the 5% level), then you reject the null hypothesis. Conversely, if p ≥ 0.05 then you cannot reject the null hypothesis. Be careful with your wording. We cannot say 'accept'. The convention among researchers is to use 'cannot reject' the null hypothesis.
An example of a null hypothesis might be:
H0: There is no difference between the value of bonuses among male and female ...
H1: There is a difference between the value of bonuses among male and female ...
The steps in hypothesis testing are as follows:
State your null (H0) and alternative (H1) hypotheses.
Choose a level of significance (typically the 0.05, or 5%, level).
Collect your data.
Carry out statistical tests.
Make one of the following decisions:
1. Reject the null hypothesis (H0) and accept the alternative hypothesis (H1), or
2. Fail to reject the null hypothesis (H0) and subsequently note that there is not sufficient evidence to suggest the truth of the alternative hypothesis (H1).
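The steps above can be sketched end to end in code. The chapter does not prescribe a particular test at this point, so the sketch swaps in a simple permutation test of the difference between two group means, chosen because it needs only the standard library. The bonus figures, the number of permutations and the fixed seed are all hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, permutations=2000, seed=0):
    """Two-sided p-value: how often a random regrouping gives a mean
    difference at least as large as the one actually observed."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (permutations + 1)

def decide(p_value, alpha=0.05):
    """Apply the decision rule at the chosen significance level."""
    return "reject H0" if p_value < alpha else "cannot reject H0"

# Hypothetical bonuses for two groups of employees.
group_1 = [500, 520, 540, 560, 580, 600, 620, 640]
group_2 = [300, 320, 340, 360, 380, 400, 420, 440]
p_different = permutation_p_value(group_1, group_2)

# Two identical groups: no difference to detect.
p_same = permutation_p_value([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
```

Note that the wording returned by `decide` follows the convention described above: the outcome is either 'reject' or 'cannot reject', never 'accept'.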
Type I errors and Type II errors Once you have carried out your hypothesis test you should have reached a decision about whether to reject or not reject your null hypothesis, although, due to sampling variation, your conclusion will be subject to error. A type I error is where the null hypothesis is true but rejected, while a type II error is where the alternative hypothesis is true but rejected (see Table 9.17). If you set your level of significance too stringently (e.g. p < 0.01), there is a greater likelihood of you making a type II error.

TABLE 9.17 Possible outcomes of a hypothesis test

             H0 is true      H1 is true
Accept H0    No error        Type II error
Reject H0    Type I error    No error

Types of hypothesis test Before carrying out a hypothesis test you need to decide which type of test to use. There are two types of test. The first of these is referred to as a 'two-tailed test', while the second is a 'one-tailed test'. Your choice is important as it will impact on how you word your alternative hypothesis. Let us look at each one in turn.
A two-tailed test is carried out to see if your hypothesis is above or below what you presume it to be. For example, you might believe that training (independent variable) has an effect on company performance (dependent variable) but you cannot predict the direction. When the alternative hypothesis (H1) is written as not equal to (≠), then you are indicating your intention to carry out a two-tailed test.
A one-tailed test is used when you are making a prediction that there will be an effect in a particular direction. For example, if students are given extra classes (independent variable), their exam marks will increase (dependent variable).
If conducting a left-tailed test, H1 is written as follows:
H1: μ < 200g
If conducting a right-tailed test, H1 is written as follows:
H1: μ > 200g
Confidence interval
A confidence interval uses a range of values that is likely to contain an unknown population parameter. A parameter is a population characteristic such as a proportion (P) or mean (μ). Confidence intervals are generally more useful than straightforward hypothesis tests as they go beyond a simple 'reject the null hypothesis' or 'do not reject the null hypothesis' by providing a range of credible values for the population parameter.
First, it is important to make a distinction between confidence interval and confidence level. For example, if you collected data on job security from finance managers, 75% might say that they feel 'insecure' in their job. Hence, you might say that 75% of finance managers are insecure. You could support this by saying that you are 95% certain (confidence level) that this will represent the true population 95% of the time.

If. for example, we take a sample of 200 employees and find ii mean age of 35, can
we be sure that the mean age 1s representative of the population? lJnlortunately, our
mean (or point of estimate) is unlikely tu be the:> same as thL population, altllUugh it is
likely to he dost> to the population mean but have some c:>rror L1 order to address the
problem of our point estimate, we can define a range (confidence interval) within
which our population mean is likely to tall. The confidence intPrval is expressed with
R level of confidence that the inte, val contain.s thf' true population parameter. lypically,
most researd1c1s use a 95% ,onfidcnLe level. Thie; me::in that 95% nt tl,e prubahility
falls between z vn 1 ues of -I _ l)t, anrl + 1.96. These represent the most c:om111only used
critical value for calei dating rnnflrlence limits. A cri.tical 11al11e or z-scorc rc>lates to the
nwubcr of standard deviations t.hat the .:;ample mean departs from the popul Ation
mean of a normal tlilrihutton. Other examples are shown in Tr1hle () 18
A confidence interval is used to estimate the true mean of your entire
population (μ). It is represented by the formula:

\[ \mu = \bar{x} \pm z \frac{\sigma}{\sqrt{n}} \]

μ = population mean
x̄ = sample mean
± = margin of error
z = critical value
σ = population standard deviation
n = sample size

TABLE 9.18 Confidence intervals and critical values (columns: Confidence interval (%); Critical value)

Let us say that you have conducted a study into the performance of manufacturing
machinery among a sample of 150 companies. Your findings produce a mean
length of operation before the machinery needs servicing of 180 hours, with a
standard deviation of 20 hours. You now intend to determine the 95% confidence
interval for the overall mean length of operation prior to servicing, among all
companies that use this type of equipment.

\[ \mu = 180 \pm 1.96 \times \frac{20}{\sqrt{150}} \]

= (176.8, 183.2) operational hours before servicing.

In the above example, we assume that the standard deviation of the population is
known. In reality, it is unlikely that we would know the standard deviation of a
population but not its mean. In most cases, researchers need to use the sample standard
deviation (S) and mean to estimate the population mean and standard deviation.
If you have a sample size n > 30, a confidence interval can also be used to
estimate the proportion in a population. This is represented by the formula:

\[ P = p \pm z \sqrt{\frac{p(1 - p)}{n}} \]

z = critical value
p = sample proportion
n = sample size
√ = square root

Let us say that you have conducted a study into higher education qualifications
among small businesses in Cambridge. From your random sample of 100 companies
you have established that 70 small firms have at least one employee who holds a
higher education qualification, thus x = 70 and n = 100.
Now let us calculate the 95% confidence interval for the proportion of small firms in
Cambridge that have at least one employee who holds a higher education qualification.

p = x/n = 70/100 = 0.7

\[ P = 0.7 \pm 1.96 \sqrt{\frac{0.7(1 - 0.7)}{100}} = 0.7 \pm 1.96(0.046) \]

(0.61, 0.79)

Thus, we are 95% confident that the true proportion of small firms who have an
employee with a higher education qualification is between 61% and 79%.
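
The two worked examples in this section (the machinery study with a sample mean of
180 hours, and the 70 out of 100 Cambridge firms) can be checked with a short Python
sketch. The function and variable names are illustrative assumptions, and the code simply
applies the two confidence interval formulas with the 95% critical value of 1.96.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level


def mean_confidence_interval(sample_mean, std_dev, n, z=Z_95):
    """Return (lower, upper) confidence limits for a population mean."""
    margin = z * std_dev / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


def proportion_confidence_interval(successes, n, z=Z_95):
    """Return (lower, upper) confidence limits for a population proportion."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin


# Machinery example: mean 180 hours, standard deviation 20, sample of 150
machinery = mean_confidence_interval(180, 20, 150)
# Cambridge example: 70 of 100 small firms
firms = proportion_confidence_interval(70, 100)

print(f"Machinery hours: {machinery[0]:.1f} to {machinery[1]:.1f}")
print(f"Small firms: {firms[0]:.2f} to {firms[1]:.2f}")
```

Small differences in the second decimal place of the machinery limits can arise from
rounding the intermediate standard error before multiplying by 1.96.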

Forecasting can be defined as the estimation of a set of values at future points in time.
Forecasting plays an important role in business. For example, many organizations
engage in sales forecasting to maintain better control over inventory levels, while
governments forecast levels of inflation and unemployment, along with economic growth,
to determine appropriate government policy. Forecasting financial data such as share
prices is also important for both individual and organizational decision-making.
There are several different methods associated with forecasting. These include
qualitative methods and economic models. In this section I introduce you to one of
the main categories associated with forecasting - time series analysis. This is an
introduction to the subject. For a more detailed discussion on time series analysis you
will need to consult a book devoted to forecasting - see the Further Reading section.
Time series


A time series is a series of data points that are typically measured over regular time
intervals. Time series analysis involves using various methods to understand a time
series in a 'historical' context, as well as to make forecasts or predictions. In relation
to the latter, we shall now examine two of the commonly used methods of time
series analysis - 'simple moving average' and 'weighted moving average'.
A simple moving average is used to compare possible changes in a variable over time.
It is found by calculating the mean for a given time period. For example, let us say that
you carried out a study into the ages at which individuals were appointed as Chief
Executive Officer (CEO) for one particular company. Since the firm's inception in
1978, you have established that there have been eight CEOs. Their ages upon
appointment are as follows: 53, 60, 55, 52, 49, 58, 48, 61. Our moving average depends on the
number of years that we wish to use as our period. For example, using four years:
The first year is:

\[ \frac{53 + 60 + 55 + 52}{4} = 55 \]

The second year is:

\[ \frac{60 + 55 + 52 + 49}{4} = 54 \]

The third year is:

\[ \frac{55 + 52 + 49 + 58}{4} = 53.5 \]

The fourth year is:

\[ \frac{52 + 49 + 58 + 48}{4} = 51.75 \]
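
The four-period calculation for the CEO ages can be sketched in Python as follows. The
function name is an illustrative assumption. The sketch also returns a fifth average
(ages 49, 58, 48 and 61), which the worked example does not show.

```python
def simple_moving_average(values, window):
    """Return the mean of each consecutive run of `window` values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Ages of the eight CEOs on appointment, in order of appointment
ceo_ages = [53, 60, 55, 52, 49, 58, 48, 61]
averages = simple_moving_average(ceo_ages, 4)
print(averages)
```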

The advantage of the simple moving average is that it is easy to calculate and
provides a reasonably accurate estimate for forecasting. However, the main drawback is
that it does not account for possible trends. For example, many seasonal products
experience an increase in sales over the Christmas period. Ideally, this trend needs to
be taken into account when forecasting. One way to do this is to weight your data.
A weighted moving average is where greater significance is placed on one part of
your data set than the rest. For example, in a volatile economic climate the price of
petrol may fluctuate to a large degree. Hence, you are more likely to give the most
recent data in your set of fuel prices more significance than earlier data, as they give
a better representation of the current state of the market.
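
A minimal sketch of a weighted moving average is given below. The petrol prices and the
weights are hypothetical values chosen only for illustration, since no figures or weighting
scheme are specified here; the function name is likewise an assumption.

```python
def weighted_moving_average(values, weights):
    """Weighted mean of the most recent len(weights) values.

    weights[-1] is applied to the newest value in `values`.
    """
    if len(values) < len(weights):
        raise ValueError("need at least as many values as weights")
    recent = values[-len(weights):]
    return sum(v * w for v, w in zip(recent, weights)) / sum(weights)


petrol_prices = [130, 134, 131, 140]  # hypothetical pence per litre, oldest first
weights = [1, 2, 3, 4]                # hypothetical: the newest price counts most

forecast = weighted_moving_average(petrol_prices, weights)
simple_mean = sum(petrol_prices) / len(petrol_prices)
print(f"Weighted: {forecast:.2f}, unweighted: {simple_mean:.2f}")
```

Because the latest price carries the largest weight, the weighted figure sits closer to the
most recent observation than the unweighted mean does.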

Measuring association

If you have gathered bivariate data, then an interesting test would be to find out if your
two variables are associated in some way. For instance, you might be interested in
researching a possible association or correlation between age and salary among a group
of employees. Remember that correlation does not mean causation. For example, if an
umbrella manufacturer finds that there is a correlation between umbrella sales and
annual rainfall, this does not necessarily mean that increased rainfall causes an increase
in umbrella sales. Other variables are likely to impact on sales. These might include:
pricing, the state of the economy, or possibly competition in the market.
Correlation coefficient
A correlation coefficient is used for bivariate analysis. It measures the extent to
which two variables are linearly related. Measurement is represented between -1
and 1. A value of 1 represents a perfect positive correlation; a perfect negative linear
relationship is represented by a value of -1, and a correlation coefficient of 0 means
that there is no linear relationship between the two variables.
In reality, it is unlikely that you will produce findings that are perfectly correlated
or completely uncorrelated. Typically, values usually fall somewhere between 1 and 0.
A straightforward way to find out if there is a correlation between two variables is
to plot the data. As we have seen earlier in this chapter, a scatter plot is a simple way
of doing this as it clearly illustrates the correlation between two variables. However, it
does not measure the strength of the relationship between two variables. In essence,
there are two types of correlation coefficient: Pearson's product moment correlation
coefficient and Spearman's rank correlation coefficient. These are examined below.
Pearson's product moment correlation coefficient

Pearson's product moment correlation (r) is a parametric technique that measures the
strength of association between two variables or bivariate data. For example, you
may wish to examine a possible relationship between advertising spend and number
of sales, or age and height among teenagers. The data used must be of an interval or
ratio type and be normally distributed.
If your answer produces a strong relationship between your x and y variables,
this does not mean that x causes y. We can go on to test this possibility by carrying
out regression analysis.
Pearson's product moment correlation coefficient is represented by the formula:

\[ r = \frac{\sum xy - \frac{\sum x \sum y}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}} \]

n = the number of data pairs
y = the dependent variable
x = the independent variable
√ = square root
Σ = the sum of

By way of illustration, an engineering company wishes to test the association
between the number of breakdowns to its machinery over a period of 12 days and
the consequent time taken to carry out the repairs. The data are set out in Table 9.19.

TABLE 9.19 Number of breakdowns (x) and time taken to repair in minutes (y)

Using the data in Table 9.19, we can now enter this into the formula. Thus:

\[ r = \frac{532 - 499}{\sqrt{[180 - 161][1{,}678 - 1{,}541]}} = \frac{33}{\sqrt{(19)(137)}} = 0.65 \]

The next step is to interpret our result. This can be made by checking the values of
the correlation coefficient (r):

between 0.70 and 0.99 is a strong positive correlation;
between 0.40 and 0.69 is a medium positive correlation;
between 0 and 0.39 is a weak positive correlation;
between 0 and -0.39 is a weak negative correlation;
between -0.40 and -0.69 is a medium negative correlation; and
between -0.70 and -0.99 is a strong negative correlation.

In our above example, the result (0.65) is significant at the 5% level and indicates
that there is a medium positive correlation between the number of breakdowns and
the time taken to repair each breakdown.

Spearman's rank correlation coefficient

Spearman's rank correlation coefficient (r_s) is used to test the strength and
direction of association between two ordinal variables. It is a test of association that is
used for non-parametric data and can be used to aid either the proving or
disproving of a hypothesis, e.g. the profit of a company increases as the size of the
company increases (see Table 9.20). The formula given to Spearman's rank
correlation coefficient is as follows:

\[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \]

d = the difference between the two rankings of one item of data
n = the number of items of data
Σ = the sum of

TABLE 9.20 Company size and profit (columns include: Rank profit; Difference between ranks (d))

\[ \sum d^2 = 0 + 9 + 1 + 0 + 0 + 1 + 0 + 1 + 1 + 9 = 22 \]

\[ r_s = 1 - \frac{6 \times 22}{10(100 - 1)} \]

r_s = 0.87
As noted earlier, the closer the r_s value is to +1 or -1, the stronger the likely
correlation. You should now be aware that our value for the above example of 0.87 for
r_s means that there is a strong positive relationship.
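
As a rough check on the two correlation examples in this section, the sketch below
recomputes Pearson's r from the intermediate values quoted in the breakdowns example, and
Spearman's coefficient from the ten squared rank differences quoted in the company size
example. The function names are illustrative assumptions, and the Spearman function assumes
there are no tied ranks.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired observations (sums-of-products form)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    syy = sum(y * y for y in ys) - sum_y ** 2 / n
    return sxy / math.sqrt(sxx * syy)


def spearman_rs(squared_differences):
    """Spearman's coefficient from squared rank differences (no ties assumed)."""
    n = len(squared_differences)
    return 1 - 6 * sum(squared_differences) / (n * (n ** 2 - 1))


# Intermediate values quoted in the breakdowns example
r_from_sums = (532 - 499) / math.sqrt((180 - 161) * (1678 - 1541))

# Squared rank differences quoted in the company size and profit example
d_squared = [0, 9, 1, 0, 0, 1, 0, 1, 1, 9]
rs = spearman_rs(d_squared)

print(round(r_from_sums, 2), round(rs, 2))
```

With raw paired data to hand, `pearson_r(xs, ys)` would be called directly on the two lists
rather than on pre-computed sums.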









Measuring difference

Measuring the difference involves testing a hypothesis that there is a difference
between an observed frequency and an expected frequency. In the last section we
looked at the strength of association between variables. The data we have used to carry
out these tests have been ordinal, interval or ratio data. The chi-squared test is a useful
test as it can be used not only for measuring difference, but also for nominal data.
Chi-squared test
The chi-squared test is one of the most widely used hypothesis tests. It can be used
for all types of data and is the only hypothesis test for nominal data. It is a
non-parametric test used to see if there is a statistically significant difference between
observed data and what would have been expected by chance. The chi-squared test
can test for the null hypothesis (H0), i.e. that there is no significant difference
between the expected and observed result. The formula is:

\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

O = observed frequencies
E = expected frequencies
The test only applies to categorical data that are counts or frequencies, not
percentages. The simplest example of presenting the association between observations
and categories is to use a 2 x 2 contingency table.
For example, five companies (A, B, C, D, E) were questioned on employee sick
leave over a 12-month period (see Table 9.21). We now aim to establish if some
companies have more sickness in the workplace than others. We can set out our
hypotheses as follows:
H0: Each company expects the same number of employees on sick leave.
H1: Each company does not have the same number of employees on sick leave.
The level of significance to be used in this case is 5%.
The next step in our analysis involves compiling a chi-squared table (see Table 9.22).
There are a total of 90 people who have taken sick leave. If each company expects
the same number, the expected number (E) of those who have taken sick leave is:

\[ \frac{90}{5} = 18 \]

The calculation from Table 9.22 shows that the value of χ² is 6.331.

TABLE 9.21 (columns: Company; Number on sick leave)

TABLE 9.22 (columns: Company; Observed (O); Expected (E); (O - E)²; (O - E)²/E; with a Total row)

Our actual value of 6.331 is less than the critical value of 9.49 (four degrees of
freedom at the 5% level). Therefore we cannot reject the null hypothesis. In other
words, each company can expect the same number of employees to go on sick leave.
Any discrepancy in numbers can be attributed to chance.
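
The chi-squared calculation can be sketched in a few lines of Python. The observed counts
below are hypothetical: they are not the Table 9.21 figures, and were chosen only so that
they total 90 across five companies. The critical value of 9.49 is the one quoted in the
text, and the function name is an assumption.

```python
CRITICAL_VALUE = 9.49  # 5% significance level, four degrees of freedom


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sick-leave counts for companies A to E (total 90)
observed = [27, 13, 16, 16, 18]
# Under the null hypothesis every company expects the same number: 90 / 5 = 18
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_squared(observed, expected)
reject_null = statistic > CRITICAL_VALUE

print(f"chi-squared = {statistic:.3f}, reject H0: {reject_null}")
```

With these illustrative counts the statistic falls below the critical value, so the null
hypothesis would not be rejected, which mirrors the conclusion of the worked example.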

Student's t-test
The Student's t-test is a parametric technique used to test the difference between
sample means. The samples must be gathered from two different populations. In
short, the Student's t-test establishes the probability that two populations are the
same in relation to the variable that is being tested. In order to carry out a t-test,
your data must be normally distributed, of interval or ratio status, and the two data
sets must have similar variances. In addition, for a paired t-test (see below) each data
pair must be related.
T-tests infer the likelihood of two distinct groups being different. An
independent t-test is used to test the difference between two independent groups,
e.g. male and female. Let us say that you gathered data on IQ levels among male
doctors and lawyers, and compared the sample means using the t-test. A probability
of 0.3 means that, if the two groups really had the same mean IQ, a difference at least
as large as the one observed would arise by chance about 30% of the time. In that case
you cannot distinguish between your group of doctors and lawyers based on IQ alone.
A paired sample t-test is used to establish whether or not there exists a
significant difference between the mean values of matched samples. It is often used to
measure a case before and after some form of manipulation or changes have taken
place. For example, you might use a paired t-test to establish the significance of a
difference in exam performance prior to and after a professional training
programme. A paired t-test can also be used to compare samples. For instance, our
exam example may involve comparing the effectiveness of the training programme
in improving exam scores by sampling employees from different companies, and
comparing the scores of those respondents who have taken part and those who
have not taken part in the training.
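
To make the paired t-test more concrete, the sketch below computes the t statistic for a
before-and-after comparison. The exam scores are hypothetical, the function name is an
assumption, and the critical value of 2.571 is the standard two-tailed 5% figure for five
degrees of freedom taken from a t table rather than from this chapter.

```python
import math
import statistics

T_CRITICAL_DF5 = 2.571  # two-tailed, 5% level, 5 degrees of freedom


def paired_t_statistic(before, after):
    """t = mean difference / (standard deviation of differences / sqrt(n))."""
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_error


# Hypothetical exam scores for six employees before and after training
before = [52, 60, 48, 55, 63, 58]
after = [58, 62, 55, 57, 70, 60]

t_value = paired_t_statistic(before, after)
significant = abs(t_value) > T_CRITICAL_DF5
print(f"t = {t_value:.2f}, significant at 5%: {significant}")
```

In practice a statistical package reports the exact probability alongside the t value, so
the table lookup used here is only needed when working by hand.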





Strength of relationship between variables

Typically, the final part of statistical analysis often involves assessing the strength of
the relationship between variables.

Regression analysis

Although a detailed analysis and application of regression analysis is beyond the remit
of this book, this section provides a brief insight into how it might be applied to your
research. Regression analysis is a statistical technique for investigating the strength of
a relationship between variables. Typically, the researcher aims to establish the causal
effect of one variable on another. For example, the effect of a discount in price on
consumer demand, or company size on performance. Essentially, there are two main
types of regression analysis - simple regression and multiple regression.
Simple regression determines the strength of relationship between a dependent
variable and one independent variable. It aims to find the extent that a dependent
(y) and independent variable (x) are linearly related. A regression equation is often
represented on a scatter plot by a regression line. A regression line is used to clearly
illustrate the relationship between the variables under investigation. For example, in
linear regression you might want to investigate the relationship between profit and
advertising spend. First, let us look at the formula for linear regression, and then
how this relates to our profit and advertising spend example. The formula is:

\[ y = a + bx \]

y = dependent variable
x = independent variable
a = point where the line intersects the y-axis
b = gradient of the line

Profit = a + b x advertising spend

If the above equation was implemented, the regression coefficient indicates how
good a predictor it is likely to be. Remember that the value produced is between -1
and +1. A figure of +1 indicates that your equation is a perfect predictor.
Conversely, a value of 0 shows that the equation predicts none of the variation.

Multiple linear regression aims to find a linear relationship between a dependent
variable (y) and several independent variables (x_i). The multiple regression
correlation coefficient (r) is a measure of the proportion of variability explained by, or due
to, the linear relationship in a sample of paired data. It is represented by a number
between 0 and 1. The formula for multiple regression is:

\[ y = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n \]

Let us say that our example of the relationship between profit and advertising
spend was to take into account other factors. For instance, profit might also be
affected by staff training expenditure, price, bonuses and competition. This would
be represented in the following multiple regression formula:
Profit = a + (b1 x staff training expenditure) + (b2 x price) + (b3 x bonuses)
+ (b4 x number of competitors)

How Do I Know Which Statistical Tests to Use?

Brown and Saunders (2008: 103-104) make the following suggestions before
choosing a particular test:
What is the research question I am trying to answer?
What are the characteristics of the sample? For instance, are you using judgement sampling,
snowball sampling, etc.?
What types of data do I have?
How many data variables are there?
How many groups are there?
Are the data distributed normally?
If the data are not distributed normally, will this affect the statistic I want to use?
Are the samples independent?
In addition, if you wish to make inferences about a population:
Are the data representative of the population?
Are the groups different?
Is there a relationship between the variables?
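
Returning to the simple regression equation y = a + bx for profit and advertising spend, the
sketch below estimates a and b by ordinary least squares. Least squares is a common
estimation method but is an assumption here, as the text does not name one. The spend and
profit figures are hypothetical and the function name is illustrative.

```python
def fit_simple_regression(xs, ys):
    """Least-squares estimates of a (intercept) and b (gradient) in y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


advertising = [10, 20, 30, 40, 50]  # hypothetical advertising spend (£000)
profit = [24, 47, 64, 86, 104]      # hypothetical profit (£000)

a, b = fit_simple_regression(advertising, profit)
predicted_profit = a + b * 60       # predicted profit for a spend of 60

print(f"Profit = {a:.2f} + {b:.2f} x advertising spend")
print(f"Predicted profit at a spend of 60: {predicted_profit:.1f}")
```

A multiple regression with several independent variables follows the same idea but is
normally fitted with a statistical package rather than by hand.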

Statistical Software Packages

If your research involves quantitative data analysis, it is unlikely that you will do the
work manually. Most computers have access to some kind of spreadsheet package,
such as Microsoft Excel. These certainly allow you to carry out elementary
statistical analysis. Yet, they do not have the range of options typically associated with
statistical packages. Fortunately, there are now several excellent software packages
on the market. Many of these are user friendly and ideal for the student researcher.
Two of the leading packages used in UK institutions are SPSS and Minitab. This
book is not intended to provide a comprehensive guide on how to use either of
these packages. The important thing is that you recognize the advantages of using
such a package as opposed to undertaking manual quantitative data analysis. The
advantages of using a software package are:
it saves time;
you avoid the need to learn how to perform calculations;
it provides greater scope for your analysis; and
data can be easily recorded, interpreted and presented.

Thankfully, this means that you do not need to worry about remembering different
formulas! However, you still need to understand the purpose of statistical tests and
the circumstances in which they can be used.

Preparing Your Data for Analysis using SPSS

Earlier in the chapter I briefly discussed preparing your data for analysis. This
section provides a brief guide as to how to enter data into SPSS. Entries are based on
IBM SPSS Statistics Version 20. Before entering your data you will need to enter
the codes for each question so that SPSS recognizes it. In Table 9.23 there is an
extract from a data spreadsheet on household spending. It includes the following
variables: case number (3 digits), age (2 digits), gender (1 = male or 2 = female),
nationality (1 = British, 2 = German, 3 = French, 4 = Chinese, 5 = Other) and
weekly household spending (£).
When you first open SPSS, you will be presented with a window that
contains the following: Open an existing data source, Open another type of file, Run
the tutorial, Type in data, Run an existing query, and Create new query using
Database Wizard. We want to enter data, so select Type in data by placing the
cursor over the circle next to Type in data, then click OK. This will take us to
the Data View screen. In order to enter variables, we need Variable View, so
click on Variable View in the bottom left-hand corner of the screen. We are now
in the Variable View screen and ready to begin naming and defining the properties
of the variables.
TABLE 9.23 Extract from a data spreadsheet (columns: Case number; Age; Gender; Nationality; Weekly household spending (£))

Name: Here, under Name and to the right of box 1, type a word to identify the variable. In
some surveys each question is given a number in order to preserve anonymity, e.g. 'case'. So
we can call our first variable CASE. The name must start with a letter, should not have any
spaces and can be up to eight characters. The second variable relates to the first question in
our questionnaire (Age). A simple way to refer to this is Q1, so type this in the box under
CASE. Type Q2 to represent gender under Q1, and so on. Do the same for Nationality and
Weekly household spending, Q3 and Q4 respectively.
Type: The two basic types of variable that you will use are numeric and string. A numeric
variable can only have numbers allocated. String variables may contain letters or numbers. If
a string variable contains only numbers, numeric operations on that variable will not be
allowed. To change a variable type, click on the small grey box on the right-hand side of the
window. This brings up the Variable Type menu. If you select a numeric variable, you can then
click in the width box or the decimal box to change the default values of eight characters
reserved to displaying numbers with two decimal places. For whole numbers, you can drop
the decimals down to zero. If you select a string variable, you can indicate the number of
characters to be allowed for data entry in this string variable. For gender and nationality,
select string as these are nominal data.
Width: The width of the variable is simply the number of characters SPSS will allow to be
entered for the variable. SPSS selects eight by default.
Decimals: This is the number of decimal places SPSS will display. For whole numbers, set
the number of decimals to zero. By default, SPSS inserts two decimals for each numerical
variable.
Label: The label helps you to recognize the variable. For example, Q1 asks about your age:
you could give a label of 'Age', while for Q2 'Gender', Q3 'Nationality' and Q4 'Spend'. You
could also use a key word for longer questions. For instance, a question on your favourite
holiday destination could be called 'Holiday'.
Values: Under values enter the codes and what they represent. For Q2, click to the right of
the cell under Values and a window will appear called Value labels. In the Value Box enter
'1' and in the Label Box enter 'Male' and click Add. Next, enter 2 in the Value Box and
'Female' in the Label Box and click Add, followed by OK. Adopt this process when coding for
nationalities, entering 1 to 5 for each option.
Missing data: Sometimes respondents may miss out a question because they choose
to do so or it is not applicable. If it is not applicable, then you should include this as an
option in your questionnaire. Any non-responses must be coded. SPSS provides a
default measure for missing data or non-response. You can choose your own number,
but make sure that the same number is used for all non-responses. Choose a number
unlikely to be used and easily recognizable, e.g. 9999. In this example there are no
missing data.
Columns: This refers to how many columns wide you would like the variable to be presented
in the Data View. We will leave it at the default.
Align: Here you can change the presentation so that scores for the variable are left justified,
right justified or centred. We will leave it at the default.
Measure: Here, SPSS gives you a choice of Scale (this represents both ratio and
interval scale), Ordinal or Nominal. We looked at these different types of variable
earlier in the chapter. For Q1 (Age) and Q4 (Spend), select Scale by clicking to the right of
each respective box under Measure, while Q2 (gender) and Q3 (nationality) should be
Nominal.
Role: This is a feature of newer versions of SPSS. The column here is concerned with the role
your variable is going to take in the analysis. For the procedures we are going to carry out it
can be left as the default role of Input.
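
The coding scheme described above (variable names, labels, value labels and a missing-data
code) can also be written down as a small codebook outside SPSS. The Python sketch below is
only an illustration of that idea: the dictionary and function names, the label used for
CASE and the example respondent are all assumptions.

```python
MISSING = 9999  # example non-response code suggested in the text

# Variable labels; the label for CASE is an assumption
VARIABLE_LABELS = {
    "CASE": "Case number",
    "Q1": "Age",
    "Q2": "Gender",
    "Q3": "Nationality",
    "Q4": "Spend",
}

# Value labels for the two nominal variables
VALUE_LABELS = {
    "Q2": {1: "Male", 2: "Female"},
    "Q3": {1: "British", 2: "German", 3: "French", 4: "Chinese", 5: "Other"},
}


def decode(record):
    """Swap codes for their labels and the missing-data code for None."""
    decoded = {}
    for name, value in record.items():
        label = VARIABLE_LABELS[name]
        if value == MISSING:
            decoded[label] = None
        else:
            decoded[label] = VALUE_LABELS.get(name, {}).get(value, value)
    return decoded


respondent = {"CASE": 1, "Q1": 34, "Q2": 2, "Q3": 1, "Q4": 120}  # hypothetical row
print(decode(respondent))
```

Keeping the codes in one place like this makes it easier to check that the same number has
been used for every non-response.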

Entering Your Data into SPSS

Once you have named and defined the properties of your variables, you are ready to
begin entering your data. First, return to Data View and enter the data values for your
respondents as summarized in Table 9.23. If you place your cursor over the name of a
variable, SPSS will show the label you added in Variable View (Figure 9.9). To enter
new data, click in an empty cell in the first row. Use the arrow keys to enter the values.
Entering the data can be time-consuming and tedious. However, it is important
that you enter the data accurately, as incorrectly entered data will of course impact the
analysis and interpretation of your findings. Figure 9.10 shows our completed Data View.

FIGURE 9.10 Data View

Descriptive statistics

SPSS allows you to calculate a wide range of descriptive statistics such as the mode,
median and mean. In the Data View screen select Analyze > Descriptive Statistics
> Descriptives. Next, the Descriptives dialogue box will appear. We are interested
in analyzing household spend, so click on Spend [Q4], then the middle arrow.
Spend [Q4] should move to the Variable(s): box. Next, click on Options. The
Descriptives: Options box will appear. Select Mean, Sum, Std. deviation, Range and,
under Display order, Variable list. Click Continue, followed by OK.
SPSS now produces an output showing our selected descriptive statistics for
household spending (Figure 9.11).

FIGURE 9.11 Descriptive statistics

Producing Charts Using SPSS

As mentioned earlier in the chapter, a bar chart compares a simple set of
observations. A bar chart is a straightforward way of summarizing either ordinal or
nominal data. To produce a bar chart for Nationality the process is as follows: click
on Analyze > Descriptive Statistics > Frequencies. The Frequencies box will open.
Click on Nationality [Q3] and the middle arrow. Nationality [Q3] now moves to
the Variable(s) box. Next, click Charts; the Frequencies: Charts box will open.
Under Chart Type select Bar charts and under Chart Values select Frequencies.
Click Continue. Select Display Frequency table and click OK. SPSS now produces
a bar chart for nationality showing frequency counts (see Figure 9.12), along with
a frequency table (see Figure 9.13). Clearly, our sample n = 10 is very small. A much
larger sample showing student nationality would be much more interesting.

FIGURE 9.12 A bar chart for nationality

FIGURE 9.13 Frequency table for nationality

When you are ready to save or open your work, the steps are as follows:
Saving your work: File > Save As > Enter a filename in the filename box, e.g.
'household' > click Save.
Opening your work: File > Open > Data > click on 'household' > Open.

Finally, this section has provided a snapshot of how SPSS works. You will find
more information on the accompanying website. In addition, recommended sources
are listed at the end of the chapter.
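
For readers who want to see what the Descriptives and Frequencies procedures compute, the
sketch below reproduces the same kind of summary in Python. It is not SPSS output: the ten
spending figures and nationalities are hypothetical and are not the values from Table 9.23.
The standard deviation uses the sample (n - 1) form.

```python
import statistics
from collections import Counter

# Hypothetical data for ten respondents
spend = [85, 120, 95, 150, 110, 70, 130, 105, 90, 145]
nationality = ["British", "German", "British", "Chinese", "French",
               "British", "Other", "German", "British", "Chinese"]

# The four statistics selected in the Descriptives options
summary = {
    "Mean": statistics.mean(spend),
    "Sum": sum(spend),
    "Std. deviation": statistics.stdev(spend),
    "Range": max(spend) - min(spend),
}

# Frequency counts, the basis of both the frequency table and the bar chart
frequencies = Counter(nationality)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
for category, count in frequencies.most_common():
    print(f"{category:<8} {count:>2} {'#' * count}")
```

The rows of `#` characters act as a crude text bar chart, one mark per respondent.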

Using descriptive statistics and hypothesis testing

In the article below, the authors undertake a quantitative-based study to examine the growth and
planning strategies within women-led small and medium-sized enterprises (SMEs). The article
is useful as it illustrates how quantitative analysis might be incorporated into your own study.
Mitchelmore, S. and Rowley, J. (2013) 'Growth and planning strategies within women-led SMEs',
Management Decision, 51 (1): 83-96.

The authors set out the aim of the paper as exploring 'the planning strategies of female
entrepreneurs who have indicated a desire to grow their businesses, the time horizons of planning
strategies, and the relationship between planning horizons and number of employees and annual
sales as measures of business performance'.
The paper begins by providing an introduction to the study and identifying a gap in the literature;
namely, although there is a growing interest in female entrepreneurship, there is a lack of research
into the influence of gender, especially in terms of growth and long-term business performance.
Following this, the authors review the key literature in the context of business performance
and growth and business planning, and in female entrepreneurship. As well as having a set of
objectives within the introduction, the authors also include two hypotheses that emerge from the
literature review. These are as follows:




H1. There is a positive correlation between number of employees and the time horizons for
business planning.
H2. There is a positive correlation between business turnover and the time horizons for
business planning.

For the methodology, the authors adopted a questionnaire-based survey in order to collect
data about female entrepreneurs and their businesses. A convenience sampling method was used
when distributing the questionnaire. The researchers' rationale for using this sampling technique
was the difficulty of identifying businesses that were owned by women.
Questionnaires were administered via email and sent as an attachment. A total of 210
questionnaires were collected. The authors state that only questionnaires where respondents wished to grow their
business, and had completed all of the relevant questions on the profile of their business, were used
for analysis. Subsequently, data were entered into SPSS for descriptive analysis and hypothesis testing.
The descriptive analysis begins by analyzing the respondents' profiles. This includes their
age, education, years of business experience, sector and age of business. Second, a table is
provided that illustrates the percentage of respondents who indicated that they used a
specific strategy to promote the growth of their business. Of the various strategies listed,
the most popular choice in terms of relevance to business growth strategy was 'Improving
existing products or services' at 42%. Another table is used to highlight how far ahead
respondents had planned for a range of activities, including: sales, cash flow, new products,
entry into new markets, recruitment or human resources, expenditures and investment in

COMMON QUESTIONS AND ANSWERS

1. Mathematics is not one of my strong points. How do I know which method to use when
analyzing my data?

Chase, C. (2009) Demand-Driven Forecasting: A Structured Approach to Forecasting.
Hoboken, NJ: John Wiley & Sons.
Field, A. (2008) Discovering Statistics Using SPSS (3rd edn). London: Sage.
Keller, G. and Warrack, B. (2002) Statistics for Management and Economics. London:
Thomson Learning.

Answer: Your choice of methods depends on a number of factors. These include the number
of variables, type of data, sampling method and whether you have a normal distribution.
Using descriptive statistics is relatively straightforward, whereas more advanced analysis
can be undertaken using a statistical software package such as SPSS. This avoids having to
manually perform calculations. However, you still need to be able to understand the
conditions under which a test can be carried out, understand its purpose and be able to
interpret your findings.
2. Shall I analyze my data manually or use a software package?
Answer: If you are using descriptive statistics, then there is no reason why you cannot
undertake manual analysis. Still, when presenting your results, there is no substitute for the
likes of SPSS. A specialized package contains several options for presenting your data.
Moreover, it can help to reduce the amount of time devoted to this stage of your project.
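
As a rough illustration of how little is involved in basic descriptive statistics, the sketch below uses Python's standard `statistics` module in place of hand calculation or SPSS. The sample values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Hypothetical sample: number of employees reported by ten respondents.
employees = [1, 2, 2, 3, 4, 4, 4, 6, 9, 15]

summary = {
    "n": len(employees),
    "mean": statistics.mean(employees),
    "median": statistics.median(employees),
    "mode": statistics.mode(employees),
    # Sample standard deviation (n - 1 in the denominator).
    "stdev": statistics.stdev(employees),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The mean (5) sits above the median (4) because one respondent reports a much larger workforce, which is the kind of pattern worth noting when the descriptive results are written up.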
3. How should I structure my quantitative analysis within the body of my research project?
Answer: Your decision here depends on the extent of your analysis. Broadly speaking, start
with descriptive, followed by inferential, statistics. If you have adopted a mixed methods
approach to your research, then your decision to start with qualitative or quantitative analysis
is generally up to you. This might be influenced by previous research on your chosen topic, or
possibly advice from your research supervisor. Whatever you decide, the key thing to remember
is that your analysis must be presented in a clear, thematic way.


Brown, R.B. and Saunders, M. (2008) Dealing with Statistics: What You Need To
Know. Maidenhead: McGraw-Hill/Open University Press.
Bryman, A. (2004) Social Research Methods (2nd edn). Oxford: Oxford University Press.
Mitchelmore, S. and Rowley, J. (2013) 'Growth and planning strategies within
women-led SMEs', Management Decision, 51 (1): 83-96.
Waters, D. (1997) Quantitative Methods for Business (4th edn). Harlow: FT/
Prentice Hall.

Further Reading
Barrow, M. (2005) Statistics for Economics, Accounting and Business Studies.
Harlow: FT/Prentice Hall.