
28/01/2015

Linear discriminant analysis - Wikipedia, the free encyclopedia

Linear discriminant analysis
From Wikipedia, the free encyclopedia

Linear discriminant analysis (LDA) and the related Fisher's linear discriminant
are methods used in statistics, pattern recognition and machine learning to find a
linear combination of features which characterizes or separates two or more classes
of objects or events. The resulting combination may be used as a linear classifier,
or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to analysis of variance (ANOVA) and regression analysis,
which also attempt to express one dependent variable as a linear combination of
other features or measurements.[1][2] However, ANOVA uses categorical
independent variables and a continuous dependent variable, whereas discriminant
analysis has continuous independent variables and a categorical dependent variable
(i.e. the class label).[3] Logistic regression and probit regression are more similar to
LDA, as they also explain a categorical variable by the values of continuous
independent variables. These other methods are preferable in applications where it
is not reasonable to assume that the independent variables are normally distributed,
which is a fundamental assumption of the LDA method.

LDA is also closely related to principal component analysis (PCA) and factor
analysis in that they both look for linear combinations of variables which best
explain the data.[4] LDA explicitly attempts to model the difference between the
classes of data. PCA, on the other hand, does not take into account any difference in
class, and factor analysis builds the feature combinations based on differences
rather than similarities. Discriminant analysis is also different from factor analysis
in that it is not an interdependence technique: a distinction between independent
variables and dependent variables (also called criterion variables) must be made.

LDA works when the measurements made on independent variables for each
observation are continuous quantities. When dealing with categorical independent
variables, the equivalent technique is discriminant correspondence analysis.[5][6]

Contents
1 LDA for two classes
2 Canonical discriminant analysis for k classes
3 Fisher's linear discriminant
4 Multiclass LDA
5 Practical use
6 Applications
6.1 Bankruptcy prediction
6.2 Face recognition
6.3 Marketing
6.4 Biomedical studies
7 See also
8 References
9 Further reading
10 External links

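LDA for two classes
Consider a set of observations $\vec{x}$ (also called features, attributes, variables or
measurements) for each sample of an object or event with known class $y$. This set
of samples is called the training set. The classification problem is then to find a
good predictor for the class $y$ of any sample of the same distribution (not
necessarily from the training set) given only an observation $\vec{x}$.[7]:338

LDA approaches the problem by assuming that the conditional probability density
functions $p(\vec{x} \mid y=0)$ and $p(\vec{x} \mid y=1)$ are both normally distributed with mean and
covariance parameters $(\vec{\mu}_0, \Sigma_0)$ and $(\vec{\mu}_1, \Sigma_1)$, respectively. Under this assumption,
the Bayes optimal solution is to predict points as being from the second class if the
log of the likelihood ratio $p(\vec{x} \mid y=1) / p(\vec{x} \mid y=0)$ is above some threshold.
Multiplying the log ratio by 2 and calling the threshold $T$, the criterion is

$$(\vec{x} - \vec{\mu}_0)^T \Sigma_0^{-1} (\vec{x} - \vec{\mu}_0) + \ln|\Sigma_0| - (\vec{x} - \vec{\mu}_1)^T \Sigma_1^{-1} (\vec{x} - \vec{\mu}_1) - \ln|\Sigma_1| > T$$

Without any further assumptions, the resulting classifier is referred to as QDA
(quadratic discriminant analysis).

LDA instead makes the additional simplifying homoscedasticity assumption (i.e.
that the class covariances are identical, so $\Sigma_0 = \Sigma_1 = \Sigma$) and that the covariances
have full rank. In this case, several terms cancel:

$$\vec{x}^T \Sigma_0^{-1} \vec{x} = \vec{x}^T \Sigma_1^{-1} \vec{x}$$

$$\vec{x}^T \Sigma_i^{-1} \vec{\mu}_i = \vec{\mu}_i^T \Sigma_i^{-1} \vec{x}$$ because $\Sigma_i$ is Hermitian

and the above decision criterion becomes a threshold on the dot product

$$\vec{w} \cdot \vec{x} > c$$

for some threshold constant $c$, where

$$\vec{w} = \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_0)$$

$$c = \frac{1}{2} \left( T - \vec{\mu}_0^T \Sigma^{-1} \vec{\mu}_0 + \vec{\mu}_1^T \Sigma^{-1} \vec{\mu}_1 \right)$$

This means that the criterion of an input $\vec{x}$ being in a class $y$ is purely a function of
this linear combination of the known observations.

It is often useful to see this conclusion in geometrical terms: the criterion of an
input $\vec{x}$ being in a class $y$ is purely a function of the projection of the
multidimensional-space point $\vec{x}$ onto the vector $\vec{w}$ (thus, we only consider its direction).
In other words, the observation belongs to $y$ if the corresponding $\vec{x}$ is located on a
certain side of a hyperplane perpendicular to $\vec{w}$. The location of the plane is defined
by the threshold $c$.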
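
The reduction from the quadratic criterion to a linear threshold can be checked numerically. The following is a minimal NumPy sketch, not a reference implementation: the function names, the class means, the shared covariance and the choice T = 0 are all illustrative assumptions. It evaluates the quadratic criterion with equal covariances and compares it with the rule w . x > c.

```python
import numpy as np

def quadratic_score(x, mu0, cov0, mu1, cov1):
    """Left-hand side of the quadratic criterion (twice the log-likelihood ratio)."""
    d0 = x - mu0
    d1 = x - mu1
    m0 = d0 @ np.linalg.solve(cov0, d0)   # squared Mahalanobis distance to class 0
    m1 = d1 @ np.linalg.solve(cov1, d1)   # squared Mahalanobis distance to class 1
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return m0 + logdet0 - m1 - logdet1

def lda_parameters(mu0, mu1, cov, T=0.0):
    """Weight vector w and threshold c of the linear rule w . x > c."""
    w = np.linalg.solve(cov, mu1 - mu0)
    c = 0.5 * (T - mu0 @ np.linalg.solve(cov, mu0) + mu1 @ np.linalg.solve(cov, mu1))
    return w, c

# Illustrative (made-up) parameters; both classes share the covariance `cov`.
mu0 = np.array([0.0, 0.0])
mu1 = np.array([2.0, 1.0])
cov = np.array([[1.0, 0.3], [0.3, 2.0]])
T = 0.0

w, c = lda_parameters(mu0, mu1, cov, T)

rng = np.random.default_rng(0)
points = rng.normal(size=(200, 2)) * 3.0
scores = np.array([quadratic_score(x, mu0, cov, mu1, cov) for x in points])

quadratic_decision = scores > T       # quadratic rule with equal covariances
linear_decision = points @ w > c      # linear rule
agree = bool(np.all(quadratic_decision == linear_decision))
print("rules agree on all points:", agree)
```

With equal covariances the quadratic score minus T equals 2 (w . x - c), so both rules assign every point to the same class; with unequal covariances only the quadratic form applies.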

Canonical discriminant analysis for k classes
Canonical discriminant analysis (CDA) finds axes ($k - 1$ canonical coordinates, $k$
being the number of classes) that best separate the categories. These linear
functions are uncorrelated and define, in effect, an optimal $(k - 1)$-dimensional space
through the $n$-dimensional cloud of data that best separates (the projections in that
space of) the $k$ groups. See Multiclass LDA for details below.

Fisher's linear discriminant
The terms Fisher's linear discriminant and LDA are often used interchangeably,
although Fisher's original article[1] actually describes a slightly different
discriminant, which does not make some of the assumptions of LDA such as
normally distributed classes or equal class covariances.

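Suppose two classes of observations have means $\vec{\mu}_0, \vec{\mu}_1$ and covariances
$\Sigma_0, \Sigma_1$. Then the linear combination of features $\vec{w} \cdot \vec{x}$ will have means
$\vec{w} \cdot \vec{\mu}_i$ and variances $\vec{w}^T \Sigma_i \vec{w}$ for $i = 0, 1$. Fisher defined the separation
between these two distributions to be the ratio of the variance between the classes
to the variance within the classes:

$$S = \frac{\sigma_{\text{between}}^2}{\sigma_{\text{within}}^2} = \frac{(\vec{w} \cdot \vec{\mu}_1 - \vec{w} \cdot \vec{\mu}_0)^2}{\vec{w}^T \Sigma_1 \vec{w} + \vec{w}^T \Sigma_0 \vec{w}} = \frac{(\vec{w} \cdot (\vec{\mu}_1 - \vec{\mu}_0))^2}{\vec{w}^T (\Sigma_0 + \Sigma_1) \vec{w}}$$

This measure is, in some sense, a measure of the signal-to-noise ratio for the class
labelling. It can be shown that the maximum separation occurs when

$$\vec{w} \propto (\Sigma_0 + \Sigma_1)^{-1} (\vec{\mu}_1 - \vec{\mu}_0)$$

When the assumptions of LDA are satisfied, the above equation is equivalent to
LDA.

Be sure to note that the vector $\vec{w}$ is the normal to the discriminant hyperplane. As
an example, in a two-dimensional problem, the line that best divides the two groups
is perpendicular to $\vec{w}$.

Generally, the data points to be discriminated are projected onto $\vec{w}$; then the
threshold that best separates the data is chosen from analysis of the one-dimensional
distribution. There is no general rule for the threshold. However, if
projections of points from both classes exhibit approximately the same
distributions, a good choice would be the hyperplane between projections of the
two means, $\vec{w} \cdot \vec{\mu}_0$ and $\vec{w} \cdot \vec{\mu}_1$. In this case the parameter $c$ in the threshold
condition $\vec{w} \cdot \vec{x} > c$ can be found explicitly:

$$c = \vec{w} \cdot \frac{1}{2} (\vec{\mu}_0 + \vec{\mu}_1)$$

With $\vec{w} = \Sigma^{-1} (\vec{\mu}_1 - \vec{\mu}_0)$ for a shared covariance $\Sigma$, this equals
$\frac{1}{2} \vec{\mu}_1^T \Sigma^{-1} \vec{\mu}_1 - \frac{1}{2} \vec{\mu}_0^T \Sigma^{-1} \vec{\mu}_0$.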
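
As an illustration of the separation measure and its maximizer, here is a short NumPy sketch. The synthetic data, the helper names `fisher_direction` and `separation`, and the use of sample means and sample covariances in place of the true parameters are assumptions made for the example only.

```python
import numpy as np

def fisher_direction(X0, X1):
    """Direction w = (S0 + S1)^{-1} (mu1 - mu0) and midpoint threshold c."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    w = np.linalg.solve(S0 + S1, mu1 - mu0)
    c = w @ (mu0 + mu1) / 2      # halfway between the projected class means
    return w, c

def separation(w, X0, X1):
    """Fisher's ratio of between-class to within-class variance along w."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    return (w @ (mu1 - mu0)) ** 2 / (w @ (S0 + S1) @ w)

# Synthetic two-class data with unequal covariances (illustrative values).
rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=300)
X1 = rng.multivariate_normal([2, 2], [[1.0, -0.3], [-0.3, 1.5]], size=300)

w, c = fisher_direction(X0, X1)
best = separation(w, X0, X1)

# No randomly drawn direction should achieve a larger separation.
others = [separation(rng.normal(size=2), X0, X1) for _ in range(100)]

# Classify by the side of the hyperplane w . x = c.
accuracy = (np.sum(X0 @ w <= c) + np.sum(X1 @ w > c)) / (len(X0) + len(X1))
print("separation at w:", best, "max over random directions:", max(others))
```

The separation is unchanged when w is rescaled, which is why only the direction of w matters; the threshold c, by contrast, scales with w.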

Multiclass LDA
In the case where there are more than two classes, the analysis used in the
derivation of the Fisher discriminant can be extended to find a subspace which
appears to contain all of the class variability. This generalization is due to C. R.
Rao.[8] Suppose that each of $C$ classes has a mean $\vec{\mu}_i$ and the same covariance $\Sigma$.
Then the between-class variability may be defined by the sample covariance
of the class means

$$\Sigma_b = \frac{1}{C} \sum_{i=1}^{C} (\vec{\mu}_i - \vec{\mu}) (\vec{\mu}_i - \vec{\mu})^T$$

where $\vec{\mu}$ is the mean of the class means. The class separation in a direction $\vec{w}$ in this
case will be given by

$$S = \frac{\vec{w}^T \Sigma_b \vec{w}}{\vec{w}^T \Sigma \vec{w}}$$

This means that when $\vec{w}$ is an eigenvector of $\Sigma^{-1} \Sigma_b$ the separation will be equal to
the corresponding eigenvalue.

If $\Sigma^{-1} \Sigma_b$ is diagonalizable, the variability between features will be contained in the
subspace spanned by the eigenvectors corresponding to the $C - 1$ largest
eigenvalues (since $\Sigma_b$ is of rank $C - 1$ at most). These eigenvectors are primarily
used in feature reduction, as in PCA. The eigenvectors corresponding to the smaller
eigenvalues will tend to be very sensitive to the exact choice of training data, and it
is often necessary to use regularisation as described in the next section.
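
The eigenvector construction can be sketched in NumPy as follows. This is one possible way to set it up, under stated assumptions: the shared covariance is estimated by the pooled within-class covariance, the eigenproblem is solved through a Cholesky whitening step (a numerical convenience, not something the text prescribes), and the data and the name `multiclass_lda` are made up for the example.

```python
import numpy as np

def multiclass_lda(X, y):
    """Eigenvalues and eigenvectors of sigma^{-1} sigma_b, largest first."""
    classes = np.unique(y)
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Pooled within-class covariance as the estimate of the shared covariance.
    centered = np.vstack([X[y == k] - m for k, m in zip(classes, means)])
    sigma = centered.T @ centered / (len(X) - len(classes))
    # Between-class scatter: sample covariance of the class means (1/C scaling).
    dev = means - means.mean(axis=0)
    sigma_b = dev.T @ dev / len(classes)
    # Whitening: sigma = L L^T, M = L^{-1} sigma_b L^{-T} is symmetric.
    L = np.linalg.cholesky(sigma)
    M = np.linalg.solve(L, np.linalg.solve(L, sigma_b).T)
    eigvals, V = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(eigvals)[::-1]
    W = np.linalg.solve(L.T, V[:, order])   # columns: eigenvectors of sigma^{-1} sigma_b
    return eigvals[order], W, sigma, sigma_b

# Three classes in four dimensions (illustrative synthetic data).
rng = np.random.default_rng(2)
centers = np.array([[0, 0, 0, 0], [3, 1, 0, 0], [0, 2, 2, 0]], dtype=float)
X = np.vstack([rng.normal(loc=m, size=(100, 4)) for m in centers])
y = np.repeat([0, 1, 2], 100)

eigvals, W, sigma, sigma_b = multiclass_lda(X, y)
Z = X @ W[:, :2]          # project onto the C - 1 = 2 leading directions
print("eigenvalues:", eigvals)
```

Only the first C - 1 eigenvalues are meaningfully non-zero here, so the projection Z keeps the between-class variability while reducing four features to two.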
If classification is required, instead of dimension reduction, there are a number of
alternative techniques available. For instance, the classes may be partitioned, and a
standard Fisher discriminant or LDA used to classify each partition. A common
example of this is "one against the rest" where the points from one class are put in
one group, and everything else in the other, and then LDA applied. This will result
in $C$ classifiers, whose results are combined. Another common method is pairwise
classification, where a new classifier is created for each pair of classes (giving
$C(C - 1)/2$ classifiers in total), with the individual classifiers combined to produce
a final classification.
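
A rough sketch of the "one against the rest" scheme is given below. The text does not say how the C results are combined; picking the class whose discriminant gives the largest margin is an assumption made here for illustration, as are the helper names and the synthetic data. The last lines only enumerate the class pairs that pairwise classification would need.

```python
import numpy as np
from itertools import combinations

def fit_fisher(X_pos, X_neg):
    """Two-class Fisher discriminant scoring the positive group high."""
    mu_p, mu_n = X_pos.mean(axis=0), X_neg.mean(axis=0)
    S = np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)
    w = np.linalg.solve(S, mu_p - mu_n)
    c = w @ (mu_p + mu_n) / 2
    return w, c

def fit_one_vs_rest(X, y):
    """One discriminant per class: that class against all other points."""
    return {k: fit_fisher(X[y == k], X[y != k]) for k in np.unique(y)}

def predict_one_vs_rest(models, X):
    labels = np.array(list(models))
    # Assumed combination rule: choose the class with the largest margin w . x - c.
    margins = np.column_stack([X @ w - c for w, c in models.values()])
    return labels[margins.argmax(axis=1)]

# Three well-separated classes in two dimensions (illustrative data).
rng = np.random.default_rng(6)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(loc=m, size=(100, 2)) for m in centers])
y = np.repeat([0, 1, 2], 100)

models = fit_one_vs_rest(X, y)                     # C classifiers
accuracy = np.mean(predict_one_vs_rest(models, X) == y)

pairs = list(combinations(np.unique(y), 2))        # C (C - 1) / 2 class pairs
print(len(models), "one-vs-rest classifiers,", len(pairs), "pairwise classifiers")
```

Raw margins from different discriminants are not on a common scale, so other combination rules (for example voting over pairwise classifiers) may behave differently on less cleanly separated data.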

Practical use
In practice, the class means and covariances are not known. They can, however, be
estimated from the training set. Either the maximum likelihood estimate or the
maximum a posteriori estimate may be used in place of the exact value in the above
equations. Although the estimates of the covariance may be considered optimal in
some sense, this does not mean that the resulting discriminant obtained by
substituting these values is optimal in any sense, even if the assumption of normally
distributed classes is correct.

Another complication in applying LDA and Fisher's discriminant to real data occurs
when the number of measurements of each sample exceeds the number of samples
in each class.[4] In this case, the covariance estimates do not have full rank, and so
cannot be inverted. There are a number of ways to deal with this. One is to use a
pseudo-inverse instead of the usual matrix inverse in the above formulae. However,
better numeric stability may be achieved by first projecting the problem onto the
subspace spanned by $\Sigma_b$.[9] Another strategy to deal with small sample size is to use
a shrinkage estimator of the covariance matrix, which can be expressed
mathematically as

$$\Sigma_{\lambda} = (1 - \lambda) \Sigma + \lambda I$$

where $\Sigma$ is the covariance estimate, $I$ is the identity matrix, and $\lambda$ is the shrinkage
intensity or regularisation parameter. This leads to the framework of regularized
discriminant analysis[10] or shrinkage discriminant analysis.[11]
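
The effect of shrinkage on a rank-deficient covariance estimate can be seen in a few lines of NumPy. The sample sizes, the value lam = 0.1 and the name `shrink_covariance` are illustrative assumptions; the text does not say how the shrinkage intensity should be chosen.

```python
import numpy as np

def shrink_covariance(S, lam):
    """Shrink the estimate S toward the identity: (1 - lam) * S + lam * I."""
    return (1.0 - lam) * S + lam * np.eye(S.shape[0])

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 20))            # 5 samples, 20 measurements each
S = np.cov(X, rowvar=False)             # 20 x 20 estimate, rank at most 4

rank_S = np.linalg.matrix_rank(S)
S_reg = shrink_covariance(S, lam=0.1)
rank_reg = np.linalg.matrix_rank(S_reg)

delta = np.ones(20)                             # stand-in for a mean difference
w_pinv = np.linalg.pinv(S) @ delta              # pseudo-inverse alternative
w_reg = np.linalg.solve(S_reg, delta)           # ordinary solve after shrinkage

print("rank before:", rank_S, "rank after:", rank_reg)
```

Every eigenvalue of the shrunk matrix is at least lam, which is what makes the ordinary inverse well defined again.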
Also, in many practical cases linear discriminants are not suitable. LDA and
Fisher's discriminant can be extended for use in non-linear classification via the
kernel trick. Here, the original observations are effectively mapped into a higher
dimensional non-linear space. Linear classification in this non-linear space is then
equivalent to non-linear classification in the original space. The most commonly
used example of this is the kernel Fisher discriminant.
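
The idea that a linear rule in a mapped space is a non-linear rule in the original space can be illustrated without kernels at all. The sketch below uses an explicit quadratic feature map, which is an assumption chosen for simplicity and is not the kernel Fisher discriminant itself; the ring-shaped data and helper names are likewise made up.

```python
import numpy as np

def quadratic_map(X):
    """Explicit feature map (x1, x2) -> (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1 ** 2, x2 ** 2, x1 * x2])

def fisher(X0, X1):
    """Fisher direction and midpoint threshold from sample estimates."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(S, mu1 - mu0)
    return w, w @ (mu0 + mu1) / 2

def accuracy(X0, X1):
    """Training accuracy of the rule w . x > c fitted on (X0, X1)."""
    w, c = fisher(X0, X1)
    correct = np.sum(X0 @ w <= c) + np.sum(X1 @ w > c)
    return correct / (len(X0) + len(X1))

# Class 0 is a disc of radius 1; class 1 is a surrounding ring (radius 2 to 3).
rng = np.random.default_rng(5)
angles = rng.uniform(0, 2 * np.pi, size=(2, 200))
r_inner = rng.uniform(0.0, 1.0, size=200)
r_outer = rng.uniform(2.0, 3.0, size=200)
X0 = np.column_stack([r_inner * np.cos(angles[0]), r_inner * np.sin(angles[0])])
X1 = np.column_stack([r_outer * np.cos(angles[1]), r_outer * np.sin(angles[1])])

acc_linear = accuracy(X0, X1)                                   # original space
acc_mapped = accuracy(quadratic_map(X0), quadratic_map(X1))     # mapped space
print("linear:", acc_linear, "after quadratic map:", acc_mapped)
```

No straight line separates a disc from the ring around it, but in the mapped space a hyperplane on the squared coordinates corresponds to a circle in the original plane. Kernel methods obtain the same effect without forming the mapped features explicitly.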
LDA can be generalized to multiple discriminant analysis, where $c$ (here denoting
the class label rather than the threshold constant used earlier) becomes a
categorical variable with $N$ possible states, instead of only two. Analogously, if the
class-conditional densities $p(\vec{x} \mid c = i)$ are normal with shared covariances, the
sufficient statistic for $P(c \mid \vec{x})$ is given by the values of $N$ projections, which lie in
the subspace spanned by the $N$ means, affine projected by the inverse covariance
matrix. These projections can be found by solving a generalized eigenvalue problem,
where the numerator is the covariance matrix formed by treating the means as the
samples, and the denominator is the shared covariance matrix.

Applications
In addition to the examples given below, LDA is applied in positioning and product
management.

Bankruptcy prediction
In bankruptcy prediction based on accounting ratios and other financial variables,
linear discriminant analysis was the first statistical method applied to systematically
explain which firms entered bankruptcy vs. survived. Despite limitations including
known nonconformance of accounting ratios to the normal distribution assumptions
of LDA, Edward Altman's 1968 model is still a leading model in practical
applications.

Face recognition
In computerised face recognition, each face is represented by a large number of
pixel values. Linear discriminant analysis is primarily used here to reduce the
number of features to a more manageable number before classification. Each of the
new dimensions is a linear combination of pixel values, which form a template. The
linear combinations obtained using Fisher's linear discriminant are called Fisher
faces, while those obtained using the related principal component analysis are
called eigenfaces.

Marketing
In marketing, discriminant analysis was once often used to determine the factors
which distinguish different types of customers and/or products on the basis of
surveys or other forms of collected data. Logistic regression or other methods are
now more commonly used. The use of discriminant analysis in marketing can be
described by the following steps:
1. Formulate the problem and gather data. Identify the salient attributes
consumers use to evaluate products in this category. Use quantitative
marketing research techniques (such as surveys) to collect data from a sample
of potential customers concerning their ratings of all the product attributes.
The data collection stage is usually done by marketing research professionals.
Survey questions ask the respondent to rate a product from one to five (or 1 to
7, or 1 to 10) on a range of attributes chosen by the researcher. Anywhere from
five to twenty attributes are chosen. They could include things like: ease of
use, weight, accuracy, durability, colourfulness, price, or size. The attributes
chosen will vary depending on the product being studied. The same question is
asked about all the products in the study. The data for multiple products is
codified and input into a statistical program such as R, SPSS or SAS. (This
step is the same as in factor analysis.)
2. Estimate the discriminant function coefficients and determine the statistical
significance and validity. Choose the appropriate discriminant analysis
method. The direct method involves estimating the discriminant function so
that all the predictors are assessed simultaneously. The stepwise method enters
the predictors sequentially. The two-group method should be used when the
dependent variable has two categories or states. The multiple discriminant
method is used when the dependent variable has three or more categorical
states. Use Wilks's Lambda to test for significance in SPSS or F stat in SAS.
The most common method used to test validity is to split the sample into an
estimation or analysis sample, and a validation or holdout sample. The
estimation sample is used in constructing the discriminant function. The
validation sample is used to construct a classification matrix which contains
the number of correctly classified and incorrectly classified cases. The
percentage of correctly classified cases is called the hit ratio.
3. Plot the results on a two-dimensional map, define the dimensions, and interpret
the results. The statistical program (or a related module) will map the results.
The map will plot each product (usually in two-dimensional space). The
distance of products to each other indicates how different they are. The
dimensions must be labelled by the researcher. This requires subjective
judgement and is often very challenging. See perceptual mapping.
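
The validation procedure in step 2 (estimation sample, holdout sample, classification matrix, hit ratio) can be sketched as follows. The synthetic ratings, the 140/60 split and the use of a two-group Fisher discriminant are assumptions for illustration; statistical packages such as those named above provide their own routines.

```python
import numpy as np

def classification_matrix(y_true, y_pred, n_classes):
    """Rows: actual group, columns: predicted group, entries: case counts."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[t, p] += 1
    return M

def hit_ratio(M):
    """Fraction of correctly classified cases (diagonal over total)."""
    return np.trace(M) / M.sum()

# Synthetic two-group data with three attributes (illustrative values).
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(1.5, 1.0, size=(100, 3))])
y = np.repeat([0, 1], 100)

# Split into an estimation sample and a holdout (validation) sample.
idx = rng.permutation(len(X))
est, hold = idx[:140], idx[140:]
X_est, y_est = X[est], y[est]

# Estimate the two-group discriminant function on the estimation sample only.
mu0 = X_est[y_est == 0].mean(axis=0)
mu1 = X_est[y_est == 1].mean(axis=0)
S = np.cov(X_est[y_est == 0], rowvar=False) + np.cov(X_est[y_est == 1], rowvar=False)
w = np.linalg.solve(S, mu1 - mu0)
c = w @ (mu0 + mu1) / 2

# Classify the holdout cases and tabulate the results.
y_pred = (X[hold] @ w > c).astype(int)
M = classification_matrix(y[hold], y_pred, 2)
ratio = hit_ratio(M)
print(M)
print("hit ratio:", ratio)
```

Because the holdout cases play no part in estimating w and c, the hit ratio computed on them is a less optimistic measure of validity than one computed on the estimation sample.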

Biomedical studies
The main application of discriminant analysis in medicine is the assessment of
the severity state of a patient and the prognosis of disease outcome. For example,
during retrospective analysis, patients are divided into groups according to severity
of disease: mild, moderate and severe forms. Then results of clinical and laboratory
analyses are studied in order to reveal variables which are statistically different in
the studied groups. Using these variables, discriminant functions are built which help
to objectively classify disease in a future patient into mild, moderate or severe form.

In biology, similar principles are used in order to classify and define groups of
different biological objects, for example, to define phage types of Salmonella
enteritidis based on Fourier transform infrared spectra,[12] to detect the animal source
of Escherichia coli by studying its virulence factors,[13] etc.

See also
Data mining
Decision tree learning
Factor analysis
Kernel Fisher discriminant analysis
Logit (for logistic regression)
Multidimensional scaling
Multilinear subspace learning
Pattern recognition
Perceptron
Preference regression
Quadratic classifier

References
1. Fisher, R. A. (1936). "The Use of Multiple Measurements in Taxonomic
Problems". Annals of Eugenics 7 (2): 179-188. doi:10.1111/j.1469-1809.1936.tb02137.x
(https://dx.doi.org/10.1111%2Fj.1469-1809.1936.tb02137.x).
hdl:2440/15227 (http://hdl.handle.net/2440%2F15227).
2. McLachlan, G. J. (2004). Discriminant Analysis and Statistical Pattern Recognition.
Wiley Interscience. ISBN 0471691151. MR 1190469
(https://www.ams.org/mathscinet-getitem?mr=1190469).
3. Analyzing Quantitative Data: An Introduction for Social Researchers, Debra
Wetcher-Hendricks, p. 288.
4. Martinez, A. M.; Kak, A. C. (2001). "PCA versus LDA"
(http://www.ece.osu.edu/~aleix/pami01.pdf). IEEE Transactions on Pattern Analysis
and Machine Intelligence 23 (2): 228-233. doi:10.1109/34.908974
(https://dx.doi.org/10.1109%2F34.908974).
5. Abdi, H. (2007). "Discriminant correspondence analysis."
(http://www.utdallas.edu/~herve/AbdiDCA2007pretty.pdf) In: N. J. Salkind (Ed.):
Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage. pp. 270-275.
6. Perriere, G.; Thioulouse, J. (2003). "Use of Correspondence Discriminant Analysis
to predict the subcellular location of bacterial proteins". Computer Methods and
Programs in Biomedicine, 70, 99-105.
7. Venables, W. N.; Ripley, B. D. (2002). Modern Applied Statistics with S (4th ed.).
Springer Verlag. ISBN 0387954570.
8. Rao, C. R. (1948). "The utilization of multiple measurements in problems of
biological classification" (http://www.jstor.org/stable/2983775). Journal of the Royal
Statistical Society, Series B 10 (2): 159-203.
9. Yu, H.; Yang, J. (2001). "A direct LDA algorithm for high-dimensional data with
application to face recognition". Pattern Recognition, 34 (10), 2067-2069.
10. Friedman, J. H. (1989). "Regularized Discriminant Analysis"
(http://www.slac.stanford.edu/cgiwrap/getdoc/slacpub4389.pdf). Journal of the
American Statistical Association (American Statistical Association) 84 (405): 165-175.
doi:10.2307/2289860 (https://dx.doi.org/10.2307%2F2289860). JSTOR 2289860
(https://www.jstor.org/stable/2289860). MR 0999675
(https://www.ams.org/mathscinet-getitem?mr=0999675).
11. Ahdesmäki, M.; Strimmer, K. (2010). "Feature selection in omics prediction problems
using cat scores and false nondiscovery rate control"
(http://projecteuclid.org/euclid.aoas/1273584465). Annals of Applied Statistics, 4 (1),
503-519.
12. Preisner O, Guiomar R, Machado J, Menezes JC, Lopes JA. Application of Fourier
transform infrared spectroscopy and chemometrics for differentiation of Salmonella
enterica serovar Enteritidis phage types. Appl Environ Microbiol. 2010; 76 (11):
3538-3544.
13. David DE, Lynne AM, Han J, Foley SL. Evaluation of virulence factor profiling in
the characterization of veterinary Escherichia coli isolates. Appl Environ Microbiol.
2010; 76 (22): 7509-7513.

Further reading
Duda, R. O.; Hart, P. E.; Stork, D. H. (2000). Pattern Classification (2nd ed.).
Wiley Interscience. ISBN 0471056693. MR 1802993
(https://www.ams.org/mathscinet-getitem?mr=1802993).
Hilbe, J. M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press.
ISBN 9781420075755.
Mika, S. et al. (1999). "Fisher Discriminant Analysis with Kernels"
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.35.9904). IEEE
Conference on Neural Networks for Signal Processing IX: 41-48.
doi:10.1109/NNSP.1999.788121
(https://dx.doi.org/10.1109%2FNNSP.1999.788121).

External links
ALGLIB (http://www.alglib.net/dataanalysis/lineardiscriminantanalysis.php)
contains open-source LDA implementation in C# / C++ / Pascal / VBA.
Psychometrica.de (http://www.psychometrica.de/lds.html) open-source LDA
implementation in Java
LDA tutorial using MS Excel
(http://people.revoledu.com/kardi/tutorial/LDA/index.html)
Biomedical statistics. Discriminant analysis
(http://www.biomedicalstatistics.info/en/prognosis/discriminantanalysis.html)
Retrieved from "http://en.wikipedia.org/w/index.php?title=Linear_discriminant_analysis&oldid=644189195"
Categories: Multivariate statistics, Statistical classification,
Classification algorithms
This page was last modified on 26 January 2015, at 02:00.
Text is available under the Creative Commons Attribution-ShareAlike
License; additional terms may apply.
