
San Jose State University

SJSU ScholarWorks
Master's Projects | Master's Theses and Graduate Research

Spring 5-18-2015

Using Neural Networks for Image Classification


Tim Kang
SJSU

Follow this and additional works at: http://scholarworks.sjsu.edu/etd_projects


Part of the Artificial Intelligence and Robotics Commons

Recommended Citation
Kang, Tim, "Using Neural Networks for Image Classification" (2015). Master's Projects. 395.
http://scholarworks.sjsu.edu/etd_projects/395

This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been
accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact
scholarworks@sjsu.edu.
Using Neural Networks for Image Classification

San Jose State University, CS 298, Spring 2015

Author: Tim Kang (timothykang.x@gmail.com), San Jose State University
Advisor: Robert Chun (robert.chun@sjsu.edu), San Jose State University
Committee: Thomas Austin (thomas.austin@sjsu.edu), San Jose State University
Committee: Thomas Howell (thomas.howell@sjsu.edu), San Jose State University

Abstract

This paper will focus on applying neural network machine learning methods to images for the purpose of automatic detection and classification. The main advantage of using neural network methods in this project is their adeptness at fitting nonlinear data and their ability to work as an unsupervised algorithm. The algorithms will be run on common, publicly available data sets, namely MNIST and CIFAR-10, so that our results will be easily reproducible.

Table of Contents

Introduction
Background
  Short History of Artificial Neural Networks
  Quick Explanation of Artificial Neural Networks
  A Little Regarding Image Classification
Related Work
  Deep Learning with COTS HPC Systems
  Learning New Facts from Knowledge Bases with Neural Tensor Networks and Semantic Word Vectors
  Convolutional Recursive Deep Learning for 3D Object Classification
  A Fast Algorithm for Deep Belief Nets
  Reducing the Dimensionality of Data with Neural Networks
  To Recognize Shapes, First Learn to Generate Images
  Learning Multiple Layers of Representation
  Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting
  Scaling Algorithms Towards AI
  Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction from Intracranial EEG
  Google
  Facebook
Proposed Approach
  Machine Learning Libraries
  Data Sets
  Hardware
Implementation
  Overview
  MNIST Digits and Cifar10 Details
  Torch7 Details
  Data Preparation & Feature Engineering
  Running the Test
Results
  MNIST Results
  CIFAR10 Results
  Other Thoughts
Conclusion
References

Introduction

Computer vision is a problem that has existed for a long time. In this paper, we will be focusing on the task of classifying computer images into different preset categories. This specific part of computer vision has many diverse real-world applications, ranging from video games to self-driving cars. However, it has also traditionally been very difficult to pull off successfully, due to the enormous number of different factors (camera angles, lighting, color balance, resolution, etc.) that go into creating an image.

We will be focusing on using artificial neural networks for image classification. While artificial neural networks are some of the oldest machine learning algorithms in existence, they have not been widely used in the field of computer vision. More recent improvements in the methods of training artificial neural networks have made them worth looking into once again for the task of image classification.

Background

Short History of Artificial Neural Networks

Artificial neural networks were designed to be modeled after the structure of the brain. They were first devised in 1943 by researchers Warren McCulloch and Walter Pitts [1]. Back then, the model was initially called threshold logic, which branched into two different approaches: one inspired more by biological processes and one focused on artificial intelligence applications.

Although artificial neural networks initially saw a lot of research and development, their popularity soon declined and research slowed because of technical limitations. Training artificial neural networks was too computationally intensive for the computers of the time, which did not have sufficient computational power and would take too long to train the networks. As a result, other machine learning techniques became more popular and artificial neural networks were mostly neglected.

However, one important algorithm related to artificial neural networks was developed during this time: backpropagation, discovered by Paul Werbos [2]. Backpropagation is a way of training artificial neural networks by attempting to minimize the errors. This algorithm allowed scientists to train artificial networks much more quickly.

Artificial neural networks became popular once again in the late 2000s, when companies like Google and Facebook showed the advantages of using machine learning techniques on big data sets collected from everyday users. These algorithms are nowadays mostly used for deep learning, an area of machine learning that tries to model more complicated relationships, for example nonlinear ones.

Quick Explanation of Artificial Neural Networks

Each artificial neural network consists of many hidden layers. Each hidden layer in the artificial neural network consists of multiple nodes. Each node is linked to other nodes using incoming and outgoing connections. Each of the connections can have a different, adjustable weight. Data is passed through these many hidden layers and the output is eventually interpreted as different results.

In this example diagram, there are three input nodes, shown in red. Each input node represents a parameter from the data set being used. Ideally the data from the data set would be preprocessed and normalized before being put into the input nodes.

There is only one hidden layer in this example and it is represented by the nodes in blue. This hidden layer has four nodes in it. Some artificial neural networks have more than one hidden layer.

The output layer in this example diagram is shown in green and has two nodes. The connections between all the nodes (represented by black arrows in the diagram) are weighted differently during the training process.
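
To make the diagram concrete, the following is a minimal sketch of the same three-input, four-hidden, two-output network, written with the nn package of Torch7 (the library we adopt later in this paper). The weights would normally be learned during training; the input values here are made up for illustration:

require 'nn'

-- 3 input nodes -> 4 hidden nodes -> 2 output nodes, as in the diagram
net = nn.Sequential()
net:add(nn.Linear(3, 4))   -- weighted connections from input layer to hidden layer
net:add(nn.Tanh())         -- nonlinear activation applied at each hidden node
net:add(nn.Linear(4, 2))   -- weighted connections from hidden layer to output layer

input = torch.Tensor{0.5, -0.2, 0.1}   -- one preprocessed, normalized example
print(net:forward(input))              -- the resulting values at the two output nodes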

A Little Regarding Image Classification

Image recognition and classification is a problem that has been around for a long time and has many real-world applications. Police can use image recognition and classification to help identify suspects in security footage. Banks can use it to help sort out checks. More recently, Google has been using it in their self-driving car program.

Traditionally, a lot of different machine learning algorithms have been utilized for image classification, including template matching, support vector machines, kNN, and hidden Markov models. Image classification remains one of the most difficult problems in machine learning, even today.

Related Work

In academia, there is related work done by Professor Andrew Ng at Stanford University, Professor Geoffrey Hinton at the University of Toronto, Professor Yann LeCun at New York University, and Professor Michael Jordan at UC Berkeley. Much of their work deals with applying artificial neural networks or other machine learning algorithms. Following is a sampling of a few papers from the large body of work that is available. These papers are all fairly recent and are more geared towards specific applications of machine learning and artificial neural networks. There is also a lot of research involving machine learning and artificial neural networks going on in industry, most notably at Google and Facebook, which we will talk about briefly.

Deep Learning with COTS HPC Systems

This is a collaboration between Andrew Ng's Stanford research group and NVIDIA [3]. One of the main problems facing machine learning today is training the systems. As data sets get larger and larger, more and more computing power is required to train the models. In fact, artificial neural networks are especially hard to train.

The paper presents a more inexpensive alternative for training machine learning models by using COTS HPC (commodity off-the-shelf high performance computing) hardware. COTS hardware refers to computer hardware that can be bought at your typical computer hardware store: things like Intel or AMD CPUs. In this case, the COTS HPC hardware used was NVIDIA GPUs.

Their setup consisted of 16 servers. Each server had two quad-core CPUs, four NVIDIA GTX 680 GPUs and an FDR Infiniband adapter (for low-latency communication). They chose this configuration specifically to balance the number of GPUs with CPUs, citing past examples where too many GPUs overwhelmed the systems through I/O, cooling, CPU compute, and power issues.

Their software was written in C++ and built on top of the previous MVAPICH MPI framework, chosen to make communication between different processes easier. The code also includes GPU code written in NVIDIA's CUDA language.

In the end, they were able to train deep learning systems (including neural networks) with over 11 billion parameters, which is several times what other people were able to do before.

Learning New Facts from Knowledge Bases with Neural Tensor Networks and

Semantic Word Vectors

This is another paper from Andrew Ng's Stanford research group, this time focused on using neural networks to try to extract data and insights from unannotated text [4]. It focuses on lexical databases like WordNet and Yago. These databases store information about English words, specifically definition and usage, and also provide information about the relationships between different English words. They are commonly used in artificial intelligence and text processing research.

The paper uses a specific type of neural network called a neural tensor network. This type of neural network is used because it is easier to adapt to words and their relationships. A big advantage of using this type of model is that it can relate two inputs directly. The models were trained using gradient-based methods.

Using the WordNet data set, the trained model was asked to classify whether an arbitrary triplet of entities and relations is true or not. False examples were created by purposely corrupting existing known triplets by switching entities and relations. This model was able to achieve 75.8% accuracy, which is much better than what other researchers were able to achieve before with similarity-based models (66.7%) and Hadamard-based models (71.9%).

Convolutional Recursive Deep Learning for 3D Object Classification

This is yet another paper from Andrew Ng's Stanford research group. This time the paper is about using neural networks to classify images of objects, specifically focusing on RGB-D images [5]. RGB-D images are images that also have depth information in addition to the typical color information included in images. A good, everyday example of a device that captures RGB-D information is the Kinect sensor created by Microsoft for the Xbox One and the Xbox 360.

The paper uses a type of neural network called the convolutional-recursive neural network (CNN-RNN) for learning and classifying RGB-D images. This actually consists of two different networks. The convolutional neural network is first trained in an unsupervised manner by clustering the images. This is then used to create the filters for the CNNs. The resulting features are then fed into the RNNs, which classify the images.

The data set used in this paper was the RGB-D Object Dataset from the University of Washington, organized by Kevin Lai. The paper was successful in classifying the objects in the Lai data set, outperforming most previous attempts made using the same data set.

A Fast Algorithm for Deep Belief Nets

This paper is from Geoffrey Hinton's group at the University of Toronto and deals with using the concept of "complementary priors" to both speed up and improve neural networks that have a large number of hidden layers [6]. This is done by creating a neural network with the top two layers as undirected associative memory and the rest of the hidden layers as an acyclic graph. There are a few advantages to doing this, most notably that it allows the neural network to find a decent set of parameters much more rapidly.

The paper uses the commonly referenced MNIST handwritten digits database to test out this new way of creating a neural network. Another advantage gained from using MNIST is that a lot of research has already been published using it, so it was easy for the researchers to find other methods to compare against. The results this paper got after running against the MNIST database compared favorably with other results obtained using the more common feedforward neural networks.

While the initial results were good, the paper also outlines several problems that could limit the power of this particular method. For example, it treats non-binary images as probabilities, which won't work for natural images.

Reducing the Dimensionality of Data with Neural Networks

This is a publication by Geoff Hinton, originally appearing in Science in July 2006, that deals with the problem of using neural networks to reduce the dimensionality of data [7]. The problem of dimensionality reduction has traditionally been tackled using methods like principal components analysis (PCA). Principal components analysis basically looks for the greatest variance in the data set and can be done with algorithms like singular value decomposition.

This paper discusses using a type of neural network called an "autoencoder" as an alternative to using principal components analysis. An autoencoder is a multilayer neural network that can take high-dimensional data and encode it into a low-dimensional format, and that also has the ability to try to reconstruct the original high-dimensional data from the low-dimensional codes.

The paper tests the trained autoencoder on several different data sets. One of the data sets used was a custom randomly generated set of curves in two-dimensional space. For this data set, the autoencoder was able to produce much better reconstructions than PCA. The MNIST digits data set, the Olivetti face data set, and a data set of documents were also used. Once again, the autoencoder was able to perform better than PCA.

To Recognize Shapes, First Learn to Generate Images

This paper is from Geoffrey Hinton at the University of Toronto and the Canadian Institute for Advanced Research and deals with the problem of training multilayer neural networks [8]. It is an overview of the many different strategies that are used to train multilayer neural networks today.

First it discusses "five strategies for learning neural networks," which include denial, evolution, procrastination, calculus, and generative. Out of these five strategies, the most significant ones are strategies four and five. Calculus includes the strategy of backpropagation, which has been independently discovered by multiple researchers. Generative includes the "wake-sleep" algorithm.

The rest of the paper goes over many common ways of training a multilayer neural network, including "learning feature detectors with no supervision, learning one layer of feature detectors (with restricted Boltzmann machines), a greedy algorithm for learning multiple hidden layers, using backpropagation for discriminative fine-tuning, and using contrastive wake-sleep for generative fine-tuning."

14
Learning Multiple Layers of Representation

This is a paper by Geoffrey Hinton that deals with the problem of training multilayer neural networks [9]. Training multilayer neural networks has been done mostly using the backpropagation algorithm. Backpropagation was also the first computationally feasible algorithm that can be used to train multiple layers. However, backpropagation also has several limitations, including requiring labeled data and becoming slow when used on neural networks with excessive numbers of layers.

This paper proposes using generative models to solve that problem. Neural networks can be run "bottom-up" in order to make recognition models or "top-down" to make generative connections. Running "top-down" through a neural network with stochastic neurons will result in creating an input vector. The paper suggests training models by tweaking the weights on the top-level connections and trying to achieve the maximum similarity with the original training data, with the reasoning being that the higher-level features will be able to affect the outcome more than the lower-level features. According to the paper, the key to making this work is the restricted Boltzmann machine (RBM).

The paper tests its trained models on two different data sets, the MNIST handwritten digits data set and also a data set of sequential images. In the end, Hinton attributes the success of this model to three different factors: using a generative model instead of attempting to classify, using restricted Boltzmann machines to learn one layer at a time, and having a separate fine-tuning stage.

Learning Methods for Generic Object Recognition with Invariance to Pose and

Lighting

This paper is by Yann LeCun from the Courant Institute at New York University and deals with object recognition [10]. Recognition of objects using just shape information, without accounting for pose, lighting, and backgrounds, is a problem that has not been dealt with frequently in the field of computer vision.

The paper uses the NORB dataset, which is a large data set with 97,200 image pairs of 50 objects that belong to five different categories, specifically human figures, animals, airplanes, trucks and cars. This NORB dataset was used to generate other data sets where the images vary in location, scale, brightness, etc.

A wide range of different machine learning methods were used to test the images from the data sets, including a linear classifier, KNN (nearest neighbor with Euclidean distance), pairwise SVM (support vector machines with a Gaussian kernel), and convolutional neural networks. For the most part, convolutional neural networks ended up with good performance compared to the other methods, except surprisingly on the jittered-cluttered data set. Many of the other methods also ran into significant trouble because of limitations in CPU power and time and could not be trained reasonably on many of the data sets tested.

Scaling Algorithms Towards AI

This is a collaboration between Yoshua Bengio of the University of Montreal and Yann LeCun of New York University and deals with the long-term problem of training algorithms to work with artificial intelligence [11]. The paper discusses many of the common limitations found when working with artificial intelligence, mainly shallow architectures and local estimators.

The paper does not attempt to find a general learning method, saying that such a task is doomed to failure. Instead, the authors attempt to look for learning methods for specific tasks, reasoning that finding out these methods will bring them closer to creating an artificially intelligent agent.

The paper then goes into more detail on the advantages and disadvantages of different algorithm setups, for example deep versus shallow. It also compares various algorithms like support vector machines and convolutional neural networks against each other, using data sets like the MNIST handwritten digits data set and the NORB dataset.

Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction

from Intracranial EEG

This is a paper from Yann LeCun of New York University that focuses on using machine learning methods like support vector machines and convolutional neural networks in order to try to predict when epileptic seizures will occur [12]. Epilepsy is a neural disease that affects around one to two percent of the world population and causes its victims to have occasional seizures. There has been a lot of other research done on trying to predict when these seizures will occur, but almost none using modern machine learning methods.

The paper uses data gathered from an electroencephalography machine, which records the local voltage of millions of brain cells at a time. Current methods of seizure prediction suffer from a tradeoff between being able to predict the seizures and avoiding false alarms when predicting the seizures. The most common approach currently used is binary classification, which is susceptible to these problems. The machine learning algorithms used by this paper can mitigate these problems because of their ability to classify nonlinear features in a high-dimensional feature space.

The paper then uses MATLAB to implement the support vector machines and convolutional neural networks used. The results were highly successful, especially for the convolutional neural networks, which were able to achieve no false alarms on all of the patients except for one.

Google

Google has been focusing much more on its machine learning and data science departments. Almost every product at Google uses some sort of machine learning or data science. For example, Google AdSense uses data science to better target ads towards customers and Picasa uses machine learning to recognize faces in images.

One of the more interesting Google products using machine learning and data science is DeepMind. DeepMind Technologies was a tech startup based in London that was acquired by Google near the beginning of 2014 [13]. Their goal is to combine the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.

DeepMind has, in fact, trained a neural network to play video games, including classics like Pong and Space Invaders [42].

Facebook

Like Google, Facebook has also been focusing a lot on machine learning and data science. This is because, as a company relying heavily on advertising for revenue, and as a company with huge amounts of personal user data, machine learning and data science will allow it to target its ads better and get more revenue.

Facebook's machine learning operation is currently based mainly in New York City. In 2014, Facebook hired New York University professor and famed neural network researcher Yann LeCun to help head and lead this operation.

Proposed Approach

Machine Learning Libraries

We considered many different machine learning libraries, including:

Torch7
Theano/PyLearn
Caffe

Torch7 is a machine learning library that is being developed at New York University, the Idiap Research Institute, and NEC Laboratories America [14]. According to its description:

Torch7 is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and provides a very efficient implementation, thanks to an easy and fast scripting language, LuaJIT, and an underlying C implementation.

Theano is a machine learning library being developed mainly at the University of Montreal [15]. It is a Python library and is more of a general-purpose computer algebra system library with an emphasis on matrix operations.

Caffe is a machine learning library being developed at UC Berkeley [16]. According to its description:

Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Caffe is released under the BSD 2-Clause license.

It is important to note that all three libraries have a focus on the deep learning aspect of machine learning, but they are also configurable in many different ways and can support many other algorithms.

After some preliminary evaluation, we decided on using the Torch7 machine learning library. This library was chosen out of the different machine learning frameworks that support artificial neural networks for several reasons. Speed-wise, it is a lot faster than the alternatives. It also supports interfacing with C and CUDA code easily. Finally, out of the frameworks considered, at this point in time, it is the most commonly used in industry. Both Facebook and Google have teams that are using Torch7 for machine learning research.

Data Sets

We also considered many different image data sets, including:

Caltech 101
PASCAL VOC
MNIST Digits
Flower classification dataset
Stanford Dogs
Animals with Attributes
Cifar10

When considering the different image data sets, we took into consideration the size of the data set, the content of the images, and the format in which the data is presented. We wanted a data set that already had the images well formatted and would be easy to work with using our machine learning libraries.

The Caltech 101 dataset [17] is, according to its description:

Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. The size of each image is roughly 300 x 200 pixels.

This data set of images also contains outline annotations of the different objects shown, which could possibly come in useful later.

The PASCAL VOC dataset [18] is from the 2009 challenge run by the PASCAL2 Network of Excellence on Pattern Analysis, Statistical Modelling, and Computational Learning and funded by the EU.

According to their website, PASCAL VOC contains twenty categories of everyday objects:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

The MNIST digits dataset [25] is a large collection of images of handwritten digits. It has a training set of 60,000 examples and a testing set of 10,000 examples. The images have already been centered and normalized, and the whole data set is a smaller subset of a larger data set available from NIST (the National Institute of Standards and Technology).

The Flower Classification dataset is from the Visual Geometry Group at the University of Oxford. There are actually two versions of the data set, one with 17 categories and one with 102 categories. The flowers are common flowers seen in the United Kingdom [19].

The Stanford Dogs dataset [20] is a data set from Stanford University. According to their website:

The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization.

It contains 20,580 images of dogs sorted into 120 different categories with class labels and bounding boxes.

Animals with Attributes [21] is a data set from the Max Planck Institute for Biological Cybernetics, which is located in Tübingen, Baden-Württemberg, Germany. According to their description:

This dataset provides a platform to benchmark transfer-learning algorithms, in particular attribute-based classification. It consists of 30,475 images of 50 animal classes with six pre-extracted feature representations for each image. The animal classes are aligned with Osherson's classical class/attribute matrix, thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes.

Cifar10 [36] is a data set put together by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton from the University of Toronto. It is a smaller subset of the larger 80 million tiny images data set, but with the advantage of having everything labeled. There are 50,000 training images and 10 different categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

After looking over all the different data sets, we decided that we will mainly be using the MNIST digits dataset. A lot of prior research has been done using this data set, so we can easily compare our results to tests that others have run before. We will also be running additional tests using the Cifar10 data set due to the excellent support it has with our other tools.

Hardware

All of this will be run on a computer running Ubuntu Linux 14.04 Trusty Tahr with the following specifications:

AMD Phenom II X3 720
4 GB RAM
Nvidia GeForce 750 Ti
500 GB 7200 rpm HDD

The Nvidia GeForce 750 Ti graphics card can be especially useful because Torch7 is also coded to include support for the Nvidia CUDA library. Nvidia CUDA allows programs to use the massively parallel computing power of a graphics card. The rest of the hardware was chosen simply because it was readily available.

Implementation

Overview

The implementation of our neural network requires many different steps.

The first step required is data set preparation. Even for those that are commonly used, data sets come in many different formats. It is often necessary to write a few short scripts that will take in the examples from the data set and then format them properly for the machine learning tools that will be used.

It is also often necessary to do what is known as feature engineering. Examples from data sets can have too many features. Running a training algorithm on a data set with too many features can cause the algorithm to become confused and produce subpar results. Therefore, it is necessary to pick out which features to keep and which to remove (or to give less weight to). This can be done manually by hand or using an algorithm like PCA (principal components analysis), as sketched below.
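
As an illustration of the PCA option, here is a minimal sketch in Torch using the singular value decomposition (the data is randomly generated for illustration; this is not the code used in our tests):

require 'torch'

-- Hypothetical data: 100 examples with 20 features each.
X = torch.randn(100, 20)

-- Center the data, then take the singular value decomposition.
mean = X:mean(1)
Xc = X - mean:expandAs(X)
U, S, V = torch.svd(Xc)

-- Keep only the top 5 principal components.
k = 5
Xreduced = Xc * V:narrow(2, 1, k)
print(Xreduced:size())   -- 100 x 5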

Finally, even after successfully running the algorithm on a data set, it may be helpful to tweak some parameters and rerun the algorithm.

MNIST Digits and Cifar10 Details

As mentioned above, MNIST Digits is a data set of handwritten digits. There are 60,000 different handwritten digit files available in this particular data set, designed to be used for training the selected algorithm. There are also 10,000 digit files from the data set designed to be available for testing out the algorithm after it has been trained [25].

It is available online from the NYU Courant Institute and is a subset of a larger NIST (National Institute of Standards and Technology) data set of handwritten digits. The original NIST data set consists of many special databases of handwritten digits collected from different sources. The MNIST data set uses digits from NIST Special Database 1 and NIST Special Database 3. The digits from Special Database 1 were collected from high school students, while the digits from Special Database 3 came from Census Bureau employees. MNIST uses an even mixture of 30,000 digits from Special Database 1 and 30,000 digits from Special Database 3 for the training set, and an even mixture of 5,000 digits from each for the testing set.

The digit files are images of the Arabic numerals 0 to 9. Each image is 28 by 28 pixels and is normalized so that the numerals fit while also keeping the same aspect ratio. The images have also been centered to fit into the 28 by 28 pixel area. The following is a sampling of images from the MNIST data set:


Cifar10 and Cifar100 are data sets from the University of Toronto. These data sets are a subset of the 80 million tiny images data set, with the advantage of having everything labeled.

The Cifar10 data set contains 60,000 images that are sorted into 10 different classes, while the Cifar100 contains 60,000 images that are sorted into 100 different classes. We will be using the Cifar10 data set for our experiments.

The images in the data set are 32 x 32 color images. They are categorized into 10 different classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The classes contain no overlap with each other; for example, if something is labeled as a car it will not be labeled as a truck. Following is a sampling of images from the data set:

Torch7 Details

As mentioned above, Torch7 is an open source machine learning library being developed primarily at New York University. It uses the Lua scripting language as its default language of choice, although it also allows snippets of C code to be inserted, as well as interfacing with NVIDIA CUDA, when speed is especially important.

The Lua programming language is an unusual choice by the developers of Torch7. According to their website [26], Lua describes itself as:

Lua is a powerful, fast, lightweight, embeddable scripting language. Lua combines simple procedural syntax with powerful data description constructs based on associative arrays and extensible semantics. Lua is dynamically typed, runs by interpreting bytecode for a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.

It was developed at the Pontifical Catholic University of Rio de Janeiro in Brazil by three computer scientists, Luiz Henrique de Figueiredo, Roberto Ierusalimschy, and Waldemar Celes [26], who were part of Tecgraf (the Computer Graphics Technology Group) at the time. Lua was developed during a period of time when Brazil had enacted many trade barriers, especially in regards to technology. As a result, Lua was created almost from scratch and has many strange quirks. For example, it is customary for Lua array indices to start at 1 instead of the standard 0 used in most other programming languages.

Although it has been used by many large companies, including Adobe, Bombardier, Disney, Electronic Arts, Intel, LucasArts, Microsoft, NASA, Olivetti and Philips [26], its usage in the general programming community remains quite low.

The TIOBE Index is a ranking of programming language popularity that is maintained by TIOBE Software [27]. While it is not an exact measurement by any means, it is a good way to get a rough estimate of a particular programming language's popularity with the community. According to the TIOBE Index for January 2015, Lua is ranked 31st in popularity, being used in about 0.649% of all programming applications, even behind old languages such as Ada and Pascal.

Torch7 uses the LuaJIT compiler for most general purposes [28]. LuaJIT is an open source Lua compiler that aims to provide a JIT (just-in-time) compiler for the Lua language. Many other languages, like Java, also use just-in-time compilation. The advantage of just-in-time compilation is that it allows code to be executed more quickly than code that is interpreted.

Torch7 also allows for the use of LuaRocks, which is an open source package management system for Lua [29]. Programs can be bundled together in the form of a package called a LuaRock. Many of the core Torch7 packages are hosted at LuaRocks and can be installed easily from the command line. The following command is an example of how LuaRocks can be used to install a package called somepackage:

$ luarocks install somepackage

There is also a custom command line interpreter included with the default Torch7 install. This can be accessed through the th command from the terminal, once Torch7 is installed and all the PATH settings are configured correctly. This custom command line interpreter is called TREPL, which stands for torch read-eval-print loop. TREPL has several advantages over the default Lua one because it has many extra features designed to make working with Torch7 Lua code easier, such as tab completion and history. This is an example of the th command, taken from the Torch7 website [14]:

$ th

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th> torch.Tensor{1,2,3}
 1
 2
 3
[torch.DoubleTensor of dimension 3]

th>

Out of the numerous LuaRocks available through the package management system, an especially important one for this project is dp. This is a library designed to facilitate the process of using Torch7 for deep learning. dp was developed by Nicholas Leonard while he was a graduate student working in the LISA lab under the supervision of Yoshua Bengio and Aaron Courville [30].

It describes itself on its homepage as a high-level framework that abstracts away common usage patterns of the nn and torch7 packages, such as loading data sets and early stopping, with hyperparameter optimization facilities for sampling and running experiments from the command line or from prior hyperparameter distributions, and facilities for storing and analyzing hyperparameters and results using a PostgreSQL database backend, which facilitates distributing experiments over different machines.

Data Preparation & Feature Engineering

Both the MNIST digits data set and the Cifar10 data set do not come in a standard image format. They come in their own special formats designed for storing vectors and multidimensional matrices. Usually when working with these types of data sets, one is required to write a small program to parse the special format. However, the dp LuaRocks module (which is designed to eliminate common repetitive tasks) makes this unnecessary because it already includes a small amount of code to facilitate the loading of the data from many common data sets (including MNIST and Cifar10).

The dp LuaRocks module also contains a few methods to help preprocess and standardize the data. This is accomplished using the dp.Standardize method, which subtracts the mean and then divides by the standard deviation. While the MNIST data set is already formatted nicely for the most part, we apply it anyway using the common code pattern shown below:

mydata = dp.Mnist{ input_preprocess = dp.Standardize()}

This gets rid of any anomalies and is good practice in general when doing machine learning. The Cifar10 data set does not work well with dp.Standardize, so we leave it out when running tests on Cifar10.
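
For intuition, this is what the standardization step amounts to when done by hand on a plain tensor (a sketch of the same idea, not dp's actual implementation):

require 'torch'

-- Hypothetical stand-in for a batch of flattened images.
t = torch.randn(1000, 784)

-- Subtract the mean, then divide by the standard deviation.
t:add(-t:mean()):div(t:std())

print(t:mean(), t:std())   -- now approximately 0 and 1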

Running the Test

The first step in running the test is creating a set of parameters that will be used for the test. Storing all these parameters in a table in the form of a Lua variable is a good idea because it lets us keep track of things more easily and also allows us to change parameters as we see fit. The code for that would look something like this:

myparams = {
   hiddenunits = 100,
   learningrate = 0.1,
   momentum = 0.9,
   maximumnorm = 1,
   batchsize = 128,
   maxtries = 100,
   maxiterations = 1000
}

These are the basic parameters that we need to supply to Torch7 and dp in order to run our test. Here is a quick explanation of what each of these parameters means:

hiddenunits: This represents the number of nodes in the hidden layer (shown as the blue colored nodes in a previous diagram).

learningrate: This variable determines the learning rate for the neural network. A smaller learning rate makes the system learn in finer increments, but can also drastically increase the time required to train the system.

momentum: The momentum adds part of the previous weight update to the current one. This is done to try to prevent the system from settling on a local minimum when training. A high momentum can make the system train faster, but can also end up overshooting the minimum (see the sketch just after this list).

maximumnorm: This is used to determine how much to update the neuron weights.

batchsize: This is the batch size for the training examples.

maxtries: This determines when to stop the training process early, if after a certain number of tries the error has not decreased.

maxiterations: This determines the maximum number of times to iterate overall.
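
As a rough sketch of how learningrate and momentum interact in a single weight update (the standard momentum form on one made-up scalar weight; dp's internal visitors implement the equivalent logic across whole weight tensors):

-- One SGD-with-momentum update on a single hypothetical weight.
weight, velocity, gradient = 0.5, 0.0, 0.3
learningrate, momentum = 0.1, 0.9

velocity = momentum * velocity - learningrate * gradient   -- keep part of the previous update
weight = weight + velocity                                 -- move the weight against the gradient
print(weight)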

We then need to build a model to represent our neural network. We use the dp.Neural class, which represents a layer in the neural network. We use the parameters that we set above in myparams. A single layer looks something like this:

dp.Neural{
   input_size = mydata:featureSize(),
   output_size = myparams.hiddenunits,
   transfer = nn.Tanh()
}

We create several of these layers and combine them together using the dp.Sequential module to form our neural network (sketched below). The transfer variable is used to set the transfer function. Transfer functions are used to allow for more complexity than what a typical logistic regression function would provide.
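
Putting this together, a two-layer network along the lines described above might look like the following (a sketch based on the dp documentation; the output size of 10 matches the ten MNIST digit classes, and the choice of nn.LogSoftMax as the output transfer is our assumption, not something mandated by the library):

mymodel = dp.Sequential{
   models = {
      dp.Neural{
         input_size = mydata:featureSize(),
         output_size = myparams.hiddenunits,
         transfer = nn.Tanh()
      },
      dp.Neural{
         input_size = myparams.hiddenunits,
         output_size = 10,   -- one output node per digit class
         transfer = nn.LogSoftMax()
      }
   }
}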

Next we set up three propagators: train, valid, and test. These determine how the neural network will train the system and determine what is good and what is bad.

The train propagator is an instance of the dp.Optimizer class and requires a few parameters to be provided:

loss: This represents the typical machine learning loss function, which is a function that the system wants to minimize.

visitor: Here we use some of the parameters we set above in myparams, namely learningrate, momentum and maximumnorm.

feedback: This determines how feedback is provided after each iteration of training. We use dp.Confusion, which is just a confusion matrix.

sampler: This determines in which order to iterate through the data set. For the train propagator we use dp.ShuffleSampler, which randomly shuffles the data set before each iteration.

The valid and test propagators are instances of the dp.Evaluator class and also require a few parameters to be provided:

loss: (same as the train propagator)

feedback: (same as the train propagator)

sampler: For the valid and test propagators, we use a different sampler than the train propagator. Instead of dp.ShuffleSampler, we use dp.Sampler, which iterates through the data set in order.
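
Concretely, the three propagators can be assembled along these lines (a sketch following the dp documentation; the specific constructors shown here, dp.NLL, dp.Momentum, dp.Learn, and dp.MaxNorm, are our reading of that documentation rather than a definitive implementation):

train = dp.Optimizer{
   loss = dp.NLL(),   -- negative log-likelihood loss to minimize
   visitor = {        -- applies our myparams settings to each weight update
      dp.Momentum{momentum_factor = myparams.momentum},
      dp.Learn{learning_rate = myparams.learningrate},
      dp.MaxNorm{max_out_norm = myparams.maximumnorm}
   },
   feedback = dp.Confusion(),   -- report a confusion matrix as feedback
   sampler = dp.ShuffleSampler{batch_size = myparams.batchsize}
}

valid = dp.Evaluator{
   loss = dp.NLL(),
   feedback = dp.Confusion(),
   sampler = dp.Sampler{}   -- iterates through the data set in order
}

test = dp.Evaluator{
   loss = dp.NLL(),
   feedback = dp.Confusion(),
   sampler = dp.Sampler{}
}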

Finally, we set up the experiment and prepare for training. We use the dp.Experiment class, which takes in the following parameters:

model: This is set using the model that we set up before, using dp.Neural to form layers and dp.Sequential to combine them into a neural network.

optimizer: This is set using the train propagator that we created above.

validator: This is set using the valid propagator that we created above.

tester: This is set using the test propagator that we created above.

observer: Observer is a feature of dp that listens as the model is being trained and then calls specific functions when certain events occur. Our observer uses dp.EarlyStopper, which ends the training process early if no additional results are being obtained, and dp.FileLogger, which writes the results to a file.

max_epoch: This is the maximum number of iterations that the experiment will go through when training. We set this to myparams.maxiterations.
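
Assembled, the experiment looks something like this (again a sketch; using dp.EarlyStopper's max_epochs argument for the maxtries cutoff is our reading of the documentation):

myexperiment = dp.Experiment{
   model = mymodel,
   optimizer = train,
   validator = valid,
   tester = test,
   observer = {
      dp.FileLogger(),   -- writes the results to a file
      dp.EarlyStopper{max_epochs = myparams.maxtries}   -- stop early when no progress is made
   },
   max_epoch = myparams.maxiterations
}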

The final step is running the experiment on the data sets, which can be accomplished by the following line of code:

myexperiment:run(mydata)

Now that we have run the experiment, let us look at the results.

Results

MNIST Results

One of the main advantages of the MNIST data set is that it is widely used, so there are a lot of previous tests run on the MNIST data set that we can compare our results to. Following are a few papers with results run on the MNIST Digits data set. I have decided to use a sampling of results run using many different machine learning methods, which will allow us to make a better comparison and see a wider picture of how our results measure up.

Type of Machine Learning Method   Lowest Error Rate   Highest Error Rate   Median Error Rate   Citations
Our NN Results                    2.7%                3.8%                 3.04%
Linear classifier                 7.6%                12.0%                8.4%                [31][31][31]
K-Nearest Neighbors               0.52%               5.0%                 1.33%               [32][31][35]
Non-linear classifiers            3.3%                3.6%                 3.3%                [31][31][31]
Support vector machines           0.56%               1.4%                 0.8%                [33][25][31]
Neural Nets                       0.35%               4.7%                 2.45%               [34][31][31]

CIFAR10 Results

The Cifar10 data set has fewer previous tests run on it when compared to the MNIST data set, but there are still enough to make a comparison. The table below has a sampling of prior results as well as our own results. When reporting results from Cifar10, it is the accuracy rate that is commonly reported (unlike MNIST, where the error rate is reported). Also note that the images to be classified in Cifar10 are much more complex than the ones in MNIST.

Type of Machine Learning Method   Accuracy Rate   Citation
Our NN Results                    50.46%
Deeply-Supervised Nets            91.78%          [37]
Network in Network                91.2%           [38]
Sum-Product Networks              83.96%          [39]
Convolutional Kernel Networks     82.18%          [40]
PCANet                            78.67%          [41]

Other Thoughts

We trained the neural network and ran our test a few times using many different parameters. Our main tests were all set to run for a maximum of 1000 iterations with a learning rate of 0.1 and to stop early if, after 100 iterations, no further progress was made. The models all had one hidden layer, with the number of hidden nodes (also sometimes referred to as hidden units) being varied each time the test was run. Following are a few tables and graphs with more detailed information on all the numerous experiments run:
MNIST Table

Dataset   Hidden Units   Iterations Run Before Ideal Solution   Error Rate
MNIST     100            92                                     3.80%
MNIST     120            104                                    3.48%
MNIST     140            421                                    3.21%
MNIST     160            175                                    3.33%
MNIST     180            165                                    3.01%
MNIST     200            312                                    3.00%
MNIST     220            422                                    2.90%
MNIST     240            584                                    2.93%
MNIST     260            199                                    3.15%
MNIST     280            293                                    3.04%
MNIST     300            589                                    2.70%

MNIST Graphs


Cifar10 Table

Dataset   Hidden Units   Iterations Run Before Ideal Solution   Accuracy Rate
Cifar10   100            98                                     46.85%
Cifar10   120            88                                     47.62%
Cifar10   140            55                                     49.32%
Cifar10   160            53                                     47.63%
Cifar10   180            54                                     48.06%
Cifar10   200            52                                     48.84%
Cifar10   220            68                                     49.65%
Cifar10   240            50                                     50.46%
Cifar10   260            46                                     48.91%
Cifar10   280            74                                     49.93%
Cifar10   300            55                                     49.90%

Cifar10 Graphs

One disadvantage of neural networks is the long training times. We can use our experiences training on MNIST to demonstrate (our experiences with Cifar10 are similar). When we set the number of hidden nodes to 100, each iteration took about six seconds to run and our program went through 193 iterations before deciding to stop early due to lack of progress. The ideal solution was found on the 92nd iteration. We ended up with an error rate of 3.8%. When we set the number of hidden nodes to 300, each iteration took about ten seconds to run and our program went through 589 iterations before deciding to stop early due to lack of progress. The ideal solution was found on the 488th iteration. We ended up with an error rate of 2.7%. We should also note that increasing the number of hidden nodes from 100 to 300 dramatically increased the time it took to train the neural network, from around 30 minutes to over 3 hours.

Interestingly enough, the number of iterations required before reaching the ideal solution seems to have no correlation with the number of hidden units. This is because the algorithm chooses a random spot to start walking towards the ideal solution and may sometimes land in a more favorable spot initially. However, in general, more hidden units result in more favorable results, especially for MNIST.

The Cifar10 data set produced much worse results than MNIST. This can be explained by the much more complex images contained in the data set. In order to improve results, it may be necessary to increase the number of hidden units, increase the number of layers in the model, or use a more aggressive learning rate. Unfortunately this is unfeasible with the limited hardware that we have access to.

Overall, these results are somewhat typical of neural networks, which seem to have a large variation in error rate percentage. On the MNIST data set, other results range from 4.7% in a test run by Yann LeCun [31] to 0.35% in a test run by Dan Ciresan [34]. The current best result for Cifar10 [37] also uses a variation on neural networks, showing that neural networks are indeed a good candidate for classification of more complex imagery as well.

Nevertheless, these are pretty favorable results (especially when considering the limited hardware that the test was run on) and aptly demonstrate the potential that neural networks have when solving these sorts of problems.

Conclusion

In conclusion, neural networks are shown to be a viable choice when doing image recognition, especially of handwritten digit images. This is because neural networks are especially useful for solving problems with nonlinear solutions, which applies in the case of handwritten digit images, since the hidden units are able to effectively model such problems. We must also note that this comes with the caveat of having the necessary computational hardware and time. Without such resources, results can be subpar, as shown by the Cifar10 tests.

While there are many advantages to using neural networks, there are also a few drawbacks. One drawback to neural networks is in training. Since the system has to go through many iterations during the training phase, training may take a long time, especially when run on computers using older hardware.

Another drawback to neural networks (although this applies to machine learning in general) is picking the right parameters, such as the number of hidden nodes per layer, the learning rate, and the number of layers. The right parameters can cause a huge difference in results, with a massive decrease in error rate percentage. However, they are difficult to balance, and the wrong choices can cause extremely long training times or inaccurate results.

Future work may include running tests on models that have more hidden units and layers, as well as using a more aggressive learning rate. To accommodate the large hardware demands that are required to do such work, we may look into running our computations in the cloud. Amazon has the EC2 cloud, which may be able to offload a lot of the work. Another possible way to drastically improve hardware performance may be to use the graphics card to help with computations. NVIDIA has the CUDA library, which is excellent at running computations in parallel and which Torch7 actually has some built-in support for.

With the rise of tools such as Torch7 (and dp), neural networks are now more useful than ever before and will probably be applied to many other problems in the future. This can range from item recommendation at a shopping service like Amazon to things like self-driving cars. We live in an era ruled by data, and I am excited to see what will come next.

References

[1] Warren S. McCulloch and Walter Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Published 1943. Accessed Oct 2014.

[2] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It. Published 1990. Accessed Oct 2014.

[3] Adam Coates, Brody Huval, Tao Wang, David J. Wu, Bryan Catanzaro and Andrew Y. Ng. Deep Learning with COTS HPC Systems. Published Jul 2013. Accessed Oct 2014.

[4] Danqi Chen, Richard Socher, Christopher D. Manning and Andrew Y. Ng. Learning New Facts From Knowledge Bases with Neural Tensor Networks and Semantic Word Vectors. Published Mar 2013. Accessed Oct 2014.

[5] Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning and Andrew Y. Ng. Convolutional-Recursive Deep Learning for 3D Object Classification. Published 2012. Accessed Oct 2014.

[6] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Published 2006. Accessed Oct 2014.

[7] Geoffrey E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Published 2006. Accessed Oct 2014.

[8] Geoffrey E. Hinton. To Recognize Shapes, First Learn to Generate Images. Published Oct 2006. Accessed Oct 2014.

[9] Geoffrey E. Hinton. Learning Multiple Layers of Representation. Published Oct 2007. Accessed Oct 2014.

[10] Yann LeCun, Fu Jie Huang, and Leon Bottou. Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. Published CVPR, 2004. Accessed Oct 2014.

[11] Yoshua Bengio and Yann LeCun. Scaling Learning Algorithms Towards AI. Published MIT Press, 2007. Accessed Oct 2014.

[12] Piotr W. Mirowski, Yann LeCun, Deepak Madhavan, and Ruben Kuzniecky. Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction from Intracranial EEG. Published 2008. Accessed Oct 2014.

[13] Antonio Regalado. Is Google Cornering the Market on Deep Learning? Published MIT Technology Review, Jan 2014. Accessed Oct 2014.

[14] Torch7. Accessed Oct 2014. <http://torch.ch/>

[15] Theano. Accessed Oct 2014. <http://deeplearning.net/software/theano/>

[16] Caffe. Accessed Oct 2014. <http://caffe.berkeleyvision.org/>

[17] Caltech 101. Accessed Oct 2014. <http://www.vision.caltech.edu/Image_Datasets/Caltech101/>

[18] PASCAL VOC. Accessed Oct 2014. <http://pascallin.ecs.soton.ac.uk/challenges/VOC/>

[19] Oxford Flowers. Accessed Oct 2014. <http://www.robots.ox.ac.uk/~vgg/data/flowers/>

[20] Stanford Dogs. Accessed Oct 2014. <http://vision.stanford.edu/aditya86/ImageNetDogs/>

[21] Animals with Attributes. Accessed Oct 2014. <http://attributes.kyb.tuebingen.mpg.de/>

[22] Thomas Serre, Lior Wolf, and Tomaso Poggio. Object Recognition with Features Inspired by Visual Cortex. Published 2005. Accessed Nov 2014.

[23] Kristen Grauman and Trevor Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. Published ICCV, 2005. Accessed Nov 2014.

[24] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. Published CVPR, 2009. Accessed Nov 2014.

[25] MNIST Handwritten Digit Database. Accessed Dec 2014. <http://yann.lecun.com/exdb/mnist/>

[26] Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes. The Evolution of Lua. Accessed Jan 2015. <http://www.lua.org/doc/hopl.pdf>

[27] TIOBE Software: TIOBE Index. Accessed Jan 2015. <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>

[28] LuaJIT. Accessed Jan 2015. <http://luajit.org/luajit.html>

[29] LuaRocks. Accessed Jan 2015. <http://luarocks.org/>

[30] dp. Accessed Jan 2015. <http://dp.readthedocs.org/en/latest/index.html>

[31] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-Based Learning Applied to Document Recognition. Published Nov 1998. Accessed Feb 2015.

[32] Daniel Keysers, Thomas Deselaers, Christian Gollan, and Hermann Ney. Deformation Models for Image Recognition. Published 2007. Accessed Feb 2015.

[33] Dennis Decoste and Bernhard Scholkopf. Training Invariant Support Vector Machines. Published 2002. Accessed Feb 2015.

[34] Dan Ciresan, Ueli Meier, Luca Gambardella, and Juergen Schmidhuber. Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. Published Mar 2010. Accessed Feb 2015.

[35] MNIST Nearest Neighbor Results. Accessed Feb 2015. <http://finmath.uchicago.edu/~wilder/Mnist/>

[36] CIFAR-10 and CIFAR-100 Datasets. Accessed April 2015. <http://www.cs.toronto.edu/~kriz/cifar.html>

[37] Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-Supervised Nets. Published 2014. Accessed April 2015.

[38] Min Lin, Qiang Chen, and Shuicheng Yan. Network In Network. Published 2013. Accessed April 2015.

[39] Robert Gens and Pedro Domingos. Discriminative Learning of Sum-Product Networks. Published 2012. Accessed April 2015.

[40] Julien Mairal, Piotr Koniusz, Zaid Harchaoui, and Cordelia Schmid. Convolutional Kernel Networks. Published 2014. Accessed April 2015.

[41] Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, and Yi Ma. PCANet: A Simple Deep Learning Baseline for Image Classification. Published 2014. Accessed April 2015.

[42] The Last AI Breakthrough DeepMind Made Before Google Bought It For $400m. Accessed April 2015. <https://medium.com/the-physics-arxiv-blog/the-last-ai-breakthrough-deepmind-made-before-google-bought-it-for-400m-7952031ee5e1>
