
San Jose State University

SJSU ScholarWorks
Master's Projects | Master's Theses and Graduate Research

Spring 5-18-2015

Using Neural Networks for Image Classification


Tim Kang
SJSU

Follow this and additional works at: http://scholarworks.sjsu.edu/etd_projects


Part of the Artificial Intelligence and Robotics Commons

Recommended Citation
Kang, Tim, "Using Neural Networks for Image Classification" (2015). Master's Projects. 395.
http://scholarworks.sjsu.edu/etd_projects/395

This Master's Project is brought to you for free and open access by the Master's Theses and Graduate Research at SJSU ScholarWorks. It has been
accepted for inclusion in Master's Projects by an authorized administrator of SJSU ScholarWorks. For more information, please contact
scholarworks@sjsu.edu.
Using Neural Networks for Image Classification

San Jose State University, CS 298, Spring 2015

Author: Tim Kang (timothykang.x@gmail.com), San Jose State University
Advisor: Robert Chun (robert.chun@sjsu.edu), San Jose State University
Committee: Thomas Austin (thomas.austin@sjsu.edu), San Jose State University
Committee: Thomas Howell (thomas.howell@sjsu.edu), San Jose State University

Abstract

This paper will focus on applying neural network machine learning methods to images for the purpose of automatic detection and classification. The main advantage of using neural network methods in this project is their adeptness at fitting nonlinear data and their ability to work as an unsupervised algorithm. The algorithms will be run on common, publicly available data sets, namely MNIST and CIFAR-10, so that our results will be easily reproducible.

Table of Contents

Introduction
Background
  Short History of Artificial Neural Networks
  Quick Explanation of Artificial Neural Networks
  A Little Regarding Image Classification
Related Work
  Deep Learning with COTS HPC Systems
  Learning New Facts from Knowledge Bases with Neural Tensor Networks and Semantic Word Vectors
  Convolutional Recursive Deep Learning for 3D Object Classification
  A Fast Algorithm for Deep Belief Nets
  Reducing the Dimensionality of Data with Neural Networks
  To Recognize Shapes, First Learn to Generate Images
  Learning Multiple Layers of Representation
  Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting
  Scaling Algorithms Towards AI
  Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction from Intracranial EEG
  Google
  Facebook
Proposed Approach
  Machine Learning Libraries
  Data Sets
  Hardware
Implementation
  Overview
  MNIST Digits and Cifar10 Details
  Torch7 Details
  Data Preparation & Feature Engineering
  Running the Test
Results
  MNIST Results
  CIFAR10 Results
  Other Thoughts
Conclusion
References

Introduction

Computer vision is a problem that has existed for a long time. In this paper, we will be focusing on the task of classifying computer images into different preset categories. This specific part of computer vision has many diverse real-world applications, ranging from video games to self-driving cars. However, it has also traditionally been very difficult to pull off successfully, due to the enormous number of different factors (camera angles, lighting, color balance, resolution, etc.) that go into creating an image.

We will be focusing on using artificial neural networks for image classification. While artificial neural networks are some of the oldest machine learning algorithms in existence, they have not been widely used in the field of computer vision. More recent improvements in the methods of training artificial neural networks have made them worth looking into once again for the task of image classification.

Background

Short History of Artificial Neural Networks

Artificial neural networks were designed to be modeled after the structure of the brain. They were first devised in 1943 by researchers Warren McCulloch and Walter Pitts [1]. Back then, the model was initially called threshold logic, which branched into two different approaches: one inspired more by biological processes and one focused on artificial intelligence applications.

Although artificial neural networks initially saw a lot of research and development, their popularity soon declined and research slowed because of technical limitations. Training artificial neural networks was too computationally intensive for the computers of the time, which did not have sufficient computational power and would take too long to train the networks. As a result, other machine learning techniques became more popular and artificial neural networks were mostly neglected.

However, one important algorithm related to artificial neural networks was developed during this time: backpropagation, discovered by Paul Werbos [2]. Backpropagation is a way of training artificial neural networks by attempting to minimize the errors. This algorithm allowed scientists to train artificial networks much more quickly.

Artificial neural networks became popular once again in the late 2000s, when companies like Google and Facebook showed the advantages of using machine learning techniques on big data sets collected from everyday users. These algorithms are nowadays mostly used for deep learning, an area of machine learning that tries to model more complicated relationships, for example nonlinear ones.

Quick Explanation of Artificial Neural Networks

Each artificial neural network consists of many hidden layers. Each hidden layer in the artificial neural network consists of multiple nodes. Each node is linked to other nodes using incoming and outgoing connections. Each of the connections can have a different, adjustable weight. Data is passed through these many hidden layers and the output is eventually interpreted as different results.

In this example diagram, there are three input nodes, shown in red. Each input node represents a parameter from the data set being used. Ideally the data from the data set would be preprocessed and normalized before being put into the input nodes.

There is only one hidden layer in this example and it is represented by the nodes in blue. This hidden layer has four nodes in it. Some artificial neural networks have more than one hidden layer.

The output layer in this example diagram is shown in green and has two nodes. The connections between all the nodes (represented by black arrows in the diagram) are weighted differently during the training process.
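
To make the diagram concrete, the following is a minimal sketch of the same three-input, four-hidden, two-output network, written with the nn package of Torch7 (the library we adopt later in this paper). The weights would normally be learned during training; the input values here are made up for illustration:

require 'nn'

-- 3 input nodes -> 4 hidden nodes -> 2 output nodes, as in the diagram
net = nn.Sequential()
net:add(nn.Linear(3, 4))   -- weighted connections from input layer to hidden layer
net:add(nn.Tanh())         -- nonlinear activation applied at each hidden node
net:add(nn.Linear(4, 2))   -- weighted connections from hidden layer to output layer

input = torch.Tensor{0.5, -0.2, 0.1}   -- one preprocessed, normalized example
print(net:forward(input))              -- the resulting values at the two output nodes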

A Little Regarding Image Classification

Image recognition and classification is a problem that has been around for a long time and has many real-world applications. Police can use image recognition and classification to help identify suspects in security footage. Banks can use it to help sort out checks. More recently, Google has been using it in their self-driving car program.

Traditionally, a lot of different machine learning algorithms have been utilized for image classification, including template matching, support vector machines, kNN, and hidden Markov models. Image classification remains one of the most difficult problems in machine learning, even today.

Related Work

In academia, there is related work done by Professor Andrew Ng at Stanford University, Professor Geoffrey Hinton at the University of Toronto, Professor Yann LeCun at New York University, and Professor Michael Jordan at UC Berkeley. Much of their work deals with applying artificial neural networks or other machine learning algorithms. Following is a sampling of a few papers from the large body of work that is available. These papers are all fairly recent and are more geared towards specific applications of machine learning and artificial neural networks. There is also a lot of research involving machine learning and artificial neural networks going on in industry, most notably at Google and Facebook, which we will talk about briefly.

Deep Learning with COTS HPC Systems

This is a collaboration between Andrew Ng's Stanford research group and NVIDIA [3]. One of the main problems facing machine learning today is training the systems. As data sets get larger and larger, more and more computing power is required to train the models. In fact, artificial neural networks are especially hard to train.

The paper presents a more inexpensive alternative for training machine learning models by using COTS HPC (commodity off-the-shelf high performance computing) hardware. COTS hardware refers to computer hardware that can be bought at your typical computer hardware store: things like Intel or AMD CPUs. In this case, the COTS HPC hardware used was NVIDIA GPUs.

Their setup consisted of 16 servers. Each server had two quad-core CPUs, four NVIDIA GTX 680 GPUs and an FDR Infiniband adapter (for low-latency communication). They chose this configuration specifically to balance the number of GPUs with CPUs, citing past examples where too many GPUs overwhelmed the systems through I/O, cooling, CPU compute, and power issues.

Their software was written in C++ and built on top of the previous MVAPICH MPI framework, chosen to make communication between different processes easier. The code also includes GPU code written in NVIDIA's CUDA language.

In the end, they were able to train deep learning systems (including neural networks) with over 11 billion parameters, which is several times what other people were able to do before.

Learning New Facts from Knowledge Bases with Neural Tensor Networks and

Semantic Word Vectors

This is another paper from Andrew Ng's Stanford research group, this time focused on using neural networks to try to extract data and insights from unannotated text [4]. It focuses on lexical databases like WordNet and Yago. These databases store information about English words, specifically definition and usage, and also provide information about the relationships between different English words. They are commonly used in artificial intelligence and text processing research.

The paper uses a specific type of neural network called a neural tensor network. This type of neural network is used because it is easier to adapt to words and their relationships. A big advantage of using this type of model is that it can relate two inputs directly. The models were trained using gradient-based methods.

Using the WordNet data set, the trained model was asked to classify whether an arbitrary triplet of entities and relations is true or not. False examples were created by purposely corrupting existing known triplets by switching entities and relations. This model was able to achieve 75.8% accuracy, which is much better than what other researchers were able to achieve before with similarity-based models (66.7%) and Hadamard-based models (71.9%).

Convolutional Recursive Deep Learning for 3D Object Classification

This is yet another paper from Andrew Ng's Stanford research group. This time the paper is about using neural networks to classify images of objects, specifically focusing on RGB-D images [5]. RGB-D images are images that also have depth information in addition to the typical color information included in images. A good, everyday example of a device that captures RGB-D information is the Kinect sensor created by Microsoft for the Xbox One and the Xbox 360.

The paper uses a type of neural network called the convolutional-recursive neural network (CNN-RNN) for learning and classifying RGB-D images. This actually consists of two different networks. The convolutional neural network is first trained in an unsupervised manner by clustering the images. This is then used to create the filters for the CNNs. The resulting features are then fed into the RNNs, which classify the images.

The data set used in this paper was the RGB-D Object Dataset from the University of Washington, organized by Kevin Lai. The paper was successful in classifying the objects in the Lai data set, outperforming most previous attempts made using the same data set.

A Fast Algorithm for Deep Belief Nets

This paper is from Geoffrey Hinton's group at the University of Toronto and deals with using the concept of "complementary priors" to both speed up and improve neural networks that have a large number of hidden layers [6]. This is done by creating a neural network with the top two layers as undirected associative memory and the rest of the hidden layers as an acyclic graph. There are a few advantages to doing this, most notably that it allows the neural network to find a decent set of parameters much more rapidly.

The paper uses the commonly referenced MNIST handwritten digits database to test out this new way of creating a neural network. Another advantage gained from using MNIST is that a lot of research has already been published using it, so it was easy for the researchers to find other methods to compare against. The results this paper got after running against the MNIST database compared favorably with other results obtained using the more common feedforward neural networks.

While the initial results were good, the paper also outlines several problems that could limit the power of this particular method. For example, it treats non-binary images as probabilities, which won't work for natural images.

Reducing the Dimensionality of Data with Neural Networks

This is a publication by Geoff Hinton, originally appearing in Science in July 2006, that deals with the problem of using neural networks to reduce the dimensionality of data [7]. The problem of dimensionality reduction has traditionally been tackled using methods like principal components analysis (PCA). Principal components analysis basically looks for the greatest variance in the data set and can be done with algorithms like singular value decomposition.

This paper discusses using a type of neural network called an "autoencoder" as an alternative to using principal components analysis. An autoencoder is a multilayer neural network that can take high-dimensional data and encode it into a low-dimensional format, and that also has the ability to try to reconstruct the original high-dimensional data from the low-dimensional codes.

The paper tests the trained autoencoder on several different data sets. One of the data sets used was a custom randomly generated set of curves in two-dimensional space. For this data set, the autoencoder was able to produce much better reconstructions than PCA. The MNIST digits data set, the Olivetti face data set, and a data set of documents were also used. Once again, the autoencoder was able to perform better than PCA.

To Recognize Shapes, First Learn to Generate Images

This paper is from Geoffrey Hinton at the University of Toronto and the Canadian Institute for Advanced Research and deals with the problem of training multilayer neural networks [8]. It is an overview of the many different strategies that are used to train multilayer neural networks today.

First it discusses "five strategies for learning neural networks," which include denial, evolution, procrastination, calculus, and generative. Out of these five strategies, the most significant ones are strategies four and five. Calculus includes the strategy of backpropagation, which has been independently discovered by multiple researchers. Generative includes the "wake-sleep" algorithm.

The rest of the paper goes over many common ways of training a multilayer neural network, including "learning feature detectors with no supervision, learning one layer of feature detectors (with restricted Boltzmann machines), a greedy algorithm for learning multiple hidden layers, using backpropagation for discriminative fine-tuning, and using contrastive wake-sleep for generative fine-tuning."

14
Learning Multiple Layers of Representation

This is a paper by Geoffrey Hinton that deals with the problem of training multilayer neural networks [9]. Training multilayer neural networks has been done mostly using the backpropagation algorithm. Backpropagation was also the first computationally feasible algorithm that can be used to train multiple layers. However, backpropagation also has several limitations, including requiring labeled data and becoming slow when used on neural networks with excessive numbers of layers.

This paper proposes using generative models to solve that problem. Neural networks can be run "bottom-up" in order to make recognition models or "top-down" to make generative connections. Running "top-down" through a neural network with stochastic neurons will result in creating an input vector. The paper suggests training models by tweaking the weights on the top-level connections and trying to achieve the maximum similarity with the original training data, with the reasoning being that the higher-level features will be able to affect the outcome more than the lower-level features. According to the paper, the key to making this work is the restricted Boltzmann machine (RBM).

The paper tests its trained models on two different data sets, the MNIST handwritten digits data set and also a data set of sequential images. In the end, Hinton attributes the success of this model to three different factors: using a generative model instead of attempting to classify, using restricted Boltzmann machines to learn one layer at a time, and having a separate fine-tuning stage.

Learning Methods for Generic Object Recognition with Invariance to Pose and

Lighting

This paper is by Yann LeCun from the Courant Institute at New York University and deals with object recognition [10]. Recognition of objects using just shape information, without accounting for pose, lighting, and backgrounds, is a problem that has not been dealt with frequently in the field of computer vision.

The paper uses the NORB dataset, which is a large data set with 97,200 image pairs of 50 objects that belong to five different categories, specifically human figures, animals, airplanes, trucks and cars. This NORB dataset was used to generate other data sets where the images vary in location, scale, brightness, etc.

A wide range of different machine learning methods were used to test the images from the data sets, including a linear classifier, KNN (nearest neighbor with Euclidean distance), pairwise SVM (support vector machines with a Gaussian kernel), and convolutional neural networks. For the most part, convolutional neural networks ended up with good performance compared to the other methods, except surprisingly on the jittered-cluttered data set. Many of the other methods also ran into significant trouble because of limitations in CPU power and time and could not be trained reasonably on many of the data sets tested.

Scaling Algorithms Towards AI

This is a collaboration between Yoshua Bengio of the University of Montreal and Yann LeCun of New York University and deals with the long-term problem of training algorithms to work with artificial intelligence [11]. The paper discusses many of the common limitations found when working with artificial intelligence, mainly shallow architectures and local estimators.

The paper does not attempt to find a general learning method, saying that such a task is doomed to failure. Instead, the authors attempt to look for learning methods for specific tasks, reasoning that finding out these methods will bring them closer to creating an artificially intelligent agent.

The paper then goes into more detail on the advantages and disadvantages of different algorithm setups, for example deep versus shallow. It also compares various algorithms like support vector machines and convolutional neural networks against each other, using data sets like the MNIST handwritten digits data set and the NORB dataset.

Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction

from Intracranial EEG

This is a paper from Yann LeCun of New York University that focuses on using machine learning methods like support vector machines and convolutional neural networks in order to try to predict when epileptic seizures will occur [12]. Epilepsy is a neural disease that affects around one to two percent of the world population and causes its victims to have occasional seizures. There has been a lot of other research done on trying to predict when these seizures will occur, but almost none using modern machine learning methods.

The paper uses data gathered from an electroencephalography machine, which records the local voltage of millions of brain cells at a time. Current methods of seizure prediction suffer from a tradeoff between being able to predict the seizures and avoiding false alarms when predicting the seizures. The most common approach currently used is binary classification, which is susceptible to these problems. The machine learning algorithms used by this paper can mitigate these problems because of their ability to classify nonlinear features in a high-dimensional feature space.

The paper then uses MATLAB to implement the support vector machines and convolutional neural networks used. The results were highly successful, especially for the convolutional neural networks, which were able to achieve no false alarms on all of the patients except for one.

Google

Google has been focusing much more on its machine learning and data science departments. Almost every product at Google uses some sort of machine learning or data science. For example, Google AdSense uses data science to better target ads towards customers and Picasa uses machine learning to recognize faces in images.

One of the more interesting Google products using machine learning and data science is DeepMind. DeepMind Technologies was a tech startup based in London that was acquired by Google near the beginning of 2014 [13]. Their goal is to combine the best techniques from machine learning and systems neuroscience to build powerful general-purpose learning algorithms.

DeepMind has, in fact, trained a neural network to play video games, including classics like Pong and Space Invaders [42].

Facebook

Like Google, Facebook has also been focusing a lot on machine learning and data science. This is because, as a company relying heavily on advertising for revenue, and as a company with huge amounts of personal user data, machine learning and data science will allow it to target its ads better and get more revenue.

Facebook's machine learning operation is currently based mainly in New York City. In 2014, Facebook hired New York University professor and famed neural network researcher Yann LeCun to help head and lead this operation.

Proposed Approach

Machine Learning Libraries

We considered many different machine learning libraries, including:

Torch7
Theano/PyLearn
Caffe

Torch7 is a machine learning library that is being developed at New York University, the Idiap Research Institute, and NEC Laboratories America [14]. According to its description:

Torch7 is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and provides a very efficient implementation, thanks to an easy and fast scripting language, LuaJIT, and an underlying C implementation.

Theano is a machine learning library being developed mainly at the University of Montreal [15]. It is a Python library and is more of a general-purpose computer algebra system library with an emphasis on matrix operations.

Caffe is a machine learning library being developed at UC Berkeley [16]. According to its description:

Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Caffe is released under the BSD 2-Clause license.

It is important to note that all three libraries have a focus on the deep learning aspect of machine learning, but they are also configurable in many different ways and can support many other algorithms.

After some preliminary evaluation, we decided on using the Torch7 machine learning library. This library was chosen out of the different machine learning frameworks that support artificial neural networks for several reasons. Speed-wise, it is a lot faster than the alternatives. It also supports interfacing with C and CUDA code easily. Finally, out of the frameworks considered, at this point in time, it is the most commonly used in industry. Both Facebook and Google have teams that are using Torch7 for machine learning research.

Data Sets

We also considered many different image data sets, including:

Caltech 101
PASCAL VOC
MNIST Digits
Flower classification dataset
Stanford Dogs
Animals with Attributes
Cifar10

When considering the different image data sets, we took into consideration the size of the data set, the content of the images, and the format in which the data is presented. We wanted a data set that already had the images well formatted and would be easy to work with using our machine learning libraries.

The Caltech 101 dataset [17] is, according to its description:

Pictures of objects belonging to 101 categories. About 40 to 800 images per category. Most categories have about 50 images. Collected in September 2003 by Fei-Fei Li, Marco Andreetto, and Marc'Aurelio Ranzato. The size of each image is roughly 300 x 200 pixels.

This data set of images also contains outline annotations of the different objects shown, which could possibly come in useful later.

The PASCAL VOC dataset [18] is from the 2009 challenge run by the PASCAL2 Network of Excellence on Pattern Analysis, Statistical Modelling, and Computational Learning and funded by the EU.

According to their website, PASCAL VOC contains twenty categories of everyday objects:

Person: person
Animal: bird, cat, cow, dog, horse, sheep
Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train
Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor

The MNIST digits dataset [25] is a large collection of images of handwritten digits. It has a training set of 60,000 examples and a testing set of 10,000 examples. The images have already been centered and normalized, and the whole data set is a smaller subset of a larger data set available from NIST (the National Institute of Standards and Technology).

The Flower Classification dataset is from the Visual Geometry Group at the University of Oxford. There are actually two versions of the data set, one with 17 categories and one with 102 categories. The flowers are common flowers seen in the United Kingdom [19].

The Stanford Dogs dataset [20] is a data set from Stanford University. According to their website:

The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. This dataset has been built using images and annotation from ImageNet for the task of fine-grained image categorization.

It contains 20,580 images of dogs sorted into 120 different categories with class labels and bounding boxes.

Animals with Attributes [21] is a data set from the Max Planck Institute for Biological Cybernetics, which is located in Tübingen, Baden-Württemberg, Germany. According to their description:

This dataset provides a platform to benchmark transfer-learning algorithms, in particular attribute-based classification. It consists of 30,475 images of 50 animal classes with six pre-extracted feature representations for each image. The animal classes are aligned with Osherson's classical class/attribute matrix, thereby providing 85 numeric attribute values for each class. Using the shared attributes, it is possible to transfer information between different classes.

Cifar10 [36] is a data set put together by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton from the University of Toronto. It is a smaller subset of the larger 80 million tiny images data set, but with the advantage of having everything labeled. There are 50,000 training images and 10 different categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

After looking over all the different data sets, we decided that we will mainly be using the MNIST digits dataset. A lot of prior research has been done using this data set, so we can easily compare our results to tests that others have run before. We will also be running additional tests using the Cifar10 data set due to the excellent support it has with our other tools.

Hardware

All of this will be run on a computer running Ubuntu Linux 14.04 Trusty Tahr with the following specifications:

AMD Phenom II X3 720
4 GB RAM
Nvidia GeForce 750 Ti
500 GB 7200 rpm HDD

The Nvidia GeForce 750 Ti graphics card can be especially useful because Torch7 is also coded to include support for the Nvidia CUDA library. Nvidia CUDA allows programs to use the massively parallel computing power of a graphics card. The rest of the hardware was chosen simply because it was readily available.

Implementation

Overview

The implementation of our neural network requires many different steps.

The first step required is data set preparation. Even for those that are commonly used, data sets come in many different formats. It is often necessary to write a few short scripts that will take in the examples from the data set and then format them properly for the machine learning tools that will be used.

It is also often necessary to do what is known as feature engineering. Examples from data sets can have too many features. Running a training algorithm on a data set with too many features can cause the algorithm to become confused and produce subpar results. Therefore, it is necessary to pick out which features to keep and which to remove (or to give less weight to). This can be done manually by hand or using an algorithm like PCA (principal components analysis), as sketched below.
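
As an illustration of the PCA option, here is a minimal sketch in Torch using the singular value decomposition (the data is randomly generated for illustration; this is not the code used in our tests):

require 'torch'

-- Hypothetical data: 100 examples with 20 features each.
X = torch.randn(100, 20)

-- Center the data, then take the singular value decomposition.
mean = X:mean(1)
Xc = X - mean:expandAs(X)
U, S, V = torch.svd(Xc)

-- Keep only the top 5 principal components.
k = 5
Xreduced = Xc * V:narrow(2, 1, k)
print(Xreduced:size())   -- 100 x 5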

Finally, even after successfully running the algorithm on a data set, it may be helpful to tweak some parameters and rerun the algorithm.

MNIST Digits and Cifar10 Details

As mentioned above, MNIST Digits is a data set of handwritten digits. There are 60,000 different handwritten digit files available in this particular data set, designed to be used for training the selected algorithm. There are also 10,000 digit files from the data set designed to be available for testing out the algorithm after it has been trained [25].

It is available online from the NYU Courant Institute and is a subset of a larger NIST (National Institute of Standards and Technology) data set of handwritten digits. The original NIST data set consists of many special databases of handwritten digits collected from different sources. The MNIST data set uses digits from NIST Special Database 1 and NIST Special Database 3. The digits from Special Database 1 were collected from high school students, while the digits from Special Database 3 came from Census Bureau employees. MNIST uses an even mixture of 30,000 digits from Special Database 1 and 30,000 digits from Special Database 3 for the training set, and an even mixture of 5,000 digits from each for the testing set.

The digit files are images of the Arabic numerals 0 to 9. Each image is 28 by 28 pixels and is normalized so that the numerals fit while also keeping the same aspect ratio. The images have also been centered to fit into the 28 by 28 pixel area. The following is a sampling of images from the MNIST data set:


Cifar10 and Cifar100 are data sets from the University of Toronto. These data sets are a subset of the 80 million tiny images data set, with the advantage of having everything labeled.

The Cifar10 data set contains 60,000 images that are sorted into 10 different classes, while the Cifar100 contains 60,000 images that are sorted into 100 different classes. We will be using the Cifar10 data set for our experiments.

The images in the data set are 32 x 32 color images. They are categorized into 10 different classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The classes contain no overlap with each other; for example, if something is labeled as a car it will not be labeled as a truck. Following is a sampling of images from the data set:

Torch7 Details

As mentioned above, Torch7 is an open source machine learning library being developed primarily at New York University. It uses the Lua scripting language as its default language of choice, although it also allows snippets of C code to be inserted, as well as interfacing with NVIDIA CUDA, when speed is especially important.

The Lua programming language is an unusual choice by the developers of Torch7. According to their website [26], Lua describes itself as:

Lua is a powerful, fast, lightweight, embeddable scripting language. Lua combines simple procedural syntax with powerful data description constructs based on associative arrays and extensible semantics. Lua is dynamically typed, runs by interpreting bytecode for a register-based virtual machine, and has automatic memory management with incremental garbage collection, making it ideal for configuration, scripting, and rapid prototyping.

It was developed at the Pontifical Catholic University of Rio de Janeiro in Brazil by three computer scientists, Luiz Henrique de Figueiredo, Roberto Ierusalimschy, and Waldemar Celes [26], who were part of Tecgraf (the Computer Graphics Technology Group) at the time. Lua was developed during a period of time when Brazil had enacted many trade barriers, especially in regards to technology. As a result, Lua was created almost from scratch and has many strange quirks. For example, it is customary for Lua array indices to start at 1 instead of the standard 0 used in most other programming languages.

Although it has been used by many large companies, including Adobe, Bombardier, Disney, Electronic Arts, Intel, LucasArts, Microsoft, NASA, Olivetti and Philips [26], its usage in the general programming community remains quite low.

The TIOBE Index is a ranking of programming language popularity that is maintained by TIOBE Software [27]. While it is not an exact measurement by any means, it is a good way to get a rough estimate of a particular programming language's popularity with the community. According to the TIOBE Index for January 2015, Lua is ranked 31st in popularity, being used in about 0.649% of all programming applications, even behind old languages such as Ada and Pascal.

Torch7 uses the LuaJIT compiler for most general purposes [28]. LuaJIT is an open source Lua compiler that aims to provide a JIT (just-in-time) compiler for the Lua language. Many other languages, like Java, also use just-in-time compilation. The advantage of just-in-time compilation is that it allows code to be executed more quickly than code that is interpreted.

Torch7 also allows for the use of LuaRocks, which is an open source package management system for Lua [29]. Programs can be bundled together in the form of a package called a LuaRock. Many of the core Torch7 packages are hosted at LuaRocks and can be installed easily from the command line. The following command is an example of how LuaRocks can be used to install a package called somepackage:

$ luarocks install somepackage

There is also a custom command line interpreter included with the default Torch7 install. This can be accessed through the th command from the terminal, once Torch7 is installed and all the PATH settings are configured correctly. This custom command line interpreter is called TREPL, which stands for torch read-eval-print loop. TREPL has several advantages over the default Lua one because it has many extra features designed to make working with Torch7 Lua code easier, such as tab completion and history. This is an example of the th command, taken from the Torch7 website [14]:

$ th

  ______             __   |  Torch7
 /_  __/__  ________/ /   |  Scientific computing for Lua.
  / / / _ \/ __/ __/ _ \  |
 /_/  \___/_/  \__/_//_/  |  https://github.com/torch
                          |  http://torch.ch

th> torch.Tensor{1,2,3}
 1
 2
 3
[torch.DoubleTensor of dimension 3]

th>

Out of the numerous LuaRocks available through the package management system, an especially important one for this project is dp. This is a library designed to facilitate the process of using Torch7 for deep learning. dp was developed by Nicholas Leonard while he was a graduate student working in the LISA lab under the supervision of Yoshua Bengio and Aaron Courville [30].

It describes itself on its homepage as a high-level framework that abstracts away common usage patterns of the nn and torch7 packages, such as loading data sets and early stopping, with hyperparameter optimization facilities for sampling and running experiments from the command line or from prior hyperparameter distributions, and facilities for storing and analyzing hyperparameters and results using a PostgreSQL database backend, which facilitates distributing experiments over different machines.

Data Preparation & Feature Engineering

Both the MNIST digits data set and the Cifar10 data set do not come in a standard image format. They come in their own special formats designed for storing vectors and multidimensional matrices. Usually when working with these types of data sets, one is required to write a small program to parse the special format. However, the dp LuaRocks module (which is designed to eliminate common repetitive tasks) makes this unnecessary because it already includes a small amount of code to facilitate the loading of the data from many common data sets (including MNIST and Cifar10).

The dp LuaRocks module also contains a few methods to help preprocess and standardize the data. This is accomplished using the dp.Standardize method, which subtracts the mean and then divides by the standard deviation. While the MNIST data set is already formatted nicely for the most part, we apply it anyway using the common code pattern shown below:

mydata = dp.Mnist{ input_preprocess = dp.Standardize()}

This gets rid of any anomalies and is good practice in general when doing machine learning. The Cifar10 data set does not work well with dp.Standardize, so we leave it out when running tests on Cifar10.
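
For intuition, this is what the standardization step amounts to when done by hand on a plain tensor (a sketch of the same idea, not dp's actual implementation):

require 'torch'

-- Hypothetical stand-in for a batch of flattened images.
t = torch.randn(1000, 784)

-- Subtract the mean, then divide by the standard deviation.
t:add(-t:mean()):div(t:std())

print(t:mean(), t:std())   -- now approximately 0 and 1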

Running the Test

The first step in running the test is creating a set of parameters that will be used for the test. Storing all these parameters in a table in the form of a Lua variable is a good idea because it lets us keep track of things more easily and also allows us to change parameters as we see fit. The code for that would look something like this:

myparams = {
   hiddenunits = 100,
   learningrate = 0.1,
   momentum = 0.9,
   maximumnorm = 1,
   batchsize = 128,
   maxtries = 100,
   maxiterations = 1000
}

These are the basic parameters that we need to supply to Torch7 and dp in order to run our test. Here is a quick explanation of what each of these parameters means:

hiddenunits: This represents the number of nodes in the hidden layer (shown as the blue colored nodes in a previous diagram).

learningrate: This variable determines the learning rate for the neural network. A smaller learning rate makes the system learn in finer increments, but can also drastically increase the time required to train the system.

momentum: The momentum adds part of the previous weight update to the current one. This is done to try to prevent the system from settling on a local minimum when training. A high momentum can make the system train faster, but can also end up overshooting the minimum (see the sketch just after this list).

maximumnorm: This is used to determine how much to update the neuron weights.

batchsize: This is the batch size for the training examples.

maxtries: This determines when to stop the training process early, if after a certain number of tries the error has not decreased.

maxiterations: This determines the maximum number of times to iterate overall.
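
As a rough sketch of how learningrate and momentum interact in a single weight update (the standard momentum form on one made-up scalar weight; dp's internal visitors implement the equivalent logic across whole weight tensors):

-- One SGD-with-momentum update on a single hypothetical weight.
weight, velocity, gradient = 0.5, 0.0, 0.3
learningrate, momentum = 0.1, 0.9

velocity = momentum * velocity - learningrate * gradient   -- keep part of the previous update
weight = weight + velocity                                 -- move the weight against the gradient
print(weight)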

We then need to build a model to represent our neural network. We use the dp.Neural class, which represents a layer in the neural network. We use the parameters that we set above in myparams. A single layer looks something like this:

dp.Neural{
   input_size = mydata:featureSize(),
   output_size = myparams.hiddenunits,
   transfer = nn.Tanh()
}

We create several of these layers and combine them together using the dp.Sequential module to form our neural network (sketched below). The transfer variable is used to set the transfer function. Transfer functions are used to allow for more complexity than what a typical logistic regression function would provide.
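
Putting this together, a two-layer network along the lines described above might look like the following (a sketch based on the dp documentation; the output size of 10 matches the ten MNIST digit classes, and the choice of nn.LogSoftMax as the output transfer is our assumption, not something mandated by the library):

mymodel = dp.Sequential{
   models = {
      dp.Neural{
         input_size = mydata:featureSize(),
         output_size = myparams.hiddenunits,
         transfer = nn.Tanh()
      },
      dp.Neural{
         input_size = myparams.hiddenunits,
         output_size = 10,   -- one output node per digit class
         transfer = nn.LogSoftMax()
      }
   }
}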

Next we set up three propagators: train, valid, and test. These determine how the neural network will train the system and determine what is good and what is bad.

The train propagator is an instance of the dp.Optimizer class and requires a few parameters to be provided:

loss: This represents the typical machine learning loss function, which is a function that the system wants to minimize.

visitor: Here we use some of the parameters we set above in myparams, namely learningrate, momentum and maximumnorm.

feedback: This determines how feedback is provided after each iteration of training. We use dp.Confusion, which is just a confusion matrix.

sampler: This determines in which order to iterate through the data set. For the train propagator we use dp.ShuffleSampler, which randomly shuffles the data set before each iteration.

The valid and test propagators are instances of the dp.Evaluator class and also require a few parameters to be provided:

loss: (same as the train propagator)

feedback: (same as the train propagator)

sampler: For the valid and test propagators, we use a different sampler than the train propagator. Instead of dp.ShuffleSampler, we use dp.Sampler, which iterates through the data set in order.
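
Concretely, the three propagators can be assembled along these lines (a sketch following the dp documentation; the specific constructors shown here, dp.NLL, dp.Momentum, dp.Learn, and dp.MaxNorm, are our reading of that documentation rather than a definitive implementation):

train = dp.Optimizer{
   loss = dp.NLL(),   -- negative log-likelihood loss to minimize
   visitor = {        -- applies our myparams settings to each weight update
      dp.Momentum{momentum_factor = myparams.momentum},
      dp.Learn{learning_rate = myparams.learningrate},
      dp.MaxNorm{max_out_norm = myparams.maximumnorm}
   },
   feedback = dp.Confusion(),   -- report a confusion matrix as feedback
   sampler = dp.ShuffleSampler{batch_size = myparams.batchsize}
}

valid = dp.Evaluator{
   loss = dp.NLL(),
   feedback = dp.Confusion(),
   sampler = dp.Sampler{}   -- iterates through the data set in order
}

test = dp.Evaluator{
   loss = dp.NLL(),
   feedback = dp.Confusion(),
   sampler = dp.Sampler{}
}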

Finally, we set up the experiment and prepare for training. We use the dp.Experiment class, which takes in the following parameters:

model: This is set using the model that we set up before, using dp.Neural to form layers and dp.Sequential to combine them into a neural network.

optimizer: This is set using the train propagator that we created above.

validator: This is set using the valid propagator that we created above.

tester: This is set using the test propagator that we created above.

observer: Observer is a feature of dp that listens as the model is being trained and then calls specific functions when certain events occur. Our observer uses dp.EarlyStopper, which ends the training process early if no additional results are being obtained, and dp.FileLogger, which writes the results to a file.

max_epoch: This is the maximum number of iterations that the experiment will go through when training. We set this to myparams.maxiterations.
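
Assembled, the experiment looks something like this (again a sketch; using dp.EarlyStopper's max_epochs argument for the maxtries cutoff is our reading of the documentation):

myexperiment = dp.Experiment{
   model = mymodel,
   optimizer = train,
   validator = valid,
   tester = test,
   observer = {
      dp.FileLogger(),   -- writes the results to a file
      dp.EarlyStopper{max_epochs = myparams.maxtries}   -- stop early when no progress is made
   },
   max_epoch = myparams.maxiterations
}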

The final step is running the experiment on the data sets, which can be accomplished by the following line of code:

myexperiment:run(mydata)

Now that we have run the experiment, let us look at the results.

Results

MNIST Results

One of the main advantages of the MNIST data set is that it is widely used, so there are a lot of previous tests run on the MNIST data set that we can compare our results to. Following are a few papers with results run on the MNIST Digits data set. I have decided to use a sampling of results run using many different machine learning methods, which will allow us to make a better comparison and see a wider picture of how our results measure up.

Type of Machine Learning Method   Lowest Error Rate   Highest Error Rate   Median Error Rate   Citations
Our NN Results                    2.7%                3.8%                 3.04%
Linear classifier                 7.6%                12.0%                8.4%                [31][31][31]
K-Nearest Neighbors               0.52%               5.0%                 1.33%               [32][31][35]
Non-linear classifiers            3.3%                3.6%                 3.3%                [31][31][31]
Support vector machines           0.56%               1.4%                 0.8%                [33][25][31]
Neural Nets                       0.35%               4.7%                 2.45%               [34][31][31]

CIFAR10 Results

The Cifar10 data set has fewer previous tests run on it when compared to the MNIST data set, but there are still enough to make a comparison. The table below has a sampling of prior results as well as our own results. When reporting results from Cifar10, it is the accuracy rate that is commonly reported (unlike MNIST, where the error rate is reported). Also note that the images to be classified in Cifar10 are much more complex than the ones in MNIST.

Type of Machine Learning Method   Accuracy Rate   Citation
Our NN Results                    50.46%
Deeply-Supervised Nets            91.78%          [37]
Network in Network                91.2%           [38]
Sum-Product Networks              83.96%          [39]
Convolutional Kernel Networks     82.18%          [40]
PCANet                            78.67%          [41]

Other Thoughts

We trained the neural network and ran our test a few times using many different parameters. Our main tests were all set to run for a maximum of 1000 iterations with a learning rate of 0.1 and to stop early if, after 100 iterations, no further progress was made. The models all had one hidden layer, with the number of hidden nodes (also sometimes referred to as hidden units) being varied each time the test was run. Following are a few tables and graphs with more detailed information on all the numerous experiments run:
MNIST Table

Dataset   Hidden Units   Iterations Run Before Ideal Solution   Error Rate
MNIST     100            92                                     3.80%
MNIST     120            104                                    3.48%
MNIST     140            421                                    3.21%
MNIST     160            175                                    3.33%
MNIST     180            165                                    3.01%
MNIST     200            312                                    3.00%
MNIST     220            422                                    2.90%
MNIST     240            584                                    2.93%
MNIST     260            199                                    3.15%
MNIST     280            293                                    3.04%
MNIST     300            589                                    2.70%

MNIST Graphs


Cifar10 Table

Dataset   Hidden Units   Iterations Run Before Ideal Solution   Accuracy Rate
Cifar10   100            98                                     46.85%
Cifar10   120            88                                     47.62%
Cifar10   140            55                                     49.32%
Cifar10   160            53                                     47.63%
Cifar10   180            54                                     48.06%
Cifar10   200            52                                     48.84%
Cifar10   220            68                                     49.65%
Cifar10   240            50                                     50.46%
Cifar10   260            46                                     48.91%
Cifar10   280            74                                     49.93%
Cifar10   300            55                                     49.90%

Cifar10 Graphs

One disadvantage of neural networks is the long training times. We can use our experiences training on MNIST to demonstrate (our experiences with Cifar10 are similar). When we set the number of hidden nodes to 100, each iteration took about six seconds to run and our program went through 193 iterations before deciding to stop early due to lack of progress. The ideal solution was found on the 92nd iteration. We ended up with an error rate of 3.8%. When we set the number of hidden nodes to 300, each iteration took about ten seconds to run and our program went through 589 iterations before deciding to stop early due to lack of progress. The ideal solution was found on the 488th iteration. We ended up with an error rate of 2.7%. We should also note that increasing the number of hidden nodes from 100 to 300 dramatically increased the time it took to train the neural network, from around 30 minutes to over 3 hours.

Interestingly enough, the number of iterations required before reaching the ideal solution seems to have no correlation with the number of hidden units. This is because the algorithm chooses a random spot to start walking towards the ideal solution and may sometimes land in a more favorable spot initially. However, in general, more hidden units result in more favorable results, especially for MNIST.

The Cifar10 data set produced much worse results than MNIST. This can be explained by the much more complex images contained in the data set. In order to improve results, it may be necessary to increase the number of hidden units, increase the number of layers in the model, or use a more aggressive learning rate. Unfortunately this is unfeasible with the limited hardware that we have access to.

Overall, these results are somewhat typical of neural networks, which seem to have a large variation in error rate percentage. On the MNIST data set, other results range from 4.7% in a test run by Yann LeCun [31] to 0.35% in a test run by Dan Ciresan [34]. The current best result for Cifar10 [37] also uses a variation on neural networks, showing that neural networks are indeed a good candidate for classification of more complex imagery as well.

Nevertheless, these are pretty favorable results (especially when considering the limited hardware that the test was run on) and aptly demonstrate the potential that neural networks have when solving these sorts of problems.

Conclusion

In conclusion, neural networks are shown to be a viable choice when doing image recognition, especially of handwritten digit images. This is because neural networks are especially useful for solving problems with nonlinear solutions, which applies in the case of handwritten digit images, since the hidden units are able to effectively model such problems. We must also note that this comes with the caveat of having the necessary computational hardware and time. Without such resources, results can be subpar, as shown by the Cifar10 tests.

While there are many advantages to using neural networks, there are also a few drawbacks. One drawback to neural networks is in training. Since the system has to go through many iterations during the training phase, training may take a long time, especially when run on computers using older hardware.

Another drawback to neural networks (although this applies to machine learning in general) is picking the right parameters, such as the number of hidden nodes per layer, the learning rate, and the number of layers. The right parameters can cause a huge difference in results, with a massive decrease in error rate percentage. However, they are difficult to balance, and the wrong choices can cause extremely long training times or inaccurate results.

Future work may include running tests on models that have more hidden units and layers, as well as using a more aggressive learning rate. To accommodate the large hardware demands that are required to do such work, we may look into running our computations in the cloud. Amazon has the EC2 cloud, which may be able to offload a lot of the work. Another possible way to drastically improve hardware performance may be to use the graphics card to help with computations. NVIDIA has the CUDA library, which is excellent at running computations in parallel and which Torch7 actually has some built-in support for.

With the rise of tools such as Torch7 (and dp), neural networks are now more useful than ever before and will probably be applied to many other problems in the future. This can range from item recommendation at a shopping service like Amazon to things like self-driving cars. We live in an era ruled by data, and I am excited to see what will come next.

References

[1] Warren S. McCulloch and Walter Pitts. A Logical Calculus of the Ideas Immanent in Nervous Activity. Published 1943. Accessed Oct 2014.

[2] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It. Published 1990. Accessed Oct 2014.

[3] Adam Coates, Brody Huval, Tao Wang, David J. Wu, Bryan Catanzaro and Andrew Y. Ng. Deep Learning with COTS HPC Systems. Published Jul 2013. Accessed Oct 2014.

[4] Danqi Chen, Richard Socher, Christopher D. Manning and Andrew Y. Ng. Learning New Facts From Knowledge Bases with Neural Tensor Networks and Semantic Word Vectors. Published Mar 2013. Accessed Oct 2014.

[5] Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning and Andrew Y. Ng. Convolutional-Recursive Deep Learning for 3D Object Classification. Published 2012. Accessed Oct 2014.

[6] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh. A Fast Learning Algorithm for Deep Belief Nets. Published 2006. Accessed Oct 2014.

[7] Geoffrey E. Hinton and R. R. Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Published 2006. Accessed Oct 2014.

[8] Geoffrey E. Hinton. To Recognize Shapes, First Learn to Generate Images. Published Oct 2006. Accessed Oct 2014.

[9] Geoffrey E. Hinton. Learning Multiple Layers of Representation. Published Oct 2007. Accessed Oct 2014.

[10] Yann LeCun, Fu Jie Huang, and Leon Bottou. Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. Published CVPR, 2004. Accessed Oct 2014.

[11] Yoshua Bengio and Yann LeCun. Scaling Learning Algorithms Towards AI. Published MIT Press, 2007. Accessed Oct 2014.

[12] Piotr W. Mirowski, Yann LeCun, Deepak Madhavan, and Ruben Kuzniecky. Comparing SVM and Convolutional Networks for Epileptic Seizure Prediction from Intracranial EEG. Published 2008. Accessed Oct 2014.

[13] Antonio Regalado. Is Google Cornering the Market on Deep Learning? Published MIT Technology Review, Jan 2014. Accessed Oct 2014.

[14] Torch7. Accessed Oct 2014. <http://torch.ch/>

[15] Theano. Accessed Oct 2014. <http://deeplearning.net/software/theano/>

[16] Caffe. Accessed Oct 2014. <http://caffe.berkeleyvision.org/>

[17] Caltech 101. Accessed Oct 2014. <http://www.vision.caltech.edu/Image_Datasets/Caltech101/>

[18] PASCAL VOC. Accessed Oct 2014. <http://pascallin.ecs.soton.ac.uk/challenges/VOC/>

[19] Oxford Flowers. Accessed Oct 2014. <http://www.robots.ox.ac.uk/~vgg/data/flowers/>

[20] Stanford Dogs. Accessed Oct 2014. <http://vision.stanford.edu/aditya86/ImageNetDogs/>

[21] Animals with Attributes. Accessed Oct 2014. <http://attributes.kyb.tuebingen.mpg.de/>

[22] Thomas Serre, Lior Wolf, and Tomaso Poggio. Object Recognition with Features Inspired by Visual Cortex. Published 2005. Accessed Nov 2014.

[23] Kristen Grauman and Trevor Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. Published ICCV, 2005. Accessed Nov 2014.

[24] Jianchao Yang, Kai Yu, Yihong Gong, and Thomas Huang. Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. Published CVPR, 2009. Accessed Nov 2014.

[25] MNIST Handwritten Digit Database. Accessed Dec 2014. <http://yann.lecun.com/exdb/mnist/>

[26] Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes. The Evolution of Lua. Accessed Jan 2015. <http://www.lua.org/doc/hopl.pdf>

[27] TIOBE Software: TIOBE Index. Accessed Jan 2015. <http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html>

[28] LuaJIT. Accessed Jan 2015. <http://luajit.org/luajit.html>

[29] LuaRocks. Accessed Jan 2015. <http://luarocks.org/>

[30] dp. Accessed Jan 2015. <http://dp.readthedocs.org/en/latest/index.html>

[31] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-Based Learning Applied to Document Recognition. Published Nov 1998. Accessed Feb 2015.

[32] Daniel Keysers, Thomas Deselaers, Christian Gollan, and Hermann Ney. Deformation Models for Image Recognition. Published 2007. Accessed Feb 2015.

[33] Dennis Decoste and Bernhard Scholkopf. Training Invariant Support Vector Machines. Published 2002. Accessed Feb 2015.

[34] Dan Ciresan, Ueli Meier, Luca Gambardella, and Juergen Schmidhuber. Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition. Published Mar 2010. Accessed Feb 2015.

[35] MNIST Nearest Neighbor Results. Accessed Feb 2015. <http://finmath.uchicago.edu/~wilder/Mnist/>

[36] CIFAR-10 and CIFAR-100 Datasets. Accessed April 2015. <http://www.cs.toronto.edu/~kriz/cifar.html>

[37] Chen-Yu Lee, Saining Xie, Patrick Gallagher, Zhengyou Zhang, and Zhuowen Tu. Deeply-Supervised Nets. Published 2014. Accessed April 2015.

[38] Min Lin, Qiang Chen, and Shuicheng Yan. Network In Network. Published 2013. Accessed April 2015.

[39] Robert Gens and Pedro Domingos. Discriminative Learning of Sum-Product Networks. Published 2012. Accessed April 2015.

[40] Julien Mairal, Piotr Koniusz, Zaid Harchaoui, and Cordelia Schmid. Convolutional Kernel Networks. Published 2014. Accessed April 2015.

[41] Tsung-Han Chan, Kui Jia, Shenghua Gao, Jiwen Lu, Zinan Zeng, and Yi Ma. PCANet: A Simple Deep Learning Baseline for Image Classification. Published 2014. Accessed April 2015.

[42] The Last AI Breakthrough DeepMind Made Before Google Bought It For $400m. Accessed April 2015. <https://medium.com/the-physics-arxiv-blog/the-last-ai-breakthrough-deepmind-made-before-google-bought-it-for-400m-7952031ee5e1>
