Sqoop User Guide (v1.4.2)
Table of Contents

1. Introduction
2. Supported Releases
3. Sqoop Releases
4. Prerequisites
5. Basic Usage
6. Sqoop Tools
6.1. Using Command Aliases
6.2. Controlling the Hadoop Installation
6.3. Using Generic and Specific Arguments
6.4. Using Options Files to Pass Arguments
6.5. Using Tools
7. sqoop-import
7.1. Purpose
7.2. Syntax
7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase
7.3. Example Invocations
8. sqoop-import-all-tables
8.1. Purpose
8.2. Syntax
8.3. Example Invocations
9. sqoop-export
9.1. Purpose
9.2. Syntax
9.3. Inserts vs. Updates
9.4. Exports and Transactions
9.5. Failed Exports
9.6. Example Invocations
10. Saved Jobs
11. sqoop-job
11.1. Purpose
11.2. Syntax
11.3. Saved jobs and passwords
11.4. Saved jobs and incremental imports
12. sqoop-metastore
12.1. Purpose
12.2. Syntax
13. sqoop-merge
13.1. Purpose
13.2. Syntax
14. sqoop-codegen
14.1. Purpose
14.2. Syntax
14.3. Example Invocations
15. sqoop-create-hive-table
15.1. Purpose
15.2. Syntax
15.3. Example Invocations
16. sqoop-eval
16.1. Purpose
16.2. Syntax
16.3. Example Invocations
17. sqoop-list-databases
17.1. Purpose
17.2. Syntax
17.3. Example Invocations
18. sqoop-list-tables
18.1. Purpose
18.2. Syntax
18.3. Example Invocations
19. sqoop-version
19.1. Purpose
19.2. Syntax
19.3. Example Invocations
20. sqoop-help
20.1. Purpose
20.2. Syntax
20.3. Example Invocations
21. Compatibility Notes
21.1. Supported Databases
21.2. MySQL
21.2.1. zeroDateTimeBehavior
21.2.2. UNSIGNED columns
21.2.3. BLOB and CLOB columns
21.2.4. Importing views in direct mode
21.2.5. Direct-mode Transactions
21.3. PostgreSQL
21.3.1. Importing views in direct mode
21.4. Oracle
21.4.1. Dates and Times
21.5. Schema Definition in Hive
22. Getting Support
23. Troubleshooting
23.1. General Troubleshooting Process
23.2. Specific Troubleshooting Tips
23.2.1. Oracle: Connection Reset Errors
23.2.2. Oracle: Case-Sensitive Catalog Query Errors
23.2.3. MySQL: Connection Failure
23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)
23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely
1. Introduction

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

This document describes how to get started using Sqoop to move data between databases and Hadoop, and provides reference information for the operation of the Sqoop command-line tool suite. This document is intended for:

System and application programmers
System administrators
Database administrators
Data analysts
Data engineers
2. Supported Releases

This documentation applies to Sqoop v1.4.2.

3. Sqoop Releases

Sqoop is an open source software product of the Apache Software Foundation.

Software development for Sqoop occurs at http://svn.apache.org/repos/asf/sqoop/trunk. At that site you can obtain:

New releases of Sqoop as well as its most recent source code
An issue tracker
A wiki that contains Sqoop documentation

Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3.
4. Prerequisites

The following prerequisite knowledge is required for this product:
Basic computer technology and terminology
Familiarity with command-line interfaces such as bash
Relational database management systems
Basic familiarity with the purpose and operation of Hadoop

Before you can use Sqoop, a release of Hadoop must be installed and configured. We recommend that you download Cloudera's Distribution for Hadoop (CDH3) from the Cloudera Software Archive at http://archive.cloudera.com for straightforward installation of Hadoop on Linux systems.

This document assumes you are using a Linux or Linux-like environment. If you are using Windows, you may be able to use cygwin to accomplish most of the following tasks. If you are using Mac OS X, you should see few (if any) compatibility errors. Sqoop is predominantly operated and tested on Linux.
5. Basic Usage

With Sqoop, you can import data from a relational database system into HDFS. The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.

A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.

After manipulating the imported records (for example, with MapReduce or Hive) you may have a result data set which you can then export back to the relational database. Sqoop's export process will read a set of delimited text files from HDFS in parallel, parse them into records, and insert them as new rows in a target database table, for consumption by external applications or users.

Sqoop includes some other commands which allow you to inspect the database you are working with. For example, you can list the available database schemas (with the sqoop-list-databases tool) and tables within a schema (with the sqoop-list-tables tool). Sqoop also includes a primitive SQL execution shell (the sqoop-eval tool).

Most aspects of the import, code generation, and export processes can be customized. You can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. Subsequent sections of this document explain how to specify these and other arguments to Sqoop.
6. Sqoop Tools

6.1. Using Command Aliases
6.2. Controlling the Hadoop Installation
6.3. Using Generic and Specific Arguments
6.4. Using Options Files to Pass Arguments
6.5. Using Tools
Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.

If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program. Users of a packaged deployment of Sqoop (such as an RPM shipped with Cloudera's Distribution for Hadoop) will see this program installed as /usr/bin/sqoop. The remainder of this documentation will refer to this program as sqoop. For example:

    sqoop tool-name [tool-arguments]
Note

The following examples that begin with a $ character indicate that the commands must be entered at a terminal prompt (such as bash). The $ character represents the prompt itself; you should not start these commands by typing a $. You can also enter commands inline in the text of a paragraph; for example, sqoop help. These examples do not show a $ prefix, but you should enter them the same way. Don't confuse the $ shell prompt in the examples with the $ that precedes an environment variable name. For example, the string literal $HADOOP_HOME includes a "$".
Sqoop ships with a help tool. To display a list of all available tools, type the command sqoop help.

You can display help for a specific tool by entering sqoop help tool-name; for example, sqoop help import.

You can also add the --help argument to any command: sqoop import --help.
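The help invocations described above can be sketched as follows (a minimal illustration, assuming the sqoop wrapper is on your PATH):

```shell
# List every tool Sqoop provides
sqoop help

# Two equivalent ways to get detailed help for one tool
sqoop help import
sqoop import --help
```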
6.1. Using Command Aliases

In addition to typing the sqoop (toolname) syntax, you can use alias scripts that specify the sqoop-(toolname) syntax. For example, the scripts sqoop-import, sqoop-export, etc. each select a specific tool.
6.2. Controlling the Hadoop Installation

You invoke Sqoop through the program launch capability provided by Hadoop. The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable, either inline for a single invocation or by exporting it in your shell.

If $HADOOP_HOME is not set, Sqoop will use the default installation location for Cloudera's Distribution for Hadoop, /usr/lib/hadoop.

The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
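The two ways of pointing Sqoop at a particular Hadoop installation can be sketched as follows (the installation paths and arguments are placeholders):

```shell
# One-off: set HADOOP_HOME for a single Sqoop invocation
HADOOP_HOME=/path/to/some/hadoop sqoop import --connect <jdbc-uri> ...

# Persistent: export it for the rest of the shell session
export HADOOP_HOME=/some/path/to/hadoop
sqoop import --connect <jdbc-uri> ...
```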
6.3. Using Generic and Specific Arguments

To control the operation of each Sqoop tool, you use generic and specific arguments.
You must supply the generic arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect). Note that generic Hadoop arguments are preceded by a single dash character (-), whereas tool-specific arguments start with two dashes (--), unless they are single-character arguments such as -P.

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches; if not specified, the name defaults to the jar name for the job, which is derived from the used table name.

The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop's internal argument-parsing system.
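A sketch of the required argument ordering, with a generic -D option placed before the tool-specific options (the job name, connect string, and table are illustrative):

```shell
# Generic Hadoop arguments (single dash) come right after the tool name;
# tool-specific arguments (double dash) follow them.
sqoop import -D mapred.job.name=myimportjob \
    --connect jdbc:mysql://db.example.com/mydb --table employees
```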
6.4. Using Options Files to Pass Arguments

When using Sqoop, the command-line options that do not change from invocation to invocation can be put in an options file for convenience. An options file is a text file where each line identifies an option in the order that it appears otherwise on the command line. Options files allow specifying a single option on multiple lines by using the back-slash character at the end of intermediate lines. Also supported are comments within options files that begin with the hash character. Comments must be specified on a new line and may not be mixed with option text. All comments and empty lines are ignored when options files are expanded. Unless options appear as quoted strings, any leading or trailing spaces are ignored. Quoted strings, if used, must not extend beyond the line on which they are specified.

Options files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering. For instance, regardless of where the options are loaded from, they must follow the ordering such that generic options appear first, tool-specific options next, finally followed by options that are intended to be passed to child programs.

To specify an options file, simply create an options file in a convenient location and pass it to the command line via the --options-file argument.

Whenever an options file is specified, it is expanded on the command line before the tool is invoked. You can specify more than one options file within the same invocation if needed. For example, an import invocation that supplies its connect string and credentials directly on the command line can alternatively be specified with those options placed in an options file.
The options file can have empty lines and comments for readability purposes; an options file containing the same options plus comments and blank lines expands to exactly the same invocation.
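For illustration (the paths, database, and username here are hypothetical), the same import expressed directly and via an options file:

```shell
# Direct invocation:
sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST

# Equivalent invocation with the connection options factored out:
sqoop --options-file /users/homer/work/import.txt --table TEST

# where /users/homer/work/import.txt contains one option per line:
#   import
#   --connect
#   jdbc:mysql://localhost/db
#   --username
#   foo
```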
6.5. Using Tools

The following sections will describe each tool's operation. The tools are listed in the most likely order you will find them useful.
7. sqoop-import

7.1. Purpose
7.2. Syntax
7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase
7.3. Example Invocations
7.1. Purpose

The import tool imports an individual table from an RDBMS to HDFS. Each row from a table is represented as a separate record in HDFS. Records can be stored as text files (one record per line), or in binary representation as Avro or SequenceFiles.
7.2. Syntax

7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase

While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another.
Note

In this document, arguments are grouped into collections organized by function. Some collections are present in several tools (for example, the "common" arguments). An extended description of their functionality is given only on the first presentation in this document.
Table 1. Common arguments

Argument                               Description
--connect <jdbc-uri>                   Specify JDBC connect string
--connection-manager <class-name>      Specify connection manager class to use
--driver <class-name>                  Manually specify JDBC driver class to use
--hadoop-home <dir>                    Override $HADOOP_HOME
--help                                 Print usage instructions
-P                                     Read password from console
--password <password>                  Set authentication password
--username <username>                  Set authentication username
--verbose                              Print more information while working
--connection-param-file <filename>     Optional properties file that provides connection parameters
7.2.1. Connecting to a Database Server

Sqoop is designed to import tables from a database into HDFS. To do so, you must specify a connect string that describes how to connect to the database. The connect string is similar to a URL, and is communicated to Sqoop with the --connect argument. This describes the server and database to connect to; it may also specify the port. For example:

    sqoop import --connect jdbc:mysql://database.example.com/employees

This string will connect to a MySQL database named employees on the host database.example.com. It's important that you do not use the URL localhost if you intend to use Sqoop with a distributed Hadoop cluster. The connect string you supply will be used on TaskTracker nodes throughout your MapReduce cluster; if you specify the literal name localhost, each node will connect to a different database (or more likely, no database at all). Instead, you should use the full hostname or IP address of the database host that can be seen by all your remote nodes.

You might need to authenticate against the database before you can access it. You can use the --username and --password or -P parameters to supply a username and a password to the database.
Warning

The --password parameter is insecure, as other users may be able to read your password from the command-line arguments via the output of programs such as ps. The -P argument will read a password from a console prompt, and is the preferred method of entering credentials. Credentials may still be transferred between nodes of the MapReduce cluster using insecure means.
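The credential options can be sketched as follows (host, database, username, and password are placeholders; prefer -P in practice):

```shell
# Insecure: the password is visible to other users via ps
sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345

# Preferred: prompt for the password on the console
sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron -P
```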
Sqoop automatically supports several databases, including MySQL. Connect strings beginning with jdbc:mysql:// are handled automatically in Sqoop. (A full list of databases with built-in support is provided in the "Supported Databases" section. For some, you may need to install the JDBC driver yourself.)

You can use Sqoop with any other JDBC-compliant database. First, download the appropriate JDBC driver for the type of database you want to import, and install the .jar file in the $SQOOP_HOME/lib directory on your client machine. (This will be /usr/lib/sqoop/lib if you installed from an RPM or Debian package.) Each driver .jar file also has a specific driver class which defines the entry-point to the driver. For example, MySQL's Connector/J library has a driver class of com.mysql.jdbc.Driver. Refer to your database vendor-specific documentation to determine the main driver class. This class must be provided as an argument to Sqoop with --driver.

For example, to connect to a SQL Server database, first download the driver from microsoft.com and install it in your Sqoop lib path. Then run Sqoop, supplying the driver class with --driver.

When connecting to a database using JDBC, you can optionally specify extra JDBC parameters via a property file using the option --connection-param-file. The contents of this file are parsed as standard Java properties and passed into the driver while creating a connection.

Note

The parameters specified via the optional property file are only applicable to JDBC connections. Any fastpath connectors that use connections other than JDBC will ignore these parameters.
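A sketch of a generic-JDBC import against SQL Server (the driver class and connect string here are illustrative; check your driver's documentation for the exact class name):

```shell
sqoop import \
    --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...
```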
Table 2. Import control arguments:

Argument                         Description
--append                         Append data to an existing dataset in HDFS
--as-avrodatafile                Imports data to Avro Data Files
--as-sequencefile                Imports data to SequenceFiles
--as-textfile                    Imports data as plain text (default)
--boundary-query <statement>     Boundary query to use for creating splits
--columns <col,col,col…>         Columns to import from table
--direct                         Use direct import fast path
--direct-split-size <n>          Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>           Set the maximum size for an inline LOB
-m,--num-mappers <n>             Use n map tasks to import in parallel
--query <statement>              Import the results of statement.
--split-by <column-name>         Column of the table used to split work units
--table <table-name>             Table to read
--target-dir <dir>               HDFS destination dir
--warehouse-dir <dir>            HDFS parent for table destination
--where <where clause>           WHERE clause to use during import
-z,--compress                    Enable compression
--compression-codec <c>          Use Hadoop codec (default gzip)
--null-string <null-string>      The string to be written for a null value for string columns
--null-non-string <null-string>  The string to be written for a null value for non-string columns

The --null-string and --null-non-string arguments are optional. If not specified, then the string "null" will be used.
7.2.2. Selecting the Data to Import
Sqoop typically imports data in a table-centric fashion. Use the --table argument to select the table to import. For example, --table employees. This argument can also identify a VIEW or other table-like entity in a database.

By default, all columns within a table are selected for import. Imported data is written to HDFS in its "natural order"; that is, a table containing columns A, B, and C results in an import of data such as:

    A1,B1,C1
    A2,B2,C2
    ...

You can select a subset of columns and control their ordering by using the --columns argument. This should include a comma-delimited list of columns to import. For example: --columns "name,employee_id,jobtitle".

You can control which rows are imported by adding a SQL WHERE clause to the import statement. By default, Sqoop generates statements of the form SELECT <column list> FROM <table name>. You can append a WHERE clause to this with the --where argument. For example: --where "id > 400". Only rows where the id column has a value greater than 400 will be imported.

By default Sqoop will use the query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits. In some cases this query is not the most optimal, so you can specify any arbitrary query returning two numeric columns using the --boundary-query argument.
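Putting the row- and column-selection arguments together, a sketch (the connect string, table, and column names are illustrative):

```shell
sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title" \
    --where "start_date > '2010-01-01'"
```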
7.2.3. Free-form Query Imports

Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument.

When importing a free-form query, you must specify a destination directory with --target-dir.

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS, which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

For example:

    sqoop import \
      --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id == b.id) WHERE $CONDITIONS' \
      --split-by a.id --target-dir /user/foo/joinresults

Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1.

Note

If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable.

Note

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.
7.2.4. Controlling Parallelism

Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) to use to perform the import by using the -m or --num-mappers argument. Each of these arguments takes an integer value which corresponds to the degree of parallelism to employ. By default, four tasks are used. Some databases may see improved performance by increasing this value to 8 or 16. Do not increase the degree of parallelism greater than that available within your MapReduce cluster; tasks will run serially and will likely increase the amount of time required to perform the import. Likewise, do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.

When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range. For example, if you had a table with a primary key column of id whose minimum value was 0 and maximum value was 1000, and Sqoop was directed to use 4 tasks, Sqoop would run four processes which each execute SQL statements of the form SELECT * FROM sometable WHERE id >= lo AND id < hi, with (lo, hi) set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different tasks.

If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose a different column with the --split-by argument. For example, --split-by employee_id. Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column.
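A sketch of tuning parallelism with an explicit splitting column (the connect string, table, and values are illustrative):

```shell
# Use 8 parallel map tasks and split the workload on employee_id
sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES -m 8 --split-by employee_id
```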
7.2.5. Controlling the Import Process

By default, the import process will use JDBC, which provides a reasonable cross-vendor import channel. Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools. For example, MySQL provides the mysqldump tool which can export data from MySQL to other systems very quickly. By supplying the --direct argument, you are specifying that Sqoop should attempt the direct import channel. This channel may be higher performance than using JDBC. Currently, direct mode does not support imports of large object columns.

When importing from PostgreSQL in conjunction with direct mode, you can split the import into separate files after individual files reach a certain size. This size limit is controlled with the --direct-split-size argument.

By default, Sqoop will import a table named foo to a directory named foo inside your home directory in HDFS. For example, if your username is someuser, then the import tool will write to /user/someuser/foo/(files). You can adjust the parent directory of the import with the --warehouse-dir argument. For example:

    sqoop import --connect <connect-string> --table foo --warehouse-dir /shared

This command would write to a set of files in the /shared/foo/ directory.

You can also explicitly choose the target directory, like so:

    sqoop import --connect <connect-string> --table foo --target-dir /dest

This will import the files into the /dest directory. --target-dir is incompatible with --warehouse-dir.

When using direct mode, you can specify additional arguments which should be passed to the underlying tool. If the argument -- is given on the command line, then subsequent arguments are sent directly to the underlying tool. For example, the following adjusts the character set used by mysqldump:

    sqoop import --table foo --connect <connect-string> --direct -- --default-character-set=latin1

By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory's contents. If you use the --append argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.

Note

When using the direct mode of import, certain database client utilities are expected to be present in the shell path of the task process. For MySQL the utilities mysqldump and mysqlimport are required, whereas for PostgreSQL the utility psql is required.
7.2.6. Controlling type mapping

Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However the default mapping might not be suitable for everyone, and might be overridden by --map-column-java (for changing mapping to Java) or --map-column-hive (for changing Hive mapping).

Table 3. Parameters for overriding mapping

Argument                       Description
--map-column-java <mapping>    Override mapping from SQL to Java type for configured columns.
--map-column-hive <mapping>    Override mapping from SQL to Hive type for configured columns.

Sqoop expects a comma-separated list of mappings of the form <name of column>=<new type>. Sqoop will raise an exception if some configured mapping is not used.
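For example, a mapping override might look like the following (the column names and types are illustrative):

```shell
# Map the "id" column to Java String and "value" to Java Integer
sqoop import ... --map-column-java id=String,value=Integer
```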
7.2.7. Incremental Imports

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

The following arguments control incremental imports:

Table 4. Incremental import arguments:

Argument                 Description
--check-column (col)     Specifies the column to be examined when determining which rows to import.
--incremental (mode)     Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)     Specifies the maximum value of the check column from the previous import.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
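A sketch of an append-mode incremental import (the connect string, table, check column, and last value are illustrative):

```shell
# Import only rows whose id exceeds the value recorded from the prior run
sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table EMPLOYEES \
    --incremental append --check-column id --last-value 100000
```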
7.2.8. File Formats

You can import data in one of two file formats: delimited text or SequenceFiles.

Delimited text is the default import format. You can also specify it explicitly by using the --as-textfile argument. This argument will write string-based representations of each record to the output files, with delimiter characters between individual columns and rows. These delimiters may be commas, tabs, or other characters. (The delimiters can be selected; see "Output line formatting arguments.") A text-based import yields one delimited record per line, for example 1,here is a message,2010-05-01.
Delimited text is appropriate for most non-binary data types. It also readily supports further manipulation by other tools, such as Hive.

SequenceFiles are a binary format that store individual records in custom record-specific data types. These data types are manifested as Java classes. Sqoop will automatically generate these data types for you. This format supports exact storage of all data in binary representations, and is appropriate for storing binary data (for example, VARBINARY columns), or data that will be principally manipulated by custom MapReduce programs (reading from SequenceFiles is higher-performance than reading from text files, as records do not need to be parsed).

Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages. Avro also supports versioning, so that when, e.g., columns are added or removed from a table, previously imported data files can be processed along with new ones.

By default, data is not compressed. You can compress your data by using the deflate (gzip) algorithm with the -z or --compress argument, or specify any Hadoop compression codec using the --compression-codec argument. This applies to SequenceFile, text, and Avro files.
7.2.9. Large Objects

Sqoop handles large objects (BLOB and CLOB columns) in particular ways. If this data is truly large, then these columns should not be fully materialized in memory for manipulation, as most columns are. Instead, their data is handled in a streaming fashion. Large objects can be stored inline with the rest of the data, in which case they are fully materialized in memory on every access, or they can be stored in a secondary storage file linked to the primary data storage. By default, large objects less than 16 MB in size are stored inline with the rest of the data. At a larger size, they are stored in files in the _lobs subdirectory of the import target directory. These files are stored in a separate format optimized for large record storage, which can accommodate records of up to 2^63 bytes each. The size at which lobs spill into separate files is controlled by the --inline-lob-limit argument, which takes a parameter specifying the largest lob size to keep inline, in bytes. If you set the inline LOB limit to 0, all large objects will be placed in external storage.
Table 5. Output line formatting arguments:

Argument                          Description
--enclosed-by <char>              Sets a required field enclosing character
--escaped-by <char>               Sets the escape character
--fields-terminated-by <char>     Sets the field separator character
--lines-terminated-by <char>      Sets the end-of-line character
--mysql-delimiters                Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
--optionally-enclosed-by <char>   Sets a field enclosing character
When importing to delimited files, the choice of delimiter is important. Delimiters which appear inside string-based fields may cause ambiguous parsing of the imported data by subsequent analysis passes. For example, the string "Hello, pleased to meet you" should not be imported with the end-of-field delimiter set to a comma.

Delimiters may be specified as:

a character (--fields-terminated-by X)
an escape character (--fields-terminated-by \t). Supported escape characters are:
\b (backspace)
\n (newline)
\r (carriage return)
\t (tab)
\" (double-quote)
\' (single-quote)
\\ (backslash)
\0 (NUL) - This will insert NUL characters between fields or lines, or will disable enclosing/escaping if used for one of the --enclosed-by, --optionally-enclosed-by, or --escaped-by arguments.
The octal representation of a UTF-8 character's code point. This should be of the form \0ooo, where ooo is the octal value. For example, --fields-terminated-by \001 would yield the ^A character.
The hexadecimal representation of a UTF-8 character's code point. This should be of the form \0xhhh, where hhh is the hex value. For example, --fields-terminated-by \0x0d would yield the carriage-return character.

The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote character, and no escape character. Note that this can lead to ambiguous/unparsable records if you import database records containing commas or newlines in the field data. For unambiguous parsing, both must be enabled. For example, via --mysql-delimiters.

If unambiguous delimiters cannot be presented, then use enclosing and escaping characters. The combination of (optional) enclosing and escaping characters will allow unambiguous parsing of lines. For example, suppose one column of a dataset contained the following values:

    Some string, with a comma.
    Another "string with quotes"

The following arguments would provide delimiters which can be unambiguously parsed:

    --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'

(Note that to prevent the shell from mangling the enclosing character, we have enclosed that argument itself in single-quotes.)

The result of the above arguments applied to the above dataset would be:

    "Some string, with a comma.","1","2","3"...
    "Another \"string with quotes\"","4","5","6"...

Here the imported strings are shown in the context of additional columns ("1","2","3", etc.) to demonstrate the full effect of enclosing and escaping. The enclosing character is only strictly necessary when delimiter characters appear in the imported text. The enclosing character can therefore be specified as optional (--optionally-enclosed-by), which would result in the following import:

    "Some string, with a comma.",1,2,3...
    "Another \"string with quotes\"",4,5,6...
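The delimiter arguments can be combined on one full command line; a sketch (the connect string and table are illustrative, and the single quotes protect the enclosing character from the shell):

```shell
sqoop import --connect jdbc:mysql://db.example.com/corp --table SOMETABLE \
    --fields-terminated-by , --escaped-by \\ --optionally-enclosed-by '\"'
```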
Note

Even though Hive supports escaping characters, it does not handle escaping of the newline character. Also, it does not support the notion of enclosing characters that may include field delimiters in the enclosed string. It is therefore recommended that you choose unambiguous field- and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is due to limitations of Hive's input parsing abilities.

The --mysql-delimiters argument is a shorthand argument which uses the default delimiters for the mysqldump program. If you use the mysqldump delimiters in conjunction with a direct-mode import (with --direct), very fast imports can be achieved.

While the choice of delimiters is most important for a text-mode import, it is still relevant if you import to SequenceFiles with --as-sequencefile. The generated class' toString() method will use the delimiters you specify, so subsequent formatting of the output data will rely on the delimiters you choose.
Table 6. Input parsing arguments:

Argument                                 Description
--input-enclosed-by <char>               Sets a required field encloser
--input-escaped-by <char>                Sets the input escape character
--input-fields-terminated-by <char>      Sets the input field separator
--input-lines-terminated-by <char>       Sets the input end-of-line character
--input-optionally-enclosed-by <char>    Sets a field enclosing character

When Sqoop imports data to HDFS, it generates a Java class which can reinterpret the text files that it creates when doing a delimited-format import. The delimiters are chosen with arguments such as --fields-terminated-by; this controls both how the data is written to disk, and how the generated parse() method reinterprets this data. The delimiters used by the parse() method can be chosen independently of the output arguments, by using --input-fields-terminated-by, and so on. This is useful, for example, to generate classes which can parse records created with one set of delimiters, and emit the records to a different set of files using a separate set of delimiters.
Table 7. Hive arguments:

Argument                         Description
--hive-home <dir>                Override $HIVE_HOME
--hive-import                    Import tables into Hive (Uses Hive's default delimiters if none are set.)
--hive-overwrite                 Overwrite existing data in the Hive table.
--create-hive-table              If set, then the job will fail if the target hive table exists. By default this property is false.
--hive-table <table-name>        Sets the table name to use when importing to Hive.
--hive-delims-replacement        Replace \n, \r, and \01 from string fields with a user-defined string when importing to Hive.
--hive-partition-key             Name of the hive field the partition is sharded on
--hive-partition-value <v>       String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>          Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims        Drops \n, \r, and \01 from string fields when importing to Hive.
7.2.10. Importing Data Into Hive

Sqoop's import tool's main function is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data's layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line.
IftheHivetablealreadyexists,youcanspecifytheoptiontoindicatethatexistingtable
inhivemustbereplaced.AfteryourdataisimportedintoHDFSorthisstepisomitted,Sqoopwill
generateaHivescriptcontainingaoperationdefiningyourcolumnsusingHivestypes,and
astatementtomovethedatafilesintoHiveswarehousedirectory.
ThescriptwillbeexecutedbycallingtheinstalledcopyofhiveonthemachinewhereSqoopisrun.If
youhavemultipleHiveinstallations,orisnotinyour,usetheoptiontoidentifythe
Hiveinstallationdirectory.Sqoopwillusefromhere.
Note
Thisfunctionisincompatiblewithand.
EventhoughHivesupportsescapingcharacters,itdoesnothandleescapingofnewlinecharacter.
Also,itdoesnotsupportthenotionofenclosingcharactersthatmayincludefielddelimitersinthe
enclosedstring.Itisthereforerecommendedthatyouchooseunambiguousfieldandrecord
terminatingdelimiterswithoutthehelpofescapingandenclosingcharacterswhenworkingwithHive
thisisduetolimitationsofHivesinputparsingabilities.Ifyoudouse,,or
whenimportingdataintoHive,Sqoopwillprintawarningmessage.
HivewillhaveproblemsusingSqoopimporteddataifyourdatabasesrowscontainstringfieldsthat
haveHivesdefaultrowdelimiters(andcharacters)orcolumndelimiters(characters)present
inthem.Youcanusetheoptiontodropthosecharactersonimporttogive
Hivecompatibletextdata.Alternatively,youcanusetheoptiontoreplace
thosecharacterswithauserdefinedstringonimporttogiveHivecompatibletextdata.Theseoptions
shouldonlybeusedifyouuseHivesdefaultdelimitersandshouldnotbeusedifdifferentdelimiters
arespecified.
Sqoop will pass the field and record delimiters through to Hive. If you do not set any delimiters and do use --hive-import, the field delimiter will be set to ^A and the record delimiter will be set to \n to be consistent with Hive's defaults.
The table name used in Hive is, by default, the same as that of the source table. You can control the output table name with the --hive-table option.
Hive can put data into partitions for more efficient query performance. You can tell a Sqoop job to import data for Hive into a particular partition by specifying the --hive-partition-key and --hive-partition-value arguments. The partition value must be a string. Please see the Hive documentation for more details on partitioning.
You can import compressed tables into Hive using the --compress and --compression-codec options. One downside to compressing tables imported into Hive is that many codecs cannot be split for processing by parallel map tasks. The lzop codec, however, does support splitting. When importing tables with this codec, Sqoop will automatically index the files for splitting and configure a new Hive table with the correct InputFormat. This feature currently requires that all partitions of a table be compressed with the lzop codec.
Table 8. HBase arguments:
Argument                    Description
--column-family <family>    Sets the target column family for the import
--hbase-create-table        If specified, create missing HBase tables
--hbase-row-key <col>       Specifies which input column to use as the row key
--hbase-table <table-name>  Specifies an HBase table to use as the target instead of HDFS
7.2.11. Importing Data Into HBase
Sqoop supports additional import targets beyond HDFS and Hive. Sqoop can also import records into a table in HBase.
By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS. Sqoop will import data to the table specified as the argument to --hbase-table. Each row of the
input table will be transformed into an HBase Put operation to a row of the output table. The key for each row is taken from a column of the input. By default Sqoop will use the split-by column as the row key column. If that is not specified, it will try to identify the primary key column, if any, of the source table. You can manually specify the row key column with --hbase-row-key. Each output column will be placed in the same column family, which must be specified with --column-family.
Note
This function is incompatible with direct import (parameter --direct).
If the target table and column family do not exist, the Sqoop job will exit with an error. You should create the target table and column family before running an import. If you specify --hbase-create-table, Sqoop will create the target table and column family if they do not exist, using the default parameters from your HBase configuration.
Sqoop currently serializes all values to HBase by converting each field to its string representation (as if you were importing to HDFS in text mode), and then inserts the UTF-8 bytes of this string in the target cell.
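As an illustration of the options above, a minimal HBase-targeted import might look like the following (the host, database, table, column, and column-family names are placeholders, not from the original text):

```shell
# Import a table into the HBase table "emp_hbase", creating it if absent;
# use the "id" column as the row key and store all columns under family "cf1".
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table EMPLOYEES \
  --hbase-table emp_hbase \
  --column-family cf1 \
  --hbase-row-key id \
  --hbase-create-table
```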
Table 9. Code generation arguments:
Argument               Description
--bindir <dir>         Output directory for compiled objects
--class-name <name>    Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>      Disable code generation; use specified jar
--outdir <dir>         Output directory for generated code
--package-name <name>  Put auto-generated classes in this package
--map-column-java <m>  Override default mapping from SQL type to Java type for configured columns.
As mentioned earlier, a by-product of importing a table to HDFS is a class which can manipulate the imported data. If the data is stored in SequenceFiles, this class will be used for the data's serialization container. Therefore, you should use this class in your subsequent MapReduce processing of the data.
The class is typically named after the table; a table named foo will generate a class named foo. You may want to override this class name. For example, if your table is named EMPLOYEES, you may want to specify --class-name Employee instead. Similarly, you can specify just the package name with --package-name; an import run with --package-name com.foocorp will place the generated class in that package.
The .java source file for your class will be written to the current working directory when you run sqoop import. You can control the output directory with --outdir. For example, --outdir src/generated/.
The import process compiles the source into .class and .jar files; these are ordinarily stored under /tmp. You can select an alternate target directory with --bindir. For example, --bindir /scratch.
If you already have a compiled class that can be used to perform the import and want to suppress the code-generation aspect of the import process, you can use an existing jar and class by providing the --jar-file and --class-name options. Such a command loads the specified class out of the given jar.
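A sketch of reusing previously generated code (the jar, class, and table names are illustrative placeholders):

```shell
# Suppress code generation and reuse a class from an existing jar.
sqoop import \
  --connect jdbc:mysql://db.example.com/corp \
  --table SomeTable \
  --jar-file mydatatypes.jar \
  --class-name SomeTableType
```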
7.3. Example Invocations
The following examples illustrate how to use the import tool in a variety of situations.
A basic import of a table:
A basic import requiring a login:
Selecting specific columns from a table:
Controlling the import parallelism (using 8 parallel tasks):
Enabling the MySQL "direct mode" fast path:
Storing data in SequenceFiles, and setting the generated class name:
Specifying the delimiters to use in a text-mode import:
Importing the data to Hive:
Importing only new employees:
Changing the splitting column from the default:
Verifying that an import was successful:
Performing an incremental import of new data, after having already imported the first 100,000 rows of a table:
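The command lines for the scenarios above were lost in extraction; the following are illustrative reconstructions (hostnames, database, table, and column names are placeholders):

```shell
# A basic import of a table:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES

# A basic import requiring a login (prompts for the password):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --username SomeUser -P

# Selecting specific columns:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title"

# Using 8 parallel tasks:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES -m 8

# MySQL "direct mode" fast path:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --direct

# SequenceFiles with a custom generated class name:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --class-name com.foocorp.Employee --as-sequencefile

# Custom text-mode delimiters:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --fields-terminated-by '\t' --lines-terminated-by '\n'

# Importing to Hive:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES --hive-import

# Only rows matching a condition (e.g., new employees):
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --where "start_date > '2010-01-01'"

# Changing the splitting column:
sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --split-by dept_id

# Verifying an import (lists the imported files in HDFS):
hadoop fs -ls EMPLOYEES

# Appending rows newer than those already imported:
sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
    --where "id > 100000" --target-dir /incremental_dataset --append
```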
8. sqoop-import-all-tables
8.1. Purpose
8.2. Syntax
8.3. Example Invocations
8.1. Purpose
The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.
For the import-all-tables tool to be useful, the following conditions must be met:
Each table must have a single-column primary key.
You must intend to import all columns of each table.
You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.
8.2. Syntax
Although the Hadoop generic arguments must precede any import arguments, the import arguments can be entered in any order with respect to one another.
Table 10. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
Table 11. Import control arguments:
Argument                  Description
--as-avrodatafile         Imports data to Avro Data Files
--as-sequencefile         Imports data to SequenceFiles
--as-textfile             Imports data as plain text (default)
--direct                  Use direct import fast path
--direct-split-size <n>   Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>    Set the maximum size for an inline LOB
-m,--num-mappers <n>      Use n map tasks to import in parallel
--warehouse-dir <dir>     HDFS parent for table destination
-z,--compress             Enable compression
--compression-codec <c>   Use Hadoop codec (default gzip)
These arguments behave in the same manner as they do when used for the sqoop-import tool, but the --table, --split-by, --columns, and --where arguments are invalid for sqoop-import-all-tables.
Table 12. Output line formatting arguments:
Argument                         Description
--enclosed-by <char>             Sets a required field enclosing character
--escaped-by <char>              Sets the escape character
--fields-terminated-by <char>    Sets the field separator character
--lines-terminated-by <char>     Sets the end-of-line character
--mysql-delimiters               Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally enclosed-by: '
--optionally-enclosed-by <char>  Sets a field enclosing character
Table 13. Input parsing arguments:
Argument                               Description
--input-enclosed-by <char>             Sets a required field encloser
--input-escaped-by <char>              Sets the input escape character
--input-fields-terminated-by <char>    Sets the input field separator
--input-lines-terminated-by <char>     Sets the input end-of-line character
--input-optionally-enclosed-by <char>  Sets a field enclosing character
Table 14. Hive arguments:
Argument                       Description
--hive-home <dir>              Override $HIVE_HOME
--hive-import                  Import tables into Hive (uses Hive's default delimiters if none are set.)
--hive-overwrite               Overwrite existing data in the Hive table.
--create-hive-table            If set, then the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>      Sets the table name to use when importing to Hive.
--hive-delims-replacement <s>  Replace \n, \r, and \01 in string fields with a user-defined string when importing to Hive.
--hive-partition-key <key>     Name of the Hive partition field to shard data on.
--hive-partition-value <v>     String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>        Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims      Drops \n, \r, and \01 from string fields when importing to Hive.
Table 15. Code generation arguments:
Argument               Description
--bindir <dir>         Output directory for compiled objects
--jar-file <file>      Disable code generation; use specified jar
--outdir <dir>         Output directory for generated code
--package-name <name>  Put auto-generated classes in this package
The import-all-tables tool does not support the --class-name argument. You may, however, specify a package with --package-name within which all generated classes will be placed.
8.3. Example Invocations
Import all tables from a database:
Verifying that it worked:
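Illustrative commands for the two steps above (the database name and host are placeholders):

```shell
# Import every table from the "corp" database into per-table HDFS directories.
sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp

# List the resulting per-table directories in HDFS.
hadoop fs -ls
```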
9. sqoop-export
9.1. Purpose
9.2. Syntax
9.3. Inserts vs. Updates
9.4. Exports and Transactions
9.5. Failed Exports
9.6. Example Invocations
9.1. Purpose
The export tool exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters.
The default operation is to transform these into a set of INSERT statements that inject the records into the database. In "update mode," Sqoop will generate UPDATE statements that replace existing records in the database.
9.2. Syntax
Although the Hadoop generic arguments must precede any export arguments, the export arguments can be entered in any order with respect to one another.
Table 16. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
Table 17. Export control arguments:
Argument                     Description
--direct                     Use direct export fast path
--export-dir <dir>           HDFS source path for the export
-m,--num-mappers <n>         Use n map tasks to export in parallel
--table <table-name>         Table to populate
--update-key <col-name>      Anchor column to use for updates. Use a comma-separated list of columns if there are more than one column.
--update-mode <mode>         Specify how updates are performed when new rows are found with non-matching keys in the database. Legal values for mode include updateonly (default) and allowinsert.
--input-null-string <s>      The string to be interpreted as null for string columns
--staging-table <name>       The table in which data will be staged before being inserted into the destination table.
--clear-staging-table        Indicates that any data present in the staging table can be deleted.
--batch                      Use batch mode for underlying statement execution.
--input-null-non-string <s>  The string to be interpreted as null for non-string columns
The --table and --export-dir arguments are required. These specify the table to populate in the database, and the directory in HDFS that contains the source data.
You can control the number of mappers independently from the number of files present in the directory. Export performance depends on the degree of parallelism. By default, Sqoop will use four tasks in parallel for the export process. This may not be optimal; you will need to experiment with your own particular setup. Additional tasks may offer better concurrency, but if the database is already bottlenecked on updating indices, invoking triggers, and so on, then additional load may decrease performance. The -m or --num-mappers arguments control the number of map tasks, which is the degree of parallelism used.
MySQL provides a direct mode for exports as well, using the mysqlimport tool. When exporting to MySQL, use the --direct argument to specify this code path. This may be higher-performance than the standard JDBC code path.
Note
When using export in direct mode with MySQL, the MySQL bulk utility mysqlimport must be available in the shell path of the task process.
The --input-null-string and --input-null-non-string arguments are optional. If --input-null-string is not specified, then the string "null" will be interpreted as null for string-type columns. If --input-null-non-string is not specified, then both the string "null" and the empty string will be interpreted as null for non-string columns. Note that the empty string will always be interpreted as null for non-string columns, in addition to any other string specified by --input-null-non-string.
Since Sqoop breaks down the export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database. This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others. You can overcome this problem by specifying a staging table via the --staging-table option, which acts as an auxiliary table used to stage exported data. The staged data is finally moved to the destination table in a single transaction.
In order to use the staging facility, you must create the staging table prior to running the export job. This table must be structurally identical to the target table. This table should either be empty before the export job runs, or the --clear-staging-table option must be specified. If the staging table contains data and the --clear-staging-table option is specified, Sqoop will delete all of the data before starting the export job.
Note
Support for staging data prior to pushing it into the destination table is not available for --direct exports. It is also not available when export is invoked using the --update-key option for updating existing data.
9.3. Inserts vs. Updates
By default, sqoop-export appends new rows to a table; each input record is transformed into an INSERT statement that adds a row to the target database table. If your table has constraints (e.g., a primary key column whose values must be unique) and already contains data, you must take care to avoid inserting records that violate these constraints. The export process will fail if an INSERT statement fails. This mode is primarily intended for exporting records to a new, empty table intended to receive these results.
If you specify the --update-key argument, Sqoop will instead modify an existing dataset in the database. Each input record is treated as an UPDATE statement that modifies an existing row. The row a statement modifies is determined by the column name(s) specified with --update-key. For example, given a table with a primary key column and a dataset in HDFS containing delimited records for that table, running an export with --update-key naming that column will run an export job that executes one UPDATE statement per record.
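A worked sketch of this update-mode behavior (the table, column, host, and path names are illustrative placeholders):

```shell
# Hypothetical target table:
#   CREATE TABLE foo(
#       id INT NOT NULL PRIMARY KEY,
#       msg VARCHAR(32),
#       bar INT);
#
# Hypothetical records in HDFS:
#   0,this is a test,42
#   1,some more data,100
#
# Export in update mode, keyed on "id":
sqoop export --connect jdbc:mysql://db.example.com/foo --table foo \
    --update-key id --export-dir /results/foo_data

# Sqoop then issues statements of the form:
#   UPDATE foo SET msg='this is a test', bar=42 WHERE id=0;
#   UPDATE foo SET msg='some more data', bar=100 WHERE id=1;
```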
If an UPDATE statement modifies no rows, this is not considered an error; the export will silently continue. (In effect, this means that an update-based export will not insert new rows into the database.) Likewise, if the column specified with --update-key does not uniquely identify rows and multiple rows are updated by a single statement, this condition is also undetected.
The --update-key argument can also be given a comma-separated list of column names, in which case Sqoop will match all keys from this list before updating any existing record.
Depending on the target database, you may also specify the --update-mode argument with allowinsert mode if you want to update rows if they exist in the database already, or insert rows if they do not exist yet.
Table 18. Input parsing arguments:
Argument                               Description
--input-enclosed-by <char>             Sets a required field encloser
--input-escaped-by <char>              Sets the input escape character
--input-fields-terminated-by <char>    Sets the input field separator
--input-lines-terminated-by <char>     Sets the input end-of-line character
--input-optionally-enclosed-by <char>  Sets a field enclosing character
Table 19. Output line formatting arguments:
Argument                         Description
--enclosed-by <char>             Sets a required field enclosing character
--escaped-by <char>              Sets the escape character
--fields-terminated-by <char>    Sets the field separator character
--lines-terminated-by <char>     Sets the end-of-line character
--mysql-delimiters               Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally enclosed-by: '
--optionally-enclosed-by <char>  Sets a field enclosing character
Sqoop automatically generates code to parse and interpret records of the files containing the data to be exported back to the database. If these files were created with non-default delimiters (other than comma-separated fields with newline-separated records), you should specify the same delimiters again so that Sqoop can parse your files.
If you specify incorrect delimiters, Sqoop will fail to find enough columns per line. This will cause export map tasks to fail by throwing ParseExceptions.
Table 20. Code generation arguments:
Argument               Description
--bindir <dir>         Output directory for compiled objects
--class-name <name>    Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>      Disable code generation; use specified jar
--outdir <dir>         Output directory for generated code
--package-name <name>  Put auto-generated classes in this package
--map-column-java <m>  Override default mapping from SQL type to Java type for configured columns.
If the records to be exported were generated as the result of a previous import, then the original generated class can be used to read the data back. Specifying --jar-file and --class-name obviates the need to specify delimiters in this case.
The use of existing generated code is incompatible with --update-key; an update-mode export requires new code generation to perform the update. You cannot use --jar-file, and must fully specify any non-default delimiters.
9.4. Exports and Transactions
Exports are performed by multiple writers in parallel. Each writer uses a separate connection to the database; these have separate transactions from one another. Sqoop uses the multi-row INSERT syntax to insert up to 100 records per statement. Every 100 statements, the current transaction within a writer task is committed, causing a commit every 10,000 rows. This ensures that transaction buffers do not grow without bound and cause out-of-memory conditions. Therefore, an export is not an atomic process. Partial results from the export will become visible before the export is complete.
9.5. Failed Exports
Exports may fail for a number of reasons:
Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes)
Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value)
Attempting to parse an incomplete or malformed record from the HDFS source data
Attempting to parse records using incorrect delimiters
Capacity issues (such as insufficient RAM or disk space)
If an export map task fails due to these or other reasons, it will cause the export job to fail. The results of a failed export are undefined. Each export map task operates in a separate transaction. Furthermore, individual map tasks commit their current transaction periodically. If a task fails, the current transaction will be rolled back. Any previously committed transactions will remain durable in the database, leading to a partially complete export.
9.6. Example Invocations
A basic export to populate a table:
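An illustrative command for this scenario (the host, database, table, and path names are placeholders):

```shell
# Export the files under /results/bar_data into the "bar" table.
sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --export-dir /results/bar_data
```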
This example takes the files in the export directory and injects their contents into the target table in the database on the specified server. The target table must already exist in the database. Sqoop performs a set of INSERT INTO operations, without regard for existing content. If Sqoop attempts to insert rows which violate constraints in the database (for example, a particular primary key value already exists), then the export fails.
10. Saved Jobs
Imports and exports can be repeatedly performed by issuing the same command multiple times. Especially when using the incremental import capability, this is an expected scenario.
Sqoop allows you to define saved jobs which make this process easier. A saved job records the configuration information required to execute a Sqoop command at a later time. The section on the sqoop-job tool describes how to create and work with saved jobs.
By default, job descriptions are saved to a private repository stored in $HOME/.sqoop/. You can configure Sqoop to instead use a shared metastore, which makes saved jobs available to multiple users across a shared cluster. Starting the metastore is covered by the section on the sqoop-metastore tool.
11. sqoop-job
11.1. Purpose
11.2. Syntax
11.3. Saved jobs and passwords
11.4. Saved jobs and incremental imports
11.1. Purpose
The job tool allows you to create and work with saved jobs. Saved jobs remember the parameters used to specify a job, so they can be re-executed by invoking the job by its handle.
If a saved job is configured to perform an incremental import, state regarding the most recently imported rows is updated in the saved job to allow the job to continually import only the newest rows.
11.2. Syntax
Although the Hadoop generic arguments must precede any job arguments, the job arguments can be entered in any order with respect to one another.
Table 21. Job management options:
Argument           Description
--create <job-id>  Define a new saved job with the specified job-id (name). A second Sqoop command line, separated by a --, should be specified; this defines the saved job.
--delete <job-id>  Delete a saved job.
--list             List all saved jobs
--exec <job-id>    Given a job defined with --create, run the saved job.
--show <job-id>    Show the parameters for a saved job.
Creating saved jobs is done with the --create action. This operation requires a -- followed by a tool name and its arguments. The tool and its arguments will form the basis of the saved job. Consider an example where a job is created to run an import of a table.
This creates a job which can be executed later; the job is not run at creation time. The job is now available in the list of saved jobs shown by --list.
We can inspect the configuration of a job with the --show action, and if we are satisfied with it, we can run the job with --exec.
The --exec action allows you to override arguments of the saved job by supplying them after a --. For example, if the database were changed to require a username, we could specify the username and password on the --exec command line.
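An illustrative session covering the steps above (the job, host, and table names are placeholders):

```shell
# Define a saved import job named "myjob" (the job is not run here):
sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable

# List, inspect, and run the saved job:
sqoop job --list
sqoop job --show myjob
sqoop job --exec myjob

# Override saved arguments at execution time:
sqoop job --exec myjob -- --username someuser -P
```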
Table 22. Metastore connection options:
Argument                   Description
--meta-connect <jdbc-uri>  Specifies the JDBC connect string used to connect to the metastore
By default, a private metastore is instantiated in $HOME/.sqoop. If you have configured a hosted metastore with the sqoop-metastore tool, you can connect to it by specifying the --meta-connect argument. This is a JDBC connect string just like the ones used to connect to databases for import.
In conf/sqoop-site.xml, you can configure sqoop.metastore.client.autoconnect.url with this address, so you do not have to supply --meta-connect to use a remote metastore. This parameter can also be modified to move the private metastore to a location on your filesystem other than your home directory.
If you configure sqoop.metastore.client.enable.autoconnect with the value false, then you must explicitly supply --meta-connect.
Table 23. Common options:
Argument   Description
--help     Print usage instructions
--verbose  Print more information while working
11.3. Saved jobs and passwords
The Sqoop metastore is not a secure resource. Multiple users can access its contents. For this reason, Sqoop does not store passwords in the metastore. If you create a job that requires a password, you will be prompted for that password each time you execute the job.
You can enable passwords in the metastore by setting sqoop.metastore.client.record.password to true in the configuration.
Note that you have to set sqoop.metastore.client.record.password to true if you are executing saved jobs via
Oozie, because Sqoop cannot prompt the user to enter passwords while being executed as Oozie tasks.
11.4. Saved jobs and incremental imports
Incremental imports are performed by comparing the values in a check column against a reference value for the most recent import. For example, if the --incremental append argument was specified, along with --check-column id and --last-value 100, all rows with id > 100 will be imported. If an incremental import is run from the command line, the value which should be specified as --last-value in a subsequent incremental import will be printed to the screen for your reference. If an incremental import is run from a saved job, this value will be retained in the saved job. Subsequent runs of sqoop job --exec will continue to import only newer rows than those previously imported.
12. sqoop-metastore
12.1. Purpose
12.2. Syntax
12.1. Purpose
The metastore tool configures Sqoop to host a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) defined in this metastore.
Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.
12.2. Syntax
Although the Hadoop generic arguments must precede any metastore arguments, the metastore arguments can be entered in any order with respect to one another.
Table 24. Metastore management options:
Argument    Description
--shutdown  Shuts down a running metastore instance on the same machine.
Running sqoop metastore launches a shared HSQLDB database instance on the current machine. Clients can connect to this metastore and create jobs which can be shared between users for execution.
The location of the metastore's files on disk is controlled by the sqoop.metastore.server.location property in conf/sqoop-site.xml. This should point to a directory on the local filesystem.
The metastore is available over TCP/IP. The port is controlled by the sqoop.metastore.server.port configuration parameter, and defaults to 16000.
Clients should connect to the metastore by specifying --meta-connect or sqoop.metastore.client.autoconnect.url with the value jdbc:hsqldb:hsql://<server-name>:<port>/sqoop. For example, jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop.
This metastore may be hosted on a machine within the Hadoop cluster, or elsewhere on the network.
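As a sketch, starting the metastore and pointing a client at it might look like this (the hostnames, job name, and table name are placeholders):

```shell
# On the metastore host: launch the shared HSQLDB-backed metastore.
sqoop metastore

# On a client: create a saved job in the remote metastore.
sqoop job --create myjob \
    --meta-connect jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop \
    -- import --connect jdbc:mysql://db.example.com/corp --table EMPLOYEES
```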
13. sqoop-merge
13.1. Purpose
13.2. Syntax
13.1. Purpose
The merge tool allows you to combine two datasets where entries in one dataset should overwrite entries of an older dataset. For example, an incremental import run in last-modified mode will generate multiple datasets in HDFS where successively newer data appears in each dataset. The merge tool will "flatten" two datasets into one, taking the newest available records for each primary key.
13.2. Syntax
Although the Hadoop generic arguments must precede any merge arguments, the merge arguments can be entered in any order with respect to one another.
Table 25. Merge options:
Argument              Description
--class-name <class>  Specify the name of the record-specific class to use during the merge job.
--jar-file <file>     Specify the name of the jar to load the record class from.
--merge-key <col>     Specify the name of a column to use as the merge key.
--new-data <path>     Specify the path of the newer dataset.
--onto <path>         Specify the path of the older dataset.
--target-dir <path>   Specify the target path for the output of the merge job.
The merge tool runs a MapReduce job that takes two directories as input: a newer dataset, and an older one. These are specified with --new-data and --onto respectively. The output of the MapReduce job will be placed in the directory in HDFS specified by --target-dir.
When merging the datasets, it is assumed that there is a unique primary key value in each record. The column for the primary key is specified with --merge-key. Multiple rows in the same dataset should not have the same primary key, or else data loss may occur.
To parse the dataset and extract the key column, the auto-generated class from a previous import must be used. You should specify the class name and jar file with --class-name and --jar-file. If this is not available, you can recreate the class using the codegen tool.
The merge tool is typically run after an incremental import with the date-last-modified mode (--incremental lastmodified).
Supposing two incremental imports were performed, where some older data is in an HDFS directory named older and newer data is in an HDFS directory named newer, these could be merged like so:
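An illustrative merge invocation (the jar, class, column, and directory names are placeholders):

```shell
# Merge the "newer" dataset onto the "older" one, joining on the "id" column.
sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id
```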
This would run a MapReduce job where the value in the merge-key column of each row is used to join rows; rows in the newer dataset will be used in preference to rows in the older dataset.
This can be used with SequenceFile-, Avro-, and text-based incremental imports. The file types of the newer and older datasets must be the same.
14. sqoop-codegen
14.1. Purpose
14.2. Syntax
14.3. Example Invocations
14.1. Purpose
The codegen tool generates Java classes which encapsulate and interpret imported records. The Java definition of a record is instantiated as part of the import process, but can also be performed separately. For example, if Java source is lost, it can be recreated. New versions of a class can be created which use different delimiters between fields, and so on.
14.2. Syntax
Although the Hadoop generic arguments must precede any codegen arguments, the codegen arguments can be entered in any order with respect to one another.
Table 26. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
Table 27. Code generation arguments:
Argument               Description
--bindir <dir>         Output directory for compiled objects
--class-name <name>    Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>      Disable code generation; use specified jar
--outdir <dir>         Output directory for generated code
--package-name <name>  Put auto-generated classes in this package
--map-column-java <m>  Override default mapping from SQL type to Java type for configured columns.
Table 28. Output line formatting arguments:
Argument                         Description
--enclosed-by <char>             Sets a required field enclosing character
--escaped-by <char>              Sets the escape character
--fields-terminated-by <char>    Sets the field separator character
--lines-terminated-by <char>     Sets the end-of-line character
--mysql-delimiters               Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally enclosed-by: '
--optionally-enclosed-by <char>  Sets a field enclosing character
Table 29. Input parsing arguments:
Argument                               Description
--input-enclosed-by <char>             Sets a required field encloser
--input-escaped-by <char>              Sets the input escape character
--input-fields-terminated-by <char>    Sets the input field separator
--input-lines-terminated-by <char>     Sets the input end-of-line character
--input-optionally-enclosed-by <char>  Sets a field enclosing character
Table 30. Hive arguments:
Argument                       Description
--hive-home <dir>              Override $HIVE_HOME
--hive-import                  Import tables into Hive (uses Hive's default delimiters if none are set.)
--hive-overwrite               Overwrite existing data in the Hive table.
--create-hive-table            If set, then the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>      Sets the table name to use when importing to Hive.
--hive-delims-replacement <s>  Replace \n, \r, and \01 in string fields with a user-defined string when importing to Hive.
--hive-partition-key <key>     Name of the Hive partition field to shard data on.
--hive-partition-value <v>     String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>        Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims      Drops \n, \r, and \01 from string fields when importing to Hive.
If Hive arguments are provided to the code generation tool, Sqoop generates a file containing the HQL statements to create a table and load data.
14.3. Example Invocations
Recreate the record interpretation code for a table of a corporate database:
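An illustrative codegen invocation (the host, database, and table names are placeholders):

```shell
# Regenerate the record class for the "employees" table.
sqoop codegen --connect jdbc:mysql://db.example.com/corp --table employees
```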
15. sqoop-create-hive-table
15.1. Purpose
15.2. Syntax
15.3. Example Invocations
15.1. Purpose
The create-hive-table tool populates a Hive metastore with a definition for a table based on a database table previously imported to HDFS, or one planned to be imported. This effectively performs the "--hive-import" step of sqoop-import without running the preceding import.
If data was already loaded to HDFS, you can use this tool to finish the pipeline of importing the data to Hive. You can also create Hive tables with this tool; data then can be imported and populated into the target after a pre-processing step run by the user.
15.2. Syntax
Although the Hadoop generic arguments must precede any create-hive-table arguments, the create-hive-table arguments can be entered in any order with respect to one another.
Table 31. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
Table 32. Hive arguments:
Argument                   Description
--hive-home <dir>          Override $HIVE_HOME
--hive-overwrite           Overwrite existing data in the Hive table.
--create-hive-table        If set, then the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>  Sets the table name to use when importing to Hive.
--table <table-name>       The database table to read the definition from.
Table 33. Output line formatting arguments:
Argument                         Description
--enclosed-by <char>             Sets a required field enclosing character
--escaped-by <char>              Sets the escape character
--fields-terminated-by <char>    Sets the field separator character
--lines-terminated-by <char>     Sets the end-of-line character
--mysql-delimiters               Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally enclosed-by: '
--optionally-enclosed-by <char>  Sets a field enclosing character
Do not use enclosed-by or escaped-by delimiters with output formatting arguments used to import to Hive. Hive cannot currently parse them.
15.3. Example Invocations
Define in Hive a table with a definition based on a database table:
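An illustrative invocation (the host, database, and table names are placeholders):

```shell
# Create a Hive table "emps" using the schema of the database table "employees".
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps
```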
16. sqoop-eval
16.1. Purpose
16.2. Syntax
16.3. Example Invocations
16.1. Purpose
The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.
16.2. Syntax
Although the Hadoop generic arguments must precede any eval arguments, the eval arguments can be entered in any order with respect to one another.
Table 34. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
Table 35. SQL evaluation arguments:
Argument                Description
-e,--query <statement>  Execute statement in SQL.
16.3. Example Invocations
Select ten records from a table:
Insert a row into a table:
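Illustrative commands for the two scenarios above (the host, database, and table names are placeholders):

```shell
# Select ten records from the "employees" table:
sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM employees LIMIT 10"

# Insert a row into the "foo" table:
sqoop eval --connect jdbc:mysql://db.example.com/corp \
    -e "INSERT INTO foo VALUES(42, 'bar')"
```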
17. sqoop-list-databases
17.1. Purpose
17.2. Syntax
17.3. Example Invocations
17.1. Purpose
List database schemas present on a server.
17.2. Syntax
Although the Hadoop generic arguments must precede any list-databases arguments, the list-databases arguments can be entered in any order with respect to one another.
Table 36. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
17.3. Example Invocations
List database schemas available on a MySQL server:
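An illustrative command (the hostname is a placeholder):

```shell
# Enumerate the schemas visible through this connection.
sqoop list-databases --connect jdbc:mysql://database.example.com/
```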
Note
This only works with HSQLDB, MySQL and Oracle. When using with Oracle, it is necessary that the user connecting to the database has DBA privileges.
18. sqoop-list-tables
18.1. Purpose
18.2. Syntax
18.3. Example Invocations
18.1. Purpose
List tables present in a database.
18.2. Syntax
Although the Hadoop generic arguments must precede any list-tables arguments, the list-tables arguments can be entered in any order with respect to one another.
Table 37. Common arguments
Argument                           Description
--connect <jdbc-uri>               Specify JDBC connect string
--connection-manager <class-name>  Specify connection manager class to use
--driver <class-name>              Manually specify JDBC driver class to use
--hadoop-home <dir>                Override $HADOOP_HOME
--help                             Print usage instructions
-P                                 Read password from console
--password <password>              Set authentication password
--username <username>              Set authentication username
--verbose                          Print more information while working
--connection-param-file <file>     Optional properties file that provides connection parameters
18.3. Example Invocations
List tables available in the "corp" database:
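An illustrative command (the hostname is a placeholder):

```shell
# List the tables in the "corp" database.
sqoop list-tables --connect jdbc:mysql://database.example.com/corp
```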
19. sqoop-help
19.1. Purpose
19.2. Syntax
19.3. Example Invocations
19.1. Purpose
List tools available in Sqoop and explain their usage.
19.2. Syntax
If no tool name is provided (for example, the user runs sqoop help), then the available tools are listed. With a tool name, the usage instructions for that specific tool are presented on the console.
19.3. Example Invocations
List available tools:
Display usage instructions for the import tool:
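Illustrative commands for the two scenarios above:

```shell
# List all available tools:
sqoop help

# Show usage instructions for a specific tool:
sqoop help import
```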
20. sqoop-version
20.1. Purpose
20.2. Syntax
20.3. Example Invocations
20.1. Purpose
Display version information for Sqoop.
20.2. Syntax
20.3. Example Invocations
Display the version:
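An illustrative invocation:

```shell
# Print Sqoop's version and build information.
sqoop version
```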
21. Compatibility Notes
21.1. Supported Databases
21.2. MySQL
21.2.1. zeroDateTimeBehavior
21.2.2. UNSIGNED columns
21.2.3. BLOB and CLOB columns
21.2.4. Importing views in direct mode
21.2.5. Direct-mode Transactions
21.3. PostgreSQL
21.3.1. Importing views in direct mode
21.4. Oracle
21.4.1. Dates and Times
21.5. Schema Definition in Hive
Sqoop uses JDBC to connect to databases and adheres to published standards as much as possible. For databases which do not support standards-compliant SQL, Sqoop uses alternate code paths to provide functionality. In general, Sqoop is believed to be compatible with a large number of databases, but it is tested with only a few.
Nonetheless, several database-specific decisions were made in the implementation of Sqoop, and some databases offer additional settings which are extensions to the standard.
This section describes the databases tested with Sqoop, any exceptions in Sqoop's handling of each database relative to the norm, and any database-specific settings available in Sqoop.
21.1. Supported Databases
While JDBC is a compatibility layer that allows a program to access many different databases through a common API, slight differences in the SQL language spoken by each database may mean that Sqoop can't use every database out of the box, or that some databases may be used in an inefficient manner.
When you provide a connect string to Sqoop, it inspects the protocol scheme to determine appropriate vendor-specific logic to use. If Sqoop knows about a given database, it will work automatically. If not, you may need to specify the driver class to load via --driver. This will use a generic code path which will use standard SQL to access the database. Sqoop provides some databases with faster, non-JDBC-based access mechanisms. These can be enabled by specifying the --direct parameter.
Sqoop includes vendor-specific support for the following databases:
Database    version  --direct support?
HSQLDB      1.8.0+   No
MySQL       5.0+     Yes
Oracle      10.2.0+  No
PostgreSQL  8.3+     Yes (import only)
Sqoop may work with older versions of the databases listed, but we have only tested it with the versions specified above.
Even if Sqoop supports a database internally, you may still need to install the database vendor's JDBC driver in your $SQOOP_HOME/lib path on your client. Sqoop can load classes from any jars in $SQOOP_HOME/lib on the client and will use them as part of any MapReduce jobs it runs; unlike older versions, you no longer need to install JDBC jars in the Hadoop library path on your servers.
21.2. MySQL

JDBC Driver: MySQL Connector/J

MySQL v5.0 and above offers very thorough coverage by Sqoop, and has been tested with the MySQL Connector/J JDBC driver.
21.2.1. zeroDateTimeBehavior

MySQL allows values of '0000-00-00' for DATE columns, which is a non-standard extension to SQL. When communicated via JDBC, these values are handled in one of three different ways:

- Convert to NULL.
- Throw an exception in the client.
- Round to the nearest legal date ('0001-01-01').

You specify the behavior by using the zeroDateTimeBehavior property of the connect string. If a zeroDateTimeBehavior property is not specified, Sqoop uses the convertToNull behavior.

You can override this behavior. For example:

$ sqoop import --table foo \
    --connect jdbc:mysql://db.example.com/someDb?zeroDateTimeBehavior=round
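The three behaviors can be summarized in a small sketch. This is illustrative Python, not Connector/J code; it only models the semantics described above.

```python
# Illustrative model (not Connector/J code) of the three
# zeroDateTimeBehavior settings applied to a zero DATE value.
import datetime


def handle_zero_date(value: str, behavior: str = "convertToNull"):
    """Return a date, None, or raise, depending on the behavior setting."""
    if value != "0000-00-00":
        return datetime.date.fromisoformat(value)
    if behavior == "convertToNull":
        return None                      # the default behavior
    if behavior == "round":
        return datetime.date(1, 1, 1)    # nearest legal date, 0001-01-01
    # "exception" behavior: the client raises an error.
    raise ValueError("zero DATE value encountered")


print(handle_zero_date("0000-00-00"))           # None
print(handle_zero_date("0000-00-00", "round"))  # 0001-01-01
```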
21.2.2. UNSIGNED columns

Columns with type UNSIGNED in MySQL can hold values between 0 and 2^32-1 (4294967295), but the database will report the data type to Sqoop as INTEGER, which can hold values between -2147483648 and 2147483647. Sqoop cannot currently import UNSIGNED values above 2147483647.
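The range mismatch is easy to see numerically. The following sketch (not Sqoop code) shows why values in the upper half of the unsigned range cannot be represented once the column is reported as a signed 32-bit INTEGER:

```python
# Why large UNSIGNED values fail: the column is reported as a signed
# 32-bit INTEGER, so anything above 2**31 - 1 is unrepresentable.

INT_MIN, INT_MAX = -2**31, 2**31 - 1   # signed 32-bit range
UNSIGNED_MAX = 2**32 - 1               # MySQL INT UNSIGNED maximum (4294967295)


def fits_signed_int(value: int) -> bool:
    """True if the value survives a signed 32-bit INTEGER mapping."""
    return INT_MIN <= value <= INT_MAX


print(fits_signed_int(2_147_483_647))  # True
print(fits_signed_int(4_000_000_000))  # False: cannot be imported as INTEGER
```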
21.2.3. BLOB and CLOB columns

Sqoop's direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns. Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool.
21.2.4. Importing views in direct mode

Sqoop currently does not support importing views in direct mode. Use JDBC-based (non-direct) mode if you need to import a view (simply omit the --direct parameter).
21.2.5. Direct-mode Transactions

For performance, each writer will commit the current transaction approximately every 32 MB of exported data. You can control this by specifying the following argument before any tool-specific arguments: -D sqoop.mysql.export.checkpoint.bytes=size, where size is a value in bytes. Set size to 0 to disable intermediate checkpoints, but individual files being exported will continue to be committed independently of one another.
Important

Note that any arguments to Sqoop that are of the form -D parameter=value are Hadoop generic arguments and must appear before any tool-specific arguments (for example, --connect, --table, etc.).
21.3. PostgreSQL

Sqoop supports a JDBC-based connector for PostgreSQL: http://jdbc.postgresql.org/

The connector has been tested using JDBC driver version "9.1-903 JDBC 4" with PostgreSQL server 9.1.
21.3.1.Importingviewsindirectmode
Sqoopiscurrentlynotsupportingimportfromviewindirectmode.UseJDBCbased(nondirect)mode
incasethatyouneedtoimportview(simplyomitparameter).
21.4. Oracle

JDBC Driver: Oracle JDBC Thin Driver, with which Sqoop is compatible.

Sqoop has been tested with Oracle 10.2.0 Express Edition. Oracle is notable in its different approach to SQL from the ANSI standard, and its non-standard JDBC driver. Therefore, several features work differently.
21.4.1. Dates and Times

Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
When exporting data back to a database, Sqoop parses text fields as TIMESTAMP types (with the form yyyy-mm-dd HH:MM:SS.ffffffff) even if you expect these fields to be formatted with the JDBC date escape format of yyyy-mm-dd. Dates exported to Oracle should be formatted as full timestamps.
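The difference between the two formats can be demonstrated with a short sketch. This is illustrative Python, not Sqoop's actual parsing code; it only shows that a bare date-escape string fails a full-timestamp parse:

```python
# Sketch of the export-side expectation described above: a full timestamp
# parses, while a bare JDBC date-escape string ("yyyy-mm-dd") is rejected.
# (Illustrative only; not Sqoop's actual parsing code.)
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"   # e.g. 2012-06-01 00:00:00.0


def parse_export_field(text: str) -> datetime:
    return datetime.strptime(text, TIMESTAMP_FORMAT)


print(parse_export_field("2012-06-01 00:00:00.0"))  # full timestamp: parses
try:
    parse_export_field("2012-06-01")                # date-escape form: fails
except ValueError:
    print("parse error: date-only field rejected")
```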
Oracle also includes the additional date/time types TIMESTAMP WITH TIMEZONE and TIMESTAMP WITH LOCAL TIMEZONE. To support these types, the user's session timezone must be specified. By default, Sqoop will specify the timezone "GMT" to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command line when running a Sqoop job. For example:

$ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//db.example.com/foo --table bar

Note that Hadoop parameters (-D parameter=value) are generic arguments and must appear before the tool-specific arguments (--connect, --table, and so on).
Legal values for the session timezone string are enumerated at http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#i637736.
21.5. Schema Definition in Hive

Hive users will note that there is not a one-to-one mapping between SQL types and Hive types. In general, SQL types that do not have a direct mapping (for example, DATE, TIME, and TIMESTAMP) will be coerced to STRING in Hive. The NUMERIC and DECIMAL SQL types will be coerced to DOUBLE. In these cases, Sqoop will emit a warning in its log messages informing you of the loss of precision.
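The coercions described above can be written as a simple lookup table. This is an illustrative sketch; Sqoop's actual mapping lives in its Java codebase:

```python
# Illustrative lookup table for the SQL-to-Hive coercions described above
# (not Sqoop source code).

SQL_TO_HIVE = {
    "DATE": "STRING",
    "TIME": "STRING",
    "TIMESTAMP": "STRING",
    "NUMERIC": "DOUBLE",   # precision may be lost; Sqoop logs a warning
    "DECIMAL": "DOUBLE",   # precision may be lost; Sqoop logs a warning
}


def hive_type(sql_type: str) -> str:
    """Return the coerced Hive type; directly mapped types pass through."""
    return SQL_TO_HIVE.get(sql_type.upper(), sql_type)


print(hive_type("DECIMAL"))    # DOUBLE
print(hive_type("TIMESTAMP"))  # STRING
```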
22. Getting Support

Some general information is available at http://sqoop.apache.org/.

Report bugs in Sqoop to the issue tracker at https://issues.apache.org/jira/browse/SQOOP.

Questions and discussion regarding the usage of Sqoop should be directed to the sqoop-user mailing list.

Before contacting either forum, run your Sqoop job with the --verbose flag to acquire as much debugging information as possible. Also report the string returned by sqoop version as well as the version of Hadoop you are running (hadoop version).
23. Troubleshooting
23.1. General Troubleshooting Process

The following steps should be followed to troubleshoot any failure that you encounter while running Sqoop.
1. Turn on verbose output by executing the same command again and specifying the --verbose option. This produces more debug output on the console which can be inspected to identify any obvious errors.

2. Look at the task logs from Hadoop to see if there are any specific failures recorded there. It is possible that a failure that occurs during task execution is not relayed correctly to the console.

3. Make sure that the necessary input files or input/output tables are present and can be accessed by the user that Sqoop is executing as or connecting to the database as. It is possible that the necessary files or tables are present but the specific user that Sqoop connects as does not have the necessary permissions to access these files.

4. If you are doing a compound action such as populating a Hive table or partition, try breaking the job into two separate actions to see where the problem really occurs. For example, if an import that creates and populates a Hive table is failing, you can break it down into two steps: the first for doing the import alone, and the second to create a Hive table without the import using the create-hive-table tool. While this does not address the original use case of populating the Hive table, it does help narrow down the problem to either the regular import or the creation and population of the Hive table.

5. Search the mailing list archives and JIRA for keywords relating to the problem. It is possible that you may find a solution discussed there that will help you solve or work around your problem.
23.2. Specific Troubleshooting Tips
23.2.1. Oracle: Connection Reset Errors

Problem: When using the default Sqoop connector for Oracle, some data does get transferred, but during the map-reduce job a lot of connection-reset errors are reported.

Solution: This problem occurs primarily due to the lack of a fast random number generation device on the host where the map tasks execute. On typical Linux systems this can be addressed by setting the following property in the java.security file:

securerandom.source=file:/dev/../dev/urandom
23.2.2. Oracle: Case-Sensitive Catalog Query Errors

Problem: While working with Oracle you may encounter problems when Sqoop cannot figure out column names. This happens because the catalog queries that Sqoop uses for Oracle expect the correct case to be specified for the user name and table name. One example, using --hive-import, results in a NullPointerException.

Solution:

1. Specify the user name, which Sqoop is connecting as, in upper case (unless it was created with mixed/lower case within quotes).

2. Specify the table name, which you are working with, in upper case (unless it was created with mixed/lower case within quotes).
23.2.3. MySQL: Connection Failure

Problem: While importing a MySQL table into Sqoop, if you do not have the necessary permissions to access your MySQL database over the network, you may get a connection failure.

Solution: First, verify that you can connect to the database from the node where you are running Sqoop:

$ mysql --host=<IP Address> --database=test --user=<username> --password=<password>

If this works, it rules out any problem with the client network configuration or security/authentication configuration.

Add the network port for the server to your my.cnf file.

Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network: (1.) Log into MySQL as root. (2.) Issue the following command:

mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword'

Note that doing this will enable the testuser to connect to the MySQL server from any IP address. While this will work, it is not advisable for a production environment. We advise consulting with your DBA to grant the necessary privileges based on the setup topology.

If the database server's IP address changes, unless it is bound to a static hostname in your server, the connect string passed into Sqoop will also need to be changed.
23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)

Problem: While working with Oracle you may encounter this problem when the Sqoop command explicitly specifies the --driver <driver name> option. When the --driver option is included in the Sqoop command, the built-in connection manager selection defaults to the generic connection manager, which causes this issue with Oracle. If the --driver option is not specified, the built-in connection manager selection mechanism selects the Oracle-specific connection manager, which generates valid SQL for Oracle and uses the driver "oracle.jdbc.OracleDriver".

Solution: Omit the option --driver oracle.jdbc.driver.OracleDriver and then re-run the Sqoop command.
23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely

Problem: Sqoop is treating TINYINT(1) columns as booleans, which is, for example, causing issues with Hive import. This is because by default the MySQL JDBC connector maps TINYINT(1) to java.sql.Types.BIT, which Sqoop by default maps to Boolean.

Solution: A cleaner solution is to force the MySQL JDBC Connector to stop converting TINYINT(1) to java.sql.Types.BIT by adding tinyInt1isBit=false into your JDBC path (to create something like jdbc:mysql://localhost/test?tinyInt1isBit=false). Another solution would be to explicitly override the column mapping for the TINYINT(1) column. For example, if the column name is foo, then pass the following option to Sqoop during import: --map-column-hive foo=tinyint. In the case of non-Hive imports to HDFS, use --map-column-java foo=Integer.
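Appending a Connector/J property such as tinyInt1isBit=false to a JDBC URL follows the usual URL query-string rules, which a short sketch (illustrative, not Sqoop code) can make explicit:

```python
# Sketch: appending a JDBC URL property, as described above for
# tinyInt1isBit=false. Uses "?" for the first property, "&" thereafter.

def with_property(url: str, key: str, value: str) -> str:
    """Return the JDBC URL with key=value appended as a query property."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{key}={value}"


url = with_property("jdbc:mysql://localhost/test", "tinyInt1isBit", "false")
print(url)  # jdbc:mysql://localhost/test?tinyInt1isBit=false
```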
This document was built from Sqoop source available at http://svn.apache.org/repos/asf/sqoop/trunk/.