Sqoop User Guide (v1.4.2)

Table of Contents
1. Introduction
2. Supported Releases
3. Sqoop Releases
4. Prerequisites
5. Basic Usage
6. Sqoop Tools
6.1. Using Command Aliases
6.2. Controlling the Hadoop Installation
6.3. Using Generic and Specific Arguments
6.4. Using Options Files to Pass Arguments
6.5. Using Tools
7. sqoop-import
7.1. Purpose
7.2. Syntax
7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase
7.3. Example Invocations
8. sqoop-import-all-tables
8.1. Purpose
8.2. Syntax
8.3. Example Invocations
9. sqoop-export
9.1. Purpose
9.2. Syntax
9.3. Inserts vs. Updates
9.4. Exports and Transactions
9.5. Failed Exports
9.6. Example Invocations
10. Saved Jobs
11. sqoop-job
11.1. Purpose
11.2. Syntax
11.3. Saved jobs and passwords
11.4. Saved jobs and incremental imports
12. sqoop-metastore
12.1. Purpose
12.2. Syntax
13. sqoop-merge
13.1. Purpose
13.2. Syntax
14. sqoop-codegen
14.1. Purpose
14.2. Syntax
14.3. Example Invocations
15. sqoop-create-hive-table
15.1. Purpose
15.2. Syntax
15.3. Example Invocations
16. sqoop-eval
16.1. Purpose
16.2. Syntax
16.3. Example Invocations
17. sqoop-list-databases
17.1. Purpose
17.2. Syntax
17.3. Example Invocations
18. sqoop-list-tables
18.1. Purpose
18.2. Syntax
18.3. Example Invocations
19. sqoop-help
19.1. Purpose
19.2. Syntax
19.3. Example Invocations
20. sqoop-version
20.1. Purpose
20.2. Syntax
20.3. Example Invocations
21. Compatibility Notes
21.1. Supported Databases
21.2. MySQL
21.2.1. zeroDateTimeBehavior
21.2.2. UNSIGNED columns
21.2.3. BLOB and CLOB columns
21.2.4. Importing views in direct mode
21.2.5. Direct-mode Transactions
21.3. PostgreSQL
21.3.1. Importing views in direct mode
21.4. Oracle
21.4.1. Dates and Times
21.5. Schema Definition in Hive
22. Getting Support
23. Troubleshooting
23.1. General Troubleshooting Process
23.2. Specific Troubleshooting Tips
23.2.1. Oracle: Connection Reset Errors
23.2.2. Oracle: Case-Sensitive Catalog Query Errors
23.2.3. MySQL: Connection Failure
23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)
23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely

1. Introduction

Sqoop is a tool designed to transfer data between Hadoop and relational databases. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.

This document describes how to get started using Sqoop to move data between databases and Hadoop and provides reference information for the operation of the Sqoop command-line tool suite. This document is intended for:

System and application programmers
System administrators
Database administrators
Data analysts
Data engineers

2. Supported Releases

This documentation applies to Sqoop v1.4.2.

3. Sqoop Releases

Sqoop is an open source software product of the Apache Software Foundation.

Software development for Sqoop occurs at http://svn.apache.org/repos/asf/sqoop/trunk. At that site you can obtain:

New releases of Sqoop as well as its most recent source code
An issue tracker
A wiki that contains Sqoop documentation

Sqoop is compatible with Apache Hadoop 0.21 and Cloudera's Distribution of Hadoop version 3.

4. Prerequisites

The following prerequisite knowledge is required for this product:
Basic computer technology and terminology
Familiarity with command-line interfaces such as bash
Relational database management systems
Basic familiarity with the purpose and operation of Hadoop

Before you can use Sqoop, a release of Hadoop must be installed and configured. We recommend that you download Cloudera's Distribution for Hadoop (CDH3) from the Cloudera Software Archive at http://archive.cloudera.com for straightforward installation of Hadoop on Linux systems.

This document assumes you are using a Linux or Linux-like environment. If you are using Windows, you may be able to use cygwin to accomplish most of the following tasks. If you are using Mac OS X, you should see few (if any) compatibility errors. Sqoop is predominantly operated and tested on Linux.

5. Basic Usage

With Sqoop, you can import data from a relational database system into HDFS. The input to the import process is a database table. Sqoop will read the table row-by-row into HDFS. The output of this import process is a set of files containing a copy of the imported table. The import process is performed in parallel. For this reason, the output will be in multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.

A by-product of the import process is a generated Java class which can encapsulate one row of the imported table. This class is used during the import process by Sqoop itself. The Java source code for this class is also provided to you, for use in subsequent MapReduce processing of the data. This class can serialize and deserialize data to and from the SequenceFile format. It can also parse the delimited-text form of a record. These abilities allow you to quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.

After manipulating the imported records (for example, with MapReduce or Hive) you may have a result data set which you can then export back to the relational database. Sqoop's export process will read a set of delimited text files from HDFS in parallel, parse them into records, and insert them as new rows in a target database table, for consumption by external applications or users.

Sqoop includes some other commands which allow you to inspect the database you are working with. For example, you can list the available database schemas (with the sqoop-list-databases tool) and tables within a schema (with the sqoop-list-tables tool). Sqoop also includes a primitive SQL execution shell (the sqoop-eval tool).

Most aspects of the import, code generation, and export processes can be customized. You can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. Subsequent sections of this document explain how to specify these and other arguments to Sqoop.

6. Sqoop Tools

6.1. Using Command Aliases
6.2. Controlling the Hadoop Installation
6.3. Using Generic and Specific Arguments
6.4. Using Options Files to Pass Arguments
6.5. Using Tools

Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.

If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program. Users of a packaged deployment of Sqoop (such as an RPM shipped with Cloudera's Distribution for Hadoop) will see this program installed as /usr/bin/sqoop. The remainder of this documentation will refer to this program as sqoop. For example:

$ sqoop tool-name [tool-arguments]
Note

The following examples that begin with a $ character indicate that the commands must be entered at a terminal prompt (such as the bash prompt). The $ character represents the prompt itself; you should not start these commands by typing a $. You can also enter commands inline in the text of a paragraph; for example, sqoop help. These examples do not show a $ prefix, but you should enter them the same way. Don't confuse the $ shell prompt in the examples with the $ that precedes an environment variable name. For example, the string literal $HADOOP_HOME includes a "$".

Sqoop ships with a help tool. To display a list of all available tools, type the following command:

$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

You can display help for a specific tool by entering: sqoop help (tool-name); for example, sqoop help import. You can also add the --help argument to any command: sqoop import --help.

6.1. Using Command Aliases

In addition to typing the sqoop (toolname) syntax, you can use alias scripts that specify the sqoop-(toolname) syntax. For example, the scripts sqoop-import, sqoop-export, etc. each select a specific tool.

6.2. Controlling the Hadoop Installation

You invoke Sqoop through the program launch capability provided by Hadoop. The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable.

For example:

$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...

or:

$ export HADOOP_HOME=/some/path/to/hadoop
$ sqoop import --arguments...

If $HADOOP_HOME is not set, Sqoop will use the default installation location for Cloudera's Distribution for Hadoop, /usr/lib/hadoop.

The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.

6.3. Using Generic and Specific Arguments

To control the operation of each Sqoop tool, you use generic and specific arguments.

For example:

$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                 Specify JDBC connect string
   --connection-manager <class-name>    Specify connection manager class to use
   --driver <class-name>                Manually specify JDBC driver class to use
   --hadoop-home <dir>                  Override $HADOOP_HOME
   --help                               Print usage instructions
   -P                                   Read password from console
   --password <password>                Set authentication password
   --username <username>                Set authentication username
   --verbose                            Print more information while working

[...]

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>                     specify an application configuration file
-D <property=value>                            use value for given property
-fs <local|namenode:port>                      specify a namenode
-jt <local|jobtracker:port>                    specify a job tracker
-files <comma separated list of files>         specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>        specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>   specify comma separated archives to be unarchived on the compute machines

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
You must supply the generic arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect). Note that generic Hadoop arguments are preceded by a single dash character (-), whereas tool-specific arguments start with two dashes (--), unless they are single character arguments such as -P.

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches; if not specified, the name defaults to the jar name for the job, which is derived from the used table name.

The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop's internal argument-parsing system.

6.4. Using Options Files to Pass Arguments

When using Sqoop, the command line options that do not change from invocation to invocation can be put in an options file for convenience. An options file is a text file where each line identifies an option in the order that it appears otherwise on the command line. Option files allow specifying a single option on multiple lines by using the backslash character at the end of intermediate lines. Also supported are comments within option files that begin with the hash character. Comments must be specified on a new line and may not be mixed with option text. All comments and empty lines are ignored when option files are expanded. Unless options appear as quoted strings, any leading or trailing spaces are ignored. Quoted strings, if used, must not extend beyond the line on which they are specified.

Option files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering. For instance, regardless of where the options are loaded from, they must follow the ordering such that generic options appear first, tool-specific options next, finally followed by options that are intended to be passed to child programs.

To specify an options file, simply create an options file in a convenient location and pass it to the command line via the --options-file argument.

Whenever an options file is specified, it is expanded on the command line before the tool is invoked. You can specify more than one options file within the same invocation if needed.

For example, the following Sqoop invocation for import can be specified alternatively as shown below:

$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST

$ sqoop --options-file /users/homer/work/import.txt --table TEST

where the options file /users/homer/work/import.txt contains the following:
import
--connect
jdbc:mysql://localhost/db
--username
foo

The options file can have empty lines and comments for readability purposes. So the above example would work exactly the same if the options file contained the following:

#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo

#
# Remaining options should be specified in the command line.
#

6.5. Using Tools

The following sections will describe each tool's operation. The tools are listed in the most likely order you will find them useful.

7. sqoop-import

7.1. Purpose
7.2. Syntax
7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase
7.3. Example Invocations

7.1. Purpose

The import tool imports an individual table from an RDBMS to HDFS. Each row from a table is represented as a separate record in HDFS. Records can be stored as text files (one record per line), or in binary representation as Avro or SequenceFiles.

7.2. Syntax

7.2.1. Connecting to a Database Server
7.2.2. Selecting the Data to Import
7.2.3. Free-form Query Imports
7.2.4. Controlling Parallelism
7.2.5. Controlling the Import Process
7.2.6. Controlling type mapping
7.2.7. Incremental Imports
7.2.8. File Formats
7.2.9. Large Objects
7.2.10. Importing Data Into Hive
7.2.11. Importing Data Into HBase

$ sqoop import (generic-args) (import-args)
$ sqoop-import (generic-args) (import-args)

While the Hadoop generic arguments must precede any import arguments, you can type the import arguments in any order with respect to one another.

Note

In this document, arguments are grouped into collections organized by function. Some collections are present in several tools (for example, the "common" arguments). An extended description of their functionality is given only on the first presentation in this document.
Table 1. Common arguments

Argument                                 Description
--connect <jdbc-uri>                     Specify JDBC connect string
--connection-manager <class-name>        Specify connection manager class to use
--driver <class-name>                    Manually specify JDBC driver class to use
--hadoop-home <dir>                      Override $HADOOP_HOME
--help                                   Print usage instructions
-P                                       Read password from console
--password <password>                    Set authentication password
--username <username>                    Set authentication username
--verbose                                Print more information while working
--connection-param-file <filename>       Optional properties file that provides connection parameters

7.2.1. Connecting to a Database Server

Sqoop is designed to import tables from a database into HDFS. To do so, you must specify a connect string that describes how to connect to the database. The connect string is similar to a URL, and is communicated to Sqoop with the --connect argument. This describes the server and database to connect to; it may also specify the port. For example:

$ sqoop import --connect jdbc:mysql://database.example.com/employees

This string will connect to a MySQL database named employees on the host database.example.com. It's important that you do not use the URL localhost if you intend to use Sqoop with a distributed Hadoop cluster. The connect string you supply will be used on TaskTracker nodes throughout your MapReduce cluster; if you specify the literal name localhost, each node will connect to a different database (or more likely, no database at all). Instead, you should use the full hostname or IP address of the database host that can be seen by all your remote nodes.

You might need to authenticate against the database before you can access it. You can use the --username and --password or -P parameters to supply a username and a password to the database. For example:

$ sqoop import --connect jdbc:mysql://database.example.com/employees \
    --username aaron --password 12345

Warning

The --password parameter is insecure, as other users may be able to read your password from the command-line arguments via the output of programs such as ps. The -P argument will read a password from a console prompt, and is the preferred method of entering credentials. Credentials may still be transferred between nodes of the MapReduce cluster using insecure means.
Sqoop automatically supports several databases, including MySQL. Connect strings beginning with jdbc:mysql:// are handled automatically in Sqoop. (A full list of databases with built-in support is provided in the "Supported Databases" section. For some, you may need to install the JDBC driver yourself.)

You can use Sqoop with any other JDBC-compliant database. First, download the appropriate JDBC driver for the type of database you want to import, and install the .jar file in the $SQOOP_HOME/lib directory on your client machine. (This will be /usr/lib/sqoop/lib if you installed from an RPM or Debian package.) Each driver .jar file also has a specific driver class which defines the entry-point to the driver. For example, MySQL's Connector/J library has a driver class of com.mysql.jdbc.Driver. Refer to your database vendor-specific documentation to determine the main driver class. This class must be provided as an argument to Sqoop with --driver.

For example, to connect to a SQLServer database, first download the driver from microsoft.com and install it in your Sqoop lib path.

Then run Sqoop. For example:

$ sqoop import --driver com.microsoft.jdbc.sqlserver.SQLServerDriver \
    --connect <connect-string> ...

When connecting to a database using JDBC, you can optionally specify extra JDBC parameters via a property file using the option --connection-param-file. The contents of this file are parsed as standard Java properties and passed into the driver while creating a connection.
Note

The parameters specified via the optional property file are only applicable to JDBC connections. Any fastpath connectors that use connections other than JDBC will ignore these parameters.
Table 2. Import control arguments:

Argument                                 Description
--append                                 Append data to an existing dataset in HDFS
--as-avrodatafile                        Imports data to Avro Data Files
--as-sequencefile                        Imports data to SequenceFiles
--as-textfile                            Imports data as plain text (default)
--boundary-query <statement>             Boundary query to use for creating splits
--columns <col,col,col...>               Columns to import from table
--direct                                 Use direct import fast path
--direct-split-size <n>                  Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>                   Set the maximum size for an inline LOB
-m,--num-mappers <n>                     Use n map tasks to import in parallel
-e,--query <statement>                   Import the results of statement.
--split-by <column-name>                 Column of the table used to split work units
--table <table-name>                     Table to read
--target-dir <dir>                       HDFS destination dir
--warehouse-dir <dir>                    HDFS parent for table destination
--where <where clause>                   WHERE clause to use during import
-z,--compress                            Enable compression
--compression-codec <c>                  Use Hadoop codec (default gzip)
--null-string <null-string>              The string to be written for a null value for string columns
--null-non-string <null-string>          The string to be written for a null value for non-string columns

The --null-string and --null-non-string arguments are optional. If not specified, then the string "null" will be used.

7.2.2. Selecting the Data to Import
Sqoop typically imports data in a table-centric fashion. Use the --table argument to select the table to import. For example, --table employees. This argument can also identify a VIEW or other table-like entity in a database.

By default, all columns within a table are selected for import. Imported data is written to HDFS in its "natural order"; that is, a table containing columns A, B, and C results in an import of data such as:

A1,B1,C1
A2,B2,C2
...

You can select a subset of columns and control their ordering by using the --columns argument. This should include a comma-delimited list of columns to import. For example: --columns "name,employee_id,jobtitle".

You can control which rows are imported by adding a SQL WHERE clause to the import statement. By default, Sqoop generates statements of the form SELECT <column list> FROM <table name>. You can append a WHERE clause to this with the --where argument. For example: --where "id > 400". Only rows where the id column has a value greater than 400 will be imported.

By default sqoop will use the query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits. In some cases this query is not the most optimal, so you can specify any arbitrary query returning two numeric columns using the --boundary-query argument.

7.2.3. Free-form Query Imports

Sqoop can also import the result set of an arbitrary SQL query. Instead of using the --table, --columns and --where arguments, you can specify a SQL statement with the --query argument.

When importing a free-form query, you must specify a destination directory with --target-dir.

If you want to import the results of a query in parallel, then each map task will need to execute a copy of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. You must also select a splitting column with --split-by.

For example:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults

Alternately, the query can be executed once and imported serially, by specifying a single map task with -m 1:

$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
  -m 1 --target-dir /user/foo/joinresults

Note

If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"

Note

The facility of using free-form query in the current version of Sqoop is limited to simple queries where there are no ambiguous projections and no OR conditions in the WHERE clause. Use of complex queries such as queries that have sub-queries or joins leading to ambiguous projections can lead to unexpected results.

7.2.4. Controlling Parallelism

Sqoop imports data in parallel from most database sources. You can specify the number of map tasks (parallel processes) to use to perform the import by using the -m or --num-mappers argument. Each of these arguments takes an integer value which corresponds to the degree of parallelism to employ. By default, four tasks are used. Some databases may see improved performance by increasing this value to 8 or 16. Do not increase the degree of parallelism greater than that available within your MapReduce cluster; tasks will run serially and will likely increase the amount of time required to perform the import. Likewise, do not increase the degree of parallelism higher than that which your database can reasonably support. Connecting 100 concurrent clients to your database may increase the load on the database server to a point where performance suffers as a result.

When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range. For example, if you had a table with a primary key column of id whose minimum value was 0 and maximum value was 1000, and Sqoop was directed to use 4 tasks, Sqoop would run four processes which each execute SQL statements of the form SELECT * FROM sometable WHERE id >= lo AND id < hi, with (lo, hi) set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different tasks.

If the actual values for the primary key are not uniformly distributed across its range, then this can result in unbalanced tasks. You should explicitly choose a different column with the --split-by argument. For example, --split-by employee_id. Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column.
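As a sketch of these two knobs together (connect string, table, and column are hypothetical), an import that uses eight parallel tasks split on a non-primary-key column could be invoked as:

$ sqoop import --connect jdbc:mysql://db.example.com/shop --table ORDERS \
    --split-by region_id -m 8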

7.2.5. Controlling the Import Process

By default, the import process will use JDBC which provides a reasonable cross-vendor import channel. Some databases can perform imports in a more high-performance fashion by using database-specific data movement tools. For example, MySQL provides the mysqldump tool which can export data from MySQL to other systems very quickly. By supplying the --direct argument, you are specifying that Sqoop should attempt the direct import channel. This channel may be higher performance than using JDBC. Currently, direct mode does not support imports of large object columns.

When importing from PostgreSQL in conjunction with direct mode, you can split the import into separate files after individual files reach a certain size. This size limit is controlled with the --direct-split-size argument.

By default, Sqoop will import a table named foo to a directory named foo inside your home directory in HDFS. For example, if your username is someuser, then the import tool will write to /user/someuser/foo/(files). You can adjust the parent directory of the import with the --warehouse-dir argument. For example:

$ sqoop import --connect <connect-str> --table foo --warehouse-dir /shared \
    ...

This command would write to a set of files in the /shared/foo/ directory.

You can also explicitly choose the target directory, like so:

$ sqoop import --connect <connect-str> --table foo --target-dir /dest \
    ...

This will import the files into the /dest directory. --target-dir is incompatible with --warehouse-dir.

When using direct mode, you can specify additional arguments which should be passed to the underlying tool. If the argument -- is given on the command-line, then subsequent arguments are sent directly to the underlying tool. For example, the following adjusts the character set used by mysqldump:

$ sqoop import --connect jdbc:mysql://server.foo.com/db --table bar \
    --direct -- --default-character-set=latin1

By default, imports go to a new target location. If the destination directory already exists in HDFS, Sqoop will refuse to import and overwrite that directory's contents. If you use the --append argument, Sqoop will import data to a temporary directory and then rename the files into the normal target directory in a manner that does not conflict with existing filenames in that directory.

Note

When using the direct mode of import, certain database client utilities are expected to be present in the shell path of the task process. For MySQL the utilities mysqldump and mysqlimport are required, whereas for PostgreSQL the utility psql is required.

7.2.6. Controlling type mapping

Sqoop is preconfigured to map most SQL types to appropriate Java or Hive representatives. However the default mapping might not be suitable for everyone and might be overridden by --map-column-java (for changing mapping to Java) or --map-column-hive (for changing Hive mapping).

Table 3. Parameters for overriding mapping

Argument                                 Description
--map-column-java <mapping>              Override mapping from SQL to Java type for configured columns.
--map-column-hive <mapping>              Override mapping from SQL to Hive type for configured columns.

Sqoop expects a comma-separated list of mappings in the form <name of column>=<new type>. For example:

$ sqoop import ... --map-column-java id=String,value=Integer

Sqoop will raise an exception if a configured mapping is not used.

7.2.7. Incremental Imports

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

The following arguments control incremental imports:

Table 4. Incremental import arguments:

Argument                                 Description
--check-column (col)                     Specifies the column to be examined when determining which rows to import.
--incremental (mode)                     Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)                     Specifies the maximum value of the check column from the previous import.

Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.

You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.

An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.

At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs later in this document for more information.
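For illustration only (table, column, and starting value are hypothetical), an append-mode incremental import that picks up rows added since a previous run could look like:

$ sqoop import --connect jdbc:mysql://db.example.com/shop --table ORDERS \
    --incremental append --check-column order_id --last-value 100000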

7.2.8. File Formats

You can import data in one of two file formats: delimited text or SequenceFiles.

Delimited text is the default import format. You can also specify it explicitly by using the --as-textfile argument. This argument will write string-based representations of each record to the output files, with delimiter characters between individual columns and rows. These delimiters may be commas, tabs, or other characters. (The delimiters can be selected; see "Output line formatting arguments.") The following is the result of an example text-based import:

1,here is a message,2010-05-01
2,happy new year!,2010-01-01
3,another message,2009-11-12

Delimited text is appropriate for most non-binary data types. It also readily supports further manipulation by other tools, such as Hive.

SequenceFiles are a binary format that store individual records in custom record-specific data types. These data types are manifested as Java classes. Sqoop will automatically generate these data types for you. This format supports exact storage of all data in binary representations, and is appropriate for storing binary data (for example, VARBINARY columns), or data that will be principally manipulated by custom MapReduce programs (reading from SequenceFiles is higher-performance than reading from text files, as records do not need to be parsed).

Avro data files are a compact, efficient binary format that provides interoperability with applications written in other programming languages. Avro also supports versioning, so that when, e.g., columns are added or removed from a table, previously imported data files can be processed along with new ones.

By default, data is not compressed. You can compress your data by using the deflate (gzip) algorithm with the -z or --compress argument, or specify any Hadoop compression codec using the --compression-codec argument. This applies to SequenceFile, text, and Avro files.

7.2.9. Large Objects

Sqoop handles large objects (BLOB and CLOB columns) in particular ways. If this data is truly large, then these columns should not be fully materialized in memory for manipulation, as most columns are. Instead, their data is handled in a streaming fashion. Large objects can be stored inline with the rest of the data, in which case they are fully materialized in memory on every access, or they can be stored in a secondary storage file linked to the primary data storage. By default, large objects less than 16 MB in size are stored inline with the rest of the data. At a larger size, they are stored in files in the _lobs subdirectory of the import target directory. These files are stored in a separate format optimized for large record storage, which can accommodate records of up to 2^63 bytes each. The size at which lobs spill into separate files is controlled by the --inline-lob-limit argument, which takes a parameter specifying the largest lob size to keep inline, in bytes. If you set the inline LOB limit to 0, all large objects will be placed in external storage.
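As a small sketch (connect string and table are hypothetical), forcing every large object into external storage simply sets the inline limit to zero:

$ sqoop import --connect jdbc:mysql://db.example.com/shop --table DOCUMENTS \
    --inline-lob-limit 0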
Table 5. Output line formatting arguments:

Argument                                 Description
--enclosed-by <char>                     Sets a required field enclosing character
--escaped-by <char>                      Sets the escape character
--fields-terminated-by <char>            Sets the field separator character
--lines-terminated-by <char>             Sets the end-of-line character
--mysql-delimiters                       Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>          Sets a field enclosing character
When importing to delimited files, the choice of delimiter is important. Delimiters which appear inside string-based fields may cause ambiguous parsing of the imported data by subsequent analysis passes. For example, the string "Hello, pleased to meet you" should not be imported with the end-of-field delimiter set to a comma.

Delimiters may be specified as:

a character (--fields-terminated-by X)
an escape character (--fields-terminated-by \t). Supported escape characters are:
  \b (backspace)
  \n (newline)
  \r (carriage return)
  \t (tab)
  \" (double-quote)
  \' (single-quote)
  \\ (backslash)
  \0 (NUL) - This will insert NUL characters between fields or lines, or will disable enclosing/escaping if used for one of the --enclosed-by, --optionally-enclosed-by, or --escaped-by arguments.
The octal representation of a UTF-8 character's code point. This should be of the form \0ooo, where ooo is the octal value. For example, --fields-terminated-by \001 would yield the ^A character.
The hexadecimal representation of a UTF-8 character's code point. This should be of the form \0xhhh, where hhh is the hex value. For example, --fields-terminated-by \0x0D would yield the carriage return character.

The default delimiters are a comma (,) for fields, a newline (\n) for records, no quote character, and no escape character. Note that this can lead to ambiguous/unparsible records if you import database records containing commas or newlines in the field data. For unambiguous parsing, both must be enabled. For example, via --mysql-delimiters.

If unambiguous delimiters cannot be presented, then use enclosing and escaping characters. The combination of (optional) enclosing and escaping characters will allow unambiguous parsing of lines. For example, suppose one column of a dataset contained the following values:

Some string, with a comma.
Another "string with quotes"

The following arguments would provide delimiters which can be unambiguously parsed:

$ sqoop import --fields-terminated-by , --escaped-by \\ --enclosed-by '\"' ...

(Note that to prevent the shell from mangling the enclosing character, we have enclosed that argument itself in single-quotes.)

The result of the above arguments applied to the above dataset would be:

"Some string, with a comma.","1","2","3"...
"Another \"string with quotes\"","4","5","6"...

Here the imported strings are shown in the context of additional columns ("1","2","3", etc.) to demonstrate the full effect of enclosing and escaping. The enclosing character is only strictly necessary when delimiter characters appear in the imported text. The enclosing character can therefore be specified as optional:

$ sqoop import --optionally-enclosed-by '\"' (the rest as above)...

Which would result in the following import:

"Some string, with a comma.",1,2,3...
"Another \"string with quotes\"",4,5,6...
Note

Even though Hive supports escaping characters, it does not handle escaping of the new-line character. Also, it does not support the notion of enclosing characters that may include field delimiters in the enclosed string. It is therefore recommended that you choose unambiguous field and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is due to limitations of Hive's input parsing abilities.

The --mysql-delimiters argument is a shorthand argument which uses the default delimiters for the mysqldump program. If you use the mysqldump delimiters in conjunction with a direct-mode import (with --direct), very fast imports can be achieved.

While the choice of delimiters is most important for a text-mode import, it is still relevant if you import to SequenceFiles with --as-sequencefile. The generated class' toString() method will use the delimiters you specify, so subsequent formatting of the output data will rely on the delimiters you choose.

Table 6. Input parsing arguments:

Argument                                 Description
--input-enclosed-by <char>               Sets a required field encloser
--input-escaped-by <char>                Sets the input escape character
--input-fields-terminated-by <char>      Sets the input field separator
--input-lines-terminated-by <char>       Sets the input end-of-line character
--input-optionally-enclosed-by <char>    Sets a field enclosing character

When Sqoop imports data to HDFS, it generates a Java class which can reinterpret the text files that it creates when doing a delimited-format import. The delimiters are chosen with arguments such as --fields-terminated-by; this controls both how the data is written to disk, and how the generated parse() method reinterprets this data. The delimiters used by the parse() method can be chosen independently of the output arguments, by using --input-fields-terminated-by, and so on. This is useful, for example, to generate classes which can parse records created with one set of delimiters, and emit the records to a different set of files using a separate set of delimiters.

Table 7. Hive arguments:

Argument                                 Description
--hive-home <dir>                        Override $HIVE_HOME
--hive-import                            Import tables into Hive (Uses Hive's default delimiters if none are set.)
--hive-overwrite                         Overwrite existing data in the Hive table.
--create-hive-table                      If set, then the job will fail if the target hive table exists. By default this property is false.
--hive-table <table-name>                Sets the table name to use when importing to Hive.
--hive-delims-replacement                Replace \n, \r, and \01 from string fields with a user-defined string when importing to Hive.
--hive-partition-key                     Name of the hive field the partitions are sharded on
--hive-partition-value <v>               String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>                  Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims                Drops \n, \r, and \01 from string fields when importing to Hive.

7.2.10. Importing Data Into Hive

Sqoop's import tool's main function is to upload your data into files in HDFS. If you have a Hive metastore associated with your HDFS cluster, Sqoop can also import the data into Hive by generating and executing a CREATE TABLE statement to define the data's layout in Hive. Importing data into Hive is as simple as adding the --hive-import option to your Sqoop command line.

If the Hive table already exists, you can specify the --hive-overwrite option to indicate that the existing table in hive must be replaced. After your data is imported into HDFS or this step is omitted, Sqoop will generate a Hive script containing a CREATE TABLE operation defining your columns using Hive's types, and a LOAD DATA INPATH statement to move the data files into Hive's warehouse directory.

The script will be executed by calling the installed copy of hive on the machine where Sqoop is run. If you have multiple Hive installations, or hive is not in your $PATH, use the --hive-home option to identify the Hive installation directory. Sqoop will use $HIVE_HOME/bin/hive from here.

Note

This function is incompatible with --as-avrodatafile and --as-sequencefile.

Even though Hive supports escaping characters, it does not handle escaping of the new-line character. Also, it does not support the notion of enclosing characters that may include field delimiters in the enclosed string. It is therefore recommended that you choose unambiguous field and record-terminating delimiters without the help of escaping and enclosing characters when working with Hive; this is due to limitations of Hive's input parsing abilities. If you do use --escaped-by, --enclosed-by, or --optionally-enclosed-by when importing data into Hive, Sqoop will print a warning message.

Hive will have problems using Sqoop-imported data if your database's rows contain string fields that have Hive's default row delimiters (\n and \r characters) or column delimiters (\01 characters) present in them. You can use the --hive-drop-import-delims option to drop those characters on import to give Hive-compatible text data. Alternatively, you can use the --hive-delims-replacement option to replace those characters with a user-defined string on import to give Hive-compatible text data. These options should only be used if you use Hive's default delimiters and should not be used if different delimiters are specified.

Sqoop will pass the field and record delimiters through to Hive. If you do not set any delimiters and do use --hive-import, the field delimiter will be set to ^A and the record delimiter will be set to \n to be consistent with Hive's defaults.

The table name used in Hive is, by default, the same as that of the source table. You can control the output table name with the --hive-table option.

Hive can put data into partitions for more efficient query performance. You can tell a Sqoop job to import data for Hive into a particular partition by specifying the --hive-partition-key and --hive-partition-value arguments. The partition value must be a string. Please see the Hive documentation for more details on partitioning.

You can import compressed tables into Hive using the --compress and --compression-codec options. One downside to compressing tables imported into Hive is that many codecs cannot be split for processing by parallel map tasks. The lzop codec, however, does support splitting. When importing tables with this codec, Sqoop will automatically index the files for splitting and configuring a new Hive table with the correct InputFormat. This feature currently requires that all partitions of a table be compressed with the lzop codec.
Table 8. HBase arguments:

Argument                                 Description
--column-family <family>                 Sets the target column family for the import
--hbase-create-table                     If specified, create missing HBase tables
--hbase-row-key <col>                    Specifies which input column to use as the row key
--hbase-table <table-name>               Specifies an HBase table to use as the target instead of HDFS

7.2.11. Importing Data Into HBase

Sqoop supports additional import targets beyond HDFS and Hive. Sqoop can also import records into a table in HBase.

By specifying --hbase-table, you instruct Sqoop to import to a table in HBase rather than a directory in HDFS. Sqoop will import data to the table specified as the argument to --hbase-table. Each row of the input table will be transformed into an HBase Put operation to a row of the output table. The key for each row is taken from a column of the input. By default Sqoop will use the split-by column as the row key column. If that is not specified, it will try to identify the primary key column, if any, of the source table. You can manually specify the row key column with --hbase-row-key. Each output column will be placed in the same column family, which must be specified with --column-family.

Note

This function is incompatible with direct import (parameter --direct).

If the target table and column family do not exist, the Sqoop job will exit with an error. You should create the target table and column family before running an import. If you specify --hbase-create-table, Sqoop will create the target table and column family if they do not exist, using the default parameters from your HBase configuration.

Sqoop currently serializes all values to HBase by converting each field to its string representation (as if you were importing to HDFS in text mode), and then inserts the UTF-8 bytes of this string in the target cell.
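A sketch of a typical HBase-bound import (the HBase table, column family, and row key column are hypothetical):

$ sqoop import --connect jdbc:mysql://db.example.com/shop --table ORDERS \
    --hbase-table orders --column-family cf --hbase-row-key order_id \
    --hbase-create-table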
Table 9. Code generation arguments:

Argument                                 Description
--bindir <dir>                           Output directory for compiled objects
--class-name <name>                      Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>                        Disable code generation; use specified jar
--outdir <dir>                           Output directory for generated code
--package-name <name>                    Put auto-generated classes in this package
--map-column-java <m>                    Override default mapping from SQL type to Java type for configured columns.

As mentioned earlier, a by-product of importing a table to HDFS is a class which can manipulate the imported data. If the data is stored in SequenceFiles, this class will be used for the data's serialization container. Therefore, you should use this class in your subsequent MapReduce processing of the data.

The class is typically named after the table; a table named foo will generate a class named foo. You may want to override this class name. For example, if your table is named EMPLOYEES, you may want to specify --class-name Employee instead. Similarly, you can specify just the package name with --package-name. The following import generates a class named com.foocorp.SomeTable:

$ sqoop import --connect <connect-str> --table SomeTable --package-name com.foocorp

The .java source file for your class will be written to the current working directory when you run sqoop. You can control the output directory with --outdir. For example, --outdir src/generated/.

The import process compiles the source into .class and .jar files; these are ordinarily stored under /tmp. You can select an alternate target directory with --bindir. For example, --bindir /scratch.

If you already have a compiled class that can be used to perform the import and want to suppress the code-generation aspect of the import process, you can use an existing jar and class by providing the --jar-file and --class-name options. For example:

$ sqoop import --table SomeTable --jar-file mydatatypes.jar \
    --class-name SomeTableType

This command will load the SomeTableType class out of mydatatypes.jar.

7.3. Example Invocations

The following examples illustrate how to use the import tool in a variety of situations.
A basic import of a table named EMPLOYEES in the corp database:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES

A basic import requiring a login:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --username SomeUser -P
Enter password: (hidden)

Selecting specific columns from the EMPLOYEES table:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --columns "employee_id,first_name,last_name,job_title"

Controlling the import parallelism (using 8 parallel tasks):

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    -m 8

Enabling the MySQL "direct mode" fast path:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --direct

Storing data in SequenceFiles, and setting the generated class name to com.foocorp.Employee:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --class-name com.foocorp.Employee --as-sequencefile

Specifying the delimiters to use in a text-mode import:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --fields-terminated-by '\t' --lines-terminated-by '\n' \
    --optionally-enclosed-by '\"'

Importing the data to Hive:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --hive-import

Importing only new employees:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --where "start_date > '2010-01-01'"

Changing the splitting column from the default:

$ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
    --split-by dept_id

Verifying that an import was successful:

$ hadoop fs -ls EMPLOYEES

$ hadoop fs -cat EMPLOYEES/part-m-00000 | head -n 10

Performing an incremental import of new data, after having already imported the first 100,000 rows of a table:
$ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
    --where "id > 100000" --target-dir /incremental_dataset --append

8. sqoop-import-all-tables

8.1. Purpose
8.2. Syntax
8.3. Example Invocations

8.1. Purpose

The import-all-tables tool imports a set of tables from an RDBMS to HDFS. Data from each table is stored in a separate directory in HDFS.

For the import-all-tables tool to be useful, the following conditions must be met:

Each table must have a single-column primary key.
You must intend to import all columns of each table.
You must not intend to use a non-default splitting column, nor impose any conditions via a WHERE clause.

8.2. Syntax

$ sqoop import-all-tables (generic-args) (import-args)
$ sqoop-import-all-tables (generic-args) (import-args)

Although the Hadoop generic arguments must precede any import arguments, the import arguments can be entered in any order with respect to one another.

Table 10. Common arguments

Argument                                 Description
--connect <jdbc-uri>                     Specify JDBC connect string
--connection-manager <class-name>        Specify connection manager class to use
--driver <class-name>                    Manually specify JDBC driver class to use
--hadoop-home <dir>                      Override $HADOOP_HOME
--help                                   Print usage instructions
-P                                       Read password from console
--password <password>                    Set authentication password
--username <username>                    Set authentication username
--verbose                                Print more information while working
--connection-param-file <filename>       Optional properties file that provides connection parameters

Table 11. Import control arguments:

Argument                                 Description
--as-avrodatafile                        Imports data to Avro Data Files
--as-sequencefile                        Imports data to SequenceFiles
--as-textfile                            Imports data as plain text (default)
--direct                                 Use direct import fast path
--direct-split-size <n>                  Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>                   Set the maximum size for an inline LOB
-m,--num-mappers <n>                     Use n map tasks to import in parallel
--warehouse-dir <dir>                    HDFS parent for table destination
-z,--compress                            Enable compression
--compression-codec <c>                  Use Hadoop codec (default gzip)

These arguments behave in the same manner as they do when used for the sqoop-import tool, but the --table, --split-by, --columns, and --where arguments are invalid for sqoop-import-all-tables.

Table 12. Output line formatting arguments:

Argument                                 Description
--enclosed-by <char>                     Sets a required field enclosing character
--escaped-by <char>                      Sets the escape character
--fields-terminated-by <char>            Sets the field separator character
--lines-terminated-by <char>             Sets the end-of-line character
--mysql-delimiters                       Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>          Sets a field enclosing character

Table 13. Input parsing arguments:

Argument                                 Description
--input-enclosed-by <char>               Sets a required field encloser
--input-escaped-by <char>                Sets the input escape character
--input-fields-terminated-by <char>      Sets the input field separator
--input-lines-terminated-by <char>       Sets the input end-of-line character
--input-optionally-enclosed-by <char>    Sets a field enclosing character

Table 14. Hive arguments:

Argument                                 Description
--hive-home <dir>                        Override $HIVE_HOME
--hive-import                            Import tables into Hive (Uses Hive's default delimiters if none are set.)
--hive-overwrite                         Overwrite existing data in the Hive table.
--create-hive-table                      If set, then the job will fail if the target hive table exists. By default this property is false.
--hive-table <table-name>                Sets the table name to use when importing to Hive.
--hive-delims-replacement                Replace \n, \r, and \01 from string fields with a user-defined string when importing to Hive.
--hive-partition-key                     Name of the hive field the partitions are sharded on
--hive-partition-value <v>               String value that serves as the partition key for data imported into Hive in this job.
--map-column-hive <map>                  Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims                Drops \n, \r, and \01 from string fields when importing to Hive.

Table 15. Code generation arguments:

Argument                                 Description
--bindir <dir>                           Output directory for compiled objects
--jar-file <file>                        Disable code generation; use specified jar
--outdir <dir>                           Output directory for generated code
--package-name <name>                    Put auto-generated classes in this package

The import-all-tables tool does not support the --class-name argument. You may, however, specify a package with --package-name in which all generated classes will be placed.

8.3. Example Invocations
Import all tables from the corp database:

$ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp

Verifying that it worked:

$ hadoop fs -ls
9. sqoop-export

9.1. Purpose
9.2. Syntax
9.3. Inserts vs. Updates
9.4. Exports and Transactions
9.5. Failed Exports
9.6. Example Invocations

9.1. Purpose

The export tool exports a set of files from HDFS back to an RDBMS. The target table must already exist in the database. The input files are read and parsed into a set of records according to the user-specified delimiters.

The default operation is to transform these into a set of INSERT statements that inject the records into the database. In "update mode," Sqoop will generate UPDATE statements that replace existing records in the database.

9.2. Syntax

$ sqoop export (generic-args) (export-args)
$ sqoop-export (generic-args) (export-args)

Although the Hadoop generic arguments must precede any export arguments, the export arguments can be entered in any order with respect to one another.

Table 16. Common arguments

Argument                                 Description
--connect <jdbc-uri>                     Specify JDBC connect string
--connection-manager <class-name>        Specify connection manager class to use
--driver <class-name>                    Manually specify JDBC driver class to use
--hadoop-home <dir>                      Override $HADOOP_HOME
--help                                   Print usage instructions
-P                                       Read password from console
--password <password>                    Set authentication password
--username <username>                    Set authentication username
--verbose                                Print more information while working
--connection-param-file <filename>       Optional properties file that provides connection parameters

Table 17. Export control arguments:

Argument                                 Description
--direct                                 Use direct export fast path
--export-dir <dir>                       HDFS source path for the export
-m,--num-mappers <n>                     Use n map tasks to export in parallel
--table <table-name>                     Table to populate
--update-key <col-name>                  Anchor column to use for updates. Use a comma separated list of columns if there are more than one column.
--update-mode <mode>                     Specify how updates are performed when new rows are found with non-matching keys in database. Legal values for mode include updateonly (default) and allowinsert.
--input-null-string <null-string>        The string to be interpreted as null for string columns
--staging-table <staging-table-name>     The table in which data will be staged before being inserted into the destination table.
--clear-staging-table                    Indicates that any data present in the staging table can be deleted.
--batch                                  Use batch mode for underlying statement execution.
--input-null-non-string <null-string>    The string to be interpreted as null for non-string columns
The --table and --export-dir arguments are required. These specify the table to populate in the database, and the directory in HDFS that contains the source data.

You can control the number of mappers independently from the number of files present in the directory. Export performance depends on the degree of parallelism. By default, Sqoop will use four tasks in parallel for the export process. This may not be optimal; you will need to experiment with your own particular setup. Additional tasks may offer better concurrency, but if the database is already bottlenecked on updating indices, invoking triggers, and so on, then additional load may decrease performance. The -m or --num-mappers arguments control the number of map tasks, which is the degree of parallelism used.

MySQL provides a direct mode for exports as well, using the mysqlimport tool. When exporting to MySQL, use the --direct argument to specify this codepath. This may be higher-performance than the standard JDBC codepath.

Note

When using export in direct mode with MySQL, the MySQL bulk utility mysqlimport must be available in the shell path of the task process.

The --input-null-string and --input-null-non-string arguments are optional. If --input-null-string is not specified, then the string "null" will be interpreted as null for string-type columns. If --input-null-non-string is not specified, then both the string "null" and the empty string will be interpreted as null for non-string columns. Note that the empty string will always be interpreted as null for non-string columns, in addition to other string if specified by --input-null-non-string.

Since Sqoop breaks down the export process into multiple transactions, it is possible that a failed export job may result in partial data being committed to the database. This can further lead to subsequent jobs failing due to insert collisions in some cases, or lead to duplicated data in others. You can overcome this problem by specifying a staging table via the --staging-table option, which acts as an auxiliary table that is used to stage exported data. The staged data is finally moved to the destination table in a single transaction.

In order to use the staging facility, you must create the staging table prior to running the export job. This table must be structurally identical to the target table. This table should either be empty before the export job runs, or the --clear-staging-table option must be specified. If the staging table contains data and the --clear-staging-table option is specified, Sqoop will delete all of the data before starting the export job.

Note

Support for staging data prior to pushing it into the destination table is not available for --direct exports. It is also not available when export is invoked using the --update-key option for updating existing data.
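For illustration (table names and paths are hypothetical), an export routed through a staging table could be invoked as follows; the staging table must already exist and be structurally identical to the target:

$ sqoop export --connect jdbc:mysql://db.example.com/shop --table ORDERS \
    --staging-table ORDERS_STAGE --clear-staging-table \
    --export-dir /results/orders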

9.3. Inserts vs. Updates

By default, sqoop-export appends new rows to a table; each input record is transformed into an INSERT statement that adds a row to the target database table. If your table has constraints (e.g., a primary key column whose values must be unique) and already contains data, you must take care to avoid inserting records that violate these constraints. The export process will fail if an INSERT statement fails. This mode is primarily intended for exporting records to a new, empty table intended to receive these results.

If you specify the --update-key argument, Sqoop will instead modify an existing dataset in the database. Each input record is treated as an UPDATE statement that modifies an existing row. The row a statement modifies is determined by the column name(s) specified with --update-key. For example, consider the following table definition:

CREATE TABLE foo(
    id INT NOT NULL PRIMARY KEY,
    msg VARCHAR(32),
    bar INT);

Consider also a dataset in HDFS containing records like these:

0,this is a test,42
1,some more data,100
...

Running sqoop-export --table foo --update-key id --export-dir /path/to/data --connect ... will run an export job that executes SQL statements based on the data like so:

UPDATE foo SET msg='this is a test', bar=42 WHERE id=0;
UPDATE foo SET msg='some more data', bar=100 WHERE id=1;
...

If an UPDATE statement modifies no rows, this is not considered an error; the export will silently continue. (In effect, this means that an update-based export will not insert new rows into the database.) Likewise, if the column specified with --update-key does not uniquely identify rows and multiple rows are updated by a single statement, this condition is also undetected.

The --update-key argument can also be given a comma separated list of column names. In which case, Sqoop will match all keys from this list before updating any existing record.

Depending on the target database, you may also specify the --update-mode argument with allowinsert mode if you want to update rows if they exist in the database already or insert rows if they do not exist yet.
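A sketch of such an upsert-style export, reusing the hypothetical foo table above (the connect string and path remain hypothetical):

$ sqoop export --connect jdbc:mysql://db.example.com/shop --table foo \
    --update-key id --update-mode allowinsert --export-dir /path/to/data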
Table 18. Input parsing arguments:

Argument                                 Description
--input-enclosed-by <char>               Sets a required field encloser
--input-escaped-by <char>                Sets the input escape character
--input-fields-terminated-by <char>      Sets the input field separator
--input-lines-terminated-by <char>       Sets the input end-of-line character
--input-optionally-enclosed-by <char>    Sets a field enclosing character

Table 19. Output line formatting arguments:

Argument                                 Description
--enclosed-by <char>                     Sets a required field enclosing character
--escaped-by <char>                      Sets the escape character
--fields-terminated-by <char>            Sets the field separator character
--lines-terminated-by <char>             Sets the end-of-line character
--mysql-delimiters                       Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>          Sets a field enclosing character
Sqoop automatically generates code to parse and interpret records of the files containing the data to be exported back to the database. If these files were created with non-default delimiters (comma-separated fields with newline-separated records), you should specify the same delimiters again so that Sqoop can parse your files.

If you specify incorrect delimiters, Sqoop will fail to find enough columns per line. This will cause export map tasks to fail by throwing ParseExceptions.

Table 20. Code generation arguments:

Argument                                 Description
--bindir <dir>                           Output directory for compiled objects
--class-name <name>                      Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>                        Disable code generation; use specified jar
--outdir <dir>                           Output directory for generated code
--package-name <name>                    Put auto-generated classes in this package
--map-column-java <m>                    Override default mapping from SQL type to Java type for configured columns.

If the records to be exported were generated as the result of a previous import, then the original generated class can be used to read the data back. Specifying --jar-file and --class-name obviate the need to specify delimiters in this case.

The use of existing generated code is incompatible with --update-key; an update-mode export requires new code generation to perform the update. You cannot use --jar-file, and must fully specify any non-default delimiters.

9.4. Exports and Transactions

Exports are performed by multiple writers in parallel. Each writer uses a separate connection to the database; these have separate transactions from one another. Sqoop uses the multi-row INSERT syntax to insert up to 100 records per statement. Every 100 statements, the current transaction within a writer task is committed, causing a commit every 10,000 rows. This ensures that transaction buffers do not grow without bound, and cause out-of-memory conditions. Therefore, an export is not an atomic process. Partial results from the export will become visible before the export is complete.

9.5. Failed Exports

Exports may fail for a number of reasons:

Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes)
Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value)
Attempting to parse an incomplete or malformed record from the HDFS source data
Attempting to parse records using incorrect delimiters
Capacity issues (such as insufficient RAM or disk space)

If an export map task fails due to these or other reasons, it will cause the export job to fail. The results of a failed export are undefined. Each export map task operates in a separate transaction. Furthermore, individual map tasks commit their current transaction periodically. If a task fails, the current transaction will be rolled back. Any previously-committed transactions will remain durable in the database, leading to a partially-complete export.

9.6. Example Invocations

A basic export to populate a table named bar:

$ sqoop export --connect jdbc:mysql://db.example.com/foo --table bar \
    --export-dir /results/bar_data


This example takes the files in /results/bar_data and injects their contents into the bar table in the foo database on db.example.com. The target table must already exist in the database. Sqoop performs a set of INSERT INTO operations, without regard for existing content. If Sqoop attempts to insert rows which violate constraints in the database (for example, a particular primary key value already exists), then the export fails.

10. Saved Jobs

Imports and exports can be repeatedly performed by issuing the same command multiple times. Especially when using the incremental import capability, this is an expected scenario.

Sqoop allows you to define saved jobs which make this process easier. A saved job records the configuration information required to execute a Sqoop command at a later time. The section on the sqoop-job tool describes how to create and work with saved jobs.

By default, job descriptions are saved to a private repository stored in $HOME/.sqoop/. You can configure Sqoop to instead use a shared metastore, which makes saved jobs available to multiple users across a shared cluster. Starting the metastore is covered by the section on the sqoop-metastore tool.

11. sqoop-job

11.1. Purpose
11.2. Syntax
11.3. Saved jobs and passwords
11.4. Saved jobs and incremental imports

11.1. Purpose

The job tool allows you to create and work with saved jobs. Saved jobs remember the parameters used to specify a job, so they can be re-executed by invoking the job by its handle.

If a saved job is configured to perform an incremental import, state regarding the most recently imported rows is updated in the saved job to allow the job to continually import only the newest rows.

11.2. Syntax

$ sqoop job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]
$ sqoop-job (generic-args) (job-args) [-- [subtool-name] (subtool-args)]

Although the Hadoop generic arguments must precede any job arguments, the job arguments can be entered in any order with respect to one another.

Table 21. Job management options:

Argument                                 Description
--create <job-id>                        Define a new saved job with the specified job-id (name). A second Sqoop command-line, separated by a -- should be specified; this defines the saved job.
--delete <job-id>                        Delete a saved job.
--list                                   List all saved jobs
--exec <job-id>                          Given a job defined with --create, run the saved job.
--show <job-id>                          Show the parameters for a saved job.

Creating saved jobs is done with the --create action. This operation requires a -- followed by a tool name and its arguments. The tool and its arguments will form the basis of the saved job. Consider:

$ sqoop job --create myjob -- import --connect jdbc:mysql://example.com/db \
    --table mytable
This creates a job named myjob which can be executed later. The job is not run. This job is now available in the list of saved jobs:

$ sqoop job --list
Available jobs:
  myjob

We can inspect the configuration of a job with the show action:

$ sqoop job --show myjob
Job: myjob
Tool: import
Options:
----------------------------
direct.import = false
codegen.input.delimiters.record = 0
hdfs.append.dir = false
db.table = mytable
...

And if we are satisfied with it, we can run the job with exec:

$ sqoop job --exec myjob
10/08/19 13:08:45 INFO tool.CodeGenTool: Beginning code generation
...

The exec action allows you to override arguments of the saved job by supplying them after a --. For example, if the database were changed to require a username, we could specify the username and password with:

$ sqoop job --exec myjob -- --username someuser -P
Enter password:
...
Table 22. Metastore connection options:

Argument                                 Description
--meta-connect <jdbc-uri>                Specifies the JDBC connect string used to connect to the metastore

By default, a private metastore is instantiated in $HOME/.sqoop. If you have configured a hosted metastore with the sqoop-metastore tool, you can connect to it by specifying the --meta-connect argument. This is a JDBC connect string just like the ones used to connect to databases for import.

In conf/sqoop-site.xml, you can configure sqoop.metastore.client.autoconnect.url with this address, so you do not have to supply --meta-connect to use a remote metastore. This parameter can also be modified to move the private metastore to a location on your filesystem other than your home directory.

If you configure sqoop.metastore.client.enable.autoconnect with the value false, then you must explicitly supply --meta-connect.
Table 23. Common options:

Argument                                 Description
--help                                   Print usage instructions
--verbose                                Print more information while working

11.3. Saved jobs and passwords

The Sqoop metastore is not a secure resource. Multiple users can access its contents. For this reason, Sqoop does not store passwords in the metastore. If you create a job that requires a password, you will be prompted for that password each time you execute the job.

You can enable passwords in the metastore by setting sqoop.metastore.client.record.password to true in the configuration.

Note that you have to set sqoop.metastore.client.record.password to true if you are executing saved jobs via Oozie, because Sqoop cannot prompt the user to enter passwords while being executed as Oozie tasks.
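The property is an ordinary Hadoop-style configuration entry; in a typical installation it would be placed in conf/sqoop-site.xml (the exact file location depends on your setup). A minimal sketch:

<!-- conf/sqoop-site.xml (location assumed): allow saved jobs to record passwords -->
<property>
  <name>sqoop.metastore.client.record.password</name>
  <value>true</value>
</property>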

11.4. Saved jobs and incremental imports

Incremental imports are performed by comparing the values in a check column against a reference value for the most recent import. For example, if the --incremental append argument was specified, along with --check-column id and --last-value 100, all rows with id > 100 will be imported. If an incremental import is run from the command line, the value which should be specified as --last-value in a subsequent incremental import will be printed to the screen for your reference. If an incremental import is run from a saved job, this value will be retained in the saved job. Subsequent runs of sqoop job --exec someIncrementalJob will continue to import only newer rows than those previously imported.

12. sqoop-metastore

12.1. Purpose
12.2. Syntax

12.1. Purpose

The metastore tool configures Sqoop to host a shared metadata repository. Multiple users and/or remote users can define and execute saved jobs (created with sqoop job) defined in this metastore.

Clients must be configured to connect to the metastore in sqoop-site.xml or with the --meta-connect argument.

12.2. Syntax

$ sqoop metastore (generic-args) (metastore-args)
$ sqoop-metastore (generic-args) (metastore-args)

Although the Hadoop generic arguments must precede any metastore arguments, the metastore arguments can be entered in any order with respect to one another.

Table 24. Metastore management options:

Argument                                 Description
--shutdown                               Shuts down a running metastore instance on the same machine.

Running sqoop-metastore launches a shared HSQLDB database instance on the current machine. Clients can connect to this metastore and create jobs which can be shared between users for execution.

The location of the metastore's files on disk is controlled by the sqoop.metastore.server.location property in conf/sqoop-site.xml. This should point to a directory on the local filesystem.

The metastore is available over TCP/IP. The port is controlled by the sqoop.metastore.server.port configuration parameter, and defaults to 16000.

Clients should connect to the metastore by specifying sqoop.metastore.client.autoconnect.url or --meta-connect with the value jdbc:hsqldb:hsql://<server-name>:<port>/sqoop. For example, jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop.

This metastore may be hosted on a machine within the Hadoop cluster, or elsewhere on the network.
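To make the server and client settings concrete, here is a hedged sketch of the relevant conf/sqoop-site.xml entries (the storage directory and hostname are hypothetical):

<!-- On the metastore host: where job data is stored and which port the metastore listens on -->
<property>
  <name>sqoop.metastore.server.location</name>
  <value>/var/lib/sqoop/metastore</value>
</property>
<property>
  <name>sqoop.metastore.server.port</name>
  <value>16000</value>
</property>

<!-- On client machines: connect to the shared metastore automatically -->
<property>
  <name>sqoop.metastore.client.autoconnect.url</name>
  <value>jdbc:hsqldb:hsql://metaserver.example.com:16000/sqoop</value>
</property>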

13. sqoop-merge

13.1. Purpose
13.2. Syntax

13.1. Purpose

The merge tool allows you to combine two datasets where entries in one dataset should overwrite entries of an older dataset. For example, an incremental import run in last-modified mode will generate multiple datasets in HDFS where successively newer data appears in each dataset. The merge tool will "flatten" two datasets into one, taking the newest available records for each primary key.

13.2. Syntax

$ sqoop merge (generic-args) (merge-args)
$ sqoop-merge (generic-args) (merge-args)

Although the Hadoop generic arguments must precede any merge arguments, the job arguments can be entered in any order with respect to one another.

Table 25. Merge options:

Argument                                 Description
--class-name <class>                     Specify the name of the record-specific class to use during the merge job.
--jar-file <file>                        Specify the name of the jar to load the record class from.
--merge-key <col>                        Specify the name of a column to use as the merge key.
--new-data <path>                        Specify the path of the newer dataset.
--onto <path>                            Specify the path of the older dataset.
--target-dir <path>                      Specify the target path for the output of the merge job.

The merge tool runs a MapReduce job that takes two directories as input: a newer dataset, and an older one. These are specified with --new-data and --onto respectively. The output of the MapReduce job will be placed in the directory in HDFS specified by --target-dir.

When merging the datasets, it is assumed that there is a unique primary key value in each record. The column for the primary key is specified with --merge-key. Multiple rows in the same dataset should not have the same primary key, or else data loss may occur.

To parse the dataset and extract the key column, the auto-generated class from a previous import must be used. You should specify the class name and jar file with --class-name and --jar-file. If this is not available, you can recreate the class using the codegen tool.

The merge tool is typically run after an incremental import with the date-last-modified mode (sqoop import --incremental lastmodified ...).

Supposing two incremental imports were performed, where some older data is in an HDFS directory named older and newer data is in an HDFS directory named newer, these could be merged like so:

$ sqoop merge --new-data newer --onto older --target-dir merged \
    --jar-file datatypes.jar --class-name Foo --merge-key id

This would run a MapReduce job where the value in the id column of each row is used to join rows; rows in the newer dataset will be used in preference to rows in the older dataset.

This can be used with SequenceFile-, Avro-, and text-based incremental imports. The file types of the newer and older datasets must be the same.

14. sqoop-codegen

14.1. Purpose
14.2. Syntax
14.3. Example Invocations

14.1. Purpose

The codegen tool generates Java classes which encapsulate and interpret imported records. The Java definition of a record is instantiated as part of the import process, but can also be performed separately. For example, if Java source is lost, it can be recreated. New versions of a class can be created which use different delimiters between fields, and so on.
14.2. Syntax

$ sqoop codegen (generic-args) (codegen-args)
$ sqoop-codegen (generic-args) (codegen-args)

Although the Hadoop generic arguments must precede any codegen arguments, the codegen arguments can be entered in any order with respect to one another.

Table 26. Common arguments

Argument                                 Description
--connect <jdbc-uri>                     Specify JDBC connect string
--connection-manager <class-name>        Specify connection manager class to use
--driver <class-name>                    Manually specify JDBC driver class to use
--hadoop-home <dir>                      Override $HADOOP_HOME
--help                                   Print usage instructions
-P                                       Read password from console
--password <password>                    Set authentication password
--username <username>                    Set authentication username
--verbose                                Print more information while working
--connection-param-file <filename>       Optional properties file that provides connection parameters

Table 27. Code generation arguments:

Argument                                 Description
--bindir <dir>                           Output directory for compiled objects
--class-name <name>                      Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
--jar-file <file>                        Disable code generation; use specified jar
--outdir <dir>                           Output directory for generated code
--package-name <name>                    Put auto-generated classes in this package
--map-column-java <m>                    Override default mapping from SQL type to Java type for configured columns.

Table 28. Output line formatting arguments:

Argument                                 Description
--enclosed-by <char>                     Sets a required field enclosing character
--escaped-by <char>                      Sets the escape character
--fields-terminated-by <char>            Sets the field separator character
--lines-terminated-by <char>             Sets the end-of-line character
--mysql-delimiters                       Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: '
--optionally-enclosed-by <char>          Sets a field enclosing character

Table 29. Input parsing arguments:

Argument                                 Description
--input-enclosed-by <char>               Sets a required field encloser
--input-escaped-by <char>                Sets the input escape character
--input-fields-terminated-by <char>      Sets the input field separator
--input-lines-terminated-by <char>       Sets the input end-of-line character
--input-optionally-enclosed-by <char>    Sets a field enclosing character
Table 30. Hive arguments:

Argument                          Description
--hive-home <dir>                 Override $HIVE_HOME
--hive-import                     Import tables into Hive (uses Hive's default delimiters if none are set).
--hive-overwrite                  Overwrite existing data in the Hive table.
--create-hive-table               If set, then the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>         Sets the table name to use when importing to Hive.
--hive-delims-replacement         Replace \n, \r, and \01 in string fields with a user-defined string when importing to Hive.
--hive-partition-key              Name of the Hive field to partition on; the imported data are sharded on this field.
--hive-partition-value <v>        String value that serves as the partition key for the data imported into Hive in this job.
--map-column-hive <map>           Override default mapping from SQL type to Hive type for configured columns.
--hive-drop-import-delims         Drops \n, \r, and \01 from string fields when importing to Hive.

If Hive arguments are provided to the code generation tool, Sqoop generates a file containing the HQL statements to create a table and load data.
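As a hedged illustration (connect string and names are invented, and the exact output location depends on the configured code-generation directories), passing a Hive table name to codegen yields both the Java class and an HQL script that can be reviewed before being executed in Hive:

$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps --outdir /tmp/sqoop-gen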

14.3. Example Invocations
Recreate the record interpretation code for the employees table of a corporate database:

$ sqoop codegen --connect jdbc:mysql://db.example.com/corp \
    --table employees
15. sqoop-create-hive-table
15.1. Purpose
15.2. Syntax
15.3. Example Invocations

15.1. Purpose
The create-hive-table tool populates a Hive metastore with a definition for a table based on a database table previously imported to HDFS, or one planned to be imported. This effectively performs the "--hive-import" step of sqoop-import without running the preceding import.
If data was already loaded to HDFS, you can use this tool to finish the pipeline of importing the data to Hive. You can also create Hive tables with this tool; data then can be imported and populated into the target after a preprocessing step run by the user.
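A minimal sketch of that pipeline, assuming the table's files already sit under an HDFS directory (all names and paths here are illustrative): create the table definition with Sqoop, then load the existing files from within Hive.

$ sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps
hive> LOAD DATA INPATH '/user/example/employees' INTO TABLE emps;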

15.2. Syntax

$ sqoop create-hive-table (generic-args) (create-hive-table-args)
$ sqoop-create-hive-table (generic-args) (create-hive-table-args)

Although the Hadoop generic arguments must precede any create-hive-table arguments, the create-hive-table arguments can be entered in any order with respect to one another.
Table 31. Common arguments

Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-home <dir>                   Override $HADOOP_HOME
--help                                Print usage instructions
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters

Table 32. Hive arguments:

Argument                    Description
--hive-home <dir>           Override $HIVE_HOME
--hive-overwrite            Overwrite existing data in the Hive table.
--create-hive-table         If set, then the job will fail if the target Hive table exists. By default this property is false.
--hive-table <table-name>   Sets the table name to use when importing to Hive.
--table                     The database table to read the definition from.
Table 33. Output line formatting arguments:

Argument                          Description
--enclosed-by <char>              Sets a required field enclosing character
--escaped-by <char>               Sets the escape character
--fields-terminated-by <char>     Sets the field separator character
--lines-terminated-by <char>      Sets the end-of-line character
--mysql-delimiters                Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
--optionally-enclosed-by <char>   Sets a field enclosing character

Do not use enclosed-by or escaped-by delimiters with output formatting arguments used to import to Hive. Hive cannot currently parse them.
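A minimal sketch that stays within those constraints, using only plain field and line terminators (connect string and table names are placeholders):

$ sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps \
    --fields-terminated-by ',' --lines-terminated-by '\n'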

15.3. Example Invocations
Define in Hive a table named emps with a definition based on a database table named employees:

$ sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
    --table employees --hive-table emps
16. sqoop-eval
16.1. Purpose
16.2. Syntax
16.3. Example Invocations

16.1. Purpose
The eval tool allows users to quickly run simple SQL queries against a database; results are printed to the console. This allows users to preview their import queries to ensure they import the data they expect.
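For instance, a query can be previewed with eval and then reused in a free-form import; the connect string, query, split column and target directory below are illustrative placeholders:

$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM employees WHERE dept = 'Engineering' LIMIT 5"
$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM employees WHERE dept = 'Engineering' AND \$CONDITIONS" \
    --split-by id --target-dir /user/example/engineering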

16.2. Syntax

$ sqoop eval (generic-args) (eval-args)
$ sqoop-eval (generic-args) (eval-args)

Although the Hadoop generic arguments must precede any eval arguments, the eval arguments can be entered in any order with respect to one another.
Table 34. Common arguments

Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-home <dir>                   Override $HADOOP_HOME
--help                                Print usage instructions
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters

Table 35. SQL evaluation arguments:

Argument                   Description
-e,--query <statement>     Execute <statement> in SQL.

16.3. Example Invocations
Select ten records from the employees table:

$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    --query "SELECT * FROM employees LIMIT 10"

Insert a row into the foo table:

$ sqoop eval --connect jdbc:mysql://db.example.com/corp \
    -e "INSERT INTO foo VALUES(42, 'bar')"
17. sqoop-list-databases
17.1. Purpose
17.2. Syntax
17.3. Example Invocations

17.1. Purpose
List database schemas present on a server.

17.2. Syntax

$ sqoop list-databases (generic-args) (list-databases-args)
$ sqoop-list-databases (generic-args) (list-databases-args)

Although the Hadoop generic arguments must precede any list-databases arguments, the list-databases arguments can be entered in any order with respect to one another.
Table 36. Common arguments

Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-home <dir>                   Override $HADOOP_HOME
--help                                Print usage instructions
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters


17.3. Example Invocations
List database schemas available on a MySQL server:

$ sqoop list-databases --connect jdbc:mysql://database.example.com/
information_schema
employees

Note
This only works with HSQLDB, MySQL and Oracle. When using with Oracle, it is necessary that the user connecting to the database has DBA privileges.

18. sqoop-list-tables
18.1. Purpose
18.2. Syntax
18.3. Example Invocations

18.1. Purpose
List tables present in a database.

18.2. Syntax

$ sqoop list-tables (generic-args) (list-tables-args)
$ sqoop-list-tables (generic-args) (list-tables-args)

Although the Hadoop generic arguments must precede any list-tables arguments, the list-tables arguments can be entered in any order with respect to one another.
Table 37. Common arguments

Argument                              Description
--connect <jdbc-uri>                  Specify JDBC connect string
--connection-manager <class-name>     Specify connection manager class to use
--driver <class-name>                 Manually specify JDBC driver class to use
--hadoop-home <dir>                   Override $HADOOP_HOME
--help                                Print usage instructions
-P                                    Read password from console
--password <password>                 Set authentication password
--username <username>                 Set authentication username
--verbose                             Print more information while working
--connection-param-file <filename>    Optional properties file that provides connection parameters

18.3. Example Invocations
List tables available in the "corp" database:

$ sqoop list-tables --connect jdbc:mysql://database.example.com/corp
employees
payroll_checks
job_descriptions
office_supplies
19. sqoop-help
19.1. Purpose
19.2. Syntax
19.3. Example Invocations

19.1. Purpose
List tools available in Sqoop and explain their usage.

19.2. Syntax

$ sqoop help [tool-name]
$ sqoop-help [tool-name]

If no tool name is provided (for example, the user runs sqoop help), then the available tools are listed. With a tool name, the usage instructions for that specific tool are presented on the console.

19.3. Example Invocations
List available tools:

$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

Display usage instructions for the import tool:

$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>                Specify JDBC connect string
   --connection-manager <class-name>   Specify connection manager class to use
   --driver <class-name>               Manually specify JDBC driver class to use
   --hadoop-home <dir>                 Override $HADOOP_HOME
   --help                              Print usage instructions
   -P                                  Read password from console
   --password <password>               Set authentication password
   --username <username>               Set authentication username
   --verbose                           Print more information while working
...
20. sqoop-version
20.1. Purpose
20.2. Syntax
20.3. Example Invocations

20.1. Purpose
Display version information for Sqoop.
20.2. Syntax

$ sqoop version
$ sqoop-version

20.3. Example Invocations
Display the version:

$ sqoop version
Sqoop 1.4.2
...

21. Compatibility Notes
21.1. Supported Databases
21.2. MySQL
21.2.1. zeroDateTimeBehavior
21.2.2. UNSIGNED columns
21.2.3. BLOB and CLOB columns
21.2.4. Importing views in direct mode
21.2.5. Direct-mode Transactions
21.3. PostgreSQL
21.3.1. Importing views in direct mode
21.4. Oracle
21.4.1. Dates and Times
21.5. Schema Definition in Hive
Sqoop uses JDBC to connect to databases and adheres to published standards as much as possible. For databases which do not support standards-compliant SQL, Sqoop uses alternate code paths to provide functionality. In general, Sqoop is believed to be compatible with a large number of databases, but it is tested with only a few.
Nonetheless, several database-specific decisions were made in the implementation of Sqoop, and some databases offer additional settings which are extensions to the standard.
This section describes the databases tested with Sqoop, any exceptions in Sqoop's handling of each database relative to the norm, and any database-specific settings available in Sqoop.

21.1. Supported Databases
While JDBC is a compatibility layer that allows a program to access many different databases through a common API, slight differences in the SQL language spoken by each database may mean that Sqoop can't use every database out of the box, or that some databases may be used in an inefficient manner.
When you provide a connect string to Sqoop, it inspects the protocol scheme to determine appropriate vendor-specific logic to use. If Sqoop knows about a given database, it will work automatically. If not, you may need to specify the driver class to load via --driver. This will use a generic code path which uses standard SQL to access the database. Sqoop provides some databases with faster, non-JDBC-based access mechanisms. These can be enabled by specifying the --direct parameter.
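As a rough illustration (host, database and table names are invented), a MySQL import that opts into the faster non-JDBC path would simply add --direct:

$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table employees --direct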
Sqoop includes vendor-specific support for the following databases:

Database     version   --direct support?   connect string matches
HSQLDB       1.8.0+    No                  jdbc:hsqldb:*//
MySQL        5.0+      Yes                 jdbc:mysql://
Oracle       10.2.0+   No                  jdbc:oracle:*//
PostgreSQL   8.3+      Yes (import only)   jdbc:postgresql://
Sqoop may work with older versions of the databases listed, but we have only tested it with the versions specified above.
Even if Sqoop supports a database internally, you may still need to install the database vendor's JDBC driver in your $SQOOP_HOME/lib path on your client. Sqoop can load classes from any jars in $SQOOP_HOME/lib on the client and will use them as part of any MapReduce jobs it runs; unlike older versions, you no longer need to install JDBC jars in the Hadoop library path on your servers.

21.2. MySQL
21.2.1. zeroDateTimeBehavior
21.2.2. UNSIGNED columns
21.2.3. BLOB and CLOB columns
21.2.4. Importing views in direct mode
21.2.5. Direct-mode Transactions

JDBC Driver: MySQL Connector/J
MySQL v5.0 and above offers very thorough coverage by Sqoop. Sqoop has been tested with the MySQL Connector/J JDBC driver.

21.2.1. zeroDateTimeBehavior
MySQL allows values of '0000-00-00' for DATE columns, which is a non-standard extension to SQL. When communicated via JDBC, these values are handled in one of three different ways:
Convert to NULL.
Throw an exception in the client.
Round to the nearest legal date ('0001-01-01').
You specify the behavior by using the zeroDateTimeBehavior property of the connect string. If a zeroDateTimeBehavior property is not specified, Sqoop uses the convertToNull behavior.
You can override this behavior. For example:

$ sqoop import --table foo \
    --connect jdbc:mysql://db.example.com/someDb?zeroDateTimeBehavior=round
21.2.2. UNSIGNED columns
Columns with type UNSIGNED in MySQL can hold values between 0 and 2^32 (4294967295), but the database will report the data type to Sqoop as INTEGER, which can hold values between -2147483648 and 2147483647. Sqoop cannot currently import UNSIGNED values above 2147483647.

21.2.3. BLOB and CLOB columns
Sqoop's direct mode does not support imports of BLOB, CLOB, or LONGVARBINARY columns. Use JDBC-based imports for these columns; do not supply the --direct argument to the import tool.

21.2.4. Importing views in direct mode
Sqoop does not currently support importing from views in direct mode. Use JDBC-based (non-direct) mode in case that you need to import a view (simply omit the --direct parameter).

21.2.5. Direct-mode Transactions
For performance, each writer will commit the current transaction approximately every 32 MB of exported data. You can control this by specifying the following argument before any tool-specific arguments: -D sqoop.mysql.export.checkpoint.bytes=size, where size is a value in bytes. Set size to 0 to disable intermediate checkpoints, but individual files being exported will continue to be committed independently of one another.

Important
Note that any arguments to Sqoop that are of the form -D parameter=value are Hadoop generic arguments and must appear before any tool-specific arguments (for example, --connect, --table, etc).
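For instance, a direct-mode export that disables intermediate checkpoints might be invoked roughly as follows; the connect string, table and export directory are placeholders:

$ sqoop export -D sqoop.mysql.export.checkpoint.bytes=0 \
    --connect jdbc:mysql://db.example.com/corp --table bar \
    --export-dir /results/bar_data --direct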

21.3. PostgreSQL
21.3.1. Importing views in direct mode

Sqoop supports a JDBC-based connector for PostgreSQL: http://jdbc.postgresql.org/
The connector has been tested using JDBC driver version "9.1-903 JDBC 4" with PostgreSQL server 9.1.

21.3.1. Importing views in direct mode
Sqoop does not currently support importing from views in direct mode. Use JDBC-based (non-direct) mode in case that you need to import a view (simply omit the --direct parameter).

21.4. Oracle
21.4.1. Dates and Times

JDBC Driver: Oracle JDBC Thin Driver. Sqoop is compatible with ojdbc6.jar.
Sqoop has been tested with Oracle 10.2.0 Express Edition. Oracle is notable in its different approach to SQL from the ANSI standard, and its non-standard JDBC driver. Therefore, several features work differently.

21.4.1. Dates and Times
Oracle JDBC represents DATE and TIME SQL types as TIMESTAMP values. Any DATE columns in an Oracle database will be imported as a TIMESTAMP in Sqoop, and Sqoop-generated code will store these values in java.sql.Timestamp fields.
When exporting data back to a database, Sqoop parses text fields as TIMESTAMP types (with the form yyyy-mm-dd HH24:MI:SS.ff) even if you expect these fields to be formatted with the JDBC date escape format of yyyy-mm-dd. Dates exported to Oracle should be formatted as full timestamps.
Oracle also includes the additional date/time types TIMESTAMP WITH TIMEZONE and TIMESTAMP WITH LOCAL TIMEZONE. To support these types, the user's session timezone must be specified. By default, Sqoop will specify the timezone "GMT" to Oracle. You can override this setting by specifying a Hadoop property oracle.sessionTimeZone on the command line when running a Sqoop job. For example:

$ sqoop import -D oracle.sessionTimeZone=America/Los_Angeles \
    --connect jdbc:oracle:thin:@//db.example.com/foo --table bar

Note that Hadoop parameters (-D ...) are generic arguments and must appear before the tool-specific arguments (--connect, --table, and so on).
Legal values for the session timezone string are enumerated at http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14225/applocaledata.htm#i637736.

21.5. Schema Definition in Hive
Hive users will note that there is not a one-to-one mapping between SQL types and Hive types. In general, SQL types that do not have a direct mapping (for example, DATE, TIME, and TIMESTAMP) will be coerced to STRING in Hive. The NUMERIC and DECIMAL SQL types will be coerced to DOUBLE. In these cases, Sqoop will emit a warning in its log messages informing you of the loss of precision.
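If the default coercion is not acceptable for a particular column, the mapping can be overridden per column. In this hedged sketch the table and column names are invented, and the price column is kept as a Hive STRING so no precision is silently dropped:

$ sqoop import --connect jdbc:mysql://db.example.com/corp \
    --table orders --hive-import --map-column-hive price=STRING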
22. Getting Support
Some general information is available at the http://sqoop.apache.org/ web site.
Report bugs in Sqoop to the issue tracker at https://issues.apache.org/jira/browse/SQOOP.
Questions and discussion regarding the usage of Sqoop should be directed to the sqoop-user mailing list.
Before contacting either forum, run your Sqoop job with the --verbose flag to acquire as much debugging information as possible. Also report the string returned by sqoop version as well as the version of Hadoop you are running (hadoop version).

23. Troubleshooting
23.1. General Troubleshooting Process
23.2. Specific Troubleshooting Tips
23.2.1. Oracle: Connection Reset Errors
23.2.2. Oracle: Case-Sensitive Catalog Query Errors
23.2.3. MySQL: Connection Failure
23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)
23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely

23.1. General Troubleshooting Process
The following steps should be followed to troubleshoot any failure that you encounter while running Sqoop.
Turn on verbose output by executing the same command again and specifying the --verbose option. This produces more debug output on the console which can be inspected to identify any obvious errors.
Look at the task logs from Hadoop to see if there are any specific failures recorded there. It is possible that a failure that occurs during task execution is not relayed correctly to the console.
Make sure that the necessary input files or input/output tables are present and can be accessed by the user that Sqoop is executing as or connecting to the database as. It is possible that the necessary files or tables are present but the specific user that Sqoop connects as does not have the necessary permissions to access them.
If you are doing a compound action such as populating a Hive table or partition, try breaking the job into two separate actions to see where the problem really occurs. For example, if an import that creates and populates a Hive table is failing, you can break it down into two steps: first doing the import alone, and second creating the Hive table without the import using the create-hive-table tool. While this does not address the original use case of populating the Hive table, it does help narrow down the problem to either the regular import or the creation and population of the Hive table.
Search the mailing list archives and JIRA for keywords relating to the problem. It is possible that you may find a solution discussed there that will help you solve or work around your problem.

23.2. Specific Troubleshooting Tips
23.2.1. Oracle: Connection Reset Errors
23.2.2. Oracle: Case-Sensitive Catalog Query Errors
23.2.3. MySQL: Connection Failure
23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)
23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely

23.2.1. Oracle: Connection Reset Errors
Problem: When using the default Sqoop connector for Oracle, some data does get transferred, but during the map-reduce job a lot of errors are reported as below:

java.net.SocketException: Connection reset
(reported repeatedly in the failing map task logs; the full stack trace is not reproduced here)
Solution: This problem occurs primarily due to the lack of a fast random number generation device on the host where the map tasks execute. On typical Linux systems this can be addressed by setting the following property in the java.security file:

securerandom.source=file:/dev/../dev/urandom

The java.security file can be found under the $JAVA_HOME/jre/lib/security directory. Alternatively, this property can also be specified on the command line via:

-D mapred.child.java.opts="-Djava.security.egd=file:/dev/../dev/urandom"
23.2.2. Oracle: Case-Sensitive Catalog Query Errors
Problem: While working with Oracle you may encounter problems when Sqoop cannot figure out column names. This happens because the catalog queries that Sqoop uses for Oracle expect the correct case to be specified for the user name and table name.
One example, using --hive-import and resulting in a NullPointerException:

java.lang.NullPointerException
(thrown while Sqoop resolves the column list for the table; the full console transcript is not reproduced here)
Solution:
1. Specify the user name, which Sqoop is connecting as, in upper case (unless it was created with mixed/lower case within quotes).
2. Specify the table name, which you are working with, in upper case (unless it was created with mixed/lower case within quotes).
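A hedged illustration of the fix (connect string, credentials and table are placeholders): pass the catalog identifiers in upper case rather than lower case.

$ sqoop import --connect jdbc:oracle:thin:@//db.example.com/XE \
    --username SCOTT --password MYPWD --table EMPLOYEES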

23.2.3. MySQL: Connection Failure
Problem: While importing a MySQL table into Sqoop, if you do not have the necessary permissions to access your MySQL database over the network, you may get the below connection failure:

java.net.ConnectException: Connection refused

Solution: First, verify that you can connect to the database from the node where you are running Sqoop:

$ mysql --host=<IP Address> --database=test --user=<username> --password

If this works, it rules out any problem with the client network configuration or security/authentication configuration.
Add the network port for the server to your my.cnf file:

[mysqld]
port = xxxx

Set up a user account to connect via Sqoop. Grant permissions to the user to access the database over the network: (1.) Log into MySQL as root (mysql -u root -p). (2.) Issue the following command:

mysql> grant all privileges on test.* to 'testuser'@'%' identified by 'testpassword'

Note that doing this will enable the testuser to connect to the MySQL server from any IP address. While this will work, it is not advisable for a production environment. We advise consulting with your DBA to grant the necessary privileges based on the setup topology.
If the database server's IP address changes, unless it is bound to a static hostname in your server, the connect string passed into Sqoop will also need to be changed.

23.2.4. Oracle: ORA-00933 error (SQL command not properly ended)
Problem: While working with Oracle you may encounter the below problem when the Sqoop command explicitly specifies the --driver <driver name> option. When the driver option is included in the Sqoop command, the built-in connection manager selection defaults to the generic connection manager, which causes this issue with Oracle. If the driver option is not specified, the built-in connection manager selection mechanism selects the Oracle-specific connection manager, which generates valid SQL for Oracle and uses the driver "oracle.jdbc.OracleDriver".

ORA-00933: SQL command not properly ended

Solution: Omit the option --driver oracle.jdbc.driver.OracleDriver and then re-run the Sqoop command.

23.2.5. MySQL: Import of TINYINT(1) from MySQL behaves strangely
Problem: Sqoop is treating TINYINT(1) columns as booleans, which is for example causing issues with Hive import. This is because by default the MySQL JDBC connector maps TINYINT(1) to java.sql.Types.BIT, which Sqoop by default maps to Boolean.
Solution: A cleaner solution is to force the MySQL JDBC Connector to stop converting TINYINT(1) to java.sql.Types.BIT by adding tinyInt1isBit=false to your JDBC URL (to create something like jdbc:mysql://localhost/test?tinyInt1isBit=false). Another solution would be to explicitly override the column mapping for the TINYINT(1) column. For example, if the column name is foo, then pass the following option to Sqoop during import: --map-column-hive foo=tinyint. In the case of non-Hive imports to HDFS, use --map-column-java foo=integer.
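Putting the connector flag and the column override together, a sketch of such an import might look like the following; the host, database, table and column names are illustrative:

$ sqoop import \
    --connect "jdbc:mysql://db.example.com/corp?tinyInt1isBit=false" \
    --table customers --hive-import \
    --map-column-hive active_flag=tinyint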
This document was built from Sqoop source available at http://svn.apache.org/repos/asf/sqoop/trunk/.
