ions on Hadoop
1. What does commodity hardware in the Hadoop world mean? (D)
a) Very cheap hardware
b) Industry standard hardware
c) Discarded hardware
d) Low specifications industry grade hardware
2. Which of the following are NOT big data problem(s)? (D)
a) Parsing a 5 MB XML file every 5 minutes
b) Processing IPL tweet sentiments
c) Processing online bank transactions
d) Both (a) and (c)
3. What does Velocity in Big Data mean? (D)
a) Speed of input data generation
b) Speed of individual machine processors
c) Speed of ONLY storing data
d) Speed of storing and processing data
4. The term Big Data first originated from: (C)
a) Stock Markets Domain
b) Banking and Finance Domain
c) Genomics and Astronomy Domain
d) Social Media Domain
5. Which of the following batch processing instances is NOT an example of Big Data batch processing? (D)
a) Processing 10 GB sales data every 6 hours
b) Processing flights sensor data
c) Web crawling app
d) Trending topic analysis of tweets for last 15 minutes
6. Which of the following are example(s) of Real Time Big Data Processing? (D)
a) Complex Event Processing (CEP) platforms
b) Stock market data analysis
c) Bank fraud transactions detection
d) Both (a) and (c)
7. Sliding window operations typically fall in the category of __________________. (C)
a) OLTP Transactions
b) Big Data Batch Processing
c) Big Data Real Time Processing
d) Small Batch Processing
8. What is HBase used as? (A)
a) Tool for random and fast read/write operations in Hadoop
b) Faster read-only query engine in Hadoop
c) MapReduce alternative in Hadoop
d) Fast MapReduce layer in Hadoop
9. What is Hive used as? (D)
a) Hadoop query engine
b) MapReduce wrapper
c) Hadoop SQL interface
d) All of the above
10. Which of the following are NOT true for Hadoop? (D)
a) It's a tool for Big Data analysis
b) It supports structured and unstructured data analysis
c) It aims for vertical scaling out/in scenarios
d) Both (a) and (c)
11. Which of the following are the core components of Hadoop? (D)
a) HDFS
b) MapReduce
c) HBase
d) Both (a) and (b)
12. Hadoop is open source. (B)
a) ALWAYS True
b) True only for Apache Hadoop
c) True only for Apache and Cloudera Hadoop
d) ALWAYS False
13. Hive can be used for real time queries. (B)
a) TRUE
b) FALSE
c) True if data set is small
d) True for some distributions
14. What is the default HDFS block size? (D)
a) 32 MB
b) 64 KB
c) 128 KB
d) 64 MB
15. What is the default HDFS replication factor? (C)
a) 4
b) 1
c) 3
d) 2
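The answers to questions 14 and 15 together determine how many blocks a file occupies and how much raw disk it consumes. A minimal sketch of that arithmetic, assuming the 64 MB block size and replication factor 3 given above (Hadoop 2.x and later default to 128 MB blocks); the function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=64, replication=3):
    """Return (blocks, block_replicas, raw_storage_mb) for one file.

    Illustrative helper only. The last block of a file holds just the
    remaining bytes, so raw storage is file size times replication.
    """
    blocks = math.ceil(file_size_mb / block_size_mb)
    block_replicas = blocks * replication
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb

# A 1 GB file with the defaults from questions 14 and 15
print(hdfs_footprint(1024))  # (16, 48, 3072)
```

With 128 MB blocks the same file would need 8 blocks, while raw storage would stay at 3072 MB.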
16. Which of the following is NOT a type of metadata in NameNode? (C)
a) List of files
b) Block locations of files
c) No. of file records
d) File access control information
17. Which of the following is/are correct? (D)
a) NameNode is the SPOF in Hadoop 1.x
b) NameNode is the SPOF in Hadoop 2.x
c) NameNode keeps the image of the file system also
d) Both (a) and (c)
18. The mechanism used to create replica in HDFS is ____________. (C)
a) Gossip protocol
b) Replicate protocol
c) HDFS protocol
d) Store and Forward protocol
19. NameNode tries to keep the first copy of data nearest to the client machine. (C)
a) ALWAYS true
b) ALWAYS False
c) True if the client machine is part of the cluster
d) True if the client machine is not part of the cluster
20. HDFS data blocks can be read in parallel. (A)
a) TRUE
b) FALSE
21. Where is HDFS replication factor controlled? (D)
a) mapred-site.xml
b) yarn-site.xml
c) core-site.xml
d) hdfs-site.xml
22. Read the statement and select the correct option: (B)
It is necessary to default all the properties in Hadoop config files.
a) True
b) False
23. Which of the following Hadoop config files is used to define the heap size? (C)
a) hdfs-site.xml
b) core-site.xml
c) hadoop-env.sh
d) Slaves
24. Which of the following is not a valid Hadoop config file? (B)
a) mapred-site.xml
b) hadoop-site.xml
c) core-site.xml
d) Masters
25. Read the statement:
NameNodes are usually high storage machines in the clusters. (B)
a) True
b) False
c) Depends on cluster size
d) True if co-located with Job tracker
26. From the options listed below, select the suitable data sources for Flume. (D)
a) Publicly open web sites
b) Local data folders
c) Remote web servers
d) Both (a) and (c)
27. Read the statement and select the correct options: (A)
distcp command ALWAYS needs fully qualified HDFS paths.
a) True
b) False
c) True, if source and destination are in same cluster
d) False, if source and destination are in same cluster
28. Which of following statement(s) are true about distcp command? (A)
a) It invokes MapReduce in background
b) It invokes MapReduce if source and destination are in same cluster
c) It can't copy data from local folder to HDFS folder
d) You can't overwrite the files through distcp command
29. Which of the following is NOT a component of Flume? (B)
a) Sink
b) Database
c) Source
d) Channel
30. Which of the following is the correct sequence of MapReduce flow? (C)
a) Map -> Reduce -> Combine
b) Combine -> Reduce -> Map
c) Map -> Combine -> Reduce
d) Reduce -> Combine -> Map
31. Which of the following can be used to control the number of part files in a MapReduce program output directory? (B)
a) Number of Mappers
b) Number of Reducers
c) Counter
d) Partitioner
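Question 31 follows from how reduce output is laid out: each reducer writes one part file, and the partitioner only decides which reducer a key goes to. The sketch below simulates this in plain Python. It is not Hadoop code: the functions `partition` and `run_job` are hypothetical, and CRC32 is swapped in for the key hash so the result is deterministic (Hadoop's default partitioner hashes the key object instead).

```python
import zlib

def partition(key, num_reducers):
    # Stand-in for a hash partitioner: same key always maps to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_job(records, num_reducers):
    # One output "file" per reducer, created even if it receives no records.
    parts = {"part-r-%05d" % i: [] for i in range(num_reducers)}
    for key, value in records:
        name = "part-r-%05d" % partition(key, num_reducers)
        parts[name].append((key, value))
    return parts

records = [("hive", 1), ("pig", 1), ("hbase", 1), ("hive", 1), ("sqoop", 1)]
output = run_job(records, num_reducers=4)
print(sorted(output))  # four part files, regardless of how many keys exist
```

Changing `num_reducers` changes the number of part files; changing the partitioner only changes which file each key lands in.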
32. Which of the following operations can't use Reducer as combiner also? (D)
a) Group by Minimum
b) Group by Maximum
c) Group by Count
d) Group by Average
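The reason behind question 32 is that an average of partial averages is generally not the overall average, whereas a maximum of partial maximums is still the overall maximum. A small plain-Python illustration with made-up values for a single key split across two mappers; the helper names `combine` and `reduce_avg` are hypothetical and show one common workaround, which is to have the combiner emit (sum, count) pairs.

```python
def mean(values):
    return sum(values) / len(values)

# Values for one key, as seen by two different mappers (illustrative data).
mapper_1 = [10, 20, 30]
mapper_2 = [40]

true_mean = mean(mapper_1 + mapper_2)                    # 25.0
mean_of_means = mean([mean(mapper_1), mean(mapper_2)])   # 30.0, which is wrong

true_max = max(mapper_1 + mapper_2)                      # 40
max_of_maxes = max([max(mapper_1), max(mapper_2)])       # 40, still correct

def combine(values):
    # Combiner-style step: emit a partial (sum, count) instead of an average.
    return (sum(values), len(values))

def reduce_avg(partials):
    # Reducer-style step: merge the partial sums and counts, then divide once.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

print(reduce_avg([combine(mapper_1), combine(mapper_2)]))  # 25.0
```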
33. Which of the following is/are true about combiners? (D)
a) Combiners can be used for mapper only job
b) Combiners can be used for any MapReduce operation
c) Mappers can be used as a combiner class
d) Combiners are primarily aimed to improve MapReduce performance
e) Combiners can't be applied for associative operations
34. Reduce side join is useful for (A)
a) Very large datasets
b) Very small data sets
c) One small and other big data sets
d) One big and other small datasets
35. Distributed Cache can be used in (D)
a) Mapper phase only
b) Reducer phase only
c) In either phase, but not on both sides simultaneously
d) In either phase
36. Counters persist the data on hard disk. (B)
a) True
b) False
37. What is optimal size of a file for distributed cache? (C)
a) <=10 MB
b) >=250 MB
c) <=100 MB
d) <=35 MB
38. Number of mappers is decided by the (D)
a) Mappers specified by the programmer
b) Available Mapper slots
c) Available heap memory
d) Input Splits
e) Input Format
39. Which of the following type of joins can be performed in Reduce side join operation? (E)
a) Equi Join
b) Left Outer Join
c) Right Outer Join
d) Full Outer Join
e) All of the above
40. What should be an upper limit for counters of a MapReduce job? (D)
a) ~5
b) ~15
c) ~150
d) ~50
41. Which of the following class is responsible for converting inputs to key-value pairs of MapReduce? (C)
a) FileInputFormat
b) InputSplit
c) RecordReader
d) Mapper
42. Which of the following writables can be used to know value from a mapper/reducer? (C)
a) Text
b) IntWritable
c) NullWritable
d) String
43. Distributed cache files can't be accessed in Reducer. (B)
a) True
b) False
44. Only one distributed cache file can be used in a MapReduce job. (B)
a) True
b) False
45. A MapReduce job can be written in: (D)
a) Java
b) Ruby
c) Python
d) Any language which can read from input stream
46. Pig is a: (B)
a) Programming Language
b) Data Flow Language
c) Query Language
d) Database
47. Pig is good for: (E)
a) Data Factory operations
b) Data Warehouse operations
c) Implementing complex SQLs
d) Creating multiple datasets from a single large dataset
e) Both (a) and (d)
48. Pig can be used for real time data updates. (B)
a) True
b) False
49. Pig jobs have the same run time as the native MapReduce jobs. (B)
a) True
b) False
50. Which of the following is the correct representation to access Skill from the bag {Skills, 55, (Skill, Speed), {2, (San, Mateo)}}? (C)
a) $3.$1
b) $3.$0
c) $2.$0
d) $2.$1
51. Replicated joins are useful for dealing with data skew. (B)
a) True
b) False
52. Maximum size allowed for small dataset in replicated join is: (C)
a) 10 KB
b) 10 MB
c) 100 MB
d) 500 MB
53. Parameters could be passed to Pig scripts from: (E)
a) Parent Pig Scripts
b) Shell Script
c) Command Line
d) Configuration File
e) All the above except (a)
54. The schema of a relation can be examined through: (B)
a) ILLUSTRATE
b) DESCRIBE
c) DUMP
d) EXPLAIN
55. DUMP statement writes the output in a file. (B)
a) True
b) False
56. Data can be supplied to PigUnit tests from: (C)
a) HDFS Location
b) Within Program
c) Both (a) and (b)
d) None of the above
57. Which of the following constructs are valid Pig Control Structures? (D)
a) If-else
b) For Loop
c) Until Loop
d) None of the above
58. Which of following is the return data type of Filter UDF? (C)
a) String
b) Integer
c) Boolean
d) None of the above
59. UDFs can be applied only in FOREACH statements in Pig. (A)
a) True
b) False
60. Which of the following are not possible in Hive? (E)
a) Creating Tables
b) Creating Indexes
c) Creating Synonym
d) Writing Update Statements
e) Both (c) and (d)
61. Who will initiate the mapper? (A)
a) Task tracker
b) Job tracker
c) Combiner
d) Reducer
62. Categorize each of the following by data type:
a) JSON files - Semi-structured
b) Word Docs, PDF Files, Text files - Unstructured
c) Email body - Unstructured
d) Data from enterprise systems (DB, CRM) - Structured
63. Which of the following are the Big Data Solutions Candidates? (E)
a) Processing 1.5 TB data every day
b) Processing 30 minutes Flight sensor data
c) Interconnecting 50K data points (approx. 1 MB input file)
d) Processing User clicks on a website
e) All of the above
64. Hadoop is a framework that allows the distributed processing of: (C)
a) Small Data Sets
b) Semi-Large Data Sets
c) Large Data Sets
d) Large and Small Data sets
65. Where does Sqoop ingest data from? (B) & (D)
a) Linux File Directory
b) Oracle
c) HBase
d) MySQL
e) MongoDB
66. Identify the batch processing scenarios from following: (C) & (E)
a) Sliding Window Averages Job
b) Facebook Comments Processing Job
c) Inventory Dynamic Pricing Job
d) Fraudulent Transaction Identification Job
e) Financial Forecasting Job
67. Which of the following is not true about NameNode? (B) & (C) & (D)
a) It is the Master Machine of the Cluster
b) It is NameNode that can store user data
c) NameNode is a storage heavy machine
d) NameNode can be replaced by any DataNode Machine
68. Which of the following are NOT metadata items? (E)
a) List of HDFS files
b) HDFS block locations
c) Replication factor of files
d) Access Rights
e) File Records distribution
69. What decides number of Mappers for a MapReduce job? (C)
a) File Location
b) mapred.map.tasks parameter
c) Input file size
d) Input Splits
70. NameNode monitors block replication process (B)
a) TRUE
b) FALSE
c) Depends on file type
71. Which of the following are true for Hadoop Pseudo Distributed Mode? (C)
a) It runs on multiple machines
b) Runs on multiple machines without any daemons
c) Runs on Single Machine with all daemons
d) Runs on Single Machine without all daemons
72. Which of following statement(s) are correct? (C)
a) Master and slaves files are optional in Hadoop 2.x
b) Master file has list of all name nodes
c) Core-site has HDFS and MapReduce related common properties
d) hdfs-site file is now deprecated in Hadoop 2.x
73. Which of the following is true for Hive? (C)
a) Hive is the database of Hadoop
b) Hive supports schema checking
c) Hive doesn't allow row level updates
d) Hive can replace an OLTP system
74. Which of the following is the highest level of Data Model in Hive? (C)
a) Table
b) View
c) Database
d) Partitions
75. Hive queries response time is in order of (C)
a) Hours at least
b) Minutes at least
c) Seconds at least
d) Milliseconds at least
76. Managed tables in Hive: (D)
a) Can load the data only from HDFS
b) Can load the data only from local file system
c) Are useful for enterprise wide data
d) Are managed by Hive for their data and metadata
77. Partitioned tables in Hive: (D)
a) Are aimed to increase the performance of the queries
b) Modify the underlying HDFS structure
c) Are not useful if the filter columns for query are different from the partition columns
d) All of the above
78. Hive UDFs can only be written in Java (B)
a) True
b) False
79. Hive can load the data from: (D)
a) Local File system
b) HDFS File system
c) Output of a Pig Job
d) All of the above
80. HBase is a key/value store. Specifically it is: (E)
a) Sparse
b) Sorted Map
c) Distributed
d) Consistent
e) Multi-dimensional
81. Which of the following is the outermost part of HBase data model (A)
a) Database
b) Table
c) Row key
d) Column family
82. Which of the following is/are true? (A & D)
a) HBase table has fixed number of Column families
b) HBase table has fixed number of Columns
c) HBase doesn't allow row level updates
d) HBase accesses HDFS data
83. Data can be loaded in HBase from Pig using (D)
a) PigStorage
b) SqoopStorage
c) BinStorage
d) HBaseStorage
84. Sqoop can load the data in HBase (A)
a) True
b) False
85. Which of the following APIs can be used for exploring HBase tables? (D)
a) HBaseDescriptor
b) HBaseAdmin
c) Configuration
d) HTable
86. Which of the following tables in HBase holds the region to key mapping? (B)
a) ROOT
b) .META.
c) MAP
d) REGIONS
87. What is the data type of version in HBase? (B)
a) INT
b) LONG
c) STRING
d) DATE
88. What is the data type of row key in HBase? (D)
a) INT
b) STRING
c) BYTE
d) BYTE[]
89. HBase first reads the data from (B)
a) Block Cache
b) Memstore
c) HFile
d) WAL
90. The high availability of NameNode is achieved in HDFS 2.x using (C)
a) Polled Edit Logs
b) Synchronized Edit Logs
c) Shared Edit Logs
d) Edit Logs Replacement
91. The application master monitors all MapReduce applications in the cluster (B)
a) True
b) False
92. HDFS Federation is useful for the cluster size of: (C)
a) >500 nodes
b) >900 nodes
c) >5000 nodes
d) >3500 nodes
93. Hive managed tables store the data in (C)
a) Local Linux path
b) Any HDFS path
c) HDFS warehouse path
d) None of the above
94. On dropping managed tables, Hive: (C)
a) Retains data, but deletes metadata
b) Retains metadata, but deletes data
c) Drops both, data and metadata
d) Retains both, data and metadata
95. Managed tables don't allow loading data from other tables. (B)
a) True
b) False
96. External tables can load the data from warehouse Hive directory. (A)
a) True
b) False
97. On dropping external tables, Hive: (A)
a) Retains data, but deletes metadata
b) Retains metadata, but deletes data
c) Drops both, data and metadata
d) Retains both, data and metadata
98. Partitioned tables can't load the data from normal (partitioned) tables (B)
a) True
b) False
99. The partitioned columns in Hive tables are (B)
a) Physically present and can be accessed
b) Physically absent but can be accessed
c) Physically present but can't be accessed
d) Physically absent and can't be accessed
100. Hive data models represent (C)
a) Table in Metastore DB
b) Table in HDFS
c) Directories in HDFS
d) None of the above
101. When is the earliest point at which the reduce method of a given Reducer can be called?
A. As soon as at least one mapper has finished processing its input split.
B. As soon as a mapper has emitted at least one record.
C. Not until all mappers have finished processing all records.
D. It depends on the InputFormat used for the job.
Answer: C
Explanation:
In a MapReduce job, reducers do not start executing the reduce method until all map tasks have completed. Reducers start copying intermediate key-value pairs from the mappers as soon as they are available. The programmer-defined reduce method is called only after all the mappers have finished.
Note: The reduce phase has 3 steps: shuffle, sort, and reduce. Shuffle is where the data is collected by the reducer from each mapper. This can happen while mappers are generating data, since it is only a data transfer. On the other hand, sort and reduce can only start once all the mappers are done.
Why is starting the reducers early a good thing? Because it spreads out the data transfer from the mappers to the reducers over time, which is a good thing if your network is the bottleneck.
Why is starting the reducers early a bad thing? Because they hog reduce slots while only copying data. Another job that starts later and would actually use the reduce slots now can't use them.
We can customize when the reducers start up by changing the default value of mapred.reduce.slowstart.completed.maps in mapred-site.xml. A value of 1.00 will wait for all the mappers to finish before starting the reducers. A value of 0.0 will start the reducers right away. A value of 0.5 will start the reducers when half of the mappers are complete. You can also change mapred.reduce.slowstart.completed.maps on a job-by-job basis.
Typically, keep mapred.reduce.slowstart.completed.maps above 0.9 if the system ever has multiple jobs running at once. This way the job doesn't hog reducers when they aren't doing anything but copying data. If we have only one job running at a time, a value of 0.1 would probably be appropriate.
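The effect of mapred.reduce.slowstart.completed.maps described above can be put into numbers. The following is a rough sketch only: the function name is hypothetical, and rounding the threshold up to a whole map task is an assumption made for illustration, not a statement about how the scheduler rounds internally.

```python
import math

def maps_needed_before_reducers(total_maps, slowstart):
    """Map tasks that must finish before reducers are scheduled (illustrative)."""
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be a fraction between 0.0 and 1.0")
    # Assumption: round the fraction up to a whole number of map tasks.
    return math.ceil(total_maps * slowstart)

# For a job with 200 map tasks
print(maps_needed_before_reducers(200, 0.0))  # 0   -> reducers start right away
print(maps_needed_before_reducers(200, 0.5))  # 100 -> after half of the mappers
print(maps_needed_before_reducers(200, 1.0))  # 200 -> after all mappers finish
```

In every case the reduce method itself still runs only after all map tasks have completed; the setting controls only when reducers begin copying map output.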
102. Which describes how a client reads a file from HDFS?
A. The client queries the NameNode for the block location(s). The NameNode returns the block location(s) to the client. The client reads the data directly off the DataNode(s).
B. The client queries all DataNodes in parallel. The DataNode that contains the requested data responds directly to the client. The client reads the data directly off the DataNode.
C. The client contacts the NameNode for the block location(s). The NameNode then queries the DataNodes for block locations. The DataNodes respond to the NameNode, and the NameNode redirects the client to the DataNode that holds the requested data block(s). The client then reads the data directly off the DataNode.
D. The client contacts the NameNode for the block location(s). The NameNode contacts the DataNode that holds the requested data block. Data is transferred from the DataNode to the NameNode, and then from the NameNode to the client.
Answer: A
103. You are developing a combiner that takes as input Text keys, IntWritable values, and emits Text keys, IntWritable values. Which interface should your class implement?
A. Combiner <Text, IntWritable, Text, IntWritable>
B. Reducer <Text, IntWritable, Text, IntWritable>
C. Combiner <Text, Text, IntWritable, IntWritable>
D. Combiner <Text, Text, IntWritable, IntWritable>
Answer: B
104. Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A. Oozie
B. Sqoop
C. Flume
D. Hadoop Streaming
E. mapred
Answer: D
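To illustrate what "any executable or script" means for Hadoop Streaming: the mapper and reducer exchange plain text lines, conventionally a key and a value separated by a tab, and the reducer receives its input sorted by key. Below is a minimal word-count sketch written as ordinary Python functions so it can be run without a cluster. The function names are hypothetical, and `run_locally` only imitates the sort step that the framework performs between the two phases; in a real streaming job each function body would live in its own script reading standard input and writing standard output.

```python
from itertools import groupby

def mapper(lines):
    # Emit "word<TAB>1" for every word, as a streaming mapper would print it.
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word

def reducer(sorted_lines):
    # Input arrives sorted by key, so equal words are adjacent and can be grouped.
    pairs = (line.split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))

def run_locally(lines):
    # Stand-in for the framework: sort the mapper output, then reduce it.
    return list(reducer(sorted(mapper(lines))))

print(run_locally(["big data", "big cluster"]))
# ['big\t2', 'cluster\t1', 'data\t1']
```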
105. How are keys and values presented and passed to the reducers during a standard sort and shuffle phase of MapReduce?
A. Keys are presented to reducer in sorted order; values for a given key are not sorted.
B. Keys are presented to reducer in sorted order; values for a given key are sorted in ascending order.
C. Keys are presented to a reducer in random order; values for a given key are not sorted.
D. Keys are presented to a reducer in random order; values for a given key are sorted in ascending order.
Answer: A
106. Assuming default settings, which best describes the order of data provided to a reducer's reduce method?
A. The keys given to a reducer aren't in a predictable order, but the values associated with those keys always are.
B. Both the keys and values passed to a reducer always appear in sorted order.
C. Neither keys nor values are in any predictable order.
D. The keys given to a reducer are in sorted order but the values associated with each key are in no predictable order.
Answer: D
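The ordering guarantee in questions 105 and 106 can be imitated in a few lines of plain Python. This is a simulation under simplifying assumptions, not Hadoop code: `shuffle_and_sort` is a hypothetical function that groups values by key and sorts only the keys, leaving each key's values in whatever order the mapper outputs happened to arrive.

```python
from collections import defaultdict

def shuffle_and_sort(mapper_outputs):
    grouped = defaultdict(list)
    # Mapper outputs are merged in arrival order, which is not guaranteed.
    for output in mapper_outputs:
        for key, value in output:
            grouped[key].append(value)
    # Only the keys are sorted; the value lists are left untouched.
    return [(key, grouped[key]) for key in sorted(grouped)]

mapper_outputs = [
    [("b", 3), ("a", 9)],  # output of one mapper
    [("a", 1), ("b", 7)],  # output of another mapper
]
print(shuffle_and_sort(mapper_outputs))
# [('a', [9, 1]), ('b', [3, 7])]
```

The keys come out as a, b, while the values for key a appear as 9 then 1. A job that needs sorted values has to arrange that itself, for example with a secondary sort.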
About Siva
Senior Hadoop developer with 4 years of experience in designing and architecture solutions for the Big Data domain and has been involved with several complex engagements.
Technical strengths include Hadoop, YARN, Mapreduce, Hive, Sqoop, Flume, Pig, HBase, Phoenix, Oozie, Falcon, Kafka, Storm, Spark, MySQL and Java.