
Data Scientist Interview

I applied online and the process took 4 weeks - interviewed at LinkedIn.


Interview Details There should be two rounds of phone interviews and one on-site
interview. I applied for the position on the LinkedIn careers page. A
recruiter contacted me the next day and scheduled my first phone screen.
After the first phone screen, the recruiter contacted me regarding the next
round of the phone screen.
Interview Question I passed the first phone screen (basic data mining
questions, including the concepts of classification and clustering, and a
simple DP question quite similar to "Climbing Stairs") and failed
the second one right after I came back from another state (basic NLP
questions like named entity extraction, basic data mining questions
like SVM and naive Bayes, and a sampling question quite similar to
reservoir sampling).
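For readers who have not met it, reservoir sampling picks k items uniformly at random from a stream whose length is not known in advance. The sketch below is only an illustration of the classic technique, not the question actually asked in the interview; the function name `reservoir_sample` and the optional `rng` parameter are assumptions made for this example.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each item is seen once and only k items are held in memory, which is why the technique suits data that does not fit in memory.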
Interview Details Asked questions about SQL and data mining.
Interview Question Questions are quite standard.
I applied online and the process took + weeks - interviewed at LinkedIn in July 2012.
Interview Details I was first contacted by a recruiter. Two phone interviews
were arranged. The first interviewer asked some basic questions about my
resume and then we went into the technical questions. Two questions were
asked, both about searching in sorted arrays of numbers. The second interview
was almost identical, except that the question was more about algorithm
design, which required general problem-solving skills.
Interview Question Questions are not difficult. It is important to review
basic algorithm design, know how to talk through the interview, and know
when to ask for help.
Interview Details Had two phone interviews with the data science group. One was
more design questions, machine learning background check, etc. The second
interview was strictly coding, algorithms and so on.
Interview Questions
Implement pow function.
Segment a long string into a set of valid words using a dictionary.
Return false if the string cannot be segmented. What is the complexity
of your solution?
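Both coding questions above have well-known textbook approaches. The Python sketch below shows one possible answer to each, assuming an integer exponent for pow and a set of words for the dictionary; the function names `power` and `segment` are illustrative, and an interviewer may expect a different interface.

```python
def power(base, exponent):
    """Compute base**exponent for an integer exponent by repeated squaring."""
    if exponent < 0:
        return 1.0 / power(base, -exponent)
    result = 1
    while exponent > 0:
        if exponent & 1:       # lowest bit set: multiply the current square in
            result *= base
        base *= base           # square for the next bit
        exponent >>= 1
    return result

def segment(text, dictionary):
    """Split text into dictionary words; return the word list, or False if impossible."""
    n = len(text)
    # back[i] is the start index of the last word in a valid split of text[:i], or None.
    back = [None] * (n + 1)
    back[0] = 0
    max_len = max((len(w) for w in dictionary), default=0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            if back[start] is not None and text[start:end] in dictionary:
                back[end] = start
                break
    if back[n] is None:
        return False
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]
```

Repeated squaring needs O(log n) multiplications for exponent n. The segmentation is a dynamic program that checks at most L candidate substrings per position, where L is the longest dictionary word, so it does O(n * L) set lookups for a string of length n (each lookup also hashes a substring of up to L characters).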
Common Analytics Interview Questions
Posted by: Sarita Digumarti on October 11, 2013 in Articles
You are excited. You have got that much-awaited interview call for that dream analytics job. You
are confident you will be perfect for the job. Now all that remains is convincing the interviewer.
Don't you wish you knew what kind of questions they are going to ask?
As co-founder and one of the chief trainers at Jigsaw Academy, an online analytics training
institute, I regularly get calls from our students days before their scheduled interview asking me
just this. I am going to share with you just what I share with them. Here you go. Below are a few
of the more popular questions you could get asked and the corresponding answers in a nutshell.
Question 1. Can you outline the various steps in an analytics project?
Broadly speaking, these are the steps. Of course, these may vary slightly depending on the type of
problem, data, tools available, etc.
1. Problem definition - The first step is, of course, to understand the business problem. What is
the problem you are trying to solve, and what is the business context? Very often, however, your
client may just give you a whole lot of data and ask you to do something with it. In such a
case you would need to take a more exploratory look at the data. Nevertheless, if the client has a
specific problem that needs to be tackled, then the first step is to clearly define and understand
the problem. You will then need to convert the business problem into an analytics problem. In
other words, you need to understand exactly what you are going to predict with the model you
build. There is no point in building a fabulous model, only to realise later that what it is
predicting is not exactly what the business needs.
2. Data Exploration - Once you have the problem defined, the next step is to explore the data
and become more familiar with it. This is especially important when dealing with a completely
new data set.
3. Data Preparation - Now that you have a good understanding of the data, you will need to
prepare it for modelling. You will identify and treat missing values, detect outliers, transform
variables, create binary variables if required and so on. This stage is heavily influenced by the
modelling technique you will use at the next stage. For example, regression involves a fair
amount of data preparation, but decision trees may need less prep, whereas clustering requires a
whole different kind of prep as compared to other techniques.
4. Modelling - Once the data is prepared, you can begin modelling. This is usually an iterative
process where you run a model, evaluate the results, tweak your approach, run another model,
evaluate the results, re-tweak and so on. You go on doing this until you come up with a model
you are satisfied with or what you feel is the best possible result with the given data.
5. Validation - The final model (or maybe the best 2-3 models) should then be put through the
validation process. In this process, you test the model using a completely new data set, i.e. data that
was not used to build the model. This process ensures that your model is a good model in general
and not just a very good model for the specific data used earlier. (Technically, this is called
avoiding overfitting.)
6. Implementation and tracking - The final model is chosen after the validation. Then you start
implementing the model and tracking the results. You need to track results to see the
performance of the model over time. In general, the accuracy of a model goes down over time.
How much time will really depend on the variables (how dynamic or static they are) and the
general environment (how static or dynamic that is).

Question 2. What do you do in data exploration?
Data exploration is done to become familiar with the data. This step is especially important when
dealing with new data. There are a number of things you will want to do in this step -
a. What is there in the data - look at the list of all the variables in the data set. Understand
the meaning of each variable using the data dictionary. Go back to the business for more
information in case of any confusion.
b. How much data is there - look at the volume of the data (how many records), look at the
time frame of the data (last 3 months, last 6 months etc.)
c. Quality of the data - how much missing information, quality of data in each variable.
Are all fields usable? If a field has data for only 10% of the observations, then maybe that field is
not usable etc.
d. You will also identify some important variables and may do a deeper investigation of
these, like looking at averages, min and max values, maybe the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data prep stage.
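The checks in points b to d can be sketched for a single numeric field as below. This is only an illustration using the Python standard library; the record layout (a list of dicts with None for a missing value) and the function name `explore` are assumptions, and in practice a tool such as SAS or a data-frame library would do the same job.

```python
import statistics

def explore(records, field):
    """Summarise one numeric field of a list of dict records; None marks a missing value."""
    values = [r.get(field) for r in records]
    present = sorted(v for v in values if v is not None)
    # Nine cut points: the 10th, 20th, ..., 90th percentiles.
    deciles = statistics.quantiles(present, n=10, method="inclusive")
    return {
        "records": len(values),                         # how much data is there
        "missing_share": 1 - len(present) / len(values),  # quality of the field
        "min": present[0],
        "max": present[-1],
        "mean": statistics.mean(present),
        "p10": deciles[0],
        "p90": deciles[-1],
    }
```

A field whose `missing_share` is close to 0.9 would be the "data for only 10% of the observations" case described in point c.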

Question 3. What do you do in data preparation?
In data preparation, you will prepare the data for the next stage, i.e. the modelling stage. What
you do here is influenced by the choice of technique you use in the next stage.
But some things are done in most cases - for example, identifying missing values and treating them,
identifying outlier values (unusual values) and treating them, transforming variables, creating
binary variables if required, etc.
This is the stage where you will partition the data as well, i.e. create training data (to do
modelling) and validation data (to do validation).
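The partitioning step can be illustrated with a simple random split. The 70/30 ratio, the fixed seed and the function name `partition` are assumptions chosen for this sketch; the article does not prescribe a particular ratio.

```python
import random

def partition(records, validation_share=0.3, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = int(round(len(shuffled) * (1 - validation_share)))
    return shuffled[:cut], shuffled[cut:]
```

The validation set is held back while the model is built, so that it can later serve as the data "not used to build the model".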

Question 4. How will you treat missing values?
The first step is to identify variables with missing values. Assess the extent of missing values. Is
there a pattern in missing values? If yes, try and identify the pattern. It may lead to interesting
insights.
If there is no pattern, then we can either ignore missing values (SAS will not use any observation with
missing data) or impute the missing values.
Simple imputation - substitute with mean or median values
OR
Case-wise imputation - for example, if we have missing values in the income field.
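Simple imputation as described above can be sketched in a few lines of Python. The use of None to mark a missing value and the function name `impute` are assumptions for this example; only the mean and median strategies named in the text are covered.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with the median (default) or mean of the observed values."""
    present = [v for v in values if v is not None]
    if strategy == "median":
        fill = statistics.median(present)
    else:
        fill = statistics.mean(present)
    return [fill if v is None else v for v in values]
```

The median is often the safer default when the field contains a few very large values, because the mean is pulled towards them.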

Question 5. How will you treat outlier values?
You can identify outliers using graphical analysis and univariate analysis. If there are only a few
outliers, you can assess them individually. If there are many, you may want to substitute the
outlier values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to ignore records with outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
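Substituting extreme values with the 1st and 99th percentile values can be sketched as follows. The function name `cap_outliers` is illustrative, and treating everything outside those two percentiles as an outlier is a simplifying assumption; as noted above, not every extreme value is really an outlier.

```python
import statistics

def cap_outliers(values):
    """Cap values below the 1st or above the 99th percentile at those percentile values."""
    # 99 cut points: the 1st, 2nd, ..., 99th percentiles.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [min(max(v, low), high) for v in values]
```

For example, in a list of the numbers 1 to 100 plus a single 10,000, the 10,000 is pulled down to the 99th percentile while the values in the middle are left unchanged.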

Question 6. How do you assess the results of a logistic regression analysis?
You can use different methods to assess how good a logistic model is.
a. Concordance - This tells you about the ability of the model to discriminate between the event
happening and not happening.
b. Lift - It helps you assess how much better the model is compared to random selection.
c. Classification matrix - helps you look at the false positives and false negatives.
Some other general questions you will most likely be asked:
What have you done to improve your data analytics knowledge in the past year?
What are your career goals?
Why do you want a career in data analytics?
The answers to these questions will have to be unique to the person answering them. The key is to
show confidence and give well thought out answers that demonstrate you are knowledgeable
about the industry and have the conviction to work hard and excel as a data analyst.
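Returning to Question 6, the three checks for a logistic model (concordance, lift and the classification matrix) can each be computed from the actual 0/1 outcomes and the predicted probabilities. The Python sketch below is a plain illustration under assumed definitions: concordance as the share of (event, non-event) pairs in which the event has the higher score, lift as the event rate in the top-scored share divided by the overall event rate, and a 0.5 cut-off for the matrix. The function names and default parameters are not from the article.

```python
def concordance(labels, scores):
    """Share of (event, non-event) pairs where the event has the higher score."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    concordant = sum(e > n for e in events for n in non_events)
    return concordant / (len(events) * len(non_events))

def lift(labels, scores, top_share=0.1):
    """Event rate among the top-scored share of records divided by the overall rate."""
    ranked = sorted(zip(scores, labels), reverse=True)
    top = ranked[: max(1, int(len(ranked) * top_share))]
    top_rate = sum(y for _, y in top) / len(top)
    return top_rate / (sum(labels) / len(labels))

def classification_matrix(labels, scores, threshold=0.5):
    """Count true/false positives and negatives at the given probability cut-off."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        key = ("t" if predicted == y else "f") + ("p" if predicted == 1 else "n")
        counts[key] += 1
    return counts
```

A concordance near 1 means the model ranks events above non-events almost every time, and a lift above 1 means the top-scored records contain more events than a random selection would.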
The Top 5 Questions A Data Scientist Should
Ask During a Job Interview
Posted on July 29, 2013 by Sean Murphy
The data science job market is hot and an incredible number of companies, large and small, are
advertising a desperate need for talent.
Before jumping on the first 6-figure offer you get,
it would be wise to ask the penetrating questions below to make sure that the seemingly golden
opportunity in front of you isn't actually pyrite.
1) Do they have data?
You might get a good laugh at this one and probably assume that the company interviewing you
must have data, as they are interviewing you (a data scientist). However, you know what they say
about ass-u-ming, right?
If the company tells you that the data is coming (similar to "the check is in the mail"), start
asking a lot more questions. Ask if the needed data sharing agreements have been signed and
even ask to see them. If not, ask what the backup plan is for if (or when) the data does not arrive.
Trust me, it always takes longer than everyone thinks.
To be an entrepreneur means to be an optimist at some level, because otherwise no one would do
something with such a low probability of success. Thus, it is pretty easy for an entrepreneur to
assume that getting data will not be that hard. It will only be after months of stalled negotiations
and several failures that they will give up on getting the data or, in startup parlance, pivot. In the
meantime, you had best figure out some other ways of being useful and creating value for your new
organization.
2) Who will you report to and what is her or his background?
So, really what you are asking is: does the person who will claim me as a minion actually have
experience with data, and do they understand the amount of time that wrangling data can take?
If you are reporting to a Management/Executive type, this question is all-important and your
very survival likely depends on your answer.
First, go read the Gervais Principle at ribbonfarm. From my experience, the ideas aren't too far
off of the mark.
Second, many data-related tasks are conceptually trivial. However, these tasks can take an
amount of time seemingly inversely proportional to their simplicity. Or, even worse, something
that is conceptually very simple may be mathematically or statistically very challenging or
require many difficult and time-consuming steps. Something like counting the number of tweets for
or against a particular topic is trivial for people but less so for algorithms.
Further, as everyone knows, data wrangling on any project can consume 80% or more of the total
project time and, unless that manager has worked with data, she or he may not understand this
reality. The rule of thumb to never forget is that if someone does not understand something, that
person will almost always underappreciate it. I swear there must be a class in American MBA
programs that teaches that if you don't understand something, it must be simple and only take five
minutes.
If you are reporting to a CTO-type, the situation may seem better but it actually might be worse.
Software engineering and development do not equal data science. Technical experience, most of
the time, does not equal data experience. Having gone through a few semesters of calculus does
not a statistics background make. Hopefully, I have made my point. There is a reason we call the
fields software "engineering" (nice and predictable) and data "science" (conducting
experiments to test hypotheses). However, many technically-oriented people may believe they
know more than they actually do.
Short version for #2 is that time expectations are important to flesh out up front and are highly
dependent on your boss' background.
Third, your communications strategy will change radically depending on your boss' background.
Do they want the sordid details of how you worked through the data, or do they just want the
bottom-line impact?
3) How will my progress and/or performance be measured?
Knowing how to succeed in your new workplace is pretty important, and the expectations
surrounding data science are stratospheric at the moment. Keep your eyes peeled for a
good quick win that would let you demonstrate your value (and this is a question that I would
directly ask).
The giant red flag here is if you will be included in an "agile" software process with data work
shoehorned into short-term sprints along with the engineering or development team. Data
Science is science, and many tasks will often have you dealing with the dreaded unknown
unknown. In other words, you are exploring terra incognita, a process that is unpredictable at
best. Managing data scientists is very different from managing software engineers.
4) How many other data scientists/practitioners will you be working with and are
in the company overall?
What you are trying to understand here is how data-driven (versus ego-driven) the company that
you are thinking of joining is.
If the company has existed for more than a few years and has few data science or analyst types, it
is probably ego-driven. Put another way, decisions are made by the HiPPOs (the HIghest Paid
Person's Opinions). If your data analyses are going to be used for internal decision making, this
possibly puts you, the new hire, directly against the HiPPOs. Guess who will win that fight? If
you are going into this position, make sure you will be arming the HiPPO with knowledge as
opposed to fighting directly against other HiPPOs.
5) Has anyone ever run analyses on the company's data?
This one is critical if you will be doing any type of retrospective analyses based on previously
collected data. If you simply ask the company if they have ever looked at their data, the answer is
often yes regardless of whether or not they have, as most companies don't want to admit that they
haven't. Instead, ask what types of analyses the company has done on its data, whether the
examination covered all of the company's data, and who (being careful to inquire about this
person's background and credentials) did the work.
The reason this line of questioning is so important is that the first time you plumb the depths of a
company's database, you are likely to dig up some skeletons. And by likely I really mean
certainly. In fact, going through historically collected data is much like an archeological
excavation. As you go further back into the database, you go through deeper layers of the history
of the organization and will learn much. You might find out when they changed contractors or
when they decided to stop collecting a particular field that you just happen to need. You might
see when the servers went down for a day or when a particularly well hidden bug prevented
database writes for a few weeks. The important point here is that you might uncover issues that
some people still present in the company would prefer not to be unearthed. My simple advice:
tread lightly.
Nailing the Tech Interview
Jessica Kirkpatrick is the Director of Data Science at InstaEDU, and formerly a data scientist
on the analytics team at Yammer (Microsoft). Before that she was an Astrophysicist at UC
Berkeley and has also been an Insight mentor since the program's founding. Below is a guest
post, originally appearing on the Women in Astronomy blog, where Jessica shares her tips on
doing well in technical job interviews.
A year ago, I made the transition from astrophysicist to data scientist. One of the harder parts of
making the transition was convincing a tech company (during the interview process) that I could
do the job. Having now been on both sides of the interview table, I'd like to share some advice with
those wishing to break into the tech/data science industry. While this advice is applicable to
candidates in general, I'm going to be gearing it towards applicants coming from academia / PhD
programs.
Most tech companies are interested in smart, talented people who can learn quickly and have
good problem solving skills. We see academics as having these skills. Therefore, if you apply for
internships or jobs at tech companies, you will most likely get a response from a recruiter. The
problem is that once you get an interview, there are a lot of industry-specific skills that the
company will try to assess, skills that you may or may not have already.
Below are some of the traits we look for when recruiting for the Yammer analytics/data team,
descriptions of how we try to determine if a candidate has these traits, and what you should do to
'nail' this aspect of the interview.
1. Interest in the Position
This sounds like a no-brainer, but you would be surprised at how many candidates
haven't done proper research about the company or the position. It is especially important
for people coming from academic backgrounds to demonstrate why they are interested in
making this transition and why they are specifically interested in this opportunity.
When I ask a candidate "Why are you interested in joining my team?" I often get
responses like "I really want to move to San Francisco" or "I'm sick of my research."
Neither of these responses demonstrates specific interest in my team or my company.
How to Nail It: Do research about the position you are applying for. Understand what the
role entails, the company's goals and priorities, and the product(s) that you will be
working on. Have a convincing story for why you are making this career change or why
you want to leave your current position. Show enthusiasm for the opportunity: every
interviewer should think that their position is your number one choice and that you can't
wait to join their team. More importantly, only apply for roles that you genuinely find
interesting.
2. Excellent Problem Solving Skills
One of the most challenging aspects of the analyst/data scientist role is taking a vague
question posed by someone within the company, and figuring out how to best answer it
using our data sets. Testing (and demonstrating) this skill in an interview is very difficult.
At Yammer we try to test this skill by asking a combination of open-ended problems,
brain teasers, and scenarios similar to those we deal with on a regular basis. For many of
these questions there isn't a right or wrong answer; we are more interested in the way the
candidate constrains the problem, articulates her thought process, and how efficiently she
gets to a solution. For some data science positions you will be asked to do coding
problems. Familiarize yourself with some of the standard coding algorithms and
questions.
How to Nail It: These types of problems are asked by many tech companies and there are
plenty of examples of them on the web. Practice constraining the problem, coming up with a clear
game plan, articulating that plan, and then following through in a methodical way. Many
problems are hard to answer as posed, so trying simpler versions of the problem or
looking at edge cases can give you insight into how to find patterns. Sometimes not all
the relevant information is given by the interviewer; don't be afraid to ask clarifying
questions or turn the process into a discussion. If the interviewer tries to give you hints or
tips, take them. There is nothing more frustrating (as an interviewer) than trying to guide
a candidate back on track and having her ignore your help (it also doesn't bode well for the
interviewee's ability to work well with others).
3. Communication Skills
As I said in a previous post, communication is key. We are looking for someone who can
clearly articulate her thought process, and can be convincing that her approach is correct
even when being challenged by the interviewer. A standard way we will test this is by
posing an open-ended question, and when the interviewee gives a reasonable answer, we
give a reason why that isn't right; then she comes up with a different explanation, and
we negate it again. We keep going to see how she deals with having to switch directions,
and how she balances defending her answer with being flexible in taking the interviewer's
suggestions.
How to Nail It: Practice articulating your approach and methods for the above 'technical'
interview questions. Come up with a big-picture game plan for approaching the problem,
be clear about that plan, have a methodical approach, and then execute it, all the while
articulating your thought process as much as possible. If the interviewer tries to make you
change directions, it's ok to defend your approach, but you don't want to be too rigid; they
might be trying to help you not go down the wrong path. Try to make the interaction as
pleasant and warm as possible. Avoid getting defensive, frustrated, or just giving up. It is
a very hard balance, but practice (especially with another person who can give you
feedback) makes perfect.
4. Culture Fit
In tech companies you work collaboratively on projects in tight-knit teams. We are going
to be spending a lot of time with a candidate if we hire her; we want to enjoy that time
together. Therefore we are also trying to assess if the interviewee would be a good
coworker at an interpersonal level. Is she friendly? Does she work well on teams? Does
she have the right balance of being opinionated but not domineering? Is she an interesting
person? What are her passions and goals?
I can't tell you how many times I've asked a candidate "What do you like to do for fun?"
and they answer: "I like to read programming books." Is that really what you like to do
for fun? Or do you just think that is what you are supposed to say in a tech interview?
How to Nail It: Remember that your interviewer is a person too, and interact with them as
a person. Try to show some of your personality, passion, sense of humor, and uniqueness
in the interview. It's hard to be relaxed in these situations, but personality goes a long
way.
5. Ask Good Questions
At the end of the interview you will typically have a chance to ask questions. This is your
time to take control of the process and turn the tables on the interviewer. Sometimes I
learn the most about a candidate from how she uses this portion of the interview. An
interviewee I am on the fence about can really tip the decision one way or another by
asking intelligent, thought-provoking, and engaging questions at the end of an interview
(or boring, uninformed, or generic questions).
How to Nail It: Use this as an opportunity to communicate things you weren't able to
show in other parts of the interview. Demonstrate that you have researched the company,
that you understand their business goals and the way you could contribute. Ask
thoughtful questions about the role, demonstrate that you want something that is
challenging and discuss types of skills you want to learn or apply. Use this opportunity to
show the interviewer what skills you can bring to the role. If applicable, try to relate what
you are learning about the job/company to what you've done in the past. Prepare tons of
questions, write them down ahead of time and bring them to the interview. You shouldn't
run out of questions or have to repeat them over the course of the day.
The above is by no means an exhaustive list of everything a tech company is looking for, and of
course different companies have different approaches. When I interviewed for my current job (I
recently moved to the education start-up InstaEDU), most of the interview involved discussing
my previous projects, the problems that the company was facing, and how I could provide value
to them as a data scientist. It was a very different experience interviewing for my second job in
the tech industry than for my first. However, I do hope that the above demystifies the tech
interview process, gives you insight into how one company goes about hiring data people, and
helps you understand what we are looking for on the other side of the table.
The Business Analyst job description may vary from one company to another.
The job requirements of a person filling a business analyst position depend on the business
nature of a given company.
Therefore, each Business Analyst job interview can be completely different.
Note: a Business Analyst is sometimes referred to as a System Analyst, Engineers Analyst or
IT business analyst.
This article provides samples of job interview questions for a business analyst position.
In general, the business analyst job description is:
Effectively translate business needs to applications and operations.
Challenge cross-company units and provide the requirements for the R&D team.
Use cases, surveys/scenarios analysis, and workflow analysis, evaluating information,
focal point for internal and external customers, define users' needs and convert to
business cases.
The skills requirements for a business analyst are:
Leadership
Decision making
Conflict resolution
Presentation skills
Excellent verbal and written communication skills
Interpersonal communication skills
Analytical thinking and negotiation skills
Sample Business Analyst Interview Questions
1. Describe your responsibilities as a business analyst in your last job.
2. What are the BI (Business Intelligence) reporting tools you used for a given project?
Describe the project, the BI tool/s and the report extracted.
3. How do you select which BI tool to implement? How do you decide on the report
frequency, update frequency and user needs based on the objectives you want to
achieve for that report? Refer to BI tools such as: Cognos, Discoverer, Business
Objects and Crystal Reports, etc.
4. Can you list the desired skills needed to perform effectively as a Business Analyst?
5. What types of modeling requirements are used in the business applications of the
analyst?
6. If two companies are merging, explain which tasks you would implement, and how, to
ensure a successful union.
7. Have you been responsible for assigning tasks to testers? How were you required to
integrate the results found? How do you coordinate these responsibilities with the team
and your management?
8. Can you explain the term "push back" in relation to business users? What does this mean to
you?
9. When working on a project, at what point would the requirements of a Traceability Matrix
be implemented and for what purpose?
10. For process testing, can you explain the role of the Business Analyst?
11. When working with specific document requirements, can you explain or define the steps
to create Use Cases?
12. At what point of a project is the Use Case system complete? What are the next steps in
the project phase?
Some other technical questions that may be asked during the job interview include:
Technical terms and Technical questions
Some technical terms that may be used by the interviewer to verify your knowledge and
competencies would be:
UML modeling
GAP analysis
SDLC methodologies
Traceability Matrix
RUP (Rational Unified Process) implementation
UI Designs and UI Design Patterns
System Design Document (SDD)
Requirement Management Tool, Requirements Modeling
Use Case and Test Case
Risk Management
Business plan
Data mapping
Black box testing and White box testing
Push Back from Business Users
Waterfall Method and Prototyping Model and their hybrid
Interface / Integration mapping
Functional requirements: FSD (Functional Spec document) or FRS/MRD
End user support and user acceptance testing (UAT)
Validation of the requirements
Determine: ROI, cash flow, break even, fixed/variable costs and sale price
Answer.
This is your time to review your knowledge about the above business terms. Make sure you have a
good understanding of some of these buzz words of the industry of your interest.
Read again the job description which was detailed in the job opening and, if you find special
requirements, prepare yourself to answer related questions.
In case you have expertise in several specific areas, find the right time during the interview to
speak about the professional expertise that you gained in your career.
How to Face an Interview for Data Analyst
Facing a job interview for a data analyst position, sometimes referred to as a statistician position,
can be intimidating. Analysts often have to evaluate, sort and report on data that is incomplete or
erroneous, so an interviewer will likely ask how you handle those assignments. Don't get rattled
by tough questions. Stay positive and use personal examples from previous projects to support
your skills and experience.
Data-Gathering Experience
Data analysts are often responsible for gathering and compiling data from various reputable
sources before making evaluations, drawing conclusions and issuing reports. Expect the hiring
manager to ask questions like, "How do you go about collecting information to support your
analyses?" or, "What types of data have you researched and analyzed in the past?" The employer
might need data analyses to create new advertising strategies, prepare short- and long-term
finance budgets, or determine which company products are most profitable. Answer data-collecting
questions with specific examples of how you successfully used group samples,
conducted market research, reviewed financial reports or analyzed surveys to make fair and
consistent assessments.
Validity of Data
Data isn't always accurate, complete, understandable, consistent, predictable or beneficial to
meeting a company's goals, so expect interview questions about your methods for verifying and
validating information. You might discuss ways you take averages, find medians, double-check
questionable entries, find alternative research to support your findings or consult specialists.
Most importantly, you want to show the interviewer that you are an effective problem-solver,
troubleshooter and decision-maker so she has no reason to question your skills or capabilities.
Software
The hiring manager will likely ask about your computer skills and experience using analytic
software. Data analysts process collected data and reach conclusions with the help of computer
software, according to the U.S. Bureau of Labor Statistics. Discuss any experience you've had
with statistical software, such as Stata, RStudio, PSPP or GMDH Shell. If most of your previous
work has been with inter-office spreadsheets or Microsoft Excel files, assure the interviewer that
you are proficient with those types of data files and would be willing to learn any new software
programs necessary for the job.
Communication and Presentation Skills
Data analysts must communicate results, findings and future goals using visual aids, such as
charts, graphs and infographics. The interviewer will likely ask, "What are your communication
strengths?" or, "Explain how you organize and create presentations to report analytical
findings." Answer these questions with specific examples of presentations, reports and seminars
you've created or hosted. The interviewer wants assurance that you have the people skills and
interpersonal strengths to effectively relay your analyses and results.
References
U.S. Bureau of Labor Statistics: Statistician
Resources
Forbes: Analytics is Fast Becoming a Core Competency for Business Professionals
Psychology Today: Secrets to a Successful Job Interview
About the Author
Kristine Tucker has been writing articles on finance, politics, humanities and interior design
since 2001. Her articles have been featured in many online publications. Tucker's experience as
an English teacher has given her the opportunity to read many wonderful masterpieces. She
holds a degree in political science with a minor in international studies.

Retiring a Great Interview Problem
August 8th, 2011 · 105 Comments · General
Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find
candidates who can write code. The tech press sporadically publishes "best" interview
questions that make me cringe, though I love the IKEA question. Startups like Codility and
Interview Street see this challenge as an opportunity, offering hiring managers the prospect of
outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop
subjecting candidates to whiteboard coding exercises.
I don't have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to
assess software engineering candidates. At best, they test only one desirable attribute; at worst,
they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the
key insight. Coding questions are a much better tool for assessing people whose day job will be
coding, but conventional interviews (whether by phone or in person) are a suboptimal way
to test coding strength. Also, it's not clear whether a coding question should assess problem-solving,
pure translation of a solution into working code, or both.
In the face of all of these challenges, I came up with an interview problem that has served me
and others well for a few years at Endeca, Google, and LinkedIn. It is with a heavy heart that I
retire it, for reasons I'll discuss at the end of the post. But first let me describe the problem and
explain why it has been so effective.
The Problem
I call it the "word break" problem and describe it as follows:
Given an input string and a dictionary of words,
segment the input string into a space-separated
sequence of dictionary words if possible. For
example, if the input string is "applepie" and
dictionary contains a standard set of English words,
then we would return the string "apple pie" as output.
Note that I've deliberately left some aspects of this problem vague or underspecified, giving the
candidate an opportunity to flesh them out. Here are examples of questions a candidate might
ask, and how I would answer them:
Q: What if the input string is already a word in the
dictionary?
A: A single word is a special case of a space-separated
sequence of words.
Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.
Q: What if the input string cannot be segmented into a
sequence of words in the dictionary?
A: Then return null or something equivalent.
Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence
of exact words in the dictionary.
Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.
Q: I'm thinking of implementing the dictionary as a
trie, suffix tree, Fibonacci heap, ...
A: You don't need to implement the dictionary. Just
assume access to a reasonable implementation.
Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.
Q: How big is the dictionary?
A: Assume it's much bigger than the input string,
but that it fits in memory.
Seeing how a candidate negotiates these details is instructive: it offers you a sense of the
candidate's communication skills and attention to detail, not to mention the candidate's basic
understanding of data structures and algorithms.
A FizzBuzz Solution
Enough with the problem specification and on to the solution. Some candidates start with the
simplified version of the problem that only considers segmentations into two words. I consider
this a FizzBuzz problem, and I expect any competent software engineer to produce the
equivalent of the following in their programming language of choice. I'll use Java in my example
solutions.
String SegmentString(String input, Set<String> dict) {
  int len = input.length();
  // Try every split point; both halves must be dictionary words.
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      if (dict.contains(suffix)) {
        return prefix + " " + suffix;
      }
    }
  }
  return null;
}
I have interviewed candidates who could not produce the above, including candidates who had
passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great
way to keep interviewers from wasting their time interviewing programmers who can't program.
A General Solution
Of course, the more interesting problem is the general case, where the input string may be
segmented into any number of dictionary words. There are a number of ways to approach this
problem, but the most straightforward is recursive backtracking. Here is a typical solution that
builds on the previous one:
String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      // Recurse on the remainder; a null result means backtrack.
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        return prefix + " " + segSuffix;
      }
    }
  }
  return null;
}
Many candidates for software engineering positions cannot come up with the above or an
equivalent (e.g., a solution that uses an explicit stack) in half an hour. I'm sure that many of them
are competent and productive. But I would not hire them to work on information retrieval or
machine learning problems, especially at a company that delivers search functionality on a
massive scale.
Analyzing the Running Time
But wait, there's more! When a candidate does arrive at a solution like the above, I ask for a big
O analysis of its worst-case running time as a function of n, the length of the input string. I've
heard candidates respond with everything from O(n) to O(n!).
I typically offer the following hint:
Consider a pathological dictionary containing the words
"a", "aa", "aaa", ..., i.e., words composed solely of
the letter 'a'. What happens when the input string is a
sequence of n-1 'a's followed by a 'b'?
Hopefully the candidate can figure out that the recursive backtracking solution will explore every
possible segmentation of this input string, which reduces the analysis to determining the number of
possible segmentations. I leave it as an exercise to the reader (with this hint) to determine that
this number is O(2^n).
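For readers who would rather check the count empirically, here is a minimal sketch (not part of the original post) that wraps the backtracking routine in a call counter and feeds it the pathological input from the hint. The class name BacktrackingCallCount and the helper countCalls are hypothetical names introduced only for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

class BacktrackingCallCount {
    // Number of times segment() has been entered since the last reset.
    static long calls = 0;

    // Same recursive backtracking as above, plus the counter.
    static String segment(String input, Set<String> dict) {
        calls++;
        if (dict.contains(input)) return input;
        int len = input.length();
        for (int i = 1; i < len; i++) {
            String prefix = input.substring(0, i);
            if (dict.contains(prefix)) {
                String segSuffix = segment(input.substring(i), dict);
                if (segSuffix != null) return prefix + " " + segSuffix;
            }
        }
        return null;
    }

    // Builds the dictionary {"a", "aa", ...} and the input of n-1 'a's
    // followed by a 'b', then returns how many calls the search made.
    static long countCalls(int n) {
        Set<String> dict = new HashSet<>();
        StringBuilder as = new StringBuilder();
        for (int i = 1; i < n; i++) {
            as.append('a');
            dict.add(as.toString());
        }
        String input = as.toString() + "b";
        calls = 0;
        segment(input, dict);
        return calls;
    }

    public static void main(String[] args) {
        for (int n = 5; n <= 20; n += 5) {
            System.out.println("n = " + n + ": " + countCalls(n) + " calls");
        }
    }
}
```

On this input every prefix of 'a's is a dictionary word and every recursive call fails, so a call on a string of length n makes one call for each shorter suffix. The call count therefore doubles each time n grows by one.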
An Efficient Solution
If a candidate gets this far, I ask if it is possible to do better than O(2^n). Most candidates realize
this is a loaded question, and strong ones recognize the opportunity to apply dynamic
programming or memoization. Here is a solution using memoization:
// Caches the result (possibly null) for each suffix already tried.
Map<String, String> memoized = new HashMap<String, String>();
String SegmentString(String input, Set<String> dict) {
  if (dict.contains(input)) return input;
  if (memoized.containsKey(input)) {
    return memoized.get(input);
  }
  int len = input.length();
  for (int i = 1; i < len; i++) {
    String prefix = input.substring(0, i);
    if (dict.contains(prefix)) {
      String suffix = input.substring(i, len);
      String segSuffix = SegmentString(suffix, dict);
      if (segSuffix != null) {
        memoized.put(input, prefix + " " + segSuffix);
        return prefix + " " + segSuffix;
      }
    }
  }
  memoized.put(input, null);
  return null;
}
Again the candidate should be able to perform the worst-case analysis. The key insight is that
SegmentString is only called on suffixes of the original input string, and that there are only O(n)
suffixes. I leave as an exercise to the reader to determine that the worst-case running time of the
memoized solution above is O(n^2), assuming that the substring operation only requires constant
time (a discussion which itself makes for an interesting tangent).
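The same idea can also be written bottom-up, which makes the quadratic bound easier to see. The following is a minimal sketch rather than the solution from the post: it works on prefixes instead of suffixes, and the class name WordBreakDp and the array name start are hypothetical. It assumes, as the post does, that the dictionary supports only exact lookup, and it returns null for an empty input.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

class WordBreakDp {
    // Returns one valid segmentation, or null if there is none.
    static String segment(String input, Set<String> dict) {
        int n = input.length();
        if (n == 0) return null;
        // start[j] is the start index of the last word in some segmentation
        // of input[0, j), or -1 if that prefix cannot be segmented.
        int[] start = new int[n + 1];
        Arrays.fill(start, -1);
        start[0] = 0; // sentinel: the empty prefix needs no words
        for (int j = 1; j <= n; j++) {
            for (int i = 0; i < j; i++) {
                if (start[i] >= 0 && dict.contains(input.substring(i, j))) {
                    start[j] = i;
                    break;
                }
            }
        }
        if (start[n] < 0) return null;
        // Walk the back-pointers from the end to recover the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int j = n; j > 0; j = start[j]) {
            words.addFirst(input.substring(start[j], j));
        }
        return String.join(" ", words);
    }
}
```

The two nested loops make at most n(n+1)/2 dictionary lookups. Whether the total running time is also quadratic depends on the cost of substring and of hashing each candidate word, which is the tangent mentioned above.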
Why I Love This Problem
There are lots of reasons I love this problem. I'll enumerate a few:
- It is a real problem that came up in the course of developing production software. I
  developed Endeca's original implementation for rewriting search queries, and this
  problem came up in the context of spelling correction and thesaurus expansion.
- It does not require any specialized knowledge -- just strings, sets, maps, recursion, and
  a simple application of dynamic programming / memoization. Basics that are covered in
  a first- or second-year undergraduate course in computer science.
- The code is non-trivial but compact enough to use under the tight conditions of a 45-
  minute interview, whether in person or over the phone using a tool like Collabedit.
- The problem is challenging, but it isn't a gotcha problem. Rather, it requires a methodical
  analysis of the problem and the application of basic computer science tools.
- The candidate's performance on the problem isn't binary. The worst candidates don't
  even manage to implement the fizzbuzz solution in 45 minutes. The best implement a
  memoized solution in 10 minutes, allowing you to make the problem even more
  interesting, e.g., asking how they would handle a dictionary too large to fit in main
  memory. Most candidates perform somewhere in the middle.
Happy Retirement
Unfortunately, all good things come to an end. I recently discovered that a candidate posted this
problem on Glassdoor. The solution posted there hardly goes into the level of detail I've provided
in this post, but I decided that a problem this good deserved to retire in style.
It's hard to come up with good interview problems, and it's also hard to keep secrets. The secret
may be to keep fewer secrets. An ideal interview question is one for which advance knowledge
has limited value. I'm working with my colleagues on such an approach. Naturally, I'll share
more if and when we deploy it.
In the mean time, I hope that everyone who experienced the word break problem appreciated it
as a worthy test of their skills. No problem is perfect, nor can performance on a single interview
question ever be a perfect predictor of how well a candidate will perform as an engineer. Still,
this one was pretty good, and I know that a bunch of us will miss it.
105 responses so far:
1 rp1 // Aug 8, 2011 at 2:59 am
Why is this not just n lookups? Start with one character. If that fails, look up the first two,
etc. When a lookup succeeds, insert a space and set the string to start at the next
unexamined character.
Thus "aaaaa" becomes "a a a a a". Works great with a trie.
2 facepalm // Aug 8, 2011 at 3:26 am
rp1, you'd fail the interview
the author already explained why in the article.
3 binarysolo // Aug 8, 2011 at 3:40 am
rp1 - You need to consider cases where a word is composed of a valid word base with a
suffix that is not a standalone word.
Say, valid word + an invalid word, such as: "shorter" -> short + er = (valid) + (invalid).
Recursion just makes the most sense as the author wrote; reducing the problem by
accounting for the string from the back. As he points out, the bad case is when you have tons
of viable character combinations that remain workable with prior chars in combinations,
and one persistent incompatibility that forces the exploration of the entire space of
solutions.
Er, I'm not articulating it well; just read his answer which explains it a lot better.
4 Grant Husbands // Aug 8, 2011 at 3:43 am
@rp1, you're failing to think about backtracking; imagine the input is "aaaaab".
5 Seems easy // Aug 8, 2011 at 3:44 am
rp1's solution seems perfectly fine to me. The author did not explain why I couldn't use
it.
If you think differently, explain why.
6 Rick Williams // Aug 8, 2011 at 3:46 am
You make "interview candidates" sign NDAs in order to interview. That's amazing.
Mind sharing the text of the NDA? I'd like to see what it covers. Thanks in advance.
7 binarysolo // Aug 8, 2011 at 3:46 am
Incidentally I'm kinda surprised this is considered a worthy-enough interview prob. I'm
not a programmer by trade, though I enjoyed the CS106x+107 sequence at a li'l school
near Daniel's location and would think that a freshman with some amount of CS
thinking/experience would trivially solve this.
8 zui // Aug 8, 2011 at 3:54 am
the main issue with such interviews is that 50% of the candidates have trouble answering
those under stress no matter how simple.
i know it's a hard issue to solve.
interviewers don't care about finding the proper candidate. they care that a candidate
passed tests so they can't be blamed if the candidate does not perform well enough at work.
interviewers care about a level of assurance, if you prefer. hey it's not as bad as it sounds, the
candidate indeed is going to be likely to fit the position.
will he really fit, will he like it and is he actually good at solving *new* problems or just
solving classic interview quizzes? Who cares, that's a risk the interviewer is willing to
take.
A true interviewer, that is, someone whose main care is that the candidate is going to be
helping out the company, is not going to give all these stupid questions the same way. A
true interviewer, while he has a lot of questions and steps prepared, will not serve them to
the candidate. he will learn to know the candidate in a short amount of time and ask the
right questions, ones which may not even have been prepared before.
a true interviewer sees the challenge in every candidate he interviews, and not just the
opposite (where the candidate is challenged by the interviewer's questions)
a true interviewer disregards the crap (hello best questions list) and focuses on what matters.
finally, new talents in the it world are HARD to come by so there's a lot of competition
to recruit them. well, let me tell you this straight, there are many talents which are just
discarded by the review/interview process that are here for you to pick from.
that's how we find gems that everyone else passed upon. then people wonder why my
team always outperforms theirs.
9 tzs // Aug 8, 2011 at 4:32 am
Interesting problem, which I don't think I've seen before. Here's the O(n^2) solution I'd
have come up with if confronted with this thing.
Build a directed graph consisting of vertices labeled 0, 1, 2, ..., n, where n is the length of
the string, where there is an edge from k to j if and only if there is a dictionary word of
length j-k starting at position k in the string.
This can be done in O(n^2).
Solutions to the problem then correspond to paths through the graph from 0 to n. Use
something like Dijkstra's algorithm to find a minimal path in O(n^2), which corresponds
to a solution that uses the smallest number of dictionary words to exactly cover the string.
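A minimal sketch of the graph idea above, under two assumptions that are not in the comment: since every edge has the same cost, a plain breadth-first search is swapped in for Dijkstra's algorithm, and edges are discovered lazily with dictionary lookups instead of building the graph first. The class name WordBreakGraph and the method name fewestWords are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

class WordBreakGraph {
    // Vertices are positions 0..n; there is an edge k -> j whenever
    // input[k, j) is a dictionary word. BFS from 0 reaches n along a
    // path with the fewest edges, i.e. the fewest words.
    static String fewestWords(String input, Set<String> dict) {
        int n = input.length();
        if (n == 0) return null;
        int[] parent = new int[n + 1];
        Arrays.fill(parent, -1);
        boolean[] seen = new boolean[n + 1];
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(0);
        seen[0] = true;
        while (!queue.isEmpty() && !seen[n]) {
            int k = queue.poll();
            for (int j = k + 1; j <= n; j++) {
                if (!seen[j] && dict.contains(input.substring(k, j))) {
                    seen[j] = true;
                    parent[j] = k;
                    queue.add(j);
                }
            }
        }
        if (!seen[n]) return null;
        // Follow parent pointers back from n to 0 to recover the words.
        Deque<String> words = new ArrayDeque<>();
        for (int j = n; j > 0; j = parent[j]) {
            words.addFirst(input.substring(parent[j], j));
        }
        return String.join(" ", words);
    }
}
```

Each position is dequeued at most once and tries at most n outgoing edges, so the search makes O(n^2) dictionary lookups, matching the bound in the comment.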
10 Benjash // Aug 8, 2011 at 4:53 am
I'm not a fully blown coder by any means.
But I was thinking a solution would be to break the string into consonant clusters
(syllables maybe?) and group them into weighted groups. Or some algorithm measuring
lexical units within the string.
some database of word clusters like:
String = applepieisgreat
3wordcluster = "app", "eat", "ple", "ond" etc ...
2wordcluster = "ap", "le", "pi", "is" etc
Then build up the words
app + le / ap + ple / ple + pie / ple + pi / pie + is / etc
Then working back from the biggest clusters run look ups against the dictionary to see if it's
a word.
Then it would be a simple jigsaw puzzle of what words fit in the string.
11 Bob // Aug 8, 2011 at 4:54 am
@Seems easy
Let's assume that this is the dictionary:
this
text
is
short
shorter
Now let's apply rp1's solution to the problem. It will walk along the string until it finds a
match and apply that match to the output. It produces this:
this text is short er
"short" is obviously a match for the first 5 characters of "shorter". Since the solution
analyses those 5 characters first and finds a match, it happily accepts the word "short". It
now has the string "er" to analyse. "er" isn't in the dictionary, so we'll assume it just gets
tagged on the end instead of discarded.
The correct behaviour here would be to start backtracking. If we have "er" as an
unmatched string, it is possible that it connects with something we've matched
previously. So, let's try "ter". Nope, no match. Now "rter". Nope, no match. However,
when we eventually try "shorter", we've got a match. With backtracking, we've found a
solution that works for everything in the string:
"this text is shorter"
Clearly this isn't O(n) as we have to re-examine components of the string multiple times.
In fact, rp1's algorithm is O(n^2), not O(n) as he suggests.
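To make the failure concrete, here is a minimal sketch of the greedy scan as I read rp1's description: take the shortest dictionary word at the current position and never go back. It is an interpretation, not rp1's code. The class name GreedySegmenter is hypothetical, the input "thistextisshorter" is assumed from the output shown above, and unlike the walkthrough this version returns null when it gets stuck instead of tagging the leftover on the end.

```java
import java.util.Set;

class GreedySegmenter {
    // Repeatedly consumes the shortest dictionary word that starts at the
    // current position. There is no backtracking, so an early choice that
    // strands the rest of the string makes the whole call return null.
    static String greedy(String input, Set<String> dict) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < input.length()) {
            int end = -1;
            for (int j = start + 1; j <= input.length(); j++) {
                if (dict.contains(input.substring(start, j))) {
                    end = j;
                    break;
                }
            }
            if (end < 0) return null; // stuck on an unmatched remainder
            if (out.length() > 0) out.append(' ');
            out.append(input, start, end);
            start = end;
        }
        return out.toString();
    }
}
```

With the dictionary above, greedy("thistextisshorter", dict) accepts "short", is left with "er", and returns null even though "this text is shorter" is a valid segmentation.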
12 rc // Aug 8, 2011 at 5:02 am
@Seems easy
Suppose the words are a, aa, aaa, ab
Suppose the input is aaab.
The proper segmentation is aa+ab, but if you do it rp1's way you'd try a+? and fail to find
a segmentation. If you also started with the longer string aaa you'd still fail.
In the worst case scenario there's a tiny chance you'd get the right answer greedily
13 Tom White // Aug 8, 2011 at 5:17 am
Coder interviews are "rocket science". How to detect the best programmers is a hard
problem.
But, this coding test is for a programmer who is going to be hired to write a string-processing
library (or similar). That might not be your goal.
There is also a pretty high degree of quirks in this problem. Most programmers come
from a background of processing whole lines, whole sentences, whole words, etc. This
seems to be asking the programmer to make one dictionary lookup *per letter*. Perhaps
too much of a brain shift for a high-pressure interview situation.
Then, you think you speed it up by creating a O(n) cache to hold words you have seen
when the O(log n) dictionary has all the words in memory (and the virtual memory pages
that get loaded wouldn't get paged out to disk during the one millisecond this runs). Your
cache can only really help if it is the case that the program, the cache, and the input string
are quite small and fit in the fastest level of processor cache.
This article and exercise confirms my experience that most interviewers are not good
enough programmers themselves and unbiased enough to recognize the great
programmers when they pass by. But, that's OK, the HR department has usually already
rejected them due to some "problem with their resume".
You are correct that a coding problem is absolutely necessary. It needs to be maximally
understandable so as to overcome the candidate's anxiety. It needs to start easy and work
up via interactive discussion to show the programmer's actual level of competence. The
automated websites for this look interesting but can only do a first level of checking for
basic programming ability.
We all need to step outside ourselves and understand what makes someone a good
programmer: smart, quick learner, and good to work with.
14 rc // Aug 8, 2011 at 5:53 am
@Tom White
The cache has nothing to do with speeding up lookups or the size of the dictionary. If you
have a string of length L then the standard backtracking solution will in the worst-case do
2^L dictionary lookups. By keeping track of which substrings are possible to segment we
can reduce this to ~L lookups.
15 Ian Wright // Aug 8, 2011 at 5:57 am
As someone who hasn't coded since high school, I'm curious as to what legal actions
you take against sites like Glassdoor that publish information barred in NDAs. I assume
you would need to know who actually provided the information to the site to bring any
sort of penalties against them. At the end of the day you're just closing the stable door
after the horse has already bolted.
16 rocketsurgeon // Aug 8, 2011 at 7:34 am
There's a bug in the general solution, it doesn't work as written. Maybe your interview
questions should include a section on testing?
17 Retric // Aug 8, 2011 at 7:43 am
Reading this I can't help but think you're falling into the java trap of trying to make a
ridiculously generic solution, which I would consider a danger sign but plenty of people
love to see.
Anyway, assuming a real world input and real world dictionary you can try plenty of
things that break down for a dictionary that includes four hundred A's but are actually
valid solutions. Also, if you want a fast real world solution then sticking with a pure
lookup dictionary would slow things down. EX: Being able to toss 5 letters into a lookup
table that says the longest and shortest words that start with those letters would save a lot
of time. Basically, 'xxyzy' = null, null saves you from looking up x to the maximum
word length in your dictionary. Secondly sanitizing inputs is going to be both safer and
faster. Granted anything you are coding in 45 minutes is going to be a long way from
production ready code.
PS: Even with backtracking you can code an O(n) solution for very large input sets. Just
keep track of the K best output sequences that are aka N, N-1, N-2, ... N-K. (K being the
length of the longest word in your dictionary.)
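One possible reading of the PS, sketched under assumptions: the longest dictionary word has length K, so a prefix ending at position j only needs to look back at the K positions before it. That gives about n*K dictionary lookups, which is linear in n for a fixed K. The class name BoundedWordBreak and the parameter maxWordLength are hypothetical, and passing K in as an argument is an assumption, since the dictionary in the post exposes only exact lookup.

```java
import java.util.Set;

class BoundedWordBreak {
    // ok[j] is true when input[0, j) can be segmented into dictionary words.
    // Only the maxWordLength positions before j are examined, because no
    // dictionary word is assumed to be longer than that.
    static boolean canSegment(String input, Set<String> dict, int maxWordLength) {
        int n = input.length();
        boolean[] ok = new boolean[n + 1];
        ok[0] = true;
        for (int j = 1; j <= n; j++) {
            for (int i = Math.max(0, j - maxWordLength); i < j && !ok[j]; i++) {
                ok[j] = ok[i] && dict.contains(input.substring(i, j));
            }
        }
        return n > 0 && ok[n];
    }
}
```

If maxWordLength is smaller than the true longest word, valid segmentations can be missed, so the bound has to come from the dictionary itself.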
18 jeremy // Aug 8, 2011 at 8:03 am
"Many candidates for software engineering positions cannot come up with the above or
an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I'm sure that
many of them are competent and productive. But I would not hire them to work on
information retrieval or machine learning problems, especially at a company that
delivers search functionality on a massive scale."
I have a comment about the relationship of this question to "massive scale" information
retrieval and machine learning problems. Naturally, no one wants a O(2^n) solution. But
even when the solution is O(n^2), if the size of your input n is 10 billion (aka massive
scale, web scale, big data), even an elegant, memoized algorithm isn't tractable anyway,
correct?
So while I indeed like this question, because it has a "progressive reveal" in levels of
thinking, does someone working on web scale (massive) data really ever need to
implement the highest level of thinking? Or is it that you just want a programmer who is
aware of the issue, even if she or he never has to use it?
19 The "word break" problem = en>47?blo' // Aug 8, 2011 at 8:09 am
[...] Retiring a Great Interview Problem (via Retiring a Great Interview Problem), this is
the best I could get in less than 30 [...]
20 betacoder // Aug 8, 2011 at 8:15 am
Trivial but noted: The simplified solution also has an off-by-one error.
21 Daniel Tunkelang // Aug 8, 2011 at 8:21 am
To those who noted the off-by-one error: thanks, it's fixed now.
22 Daniel Tunkelang // Aug 8, 2011 at 8:23 am
@binarysolo
I also used to think this problem was too easy. Experience proved otherwise. In fact, I
found at Google that performance on this problem correlated strongly to overall
performance in the interview process, albeit with a limited sample size.
23 rc // Aug 8, 2011 at 8:25 am
@Retric congratulations, you've reinvented dynamic programming with your O(n)
solution. That'd be a perfect solution.
Also, if your map is stored in sorted order (like in C++) a binary search will find you the
longest word that fits the remainder of the string. (To also efficiently find all of the words
that fit will require something extra.)
24 Daniel Tunkelang // Aug 8, 2011 at 8:27 am
@Rick Williams
It's pretty common for companies to make candidates sign NDAs. I've learned about
confidential information from several companies I interviewed with, typically as part of
the sell of exciting things I could work on.
But forget the legalities, which are unenforceable in practice. Instead think about how
you'd like yourself and other engineers to be assessed. Part of the point of interview
problems is to make hiring about more than a person's resume. It's not clear who gains
when interview problems are retired because they've been disclosed. Still, I recognize
practically that it is impossible to maintain secrets once enough people know them.
Hence my comment at the end of the post.
25 jdavid.net // Aug 8, 2011 at 8:29 am
Questions like this are grand if you are looking for a 'Search Engineer' but if you are
looking for a front end developer or someone that understands how to design a REST api,
or something else useful, this is a bogus question.
All too often I have seen interviewers use a one question fits all approach to interviewing.
I have been out of college for a decade and have significant work experience in
producing web pages and frameworks that respond to a browser or app in a specific way.
Questions like these filter me out because I have not been doing college papers or playing
around in a programming languages class, but rather have been building real tools.
Questions like how would you design your own web framework might be a better
question. Why is jQuery popular? How does it differ from other js frameworks?
Can you write a delegated click handler? How does it differ from an assigned click
handler?
Please ask a question relative to an applicant's role/experience.
There are plenty of questions out there that are easy for college grads to answer, but hard
for experienced programmers and vice versa.
26 The Word Break Problem @ Plus 1 Lab // Aug 8, 2011 at 8:42 am
[...] an interesting post on the Noisy Channel blog describing what author Daniel
Tunkelang calls the word break problem. I didn't find this post interesting because the
problem is a good interview question, I [...]
27 Daniel Tunkelang // Aug 8, 2011 at 9:00 am
@zui
No interview process is perfect. It's tempting to only hire people you've worked with;
indeed, I'd be inclined to do that as a founder. But that doesn't scale, and it also biases the
process towards people who are already well connected. Another approach is to focus
almost entirely on the resume, but that has its own problems, from resume inflation to
again favoring those who started with advantages.
I'm by no means perfect but I think I give candidates as fair a shot as I know how. I hired
a candidate with no college degree and a high-school equivalency; she performed
phenomenally as a software engineer and is now a managing director at Goldman. I try to
put nervous candidates at ease. But there's no getting around that an interview is an
assessment process, and not everyone does well when they are being assessed.
I'm curious to hear the details of how you or your company interview candidates.
Everyone benefits if we can make this process better.
28 Geek Reading August 8, 2011 = Regular Geek // Aug 8, 2011 at 9:02 am
[...] Retiring a Great Interview Problem [...]
29 Daniel Tunkelang // Aug 8, 2011 at 9:03 am
@Retric
Is it really so generic? I've simplified the problem a little, but it's pretty close to a real
problem I had to solve for implementing search features that would be deployed in a very
broad range of domains, including for part-number search. I had the opportunity to see
domain-specific heuristics break down.
30 Daniel Tunkelang // Aug 8, 2011 at 9:08 am
@jeremy
To clarify, the n here is the size of the input string, rather than the number of users or
documents. But algorithms like these run in a high-traffic environment, with requests
being processed concurrently. A feature whose cost blows up exponentially with a bad
input can take a site down. Granted, there are other ways to guard against such failures
(e.g., time-outs), but it's good to use algorithms that don't have such blow up. And even
better to understand the behavior of algorithms before rather than after those algorithms
make it to production.
31 Grant Husbands // Aug 8, 2011 at 9:09 am
@rcf, @Retric: You don't need to invent new structures for the dictionary; lots of
variants of trie/NFA/DFA will do just fine. However, the question explicitly disallows
changing the dictionary and its API. I'm sure Daniel is aware that O(N) solutions exist if
the dictionary is improved, but interview questions are for exploring the abilities of a
candidate rather than finding the best possible solution.
32 jeremy // Aug 8, 2011 at 9:33 am
Right, I understand that this example problem is a small input string. Perhaps what I was
trying to ask is how indicative of a real world massive scale machine learning or
information retrieval problem this example problem really is.
Agreed about being able to recognize that an algorithm might have quadratic or
exponential blowup. But again, I'm asking about how realistically often one has to
implement solutions that need dynamic programming when doing information retrieval
machine learning on a web scale. I am assuming that you're not just talking about parsing
the input. I assume when you say massive scale information retrieval and machine
learning, you're working on algorithms to extract patterns from the data, to find
relationships, to (set based) extraction of related entities. For example.
And in that case, the size of the input isn't 100 or 1000. But millions. Billions. So again,
how often is memoization necessary, in practice?
(And recognizing that something is going to be NP complete, or has a quadratic-time DP
solution, or a quadratic time approximation, is different than being able to code that
solution, during a half hour interview. So might it perhaps be better to test one's ability to
recognize say 7 or 8 different problems as to their potential complexity, rather than
having a candidate write code for a single example?)
33 Daniel Tunkelang // Aug 8, 2011 at 9:50 am
@jeremy
Different problems test different skills. I've never used this question as the only
determinant in assessing a candidate, but I've found that it provides more bits of
information than most.
34 Retric // Aug 8, 2011 at 10:01 am
@Grant Husbands I was not suggesting you needed to recreate a dictionary. However, the
ideal program for a human language dictionary where the longest word is 32ish digits vs.
some input text is very different than something you would use if you had 200 different
~100,000 digit DNA sequences in your dictionary. Consider: with a long enough input
string, iterating over the full dictionary and creating a quick index could easily reduce
runtimes from years to seconds. And with a short enough input string any optimizations
are basically pointless. EX: 'cat'.
35 Earl // Aug 8, 2011 at 10:01 am
Daniel,
I think you have a boundary problem in your example - the substring function in java
takes (to my mind) weird arguments. You have to call substring until the endIndex / right
argument is equal to the string length.
eg:
String test = "0123456";
puts("%s length %d", test, test.length());
for (int i=1; i %s", 1, i, test.substring(0, i));
produces
prefix [1, 1] -> 0
prefix [1, 2] -> 01
prefix [1, 3] -> 012
prefix [1, 4] -> 0123
prefix [1, 5] -> 01234
prefix [1, 6] -> 012345
prefix [1, 7] -> 0123456
36 Earl // Aug 8, 2011 at 10:05 am
Blech, html
http://codepad.org/Ald1l?v
37 Retric // Aug 8, 2011 at 10:11 am
PS: By index I mean find size of the longest word, and or do other preprocessing.
38 Daniel Tunkelang // Aug 8, 2011 at 10:15 am
@Earl
Are you sure? Change the for loop to
for (int i=1; i <= test.length(); i++) {
puts("prefix [%d, %d] -> %s", 0, i, test.substring(0, i));
}
produces
prefix [0, 1] -> 0
prefix [0, 2] -> 01
prefix [0, 3] -> 012
prefix [0, 4] -> 0123
prefix [0, 5] -> 01234
prefix [0, 6] -> 012345
prefix [0, 7] -> 0123456
39 Grant Husbands // Aug 8, 2011 at 10:19 am
@Retric: Your suggested 5-letter lookup was essentially a change to the dictionary API;
it is to that that I was referring. For any lack of clarity on my part, I apologise. Anyway,
there are plenty of ways of preprocessing the dictionary, and I mentioned common ones,
but none of them fit the problem description, which explicitly disallows such
preprocessing, making this debate irrelevant.
40 John H // Aug 8, 2011 at 10:33 am
Why are you retiring it? Because it's out on the web?
I don't think having knowledge of the interview question ahead of time necessarily
precludes its usefulness. Being able to deliver the solution quickly and efficiently is still a
valuable assessment of skill. Also, assuming the candidate has to answer 4-6 diverse
problems over the course of the day, it's still a pretty good screen to have them
'reproduce memorized answers' (assuming they knew them ahead of time). This is
further mitigated by having a couple of problems you can switch between; now the
candidate would need to have 25-50 questions memorized.
Honestly, having the solution to 50 interview questions to CS problems memorized is
pretty good. On top of that, very few candidates do the research needed to find the
problems ahead of time.
PS, Thanks for the outline, I'm going to use this question for my candidates in the future.
41 Charles Scalfani // Aug 8, 2011 at 2:40 pm
I have always hated interview questions, which is why I don't use them. I'd rather solve a
problem WITH the candidate or if I'm being interviewed then with the interviewer. I
want real world problems that have NOT been pre-solved. I want to have a design
discussion with them and talk through the design and implementation issues. Coding is
trivial after that.
I would rather have the candidate bring in an example of code that is non-proprietary and
something they're particularly proud of. Then I review the code with them like I would if
they worked for me.
I'd much rather have an interview as a pseudo-working session. Then I can see how it
would be to actually work with that person. There's no better way to see how someone
thinks than making them work WITH you; not jump through your artificial hoops.
42 binarysolo // Aug 8, 2011 at 3:24 pm
@Daniel
Well, I still dunno if this is a great interview problem per se, but it sure is a great
conversation starter given the very *accessible* nature that even us laymen who don't
have much programming can access the question and think of reasonable
implementations.
I'm not familiar enough with the programming world, but I'd imagine that the value of
clever thinking and efficient structural thinking >> some technical, nuanced aspect of
some computer language which is what this problem fields out.
43 jeremy // Aug 8, 2011 at 5:16 pm
"Different problems test different skills. I've never used this question as the only
determinant in assessing a candidate, but I've found that it provides more bits of
information than most."
Again, yes.. but you specifically said that this specific question didn't just test a
candidate's ability to think generally, computer-scientifically. But a candidate's ability to
think specifically in terms of massive scale, machine learning and information retrieval.
And it's that specific connection - between DP and massive scale in the context of ML
and IR - that I'm struggling to understand, rather than the broader question of whether a
candidate is a good computer scientist.
It's just that.. oh, nevermind. I'll take it offline.
44 Rick Williams // Aug 8, 2011 at 7:06 pm
Thank you for the response about NDAs. I agree completely that it is unprofessional to
publish secret interview questions after an interview.
But that's different from the NDA issue. I've been on dozens of interviews and have
never been asked to sign an NDA for interviewing, nor have I required one of anyone I
am interviewing. NDAs are something clients sign before being informed of proprietary
business trade secret information. This is done only when absolutely necessary since it is
much better simply not to reveal trade secrets to outside parties in the first place. It's
quite bizarre to hear of them being required for interviews and I don't really believe that
this is a common practice. If it is common in some segment of the field, then it is an ill
advised practice.
45 Golam Kawsar // Aug 8, 2011 at 7:35 pm
Great interview question, but in my experience as a programmer, I have not seen many
programmers who can cook up dynamic programming solutions to problems like these,
let alone during the stress of an interview. Even recognizing this as a dynamic
programming question will be hard for many.
Thanks for such a nice post Daniel. Enjoyed reading it a lot!
46 Debnath // Aug 8, 2011 at 8:13 pm
The general solution will terminate if it finds the word in the dictionary, should it still not
continue? I mean, there could be a word like "endgame", for the lack of a better example
(or backtracking), which might be a part of the dictionary, but are also individual
words... I can understand they may not be popular in IR though since they are usually an
abbreviation of the sub-words, and the sub words don't really make sense independently,
but given the problem definition...
47 A Great interview Problem | Phanikumar // Aug 8, 2011 at 10:20 pm
[...] problem with the code and runtime of the algorithm mentioned in the post.
http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/ This entry
was posted in Code by phani. Bookmark the [...]
48 Daniel Tunkelang // Aug 8, 2011 at 11:37 pm
@John H
I hope this question serves you well. It's time for me to move on from it, and I thought
the best way to retire this problem was to do so in a way that others would learn from it.
49 Daniel Tunkelang // Aug 8, 2011 at 11:42 pm
@Charles Scalfani
We do have collaborative problem solving and product design discussions as part of the
interview process. But I also want to see how a candidate writes code. Reviewing code
they've written before is an option, but it's tricky - especially for a candidate that has
not written non-proprietary code in a long time.
50 Daniel Tunkelang // Aug 8, 2011 at 11:44 pm
@Debnath
The termination condition is part of the problem statement. The problem could be
changed to require outputting all valid segmentations, but there could be an exponential
number of them. We could also require the "best" segmentation, which is an interesting
design question as to what constitutes "best".
51 On "Retiring a Great Interview Problem" « Will.Whim // Aug 9, 2011 at 1:49 am
[...] Tunkelang wrote an interesting blog post, "Retiring a Great Interview Problem" on
an interview problem that he has, in the past, posed to interviewees, but which he has [...]
52 Will Fitzgerald // Aug 9, 2011 at 1:51 am
I wrote a response to this, "On 'Retiring a Great Interview Problem'" at
http://t.co/uXQf4oh.
53 Daniel Tunkelang // Aug 9, 2011 at 6:41 am
Will, I read your response. A short one here: assessment of candidates on a problem like
this isn't binary. Rather, the point is to get as holistic a picture as is possible in the time
constraints of how well a candidate solves an algorithmic problem and implements it. It's
not a perfect tool, but no tool is. I'm always on the lookout for better ones - don't forget
that I'm retiring this problem.
And you allude to a candidate's nervousness under interview conditions. Part of the
interviewer's job is to put the candidate at ease. That isn't unique to questions that
involve coding, and it doesn't always work out. No interviewer or hiring process is
perfect.
Finally, while I share your concerns about using whiteboard coding in interviews (see my
link to Diego's post), I disagree that it discriminates against older candidates. That very
hypothesis strikes me as ageist, at least in the absence of supporting data.
54 Patrick Tilland // Aug 9, 2011 at 11:06 am
As rocketsurgeon mentioned, there is one bug and also one typo in the code. There is a
parenthesis missing in this line:
if (dict.contains(input) return input;
And the loop condition never reaches the next to last character in the input string:
for (int i = 1; i < len - 1; i++)
55 Daniel Tunkelang // Aug 9, 2011 at 11:14 am
@rocketsurgeon
@Patrick Tilland
Bugs fixed. Thanks guys!
56 Sonic Charmer // Aug 10, 2011 at 3:48 am
I won't write this out in Java/whatever syntax as I am too lazy, also not a programmer so
not interested in memorizing syntax of this or that language, but as stated I am lazy so I
would want a 'good' segmentation, not just any. (I don't want a lot of "a's" if there's an
"aaaa" available). So I would check the longest-length word in the dictionary, say that
length is k (of course cap this at input-string length and discard all dictionary words
greater than this - I'll assume the dictionary has easy capability both of this
max-length-check and to 'discard'/ignore all >k), then starting from i=1 search all (i,i+k-1)
substrings of the input unless/till I find a match, if none reduce k (discard more
dictionary!) and try again, if so parse it out into 'match', prefix and suffix (as applicable)
and recurse onto both of the latter. Details too boring/obvious to spec out.
Personally I dislike cutesy 'interview questions' and am instinctively distrustful of the
filter they implicitly apply to candidates. When I'm interviewing someone I do what is
known as 'talk to' the person, one might even say I 'have a conversation' with them. That
may/may not be better but my way at least I don't think someone could slide through the
interview filter by memorizing some stuff off websites.
57 Stavros Macrakis // Aug 10, 2011 at 7:59 am
Many skills go into a good software engineer, and different interviewers will probe
different skills. This problem emphasizes algorithmic thinking and coding - and I ask a
similar interview question myself. But algorithmic coding is becoming surprisingly
uncommon in many environments because the non-trivial algorithms are wrapped in
libraries. Of course, there are places where understanding all this is crucial - someone
has to write those libraries, and some people really do have Big Problems that the
libraries don't address - but it doesn't seem to be a central skill that all programmers
need to master.
Is this good or bad? Consider A.N. Whitehead: "Civilization advances by extending the
number of important operations which we can perform without thinking about them."
58 Stavros Macrakis // Aug 10, 2011 at 8:13 am
One thing I've learned about interview questions is that you have to lead up to them in
steps if you want to determine where a candidate's understanding trails off - which may
be very soon.
For example, many candidates claim on their resumes that they know SQL. I used to ask
such candidates how they would determine if person A was an ancestor of person B given
a table of parent-child relations. This requires the (advanced) SQL feature of recursive
queries (and I'd actually be happy if they could explain why it couldn't be done in SQL,
as it can't in basic SQL). Now, I ask the question in stages:
* In SQL, how would you represent parent-child relations?
* How would you find X's parents?
* How would you find X's grandparents?
* How would you find all of X's ancestors?
* What if you wanted to do all this on a high-volume genealogy Web site - would you
change your table design or queries? or use some technology other than SQL?
I was shocked to discover that many candidates who listed SQL on their resumes couldn't
do *any* of this, and many required considerable coaching to do it. One candidate didn't
even remember that SQL queries start with SELECT - I would have forgiven this if
he'd had conceptual understanding but had just forgotten the keyword, but he had zero
conceptual understanding as well.
All this to say that you can't really trust the self-reporting on a resume and you've got to
probe to understand what the candidate actually knows.
59 Daniel Tunkelang // Aug 10, 2011 at 8:42 am
@Sonic Charmer
Your sketch is actually a reasonable start towards what I'd expect of a candidate. In fact,
I think I could persuade you that your assumptions would have to be general enough to at
least solve the interview problem as a special case where the min word length is one and
the max is large enough to be the length of the input string. Please bear in mind that this
"cutesy" problem is a simplification of one I had to solve to deliver software that has
been deployed to hundreds of millions of people!
@Stavros
You're right that not all software engineers need to have strong command of algorithms.
But I do require that strength of the folks I hire, given the problems that my team solves.
Same applied at Google and Endeca. And yes, self-reporting on a resume is always
subject to the maxim of "trust but verify".
60 Word Breaks « Programming Praxis // Aug 12, 2011 at 2:03 am
[...] Tunkelang posted this interview question to his blog: Given an input string and a
dictionary of words, segment the input string into a space-separated [...]
61 Daniel Tunkelang // Aug 12, 2011 at 6:24 am
@Programming Praxis
A solution in Scheme. Nice!
62 Ben Mabey // Aug 14, 2011 at 4:16 pm
Thanks for sharing this problem! I did a Clojure and Ruby solution and discussed the
differences in lazy (as in lazy lists) and non-lazy solutions:
http://benmabey.com/2011/08/14/word-break-in-clojure-and-ruby-and-laziness-in-ruby.html
63 Daniel Tunkelang // Aug 14, 2011 at 8:32 pm
Ben, thank you! I'm honored to have inspired such an insightful and elegant post.
64 Conducting a Remote Technical Interview | Hiring Tech Talent // Aug 15, 2011 at
9:19 am
[...] hacked, as the wise candidate can research typical questions ahead of time. Daniel
Tunkelang has a great post on this, where he found that one of his best questions was
posted on [...]
65 Raymond Moore // Aug 17, 2011 at 3:21 pm
Excellent article & topic. We are currently recruiting for a half dozen C++ & Web UI
positions and for many companies it is extremely difficult to determine who is talking
the talk and who can write VERY clean code and think logically when faced with
difficult programming requests. As an example a Sales Director will hand a candidate for
a sales position a phone and say "make this call and pitch them" just so you can see what
they have, this is partly what interviewing has become like in the IT world.
66 Daniel Tunkelang // Aug 17, 2011 at 6:07 pm
Raymond, thanks. The problem with all interviewing - and with interviewing software
engineers in particular - is that it's hard to extract a reliable signal under interview
conditions.
Ultimately the best solution may be to change the interview conditions. But the approach
has to be both effective and efficient. It's a great research problem - and I'll let readers
here know what my colleagues and I come up with.
Of course, I'd love to hear what others are doing.
67 State of Technology #21 « Dr Data's Blog // Aug 18, 2011 at 10:55 pm
[...] - How to retire a great Interview problem - "word break" problem described as [...]
68 Anatoly Karp // Aug 21, 2011 at 12:20 am
As a side note, there is a discussion of a slightly more general problem in Peter Norvig's
excellent chapter from O'Reilly book "Beautiful Data" -
http://norvig.com/ngrams/ch14.pdf (see section "Word Segmentation"). His remark that
the memoized solution can be thought of as the Viterbi algorithm from the HMM theory
is nicely illuminating (and of course obvious upon a moment's thought).
69 Daniel Tunkelang // Aug 21, 2011 at 9:31 am
Indeed. Peter emailed me his "naive" solution - unfortunately, I don't think he's on the
market.
70 Attention CMU Students! // Sep 7, 2011 at 7:04 pm
[...] and Wednesday. And of course LinkedIn will be conducting on-campus interviews:
those will take place all day on Thursday, September [...]
71 William // Nov 4, 2011 at 10:57 pm
Hi Daniel:
how could you change your code so that it can find all valid segmentations for the whole
string?
E.g. string "aaa", dict {"a", "aa"} to be segmentation {"a a a", "a aa", "aa a"}
72 Daniel Tunkelang // Nov 5, 2011 at 1:44 pm
William, interesting question. For starters, the number of valid segmentations may be
exponential in the length - in fact, that will be the case for a string of n a's if every
sequence of a's is a dictionary word. Could still use memoization / dynamic
programming to avoid repeating work, but storing sets of sequences rather than a single
one.
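The idea of storing sets of segmentations rather than a single one can be sketched as follows. This is only an illustration under assumptions, not code from the post: the class and method names are hypothetical, the memo is keyed by start index rather than by substring, and Java 8 or later is assumed. The result list can still be exponentially long.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class AllSegmentations {
    /** Returns every way to split input into dictionary words. */
    static List<String> segmentAll(String input, Set<String> dict) {
        return segmentFrom(input, 0, dict, new HashMap<>());
    }

    // memo maps a start index to all segmentations of input.substring(start).
    private static List<String> segmentFrom(String input, int start, Set<String> dict,
                                            Map<Integer, List<String>> memo) {
        if (memo.containsKey(start)) {
            return memo.get(start);
        }
        List<String> results = new ArrayList<>();
        if (start == input.length()) {
            results.add(""); // exactly one way to segment the empty suffix
        } else {
            for (int end = start + 1; end <= input.length(); end++) {
                String word = input.substring(start, end);
                if (!dict.contains(word)) {
                    continue;
                }
                for (String rest : segmentFrom(input, end, dict, memo)) {
                    results.add(rest.isEmpty() ? word : word + " " + rest);
                }
            }
        }
        memo.put(start, results);
        return results;
    }
}
```

With William's example, segmentAll("aaa", ...) over the dictionary {"a", "aa"} yields the three segmentations he lists. Note that an empty input yields a single empty segmentation in this sketch.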
73 Roberto Lupi // Dec 17, 2011 at 2:41 pm
It can be done in O(n), n=length of the string, with some pretty relaxed assumption given
the nature of the problem: we just need a preprocessing step on the dictionary.
The idea is to build a set of rolling hashing function using the Rabin-Karp algorithm, for
each word length in the dictionary, and the corresponding hash value for each word.
To segment the string, we loop over it once, updating the rolling hash values (for each
length) and if we find a match in our set of hash values from the dictionary, we have a
potential match. We still have to check the actual dictionary to confirm the match,
avoiding false positives.
This design has the added advantage that the dictionary can be larger than what can fit
into memory. We only need to store the hash values for each word in memory.
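A rough sketch of the matching step described above, assuming a simple polynomial rolling hash. The base, modulus, class name and method names are all illustrative choices, not taken from the comment. It only reports where dictionary words occur in the input; turning those matches into a segmentation would still need a separate pass. The scan does work proportional to the input length times the number of distinct word lengths.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RollingHashMatcher {
    private static final long BASE = 257;            // illustrative choice
    private static final long MOD = 1_000_000_007L;  // illustrative choice

    private static long hash(String s) {
        long h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
        }
        return h;
    }

    /** Returns {start, end} index pairs of every dictionary word found in input. */
    static List<int[]> findWords(String input, Set<String> dict) {
        // Preprocessing: group the dictionary's hash values by word length.
        Map<Integer, Set<Long>> hashesByLength = new HashMap<>();
        for (String w : dict) {
            if (w.isEmpty()) continue;
            hashesByLength.computeIfAbsent(w.length(), k -> new HashSet<>()).add(hash(w));
        }
        List<int[]> matches = new ArrayList<>();
        for (Map.Entry<Integer, Set<Long>> e : hashesByLength.entrySet()) {
            int len = e.getKey();
            if (len > input.length()) continue;
            long high = 1; // BASE^(len-1) mod MOD, weight of the outgoing character
            for (int i = 1; i < len; i++) high = (high * BASE) % MOD;
            long h = hash(input.substring(0, len));
            for (int start = 0; ; start++) {
                // A hash hit is only a candidate; confirm it against the dictionary.
                if (e.getValue().contains(h)
                        && dict.contains(input.substring(start, start + len))) {
                    matches.add(new int[] {start, start + len});
                }
                if (start + len >= input.length()) break;
                // Slide the window: drop input[start], append input[start + len].
                h = (h - input.charAt(start) * high % MOD + MOD) % MOD;
                h = (h * BASE + input.charAt(start + len)) % MOD;
            }
        }
        return matches;
    }
}
```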
74 Daniel Tunkelang // Dec 17, 2011 at 3:01 pm
Roberto, one person I presented the problem to did suggest an approach along these
lines: since dictionary membership is a regular language, just build a finite state machine.
I was impressed by the ingenuity, but I then enforced the constraint that the dictionary
only supported a constant-time membership test.
By the way, I've since moved on to less exciting coding problems that require less
ingenuity and are more a test of basics (though not quite as elementary as fizzbuzz). I'm
still surprised at how many candidates with strong resumes fail at these.
75 Roberto Lupi // Dec 18, 2011 at 12:27 pm
Maybe stronger candidates tend to overcomplicate problems, instead of solving them in
the simplest way they search for a clever one and get lost.
"[T]he stupider one is, the closer one is to reality. The stupid one is, the clearer one.
Stupidity is brief and artless, while intelligence wriggles and hides itself. Intelligence is a
knave, but stupidity is honest and straight forward." - Dostoevsky (The Brothers
Karamazov)
76 Daniel Tunkelang // Dec 18, 2011 at 12:33 pm
Some strong candidates assume that an easy solution must be too naive and therefore
wrong. For that reason, it's important to set expectations at the beginning. If a problem is
basic, I tell the candidate as much - which also helps avoid a candidate feeling insulted
or worrying that the bar is too low. And for all problems, I urge candidates to come up
with a working solution before optimizing it.
77 Alex // Jan 10, 2012 at 11:23 am
At first, it sounds depressing. I'm writing compilers and OSes, among other things, but
don't think I'd pass this interview.
However, the more I read the more I realize I'm a different kind of programmer than who
is sought here. I don't cite CLR by heart, solve real problems in real manner (i.e.,
including asking others, not to mention using books, Internet etc.) and generally not good
at "coding". I guess my skills aren't very marketable, in this approach. Might be a better
choice not to program for somebody else.
78 Daniel Tunkelang // Jan 10, 2012 at 9:12 pm
Alex, I've actually switched to using problems that are a bit less CLR-ish. I still think it's
reasonable to expect someone to be able to solve real problems like this one, though I
realize it's unnatural to solve problems under interview conditions.
An alternative approach would be to put less emphasis on the accuracy of the
interviewing process and treat the first few months as a trial period. Unfortunately, that's
not the cultural norm, so instead we try to squeeze all the risk out of the hiring process.
Anyway, if you're able to write a compiler or OS, you should have no problem finding
work you're great at and enjoy.
79 dbt // Jan 11, 2012 at 2:50 pm
Alex, I have heard that lament before. I work at a place that uses similarly algorithmic
questions (and other questions too - algorithms aren't everything, but they're important)
and I sometimes hear my coworkers lament that they couldn't get hired with the
standards we have today. Which is, of course, nonsense.
What's important to realize about these questions is that in an hour, you have time to
make an attempt at an answer, get feedback, and improve your solution. It is a
conversation, and not just a blank whiteboard, an interviewer in the corner tapping a ruler
on the table every time you make a mistake, and a disappointing early trip home.
80 Daniel Tunkelang // Jan 11, 2012 at 3:27 pm
Tapping a ruler on the table? More like rapping you on the knuckles! OK, the nuns didn't
really do that to us.
Seriously, dbt is right. Good interviews are a conversation. Otherwise, it would be better
to make them non-interactive tests.
81 Don't write on the whiteboard | The Princeton Entrepreneurship Club // Jan 28,
2012 at 2:33 pm
[...] me to write tests for my code, find corner cases. He then asked me 3 other problems.
They were Dan Tunkelang type problems. He ran out of problems and there were 15
minutes left. "Normally there's not enough [...]
82 Hiring: you are doing it wrong | ?abahgat's blog // Feb 22, 2012 at 3:27 am
[...] is a challenge. A lot has been written about the process itself and its quirks, ranging
from programming puzzles to whiteboard interviews. However, there are still a few
details that are often overlooked by [...]
83 Strata 2012: Big Data is Bigger than Ever // Mar 2, 2012 at 12:58 am
[...] minutes extended into three hours of conversation about everything from normalized
KL divergence to interview problems - and segued into a reception with specialty
big-data cocktails. By the time I got back to my [...]
84 man2code // Mar 18, 2012 at 1:18 am
if (segSuffix != null) {
memoized.put(input, prefix + " " + segSuffix);
return prefix + " " + segSuffix;
}
this if should be added with else as mention below, as to show words fetched before
non-dictionary word ::
if (segSuffix != null) {
memoized.put(input, prefix + " " + segSuffix);
return prefix + " " + segSuffix;
} else {
return prefix;
}
85 Jp Bida // Jun 28, 2012 at 11:21 am
Would using human cycles with captcha's get you closer to O(n)? Or does big O analysis
only apply when we actually understand the details of the algorithm?
86 netfootprint // Jul 1, 2012 at 1:41 am
I was thinking about the same problem. Quite surprised to see the same thing appear here
and asked for interviews.
shouldn't "aaaaab" be
"aaaaa" + 1 if a, aa, aaa, aaaa, aaaaa are in dict ?
Step 1: check for "aaaaab", not in dictionary
Step 2: check for "aaaaa", found in dictionary
word segments are "aaaaa" + "b"
I wonder why the need all combinations in a "real-world" problem!
87 Algorithms: What is the most efficient algorithm to separateconnectedwords?
assuming all the constituent words are in the vocabulary (and also assume for
simplicity that there aren't any spelling mistakes) - Quora // Jul 3, 2012 at 4:23 am
[...] is a thorough discussion of this problem as an interview question by Daniel
Tunkelang: [1][1] http://thenoisychannel.com/2011/... [...]
88 Rob // Aug 27, 2012 at 8:30 pm
I love the problem... but I can't help to notice that the interviewer who has used this
question to filter many fine candidates over the years can't even write out a working
solution without bugs given unlimited time, years of experience asking others this
question and do so in the context of writing an in-depth analysis teaching the unwashed
masses about how great an interview problem this is. Can it be such a great question if
you can't even get it right?
The biggest problem with questions like this is that this type of programming is a highly
perishable skill. I have played the guitar for 10 years but lately have spent more time
singing acoustically - trying to fit my fingers to a Bach piece that used to come to me
written on the back of my eyelids is now impossible.
Now if I were playing Bach every day it would be a different story. Same with this
problem. You are only going to hire the guy who happened to solve several problems like
this last week because they ran into some issue where it was relevant.
89 Daniel Tunkelang // Aug 27, 2012 at 8:41 pm
Criticism noted. But bear in mind that the question isn't a binary filter. It's a test of
algorithmic thinking and even of working out the problem requirements.
I'm curious what you mean by "this type of programming" being highly perishable. If
you mean solving a basic algorithmic problem that comes up in the course of real work,
then I strongly object. As I said in the post, the problem isn't from a textbook - it's a
simplified version of a real issue that has come up for me and others in the course of
writing production software.
But I grant that people don't always have opportunities to use dynamic programming and
perhaps even recursion. With that in mind, I've switched to interview problems that are
less reliant on these. But I still expect the people I hire to be able to apply these
fundamental computer science techniques with confidence.
Of course there's a risk with this and any interview problem of overfitting to someone's
recent experience. That's why it's good to use a diverse set of interview questions. If
someone aces the interviews because she solved all of those problems last week, I'll take
my chances and hire her!
Finally, if you have suggestions about how to interview more effectively, I'm all ears.
90 Tapori // Sep 2, 2012 at 5:23 pm
One problem with this is that good programmers tend to avoid recursion (no data to
support it though).
So while you are expecting a 10 min recursive version, the good programmer is trying to
bring out a non recursive version and might fail. (Of course, non recursive version for
this can be done in an interview).
btw, this is a textbook exercise problem.
I believe Sedgewick's book (or perhaps CLR) has it.
91 Daniel Tunkelang // Sep 2, 2012 at 5:30 pm
I concede that a lot of good programmers may instinctively avoid recursion, although in
this case I'd say that makes them worse programmers. The whole point of learning a set
of software engineering tools is to apply the right one to the right problem, and this
problem is most naturally defined and solved recursively.
As for it being a textbook exercise, good to know. As I said in the original post, I first
encountered this problem in the course of writing production software.
92 Tapori // Sep 2, 2012 at 7:21 pm
The problem (pun intended) is that the problem is still incompletely defined. For
instance, we have no idea about the expected target hardware. Does it have limited
memory? Limited stack space? Does the language we are to use support recursion? etc
Ok, maybe the last one (or any of them) is not relevant these days, but you get my point.
Basically we have no idea what we need to try and optimize for. Granted, a good
candidate might and probably should try and clarify that, but in the face of ambiguity,
good programmers follow "(instinctive) good practices" which they gained through
experience etc.
For instance, you will use quadratic memory and linear stack in the recursive version. A
good programmer might instinctively try to avoid the cost of the stack.
But, if the goal is to optimize the time to write the code, then a recursive version will be
faster (and I suppose is an implicit goal in the interviews, but almost never the case in
critical production code).
So calling them worse programmers for not using recursion is not right, IMO.
btw, you seem to have ignored the cost of looking up the memoized structure. Since you
are looking up the string your recursive version is cubic, in spite of the substring and
dictionary lookup being O(1). Of course, that can be avoided if we represent the string for
lookup by the end point indexes (i,j), rather than the string itself.
Sorry for the long post.
93 Daniel Tunkelang // Sep 2, 2012 at 8:15 pm
No need to apologize - I appreciate the discussion!
And several of the points you've raised have come up when I've used this problem in
interviews. I've seen candidates implement a stack-based approach without recursion.
I've also had discussions about nuances of scale and performance, including whether the
limited memory requires the dictionary to be stored out of code (a great motivation for
using a Bloom filter) and whether the cost of creating a read-only copy of a substring is
constant (i.e., represented by the end-point indexes) or linear in the length of the
substring (because the substring is actually created as a string).
So you're right that not all good programmers will jump to recursion - though I do think
that is the simplest path for most. And in an interview I urge candidates to start with the
simplest solution that works. That's not only a good idea during an interview, but a good
idea in practice, to avoid premature optimization.
Regardless of the choice of interview problem, the interviewer has to be competent and
flexible. I'm sure an interviewer can butcher an interview with even the best problem. But
some interview problems are better than others. And I strongly favor interview problems
that are based on real problems, don't require specialized knowledge, and provide
candidates options to succeed without depending entirely on the candidate arriving at a
single insight.
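The non-recursive, index-based variant discussed in this exchange might look like the sketch below. It is an assumption-laden illustration rather than anyone's actual interview answer: the class name and the prevCut array are hypothetical, and the dictionary is modeled as a java.util.Set. It keeps one table entry per end index instead of memoizing on substrings, and it uses no recursion. In current Java versions substring() still copies characters, so each lookup remains linear in the word length.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

final class IterativeWordBreak {
    /** Returns one space-separated segmentation of input, or null if none exists. */
    static String segment(String input, Set<String> dict) {
        int n = input.length();
        // prevCut[end] is the start index of the last word in some segmentation
        // of input.substring(0, end), or -1 if that prefix cannot be segmented.
        int[] prevCut = new int[n + 1];
        Arrays.fill(prevCut, -1);
        prevCut[0] = 0; // the empty prefix is trivially segmentable
        for (int end = 1; end <= n; end++) {
            for (int start = 0; start < end; start++) {
                if (prevCut[start] >= 0 && dict.contains(input.substring(start, end))) {
                    prevCut[end] = start;
                    break;
                }
            }
        }
        if (n == 0 || prevCut[n] < 0) {
            return null;
        }
        // Walk the cuts backwards to rebuild the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int end = n; end > 0; end = prevCut[end]) {
            words.addFirst(input.substring(prevCut[end], end));
        }
        return String.join(" ", words);
    }
}
```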
94 Tapori // Sep 2, 2012 at 10:16 pm
Completely agree with the comment about competent and flexible interviewers.
95 Job Interviews: What is your favourite interview questionn for a software
engineer? - Quora // Sep 3, 2012 at 12:07 pm
[...] is your favourite interview questionn for a software engineer? This blog post has a
nice question - http://thenoisychannel.com/2011/... . What are your favourite
interviewing questions? Add [...]
96 mm // Sep 22, 2012 at 7:41 pm
Due to this question I had interview with whitepages.com and this is why I couldn't get
the job... Great answer, hopefully I won't see this question again in my interviews. LOL
97 Quora // Sep 29, 2012 at 10:37 am
Why do topmost tech companies give more priority to algorithms during the
recruitment process?...
Not all top tech companies. At LinkedIn, we put a heavy emphasis on the ability to
think through the problems we work on. For example, if someone claims expertise in
machine learning, we ask them to apply it to one of our recommendation problems. A...
98 techguy // Oct 16, 2012 at 11:51 pm
I have a question about the memoized solution. I understand the advantage of saving the
results of a dead end computation, where the code reads:
memoized.put(input, null);
But I don't understand the advantage to memoizing here:
memoized.put(input, prefix + " " + segSuffix);
Since segSuffix has already been found to be not null, that means we have reached the
end of the input string somewhere deeper in the call stack, and are now just unwinding
back to the top? Maybe I've missed something, it's late at night for me, but I can't see it
any other way right now. Thanks.
99 Daniel Tunkelang // Oct 17, 2012 at 10:06 pm
I think you're right - we don't need to memoize the non-null values. I've amended the
code accordingly.
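A sketch of what memoizing only the dead ends could look like. This is a reconstruction under assumptions rather than the post's amended code: the class name and the deadEnds set are hypothetical, and a set of failed inputs stands in for the memoized.put(input, null) entries quoted above.

```java
import java.util.HashSet;
import java.util.Set;

final class WordBreak {
    /** Returns one space-separated segmentation of input, or null if none exists. */
    static String segment(String input, Set<String> dict) {
        return segment(input, dict, new HashSet<>());
    }

    // deadEnds holds suffixes already known to have no segmentation.
    private static String segment(String input, Set<String> dict, Set<String> deadEnds) {
        if (dict.contains(input)) {
            return input;
        }
        if (deadEnds.contains(input)) {
            return null;
        }
        int len = input.length();
        for (int i = 1; i < len; i++) {
            String prefix = input.substring(0, i);
            if (dict.contains(prefix)) {
                String segSuffix = segment(input.substring(i), dict, deadEnds);
                if (segSuffix != null) {
                    // A success unwinds straight to the caller, so it is not stored.
                    return prefix + " " + segSuffix;
                }
            }
        }
        deadEnds.add(input); // remember the failure so it is never recomputed
        return null;
    }
}
```

Only failures are recorded because the first successful suffix ends the search, which is the point techguy raises.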
100 CIKM 2012: Notes from a Conference in Paradise // Nov 12, 2012 at 7:00 am
[...] sessions of the conference. There was a talk on query segmentation, a topic
responsible for my most popular blog post. Also a great talk on identifying good
abandonment, a problem I've been interesting ever [...]
101 Thought this was cool: CIKM 2012: Notes from a Conference in Paradise «
CWYAlpha // Nov 13, 2012 at 7:11 am
[...] sessions of the conference. There was a talk on query segmentation, a topic
responsible for my most popular blog post. Also a great talk on identifying good
abandonment, a problem I've been interesting ever since [...]
102 Dirk Gorissen // Jan 10, 2013 at 4:41 am
As an aside, as far as I can see, the code as given will fail to give a complete
solution in cases like this:
dict = ["the","big","cat"]
string = "thebiggercat" or "thebigfoocat"
Also, if you call it with the string "cats" it will return null instead of cat.
103 Daniel Tunkelang // Jan 10, 2013 at 7:25 am
Dirk, that is how the problem is set up.
From the post:
Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence of exact words in the dictionary.
Of course you can generalize the problem to make it more interesting, especially if a
candidate solves the original problem with time to spare.
104 Dirk Gorissen // Jan 10, 2013 at 7:36 am
Indeed, missed that, sorry. Thanks for a great post btw.
105 Daniel Tunkelang // Jan 10, 2013 at 7:38 am
My pleasure, glad you enjoyed it!
How to Prepare for an Interview as an Entry
Level Data Analyst
by Rick Leander, Demand Media
Data analyst jobs vary greatly between industries, so preparation is key to landing the job. Find
out about the company, its products and services, the industry, the analysis team, and the types of
analyses used. In many cases, much of this information can be found on the internet, but it also
helps to talk with people inside the company to gain a competitive advantage.
Step 1
Study the job posting. Go online and find the job listing on the company's website or, if not
online, pull the application from your files. Write down each requirement of the job, then list
relevant experience or training that relates to the item. If the listing mentions survey analysis,
describe a class project that involved survey preparation, and describe how the survey was
administered and the statistical techniques used to analyze the results. Repeat this for each job
requirement.
Step 2
Know the company. Study the company's website, paying close attention to the "About Us"
pages. Find out about the company's products or services, the largest customers, and the
backgrounds of the management team. For government or research companies, study their
research presentations. These pages will offer quite a bit of insight about their data analysis
techniques and practices.
Step 3
Ask for an informational interview. When possible, ask to call and talk with the hiring manager
or a lead analyst to find out more about the work being done. By understanding what the team
does, you can align your training and experience to better match their needs. When topics arise
that are unfamiliar, take time to research and fill in these deficiencies.
Step 4
Prepare to answer common interview questions. Almost every interview starts with a question
like "tell me about yourself", so be ready with a concise thirty to sixty second answer that
includes a summary of your training, work experience, and one or two of your personal interests.
Other standard questions will include why you want to work at this company, your strengths and
weaknesses, and what you can offer to the company. Prepare short answers for each of these
questions, then practice answering them out loud.
Step 5
Be ready for the tough questions. Look through your school transcript and resume and look for
weaknesses that the interviewer may probe, like dropped classes or low grades. Many employers
run background and credit checks, so be ready to address any financial or law enforcement issues.
Explain the circumstances, then show how you learned and grew from these experiences.
Step 6
Find the interview location. Unless the company sits directly across the street, drive to the
interview location ahead of time. Find the main entrance and visitor parking, then take time to
observe how the staff dresses for work. On the day of the interview, dress just a bit better than
you observed. For example, if the dress is business casual, wear a sport coat and tie. Arrive for
the interview early, take a minute or two to check your appearance, then go in with a positive
attitude, knowing you are well prepared.
About the Author
Rick Leander lives in the Denver area and has written about software development since 1998.
He is the author of "Building Application Servers" and is co-author of "Professional J2EE EAI."
Leander is a professional software developer and has a Master of Arts in computer information
systems from Webster University.
66 job interview questions for data scientists
1. What is the biggest data set that you processed, and how did you process it,
what were the results?
2. Tell me two success stories about your analytic or computer science projects?
How was lift (or success) measured?
3. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
4. What is: collaborative filtering, n-grams, map reduce, cosine distance?
5. How to optimize a web crawler to run much faster, extract better information, and
better summarize data to produce cleaner databases?
6. How would you come up with a solution to identify plagiarism?
7. How to detect individual paid accounts shared by multiple users?
8. Should click data be handled in real time? Why? In which contexts?
9. What is better: good data or good models? And how do you define "good"? Is
there a universal good model? Are there any models that are definitely not so
good?
10. What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with
SQL or other languages? Which languages would you choose for semi-
structured text data reconciliation?
11. How do you handle missing data? What imputation techniques do you
recommend?
12. What is your favorite programming language / vendor? Why?
13. Tell me 3 things positive and 3 things negative about your favorite statistical
software.
14. Compare SAS, R, Python, Perl
15. What is the curse of big data?
16. Have you been involved in database design and data modeling?
17. Have you been involved in dashboard creation and metric selection? What do
you think about Birt?
18. What features of Teradata do you like?
19. You are about to send one million emails (marketing campaign). How do you
optimize delivery? How do you optimize response? Can you optimize both
separately? (answer: not really)
20. Toad or Brio or any other similar clients are quite inefficient to query Oracle
databases. Why? How would you increase speed by a factor of 10, and be
able to handle far bigger outputs?
21. How would you turn unstructured data into structured data? Is it really
necessary? Is it OK to store data as flat text files rather than in an SQL-powered
RDBMS?
22. What are hash table collisions? How are they avoided? How frequently do they
happen?
23. How to make sure a mapreduce application has good load balance? What is load
balance?
24. Examples where mapreduce does not work? Examples where it works very well?
What are the security issues involved with the cloud? What do you think of
EMC's solution offering a hybrid approach - both internal and external cloud - to
mitigate the risks and offer other advantages (which ones)?
25. Is it better to have 100 small hash tables or one big hash table, in memory, in
terms of access speed (assuming both fit within RAM)? What do you think about
in-database analytics?
26. Why is naive Bayes so bad? How would you improve a spam detection algorithm
that uses naive Bayes?
27. Have you been working with white lists? Positive rules? (In the context of fraud or
spam detection)
28. What is star schema? Lookup tables?
29. Can you perform logistic regression with Excel? (yes) How? (use linest on log-
transformed data) Would the result be good? (Excel has numerical issues, but
it's very interactive)
30. Have you optimized code or algorithms for speed: in SQL, Perl, C++, Python etc.
How, and by how much?
31. Is it better to spend 5 days developing a 90% accurate solution, or 10 days for
100% accuracy? Depends on the context?
32. Define: quality assurance, six sigma, design of experiments. Give examples of
good and bad designs of experiments.
33. What are the drawbacks of general linear model? Are you familiar with
alternatives (Lasso, ridge regression, boosted trees)?
34. Do you think 50 small decision trees are better than a large one? Why?
35. Is actuarial science not a branch of statistics (survival analysis)? If not, how so?
36. Give examples of data that does not have a Gaussian distribution, nor log-
normal. Give examples of data that has a very chaotic distribution?
37. Why is mean square error a bad measure of model performance? What would
you suggest instead?
38. How can you prove that one improvement you've brought to an algorithm is really
an improvement over not doing anything? Are you familiar with A/B testing?
39. What is sensitivity analysis? Is it better to have low sensitivity (that is, great
robustness) and low predictive power, or the other way around? How to perform
good cross-validation? What do you think about the idea of injecting noise in your
data set to test the sensitivity of your models?
40. Compare logistic regression w. decision trees, neural networks. How have these
technologies been vastly improved over the last 15 years?
41. Do you know / have you used data reduction techniques other than PCA? What do you
think of step-wise regression? What kind of step-wise techniques are you familiar
with? When is full data better than reduced data or sample?
42. How would you build non parametric confidence intervals, e.g. for scores? (see
the AnalyticBridge theorem)
43. Are you familiar either with extreme value theory, monte carlo simulations or
mathematical statistics (or anything else) to correctly estimate the chance of a
very rare event?
44. What is root cause analysis? How to identify a cause vs. a correlation? Give
examples.
45. How would you define and measure the predictive power of a metric?
46. How to detect the best rule set for a fraud detection scoring technology? How do
you deal with rule redundancy, rule discovery, and the combinatorial nature of
the problem (for finding optimum rule set - the one with best predictive power)?
Can an approximate solution to the rule set problem be OK? How would you find
an OK approximate solution? How would you decide it is good enough and stop
looking for a better one?
47. How to create a keyword taxonomy?
48. What is a Botnet? How can it be detected?
49. Any experience with using API's? Programming API's? Google or Amazon API's?
AaaS (Analytics as a service)?
50. When is it better to write your own code than using a data science software
package?
51. Which tools do you use for visualization? What do you think of Tableau? R?
SAS? (for graphs). How to efficiently represent 5 dimensions in a chart (or in a
video)?
52. What is POC (proof of concept)?
53. What types of clients have you been working with: internal, external, sales /
finance / marketing / IT people? Consulting experience? Dealing with vendors,
including vendor selection and testing?
54. Are you familiar with software life cycle? With IT project life cycle - from gathering
requests to maintenance?
55. What is a cron job?
56. Are you a lone coder? A production guy (developer)? Or a designer (architect)?
57. Is it better to have too many false positives, or too many false negatives?
58. Are you familiar with pricing optimization, price elasticity, inventory management,
competitive intelligence? Give examples.
59. How does Zillow's algorithm work? (to estimate the value of any home in US)
60. How to detect bogus reviews, or bogus Facebook accounts used for bad
purposes?
61. How would you create a new anonymous digital currency?
62. Have you ever thought about creating a startup? Around which idea / concept?
63. Do you think that typed login / password will disappear? How could they be
replaced?
64. Have you used time series models? Cross-correlations with time lags?
Correlograms? Spectral analysis? Signal processing and filtering techniques? In
which context?
65. Which data scientists do you admire most? Which startups?
66. How did you become interested in data science?
67. What is an efficiency curve? What are its drawbacks, and how can they be
overcome?
68. What is a recommendation engine? How does it work?
69. What is an exact test? How and when can simulations help us when we do not
use an exact test?
70. What do you think makes a good data scientist?
71. Do you think data science is an art or a science?
72. What is the computational complexity of a good, fast clustering algorithm? What
is a good clustering algorithm? How do you determine the number of clusters?
How would you perform clustering on one million unique keywords, assuming
you have 10 million data points - each one consisting of two keywords, and a
metric measuring how similar these two keywords are? How would you create
this 10 million data points table in the first place?
73. Give a few examples of "best practices" in data science.
74. What could make a chart misleading, difficult to read or interpret? What features
should a useful chart have?
75. Do you know a few "rules of thumb" used in statistical or computer science? Or in
business analytics?
76. What are your top 5 predictions for the next 20 years?
77. How do you immediately know when statistics published in an article (e.g.
newspaper) are either wrong or presented to support the author's point of view,
rather than correct, comprehensive factual information on a specific subject? For
instance, what do you think about the official monthly unemployment statistics
regularly discussed in the press? What could make them more accurate?
78. Testing your analytic intuition: look at these three charts. Two of them exhibit
patterns. Which ones? Do you know that these charts are called scatter-plots?
Are there other ways to visually represent this type of data?
79. You design a robust non-parametric statistic (metric) to replace correlation or R
square, that (1) is independent of sample size, (2) always between -1 and +1,
and (3) based on rank statistics. How do you normalize for sample size? Write an
algorithm that computes all permutations of n elements. How do you sample
permutations (that is, generate tons of random permutations) when n is large, to
estimate the asymptotic distribution for your newly created metric? You may use
this asymptotic distribution for normalizing your metric. Do you think that an exact
theoretical distribution might exist, and therefore, we should find it, and use it
rather than wasting our time trying to estimate the asymptotic distribution using
simulations?
80. More difficult, technical question related to previous one. There is an obvious
one-to-one correspondence between permutations of n elements and integers
between 1 and n! Design an algorithm that encodes an integer less than n! as a
permutation of n elements. What would be the reverse algorithm, used to decode
a permutation and transform it back into a number? Hint: An intermediate step is
to use the factorial number system representation of an integer. Feel free to
check this reference online to answer the question. Even better, feel free to
browse the web to find the full answer to the question (this will test the
candidate's ability to quickly search online and find a solution to a problem
without spending hours reinventing the wheel).
81. How many "useful" votes will a Yelp review receive? My answer: Eliminate
bogus accounts (read this article), or competitor reviews (how to detect them:
use taxonomy to classify users, and location - two Italian restaurants in same Zip
code could badmouth each other and write great comments for themselves).
Detect fake likes: some companies (e.g. FanMeNow.com) will charge you to
produce fake accounts and fake likes. Eliminate prolific users who like
everything, those who hate everything. Have a blacklist of keywords to filter fake
reviews. See if IP address or IP block of reviewer is in a blacklist such as "Stop
Forum Spam". Create honeypot to catch fraudsters. Also watch out for
disgruntled employees badmouthing their former employer. Watch out for 2 or 3
similar comments posted the same day by 3 users regarding a company that
receives very few reviews. Is it a brand new company? Add more weight to
trusted users (create a category of trusted users). Flag all reviews that are
identical (or nearly identical) and come from same IP address or same user.
Create a metric to measure distance between two pieces of text
(reviews). Create a review or reviewer taxonomy. Use hidden decision trees to
rate or score review and reviewers.
82. What did you do today? Or what did you do this week / last week?
83. What/when is the latest data mining book / article you read? What/when is the
latest data mining conference / webinar / class / workshop / training you
attended? What/when is the most recent programming skill that you acquired?
84. What are your favorite data science websites? Who do you admire most in the
data science community, and why? Which company do you admire most?
85. What/when/where is the last data science blog post you wrote?
86. In your opinion, what is data science? Machine learning? Data mining?
87. Who are the best people you recruited and where are they today?
88. Can you estimate and forecast sales for any book, based on Amazon public
data? Hint: read this article.
89. What's wrong with this picture?
90. Should removing stop words be Step 1 rather than Step 3, in the search engine
algorithm described here? Answer: Have you thought about the fact that mine
and yours could also be stop words? So in a bad implementation, data mining
would become data mine after stemming, then data. In practice, you remove stop
words before stemming. So Step 3 should indeed become step 1.
91. Experimental design and a bit of computer science with Lego's
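As an illustration of the hint in question 80, the factorial number system gives one possible
encoding. The sketch below is an assumption-laden example rather than a reference answer: the
function names are made up for this illustration, and integers are taken from 0 to n! - 1
instead of 1 to n!.

```python
from math import factorial


def int_to_permutation(k, n):
    """Map an integer 0 <= k < n! to a permutation of range(n)."""
    if not 0 <= k < factorial(n):
        raise ValueError("k must satisfy 0 <= k < n!")
    pool = list(range(n))
    perm = []
    for i in range(n - 1, -1, -1):
        # The next factorial-base digit selects one of the remaining elements.
        digit, k = divmod(k, factorial(i))
        perm.append(pool.pop(digit))
    return perm


def permutation_to_int(perm):
    """Inverse of int_to_permutation: recover the integer from a permutation."""
    pool = sorted(perm)
    k = 0
    for i, value in enumerate(perm):
        digit = pool.index(value)  # rank of value among the unused elements
        pool.pop(digit)
        k += digit * factorial(len(perm) - 1 - i)
    return k
```

Each digit is the rank of the chosen element among those not yet used, so decoding simply
recomputes those ranks and sums them with factorial weights.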
Comment by vishali rajiv on November 18, 2013 at 11:35pm
Vincent,
can I get the possible answers for the above interview questions?
Vishali
Comment by Vincent Granville on September 12, 2013 at 9:26am
I have added one new question - question #90.
Comment by Vincent Granville on May 5, 2013 at 7:44pm
Someone wrote:
Looks like hiring managers expect data scientists to have expertise in machine learning,
statistics, business intelligence, database design, data munging, data visualization and
programming. Are not these requirements too excessive?
My answer:
I think being familiar with all these domains (add computer science, map reduce) is
necessary, as well as expertise in some of these domains. Mastering two programming
languages (Java, Python) is a must, as well as familiarity with R and SQL. Visualization
is easy to acquire.
Knowing how to quickly and independently find, learn (or if necessary, invent) and
assess usefulness of the techniques needed to handle the problems, is critical, and MORE
important than "knowing" the techniques in the first place. A good amount of experience
with some techniques is necessary.
But you don't need to be an expert in everything. For instance, about 90% of what I
learned in statistics courses, I've never had to use to solve business problems. So why
learn it in the first place? Also, machine learning (in my opinion) is a subset of statistics
focusing on clustering, pattern recognition and association rules.
The mistake that many hiring managers make is looking for someone who is an expert in
everything.
Comment by Joe M on April 21, 2013 at 5:06pm
Are questions like this actually asked in high-level interviews? All I ever got when I was
starting out was "What was your most satisfying experience?" and "Other than for the
money, why do you want this job?"
Comment by Ritendra on February 25, 2013 at 12:44pm
Vincent,
The questions are good and many of them are based on practical exp too.
Thx for sharing the comments from Allen Engelhardt, it provides better context.
Answer to #76, I would luv to see #1 VA -> Visual Analytics being top of them.
#2 should be, large amount of the data replaced by videos.
Comment by Mars Ma on February 20, 2013 at 5:51am
@Vincent Granville, really useful questions, I like them, thanks a lot!
Comment by Amy on February 19, 2013 at 11:40am
@Craig: If the client returns results in your browser, you can handle only as much data as
your browser can. In most cases, an 80,000 row table will crash your browser. Just access
Oracle directly via Python or Perl, and you can handle (extract and save) gigabytes of
data quite easily. And far, far faster.
Comment by Amy on February 19, 2013 at 11:37am
What makes you a data scientist? You just need to know how to gather and turn data into
money - nothing more, nothing less. No degree needed, you can learn some techniques by
reading material online, but much of what makes a successful data scientist
(data/business craftsmanship) is not found in any curriculum or published article.
Comment by Craig Chambers on February 19, 2013 at 10:22am
Can someone help me with #20 - "Toad or Brio or any other similar clients are quite
inefficient to query Oracle databases. Why? How would you increase speed
by a factor of 10, and be able to handle far bigger outputs?" I didn't know that SQL
clients really affected actual query efficiency. Thanks!
Comment by Vincent Granville on February 16, 2013 at 7:19am
Here's a potential answer to question #10 (probabilistic merging). Feel free to add your
answers to any of these questions.
Answer to question #10:
Not sure if the problem of fuzzy merging can be addressed within the framework of
traditional databases. Say you have a table A with 10,000 users (key is user ID), a table B
with 50,000 users (key is user ID). You could create a user mapping table C with three
fields:
1. UserID (= key),
2. Alternate_UserID (this field would also be a user ID) and
3. Probability (probability that UserID = Alternate_UserID).
This table would be populated after some machine learning algorithm had been applied to
tables A and B to identify similar users and the probability they match. Make sure that
you only include (in table C) records where probability is above (say) 0.25, otherwise you
risk exploding your database.
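A rough sketch of how such a mapping table could be populated is shown below. It is only an
illustration under stated assumptions: the function name build_mapping is hypothetical, users
are matched on a name string, and a plain string-similarity ratio from Python's difflib stands
in for the match probability that a real machine learning step would produce.

```python
from difflib import SequenceMatcher


def build_mapping(table_a, table_b, threshold=0.25):
    """Return (user_id, alternate_user_id, score) rows scoring above threshold.

    table_a and table_b map user IDs to names. The similarity ratio is a
    stand-in for a real match probability, not a calibrated probability.
    """
    rows = []
    for id_a, name_a in table_a.items():
        for id_b, name_b in table_b.items():
            score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
            if score > threshold:  # keep table C small, as advised above
                rows.append((id_a, id_b, round(score, 3)))
    return rows
```

Comparing every pair is quadratic, so for tables of the size mentioned above one would
normally restrict comparisons to candidate pairs first (for example, users sharing a zip code).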
Comment by Vincent Granville on February 16, 2013 at 6:54am
Also, my 70 questions focus mostly on the tech aspects of being a data scientist. And
these are high level questions, aimed at senior professionals (I think there is no such thing
as a junior data scientist - they would be called data analyst, software engineer,
statistician or computer scientist instead). I did not include questions about soft skills -
that would be another set of 70 questions.
I will add a new one: do you think data science is an art or a science? The answer, as
always, is "both". Then you can dig deeper and ask whether you are more of an artist than
a scientist. My answer would be: it's more craftsmanship than art, but in my case, being a
designer/architect, it's a tiny bit closer to art than to science. Certainly a blend of both.
And when bringing up the issue of art vs. science, I would also add that I like to build
solutions that are elegant in the way they contribute to ROI / lift, but not in the way they
contribute to statistical theory and the beauty of science. I like a dirty, ugly, imperfect
solution better than a "great model" if it is more scalable, simple, efficient, easy to
implement and robust.
Comment by Richard Giambrone on February 15, 2013 at 12:51pm
Vincent, I like these questions. They are good questions to ask yourself even if you're
not interviewing. Understanding what you do is different from being able to explain
what you do.
Rich Giambrone
Comment by Vincent Granville on February 15, 2013 at 10:22am
A data scientist is a bit of everything (statistician, software engineer, business analyst,
computer scientist, six sigma, consultant, communicator), but most importantly she is a
senior analytic practitioner
* with a very good sense for business data and business optimization at large,
* knowledge of big data - both drawbacks and potential (and able to leverage its
potential)
* who enjoys swimming in unstructured data, fuzzy non-SQL "joins"
* who knows the limitation of old statistics (regression etc.) yet knows how to
correctly do sampling, cross-validation, Monte Carlo simulations, design of
experiments, assessing lift, identify good metrics
* who knows the limitations of MapReduce, and how they can be overcome
* who can design and develop robust, simple, efficient, reliable, scalable, useful
predictive algorithms - whether or not based on statistical theory
A data scientist may not know much (but at least a little) about linear regression,
statistical distributions, the complexity of the quicksort (sorting) algorithm or the limit
theorems. Her knowledge of SQL can be a bit elementary, although she can run a big
SQL query 10 times faster than business analysts who use tools such as Toad or Brio. Her
strengths, skills and knowledge are briefly outlined above.
Comment by Vincent Granville on February 15, 2013 at 8:24am
Interviewers would pick a small subset; there's not enough time in a one-day interview to
ask all these questions. Also, several of these questions are about relevant projects (e.g.
questions #1 and #2). Of course, these are not yes/no questions, and one would expect to
spend 10-15 minutes and go into some depth answering these questions. Not being able to
answer one question is no big deal - this set has 70 questions and the interviewer can
easily pick another one. Indeed, this is the purpose of my list.
Comment by Vincent Granville on February 14, 2013 at 6:52am
Here's a comment from one of our readers:
Some suggestions for structure you may want to apply to your own list:
* Tools (#13 14 18 etc)
* Algorithms (#26 33 etc)
* Statistics (#35 36 37 etc)
* Techniques (#3 4 10 etc)
* Data Structures (#21 22 25 etc)
* Experience (#1 2 etc)
* Business language (52 54)
* Domain-specific (#5 6 7 8 10 19 20 21 24 27 46 55 59 and probably others)
* Plain weirdness (#5 48 59 61 63)
It is probably worth thinking about the areas that are important to you and manage a list
based on those. I don't think Vincent expects us to just use the list except for inspiration.
My favourites from the list (for senior people) are #2 9 (my answer: "valuable actions are
best") and 62. Which ones are your favourites?
By Allan Engelhardt
Interview Questions for Data Scientists
Posted: January 3, 2013 | Author: Hilary Mason | Filed under: blog | Tags: datascience, hiring, startups
Great data scientists come from such diverse backgrounds that it can be difficult to get a sense of
whether someone is up to the job in just a short interview. In addition to the technical questions, I
find it useful to have a few questions that draw out the more creative and less discrete elements
of a candidate's personality. Here are a few of my favorite questions.
1. What was the last thing that you made for fun?
This is my favorite question by far - I want to work with the kind of people who don't
turn their brains off when they go home. It's also a great way to learn what gets people
excited.
2. What's your favorite algorithm? Can you explain it to me?
I don't know any data scientists who haven't fallen in love with an algorithm, and I want
to see both that enthusiasm and that the candidate can explain it to a knowledgeable
audience.
Update: As Drew pointed out on Twitter, do be aware of hammer syndrome: when
someone falls so in love with one algorithm that they try to apply it to everything, even
when better choices are available.
3. Tell me about a data project you've done that was successful. How did you add
unique value?
This is a chance for the candidate to walk us through a success and show off a bit. It's
also a great gateway into talking about their process and preferred tools and experience.
4. Tell me about something that failed. What would you change if you had to do it over
again?
This is a tricky question, and sometimes it takes people a few tries to get to a complete
answer. It's worth asking, though, to see that people have the confidence to talk about
something that went awry, and the wisdom to have recognized when something they did
was not optimal.
5. You clearly know a bit about our data and our work. When you look around, what's
the first thing that comes to mind as "why haven't you done X"?
Technical competence is useless without the creativity to know where to focus it. I love
when people come in with questions and ideas.
6. What's the best interview question anyone has ever asked you?
I'd like to wish for more wishes, please.
I'm always looking for new and interesting things to add to my list, and I'd love to hear your
suggestions.
Algorithms Every Data Scientist Should
Know: Reservoir Sampling
by Josh Wills (@josh_wills)
April 23, 2013
Data scientists, that peculiar mix of software engineer and statistician, are notoriously difficult
to interview. One approach that I've used over the years is to pose a problem that requires some
mixture of algorithm design and probability theory in order to come up with an answer. Here's
an example of this type of question that has been popular in Silicon Valley for a number of
years:
Say you have a stream of items of large and unknown length that we can only iterate over once.
Create an algorithm that randomly chooses an item from this stream such that each item is
equally likely to be selected.
The first thing to do when you find yourself confronted with such a question is to stay calm. The
data scientist who is interviewing you isn't trying to trick you by asking you to do something that
is impossible. In fact, this data scientist is desperate to hire you. She is buried under a pile of
analysis requests, her ETL pipeline is broken, and her machine learning model is failing to
converge. Her only hope is to hire smart people such as yourself to come in and help. She wants
you to succeed.
Remember: Stay Calm.
The second thing to do is to think deeply about the question. Assume that you are talking to a
good person who has read Daniel Tunkelang's excellent advice about interviewing data
scientists. This means that this interview question probably originated in a real problem that this
data scientist has encountered in her work. Therefore, a simple answer like, "I would put all of
the items in a list and then select one at random once the stream ended," would be a bad thing for
you to say, because it would mean that you didn't think deeply about what would happen if there
were more items in the stream than would fit in memory (or even on disk!) on a single computer.
The third thing to do is to create a simple example problem that allows you to work through what
should happen for several concrete instances of the problem. The vast majority of humans do a
much better job of solving problems when they work with concrete examples instead of
abstractions, so making the problem concrete can go a long way toward helping you find a
solution.
A Primer on Reservoir Sampling
For this problem, the simplest concrete example would be a stream that only contained a single
item. In this case, our algorithm should return this single element with probability 1. Now let's
try a slightly harder problem, a stream with exactly two elements. We know that we have to hold
on to the first element we see from this stream, because we don't know if we're in the case that
the stream only has one element. When the second element comes along, we know that we
want to return one of the two elements, each with probability 1/2. So let's generate a random
number x between 0 and 1, and return the first element if x is less than 0.5 and return the second
element if x is greater than 0.5.
Now let's try to generalize this approach to a stream with three elements. After we've seen the
second element in the stream, we're now holding on to either the first element or the second
element, each with probability 1/2. When the third element arrives, what should we do? Well, if
we know that there are only three elements in the stream, we need to return this third element
with probability 1/3, which means that we'll return the other element we're holding with
probability 1 - 1/3 = 2/3. That means that the probability of returning each element in the stream
is as follows:
1. First Element: (1/2) * (2/3) = 1/3
2. Second Element: (1/2) * (2/3) = 1/3
3. Third Element: 1/3
By considering the stream of three elements, we see how to generalize this algorithm to any N: at
every step N, keep the next element in the stream with probability 1/N. This means that we have
an (N-1)/N probability of keeping the element we are currently holding on to, which means that
we keep it with probability (1/(N-1)) * (N-1)/N = 1/N.
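The rule derived above fits in a few lines of Python. This is a minimal sketch for illustration
only; the function name sample_one and the optional rng argument are assumptions made for this
example, not part of the original post.

```python
import random


def sample_one(stream, rng=random):
    """Return one item chosen uniformly at random in a single pass.

    Uses constant memory: only the currently held item is stored.
    Returns None for an empty stream.
    """
    chosen = None
    for n, item in enumerate(stream, start=1):
        # Replace the held item with the n-th item with probability 1/n.
        if rng.random() < 1.0 / n:
            chosen = item
    return chosen
```

The first item is always kept, because rng.random() is strictly less than 1, and every later
item displaces the held one with probability 1/n, exactly as in the three-element example.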
This general technique is called reservoir sampling, and it is useful in a number of applications
that require us to analyze very large data sets. You can find an excellent overview of a set of
algorithms for performing reservoir sampling in this blog post by Greg Grothaus. I'd like to
focus on two of those algorithms in particular, and talk about how they are used in Cloudera ML,
our open-source collection of data preparation and machine learning algorithms for Hadoop.
Applied %eservoir Samplin& in "loudera '(
The first of the algorithms Greg describes is a distributed reservoir sampling algorithm. You'll
note that for the algorithm we described above to work, all of the elements in the stream must be
read sequentially. To create a distributed reservoir sample of size K, we use a MapReduce
analogue of the ORDER BY RAND() trick/anti-pattern from SQL: for each element in the set,
we generate a random number x between 0 and 1, and keep the K elements that have the largest
values of x. This trick is especially useful when we want to create stratified samples from a large
dataset. Each stratum is a specific combination of categorical variables that is important for an
analysis, such as gender, age, or geographical location. If there is significant skew in our input
data set, it's possible that a naive random sampling of observations will underrepresent certain
strata in the dataset. Cloudera ML has a sample command that can be used to create stratified
samples for text files and Hive tables (via the HCatalog interface to the Hive Metastore) such
that N records will be selected for every combination of the categorical variables that define the
strata.
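To make the random-key trick concrete, here is a small single-machine sketch in Python. It is not Cloudera ML's implementation: the names local_top_k, merge_top_k and stratified_sample are hypothetical, and the "mappers" and "reducer" are ordinary function calls standing in for MapReduce tasks.

```python
import heapq
import random
from collections import defaultdict


def local_top_k(partition, k, rng=random):
    """Mapper side: give each record a random key in [0, 1) and keep
    the k (key, record) pairs with the largest keys."""
    keyed = ((rng.random(), rec) for rec in partition)
    return heapq.nlargest(k, keyed, key=lambda pair: pair[0])


def merge_top_k(partials, k):
    """Reducer side: pool every mapper's candidates and keep the k
    records with the largest keys overall."""
    pooled = (pair for part in partials for pair in part)
    top = heapq.nlargest(k, pooled, key=lambda pair: pair[0])
    return [rec for _, rec in top]


def stratified_sample(records, stratum_of, n, rng=random):
    """Keep up to n records per stratum with the same trick, using one
    bounded min-heap of (key, sequence number, record) per stratum."""
    heaps = defaultdict(list)
    for seq, rec in enumerate(records):
        key = rng.random()
        heap = heaps[stratum_of(rec)]
        if len(heap) < n:
            heapq.heappush(heap, (key, seq, rec))
        elif key > heap[0][0]:
            # Evict the smallest key currently held for this stratum.
            heapq.heapreplace(heap, (key, seq, rec))
    return {s: [rec for _, _, rec in heap] for s, heap in heaps.items()}
```

Because every record's key is drawn independently, each partition can be processed on its own and only K candidates per partition need to reach the merge step.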
The second algorithm is even more interesting: a weighted distributed reservoir sample, where
every item in the set has an associated weight, and we want to sample such that the probability
that an item is selected is proportional to its weight. It wasn't even clear whether or not this was
even possible until Pavlos Efraimidis and Paul Spirakis figured out a way to do it and published
it in the 2005 paper "Weighted Random Sampling with a Reservoir." The solution is as simple as
it is elegant, and it is based on the same idea as the distributed reservoir sampling algorithm
described above. For each item in the stream, we compute a score as follows: first, generate a
random number x between 0 and 1, and then take the nth root of x, where n is the weight of the
current item. Return the K items with the highest score as the sample. Items with higher weights
will tend to have scores that are closer to 1, and are thus more likely to be picked than items with
smaller weights.
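The scoring rule translates almost directly into Python. The sketch below is one possible single-pass version, assuming the input arrives as (item, weight) pairs with positive weights; the function name weighted_reservoir_sample is hypothetical, and skipping non-positive weights is a choice made for this example, not something the paper or Cloudera ML is known to prescribe.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=random):
    """Select k items from an iterable of (item, weight) pairs by keeping
    the k largest scores, where score = x ** (1 / weight), x in [0, 1)."""
    if k <= 0:
        return []
    heap = []  # min-heap of (score, sequence number, item)
    for seq, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # assumption: such items are never sampled
        score = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (score, seq, item))
        elif score > heap[0][0]:
            # The new score beats the smallest score in the reservoir.
            heapq.heapreplace(heap, (score, seq, item))
    return [item for _, _, item in heap]
```

With K = 1 and two items of weight 9 and 1, the heavier item should be returned about 90% of the time. As with the unweighted trick, each score depends only on its own item, so partitions can be scored independently and merged by taking the overall top K.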
In Cloudera ML, we use the weighted reservoir sampling algorithm in order to cut down on the
number of passes over the input data that the scalable k-means++ algorithm needs to perform.
The ksketch command runs the k-means++ initialization procedure, performing a small number
of iterations over the input data set to select points that form a representative sample (or sketch)
of the overall data set. For each iteration, the probability that a given point should be added to the
sketch is proportional to its distance from the closest point in the current sketch. By using the
weighted reservoir sampling algorithm, we can select the points to add to the next sketch in a
single pass over the input data, instead of one pass to compute the overall cost of the clustering
and a second pass to select the points based on those cost calculations.
These Books Behind Me Don't Just Make The Office Look Good
Interesting algorithms aren't just for the engineers building distributed file systems and search
engines; they can also come in handy when you're working on large-scale data analysis and
statistical modeling problems. I'll try to write some additional posts on algorithms that are
interesting as well as useful for data scientists to learn, but in the meantime, it never hurts to
brush up on your Knuth.
How to hire data scientists and get hired as one
As you might have heard before if you read McKinsey reports, the New York Times or just about
any technology news site, data scientists are in high demand. Heck, the Harvard Business
Review called it the sexiest job of the 21st century. But landing a gig as a data scientist isn't easy,
especially a top-notch gig at a major web or e-commerce company where
merely talented people are a dime a dozen.
However, companies are starting to talk openly about what they look for in data scientists,
including the skills someone should have and what they'll need to know to survive an interview.
I spent a day at the Predictive Analytics World conference on Monday and heard both Netflix
and Orbitz give their two cents. That's also the same day Hortonworks published a blog post
about how to build a data science team.
Granted that "data scientist" is a nebulous term, perhaps as much so as "big data," these tips
(a mashup of all three sources) are still broadly applicable. If you want to make the leap from
guy who knows data to data scientist, I suggest paying attention.
1. Know the core competencies.
For most of us, there's readin', 'ritin' and 'rithmetic. For data scientists, there's SQL, statistics,
predictive modeling and programming (probably Python). If you don't have at least a grounding
in these skills, you're probably not getting through the door, in part because they form a common
language that lets people from different backgrounds talk to each other.
Hortonworks' Ofer Mendelevitch describes the ideal data scientist as occupying a place on the
spectrum between a software engineer and a research scientist. In distinguishing a great engineer,
mathematician or data analyst from a data scientist, programming skills are probably the biggest
variable. That's because being able to write code means you'll have an easier time testing out
your hypotheses and algorithms, hacking through certain problems and generally thinking in
ways that actually relate to the products your employer is building.
Source: Hortonworks
Chris Pouliot, director of algorithms and analytics at Netflix, said even being able to
"pseudo-code" might be good enough if someone is otherwise a strong candidate. You can pick up SQL
or Python or whatever you need pretty quickly, he noted.
Or, hinted Orbitz VP of Advanced Analytics Sameer Chopra, you could just suck it up and learn
Python now: "If you were to leave today and ask 'What specific skills should I learn?': Python."
2. Know a little more.
Of course, just meeting the minimum requirements never got anybody a job (well, almost
nobody). What Pouliot is really looking for in a candidate are: an advanced degree in a
quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python);
good exploratory analysis skills; the ability to work with engineering teams; and the ability to
generate and create algorithms and models rather than relying on out-of-the-box ones.
Chopra's advice was to get up to speed on machine learning, especially if you want to work in
Silicon Valley, where machine learning has exploded in popularity. He's also a big fan of honing
those hacking skills because data munging is such a valuable skill when you're dealing with so
many types of data that you need to process so they work together. If you can do quality
analytics across myriad data sources, Chopra said, "you can write your own ticket in this day and
age."
Oh, and if you're planning to work at a startup, he added, R is almost a must-know for anyone
whose job will entail statistical analysis.
3. Embrace online learning.
If it all sounds a little daunting, don't be too worried, Chopra advised. That's because there are
plenty of opportunities to learn these new skills online via both massive open online courses
(he's particularly keen on Udacity's Computer Science 101 and Andrew Ng's machine learning
course on Coursera) and universities' own online curricula. Chopra also suggested joining
professional groups on LinkedIn, participating in Kaggle competitions and maybe even getting
out of the house by going to meetups.
Whatever you're curious about, though (text mining, natural language processing, deep
learning), you can probably find someone willing to teach you for free or nearly free, and any
additional skills will help set you apart from the crowd.
4. Learn to tell a story.
Last month at Structure: Data, DJ Patil told me that one of the biggest skill shortcomings in data
science is the ability to tell a story with data beyond just pointing to the numbers. Chopra agreed,
noting that today's new visualization tools make it easier to display data in formats that
non-scientists might be able to (or at least want to) consume. A corollary of storytelling is good,
old-fashioned communication: All the charts in the world won't make a difference if you can't
communicate to product managers or executives why your findings matter.
Pouliot is a little less sold on communication skills, though, at least sometimes. If you're an
engineer primarily talking to other engineers, he told the room, you probably can speak all the
jargon you want. It's only when someone has a business-facing role that communication really
becomes important.
5. Prepare to be tested (aka "Your pedigree means nothing").
After you've learned all these skills, added them to your résumé and talked to a hiring manager
about how good you are at them, it's likely testing time. Prospective Netflix data scientists go
through a battery of exercises, Pouliot says, including explaining projects they've worked on and
questions to determine the depth of their knowledge. They'll also be asked to devise a framework
that solves a problem of the interviewer's choice.
)hris Pouliot
One thing Pouliot warned about is an over-reliance on what's on your résumé. Right off the bat,
for example, he'll test the heck out of the skills or knowledge that someone claims to ensure they
really know it.
Having a Stanford degree and work experience at Google don't necessarily make someone a
shoo-in, either. Pouliot acknowledged during a quick chat after his presentation that he's been
seduced by the perfect resume before, even going so far as to cut a few corners to get someone
in for an interview, only to be disappointed in the end. Everyone has to pass the tests, he said,
and some of the best applicants on paper crashed and burned very early in the process.
6. Exercise creativity.
It's during the testing phase at places like Netflix that all those personal skills and experience can
come into play. There's often no right answer when it comes to answering the hypotheticals an
interviewer like Pouliot might ask, and he gives bonus points for solutions he's never seen
before. "Creativity is one of the biggest things to look for when hiring data scientists," he said.
Later, he added, "Creativity is king, I think, for a great data scientist."
Bonus tips for anyone hiring and managing data scientists
Technically, Pouliot's talk at Predictive Analytics World was about hiring data scientists, but
many of the insights were probably more valuable to aspiring data scientists. Some of them,
though, were definitely for management, possibly at the C-level. A few points to consider:
Netflix has a standalone data science team that works closely with other departments
but ultimately answers to itself. This helps the data scientists collaborate with one
another, gives them upward mobility (i.e., they might never become director of
marketing, but they could become director of data science) and makes it easier to
manage them because everyone speaks the same language so an employee knows his
boss knows his stuff.
However, he noted, the alternative approach of embedding data scientists within other
departments does bring its own benefits. That type of setup can result in a better alignment of
research efforts and business needs, and it can help products get built faster because everyone is
on the same page. Pouliot suggests one compromise might be to keep a centralized data science
team but locate it physically near the other teams it will be interacting with most often, and
another is just to ensure you have representatives from every stakeholder department present for
meetings and problem-solving exercises.
Actually, if you just cannot hire data scientists with all the skills you want them to have,
Mendelevitch from Hortonworks suggests a similar tactic. It can be difficult to teach
applied math to software engineers and vice versa, so, he writes, "[S]imply build a
Hadoop data science team that combines data engineers and applied scientists, working
in tandem to build your data products. Back when I was at Yahoo!, that's exactly the
structure we had: applied scientists working together with data engineers to build
large-scale computational advertising systems."
If you want to retain your good data scientists once you've hired them, especially in
Silicon Valley where they can walk out the door and get five offers, paying them the
market rate is a good start. Additionally, Pouliot said, letting them work on challenging
products will keep them happy. Micro-managing them will not.