I applied online and the process took 4 weeks - interviewed at LinkedIn.
Interview Details: The process was supposed to consist of two phone interviews and one on-site interview. I applied for the position on the LinkedIn career page. A recruiter contacted me the next day and scheduled my first phone screen. After the first phone screen, the recruiter contacted me about the next round of phone screening.
Interview Question: I passed the first phone screen (basic data mining questions, including the concepts of classification and clustering, and a simple DP question quite similar to "Climbing Stairs") and failed the second one, which came right after I returned from another state (basic NLP questions such as named entity extraction, basic data mining questions such as SVM and naive Bayes, and a sampling question quite similar to reservoir sampling).

Interview Details: Asked questions about SQL and data mining.
Interview Question: The questions are quite standard.

I applied online and interviewed at LinkedIn in July 2012.
Interview Details: I was first contacted by a recruiter, and two phone interviews were arranged. The first interviewer asked some basic questions about my resume, and then we moved on to the technical questions: two questions, both about searching in sorted arrays of numbers. The second interview was almost identical, except that the question was more about algorithm design and required general problem-solving skill.
Interview Question: The questions are not difficult. It is important to review basic algorithm design, to know how to talk through the interview, and to know when to ask for help.

Interview Details: Had two phone interviews with the data science group. One was more about design questions, a machine learning background check, etc. The second was strictly coding and algorithms.
Interview Questions: Implement the pow function. Segment a long string into a set of valid words using a dictionary; return false if the string cannot be segmented. What is the complexity of your solution?
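The first reviewer only says the DP question was "quite similar to Climbing Stairs". Assuming the usual statement of that problem (count the distinct ways to climb n steps when each move is 1 or 2 steps), a minimal bottom-up sketch might look like this; the class and method names are illustrative, not taken from the interview.

```java
/**
 * Illustrative sketch of the "Climbing Stairs" DP (names are hypothetical).
 * Counts the distinct ways to climb n steps taking 1 or 2 steps at a time.
 */
final class ClimbingStairs {
    static long countWays(int n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // ways(i) = ways(i - 1) + ways(i - 2), so only the last two values are kept.
        long twoBack = 1; // ways(0): the empty climb
        long oneBack = 1; // ways(1)
        for (int i = 2; i <= n; i++) {
            long current = oneBack + twoBack;
            twoBack = oneBack;
            oneBack = current;
        }
        return oneBack; // equals ways(n) for every n >= 0, since ways(0) = ways(1) = 1
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.println(n + " steps: " + countWays(n) + " ways");
        }
    }
}
```

This runs in O(n) time and O(1) space; a `long` overflows once n exceeds 91, so a larger input would need `BigInteger`.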
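The same reviewer only says the sampling question was "quite similar to reservoir sampling". Assuming the task is to draw k items uniformly at random from a stream of unknown length in one pass, a sketch of the standard approach (often called Algorithm R) could be as follows; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Illustrative reservoir sampler (Algorithm R); names are hypothetical. */
final class ReservoirSampler {
    /**
     * Returns k items chosen uniformly at random from the stream in one pass,
     * or all items if the stream has fewer than k. Assumes the stream has
     * fewer than Integer.MAX_VALUE items.
     */
    static <T> List<T> sample(Iterator<T> stream, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        int seen = 0;
        while (stream.hasNext()) {
            T item = stream.next();
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item); // the first k items fill the reservoir
            } else {
                // Pick a slot in [0, seen); the new item is kept with probability k / seen.
                int j = rng.nextInt(seen);
                if (j < k) {
                    reservoir.set(j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            data.add(i);
        }
        System.out.println(sample(data.iterator(), 5, new Random(7)));
    }
}
```

The point interviewers usually probe is that each of the n items ends up in the sample with probability k/n without knowing n in advance.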
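The third review does not say what the two sorted-array questions were. As a hedged illustration of the kind of task that commonly comes up, here is binary search written as lower and upper bounds, which answers "find the first occurrence" and "count occurrences" in O(log n); the specific tasks and names are assumptions.

```java
/** Illustrative binary-search helpers over a sorted int array (names are hypothetical). */
final class SortedSearch {
    /** First index i with a[i] >= target, or a.length if there is none. */
    static int lowerBound(int[] a, int target) {
        int lo = 0, hi = a.length; // the answer always lies in [lo, hi]
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2; // avoids overflow of (lo + hi)
            if (a[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** First index i with a[i] > target, or a.length if there is none. */
    static int upperBound(int[] a, int target) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] <= target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Index of the first occurrence of target, or -1 if it is absent. */
    static int firstIndexOf(int[] a, int target) {
        int i = lowerBound(a, target);
        return (i < a.length && a[i] == target) ? i : -1;
    }

    /** Number of occurrences of target. */
    static int count(int[] a, int target) {
        return upperBound(a, target) - lowerBound(a, target);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 2, 2, 5, 7};
        System.out.println(firstIndexOf(a, 2) + " " + count(a, 2)); // 1 3
    }
}
```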
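"Implement the pow function" comes with no signature in the review. Assuming a `Math.pow`-like signature with a double base and an int exponent, the usual answer is exponentiation by squaring, which needs O(log |n|) multiplications instead of O(|n|); the class name is hypothetical.

```java
/** Illustrative pow(x, n) via exponentiation by squaring (signature is an assumption). */
final class FastPow {
    static double pow(double x, int n) {
        long e = n; // widen first: -Integer.MIN_VALUE does not fit in an int
        if (e < 0) {
            x = 1.0 / x; // x^(-e) = (1/x)^e
            e = -e;
        }
        double result = 1.0;
        double base = x; // holds x^(2^k) after k halvings of e
        while (e > 0) {
            if ((e & 1) == 1) {
                result *= base; // this bit of the exponent is set
            }
            base *= base;
            e >>= 1;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pow(2.0, 10)); // 1024.0
        System.out.println(pow(2.0, -2)); // 0.25
    }
}
```

Good follow-up points in an interview are the negative exponent, the `Integer.MIN_VALUE` edge case, and a zero base with a negative exponent (this sketch returns `Infinity`, as `Math.pow` does).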
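For the segmentation question, one possible sketch uses dynamic programming over prefixes, assuming the dictionary is a `Set<String>` with exact lookup. It returns one segmentation or null; the "return false" variant is just `segment(s, dict) != null`. With n = s.length(), it performs O(n^2) dictionary lookups. Each substring is built and hashed in O(n), so the worst case is O(n^3) character operations (assuming a hash set with expected O(1) lookups per hashed string), with O(n) extra space for the table. Names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

/** Illustrative word-segmentation DP; names are hypothetical. */
final class WordBreak {
    /** Returns a space-separated segmentation of s into dictionary words, or null if none exists. */
    static String segment(String s, Set<String> dict) {
        int n = s.length();
        // start[i] = index where the last word of a valid segmentation of s[0, i) begins,
        // or -1 if the prefix s[0, i) cannot be segmented.
        int[] start = new int[n + 1];
        Arrays.fill(start, -1);
        start[0] = 0; // the empty prefix is trivially segmentable
        for (int end = 1; end <= n; end++) {
            for (int begin = 0; begin < end; begin++) {
                if (start[begin] != -1 && dict.contains(s.substring(begin, end))) {
                    start[end] = begin;
                    break; // any valid split is enough
                }
            }
        }
        if (start[n] == -1) {
            return null;
        }
        // Walk back from the end to recover the words in order.
        Deque<String> words = new ArrayDeque<>();
        for (int end = n; end > 0; end = start[end]) {
            words.addFirst(s.substring(start[end], end));
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("apple", "app", "pie", "pi", "le");
        System.out.println(segment("applepie", dict)); // apple pie
        System.out.println(segment("applepix", dict)); // null
    }
}
```

If the longest dictionary word has length L, bounding the inner loop to `begin >= end - L` cuts the lookups to O(nL).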
Common Analytics Interview Questions
Posted by: Sarita Digumarti on October 11, 2013 in Articles

You are excited. You have got that much-awaited interview call for that dream analytics job. You are confident you will be perfect for the job. Now all that remains is convincing the interviewer. Don't you wish you knew what kind of questions they are going to ask? As co-founder and one of the chief trainers at Jigsaw Academy, an online analytics training institute, I regularly get calls from our students days before their scheduled interview asking me just this. I am going to share with you just what I share with them. Here you go: below are a few of the more popular questions you could get asked, and the corresponding answers in a nutshell.

Question 1. Can you outline the various steps in an analytics project?
Broadly speaking, these are the steps. Of course, these may vary slightly depending on the type of problem, data, tools available, etc.
1. Problem definition: The first step is, of course, to understand the business problem. What is the problem you are trying to solve, and what is the business context? Very often, however, your client may just give you a whole lot of data and ask you to do something with it. In such a case you would need to take a more exploratory look at the data. Nevertheless, if the client has a specific problem that needs to be tackled, then the first step is to clearly define and understand the problem. You will then need to convert the business problem into an analytics problem. In other words, you need to understand exactly what you are going to predict with the model you build. There is no point in building a fabulous model, only to realise later that what it is predicting is not exactly what the business needs.
2. Data Exploration: Once you have the problem defined, the next step is to explore the data and become more familiar with it. This is especially important when dealing with a completely new data set.
3. Data Preparation: Now that you have a good understanding of the data, you will need to prepare it for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required, and so on. This stage is heavily influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less prep, and clustering requires a whole different kind of prep compared to other techniques.
4. Modelling: Once the data is prepared, you can begin modelling. This is usually an iterative process where you run a model, evaluate the results, tweak your approach, run another model, evaluate the results, re-tweak, and so on. You go on doing this until you come up with a model you are satisfied with, or what you feel is the best possible result with the given data.
5. Validation: The final model (or maybe the best 2-3 models) should then be put through the validation process. In this process, you test the model using a completely new data set, i.e. data that was not used to build the model. This ensures that your model is a good model in general and not just a very good model for the specific data used earlier. (Technically, this is called avoiding overfitting.)
6. Implementation and tracking: The final model is chosen after validation. Then you start implementing the model and tracking the results. You need to track results to see the performance of the model over time. In general, the accuracy of a model goes down over time. How quickly really depends on the variables (how dynamic or static they are) and on the general environment (how static or dynamic that is).
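Step 5 only requires that the validation data was not used to build the model. A minimal sketch of a reproducible random holdout split, assuming the records already sit in a Java `List` and assuming a 70/30 ratio (the article gives no ratio); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative random holdout split; names and the ratio are assumptions. */
final class HoldoutSplit<T> {
    final List<T> training;   // used to build (fit) the model
    final List<T> validation; // kept aside to check the model on unseen data

    private HoldoutSplit(List<T> training, List<T> validation) {
        this.training = training;
        this.validation = validation;
    }

    static <E> HoldoutSplit<E> of(List<E> rows, double trainFraction, long seed) {
        if (trainFraction < 0.0 || trainFraction > 1.0) {
            throw new IllegalArgumentException("trainFraction must be in [0, 1]");
        }
        List<E> shuffled = new ArrayList<>(rows); // do not reorder the caller's list
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed: same split every run
        int cut = (int) Math.round(trainFraction * shuffled.size());
        return new HoldoutSplit<>(
                new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        HoldoutSplit<Integer> split = HoldoutSplit.of(rows, 0.7, 42L);
        System.out.println("training=" + split.training + " validation=" + split.validation);
    }
}
```

For time-dependent data (see step 6), a split by date, such as training on earlier months and validating on later ones, is often a more honest check than a random shuffle.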
Question 2. What do you do in data exploration?
Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step:
a. What is there in the data? Look at the list of all the variables in the data set. Understand the meaning of each variable using the data dictionary. Go back to the business for more information in case of any confusion.
b. How much data is there? Look at the volume of the data (how many records) and at the time frame of the data (last 3 months, last 6 months, etc.).
c. Quality of the data: How much information is missing, and what is the quality of the data in each variable? Are all fields usable? If a field has data for only 10% of the observations, then maybe that field is not usable.
d. You will also identify some important variables and may investigate them more deeply, for example by looking at averages, minimum and maximum values, and perhaps the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data preparation stage.
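The checks in (b) to (d) are easy to script. A minimal sketch for one numeric column, assuming missing values arrive as `null` and using the nearest-rank definition of a percentile (tools differ in how they interpolate percentiles, so results may not match a given package exactly); the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative profile of one numeric column; names are hypothetical. */
final class ColumnProfile {
    final int records;         // total rows, including missing ones
    final double missingShare; // fraction of rows with no value
    final double mean, min, max, p10, p90;

    private ColumnProfile(int records, double missingShare, double mean,
                          double min, double max, double p10, double p90) {
        this.records = records;
        this.missingShare = missingShare;
        this.mean = mean;
        this.min = min;
        this.max = max;
        this.p10 = p10;
        this.p90 = p90;
    }

    /** Nearest-rank percentile of a sorted, non-empty list, for 0 < p <= 100. */
    static double percentile(List<Double> sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.size() / 100.0); // 1-based rank
        return sorted.get(Math.max(rank, 1) - 1);
    }

    static ColumnProfile of(List<Double> column) {
        List<Double> present = new ArrayList<>();
        for (Double v : column) {
            if (v != null) {
                present.add(v);
            }
        }
        if (present.isEmpty()) {
            throw new IllegalArgumentException("column has no observed values");
        }
        Collections.sort(present);
        double sum = 0.0;
        for (double v : present) {
            sum += v;
        }
        return new ColumnProfile(
                column.size(),
                (column.size() - present.size()) / (double) column.size(),
                sum / present.size(),
                present.get(0),
                present.get(present.size() - 1),
                percentile(present, 10),
                percentile(present, 90));
    }
}
```

A `missingShare` near 0.9 is the "data for only 10% of the observations" case from point (c).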
Question 3. What do you do in data preparation?
In data preparation, you prepare the data for the next stage, i.e. the modelling stage. What you do here is influenced by the choice of technique you use in the next stage. But some things are done in most cases: for example, identifying missing values and treating them, identifying outlier values (unusual values) and treating them, transforming variables, creating binary variables if required, etc. This is also the stage where you partition the data, i.e. create training data (for modelling) and validation data (for validation).
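Creating binary variables usually means turning a categorical field into 0/1 indicator columns. A minimal sketch, assuming missing values have already been treated and assuming the common regression convention of dropping one reference level (here the alphabetically first one) so that the indicators are not perfectly collinear; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative dummy (0/1 indicator) coding; names and conventions are assumptions. */
final class DummyCoding {
    /** Distinct levels of the column, sorted; the first one acts as the reference level. */
    static List<String> levels(List<String> column) {
        return new ArrayList<>(new TreeSet<>(column)); // TreeSet rejects nulls: treat missing values first
    }

    /**
     * One row per record and one 0/1 column per non-reference level
     * (column j stands for levels.get(j + 1)); the reference level is all zeros.
     */
    static int[][] encode(List<String> column, List<String> levels) {
        int width = Math.max(levels.size() - 1, 0);
        int[][] out = new int[column.size()][width];
        for (int r = 0; r < column.size(); r++) {
            int idx = levels.indexOf(column.get(r));
            if (idx < 0) {
                throw new IllegalArgumentException("unknown level: " + column.get(r));
            }
            if (idx > 0) {
                out[r][idx - 1] = 1;
            }
        }
        return out;
    }
}
```

Computing `levels` on the training data only and reusing it for the validation data keeps both partitions coded the same way.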
Question 4. How will you treat missing values?
The first step is to identify variables with missing values and assess the extent of the missing values. Is there a pattern in the missing values? If yes, try to identify the pattern; it may lead to interesting insights. If there is no pattern, then we can either ignore missing values (SAS will not use any observation with missing data) or impute the missing values:
Simple imputation: substitute with mean or median values, OR
Case-wise imputation: for example, if we have missing values in the income field.
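A minimal sketch of the simple-imputation option, assuming a numeric column whose missing values arrive as `null`. It uses the median, which is less affected by outliers than the mean; switching to the mean would be a one-line change. It also counts the missing entries, since the first step is to assess their extent. Names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative median imputation; names are hypothetical. */
final class MedianImputer {
    /** Number of missing (null) entries, to assess the extent of missingness first. */
    static long missingCount(List<Double> column) {
        return column.stream().filter(v -> v == null).count();
    }

    /** Returns a copy of the column with every null replaced by the median of the observed values. */
    static List<Double> impute(List<Double> column) {
        List<Double> observed = new ArrayList<>();
        for (Double v : column) {
            if (v != null) {
                observed.add(v);
            }
        }
        if (observed.isEmpty()) {
            throw new IllegalArgumentException("no observed values to impute from");
        }
        Collections.sort(observed);
        int n = observed.size();
        double median = (n % 2 == 1)
                ? observed.get(n / 2)
                : (observed.get(n / 2 - 1) + observed.get(n / 2)) / 2.0;
        List<Double> out = new ArrayList<>(column.size());
        for (Double v : column) {
            out.add(v == null ? median : v);
        }
        return out;
    }
}
```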
Question 5. How will you treat outlier values?
You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values. If there is a lot of data, you may decide to ignore records with outliers. Not all extreme values are outliers, and not all outliers are extreme values.
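Substituting outliers with the 1st and 99th percentile values is often called capping or winsorizing. A minimal sketch, assuming a plain `double[]` column with no missing values and using nearest-rank percentiles (other percentile definitions give slightly different cut-offs). As the answer notes, it is worth checking whether extreme values are genuine before capping them. Names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative percentile capping (winsorizing); names are hypothetical. */
final class PercentileCapper {
    /** Nearest-rank percentile of a sorted, non-empty array, for 0 < p <= 100. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Returns a copy in which values below the lowPct percentile are raised to it
     * and values above the highPct percentile are lowered to it.
     */
    static double[] cap(double[] values, double lowPct, double highPct) {
        if (values.length == 0) {
            return new double[0];
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double lo = percentile(sorted, lowPct);
        double hi = percentile(sorted, highPct);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = Math.min(Math.max(values[i], lo), hi); // clamp into [lo, hi]
        }
        return out;
    }
}
```

Calling `cap(values, 1, 99)` applies the 1st/99th percentile rule from the answer; the original array is left untouched.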
Question 6. How do you assess the results of a logistic regression analysis?
You can use different methods to assess how good a logistic model is:
a. Concordance: This tells you about the ability of the model to discriminate between the event happening and not happening.
b. Lift: It helps you assess how much better the model is compared to random selection.
c. Classification matrix: It helps you look at the false positives and true negatives.

Some other general questions you will most likely be asked:
What have you done to improve your data analytics knowledge in the past year?
What are your career goals?
Why do you want a career in data analytics?
The answers to these questions will have to be unique to the person answering them. The key is to show confidence and give well-thought-out answers that demonstrate you are knowledgeable about the industry and have the conviction to work hard and excel as a data analyst.

The Top 5 Questions A Data Scientist Should Ask During a Job Interview
Posted on July 29, 2013 by Sean Murphy

The data science job market is hot, and an incredible number of companies, large and small, are advertising a desperate need for talent. Before jumping on the first 6-figure offer you get, it would be wise to ask the penetrating questions below to make sure that the seemingly golden opportunity in front of you isn't actually pyrite.

1) Do they have data?
You might get a good laugh at this one and probably assume that the company interviewing you must have data, as they are interviewing you (a data scientist). However, you know what they say about ass-u-ming, right? If the company tells you that the data is coming (similar to "the check is in the mail"), start asking a lot more questions. Ask if the needed data-sharing agreements have been signed, and even ask to see them. If not, ask what the backup plan is if (or when) the data does not arrive. Trust me, it always takes longer than everyone thinks.
To be an entrepreneur means to be an optimist at some level, because otherwise no one would do something with such a low probability of success. Thus, it is pretty easy for an entrepreneur to assume that getting data will not be that hard. Only after months of stalled negotiations and several failures will they give up on getting the data or, in startup parlance, pivot. In the meantime, you had best figure out some other ways of being useful and creating value for your new organization.

2) Who will you report to and what is her or his background?
So, really what you are asking is: does the person who will claim me as a minion actually have experience with data, and do they understand the amount of time that wrangling data can take? If you are reporting to a Management/Executive type, this question is all-important, and your very survival likely depends on the answer. First, go read the Gervais Principle at ribbonfarm. From my experience, the ideas aren't too far off the mark. Second, many data-related tasks are conceptually trivial. However, these tasks can take an amount of time seemingly inversely proportional to their simplicity. Or, even worse, something that is conceptually very simple may be mathematically or statistically very challenging, or require many difficult and time-consuming steps. Something like counting the number of tweets for or against a particular topic is trivial for people but less so for algorithms. Further, as everyone knows, data wrangling on any project can consume 80% or more of the total project time and, unless that manager has worked with data, she or he may not understand this reality. The rule of thumb to never forget is that if someone does not understand something, that person will almost always underappreciate it. I swear there must be a class in American MBA programs that teaches that if you don't understand something, it must be simple and only take five minutes.
If you are reporting to a CTO-type, the situation may seem better, but it actually might be worse. Software engineering and development do not equal data science. Technical experience, most of the time, does not equal data experience. Having gone through a few semesters of calculus does not a statistics background make. Hopefully, I have made my point. There is a reason we call one field software engineering (nice and predictable) and the other data science (conducting experiments to test hypotheses). However, many technically oriented people may believe they know more than they actually do. The short version for #2 is that time expectations are important to flesh out up front and are highly dependent on your boss's background. Third, your communications strategy will change radically depending on your boss's background. Do they want the sordid details of how you worked through the data, or do they just want the bottom-line impact?

3) How will my progress and/or performance be measured?
Knowing how to succeed in your new workplace is pretty important, and the expectations surrounding data science are stratospheric at the moment. Keep your eyes peeled for a good quick win available for you to demonstrate your value (and this is a question that I would directly ask). The giant red flag here is if you will be included in an "agile" software process, with data work shoehorned into short-term sprints along with the engineering or development team. Data science is science, and many tasks will often have you dealing with the dreaded unknown unknown. In other words, you are exploring terra incognita, a process that is unpredictable at best. Managing data scientists is very different from managing software engineers.

4) How many other data scientists/practitioners will you be working with, and how many are in the company overall?
What you are trying to understand here is how data-driven (versus ego-driven) the company that you are thinking of joining is.
If the company has existed for more than a few years and has few data science or analyst types, it is probably ego-driven. Put another way, decisions are made by the HiPPOs (the Highest Paid Person's Opinions). If your data analyses are going to be used for internal decision making, this possibly puts you, the new hire, directly against the HiPPOs. Guess who will win that fight? If you are going into this position, make sure you will be arming the HiPPO with knowledge as opposed to fighting directly against other HiPPOs.

5) Has anyone ever run analyses on the company's data?
This one is critical if you will be doing any type of retrospective analysis based on previously collected data. If you simply ask the company if they have ever looked at their data, the answer is often yes regardless of whether or not they have, as most companies don't want to admit that they haven't. Instead, ask what types of analyses the company has done on its data, whether the examination covered all of the company's data, and who did the work (being careful to inquire about this person's background and credentials). The reason this line of questioning is so important is that the first time you plumb the depths of a company's database, you are likely to dig up some skeletons. And by likely I really mean certainly. In fact, going through historically collected data is much like an archeological excavation. As you go further back into the database, you go through deeper layers of the history of the organization and will learn much. You might find out when they changed contractors, or when they decided to stop collecting a particular field that you just happen to need. You might see when the servers went down for a day, or when a particularly well-hidden bug prevented database writes for a few weeks. The important point here is that you might uncover issues that some people still present in the company would prefer not to be unearthed. My simple advice: tread lightly.
Nailing the Tech Interview
Jessica Kirkpatrick is the Director of Data Science at InstaEDU, and formerly a data scientist on the analytics team at Yammer (Microsoft). Before that she was an astrophysicist at UC Berkeley, and she has also been an Insight mentor since the program's founding. Below is a guest post, originally appearing on the Women in Astronomy blog, where Jessica shares her tips on doing well in technical job interviews.

A year ago, I made the transition from astrophysicist to data scientist. One of the harder parts of making the transition was convincing a tech company (during the interview process) that I could do the job. Having now been on both sides of the interview table, I'd like to share some advice with those wishing to break into the tech/data science industry. While this advice is applicable to candidates in general, I'm going to be gearing it towards applicants coming from academia/PhD programs.

Most tech companies are interested in smart, talented people who can learn quickly and have good problem-solving skills. We see academics as having these skills. Therefore, if you apply for internships or jobs at tech companies, you will most likely get a response from a recruiter. The problem is that once you get an interview, there are a lot of industry-specific skills that the company will try to assess, skills that you may or may not already have. Below are some of the traits we look for when recruiting for the Yammer analytics/data team, descriptions of how we try to determine if a candidate has these traits, and what you should do to 'nail' this aspect of the interview.

1. Interest in the Position
This sounds like a no-brainer, but you would be surprised at how many candidates haven't done proper research about the company or the position. It is especially important for people coming from academic backgrounds to demonstrate why they are interested in making this transition and why they are specifically interested in this opportunity.
When I ask a candidate "Why are you interested in joining my team?" I often get responses like "I really want to move to San Francisco" or "I'm sick of my research." Neither of these responses demonstrates specific interest in my team or my company.
How to Nail It: Do research about the position you are applying for. Understand what the role entails, the company's goals and priorities, and the product(s) that you will be working on. Have a convincing story for why you are making this career change or why you want to leave your current position. Show enthusiasm for the opportunity: every interviewer should think that their position is your number one choice and that you can't wait to join their team. More importantly, only apply for roles that you genuinely find interesting.

2. Excellent Problem Solving Skills
One of the most challenging aspects of the analyst/data scientist role is taking a vague question posed by someone within the company and figuring out how best to answer it using our data sets. Testing (and demonstrating) this skill in an interview is very difficult. At Yammer we try to test this skill by asking a combination of open-ended problems, brain teasers, and scenarios similar to those we deal with on a regular basis. For many of these questions there isn't a right or wrong answer; we are more interested in the way the candidate constrains the problem, how she articulates her thought process, and how efficiently she gets to a solution. For some data science positions you will be asked to do coding problems, so familiarize yourself with some of the standard coding algorithms and questions.
How to Nail It: These types of problems are asked by many tech companies, and there are plenty of examples of them on the web. Practice constraining the problem, coming up with a clear game plan, articulating that plan, and then following through in a methodical way.
Many problems are hard to answer as posed, so trying simpler versions of the problem or looking at edge cases can give you insight into how to find patterns. Sometimes not all the relevant information is given by the interviewer; don't be afraid to ask clarifying questions or to turn the process into a discussion. If the interviewer tries to give you hints or tips, take them. There is nothing more frustrating (as an interviewer) than trying to guide a candidate back on track and having her ignore your help (it also doesn't bode well for the interviewee's ability to work well with others).

3. Communication Skills
As I said in a previous post, communication is key. We are looking for someone who can clearly articulate her thought process and can be convincing that her approach is correct, even when being challenged by the interviewer. A standard way we test this is by posing an open-ended question: when the interviewee gives a reasonable answer, we give a reason why that isn't right, then she comes up with a different explanation, and we negate it again. We keep going to see how she deals with having to switch directions, and how she balances defending her answer with being flexible about taking the interviewer's suggestions.
How to Nail It: Practice articulating your approach and methods for the 'technical' interview questions above. Come up with a big-picture game plan for approaching the problem, be clear about that plan, have a methodical approach, and then execute it, all the while articulating your thought process as much as possible. If the interviewer tries to make you change directions, it's OK to defend your approach, but you don't want to be too rigid; they might be trying to help you avoid going down the wrong path. Try to make the interaction as pleasant and warm as possible. Avoid getting defensive, frustrated, or just giving up. It is a very hard balance, but practice (especially with another person who can give you feedback) makes perfect.

4. Culture Fit
In tech companies you work collaboratively on projects in tight-knit teams. We are going to be spending a lot of time with a candidate if we hire her; we want to enjoy that time together. Therefore we are also trying to assess whether the interviewee would be a good coworker at an interpersonal level. Is she friendly? Does she work well on teams? Does she have the right balance of being opinionated but not domineering? Is she an interesting person? What are her passions and goals? I can't tell you how many times I've asked a candidate "What do you like to do for fun?" and they answer: "I like to read programming books." Is that really what you like to do for fun? Or do you just think that is what you are supposed to say in a tech interview?
How to Nail It: Remember that your interviewer is a person too, and interact with them as a person. Try to show some of your personality, passion, sense of humor, and uniqueness in the interview. It's hard to be relaxed in these situations, but personality goes a long way.

5. Ask Good Questions
At the end of the interview you will typically have a chance to ask questions. This is your time to take control of the process and turn the tables on the interviewer. Sometimes I learn the most about a candidate from how she uses this portion of the interview. An interviewee I am on the fence about can really tip the decision one way or the other by asking intelligent, thought-provoking, and engaging questions at the end of an interview (or boring, uninformed, or generic ones).
How to Nail It: Use this as an opportunity to communicate things you weren't able to show in other parts of the interview. Demonstrate that you have researched the company and that you understand their business goals and the way you could contribute. Ask thoughtful questions about the role, demonstrate that you want something that is challenging, and discuss the types of skills you want to learn or apply.
Use this opportunity to show the interviewer what skills you can bring to the role. If applicable, try to relate what you are learning about the job/company to what you've done in the past. Prepare tons of questions, write them down ahead of time, and bring them to the interview. You shouldn't run out of questions or have to repeat them over the course of the day.

The above is by no means an exhaustive list of everything a tech company is looking for, and of course different companies have different approaches. When I interviewed for my current job (I recently moved to the education start-up InstaEDU), most of the interview involved discussing my previous projects, the problems that the company was facing, and how I could provide value to them as a data scientist. Interviewing for my second job in the tech industry was a very different experience from interviewing for my first. However, I do hope that the above demystifies the tech interview process, gives you insight into how one company goes about hiring data people, and helps you understand what we are looking for on the other side of the table.

The Business Analyst job description may vary from one company to another. The job requirements of a person filling a business analyst position depend on the nature of a given company's business. Therefore, each Business Analyst job interview can be completely different.
Note: a Business Analyst is sometimes referred to as a System Analyst, Engineers Analyst or IT business analyst.
This article provides samples of job interview questions for a business analyst position. In general, the business analyst job description is:
Effectively translate business needs into applications and operations. Challenge cross-company units and provide the requirements for the R&D team. Use cases, surveys/scenario analysis and workflow analysis; evaluate information; act as a focal point for internal and external customers; define users' needs and convert them into business cases.
The skills requirements for a business analyst are:
Leadership
Decision making
Conflict resolution
Presentation skills
Excellent verbal and written communication skills
Interpersonal communication skills
Analytical thinking and negotiation

Sample Business Analyst Interview Questions
1. Describe your responsibilities as a business analyst in your last job.
2. What are the BI (Business Intelligence) reporting tools you use for a given project? Describe the project, the BI tool/s and the report extracted.
3. How do you select which BI tool to implement?
4. How do you decide on the report frequency, update frequency and user needs based on the objectives you want to achieve for that report? Refer to BI tools such as Cognos, Discoverer, Business Objects and Crystal Reports.
5. Can you list the desired skills needed to perform effectively as a Business Analyst?
6. What types of modeling requirements are used in the business applications of the analyst?
7. If two companies are merging, explain which tasks you would implement, and how, to ensure a successful union.
8. Have you been responsible for assigning tasks to testers? How were you required to integrate the results found? How do you coordinate these responsibilities with the team and your management?
9. Can you explain the term "push back" in relation to business users? What does this mean to you?
10. When working on a project, at what point would the requirements of a Traceability Matrix be implemented, and for what purpose?
11. For process testing, can you explain the role of the Business Analyst?
12. When working with specific document requirements, can you explain or define the steps to create Use Cases?
13. At what point of a project is the Use Case system complete? What are the next steps in the project phase?

Some other technical questions that may be asked during the job interview include:
Technical terms and technical questions
Some technical terms that may be used by the interviewer to verify your knowledge and competencies would be:
UML modeling
GAP analysis
SDLC methodologies
Traceability Matrix
RUP (Rational Unified Process) implementation
UI designs and UI design patterns
System Design Document (SDD)
Requirement Management Tool, Requirements Modeling
Use Case and Test Case
Risk Management
Business plan
Data mapping
Black box testing and white box testing
Push back from business users
Waterfall method and prototyping model, and their hybrid
Interface/integration mapping
Functional requirements: FSD (Functional Spec document) or FRS/MRD
End user support and user acceptance testing (UAT)
Validation of the requirements
Determining ROI, cash flow, break-even, fixed/variable costs and sale price

Answer: This is your time to review your knowledge of the business terms above. Make sure you have a good understanding of some of these buzzwords of the industry you are interested in. Read the job description in the job opening again, and if you find special requirements, prepare yourself to answer related questions. If you have expertise in several specific areas, find the right time during the interview to speak about the professional expertise that you gained in your career.

How to Face an Interview for Data Analyst
Facing a job interview for a data analyst position, sometimes referred to as a statistician position, can be intimidating. Analysts often have to evaluate, sort and report on data that is incomplete or erroneous, so an interviewer will likely ask how you handle those assignments. Don't get rattled by tough questions. Stay positive and use personal examples from previous projects to support your skills and experience.

Data-Gathering Experience
Data analysts are often responsible for gathering and compiling data from various reputable sources before making evaluations, drawing conclusions and issuing reports.
Expect the hiring manager to ask questions like, "How do you go about collecting information to support your analyses?" or, "What types of data have you researched and analyzed in the past?" The employer might need data analyses to create new advertising strategies, prepare short- and long-term finance budgets, or determine which company products are most profitable. Answer data-collecting questions with specific examples of how you successfully used group samples, conducted market research, reviewed financial reports or analyzed surveys to make fair and consistent assessments.

Validity of Data
Data isn't always accurate, complete, understandable, consistent, predictable or beneficial to meeting a company's goals, so expect interview questions about your methods for verifying and validating information. You might discuss ways you take averages, find medians, double-check questionable entries, find alternative research to support your findings or consult specialists. Most importantly, you want to show the interviewer that you are an effective problem-solver, troubleshooter and decision-maker, so she has no reason to question your skills or capabilities.

Software
The hiring manager will likely ask about your computer skills and experience using analytic software. Data analysts process collected data and reach conclusions with the help of computer software, according to the U.S. Bureau of Labor Statistics. Discuss any experience you've had with statistical software, such as Stata, RStudio, PSPP or GMDH Shell. If most of your previous work has been with inter-office spreadsheets or Microsoft Excel files, assure the interviewer that you are proficient with those types of data files and would be willing to learn any new software programs necessary for the job.

Communication and Presentation Skills
Data analysts must communicate results, findings and future goals using visual aids, such as charts, graphs and infographics.
The interviewer will likely ask, "What are your communication strengths?" or, "Explain how you organize and create presentations to report analytical findings." Answer these questions with specific examples of presentations, reports and seminars you've created or hosted. The interviewer wants assurance that you have the people skills and interpersonal strengths to effectively relay your analyses and results.

References
U.S. Bureau of Labor Statistics: Statistician
Forbes: Analytics Is Fast Becoming a Core Competency for Business Professionals
Psychology Today: Secrets to a Successful Job Interview

About the Author
Kristine Tucker has been writing articles on finance, politics, humanities and interior design since 2001. Her articles have been featured in many online publications. Tucker's experience as an English teacher has given her the opportunity to read many wonderful masterpieces. She holds a degree in political science with a minor in international studies.
Retiring a Great Interview Problem
August 8th, 2011 | 105 Comments | General
Interviewing software engineers is hard. Jeff Atwood bemoans how difficult it is to find candidates who can write code. The tech press sporadically publishes "best" interview questions that make me cringe, though I love the IKEA question. Startups like Codility and Interview Street see this challenge as an opportunity, offering hiring managers the prospect of outsourcing their coding interviews. Meanwhile, Diego Basch and others are urging us to stop subjecting candidates to whiteboard coding exercises.
I don't have a silver bullet to offer. I agree that IQ tests and gotcha questions are a terrible way to assess software engineering candidates. At best, they test only one desirable attribute; at worst, they are a crapshoot as to whether a candidate has seen a similar problem or stumbles into the key insight. Coding questions are a much better tool for assessing people whose day job will be coding, but conventional interviews, whether by phone or in person, are a suboptimal way to test coding strength. Also, it's not clear whether a coding question should assess problem-solving, pure translation of a solution into working code, or both.
In the face of all of these challenges, I came up with an interview problem that has served me and others well for a few years at Endeca, Google, and LinkedIn. It is with a heavy heart that I retire it, for reasons I'll discuss at the end of the post. But first let me describe the problem and explain why it has been so effective.
The Problem
I call it the "word break" problem and describe it as follows:
Given an input string and a dictionary of words, segment the input string into a space-separated sequence of dictionary words if possible. For example, if the input string is "applepie" and dictionary contains a standard set of English words, then we would return the string "apple pie" as output.
Note that I've deliberately left some aspects of this problem vague or underspecified, giving the candidate an opportunity to flesh them out. Here are examples of questions a candidate might ask, and how I would answer them:
Q: What if the input string is already a word in the dictionary?
A: A single word is a special case of a space-separated sequence of words.
Q: Should I only consider segmentations into two words?
A: No, but start with that case if it's easier.
Q: What if the input string cannot be segmented into a sequence of words in the dictionary?
A: Then return null or something equivalent.
Q: What about stemming, spelling correction, etc.?
A: Just segment the exact input string into a sequence of exact words in the dictionary.
Q: What if there are multiple valid segmentations?
A: Just return any valid segmentation if there is one.
Q: I'm thinking of implementing the dictionary as a trie, suffix tree, Fibonacci heap, ...
A: You don't need to implement the dictionary. Just assume access to a reasonable implementation.
Q: What operations does the dictionary support?
A: Exact string lookup. That's all you need.
Q: How big is the dictionary?
A: Assume it's much bigger than the input string, but that it fits in memory.
Seeing how a candidate negotiates these details is instructive: it offers you a sense of the candidate's communication skills and attention to detail, not to mention the candidate's basic understanding of data structures and algorithms.
A FizzBuzz Solution
Enough with the problem specification and on to the solution. Some candidates start with the simplified version of the problem that only considers segmentations into two words. I consider this a FizzBuzz problem, and I expect any competent software engineer to produce the equivalent of the following in their programming language of choice. I'll use Java in my example solutions.

  String SegmentString(String input, Set<String> dict) {
    int len = input.length();
    for (int i = 1; i < len; i++) {
      String prefix = input.substring(0, i);
      if (dict.contains(prefix)) {
        String suffix = input.substring(i, len);
        if (dict.contains(suffix)) {
          return prefix + " " + suffix;
        }
      }
    }
    return null;
  }

I have interviewed candidates who could not produce the above, including candidates who had passed a technical phone screen at Google. As Jeff Atwood says, FizzBuzz problems are a great way to keep interviewers from wasting their time interviewing programmers who can't program.
A General Solution
Of course, the more interesting problem is the general case, where the input string may be segmented into any number of dictionary words. There are a number of ways to approach this problem, but the most straightforward is recursive backtracking. Here is a typical solution that builds on the previous one:

  String SegmentString(String input, Set<String> dict) {
    if (dict.contains(input)) return input;
    int len = input.length();
    for (int i = 1; i < len; i++) {
      String prefix = input.substring(0, i);
      if (dict.contains(prefix)) {
        String suffix = input.substring(i, len);
        String segSuffix = SegmentString(suffix, dict);
        if (segSuffix != null) {
          return prefix + " " + segSuffix;
        }
      }
    }
    return null;
  }

Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I'm sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale.
Analyzing the Running Time
But wait, there's more! When a candidate does arrive at a solution like the above, I ask for a big O analysis of its worst-case running time as a function of n, the length of the input string. I've heard candidates respond with everything from O(n) to O(n!). I typically offer the following hint:
Consider a pathological dictionary containing the words "a", "aa", "aaa", ..., i.e., words composed solely of the letter 'a'. What happens when the input string is a sequence of n-1 'a's followed by a 'b'?
Hopefully the candidate can figure out that the recursive backtracking solution will explore every possible segmentation of this input string, which reduces the analysis to determining the number of possible segmentations. I leave it as an exercise to the reader (with this hint) to determine that this number is O(2^n).
An Efficient Solution
If a candidate gets this far, I ask if it is possible to do better than O(2^n). Most candidates realize this is a loaded question, and strong ones recognize the opportunity to apply dynamic programming or memoization. Here is a solution using memoization:

  Map<String, String> memoized = new HashMap<String, String>();

  String SegmentString(String input, Set<String> dict) {
    if (dict.contains(input)) return input;
    if (memoized.containsKey(input)) {
      return memoized.get(input);
    }
    int len = input.length();
    for (int i = 1; i < len; i++) {
      String prefix = input.substring(0, i);
      if (dict.contains(prefix)) {
        String suffix = input.substring(i, len);
        String segSuffix = SegmentString(suffix, dict);
        if (segSuffix != null) {
          memoized.put(input, prefix + " " + segSuffix);
          return prefix + " " + segSuffix;
        }
      }
    }
    memoized.put(input, null);
    return null;
  }

Again, the candidate should be able to perform the worst-case analysis. The key insight is that SegmentString is only called on suffixes of the original input string, and that there are only O(n) suffixes. I leave as an exercise to the reader to determine that the worst-case running time of the memoized solution above is O(n^2), assuming that the substring operation only requires constant time (a discussion which itself makes for an interesting tangent).
Why I Love This Problem
There are lots of reasons I love this problem. I'll enumerate a few:
- It is a real problem that came up in the course of developing production software. I developed Endeca's original implementation for rewriting search queries, and this problem came up in the context of spelling correction and thesaurus expansion.
- It does not require any specialized knowledge: just strings, sets, maps, recursion, and a simple application of dynamic programming / memoization. Basics that are covered in a first- or second-year undergraduate course in computer science.
- The code is non-trivial but compact enough to use under the tight conditions of a 45-minute interview, whether in person or over the phone using a tool like Collabedit.
- The problem is challenging, but it isn't a gotcha problem. Rather, it requires a methodical analysis of the problem and the application of basic computer science tools.
- The candidate's performance on the problem isn't binary. The worst candidates don't even manage to implement the fizzbuzz solution in 45 minutes. The best implement a memoized solution in 10 minutes, allowing you to make the problem even more interesting, e.g., asking how they would handle a dictionary too large to fit in main memory. Most candidates perform somewhere in the middle.
Happy Retirement
Unfortunately, all good things come to an end. I recently discovered that a candidate posted this problem on Glassdoor. The solution posted there hardly goes into the level of detail I've provided in this post, but I decided that a problem this good deserved to retire in style.
It's hard to come up with good interview problems, and it's also hard to keep secrets. The secret may be to keep fewer secrets. An ideal interview question is one for which advance knowledge has limited value. I'm working with my colleagues on such an approach. Naturally, I'll share more if and when we deploy it.
In the meantime, I hope that everyone who experienced the word break problem appreciated it as a worthy test of their skills.
No problem is perfect, nor can performance on a single interview question ever be a perfect predictor of how well a candidate will perform as an engineer. Still, this one was pretty good, and I know that a bunch of us will miss it.
105 responses so far:
1 rp1 // Aug 8, 2011 at 2:59 am
Why is this not just n lookups? Start with one character. If that fails, look up the first two, etc. When a lookup succeeds, insert a space and set the string to start at the next unexamined character. Thus "aaaaa" becomes "a a a a a". Works great with a trie.
2 facepalm // Aug 8, 2011 at 3:26 am
rp1, you'd fail the interview. The author already explained why in the article.
3 binarysolo // Aug 8, 2011 at 3:40 am
rp1: You need to consider cases where a word is composed of a valid word base with a suffix that is not a standalone word. Say, valid word + an invalid word, such as "shorter" -> short + er = (valid) + (invalid). Recursion just makes the most sense, as the author wrote: reducing the problem by accounting for the string from the back. As he points out, the bad case is when you have tons of viable character combinations that remain workable with prior chars in combinations, and one persistent incompatibility that forces the exploration of the entire space of solutions. Er, I'm not articulating it well; just read his answer, which explains it a lot better.
4 Grant Husbands // Aug 8, 2011 at 3:43 am
@rp1, you're failing to think about backtracking; imagine the input is "aaaaa".
5 Seems easy // Aug 8, 2011 at 3:44 am
rp1's solution seems perfectly fine to me. The author did not explain why I couldn't use it. If you think differently, explain why.
6 Rick Williams // Aug 8, 2011 at 3:46 am
You make "interview candidates" sign NDAs in order to interview. That's amazing. Mind sharing the text of the NDA? I'd like to see what it covers. Thanks in advance.
7 binarysolo // Aug 8, 2011 at 3:46 am
Incidentally, I'm kinda surprised this is considered a worthy-enough interview prob.
I'm not a programmer by trade, though I enjoyed the CS106x+107 sequence at a li'l school near Daniel's location, and would think that a freshman with some amount of CS thinking/experience would trivially solve this.
8 zui // Aug 8, 2011 at 3:54 am
The main issue with such interviews is that 50% of the candidates have trouble answering those under stress, no matter how simple. I know it's an issue that's hard to solve. Interviewers don't care about finding the proper candidate. They care that a candidate passed tests, so they can't be blamed if the candidate does not perform well enough at work. Interviewers care about a level of assurance, if you prefer.
Hey, it's not as bad as it sounds; the candidate is indeed likely to fit the position. Will he really fit, will he like it, and is he actually good at solving "new" problems or just at solving classic interview quizzes? Who cares; that's a risk the interviewer is willing to take.
A true interviewer, that is, someone whose main concern is that the candidate is going to be helping out the company, is not going to give all these stupid questions the same way. A true interviewer, while he has a lot of questions and steps prepared, will not serve them to the candidate. He will get to know the candidate in a short amount of time and ask the right questions, ones which may not even have been prepared before. A true interviewer sees the challenge in every candidate he interviews, and not just the opposite (where the candidate is challenged by the interviewer's questions). A true interviewer disregards the crap (hello, best-questions list) and focuses on what matters.
Finally, new talents in the IT world are HARD to come by, so there's a lot of competition to recruit them. Well, let me tell you this straight: there are many talents who are just discarded by the review/interview process and are here for you to pick from. That's how we find gems that everyone else passed over. Then people wonder why my team always outperforms theirs.
9
tzs // Aug 8, 2011 at 4:32 am
Interesting problem, which I don't think I've seen before. Here's the O(n^2) solution I'd have come up with if confronted with this thing.
Build a directed graph consisting of vertices labeled 0, 1, 2, …, n, where n is the length of the string, and where there is an edge from k to j if and only if there is a dictionary word of length j-k starting at position k in the string. This can be done in O(n^2).
Solutions to the problem then correspond to paths through the graph from 0 to n. Use something like Dijkstra's algorithm to find a minimal path in O(n^2), which corresponds to a solution that uses the smallest number of dictionary words to exactly cover the string.
10 Benjash // Aug 8, 2011 at 4:53 am
I'm not a fully blown coder by any means. But I was thinking a solution would be to break the string into consonant clusters (syllables maybe?) and group them into weighted groups. Or some algorithm measuring lexical units within the string.
Some database of word clusters like:
String = applepieisgreat
3wordcluster = "app", "eat", "ple", "ond" etc …
2wordcluster = "ap", "le", "pi", "is" etc
Then build up the words app + le / ap + ple / ple + pie / ple + pi / pie + is / etc
Then, working back from the biggest clusters, run lookups against the dictionary to see if it's a word.
Then it would be a simple jigsaw puzzle of what words fit in the string.
11 Rob // Aug 8, 2011 at 4:54 am
@Seems easy
Let's assume that this is the dictionary:
this
text
is
short
shorter
Now let's apply rp1's solution to the problem. It will walk along the string until it finds a match and apply that match to the output. It produces this:
this text is short er
"short" is obviously a match for the first 5 characters of "shorter". Since the solution analyses those 5 characters first and finds a match, it happily accepts the word "short". It now has the string "er" to analyse. "er" isn't in the dictionary, so we'll assume it just gets tagged on the end instead of discarded.
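Rob's walkthrough can be reproduced directly. The following is a minimal sketch that does not come from the post. It reads rp1's idea (comment 1) as "take the shortest matching prefix and never revisit it", models the dictionary as a plain java.util.Set<String>, and runs that greedy loop alongside the post's recursive backtracking. The class and method names are hypothetical.

```java
import java.util.Set;

// Hypothetical demo class: greedy shortest-prefix segmentation vs. backtracking.
public class GreedyVsBacktracking {

    // Greedy reading of comment 1: at each position take the shortest
    // dictionary word that starts there and never reconsider the choice.
    // Returns null when it gets stuck.
    static String greedyShortestFirst(String input, Set<String> dict) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        while (start < input.length()) {
            int end = start + 1;
            while (end <= input.length() && !dict.contains(input.substring(start, end))) {
                end++;
            }
            if (end > input.length()) {
                return null; // no word starts at 'start', and greedy cannot back up
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(input, start, end);
            start = end;
        }
        return out.toString();
    }

    // Recursive backtracking in the style of the post's general solution:
    // try each dictionary prefix, recurse on the rest, undo bad choices.
    static String backtrack(String input, Set<String> dict) {
        if (dict.contains(input)) {
            return input;
        }
        for (int i = 1; i < input.length(); i++) {
            String prefix = input.substring(0, i);
            if (dict.contains(prefix)) {
                String rest = backtrack(input.substring(i), dict);
                if (rest != null) {
                    return prefix + " " + rest;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("this", "text", "is", "short", "shorter");
        String input = "thistextisshorter";
        System.out.println(greedyShortestFirst(input, dict)); // null
        System.out.println(backtrack(input, dict));           // this text is shorter
    }
}
```

With Rob's dictionary, the greedy loop commits to "short" and then finds no word that matches "er", so it returns null. Backtracking recovers "this text is shorter". The same greedy loop also fails on rcf's example in the next comment (dictionary a, aa, aaa, ab; input aaab).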
The correct behaviour here would be to start backtracking. If we have "er" as an unmatched string, it is possible that it connects with something we've matched previously. So, let's try "ter". Nope, no match. Now "rter". Nope, no match. However, when we eventually try "shorter", we've got a match. With backtracking, we've found a solution that works for everything in the string:
this text is shorter
Clearly this isn't O(n), as we have to re-examine components of the string multiple times. In fact, rp1's algorithm is O(n^2), not O(n) as he suggests.
12 rcf // Aug 8, 2011 at 5:02 am
@Seems easy
Suppose the words are a, aa, aaa, ab. Suppose the input is aaab. The proper segmentation is aa+ab, but if you do it rp1's way you'd try a+? and fail to find a segmentation. If you also started with the longer string aaa, you'd still fail. In the worst case scenario there's a tiny chance you'd get the right answer greedily.
13 Tom White // Aug 8, 2011 at 5:17 am
Coder interviews are "rocket science". How to detect the best programmers is a hard problem.
But this coding test is for a programmer who is going to be hired to write a string-processing library (or similar). That might not be your goal.
There is also a pretty high degree of quirks in this problem. Most programmers come from a background of processing whole lines, whole sentences, whole words, etc. This seems to be asking the programmer to make one dictionary lookup "per letter". Perhaps too much of a brain shift for a high-pressure interview situation.
Then, you think you speed it up by creating an O(n) cache to hold words you have seen, when the O(log n) dictionary has all the words in memory (and the virtual memory pages that get loaded wouldn't get paged out to disk during the one millisecond this runs). Your cache can only really help if the program, the cache, and the input string are quite small and fit in the fastest level of processor cache.
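The memo table's effect is easiest to see by counting dictionary probes rather than timing them. The following is a minimal sketch that does not come from the post. It routes every Set.contains call through a hypothetical counting helper, then runs both the plain backtracking solution and the memoized one from the post on the post's pathological input: n-1 'a's followed by a 'b', with "a", "aa", ... up to length n in the dictionary. The class name, the helper, and the per-call HashMap memo are assumptions for illustration; the post keeps the memo in a field.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo: counts dictionary probes for plain vs. memoized backtracking.
public class LookupCount {

    static long lookups = 0;

    // Every dictionary probe goes through here so it can be counted.
    static boolean lookup(Set<String> dict, String word) {
        lookups++;
        return dict.contains(word);
    }

    // Plain recursive backtracking, as in the post's general solution.
    static String plain(String input, Set<String> dict) {
        if (lookup(dict, input)) return input;
        for (int i = 1; i < input.length(); i++) {
            String prefix = input.substring(0, i);
            if (lookup(dict, prefix)) {
                String seg = plain(input.substring(i), dict);
                if (seg != null) return prefix + " " + seg;
            }
        }
        return null;
    }

    // Same search, but each suffix's answer (including null) is cached.
    static String memoized(String input, Set<String> dict, Map<String, String> memo) {
        if (lookup(dict, input)) return input;
        if (memo.containsKey(input)) return memo.get(input);
        for (int i = 1; i < input.length(); i++) {
            String prefix = input.substring(0, i);
            if (lookup(dict, prefix)) {
                String seg = memoized(input.substring(i), dict, memo);
                if (seg != null) {
                    memo.put(input, prefix + " " + seg);
                    return prefix + " " + seg;
                }
            }
        }
        memo.put(input, null); // remember that this suffix cannot be segmented
        return null;
    }

    // Dictionary {"a", "aa", ..., 'a' repeated maxLen times}.
    static Set<String> pathologicalDict(int maxLen) {
        Set<String> dict = new HashSet<>();
        for (int len = 1; len <= maxLen; len++) dict.add("a".repeat(len));
        return dict;
    }

    static long countPlain(int n) {
        lookups = 0;
        plain("a".repeat(n - 1) + "b", pathologicalDict(n));
        return lookups;
    }

    static long countMemoized(int n) {
        lookups = 0;
        memoized("a".repeat(n - 1) + "b", pathologicalDict(n), new HashMap<>());
        return lookups;
    }

    public static void main(String[] args) {
        for (int n = 4; n <= 20; n += 4) {
            System.out.println("n=" + n + " plain=" + countPlain(n) + " memoized=" + countMemoized(n));
        }
    }
}
```

Working through the recursion for this input gives 2^n - 1 probes for the plain version and n(n-1) + 1 for the memoized one. That is 15 vs. 13 at n = 4, but 1,048,575 vs. 381 at n = 20. The memo never holds more than n entries, one per suffix. Its benefit therefore comes from avoiding repeated searches, not from making individual lookups faster.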
This article and exercise confirm my experience that most interviewers are not good enough programmers themselves, and not unbiased enough, to recognize the great programmers when they pass by. But that's OK; the HR department has usually already rejected them due to some "problem with their resume".
You are correct that a coding problem is absolutely necessary. It needs to be maximally understandable so as to overcome the candidate's anxiety. It needs to start easy and work up via interactive discussion to show the programmer's actual level of competence. The automated websites for this look interesting but can only do a first level of checking for basic programming ability.
We all need to step outside ourselves and understand what makes someone a good programmer: smart, quick learner, and good to work with.
14 rcf // Aug 8, 2011 at 5:53 am
@Tom White
The cache has nothing to do with speeding up lookups or the size of the dictionary. If you have a string of length L, then the standard backtracking solution will in the worst case do 2^L dictionary lookups. By keeping track of which substrings are possible to segment, we can reduce this to ~L lookups.
15 Ian Wright // Aug 8, 2011 at 5:57 am
As someone who hasn't coded since high school, I'm curious as to what legal actions you take against sites like Glassdoor that publish information barred in NDAs. I assume you would need to know who actually provided the information to the site to bring any sort of penalties against them. At the end of the day you're just closing the stable door after the horse has already bolted.
16 rocketsurgeon // Aug 8, 2011 at 7:34 am
There's a bug in the general solution; it doesn't work as written. Maybe your interview questions should include a section on testing?
17 Retric // Aug 8, 2011 at 7:43 am
Reading this, I can't help but think you're falling into the Java trap of trying to make a ridiculously generic solution, which I would consider a danger sign but plenty of people love to see.
Anyway, assuming a real-world input and real-world dictionary, you can try plenty of things that break down for a dictionary that includes four hundred A's but are actually valid solutions. Also, if you want a fast real-world solution, then sticking with a pure lookup dictionary would slow things down. EX: Being able to toss 5 letters into a lookup table that says the longest and shortest words that start with those letters would save a lot of time. Basically, 'xxyzy' = null, null saves you from looking up x to the maximum word length in your dictionary. Secondly, sanitizing inputs is going to be both safer and faster. Granted, anything you are coding in 45 minutes is going to be a long way from production-ready code.
PS: Even with backtracking you can code an O(n) solution for very large input sets. Just keep track of the K best output sequences (aka N, N-1, N-2, …, N-K), K being the length of the longest word in your dictionary.
18 jeremy // Aug 8, 2011 at 8:03 am
"Many candidates for software engineering positions cannot come up with the above or an equivalent (e.g., a solution that uses an explicit stack) in half an hour. I'm sure that many of them are competent and productive. But I would not hire them to work on information retrieval or machine learning problems, especially at a company that delivers search functionality on a massive scale."
I have a comment about the relationship of this question to "massive scale" information retrieval and machine learning problems. Naturally, no one wants an O(2^n) solution.
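Retric's postscript, which rcf later calls a reinvention of dynamic programming, can be sketched as a bottom-up table over end positions. This is a swapped-in bottom-up variant, not the post's memoized recursion. It also assumes something the problem statement disallows: a single pass over the dictionary to learn K, the length of its longest word. Once K is known, each end position probes at most K candidate words, so the search makes O(nK) dictionary lookups. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

// Hypothetical demo: bottom-up word break that only probes words of length <= K.
public class BoundedWordBreak {

    static String segment(String input, Set<String> dict) {
        // Assumption: the dictionary may be scanned once to find K.
        int k = 0;
        for (String w : dict) k = Math.max(k, w.length());

        int n = input.length();
        // back[j] is the start of the last word in some segmentation of
        // input[0, j), or -1 if that prefix cannot be segmented.
        int[] back = new int[n + 1];
        Arrays.fill(back, -1);
        back[0] = 0; // the empty prefix is trivially segmentable

        for (int j = 1; j <= n; j++) {
            // Only words ending at j and no longer than k are candidates.
            for (int i = Math.max(0, j - k); i < j; i++) {
                if (back[i] >= 0 && dict.contains(input.substring(i, j))) {
                    back[j] = i;
                    break;
                }
            }
        }
        if (back[n] < 0) return null;

        // Follow the back pointers from the end to recover the words.
        StringBuilder out = new StringBuilder();
        for (int j = n; j > 0; j = back[j]) {
            String word = input.substring(back[j], j);
            out.insert(0, out.length() == 0 ? word : word + " ");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(segment("applepie", Set.of("apple", "pie", "app"))); // apple pie
        System.out.println(segment("aaab", Set.of("a", "aa", "aaa", "ab")));    // aa ab
    }
}
```

At each position the inner loop keeps the longest workable word ending there, so the sketch returns whichever valid segmentation that choice leads to ("aa ab" for rcf's example). Like the post's solutions, it promises only some valid segmentation, not a particular one. Grant Husbands points out later in the thread that this kind of dictionary preprocessing falls outside the problem as stated.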
But even when the solution is O(n^2), if the size of your input n is 10 billion (aka massive scale, web scale, big data), even an elegant, memoized algorithm isn't tractable anyway, correct?
So while I indeed like this question, because it has a "progressive reveal" in levels of thinking, does someone working on web scale (massive) data really ever need to implement the highest level of thinking? Or is it that you just want a programmer who is aware of the issue, even if she or he never has to use it?
19 The "word break" problem // Aug 8, 2011 at 8:09 am
[...] Retiring a Great Interview Problem (via Retiring a Great Interview Problem), this is the best I could get in less than 30 [...]
20 betacoder // Aug 8, 2011 at 8:15 am
Trivial but noted: the simplified solution also has an off-by-one error.
21 Daniel Tunkelang // Aug 8, 2011 at 8:21 am
To those who noted the off-by-one error: thanks, it's fixed now.
22 Daniel Tunkelang // Aug 8, 2011 at 8:23 am
@binarysolo I also used to think this problem was too easy. Experience proved otherwise. In fact, I found at Google that performance on this problem correlated strongly to overall performance in the interview process, albeit with a limited sample size.
23 rcf // Aug 8, 2011 at 8:25 am
@Retric Congratulations, you've reinvented dynamic programming with your O(n) solution. That'd be a perfect solution. Also, if your map is stored in sorted order (like in C++), a binary search will find you the longest word that fits the remainder of the string. (To also efficiently find all of the words that fit will require something extra.)
24 Daniel Tunkelang // Aug 8, 2011 at 8:27 am
@Rick Williams It's pretty common for companies to make candidates sign NDAs. I've learned about confidential information from several companies I interviewed with, typically as part of the sell of exciting things I could work on. But forget the legalities, which are unenforceable in practice.
Instead, think about how you'd like yourself and other engineers to be assessed. Part of the point of interview problems is to make hiring about more than a person's resume. It's not clear who gains when interview problems are retired because they've been disclosed. Still, I recognize practically that it is impossible to maintain secrets once enough people know them. Hence my comment at the end of the post.
25 jdavid.net // Aug 8, 2011 at 8:29 am
Questions like this are grand if you are looking for a 'Search Engineer', but if you are looking for a front-end developer, or someone who understands how to design a REST API, or something else useful, this is a bogus question. All too often I have seen interviewers use a one-question-fits-all approach to interviewing.
I have been out of college for a decade and have significant work experience in producing web pages and frameworks that respond to a browser or app in a specific way. Questions like these filter me out because I have not been doing college papers or playing around in a programming languages class, but rather have been building real tools.
Questions like "How would you design your own web framework?" might be better. Why is jQuery popular, and how does it differ from other JS frameworks? Can you write a delegated click handler? How does it differ from an assigned click handler?
Please ask a question relative to an applicant's role/experience. There are plenty of questions out there that are easy for college grads to answer but hard for experienced programmers, and vice versa.
26 The Word Break Problem @ Plus 1 Lab // Aug 8, 2011 at 8:42 am
[...] an interesting post on the Noisy Channel blog describing what author Daniel Tunkelang calls the word break problem. I didn't find this post interesting because the problem is a good interview question, I [...]
27 Daniel Tunkelang // Aug 8, 2011 at 9:00 am
@zui No interview process is perfect. It's tempting to only hire people you've worked with; indeed, I'd be inclined to do that as a founder.
But that doesn't scale, and it also biases the process towards people who are already well connected. Another approach is to focus almost entirely on the resume, but that has its own problems, from resume inflation to again favoring those who started with advantages.
I'm by no means perfect, but I think I give candidates as fair a shot as I know how. I hired a candidate with no college degree and a high-school equivalency; she performed phenomenally as a software engineer and is now a managing director at Goldman. I try to put nervous candidates at ease. But there's no getting around that an interview is an assessment process, and not everyone does well when they are being assessed.
I'm curious to hear the details of how you or your company interview candidates. Everyone benefits if we can make this process better.
28 Geek Reading August 8, 2011 « Regular Geek // Aug 8, 2011 at 9:02 am
[...] Retiring a Great Interview Problem [...]
29 Daniel Tunkelang // Aug 8, 2011 at 9:03 am
@Retric Is it really so generic? I've simplified the problem a little, but it's pretty close to a real problem I had to solve for implementing search features that would be deployed in a very broad range of domains, including for part-number search. I had the opportunity to see domain-specific heuristics break down.
30 Daniel Tunkelang // Aug 8, 2011 at 9:08 am
@jeremy To clarify, the n here is the size of the input string, rather than the number of users or documents. But algorithms like these run in a high-traffic environment, with requests being processed concurrently. A feature whose cost blows up exponentially with a bad input can take a site down. Granted, there are other ways to guard against such failures (e.g., time-outs), but it's good to use algorithms that don't have such blowup. And even better to understand the behavior of algorithms before rather than after those algorithms make it to production.
31 Grant Husbands // Aug 8, 2011 at 9:09 am
@rcf, @Retric:
You don't need to invent new structures for the dictionary; lots of variants of trie/NFA/DFA will do just fine. However, the question explicitly disallows changing the dictionary and its API. I'm sure Daniel is aware that O(N) solutions exist if the dictionary is improved, but interview questions are for exploring the abilities of a candidate rather than finding the best possible solution.
32 jeremy // Aug 8, 2011 at 9:33 am
Right, I understand that this example problem is a small input string. Perhaps what I was trying to ask is how indicative of a real-world, massive-scale machine learning or information retrieval problem this example problem really is.
Agreed about being able to recognize that an algorithm might have quadratic or exponential blowup. But again, I'm asking about how often, realistically, one has to implement solutions that need dynamic programming when doing information retrieval/machine learning on a web scale.
I am assuming that you're not just talking about parsing the input. I assume when you say massive scale information retrieval and machine learning, you're working on algorithms to extract patterns from the data, to find relationships, to do (set-based) extraction of related entities, for example. And in that case, the size of the input isn't 100 or 1000, but millions. Billions. So again, how often is memoization necessary, in practice?
(And recognizing that something is going to be NP-complete, or has a quadratic-time DP solution, or a quadratic-time approximation, is different than being able to code that solution during a half hour interview. So might it perhaps be better to test one's ability to recognize, say, 7 or 8 different problems as to their potential complexity, rather than having a candidate write code for a single example?)
33 Daniel Tunkelang // Aug 8, 2011 at 9:50 am
@jeremy Different problems test different skills.
I've never used this question as the only determinant in assessing a candidate, but I've found that it provides more bits of information than most.
34 Retric // Aug 8, 2011 at 10:01 am
@Grant Husbands I was not suggesting you needed to recreate a dictionary. However, the ideal program for a human language dictionary where the longest word is 32ish digits vs. some input text is very different than something you would use if you had 200 different ~100,000 digit DNA sequences in your dictionary. Consider: with a long enough input string, iterating over the full dictionary and creating a quick index could easily reduce runtimes from years to seconds. And with a short enough input string, any optimizations are basically pointless. EX: 'cat'.
35 Earl // Aug 8, 2011 at 10:01 am
Daniel, I think you have a boundary problem in your example: the substring function in Java takes (to my mind) weird arguments. You have to call substring until the endIndex / right argument is equal to the string length.
eg:
  String test = "0123456";
  puts("%s length %d", test, test.length());
  for (int i=1; i [...] %s", 1, i, test.substring(0, i));
produces
  prefix [1, 1] -> 0
  prefix [1, 2] -> 01
  prefix [1, 3] -> 012
  prefix [1, 4] -> 0123
  prefix [1, 5] -> 01234
  prefix [1, 6] -> 012345
  prefix [1, 7] -> 0123456
36 Earl // Aug 8, 2011 at 10:05 am
Blech, html. http://codepad.org/Ald1l9v
37 Retric // Aug 8, 2011 at 10:11 am
PS: By index I mean find the size of the longest word, and/or do other preprocessing.
38 Daniel Tunkelang // Aug 8, 2011 at 10:15 am
@Earl Are you sure? Change the for loop to
  for (int i=1; i <= test.length(); i++) {
    puts("prefix [%d, %d] -> %s", 0, i, test.substring(0, i));
  }
produces
  prefix [0, 1] -> 0
  prefix [0, 2] -> 01
  prefix [0, 3] -> 012
  prefix [0, 4] -> 0123
  prefix [0, 5] -> 01234
  prefix [0, 6] -> 012345
  prefix [0, 7] -> 0123456
39 Grant Husbands // Aug 8, 2011 at 10:19 am
@Retric: Your suggested 5-letter lookup was essentially a change to the dictionary API; it is to that that I was referring.
For any lack of clarity on my part, I apologise. Anyway, there are plenty of ways of preprocessing the dictionary, and I mentioned common ones, but none of them fit the problem description, which explicitly disallows such preprocessing, making this debate irrelevant.
40 John H // Aug 8, 2011 at 10:33 am
Why are you retiring it? Because it's out on the web? I don't think having knowledge of the interview question ahead of time necessarily precludes its usefulness. Being able to deliver the solution quickly and efficiently is still a valuable assessment of skill. Also, assuming the candidate has to answer 4-6 diverse problems over the course of the day, it's still a pretty good screen to have them 'reproduce memorized answers' (assuming they knew them ahead of time). This is further mitigated by having a couple of problems you can switch between; now the candidate would need to have 25-50 questions memorized. Honestly, having the solutions to 50 interview questions on CS problems memorized is pretty good. On top of that, very few candidates do the research needed to find the problems ahead of time.
PS: Thanks for the outline. I'm going to use this question for my candidates in the future.
41 Charles Scalfani // Aug 8, 2011 at 2:40 pm
I have always hated interview questions, which is why I don't use them. I'd rather solve a problem WITH the candidate or, if I'm being interviewed, with the interviewer. I want real-world problems that have NOT been pre-solved. I want to have a design discussion with them and talk through the design and implementation issues. Coding is trivial after that.
I would rather have the candidate bring in an example of code that is non-proprietary and something they're particularly proud of. Then I review the code with them like I would if they worked for me.
I'd much rather have an interview as a pseudo-working session. Then I can see how it would be to actually work with that person.
There's no better way to see how someone thinks than making them work WITH you, not jump through your artificial hoops.
42 binarysolo // Aug 8, 2011 at 3:24 pm
@Daniel Well, I still dunno if this is a great interview problem per se, but it sure is a great conversation starter, given its very *accessible* nature: even us laymen who don't have much programming background can access the question and think of reasonable implementations. I'm not familiar enough with the programming world, but I'd imagine that the value of clever thinking and efficient structural thinking >> some technical, nuanced aspect of some computer language, which is what this problem draws out.
43 jeremy // Aug 8, 2011 at 5:16 pm
"Different problems test different skills. I've never used this question as the only determinant in assessing a candidate, but I've found that it provides more bits of information than most."
Again, yes... but you specifically said that this specific question didn't just test a candidate's ability to think generally, computer-scientifically, but also a candidate's ability to think specifically in terms of massive scale, machine learning, and information retrieval. And it's that specific connection, between DP and massive scale in the context of ML and IR, that I'm struggling to understand, rather than the broader question of whether a candidate is a good computer scientist. It's just that... oh, nevermind. I'll take it offline.
44 Rick Williams // Aug 8, 2011 at 7:06 pm
Thank you for the response about NDAs. I agree completely that it is unprofessional to publish secret interview questions after an interview. But that's different from the NDA issue. I've been on dozens of interviews and have never been asked to sign an NDA for interviewing, nor have I required one of anyone I am interviewing. NDAs are something clients sign before being informed of proprietary business trade secret information.
This is done only when absolutely necessary, since it is much better simply not to reveal trade secrets to outside parties in the first place. It's quite bizarre to hear of them being required for interviews, and I don't really believe that this is a common practice. If it is common in some segment of the field, then it is an ill-advised practice.
45 Golam Kawsar // Aug 8, 2011 at 7:35 pm
Great interview question, but in my experience as a programmer, I have not seen many programmers who can cook up dynamic programming solutions to problems like these, let alone during the stress of an interview. Even recognizing this as a dynamic programming question will be hard for many. Thanks for such a nice post, Daniel. Enjoyed reading it a lot!
46 Debnath // Aug 8, 2011 at 8:13 pm
The general solution will terminate if it finds the word in the dictionary; should it not still continue? I mean, there could be a word like "endgame" (or "backtracking", for lack of a better example) which might be part of the dictionary but whose parts are also individual words... I can understand such words may not be popular in IR, though, since they are usually an abbreviation of the subwords, and the subwords don't really make sense independently, but given the problem definition...
47 A great interview Problem « Phanikumar // Aug 8, 2011 at 10:20 pm
[...] problem with the code and runtime of the algorithm mentioned in the post. http://thenoisychannel.com/2011/08/08/retiring-a-great-interview-problem/ This entry was posted in Code by phani. Bookmark the [...]
48 Daniel Tunkelang // Aug 8, 2011 at 11:37 pm
@John H I hope this question serves you well. It's time for me to move on from it, and I thought the best way to retire this problem was to do so in a way that others would learn from it.
49 Daniel Tunkelang // Aug 8, 2011 at 11:42 pm
@Charles Scalfani We do have collaborative problem solving and product design discussions as part of the interview process. But I also want to see how a candidate writes code.
Reviewing code they've written before is an option, but it's tricky, especially for a candidate who has not written non-proprietary code in a long time.
50 Daniel Tunkelang // Aug 8, 2011 at 11:44 pm
@Debnath The termination condition is part of the problem statement. The problem could be changed to require outputting all valid segmentations, but there could be an exponential number of them. We could also require the "best" segmentation, which raises an interesting design question as to what constitutes "best".
51 On "Retiring a Great Interview Problem" | Will.Whim // Aug 9, 2011 at 1:49 am
[...] Tunkelang wrote an interesting blog post, "Retiring a Great Interview Problem", on an interview problem that he has, in the past, posed to interviewees, but which he has [...]
52 Will Fitzgerald // Aug 9, 2011 at 1:51 am
I wrote a response to this, "On 'Retiring a Great Interview Problem'", at http://t.co/u[Q,Noh.
53 Daniel Tunkelang // Aug 9, 2011 at 6:41 am
Will, I read your response. A short one here: assessment of candidates on a problem like this isn't binary. Rather, the point is to get as holistic a picture as possible, within the time constraints, of how well a candidate solves an algorithmic problem and implements it. It's not a perfect tool, but no tool is. I'm always on the lookout for better ones; don't forget that I'm retiring this problem. And you allude to a candidate's nervousness under interview conditions. Part of the interviewer's job is to put the candidate at ease. That isn't unique to questions that involve coding, and it doesn't always work out. No interviewer or hiring process is perfect. Finally, while I share your concerns about using whiteboard coding in interviews (see my link to Diego's post), I disagree that it discriminates against older candidates. That very hypothesis strikes me as ageist, at least in the absence of supporting data.
54 Patrick Tilland // Aug 9, 2011 at 11:06 am
As rocketsurgeon mentioned, there is one bug and also one typo in the code.
There is a parenthesis missing in this line:
if (dict.contains(input) return input;
And the loop condition never reaches the next-to-last character in the input string:
for (int i = 1; i < len - 1; i++)
55 Daniel Tunkelang // Aug 9, 2011 at 11:14 am
@rocketsurgeon @Patrick Tilland Bugs fixed. Thanks, guys!
56 Sonic Charmer // Aug 10, 2011 at 3:48 am
I won't write this out in Java/whatever syntax, as I am too lazy (also not a programmer, so not interested in memorizing the syntax of this or that language), but as stated I am lazy, so I would want a 'good' segmentation, not just any. (I don't want a lot of "a"s if there's an "aaaa" available.) So I would check the longest-length word in the dictionary; say that length is k (of course, cap this at the input-string length and discard all dictionary words longer than this; I'll assume the dictionary has the easy capability both of this max-length check and to 'discard'/ignore all words longer than k). Then, starting from i=1, search all (i, i+k-1) substrings of the input until I find a match. If none, reduce k (discard more dictionary!) and try again; if so, parse out into 'match', prefix and suffix (as applicable) and recurse onto both of the latter. Details too boring/obvious to spec out. Personally I dislike cutesy 'interview questions' and am instinctively distrustful of the filter they implicitly apply to candidates. When I'm interviewing someone I do what is known as 'talk to' the person; one might even say I 'have a conversation' with them. That may or may not be better, but my way, at least, I don't think someone could slide through the interview filter by memorizing some stuff off websites.
57 Stavros Macrakis // Aug 10, 2011 at 7:59 am
Many skills go into a good software engineer, and different interviewers will probe different skills. This problem emphasizes algorithmic thinking and coding, and I ask a similar interview question myself.
But algorithmic coding is becoming surprisingly uncommon in many environments because the non-trivial algorithms are wrapped in libraries. Of course, there are places where understanding all this is crucial (someone has to write those libraries, and some people really do have Big Problems that the libraries don't address), but it doesn't seem to be a central skill that all programmers need to master. Is this good or bad? Consider A.N. Whitehead: "Civilization advances by extending the number of important operations which we can perform without thinking about them."
58 Stavros Macrakis // Aug 10, 2011 at 8:13 am
One thing I've learned about interview questions is that you have to lead up to them in steps if you want to determine where a candidate's understanding trails off, which may be very soon. For example, many candidates claim on their resumes that they know SQL. I used to ask such candidates how they would determine if person A was an ancestor of person B given a table of parent-child relations. This requires the (advanced) SQL feature of recursive queries (and I'd actually be happy if they could explain why it couldn't be done in SQL, as it can't in basic SQL). Now, I ask the question in stages:
* In SQL, how would you represent parent-child relations?
* How would you find X's parents?
* How would you find X's grandparents?
* How would you find all of X's ancestors?
* What if you wanted to do all this for a high-volume genealogy Web site: would you change your table design or queries? Or use some technology other than SQL?
I was shocked to discover that many candidates who listed SQL on their resumes couldn't do *any* of this, and many required considerable coaching to do it. One candidate didn't even remember that SQL queries start with SELECT. I would have forgiven this if he'd had conceptual understanding but had just forgotten the keyword, but he had zero conceptual understanding as well.
All this to say that you can't really trust the self-reporting on a resume, and you've got to probe to understand what the candidate actually knows.
59 Daniel Tunkelang // Aug 10, 2011 at 8:42 am
@Sonic Charmer Your sketch is actually a reasonable start towards what I'd expect of a candidate. In fact, I think I could persuade you that your assumptions would have to be general enough to at least solve the interview problem as a special case where the min word length is one and the max is large enough to be the length of the input string. Please bear in mind that this "cutesy" problem is a simplification of one I had to solve to deliver software that has been deployed to hundreds of millions of people!
@Stavros You're right that not all software engineers need to have strong command of algorithms. But I do require that strength of the folks I hire, given the problems that my team solves. Same applied at Google and Endeca. And yes, self-reporting on a resume is always subject to the maxim of "trust but verify".
60 Word Breaks | Programming Praxis // Aug 12, 2011 at 2:03 am
[...] Tunkelang posted this interview question to his blog: Given an input string and a dictionary of words, segment the input string into a space-separated [...]
61 Daniel Tunkelang // Aug 12, 2011 at 6:24 am
@Programming Praxis A solution in Scheme. Nice!
62 Ben Mabey // Aug 14, 2011 at 4:16 pm
Thanks for sharing this problem! I did a Clojure and Ruby solution and discussed the differences between lazy (as in lazy lists) and non-lazy solutions: http://benmabey.com/2011/08/14/word-break-in-clojure-and-ruby-and-laziness-in-ruby.html
63 Daniel Tunkelang // Aug 14, 2011 at 8:32 pm
Ben, thank you! I'm honored to have inspired such an insightful and elegant post.
64 Conducting a Remote Technical Interview « Hiring Tech Talent // Aug 15, 2011 at 9:19 am
[...] hacked, as the wise candidate can research typical questions ahead of time.
Daniel Tunkelang has a great post on this, where he found that one of his best questions was posted on [...]
65 Raymond Moore // Aug 17, 2011 at 3:21 pm
Excellent article & topic. We are currently recruiting for a half dozen C++ & Web UI positions, and for many companies it is extremely difficult to determine who is talking the talk and who can write VERY clean code and think logically when faced with difficult programming requests. As an example, a Sales Director will hand a candidate for a sales position a phone and say "make this call and pitch them" just so you can see what they have; this is partly what interviewing has become like in the IT world.
66 Daniel Tunkelang // Aug 17, 2011 at 6:07 pm
Raymond, thanks. The problem with all interviewing, and with interviewing software engineers in particular, is that it's hard to extract a reliable signal under interview conditions. Ultimately the best solution may be to change the interview conditions. But the approach has to be both effective and efficient. It's a great research problem, and I'll let readers here know what my colleagues and I come up with. Of course, I'd love to hear what others are doing.
67 State of Technology #21 | Dr Data's Blog // Aug 18, 2011 at 10:55 pm
[...] - How to retire a great Interview problem - "word break" problem described as [...]
68 Anatoly Karp // Aug 21, 2011 at 12:20 am
As a side note, there is a discussion of a slightly more general problem in Peter Norvig's excellent chapter from the O'Reilly book "Beautiful Data": http://norvig.com/ngrams/ch14.pdf (see section "Word Segmentation"). His remark that the memoized solution can be thought of as the Viterbi algorithm from HMM theory is nicely illuminating (and of course obvious upon a moment's thought).
69 Daniel Tunkelang // Aug 21, 2011 at 9:31 am
Indeed. Peter emailed me his "naive" solution; unfortunately, I don't think he's on the market.
70 Attention CMU Students! // Sep 7, 2011 at 7:04 pm
[...] and Wednesday.
And of course LinkedIn will be conducting on-campus interviews: those will take place all day on Thursday, September [...]
71 William // Nov 4, 2011 at 10:57 pm
Hi Daniel: how could you change your code so that it can find all valid segmentations for the whole string? E.g., string "aaa", dict {"a", "aa"} to be segmented as {"a a a", "a aa", "aa a"}.
72 Daniel Tunkelang // Nov 5, 2011 at 1:44 pm
William, interesting question. For starters, the number of valid segmentations may be exponential in the length; in fact, that will be the case for a string of n a's if every sequence of a's is a dictionary word. Could still use memoization / dynamic programming to avoid repeating work, but storing sets of sequences rather than a single one.
73 Roberto Lupi // Dec 17, 2011 at 2:41 pm
It can be done in O(n), n = length of the string, with some pretty relaxed assumptions given the nature of the problem: we just need a preprocessing step on the dictionary. The idea is to build a set of rolling hash functions using the Rabin-Karp algorithm, one for each word length in the dictionary, and the corresponding hash value for each word. To segment the string, we loop over it once, updating the rolling hash values (for each length), and if we find a match in our set of hash values from the dictionary, we have a potential match. We still have to check the actual dictionary to confirm the match, avoiding false positives. This design has the added advantage that the dictionary can be larger than what can fit into memory. We only need to store the hash values for each word in memory.
74 Daniel Tunkelang // Dec 17, 2011 at 3:01 pm
Roberto, one person I presented the problem to did suggest an approach along these lines: since dictionary membership is a regular language, just build a finite state machine. I was impressed by the ingenuity, but I then enforced the constraint that the dictionary only supported a constant-time membership test.
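William's request to list every valid segmentation, together with the reply that memoization still helps if it stores sets of sequences rather than a single one, can be sketched in Java, the language of the code discussed in this thread. This is a minimal sketch under stated assumptions, not code from the post. The class name AllSegmentations is hypothetical, the dictionary is assumed to be an in-memory Set<String>, and the segmentations of each suffix are cached as a list of space-separated strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: enumerate every way to split the input into
// dictionary words, caching the segmentations of each suffix.
final class AllSegmentations {

    static List<String> segmentAll(String input, Set<String> dict) {
        return solve(input, dict, new HashMap<>());
    }

    private static List<String> solve(String s, Set<String> dict,
                                      Map<String, List<String>> memo) {
        List<String> cached = memo.get(s);
        if (cached != null) {
            return cached;                     // suffix already solved
        }
        List<String> results = new ArrayList<>();
        if (dict.contains(s)) {
            results.add(s);                    // whole suffix is one word
        }
        // Try every proper prefix that is a word, then combine it with
        // every segmentation of the remaining suffix.
        for (int i = 1; i < s.length(); i++) {
            String prefix = s.substring(0, i);
            if (dict.contains(prefix)) {
                for (String rest : solve(s.substring(i), dict, memo)) {
                    results.add(prefix + " " + rest);
                }
            }
        }
        List<String> frozen = Collections.unmodifiableList(results);
        memo.put(s, frozen);                   // empty list = no segmentation
        return frozen;
    }
}
```

For William's example, segmentAll("aaa", Set.of("a", "aa")) returns "a aa", "a a a" and "aa a", in that order. Caching avoids re-solving shared suffixes, but it cannot shrink the output. If every run of a's up to length n is a word, a string of n a's has 2^(n-1) segmentations, so listing them all stays exponential.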
By the way, I've since moved on to less exciting coding problems that require less ingenuity and are more a test of basics (though not quite as elementary as FizzBuzz). I'm still surprised at how many candidates with strong resumes fail at these.
75 Roberto Lupi // Dec 18, 2011 at 12:27 pm
Maybe stronger candidates tend to overcomplicate problems: instead of solving them in the simplest way, they search for a clever one and get lost. "[T]he stupider one is, the closer one is to reality. The stupider one is, the clearer one is. Stupidity is brief and artless, while intelligence wriggles and hides itself. Intelligence is a knave, but stupidity is honest and straightforward." (Dostoevsky, The Brothers Karamazov)
76 Daniel Tunkelang // Dec 18, 2011 at 12:33 pm
Some strong candidates assume that an easy solution must be too naive and therefore wrong. For that reason, it's important to set expectations at the beginning. If a problem is basic, I tell the candidate as much, which also helps avoid a candidate feeling insulted or worrying that the bar is too low. And for all problems, I urge candidates to come up with a working solution before optimizing it.
77 Alex // Jan 10, 2012 at 11:23 am
At first, it sounds depressing. I'm writing compilers and OSes, among other things, but I don't think I'd pass this interview. However, the more I read, the more I realize I'm a different kind of programmer than the one sought here. I don't cite CLR by heart, I solve real problems in a real manner (i.e., including asking others, not to mention using books, the Internet, etc.), and I'm generally not good at "coding". I guess my skills aren't very marketable in this approach. Might be a better choice not to program for somebody else.
78 Daniel Tunkelang // Jan 10, 2012 at 9:12 pm
Alex, I've actually switched to using problems that are a bit less CLR-ish. I still think it's reasonable to expect someone to be able to solve real problems like this one, though I realize it's unnatural to solve problems under interview conditions.
An alternative approach would be to put less emphasis on the accuracy of the interviewing process and treat the first few months as a trial period. Unfortunately, that's not the cultural norm, so instead we try to squeeze all the risk out of the hiring process. Anyway, if you're able to write a compiler or OS, you should have no problem finding work you're great at and enjoy.
79 dbt // Jan 11, 2012 at 2:50 pm
Alex, I have heard that lament before. I work at a place that uses similarly algorithmic questions (and other questions too; algorithms aren't everything, but they're important), and I sometimes hear my coworkers lament that they couldn't get hired with the standards we have today. Which is, of course, nonsense. What's important to realize about these questions is that in an hour, you have time to make an attempt at an answer, get feedback, and improve your solution. It is a conversation, and not just a blank whiteboard, an interviewer in the corner tapping a ruler on the table every time you make a mistake, and a disappointing early trip home.
80 Daniel Tunkelang // Jan 11, 2012 at 3:27 pm
Tapping a ruler on the table? More like rapping you on the knuckles! OK, the nuns didn't really do that to us. Seriously, dbt is right. Good interviews are a conversation. Otherwise, it would be better to make them non-interactive tests.
81 Don't write on the whiteboard - The Princeton Entrepreneurship Club // Jan 28, 2012 at 2:33 pm
[...] me to write tests for my code, find corner cases. He then asked me 3 other problems. They were Dan Tunkelang-type problems. He ran out of problems and there were 15 minutes left. "Normally there's not enough [...]
82 Hiring: you are doing it wrong « abahgat's blog // Feb 22, 2012 at 3:27 am
[...] is a challenge. A lot has been written about the process itself and its quirks, ranging from programming puzzles to whiteboard interviews. However, there are still a few details that are often overlooked by [...]
83 Strata 2012: Big Data is Bigger than Ever // Mar 2, 2012 at 12:58 am
[...] minutes extended into three hours of conversation about everything from normalized KL divergence to interview problems, and segued into a reception with specialty big-data cocktails. By the time I got back to my [...]
84 man2code // Mar 18, 2012 at 1:18 am
if (segSuffix != null) { memoized.put(input, prefix + " " + segSuffix); return prefix + " " + segSuffix; }
This if should be given an else, as shown below, so as to show the words fetched before a non-dictionary word:
if (segSuffix != null) { memoized.put(input, prefix + " " + segSuffix); return prefix + " " + segSuffix; } else { return prefix; }
85 Jp Bida // Jun 28, 2012 at 11:21 am
Would using human cycles with captchas get you closer to O(n)? Or does big-O analysis only apply when we actually understand the details of the algorithm?
86 netfootprint // Jul 1, 2012 at 1:41 am
I was thinking about the same problem. Quite surprised to see the same thing appear here and asked for interviews. Shouldn't "aaaaaa" be "aaaaa" + 1 if a, aa, aaa, aaaa, aaaaa are in the dict? Step 1: check for "aaaaaa", not in dictionary. Step 2: check for "aaaaa", found in dictionary. Word segments are "aaaaa" + "a". I wonder why there's the need for all combinations in a "real-world" problem!
87 Algorithms: What is the most efficient algorithm to separateconnectedwords, assuming all the constituent words are in the vocabulary (and also assume for simplicity that there aren't any spelling mistakes)? - Quora // Jul 3, 2012 at 4:23 am
[...] is a thorough discussion of this problem as an interview question by Daniel Tunkelang:
[1] http://thenoisychannel.com/2011/... [...]
88 Rob // Aug 27, 2012 at 8:30 pm
I love the problem... but I can't help but notice that the interviewer who has used this question to filter many fine candidates over the years can't even write out a working solution without bugs, given unlimited time, years of experience asking others this question, and the context of writing an in-depth analysis teaching the unwashed masses about how great an interview problem this is. Can it be such a great question if you can't even get it right? The biggest problem with questions like this is that this type of programming is a highly perishable skill. I have played the guitar for 10 years, but lately have spent more time singing acoustically; trying to fit my fingers to a Bach piece that used to come to me written on the back of my eyelids is now impossible. Now, if I were playing Bach every day, it would be a different story. Same with this problem. You are only going to hire the guy who happened to solve several problems like this last week because they ran into some issue where it was relevant.
89 Daniel Tunkelang // Aug 27, 2012 at 8:41 pm
Criticism noted. But bear in mind that the question isn't a binary filter. It's a test of algorithmic thinking and even of working out the problem requirements. I'm curious what you mean by "this type of programming" being highly perishable. If you mean solving a basic algorithmic problem that comes up in the course of real work, then I strongly object. As I said in the post, the problem isn't from a textbook; it's a simplified version of a real issue that has come up for me and others in the course of writing production software. But I grant that people don't always have opportunities to use dynamic programming and perhaps even recursion. With that in mind, I've switched to interview problems that are less reliant on these.
But I still expect the people I hire to be able to apply these fundamental computer science techniques with confidence. Of course, there's a risk with this and any interview problem of overfitting to someone's recent experience. That's why it's good to use a diverse set of interview questions. If someone aces the interviews because she solved all of those problems last week, I'll take my chances and hire her! Finally, if you have suggestions about how to interview more effectively, I'm all ears.
90 Tapori // Sep 2, 2012 at 5:23 pm
One problem with this is that good programmers tend to avoid recursion (no data to support it, though). So while you are expecting a 10-minute recursive version, the good programmer is trying to bring out a non-recursive version and might fail. (Of course, a non-recursive version for this can be done in an interview.) Btw, this is a textbook exercise problem. I believe Sedgewick's book (or perhaps CLR) has it.
91 Daniel Tunkelang // Sep 2, 2012 at 5:30 pm
I concede that a lot of good programmers may instinctively avoid recursion, although in this case I'd say that makes them worse programmers. The whole point of learning a set of software engineering tools is to apply the right one to the right problem, and this problem is most naturally defined and solved recursively. As for it being a textbook exercise, good to know. As I said in the original post, I first encountered this problem in the course of writing production software.
92 Tapori // Sep 2, 2012 at 7:21 pm
The problem (pun intended) is that the problem is still incompletely defined. For instance, we have no idea about the expected target hardware. Does it have limited memory? Limited stack space? Does the language we are to use support recursion? Etc. Ok, maybe the last one (or any of them) is not relevant these days, but you get my point. Basically we have no idea what we need to try and optimize for.
Granted, a good candidate might, and probably should, try to clarify that, but in the face of ambiguity, good programmers follow "(instinctive) good practices" which they gained through experience, etc. For instance, you will use quadratic memory and linear stack in the recursive version. A good programmer might instinctively try to avoid the cost of the stack. But if the goal is to optimize the time to write the code, then a recursive version will be faster (and I suppose that is an implicit goal in the interviews, but almost never the case in critical production code). So calling them worse programmers for not using recursion is not right, IMO. Btw, you seem to have ignored the cost of looking up the memoized structure. Since you are looking up the string, your recursive version is cubic, in spite of the substring and dictionary lookup being O(1). Of course, that can be avoided if we represent the string for lookup by the end-point indexes (i, j) rather than the string itself. Sorry for the long post.
93 Daniel Tunkelang // Sep 2, 2012 at 8:15 pm
No need to apologize; I appreciate the discussion! And several of the points you've raised have come up when I've used this problem in interviews. I've seen candidates implement a stack-based approach without recursion. I've also had discussions about nuances of scale and performance, including whether limited memory requires the dictionary to be stored out of core (a great motivation for using a Bloom filter) and whether the cost of creating a read-only copy of a substring is constant (i.e., represented by the end-point indexes) or linear in the length of the substring (because the substring is actually created as a string). So you're right that not all good programmers will jump to recursion, though I do think that is the simplest path for most. And in an interview I urge candidates to start with the simplest solution that works. That's not only a good idea during an interview, but a good idea in practice, to avoid premature optimization.
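Tapori's suggestions (avoid recursion, and key the lookups by end-point indexes rather than by the string itself) can be illustrated with a bottom-up version. This is a hedged sketch, not anyone's posted code. IterativeWordBreak is a hypothetical name, and the dictionary is assumed to be an in-memory Set<String> with constant-time contains. The table stores, for each prefix length, where the last word of some valid segmentation starts.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Set;

// Hypothetical illustration: bottom-up word break with no recursion.
// The table is indexed by prefix length, so nothing is keyed by strings.
final class IterativeWordBreak {

    static String segment(String input, Set<String> dict) {
        int n = input.length();
        // lastStart[j] = index where the last word of a valid segmentation of
        // input[0, j) begins, or -1 if input[0, j) cannot be segmented.
        int[] lastStart = new int[n + 1];
        Arrays.fill(lastStart, -1);
        lastStart[0] = 0;                      // the empty prefix is trivially fine
        for (int j = 1; j <= n; j++) {
            for (int i = 0; i < j; i++) {
                // Extend a segmentable prefix input[0, i) by the word input[i, j).
                if (lastStart[i] != -1 && dict.contains(input.substring(i, j))) {
                    lastStart[j] = i;
                    break;
                }
            }
        }
        if (n == 0 || lastStart[n] == -1) {
            return null;                       // no exact segmentation exists
        }
        // Follow the stored start indexes backwards to rebuild the words.
        Deque<String> words = new ArrayDeque<>();
        for (int j = n; j > 0; j = lastStart[j]) {
            words.addFirst(input.substring(lastStart[j], j));
        }
        return String.join(" ", words);
    }
}
```

This version uses no recursion and O(n) extra memory for the table, with O(n^2) dictionary lookups overall. Each substring call still copies characters in modern JDKs, which is the substring-cost nuance raised in the reply. Avoiding that copy would require a dictionary that can be probed by index range, and this sketch does not assume one.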
Regardless of the choice of interview problem, the interviewer has to be competent and flexible. I'm sure an interviewer can butcher an interview with even the best problem. But some interview problems are better than others. And I strongly favor interview problems that are based on real problems, don't require specialized knowledge, and provide candidates options to succeed without depending entirely on the candidate arriving at a single insight.
94 Tapori // Sep 2, 2012 at 10:16 pm
Completely agree with the comment about competent and flexible interviewers.
95 Job Interviews: What is your favourite interview question(s) for a software engineer? - Quora // Sep 3, 2012 at 12:07 pm
[...] is your favourite interview question(s) for a software engineer? This blog post has a nice question: http://thenoisychannel.com/2011/... What are your favourite interviewing questions? [...]
96 mm // Sep 22, 2012 at 7:41 pm
Due to this question I had an interview with whitepages.com, and this is why I couldn't get the job... Great answer; hopefully I won't see this question again in my interviews. LOL
97 Quora // Sep 29, 2012 at 10:37 am
Why do topmost tech companies give more priority to algorithms during the recruitment process? Not all top tech companies. At LinkedIn, we put a heavy emphasis on the ability to think through the problems we work on. For example, if someone claims expertise in machine learning, we ask them to apply it to one of our recommendation problems. A...
98 techguy // Oct 16, 2012 at 11:51 pm
I have a question about the memoized solution. I understand the advantage of saving the results of a dead-end computation, where the code reads: memoized.put(input, null); But I don't understand the advantage to memoizing here:
memoized.put(input, prefix + " " + segSuffix); Since segSuffix has already been found to be not null, that means we have reached the end of the input string somewhere deeper in the call stack, and are now just unwinding back to the top? Maybe I've missed something (it's late at night for me), but I can't see it any other way right now. Thanks.
99 Daniel Tunkelang // Oct 17, 2012 at 10:06 pm
I think you're right: we don't need to memoize the non-null values. I've amended the code accordingly.
100 CIKM 2012: Notes from a Conference in Paradise // Nov 12, 2012 at 7:00 am
[...] sessions of the conference. There was a talk on query segmentation, a topic responsible for my most popular blog post. Also a great talk on identifying good abandonment, a problem I've been interested in ever [...]
101 Thought this was cool: CIKM 2012: Notes from a Conference in Paradise | CWOAlpha // Nov 13, 2012 at 7:11 am
[...] sessions of the conference. There was a talk on query segmentation, a topic responsible for my most popular blog post. Also a great talk on identifying good abandonment, a problem I've been interested in ever since [...]
102 Dirk Gorissen // Jan 10, 2013 at 4:41 am
As an aside, as far as I can see, the given code will fail to give a complete solution in cases like this: dict = ["the", "big", "cat"], string = "thebiggercat" or "thebigfoocat". Also, if you call it with the string "cats" it will return null instead of "cat".
103 Daniel Tunkelang // Jan 10, 2013 at 7:25 am
Dirk, that is how the problem is set up. From the post: Q: What about stemming, spelling correction, etc.? A: Just segment the exact input string into a sequence of exact words in the dictionary. Of course you can generalize the problem to make it more interesting, especially if a candidate solves the original problem with time to spare.
104 Dirk Gorissen // Jan 10, 2013 at 7:36 am
Indeed, missed that, sorry. Thanks for a great post btw.
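The exchange between techguy and Daniel concludes that only dead ends need memoizing, because a successful suffix returns straight back up the call stack. A minimal top-down sketch in that spirit follows. It is not the post's code. MemoWordBreak is a hypothetical name, and the dictionary is assumed to be an in-memory Set<String>. Per the problem statement Daniel quotes to Dirk, it segments only the exact input into exact dictionary words.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: recursive word break that memoizes only the
// suffixes already shown to have NO segmentation.
final class MemoWordBreak {

    // Returns the input split into dictionary words separated by single
    // spaces, or null if no exact segmentation exists.
    static String segment(String input, Set<String> dict) {
        return segment(input, dict, new HashSet<>());
    }

    private static String segment(String input, Set<String> dict, Set<String> failed) {
        if (dict.contains(input)) {
            return input;                      // the whole remaining string is a word
        }
        if (failed.contains(input)) {
            return null;                       // known dead end, skip the work
        }
        int len = input.length();
        // substring(0, i) excludes index i, so i < len tries every non-empty
        // proper prefix, up to and including the next-to-last character.
        for (int i = 1; i < len; i++) {
            String prefix = input.substring(0, i);
            if (dict.contains(prefix)) {
                String segSuffix = segment(input.substring(i), dict, failed);
                if (segSuffix != null) {
                    // Success propagates straight back up the call stack,
                    // so it is never looked up again and needs no memo entry.
                    return prefix + " " + segSuffix;
                }
            }
        }
        failed.add(input);                     // remember the dead end
        return null;
    }
}
```

With the dictionary {"the", "big", "cat"} from Dirk's comment, "thebigcat" yields "the big cat", while "thebiggercat" and "cats" yield null, as the problem statement intends.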
105 Daniel Tunkelang // Jan 10, 2013 at 7:38 am
My pleasure, glad you enjoyed it!
How to Prepare for an Interview as an Entry Level Data Analyst
by Rick Leander, Demand Media
Data analyst jobs vary greatly between industries, so preparation is key to landing the job. Find out about the company, its products and services, the industry, the analysis team, and the types of analyses used. In many cases, much of this information can be found on the internet, but it also helps to talk with people inside the company to gain a competitive advantage.
Step 1: Study the job posting.
Go online and find the job listing on the company's website or, if it is not online, pull the application from your files. Write down each requirement of the job, then list relevant experience or training that relates to the item. If the listing mentions survey analysis, describe a class project that involved survey preparation, and describe how the survey was administered and the statistical techniques used to analyze the results. Repeat this for each job requirement.
Step 2: Know the company.
Study the company's website, paying close attention to the "About Us" pages. Find out about the company's products or services, its largest customers, and the backgrounds of the management team. For government or research companies, study their research presentations. These pages will offer quite a bit of insight into their data analysis techniques and practices.
Step 3: Ask for an informational interview.
When possible, ask to call and talk with the hiring manager or a lead analyst to find out more about the work being done. By understanding what the team does, you can align your training and experience to better match their needs. When topics arise that are unfamiliar, take time to research them and fill in these deficiencies.
Step 4: Prepare to answer common interview questions.
Almost every interview starts with a question like "tell me about yourself," so be ready with a concise thirty- to sixty-second answer that includes a summary of your training, work experience, and one or two of your personal interests. Other standard questions will include why you want to work at this company, your strengths and weaknesses, and what you can offer to the company. Prepare short answers for each of these questions, then practice answering them out loud.

Step 5: Be ready for the tough questions. Look through your school transcript and resume for weaknesses that the interviewer may probe, like dropped classes or low grades. Many employers run background and credit checks, so be ready to address any financial or law enforcement issues. Explain the circumstances, then show how you learned and grew from these experiences.

Step 6: Find the interview location. Unless the company sits directly across the street, drive to the interview location ahead of time. Find the main entrance and visitor parking, then take time to observe how the staff dresses for work. On the day of the interview, dress just a bit better than you observed. For example, if the dress is business casual, wear a sport coat and tie. Arrive for the interview early, take a minute or two to check your appearance, then go in with a positive attitude, knowing you are well prepared.

About the Author
Rick Leander lives in the Denver area and has written about software development since 1998. He is the author of "Building Application Servers" and is co-author of "Professional J2EE EAI." Leander is a professional software developer and has a Master of Arts in computer information systems from Webster University.

66 job interview questions for data scientists

1. What is the biggest data set that you processed, and how did you process it? What were the results?
2. Tell me two success stories about your analytic or computer science projects. How was lift (or success) measured?
3. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
4. What is: collaborative filtering, n-grams, MapReduce, cosine distance?
5. How would you optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?
6. How would you come up with a solution to identify plagiarism?
7. How would you detect individual paid accounts shared by multiple users?
8. Should click data be handled in real time? Why? In which contexts?
9. What is better: good data or good models? And how do you define "good"? Is there a universal good model? Are there any models that are definitely not so good?
10. What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with SQL or other languages? Which languages would you choose for semi-structured text data reconciliation?
11. How do you handle missing data? What imputation techniques do you recommend?
12. What is your favorite programming language / vendor? Why?
13. Tell me 3 things positive and 3 things negative about your favorite statistical software.
14. Compare SAS, R, Python, Perl.
15. What is the curse of big data?
16. Have you been involved in database design and data modeling?
17. Have you been involved in dashboard creation and metric selection? What do you think about BIRT?
18. What features of Teradata do you like?
19. You are about to send one million emails (marketing campaign). How do you optimize delivery? How do you optimize response? Can you optimize both separately? (Answer: not really.)
20. Toad or Brio or any other similar clients are quite inefficient at querying Oracle databases. Why? What would you do to increase speed by a factor of 10, and be able to handle far bigger outputs?
21. How would you turn unstructured data into structured data? Is it really necessary? Is it OK to store data as flat text files rather than in an SQL-powered RDBMS?
22. What are hash table collisions? How are they avoided? How frequently do they happen?
23. How would you make sure a MapReduce application has good load balance? What is load balance?
24. Examples where MapReduce does not work? Examples where it works very well? What are the security issues involved with the cloud? What do you think of EMC's solution offering a hybrid approach (both internal and external cloud) to mitigate the risks and offer other advantages (which ones)?
25. Is it better to have 100 small hash tables or one big hash table, in memory, in terms of access speed (assuming both fit within RAM)? What do you think about in-database analytics?
26. Why is naive Bayes so bad? How would you improve a spam detection algorithm that uses naive Bayes?
27. Have you been working with white lists? Positive rules? (In the context of fraud or spam detection.)
28. What is a star schema? Lookup tables?
29. Can you perform logistic regression with Excel? (Yes.) How? (Use LINEST on log-transformed data.) Would the result be good? (Excel has numerical issues, but it's very interactive.)
30. Have you optimized code or algorithms for speed: in SQL, Perl, C++, Python, etc.? How, and by how much?
31. Is it better to spend 5 days developing a 90% accurate solution, or 10 days for 100% accuracy? Depends on the context?
32. Define: quality assurance, six sigma, design of experiments. Give examples of good and bad designs of experiments.
33. What are the drawbacks of the general linear model? Are you familiar with alternatives (Lasso, ridge regression, boosted trees)?
34. Do you think 50 small decision trees are better than a large one? Why?
35. Is actuarial science not a branch of statistics (survival analysis)? If not, how so?
36. Give examples of data that does not have a Gaussian distribution, nor log-normal. Give examples of data that has a very chaotic distribution.
37. Why is mean square error a bad measure of model performance? What would you suggest instead?
38. How can you prove that one improvement you've brought to an algorithm is really an improvement over not doing anything? Are you familiar with A/B testing?
39. What is sensitivity analysis? Is it better to have low sensitivity (that is, great robustness) and low predictive power, or the other way around? How do you perform good cross-validation? What do you think about the idea of injecting noise in your data set to test the sensitivity of your models?
40. Compare logistic regression with decision trees and neural networks. How have these technologies been vastly improved over the last 15 years?
41. Do you know / have you used data reduction techniques other than PCA? What do you think of step-wise regression? What kinds of step-wise techniques are you familiar with? When is full data better than reduced data or a sample?
42. How would you build non-parametric confidence intervals, e.g. for scores? (See the AnalyticBridge theorem.)
43. Are you familiar with either extreme value theory, Monte Carlo simulations or mathematical statistics (or anything else) to correctly estimate the chance of a very rare event?
44. What is root cause analysis? How do you identify a cause vs. a correlation? Give examples.
45. How would you define and measure the predictive power of a metric?
46. How would you detect the best rule set for a fraud detection scoring technology? How do you deal with rule redundancy, rule discovery, and the combinatorial nature of the problem (for finding the optimum rule set, the one with the best predictive power)? Can an approximate solution to the rule set problem be OK? How would you find an OK approximate solution? How would you decide it is good enough and stop looking for a better one?
47. How would you create a keyword taxonomy?
48. What is a botnet? How can it be detected?
49. Any experience with using APIs? Programming APIs? Google or Amazon APIs? AaaS (Analytics as a Service)?
50. When is it better to write your own code than to use a data science software package?
51. Which tools do you use for visualization? What do you think of Tableau? R? SAS? (for graphs). How would you efficiently represent 5 dimensions in a chart (or in a video)?
52. What is a POC (proof of concept)?
53. What types of clients have you been working with: internal, external, sales / finance / marketing / IT people? Consulting experience? Dealing with vendors, including vendor selection and testing?
54. Are you familiar with the software life cycle? With the IT project life cycle, from gathering requests to maintenance?
55. What is a cron job?
56. Are you a lone coder? A production guy (developer)? Or a designer (architect)?
57. Is it better to have too many false positives, or too many false negatives?
58. Are you familiar with pricing optimization, price elasticity, inventory management, competitive intelligence? Give examples.
59. How does Zillow's algorithm work (to estimate the value of any home in the US)?
60. How would you detect bogus reviews, or bogus Facebook accounts used for bad purposes?
61. How would you create a new anonymous digital currency?
62. Have you ever thought about creating a startup? Around which idea / concept?
63. Do you think that typed login / password will disappear? How could they be replaced?
64. Have you used time series models? Cross-correlations with time lags? Correlograms? Spectral analysis? Signal processing and filtering techniques? In which context?
65. Which data scientists do you admire most? Which startups?
66. How did you become interested in data science?
67. What is an efficiency curve? What are its drawbacks, and how can they be overcome?
68. What is a recommendation engine? How does it work?
69. What is an exact test? How and when can simulations help us when we do not use an exact test?
70. What do you think makes a good data scientist?
71. Do you think data science is an art or a science?
72. What is the computational complexity of a good, fast clustering algorithm? What is a good clustering algorithm? How do you determine the number of clusters? How would you perform clustering on one million unique keywords, assuming you have 10 million data points (each one consisting of two keywords and a metric measuring how similar these two keywords are)? How would you create this 10-million-data-point table in the first place?
73. Give a few examples of "best practices" in data science.
74. What could make a chart misleading, difficult to read or interpret? What features should a useful chart have?
75. Do you know a few "rules of thumb" used in statistical or computer science? Or in business analytics?
76. What are your top 5 predictions for the next 20 years?
77. How do you immediately know when statistics published in an article (e.g. a newspaper) are either wrong or presented to support the author's point of view, rather than correct, comprehensive factual information on a specific subject? For instance, what do you think about the official monthly unemployment statistics regularly discussed in the press? What could make them more accurate?
78. Testing your analytic intuition: look at these three charts. Two of them exhibit patterns. Which ones? Do you know that these charts are called scatter plots? Are there other ways to visually represent this type of data?
79. You design a robust non-parametric statistic (metric) to replace correlation or R-squared that (1) is independent of sample size, (2) is always between -1 and +1, and (3) is based on rank statistics. How do you normalize for sample size? Write an algorithm that computes all permutations of n elements. How do you sample permutations (that is, generate tons of random permutations) when n is large, to estimate the asymptotic distribution for your newly created metric? You may use this asymptotic distribution for normalizing your metric. Do you think that an exact theoretical distribution might exist, and therefore, we should find it and use it rather than wasting our time trying to estimate the asymptotic distribution using simulations?
80. A more difficult, technical question related to the previous one. There is an obvious one-to-one correspondence between permutations of n elements and integers between 1 and n!. Design an algorithm that encodes an integer less than n! as a permutation of n elements. What would be the reverse algorithm, used to decode a permutation and transform it back into a number? Hint: an intermediate step is to use the factorial number system representation of an integer. Feel free to check this reference online to answer the question. Even better, feel free to browse the web to find the full answer to the question (this will test the candidate's ability to quickly search online and find a solution to a problem without spending hours reinventing the wheel).
81. How many "useful" votes will a Yelp review receive? My answer: Eliminate bogus accounts (read this article), or competitor reviews (how to detect them: use taxonomy to classify users, and location; two Italian restaurants in the same zip code could badmouth each other and write great comments for themselves). Detect fake likes: some companies (e.g. FanMeNow.com) will charge you to produce fake accounts and fake likes. Eliminate prolific users who like everything, and those who hate everything. Have a blacklist of keywords to filter fake reviews. See if the IP address or IP block of the reviewer is in a blacklist such as "Stop Forum Spam". Create a honeypot to catch fraudsters. Also watch out for disgruntled employees badmouthing their former employer. Watch out for 2 or 3 similar comments posted the same day by 3 users regarding a company that receives very few reviews. Is it a brand new company? Add more weight to trusted users (create a category of trusted users). Flag all reviews that are identical (or nearly identical) and come from the same IP address or same user. Create a metric to measure distance between two pieces of text (reviews). Create a review or reviewer taxonomy. Use hidden decision trees to rate or score reviews and reviewers.
82. What did you do today? Or what did you do this week / last week?
83. What/when is the latest data mining book / article you read? What/when is the latest data mining conference / webinar / class / workshop / training you attended? What/when is the most recent programming skill that you acquired?
84. What are your favorite data science websites? Who do you admire most in the data science community, and why? Which company do you admire most?
85. What/when/where is the last data science blog post you wrote?
86. In your opinion, what is data science? Machine learning? Data mining?
87. Who are the best people you recruited and where are they today?
88. Can you estimate and forecast sales for any book, based on Amazon public data? Hint: read this article.
89. What's wrong with this picture?
90. Should removing stop words be Step 1 rather than Step 3, in the search engine algorithm described here? Answer: Have you thought about the fact that mine and yours could also be stop words? So in a bad implementation, data mining would become data mine after stemming, then data. In practice, you remove stop words before stemming. So Step 3 should indeed become Step 1.
91. Experimental design and a bit of computer science with Legos.

Comment by vishali raji on November 16, 2013 at 11:35pm
Vincent, can I get the possible answers for the above interview questions? Vishali

Comment by Vincent Granville on September 12, 2013 at 9:28am
I have added one new question: question #90.

Comment by Vincent Granville on May 5, 2013 at 7:44pm
Someone wrote: "Looks like hiring managers expect data scientists to have expertise in machine learning, statistics, business intelligence, database design, data munging, data visualization and programming. Aren't these requirements too excessive?" My answer:
I think being familiar with all these domains (add computer science, MapReduce) is necessary, as well as expertise in some of these domains. Mastering two programming languages (Java, Python) is a must, as well as familiarity with R and SQL. Visualization is easy to acquire. Knowing how to quickly and independently find, learn (or if necessary, invent) and assess the usefulness of the techniques needed to handle the problems is critical, and MORE important than "knowing" the techniques in the first place. A good amount of experience with some techniques is necessary. But you don't need to be an expert in everything. For instance, about 90% of what I learned in statistics courses, I've never had to use to solve business problems. So why learn it in the first place? Also, machine learning (in my opinion) is a subset of statistics focusing on clustering, pattern recognition and association rules. The mistake that many hiring managers make is looking for someone who is an expert in everything.

Comment by Joe M on April 21, 2013 at 5:08pm
Are questions like this actually asked in high-level interviews? All I ever got when I was starting out was "What was your most satisfying experience?" and "Other than for the money, why do you want this job?"

Comment by Ritendra on February 25, 2013 at 12:44pm
Vincent, the questions are good and many of them are based on practical experience too. Thanks for sharing the comments from Allan Engelhardt; they provide better context. Answer to #76: I would love to see #1: VA (Visual Analytics) being top of them. #2: a large amount of data being replaced by videos.

Comment by Mars Ma on February 20, 2013 at 5:51am
@Vincent Granville, really useful questions, I like them, thanks a lot!

Comment by Amy on February 19, 2013 at 11:40am
@Craig: If the client returns results in your browser, you can handle only as much data as your browser can. In most cases, an 80,000-row table will crash your browser.
Just access Oracle directly via Python or Perl, and you can handle (extract and save) gigabytes of data quite easily. And far, far faster.

Comment by Amy on February 19, 2013 at 11:37am
What makes you a data scientist? You just need to know how to gather and turn data into money, nothing more, nothing less. No degree is needed; you can learn some techniques by reading material online, but much of what makes a successful data scientist (data/business craftsmanship) is not found in any curriculum or published article.

Comment by Craig Chambers on February 19, 2013 at 10:22am
Can someone help me with #20: "Toad or Brio or any other similar clients are quite inefficient at querying Oracle databases. Why? What would you do to increase speed by a factor of 10, and be able to handle far bigger outputs?" I didn't know that SQL clients really affected actual query efficiency. Thanks!

Comment by Vincent Granville on February 18, 2013 at 7:19am
Here's a potential answer to question #10 (probabilistic merging). Feel free to add your answers to any of these questions.
Answer to question #10: Not sure if the problem of fuzzy merging can be addressed within the framework of traditional databases. Say you have a table A with 10,000 users (key is user ID), and a table B with 50,000 users (key is user ID). You could create a user mapping table C with three fields:
1. UserID (key)
2. Alternate_UserID (this field would also be a user ID)
3. Probability (probability that UserID = Alternate_UserID)
This table would be populated after some machine learning algorithm had been applied to tables A and B to identify similar users and the probability they match. Make sure that you only include (in table C) records where the probability is above (say) 0.25, otherwise you risk exploding your database.

Comment by Vincent Granville on February 18, 2013 at 8:54am
Also, my 70 questions focus mostly on the tech aspects of being a data scientist.
And these are high-level questions, aimed at senior professionals (I think there is no such thing as a junior data scientist; they would be called data analyst, software engineer, statistician or computer scientist instead). I did not include questions about soft skills; that would be another set of 70 questions. I will add a new one: do you think data science is an art or a science? The answer, as always, is "both". Then you can dig deeper and ask whether you are more of an artist than a scientist. My answer would be: it's more craftsmanship than art, but in my case, being a designer/architect, it's a tiny bit closer to art than to science. Certainly a blend of both. And when bringing up the issue of art vs. science, I would also add that I like to build solutions that are elegant in the way they contribute to ROI / lift, but not in the way they contribute to statistical theory and the beauty of science. I like a dirty, ugly, imperfect solution better than a "great model" if it is more scalable, simple, efficient, easy to implement and robust.

Comment by Richard Giambrone on February 15, 2013 at 12:51pm
Vincent, I like these questions. They are good questions to ask yourself even if you're not interviewing. Understanding what you do is different from being able to explain what you do.
Rich Giambrone

Comment by Vincent Granville on February 15, 2013 at 10:22am
A data scientist is a bit of everything (statistician, software engineer, business analyst, computer scientist, six sigma, consultant, communicator), but most importantly she is a senior analytic practitioner:
• with a very good sense for business data and business optimization at large
• with knowledge of big data, both its drawbacks and its potential (and able to leverage its potential)
• who enjoys swimming in unstructured data and fuzzy, non-SQL "joins"
• who knows the limitations of old statistics (regression etc.) yet knows how to correctly do sampling, cross-validation, Monte Carlo simulations, design of experiments, assessing lift, and identifying good metrics
• who knows the limitations of MapReduce, and how they can be overcome
• who can design and develop robust, simple, efficient, reliable, scalable, useful predictive algorithms, whether or not based on statistical theory

A data scientist may not know much (but at least a little) about linear regression, statistical distributions, the complexity of the quicksort (sorting) algorithm or the limit theorems. Her knowledge of SQL can be a bit elementary, although she can run a big SQL query 10 times faster than business analysts who use tools such as Toad or Brio. Her strengths, skills and knowledge are briefly outlined above.

Comment by Vincent Granville on February 15, 2013 at 6:24am
Interviewers would pick a small subset; there's not enough time in a one-day interview to ask all these questions. Also, several of these questions are about relevant projects (e.g. questions #1 and #2). Of course, these are not yes/no questions, and one would expect to spend 10-15 minutes and go into some depth answering them. Not being able to answer one question is no big deal: this set has 70 questions and the interviewer can easily pick another one. Indeed, this is the purpose of my list.

Comment by Vincent Granville on February 14, 2013 at 8:52am
Here's a comment from one of our readers:
Some suggestions for structure you may want to apply to your own list:
• Tools (#13 14 18 etc.)
• Algorithms (#26 33 etc.)
• Statistics (#35 36 37 etc.)
• Techniques (#3 4 10 etc.)
• Data Structures (#21 22 25 etc.)
• Experience (#1 2 etc.)
• Business language (#52 54)
• Domain-specific (#5 6 7 8 10 19 20 21 24 27 46 55 59 and probably others)
• Plain weirdness (#5 48 59 61 63)
It is probably worth thinking about the areas that are important to you and managing a list based on those. I don't think Vincent expects us to use the list except for inspiration. My favourites from the list (for senior people) are #2, #9 (my answer: "valuable actions are best") and #62. Which ones are your favourites?
By Allan Engelhardt

Interview Questions for Data Scientists
Posted January 3, 2013 by Hilary Mason

Great data scientists come from such diverse backgrounds that it can be difficult to get a sense of whether someone is up to the job in just a short interview. In addition to the technical questions, I find it useful to have a few questions that draw out the more creative and less discrete elements of a candidate's personality. Here are a few of my favorite questions.

1. What was the last thing that you made for fun? This is my favorite question by far: I want to work with the kind of people who don't turn their brains off when they go home. It's also a great way to learn what gets people excited.

2. What's your favorite algorithm? Can you explain it to me? I don't know any data scientists who haven't fallen in love with an algorithm, and I want to see both that enthusiasm and that the candidate can explain it to a knowledgeable audience. Update: As Drew pointed out on Twitter, do be aware of hammer syndrome: when someone falls so in love with one algorithm that they try to apply it to everything, even when better choices are available.

3. Tell me about a data project you've done that was successful.
How did you add unique value? This is a chance for the candidate to walk us through a success and show off a bit. It's also a great gateway into talking about their process and preferred tools and experience.

4. Tell me about something that failed. What would you change if you had to do it over again? This is a tricky question, and sometimes it takes people a few tries to get to a complete answer. It's worth asking, though, to see that people have the confidence to talk about something that went awry, and the wisdom to have recognized when something they did was not optimal.

5. You clearly know a bit about our data and our work. When you look around, what's the first thing that comes to mind as "why haven't you done X?" Technical competence is useless without the creativity to know where to focus it. I love when people come in with questions and ideas.

6. What's the best interview question anyone has ever asked you? I'd like to wish for more wishes, please. I'm always looking for new and interesting things to add to my list, and I'd love to hear your suggestions.

Algorithms Every Data Scientist Should Know: Reservoir Sampling
by Josh Wills (@josh_wills), April 23, 2013

Data scientists, that peculiar mix of software engineer and statistician, are notoriously difficult to interview. One approach that I've used over the years is to pose a problem that requires some mixture of algorithm design and probability theory in order to come up with an answer. Here's an example of this type of question that has been popular in Silicon Valley for a number of years:

Say you have a stream of items of large and unknown length that we can only iterate over once. Create an algorithm that randomly chooses an item from this stream such that each item is equally likely to be selected.

The first thing to do when you find yourself confronted with such a question is to stay calm.
The data scientist who is interviewing you isn't trying to trick you by asking you to do something that is impossible. In fact, this data scientist is desperate to hire you. She is buried under a pile of analysis requests, her ETL pipeline is broken, and her machine learning model is failing to converge. Her only hope is to hire smart people such as yourself to come in and help. She wants you to succeed. Remember: Stay Calm.

The second thing to do is to think deeply about the question. Assume that you are talking to a good person who has read Daniel Tunkelang's excellent advice about interviewing data scientists. This means that this interview question probably originated in a real problem that this data scientist has encountered in her work. Therefore, a simple answer like, "I would put all of the items in a list and then select one at random once the stream ended," would be a bad thing for you to say. It would mean that you didn't think deeply about what would happen if there were more items in the stream than would fit in memory (or even on disk!) on a single computer.

The third thing to do is to create a simple example problem that allows you to work through what should happen for several concrete instances of the problem. The vast majority of humans do a much better job of solving problems when they work with concrete examples instead of abstractions, so making the problem concrete can go a long way toward helping you find a solution.

A Primer on Reservoir Sampling

For this problem, the simplest concrete example would be a stream that only contained a single item. In this case, our algorithm should return this single element with probability 1. Now let's try a slightly harder problem, a stream with exactly two elements. We know that we have to hold on to the first element we see from this stream, because we don't know if we're in the case that the stream only has one element.
When the second element comes along, we know that we want to return one of the two elements, each with probability 1/2. So let's generate a random number x between 0 and 1, and return the first element if x is less than 0.5 and the second element otherwise.

Now let's try to generalize this approach to a stream with three elements. After we've seen the second element in the stream, we're now holding on to either the first element or the second element, each with probability 1/2. When the third element arrives, what should we do? Well, if we know that there are only three elements in the stream, we need to return this third element with probability 1/3. This means that we'll return the other element we're holding with probability 1 - 1/3 = 2/3. So the probability of returning each element in the stream is as follows:
1. First Element: (1/2) × (2/3) = 1/3
2. Second Element: (1/2) × (2/3) = 1/3
3. Third Element: 1/3

By considering the stream of three elements, we see how to generalize this algorithm to any N: at every step N, keep the next element in the stream with probability 1/N. This means that we have an (N-1)/N probability of keeping the element we are currently holding on to. By the same argument one step earlier, that held element is any particular earlier item with probability 1/(N-1). So each earlier item is kept with probability (1/(N-1)) × ((N-1)/N) = 1/N, the same as the new element.

This general technique is called reservoir sampling, and it is useful in a number of applications that require us to analyze very large data sets. You can find an excellent overview of a set of algorithms for performing reservoir sampling in this blog post by Greg Grothaus. I'd like to focus on two of those algorithms in particular, and talk about how they are used in Cloudera ML, our open-source collection of data preparation and machine learning algorithms for Hadoop.

Applied Reservoir Sampling in Cloudera ML

The first of the algorithms Greg describes is a distributed reservoir sampling algorithm.
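Before turning to the distributed version, here is a Python sketch of the single-pass algorithm derived in the primer above. It is illustrative only and is not Cloudera ML code. The function name `reservoir_sample_one` and the optional `rng` argument are assumptions. The nth item replaces the held item with probability 1/n, which is the rule derived above. Memory use is constant no matter how long the stream is.

```python
import random
from collections import Counter
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample_one(
    stream: Iterable[T], rng: Optional[random.Random] = None
) -> Optional[T]:
    """Return one item chosen uniformly at random from `stream` in a single pass.

    The nth item (1-based) replaces the currently held item with probability 1/n.
    Returns None if the stream is empty.
    """
    rng = rng or random.Random()
    chosen: Optional[T] = None
    for n, item in enumerate(stream, start=1):
        # random() is in [0, 1), so the first item is always taken.
        if rng.random() < 1.0 / n:
            chosen = item
    return chosen


if __name__ == "__main__":
    # Empirical check: with 3 elements, each should be picked about 1/3 of the time.
    rng = random.Random(42)
    counts = Counter(reservoir_sample_one(range(3), rng) for _ in range(30_000))
    print(sorted(counts.items()))
```

Because the function only iterates once, `stream` can be a generator over data that never fits in memory. This is the point of the interview question.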
You'll note that for the algorithm we described above to work, all of the elements in the stream must be read sequentially. To create a distributed reservoir sample of size K, we use a MapReduce analogue of the ORDER BY RAND() trick/anti-pattern from SQL: for each element in the set, we generate a random number x between 0 and 1, and keep the K elements that have the largest values of x. This trick is especially useful when we want to create stratified samples from a large dataset. Each stratum is a specific combination of categorical variables that is important for an analysis, such as gender, age, or geographical location. If there is significant skew in our input data set, a naive random sampling of observations may underrepresent certain strata in the dataset. Cloudera ML has a sample command for creating stratified samples from text files and Hive tables (via the HCatalog interface to the Hive Metastore). It selects N records for every combination of the categorical variables that define the strata.

The second algorithm is even more interesting: a weighted distributed reservoir sample. Every item in the set has an associated weight, and we want to sample such that the probability that an item is selected is proportional to its weight. It wasn't even clear whether or not this was possible until Pavlos Efraimidis and Paul Spirakis figured out a way to do it and published it in the 2005 paper "Weighted Random Sampling with a Reservoir." The solution is as simple as it is elegant, and it is based on the same idea as the distributed reservoir sampling algorithm described above. For each item in the stream, we compute a score as follows: first, generate a random number x between 0 and 1, and then take the nth root of x, where n is the weight of the current item. Return the K items with the highest score as the sample.
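The scoring rule just described can be sketched in Python as a single pass that keeps the K highest scores x^(1/w) in a min-heap. This is a minimal illustration under stated assumptions, not Cloudera ML's implementation. The function name and the `(item, weight)` input format are hypothetical, and the sketch rejects non-positive weights. When every weight is 1 the score is just x, so the same function also performs the unweighted "keep the K largest random numbers" sample from the previous paragraph.

```python
import heapq
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def weighted_reservoir_sample(
    items: Iterable[Tuple[T, float]], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Single-pass weighted sample of up to k items (Efraimidis-Spirakis scoring).

    Each item gets the score x ** (1 / weight) with x uniform in [0, 1); the k items
    with the highest scores are returned. Larger weights push scores toward 1.
    """
    if k <= 0:
        return []
    rng = rng or random.Random()
    # Min-heap of (score, arrival index, item); the index breaks ties so that
    # items themselves are never compared.
    heap: List[Tuple[float, int, T]] = []
    for idx, (item, weight) in enumerate(items):
        if weight <= 0:
            raise ValueError("weights must be positive")
        score = rng.random() ** (1.0 / weight)
        entry = (score, idx, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif score > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current lowest score
    # Highest scores first.
    return [item for _, _, item in sorted(heap, reverse=True)]
```

The top K scores of a union equal the top K of the per-partition top Ks. That property makes the scheme distributable: each mapper can keep a local heap like this one, and a single reducer merges the candidates by score.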
Items with higher weights will tend to have scores that are closer to 1, and are thus more likely to be picked than items with smaller weights.

In Cloudera ML, we use the weighted reservoir sampling algorithm to cut down on the number of passes over the input data that the scalable k-means++ algorithm needs to perform. The ksketch command runs the k-means++ initialization procedure, performing a small number of iterations over the input data set to select points that form a representative sample (or sketch) of the overall data set. For each iteration, the probability that a given point should be added to the sketch is proportional to its distance from the closest point in the current sketch. By using the weighted reservoir sampling algorithm, we can select the points to add to the next sketch in a single pass over the input data, instead of one pass to compute the overall cost of the clustering and a second pass to select the points based on those cost calculations.

These Books Behind Me Don't Just Make The Office Look Good

Interesting algorithms aren't just for the engineers building distributed file systems and search engines; they can also come in handy when you're working on large-scale data analysis and statistical modeling problems. I'll try to write some additional posts on algorithms that are interesting as well as useful for data scientists to learn, but in the meantime, it never hurts to brush up on your Knuth.

How to hire data scientists and get hired as one

As you might have heard before if you read McKinsey reports, the New York Times or just about any technology news site, data scientists are in high demand. Heck, the Harvard Business Review called data scientist the sexiest job of the 21st century. But landing a gig as a data scientist isn't easy, especially a top-notch gig at a major web or e-commerce company where merely talented people are a dime a dozen.
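To see why a single pass is enough, here is a rough sketch of one sketch-growing iteration built on the weighted reservoir score. It rests on several assumptions that are not from the text: points are Euclidean tuples, each iteration adds a fixed number of points (`per_iter`), the sketch is seeded with one uniformly random point, and the names `grow_sketch` and `build_sketch` are hypothetical. It is not the ksketch implementation. Following the text, each point's weight is its distance to the closest sketch point; k-means++ is usually stated with squared distance, so replace `d` with `d * d` in the score if you want that weighting.

```python
import heapq
import math
import random
from typing import Iterator, List, Sequence, Tuple

Point = Tuple[float, ...]


def grow_sketch(points: Sequence[Point], sketch: Sequence[Point],
                per_iter: int, rng: random.Random) -> List[Point]:
    """One pass over `points`: add `per_iter` new points to the sketch, each
    chosen with weight equal to its distance from the closest sketch point."""

    def scored() -> Iterator[Tuple[float, Point]]:
        for p in points:
            d = min(math.dist(p, c) for c in sketch)
            # A point already in the sketch has weight 0 and is never picked.
            if d > 0.0:
                # Efraimidis-Spirakis score: u ** (1 / weight).
                yield rng.random() ** (1.0 / d), p

    chosen = heapq.nlargest(per_iter, scored(), key=lambda pair: pair[0])
    return list(sketch) + [p for _, p in chosen]


def build_sketch(points: Sequence[Point], iterations: int, per_iter: int,
                 rng: random.Random) -> List[Point]:
    """Seed with one uniformly random point, then grow the sketch `iterations` times."""
    sketch = [rng.choice(points)]
    for _ in range(iterations):
        sketch = grow_sketch(points, sketch, per_iter, rng)
    return sketch
```

The weight of each point is computed and turned into a score inside the same loop, so the total clustering cost (the normalizing constant for the selection probabilities) never has to be known in advance, which is what removes the second pass.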
However, companies are starting to talk openly about what they look for in data scientists, including the skills someone should have and what they'll need to know to survive an interview. I spent a day at the Predictive Analytics World conference on Monday and heard both Netflix and Orbitz give their two cents. That's also the same day Hortonworks published a blog post about how to build a data science team. Granted, "data scientist" is a nebulous term (perhaps as much so as "big data"), but these tips (a mashup of all three sources) are still broadly applicable. If you want to make the leap from guy who knows data to data scientist, I suggest paying attention.

1. Know the core competencies.

For most of us, there's readin', 'ritin' and 'rithmetic. For data scientists, there's SQL, statistics, predictive modeling and programming (probably Python). If you don't have at least a grounding in these skills, you're probably not getting through the door, in part because they form a common language that lets people from different backgrounds talk to each other. Hortonworks' Ofer Mendelevitch describes the ideal data scientist as occupying a place on the spectrum between a software engineer and a research scientist. In distinguishing a great engineer, mathematician or data analyst from a data scientist, programming skills are probably the biggest variable. That's because being able to write code means you'll have an easier time testing out your hypotheses and algorithms, hacking through certain problems and generally thinking in ways that actually relate to the products your employer is building.

(Figure source: Hortonworks)

Chris Pouliot, director of algorithms and analytics at Netflix, said even being able to write pseudo-code might be good enough if someone is otherwise a strong candidate. You can pick up SQL or Python or whatever you need pretty quickly, he noted. Or, hinted Orbitz VP of Advanced Analytics Sameer Chopra, you could just suck it up and learn Python now:
"If you were to leave today and ask 'What specific skills should I learn?': Python."

2. Know a little more.

Of course, just meeting the minimum requirements never got anybody a job (well, almost nobody). What Pouliot is really looking for in a candidate is: an advanced degree in a quantitative field; hands-on experience hacking data (ideally using Hive, Pig, SQL or Python); good exploratory analysis skills; the ability to work with engineering teams; and the ability to generate and create algorithms and models rather than relying on out-of-the-box ones. Chopra's advice was to get up to speed on machine learning, especially if you want to work in Silicon Valley, where machine learning has exploded in popularity. He's also a big fan of honing those hacking skills, because data munging is a valuable skill when you're dealing with many types of data that you need to process so they work together. If you can do quality analytics across myriad data sources, Chopra said, "you can write your own ticket in this day and age." Oh, and if you're planning to work at a startup, he added, R is almost a must-know for anyone whose job will entail statistical analysis.

3. Embrace online learning.

If it all sounds a little daunting, don't be too worried, Chopra advised. There are plenty of opportunities to learn these new skills online, via both massive open online courses (he's particularly keen on Udacity's Computer Science 101 and Andrew Ng's machine learning course on Coursera) and universities' own online curricula. Chopra also suggested joining professional groups on LinkedIn, participating in Kaggle competitions and maybe even getting out of the house by going to meetups. Whatever you're curious about, though (text mining, natural language processing, deep learning), you can probably find someone willing to teach you for free or nearly free, and any additional skills will help set you apart from the crowd.

4. Learn to tell a story.

Last month at Structure:Data, DJ Patil told me that one of the biggest skill gaps in data science is the ability to tell a story with data beyond just pointing to the numbers. Chopra agreed, noting that today's new visualization tools make it easier to display data in formats that non-scientists might be able to (or at least want to) consume. A corollary of storytelling is good, old-fashioned communication: all the charts in the world won't make a difference if you can't communicate to product managers or executives why your findings matter. Pouliot is a little less sold on communication skills, though, at least sometimes. If you're an engineer primarily talking to other engineers, he told the room, you can probably speak all the jargon you want. It's only when someone has a business-facing role that communication really becomes important.

5. Prepare to be tested (aka "Your pedigree means nothing").

After you've learned all these skills, added them to your résumé and talked to a hiring manager about how good you are at them, it's likely testing time. Prospective Netflix data scientists go through a battery of exercises, Pouliot says, including explaining projects they've worked on and answering questions that probe the depth of their knowledge. They'll also be asked to devise a framework that solves a problem of the interviewer's choice.

One thing Pouliot warned about is an over-reliance on what's on your résumé. Right off the bat, for example, he'll test the heck out of the skills or knowledge that someone claims, to ensure they really know it. Having a Stanford degree and work experience at Google doesn't necessarily make someone a shoo-in, either. Pouliot acknowledged during a quick chat after his presentation that he's been seduced by the perfect résumé before, even going so far as to cut a few corners to get someone in for an interview, only to be disappointed in the end.
Everyone has to pass the tests, he said, and some of the best applicants on paper crashed and burned very early in the process.

6. Exercise creativity.

It's during the testing phase at places like Netflix that all those personal skills and experience can come into play. There's often no right answer to the hypotheticals an interviewer like Pouliot might ask, and he gives bonus points for solutions he's never seen before. "Creativity is one of the biggest things to look for when hiring data scientists," he said. Later, he added, "Creativity is king, I think, for a great data scientist."

Bonus tips for anyone hiring and managing data scientists

Technically, Pouliot's talk at Predictive Analytics World was about hiring data scientists, but many of the insights were probably more valuable to aspiring data scientists. Some of them, though, were definitely for management, possibly at the C-level. A few points to consider:

Netflix has a standalone data science team that works closely with other departments but ultimately answers to itself. This helps the data scientists collaborate with one another, gives them upward mobility (i.e., they might never become director of marketing, but they could become director of data science) and makes it easier to manage them, because everyone speaks the same language, so an employee knows his boss knows his stuff. However, he noted, the alternative approach of embedding data scientists within other departments does bring its own benefits. That type of setup can result in better alignment of research efforts and business needs, and it can help products get built faster because everyone is on the same page. Pouliot suggests one compromise might be to keep a centralized data science team but locate it physically near the other teams it will interact with most often; another is simply to ensure you have representatives from every stakeholder department present for meetings and problem-solving exercises.
Actually, if you just cannot hire data scientists with all the skills you want them to have, Mendelevitch from Hortonworks suggests a similar tactic. It can be difficult to teach applied math to software engineers and vice versa, so, he writes, "[S]imply build a Hadoop data science team that combines data engineers and applied scientists, working in tandem to build your data products. Back when I was at Yahoo!, that's exactly the structure we had: applied scientists working together with data engineers to build large-scale computational advertising systems."

If you want to retain your good data scientists once you've hired them (especially in Silicon Valley, where they can walk out the door and get five offers), paying them the market rate is a good start. Additionally, Pouliot said, letting them work on challenging products will keep them happy. Micro-managing them will not.