Common Analytics Interview Questions

Question 1. Can you outline the various steps in an analytics project?


Broadly speaking, these are the steps. Of course, they may vary slightly depending on the type of problem, the data, the tools available, etc.
1. Problem definition: The first step, of course, is to understand the business problem. What problem are you trying to solve, and what is the business context? Very often, however, your client may just give you a whole lot of data and ask you to do something with it. In such a case you would need to take a more exploratory look at the data. Nevertheless, if the client has a specific problem that needs to be tackled, then the first step is to clearly define and understand the problem. You will then need to convert the business problem into an analytics problem. In other words, you need to understand exactly what you are going to predict with the model you build. There is no point in building a fabulous model, only to realise later that what it predicts is not exactly what the business needs.
2. Data exploration: Once you have defined the problem, the next step is to explore the data and become more familiar with it. This is especially important when dealing with a completely new data set.
3. Data preparation: Now that you have a good understanding of the data, you will need to prepare it for modelling. You will identify and treat missing values, detect outliers, transform variables, create binary variables if required, and so on. This stage is strongly influenced by the modelling technique you will use at the next stage. For example, regression involves a fair amount of data preparation, decision trees may need less, and clustering requires a whole different kind of preparation compared with other techniques.
4. Modelling: Once the data is prepared, you can begin modelling. This is usually an iterative process: you run a model, evaluate the results, tweak your approach, run another model, evaluate the results, re-tweak, and so on. You keep doing this until you come up with a model you are satisfied with, or with what you feel is the best possible result for the given data.
5. Validation: The final model (or perhaps the best 2 or 3 models) should then be put through the validation process. In this process, you test the model on a completely new data set, i.e. data that was not used to build the model. This ensures that your model is a good model in general, and not just a very good model for the specific data used to build it. (Technically, this is called avoiding overfitting.)
6. Implementation and tracking: The final model is chosen after validation. You then implement the model and track the results to see how it performs over time. In general, the accuracy of a model goes down over time. How quickly really depends on how dynamic or static the variables are, and on how dynamic or static the general environment is.

Question 2. What do you do in data exploration?


Data exploration is done to become familiar with the data. This step is especially important when dealing with new data. There are a number of things you will want to do in this step:
a. What is in the data: look at the list of all the variables in the data set. Understand the meaning of each variable using the data dictionary, and go back to the business for more information in case of any confusion.
b. How much data there is: look at the volume of the data (how many records) and at the time frame of the data (last 3 months, last 6 months, etc.).
c. Quality of the data: how much information is missing, and what is the quality of the data in each variable? Are all fields usable? If a field has data for only 10% of the observations, then that field may not be usable.
d. You will also identify some important variables and may investigate these more deeply, for example by looking at averages, minimum and maximum values, and perhaps the 10th and 90th percentiles as well.
e. You may also identify fields that you need to transform in the data preparation stage.
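
The checks in points b to d can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration rather than a SAS procedure: the `profile` helper and the sample income values are hypothetical, and a missing value is assumed to be stored as `None`.

```python
import statistics
from typing import Optional


def profile(values: list[Optional[float]]) -> dict:
    """Summarise one numeric field: volume, missingness and spread."""
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    summary = {
        "n_records": len(values),
        "n_missing": n_missing,
        "pct_missing": 100.0 * n_missing / len(values) if values else 0.0,
    }
    if len(present) >= 2:
        # quantiles(n=10) returns the 9 decile cut points (10th ... 90th percentile)
        deciles = statistics.quantiles(present, n=10, method="inclusive")
        summary.update(
            mean=statistics.mean(present),
            min=min(present),
            max=max(present),
            p10=deciles[0],
            p90=deciles[-1],
        )
    return summary


# Hypothetical income field with two missing entries
income = [42_000, None, 38_500, 51_000, None, 47_250, 120_000, 39_900, 44_100, 46_000]
print(profile(income))
```

A field whose `pct_missing` is very high (like the 90% missing case in point c) is a candidate for dropping.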

Question 3: What do you do in data preparation?


In data preparation, you prepare the data for the next stage, i.e. the modelling stage. What you do here is influenced by the technique you choose for that stage. But some things are done in most cases, for example identifying and treating missing values, identifying and treating outlier values (unusual values), transforming variables, and creating binary variables if required.
This is also the stage where you partition the data, i.e. create training data (for modelling) and validation data (for validation).
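
A random split into training and validation data might look like the following Python sketch. The 70/30 proportion, the fixed seed and the `partition` helper are illustrative assumptions, not a prescribed method.

```python
import random


def partition(records: list, train_fraction: float = 0.7, seed: int = 42):
    """Randomly split records into a training set and a validation (hold-out) set."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


customers = list(range(1, 101))  # hypothetical customer IDs
train, valid = partition(customers)
print(len(train), len(valid))    # 70 30
```

Because the two parts never overlap, the validation data really is "completely new" to the model, which is what the validation step requires.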

Question 4: How will you treat missing values?


The first step is to identify the variables with missing values and assess the extent of the missing values. Is there a pattern in the missing values? If yes, try to identify it; it may lead to interesting insights.
If there is no pattern, we can either ignore the missing values (SAS modelling procedures such as PROC REG and PROC LOGISTIC will not use any observation that has a missing value in one of the model variables) or impute them:
Simple imputation: substitute the mean or median value
OR
Case-wise imputation: impute each missing value using similar cases; for example, if we have missing values in the income field, use the income of similar customers.
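
Both kinds of imputation can be sketched in Python. Here "case-wise" imputation is read as filling a gap from similar cases (the median income of the same customer segment); that reading, the `impute_*` helpers and the sample data are assumptions for illustration, and `None` stands in for a missing value.

```python
import statistics
from collections import defaultdict


def impute_simple(values, how="median"):
    """Replace each None with the median (or mean) of the non-missing values."""
    present = [v for v in values if v is not None]
    fill = statistics.median(present) if how == "median" else statistics.mean(present)
    return [fill if v is None else v for v in values]


def impute_by_group(values, groups):
    """Replace each None with the median of the non-missing values in its group."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        if v is not None:
            by_group[g].append(v)
    medians = {g: statistics.median(vs) for g, vs in by_group.items()}
    # A group with no observed values has no median, so its gaps stay None
    return [medians.get(g) if v is None else v for v, g in zip(values, groups)]


# Hypothetical income values (in thousands) and customer segments
income = [30, None, 50, 90, None, 110]
segment = ["junior", "junior", "junior", "senior", "senior", "senior"]

print(impute_simple(income))             # [30, 70.0, 50, 90, 70.0, 110]
print(impute_by_group(income, segment))  # [30, 40.0, 50, 90, 100.0, 110]
```

The group-wise version fills the junior gap with 40 rather than the overall 70, which is the point of using similar cases.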

Question 5: How will you treat outlier values?

You can identify outliers using graphical analysis and univariate analysis. If there are only a few outliers, you can assess them individually. If there are many, you may want to substitute the outlier values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to ignore records with outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
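
Substituting extreme values with the 1st and 99th percentile values (often called capping or winsorizing) can be sketched in Python as follows. The percentile definition used (`method="inclusive"` in the standard `statistics` module) and the sample data are assumptions; other tools may interpolate percentiles slightly differently.

```python
import statistics


def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Pull values below P(lower_pct) or above P(upper_pct) back to those cut-offs."""
    # quantiles(n=100) returns the 99 cut points P1, P2, ..., P99
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical spend values: 99 ordinary ones and a single extreme one
spend = list(range(1, 100)) + [10_000]
capped = cap_outliers(spend)
print(max(spend), max(capped))
```

Every record is kept; only its value is moved, which is why capping suits data sets with many outliers.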

Question 6: How do you assess the results of a logistic regression analysis?


You can use different methods to assess how good a logistic model is.
a. Concordance: tells you about the ability of the model to discriminate between the event happening and not happening.
b. Lift: helps you assess how much better the model is compared with random selection.
c. Classification matrix: helps you look at the true and false positives and the true and false negatives.
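
The three measures can be computed by hand in Python. This is a minimal sketch on hypothetical outcomes `y` (1 = event) and predicted probabilities `p`. It uses a 0.5 probability cutoff for the classification matrix and a top-20% cut for lift; both are assumed choices, and at least one event is assumed so the overall event rate is not zero.

```python
from itertools import product


def concordance(y, p):
    """Percent concordant, discordant and tied over all (event, non-event) pairs."""
    events = [s for yi, s in zip(y, p) if yi == 1]
    non_events = [s for yi, s in zip(y, p) if yi == 0]
    conc = disc = tied = 0
    for e, n in product(events, non_events):
        if e > n:
            conc += 1   # the event got the higher probability: concordant
        elif e < n:
            disc += 1
        else:
            tied += 1
    total = conc + disc + tied
    return 100 * conc / total, 100 * disc / total, 100 * tied / total


def confusion_matrix(y, p, cutoff=0.5):
    """Counts of true/false positives and negatives at a probability cutoff."""
    tp = sum(1 for yi, s in zip(y, p) if s >= cutoff and yi == 1)
    fp = sum(1 for yi, s in zip(y, p) if s >= cutoff and yi == 0)
    tn = sum(1 for yi, s in zip(y, p) if s < cutoff and yi == 0)
    fn = sum(1 for yi, s in zip(y, p) if s < cutoff and yi == 1)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def lift(y, p, top_fraction=0.1):
    """Event rate among the highest-scored records divided by the overall event rate."""
    ranked = [yi for _, yi in sorted(zip(p, y), key=lambda t: t[0], reverse=True)]
    k = max(1, round(len(ranked) * top_fraction))
    return (sum(ranked[:k]) / k) / (sum(y) / len(y))


y = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
p = [0.9, 0.8, 0.7, 0.2, 0.65, 0.3, 0.1, 0.4, 0.35, 0.05]
print(concordance(y, p))         # (87.5, 12.5, 0.0)
print(confusion_matrix(y, p))    # {'TP': 3, 'FP': 1, 'TN': 5, 'FN': 1}
print(lift(y, p, top_fraction=0.2))
```

A lift of 2.5 in the top 20% means that group contains events at 2.5 times the rate of a random selection.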
Some other general questions you will most likely be asked:

What have you done to improve your data analytics knowledge in the past year?

What are your career goals?

Why do you want a career in data analytics?

The answers to these questions have to be unique to the person answering them. The key is to show confidence and give well-thought-out answers that demonstrate you are knowledgeable about the industry and have the conviction to work hard and excel as a data analyst.

Macro Interview Questions (for freshers)


1. Have you used macros? For what purpose have you used them?

Yes, I have. I used macros to create analysis data sets and tables where a small change has to be made throughout the program, and where the same code has to be used again and again.

2. How would you invoke a macro?

After I have defined a macro, I can invoke it by adding the percent sign prefix to its name, like this: %macroname. A semicolon is not required when invoking a macro, though adding one generally does no harm.
3. How can you create a macro variable within a DATA step?
With CALL SYMPUT (or CALL SYMPUTX).

4. How would you identify a macro variable?


With the ampersand (&) sign, e.g. &name.

5. How would you define the end of a macro?


The end of a macro is defined by the %MEND statement.

6. For what purposes have you used SAS macros?

To execute the same PROC step on multiple data sets. Macros let us accomplish repetitive tasks quickly and efficiently. A macro program can be reused many times, and parameters passed to the macro program customize the results without having to change the code within the macro program. With macros we can make a small change in one place and have SAS echo that change throughout the program.

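7. What is the difference between %LOCAL and %GLOBAL?

A %LOCAL macro variable is defined inside a macro and exists only while that macro executes. A %GLOBAL macro variable exists for the rest of the SAS session and can be used anywhere; macro variables created in open code (outside any macro) are global.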

8. How long can a macro variable be? A token?

A macro variable name can be up to 32 characters long, and a macro variable value can be up to 65,534 characters long.
A component of SAS known as the word scanner breaks the program text into fundamental units called tokens.
Tokens are passed on demand to the compiler.
The compiler requests tokens until it receives a semicolon.
Then the compiler performs the syntax check on the statement.

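9. If you use a SYMPUT in a DATA step, when and where can you use the macro variable?
A macro variable created by the CALL SYMPUT routine cannot be referenced with an ampersand in the same DATA step in which it is created, because & references are resolved when the step is compiled, before CALL SYMPUT executes. It can be used in any later step. (Within the creating step, its value can be retrieved at execution time with the SYMGET function.)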

10. What do you code to create a macro? End one?

We create a macro with the %MACRO statement and end it with the %MEND statement.

11. What is the difference between %PUT and SYMBOLGEN?

%PUT writes user-defined messages (including macro variable values) to the log when it executes, whereas SYMBOLGEN is a system option (OPTIONS SYMBOLGEN;) that writes the resolved value of every macro variable reference to the log.
12. How do you add a number to a macro variable?
Use the %EVAL function, or the %SYSEVALF function if the number is a floating-point (decimal) number.

13. Can you execute a macro within a macro? Describe.

Yes. Such macros are called nested macros: the outer macro's definition contains an invocation of the inner macro, and when the inner macro finishes, execution continues in the outer macro.

14. If you need the value of a variable rather than the variable itself, what would you use to load the value to a macro variable?
To load the value of a DATA step variable into a macro variable, use CALL SYMPUT (or CALL SYMPUTX). If the value is known in advance, the simplest method is %LET. A macro variable created with %LET in open code is global, so it can be referenced everywhere in the rest of the program.

Ex:

A is a macro variable. The following statements assign the value xyz to A and then use the value of A rather than the variable itself:
%let A=xyz; %put x="&A";

This writes x="xyz" to the log: the reference &A resolves to its value, xyz, not to the name A.

15. Can you execute a macro within another macro? If so, how would SAS know where the current macro ended and the new one began?

Yes, I can execute a macro within a macro. This is called nesting of macros, and it is allowed. Every macro's definition begins with the keyword %MACRO and ends with %MEND, so SAS knows where each one starts and ends.

16. How are parameters passed to a macro?

A macro variable defined in parentheses in a %MACRO statement is a macro parameter. Macro parameters allow you to pass information into a macro. Parameters defined with an equal sign, as below, are keyword parameters and are passed by name.

%macro plot(yvar= ,xvar= );
proc plot;
plot &yvar*&xvar;
run;
%mend plot;
%plot(yvar=age, xvar=sex)

17. How would you code a macro statement to produce information on the SAS log? Can this statement be coded anywhere?

OPTIONS MPRINT MLOGIC MERROR SYMBOLGEN;

OPTIONS is a global statement, so it can be placed anywhere in the program.

Advanced SAS Certification Questions (recently updated)

Which options control input/output buffering?

Ans. BUFSIZE= and BUFNO=

The following SAS program is submitted:


%macro execute;
<insert here>
Proc print data= sasuser.houses;
Run;
%end;
%mend;
%execute

Which statement completes the program so that it executes on Tuesday?


a) %if &sysday=Tuesday %then %do;
b) %if &sysday=Tuesday %then %do;
c) %if &sysdate= Tuesday %then %do;
d) %if &sysdate=Tuesday %then %do;

Assume today is Tuesday, August 15, 2006. Which statement, submitted at the
beginning of a SAS session, assigns the value Tuesday, August 15, 2006 to the
macro variable START?
a) %let start= %eval(today(), weekdate.);
b) %let start= %sysfunc(today(), weekdate.);
c) %let start= %sysexec(today(), weekdate.);
d) %let start= %sysevalf(today(), weekdate.);

The following program is submitted:


%let value=0.5;
%let add=5;
%let newval=%eval(&value+&add);
What is the value of the macro variable NEWVAL?
a) 5
b) 5.5
c) 0.5+5
d) null

The SAS data set ONE has a variable X on which an index has been created. The
data sets ONE and THREE are sorted by X.

The following SAS program is submitted:


Data two;
Set three;
Set one key=X;
Run;
What is the purpose of including the KEY= option in the program?
a) It forces SAS to use the index X.
b) It re-creates the index X on the output data set TWO.
c) It instructs SAS to do a sequential read of both sorted data sets.
d) It gives SAS the option to use the index X or to do a sequential read of the data
set ONE.

The following SAS program is submitted:


Data new(bufsize=6144 bufno=4);
Set old;
Run;
What is the difference between usage of BUFSIZE= AND BUFNO= options?
a) BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the
number of input buffers.
b) BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the
number of output buffers.
c) BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the
number of input buffers.
d) BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies
the number of output buffers.

Given the data set SASHELP.CLASS:

SASHELP.CLASS
NAME     AGE
-------  ---
Mary     15
Philip   16
Robert   12
Ronald   15

The following SAS program is submitted:
%let value = Philip;
proc print data = sashelp.class;
<insert WHERE statement here>
run;

Which WHERE statement successfully completes the program and produces a report?
a) where upcase(name) = upcase(&value);
b) where upcase(name) = %upcase(&value);
c) where upcase(name) = "upcase(&value)";
d) where upcase(name) = "%upcase(&value)";

The following SAS program is submitted:


data combine;
merge one two;
by id;
run;
Which SQL procedure program produces the same results?

A. proc sql;
create table combine as
select coalesce(one.id, two.id) as id,
name,
salary
from one full join two
on one.id = two.id;
quit;
B. proc sql;
create table combine as
select one.id,
name,
salary
from one inner join two
on one.id = two.id;
quit;
C. proc sql;
create table combine as
select coalesce(one.id, two.id) as id,
name,
salary
from one, two
where one.id = two.id;
quit;
D. proc sql;
create table combine as
select one.id,
name,
salary
from one full join two
where one.id = two.id;
quit;
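
For a one-to-one match-merge, the DATA step keeps every ID that appears in either data set and takes the BY variable from whichever data set has it. The plain-Python sketch below contrasts that behaviour with an inner join, which is why a full join with COALESCE on the key is the SQL equivalent here. The data and helper names are hypothetical, each ID is assumed to appear at most once per table, and `None` plays the role of a SAS missing value.

```python
# Hypothetical contents of ONE (ID -> NAME) and TWO (ID -> SALARY)
one = {1: "Ann", 2: "Bob", 4: "Dee"}
two = {2: 50_000, 3: 61_000, 4: 48_000}


def full_join_coalesce(left, right):
    """Every ID from either table, with the ID never missing (what COALESCE gives)."""
    return [(i, left.get(i), right.get(i)) for i in sorted(left.keys() | right.keys())]


def inner_join(left, right):
    """Only IDs found in both tables (the effect of an inner join or a WHERE join)."""
    return [(i, left[i], right[i]) for i in sorted(left.keys() & right.keys())]


print(full_join_coalesce(one, two))
# [(1, 'Ann', None), (2, 'Bob', 50000), (3, None, 61000), (4, 'Dee', 48000)]
print(inner_join(one, two))
# [(2, 'Bob', 50000), (4, 'Dee', 48000)]
```

Selecting one.id instead of COALESCE would leave the ID missing for the row that exists only in TWO (ID 3 above).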

Given the SAS data sets CLASS1 and CLASS2:

CLASS1               CLASS2
NAME     COURSE      NAME     COURSE
-------- ------      -------- ------
Lauren   MATH1       Smith    MATH2
Patel    MATH1       Farmer   MATH2
Chang    MATH1       Patel    MATH2
                     Hillier  MATH2

The following SAS program is submitted:


proc sql;
select name from CLASS1
<insert SQL set operator here>
select name from CLASS2;
quit;
The following output is desired:
NAME
-------
Chang
Lauren

Which SQL set operator completes the program and generates the desired output?
A. UNION
B. EXCEPT
C. INTERSECT
D. OUTER UNION CORR
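
The set operators can be compared by running the same queries against an in-memory SQLite database from Python. SQLite is only a stand-in for PROC SQL here. It has no OUTER UNION CORR, so that option is left out, and ORDER BY is added because SQLite does not promise sorted output.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE class1 (name TEXT, course TEXT);
    CREATE TABLE class2 (name TEXT, course TEXT);
    INSERT INTO class1 VALUES ('Lauren', 'MATH1'), ('Patel', 'MATH1'), ('Chang', 'MATH1');
    INSERT INTO class2 VALUES ('Smith', 'MATH2'), ('Farmer', 'MATH2'),
                              ('Patel', 'MATH2'), ('Hillier', 'MATH2');
""")


def set_op(op):
    """Combine the NAME columns of the two tables with the given set operator."""
    sql = f"SELECT name FROM class1 {op} SELECT name FROM class2 ORDER BY name"
    return [row[0] for row in con.execute(sql)]


for op in ("UNION", "EXCEPT", "INTERSECT"):
    print(op, set_op(op))
# UNION ['Chang', 'Farmer', 'Hillier', 'Lauren', 'Patel', 'Smith']
# EXCEPT ['Chang', 'Lauren']
# INTERSECT ['Patel']
```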

The following SAS program is submitted:


%macro loop;
data one;
%do I = 1 %to 3;
var&I = &i;
%end;
run;
%mend;
%loop

After this program executes, the following is written to the SAS log:
(LOOP): Beginning execution.
(LOOP): %DO loop beginning; index variable I; start value is 1; stop value is 3; by
value is 1.
(LOOP): %DO loop index variable I is now 2; loop will iterate again.
(LOOP): %DO loop index variable I is now 3; loop will iterate again.
(LOOP): %DO loop index variable I is now 4; loop will not iterate again.
(LOOP): Ending execution.
Which SAS System option displays the notes in the SAS log?
A. MACRO
B. MLOGIC
C. MPRINT
D. SYMBOLGEN

The following SAS program is submitted:


data temp;
array points{2,3} (10, 15, 20, 25, 30, 35);
run;

What impact does the ARRAY statement have in the Program Data Vector (PDV)?

A. The variables named POINTS1, POINTS2, POINTS3, POINTS4, POINTS5, POINTS6 are created in the PDV.
B. The variables named POINTS10, POINTS15, POINTS20, POINTS25, POINTS30, POINTS35 are created in the PDV.
C. The variables named POINTS11, POINTS12, POINTS13, POINTS21, POINTS22, POINTS23 are created in the PDV.
D. No variables are created in the PDV.

Which SAS integrity constraint type ensures that a specific set or range of values are the only values in a variable?

A. CHECK
B. UNIQUE
C. NOT NULL
D. PRIMARY KEY

The following SAS program is submitted:


%let first = yourname;
%let last = first;
%put &&&last;
What is written to the SAS log?
A. First
B. &&first
C. yourname
D. &yourname
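
The way &&&LAST resolves can be modelled with a small Python function. This is a simplified model of the macro processor's rescanning rule, written for illustration only. Each pass collapses every && to &, replaces every &name with its value, and the text is rescanned until nothing changes. It ignores details such as the period delimiter, and the symbol table simply mirrors the two %LET statements.

```python
import re

# Mirrors %let first = yourname; and %let last = first;
SYMBOLS = {"first": "yourname", "last": "first"}

NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def one_pass(text: str, symbols: dict[str, str]) -> str:
    """Scan left to right once: '&&' becomes '&', '&name' becomes its value."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("&&", i):
            out.append("&")  # a double ampersand collapses to one
            i += 2
        elif text[i] == "&":
            m = NAME.match(text, i + 1)
            key = m.group().lower() if m else None  # macro names are case-insensitive
            if key in symbols:
                out.append(symbols[key])
                i = m.end()
            else:
                out.append("&")  # unknown reference: leave it as it is
                i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def resolve(text: str, symbols: dict[str, str], max_passes: int = 10) -> str:
    """Rescan until the text stops changing (or the pass limit is reached)."""
    for _ in range(max_passes):
        new = one_pass(text, symbols)
        if new == text:
            break
        text = new
    return text


print(resolve("&&&last", SYMBOLS))  # pass 1: &first, pass 2: yourname
```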
Given the following SAS data set ONE:
ONE
REP COST
________________________
SMITH 200
SMITH 400
JONES 100
SMITH 600
JONES 100
JONES 200
JONES 400
SMITH 800
JONES 100
JONES 300

The following SAS program is submitted:


proc sql;
select rep, avg(cost) as AVERAGE
from one group by rep
having avg(cost) > (select avg(cost) from one);
quit;
Which one of the following reports is generated?
A. REP AVERAGE
_______________
JONES 200
B. REP AVERAGE
_________________
JONES 320
C. REP AVERAGE
________________
SMITH 320
D. REP AVERAGE
________________
SMITH 500
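
To check the arithmetic, the same query can be run against an in-memory SQLite database from Python. SQLite is only a stand-in for PROC SQL (for instance, its AVG returns a floating-point value), but GROUP BY with a HAVING subquery behaves the same way on this data.

```python
import sqlite3

rows = [("SMITH", 200), ("SMITH", 400), ("JONES", 100), ("SMITH", 600), ("JONES", 100),
        ("JONES", 200), ("JONES", 400), ("SMITH", 800), ("JONES", 100), ("JONES", 300)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE one (rep TEXT, cost INTEGER)")
con.executemany("INSERT INTO one VALUES (?, ?)", rows)

# Overall average: 3200 / 10 = 320; SMITH averages 2000 / 4, JONES 1200 / 6
print(con.execute("SELECT AVG(cost) FROM one").fetchone())  # (320.0,)
result = con.execute("""
    SELECT rep, AVG(cost) AS average
    FROM one
    GROUP BY rep
    HAVING AVG(cost) > (SELECT AVG(cost) FROM one)
""").fetchall()
print(result)  # [('SMITH', 500.0)]
```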
The following SAS program is submitted:
%let value = 9;
%let value2 = 5;
%let newval = %eval(&value / &value2);

Which one of the following is the resulting value of the macro variable NEWVAL?
A. 1
B. 2
C. 1.8
D. null

The SAS data set ONE has a variable X on which an index has been created. The data sets ONE and THREE are sorted by X. Which one of the following SAS programs uses the index to select observations from the data set ONE?
A. data two;
set three;
set one key = X;
run;
B. data two;
set three key = X;
set one;
run;
C. data two;
set one;
set three key = X;
run;
D. data two;
set three;
set one (key = X);
run;

The following SAS program is submitted:


proc sql;
select rep, area, count(*) as TOTAL
from one group by rep, area;
quit;
Which one of the following reports is generated?
A. REP AREA COUNT
------------------
JONES EAST 100
JONES NORTH 600
JONES WEST 500
SMITH NORTH 800
SMITH SOUTH 200
B. REP AREA TOTAL
------------------
JONES EAST 100
JONES NORTH 600
JONES WEST 500
SMITH NORTH 800
SMITH SOUTH 200
C. REP AREA TOTAL
------------------
JONES EAST 1
JONES NORTH 2
JONES WEST 3
SMITH NORTH 3
JONES WEST 3
SMITH NORTH 3
SMITH SOUTH 1
D. REP AREA TOTAL
------------------
JONES EAST 1
JONES NORTH 2
JONES WEST 3
SMITH NORTH 3
SMITH SOUTH 1
SMITH NORTH 3
SMITH SOUTH 1

The following SAS program is submitted:


data temp;
array points{3,2} _temporary_ (10,20,30,40,50,60);
score = points{2,1};
run;
Which one of the following is the value of the variable SCORE in the data set TEMP?
A. 10
B. 20
C. 30
D. 40
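
SAS loads the initial values of a multidimensional array row by row, so element {r,c} of an array with C columns is value number (r-1)*C + c. The small Python sketch below models that arithmetic; the helper names are hypothetical. It also lists the variable names generated for a non-temporary array such as POINTS{2,3} in the earlier question.

```python
def element(values, dims, row, col):
    """Return ARRAY{row,col} when the initial values are loaded row by row (1-based)."""
    nrows, ncols = dims
    if not (1 <= row <= nrows and 1 <= col <= ncols):
        raise IndexError("array subscript out of range")
    return values[(row - 1) * ncols + (col - 1)]


def variable_names(prefix, dims):
    """PREFIX1 ... PREFIXn, one name per element of a non-temporary array."""
    nrows, ncols = dims
    return [f"{prefix}{k}" for k in range(1, nrows * ncols + 1)]


points = [10, 20, 30, 40, 50, 60]
print(element(points, (3, 2), 2, 1))     # 30: row 2 holds 30 and 40
print(variable_names("POINTS", (2, 3)))  # ['POINTS1', 'POINTS2', 'POINTS3', 'POINTS4', 'POINTS5', 'POINTS6']
```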

The following SAS program is submitted:


%macro execute;
<insert statement here>
proc print data = sasuser.houses;
run;
%end;
%mend;
Which of the following completes the above program so that it executes on
Tuesday?

A. %if &sysday = Tuesday %then %do;
B. %if &sysday = 'Tuesday' %then %do;
C. %if "&sysday" = Tuesday %then %do;
D. %if '&sysday' = 'Tuesday' %then %do;

Which one of the following SAS integrity constraint types ensures that a specific set or range of values are the only values in a variable?
A. CHECK
B. UNIQUE
C. FORMAT
D. DISTINCT

Which one of the following options displays the value of a macro variable in the SAS log?
A. MACRO
B. SOURCE
C. SOURCE2
D. SYMBOLGEN

What is the correct syntax to create a macro variable with SQL?

select distinct country into :cur separated by ',' from tablename;

The following SAS program is submitted:


options yearcutoff = 1950;
%macro y2kopt(date);
%if &date >= 14610 %then %do;
options yearcutoff = 2000;
%end;
%else %do;
options yearcutoff = 1900;
%end;
%mend;
data _null_ ;
date = "01jan2000"d;
call symput("date",left(date));
run;
%y2kopt(&date)

The SAS date for January 1, 2000 is 14610, and the SAS system option YEARCUTOFF is set to 1920 prior to submitting the above program. Which one of the following is the value of YEARCUTOFF when the macro finishes execution?

A. 1900
B. 1920
C. 1950
D. 2000
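
The comparison in the macro rests on SAS date arithmetic: a SAS date value is the number of days since 1 January 1960. The Python sketch below checks that 1 January 2000 is day 14610 and mirrors only the %IF/%ELSE branch of Y2KOPT. The `y2kopt` function is a hypothetical stand-in, not SAS code.

```python
from datetime import date

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from 1 January 1960


def sas_date(d: date) -> int:
    """Number of days between the SAS epoch and d."""
    return (d - SAS_EPOCH).days


def y2kopt(date_value: int) -> int:
    """The YEARCUTOFF value the macro's %IF/%ELSE logic would set."""
    return 2000 if date_value >= 14610 else 1900


d = sas_date(date(2000, 1, 1))
print(d, y2kopt(d))  # 14610 2000
```

Because 14610 >= 14610 is true, the %DO branch runs and the OPTIONS statement inside the macro overrides the earlier YEARCUTOFF settings.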

Check the syntax: what will happen when we submit this program?

Data aa ;
Length x y 5 z ;
Run ;

The data set will not be created, because no length is given for z, so the LENGTH statement has a syntax error.

Which one of the following statements about compressed SAS data sets is always true?
A. Each observation is treated as a single string of bytes.
B. Each observation occupies the same number of bytes.
C. An updated observation is stored in its original location.
D. New observations are added to the end of the SAS data set

Given the following SAS data set ONE:

ONE
LEVEL AGE
---------
1 10
2 20
3 20
2 10
1 10
2 30
3 10
2 20
3 30
1 10

The following SAS program is submitted:


proc sql;
select level, max(age) as MAX
from one
group by level
having max(age) > (select avg(age) from one);
quit;
Which one of the following reports is generated?
A. LEVEL AGE
---------
2 20
3 20
B. LEVEL AGE
---------
2 30
3 30
C. LEVEL MAX
---------
2 20
3 30
D. LEVEL MAX
---------
2 30
3 30
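
Again, the arithmetic can be checked with an in-memory SQLite database from Python as a stand-in for PROC SQL. The alias `max_age` and the ORDER BY clause are additions for the sketch (PROC SQL sorts grouped output; SQLite does not promise to).

```python
import sqlite3

rows = [(1, 10), (2, 20), (3, 20), (2, 10), (1, 10),
        (2, 30), (3, 10), (2, 20), (3, 30), (1, 10)]
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE one (level INTEGER, age INTEGER)")
con.executemany("INSERT INTO one VALUES (?, ?)", rows)

# Overall average age: 170 / 10 = 17; the maximum ages by level are 10, 30 and 30
print(con.execute("SELECT AVG(age) FROM one").fetchone())  # (17.0,)
result = con.execute("""
    SELECT level, MAX(age) AS max_age
    FROM one
    GROUP BY level
    HAVING MAX(age) > (SELECT AVG(age) FROM one)
    ORDER BY level
""").fetchall()
print(result)  # [(2, 30), (3, 30)]
```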

The following SAS program is submitted.

filename sales ('external-file1' 'external-file2');

data new;
infile sales;
input date date9. company $ revenue;
run;

Which one of the following is the result of including the FILENAME statement in this program?
A. The FILENAME statement produces an ERROR message in the SAS log.
B. The FILENAME statement associates SALES with external-file2 followed by external-file1.
C. The FILENAME statement associates SALES with external-file1 followed by external-file2.
D. The FILENAME statement reads record 1 from external-file1, reads record 1 from external-file2, and combines them into one record.

Which techniques are used to find the unique values in a data set?

FIRST. and LAST. variables with a BY statement
PROC SQL with SELECT DISTINCT (or UNIQUE)
PROC SORT with the NODUPKEY option

Where can't we use the NOTSORTED option?

With MERGE.

Code

C
M
A
R
P

Proc print data = dataset name ;
By code;
Run ;
No output will print.

Which statement is used to write data to a file?

The FILE statement (used together with the PUT statement).

Which options display macro code and macro execution details in the log window?

MLOGIC and MPRINT

DATA step view: when will messages be written to the log?

Both times: when the view is created and when it is used.
