come up with a model you are satisfied with or what you feel is the best possible result with
the given data.
5. Validation: The final model (or perhaps the best two or three models) should then be put through
the validation process. In this process, you test the model on a completely new data set, i.e.
data that was not used to build the model. This ensures that your model is a good
model in general and not just a very good model for the specific data used earlier.
(Technically, this is called avoiding overfitting.)
6. Implementation and tracking: The final model is chosen after validation. You then
start implementing the model and tracking the results. You need to track results to see the
performance of the model over time. In general, the accuracy of a model goes down over
time. How quickly it degrades depends on how dynamic or static the variables are
and on how dynamic or static the general environment is.
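The validation step described above (testing on data that was not used to build the model) can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the synthetic data, the 30% holdout fraction, the straight-line model, and the helper names `split_holdout`, `fit_line` and `mean_abs_error` are all assumptions made for the example.

```python
import random

def split_holdout(records, holdout_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (build, holdout)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - holdout_size
    return shuffled[:cut], shuffled[cut:]

def fit_line(points):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_abs_error(model, points):
    """Average absolute difference between predicted and actual y."""
    intercept, slope = model
    return sum(abs(y - (intercept + slope * x)) for x, y in points) / len(points)

# Synthetic data (an assumption for illustration): y is roughly 2x + 1 plus noise.
noise = random.Random(0)
data = [(x, 2.0 * x + 1.0 + noise.gauss(0, 1)) for x in range(100)]

build, holdout = split_holdout(data)
model = fit_line(build)  # the model only ever sees the build set
build_error = mean_abs_error(model, build)
holdout_error = mean_abs_error(model, holdout)
print(f"build MAE={build_error:.2f}  holdout MAE={holdout_error:.2f}")
```

If the holdout error is much larger than the build error, the model is probably fitted too closely to the build data. Recomputing the same error on fresh data at regular intervals is one simple way to track performance over time.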
a.
What is in the data: look at the list of all the variables in the data set.
Understand the meaning of each variable using the data dictionary. Go back to the business
for more information in case of any confusion.
b.
How much data there is: look at the volume of the data (how many records) and
at the time frame of the data (last 3 months, last 6 months, etc.).
c.
Quality of the data: how much information is missing, and the quality of the data in each
variable. Are all fields usable? If a field has data for only 10% of the observations,
that field may not be usable.
d.
You will also identify some important variables and may investigate these more deeply,
for example by looking at averages, min and max values, and perhaps the 10th and 90th
percentiles as well.
e.
You may also identify fields that you need to transform in the data prep stage.
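The checks in items a. to e. can be sketched with a small profiling routine. This is a hedged example using only the Python standard library: the function name `profile`, the record layout (a list of dictionaries with `None` for missing values), and the sample data are assumptions, and the 10% fill-rate threshold is taken from the rule of thumb in item c.

```python
from statistics import mean, quantiles

def profile(records, fields, min_fill_rate=0.10):
    """Summarise each field: share of populated values, plus basic statistics."""
    total = len(records)
    summary = {}
    for field in fields:
        values = [r.get(field) for r in records if r.get(field) is not None]
        fill_rate = len(values) / total if total else 0.0
        # A field populated for only ~10% of observations is flagged as not usable.
        entry = {"fill_rate": fill_rate, "usable": fill_rate > min_fill_rate}
        numeric = [v for v in values if isinstance(v, (int, float))]
        if entry["usable"] and len(numeric) >= 2:
            deciles = quantiles(numeric, n=10)  # 9 cut points: 10th ... 90th percentile
            entry.update(
                min=min(numeric),
                max=max(numeric),
                mean=mean(numeric),
                p10=deciles[0],
                p90=deciles[-1],
            )
        summary[field] = entry
    return summary

# Hypothetical sample: "age" is always present, "income" only for 2 of 20 records.
records = [{"age": a, "income": 1000 * a if a <= 2 else None} for a in range(1, 21)]
summary = profile(records, ["age", "income"])
print("records:", len(records))
for field, stats in summary.items():
    print(field, stats)
```

Fields flagged as not usable are left without statistics, so they stand out when the summary is reviewed with the business.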
You can identify outliers using graphical analysis and univariate analysis. If there are only a
few outliers, you can assess them individually. If there are many, you may want to replace
the outlier values with the 1st percentile or the 99th percentile values.
If there is a lot of data, you may decide to drop the records that contain outliers.
Not all extreme values are outliers. Not all outliers are extreme values.
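Replacing outliers with the 1st and 99th percentile values, as described above, can be sketched as follows. This is only an illustration: the function name `cap_outliers` and the sample values are assumptions, and the percentiles come from `statistics.quantiles` with the inclusive method so that the caps stay within the observed range.

```python
from statistics import quantiles

def cap_outliers(values, lower_pct=1, upper_pct=99):
    """Replace values below/above the given percentiles with the percentile values."""
    cuts = quantiles(values, n=100, method="inclusive")  # 99 cut points: 1st ... 99th
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    capped = [min(max(v, low), high) for v in values]
    return capped, low, high

# Hypothetical sample: 199 ordinary values and one extreme value.
values = list(range(1, 200)) + [10_000]
capped, low, high = cap_outliers(values)
print(f"caps: low={low}, high={high}, max after capping={max(capped)}")
```

Whether a capped value was a genuine outlier still needs judgment: the cap simply limits how far any single record can pull summary statistics or a fitted model.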
What have you done to improve your data analytics knowledge in the past year?
The answers to these questions will have to be unique to the person answering them. The key
is to show confidence and give well-thought-out answers that demonstrate you are
knowledgeable about the industry and have the conviction to work hard and excel as a data
analyst.
1. Have you used macros? For what purpose have you used them?
Yes, I have. I used macros to create analysis data sets and tables where it is
necessary to make a small change throughout the program, and where the same code
must be used again and again, for example to execute the same PROC step on
multiple data sets.
Macros let us accomplish repetitive tasks quickly and efficiently. A macro program can be
reused many times. Parameters passed to the macro program customize the results
without having to change the code within the macro program. With macros, we can make
a small change in the program and have SAS echo that change throughout the program.
9. If you use a SYMPUT in a DATA step, when and where can you use the macro
variable?
The macro variable created by the CALL SYMPUT routine cannot be referenced (as &name)
in the same DATA step in which it is created, because its value is assigned only when
that step executes. After that, we can use the macro variable at any time.
%PUT is used to display user-defined messages in the log window, whereas the
SYMBOLGEN system option prints the resolved value of each macro variable in the
log window.
12. How do you add a number to a macro variable?
Use the %EVAL function, or the %SYSEVALF function if the number is a floating-point number.
14. If you need the value of a variable rather than the variable itself, what would you
use to load the value to a macro variable?
If we need the value of a macro variable, we must define it so that we can reference it
everywhere in the program, i.e. define it as global. There are different ways of
assigning a global macro variable; the simplest is %LET. (To load the value of a DATA
step variable into a macro variable, use the CALL SYMPUT routine.)
Ex:
A is a macro variable. Use the following statements to assign a value to A and to
display that value rather than the variable name itself:
%let A=xyz; %put x="&A";
15. Can you execute a macro within another macro? If so, how would SAS know where
the current macro ended and the new one began?
Yes, I can execute a macro within a macro. This is called nesting of macros, and it is
allowed. The beginning of every macro is identified by the keyword %MACRO and its end
by %MEND.
17. How would you code a macro statement to produce information on the SAS log?
Assume today is Tuesday, August 15, 2006. Which statement, submitted at the
beginning of a SAS session, assigns the value Tuesday, August 15, 2006 to the
macro variable START?
a) %let start= %eval(today(), weekdate.);
b) %let start= %sysfunc(today(), weekdate.);
c) %let start= %sysexec(today(), weekdate.);
d) %let start= %sysevalf(today(), weekdate.);
The SAS data set ONE has a variable X on which an index has been created. The
data sets ONE and THREE are sorted by X.
NAME    AGE
------- -----
Mary    15
Philip  16
Robert  12
Ronald  15
The following SAS program is submitted:
%let value = Philip;
proc print data = sashelp.class;
<insert WHERE statement here>
run;
b)
c)
d)
A. proc sql;
name,
salary
from one full join two
where one.id = two.id;
quit;
Which SQL set operator completes the program and generates the desired output?
A. UNION
B. EXCEPT
C. INTERSECT
D. OUTER UNION CORR
After this program executes, the following is written to the SAS log:
(LOOP): Beginning execution.
(LOOP): %DO loop beginning; index variable I; start value is 1; stop value is 3; by
value is 1.
(LOOP): %DO loop index variable I is now 2; loop will iterate again.
(LOOP): %DO loop index variable I is now 3; loop will iterate again.
(LOOP): %DO loop index variable I is now 4; loop will not iterate again.
(LOOP): Ending execution.
Which SAS System option displays the notes in the SAS log?
A. MACRO
B. MLOGIC
C. MPRINT
D. SYMBOLGEN
What impact does the ARRAY statement have in the Program Data Vector (PDV)?
Which SAS integrity constraint type ensures that a specific set or range of values
are the only
values in a variable?
A. CHECK
B. UNIQUE
C. NOT NULL
D. PRIMARY KEY
The following SAS program is submitted:
data new (bufsize = 6144 bufno = 4);
set old;
run;
What is the difference between the usage of BUFSIZE= and BUFNO= options?
A. BUFSIZE= specifies the size of the input buffer in bytes; BUFNO= specifies the
number of
input buffers.
B. BUFSIZE= specifies the size of the output buffer in bytes; BUFNO= specifies the
number of
output buffers.
C. BUFSIZE= specifies the size of the input buffer in kilobytes; BUFNO= specifies the
number of
input buffers.
D. BUFSIZE= specifies the size of the output buffer in kilobytes; BUFNO= specifies
the number of
output buffers.
C. yourname
D. &yourname
Given the following SAS data set ONE:
ONE
REP COST
________________________
SMITH 200
SMITH 400
JONES 100
SMITH 600
JONES 100
JONES 200
JONES 400
SMITH 800
JONES 100
JONES 300
JONES 200
B. REP AVERAGE
_________________
JONES 320
C. REP AVERAGE
________________
SMITH 320
D. REP AVERAGE
________________
SMITH 500
The following SAS program is submitted:
%let value = 9;
%let value2 = 5;
%let newval = %eval(&value / &value2);
Which one of the following is the resulting value of the macro variable NEWVAL?
A. 1
B. 2
C. 1.8
D. null
The SAS data set ONE has a variable X on which an index has been created. The
data sets ONE
and THREE are sorted by X. Which one of the following SAS programs uses the index
to select
observations from the data set ONE?
A. data two;
set three;
set one key = X;
run;
B. data two;
set three key = X;
set one;
run;
C. data two;
set one;
set three key = X;
run;
D. data two;
set three;
set one (key = X);
run;
SMITH NORTH 3
SMITH SOUTH 1
SMITH NORTH 3
SMITH SOUTH 1
Which one of the following SAS integrity constraint types ensures that a specific set
or range of
values are the only values in a variable?
A. CHECK
B. UNIQUE
C. FORMAT
D. DISTINCT
Which one of the following options displays the value of a macro variable in the SAS
log?
A. MACRO
B. SOURCE
C. SOURCE2
D. SYMBOLGEN
The SAS date for January 1, 2000 is 14610 and the SAS system option for
YEARCUTOFF is set
to 1920 prior to submitting the above program. Which one of the following is the
value of
YEARCUTOFF when the macro finishes execution?
A. 1900
B. 1920
C. 1950
D. 2000
Check the syntax: what will happen when we submit this program?
Data aa ;
Length x y 5 z ;
Run ;
Which one of the following statements about compressed SAS data sets is always
true?
A. Each observation is treated as a single string of bytes.
B. Each observation occupies the same number of bytes.
C. An updated observation is stored in its original location.
D. New observations are added to the end of the SAS data set
ONE
LEVEL AGE
---------------------
1 10
2 20
3 20
2 10
1 10
2 30
3 10
2 20
3 30
1 10
Which one of the following is the result of including the FILENAME statement in this
program?
A. The FILENAME statement produces an ERROR message in the SAS log.
B. The FILENAME statement associates SALES with external-file2 followed by
external-file1.
C. The FILENAME statement associates SALES with external-file1 followed by
external-file2.
D. The FILENAME statement reads record 1 from external-file1, reads record 1 from
external-file2, and combines them into one record.
Which technique is used to find the unique values in a data set?
Merge
Code
C
M
A
R
P
File statement
What option will display macro code and macro execution details in log window?
Both time .
10. What do you code to create a macro? To end one?
We create a macro with the %MACRO statement and end a macro with the %MEND statement.