Are you familiar with special input delimiters? How are they used?
DLM= (DELIMITER=) and DSD are the special input delimiter options of the INFILE statement.
DELIMITER= delimiter(s)
specifies an alternate delimiter (other than a blank) to be used for LIST input
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks, delimiters within the value be
treated as character data. The DSD option changes how SAS treats delimiters when you use
LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two
consecutive delimiters as a missing value and removes quotation marks from character values.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189
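As a minimal sketch of the DSD behaviour described above, the step below reads two made-up comma-delimited records. The data set name work.dsd_demo, the variable names, and the data values are assumptions chosen only for illustration.

```sas
/* Hypothetical data: a quoted value that contains a comma, */
/* and an empty field between two consecutive commas.       */
data work.dsd_demo;
   infile datalines dsd truncover;
   input name :$20. city :$20. amount;
   datalines;
"Smith, John",Boston,100
Lee,,250
;
run;

proc print data=work.dsd_demo;
run;
```

With DSD, the comma inside the quoted name is kept as character data and the quotation marks are removed, while the two consecutive commas in the second record leave city missing. To read data separated by another character, DLM= can be added to the same INFILE statement, for example dlm='|'.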
If reading a variable length file with fixed input, how would you prevent SAS from reading the
next record if the last variable didn't have a value?
Use the MISSOVER or TRUNCOVER option on the INFILE statement.
MISSOVER
prevents an INPUT statement from reading a new input data record if it does not find values in
the current input line for all the variables in the statement. When an INPUT statement reaches
the end of the current input data record, variables without any values assigned are set to
missing.
TRUNCOVER
overrides the default behavior of the INPUT statement when an input data record is shorter than
the INPUT statement expects. By default, the INPUT statement automatically reads the next
input data record. TRUNCOVER enables you to read variable-length records when some
records are shorter than the INPUT statement expects. Variables without any values assigned
are set to missing.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189
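The difference between the two options is easiest to see with a short record read by column input. The sketch below assumes a host where external files have variable-length records; the fileref shortrec, the data set names, and the values are made up for illustration.

```sas
filename shortrec temp;   /* hypothetical temporary fileref */

/* Write two records; the second one ends before column 10. */
data _null_;
   file shortrec;
   put '1001 12345';
   put '1002 12';
run;

/* MISSOVER: the short field in columns 6-10 is set to missing. */
data work.with_missover;
   infile shortrec missover;
   input id $ 1-4 amount 6-10;
run;

/* TRUNCOVER: the partial field is read, so amount is 12. */
data work.with_truncover;
   infile shortrec truncover;
   input id $ 1-4 amount 6-10;
run;
```

In both steps SAS stays on the current record instead of moving to the next one. The options differ only in what they do with a field that is shorter than the INPUT statement expects.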
What is the difference between an informat and a format? Name three informats or formats.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178244.htm
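An informat tells SAS how to read a value (for example in an INPUT or INFORMAT statement). A format tells SAS how to write or display a value: it is used in a DATA step FORMAT or PUT statement, a PROC SQL SELECT clause, or a procedure's FORMAT statement to output SAS data to a file, report, etc.
Formats can look like informats (many share the same name), but they are differentiated by the statement in which they are used.
e.g. DATEw., WORDDATEw., MMDDYYw.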
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178212.htm
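The following sketch shows both roles in one step: informats on the INPUT statement read the raw text, and formats on the FORMAT statement control the display. The data set name, variable names, and the single data line are assumptions for illustration.

```sas
data work.fmt_demo;
   /* Informats: how the raw text is converted to stored values. */
   input sale_date :mmddyy10. amount :comma10.;
   /* Formats: how the stored numeric values are displayed. */
   format sale_date worddate18. amount dollar12.2;
   datalines;
01/15/2005 1,234.50
;
run;

proc print data=work.fmt_demo;
run;
```

The stored values are plain numbers (a SAS date value and 1234.5). Only their appearance in the printed output depends on the formats.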
Name and describe three SAS functions that you have used, if any?
http://sastechies.com/SASfunctions.php
How would you code the criteria to restrict the output to be produced?
The question does not make clear what the interviewer is referring to.
What is the purpose of the trailing @ and the @@? How would you use them?
Line-hold specifiers keep the pointer on the current input record when a data record is read by more than one INPUT statement (single trailing @), or when one input line holds values for more than one observation (double trailing @@).
Use a single trailing @ to allow the next INPUT statement to read from the same record. Use a double
trailing @ to hold a record for the next INPUT statement across iterations of the DATA step.
Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. When
you use a trailing @, the following occurs:
• the pointer position does not change
• no new record is read into the input buffer
• the next INPUT statement in the same iteration of the DATA step continues to read the same
record rather than a new one
A record held by the double trailing at sign (@@) is not released until one of the following occurs:
• the pointer moves past the end of the input record (released immediately)
• a null INPUT statement executes (released immediately):
input;
• the next iteration of the DATA step begins, if an INPUT statement with a single trailing
@ executes later in the DATA step:
input @;
Unlike the @@, the single @ also releases a record when control returns to the
top of the DATA step for the next iteration.
data perm.sales97;
infile data97 missover;
input ID $4. @;
do Quarter=1 to 4;
input Sales : comma. @;
output;
end;
run;
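For comparison with the single trailing @ above, here is a minimal sketch of the double trailing @@, which holds the record across iterations so that one input line can supply several observations. The data set name, variable names, and values are made up for illustration.

```sas
data work.scores;
   /* @@ holds the line across iterations of the DATA step, */
   /* so each name/score pair becomes its own observation.  */
   input name $ score @@;
   datalines;
Ann 90 Bob 85 Cal 77
Dee 68
;
run;

proc print data=work.scores;
run;
```

The two input lines produce four observations. SAS moves to the next line only when the pointer runs past the end of the current one.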
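A single trailing @ is also useful for hierarchical raw data, where the first field of each record determines how the rest of the record is read. Raw data file (fileref census):
>----+----10---+----20
H 321 S. MAIN ST
P MARY E 21 F
P WILLIAM M 23 M
P SUSAN K 3 F
H 324 S. MAIN ST
P THOMAS H 79 M
P WALTER S 46 M
P ALICE A 42 F
P MARYANN A 20 F
P JOHN S 16 M
H 325A S. MAIN ST
P JAMES L 34 M
P LIZA A 31 F
H 325B S. MAIN ST
P MARGO K 27 F
P WILLIAM R 27 M
P ROBERT W 1 M
The DATA step reads the record type, holds the record with @, and then reads the address only from H records:
data perm.residnts;
infile census;
retain Address;
input type $1. @;
if type='H' then do;
if _n_ > 1 then output;
Total=0;
input Address $ 3-17;
end;
else if type='P' then total+1;
run;
As shown, a household is written only when the next H record arrives, so the last household still needs end-of-file handling (for example with the END= option of the INFILE statement).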
Under what circumstances would you code a SELECT construct instead of IF statements?
The SELECT statement begins a SELECT group. SELECT groups contain WHEN statements that
identify SAS statements that are executed when a particular condition is true. Use at least one WHEN
statement in a SELECT group. An optional OTHERWISE statement specifies a statement to be
executed if no WHEN condition is met. An END statement ends a SELECT group.
Null statements that are used in WHEN statements cause SAS to recognize a condition as true without
taking further action. Null statements that are used in OTHERWISE statements prevent SAS from
issuing an error message when all WHEN conditions are false.
Using SELECT-WHEN improves processing efficiency and readability in programs that need to
check a series of conditions for the same variable.
Use IF-THEN/ELSE statements for programs with few statements.
Be careful with a subsetting IF statement (an IF without a THEN clause): it continues processing
only those records that meet the condition specified in the IF clause, and all other records are dropped from the rest of the step.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201966.htm
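A minimal sketch of a SELECT group that checks a series of values for one variable is shown below. The data set name, the variable names region and region_name, and the data values are assumptions for illustration.

```sas
data work.regions;
   input region $;
   length region_name $5;
   select (region);
      when ('N') region_name = 'North';
      when ('S') region_name = 'South';
      /* OTHERWISE handles every value not matched by a WHEN. */
      otherwise  region_name = 'Other';
   end;
   datalines;
N
S
X
;
run;
```

Without the OTHERWISE statement, the third record (X) would make SAS issue an error, because no WHEN condition is true for it.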
What statement do you code to tell SAS that it is to write to an external file?
The FILENAME statement is an optional statement that specifies the location of the external file.
The PUT statement writes the variable values to the external file.
The FILE statement specifies the current output file for PUT statements in the DATA step.
When multiple FILE statements are present, the PUT statement builds and writes output lines to the file
that was specified in the most recent FILE statement. If no FILE statement was specified, the PUT
statement writes to the SAS log. The specified output file must be an external file, not a SAS data library,
and it must be a valid access type.
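The three statements work together roughly as in the sketch below. It assumes the SASHELP.CLASS sample data set that ships with most SAS installations and uses a temporary fileref named outrpt, which is a made-up name.

```sas
filename outrpt temp;   /* hypothetical fileref; TEMP lets SAS pick a scratch file */

data _null_;
   set sashelp.class;         /* assumed sample data set */
   file outrpt;               /* route PUT output to the external file */
   put name $8. +1 age 3.;    /* one line per observation */
run;
```

If the FILE statement were removed, the same PUT statement would write its lines to the SAS log.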
If reading an external file to produce an external file, what is the shortcut to write that record
without coding every single variable on the record?
Data _null_;
_NULL_ - specifies that SAS does not create a data set when it executes the DATA step.
e.g.
Data _null_;
Set somedata;
Call symput('macvar',dsnvariable);
Run;
Eg.
The second DATA step in this program produces a custom report and uses the _NULL_ keyword to
execute the DATA step without creating a SAS data set:
data sales;
input dept : $10. jan feb mar;
datalines;
shoes 4344 3555 2666
housewares 3777 4888 7999
appliances 53111 7122 41333
;
data _null_;
set sales;
qtr1tot=jan+feb+mar;
put 'Total Quarterly Sales: ' qtr1tot dollar12.;
run;
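For the external-file-to-external-file case in the question, a commonly used shortcut is the _INFILE_ specification of the PUT statement, which writes the current input record as it stands. The sketch below builds its own small input file so that it is self-contained; the filerefs rawin and rawout and the record contents are made-up names.

```sas
filename rawin  temp;   /* hypothetical input file  */
filename rawout temp;   /* hypothetical output file */

/* Create a small input file to copy from. */
data _null_;
   file rawin;
   put 'A 100';
   put 'B 200';
run;

/* Copy every record without naming a single variable. */
data _null_;
   infile rawin;
   file rawout;
   input;          /* null INPUT loads the record into the input buffer */
   put _infile_;   /* writes the whole record unchanged */
run;
```

Because the step uses DATA _NULL_, no SAS data set is created along the way.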
What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement sets the criteria for any data set in a DATA step or a PROC step.
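The same WHERE statement can be used in both kinds of step, as in this sketch. It assumes the SASHELP.CLASS sample data set, and work.teens is a made-up output name.

```sas
/* In a PROC step */
proc print data=sashelp.class;
   where age >= 14;
run;

/* In a DATA step */
data work.teens;
   set sashelp.class;
   where age >= 14;
run;
```

In a DATA step, WHERE applies to observations read from SAS data sets. It does not filter raw records read with an INPUT statement.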
Have you ever linked SAS code? If so, describe the link and any required statements used to
either process the code or the step itself.
SAS code can be linked using the GO TO or the LINK statement.
GOTO - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201949.htm
LINK - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201972.htm
The difference between the LINK statement and the GO TO statement is in the action of a subsequent
RETURN statement. A RETURN statement after a LINK statement returns execution to the statement
that follows LINK. A RETURN statement after a GO TO statement returns execution to the beginning of
the DATA step, unless a LINK statement precedes GO TO, in which case execution continues with the
first statement after LINK. In addition, a LINK statement is usually used with an explicit RETURN
statement, whereas a GO TO statement is often used without a RETURN statement.
When your program executes a group of statements at several points in the program, using the LINK
statement simplifies coding and makes program logic easier to follow. If your program executes a group
of statements at only one point in the program, using DO-group logic rather than LINK-RETURN logic is
simpler.
Goto eg.
data info;
input x;
if 1<=x<=5 then go to add;
put x=;
add: sumx+x;
datalines;
7
6
323
;
Link Eg.
data hydro;
input type $ depth station $;
/* link to label calcu: */
if type ='aluv' then link calcu;
date=today();
/* return to top of step */
return;
calcu: if station='site_1'
then elevatn=6650-depth;
else if station='site_2'
then elevatn=5500-depth;
/* return to date=today(); */
return;
datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
...more data lines...
;
How would you include common or reuse code to be processed along with your statements?
- Using SAS Macros.
- Using a %include statement
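Both approaches can be sketched in a few lines. The example writes a one-line program to a temporary file so that the %INCLUDE has something to read; the fileref common, the macro name show, and the title text are made-up names, and SASHELP.CLASS is assumed to be available.

```sas
filename common temp;   /* hypothetical file holding shared code */

/* Store a reusable statement in the file. */
data _null_;
   file common;
   put 'title "Shared report title";';
run;

/* %INCLUDE brings the stored statements into the current program. */
%include common;

/* A macro packages code that is reused with different arguments. */
%macro show(ds);
   proc print data=&ds(obs=5);
   run;
%mend show;

%show(sashelp.class)
```

In practice the included file would usually be a permanent program file referenced by its path or a fileref.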
When looking for data contained in a character string of 150 bytes, which function is the best to
locate that data: scan, index, or indexc?
The INDEX function: it searches a character expression for a string of characters and returns the position of its first occurrence.
The INDEXW function searches for strings that are words, whereas the INDEX function searches for
patterns as separate words or as parts of other words. INDEXC searches for the first occurrence of any character that is
present in the excerpts.
data _null_;
s='asdf adog dog';
p='dog ';
x=indexw(s,p);   /* position of 'dog' as a separate word */
put x;
run;
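To compare the three search functions side by side, the sketch below applies each of them to the same string. The variable names are made up for illustration.

```sas
data _null_;
   s = 'asdf adog dog';
   pos_index  = index(s, 'dog');    /* substring anywhere, even inside a word */
   pos_indexw = indexw(s, 'dog');   /* substring as a separate word */
   pos_indexc = indexc(s, 'og');    /* first occurrence of either 'o' or 'g' */
   put pos_index= pos_indexw= pos_indexc=;
run;
/* Log: pos_index=7 pos_indexw=11 pos_indexc=8 */
```

INDEX finds the "dog" inside "adog" at position 7, INDEXW skips it and reports the stand-alone word at position 11, and INDEXC stops at the first "o" at position 8.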
If you have a data set that contains 100 variables, but you need only five of those, what is the
code to force SAS to use only those variables?
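Use the KEEP= data set option (on the DATA statement or the SET statement) or a KEEP statement in the DATA step. Any one of the three is enough; KEEP= on the SET statement is the most efficient because the other variables are never read into the program data vector.
e.g.
Data fewdata (keep = var1 var2 var3 var4 var5);   /* option on the output data set */
Set fulldata (Keep= VAR1 VAR2 VAR3 VAR4 VAR5);    /* option on the input data set */
Keep var1 var2 var3 var4 var5;                    /* KEEP statement */
Run;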
Code a PROC SORT on a data set containing State, District and County as the primary variables,
along with several numeric variables.
Proc sort data= Dist_County;
By state district county;
Run;
data cricket;
input id country $9. score;
cards;
1 australia 342
2 somerset 343
1 australia 342
2 somerset 341
;
run;
In this example, observations 1 and 3 are duplicate records, so observation 1 is retained.
How would you delete observations with duplicate keys?
nodupkey option in a Proc Sort.
proc sort data = cricket nodupkey;
by id;
run;
In the above example, observations 1 and 3 share the key value id=1 and observations 2 and 4 share id=2,
so observations 3 and 4 are deleted.
How would you code a merge that will keep only the observations that have matches from both
sets.
data mergeddata;
merge one(in=A) two(in=B);
By ID;
if A and B;
run;
How would you code a merge that will write the matches of both to one data set, the non-
matches from the left-most data.
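One way to do this, sketched under assumptions, is to name both output data sets on the DATA statement and route each observation with the IN= variables. The data set names work.one, work.two, work.matches and work.left_only, the variables, and the values are all made up for illustration.

```sas
data work.one;
   input id x;
   datalines;
1 10
2 20
3 30
;
run;

data work.two;
   input id y;
   datalines;
2 200
3 300
4 400
;
run;

/* Both inputs are already in ID order, as BY-group merging requires. */
data work.matches work.left_only;
   merge work.one(in=A) work.two(in=B);
   by id;
   if A and B then output work.matches;     /* found in both data sets */
   else if A then output work.left_only;    /* found only in the left-most one */
run;
```

Here ids 2 and 3 go to work.matches and id 1 goes to work.left_only. Id 4 exists only in the right-most data set and is written nowhere.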
Does SAS 'Translate' (compile) or does it 'Interpret'? Explain. At compile time when a SAS
data set is read, what items are created?
SAS compiles the code sent to the compiler.
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and
compiles them, that is, automatically translates the statements into machine code. In this phase, SAS
identifies the type and length of each new variable, and determines whether a type conversion is
necessary for each subsequent reference to a variable. During the compile phase, SAS creates the
following three items:
input buffer is a logical area in memory into which SAS reads each record of raw data when SAS
executes an INPUT statement. Note that this buffer is created only when the DATA
step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data
directly into the program data vector.)
program data vector (PDV) is a logical area in memory where SAS builds a data set, one observation at a time.
When a program executes, SAS reads data values from the input buffer or creates
them by executing SAS language statements. The data values are assigned to the
appropriate variables in the program data vector. From here, SAS writes the values to
a SAS data set as a single observation.
Along with data set variables and computed variables, the PDV contains two automatic
variables, _N_ and _ERROR_. The _N_ variable counts the number of times the
DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an
error caused by the data during execution. The value of _ERROR_ is either 0
(indicating no errors exist), or 1 (indicating that one or more errors have occurred).
SAS does not write these variables to the output data set.
descriptor information is information that SAS creates and maintains about each SAS data set, including data
set attributes and variable attributes. It contains, for example, the name of the data set
and its member type, the date and time that the data set was created, and the number,
names and data types (character or numeric) of the variables.
• The DATA step begins with a DATA statement. Each time the DATA statement executes, a
new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
• SAS sets the newly created program variables to missing in the program data vector (PDV).
• SAS reads a data record from a raw data file into the input buffer, or it reads an observation
from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET,
MODIFY, or UPDATE statement to read a record.
• SAS executes any subsequent programming statements for the current record.
• At the end of the statements, an output, return, and reset occur automatically. SAS writes an
observation to the SAS data set, the system automatically returns to the top of the DATA step,
and the values of variables created by INPUT and assignment statements are reset to missing in
the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or
UPDATE statement are not reset to missing here.
• SAS counts another iteration, reads the next record or observation, and executes the subsequent
programming statements for the current observation.
• The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw
data file.
All the variables are assigned missing values (Blank for character, . for numeric values)
What is _n_?
The _N_ variable counts the number of times the DATA step begins to iterate.
It is one of the automatic DATA step variables (the other being _ERROR_) that
SAS provides in the PDV; it is not available in procedures. Note that _N_ does not necessarily equal the observation
number in a data set.
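A small sketch makes the last point concrete: when a subsetting IF discards a record, the iteration count and the observation number drift apart. The data set name work.kept and the variable names are made up for illustration.

```sas
data work.kept;
   input x;
   iteration = _n_;   /* copy _N_, because automatic variables are not written out */
   if x > 1;          /* subsetting IF drops the first record */
   datalines;
1
2
3
;
run;

proc print data=work.kept;
run;
```

The output data set has two observations, and their iteration values are 2 and 3, not 1 and 2.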