What SAS statements would you code to read an external raw data file into a DATA step?

Three SAS statements are used:

FILENAME - assigns a fileref to the location of the external file
INFILE - identifies the external file to be read by an INPUT statement
INPUT - names the variables and describes how their values are laid out in each record

How do you read in the variables that you need?


Use an INPUT statement with column and line pointer controls, informats, and length specifications, so that only the required fields are read.

Are you familiar with special input delimiters? How are they used?
DLM= (DELIMITER=) and DSD are the INFILE statement options that control input delimiters.

DELIMITER= delimiter(s)
specifies an alternate delimiter (other than a blank) to be used for LIST input
DSD (delimiter-sensitive data)
specifies that when data values are enclosed in quotation marks, delimiters within the value be
treated as character data. The DSD option changes how SAS treats delimiters when you use
LIST input and sets the default delimiter to a comma. When you specify DSD, SAS treats two
consecutive delimiters as a missing value and removes quotation marks from character values.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189
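
The effect of DSD described above can be tried with a small in-stream example. This is only a sketch: the data set name work.pets, the variable names, and the data lines are made up for illustration. To read a file delimited by some other character, DLM='|' (for example) would be added to the INFILE statement.

```sas
data work.pets;
   /* DSD: the default delimiter becomes a comma, a comma inside   */
   /* quotation marks is treated as data, the quotation marks are  */
   /* removed, and two consecutive commas mean a missing value     */
   infile datalines dsd truncover;
   input Name :$20. Species :$10. Weight;
   datalines;
"Smith, Rex",dog,31.5
Tom,,4.2
;
run;

proc print data=work.pets;
run;
```

In the first record Name is read as Smith, Rex (one value, without the quotation marks); in the second record Species is missing.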

If reading a variable length file with fixed input, how would you prevent SAS from reading the
next record if the last variable didn't have a value?
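Use the MISSOVER or TRUNCOVER option on the INFILE statement. With fixed (column or formatted) input, TRUNCOVER is usually the better choice, because MISSOVER sets a variable to missing when the record ends before the full field width has been read.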
MISSOVER
prevents an INPUT statement from reading a new input data record if it does not find values in
the current input line for all the variables in the statement. When an INPUT statement reaches
the end of the current input data record, variables without any values assigned are set to
missing.
TRUNCOVER
overrides the default behavior of the INPUT statement when an input data record is shorter than
the INPUT statement expects. By default, the INPUT statement automatically reads the next
input data record. TRUNCOVER enables you to read variable-length records when some
records are shorter than the INPUT statement expects. Variables without any values assigned
are set to missing.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000146932.htm#a000177189
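
A minimal sketch of TRUNCOVER with column input follows. The fileref shortrec, the data set work.cities, and the record contents are assumptions made for illustration; the first step only writes a small variable-length file so that the example is self-contained (in-stream data lines are padded with blanks, so they would not show the problem).

```sas
filename shortrec temp;   /* temporary external file, deleted at end of session */

/* Write three variable-length records; the second has no City value */
data _null_;
   file shortrec;
   put '0734 Boston';
   put '0943';
   put '1009 Rome';
run;

data work.cities;
   /* TRUNCOVER: use whatever is left on a short record and do not */
   /* go to the next record to look for the rest of the field      */
   infile shortrec truncover;
   input ID $ 1-4 City $ 6-15;
run;

proc print data=work.cities;
run;
```

With TRUNCOVER the step should produce three observations, with City missing only for ID 0943. Replacing TRUNCOVER with MISSOVER would set City to missing in every record here, because none of the records reaches column 15.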

What is the difference between an informat and a format? Name three informats or formats.

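INFORMAT Statement - associates informats with variables

An informat is an instruction that tells SAS how to read a value into a variable. Informats are used in INPUT statements, in the INPUT function, and as column attributes in PROC SQL CREATE TABLE statements, to read raw data from an external file or values that are not stored in standard SAS form.

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178244.htm

e.g. COMMAw.d, DATEw., MMDDYYw., DOLLARw.d, $VARYINGw.


FORMAT Statement - associates formats with variables

A format is an instruction that tells SAS how to write or display a value. Formats are used in DATA step FORMAT and PUT statements, in PROC SQL SELECT clauses, and in FORMAT statements in procedures, to write SAS data to a file, report, etc.

Many formats share a name with an informat (for example DATEw. and MMDDYYw.); whether the name acts as a format or as an informat depends on the statement in which it is used.
e.g. DATEw., WORDDATEw., MMDDYYw.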

http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000178212.htm
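
The difference can be seen in one short step: informats appear in the INPUT statement, formats in the FORMAT statement. The data set name work.payments, the variable names, and the values below are invented for this sketch.

```sas
data work.payments;
   /* Informats: how the raw text is READ into numeric values */
   input PayDate :date9. Amount :comma10.;
   /* Formats: how the stored values are WRITTEN when displayed */
   format PayDate worddate18. Amount dollar12.2;
   datalines;
15MAR2007 1,250.50
02APR2007 980
;
run;

proc print data=work.payments;
run;
```

The values are stored as plain numbers (a SAS date value and 1250.5); only the printed appearance changes, for example a date written out in words and an amount shown with a dollar sign.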

Name and describe three SAS functions that you have used, if any?

The most commonly used functions are:

Conversion functions - INPUT / PUT
Truncation functions - INT / CEIL / FLOOR
Character functions - SCAN / SUBSTR / INDEX / LEFT / TRIM / COMPRESS / CAT / CATX / UPCASE / LOWCASE
Arithmetic functions - SUM / ABS
Attribute information functions - ATTRN / LENGTH
Data set functions - OPEN / CLOSE / EXIST
Directory functions - DEXIST / DOPEN / DCLOSE / DCREATE / DINFO
File functions - FEXIST / FOPEN / FILENAME / FILEREF
SQL functions - COALESCE / COUNT / SUM / MEAN
Date functions - DATE / TODAY / DATDIF / DATEPART / DATETIME / INTCK / MDY
Array functions - DIM

http://sastechies.com/SASfunctions.php
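
As a quick illustration of a few of the functions listed above, the following sketch uses SCAN, LEFT, INDEX, INTCK, and MDY in a DATA _NULL_ step. The variable names and the sample values are made up.

```sas
data _null_;
   FullName = 'Smith, John';
   Last  = scan(FullName, 1, ',');          /* first word, comma as delimiter  */
   First = left(scan(FullName, 2, ','));    /* second word, leading blank removed */
   Pos   = index(FullName, 'John');         /* position where 'John' starts    */
   Days  = intck('day', mdy(1, 1, 2007), mdy(1, 31, 2007));  /* day boundaries crossed */
   put Last= First= Pos= Days=;
   /* log: Last=Smith First=John Pos=8 Days=30 */
run;
```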

How would you code the criteria to restrict the output to be produced?
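The question is ambiguous, so the answer depends on what the interviewer means by "restrict":

Global statement - OPTIONS OBS=n; limits the number of observations processed in every step
Data set options - OBS= on an individual data set
PROC SQL - NOPRINT option to suppress printed output; INOBS= and OUTOBS= to limit the rows read or returned by a SELECT
PROC DATASETS - NOLIST option to suppress the directory listing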
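
A short sketch of the OBS= data set option and the PROC SQL OUTOBS= option is shown here. It assumes the SASHELP.CLASS sample data set that ships with SAS is available; the output data set name work.first5 is made up.

```sas
/* Data set option: read only the first 5 observations */
data work.first5;
   set sashelp.class (obs=5);
run;

/* PROC SQL option: return at most 3 rows from the query */
proc sql outobs=3;
   select Name, Age
      from sashelp.class;
quit;
```

PROC SQL writes a warning to the log when OUTOBS= cuts a query short, which is a useful reminder that the result is incomplete.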

What is the purpose of the trailing @ and the @@? How would you use them?

Line-hold specifiers keep the pointer on the current input record when

• a data record is read by more than one INPUT statement (trailing @)


• one input line has values for more than one observation (double trailing @)
• a record needs to be reread on the next iteration of the DATA step (double trailing @).

Use a single trailing @ to allow the next INPUT statement to read from the same record. Use a double
trailing @ to hold a record for the next INPUT statement across iterations of the DATA step.
Normally, each INPUT statement in a DATA step reads a new data record into the input buffer. When
you use a trailing @, the following occurs:

• The pointer position does not change.


• No new record is read into the input buffer.
• The next INPUT statement for the same iteration of the DATA step continues to read the
same record rather than a new one.

SAS releases a record held by a trailing @ when

• a null INPUT statement executes:

input;

• an INPUT statement without a trailing @ executes


• the next iteration of the DATA step begins.
Normally, when you use a double trailing @ (@@), the INPUT statement for the next iteration of the
DATA step continues to read the same record. SAS releases the record that is held by a double trailing
@

• immediately if the pointer moves past the end of the input record
• immediately if a null INPUT statement executes:

input;

• when the next iteration of the DATA step begins if an INPUT statement with a single trailing
@ executes later in the DATA step:

input @;

A record held by the double trailing at sign (@@) is not released until

• the input pointer moves past the end of the record. Then the input pointer moves down to the next record.

>----+----10--V+-
102 92 78 103
84 23 36 75

• an INPUT statement without a line-hold specifier executes.

input ID $4. @@;
.
.
input Department 5.;

The single trailing @
• enables the next INPUT statement to read from the same record
• releases the current record when a subsequent INPUT statement executes without
a line-hold specifier.

Unlike the @@, the single @ also releases a record when control returns to the
top of the DATA step for the next iteration.
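
For the double trailing @ itself, a minimal sketch using the two sample records shown earlier in this answer might look like this. The data set name work.scores and the variable name Score are assumptions.

```sas
data work.scores;
   /* @@ holds the record across iterations, so each iteration */
   /* reads the next value from the same line until it runs out */
   input Score @@;
   datalines;
102 92 78 103
84 23 36 75
;
run;

proc print data=work.scores;
run;
```

Each of the eight values becomes its own observation. Without the @@, only the first value on each line would be read, giving two observations.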

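data perm.sales97;
   infile data97 missover;
   input ID $4. @;               /* hold the record after reading ID */
   do Quarter=1 to 4;
      input Sales : comma. @;    /* read the next sales value from the same record */
      output;                    /* write one observation per quarter */
   end;
run;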

Raw Data File Data97


>----V----10---+----20---+----30---+----40
0734 1,323.34 2,472.85 3,276.65 5,345.52
0943 1,908.34 2,560.38
1009 2,934.12 3,308.41 4,176.18 7,581.81

data perm.people (drop=type);
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then input @3 Address $15.;
   if type='P';
   input @3 Name $10. @13 Age 3. @15 Gender $1.;
run;

Raw Data File Census

>----+----10---+----20
H 321 S. MAIN ST
P MARY E 21 F
P WILLIAM M 23 M
P SUSAN K 3 F
H 324 S. MAIN ST
P THOMAS H 79 M
P WALTER S 46 M
P ALICE A 42 F
P MARYANN A 20 F
P JOHN S 16 M
H 325A S. MAIN ST
P JAMES L 34 M
P LIZA A 31 F
H 325B S. MAIN ST
P MARGO K 27 F
P WILLIAM R 27 M
P ROBERT W 1 M

data perm.residnts;
   infile census;
   retain Address;
   input type $1. @;
   if type='H' then do;
      if _n_ > 1 then output;
      Total=0;
      input Address $ 3-17;
   end;
   else if type='P' then total+1;

Under what circumstances would you code a SELECT construct instead of IF statements?
The SELECT statement begins a SELECT group. SELECT groups contain WHEN statements that
identify SAS statements that are executed when a particular condition is true. Use at least one WHEN
statement in a SELECT group. An optional OTHERWISE statement specifies a statement to be
executed if no WHEN condition is met. An END statement ends a SELECT group.
Null statements that are used in WHEN statements cause SAS to recognize a condition as true without
taking further action. Null statements that are used in OTHERWISE statements prevent SAS from
issuing an error message when all WHEN conditions are false.
Using SELECT-WHEN improves processing efficiency and readability in programs that need to
check a series of conditions for the same variable.
Use IF-THEN/ELSE statements for programs with few statements.
Take care with a subsetting IF statement (an IF without a THEN clause): it is not a conditional
branch. Only the records that meet the condition continue to be processed; all others are dropped.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201966.htm
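
A SELECT group that checks a series of conditions on one variable, as described above, could be sketched as follows. The data set name work.graded, the variable names, and the grade values are invented for illustration.

```sas
data work.graded;
   input Name $ Grade $;
   length Result $ 12;
   select (Grade);
      when ('A', 'B') Result = 'Pass';        /* several values in one WHEN */
      when ('C')      Result = 'Borderline';
      when ('F')      Result = 'Fail';
      otherwise       Result = 'Unknown';     /* avoids an error when no WHEN matches */
   end;
   datalines;
Ann A
Bob C
Cy F
Di X
;
run;

proc print data=work.graded;
run;
```
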

What statement do you code to tell SAS that it is to write to an external file?

FILENAME / FILE / PUT

The FILENAME statement is an optional statement that specifies the location of the external file.
The PUT statement writes the variable values to the external file.

The FILE statement specifies the current output file for PUT statements in the DATA step.
When multiple FILE statements are present, the PUT statement builds and writes output lines to the file
that was specified in the most recent FILE statement. If no FILE statement was specified, the PUT
statement writes to the SAS log. The specified output file must be an external file, not a SAS data library,
and it must be a valid access type.
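
Put together, the three statements might be used as in the sketch below. It assumes the SASHELP.CLASS sample data set is available, and the fileref outrpt points to a temporary file only so that the example runs anywhere; in practice it would name a real path.

```sas
filename outrpt temp;   /* assumption: replace TEMP with a quoted file path */

data _null_;
   set sashelp.class;
   file outrpt;                               /* PUT now writes to the external file */
   put Name $8. +1 Age 2. +1 Height 5.1;      /* one line per observation */
run;
```

Without the FILE statement the same PUT statement would write its lines to the SAS log.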

If reading an external file to produce an external file, what is the shortcut to write that record
without coding every single variable on the record?

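Use the _INFILE_ automatic variable in the PUT statement. A null INPUT statement loads each record into the input buffer without creating any variables.

filename some 'c:\cool.dat';
filename cool1 'c:\cool1.dat';
data _null_;
   infile some;
   input;           /* read the record into the input buffer */
   file cool1;
   put _infile_;    /* write the whole record unchanged */
run;

If you do not want any SAS output from a DATA step, how would you code the DATA statement to
prevent SAS from producing a data set?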

Data _null_;

_NULL_ - specifies that SAS does not create a data set when it executes the DATA step.

DATA _NULL_ is mainly used for

o creating macro variables quickly with the CALL SYMPUT routine

e.g.
Data _null_;
Set somedata;
Call symput('macvar',dsnvariable);
Run;

o creating a custom report

e.g.
The second DATA step in this program produces a custom report and uses the _NULL_ keyword to
execute the DATA step without creating a SAS data set:
data sales;
   input dept : $10. jan feb mar;
   datalines;
shoes 4344 3555 2666
housewares 3777 4888 7999
appliances 53111 7122 41333
;
data _null_;
   set sales;
   qtr1tot=jan+feb+mar;
   put 'Total Quarterly Sales: ' qtr1tot dollar12.;
run;

What is the one statement to set the criteria of data that can be coded in any step?
The WHERE statement sets the selection criteria for a SAS data set and can be used in both DATA steps and PROC steps.
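
The same WHERE statement works in a DATA step and in a procedure, as this sketch shows. It assumes the SASHELP.CLASS sample data set is available; the output data set name work.teens is made up.

```sas
/* DATA step: only observations that satisfy the condition are read */
data work.teens;
   set sashelp.class;
   where Age >= 13;
run;

/* PROC step: the procedure processes only the matching observations */
proc print data=sashelp.class;
   where Sex = 'F' and Age >= 13;
run;
```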

Have you ever linked SAS code? If so, describe the link and any required statements used to
either process the code or the step itself.

SAS code could be linked using the GOTO or the Link statement.

GOTO - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201949.htm
LINK - http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000201972.htm
The difference between the LINK statement and the GO TO statement is in the action of a subsequent
RETURN statement. A RETURN statement after a LINK statement returns execution to the statement
that follows LINK. A RETURN statement after a GO TO statement returns execution to the beginning of
the DATA step, unless a LINK statement precedes GO TO, in which case execution continues with the
first statement after LINK. In addition, a LINK statement is usually used with an explicit RETURN
statement, whereas a GO TO statement is often used without a RETURN statement.
When your program executes a group of statements at several points in the program, using the LINK
statement simplifies coding and makes program logic easier to follow. If your program executes a group
of statements at only one point in the program, using DO-group logic rather than LINK-RETURN logic is
simpler.
GO TO example:
data info;
   input x;
   if 1<=x<=5 then go to add;
   put x=;
   add: sumx+x;
   datalines;
7
6
323
;

LINK example:
data hydro;
   input type $ depth station $;
   /* link to label calcu: */
   if type ='aluv' then link calcu;
   date=today();
   /* return to top of step */
   return;
   calcu: if station='site_1'
         then elevatn=6650-depth;
      else if station='site_2'
         then elevatn=5500-depth;
   /* return to date=today(); */
   return;
   datalines;
aluv 523 site_1
uppa 234 site_2
aluv 666 site_2
...more data lines...
;

How would you include common or reusable code to be processed along with your statements?
- Using SAS Macros.
- Using a %include statement
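
Both approaches are sketched below. The fileref common, the macro name print_first, and its parameters are hypothetical; the first step only writes a small file of shared statements so that the %INCLUDE has something to read. The example also assumes the SASHELP.CLASS sample data set is available.

```sas
/* Create a small file of shared statements (stands in for a real .sas file) */
filename common temp;
data _null_;
   file common;
   put 'title "Report produced with shared settings";';
   put 'options nodate nonumber;';
run;

/* %INCLUDE: the statements in the file are run as if typed here */
%include common;

/* Macro: reusable code with parameters */
%macro print_first(dsn, n=5);
   proc print data=&dsn (obs=&n);
   run;
%mend print_first;

%print_first(sashelp.class)
%print_first(sashelp.class, n=3)
```

A %INCLUDE suits fixed blocks of code shared between programs; a macro is the better fit when the reused code needs parameters.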

When looking for data contained in a character string of 150 bytes, which function is the best to
locate that data: scan, index, or indexc?
Index function - Searches a character expression for a string of characters

SAS Statements            Results

a='ABC.DEF (X=Y)';
b='X=Y';
x=index(a,b);
put x;                    10

For learning purposes


The INDEXC function searches for the first occurrence of any individual character that is present
within the character string, whereas the INDEX function searches for the first occurrence of the
character string as a pattern.
b='have a good day';
x=indexc(b,'pleasant','very');
put x;

The INDEXW function searches for strings that are words, whereas the INDEX function searches for
patterns as separate words or as parts of other words. INDEXC searches for any characters that are
present in the excerpts.
s='asdf adog dog';
p='dog ';
x=indexw(s,p);
put x;

If you have a data set that contains 100 variables, but you need only five of those, what is the
code to force SAS to use only those variables?
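Use the KEEP= data set option (on the DATA statement or the SET statement) or the KEEP statement in a DATA step. Any one of the three is sufficient; KEEP= on the SET statement is the most efficient, because the other variables are never read into the program data vector.

e.g.
Data fewdata (keep = var1 var2 var3 var4 var5);   /* controls what is written */
Set fulldata (keep = var1 var2 var3 var4 var5);   /* controls what is read */
Keep var1 var2 var3 var4 var5;                    /* same effect as KEEP= on the DATA statement */
Run;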

Code a PROC SORT on a data set containing State, District and County as the primary variables,
along with several numeric variables.
Proc sort data= Dist_County;
By state district county;
Run;

How would you delete duplicate observations?


Use the NODUPRECS option in PROC SORT.

data cricket;
input id country $9. score;
cards;
1 australia 342
2 somerset 343
1 australia 342
2 somerset 341
;
run;

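proc sort data = cricket noduprecs;
by id;
run;

In this example observations 1 and 3 are identical records, so observation 1 is retained and observation 3 is deleted. NODUPRECS compares each observation only with the previous one written, so duplicates are removed only when the sort places them next to each other.

How would you delete observations with duplicate keys?
Use the NODUPKEY option in PROC SORT.
proc sort data = cricket nodupkey;
by id;
run;
In the above example observations 1 and 3, and observations 2 and 4, have duplicate key (variable id) values, i.e. 1 and 2
respectively, so observations 3 and 4 are deleted.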

How would you code a merge that will keep only the observations that have matches from both
sets.
data mergeddata;
merge one(in=A) two(in=B);
By ID;
if A and B;
run;

How would you code a merge that will write the matches of both to one data set, the non-matches
from the left-most data set to a second data set, and the non-matches from the right-most data set
to a third data set?

Data one two three;
Merge DSN1 (in=A) DSN2 (in=B);
By ID;
If A and B then output one;
If A and not B then output two;
If not A and B then output three;
Run;

What is the Program Data Vector (PDV)? What are its functions?
PDV is a logical area in memory where SAS builds a data set, one observation at a time. When a
program executes, SAS reads data values from the input buffer or creates them by executing SAS
language statements. The data values are assigned to the appropriate variables in the program data
vector. From here, SAS writes the values to a SAS data set as a single observation.

Along with data set variables and computed variables, the PDV contains two automatic variables, _N_
and _ERROR_. The _N_ variable counts the number of times the DATA step begins to iterate. The
_ERROR_ variable signals the occurrence of an error caused by the data during execution. The value of
_ERROR_ is either 0 (indicating no errors exist), or 1 (indicating that one or more errors have occurred).
SAS does not write these variables to the output data set.

Does SAS 'Translate' (compile) or does it 'Interpret'? Explain. At compile time when a SAS
data set is read, what items are created?
SAS compiles DATA step code before it executes it.
When you submit a DATA step for execution, SAS checks the syntax of the SAS statements and
compiles them, that is, automatically translates the statements into machine code. In this phase, SAS
identifies the type and length of each new variable, and determines whether a type conversion is
necessary for each subsequent reference to a variable. During the compile phase, SAS creates the
following three items:
input buffer
is a logical area in memory into which SAS reads each record of raw data when SAS
executes an INPUT statement. Note that this buffer is created only when the DATA
step reads raw data. (When the DATA step reads a SAS data set, SAS reads the data
directly into the program data vector.)

program data vector (PDV)
is a logical area in memory where SAS builds a data set, one observation at a time.
When a program executes, SAS reads data values from the input buffer or creates
them by executing SAS language statements. The data values are assigned to the
appropriate variables in the program data vector. From here, SAS writes the values to
a SAS data set as a single observation.
Along with data set variables and computed variables, the PDV contains two automatic
variables, _N_ and _ERROR_. The _N_ variable counts the number of times the
DATA step begins to iterate. The _ERROR_ variable signals the occurrence of an
error caused by the data during execution. The value of _ERROR_ is either 0
(indicating no errors exist), or 1 (indicating that one or more errors have occurred).
SAS does not write these variables to the output data set.

descriptor information
is information that SAS creates and maintains about each SAS data set, including data
set attributes and variable attributes. It contains, for example, the name of the data set
and its member type, the date and time that the data set was created, and the number,
names and data types (character or numeric) of the variables.

The Execution Phase


By default, a simple DATA step iterates once for each observation that is being created. The flow of
action in the Execution Phase of a simple DATA step is described as follows:

• The DATA step begins with a DATA statement. Each time the DATA statement executes, a
new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
• SAS sets the newly created program variables to missing in the program data vector (PDV).
• SAS reads a data record from a raw data file into the input buffer, or it reads an observation
from a SAS data set directly into the program data vector. You can use an INPUT, MERGE, SET,
MODIFY, or UPDATE statement to read a record.
• SAS executes any subsequent programming statements for the current record.
• At the end of the statements, an output, return, and reset occur automatically. SAS writes an
observation to the SAS data set, the system automatically returns to the top of the DATA step,
and the values of variables created by INPUT and assignment statements are reset to missing in
the program data vector. Note that variables that you read with a SET, MERGE, MODIFY, or
UPDATE statement are not reset to missing here.
• SAS counts iteration, reads the next record or observation, and executes the subsequent
programming statements for the current observation.
• The DATA step terminates when SAS encounters the end-of-file in a SAS data set or a raw
data file.

When the PDV is first set up, all newly created variables are assigned missing values (blank for character variables, . for numeric variables).

Name statements that are recognized at compile time only?

DROP, KEEP, RENAME, LABEL, FORMAT, INFORMAT, ATTRIB, WHERE, BY, RETAIN, LENGTH, ARRAY

Name statements that are execution only.

INFILE, INPUT, OUTPUT, CALL routines

Identify statements whose placement in the DATA step is critical.

DATA, INPUT, RUN, CARDS, INFILE, WHERE, LABEL, SELECT, INFORMAT, FORMAT

Name statements that function at both compile and execution time.

OPTIONS, TITLE, FOOTNOTE

In the flow of DATA step processing, what is the first action in a typical DATA Step?
The DATA step begins with a DATA statement. Each time the DATA statement executes, a new
iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.

What is _n_?
The _N_ variable counts the number of times the DATA step begins to iterate.
It is one of the two automatic variables (the other being _ERROR_) that SAS provides in the PDV
of a DATA step; it is not available in PROC steps. Note that _N_ does not necessarily equal the
observation number in a data set.
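
The last point can be checked with a subsetting IF, as in this sketch. It assumes the SASHELP.CLASS sample data set is available; the names work.older and Iteration are made up.

```sas
data work.older;
   set sashelp.class;
   Iteration = _n_;     /* copy _N_, since automatic variables are not written out */
   if Age >= 14;        /* subsetting IF: younger students are not output */
run;

proc print data=work.older;
   var Iteration Name Age;
run;
```

In the printed output the Obs column runs consecutively, while Iteration skips the iterations whose observations were not written.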

How do I convert a numeric variable to a character variable?


The data type of an existing variable cannot be changed, but its values can be converted into a
new variable. Create a new character variable, assign it the values of the numeric variable with
the PUT function, drop the numeric variable, and rename the character variable to the original
name. Note: if you assign the PUT result back to the original numeric variable instead, the
variable stays numeric; SAS converts the value back automatically and writes a conversion note to the log.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000199354.htm#a000226452
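
One way to carry out these steps in a single DATA step is to rename the numeric variable as it is read, as sketched here. The data set names, the variable names Zip and ZipNum, and the choice of the Z5. format are assumptions made for illustration.

```sas
data work.addresses;
   input Zip;            /* numeric on input */
   datalines;
2134
90210
;
run;

data work.addresses_char (drop=ZipNum);
   /* Rename on the way in, so the name Zip is free for the new variable */
   set work.addresses (rename=(Zip=ZipNum));
   length Zip $ 5;
   Zip = put(ZipNum, z5.);   /* numeric to character; 2134 becomes '02134' */
run;
```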

How do I convert a character variable to a numeric variable?


The data type of an existing variable cannot be changed, but its values can be converted into a
new variable. Create a new numeric variable, assign it the values of the character variable with
the INPUT function, drop the character variable, and rename the numeric variable to the original
name. Note: if you assign the INPUT result back to the original character variable instead, the
variable stays character; SAS converts the value back automatically and writes a conversion note to the log.
http://support.sas.com/onlinedoc/913/getDoc/en/lrdict.hlp/a000180357.htm
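
The reverse conversion follows the same pattern with the INPUT function and an informat, as in this sketch. The data set names, the variable names Amount and AmountChar, and the COMMA12. informat are assumptions made for illustration.

```sas
data work.amounts;
   input Amount :$12.;   /* character on input */
   datalines;
1,250.50
980
;
run;

data work.amounts_num (drop=AmountChar);
   /* Rename on the way in, so the name Amount is free for the new variable */
   set work.amounts (rename=(Amount=AmountChar));
   Amount = input(AmountChar, comma12.);   /* character to numeric; commas are removed */
run;
```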
find more @ http://sastechies.blogspot.com/
