Vous êtes sur la page 1sur 57

SAS Notes

SAS Notes

Introduction......................................................................................................................................................4
1. ORIGIN OF SAS..................................................................................................................................4
2. WHY SAS?.......................................................................................................................................4
3. CHARACTERISTICS OF SAS....................................................................................................................4
4. DATA WAREHOUSING............................................................................................................................5
Basics of SAS Software...................................................................................................................................7
1. SAS DATA SET................................................................................................................................7
2. A SAMPLE SAS PROGRAM....................................................................................................................8
3. THE DATA STEP..................................................................................................................................8
3. THE PROC STEP..................................................................................................................................9
4. THE PARSE / EXECUTE CYCLE.................................................................................................................9
5. DATA IN MEMORY..............................................................................................................................10
6. THE OBSERVATION LOOP....................................................................................................................11
8. FILE STRUCTURE................................................................................................................................12
9. SAS DATE & TIME...........................................................................................................................13
Rules of SAS Language.................................................................................................................................14
1. KEYWORDS.......................................................................................................................................14
2. AUTOMATIC VARIABLES......................................................................................................................15
3. VARIABLE ATTRIBUTES.......................................................................................................................15
4. VARIABLE LISTS ........................................................................................................................16
5. THE NUMERIC DATA TYPE..................................................................................................................16
6. OPTIONS..........................................................................................................................................17
SAS Programming concepts..........................................................................................................................18
I. THE DATA STEP.................................................................................................................................18
Types of DATA steps..............................................................................................................................18
1. Data from an external file.................................................................................................................18
2. Data in job stream.............................................................................................................................18
3. Data in existing SAS data set............................................................................................................19
4. Writing reports ..................................................................................................................................19
II. WORKING WITH SAS DATASETS............................................................................................................20
Dataset Options.................................................................................................................................................20
III. INPUT / OUTPUT STYLES....................................................................................................................21
1. List Input............................................................................................................................................21
2. Column Input.....................................................................................................................................21
3. Formatted Input.................................................................................................................................21
4. Named Input......................................................................................................................................22
IV. WORKING WITH EXTERNAL FILES.........................................................................................................23
INPUT operation...................................................................................................................................23
INFILE..............................................................................................................................................................23
OUTPUT operation................................................................................................................................25
FILE..................................................................................................................................................................25
Pointer Controls & Line-Hold Specifiers..............................................................................................27
V. DATA STEP STATEMENTS .................................................................................................................28
1. File handling Statements...................................................................................................................28
2. Action Statements...............................................................................................................................28
3. Control Statements.............................................................................................................................29
4. Information Statements......................................................................................................................29
VI. OPERATORS IN SAS..........................................................................................................................31
VII. COMBINING DATA SETS....................................................................................................................32
Introduction............................................................................................................................................32
1 Concatenating data sets......................................................................................................................32
2. Interleaving........................................................................................................................................33
3. One to One Reading..........................................................................................................................34
4. One to One Merging..........................................................................................................................34
5. Match-Merging.................................................................................................................................34
Duplicate values of BY variable.......................................................................................................................35
Nonmatching observations...............................................................................................................................36
6. Updating Data sets............................................................................................................................36

Page 2 of 57
SAS Notes

Duplicate values of BY variable.......................................................................................................................37


Non-matched observations...............................................................................................................................38
VIII. ARRAYS...................................................................................................................................40
IX. FUNCTIONS................................................................................................................................40
Procedures......................................................................................................................................................45
Append....................................................................................................................................................45
Compare.................................................................................................................................................45
Contents..................................................................................................................................................45
Datasets..................................................................................................................................................46
Formats..................................................................................................................................................47
Summary or Means................................................................................................................................48
Print .......................................................................................................................................................50
SQL.........................................................................................................................................................50
SAS Macro Language....................................................................................................................................54
Macro Variables....................................................................................................................................54
Macros....................................................................................................................................................54
Some other SAS Products..............................................................................................................................56

Page 3 of 57
SAS Notes

Introduction
The SAS system began as a software system for Data Analysis & statistical work. Since then, SAS has
evolved and made its presence in diverse fields. Today, SAS Systems analysis tools range from simple
statistics to specialized analysis for econometrics & forecasting, statistical design, computer performance
evaluation & Operation Research. SAS finds its highest application in the field of Data Warehousing &
Data Mining.

1. Origin of SAS
SAS originally stood for “statistical analysis system”and many of the characteristics of SAS can be traced
back to its statistical background.

In statistical experiments, a measuring process can be repeated at many different times. Each instance of
measurement is called an observation and different qualities that are measured are called variables.
That’s the source of these two SAS terms and the form of a SAS dataset.

Ideally, these observations are independent, i.e. different observations do not depend on each other. The
data from each observation can be processed independently, without reference to data from other
observations, and the order in which observations are processed do not affect the conclusion. This makes
possible the concept of observation loop for a computer program involving a repeated process of reading
one observation at a time into memory and extracting the information needed from it. This observation
loop is a central part of the design of the SAS system.

2. Why SAS?
SAS System is an integrated system of software products.

• Its power, flexibility & ease of use enables you to gain strategic control of all your data processing
needs. SAS System has a collection of ready-to-use programs called procedures. Combined with
other features of SAS System, it makes it possible to have a variety of applications – from general-
purpose data processing to specialized analysis in many application areas.

• It facilitates applications that run on more than one computing environment. SAS applications work
the same, look the same and produce the same results irrespective of your hardware or OS. This is
possible because SAS System has a layered structure called Multi Vendor Architecture (MVA)
This consists of a host specific component which is specifically written for each environment and the
portable component which brings it a universal ‘feel’. You can develop SAS applications on one
environment and run them in other environments without any changes.

• It can accommodate skill level of potential users. SAS provides flexible user interface in the form of
menu-driven or task-oriented interfaces. New users can practically develop applications without
learning the syntax of the SAS language through these interfaces.

• It provides an exhaustive inventory of application development tools.

3. Characteristics of SAS

• The SAS System has a modular design. It involves a large collection of several programs that are
coordinated by a central program called the supervisor.

Page 4 of 57
SAS Notes

• SAS is an interpreted language but has some characteristics of a compiler. Most SAS statements are
grouped into segments called steps, rather than being interpreted and executed at one time, before
execution.
• SAS is called a step-structured language because it only allows one step to run at a time, one after
the other.
• SAS has been called a very high-level language because much of its syntax is even more abstract
than most high-level languages. The program code correlate as close as possible to the ideas of the
programmer and the result he/she seeks to achieve.
• SAS has it’s own storage format and SAS language provides high-level access to files in this format.
Data files that SAS system accesses this way are called SAS Datasets. The simplified access to SAS
datasets in SAS syntax eliminates most of the work of programming input & output in SAS
programs. At the same time SAS also provides high-level access to files of other formats through
specialized routines called format & informat. The input and output capabilities of SAS are still
among the most powerful & flexible of any programming language.

4. Data warehousing
“The goal of data warehousing is to free the information that is locked up in the operational data
bases and to mix it with information from other, often external, sources of data”

Operational systems are systems that help in running the enterprise operations. They are the backbone
of the enterprise running daily transactions such as “inventory”, “payroll”, “accounting” and other such
transactional systems. Such systems are indispensable to an organization, as an enterprise cannot operate
without these systems. These systems are tuned for high performance and quick response time and often
need to extremely stable and robust.

Informational systems perform the crucial functions of enabling the planning, forecasting and other
strategy related management functions. In a dynamic business environment, the enterprise has to be
geared for the future in order to sustain itself and grow in a healthy manner. The Informational systems
are knowledge-based (where as the operational systems are data based) and they deal with analyzing data
and helping managers in arriving at decisions.

The significant difference between the operational systems and the informational systems can be seen in
the area of the focus of the two systems. An operational system is focussed on a single area while an
informational system has to span a breadth of different areas. This is because an operational system is
concerned with the data and transactions in a particular area while the informational systems needs to
data from different sources to facilitate decision making. Even if there is an all-encompassing
operational system, it cannot double up as an informational system because its main function is efficiency
in operations. Data used in analysis is typically historical data which is inactive and this data if mixed
with operational live data causes performance degradation of the operational system. Thus informational
systems have to be designed that aid the decision-makers in performing analyzing and planning for the
future. A Data warehouse effectively performs the function of an informational system on an enterprise
level.

A Data warehouse is a “collection of integrated subject-oriented databases designed to support the DSS
function; where each unit of data is relevant to some moment in time. The data warehouse contains
atomic data and lightly summarized data.“

Most databases are designed to ease data entry, reduce redundancy and speed the retrieval of a single
entity. The data warehouse, on the other hand, is designed of fast retrieval of information & answers.
This means that groups of records will be retrieved, manipulated & analyzed. It may require that data
needs to be accessed from multiple database sources – a collection of integrated subject-oriented
databases.

The data resident in a data warehouse is non-volatile data. Data from the operational systems is triggered
to go to the data warehouse when most of the activities on these operational data has been completed.

Page 5 of 57
SAS Notes

Data in typically saved for a large period of time as the efficiency of analysis improves with the breadth
and depth of data available in the data warehouse. The data in the data warehouse is usually at a level
higher than the data at the operational level i.e. some analysis or aggregation has already been performed
to the operational data. There are certain data items that will be in the same level as that of the
operational data (e.g. the grain of the fact in a data warehouse might be the sales on a particular day and
this might be the same as the data in a operational system)

Page 6 of 57
SAS Notes

Basics of SAS Software


The core of the SAS system is the Base SAS software. It gives you all the tools you need to make your
data useful & meaningful. It consists of

• The SAS language – programming language used to mange your data.


• Procedures – Pre-written computer programs that analyze & process datasets & display results.
• A Macro Facility – to generate & store text strings & communicate info from one program to
another.
• A Windowing Environment called SAS Display Manager System.

The Base SAS software provide tools for :

• Information Storage And Retrieval


• Data Modification And Programming
• Report Writing
• Statistical Analysis
• File Handling

The next few sections discuss the following :

• SAS DATA set


• Data step
• PROC step
• The Parse / Execute Cycle
• Options
• File structure
• SAS date and time

1. SAS DATA set


The Data to be used must be in a form understandable to the SAS System. This form is called
the SAS Data set. It consists of the Descriptor information & Data values.

• The Descriptor information describes the contents of the SAS dataset to the SAS System.
• The Data values contain the actual data to be analyzed.

The DATA values are arranged into a rectangular structure of rows (called observation) and
columns (called variables). An Observation is a collection of data that usually relate to a single object.
A variable is the set of data values that describe the characteristics of the object.

SAS recognizes missing values and has an internal representation for them. Missing values are values
unavailable to the System. This representation is used because SAS requires values for all variables for
every observation in the dataset. This ensures the rectangular structure of the data values.

SAS datasets are kept in collections called SAS data libraries. A SAS dataset is identified in a SAS
program by a two-level name that identifies the SAS data library & the SAS dataset. A one-level name
used for a SAS dataset implies that the default WORK library is being assumed.

All SAS programs consists of a series of statements that, as a group, are designed to accomplish a
specific task called SAS steps. These SAS steps fall into 2 categories –
DATA steps & PROC steps.
They are the building blocks of all SAS programs. The SAS datasets is useful to store data between SAS
steps

Page 7 of 57
SAS Notes

2. A Sample SAS program

data ht_wt;
input
name $ 1-10 sex $ 12 age 14-15 height 17-18 weight 20-22 init_ht 24-25
init_wt 27-29;
wt_loss = init_wt – weight;
ht_loss = init_ht – height;
cards;
John M 35 76 172 73 175
James M 32 75 167 73 169
Mary F 30 68 165 62 154
Ruby F 32 65 158 58 163
;
run;

proc summary data = ht_wt;


var wt_loss ht_loss;
output out = min_wt min(wt_loss )=;
output out = min_ht min(ht_loss)=;
run;

proc print data = min_wt;


run;

proc print data = min_ht;


run;

endsas;

Note: The variables in SAS dataset ht_wt are name, sex, weight, height, init_ht, init_wt, wt_loss and
ht_loss

3. The Data Step

DATA – instructs the SAS system to create s SAS dataset. It signals the beginning of the DATA step and
gives a name to the SAS data set you are creating.

INPUT – provides information to the System to organize data into SAS datasets. It describes your input
data, giving name to each variable and identifying its location on the disk or tape file.

CARDS – mark the end of programming stmt & beginning of data within the same step as the interpreter
stops as soon as it gets to the CARDS statement. Any statement that follows on the same line will be
completely ignored. Since there is more than one data line, the data step is executed repeatedly, creating
one observation from each input line, until the end of the input line is reached.

RUN – instructs SAS to execute the previous statements

Page 8 of 57
SAS Notes

3. The Proc Step


In addition to the data step, the program has two proc steps. Once your data are accessible as a SAS
dataset, you can analyze the data & write reports using a set of utilities known as SAS
Procedures. Proc’s are specialized application programs that analyze data in a
dataset for producing univariate descriptive statistics, frequency tables, cross tabulation
tables, tabular reports, charts, plots etc. Other procedures provide ways to manage SAS
files. To most SAS users they are the main attraction of the SAS System.

The first PROC step asks SAS to call a procedure form its library and to execute that procedure, with the
SAS data set ht_wt as input. Proc SUMMARY computes the minimum values of ht_loss & wt_loss, the
two variables listed in the var statement. Two output SAS datasets are created (min_wt & min_ht) with
minimum value of variable wt_loss in one(min_wt) & minimum value of variable ht_los in the other
(min_ht) SAS dataset. The minimum values of the variables ht_loss & wt_loss are stored in the same
variables itself in the corresponding output datasets, in this case. The output datasets contain all the
variables named in the proc step and a few extra identifier variables.

Note: The variable in SAS dataset min_ht is ht_loss & _freq_ and in SAS dataset min_wt is wt_loss &
_freq_. _freq_ is an automatic variable created by summary procedure which gives the number of
observations for the current subgroup.

The PRINT proc produces a report in a table form, of all the data vales in the SAS dataset. This report
prints the contents of the standard print file. The standard print file is the file ordinarily used to hold text
output from programs, whether it is sent to printer, displayed on screen, or stored for later use. Print files
are divided into pages and usually have a tittle that appears on the first few lines of each page.

Another file that is automatically created and maintained by SAS is the log file. The log file contains
the SAS supervisor’s step-by-step account of the execution of the program, including program
statements, notes, warnings & error messages.

4. The Parse / Execute cycle


1. When the interpreter reads lines from the program file, it parses them into tokens & statements until
a complete step is formed.

• It then checks the syntax of the statement


• If it is the kind of statement that is executed immediately, then the statement is
executed.
• If there is a syntax error it generates an error message.
• Otherwise, it adds information from the statement to the step being built
• Continues processing statements until it reaches the end of the step
• Then it executes the step if no errors are reported.

2 Then it parses more lines from the program file until it forms another complete step and the above
process is repeated for the execution of that step.

3 The above process continues until the end of program is reached.

There is no compiling feature in PROC steps because PROCs are already compiled programs.

Statements executed immediately are called global statements and do not have to be associated with a
step.

The scope of the SAS data step begins with the key word DATA and ends with any one of the following:

• The keyword RUN (or QUIT)

Page 9 of 57
SAS Notes

• Another Data step beginning with the key word DATA


• Another Proc step beginning with the key word PROC
• End of program code
• The keyword ENDSAS
• CARDS or CARDS4 statement in a data step

A SAS dataset can be also be compiled and stored separately. Compiling a data step creates a SAS file in
a SAS library, which can be run in a SAS program. The compiled data set is not a machine language
program but a parsed code called “intermediate code”.

DATA ABC;
INFILE ABC;
INPUT A B C D E;
RUN PGM=PROJECT.ABC;

The PGM= option specifies a SAS file name where the compiled dataset will be stored. The dataset is
not executed in this case. A compiled data step can be executed by using the PGM= option on a DATA
statement. You can change the names of the input(SET, MERGE or UPDATE statements) & output
(DATA statement) SAS data sets in the compiled data step by using the REDIRTECT INPUT &
REDIRECT OUTPUT statements.

DATA PGM =SAS file;


REDIRECT INPUT compiled SAS dataset name = actual SAS dataset name …;
REDIRECT INPUT compiled SAS dataset name = actual SAS dataset name …;
RUN;

The input SAS dataset that are present when the data step is compiled should have the same variables &
attributes as the SAS data set that will be used when the compiled dataset is run, but need not contain any
observations. They can be created as :

DATA NEW;
SET ABC;
STOP;

Where NEW is the dataset with only the Descriptor information of dataset ABC.

5. Data in memory

SAS keeps the values of all variables in a step in a block of memory called the program data vector or
PDV. The size of the PDV is fixed, which limits the number of variables a step can have to a few
thousands. The PDV and variable attributes represent a modest part of the SAS system’s use of memory.
In addition it can include:
• Pointers to all the files being used, including the program file, the log file, the standard print file,
input files, output files and any libraries being used in the step.
• Buffers that contain data read from or written to each file.
• The SAS supervisor
• The current step and any procs, functions, CALL routines, informats and formats being used in the
step
• File names associated with filerefs or librefs in a FILENAME or LIBNAME statement
• System options
• Array definitions
• Titles
• Macro variable names & values

Page 10 of 57
SAS Notes

6. The Observation Loop


The amount of input data read by one repetition of the observation loop is an observation.

During compilation, SAS creates three things:

1. Buffer : Area of memory into which each record of raw data is


• read when an Input statement executes, or
• written when a Put statement executes.

2. Program Data Vector : Area of memory where the SAS data set is built, one observation at a time.
Values are assigned to the variables in the program vector during execution. From here the values are
written to the output SAS dataset as a single observation.

3. Descriptor information : Information the SAS System creates & maintains about each SAS dataset
like data set attributes & variable attributes. The variable attributes include :

1 Name of the variable


2 Data type
3 Length
4 Label
5 RETAIN or reinitialize
6 DROP or KEEP
7 Initial value assigned, if any
8 Format or Informat , if any
9 Position of the variable in the dataset

During Execution,

• By default SAS data step executes once for each observation being created. Each time
the DATA statement executes, a new iteration of the data step begins. The automatic variable
_N_ is set to the next value.

• Statements that read data (INPUT, SET, MERGE, and UPDATE) are executable. They
may appear anywhere in the data step, and do not have to be placed right after the data step.

• A raw data reading statement like Input causes a record of data to be read into the input
buffer and then into the Program data vector. Records coming from another SAS dataset are
read directly into the Program Data Vector.

• Subsequent SAS statements are executed for the current record.

• When the scope of the Data step is detected, the following occur automatically :

At the bottom of the observation loop:


1. Output – an observation is written into the new SAS dataset
2. Return – The system returns to the top of the Data set in preparation for another
iteration.

At the top of the observation loop:


3. The observation count _N_ increases by 1
4. Values of variables created with Input or assignment statements are reset to missing
in Program Data Vector.
5. Resets the current input file to CARDS and the current output file to LOG.

Page 11 of 57
SAS Notes

• The next iteration ensues and the process is repeated.

• The data step terminates when the end-of-file condition is encountered for an INPUT, SET, MERGE
or UPDATE in a SAS dataset or raw data file.

The SAS Supervisor does not set variable values to missing at the top of the DATA step for:
1. Variables comes from a SAS dataset (read with SET, MERGE, or UPDATE statements) are
retained and not reset to missing as the program passes through the DATA statement.
2. Variables listed in a RETAIN statement
3. Variables used on the left-side of a sum statement e.g. variable I is not reset in I + 5; These are
initialized to 0.
4. Variables used in I/O statement option for INFILE, SET, MERGE, UPDATE or FILE. These are
initialized to 0 or data-dependent values.
5. Variables that are array elements and the array uses temporary variables.

Normally, the SET, MERGE, UPDATE and INPUT statements stop the data step when the end of input
data is detected. But this does not happen for the following cases:
• The INPUT statement has trailing @@
• The POINT= option is used with SET statement for direct access
• EOF= option is used with INFILE statement
• SET, MERGE, UPDATE or INPUT statement executes conditionally
• RECFM=U option is specified with INFILE option

The SAS interpreter does not create an observation loop for steps that do not have INPUT, SET,
MERGE, UPDATE or DISPLAY.

8. File Structure
1. Engine: Engines are a set of internal instructions that the SAS System uses to read from and write to
files. Every SAS data set and Data Library is accessed through an engine. Engines open files, direct
input/output operations and gather descriptive information about files and their contents. The engine
uses this information to organize data into correct logical form – SAS data sets.

2. SAS data Library: It is the logical structure of files accessed by an Engine for processing by the SAS
System. SAS Libraries contain SAS data files. They are of two types : Permanent & temporary

3. Permanent Libraries: they reside on the external storage medium and are not deleted when the SAS
session terminates. SAS files in permanent libraries are specified by a two level qualifier where the
first qualifier stands for the libref and the second qualifier for the name of the file. Permanent
libraries are stored till they are specifically deleted.

4. Temporary Libraries: the are available only for the current session or job run and are deleted at the
end of the session or job run. SAS provides these libraries for files created during the session but are
not required after the termination of the session. The first qualifier need not be specified in this case
as it defaults to the temporary WORK library.

5. SAS data Set: It is the logical structure into which Engine fits data for processing by the SAS
System. They are of two types : SAS data files & SAS data views.

Page 12 of 57
SAS Notes

6. SAS data file: Contains both the data values and the descriptor information

7. SAS data View: Obtains the descriptor information or data values or both from other files. Only the
information necessary to derive the descriptor information or data values is stored in the file.

9. SAS Date & Time


SAS System processes calendar date values by converting dates to integer representing the
number of days between January 1st, 1960 and a specified date. Valid SAS dates can be positive or
negative numbers and range from 1582 A.D. to 20,000 A.D.

SAS System processes time by converting it to integer representing number of seconds since
midnight of the current day. SAS time values are independent of the date.

SAS System processes datetime by converting it to integer representing number of seconds since
midnight of January 1st, 1960 and a specified datetime.

SAS System reads & displays date, time & datetime values through formats & informats. Informats are
used to read fields according to a specified width and form while a Format writes or displays values
according to a specified width and form.

YEARCUTOFF System Option : This option specifies the first year of a 100 year span used by
Informats & functions. Based on this, the century values of dates are determined by SAS system.

Page 13 of 57
SAS Notes

Rules of SAS Language


The smallest meaningful unit of a program is a token. The SAS language has three types of tokens –
words, symbols & constants. Tokens are grouped into statements. SAS statements are named for the
words they start with, except for a few statements that do not begin with a keyword – like assignment &
sum statement.

• A statement end with ‘ ; ‘


• uppercase or lowercase allowed as SAS convert all to uppercase before compiling.
• Statements can begin in any column
• There is no special continuation characters and single statements can flow over to next line.
• Names in SAS can only be upto 8 characters long.
• No blanks are allowed in SAS names.
• SAS Data set names can only have alphabets, numbers or ‘_’ in them and cannot start with a number

1. Keywords
A Keyword is a word that has particular meaning in SAS syntax. Words that are used as SAS keywords
can also be used as names. So, a keyword is identified by it’s location. Some keywords that begin
statements are :

STATEMENT ACTION WHERE


ATTRIB, FORMAT, INFORMAT, LABEL, LENGTH set variable attributes data & proc steps

BY, WHERE Input data & proc steps

ABORT, DELETE, DO, END, ELSE, GOTO, IF, control flow data step
LINK, LOSTCARD, OTHERWISE, RETAIN,
RETURN, SELECT, STOP, WHEN
INFILE, INPUT, MERGE, SET, UPDATE input data step

DATA, DROP, ERROR, FILE, KEEP, LIST, PUT, output data step
OUTPUT, RENAME

ARRAY, CALL, DISPLAY, WINDOW Miscellaneous data step

CLASS, FREQ, ID, MODEL, OUTPUT, PROC, VAR, run proc Proc step
WEIGHT

FILENAME, FOOTNOTE, LIBNAME, MISSING, Program parameters Anywhere


OPTIONS, TITLE

DM, SKIP, PAGE, X immediate action Anywhere

CARDS, CARDS4, LINES, LINES4, QUIT, RUN mark end of step between steps

Some other common keywords are :

Page 14 of 57
SAS Notes

FROUPFORAMT LT GE CHARACTER GT IN EQ CANCEL AND DEFAULT


DESCENDING LE LIKE BETWEEN MAX MIN NE NOBS SAME NOTIN
NOTSORTED OR OF NUMERIC OUT NOT TO UNTIL WHILE _PAGE_.

Note: Generally, the different classes of objects in SAS programs have different name spaces. That
means that you can use the same name for different objects as long as they are not the same kind of
object. E.g. you could have a SAS variable TIME in a SAS dataset TIME. Since variables and datasets
share the same name space, you cannot use the same name for a variable & an array in the same step.
Also, as only one step runs at a time, you can use the same names for variables, array & statement labels
in different steps.

2. Automatic Variables
These variables are automatically created by the SAS System in various circumstances. They
exist within the Program Data Vector but are not output to the data set being created. Some common
automatic variables found in data steps are :

1. _N_ : Denotes the iteration of the Data step. Always created within a Data step with an initial value
of 1. It’s value gets incremented automatically each time the Data step executes the Data step and
begins a new iteration of the Data set.

2. _Error_ : Initially set to 0. It’s value gets incremented if SAS encounters an error within the Data
step.

3. _I_ : If the index variable is omitted from the array definition, then SAS assigns the automatic index
variable _I_ by which elements in the array can be accessed.

4. First .<var_name> : Temporary variable created by SAS to identify the beginning of a BY group.
When an observation is the first in a BY group, the value of the First.<var_name> is set to 1 where
<var_name> is the BY variable. For all the other observations in the BY group, it’s value is set to 0.

5. Last .<var_name> : Temporary variable created by SAS to identify the end of a BY group. When an
observation is the last in a BY group, the value of the Last.<var_name> is set to 1 where
<var_name> is the BY variable. For all the other observations in the BY group, it’s value is set to 0.

6. _ALL_ : Depending on context _ALL_ may mean all the datasets or all the variables that are
available.

7. _DATA_ : Using _DATA_ asks the SAS interpreter to name the new dataset as a subsequent name
from the series: DATA1, DATA2, DATA3, DATA4 etc.

8. _NULL_ : specifies that no SAS dataset is to be created

9. _LAST_ : specifies the most recently created dataset.

3. Variable Attributes
SAS variables are of two data types: numeric and character.
In addition to their type, SAS variables have these attributes: length, informat, format, and label

The attributes of the variables are stored in the descriptor information of the SAS dataset.

Page 15 of 57
SAS Notes

• Length : The length attribute of a variable is the number of bytes used to store each of its values in a
SAS data set. The default length is 8. The Length statement determines the length of a numeric
variable only in the data set being created while for character variables it determines the length of
the variable both in the Program Data Vector and the dataset being created.

• Informat : A variables informat is the pattern that SAS uses to read data values into the variable. The
default informat is w. for numeric variables, $w. for character variables.

• Format : A variables format is the pattern SAS uses to write each value of a variable. The default
format is BEST12. for numeric variables, $w. for character variables

• Label : The label attribute of a variable is a descriptive label of up to 40 characters that can be
printed by certain procedures instead of the variable name.

4. Variable lists
Variable lists normally consist of variable names separated by spaces.

• One form of abbreviated variable list uses a hyphen to indicate variable names with a range of
numeric suffixes A1-A12 is same as A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12.

• A double hyphen indicates a different kind of range of variables. It represents the order in which
they appear in memory. If a data step has variables AA BB CC SUM TOTAL defined in that order,
then BB--TOTAL represents the variables BB CC SUM TOTAL. This kind of variable list is called a
named range.

• There are three special variable lists :


_ALL_ represents list of all variables available, including automatic variables.
_CHARACTER_ or _CHAR_ represents list of all character variables available alone
_NUMERIC_ represents list of all numeric variables available alone

• A colon is sometimes used to indicate alphabetic range of variable names. As AQ: would be used to
identify all variables whose names begin with the letters AQ.

A named range can be combined with a special variable list by putting the keyword NUMERIC or
CHARACTER between the two hyphens in named range of variables as BB-NUMERIC-TOTAL
includes only numeric variables located from BB to TOTAL in memory.

5. The Numeric Data Type


The SAS programming language has only two data types: numeric and character.

This differs from most high-level languages, which have many different numeric data types. By this, the
SAS System saves programmers from having to be concerned with the problems of converting between
data types. This data type corresponds to the double precision or 8-byte real type of other languages.
Besides, numbers, the numeric data is used for dates, times and logical values.

All numeric variables use 8 bytes of memory, but their length attribute determines the amount of storage
that will be used or the variable if it is stored in a SAS dataset. Lengths shorter than 8 can be used to
save storage space and I/O time but values might not be precise due to truncation. The shortest length
allowed is dependent on operating system and can be 2 or 3.

Page 16 of 57
SAS Notes

6. Options
SAS software uses 3 types of options

• System options : This options will be in effect for all DATA and PROC steps in a SAS job or
session unless they are re-specified in another OPTIONS statement. They are instructions that affect
the entire SAS session and controls the way it performs operations.

• SAS data set options : Which are specified in parentheses following a SAS data set’s name and
affect only that data set. SAS applies data set options specified with input data sets before it
evaluates program statements or applies data set options applied on output data sets.

• Statement options : which are specified only in a given SAS statement or statements and affect
only that statement or step.

Page 17 of 57
SAS Notes

SAS Programming concepts


The programming concepts discusses the following :

• The Data step


• Input styles
• Data step statements
• Operators in SAS
• Combining datasets
• Arrays

I. The Data Step


Before you can use SAS software to prepare your data for analysis or use a SAS procedure to analyze
your data, you must first get them into a SAS data set. Once your data are in a SAS data set, you can
combine the data set with other SAS data sets in many different ways to and use any of the SAS
procedures.

You can use the DATA step for these purposes;

• Retrieval – Getting your input data into SAS data set.


• Editing – Checking for errors in your data and correcting them, computing new variables
• Printing reports according to your specifications and writing disk or tape files.
• Producing new SAS data sets from existing ones by subsetting, merging, and updating the old data
sets.

The DATA step can include statements asking SAS to create one or more new SAS data sets and
programming statements that perform the manipulations necessary to build the data sets. Data analysis,
file management, and information retrieval are all handled in DATA steps.

Types of DATA steps

A DATA step is a group of SAS statements that begins with a DATA statement and usually includes all
the statements in one of these four groups :

1. Data from an external file

DATA statement;
INFILE statement;
INPUT statement;
other SAS statements used in the DATA step
Run;

2. Data in job stream

DATA statement;
INPUT Statement;
other SAS Statements used on the DATA step
CARDS Statement;
data lines
;

Page 18 of 57
SAS Notes

3. Data in existing SAS data set

DATA Statement ;
SET|MERGE|UPDATE Statement;
other SAS Statements used in the DATA step
Run;

4. Writing reports

DATA _NULL_;
INPUT and CARDS|INFILE Statement;
or
SET|MERGE|UPDATE Statement;
FILE Statement;
PUT Statement;
other SAS Statements used in the DATA step
Run;

The FILE statement tells SAS where to print the report or write the file.
PUT Statements write the lines of the report or file.

Page 19 of 57
SAS Notes

II. Working with SAS datasets

The SAS language supports both sequential and direct access input from SAS datasets.
• Sequential input is provided by the SET, MERGE and UPDATE statement in the data step and the
DATA= option and some other options in the proc step.
• Direct access is done by the SET statement with the POINT= option.

SAS data steps create output SAS datasets using the DATA and OUTPUT statements.
PROC steps create output SAS datasets with the OUT= option on the PROC or OUTPUT statement.
All output SAS datasets are new files. If a SAS dataset exists with the same name, the old dataset is
deleted after the successful completion of the step and dataset with same name is created.

The SAS step first processes the descriptor information of the SAS dataset and then inputs or outputs one
observation at a time. After writing the last observation, most Engines also add more information to the
descriptor information.

Dataset Options

Dataset options input or output processes related to SAS datasets. On Output, they control the data stored
in SAS dataset. On Input, they affect the way the SAS dataset appears to the step, but they do not
actually change the stored data in the input file.

Both for Input & Output datasets

DROP= variable: list of variables not to be kept in the dataset


KEEP= variable: list of variables to be kept in the dataset
LABEL=: creates label for SAS datasets
RENAME= old = new: changes the name of a variable

For Input datasets

FIRSTOBS=: causes the processing to begin from a specified observation


OBS=: causes the processing to end with the specified observation.
WHERE=: selects observations that meet the specified condition.
CNTLLEV= MEM: specifies whether to lock the entire dataset
CNTLLEV= REC: specifies whether to lock only one observation at a time

For Output datasets

COMPRESS=: specifies whether the SAS dataset is compressed or not


REUSE=: specifies whether space can be reused in compressed SAS datasets
INDEX=: creates indexes
REPLACE=: specifies whether to allow existing SAS datasets to be replaced by new SAS datasets with
same name.

Page 20 of 57
SAS Notes

III. Input / Output Styles


1. List Input

Data values are required to be separated by at least one blank (the default delimiter) or the
delimiter (if specified). They are not required to be aligned in columns. SAS requires you to name the
variable and the data type alone (default is numeric, so give $ for character values). Formats or
Informats will convert it to Formatted Input style.

data newfile;
input name $
age
sex $;
run;

Features :

1. max (default) length of 8 will be applied to character values unless specifically overridden by a
LENGTH statement.
2. Character values cannot contain embedded blanks
3. Missing values must be specified by “." only.
4. Fields can only be read in order
5. Data must be standard numeric or character format.

2. Column Input

Data values are required to be aligned in columns. SAS requires you to name the variable, the
data type (default is numeric, $ for character values) and the columns within which the data values are to
be located for each record.

data newfile;
input name $ 1-8
age 12-14
sex $ 16;
run;

Features :

1. Character values can be from 1 - 200 characters long.


2. Data values must be within the same columns on all the input lines
3. Character values cannot contain embedded blanks
4. Missing values need not be specified by “.".
5. Fields can be read in any order regardless of their position in the records.
6. Data must be standard numeric or character format.
7. Leading & trailing blank within the fields are ignored.

3. Formatted Input

Formatted input is used with pointer controls which controls the position of the input pointer in
the input buffer when reading data.. It differs from the List Input in that it enables you to read non-
standard data for which SAS requires additional instructions.

data newfile;
input name $char 8.
+4 age 2.
+1 salary comma5.;

Page 21 of 57
SAS Notes

run;

Features :

1. Character values can be from 1 - 200 characters long.


2. Character values can contain embedded blanks
3. Missing values need not be specified by “.".
4. Fields can be read in any order regardless of their position in the records.
5. Can read data stored in non-standard form.

4. Named Input

Named input is used when data lines contain variable names followed by an equal sign and a value for
the variable. Once the INPUT statement starts reading named inputs, the System expects all remaining
values in the input line to be of the same form. The variables in the INPUT statement do not have to be
in the same order in which they occur in the data records.

data newfile;
informat phone 6.;
input @1 date julian5.
name=
age=
salary=;
cards;
99001 age=23 salary=1000 name=Mary phone=232345
97234 phone=242334 salary=1000 name=Martin age=21
91210 age=24 salary=2000 phone= 223198 name=John
99001 name=Maggie phone=238971 age=24 salary=1600
;

Features :

1. Cannot switch to another input style for a particular input line once you start reading it with named
input.
2. If any of the values are not in named input form then the System handles them as invalid data.
3. If a variable that appears on the named input lines appear in any other statement, the value is
automatically read from the input, whether or not it is explicitly specified in the input statement.
4. Cannot read data stored in non-standard form.

All these input styles have a corresponding output style as well.

Informat: specifies the format of the values for variables being read. SAS system interprets the format
and convert it to an internal format that it understands. Is supplied with the INPUT statement or as data
set option of input data set if required.

Format: specifies the format of the values for output variables. The SAS system converts the value to a
format that is required to appear in the output. Is supplied along with the PUT statement or other output
procedures.

Page 22 of 57
SAS Notes

IV. Working with External files


A Text file is a sequence of records. SAS supports sequential input & output for text files, even within
the same data step. Data positions are usually stated in terms of columns which represent the distance
from the beginning of the record. SAS input / output syntax for text files is more powerful and flexible
than any of the classic high level languages.

INPUT operation

The syntax for input from text files involves 2 statements:


• INFILE – provides general identification information about the input file
• INPUT – controls the way the input data in interpreted and assigned to variables

A blank INPUT statement (with no arguments) is called a NULL INPUT statement. It can have several
uses:
1) To bring input data line into the input buffer without creating any SAS variables. This data line can
be copied as such to output file
2) Or release an input line held by a trailing @ or double trailing @.

INFILE

Because the INFILE statement identifies the file to be read, it must execute before the INPUT statement
that reads the data lines.
As it is executable, you can use it in conditional processing (in an IF-THEN statement, for example).
You can read from several external files within one DATA step. To read from multiple input files in a
single iteration of the DATA step, you can use multiple INFILE and INPUT statements. To read from
one file, then close it and open another, you can use the FILEVAR= option. (FILEVAR= enables you to
dynamically change the current input file within your SAS job.
When you use more than one INFILE statement for the same fileref and you use options in each INFILE
statement, the effect is additive. That is, the options specified in each INFILE statement are added to the
options specified in any previous INFILE statements for that file.
You can use the INFILE statement in combination with the FILE statement to update records in an
external file. To do so, follow these steps:

1. Specify the INFILE statement before the FILE statement.


2. Specify the same fileref or physical filename in each statement.
3. Use options that are common to both the INFILE and FILE statements in the INFILE statement
instead of the FILE statement. Any such options used in the FILE statement are ignored.

To update individual fields within a record instead of the entire record, use the SHAREBUFFERS option.

The INFILE statement is an executable statement. It sets the current input file, which the INPUT
statement then reads. It must be executed before the INPUT statement to which it refers. The current
input file is changed to CARDS at the top of the observation loop. So the INFILE has to be executed in
every repetition of the observation loop that executes INPUT statement.

Options

LRECL=: The number of characters in a record

PAD: Pads input fields shorter than the LRECL value with trailing blanks. Default is NOPAD

LINESIZE= / LS=: Limits the number of characters in a record available to the INPUT statement. It
prevents the INPUT statement from reading past a certain column.

FIRSTOBS=: The number of the first observation to be read from the input file. Use this option to skip
records at the beginning of the file.

Page 23 of 57
SAS Notes

OBS=: The number of the last record to be read from the input file. Use this option to skip records at the
end of an input file.

N=: Specifies the number of lines available to the input pointer.

EOF= label: The INPUT statement branches to the statement label indicated if it attempts to read past
the end of a file.

END= variable: designates a numeric variable that the INPUT statement sets to 1 when it reads the last
record in the file.

UNBUFFERED / UNBUF: Tells the SAS supervisor not to look ahead at the next record when reading a
record. The END= variable cannot be used to indicate the last line in the input file.

COLUMN / COL= variable: designates a numeric variable that the INPUT statement sets to the column
pointer location

LINE= variable: designates a numeric variable that the INPUT statement sets to the line pointer
location.

LENGTH= variable: designates a numeric variable that contains the length of the input line. The
variable then can be sued with $VARYING informat to read varying length records. Changing the value
of the variable between the INPUT & the PUT statement can change the length of the _INFILE_ string.

START: designates the numeric variable that identifies the starting character to be used I the _INFILE_
string. A value can be assigned before the PUT statement to change the extend of the _INIFLE_ string.

DELIMITER= / DLM=: Delimiter used in list input. The default is ‘ ‘.

FILEVAR= variable: Changing the value of the character variable causes the INFILE statement to close
the input file and to open the file whose physical name is the value of the variable.

SHAREBUFFERS / SHAREBUFS: Use this option for text files being edited to use the same buffer for
input & output. This means that any positions skipped over by the PUT statement will stay the way they
were before. Otherwise, the positions in an output record that the PUT statement does not write to are
filled with blanks. This option makes it possible to change some fields in a file without processing other
fields.

DSD: This option changes the way delimiters are treated when using list input and enables you to read
delimiters as characters within quoted strings. When the DSD option is in effect, the delimiter is assumed
to be a comma. If the data contain another delimiter, you must specify it with the DELIMITER= option.
To read a value as missing between two consecutive delimiters, use the DSD option. By default,
consecutive delimiters are treated as a unit. When you use the DSD option, consecutive delimiters are
treated separately; therefore, a value that is missing between consecutive delimiters is read as a missing
value.

The DSD option also enables list input to read a character value that contains a delimiter within a quoted
string. For example, if data are separated with commas, using the DSD option enables you to place the
character string in quotes and read a comma as a valid character. The quotes are not stored as part of the
character value.

The following determines what the INPUT statement does when it gets to the end of a record, before it
finds all values for all the variables in the record.

FLOWOVER: The remaining variables are read from the first column of the next record. This is the
default action.

Page 24 of 57
SAS Notes

TRUNCOVER: Using the TRUNCOVER option enables you to read variable-length records when some
records are shorter than expected by the INPUT statement.

MISSOVER: The remaining variables are assigned missing values.

STOPOVER: The step stops running as an error condition is created.

The _INFILE_ string refers to the last record read from the current input file.
The default INPUT file is CARDS

OUTPUT operation

The syntax for input from text files involves 2 statements:


• FILE – provides general identification information about the output file
• PUT – controls the way the input data in interpreted and assigned to variables

FILE

The FILE statement is an executable statement. It sets the current output file, which the PUT statement
then writes. It must be executed before the PUT statement to which it refers. The current output file is
changed to LOG at the top of the observation loop. So the FILE has to be executed in every repetition of
the observation loop that executes PUT statement.

As it is executable, you can use it in conditional processing (in an IF-THEN statement, for example).
You can write to several external files within one DATA step. To write to multiple output files in a
single iteration of the DATA step, you can use multiple FILE and PUT statements.

Options
Many of the options find a similar one for INFILE statement.

LRECL=: The number of characters in a record

PAD: Pads input fields shorter than the LRECL value with trailing blanks. Default is NOPAD for
variable length records and PAD for fixed length records.

LINESIZE= / LS=: Limits the number of characters that can be written to a record by the PUT
statement.

OLD: This option makes the step writes output records at the beginning of the file, replacing any
previous contents of the file.

MOD: This option makes the step writes output records at the end of the file, adding records to the
previous contents of the file.

PRINT or NOPRINT: Tells whether a file is a print file or a non-print file.

NOTITLES: tells the supervisor not to put the current titles, defined in a TITLE statement, at the top of
each page of a PRINT file.

PAGESIZE / PS : determines the number of lines per page of output.

FIRSTOBS=: The number of the first observation to be written to the output file. Use this option to skip
records at the beginning of the file.

Page 25 of 57
SAS Notes

OBS=: The number of the last record to be written to the output file. Use this option to skip records at
the end of an output file.

N=: Specifies the number of lines available to the output pointer.

HEADER= label: When the PUT statement writes to the end of a PAGE, it branches out to the
HEADER= statement label to execute a group of statements there until a RETURN statement is reached.

COLUMN / COL= variable: designates a numeric variable that the PUT statement sets to the column
pointer location

LINE= variable: designates a numeric variable that the PUT statement sets to the line pointer location.

LINESLEFT= / LL= variable: designates a numeric variable that tells the number of lines remaining on
the current page, including the current line pointer.

DELIMITER= / DLM=: Delimiter used in list output. The default is ‘ ‘.

FILEVAR= variable: Changing the value of the character variable causes the FILE statement to close
the output file and to open the file whose physical name is the value of the variable, for output.

SHAREBUFFERS / SHAREBUFS: Use this option for text files being edited to use the same buffer for
input & output. This means that any positions skipped over by the PUT statement will stay the way they
were before. Otherwise, the positions in an output record that the PUT statement does not write to are
filled with blanks. This option makes it possible to change some fields in a file without processing other
fields.

When a PUT statement attempts to write beyond the maximum allowed line length (as specified by
LINESIZE= option in FILE statement), the following options on the FILE statement can cause varying
results

FLOWOVER: The current output line is written to the file and the data item that exceeds the current
line length is written to a new line.

DROPOVER: The option discards data items that exceed the output line length as specified by the
LINESIZE= option in the FILE statement and the column pointer remains positioned after the last value
written in the current line.

STOPOVER: stops processing the data step immediately if a PUT statement attempts to write a data
item that exceeds the current line length. The System writes the portion of the line built before the error
occurred and issues an error message.

The default OUTPUT file is LOG.

Page 26 of 57
SAS Notes

Pointer Controls & Line-Hold Specifiers

As the SAS System reads values from data records in the input buffer, it keeps track of its position with a
pointer. Pointer controls are provided on the Input statement so that you can reset the position of the
pointer to read data values in records at certain positions. Line-hold specifiers allow you to hold a data
record in the input buffer to be processed by another INPUT statement.

@ - Column pointer control that moves the pointer to column n. Any decimal portion of variable values
is truncated and only integer values are used. If 0 the pointer moves to column 1.

+ - Moves the pointer n columns. Any decimal portion of variable values is truncated and only integer
values are used. If 0 the pointer moves to column 1.

# - Moves the pointer to line n. Any decimal portion of variable values is truncated and only integer
values are used.

/ - Advances the pointer to column 1 of the next line.

: - Is a character comparison operator that modifies existing comparison operators compare all values that
start with a given character. It changes the nature of the comparison from an exact match to a “begins
with” match. It has no effect when it is used between two variables, only comparison with a string
constant.
e.g. if upcase (charvar) =: “SMIT”;

when used to compare 2 text strings, the longest string will be truncated to the length of the shortest one
for the purpose of evaluation. E.g.
if ‘ABC’ =: ‘ABCD’; or
if ‘ABC’ =: ‘AB’; will both evaluate to TRUE.
If charvar >: ‘A’; will return records where variable charvar values begin with B.
If charvar in: (‘SM’,’Will’,’aa’); will return values that start by any of these character set, irrespective
of their total length.

Trailing @ - To allow the next INPUT statement in the same DATA step to read from the same record.
Prevents the next INPUT statement from automatically releasing the current Input record and reading the
next one into the input buffer. Between INPUT statements the pointer position remains the same.

Trailing @@ - To allow a record to be held for the next INPUT statement, even across iterations of the
DATA step. Here each input line contains values for several observations. An input line held by the
system is released immediately if the pointer moves past the end of the line, if a NULL INPUT statement
executes

Page 27 of 57
SAS Notes

V. DATA step Statements


The SAS Statements that can appear in a DATA step fall into several categories:
File handling Statements, action Statements, Control Statements, and information Statements; Each
Statement is either executable, positional, or declarative.

Executable statements (denoted by X) are programming Statements that cause some action.

Positional Statements (P) cause no action at execution, but their position in DATA step is important.

Declarative Statements (D) supply additional information to SAS.

1. File handling Statements

• CARDS – precedes card data or lines entered at terminal - data that are part of the job stream (P)

• CARDS4 – precedes in-stream data lines containing semicolons (P)

• DATA – tells SAS to begin a DATA step and to start building a SAS data set (P)

• FILE – identifies the data file where lines are to be written by the DATA step (X)

• INFILE – The INFILE statement gives the fileref of the control Statement (FILENAME statement).
The fileref is a logical name to the physical file. The fileref identifies the external file containing raw
data to be read. When the INFLE statement is executed the external file is opened. (X)

• INPUT – describes the records on the external input file.(X)

• MERGE – combines observations from two or more SAS data sets into a new data set. (X)

• PUT – describes the format of the lines to be written by SAS.(X)

• SET – reads observations from one or more existing SAS data sets. (X)

• UPDATE – applies transactions to a master file. Both transaction and master file are SAS data sets
(X)

2. Action Statements

• ABORT – stops the current DATA step or the job, depending on the mode of executing.

• Assignment – creates and modifies variables.

• CALL – invokes or calls a routine.

• DELETE – excludes observations from the data set being created.

• ERROR – writes messages on the SAS log.

• LIST – lists the current input lines to the LOG.

• LOSTCARD – corrects for lost data lines when an observation has an incorrect number of data
lines.

• MISSING – declares that certain values in the input data represent special missing values for

numeric data fields.

Page 28 of 57
SAS Notes

• OUTPUT – creates new observations.

• STOP – stops creating the current data set.

• subsetting IF – selects observations for the data set being created.

• SUM – accumulates total

3. Control Statements

• DO – sets up a group of statements to be executed as one statement.


• iterative DO
• DO UNTIL
• DO WHILE

• END – signals the end of a DO or SELECT group.

• GO TO – causes SAS to jump to a labeled statement in the step and continue execution at
that point.

• IF-THEN/ELSE – conditionally executes a SAS statement

• LINK-RETURN – causes SAS to jump to a labeled statement in the step and execute statements
until it encounters a RETURN Statement.

• RETURN – when not combined with a LINK statement, causes SAS to return to the
beginning of the DATA step to begin execution. When combined with
a LINK statement, returns to the statement immediately following the
most recently executed LINK.

• SELECT – conditionally executes one of several SAS Statements.

4. Information Statements.

• ARRAY – defines a set of variables to be processed the same way. (D)

• ATTRIB – specifies a format, informat, label and length for a variable.(D)

• BY – specifies that the data set is to be processed in groups defined by the BY


variables.(D)

• DROP – identifies variables to be excluded from a data set or analysis.

• FORMAT – specifies formats for printing variable values.

• INFORMAT – specifies informats for storing variable values.

• KEEP – identifies variables to be included in a data set or analysis.

• LABEL – associates descriptive labels with variable names.

• RENAME – changes the name of the variables in a data set.

Page 29 of 57
SAS Notes

• RETAIN – identifies variables whose values are not to be set to missing each time the DATA
step is executed and can give variables an initial value for the first iteration (otherwise for first
iteration it would have been missing). Sum statements variables are retained by default.

Page 30 of 57
SAS Notes

VI. Operators in SAS

Priority Symbol Mnemonic Equivalent Definition Example

Group I ** exponetiation y=a**2;


+ positive prefix y=+(a*b);
- negative prefix z=-(a*b);
^ or ~ NOT logical NOT if not z then put x;
>< MIN minimum x=a><b;
<> MAX maximum x=a<>b;

Group II * multiplication c=a*b;


/ division f=g/h;

Group III + addition f=g+h;


- subtraction f=g-h;

Group IV || concatenate character name = 'J. ' || 'Smith';


values

Group V < LT less than if y < z then put x=;


<= LE less than or equal to if y le z then put x=;
= EQ equal to if y eq (a+b) then output;
^= NE not equal to if x ne z then output;
> GT greater than if z gt a then output;
>= GE greater than or equal to if z ge a then output;
IN equal to one of a list if sex in ('m','f') then result='correct';

Group VI & AND logical AND if a=b and c=d then x=1;
| OR logical OR if a=b or c=d then x=1;

Page 31 of 57
SAS Notes

VII. Combining Data Sets


Introduction

The SAS System provides a means for processing observations that are ordered or grouped
according to the values of one or more variables read from existing SAS data sets. The SAS System
detects the pattern by tracking the values of the temporary variables FIRST.variable & LAST.variable.
The SAS System expects observations to be ordered or grouped by the value of the variables specified in
the BY statement. The observations can be ordered by sorting or indexing the dataset.

The most frequent use of BY group processing in the Data step is to combine two or more SAS
data sets. When processing SET, MERGE & UPDATE statements, the SAS System reads one
observation at a time into the program data vector according to the values of the BY variable. After
processing all the observations from one BY group, it expects the next observation to be from the next
BY group. The NOTSORTED option in the BY statement is used when the data is not in alphabetical or
numeric order but are arranged in groups according to the values of the BY variable.

Note: Using 2 files in SET statement is equivalent to using proc APPEND if the output dataset also
appears as the first dataset on the SET statement. E.g.
Data in2;
Set in2 in3;

Is the same as,


Proc append base=in2 new=in3;
Run;

Using more than 1 SET statement is equivalent to MERGE operation. E.g.


Data in1;
Set in2;
Set in3;

Is the same as,


Data in1;
Merge in2 in3;

1 Concatenating data sets

Data myfile;
Set ourfile1 ourfile2….;

Concatenation is combining two or more datasets one after the other into a single dataset. The number
of observations in the new dataset is the sum of observations of the old data sets and the order is all
observations from the first followed by all from the second. If input data set contains different variables,
observations form one data sets have missing values for variables defined only in the other data set.

Ourfile1

OBS COMMON ANIMAL


1 a ant
2 b bird
3 c cat
4 d dog
5 e eagle
6 f frog

Page 32 of 57
SAS Notes

Ourfile2

OBS COMMON PLANT


1 a apple
2 b banana
3 c coconut
4 d dewberry
5 e eggplant
6 f fig

myfile

OBS COMMON ANIMAL PLANT


1 a ant
2 b bird
3 c cat
4 d dog
5 e eagle
6 f frog
7 a apple
8 b banana
9 c coconut
10 d dewberry
11 e eggplant
12 f fig

2. Interleaving

Data myfile;
Set ourfile1 ourfile2 ourfile3…..;
By var1 var2 var3…..;

The sum of observations in the new data set is the total of observations of the old data sets. The
observations in the new data sets is arranged by the value of the BY variable and within each BY group
by the order in the old data sets.

In the example, the data set ourfile1 & ourfile2 are SET by the variable COMMON

Myfile

OBS COMMON ANIMAL PLANT


1 a Ant
2 a apple
3 b Bird
4 b banana
5 c Cat
6 c coconut
7 d Dog
8 d dewberry
9 e Eagle
10 e eggplant
11 f frog
12 f fig

Page 33 of 57
SAS Notes

3. One to One Reading

Data myfile;
Set ourfile1;
Set ourfile2;

The new data set contains all the variables from all the input data sets. The number of
observations in the new data set in the number of observations in the smallest original data set. If the
data set contains common variables, the values read from the last data set replace those read from earlier
ones.

Myfile

OBS COMMON ANIMAL PLANT


1 a ant Apple
2 b bird Banana
3 c cat Coconut
4 d dog Dewberry
5 e eagle Eggplant
6 f frog Fig

4. One to One Merging

Data myfile;
Merge ourfile1 ourfile2;

SAS system combines the first observation from all the data sets in the MERGE statement into
the first observation in the new data sat. Similarly all nth observations from all data sets are merged to
from nth observation in new data set. The number of observations in the new data set is equal to the
number of observations in the largest data set.

Myfile

OBS COMMON ANIMAL PLANT


1 a ant apple
2 b bird banana
3 C cat coconut
4 D dog dewberry
5 E eagle eggplant
6 F frog fig

the result is similar the result obtained by One to One Reading when the number of observations
in the merging data sets are equal. When the number of observations are unequal, the SAS System stops
processing before all observations are read from all data sets with One to One Reading.
5. Match-Merging

Data myfile;
Merge ourfile1 ourfile2….;
By var1 var2…..;

Match-Merging combines observations from two or more data sets into a single observation in
the new dataset according to the values of the common variable. Observations from the different datasets
with the same BY variable values are combined. If there are several observations with the same BY

Page 34 of 57
SAS Notes

variable values, they are matched in a manner similar to the one-to-one merging process. But, in this
case, if one SAS dataset has fewer observations in a BY group than the other, the values of the last
observation in the BY group are used to form the rest of the observations in the BY group, instead of
missing values. Missing values are used when one SAS dataset has no observations in a BY group.

When there is a conflict in the value of the variable, other than a BY variable, between different input
SAS datasets in a MERGE statement, the value from the SAS dataset named later in the MERGE
statement is used.

The number of observations in the new data set is equal to the total of the largest number of observations
in each BY group from among all input data set. Each observation in new data set contains all the
variables from all data sets.

Duplicate values of BY variable

Ourfile1

OBS COMMON ANIMAL


1 a ant
2 a ape
3 b bird
4 c cat
5 d dog
6 e eagle

Ourfile2

OBS COMMON PLANT


1 a apple
2 b banana
3 c coconut
4 c celery
5 d dewberry
6 e eggplant

Myfile

OBS COMMON ANIMAL PLANT


1 a ant apple
2 a ape apple
3 b bird banana
4 c cat coconut
5 c cat celery
6 d dog dewberry
7 e eagle eggplant

Page 35 of 57
SAS Notes

Nonmatching observations

Ourfile1

OBS COMMON ANIMAL


1 a ant
2 c cat
3 d dog
4 e eagle

Ourfile2

OBS COMMON PLANT


1 a apple
2 b banana
3 c Coconut
4 e Eggplant
5 f Fig

Myfile

OBS COMMON ANIMAL PLANT


1 a ant apple
2 b . Banana
3 c cat Coconut
4 d dog .
5 e eagle Eggplant
6 f . fig

6. Updating Data sets

Data myfile;
Update master trans;
By common;

The update statement uses observations from the transaction data set to change values of corresponding
observations from master data set. All observations in the find relation with observations in the master
data set by values of the BY variable.

The values of the BY variable or combination of BY variables must be unique for each observation in the
master data set. The BY variables do not get updated. The number of observations in the new data set is
the sum of the observations in the master data set and the number of unmatched observations in the
transaction data set.

Master

OBS COMMON ANIMAL PLANT


1 a ant apple
2 b bird banana
3 c cat coconut
4 d dog dewberry

Page 36 of 57
SAS Notes

5 e eagle eggplant
6 f frog fig

Trans

OBS COMMON PLANT


1 a apricot
2 b barley
3 c cactus
4 d date
5 e eucalyptus
6 f fennel

Myfile

OBS COMMON ANIMAL PLANT


1 a ant apricot
2 b bird barley
3 c cat cactus
4 d dog date
5 e eagle eucalyptus
6 f frog fennel

Duplicate values of BY variable

Master

OBS COMMON ANIMAL PLANT


1 a ant apple
2 b bird banana
3 b bird banana
4 c cat coconut
5 d dog dewberry
6 e eagle eggplant
7 f frog fig

Trans

OBS COMMON PLANT


1 a apricot
2 b barley
3 c cactus
4 d date
5 d dill
5 e eucalyptus
6 f fennel

Myfile

OBS COMMON ANIMAL PLANT


1 A ant apricot
2 B bird barley
3 b bird banana
4 c cat cactus

Page 37 of 57
SAS Notes

5 d dog dill
6 e eagle eucalyptus
7 f frog fennel

If the master data set contains two observations with the same value of the BY variable, the first
observation is updated and the second is ignored. Warning message are also generated as the BY
variables are to be unique. If the transaction data set contains duplicate values of the BY variable, SAS
applies both transactions to the observation and the last value copied into the program vector is written
into the new data set.

Non-matched observations

Master

OBS COMMON ANIMAL PLANT


1 A Ant .
2 C Cat Coconut
3 D Dog Dewberry
4 E Eagle Eggplant
5 f Frog Fig

Trans

OBS COMMON PLANT MINERAL


1 a Apricot Amethyst
2 b Barley beryl
3 c Cactus .
4 e . .
5 f Fennel .
6 g Grape garnet

Myfile

OBS COMMON ANIMAL PLANT MINERAL


1 a Ant Apricot Amethyst
2 b . Barley Beryl
3 c Cat Cactus .
4 d Dog Dewberry.
5 e Eagle Eggplant .
6 f Frog Fennel .
7 g . Grape Garnet

Only non-missing values from the transaction dataset is used when updating values in the master dataset.
If necessary, the special missing value ._ can be used to update a value to missing. Any observation from
the transaction data set that does not correspond to the master data set is written to the program data
vector and becomes the basis for an observation in the new data set.

There can be more than one input file involved in the creation of an observation by combinations of SET
and INPUT statements. However, the step stops after one of the input streams (they operate completely
independent of each other) reaches the end of data. This means that data at the end of the other stream is
never reached. In these cases, Input files can be coordinated by using the END= option on the INPUT,
SET, or MERGE statements and using programming statements to identify the end of an input stream.

Page 38 of 57
SAS Notes

The following code does a one-to-one merge of a SAS dataset with a sequential file to create a new SAS
dataset.

SEQUENTIAL FILE FLATFILE

NAME AGE BIRTHDAY

MOLLY 24 23JAN69
SUSAN 22 23FEB72
ARCHIE 22 01OCT59
BILLY 26 29JUL79
JOHN 22 04NOV75
SARAH 27 11MAY79

SASFILE STRUCTURE

OBS SEQNO
1 1
2 2
3 3

SAS CODE:

DATA NEW;
SEQNO =_N_;
INFILE FLATFILE END= LAST1;
IF NOT LAST1 THEN
INPUT @1 NAME $10. @12 AGE 2. @15 BIRTHDAY $5.;
IF NOT LAST2 THEN
SET SASFILE (KEEP=SEQNO) END=LAST2;
IF LAST1 AND LAST2 THEN STOP;
RUN;

OBS SEQNO NAME AGE BIRTHDAY

1 1 MOLLY 24 23JAN69
2 2 SUSAN 22 23FEB72
3 3 ARCHIE 22 01OCT59
4 . BILLY 26 29JUL79
5 . JOHN 22 04NOV75
6 . SARAH 27 11MAY79

Page 39 of 57
SAS Notes

VIII. ARRAYS
Array arrayname [ number-of-elements ] list-of-variables

e.g.
array books [4] ref usage intro glossary;

subscript values can be passed as number, variable with numeric value or an expression.
Here books[2] and usage are equivalent.
e.g.

Data myfile (drop=count);


Input ref usage intro glossary;
Array books[4] ref usage intro glossary;
do count = 1 to 4;
if books[count] =. then books[count] = 0;
end;
cards;
45 46 112 23
65 53 123 .
56 . 154 32
. 23 134 45
;

proc print data=myfile;


title ‘Data set produced with array processing’;
run;

The Array statement defining the array must appear in a data step before any reference to that array. An
array definition is only in effect for the duration of the data step. To use in several data step, you must
redefine the array in each data step.
e.g.

%let list ref usage intro glossary;

data one;
array books[4] &list;
-- more SAS statements --
run;

data two;
array journal[4] &list;
-- more SAS statements --
run;

To refer to all variables in an array use the special array subscript asterisks (*)

IX. FUNCTIONS

Page 40 of 57
SAS Notes

ABS
Returns a non negative number equal in magnitude to that of the argument.
e.g. x = abs(-2.4);

BYTE
Returns the nth character on the ASCII or EBCIDIC collating sequence.
e.g. x = byte(80); put x; <== will return & in EBCIDIC

CEIL
Smallest integer that is greater than or equal to the argument.
e.g. x= ceil(2.1) <== x will return 3.

COMPRESS
Removes specified character(s) from character expressions. If second argument is not specified
then blanks, if any, will be removed
e.g. x = ‘A.B (C=D);’;
y = compress (x ,’.();’);
put y ; <== will give you AB C=D

DATE
Returns current date as a SAS date value

DATEJUL
Converts a Julian date to a SAS date value.
e.g. x= datejul(01001); <== will return value in SAS date of 14976

DATEPART
Extracts the date from a SAS datetime value.
e.g. datetm = ‘01feb98:8:45’dt ; <== way to specify constant datetime values.
thedate = datepart (datetm);
put thedate wordate.; <== prints the date alone in wordate fromat.

DATETIME
Returns the current datetime value.

DAY
Returns the date of the month from a SAS date value.
e.g. now = ’04May98’d; <== way to specify constant values for dates.
d = day (now); <== d will hold the value 4.

DHMS
Returns a SAS datetime from date, hour, minute and second.
Syntax : DHMS (date, hour, minute, second)
e.g. dt = dhms(‘01jan97’d,22,45,20);

DIM
Returns the number of elements in an array
Syntax : DIM <n>(arrayname)| (arrayname,bound-n)

Bound-n – Specifies the dimension in a multi-dimension array for which you want to know the number of
elements
e.g. dim(array2 , 3) returns the number of elements in the second dimension of a multi-dimension array
array2.

Page 41 of 57
SAS Notes

EXP
Raises the ‘e’ to the power supplied by the argument.

FLOOR
Largest integer that is less than or equal to the argument.
e.g . x= floor (1.6) <== returns 2.

HMS
Returns a SAS time value from hours, minutes & seconds values.
e.g. sastime = hms(14,45,20);

HOUR
Returns hour from a SAS time or datetime value

INDEX
Searches the source for the character string specified by the excerpt and returns its position.
Returns 0 if the excerpt is not found.
Syntax : index (source, excerpt)

INDEXC
Searches the source for the characters specified by the excerpt and returns its position. Returns
0 if the excerpt is not found.
Syntax : index (source, excerpt)

INPUT

The function allows you to read argument using any informat specified by the second argument.
The informat specified determines whether the result is numeric or character. Used to convert character
values to numeric values.
e.g. fmtsale = input(sale, comma8.); <== will return $10,000 from
values like 10000

INTNX
Syntax : INTNX(interval, from, number)

advances a date, time, or datetime value by a given interval.

This function generates a SAS date, time, or datetime value that is a given number of time intervals from
a starting value(from). The interval must be a character constant or variable whose value is one of those
listed below.

DATE interval DATETIME interval TIME interval


DAY DTDAY HOUR
WEEK DTWEEK MINUTE
MONTH DTMONTH SECOND
QTR DTQTR
YEAR DTYEAR

e.g. yr = intnx(‘year’,’05jan89’d); <== will return value 89.

INTCK
Gives the number of intervals in a given time span
Syntax : intck(interval, from, to)
e.g.
qart = intck(‘qtr’,’10jan98’d,’01may99’d); <== will return value 5

Page 42 of 57
SAS Notes

JULDATE
Returns the Julian date from a SAS date value
e.g.
sdate = ’01feb99’d;
jul = juldate(sdate); <== will return value 99032

LEFT
Left aligns a SAS character expression.
e.g.
sd = ’ feb day’;
p = left(sd); <== will return value with trailing blanks

LENGTH
Returns the length of an argument as the right-most non-blanks character in the argument. If the value is
missing the length returns a value of 1. If the argument is an uninitialized numeric variable, it returns a
value of 12.
e.g.
p = lenght(‘sdqwert ’); <== will return value of 7 for
variable p

MAX
Returns largest of the non-missing arguments.
e.g.
p = max(‘sdqwert’); <== will return value of 7 for variable p

LENGTH
Returns the length of an argument. If the value is missing the length returns a value of 1.
e.g.
p = left(‘sdqwert’); <== will return value of 7 for variable p

PUT
Specifies an output format for a value. The result of a put function is always a character string. This is
useful for converting a numeric value to a character value.

SUBSTR
Syntax : substr(argument,position<,n>)

If used on the right side of an assignment statement, it returns a portion of an expression you specify as
argument.
e.g.
part = substr(var1,3,6); <== will return character string of length 6 from the 3 rd
position.
partend = substr(var1,3); <== will return character string from the 3rd position till end.

If used on the left side of an assignment statement, it places the value of the expression on the right side
of the assignment statement into the argument of SUBSTR, replacing n characters starting with the
character you specify in position.
e.g.
var= ‘CATNAP’;
substr(var,1,3) = ‘KID’; <== will return value of ‘KIDNAP’ into variable var.
Put var;

SYSPARM

The SYSPARM function lets you access a character string specified with the SYSPARM = system option
in the job control for your job or in an OPTIONS statement.

Page 43 of 57
SAS Notes

Page 44 of 57
SAS Notes

Procedures
Append

The APPEND proc adds the observations from one dataset to the end of another.

Proc append base = myfile new = newfile (where ( x=2 ) ) force;

Options
BASE : names the dataset to which observations are added. If not found a new data set with the name
specified is created

DATA : names the SAS dataset with observations which is to be added to the end of the BASE= data
set.

WHERE : limits the observations selected from DATA= dataset that are to be appended to BASE=
dataset.

FORCE : forces PROC APPEND to concatenate datasets when the DATA= dataset contains variables
that are either
• not in the BASE= dataset or
• do not have the same type as the variables in the BASE= dataset or
• are longer than the variables in the BASE= dataset.

The advantage over SET operation is that it bypasses processing of data in BASE= dataset and adds new
observations directly to the end of the BASE= dataset.

Compare

This proc compares the contents of two datasets or compare the values of different variables within a
single dataset to produces a variety of reports.

Proc compare base = myfile (where (state = ‘NC’)) compare =yourfile;


var student birth state major;
with student1 birth5 state3;
out = newfile;

• matching variables : variables with the same name or those explicitly paired by VAR and WITH
statements.
• Matching observations : observations that have the same values for all variables that occur in the
same position in the datasets.

PROC COMPARE compares the following in order:


1. Data set attributes
2. Variables
3. Attributes of variables
4. Observations
5. Values in pairs of observations that match

It produces
1. An output dataset
2. Printed reports
3. A numeric return code stored in the automatic macro variable &SYSINFO

Contents

This Proc provides information about a SAS data library or individual files in a SAS data library.

Page 45 of 57
SAS Notes

libname person ‘ library-dataset ’;


Proc contents data=person . _all_
memtype = view nods position;
out = newfile;
run;

DATA : specifies the SAS dataset or library whose information is to be determined. The information of
all the files in the library having type as specified by MEMTYPE= option is gained by using the keyword
_all_

MEMTYPE : Specifies one or more types of members in the SAS data library. It can take the following
values:
ACCESS – access files created using SAS/ACCESS software
ALL – all member types
CATALOG – catalogs
DATA – SAS datasets
PROGRAM – stored compiled SAS programs
VIEW – views created using SQL procedures

NODS : Only the SAS data library is printed if this option is used.

OUT : gives the name of the output SAS dataset. The output dataset contains information similar to that
given in the variable description section in the printed output

POSITION: The default order of listing variables names in the SAS dataset is alphabetical. This option
prints a second list of variables names in the order of their position in the dataset.

Datasets

This Proc is used to list, copy, rename and delete SAS files and to manage indexes and append SAS
datasets in a data library. It also provides all the capabilities of the APPEND, CONTENTS, and COPY
procedures.
Some of the differences in DATASETS compared with other procs:
1. The input library is specified in the LIBRARY= option
2. Statements are executed in the order of writing.
3. Groups of statements can execute without a RUN statement.
4. There is a dependence of some statements on other statements. E.g. the SELECT & EXCLUDE
statements can only be executed immediately after a COPY statement, FORMAT, INFORMAT,
LABEL, RENAME, INDEX CREATE, INDEX DELETE can only be used after a MODIFY
statement
5. The DATASET procedure remains active until you type one of the following :
• QUIT
• RUN CANCEL
• A new PROC or DATA statement
If a syntax error is encountered the RUN group containing the error is not executed.

libname drink ‘dataset1 ’;


libname eat ‘dataset 2 ’;

Proc datasets library = eat memtype = data;


Copy out = drink move;
Select custard icecream;
Run;
Delete sandwich;
Run;
Change apple = apricot;

Page 46 of 57
SAS Notes

Run;
Modify hamburger;
Rename price = rate;
quit;

CHANGE : Use CHANGE <old_name> = <new_name> statement to rename one or more members
COPY : Use COPY OUT = <from_lib> to copy members from one library to another. The MOVE
option is used to delete from the input library after copying
DELETE : Use DELETE statement to specify members to be deleted from SAS library.
MODIFY : USE MODIFY statement to change the attributes of the specifies datasets. Only one dataset
name is allowed per modify statement.

Formats

This proc is used to create your own formats & informats. Options with FORMAT procedure can be used
to print the contents of a format library, create a control dataset for writing other informats and formats or
read a control dataset to create informats & formats. Formats & informats gives the SAS system
information about data that is to be read or written. For example,
1. Data type
2. How many bytes it occupies
3. Decimal placement for numbers
4. How to handle leading trailing or embedded blanks or zeros, etc.

A word immediately followed by a period indicates a format or informat name. They can have an
optional width specification before the period. Numeric formats & informats can also have an optional
decimal specification after the period.

User defined formats convert a value to a different form for output E.g.
1. Convert number to a character string – 1 to YES
2. Convert from one character string to another character string - YES to Y
3. Specify a template to format the way a numeric value is printed – print in the format of a telephone
number.

User defined informats convert character input values into a different form

1. Convert a character number to a character string - 1 to YES


2. Convert a character string to number - YES as 1(numeric)
3. Convert a character string to another character string - YES as Y

There are two types of formats :


Value format : converts output values to a different form. E.g.

Proc Format;
value SEX 1 = “Male” 2 = “Female” ;
invalue $FRENCH ‘OUI’ = ‘YES’ ‘NON’ = ‘NO’;

In the first case, numeric variable values converted to character form for output when the numeric
format SEX is used. Format stored as number but formatted as character values. In the second case,
values of OUI & NON are stored as YES and NO by the SAS system when the character informat
$FRENCH is used.

Picture format : specify template for printing numbers giving specifics like leading zeros, decimal &
comma punctuation, fill characters, prefix & negative number representation. Only applicable to
numeric values.

Proc Format;

Page 47 of 57
SAS Notes

picture PHONENUM other = ‘000 / 000 - 000’;


picture FAX other = ‘0999 ) 999 - 999’ ( prefix = ‘(‘ );

Summary or Means

The summary procedure computes descriptive statistics on numeric variables in a SAS dataset and
outputs the results to a new SAS dataset. The difference between Means & Summary is that Summary
does not produce any printed output on its own. The summary output data set is typically printed with
PROC PRINT or is input to a DATA step that extracts the desired information.

This procedure creates a SAS dataset containing summary statistics or descriptive statistics on
numeric variables. Each observation in the new dataset contains the statistics for a different subgroup of
the observations in the input dataset representing all possible combinations of the levels of variables
specified in the CLASS statement.

PROC SUMMARY data = ht_wt;


var wt_loss ht_loss;
by sl_no;
class group;
id name;
output out = min_wt min(wt_loss )=;
output out = min_ht min(ht_loss)=;
run;

BY : A separate analysis on observations in the group specified by the BY variables is obtained. The
SAS dataset should be sorted by the BY variables if the NOTSORTED option is not used.

CLASS : This specifies the variables used to form sub-groups. The level of interaction between the
variables specified is obtained by this statement. If a variable is taken into account for a certain sub-
group then it’s is assigned a binary value 1, else it is 0. The decimal equivalent of this binary numbers
for a sub-group is the _TYPE_ value for that sub-group. The output produces statistic info like
• number of observations,
• Mean ,
• Std Deviation,
• minimum &
• maximum value
for that sub-group for different values of _TYPE_. If OUT= option is used, then this forms a
single record in the output dataset.

VAR: The variables in the dataset for which statistics have to be calculated. If this statement is not used
then all variables except those in the BY, CLASS, FREQ, ID and WEIGHT statements are analyzed.

ID : If additional variables from the input dataset are to be included in the output dataset, then they can
be given with the ID statement.

e.g. The following code produces the result as :

Input file IN1:


code age date ind bit
1000 23 99123 Y 1
1001 . 99123 N 1
1003 32 99123 Y 0
1003 22 00123 Y 0
. 25 00123 Y 0

Code:

Page 48 of 57
SAS Notes

DATA IN2;
SET IN1;
PROC SUMMARY;
BY DATE NOTSORTED;
CLASS IND BIT;
VAR AGE CODE;
OUTPUT OUT=IN3;
run;

will give :

DATE IND BIT _TYPE_ _FREQ_ _STAT_ AGE CODE

99123 0 3 N 2.0000 3.00


99123 0 3 MIN 23.0000 1000.00
99123 0 3 MAX 32.0000 1003.00
99123 0 3 MEAN 27.5000 1001.33
99123 0 3 STD 6.3640 1.53
99123 0 1 1 N 1.0000 1.00
99123 0 1 1 MIN 32.0000 1003.00
99123 0 1 1 MAX 32.0000 1003.00
99123 0 1 1 MEAN 32.0000 1003.00
99123 0 1 1 STD . .
99123 1 1 2 N 1.0000 2.00
99123 1 1 2 MIN 23.0000 1000.00
99123 1 1 2 MAX 23.0000 1001.00
99123 1 1 2 MEAN 23.0000 1000.50
99123 1 1 2 STD . 0.71
99123 N 2 1 N 0.0000 1.00
99123 N 2 1 MIN . 1001.00
99123 N 2 1 MAX . 1001.00
99123 N 2 1 MEAN . 1001.00
99123 N 2 1 STD . .
99123 Y 2 2 N 2.0000 2.00
99123 Y 2 2 MIN 23.0000 1000.00
99123 Y 2 2 MAX 32.0000 1003.00
99123 Y 2 2 MEAN 27.5000 1001.50
99123 Y 2 2 STD 6.3640 2.12
99123 N 1 3 1 N 0.0000 1.00
99123 N 1 3 1 MIN . 1001.00
99123 N 1 3 1 MAX . 1001.00
99123 N 1 3 1 MEAN . 1001.00
99123 N 1 3 1 STD . .
99123 Y 0 3 1 N 1.0000 1.00
99123 Y 0 3 1 MIN 32.0000 1003.00
99123 Y 0 3 1 MAX 32.0000 1003.00
99123 Y 0 3 1 MEAN 32.0000 1003.00
99123 Y 0 3 1 STD . .
99123 Y 1 3 1 N 1.0000 1.00
99123 Y 1 3 1 MIN 23.0000 1000.00
99123 Y 1 3 1 MAX 23.0000 1000.00
99123 Y 1 3 1 MEAN 23.0000 1000.00
99123 Y 1 3 1 STD . .
123 0 2 N 2.0000 1.00

Page 49 of 57
SAS Notes

123 0 2 MIN 22.0000 1003.00


123 0 2 MAX 25.0000 1003.00
123 0 2 MEAN 23.5000 1003.00
123 0 2 STD 2.1213 .
123 0 1 2 N 2.0000 1.00
123 0 1 2 MIN 22.0000 1003.00
123 0 1 2 MAX 25.0000 1003.00
123 0 1 2 MEAN 23.5000 1003.00
123 0 1 2 STD 2.1213 .
123 Y 2 2 N 2.0000 1.00
123 Y 2 2 MIN 22.0000 1003.00
123 Y 2 2 MAX 25.0000 1003.00
123 Y 2 2 MEAN 23.5000 1003.00
123 Y 2 2 STD 2.1213 .
123 Y 0 3 2 N 2.0000 1.00
123 Y 0 3 2 MIN 22.0000 1003
123 Y 0 3 2 MAX 25.0000 1003
123 Y 0 3 2 MEAN 23.5000 1003
123 Y 0 3 2 STD 2.1213 .

Print

This Proc prints the observations in the dataset using some or all of the variables. They can also print
totals and sub-totals for numeric variables.

Proc print data = person split = ‘*’ label n ;


label name=‘Associates*in the team’;
var name age team weight;
id name;
by name;
pageby weight;
sumby weight;
sum age;
title ’ player details ’;
run;

SPLIT : Splits labels used as column headings across multiple lines where the split character appears.
The split character is not printed.
LABEL: variable labels are used as column headings instead of variable names, if this option is used.
N : prints the number of observations in the dataset at the end of printed output.
BY : the Procedure prints a separate analysis for each variable in the BY group. It is required that the
dataset be sorted by BY variable.
PAGEBY : begins printing on a new page when the value of the specified BY variable changes.
SUM : specifies the variables whose values is to be totaled.
VAR : names the variables to be printed.

SQL

The SQL procedure implements the Structured Query Language ( SQL) for SAS version 6 onwards.

Sample code

PROC SQL;
CONNECT TO DB2 (SSID=&SYS);

Page 50 of 57
SAS Notes

CREATE VIEW TEMP AS


SELECT * FROM CONNECTION TO DB2
(
SELECT T.ISCT_DT,
T.ISCT_CRED_USER_NBR,
T.ISCT_CRED_STR_ID,
T.ISCT_NBR,
T.ISCT_TOTL_AMT,
T.ISCT_CHRG_TYP_CD,
T.SLTKT_PUR_DT,
T.SLTKT_PUR_NBR,

/* more variables …… */

T.ISCT_CHRG_USER_NBR,
T.ISCT_CHRG_STR_ID,
0,
0,
' ',
' ',
' ',
' ',
0,
CURRENT DATE,
T.ISCT_CHRG_STR_ID,
''
FROM SI&OWN..STORE_TRSF T,
SI&OWN..VSTORE S,
SI&OWN..VSTORE B,
ST&OWN..STR_DLY_RPT D
WHERE
T.DRPT_ACPT_DT = &ED
AND T.ISCT_CHRG_TYP_CD = 'WTYRPR'
AND T.ISCT_CRED_USER_NBR = '01'
AND T.ISCT_CRED_USER_NBR = S.USER_NBR
AND T.ISCT_CRED_STR_ID = S.STR_ID
AND T.ISCT_CHRG_USER_NBR IN ('01','22')
AND T.DRPT_SALE_DT = D.DRPT_SALE_DT
UNION ALL

SELECT T.ISCT_DT,
T.ISCT_CRED_USER_NBR,
T.ISCT_CRED_STR_ID,
/* more variables …… */
FROM SI&OWN..STORE_TRSF T,
SI&OWN..TRSF_RFND_LN R,
SI&OWN..VSTORE S,
SI&OWN..VSTORE B,
ST&OWN..STR_DLY_RPT D,
MI&OWN..SKU_VRSN V,
ORDER BY 22, 39, 2, 3, 18, 43
);
DISCONNECT FROM DB2;
%PUT &SQLXRC;
%PUT &SQLXMSG;

Page 51 of 57
SAS Notes

QUIT;

DATA TEMP2;
SET TEMP (RENAME=(
ISCT_DT=ISCTDT ISCT_CRE=CREDUSR ISCT_CR0=CREDSTR
ISCT_NBR=ISCTNBR ISCT_TOT=ISCTAMT
ISCT_CHR=CHRGTYPE SLTKT_PU=ORGTKTDT
SLTKT_P1=ORIGTKT SLTKT_PY=PAYTYPE ISCT_CH2=CHGTOUSR
ISCT_CH3=CHGTOSTR JNL_ACCT=JACCTNBR ICRT_NBR=ICRTNBR
SRVC_TKT=SERVINV SRVC_CMP=WARREPDT SLMKR_IN=SLMKINTL
DRPT_ACP=DRACPTDT DRPT_SAL=DRDATE ISCT_RFN=RFNREASN
ISCT_CUS=CUSTNAME ISCT_CMN=FFCOMENT EXPRESSN=SKUNBR

/*rename all the variables from input dataset to a 8 character length name before
reading in*/

EXPRES10=SKUSMDES EXPRES11=VCHRAMT EXPRES12=SLTKTDT


EXPRES13=SLTKTNBR EXPRES14=SLTKTTM EXPRES15=ACCTNBR
EXPRES16=VCHRDESC ACCTG_DI=ACCTDIV ACCTG_RE=ACCTREG
ACCTG_17=ACCTDIST ACCTG_GR=ACCTGRP STR_TELE=STAREACD
STR_TE18=STEXCHNO STR_TE19=STSTANO ISCT_C20=CHRTOUSR
ISCT_C21=CHRTOSTR EXPRES22=CLASSCD DRPT_MAN=MANFLAG
)) END=LAST;
IF _N_ = 1
THEN DO;
%PUT &SQLXRC;
%PUT &SQLXMSG;
END;
RUN;
QUIT;
%PUT &SQLXRC;
%PUT &SQLXMSG;

Another Example:

PROC SQL;
TITLE "JOIN HOUSEHOLDS";
CREATE TABLE WORK2.CBLHHLD AS
SELECT * FROM HHLD1 AS BASEDS INNER JOIN WORK1.HHVIEW AS APNDS
ON BASEDS.HHLDID=APNDS.HHLDID
ORDER BY HHLDID;

This code is equivalent to the SAS MERGE step as:

DATA WORK2.CBLHHLD;
MERGE HHLD1(IN=BASEDS) WORK1.HHVIEW (IN=APNDS);
BY HHLDID;
IF BASEDS & APNDS THEN OUTPUT WORK2.CBLHHLD;
RUN;

Note: Through PROC SQL it is not possible to get merged observations for records where the BY
variable (WHERE clause) has values only in one dataset (or TABLE) as WHERE condition forces it to
retrieve records where values coincide from the two tables. But through MERGE, missing values of BY

Page 52 of 57
SAS Notes

variable are merged together to form observations in the output. So in order to simulate the SQL
statement, the merge criteria should be for common values of both indicator variables.

Page 53 of 57
SAS Notes

SAS Macro Language


SAS Macro language is a language in its own rights. It is not part of the proper SAS language but can act
on the SAS language.

The simplest macrolanguage object is the macrovariable. It is a set of characters that are identified by
name. Macroexpressions include macrovariables, constant text, macro functions and macro operators.
Macro statements act on macro languages in certain ways. Finally, the Macro is a stored
macrolanguage object.

Macro Variables

Macro variables (or symbolic variables) belong to the SAS macro language and are different from Data
step variables. You can define and use macro variables anywhere in a SAS program, except in data lines.
Also, the value of a dataset variable depends on the observation being processed. Macro variable, on the
other hand, contains one value that remains constant until explicitly changed. A Macro variable is
independent of the SAS Data set.

%let dsn= Newdata;

The value of a macro variable is simply a string of characters. The macro processor does not make a
distinction between character and numeric values. To refer to the macro variable value the pattern
&name (called macro variable reference) is used.

TITLE “Display of dataset &dsn”;

The macro processor resolves references in double quotes but not in single quotes.

It is also possible to create macro variable values that contain sections of SAS program as,

%let plot= %str(


proc print data = Newdata;
run;
);

&plot

The statements have to be enclosed in %STR() function so that semicolons within the value are part of
the text and not the end of the %let statement.

A null value assigned to a macro variable has a length of 0. So, the macro processor replaces a reference
to a null value with 0 characters.

The simplest way to display macro variables is to use %PUT statement as,
%put !!!&dsn!!!;

will be resolved as,


%put !!!Newdata!!!;

Macros

Page 54 of 57
SAS Notes

A macro is stored text identified by name. The %MACRO statement must begin every macro and must
contain a name for the macro. The %MEND statement closes every macro. The macro name can also
appear after %MEND for clarity. To invoke a macro, place a % in front its name, as

%macro dsn(lvar,fvar);
proc print data =&lvar..&fvar ;
run;
%mend dsn;
%dsn (work.sasdsn)

This pattern is called a macro invocation or a macro call. A macro is an entity in a utility catalog in a
library, usually WORK. SAS System does not support copying or renaming macros. A macro variable
defined in parenthesis in a %MACRO statement is a macro parameter. The macro processor matches the
first value to the first macro variable name, the second to the second, etc.

Page 55 of 57
SAS Notes

Some other SAS Products


SAS/CONNECT software is a cooperative processing product that through connections between remote
SAS sessions, provides the ability to transfer data among operating platforms supported by the SAS
System.
It also provides remote submission capabilities that allow users to submit SAS code to another host for
processing.

SAS/AF software is an interactive facility used to create user-friendly windowing applications. The
software can be used to develop computer-based training courses and on-line help systems. Ready-to-use
procedures construct menu screens from fill-in-the-blank panels, and transport screens between operating
environments.

The ACCESS procedure allows you to create access descriptors and view descriptors that can be used to
operate on data in a DBMS (like DB2) table or view through the SAS system procedures. The ACCESS
procedure creates SAS files of type ACCESS and VIEW that can be used in SAS programs. These files
describe a DB2 table or view to the SAS System so that the data contained in the table or view may be
read, analyzed, used, and updated by the SAS system. There are two basic types of files created by the
ACCESS procedure: ACCESS and VIEW. An ACCESS file is a file of descriptive information relating to
a DB2 table. The ACCESS file describes the data in the table to the SAS System. VIEW files can
identify a
subset of the data described by the ACCESS file. The data specified in a VIEW file can be used by the
SAS System in much the same way as a SAS data set is used. You can also create ACCESS files for DB2
views, thus allowing you to read data from several DB2 tables at once. However, you cannot modify data
read through a DB2 view.
You can also access data from many relational DBMSs using the SQL Procedure Pass-Through Facility.
The SQL procedure is a base SAS procedure that works with SAS/ACCESS software to send and receive
data directly between a DBMS and the SAS System. You can also store your Pass-Through code in a
PROC SQL view for later use. The SQL Procedure Pass-Through facility consists of three statements and
one component: CONNECT statement establishes a connection with the DBMS; EXECUTE statement
sends dynamic, non-query SQL statements to the DBMS; DISCONNECT statement terminates the
connection with the DBMS. You use the CONNECTION TO component in the FROM clause of a PROC
SQL SELECT statement to retrieve data directly from a DBMS. You can use the Pass-Through Facility
statements in a PROC SQL query or you can store them in a PROC SQL view. When you create the
view, any options that you specify in the corresponding CONNECT statement are also stored. Thus,
when the PROC SQL view is used in a SAS program, the SAS System can establish the appropriate
connection to the DBMS.

SAS/ETS software is an econometric and time series analysis tool for forecasting, planning and financial
modeling. There are procedures for time series analysis, linear and non-linear systems simulation,
loan amortization, depreciation and row-and-column financial reporting. SAS/ETS software is a
component of the SAS System.

SAS/FSP software, a component of the SAS System, includes procedures for full-screen interactive data
entry, editing, query and letter writing. With SAS/FSP software, users can generate personalized form
letters and reports, customize data presentation screens, create data entry applications that include cross
validation of field values and other data manipulation.

SAS/GRAPH software offers device-intelligent color graphics for producing charts, maps and plots in a
variety of patterns. Users can customize graphs with the software, and present multiple graphs on a
page. SAS/GRAPH software is a component of the SAS System, an applications system for data access,
management, analysis, and presentation.

SAS/OR software - an operations research, project management and decision support tool - handles
general assignment problems; performs critical path analysis and linear programming; and determines
minimum cost flow, maximum flow, and shortest path through a network. SAS/OR software is a
component of the SAS System.

Page 56 of 57
SAS Notes

SAS/QC software, a component of the SAS System, offers a variety of specialized tools for statistical
quality improvement applications. Included are procedures for generating Shewhart, cumulative sum and
moving average control charts, for performing process capability analysis, and for experimental design.

SAS/STAT software, a comprehensive statistical analysis tool, offers a wide range of capabilities
including analysis of variance, regression, categorical analysis, multivariate analysis, survival analysis,
psychometric analysis, cluster analysis and nonparametric analysis. SAS/STAT software is a component
of the SAS System.

SAS/SHARE software provides concurrent update access to SAS files. "Concurrent update access"
means that two or more users simultaneously update the same SAS file or SAS data library. SAS/SHARE
software is a component of the SAS System.

Page 57 of 57

Vous aimerez peut-être aussi