Vous êtes sur la page 1sur 3

Base SAS Notes from Prep Guide

Chapter -1

DATA steps typically create or modify SAS data sets. They can also be used to produce customdesigned
reports. For example, you can use DATA steps to
put your data into a SAS data set
compute values
check for and correct errors in your data
produce new SAS data sets by subsetting, merging, and updating existing data sets.

PROC (procedure) steps are pre-written routines that enable you to analyze and process the data in a
SAS data set and to present the data in the form of a report. PROC steps sometimes create new SAS
data sets that contain the results of the procedure. PROC steps can list, sort, and summarize data. For
example, you can use PROC steps to
create a report that lists the data
produce descriptive statistics
create a summary report
produce plots and charts.

Rules for SAS Names


SAS data set names
can be 1 to 32 characters long
must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores.

Name
Each variable has a name that conforms to SAS naming conventions. Variable names follow exactly the
same rules as SAS data set names. Like data set names, variable names
can be 1 to 32 characters long
must begin with a letter (A–Z, either uppercase or lowercase) or an underscore (_)
can continue with any combination of numbers, letters, or underscores.

Type
A variable's type is either character or numeric.
Character variables, such as Name (shown below), can contain any values.
Numeric variables, such as Policy and Total (shown below), can contain only
numeric values (the digits 0 through 9, +, -, ., and E for scientific notation).

Length
A variable's length (the number of bytes used to store it) is related to its type.
Character variables can be up to 32,767 bytes long. In the example below, Name has a length of 20
characters and uses 20 bytes of storage.
All numeric variables have a default length of 8. Numeric values (no matter how many digits they
contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length.
Format
Formats are variable attributes that affect the way data values are written. SAS software offers a variety
of character, numeric, and date and time formats. You can also create and store your own formats.
Informat
Whereas formats write values out by using some particular form, informats read data values in certain
forms into standard SAS values. Informats determine how data values are read into a SAS data set.

Chapter -2

SAS Libraries
In the previous chapter, you learned that SAS files are stored in SAS libraries. By default, SAS
Base SAS Notes from Prep Guide

defines several libraries for you:


Sashelp is a permanent library that contains sample data and other files that control how SAS works
at your site. This is a read-only library.
Sasuser is a permanent library that contains SAS files in the Profile catalog that store your personal
settings. This is also a convenient place to store your own files.
Work is a temporary library for files that do not need to be saved from session to session.

PROC CONTENTS procedure to create SAS output that describes either of the following:
the contents of a library
the descriptor information for an individual SAS data set.

PROC DATASETS with the CONTENTS statement to view the contents of a SAS library or a SAS data
set.

Chapter 5

Referencing a Raw Data File


Before you can read your raw data, you must reference the raw data file by creating a fileref. Just
as you assign a libref by using a LIBNAME statement, you assign a fileref by using a FILENAME
statement.

Chapter – 6

Writing a NULL Data Set


After you write or edit a DATA step, you can compile and execute your program without creating
a data set. This enables you to detect the most common errors and saves you development time.
A simple way to test a DATA step is to specify the keyword _NULL_ as the data set name in the
DATA statement.

PUT Statement
When the source of program errors is not apparent, you can use the PUT statement to examine
variable values and to print your own message in the log.

Character Strings
You can use a PUT statement to specify a character string to identify your message in the log.
The character string must be enclosed in quotation marks.

Data Set Variables


You can use a PUT statement to specify one or more data set variables to be examined for that
iteration of the DATA step.

Automatic Variables
You can use a PUT statement to display the values of the automatic variables _N_ and _ERROR_.

Compilation Phase
At the beginning of the compilation phase, the input buffer and the program data vector are
created. The program data vector is the area of memory where data sets are built, one
observation at a time. Two automatic variables are also created: _N_ counts the number of DATA
step executions, and _ERROR_ signals the occurrence of an error. DATA step statements are
checked for syntax errors, such as invalid options or misspellings.
Execution Phase
During the execution phase, each record in the input file is processed, stored in the program data
vector, and then written to the new data set as an observation. The DATA step executes once for
each record in the input file, unless otherwise directed.
Base SAS Notes from Prep Guide

Diagnosing Errors in the Compilation Phase


Missing semicolons, misspelled keywords, and invalid options cause syntax errors in the
compilation phase. Detected errors are underlined and are identified with a number and message
in the log. If SAS can interpret a syntax error, then the DATA step compiles and executes; if SAS
cannot interpret the error, then the DATA step compiles but doesn't execute.
Diagnosing Errors in the Execution Phase
Illegal mathematical operations or processing a character variable as numeric causes errors in
the execution phase. Depending on the type of error, the log might show a warning and might
include invalid data from the program data vector, and the DATA step either stops or continues.
Testing Your Programs
To detect common errors and save development time, compile and execute your program without
creating observations. Specify the keyword _NULL_ as the data set name to view compilation or
execution errors without creating a data set. Or use the OBS= option in the INFILE statement to
limit the number of observations that are read or created during the DATA step. You can also use
the PUT statement to examine variable values and to generate your own message in the log.

Vous aimerez peut-être aussi