
SAS III

COMPSTAT GROUP
www.compstatgroup.com
SAS Functions
SAS functions are built-in routines that enable you to perform
many types of data manipulation quickly and easily.
SAS functions can be used in DATA step programming statements
and in some statistical procedures. A SAS function can be specified
anywhere that you would use a SAS expression, as long as the
function is part of a SAS statement.
All SAS functions are written by specifying the function name
followed by the function arguments, enclosed in parentheses:
function-name(argument-1< ,argument-n>);
where argument can be
variables: mean(x,y,z)
constants: mean(456,502,612,498)
expressions: mean(37*2,192/5,mean(22,34,56))
Even if the function does not require arguments, the function name
must still be followed by parentheses, for example: function-name().
When a function contains more than one argument, the
arguments are usually separated by commas.
function-name(argument-1,argument-2,argument-n)
However, for some functions, variable lists and arrays
can also be used as arguments, as long as the list or the
array is preceded by the word OF.
mean(x1,x2,x3)
mean(of x1-x3)
mean(of newarray {*})
What will happen if the word OF is omitted from mean(of
x1-x3)?
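A quick way to answer the question is to compute both forms side by side. In the sketch below the values of x1, x2, and x3 are made up for illustration. Without OF, SAS reads x1-x3 as the arithmetic expression x1 minus x3, so MEAN receives a single argument and simply returns that difference.

```sas
data of_demo;
   x1 = 10;  x2 = 20;  x3 = 30;   /* illustrative values */
   with_of    = mean(of x1-x3);   /* variable list x1, x2, x3: (10+20+30)/3 = 20 */
   without_of = mean(x1-x3);      /* single argument x1 minus x3: 10-30 = -20 */
   put with_of= without_of=;
run;
```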
INPUT function
Is used for converting character data to numeric data.
Consider the following example:
Data Test1;
Set Mylib.Employee;
Salary = YrsOfExperience*Basic + var_allowance;
Run;
Suppose the variable YrsOfExperience is defined as type character.
In the assignment statement, YrsOfExperience is used in an
arithmetic operation. SAS software detects the mismatched
variable types and tries an automatic character-to-numeric conversion:
whenever a character variable is referenced in a numeric context,
SAS software tries to convert its values to numeric.
This automatic conversion does not always succeed. Values that
cannot be read as standard numbers (for example, values containing
commas) are set to missing.
Whenever data is automatically converted, a message is written to the
SAS log stating that the conversion has occurred.
Automatic character-to-numeric conversion occurs when a character value is
assigned to a previously defined numeric variable
used in an arithmetic operation
compared to a numeric value with a comparison operator
specified in a function that requires numeric arguments
The INPUT function explicitly converts character data values to
numeric values.
The general form of the INPUT function is
INPUT(source,informat)
where
source indicates the character variable, constant, or expression to
be converted to a numeric value
informat is a numeric informat, which must also be specified
Consider the following example:
data testin;
input sale $9.;
fmtsale=input(sale,comma9.);
datalines;
2,115,353
;
When choosing the informat, be sure to select a numeric informat
that can read the form of the values.
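The sketch below contrasts automatic conversion with an explicit INPUT call on the same value. The data set and variable names are illustrative. Automatic conversion reads the value with a standard numeric informat, which cannot handle the embedded commas, so it should produce a missing value and a note in the log; the COMMA9. informat reads the commas correctly.

```sas
data convert_demo;
   sale = '2,115,353';               /* character value with embedded commas */
   auto_num = sale * 1;              /* automatic conversion: commas cannot be read, so missing */
   explicit = input(sale, comma9.);  /* COMMA9. skips the commas: 2115353 */
   put auto_num= explicit=;
run;
```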
PUT function
Is used for converting numeric data to character data
Consider the following example:
SiteCode=site||department;
The variable Site is numeric, whereas department is character.
Because Site is used in a character context, SAS software tries to
convert its numeric values to character values automatically (using
the BEST12. format, which right-aligns the number and can leave
leading blanks).
Using the PUT function, you can explicitly convert numeric data values
to character data values.
The general form of the PUT function is as follows:
PUT(source,format)
where source indicates the numeric variable, constant, or expression to be
converted to a character value, and format is a format matching the
data type of the source.
Consider the following example:
Assignment=put(site,2.)||'/'||department;
Remember that the format specified in the PUT function must match
the data type of the source.
So, to do an explicit numeric-to-character data conversion, you
specify a numeric source and a numeric format.
Note that the PUT function requires a format, whereas the INPUT
function requires an informat.
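A minimal sketch of both conversions, assuming Site=5 and department='Sales' (illustrative values). Automatic conversion formats Site with BEST12., so the number is right-aligned in 12 characters; PUT with the 2. format produces 2 characters, one of which is a leading blank for a one-digit value.

```sas
data assign_demo;
   site = 5;                                        /* numeric */
   department = 'Sales';                            /* character */
   auto_code  = site || department;                 /* automatic (BEST12.): blanks before the 5 */
   Assignment = put(site, 2.) || '/' || department; /* explicit: ' 5/Sales' */
   put auto_code= Assignment=;
run;
```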
SAS Functions for Dates and Times
SAS software stores a date value as the number of days from
January 1, 1960, to a given date.
A SAS time value is stored as the number of seconds since
midnight.
A SAS datetime value is stored as the number of seconds
between midnight on January 1, 1960, and a given date
and time.
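The date, time, and datetime constants below (a quoted value followed by d, t, or dt) make the stored numbers visible. The chosen dates and times are arbitrary examples.

```sas
data storage_demo;
   d0 = '01jan1960'd;             /* day zero: stored as 0 */
   d1 = '02jan1960'd;             /* one day later: 1 */
   dm = '31dec1959'd;             /* dates before 1960 are negative: -1 */
   t  = '09:30:00't;              /* seconds since midnight: 9*3600 + 30*60 = 34200 */
   dt = '02jan1960:00:00:00'dt;   /* seconds since midnight, 01JAN1960: 86400 */
   put d0= d1= dm= t= dt=;
run;
```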
MONTH Function
Returns the month from a SAS date value
General form of MONTH function:
MONTH(date)
where date is a SAS date value that is specified as a variable or
a SAS date constant
The value returned by the MONTH function is a numeric value
that ranges from 1 to 12. It represents the month of the year. The
value 1 represents January, 2 represents February, and so on
Example for a MONTH function:
data hrd.tempnov;
set hrd.temp;
if month(begindate)=11;
run;
YEAR Function
Returns the year from a SAS date value
General form of YEAR function:
YEAR(date)
where date is a SAS date value that is specified as a variable or
a SAS date constant
The value that is returned by the YEAR function is a four-digit
numeric value representing the year, for example, 2002.
Example for a YEAR function:
data hrd.temp98;
set hrd.temp;
if year(begindate)=1998;
run;
DAY Function
Returns the day of the month from a SAS date value
General form of DAY function:
DAY(date)
where date is a SAS date value that is specified as a variable or
a SAS date constant
The DAY function produces an integer from 1 to 31 that
represents the day of the month.
Example for a DAY function:
now='05may97'd;
d=day(now);
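The three functions can be applied to the same date value. This sketch uses a four-digit year in the constant so that the result does not depend on the YEARCUTOFF= setting; the variable names are illustrative.

```sas
data parts_demo;
   now = '05may1997'd;        /* four-digit year: no YEARCUTOFF= dependence */
   m = month(now);            /* 5 */
   y = year(now);             /* 1997 */
   d = day(now);              /* 5 */
   put now= date9. m= y= d=;  /* DATE9. displays the stored day count as 05MAY1997 */
run;
```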
MDY function
The MDY function creates a SAS date value from numeric values that
represent the month, day, and year.
The general form of the MDY function is:
MDY(month,day,year)
where
month can be a variable that represents the month or a number from 1 to 12
day can be a variable that represents the day or a number from 1 to 31
year can be a variable that represents the year or a number with 2 or 4 digits
Example for MDY function:
m=8;
d=27;
y=90;
birthday=mdy(m,d,y);
Be careful when entering and formatting year values.
The MDY function accepts two-digit values for the year,
but SAS software interprets two-digit values according to
the 100-year span set by the YEARCUTOFF= system
option. For Version 8 of SAS Software, the default value
of YEARCUTOFF= is 1920, so two-digit years are interpreted as
1920 through 2019 (for example, y=90 becomes 1990).
If you specify an invalid date in the MDY function, SAS
software assigns a missing value to the target variable.
m=15;
d=27;
y=90;
birthday=mdy(m,d,y);
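The sketch below sets YEARCUTOFF=1920 explicitly so that the results do not depend on a site's default, then compares two-digit years, a four-digit year, and an invalid month. Variable names are illustrative.

```sas
options yearcutoff=1920;          /* two-digit years map into 1920-2019 */

data mdy_demo;
   b1  = mdy(8, 27, 90);          /* 90 is read as 1990 */
   b2  = mdy(8, 27, 19);          /* 19 is read as 2019 under YEARCUTOFF=1920 */
   b3  = mdy(8, 27, 1990);        /* four-digit year: unambiguous */
   bad = mdy(15, 27, 90);         /* month 15 is invalid: missing value, note in the log */
   put b1= date9. b2= date9. b3= date9. bad=;
run;
```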
TODAY function
The TODAY function returns the current date from the system
clock as a SAS date value
The general form of the TODAY function is:
TODAY()
This function requires no arguments, but it must still be followed
by parentheses.
The DATE function can also create a SAS date value from the
current date. The TODAY and DATE functions have the same
form and can be used interchangeably.
actualdate1=today();
actualdate2=date();
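Because TODAY() returns a SAS date value, the result can be displayed with a date format or used directly in date arithmetic. The reference date 01JAN2000 and the variable names are arbitrary choices for illustration.

```sas
data today_demo;
   actualdate1 = today();
   actualdate2 = date();                      /* same value as TODAY() */
   days_2000   = actualdate1 - '01jan2000'd;  /* days elapsed since 01JAN2000 */
   put actualdate1= date9. days_2000=;
run;
```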
SCAN Function
The SCAN function enables you to separate a character value into
words and to return a specified word.
Suppose a variable Name has the following values:
CICHOCK, ELIZABETH MARIE
BENINCASA, HANNAH LEE
Suppose you need to separate each value of Name into a last name,
first name, and middle name.
The SCAN function uses delimiters, which are characters specified as
word separators, to separate a character string into words.
For example, if you are working with the character string below and you
specify the comma as a delimiter, the SCAN function separates the
string into four words.
209 RADCLIFFE ROAD, CENTER CITY, NY, 92716
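A sketch of the address example, using the comma as the only delimiter. Each word after the first keeps the blank that followed the comma, so the LEFT function is used here to remove it; the variable names and lengths are illustrative.

```sas
data addr_demo;
   length address $ 50 street city state zip $ 20;
   address = '209 RADCLIFFE ROAD, CENTER CITY, NY, 92716';
   street = scan(address, 1, ',');         /* 209 RADCLIFFE ROAD */
   city   = left(scan(address, 2, ','));   /* LEFT drops the blank after the comma */
   state  = left(scan(address, 3, ','));   /* NY */
   zip    = left(scan(address, 4, ','));   /* 92716, kept as character */
   put street= / city= / state= / zip=;
run;
```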
When using the SCAN function, you can specify as
many delimiters as needed to correctly separate the
character expression.
When you specify multiple delimiters, SAS software uses
all of the delimiters as word separators
For example, if you specify the slash and the hyphen as
delimiters, the SCAN function separates the following
text string into three words:
607/555-1273
The SCAN function treats two or more contiguous
delimiters, such as the parentheses and slash below, as
one delimiter. Also, leading delimiters have no effect.
(345)/5672/TRAILER
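Both strings above can be split in one DATA step. The variable names w1-w6 and the lengths are illustrative choices.

```sas
data delim_demo;
   length phone code $ 20 w1-w6 $ 10;
   phone = '607/555-1273';
   w1 = scan(phone, 1, '/-');     /* 607 */
   w2 = scan(phone, 2, '/-');     /* 555 */
   w3 = scan(phone, 3, '/-');     /* 1273 */
   code = '(345)/5672/TRAILER';
   w4 = scan(code, 1, '()/');     /* 345: the leading '(' has no effect */
   w5 = scan(code, 2, '()/');     /* 5672: ')' and '/' together act as one delimiter */
   w6 = scan(code, 3, '()/');     /* TRAILER */
   put w1= w2= w3= w4= w5= w6=;
run;
```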
If you do not specify delimiters when using the SCAN
function, default delimiters are used. The default
delimiters are
blank . < ( + | & ! $ * ) ; ^ - / , %
The general form of the SCAN function is:
SCAN(argument,n<,delimiters>)
where
argument specifies the character variable or expression to scan
n specifies which word to read
delimiters are special characters that must be enclosed in single
quotation marks (' ').
Examples for SCAN function:
LastName=scan(name,1);
FirstName=scan(name,2);
MiddleName=scan(name,3);
The SCAN function assigns a length of 200 to each target variable.
To save storage space, add a LENGTH statement to the DATA step
to set an appropriate length for all three variables.
Because SAS software sets the length of a new character variable
the first time it is encountered in the DATA step, be sure to place the
LENGTH statement before the assignment statements that contain
the SCAN function.
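Putting the pieces together, the sketch below reads the two Name values shown earlier and splits them with the default delimiters, which include both the comma and the blank. The length of 10 is an illustrative choice that fits these names.

```sas
data names;
   length LastName FirstName MiddleName $ 10;   /* placed before the SCAN assignments */
   infile datalines truncover;
   input name $30.;
   LastName   = scan(name, 1);
   FirstName  = scan(name, 2);
   MiddleName = scan(name, 3);
   datalines;
CICHOCK, ELIZABETH MARIE
BENINCASA, HANNAH LEE
;
```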
SUBSTR function
Extracts a substring from an argument or, when it appears on the left
side of an assignment statement, replaces part of a character value.
General form of SUBSTR function
SUBSTR(argument,position<,n>)
Where
argument specifies the character variable or expression to extract from
(on the left side of an assignment, it must be a character variable)
position is the character position to start from.
n specifies the number of characters to extract. If n is omitted, all
remaining characters are included in the substring.
Example for SUBSTR
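data _null_;
   a='KIDNAP';
   substr(a,1,3)='CAT';      /* replaces characters 1-3: a becomes CATNAP */
   put a;
   date='06MAY98';
   month=substr(date,3,3);   /* extracts MAY */
   year=substr(date,6,2);    /* extracts 98 */
   put @1 month @5 year;     /* writes MAY 98 to the log */
run;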
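When n is omitted, SUBSTR returns everything from position to the end of the value. A short sketch with illustrative variable names:

```sas
data substr_demo;
   date = '06MAY98';
   rest = substr(date, 3);      /* n omitted: MAY98 */
   dd   = substr(date, 1, 2);   /* first two characters: 06 */
   put rest= dd=;
run;
```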
TRIM function
Removes trailing blanks from character expressions
and returns one blank if the expression is missing.
General form of TRIM function:
TRIM(argument)
where argument can be any character expression, such as
a character variable: trim(address)
another character function: trim(left(id))
Example for TRIM function:
data test;
input part1 $ part2 $;
hasblank=part1||part2;
noblank=trim(part1)||part2;
put hasblank;
put noblank;
datalines;
apple sauce
;
The TRIM function does not affect the way a variable is stored.
Suppose you trim the values of a variable and then assign these
values to a new variable. The trimmed values are padded with
trailing blanks again if the values are shorter than the length of the
new variable.
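The sketch below shows that effect with illustrative names and lengths: the trimmed value regains its trailing blanks when stored in a longer variable, so the blanks reappear in a later concatenation unless TRIM is applied again at that point.

```sas
data trim_demo;
   length shortvar $ 8 longvar $ 12;
   shortvar = 'apple';
   longvar  = trim(shortvar);            /* stored as 'apple' padded to 12 characters */
   joined_raw  = longvar || '!';         /* blanks return: the ! lands in position 13 */
   joined_trim = trim(longvar) || '!';   /* trim at the point of use: 'apple!' */
   put joined_raw= joined_trim=;
run;
```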
INDEX function
Searches a character expression for a string of characters
General form of INDEX function is as follows:
INDEX(source,excerpt)
Where
Source specifies the character expression to search.
Excerpt specifies the string of characters to search for in the
character expression
The INDEX function searches source, from left to right, for the
first occurrence of the string specified in excerpt, and returns the
position in source of the string's first character.
If the string is not found in source, INDEX returns a value of 0.
If there are multiple occurrences of the string, INDEX returns
only the position of the first occurrence.
Keep in mind that the INDEX function is case sensitive, so the
character string you are searching for must be specified exactly
as it is recorded in the data set.
Examples for INDEX function:
a='ABC.DEF (X=Y)';
b='X=Y';
x=index(a,b);
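put x;   /* writes 10, the position where X=Y begins */
Because INDEX is case sensitive, a case-insensitive search can convert
the source to uppercase first, for example in a subsetting IF statement:
if index(upcase(job),'WORD PROCESSING') > 0;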
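The sketch below applies the case-insensitive search to a few job titles that are invented for illustration, then keeps only the matching titles with a subsetting IF statement.

```sas
data jobs;
   infile datalines truncover;
   input job $40.;
   pos = index(upcase(job), 'WORD PROCESSING');   /* 0 when the string is not found */
   datalines;
Word processing specialist
Senior word processing clerk
Data entry clerk
;

data wp_jobs;
   set jobs;
   if pos > 0;   /* keep only titles that contain the string */
run;
```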
UPCASE function
The UPCASE function converts all letters in a character
expression to uppercase
General form of UPCASE function is as follows:
UPCASE(argument)
where argument can be any SAS expression, such as a
character value or constant.
Examples for UPCASE function:
name=upcase('John B. Smith');
put name;
LOWCASE function
The LOWCASE function converts all letters in a character
expression to lowercase
General form of LOWCASE function is as follows:
LOWCASE(argument)
where argument can be any SAS expression, such as a
character value or constant.
Examples for LOWCASE function:
x='INTRODUCTION';
y=lowcase(x);
put y;
PROCEDURES
SORT procedure:
The general form of the simple PROC SORT step is
PROC SORT DATA=SAS-data-set
OUT=SAS-data-set;
BY BY-variable(s);
RUN;
where
the DATA= option names the data set to be read
the OUT= option creates an output data set containing the data
in sorted order; if OUT= is omitted, the sorted data replace the
original data set
BY-variable(s) in the required BY statement specifies one or
more variables whose values are used to order the data
Example of the simple SORT procedure
proc sort data=clinic.admit out=wgtadmit;
by weight;
run;
In the above example, the data set wgtadmit is sorted by the
weight of the patient. The following PROC PRINT step lists
the sorted data set.
proc print data=wgtadmit;
run;
By default, PROC SORT arranges observations in ascending order of
the BY variables.
The following example sorts the data set wgtadmit in
descending order of weight and lists the result.
proc sort data = wgtadmit;
by descending weight;
run;
proc print data=wgtadmit;
run;
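The DESCENDING option applies only to the variable that immediately follows it, so with several BY variables the directions can be mixed. The small data set below is hypothetical, made up to keep the sketch self-contained.

```sas
data patients;             /* hypothetical data */
   input id sex $ weight;
   datalines;
1 F 150
2 M 180
3 F 120
4 M 200
;

proc sort data=patients out=sexwgt;
   by sex descending weight;   /* Sex ascending, then Weight descending within each Sex */
run;

proc print data=sexwgt;
run;
```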
General form of the PRINT Procedure:
PROC PRINT <DATA=SAS-data-set>;
RUN;
where SAS-data-set is the name of the SAS data set to
be printed.
Note the following points for the default report generated
by the PRINT procedure:
all observations and variables in the data set are printed
a column for observation numbers appears on the far left
variables appear in the order that they occur in the data set
Example of the PRINT procedure:
libname clinic 'your-SAS-library';
proc print data=clinic.admit;
run;
The options used with the PRINT procedure
Customizing the OBS column header.
Specify the OBS= option in the PROC PRINT statement.
Example:
proc print data=clinic.admit obs='Patient';
run;
Removing the OBS column.
Specify the NOOBS option in the PROC PRINT
statement.
Example:
proc print data=clinic.admit noobs;
run;
Controlling the order of the variables and the number
of variables to be listed in the report using the VAR
statement.
General form of the VAR statement:
VAR variable(s);
where variable(s) is one or more variable names, separated
by blanks.
Example of the VAR statement:
proc print data=clinic.admit;
var age weight height;
run;
Controlling the observations to be printed by adding
the WHERE statement to the PROC PRINT step.
General form of WHERE statement:
WHERE where-expression;
where where-expression specifies a condition for selecting
observations.
Example of the where statement:
proc print data=clinic.admit;
var age weight height;
where age >30;
run;
Examples of compound WHERE statements:
where (age<=31 and fee>200) or height > 61;
where pulse in ('LOW','HIGH');
where pulse ='LOW' or pulse = 'HIGH';
where fee in (150,300);
Generating column totals: list the numeric variables to be summed
in a SUM statement in the PROC PRINT step.
Example:
proc print data=clinic.admit;
var age height weight fee;
where age > 30;
sum fee;
run;
Column totals appear at the end of the report in the
same format as the values of the variables.
To display more descriptive column headings in the report generated
by the PRINT procedure, use the LABEL statement together with the
LABEL option in the PROC PRINT statement. For example:
proc print data=clinic.admit label;
label Weight='Weight/Lb ';
run;
Labels can be up to 256 characters long and must be enclosed
in quotes.
Using the SPLIT= option with the PRINT
procedure to enhance the output. The split character (* below)
starts a new line within a column heading.
proc print data=clinic.admit split='*';
label Weight='Weight*Lb';
run;
Using the ID statement with the PRINT procedure to identify
observations by using the formatted values of the variables that you
list instead of by using observation numbers.
proc print data=clinic.admit split='*';
label Weight='Weight*Lb';
id fee;
run;
Using the PAGEBY statement to control page
ejects that occur before a page is full.
The general form of the PAGEBY statement is:
PAGEBY BY-variable;
where
BY-variable identifies a variable appearing in the BY
statement in the PROC PRINT step. If the value of the
BY variable changes, or if the value of any BY
variable that precedes it in the BY statement
changes, PROC PRINT begins printing a new page.
Using the SUMBY statement to limit the number of sums that appear in the
report.
The general form of the SUMBY statement is:
SUMBY BY-variable;
where
BY-variable identifies a variable that appears in the BY statement in
the PROC PRINT step. If the value of the BY variable changes, or if the
value of any BY variable that precedes it in the BY statement changes,
PROC PRINT prints the sums of all variables listed in the SUM
statement.
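PAGEBY and SUMBY each name a variable from the BY statement. The sketch below uses a small hypothetical data set with two BY variables, Sex and ActLevel, so that page breaks and Fee subtotals occur only when the first of them changes.

```sas
data visits;                 /* hypothetical data */
   input sex $ actlevel $ fee;
   datalines;
F HIGH 100
F LOW  150
M HIGH 200
M LOW  250
;

proc sort data=visits;       /* BY processing requires sorted data */
   by sex actlevel;
run;

proc print data=visits;
   by sex actlevel;
   pageby sex;               /* new page only when Sex changes */
   sum fee;
   sumby sex;                /* Fee subtotals only when Sex changes, not at each ActLevel change */
run;
```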
The FREQ procedure is a descriptive as well as a statistical
procedure that produces one-way to n-way frequency and cross
tabulation tables. It can also compute measures of association and
of agreement, and organize output by stratification of the variables.
General form of the basic FREQ procedure
PROC FREQ <DATA=SAS-data-set>;
RUN;
where SAS-data-set names the data set to be used.
By default, PROC FREQ creates a one-way table with the
frequency, percent, cumulative frequency, and cumulative
percent of every value of all variables in a data set.
Frequency gives the number of observations with the value.
Percent gives the frequency of the value divided by the total number
of observations.
Cumulative frequency gives the sum of the frequency counts of the
value and all other values listed above it in the table.
Cumulative percent gives the cumulative frequency of the value
divided by the total number of observations.
The FREQ procedure creates frequency tables for every variable in your
data set by default.
Frequency distributions work best with variables that contain repeating
values.
We use the TABLES statement to specify the variables in the FREQ
procedure.
The general form of the TABLES statement is
TABLES variable(s);
where variable(s) lists the variables to include.
Consider the following DATA step. The dataset created by this
DATA step will be used for running the various PROC FREQ
examples.
data temp;
input age height weight sex $;
datalines;
65 72 160 F
66 76 152 F
90 56 123 M
90 66 137 F
87 90 144 F
;
Consider the following example:
Proc Freq data = temp;
Tables sex age;
run;
Age   Frequency   Percent   Cumulative   Cumulative
                            Frequency    Percent
 65       1        20.00        1           20.00
 66       1        20.00        2           40.00
 87       1        20.00        3           60.00
 90       2        40.00        5          100.00
By default, the FREQ procedure displays frequency distributions in the
order of each variable's unformatted values. To control the order in
which the FREQ procedure creates the distribution, specify the
ORDER= option in the PROC FREQ statement.
The general form of the ORDER= option is as follows:
ORDER=DATA|FORMATTED|FREQ|INTERNAL
where
DATA orders values by appearance in the data set
FORMATTED orders by formatted value
FREQ orders values by descending frequency count
INTERNAL orders by unformatted value (default).
ORDER= option does not apply to missing values, which are always ordered
first.
Example of the FREQ procedure with ORDER=FORMATTED:
data temp;
input age height weight sex $;
datalines;
65 72 160 F
66 76 152 F
90 56 123 M
90 66 137 F
87 90 144 F
;
proc format;
value agfmt low-65 ='Average'
66 -87 ='Old'
88 -95 ='Very Old';
run;
proc freq data = temp order = formatted;
tables age;
format age agfmt.;
run;
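For comparison, ORDER=FREQ lists values by descending frequency count. This sketch assumes the temp data set created earlier is still available in the session.

```sas
proc freq data=temp order=freq;
   tables age;   /* age 90 (frequency 2) is listed before the ages that occur once */
run;
```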
For a frequency analysis of more than two variables, use PROC
FREQ to create n-way crosstabulations. A series of two-way tables
will result, with a table for each level of the other variables.
Proc Freq data = temp;
Tables sex*height*weight;
run;
The order of the variables is important. In n-way tables, the last two
variables of the TABLES statement become the two-way rows and
columns. Variables preceding the last two in the TABLES statement
stratify the crosstabulation tables.
To generate list output for crosstabulations, add a slash (/) and the
LIST option to the TABLES statement in your PROC FREQ step.
TABLES variable-1*variable-2 <* ... variable-n> / LIST;
Proc Freq data = temp;
Tables sex*height*weight/ LIST;
run;
By default, each cell of a two-way crosstabulation table shows the
frequency, percent, row percent, and column percent. You can use
options to suppress any of these statistics. To control the depth of
crosstabulation results, add any combination of the following options
to the TABLES statement:
NOFREQ suppresses cell frequencies
NOPERCENT suppresses cell percentages
NOROW suppresses row percentages
NOCOL suppresses column percentages
Proc Freq data = temp;
Tables sex*height*weight/ NOFREQ;
run;
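The options can be combined after the slash. For example, this sketch (again assuming the temp data set from earlier) keeps the cell frequencies and overall percentages but drops the row and column percentages.

```sas
proc freq data=temp;
   tables sex*age / norow nocol;   /* suppress row and column percentages */
run;
```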
PROC MEANS
In its simplest form, PROC MEANS prints the n-count (number of
non-missing values), mean, standard deviation, and minimum
and maximum values of every numeric variable in a data set.
General form of PROC MEANS:
PROC MEANS <DATA=SAS-data-set>
<statistic-keyword(s)> <option(s)>;
RUN;
where
SAS-data-set identifies the data set to process
statistic-keyword(s) specifies the statistics to compute
option(s) control the content, analysis, and appearance of
output.
The following is an example of a PROC MEANS step:
PROC MEANS Data = Temp;
Run;
To limit the statistics computed or to specify a different
statistic to be computed include statistic keywords as
PROC MEANS Options. When a statistic is specified in
the PROC MEANS statement, default statistics are not
produced.
PROC MEANS Data = Temp median range;
Run;
By default, PROC MEANS output uses the BEST. format. This can
result in unnecessary decimal places, making the output hard to
read.
To limit decimal places, use the MAXDEC= option in the PROC
MEANS statement and set it equal to the number of decimal places to display.
PROC MEANS Data = Temp median range MAXDEC = 0;
Run;
By default, the MEANS procedure generates statistics for every
numeric variable in a data set.
To specify the variables that PROC MEANS analyzes, add a VAR
statement and list the variable names.
PROC MEANS Data = Temp median range MAXDEC = 0;
VAR Height Weight;
Run;
To produce separate analyses of grouped observations,
add a CLASS statement to the MEANS procedure.
PROC MEANS Data = Temp median range
MAXDEC = 0;
VAR Height Weight;
CLASS Age;
Run;
PROC MEANS will not generate statistics for CLASS
variables because their values are only used to
categorize data. Thus, CLASS variables can be either
character or numeric.
Like the CLASS statement, the BY statement specifies
variables to use for categorizing observations. Because BY
processing requires sorted data, Temp is sorted by Age first:
PROC SORT Data = Temp;
BY Age;
Run;
PROC MEANS Data = Temp median range MAXDEC = 0;
VAR Height Weight;
BY Age;
Run;
The BY statement and the CLASS statement differ in
the following ways:
Unlike CLASS processing, BY processing requires that your data
already be sorted in the order of the BY variables.
BY group results have a layout that is different from that of
CLASS group results. The BY statement creates a separate
table for each BY group, whereas a CLASS statement
produces a single large table.
To write statistics to a new data set, use the OUTPUT statement with
the OUT= option. The NOPRINT option suppresses the printed report.
PROC MEANS DATA = TEMP noprint;
Output out = mean12 mean =chmean max = chmax min = chmin;
run;
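Naming the analysis variable in a VAR statement makes it clear which variable each output name refers to. The sketch below repeats the temp DATA step so that it runs on its own; the names wgtstats, avgwgt, maxwgt, and minwgt are illustrative. The OUT= data set also contains the automatic variables _TYPE_ and _FREQ_.

```sas
data temp;
   input age height weight sex $;
   datalines;
65 72 160 F
66 76 152 F
90 56 123 M
90 66 137 F
87 90 144 F
;

proc means data=temp noprint;
   var weight;                                            /* one analysis variable */
   output out=wgtstats mean=avgwgt max=maxwgt min=minwgt; /* 716/5 = 143.2, 160, 123 */
run;

proc print data=wgtstats;
run;
```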
