Calculating Summary Statistics for Numeric Variables
The MEANS procedure displays simple descriptive statistics for the
numeric variables in a SAS data set.
Example:
proc means data=mylib.crew;
title 'Salary Analysis';
run;
Calculating Summary Statistics
[Output: PROC MEANS listing titled 'Salary Analysis']
Calculating Summary Statistics
By default, PROC MEANS
- analyzes every numeric variable in the SAS data set
- prints the statistics N, MEAN, STD, MIN, and MAX
- excludes missing values before calculating statistics.
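The handling of missing values can be checked with a small sketch. The data set work.demo and its three rows are hypothetical, made up only for this illustration. N counts the nonmissing values and NMISS counts the missing ones, so the mean should be based on the two nonmissing salaries only (30000), not on all three observations.

```sas
/* Hypothetical demo data: three employees, one with a missing salary */
data work.demo;
   input EmpID $ Salary;
   datalines;
E001 20000
E002 .
E003 40000
;
run;

/* N = nonmissing count, NMISS = missing count; the mean ignores the missing value */
proc means data=work.demo n nmiss mean;
   var Salary;
run;
```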
Specifying summary statistics to be computed
PROC MEANS data=mylib.crew mean median range std;
run;
Selecting Variables
The VAR statement restricts the
variables processed by PROC MEANS.
General form of the VAR statement:
VAR SAS-variable(s);
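As a minimal sketch of the VAR statement, the step below limits the analysis to Salary in mylib.crew. Without VAR, every numeric variable in the data set would be summarized, which is rarely meaningful for identifiers or dates.

```sas
/* Summarize Salary only; other numeric variables are ignored */
proc means data=mylib.crew;
   var Salary;
   title 'Salary Analysis';
run;
```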
Selecting Variables
Mylib.crew
HireDate   LastName    FirstName    Location   Phone  EmpID   JobCode  Salary
07NOV1992  BEAUMONT    SALLY T.     LONDON     1132   E00525  PILOT1   72000
12MAY1985  BERGAMASCO  CHRISTOPHER  CARY       1151   E02466  FLTAT3   41000
04AUG1988  BETHEA      BARBARA ANN  FRANKFURT  1163   E00802  PILOT2   81000
General form of the CLASS statement:
CLASS SAS-variable(s);
Grouping Observations
Grouping Observations using the CLASS statement
Salary by Job Code
Job Code  N Obs   N      Mean  Std Dev   Minimum   Maximum
FLTAT1       14  14  25642.86  2951.07  21000.00  30000.00
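A report like the one above could be produced with a step along these lines. This is a sketch, not necessarily the original program: the MAXDEC=2 option is an assumption made here to match the two decimal places in the listing.

```sas
/* One row of statistics per JobCode value; no sorting is required for CLASS */
proc means data=mylib.crew maxdec=2;   /* maxdec=2 is assumed, to match the listing */
   class JobCode;
   var Salary;
   title 'Salary by Job Code';
run;
```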
Exercise
1. Write a program to read the diabetes data set and use PROC MEANS to
   produce summary statistics for the variables Age, Height, and Weight.
   Run the program and see the results.
2. Produce the summary statistics N, mean, median, max, min, std, and
   range, and set the number of decimal places to two with MAXDEC=2.
   Run the program and see the results.
3. Add the CLASS statement to produce summary results for each sex.
   Run the program and see the results.
4. Practice using the BY statement for each sex. Before you add the BY
   SEX statement, make sure you sort the data by SEX.
   Run the program and see the results.
5. Add a WHERE statement to the program to select cases with AGE > 30.
   Run the program and see the results.
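The BY and WHERE steps follow a general pattern that can be sketched on mylib.crew instead of the diabetes data, so the exercise itself is left open. The output data set work.crew_sorted and the 30000 cutoff are hypothetical choices for this illustration.

```sas
/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=mylib.crew out=work.crew_sorted;
   by JobCode;
run;

/* WHERE filters observations before the statistics are computed */
proc means data=work.crew_sorted n mean median max min std range maxdec=2;
   where Salary > 30000;   /* hypothetical cutoff */
   by JobCode;
   var Salary;
run;
```

Unlike CLASS, which prints one table with a row per group, BY prints a separate table for each group.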
Create data set for summary
statistics in PROC MEANS
On many occasions, we may want to create a SAS
data set consisting of the summary statistics
calculated by PROC MEANS.
General form of the OUTPUT statement:
OUTPUT OUT=sas-data-set
summary-keyword(s)=variablename(s);
NOTE: Summary keywords include Mean, Min, Max,
Range, Std, etc.
The variable names are the names you want to give
each summary statistic for each analysis variable,
listed in the same order as the variables in the
VAR statement.
Create Summary Data Set
using PROC MEANS
Example:
PROC MEANS data=mylib.crew;
VAR HireDate Salary;
OUTPUT OUT=mylib.discrip
mean=avghiredate avgsalary
median=medhiredate medsalary;
RUN;
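To inspect what the OUTPUT statement writes, the data set can be printed. The sketch below repeats the example with two assumptions: the result goes to a hypothetical WORK data set (work.descrip) rather than the permanent library, and NOPRINT is added to suppress the usual listing. Besides the requested statistics, the output data set also contains the automatic variables _TYPE_ and _FREQ_.

```sas
/* NOPRINT suppresses the printed report; only the data set is created */
proc means data=mylib.crew noprint;
   var HireDate Salary;
   output out=work.descrip              /* hypothetical data set name */
          mean=avghiredate avgsalary
          median=medhiredate medsalary;
run;

/* Display the summary data set; date9. makes the date statistics readable */
proc print data=work.descrip;
   format avghiredate medhiredate date9.;
run;
```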
Exercise
Revise the following program to do the following task:
Use the OUTPUT statement with the OUT= option to save the
summary statistics Mean, Median, and Std to a SAS
data set named dia_summary, then print this data set to see
what it contains.
The SUMMARY procedure
PROC SUMMARY uses the same
syntax as PROC MEANS.
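One practical difference is that PROC SUMMARY prints nothing by default (the PRINT option turns the listing on), so it is typically paired with an OUTPUT statement. A minimal sketch, in which the data set name work.salary_by_job and the variable names avgsalary and stdsalary are hypothetical:

```sas
/* Same statements as PROC MEANS, but no printed report unless PRINT is specified */
proc summary data=mylib.crew;
   class JobCode;
   var Salary;
   output out=work.salary_by_job mean=avgsalary std=stdsalary;
run;

/* The row with _TYPE_=0 is the overall summary; _TYPE_=1 rows are per JobCode */
proc print data=work.salary_by_job;
run;
```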
PROC FREQ Output
Distribution of Job Code Values
Goal Report 1
International Airlines wants to know
how many employees are in each job
code.
Distribution of Job Code Values
Goal Report 2
Categorize job code and salary values to determine
how many employees fall into each group.
Salary Distribution by Job Codes
JobCode           Salary
Frequency
Percent
Row Pct
Col Pct             Less than  25,000 to  More than      Total
                       25,000     50,000     50,000
Flight Attendant            5         39          0         44
                         7.25      56.52       0.00      63.77
                        11.36      88.64       0.00
                       100.00     100.00       0.00
Pilot                       0          0         25         25
                         0.00       0.00      36.23      36.23
                         0.00       0.00     100.00
                         0.00       0.00     100.00
Total                       5         39         25         69
                         7.25      56.52      36.23     100.00
Creating a Frequency Report
PROC FREQ displays frequency counts of
the data values in a SAS data set.
General form of a simple PROC FREQ step:
PROC FREQ DATA=SAS-data-set;
RUN;
Example:
proc freq data=mylib.crew;
run;
Creating a Frequency Report
By default, PROC FREQ
- analyzes every variable in the SAS data set
- displays each distinct data value
- calculates the number of observations in which each data value
  appears (and the corresponding percentage)
- indicates for each variable how many observations have missing values.
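Because the default analyzes every variable, including ones such as EmpID where nearly every value is distinct, it is usually better to name the variables of interest in a TABLES statement. A minimal sketch for the job code counts:

```sas
/* One-way frequency table for JobCode only */
proc freq data=mylib.crew;
   tables JobCode;
   title 'Distribution of Job Code Values';
run;
```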
Default Frequency Reports
mylib.crew
Using PROC FORMAT to redefine
Categories of Values in TABLES
statement
International Airlines wants to use formats
to categorize the flight crew by job code.
Stored values            Formatted values
PILOT1, PILOT2, PILOT3   Pilot
FLTAT1, FLTAT2, FLTAT3   Flight Attendant
Analyzing Categories of
Values
proc format;
value $codefmt
'FLTAT1'-'FLTAT3'='Flight Attendant'
'PILOT1'-'PILOT3'='Pilot';
run;
proc freq data = mylib.crew;
format JobCode $codefmt.;
tables JobCode;
run;
NOTE: The original data values for JobCode are not changed.
They are still FLTAT1, FLTAT2, and so on.
Analyzing Categories of
Values
                                  Cumulative  Cumulative
JobCode     Frequency   Percent    Frequency     Percent
Crosstabular Frequency
Reports
A two-way, or crosstabular, frequency
report analyzes all possible combinations
of the distinct values of two variables.
The asterisk (*) operator in the TABLES
statement is used to cross variables.
General form of the FREQ procedure to
create a crosstabular report:
PROC FREQ DATA=SAS-data-set;
TABLES variable1*variable2;
RUN;
variable1 defines the rows and variable2 defines the columns of the table.
Crosstabular Frequency
Reports
proc format;
value $codefmt
'FLTAT1'-'FLTAT3'='Flight Attendant'
'PILOT1'-'PILOT3'='Pilot';
value money
low-<25000 ='Less than 25,000'
25000-50000='25,000 to 50,000'
50000<-high='More than 50,000';
run;
proc freq data=mylib.crew;
tables JobCode*Salary;
format JobCode $codefmt. Salary money.;
title 'Salary Distribution by Job Codes';
run;
Crosstabular Frequency
Reports
Salary Distribution by Job Codes
JobCode           Salary
Frequency
Percent
Row Pct
Col Pct             Less than  25,000 to  More than      Total
                       25,000     50,000     50,000
Flight Attendant            5         39          0         44
                         7.25      56.52       0.00      63.77
                        11.36      88.64       0.00
                       100.00     100.00       0.00
Pilot                       0          0         25         25
                         0.00       0.00      36.23      36.23
                         0.00       0.00     100.00
                         0.00       0.00     100.00
Total                       5         39         25         69
                         7.25      56.52      36.23     100.00
Additional Syntax for the TABLES
statement in PROC FREQ
TABLES statement          Equivalent to
tables A*(B C);           tables A*B A*C;
tables (A B)*(C D);       tables A*C B*C A*D B*D;
tables (A B C)*D;         tables A*D B*D C*D;
tables A--C;              tables A B C;
tables (A--C)*D;          tables A*D B*D C*D;
NOTE: A--C selects every variable from A through C in data set order;
the equivalents shown assume B is the only variable between A and C.
TABLES A*B*C;
Produces separate two-way tables of B*C
for each value of A.
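The grouping syntax can be tried on mylib.crew. In the sketch below, the choice of Location as the row variable is an assumption made for illustration, and the money format is defined so that Salary is grouped into ranges instead of producing one column per distinct salary.

```sas
/* Group salaries into three ranges */
proc format;
   value money
      low-<25000  = 'Less than 25,000'
      25000-50000 = '25,000 to 50,000'
      50000<-high = 'More than 50,000';
run;

/* Same as: tables Location*JobCode Location*Salary; */
proc freq data=mylib.crew;
   tables Location*(JobCode Salary);
   format Salary money.;
run;
```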
To suppress some columns
in the PROC FREQ summary report
PROC FREQ;
TABLES var1*var2 / <OPTIONS>;
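For a crosstabulation, commonly used TABLES options include NOFREQ, NOPERCENT, NOROW, and NOCOL, each of which removes one of the four numbers printed in every cell. A sketch that keeps only the frequency counts for the job code by salary table:

```sas
/* Formats repeated here so the step runs on its own */
proc format;
   value $codefmt
      'FLTAT1'-'FLTAT3' = 'Flight Attendant'
      'PILOT1'-'PILOT3' = 'Pilot';
   value money
      low-<25000  = 'Less than 25,000'
      25000-50000 = '25,000 to 50,000'
      50000<-high = 'More than 50,000';
run;

/* Suppress overall, row, and column percentages; only counts remain */
proc freq data=mylib.crew;
   tables JobCode*Salary / nopercent norow nocol;
   format JobCode $codefmt. Salary money.;
   title 'Salary Distribution by Job Codes';
run;
```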
proc format;
value pulft
LOW-70 = 'Low'
71-High = 'High';
run;
data diab;
set mylib.diabetes;
run;