Chapter 9

Producing Descriptive Statistics


PROC MEANS;
Summarize descriptive statistics for
continuous numeric variables.
PROC FREQ;
Summarize frequency tables for
discrete numeric variables or
categorical variables.
Objectives
Compute statistical summaries such as
mean, median, std, min, max, and so
on for numeric continuous variables
Control # of decimals for reporting the
summary statistics
Difference between PROC MEANS and
PROC SUMMARY procedures.
Create one-way frequency table
Create two-way and n-way cross-frequency tables
PROC MEANS Output
Salary by Job Code

The MEANS Procedure

Analysis Variable : Salary

Job        N
Code     Obs     N        Mean     Std Dev     Minimum      Maximum

FLTAT1    14    14    25642.86     2951.07    21000.00     30000.00
FLTAT2    18    18    35111.11     1906.30    32000.00     38000.00
FLTAT3    12    12    44250.00     2301.19    41000.00     48000.00
PILOT1     8     8    69500.00     2976.10    65000.00     73000.00
PILOT2     9     9    80111.11     3756.48    75000.00     86000.00
PILOT3     8     8    99875.00     7623.98    92000.00    112000.00

Calculating Summary Statistics for Numeric Variables
The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set.

General form of a simple PROC MEANS step:

PROC MEANS DATA=SAS-data-set;
RUN;

Example:
proc means data=mylib.crew;
title 'Salary Analysis';
run;
Calculating Summary
Statistics
Salary Analysis

The MEANS Procedure

Variable N Mean Std Dev Minimum Maximum

HireDate 69 9812.78 1615.44 7318.00 12690.00


Salary 69 52144.93 25521.78 21000.00 112000.00

NOTE: By default, PROC MEANS computes summary statistics for every numeric variable. For some variables, such as HireDate (a numeric SAS date value), these statistics are not meaningful.

Calculating Summary
Statistics
By default, PROC MEANS
analyzes every numeric variable in the
SAS data set
prints the statistics N, MEAN, STD, MIN,
and MAX
excludes missing values before
calculating statistics.

Specifying summary statistics to be
computed
PROC MEANS data = mylib.crew mean
median range std ;

To specify the summary statistics to be computed, add their keywords to the PROC MEANS statement as options.

Limiting Decimal Places
By default, PROC MEANS uses the BEST. format to display values in the report, which can show many decimal places, such as 52.000000.
To limit the report to k decimal places:

PROC MEANS DATA=mylib.crew MAXDEC=k;

MAXDEC=2 displays two decimal places in the report.
MAXDEC=0 displays no decimal places in the report.
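
As a minimal sketch combining statistic keywords with MAXDEC=, the step below requests specific statistics and limits the display to two decimal places. It assumes the mylib libref is already assigned and that mylib.crew contains the numeric variable Salary, as in the other examples in this chapter.

```sas
/* Sketch: choose the statistics and limit the displayed decimals. */
/* Assumes libref MYLIB is assigned and MYLIB.CREW exists.         */
proc means data=mylib.crew n mean median std range maxdec=2;
   var Salary;                 /* analyze only Salary */
   title 'Salary Analysis';
run;
```

Only the keywords listed on the PROC MEANS statement appear in the report, so the default MIN and MAX columns are dropped here because they were not requested.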

Selecting Variables
The VAR statement restricts the variables processed by PROC MEANS.
General form of the VAR statement:

VAR SAS-variable(s);

Selecting Variables
Mylib.crew
HireDate LastName FirstName Location Phone EmpID JobCode Salary
07NOV1992 BEAUMONT SALLY T. LONDON 1132 E00525 PILOT1 72000
12MAY1985 BERGAMASCO CHRISTOPHER CARY 1151 E02466 FLTAT3 41000
04AUG1988 BETHEA BARBARA ANN FRANKFURT 1163 E00802 PILOT2 81000

proc means data=Mylib.crew;


var Salary;
title 'Salary Analysis';
run;
Salary Analysis

The MEANS Procedure

Analysis Variable : Salary

 N        Mean     Std Dev     Minimum      Maximum

69    52144.93    25521.78    21000.00    112000.00

Grouping Observations
Using CLASS statement
The CLASS statement in the MEANS
procedure groups the observations of
the SAS data set for analysis.

General form of the CLASS statement:

CLASS SAS-variable(s);

Grouping Observations
Mylib.crew
HireDate LastName FirstName Location Phone EmpID JobCode Salary
07NOV1992 BEAUMONT SALLY T. LONDON 1132 E00525 PILOT1 72000
12MAY1985 BERGAMASCO CHRISTOPHER CARY 1151 E02466 FLTAT3 41000
04AUG1988 BETHEA BARBARA ANN FRANKFURT 1163 E00802 PILOT2 81000

proc means data=mylib.crew maxdec=2;


var Salary;
class JobCode;
title 'Salary by Job Code';
run;

NOTE: The MAXDEC= option controls the number of decimal places displayed in the output.

Grouping Observations using CLASS
statement
Salary by Job Code

The MEANS Procedure

Analysis Variable : Salary

Job        N
Code     Obs     N        Mean     Std Dev     Minimum      Maximum

FLTAT1    14    14    25642.86     2951.07    21000.00     30000.00
FLTAT2    18    18    35111.11     1906.30    32000.00     38000.00
FLTAT3    12    12    44250.00     2301.19    41000.00     48000.00
PILOT1     8     8    69500.00     2976.10    65000.00     73000.00
PILOT2     9     9    80111.11     3756.48    75000.00     86000.00
PILOT3     8     8    99875.00     7623.98    92000.00    112000.00


Results due to CLASS
Statement
The summary is displayed in the order of the categories of the CLASS variable.
Variables in the CLASS statement can be character or numeric. Do not use a continuous numeric variable in the CLASS statement, because each distinct value would form its own group.
If there are two or more variables in the CLASS statement, their order in the statement determines the order of the groups in the output report.
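
To illustrate the effect of listing two CLASS variables, here is a small sketch. It assumes mylib.crew is available and uses its Location and JobCode variables as the grouping variables; the title text is only illustrative.

```sas
/* Sketch: two CLASS variables. Groups are reported for each       */
/* combination of Location and JobCode, with Location (listed      */
/* first) as the outer grouping level.                             */
proc means data=mylib.crew mean min max maxdec=0;
   class Location JobCode;
   var Salary;
   title 'Salary by Location and Job Code';
run;
```

Swapping the two names in the CLASS statement would make JobCode the outer level instead.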
PROC MEANS Procedure Using BY Statement
PROC MEANS;
VAR variable-list;
BY variable;
RUN;

When using the BY statement, the data set MUST first be sorted by the variables in the BY statement (ascending order by default) using PROC SORT.

The result of using the BY statement is displayed as separate tables, one for each category of the variable in the BY statement.

If there are two or more variables in the BY statement, their order determines the order of the displayed tables in the report.
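
The sort-then-summarize sequence described above can be sketched as follows. The output data set name work.crew_sorted is a hypothetical name chosen for this illustration; mylib.crew is assumed to exist as in the earlier examples.

```sas
/* Step 1: sort by the BY variable. Writing to a WORK data set   */
/* (hypothetical name) leaves the original data set unchanged.   */
proc sort data=mylib.crew out=work.crew_sorted;
   by JobCode;
run;

/* Step 2: one separate table is printed per JobCode value. */
proc means data=work.crew_sorted maxdec=2;
   by JobCode;
   var Salary;
   title 'Salary by Job Code (BY-group processing)';
run;
```

Unlike the CLASS statement, which produces a single table and needs no sorting, the BY statement produces one table per group and requires the sorted input.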

Exercise
Write a program to read the diabetes data set and use PROC MEANS to produce summary statistics for the variables Age, Height, and Weight.
Run the program and see the results.
Produce the summary statistics N, mean, median, max, min, std, and range, and set the number of decimal places to two with MAXDEC=2.
Run the program and see the results.
Add the CLASS statement to produce summary results for each sex.
Run the program and see the results.
Practice using the BY statement for each sex. Before you add the BY SEX statement, make sure you sort the data by SEX.
Run the program and see the results.
Add a WHERE statement to the program to select cases with AGE > 30.
Run the program and see the results.

Create data set for summary statistics in PROC MEANS
On many occasions, we may want to create a SAS data set consisting of the summary statistics calculated by PROC MEANS.

OUTPUT OUT=SAS-data-set
summary-keyword(s)=variable-name(s);
NOTE: Summary keywords are MEAN, MIN, MAX, RANGE, STD, etc.
The variable names are the names you want to give each summary statistic for each variable, listed in the same order as the variables in the VAR statement.

Create Summary Data Set
using PROC MEANS
Examples:
PROC MEANS data= mylib.crew;
VAR Hiredate salary;
OUTPUT OUT = mylib.discrip
mean = avghiredate avgsalary
Median= medhiredate medsalary;
Run;
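
As a further sketch, the OUTPUT statement can be combined with a CLASS statement and the result inspected with PROC PRINT. The data set name work.salary_stats and the variable names AvgSalary and MedSalary are hypothetical names chosen for this illustration.

```sas
/* Sketch: save per-group statistics to a data set, then print it. */
/* NOPRINT suppresses the usual PROC MEANS report.                  */
proc means data=mylib.crew noprint;
   class JobCode;
   var Salary;
   output out=work.salary_stats     /* hypothetical data set name */
          mean=AvgSalary
          median=MedSalary;
run;

proc print data=work.salary_stats;
   title 'Contents of the Summary Data Set';
run;
```

Besides the requested statistics, the output data set contains the automatic variables _TYPE_ and _FREQ_. With one CLASS variable, the observation with _TYPE_=0 summarizes all observations and the observations with _TYPE_=1 summarize each JobCode value.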

Exercise
Revise the following program to do the following task:
Use the OUTPUT OUT= statement to save the summary statistics mean, median, and std to a SAS data set named dia_summary, then print this data set to see what it contains.


PROC MEANS data = mylib.diabetes maxdec =2 ;


var age height weight;
class sex;
run;

PROC SUMMARY procedure
PROC SUMMARY uses the same syntax as PROC MEANS.

PROC SUMMARY does not produce a report by default. To produce the report, add the PRINT option:

PROC SUMMARY DATA=SAS-data-set PRINT;

When do we use PROC SUMMARY?
If you only want to produce the summary and save it to a SAS data set, you can use PROC SUMMARY. Alternatively, you can use the NOPRINT option in PROC MEANS.
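
The two uses of PROC SUMMARY described above can be sketched side by side. The data set name work.job_summary and the variable names AvgSalary and StdSalary are hypothetical; mylib.crew is assumed to exist as in the earlier examples.

```sas
/* Sketch 1: no report is printed; only the data set is created. */
proc summary data=mylib.crew;
   class JobCode;
   var Salary;
   output out=work.job_summary      /* hypothetical data set name */
          mean=AvgSalary
          std=StdSalary;
run;

/* Sketch 2: the PRINT option produces a report like PROC MEANS. */
proc summary data=mylib.crew print;
   class JobCode;
   var Salary;
   title 'Salary by Job Code (PROC SUMMARY with PRINT)';
run;
```

Note that a PROC SUMMARY step needs either the PRINT option or an OUTPUT statement; otherwise it has nothing to produce.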
PROC FREQ Procedure: Objectives
Generate simple descriptive statistics using the MEANS procedure.
Group observations of a SAS data set for analysis using the CLASS statement in the MEANS procedure.
Create one-way and two-way frequency tables using the FREQ procedure.
Restrict the variables processed by the FREQ procedure.

PROC FREQ Output
Distribution of Job Code Values

The FREQ Procedure

Job                              Cumulative    Cumulative
Code      Frequency    Percent    Frequency       Percent

FLTAT1           14      20.29           14         20.29
FLTAT2           18      26.09           32         46.38
FLTAT3           12      17.39           44         63.77
PILOT1            8      11.59           52         75.36
PILOT2            9      13.04           61         88.41
PILOT3            8      11.59           69        100.00

Goal Report 1
International Airlines wants to know
how many employees are in each job
code.
Distribution of Job Code Values

The FREQ Procedure

Job                              Cumulative    Cumulative
Code      Frequency    Percent    Frequency       Percent

FLTAT1           14      20.29           14         20.29
FLTAT2           18      26.09           32         46.38
FLTAT3           12      17.39           44         63.77
PILOT1            8      11.59           52         75.36
PILOT2            9      13.04           61         88.41
PILOT3            8      11.59           69        100.00

Goal Report 2
Categorize job code and salary values to determine
how many employees fall into each group.
Salary Distribution by Job Codes

The FREQ Procedure

Table of JobCode by Salary

JobCode             Salary

Frequency
Percent
Row Pct             Less than    25,000 to    More than
Col Pct                25,000       50,000       50,000      Total

Flight Attendant            5           39            0         44
                         7.25        56.52         0.00      63.77
                        11.36        88.64         0.00
                       100.00       100.00         0.00

Pilot                       0            0           25         25
                         0.00         0.00        36.23      36.23
                         0.00         0.00       100.00
                         0.00         0.00       100.00

Total                       5           39           25         69
                         7.25        56.52        36.23     100.00
Creating a Frequency Report
PROC FREQ displays frequency counts of
the data values in a SAS data set.
General form of a simple PROC FREQ step:

PROC FREQ DATA=SAS-data-set;
RUN;
Example:
proc freq data=mylib.crew;
run;

Creating a Frequency Report
By default, PROC FREQ
analyzes every variable in the SAS
data set
displays each distinct data value
calculates the number of observations in
which each data value appears (and the
corresponding percentage)
indicates for each variable how many
observations have missing values.
Default Frequency Reports
proc freq data=mylib.crew;
run;

With no TABLES statement, this step produces one frequency table for each variable in mylib.crew:
Distribution of HireDate
Distribution of LastName
Distribution of FirstName
Distribution of Location
Distribution of Phone
Distribution of EmpID
Distribution of JobCode
Distribution of Salary

One-Way Frequency Report
Use the TABLES statement to limit the
variables included in the frequency counts.
These are typically variables that have a
limited number of distinct values.
General form of a PROC FREQ step with a
TABLES statement:
PROC FREQ DATA=SAS-data-set;
TABLES SAS-variables / NOCUM;
RUN;

The NOCUM option in the TABLES statement suppresses the cumulative frequency and cumulative percentage columns.
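
A minimal sketch of the TABLES statement with the NOCUM option follows. It assumes mylib.crew is available and requests one-way tables for its JobCode and Location variables; the title text is only illustrative.

```sas
/* Sketch: one-way tables for two variables, without the   */
/* cumulative frequency and cumulative percent columns.    */
proc freq data=mylib.crew;
   tables JobCode Location / nocum;
   title 'Job Code and Location Counts';
run;
```

Each variable listed in the TABLES statement gets its own one-way table, and options after the slash apply to every table requested in that statement.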
Creating a Frequency Report
proc freq data=mylib.crew;
tables JobCode;
title 'Distribution of Job Code Values';
run;
Distribution of Job Code Values

The FREQ Procedure

Job                              Cumulative    Cumulative
Code      Frequency    Percent    Frequency       Percent

FLTAT1           14      20.29           14         20.29
FLTAT2           18      26.09           32         46.38
FLTAT3           12      17.39           44         63.77
PILOT1            8      11.59           52         75.36
PILOT2            9      13.04           61         88.41
PILOT3            8      11.59           69        100.00

Using PROC FORMAT to redefine
Categories of Values in TABLES
statement
International Airlines wants to use formats
to categorize the flight crew by job code.
Stored values              Formatted values
PILOT1, PILOT2, PILOT3     Pilot
FLTAT1, FLTAT2, FLTAT3     Flight Attendant

Analyzing Categories of
Values
proc format;
value $codefmt
'FLTAT1'-'FLTAT3'='Flight Attendant'
'PILOT1'-'PILOT3'='Pilot';
run;
proc freq data = mylib.crew;
format JobCode $codefmt.;
tables JobCode;
run;

NOTE: The original data values for JobCode are not changed. They are still FLTAT1, FLTAT2, and so on.

Analyzing Categories of
Values

Distribution of Job Code Values

The FREQ Procedure

                                             Cumulative    Cumulative
JobCode             Frequency     Percent     Frequency       Percent

Flight Attendant           44       63.77            44         63.77
Pilot                      25       36.23            69        100.00

Crosstabular Frequency Reports
A two-way, or crosstabular, frequency report analyzes all possible combinations of the distinct values of two variables.
The asterisk (*) operator in the TABLES statement is used to cross variables.
General form of the FREQ procedure to create a crosstabular report:

PROC FREQ DATA=SAS-data-set;
TABLES variable1*variable2;
RUN;

Variable1 forms the rows and variable2 forms the columns.
Crosstabular Frequency
Reports
proc format;
value $codefmt
'FLTAT1'-'FLTAT3'='Flight Attendant'
'PILOT1'-'PILOT3'='Pilot';
value money
low-<25000 ='Less than 25,000'
25000-50000='25,000 to 50,000'
50000<-high='More than 50,000';
run;
proc freq data=mylib.crew;
tables JobCode*Salary;
format JobCode $codefmt. Salary money.;
title 'Salary Distribution by Job Codes';
run;

Crosstabular Frequency Reports
Salary Distribution by Job Codes

The FREQ Procedure

Table of JobCode by Salary

JobCode             Salary

Frequency
Percent
Row Pct             Less than    25,000 to    More than
Col Pct                25,000       50,000       50,000      Total

Flight Attendant            5           39            0         44
                         7.25        56.52         0.00      63.77
                        11.36        88.64         0.00
                       100.00       100.00         0.00

Pilot                       0            0           25         25
                         0.00         0.00        36.23      36.23
                         0.00         0.00       100.00
                         0.00         0.00       100.00

Total                       5           39           25         69
                         7.25        56.52        36.23     100.00
Additional Syntax for TABLES statement in PROC FREQ

Statement                 Equivalent to
tables A*(B C);           tables A*B A*C;
tables (A B)*(C D);       tables A*C B*C A*D B*D;
tables (A B C)*D;         tables A*D B*D C*D;
tables A -- C;            tables A B C;
tables (A -- C)*D;        tables A*D B*D C*D;

TABLES A*B*C;
Produces separate two-way tables of B*C for each value of A.
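
As a sketch of the grouping shorthand applied to the crew data, the step below crosses JobCode with two other variables in a single TABLES statement. It assumes mylib.crew is available; the MONEY format is redefined here, with the same ranges used elsewhere in this chapter, only so that the example is self-contained.

```sas
/* Group Salary into three ranges so the table stays small. */
proc format;
   value money
      low-<25000  = 'Less than 25,000'
      25000-50000 = '25,000 to 50,000'
      50000<-high = 'More than 50,000';
run;

proc freq data=mylib.crew;
   /* Shorthand for:  tables JobCode*Location JobCode*Salary; */
   tables JobCode*(Location Salary);
   format Salary money.;
   title 'Job Code by Location and by Salary Range';
run;
```

This produces two separate two-way tables, each with JobCode as the row variable.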
To Suppress Some Columns in the PROC FREQ Summary Report
PROC FREQ;
TABLES var1*var2 / <options>;

Option for suppressing cell frequency: NOFREQ
Option for suppressing cell percent: NOPERCENT
Option for suppressing row percent: NOROW
Option for suppressing column percent: NOCOL
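
A short sketch of these options in use, assuming mylib.crew is available and crossing its JobCode and Location variables; the title text is only illustrative.

```sas
/* Sketch: keep only the cell counts by suppressing the overall, */
/* row, and column percentages in the crosstabulation.           */
proc freq data=mylib.crew;
   tables JobCode*Location / nopercent norow nocol;
   title 'Job Code by Location: Counts Only';
run;
```

The options can be used in any combination; for example, specifying only NOROW and NOCOL keeps both the frequency and the overall percent in each cell.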
Additional Uses of PROC FREQ
In addition to reporting tables, PROC FREQ can also conduct many statistical tests and compute measures for analyzing categorical data, such as
Chi-square test,
Cochran-Mantel-Haenszel test,
Fisher's exact test,
Kappa coefficient,
Risk, odds ratio, and so on.
These are beyond the scope of this programming course.
Exercise
The Diabetes data set consists of Sex, Age, Height, Weight, Pulse, FastGluc, and PostGluc for 20 patients. Revise the following program, using PROC FREQ, to perform the following tasks:
1. Use an IF statement to create the variable AGE_G: if AGE > 45 then AGE_G = 'Senior'; otherwise AGE_G = 'Young'. Create one-way tables for the variables SEX, AGE_G, and Pulse using the user-defined format.
Run the program and see the results.
2. Create the crosstabular tables sex*(Age_G Pulse), making sure the user-defined format is applied to the Pulse variable.
Run the program and see the results.
3. Suppress the row percent and column percent.
Run the program and see the results.

proc format;
value pulft LOW-70 = 'Low' 71-High = 'High'; run;
data diab; set mylib.diabetes; run;
