
Working with Date and Longitudinal Data

A. Introduction
B. Processing Date Variables
C. Working with Two-digit Year Values (the Y2K Problem)
D. Longitudinal Data
E. Selecting the First or Last Visit per Patient
F. Computing Differences between Observations in a Longitudinal Data Set
G. Computing the Difference between the First and Last Observation for Each Subject
H. Computing Frequencies on Longitudinal Data Sets
I. Creating Summary Data Sets with PROC MEANS or PROC SUMMARY
J. Outputting Statistics Other Than Means

A. INTRODUCTION
Working with dates is a task that data analysts freque
many powerful resources for working with dates. The
in almost any form or to compute the number of day
two dates.
Data collected for the same set of subjects at diff
longitudinal data. These data require specialized techn
seeing how date values are handled with SAS softwar

B. PROCESSING DATE VARIABLES
Suppose you want to read the following information in



Variable Name   Description         Column(s)
ID              Patient ID          1-3
DOB             Date of birth       4-13
ADMIT           Date of admission   14-23
DISCHRG         Discharge date      24-33
DX              Diagnosis           34
FEE             Hospital fee        35-39

You might be tempted to write an input statement like:

INPUT ID      1-3
      DOB     4-13
      ADMIT   14-23
      DISCHRG 24-33
      DX      34
      FEE     35-39;

However, you cannot read the 10 date digits and slashes as a number (you would get
an error). You could read the dates as character values, but you would not be able to do
any calculations on their values (at least directly). So what do we do? SAS software
includes extensive provisions for working with date and time variables. The first step in
reading date values is to tell the program how the date value is written. Common examples are:

Example      Explanation                            SAS INFORMAT

102150       Month - Day - 2-digit Year             MMDDYY6.
10211950     Month - Day - 4-digit Year             MMDDYY8.
10/21/50     Month - Day - 2-digit Year             MMDDYY8.
10/21/1950   Month - Day - 4-digit Year             MMDDYY10.
211050       Day - Month - 2-digit Year             DDMMYY6.
21101950     Day - Month - 4-digit Year             DDMMYY8.
501021       2-digit Year - Month - Day             YYMMDD6.
19501021     4-digit Year - Month - Day             YYMMDD8.
21OCT50      Day, 3-character Month, 2-digit Year   DATE7.
21OCT1950    Day, 3-character Month, 4-digit Year   DATE9.
OCT50        Month and 2-digit Year only            MONYY5.
OCT1950      Month and 4-digit Year only            MONYY7.
give the program instructions on how to read a da
MMDDYY10., for example, is used to read dates in mo
informat refers to the number of columns occupied b
space-between-the-numbers and column specificatio
data values.
To read date values, we can use pointers and in
input). A column pointer (@) first tells the program w
follow this with the variable name and a specification
ing, called an informat. Two very common informats a
informat that says to read W columns of data and to
decimal point. For example, the informat 6.1 says to
decimal point before the last digit. If you are not spec
have to specify the value to the right of the period. Th
to 6.0. The $W. informat is used to read W columns
MMDDYY10. is used for dates in the form 10/21/1950
column assignments, a valid INPUT statement would

INPUT @1  ID      $3.
      @4  DOB     MMDDYY10.
      @14 ADMIT   MMDDYY10.
      @24 DISCHRG MMDDYY10.
      @34 DX      1.
      @35 FEE     5. ;
The @ signs, referred to as column pointers, tell
start reading the next data value. Our three dates are a
day-year) and occupy 10 columns, so the MMDDYY10.
cided not to include the two slashes in the date, the date
the MMDDYY8. informat would be used. Remember
mats end with periods (or a period followed by a numb
able names. All the column pointers in the previous
example, the program would start reading in column 1 w
the ID ends in column 3 and the date of birth starts in c
dant. Good programming practice suggests that using
able is a good idea. (See Chapter 12 for more de
statement.)
DATA HOSPITAL;
   INPUT @1  ID      $3.
         @4  DOB     MMDDYY10.
         @14 ADMIT   MMDDYY10.
         @24 DISCHRG MMDDYY10.
         @34 DX      1.
         @35 FEE     5. ;
   LENGTH_STAY = DISCHRG - ADMIT + 1;
   AGE = ADMIT - DOB;
DATALINES;
00110/21/194612/12/200412/14/20048 8000
00205/01/198007/08/200408/08/2004412000
00301/01/196001/01/200401/04/20043 9000
00406/23/199811/11/200412/25/2004715123
;

The calculation for length of stay (LENGTH_STAY) is relatively straightforward.
We subtract the admission date from the discharge date and add 1 (we want to count
both the admission day and the discharge day in the length-of-stay computation). If we
subtract the date of birth (DOB) from the admission date (ADMIT), the result is the
age in days. Look at a listing of this data set:

Listing of Data Set HOSPITAL

                                          LENGTH_
ID     DOB   ADMIT  DISCHRG  DX    FEE     STAY     AGE

001  -4820   16417    16419   8   8000        3   21237
002   7426   16260    16291   4  12000       32    8834
003      0   16071    16074   3   9000        4   16071
004  14053   16386    16430   7  15123       45    2333

This rather strange listing clearly demonstrates how SAS stores dates. For example,
look at the date of birth for subject 003. Notice it is zero. Why? Because this person was
born on January 1, 1960, and this is day zero in SAS-land. Well, you wouldn't want to
show this listing to your boss or colleagues. How do we make the date values look like
the dates we know and love? Just as we used formats in the last chapter to change the
way our values printed, we can use formats here to change the appearance of the date
values. We don't even have to use PROC FORMAT to create these formats: SAS has
already done it for us. Two very popular date formats are MMDDYY10. and DATE9.
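One way to attach these two formats is a FORMAT statement inside PROC PRINT. The sketch below assumes the data set HOSPITAL read in above; the choice of variables on the VAR statement and the NOOBS option are ours, picked only to keep the listing short.

```sas
PROC PRINT DATA=HOSPITAL NOOBS;
   TITLE "Listing of Data Set HOSPITAL";
   VAR ID DOB ADMIT DISCHRG;
   /* DOB prints as mm/dd/yyyy; ADMIT and DISCHRG print as ddMONyyyy */
   FORMAT DOB MMDDYY10. ADMIT DISCHRG DATE9.;
RUN;
```

A FORMAT statement placed in a procedure affects only that procedure's output; the stored values remain the number of days from January 1, 1960.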
Listing of Data Set HOSPITAL

ID   DOB         ADMIT      DISCHRG

001  10/21/1946  12DEC2004  14DEC2004
002  05/01/1980  08JUL2004  08AUG2004
003  01/01/1960  01JAN2004  04JAN2004
004  06/23/1998  11NOV2004  25DEC2004
Notice the result of these two SAS formats. (The D
sion and discharge dates is particularly useful when y
that use the month-day-year format and others tha
Next, we need to compute age in years rather than
By subtracting the date of birth from the adm
days between these two dates. We could convert this
proximately correct since there is a leap year every

AGE = (ADMIT-DOB) / 365.25;

However, a better way to compute the difference in
SAS function called YRDIF. You supply the YRDIF
second date, and it computes the number of years b
the exact age as of the admission date is:

AGE = YRDIF(DOB, ADMIT, 'ACTUAL');

SAS functions (see Chapters 17 and 18) are
perform calculations for us. In this example, the thre
are the first date, the second date, and a calculation m
(information that is supplied to the function) are sep
parentheses following the function name.
What if you wanted a person's age as of a pa
How many days is January 1, 2005, from January 1, 1
Let's let SAS compute this for us. You can specify a
for example, January 1, 2005, is equal to:

'01JAN2005'D

The form of a SAS date constant is a one- or t
letter month abbreviation, and a two- or four-digit
placed in single or double quotes, followed by a lo
wanted to compute a person's age as of January 1, 2

AGE = YRDIF(DOB, '01JAN2005'D, 'ACTUAL');
We may want to define age so that a person is not considered, say, 18 years old,
until his 18th birthday. That is, we want to omit any fractional portion of his/her age in
years. A SAS function that does this is the INT (integer) function. We can write:

AGE = INT(AGE);

to remove the fractional part of the age value. You can "nest" functions (place one
inside of another), as shown here, to make this part of your program more compact (and
yes, elegant).

AGE = INT(YRDIF(DOB, ADMIT, 'ACTUAL'));

Notice the use of parentheses to keep things straight. If we wanted to round the age to
the nearest tenth of a year, we would use the ROUND function. This function has two
arguments: the number to be rounded and the roundoff unit. To round to the nearest
tenth of a year, we use:

AGE = ROUND(YRDIF(DOB, ADMIT, 'ACTUAL'), .1);

To the nearest year, the function would be:

AGE = ROUND(YRDIF(DOB, ADMIT, 'ACTUAL'));

Note: If you leave off the second argument of the ROUND function, it assumes
you want to round to the nearest integer.
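To see the three variations side by side, here is a small sketch that applies INT and ROUND to a single made-up value (17.94 is our own illustration, not a value from the HOSPITAL data) and writes the results to the SAS log.

```sas
DATA _NULL_;
   AGE = 17.94;                   /* hypothetical age in years */
   AGE_INT   = INT(AGE);          /* drops the fraction: 17 */
   AGE_TENTH = ROUND(AGE, .1);    /* nearest tenth: 17.9 */
   AGE_YEAR  = ROUND(AGE);        /* nearest integer: 18 */
   PUT AGE_INT= AGE_TENTH= AGE_YEAR=;
RUN;
```

Note that INT truncates while ROUND with no second argument rounds, so the two can disagree by a full year for the same person.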
It is important to remember that once SAS has converted our dates into the number
of days from January 1, 1960, it is stored just like any other numeric value. Therefore,
if we print it out (with PROC PRINT, for example), the result will not look like a date.
We need to supply SAS with a format to use when printing out dates. This format does
not have to be the same one we used to read the date in the first place. Some useful date
formats and their results are shown in the following table:

The Date 10/21/1950 Printed with Different Date Formats:

Format       Result
MMDDYY6.     102150
MMDDYY8.     10/21/50
MMDDYY10.    10/21/1950
DATE7.       21OCT50
DATE9.       21OCT1950
WORDDATE.    October 21, 1950
C. WORKING WITH TWO-DIGIT YEAR VALUES (THE Y2K PROBLEM)

As October 21, 1904, or October 21, 2004? SAS has a
called YEARCUTOFF=value. The value you suppl
beginning of a 100-year interval. Any two-digit date w
year window. Starting with SAS version 7, the default
tion is 1920. Thus, any two-digit year would fall betwe
year 1/1/40 would be read as January 1, 1940; the year
2015. If you want to change the value of the YEARCU
statement. To set the value back to 1900, you would u

OPTIONS YEARCUTOFF = 1900;

Chapter 17, Section D, contains a list of SAS fun
ing with dates. For example, month, day, and year va
SAS date, or you can extract a month or year from a

D. LONGITUDINAL DATA

There is a type of data, often referred to as longitudin
tion. Longitudinal data are collected on a group of s
portion of the chapter is difficult and may be "hazard
To examine the special techniques needed to a
low a simple example. Suppose we are collecting d
same scheme would be applicable to periodic data in
ences with repeated measures.) Each time the patien
an encounter form. The data items we collect are:

PATIENT ID
DATE OF VISIT (Month Day Year)
HEART RATE
SYSTOLIC BLOOD PRESSURE
DIASTOLIC BLOOD PRESSURE
DIAGNOSIS CODE
DOCTOR FEE
LAB FEE

Now, suppose each patient's visits are a maximu
to arrange our SAS data set is as follows (each visit to
INPUT #1 @1  ID       $3.
         @4  DATE1    MMDDYY8.
         @12 HR1      3.
         @15 SBP1     3.
         @18 DBP1     3.
         @21 DX1      3.
         @24 DOCFEE1  4.
         @28 LABFEE1  4.
      #2 @4  DATE2    MMDDYY8.
         @12 HR2      3.
         @15 SBP2     3.
         @18 DBP2     3.
         @21 DX2      3.
         @24 DOCFEE2  4.
         @28 LABFEE2  4.
      #3 @4  DATE3    MMDDYY8.
         @12 HR3      3.
         @15 SBP3     3.
         @18 DBP3     3.
         @21 DX3      3.
         @24 DOCFEE3  4.
         @28 LABFEE3  4.
      #4 @4  DATE4    MMDDYY8.
         @12 HR4      3.
         @15 SBP4     3.
         @18 DBP4     3.
         @21 DX4      3.
         @24 DOCFEE4  4.
         @28 LABFEE4  4. ;
FORMAT DATE1-DATE4 MMDDYY10. ;
DATALINES;
0071021198307012008001400400150
0071201198307213009002000500200
007
007
0090903198306611007013700300000
009
009
009
The number signs (#) in the INPUT statement
Since our date is in the month-day-year form, we use th
included an output format for our dates with a FOR
statement uses the same syntax as the earlier example
our own formats. The output format MMDDYY10. s
printed in month-day-year form with slashes between
With this method of one line per patient visit, w
lines of data for any patient who had less than four visi
subject. This is not only clumsy but it also occupies a lot
to compute within-subject means, we continue (before

AVEHR  = MEAN(OF HR1-HR4);
AVESBP = MEAN(OF SBP1-SBP4);
AVEDBP = MEAN(OF DBP1-DBP4);
etc.

MEAN is one of the SAS built-in functions t
the variables listed within parentheses. (NOTE: If any o
of the MEAN function have missing values, the resul
ing values. See Chapter 17, Section B for more detail
the form VAR1-VAR4 are used, you need to includ
Without the OF, SAS would simply subtract the two
to treat each visit as a separate observation. Our prog

DATA PATIENTS;
   INPUT @1  ID     $3.
         @4  DATE   MMDDYY8.
         @12 HR     3.
         @15 SBP    3.
         @18 DBP    3.
         @21 DX     3.
         @24 DOCFEE 4.
         @28 LABFEE 4.;
   FORMAT DATE MMDDYY10.;
DATALINES;
ID   DATE        HR  SBP  DBP   DX  DOCFEE  LABFEE

007  10/21/1983  70  120   80   14      40     150
007  12/01/1983  72  130   90   20      50     200
009  09/03/1983  66  110   70  137      30       0
005  07/05/1983  74  140   82   13      90       0
005  01/15/1982  80  180   96   14     200    1500
005  06/18/1982  70  170   84   14      80     400
005  07/03/1983  64  140   84   14      80     200

How do we analyzethis data set?


A simple PROC MEANS on a variable such as HR, SBP,or DOCFEE will not be
particularly useful since we are averaging one to four values per patient together. Per-
haps the average of DOCFEE would be useful since it represents the average doctor
fee per PATIENT VISIT, but statistics for heart rate or blood pressure would be a
weighted average,the weight depending on how many visits we had for each patient.
How do we compute the average heart rate or blood pressure per patient? The key is
to use ID as a CLASS or BY variable.
Here is our program (with sample data):

DATA PATIENTS;
   INPUT @1  ID     $3.
         @4  DATE   MMDDYY8.
         @12 HR     3.
         @15 SBP    3.
         @18 DBP    3.
         @21 DX     3.
         @24 DOCFEE 4.
         @28 LABFEE 4.;
   FORMAT DATE MMDDYY10.;
DATALINES;
0071021198307012008001400400150
0071201198307213009002000500200
0090903198306611007013700300000
0050705198307414008201300900000
0050115198208018009601402001500
0050618198207017008401400800400
0050703198306414008401400800200
;
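A PROC MEANS step that uses ID as a CLASS variable and writes the per-patient means to a data set could be sketched as follows. This is an illustration consistent with the data set name STATS and the variable names M_HR, M_SBP, and M_DBP that appear in the listing; the exact VAR list and the use of NOPRINT and NWAY are assumptions on our part.

```sas
PROC MEANS DATA=PATIENTS NOPRINT NWAY;
   CLASS ID;                        /* one set of statistics per patient */
   VAR HR SBP DBP;                  /* assumed list of analysis variables */
   OUTPUT OUT=STATS                 /* name of the new data set */
          MEAN=M_HR M_SBP M_DBP;    /* names follow the VAR order */
RUN;
```

With CLASS, the data set does not need to be sorted by ID first; with a BY statement instead, it would.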

The result is the mean HR, SBP, etc., per patien
called STATS, with variable names M_HR, etc. (Se
for more about using PROC MEANS to create out
after MEAN= in the OUTPUT statement will be
the VAR statement, in the order they appear. Thus M
mean HR in the data set PATIENTS. In this exam
pear as follows (we can always test this with a PROC

Listing of Data Set STATS

ID   _TYPE_  _FREQ_  M_HR  M_SBP  M_DBP

005       1       4    72  157.5   86.5
007       1       2    71  125.0   85.0
009       1       1    66  110.0   70.0

This data set contains the mean HR, SBP, etc.,
data set with additional SAS procedures to investiga
or to compute descriptive statistics where each data v
(the MEAN) from each patient.

E. SELECTING THE FIRST OR LAST VISIT PER PATIENT

What if we want to analyze the last (most recent) vis


in the previous data set PATIENTS? If we sort the
most recent visit would be the last observation for
these observations with the following SAS program:

PROC SORT DATA=PATIENTS;
   BY ID DATE;
RUN;
DATA RECENT;
   SET PATIENTS;
   BY ID;
   IF LAST.ID;
RUN;
our data set has been previously sorted by the same variable (it has). The effect of
adding the BY statement is to have SAS create what are called FIRST. and LAST.
variables. In this case, since our BY variable is ID, two variables, FIRST.ID and LAST.ID,
are automatically created. These variables are available in the DATA step but are not
added to the data set (FIRST. and LAST. variables are automatically dropped). The
FIRST. and LAST. variables are logical variables; that is, they have values of true (1) or
false (0). FIRST.ID will be true (1) whenever we are reading the first observation for a
given ID and will be false (0) otherwise; LAST.ID will be true whenever we are reading
the last observation for a given ID and will be false (0) otherwise. To clarify this, the
following shows our observations and the value of FIRST.ID and LAST.ID. Keep in
mind that the two variables FIRST.ID and LAST.ID are not in the SAS data set (but
they are in the PDV and thus are available to be referenced in the DATA step) and
that the data set PATIENTS is now in ID and DATE order.

ID  DATE      HR  SBP  DBP   DX  DOCFEE  LABFEE  FIRST.ID  LAST.ID

5   01/15/82  80  180   96   14     200    1500         1        0
5   06/18/82  70  170   84   14      80     400         0        0
5   07/03/83  64  140   84   14      80     200         0        0
5   07/05/83  74  140   82   13      90       0         0        1
7   10/21/83  70  120   80   14      40     150         1        0
7   12/01/83  72  130   90   20      50     200         0        1
9   09/03/83  66  110   70  137      30       0         1        1

By adding the subsetting IF statement, we can select the last visit for each patient
(in this case, observations 4, 6, and 7). You can recognize this IF statement as a
subsetting IF statement because there is no THEN clause. A subsetting IF statement
has the following structure:

IF condition;

Here is how it works: If the condition is true, the program continues to process
the statements following the IF statement; if the condition is false, the program
returns to the top of the DATA step. Specifically in this case, if LAST.ID is true, the
program continues and, since this is the bottom of the DATA step, an observation is
automatically written out to the data set RECENT. If LAST.ID is not true, the
program returns to the top of the DATA step (and an observation is not written to data
set RECENT).
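The same pattern selects the first visit instead of the last: only the logical variable in the subsetting IF changes. In this sketch the output data set name FIRST_VISIT is hypothetical, and PATIENTS is assumed to be sorted by ID and DATE as above.

```sas
DATA FIRST_VISIT;   /* hypothetical data set name */
   SET PATIENTS;    /* assumed sorted by ID and DATE */
   BY ID;
   IF FIRST.ID;     /* keep only the earliest visit for each patient */
RUN;
```

For a patient with a single visit, FIRST.ID and LAST.ID are both true, so that observation appears in either selection.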
Listing of Data Set RECENT

ID   DATE        HR  SBP  DBP   DX

005  07/05/1983  74  140   82   13
007  12/01/1983  72  130   90   20
009  09/03/1983  66  110   70  137
F. COMPUTING DIFFERENCES BETWEEN OBSERVATIONS IN A LONGITUDINAL DATA SET

Suppose you want to compute the change (difference
rate and blood pressure, from visit to visit. With the d
vation per patient visit, this gets a bit tricky. Two ver
tions between observations are the LAG function an
how the LAG function works.
You may have come across the term "lagged" val
asthma-related doctor visits to ozone levels, you may w
the current day's ozone level and the ozone levels from
referred to as the ozone level, lagged 24 hours. Now for
The LAG function returns the value of its argument-the
cuted. What does this mean? An example will help. Loo

DATA LOOKING_BACK;
   INPUT DAY OZONE;
   OZONE_LAG24 = LAG(OZONE);
   OZONE_LAG48 = LAG2(OZONE);
DATALINES;
1 8
2 10
3 12
4 7
;
PROC PRINT DATA=LOOKING_BACK;
   TITLE "Demonstrating the LAG Function";
RUN;

Obs  DAY  OZONE  OZONE_LAG24  OZONE_LAG48

  1    1      8            .            .
  2    2     10            8            .
  3    3     12           10            8
  4    4      7           12           10

In this example, the value of OZONE_LAG24 is the value of OZONE from the previous
day. It is missing in the first observation since there is no previous day. As you
probably figured out by looking at the program and the listing, the LAG2 function
returns the value from two days earlier. There is a whole family of LAG functions. Now,
you may wonder why the definition seemed so strange. Why didn't we just say that the
LAG function returns a value from the previous observation? Well, because it doesn't
always. Look carefully at the following program:

DATA LAGGARD;
   INPUT X;
   IF X GT 5 THEN LAG_X = LAG(X);
DATALINES;
7
9
1

;
PROC PRINT DATA=LAGGARD;
   TITLE "Demonstrating a Feature of the LAG Function";
RUN;

Here is a listing of LAGGARD:

Demonstrating a Feature of the LAG Function

Obs  X  LAG_X

1    7    .
2    9    7
3    1    .
4 I
raw data and created in assignment statements in a D
time the DATA step iterates.) In observation 4, X
What is the value of X the last time the LAG funct
observation 2 and the value of X was 9. That is what t
is the bottom line here? You usually do not want to
tionally. As long as you execute the LAG (or LAG2
iteration of the DATA step, you can think of the func
previous observation.
We are now ready to compute differences in
from visit to visit. First the program, then the explan

*Assume data set PATIENTS is already sorted by ID and DATE;
DATA DIFFERENCE;
   SET PATIENTS;
   BY ID;
   DIFF_HR  = HR  - LAG(HR);
   DIFF_SBP = SBP - LAG(SBP);
   DIFF_DBP = DBP - LAG(DBP);
   IF NOT FIRST.ID THEN OUTPUT;
RUN;

For those readers who are or are becoming "compuls
few keystrokes by using the DIF function instead of

DATA DIFFERENCE;
   SET PATIENTS;
   BY ID;
   DIFF_HR  = DIF(HR);
   DIFF_SBP = DIF(SBP);
   DIFF_DBP = DIF(DBP);
   IF NOT FIRST.ID THEN OUTPUT;
RUN;

As you can see, DIF(X) is equivalent to X-LAG(X).

IF NOT FIRST.ID THEN DIFF_HR = HR - LAG(HR);

Well, don't do it! Remember you have to "prime the pump" and execute the LAG
function for every observation. As long as we do not output an observation for the first
visit, all is well.
Here is the listing of the data set DIFFERENCE:

Listing of Data Set DIFFERENCE

ID   DATE        HR  SBP  DBP  DX  DOCFEE  LABFEE  DIFF_HR  DIFF_SBP  DIFF_DBP

005  06/18/1982  70  170   84  14      80     400      -10       -10       -12
005  07/03/1983  64  140   84  14      80     200       -6       -30         0
005  07/05/1983  74  140   82  13      90       0       10         0        -2
007  12/01/1983  72  130   90  20      50     200        2        10        10

G. COMPUTING THE DIFFERENCE BETWEEN THE FIRST AND LAST OBSERVATION FOR EACH SUBJECT

What if you want to see the differences of heart rate and blood pressure from the first
visit to the last? You need a way of "remembering" a value from a previous observation.
The SAS tool that does this for us is a retained variable. Using a RETAIN statement,
we can tell SAS not to set the value in the PDV (program data vector) to missing
when the DATA step iterates. So, if you set the value of a retained variable, it stays at
that value until you change it. Let's see how we can use this to compute our difference
scores. Here is the program:

DATA FIRST_LAST;
   SET PATIENTS;
   BY ID;
   ***Data set PATIENTS is sorted by ID and DATE;
   RETAIN FIRST_HR FIRST_SBP FIRST_DBP;
   ***Omit patients with only one visit;
   IF FIRST.ID AND LAST.ID THEN DELETE;
   ***If it is the first visit assign values to the
      retained variables;
   IF FIRST.ID THEN DO;
      FIRST_HR  = HR;
      FIRST_SBP = SBP;
      FIRST_DBP = DBP;
   END;
   IF LAST.ID THEN DO;
      D_HR  = HR  - FIRST_HR;
      D_SBP = SBP - FIRST_SBP;
      D_DBP = DBP - FIRST_DBP;
      OUTPUT;
   END;
RUN;

We use a RETAIN statement to tell SAS not to se
missing when the DATA step iterates. If a patient h
pute a difference between the first and last visit, s
member, if there is only one visit for a patient, both
equal to one, and the statement will be true). W
for each patient, we set the three retained variabl
DO statement can be thought of as an "execute t
reach the END" statement. These values will st
assigned again, and they are not set to missing by
visit for each patient, we can subtract the value o
from the current value. We want to output only one
difference scores, so we include an OUTPUT state
include an explicit OUTPUT statement in the DA
automatic output at the bottom of the DATA step. A
is shown next:
Listing of Data Set FIRST_LAST

ID   DATE        HR  SBP  DBP  DX

005  07/05/1983  74  140   82  13
007  12/01/1983  72  130   90  20

          FIRST_  FIRST_
FIRST_HR  SBP     DBP     D_HR  D_SBP

      80     180      96    -6    -40
      70     120      80     2     10

Compulsive programmers (like one of the autho
problem just one way. The following program produ
vious program. It uses the very unusual trick of execu

DATA FIRST_LAST;
   SET PATIENTS;
   BY ID;
   ***Data set PATIENTS is sorted by ID and DATE;
   ***Omit patients with only one visit;
   IF FIRST.ID AND LAST.ID THEN DELETE;
   ***If it is the first or last visit execute the LAG
      function;
   IF FIRST.ID OR LAST.ID THEN DO;
      D_HR  = HR  - LAG(HR);
      D_SBP = SBP - LAG(SBP);
      D_DBP = DBP - LAG(DBP);
   END;
   IF LAST.ID THEN OUTPUT;
RUN;

As you can see, the LAG function only executes when we are reading the first or last
visit for each patient. When we read the last visit (LAST.ID is true), the difference is
the current value minus the value the last time the LAG function executed, which was
the first visit. So, when LAST.ID is true, we output the observation.

H. COMPUTING FREQUENCIES ON LONGITUDINAL DATA SETS

To compute frequencies for our diagnoses, we use PROC FREQ on our original data
set (PATIENTS). We would write:

PROC FREQ DATA=PATIENTS ORDER=FREQ;
   TITLE "Diagnoses in Decreasing Frequency Order";
   TABLES DX;
RUN;

Notice we use the DATA= option on the PROC FREQ statement to make sure
we were counting frequencies from our original data set. The ORDER= option allows
us to control the order of the categories in a PROC FREQ output. Normally, the diagnosis
categories are listed in sort-sequence order. The ORDER=FREQ option lists the
diagnoses in frequency order from the most common diagnosis to the least. While we are
on the subject, another useful ORDER= option is ORDER=FORMATTED. This will
ID   DX  FIRST.ID
5    13         1
5    14         0
5    14         0
5    14         0
7    14         1
7    20         0
9   137         1

If we now use the logical FIRST.DX and FIRST
our goal of counting a diagnosis only once for a g
FIRST.DX will be true each time a new ID-diagno
The data set and procedure would look as follows (a
by ID and DX):

DATA DIAG;
   SET PATIENTS;
   BY ID DX;
   IF FIRST.DX;
RUN;
PROC FREQ DATA=DIAG ORDER=FREQ;
   TABLES DX;
RUN;

We have accomplished our goal of counting a
tient. As you can see, the SAS internal variables FIRS
ful. Think of using them any time you need to do som
occurrence of another variable.

I. CREATING SUMMARY DATA SETS WITH PROC MEANS OR PROC SUMMARY

Besides providing a printed output of descriptive stati
CLASS (or BY) variables, PROC MEANS or PRO
To demonstrate how this is done, suppose we have collected data on several
students. We have a student number, gender, the teacher's name, the teacher's age, and
two test scores (a pretest and a posttest). We use the following data for our example:

SUBJECT  GENDER  TEACHER  T_AGE  PRETEST  POSTTEST

 1       M       Jones     35      67       81
 2       F       Jones     35      98       86
 3       M       Jones     35      52       92
 4       M       Black     42      41       74
 5       F       Black     42      46       76
 6       M       Smith     68      38       80
 7       M       Smith     68      49       71
 8       F       Smith     68      38       63
 9       M       Hayes     23      71       72
10       F       Hayes     23      46       92
11       M       Hayes     23      70       90
12       F       Wong      47      49       64
13       M       Wong      47      50       63

NOTES: 1. T_AGE is the teacher's age.
2. In a "real" study, we would probably enter the teacher's name and age only once in a separate data set and combine that
data set with the student data later on, saving some typing. However, for this example, it is simpler to include the teacher's
age for every observation.

As a first step,let's see how we can compute the mean pretest and posttest,and
gain scoresfor each teacher. Look at the following program:

DATA SCHOOL;
   LENGTH GENDER $ 1 TEACHER $ 5;
   INPUT SUBJECT
         GENDER  $
         TEACHER $
         T_AGE
         PRETEST
         POSTTEST;
   GAIN = POSTTEST - PRETEST;


DATALINES;
1 M JONES 35 67 81
2 F JONES 35 98 86
3 M JONES 35 52 92
4 M BLACK 42 41 74
5 F BLACK 42 46 76
6 M SMITH 68 38 80
7 M SMITH 68 49 71
8 F SMITH 68 38 63
9 M HAYES 23 71 72
10 F HAYES 23 46 92
11 M HAYES 23 70 90
12 F WONG 47 49 64
13 M WONG 47 50 63
;
PROC MEANS DATA=SCHOOL N MEAN STD MAXDEC=
   TITLE "Means Scores for Each Teacher";
   CLASS TEACHER;
   VAR PRETEST POSTTEST GAIN;
RUN;
This program is straightforward. The DATA step
MEANS requests statistics for each teacher by includ
able. The LENGTH statement is used to specify ho
the character variables GENDER and TEACHER.
input, a default length of eight is used for all characte
Means Scores for Each Teacher

The MEANS Procedure

           N
TEACHER  Obs  Variable   M

BLACK      2  PRETEST    43
              POSTTEST   75
              GAIN       31

HAYES         PRETEST    62
              POSTTEST   84
              GAIN       22

JONES         PRETEST    72
              POSTTEST   86
              GAIN       14
Instead of just printing out the results,we want to create a new data set that has
TEACHER as the unit of observation instead of SUBJECT. In our example, we have
only five teachers,but we might have 100, and they might be using different teaching
methods and be in different schools, etc. To create the new data set, we do the following:

PROC MEANS DATA=SCHOOL NOPRINT NWAY;
   CLASS TEACHER;
   VAR PRETEST POSTTEST GAIN;
   OUTPUT OUT=TEACHSUM
          MEAN=M_PRE M_POST M_GAIN;
RUN;
*To get a list of what was produced and therefore what
 is contained in the data set TEACHSUM, add the following:;
PROC PRINT DATA=TEACHSUM;
   TITLE "Listing of Data Set TEACHSUM";
RUN;
*Hey! This is a good example of why comments
 are useful. ;

The NOPRINT option on the first line tells the program not to print the results
of this procedure (since we either already have them from the last run, or the listing
would be too large to want to look at). As an alternative, you can use PROC
SUMMARY without the NOPRINT option. It is equivalent to PROC MEANS with
the NOPRINT option. Take your pick. We want the computed statistics (means in this
case) in the new data set. To do this, we include an OUTPUT statement in PROC
MEANS. The OUTPUT statement creates a new data set. We have to give it a name of
our choosing (by saying OUT=TEACHSUM), tell it what statistics to put in it, and
what names to give those statistics.
We can output any statistics available with PROC MEANS by using the PROC
MEANS options (N, MEAN, STD, etc.) as keywords in the OUTPUT statement.
These statistics will be computed for all the variables in the VAR list and will be broken
down by the CLASS variable. Since we want only the score means in this new data set,
we said, "MEAN=M_PRE M_POST M_GAIN." These new variables represent the
means of each of the variables listed in the VAR statement, in the same order the
give us only results for each TEACHER (the CLASS
grand mean in the new data set. Don't forget this. We
you leave this out. Your new data set (the listing from
Listing of Data Set TEACHSUM

Obs  TEACHER  _TYPE_  _FREQ_  M_PRE

1    BLACK         1       2  43.500
2    HAYES         1       3  62.333
3    JONES         1       3  72.333
4    SMITH         1       3  41.667
5    WONG          1       2  49.500

Let's leave the explanations of the _TYPE_
The variable _FREQ_ gives us the number of obser
for each value of the CLASS variable. If you go b
you will see that teacher BLACK had two studen
and so forth.
What if you wanted the teacher's age in this ne
age to gain score, for example)? This is easily accom
ment as part of PROC MEANS. So, to include the t
would use the following code:
PROC MEANS DATA=SCHOOL NOPRINT NWAY;
   CLASS TEACHER;
   ID T_AGE;
   VAR PRETEST POSTTEST GAIN;
   OUTPUT OUT=TEACHSUM
          MEAN=M_PRE M_POST M_GAIN;
RUN;
The resulting data set (TEACHSUM) will now
an alternative, you could have included both variabl
CLASS variables with the same result.
01  M  North  70  200
02  M  North  72  220
03  M  South  68  155
04  M  South  74  210
05  F  North  68  130
06  F  North  63  110
07  F  South  65  140
08  F  South  64  108
09  F  South      220
10  F  South  61  130

Next. we create a SAS data set as follows:

DATA DEMOG;
   LENGTH GENDER $ 1 REGION $ 5;
INPUI SUBJ GE}JDER$ REG]ON $ HEIGHT WEfGHT;
DATAIINES;
01- M North 70 200
02 M North 72 220
03 M South 68 155
04 M South 74 21,0
05 F North 68 1-30
06 F North 63 1-10
07 F South 65 1-40
0B F South 64 108
09 F South 220
10 F South 61 1,30

To compute the number of subjects, the mean, and the standard deviation for each
combination of GENDER and REGION, include a CLASS statement with PROC
MEANS like this:
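
A minimal sketch of such a step, assuming the DEMOG data set created above. The statistic keywords N, MEAN, and STD request the three statistics named in the text; any further options (such as MAXDEC=) are left at their defaults here:

```sas
PROC MEANS DATA=DEMOG N MEAN STD;
   CLASS GENDER REGION;      *one cell per GENDER and REGION combination;
   VAR HEIGHT WEIGHT;
RUN;
```
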
Remember that you do not have to sort your data set when you use a CLASS
statement with PROC MEANS. In this example, we have two CLASS variables instead
of one. The output from this procedure is shown next:

Output from PROC MEANS

The MEANS Procedure

GENDER  REGION  N Obs  Variable   N

F       North     2    HEIGHT     2
                       WEIGHT     2

        South     4    HEIGHT     3
                       WEIGHT     4

M       North     2    HEIGHT     2
                       WEIGHT     2

        South     2    HEIGHT     2
                       WEIGHT     2
Since we now have two CLASS variables, the results are shown
for each combination of GENDER and REGION.
We first demonstrate what happens when we create an
output data set with GENDER and REGION as CLASS variables:

PROC MEANS DATA=DEMOG NOPRINT;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT=SUMMARY
          MEAN=M_HEIGHT M_WEIGHT;
RUN;
***Add a PROC PRINT to list the observations;
PROC PRINT DATA=SUMMARY;
   TITLE "Listing of Data Set SUMMARY";
RUN;
The OUTPUT statement is used to create a new data set, to select which statistics
to place in this data set, and to choose what names to give to each of the requested
statistics. The name of the output data set is placed after the OUT= keyword. The
request to output means is indicated by the keyword MEAN=. The two variable names
following the keyword MEAN= are names you choose to represent the mean HEIGHT
and WEIGHT, respectively. The order of the names following MEAN= corresponds to
the order of the variable names in the VAR statement. In this example, the variable
M_HEIGHT will represent the mean height, and the variable M_WEIGHT will represent
the mean weight. Other keywords (chosen from the list of statistics available with
PROC MEANS found in Chapter 2, Section B) can be used to output statistics such as
standard deviation (STD=) or sums (SUM=).
Using a PROC PRINT with DATA=SUMMARY to see the contents of this
new data set, we obtain the following listing:

Listing of Data Set SUMMARY

Obs  GENDER  REGION  _TYPE_  _FREQ_  M_HEIGHT  M_WEIGHT

 1                      0      10     67.8889   162.300
 2            North     1       4     68.2500   165.000
 3            South     1       6     67.6000   160.500
 4     F                2       6     65.4000   139.667
 5     M                2       4     71.0000   196.250
 6     F      North     3       2     65.5000   120.000
 7     F      South     3       4     65.3333   149.500
 8     M      North     3       2     71.0000   210.000
 9     M      South     3       2     71.0000   182.500

Besides the mean for each combination of GENDER and REGION, we see there
are five additional observations and two additional variables, _TYPE_ and _FREQ_.
Here's what they're all about. The first observation with a value of 0 for _TYPE_ is the
mean of all nonmissing values (9 for HEIGHT and 10 for WEIGHT) and is called the
grand mean. The two observations with _TYPE_ equal to 1 are the mean HEIGHT and
WEIGHT for each REGION; the next two observations with _TYPE_ equal to 2 are
the mean HEIGHT and WEIGHT for each GENDER. Finally, the last four observations
with _TYPE_ equal to 3 are the means by GENDER and REGION (sometimes
called cell means). This is getting complicated! Relax, there is actually a way to tell
which _TYPE_ value corresponds to which breakdown of the data.
     Binary
GENDER  REGION  _TYPE_

  0       0       0     Mean over all observations (grand mean)
  0       1       1     Mean for each REGION
  1       0       2     Mean for each GENDER
  1       1       3     Mean for each combination of GENDER
                        and REGION

Next, we can come up with a simple rule. When the value of _TYPE_, written in
binary, gives you a "1" beneath a CLASS variable, the statistics are broken down by
that variable. If we look at _TYPE_ = 1, we write this in binary as 01
and realize that the _TYPE_ = 1 statistics represent means for each REGION.
Confused? It's OK, this is not easy.
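
The binary rule can also be applied by the program rather than by hand. The following is a minimal sketch, assuming the SUMMARY data set was created without CHARTYPE (so _TYPE_ is numeric) and with the statement CLASS GENDER REGION. The data set name DECODED and the flag names BY_GENDER and BY_REGION are made up for this illustration; BAND is the SAS bitwise AND function:

```sas
DATA DECODED;
   SET SUMMARY;
   *REGION is the rightmost binary digit (value 1) and
    GENDER is the next digit to the left (value 2);
   BY_GENDER = (BAND(_TYPE_, 2) > 0);   *1 if broken down by GENDER;
   BY_REGION = (BAND(_TYPE_, 1) > 0);   *1 if broken down by REGION;
RUN;
```

For example, an observation with _TYPE_ = 1 gets BY_GENDER = 0 and BY_REGION = 1, which matches the "mean for each REGION" reading of that row.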
An alternative to interpreting the _TYPE_ variable in binary is to use the
PROC MEANS (or PROC SUMMARY) option CHARTYPE. With this option,
the _TYPE_ variable is a character variable consisting of 1s and 0s. To see
how this works, let's run the previous program with the CHARTYPE option:

PROC MEANS DATA=DEMOG NOPRINT CHARTYPE;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT=SUMMARY
          MEAN=M_HEIGHT M_WEIGHT;
RUN;

The resulting data set SUMMARY now looks like this:

Listing of Data Set SUMMARY

Obs  GENDER  REGION  _TYPE_  _FREQ_  M_HEIGHT  M_WEIGHT

 1                     00      10     67.8889   162.300
 2            North    01       4     68.2500   165.000
 3            South    01       6     67.6000   160.500
 4     F               10       6     65.4000   139.667
 5     M               10       4     71.0000   196.250
 6     F      North    11       2     65.5000   120.000
 7     F      South    11       4     65.3333   149.500
 8     M      North    11       2     71.0000   210.000
 9     M      South    11       2     71.0000   182.500
Notice that the values of _TYPE_ are now strings of 1s and 0s. You can use this
variable to select which means you are interested in. Suppose you wanted a separate data
set for each of the _TYPE_ values. You can create several data sets at one time, like this:

DATA GRAND REGION GENDER GENDER_REGION;
   SET SUMMARY;
   IF _TYPE_ = '00' THEN OUTPUT GRAND;
   ELSE IF _TYPE_ = '01' THEN OUTPUT REGION;
   ELSE IF _TYPE_ = '10' THEN OUTPUT GENDER;
   ELSE IF _TYPE_ = '11' THEN OUTPUT GENDER_REGION;
RUN;

This program demonstrates several things. First, you can create more than one SAS data
set in one DATA step. To do this, you list all the data sets you want to create on the
DATA statement. Next, you use an OUTPUT statement to force an output at that point
in the DATA step. You also need to name the data set you want to output. Otherwise,
SAS will output an observation to all the data sets listed in the DATA statement. Finally,
you can see how the _TYPE_ variable lets you choose which sets of means you want to
output. Using the CHARTYPE option with PROC MEANS really makes the process
of choosing the correct value of _TYPE_ much easier. You don't even have to know
how to count in binary!
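
If only one breakdown is needed for a listing, a separate DATA step is not required. As a hedged sketch, assuming SUMMARY was created with the CHARTYPE option so that _TYPE_ is a character variable, the WHERE= data set option can select the cell means directly (the title text is only illustrative):

```sas
PROC PRINT DATA=SUMMARY(WHERE=(_TYPE_ = '11'));
   TITLE "Cell Means by GENDER and REGION";
RUN;
```
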
For most applications, you don't even need to look at the _TYPE_ values. Since
most applications call for cell means (the values broken down by each of the class
variables), you will want the highest value of the _TYPE_ variable. If you include the option
NWAY on the PROC MEANS statement, only cell means will be output to the new
data set. So, if you only want the mean HEIGHT and WEIGHT for each combination
of GENDER and REGION, you would write the PROC MEANS statements like this:

PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT=SUMMARY
          MEAN=M_HEIGHT M_WEIGHT;
RUN;
Listing of Data Set SUMMARY

Obs  GENDER  REGION  _TYPE_  _FREQ_  M_HEIGHT  M_WEIGHT

 1     F      North     3       2     65.5000   120.000
 2     F      South     3       4     65.3333   149.500
 3     M      North     3       2     71.0000   210.000
 4     M      South     3       2     71.0000   182.500

The value of the variable _FREQ_ is the number of observations (missing or
nonmissing) in each subgroup. For example, there were two females from the North,
so _FREQ_ = 2 in observation 1 in the summary data set. If you want the
number of nonmissing values that were used in computing the means, you can
include a request for N= in your output data set, as in the
following code:
PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT = SUMMARY
          N = N_HEIGHT N_WEIGHT
          MEAN = M_HEIGHT M_WEIGHT;
RUN;
PROC PRINT DATA=SUMMARY;
   TITLE1 "Listing of Data Set SUMMARY";
   TITLE2 "with Requests for N= and MEAN=";
RUN;

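In this program, we have chosen the variable names N_HEIGHT and
N_WEIGHT to represent the number of nonmissing observations. The listing,
shown here, makes the difference between the value of _FREQ_ and N=
clear:

Obs  GENDER  REGION  _TYPE_  _FREQ_  N_HEIGHT  N_WEIGHT  M_HEIGHT  M_WEIGHT

 1     F      North     3       2        2         2      65.5000    120.0
 2     F      South     3       4        3         4      65.3333    149.5
 3     M      North     3       2        2         2      71.0000    210.0
 4     M      South     3       2        2         2      71.0000    182.5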

Observe that the value for N_HEIGHT is 3 for females from the South, while the
value of _FREQ_ is 4 (there was a missing HEIGHT for a female from the South).
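
The gap between _FREQ_ and N= is simply the number of missing values in each cell. A short sketch, assuming the SUMMARY data set contains _FREQ_, N_HEIGHT, and N_WEIGHT as described above; the names CHECK_MISSING, MISS_HEIGHT, and MISS_WEIGHT are invented for this illustration:

```sas
DATA CHECK_MISSING;
   SET SUMMARY;
   MISS_HEIGHT = _FREQ_ - N_HEIGHT;   *observations with a missing HEIGHT;
   MISS_WEIGHT = _FREQ_ - N_WEIGHT;   *observations with a missing WEIGHT;
RUN;
```

For females from the South, MISS_HEIGHT would be 4 - 3 = 1. PROC MEANS can also output this count directly through the NMISS= keyword on the OUTPUT statement.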
Finally, if you use the NWAY option, there is not much need to keep the _TYPE_
variable in the output data set. You can use a DROP= data set option to omit this
variable. The program, modified to do this, is shown here:

PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT = SUMMARY(DROP=_TYPE_)
          N = N_HEIGHT N_WEIGHT
          MEAN = M_HEIGHT M_WEIGHT;
RUN;

Some lazy programmers sometimes omit the variable list following a request for
a statistic when only one statistic is requested. For example, if you only want means for
each combination of GENDER and REGION, you would write:

PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT = SUMMARY(DROP=_TYPE_)
          MEAN =;
RUN;

Using this method, the variable names in the new summary data set will be the
same as those listed on the VAR statement. That is, the variable name representing
the mean height will be HEIGHT, and the variable name representing the mean
weight will be WEIGHT. This is probably a bad idea since you may get confused and
not realize that a variable name represents a summary statistic and not the original
value. (Actually, that other author would not even put in (DROP=_TYPE_) since it
takes up too much time, and he doesn't mind the extra variable in the printout.)
Suppose you want the number of nonmissing values, the mean, the median, the
minimum, and the maximum value for each combination of GENDER and REGION
in the DEMOG data set. The following program would do this:

PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT = SUMMARY
          MEAN = MEAN_HEIGHT MEAN_WEIGHT
          N = N_HEIGHT N_WEIGHT
          MEDIAN = MEDIAN_HEIGHT MEDIAN_WEIGHT
          MIN = MIN_HEIGHT MIN_WEIGHT
          MAX = MAX_HEIGHT MAX_WEIGHT;
RUN;

Notice the variable names following the request for each statistic.
You can name these variables anything you wish. It is a good idea to choose
names like MEDIAN_HEIGHT so that it is easy to remember what each one represents.
If you would like SAS to name all the summary variables for you, use the
OUTPUT option AUTONAME (you know, Ford, Chevy, ...). With this option,
SAS appends an underscore character and the name of the statistic to each of the
variables listed on the VAR statement. To demonstrate this, we compute the same statistics
from the DEMOG data set and let SAS name them for us:

PROC MEANS DATA=DEMOG NOPRINT NWAY;
   CLASS GENDER REGION;
   VAR HEIGHT WEIGHT;
   OUTPUT OUT = SUMMARY(DROP=_TYPE_ RENAME=(_FREQ_ = NUMBER))
          MEAN =
          N =
          MEDIAN =
          MIN =
          MAX = / AUTONAME;
RUN;
                         HEIGHT_  WEIGHT_
GENDER  REGION  NUMBER     Mean     Mean   HEIGHT_N  WEIGHT_N

  F     North      2     65.5000   120.0       2         2
  F     South      4     65.3333   149.5       3         4
  M     North      2     71.0000   210.0       2         2
  M     South      2     71.0000   182.5       2         2

HEIGHT_  WEIGHT_  HEIGHT_  WEIGHT_  HEIGHT_  WEIGHT_
 Median   Median    Min      Min      Max      Max

  65.5    120.0      63      110       68      130
  65.0    135.0      64      108       67      220
  71.0    210.0      70      200       72      220
  71.0    182.5      68      155       74      210

PROBLEMS
Remember, you can download all the data sets and programs for these problems from the web
site: www.prenhall.com/cody
4.1 We have collected data on a questionnaire as follows:

          Starting
Variable  Column    Length  Description
ID           1        3     Subject ID
DOB          5        8     Date of birth in MMDDYY format
ST_DATE     13        8     Start date in MMDDYY format
END_DATE    21        8     Ending date in MMDDYY format
SALES       29              Total sales

Here is some sample data:
L2
L2345 67 89 0L2345 6'789 0t23 45 67 I 9 0 Colunn Indicators

00t I02I1"94611L2L9
80722819887343
002 091319550202L980020419880123
005 06061940 03L21_98103L220040000
003 07051944111s19801-11320009544
sales per year computed in part (c). Use the MM
(e) Modify the program to compute AGE as of
rounded to the nearest 10 dollars. Try using the
4.2 Run the following DATA step to create a SAS da
new SAS data set called AGES that contains all the
new variables. One is AGE_ACTUAL, which is the
2005. The second is AGE_TODAY, which is the ag
rounded to the nearest tenth of a year. The thir
dropped, as of the date stored in the variable VISI
new data set.
***Program to create data set ABC_CORP;
DATA ABC_CORP;
   DO SUBJ = 1 TO 10;
      DOB = INT(RANUNI(1234)*15000);
      VISIT_DATE = INT(RANUNI(0)*1000) + '01JA
      OUTPUT;
   END;
   FORMAT DOB VISIT_DATE DATE9.;
RUN;

4.3 For each of eight mice, the date of birth, date of dise
addition, the mice are placed into one of two groups
compute the time from birth to disease, the time fr
death. All times can be in days. Compute the mean, s
of these three times for each of the two groups. Here

RAT_NO  DOB        DISEASE

  1     23MAY1990  23JUN1990
  2     21MAY1990  27JUN1990
  3     23MAY1990  25JUN1990
  4     27MAY1990  07JUL1990
  5     22MAY1990  29JUN1990
  6     26MAY1990  03JUL1990
  7     24MAY1990  01JUL1990
  8     29MAY1990  15JUL1990
*Use LENGTH statement to control the order of
 variables in the data set;

ffl ffiT.I'E'H:-'rsrr8;
DO PATIENT = 1 TO 25;
   IF RANUNI(135) LT .5 THEN GENDER = 'Female';
   ELSE GENDER = 'Male';
   X = RANUNI(135);
   IF X LT .33 THEN GROUP = 'A';
   ELSE IF X LT .66 THEN GROUP = 'B';
   ELSE GROUP = 'C';
   DO VISIT = 1 TO INT(RANUNI(135)*5);
      IF VISIT = 1 THEN DO;
         DATE_VISIT = INT(RANUNI(135)*100) + 15800;
         WEIGHT = INT(RANNOR(135)*10 + 150);
      END;
      ELSE DO;
         DATE_VISIT = DATE_VISIT + VISIT*(10 + INT(RANUNI(135)*50));

#;:,:..'_ ":_:"u'-"'i
      IF RANUNI(135) LT .2 THEN LEAVE;
   END;
END;
DROP X;
FORMAT DATE_VISIT DATE9.;
RUN;

*4.5 Using the data set (PATIENTS) described in Section D of this chapter (use the
program with sample data), write the necessary SAS statements to create a data set
(PROB4_5) in which the first visit for each patient is omitted. Then, using that data set,
compute the mean HR, SBP, and DBP for each patient. (Patient 9 with only one visit
will be eliminated.)
4.6 Using data set CLINICAL from Problem 4.4, create a new SAS data set (CHANGE) with
one observation per subject with the difference in WEIGHT between the first and last
*4.9 We have a data set called BLOOD that contains from
Each observation contains the variables ID, GROUP,
RBC (red blood cells). Run the following program to

***Program to create data set BLOOD;
DATA BLOOD;
   LENGTH GROUP $ 1;
   INPUT ID GROUP $ TIME WBC RBC @@;
DATALINES;
1 A 1 8000 4.5 1 A 2 8200 4.8 1 A 3 8400 5.2
1 A 4 8300 5.3 1 A 5 8400 5.5
2 A 1 7800 4.9 2 A 2 7900 5.0
3 B 1 8200 5.4 3 B 2 8300 5.4 3 B 3 8300 5.2
3 B 4 8200 4.9 3 B 5 8300 5.0
4 B 1 8600 5.5
5 A 1 7900 5.2 5 A 2 8000 5.2 5 A 3 8200 5.4
5 A 4 8400 5.5
;

We want to create a data set that contains the mean W
new data set should contain the variables ID, GRO
M_WBC and M_RBC are the mean values for the su
subjects from this data set who have two or fewer obs
sume there are no missing values).
HINT: We will want to use PROC MEANS with
both ID and GROUP in the new data set, you can m
include an ID statement (ID GROUP;) to cause the v
output data set. Also, remember the _FREQ_ variabl
be useful for creating a data set that meets the last
two or fewer observations.
*4.10 Using data set CLINICAL from Problem 4.4, creat
mean, median, and standard deviation broken down
statement). Using this summary data set, create four
to control which observations go into each of the data sets. Note that you can accomplish
this in one DATA step. Use the CHARTYPE option to make this problem easier.
