to
S-Plus
Final Version
B.D. Ripley
Professor of Applied Statistics,
University of Oxford
e-mail:
24 August 1994
Preface
This guide was originally written for graduate students in Statistics at the University of Oxford.
The first versions were based closely on notes by Dr. Bill Venables of the Department
of Statistics at the University of Adelaide, but have been updated to reflect later versions of S,
the extensions of S-Plus and local facilities. Several sections, in particular 4, 6 and 11, remain
close to Dr. Venables’ original material. This guide will no longer be updated, following the
publication of Venables & Ripley (1994). [See p. 1. Where that takes a significantly better
approach than earlier editions of these notes, the material formerly here has been dropped.]
The guide is to S-Plus, but much of it will be relevant to users of the underlying S. Extensions
which are only in S-Plus include dynamic graphics (Section 6.3, brush and spin) and the classical
statistics functions (Section 9). The terminology of this guide is intended to be precise, only
referring to S-Plus rather than S for features unique to S-Plus.
These notes were written for a particular environment, S-Plus 3.2 on Sun SparcStations running
the Open Windows windowing system. You will find a number of differences depending on
your local environment. It will help to have the library ripley available; it should be in the
same source as these notes. It can also be obtained by anonymous ftp from
markov.stats.ox.ac.uk (163.1.20.1)
in file pub/S/ripley.sh.Z . It is available from statlib (see Section A.2) as
send ripley from S
Alternatively, library(MASS) from Venables & Ripley (1994) can be used.
This guide may be freely copied and redistributed for any educational purpose (including com-
mercial courses) provided its authorship (B.D. Ripley and W.N. Venables) is clearly stated.
Where appropriate, a small charge to cover the costs of production and distribution, only, may
be made.
B.D. Ripley,
University of Oxford,
24th August, 1994.
Contents
1 Introduction
1.1 Starting and Finishing
1.2 Getting Help
1.3 Hardcopy Output
2 Datasets
3 A First Session
6 Graphics
6.1 Graphical Parameters
6.2 Some Basic Plotting Functions
6.3 Interaction with Plots
6.4 Brush and Spin
6.5 Equally-scaled plots
7 Statistical Summaries
7.1 Arithmetical Summaries
7.2 Histograms and Stem-and-Leaf Plots
7.3 Boxplots
8 Distributions
8.1 Q-Q Plots
9 Classical Statistics
13 Statistical Models
13.1 Model Formulas
13.2 One-way Layouts
13.3 Designed Experiments
13.4 Generalized Linear Models
13.5 Updating and Selecting Models
14 Multivariate Analysis
Appendix
A Libraries
A.1 Library ripley
A.2 Sources of Libraries
1 Introduction
to a variable but the result is not printed automatically. An expression can be as simple as
2 + 3 or a complex function call. Assignments are indicated by the assignment operator <- or _ .
(As the first needs two keystrokes, lazy typists use the second. However, the first is easier to
read.) For example,
> 2 + 3
[1] 5
> mean(hstart)
[1] 137...
> m <- mean(hstart); v <- var(hstart)
> m/sqrt(v)
[1] 3.171...
The [1] states that the answer is starting at the first element of a vector.
Commands are separated either by a semi-colon, ; , or by a newline. If a command is not
complete at the end of a line, S will give a different prompt, namely
+
on second and subsequent lines and continue to read input until the command is syntactically
complete.
S can be extended by writing new functions, which then can be used exactly as built-in
functions (and can even replace them). How to write your own functions is covered in section 12.
1.2 Getting Help
S has an inbuilt help facility similar to the man facility of Unix. To get more information on
any specific named function or dataset, for example mean, the command is
> help(mean)
For a feature specified by special characters, and in a few other cases (one is $%*W(-%=%" ), the
argument must be enclosed in double quotes, making it a ‘character string’:
> help("[[")
Help uses a window which overlays your main window. The pager accepts a number of options,
including space, for the next page and q to quit. (Other useful options are < to go to the top
and control-b to go back a page.) If you prefer, a separate help window (which can be left
up) can be obtained by the argument window=T. Another way to get help is by
> ?mean
Short help is given by the function args.
S-Plus also has a window-based help facility, started by
> help.start(gui="openlook")
Click with the left mouse button on items to select categories and items. The help window can
be left up, or removed by
> help.off()
It is not advisable to quit S-Plus windows from the frame menu.
1.3 Hardcopy Output
Graphics are printed by holding down the right button on the Graph menu in an openlook()
window (see Section 6) and releasing over the print item. This will print on the nearest laser
printer (or the one selected by your printer environment variable).
To record a session cut-and-paste to a textedit window, then remove your mistakes (if any)
and save as a Unix file.
2 Datasets
Datasets are stored in a directory ~/.Data . They are permanent, so all the objects you create
are retained until explicitly deleted. (As the directory name .Data begins with . it will
normally be hidden in file listings from Unix by ls.) If there is a .Data directory in the current
directory when S is invoked, that directory is used rather than ~/.Data . This provides one
way to organize your S work, using separate directories for each project.
In S, to get a list of names of the objects currently defined use the command
> objects()
Your own functions are also stored in .Data . To find out whether an object is a function or
dataset, and what is in it, just type its name at the prompt, e.g.
> stack.x
> plot
This prints out the function, dataset, ... . In the later versions of S it may print a short summary
of the object. To get the full details, use
> print.default(object)
When S looks for an object, it searches in turn through a sequence of directories known as the
search list. Usually the first entry in the search list is the .Data sub-directory of the current
working directory. The names of the directories currently on the search list can be found by
the function
> search()
The names of the objects held in any directory on the search list can be displayed by giving the
objects function an argument. For example objects(2) lists the contents of the second directory
in the search list. Normally the second, third and fourth directories are built-in functions, and
the fifth, sixth and seventh contain standard datasets.
Extra search directories can be added to this list with the attach(...) function and removed
with the detach(...) function, details of which can be found in the manuals or the help
facility. Note that attached directories are searched after the .Data directory in the order last
attached to first attached.
To remove objects permanently the function rm is available:
> rm(x, y, z, ink, junk, temp)
The function remove(...) can be used to remove objects with non-standard names.
Warning
Objects in your .Data directory will take precedence over system objects of the same name.
This is a frequent cause of rather obscure errors, and can cause apparently correct behaviour but
erroneous results. Avoid using names such as c, s, t, lm, range and tree for your own
objects. If you get peculiar errors, clean up your .Data directory and try again!
S keeps a record of commands in the .Audit file in the .Data directory. This is a hidden file
and can grow rather large. Use (from the Unix command line)
Splus TRUNC_AUDIT 0
occasionally to clean out the audit file entirely (or omit the 0 to keep the last 0.5Mb).
3 A First Session
The sample session given below is intended to show by example some of the capabilities of the
system. Work through the session given by the commands on the left of the page. Some clues
as to what is going on are given at the right hand side of the page.
machine% Splus                            Start the session.
> openlook()                              Open the graphics window.
> library(ripley)                         Add a library of functions and datasets.
> help(trees)                             use q to quit
> trees                                   Print out a data frame of the trees data
> attach(trees)                           so that we can use names diam etc
> hist(diam)                              Histogram as counts.
> hist(diam, nclass=10, probability=T)    as probability density
> help(hist)
> stem(diam)                              Stem-and-leaf plot.
> plot(diam, volume)                      Scatter plot.
> trees.lm <- lm(volume ~ diam)           linear regression
> summary(trees.lm)                       summary of fit
> anova(trees.lm)                         analysis of variance table
> abline(trees.lm)                        plot line on scatter plot
> identify(diam, volume, height)          Move mouse to plot and click with left button
                                          to see what height is. Click middle button to
                                          quit.
> par(mfrow=c(1,2))                       set up 1 row, 2 cols for plots
> plot(trees.lm)                          plots of fitted values and |residuals| vs fitted value.
> par(mfrow=c(1,1))                       one plot again.
> qqnorm(residuals(trees.lm))             normal probability plot of residuals
> qqnorm(studres(trees.lm))               and of Studentized residuals
> qqline(studres(trees.lm))               line through quartiles
> pairs(trees)                            all pair-wise scatter plots
> brush(cbind(diam, height, volume))      rotate points in 3D, select and de-select points.
                                          Click on quit to end
> trees.lm2 <- lm(volume ~ diam + height)
> trees.lm3 <- lm(log(volume) ~ log(diam) + log(height))
                                          multiple regression. Try functions as before
> detach("trees")                         to avoid any confusion
> help(road)
> attach(road)
> plot(drivers, deaths)
> plot(drivers, deaths, log="xy")
> state <- row.names(road)
> identify(drivers, deaths, state)        Find the 'odd' states.
> plot(fuel, deaths, log="xy")
> identify(fuel, deaths, state)
> road.mat <- cbind(drivers, fuel, deaths)   Set up a matrix
> pairs(road.mat)                         Look at pattern of all three
> brush(road.mat, rowlab=state, spin=F)   Use mouse to highlight points and check their
                                          identity. Then click on quit
> q()                                     Finish session
4 Simple Data Manipulation
The basic data objects in S are vectors, arrays, lists and data frames.
4.1 Vectors
S operates on named data structures. The simplest such structure is the vector, which is a
single entity consisting of an ordered collection of numbers. To set up a vector named x, say,
consisting of five numbers, namely 10.4, 5.6, 3.1, 6.4 and 21.7, use the S command
> x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
This is an assignment statement using the function c(...) taking an arbitrary number of vector
arguments and whose value is the vector of its arguments.
A number occurring by itself in an expression is taken as a vector of length one.
Assignments can also be made in the other direction, using the obvious change in the
assignment operator. So the same assignment could be made using
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x
If an expression is used as a complete command, the value is printed and lost. So now if we
were to use the command
> 1/x
the reciprocals of the five values would be printed (and, of course, the value of x would be
unchanged).
Vectors can be used in arithmetic expressions, in which case the operations are performed
element-by-element. Vectors occurring in the same expression need not all be of the same
length. If they are not, the value of the expression is a vector with the same length as the longest
vector which occurs in the expression. Shorter vectors in the expression are recycled as often
as need be (perhaps fractionally) until they match the length of the longest vector. In particular
a constant is simply repeated. So with the above assignments the command
> v <- 2*x + y + 1
generates a new vector v of length 11 constructed by adding together, element-by-element, 2*x
repeated 2.2 times, y repeated just once, and 1 repeated 11 times.
The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. In
addition all of the common arithmetic functions are available. log, log10, exp, sin, cos,
tan, sqrt, and so on, all have their usual meaning. max and min select the largest and
smallest elements of a vector respectively. range is a function whose value is a vector of length
two, namely c(min(x), max(x)). The element-by-element maximum and minimum of two or
more vectors are given by pmax and pmin. length(x) is the number of elements in x, sum(x)
gives the total of the elements in x and prod(x) their product.
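The recycling rule and the summary functions above can be tried on a small case. This is only a sketch: x is the vector from the text, while y is a hypothetical vector of length 10, chosen so that the lengths divide evenly (the length-11 case in the text recycles fractionally). It is written in S syntax and was checked against R semantics; printed layout in S-Plus may differ.

```r
# x is the five-number vector from the text; y is a hypothetical
# vector of length 10 used only for illustration.
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
y <- 1:10

v <- 2*x + y + 1      # 2*x is used twice over, the constant 1 ten times
length(v)             # same length as the longest operand, y

range(x)              # same as c(min(x), max(x))
pmax(x, 6)            # element-by-element maximum against a constant
sum(x)                # total of the elements
prod(x)               # product of the elements
```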
4.3 Generating Regular Sequences of Numbers
S has a number of facilities for generating commonly used sequences of numbers. For
example 1:30 is the vector c(1, 2, ..., 29, 30). The colon operator has high priority within
an expression (above *, /, + and -), so, for example 2*1:15 is the vector
c(2, 4, 6, ..., 28, 30). Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
The construction 30:1 may be used to generate a backwards sequence.
The function seq(...) is a more general facility for generating sequences. It has five
arguments, only some of which may be specified in any one call. The first two arguments, if given,
specify the beginning and end of the sequence, and if these are the only two arguments given
the result is the same as the colon operator. That is, seq(2, 10) is the same vector as 2:10.
Parameters to seq(...), and to many other S functions, can also be given in named form, in
which case the order in which they appear is irrelevant. The first two parameters may be named
from=value and to=value; thus seq(1, 30), seq(from=1, to=30) and seq(to=30, from=1)
are all the same as 1:30. The next two parameters to seq(...) may be named by=value and
length=value, which specify a step size and a length for the sequence respectively. If neither
of these is given, the default by=1 is assumed.
For example
> seq(-5, 5, by=.2) -> s3
generates in s3 the vector c(-5.0, -4.8, -4.6, ..., 4.6, 4.8, 5.0). Similarly
> s4 <- seq(length=51, from=-5, by=.2)
generates the same vector in s4.
The fifth parameter may be named along=vector, which if used must be the only parameter,
and creates a sequence 1, 2, ..., length(vector), or the empty sequence if the vector is
empty (as it can be).
A related function is rep(...) which can be used for replicating a structure in various
complicated ways. The simplest form is
> s5 <- rep(x, times=5)
which will put five copies of x end-to-end in s5.
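The sequence facilities above can be compared side by side. The sketch below uses the values from the text (n, s3, s4, s5 and the earlier vector x); the names a and b are introduced here only for illustration. It is written in S syntax and checked against R, where the abbreviated argument names length and along are matched as in S.

```r
n <- 10
a <- 1:n-1                 # the colon binds first: (1:n) - 1, i.e. 0, 1, ..., 9
b <- 1:(n-1)               # 1, 2, ..., 9

s3 <- seq(-5, 5, by=0.2)                 # from, to and step
s4 <- seq(length=51, from=-5, by=0.2)    # same vector, named arguments

x <- c(10.4, 5.6, 3.1, 6.4, 21.7)
idx <- seq(along=x)        # 1, 2, ..., length(x)
s5 <- rep(x, times=5)      # five copies of x end-to-end
```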
4.4 Logical Vectors. Missing Values
Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric
vectors, F becoming 0 and T becoming 1. However there are situations where logical vectors
and their coerced numeric counterparts are not equivalent.
In some cases the components of a vector may not be completely known. When an element
or value is “not available” or a “missing value” in the statistical sense, a place within a vector
may be reserved for it by assigning it the special value NA. In general any operation on an NA
becomes an NA. The motivation for this rule is simply that if the specification of an operation
is incomplete, the result cannot be known and hence is not available.
The function is.na(x) gives a logical vector of the same size as x with value T if and only if
the corresponding element in x is NA.
> ind <- is.na(z)
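A small illustration of missing values and logical coercion, as a sketch only: the vector z below is a hypothetical example (the text does not give its contents) and total is a name introduced here.

```r
z <- c(1:3, NA)            # hypothetical vector with one missing value
ind <- is.na(z)            # F F F T

z + 1                      # the NA propagates through the arithmetic
sum(ind)                   # logicals coerced to 0/1: counts the missing values
total <- sum(z[!ind])      # sum of the non-missing values only
```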
Character quantities and character strings are used frequently in S, for example as plot labels.
They are denoted by a sequence of characters delimited by the double quote character. E.g.
"x-values", "New iteration results". Single quotes can also be used, in matching pairs.
Character strings may be collected into a vector by the c(...) function; examples of their use
will emerge frequently.
The paste(...) function takes an arbitrary number of arguments and concatenates them one by
one into character strings. Any numbers given among the arguments are coerced
into character strings in the same way they would be if they were printed. The arguments are
by default separated in the result by a single blank character, but this can be changed by the
named parameter, sep=string, which changes it to string, possibly empty.
For example
> labs <- paste(c("X", "Y"), 1:10, sep="")
makes labs the character vector c("X1", "Y2", "X3", ..., "X9", "Y10"). Note in
particular that recycling of short vectors takes place here too; thus c("X", "Y") is repeated 5
4.6 Index Vectors. Selecting and Modifying Subsets of a Data Set
1. A logical vector. In this case the index vector must be of the same length as the vector from
which elements are to be selected. Values corresponding to T in the index vector are
selected and those corresponding to F omitted. For example
> y <- x[!is.na(x)]
creates (or re-creates) an object y which will contain the non-missing values of x, in the
same order. Note that if x has missing values, y will be shorter than x. Also
> (x+1)[(!is.na(x)) & x>0] -> z
creates an object z and places in it the values of the vector x+1 for which the
corresponding value in x was both non-missing and positive.
2. A vector of positive integral quantities. In this case the values in the index vector must
lie in the set {1, 2, ..., length(x)}. The corresponding elements of the vector
are selected and concatenated, in that order, in the result. The index vector can be of any
length and the result is of the same length as the index vector. For example x[6] is the
sixth component of x and
> x[1:10]
selects the first 10 elements of x (assuming length(x) ≥ 10). Also
> c("x","y")[rep(c(1,2,2,1), times=4)]
(an admittedly unlikely thing to do) produces a character vector of length 16 consisting
of "x", "y", "y", "x" repeated four times.
3. A vector of negative integral quantities. In this case the index vector specifies the values
to be excluded rather than included. Thus
> y <- x[-(1:5)]
gives y all but the first five elements of x.
4. A vector of character strings. This possibility only applies where an object has a names
attribute to identify its components. In this case a subvector of the names vector may be
used in the same way as the positive integral labels in 2.
> lunch <- fruit[c("apple", "orange")]
This option is particularly useful in connection with data frames (see Section 4.9).
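The four kinds of index vector listed above can be seen together in one short session. This is a sketch with made-up data: the contents of x and fruit are hypothetical, as are the names first3 and rest.

```r
# Hypothetical data: one missing value and one negative value.
x <- c(10.4, NA, 3.1, -6.4, 21.7, 0.5)

y <- x[!is.na(x)]                  # 1. logical index: drop the missing value
z <- (x+1)[(!is.na(x)) & x>0]      #    non-missing and positive, shifted by 1
first3 <- x[1:3]                   # 2. positive integers: include
rest <- x[-(1:3)]                  # 3. negative integers: exclude

fruit <- c(5, 10, 1, 20)           # 4. character index needs a names attribute
names(fruit) <- c("orange", "banana", "apple", "peach")
lunch <- fruit[c("apple", "orange")]
```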
An indexed expression can also appear on the receiving end of an assignment, in which case
the assignment operation is performed only on those elements of the vector. The expression
must be of the form vector[index vector] as having an arbitrary expression in place of the
vector name would not make sense.
The vector assigned must match the length of the index vector, and in the case of a logical index
vector it must again be the same length as the vector it is indexing.
For example
> x[is.na(x)] <- 0
replaces any missing values in x by zeros and
> y[y<0] <- -y[y<0]
has the same effect as
> y <- abs(y)
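Both replacements can be checked on a small vector. A minimal sketch, assuming a hypothetical x with two missing values:

```r
x <- c(2, NA, -3, NA, 5)   # hypothetical data
x[is.na(x)] <- 0           # only the missing elements are assigned to

y <- x
y[y<0] <- -y[y<0]          # flip the sign of the negative elements only
```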
4.7 Arrays
An array can be considered as a multiply subscripted collection of data entries of the same type,
for example numeric, logical or character string.
An array is defined by having a dimension vector, a vector of positive integers. If its length is
k then the array is k-dimensional. The values in the dimension vector give the upper limits for
each of the k subscripts. The lower limits are always 1. Suppose, for example, z is a vector of
1500 elements. The assignment
> dim(z) <- c(3,5,100)
allows z to be treated as a 3 × 5 × 100 array.
Other functions such as matrix(...) and array(...) are available for simpler and more
natural looking assignments in special cases, e.g.
> z <- array(z, c(3,5,100))
> z <- matrix(z, 3, 500)
The values in the data vector give the values in the array in the same order as they would occur in
Fortran, that is, with the first subscript moving fastest and the last subscript slowest. For
example if the dimension vector for an array, say a, is c(3,4,2) then there are 3 × 4 × 2 = 24 entries
in a and the data vector holds them in the order a[1,1,1], a[2,1,1], ..., a[2,4,2],
a[3,4,2]. To make life easier, matrix has a byrow=T parameter for data presented by row
rather than by column.
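The storage order can be made visible by filling an array with 1, 2, ..., 24, so that each entry shows its own position in the data vector. This is a sketch; the names m.col and m.row are introduced here for illustration.

```r
a <- array(1:24, c(3,4,2))   # entry value = position in the data vector

a[2,1,1]                     # first subscript moves fastest
a[3,4,2]                     # the last entry of the data vector
dim(a[2,,])                  # empty index positions take the full range

m.col <- matrix(1:6, 2, 3)            # filled column by column (the default)
m.row <- matrix(1:6, 2, 3, byrow=T)   # filled row by row
```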
Individual elements of an array may be referenced by giving the name of the array followed
by the subscripts in square brackets, separated by commas. More generally, subsections of an
array may be specified by giving a sequence of index vectors in place of subscripts; however if
any index position is given an empty index vector, then the full range of that subscript is taken.
Thus a[2,,] is a 4 × 2 array with dimension vector c(4,2) and data vector
a[2,1,1], a[2,2,1], a[2,3,1], a[2,4,1], a[2,1,2], a[2,2,2], a[2,3,2], a[2,4,2],
in that order. a[,,] stands for the entire array, which is the same as omitting the subscripts
entirely and using a alone.
Arrays may be used in arithmetic expressions and the result is an array formed by
element-by-element operations on the data vector. The dimension vectors of operands generally need to be
the same, and this becomes the dimension vector of the result. So if A, B and C are all similar
arrays, then
> D <- 2*A*B + C + 1
makes D a similar array with data vector the result of the evident element-by-element
operations. The matrix multiplication operator is %*%.
There are extensive matrix manipulation facilities, including transposes and eigenvalue,
Cholesky, QR and singular-value decompositions. See help on t, eigen, chol, qr and svd.
Any dimension of an array can be given a set of names using dimnames, but it is usually easier
to use the facilities of data frames.
Matrices can be built up from given vectors and matrices by the functions cbind(...) and
rbind(...). Informally, cbind(...) forms matrices by binding together vectors or matrices
horizontally, or column-wise, and rbind(...) vertically, or row-wise.
4.8 Lists
and this generates much less output than printing the object, which will achieve the same
purpose.
The names of components may be abbreviated down to the minimum number of letters needed
to identify them uniquely. Most of the datasets are in fact lists (or can be treated as lists), so we
could refer to the component diam of the trees data as trees$d. Similarly, many S functions
return lists of results.
It is important to distinguish trees[[1]] from trees[1]. “[[...]]” is the operator used to
select a single element of a list, whereas “[...]” is a general subscripting operator for vectors.
Fortunately, numbered components are needed very rarely.
New lists may be formed from existing objects by the function list(...). An assignment of
the form
> trees <- list(diam=tree.d, height=tree.h, volume=tree.v)
sets up a list trees of 3 components using the existing objects tree.d, tree.h and tree.v
for the components and giving them names as specified by the argument names (which can be
chosen freely). If these names are omitted, the components are numbered only.
Lists can be attach-ed as well as directories, and this allows their components to be accessed
as if they were stand-alone entities. Thus in the trees example we could have
> attach(trees)
> mean(height)
It is wise to detach("trees") after use to avoid any nasty surprises.
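The difference between the two subscripting operators is easiest to see on a small list. In this sketch the three short vectors hold illustrative values only, and the list is called trees2 (a hypothetical name) so that it does not mask any existing trees object.

```r
# Illustrative values only; three measurements per component.
tree.d <- c(8.3, 8.6, 8.8)
tree.h <- c(70, 65, 63)
tree.v <- c(10.3, 10.3, 10.2)
trees2 <- list(diam=tree.d, height=tree.h, volume=tree.v)

trees2$d        # component name abbreviated to its shortest unique form
trees2[[1]]     # the first component itself: a numeric vector
trees2[1]       # a sub-list of length one containing that component
```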
4.9 Data Frames
Data frames were introduced in the August 1991 release of S, and can be thought of as
closely-coupled lists of data vectors of the same length. Unlike matrices, the data vectors can be of
different types, including character data. Both the rows and columns can be labelled. Consider
the data frame road from library(ripley):
> road
        deaths drivers popden rural temp  fuel
Alabama    968     158   64.0  66.0   62 119.0
Alaska      43      11    0.4   5.9   30   6.2
..............................
Mo        1289     234   63.0 100.0   40 180.0
Mont       259      38    4.6  72.0   29  31.0
which has both row and column labels. The columns can be treated as components of a list:
> road$rural
 [1]  66.0   5.9  33.0 ...
and the structure can be treated as a two-dimensional array:
> road[2, 4]
 Alaska
    5.9
> road["Mo", "temp"]
 Mo
 40
> road["Mo", ]
   deaths drivers popden rural temp fuel
Mo   1289     234     63   100   40  180
Note how the row label is carried along.
Data frames can be attach-ed just as lists can, and this allows their columns to be accessed as
if they were named vectors.
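The list-like and array-like views of a data frame can be compared on a two-row example. This sketch builds a hypothetical data frame called small from the deaths and temp entries of the first two rows of the listing above; the row.names argument is assumed to be available as it is in R.

```r
# Two rows and two columns taken from the road listing; "small" is hypothetical.
small <- data.frame(deaths=c(968, 43), temp=c(62, 30),
                    row.names=c("Alabama", "Alaska"))

small$temp                # a column, treated as a component of a list
small[2, 1]               # array-style subscripts: row 2, column 1
small["Alaska", "temp"]   # the same idea using row and column labels
small["Alaska", ]         # a whole row, still a data frame
```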
A data frame can be created from vectors and matrices by the data.frame function. For
example:
> treeframe <- data.frame(diam=tree.d, height=tree.h, volume=tree.v)
If the columns are not named, they pick up the names of the vectors, so
> treeframe <- data.frame(tree.d, tree.h, tree.v)
gives
Data objects will usually be read as values from external files. This is done most conveniently
with the scan(...) function. To read a vector from the keyboard we can use
> counts <- c(...)
or
> counts <- scan()
...
Input is terminated by a blank input line (from the terminal only, despite the documentation) or
by EOF (ctrl-D in Unix). To read in a character vector we specify the vector type by the second
argument:
> diet <- scan("", "")
...
To read from a file specify its name as the first argument, for example
> counts <- scan("chd.dat")
Now suppose that multiple data vectors of equal length are to be read in parallel. For example
suppose that there are three vectors, the first of mode character and the remaining two of mode
numeric, and the file is input.dat. Use scan(...) to read in the three vectors as a list, as
follows
> inp <- scan("input.dat", list(id="", x=0, y=0))
The second argument is a dummy list structure that establishes the mode of the three vectors to
be read. The result, held in inp, is a list whose (named) components are the three vectors read
in.
Matrices are usually read by row, as follows
> X <- matrix(scan("light.dat"), ncol=5, byrow=T)
The argument skip to scan can be used to skip header rows of files.
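Reading parallel vectors and a row-ordered matrix can be tried without any pre-existing files. In this sketch the two data files are temporary files written by cat, with made-up contents; the names f and g are hypothetical.

```r
# Write a small three-column file: character, numeric, numeric.
f <- tempfile()
cat("a 1 2.5\nb 2 3.5\nc 3 4.5\n", file=f)
inp <- scan(f, list(id="", x=0, y=0))   # dummy list fixes the modes

# Write a file laid out by row and read it as a 2 by 5 matrix.
g <- tempfile()
cat("1 2 3 4 5\n6 7 8 9 10\n", file=g)
X <- matrix(scan(g), ncol=5, byrow=T)
```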
Data frames can be read from a file by the read.table function. The data file should be a
table in one of a number of formats:
1. A file such as rotifer.dat (page 39) which has a first row naming the columns,
followed by the table of numeric data can be read by
> rotifer <- read.table("rotifer.dat", header=T)
2. A file laid out like the listing of a data frame. This has a first header line, and rows which
contain the row label followed by the data for the columns, such as
        deaths drivers popden rural temp fuel
Alabama    968     158     64    66   62  119
Alaska      43      11    0.4   5.9   30  6.2
..............................
Note that the header has one less entry than subsequent rows. This format is read by
> road <- read.table("road.dat")
3. A table without any header. The row and column labels are then 1, ..., n and V1,
..., Vp. However, if there exists a character column without duplicates, the first such is
taken as the row labels and removed as a column.
Sometimes it is necessary to read in character strings which contain spaces. This can be done
by separating the fields in the file by, for example, tabs or commas:
> usroad <- scan("road.dat", sep="\t", list(state="", deaths=0,
+ drivers=0, popden=0, rural=0, temp=0, fuel=0))
where \t is the usual Unix abbreviation for a tab character. This device also applies to read.table.
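The second file layout and the tab-separated case can both be tried on temporary files. A sketch under stated assumptions: the file contents are made up (two columns from the road listing, and a hypothetical state name containing a space), and road2 and us are hypothetical names.

```r
# Layout 2: the header has one entry fewer than the data rows,
# so the first field of each row becomes the row label.
f <- tempfile()
cat("deaths temp\nAlabama 968 62\nAlaska 43 30\n", file=f)
road2 <- read.table(f)

# Tab-separated fields allow a character value that contains a space.
g <- tempfile()
cat("state\tdeaths\nNew State\t10\n", file=g)
us <- read.table(g, header=T, sep="\t")
```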
5.1 Writing out data
There are many ways to write out data from S, for example the print, cat and format
commands. To write directly to a file, there are cat, write and, from S-Plus 3.2, write.table
which is usually the simplest method. This can write a dataframe, matrix or vector, with syntax
> write.table(data, file="", sep=",")
and further arguments can be found in the help page. By default it writes out comma-separated
items on rows, but the separator can be changed to space or tab ("\t" in Unix).
The function write writes a vector, with syntax
> write(data, file="data", ncolumns=5)
for numeric data, and in one column for character data. To write out a matrix m, use
> write(t(m), file="data", ncolumns=ncol(m))
The function format converts data to a line of characters, and can be used with write or cat
to construct custom reports.
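The transpose in write(t(m), ...) is needed because the data vector of a matrix runs down the columns, while the file is laid out by row. The round trip below is a sketch using a temporary file; f, back and labs are hypothetical names.

```r
m <- matrix(1:6, 2, 3)

f <- tempfile()
write(t(m), file=f, ncolumns=ncol(m))            # one matrix row per line
back <- matrix(scan(f), ncol=ncol(m), byrow=T)   # read it back by row

labs <- format(c(1, 10, 100))   # character strings padded to a common width
cat(labs, "\n")
```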
6 Graphics
The graphical facilities are central to S. The steps involved are as follows:
Functions producing graphical output usually have optional additional named arguments that
can be specified to override some default parameter settings and hence modify the character-
istics of a plot. A short list of the main ones is as follows:
1 ;B,"%I$í If Ú
* ©+ all axes are suppressed. Default ¾ ,-.+ , axes are automatically constructed.
9=.-',$W < Type of plot desired. Values for £ are:
ª for points only, (the default for function ª3« ¯· ),
« for lines only,
´ for both points and lines, (the lines miss the points),
º© for step functions ( specifies to change now, © to change just before the
¯ for overlaid points and lines,
next point),
Other graphical parameters control the background characteristics of all subsequent plots and
are usually specified by a call to the function par(...). There are a great number of these
parameters and the command
> help(par)
gives a complete list of them and their meanings. Some of the more commonly adjusted ones
are as follows:
lty=n       Line type is n. If lines are being plotted, a variety of line types is available;
            n=1 means a solid line, n=2,3,... indicates a variety of broken line forms.
pch="c"     Specify the character to be used for plotting points (default: 1 for graphics
            terminals, 2 for PostScript).
mfrow=c(m,n), mfcol=c(m,n)
            Multiple frames on the one plot. Instead of plotting just one graph per screen,
            each screen (or page) will contain an array of m × n graphs forming an m × n
            grid. If mfrow is used the screen is filled row-by-row and if mfcol is used it is
            filled column-by-column. Useful if many graphs are to be inspected simultaneously
            and high resolution is not necessary.
pty="c"     Specify the type of plotting region currently in effect. Possible values for c are
            s to generate a square plotting region;
            m (the default) to generate a maximal size plotting region.
6.2 Some Basic Plotting Functions
plot(x, y, ...)     Scatter plot of points with x- and y-coordinates given by the two
                    main parameters. The pair x, y may be replaced by a single list with
                    components labeled x and y, called a ‘plot list’.
                    Graphical parameters are particularly useful.
points(x, y, ...)   Add points to an existing plot (possibly using a different plotting char-
                    acter). Follows on from a plot(...) command.
lines(x, y, ...)    Add lines to an existing plot. Similar to points.
> plot(x, y); lines(spline(x, y))
Note
S-Plus allows users to interact with plots, by identifying points and by adding information at
places selected by mouse clicks.
identify(x, y, labels)
            On a current plot of x, y, clicking the LEFT mouse button places
            the appropriate string from labels near the point which has been
            clicked on. Click the MIDDLE mouse button to finish. If labels
            is omitted, index numbers are used; the indices of the selected
            points are always returned.
locator()   Returns a list of vector coordinates of points clicked by the LEFT
            mouse button. Click the MIDDLE mouse button to finish.
locator(type="p")
            ditto, but plots the points as in plot.
legend(locator(1), ...)
            Add a legend box at a mouse-selected point (one LEFT click). See
            help page for the box contents and other options.
locator() is often used with text to add annotation to plots, e.g.
> text(locator(1), "controls"); text(locator(1), "cases")
6.4 Brush and Spin
These are S-Plus enhancements to allow dynamic manipulation of graphs. Spin allows three
columns chosen from a matrix of data vectors to be rotated in space.
> help(state)
> spin(state.x77)
Use the left mouse button to select three of the variables, then use the cross-shaped pad to rotate
the point cloud. Finally click on quit.
> brush(state.x77, hist=T)
includes spin and a pairs plot. Additionally one can ‘brush’ by selecting points with the left
mouse button, and de-selecting them with the middle button. One can mark points in different
ways, with the four symbols, and even label points if label is selected.
> brush(rbind(iris[,,1], iris[,,2], iris[,,3]))
Now select the first 50 points with one symbol and the last 50 with another. The intermediate
nature of the middle 50 then stands out.
Figure 1: Screen dump of an openlook() window displaying brush on the iris data, with
different highlights for the three groups.
7 Statistical Summaries
Standard summaries such as mean, median and var are available. The var function will take a
data matrix and give the variance-covariance matrix, and cor computes the correlation matrix,
either from two vectors or a data matrix.
There are also standard functions max, min, range and quantile. The functions mean and cor
will compute trimmed summaries. More sophisticated robust summaries are available, such as
location.m and scale.tau as well as via the robust library.
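As an illustration of what a trimmed summary does, here is a minimal Python sketch of a trimmed mean. It assumes the common convention that a fraction trim of the observations, rounded down to a whole number, is dropped from each end of the sorted data; the function name trimmed_mean is invented for this sketch.

```python
def trimmed_mean(x, trim=0.0):
    """Mean after dropping a fraction `trim` of the observations from each end."""
    if not 0.0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    xs = sorted(x)
    drop = int(trim * len(xs))          # number removed from each tail
    kept = xs[drop:len(xs) - drop]
    return sum(kept) / len(kept)


data = [1, 2, 3, 4, 100]
print(trimmed_mean(data))               # ordinary mean, pulled up by the outlier
print(trimmed_mean(data, trim=0.2))     # one value trimmed from each end
```

With one wild value among five, trimming 20% from each end removes it and the summary falls back to the centre of the remaining data.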
The standard histogram function is hist(x, ...) which plots a conventional histogram. More
control is available via the extra parameters. The parameter probability=T gives a plot of unit
area rather than cell counts, and nclass sets the number of bins.
Densities can be estimated via the function density:
hist(hstart, nclass=20, probability=T, ylim=c(0, 0.02))
lines(density(hstart))
lines(density(hstart, width=20), lty=3)
See figure 2.
[Figure 2: histogram of hstart with density estimates overlaid; horizontal axis hstart.]
L
9 # 1 !39W($+-,"%¸ o r 6diæ oIr ç6MçA
, B
G (
o
, < ( 0W1 +'4()-9é(=% '+ 1=< | , 9B4î9-&,î!(*=&-9s4/J9-&, < 4=+=4I)
r åÝr
å
Üv åÝi=r=ir kv w
å w-r w
ç å¸io k-kk r v=Ü vv
ow å i r w
o=o å =Ix r Ü
o å /x k Ü$ç v-v$v=v
o ki å =io-o xBkÜ=Ü=Ü-Ü r v ç-w-w
o å¸o i k=x=k=x-k x vÜ w=w
oIx r å i-i=o ki r x=x w=w=w
o å =oAr x ç
o vÜ å w
o å Ü=vÜ
o ç å v=i v
w å o k=k-k r v
i=o å k x=x Ü-Ü
i å ç
i=ki å¸ÜAo ç
i x
Apart from giving a visual picture of the data, this gives more detail. The actual data, in sorted
order, is roughly ÝÜ-iæÏÜ-iæÝÜ ÏÜ-wæÏPDPDP and this can be read off the plot. Sometimes the
pattern of numbers (all odd?) gives clues. Quantiles can be computed (roughly) from the plot.
7.3 Boxplots
A boxplot is a way to look at the overall shape of a set of data. The central box shows the data
between the quartiles, with the median represented by a line. ‘Whiskers’ go out to the extremes
of the data, and very extreme points are shown by themselves. It is also possible to plot boxplots
for groups side-by-side:
> library(ripley)
> boxplot(split(nottem, cycle(nottem)), names=month.abb)
divides a time-series into months, and plots the boxplots for each month on one plot. See
figure 3. Other styles of boxplot are available; see the help page.
[Figure 3: side-by-side boxplots of nottem for each month, Jan to Dec.]
8 Distributions
S has built-in functions to compute (approximately) the density, cumulative distribution function
and quantile function (the inverse of the CDF) for many standard distributions. There are also
functions to simulate samples from these distributions. The first letter of the name indicates
the function, e.g. dnorm, pnorm, qnorm, rnorm respectively.
Distributions available are:
The function sample re-samples from a data vector, with or without replacement.
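For readers who want to try the four kinds of function outside S, the Python standard library offers rough analogues for the normal distribution. The sketch below is only an illustration of the density / CDF / quantile / random-sample pattern and of re-sampling; the variable names are invented here and none of this is S code.

```python
import random
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

density = std_normal.pdf(0.0)            # analogue of the "d" function at 0
probability = std_normal.cdf(1.96)       # analogue of the "p" function at 1.96
quantile = std_normal.inv_cdf(0.975)     # analogue of the "q" function at 0.975

rng = random.Random(1)                   # fixed seed so the run is repeatable
draws = [rng.gauss(0.0, 1.0) for _ in range(5)]        # analogue of the "r" function

# Re-sampling from a data vector, with and without replacement.
data = [3, 1, 4, 1, 5, 9, 2, 6]
with_replacement = rng.choices(data, k=len(data))
without_replacement = rng.sample(data, k=len(data))    # a random permutation

print(round(density, 4), round(probability, 4), round(quantile, 4))
print(with_replacement, without_replacement)
```

The quantile function undoes the CDF: applying cdf to the value returned by inv_cdf(0.975) gives back 0.975 up to rounding error.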
One of the best ways to compare the distribution of a sample x with a distribution is to use a
Q-Q plot, of which the normal probability plot is the best-known example. Q-Q plots can also
be used to compare two samples. For a sample x the quantile function is the inverse of the
empirical CDF, that is
$$\mathrm{quantile}(p) = \min\{\, x : \text{proportion } p \text{ of the data} \le x \,\}$$
The function qqplot(x, y, ...) plots the quantile functions of two samples x and y against
each other, and so compares two samples. The function qqnorm(x) replaces one of the samples
by a sample at the quantiles of a standard normal distribution. This idea can be applied quite
generally. For example, to test a sample against a t_9 distribution, we use
plot(qt(ppoints(x), 9), sort(x))
where ppoints computes the appropriate set of probabilities for the plot.
The function qqline helps assess how straight a qqnorm plot is by plotting a straight line
through the upper and lower quartiles. (See the example in Section 3.)
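The coordinates behind a normal Q-Q plot are easy to compute by hand. The Python sketch below pairs the sorted sample with standard normal quantiles at plotting positions (i - 1/2)/n; that particular choice of positions is an assumption of this sketch (other conventions exist), and the function names simply echo the S ones.

```python
from statistics import NormalDist


def ppoints(n):
    """Plotting positions (i - 1/2)/n for i = 1..n (an assumed convention)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def qq_normal(x):
    """Return (theoretical quantile, observed value) pairs for a normal Q-Q plot."""
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf(p) for p in ppoints(len(x))]
    return list(zip(theoretical, sorted(x)))


x = [2.1, -0.3, 0.8, 1.5, -1.2, 0.1, 0.4]
pairs = qq_normal(x)
for q, v in pairs:
    print(f"{q:7.3f} {v:6.2f}")
```

Plotting the second column against the first gives the Q-Q plot; for a comparison with another distribution only the quantile function changes (the Python standard library has none for the t distribution, so that case is not shown).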
9 Classical Statistics
S-Plus 3.1 has a section on classical statistics. The same functions are used to perform tests
and to calculate confidence intervals.
The table shows the amount of wear in a shoe experiment with 10 boys, an experiment reported
in Box, Hunter & Hunter (1977), Statistics for Experimenters. There were two materials (A and
B) that were randomly assigned to the left or right shoe.
boy       A            B
 1     13.2 (L)     14.0 (R)
 2      8.2 (L)      8.8 (R)
 3     10.9 (R)     11.2 (L)
 4     14.3 (L)     14.2 (R)
 5     10.7 (R)     11.8 (L)
 6      6.6 (L)      6.4 (R)
 7      9.5 (L)      9.8 (R)
 8     10.8 (L)     11.3 (R)
 9      8.8 (R)      9.3 (L)
10     13.3 (L)     13.6 (R)
We can use these data to illustrate one-sample and paired and unpaired two-sample tests. The
rather voluminous output has been edited:
® ¤ ¯ §I°ÂIÄ/£I¢¦ ² º«-¥* ·² »I½ºVA»I½ ³I³
E¼ Wö¼pÐÀFÊë¼
Àƽ
ÐXWYÀFZ Ê "À
XWö¼/¼BÀFÊë¼½À
XWö[¼
ÀFÐë¼
ÀÆÊ
XWö¼p½"À ë¼I¼BÀ
¼I\¼ W]
À
^
À
¼_Ð W]À ^"À
¼ _W¼½"À ¼I¼ÀFÐ
¼ _W]ÀÆ^ Ð "À
¼ _W¼ÐÀFмÐ"À
Ê\¼ W
® ¢ ·/· ¢I£p¤ ² ¤ ¯ §I ³
®¸· À · §I ·² º ¡3¬=»"¼½ ³
` ¦"§IÄ/I¢8¡$ª=«$§ · ÄD¾-§/ ·
¹ ¢ · a¢ W
· »½À¼pÊÔº ¹ Çî»^Ôºª"ÄD¿-¢«¬"§¸»|½ÀAÐ/Ð
¢« · §Tµ*¦¢ · ¥p¿-§¤$¶*ª ¯*· ¤§IA¥*aW · µ¬§¡§I¢¦Ã¥*Ϧ ¯· § Ì ¬¢« ·A¯ ¼½
ª"§Dµ-£/§¦ · £ ¯ ¦$Ç"¥ ¹ §¦£/§|¥¦ · §TµI¿-¢«XW
À
b$Ê Þ¼pÊÀFÐ IÐ IÐ
I¢d¡3ª3«3§§/ · ¥t¡B¢ · §IaW
¡§I¢p¦ ¯ Ç Ó
¼p½À
IÐ
®¸· À · §I ·² ³ £ ¯ ¦$Ç
Àt¥¦ ·
[¼ À"
AÊH¼pÊÀÆÐIÐ/Ð
¢ ·/· µ ² º Ñ £ ¯ ¦AÇ
ÀÆ«3§D¿=§« Ñ*³ W
[¼ |½À
®¸É ¥T«3£ ¯Ó À · §I ·² º ¡3¬=»"¼½ ³
+ Ó ¢I£ ·Zc ¥T«3£ ¯Ó$¯ ¦H$¥pÈ*¦§ ¹ ÄTµ-¢¦
± · I§ ·
¹ ¢ · ¢aW
$¥pÈ*¦"§ ¹ ÄTµ-¢p¦ ± · ¢ · ¥ · ¥£edî»Ðºz¦H»ë¼½Ôºzª"ÄD¿-¢«¬"§¸»½À
¢« · §Tµ*¦¢ · ¥p¿-§¤$¶*ª ¯*· ¤§IA¥*aW · µ¬§¡$¬s¥Ý¦ ¯· § Ì ¬"¢« ·A¯ ¼½
®¸· À · §I ·² ºfV ³
© · ¢¦ ¹ ¢Dµ ¹ ¾ AÉ ¯ Ä ©3¢8¡$ª=«$§ · ÄD¾-§/ ·
¹ ¢ · a¢ W ¢¦ ¹ V
· »×ĽÀFÐ
º ¹ ÇºªÄT¿-¢«¬§ø»×½À""¼
¢« · §Tµ*¦¢ · ¥p¿-§¤$¶*ª ¯*· ¤§IA¥*aW · µ¬§ ¹ ¥ÇIÇ=§Dµ=§¦£/§|¥¦ø¡B§I¢p¦|¥Ï¦ ¯*· § Ì ¬"¢« ·$¯ ½
ª"§Dµ-£/§¦ · £ ¯ ¦$Ç"¥ ¹ §¦£/§|¥¦ · §TµI¿-¢«XW
ÄÊÀ IbÊ ¼B"À /Ê IÊ
I¢d¡3ª3«3§§/ · ¥t¡B¢ · §IaW
¡§I¢p¦ ¯ Ç Ó ¡§I¢¦ ¯ Ƕ
¼p½À
IÐ ¼I¼ÀF½
· À · §/ ·² º]Vºz¿-¢Tµ
ÀM§ Ì ¬"¢«/»IÚ ³
®
c §«3£p^
¤ g ¯/¹ ¥Ç"¥*§ ¹ ¾ AÉ ¯ Ä©$¢8¡$ª=«3§ · ÄT¾-§/ ·
¹ ¢ · a¢ W ¢¦ ¹ V
· »×ĽÀFÐ
º ¹ ÇÀ"ÔºQªÄT¿-¢«¬§ø»×½À""¼
¢« · §Tµ*¦¢ · ¥p¿-§¤$¶*ª ¯*· ¤§IA¥*aW · µ¬§ ¹ ¥ÇIÇ=§Dµ=§¦£/§|¥¦ø¡B§I¢p¦|¥Ï¦ ¯*· § Ì ¬"¢« ·$¯ ½
ª"§Dµ-£/§¦ · £ ¯ ¦$Ç"¥ ¹ §¦£/§|¥¦ · §TµI¿-¢«XW
ÄÊÀ ./½
¼B"À /Ê /½ .
I¢d¡3ª3«3§§/ · ¥t¡B¢ · §IaW
¡§I¢p¦ ¯ Ç Ó ¡§I¢¦ ¯ Ƕ
¼p½À
IÐ ¼I¼ÀF½
®¸· À · §I ·² ºfVºzª¢A¥µ-§ ¹ »¾ ³
h ¢A¥µ=§ ¹· ÄD¾-§/ ·
¹ ¢ · ¢aW ¢¦ ¹ V
· »×ÄÐÀFÐb. º ¹ Çî»^ÔºRªÄD¿=¢«D¬§¸»|½ÀƽI½
¢« · §Tµ*¦¢ · ¥p¿-§¤$¶*ª ¯*· ¤§IA¥*aW · µ¬§¡§I¢¦ ¯ Ç ¹ ¥pÇIÇ-§Tµ-§p¦£I§/¥Ï¦ ¯*· § Ì ¬"¢« ·$¯ ½
ª"§Dµ-£/§¦ · £ ¯ ¦$Ç"¥ ¹ §¦£/§|¥¦ · §TµI¿-¢«XW
Ľ/ Ð
ĽÀd¼Ð/ÐI½
¼
À/ÀIÀ
Figure 4: Histogram and empirical CDF of the permutation distribution of the t-test in the shoes
example. The density and CDF of t_9 are shown overlaid.
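The text does not show how the permutation distribution was produced, so the following Python sketch is only one plausible way to obtain it for the paired comparison. It assumes the usual argument that, under the null hypothesis, the sign of each within-boy difference is arbitrary, and it enumerates all 2^10 sign assignments. The wear values are the Box, Hunter & Hunter data for materials A and B.

```python
from itertools import product
from math import sqrt

# Wear for materials A and B on the ten boys.
A = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
B = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]
d = [round(a - b, 1) for a, b in zip(A, B)]      # within-boy differences


def t_stat(diffs):
    """One-sample t statistic for the hypothesis that the mean difference is zero."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((v - mean) ** 2 for v in diffs) / (n - 1)
    return mean / sqrt(var / n)


t_obs = t_stat(d)

# Permutation distribution: the t statistic under every assignment of signs.
perm = [t_stat([s * v for s, v in zip(signs, d)])
        for signs in product((1, -1), repeat=len(d))]

# Two-sided permutation p-value (small tolerance so exact ties are counted).
p_value = sum(abs(t) >= abs(t_obs) - 1e-9 for t in perm) / len(perm)

print(round(t_obs, 4), len(perm), round(p_value, 4))
```

A histogram of perm compared with the t_9 density is the kind of display the figure describes.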
binom.test       chisq.test       cor.test           fisher.test
friedman.test    kruskal.test     mantelhaen.test    mcnemar.test
prop.test        t.test           var.test           wilcox.test
Many of these have alternative methods – for cor.test there are methods "pearson",
"kendall" and "spearman".
10 Handling Categorical Data
Consider a (fictitious) survey of shoppers in Britain. Amongst the variables collected for each
person surveyed are sex, age, TV area(1), social class(2), transport used for this trip to the shops,
and total spend at supermarkets. The possible values of these variables are
sex: M, F
age: –24, 25–44, 45–59, 60+
TV area: 1, ..., 12
social: A, B, C1, C2
transport: car, bus, cycle, foot
spend: positive continuous
This provides examples of each of S’s types of categorical data structure. There are two main
structures, categories and factors. The latter were introduced in the August 1991 release, and
have almost entirely superseded the use of categories. A factor is regarded as a vector over the
set of levels which have no implied order. Thus sex, TV area and transport are all factors. How-
ever, TV area is coded by number rather than by the names of the companies. These variables
can be declared as
sex <- factor(sex.data)
tv.area <- factor(tv.data)
transport <- factor(transport.data)
Internally in S levels are numbered in alphabetical order, and when factors are used as treat-
ments in designed experiments, the order of levels may matter. For example, if we want to
contrast females with males (rather than vice versa) we need to specify the levels of the factor
explicitly:
> sex <- factor(sex.data, levels=c("M", "F"))
Social class is an ordered factor in that the classes are perceived as ordered, with “A” (profes-
sionals) regarded as highest. We can declare an order by
social <- ordered(factor(social.data))
levels(social) <- levels(social)[4:1]
age <- ordered(factor(age.data),
               levels=c("-24", "25-44", "45-59", "60+"))
The first line orders the levels by the default (alphabetical) order. The second shows how the
set of levels may be changed, in this case by reversing the existing ordering. Age is an ordered
category for which it is necessary to specify the levels explicitly. Had age.data been specified
as a continuous variable, it could have been categorized using cut (whose help page gives other
ways to produce the categories):
age.cdata <- cut(age.data, c(0, 25, 45, 60, 99))
age <- ordered(factor(age.cdata),
               levels=c("-24", "25-44", "45-59", "60+"))
(1) Britain is covered by 12 commercial TV companies, so this provides a simple geographical variable.
(2) Derived from occupation.
Some of the functions for statistical models treat ordered factors in appropriate special ways.
10.1 The Function tapply(...) and Ragged Arrays
To continue the previous example, suppose we want to summarize spend by some of the
factors. To calculate the sample mean spend for each age group we can now use the special
function tapply(...):
u\6
Pt q 9EtuPu\vwyx uNtE
qt}
giving a means vector with the components labeled by the levels
u\6
Pt
P qqq \ qPq q
6 P 6 q 6 \ b 6 q
Suppose further we needed to calculate the standard errors of the mean spends. To do this we
need to write an S function to calculate the standard error for any given vector. We discuss
functions more fully in Section 12, but since there is an inbuilt function var(...) to calculate the
sample variance, such a function is a very simple one-liner, specified by the assignment:
9q PjP \¡I9¢34£xM;} ¤ $ 9yxt5\t.xU;}¥qv.q$9P¦£xM;}.}
After this assignment, the standard errors are calculated by
u\6 9. q E9 tuPu\vwyx u\qyItE 9.qE}
and the values calculated are then
uE6 9q
P qqq \qPq q
6 6 P 6 P 6
The function tapply(...) can be used to handle more complicated indexing of a vector by
multiple factors. For example, we might wish to split the spend by both age and sex:
9EtuPu\vwyx u\qyIv¢ 9yxtE /;}_
Pt}
The combination of a vector and a labelling factor is an example of what is called a ragged
array, since the subclass sizes are possibly irregular. When the subclass sizes are all the same
the indexing may be done implicitly and much more efficiently by using arrays. The function
apply is the analogue of tapply for arrays.
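The ragged-array idea can be mimicked in a few lines of Python. The sketch below is an illustration only: the function names tapply, mean and stderr echo the S ones but are defined here, and the survey records are invented for the example. It groups a vector by one or more factors, applies a function to each cell, and uses the standard error sqrt(var(x)/length(x)).

```python
from collections import defaultdict
from math import sqrt


def tapply(values, factors, fun):
    """Apply fun to the values in each cell defined by the factor(s).

    `factors` is a list of equal-length label sequences. Cells are keyed by
    the tuple of labels, so unequal cell sizes (a ragged array) are fine.
    """
    cells = defaultdict(list)
    for value, key in zip(values, zip(*factors)):
        cells[key].append(value)
    return {key: fun(vals) for key, vals in sorted(cells.items())}


def mean(x):
    return sum(x) / len(x)


def stderr(x):
    """Standard error of the mean, sqrt(var(x) / length(x)); needs len(x) >= 2."""
    m = mean(x)
    var = sum((v - m) ** 2 for v in x) / (len(x) - 1)
    return sqrt(var / len(x))


# Hypothetical survey records, invented for this sketch.
spend = [20.0, 30.0, 25.0, 45.0, 55.0, 40.0, 60.0]
age = ["-24", "-24", "25-44", "25-44", "25-44", "60+", "60+"]
sex = ["M", "F", "F", "M", "F", "M", "M"]

means_by_age = tapply(spend, [age], mean)
se_by_age = tapply(spend, [age], stderr)
means_by_age_sex = tapply(spend, [age, sex], mean)

print(means_by_age)
print(se_by_age)
print(means_by_age_sex)
```

Only the cells that actually occur appear in the result, which is the sense in which the array is ragged.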
The pattern of our survey can be seen by the table function, which takes a listing of factors
and returns the contingency table as an array, e.g.
9Et§Evq_x I;¨It\I©PªC6«tPt 4P¡¢.tPvÙ9t u4.$9}
12 Writing Your Own Functions
As we have seen informally in Section 10.1, the S language allows the user to create his or her own
functions. These are true S functions that are stored in a special internal form and may be used
in further expressions and so on. In the process the language gains enormously in power, conve-
nience and elegance. Most of the functions supplied as part of the S system, such as mean(...)
and var(...) and so on, are themselves written in S and thus do not differ materially from user-
written functions. (However, increasingly such functions are being re-written as internal func-
tions to gain efficiency.) Listing these functions (by printing their name without parentheses)
is a very fruitful way to gain hints for writing your own functions.
A function is defined by an assignment of the form
name <- function(arg1, arg2, ...) expression
The expression is an S expression (usually a grouped expression) that uses the arguments,
arg_i, to calculate a value. The value of the expression is the value returned for the function. A
call to the function then takes the form name(expr1, expr2, ...) and may occur anywhere a
function call is legitimate.
For example, the IQR function in library(robust) is defined as:
IQR <- function(y) {
    q <- quantile(y, c(0.25, 0.75))
    q[2] - q[1]
}
This first computes the quartiles, then returns the last value computed, their difference.
Note that any ordinary assignments done within the function are temporary and lost after exit
from the function. Thus q is not left behind, and does not affect any other object q.
If global and permanent assignments are intended within a function, then the ‘superassign-
ment’ operator, ‘<<-’ can be used. See the help documentation for details, and see also the
synchronize() function.
As a second example of a useful function, consider a function to evaluate the ‘Huber proposal
2’ robust estimator(s) of location and/or scale:
hubers <- function(y, k = 1.5, mu, s, initmu = median(y), tol = 1.0e-6)
{
    y <- y[!is.na(y)]
    n <- length(y)
    if(missing(mu)) {
        mu0 <- initmu
        n1 <- n - 1
    } else {
        mu0 <- mu
        mu1 <- mu
        n1 <- n
    }
    if(missing(s)) {
        s0 <- mad(y)
    } else {
        s0 <- s
        s1 <- s
    }
    th <- 2 * pnorm(k) - 1
    beta <- th + k^2 * (1 - th) - 2 * k * dnorm(k)
    repeat {
        yy <- pmin(pmax(mu0 - k * s0, y), mu0 + k * s0)
        if(missing(mu)) mu1 <- sum(yy)/n
        if(missing(s)) {
            ss <- sum((yy - mu1)^2)/n1
            s1 <- sqrt(ss/beta)
        }
        if((abs(mu0 - mu1) < tol * s0) && abs(s0 - s1) < tol * s0)
            break
        mu0 <- mu1
        s0 <- s1
    }
    list(mu = mu0, s = s0)
}
This allows either of the location and scale to be specified. Optional arguments are the
parameter k, the initial value for the location and a convergence tolerance. The first line
removes all missing values. The missing() function checks if a parameter is supplied. Two
constants are then calculated as functions of k. The rest of the function is a loop. In general
loops are inefficient in S and should be avoided if at all possible, but here we have no choice
as the calculation is iterative. Finally the function returns two components, the location and scale.
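To see the iteration in a form that can be run outside S, here is a minimal Python sketch of the same idea for the case where both location and scale are estimated. The function name huber_proposal2, the max_iter guard and the 1.4826 rescaling of the MAD starting value are assumptions of this sketch rather than details taken from the S function.

```python
from statistics import NormalDist, median


def huber_proposal2(y, k=1.5, tol=1e-6, max_iter=1000):
    """Huber 'proposal 2' estimates of location and scale (both unknown).

    Missing values are assumed to have been removed from y already.
    """
    n = len(y)
    std_normal = NormalDist()
    # Starting values: the median, and the MAD rescaled to be consistent
    # for the normal distribution.
    mu0 = median(y)
    s0 = 1.4826 * median(abs(v - mu0) for v in y)
    # Two constants depending only on k, chosen so that the scale estimate
    # is consistent at the normal distribution.
    th = 2 * std_normal.cdf(k) - 1
    beta = th + k ** 2 * (1 - th) - 2 * k * std_normal.pdf(k)
    for _ in range(max_iter):
        # Pull observations further than k*s0 from mu0 back to the boundary.
        yy = [min(max(mu0 - k * s0, v), mu0 + k * s0) for v in y]
        mu1 = sum(yy) / n
        s1 = (sum((v - mu1) ** 2 for v in yy) / ((n - 1) * beta)) ** 0.5
        converged = abs(mu0 - mu1) < tol * s0 and abs(s0 - s1) < tol * s0
        mu0, s0 = mu1, s1
        if converged:
            break
    return mu0, s0


clean = [-2.0, -1.0, 0.0, 1.0, 2.0]
contaminated = [1.0, 2.0, 3.0, 4.0, 100.0]
print(huber_proposal2(clean))
print(huber_proposal2(contaminated))
```

On the contaminated sample the location estimate stays near the bulk of the data instead of being dragged towards the outlier, which is the purpose of the estimator.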
It is sometimes useful to be able to time commands:
¡u 9¢|
qq ð q \ ¡/9¢ µ £xt;\} b ¨
x ¢*;C6>9¢|
\_xM;}¹ . »}
q vPtu ðq \ ¡/9¢ µ £tx ;\} ¢ ;76F9¢Ñ
åxU;}_¹ »
which return the total cpu time and the elapsed time taken by a command or sequence of com-
mands enclosed in {...}. Note: as these are functions, assignments inside them are in the
frame of the function rather than permanent. Alternatively, use proc.time() before and after
a group of commands.
13 Statistical Models
These facilities form the heart of the 1991 version of S. They are based on object-oriented ex-
tensions, so that generic functions such as print know what to do with the results of various
models. The two most basic notions are a data frame (Section 4.9) and a model formula.
A model formula couples a y-vector with a model expressed in a terminology very similar to
that of GLIM and GENSTAT. The form is
loss ~ hard + tens
for the linear regression of loss on hard and tens. Factors are replaced by a set of in-
dicator variables for the regression, and can interact via the : operator (not . as this is a valid
character in a variable name). Thus we can have all the following constructs:
9¢|
ù u µ ¢ .µ 9Pt/9.
-9 u µ ¢ .µ · 9.tI9.
\$9 equivalent to
9¢|
ù u µ ¢ .µ üûî9Pt/9.
-9
9q-9q¦ ù wEt. ¥§ µ §P§¢| nested layout
Et¢ ù µ u ¢¢*9¢.tPv parallel lines
¡ µ E¡ ù\^ qt.\¢q line through the origin
¡ µ E¡ ù u µ vw£x«qt.¢bq£ } quadratic polynomial
¡ µ E¡ ù x[qt.\¢.¨ ¢b-9\.¡qu=9ý.©\} natural spline
¡ µ E¡ ù x«qtE¢bP\} smooth function, for èqÞ[Û
The syntax of a linear-model fit is
where the names in the model formula refer to columns of the data frame, which can be omitted
if it has already been attached. For example
library(ripley)
attach(rubber)
try.lm <- lm(loss ~ hard + tens)
summary(try.lm)
anova(try.lm)
coefficients(try.lm)
plot(fitted(try.lm), resid(try.lm))
This shows how to extract information from a fit by the use of ancillary functions. There are no
standard ancillary functions for standardized and Studentized residuals, but I have added them
as stdres() and studres() in library(ripley).
13.2 One-way Layouts
The analysis of a one-way layout is best illustrated by an example. The table gives data on ob-
served concentrations (ng/ml) of a chemical in groups of 10 patients after oral administration
of almitrine bismesylate:
P µ ½
x«þÿ}
§q¡I9 q P P qP
. P qP qP
\ q
q
\ q P P
qP
q P
P P
q |E b
P | qP
P P qP
§ µ ;Pu\v µ 9£x u\vq*¢ 9x¡¦b a¢¡Ptqv_Y µ }q} Make a factor from the doses
¡¦\ 6«t µ 5 P t µ 5yxv µ £x¡¦E| a¢.¡.tPv} ù µ u¸$¡¦Eb } and and the parameters
P
tw£xt µ 5xv µ yx[¡¦\|
å¢.¡.tPvq} ù v µ yx« µ } µb u£N¡¦Eb
}q}
test for linearity of response
which gives
¬ÑÛÛEÞ{ÈÔaÒÍ Å Ç[ÛåÙ&ÞÑÐ
ß
Ë
Ѭ ÛÐË
õ PÇÞÌ
õ PÞ«¬Ç ÈaÒqß
èÈЬ/ª ÚáàÙ"ÚGØØáÙZëXÙØàÃØEÙàáëØÇÊÑà
Ç/. Ï{ݬ Þ«3! á Ø"##áë_Ù ØàÙ"Ú
ÍbÐ.Ç|ËËÏÍÏÇÌÎ- ÒÍ Å Ç[Ûå Ù&ÞÑÐ
ß
Ò"$ÌÎPÇ|ÈqÍÇp.ª Îß èÈЬ/ª Ø èÈЬ/qª ë èÈЬ/%ª
ØÚ_Ù#Ú ëXÙ#Ú&Ù[ØØáá#&ëXÙ"áà
¬ÑÛÛEÞ{ÈÔaÒÍ Å Ç[ÛåÙ&ÞÑÐ
ß
Ë
Ѭ ÛÐË
õ PÇÞ
Ì
'
õ !PÞ «D¬Ç ÈåÒqß
èÈЬ/ª ëëXÙàØ(#_Ù"á.áàà#Ù#ëëáGØEÙÚÚØëÇÊqØÚ
/Ç .Ï{ݬޫ3!á XÙá#Ø$à_Ù[ØàëëÚ
¬ÑÛÛEÞ{ÈÔaÒÞbÐ
åÒt«ÐèaÒÍ Å Ç[ÛÏÍÞ«.ß*)¸«ÐèaÒºÝÐ3Çß î èÈ.ÐD¬IªÕÍ Å Ç[Û.ßß
Ë
¬ÑÛÐ
Ë
õ PÇÞÌ
õ PÞ«¬Ç ÈaÒPß
«ÐbèåÒ~Ý3Ð Ç.ß Ø ëØEÙ ##ZëØEÙ##ZëØ_Ù#áë àXÙàààààààà
èÈÐ ¬/ª ë ØEÙàá à àXÙÚ ëàë ÚXÙ[Ø ZàXÙàØà #Ú
/Ç .Ï{Ý ¬Þ «3!á XÙá #Ø àXÙ«Øàë
The parameterization of linear models for designed experiments is a little tricky. The usual
parameterization is to impose a ‘sum to zero’ constraint on the parameters for a factor. GLIM
sets the parameter for the first level to zero, so that parameters for the other levels are differ-
ences between that level and the first. By default S uses the Helmert parameterization, which
compares the second and subsequent levels to the average of lower levels. The usual parame-
terization can be obtained as default by setting
options(contrasts=c("contr.sum", "contr.poly"))
and the GLIM parameterization by
options(contrasts=c("contr.treatment", "contr.poly"))
Of course, the parameterization only affects the coefficients, not the fitted values, residuals,
z|z|z . µThe
contrasts for a particular term in a fit can be changed by the .x{} function, e.g.
å} or using ¡ µ =9t 9 .
.xº u¸
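The three parameterizations of a factor differ only in the coding matrix whose columns replace the factor in the regression. As a hedged illustration, the Python sketch below builds such matrices for a factor with n levels (rows are levels, columns are the n - 1 coded variables). The function names are modelled on the contrast names but are defined here; this is not S code.

```python
def contr_treatment(n):
    """GLIM-style coding: the first level is the baseline (all zeros)."""
    return [[1 if i == j + 1 else 0 for j in range(n - 1)] for i in range(n)]


def contr_sum(n):
    """Sum-to-zero coding: the last level is minus the sum of the others."""
    return [[(1 if i == j else 0) if i < n - 1 else -1 for j in range(n - 1)]
            for i in range(n)]


def contr_helmert(n):
    """Helmert coding: column j contrasts level j+2 with the levels below it."""
    return [[-1 if i <= j else (j + 1 if i == j + 1 else 0)
             for j in range(n - 1)] for i in range(n)]


for name, coding in (("treatment", contr_treatment(3)),
                     ("sum", contr_sum(3)),
                     ("helmert", contr_helmert(3))):
    print(name)
    for row in coding:
        print("   ", row)
```

The columns of the sum and Helmert codings each add to zero, whereas the treatment coding simply switches on one level at a time relative to the first; the fitted values are the same under all three.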
There is a ‘clever’ way to test for linearity using a re-parameterization of the factor u as an µ
ordered factor, for which the default parameterization is polynomial in / z|z|zq,0xvP/5\Pv }%1 .
(This relies on v £x[
µ µ } having levels in an arithmetic progression. One could always use
u µ vwyxv µ £x[ µ q}X } in place of v. µ .)
.v µ qüµ Px t.¡I9 µ x[v µ x[ µq }.}q}
P
twC6«vb
¨x[t µ 5£xv µ £x[¡¦\|
墡qtqvq} ù v. µ N¡¦Eb
}q}
(As far as I can see the use of summary.lm is necessary to get results for the individual coeffi-
cients.) This shows that the response can be regarded as quadratic in log(dose):
¬ÑÛÛEÞ{ÈÔÙF«ÛÒÞÑÐ
åÒt«ÐbèåÒÍ Å Ç«ÛÏ|ÍÞ«Pß*)Ï«ÝÐ3ÇÕ Í Å Ç[Û.ßß
2 Þ«I«43ÞbÐ
åÒ&ËÐÈ{Û3¬=«.Þ ×«ÐèåÒ[Í Å Ç[ÛÏbÍÞ«Pß!)«ÝÐ$ÇÕ ÝÞ|ÎqÞ ×ðÍ Å Ç[ÛBß
Ç/. Ï{ݬ Þ«3+ 3
ÏÌ Ø65 qÇbÝPÏbÞÌ 75 PÞ{ñ
ÊÑàXÙÚ#àá5Êbà_Ù"ëØ#^ÊbàXÙààØàë±à_Ù"ëàá÷àXÙ"áØ
2 Ð.Ç|ËËÏÍÏÇÌ.Î-+3
PÞ«¬Ç8
Î.Ý_Ù:9ÈÈ.ÐbÈ Î
PÞ«¬Ç*ÈaÒ 4; Î ; ß
Ò<$Ì.ÎqÇ|ÈPÍÇ ªÎß Ù##Úà à_Ù"àÚàá Ù=à àXÙ"àààà
«Ý3Ð Çå=Ù > Ø\Ù=#à à_Ù[ØàØë Ø"Ù"áëà àXÙ"àààà
«Ý3Ð Çå?Ù 5 Êbà_ÙëÚ à_Ù[ØàØë Ê@XÙ"ëØá àXÙ"ààë#
«Ý3Ð ÇåÙ 2 à_Ù"àëë à_Ù[ØàØë àXÙ"ëëÚÚ àXÙëë
/Ç .Ï{Ý ¬Þ «×|ÎqÞÌÝÞ|ÈÝ5Ç|ÈÈ.ÐbAÈ 3]à_Ù Ø ZÐ|Ìá^Ý.Ç|èÈqÇÇ/ ÐËZËÈPÇÇbÝÐÛ
*¬3«ÎÏ ª=«.Ç @Ê
Dõ ¬Þ|ÈqÇb4 Ý 3à_Ù áØá
Ê/|ÎqÞ|ÎÏ|ÎÏÍ-3B#Ù#ëZÐÑÌCðÞÌÝá^ÝÇ{èÈqÇÇIéÐËZËÈPÇÇÑÝÐÛyÕ
Î Å ÇϪÊDP
Þ«D¬Ç^ÏjØEÙÚÚP ÇÊØÚ
2 ÐbÈÈqÇ«Þ{ÎÏ{ÐÑÌÐË 2 Ð.Ç|ËËÏÍÏÇÌÎ-+3
<Ò $ÌÎPÇ{ÈPÍÇpª.ÎßÝ«ÝÐ3ÇåÙE>|«ÝÐ3ÇaÙ?5
«Ý3Ð Çå=Ù > à
«Ý3Ð Çå?Ù 5 à à
«Ý3Ð ÇåÙ 2 à à à
13.3 Designed Experiments
The central concept for designed experiments is a factor. Consider the famous Box-Cox poi-
sons data (survival times (in hours) of animals with 3 poisons and 4 antidotes, from Box & Cox
(1964), J. Roy. Statist. Soc. B26, 211–252 and Box, Hunter & Hunter (1977), Statistics for Ex-
perimenters). The function fac.design generates the rows, columns and so on – consult its
help page for full details.
|ÎϺÛ\IÇ ZÉ× Ê ÍÞÌ_6Ò FUªqÐq*Ï bÐ|Ì£ÙÝÞ|GÎ Fß
Ë ÌÞ[Û\ÇIZÉÊ«qÏ*{ÎåÒ&ÎÈPÇÞ|Î.×>9HH9
_ã[ØI3EçåÕóÈPÇpª=«×ØI3EXÕªÐPÏbÐÑÌ×ÍÒ6F@$7FÕ<F@data $$7FÕ<F@$$$%Fbßß
in hours
[Figure 5: design plots of the mean and median of stimes by the factors treat, repl and poison;
interaction plots of treat and poison; and diagnostic plots for poisons.aov: residuals against
fitted values, residuals against quantiles of the standard normal, and the log likelihood against
lambda with a 95% limit.]
¬ÑÛÛEÞ{ÈÔaÒÞbÐ
åÒ8|ÎÏ~ÛEÇ/!)éÎÈPÇÞ|Î î ªqÐqÏ*ÑÐÑÌ î ËÏÎ-ï|ë î ÎÈPÇÞ|ÎA3`ªqÐqÏ*bÐ|Ì\ßß
Ë
Db¬ ÛGÐbËN
õ qÇÞÌO
õ qÞ«¬Ç ÈaÒqß
ÎÈqÇÞ{Î ë_Ù[Øëàá!à_Ù#àáGØ_ÙàÚÚZàXÙààààà
ªqÐqÏ*ÑÐÑÌ ë Øà_ÙàØë ÚØ\Ù"áÚàáëZë_Ù"ëëØ#^àXÙàààààà
$\ÒSËÏÎ-.ï{ëPß Ø ØÚ_Ù #ë ÃØÚ_Ù#ëë á_ÙØØëZàXÙàØëÚØÚ
ÎÈqÇÞ{ÎA3 ª ÐPÏbÐÑÌ Ú XÙá \Ø Ø\Ùëë# à_ÙááZàXÙÚØë#ëÚ
Ç/. Ï{ݬ Þ«3 á à_Ù"à #ëÚ ë_Ù"ëë .ë
indicating the need for transformation. The I(...) function protects the argument from ex-
pansion; (treat + poison)^2 is equivalent to treat + poison + treat:poison and generally
(factors)^n gives up to n-th order interactions.
There is no direct Box-Cox function, but we can do the operations by hand. They are quite
slow (25 secs on a SparcStation IPC), due to the overhead of calling the aov function:
$ñ «ðÉÊÇÑõÒÊÑëÓÕØÕÆÔ.×àXÙ[Øbß
«Ðbè$«qÏÖðÉÊ5Þ/åÙE
PÇÍ{Î.ÐbÈåÒ&ñA«Pß
ÌÉʸ«.ÇÌèÎ Å Ò{ÎϺÛ\ÇIß
Ì=«|Ì.è{ÛüÉ° Ê «ÐbèåFÒ ª.È.ÐÝ8Ò |ÎÏ~ÛE/Ç .ßß
Ë.ÐbÈåÒÏZÏÌ IØ 3Æ«ÇÌ.èÎ Å ÒS$ñ «Pßß â
ÏËaÒÞÆ \ÒS$ñ «_ãºÏçß à_Ù"àØbß
â
IZÉ× Ê p¬bÛÒÒÞÑÐ
aÒ |ÎϺÛ\ÇI.ïñ$«_ãºÏç)eÎÈPÇÞ{Î î ªqÐqÏ*bÐ|Ì\ß[|ÈPÇ/.Ï{ÝPßï{ëPß
«ÐAè «PÏÖãºÏçðÉÊÜÌì «ÐèaÒÞ"Æ \ÒS$ñ «Xã~ÏçßßeÊ$ÌôÑëì«ÐèaÒIß î ÒSñ$«Xã~ÏçqÊqØßbìÌ3«ÑÌ.è{Û
ê
Ç «$Ç
â
IZÉ× ß )eÎÈqÇÞ|Î î ªÐPÏ bÐÑÌß [{ÈPIÇ Ï|Ýqßï|ëqß
Ê p¬bÛÒÒÞÑÐ
aUÒ «Ðbèå8Ò |ÎÏ~ÛE/Ç .*
«ÐAè «PÏÖãºÏçðÉÊ÷ÊéÌôbë.ì «Ðbèå8Ò I.ßeÊe=Ì «|Ì.è|Û
ê
ê
ª=«ÐÎaÒ&ñ$«åÕÒ«Ðbè$«qÏÖXÕ ñ$«ÞÆj×\F<>qÞ[ÛÆÝÞ7FÕfÔ$«ÞÆÃ×OF<>Ðè>ÏÖPÇ«qÏ Å ÐÐÝ]FÕfÎﻂ ×\F«]Fß
«Þ«ÛÆÝÞ Å Þ|Îðɸ Ê «Ðb$è «qÏÖÆã «Ðb$è «PÏÖZ××eÛEÞ|ñaUÒ «Ð$è «qÏÖßç
«PÏ~ÛÏÎðÉÊ°«Þ«ÛÆÝÞ Å Þ|Î^Ê÷àXÙ"Úðì õ.Í Å Ï*bõÒºà_ÙÚåÕÜØß
Þ3Æ «PÏ[ÌÇ\tÒ «PÏ~ÛÏÎ_Õ]àqß
ÍÞ «5ÉÊÒ ªÞ{ÈåDÒ Fd¬{^È Fbß=ã ç5Ï Ê ªÞ|ÈåDÒ Fd¬"|^È Fbßã çßbô ªÞ|ÈåDÒ FdªÏGÌ Fßã"ëç
ÎPÇ{ñÎaÒÍ\ÒS$ñ «_ã[ØçßEÕ «PÏ~ÛÏÎ î à_Ù[Øé| ì ÍÞ «å, Õ F*Ú _Fß
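The quantity evaluated on the grid of powers in the code above is what is usually called the Box-Cox profile log likelihood. As a sketch in standard notation (the notation is an assumption here, since the text gives no formula), with $y_i$ the survival times, $n$ their number and $\mathrm{RSS}(\cdot)$ the residual sum of squares from the fitted model:

```latex
y^{(\lambda)} =
  \begin{cases}
    (y^{\lambda} - 1)/\lambda, & \lambda \neq 0, \\
    \log y,                    & \lambda = 0,
  \end{cases}
\qquad
L(\lambda) = -\frac{n}{2} \log \mathrm{RSS}\bigl(y^{(\lambda)}\bigr)
             + (\lambda - 1) \sum_{i=1}^{n} \log y_i + \text{const}.
```

For $\lambda \neq 0$, $\mathrm{RSS}(y^{(\lambda)}) = \mathrm{RSS}(y^{\lambda})/\lambda^{2}$, so fitting $y^{\lambda}$ directly and adding $n \log|\lambda|$ gives the same value. An approximate 95% confidence set is $\{\lambda : L(\lambda) \ge \max_{\lambda} L(\lambda) - \tfrac{1}{2}\chi^{2}_{1,\,0.95}\}$, which is the role of the horizontal 95% line on the plot.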
A more efficient way (4 secs) is to use the function boxcox in the library ripley:
library(ripley)
boxcox(stimes ~ treat + poison)
Now consider a Latin square. Six litters of six piglets were ranked in order of birthweight,
providing a 6 × 6 table, and each piglet given one of 6 dietary supplements in a Latin square.
The weight gain (in kg) over 12 weeks is given in the table.
ÝPÏbÇ|ÎðÉÊ×ÍÞÌ_ÒÕ<FFbß
9 d 2e
2 e d'9
d 2 9 'e
e2 9d
9 e2 d
d e 9 2
K ÎèPÞ.Ï[ÌGÉÊ°ÍÞÌ_Òß
ÚXÙ##_Ù"ëØéÚ_ÙëZáXÙÚ#_Ù"ëÙ
ÚXÙ ëZÚ_Ù#N_# ÙZ áXÙÚZ á_Ù#^ á_Ù"àÚ
XÙ«ØZ Ú_Ù5# Ú_Ù=
Ø(X # ÙE_ # Ù=j á_Ù=
XÙÚZ Ú_Ù"ëØéÚ_Ù"áØf Ù_ # Ù"á^ Ú_Ù=
áXÙàÚ_ Ù[Øá5á_Ù"ëZ # ÚXÙ^ á_Ù# ØeÚ_Ù##
Ù ÚZá_Ù Ú5Ú_Ù"ÚáX # ÙÚà_ # Ù"àj
á_Ù"ëë
ÝPÏbÇ|ÎðÉʱËqÞÍ{Î.ÐÈaÒºÝqÏÇ|Îß
«Þ{ÎÏÌÃÉÊ Ý.Þ|ÎPÞaÙËÈPÞ[Û\Ç\ÒSËPÞÍaÙ"Ý.ÇI.ÏèÌÓÒÍ\Ò~áÓÕ áPßEÕ «PÏ|ÎåÒ"Æ.ÈqÞÌ.Ö×Ø^3"áÓÕ«PÏÎÎPÇ{È.×ØI3"áqßß\Õ
î ÝqÏÇ{ÎXg
Õ KÎèPÞÏÌß
ª=«ÐÎÙÝ/Ç .ÏèbÌ_tÒ «Þ|ÎÏÌß
ÏbÇ|ÎðÉÊ 2 ÒºÝqÏÇ|Î_Õ]ÎÈqÇÞ{Î|ÛEÇÌ.Îß
«Þ{ÎÏÌÙ&ÞÑÐ
^ÉÊ^ÞbÐ
å?Ò KÎèqÞ.ÏC Ì )$Æ.ÈqÞÌ.Ö î «qÏÎÎPÇ|È îh ÏÇ{ÎXÕÒ«.Þ|ÎÏ[Ì\ß
¬ÑÛÛEÞ{ÈÔaUÒ «Þ{ÎÏ[Ì£Ù&ÞÑÐ
ß
¬ÑÛÛEÞ{ÈÔFÙ «ÛUÒ «.Þ|ÎÏ[Ì£ÙSÞbÐ
ß
The last command gives t-values for the contrasts (diet ? i diet A).
¬ÑÛÛEÞ{ÈÔaÒU«Þ{ÎÏ[Ì£Ù&ÞÑÐ
ß
Ë
Ѭ ÛÐË
õ PÇÞ
Ì
õ PÞ «¬Ç ÈaÒPß
Æ.ÈqÞÌÖ Ú #XÙëàÚGØ\Ù"ÚàZë_Ù#á# àXÙà.Ú.Úá
«PÏÎÎqÇ|È Ú #XÙ#ëà\ØZØ\Ù"Ú.àZë_Ù#Ø#àë àXÙàáë##
ÏbÇ|Î Ú ØØEÙáØ#ÚØ$ë_ÙëÚà&Ù"àÚØ àXÙàØàØÚà
/Ç .Ï{Ý ¬Þ«3 ëà ØØEÙáÚZà_Ù"Úáë
¬ÑÛÛEÞ{ÈÔÙF«ÛÒU«.Þ|ÎÏ[Ì£ÙSÞbÐ
ß
2 Þ «I«43ÞbÐ
åÒ&ËÐÈ{3Û ¬=«.Þ & × KÎèPÞÏÌ\)$ÆÈPÞÌÖ î «PÏÎÎPÇ{È î ÏbÇ|ÎXÕ ÝÞ|ÎqÞ ×|«Þ{ÎÏÌß
/Ç .Ï{Ý ¬Þ «3+3
Ï[Ì @Ø 5!qÇbÝPÏbÞÌ 5 qÞ|ñ
ÊÑëXÙàÚØeÊbàXÙë àáZà_Ù[ØëØØ$à_Ù #ØÚ÷àXÙ àáØ
2 Ð.Ç|ËËÏÍÏÇÌ.-Î +3
PÞ «¬8 Ç
Î.Ý_:Ù 9ÈÈ.ÐbÈ Î
PÞ «¬* Ç ÈaÒ 4; Î ; ß
<Ò $Ì.ÎqÇ|ÈPÍÇ ªÎß Ú_Ù"áàÚà à_Ù à # Ø XÙ"ëØëë àXÙ"àààà
ÙÙÙÙÙÙÙÙÙÙÙÙÙÙÙ
ÏbÇ|Î e à_=Ù áØ # à_=Ù Úë ØEÙ"àáà # àXÙ àØÚ
ÏbÇ|Î 2 à_=Ù à à_=Ù Úë àXÙ ëá # àXÙ áÚØ
ÏbÇ|Î à_Ù ÚÚà à_=Ù Úë àXÙ ØÚá àX=Ù ë
ÏbÇ|Î 9 à_Ù #àà à_=Ù Úë ëXÙ"ëë # àXÙ"à #Ú
ÏbÇ|Î Ø\Ù #Ú à_=Ù Úë Ù"à àXÙ"àààá
ÙÙÙÙÙÙÙÙÙÙÙÙÙÙ
13.4 Generalized Linear Models
Binary Data
The following example is taken from D. Collett (1991) Modelling Binary Data, page 217.
Numbers of rotifers falling out of suspension for two species (Polyarthra major and Keratella
cochlearis) are given for different fluid densities in the table, as file rotifer.dat:
klmn6o"p6qsrDt-uvqwr6t-uvp@xpwyzuvqwyzu{pxp
|uE}|"~ |6| 6 |"*|"|
|uE}D6} 6 |
,
@
|uE}D| |"} 6 6}6
|uE}D6} |"~ 6 |"}6D
|uE}D6} ~ 6 |
!|"D~
|uE}D6} | 6 6*|"|
|uE}D| |" 6~ 6*|"D
|uE}
}
6
66D
|uE}
} |"} | 6*|6|
|uE}
| 6 6 6*|"D
|uE}
6} 6
@
|uE}
~
6~ 6
@
|uE}D6} 6} 6 ~
@~
|uE}D6} ~ |
!|"D}
|uE}D6} |
|" |
|uE}D| |"} 6 6
@
|uE}D6
6 ~
!|"}|
|uE}D6} 6 6 6 D
|uE}D6}w
6w
~6*|"6*|"~D}
|uE}D6} 6 6~*|"
!|"
Figure 6: Plots for Rotifer data. The square symbols and dashed line indicate species Polyarthra
major.
È.ÐbÎÏËPÇ|È=ZÉÊíÈPÇÞÑÝXÙ!ÎPÞÆ3«ÇÒ6FÈÐÎÏËPÇ{ÈÙÝÞ|ÎGFÕ Å ÇÞbÝ.Ç|È×Hß
È.ÐbÎÏËPÇ|=È
Þ|ÎÎPÞÍ Å ÒSÈ.ÐbÎÏËqÇ|=È .ß list the data frame
=«ÐÎaÒºÝÇÌ ÏÎÔ_ Õ ÑÛ¸vÙ .È.6Ð Õ]ÎÔ Çb× FºÌ FÕAÔ «PϺÛ×ðÍ\Ò~àÓÕØßß and plot them
qÐqÏÌ-Î \Ò~ÝÇÌ .ÏÎÔ_ Õ Í Å ×àPß
Õ ÑÛ¸Ù È.6Ð :
qÐqÏÌ-Î \Ò~ÝÇÌ .ÏÎÔ_ÕÖqÍåÙ È.6Ð \ß
«DÇ
PÇ «3b×^Í\6Ò Fb+Û F,
Õ F[ÖP%Í Fbßß
È.ÐbÎÏËPÇ|Èë5ÉÊéÝÞ|ÎqÞåÙ!ËÈPÞ«ÛEÇÒºÝÇÌ í×ðÍ\ÒºÝ.Ç"Ì .ÏÎÔXÕ ÝÇÌ .ÏÎÔßEÕ
ÔP/Ç ÷×ÃÍÒ ÑÛ¸ÙÔ_Õ]ÖPÍaÙÔßEÕ]ÎÐÎ^×ðÍ\Ò bÛ¨ÙÎ.ÐbÎXÕfÖPÍaÙÎ.ÐbÎß\ Õ ÇÍ.ÏbIÇ ß
Þ|ÎÎPÞÍ Å ÒSÈ.ÐbÎÏËqÇ|ÈëPß
è$«Û¸Ù!È.ÐÎjÉÊeAè «ÛÒ[ÍÆÏÌqÝÒ&ÔqIÇ ÕfÎ.ÐbÎPÊ{ÔPIÇ ß ) ÝÇÌ Z| ì ÇÍÏIÇ ÕÆ\Ï[ÌqÐÛÏÞ «tÒ «ÐèÏÎßß
è$«Û¸Ù!È.ÐÎ
è$«Û¸Ù!È.ÐÎjÉÊeèA«ÛÒ[ÍÆÏÌqÝÒ&ÔqÇIÕfÎ.ÐbÎPÊ{ÔPÇIß)ZÊqØ î ÝÇÌZì|Note ÇÍÏÇ/ÕóÆ\ÏÌÐÛÏÞ«ÒU«ÐèÏÎßß
the parameterization used
è$«Û¸Ù!È.ÐÎ
@Ð ÎÏ{ÐÑÌ ÒÍÑÐÑÌ.ÎÈP/Þ |-Î Ñ×Í6Ò FÑÍÑÐÑÌÎÈ_ÙÎÈPÇÞ{Î|ÛÇÌ^ Î F
Õ FÑÍbÐ|Ì.ÎÈÙseparate
ÐI«bÔ^Fßß means for each species
è$«Û¸Ù!È.ÐÎjÉÊeAè «ÛÒ[ÍÆÏÌqÝÒ&ÔqIÇ ÕfÎ.ÐbÎPÊ{ÔPIÇ ß ) ÝÇÌ ì/ÇÍ.ÏbÇIÕ ÆÏÌqÐÛÏbÞ«Òt«ÐbèÏÎßß
è$«Û¸Ù!È.ÐÎ
¬ÑÛÛ\Þ|ÈÔaÒ&Aè «Û¸Ù!È.ÐbÎß
ÞÌÐ
qÞ\Ò&Aè «Û¨ÙÈ.ÐbÎß over-dispersion, but a common slope
è$«Û¸Ù!È.ÐÎjÉÊeAè «ÛÒ[ÍÆÏÌqÝÒ&ÔqIÇ ÕfÎ.ÐbÎPÊ{ÔPIÇ ß ) ÝÇÌ î Çlooks ÍÏIÇ OK
ÕÆ\Ï[ÌqÐÛÏÞ «tÒ «ÐèÏÎßß
«PÏ[Ì/Ç \ÒºÝ.Ç"Ì .ÏÎÔXÕfËÏÎÎPÇÑÝÒS$è «Û¨ÙÈÐÎß:ã ÇÍÏ/Ç b×]× FºÖP%Í F[çß
«PÏ[Ì/Ç \ÒºÝ.Ç"Ì .ÏÎÔXÕfËÏÎÎPÇÑÝÒS$è «Û¨ÙÈÐÎß:ã ÇÍÏ/Ç b×]× F?b-Û F[çXR Õ «bÎÔ.× Pß
ñ.Ý.ÇÌGÉ| Ê ÇbõÒ|Ø\Ù"àëåÕÜØEÙà #åÕYàXÙààØbß these lines are rather crude, so try harder!
Poisson Data
We consider the log-linear analysis of a contingency table. As this has two ‘history’ factors
and two levels of the the response, it could also be treated as binomial data. The response is
the occurrence of coronary heart disease. The table is of the form:
                                 blood pressure
chd    serum cholesterol       1     2     3     4
yes            1               2     3     3     4
               2               3     2     1     3
               3               8    11     6     6
               4               7    12    11    11
no             1             117   121    47    22
               2              85    98    43    20
               3             119   209    68    43
               4              67    99    46    33
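As an illustration only, the table above might be entered and analysed along the following lines. This is a sketch rather than the code of these notes: the names count, bp, chol, chd, heart and fit are hypothetical, and the choice of model (the association between the two history factors held fixed, with the response depending additively on each) is an assumption made for the example.

```r
# Counts read across the rows of the table: chd = yes first, then chd = no,
# with blood pressure varying fastest within each cholesterol level.
count <- c(  2,   3,  3,  4,    3,   2,  1,  3,    8,  11,  6,  6,    7,  12, 11, 11,
           117, 121, 47, 22,   85,  98, 43, 20,  119, 209, 68, 43,   67,  99, 46, 33)
bp    <- factor(rep(1:4, 8))                      # blood pressure, fastest varying
chol  <- factor(rep(rep(1:4, rep(4, 4)), 2))      # serum cholesterol
chd   <- factor(rep(c("yes", "no"), c(16, 16)))   # coronary heart disease
heart <- data.frame(count, chd, chol, bp)

# Log-linear (Poisson) model; the chol*bp term fixes the margin of the
# history factors, which is what makes a binomial analysis equivalent.
fit <- glm(count ~ chd * (chol + bp) + chol * bp, family = poisson, data = heart)
anova(fit, test = "Chisq")
```

Dropping terms involving chd from the formula and comparing deviances is one way to judge which history factor matters for the response.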
¾q¾7µ 6"Evb
q Evb
¸x~
ù
ûu P û.¡¦ tb
a¢vwý.u µ ¢ .µ £It/9Etý¾q¾}
t 5Et_x~¾P¾Cq6 ð
vb
7Õ9\ 9ý..¦¢]} ù · q_·
¾q7¾ 6"Evb
µ¼ u /t 9E_xº¾q¾C6Eµ vb
¨î6 6
uP ¡b¦\}
u\t.x&
ýP¡_x }q}X¯]u\v 9£xº¾q¾76"\v|
}
µ
The anova command gives an analysis of deviance for glm objects:
ÞÌÐ
PÞÒ&ÖÖÙèA«Û£ÕÎqÇI|Î×F 2ÑÅ ÏFß
Analysis of Deviance Table
Poisson model
14 Multivariate Analysis
S-Plus is particularly rich in functions for exploratory multivariate analysis, such as pairs,
brush and spin. There are also functions for classical multivariate analysis.
Clustering
The workhorses here are dist which computes distance matrices (also used in cmdscale) and
hclust which computes a cluster tree by single-, average- or complete linkage.
dist              Distance matrix calculations
hclust            Hierarchical clustering
cutree            Create groups from a cluster tree
plclust           Plot a cluster tree
labclust          Label a cluster tree plot
clorder           Re-order leaves of a cluster tree
subtree           Extract part of a cluster tree
mclust            “model-based” clustering
mclass, mreloc    auxiliary functions
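A minimal sketch of the usual sequence (distance matrix, cluster tree, groups) on a small made-up data matrix may help. The data and the names x, d, h and groups are invented for the illustration, and the method name "average" is an assumption, since the names of the linkage methods differ between versions of S.

```r
# Six points in the plane forming two well-separated clumps (made-up data).
x <- rbind(c(0, 0), c(0, 1), c(1, 0),
           c(10, 10), c(10, 11), c(11, 10))

d <- dist(x)                          # Euclidean distance matrix
h <- hclust(d, method = "average")    # cluster tree by average linkage
groups <- cutree(h, k = 2)            # cut the tree into two groups
groups
```

The tree h can then be passed to the plotting and re-ordering functions listed above.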
Graphical Methods
cmdscale          Classical multi-dimensional scaling
faces             Chernoff’s faces
mstree            Minimal spanning tree
stars             Star plots
biplot            Biplot (v 3.2)
Two analyses of socio-economic data on Swiss cantons:
library(ripley)
d <- dist(swiss.x)
x <- cmdscale(d)
c1 <- x[,1]; c2 <- x[,2]
eqscplot(c1, c2, type="n")            # from library(ripley)
text(c1, c2, seq(c1))
h <- hclust(d)
plclust(h)
cutree(h, 3)
plclust(clorder(h, cutree(h, 3)))     re-order tree into three groups
Matrix Methods
[Figure: plot of the second discriminant variable against the first discriminant variable, with
points labelled c, s and v.]
A Libraries
Libraries are a mechanism to add ‘packages’ of extra objects (functions and datasets) to S. To
find out which libraries are available type
library()
which on one of my systems gave:
The following sections are available in the library:
9 2 H%$¢ e $69 9
2 $6H$¢
ÝÞ{ÎP/Þ Ç|=Î
A.1 Library ripley
chem              copper in wholemeal flour
abbey             nickel in syenite rock
milk              lead in milk powder
Å ÐÑÌÇI e Ç« èÏbÞ̪¥E Å ÐÑÌÇ÷ÍÞ«/«3jØÚàÊqØ#
To use the library, invoke it by
library(name)
which attaches it as a data directory at the end of the search list. Thus libraries cannot over-ride
standard functions nor your own functions. To make a library over-ride the system functions,
use
library(name, first=T)
which attaches it at position 2 (after the .Data directory).
mdeaths, fdeaths  time series on UK lung deaths 1974-9 from Diggle
«Ç ¬Ö
dataset on times of Scottish hill races
«Å
(uncensored) survival times on leukaemia patients
ÛEÍ|ÔqÍ «.Ç body weight(kg) and brain weight (g) of mammals, from Weisberg
*È ¬ÆÆÇ{È
dataset on relating permeability to physical measurements
Å "Ï
dataset on rubber wear
A.2 Sources of Libraries
Many S users have generously collected their functions and datasets into libraries and made
them publicly available. An archive of libraries is maintained at Carnegie-Mellon as a service
to the statistical profession by Mike Meyer. To obtain details of its contents by e-mail send a
message to
statlib@lib.stat.cmu.edu
with body
send index
send index from S
Ftp to lib.stat.cmu.edu with user statlib is also available.