
U.S. FOREST SERVICE RESEARCH PAPER FPL 17
DECEMBER 1964

LINEAR REGRESSION METHODS
for FOREST RESEARCH

U.S. DEPARTMENT OF AGRICULTURE
FOREST SERVICE
FOREST PRODUCTS LABORATORY
MADISON, WIS.

SUMMARY
This Research Paper discusses the methods of linear regression analysis that have
been found most useful in forest research. Among the topics treated are the fitting and
testing of linear models, weighted regression, confidence limits, covariance analysis,
and discriminant functions.
The discussions are kept at a fairly elementary level and the various methods are
illustrated by presenting typical numerical examples and their solution. The logical
basis of regression analysis is also presented to a limited extent.

ACKNOWLEDGMENTS

Appreciation is extended to Professor George W. Snedecor and the Iowa State University
Press, Ames, Iowa, for their permission to reprint from their book Statistical Methods
(ed. 5), the material in tables 1 and 8 of Appendix E of this Research Paper.
We are also indebted to the Literary Executor of the late Professor Sir Ronald A.
Fisher, F.R.S., Cambridge, to Dr. Frank Yates, F.R.S., Rothamsted, and to Messrs.
Oliver and Boyd Ltd., Edinburgh, Scotland, for their permission to reprint Table No. III
from their book Statistical Tables for Biological, Agricultural, and Medical Research
(Table 7 of Appendix E of this Research Paper); also Table 10.5.3 from Snedecor's
Statistical Methods (ed. 5), shown as Table 6 in Appendix E of this Research Paper.

CONTENTS

INTRODUCTION
REGRESSION--THE GENERAL IDEA
   A Moving Average
   Fitting a Regression
   Confidence Limits and Tests of Significance
   Interpreting a Fitted Regression
THE MATHEMATICAL MODEL
FITTING A LINEAR MODEL
   The Least Squares Principle
   Problem I--Multiple Linear Regression With a Constant Term
   Problem II--Multiple Linear Regression Without a Constant Term
   Problem III--Simple Linear Regression With a Constant Term
   Problem IV--The Arithmetic Mean
   Problem V--Fitting a Curve
   Problem VI--A Conditioned Regression
   Requirements
FITTING A WEIGHTED REGRESSION
   Problem VII--A Weighted Regression With a Constant Term
   Problem VIII--Ratio Estimators
   Transformations
SOME ELEMENTS OF MATRIX ALGEBRA
   Definitions and Terminology
   Matrix Addition and Subtraction
   Matrix Multiplication
   The Inverse Matrix
   Matrix Algebra and Regression Analysis
ANALYSIS OF VARIANCE
   A General Test Procedure
   Degrees of Freedom
   Problem IX--Test of the Hypothesis that β1 + β2 = 1
   Problem X--Test of the Hypothesis that β2 = 0
   Problem XI--Working With Corrected Sums of Squares and Products
   Problem XII--Test of the Hypothesis that β2 = β3 = 0
   Problem XIII--Test of the Hypothesis that β1 + 2β2 = 0
   Problem XIV--Hypothesis Testing in a Weighted Regression
   An Alternate Way to Compute the Gain Due to a Set of X Variables
THE t-TEST
   Problem XV--Test of a Non-Zero Hypothesis
CONFIDENCE LIMITS
   General
   Confidence Limits on Ŷ
   Problem XVI--Confidence Limits in Multiple Regression
   Problem XVII--Confidence Limits on a Simple Linear Regression
   Confidence Limits on Individual Values of Y
COVARIANCE ANALYSIS
   Problem XVIII--Covariance Analysis
   Covariance Analysis With Dummy Variables
   Problem XIX--Covariance Analysis With Dummy Variables
DISCRIMINANT FUNCTION
   Use and Interpretation of the Discriminant Function
   Testing a Fitted Discriminant
   Testing the Contribution of Individual Variables or Sets of Variables
   Reliability of Classifications
   Reducing the Probability of a Misclassification
   Basic Assumptions
ELECTRONIC COMPUTERS
CORRELATION COEFFICIENTS
   General
   The Simple Correlation Coefficient
   Partial Correlation Coefficients
   The Coefficient of Determination
   Tests of Significance
THE BEST OF TWO LINEAR REGRESSIONS
SELECTED REFERENCES
APPENDIX A.--THE SOLUTION OF NORMAL EQUATIONS
   Method I--Basic Procedure
   Method II--Forward Solution
   Method III--Stepwise Fitting
APPENDIX B.--MATRIX INVERSION
APPENDIX C.--SOME SIMPLE FUNCTIONS AND CURVE FORMS
APPENDIX D.--THE ANALYSIS OF DESIGNED EXPERIMENTS
APPENDIX E.--TABLES
   Table 6.--The F-Distribution
   Table 7.--The t-Distribution
   Table 8.--The Cumulative Normal Distribution

LINEAR REGRESSION METHODS for FOREST RESEARCH

by FRANK FREESE, Analytical Statistician

FOREST PRODUCTS LABORATORY,1 FOREST SERVICE
U.S. DEPARTMENT OF AGRICULTURE

INTRODUCTION

Many researchers and administrators have discovered the usefulness of regression


methods in deriving and testing empirical relationships among various observed
phenomena. In the field of forestry, for example, tree volumes have been expressed as
a function of diameter, merchantable height, and form class; the strength properties of
wood have been related to such characteristics as specific gravity, age, and average
rate of radial growth; studies have been made of how logging costs are affected by
average tree size, total volume, and distance from hard-surfaced roads; and site index
for various species has been related to certain properties of the soil and topography.
Regression analysis provides an objective and widely accepted routine for fitting
mathematical models involving several variables. In addition, there are procedures
that can often be used to evaluate the fitted equation, and, with the development of
modern electronic computers, much of the computational drudgery has been
eliminated.
Unfortunately, the obvious value and increased availability of regression methods
have resulted in their use by people who have had a rather meager knowledge of the
mechanism and its limitations. This is not necessarily a statistical catastrophe--many people drive a car without having the slightest notion of what makes it go. But
the user of regression, like the driver of a car, will do a better job if he has learned
the best operating procedures and knows something of what the machinery can and
cannot do. The purpose of this paper is to provide some of this knowledge in relatively
simple terms.
1 Maintained at Madison, Wis., in cooperation with the University of Wisconsin.

The expression 'relatively simple' is not very informative. To be more specific,


it is necessary to spell out the level of knowledge that the reader is assumed to have.
Mathematically, nothing is assumed beyond high-school algebra. Though the solution
of simultaneous linear equations falls within this limit, a review of this topic is
given in Appendix A. The use of subscripts and the summation notation (for example, the double sum Σ Σ xij over i = 1, . . ., n and j = 1, . . ., m) will not be reviewed; information on this subject is given in (3, 8).2 A

knowledge of matrix algebra is not assumed. However, the so-called c-multipliers


play such an important role in regression analysis, and the term matrix appears so
often in regression literature, that a few pages are devoted to some of the basic
elements of matrix algebra.
The reader should have a knowledge of the elementary terms, concepts, and
methods of statistics. He does not have to be an expert but should have some idea of
the meaning of such terms as population, sample, mean, variance, standard deviation,
degrees of freedom, correlation, and normal distribution. He should also know the
rudiments of the analysis of variance and the t and F tests of significance. Those
who need brushing up on these topics should review one of the many textbooks on
statistical methods (1, 5, 7).
This research paper is not designed for statisticians but for research workers and
administrators who want to use some of the tools that statisticians have devised. For
this reason, the emphasis will be on 'how' rather than 'why.' No attempt will be made
to give the theory, but for some of the methods described, a rather loose discussion
of the rationale may be given. It is hoped that when the reader becomes comfortably
familiar with some of the 'hows,' he will find the time and inclination to take a
closer look at the 'whys.'

REGRESSION - THE GENERAL IDEA


A Moving Average
The concept of the arithmetic mean or average of a population is familiar to most
people, particularly those who have had any exposure to statistical methods. Very
briefly, we envision a population of units, each of which can be characterized by a
variable (Y). There is a population mean (μy) around which the actual unit values are distributed in some manner. Thus, the Y value of a given unit can be represented by

Yi = μy + εi

where: Yi = the actual value of Y for the ith unit
       μy = the population mean of all Y values
       εi = the difference between the Y value of the ith unit and the population mean (Yi - μy). This is sometimes called a deviation or error.

2 Underlined numbers in parentheses refer to Selected References at the end of this report.

A measure of how widely the individual values are spread around the mean is
known as the variance, and the square root of the variance is called the standard
deviation. For the population, the variance is the average squared deviation.
Now, think in terms of a series of such populations, each with its own mean and
variance. Often there will be some other characteristic (X) that has the same value
for all units within a given population but varies among populations. It also happens
at times that there is some sort of functional relationship between the mean Y values
for the populations and the associated X values. Graphically, such a relationship
might appear as shown in figure 1.

Figure 1.--Y and X values for four populations. (The plotted symbols show the individual values of Y in each population; μ1 = the mean Y for population 1 and X1 = the value of X for population 1, with similar values for populations 2, 3, and 4.)

The line showing the relationship between mean Y and X is called a regression line,
and its mathematical expression is called a regression function. If the relationship
between the mean value of Y (μy) and the value of X is a straight line, we could write

μy = a + bX

where: X = the value of X for the population having a mean Y value of μy
       a, b = constants, indicating the level and the slope of the straight line.
Thus, a regression can be thought of as a form of average that changes with changes
in the value of the X variable--a moving average. One of the aims in regression
analysis is to find an equation representing this relationship. In this relationship, Y
is usually called the dependent variable and X the independent variable. This does not
mean, however, that there has to be a cause and effect relationship: it only indicates
that the Y values are associated with the X values in some manner that can be
described approximately by some mathematical equation. Knowing the value of X
gives us some information about the value of Y. The person concerned with the
regression makes his own inferences as to what is implied by the indicated relationship.
The equation

μy = a + bX

specifies the relationship between the mean values of Y and the level of X. To indicate that the individual values of Y vary about the mean, we might write

Yi = μy + εi

Or, since μy varies linearly with the level of X,

Yi = a + bX + εi

In other words, this says that the Y value of any individual unit is due to the regression of mean Y on X plus a deviation (εi) from the mean.
If the spread (as measured by the variance) of the Y values about their mean (μy)
is the same for all of the populations, regardless of the value of the associated X
variable, the variance is said to be homogeneous. If the variance is not the same for
all populations, it is said to be heterogeneous.
Frequently, the populations can be characterized by more than one X variable (for
example, X1, X2, and X3) and it may happen that the mean (μy) associated with each combination of values of these variables is functionally related to these values. Thus, we might have the regression equation

μy = β0 + β1X1 + β2X2 + β3X3

where: β0, β1, β2, and β3 are constants (usually called regression coefficients)
       X1, X2, and X3 are the numerical values of three associated characteristics.

This equation merely says that if we specify values for X1, X2, and X3, then we would,
on the average, expect the characteristic labeled Y to have the value (μy) given by the
equation. The relationship of μy to the independent variables is sometimes spoken of
as a regression surface or a response surface, even though a direct geometric
analogy breaks down beyond two X variables.
Since μy represents a mean value, some individual values of Y will have to be higher and some lower than this. In short, we again write

Yi = μy + εi

or, in this case,

Yi = β0 + β1X1 + β2X2 + β3X3 + εi

And again, if the spread of the Y values about their mean is the same for all
points on the regression surface (that is, at all combinations of the independent
variables), the variance is said to be homogeneous. If the spread of Y values is not
the same at all points, the variance is heterogeneous.
In this introduction to the idea of a regression, we have talked as though there were
a number of separate populations--one for each value of X or one for each possible
combination of values for several different Xs. It is also possible (and more common)
to think in terms of a single population of units, each unit being characterized by a
Y value and one or more X values. There is a regression surface representing the
relationship of the Y value to the associated X values, but the Y values are not all
right on the surface: some of them are above it and some below. A given point on the
surface represents the mean Y value of all the units having the same X values as
those associated with that point. The spread of Y values above and below the surface
may be the same for all points (homogeneous variance) or it may differ from point to
point (heterogeneous variance).
Fitting a Regression
If there is a relationship between μy and the independent variables (X1, X2, etc.),
it may be very desirable to know what the relationship is. To illustrate, it might be
used in predicting the value of Y that would, on the average, be associated with any

particular combination of X values; it could also be useful in selecting a combination


of X values that might be associated with some specified value Y; and it could suggest
how changes in Y are associated with changes in any of the X variables.
Ordinarily, the regression relationship will not be known but must be estimated
from observations made on a sample of the individual units. On each of the selected
units, we will observe the value of Y and each of the associated Xs. From these
observations, we must derive estimates of the coefficients (β0, β1, etc.) in the
regression equation. Usually, we will also want to obtain some measure of the
reliability of these estimates.
A first step is to select a mathematical function or model which we think may
represent the relationship. Two broad classes of functions should be recognized;
those that are linear in the coefficients and those that are nonlinear in the coefficients.
An equation in which the coefficients are raised to only the first power and are
combined only by addition or subtraction is said to be linear in the coefficients.
Some examples are:

(1) Y = a + bX
(2) Y = a + bX + cX²
(3) Y = β0 + β1X1 + β2X2 + β3X3
(4) Y = β0 + β1X1 + β2(1/X1)
Note that the model can be linear in the coefficients even though it is nonlinear as
far as the variables (Y and X) are concerned.
An equation in which the coefficients are raised to other than the first power, appear as exponents, or are combined other than by addition or subtraction is said to be nonlinear in the coefficients. The following are examples:
(1) Y = a + b^X
(2) Y = aX^b
(3) Y = a(X - b)^c

In some cases models that are nonlinear in the coefficients can be put into a linear
form by a transformation of the variables. Thus, the second equation above could be
converted to a linear form by taking the logarithm of both sides, giving

log Y = log a + b log X

or

Y' = a' + bX'

where: Y' = log Y
       X' = log X

This Research Paper will be confined to the fitting and testing of linear models.
The fitting of nonlinear models requires more mathematical skill than is assumed
here. While this will be an inconvenient restriction at times, it will often be found
that a linear model provides a very good approximation to the nonlinear relationship.
Having selected a mathematical model, we should next examine the variability of
the Y values about the regression surface. Two aspects of this variability are of
interest:
(1) Is it the same or nearly so at all points of the regression surface
(homogeneous variance), or does it vary (heterogeneous variance)? In the latter case,
we would also like to know how the variance changes with changes in the independent
variables.
(2) What is the form of the distribution of individual Y values about the
regression surface? In many populations, the values will follow the familiar Normal
or Gaussian Distribution.
The answer to the first question affects the method of estimating the regression
coefficients. If the variance is homogeneous, we can use equal weighting of all
observations. If the variance is not homogeneous, it will be more efficient (and more
work) to use unequal weighting. The answer to the second question is needed if tests
of significance are to be made and confidence limits obtained for the estimated
regression coefficients or for functions of these coefficients.
These questions are not easily answered, and the less fastidious users of regression
tend to bypass them by making a number of assumptions. Given sufficient critical
familiarity with a population, the assumptions as to homogeneity of variance and form
of distribution may be quite valid. Without this familiarity, special studies may have
to be made to obtain the necessary information.
Confidence Limits and Tests of Significance
When the mean of a population is estimated from a sample of the units of that
population, it is well known that this sample estimate is subject to variation. Its
value will depend on which units were, by chance, included in the sample. Such an
estimate would be worthless without some means of determining how far it might be
from the true value. Fortunately, the statisticians have shown that the variability of


the individual units in a sample can be used in obtaining an indication of the variability
of the estimated mean. This in turn enables us to test some hypothesis about the
value of the true mean or to determine confidence limits that have a specified
probability of including the true mean.
In fitting a regression, the estimated regression coefficients are also subject to
sampling variation. Again it is important to have a method of testing various
hypotheses about the coefficients and of determining some limits within which the
true coefficients or the true regression may be found. This would include testing
whether or not any or all of the coefficients could actually be zero, which would imply
no association between Y and a particular X or set of X variables. The statisticians
have provided the means of doing this. The procedures that have been devised for
testing various hypotheses about the regression coefficients or for setting confidence
limits on the regression estimates will be discussed following the description of the
fitting techniques.
Interpreting a Fitted Regression
Deriving the meaning of a fitted equation is one of the very difficult and dangerous
phases. Here there are no strict rules to follow. On the assumption that it has not
been copyrighted, 'THINK' is suggested as the guiding principle.
In searching for the meaning of a regression, the fact that it is man-made should
never be overlooked. It is an attempt to describe some phenomenon that may be
controlled by very complex biological, physical, or economic laws. It may, at times,
be an excellent description, but it is not a law in itself; only a mathematical
approximation.
Not only is the fitted regression an artificial description of a relationship, but it
is also a description that may not be reliable beyond the range of the sample
observations used in fitting the regression. For example, a straight line may be a
very good approximation of the relationship between two variables over the range of
the sample data, but this does not mean that outside of this range the relationship is

not curved. Similarly, a second-degree parabola (Y = a + bX + cX²) may give an


excellent fit over a certain range in the data, but this does not prove the existence of
the maximum or minimum point that will appear if this parabola is extended.
Finally, it must be remembered that a fitted regression is a sample-based
estimate and, as such, is subject to sampling variation. It should not be used without
giving due consideration to sampling error. Usually, this will mean computing
confidence limits on any predictions made from the fitted regression.

THE MATHEMATICAL MODEL


The most common applications of regression methods have one or both of the
following objectives :
(1) To find a mathematical function that can be used to describe the
relationship between a dependent variable and one or more independent variables.
(2) To test some hypothesis about the relationship between a dependent
variable and one or more independent variables.
This section will discuss some aspects of selecting a mathematical model to be
fitted and tested as a description of the relationship between the dependent and the
independent variables.
Throughout this Research Paper, we will be concerned with fitting and testing the
general linear model
Y = β0 + β1X1 + β2X2 + --- + βkXk

where: Y = the dependent variable
       Xi = an independent variable
       βi = the regression coefficient for Xi (to be estimated in fitting).
This does not mean that we will only be able to fit straight lines or flat surfaces. For
example, the general equation for a second-degree parabola is
Y = a + bX + cX²

Graphically, this might look roughly like one of the curves shown in figure 2.

Figure 2.--The second-degree parabola.

To fit this curve with the general linear model, we merely let X1 = X and X2 = X², and then fit the model

Y = β0 + β1X1 + β2X2
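As a small illustration of this renaming device (not from the original paper; the data and the Python code below are invented purely for illustration and anticipate the normal-equation fitting described later under Fitting a Linear Model), the quadratic can be fitted by treating X and X² as two ordinary predictors:

```python
# Minimal sketch: fit Y = b0 + b1*X + b2*X^2 by renaming X1 = X, X2 = X^2
# and solving the 2x2 normal equations in corrected sums of squares.
# The data are made up for illustration only.

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 9.1, 12.8, 17.2]

x1 = xs                      # X1 = X
x2 = [x * x for x in xs]     # X2 = X^2 (the renamed variable)

def mean(v):
    return sum(v) / len(v)

def csp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)

a11, a12, a22 = csp(x1, x1), csp(x1, x2), csp(x2, x2)
r1, r2 = csp(x1, ys), csp(x2, ys)

det = a11 * a22 - a12 * a12
b1 = (r1 * a22 - a12 * r2) / det
b2 = (a11 * r2 - a12 * r1) / det
b0 = mean(ys) - b1 * mean(x1) - b2 * mean(x2)

print(f"Y-hat = {b0:.4f} + {b1:.4f}*X + {b2:.4f}*X^2")
```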

As another example, we might want to fit a hyperbolic function with the general
equation
Y = a + b(1/X)
The form of this curve is illustrated in figure 3.

Figure 3.--The hyperbola.

If we let X1 = 1/X, then we can fit this function with the model

Y = β0 + β1X1

As a final example, the exponential curve represented by the function

Y = AB^X

has the form shown in figure 4.

Figure 4.--Exponential curves.

The curve can be fitted by a linear model if we take the logarithm of both sides, giving

log Y = log A + X log B

This is the same as the linear model

Y' = β0 + β1X

where we let Y' = log Y.

As noted earlier, when we speak of a linear model we are referring to a model that
is linear in the coefficients. The above examples show that a linear model may be
nonlinear as far as the variables are concerned.
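The sketch below (a minimal illustration, not part of the original paper; the data are invented) shows the exponential case in practice: fit log Y against X by simple linear regression and then recover A and B from the intercept and slope.

```python
import math

# Minimal sketch (illustrative data): fit Y = A * B**X by taking logarithms,
# log Y = log A + X * log B, a simple linear model Y' = b0 + b1*X with Y' = log Y.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 7.9, 16.5, 31.0]

yprime = [math.log10(y) for y in ys]            # Y' = log Y
n = len(xs)

sxy = sum(x * yp for x, yp in zip(xs, yprime)) - sum(xs) * sum(yprime) / n
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n

b1 = sxy / sxx                                  # estimate of log B
b0 = sum(yprime) / n - b1 * sum(xs) / n         # estimate of log A

A = 10 ** b0
B = 10 ** b1
print(f"fitted curve: Y-hat = {A:.3f} * {B:.3f}**X")
```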

There are, of course, some curvilinear functions that cannot be transformed to a linear model. The function Y = A + B^X, for example, cannot be transformed to the simple linear form, nor can the function Y = a(X - b)^c. There are procedures for
fitting some nonlinear models, but they are generally too involved and laborious for
inclusion here. It should be mentioned, however, that there are electronic computer
programs available for fitting some nonlinear models.
Selecting the appropriate model can be both critical and difficult. It is one phase of
regression analysis that has not been taken over by the electronic computers. The
degree of difficulty and our probable success will depend to a considerable extent on
how much we know about the behavior of the subject matter.
In some cases, the model can be derived by reasoning from basic principles. In
formulating a model for the relationship between the specific gravity (S) of an annual
increment of wood at a given point on a tree and the distance (T) of that point from the
apex of the tree, Stage 3 reasoned as follows:
(1) The specific gravity is inversely related to the concentration of auxin per unit area of cambium (C) and directly proportional to the distance (T) from the apex. These effects are additive. This gives

S = a + b(1/C) + dT

in which a, b, and d are constants.
(2) The tree bole is approximately a paraboloid, so that the diameter (D_T) of the stem at a given distance (T) from the apex can be represented by

D_T = g√T

in which g is a constant.
(3) Since auxin concentration would vary inversely with cambial area (and hence with diameter) we have

C = k/D_T    or    C = k/(g√T)

in which k = a constant.
(4) This reasoning then led to the model

S = a + dT + (bg/k)√T
3 Stage, Albert R. Specific gravity and tree weight of single-tree samples of grand fir. U.S. Forest Serv. Res. Paper INT-4, 11 pp., Intermountain Forest and Range Expt. Sta., Ogden, Utah. 1963.


or, in terms of the general linear model,

S = β0 + β1X1 + β2X2

where: X1 = T
       X2 = √T
In many cases, our knowledge of the subject will be less specific, but the same line
of development must still be followed. If we were studying the relationship between Y
and two independent variables (X1 and X2), we might have an idea that the relationship between Y and one of the variables (say X1) could be represented by a straight line (fig. 5).

Figure 5.--Y is a linear function of X1.

Now, to work X2 into the model, we have to consider how changes in X2 might affect the relationship of Y to X1. Equal increments in X2 might result in a series of equally spaced parallel straight lines for the relationship of Y to X1 (fig. 6).

Figure 6.--The relationship of Y to X1 and X2.

This suggests that in the equation Y = a + bX1 the slope (b) remains unchanged, but the value of the Y intercept is a linear function of X2 (that is, a = a' + b'X2). Then, substituting for a in the relationship between Y and X1, we have

Y = a' + b'X2 + bX1


or the general model

Y = β0 + β1X1 + β2X2

In this case, we say that the effects of X1 and X2 are additive.
If we have reason to believe that the Y intercept remains constant but the slope changes linearly (that is, b = a' + b'X2) with changes in X2, we would have the model

Y = a + (a' + b'X2)X1

or

Y = a + a'X1 + b'X1X2

that is, the general model

Y = β0 + β1X1 + β2X2'

where: X2' = X1X2

In cases such as this, we say that there is an interaction between the effects of X1 and X2, and the variable X2' = X1X2 is called an interaction term. It implies that the effect that one variable has on changes in Y depends on (interacts with) the level of the other variable.
Most likely, if the slope changes, the Y intercept will also change. If both of these changes are thought to be linear, then

a = a' + b'X2
b = a'' + b''X2

and the model becomes

Y = a' + b'X2 + a''X1 + b''X1X2

or

Y = β0 + β1X1 + β2X2 + β3X3

where: X3 = X1X2
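A short sketch (not from the paper; the coefficient values are assumed only for illustration) of what the interaction term X3 = X1*X2 does: with it in the model, the slope of Y on X1 shifts with the level of X2, which is exactly the interaction idea described above.

```python
# Minimal sketch (coefficients chosen arbitrarily for illustration):
# with an interaction term X3 = X1*X2, the slope of Y on X1 depends
# on the level of X2.

b0, b1, b2, b3 = 2.0, 1.5, 0.8, 0.5   # assumed values, not from the paper

def y_hat(x1, x2):
    x3 = x1 * x2                      # the interaction variable
    return b0 + b1 * x1 + b2 * x2 + b3 * x3

for x2 in (0.0, 1.0, 2.0):
    # slope of Y on X1 at this level of X2 equals b1 + b3*X2
    slope = y_hat(1.0, x2) - y_hat(0.0, x2)
    print(f"X2 = {x2:.0f}:  slope of Y on X1 = {slope:.2f}  (= b1 + b3*X2)")
```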

If the relationship of the dependent variable to an independent variable is


curvilinear, the problem is to select a mathematical function that gives a good
approximation to the particular form of curve. This is largely a matter of learning
the appearance of the various functions. Some of the forms associated with the more
commonly fitted functions are shown in Appendix C. Those who will be doing considerable regression work would do well to maintain a library of curve forms, adding to
it whenever a new form is encountered.

If absolutely nothing is known about the form of relationship, then the selection of a
model gets to be a rather loose and frustrating process. There are no good rules to

follow. Plotting the data will sometimes suggest the appropriate model to fit. For a

single independent variable, plotting is no problem. For two independent variables

(say X1 and X2), we can plot Y over X1 using different symbols to represent different levels of X2. As an example, the following set of hypothetical observations has been plotted in figure 7.

Figure 7.--Relationship of Y to X1 at various levels of X2.

Each symbol represents a different class of X2 values as shown along the right side of the graph. The different lines represent the relationship of Y to X1 at the various levels of X2. The relationship of Y to X1 seems to be linear (Y = a + bX1), and it appears that both the Y intercept and the slope of the line increase linearly with X2 (a = a' + b'X2, b = a'' + b''X2). This would suggest the model

Y = β0 + β1X1 + β2X2 + β3X3

where: X3 = X1X2.

Of course, real data will rarely behave so nicely.

If there are more than two independent variables, the graphics may not be much
more illuminating than the basic data tabulation. The usual procedure is to plot the
single variables in the hope of spotting some overall trend, and then perhaps to plot
pairs of independent variables (as above) to reveal some of the two-variable interactions. There are other graphical manipulations that can be tried, but the probability
of success is seldom high.
With the advent of electronic computers, much of the computational drudgery has
been removed from the fitting of a regression. This has led to what might be called a
'shotgun' technique of fitting. A guess is made as to the most likely form for each
variable, a number of interaction terms are introduced, usually up to the capacity of
the program, and then a machine run is made. This may consist of fitting all possible
linear combinations of a set of independent variables (as discussed in the section on
Electronic Computers) or may employ a stepwise fitting technique (Appendix A, Method III). From the output of the machine, the variables that seem best are selected.
The analysis may end here or further trials may be made using new variables or new
forms of the variables tried in the first run. Statistically, this technique has some
flaws. Nonetheless, it is useful when little or nothing is known about the nature of the
relationships involved. But, it should be recognized and used strictly as an exploratory
procedure.

FITTING A LINEAR MODEL

The Least Squares Principle


The most commonly used procedures for fitting a regression surface are derived
from what is known as the least squares principle. To see what this principle is and
what it leads to, suppose that a sample of n units has been selected from some
population and on each unit a value has been observed for a dependent variable (Y) and
several independent variables (X1, X2, ---, Xk). Suppose further that the relationship between the dependent and the independent variables can be represented by the linear model

Yi = β0 + β1X1i + β2X2i + --- + βkXki + εi

where: Yi = the observed value of the dependent variable for the ith unit in the sample (i = 1, 2, ---, n)
       Xji = the value of the jth independent variable (j = 1, 2, ---, k) on the ith sample unit
       βj = the regression coefficient of the jth independent variable
       εi = the deviation of the Y value from the regression surface (that is, εi = Yi - β0 - β1X1i - β2X2i - --- - βkXki).

We do not, of course, know the values of the coefficients, but must estimate them from the sample data. The principle of least squares says that under certain conditions, the best estimates of the coefficients are those that make the sum of squared deviations a minimum.
Now, for the ith sample unit, the deviation would be

εi = Yi - β0 - β1X1i - β2X2i - --- - βkXki

and the squared deviation is

εi² = (Yi - β0 - β1X1i - β2X2i - --- - βkXki)²

For all sample units, the sum of squared deviations is

Σεi² = Σ(Yi - β0 - β1X1i - β2X2i - --- - βkXki)²

In this quantity, we know what the values of Yi, X1i, X2i, ---, Xki are, because these were observed on the sample units. The magnitude of this sum of squared deviations therefore depends on what values are used for the regression coefficients (βj). To distinguish them from the true but unknown coefficients, the estimates will be symbolized by β̂j.
It can be shown that the estimates that make the sum of squared deviations a minimum can be found by solving the following set of simultaneous equations:

β̂0(n)    + β̂1(ΣX1)   + β̂2(ΣX2)   + --- + β̂k(ΣXk)   = ΣY
β̂0(ΣX1)  + β̂1(ΣX1²)  + β̂2(ΣX1X2) + --- + β̂k(ΣX1Xk) = ΣX1Y
β̂0(ΣX2)  + β̂1(ΣX1X2) + β̂2(ΣX2²)  + --- + β̂k(ΣX2Xk) = ΣX2Y
  .            .           .                 .           .
β̂0(ΣXk)  + β̂1(ΣX1Xk) + β̂2(ΣX2Xk) + --- + β̂k(ΣXk²)  = ΣXkY

These are known as least squares normal equations (LSNE), and the solutions (β̂0, β̂1, ---, β̂k) are called the least squares estimates of the regression coefficients. The first equation is called the β0 equation, the second the β1 equation, etc.
For those who are familiar with differential calculus, it can be mentioned that the βj equation is obtained by taking the derivative of the sum of squared deviations with respect to βj and setting it equal to zero: the familiar procedure for finding the value of a variable for which a function is a maximum or minimum. Thus,

∂(Σεi²)/∂βj = -2 ΣXji(Yi - β0 - β1X1i - --- - βkXki)

Setting this equal to zero and moving the term with no coefficient to the righthand side gives the equation

β̂0(ΣXj) + β̂1(ΣX1Xj) + --- + β̂k(ΣXkXj) = ΣXjY

But, writing the normal equations for a particular linear model does not require a
knowledge of calculus. Merely use the set of equations given above as a general set
and select those needed to solve for the coefficients in the model to be fitted,
eliminating unwanted coefficients from the selected equations. Thus for the model
Y = β0 + β1X1 + β6X6

the equations, with unwanted coefficients eliminated, would be:

Coefficient          Equation
    β0      β̂0(n)    + β̂1(ΣX1)   + β̂6(ΣX6)   = ΣY
    β1      β̂0(ΣX1)  + β̂1(ΣX1²)  + β̂6(ΣX1X6) = ΣX1Y
    β6      β̂0(ΣX6)  + β̂1(ΣX1X6) + β̂6(ΣX6²)  = ΣX6Y
For the model

Y = β1X1 + β2X2

the normal equations would be:

Coefficient          Equation
    β1      β̂1(ΣX1²)  + β̂2(ΣX1X2) = ΣX1Y
    β2      β̂1(ΣX1X2) + β̂2(ΣX2²)  = ΣX2Y
When the model contains a constant term (β0), it is possible to simplify the normal equations and their solution. The simplification arises from the fact that the solution of the normal equations will give as the estimate of β0,

β̂0 = Ȳ - β̂1X̄1 - β̂2X̄2 - --- - β̂kX̄k

where: Ȳ, X̄1, --- = the sample means of Y, X1, etc.

Using this value, we can rewrite the model

(Y - Ȳ) = β1(X1 - X̄1) + β2(X2 - X̄2) + --- + βk(Xk - X̄k) + ε

or

y = β1x1 + β2x2 + --- + βkxk + ε

where: y = Y - Ȳ, x1 = X1 - X̄1, etc.

The normal equations for this model are:

β̂1(Σx1²)  + β̂2(Σx1x2) + --- + β̂k(Σx1xk) = Σx1y
β̂1(Σx1x2) + β̂2(Σx2²)  + --- + β̂k(Σx2xk) = Σx2y
  .            .                 .           .
β̂1(Σx1xk) + β̂2(Σx2xk) + --- + β̂k(Σxk²)  = Σxky

where: Σx1² = ΣX1² - (ΣX1)²/n,  Σx1x2 = ΣX1X2 - (ΣX1)(ΣX2)/n,  Σx1y = ΣX1Y - (ΣX1)(ΣY)/n,  etc.

The usual procedure is to solve the normal equations for β̂1, β̂2, ---, β̂k, and then to use these values to solve for β̂0. This may not appear to be much of a saving in labor, but it is. The saving arises from the fact that the normal equations have been reduced by one row and column.
The terms Σx1², Σx1x2, Σx1y, etc., are usually referred to as the corrected sums of squares and products, while ΣX1², ΣX1X2, ΣX1Y, etc., are called uncorrected or raw sums of squares and products. Some details of the analysis of variance depend on
which fitting procedure is used, as will be noted later on.
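The following is a minimal sketch (not part of the paper; the data are invented for illustration) of the corrected-sums procedure just described for a model with two X variables: solve the reduced normal equations for the slopes, then recover the constant term from the sample means.

```python
# Minimal sketch (illustrative data): fit Y = b0 + b1*X1 + b2*X2 with the
# corrected sums of squares and products, then recover b0 from the means.

ys = [12.0, 15.0, 9.0, 20.0, 17.0, 14.0]
x1 = [ 3.0,  5.0, 2.0,  8.0,  6.0,  4.0]
x2 = [ 1.0,  2.0, 1.0,  3.0,  2.0,  2.0]

def mean(v):
    return sum(v) / len(v)

def csp(u, v):
    """Corrected sum of products: sum(u*v) - sum(u)*sum(v)/n."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)

# Reduced normal equations (constant term eliminated):
#   csp(x1,x1)*b1 + csp(x1,x2)*b2 = csp(x1,y)
#   csp(x1,x2)*b1 + csp(x2,x2)*b2 = csp(x2,y)
a11, a12, a22 = csp(x1, x1), csp(x1, x2), csp(x2, x2)
r1, r2 = csp(x1, ys), csp(x2, ys)

det = a11 * a22 - a12 * a12
b1 = (r1 * a22 - a12 * r2) / det
b2 = (a11 * r2 - a12 * r1) / det
b0 = mean(ys) - b1 * mean(x1) - b2 * mean(x2)   # b0 = Ybar - b1*X1bar - b2*X2bar

print(f"Y-hat = {b0:.3f} + {b1:.3f}*X1 + {b2:.3f}*X2")
```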
Problem I - Multiple Linear Regression With a Constant Term
A number of units (n = 13) were selected at random from a population. On each unit, measurements were made of a Y variable and three independent variables (X1, X2, and X3). The model to be fitted is of the form

Y = β0 + β1X1 + β2X2 + β3X3 + ε

The data were as follows:

Since the model contains a constant term, it will be simpler to work with the
corrected sums of squares and products. For this method, the normal equations will be
Coefficient          Equation
    β1      β̂1(Σx1²)  + β̂2(Σx1x2) + β̂3(Σx1x3) = Σx1y
    β2      β̂1(Σx1x2) + β̂2(Σx2²)  + β̂3(Σx2x3) = Σx2y
    β3      β̂1(Σx1x3) + β̂2(Σx2x3) + β̂3(Σx3²)  = Σx3y

Calculating the corrected sums of squares and products:

and similarly,

In computing the sums of squares and products, it should be noted that a sum of
products may be either positive or negative, but a sum of squares must always be
positive; a negative sum of squares indicates a computational error.
Substituting the sums of squares and products into the normal equations, we have

The solution of the system yields

and from these we obtain

Therefore, the fitted regression is

Appendix A reviews the method for solving a set of simultaneous equations.
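Appendix A itself is not reproduced at this point, so the following is only a generic sketch (not the specific routines of Appendix A) of solving a set of simultaneous normal equations by elimination.

```python
# Generic sketch of solving simultaneous linear equations by elimination.
# Each row of `aug` is one equation: coefficients followed by the
# right-hand side, e.g. 2*b1 + 1*b2 = 5  ->  [2.0, 1.0, 5.0].
# Assumes non-zero pivots (no partial pivoting), which is enough for
# well-behaved normal equations.

def solve(aug):
    n = len(aug)
    a = [row[:] for row in aug]            # work on a copy
    for i in range(n):
        pivot = a[i][i]                    # divide row i by its pivot
        a[i] = [v / pivot for v in a[i]]
        for j in range(n):                 # eliminate unknown i elsewhere
            if j != i:
                factor = a[j][i]
                a[j] = [vj - factor * vi for vj, vi in zip(a[j], a[i])]
    return [row[n] for row in a]           # the solutions

# Example: 2*b1 + 1*b2 = 5 and 1*b1 + 3*b2 = 10
print(solve([[2.0, 1.0, 5.0],
             [1.0, 3.0, 10.0]]))           # -> [1.0, 3.0]
```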


In a fitted regression, the circumflex (^ ), which is also referred to as a caret or hat,


is placed over the Y to indicate that we are dealing with an estimated value, just as we used β̂1 to symbolize the estimate of β1. In this case, it will be recalled, the value being
estimated is the mean of all Y values associated with some specified combination of
values for the three independent variables.
Although the above method involves less work and is to be preferred when the
model contains a constant term, the same model can be fitted using uncorrected sums
of squares and products. The normal equations in this case would be:
Coefficient          Equation
    β0      β̂0(n)    + β̂1(ΣX1)   + β̂2(ΣX2)   + β̂3(ΣX3)   = ΣY
    β1      β̂0(ΣX1)  + β̂1(ΣX1²)  + β̂2(ΣX1X2) + β̂3(ΣX1X3) = ΣX1Y
    β2      β̂0(ΣX2)  + β̂1(ΣX1X2) + β̂2(ΣX2²)  + β̂3(ΣX2X3) = ΣX2Y
    β3      β̂0(ΣX3)  + β̂1(ΣX1X3) + β̂2(ΣX2X3) + β̂3(ΣX3²)  = ΣX3Y

or,

As before, the solutions are

Problem II - Multiple Linear Regression Without a Constant Term


Given the data of Problem I, fit the model

Y = β1X1 + β2X2 + ε

This presents no additional problems. Since the model contains no constant term, we
will have to work with uncorrected sums of squares and products. The normal
equations to be solved are:
Coefficient          Equation
    β1      β̂1(ΣX1²)  + β̂2(ΣX1X2) = ΣX1Y
    β2      β̂1(ΣX1X2) + β̂2(ΣX2²)  = ΣX2Y

or

1,235 β̂1 + 847 β̂2 = 2,554
  847 β̂1 + 809 β̂2 = 1,382

The solutions are β̂1 = 3.1793 and β̂2 = -1.6204, so the fitted equation is

Ŷ = 3.1793 X1 - 1.6204 X2
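The individual observations of Problem I are garbled in this copy, but the uncorrected sums they imply can be recovered from the calculations shown later in Problem VI (ΣX1² = 1,235, ΣX1X2 = 847, ΣX2² = 809, ΣX1Y = 2,554, ΣX2Y = 1,382). The quick check below (a sketch added for this edition, not from the original) solves the two normal equations from those sums and reproduces the coefficients quoted in the text.

```python
# Check of Problem II from the sums recoverable elsewhere in the paper.
a11, a12, a22 = 1235.0, 847.0, 809.0     # sum X1^2, sum X1*X2, sum X2^2
r1, r2 = 2554.0, 1382.0                  # sum X1*Y, sum X2*Y

det = a11 * a22 - a12 * a12
b1 = (r1 * a22 - a12 * r2) / det
b2 = (a11 * r2 - a12 * r1) / det

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")   # -> b1 = 3.1793, b2 = -1.6204
```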

Problem III - Simple Linear Regression With a Constant Term


It is customary in elementary discussions of regression to start with the fitting of
a simple linear equation and then go on to the fitting of multiple regressions. In these
sample problems, the procedure has been reversed in the hope of emphasizing the
generality of the fitting procedure. Thus, fitting the linear model

is just a simple case of the general methods used in Problems I and 11. Since the
model has a constant term, we can work with the corrected sums of squares and
products. This results in the single normal equation
Coefficient          Equation
    β1      β̂1(Σx1²) = Σx1y

or using the data of Problem I,

182 β̂1 = 448

The solution is β̂1 = 2.4615 and with this we find


so the fitted equation is

Problem IV - The Arithmetic Mean


It may be of interest to the reader who has had little exposure to the methods of least squares to know that the sample mean is also a form of least squares regression. If we specify the model

Y = β0 + ε

we are merely saying that we want to estimate the mean value of Y, ignoring the values of the X variables. This is obviously the sample mean, Ȳ. Treating this as a regression problem, the normal equation would be:

Coefficient          Equation
    β0      β̂0(n) = ΣY

which has the familiar solution

β̂0 = ΣY/n = Ȳ

Problem V - Fitting a Curve


Fitting a curve presents no new problems, provided the curve can be expressed by

a linear model. To fit Y as a quadratic function of X1 (that is, Y = a + bX1 + cX1²), for example, we merely rename X1² (say X4 = X1²) and fit the linear model

Y = β0 + β1X1 + β4X4 + ε

The values of X4 would be the squares of the corresponding X1 values.

As the model contains a constant term, the normal equations can be written

Coefficient          Equation
    β1      β̂1(Σx1²)  + β̂4(Σx1x4) = Σx1y
    β4      β̂1(Σx1x4) + β̂4(Σx4²)  = Σx4y

The corrected sums of squares and products involving X4 are

so the normal equations are

Solving this set gives β̂1 = 1.1663 and β̂4 = 0.0708. Then,

and the fitted quadratic is

Problem VI - A Conditioned Regression


Sometimes there is a reason to impose certain restrictions on the values of the
coefficients in a fitted regression. We have already seen one example of this in
Problem II, where we fitted a model without a constant term. This is equivalent to imposing the restriction that β0 = 0, that is, the regression surface passes through the origin.
Fitting a regression with linear restrictions on the coefficients usually involves
nothing more than rewriting the model and sometimes a revision of the basic


variables. Suppose, for example, that we were going to fit the model
Y = β0 + β1X1 + β2X2

to the data of Problem I, but we wished to impose the restriction that β1 + β2 = 1. This is equivalent to

β2 = 1 - β1

and writing this into the original model gives

Y = β0 + β1X1 + (1 - β1)X2

or

(Y - X2) = β0 + β1(X1 - X2).

This is obviously a linear model

Y' = β0 + β1X'

where: Y' = Y - X2
       X' = X1 - X2

The normal equation for fitting the revised model is

β̂1(Σx'²) = Σx'y'

There are two ways of getting the sums of squares and products of the revised
variables; one is to compute revised values for each observation. Thus we would have
Sums:  ΣY' = 143,  ΣX' = 26        Means:  Ȳ' = 11,  X̄' = 2

and

Σx'² = 298,  Σx'y' = 848

Often it will be easier to work directly with the original values, thus:
Σx'² = ΣX'² - (ΣX')²/n
     = Σ(X1 - X2)² - [Σ(X1 - X2)]²/n
     = ΣX1² - 2ΣX1X2 + ΣX2² - [(ΣX1)² - 2(ΣX1)(ΣX2) + (ΣX2)²]/n

Then, using the values that have already been computed for the original variables,

Σx'² = 1,235 - 2(847) + 809 - [(117)² - 2(117)(91) + (91)²]/13 = 298, as before.

Similarly,

Σx'y' = ΣX'Y' - (ΣX')(ΣY')/n
      = Σ(X1 - X2)(Y - X2) - [Σ(X1 - X2)][Σ(Y - X2)]/n
      = ΣX1Y - ΣX2Y - ΣX1X2 + ΣX2² - [(ΣX1)(ΣY) - (ΣX2)(ΣY) - (ΣX1)(ΣX2) + (ΣX2)²]/n
      = 2,554 - 1,382 - 847 + 809 - [(117)(234) - (91)(234) - (117)(91) + (91)²]/13
      = 848, as before.

Putting these values in the normal equation gives

298 β̂1 = 848

and

β̂1 = 2.8456
β̂0 = 11 - (2.8456)(2) = 5.3088

In terms of the revised variables, the regression is

Ŷ' = 5.3088 + 2.8456 X'.

This may be rewritten in terms of the original variables as

(Ŷ - X2) = 5.3088 + 2.8456(X1 - X2)

or

Ŷ = 5.3088 + 2.8456 X1 - 1.8456 X2

Note that the coefficients of X1 and X2 add up to 1 as required.
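A quick check of Problem VI (a sketch added for this edition, using only the corrected sums the text itself computes: Σx'² = 298, Σx'y' = 848, with means Ȳ' = 11 and X̄' = 2):

```python
# Conditioned regression of Problem VI: the restriction b1 + b2 = 1 is
# imposed by fitting Y' = (Y - X2) on X' = (X1 - X2), then b2 = 1 - b1.

sxx, sxy = 298.0, 848.0
ybar_prime, xbar_prime = 11.0, 2.0

b1 = sxy / sxx                          # 2.8456
b0 = ybar_prime - b1 * xbar_prime       # 5.3087 at full precision
                                        # (the text carries the rounded 2.8456,
                                        #  which gives 5.3088)
b2 = 1.0 - b1                           # -1.8456, so b1 + b2 = 1 as required

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```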
Requirements
In order to use the methods that have been described, the sample data must meet
certain requirements. For one thing, it must be from a population for which the
variance is homogeneous. That is, the variance of the Y values about the regression
surface must be the same at all points (for all combinations of X values). If the
variance is not homogeneous, it will usually be more efficient to use some weighting
procedure as will be discussed later. In this connection, it should be noted that if the
model to be fitted does not have a constant term, then the homogeneity of variance may
be open to question. Absence of the constant term implies that when all X variables
are equal to zero, then Y will also be zero. If Y cannot have negative values, then the
variability of Y may be restricted near the origin.
A second requirement is that for the sample units the deviations (εi) of the Y values
from the regression surface must be independent of each other. That is, the size and
direction (+ or -) of the error for one unit should have no relationship to the size and
direction of the error for any of the other units in the sample, beyond the fact that
they are from the same population. Independence of errors can usually be assumed if
the sample units were randomly selected as far as the Y values are concerned
(purposive selection of X values is usually permissible and often desirable). The
errors may not be independent where a series of observations are made on a single
unit. Thus, when growth bands are placed on trees and the diameter is observed on
the same trees at intervals of time, the errors will probably not be independent. Also,
if the units observed are clustered in some way, the errors may not be independent
within clusters.
A final requirement is that the X values be measured with essentially no error.
Procedures exist for fitting a regression when the dependent and the independent
variables are both subject to error, but they are beyond the scope of this paper.


It should be noted that fitting a regression by the least squares principle does not
require that the Y values be normally distributed about the regression surface.
However, the commonly used procedures for computing confidence limits and making
tests of significance (t and F tests) do assume normality.

FITTING A WEIGHTED REGRESSION

The regression fitting procedures that have been described will give unbiased
estimates of the regression coefficients, whether the variance is homogeneous or not. However, if the variance is not homogeneous, a weighted regression procedure may give more precise estimates of the coefficients. In a weighted regression, each squared deviation is assigned a weight (wi), and the regression coefficients are estimated so as to minimize the weighted sum of squared deviations. That is, values are found for the β̂j's so as to minimize

Σwi(Yi - β̂0 - β̂1X1i - --- - β̂kXki)²

This leads to the normal equations


Coefficient          Equation
    β0      β̂0(Σwi)    + β̂1(ΣwiX1i)    + --- + β̂k(ΣwiXki)    = ΣwiYi
    β1      β̂0(ΣwiX1i) + β̂1(ΣwiX1i²)   + --- + β̂k(ΣwiX1iXki) = ΣwiX1iYi
    .            .            .                    .               .
    βk      β̂0(ΣwiXki) + β̂1(ΣwiX1iXki) + --- + β̂k(ΣwiXki²)   = ΣwiXkiYi

The weights are usually made inversely proportional to the known (or assumed)
variance of Y about the regression surface. To understand the reasoning behind this,
refer to figure 8, in which a hypothetical regression of Y on X has been plotted along
with a number of individual unit values.


Figure 8.--An example of non-homogeneous variance.

It is obvious that the variance of Y about the regression line is not homogeneous:
it is larger for large values of X than for small values. It is also fairly obvious that
a single observation from the lower end of the line tells much more about the location
of the line than does a single observation from the upper end. That is, units that are
likely to vary less from the line (small variance) give more information about the
location of the line than do the units that are subject to large variation. It stands to
reason that in fitting this regression the units with small variance should be given
more weight than the units with large variance. This can be accomplished by assigning
weights that are inversely proportional to the variance. Thus, if the variance is known
to be proportional to the value of one of the X variables (say Xj), then the weight could be

wi = 1/Xji

If the variance is proportional to the square of Xj, the weight could be

wi = 1/Xji²

Determining the appropriate weighting procedure can be a problem. If nothing is


known about the magnitude of the variance at different points on the regression
surface, special studies may have to be made.
It might be mentioned that if the variance is homogeneous, each observation is given
equal weight (wi = 1). Notice that when wi = 1, the normal equations for a weighted regression are the same as those for an unweighted regression.

Problem VII - A Weighted Regression With a Constant Term


To illustrate the weighted regression procedure, it will be assumed that in the data of Problem I the variance of Y is proportional to X1 and that we want to fit the model

Y = β0 + β1X1

The appropriate weighting would be

wi = 1/X1i

The basic data and weights give Σwi = 1.9129, ΣwiX1i = 13, ΣwiX1i² = 117, ΣwiYi = 24.631, and ΣwiX1iYi = 234.

The normal equations would be:

Coefficient          Equation
    β0      β̂0(Σwi)    + β̂1(ΣwiX1i)  = ΣwiYi
    β1      β̂0(ΣwiX1i) + β̂1(ΣwiX1i²) = ΣwiX1iYi

or

1.9129 β̂0 + 13 β̂1 = 24.631
    13 β̂0 + 117 β̂1 = 234

The solutions are β̂0 = -2.9224 and β̂1 = 2.3247, so the fitted equation is

Ŷ = 2.3247 X1 - 2.9224
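A quick check of Problem VII (a sketch added for this edition, using only the weighted sums given in the text):

```python
# Weighted normal equations of Problem VII (weights w = 1/X1):
#   sum(w)*b0    + sum(w*X1)*b1   = sum(w*Y)
#   sum(w*X1)*b0 + sum(w*X1^2)*b1 = sum(w*X1*Y)

a11, a12, a22 = 1.9129, 13.0, 117.0      # sum w, sum w*X1, sum w*X1^2
r1, r2 = 24.631, 234.0                   # sum w*Y, sum w*X1*Y

det = a11 * a22 - a12 * a12
b0 = (r1 * a22 - a12 * r2) / det
b1 = (a11 * r2 - a12 * r1) / det

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")   # -> b0 = -2.9224, b1 = 2.3247
```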

Problem VIII - Ratio Estimators


A situation is frequently encountered where we have observations on a Y and an
associated X value, and we want to describe the relationship of Y to X by a ratio

Y/X = R

This is equivalent to fitting the regression model

Y = β1X1

The appropriate estimate of β1 will depend on how the variance of Y changes with the level of X1. Three situations will be considered: (1) the variance of Y is proportional to X1, (2) the variance of Y is proportional to X1², and (3) the variance is homogeneous.
(1) Variance of Y proportional to X1. In this case, we would fit a weighted regression using the weights

wi = 1/X1i
The normal equation for β̂1 would be

(ΣwiX1i²)β̂1 = ΣwiX1iYi

so that

β̂1 = ΣwiX1iYi / ΣwiX1i²

However, wi = 1/X1i, so that ΣwiX1iYi = ΣYi and ΣwiX1i² = ΣX1i. Hence,

β̂1 = ΣYi/ΣX1i = nȲ/nX̄1 = Ȳ/X̄1

In other words, if the variance of Y is proportional to X, then the ratio of Y to X is


estimated by computing the ratio of Y to X. In sampling literature, this is sometimes
referred to as the ratio-of-meansestimator.
(2) Variance of Y proportional to X
w =
i

1
X
1i

. The weights in this case would be

As weve seen, the weighted estimate of is


1
^

But, if w =
i

w X Y

i 1i i ,
2
S
w X
i 1i

( )

2
X
1i = n. Hence,
1
=
, then w X Yi = (Y / X ), and w X
2
2
i 1i
i 1i
i 1i
X
X
1i
1i
2

Y
( i /X1i) .

=
n
1

2
, then the ratio of Y to X is estimated
1
by computing the ratio of Y to X for each unit and then taking the average of these
ratios. In sampling, this is called the mean-of-ratiosestimator.

So, if the variance of Y is proportional to X

(3) Variance of Y is homogeneous. If the variance is homogeneous, we can fit an unweighted regression (that is, a weighted regression with equal weights) for which the normal equation is

(ΣX1²)β̂1 = ΣX1Y

or

β̂1 = ΣX1Y / ΣX1²

For the data of Problem I, the three estimates would be:


(1) Variance proportional to X1:

β̂1 = ΣY/ΣX1 = 234/117 = 2.0000

(2) Variance proportional to X1²:

β̂1 = Σ(Y/X1)/n = 24.631/13 = 1.8947

(3) Homogeneous variance:

β̂1 = ΣX1Y/ΣX1² = 2,554/1,235 = 2.0680
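The sketch below (not from the paper; the paired data are invented for illustration) computes the same three ratio estimators from individual (X, Y) observations.

```python
# The three ratio estimators described above, from paired observations:
#   (1) ratio of means   sum(Y)/sum(X)       -- variance proportional to X
#   (2) mean of ratios   mean(Y/X)           -- variance proportional to X^2
#   (3) least squares    sum(X*Y)/sum(X^2)   -- homogeneous variance

xs = [4.0, 7.0, 9.0, 12.0, 15.0]
ys = [9.0, 13.0, 19.0, 23.0, 32.0]

ratio_of_means = sum(ys) / sum(xs)
mean_of_ratios = sum(y / x for x, y in zip(xs, ys)) / len(xs)
least_squares  = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

print(ratio_of_means, mean_of_ratios, least_squares)
```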

As many readers may know, fitting the model

Y = β0 + β1X1 + --- + βkXk

by weighted regression methods with weights wi leads to the same results as an unweighted (or equal weighted) fitting of

Y' = β0X0' + β1X1' + --- + βkXk'

where: Y' = √wi Y,  X0' = √wi,  X1' = √wi X1,  ---,  Xk' = √wi Xk

Transformations
Fitting a regression in the presence of heterogeneous variance may be a lot of work.
Special study of the variance is often required to select the proper weighting procedure, and the computations involved in a weighted fitting can be quite laborious.
To avoid the computations of a weighted regression, some workers resort to a
transformation of the variables. The hope is that the transformation will largely
eliminate the heterogeneity of variance, thus permitting the use of equal weighting
procedures. The most common transformations are log Y, arc sin √Y (used where Y is a percentage), and √Y (frequently used if Y is a count rather than a measured variable).


This may be perfectly valid if the transformation does actually induce homogeneity.
But, there is some tendency to use transformations without really knowing what
happens to the variance. Also, it should be remembered that the use of a transformation may also change the implied relationship between Y and the X variables. Thus,
if we fit

log Y = β0 + β1X1

we are implying that the relationship of Y to X is of the form

Y = ab^X1

Fitting

√Y = β0 + β1X1

implies the quadratic relationship

Y = a + bX1 + cX1²

SOME ELEMENTS OF MATRIX ALGEBRA

It is not necessary to know anything about matrix algebra (as such) in order to make
a regression analysis. If you can compute and use the c-multipliers as discussed in
the sections dealing with the t-test and confidence limits, then you have the essentials.
However, an elementary knowledge of matrix algebra is very helpful in understanding
certain procedures and terms that are used in regression work.
Definitions and Terminology
A matrix is simply a rectangular array of numbers (or letters). The array is
usually enclosed in brackets. The dimensions of a matrix are specified by the number
of rows and columns (in that order) that it contains. Thus in the matrices,

A is a 2 by 2 matrix, B is a 1 by 3, C is a 3 by 1, and D is a 3 by 2 matrix.


The individual numbers (or letters) in a matrix are referred to as elements. A
particular element may be identified by subscripts designating the row and column
(in that order) in which the element appears. Thus, we could represent a matrix by
using subscripted letters in place of numerical elements. For example,

This is an m by n or (m x n) matrix.
A square matrix is one in which the number of rows equals the number of columns.
In a square matrix, the elements alongtheline from the upper left corner to the lower
, a ).
right corner constitute the diagonal of the matrix (that is, a , a ,
nn
11 22

----

If the elements above the diagonal of a square matrix are a mirror image of those
below the diagonal (that is, aij = aji for all values of i and j), the matrix is said to be symmetrical. Some examples of symmetrical matrices are

A square matrix in which every element of the diagonal is a one and every other
element a zero is called the identity matrix and is usually symbolized by the letter I.
The last matrix above is an identity matrix.


Two matrices are equal if they have the same dimensions and if all corresponding
elements are equal. Thus,

only if

The transpose of a matrix is formed by 'rotating' the matrix so that the rows
become the columns and the columns become the rows. The transpose of

The transpose of the row matrix [3  2], for example, is the column matrix with elements 3 and 2.

The transpose of a matrix (A) is symbolized by a prime (A' ).


Matrix Addition and Subtraction


Two matrices having the same dimensions can be added (or subtracted) simply by
adding (or subtracting) the corresponding elements. Thus,

Note that the sum (or difference) matrix has the same dimensions as the matrices
that were added (or subtracted).
Matrix Multiplication
Two matrices can be multiplied only if the number of columns of the first matrix is
equal to the number of rows of the second. If A is a (4 x 3) matrix, B is a (3 x 2),
and C is a (2 x 3), then the multiplications AB, BC, and CB are possible, while the
multiplications AC, BA and CA are not possible.

The rule for matrix multiplication (when possible) is as follows: If A is an (r x n)


matrix and B is an (n x m) matrix, then the ijth element of the product matrix (C) is

cij = ai1b1j + ai2b2j + --- + ainbnj

The dimensions of the product matrix will be (r x m). In words, the above rule states that the element in the ith row and the jth column of the product matrix is obtained as the sum of the products of elements from the ith row of the first matrix and the corresponding elements from the jth column of the second matrix.
Most persons find it easier to spot the pattern of matrix multiplication than to
follow the above rule. A few examples may be helpful:

(1)

(2)

(3)

(4)

(5) Multiplication is not possible; the number of columns in the first matrix does not equal the number of rows in the second.

(6)

In addition to observing the pattern of matrix multiplication in these examples, a


few other points might be noted. For one thing, if the dimensions of a proposed matrix
multiplication are written down, the two inner terms must be equal for multiplication
to be possible. If multiplication is possible, the two outer terms tell the dimensions
of the product matrix.
Thus,

(3 x 2)(4 x 3)

multiplication is not possible.

(4 x 3)(3 x 2)

multiplication is possible; the product matrix will be (4 x 2).

(1 x 200)(200 x 1)

multiplication is possible; the product matrix will be (1 x 1).

A second point to note is that even though the multiplications AB and BA may both
be possible (if A and B are both square matrices), the products will generally not be
the same. That is, matrix multiplication is, in general, not commutative. This is
illustrated by examples (1) and (6). The identity matrix (I) is one exception to this
rule; it will give the same results whether it is used in pre- or post-multiplication
(IA = AI = A).
Finally, it should be noted that any matrix is unchanged when multiplied by the
identity matrix, as in example (4).
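These rules are easy to verify with a computer. The short sketch below (Python with the
NumPy library; the matrices are arbitrary examples, not taken from the text) shows the
product of two matrices, the fact that AB and BA generally differ, the transpose, and the
effect of the identity matrix:

    import numpy as np

    # Two arbitrary 2 by 2 matrices used only to illustrate the rules above.
    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    B = np.array([[0.0, 1.0],
                  [5.0, 2.0]])

    print(A @ B)      # the product AB
    print(B @ A)      # the product BA; generally not equal to AB

    I = np.eye(2)     # the 2 by 2 identity matrix
    print(np.allclose(I @ A, A), np.allclose(A @ I, A))   # IA = AI = A

    print(A.T)        # the transpose A'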
The Inverse Matrix
In ordinary algebra, an equation such as
ab = c
can be solved for b by dividing c by a (if a is not equal to zero). In the case of
matrices, this form of division is not possible. In place of division, we make use of
the inverse matrix, which basically is not too different from ordinary algebraic
division.
The inverse of a square matrix (A) is a matrix (called A inverse and symbolized
by A⁻¹) such that the product of the matrix and its inverse will be the identity matrix:

A⁻¹A = I

As an example, the inverse of

is

since

Finding the inverse of a matrix is not too complicated though it may be a lot of work
if an electronic computer is not available. One method is to work from the basic
definition. In the matrix (A) given above, we can symbolize the elements of the inverse
matrix by the letter c, with subscripts to identify the row and column of the element.
Thus we can write,

Then, by the definition of the inverse, we know that


or

Now, two matrices are equal only if all of their corresponding elements are equal;
therefore, we have three sets of simultaneous equations, each involving three
unknowns.

Solving these leads to the inverse,

as given before.
When the matrix to be inverted is symmetrical, the inversion process is not quite
so laborious, for it turns out that the inverse of a symmetrical matrix will also be
symmetrical. This means that only the elements in and above the diagonal will have
to be computed; the elements below the diagonal can be determined from those above
the diagonal. A calculating routine for inverting a symmetrical matrix is given in
Appendix B.
As a final note, it may be mentioned that although matrix multiplication is not
usually commutative (AB ≠ BA), the product of a matrix and its inverse does
commute (A⁻¹A = AA⁻¹ = I).
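As a quick check of these statements, the following sketch (Python with NumPy; the
symmetrical matrix is an arbitrary example) computes an inverse, verifies that the
product with the original matrix is the identity in either order, and confirms that the
inverse of a symmetrical matrix is itself symmetrical:

    import numpy as np

    # An arbitrary symmetrical matrix used only for illustration.
    A = np.array([[4.0, 1.0, 2.0],
                  [1.0, 3.0, 0.0],
                  [2.0, 0.0, 5.0]])

    A_inv = np.linalg.inv(A)

    # A times its inverse gives the identity, in either order.
    print(np.allclose(A @ A_inv, np.eye(3)))
    print(np.allclose(A_inv @ A, np.eye(3)))

    # The inverse of a symmetrical matrix is also symmetrical.
    print(np.allclose(A_inv, A_inv.T))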


Matrix Algebra and Regression Analysis


Matrix algebra provides a very useful tool in regression analysis. To illustrate,
consider the set of normal equations for fitting a linear regression of Y on X1 and X2:

or with a small revision in notation

(Note that in this case aij = aji; this is a symmetric matrix)


Now, remembering what we learned about matrix multiplication, it will be noted that
this set of equations can be written in matrix form as:

Or even better,

Aβ̂ = R

where:

A is the matrix of sums, and sums of squares and products (computed from the data).
β̂ is the matrix of estimated regression coefficients (to be computed).
R is the matrix of the right-hand sides of the normal equations (computed from the data).


In ordinary algebra the equation Aβ̂ = R could be solved for β̂ simply by dividing both
sides by A, giving β̂ = R/A. This does not work in matrix algebra, but there is a
comparable process that will lead to the desired result. This is to multiply each side
of the equation by the inverse of A (= A⁻¹), giving

A⁻¹Aβ̂ = A⁻¹R

By definition, the product A⁻¹A is equal to the identity matrix I, so we have

Iβ̂ = A⁻¹R

We have seen that a matrix is unchanged when multiplied by the identity matrix, so
Iβ̂ = β̂ and the above equation is

β̂ = A⁻¹R

If we represent the elements of the inverse matrix by cij, the above equation is
equivalent to

(Note again that cij = cji; that is, the inverse of a symmetric matrix is also
symmetric.)

Writing this out more fully gives


Then, since two matrices are equal only if corresponding elements are equal, we
have

or in general

To reassure ourselves that this procedure actually works, let us take a simple
numerical example. Suppose we have the normal equations

By the more familiar simultaneous equation procedures, we find

In matrix form, the normal equations are

The matrix of sums of squares and products is

and its inverse is


The inverse can (and should) be checked by multiplying it and the original matrix to
see that the result is the identity matrix

check
Then, the regression coefficients are given by

Or,

This is only one of the uses of the inverse matrix. It is, in fact, one of the least
important uses, since the regression coefficients are just as easily computed by the
more familiar simultaneous equation techniques. Other uses of the inverse will be
discussed later under testing and computing confidence limits.
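The computation β̂ = A⁻¹R is easily sketched in code. The normal equations below are
hypothetical (the numerical example above did not reproduce here); the point is only
the pattern of the calculation, and the check that the ordinary simultaneous-equation
solution gives the same answer:

    import numpy as np

    # Hypothetical normal equations:
    #   10 b1 + 4 b2 = 38
    #    4 b1 + 6 b2 = 26
    A = np.array([[10.0, 4.0],
                  [ 4.0, 6.0]])    # sums of squares and products
    R = np.array([38.0, 26.0])     # right-hand sides

    A_inv = np.linalg.inv(A)
    beta = A_inv @ R               # beta-hat = A-inverse times R
    print(beta)

    # The same coefficients, without explicitly forming the inverse.
    print(np.linalg.solve(A, R))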
A word on notation is in order here. In regression work, the subscripting of the
c-multipliers depends on the form of the normal equations. If the normal equations
contain the constant term (β0) so that

then the c-multipliers are usually subscripted in this manner


If the normal equations do not contain the constant term, the subscripts for the
c-multipliers will be

It will be recalled that there are two situations in which the normal equations will
not have a constant term. The first is, of course, when the model being fitted does not
contain a constant term. The second is when the model being fitted does have a
constant term but the fitting is being done with corrected sums of squares and products.

ANALYSIS OF VARIANCE
It is important to keep in mind that when a linear model such as

is fitted to a set of sample data, we are, in effect, obtaining sample estimates of the
population regression coefficients β0, β1, β2, . . ., βk. These estimates will obviously
be subject to sampling variation: their values will depend on which units were, by
chance, selected for the sample.
If we have some hypothesis concerning what one or more of the coefficients should
be, this leaves us with the problem of determining whether the differences between
the observed and hypothesized values are real or could have occurred by chance. For
example, suppose we have a hypothesis that the relationship between Y and X is
linear. From a sample of Y and X values, we obtain the estimated equation

To test our hypothesis of a linear relationship, we would want to test whether the
observed value of β̂1 = .082 represents a real or only a chance departure from a
true value of β1 = 0.

Or, if we fitted the quadratic

and obtained the equation

we might ask whether β̂2 = -.11 represents a real or only a chance departure from a
hypothesized value of β2 = 0. If we find that a value of β̂2 = -.11 could arise by chance
in sampling a population for which β2 = 0, then we might infer that there is no
evidence that the parabola is any better than the straight line for describing the
relationship of Y to X.
It may be desired to test more than one coefficient or values other than zero. For
example, we might have reason for believing that the ratio of Y to X is some constant
K, which is equivalent to saying Y = KX. If we have fitted the linear regression

we would then want to test the joint hypothesis that β0 = 0 and β1 = K.

The exact form of the hypothesis will depend on the objectives of the research. The
main requirements are that the hypothesis be specified before the equation is fitted
and that it be meaningful in terms of the research objective.
This portion of the paper will deal with the use of analysis-of-variance procedures
in testing hypotheses about the coefficients. Some hypotheses can also be tested by
the t-test, and this will be described later.
A General Test Procedure
There is a basic procedure that may be used in all situations, but in practice the
computational routine often varies with the method of fitting and the hypothesis to be
tested. First, let's look at the basic procedure and then illustrate the computations
for some of the more common testing situations. To illustrate the discussion of the
basic procedure, assume that we have a set of n observations on a Y-variable and

four associated X-variables, and that for the model

we want to test the joint hypothesis that


The first step is to fit the complete or maximum model to obtain estimates
(β̂0, β̂1, . . ., β̂4) of the regression coefficients. With the results of this fitting, we
then compute the sum of the squared deviations of the individual Y-values from the
corresponding values predicted by the fitted regression (Ŷi). We will call this the
residual sum of squares (the sum of squared deviations or residuals).
Residual Sum of Squares = Σ(Yi - Ŷi)²

Rather than computing each value of Ŷi and each squared deviation (Yi - Ŷi)², the same
result can be obtained by computing

Residual = ΣY² - Σβ̂jRj

where:

Rj = the right-hand side of the jth normal equation.

In this equation the first term (ΣY²) is called the total sum of squares. The second
term (Σβ̂jRj) is called the reduction or regression sum of squares.

The next step is to rewrite the basic model, imposing the conditions specified by
the hypothesis. In this case the hypothesis is that β2 = 1 and β3 = 2, so we have

Rewriting the model we have,

or
where:


Now we fit this "hypothesis" model by the standard least squares procedures and
again compute

Residual

In this equation, the Rk term is the right-hand side of the kth normal equation of the
set used to fit the hypothesis model, and the β̂k are the resulting solutions of that set.
The analysis of variance can now be outlined as follows:

Source                               Degrees of    Sum of     Mean
                                      freedom      squares    square
Residual about hypothesis model
Residual about maximum model
Difference for testing hypothesis
In this table, the residual sums of squares for the hypothesis and maximum models
are computed according to the equations given above. The difference is obtained by
subtraction. The degrees of freedom for a residual sum of squares will always be
equal to the number of observations (n) minus the number of independently estimated
coefficients. Thus, in the maximum model we estimated five coefficients so the
residual sum of squares would have n-5 degrees of freedom. In the hypothesis model,
we estimated three coefficients so the residuals will have n-3 degrees of freedom.
The degrees of freedom for the difference (2) are then obtained by subtraction. The
mean squares are equal to the sum of squares divided by the degrees of freedom.
Finally, the test of the hypothesis can be made by computing

F = Difference Mean Square / Maximum Model Residual Mean Square

This value is compared to the tabular value of F (table 6, Appendix E) with 2 and n-5
degrees of freedom (in this instance). If the computed value exceeds the tabular value
at the selected probability level, the hypothesis is rejected.
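The whole procedure can be sketched in a few lines of Python (NumPy and SciPy). The
data are artificial, and the model and hypothesis (β2 = 1 and β3 = 2, as in the
discussion above) are used only to show the structure: fit both models, take the
difference in residual sums of squares, and form F:

    import numpy as np
    from scipy import stats

    def residual_ss(X, y):
        """Least squares fit; return residual sum of squares and residual df."""
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid), len(y) - X.shape[1]

    # Hypothetical data: n observations on Y and four X variables.
    rng = np.random.default_rng(0)
    n = 20
    X = rng.normal(size=(n, 4))
    y = 2 + X @ np.array([1.5, 1.0, 2.0, 0.3]) + rng.normal(size=n)

    ones = np.ones((n, 1))
    X_max = np.hstack([ones, X])                 # maximum model: b0 + b1 ... b4

    # Hypothesis: b2 = 1 and b3 = 2.  Move the fixed parts to the Y side and
    # fit the remaining coefficients (b0, b1, b4).
    y_hyp = y - 1.0 * X[:, 1] - 2.0 * X[:, 2]
    X_hyp = np.hstack([ones, X[:, [0]], X[:, [3]]])

    ss_max, df_max = residual_ss(X_max, y)
    ss_hyp, df_hyp = residual_ss(X_hyp, y_hyp)

    df_diff = df_hyp - df_max                    # 2 in this case
    F = ((ss_hyp - ss_max) / df_diff) / (ss_max / df_max)
    print(F, stats.f.ppf(0.95, df_diff, df_max))  # computed F versus tabular F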


Loosely, the rationale of the test is this: given a sample of n observations on a


variable Y, there will be a certain amount of variation among the Y-values (the total
sum of Y-squares is a measure of this variation). When we fit a regression to these
values we are stating that some portion of the variation in Y is associated with the
regression and the remainder represents deviations from that regression (the total
sum of squares is divided into a regression sum of squares and a residual sum of
squares). Because the maximum model is subject to fewer restrictions than the
hypothesis model, it will fit the data better; the sum of squared residuals should (and
will) be smaller. If the hypothesis being tested is true, then the difference in residuals
between the hypothesis model and the maximum model will be no larger than might be
expected by chance. If the F-test indicates that this difference is larger than might
be expected by chance, then the hypothesis is rejected.
Degrees of Freedom
When we select a random sample of n observations on some variable Y, then all n
of the values are free to vary. In statistical terms these observations (or the squares
of the observations) are said to have n degrees of freedom. If we estimate the mean
(Ȳ) of this sample and calculate the deviation of each value from the mean (y = Y - Ȳ),
then since the deviations must sum to zero, only (n-1) of them are free to vary and the
deviations (or squared deviations) are said to have (n-1) degrees of freedom. As shown
previously, estimating the mean is equivalent to fitting the regression Y = β0. If we
fit a straight line to the data, we are imposing two restrictions on the variation in Y
so that the deviations from regression (or the squared deviations) will have n-2
degrees of freedom. A parabola (Y = β0 + β1X + β2X²) imposes three restrictions
on the variations in Y, so the residuals about a parabola would have n-3 degrees of
freedom. Thus, we have the rule that the degrees of freedom for the residuals is
equal to the number of observations minus the number of independently estimated
coefficients.
Just as we partitioned the total sum of squares into a portion due to regression and
a portion for deviations from regression, we can think of partitioning the total degrees
of freedom into a part associated with the regression and a part associated with
deviations from regression. If we fit a model with k independently estimated
coefficients, we associate k degrees of freedom with the regression or reduction sum
of squares, and n-k degrees of freedom with the residual sum of squares.
It might be mentioned that if we fit a model with n coefficients, the residuals will
have n-n = 0 degrees of freedom. This is equivalent to saying that the residuals have
no freedom to vary--that the model accounts for all the variation in Y; and, it will

turn out that every point will lie on the regression surface. The sum of squared
residuals will be zero. This will be true regardless of the independent variables used
in the model. This is a statistical form of the geometrical fact that two points may
define a straight line, three points may define a plane in three-dimensional space,
and n points define a "hyperplane" in n-dimensional space.
Problem IX - Test of the Hypothesis that β1 + β2 = 1

Whenever the hypothesis specifies that a coefficient or some linear function of the
coefficients has a value other than zero, the basic test procedure must be used. To
illustrate the test of this non-zero hypothesis we will assume that we have fitted the
model (maximum model)

to the data of Problem I.


The normal equations for fitting this model are:

or, substituting numerical values from Problem I,

The solutions are:


The total sum of squares is

The reduction or regression sum of squares is


Reduction

Since three coefficients were fitted, the reduction sum of squares has 3 degrees of
freedom.

The residual sum of squares can now be obtained as

Residual = Total - Reduction

The next step is to fit the hypothesis model. Under the hypothesis that β1 + β2 = 1
(or β1 = 1 - β2) the model becomes

or

This can be rewritten

where:


The normal equations for fitting this model are

At this stage individual values could be found for the new variables X'1 and Y', and
these could be used to compute the sums, and sums of squares and products needed.
However, with a little algebraic manipulation, we can save ourselves a lot of work.
Thus,

Substituting in the normal equations, we have

The solutions are


The reduction sum of squares is

Reduction = 3986.0545, with 2 degrees of freedom.

Then, since the total sum of squares for Y' is 4091, the residual sum of squares will be

Residual = Total - Reduction
         = 4091 - 3986.0545
         = 104.9455, with 13 - 2 = 11 df.
With these values we can now summarize the analysis of variance and F-test.

Source                          df     Sum of squares     Mean square
Residual - Hypothesis Model     11        104.9455
Residual - Maximum Model        10        101.7204          10.17204
Difference                       1          3.2251           3.2251

F = 3.2251/10.17204 = 0.32; the hypothesis would not be rejected at the .05 level.

Problem X - Test of the Hypothesis that β2 = 0

One of the most common situations in regression is to test whether the dependent
variable (Y) is significantly related to a particular independent variable. We might
want to test this hypothesis when the variable is fitted alone, in which instance (if the
variable is X2) we might fit

and test the hypothesis that β2 = 0.

Or, we might want to test the same hypothesis when the variable has been fitted in
the presence of one or more other independent variables. We could, for example, fit

and test the hypothesis that β2 = 0.

The latter situation will be illustrated with the data of Problem I. If we work with
uncorrected sums of squares and products, the normal equations for fitting the
maximum model are:

and the solutions are

Thus, the reduction sum of squares is


Reduction = 5948.5182, with 4 df.
The total sum of squares (uncorrected) for Y is

Total = 6046

so the residual sum of squares for the maximum model is

Residual = Total - Reduction
         = 6046 - 5948.5182
         = 97.4818, with 13 - 4 = 9 df.
Under the hypothesis that β2 = 0, the model becomes

for which the normal equations are

giving the solutions

The reduction sum of squares will then be


Reduction

At this point we depart slightly from the basic procedure. Ordinarily, the residuals
for the hypothesis model would next be computed and then the difference in residuals
between the maximum and hypothesis models would be obtained. But, where the
hypothesis results in no change in the Y-variable, the difference between the residuals
is the same as the difference between the reductions for the two models.
Difference in residuals = Hypothesis model residuals - Maximum model residuals.


So, we can set up the analysis of variance in the following form:

Source                               df     Sum of squares     Mean square
Maximum model reduction               4        5948.5182
Hypothesis model reduction            3        5335.4130
......................................................................
Difference for testing hypothesis     1         613.1052         613.1052
Residual about maximum model          9          97.4818          10.8313
Total (uncorrected)                  13        6046

F = 613.1052/10.8313 = 56.60

As F exceeds the tabular value at the .01 level, the hypothesis would be rejected at
this level. We say that X2 makes a significant reduction (in the residuals) when fitted
after X1 and X3.
Problem XI - Working With Corrected Sums of Squares and Products
It has been shown that if the model contains a constant term (β0), the fitting may be
accomplished with less effort by working with the corrected sums of squares and
products. That is, instead of fitting

we can fit

When this is done, it must be remembered that y = Y - Ȳ is the deviation of Y
from its mean and that with n observations only n-1 of the deviations are free to vary
(since the deviations must sum to zero). Thus, the total sum of squares (corrected
sum of squares) will have only n-1 df's. Also, in the maximum model we now estimate
three rather than four coefficients, so the reduction will have only three degrees of
freedom.


Despite these changes, the test will lead (as it should) to exactly the same conclusion
that was obtained in working with uncorrected sums of squares and products.
Thus, using the data of Problem I and testing the same hypothesis that was tested
in Problem X, the normal equations for the maximum model are:

The solutions are β̂1 = 2.7288, β̂2 = -1.9218, and β̂3 = -0.0975. Therefore the
reduction sum of squares is

Reduction

The total sum of squares (corrected) for y is

Total = 1834

The residual sum of squares for the maximum model is therefore

Residual = Total - Reduction

Under the hypothesis that β2 = 0, the reduced model would become y = β1x1 + β3x3,
for which the normal equations are


The solutions are β̂1 = 2.3990 and β̂3 = -0.2149, so the reduction due to fitting this
model is

Reduction = (2.3990)(448) + (-0.2149)(-226)
          = 1123.3194, with 2 df.

Then in tabular form, the test of the hypothesis is as follows:

Source                               df     Sum of squares     Mean square
Maximum model reduction               3        1736.5182
Hypothesis model reduction            2        1123.3194
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference for testing hypothesis     1         613.1988         613.1988
Residual about maximum model          9          97.4818          10.8313
Total (corrected)                    12        1834.

F = 613.1988/10.8313 = 56.61

To show what variables are involved in the two models, the various sources in the
analysis of variance may be relabeled as follows:

Source
Due to X1, X2, and X3
Due to X1 and X3
- - - - - - - - - - - - - - - -
Gain due to X2 after X1 and X3
Residuals
Total

Problem XII - Test of the Hypothesis that β2 = β3 = 0.

A test of this hypothesis presents no new problems. For the model

we found (in Problem XI)

Reduction         = 1736.5182, with 3 df
Total (corrected) = 1834,      with 12 df
Residual          = 97.4818,   with 9 df

Under the hypothesis that β2 = β3 = 0 the model becomes

for which the normal equation is

giving

Then the reduction sum of squares is

Reduction = 2.4615(448) = 1102.7520, with 1 df.

Putting these values in the analysis of variance table we have

Source                               df     Sum of squares     Mean square
Reduction due to X1, X2, X3           3        1736.5182
Reduction due to X1                   1        1102.7520
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Gain due to X2 and X3 after X1        2         633.7662         316.8831
Residuals                             9          97.4818          10.8313
Total                                12        1834.

Problem XIII - Test of the Hypothesis That β1 + 2β2 = 0


In Problem IX we tested the hypothesis that β1 + β2 = 1. We had to use the basic
procedure for this test because the maximum model and hypothesis model did not
have the same Y-values. For a zero hypothesis (e.g., β1 + 2β2 = 0) some of the
X-values may be changed, but the Y-values are unaffected and we may use the
simpler procedure shown in Problems XI and XII.
Thus in Problem XI we fitted

y = β1x1 + β2x2 + β3x3
and found
Reduction
Total (corrected)
Residual

= 1736.5182,
=

with 3 df

1834,

with 1 2 df
with 9 df.

= 97.4818,

Under the hypothesis that β1 + 2β2 = 0 (or β1 = -2β2), the model can be written

or
where:
The normal equations for this model are:
Coefficient

Equation

Again, we could compute each value of the new variable separately and then get the
sums of squares and products involving it from these individual values. It will be
easier, however, to make use of the sums of squares and products that were computed
in fitting the maximum model.


Thus,

So, the normal equations for the hypothesis model are:

The solutions to the normal equations are

so that the reduction sum of squares is


Reduction
The analysis of variance is as follows:

Source                          df     Sum of squares     Mean square
Maximum model reduction          3        1736.5182
Hypothesis model reduction       2        1688.3534
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference                       1          48.1648          48.1648
Residuals                        9          97.4818          10.8313
Total                           12        1834

F = 48.1648/10.8313 = 4.45; not significant at the .05 level.

Problem XIV - Hypothesis Testing in a Weighted Regression


The primary difference in the test procedure for a weighted regression is the use of
a weighted total sum of squares in place of the unweighted (or equally weighted) sum
of squares.
In Problem VII we fitted the model

giving each observation a weight inversely proportional to the value of X1. The
normal equations were

and the solutions.

This gave as the reduction sum of squares

Reduction = 471.998, with 2 df.

The weighted sum of squares for Y was

Total = 553.741

so the residual sum of squares would be

Residual = 553.741 - 471.998 = 81.743, with 11 df.
Then to test the hypothesis that β1 = 0, we would fit (by a weighted regression) the
model
for which the normal equation is

or


This model gives a reduction of 317.156 with 1 df, so the test of the hypothesis is:

Source                               df     Sum of squares     Mean square
Reduction due to maximum model        2         471.998
Reduction due to hypothesis model     1         317.156
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Difference for testing hypothesis     1         154.842          154.842
Residual                             11          81.743            7.431
Total                                13         553.741

F = 154.842/7.431 = 20.84; significant at the 0.01 level.


The hypothesis that β1 = 0 would be rejected at the 0.01 level.
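A weighted fitting of this kind can be sketched as follows (Python with NumPy; the
observations are hypothetical, not the data of Problem VII). Each term in the normal
equations and in the total sum of squares carries the weight w = 1/X1:

    import numpy as np

    # Hypothetical observations; each is weighted by 1/X1.
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y  = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
    w  = 1.0 / x1

    X = np.column_stack([np.ones_like(x1), x1])   # model: b0 + b1*X1

    # Weighted normal equations:  (X'WX) b = X'Wy
    A = X.T @ (w[:, None] * X)
    R = X.T @ (w * y)
    beta = np.linalg.solve(A, R)
    print(beta)

    # Weighted total, reduction, and residual sums of squares.
    total = float(w @ (y * y))
    reduction = float(beta @ R)
    residual = total - reduction
    print(total, reduction, residual)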

An Alternative Way to Compute The Gain Due to a Set of X Variables


To test the hypothesis that one or more of the βj = 0, the difference in reduction
sum of squares between the maximum model and the hypothesis model must be
obtained. We have done this by first solving the normal equations and computing the
reductions for each model and then taking the difference. If each model has several
independent variables, this can be a very laborious process.
When the c-multipliers have been computed for the maximum model, there is a
method of obtaining the difference in reduction between the two models that is in some
cases less work. If we have fitted the regression of Y on X1, X2, X3, X4, and X5 and
we want to test X4 and X5 in the presence of X1, X2, and X3, that is, the hypothesis
that β4 = β5 = 0, then the difference in reduction sum of squares between the maximum
model (Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + β5X5) and the hypothesis model
(Y = β0 + β1X1 + β2X2 + β3X3) can be obtained by the matrix multiplication:

Difference in Reduction
That is, we take the portion of the inverse associated with the variables to be tested,
invert it, and then pre- and post-multiply by a matrix of the coefficients being tested.
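A sketch of this calculation in Python with NumPy is given below. The matrix of sums
of squares and products and the right-hand sides are hypothetical; the sketch simply
shows the pattern of taking the portion of the inverse for the tested coefficients,
inverting it, and pre- and post-multiplying by those coefficients, and then checks the
result against the difference obtained by fitting both models:

    import numpy as np

    # Hypothetical corrected sums of squares and products for X1 ... X5
    # and right-hand sides of the normal equations.
    A = np.array([[10.0, 2.0, 1.0, 0.5, 0.2],
                  [ 2.0, 8.0, 1.5, 0.3, 0.1],
                  [ 1.0, 1.5, 9.0, 0.4, 0.6],
                  [ 0.5, 0.3, 0.4, 7.0, 1.2],
                  [ 0.2, 0.1, 0.6, 1.2, 6.0]])
    R = np.array([5.0, 3.0, 4.0, 2.0, 1.0])

    C = np.linalg.inv(A)           # the c-multipliers
    beta = C @ R                   # coefficients of the maximum model

    test = [3, 4]                  # positions of b4 and b5 (zero-based)
    C_sub = C[np.ix_(test, test)]  # portion of the inverse for the tested coefficients
    b_sub = beta[test]

    gain = b_sub @ np.linalg.inv(C_sub) @ b_sub
    print(gain)                    # difference in reduction due to X4 and X5

    # Check: the same difference from fitting both models directly.
    full = R @ beta
    keep = [0, 1, 2]
    beta_hyp = np.linalg.solve(A[np.ix_(keep, keep)], R[keep])
    print(full - R[keep] @ beta_hyp)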

Two examples that illustrate the method will be given. In Problem XII we fitted
the model

and tested the hypothesis that β2 = β3 = 0. We found β̂2 = -1.9218 and β̂3 = -0.0975,
and the gain due to X2 and X3 after X1 was found to be 633.7662.


To compute this gain using the c-multipliers, we must first find the inverse of the
matrix of sums of squares and products

The inverse is

The portion of the inverse associated with X2 and X3 is

The inverse of this is


Hence, the gain due to X2 and X3 after X1 is

= 633.7408 (as before, except for rounding errors).

In Problem XI we tested the hypothesis that β2 = 0. We found β̂2 = -1.9218, and
the gain due to X2 after X1 and X3 was 613.1988. By the present method, the gain
could be computed as

Whether or not this method saves any time or labor will depend on the number of
variables being tested, the number of variables in the hypothesis model, and the
individual's facility at solving simultaneous equations and inverting matrices. If only
one of the variables is being tested (as in the last example), the t-test as described
in the next chapter will usually be the easiest. If the hypothesis model involves only
one or two variables, or if several variables are to be tested, it may be easiest to fit
each model and find the reduction due to each rather than work with the c-multipliers.
For the beginner, the best method will usually be the one with which he is most
familiar.

THE t-TEST

Many regression hypotheses can be tested by means of the t-distribution (table 7,
Appendix E). Setting up these tests requires the following bits of statistical knowledge:

1. The coefficients of a fitted regression are sample estimates which, like all sample
estimates, are subject to sampling variation. The statistical measure of the variation
of a variable is the variance, and the measure of the association in the variation of
two variables is the covariance.

2. The variance of an estimated regression coefficient is

where:

cjj is an element of the inverse of the matrix of coefficients of the normal equations.

The covariance of two coefficients estimated from the same set of normal equations is

and these quantities also give the variance of a linear function of several estimated
coefficients (for example, a1β̂1 + a2β̂2).

3. The general equation for the t test is

where:

= The estimated value of some function of normally distributed variables.
= The true or hypothesized value of the function.
= The variance of the sample estimate.

The t computed by this equation will have degrees of freedom equal to those of the
mean square used in the denominator.
Putting these three items together, we could, for example, test the hypothesis that
β1 + 2β2 = 8 by


Or, we could test a more familiar hypothesis such as β2 = 0 by

Since this last is the hypothesis that was tested in Problem XI, the t-test can be
illustrated with the same data. In that example, we had β̂2 = -1.9218, and the residual
mean square with 9 degrees of freedom was Residual Mean Square = 10.8313.

The matrix of coefficients from the normal equations was

and the inverse would be

From the inverse, we find c22 = .006 022 9, so that the t-test of the hypothesis
that β2 = 0 is

If the absolute value (algebraic sign ignored) of t exceeds the tabular (table 7,
Appendix E) value for 9 df at the desired probability level, then the hypothesis is
rejected. In this case tabular t = 2.262 (at the .05 level), so we would reject the
hypothesis that β2 = 0.
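The t just described can be reproduced with the values quoted above (β̂2 = -1.9218,
c22 = .006 022 9, and a residual mean square of 10.8313 with 9 df); a short Python
sketch, with scipy.stats supplying the tabular t:

    from math import sqrt
    from scipy import stats

    b2 = -1.9218            # estimated coefficient (Problem XI)
    c22 = 0.0060229         # c-multiplier from the inverse matrix
    ms_resid = 10.8313      # residual mean square, 9 df
    df = 9

    t = (b2 - 0.0) / sqrt(c22 * ms_resid)   # test of the hypothesis beta2 = 0
    t_tab = stats.t.ppf(0.975, df)          # two-sided .05 level

    print(round(t, 3), round(t_tab, 3))     # about -7.5 versus 2.262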
The F-test of this hypothesis also leads to a rejection. Those who are not familiar
with the relationship between the t and F distributions sometimes ask which is

the best test. The answer is that where both are applicable, the easiest one is best
because essentially there is no difference between them. If a given set of data is
used to test some hypothesis by both the t and F tests, it will be found that F with 1
and k degrees of freedom is equal to the square of t with k degrees of freedom. In the
example given above, we found t = -7.525 so t² = 56.63. The F value for testing the
same hypothesis was 56.61; except for rounding errors, they should be identical.
Problem XV - Test of a Non-Zero Hypothesis
The main advantage of the t-test over the analysis of variance is in the test of a
non-zero hypothesis. It will be recalled that an F-test of this type of hypothesis
required the computation of separate total and residual sums of squares for both the
maximum and the hypothesis model.
the t-test, a non-zero hypothesis is
handled as easily as any other. This can be illustrated using the data in Problem XI
for a test of the hypothesis that
t
= 1.
12
^
^
In that problem we had 1 = 2.7288 2 = -1.9218, and Residual Mean Square =
10.8313 with 9 df. As we have seen, the inverse matrix is:

Therefore,

The absolute value of t is less than the tabular value (2.262) at the .05 level, so
the hypothesis would not be rejected.
It should be noted that this is a test of the hypothesis that β1 + β2 = 1, when fitted
in the model

This is not the same as the hypothesis that was tested in Problem IX. In that problem
β1 + β2 = 1 was tested when fitted in the model

CONFIDENCE LIMITS

General
If you have ever asked for estimates on the cost of repairing a car or a TV set,
you are probably well aware of the fact that there are good and there are bad
estimates. Sample-based regression estimates can also be good or bad, and it is
important to provide some indication of just how good or bad they might be.
The variation that may be encountered in fitting sample regressions can be
illustrated by five separate samples of 10 units each, selected at random from a
population in which Y was known to have no relationship to the X variable. The simple
linear regressions that resulted from the five samples were as follows:


The sample regressions have been plotted in figure 9. The heavy horizontal line
represents the mean value of Y for the population.

Figure 9.--Plotting of 5 sample linear regressions.


These sample regressions illustrate two points that should never be forgotten by
those who work with regression. The first is that the fitting procedure can be applied
to any set of data. Put in the numbers, turn the crank, and out will come an equation
that expresses one variable in terms of one or more other variables. But no matter
how hard or how many times the crank is turned, it is impossible to induce relationships
that did not exist to start with. It may all look very scientific, but the mere
existence of an equation with coefficients computed to eight decimal places on a
$3 million computer does not prove that there is a relationship.
The second point is that sample estimates are subject to variation. The variation in
these regressions may be quite startling to those who have had little experience with
the behavior of sample estimates. These results should not, however, be allowedto
shatter the beginners hopes for regression analysis. Ten units from this population
is far too light a sample for fitting even a simple linear regression and the erratic
results are no more than might be expected.


For properly designed sampling and estimating procedures, it is possible to compute


statistical confidence limits--values which will bracket the thing being estimated a
specifiable percentage of the time. If we compute 95-percent confidence limits for
the mean, these limits will include the population mean unless a one-in-twenty chance
has occurred in sampling. That is, about one time in 20 we will get a poor sample,
and the confidence limits computed from that sample will fail to include the mean.
We have no way of knowing which is the bad sample, but we can say that over the long
run, only one time in 20 will our 95-percent confidence limits fail to include the mean.
Similar confidence limits can be computed for regression coefficients and for
regression
predictions. For any estimate that can be assumed to follow the normal
distribution, the general equation for the confidence limits is:

where:

The thing being estimated.


A sample-based estimate of
The value of t for the desired probability level (Table 7, Appendix E).

Applying this to an estimated regression coefficient, we would have

The t would have degrees of freedom equal to those for the residual mean square.
In the section on the analysis of variance, we fitted a regression of Y on X1, X2,
and X3 (Problem XI). The estimated value of β2 was β̂2 = -1.9218. The residual
mean square was 10.8313 with 9 degrees of freedom, and the c-multiplier was
c22 = 0.006 022 9 (Problem XV). The 95-percent confidence limits for β2 would
then be given by

or

The confidence limits can be used as a test of some hypothesized value of the
coefficient. Since these limits do not include zero as a possible value, we would
reject the hypothesis that β2 = 0. This is the same conclusion that we reached by the

F and t tests. The confidence limit approach is usually more informative than F or
t tests of a hypothesis.
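The same quantities give the 95-percent limits for β2 directly; a short Python sketch:

    from math import sqrt
    from scipy import stats

    b2 = -1.9218            # estimated coefficient (Problem XI)
    c22 = 0.0060229         # c-multiplier
    ms_resid = 10.8313      # residual mean square, 9 df
    df = 9

    half_width = stats.t.ppf(0.975, df) * sqrt(c22 * ms_resid)
    print(b2 - half_width, b2 + half_width)   # limits that do not include zero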
If we wish to place confidence limits on a linear function of the regression
coefficients, we must remember the rule for the variance of a linear function. This
rule was given previously, but will be repeated here.

If β̂1, β̂2, . . ., β̂k are a set of estimated regression coefficients, then the variance
of the linear function (a1β̂1 + a2β̂2 + . . . + akβ̂k) will be estimated by

Or, in more abbreviated form,

where:

This is nowhere near as difficult as it appears. For example, the variance of the
function 2β̂1 - 3β̂2 would be

or

Then if we are given a linear function of the regression coefficients, the confidence
limits on this function will be

If

then the confidence limits would be

where t has degrees of freedom equal to the df for the residual mean square and is
selected for the desired level of confidence.
Confidence Limits on Ŷ (Predicted Mean Y)

The preceding rule may be applied to the problem of determining confidence limits
for Ŷ (a predicted value of mean Y for a given set of values for the X variables). If
the predicted value of mean Y is

then the confidence limits can be obtained by treating the specified values of the X's
as constants and applying the preceding rule. Thus,

Confidence Limits

An abbreviated way of writing this is

Confidence Limits

where Xi or Xj = 1, if i or j = 0
This again looks more difficult than it is. Thus, for the simple linear regression

the confidence limits are

If the fitted regression is of the form

(that is, no constant term), then the confidence limits would be

Note that in this case where the fitted model has no constant term, we will have no
c-multipliers with a zero in the subscript.
All of the above equations would apply for weighted as well as unweighted
regressions.

The reader who has always used corrected sums of squares and products in
regression work may find that the confidence limit equations given are not quite the
same as those with which he is familiar. They will, however, give exactly the same
confidence limits. The somewhat more familiar equation (for unweighted regression)
for the confidence limits is

In the following problem, we will see that these equations lead to the same result.
Problem XVI - Confidence Limits In Multiple Regression
In Problem I, we fitted a regression of Y as a linear function of X1, X2, and X3.
The fitted equation was

The same result was obtained whether we used corrected or uncorrected sums of
squares and products. The residual mean square (Problem XI) was 10.8313 with
9 degrees of freedom.
Suppose now that we predicted the mean value of Y associated with the values
X1 = 6, X2 = 8, and X3 = 12. We would have


The method of computing the confidence interval for this estimate will depend on
whether corrected or uncorrected sums of squares and products were used in the
fitting. We will assume first that the fitting was done with uncorrected terms. In
this instance the normal equations were

The matrix of sums and uncorrected sums of squares and products is

The inverse is

The equation for the confidence limits in this case will be


or, for 95 percent confidence limits,

Thus, unless a 1 in 20 chance occurred in sampling, we can say that the true mean
of Y associated with X1 = 6, X2 = 8, and X3 = 12 is somewhere between 4.0948 and

Note that this does not imply that individual values of Y will be found between
these limits. This is a confidence interval on regression Y, which is the mean value of
Y associated with a specified combination of X values.
Suppose now that in fitting this equation, we had used the corrected sums of squares
and products so that the normal equations were

The inverse of the matrix

is


Note that this is the same as the inverse of the matrix of uncorrected sums of
squares and products with the first row and column deleted.
The confidence limits can be computed by the equation

Limits on the mean value of Y associated with X1 = 6, X2 = 8, and X3 = 12 would be:

Problem XVII - Confidence Limits On a Simple Linear Regression


If we fit a simple linear regression (Y = β0 + β1X1) using uncorrected sums of
squares and products, the equation for the confidence limits will be

as may be determined from the general formula,


If corrected sums of squares and products are used in the fitting, the equation for
the confidence limits boils down to


The normal equation for the simple linear regression fitted with corrected sums of
squares and products is

and the inverse of the (1x1) matrix

is simply

so this equation can also be written

which is the form that appears in many textbooks.


Calculation of the limits can be illustrated with the simple linear regression fitted
in Problem III. The equation was

Ŷ = -4.1535 + 2.4615X1

We had n = 13, X̄1 = 9, and the residual mean square is 66.4771 with 11 degrees of
freedom. The normal equation was

or

For a value of X1 = 7, we would have

and the 95 percent confidence limits would be


or

If regression Y and the confidence limits are computed for several values of X,

the confidence limits can be displayed graphically. In the above example, we would

have

In figure 10 these points have been plotted and connected by smooth curves.

Figure 10.--A linear regression with 95-percent confidence limits.


Confidence Limits on Individual Values of Y


It was mentioned and should be emphasized that the limits previously discussed
are limits for the regression line (mean value of Y for a specified X), not limits on
individual values of Y. Often, however, having estimated a value of Y by means of a
regression equation, we would like to have some idea as to the limits which might
include most of the individual Y values. These limits can be obtained by adding one
times the residual mean square to the term under the radical in the equations given
for the limits on regression Y.
This would make the general formula for the limits on an individual value of Y

The formula that can be used when the corrected sums of squares and products
have been used in the fitting would be

For the simple linear regression, this last formula reduces to
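Whatever its algebraic form, the computation for the simple linear regression of
Problem III can be sketched as follows (Python), giving both the limits on the
regression line and the limits intended to cover an individual value of Y. The
corrected sum of squares for X1 is taken as 182; that value is inferred from the
Problem XII computation (β̂1 = 2.4615 with Σx1y = 448 implies Σx1² of about 182)
and should be regarded as an assumption here:

    from math import sqrt
    from scipy import stats

    # Values quoted in the text for the regression of Problem III.
    b0, b1 = -4.1535, 2.4615
    n = 13
    x_bar = 9.0
    ms_resid = 66.4771          # residual mean square, 11 df
    df = 11
    sxx = 182.0                 # corrected sum of squares for X1 (inferred; see lead-in)

    x = 7.0
    y_hat = b0 + b1 * x
    t = stats.t.ppf(0.975, df)

    # Limits on the regression line (mean Y at X1 = 7).
    se_mean = sqrt(ms_resid * (1.0 / n + (x - x_bar) ** 2 / sxx))
    print(y_hat - t * se_mean, y_hat + t * se_mean)

    # Limits for an individual value of Y: add one times the residual
    # mean square under the radical.
    se_indiv = sqrt(ms_resid * (1.0 + 1.0 / n + (x - x_bar) ** 2 / sxx))
    print(y_hat - t * se_indiv, y_hat + t * se_indiv)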

COVARIANCE ANALYSIS

It frequently happens that the unit observations to be analyzed can be classified into
two or more groups. A set of tree heights and diameters might, for example, be
grouped according to tree species. This raises the question of whether separate
prediction equations should be used for each group or whether some or all of the
groups could be represented by a single equation. Covariance analysis provides a
means of answering this question.
In the case of simple linear equations, group regressions may differ either because
they have different slopes or, if the slopes are the same, because they differ in level
(fig. 11).

Figure 11.--Variation among linear regressions.


The standard covariance analysis first tests the hypothesis of no difference in slope.
Then if there is no evidence of a difference in slopes, the hypothesis of no difference
in levels is tested. If no significant difference is found in either the slopes or levels,
then a single regression may be fitted ignoring the difference in groups.
The following set of data will be used in the problems illustrating the analysis of
covariance.
             Group A                Group B                Group C
           Y       X1            Y       X1            Y       X1
          5.9     0.8           5.2     1.6           7.8     0.6
         10.7     3.1          13.4     5.8          12.4     3.4
         11.4     4.4          10.0     3.6          10.9     1.5
          9.6     1.6           7.5     2.0           9.9     0.7
         12.6     4.6          10.1     4.3          16.8     4.5
          8.0     2.6          11.9     5.8          13.9     4.1
         12.8     5.5          10.7     4.8          11.4     2.3
          7.5     1.1           6.8     3.3           8.9     1.3
         12.5     3.9           9.0     2.6          13.7     3.1
         14.2     4.9                                16.0     4.6
          8.4     1.4
Sums    113.6    33.9          84.6    33.8         121.7    26.1
Means    10.3273  3.0818        9.4     3.7556       12.17    2.61

n            11                    9                    10
ΣY²        1242.72               848.20              1559.73
ΣX1²        132.73               145.98                89.47
ΣX1Y        390.91               346.69               356.57
Σy²          69.541 819           52.96                78.641
Σx1²         28.256 364           19.042 222           21.349
Σx1y         40.815 455           28.97                38.933

Pooled values (ignoring groups):

n = 30; ΣY = 319.9; ΣX1 = 93.8; ΣY² = 3650.65; Σy² = 239.449 667;
ΣX1Y = 1094.17; Σx1y = 93.949 334; ΣX1² = 368.18; Σx1² = 74.898 667.

Using corrected sums of squares and products, the normal equation for a linear
regression with constant term is:

If a separate regression were fitted for each group, we would have:


Group A:   28.256 364 β̂1 = 40.815 455
           β̂1 = 1.444 469
           Reduction = (1.444 469)(40.815 455) = 58.956 659, with 1 df
           Residual = Σy² - Reduction = 69.541 819 - 58.956 659 = 10.585 160, with 9 df.

Group B:   19.042 222 β̂1 = 28.97
           β̂1 = 1.521 356
           Reduction = (1.521 356)(28.97) = 44.073 683, with 1 df
           Residual = 52.96 - 44.073 683 = 8.886 317, with 7 df.

Group C:   21.349 β̂1 = 38.933
           β̂1 = 1.823 645
           Reduction = (1.823 645)(38.933) = 70.999 971, with 1 df
           Residual = 78.641 - 70.999 971 = 7.641 029, with 8 df.

If a single regression were fitted (ignoring groups), we would have:

           74.898 667 β̂1 = 93.949 334
           β̂1 = 1.254 353
           Reduction = (1.254 353)(93.949 334) = 117.845 629, with 1 df
           Residual = 239.449 667 - 117.845 629 = 121.604 038, with 28 df.
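The same arithmetic can be written as a short Python sketch using the corrected
within-group sums listed in the table:

    # Corrected within-group sums from the table: (sum x^2, sum xy, sum y^2, n).
    groups = {
        "A": (28.256364, 40.815455, 69.541819, 11),
        "B": (19.042222, 28.970000, 52.960000,  9),
        "C": (21.349000, 38.933000, 78.641000, 10),
    }

    for name, (sxx, sxy, syy, n) in groups.items():
        b1 = sxy / sxx                 # slope from the single normal equation
        reduction = b1 * sxy           # reduction sum of squares, 1 df
        residual = syy - reduction     # residual sum of squares, n - 2 df
        print(name, round(b1, 6), round(residual, 6), n - 2)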

Two approaches to the analysis of covariance will be illustrated. The first method
is that given by Snedecor (7), while the second is a general method involving the
introduction of dummy variables.
Problem XVIII - Covariance Analysis
Snedecor (7) presents the analysis of covariance in a very neat form. The steps in
this procedure are summarized in table 1.
Table 1.--Analysis of covariance

For test of difference in slopes:

not significant at
0.05 level.

For test of levels (assuming common slopes):

The first three lines in this table summarize the results of the fitting of separate
linear regressions for each group. In line 4, the residuals about the separate
regressions and the associated degrees of freedom are pooled. This pooled term can
be thought of as the sum of squared residuals about the maximum model; it represents
the smallest sum of squares that can be obtained by fitting straight lines to these
observations.

Skipping to line 6 for the moment, the first four columns are the pooled degrees of
freedom and corrected sums of squares and products for the groups. The last three
columns summarize the result of using the pooled sums of squares and products to
fit a straight line. The normal equation and solution for this fitting would be:
68.647 586 β̂1 = 108.718 455
β̂1 = 1.583 719

Thus the reduction sum of squares with 1 degree of freedom is

Reduction = 1.583 719(108.718 455) = 172.179 483

And the residual sum of squares is

Residual = Σy² - Reduction
         = 201.142 819 - 172.179 483
         = 28.963 336, with 26 df.
This represents the residual that we would get by forcing the regressions for all
groups to have the same slope even though they were at different levels. Since this
is a more restrictive model, the residual sum of squares will be larger than that
obtained by letting each group regression have its own slope. The mean square
difference in these residuals (line 5) can be used to test the hypothesis of common
slopes. The error term for this test is the mean square of the pooled residuals for
separate regressions (line 4). The F test gives no indication that the hypothesis of
common slopes should be rejected. If the hypothesis of common slopes is rejected,
we would usually go no further.
Having shown no significant difference in slopes, the next question would be
whether the regressions differ in level. Under the hypothesis of no difference in
levels (or slopes) we would, in effect, ignore the groups and use all of the data to
fit a single regression. The results are summarized in line 8. Because of the added
restriction (common levels) that has been imposed on this regression, the residuals
will be larger than those obtained where we let the group regressions assume
separate levels but force them to have a common slope (line 6). The mean square
difference (line 7) provides a test of the hypothesis of common levels. The error for
this test is the residual mean square for the model assuming common slopes
(line 6).


The significant value of F suggests that the group regressions are different. The
difference is mostly due to a difference in levels. There is no evidence of a real
difference in slopes.
If we are not interested in finding out whether the difference (if any) in the group
regressions is in the slopes or the levels, an overall test could be made using the
difference between lines 8 and 4. We would have:

It is possible to test more complex hypotheses than these. We could, for example,
test for difference in slope and level between groups A and C, or for the average of
groups A and B versus group C. We could also deal with multiple or curvilinear
regressions and test for differences between specified coefficients or sets of
coefficients. It is probably safe to say that readers who have sufficient understanding
of regression to derive meaningful interpretations of such tests will usually know
how to make them.
Covariance Analysis With Dummy Variables

In fitting a regression within a single group, some workers are accustomed to


dealing with the model

where X0 is a dummy variable, defined to be equal to 1 for all observations. The
normal equations would, of course, be

Since X0 is equal to 1 for all observations, the normal equations are equivalent to:


So, the end result will be the same as that given by the methods previously described.
The idea of a dummy variable comes in quite handy in dealing with the problem of
group regressions. There are several ways of applying the idea, but the most easily
understood is to introduce a dummy variable for each group. The dummy variable
would be defined as equal to 1 for every observation in that particular group, and
equal to zero for any observation that is in a different group. As we are interested in
linear regressions of Y on X1, the dummy variables could (in the case of three
groups) be labeled X2, X3, and X4, where

X2 = 1 for any observation falling in group A.
   = 0 for an observation falling in any other group.

X3 = 1 for any observation falling in group B.
   = 0 for an observation falling in any other group.

X4 = 1 for any observation falling in group C.
   = 0 for an observation falling in any other group.

It is also necessary to introduce three variables representing the interactions
between the measured variable (X1) and the three dummy variables. These could be
labeled X5, X6, and X7, where

With these variables, we can now express the idea of separate linear regressions
for each group by the general linear model

After solving for the coefficients, the regression for any group can be obtained by
assigning the appropriate value to each dummy variable. Thus for Group A, X2 = 1,
X3 = 0, and X4 = 0. So we have:

The equation will be exactly the same as that we would get by fitting a simple linear
regression of Y on X1 for the observations in group A only.

Under the hypothesis that the three groups have regressions that differ in level
but not in slope, the model would be

In this model, β1 is the common slope, while β2, β3, and β4 represent the different
levels.
The difference in reduction sum of squares for these two models could then be
used in a test of the hypothesis of common slopes.
Under the hypothesis that there is no difference in either slope or level, the model
becomes
The difference in reduction between this model and the model assuming common
slopes can be used to test the hypothesis that there is no difference in levels.
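These three models can be fitted directly from the data of the covariance table, as in
the sketch below (Python with NumPy). Here np.linalg.lstsq stands in for solving the
normal equations by hand, and the residual sums of squares and the two F ratios
correspond to the tests just described. One Group C observation (9.9) was recovered
from the published group totals and should be regarded as an assumption:

    import numpy as np

    # Observations from the covariance-analysis table (groups A, B, C).
    # The Group C value 9.9 is inferred from the group sums.
    y = np.array([5.9, 10.7, 11.4, 9.6, 12.6, 8.0, 12.8, 7.5, 12.5, 14.2, 8.4,
                  5.2, 13.4, 10.0, 7.5, 10.1, 11.9, 10.7, 6.8, 9.0,
                  7.8, 12.4, 10.9, 9.9, 16.8, 13.9, 11.4, 8.9, 13.7, 16.0])
    x1 = np.array([0.8, 3.1, 4.4, 1.6, 4.6, 2.6, 5.5, 1.1, 3.9, 4.9, 1.4,
                   1.6, 5.8, 3.6, 2.0, 4.3, 5.8, 4.8, 3.3, 2.6,
                   0.6, 3.4, 1.5, 0.7, 4.5, 4.1, 2.3, 1.3, 3.1, 4.6])
    group = np.array(["A"] * 11 + ["B"] * 9 + ["C"] * 10)

    # Dummy variables X2, X3, X4 and the interactions X5, X6, X7.
    d = np.column_stack([(group == g).astype(float) for g in ("A", "B", "C")])
    inter = d * x1[:, None]

    def resid_ss(X):
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)

    ss_separate = resid_ss(np.hstack([d, inter]))              # separate slopes and levels
    ss_levels   = resid_ss(np.hstack([d, x1[:, None]]))        # common slope, separate levels
    ss_single   = resid_ss(np.column_stack([np.ones(30), x1])) # one regression for all

    print(ss_separate, ss_levels, ss_single)   # about 27.11, 28.96, 121.60

    F_slopes = ((ss_levels - ss_separate) / 2) / (ss_separate / 24)
    F_levels = ((ss_single - ss_levels) / 2) / (ss_levels / 26)
    print(F_slopes, F_levels)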
Problem XIX - Covariance Analysis With Dummy Variables
The use of dummy variables for a covariance analysis will lead to exactly the same
result as the method of Snedecor (7). Applying the procedure to the data of the
previous example, the values of the variables would be as follows:


Sums of Squares and Products (Uncorrected):

The model for separate regressions is

and the normal equations would be:


or

The solutions, which are easily obtained by working with pairs of equations involving
the same coefficients, are:

Thus, the separate regressions would be

These are the same as the equations that would have been obtained by fitting separate
regressions in each group.
The reduction due to this maximum model would be:
Reduction

This gives a residual sum of squares of


Residual


Under the hypothesis for common slopes but different levels, the model becomes

The normal equations are:

or,

The solutions are:

giving a reduction of
Reduction

Then to test the hypothesis of common slopes we have


as before.
Now to test the hypothesis of no difference in levels (assuming no difference in
slopes), we must fit the model

The normal equations are:

or

The solutions are:

so the reduction sum of squares is


Reduction

Then the test for common levels (assuming common slopes) is

as before.


Although the two procedures will lead to exactly the same results, the computational
routine of Snedecor's procedure (7) is probably easier to follow and, therefore, better
for the beginner. The advantage (if any) of the dummy variable approach might be
that it gives a somewhat clearer picture of the hypotheses being tested. Once the
dummy variable approach has been learned, it may be easier to work out its extension
to the testing of more complex hypotheses.
Dummy variables are also useful in introducing the relationship between regression
analysis and the analysis of variance for various experimental designs. Those
interested in this subject will find a brief discussion in Appendix D.

DISCRIMINANT FUNCTION
Closely related to the methods of regression analysis is the problem of finding a
linear function of one or more variables which will permit us to classify individuals
as belonging to one of two groups. The function is known as a discriminant. In
forestry, it might be used, for example, to find a function of several measurements
which would enable us to assign fire- or insect-damaged trees to one of two classes:
will live or will die.
The methods will be illustrated by the intentionally trivial example of classifying
an individual as male or female by means of the individual's height (X1), weight (X2),
and age (X3). To develop the discriminant, measurements of these three variables
were made on 10 men and 10 women.


The first step is to compute the difference in the group means for each variable and
the corrected sums of squares and products for each group.
Mean differences:

Corrected Within-Group Sums of Squares and Products:

The next step is to compute the pooled variances and covariances. The pooled
variance for Xj will be symbolized by sjj and computed as

where:

nm = number of males
nf = number of females

The pooled covariance of Xj and Xk will be symbolized by sjk and computed as

The computed values of the pooled variances and covariances are

Now what we wish to do is fit a function of the form

such that the value of Y (for measured values of X1, X2, and X3) will enable us to
classify an individual as male or female. Fisher has shown that the bi values can
be determined by solving the normal equations

Entering the calculated values, we have

for which the solutions are

Use and Interpretation of the Discriminant Function


Assuming for the moment that we are satisfied with the fitted discriminant, it
could be used as follows:
(1) Compute the mean value of the discriminant for males and for females


(2) The mean of these two

serves as a criterion for classifying individuals as male or female. Any individual


for whom Y is greater than 25.4388 would be classified as male, and any individual
for whom Y is less than 25.4388 would be classified as female.
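The fitting itself amounts to solving a small set of linear equations in which the
pooled variances and covariances are the coefficients and the mean differences are
the right-hand sides. A Python sketch with hypothetical numbers (the actual height,
weight, and age measurements are not reproduced here, so the values below are
illustrative only):

    import numpy as np

    # Hypothetical pooled variances and covariances (s_jk) and mean differences.
    S = np.array([[ 9.0,  12.0,  1.0],
                  [12.0, 110.0,  5.0],
                  [ 1.0,   5.0, 40.0]])
    d = np.array([4.0, 25.0, 1.5])     # male mean minus female mean, each variable

    b = np.linalg.solve(S, d)          # discriminant coefficients b1, b2, b3
    print(b)

    # Classify by comparing Y = b1*X1 + b2*X2 + b3*X3 with the midpoint of the
    # two group means of Y (hypothetical group means below).
    mean_male   = np.array([176.0, 80.0, 35.0])
    mean_female = np.array([165.0, 62.0, 33.0])
    threshold = (b @ mean_male + b @ mean_female) / 2.0
    print(threshold)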
Testing a Fitted Discriminant
Before using the discriminant function for classification purposes, we should test
its significance. This can be done using the F test with p and (N-p-1) degrees of
freedom.

where:

p = Number of variables fitted.
N1 = Number of observations in the first group.
N2 = Number of observations in the second group.

For the previously fitted discriminant, we have:

Thus,

F = 8.908; significant at the .01 level.


This test tells us that there is a significant difference in the mean values of the
discriminant between males and females. Looking at it another way, we have shown
a significant difference between the two groups using measurements on several
characteristics. This is analogous to the familiar t and F tests where a significant
difference is shown between two groups using only a single variable. In fact, if we
fit and test a discriminant function using just a single variable, for example, weight,
we will get the same F value (29.824) as we would by testing the difference in weight
between male and female using an F test of a completely randomized experimental
design.
Testing the Contribution of Individual Variables or Sets of Variables
To test the contribution of any set of q variables in the presence of some other set
of p variables, first fit a discriminant to the p variables and compute D²p. Then fit
all p + q variables and compute D²p+q. The test of the contribution of the q variables
in the presence of the p variables is:

F (with q and N-p-q-1 degrees of freedom) =

Thus, to test the contribution of weight and age in the presence of height, we first
fit a discriminant function for height alone. The single equation is:

The discriminant for all three variables gave a value of D²3 = 6.0127. The test is:

Hence, weight and age do not make a significant contribution to the discrimination
between male and female when used after height.

The contribution of single variables can be similarly tested. For example, we
could test the contribution of age when fitted after height and weight. To determine
D²2 for the discriminant using height and weight, we must solve the normal equations:

The solutions are:

from which we find D²2 = 5.9660

For all three variables, we found D²3 = 6.0127

Then,

Similar tests of the contribution of weight in the presence of height (significant at


the .05 level) and the contribution of height after weight (not significant) suggest that
weight alone provides about as good a means of discrimination as does the use of all
three variables.
Reliability of Classifications
Using a discriminant function, we will misclassify some individuals. The probability
of a misclassification can be estimated by using K = D/2 as a standard normal
deviate and determining from a table of the cumulative normal distribution (table 8,
Appendix E) the probability of a deviate greater than K.
Using all three variables, the value of D² was 6.0127, giving D = 2.452 and
K = 1.226. The probability of getting a standard normal deviate larger than 1.226
is found to be about P = 0.1101. About 11 percent of the individuals classified using
this function would be assigned to the wrong group. For the data used to develop the
discriminant, it actually turns out that 2 out of 20 (10 percent) would have been
misclassified.

For a discriminant involving height and weight but not age, the probability of a
misclassification would be about 0.11096. For the discriminant involving weight
alone, the probability of a misclassification is about 0.11098. Thus we see (as
previous tests indicated) that our classifications using weight alone would be almost
as reliable as those using weight, height, and age. For a discriminant involving
height alone, the probability of a misclassification is about 0.18, and with a discriminant using age alone, about 0.368 of our classifications would be in error.
Reducing the Probability of a Misclassification
There are two possible procedures for reducing the proportion of misclassifications.
One of these is to look for more or better variables to be used in the discriminant
function. The second possibility is to set up a doubtful region within which no
classification will be made. This requires determining two values, Ym and Yf. All
individuals for which Y is greater than Ym will be classified as male, while all
those for which Y is less than Yf will be classified as female. For values of Y
between Yf and Ym no classification will be made.

To determine Ym and Yf it is first necessary to decide the proportion of
misclassifications we are willing to tolerate. Suppose, for example, we will use our
three-variable discriminant function but we wish to make no more than 5-percent
misclassifications. The procedure is to look in a table of the cumulative normal for
the value exceeded by a standard normal deviate with probability 0.05; this value is
1.645. The appropriate limit values are then computed from it.

For the three-variable discriminant, an individual with a Y value greater than 26.4659
would be classified as male, while an individual with Y less than 24.4116 would be
classified as female. No classification would be given for individuals with a Y value
between 24.4116 and 26.4659.
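Both the misclassification rate and the doubtful-region limits can be verified with a
short sketch. It assumes, as the figures above imply, that the two group means of the
discriminant differ by D², that the within-group standard deviation of the discriminant
is D, and that 1.645 is the normal deviate exceeded with probability 0.05.

    import math

    def upper_tail(z):
        """Probability that a standard normal deviate exceeds z."""
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    d_sq = 6.0127                        # three-variable discriminant
    d = math.sqrt(d_sq)
    midpoint = 25.4388                   # mean of the two group means (from text)

    # Probability of a misclassification under the midpoint rule: P(Z > D/2).
    print(round(upper_tail(d / 2.0), 4))            # about 0.110

    # Doubtful region for no more than 5-percent misclassifications.
    z05 = 1.645                          # from the cumulative normal table
    mean_female = midpoint - d_sq / 2.0  # assumed spacing of the group means
    mean_male = midpoint + d_sq / 2.0
    y_m = mean_female + z05 * d          # classify as male above this value
    y_f = mean_male - z05 * d            # classify as female below this value
    print(round(y_f, 4), round(y_m, 4))             # about 24.41 and 26.47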


Basic Assumptions
The methods described here assume that for each variable, the within-group
variance is the same in both groups. Different variables can, of course, have different
variances. Also, any given pair of variables is assumed to have the same within-group
covariance in each group. All variables must follow (within the group) a multivariate
normal distribution. Since the methods are based on large sample theory, it is
ordinarily desirable to have at least 30 observations in each group.

ELECTRONIC COMPUTERS
The present popularity of regression analysis is due in no small way to the
advances that have been made in electronic computers. The computations involved in
fitting regressions with more than two or three independent variables are quite
tedious, and with a large number of observations, fitting even a simple linear
regression may be an unpleasant task. Also, the possibilities for simple but
devastating arithmetical mistakes are great. Modern electronic computers have
overcome both of these obstacles. They can handle huge masses of raw data and
subject them to numerous mathematical operations in a matter of minutes, and they
seldom make mistakes.
Nearly every phase of regression analysis can be handled by one or more of the
computers and almost every computing center has programs for obtaining sums of
squares and products, fitting multiple regressions, inverting matrices, computing
reduction and residual sums of squares, etc. Despite the high per hour rental on
these computers, the cost of doing a particular regression computation will usually
be a small fraction of what it would cost to do the same job with a desk calculator,
and the work will rarely contain serious errors.
Because of the numerous variations in these programs and the rate at which new
ones are being produced, no attempt will be made to list everything that is available
or to describe the use of such programs. This information can best be obtained by
first learning what regression is, how and why it works, and then discussing your
needs with a computer specialist.
To merely indicate what can be done by a computer, a brief description will be
given of a few of the existing programs.
TV REM is the designation of a regression program for the IBM 704 computer.
It will take up to 586 sets of observations on a Y and up to 9 independent (X) variables
and compute the mean of each variable and the corrected sums of squares and products

for all variables. It will also fit the regressions of Y on all possible linear
combinations of up to nine independent variables (a total of 511 different equations)
and give the reduction sum of squares associated with each fitted equation. The cost of
this may vary from $40 to $200, depending largely on the machine rental rate and
to a lesser extent on the volume of data. This program is described in a publication
by L. R. Grosenbaugh (6).
SS XXR is another program for the IBM 704. It will take up to 999,999 sets of
up to 41 variables and compute their means, all possible uncorrected sums of squares
and products, the corrected sums of squares and products, and the simple correlation
coefficients for all possible pairs of variables. The cost may run from $5 to $50,
depending again on machine rental rates and on the number of observations and
variables.
These give just a faint idea of what is available. Other programs will compute
sums of squares and products and give the inverse matrix for 40 or 50 variables, fit
regressions for as many as 60 independent variables, or fit weighted regressions
and regressions subject to various constraints. One program will follow what is known
as a stepwise fitting procedure (see Appendix A, Method III), in which the best single
independent variable will be fitted and tested first; then from the remaining variables,
the program will select and fit the variable that will give the greatest reduction in
the residual sum of squares. This will continue until a variable is encountered that
does not make a significant reduction. The program can also be altered so as to
introduce a particular variable at any stage of the fitting.
No space need be devoted in this Research Paper to encouraging the reader to look
into the computer possibilities, for he will be a convert the first time he has occasion
to fit a four-variable regression and compute the c-multipliers--if not sooner.

CORRELATION COEFFICIENTS

General
In earlier literature, there is frequent reference to and use of various forms of
correlation coefficients. They were used as a guide in the selection of independent
variables to be fitted, and many of the regression computations were expressed in
terms of correlation coefficients. In recent years, however, their role has been
considerably diminished, and in at least one of the major texts on regression analysis,
correlation is mentioned less than a half-dozen times. The subject will be touched
upon lightly here so that the reader will not be entirely mystified by references to
correlation in the earlier literature.

The Simple Correlation Coefficient

A measure of the degree of association between two normally distributed variables
(Y and X) is the simple correlation coefficient, symbolized by ρ and defined as the
ratio of the covariance of Y and X to the product of their standard deviations:

    ρ = σ_XY / (σ_X σ_Y)

The correlation coefficient can have values from -1 to +1. A value approaching +1
would indicate a strong positive relationship between Y and X, while a value
approaching -1 would indicate a strong negative relationship. A value approaching
0 would suggest that there is little or no relationship between Y and X.

For a random sample, the correlation between X and Y can be estimated by

    r = Σxy / sqrt[(Σx²)(Σy²)]

where the lowercase letters denote deviations from the respective means (corrected
sums of squares and products).

In regression work we will seldom be dealing with strictly random samples. Usually
we try to get a wide range of values of the independent variable (X) in order to have
more precise estimates of the regression coefficients or to spot the existence of
curvilinear relationships. In addition, the data may not be from a normal population.
For these reasons, the sample correlation coefficient computed from regression data
will usually not be a valid estimate of the population correlation coefficient.
It will, however, give a measure of the degree of linear association between the
sample values Y and X and this has been one of its primary uses in regression. If we
have observations on a Y and several X variables, the X variable having the strongest
(nearest to +1 or -1) correlation with Y will give the best association with Y in a
simple linear regression. That is, a linear regression of Y on this X will have a
smaller residual than that of the simple linear regression of Y on any of the other
X variables.
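This point is easy to verify: for a simple linear regression the residual sum of squares
is (1 - r²) times the corrected sum of squares of Y, so the X with the largest |r|
necessarily leaves the smallest residual. A small sketch with invented data:

    # Which of two X variables gives the better simple linear fit? (hypothetical data)
    y  = [4.1, 5.0, 6.2, 7.1, 8.3, 9.0]
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 7.0]

    def cross(u, v):
        """Corrected sum of products of u and v."""
        n = len(u)
        return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

    for name, x in (("X1", x1), ("X2", x2)):
        r = cross(x, y) / (cross(x, x) * cross(y, y)) ** 0.5
        resid_ss = (1.0 - r * r) * cross(y, y)   # residual about Y = b0 + b1*X
        print(name, round(r, 3), round(resid_ss, 3))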
In this use of the correlation coefficient, it must be remembered that it is a measure
of linear association. A low correlation coefficient may suggest that there is little
or no linear relationship between the observed values of the two variables. There may,
however, be a very strong curvilinear relationship. The simple correlation between
Y and the X variables and among the X variables themselves may also be used as a
somewhat confusing guide in the selection of independent variables to be used in the
fitting of a multiple regression. In general, when two independent variables are
highly correlated with each other, it is unlikely that a linear regression involving both
of these variables will be very much better than a linear regression involving only
one of them. If we had, for example, r_y1 = .84, r_y2 = .78, and r_12 = .92, then the
regression Y = β0 + β1X1 + β2X2 would probably not be much better than either
Y = β0 + β1X1 or Y = β0 + β2X2. Of the two simple regressions, Y = β0 + β1X1 would
give the better fit, since the correlation of Y and X1 is greater than the correlation
of Y and X2. In practice, the correlations usually are not so large or the indications
so clearcut. When a number of X variables are under consideration for use in a multiple
regression, inspection of the simple correlation coefficients between Y and each X and
between pairs of X's provides little more than a rough screening.

Partial Correlation Coefficients


In the previous paragraph we considered an approach to the problem of which
variables to use in a multiple regression. In the case of a Y and two X variables,
this resolved down to the question of whether or not a linear regression involving
X1 and X2 would be any better than a simple regression involving only X1 or X2 as
the independent variable. The simple correlation coefficients sometimes shed some
light on this, but they are just as likely to confuse the issue.

The partial correlation coefficient may give a better answer. Having fitted a
regression of Y on one or more X variables, the partial correlation coefficient
indicates the degree of linear association between the regression residuals (deviations
of Y from the regression) and some other X variable. Thus r_y2.1 would be a measure
of the linear relationship between Y and X2 after adjustment for the linear relationship
between Y and X1. The value of r_y2.1 is given by

    r_y2.1 = (r_y2 - r_y1 r_12) / sqrt[(1 - r_y1²)(1 - r_12²)]

In the example where we had r_y1 = .84, r_y2 = .78, and r_21 (= r_12) = .92, we would have

    r_y2.1 = [.78 - (.84)(.92)] / sqrt[(1 - .84²)(1 - .92²)] = .034

This tells us that after fitting the linear regression of Y on X1, there would be little
association between Y and X2. A more exact way of putting this is that the correlation
between X2 and the residuals about the regression of Y on X1 is very low (.034).
The general equation for the partial correlation between Y and Xj after fitting the
linear regression of Y on Xk is

    r_yj.k = (r_yj - r_yk r_jk) / sqrt[(1 - r_yk²)(1 - r_jk²)]

This is sometimes referred to as the first partial correlation coefficient.

If we wished to know the correlation between a variable (say X3) and the residuals
of Y about the multiple regression (say Y on X1 and X2), the formula would be

    r_y3.12 = (r_y3.1 - r_y2.1 r_32.1) / sqrt[(1 - r_y2.1²)(1 - r_32.1²)]

This is sometimes referred to as the second partial correlation coefficient. In order
to compute the second partial it would be necessary to first compute the first partials
(r_y2.1, r_y3.1, etc.) by means of the previous formula.

The process can be extended to the extent of the individual's inclination and energy.
The correlation of X4 with the residuals of Y after fitting the regression of Y on X1,
X2, and X3 would be

    r_y4.123 = (r_y4.12 - r_y3.12 r_43.12) / sqrt[(1 - r_y3.12²)(1 - r_43.12²)]

The general equation for the correlation between Xj and the residuals of Y after
fitting the regression of Y on X1, X2, ---, and Xk is

    r_yj.12---k = (r_yj.12---(k-1) - r_yk.12---(k-1) r_jk.12---(k-1))
                  / sqrt[(1 - r_yk.12---(k-1)²)(1 - r_jk.12---(k-1)²)]
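The recursion can be mechanized directly. The sketch below generates partials of any
order from a table of simple correlations; with the simple correlations of the example
above it returns the .034 just quoted.

    # Partial correlations computed recursively from simple correlations r[a][b].
    def partial_r(r, a, b, given):
        """Correlation of a and b after adjusting for the variables in 'given'."""
        if not given:
            return r[a][b]
        k, rest = given[-1], given[:-1]
        rab = partial_r(r, a, b, rest)
        rak = partial_r(r, a, k, rest)
        rbk = partial_r(r, b, k, rest)
        return (rab - rak * rbk) / ((1 - rak ** 2) * (1 - rbk ** 2)) ** 0.5

    # Simple correlations from the example in the text.
    r = {"y": {"y": 1.00, "1": 0.84, "2": 0.78},
         "1": {"y": 0.84, "1": 1.00, "2": 0.92},
         "2": {"y": 0.78, "1": 0.92, "2": 1.00}}
    print(round(partial_r(r, "y", "2", ["1"]), 3))   # r_y2.1 = 0.034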

The use of partial correlation coefficients as an aid in the selection of the best
independent variables to be fitted in a multiple regression has lost much of its
popularity since the advent of electronic computers. With these machines it has
become fairly easy to fit regressions involving many or all possible combinations of
a set of independent variables and then select the best combination by an inspection
of the residual mean squares.
The Coefficient of Determination
A commonly used measure of how well a regression fits a set of data is the
coefficient of determination, symbolized by r² if the regression involves only one
independent variable and by R² if it involves more than one independent variable. For
the common case of a regression with a constant term (β0) which has been fitted with
corrected sums of squares and products, the coefficient of determination is calculated
as

    R² = (Reduction sum of squares) / Σy²

where Σy² is the corrected sum of squares of Y. Thus, R² represents the proportion of
the variation in Y that is associated with the regression on the independent variables.

If the regression has been fitted and the reduction computed with uncorrected
sums of squares, the formula for R² is

    R² = [Reduction - (ΣY)²/n] / [ΣY² - (ΣY)²/n]

The relationship between the coefficient of determination and the correlation
coefficient can be seen by an inspection of how r² is computed. For a simple linear
regression fitted with corrected sums of squares and products, the normal equation and
its solution are

    (Σx²) β̂1 = Σxy     and     β̂1 = Σxy / Σx²

Then, since the reduction sum of squares is equal to the estimated coefficient
times the right-hand side of its normal equation, we have

    Reduction = (Σxy)² / Σx²

Thus,

    r² = Reduction / Σy² = (Σxy)² / (Σx² Σy²)

This can be recognized as the square of the simple correlation coefficient.
By analogy, R is called the coefficient of multiple correlation.
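A short sketch of the simple-regression case, with invented data, shows the identity
numerically: the reduction divided by the corrected sum of squares of Y equals the
square of the simple correlation coefficient.

    # Hypothetical data for a simple linear regression of Y on X.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.0, 4.1, 5.9, 8.2, 9.9]

    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n     # corrected sums of squares
    syy = sum(v * v for v in y) - sum(y) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

    b1 = sxy / sxx                  # solution of the single normal equation
    reduction = b1 * sxy            # coefficient times its right-hand side
    r_sq = reduction / syy          # coefficient of determination
    r = sxy / (sxx * syy) ** 0.5    # simple correlation coefficient
    print(round(r_sq, 4), round(r * r, 4))            # the two agree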


Tests of Significance
The simple and multiple correlation coefficients (r and R) are sometimes used to
test a fitted regression. The distribution of these sample variables has been tabulated
and the test consists of comparing the sample value with the tabulated value. If the
sample value is greater than the tabular value at a specified probability level, the
regression is said to be significant.
In the case of a simple linear regression Y = β0 + β1X1, r has degrees of freedom
equal to the degrees of freedom for the residual mean square. The test of the
hypothesis that ρ = 0 is equivalent to the previously described tests of the hypothesis
that β1 = 0. It is possible to test other hypotheses about ρ or to test the difference
between two sample r values, but these require a transformation of r to Z. The details
are given by Snedecor (7).

In the case of a multiple regression Y = β0 + β1X1 + β2X2 + --- + βkXk, testing
the significance of R is equivalent to testing the hypothesis that β1 = β2 = --- = βk = 0.
R has degrees of freedom equal to the degrees of freedom for the residual mean
square. The tables of R take into account the number of variables fitted.

Using r or R for tests of significance seems to offer no advantages over the


appropriate F- or t-test.

THE BEST OF TWO LINEAR REGRESSIONS


When a single set of observations has been used to fit simple linear regressions of
Y on each of two independent variables, it is often desirable to know which of these
regressions is the best. That is, we have Y = β01 + β11X1 and Y = β02 + β12X2,
both of which are significant, and we want to know whether one is significantly better
than the other.

A test credited to Hotelling and described by W. D. Baten in the Journal of the
American Society of Agronomy (Vol. 33: pp. 695-699) is to compare with tabular t,
having n - 3 degrees of freedom, a statistic computed from the absolute values (signs
ignored) of the three simple correlation coefficients r_y1, r_y2, and r_12
(r_y1 = the correlation between Y and X1, etc.).
To illustrate, suppose we have the following set of observations.


For the illustrative data the computed t is not significant at the .05 level.
Thus, the linear regression of Y on X2 is not significantly better (from the standpoint
of precision) than the regression of Y on X1. If X2 were significantly better than
X1 but X1 were more easily measured, then selecting the best of the two regressions
becomes a matter of deciding how much the extra precision of X2 is worth.
2
By working with partial correlation coefficients it is possible to extend this test
to the problem of which of two variables is better, when fitted after some specified
set of independent variables. The test cannot, unfortunately, be extended to the
comparison of two sets of independent variables.
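Since neither the observations nor the displayed statistic survive in this copy, the
sketch below simply implements the usual statement of Hotelling's test; its exact
agreement with the form given by Baten is assumed. The correlations are borrowed from
the earlier example purely for illustration.

    import math

    def hotelling_t(r_y1, r_y2, r_12, n):
        """Usual form of Hotelling's test for two correlated r's; n - 3 df."""
        det = 1 - r_y1 ** 2 - r_y2 ** 2 - r_12 ** 2 + 2 * r_y1 * r_y2 * r_12
        return (abs(r_y1) - abs(r_y2)) * math.sqrt((n - 3) * (1 + r_12) / (2 * det))

    # Illustration only; compare the result with tabular t having n - 3 df.
    print(round(hotelling_t(0.84, 0.78, 0.92, n=20), 3))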


SELECTED REFERENCES

1. Dixon, W. J., and Massey, F. J., Jr.


1957. Introduction to statistical analysis. 488 pp. New York: McGraw-Hill
Book Co.
2. Ezekiel, M., and Fox, K. A.
1959. Methods of correlation and regression analysis: linear and curvilinear.
Ed. 3, 548 pp. New York: John Wiley & Sons.
3. Freese, F.
1962. Elementary forest sampling. Forest Serv., U.S. Dept. of Agriculture,
Agr. Handbook No. 232, 91 pp., Washington, D.C.
4. Friedman, J., and Foote, R. J.
Computational methods for handling systems of simultaneous equations.
Marketing Serv., U.S. Dept. of Agriculture, Agr. Handbook No. 94,
109 pp., U.S. Government Printing Office, Washington, D.C.
5. Goulden, C. H.
1952. Methods of statistical analysis. Ed. 2, 467 pp., New York: John Wiley
& Sons.
6. Grosenbaugh, L. R.
1958. The elusive formula of best fit: a comprehensive new machine program.
U.S. Forest Serv., Southern Forest Expt. Sta. Paper No. 158, 9 pp.,
New Orleans, La.
7. Snedecor, G. W.
1956. Statistical methods. Ed. 5, 534 pp., Ames, Iowa: Iowa State University
Press.
8. Walker, H. M.
1951. Mathematics essential for elementary statistics. Ed. 2, New York: Henry Holt & Co.
9. Williams, E., Jr.
Regression analysis. 214 pp., New York: John Wiley & Sons.

APPENDIX A-The solution of normal equations


There are numerous routines available for solving a set of normal equations. One
of these is to use the c-multipliers as described in the section on matrix algebra.
However, if the c-multipliers are not needed for setting confidence limits or testing
hypotheses, then there are less laborious procedures available. Three of these will
be illustrated by solving the normal equations that appear in the first part of
Problem I:

In each of these methods, and throughout this Paper, more digits are carried than
are warranted by the rules for significant digits. Unless this is done it is usually
impossible to get any sort of check on the computations. After the computations
have been checked, the coefficients should be rounded off to a number of digits
commensurate with the precision of the original data.

Method I. --Basic Procedure. Basically, all methods involve manipulating the equations
so as to eliminate all but one unknown, and then solving for this unknown. Solutions
for the other unknowns are then obtained by substitution in the equations that arise
at intermediate stages. This may be illustrated by the following direct approach
which may be applied to any set of simultaneous equations.
Step 1. Divide through each equation by the coefficient of β̂1.

Step 2. Eliminate β̂1 by subtracting any one of the equations (say the first) from
each of the others.

Step 3. Divide through each equation by the coefficient of β̂2.

Step 4. Subtract either equation (say the first) from the other to eliminate β̂2.

Step 5. Solve for β̂3.

Step 6. To solve for β̂2, substitute the solution for β̂3 in one of the equations
(say the second) of Step 3.

Step 7. To solve for β̂1, substitute for β̂2 and β̂3 in one of the equations (say the
third) of Step 1.

Step 8. As a check, add up all of the original normal equations and substitute the
solutions for β̂1, β̂2, and β̂3 in the resulting equation. Check.
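The same eliminate-and-substitute routine is easy to program. The sketch below solves
a hypothetical three-equation system; the coefficients and right-hand sides are
invented, since the numerical equations of Problem I are not reproduced on this page.

    # Solve normal equations A b = g by elimination and back-substitution
    # (the idea of Method I, applied to a hypothetical 3-by-3 system).
    def solve_normal_equations(a, g):
        a = [row[:] for row in a]         # work on copies
        g = g[:]
        n = len(g)
        for k in range(n):                # eliminate the k-th unknown
            for i in range(k + 1, n):
                m = a[i][k] / a[k][k]
                for j in range(k, n):
                    a[i][j] -= m * a[k][j]
                g[i] -= m * g[k]
        b = [0.0] * n                     # back-substitute
        for i in reversed(range(n)):
            s = g[i] - sum(a[i][j] * b[j] for j in range(i + 1, n))
            b[i] = s / a[i][i]
        return b

    a = [[10.0, 4.0, 2.0],                # hypothetical sums of squares and products
         [ 4.0, 8.0, 3.0],
         [ 2.0, 3.0, 6.0]]
    g = [20.0, 14.0, 9.0]                 # hypothetical right-hand sides
    print([round(v, 4) for v in solve_normal_equations(a, g)])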
Method II.--Forward Solution. A systematic procedure for solving the normal
equations is the so-called Forward Solution. It is a more mechanical routine and
perhaps a bit more difficult to learn and remember, but it has the advantage of
providing some supplementary information along the way. The steps will be described
using the symbols of table 2. The numerical results of these steps will be presented

in table 3. In this example, the columns headed Coefficients and Reduction


give supplementary information. If only the final regression coefficients are desired,
these columns may be omitted.
In any of the mechanical computation systems there is a pattern to the computations.
Once this pattern has been recognized, the systems are easily applied and extended
to larger or smaller problems. Learning the system is primarily a matter of
recognizing the pattern.
Step 1. Write the upper right half of the matrix of sums of squares and products,
along with the sums of products involving Y, as shown in table 2. In the column headed
Coefficients are the regression coefficients that would be obtained by fitting a simple
linear regression of Y on each of the X variables; each is computed by dividing the
sum of products of that X with Y by the sum of squares of that X.

Table 2.--The forward solution in symbols.


Table 3.--The forward solution--numerical example.

In the last column are the reduction sums of squares that would be obtained by fitting
these simple linear regressions. Each is computed as the coefficient times the
corresponding sum of products with Y, or equivalently as the square of that sum of
products divided by the sum of squares of the X.

Step 2. Rewrite the sums of squares and products from the X1 row.

Step 3. Divide each element in row 2 by the first element in that row (a_11).
Thus, q_12 = a_12/a_11, and so on.

Step 4. Compute the matrix of sums of squares and products adjusted for the
regression of Y on X1. The general equation is a_ij.1 = a_ij - a_1i a_1j / a_11.

The coefficients obtained at this stage are those that would be obtained for X2 or X3
when fitted along with X1. To indicate this, the symbols often used are b_21 (or
sometimes b_Y2.1) and b_31 (or sometimes b_Y3.1). In the last column are the reductions
that would be attributable to X2 (or X3) when fitted after X1. In this example the
reduction due to X1 alone is 1102.7690 and the reduction due to X2 after X1 (i.e., the
gain due to X2) is 629.5758; so the total reduction due to fitting X1 and X2 would be
the sum of these, or 1732.3448.

At this stage we could, if desired, compute a residual sum of squares and mean
square and test whether X2 or X3 made a significant reduction when fitted after X1.
If neither did, we might not wish to continue the fitting.

Step 5. Copy the adjusted sums of squares and products in the first row of Step 4.

Step 6. Divide each element of Step 5 by the value of the first element (a_22.1).

Step 7. Compute the matrix of sums of squares and products adjusted for the
regression of Y on X1 and X2.

The regression coefficient b_3.12 is the coefficient for X3 fitted in the presence
of X1 and X2 and is one of the terms we are seeking (previously we labelled it
as β̂3, but we use b_3.12 here to distinguish it from b_31 and b_3). The other two
terms, b_2.13 (or β̂2) and b_1.23 (or β̂1), are easily obtained by back-substitution:
b_2.13 from line 6 and b_1.23 from line 3.

The reduction obtained in Step 7 is the gain due to X3 after fitting X1 and X2. Since
the reduction due to X1 and X2 was 1732.3448 and the gain due to X3 after X1 and X2
is 4.1860, the total reduction due to X1, X2, and X3 is 1736.5308 (as given in Problem
XI). We could, at this stage, test the gain due to X3 and decide whether to retain it as
a variable in the regression. If we decided to drop X3, the coefficients for the
regression of Y on X1 and X2 could be obtained from Steps 1 through 4 simply by
ignoring all the terms having a 3 in the subscript.
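The pattern of the forward solution, adjusting the sums of squares and products for
one variable at a time and recording the gain in the reduction at each stage, can be
sketched as follows. The matrix and right-hand sides are hypothetical stand-ins for
the entries of table 3.

    # Sketch of the forward solution: adjust for X1, X2, ... in turn, keeping the
    # gain in the reduction sum of squares at every stage (hypothetical numbers).
    def forward_solution(a, g):
        """a = corrected SS and SP matrix of the X's; g = SP of each X with Y."""
        a = [row[:] for row in a]
        g = g[:]
        p = len(g)
        gains = []
        for k in range(p):
            gains.append(g[k] ** 2 / a[k][k])     # gain due to this variable
            for i in range(k + 1, p):             # adjust the remaining rows
                ratio = a[k][i] / a[k][k]
                for j in range(k + 1, p):
                    a[i][j] -= ratio * a[k][j]
                g[i] -= ratio * g[k]
        b = [0.0] * p                             # back-substitute for the b's
        for i in reversed(range(p)):
            s = g[i] - sum(a[i][j] * b[j] for j in range(i + 1, p))
            b[i] = s / a[i][i]
        return b, gains

    a = [[10.0, 4.0, 2.0], [4.0, 8.0, 3.0], [2.0, 3.0, 6.0]]   # hypothetical
    g = [20.0, 14.0, 9.0]
    b, gains = forward_solution(a, g)
    print([round(v, 4) for v in b], [round(v, 4) for v in gains])

The sum of the gains is the total reduction due to all the fitted variables, just as
1102.7690 + 629.5758 + 4.1860 = 1736.5308 in the example above.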
Method III.--Stepwise Fitting. This method is merely a modification of the second
method. At each stage of the fitting, the sums of squares and products (original or
adjusted) are rearranged so that the variable giving the largest reduction of the
residuals is on the left and will be the next one fitted. Also, at each stage, the
reduction due to the best variable is tested, and if the gain is not significant the
fitting is stopped.


The
procedure is helpful for screening a large number of independent
variables in order to select those that are likely to give a good fit to the sample
data. It should be noted, however, that the procedure is strictly exploratory. The
probabilities associated with tests of hypotheses that are selected by examination of
the data are not what they seem to be. Significance tests made in this way do not have
the same meaning that they have when applied to a single preselected hypothesis.
It might also be noted that though the stepwise procedure will frequently lead to
the linear combination of the independent variables that will result in the smallest
residual mean square, it does not always do so. This can only be done by fitting all
possible combinations and then comparing their residuals. Here again, tests of
significance may be informative, but the exact probabilities are unknown.
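A bare-bones version of the stepwise idea is sketched below: at each stage bring in
the remaining variable that gives the largest additional reduction, and stop when its
F, judged against a critical value taken from the F table, is too small. The data and
the critical value are hypothetical, and the caveats just mentioned about the resulting
probabilities apply.

    # Minimal stepwise fitter on corrected sums of squares and products
    # (hypothetical data; f_crit would be read from a table of F).
    def stepwise(a, g, syy, n, f_crit):
        a = [row[:] for row in a]
        g = g[:]
        remaining = list(range(len(g)))
        chosen, resid = [], syy
        while remaining:
            best = max(remaining, key=lambda i: g[i] ** 2 / a[i][i])
            gain = g[best] ** 2 / a[best][best]
            df_resid = n - len(chosen) - 2        # constant plus fitted variables
            f = gain / ((resid - gain) / df_resid)
            if f < f_crit:
                break                             # no significant gain; stop
            chosen.append(best)
            remaining.remove(best)
            resid -= gain
            for i in remaining:                   # adjust for the chosen variable
                ratio = a[best][i] / a[best][best]
                for j in remaining:
                    a[i][j] -= ratio * a[best][j]
                g[i] -= ratio * g[best]
        return chosen, resid

    a = [[10.0, 4.0, 2.0], [4.0, 8.0, 3.0], [2.0, 3.0, 6.0]]   # hypothetical
    g = [20.0, 14.0, 9.0]
    print(stepwise(a, g, syy=60.0, n=15, f_crit=4.0))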


APPENDIX B-Matrix inversion


The inversion of a matrix is a common mathematical problem and dozens of
computational schemes have been devised for this purpose. The job is not particularly
complex, but it can be quite tedious and it is very easy to make simple but disastrous
arithmetical mistakes. To avoid a load of unpleasant labor and the possibility of some
frustrating mistakes, it is best to let an electronic computer handle the work (this is
true of all regression calculations). For a few dollars, the computer will do a job
that might take days on a desk calculator.
There will be times, however, when electronic computer facilities are not
immediately available and hand computation is necessary. One of the many computational routines for inverting a symmetrical matrix is known as the Abbreviated
Doolittle Method. To illustrate the procedure we will obtain the inverse of the matrix
of uncorrected sums of squares and products from the second part of Problem I.
The matrix is

In describing this method, the elements of the matrix to be inverted will be
symbolized by a_ij, where i indicates the row and j the column in which the element
appears. Since the matrix is symmetrical, a_ij = a_ji. The elements of the inverse
matrix will be symbolized by c_ij. Since the original matrix is symmetrical, the
inverse will also be so and hence c_ij = c_ji. As this is a matrix of uncorrected sums

of squares and products we will let i and j start at zero. If we were working with
corrected sums of squares and products we would usually let i and j start at one.
The results of each step in the method will be shown symbolically in table 4 and
numerically in table 5. In following these steps it is important to notice the pattern of
the computations. Once this has been recognized, the extension to a matrix of any
size will be obvious.


Table 4.--Inverting a symmetric matrix


Table 5.--Inverting a symmetric matrix--numerical example.

Step 1. In the A Columns write the upper right-half of the matrix to be inverted.

Step 2. In the I Columns write a complete identity matrix of the same dimensions
as the matrix to be inverted.

Step 3. In the check column perform the indicated summations. For row 0 the
sum will be a_00 + a_01 + a_02 + a_03 + 1. For row 1 the sum will be
a_10 + a_11 + a_12 + a_13 + 1, and so forth. Note that a_10 = a_01 (the matrix is
symmetrical).

Step 4. Copy the entries from row 0. In table 4, the entry in the first I Column
(=1) has been symbolized by d_00.

Step 5. Divide each element (including the check sum) of line 4 by the first
element (a_00) in that line. The sum of all of the elements in the A and I Columns
will equal the value in the check column if no error has been made.

Step 6. The elements in this line (including the check) are obtained by
multiplying each element of line 4 (except the first) by b_01 and subtracting this
quantity from the corresponding elements of row 1. Thus, a_11.0 = a_11 - b_01 a_01
and a_12.0 = a_12 - b_01 a_02. The sum of these elements must equal the value in
the check column.

Step 7. Divide each element in line 6 by the first element in that line (a_11.0).
Check.

Step 8. The elements in this line are obtained by subtracting two quantities
from each element of row 2. The two quantities are b_02 times (the element in
line 4 below the row 2 element) and b_12.0 times (the element in line 6 below
the row 2 element). The elements in line 8 must equal the value computed for the
check column.

Step 9. Divide each element of line 8 by the first element in this line. Check.

Step 10. The elements of this line are obtained by subtracting three quantities
from each element of row 3. The three quantities are b_03 times (the line 4
element below the row 3 element), b_13.0 times (the line 6 element below the
row 3 element), and b_23.01 times (the line 8 element below the row 3 element).
The sum of the elements in line 10 must equal the computed value in the check
column.

Step 11. Divide each element in line 10 by the first element in that line (a_33.012).

Step 12. Compute the c-multipliers by the formulae indicated in table 4.

Step 13. As a final check, multiply the original matrix by the inverse. The
product should be the identity matrix.
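When the worksheet layout is not needed, the inverse (the c-multipliers) can be
obtained directly. The sketch below uses straightforward Gauss-Jordan elimination
rather than the Doolittle arrangement, but for a symmetric matrix of sums of squares
and products it yields the same inverse, and the final check of Step 13 is applied at
the end. The matrix is hypothetical.

    # Invert a symmetric matrix of sums of squares and products (the c-multipliers)
    # by Gauss-Jordan elimination; not the Doolittle worksheet, but the same result.
    def invert(a):
        n = len(a)
        m = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]   # augment with I
             for i, row in enumerate(a)]
        for k in range(n):
            piv = m[k][k]
            m[k] = [v / piv for v in m[k]]
            for i in range(n):
                if i != k:
                    factor = m[i][k]
                    m[i] = [v - factor * w for v, w in zip(m[i], m[k])]
        return [row[n:] for row in m]

    a = [[10.0, 4.0, 2.0], [4.0, 8.0, 3.0], [2.0, 3.0, 6.0]]       # hypothetical
    c = invert(a)
    # Final check: the product of the original matrix and its inverse should be
    # the identity matrix, apart from rounding.
    for i in range(len(a)):
        print([round(sum(a[i][k] * c[k][j] for k in range(len(a))), 6)
               for j in range(len(a))])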


APPENDIX C-Some simple functions and curve forms

I.  Y = a + bX -- Straight line

    Linear Model:  Y = b0 + b1X

    a = Y-intercept (value of Y when X = 0)
    b = Slope (change in Y per unit change in X)

    Estimates:

II. (Y - a) = k(X - b)² -- Second degree parabola

    Linear Model:  Y = b0 + b1X + b2X²

    Y-intercept is at Y = kb² + a
    X-intercepts are at X = b ± sqrt(-a/k)  (complex if -a/k is negative)

    Estimates:

III. (Y - a) = k/X -- Hyperbola

     a = Level of horizontal asymptote

     Estimates:

IV.  Estimates:

V.   Estimates:

VI.  Estimates:  â = anti-log b0
                 b̂ = b1
                 ĉ = anti-log b2

VII. Y = ab^(X - c)

     Linear Model:  log Y = b0 + b1X + b2X²

     Estimates:

VIII. 10^Y = aX^b

      Linear Model:  Y = b0 + b1(log X)

      Estimates:  â = anti-log b0
                  b̂ = b1
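Each of these forms is fitted through its linear model. As a small illustration with
invented data, the sketch below fits form VIII as reconstructed above, 10^Y = aX^b, by
regressing Y on log X and converting the fitted constants back with anti-logs.

    import math

    # Hypothetical observations assumed to follow 10**Y = a * X**b (form VIII).
    xs = [1.0, 2.0, 4.0, 8.0, 16.0]
    ys = [0.30, 0.91, 1.50, 2.10, 2.71]

    u = [math.log10(x) for x in xs]       # the linear model is Y = b0 + b1*log10(X)
    n = len(u)
    suu = sum(v * v for v in u) - sum(u) ** 2 / n
    suy = sum(a * b for a, b in zip(u, ys)) - sum(u) * sum(ys) / n
    b1 = suy / suu
    b0 = (sum(ys) - b1 * sum(u)) / n

    a_hat = 10 ** b0                      # a = anti-log of b0
    b_hat = b1                            # b is estimated directly by b1
    print(round(a_hat, 3), round(b_hat, 3))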

APPENDIX D-The analysis of designed experiments

In the section on covariance analysis we encountered the use of dummy variables


in a regression analysis. This leaves us very close to the connection between
regression analysis and the analysis of variance in designed experiments. Those
who have some familiarity with the analysis of variance of designed experiments may
be interested in taking a look at this connection.
As a simple example, suppose that we have a completely randomized design
comparing three treatments with four replications of each. The yields might be as
follows:
                              Yields                    Sums
Treatment

    I              12     17     16     15               60
    II             14      9     13     12               48
    III            11     20     18     13               62
                                                         170

For the standard analysis of variance of this design we first calculate the correction
term and the sums of squares for total, treatment, and error.
Correction term (CT) = (170)²/12 = 2408.3333

Total = ΣY² - CT = 2518 - 2408.3333 = 109.6667

Treatment = (60² + 48² + 62²)/4 - CT = 2437.0000 - 2408.3333 = 28.6667

Error = Total - Treatment = 109.6667 - 28.6667 = 81.0000


The completed analysis is


Source            df         SS          MS          F

Treatments         2      28.6667     14.3333      1.593
Error              9      81.0000      9.0000
Total             11     109.6667

These computations are merely a simplified form of regression analysis. To see


this, we first represent the yield for each plot in the experiment by the linear model

    Y = β0 + β1X1 + β2X2 + β3X3 + e

where:

    X1 = 1 for any plot receiving treatment I; = 0 otherwise
    X2 = 1 for any plot receiving treatment II; = 0 otherwise
    X3 = 1 for any plot receiving treatment III; = 0 otherwise
    β1, β2, and β3 = The effects of treatments I, II, and III, expressed as
        deviations from the overall mean (represented by β0).

Because the treatment effects are expressed as deviations from the mean, they will
sum to zero (i.e., β1 + β2 + β3 = 0), so that we can express one coefficient in terms
of the other two (say β3 = -β1 - β2) and rewrite the model

    Y = β0 + β1X1 + β2X2 - (β1 + β2)X3 + e

or

    Y = β0 + β1X'1 + β2X'2 + e

where:

    X'1 = X1 - X3 and X'2 = X2 - X3


For any plot receiving treatment I, the independent variables will have values
X'1 = 1 and X'2 = 0; for treatment II plots, the values are X'1 = 0 and X'2 = 1; and for
treatment III plots, X'1 = -1 and X'2 = -1. Thus the study data can be listed as follows:

Treatment     Y = Yield     X'1     X'2

    I             12          1       0
                  17          1       0
                  16          1       0
                  15          1       0

    II            14          0       1
                   9          0       1
                  13          0       1
                  12          0       1

    III           11         -1      -1
                  20         -1      -1
                  18         -1      -1
                  13         -1      -1

Sums             170

The normal equations for fitting the revised model (with uncorrected sums of squares
and products) are:

    12β̂0 + 0β̂1 + 0β̂2 = 170
     0β̂0 + 8β̂1 + 4β̂2 = -2
     0β̂0 + 4β̂1 + 8β̂2 = -14

The solutions are:

    β̂0 = 14.1667,   β̂1 = .8333,   β̂2 = -2.1667

The reduction sum of squares for this model is therefore

    Reduction = (14.1667)(170) + (.8333)(-2) + (-2.1667)(-14)
              = 2437.0062, with 3 df,

and the residual sum of squares is

    Residual = ΣY² - Reduction = 2518 - 2437.0062 = 80.9938, with 12 - 3 = 9 df.

The hypothesis of no difference among treatments is equivalent to the hypothesis
that β1 = β2 = β3 = 0, and the model becomes

    Y = β0 + e

for which the normal equation is

    12β̂0 = 170

The solution is β̂0 = 14.1667, so the reduction sum of squares is

    Reduction = (14.1667)(170) = 2408.3390, with 1 df.

Then the analysis of variance for testing this hypothesis is


Source                                  df         SS           MS

Reduction due to maximum model           3      2437.0062
Reduction due to hypothesis model        1      2408.3390
Difference for testing hypothesis        2        28.6672     14.3336
Residuals about maximum model            9        80.9938      8.9993
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Total                                   12      2518

    F (2/9 df) = 14.3336/8.9993 = 1.593

Except for rounding errors and differences in terminology, this is the same result
as the standard test procedure.
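The whole computation can be reproduced with a short sketch that fits the
dummy-variable regression by least squares. With exact arithmetic the reduction for
the maximum model is 2437.0000 (the 2437.0062 above reflects rounding the coefficients
to four decimals), and the F of 1.593 with 2 and 9 degrees of freedom is recovered.

    # The completely randomized design analyzed as a regression on dummy variables.
    yields = {"I": [12, 17, 16, 15], "II": [14, 9, 13, 12], "III": [11, 20, 18, 13]}
    x1 = {"I": 1, "II": 0, "III": -1}     # X'1 = X1 - X3
    x2 = {"I": 0, "II": 1, "III": -1}     # X'2 = X2 - X3

    rows = [(y, x1[t], x2[t]) for t, ys in yields.items() for y in ys]
    n = len(rows)
    sum_y = sum(y for y, _, _ in rows)
    sum_yy = sum(y * y for y, _, _ in rows)

    # Uncorrected sums of squares and products for the normal equations.
    s11 = sum(a * a for _, a, _ in rows)
    s22 = sum(b * b for _, _, b in rows)
    s12 = sum(a * b for _, a, b in rows)
    g1 = sum(y * a for y, a, _ in rows)
    g2 = sum(y * b for y, _, b in rows)

    # Because the X' columns sum to zero, b0 separates from b1 and b2.
    b0 = sum_y / n
    b2 = (g2 - s12 * g1 / s11) / (s22 - s12 ** 2 / s11)
    b1 = (g1 - s12 * b2) / s11
    red_full = b0 * sum_y + b1 * g1 + b2 * g2    # reduction, 3 df
    red_hyp = sum_y ** 2 / n                     # reduction when b1 = b2 = 0, 1 df
    residual = sum_yy - red_full                 # 9 df
    f = ((red_full - red_hyp) / 2) / (residual / 9)
    print(round(red_full, 4), round(residual, 4), round(f, 3))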

In illustrating the regression basis for the analysis of a designed experiment, we


have made use of dummy variables so as to avoid too much of a departure from the
familiar regression situation. But in most text books on experimental design the
dummy variables are present by implication only. The model is written in terms of
the coefficients. Thus, for the completely randomized design the model might be
written

    Y_ij = μ + τ_i + e_ij

where:

    Y_ij = The observed yield of the jth plot of the ith treatment
    μ = The overall mean yield
    τ_i = The effect of treatment i expressed as a departure from the overall
          mean (so that Στ_i = 0)
    e_ij = The error associated with the jth plot of the ith treatment.

For the randomized block design with one replication of each treatment in each
block, the model is

    Y_ij = μ + β_i + τ_j + e_ij

where β_i = the effect of block i expressed as a departure from the overall mean, so
that Σβ_i = 0. Other terms are as previously defined.
Other terms are as previously defined.
Thus, each experimental design is defined by some linear model. The analysis of
variance for the design involves a least-squares fitting of the model under various
hypotheses and testing the differences in residuals. As in any regression analysis,
the hypothesis to be tested should be specified prior to examination of the data.


APPENDIX E-Tables
Table 6.--The distribution of F


Table 6.--The distribution of F (cont.)


Reproduced by permission of the author and publishers from table 10.5.3 of Snedecor's Statistical
Methods (ed. 5), 1956, Iowa State University Press, Ames, Iowa. Permission has also been
granted by the literary executor of the late Professor Sir Ronald A. Fisher and Oliver and Boyd
Ltd., publishers, for the portion of the table computed from Dr. Fisher's table VI in Statistical
Methods for Research Workers.


Table 7.--The distribution of t


Table reproduced in part from table III of Fisher and Yates' Statistical Tables for
Biological, Agricultural, and Medical Research, published by Oliver and Boyd Ltd.,
Edinburgh, Scotland. Permission has been given by Dr. F. Yates, by the literary
executor of the late Professor Sir Ronald A. Fisher, and by the publishers.


Table 8.--The cumulative normal distribution

(Probability of a standard normal deviate being greater than 0 and less than the
tabled value)

        .00    .01    .02    .03    .04    .05    .06    .07    .08    .09

 .0    0000   0040   0080   0120   0160   0199   0239   0279   0319   0359
 .1    0398   0438   0478   0517   0557   0596   0636   0675   0714   0753
 .2    0793   0832   0871   0910   0948   0987   1026   1064   1103   1141
 .3    1179   1217   1255   1293   1331   1368   1406   1443   1480   1517
 .4    1554   1591   1628   1664   1700   1736   1772   1808   1844   1879
 .5    1915   1950   1985   2019   2054   2088   2123   2157   2190   2224
 .6    2257   2291   2324   2357   2389   2422   2454   2486   2517   2549
 .7    2580   2611   2642   2673   2704   2734   2764   2794   2823   2852
 .8    2881   2910   2939   2967   2995   3023   3051   3078   3106   3133
 .9    3159   3186   3212   3238   3264   3289   3315   3340   3365   3389
1.0    3413   3438   3461   3485   3508   3531   3554   3577   3599   3621
1.1    3643   3665   3686   3708   3729   3749   3770   3790   3810   3830
1.2    3849   3869   3888   3907   3925   3944   3962   3980   3997   4015
1.3    4032   4049   4066   4082   4099   4115   4131   4147   4162   4177
1.4    4192   4207   4222   4236   4251   4265   4279   4292   4306   4319
1.5    4332   4345   4357   4370   4382   4394   4406   4418   4429   4441
1.6    4452   4463   4474   4484   4495   4505   4515   4525   4535   4545
1.7    4554   4564   4573   4582   4591   4599   4608   4616   4625   4633
1.8    4641   4649   4656   4664   4671   4678   4686   4693   4699   4706
1.9    4713   4719   4726   4732   4738   4744   4750   4756   4761   4767
2.0    4772   4778   4783   4788   4793   4798   4803   4808   4812   4817
2.1    4821   4826   4830   4834   4838   4842   4846   4850   4854   4857
2.2    4861   4864   4868   4871   4875   4878   4881   4884   4887   4890
2.3    4893   4896   4898   4901   4904   4906   4909   4911   4913   4916
2.4    4918   4920   4922   4925   4927   4929   4931   4932   4934   4936
2.5    4938   4940   4941   4943   4945   4946   4948   4949   4951   4952
2.6    4953   4955   4956   4957   4959   4960   4961   4962   4963   4964
2.7    4965   4966   4967   4968   4969   4970   4971   4972   4973   4974
2.8    4974   4975   4976   4977   4977   4978   4979   4979   4980   4981
2.9    4981   4982   4982   4983   4984   4984   4985   4985   4986   4986
3.0    4987   4987   4987   4988   4988   4989   4989   4989   4990   4990
3.1    4990   4991   4991   4991   4992   4992   4992   4992   4993   4993
3.2    4993   4993   4994   4994   4994   4994   4994   4995   4995   4995
3.3    4995   4995   4995   4996   4996   4996   4996   4996   4996   4997
3.4    4997   4997   4997   4997   4997   4997   4997   4997   4997   4998
3.6    4998   4998   4999   4999   4999   4999   4999   4999   4999   4999
3.9    5000


Reprinted from Table 8.8.1 of Statistical Methods


(ed. 5) by G. W. Snedecor, 1956, published by
the Iowa State University Press, Ames, Iowa, and
by permission of the author and publisher.


Forest Service regional experiment stations and Forest Products laboratory

PUBLICATION LISTS ISSUED BY THE


FOREST PRODUCTS LABORATORY
The following lists of publications deal with investigative projects of the
Forest Products Laboratory or relate to special interest groups and are available upon request:
Box, Crate, and Packaging Data
Chemistry of Wood
Drying of Wood
Fire Protection
Logging, Milling, and Utilization of Timber Products
Mechanical Properties of Timber
Pulp and Paper
Fungus and Insect Defects in Forest Products
Structural Sandwich, Plastic Laminates, and Wood-Base Components
Glue and Plywood
Thermal Properties of Wood
Growth, Structure, and Identification of Wood
Wood Finishing Subjects
Furniture Manufacturers, Woodworkers, and Teachers of Woodshop Practice
Wood Preservation
Architects, Builders, Engineers, and Retail Lumbermen

Note: Since Forest Products Laboratory publications are so varied in subject


matter, no single catalog of titles is issued. Instead, a listing is made for
each area of Laboratory research. Twice a year, December 31 and
June 30, a list is compiled showing new reports for the previous 6 months.
This is the only item sent regularly to the Laboratory's mailing roster,
and it serves to keep current the various subject matter listings. Names
may be added to the mailing roster upon request.


The Forest Service, U.S. Department of Agriculture, is dedicated to the


principle of multiple use management of the Nation's forest resources for
sustained yields of wood, water, forage, wildlife, and recreation. Through forestry research, cooperation with the States and private forest owners, and
management of the National Forests and National Grasslands, it strives--as
directed by Congress--to provide increasingly greater service to a growing
Nation.

U. S. Forest Products Laboratory.

Linear regression methods for forest research, by Frank Freese. Madison,


Wis., F.P.L., 1964.
136 pp., illus. (U.S. FS res. paper FPL 17)

A presentation and discussion of the methods of linear regression analysis


that have been found most useful in forest research. Topics treated include the
fitting and testing of linear models, weighted regression, confidence limits,
covariance analysis, and discriminant functions. The various methods are
illustrated by typical numerical examples and their solution.

FOREST PRODUCTS LABORATORY
U.S. DEPARTMENT OF AGRICULTURE
FOREST SERVICE---MADISON, WIS.

In Cooperation with the University of Wisconsin
