Areas of application:
Correlation and regression are widely used in industrial quality control because they provide
a means of predicting and controlling product and process behavior by studying the relationships
between variables. Some key uses are:
---Determining the important factors responsible for variability in output quality.
---Determining to what extent variation in a factor causes variation in output quality.
3. Industrial research
In correlation and regression studies, the engineer takes the data as found, rather than data
gathered under controlled laboratory conditions, and discovers the relationship from them.
Bivariate distribution
Regression theory is built upon the concept of a bivariate distribution, that is, the joint distribution
of two variables. A chart known as a scatter diagram shows the relationship between the two variables.
A line or curve that shows how the mean value of one variable changes with the value of the other
variable is called a line or curve of regression.
Regression of Y on X
--It is the relationship between the average value of Y for a given X and the value of X.
Example: strength of cotton yarn (Y) and fibre length (X)
[Scatter diagram / bivariate frequency table: yarn strength Y (vertical axis, class marks 74.5 to 124.5)
against fibre length X (horizontal axis, class marks 0.545 to 0.945).]
For a given value of X there is no single value of Y. THE REASONS ARE:
-- The dependent variable Y is affected by variables other than X.
-- The dependent variable Y is affected by X and also by many other variables.
-- A host of chance forces cause errors of measurement.
Although there is no single value of Y for a given value of X, the Y values tend to be higher
when X is higher and lower when X is lower. The mean value of Y increases steadily
with X. It is this locus of mean values that is called the “REGRESSION OF Y ON X.”
There is also a regression of X on Y: the locus of the mean values of X for given
values of Y. A line of regression may have either a positive or a negative slope, indicating the
direction of the relationship.
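
The idea of a regression as a locus of mean values can be illustrated with a short Python sketch. The observations below are hypothetical (they are not the yarn data of the example), and the helper name `conditional_means` is introduced only for this illustration. It groups the Y values by X and averages each group.

```python
from collections import defaultdict

def conditional_means(pairs):
    """Return {x: mean of the y values observed at that x}, ordered by x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}

# Hypothetical (fibre length, yarn strength) observations, for illustration only.
data = [
    (0.545, 80.0), (0.545, 84.0),
    (0.645, 88.0), (0.645, 92.0),
    (0.745, 95.0), (0.745, 99.0),
    (0.845, 104.0), (0.845, 106.0),
]

means = conditional_means(data)
for x, m in means.items():
    print(f"X = {x:.3f}: mean Y = {m:.1f}")
```

Plotting these means against X would trace the regression of Y on X for this made-up sample. Swapping the roles of the two variables would give the regression of X on Y.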
COEFFICIENT OF CORRELATION
With every linear regression there is associated a coefficient of correlation, denoted by r, which
measures the degree of association between the two variables.
$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$
If r is positive the slope of the regression line is positive, and if r is negative the slope is negative.
When all the points lie on the line, the deviations of the y values from the line of regression are 0 and r
becomes +1 or −1, indicating a perfect linear relation.
The nearer the magnitude of r is to 1, the closer the points are to the line of regression; thus the
magnitude of r may be taken as a measure of the degree to which the association between the variables
approaches a linear functional relationship.
$$ \sum x = 15, \quad \sum y = 1, \quad \sum xy = 9, \quad \sum x^2 = 55, \quad \sum y^2 = 15 $$
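
As a minimal sketch, the product-moment formula for r can be computed directly from the five sums it involves. The function name `correlation` and the sample values are assumptions made for illustration; the sample is hypothetical and is not the data behind the sums quoted above.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r from the sums n, Σx, Σy, Σxy, Σx², Σy²."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

# Hypothetical sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(f"r = {r:.4f}")

# Points lying exactly on a line give a magnitude of 1.
r_perfect = correlation([1, 2, 3], [2, 4, 6])
```

The function divides by zero if either variable is constant, since r is undefined in that case.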
STANDARD ERROR OF ESTIMATE
The standard error of estimate is a measure of the reliability of an estimate made from the line of
regression. It is given by the SD of the distribution of Y values for a given value of X, and it helps
in determining a confidence interval for Y. The SD of Y for a given X is commonly called the
“STANDARD ERROR OF ESTIMATE” because it measures the error involved in using the regression value
to estimate Y. The universe quantity is denoted by σest and the sample estimate by Sest.
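$$ S_{est} = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}} $$
where ŷ is the value of Y estimated from the line of regression and n is the number of pairs of
observations.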
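
The standard error of estimate might be computed as in the sketch below. It assumes the usual form, the square root of the sum of squared deviations of y from the fitted value divided by n − 2. It also assumes the line is fitted by least squares, which this section does not spell out. The function names and the sample are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b of the line y = a + b*x (assumed fit)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

def standard_error_of_estimate(xs, ys):
    """Sest = sqrt( Σ(y - ŷ)² / (n - 2) ), with ŷ taken from the fitted line."""
    a, b = fit_line(xs, ys)
    squared_residuals = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(squared_residuals / (len(xs) - 2))

# Hypothetical sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
s_est = standard_error_of_estimate(xs, ys)
print(f"y = {a:.2f} + {b:.2f} x, Sest = {s_est:.4f}")
```

The divisor is n − 2 rather than n because two quantities, the intercept and the slope, are estimated from the same sample.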
A 95% confidence interval for the regression line means that there is a 95% probability that the
true linear regression line of the population lies within the confidence interval of the regression
line calculated from the sample data.
In the graph on the left of the figure, a linear regression line is fitted to the sample data
points. The confidence interval is the region between the two curves (dotted lines).
Thus there is a 95% probability that the true best-fit line for the population lies within the
confidence interval (e.g. any of the lines in the figure on the right).
There is also the concept of a prediction interval. Here we take a specific value of x, x0, and
find an interval around the predicted value ŷ0 for x0 such that there is a 95% probability that
the real value of y (in the population) corresponding to x0 lies within this interval (see the graph
on the right side of Figure 1).
For a specific value x0, the prediction interval is more meaningful than the confidence
interval, because it describes where an individual value of y is expected to fall rather than
where the regression line lies.
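
The difference between the two intervals can be sketched numerically. The code below assumes the standard textbook half-widths for a least-squares line, neither of which is given in this section: t · Sest · sqrt(1/n + (x0 − x̄)²/Sxx) for the confidence interval of the line at x0, and t · Sest · sqrt(1 + 1/n + (x0 − x̄)²/Sxx) for the prediction interval. The sample is hypothetical. The critical value 3.182 is the two-sided 95% t value for n − 2 = 3 degrees of freedom, taken from tables and passed in as a parameter.

```python
import math

def intervals_at(xs, ys, x0, t_crit):
    """Return (ŷ0, confidence half-width, prediction half-width) at x0.

    Assumes a least-squares line and the standard textbook interval formulas.
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    s_est = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y0 = a + b * x0
    spread = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_crit * s_est * math.sqrt(spread)        # uncertainty of the line
    pi_half = t_crit * s_est * math.sqrt(1 + spread)    # plus scatter of one new y
    return y0, ci_half, pi_half

# Hypothetical sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_95_DF3 = 3.182  # two-sided 95% t value for 3 degrees of freedom (from tables)

y0, ci_half, pi_half = intervals_at(xs, ys, x0=3, t_crit=T_95_DF3)
print(f"predicted y at x0 = 3: {y0:.2f}")
print(f"95% confidence interval: {y0 - ci_half:.2f} to {y0 + ci_half:.2f}")
print(f"95% prediction interval: {y0 - pi_half:.2f} to {y0 + pi_half:.2f}")
```

The prediction interval is always the wider of the two, since it adds the scatter of an individual observation to the uncertainty in the fitted line. Both intervals widen as x0 moves away from the mean of x.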