
DEPARTMENT OF OPERATIONAL RESEARCH
UNIVERSITY OF DELHI

PROJECT ON REGRESSION ANALYSIS


GLOBAL WARMING
While climate model projections of the past benefit
from knowledge of atmospheric greenhouse gas
concentrations, volcanic eruptions and
other radiative forcings affecting the Earth’s climate,
casting forward into the future is understandably more
uncertain. Climate models can be evaluated both on
their ability to hindcast past temperatures and forecast
future ones.

GREEN HOUSE EFFECT


A sample study was done to analyse what affects the change in temperature in East Anglia (a geographical area in the east of England).

OBJECTIVE:
To establish the relationship between the temperature
change and the factors associated with it namely
month, greenhouse gases like CO2, N2O, CH4, CFC.11,
CFC.12, Aerosols, TSI and MEI.
INTRODUCTION
It has been observed that there is a steady increase in temperature every year. This climate change is driven by the rise in average surface temperatures on Earth, largely through the greenhouse effect. The greenhouse effect, by definition, is the process by which radiation from a planet's atmosphere warms the planet's surface to a temperature above what it would be without that atmosphere.
Most climate scientists agree the main cause of the current global warming
trend is human expansion of the "greenhouse effect" — warming that results
when the atmosphere traps heat radiating from Earth toward space.

Certain gases in the atmosphere block heat from escaping. Long-lived gases
that remain semi-permanently in the atmosphere and do not respond
physically or chemically to changes in temperature are described as "forcing"
climate change.

Carbon dioxide (CO2). A minor but very important component of the atmosphere, carbon dioxide is released through natural processes such as
respiration and volcano eruptions and through human activities such as
deforestation, land use changes, and burning fossil fuels. Humans have
increased atmospheric CO2 concentration by more than a third since the
Industrial Revolution began. This is the most important long-lived "forcing" of
climate change.

Methane. A hydrocarbon gas produced both through natural sources and human activities, including the decomposition of wastes in landfills,
agriculture, and especially rice cultivation, as well as ruminant digestion and
manure management associated with domestic livestock. On a molecule-for-
molecule basis, methane is a far more active greenhouse gas than carbon
dioxide, but also one which is much less abundant in the atmosphere.

Nitrous oxide. A powerful greenhouse gas produced by soil cultivation practices, especially the use of commercial and organic fertilizers, fossil fuel
combustion, nitric acid production, and biomass burning.
Chlorofluorocarbons (CFCs). Synthetic compounds entirely of industrial origin
used in a number of applications, but now largely regulated in production and
release to the atmosphere by international agreement for their ability to
contribute to destruction of the ozone layer. They are also greenhouse gases.

The consequences of changing the natural atmospheric greenhouse are difficult to predict, but certain effects seem likely:

 On average, Earth will become warmer. Some regions may welcome warmer
temperatures, but others may not.

 Warmer conditions will probably lead to more evaporation and precipitation overall, but individual regions will vary, some becoming wetter and others drier.

 A stronger greenhouse effect will warm the oceans and partially melt
glaciers and other ice, increasing sea level. Ocean water also will expand if it
warms, contributing further to sea level rise.

 Meanwhile, some crops and other plants may respond favourably to increased atmospheric CO2, growing more vigorously and using water more
efficiently. At the same time, higher temperatures and shifting climate
patterns may change the areas where crops grow best and affect the
makeup of natural plant communities.

As a result, it is the duty of every state or country to be aware of its contribution to the greenhouse effect so that it can make regulations to reduce
the emission of these harmful gases.
As a result, we use a multiple linear regression model to help us determine how a per-unit change in each of these major factors predicts the change in temperature of a region. Multiple regression (an extension of simple linear regression) is used to predict the value of a
dependent variable (also known as an outcome variable) based on the value of
two or more independent variables (also known as predictor variables). In this
model we are using multiple regression to determine if the change in
temperature can be affected or predicted based on the presence of different
gases in the atmosphere. Multiple regression also allows you to determine the
overall fit (variance explained) of the model and the relative contribution of
each of the independent variables to the total variance explained.
ABOUT OUR DATA
For our model, the data has been collected from the sites of various trusted organisations, such as the Climatic Research Unit at the University of East Anglia, the ESRL/NOAA Global Monitoring Division, the Goddard Institute for Space Studies at NASA, the SOLARIS-HEPPA project website and the ESRL/NOAA Physical Sciences Division.
Available data includes:
Year & month: when the observation was recorded.

Temp: the difference in degrees Celsius between the average global temperature in that period and a reference value.
CO2, N2O, CH4, CFC-11, CFC-12: atmospheric concentrations of
carbon dioxide (CO2), nitrous oxide (N2O), methane (CH4),
trichlorofluoromethane (CCl3F; commonly referred to as CFC-11) and
dichlorodifluoromethane (CCl2F2; commonly referred to as CFC-12),
respectively.
 CO2, N2O and CH4 are expressed in ppmv (parts per million by
volume)
 CFC-11 and CFC-12 are expressed in ppbv (parts per billion by volume).

Aerosols: the mean stratospheric aerosol optical depth at 550 nm. This
variable is linked to volcanoes, as volcanic eruptions result in new particles
being added to the atmosphere, which affect how much of the sun's energy is
reflected back into space.

TSI: The total solar irradiance (TSI) in W/m2 (the rate at which the sun's
energy is deposited per unit area). Due to sunspots and other solar
phenomena, the amount of energy that is given off by the sun varies
substantially with time.
MEI: Multivariate El Nino Southern Oscillation index (MEI), a measure of
the strength of the El Nino/La Nina-Southern Oscillation (a weather effect in
the Pacific Ocean that affects global temperatures).
THE MODEL
Multiple regression is an extension of simple linear regression. It is used when
we want to predict the value of a variable based on the value of two or more
other variables. The variable we want to predict is called the dependent
variable (or sometimes, the outcome, target or criterion variable). The
variables we are using to predict the value of the dependent variable are called
the independent variables (or sometimes, the predictor, explanatory or
regressor variables).
Multiple regression also allows you to determine the overall fit (variance
explained) of the model and the relative contribution of each of the predictors
to the total variance explained. As a result, we have employed Multiple linear
regression model to helps us determine how are these major factors used to
predict the change in temperature of a region due to per unit change in these
parameters.
The objective is to establish the relationship between the temperature and the factors associated with it, namely month, CO2, CH4, N2O, CFC-11, CFC-12, total solar irradiance (TSI), Aerosols and MEI.

In this model, we have one dependent variable (Yi), the temperature, and 4 independent or explanatory variables (Xi), namely CO2, MEI, Aerosols and TSI, that contribute to our regression analysis.
Mathematically,
 Yi= the difference in degrees Celsius between the average global
temperature in that period and a reference value.
 X1= atmospheric concentrations of carbon dioxide (CO2) expressed in
ppmv (parts per million by volume i.e., 397 ppmv of CO2 means that
CO2 constitutes 397 millionths of the total volume of the atmosphere)
 X2= Multivariate El Nino Southern Oscillation index (MEI) a measure of
the strength of the El Nino/La Nina-Southern Oscillation (a weather
effect in the Pacific Ocean that affects global temperatures).
 X3= the mean stratospheric aerosol optical depth at 550 nm. This
variable is linked to volcanoes, as volcanic eruptions result in new
particles being added to the atmosphere, which affect how much of the
sun's energy is reflected back into space.
 X4= the total solar irradiance (TSI) in W/m2 (the rate at which the sun's
energy is deposited per unit area). Due to sunspots and other solar
phenomena, the amount of energy that is given off by the sun varies
substantially with time.
The multiple linear regression equation is as follows:

Y = b0 + b1X1 + b2X2 + b3X3 + b4X4

where X1 through X4 are distinct independent or predictor variables, b0 is the value of Y when all of the independent variables are equal to zero, and
b1 through b4 are the estimated regression coefficients. Each regression
coefficient represents the change in Y relative to a one unit change in the
respective independent variable. In the multiple regression situation, b1, for
example, is the change in Y relative to a one unit change in X1, holding all other
independent variables constant (i.e., when the remaining independent
variables are held at the same value or are fixed). Similarly, for other variables
as well.
We conduct the multiple linear regression analysis using SPSS. In our model we
need to enter the variable temp as the dependent variable to our multiple
linear regression model and the CO2, MEI, AEROSOL and TSI as independent
variables. The total number of observations in our model is 308.
REGRESSION ANALYSIS
The output on running the regression model is as follows:
INTERPRETING THE RESULTS

BEST-FITTED MODEL:
We used the sequential (step-wise) approach for selecting the best regression model. Out of the 9 variables, only 4 contributed to the model, namely CO2, MEI, AEROSOLS and TSI.

This graph shows that there is a linear relationship between our dependent variable TEMP and the independent variables TSI, MEI, AEROSOL and CO2.
The histogram with a bell-shaped curve shows that our error term (disturbance) follows the Normal distribution, hence satisfying the assumptions of the model.

Explanation of the Regression Analysis


Output:
REGRESSION STATISTICS
These are the “Goodness of Fit” measures. They tell you how well the calculated linear regression equation fits your data. Here, we will only consider the data of model 4, which is our best-fitted model as per the step-wise approach.

 R: This is the correlation coefficient between the observed and predicted values. It tells you how strong the linear relationship is. It ranges in value from 0 to 1; a small value indicates that there is little or no linear relationship between the dependent variable and the independent variables. In our results the R value is 0.848, which is close to 1; hence there is a strong linear relationship between the dependent and independent variables.

 R square: This is R², the coefficient of determination (for multiple regression, also called the coefficient of multiple determination). It is a statistical measure of how close the data are to the fitted regression line. In our model, R² = 0.719 means that 71.9% of the variation of the y-values around their mean is explained by the x-values.

 Adjusted R square: The adjusted R-squared is a modified version of R-squared that adjusts for the number of predictors in the model. It increases only if a new term improves the model more than would be expected by chance, and decreases when a predictor improves the model by less than expected by chance.

 Standard Error of the regression: An estimate of the standard deviation of the error term ε. This is not the same as the standard deviation in descriptive statistics. The standard error of the regression reflects the precision with which the regression coefficients are measured; if a coefficient is large compared to its standard error, then the coefficient is probably different from 0.
 DF2: We can see that our degrees of freedom are 303, which is n-(k+1) where n=308 and k=4.
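The arithmetic behind the adjusted R-squared can be checked directly. A short sketch using the figures quoted above (R² = 0.719, n = 308, k = 4); the formula is the standard adjustment, and the numbers are the ones reported for model 4:

```python
# Recomputing the adjusted R-squared from the reported figures:
# adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
r2, n, k = 0.719, 308, 4
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(adj_r2, 3))   # close to the unadjusted 0.719, since n >> k
```

Because n is much larger than k here, the penalty for the extra predictors is small and the adjusted value stays close to R² itself.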

EXPLANATION OF ANOVA TABLE OUTPUT:

ANOVA (the analysis of variance technique) is used to test the significance of the model.
1. Sum of Squares (SS).
2. Regression Mean Square (MS) = SS / Regression degrees of freedom.
3. Residual Mean Square (MS) = mean squared error (Residual SS / Residual
degrees of freedom).
4. F: Overall F test for the null hypothesis (Regression MS / Residual MS).
5. Significance F: The p-value associated with the F statistic.

This table indicates that the regression model predicts the dependent variable
significantly well. How do we know this? Look at the "Regression" row and go
to the "Sig." column. This indicates the statistical significance of the regression
model that was run. Here, p < 0.05, and indicates that, overall, the regression
model statistically significantly predicts the outcome variable (i.e., it is a good
fit for the data).
The table reports the F-test. The linear regression's F-test has the null hypothesis that there is no linear relationship between the variables (in other words, R² = 0). The F-test is highly significant; thus we can assume that there is a linear relationship between the variables in our model.
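The overall F statistic can also be reconstructed from R² and the degrees of freedom. A small sketch using the reported R² = 0.719 with k = 4 and n = 308 (the exact SPSS value may differ slightly through rounding):

```python
from scipy import stats

# Overall F-test from R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1)),
# compared against an F(k, n-k-1) distribution.
r2, n, k = 0.719, 308, 4
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p = stats.f.sf(F, k, n - k - 1)   # upper-tail p-value of the overall F-test
print(round(F, 1), p < 0.05)      # F is large, so p is far below 0.05
```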

Interpreting the Regression Coefficients:

 Model - SPSS allows you to specify multiple models in a single regression command. This tells you the number of the model being reported.

 This column shows the predictor variables (CO2, MEI, AEROSOLS and TSI). The first row (Constant) represents the constant, i.e. the Y intercept: the height of the regression line when it crosses the Y axis. In other words, this is the predicted value of TEMP when all other variables are 0.
 B - These are the values for the regression equation for predicting the
dependent variable from the independent variable. The regression
equation is presented in many different ways, for example:
Y(predicted) = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
The column of estimates provides the values for b0, b1, b2, b3 and b4 for this
equation.
1. CO2 - The coefficient for CO2 is 0.010. So for every unit increase in CO2, a 0.010 unit increase in temperature is predicted, holding all other variables constant.
2. MEI - For every unit increase in MEI, we expect a 0.068 unit increase in the temperature, holding all other variables constant.
3. Aerosols - The coefficient for Aerosols is -1.721. So for every unit increase in Aerosols, we expect an approximately 1.721 unit decrease in the temperature, holding all other variables constant.
4. TSI - The coefficient for TSI is 0.099. So for every unit increase in TSI, we expect a 0.099 unit increase in the temperature, holding all other variables constant.

 Std. Error - These are the standard errors associated with the
coefficients.

 Beta - These are the standardized coefficients. These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent and all of the independent variables, and ran the regression. By standardizing the variables before running the regression, you have put all of the variables on the same scale, and you can compare the magnitude of the coefficients to see which one has more of an effect. You will also notice that the larger betas are associated with larger t-values and lower p-values.
 t and Sig. - These are the t-statistics and their associated 2-tailed p-
values used in testing whether a given coefficient is significantly
different from zero. Using an alpha of 0.05:
1. The coefficient for CO2 (0.010) is significantly different from 0
because its p-value is 0.000, which is smaller than 0.05.
2. The coefficient for MEI (0.068) is significantly different from 0
because its p-value is 0.000, which is smaller than 0.05.
3. The coefficient for Aerosols (-1.721) is significantly different from 0 because its p-value, 0.000, is definitely smaller than 0.05.
4. The coefficient for TSI (0.099) is statistically significant because its p-value of 0.000 is less than 0.05.
The intercept is significantly different from 0 at the 0.05 alpha level.
HETEROSCEDASTICITY
When the error terms for different sample observations have the same variance, the situation is called homoscedasticity. The opposite of homoscedasticity is called heteroscedasticity. We are interested in detecting whether heteroscedasticity is present in the model. This can be done using a test based on Spearman's rank correlation.

Calculate the Spearman's rank correlation coefficient between the predicted dependent variable (Ŷ) and the absolute residuals (|ε̂|) as follows:

r_s = 1 - (6 Σ di²) / (n(n² - 1))

where di is the difference between the rank of Ŷi and the rank of |ε̂i|.
We set up the hypotheses as follows:

Null Hypothesis H0: ρ = 0 (absence of heteroscedasticity)

Alternative Hypothesis H1: ρ ≠ 0 (presence of heteroscedasticity)


where ρ is the population rank correlation between the predicted dependent variable (Ŷ) and the absolute residual (|ε̂|).

Define the t-statistic as follows:

t = r_s √(n - 2) / √(1 - r_s²)

Calculate the p-value p = P(t(n-2) > t), where t(n-2) denotes a Student's t random variable with n-2 degrees of freedom. If p < α/2 then reject the null hypothesis at the α level of significance, i.e. heteroscedasticity is taken to be present.
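The mechanics of this test can be sketched in code. The example below uses synthetic data in which the residual spread is deliberately made to grow with the fitted value, so the test should reject the null; the variable names and scales are illustrative, not from the study's dataset, and a two-sided p-value is used, which is equivalent to comparing the one-tail p against α/2.

```python
import numpy as np
from scipy import stats

# Synthetic illustration: absolute residuals constructed to grow with the
# fitted values, so the Spearman rank test should flag heteroscedasticity.
rng = np.random.default_rng(1)
n = 308
fitted = rng.uniform(1.0, 2.0, size=n)            # stand-in for Y-hat
abs_resid = 0.1 * fitted + np.abs(rng.normal(scale=0.02, size=n))

r_s, _ = stats.spearmanr(fitted, abs_resid)       # rank correlation r_s
t = r_s * np.sqrt(n - 2) / np.sqrt(1 - r_s**2)    # t-statistic defined above
p = 2 * stats.t.sf(abs(t), n - 2)                 # two-sided p-value
print(r_s > 0.5, p < 0.05)                        # heteroscedasticity flagged
```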
For the model under study, the p-value for the t-test is greater than 0.05. Hence we may conclude that there is no heteroscedasticity in our model, i.e. our model is homoscedastic in nature.
AUTOCORRELATION

We can see that the value of d under the Durbin-Watson test is d = 0.911.
We know that k = number of independent variables + 1; therefore k = 4 + 1 = 5.
Here, n = number of observations in our data; therefore n = 308.

Using the significance table to find the critical values dL and dU, we find dL = 1.78766 and dU = 1.83991.
We can see that the value of d lies between 0 and the lower limit dL.
This implies that positive autocorrelation is present in our model.
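For reference, the Durbin-Watson statistic is d = Σ(e_t - e_{t-1})² / Σ e_t², and d is roughly 2(1 - ρ̂), so values well below 2, like our 0.911, indicate positive autocorrelation. A sketch on synthetic AR(1) residuals (the AR coefficient and noise scale are illustrative assumptions):

```python
import numpy as np

# Durbin-Watson statistic d = sum((e_t - e_{t-1})^2) / sum(e_t^2) on
# synthetic AR(1) residuals; since d ~ 2*(1 - rho), positive
# autocorrelation (rho > 0) pulls d well below 2.
rng = np.random.default_rng(2)
n, rho = 308, 0.6                    # illustrative AR(1) coefficient
e = np.zeros(n)
for t in range(1, n):
    e[t] = rho * e[t - 1] + rng.normal(scale=0.1)

d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d < 2)                         # positive autocorrelation detected
```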

REMOVAL:
Here we make use of the COCHRANE-ORCUTT ITERATIVE METHOD.
Consider the model-

Yt = b0 + b1Xt1 + b2Xt2 + … + bkXtk + Ut    -(1)

where Ut is autocorrelated under an AR(1) scheme and is given by

Ut = P Ut-1 + Vt

Rewrite eqn (1) for period (t-1):

Yt-1 = b0 + b1X(t-1)1 + b2X(t-1)2 + … + bkX(t-1)k + Ut-1    -(2)

Multiply eqn (2) by P and subtract it from eqn (1), i.e. (3) = (1) - P(2):

Yt - P Yt-1 = b0(1-P) + b1(Xt1 - PX(t-1)1) + b2(Xt2 - PX(t-1)2) + … + bk(Xtk - PX(t-1)k) + (Ut - P Ut-1)    -(3)

Hence, this model becomes free from autocorrelation.


MULTICOLLINEARITY

We started by selecting the best regression model using the sequential (step-wise) approach. This removed all the variables exhibiting multicollinearity, i.e. all the variables with VIF > 2.5. Hence, it removed 5 of the variables from our analysis, and we were left with 4 independent variables whose VIF (variance inflation factor) values are less than 2.5.
Hence, our best-fitted model was free from multicollinearity.
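The VIF screen itself can be sketched as follows, on synthetic columns in which two predictors are nearly collinear (the 2.5 threshold is the one used above; the data and names are illustrative):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic illustration of the VIF screen: x2 is almost a copy of x1,
# so both get large VIFs, while the independent x3 stays near 1.
rng = np.random.default_rng(4)
n = 308
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)                        # independent predictor
X = np.column_stack([np.ones(n), x1, x2, x3])  # constant + predictors

vifs = [variance_inflation_factor(X, i) for i in range(1, 4)]
print([v > 2.5 for v in vifs])                 # x1, x2 flagged; x3 passes
```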
REFERENCES
 The Climatic Research Unit at the University of East Anglia.
 The ESRL/NOAA Global Monitoring Division.
 The Goddard Institute for Space Studies at NASA.
 The SOLARIS-HEPPA project website.
 The ESRL/NOAA Physical Sciences Division.
 https://en.wikipedia.org/wiki/Climate_change_in_the_United_Kingdom.
 https://climate.nasa.gov/blog/search/?q=GLOBAL+WARMING
