
Timothy Thomas
04/23/2013
Math 547-001

A Bridge Between Linear Algebra and Econometrics By Way of Least Squares

Simply put, linear algebra is the art of solving systems of linear equations (Bretscher 1). It allows us to solve for the unknowns within a system of linear equations as well as determine their transformation properties. Such problems arise frequently in many fields of study, including mathematics, statistics, physics, engineering, and economics. In this paper I will focus on the application of linear algebra within economics, particularly the use of linear regression analysis in econometrics. First I will provide some background on econometrics and linear regression. I will then discuss least-squares regression and data fitting, presenting a model for each to display the importance of these linear algebra applications.

To understand the importance of linear regression we must first establish the field in which it is often applied. That field, econometrics, is the application of mathematics and statistics to economics for the purpose of testing hypotheses and predicting future trends (Investopedia). It essentially allows an economist to create models from which economic forecasts can be made. Economists create models, test them using statistical tools, and then compare them to real-world data to see how well they match up and predict dependent variables. Take, for example, a graph examining the correlation between high school students' GPAs and their families' incomes (Pindyck and Rubinfeld 4-5). In this scenario the independent variable (X) is family income and the dependent variable (Y) is GPA. If one were to plot various students' GPAs against their family incomes, one could then apply linear regression and find a function fitted to the data. This application of linear regression could reveal a connection between family income and a student's GPA, and it could predict a student's GPA (Y) from family income (X).
It is in econometrics' ability to predict dependent variables and future trends that linear regression analysis, a tool built upon linear algebra, plays its key role. Linear regression analysis can be used to predict the value of a dependent variable from knowledge of the values of one or more independent variables (Ryan 1). Although it models a relationship between independent and dependent variables, linear regression does not establish a functional or exact relationship between them. Whereas in a functional relationship the dependent variable is not at first known, in a regression model this variable is usually known. Linear regression takes the independent and dependent variables and fits a linear equation to the data, thereby establishing a way to predict other dependent variables from the independent variables, the latter also known as regressors. The general formula for a linear regression model is:

Y = β0 + β1X + ε.

In this model, Y is the dependent variable, X is the independent variable, and β0 and β1 are the parameters being estimated. The error term is represented by ε, and it indicates that an exact relationship does not exist between X and Y. This formula will change as we move to later regression models, but for now it will hold. Calling this a "linear" regression model does not mean the relationship between X and Y must be representable by a straight line; rather, it means the model is linear in its parameters. Once β0 and β1 are found, the formula tells us several things about the relationship between X and Y: β1 is the change in Y per unit change in X, and β0 is the value the model predicts for Y when X is 0, which is also the Y-intercept when the error term is absent.
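To make the roles of β0, β1, and ε concrete, here is a minimal simulation sketch in Python; the parameter values and data below are invented purely for illustration and are not drawn from this paper:

```python
import numpy as np

# Hypothetical "true" parameters, chosen only for illustration.
beta0, beta1 = 2.0, 0.5

rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 50)            # independent variable (regressor)
eps = rng.normal(0.0, 0.3, size=X.size)   # error term: no exact X-Y relation
Y = beta0 + beta1 * X + eps               # the model Y = beta0 + beta1*X + eps

# beta1 is the change in Y per unit change in X; beta0 is the predicted Y
# at X = 0 (the Y-intercept when the error term is absent).
print(Y[:3])
```

Because ε has mean zero, the simulated points scatter around the line 2 + 0.5X rather than falling exactly on it, which is precisely what the error term in the model expresses.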

Now that we have established the basics of the linear regression model, we can move into solving actual models by way of a regression method. Linear algebra has not quite come up yet in introducing the basic model, but in showing a method of regression we will see that it is an integral part of finding patterns in data and predicting future outcomes. The method we will look into is regression by way of least-squares analysis. This method calculates a line of best fit to the data points on a graph by minimizing the sum of the squares of the vertical deviations from each data point to the line ("Linear Regression"). Deviations are squared so that positive and negative deviations do not cancel each other out; minimizing squared deviations also decreases the error that comes with fitting a line to the data.

To begin using least-squares regression, we first slightly rewrite the regression model as Yi = β0 + β1Xi + εi, where i = 1, 2, ..., n. Writing the ordered pairs (Xi, Yi) into the regression model yields a set of equations:

Y1 = β0 + β1X1 + ε1
Y2 = β0 + β1X2 + ε2
...
Yn = β0 + β1Xn + εn

Here we see that the linear regression model has been turned into a set of linear equations. From this set we can construct a matrix model, the linear regression in matrix notation:

[Y1]   [1  X1]          [ε1]
[Y2] = [1  X2] [β0]  +  [ε2]
[...]  [.....] [β1]     [...]
[Yn]   [1  Xn]          [εn]

which can be written simply as Y = Xβ + ε. Carrying out the matrix multiplication and addition shows that this notation reproduces the set of linear equations found from Yi = β0 + β1Xi + εi. In any given model the matrices X and Y are known, and β is the solution matrix one must find to obtain the least-squares regression model. The solution matrix β can be found using the formula:

β = [β0; β1] = (X^T X)^{-1} (X^T Y).
In this equation X^T represents the transpose of the matrix X, and (X^T X)^{-1} represents the inverse of the matrix X^T X. It must be explained, however, how one arrives at this formula (Simon). First, envision the goal of least-squares regression: minimizing the sum of squared residuals, represented by

ε^T ε = [ε1 ε2 ... εn] [ε1; ε2; ...; εn] = Σ εi².

We also know that ε = Y − Xβ, and thus ε^T ε = (Y − Xβ)^T (Y − Xβ). Taking the derivative of ε^T ε with respect to β leaves −2X^T (Y − Xβ). Setting this equal to the zero vector and solving for β, we arrive at X^T Y = (X^T X)β. It is from this that we arrive at the least-squares solution β = [β0; β1] = (X^T X)^{-1} (X^T Y). (Jennings)
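The derivation can be checked numerically; here is a short Python sketch (the five data points are made up for illustration) comparing the normal-equations formula against numpy's built-in least-squares solver:

```python
import numpy as np

# Illustrative data, not from the paper.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Design matrix: a column of ones (for beta0) beside the regressor (for beta1).
X = np.column_stack([np.ones_like(x), x])

# beta = (X^T X)^{-1} (X^T Y), the closed-form solution derived above.
beta = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# numpy's solver minimizes the same sum of squared residuals.
beta_check, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta, beta_check)
```

Both approaches give the same β, since `lstsq` minimizes exactly the sum of squared residuals that the normal equations solve in closed form.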

Now that we have a linear algebra based model for least-squares regression, we can apply this tool to create a model for real econometric data. The data set we will look at concerns unemployment rates and levels of education. Our goal will be to use least-squares regression to model the relationship between level of education (the independent variable) and unemployment rate (the dependent variable), a model that can then be used to predict what a person at a given level of education can expect in regard to employment. We will use data from January 2012 to December 2012 provided by the Bureau of Labor Statistics (attached at the back of this paper); each unemployment rate is a six-month average. Here is a scatter plot of the data:
[Scatter plot: average unemployment rate (4%-12%) by education level (below high school diploma, high school diploma, some college, bachelor's degree and above), with separate series for Jan. - Jun. and Jul. - Dec. 2012.]

As can be seen, we have two sets of data, one occurring during the first half of 2012 and the second during the latter half of the year, so we will solve for two models, one for each period. In order to solve for the parameters of our models, we must set up the matrix notation for the least-squares solution. Starting with the period of January through June, the matrices X, Y, and X^T must be determined from the data. In finding the X matrix, it should be noted that the independent variable is qualitative rather than quantitative. To make solving for the model easier, we simply assign the numbers 1 through 4 to the qualitative values, 1 being below a high school diploma and 4 being a B.S./B.A. or above. The matrix notation for level of education (writing rows separated by semicolons) is then

X = [1 1; 1 2; 1 3; 1 4],

the unemployment rates for the first part of the year are Y1 = [12.8, 8.2, 7.5, 4.1]^T, and for the second part Y2 = [12.0, 8.4, 6.8, 4.0]^T. Setting up the two least-squares solutions, with X^T = [1 1 1 1; 1 2 3 4], we arrive at:

(1)  β(1) = (X^T X)^{-1} (X^T Y1)
(2)  β(2) = (X^T X)^{-1} (X^T Y2)

We find that X^T X = [4 10; 10 30] and (X^T X)^{-1} = [1.5 -0.5; -0.5 0.2], while X^T Y1 = [32.6, 68.1]^T and X^T Y2 = [31.2, 65.2]^T. We can now multiply (X^T X)^{-1} by X^T Y1 and X^T Y2 to find our solution matrices β(1) and β(2). We arrive at the solutions β(1) = [14.85, -2.68]^T and β(2) = [14.2, -2.56]^T, which give us the least-squares regression function Y1 = 14.85 - 2.68X for the first half of 2012 and Y2 = 14.2 - 2.56X for the second half of 2012. Using the functions we have just found, we can plot them alongside the points on our previous graph.
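These hand computations are easy to verify in Python; the sketch below reproduces both parameter vectors from the data, along with predictions at X = 3.5 (roughly three years of undergraduate study):

```python
import numpy as np

# Education levels coded 1-4; six-month average unemployment rates (BLS data).
x  = np.array([1.0, 2.0, 3.0, 4.0])
Y1 = np.array([12.8, 8.2, 7.5, 4.1])   # Jan.-Jun. 2012
Y2 = np.array([12.0, 8.4, 6.8, 4.0])   # Jul.-Dec. 2012

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, education level]

XtX_inv = np.linalg.inv(X.T @ X)            # [[1.5, -0.5], [-0.5, 0.2]]
b1 = XtX_inv @ (X.T @ Y1)                   # parameters for the first half
b2 = XtX_inv @ (X.T @ Y2)                   # parameters for the second half
print(b1)                                   # [14.85, -2.68]
print(b2)                                   # [14.2, -2.56]

# Predicted unemployment rates at X = 3.5:
print(b1 @ [1.0, 3.5], b2 @ [1.0, 3.5])    # about 5.47 and 5.24
```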
[Plot: Jul. - Dec. 2012 data with its fitted regression line.]

[Plot: Jan. - Jun. 2012 data with the fitted line Y1 = 14.85 - 2.68X.]

What we find is that our data are closely modeled by the two least-squares regression functions we have found. The two functions allow one to estimate future dependent variables, in this case unemployment rates, in relation to given independent variables, the education levels. For example, I could estimate that an individual who has gone through three years of undergraduate study had an unemployment rate of about 5.5% during the first half of 2012, and 5.2% in the second half; these estimates were made by simply plugging X = 3.5 into each regression equation. From the data we can also conclude that as one's education level increases, one's unemployment rate decreases. It can then be conjectured that a person with a doctorate or master's degree will have a lower unemployment rate than a person with a bachelor's degree. Econometricians use least-squares regression to make estimates such as these, and in creating such models they can foretell future trends and see patterns in the data that may indicate correlations between variables.

Similar to least-squares regression analysis is data fitting. While similar in its approach of creating a model by a least-squares solution, data fitting allows functions to step beyond simple y = mx + b linear equations and instead tackle quadratic, cubic, exponential, trigonometric, rational, or general polynomial models (Bretscher 241). The solution β = (X^T X)^{-1} (X^T Y), as previously stated, still holds in this type of modeling, so we will not have to re-derive its linear algebra foundations. Econometricians use this statistical tool so that they are not limited to linear models, and so that they can predict outcomes from data that mimics any type of function. To illustrate data fitting as an important tool, we will look at a simple microeconomic data set concerning utility. We will define utility as a measure of happiness, with utils as the arbitrary unit by which utility is measured. Consider a made-up person, Samantha, whose utility is measured simply by how many Reese's cups she is able to consume. After questioning Samantha, we obtain this graph representing her utility from various numbers of Reese's cups (table attached at the back):

[Figure: "Utility of Reese's" - scatter plot of utils (0-30) against number of Reese's cups (0-30).]

We find that the data seem to take on a quadratic shape, so we will utilize the formula Y = c0 + c1X + c2X^2. By plugging the points on the graph into this formula, we arrive at the set of linear equations:

5 = c0 + c1 + c2
16 = c0 + 5c1 + 25c2
21 = c0 + 9c1 + 81c2
31 = c0 + 15c1 + 225c2
21 = c0 + 21c1 + 441c2
13 = c0 + 28c1 + 784c2

Since we have already entered X, Y, and the parameters (cn, where n = 0, 1, 2) into matrix notation before, we will not do so again. From the set above, the design matrix and observation vector are

X = [1 1 1; 1 5 25; 1 9 81; 1 15 225; 1 21 441; 1 28 784],   Y = [5, 16, 21, 31, 21, 13]^T,

and the least-squares solution is

c* = [c0; c1; c2] = (X^T X)^{-1} (X^T Y).
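As a numerical check, this quadratic least-squares problem can be solved directly in Python; the sketch below computes c* from the table's data and also the utility-maximizing number of cups:

```python
import numpy as np

# Samantha's data from the appendix table: Reese's cups vs. utils.
x = np.array([1.0, 5.0, 9.0, 15.0, 21.0, 28.0])
Y = np.array([5.0, 16.0, 21.0, 31.0, 21.0, 13.0])

# Design matrix for the quadratic model Y = c0 + c1*X + c2*X^2.
X = np.column_stack([np.ones_like(x), x, x**2])

c = np.linalg.inv(X.T @ X) @ (X.T @ Y)   # c* = (X^T X)^{-1} (X^T Y)
print(c)                                  # roughly [2.069, 3.201, -0.102]

# The fitted parabola opens downward (c2 < 0), so utility peaks at
# the vertex X = -c1 / (2*c2).
print(-c[1] / (2.0 * c[2]))               # roughly 15.7 cups
```

Carrying full precision matters here: the entries of (X^T X)^{-1} are small, and rounding them before multiplying by X^T Y noticeably distorts the solution.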

We find that

X^T X = [6 79 1557; 79 1557 35443; 1557 35443 866949]   and   X^T Y = [107, 1544, 28534]^T.

Computing the inverse, whose entries are small and must be carried to several decimal places to avoid rounding error,

(X^T X)^{-1} = [0.9773 -0.1389 0.0039; -0.1389 0.0290 -0.0009; 0.0039 -0.0009 0.00003],

we can now find the solution c* = (X^T X)^{-1} (X^T Y), which is approximately [2.069, 3.201, -0.102]^T. Thus, we know that Samantha's utility from Reese's cups can be modeled by the equation Y = 2.069 + 3.201X - 0.102X^2. With this function an econometrician could estimate how many Reese's cups would maximize Samantha's utility by finding the maximum point of the quadratic, X = -c1/(2c2), which is about 15.7, or roughly 16 cups. The econometrician also knows that providing more than the maximizing number of Reese's cups is a waste of resources and not beneficial to Samantha.

As can be seen from these examples of least-squares regression and data fitting, the study of econometrics can utilize linear algebra to find ways to model data. Modeling data in matrix notation allows economists to predict future outcomes and observe patterns within the data to see whether two variables are correlated. It should be stated that such linear algebra applications are not limited to utility or to finding patterns between education and employment; they are also useful in a variety of macroeconomic and microeconomic studies. We built all of our models by way of least-squares approximations, but it must be said that other statistical tools, utilizing other forms of regression analysis, also drive solutions in econometrics. Although we covered only a few ways that linear algebra can model data within econometrics, linear algebra has many other applications within this field and others that have made modeling and predicting data much easier.

Works Cited

Brannick, Michael T. "Regression with Matrix Algebra." University of South Florida. Web. 3 Mar. 2013. <http://luna.cas.usf.edu/~mbrannic/files/regression/regma.htm>.

"Econometrics." Investopedia. N.p., n.d. Web. 1 Apr. 2013. <http://www.investopedia.com/terms/e/econometrics.asp>.

Jennings, Kristofer. Statistics 512: Applied Linear Models. Purdue University. PDF file.

"Linear Regression." Yale Statistics. Yale University. Web. 3 Mar. 2013. <http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm>.

Pindyck, Robert, and Daniel Rubinfeld. Econometric Models and Economic Forecasts. 4th ed. Boston: Irwin McGraw-Hill, 1998. Print.

Rencher, Alvin C. Methods of Multivariate Analysis. Hoboken, New Jersey: Wiley, 2012. Print.

Robinson, Enders A. Least Squares Regression Analysis in Terms of Linear Algebra. Houston: Goose Pond Press, 1981. Print.

Ryan, Thomas P. Modern Regression Methods. 2nd ed. Hoboken: Wiley, 2009. Print.

Simon, Laura J. "Lesson #8: Overview of Multiple Linear Regression." Penn State Department of Statistics. The Pennsylvania State University. Web. 3 Mar. 2013. <http://online.stat.psu.edu/online/development/stat501/08multiple/07multiple_matrix.htm>.

"U.S. Bureau of Labor Statistics." U.S. Bureau of Labor Statistics, n.d. Web. 30 Mar. 2013. <http://www.bls.gov/>.

Samantha's Preferences

# of Reese's cups | Utils
                1 |     5
                5 |    16
                9 |    21
               15 |    31
               21 |    21
               28 |    13
