ECO 231 Empirical Project Mets Attendance

Empirical Project: Mets Attendance (1986-2011)
Kevin Mulcahey ECO-231 Dr. Letcher The College of New Jersey
I. Statement of the Problem
This empirical study seeks to identify the explanatory variables that best explain the stadium attendance of the New York Mets from 1986 to 2011. The explanatory variables I have chosen to study are years since last playoff appearance, payroll (adjusted for inflation), number of all-stars, winning percentage, and average batters' age. To discover the relationship between these explanatory variables and the dependent variable, attendance, I employ multiple regression, residual plots, normal probability plots, and other statistical analyses. I also use the resulting regression equation to predict the Mets' stadium attendance at Citi Field in 2012.
II. Review of Literature Related to the Variables
Before beginning the study, I consulted statistical journals and analyses regarding Major League Baseball attendance, payroll, winning percentages, and other variables. The first journal article, written by Don N. MacDonald and Morgan O. Reynolds of Texas A&M University, analyzes the relationship between players' pay and their marginal product, using many of the same explanatory variables as my empirical study. Marginal product is defined here as the additional revenue a firm earns by hiring one extra unit of labor. Their research on the 1986 and 1987 Major League Baseball seasons suggests that players are paid in line with what they earn for their teams in ticket sales. Payroll correlates directly with revenue for MLB organizations largely because of the institutions of free agency and final-offer arbitration. Free agency allows players to test the market and seek the best offer for their abilities among all major league teams. Arbitration allows for pay increases during contracts, based upon performance. These two contractual outlets for players
allow their marginal revenue product to correlate more directly with ticket sales, attendance, and team revenue. These findings relate closely to my data, as my dependent variable is attendance and the explanatory variable with the lowest p-value is payroll (adjusted for inflation). My data also begins in 1986, like the study performed by MacDonald and Reynolds. Arbitration officially began in 1970, when a second MLB collective bargaining agreement allowed impartial arbitrators, rather than the commissioner of baseball, to settle player contract disputes. In the 1985 season, arbitrators found that owners across Major League Baseball were colluding to keep player salaries artificially low by reducing competitive bidding. The owners were ordered to pay $280 million in damages for their collusion, and team payrolls have risen steadily since.
The article by MacDonald and Reynolds also relates to another of my variables. Winning percentage, they say, is a less significant predictor of attendance than statistics that forecast a team's success. For example, once a team's winning percentage has risen from .500 to .550, a further increase from .550 to .600 will not make a noticeable difference in stadium attendance. This is because fans view entertainment on an "ex ante" basis rather than an "ex post" basis (MacDonald, 445). In other words, the forecast of a team's success is more conducive to sales and attendance than its realized performance. People are more likely to buy tickets when they expect a team to perform well than once a team is already doing well. Relating this idea to my variables, the number of all-stars and team payroll should be more significant predictors of attendance than winning percentage.
Payroll and all-stars are similarly related because when a roster is popular and high quality, attendance is more likely to increase in a given season.
Another journal article, written by Michael C. Davis of the University of Missouri-Rolla, analyzes the interaction between baseball attendance and winning percentage. According to Davis, the interaction between baseball attendance and winning may not be completely obvious. It is expected that as a team performs well, the organization's "bandwagon effect" will come to fruition, and a team should therefore expect to see an increase in attendance during and following seasons in which it played well on the field (Davis, 4). The article implies that although winning percentage affects attendance directly (winning has become increasingly important to fans in recent years), attendance could also affect winning percentage. When a team generates superior attendance and revenue, winning percentage should rise because successful organizations have more room in their budgets to acquire high-quality players. In my regression, I chose payroll and winning percentage as explanatory variables and attendance as the dependent variable. Davis's study concluded that in the long term, all ten of the sampled teams (Cubs, Reds, Yankees, White Sox, Phillies, Pirates, Indians, Tigers, Cardinals, Red Sox) showed positive attendance growth with winning percentage as an explanatory variable. By contrast, only one team, the Indians, showed a positive effect on winning percentage with attendance as an explanatory variable. This indicates that attendance is the better choice of dependent variable between the two, and that winning percentage works better as an explanatory variable in Major League Baseball.
A final statistical analysis, conducted by market research analyst David P. Kronheim, takes into account another variable that has affected New York Mets attendance in the past 3 years.
This analysis discusses the effect of the stadium, which matters when considered alongside the numerical data I employed in my analysis. Kronheim raises the point that when the Mets moved to their new stadium, Citi Field, in 2009, attendance was going to decrease regardless of performance. Shea Stadium had 57,365 seats, whereas Citi Field has only 41,800. From 2005 to 2007, the Mets had gains in attendance of more than 470,000 per year, which made them one of the top 2 teams in the National League in attendance increases. The Mets were very competitive during these last few years at Shea Stadium. Kronheim notes that "If the Mets had sold every single ticket possible in 2009, including player and comp tickets, their attendance still would have fallen by 656,243" (Kronheim, 31). However, the Mets still had strong attendance in 2009 at Citi Field, as 3,168,571 spectators attended. The large drop-off of 576,166 from 2009 to 2010 (an 18.4% decline) has to do with the smaller number of seats, as well as other statistics. The Mets fell below a .500 winning percentage again in 2009-2011 and had fewer all-stars. In fact, the smallest attendance at any Mets home game in 2008 was 45,321, which is more than 3,500 higher than Citi Field's capacity (Kronheim, 31). The Mets were playoff contenders in their last year at Shea Stadium, although they missed the playoffs in the last game of the season. Attendance that year was 4,042,045.
I decided not to include the type of stadium in my own regression analysis, because the data dates back to 1986 and the Mets have only been at Citi Field since 2009. The majority of the analysis comes from Shea Stadium from 1986-2008, and the new stadium statistics would only appear to be outliers. However, I wanted to include this market research analysis in my report because it could partially explain the drastic drop in the most recent data I have compiled (2009-2011).
III. Data Sources and Descriptions
In compiling the Mets data set from 1986 to 2011, I used two main sources. For the dependent variable of attendance, as well as the explanatory variables of winning percentage, years since last playoff appearance, payroll, and average batters' age, I used BaseballReference.com. To find the number of all-stars per year for the New York Mets, I used Mets.com. I also adjusted the payroll for each year from 1986 to 2010 for inflation in order to have the most accurate comparison possible. The inflation calculator on bls.gov aided me in this process.
As mentioned earlier, my data organizes the effect of years since last playoff appearance, payroll (adjusted for inflation), number of all-stars, winning percentage, and average batters' age on stadium attendance for the New York Mets from 1986 to 2011 (Figure 1). For the first few years of the data, namely 1986 to 1990, the Mets were very successful. After they made the playoffs in 1986 and 1988, and won the World Series in 1986, total season stadium attendance ranged from 2.7 million to 3 million. In these 5 years, the Mets had high winning percentages ranging from .537 to .667, and a total of 19 all-stars, which is an extremely high number. In direct contrast to 1986-1990, the Mets performed very poorly from 1991 to 1998. The average batters' age during these years was much lower than during successful years (27, as opposed to 30 in 2000, when they made it to the World Series), winning percentages were in the dismal range of .364 to .478, and payroll was much lower, highlighted by the $35,015,247.14 team payroll in 1996. Attendance in these years was very low, as it rarely broke 2 million. The Mets performed poorly again from 2001 to 2005, performed well from 2006 to 2008, and performed poorly again from 2009 to 2011. These Mets eras show attendance, the dependent variable, fluctuating in step with most of the explanatory variables.
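The inflation adjustment described above can be sketched as a simple price-index ratio. This is only an illustration, assuming the adjustment scales each season's nominal payroll by the ratio of a target-year index to that season's index; the function name and the index values below are hypothetical placeholders, not actual BLS figures or the exact calculator output used in this study.

```python
def adjust_for_inflation(nominal: float, cpi_then: float, cpi_now: float) -> float:
    """Convert a nominal dollar amount into target-year dollars using an index ratio."""
    if cpi_then <= 0:
        raise ValueError("cpi_then must be positive")
    return nominal * (cpi_now / cpi_then)


# Placeholder index values chosen for easy arithmetic (NOT real CPI data):
# prices doubled between the two years, so the payroll doubles in real terms.
example = adjust_for_inflation(nominal=20_000_000, cpi_then=110.0, cpi_now=220.0)
print(f"{example:,.0f}")
```

Expressing every season's payroll in the same year's dollars is what makes payrolls from 1986 and 2011 comparable in a single regression.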
IV. Scatter Plots, Multiple Regression, Variable Selection, and Analysis
I have identified my explanatory variables, or X-variables, in this study as years since last playoff appearance (YSLPA), payroll adjusted for inflation (Payroll), winning percentage (Win%), number of all-stars (All-stars), and average batters' age (Avg. Batt. Age). The dependent variable, or Y-variable, is attendance (Attendance). The data set is made up entirely of numeric explanatory variables. First, individual scatter plots of each explanatory variable were created. The scatter plot of the X-variable YSLPA against the Y-variable Attendance is shown below.
YSLPA v. Attendance (Figure 2)
[Scatter plot: Attendance (y-axis) against Years Since Last Playoff Appearance (x-axis), with cubic trend line $y = 12415x^3 - 159530x^2 + 290507x + 3 \times 10^{6}$, $R^2 = 0.6341$]
The scatter plot of YSLPA against Attendance indicates that as the number of years since the Mets last reached the playoffs increases, attendance decreases. Originally, I tried a linear trend line to fit the data. The line fit the data relatively well, aside from three data points in years 8 through 10. The R-square for the linear fit was only .4, however, so I tried quadratic and cubic equations to reflect the curvature of the data in years 8, 9, and 10.
With the cubic equation, the R-square improved dramatically from .4 to .634. Although it would appear that a straight, negatively sloped line should fit the data, there could be an explanation for the curvature. In years 8, 9, and 10 of missed playoff berths, according to the data set, the Mets were starting to come out of their decade-long slump to become possible playoff contenders. It is possible that the fans, in anticipation of the team's better performance and potential playoff implications, started to attend more games.

Payroll v. Attendance (Figure 3)
[Scatter plot: Attendance (y-axis) against Payroll (x-axis), with quartic trend line $y = 2 \times 10^{-25}x^4 - 7 \times 10^{-17}x^3 + 1 \times 10^{-8}x^2 - 0.5384x + 1 \times 10^{7}$, $R^2 = 0.4812$]

Payroll, my second X-variable, has a scatter plot that reveals some curvature as well. Using the R-square as a measure of fit, I decided that a linear equation was not appropriate. A linear equation had an R-square of .1, while the fourth-order (quartic) equation had an R-square of .48. In choosing a fourth-order equation, I took into account the R-squares of the quadratic and cubic equations, and decided that parsimony did not apply. Each increase in order improved the R-square by .08 to .10. These gains in goodness of fit were therefore large enough to justify the higher-order equation.
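One conventional way to weigh goodness of fit against parsimony is the adjusted R-square, which penalizes each added polynomial term. The sketch below is not the comparison performed in this study (which used the plain R-square); it assumes all 26 seasons appear in the scatter plot and takes the linear R-square as the rounded value .10 reported above.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-square for a fit with k predictor terms and n observations."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than fitted terms")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_SEASONS = 26  # 1986 through 2011 inclusive (assumed to all be in the plot)

# (label, reported R-square, number of polynomial terms)
for label, r2, k in [("linear", 0.10, 1), ("quartic", 0.4812, 4)]:
    print(f"{label}: R2 = {r2:.4f}, adjusted R2 = {adjusted_r2(r2, N_SEASONS, k):.4f}")
```

Under these assumptions the quartic fit still has the higher adjusted value, which is in line with the decision to keep the fourth-order equation.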
For my third X-variable, All-stars, I analyzed the coefficient of determination, the R-square, once again. Although a linear equation had a decent fit at .397, I still opted for a cubic polynomial equation. The R-square for this equation was .419.
All-stars v. Attendance (Figure 4)
[Scatter plot: Attendance (y-axis) against All-stars (x-axis), with cubic trend line $y = -41835x^3 + 270645x^2 - 49072x + 2 \times 10^{6}$, $R^2 = 0.4192$]
Clearly, as teams acquire more talented players, attendance increases. However, there is still some curvature that prevents the data from being linear. A possible explanation is that as a team goes from 1 to 2 to 3 all-stars, its prospects for attendance rise dramatically, but the excitement fans have for all-stars 4, 5, and 6 increases at a slower rate. While attendance is still higher, this may suggest that a team only performs marginally better with more than 3 or 4 all-stars.
The fourth X-variable, Win%, has a somewhat sporadic scatter plot. The goodness of fit, regardless of the type of equation, is relatively low. The highest R-square I was able to attain was .284, with a cubic equation. The curvature indicates that attendance is low for winning percentages from .35 to .45. This may suggest that, whether the winning percentage is at the higher or lower end of this range, fans do not wish to attend games because the team is not competitive within it. There are, however, dramatic increases in attendance for winning percentages from .45 to .55. With these percentages, the Mets have a chance at the playoffs.
Win% v. Attendance (Figure 5)
[Scatter plot: Attendance (y-axis) against Win% (x-axis), with cubic trend line $y = -2 \times 10^{8}x^3 + 4 \times 10^{8}x^2 - 2 \times 10^{8}x + 3 \times 10^{7}$, $R^2 = 0.2839$]
There are still modest increases in attendance from .55 to .6, before it levels off. The curve then shows attendance actually decreasing at the winning percentage of .667, but this could reflect an outlier, because the majority of the data has already leveled off from .6 to .65. My final X-variable is the Mets' average batters' age, Avg. Batt. Age, for each season from 1986 to 2011.
Avg. Batt. Age v. Attendance (Figure 7)
[Scatter plot: Attendance (y-axis) against Avg. Batt. Age (x-axis), with quartic trend line $y = -93161x^4 + 1 \times 10^{7}x^3 - 5 \times 10^{8}x^2 + 1 \times 10^{10}x - 7 \times 10^{10}$, $R^2 = 0.3337$]
For this fifth variable, a polynomial equation was appropriate once again. A quartic equation had the highest R-square, with a value of .334. A linear equation would not have been as appropriate, because attendance appears to rise between average batters' ages of 27.5 and 28, before leveling off from 28.5 to 29.5. Attendance then rises dramatically from the age of 30 and up. A possible explanation for this curvature is that an older lineup may have more experience and perform better, and thus affect attendance. This variable, however, proved to be an insignificant predictor of attendance, as shown by the multiple regression performed in the subsequent portion of this study.
Additional statistical analysis, aside from scatter plots and goodness of fit, is required in order to discover significant predictors of attendance. A multiple regression of the dependent variable, attendance, on every explanatory variable indicated that a form of variable selection was necessary (Figure 8).
First, a global F-test was run on the full model to decide whether any of the variables were significant predictors. The hypotheses for the global F-test are as follows:
$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
$H_a: \text{at least one } \beta_j \neq 0$
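For reference, the statistic behind a global F-test can be written in terms of the regression's R-square. The expression below is the standard textbook form rather than output taken from Figure 8; it assumes $k = 5$ explanatory variables and $n = 26$ seasons (1986 through 2011), which gives 5 and 20 degrees of freedom.

```latex
F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)}
  = \frac{R^2 / 5}{(1 - R^2) / 20},
\qquad F \sim F_{5,\,20} \ \text{under } H_0 .
```

A large value of $F$, and therefore a small P-value, is evidence against the null hypothesis that every slope coefficient is zero.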
For the alpha level of the hypothesis test, I decided to use .15. My reasoning is that studies in the social sciences involve human beings, and there is usually more variation. Therefore, I do not want to eliminate any variables that could be found significant. The global F-test showed a very low P-value of .00002 (Figure 8). Based on this P-value, I rejected the null hypothesis and concluded that at least one of the explanatory variables is significant.
After conducting the global F-test and deciding that at least one variable was significant, I chose to use backward selection to narrow down my set of explanatory variables. Right away, according to the first regression (Figure 8), the X-variable Average Batters' Age had an extremely high P-value of .6058. This P-value is much higher than the alpha level, so the variable is eliminated. Figure 9 reflects the new multiple regression without the eliminated variable.
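The backward selection procedure described above can be sketched as a loop. This is a generic outline, not the spreadsheet workflow actually used in this study: `fit_p_values` is a hypothetical callable, supplied by the reader, that refits the regression on the given variables and returns each variable's P-value.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    variables: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.15,
) -> List[str]:
    """Drop the least significant variable until every P-value is below alpha."""
    kept = list(variables)
    while kept:
        p_values = fit_p_values(kept)  # refit using only the variables still kept
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant at this alpha
        kept.remove(worst)  # eliminate one variable per pass, then refit
    return kept
```

With an alpha of .15, a variable whose P-value is .6058 would be removed on the first pass, and the loop would stop as soon as a refit leaves every remaining P-value below .15.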
After Average Batters' Age was eliminated, the P-values for every other variable were well below the alpha level of .15, so each was deemed a significant predictor of attendance. The overall R-square for the second multiple regression, with only 4 variables, fell by just .004, and the standard error increased only minimally. These changes are too small to indicate that removing Average Batters' Age weakens the regression equation. Also, the P-value of the global F-test fell even lower after the removal, indicating that the right decision was made.
V. Forming the Final Regression Equation and Testing Assumptions
After running two multiple regressions and employing backward selection once, I was able to form the following regression equation:
$\hat{y} = 393803.6703 - 80945.75352\,X_1 + 0.008003511\,X_2 + 176153.9487\,X_3 + 2567089.4\,X_4$
1. $\hat{y}$ = Predicted Mets Attendance
2. $X_1$ = Years Since Last Playoff Appearance
3. $X_2$ = Payroll (adjusted for inflation)
4. $X_3$ = Number of All-Stars
5. $X_4$ = Winning Percentage
Before recommending a regression equation, the disturbances must be checked to make sure that they do not violate any of the assumptions. The four assumptions are that the disturbances have an expected value of zero, have constant variance, are normally distributed, and are independent. To check these assumptions, a residual plot of the residuals versus the predicted y-values is made.
Residuals v. Predicted Attendance (Figure 10)
[Residual plot: Residuals (y-axis) against Predicted Attendance (x-axis)]
In the residual plot, the residuals appear to be randomly scattered around zero. There are no concave or fanning patterns, and almost all of the residuals fall within two standard deviations of zero, consistent with the empirical rule. This is a very desirable residual plot that does not violate any of the assumptions about the disturbances.
Another check that the residuals follow a normal distribution is a normal probability plot. In this graph, the residuals are plotted against Z-scores, the values expected from a standard normal distribution. The main criterion for judging whether a normal probability plot is desirable is the straightness of the points.
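The coordinates of a normal probability plot can be computed with the standard library alone. The sketch below is one common construction and is not necessarily the one behind Figure 11: it assumes the plotting position $(i - 0.5)/n$ for the $i$-th smallest residual, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_plot_points(residuals: List[float]) -> List[Tuple[float, float]]:
    """Pair each sorted residual with the Z-score of its plotting position."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(residuals)
    # (i - 0.5) / n is one plotting-position convention; other conventions exist.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(ordered, start=1)
    ]
```

If the residuals are roughly normal, plotting the second element of each pair against the first gives points that lie close to a straight line.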
Normal Probability Plot (Figure 11)
[Normal probability plot: Residuals (y-axis) against Z-Values (x-axis)]
Upon review, the normal probability plot is almost completely straight, which is desirable. The residuals appear normally distributed according to both the residual plot and the normal probability plot, and no assumptions appear to be violated.
VI. Prediction from the Regression Equation
After determining that my equation was reasonable and that none of the assumptions about the disturbances were violated, I made a prediction of the Mets' stadium attendance for the 2012 season. In choosing the values for my explanatory variables, I took several considerations into account. First, the Mets have payroll constraints for the 2012 season that have been set by GM Sandy Alderson. He is restricting the Mets' payroll to between $100 million and $110 million. I decided to take the average and set Payroll at $105 million. For my All-stars variable, I recognized that the Mets will probably not be able to re-sign 2-time all-star Jose Reyes. I decided to set the total number of Mets all-stars next year at 3. The number of years since their last playoff appearance, YSLPA, will increase from 5 to 6 in 2012. Finally, I decided to increase the Mets' winning percentage, Win%, to .531, under the assumption that they will not have as many injuries as in 2011 and that they will acquire a better set of relief pitchers. My prediction appears below:
Predicted Mets Attendance (2012):
$\hat{y} = 393803.6703 - 80945.75352(6) + 0.008003511(105{,}000{,}000) + 176153.9487(3) + 2567089.4(0.531) \approx 2{,}640{,}084$
where the four terms after the intercept correspond to YSLPA, Payroll, All-stars, and Win%, respectively.
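The arithmetic behind this prediction can be reproduced with a few lines of code. The coefficients and 2012 inputs are the ones stated in this paper; the function and variable names are only illustrative.

```python
# Coefficients of the final four-variable regression equation (from Section V).
COEFFICIENTS = {
    "intercept": 393803.6703,
    "YSLPA": -80945.75352,
    "Payroll": 0.008003511,
    "All-stars": 176153.9487,
    "Win%": 2567089.4,
}


def predict_attendance(yslpa: float, payroll: float, all_stars: float, win_pct: float) -> float:
    """Evaluate the fitted regression equation for one season's inputs."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["YSLPA"] * yslpa
        + COEFFICIENTS["Payroll"] * payroll
        + COEFFICIENTS["All-stars"] * all_stars
        + COEFFICIENTS["Win%"] * win_pct
    )


# Assumed 2012 inputs: 6 years since playoffs, $105 million payroll, 3 all-stars, .531 Win%.
prediction_2012 = predict_attendance(6, 105_000_000, 3, 0.531)
print(round(prediction_2012))  # 2640084
```

Changing a single argument shows how sensitive the forecast is to each assumption; for example, one fewer all-star lowers the prediction by the All-stars coefficient, about 176,154.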
This prediction is reasonable, as it reflects an increase in attendance of 287,488 for the 2012 season. If the Mets have more wins and one more all-star than in 2011 (indicating popular, high-caliber players for fans to see), it is reasonable that close to 300,000 more fans could attend games next season. However, the Mets' payroll and the number of years since their last playoff appearance will both worsen. This equation suggests that popular players and wins are more important to fans than previous years' success or the total team payroll.
VII. Suggestions for Future Research
Having completed my empirical project on Mets attendance, there are many additional factors I would study if I had the time and money. One issue with my regression is that I wanted to invert my years since last playoff appearance (YSLPA) variable, but the zero values became undefined. I tried multiple remedies for this problem, but they all involved manipulations of the data set. Inversion would have dramatically improved my R-square and would have fit the data much more appropriately. Inverting an X-variable is typically appropriate when a scatter plot reveals a downward-sloping curve that flattens out over time. My scatter plot of YSLPA v. Attendance fits this description, and my standard error, R-square, and adjusted R-square all would have improved. I would be interested to see how this issue could be resolved.
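One frequently used workaround for inverting a variable that contains zeros is to shift it before taking the reciprocal. The sketch below is offered only as an illustration of that idea; it is itself a manipulation of the data, the shift of 1 is an arbitrary assumption, and it was not applied anywhere in this study.

```python
from typing import Iterable, List


def shifted_reciprocal(values: Iterable[float], shift: float = 1.0) -> List[float]:
    """Return 1 / (x + shift) for each x, so that x = 0 remains defined."""
    transformed = []
    for x in values:
        if x + shift == 0:
            raise ZeroDivisionError("x + shift must not be zero")
        transformed.append(1.0 / (x + shift))
    return transformed


# Playoff season (0 years), one year removed, three years removed.
print(shifted_reciprocal([0, 1, 3]))  # [1.0, 0.5, 0.25]
```

The transformed variable is largest in a playoff season and shrinks toward zero as the drought lengthens, so its regression coefficient would be read differently from the coefficient on raw YSLPA.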
Another suggestion for future research would be to employ quadratic, cubic, and quartic transformations of all of my X-variables in the multiple regression. Although this would be far too time-consuming for the purposes of this project, the goodness of fit would likely improve, as it did for each of the scatter plots, and the predicted equation could be more accurate as well.
A final aspect that I would like to study is the effect of specific teams, players, and promotions on stadium attendance. If I were able to gather these categorical X-variables, I could get a sense of how fans react when players such as Jose Reyes decide to test the free agent market and sign elsewhere. Also, from a marketing perspective, it would be prudent to understand which opponents draw fewer spectators to Mets home games. This way, market researchers for the Mets could schedule promotions for games played against the less popular opponents.
VIII. Reference List
Data
Baseball-Reference Web Site. (2011). Retrieved November 5, 2011, from http://www.baseballreference.com/teams/NYM/attend.shtml.
New York Mets Web Site. (2011). Retrieved November 5, 2011, from http://www.newyork.mets.mlb.com.
Statistical Journals
Are Baseball Players Paid Their Marginal Products? Don N. MacDonald and Morgan O. Reynolds. Managerial and Decision Economics, Vol. 15, No. 5, Special Issue: The Economics of Sports Enterprises (Sep. - Oct., 1994), pp. 443-457.
The Interaction Between Baseball Attendance and Winning Percentage: A VAR Analysis. Michael C. Davis. University of Missouri-Rolla, Department of Economics. http://umresearchboard.org/resources/davis/Baseball_Attendance_Winning.pdf
Major League Baseball 2010 Attendance Analysis: METS SUFFER A SECOND STRAIGHT HUGE LOSS. David P. Kronheim. Retrieved November 19, 2011, from http://www.numbertamer.com/files/2010_MLB_Attendance_Analysis.pdf
Other Information
Mets Payroll Will Hinder Teams Success. Adam Rubin. ESPN Web Site. Retrieved November 19, 2011, from http://espn.go.com/new-york/mlb/story/_/id/7123377/new-york-mets-payroll100-million-amazin-struggle-compete-2012.
APPENDIX
Figure 1 New York Mets Attendance Data Set