MLB Salaries

Predicting Major League Baseball
Salaries through Offensive Statistics
Devin Ensing
ECON 375
Professor Melick
December 7, 2010
Ensing 2
I. Introduction
Until the 1970s, baseball was believed to be a sport best understood through
observation. In 1977, statistician Bill James challenged this belief, theorizing that
baseball is best understood through numbers. He released annual books detailing his
exploration into the world of baseball through statistics, and quickly gained a large
following. However, major league baseball was still filled with old baseball men who
believed in scouting over numbers and ratings over statistics. This finally changed in the
1990s, when Oakland Athletics general manager Sandy Alderson hired Billy Beane as a
scout. Beane soon became the GM of Oakland and changed how baseball teams were
built by relying on statistics instead of scouting reports. The book Moneyball followed
Beane and the A’s during the 2002 season and revolutionized the game of baseball by
changing the way it was viewed by both fans and those directly involved in the game.
This paper tests whether the market for baseball players is significantly different after
Moneyball than it was before the book was released.
II. Review of the Literature
Extensive statistical analysis of baseball salaries was not feasible until the 1970s.
Until 1975, major league baseball teams owned their players
through the reserve clause, which essentially bound players to their team for life.
MacDonald and Reynolds (1994) noted that the reserve clause kept salaries below what
they would be in a competitive market. Scully (1974) wrote that there was a high level of
monopsonistic exploitation in baseball at the time, finding that an average player in
baseball received about 20 percent of his net marginal revenue product over his career.
He concluded that the exploitation was of “considerable magnitude”. In 1975, the reserve
clause was removed, and free agency was available to players. After the removal of the
reserve clause, players began to earn more and more money, with MacDonald and
Reynolds finding average salaries climbing from $29,000 in 1970 to $150,000 in 1980.
Vrooman (1996) shows that by 1985-1987, roughly 80 percent of players that were
eligible for free agency were overpaid because of their “artificial monopoly power”
(347). He points out monopolistic inefficiencies in the free agent market, and concludes
by arguing that as of 1987, the labor market in baseball still involved “lower-tier
monopsonistic exploitation and upper-tier monopolistic inefficiency” (358). That is,
rookies and players with very few years of experience are not paid what they are worth,
while players who have already hit free agency and have earned large contracts are
overpaid.
One of the problems for both players and general managers was that nobody
really knew how much a player was “worth”. Many articles tried to estimate the best
indicator of an offensive player’s performance, but there were contrasting results. In
1974, Scully argued that slugging percentage is the best indicator of the ability of hitters,
as it showed the highest correlation with hitting ability. But by 1994, MacDonald and
Reynolds argued that a player’s value is based on his contribution to team winning
percentage, as team winning percentage was significantly correlated with team revenue,
so owners will want players that most increase the team’s revenue. When they ran their
regressions, they found that “mean runs scored arguably is the best indicator of an
offensive player’s production” (447), as opposed to Scully’s earlier claim that slugging
percentage was the best indicator of worth.

The labor market for baseball players showed little change from 1986 up to the
early 2000s, when the general manager of the Oakland Athletics, Billy Beane, began to
exploit the inefficiencies. Lewis (2003) followed the Athletics in the 2002 season in their
pursuit of winning a championship. Moneyball showed how the small-market Athletics
could compete with large budget teams such as the Boston Red Sox and New York
Yankees. The central premise of Beane’s theory of winning was to exploit inefficiencies
in the labor market for baseball players. Hakes and Sauer (2006) note the “valuation of
skills in the market for baseball players was grossly inefficient” (173). Certain offensive
statistics were overvalued, such as batting average and runs batted in, and some were
undervalued, such as on-base-percentage (OBP) and slugging percentage. Hakes and
Sauer showed that the “ability to get on base was undervalued” (175). Beane’s critical
principle was that players who were most valuable to their team were those with the
highest on-base percentages, and those players were grossly underpaid. Beane believed
that OBP was the most important offensive statistic because outs are the “currency” for a
baseball game, so players that get on base more should be worth more. But in 2002,
baseball valued players who could hit massive home runs or steal many bases
much more than they valued the player who could get on base any way possible, be it by
a hit, a walk, or a hit-by-pitch.
Beane concluded that a team of players with high OBPs would be both very cheap
and compete very well. Lewis stated that the overall goal of the front office was to build a
team with the minimum payroll required to successfully contend for a playoff spot. As
Hakes and Sauer show, the A’s executed this strategy so well that they were able to
substitute new, cheaper players in for individual superstars, such as Jason Giambi, and
still maintain team success, in some cases becoming an even more successful team. The
A’s made the playoffs four straight years with some of the lowest payrolls in baseball,
and the improved performance on the field increased attendance, which in turn increased
the revenue of the team. Hakes and Sauer concluded that the “Oakland strategy for
winning games was a successful exploitation of a profit opportunity” (183).
After Lewis published Moneyball in 2003, people around baseball vigorously
denied the A’s claims, but Hakes and Sauer argue that “market adjustments were in
motion” (174) and that the labor market no longer underpaid the ability to get on base, all
within a year of the book’s publication. They state that the market adjustments from
Moneyball caused the labor market to move further away from being a monopsonistic
market and more closely resemble a perfectly competitive market. Scully likewise
concludes that the labor market is not perfectly competitive. As time has passed and
information has accumulated, the market has been moving closer to perfect competition.
III. Economic Theory
We assume that in the labor market for baseball players, offensive statistics
determine the value of each player to his team. The market for pitchers is beyond the
scope of this paper. The goal is to find which statistics most accurately measure the
marginal product of labor (MPL), which determines the demand curve in the labor market.
As discussed in the review of the literature, many
articles have already tried to estimate MPL, using statistics such as runs and slugging
percentage. This paper uses several different offensive performance statistics to
estimate MPL. In general, a player with a better offensive skill set, or a higher MPL, will
increase the probability of his team scoring runs, which will in turn increase the
probability of his team winning. That will increase the revenue of the team, as
MacDonald and Reynolds showed that a team with a better record earns higher
revenue from ticket sales. If we can identify which offensive statistics accurately measure
a player’s value, then we can determine whether and why players with different
skill sets are paid differently, both before and after Moneyball.
We need to look at both the perfectly competitive market and the monopsony
market, as the baseball labor market is presumably somewhere between the two. In a
perfectly competitive labor market, a worker is paid the value of his MPL. In a
monopsony, owners can pay players less than the value of their MPL, so they will be able
to turn a larger profit while still staying competitive. As there are 30 teams competing for
players, the labor market should resemble a competitive market. However, as the A’s
showed, some players still receive less than they produce. As information asymmetries
decrease, more and more players receive the value of their marginal product of labor,
and the market moves closer to perfect competition.
To determine the expected value of the salaries in a perfectly competitive market,
we must derive the supply and demand curves in the general model. The demand curve,
or the value of the marginal product of labor (VMPL), is equal to the marginal product
of labor multiplied by the output price (VMPL = P × MPL). A team will hire an additional
worker only if his wage is less than or equal to the revenue he generates, so in
equilibrium the wage equals VMPL. As the quantity of labor increases, the wage a team is
willing to pay for each additional unit falls, because productivity is
experiencing diminishing marginal returns. Therefore, the VMPL curve is downward
sloping, as shown in figure 1. The model focuses mainly on this curve, as we
want to determine how wage changes as MPL changes. If we hold everything but MPL
constant, including the output price, we can determine how much a change in MPL will
change wage, or in this scenario, salary. If MPL increases, then we should see an increase
in salary, and if MPL decreases we should see a decrease in salary. As a result, we can see
which statistics affect salary the most, and whether or not there is a difference before and
after Moneyball.
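To make the demand side concrete, here is a minimal Python sketch. It assumes a hypothetical Cobb-Douglas production function Q = a·L^alpha with alpha < 1 (the functional form and all parameter values are illustrative assumptions, not estimates from this paper), so that MPL falls as labor rises. VMPL is the output price times MPL, and the team hires until the wage equals VMPL.

```python
def mpl(labor: float, a: float = 10.0, alpha: float = 0.5) -> float:
    """Marginal product of labor for the hypothetical production function Q = a * L**alpha."""
    return alpha * a * labor ** (alpha - 1.0)


def vmpl(labor: float, price: float, a: float = 10.0, alpha: float = 0.5) -> float:
    """Value of the marginal product of labor: output price times MPL."""
    return price * mpl(labor, a, alpha)


def labor_demand(wage: float, price: float, a: float = 10.0, alpha: float = 0.5) -> float:
    """Labor hired where wage = VMPL, solved in closed form for this production function."""
    return (price * alpha * a / wage) ** (1.0 / (1.0 - alpha))


# Diminishing returns: VMPL falls as labor rises, so the demand curve slopes down.
print([vmpl(l, price=2.0) for l in (1.0, 4.0, 9.0, 16.0)])

# At a wage of 5 the team hires 4 units of labor, where VMPL(4) = 5.
print(labor_demand(5.0, price=2.0))  # 4.0

# Raising productivity (a) with the price held fixed raises VMPL at the same labor input,
# which is the "shift in MPL raises salary" logic described above.
print(vmpl(4.0, price=2.0, a=12.0) > vmpl(4.0, price=2.0))  # True
```

Holding price and labor fixed and changing only `a` isolates the effect of MPL on the wage, mirroring the ceteris paribus exercise in the text.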
The labor supply curve is determined by looking at budget constraints and
indifference curves. A worker has a choice between two “goods”, income and leisure
(figure 2). The budget constraint has a y-intercept of 24 × wage, the income earned if
the worker works all 24 hours of the day, and an x-intercept at the point (24, 0), where
the worker does not work at all. The slope of the budget constraint is therefore equal to
the negative of the wage. When the wage changes, the budget constraint pivots around
(24, 0), and the worker chooses the point where it is tangent to the highest attainable
indifference curve; plotting these equilibrium points for the different wages traces out
the worker’s supply curve.
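The construction of the supply curve can be sketched numerically. This minimal Python example assumes hypothetical CES preferences over income and leisure (the utility function, its parameters, and the wages are illustrative assumptions, not part of the paper). For each wage it searches along the budget constraint income = wage × (24 − leisure) for the utility-maximizing point, which approximates the tangency in figure 2, and records the resulting hours worked.

```python
def budget_income(leisure: float, wage: float) -> float:
    """Income on the budget constraint: wage times hours worked (24 - leisure)."""
    return wage * (24.0 - leisure)


def ces_utility(income: float, leisure: float, a: float = 0.5, sigma: float = 2.0) -> float:
    """Hypothetical CES utility; sigma is the elasticity of substitution (sigma != 1)."""
    rho = 1.0 - 1.0 / sigma
    return (a * income ** rho + (1.0 - a) * leisure ** rho) ** (1.0 / rho)


def best_leisure(wage: float, steps: int = 2400, **prefs: float) -> float:
    """Grid-search interior points of the budget line for the utility-maximizing leisure."""
    grid = [24.0 * (i + 1) / (steps + 1) for i in range(steps)]
    return max(grid, key=lambda lei: ces_utility(budget_income(lei, wage), lei, **prefs))


# The budget line runs from income 24 * wage (no leisure) down to (24, 0) (no work).
print(budget_income(0.0, 20.0), budget_income(24.0, 20.0))  # 480.0 0.0

# Collecting (wage, hours) pairs traces out the labor supply curve.
supply = [(w, 24.0 - best_leisure(w)) for w in (10.0, 20.0, 40.0)]
print(supply)
```

A grid search is used instead of calculus so the same code works for any quasi-concave utility function; the answer is accurate to about one grid step (24/2401 hours).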
There are two possibilities for the supply curve, depending on the size of the
income and substitution effects. An increase in wage has two effects. According to the
income effect, it can cause workers to work less, as they can earn the same amount of
income in less time, therefore leading to a decline in hours worked and an increase in
leisure. At the same time, according to the substitution effect, an increase in the wage
also causes leisure to become more expensive, as more income could be made instead of
consuming leisure, therefore causing workers to demand less leisure and work more. If
the income effect dominates the substitution effect, then the worker’s supply curve will
be backward bending. If the substitution effect dominates the income effect, then the
supply curve will be upward sloping, which can be seen in figure 3.
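Assuming hypothetical CES preferences over income and leisure (σ is the elasticity of substitution and a a preference weight; neither comes from this paper), a 24-hour day, and no non-labor income, the utility-maximizing hours have the closed form hours = 24 / (1 + w^(1−σ) · ((1 − a)/a)^σ). This makes the role of the two effects easy to see: when income and leisure are good substitutes (σ > 1) the substitution effect dominates and hours rise with the wage; when they are poor substitutes (σ < 1) the income effect dominates and hours fall. A true backward-bending curve rises at low wages and bends back at high ones; this sketch isolates the two regimes rather than the bend itself.

```python
def hours_worked(wage: float, sigma: float, a: float = 0.5) -> float:
    """Utility-maximizing hours under hypothetical CES preferences (sigma != 1),
    a 24-hour day, and no non-labor income."""
    return 24.0 / (1.0 + wage ** (1.0 - sigma) * ((1.0 - a) / a) ** sigma)


wages = [1.0, 4.0, 16.0]
# sigma > 1: substitution effect dominates, so hours rise with the wage (upward sloping).
print([hours_worked(w, sigma=2.0) for w in wages])
# sigma < 1: income effect dominates, so hours fall as the wage rises.
print([hours_worked(w, sigma=0.5) for w in wages])  # [12.0, 8.0, 4.8]
```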
Now that we have determined the equilibrium graph for a perfectly competitive
market, which can be seen in figure 4, we need to look at the monopsony model. In this
model, workers are paid less than the value of their marginal product. The supply curve
stays the same and is used to determine the wage. But there is now also a marginal cost
of labor (MCL) curve, which is steeper than the supply curve because hiring one more
worker requires raising the wage paid to all workers, as seen in figure 5. The MCL curve
determines how much labor is hired. The demand curve is still the value of the marginal
product of labor curve, which is equal to the output price multiplied by MPL, just as in the
perfectly competitive market. Labor in the monopsony model is found at the intersection
of the MCL and VMPL curves, and the wage is then read off the supply curve at that
quantity of labor (figure 6). Our goal in the monopsony model is the same as in the competitive
model; we want to set everything constant and then shift the demand curve, by shifting
MPL, to determine how the wages will shift.
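A minimal sketch of the two equilibria, assuming hypothetical linear curves (the intercepts and slopes are made up for illustration): supply W = s0 + s1·L, so total labor cost is W·L and MCL = s0 + 2·s1·L, which is steeper than supply; and VMPL = d0 − d1·L. The competitive outcome sets supply equal to VMPL, while the monopsony sets MCL equal to VMPL and reads the wage off the supply curve.

```python
def competitive(s0: float, s1: float, d0: float, d1: float) -> tuple:
    """Labor and wage where supply (s0 + s1*L) meets VMPL (d0 - d1*L)."""
    labor = (d0 - s0) / (s1 + d1)
    return labor, s0 + s1 * labor


def monopsony(s0: float, s1: float, d0: float, d1: float) -> tuple:
    """Labor where MCL (s0 + 2*s1*L) meets VMPL; wage read off the supply curve."""
    labor = (d0 - s0) / (2.0 * s1 + d1)
    return labor, s0 + s1 * labor


params = dict(s0=10.0, s1=1.0, d0=100.0, d1=2.0)  # hypothetical values
l_p, w_p = competitive(**params)
l_m, w_m = monopsony(**params)
vmpl_at_lm = params["d0"] - params["d1"] * l_m
print(l_p, w_p)              # 30.0 40.0
print(l_m, w_m, vmpl_at_lm)  # 22.5 32.5 55.0

# Shifting demand up (a higher MPL) raises the monopsony wage.
print(monopsony(10.0, 1.0, 120.0, 2.0))  # (27.5, 37.5)
```

With these numbers the monopsonist hires less labor (22.5 vs. 30) and pays less (32.5 vs. 40), and the wage falls short of the worker's VMPL at that quantity (55), which is the comparison drawn in figure 7.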
We can see that in a monopsony model, a worker will receive a lower wage than
the value of his marginal product of labor while also working less than in a perfectly
competitive market (figure 7). While a monopsonistic labor market benefits owners by
increasing their profits, players receive a lower salary than the value
they are producing. The Oakland A’s were operating as if the baseball labor market was a
monopsony by paying players with a high OBP less than they were worth to the
franchise. It helped them win more games at a cheaper cost than their competition, and
although the market quickly adjusted, it has still not become perfectly competitive, which
shows there still may be players who are receiving less than they are producing. Before
Moneyball, there were arbitrage opportunities for teams, but after the book, the market
should have become more of a perfectly competitive market, removing the arbitrage
opportunities and causing salaries to reflect the actual value of a player’s labor. This
should be reflected in the regressions, as some variables will have a much different effect
on salary before and after Moneyball.
IV. Data
I collected data on offensive performance for hitters from 1995 to 2010 from both
http://www.baseball-reference.com and http://www.thebaseballcube.com. I chose these
years because they provide a fairly large sample, beginning in 1995, after the 1994
baseball strike (which would have skewed the data), and running through the 2010 season. I collected
data on position players – no pitchers, only players that played defense and hit – who
qualified for the batting title. To qualify for the batting title, a player must have at least
3.1 plate appearances per game over an entire season. A plate appearance (PA) is every
time a batter completes a turn at the plate, whether the outcome is an at-bat, walk,
sacrifice, or anything else. In 1995, there were 144 games played, as the
beginning of the season was slightly delayed due to the strike, which means that the
minimum number of plate appearances to qualify for the batting title was 446. In every other year
in the data set, there were 162 games played, so a player must have at least 502 PAs to be
included in the study. For each player, I collected their offensive statistics, such as home
runs and runs batted in, their salary for the year, their team and the league their team
plays in, their age (as of June 30th of the year), and the position they played. There are a
total of 603 different players and 2474 player years included in the study. Table 1
presents summary statistics on all of the variables used in my regressions.
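The qualification rule can be written as a small filter. This sketch assumes the threshold is 3.1 × games rounded to the nearest whole number; that rounding convention is an assumption, but it reproduces the 446 and 502 figures above (truncating would give the same two values).

```python
def min_plate_appearances(games: int, pa_per_game: float = 3.1) -> int:
    """Plate appearances needed to qualify for the batting title (rounded to a whole number)."""
    return round(games * pa_per_game)


def qualifies(plate_appearances: int, games: int) -> bool:
    """True if a player-season meets the batting-title threshold."""
    return plate_appearances >= min_plate_appearances(games)


print(min_plate_appearances(144), min_plate_appearances(162))  # 446 502
print(qualifies(501, 162), qualifies(502, 162))                # False True
```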
The minimum salary was $109,000, earned by five different players in 1995 and
1996, and the maximum was $33 million, earned by Alex Rodriguez in both 2009 and
2010. The mean salary was $4.3 million. Because there is a large gap between the salaries
of most players and the salaries of superstars, I use the natural log of salary in my
regressions to reduce the influence of these extreme values. Unfortunately, salary
information is not entirely accurate, as some salaries include earned bonuses, while
others do not, and some salaries depend on the team that the player is on. In general,
though, baseball has been more transparent about salary information than other major
sports, which makes it easier to estimate the effect of different variables on salary.
Although there is a minimum salary in
baseball, which was $400,000 in 2009, it is binding in very few cases, so we can
disregard it in our models.
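The effect of the log transformation can be checked against Table 1. This short sketch uses only the minimum, mean, and maximum salaries reported there: the largest salary is more than 200 times the smallest in dollars, but the two are only about 5.4 apart in natural logs. It also shows that the mean of log salary in Table 1 (14.74953) is below the log of the mean salary, which is what we expect after taking logs of a right-skewed variable.

```python
import math

# Minimum, mean, and maximum salary from Table 1.
salary_min, salary_mean, salary_max = 146_366.40, 4_834_683.0, 33_000_000.0

print(salary_max / salary_min)                      # ratio in dollars, about 225
print(math.log(salary_max) - math.log(salary_min))  # gap in logs, about 5.42
print(math.log(salary_min), math.log(salary_max))   # close to the LnSalary min and max
print(math.log(salary_mean))  # about 15.39, above the mean of LnSalary (14.74953)
```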
I collected a total of 23 different offensive measures for each player. Many of the
total statistics, like hits, at-bats, and walks, were used to calculate percentage statistics
such as batting average and on-base percentage, so I am going to ignore those statistics in
my regressions. I ran regressions involving many of the statistics in my data set, and
found that there were many that were insignificant in all regressions, so I have also
removed those statistics from my regressions. Finally, some variables cannot be included
together in a regression because we cannot imagine increasing one of them while holding
another constant. Home runs provide a good example: every home run also adds at least
one run and one RBI, so we cannot imagine an increase in home runs without an increase
in both runs and RBIs. We therefore cannot include both home runs and runs in our regressions, as the coefficient
on home runs would not accurately reflect the value of home runs on salary. I decided on
five variables to use in my regressions: Wins Above Replacement (WAR), runs, runs
batted in (RBI), on-base percentage (OBP), and slugging percentage (SLG).
Wins Above Replacement (WAR) measures how much better a player is than an average
minor league replacement player, using offensive, baserunning, and defensive statistics (see
Appendix for details). It has a minimum value of -3.5 wins (values of WAR can be both
positive and negative; a negative value means the player is costing his team wins), which
belonged to Jose Guillen in 1997, and a maximum value of 12.5 wins, which belonged to
Barry Bonds in 2001. The mean WAR was 2.77 wins.
Runs are measured by the number of times a player scores a run. The minimum
runs scored was 31, by Rey Ordonez in 2001, the maximum runs scored was 152 by Jeff
Bagwell in 2000, and the mean number of runs scored is 83.4. Runs batted in (RBI) count
the runs that score as a result of a player’s plate appearance. The
minimum RBI was 17, by Luis Castillo in 2000, the maximum RBI was 165 by Manny
Ramirez in 1999, and the mean number of RBI is 79.4. On-base percentage is calculated
as the number of hits, walks, and hit-by-pitches divided by the number of at-bats, walks,
hit-by-pitches and sacrifice flies. The minimum OBP was .259, by Angel Berroa in 2006,
the maximum OBP was .609 by Barry Bonds in 2004, and the mean OBP is .355.
Slugging percentage is calculated as the total number of bases (1 base for a single, 2 for a
double, 3 for a triple, and 4 for a home run) divided by the number of at-bats. The
minimum SLG was .268, by Cesar Izturis in 2010, the maximum SLG was .863 by Barry
Bonds in 2001, and the mean SLG is .462.
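The two rate statistics can be computed directly from the definitions above. The season line used here is hypothetical, chosen only to illustrate the arithmetic.

```python
def on_base_percentage(hits: int, walks: int, hbp: int, at_bats: int, sac_flies: int) -> float:
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)


def slugging_percentage(singles: int, doubles: int, triples: int, home_runs: int,
                        at_bats: int) -> float:
    """Total bases (1 per single, 2 per double, 3 per triple, 4 per home run) per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season: 150 hits (100 1B, 30 2B, 2 3B, 18 HR), 60 BB, 5 HBP, 5 SF, 500 AB.
print(round(on_base_percentage(150, 60, 5, 500, 5), 3))     # 0.377
print(round(slugging_percentage(100, 30, 2, 18, 500), 3))   # 0.476
```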

V. Regressions
Now that we have all of the statistics needed, we can run regressions to estimate
which performance statistics affect salary. Our hypothesis is that after Moneyball,
some performance statistics will be rewarded differently than before Moneyball.
Moneyball was written during the summer of 2002 and published in March 2003, and Hakes
and Sauer argue that the baseball labor market had adjusted itself within a year of the
book’s publication. If Hakes and Sauer are correct, coefficient estimates from 1995-2003
should be different from those from 2004-2010, as the market should have adjusted.
Salaries in baseball are often determined by long-term contracts, so past
production better explains current salary. Meltzer (2005) found the average contract
length in baseball to be 1.79 years, up from 1.31 years in 1993. It has likely risen
since then, and that average covers all players in baseball, while our data set contains
only those players who qualified for the batting title. These are generally better players,
who should be rewarded with longer contracts, so we expect the appropriate lag to be
longer than the league-wide average, perhaps about three years. To find the most
representative lag, I ran the regression with all performance variables lagged one, two,
and three years, and the three-year lags proved to be the best to use. We are therefore
assuming that each contract is approximately three years long, so a player’s current
salary reflects his performance from three years earlier.
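The three-year lag amounts to matching each player's salary in season t with his statistics from season t − 3. A minimal Python sketch, assuming the data are stored as dictionaries keyed by (player, season); this layout and the records are hypothetical, not necessarily how the data were organized for this paper. Player-seasons without a qualifying season three years earlier drop out, which is one plausible reason the regression sample (N = 1033 in table 2) is smaller than the 2474 player-years collected.

```python
from typing import Dict, List, Tuple

Stats = Dict[str, float]


def lagged_pairs(
    salaries: Dict[Tuple[str, int], float],
    stats: Dict[Tuple[str, int], Stats],
    lag: int = 3,
) -> List[Tuple[str, int, float, Stats]]:
    """Pair each (player, year) salary with that player's stats from `lag` seasons earlier.

    Player-seasons with no qualifying season `lag` years before are dropped.
    """
    pairs = []
    for (player, year), salary in salaries.items():
        past = stats.get((player, year - lag))
        if past is not None:
            pairs.append((player, year, salary, past))
    return pairs


# Hypothetical records keyed by (player, season).
stats = {("A", 2001): {"OBP": 0.380}, ("B", 2002): {"OBP": 0.340}}
salaries = {("A", 2004): 6_000_000.0, ("B", 2004): 2_000_000.0}
print(lagged_pairs(salaries, stats))  # only player A has stats from 2001
```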

For each regression in this paper, the dependent variable is the natural log of salary,
and the independent variables are the five performance statistics. I have also created
a "Moneyball" dummy variable (MB), which takes on the value of 1 if the year is greater
than or equal to 2003, and 0 if it is before 2003. Although there are apparent
differences between salary determination before and after Moneyball, the question of
whether these differences are statistically significant remains. We use interaction
terms between each statistic and the MB indicator variable to determine whether the
payment for that statistic was significantly different before and after Moneyball,
performing a t-test on the coefficient of each “variable*MB” term. If any of these
coefficients has a p-value below .05, we conclude that the statistic was paid differently
before and after Moneyball.
To calculate the effect of the variable on salary before Moneyball, we simply look at the
coefficient on just the variable as the MB dummy would be equal to 0. To calculate the
effect of the variable on salary after Moneyball, we add the coefficient on the variable
and the variable*MB, as the MB dummy now equals 1. The five “variable*MB”
measures will be included in each regression, and each regression can be found in table 2.
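The rule for reading the interaction terms can be checked mechanically. This sketch writes the prediction equation with the pooled OLS coefficients from table 2 (the MB dummy enters only through the interactions, as in the table) and computes the effect of a one-unit increase in a statistic with MB = 0 and with MB = 1: the first equals the coefficient on the variable, the second the sum of the variable and variable*MB coefficients. The baseline player is hypothetical, set roughly at the Table 1 means.

```python
# Pooled OLS estimates from table 2: (coefficient on variable, coefficient on variable*MB).
OLS = {
    "WAR": (0.0263652, 0.0054741),
    "Runs": (0.0049915, -0.0000303),
    "RBI": (0.0069603, 0.0030393),
    "OBP": (1.828273, 0.8882631),
    "SLG": (0.7642924, -0.8940868),
}
INTERCEPT = 13.4791


def predicted_log_salary(stats: dict, mb: int) -> float:
    """ln(salary) = intercept + sum(b * x) + sum(g * x * MB)."""
    return INTERCEPT + sum((b + g * mb) * stats[name] for name, (b, g) in OLS.items())


def marginal_effect(name: str, stats: dict, mb: int) -> float:
    """Change in predicted ln(salary) from a one-unit increase in one statistic."""
    bumped = dict(stats, **{name: stats[name] + 1.0})
    return predicted_log_salary(bumped, mb) - predicted_log_salary(stats, mb)


# Hypothetical average player, roughly at the Table 1 means.
player = {"WAR": 2.77, "Runs": 83.4, "RBI": 79.4, "OBP": 0.355, "SLG": 0.462}
print(marginal_effect("RBI", player, mb=0))  # about 0.00696 (RBI coefficient)
print(marginal_effect("RBI", player, mb=1))  # about 0.01000 (RBI + RBI*MB)
```

Because the model is linear in each statistic, the effect does not depend on the baseline player chosen; only the value of MB matters.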
The first regression is a simple pooled OLS regression, which does not take advantage
of the fact that we have panel data on the same players over time. When we run the
regression, we find that every variable but SLG is significant before Moneyball, but none
of the interaction terms is significant. To interpret each coefficient, I consider an
increase of roughly one standard deviation in the variable, which would mean an average
player becoming an above-average player. These changes are much more meaningful than
one-unit changes, which are unrealistically large for rate statistics such as OBP.
Estimates for WAR indicate that every
extra two wins (2.0 WAR) a player adds to his team before 2004 is associated with a 5.26
percent increase in the player's salary, and every additional two wins a player adds to his
team after 2003 is associated with a 6.36 percent increase in the player's salary. This is
found by adding the coefficient for WAR and WAR*MB. Every additional 19 runs
scored by a player for a season before 2004 is associated with a 9.48 percent increase in
the player’s salary, and every additional 19 runs scored by a player for a season after
2003 is associated with a 9.42 percent increase in the player’s salary. Although the
difference is not significant, this regression suggests that players were rewarded slightly
less for runs after Moneyball was published. Every additional 25 RBIs in a season before 2004 is
associated with a 17.4 percent increase in the player’s salary, and every additional 25
RBIs in a season after 2003 is associated with a 25.0 percent increase in the player’s
salary. Every 37-point (.037) increase in OBP (e.g. from .355 to .392) for
a player in a season before 2004 is associated with a 6.76 percent increase in the player’s
salary, and every 37-point increase in OBP for a player in a season
after 2003 is associated with a 10.05 percent increase in the player’s salary.
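These percentages come from multiplying a coefficient (or the sum of a coefficient and its interaction) by the size of the change and reading the result, in log points, as a percent. A short sketch reproducing the RBI and OBP figures from the OLS column of table 2; it also prints the exact percentage change, 100 × (e^x − 1), since the log-point reading is an approximation that is close for small changes but understates larger positive ones.

```python
import math


def pct_effect(coef: float, delta: float, exact: bool = False) -> float:
    """Percent change in salary implied by a log-salary coefficient and a change of size delta.

    exact=False gives the log-point reading used in the text (100 * coef * delta);
    exact=True gives 100 * (exp(coef * delta) - 1).
    """
    log_points = coef * delta
    return 100 * math.expm1(log_points) if exact else 100 * log_points


# Pooled OLS estimates from table 2: (variable, variable*MB, size of change).
OLS_EFFECTS = {
    "RBI": (0.0069603, 0.0030393, 25),
    "OBP": (1.828273, 0.8882631, 0.037),
}

for name, (b, g, delta) in OLS_EFFECTS.items():
    for period, coef in (("before", b), ("after", b + g)):
        print(f"{name} {period}: {pct_effect(coef, delta):.2f}% "
              f"(exact {pct_effect(coef, delta, exact=True):.2f}%)")
```

The approximate figures match the 17.4 and 25.0 percent (RBI) and 6.76 and 10.05 percent (OBP) reported above.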
Next, we can check whether there is any advantage to exploiting the panel nature of our
data. Our data span 1995 to 2010, and many players are in the data set for more than
one year, so we can run a fixed effects estimator. We are worried that our variables may
be correlated with the error term, Cov(OBP_it, a_i) ≠ 0, which would mean that the OLS
regression is biased. Here a_i is the component of the error term that captures
information about player i that does not vary over time. An
example of this would be the intangibles of Derek Jeter. Jeter is the captain of the New
York Yankees and has been in the data set every year since 1996. He is known as a very
intelligent player, a leader for young players, and someone who handles the New York
media well, but unfortunately these traits cannot be measured and so cannot be included
in a regression. As such, they show up in the error term. Because these traits may be
correlated with his statistics, OLS cannot tell whether differences in salary are due to the
unmeasured traits or to the statistics themselves, so it may badly mispredict Jeter’s
salary and bias the coefficients. Fixed effects remove each player’s time-invariant
component a_i by using only the variation within each player over time, so the
coefficients are not biased by any correlation between the statistics and a_i.
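Why fixed effects help can be shown with a tiny hypothetical panel (all numbers are invented for illustration). Two players have log salaries generated by ln(salary) = 0.5·x + a_i, where the unobserved a_i is larger for the player with the higher statistic x, so Cov(x, a_i) ≠ 0. Pooled OLS attributes part of a_i to x, while demeaning each player's data (the within transformation behind one-way fixed effects) removes a_i and recovers the true slope.

```python
from statistics import mean


def ols_slope(xs, ys):
    """Slope from a simple regression of ys on xs with an intercept."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def demean(values):
    """Subtract the player's own average (the within transformation)."""
    m = mean(values)
    return [v - m for v in values]


# Hypothetical panel: true slope 0.5; player B has both a larger a_i and larger x.
panel = {"A": ([1.0, 2.0, 3.0], 1.0), "B": ([4.0, 5.0, 6.0], 3.0)}  # (x values, a_i)
xs, ys, xs_within, ys_within = [], [], [], []
for x_vals, a_i in panel.values():
    y_vals = [0.5 * x + a_i for x in x_vals]
    xs += x_vals
    ys += y_vals
    xs_within += demean(x_vals)
    ys_within += demean(y_vals)

print(ols_slope(xs, ys))                # pooled OLS: about 1.01, biased upward
print(ols_slope(xs_within, ys_within))  # within (fixed effects): 0.5
```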
We can run both one-way fixed effects, which include only player effects, and two-way
fixed effects, which also include coefficients for each year. The year coefficients are not
important for the hypothesis of this paper, but they absorb salary changes common to the
whole league in a given year, so two-way effects are preferred,
and when we run the regression, we find that only runs and OBP before Moneyball are
significant at the 5% level. We also find that RBI before Moneyball as well as OBP and
SLG after Moneyball are significant at the 10% level. The estimate for runs means that
every additional 19 runs scored by a player for a season before 2004 is associated with a
6.26 percent increase in the player’s salary. Also, every additional 37 percentage points
increase in OBP for a player in a season before 2004 is associated with a 14.66 percent
increase in the player’s salary. The coefficient on OBP*MB is actually negative, which
means that players were rewarded less for higher OBP after Moneyball than before,
which does not agree with our hypothesis.
We can also run a random effects estimator, which would be more appropriate
than the two-way fixed effects estimator if the covariance between our variables and the
error term does equal zero. This would be the case if there were no variables we were
omitting (or could not include) that would significantly affect the covariance between the
performance statistics and the error term.
When we run random effects, we find that only RBI and OBP before Moneyball
are significant at the 5% level (runs before Moneyball is significant only at the 10% level).
Every additional 25 RBIs in a season before 2004 is associated with a
15.14 percent increase in the player’s salary. Every 37-point increase
in OBP for a player in a season before 2004 is associated with a 13.63 percent
increase in the player’s salary. With random effects, the coefficient on OBP*MB is
positive, although only slightly, which agrees with our hypothesis.
The regression that I believe most accurately estimates the influence of
performance statistics on salary is the fixed effects estimator. Although fixed effects
and random effects have very similar coefficients for most of the variables, I believe that
there is a covariance between a player’s statistics and the error term, as was discussed
above with Derek Jeter. There are many reasons why a player could be paid
differently (usually higher) than his statistics indicate. He could have intangibles, like
Jeter, that make him more valuable to his team, or he could simply be well-liked in his
hometown and command a higher salary because of his popularity. For this reason, I
believe that fixed effects show the true coefficients for predicting salary.
VI. Conclusions
In the regressions that we have run, only two of the statistics are consistently
significant: runs batted in and on-base percentage before Moneyball. Runs batted in was
significant at the 1% level in three of the four regressions and significant at the 10% level
in the two-way fixed effects regression. On-base percentage was always significant at the
5% level. We hypothesized that players with high OBP would be paid more after
Moneyball, but we get mixed results. The OLS and random effects regressions give us
positive values for OBP*MB, but the one-way and two-way fixed effects regressions give
us negative values for OBP*MB.
Some of the statistics, while not statistically significant, are economically
significant. One such example would be a player increasing his slugging percentage from
average to above average (one standard deviation) before Moneyball. In the random effects
regression, this would increase the player’s salary by 5.12 percent. Given that the average
salary in the sample is $4,834,683, on average this would be an increase of $247,729.16.
Although SLG is not statistically significant, it is definitely economically significant.
Many of the variables that are not statistically significant are economically significant,
which means that although we found no variables that were significantly different before
and after Moneyball at the 5% level, there could still be monopsonistic exploitation.
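The dollar figure follows from the random effects SLG coefficient in table 2 (.6742775), the SLG standard deviation, and the mean salary from Table 1. A short sketch of the arithmetic: with unrounded inputs the log-point reading gives roughly $247,770, within about $40 of the $247,729.16 above (the small gap comes from rounding), and the exact conversion 100 × (e^x − 1) would put the increase somewhat higher.

```python
import math

slg_coef = 0.6742775       # random effects SLG coefficient, table 2
slg_sd = 0.0760041         # SLG standard deviation, table 1
mean_salary = 4_834_683.0  # mean salary, table 1

log_points = slg_coef * slg_sd
approx_dollars = mean_salary * log_points             # log-point reading used in the text
exact_dollars = mean_salary * math.expm1(log_points)  # exact percentage conversion
print(f"{100 * log_points:.2f}% -> ${approx_dollars:,.0f}")
print(f"exact: ${exact_dollars:,.0f}")
```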
We can see that although spending patterns were altered after the book was
released, they are not significantly different in a statistical sense. This is probably
because many contracts signed before the book came out ran for years past
2004, and those contracts reward players for pre-Moneyball statistics. If we were to run
this regression again in a few years, we may be able to see both a statistically and
economically significant change in the spending habits of teams after the release of the
book.
Figures and Tables
Figure 1: Perfect Competition Demand Model [graph of wage (W) against labor (L) showing the downward-sloping VMPL curve]
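Figure 2: Budget Constraint and Indifference Curve for a Worker [graph of income against leisure (hr/day) showing the budget constraint (BC) tangent to an indifference curve (IC) at the point marked W*, H*]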
Figure 3: Perfect Competition Supply Model [graph of wage (W) against labor showing the labor supply curve (SL)]
Figure 4: Perfect Competition Supply and Demand Model [graph of wage (W) against labor (L); SL and VMPL intersect at wage WP and labor LP]
Figure 5: Monopsony Supply Model [graph of wage (W) against labor showing the marginal cost of labor curve (MCL) and the supply curve (SL)]
Figure 6: Monopsony Supply and Demand Model [graph of wage (W) against labor (L) with MCL, SL, and VMPL; monopsony labor LM and wage WM]
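Figure 7: Perfect Competition vs. Monopsony [graph of wage (W) against labor (L) with MCL, SL, and VMPL; competitive wage and labor WP, LP versus monopsony wage and labor WM, LM]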
Table 1 – Descriptive Statistics
Variable    Mean          Std. Deviation   Minimum        Maximum
Salary      $4,834,683    $4,737,942       $146,366.40    $33,000,000
LnSalary    14.74953      1.304495         11.89387       17.31202
WAR         2.769725      2.247303         -3.5           12.5
Runs        83.4232       19.10701         31             152
RBI         79.39814      25.5787          17             165
OBP         .3546892      .0370931         .259           .609
SLG         .4617033      .0760041         .268           .863
Table 2 – Regressions
Variable    OLS Regression   One-Way Fixed Effects   Two-Way Fixed Effects   Random Effects
WAR         .0263652         .0033822                -.0040468               -.0055655
            (.0137541)*      (.0157241)              (.0160457)              (.0142906)
WAR*MB      .0054741         -.0270061               -.0004383               -.0064985
            (.0218059)       (.0225376)              (.0283512)              (.026163)
Runs        .0049915         .0022994                .0032931                .0025642
            (.0014698)***    (.0016533)              (.0016333)**            (.0014693)*
Runs*MB     -.0000303        -.0010778               -.001663                .0004244
            (.0026485)       (.0026647)              (.0026562)              (.0024679)
RBI         .0069603         .004546                 .0030786                .0060543
            (.001441)***     (.0016781)***           (.0016506)*             (.0014295)***
RBI*MB      .0030393         -.0007809               -.0013886               .0004685
            (.0026261)       (.0028024)              (.0027512)              (.0025023)
OBP         1.828273         4.002033                3.963249                3.683718
            (.7808365)**     (1.08716)***            (1.106998)***           (.9028859)***
OBP*MB      .8882631         -.9280347               -2.969955               .219621
            (1.135569)       (1.16554)               (1.715506)*             (1.536049)
SLG         .7642924         -.450428                -.1546544               .6742775
            (.6034425)       (.6929237)              (.6772447)              (.591939)
SLG*MB      -.8940868        1.89428                 1.964169                .551901
            (1.082093)       (1.116424)*             (1.114851)*             (1.032082)
Intercept   13.4791          13.75574                14.68708                13.03296
            (.2123302)       (.3135692)              (.5487836)              (.4425238)
N           1033             1033                    1033                    1033
R-Squared   0.3717           0.3020                  0.2281                  0.3701
Standard errors in parentheses.
*Significant at the 10% level, **Significant at the 5% level, ***Significant at the 1% level
Appendix
To calculate Wins Above Replacement:
First, calculate the hitter’s weighted on-base average (wOBA), a statistic that combines the information in on-base percentage and slugging percentage:
wOBA = (0.72*NIBB + 0.75*HBP + 0.90*1B + 0.92*RBOE + 1.24*2B + 1.56*3B + 1.95*HR) / PA
where NIBB = non-intentional walks, HBP = hit by pitch, 1B, 2B, 3B, and HR = singles, doubles, triples, and home runs, RBOE = reached base on an error, and PA = plate appearances.
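The wOBA formula translates directly into a small Python function. The parameter names and the sample season line below are hypothetical, used only to show the arithmetic.

```python
def woba(nibb, hbp, singles, rboe, doubles, triples, hr, pa):
    """Weighted on-base average, using the weights given in the appendix."""
    if pa <= 0:
        raise ValueError("plate appearances (pa) must be positive")
    weighted_events = (
        0.72 * nibb      # non-intentional walks
        + 0.75 * hbp     # hit by pitch
        + 0.90 * singles
        + 0.92 * rboe    # reached base on error
        + 1.24 * doubles
        + 1.56 * triples
        + 1.95 * hr
    )
    return weighted_events / pa


# Hypothetical season line, for illustration only.
print(round(woba(nibb=60, hbp=5, singles=100, rboe=5,
                 doubles=30, triples=3, hr=25, pa=650), 4))  # 0.3572
```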
These coefficients are the run values of each event relative to an out. To convert wOBA to wins, compare the hitter’s wOBA to the league wOBA:
Wins = (wOBA - League wOBA) / 1.15 * 700 / 10.5
The league wOBA is usually around 0.338. Dividing by 1.15 converts the wOBA difference into runs above average per plate appearance. The formula assumes 700 plate appearances over 162 games and a ratio of 10.5 runs per win. So the formula turns the hitter’s wOBA advantage over the league into runs above average per PA, multiplies by the number of PAs in a season, and divides by the runs-to-wins ratio to express the result in wins above an average hitter, the batting component of WAR.
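The conversion can be sketched as a Python function whose defaults are the constants quoted above (league wOBA of 0.338, 1.15, 700 plate appearances, and 10.5 runs per win); the function and parameter names are illustrative. Exposing the constants as parameters makes it easy to substitute a season-specific league wOBA or a different plate-appearance total, which the fixed formula does not do.

```python
def batting_wins(player_woba, league_woba=0.338, woba_scale=1.15,
                 pa_per_season=700, runs_per_win=10.5):
    """Wins above an average hitter, following the appendix's conversion:
    (wOBA - league wOBA) / 1.15 * 700 / 10.5 with the default arguments."""
    runs_above_avg_per_pa = (player_woba - league_woba) / woba_scale
    runs_above_avg = runs_above_avg_per_pa * pa_per_season
    return runs_above_avg / runs_per_win


print(round(batting_wins(0.3725), 3))  # 2.0
print(batting_wins(0.338))             # 0.0 for a league-average hitter
```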
Then add the positional adjustment, the replacement-level adjustment, and the park factor for the player’s home stadium. The positional adjustments are: +1.0 win for a catcher, +0.5 wins for a SS or CF, no adjustment for a 2B or 3B, -0.5 wins for a LF, RF, or PH, -1.0 win for a 1B, and -1.5 wins for a DH. The replacement-level adjustment depends on how much the player played that year, and thus on how hard he would be to replace. The park factor is the number of runs above or below average that the player’s home park produces, that is, how conducive it is to scoring. Once all adjustments have been added, the result is Wins Above Replacement.
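Putting the pieces together, WAR is the batting wins plus the three adjustments. In the Python sketch below, the positional values come from the list above, while the replacement-level and park inputs are hypothetical placeholders, since the text gives no numeric formula for either. Converting the park factor from runs to wins with the same 10.5 runs-per-win ratio, and leaving its sign convention to the caller, are also assumptions.

```python
# Positional adjustments in wins, as listed in the appendix.
POSITIONAL_ADJUSTMENT = {
    "C": 1.0,
    "SS": 0.5, "CF": 0.5,
    "2B": 0.0, "3B": 0.0,
    "LF": -0.5, "RF": -0.5, "PH": -0.5,
    "1B": -1.0,
    "DH": -1.5,
}


def wins_above_replacement(batting_wins, position, replacement_wins,
                           park_adjustment_runs, runs_per_win=10.5):
    """Add the appendix's adjustments to batting wins.

    replacement_wins and park_adjustment_runs are caller-supplied (hypothetical
    inputs): the text gives no numeric formula for either, and the sign of the
    park adjustment is the caller's choice. The park adjustment is given in
    runs, so it is converted to wins with runs_per_win.
    """
    if position not in POSITIONAL_ADJUSTMENT:
        raise ValueError(f"unknown position: {position!r}")
    return (batting_wins
            + POSITIONAL_ADJUSTMENT[position]
            + replacement_wins
            + park_adjustment_runs / runs_per_win)


# Hypothetical shortstop: 2.0 batting wins, 2.0 replacement-level wins,
# and a park adjustment of -5.25 runs (-0.5 wins).
print(wins_above_replacement(2.0, "SS", 2.0, -5.25))  # 4.0
```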
References
Frank, Robert H. Microeconomics and Behavior (7th ed.). New York: McGraw-Hill, 2008.
Hakes, Jahn K., and Sauer, Raymond D. “An Economic Evaluation of the Moneyball Hypothesis.” Journal of Economic Perspectives, Vol. 20, No. 3 (Summer 2006), pp. 173-186.
Lewis, Michael. Moneyball. New York: W.W. Norton & Company, Inc., 2003.
MacDonald, Don N., and Reynolds, Morgan O. “Are Baseball Players Paid their Marginal Products?” Managerial and Decision Economics, Vol. 15, No. 5 (September-October 1994), pp. 443-457.
Meltzer, Josh. “Average Salary and Contract Length in Major League Baseball: When do they Diverge?” May 2005.
Rottenberg, Simon. “The Baseball Players’ Labor Market.” The Journal of Political Economy, Vol. 64, No. 3 (June 1956), pp. 242-258.
Scully, Gerald W. “Pay and Performance in Major League Baseball.” The American
Economic Review, Vol. 64, No. 6 (December 1974), pp. 915-930.
Tango, Tom M., Lichtman, Mitchel G., and Dolphin, Andrew E. The Book: Playing the
Percentages in Baseball. Dulles: Potomac Books Inc., 2007.
Vrooman, John. “The Baseball Players’ Labor Market Reconsidered.” Southern Economic Journal, Vol. 63, No. 2 (October 1996), pp. 339-360.
Wooldridge, Jeffrey M. Introductory Econometrics (4th ed.). Mason: Cengage Learning, 2009.
