Summary
A graphical representation of a baseball player's batting stats, the At Bats-Hits diagram, is shown to reveal a simple linear relation y = hx + c between the number of At Bats, AB (x), and the number of Hits (y). The constant c is related to the number of missing hits and the constant h is the rate of increase of hits as the AB increases (comparable to the marginal tax rate in the tax problem). For Josh Hamilton, who is now believed to be in a slump, the analysis of the game-by-game 2013 stats to date indicates that his BA can be improved to a value between 0.256 and 0.263. His best performance in 2013 is a hitting rate h = Δy/Δx = 4/12 = 0.333 between game 25 (April 29, 2013, with stats (104, 21)) and game 28 (May 2, 2013, with stats (116, 25)). A much higher BA can only be achieved if the slope h (the marginal hitting rate), the intercept c (the baseball work function), or both can be further improved.
*********************************************************
In this article we will discuss the application of the Method of Least Squares to analyze the batting average (BA) of a baseball player. We will also see how it can be used to predict the end-of-season performance of a baseball player based on the first month's stats. We will use the example of the Angels player Josh Hamilton for no reason other than the fact that he is now going through a slump, is in the news, and is at the center of the Angels' first-month woes; see Refs. [1, 2]. The methodology used here is similar to that described in detail in my earlier analysis of the career batting stats of baseball legend Babe Ruth, published recently; see Refs. [3, 4].
The batting stats being analyzed here can be obtained from Yahoo Sports (or some other website of choice); see Ref. [5]. The game-by-game, seasonal, and split-season (i.e., month-by-month) stats are all available. If we analyze the game-by-game stats for Josh Hamilton, or any other baseball player (like the legendary Babe Ruth), we will find (x, y) scores such as (0, 0), (1, 1), (2, 2), (3, 3), (4, 4), where the first number x is the number of At Bats (AB) and the second number y is the number of Hits (H) in the game.
[Figure 1: Josh Hamilton's game-by-game At Bats (x) versus Hits (y), 2010 season. Data source: http://sports.yahoo.com/mlb/players/6679/splits;_ylt=AnBC4BTb9Y_nwJSf_wgJCnKFCLcF?year=2010&type=Batting]
For Babe Ruth, we find all these scores. For Hamilton, I analyzed his best season to date (2010) and found only the scores (0, 0) and (3, 3) in that season. Missing were scores like (1, 1) and (2, 2), which we find for Babe Ruth. Regardless, in these games (other than (0, 0), where the ratio is undefined) the baseball player has the theoretically PERFECT batting average BA = H/AB = 1/1 = 2/2 = 3/3 = 1.000. In other words, y = x + c, where the ratio y/x = BA = 1 and the constant c = 0.

We will also find scores such as (1, 0), (2, 1), (3, 2), (4, 3), (5, 4) and (6, 5). For these games, the law is y = x + c = x - 1, where the constant c = -1 is related to the number of missing hits. The batting average BA = y/x = 1 - (1/x) is less than the perfect value and moves closer to it as the number of AB = x increases. If we continue the analysis, we also find scores like (2, 0), (3, 1), (4, 2), (5, 3), which means y = x + c = x - 2. Again the constant c = -2 reflects the number of missing hits, and the batting average y/x = 1 - (2/x) deviates from the perfect value of 1 by 2/x, a deviation that shrinks with increasing AB.

Josh Hamilton's 2010 season game-by-game performance is illustrated in Figure 1. The batting stats can be described by a series of parallel lines with the general equation y = hx + c, where the slope h = 1 and the intercept c = 0, -1, -2, etc., reflects the number of missing hits. The constant c can be thought of as the baseball work function and is analogous to the idea of a work function used in physics to describe photoelectricity (the production of free electrons from within a metal, using energy in the form of a light source; see Refs. [3, 4]).

Now, if we start aggregating the batting stats on a monthly basis, and by season, and look at the performance of the same player, we find the more general law y = hx + c, where the slope h < 1 and the constant c is related to the skill of the baseball player (how many missing hits on average). This is illustrated in Figures 2 and 3 for Hamilton. In Figure 2 we consider the 2010 season. The batting stats for the season are summarized in Table 1 for convenience. In Figure 3 we consider the career stats to date (2007-2013, with the 2013 season through May 5, 2013, when AB = 125 and H = 26); see Table 2 for the data.
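The game-level relation y = x + c described above can be sketched in a few lines of Python. This is only an illustration: the scores are a handful of the (AB, H) pairs listed in the text, and the helper names `missing_hits` and `game_ba` are hypothetical, not part of the original analysis.

```python
# A few single-game (AB, H) scores of the kind listed in the text.
games = [(3, 3), (4, 3), (5, 3), (4, 2), (3, 1)]

def missing_hits(ab, hits):
    """Intercept c in y = x + c for one game: hits minus at bats (0 or negative)."""
    return hits - ab

def game_ba(ab, hits):
    """Single-game batting average y/x; None when there were no at bats."""
    return hits / ab if ab else None

for ab, hits in games:
    c = missing_hits(ab, hits)
    # With slope h = 1 the batting average can be written as 1 + c/x.
    print(f"AB={ab} H={hits} c={c} BA={game_ba(ab, hits):.3f} 1+c/x={1 + c / ab:.3f}")
```

Games with the same c fall on the same line of slope 1, which is the family of parallels the text describes.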
[Figure 2: Hamilton's monthly At Bats (x) versus Hits (y), 2010 season.]
The individual (x, y) pairs in Figure 2 are the (AB, Hits) scores for each month. The data show an upward trend and a best-fit line can be determined using the method of least squares; see Refs. [6, 7] for more details and a worked example. The equation of the best-fit line is y = 0.375x - 1.181, with the linear regression coefficient r^2 = 0.8779. The BA = y/x = 0.375 - (1.181/x) is less than the slope h of the line and increases with increasing AB. Because c is negative, the slope h of the best-fit line is the maximum BA that the player can approach. At the end of the season AB = x = 518 and H = y = 186, and the batting average BA = y/x = 186/518 = 0.359, Hamilton's best BA to date. The slope h and the intercept c are related to the skill of the baseball player, and the scatter is related to the consistency achieved by the player on a month-to-month basis; see also the discussion of Babe Ruth's batting stats in Refs. [3, 4].

The same trends are also observed when we aggregate the data on a seasonal basis. Each (x, y) pair in Figure 3 represents the data for a single season (with a partial season for 2013). The same upward trend is again observed and the best-fit line through the data has the equation y = hx + c = 0.337x - 15.805. The BA = y/x = 0.337 - (15.805/x) increases with increasing At Bats x and will approach the limiting value h = 0.337, the slope of the line, if the current performance continues. The (x, y) pair for the 2010 season falls above the best-fit line and thus gives the highest single-season BA of his career to date, 0.359.
However, statistically speaking, the BA that Hamilton can achieve is the limiting value of h = 0.337. The BA for the 2013 season is only 0.208 to date.
This also means that Hamilton is capable of improving his batting performance during the remaining months of the 2013 season.
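The limiting behaviour of the BA along a best-fit line can be checked numerically. The sketch below uses the career line quoted in the text, y = 0.337x - 15.805; the function name `batting_average` and the sample At Bats values are hypothetical choices made for illustration.

```python
def batting_average(x, h, c):
    """BA = y/x for a point on the line y = h*x + c (requires x > 0)."""
    return h + c / x

# Career best-fit line quoted in the text: y = 0.337x - 15.805
H_CAREER, C_CAREER = 0.337, -15.805

# Illustrative At Bats values (assumed, not taken from the source tables).
for ab in (100, 300, 600, 5000):
    print(ab, round(batting_average(ab, H_CAREER, C_CAREER), 3))
```

With a negative intercept, each larger value of x gives a BA closer to, but still below, the slope h.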
[Figure 3: Hamilton's seasonal At Bats (x) versus Hits (y), career 2007-2013.]
having the energy E) shines on its surface. The maximum energy of the electron is K = E - W, where E is the energy of the incident light and W is the energy that must be given up to do the work necessary to overcome the forces binding the electron to the metal. The English translations of Einstein's original 1905 paper (written in German) refer to W as the work function. This is exactly like the missing hits in baseball. If every AB produces a Hit, the player will have the PERFECT batting average BA = 1.000. However, because of various factors (associated with the pitcher, pitching speed, even wind speed, differences between stadiums, night versus day, indoors versus outdoors, etc.) some of the AB do not produce Hits. Hence, the BA is less than the perfect value and the slope h < 1. Moreover, the BA is always less than the slope h if c < 0. This depends on the skill of the baseball player. As an example, for Lou Gehrig, in the 1927 season (when Ruth won the home run race and set the single-season record of 60 home runs), the nonzero intercept c > 0, whereas for Babe Ruth c < 0; see Ref. [4]. This difference means that for Ruth, the BA increased with increasing AB, whereas for Gehrig, the BA was decreasing with increasing AB.
At Bats (AB = x):   4   25   32   35   47   72   80   92   108   125
Hits (H = y):       0    4    5    7   11   16   18   21    22    26
For the month of May, through 5/5/13, Hamilton's rate is h = 4/17 = 0.235 (4 hits in 17 At Bats).
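As a minimal sketch of the method of least squares, the At Bats and Hits values listed above can be fitted using deviations from the means, as in the note to Ref. [7]. Treating these ten pairs as the data set to fit is an assumption made here for illustration; the function name `least_squares` is hypothetical, and the result is not claimed to reproduce any figure quoted elsewhere in the article.

```python
# (AB, H) values listed above: 2013 season to date, through May 5, 2013.
at_bats = [4, 25, 32, 35, 47, 72, 80, 92, 108, 125]
hits = [0, 4, 5, 7, 11, 16, 18, 21, 22, 26]

def least_squares(xs, ys):
    """Return slope h, intercept c and r^2 using deviations from the means."""
    n = len(xs)
    xm = sum(xs) / n
    ym = sum(ys) / n
    # Numerator and denominator of h = sum((x-xm)(y-ym)) / sum((x-xm)^2)
    sxy = sum((x - xm) * (y - ym) for x, y in zip(xs, ys))
    sxx = sum((x - xm) ** 2 for x in xs)
    h = sxy / sxx
    c = ym - h * xm  # the best-fit line passes through (xm, ym)
    # r^2 = 1 - sum((y - yb)^2) / sum((y - ym)^2), yb being the fitted value
    ss_res = sum((y - (h * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ym) ** 2 for y in ys)
    return h, c, 1 - ss_res / ss_tot

h, c, r2 = least_squares(at_bats, hits)
print(f"h = {h:.4f}, c = {c:.4f}, r^2 = {r2:.4f}")
```

Plain Python lists are used so that each sum in the formula stays visible; a library routine would give the same line.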
A more detailed discussion of the work function may also be found within the context of other problems such as the airline On-Time arrivals problem, Refs. [8-10], and the Debt-GDP problem, Refs. [11-14], both of which are described by the same mathematical law y = hx + c. In the Airline Quality Rating (AQR) problem, an On-Time arrival is like a hit in baseball and the number of flights operated by the airline is like the number of At Bats. The Debt/GDP ratio, a matter of great concern now and the subject of one of the great economic debates of our times (following the discovery of coding errors in a paper written by two Harvard economists), can also be understood in terms of the idea of a work function, or the nonzero intercept c.
This is described in several articles (I have provided some links below). Notice that Babe Ruth had a negative c, which means that the more the AB the more the hits, and Ruth's BA just keeps increasing the more he plays. For his Yankee teammate, Lou Gehrig, in the same season, the nonzero intercept c was positive, which means the more the AB the lower the BA. No wonder Gehrig lost the home run race in the 1927 season. With this background, do take a look at my analysis of Josh Hamilton as well. Before sending this email, I checked Hamilton's April, May and June stats.

2013        April    May      June
BA          0.204    0.237    0.190
AB          108      97       63
Hits (H)    22       23       12
Cum H       22       45       57
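The monthly rows can be turned into cumulative totals with a short script. The sketch also shows one way to arrive at the slope 0.219 and the intercept -1.625 quoted in the discussion that follows: a straight line through the first and last cumulative points. That two-point construction is an inference from the numbers, since the text does not say how the line was obtained, and the helper name `cumulative` is hypothetical.

```python
# Monthly 2013 values from the table: April, May, June (to date).
monthly_ab = [108, 97, 63]
monthly_h = [22, 23, 12]

def cumulative(values):
    """Running totals of a list of numbers."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out

cum_ab = cumulative(monthly_ab)  # [108, 205, 268]
cum_h = cumulative(monthly_h)    # [22, 45, 57]

# Assumption: the line is drawn through the first and last cumulative points.
h = (cum_h[-1] - cum_h[0]) / (cum_ab[-1] - cum_ab[0])
c = cum_h[0] - h * cum_ab[0]
print(h, c)  # 0.21875 -1.625

# Cumulative BA at the end of each month, for comparison with the slope h.
for ab, hit in zip(cum_ab, cum_h):
    print(ab, hit, round(hit / ab, 3))
```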
The intercept c is negative (c = -1.625). It also appears that he improved his BA in May compared to April, although his June BA to date is lower. But Hamilton's problem is the slope h = 0.219, which is equal to the rate of increase of Hits as AB increases. The slope of the AB-Hits graph is the theoretical maximum BA, since BA = y/x = 0.219 - (1.625/x) will increase as At Bats x increase (because of the negative c) but can never exceed h = 0.219. Hamilton already had a BA of 0.220 at the end of May, marginally above this limit. (This is a minor statistical fluctuation.) In other words, the batting data reveal that Hamilton is all MAXED OUT. There is no way his BA will improve this season unless there is a fundamental change in what he is doing with the bat. Any improvement will require a major change in the nonzero intercept c (the work function has to improve) or the slope h (the hitting rate has to improve, more hits for the same AB).

I see other commentators mention alcoholism and even drugs. When I hear someone say "Weird man", that seems to be a problem too. Hamilton needs some help, some real professional help, and must get off that booze, if it is true. This is coming from a baseball fan who is also a geek as far as baseball stats go. If possible, please pass this message on to Hamilton and urge him to take a hard look in the mirror and think about what he is doing with his personal life off the field.

Thanks and regards.
Very sincerely,
V. Laxmanan

Some articles that you might find of interest (I hope). There are lots of graphs. If you do read the Babe Ruth articles, you will get the hang of it. It is quite simple, really. And a lot better than WAR, WPA, etc. (Sabermetrics stuff).
1. Babe Ruth's 1923 Batting Statistics and Einstein's Work Function, Published April 17, 2013, http://www.scribd.com/doc/136489156/BabeRuth-s-1923-Batting-Statistics-and-Einstein-s-Work-Function
2. Babe Ruth Batting Statistics and Einstein's Work Function, To be Published April 17, 2013, http://www.scribd.com/doc/136556738/BabeRuth-Batting-Statistics-and-Einstein-s-Work-Function
3. The Method of Least Squares: Predicting the Batting Average of a Baseball Player (Hamilton in 2013), Published May 7, 2013, http://www.scribd.com/doc/139924317/The-Method-of-Least-SquaresPredicting-the-Batting-Average-of-a-Baseball-Player-Hamilton-in-2013
4. Miguel Cabrera's Career WAR and Batting Average: An Amazing Correlation, http://www.scribd.com/doc/145839586/Miguel-Cabrera-sCareer-Wins-Above-Replacement-WAR-and-Batting-Average-BA-AnAmazing-Correlation
5. Miguel Cabrera's Wins Above Replacement (WAR) and Batting Average: An Amazing Correlation, Published June 5, 2013, http://www.scribd.com/doc/145839586/Miguel-Cabrera-s-Career-WinsAbove-Replacement-WAR-and-Batting-Average-BA-An-Amazing-Correlation
6. What is the Big Difference Between the Wilson and Cabrera Eras? Published June 3, 2013, http://www.scribd.com/doc/145626322/What-is-the-BigDifference-Between-the-Wilson-and-the-Cabrera-Eras-in-Baseball
7. The Batting Average and Wins Above Replacement (WAR) for all the Batting Leaders in 2013 Season (to date), Published June 6, 2013, http://www.scribd.com/doc/146052658/The-Batting-Average-BA-and-Wins-AboveReplacement-WAR-Relation-for-the-Batting-Leaders-in-the-2013-Season
8. What is Wrong with Ratio Analysis? Baseball Offers an Interesting Example with Wide Applications, Published May 31, 2013, http://www.scribd.com/doc/144798463/What-is-Wrong-With-Ratio-AnalysisBaseball-Offers-an-Interesting-Example-with-Wider-Applications
9. Is Miguel Cabrera on Pace to Break Hack Wilson's Single-Season RBI Record?, Published May 28, 2013, http://www.scribd.com/doc/144083838/Is-Miguel-Cabrera-on-Pace-toBreak-Hack-Wilson-s-Single-Season-RBI-Record-YES-Can-I-Changed-MyMind-on-This-Read-On-Now
10. Trust Me, the Financial World will Change Forever if Wall Street Starts Analyzing Financial Data like we do Baseball Stats, Published May 26, 2013, http://www.scribd.com/doc/143781795/Trust-Me-the-Financial-World-will-changeforever-if-Wall-Street-starts-analyzing-financial-data-like-we-do-baseball-stats-Miguel-Cabrera
Reference List
1. Hamilton at the center of Angels' first month woes, by Alden Gonzalez, May 6, 2013, http://mlb.mlb.com/news/article.jsp?ymd=20130506&content_id=46768790&vkey=news_mlb&c_id=mlb
2. Struggling Hamilton is held out of Angels' starting lineup, by Kevin Baxter, May 5, 2013, http://articles.latimes.com/2013/may/05/sports/lasp-0505-angels-notes-20130505
3. Babe Ruth's 1923 Batting Statistics and Einstein's Work Function, Published April 17, 2013, http://www.scribd.com/doc/136489156/BabeRuth-s-1923-Batting-Statistics-and-Einstein-s-Work-Function
4. Babe Ruth Batting Statistics and Einstein's Work Function, To be Published April 17, 2013, http://www.scribd.com/doc/136556738/BabeRuth-Batting-Statistics-and-Einstein-s-Work-Function
5. Josh Hamilton, Yahoo! Sports, http://sports.yahoo.com/mlb/players/6679/career;_ylt=AonDH5cy3IrM_WMi_1w0IwKFCLcF
6. Legendre, On Least Squares, English translation of the original paper, http://www.york.ac.uk/depts/maths/histstat/legendre.pdf
7. Line of Best-Fit, Least Squares Method, see the worked example given at http://hotmath.com/hotmath_help/topics/line-of-best-fit.html
The formula for h used in this example is actually an approximate one and was used, before the advent of modern computers, since it only involves the determination of x^2 and xy and the sums of all the values of x, y, x^2 and xy. The exact formula is given below, with x_m and y_m denoting the mean (average) values of x and y in the data set, and y_m = h x_m + c, since the best-fit line always passes through the point (x_m, y_m).
h = Σ(x - x_m)(y - y_m) / Σ(x - x_m)^2
Determine the deviations of the individual x and y values from the mean, (x - x_m) and (y - y_m). Determine the products (x - x_m)(y - y_m) and their sum. This gives the numerator in the expression for h. Determine the squares (x - x_m)^2 and their sum. This gives the denominator in the expression for h. This also fixes the intercept c via y_m = h x_m + c. Then, using the regression equation, determine the predicted value y_b on the best-fit line, the vertical deviations (y - y_b) and the squares (y - y_b)^2. The sum of these squares is a minimum. This can be checked by assigning other values for h (using any two points) and allowing the graph to pivot around (x_m, y_m). The regression coefficient r^2 = 1 - { Σ(y - y_b)^2 / Σ(y - y_m)^2 } is a measure of the strength of the correlation between x and y (or y/x versus x). For a perfect correlation, when all points lie exactly on the graph, r^2 = +1.000.
8. Airline Quality Report: An Analysis of On-Time Percentages, Published April 18, 2013, http://www.scribd.com/doc/136760664/Airline-QualityReport-2013-Analysis-of-the-On-Time-Percentages
9. Airline Quality Rating 2013, Purdue University, e-Pubs, April 8, 2013, by Dr. Brent D. Bowen (Purdue University, College of Technology) and Dr. Dean E. Headley (Wichita State University, W. Frank Barton School of Business), http://docs.lib.purdue.edu/aqrr/23/
10. Airline Quality Report 2013: An Analysis of On-Time Percentages, Published April 18, 2013, http://www.scribd.com/doc/136760664/Airline-Quality-Report-2013Analysis-of-the-On-Time-Percentages
11. The Method of Least Squares: The Debt-GDP Relation for the Trillionaire Club of Nations, Published May 4, 2013, http://www.scribd.com/doc/139348541/The-Method-of-Least-SquaresThe-GDP-Debt-Relation-for-the-Trillionaires-Club-of-Nations
12. An MIT Non-Economist's View of the Harvard-UMass Debt/GDP Ratio and Economic Growth Debate, Published April 26, 2013, http://www.scribd.com/doc/138076426/An-MIT-Non-Economist-s-Viewof-the-Harvard-UMass-Debt-GDP-Ratio-and-the-Economic-Growth-Debate
13. Iceland Votes Against Austerity: Analysis of Iceland's Debt-GDP, Published April 28, 2013, http://www.scribd.com/doc/138345921/IcelandVotes-Against-Austerity-Analysis-of-Iceland-s-Debt-GDP-Data-2002-2012
14. A Brief Survey of the Debt-GDP Relations for Some Modern 21st Century Economies, Published May 1, 2013, http://www.scribd.com/doc/138912093/A-Brief-Survey-of-the-DebtGDP-Relationship-for-Some-Modern-21st-Century-Economies
actually have many applications far beyond blackbody radiation studies, where it was first conceived. Einstein's photoelectric law is a simple linear law and was deduced from Planck's non-linear law for describing blackbody radiation. It appears that financial and economic systems can be modeled using a similar approach. Finance, business, economics and management sciences now essentially seem to operate like astronomy and physics before the advent of Kepler and Newton. Finally, during my professional career, I twice had the opportunity and great honor to make presentations to Nobel laureates: first at NASA to Prof. Robert Schrieffer (1972 Physics Nobel Prize), who was the Chairman of the Schrieffer Committee appointed to review NASA's space flight experiments (following the loss of the space shuttle Challenger on January 28, 1986), and second at GM Research Labs to Prof. Robert Solow (1987 Nobel Prize in economics), who was Chairman of the Corporate Research Review Committee, appointed by GM corporate management.