|The Mastersball Projection Process|
|Written by Todd Zola|
|Saturday, 01 January 2011 13:56|
Here at Mastersball, we employ an objective projection engine that translates skills into expected player performance. A common critique of many projection sources is that they fail to take chances or go out on a limb on a player. Our philosophy is that everything that goes into projecting a player’s performance is incorporated into the system, so we are very likely guilty of that as well.
We shy away from employing selective bias. What is true for one is true for all. Or perhaps better stated, what usually happens globally, we assume will happen on an individual basis. As such, we will miss players that perform against the norm. For example, if you believe John Doe is going to have a breakout campaign based on his second half performance, that same thinking should be applied to all players with similar second half success, not just John Doe. What we do is look at these factors, and if warranted, an algorithm is built into the projection model so it accounts for the factor. We will not selectively project John Doe to carry over an increase in skills due to a second half performance while not doing the same for other players. That said, we will provide accompanying profiles which will contain some subjective analysis, pointing out players that we feel may exceed expectations or fall short. But realize these are more gut calls and not numbers based. The projections are entirely numbers driven, with the occasional adjustment made for outlier seasons due to injury or other factors.
Our projections are a series of translations. We normalize each stat line to park-neutral conditions, regress selected skills that involve an element of luck and come up with a weighted average of three years’ worth of data. The weighted average is then age adjusted. Players appearing in the minors for any or all of those three years have those stat lines included after undergoing our in-house minor league equivalency translation, which incorporates the player’s age and level. What results is a park-neutral projection for each player. The final projection is then adjusted for the player’s home park.
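As a rough illustration of the regress-then-weight steps, here is a minimal sketch. The 5/3/2 weights, the 25% regression fraction and the league rate are hypothetical placeholders, not the values used in our engine, and the age adjustment is left out.

```python
def regress(rate, league_rate, fraction):
    """Pull a rate toward the league rate (fraction 0 = no pull, 1 = full pull)."""
    return rate + fraction * (league_rate - rate)


def weighted_rate(rates, weights):
    """Weighted average of per-season rates, most recent season first."""
    if len(rates) != len(weights):
        raise ValueError("need one weight per season")
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)


# Hypothetical park-neutral HR/PA for the last three seasons, newest first.
hr_per_pa = [0.040, 0.030, 0.050]
LEAGUE_HR_PER_PA = 0.030   # assumed league rate
REGRESSION = 0.25          # assumed regression fraction
WEIGHTS = [5, 3, 2]        # assumed season weights

regressed = [regress(r, LEAGUE_HR_PER_PA, REGRESSION) for r in hr_per_pa]
baseline = weighted_rate(regressed, WEIGHTS)
print(round(baseline, 5))
```

Regressing each season before averaging keeps a single lucky or unlucky year from dominating the baseline.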
The entire hitting projection revolves around the unit of plate appearance. It all starts there. We will describe in detail later how the projected number of plate appearances is determined. What follows is a step-wise walk through how each statistic is projected.
Base on Balls – Walk rate is expressed in terms of plate appearances so the first step is extracting the number of walks directly from plate appearances. This leaves us with walks and at bats.
Strikeouts – Contact rate is expressed in terms of at bats so the next step is extracting strikeouts from the at bats.
Home runs – We use home run per plate appearance as the base metric. Since there is a little bit of luck involved with HR/FB, we regress HR/PA a little towards the league average.
Hits – There is also a little luck involved with batting average on balls in play, so we regress a player’s actual BABIP towards his expected BABIP using advanced batted ball data. Then with the regressed BABIP, at bats, home runs and strikeouts, the number of non-homer hits can be determined. We then use the career percentage of singles, doubles and triples within those hits to distribute the non-home run hits.
All of the above are park-normalized using park factors. Hits and homers are applied according to lefty hitter, righty hitter or switch hitter, while doubles, triples, walks and strikeouts are the same from either side of the plate since that is how the data is available. Park adjustments for minor league parks are not so rigid since the data is not as robust.
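The walk-through from plate appearances down to hits can be sketched as follows. This is a simplified illustration, not the engine itself: it ignores hit-by-pitch and sacrifices (so at bats are just plate appearances minus walks), skips the park normalization, and all input rates are made-up numbers.

```python
def project_hitting_line(pa, bb_rate, contact_rate, hr_per_pa, babip, hit_shares):
    """Walk a plate-appearance total down to a basic hitting line.

    Simplifying assumptions (not from the article): HBP and sacrifices are
    ignored, so AB = PA - BB and balls in play = AB - K - HR.
    """
    walks = pa * bb_rate                          # walk rate is per PA
    at_bats = pa - walks
    strikeouts = at_bats * (1.0 - contact_rate)   # contact rate = (AB - K) / AB
    home_runs = pa * hr_per_pa                    # HR rate is per PA
    balls_in_play = at_bats - strikeouts - home_runs
    non_hr_hits = babip * balls_in_play           # uses the regressed BABIP

    line = {"PA": pa, "BB": walks, "AB": at_bats, "K": strikeouts, "HR": home_runs}
    # Split non-homer hits by the career share of singles, doubles and triples.
    for hit_type, share in hit_shares.items():
        line[hit_type] = non_hr_hits * share
    line["H"] = non_hr_hits + home_runs
    line["AVG"] = line["H"] / at_bats
    return line


# Hypothetical hitter on the 500 PA basis.
example = project_hitting_line(
    pa=500, bb_rate=0.10, contact_rate=0.80, hr_per_pa=0.04, babip=0.300,
    hit_shares={"1B": 0.70, "2B": 0.27, "3B": 0.03},
)
print({stat: round(value, 3) for stat, value in example.items()})
```

With these inputs the hitter ends up with roughly 50 walks, 450 at bats, 90 strikeouts, 20 homers and 122 hits.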
Stolen bases – The percentage of times the hitter attempts a steal is determined based on how often and in what manner he gets on base. The number of attempted steals is then calculated using the projected data. The stolen base success rate is then applied to this number of attempts to determine the number of steals and times caught stealing.
Runs – An in-house algorithm is used to determine the expected number of runs based on how often a runner gets on base, how he gets on base and his stolen base data. This is determined for each season and compared to how many runs the player actually scored, and a normalization factor is determined. The expected number of runs is then determined from the projected stats and corrected with the normalization factor. The normalization factor helps account for a couple of different issues, including strength of the team’s lineup and position in the batting order. If a player’s situation has changed from the previous season, we can change the normalization factor accordingly.
RBI – Runs batted in are calculated the same way as runs, just using a different algorithm to determine expected RBI.
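The normalization step for runs and RBI can be illustrated with a short sketch. The expected-runs algorithm itself is in-house and not reproduced here, so the expected totals below are simply hypothetical inputs, and the function names are ours for illustration only.

```python
def normalization_factor(actual, expected):
    """Ratio of what the player actually produced to what the skills implied."""
    if expected <= 0:
        raise ValueError("expected total must be positive")
    return actual / expected


def corrected_projection(expected_from_projection, factor):
    """Apply the historical normalization factor to the projected expectation."""
    return expected_from_projection * factor


# Hypothetical player: scored 88 runs when the formula expected 80.
factor = normalization_factor(actual=88, expected=80)
# The same formula applied to his projected stats expects 75 runs.
projected_runs = corrected_projection(75, factor)
print(round(factor, 3), round(projected_runs, 1))
```

A factor above 1 suggests the player benefits from his lineup or batting order spot, which is why it may be changed by hand when his situation changes.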
The end result is a stat line based on 500 plate appearances. After inputting the player’s team and projected plate appearances, the stat line is adjusted to the home park and prorated to the number of projected plate appearances. Something that should be pointed out is that the nature of this system results in non-whole number projections. In the past, we have rounded these to the nearest integer, but this season, though we are expressing the stats in non-decimal form, the actual number is used for the value calculation. This will explain any discrepancy if you do your own manipulations and calculations and find slightly different slash numbers.
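Here is a small sketch of the proration and the display-versus-valuation distinction. The park factor and stat values are hypothetical, and applying a single multiplicative factor per stat is an assumption made for illustration.

```python
BASE_PA = 500  # the neutral stat line is built on 500 plate appearances


def scale_to_playing_time(base_line, projected_pa, park_factors=None):
    """Apply per-stat park factors, then prorate counting stats to projected PA."""
    park_factors = park_factors or {}
    scale = projected_pa / BASE_PA
    return {
        stat: value * park_factors.get(stat, 1.0) * scale
        for stat, value in base_line.items()
    }


def display_line(line):
    """Whole numbers for display only; valuation should use the unrounded line."""
    return {stat: round(value) for stat, value in line.items()}


# Hypothetical hitter: 20 HR and 50 BB per 500 PA, projected for 620 PA
# in a park that boosts homers by 5%.
full = scale_to_playing_time({"HR": 20, "BB": 50}, 620, {"HR": 1.05})
shown = display_line(full)
print(full, shown)
```

The unrounded homer total here is about 26.04, which displays as 26. Small differences like that are the source of the discrepancies mentioned above.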
PLAYING TIME DETERMINATIONS
Our playing time determination is as rigorous as any in the industry. First, a playing time grid is determined, assigning the percentage each player will play each position. This sets the total expected playing time for each player. Then a batting order grid is determined, assigning each player a percentage for each lineup spot that totals his playing time percentage. Since some teams accrue more plate appearances than others, and since players at the top of the order come to the plate more often than those at the bottom, it is not fair to assign the same number of plate appearances to a Kansas City Royal projected for 50% playing time batting 9th as to a New York Yankee projected for 50% playing time who leads off. Our model employs a 3-year average for each of the team’s positions in the batting order as a basis for the calculation. In addition, the number of times each team pinch hits is determined and used in the calculation, as the percentage of pinch hit appearances is also projected per player. The final result is that a team’s total number of plate appearances is logically based on its 3-year history.
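The batting order grid can be sketched like this. The per-slot plate appearance totals are invented for illustration and are not actual 3-year averages for either club.

```python
def projected_plate_appearances(slot_shares, team_slot_pa,
                                pinch_share=0.0, team_pinch_pa=0.0):
    """Sum the player's share of each lineup slot, plus his share of pinch-hit PA.

    slot_shares:  {lineup slot: fraction of the team's games in that slot};
                  the fractions add up to the player's playing time percentage.
    team_slot_pa: {lineup slot: team's average PA from that slot}.
    """
    lineup_pa = sum(share * team_slot_pa[slot] for slot, share in slot_shares.items())
    return lineup_pa + pinch_share * team_pinch_pa


# Hypothetical 3-year average PA by lineup slot (only the slots we need).
yankees_slot_pa = {1: 760, 9: 640}
royals_slot_pa = {1: 730, 9: 610}

yankee_leadoff = projected_plate_appearances({1: 0.50}, yankees_slot_pa)
royal_ninth = projected_plate_appearances({9: 0.50}, royals_slot_pa)
print(yankee_leadoff, royal_ninth)
```

Under these assumed totals, two players with identical 50% playing time differ by 75 plate appearances.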
Pitching projections are done a little differently as the basis is more skills oriented. Also, home run rate and BABIP are regressed to the league average significantly more than hitters’ since hurlers tend to cluster around league average. They are not completely regressed, so if a pitcher indeed exhibits some measure of control over these luck-involved stats, it is still represented in the projection. And since most pitchers are indeed very close to league average, the adjustment is minimal. All pitchers’ counting stats are normalized to 200 innings for the park-adjusted, neutral stats.
Strikeout rate and walk rates – There are no regression adjustments to K/9 and BB/9, only whatever aging corrections are appropriate.
Home run rate – Each pitcher’s HR/9 is regressed towards the league average, with the depth of the regression adjusted based on the percentage of fly balls the pitcher historically allows.
Hit rate – The number of non-homer base hits is determined based on the projected number of strikeouts, homers, regressed BABIP and batters faced. Since GB pitchers allow more hits on balls in play than FB pitchers, their regression is stronger.
WHIP – Straightforward calculation based on the projected walks and hits.
ERA – An in-house expected ERA is determined, based on the above projected skills. Just like runs and RBI, the expected ERA is determined from the pitcher’s actual stats and a correction factor is determined, which is then applied to the projected expected ERA. This correction helps account for team defense and bullpen. If a pitcher’s situation has changed, we may alter this factor to account for the new situation.
Wins – There is no perfect means of projecting wins as it is too much of a team-dependent statistic. What we do is use the pitcher’s ERA and the team’s historical number of non-earned runs allowed to determine how many runs per 9 innings each pitcher allows. The level of run support is then determined from the hitting projections. We then use the Bill James Pythagorean Theorem to determine the hurler’s expected winning percentage. The number of decisions is prorated to the number of innings pitched, and the won-loss record is then calculated. There are obvious inefficiencies in this method, but it is logical and objective and sets reasonable expectations. It should be mentioned that the expected runs calculation is done after the park correction.
Saves – While the number of saves is subjective, we make the team total logical using the expected team’s wins and historical percentage of wins saved as a target number of saves per team.
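The wins and saves logic can be sketched as below. The exponent of 2 is the classic Bill James form and is an assumption here, since the exact exponent is not stated above. The decisions-per-inning rate, the run totals and the share of wins saved are likewise hypothetical.

```python
def runs_allowed_per_nine(era, unearned_per_nine):
    """Total runs per 9 innings: earned (ERA) plus the team's unearned rate."""
    return era + unearned_per_nine


def pythagorean_win_pct(run_support, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed per 9 innings."""
    rs = run_support ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def won_loss(innings, decisions_per_inning, win_pct):
    """Prorate decisions to innings, then split them by expected win percentage."""
    decisions = innings * decisions_per_inning
    wins = decisions * win_pct
    return wins, decisions - wins


def team_save_target(team_wins, share_of_wins_saved):
    """Target number of saves to distribute among a team's relievers."""
    return team_wins * share_of_wins_saved


# Hypothetical starter: 3.60 ERA, 0.40 unearned runs per 9, 5.0 runs of support.
ra9 = runs_allowed_per_nine(3.60, 0.40)
win_pct = pythagorean_win_pct(5.0, ra9)
wins, losses = won_loss(innings=200, decisions_per_inning=0.12, win_pct=win_pct)
saves = team_save_target(team_wins=90, share_of_wins_saved=0.5)
print(round(win_pct, 3), round(wins, 1), round(losses, 1), saves)
```

With these inputs the pitcher projects to a winning percentage of about .610 over 24 decisions.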
The final projection is then determined using the park factor and projected number of innings.
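To make the strikeout, walk, hit and WHIP steps concrete, here is a simplified sketch built on the 200-inning basis and then scaled to projected innings. It ignores hit batters and park factors, and the batters-faced figure and all rates are hypothetical inputs.

```python
BASE_IP = 200  # neutral pitching lines are normalized to 200 innings


def pitching_line(k_per_9, bb_per_9, hr_per_9, babip,
                  batters_faced_per_200, projected_ip):
    """Build a neutral 200-inning line, then prorate it to projected innings.

    Simplifying assumption (not from the article): balls in play are
    batters faced minus strikeouts, walks and homers (no HBP).
    """
    strikeouts = k_per_9 * BASE_IP / 9
    walks = bb_per_9 * BASE_IP / 9
    home_runs = hr_per_9 * BASE_IP / 9
    balls_in_play = batters_faced_per_200 - strikeouts - walks - home_runs
    hits = babip * balls_in_play + home_runs     # uses the regressed BABIP
    whip = (walks + hits) / BASE_IP              # a rate, so it does not scale

    scale = projected_ip / BASE_IP
    return {
        "IP": projected_ip,
        "K": strikeouts * scale,
        "BB": walks * scale,
        "HR": home_runs * scale,
        "H": hits * scale,
        "WHIP": whip,
    }


# Hypothetical starter projected for 180 innings.
line = pitching_line(k_per_9=8.1, bb_per_9=2.7, hr_per_9=0.9, babip=0.300,
                     batters_faced_per_200=830, projected_ip=180)
print({stat: round(value, 3) for stat, value in line.items()})
```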
INNINGS PITCHED DETERMINATION
A total of 162 games started are assigned per team. The historical IP per start for each starter is used to project the number of innings. The innings from starters are totaled and reliever innings are subjectively assigned to bring the team total to 1458 (9 x 162).
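The innings bookkeeping is simple arithmetic, sketched here with a made-up rotation. The pitcher names, start counts and innings per start are hypothetical.

```python
TEAM_STARTS = 162
TEAM_INNINGS = 9 * TEAM_STARTS  # 1458


def starter_innings(rotation):
    """rotation: list of (name, games_started, innings_per_start) tuples."""
    total_starts = sum(starts for _, starts, _ in rotation)
    if total_starts != TEAM_STARTS:
        raise ValueError(f"expected {TEAM_STARTS} starts, got {total_starts}")
    return {name: starts * ip_per_start for name, starts, ip_per_start in rotation}


def bullpen_innings(starters):
    """Innings left over for relievers after the starters are accounted for."""
    return TEAM_INNINGS - sum(starters.values())


rotation = [
    ("Starter A", 33, 6.5),
    ("Starter B", 32, 6.0),
    ("Starter C", 32, 6.0),
    ("Starter D", 30, 5.5),
    ("Starter E", 35, 5.5),
]
starters = starter_innings(rotation)
relief_pool = bullpen_innings(starters)
print(starters, relief_pool)
```

In this example the rotation covers 956 innings, leaving 502 to hand out to the bullpen.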
Even though the playing time determinations are based on logical parameters, we do not force each team to completely account for every plate appearance or inning if we do not feel there is someone on the roster to account for the plate appearance or inning. We would rather carry a player we call OTHER than give excessive playing time to somebody.
We make sure the projections make sense globally in a couple of respects, but are willing to sacrifice global accuracy for individual accuracy in another couple of instances. Historically, 95% of a team’s runs have an associated RBI. One global check we do is adjusting each team’s total to reflect this. If necessary, this is accomplished by altering selected players’ runs and/or RBI normalization factors.
Something that will not be perfect globally is the total number of projected wins equaling the total number of projected losses. The reason for this is that the wins and losses are based on projected runs scored. The number of projected runs scored is likely greater than the eventual number, as good players are going to get injured and be replaced by lesser players. This artificially raises the expected run support for each pitcher, elevating their win total. Similarly, good pitchers are going to get hurt and be replaced by lesser pitchers, which will add more losses. We feel it is better to reflect the wins on an individual player basis than to make it perfect globally. This same line of thinking can be used to explain why the total number of projected hits, homers and runs scored by the hitters is not the same as the projected number allowed by the pitchers.
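The team-level RBI check amounts to comparing two totals. The sketch below computes the uniform scale that would bring a team's RBI to 95% of its runs. In practice only selected players' factors are altered, so a uniform scale is a simplification, and the team totals are hypothetical.

```python
RBI_PER_RUN = 0.95  # historical share of runs with an associated RBI


def rbi_scale_factor(team_runs, team_rbi, target_share=RBI_PER_RUN):
    """Multiplier that brings team RBI to the target share of team runs."""
    if team_rbi <= 0:
        raise ValueError("team RBI must be positive")
    return target_share * team_runs / team_rbi


# Hypothetical team whose hitters project to 800 runs and 800 RBI.
scale = rbi_scale_factor(team_runs=800, team_rbi=800)
adjusted_team_rbi = 800 * scale
print(round(scale, 3), round(adjusted_team_rbi, 1))
```

A scale below 1 indicates the projected RBI are too high relative to runs, and the normalization factors of selected players would be trimmed accordingly.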
|Last Updated on Saturday, 24 November 2012 12:40|