Last week, we began a series on some of the newfangled statistics, or new ways of looking at standard stats. Today, we are going to take our old friend BABIP to the next level.
I wonder if Voros McCracken knew what monster he was creating when he first introduced DIPS theory to the baseball populace. Originally, the primary conclusion was that pitchers have limited control over the fate of a ball in play. Be it Pedro Martinez or Pedro Feliciano, when round bat met round ball, there was about a 30% chance the ball in play would be a hit.
Next, the concept was applied to hitters, and it was observed that each hitter develops his own baseline BABIP.
Soon thereafter, data collection expanded to classify batted balls as line drives, fly balls and grounders, which helped explain why batters have differing BABIPs. Line drives result in hits most frequently, about 71% of the time. Ground balls follow, with around 24% becoming hits. Fly balls bring up the rear, landing safely only 15% or so of the time. So hitters producing more line drives carry a higher BABIP, and if two hitters have similar line drive rates, the one hitting more grounders sports a slightly higher BABIP.
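As a rough illustration of why batted ball mix drives BABIP, the sketch below weights the approximate rates quoted above (71% for liners, 24% for grounders, 15% for flies) by a hitter's batted ball distribution. The two hitter profiles and the `expected_babip` helper are hypothetical, and flat league rates ignore speed, contact quality and park, so treat the outputs as ballpark figures only.

```python
# Approximate hit rates on balls in play by batted ball type,
# using the rounded figures quoted in the text.
HIT_RATE = {"line_drive": 0.71, "ground_ball": 0.24, "fly_ball": 0.15}


def expected_babip(mix: dict[str, float]) -> float:
    """Weight each batted ball type's hit rate by a hitter's share of balls in play.

    `mix` maps each batted ball type to its share of balls in play; shares must sum to 1.
    """
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"batted ball shares must sum to 1, got {total}")
    return sum(share * HIT_RATE[kind] for kind, share in mix.items())


# Two hypothetical hitters with the same 20% line drive rate but different GB/FB splits.
grounder_hitter = {"line_drive": 0.20, "ground_ball": 0.50, "fly_ball": 0.30}
fly_ball_hitter = {"line_drive": 0.20, "ground_ball": 0.30, "fly_ball": 0.50}

print(f"{expected_babip(grounder_hitter):.3f}")  # 0.142 + 0.120 + 0.045 = 0.307
print(f"{expected_babip(fly_ball_hitter):.3f}")  # 0.142 + 0.072 + 0.075 = 0.289
```

With identical line drive rates, moving 20 points of batted balls from fly balls to grounders lifts the estimate by about 18 points of BABIP, which is the pattern described above.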
With respect to pitchers, correlation studies suggested that line drive rate was random, but that pitchers did influence whether batters hit the ball on the ground or in the air. Thus, ground ball hurlers generally carry a higher BABIP than fly-ballers.
The focus of today’s story is the next iteration of batted ball data: the further classification of each of the three batted ball types as soft, medium or hard, according to how forcefully the ball was struck. It should also be noted that, until now, the bulk of this kind of data has been collected subjectively. The next major advancement, which is still a work in progress, is electronically capturing the speed and trajectory of each batted ball, which will eventually make classification an objective endeavor.
In the interest of full disclosure, I only have data classified as hard and soft for my personal use. I will present some analysis using that information and will reference some more detailed studies incorporating medium hit data.
Let’s cut right to the chase. While this may appear to be right from the Mr. Obvious archives (long time reader, first time poster, if you get that give me a “right on” in the comments), the BABIP of hard hit balls is far higher than that of soft hit balls. Furthermore, some hitters consistently hit the ball harder than others, which helps explain their different BABIP baselines beyond line drive rate alone.
As alluded to above, this data can be further refined by including medium hit balls. Since I don’t have access to these numbers for my own research, I am only comfortable referencing work that I have heard discussed in public arenas. This part may not be as intuitive, but of the three classifications, medium hit balls exhibit the lowest BABIP. The original study was published by Patrick Davitt of Baseball HQ (a subscription service, so sorry, no link available), but I have heard and read similar analysis. This is just personal conjecture, but the results make sense. A softly hit grounder is more likely to be beaten out than one of the medium hit variety. Likewise, a softly hit fly ball requires a longer run to be caught than a medium one, which might explain why its BABIP is higher than that of a medium fly ball.
What I do have is data for hard and soft hit balls, which is presented in the table below for the 2011 and 2010 campaigns:
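| Season and batted ball type | BABIP (all) | BABIP (hard hit) | BABIP (soft hit) |
|---|---|---|---|
| 2011 Line Drive | 0.714 | 0.730 | 0.683 |
| 2011 Ground Ball | 0.238 | 0.576 | 0.185 |
| 2011 Fly Ball | 0.139 | 0.404 | 0.045 |
| 2010 Line Drive | 0.708 | 0.723 | 0.691 |
| 2010 Ground Ball | 0.239 | 0.552 | 0.187 |
| 2010 Fly Ball | 0.158 | 0.426 | 0.062 |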
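To put a number on how much contact quality matters within each batted ball type, the short script below copies the table's values and computes the hard-minus-soft BABIP gap along with the hard/soft ratio. It reads the three columns as all, hard hit and soft hit balls in play; the `TABLE` dictionary and `hard_soft_gap` helper are illustrative names, not part of any existing tool.

```python
# BABIP by season and batted ball type, copied from the table above.
# Tuple order is read as (all, hard hit, soft hit).
TABLE = {
    (2011, "Line Drive"): (0.714, 0.730, 0.683),
    (2011, "Ground Ball"): (0.238, 0.576, 0.185),
    (2011, "Fly Ball"): (0.139, 0.404, 0.045),
    (2010, "Line Drive"): (0.708, 0.723, 0.691),
    (2010, "Ground Ball"): (0.239, 0.552, 0.187),
    (2010, "Fly Ball"): (0.158, 0.426, 0.062),
}


def hard_soft_gap(season: int, kind: str) -> float:
    """Return hard hit BABIP minus soft hit BABIP, rounded to three decimals."""
    _, hard, soft = TABLE[(season, kind)]
    return round(hard - soft, 3)


for (season, kind), (_, hard, soft) in TABLE.items():
    print(f"{season} {kind:<11}  gap={hard_soft_gap(season, kind):.3f}  "
          f"hard/soft={hard / soft:.1f}x")
```

On these numbers the gap is small for line drives (under 0.05 in both seasons) but roughly 0.36 to 0.39 for grounders and fly balls, so contact quality appears to carry the most information outside of liners.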
As suggested, BABIP can be refined by using the three classifications of batted balls. But this does not really address which component of BABIP is in fact a player’s skill. This matters because skill should not be regressed; only what is out of the player’s control, be it a hitter or a pitcher, should be regressed.
Thinking about this anecdotally, what if a pitcher had the skill to induce weak contact? This makes intuitive sense: if a pitcher has the ability to make a batter swing and miss, he should also have the ability to induce weaker contact. So, instead of regressing a pitcher’s BABIP in total, or even by the three hit classifications, if only the percentage of hard hit balls allowed by each pitcher were regressed (since intuitively a pitcher has less control over the fate of a hard hit ball), a more precise BABIP for each pitcher would be possible.
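The text leaves the mechanics open, so here is one possible reading, sketched under stated assumptions: the pitcher's share of hard contact is kept as observed (treated as skill), and only his BABIP on hard hit balls, the outcome the text argues he controls least, is shrunk toward a league rate. The `pitcher_babip` function, the two-tier split (hard versus not hard), the regression weight and all input numbers are hypothetical placeholders, not measured values.

```python
def pitcher_babip(
    share_hard: float,
    babip_hard_obs: float,
    babip_not_hard_obs: float,
    league_babip_hard: float,
    weight_on_observed_hard: float = 0.0,
) -> float:
    """Blend a pitcher's BABIP from two contact tiers (hypothetical model).

    share_hard: share of balls in play that were hard hit, kept as observed (skill).
    babip_not_hard_obs: observed BABIP on softer contact, kept as observed here.
    babip_hard_obs: observed BABIP on hard contact, shrunk toward league_babip_hard;
        weight_on_observed_hard = 0 means the hard tier is fully regressed.
    """
    if not 0.0 <= share_hard <= 1.0:
        raise ValueError("share_hard must be between 0 and 1")
    w = weight_on_observed_hard
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight_on_observed_hard must be between 0 and 1")
    hard_rate = w * babip_hard_obs + (1 - w) * league_babip_hard
    return share_hard * hard_rate + (1 - share_hard) * babip_not_hard_obs


# Hypothetical pitcher: 30% hard contact, unlucky .600 BABIP on it, .240 on the rest,
# against a placeholder league hard hit BABIP of .500.
print(f"{pitcher_babip(0.30, 0.600, 0.240, 0.500):.3f}")       # 0.318, hard tier fully regressed
print(f"{pitcher_babip(0.30, 0.600, 0.240, 0.500, 1.0):.3f}")  # 0.348, nothing regressed
```

Keeping `share_hard` untouched is the design choice that lets a weak-contact pitcher retain his edge, while the bad luck on his hard hit balls is washed out.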
Moving on to hitters, the same line of thinking holds true, but refining what is and is not regressed is even more important, since hitters’ BABIPs vary far more around the norm than those of hurlers. Let’s first think in general terms. A faster player will beat out more weak and medium hit grounders than a slower player, while a hard hit grounder has as good a chance to get through regardless of who hit it. But is that really the case? Don’t infielders cheat in when a speedy guy is up, in an effort to cut down on infield hits? This may prevent some infield hits but also allow more medium and hard hit balls to sneak through. And don’t infielders play back with a slow runner up, increasing their range?
A similar thought process can be applied to balls hit in the air. Intuitively, the outfield cheats in when a non-power hitter is at the plate. This could result in fewer soft and medium fly balls (and line drives) landing safely. On the other hand, more hard hit balls in the air may fall, especially those hit over the drawn-in outfielders. When a power hitter is up, a greater percentage of his weak and medium hit flies and liners may land safely, since the outfielders are playing deeper. Of course, the deeper alignment also reduces the number of hard hit fly balls that land.
Putting this all together, it is clear to me why each hitter hovers around his own BABIP, and not simply because of his line drive, fly ball and ground ball distribution. For a speedy hitter, BABIP on ground balls could be partly skill, whereas it is entirely luck for a lumbering power guy. That is, what is a skill for a speedster may not be for a power hitter, and vice versa.
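The idea that the same component can be skill for one hitter and luck for another can be sketched with a per-hitter skill weight on ground ball BABIP: 1.0 keeps the hitter's full deviation from league average, 0.0 regresses it entirely. The `Hitter` class, the weights and both hitter lines are invented for illustration, since estimating such weights for real players is exactly the open problem described here; the 0.238 league figure is the 2011 ground ball BABIP from the table.

```python
from dataclasses import dataclass

LEAGUE_GB_BABIP = 0.238  # 2011 ground ball BABIP (all balls) from the table


@dataclass
class Hitter:
    name: str
    gb_babip_obs: float     # observed BABIP on ground balls
    gb_skill_weight: float  # hypothetical: share of the deviation treated as skill


def projected_gb_babip(h: Hitter, league: float = LEAGUE_GB_BABIP) -> float:
    """Keep the skill share of the hitter's deviation from league, regress the rest."""
    if not 0.0 <= h.gb_skill_weight <= 1.0:
        raise ValueError("gb_skill_weight must be between 0 and 1")
    return league + h.gb_skill_weight * (h.gb_babip_obs - league)


speedster = Hitter("hypothetical speedster", gb_babip_obs=0.300, gb_skill_weight=0.5)
slugger = Hitter("hypothetical slugger", gb_babip_obs=0.260, gb_skill_weight=0.0)

for h in (speedster, slugger):
    print(f"{h.name}: {projected_gb_babip(h):.3f}")
# hypothetical speedster: 0.269
# hypothetical slugger: 0.238
```

A global treatment amounts to giving every hitter the same `gb_skill_weight`; the per-hitter field is what would let a speedster and a slugger be handled differently.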
All this sounds well and good, but there are still two huge issues to resolve before any significant results can be determined. First, enough data breaking down each hit type needs to be recorded to eliminate randomness from the equation, and “enough” here is measured in years. Second, there is presently too much subjectivity in the classification of batted balls. As mentioned, data is now being collected electronically, but the bad news is that this electronic collection is fairly recent. Subjective batted ball data has been collected for about 10 years, and the hard/medium/soft label for only a couple. The good news is that the added precision of electronically collected data will reduce the sample necessary to produce significant results, since part of the need for a large sample was to wash out collection bias.
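The sample size point can be made concrete with a common simplified regression-to-the-mean model, which is an assumption here rather than anything the text specifies: after n balls in play, the weight on a player's observed rate is n / (n + k), where k is the ratio of per-ball noise variance (outcome randomness plus any classification error) to the spread of true talent across players. Solving for n gives the sample needed for a target reliability, and shows that trimming collection noise, which shrinks k, shrinks that sample proportionally. The `K_SUBJECTIVE` and `K_ELECTRONIC` values are hypothetical.

```python
def reliability(n: float, k: float) -> float:
    """Weight on observed data after n balls in play under the n / (n + k) model."""
    return n / (n + k)


def sample_needed(target: float, k: float) -> float:
    """Balls in play needed so that n / (n + k) equals `target` (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must be strictly between 0 and 1")
    return k * target / (1.0 - target)


# Hypothetical k values: subjective stringer data adds measurement noise,
# which inflates the noise variance and therefore k.
K_SUBJECTIVE = 1200.0
K_ELECTRONIC = 800.0

for label, k in (("subjective", K_SUBJECTIVE), ("electronic", K_ELECTRONIC)):
    print(f"{label}: {sample_needed(0.5, k):.0f} balls in play for 50% reliability")
# subjective: 1200 balls in play for 50% reliability
# electronic: 800 balls in play for 50% reliability
```

At 50% reliability n simply equals k, which makes the proportional effect of lower noise easy to see.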
For those curious, I used the above BABIP data in my projection engine in an effort to refine what is regressed, but most of the incorporation was global in nature. That is, Juan Pierre received the same treatment as David Ortiz.
Actually, that brings up a third problem with this intended study: accounting for how defense, specifically player positioning, plays a role in all of this. But that’s a story for another day, as improvements in defensive metrics will be covered as a standalone topic in this series.