Tuesday, March 4, 2014
Give WAR a Chance
Mike Trout is only 22 years old, and he is the best baseball player in the world. The Los Angeles Angels of Anaheim recently signed Trout to a one-year, $1 million contract, and are expected to announce later this spring that they have signed Trout to a long-term deal that makes him the highest-paid player, per season, in the history of Major League Baseball.
In each of the last two seasons, Trout has finished second in AL MVP voting, behind Miguel Cabrera. Because both players had terrific seasons, disagreements arose about who was the more deserving candidate. Cabrera's greatness was easier for fans to understand, because of the way it showed up in the statistics. Stats are about simplification: we can organize a 162-game season into a few numbers whose meanings we understand. Analytically minded fans have taken that one step further, with an all-in-one stat called Wins Above Replacement, or WAR. Many fans and journalists are skeptical of WAR, preferring traditional stats, but I'll explain why WAR makes sense.
Say you have two MVP candidates. One of them batted .310, with 45 home runs and 120 runs batted in. The other hit .325, with 40 HR and 110 RBI. Those are both great seasons. How do you balance one player's advantage in batting average with the other's edge in HR and RBI? But let's say one of those players stole 40 bases. Okay, now we've got a winner. But wait. The other guy hit 50 doubles and walked 100 times. Suppose one of them was a shortstop, or a Gold Glover? What if one of them plays in Colorado? There are actually a lot of variables we should consider. It gets to be a lot of work, taking everything into account.
WAR is one number. You don't have to do the math, because it's done. You just need a system you can trust. I'm going to explain why you can trust WAR. I'll use Trout and Cabrera as examples.
In 2012, Miguel Cabrera won the American League's Triple Crown. He batted .330/.393/.606, with 44 HR and 139 RBI. Mike Trout, a rookie who wasn't called up until a month into the season, hit .326/.399/.564, with 30 HR and 83 RBI. Three basic factors pushed Trout's WAR ahead of Cabrera's: park effects, baserunning, and defense.
1. Park Effects
Anaheim is a pitcher's park. If you set aside home games and look only at road games, Trout's batting average, on-base percentage, and slugging percentage were all better than Cabrera's.
2. Baserunning
In 2012, Miguel Cabrera stole 4 bases. Mike Trout stole 49. When you consider taking an extra base as a runner, things like going first-to-third on a single, Trout has another 23 bases up on Cabrera. On beating the throw to first on a double play opportunity, Trout's advantage is 18-21 bases, depending on whether you count fielder's choices (+18) or double plays grounded into (+21). Altogether, that's at least 86 bases Trout advanced that Cabrera didn't.
3. Defense
Cabrera was not a good defensive third baseman. Mike Trout is a very good defensive outfielder, and in 2012 he was one of the best center fielders in baseball. Trout's four home run robberies led the majors.
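The 2012 baserunning tally is easy enough to check by hand, but here is a minimal Python sketch of the arithmetic. The dictionary keys are labels made up for illustration; the numbers are the ones quoted above, using the more conservative fielder's-choice count (+18) for beating the throw to first.

```python
# Trout's 2012 baserunning edge over Cabrera, measured in bases.
# Figures come from the text; the key names are illustrative only.
trout_edge_2012 = {
    "stolen_bases": 49 - 4,      # 49 steals for Trout, 4 for Cabrera
    "extra_bases_taken": 23,     # first-to-third on a single and similar plays
    "beat_throw_to_first": 18,   # conservative FC count; the GIDP count would be 21
}

total_edge = sum(trout_edge_2012.values())
print(total_edge)  # 86
```

Swapping in the GIDP count (21) would push the total to 89, which is why the text says "at least" 86.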
The biggest misconception about WAR is that it's trying to tell you something new. It's not. Rather, WAR organizes and simplifies what we already know. You don't need any help from Bill James to understand basic stuff like home/road splits, speed on the basepaths, or fielding. Fans and managers understood those ideas a century ago. WAR just puts it all together so that you don't need to compute park effects or look up baserunning stats.
The home/road data — and our knowledge of park factors — show that the only significant difference between Trout and Cabrera at the plate was playing time. Their averages were nearly equal, but Trout spent April in AAA. Trout had a significant baserunning advantage, worth roughly 10-15 runs, or about 1-1.5 WAR. The defensive valuations favor Trout by a little more than 20 runs, or roughly 2 WAR. I think most fans would agree that 20 runs seems roughly in line with their estimation of the difference between a good center fielder and a below-average third baseman — or more specifically, the defensive difference between Trout and Cabrera in 2012. Every individual component of the formula is fairly intuitive.
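The runs-to-wins conversions in that paragraph all lean on the same rule of thumb: roughly 10 runs per win. Here is a small sketch of that conversion. The constant and function name are assumptions for illustration, and the actual runs-per-win figure used by Baseball Reference or FanGraphs shifts a bit from season to season.

```python
# Rule of thumb assumed here: about 10 runs equal one win.
# The real conversion varies by season and by WAR system.
RUNS_PER_WIN = 10.0

def runs_to_wins(runs: float) -> float:
    """Convert a run differential into approximate wins."""
    return runs / RUNS_PER_WIN

# Trout's 2012 edges over Cabrera, as estimated in the text.
baserunning_wins = (runs_to_wins(10), runs_to_wins(15))  # low and high estimates
defense_wins = runs_to_wins(20)

print(baserunning_wins, defense_wins)  # (1.0, 1.5) 2.0
```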
The WAR computation from Baseball Reference, rWAR, showed Trout with 10.9 WAR and Cabrera with 7.3. FanGraphs placed Trout at 10.0 fWAR and Cabrera at 6.8. Those are excellent numbers for both players, but both systems show Trout with a clear and decisive edge. The glossary at FanGraphs does an especially good job of spelling out how their formula works.
I'm not suggesting that all we need to know about a ballplayer is his WAR, but it does what every good stat should do: it simplifies a long season into a number that makes sense. Trout didn't deserve the MVP because his WAR was higher than Cabrera's; he deserved the MVP because of all the individual factors — things we all understand, like baserunning and fielding — that contributed to his higher WAR.
The 2013 season isn't quite as simple, because Cabrera's batting advantage was larger and Trout was less sensational in the field. We'll do this one in more detail, starting at the plate.
At the most fundamental level, a batter's job is to produce runs: to put himself and his teammates in scoring position. Breaking this down a little, hitters should get on base, advance on the bases, advance their teammates on the bases, and avoid making outs. You already know this; it's Baseball 101. Let's look at Trout and Cabrera, and let's start with getting on base.
I will warn you that this section is pretty stat-heavy — but it's not sabermetrics, it's stuff we all know about.
Getting on Base
Trout reached base 320 times last season: 190 hits, 110 walks, 9 hit by pitch, and 11 times reached on error. Some fans will want to discount those last 11, when the opponent was charged with an error, but Trout's speed allows him to reach base on plays where other batters would be thrown out, and it may even pressure the defense into misplays and poor throws. However he reached base, those 11 plays helped his team: Trout was on base and he didn't make an out. Cabrera reached base 289 times, which is also quite good, 2nd in the AL.
There's something unfair here working in Trout's favor, but we'll come back to that later. For now he's ahead by 31 times on base.
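For anyone who wants to check the times-on-base count, here is a quick sketch. The variable names are illustrative; the figures are the 2013 numbers quoted above.

```python
# Ways Trout reached base in 2013, per the text.
trout_reached = {
    "hits": 190,
    "walks": 110,
    "hit_by_pitch": 9,
    "reached_on_error": 11,
}

trout_times_on_base = sum(trout_reached.values())
cabrera_times_on_base = 289  # quoted total; the text gives no breakdown

edge = trout_times_on_base - cabrera_times_on_base
print(trout_times_on_base, edge)  # 320 31
```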
Advancing on the Basepaths
Much of the excitement of a game comes from baserunning, both on stolen base attempts and balls in play. The chart below shows not only steals, but also Bases Taken (on fly balls, wild pitches, etc.), advancing from first to third on a single, scoring from first on a double, and scoring from second on a single. All this data is available at Baseball-Reference.com.
That's 49 bases for Cabrera, and 100 for Trout. The main differences are stolen bases (+30) and first-to-third (+21). The other aspect of advancement is extra-base hits.
Trout ranked 2nd in the AL in XBH, and Cabrera ranked 4th, both excellent, but Cabrera's home run lead provided 22 more bases than Trout's doubles and triples. Trout's base-advancement edge isn't really +51, it's +29.
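The netting-out of baserunning against extra-base hits works like this. It's a rough sketch that treats every base as equal, which the text itself admits is a simplification, and the variable names are made up for illustration.

```python
# Bases gained as a runner in 2013 (steals, bases taken, first-to-third, etc.).
trout_bases_as_runner = 100
cabrera_bases_as_runner = 49
running_edge = trout_bases_as_runner - cabrera_bases_as_runner  # Trout's lead

# Extra bases Cabrera gained on extra-base hits, mostly via home runs.
cabrera_xbh_edge = 22

net_edge = running_edge - cabrera_xbh_edge
print(running_edge, net_edge)  # 51 29
```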
Advancing Teammates on the Basepaths
The simplest way to drive in runs or move your teammates up on the bases is with a hit, especially extra-base hits, and in particular home runs. If there's a man on first, walks and HBP advance him as well. Let's look at hits, plus unintentional walks and HBP.
The numbers are pretty similar, but Cabrera's ahead by 25 total bases. However, Trout had 100 non-IBB walks and 9 HBP — a total of 109 — compared to 71 unintentional walks and 5 HBP for Cabrera, a total of 76. That's an advantage of 33 for Trout. Sabermetric research shows a walk is worth about 65% of a hit, so let's drop Trout's +33 to +22. Cabrera's ahead by three, 25 to 22.
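Here is a sketch of the walk-and-HBP discount. The 0.65 factor is the "about 65% of a hit" figure from the text, and the result is rounded up to a whole number to match the +22 used there; both the constant name and the rounding choice are assumptions for illustration.

```python
import math

# Unintentional walks plus hit-by-pitch in 2013, per the text.
trout_free_passes = 100 + 9    # 109
cabrera_free_passes = 71 + 5   # 76
free_pass_edge = trout_free_passes - cabrera_free_passes  # 33 in Trout's favor

# Assumed discount: a walk is worth about 65% of a hit.
WALK_VALUE_VS_HIT = 0.65
discounted = free_pass_edge * WALK_VALUE_VS_HIT   # a bit under 21.5
discounted_rounded = math.ceil(discounted)        # rounded up, as in the text

cabrera_total_bases_edge = 25
cabrera_net_edge = cabrera_total_bases_edge - discounted_rounded
print(free_pass_edge, discounted_rounded, cabrera_net_edge)  # 33 22 3
```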
I mentioned earlier that the numbers we're working with unfairly favor Mike Trout. That's because we haven't yet addressed outs. Trout made 719 plate appearances in 2013, compared to 652 for Cabrera. Trout had more opportunities to compile stats. So let's look at the equalizer: outs. I've broken them down into at-bats, double plays grounded into, and outs on the basepaths. Outs in the first category (AB) include sacrifices and fielder's choice, but not reached on error. Baserunning outs include caught stealing, pickoffs, and other assorted outs (like tagging up and getting caught in a rundown), but not force-outs.
Cabrera's double plays and Trout's outs on the basepaths effectively cancel each other out.
I've implied some really horrendous statistical equivalencies above, and I'm sure any sabermetricians reading this are tearing their hair out. FanGraphs offers an explanation that is far more precise (a double is worth 42% more than a single!), but in the meantime, the numbers above show Trout at +31 times on base, +29 bases advanced, -3 advancing teammates, and using 35 more outs. This undersells the value of Cabrera's home runs, but it also is not adjusted for park effects. The road numbers do favor Cabrera this year: he was a better hitter, not just in raw numbers but in context.
The difference, effectively, is slugging. But Cabrera's power advantage is neutralized by Trout's speed, and Trout made many more plate appearances. While both were excellent offensive players, Trout was excellent more times — 67 more. Remember, Cabrera only had 25 more total bases than Trout, but Trout took 51 more bases as a runner and reached base 31 extra times. Between his greater workload and his baserunning, Trout's offense edges Cabrera's, and his defensive advantage for 2013 was estimated at about 10 runs, or 1 Win Above Replacement. Trout's 2013 was graded at 9.2 rWAR and 10.4 fWAR, versus Cabrera's 7.2 rWAR and 7.6 fWAR. Once again, I'm not saying that Trout was better because his WAR was higher. I'm saying that his WAR was higher because he was better.
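To see how consistently the two WAR systems agree, here is a small sketch that lines up the values quoted in this article for both seasons. The dictionary layout is just one convenient way to hold the numbers.

```python
# (Trout, Cabrera) WAR values quoted in the text, by season and system.
war = {
    2012: {"rWAR": (10.9, 7.3), "fWAR": (10.0, 6.8)},
    2013: {"rWAR": (9.2, 7.2), "fWAR": (10.4, 7.6)},
}

gaps = {}
for season, systems in war.items():
    for system, (trout, cabrera) in systems.items():
        # Round to one decimal to avoid floating-point noise.
        gaps[(season, system)] = round(trout - cabrera, 1)
        print(season, system, gaps[(season, system)])
```

In all four cases Trout's lead works out to at least two wins.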
Why Go To WAR?
I've gone through a lot of numbers here, to show why Trout's contributions, many of which don't show up in the Triple Crown stats, were more valuable than Cabrera's. But the beauty of WAR is that you don't need to comb through doubles and triples and caught stealing and hit by pitch, because they're all accounted for. According to FanGraphs, 2012 wOBA values were .88 runs for a single, 1.26 for a double, 1.59 for a triple, and 2.06 for a home run. Those numbers reflect extensive research, and they should make sense to most fans. WAR adds in walks and GIDP and all that, adjusts all the numbers for park effects, compares them to replacement level — which might be a free agent or a player in AAA, someone a team could easily plug in — and produces a single number to reflect offensive value.
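To make the weighting idea concrete, here is a minimal sketch that applies the four hit values quoted above to a batting line. It covers hits only; the full formulas also fold in walks, HBP, park adjustments, and replacement level, none of which are reproduced here. The function name and the sample batting line are hypothetical.

```python
# 2012 values for each hit type, as quoted in the text.
HIT_WEIGHTS_2012 = {
    "single": 0.88,
    "double": 1.26,
    "triple": 1.59,
    "home_run": 2.06,
}

def weighted_hit_value(hits: dict) -> float:
    """Sum the weighted value of a batter's hits, by hit type."""
    return sum(HIT_WEIGHTS_2012[kind] * count for kind, count in hits.items())

# Hypothetical batting line, for illustration only (not a real player).
example_line = {"single": 100, "double": 30, "triple": 5, "home_run": 25}
print(round(weighted_hit_value(example_line), 2))  # 185.25
```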
The defensive valuations vary a little more between rWAR and fWAR, as do their pitching calculations, but they usually line up pretty closely. You don't need to memorize any formulas or learn any acronyms to benefit from sabermetric analysis. You just learn enough about the system to decide whether the formula makes sense, whether you trust it. And if you ever doubt its conclusions, you look at the details to see why it makes sense. The Trout/Cabrera debate is a great example. People who don't understand sabermetrics have looked at the batting statistics, looked at Trout's WAR, and concluded that the statisticians must be doing some shady maneuvering to manipulate the number, but that's not the case at all.
The biggest gap between WAR and public perception comes from four factors:
1. Park Effects
Most fans apply a radical adjustment for Colorado Rockies batting stats, but no adjustment for any other ballparks.
2. Fielding
Traditional fielding statistics aren't very useful, sabermetric fielding statistics aren't very accessible, and the eye test is sometimes wildly misleading. Since it's hard to gauge these things, most of us simply don't do it. Some of us apply subjective adjustments — usually too small, and occasionally much too large — while others simply expect center fielders and shortstops to hit like corner outfielders and first basemen.
I should acknowledge that sabermetric fielding analysis is not perfect, and not terribly reliable in a one-year sample. It's still better to make an informed guess than to ignore an important part of the game. With regard to Trout and Cabrera specifically, no one doubts that Trout is a more valuable defensive asset, and one or two wins per year seems like a conservative estimate.
3. "Small Stats"
These are stats that you can find in the box score, but most of us skip over. The more important ones include doubles, triples, walks, HBP, and GIDP. The really small stats are mostly baserunning, like going first-to-third on a single. Traditional analysis ignores most of these, except in extreme cases, but WAR accounts for all of it, and sometimes it radically changes our evaluation of a player. For Trout and Cabrera in 2013, just the first five "small stats" I mentioned add up to a huge advantage for Trout: +13 doubles, +8 triples, +20 walks, +4 HBP, and -11 GIDP.
The wOBA numbers we saw earlier show that Trout's doubles and triples were worth about the same as 14 home runs. His BB and HBP were approximately as valuable as 19 singles. If we refigure their batting averages by giving Trout 19 singles and Cabrera 11 outs, we find Trout batting .336 with 41 HR, and Cabrera at .341 with 44 HR. Add in park factors, baserunning, and defense, and it's clear why Trout's ahead. That's the beauty of WAR: every step of the calculations makes sense.
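The "14 home runs" figure can be reproduced from the hit values quoted earlier. This sketch applies the 2012 weights to the 2013 differences, as the text does. The walk and HBP weights aren't given in this article, so the 19-singles conversion is left out rather than guessed at.

```python
# Hit values quoted earlier in the text (2012 figures).
DOUBLE_VALUE = 1.26
TRIPLE_VALUE = 1.59
HOME_RUN_VALUE = 2.06

# Trout's 2013 edge over Cabrera in the "small stats".
extra_doubles = 13
extra_triples = 8

extra_value = extra_doubles * DOUBLE_VALUE + extra_triples * TRIPLE_VALUE
hr_equivalent = extra_value / HOME_RUN_VALUE

print(round(extra_value, 1), round(hr_equivalent, 1))  # 29.1 14.1
```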
We usually don't pay much attention to these stats, but they matter. And despite my use of the wOBA formula, this isn't serious sabermetrics here. You already know that doubles and triples matter, you know that grounding into double plays hurts the team. It's just hard to organize all these numbers in a way we can process. That's what WAR is good for.
4. Overrated Stats
This is mostly RBI. We're all tired of arguing about using on-base percentage rather than batting average, and I already counted walks as a "small stat" above, so let's focus on ribbies. Dave Cameron once described RBI as a team stat masquerading as an individual stat. RBI are mostly a function of runners on base. The stat under-represents leadoff men, players on bad offensive teams, and players in pitchers' parks. We have much better ways to measure clutch hitting, but RBI is a familiar stat that supposedly measures what many fans and sportswriters want to believe.
WAR: What Is It Good For?
We want our sports statistics to be simple. WAR is a little weird because it's simultaneously very simple (one number) and very complex (includes fielding, baserunning, and every aspect of batting). Ultimately, I like WAR because it's a time-saver. You can look at this one stat to get an idea of who the best players are, then research specifics from there if you're so inclined. It's a great starting point. It's also a lot easier to say that Player A had a better WAR than Player B, rather than going through the minutiae of triples and HBP and sac flies and so forth.
There are a lot of important stats that are too much work to evaluate individually, and that's what WAR is useful for: providing an overall picture of a player's value. All I am saying is give WAR a chance.