Wednesday, June 1, 2011
Total Statistical Production: Basketball
Dave Heeren is one of the most important sports statisticians of all time. He doesn't have the name recognition of a Bill James or Tom Tango, but Heeren's TENDEX system formed the basis of almost every subsequent attempt at statistical player evaluation in basketball. I read Heeren's Basketball Abstract I don't know how many times in the '80s and '90s.
Since then, we've had similar creations, including Dean Oliver's Approximate Value (AV), John Hollinger's Player Efficiency Rating (PER) and Game Score, the NBA's own Efficiency statistic, and David Berri's Win Score and Wins Produced, based on the book The Wages of Wins, by Berri, Martin B. Schmidt, and Stacey L. Brook. The heart of Heeren's work, and most of the others, is that through most of NBA history, an average possession is worth about one point. Let me quote from Heeren:
"The formula works because a ball possession is worth one point on the average. This means that rebounds and steals which acquire possession are valued at one point apiece, an assist which converts an ordinary one-point possession into an easy two-point basket is worth one, and missed shots which result in squandering possession are equivalent to turnovers and are minus-one. Of course, if an offensive rebound follows a missed shot, the team having missed the shot is back where it started — still having the ball without having scored. A minus-one is given to the player who missed the shot and a plus-one to the player who rebounded. The sequence results in zero, as indeed it should because no points are scored."
The problem is that a rebound (+1) is not as valuable as a steal (also +1), and a missed field goal or free throw (both -1) is not as bad as a turnover (-1). Why not? Because only about 70% of missed shots are rebounded by the other team. If a player misses a shot, his team still has a 30% chance of getting the ball. If he commits a turnover, his team's chance of having the ball is 0%. A steal always turns a possession for your opponent into a possession for your team. Getting a steal is the equivalent of forcing the miss and grabbing the rebound.
Heeren is right that an offensive rebound cancels out a missed shot — it's appropriate for those to be equal — but the real value is plus or minus 7/10 of a possession. And he's right that a steal cancels out a turnover, since those are opposite sides of the same coin. But a missed shot doesn't create a turnover — it creates a neutral possession. With that in mind, the formula for Total Statistical Production (TSP) is:
PTS - .5*FTA - FG - .7*FGM + .7*ORB +.3*DRB +.5*AST + STL - TOV + .5*BLK
As in other systems, everything is scaled to points. In the formula, FG is field goals made and FGM is field goals missed. A missed free throw is a half-point deduction. A successful free throw is a half-point addition. A successful field goal, which gives the other team possession, is a full-point deduction, since it ends the possession. Thus, a regular two-point basket gives a player +1: 2 PTS - 1 FG. A missed field goal (which might be rebounded by the shooter's team) is -0.7, while an offensive rebound is +0.7. A defensive rebound is valued at +0.3. In this system, a player must shoot at least 41.2% on two-pointers, 25.9% from the arc, and 50% from the line in order to score positive points as a shooter. Anything below those levels will lower his score.
Take that 41.2% shooter. In 1,000 shots, he will score 824 points and miss 588 times. About 412 of those will result in defensive rebounds, so his team loses the ball. But there will also be about 176 offensive rebounds. So that's 824 points and 176 offensive boards. 824 + 176 = 1,000. This is a zero-value offensive player, and a zero-value scorer in TSP. Someone can shoot 42% or 43% and get a positive TSP rating from that, but his value will still be very close to zero-level. TSP is compared to zero, not to replacement level, so zero-value is a very poor player.
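For anyone who wants to check those break-even levels, here is a minimal Python sketch. The point values come straight from the formula; the function and variable names are my own illustrative choices, not part of any published TSP code. It solves for the success rate at which a shooter's makes and misses net out to zero, then repeats the 1,000-shot check.

```python
# Values from the TSP formula; all names here are illustrative assumptions.
MISS_PENALTY = 0.7   # a missed field goal costs 0.7
MADE_FG_COST = 1.0   # a made field goal ends the possession and costs 1.0


def break_even_pct(points_per_make, make_cost, miss_cost):
    """Success rate p where p*(points - make_cost) - (1 - p)*miss_cost == 0."""
    gain = points_per_make - make_cost
    return miss_cost / (gain + miss_cost)


two_pt = break_even_pct(2, MADE_FG_COST, MISS_PENALTY)    # 0.7 / 1.7
three_pt = break_even_pct(3, MADE_FG_COST, MISS_PENALTY)  # 0.7 / 2.7
# Every free throw attempt costs 0.5, so a make nets +0.5 and a miss nets -0.5.
free_throw = break_even_pct(1, 0.5, 0.5)

print(f"2-pt: {two_pt:.1%}  3-pt: {three_pt:.1%}  FT: {free_throw:.1%}")

# The 1,000-shot check for the break-even two-point shooter.
shots = 1000
makes = round(shots * two_pt)
misses = shots - makes
points = 2 * makes
off_rebounds = round(misses * 0.3)   # about 30% of misses come back to the offense
shooting_tsp = points - MADE_FG_COST * makes - MISS_PENALTY * misses
print(makes, misses, points, off_rebounds, round(shooting_tsp, 1))
```

The shooting score does not land on exactly zero only because the shot counts are rounded to whole numbers.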
TSP values assists at half a point, steals as plus-one, and turnovers as minus-one. Blocked shots (on defense) are worth half a point. Fouls, which may be good or bad, are not part of the system. A foul which turns an opponent's breakaway into an in-bound pass, or an easy layup into a pair of free throws, is a good foul. A defensive foul near the end of the game is often helpful, even necessary. Fouls can also be associated with aggressive defense, which we would not wish to punish. I am open to the idea of a small penalty for fouls in the future, but right now I don't see a need. However, as Heeren expressed more than 20 years ago, the league should track offensive fouls separately, since those always result in loss of possession and are equivalent to a turnover. Similarly, flagrant/technical fouls are always negative.
The system is logically consistent. A missed shot (-0.7) and an offensive rebound (+0.7) cancel each other out. A turnover (-1) and a steal (+1) cancel each other out. The missed shot is less harmful (-0.7) than a turnover (-1), because a turnover definitely goes to the other team, but a missed shot creates a loose ball that the shooter's team might get back. A missed shot (-0.7 for the shooting team) and a defensive rebound (+0.3 for the other team) — effectively resulting in a turnover — are worth the same amount (1) as a turnover. The formula is in harmony. To convert TSP into an easy-to-handle number, I divide the result by 100.
There is one variable with which I am not entirely satisfied: Free Throw Attempts, currently scored as minus-½. Two successful free throws are worth the same (+1) as a successful field goal. That makes sense. But three successful free throws are worth less (+1.5) than a successful three-pointer (+2). A two-point basket plus a free throw is also worth +1.5. That's not fair. If a player gets fouled while attempting a three-pointer and makes all three shots, that should be the same value as a successful three. If a player drives and gets fouled, but his shot goes in and he makes the free throw, that should be the same value as a successful three. There's a similar problem regarding missed free throws, which are over-penalized.
Since I recognize this problem, why not fix it? Honestly, because I'm not smart enough to see a solution. I could lower the deduction for free throw attempts, say, from 0.5 to 0.4. But now a pair of free throws is worth more (1.2) than a two-point basket. Values with a free throw attempt set at -0.4:
2-pt FG: +1.0
2 FT: +1.2
FG + FT: +1.6
3 FT: +1.8
3-pt FG: +2.0
That seems just as bad, and unnecessarily complicated to boot. So why not just penalize missed foul shots, and let successful ones stand? But that would overvalue free throws. Sink them both, and now that's worth twice as much as a two-point field goal, equal to a three-pointer. Besides, a lot of free throws come at the end of the game, when the other team is fouling on purpose and wants you shooting free throws. If you go 1/2, your opponent is happy.
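The trade-off is easier to see side by side. This is a small Python sketch (the function name and dictionary labels are hypothetical, chosen for illustration) that recomputes the value of each scoring play for any free throw deduction, so the -0.5 and -0.4 versions can be compared directly.

```python
def shot_values(fta_penalty):
    """Net TSP value of common scoring plays for a given per-attempt FT deduction."""
    made_ft = 1 - fta_penalty   # one point scored, minus the attempt deduction
    made_fg2 = 2 - 1            # two points, minus one for the made field goal
    made_fg3 = 3 - 1            # three points, minus one for the made field goal
    return {
        "2-pt FG": made_fg2,
        "2 FT": 2 * made_ft,
        "FG + FT": made_fg2 + made_ft,
        "3 FT": 3 * made_ft,
        "3-pt FG": made_fg3,
    }


for penalty in (0.5, 0.4):
    values = shot_values(penalty)
    print(penalty, {play: round(value, 2) for play, value in values.items()})
```

With the deduction at 0.5, two free throws match a two-point basket but three free throws fall short of a three-pointer; at 0.4 the second gap narrows while the first one opens up.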
Maybe I'm missing something obvious, and someone better at math than I am can tell me how to correct the formula. Wins Produced estimates the value of a free throw attempt at -0.47, but in the interest of saving some decimal points, I think -0.50 is close enough. For now, I'm willing to live with the fact that free throws are slightly undervalued, and make subjective adjustments for this.
Here's an example of the system in action: Carmelo Anthony's 2010-11 season. Anthony scored 1,970 points. That's +1970. He made 684 field goals (-684) and missed 817 (-572), bringing his score to +714. Remember, a missed shot is -0.7, a two-point field goal is +1.0 (+2 points, -1 possession), and a three-pointer is +2.0 (+3 pts, -1 poss). 'Melo also shot 605 free throws (-303), so his TSP from shooting is +411. Another way of looking at this is as 589 two-point field goals (+589), 95 three-point field goals (+190), 817 missed field goals (-572), 507 free throws (+253), and 98 missed free throws (-49). So 589 + 190 - 572 + 253 - 49 = 411.
Anthony grabbed 118 offensive rebounds (+83) and 445 defensive rebounds (+133), raising his score to +627. Add 221 assists (+110) and 46 blocks (+23), he's at +760. Finally, give him +68 for his steals and -206 for turnovers: +622. Divide by 100, and Anthony's 2010-11 TSP is 6.22. That's the system.
PTS - .5*FTA - FG - .7*FGM + .7*ORB +.3*DRB +.5*AST + STL - TOV + .5*BLK
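Here is the formula as a short Python function, applied to the Carmelo Anthony numbers quoted above. The function and parameter names are my own (fg_missed stands in for the formula's FGM), so treat this as a sketch rather than official code. Note that it does no intermediate rounding, so the result comes out roughly a hundredth higher than the hand-calculated 6.22, which rounds each term along the way.

```python
def tsp(pts, fta, fg, fg_missed, orb, drb, ast, stl, tov, blk):
    """Total Statistical Production, divided by 100 as in the article."""
    raw = (
        pts
        - 0.5 * fta        # every free throw attempt costs half a point
        - fg               # a made field goal ends the possession
        - 0.7 * fg_missed  # a miss is recovered by the offense about 30% of the time
        + 0.7 * orb
        + 0.3 * drb
        + 0.5 * ast
        + stl
        - tov
        + 0.5 * blk
    )
    return raw / 100


# Carmelo Anthony, 2010-11, using the totals given in the walkthrough.
melo = tsp(pts=1970, fta=605, fg=684, fg_missed=817, orb=118, drb=445,
           ast=221, stl=68, tov=206, blk=46)
print(round(melo, 2))
```

Dividing the result by games or minutes played gives the per-game or per-minute version.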
Total Statistical Production is designed to rate a player's efficiency and production, so players with more floor time will have higher scores. A player who puts in 40 minutes might see his play suffer as a result of fatigue. His per-minute averages will drop as he tires, but someone who plays 40 minutes a game is out there because his team needs him, because he's better than the next-best guy. I don't want this system to punish iron men or superstars who are too good to sit. Someone who gives his team 40 minutes a game is more valuable than a sixth man who plays for 20 minutes. It's like a starter vs. closer argument in baseball: the relief pitcher has to be a lot better per inning to merit a higher score. Same thing here: a backup has to be a lot better per minute to be more valuable than a starter.
That said, TSP can be easily adapted to per-game or per-minute value: simply divide the original score by Games Played or Minutes Played. TSP is not adjusted for game pace, but that's possible, too, both on the team and league levels. The indispensable Basketball-Reference.com offers estimates of team possessions, or you can use a simple formula like the one below. The value of a game pace adjustment is especially apparent when looking at teams rather than individuals. Without an adjustment, the Denver Nuggets score the highest TSP in the NBA for 2010-11, followed by 'Melo's other team, the Knicks. The Bulls, who finished with the best record in the league, are 11th. When you apply a simple game pace adjustment — TSP * lg(PF+PA) / tm(PF+PA) — the results are much more intuitive:
1. San Antonio Spurs (61-21)
2. Los Angeles Lakers (57-25)
3. Dallas Mavericks (57-25)
4. Chicago Bulls (62-20)
5. Boston Celtics (56-26)
6. Miami Heat (58-24)
7. Oklahoma City Thunder (55-27)
8. Denver Nuggets (50-32)
9. Houston Rockets (43-39)
10. New York Knicks (42-40)
The bottom five, in descending order, are: Kings, Wizards, T-Wolves, Nets, Cavaliers. All of that seems pretty reasonable to me. Please note, however, that TSP is intended to evaluate individual players rather than teams.
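The pace adjustment and the per-game and per-minute conversions are each a one-line calculation. The Python sketch below uses my own function names, and the team totals in the example are made-up placeholder numbers, not real 2010-11 figures. I am also assuming that lg(PF+PA) means the league-average team total of points for plus points against.

```python
def pace_adjusted(tsp_value, team_pf, team_pa, league_avg_pf_pa):
    """TSP * lg(PF+PA) / tm(PF+PA), the simple adjustment from the article."""
    return tsp_value * league_avg_pf_pa / (team_pf + team_pa)


def per_game(tsp_value, games_played):
    return tsp_value / games_played


def per_minute(tsp_value, minutes_played):
    return tsp_value / minutes_played


# Hypothetical fast-paced team: more points scored and allowed than league average.
fast_team = pace_adjusted(50.0, team_pf=8800, team_pa=8600, league_avg_pf_pa=16400)
# A team exactly at league-average pace is left unchanged.
average_team = pace_adjusted(50.0, team_pf=8200, team_pa=8200, league_avg_pf_pa=16400)
print(round(fast_team, 2), round(average_team, 2))
```

A team in high-scoring games gets scaled down and a team in low-scoring games gets scaled up, which is why the Nuggets and Knicks drop and the Bulls rise.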
Here are individual TSP leaders for the 2010-11 season:
1. LeBron James, 8.6
2. Dwight Howard, 8.4
3. Pau Gasol, 8.3
4. Kevin Love, 8.3
5. Chris Paul, 8.2
6. Blake Griffin, 7.5
7. Kevin Durant, 7.3
8. LaMarcus Aldridge, 7.3
9. Dwyane Wade, 7.2
10. Zach Randolph, 7.1
11. Derrick Rose, 6.9
12. Al Jefferson, 6.9
I suspect the first thing you noticed is that league MVP Derrick Rose is 11th. Plenty of people feel James or Howard was more deserving, but no one has Rose outside the top 10 (except the Wages of Wins folks, who rate him even lower). Rose is a little underrated by raw TSP because the Bulls had a fairly slow game pace, and he climbs to ninth if you adjust for that. But compare Rose to Chris Paul, ranked fifth. Both have similar shooting percentages and rebound totals. Rose scored 758 more points, which is a ton, but he also took 669 more shots. Paul had 159 more assists, but the biggest difference is steals and turnovers. Paul had 188 steals and only 177 turnovers (which is phenomenal). Rose had 85 steals and 278 turnovers. That's a difference of 204 possessions.
Rose rates behind Paul because, although he scored 758 more points, Rose had 150 fewer assists and used about 775 more possessions:
* 100 fewer steals
* 100 more turnovers
* 300 more field goals
* 300 more missed shots (about 200 possessions)
* 171 more free throw attempts (about 75 possessions)
TSP is not punishing Rose for his field goals or free throws: all of his successful shots score him points. A two-point field goal turns a normal one-point possession into a two-point possession, and that's +1 in the formula. But it's not +2. A successful shot ends your team's possession. What TSP is saying is that those 669 shots and 171 free throws weren't worth +758; they were worth more like +125. Shots are supposed to yield points. A good formula can't reward someone just for shooting. Paul makes up the 125-point deficit with 159 more assists (+80), 103 more steals (+103), and 101 fewer turnovers (+101).
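The Paul side of that ledger can be added up in a few lines of Python. The variable names are mine, and the +125 shooting figure is simply the estimate quoted above, not something recomputed here. Rebounds and blocks are left out, so this is only the part of the gap discussed in this comparison, not the full difference between the two players' ratings.

```python
# Paul-minus-Rose differences quoted in the comparison (2010-11).
extra_assists = 159
extra_steals = 103
fewer_turnovers = 101

# Weighted by the TSP formula: assists 0.5, steals 1, turnovers 1.
paul_edge = 0.5 * extra_assists + extra_steals + fewer_turnovers

# The article's estimate of Rose's net advantage from shooting.
rose_shooting_edge = 125

print(paul_edge, paul_edge - rose_shooting_edge)
```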
I wouldn't want to argue that Rose wasn't one of the 10 best players in the NBA this season. Put him as high as sixth, you'll get no argument from me. But this is a 44.5% shooter who didn't lead the league in scoring, doesn't rebound much, wasn't among the top five in assists, was in the top five in both turnovers and missed shots, and isn't a particularly good defensive player. I think 9th is closer to the mark than 1st.
Kevin Love easily led the league in TSP/G and TSP/MP, followed by Howard, James, Paul, and Gasol. TSP ranks Kobe Bryant (fourth in MVP voting) behind his teammate Gasol (no MVP votes), despite 537 more points. Kobe missed 900 shots this season, most in the NBA. Gasol missed 527. Kobe had 100 more assists, but Gasol had 200 more rebounds. Kobe had 50 more steals, but 100 more turnovers. Gasol blocked 100 more shots. Give me the big man.
My choice for MVP would have been Dwight Howard. His TSP comes in slightly behind James, and Love was better per-minute, but Howard is a great defensive player and shoots a lot of free throws, both of which are undervalued in the formula. As you have probably read before, the statistic isn't the analysis, just the framework of the analysis. Total Statistical Production is intended to be the beginning of the argument, not the end of it.
TSP is a retrodictive system, not a predictive one. It is designed to tell us what already happened, not what will. Of course the system has some predictive power (I feel comfortable assuming that James and Howard will be top-10 players again next season), but that's not what it's designed for. TSP looks at a game, season, or career, and estimates a player's statistical contributions using a simple formula. The results are similar to those of more complicated systems, but I would argue in favor of TSP because it is what Arturo Galletti describes as something "a 10-year-old could do with a pencil." That's not necessarily a bad thing. In TSP, it is easy to determine why a player rates where he does.
The first test any statistical rating system must pass is common sense. Below are TSP rankings for the past three decades; I believe the system produces reasonable results each time. It sees superstars as superstars. TSP is fair to guards, forwards, and centers; it is fair to shooters, rebounders, and ball-handlers. I don't believe any major statistic is obviously over- or under-rated, though, like other statistical rating systems, it does not properly credit defensive skill. Here are TSP's top three for each of the last 30 seasons, with the leader listed first.
1981-82: Moses Malone, Magic Johnson, Julius Erving
1982-83: Magic Johnson, Larry Bird, Moses Malone
1983-84: Larry Bird, Adrian Dantley, Alex English
1984-85: Larry Bird, Michael Jordan, Isiah Thomas
1985-86: Larry Bird, Charles Barkley, Magic Johnson
1986-87: Michael Jordan, Larry Bird, Magic Johnson
1987-88: Michael Jordan, Charles Barkley, Larry Bird
1988-89: Michael Jordan, Charles Barkley, Magic Johnson
1989-90: Michael Jordan, Charles Barkley, Magic Johnson
1990-91: Michael Jordan, David Robinson, John Stockton
1991-92: Michael Jordan, John Stockton, David Robinson
1992-93: Hakeem Olajuwon, Michael Jordan, Karl Malone
1993-94: Shaquille O'Neal, David Robinson, Hakeem Olajuwon
1994-95: David Robinson, Shaquille O'Neal, John Stockton
1995-96: David Robinson, Michael Jordan, Karl Malone
1996-97: Karl Malone, Michael Jordan, John Stockton
1997-98: Karl Malone, Kevin Garnett, Tim Duncan
1998-99: Jason Kidd, Shaquille O'Neal, Gary Payton
1999-00: Shaquille O'Neal, Gary Payton, Kevin Garnett
2000-01: Shaquille O'Neal, Kevin Garnett, Dirk Nowitzki
2001-02: Tim Duncan, Elton Brand, Kevin Garnett
2002-03: Kevin Garnett, Tracy McGrady, Tim Duncan
2003-04: Kevin Garnett, Peja Stojakovic, Shawn Marion
2004-05: Kevin Garnett, LeBron James, Shawn Marion
2005-06: Shawn Marion, Kevin Garnett, LeBron James
2006-07: Shawn Marion, Kobe Bryant, Dirk Nowitzki
2007-08: Chris Paul, LeBron James, Amare Stoudemire
2008-09: Chris Paul, LeBron James, Dwyane Wade
2009-10: LeBron James, Kevin Durant, David Lee
2010-11: LeBron James, Dwight Howard, Pau Gasol
Those are all great players; there's not a crazy selection on the list. I suppose the one who might require some explanation is Shawn Marion. I wasn't shocked to see him sneak into the top three once or twice, but Marion rating where he did was as much a surprise to me as it probably is to you. I even checked to make sure I hadn't mislabeled the statistics for Shaquille O'Neal or something. I hadn't. On paper, Shawn Marion was the most productive player in the NBA for a couple of years.
Let's use 2005-06 as an example of why he rates how he does. That season, Marion's teammate Steve Nash won MVP. Nash scores as a very good player, eighth in the league according to TSP. Nash dished 826 assists that season, almost 700 more than Marion (143). But the big guy outdid him everywhere else. Marion had 300 more points, 600 more rebounds, 100 more steals, 100 more blocks, and 150 fewer turnovers. This is a stat-based system — it doesn't see intangibles — but it's hard to understand how Nash's 700 assists could be worth more than Marion's 1,250 points, rebounds, steals, blocks, and turnovers.
One reason I like TSP is that it doesn't require a positional adjustment. The list above includes 16 seasons by point guards, 12 shooting guards, 18 small forwards, 29 power forwards, and 15 centers. Everything is basically equal except power forward, and their dominance is a chronological fluke, not a bias in the system. There's no slant towards big men — centers rate the same as everybody else.
The equality among positions in TSP goes back more than 30 years. It is true, though, that in the game's early years, centers and power forwards dominated. I'll address that next week, along with ideas for using TSP more effectively, particularly when ranking careers or multiple seasons. We'll also tackle Total Statistical Production Over Replacement.
In the meantime, I believe the simple formula I've developed is effective and interesting, but I would be happy to address questions or get ideas about making it better.
2017 update: A new version of this article is available! The Best NBA Players of 2016-17 includes the formula for TSP Over Replacement and rankings for the 2016-17 season, including James Harden and Russell Westbrook.