The RPM Smell Test and the Utah Jazz

April 10th, 2014 | by Dan Clayton
Is Hayward less valuable than nine Spurs? A look at RPM. (Photos by D. Clarke Evans/NBAE via Getty Images)

This week the basketball metrics community gave us another lens through which to analyze player performance. ESPN’s new Real Plus-Minus (RPM) stat combines on-court plus/minus with absolute statistical contribution to give us a sense of how much of a team’s success or failure on the scoreboard we can attribute to a certain guy, on either end of the floor.

This comes at an interesting time relative to the dialogue here at SCH, because we’ve been talking lately (here and here) about whether certain stat systems either ladder up to or distract us from a more holistic understanding of basketball. So what’s the verdict on RPM? So far, that it’s an interesting and useful tool, as long as we know what it is and what it isn’t.

Here’s what it’s not. It’s not an overall player rater. Or if that’s what it attempts to be, it’s a poor one, one that asserts that Chris Andersen, Amir Johnson and Nick Collison would be the best players on the average team1. Or that Carmelo Anthony and Damian Lillard are significantly worse than Matt Barnes and Patrick Beverley.

RPM is not altogether new. It’s really an adjusted version of xRAPM, which adds box score stats to Regularized Adjusted Plus-Minus (RAPM, explained here or straight from the creator here). RAPM is essentially an adjusted +/- that tries to quiet the statistical noise created by small-sample outliers, and xRAPM then factors in player stats. RPM does this differently than xRAPM, using what’s called Bayesian logic2. There are other adjustments, too, but ESPN plays a little coy with us as far as sharing the formula goes. We know that they try to account for age and current score, adjustments that mostly help the tool’s predictive ability rather than address the blind spots of xRAPM.
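To make the regularization idea concrete, here is a minimal sketch of a RAPM-style ridge regression over lineup “stints.” Everything in it (the players, the stints, the penalty weight) is a hypothetical toy, meant to show the general technique, not ESPN’s actual, undisclosed formula.

```python
# Minimal RAPM-style sketch: ridge regression over lineup stints.
# All data here is a made-up toy example, not ESPN's formula.
import numpy as np

players = ["A", "B", "C", "D"]                # toy league of four players
idx = {p: i for i, p in enumerate(players)}

# Each stint: (home-side players, away-side players,
#              home point differential per 100 possessions).
stints = [
    (["A", "B"], ["C", "D"], +6.0),
    (["A", "C"], ["B", "D"], -2.0),
    (["B", "D"], ["A", "C"], +1.5),
]

# Design matrix: +1 if the player was on the home side, -1 if away.
X = np.zeros((len(stints), len(players)))
y = np.zeros(len(stints))
for row, (home, away, diff) in enumerate(stints):
    for p in home:
        X[row, idx[p]] = 1.0
    for p in away:
        X[row, idx[p]] = -1.0
    y[row] = diff

# Plain adjusted +/- solves X @ beta ~ y by least squares; RAPM adds a
# ridge penalty (lam * I) that shrinks small-sample players toward zero.
lam = 10.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(len(players)), X.T @ y)
for p in players:
    print(f"{p}: {beta[idx[p]]:+.2f} per 100 possessions")
```

The ridge penalty is what “quiets the noise”: a player with only a handful of stints can’t swing his coefficient far from zero no matter how extreme his raw numbers look.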

Inputs

To understand the blind spots of any stat system, first you have to know where its inputs come from. Broadly speaking, all inputs fall into one of just a few categories:

  • Box score – Since these are the most readily accessible, a lot of performance aggregators try to define player performance based on some calculation of the raw sum of a player’s measurable outcomes in a box score. PER, Win Shares and eFG% are examples of metrics you can calculate just by looking at the stat sheet (see the short sketch after this list). The caution here is that a lot of very valuable basketball behaviors go unaccounted for: screening, team defense, a strategic cut, a hockey assist, and so on. To the box score, behaviors don’t matter; outcomes do.
  • Scoreboard – A whole family of stats tries to address what happens where it matters most, usually measured as a function of who is on the floor (player-wise, or combination-wise). The strength here is that this correlates most directly to winning. The drawback is that a lot of variables can muck this up with noise. And if you are on a losing team, there’s a good chance your variables (say, teammates) have a more adverse impact on your numbers than a comparable player’s on a winning team, so right off the bat the playing field isn’t level. Some scoreboard-based stats adjust to compensate for some of that noise.
  • Play-tracking – Here, you actually watch for specific behaviors and your data set is essentially a series of tick marks for when something happened, either as captured by a person or a video tracking system. This can account for a wider set of behaviors, but relies on accurate classification, which is difficult when you could ostensibly have 30 different offensive systems and 30 different defensive systems. A whole slew of new stats are available in this family, but most are only useful in a very specific context.3
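To ground the first two categories, here is what those calculations look like in their simplest forms. The usage numbers below are invented for illustration; eFG% is the real published formula, while the raw plus/minus function is the unadjusted starting point that stats like RAPM go on to refine.

```python
# Two of the simplest inputs: a box-score stat and a scoreboard stat.
# The numbers in the usage lines are invented for illustration.

def efg_pct(fgm: int, fg3m: int, fga: int) -> float:
    """Effective FG%: threes get a 50% bonus because they're worth
    an extra point. Computable from three box-score columns."""
    return (fgm + 0.5 * fg3m) / fga

def raw_plus_minus(team_pts_on: int, opp_pts_on: int) -> int:
    """Scoreboard-family stat: team margin while a player is on the
    floor. Raw and noisy -- RAPM-style adjustments start from here."""
    return team_pts_on - opp_pts_on

print(f"eFG% = {efg_pct(fgm=8, fg3m=4, fga=17):.3f}")   # 0.588
print(f"+/-  = {raw_plus_minus(78, 71):+d}")            # +7
```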

There are dozens of wrinkles, adjustments and permutations, but at a broad level, most stat systems are sourced by one or more of those input types, and therefore possess some of the same benefits and watch-outs.

RPM tries to roll together the first two, then adjusts based on the relative performance of the people around you. Its first input is what happened on the scoreboard while you were on the floor; then it tries to figure out how much of that you’re responsible for, using a formula roughly similar to WS or PER; and then it looks at who was around you.
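One common way to fold box-score production into a RAPM-style regression (the approach often described for xRAPM) is to shrink each player’s coefficient toward a box-score-derived prior instead of toward zero. Whether RPM does exactly this is not public, so treat the sketch below, which reuses the toy stint data from the earlier sketch, as a concept demo rather than ESPN’s method.

```python
# Sketch of shrinking toward a box-score prior instead of toward zero.
# Toy data throughout; the priors are invented stand-ins for something
# PER- or WS-like, scaled to points per 100 possessions.
import numpy as np

# Stint design matrix (+1 home player, -1 away player) and per-100
# differentials, for the same four hypothetical players as before.
X = np.array([[ 1,  1, -1, -1],
              [ 1, -1,  1, -1],
              [-1,  1, -1,  1]], dtype=float)
y = np.array([6.0, -2.0, 1.5])

prior = np.array([2.0, 0.5, -1.0, 0.0])   # hypothetical box-score priors

# Penalty is lam * ||beta - prior||^2, so a low-minute player defaults
# to what his box stats say about him rather than to zero.
lam = 10.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]),
                       X.T @ y + lam * prior)
print(np.round(beta, 2))
```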

+/- Blind Spots

RPM still has the problem of judging a player from a negative-differential team more harshly than his friend on a positive-differential team. Just by being a member of the Miami Heat, your RPM is likely to be positive because most of the time, your team is winning. If you play only in short, hyper-energetic spurts with at least one superstar, you’ll probably reap extra benefit. That might explain how Andersen ranks 13th, while someone like Gordon Hayward is 80+ spots lower.

The adjustment for teammate and opponent quality helps even out the comparison for players on the same team — meaning Jeremy Evans might be forgiven relative to other Jazz players’ RPMs since most of his minutes are with bench units — but he’ll still look worse compared to a statistically similar bench player on a team that usually outscores its opponents. And maybe he should; at the end of the day the most important measure of player quality is winning. But this is the inherent watch-out in using any type of +/- stat to compare players from different teams with very different records.
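The arithmetic of that bias is simple enough to show directly. In this hypothetical, two role players contribute nothing measurable on their own, yet their raw on-court numbers diverge purely because of team context:

```python
# Toy illustration of team-context bias in raw on-court differential.
# All numbers are invented for the example.
team_net_per_100 = {"winning team": +6.0, "losing team": -6.0}

for team, net in team_net_per_100.items():
    player_impact = 0.0  # a role player adding and subtracting nothing
    raw_on_court = net + player_impact
    print(f"{team}: {raw_on_court:+.1f} per 100 possessions on court")
# Identical players, yet a 12-point gap in raw on-court numbers.
```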

That bias probably explains the low ratings for most Jazz players. The stat ultimately tries to figure out how much of the team’s plus/minus pie is yours, and if the pie is smaller for the Jazz, then Hayward’s piece is bound to be smaller than that of a player with similar stats on a winning team. San Antonio, for example, has NINE players with a higher RPM than Hayward, Utah’s best player. I’m as big a believer in the Spurs’ culture and system as anybody, but in overall terms, they do not have nine guys who would be Utah’s best player. Sorry, Boris Diaw. Same goes for OKC, which has six players rated above Hayward in RPM even though only three are better in PER — bigger pie from a scoreboard perspective, better numbers for everyone. Would Hayward really be the seventh-best player on the team if he joined the Thunder tomorrow?

Anthony has the 11th-best PER in the league, but ranks behind 51 other players in RPM because he plays for a team that is usually getting outscored4. That doesn’t mean PER is right and RPM is wrong. His actual value is probably somewhere in between 11th and 52nd. The disconnect is simply an invitation to bring intuition and our own eyes to further investigation.

Box Score Blind Spots

RPM also carries the flaws associated with its box score inputs. Rhetorical question: using just a stat sheet, what’s the best way to determine who had the most impact on a team’s defensive performance? Answering that is nearly impossible. Steals and blocks do some of that, but even the very best teams only influence opponent possessions with those behaviors about 10-15% of the time, so how do we account for the other 85-90%? Rebounds measure the endpoint of a defensive possession but fail to account for who made the shooter miss in the first place and how (and, when the shot goes in, they describe the defense not at all).
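The rough arithmetic behind that 10-15% figure is worth making explicit. The per-game numbers below are approximate, league-typical values assumed for illustration, not sourced data:

```python
# How much defense the box score actually "sees" -- assumed,
# league-typical per-game numbers, not sourced figures.
opponent_possessions = 92       # roughly, per game
steals = 8.0                    # a good defensive team's steals
blocks = 5.0                    # ...and blocks
visible_share = (steals + blocks) / opponent_possessions
print(f"box score captures about {visible_share:.0%} of defensive possessions")
# -> about 14%; the other ~86% leave no defensive trace on the stat sheet.
```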

But that’s all the box score has to work with. So any box score-related stat is bound to overvalue those three columns on the stat sheet and completely ignore actual defensive behaviors like denying, positioning, rotations, help, adhering to a team system, etc.

We see this in the results, as even ESPN’s own Kevin Pelton points out. Many of the guys who rate out better than common sense suggests are thieves, shot-blockers or low-minute “energy guys.” Bruce Bowen would probably rate very poorly, just as he tended to do in PER and Defensive WS, despite his reputation as a very good behavioral defender within a team system. In general, anyone with even a slight edge in those rare, countable defensive outcomes (slight, because the sample on blocks and steals is so small) will rate well regardless of his overall defensive behavior. This is not a blind spot unique to RPM; the same is true of just about any box score-generated stat that tries to explain defense in an overall way (like DWS).

Another Tool

Does that mean RPM is bad, or useless, or lying to us? No. It just means that, as with any metric, we have to know what it can tell us and what it can’t. It can tell us who tends to see success on the floor, and it can make an educated guess at how much of that he contributes statistically, but it still can’t articulate the value of defense, screening, cutting, rotating, or how any of those behaviors correlate to the box score. For those types of insights, we have to use a combination of stat systems, basketball knowledge and good old-fashioned basketball-watching.

Do I think Melo is worse than 51 other players in the league? Do I think Hayward would be the 10th best Spur if they acquired him tomorrow? Do I think Nick Collison is a top-10 player? No, no and no, which is why I don’t think this is a good aggregator of overall player value. It is a good next phase in the conversation around the impact that players have on the scoreboard and on winning games, and we should look at it through that lens.
