Under The TeamRankings Hood, Part 3: Pros And Cons

[This post is the third in a four-part series on our ratings and models. Part 1 is here. Part 2 is here. Today we examine the strengths and weaknesses of our ratings compared to those published by Ken Pomeroy. Part 4 will cover the statistical models that combine our ratings with outside information to make game winner, spread, totals, and money line predictions.]

Any college basketball stat geek with a pulse is familiar with Ken Pomeroy’s ratings. We hold Ken in very high regard, and he has done an excellent job spreading the gospel of tempo-free statistics to the unenlightened masses. Having said that, there appears to be a prevailing sentiment among certain college hoops aficionados that the Pomeroy ratings are the gold standard by which all others are measured, and that nothing else comes close.

We disagree. Different rating systems use different mathematical designs, and the “best” system often depends just as much on the application for which you wish to use it as on the methodology or sophistication of the system itself.

Below, we’ll outline the main differences between our Predictive Power Ratings and KenPom’s, then look at when you might want to use one versus the other.

To Split Or Not To Split

First, the Pomeroy ratings split offense and defense into separate ratings. This seems helpful for descriptive analysis, since it’s easy for most fans to understand: “OK, I get it, this team has the #7 rated offense but only a mediocre #128 defense, so they’re not a Top 10 team overall.”

However, we don’t necessarily view offense and defense as two separate, highly independent variables. For one thing, a good defense can create steals that lead to easy points in transition.

Just today, Luke Winn published a table indicating that only three teams in the country average more than 1 point per possession (PPP) in the half court, whereas nearly half the teams in the country average more than 1 PPP overall. That gap means a large portion of offensive efficiency must come from transition scoring, and transition opportunities are driven partly by defense.
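To put rough numbers on that, overall PPP is just a possession-weighted average of half-court and transition efficiency. Here’s a back-of-the-envelope sketch; the possession split and the transition efficiency are illustrative assumptions, not figures from Winn’s table:

```python
# Overall PPP is a possession-weighted average of half-court and
# transition efficiency. All numbers here are illustrative assumptions,
# not figures from Luke Winn's table.
half_court_ppp = 0.95    # below 1.0, like all but three teams
transition_ppp = 1.15    # transition chances convert at a far higher rate
transition_share = 0.25  # assume ~25% of possessions come in transition

overall_ppp = ((1 - transition_share) * half_court_ppp
               + transition_share * transition_ppp)
print(f"overall PPP: {overall_ppp:.3f}")  # 1.000, above the half-court mark
```

Under these assumptions, a team that is below average in the half court can still clear 1 PPP overall if its defense generates enough transition chances.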

For another, when a team builds a lead by really clicking on offense, they can reduce their defensive effort with little consequence, and vice versa. Even Dean Oliver himself, the godfather of modern tempo-free basketball analysis, acknowledged that offensive and defensive production are related, which is why he developed the correlated Gaussian method.

Reflecting this idea, our ratings look at a game as the overarching unit of measure. While Ken’s ratings are driven by adjusted tempo-free offensive and defensive efficiency, our ratings are essentially driven by adjusted game scores.
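Part 1 describes how we actually compute those adjustments. Purely to illustrate what “adjusted game scores” means, here is a minimal sketch of the classic iterative approach, in which each team’s rating is its average scoring margin corrected for the average strength of its opponents. This is a simplification for illustration, with made-up scores, not our production algorithm:

```python
from collections import defaultdict

# games: (team, opponent, team_points, opponent_points); made-up scores
games = [
    ("Duke", "UNC", 78, 70),
    ("UNC", "Davidson", 80, 75),
    ("Davidson", "Duke", 68, 81),
]

def adjusted_margins(games, iterations=100):
    """Iterate rating[t] = avg margin + avg opponent rating to a fixed point."""
    results = defaultdict(list)  # team -> [(opponent, margin), ...]
    for team, opp, pf, pa in games:
        results[team].append((opp, pf - pa))
        results[opp].append((team, pa - pf))
    ratings = {t: 0.0 for t in results}
    for _ in range(iterations):
        new = {}
        for team, played in results.items():
            avg_margin = sum(m for _, m in played) / len(played)
            avg_opp = sum(ratings[o] for o, _ in played) / len(played)
            new[team] = avg_margin + avg_opp
        mean = sum(new.values()) / len(new)  # pin the league average to zero
        ratings = {t: r - mean for t, r in new.items()}
    return ratings

print(adjusted_margins(games))  # {'Duke': 7.0, 'UNC': -1.0, 'Davidson': -6.0}
```

The point is simply that the unit being adjusted is the final game score, not separate offensive and defensive efficiency numbers.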

Blowouts

Second, the Pomeroy ratings do not discount increasing margins of victory, meaning that the 10 points between winning by 30 and winning by 40 are worth the same as the 10 points between losing by 5 and winning by 5. In our Predictive ratings, the latter difference is more valuable.

The win credits (see Part 1) earned by a team in our Predictive ratings are determined largely by what the final score would tell us about the probable outcome of a hypothetical rematch. A team that wins by 1 gets barely over 0.5 win credits, while a team that wins by a huge margin gets very nearly 1 full credit.

Extra points tacked onto a blowout victory tell us very little. Whether a team wins by 30 or 40, the conclusion is the same: they are far better than the opponent. Because those ten points change our conclusion very little, they also produce little change in the win credit awarded.
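We won’t reproduce our exact curve here, but a logistic-style mapping from final margin to win credit captures the behavior described above. In the sketch below, both the logistic shape and the 0.08 slope are illustrative assumptions, not our production values:

```python
import math

def win_credit(margin, slope=0.08):
    """Map a final scoring margin to a win credit in (0, 1).

    A 1-point win earns barely over 0.5 credits and huge wins approach a
    full 1.0 credit, with strongly diminishing returns. The logistic shape
    and the slope value are illustrative assumptions, not our actual curve.
    """
    return 1.0 / (1.0 + math.exp(-slope * margin))

for margin in (1, 5, 10, 30, 40):
    print(f"win by {margin:>2}: {win_credit(margin):.3f} credits")
```

With these illustrative numbers, the 10 points between losing by 5 and winning by 5 move the win credit by about 0.20, while the 10 points between winning by 30 and winning by 40 move it by less than 0.05.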

Picking Winners

So now for the million-dollar question: which system is more accurate? The answer depends on how you define accuracy.

We collected published win probabilities from kenpom.com for all games through February 14 [a date chosen for no reason other than that’s when we originally pulled the data], and compared them to predictions generated from our Predictive ratings over the same time frame. We then grouped the predictions by their published degree of confidence and compared the expected correct pick percentage (Exp%) against the actual correct pick percentage (Act%):

Pomeroy Ratings & TeamRankings Predictive Power Ratings: Prediction Accuracy

| Confidence | TR Exp% | TR Act% | TR Diff | Pomeroy Exp% | Pomeroy Act% | Pomeroy Diff |
|------------|---------|---------|---------|--------------|--------------|--------------|
| 80%+       | 89.3%   | 89.5%   | +0.2%   | 90.8%        | 88.4%        | -2.4%        |
| 65-79%     | 72.5%   | 69.7%   | -2.8%   | 72.2%        | 68.7%        | -3.5%        |
| 50-64%     | 57.9%   | 57.3%   | -0.6%   | 57.2%        | 58.9%        | +1.7%        |
| All games  | 75.0%   | 74.0%   | -1.0%   | 77.7%        | 76.0%        | -1.7%        |

As you can see, Ken’s currently got us beat by a hair in overall accuracy this year, although we’re both doing well compared to the ratings listed at The Prediction Tracker.

While he has us pipped on correct pick percentage, our predictions have been better calibrated. In other words, if both TR and KenPom tell you that a team has an 80% chance to win, TR has come closer to getting exactly 80% of those games right. As Ken himself has acknowledged, the win probabilities produced by his ratings run a little hot. As a result, Pomeroy’s odds tend to underestimate the chances of a large upset, or of a low seed making a deep NCAA tournament run.
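For the curious, producing a table like the one above is straightforward: bucket each game by the favorite’s stated win probability, then compare the average stated confidence (Exp%) against the share of games the favorite actually won (Act%). Here is a minimal sketch; the bucket boundaries match the table above, but the inputs are made up:

```python
def calibration_table(predictions):
    """Compare average stated confidence (Exp%) to observed hit rate (Act%)."""
    buckets = {"80%+": (0.80, 1.01), "65-79%": (0.65, 0.80),
               "50-64%": (0.50, 0.65)}
    rows = []
    for name, (lo, hi) in buckets.items():
        group = [(conf, won) for conf, won in predictions if lo <= conf < hi]
        if not group:
            continue
        exp = sum(conf for conf, _ in group) / len(group)
        act = sum(won for _, won in group) / len(group)
        rows.append((name, exp, act, act - exp))
    return rows

# Each entry: (favorite's stated win probability, did the favorite win?)
sample = [(0.85, True), (0.91, True), (0.72, False), (0.66, True),
          (0.55, False), (0.60, True), (0.83, True), (0.70, True)]
for name, exp, act, diff in calibration_table(sample):
    print(f"{name:>7}: Exp {exp:.1%}  Act {act:.1%}  Diff {diff:+.1%}")
```

A well-calibrated system shows small Diff values in every bucket, which is a different goal than simply maximizing the overall correct pick percentage.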

Of course, we need to take these results with a grain of salt, since sample sizes over one season are still relatively small (about 4,000 games).

The main point here is that there are several smart ways to approach a challenge like algorithmically rating team performance. We and KenPom have adopted different philosophies toward achieving that goal. There are good and bad aspects to both of our systems, and our respective performance in terms of prediction accuracy is comparable.

Coming Tomorrow: Descriptions of the various models that we use to predict game winners, score margins, and spread or totals picks.