November 3, 2019 - by David Hess
John Calipari knows that past program success matters (Photo by Scott Winters/Icon Sportswire)
This post describes our methodology and process for creating college basketball preseason rankings for all 353 teams in Division I men’s basketball.
As one would expect from TeamRankings, our college basketball preseason rankings are driven by stats and modeling, rather than film study or media scouting reports.
Before we dive into the details of our approach, let’s cover a few basics.
First, it’s important to know that our preseason rankings are simply the rank order of the preseason predictive ratings that we generate for every Division I college basketball team.
So to create our preseason rankings, the first thing we do is calculate preseason ratings for every team.
In simple terms, a team’s predictive rating is a number that represents the margin of victory we expect when that team plays a “perfectly average” Division I team on a neutral court.
This rating can be a positive or negative number; the higher the rating, the better the team. A rating of 0.0 indicates a perfectly average team.
Because our predictive rating is measured in points, the difference in rating between any two teams indicates the projected winner and margin of victory in a neutral-site game between them.
For example, our system would expect Michigan State, which has a 2019 preseason rating of +21.6, to beat an average Division I team (with a 0.0 rating) by about 22 points on a neutral court.
It would expect Michigan State to beat Delaware State, which has a -20.0 rating, by about 41 or 42 points. And Delaware State would be expected to lose to an average team by about 20 points.
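The arithmetic above is simple enough to sketch directly. This is a minimal illustration (the function name is ours; the ratings are the 2019 preseason figures quoted above):

```python
def projected_margin(rating_a: float, rating_b: float) -> float:
    """Projected neutral-court margin of victory for team A over team B."""
    return rating_a - rating_b

# 2019 preseason ratings quoted above
michigan_state = 21.6
delaware_state = -20.0
average_team = 0.0

print(projected_margin(michigan_state, average_team))    # beats average by ~22
print(projected_margin(michigan_state, delaware_state))  # beats Delaware St. by ~41-42
```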
Understanding the nature of predictive ratings is critical, because they are a more precise metric than a simple ranking.
For example, Duke fans may not like that Kentucky is ranked ahead of them in our 2019 preseason rankings. But the two are only separated by 0.2 points, +19.5 for Kentucky and +19.3 for Duke.
So yes, if you put a gun to our head and forced us to rank order every team, we’d say Kentucky is going to be better than Duke this season. But the difference is so small that it’s practically meaningless. Based on our preseason ratings, Duke vs. Kentucky projects as a toss-up game on a neutral court.
However, the drop in preseason rating between No. 1 Michigan State and No. 2 Kentucky is more than 2 points, which is a more meaningful gap.
So don’t place too much stock in a team’s ranking. Ratings tell the more refined story.
Once the college basketball season starts, our predictive ratings go on autopilot. Every morning, our system automatically adjusts team ratings (and the resulting rankings) based on the game results from the day before.
Teams that win by more than our ratings had predicted see their ratings increase. Teams that suffer worse than expected losses see their ratings drop. Software code controls all of the adjustments and no manual intervention is required.
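The actual adjustment logic is proprietary, but the basic idea can be sketched as a nudge toward what each game result implies. Everything here is a simplified illustration, and the learning rate `k` is purely hypothetical:

```python
def update_rating(rating: float, opponent_rating: float,
                  actual_margin: float, k: float = 0.05) -> float:
    """Nudge a team's rating toward what a game result implies.

    A simplified sketch of margin-based daily updating; the real
    TeamRankings adjustment is more sophisticated, and `k` here is
    a hypothetical learning rate.
    """
    predicted_margin = rating - opponent_rating   # neutral-court prediction
    surprise = actual_margin - predicted_margin   # beat or missed expectations
    return rating + k * surprise

# A +10.0 team beats a +2.0 team by 20 (predicted margin was 8),
# so its rating rises; had it won by only 5, its rating would drop.
new_rating = update_rating(10.0, 2.0, 20.0)
```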
Generating preseason ratings, however, involves a more labor-intensive process that we go through before every new season starts. What we are trying to do, in basic terms, is to pre-calibrate our predictive ratings system. We want to give it a smarter starting point than simply having every team start out with a 0.0 rating.
Put another way, our preseason ratings are our first prediction of what we think every Division I men’s college basketball team’s predictive rating will be at the end of the upcoming season. And we need to make that prediction before any regular season games are actually played.
Despite being a substantial challenge from a data perspective, our approach to this process is still mostly data-driven and objective. However, there are some judgment calls incorporated, which we’ll explain below.
Before we get into the details, a brief history may help explain how and why our current preseason ratings process evolved.

The reason we ultimately moved to a fully modeled approach is simple. Generating preseason team ratings using a customized model significantly improved the in-season game predictions made by our ratings — and not only in early season games, where one would logically expect to see the biggest improvement.
In fact, still giving the preseason ratings some weight even at the very end of the season improved our NCAA tournament prediction performance.
The payoff has been clear and measurable. For example, according to college basketball ratings analysis by Mark Moog, using data from the Massey College Basketball Rankings Composite, our rankings finished 1st out of 16 tracked systems in full-season prediction accuracy for the 2018-19 season. The year before that, our rankings finished 4th out of 14 tracked systems. The Pomeroy ratings were the only other system to finish in the top 4 both seasons, coming in 3rd last year and 2nd the year before that.
During every college basketball offseason, we first put in work to improve our preseason ratings methodology. We investigate new potential data sources, and refit our preseason ratings model using an additional year of data.
After implementing any refinements to our process and model, we then gather the necessary data from various sources, and generate our preseason ratings for the upcoming season. We typically complete the process a week or so before the regular season starts.
Now let’s get to the meat. By analyzing years of historical college basketball data — our current training data set includes team profiles going back to the 2007-08 season — we’ve identified a short list of descriptive factors that have correlated strongly with end-of-season power ratings.
We use a two-stage regression model to determine each factor’s weight in our preseason ratings.
Using a regression model helps ensure that the relative importance of each factor in our ratings is based on its demonstrated level of predictive power, rather than arbitrary weights that just “feel right” to us.
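The two-stage details aside, the core idea is an ordinary regression of end-of-season ratings on the factor values. Here is a single-stage sketch on synthetic data (the factor names, weights, and noise level are all illustrative, not the real fitted values):

```python
import numpy as np

# Each row is one historical team-season's factor values, e.g.
# [last_year_rating, program_history, returning_offense, recruiting].
# Synthetic data for illustration; the real training set goes back to 2007-08.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
true_w = np.array([0.8, 0.3, 0.15, 0.1])          # hypothetical "true" weights
y = X @ true_w + rng.normal(scale=2.0, size=500)  # end-of-season ratings

# Ordinary least squares recovers each factor's weight from the data,
# rather than assigning weights that just "feel right."
A = np.column_stack([X, np.ones(500)])            # add an intercept column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
```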
Finally, we group the impact of some variables into single components to help us interpret and talk about the model. Here are the components, which we’ll discuss in more detail below:

- LAST YEAR
- PROGRAM HISTORY
- RETURNING OFFENSE
- RETURNING DEFENSE
- RECRUITING
- TRANSFERS
- COACHING
How good a team was in the most recent season — as measured by end-of-season predictive rating and not win-loss record — is the single best objective measure of how good that team will be in the upcoming season.
The year-to-year correlation coefficient for our predictive rating is +0.84. That’s very strong. The correlation of our preseason predicted ratings to end-of-season ratings is +0.90, so using last year’s rating gets us most of the way there.
In non-stat geek terms: Duke is not going to turn into Florida A&M overnight. Even “terrible” years for elite programs are good seasons in the overall college basketball landscape.
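For the stat-curious, a year-to-year correlation like this is just the Pearson correlation between each team's rating in consecutive seasons. A sketch with a tiny made-up sample (the real +0.84 figure comes from the full historical data set):

```python
import numpy as np

# End-of-season predictive ratings for the same five teams in
# consecutive seasons (illustrative numbers only)
last_year = np.array([21.6, 19.5, 19.3, 0.0, -20.0])
this_year = np.array([18.0, 22.1, 17.5, 2.5, -17.0])

r = np.corrcoef(last_year, this_year)[0, 1]  # Pearson correlation coefficient
```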
That said, other factors do contribute meaningfully to the final preseason ratings.
This factor measures how good a team has been in recent history, not including the previous season.
College basketball programs aren’t forged anew from the molten earth each season. They are continuations of the past. What happened 2, 3 or 4 years ago is relevant to this season for a number of reasons.
Some of the players are still around. Oftentimes the coaching staff is largely the same. The facilities usually don’t change much, and neither does the fan support. Geographic advantages and disadvantages don’t change. Looking at longer-term performance trends measures the “brand value” of a program, so to speak.
We think most fans intuitively understand the importance of program history. Suppose all you know about two teams is that they finished last season with identical ratings, but Team B has been far stronger than Team A over the several seasons before that.

Which team do you think is likely to be better this year? (We’re going with Team B, in case it wasn’t clear.)
This is borne out by the numbers. The correlation between final predictive ratings in a given year and those from two seasons earlier is +0.76. (Remember, the correlation with the immediately previous season is +0.84.) The correlation with ratings from three seasons earlier is still +0.72, and with ratings from four seasons earlier it is +0.70.
The returning offense component tells us how much additional improvement or decline we can expect based on the total offensive production (which we’ll explain shortly) of a team’s returning players, compared to a baseline expectation for a team of that quality.
The “additional” and “for a team of that quality” parts of that definition are important! A lot of the value of the returning players is already accounted for by the LAST YEAR component. In a way, you can think of that component as assuming that every team is returning an exactly average amount of their production from the previous season (so, about 50-55%).
If a team is returning less offensive production than that, it’s going to get docked some in the RETURNING OFFENSE component, even though the returning players might be very good. For example, Texas Tech in 2019 is returning only 29% of its offensive production, so it has a negative RETURNING OFFENSE value. Alcorn State is returning 76% of its production, so it has a positive RETURNING OFFENSE value. The returning players on Texas Tech are probably better than those on Alcorn State! But as a group their production was less than the “expected” returning value for a team as good as Texas Tech. Meanwhile, the returning Alcorn State players produced more than you’d typically expect for a team of Alcorn State’s quality.
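The Texas Tech vs. Alcorn State contrast can be sketched as a penalty or bonus relative to the baseline. The baseline and scaling weight below are illustrative stand-ins; the real values come out of the fitted regression model:

```python
def returning_offense_component(pct_returning: float,
                                baseline: float = 0.525,
                                weight: float = 10.0) -> float:
    """Bonus or penalty relative to the ~50-55% of offensive production
    an average team returns. `baseline` and `weight` are hypothetical;
    the actual values are determined by the regression model."""
    return weight * (pct_returning - baseline)

texas_tech = returning_offense_component(0.29)    # returns only 29% -> negative
alcorn_state = returning_offense_component(0.76)  # returns 76% -> positive
```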
In addition to simply looking at the percent of returning production, we make two additional small adjustments.
Again, we’re not doing these on a whim. These adjustments improve the accuracy of the model.
So, what do we mean by “offensive production”?
We calculate a player’s offensive production in four steps.
We sum the value for all players in order to find the total team offensive production. We can then look at the value of only the returning players to find the percent of returning production.
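This roll-up step is straightforward to sketch (player names and production values here are hypothetical):

```python
def percent_returning(production: dict[str, float],
                      returning: set[str]) -> float:
    """Share of total team offensive production contributed by
    returning players. `production` maps player name -> that player's
    offensive production value."""
    total = sum(production.values())
    kept = sum(v for name, v in production.items() if name in returning)
    return kept / total

# Hypothetical three-player roster; two players return
roster = {"Player A": 400.0, "Player B": 300.0, "Player C": 300.0}
pct = percent_returning(roster, {"Player A", "Player C"})  # 0.7
```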
The returning defense component is very similar to the returning offense one. Like the offense, it’s the amount of additional improvement or decline expected based on the amount of returning defensive production, compared to a baseline for a team of that quality.
We calculate “defensive production” for each player based on the Dean Oliver definition of defensive rating, similar to the way we calculate “offensive production.” We then sum the production of all players, and calculate the percent returning.
And, again like returning offense, we make some additional adjustments beyond simply looking at the percent of returning defensive production.
The recruiting component represents the projected value of the last two recruiting classes. Most of the value (about 75%) comes from this season’s entering class, but there is still a bit of value in having a good class the previous year. Presumably this is because those highly-ranked players are likely to improve more this season than other non-elite recruits are.
In order to make our recruiting class rankings, we use RSCI consensus recruiting data. Based on their average rank across the various recruiting sites, each player is assigned a score that represents their expected value to a team. These scores are based on analysis of past data, mapping recruiting rankings to team rating improvements.
We then sum the value of all recruits to get a team’s overall class recruiting value.
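Conceptually, this is a mapping from consensus rank to expected value, summed over the class. The decay curve below is purely hypothetical; the real mapping is fit from historical data linking recruiting ranks to team rating improvements:

```python
import math

def recruit_value(rsci_rank: float) -> float:
    """Hypothetical mapping from RSCI consensus rank to expected team
    rating value. The real curve is estimated from historical data;
    this exponential decay just illustrates the shape (elite recruits
    are worth far more than fringe ones)."""
    return 4.0 * math.exp(-rsci_rank / 30.0)

def class_value(ranks: list[float]) -> float:
    """Sum individual recruit values to get the class's overall value."""
    return sum(recruit_value(r) for r in ranks)

elite_class = class_value([1, 5, 12])       # three top-15 recruits
modest_class = class_value([80, 150, 250])  # three lower-ranked recruits
```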
Transfer value is calculated very similarly to returning player value. We calculate offensive and defensive production, and total those up to get the overall value for a player.
However, there’s a wrinkle here. In addition to the value calculation using a replacement-level baseline, we also calculate an overall production value using a higher baseline closer to the Division I average efficiency. This results in a second — and lower — player overall production value.
We blend those two values based on the initial predicted rating of the player’s new team from the “first stage” regression mentioned above. The better the team is the more weight we give to the second value.
In effect, this means that the same player has more value when transferring to a bad team than when going to a good one. This makes some sense. First, the worse team will likely have more minutes available for him. Second, worse teams tend to be in worse conferences, and play worse schedules, so the player is more likely to be facing easier competition, which ought to be better for his production.
It also means that the same player has more value when returning to a good team than when transferring to a different good team. We’re OK with that — players transfer for a reason, and this could reflect that transferring players tend to have hidden issues that aren’t evident from the efficiency stats. Or, it could simply reflect it takes some time to learn a new system and fit into a new team, and there is some chance the “fit” won’t be as good as before.
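The blending logic above can be sketched as follows. The linear ramp used for the blend weight is our own assumption for illustration; the post only specifies that better teams put more weight on the lower, average-baseline value:

```python
def transfer_value(replacement_value: float,
                   average_value: float,
                   new_team_rating: float,
                   max_rating: float = 25.0) -> float:
    """Blend a transfer's two production values based on the new team's
    first-stage predicted rating. `replacement_value` uses the
    replacement-level baseline (higher); `average_value` uses the
    near-D-I-average baseline (lower). The linear ramp and `max_rating`
    cap are hypothetical."""
    # Weight on the lower, average-baseline value rises with team quality
    w = min(max(new_team_rating / max_rating, 0.0), 1.0)
    return (1 - w) * replacement_value + w * average_value

# Same player (values 3.0 and 1.0) landing at teams of different quality:
to_bad_team = transfer_value(3.0, 1.0, -10.0)  # full replacement-level value
to_good_team = transfer_value(3.0, 1.0, 20.0)  # mostly the lower value
```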
The coaching component is less rigorous than the others. In fact, it’s a manual adjustment much like the market adjustment that we’ll discuss below.
For teams with new coaches, we review the coaching history for both the old and new coach. This includes inspecting how each school performed (in terms of final season ratings, win loss record, and NCAA tournament seeding and results) before, during, and after the coach’s tenure there.
When the new coach appears to be better or worse than the old coach, based on their past coaching resume, we make an adjustment.
After our model generates its data-driven preseason ratings for college basketball, we then compare those ratings (and the resulting team rankings) to the betting markets and human polls.
If our assessment of a specific team seems way out of whack in comparison to those benchmarks, we’ll investigate more. Primarily, we’re looking to identify some factor not taken into account by our model (e.g. an injury in the previous season, or a coaching change 2 or 3 seasons ago) that is likely to impact the expected performance level of a team.
In some of those cases, we end up adjusting our rating to be closer to the consensus. As a result, this final part of the process does inject some subjective judgment calls into our process.
We’re data guys, so it typically takes a lot of convincing for us to incorporate some level of subjectivity into our predictions.
There’s a very high statistical bar to reach in order to anoint a particular stat as generally predictive of future performance. Consequently, very few stats pass the test.
That’s a good thing. One of the biggest challenges of predictive modeling is filtering out the signal from the noise, and “false positives” based on small sample sizes can ruin the future accuracy of a model.
At the same time, lots of different factors are still likely to impact the future performance of a particular team in some significant way. But until we have a large enough sample size of similar events to analyze, it would be very risky to incorporate them into our model.
Especially in more outlier-type cases, our best solution for the foreseeable future may be to make manual adjustments to incorporate the opinion of the betting markets.
Of course, now that we’ve been making these market adjustments for several years, we’ve evaluated them, and … they do improve our overall accuracy. So we’ll continue to use them.
There are many different ways to make college basketball preseason rankings. The approaches can vary greatly, from media power rankings to “expert” analysis, from building complex statistical models to making inferences from futures odds in the betting markets.
And speaking frankly, there’s plenty of crap out there. But there’s also no Holy Grail.
Within ten seconds of looking over our preseason college basketball rankings, you’ll probably find several rankings you disagree with, or that differ from what most other “experts” or ranking systems think. That’s to be expected.
When the dust settles at the end of the season, our college basketball preseason ratings, and the various projections we generate using them, will almost certainly be way off for a few teams. As happens every year, some teams simply defy expectations thanks to surprise breakout performances, while other teams are impacted by injuries, suspensions and other unanticipated events.
Nonetheless, the primary goal of our preseason analysis is to provide a baseline rating for each team (or “prior” in statistical terms) that makes our system better at predicting game results. We’re most concerned about the overall accuracy of the system — that is, how good it is at predicting where every predictive rating for every college basketball team will end up at the end of the upcoming season.
For that purpose, we’ve settled on an almost entirely data-driven (but still subjectively adjusted in a handful of cases) approach to preseason team ratings. And so far, this approach has delivered very good results.
Printed from TeamRankings.com - © 2005-2020 Team Rankings, LLC. All Rights Reserved.