Under the TeamRankings Hood, Part 4: Models, Models, Everywhere

[This post is the last of a four-part series covering our ratings and models. Today we present an overview of each of our statistical models, which combine the ratings covered in Parts 1-3 with outside information in order to make game winner, spread, totals, and money line predictions. For background, see the previous posts. Part 1: Power Ratings Basics. Part 2: Defining Each Rating. Part 3: Pros And Cons.]

In various places across our site, you’ll find game predictions. For example, our NCAA college basketball win picks page shows the win odds for all the current day’s games, and all our game matchup overviews have game winner, spread, totals, and money line picks.

These predictions are derived from six different statistical models, which range in complexity from simple historical win percentages to a machine learning algorithm that uses hundreds of input variables. To see an example of predictions from all six models, check out: Georgia vs Washington.

Below are detailed descriptions of all the models, along with pros and cons of each. We hope we’ve covered all the most important details, but if anything is still unclear, feel free to ask questions in the comment section.

Predictive Power Ratings Model

Our Predictive Power Ratings Model uses the Predictive power ratings described in the first three parts of this series. To summarize, a computer algorithm iteratively analyzes data on every Division I NCAA basketball game result this season. In the end, every team receives a simple numerical rating (e.g. 92.4 or 104.7) indicative of its actual, documented, proven track record at outscoring opponents.

Like fine wine, predictive power ratings tend to improve with age. The more game results data recorded for a given season, the lower the potential impact of luck on a team’s overall performance results, and the more “connections” the model can make between teams in conferences of varying strength.

By comparing the predictive ratings of any two teams, you can determine projections for the game winner and expected margin of victory in a hypothetical game between the two teams. Calculating win odds is more complex; a formula translates the differences in predictive ratings into respective win odds for each team.
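As a rough illustration of that comparison, here is a minimal Python sketch. The exact win odds formula is not given in this post, so the sketch assumes final margins are roughly normally distributed around the ratings difference. The function name, the `home_edge` parameter, and the 11-point standard deviation are all assumptions made for the example, not values used on the site.

```python
import math

def predict_matchup(rating_a, rating_b, home_edge=0.0, margin_sd=11.0):
    """Return (expected margin for team A, win probability for team A).

    home_edge and margin_sd are illustrative assumptions, not site values.
    """
    # Expected margin is simply the ratings gap, plus any location adjustment.
    margin = rating_a - rating_b + home_edge
    # Assumed conversion: normal CDF of the expected margin.
    win_prob = 0.5 * (1.0 + math.erf(margin / (margin_sd * math.sqrt(2.0))))
    return margin, win_prob

# Using the two example ratings mentioned earlier, on a neutral court.
margin, win_prob = predict_matchup(104.7, 92.4)
print(f"expected margin: {margin:.1f}, win odds: {win_prob:.1%}")
```

Under this assumption, two teams with identical ratings on a neutral court come out at exactly 50% each, and the odds for the two sides always sum to 100%.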

If you went back and recreated the entire NCAA basketball season using the final predictive power ratings for every team and the prediction methods described above, all 300+ Division I teams would end up with margin of victory performance and a win-loss record equal or very close to what actually happened.

Outputs: win odds, margin of victory

Strengths: Predictive power ratings are relatively abstract measures that cut through media hype and rate teams based on actual scoring differentials, adjusted for game location and opponent strength. You can blab all you want about a team’s legendary coach, elegant offense and twin 7-footers — but who the heck cares if they still don’t outscore the average opponent better than 40% of the other teams in Division I?

Weaknesses: The Predictive Power Ratings Model is a dynamic and reactive system, but the sole inputs are game results. If an absolutely key player gets injured or a bad team all of a sudden gets fired up and starts playing well, it could take several games for the impact of those developments to make a big difference in predictive ratings. Abstraction has a downside too; if a huge situational mismatch exists between two particular teams (e.g. one team has an eight foot tall center and its opponent has no player over 5’5″), predictive ratings have no idea.

Similar Games Model

Our Similar Games Model uses data-driven algorithms to identify historical NCAA tournament games that featured statistically alike teams competing under similar matchup circumstances.

For example, imagine that a first round matchup features a high scoring team from a weak conference playing a low scoring team that turns the ball over a lot. Both teams are traveling moderate distances to a neutral site arena. Similar matchup scenarios most likely have occurred in recent history, and the Similar Games Model identifies them and analyzes their outcomes.

Final predictions depend on the aggregate analysis of a number of data points about each identified similar historical game, such as which team won, by how much, how many points were scored, what results were implied by betting lines, and the relative degree of similarity to the current game.
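One simple way to picture that aggregation is a similarity-weighted average, sketched below in Python. The `SimilarGame` record, its field names, and the weighting scheme are hypothetical; the post does not specify how similarity is scored or how the historical games are combined.

```python
from dataclasses import dataclass

@dataclass
class SimilarGame:
    similarity: float  # hypothetical 0..1 score; higher means closer to today's matchup
    team_won: bool     # did the team analogous to "our" team win?
    margin: float      # final margin from that team's point of view

def aggregate(games):
    """Similarity-weighted win odds and expected margin (an assumed scheme)."""
    total_weight = sum(g.similarity for g in games)
    if total_weight == 0:
        raise ValueError("no similar games with positive similarity")
    win_odds = sum(g.similarity for g in games if g.team_won) / total_weight
    expected_margin = sum(g.similarity * g.margin for g in games) / total_weight
    return win_odds, expected_margin

# Made-up historical matches for illustration only.
games = [
    SimilarGame(similarity=0.9, team_won=True, margin=8),
    SimilarGame(similarity=0.6, team_won=False, margin=-3),
    SimilarGame(similarity=0.5, team_won=True, margin=2),
]
win_odds, expected_margin = aggregate(games)
print(f"win odds: {win_odds:.2f}, expected margin: {expected_margin:.1f}")
```

Weighting by similarity means a near-identical historical game counts for more than a loose match, which is one plausible reading of "the relative degree of similarity" above.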

Outputs: win odds, projected final score, point spread cover odds, over/under odds

Strengths: This model incorporates a range of power ratings and team stats as well as several contextual factors including Vegas line implications, travel distances, and game timing.

Weaknesses: This model does not explicitly consider several difficult-to-model factors such as recent injuries or days of rest. If you feel one of those factors may have a material impact on the outcome of a given game, it may be wise to apply a subjective adjustment to the model’s predictions. Also, in certain cases, highly uncommon matchup scenarios make it impossible to find many relevant historical matchups.

Simulation Model

The theory behind our Simulation Model is to use possession-based statistics (also known as ‘tempo-free’ or efficiency statistics) to project the likely outcome of a game.

Possession-based stats are better measures of team performance than “per game” stats primarily because the pace of a basketball game is an important driver of the final outcome. For example, imagine two teams are playing each other. One team turns the ball over an average of 15 times a game, while the other only turns it over seven times a game. Which team takes better care of the ball?

Of course, that’s a trick question. What you really need to compare is how efficient each team is at handling the ball. If the first team typically has twice as many possessions per game as the second team, then in reality, these teams probably have about equal ball handling skills.
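The arithmetic behind that answer is just turnovers divided by possessions. The short Python sketch below works it through; the possession counts are hypothetical, chosen only so that one team plays at twice the pace of the other.

```python
def turnover_rate(turnovers_per_game, possessions_per_game):
    """Turnovers per possession: a tempo-free measure of ball security."""
    return turnovers_per_game / possessions_per_game

# Hypothetical possession counts: the first team plays at twice the pace.
fast_team = turnover_rate(15, 140)
slow_team = turnover_rate(7, 70)
print(f"{fast_team:.3f} vs {slow_team:.3f}")
```

With these made-up numbers the two teams give the ball away on roughly 10.7% and 10.0% of their possessions, even though the raw per-game totals differ by more than a factor of two.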

In our Simulation Model, we first generate what essentially are tempo-free power ratings for a team’s performance in major stat categories (blocks, rebounds, etc.). We then look at the number of possessions each of two opposing teams typically has, and estimate the number of possessions we expect in the game being modeled. Given that number, we can use the individual stat ratings for each team to project what will happen on every possession, leading to a score prediction and expected box score.
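To make the possession-by-possession idea concrete, here is a deliberately simplified Python sketch. It is not the site's model: it assumes each possession ends in either a turnover or a single two-point attempt, it averages the two teams' tempos to estimate the possession count, and every rate and name in it is a made-up placeholder.

```python
import random

def simulate_game(team_a, team_b, possessions, n_sims=5000, seed=42):
    """Monte Carlo sketch: returns (win probability for A, average margin for A)."""
    rng = random.Random(seed)

    def score(team):
        points = 0
        for _ in range(possessions):
            if rng.random() < team["turnover_rate"]:
                continue  # possession ends without a shot
            if rng.random() < team["score_rate"]:
                points += 2  # simplification: every made shot is worth two
        return points

    wins_a, total_margin = 0.0, 0
    for _ in range(n_sims):
        a, b = score(team_a), score(team_b)
        total_margin += a - b
        if a > b:
            wins_a += 1.0
        elif a == b:
            wins_a += 0.5  # treat a regulation tie as a coin flip
    return wins_a / n_sims, total_margin / n_sims

# Hypothetical opponent-adjusted rates and tempos.
team_a = {"turnover_rate": 0.15, "score_rate": 0.55, "tempo": 70}
team_b = {"turnover_rate": 0.22, "score_rate": 0.50, "tempo": 66}
expected_possessions = round((team_a["tempo"] + team_b["tempo"]) / 2)

win_prob_a, avg_margin = simulate_game(team_a, team_b, expected_possessions)
print(f"A wins {win_prob_a:.1%} of simulations, average margin {avg_margin:+.1f}")
```

A fuller version would track rebounds, free throws, three-pointers, and so on for each possession, which is what allows a projected box score rather than just a final score.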

Outputs: win odds, margin of victory, projected final score

Strengths: Unlike the more abstract power ratings and similar games models, the Simulation Model takes a very “bottom-up” approach to analyzing the playing styles of two teams, how they match up, and how specific statistical differences (e.g. a strong rebounding team playing against a weak rebounding team) are likely to affect the final score. The data used is all from this season, and each statistical rating for each team is adjusted for opponent strength.

Weaknesses: There are more assumptions at play in this model than in other models. Primarily, we are assuming that a team will continue to play its same general style of basketball as its season stats so far imply. That’s usually a safe assumption, but if a coach makes a radical change in game strategy or play calling for a given opponent, the raw data on which this model is based loses relevance.

Decision Tree Model

Our Decision Tree Model is the output of a machine learning algorithm that views every college basketball game since 1999 through the lens of hundreds of input variables, ranging from contextual information like the distance traveled by each squad to team statistics like effective field goal percentage.

The algorithm does what might be convenient to think of as complex, high volume, statistically significant trend analysis. It repeatedly partitions the games into smaller and smaller subsets based on the values of one or more variables. Each split is chosen so that the win probabilities of the teams in each group get further away from 50% and closer to 0% or 100%.

But it also takes care not to be overzealous in the splitting, checking to be sure that the splits are meaningful, and not just the products of a small sample size. In the end, it’s left with a set of rules along the lines of: If Variable1 is greater than 100 and Variable2 is less than 7 and Variable3 equals “YES” (and so on) then the win probability of TeamA is 64%.
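For readers who like to see the mechanics, the Python sketch below picks a single split on one numeric variable. It scores candidate thresholds with Gini impurity, a common criterion that rewards groups whose win rates sit far from 50%, and it refuses splits that leave too few games on either side. The post does not say which criterion or sample-size check is actually used, so both are assumptions, as are the function names and the toy data.

```python
def gini(wins, n):
    """Gini impurity of a group: 0 when everyone won or lost, 0.5 at a 50/50 mix."""
    if n == 0:
        return 0.0
    p = wins / n
    return 2 * p * (1 - p)

def best_split(values, outcomes, min_leaf=2):
    """Find the threshold on one variable that gives the purest two groups.

    values: one number per game; outcomes: 1 if TeamA won, else 0.
    Returns (threshold, weighted impurity); threshold is None if no split helps.
    """
    n = len(values)
    best = (None, gini(sum(outcomes), n))
    for threshold in sorted(set(values))[:-1]:
        left = [o for v, o in zip(values, outcomes) if v <= threshold]
        right = [o for v, o in zip(values, outcomes) if v > threshold]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # guard against splits driven by a tiny sample
        weighted = (len(left) * gini(sum(left), len(left))
                    + len(right) * gini(sum(right), len(right))) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Toy data: TeamA wins whenever the variable exceeds 3.
threshold, impurity = best_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(threshold, impurity)
```

A full tree repeats this search on each resulting group, over every available variable, until no further split passes the checks.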

Complicated enough for you? OK, now here’s the twist: we actually have about 100 different decision tree models, each of which looks at a different subset of variables. The results of all the individual models are averaged to get the overall win probability. This helps reduce the effect of crazy outlier results, and ensures that we have multiple reasons for picking a certain team to win.
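The averaging step itself is simple, as the Python sketch below shows. The three stand-in "trees", their variable names, and their thresholds are invented for illustration; real trees would be learned from data rather than written by hand.

```python
# Each hypothetical tree looks at a different variable and returns TeamA's win probability.
def tree_ratings(game):
    return 0.64 if game["rating_diff"] > 5 else 0.48

def tree_travel(game):
    return 0.70 if game["travel_miles_diff"] < 0 else 0.52

def tree_shooting(game):
    return 0.95 if game["efg_diff"] > 0.10 else 0.55

def ensemble_win_probability(game, trees):
    """Average the individual tree predictions into one win probability."""
    predictions = [tree(game) for tree in trees]
    return sum(predictions) / len(predictions)

game = {"rating_diff": 7.0, "travel_miles_diff": -200, "efg_diff": 0.02}
trees = [tree_ratings, tree_travel, tree_shooting]
print(f"{ensemble_win_probability(game, trees):.2f}")
```

Because every tree contributes equally, one tree producing an extreme number can only move the final figure by a fraction of its own error.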

Outputs: win odds, point spread cover odds, over/under odds

Strengths: This is our most complex model, incorporating the largest amount of information, so if there’s an obscure nugget of knowledge hidden deep within our database, this is the most likely place for it to show up. It has also generally proven to be our most accurate model so far, although prediction performance varies by sport.

Weaknesses: Since it is partly based on historical trends, changes in the way the game is played or in other associated data can lead the model down a new, unexplored path, where that trend no longer applies. The complexity of the model also makes it next to impossible to explain to an average fan. We know what the output is, but we never know exactly why the model gave us a specific number as an answer. We’re just shoveling data into a computer and trusting it, which is what most advanced quantitative prediction systems do, but it still makes some people nervous.

Vegas Implied Model

The Vegas Implied Model assumes that the betting lines offered by Las Vegas sports books are efficient, meaning that they are the market’s best prediction for the outcome of a contest.

Vegas offers up to three basic types of bets, which each describe a different aspect of a game. The spread tells us what the expected margin of victory is, the over/under tells us how many total points are expected to be scored, and the money line gives us implied win odds for each team.

When a money line is available, that’s what is used for this model. While the win odds implied by the money line have some “juice” added in, we can reduce the odds proportionally for both teams to come up with more realistic win probabilities.
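Here is a short Python sketch of that proportional adjustment, using the standard conversion from American-style money lines to implied probabilities. The function names and the example line of -150 / +130 are made up for illustration.

```python
def implied_probability(moneyline):
    """Convert an American money line to its raw implied win probability (juice included)."""
    if moneyline < 0:
        return -moneyline / (-moneyline + 100)
    return 100 / (moneyline + 100)

def remove_juice(moneyline_a, moneyline_b):
    """Scale both implied probabilities down proportionally so they sum to 1."""
    raw_a = implied_probability(moneyline_a)
    raw_b = implied_probability(moneyline_b)
    total = raw_a + raw_b  # exceeds 1 because of the book's cut
    return raw_a / total, raw_b / total

# Hypothetical line: favorite at -150, underdog at +130.
fav, dog = remove_juice(-150, 130)
print(f"favorite {fav:.1%}, underdog {dog:.1%}")
```

In this example the raw implied odds are 60.0% and about 43.5%, which add up to more than 100%. Dividing each by their sum brings the favorite to roughly 58.0% and the underdog to roughly 42.0%.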

In cases where no money line is available, we convert the point spread into expected win odds based on the historical winning percentages associated with particular lines. And when there is no spread available, we use our predictive power ratings to estimate what the spread would be, then use those same historical win odds.
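The order of preference described here (money line first, then spread, then a ratings-based estimate) can be sketched in Python as a simple fallback chain. Everything in the sketch is an assumption made for illustration: the function names, the nearest-spread lookup, the use of the raw ratings gap as the estimated spread, and especially the lookup table, whose values are placeholders rather than real historical win rates.

```python
def spread_to_win_odds(spread, table):
    """Return the favorite's win odds for the closest spread found in the table."""
    nearest = min(table, key=lambda s: abs(s - spread))
    return table[nearest]

def vegas_implied_favorite_odds(table, moneyline_odds=None, spread=None, ratings_gap=None):
    """Prefer the money line, then the spread, then a ratings-based estimate."""
    if moneyline_odds is not None:
        return moneyline_odds  # assumed to be a win probability with juice already removed
    if spread is None:
        if ratings_gap is None:
            raise ValueError("need a money line, a spread, or a ratings gap")
        spread = -abs(ratings_gap)  # assumed: estimated spread equals the ratings gap
    return spread_to_win_odds(spread, table)

# Placeholder table: favorite's spread -> win rate (NOT real historical figures).
toy_table = {-1.0: 0.52, -3.0: 0.60, -7.5: 0.77}

print(vegas_implied_favorite_odds(toy_table, moneyline_odds=0.58))
print(vegas_implied_favorite_odds(toy_table, spread=-3.0))
print(vegas_implied_favorite_odds(toy_table, ratings_gap=7.3))
```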

Outputs: win odds, projected final score

Strengths: Because the betting market reacts dynamically to new information, the Vegas Implied Model should adjust nearly immediately to the market’s anticipated effects of injuries, suspensions, and other contextual data that is, for all practical purposes, invisible to our other models.

Weaknesses: The biggest drawback to this method is that it relies on the assumption that the betting markets are efficient. In reality, some lines are probably affected by things unrelated to the outcome of the game: the public’s fetish for certain teams and their tendency to support favorites, or the activities of influential high-dollar bettors who may try to use their betting patterns to manipulate the market.

Seed Difference Model

The Seed Difference Model, which only applies to the NCAA basketball tournament, is simply a window into the past, created to satisfy the curiosity of ourselves and of our readers. We look back at all tournament games since 1998 to find what happened when teams with a matching seed difference played each other.

For example, when an 8-seed plays a 9-seed in round one, we find the winning percentages of the higher and lower seed in all previous 8-vs-9 games, as well as 2-vs-3, 1-vs-2, etc. We expand the search in this manner in order to increase the sample size of each possible combo. Otherwise there would be very little data for quite a few of the specific seed-vs-seed matchups.
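The pooling can be sketched in a few lines of Python: group every past game by the gap between the two seeds, then count how often the better seed won. The input format and the handful of results below are invented for illustration and are not real tournament data.

```python
from collections import defaultdict

def seed_difference_table(games):
    """Map each seed difference to the better seed's historical win percentage.

    games: iterable of (winner_seed, loser_seed) pairs.
    """
    tally = defaultdict(lambda: [0, 0])  # difference -> [better-seed wins, games]
    for winner_seed, loser_seed in games:
        if winner_seed == loser_seed:
            continue  # equal seeds: there is no "better" seed to credit
        diff = abs(winner_seed - loser_seed)
        tally[diff][0] += int(winner_seed < loser_seed)  # lower number = better seed
        tally[diff][1] += 1
    return {diff: wins / n for diff, (wins, n) in tally.items()}

# Made-up results: 8-vs-9, 2-vs-3 and 1-vs-2 games all land in the "difference of 1" bucket.
games = [(8, 9), (9, 8), (2, 3), (1, 2), (3, 2), (1, 4), (1, 4), (7, 10), (10, 7)]
table = seed_difference_table(games)
print(table)
```

Pooling this way is what lets an 8-vs-9 prediction draw on 2-vs-3 and 1-vs-2 history as well, at the cost of treating those matchups as interchangeable.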

Outputs: win odds

Strengths: Much of successful NCAA tournament pool strategy involves identifying inefficiencies in the likely picks of your opponents. They will often default to picking the higher seed, and this model can help you discover where they may be placing too much emphasis on a seed difference that shows little historical predictive power.

Weaknesses: While expanding our search to include any equivalent seed differences allows us to increase the sample size, it also leads to perhaps less relevant results being included. Though the seed difference for both is 3, the difference in quality between a 1-seed and a 4-seed should be larger than the difference between a 7-seed and a 10-seed. In addition, the NCAA selection committee’s tendencies may change over time, so the significance of seed differences could also change.