Madness Strikes November: Introducing Our Brand New NCAA Bracket Predictions

November 8, 2012 - by David Hess

A few days ago when we released our 2012 preseason college basketball top 25, we hinted that we had an exciting new feature that we were just itching to let loose into the wild. Well, here’s what we’re so amped about:

We are now simulating the entire college basketball season every single day, all the way from November through April. Our projections now include:

Conference tournaments
NCAA selection and seeding
The NCAA tournament itself

This means that every single day, you can count on TeamRankings to deliver intelligent, up-to-date, algorithmically derived odds for thousands of future college basketball outcomes, from Kentucky’s chance to land a 1 seed in the NCAA tournament to Grambling State’s odds to make the 2013 March Madness bracket.

There are dozens of postseason predictions and probability distributions for each and every one of the 347 Division I men’s basketball teams. A lot of these predictions, of course, relate to March Madness.

We’ve been working on this bad boy for months, and we’ve still got a few kinks to iron out. But there’s a whole lot of analytical firepower under the covers and we’re very excited about where we can go with this project.

Why Are These Bracket Predictions Different / Awesome / Better?

Most bracketologists operate under the following mantra: “If the season ended today, here’s how I think the NCAA bracket would look.” Maybe a few of them try to work in some rough projections of how the rest of the season could play out for a few key teams, but that’s rare. This whole approach is downright silly, especially during the first, oh, 90% of the season. Differences in remaining schedule strength, probable conference tournament seeds, and likely conference tournament opponents can make a HUGE difference in a team’s NCAA tournament selection and seeding fortunes. The season doesn’t end today. We simulate out everything that could still happen; other people don’t.

We then go one big step further and actually project each team’s numerical odds to make the NCAA tournament, and then to survive every successive round, with detailed probability distributions behind every projection. They tell you, “Kansas should make the tourney, and I have them as a 1 seed.” We now tell you, “Kansas has a 90% chance of making the 2013 NCAA tournament. While their most likely seed is a 1, the odds of that happening are actually only 22%, meaning that they are more likely to NOT get a 1 seed than to get a 1 seed. Overall, Kansas currently has a 4% chance of being 2013 NCAA champions, but those odds would increase to 11% if they are able to secure a 1 seed.” We actually tell you a lot more than that, but I’ll cut it off there for now.
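To make the Kansas example concrete, here is a minimal sketch of how odds like these can be read off a pile of simulated seasons. The `summarize` function and the toy counts are made up for illustration (chosen only to land near the percentages quoted above); they are not our production code or real simulation output.

```python
from collections import Counter


def summarize(sims):
    """Turn per-simulation results into odds.

    sims: list of (seed, won_title) pairs, one per simulated season.
    seed is None when the team missed the tournament.
    """
    n = len(sims)
    seed_counts = Counter(seed for seed, _ in sims if seed is not None)
    titles_by_seed = Counter(seed for seed, won in sims if won)
    return {
        "make_tournament": sum(seed_counts.values()) / n,
        "seed_odds": {s: c / n for s, c in seed_counts.items()},
        "win_title": sum(titles_by_seed.values()) / n,
        # Conditional odds: titles won as a given seed / times that seed occurred.
        "win_title_given_seed": {
            s: titles_by_seed[s] / c for s, c in seed_counts.items()
        },
    }


# Toy data (hypothetical): 1,000 simulated seasons for one team.
toy_sims = (
    [(None, False)] * 100   # missed the tournament
    + [(1, True)] * 24      # 1 seed, won it all
    + [(1, False)] * 196    # 1 seed, lost somewhere
    + [(2, True)] * 16      # any other seed (lumped together as "2"), won it all
    + [(2, False)] * 664
)
toy_summary = summarize(toy_sims)
```

With these toy counts, the team makes the tournament 90% of the time, gets a 1 seed 22% of the time, and wins the title 4% of the time overall, but about 11% of the time in the seasons where it lands a 1 seed.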

We update every single projection mentioned above every day. Not every week. Not every few days. Every morning, based on the results of the previous day’s games, which factor into our power ratings. All the calculations are automated.

Interesting Stuff To Learn Here

The level of analysis we are doing will hopefully facilitate a much better popular understanding of the dynamics of things like season outcomes and NCAA tournament seeding. For example, most fans would probably assume that the probability distribution of a team’s expected NCAA tournament seed would look like a smooth bell curve, but that’s rarely the case.

Let’s say you think Gonzaga is most likely going to end up getting a 3 seed. From that point, most humans would probably reason that their second most likely seeding would be either a 2 or a 4, then after that maybe a 5 or a 6, or as low as a 7 if they finished out the season poorly.

Yet the model’s simulations don’t end up with a single smooth peak. It sees Gonzaga with a good chance of getting a 2 through 4 seed, but the next most likely outcome is a 7 or 8 seed.

We haven’t analyzed every projected outcome in depth yet, but such an effect may well reflect the difference between the Bulldogs winning their last couple games (and so also the WCC tourney) and faltering late. The team quality in the two cases is similar, but treatment by the committee may not be.

As the season plays out, we should be able to glean a lot of interesting insights from all this data.

A Quick 2013 NCAA Bracket Teaser

As of November 8, 2012, one day before the season starts, here’s our official projection of the 2013 NCAA Tournament bracket come March:

[Image: projected 2013 NCAA tournament bracket as of November 8, 2012]

[Quick update/note: We’re not worried about following the NCAA’s bracketing rules here. We know that, for example, Georgetown can’t play Cincinnati in the first round. Our goal here is to show expected seed lines for each team, and give an idea of the rough quality of opponent they might face in each round. Trying to predict actual bracket matchups at this point is, well, pointless.]

Where Can You Find The New Bracketology Projections?

Right now, we have two types of pages displaying this new info.

NCAA College Basketball Bracketology Summary Page

Currently accessible via the “Bracketology” link in the left green sidebar of our college basketball section, this page shows the following for every college basketball team:

Odds to make the NCAA tournament (including whether via automatic/at-large bid)
Average seed projection (for top teams, this will look worse than their most likely seed, because averaging a 1 with any other seed numbers gives a result greater than 1)
Odds of receiving a #1, #2, #3, or #4 seed
Odds to win the NCAA tournament
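
The average seed quirk noted in the list above is easy to see with a little arithmetic. This is a minimal sketch; the `seed_summary` function and the seed odds are hypothetical, picked only to show why a team whose most likely seed is a 1 still shows an average seed worse than 1.

```python
def seed_summary(seed_odds):
    """Return (average seed, most likely seed) from a {seed: probability} dict.

    Probabilities are conditional on making the tournament, so they are
    renormalized here in case they do not sum to exactly 1.
    """
    total = sum(seed_odds.values())
    average = sum(seed * p for seed, p in seed_odds.items()) / total
    most_likely = max(seed_odds, key=seed_odds.get)
    return average, most_likely


# Hypothetical odds: a 1 seed half the time, a 2 or 3 seed otherwise.
example_odds = {1: 0.5, 2: 0.3, 3: 0.2}
average_seed, most_likely_seed = seed_summary(example_odds)
```

Here the most likely seed is a 1, but the average works out to 1.7, because every simulation that is not a 1 seed can only pull the average in one direction.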

Team Bracketology Pages

Found via the “Bracketology” link in the left gray pullout menu of any college basketball team page (e.g. Kansas, Syracuse, San Jose State), these pages are also linked from the master bracketology summary page. They let you drill down to more detailed info about a team’s bracketology projection, including:

Projected NCAA seed distribution
Odds to make the NCAA tournament based on final record (counting currently scheduled games only, so some early season tournaments won’t be accounted for yet)
Odds to advance to each round of the NCAA tournament

These pages are in a very rough “version 1” form right now, but we wanted to get them out and see what people thought. We intend to keep making them better over the course of the season, as we’ve got a lot more fun data on teams to show.

Projected 2013 NCAA Bracket, Updated Daily (Coming Soon)

You may notice there is one major thing missing from the new pages above — a single projected 2013 NCAA tournament bracket. Sure, all these projected odds are great, but you wanna see the most likely end result, right?

We’re working on it. We’re currently saving a new projected bracket to our database every day, and the next step is to get that info up on the site for all to see. We expect to have that ready for public consumption next week. To tide you over until then, we included our official preseason bracket above.

We acknowledge that there are a couple of head scratchers in there right now, St. Louis as a #2 seed in particular. As with any modeling project of similarly massive scale, there are still a couple of kinks we need to iron out in our logic, and that’s part of why the automated bracket page isn’t live on the site yet. However, as the season progresses and teams actually play a few games, outliers like St. Louis should become much rarer.

How Do We Create Bracket Predictions?

We do the following 5,000 times, then report the results.

1. Simulate The Regular Season

Based on our team power ratings, we predict the outcome of every remaining game in the 2012-2013 Division I college basketball season. Early in the season, the simulations are based heavily on our college basketball preseason ratings. Later in the year, those ratings will become less important, and actual team performance will take precedence.
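
For the curious, the general idea can be sketched in a few lines. This is an illustration only: the logistic `win_prob` curve, its `scale` parameter, the ratings, and the schedule are all assumptions made up for the example, not our actual ratings or formula.

```python
import math
import random


def win_prob(rating_a, rating_b, scale=10.0):
    """Chance team A beats team B, using a logistic curve (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def simulate_season(ratings, schedule, rng):
    """Play every remaining game once and return each team's win total."""
    wins = {team: 0 for team in ratings}
    for a, b in schedule:
        winner = a if rng.random() < win_prob(ratings[a], ratings[b]) else b
        wins[winner] += 1
    return wins


def average_wins(ratings, schedule, n_sims=5000, seed=0):
    """Repeat the season many times and average the win totals."""
    rng = random.Random(seed)
    totals = {team: 0 for team in ratings}
    for _ in range(n_sims):
        for team, w in simulate_season(ratings, schedule, rng).items():
            totals[team] += w
    return {team: total / n_sims for team, total in totals.items()}


# Hypothetical three-team league playing a double round robin.
toy_ratings = {"A": 10.0, "B": 0.0, "C": -10.0}
toy_schedule = [("A", "B"), ("A", "C"), ("B", "C"),
                ("B", "A"), ("C", "A"), ("C", "B")]
```

Calling `average_wins(toy_ratings, toy_schedule)` gives projected win totals for each team. Keeping the full list of simulated seasons, rather than only the averages, is what makes the later steps possible.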

2. Seed & Play Out Conference Tournaments

Based on end-of-season win/loss records and conference standings that result from each season simulation, we create conference tournament seedings and brackets. As with our season simulation, we then semi-randomly pick winners for every conference tournament game based on team ratings, round by round, until we have all our conference tournament winners.

Accounting for the various formats of conference tournaments is actually a huge pain in the butt, and it took a long time to track down all the appropriate data and get it right. There are still a few minor tweaks we need to make to handle some freak tournaments that re-seed teams after the first round, but it’s very close now.
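
Here is a stripped-down sketch of the seed-then-play-out idea. It assumes the simplest possible format: a field whose size is a power of two, standard 1-vs-last pairings, no byes, and no re-seeding. Real conference formats are messier than that, and the function names, tiebreaker, and logistic `win_prob` curve are illustrative assumptions.

```python
import math
import random


def win_prob(rating_a, rating_b, scale=10.0):
    """Chance team A beats team B, using a logistic curve (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def seed_teams(wins, ratings):
    """Order teams by simulated wins; rating is a stand-in for real tiebreakers."""
    return sorted(wins, key=lambda t: (-wins[t], -ratings[t]))


def bracket_order(n):
    """Seed numbers in bracket order for n teams (n must be a power of two)."""
    order = [1]
    while len(order) < n:
        size = len(order) * 2
        order = [s for seed in order for s in (seed, size + 1 - seed)]
    return order


def play_tournament(seeded, ratings, rng):
    """Pick winners round by round until one team is left."""
    field = [seeded[s - 1] for s in bracket_order(len(seeded))]
    while len(field) > 1:
        next_round = []
        for i in range(0, len(field), 2):
            a, b = field[i], field[i + 1]
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        field = next_round
    return field[0]


# Hypothetical four-team conference after one simulated regular season.
toy_ratings = {"A": 12.0, "B": 6.0, "C": 0.0, "D": -6.0}
toy_wins = {"A": 14, "B": 11, "C": 11, "D": 4}
toy_seeds = seed_teams(toy_wins, toy_ratings)
toy_champion = play_tournament(toy_seeds, toy_ratings, random.Random(0))
```

For an eight-team field, `bracket_order(8)` returns `[1, 8, 4, 5, 2, 7, 3, 6]`, so the 1 and 2 seeds can only meet in the final.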

The results of all these simulations are shown on our college basketball projected standings page, as well as on various team pages that are linked from the projected standings page.

3. Simulate NCAA Tournament Selection & Seeding

This is the new glitzy part that we’re super excited about.

We spent some time this summer developing a computer model to predict the decisions of the NCAA selection committee. [Technically it’s 2 models — one for selection and one for seeding — but they’re pretty similar.] We looked at how data points like RPI, record vs. the RPI Top 25, conference win percentage, record in last 10 games, schedule strength, and predictive power ratings could be combined to mimic the past selection and seeding results.

Our model certainly isn’t perfect, but we tested it on data from Selection Sunday 2012, and it would have placed in the top half of the Bracket Project. Of course, one season could be a stroke of luck, so we’re curious to see how we do this year.

But the main point is that these are fully automated algorithms, and even at their biggest disadvantage — when there are no more games to play in a season, and no “projecting the rest of the season” edge over humans — our automated logic still did better than most self-professed “bracketologists” last year at predicting team selection and seeding.

During every simulation, we keep track of a team’s selection resume, including all those nitty gritty details like projected RPI and record vs. the final projected RPI top 25, and we feed those resumes into the model. The model spits out projected selection and seeding odds, and then we semi-randomly seed the tournament using those odds. (We don’t just rank the teams in order of our projected odds; we add some randomness because we know that our model isn’t perfect, and that the committee can at times make some quirky decisions.)
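
The “semi-random” part can be pictured with a sketch like the one below. To be clear about what is assumed: the single `scores` number per team stands in for whatever the selection and seeding models output, the Gaussian noise and its `noise_sd` size are made up, and the tiny field ignores real-world details like the 68-team format. It shows the general technique of ranking on noisy scores, not our actual logic.

```python
import random
from collections import Counter


def select_and_seed(scores, auto_bids, n_at_large, rng,
                    noise_sd=0.5, teams_per_line=4):
    """Pick a field and assign seed lines from noisy resume scores.

    scores must include every team, automatic qualifiers too.
    Returns {team: seed line} for the teams that made the field.
    """
    noisy = {team: s + rng.gauss(0.0, noise_sd) for team, s in scores.items()}
    # At-large spots go to the best noisy scores among non-automatic teams.
    pool = sorted((t for t in noisy if t not in auto_bids),
                  key=noisy.get, reverse=True)
    field = sorted(list(auto_bids) + pool[:n_at_large],
                   key=noisy.get, reverse=True)
    return {team: i // teams_per_line + 1 for i, team in enumerate(field)}


def seed_distribution(team, scores, auto_bids, n_at_large,
                      n_sims=5000, seed=0, **kwargs):
    """Share of draws in which `team` lands on each seed line (None = missed)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_sims):
        seeds = select_and_seed(scores, auto_bids, n_at_large, rng, **kwargs)
        counts[seeds.get(team)] += 1
    return {line: c / n_sims for line, c in counts.items()}


# Hypothetical eight-team universe; "H" won its conference tournament.
toy_scores = {"A": 8.0, "B": 7.0, "C": 6.0, "D": 5.0,
              "E": 4.0, "F": 3.0, "G": 2.0, "H": 1.0}
toy_auto_bids = {"H"}
```

With `noise_sd=0` the result is a plain ranking. Turning the noise up lets a bubble team like “D” sneak into the field in some draws and miss in others, which is where selection odds between 0% and 100% come from.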

Keep in mind that early in the season, a team’s selection resume is mostly based on the results of our simulation. So if we have a team rated too high or too low, their selection and seeding related odds will also be off. As the season progresses, though, more and more of a team’s simulated NCAA tournament resume will consist of things that have already happened, and less of it will depend on our projections, so our selection and seeding projections should become more accurate.

4. Calculate NCAA Tournament Advancement Odds

Finally, the payoff. After simulating the season and the selection committee, playing out the NCAA tournament is actually pretty simple in comparison. We start with the projected bracket and team ratings, apply a little math, and poof…we end up with odds to advance to every specific round, for each team.
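
One common way to do that “little math” for a fixed bracket is sketched below: a team’s chance of winning a round is its chance of reaching that round, multiplied by its chance of beating each possible opponent weighted by how likely that opponent is to be there. The logistic `win_prob` curve and the four-team bracket are assumptions for illustration, and this is not necessarily the exact calculation we run.

```python
import math


def win_prob(rating_a, rating_b, scale=10.0):
    """Chance team A beats team B, using a logistic curve (an assumption)."""
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def advancement_odds(bracket, ratings):
    """Odds for each team to win each round of a single-elimination bracket.

    bracket: teams listed in bracket order; length must be a power of two.
    Returns one {team: probability} dict per round, first round first.
    """
    reach = {team: 1.0 for team in bracket}  # everyone starts in round one
    rounds = []
    block = 1  # size of the sub-bracket a team has already won
    while block < len(bracket):
        win_round = {}
        for i, team in enumerate(bracket):
            # Possible opponents this round sit in the neighboring block.
            sibling = (i // block) ^ 1
            opponents = bracket[sibling * block:(sibling + 1) * block]
            p_beat_field = sum(
                reach[opp] * win_prob(ratings[team], ratings[opp])
                for opp in opponents
            )
            win_round[team] = reach[team] * p_beat_field
        rounds.append(win_round)
        reach = win_round
        block *= 2
    return rounds


# Hypothetical four-team bracket: A plays D, B plays C, winners meet.
toy_bracket = ["A", "D", "B", "C"]
toy_ratings = {"A": 12.0, "B": 6.0, "C": 0.0, "D": -6.0}
toy_rounds = advancement_odds(toy_bracket, toy_ratings)
```

The last entry of `toy_rounds` holds each team’s title odds, which add up to 1 across the bracket. Averaging these results over every simulated bracket gives the overall round-by-round odds.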

Right now, the best place to view those odds is on each individual team’s bracketology page. For example, check out Michigan State. The chart at the bottom shows the Spartans’ odds to advance to every round of the tournament, and the table at bottom right shows how their odds to win the entire tournament change based on what seed they get.

Coming Soon

This system is new, and like any new modeling project of this scale, it’s still a work in progress. We’re planning on adding more info to the site as the season goes along. At the top of our list right now:

An automated daily official bracket projection, in a pretty bracket format
Full NCAA seed odds tables
NCAA round-by-round advancement table

If you’ve got any suggestions, requests, or questions then please leave a comment in the discussion thread below. We’re definitely open to any bright ideas you might have for interesting data that we could pull out of these projections and display.

In the meantime, we’re going to keep pushing the envelope with college basketball and March Madness related predictive modeling, and work to make these initial pages and data presentations even better. Hopefully today’s announcement is just the beginning of an exciting new chapter in what’s come to be known as bracketology. Like Nate Silver did with election predictions, our goal here is to use data to trump the human pundits.