November 8, 2012 - by David Hess
A few days ago when we released our 2012 preseason college basketball top 25, we hinted that we had an exciting new feature that we were just itching to let loose into the wild. Well, here’s what we’re so amped about:
We are now simulating the entire college basketball season every single day, all the way from November through April. Our projections now include:
This means that every single day, you can count on TeamRankings to deliver intelligent, up-to-date, algorithmically derived odds for thousands of future college basketball outcomes, from Kentucky’s chance to land a 1 seed in the NCAA tournament to Grambling State’s odds to make the 2013 March Madness bracket.
There are dozens of postseason predictions and probability distributions for each and every one of the 347 Division I men’s basketball teams. A lot of these predictions, of course, relate to March Madness.
We’ve been working on this bad boy for months, and we’ve still got a few kinks to iron out. But there’s a whole lot of analytical firepower under the covers and we’re very excited about where we can go with this project.
The level of analysis we are doing will hopefully facilitate a much better popular understanding of the dynamics of things like season outcomes and NCAA tournament seeding. For example, most fans would probably assume that the probability distribution of a team’s expected NCAA tournament seed would look like a smooth bell curve, but that’s rarely the case.
Let’s say you think Gonzaga is most likely going to end up getting a 3 seed. From that point, most humans would probably reason that their second most likely seeding would be either a 2 or a 4, then after that maybe a 5 or a 6, or as low as a 7 if they finished out the season poorly.
Yet the model’s simulations don’t end up with a single smooth peak. It sees Gonzaga with a good chance of getting a 2 through 4 seed, but the next most likely outcome is a 7 or 8 seed.
We haven’t analyzed every projected outcome in depth yet, but such an effect may well reflect the difference between the Bulldogs winning their last couple games (and so also the WCC tourney) and faltering late. The team quality in the two cases is similar, but treatment by the committee may not be.
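To show how two scenarios like that can produce a lumpy seed distribution, here’s a toy calculation in Python. It is not our model. The 60% chance of winning out and both conditional seed distributions are made-up numbers for illustration. The overall distribution is simply a probability-weighted mix of the two scenarios (the law of total probability), and its local peaks land on a 3 seed and a 7 seed rather than along one smooth curve.

```python
# Toy example: every number here is hypothetical, not TeamRankings model output.
P_WIN_OUT = 0.6  # assumed chance the team wins late and takes its conference tourney

# Assumed seed distributions conditional on each scenario.
SEEDS_IF_WIN_OUT = {2: 0.30, 3: 0.40, 4: 0.25, 5: 0.05}
SEEDS_IF_FALTER = {6: 0.15, 7: 0.45, 8: 0.40}


def mix_distributions(p_a, dist_a, dist_b):
    """P(seed) = P(A) * P(seed | A) + P(not A) * P(seed | not A)."""
    seeds = set(dist_a) | set(dist_b)
    return {s: p_a * dist_a.get(s, 0.0) + (1 - p_a) * dist_b.get(s, 0.0)
            for s in sorted(seeds)}


def local_peaks(dist):
    """Seeds whose probability beats both neighboring seed lines."""
    return [s for s, p in dist.items()
            if p > dist.get(s - 1, 0.0) and p > dist.get(s + 1, 0.0)]


seed_dist = mix_distributions(P_WIN_OUT, SEEDS_IF_WIN_OUT, SEEDS_IF_FALTER)
for seed, prob in seed_dist.items():
    print(f"{seed} seed: {prob:.1%}")
print("peaks:", local_peaks(seed_dist))
```

A real simulation mixes far more than two scenarios, but the same effect is what can leave a dip in the middle of a team’s seed histogram.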
As the season plays out, we should be able to glean a lot of interesting insights from all this data.
As of November 8, 2012, one day before the season starts, here’s our official projection of the 2013 NCAA Tournament bracket come March:
[Quick update/note: We’re not worried about following the NCAA’s bracketing rules here. We know that, for example, Georgetown can’t play Cincinnati in the first round. Our goal here is to show expected seed lines for each team, and give an idea of the rough quality of opponent they might face in each round. Trying to predict actual bracket matchups at this point is, well, pointless.]
More on this bracket in a bit.
Right now, we have two types of pages displaying this new info.
The first type is the master bracketology summary page. Currently accessible via the “Bracketology” link in the left green sidebar of our college basketball section, it shows the following for every college basketball team:
The second type is the individual team bracketology page, found via the “Bracketology” link in the left gray pullout menu of any college basketball team page (e.g. Kansas, Syracuse, San Jose State). Every team bracketology page is also linked from the master bracketology summary page. These pages let you drill down to more detailed info about a team’s bracketology projection, including:
These pages are in a very rough “version 1” form right now, but we wanted to get them out and see what people thought. We intend to keep making them better over the course of the season, as we’ve got a lot more fun data on teams to show.
You may notice there is one major thing missing from the new pages above — a single projected 2013 NCAA tournament bracket. Sure, all these projected odds are great, but you wanna see the most likely end result, right?
We’re working on it. We’re currently saving a new projected bracket to our database every day, and the next step is to get that info up on the site for all to see. We expect to have that ready for public consumption next week. To tide you over until then, we’ve included our official preseason bracket above.
We acknowledge that there are a couple of head scratchers in there right now, St. Louis as a #2 seed in particular. As with any modeling project of similarly massive scale, there are still a couple of kinks we need to iron out in our logic, and that’s part of why the automated bracket page isn’t live on the site yet. However, as the season progresses and teams actually play a few games, outliers like our current St. Louis projection should become much rarer.
We do the following 5,000 times, then report the results.
Based on our team power ratings, we predict the outcome of every remaining game in the 2012-2013 Division I college basketball season. Early in the season, the simulations are based heavily on our college basketball preseason ratings. Later in the year, those ratings will become less important, and actual team performance will take precedence.
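Here’s a minimal Python sketch of this step under some loudly stated assumptions. The functions `blended_rating`, `win_probability`, and `simulate_season` and all of the constants are hypothetical, not our production code. The sketch assumes a game’s final margin is normally distributed around the rating gap plus a home-court edge, and that the preseason rating’s weight fades as games are played. The outer loop replays the rest of the season 5,000 times and averages each team’s final win total.

```python
import math
import random
from collections import defaultdict

# Hypothetical constants for illustration; not TeamRankings' actual parameters.
GAME_SD = 11.0      # assumed std. dev. of a game's final margin, in points
HOME_EDGE = 3.5     # assumed home-court advantage, in points
PRIOR_GAMES = 10.0  # assumed weight of the preseason rating, measured in "games"


def blended_rating(preseason, performance, games_played):
    """Shift weight from the preseason rating to observed performance as games pile up."""
    w = games_played / (games_played + PRIOR_GAMES)
    return (1 - w) * preseason + w * performance


def win_probability(rating_a, rating_b, a_at_home=False):
    """P(A beats B), treating the margin as Normal(rating gap + home edge, GAME_SD)."""
    margin = rating_a - rating_b + (HOME_EDGE if a_at_home else 0.0)
    return 0.5 * (1.0 + math.erf(margin / (GAME_SD * math.sqrt(2.0))))


def simulate_season(current_wins, remaining_games, ratings, n_sims=5000, seed=None):
    """Play every remaining (home, away) game n_sims times; return average final wins."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_sims):
        wins = dict(current_wins)  # games already played count as-is
        for home, away in remaining_games:
            p_home = win_probability(ratings[home], ratings[away], a_at_home=True)
            winner = home if rng.random() < p_home else away
            wins[winner] = wins.get(winner, 0) + 1
        for team, w in wins.items():
            totals[team] += w
    return {team: total / n_sims for team, total in totals.items()}


if __name__ == "__main__":
    # Hypothetical mini-league: current wins plus three games left.
    ratings = {"A": 12.0, "B": 8.0, "C": 3.0}
    remaining = [("A", "B"), ("B", "C"), ("C", "A")]
    print(simulate_season({"A": 5, "B": 4, "C": 2}, remaining, ratings, seed=1))
```

Averaging wins is just one summary; the same loop could just as easily tally conference titles or full win-total distributions.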
Based on end-of-season win/loss records and conference standings that result from each season simulation, we create conference tournament seedings and brackets. As with our season simulation, we then semi-randomly pick winners for every conference tournament game based on team ratings, round by round, until we have all our conference tournament winners.
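The conference tournament step can be sketched like this, again in Python and again as an illustration rather than our actual code. It seeds teams by simulated conference wins and breaks ties by rating, which is a hypothetical simplification (real conferences use head-to-head results and other tiebreakers). It then lays the teams into a standard single-elimination bracket where the top seeds get byes if the field isn’t a power of two, and plays it out round by round. The `win_prob` argument is any function that turns two ratings into a win probability. It does not handle tournaments that re-seed after a round.

```python
import math
import random


def conference_seeds(conf_wins, ratings):
    """Seed by conference wins; ties broken by rating (hypothetical tiebreaker)."""
    return sorted(conf_wins, key=lambda t: (-conf_wins[t], -ratings[t]))


def bracket_order(size):
    """Seed numbers in bracket order, e.g. size 8 -> [1, 8, 4, 5, 2, 7, 3, 6]."""
    order = [1]
    while len(order) < size:
        n = len(order) * 2
        order = [x for s in order for x in (s, n + 1 - s)]
    return order


def simulate_conference_tournament(seeded_teams, ratings, win_prob, rng):
    """Play a fixed single-elimination bracket; seed slots beyond the field are byes."""
    n = len(seeded_teams)
    size = 1
    while size < n:
        size *= 2
    slots = [seeded_teams[s - 1] if s <= n else None for s in bracket_order(size)]
    while len(slots) > 1:
        next_round = []
        for a, b in zip(slots[::2], slots[1::2]):
            if b is None:  # a has a bye
                next_round.append(a)
            elif a is None:
                next_round.append(b)
            else:
                p_a = win_prob(ratings[a], ratings[b])
                next_round.append(a if rng.random() < p_a else b)
        slots = next_round
    return slots[0]


if __name__ == "__main__":
    # Hypothetical five-team conference.
    conf_wins = {"A": 14, "B": 14, "C": 10, "D": 8, "E": 5}
    ratings = {"A": 15.0, "B": 11.0, "C": 9.0, "D": 4.0, "E": 1.0}
    logistic = lambda ra, rb: 1.0 / (1.0 + math.exp(-(ra - rb) / 10.0))
    seeds = conference_seeds(conf_wins, ratings)
    print(simulate_conference_tournament(seeds, ratings, logistic, random.Random(3)))
```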
Accounting for the various formats of conference tournaments is actually a huge pain in the butt, and took a long time to track down all the appropriate data and get it right. There are still a few minor tweaks we need to make to handle some freak tournaments that re-seed teams after the first round, but it’s very close now.
The results of all these simulations are shown on our college basketball projected standings page, as well as on various team pages that are linked from the projected standings page.
This is the new glitzy part that we’re super excited about.
We spent some time this summer developing a computer model to predict the decisions of the NCAA selection committee. [Technically it’s 2 models — one for selection and one for seeding — but they’re pretty similar.] We looked at how data points like RPI, record vs. the RPI Top 25, conference win percentage, record in last 10 games, schedule strength, and predictive power ratings could be combined to mimic the past selection and seeding results.
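To give a feel for what “combining” those data points can mean, here’s a hypothetical Python sketch of a selection model written as a logistic regression. This isn’t a claim about the exact functional form or feature set behind our model, and every weight below is invented for illustration. A real version would fit the weights to past Selection Sunday outcomes, and a seeding model could reuse the same resume fields with a different target.

```python
import math
from dataclasses import dataclass


@dataclass
class Resume:
    rpi_rank: int        # projected RPI rank (1 = best)
    top25_wins: int      # wins vs. the RPI top 25
    conf_win_pct: float  # conference winning percentage, 0..1
    last10_wins: int     # wins in the last 10 games
    sos_rank: int        # strength-of-schedule rank (1 = toughest)
    power_rating: float  # predictive rating, points better than an average team


# Invented weights, for illustration only; not fitted to any real data.
INTERCEPT = 2.0
WEIGHTS = {
    "rpi_rank": -0.06,
    "top25_wins": 0.5,
    "conf_win_pct": 2.0,
    "last10_wins": 0.1,
    "sos_rank": -0.01,
    "power_rating": 0.1,
}


def selection_probability(resume: Resume) -> float:
    """Logistic model: P(selected) = 1 / (1 + exp(-z)), z = intercept + weighted features."""
    z = INTERCEPT + sum(w * getattr(resume, name) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    bubble = Resume(rpi_rank=45, top25_wins=1, conf_win_pct=0.55,
                    last10_wins=6, sos_rank=60, power_rating=8.0)
    print(f"{selection_probability(bubble):.1%}")
```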
Our model certainly isn’t perfect, but we tested it on data from Selection Sunday 2012, and it would have placed in the top half of the Bracket Project. Of course, one season could be a stroke of luck, so we’re curious to see how we do this year.
But the main point is that these are fully automated algorithms, and even at their biggest disadvantage — when there are no more games to play in a season, and no “projecting the rest of the season” edge over humans — our automated logic still did better than most self-professed “bracketologists” last year at predicting team selection and seeding.
During every simulation, we keep track of a team’s selection resume, including all those nitty gritty details like projected RPI and record vs. the final projected RPI top 25, and we feed those resumes into the model. The model spits out projected selection and seeding odds, and then we semi-randomly seed the tournament using those odds. (We don’t just rank the teams in order of our projected odds; we add some randomness because we know that our model isn’t perfect, and that the committee can at times make some quirky decisions.)
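One simple way to do that kind of semi-random seeding is a weighted draw without replacement (sometimes called a Plackett-Luce draw): repeatedly pick the next team with probability proportional to its score, so favorites usually land on top but the occasional committee surprise can still happen. This Python sketch shows that generic technique for illustration, not our exact code. The scores are assumed to be positive numbers derived from the seeding model, and the four-teams-per-line assignment ignores the First Four and the NCAA’s bracketing rules.

```python
import random


def semi_random_order(scores, rng):
    """Weighted draw without replacement: higher score -> more likely to be placed earlier."""
    remaining = dict(scores)
    order = []
    while remaining:
        teams = list(remaining)
        pick = rng.choices(teams, weights=[remaining[t] for t in teams], k=1)[0]
        order.append(pick)
        del remaining[pick]
    return order


def assign_seed_lines(order, teams_per_line=4):
    """Map a ranked list onto seed lines, 4 teams per line (ignores the First Four)."""
    return {team: i // teams_per_line + 1 for i, team in enumerate(order)}


if __name__ == "__main__":
    # Hypothetical model scores for eight teams.
    scores = {"A": 9.0, "B": 8.0, "C": 7.5, "D": 6.0,
              "E": 3.0, "F": 2.5, "G": 2.0, "H": 1.0}
    order = semi_random_order(scores, random.Random(2013))
    print(assign_seed_lines(order))
```

How much randomness this injects depends on how the scores are scaled: raising them to a higher power makes the draw stick closer to the model’s ranking.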
Keep in mind that early in the season, a team’s selection resume is mostly based on the results of our simulation. So if we have a team rated too high or too low, their selection and seeding odds will also be off. As the season progresses, though, more and more of a team’s simulated NCAA tournament resume will consist of things that have already happened, and less of it will depend on our projections. As a result, our selection and seeding projections should become more accurate.
Finally, the payout. After simulating the season and the selection committee, playing out the NCAA tournament is actually pretty simple in comparison. We start with the projected bracket and team ratings, apply a little math, and poof…we end up with odds to advance to every specific round, for each team.
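That “little math” can be done exactly rather than by sampling. Given a bracket and a head-to-head win probability for any pair, a team’s chance to win a round is its chance to have reached that round, times the sum over every possible opponent (everyone in the other half of its bracket block) of that opponent’s chance to get there multiplied by the chance of beating them. Here’s a Python sketch. The logistic rating-to-probability conversion and its `scale` parameter are assumptions for illustration.

```python
import math


def logistic_win_prob(ratings, scale=10.0):
    """Hypothetical conversion from a rating gap to a single-game win probability."""
    def p(a, b):
        return 1.0 / (1.0 + math.exp(-(ratings[a] - ratings[b]) / scale))
    return p


def advancement_odds(bracket, win_prob):
    """bracket: teams in bracket order (length a power of 2).

    Returns {team: [P(win round 1), P(win round 2), ..., P(win title)]}.
    """
    n = len(bracket)
    rounds = n.bit_length() - 1
    if n != 1 << rounds:
        raise ValueError("bracket size must be a power of 2")
    reach = [[1.0] * n]  # reach[r][i]: probability team i has won its first r games
    for r in range(rounds):
        prev, half = reach[-1], 1 << r
        cur = []
        for i in range(n):
            start = (i // (2 * half)) * (2 * half)
            # Possible opponents come from the half of i's block that i is not in.
            if i - start < half:
                opponents = range(start + half, start + 2 * half)
            else:
                opponents = range(start, start + half)
            beat = sum(prev[j] * win_prob(bracket[i], bracket[j]) for j in opponents)
            cur.append(prev[i] * beat)
        reach.append(cur)
    return {team: [reach[r][i] for r in range(1, rounds + 1)]
            for i, team in enumerate(bracket)}


if __name__ == "__main__":
    ratings = {"A": 20.0, "B": 5.0, "C": 12.0, "D": 8.0}
    odds = advancement_odds(["A", "B", "C", "D"], logistic_win_prob(ratings))
    for team, probs in odds.items():
        print(team, [f"{p:.1%}" for p in probs])
```

Because each team’s path only depends on who can come out of the other half of its block, this takes a fraction of a second even for a 64-team field, and it runs once per simulated bracket.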
Right now, the best place to view those odds is on each individual team’s bracketology page. For example, check out Michigan State. The chart at the bottom shows the Spartans’ odds to advance to every round of the tournament, and the table at bottom right shows how their odds to win the entire tournament change based on what seed they get.
This system is new, and like any new modeling project of this scale, it’s still a work in progress. We’re planning on adding more info to the site as the season goes along. At the top of our list right now:
If you’ve got any suggestions, requests, or questions, please leave a comment in the discussion thread below. We’re definitely open to any bright ideas you might have for interesting data that we could pull out of these projections and display.
In the meantime, we’re going to keep pushing the envelope with college basketball and March Madness related predictive modeling, and work to make these initial pages and data presentations even better. Hopefully today’s announcement is just the beginning of an exciting new chapter in what’s come to be known as bracketology. Like Nate Silver did with election predictions, our goal here is to use data to trump the human pundits.
Printed from TeamRankings.com - © 2005-2018 Team Rankings, LLC. All Rights Reserved.