Basketball, Probability, And Quantum Mechanics | Stat Geek Idol

March 21, 2012 - by Kenneth Deakins

This is a Sweet 16 submission in our inaugural Stat Geek Idol contest. It was conceived of and written by Kenneth Deakins.

In my previous post, I talked about how my partners and I used a couple of machine learning algorithms to predict the NCAA Tournament.

One of the problems with the techniques we used, KNN and SVM, is that they are both deterministic. That is to say, both assume that any given input vector has a single correct label. So for any given match-up between two teams, both KNN and SVM assume that one team will always win, and that labeling that team as the winner is the correct label. The “confidences” I talked about in the previous post were not actually probabilities of team A or team B winning, but rather the degree to which KNN/SVM felt they had labeled an example correctly. When SVM or KNN picks the wrong team to win, the assumption is that it did so because it had insufficient data to correctly model and predict the result.

Interestingly enough, this is similar to an old assumption in science, that if one were to know the entire state of the Universe, one could predict the future with 100% accuracy.

For example, it was traditionally thought that when flipping a coin, if one knew the initial mass, velocity, etc., one could predict the result of the coin flip. Probability was just a way of simplifying the problem. Then along came quantum mechanics in the early 20th century, and it was discovered that reality was actually probabilistic. At the quantum level, one cannot predict the future, only the range of future possibilities. Quantum physics gives you a probability function, so that given a set of inputs, rather than saying where particle A will end up, it gives probabilities for A ending up at various locations.

Determinism Vs. A Probabilistic Basketball Universe

Now, what does this have to do with basketball?

Basketball tends to be talked about deterministically. The assumption is (for the most part), that the “better” team won, and a game is a process by which the “better” team is determined. After a game is over, various experts weigh in to explain why and how the game was won (or lost). Team A won because they out-rebounded the other team. Team B lost because they shot poorly. Or when a player makes a whole bunch of shots in a row, it is because they are “feeling it”, not because of normal statistical variance. Or, when a team is complaining about being seeded too low, a common response is that everyone has to play good teams eventually. This assumes that a good team, if they’re really good, will win against a bad team every time.

If one instead looks at a match-up between two teams as having two possible outcomes, neither of which is determined to happen (or, post-game, neither of which was determined to have happened), the way one looks at basketball changes. The NCAA tournament, rather than being a way of finding out who the best team is, is a way of finding out who the best team is with a certain level of confidence. What this confidence level is, I don’t know (nor do I really have a method of finding out).
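The real confidence level is unknown, but a toy simulation can at least show the idea. The sketch below is not from our project: it assumes made-up team ratings and an Elo-style logistic formula for win probability (both hypothetical), plays a 16-team single-elimination bracket many times, and counts how often the top-rated team ends up as champion.

```python
import random

def simulate_champion(strengths, rng):
    """Play one single-elimination bracket; return the champion's index."""
    alive = list(range(len(strengths)))
    while len(alive) > 1:
        survivors = []
        for i in range(0, len(alive), 2):
            x, y = alive[i], alive[i + 1]
            # Assumed Elo-style model: P(x beats y) grows with the rating gap.
            p_x = 1.0 / (1.0 + 10 ** ((strengths[y] - strengths[x]) / 400.0))
            survivors.append(x if rng.random() < p_x else y)
        alive = survivors
    return alive[0]

def best_team_title_rate(strengths, trials=10_000, seed=0):
    """Fraction of simulated tournaments won by the top-rated team."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = max(range(len(strengths)), key=strengths.__getitem__)
    wins = sum(simulate_champion(strengths, rng) == best for _ in range(trials))
    return wins / trials

# Hypothetical ratings for a 16-team field, 20 points apart.
strengths = [1500 + 20 * i for i in range(16)]
rate = best_team_title_rate(strengths)
print(f"Best team wins the title in {rate:.1%} of simulated tournaments")
```

Under these assumptions the best team wins far more often than a random team would, but nowhere near every time. The number itself depends entirely on the assumed ratings and formula, so it says nothing about the real tournament.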

Keep This In Mind When Filling Out Your Bracket

Thinking about basketball probabilistically is important for filling out a bracket, because it suggests a different approach from the usual one. Rather than filling out the first-round match-ups first (who you think will win each one), then deciding who wins each resulting match-up, one should fill out the bracket in reverse.

If (and it is a big if) one has win probabilities for every possible match-up, one should start by picking as champion the team most likely to win the whole tournament. For example, if team A wins against every team 55% of the time, but team B wins against every team other than team A 95% of the time, team B is more likely to win the tournament, even though a match-up between the two teams is more likely to result in team A winning.
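The team A versus team B example can be checked with a small calculation. This is a minimal sketch, not our actual model: it assumes an 8-team bracket with A and B in opposite halves, and it assumes the six filler teams (the hypothetical T1 to T6) are coin flips against each other. The function walks up the bracket round by round, tracking each team's probability of reaching each slot.

```python
def champion_probabilities(bracket, win_prob):
    """Return {team: P(team wins the title)} for a single-elimination bracket.

    bracket  -- team names in bracket order (length must be a power of two)
    win_prob -- function (x, y) -> probability that x beats y
    """
    # Each slot starts with one team that is certain to be there.
    slots = [{team: 1.0} for team in bracket]
    while len(slots) > 1:
        next_round = []
        for i in range(0, len(slots), 2):
            left, right = slots[i], slots[i + 1]
            merged = {}
            # P(x advances) = P(x got here) * sum over possible opponents.
            for x, px in left.items():
                merged[x] = px * sum(py * win_prob(x, y) for y, py in right.items())
            for y, py in right.items():
                merged[y] = py * sum(px * win_prob(y, x) for x, px in left.items())
            next_round.append(merged)
        slots = next_round
    return slots[0]

def win_prob(x, y):
    """Match-up odds from the example; filler teams are assumed to be 50/50."""
    if x == "A":
        return 0.55
    if y == "A":
        return 0.45
    if x == "B":
        return 0.95
    if y == "B":
        return 0.05
    return 0.5

bracket = ["A", "T1", "T2", "T3", "B", "T4", "T5", "T6"]
probs = champion_probabilities(bracket, win_prob)
for team in ("A", "B"):
    print(team, round(probs[team], 3))
```

With these assumptions, team A has to win three 55% games in a row, so its title chance is 0.55^3, or about 17%, while team B's comes out around 72%. In a 64-team field A would need six straight wins, about 2.8%.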

Additionally, when thinking about basketball probabilistically, one has to be very careful about over-fitting. Rather than treating the result of a game as the “correct” result, realize that the winning team might have had only a 5% chance to win and happened to get lucky. Any model that tries to include this data point in a deterministic fashion will have problems predicting. There are machine learning techniques that are probabilistic, one of which (naïve Bayes) we implemented. However, naïve Bayes assumes statistics are independent, which simply isn’t true for most basketball stats. While our model had good overall accuracy (around 70% as well), it tended to predict really badly for the tournament (16 seeds winning multiple games).
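To see why the independence assumption hurts, here is a toy illustration (not our actual model, and the numbers are invented). Naïve Bayes multiplies together the evidence from each statistic. If two statistics really measure the same thing, that evidence gets counted twice and the model becomes overconfident.

```python
def naive_bayes_posterior(prior, likelihood_ratios):
    """Combine a prior with per-feature likelihood ratios, naive Bayes style.

    Each ratio is P(stat | team wins) / P(stat | team loses). Multiplying
    them together is only valid if the stats are independent of each other.
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)

# Hypothetical: one stat that is twice as likely to be seen for a winner.
one_stat = naive_bayes_posterior(0.5, [2.0])

# The same information entered twice, e.g. two stats that move in lockstep.
same_stat_twice = naive_bayes_posterior(0.5, [2.0, 2.0])

print(round(one_stat, 3), round(same_stat_twice, 3))
```

The single stat moves a coin-flip game to about 67%. Counting it twice pushes the estimate to 80%, even though no new information was added. Many basketball stats overlap in this way, which is one plausible source of extreme predictions.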

Use Public Pick Data Wisely

To return to the idea of predicting the bracket most likely to happen, rather than the bracket that actually happens: that is not necessarily the best picking strategy. For example, suppose you think team A wins the tournament 50% of the time, but 95% of the field picks them to win. It might make more sense to pick team B, a team that wins only 5% of the time but that only 1% of the field picks. Team A is more likely to win, but if you pick them and they do, you still have to beat the 95% of the field who made the same pick; if you pick team B and they win, you are ahead of almost everyone. The better your overall prediction accuracy, the more it makes sense to pick conservatively for your winner, as you’re more likely to do better than the other 95% who pick team A. If you don’t know much about the overall strength of teams, but you know that team B is highly undervalued, it might make sense to pick B as your winner, even if you think team A is more likely to win.
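A rough way to put numbers on this trade-off is sketched below. It makes a strong simplifying assumption that is mine, not something from a real pool: the pool is decided entirely by the champion pick, and everyone who picked the right champion is equally likely to win. The 100-entry pool size is also hypothetical.

```python
def pool_win_chance(p_champion, share_of_field, pool_size):
    """Approximate chance of winning a pool with a given champion pick.

    Assumes (as a simplification) that the pool winner is drawn at random
    from the entries that picked the correct champion.
    """
    rivals_with_same_pick = share_of_field * (pool_size - 1)
    return p_champion / (rivals_with_same_pick + 1)

pool_size = 100  # hypothetical pool

# Team A: wins 50% of the time, picked by 95% of the field.
pick_a = pool_win_chance(0.50, 0.95, pool_size)

# Team B: wins 5% of the time, picked by 1% of the field.
pick_b = pool_win_chance(0.05, 0.01, pool_size)

print(f"pick A: {pick_a:.2%}   pick B: {pick_b:.2%}")
```

Under this assumption, picking B gives roughly a 2.5% chance of winning the pool against roughly 0.5% for picking A. The sketch ignores the rest of the bracket, so it understates the value of picking A for someone whose other picks are much better than the field's.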

A lot of what I’ve written may seem a bit unrelated to basketball. However, I think it is important to remember that it’s easy to imagine an alternative universe where Missouri is still alive in the tournament (perhaps Norfolk State had not shot 53% from the 3-point line). Perhaps that was even the more likely outcome. If so, then a model that counts a match-up between those two teams as a Norfolk State win is a model that is likely to have trouble predicting future results.

The difficulty of modeling basketball, and sports in general, is part of what makes them so fun: the randomness, the madness.

[Editor’s Note: We take all the concepts Kenneth touches on above into account in our expert bracket picks. Check out our review of how our 2012 NCAA Tournament bracket picks are going so far.]