March 17, 2012 - by Nathan Walker
***NOTE*** This is an entry in our inaugural Stat Geek Idol contest. The opinions or predictions expressed below do not represent the views of TeamRankings.com, and are solely those of the author. This article was conceived of and written by Nathan Walker of the basketball distribution.
In basketball, to predict whether one team is better than another, a good geek will at the very least look at point margin: a team’s ability to score more points than its opponents.
But there is an inherent problem with doing this: not all points are created equal. Some points are easier to create against certain opponents, for example.
I wanted to see whether each statistic is controlled more by the offense or by the defense in NCAA basketball. To do this, I used a simple idea: out-of-sample data. Estimating each game from all of the other games forces the math to be predictive. If we used the game in question as part of the dataset that predicts it, we would be skewing the data (if only slightly) in that game’s favor. Cheating, basically.
Here’s an example, using Kentucky’s Turnover Rate (Turnovers/Possessions) in their first four games:
Looking at the games themselves, we can say that in Kentucky’s first four games, they turned the ball over on roughly 21% of their possessions. Using just this average, we can estimate these games to within 7.5% on average (where error is just the difference between actual and predicted).
However, this is not a predictive answer: we cheated by using the individual games in question. In reality, Kentucky’s first four games predict one another only to within 9.9% on average. The Marist game, for example, was UK’s lowest turnover rate of the four. The average against KU, Penn State, and ODU is 23.9%, which misses the Marist mark by 12.1%. This method is similar to leave-one-out cross-validation (LOOCV), which validates a model by removing one sample and using all of the other data to predict it.
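The difference between the two error figures can be sketched in a few lines of Python. This is a minimal illustration, not the author’s spreadsheet: the function names are made up for this example, and the four rates in the usage lines are hypothetical placeholders, not Kentucky’s actual numbers.

```python
def mean(values):
    """Plain arithmetic mean."""
    return sum(values) / len(values)


def in_sample_errors(values):
    """Error of each game against the average of ALL games (the 'cheating' version)."""
    overall = mean(values)
    return [abs(v - overall) for v in values]


def leave_one_out_errors(values):
    """Error of each game against the average of all OTHER games."""
    total = sum(values)
    n = len(values)
    return [abs(v - (total - v) / (n - 1)) for v in values]


# Hypothetical turnover rates (in %) for four games, for illustration only.
rates = [10.0, 20.0, 30.0, 40.0]
print(mean(in_sample_errors(rates)))
print(mean(leave_one_out_errors(rates)))
```

For a simple average, each leave-one-out error is exactly n/(n-1) times the in-sample error. With four games that factor is 4/3, and 7.5% × 4/3 is about 10%, which lines up with the 9.9% quoted above once rounding is allowed for.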
Now for something a little more complex. In the spirit of NBA-blog-great Eli Witus, I set out to see what offenses control and what defenses control in the NCAA, based on Dean Oliver’s four factors. For the uninformed, the four factors are shooting (field goal percentage, with extra weight for 3-pointers), turnover percentage (turnovers / possessions), offensive rebound percentage (offensive rebounds / available rebounds), and free throw rate (free throw attempts / field goal attempts). These explain over 90% of any team’s point margin.
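For readers who want to compute the four factors from a box score, here is a small Python sketch. It uses the definitions given in this article; the function and argument names are hypothetical, and “available rebounds” is taken to mean the team’s offensive rebounds plus the opponent’s defensive rebounds.

```python
def four_factors(fgm, fg3m, fga, turnovers, possessions, orb, opp_drb, fta):
    """Return the four factors as fractions (multiply by 100 for percentages)."""
    return {
        # Shooting: made 3-pointers count as 1.5 made field goals.
        "efg": (fgm + 0.5 * fg3m) / fga,
        # Turnovers per possession.
        "to_rate": turnovers / possessions,
        # Share of available rebounds grabbed on the offensive end.
        "orb_rate": orb / (orb + opp_drb),
        # Free throw attempts per field goal attempt.
        "ft_rate": fta / fga,
    }


# Made-up box score line, for illustration only.
factors = four_factors(fgm=25, fg3m=6, fga=56, turnovers=14,
                       possessions=70, orb=12, opp_drb=24, fta=20)
print(factors)
```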
If we look at Game Plan data on kenpom.com (subscribers only, unfortunately), we can see each team’s four factors. I looked at 10,572 Division I games in the 2010-11 season and, for each one, computed the data excluding the game in question. It went something like this (for eFG%):
1) Adjust each team’s eFG% for home-court advantage (subtracting 1.44% at home)
2) Average the team’s adjusted eFG% over EVERY GAME, except the game you are predicting
3) Average the opponent’s adjusted defensive eFG% (eFG% allowed), except for the game you are predicting
4) Use Excel to create a formula (using regression) that tells us:
Game eFG% = A x (Team 1 out-of-sample eFG%) + B x (Team 2 out-of-sample-defensive eFG%)
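The steps above can be sketched in Python. This is only a rough outline under stated assumptions, not the Excel workbook itself: the function names are hypothetical, road games are left unadjusted because only the home adjustment is described, and the regression is fit without an intercept because the formula shows none.

```python
HOME_EDGE = 1.44  # eFG% points subtracted from home games (step 1)


def adjust_for_home(efg, is_home):
    """Step 1: remove the home-court bump. Road games are left as-is (assumption)."""
    return efg - HOME_EDGE if is_home else efg


def out_of_sample_mean(values, skip):
    """Steps 2 and 3: average every game except the one at index `skip`."""
    rest = [v for i, v in enumerate(values) if i != skip]
    return sum(rest) / len(rest)


def fit_no_intercept(x1, x2, y):
    """Step 4: least-squares A and B for y = A*x1 + B*x2 via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are collinear; A and B are not identifiable")
    return (t1 * s22 - t2 * s12) / det, (t2 * s11 - t1 * s12) / det


# Synthetic inputs built so that A = 0.6 and B = 0.4 by construction (not real data).
offense = [50.0, 45.0, 55.0, 48.0]   # Team 1 out-of-sample eFG%
defense = [47.0, 52.0, 49.0, 51.0]   # Team 2 out-of-sample defensive eFG%
actual = [0.6 * o + 0.4 * d for o, d in zip(offense, defense)]
A, B = fit_no_intercept(offense, defense, actual)
print(A, B)
```

With real game data the fit will not be exact, and A and B are then read as the offensive and defensive shares of control.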
Luckily, the math checks out: out-of-sample data explains at least 93% of each of the four factors. Simply put, out-of-sample data predicts game stats very well. In fact, the two values we find for each factor sum almost exactly to 100%. That is, Offensive Control (A) + Defensive Control (B) = 100%, except in the case of free throw rate.
Here are the results:
Field-Goal Shooting ((FGM + 0.5*3PM)/FGA):
Offensive Rebounds (Offensive Rebounds/(Offensive Rebounds + Opponent Defensive Rebounds)):
Free Throw Rate (Free-Throw-Attempts/Field-Goal-Attempts):
So we can see that offenses tend to control shooting and offensive rebounding, while defenses tend to control turnovers and free-throw-attempts (and therefore fouls) more. The offensive rebounding part is pretty intuitive: defenses often have balls land in their laps, so it is the offense that must strive to change this.
Knowing this, we can adjust each team’s scoring and points allowed based on their competition at these specific rates. In this way, the basic theory of team strength becomes a little deeper than the standard method:
Game Result = Team 1 True Strength – Team 2 True Strength.
For rebounds, for example, we now know:
Offensive Rebound% = 64% Offense + 36% Defense.
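As a quick worked example of that 64/36 split, here is a short Python sketch. The function name and the two input rates are hypothetical. The inputs stand for a team’s out-of-sample offensive rebound rate and the out-of-sample rate its opponent allows.

```python
def expected_orb_rate(offense_orb, defense_orb_allowed, offense_weight=0.64):
    """Blend the offense's rate and the defense's allowed rate using the 64/36 split."""
    defense_weight = 1.0 - offense_weight
    return offense_weight * offense_orb + defense_weight * defense_orb_allowed


# Hypothetical rates (in %): the offense usually grabs 40%, the defense usually allows 30%.
print(expected_orb_rate(40.0, 30.0))
```

Because the offense carries the larger weight, the estimate lands closer to the offense’s usual rate than to the rate the defense usually allows.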
But this is just the beginning…