Let me start with an obvious statement: In basketball, players who attempt more shots score more points on average. Other individual stats are also associated with more points, such as rebounds and blocks. Similarly, some statistics, like turnovers, are associated with fewer points scored. In this way, we can compute an expected number of points a player should score based on their other statistics such as shots attempted, rebounds, turnovers, etc. Then we can compare the expected number of points a player should have scored to the actual number of points scored and evaluate players based on their tendency to be above or below their expected number of points. I’m going to call this player efficiency.
We can do the same thing with teams. Based on the number of shots a team attempts, team blocks, team turnovers, etc., we can calculate an expected number of points a team should score. For the team analysis, we further consider the opponent’s statistics, since the more shots an opponent attempts, the more points a team will score on average. So, based on statistics from a team and its opponent, the expected number of points a team should score can be calculated. Then, just as with players, we can examine whether teams are consistently scoring above or below their expectation and calculate a team’s offensive efficiency. A defensive efficiency can also be calculated, measuring whether a team’s opponents are consistently scoring above or below their expected number of points. Alright, let’s calculate!
We start by taking box scores for each player from every game and regressing each player’s points against number of field goal attempts (FGA), three point attempts (3PA), free throw attempts (FTA), defensive rebounds (DRB), blocks (BLK), and turnovers (TOV). We find, for instance, that each field goal attempt is worth, on average, about 0.91 points. This means the average college basketball player will score about 9.1 points per 10 shots, all other things held constant. Likewise, a 3PA is worth about 1.1 points on average, though there is considerably more variability involved in shooting a three pointer. (Note: This doesn’t mean teams should only shoot threes because they are worth more on average; it just means that the kind of players who attempt three pointers make these shots more valuable on average than all other field goals.)
Likewise, a free throw attempt is worth about 0.76 points, and defensive rebounds, blocks, and turnovers are worth on average about 0.06, 0.14, and -0.11 points, respectively. Using these values, we can take a player’s box score and, from their FGA, 3PA, FTA, DRB, BLK, and TOV, evaluate how many points they were expected to produce. A good player will score more points than expected, whereas a poor player will score fewer than the expected number of points.
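To make the regression concrete, here is a minimal sketch in Python (the actual analysis was done in R). The box-score data below are synthetic; only the model form — points regressed on FGA, 3PA, FTA, DRB, BLK, and TOV — comes from the discussion above, and treating the six stats as independent draws is a simplification.

```python
# Sketch of the points regression via ordinary least squares.
# Synthetic data only: the "true" coefficients below are seeded with
# values near the fitted ones quoted in the text (0.91, 1.1, 0.76, ...),
# and the Poisson stat counts are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical per-game box-score columns: FGA, 3PA, FTA, DRB, BLK, TOV.
X = np.column_stack([
    rng.poisson(8, n),   # FGA
    rng.poisson(4, n),   # 3PA
    rng.poisson(3, n),   # FTA
    rng.poisson(3, n),   # DRB
    rng.poisson(1, n),   # BLK
    rng.poisson(2, n),   # TOV
]).astype(float)

beta_true = np.array([0.91, 1.10, 0.76, 0.06, 0.14, -0.11])
y = X @ beta_true + rng.normal(0, 2.0, n)  # points with game-to-game noise

# Fit by least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(dict(zip(["int", "FGA", "3PA", "FTA", "DRB", "BLK", "TOV"],
               beta_hat.round(2))))
```

With enough games, the fitted coefficients land close to the per-stat point values quoted above.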
For example, on November 23, 2012, Belmont’s Ian Clark played 26 minutes in a game against Northeastern and attempted 13 field goals, 11 of them threes, with no FTA. He added 3 DRB, 0 BLK, and 1 TOV. With these stats a player is expected to score about 14 points. However, Clark actually scored 29 points on 10 of 13 shooting, including 9 of 11 on three point tries. While this was certainly an impressive game, the best game of the whole season by this standard was played by Christian Williams of Jackson State on February 23 of this year. In 33 minutes, Williams took 16 shots, 8 of which were threes, and attempted 11 free throws. He added 1 DRB, no blocks, and only one turnover. This stat line should result in a little over 24 points; however, Williams actually scored 44 points on 13 of 16 shooting, including 7 three pointers, while shooting a perfect 11 for 11 from the free throw line.
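The two worked examples can be reproduced with the quoted per-stat values. One caveat: reading the FGA coefficient as the value of a two point attempt (with threes counted at the 3PA rate) is my assumption — it matches both stat lines above, but the text doesn’t spell out how FGA and 3PA overlap in the model.

```python
# Expected points from a box-score line, using the fitted per-stat
# values quoted in the text. ASSUMPTION: the 0.91 FGA coefficient is
# applied to two-point attempts only, which reproduces both examples.
COEF = {"2PA": 0.91, "3PA": 1.10, "FTA": 0.76,
        "DRB": 0.06, "BLK": 0.14, "TOV": -0.11}

def expected_points(fga, tpa, fta, drb, blk, tov):
    return ((fga - tpa) * COEF["2PA"] + tpa * COEF["3PA"]
            + fta * COEF["FTA"] + drb * COEF["DRB"]
            + blk * COEF["BLK"] + tov * COEF["TOV"])

# Ian Clark vs. Northeastern: 13 FGA (11 threes), 0 FTA, 3 DRB, 0 BLK, 1 TOV.
print(round(expected_points(13, 11, 0, 3, 0, 1), 1))   # 14.0 -- "about 14"

# Christian Williams: 16 FGA (8 threes), 11 FTA, 1 DRB, 0 BLK, 1 TOV.
print(round(expected_points(16, 8, 11, 1, 0, 1), 1))   # 24.4 -- "a little over 24"
```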
So, this works for individual games, but what if we want to evaluate the expected number of points for a player over the course of many games? To do this, we need estimates of the minutes a player will play, the number of shots they will take, and so on. We can build a separate model for each component of the prediction: one for each of the six statistics above (FGA, 3PA, FTA, DRB, BLK, TOV), plus a model for the number of minutes played. Once we have all of these, we can calculate an expected stat line for a player and, from that, estimate the expected number of points for a player producing their average number of minutes, shots, rebounds, etc.
However, the ways in which players score their points can be very different. Scoring 20 points on 10 shots is much more efficient than scoring 20 points on 20 shots. One way to assess this is to look at the difference between expected points and actual points across all games over the course of the season, and to try to explain some of the excess variability with a player effect. This can be accomplished by adding a random effect for each player to the model for predicting points. A large positive estimate indicates that a player is consistently scoring above their expected points given their game statistics, and a negative estimate shows they are scoring fewer points than expected. The resulting random effect estimates are what I’m referring to as effectiveness.
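A quick illustration of what the random effect captures: a mixed model (e.g. lmer in R) would estimate a shrunken per-player deviation from expectation. The hand-rolled shrinkage below is a crude stand-in for that fit, not the actual method, and the residuals are invented.

```python
# Crude approximation of a per-player random effect: the mean of
# (actual - expected) points, shrunk toward zero for small samples.
# ASSUMPTION: prior_games=5 is an arbitrary shrinkage strength; a real
# mixed-model fit estimates this from the data.
def effectiveness(residuals_by_player, prior_games=5.0):
    """residuals_by_player: {player: [actual - expected, per game]}."""
    out = {}
    for player, res in residuals_by_player.items():
        out[player] = sum(res) / (len(res) + prior_games)
    return out

resids = {
    "efficient scorer":   [5, 6, 4, 7, 5, 6, 5, 4],        # above expectation
    "inefficient scorer": [-4, -5, -3, -6, -4, -5, -4, -3],
    "one hot game":       [15],                             # shrunk hard
}
print(effectiveness(resids))
```

Note how the single huge game ends up rated below the consistently efficient scorer — the same behavior the random effect gives you.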
The plot below shows the relationship between expected points and effectiveness. That is, the y-axis represents the expected number of points a player will score based on their expected stat line, and the x-axis is the offensive effectiveness estimate for each player. The players with the highest expected points this year are Erick Green and Brandon Likins, followed by Lamont Jones, Travis Bader, and Doug McDermott. But you can see that Doug McDermott is scoring points more effectively than Travis Bader and much more effectively than Brandon Likins. Other highly efficient scorers this year include Victor Oladipo, Ian Clark, Marshall Bjorklund, T.J. Warren, and Kelly Olynyk. The full list of players with their expected points and effectiveness scores can be found in this Google Doc. And if you’re interested in finding a specific player on the plot below, I’ve created an interactive version of the plot here using the shiny package in R.
The same type of analysis can be applied to teams. Each stat from before (FGA, 3PA, FTA, DRB, BLK, and TOV) can be assigned an average point value for a team based on the regression analysis. But here, not only are each team’s stats considered, but also each team’s opponent’s stats. For instance, every field goal that a team attempts is worth on average about 1.19 points, and every shot a team’s opponent attempts is worth about 0.63 points for the team (since more shots by your opponent means more shots for you, which means more points, on average).
Likewise, a FTA is worth about 0.65 points for your team, and every free throw that your opponent attempts is worth about 0.29 points on average. Incorporating all of these stats, we could predict the expected number of points for a team if we knew how many shots it attempted, how many shots its opponent attempted, and so on. Since we don’t know these exactly, we can estimate them all and simulate an expected stat line, which we can then use to predict the number of points a team is expected to score with an average stat line.
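The simulation step can be sketched as follows. Only the four FGA/FTA values (1.19, 0.63, 0.65, 0.29) come from the text; the per-game averages, the Poisson sampling model, and the omission of the other stats and the intercept are all my simplifications, so the absolute point total here is inflated and only the mechanics matter.

```python
# Simulating a team's expected stat line and scoring it with the quoted
# team/opponent coefficients. ASSUMPTIONS: Poisson game-to-game counts,
# made-up per-game averages, and only the four stats quoted in the text
# (no DRB/BLK/TOV terms, no intercept), so the level is unrealistic.
import numpy as np

COEF = {"team_FGA": 1.19, "opp_FGA": 0.63, "team_FTA": 0.65, "opp_FTA": 0.29}

def simulate_expected_points(avg, n_sims=20000, seed=0):
    """avg: per-game averages for the four stats, keys matching COEF."""
    rng = np.random.default_rng(seed)
    # Draw a simulated stat line per game and score each one.
    sims = sum(COEF[k] * rng.poisson(avg[k], n_sims) for k in COEF)
    return sims.mean()

avg = {"team_FGA": 56, "opp_FGA": 55, "team_FTA": 20, "opp_FTA": 18}
print(round(simulate_expected_points(avg), 1))
```

With linear coefficients the simulated mean simply converges to the coefficients applied to the average stat line; the simulation becomes genuinely useful once you want a distribution rather than a point estimate.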
In order to calculate team effectiveness, we need to look at the actual stat lines rather than a simulated one, and the process is similar to the case with players. Below is a plot of offensive effectiveness versus expected team points. You can see that Indiana and Iona had the highest expected points this season, but they weren’t nearly the most efficient teams. The most efficient team this year was Creighton, thanks in large part to Doug McDermott being the most efficient player in the country. The rest of the top five most efficient teams this year include Richmond, Belmont, San Francisco, and Denver, followed by Duke, Kansas, North Carolina State, Pittsburgh, and Kentucky. The least efficient teams this year were Howard, North Texas, and Prairie View, and the team with the lowest expected points this year was Grambling.
But offense is only half the story. We can apply this same type of analysis to each team’s defense. Now, instead of estimating only an effect for offense, a defensive effect is added to the model. In this way, we can derive both offensive and defensive effectiveness ratings for each team in the NCAA. Below is a plot of offensive versus defensive effectiveness for all teams in the NCAA for the year 2013. I’ve highlighted some teams in the plot below, including all of the number 1 seeds (Indiana, Gonzaga, Kansas, and Louisville), as well as a few other teams that I found interesting, like Kentucky, Ohio State, and Florida.

One can see that of all of the number 1 seeds, Gonzaga is the least efficient on offense, but they are the most efficient on defense. Similarly, Indiana is nearly the same as Gonzaga in defensive efficiency, but slightly more efficient on offense. Other teams, like Louisville and Kentucky, are nearly average in defensive efficiency, but well above average in offensive efficiency. (It’s interesting that Kentucky and Louisville fall so close to each other on this plot; one team missed the tournament entirely and the other was a 1 seed.) Finally, there is a team like Florida, who many picked to win the entire tournament, who are about as efficient on offense as teams like Ohio State and Louisville, but very nearly the most defensively efficient team in the whole country (only Alcorn State and Mississippi were more defensively efficient this year).
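The joint offense-plus-defense fit can be sketched as a least-squares problem: each game’s residual (actual minus expected team points) is modeled as the team’s offensive effect plus the opponent’s defensive effect. The three teams and their residuals below are invented, and this fixed-effects least-squares version is a stand-in for whatever mixed model was actually fit.

```python
# Sketch: fit offensive and defensive effects jointly from game
# residuals. A positive defensive effect means opponents score MORE
# than expected (bad defense), matching the sign convention where
# negative defensive efficiency is good. Data are made up.
import numpy as np

games = [  # (team, opponent, actual - expected points for the team)
    ("A", "B", 4.0), ("A", "C", 6.0), ("B", "C", -3.0),
    ("B", "A", -2.0), ("C", "A", -5.0), ("C", "B", 1.0),
]
teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# Design matrix: +1 on the scoring team's offense column, +1 on the
# opponent's defense column (residual = offense[team] + defense[opp]).
X = np.zeros((len(games), 2 * n))
y = np.array([g[2] for g in games])
for row, (team, opp, _) in enumerate(games):
    X[row, idx[team]] = 1.0       # offensive effect
    X[row, n + idx[opp]] = 1.0    # opponent's defensive effect
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum-norm solution
offense, defense = coefs[:n], coefs[n:]
print({t: round(offense[idx[t]], 2) for t in teams})
```

The system is only identified up to a constant shift between the offense and defense columns (lstsq resolves this with the minimum-norm solution), but the differences between teams — which is all the plots use — are well defined.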
So, a big question is, does offensive or defensive efficiency actually mean anything in terms of winning tournament games? In order to address this, I looked at last year’s Sweet Sixteen and Final Four teams. In the plot below, Sweet Sixteen teams are indicated by light green dots and Final Four teams are represented by their team colors. You can see that nearly all of the Sweet Sixteen teams last year had positive offensive efficiency and negative defensive efficiency, meaning they were above average in both measures (recall that a negative defensive efficiency means opponents scored below their expected points).
However, this is likely due to the fact that teams with above average offensive and defensive efficiency are the only types of teams in the tournament to begin with. So, a much more interesting example occurs in looking at the Final Four teams. One can see that all of the Final Four teams from last year had an offensive efficiency above 1, and two of the teams were above 2. Further, last year’s champion, Kentucky, can be found in the lower right corner of the graph near Ohio State, indicating both above average offensive and defensive efficiency. The only team close to this area of the graph this year is Florida.
For a full list of teams’ expected points and efficiencies, you can go to this Google Doc. Finally, an interactive version of offensive effectiveness versus expected points can be found here, and a plot of offensive versus defensive effectiveness can be found here for both 2012 and 2013.