Predicting NBA Draft Position from College Performance | Stat Geek Idol

This is a Sweet 16 submission in our inaugural Stat Geek Idol contest. It was conceived of and written by John Ezekowitz of the Harvard Sports Analysis Collective.

Every March, the sports nation turns its eyes to the NCAA Tournament. Under those bright lights, dreams are made, Cinderellas are found, and stars are born. Casual sports fans learn to love previously unknown players like Stephen Curry and Gordon Hayward.

For NBA talent evaluators, March is just one more data point in a set of observations they have been making for years. It is their job to become intimately acquainted with not just the players the larger sports world only finds out about in March, but also those whom the spotlight never finds in college.

The media, too, gets in on the act. Draft analysts like Jonathon Givony and Chad Ford keep the public abreast of the latest player stock movements. A good or bad game can send a player flying up or down mock draft boards.

But what factors actually determine NBA Draft position? What are the college stats that presage a player hearing his name called early on Draft night? There has been some work done on this subject already, including a paper that came out this month which looks at the effects of March Madness performance on Draft position, but the literature is flawed in one key way: these studies do not account for position properly.

Position Matters, Let’s Not Ignore It

It is intuitively obvious that the factors that make a center desirable to NBA scouts probably differ substantially from those that make a point guard desirable. We cannot estimate the effect of Points Per Game on Draft position as if it were the same across all positions.

Luckily, statistics has a solution to this problem: hierarchical linear modeling. Hierarchical models allow for different regression coefficients at each level of a specific variable. That is a fancy way of saying we get to estimate different weights for variables at each position. Team strength, for example, can have five different estimates depending on the player’s traditional basketball position. Letting the weights vary this way allows the model to fit the data more closely than a single set of coefficients shared by every position.
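To make the idea of position-specific weights concrete, here is a minimal Python sketch. It is not the code behind this model: it fits a separate least-squares slope for each position and then pulls each one toward the slope fitted on all players together, which is a crude stand-in for the partial pooling a real hierarchical model performs. The function names, the `prior_strength` setting, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Least-squares slope of y on a single predictor x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def position_slopes(rows: List[Tuple[str, float, float]],
                    prior_strength: float = 5.0) -> Dict[str, float]:
    """rows holds (position, predictor, draft_pick) triples.

    Each position's own slope is shrunk toward the pooled slope.
    prior_strength is a hypothetical tuning constant: 0 means no pooling,
    larger values pull small groups harder toward the pooled estimate.
    """
    by_position = defaultdict(list)
    for position, x, pick in rows:
        by_position[position].append((x, pick))
    pooled = ols_slope([x for _, x, _ in rows], [pick for _, _, pick in rows])
    slopes = {}
    for position, points in by_position.items():
        own = ols_slope([x for x, _ in points], [pick for _, pick in points])
        weight = len(points) / (len(points) + prior_strength)
        slopes[position] = weight * own + (1 - weight) * pooled
    return slopes

# Made-up toy data: the predictor moves centers about 3 picks per unit
# and point guards about 1 pick per unit.
toy_rows = [
    ("C", 1.0, 20.0), ("C", 2.0, 17.0), ("C", 3.0, 14.0),
    ("PG", 1.0, 12.0), ("PG", 2.0, 11.0), ("PG", 3.0, 10.0),
]
slopes_by_position = position_slopes(toy_rows)
```

A full hierarchical model estimates the amount of shrinkage from the data rather than taking it as a fixed constant, and handles many predictors at once.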

The Data

My dataset consists of every Division I college basketball player who was either drafted or played in the NBA between 2001 and 2010. I collected college stats on each of these players, ranging from traditional measures (points per game, etc.) to advanced metrics (like efficiency ratings) to team variables (TeamRankings Power Ratings, etc.), and placed the players into positions using Basketball Reference’s categorizations. These categorizations aren’t perfect, since some players obviously inhabit more than one position, but they are as good as, if not better than, anything else available. For each college variable, I used a minutes-weighted average over a player’s college career, so players get more credit for stats accumulated in years when they played more.
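The minutes-weighted career average can be sketched in a few lines of Python. The function name and the input layout (one `(minutes, value)` pair per college season) are assumptions for illustration, and the numbers in the example are made up.

```python
from typing import List, Tuple

def minutes_weighted_average(seasons: List[Tuple[float, float]]) -> float:
    """Career average of a stat, weighting each season by minutes played.

    seasons holds one (minutes_played, stat_value) pair per college season.
    """
    total_minutes = sum(minutes for minutes, _ in seasons)
    if total_minutes == 0:
        raise ValueError("player has no recorded minutes")
    return sum(minutes * value for minutes, value in seasons) / total_minutes

# Hypothetical player: a bench role as a freshman, a starter as a sophomore.
career = [(200, 8.0), (1000, 20.0)]
career_average = minutes_weighted_average(career)  # 18.0, not the plain mean of 14.0
```

The sophomore season dominates because it accounts for five times as many minutes.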

The Model

My model ended up containing eight variables that predicted Draft position, and four positional variables. They, and their coefficients, are summarized in the table below:

* Significant at 95% Level *** Significant at 99% Level

The positional coefficients represent the different slopes the model assigns to different positions. By themselves, they are meaningless, but when they are combined with the values of the predictor variables, they yield Draft position estimates (note: point guards are left out as the reference group).

Negative coefficients are associated with better Draft position, as the best pick in the Draft is the first overall pick. The coefficients are interpretable as numbers of picks. For instance, increasing age by a year while holding everything else constant is associated with falling more than 3 spots. As you can see, team factors are very important in predicting Draft stock.

What Factors Help Predict Draft Position?

Players on better teams tend to get drafted earlier (in part because they make those teams better). Even controlling for that effect, the number of other NBA players on a player’s team is a significant predictor of improved Draft success. This might be indicative of what behavioral economists call the availability heuristic. Scouts have to allocate what games they can see, and thus might pick games with more potential NBA players. Players on those teams benefit from sticking out in talent evaluators’ minds more because the evaluator may have seen them more often.

The player variables that are significant are Offensive Rating (a measure of how efficient a player is offensively), height, Usage Rate (how many possessions a player uses), and Blocks per 40 Minutes. All of these variables are measured relative to the player’s position: a 6’4” point guard gets more of a boost from his height variable than a 6’4” small forward. All of these coefficients are negative, indicating that better performance is associated with better Draft position. It is interesting that Usage Rate is significant. This implies that, even holding efficiency constant, players who are more involved offensively get drafted higher.
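One simple way to measure a variable relative to position is to subtract the average for that position. The article does not spell out the exact adjustment, so the Python sketch below is an assumption, and the heights in it are made-up toy values in inches (6’4” is 76).

```python
from collections import defaultdict
from typing import Dict, List

def center_within_position(players: List[Dict], stat: str) -> List[float]:
    """Return each player's stat minus the mean of that stat at his position."""
    totals = defaultdict(lambda: [0.0, 0])  # position -> [running sum, count]
    for player in players:
        totals[player["position"]][0] += player[stat]
        totals[player["position"]][1] += 1
    means = {position: total / count for position, (total, count) in totals.items()}
    return [player[stat] - means[player["position"]] for player in players]

# Hypothetical four-player pool.
toy_players = [
    {"name": "PG A", "position": "PG", "height": 76.0},
    {"name": "PG B", "position": "PG", "height": 72.0},
    {"name": "SF A", "position": "SF", "height": 76.0},
    {"name": "SF B", "position": "SF", "height": 80.0},
]
relative_height = center_within_position(toy_players, "height")
```

In this toy pool the 76-inch point guard is 2 inches above his positional average, while the 76-inch small forward is 2 inches below his, so the same raw height counts for one and against the other.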

Unlike previous analyses, I do not find that Points Per 40 minutes is a significant predictor. The combination of Offensive Rating (efficiency) and Usage Rate (volume) was a much better predictor.

Finally, age hurts Draft position: the coefficient on age is positive, so the older a player is, all else equal, the later he tends to be picked. The likely reason is that players generally improve over time: a freshman in college putting up identical numbers to a senior is evaluated as having more growth potential in the NBA.

Is The Model Accurate?

All of this is well and good, but how well does the model fit the data? To test it, I conducted out-of-sample testing: I pulled one year out of the dataset, fit the model on the remaining years, and then used that model to predict the Draft order of the held-out year. I then correlated the predicted and actual rankings to see how closely the predictions matched. The results for the last four years in the dataset are summarized below:

Average pick deviance is the model’s mean absolute miss, measured in picks. In 2010, for instance, it was off by 6.8 spots on average. The high correlations illustrate that the model fits the data well out of sample, albeit better in some years than others. In general, the model is able to determine who will be picked in the top 10, and who will fall to the lower selections (or not be picked at all). In three of the past four years, it correctly predicted the number one overall pick, missing only Blake Griffin.
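The hold-out procedure and the two summary numbers can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used for the results above: a one-predictor least-squares line stands in for the full model, the correlation is a plain Pearson correlation (the article does not say which kind was used), and all function names and the row layout are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple

Row = Tuple[int, float, float]  # (draft_year, predictor, actual_pick)

def fit_line(rows: List[Row]) -> Tuple[float, float]:
    """Least-squares intercept and slope for pick = a + b * predictor."""
    n = len(rows)
    mean_x = sum(r[1] for r in rows) / n
    mean_y = sum(r[2] for r in rows) / n
    sxx = sum((r[1] - mean_x) ** 2 for r in rows)
    sxy = sum((r[1] - mean_x) * (r[2] - mean_y) for r in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def average_pick_deviance(predicted: List[float], actual: List[float]) -> float:
    """Mean absolute difference between predicted and actual picks."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def correlation(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; assumes neither list is constant."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

def leave_one_year_out(rows: List[Row]) -> Dict[int, Tuple[float, float]]:
    """For each year: fit on the other years, predict that year, score it.

    Returns {year: (average_pick_deviance, correlation)}.
    """
    results = {}
    for year in sorted({r[0] for r in rows}):
        train = [r for r in rows if r[0] != year]
        held_out = [r for r in rows if r[0] == year]
        intercept, slope = fit_line(train)
        predicted = [intercept + slope * r[1] for r in held_out]
        actual = [r[2] for r in held_out]
        results[year] = (average_pick_deviance(predicted, actual),
                         correlation(predicted, actual))
    return results
```

The key point is that the held-out year never touches the fitting step, so each score reflects how the model does on players it has not seen.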

Conclusions

While this model is far from perfect, I believe it represents a big step forward in analyzing which factors predict NBA Draft position. By utilizing the hierarchical linear model approach, the model is able to better account for the fact that traits affect the Draft stocks of players differently based on what position they play.

It is interesting that, in contrast to earlier studies, I find that efficiency and usage matter more than raw scoring. Additionally, it seems that team context matters a lot for a player’s Draft position. In the future, I would like to add more contextual variables, like team success in the NCAA Tournament, to further analyze potential biases towards more exposed players.

When you watch the rest of the NCAA Tournament and hear chatter about players’ Draft positions, remember that not all stats are created equal, and that certain stats are good predictors of Draft success.