Each week, NFL fans, fantasy players, and bettors all spend time reviewing or hearing about NFL team injury reports. They use the injury status structure (players are listed as probable, questionable, doubtful, or out) to determine who will actually be playing in that week’s game.
However, not all injuries are created equal. It makes intuitive sense that players listed with the same injury status may play more or less often, depending on what position they play or what their specific injury is. To find out where these differences may arise, we recently conducted a study of the last two years of NFL injury report data.
As we build up our intelligence about NFL injuries, we will begin to incorporate our findings into our algorithmic NFL predictions, as well as pioneer a new level of injury analysis on our team and matchup pages.
Overview of NFL Injury Analysis Project
In this first post, we will explain our methods and high-level conclusions. Over the next three weeks or so, we will examine the intricacies of each injury status and how position and injury location affect the odds of playing.
The specific goal of this study is to determine the odds that a player listed on the injury report actually plays in that week’s game. This percentage is most likely driven by four factors:
- Their injury status
- The type of injury (e.g. hamstring, ankle, etc.)
- Their position
- Their overall value to the team (e.g. starter vs. scrub)
The primary end product of this analysis is a matrix of what we call “injury crosses,” which show the percentage chance of an injured player actually playing, broken out by the first three factors listed above.
In this and three following posts, we will lay out our specific findings. Each post will be an in-depth look at an injury status, examining which positions and injury types have either a high or low likelihood of playing, compared to the average likelihood for that status.
The Data Set
Our initial data set includes NFL injury reports for the 2008 and 2009 seasons. There have been over 4,000 injuries listed on NFL injury reports in the last two years. That’s a decent sample size, but with over 50 distinct injury types, there were some issues with small sample sizes for the more uncommon injuries. To account for this problem, we applied some Bayesian statistical techniques that added data based on overall percentages and allowed us to make more meaningful conclusions about rarer injuries.
For example, there were only nine instances of players listed as probable with elbow injuries. Seven of them played. While this seems to suggest a high likelihood of players with elbow injuries playing, the small sample size makes conclusions difficult. To address this, we simply added fictional observations in line with the overall average percentage of probable players that played. Since the overall average playing percentage for probable players is 72%, we “created” 25 fictional instances of probable players with elbow injuries, and assumed 18 of them played and 7 did not (18/25 = 72%).
This process gives us a bigger sample size, and also pulls rarer injuries toward the overall mean, which we trust more because it rests on a larger number of observations. The same process was applied to all injury types. The beauty of it is that for injuries with a lot of observations (e.g., hamstrings), the added data did very little to change the percentage, which we were already fairly confident in.
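The adjustment described above can be sketched in a few lines of Python. This is a minimal illustration rather than the exact code used in the study: the function name `smoothed_rate` and its parameters are made up for this example, and the hamstring counts are hypothetical numbers chosen only to show the effect on a large sample.

```python
def smoothed_rate(played, listed, prior_rate, prior_weight):
    """Blend an observed playing rate with an overall average.

    Adds `prior_weight` fictional observations, of which a share
    `prior_rate` is assumed to have played.
    """
    fictional_played = prior_rate * prior_weight
    return (played + fictional_played) / (listed + prior_weight)


# Elbow example from the text: 7 of 9 played, plus 25 fictional
# observations at the 72% overall rate for probable players.
elbow = smoothed_rate(played=7, listed=9, prior_rate=0.72, prior_weight=25)
print(f"elbow raw: {7 / 9:.1%}, smoothed: {elbow:.1%}")

# Hypothetical large sample (counts invented for illustration only).
hamstring = smoothed_rate(played=240, listed=300, prior_rate=0.72, prior_weight=25)
print(f"hamstring raw: {240 / 300:.1%}, smoothed: {hamstring:.1%}")
```

With these inputs the elbow rate moves from 77.8% to 73.5%, while the hypothetical 300-observation injury barely moves (80.0% to 79.4%), which is the behavior described above.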
Adjusting For Starters vs. Scrubs
One other concern became evident after running our initial analysis. Just looking at the numbers, we don’t know if an injured player did not play because he was too hurt to play, or if he was potentially fit enough to play but the coach simply chose not to play him.
To explore whether we were underestimating the likelihood of a player actually playing, we ran the analysis for two data segments:
- All players
- Players that started at least 5 games in that season (we defined this group as “Starters”)
We assumed that if a player who usually started was able to play, the coach would choose to play him, eliminating the chance that he was fit but did not play on account of a coach’s decision alone.
The high-level results of our initial segmented study are summarized in the table below:
Table 1: Overall percentage of injured players that ended up playing, by injury status.
| Subset        | Probable | Questionable | Doubtful |
|---------------|----------|--------------|----------|
| All Players   | 68.27%   | 48.70%       | 27.78%   |
| Starters Only | 73.35%   | 57.85%       | 31.14%   |
As you can see from the table, starters played more often at all three status levels, and about 19 percent more often in relative terms (roughly 9 percentage points) for the “questionable” status. Because the starters dataset seemed to eliminate the fit-but-DNP problem, and because fans and fantasy players will be most interested in injured starters, we chose to run the full analysis using the “starters only” constraint.
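As one may have guessed, the chance a player will play rises steadily with each step up in injury status. Among starters, questionable players played roughly 27 percentage points more often than doubtful players, and probable players played roughly 16 percentage points more often than questionable players. Across all players the steps are closer to linear, at roughly 21 and 20 percentage points.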
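The comparisons drawn from Table 1 are easy to reproduce, and doing so makes the difference between a percentage-point gap and a relative gap explicit. The sketch below uses only the figures from Table 1; the helper names `point_gap` and `relative_gap` are made up for this example.

```python
# Playing percentages from Table 1.
rates = {
    "All Players": {"Probable": 68.27, "Questionable": 48.70, "Doubtful": 27.78},
    "Starters Only": {"Probable": 73.35, "Questionable": 57.85, "Doubtful": 31.14},
}


def point_gap(a, b):
    """Difference in percentage points."""
    return a - b


def relative_gap(a, b):
    """How much larger a is than b, as a percent of b."""
    return (a - b) / b * 100


# Starters vs. all players, per status.
for status in ("Probable", "Questionable", "Doubtful"):
    s = rates["Starters Only"][status]
    a = rates["All Players"][status]
    print(f"{status}: +{point_gap(s, a):.2f} points, +{relative_gap(s, a):.1f}% relative")

# Step from one status to the next, per subset.
steps = {}
for subset, r in rates.items():
    steps[subset] = (
        point_gap(r["Questionable"], r["Doubtful"]),
        point_gap(r["Probable"], r["Questionable"]),
    )
    print(f"{subset}: doubtful->questionable +{steps[subset][0]:.2f}, "
          f"questionable->probable +{steps[subset][1]:.2f} points")
```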
In our next post, we’ll dive into the specific player injury crosses for the probable status, and present data showing how the odds of playing vary by player position and injury type. For example:
- Wide receivers listed as probable played 78.5% of the time, 7% more often than all players listed as probable
- Players listed as probable with shoulder injuries played 78.8% of the time, again about 7% more often than all players listed as probable
From these two figures, we would conclude that a wide receiver listed as probable with a shoulder injury likely has a very good (80%+) chance of playing, if we treat position and injury type as independent variables.
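The post does not spell out how the two percentages are combined, so the following is only one plausible way to do it, not necessarily the method behind the 80%+ figure. It assumes position and injury type act independently on the odds of playing, and it takes the 73.35% starters-only probable rate from Table 1 as the baseline. The function names are made up for this example.

```python
def to_odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


def to_prob(odds):
    """Convert odds back to a probability."""
    return odds / (1 + odds)


def combine(base_rate, *factor_rates):
    """Combine factor-specific rates under an independence assumption.

    Each factor scales the baseline odds by its own odds ratio
    relative to the baseline.
    """
    odds = to_odds(base_rate)
    for rate in factor_rates:
        odds *= to_odds(rate) / to_odds(base_rate)
    return to_prob(odds)


# Baseline: probable starters (Table 1). Factors: wide receivers
# (78.5%) and shoulder injuries (78.8%), both among probable players.
estimate = combine(0.7335, 0.785, 0.788)
print(f"probable WR with a shoulder injury: {estimate:.1%}")
```

Under these assumptions the estimate comes out at about 83%, consistent with the 80%+ figure quoted above. Working in odds rather than raw percentages keeps the result below 100% even when several favorable factors stack up.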
Knowing these odds can come in very handy for a fantasy manager debating whether to start an injured player, or a bettor trying to handicap a team with impact players listed on the injury report.
Stay tuned for the next chapter of our findings in about a week.