Predicting The Sweet 16 Using A Classification Tree | Stat Geek Idol

***IMPORTANT NOTE*** This is an entry in our inaugural Stat Geek Idol contest. The opinions and predictions expressed below are solely those of the author. This entry was conceived and written by Gregory Matthews of Stats In The Wild.

What kind of teams make it to the Sweet Sixteen?

What qualities in a team separate an NCAA qualifying team from a Sweet Sixteen team?

This seems like a great place to use a classification tree. The idea of a classification tree is to split teams into groups where each group has similar qualities. In this case, the quality we are interested in is making it to the Sweet Sixteen. Based on the data provided to me by the wonderful people at TeamRankings, I was able to build a classification tree with all teams that qualified for the NCAA tournament in the years 2007-2011 using an outcome of advancing to the Sweet Sixteen or not.
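To make the idea of a "split" concrete, here is a minimal sketch of how a tree might pick one cut point on a single statistic. It assumes Gini impurity as the splitting criterion, which is common for classification trees but is not confirmed as the criterion used for the tree in this post. The function names and the toy numbers are made up for illustration and are not the TeamRankings data.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best single cut on one feature."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))  # start from the unsplit group
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x < threshold]
        right = [y for x, y in pairs if x >= threshold]
        # Impurity of the two child groups, weighted by group size.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up ratings and outcomes (1 = reached the Sweet Sixteen), for illustration only.
toy_rpi = [0.55, 0.58, 0.60, 0.61, 0.63, 0.65, 0.66]
toy_made_it = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_split(toy_rpi, toy_made_it)
print(threshold, impurity)
```

A full tree repeats this search over every candidate statistic, keeps the best cut, and then recurses into each of the two resulting groups.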

First Split

The first split is based on Overall RPI Rating, which splits the NCAA tournament teams into two big groups: high and low Overall RPI teams. Of the low Overall RPI teams that qualified for the tournament, only 24 of 246 (9.76%) advanced to the Sweet Sixteen, while 56 of the 83 teams (67.47%) with a larger Overall RPI Rating advanced.
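The arithmetic behind this first split can be checked in a few lines. The counts below come straight from the paragraph above; the Gini impurity calculation is an assumption about how one might score the split, not a statement of the criterion actually used to build the tree.

```python
def gini_from_counts(advanced, total):
    """Gini impurity for a group given how many of its teams advanced."""
    p = advanced / total
    return 2 * p * (1 - p)


low_adv, low_n = 24, 246    # low Overall RPI teams
high_adv, high_n = 56, 83   # high Overall RPI teams
all_adv, all_n = low_adv + high_adv, low_n + high_n

low_rate = round(100 * low_adv / low_n, 2)
high_rate = round(100 * high_adv / high_n, 2)

# Impurity before the split versus the size-weighted impurity after it.
before = gini_from_counts(all_adv, all_n)
after = (low_n * gini_from_counts(low_adv, low_n)
         + high_n * gini_from_counts(high_adv, high_n)) / all_n

print(f"low RPI: {low_rate}%  high RPI: {high_rate}%")
print(f"impurity before: {before:.3f}  after: {after:.3f}")
```

The drop in impurity is what makes Overall RPI Rating an attractive first cut: the two groups it produces are each far more one-sided than the full field.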

Larger RPI Teams From Past Years

Now we have two groups: large RPI and small RPI. We’ll call the large RPI groups the R-Groups (they are on the right side of the tree graphic) and the small RPI groups the L-Groups (they are on the left of the graphic).

First let’s look at the R-groups (large Overall RPI). This group can then be split again using Overall RPI Rating. Using this split, the group with the very large RPI, group R1 from the figure, has advanced to the Sweet Sixteen 31 out of 34 times (91.18%) in the last five years. The teams in group R1 that failed to advance were the 2007 Wisconsin Badgers, Duke in 2008, and, most recently, Kansas in 2010.

The other group, teams with an Overall RPI between 0.6169 and 0.643 (we’ll call this the medium RPI group), advanced to the Sweet Sixteen 25 out of 49 (51.02%) times in the most recent half decade.

Things start to get really interesting in this medium RPI group. The next split in this group is based on Opponents Effective Possession Ratio (OEPR). In the past five years, there have been exactly nine medium RPI teams with an OEPR less than 0.9147 (Group R2), and all nine of them advanced to the Sweet Sixteen. These teams were Southern Illinois, Memphis, and Tennessee in 2007; Villanova, Michigan State, and Missouri in 2009; and Butler, Tennessee, and Purdue in 2010. Interestingly, three of these teams went on to advance to the Final Four (Michigan State and Villanova in 2009 and Butler in 2010).

Traveling in the opposite direction down our tree from the medium RPI node, we come to the medium RPI, high OEPR group. In the last five years, 40 teams have fallen into this group and only 16 (40%) have advanced to the Sweet Sixteen. However, we make one more split in this group, this time based on Average 2nd Half Scoring Margin (A2HSM). Of the 40 medium RPI, high OEPR teams, there are 31 with high A2HSM (Group R3), and 16 of them (51.61%) advanced to the Sweet Sixteen. On the other hand, nine of the medium RPI, high OEPR teams have had an A2HSM below the threshold (Group R4), and none of these teams over the past five years has advanced to the Sweet Sixteen. These teams included Duke, Arizona, and Kentucky in 2007, Vanderbilt and Florida State in 2008 and 2009, respectively, along with Pittsburgh and Texas A&M in 2010 and Georgetown and Notre Dame in 2011. In fact, 5 of these 9 teams (55.56%), all seeded 8 or better, were upset in the first round by lower seeds. It’s interesting to note that 2 of these 5 upsets were perpetrated by VCU as an 11 seed, in 2007 and then again in 2011.
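The large RPI side of the tree can be written down as a short chain of rules. This is only a sketch: the RPI and OEPR cut points are the ones quoted above, but the A2HSM cut point is not given in this post, so it is passed in as a parameter (a2hsm_cut is a hypothetical name). How ties at a boundary are handled (< versus <=) is also an assumption.

```python
RPI_LOW_CUT = 0.6169    # below this, a team belongs to the small RPI (L) side
RPI_HIGH_CUT = 0.643    # at or above this, a team is in the very large RPI group
OEPR_CUT = 0.9147       # Opponents Effective Possession Ratio cut point


def r_side_group(rpi, oepr, a2hsm, a2hsm_cut):
    """Assign a team to R1-R4, or "L" if it belongs on the small RPI side.

    a2hsm_cut is NOT stated in the article; the caller must supply it.
    """
    if rpi < RPI_LOW_CUT:
        return "L"   # handled by the small RPI splits instead
    if rpi >= RPI_HIGH_CUT:
        return "R1"  # very large RPI
    if oepr < OEPR_CUT:
        return "R2"  # medium RPI, low OEPR
    # Medium RPI, high OEPR: decide on second-half scoring margin.
    return "R3" if a2hsm >= a2hsm_cut else "R4"


# Hypothetical team statistics and a placeholder A2HSM cut of 0.0.
print(r_side_group(rpi=0.63, oepr=0.95, a2hsm=-1.2, a2hsm_cut=0.0))
```

Reading the tree as nested rules like this makes it clear why R4 is such a small group: a team has to land in the medium RPI band, allow a high OEPR, and fall short on second-half scoring margin all at once.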

Larger RPI Teams In 2012

So what about this year’s teams? There are 6 teams in the tournament this year that fall into group R1. These teams are Kentucky (1), Michigan State (1), North Carolina (1), Syracuse (1), Kansas (2), and Duke (2). These are the high Overall RPI teams that almost always advance to the Sweet Sixteen.

Group R2 only contains one team this year, Ohio State (2). Recall that in the last five years, all nine of the teams from group R2 have advanced to the Sweet Sixteen.

Group R3, which has advanced 51.61% of the time over the past five years, includes eight teams this year: Missouri (2), Marquette (3), Baylor (3), Georgetown (3), Indiana (4), Wichita State (5), and Memphis (8).

Group R4 consists of Florida State (3) and Michigan (4). No team from group R4 has advanced to the Sweet Sixteen in the last five years. We’ll see if either the Seminoles or the Wolverines can snap the losing streak for R4.

Smaller RPI Teams From The Past

While there are some interesting teams with large RPI, the real excitement of March comes from the small RPI guys. If we look at the small RPI teams, the first split is based on Assist to Turnover Ratio (ATR). There have been 206 smaller RPI teams with low ATR (Group L1), and only 11 (5.34%) of these teams have advanced to the Sweet Sixteen. The eleven teams that did advance out of this group were USC and Oregon in 2007, Villanova and Western Kentucky in 2008, Arizona in 2009, Washington, Michigan State, and Xavier in 2010 and Florida State, VCU, and Butler in 2011.

If we move in the other direction on the tree, to small RPI, large ATR teams, we make one final split based on Opponents Percent of Points from 2 (OPP2). There have been 28 teams on one side of this OPP2 split (Group L2), but only 5 of those (17.86%) have advanced to the Sweet Sixteen: Washington State in 2008, Purdue in 2009, St. Mary’s in 2010, and Richmond and Marquette in 2011. Teams on the other side of the OPP2 split (Group L3) have advanced 8 out of 12 (66.67%) times in the last five years. These 12 teams were Butler, Washington State, and Vanderbilt in 2007, West Virginia, Michigan State, and Davidson in 2008, Gonzaga and Arizona State in 2009, and Minnesota, Ohio State, UNLV, and Cornell in 2010. Of these 12, the four not to advance to the Sweet Sixteen were Washington State, Arizona State, Minnesota, and UNLV.
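With all seven end groups now described, their historical records can be collected in one place and checked against the first split. The counts are the ones quoted in this post; the dictionary layout and the helper name side_totals are just one hypothetical way to organize them.

```python
# (teams that advanced, teams in group) for 2007-2011, as quoted in the article.
leaves = {
    "R1": (31, 34),
    "R2": (9, 9),
    "R3": (16, 31),
    "R4": (0, 9),
    "L1": (11, 206),
    "L2": (5, 28),
    "L3": (8, 12),
}


def side_totals(side):
    """Sum (advanced, teams) over every group whose name starts with `side`."""
    advanced = sum(a for name, (a, n) in leaves.items() if name.startswith(side))
    teams = sum(n for name, (a, n) in leaves.items() if name.startswith(side))
    return advanced, teams


for name, (a, n) in leaves.items():
    print(f"{name}: {a}/{n} = {100 * a / n:.2f}%")

print("R side:", side_totals("R"))
print("L side:", side_totals("L"))
```

The side totals come out to 56 of 83 for the large RPI groups and 24 of 246 for the small RPI groups, which matches the counts reported for the first split.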

Smaller RPI Teams In 2012

So what about the small RPI teams this year? The tournament this year features nine teams in group L2: St. Mary’s (7), Florida (7), Notre Dame (7), Creighton (8), Purdue (10), California (12), South Dakota State (14), Belmont (14), and Iona (14).

The L3 group contains one team this year: BYU (14). Eight of the 12 teams in this group over the past five years have gone on to the Sweet Sixteen. However, a BYU run to the Sweet Sixteen this year seems unlikely, as they are a 14 seed and have to win a play-in game just to get into the round of 64. I think it’s interesting that all these fourteen seeds fall into these categories with relatively high probabilities of advancing to the Sweet Sixteen.

All of the remaining teams fall into the L1 category, which has advanced a little over 5% of its teams to the Sweet Sixteen. Some of the notable teams that fall into this group include Wisconsin (4), New Mexico (5), Temple (5), Vanderbilt (5), Murray State (5), Cincinnati (6), UNLV (6), and San Diego State (6). Also in this group are Gonzaga (7), Kansas State (8), Iowa State (8), Alabama (9), Saint Louis (9), UConn (9), and Southern Mississippi (9).

So the moral of the story is don’t get too excited about Florida State (even though they just won the ACC) or Michigan, and if you’re looking to pick an impressive upset, just about any of the fourteen seeds will do.

Good luck picking your brackets, and my condolences when you inevitably lose to a person who chose their bracket based on team colors. You’ll get ‘em next year.