Last week, I introduced my new and improved similarity scores model. The model allows us to examine the statistical profiles of teams from the last nine years in order to find the most similar teams to the 2012 contenders.
This week, I’m attempting to improve its accuracy by using historical data that includes only games played up to this date in previous seasons. So I’ll be comparing the current stats of the top four teams in the Predictive Power Rankings to previous teams’ stats from mid-February.
This poses an interesting question. Which comparison is more revealing: how similar a team is to past teams 25 games into the season, or how similar a team is to past teams at the end of their seasons?
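To make the through-this-date idea concrete, here is a minimal Python sketch of the filtering step. It is not the model's actual code: the game-log layout, the field names, the made-up teams and margins, and the February 15 cutoff are all assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical game log. Teams, dates, and margins are made up for illustration.
# "season" is the calendar year in which the season ends.
GAMES = [
    {"season": 2010, "team": "Team A", "date": date(2009, 11, 20), "margin": 12},
    {"season": 2010, "team": "Team A", "date": date(2010, 2, 10), "margin": 8},
    {"season": 2010, "team": "Team A", "date": date(2010, 3, 5), "margin": -4},
    {"season": 2011, "team": "Team B", "date": date(2011, 2, 15), "margin": 3},
    {"season": 2011, "team": "Team B", "date": date(2011, 2, 16), "margin": 20},
]


def season_cutoff(season, month=2, day=15):
    """Cutoff date inside a given season (assumed here to be mid-February)."""
    return date(season, month, day)


def games_through_cutoff(games, month=2, day=15):
    """Keep only games played on or before the cutoff in their own season."""
    return [
        g for g in games
        if g["date"] <= season_cutoff(g["season"], month, day)
    ]


in_season = games_through_cutoff(GAMES)
for game in in_season:
    print(game["season"], game["team"], game["date"], game["margin"])
```

A team's in-season profile would then be computed from `in_season` only, while the full-season profile would use every row in `GAMES`.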
Through-This-Date Comparisons Should Improve Results
In principle, the two snapshots should not be too different. But when a team’s stats are outliers, the differences can be larger and more interesting. The concept of mean reversion tells us that playing more games will generally shrink a team’s deviation from the mean.
For instance, it is much more likely that Ohio State’s 77.8% defensive rebounding percentage, which is 3 standard deviations above the mean, will come down rather than go up. As such, when we compare Ohio State to teams that have only played games through the middle of February, we get much closer comparisons than when we compare with teams that have played the entire season:
Ohio State In-Season Similarity Scores
|North Carolina||2007||0.75||1||Elite Eight|
Ohio State Full Season Similarity Scores
As you can see, the difference between the two comparisons is stark. The Buckeyes’ rare combination of excellent defensive rebounding, taking care of the ball, and forcing turnovers is more common in mid-February than it is at the end of the season. The other aspect to consider is the SOS adjustment: Ohio State’s in-season comps went further in March and played tougher schedules down the stretch, which somewhat depressed their stats. It is reasonable to assume that Ohio State, too, will play a tougher schedule in its last 10 games than it did in its first 25. In fact, the Buckeyes have the toughest Future SOS in the entire country. Thus, for a team like Ohio State, the mid-season comparisons clearly make more sense.
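The mean-reversion argument can be checked with a small simulation. This is a sketch under stated assumptions, not a fit to real data: the league mean, the spread of true talent, and the game-to-game noise are invented numbers, and each team's stat is modeled as a simple normal draw around its talent. Only the 25-game snapshot and 35-game season come from the discussion above.

```python
import random

LEAGUE_MEAN = 0.68   # assumed league-average defensive rebounding rate
TALENT_SD = 0.03     # assumed spread of true team talent
GAME_SD = 0.10       # assumed game-to-game noise
SNAPSHOT, FULL = 25, 35  # games at the snapshot and at season's end


def average(values):
    return sum(values) / len(values)


def simulate(n_teams=2000, seed=2012):
    """Return (snapshot average, full-season average) for each simulated team."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_teams):
        talent = rng.gauss(LEAGUE_MEAN, TALENT_SD)
        games = [rng.gauss(talent, GAME_SD) for _ in range(FULL)]
        rows.append((average(games[:SNAPSHOT]), average(games)))
    return rows


def outlier_gaps(rows, top_fraction=0.05):
    """Average gap from the league mean for the teams that looked most extreme
    at the snapshot, measured at the snapshot and again at season's end."""
    k = max(1, int(len(rows) * top_fraction))
    top = sorted(rows, key=lambda r: r[0], reverse=True)[:k]
    gap_snapshot = average([r[0] for r in top]) - LEAGUE_MEAN
    gap_full = average([r[1] for r in top]) - LEAGUE_MEAN
    return gap_snapshot, gap_full


gap_25, gap_35 = outlier_gaps(simulate())
print(f"gap after {SNAPSHOT} games: {gap_25:.4f}")
print(f"gap after {FULL} games: {gap_35:.4f}")
```

Under these assumptions, the teams that look most extreme after 25 games sit closer to the league mean after 35, which is why an outlier mid-season profile tends to match other mid-season profiles better than full-season ones.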
Kentucky In-Season Similarity Scores
Kentucky Full Season Similarity Scores
Kentucky’s top comparisons are more stable than Ohio State’s. The 2004 Connecticut team is the 5th-closest full-season comp. Big Blue’s in-season similar teams are fantastic: Kansas 2010 was the best team in the country before its shock loss to Northern Iowa, and UConn 2006 was a juggernaut before it lost in overtime to that charmed George Mason team. All indications are that Kentucky will be one of the top favorites entering March; can they avoid the upsets that plagued some of their closest comps?
Syracuse In-Season Similarity Scores
Syracuse’s top comparison is a bit surprising. The 2010 Maryland squad was terrible on the defensive boards, but it took care of the ball exceptionally well and, crucially for its similarity to the 2012 Orange, had a highly variable offense. Syracuse has been under 1.00 PPP five times this year, winning four of those games, and over 1.30 PPP four times. The Terrapins were beaten at the buzzer by eventual Final Four entrant Michigan State, so ’Cuse fans shouldn’t panic at this comparison.
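For readers who want to check offensive variability themselves, here is a small Python sketch that counts low and high games by points per possession (points divided by possessions). The game totals below are invented for illustration and are not Syracuse's actual results; the 1.00 and 1.30 thresholds are the ones used above.

```python
from statistics import pstdev

# Hypothetical (points, possessions) pairs. Not real game data.
GAME_TOTALS = [(62, 68), (85, 64), (70, 70), (58, 66), (90, 67), (75, 69)]


def points_per_possession(points, possessions):
    return points / possessions


ppp_by_game = [points_per_possession(p, poss) for p, poss in GAME_TOTALS]

low_games = sum(1 for ppp in ppp_by_game if ppp < 1.00)   # strictly under 1.00
high_games = sum(1 for ppp in ppp_by_game if ppp > 1.30)  # strictly over 1.30
spread = pstdev(ppp_by_game)  # one simple measure of game-to-game variability

print(f"games under 1.00 PPP: {low_games}")
print(f"games over 1.30 PPP: {high_games}")
print(f"standard deviation of PPP: {spread:.3f}")
```

The standard deviation is only one possible way to summarize variability; whether the model uses it is not something this sketch can confirm.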
Kansas In-Season Similarity Scores
|George Mason||2006||0.23||11||Final Four|
The Jayhawks have the least extreme stats of the four top teams in the Power Rankings, with only one stat (defensive eFG%) more than 1.5 standard deviations from the mean. The George Mason comparison is fascinating. We all know what the Patriots did in March, and Kansas will have the advantage of a much easier potential road to the Final Four. The 2009 Kansas comparison is also interesting; that team had more depth than this edition of the Jayhawks, but statistically the duo of Sherron Collins and Cole Aldrich is fairly similar to the duo of Tyshawn Taylor and Thomas Robinson.
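The "standard deviations from the mean" language used throughout can be sketched as a z-score calculation. The stat names, team values, league means, and league standard deviations below are all made-up placeholders, not Kansas's real numbers and not the model's real inputs; the 1.5 threshold is the one mentioned above.

```python
def z_scores(team, league_mean, league_sd):
    """Express each stat as the number of standard deviations from the mean."""
    return {
        stat: (team[stat] - league_mean[stat]) / league_sd[stat]
        for stat in team
    }


def extreme_stats(zs, threshold=1.5):
    """Names of stats more than `threshold` standard deviations from the mean."""
    return sorted(stat for stat, z in zs.items() if abs(z) > threshold)


# Hypothetical profile and league figures, for illustration only.
team = {"def_efg_pct": 42.0, "off_reb_pct": 35.0, "turnover_pct": 19.0}
league_mean = {"def_efg_pct": 49.0, "off_reb_pct": 32.5, "turnover_pct": 20.5}
league_sd = {"def_efg_pct": 3.5, "off_reb_pct": 4.0, "turnover_pct": 2.5}

zs = z_scores(team, league_mean, league_sd)
flagged = extreme_stats(zs)
print(zs)
print("stats beyond 1.5 SD:", flagged)
```

Putting every stat on the same standard-deviation scale is what lets a rebounding percentage and a turnover rate be compared side by side when judging how unusual a team's profile is.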