Last year, we tried to predict the NBA 2016/2017 regular season MVP. We presented the results of our MVP data analysis, which was conducted to determine who had the best chance of winning the award last season. We combined data from basketball-reference.com into a dataset with the most relevant regular and advanced statistics for every player who received at least one vote in MVP voting between 1979/1980 (the introduction of the three-point shot) and 2015/2016. We used multiple linear regression to predict the number of MVP votes/points. Russell Westbrook won the MVP and our model was clearly wrong, but no statistical model is perfect: we established that multiple linear regression is useful for identifying the best candidates (i.e. All-NBA teams), but that another method should be applied to predict the winner. In this blog post, we present our binomial logistic regression prediction.
Last year we used the following standard and advanced basketball statistics in our regression model: Win Shares, PTS, PER (Hollinger), Offensive and Defensive Rebound Percentages, AST%, STL%, BLK%, Defensive Box Plus-Minus, Games Played, Minutes, Field-Goal Percentage, Three-Point Field-Goal Percentage, Free-Throw Percentage, Winning Percentage (team), True Shooting Percentage, Three-Point Attempt Rate, Free-Throw Rate, Usage Rate, and Offensive, Defensive and “Total” Box Plus-Minus. In the end, our results suggested the following formula for calculating MVP voting points:
log10(1000 * Share of maximum possible number of MVP points) = -3.172 (Constant) + 2.344 * Winning Percentage + 0.053 * Minutes Played per Game + 0.077 * PER + 0.026 * Turnover Percentage - 0.638 * Three-Point Percentage + 1.106 * Free-Throw Percentage - 1.569 * True Shooting Percentage
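To make the formula concrete, here is a minimal Python sketch that evaluates it and inverts the logarithm: since the left-hand side is log10(1000 * share), the predicted share is 10^y / 1000, where y is the right-hand side. The input units are our assumption (winning, three-point, free-throw and shooting percentages as fractions such as 0.35; turnover percentage on a 0-100 scale), the function names are hypothetical, and the stat line in the usage example is made up for illustration, not a real player's.

```python
import math

# Coefficients of last year's multiple linear regression formula.
CONSTANT = -3.172
COEFFICIENTS = {
    "win_pct": 2.344,           # team winning percentage (assumed fraction, e.g. 0.70)
    "minutes_per_game": 0.053,
    "per": 0.077,               # Hollinger PER
    "tov_pct": 0.026,           # turnover percentage (assumed 0-100 scale, e.g. 12.5)
    "three_pct": -0.638,        # three-point percentage (assumed fraction)
    "ft_pct": 1.106,            # free-throw percentage (assumed fraction)
    "shooting_pct": -1.569,     # shooting-percentage term (assumed True Shooting %, fraction)
}


def predicted_log_points(stats: dict[str, float]) -> float:
    """Right-hand side of the formula, i.e. the predicted log10(1000 * share)."""
    return CONSTANT + sum(coef * stats[name] for name, coef in COEFFICIENTS.items())


def predicted_share(stats: dict[str, float]) -> float:
    """Invert log10(1000 * share) = y, giving share = 10**y / 1000."""
    return 10 ** predicted_log_points(stats) / 1000


# Hypothetical stat line, for illustration only.
example = {
    "win_pct": 0.70, "minutes_per_game": 35.0, "per": 30.0, "tov_pct": 15.0,
    "three_pct": 0.35, "ft_pct": 0.85, "shooting_pct": 0.60,
}
print(f"predicted share of max MVP points: {predicted_share(example):.3f}")
```

Nothing in the formula caps the result at 1, so extreme inputs can yield a predicted share above 100%; the formula is best read as a way to rank candidates rather than as an exact vote count.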
This year, we used this formula to predict the top five candidates in MVP voting. The chart below shows the results.
James Harden seems to be the clear favourite to win it this year, after Russell Westbrook easily beat him in 2017 with his record-breaking triple-double numbers. This time, however, we will develop a different model, a binomial logistic regression, which can be used to predict the probability that a given player will win the MVP award based on his regular season stats.
We had to be careful about which players from the MVP voting between 1979/1980 and 2015/2016 to include, since players with just a few votes are largely irrelevant at this stage. That is why we removed all players who collected less than 50% of the maximum possible points from journalists. For example, if a player had received 100 (all) third-place votes (5 points each), he would have collected exactly 50% of the possible points, which seems a reasonable threshold. We ended up with a sample size of 74, made up of 37 MVPs (value 1) and 37 non-MVPs (value 0). The analysis returned the following function:
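A minimal sketch of this selection rule in Python. We assume the standard 10-7-5-3-1 points scale for first- through fifth-place votes; the text above only states that a third-place vote is worth 5 points. The function names are hypothetical, and the number of voters is passed in per season because it is not fixed across all years.

```python
# Points per ballot position (1st..5th place). ASSUMPTION: the standard
# 10-7-5-3-1 NBA MVP scale; only "3rd place = 5 points" is stated in the text.
POINTS_BY_PLACE = (10, 7, 5, 3, 1)


def vote_share(place_counts: list[int], num_voters: int) -> float:
    """Points received divided by the maximum possible (all first-place votes)."""
    points = sum(p * n for p, n in zip(POINTS_BY_PLACE, place_counts))
    return points / (POINTS_BY_PLACE[0] * num_voters)


def keep_candidate(place_counts: list[int], num_voters: int, threshold: float = 0.5) -> bool:
    """Keep a player-season unless it collected less than `threshold` of the maximum."""
    return vote_share(place_counts, num_voters) >= threshold


# 100 voters, every one of them casting a third-place vote for this player.
print(vote_share([0, 0, 100, 0, 0], 100), keep_candidate([0, 0, 100, 0, 0], 100))
```

The `>=` comparison mirrors the rule that only players with less than 50% are removed, so the boundary case from the example above (exactly 50%) stays in the sample.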
Logit = -46.487 (Constant) + PTS*0.667 + FG%*(-22.157) + WIN%(Team)*28.126 + PER*1.209 + FTrate*(-11.026) + ORB%*0.517 + DRB%*(-0.510) + STL%*(-2.767) + BLK%*(-1.611) + TOV%*0.910 + OBPM*(-1.097) + DBPM*1.585
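A minimal Python sketch of evaluating this logit for one player-season. The dictionary keys, the function name and the input units are our assumptions, not part of the original analysis: FG%, WIN% and FTr are assumed to be fractions (e.g. 0.45), while ORB%, DRB%, STL%, BLK% and TOV% are assumed to be on basketball-reference's 0-100 scale. If the fitted model used different units, the coefficients would not apply as written.

```python
# Coefficients of the fitted logistic regression (logit) function above.
LOGIT_CONSTANT = -46.487
LOGIT_COEFFICIENTS = {
    "PTS": 0.667,    # points per game
    "FG%": -22.157,  # assumed fraction
    "WIN%": 28.126,  # team winning percentage, assumed fraction
    "PER": 1.209,
    "FTr": -11.026,  # free-throw rate, assumed fraction
    "ORB%": 0.517,   # rebound/steal/block/turnover rates assumed on a 0-100 scale
    "DRB%": -0.510,
    "STL%": -2.767,
    "BLK%": -1.611,
    "TOV%": 0.910,
    "OBPM": -1.097,
    "DBPM": 1.585,
}


def mvp_logit(stats: dict[str, float]) -> float:
    """Evaluate the logit for one player-season; all 12 statistics are required."""
    missing = LOGIT_COEFFICIENTS.keys() - stats.keys()
    if missing:
        raise KeyError(f"missing statistics: {sorted(missing)}")
    return LOGIT_CONSTANT + sum(
        coef * stats[name] for name, coef in LOGIT_COEFFICIENTS.items()
    )
```

Because the logit is linear, each coefficient is the change in log-odds per unit of that statistic with everything else held fixed; for example, one extra point per game adds 0.667 to the logit.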
Since the logit is the natural logarithm of the odds, it converts to a probability as follows:
Odds = exp(Logit)
Probability = Odds / (1 + Odds)
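The two conversion formulas translate directly into Python. This is a minimal sketch with hypothetical function names; it computes exactly Odds / (1 + Odds), but uses the algebraically equivalent form 1 / (1 + exp(-Logit)) for non-negative logits so that `math.exp` cannot overflow on very large values.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Odds = exp(logit); probability = odds / (1 + odds).

    For logit >= 0 the equivalent 1 / (1 + exp(-logit)) is used, so
    math.exp only ever receives a non-positive argument and cannot overflow.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    odds = math.exp(logit)
    return odds / (1.0 + odds)


def probability_to_logit(probability: float) -> float:
    """Inverse transformation: logit = ln(p / (1 - p)), for 0 < p < 1."""
    return math.log(probability / (1.0 - probability))
```

Going the other way is a useful sanity check: a probability of 99.7% corresponds to odds of about 332 to 1 (0.997 / 0.003), i.e. a logit of roughly 5.8.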
In the end, we calculated probabilities for the top 5 players:
We probably don’t need to emphasize how big a surprise it would be if neither Harden nor LeBron won it this year. Based on historical data, James Harden is the clear favourite, with an estimated probability of 99.7%. And we all know that LeBron’s best performances usually come in the playoffs, which don’t count in MVP voting anyway. But once again, we will have to see whether journalists in 2018 agree with projections based on modelling of historical data.