Just before Draft Night, a handful, or maybe even dozens, of mock drafts get published online. Some are based on real insider knowledge, some seem to be heavily influenced by those insider-informed mocks, some rely on media reporting, and some are probably just guessing in the dark. We, on the other hand, decided to test an alternative approach: using Big Data to predict the first ten draft picks in 2018. Which, technically, is also guessing in the dark. For now at least.
So what is Big Data? When can we say that certain data is big enough to be called Big? Wikipedia tells us that “Big data is data sets that are so voluminous and complex that traditional data-processing application software is inadequate to deal with them.” According to that definition, Facebook data, Twitter data or Google search data could fall within the concept of Big Data. To extract information from Twitter or Facebook, there are several statistical software packages that can do it for you pretty much automatically, once the code is written, of course. Analysing mentions of players, hashtags, or reposts by NBA club employees and associates might be an interesting way to predict which player they like the most. However, we decided to use public Google Trends data to do almost the same thing.
Google Trends chart, search frequency in the US in the last 24 hours (blue=Ayton, yellow=Doncic):
Instead of asking ourselves “Which young player does the club like the most?”, the question we could answer using those data was “Which young player are fans from the club’s area most interested in?”, on the assumption that the two are to some degree related. So how did we do it? We used the order from the latest mock draft on NBA.com, i.e. the Consensus Mock Draft, which is a compilation of the 10 best mock drafts around the web. Then we checked how often those players were searched for on Google, relative to total search volume, in various US states. We followed these principles:
- We analyzed Google search data for the past 7 days.
- We checked Google search data for the states in which the clubs picking in the draft were located (e.g. Suns, Phoenix, Arizona).
- We compared relative search scores for the players whom the Consensus Mock Draft projected within two picks of that team’s pick (so the probable no. 1 pick Ayton appeared in the comparisons for picks 1-3, and Doncic, no. 3 in the Consensus Mock Draft, appeared in the comparisons for picks 1-5).
- The player with the highest share of all searches among the compared players got “Big Data Drafted”, unless he had already been selected with an earlier pick.
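For the curious, the principles above can be sketched in a few lines of Python. This is only a rough illustration of the idea, not the exact procedure we ran: the function name `big_data_mock`, the player and state labels, and all the search scores are made up for the example, and the fallback for an empty comparison window is our own assumption.

```python
from typing import Dict, List, Tuple


def big_data_mock(
    consensus: List[str],
    states: List[str],
    searches: Dict[str, Dict[str, float]],
    window: int = 2,
) -> List[Tuple[str, float]]:
    """Return (player, search share) for each pick, in pick order.

    consensus[i] is the player projected at pick i + 1, states[i] is the
    state of the team holding pick i + 1, and searches[state][player] is
    that player's relative search score in that state.
    """
    drafted: List[str] = []
    picks: List[Tuple[str, float]] = []
    for i, state in enumerate(states):
        # Compare the players projected within `window` picks of this one.
        candidates = consensus[max(0, i - window): i + window + 1]
        scores = searches.get(state, {})
        total = sum(scores.get(p, 0.0) for p in candidates)
        available = [p for p in candidates if p not in drafted]
        if available:
            # Highest share wins; ties go to the better consensus rank.
            best = max(available, key=lambda p: scores.get(p, 0.0))
            share = scores.get(best, 0.0) / total if total else 0.0
        else:
            # Assumption (not in the post): if everyone compared is gone,
            # take the best-ranked consensus player still on the board.
            best = next(p for p in consensus if p not in drafted)
            share = 0.0
        drafted.append(best)
        picks.append((best, share))
    return picks


# Hypothetical inputs: placeholder names and invented search scores.
consensus = ["Player A", "Player B", "Player C", "Player D"]
states = ["State 1", "State 2"]
searches = {
    "State 1": {"Player A": 60, "Player B": 25, "Player C": 15},
    "State 2": {"Player A": 20, "Player B": 19, "Player C": 51, "Player D": 10},
}
picks = big_data_mock(consensus, states, searches)
for number, (player, share) in enumerate(picks, start=1):
    print(f"Pick {number}: {player} ({share:.0%} of compared searches)")
```

Note that the share is computed over all compared players, including those already drafted, while only the players still on the board can be selected. The consensus list needs to hold at least as many players as there are picks.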
The results can be seen in the table below:
As we can see, the Big Data based mock draft differs from the Consensus Mock Draft. In Arizona, Ayton was well ahead in the number of Google searches. However, in California, where the Kings are located, Luka Doncic was actually ahead of both Deandre Ayton and Marvin Bagley III (with 51% of the sum of all Google searches for Ayton, Bagley, Doncic and Jackson). The model returned a couple of surprising results. No. 1: Michael Porter Jr. is much more popular than his projected pick number suggests; in our mock draft he goes to the Mavericks with the 5th pick. No. 2: Trae Young and Collin Sexton won’t be drafted in the top 10. Their “search popularity” was really low in the last week, almost concerning for GMs who would like to fill up the stadium every home game.
This mock draft is an exciting way of predicting picks, but we will have to see in practice how accurate it is compared to insiders’ mock drafts. Its advantage, however, is that it can predict the draft not only statically but also (to some extent) dynamically. Let’s say that some team without a lottery pick traded for the Kings’ 2nd pick and decided to select Mohamed Bamba. We could then predict who would go number 3; in this case, it would be Doncic. Last but not least, since this mock draft differs from the Consensus Mock Draft, it doesn’t seem to be a linear combination of other mock drafts, but is probably a combination of mocks, national media reporting, local media reporting, and materials posted online (e.g. YouTube videos). Plus (quite?) some unexplained variance, which might even help the accuracy of this approach. Or not. We will figure it out in less than an hour!