Don’t blindly trust Big Data
The word ‘prediction’ has triggered an uncomfortable feeling in me ever since I read Nassim Nicholas Taleb’s book ‘The Black Swan: The Impact of the Highly Improbable’. The author demonstrates the astonishingly bad track record of human predictions and elaborates on the philosophical, statistical and psychological issues that make it impossible for us to predict. McAfee and Brynjolfsson (2012), however, state in an article that was required reading for this course that ‘using big data leads to better predictions, and better predictions yield better decisions’. I would call these two contradictory points of view.
Interestingly enough, Taleb recently wrote an article in Wired magazine about the flaws of Big Data entitled ‘Beware the Big Errors of Big Data’. His view is that Big Data gives us more information, but also more false information. Consequently, he argues, the sheer amount of data makes it easy to find statistically significant relationships between variables, even where no real relationship exists. We humans like the simplicity of thinking that a certain effect has one or at least a few understandable causes, a phenomenon Taleb calls the narrative fallacy. To bring this blog back in line with our course, let’s consider the example of taking Twitter activity as the basis for predicting stock markets, as mentioned in Luo et al.’s (2013) article. The article states that Twitter activities are ‘significant leading indicators of firm equity’. Just like the student 322165pk before me, I doubt this relationship. People are self-aggrandizing and therefore not honest on Twitter. To base predictions on this sort of data seems careless. Nevertheless, my key concern is not the quality of the collected data but Big Data itself, which draws on sources such as Twitter. Big Data makes it virtually impossible not to find relationships like ‘Twitter predicts the stock market’. The researcher can easily fall for confirmation bias, that is, the tendency to search for information that confirms one’s own point of view whilst disregarding contrary or opposing data.
To visualize this point, please have a look at the following graph. It shows that the more variables and information we have, the more false correlations we will obtain.
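The same effect can be reproduced with a small simulation. What follows is a minimal sketch of my own, not Taleb’s analysis: it generates variables that are pure, independent noise and counts how many pairs nevertheless look ‘statistically significant’. The function names `pearson` and `count_spurious`, the sample size of 50, the 5% significance level and the use of the Fisher z-approximation for the test are all assumptions chosen for illustration.

```python
import itertools
import math
import random
from statistics import NormalDist


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(num_variables, num_observations=50, alpha=0.05, seed=0):
    """Count pairs of independent noise variables that pass a significance test.

    Every variable is independent Gaussian noise, so any 'significant'
    correlation found here is false by construction.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(num_observations)]
            for _ in range(num_variables)]
    # Two-sided critical value of the standard normal (about 1.96 for alpha=0.05).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for a, b in itertools.combinations(data, 2):
        r = pearson(a, b)
        # Fisher z-transform: approximately standard normal when the true correlation is 0.
        z = math.atanh(r) * math.sqrt(num_observations - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits


if __name__ == "__main__":
    for k in (5, 20, 80):
        pairs = k * (k - 1) // 2
        print(f"{k} variables, {pairs} pairs, {count_spurious(k)} 'significant' pairs")
```

The number of pairs grows as k(k-1)/2, so with a 5% threshold one should expect roughly one in twenty pairs of pure noise to pass the test. Going from 5 to 80 variables therefore turns a handful of candidate relationships into thousands, and the count of false ones rises with them.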
Obviously, as a BIM student I would like to know why I should learn about Big Data if I am that doubtful about its predictive abilities. Big Data can show how people use your products in ways that you did not expect. For example, services like Tumblr started out with a focus on erotica, and later the makers discovered that their site was mainly used for other, less wicked purposes. Generally speaking, it can serve as a tool for self-measurement, as Mouton puts it, and tell us the things we do not know instead of confirming the things that we would like to know.
I hope I have sparked some interest in the author and his books ‘The Black Swan’, ‘Fooled by Randomness’ and the freshly released ‘Antifragile’. If you are interested, you should join me in Amsterdam on the 4th of October, as Nassim Taleb is holding a lecture in the Pathé Tuschinski. [https://www.nexus-instituut.nl/en/events/137-nassim-nicholas-taleb] Get your tickets!
McAfee, A., and Brynjolfsson, E. 2012. Big Data: The Management Revolution. Harvard Business Review 90(10) 60-68.
Luo, X., Zhang, J., and Duan, W. 2013. Social Media and Firm Equity Value. Information Systems Research 24(1) 146-163.