Don’t blindly trust Big Data


The word ‘prediction’ has triggered an uncomfortable feeling inside of me ever since I read Nassim Nicholas Taleb’s book ‘The Black Swan: The Impact of the Highly Improbable’. The author demonstrates the astonishingly bad track record of human predictions and elaborates on the philosophical, statistical and psychological issues that make it impossible for us to predict. McAfee and Brynjolfsson [1], however, state in an article that was required reading for this course that ‘using big data leads to better predictions, and better predictions yield better decisions’. Now that is what I would call two contradicting points of view.

Interestingly enough, Taleb recently wrote an article in Wired magazine about the flaws of Big Data, entitled ‘Beware the Big Errors of Big Data’ [2]. His view is that Big Data gives us more information, but also more false information. Consequently, he argues, the sheer amount of data makes it easy to find statistically significant relationships between variables even where none exist. We humans like the simplicity of thinking that a certain effect has one or at least a few understandable causes, a phenomenon Taleb calls Narrative Bias.

To bring this blog back in line with our course, let’s consider the example of taking Twitter activity as the basis for predicting stock markets, as mentioned in Luo et al.’s article [3]. The article states that Twitter activities are ‘significant leading indicators of firm equity’. Just like the student 322165pk before me, I doubt this relationship. People tend to be self-aggrandizing on Twitter and therefore not entirely honest. To base predictions on this sort of data seems careless. Nevertheless, my key concern is not the quality of the collected data but Big Data itself, whether it is sourced from Twitter or elsewhere. Big Data makes it virtually impossible not to find relationships like ‘Twitter predicts the stock market’. The researcher can easily fall for confirmation bias, that is, the tendency to search for information that confirms their own point of view while disregarding contrary or opposing data.

To visualize this point, please have a look at the following graph. It shows that the more variables and information we have, the more spurious correlations we will obtain.

[Figure: bigdatabigerrors_taleb, a graph of spurious correlations against the number of variables]
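For readers who would like to check this effect themselves, here is a minimal sketch in Python. It is my own illustration, not Taleb's calculation: it generates variables that are pure, independent noise and counts how many pairs still look ‘significantly’ correlated. The function names, the 30 observations per variable and the cut-off of 0.361 (roughly the two-sided 5% critical value of Pearson's r for 30 observations) are all assumptions chosen for the example.

```python
import math
import random
from itertools import combinations


def standardize(column):
    """Center a column and scale it to unit length, so a dot product gives Pearson's r."""
    mean = sum(column) / len(column)
    centered = [x - mean for x in column]
    norm = math.sqrt(sum(x * x for x in centered))
    return [x / norm for x in centered]


def count_spurious(n_vars, n_obs=30, threshold=0.361, seed=0):
    """Count pairs of independent noise variables whose |r| exceeds the threshold.

    All variables are unrelated by construction, so every hit is a false
    positive. The threshold is an assumed, approximate 5% cut-off for n_obs=30.
    """
    rng = random.Random(seed)
    columns = [
        standardize([rng.gauss(0, 1) for _ in range(n_obs)])
        for _ in range(n_vars)
    ]
    hits = 0
    for a, b in combinations(columns, 2):
        r = sum(x * y for x, y in zip(a, b))  # Pearson correlation of the pair
        if abs(r) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n_vars in (5, 20, 100):
        pairs = n_vars * (n_vars - 1) // 2
        print(n_vars, "variables,", pairs, "pairs,",
              count_spurious(n_vars), "spurious hits")
```

The number of pairs grows roughly with the square of the number of variables, so at a 5% cut-off one should expect about one in twenty of those pairs to pass by chance alone. With 100 unrelated variables that is already a few hundred ‘relationships’ waiting to be turned into a story.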

Obviously, as a BIM student I would like to know why I should learn about Big Data if I am that doubtful about its predictive abilities. Big Data can show how people use your products in ways that you did not expect. For example, services like Tumblr started out with a focus on erotica, and only later did the makers discover that their site was mainly used for other, less wicked purposes. Generally speaking, it can serve as a tool for self-measurement, as Mouton [4] puts it, and tell us the things we do not know instead of confirming the things that we would like to know.

I hope I have sparked some interest in the author and his books ‘The Black Swan’, ‘Fooled by Randomness’ and the freshly released ‘Antifragile’. If you are interested, you should join me in Amsterdam on the 4th of October, as Nassim Taleb is holding a lecture in the Pathé Tuschinski. [https://www.nexus-instituut.nl/en/events/137-nassim-nicholas-taleb] Get your tickets!

Nils

376314

[1] McAfee, A., and Brynjolfsson, E. 2012. Big Data: The Management Revolution. Harvard Business Review 90(10) 60-68.

[2] http://www.wired.com/opinion/2013/02/big-data-means-big-errors-people/

[3] Luo, X., Zhang, J., and Duan, W. 2013. Social Media and Firm Equity Value. Information Systems Research 24(1) 146-163.

[4] http://www.minyanville.com/sectors/technology/articles/Does-Big-Data-Have-Us-2527Fooled/8/19/2013/id/51346?refresh=1

http://www.fiercebigdata.com/story/big-data-predictions-fooled-randomness-and-subject-tmi/2013-08-21

3 responses to “Don’t blindly trust Big Data”

  1. lmoffereins says :

    Great post. I can’t help but notice that not only this topic but also other IT/BIM-related subjects are mostly psychological cases that require a deeper understanding of the human psyche before bluntly putting all the available technology in people’s hands.

  2. nilsbrosch says :

    Thanks! Indeed, the human psyche is more than relevant to using statistics and Big Data correctly. However, I don’t think we need more understanding of our own psychological processes, but rather a general scepticism towards large chunks of data. It is easy to feel more confident about your results when you have more data.

  3. 343347fl (Fabian Lanz) says :

    I really liked your post. After having read the work of Nassim Taleb myself, I think what stuck with me the most was, as he pointed out, the tendency we have to believe that just because we are able to make sense of something today we should have been able to predict it yesterday (the obvious solution is always clear when we look at it in hindsight). By doing so, we often misjudge the limits of our own forecasting abilities. Big Data is a great way of compiling information, but in the end it is people who use this information to make predictions. We as humans have a need to make sense of what seems random, so we try to look for patterns. Generally, what happens is that people overestimate the probability of unlikely events, and by doing this they inexorably overweight unlikely events in their decisions (see “Thinking, Fast and Slow” by Daniel Kahneman).
