Archive | September 10, 2013

Stock predictions using Twitter

Apparently I will kick off this year’s discussions with the first (critical) post! After hearing about Professor Li’s research on using sentiment analysis of tweets to predict the stock market, I wanted to share my own findings. I should say up front that I have not yet read Professor Li’s paper on this subject, so the points I make might not apply to that research.

I first heard about the concept of sentiment prediction in an episode of DWDD (a Dutch talk show) last year [1]. In the program, some young entrepreneurs talked about the company they founded. Their organization, which operates under the name SNTMNT, analyzes company-specific tweets and predicts the movement of that company’s stock price. Their business model was based on a thesis written by Laurens van Leeuwen [2]. The author, in turn, was inspired by the paper written by Bollen [3], the article Professor Li already mentioned this morning.

Of course, after seeing the interview, I wanted to replicate the models and create a predictor myself. However, after collecting data from Twitter and analysing it, I found it very hard to come even close to Bollen’s 87.6% accuracy on index trackers, and my company-specific predictions looked close to random. This is most likely due to my inexperience in Big Data analysis and the small dataset I had to work with (around 1,000,000 messages for the index tracker, and 55,000 tweets for the company-specific predictions), but it has left me quite sceptical about the whole concept, for a number of reasons:

1) The findings are very hard to replicate, because the actual methods and algorithms of the study are not available to the public
Bollen, for example, uses his self-developed “GPOMS” framework to categorize the tweets into certain classes. This framework, however, is nowhere to be found, even though it is crucial to the method. Of course, this is understandable, as a truly working predictor might be of enormous value. But this brings me to the next point.

2) Companies trading on these algorithms do not seem to take off
For example, Derwent Capital Markets, a fund established in 2011 in collaboration with Bollen, had to close earlier this year and has now relaunched to “rethink its strategy”. The company SNTMNT sells predictions, but does not seem to make outstanding profits either. If the claim of “outperforming the market by x%” held up in practice, both surely would.

3) The actual dataset for day-to-day trading seems to be too small for Big Data analysis
With about 25% of the tweets in my dataset being about $AAPL, and just a handful of messages for each of the other companies, it does not seem that much value can be extracted from the data for day-to-day prediction. Furthermore, many of the messages are plain spam. For example, the ticker symbol for the Walgreen Company is $WAG, which pulled in many tweets about the “swag” hype.
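
To make the $WAG problem concrete, here is a minimal sketch of how one might count ticker mentions while skipping the slang use. The tweets are made up for illustration, and the rule that only upper-case cashtags count as tickers is my own assumption; real spam will not always be this easy to separate.

```python
import re
from collections import Counter

# Assumption: a ticker is '$' plus 1-5 CAPITAL letters. Lower-case forms
# such as "$wag" are treated as slang, not as a stock symbol.
CASHTAG = re.compile(r"(?<![\w$])\$([A-Z]{1,5})(?![A-Za-z])")

def tickers(tweet):
    """Return the upper-case cashtags mentioned in a tweet."""
    return CASHTAG.findall(tweet)

# Made-up example tweets, not real data.
tweets = [
    "$AAPL beats expectations again",
    "Long $AAPL and $WAG into earnings",
    "got that $wag today",
    "so much $wag #yolo",
    "$AAPL looks overbought",
]

# Count mentions per ticker and compute each ticker's share of all mentions.
counts = Counter(t for tweet in tweets for t in tickers(tweet))
total = sum(counts.values())
share = {t: n / total for t, n in counts.items()}
print(counts, share)
```

Even in this toy sample one ticker dominates, and after filtering only a single usable $WAG message is left, which is the data-sparsity problem described above.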

4) The “prediction” is made ex-post
Because of the complexity of working in real time, predictions in research are often made after the data-collection period. The papers I have seen on this topic use the same dataset both to train and to test their models. This implies that the researchers are in effect predicting stock movements with foreknowledge, which might explain why the companies mentioned in 2) fail to make these models work in real life.
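
The effect of testing on the training data can be shown with a small sketch. Everything here is hypothetical: the "sentiment scores" and "market went up" labels are random numbers with no relationship at all, and the model is a simple nearest-neighbour memoriser that I picked for illustration, not the method of any of the papers mentioned.

```python
import random

def nearest_label(train, x):
    """Return the label of the training day whose score is closest to x."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(train, test):
    """Fraction of days in `test` that the memoriser labels correctly."""
    hits = sum(nearest_label(train, x) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)
# Hypothetical daily data: (sentiment score, market went up?), generated
# independently, so there is nothing real to learn.
days = [(rng.random(), rng.random() < 0.5) for _ in range(400)]

# Chronological split: fit on the first 200 days, evaluate on the next 200.
train, test = days[:200], days[200:]

in_sample = accuracy(train, train)     # tested on the data it was fitted on
out_of_sample = accuracy(train, test)  # tested on days it has never seen
print(f"in-sample: {in_sample:.2f}, out-of-sample: {out_of_sample:.2f}")
```

The in-sample score is perfect because every day finds itself as its own nearest neighbour, while the score on unseen days hovers around a coin flip. An honest evaluation would therefore fit the model only on days before the ones it is asked to predict.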

I, for one, do believe that stock prices are the result of people’s emotions and sentiment, rather than only historical data and factual knowledge. However, I doubt Twitter can, in its current state, make a significant contribution to market prediction, because of its random noise. Channels less afflicted by spam have long been available, so why has no one yet been able to set up a real company that runs on working algorithms?

Do you think one day we will be able to accurately predict stock prices using sentiment analysis? Will we be able to do so using social networking sites like Twitter? Is the stock market predictable, or will it always behave like a random, unpredictable sequence of events?

I would love to hear your views.