Tag Archive | Big Data

Detecting Anomalies in Large Data Sets

Data has become a common concern in recent years. Both companies and individuals handle information in many ways in order to improve operations or gain insight from them. IT has enabled unprecedented levels of data management for both parties. This blog, however, focuses on companies’ management of data, more specifically in the auditing sector.

Fraud has a long history in markets. I like to use Imtech as an example; for those who aren’t familiar with the company there’s no need to worry, I’m sure you have an example of your own. Returning to the topic of data and fraud: as the amount of available information grows, it is becoming increasingly difficult to accurately identify potential fraud cases. This has given rise to the use of computer algorithms to detect them (Pandit, Chau, Wang & Faloutsos, 2007).

An interesting way to tackle this challenge is to use mathematical laws for large numbers to detect anomalies within these data sets. One particularly interesting example is the application of Benford’s Law to detect fraud in company documentation (Kraus & Valverde, 2014). In short, Benford’s Law states that 30.1% of random, naturally occurring numbers start with a 1; 17.6% with a 2; 12.5% with a 3, and so on. Logically this makes sense given our counting structure. This can be expressed as

P(d) = log10(1 + 1/d),

where d is a digit in {1, 2, …, 9} and P(d) is the probability of a number starting with d.
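To make this concrete, here is a small, illustrative Python sketch (not taken from the cited paper) that compares the leading-digit distribution of a data set against Benford’s expectations; the 0.05 flagging threshold is an arbitrary choice for illustration:

```python
import math
from collections import Counter

def benford_expected(d):
    """Expected proportion of leading digit d under Benford's Law."""
    return math.log10(1 + 1 / d)

def leading_digit_distribution(values):
    """Observed proportions of leading digits 1-9 among non-zero numbers."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v != 0]
    total = len(digits)
    counts = Counter(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

def flag_anomalies(values, threshold=0.05):
    """Digits whose observed share deviates notably from Benford's Law."""
    observed = leading_digit_distribution(values)
    return [d for d in range(1, 10)
            if abs(observed[d] - benford_expected(d)) > threshold]
```

A data set of invoice amounts dominated by, say, amounts just below an approval limit would light up here, which is exactly the kind of anomaly an auditor would want to drill into.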

Although this method seems promising, Kraus and Valverde (2014) could not find any outstanding peculiarities in their data set of known fraud perpetrators. However, the law does serve as a starting point for a drill-down approach to discovering perpetrators. This brings us to the more strategic question of whether IT will ever develop a way to outsmart fraud perpetrators in this context. Is an eternal drill-down chase ever going to take the lead?

What do you think? Will this ever be the case? Is there any way you thought this might work out?

I think it’s pointless. Of course, as with everything, IT methods have their degree of accuracy. However, I firmly believe there will never be a way to completely ensure an honest and transparent market. Not long ago I heard a man say, “Does anybody here know what EBITDA stands for? Exactly. Earnings Before I Tricked the Dumb Auditor.” It’s human nature, and it might take millennia before that changes ever so slightly.

I’d like to say it was nice to write a couple blogs here, till the next time!


Kraus, C., & Valverde, R. (2014). A data warehouse design for the detection of fraud in the supply chain by using the Benford’s law. American Journal of Applied Sciences, 11(9), 1507-1518.

Pandit, S., Chau, D. H., Wang, S., & Faloutsos, C. (2007, May). Netprobe: a fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th international conference on World Wide Web (pp. 201-210). ACM.

Modern Wars: How Information Technology Changed Warfare


On the 30th of September Vladimir Vladimirovich Putin, President of Russia, announced that Russia had conducted its first air strikes in Syria, targeted at ISIS (or ISIL). In the days after, however, the United States and other countries began to question Russia’s motives and its use of old-school bombing technology, which might harm civilians and inflame the civil war in Syria (CNN/Time, 2015). According to US officials, Russian bombing technology lags behind American weaponry in terms of accuracy. As such moves increase tensions between East and West, and as businesses use information technology to reach their goals, I started to research how information technology has changed warfare over time.

A B-2 stealth bomber refuels.


The main goals of warfare have not really changed, but the way wars evolve and are waged certainly has. Just a hundred and twenty years ago, armies marched to battle in their uniforms, lined up against one another, and mainly used weapons with a short effective range. The people who killed one another were thus always in close proximity. Later on, longer-range weapons emerged, and the distance between soldiers grew larger and larger. Today, some countries have the capability to destroy towns without being physically at the site, or even within hundreds of miles of it. All this is due to the introduction of IT into modern warfare, which enables people to fight wars at the touch of a button. The instantaneous transfer of information through the Internet, and its availability around the world, increases the number of participants in war. Unarmed actors thousands of miles away can participate in a conflict by sitting at their computer, providing funding or (video or picture) information through the Internet or deep web.


Deep Learning: Teaching Machines To Act Human


Recently there has been an increase in news articles about AI: artificial intelligence. Some very smart people are concerned about the progress made in the field of advanced machine learning. Among them are serial entrepreneur Elon Musk, the famous researcher Stephen Hawking and legendary philanthropist Bill Gates. All of them signed an open letter expressing their concern about the future of AI. One trigger for the letter was a video in which Google-owned company Boston Dynamics recorded a trial run of its humanoid robot ‘Atlas’ running through the woods, among other recent advances in advanced machine learning.

What is advanced machine learning?

The field of machine learning in computer science has been around for a while. Starting during the Second World War, the first attempts were made to teach computers to learn and to behave like humans. The recent film about pioneer Alan Turing shows the origins of this research field. To this day, the Turing Test is still applied to evaluate whether a computer can be categorized as intelligent.

During the ’80s and early ’90s further attempts were made to teach computers to behave like humans. Early solutions weren’t practical because of the limited processing power available at the time. A lot of time has passed since then.

So what exactly is machine learning? It is, basically, teaching a computer to make sense of data: to recognize patterns in input values and gain insights from the process. Simple machine learning can be a regression analysis, or a classification of data into different categories based on a single value pair. Advanced machine learning, of which deep learning is a part, applies multiple analysis layers to big data sets. The first layer of algorithms looks only at certain parts of the data and delivers its output to an analysis layer further up the hierarchy, which performs more abstract calculations on the input from the lower layer and in turn delivers values to an even more abstract layer of algorithms. This structure models the human brain, imitating its network of neurons with its many (trillions of) synapses.
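As a toy illustration of this layered idea (not any production deep learning system), the following sketch stacks a few layers, each transforming the output of the layer below into a more abstract representation; the layer sizes and random weights are arbitrary:

```python
import random

def relu(values):
    """A simple non-linearity applied after each layer."""
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    """One layer: weighted sums of the previous layer's outputs plus a bias."""
    return relu([sum(w * x for w, x in zip(row, inputs)) + b
                 for row, b in zip(weights, biases)])

def forward(x, layers):
    """Pass the input through successive layers, bottom to top."""
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

random.seed(0)
def rand_layer(n_in, n_out):
    return ([[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)

# A tiny 4 -> 8 -> 8 -> 2 network: two "hidden" layers, one output layer
net = [rand_layer(4, 8), rand_layer(8, 8), rand_layer(8, 2)]
output = forward([0.5, -0.2, 0.1, 0.9], net)
```

Real deep learning differs mainly in scale (millions of weights, many layers) and in the fact that the weights are *learned* from data rather than set at random, but the stacked-layer structure is exactly this.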

Tapping into the huge potential

Today many layers are applied to solve difficult data analysis problems, hence the name deep learning. With this methodology it is possible to teach a computer to analyze pictures, handwriting, speech, maps or even videos. In the future, applications that seem ‘magical’ will often be the result of some kind of deep learning. The applications are many: categorization of images, indexing of unlabelled data, analysis of maps, using big data from many sources to refine and improve prediction models, and so forth.

Facebook, Google, IBM and many start-ups already apply deep learning technologies to gain an edge in solving difficult problems. To date there is no computer that can, by itself, write a program that writes programs. But that day will come; it’s just a matter of time.

Is it dangerous? Maybe. But it can also do much good if applied correctly.

If you’re interested in deep learning, here are some very interesting companies applying this cutting edge technology:

Have you heard about deep learning before? What do you think: Is it the future? Are you afraid of AI? I’m interested what you think so please leave a comment!




Is data the new ingredient for your refined glass of red wine?

Silicon Valley: home of the largest tech corporations and an entire startup ecosystem, a place where innovation does not seem to have limits. But what if you are the ‘lucky’ one who owns a successful startup and decides to sell it, resulting in, let’s say, 20 million dollars in your account? What to do then?

Then you start your own winery just 100 miles north, in one of the world’s premier wine regions. Of course, being a techie, you’re not just going to produce the best wine the old-fashioned way; you incorporate data, because that’s what all companies do nowadays, right?

Examples of such stories are Palmaz winery and Vineyard 29, both data-driven wineries that have proven to produce good-quality Cabernet Sauvignon and Chardonnay wines (Fine, 2011). Bandages with embedded sensors are attached to the vines to keep track of how much water the vines take in. This data is sent to a computer and analyzed to establish a perfect watering regime for the vines. Lately the wineries have even been experimenting with sensors attached to individual grapes that can measure water and acidity levels.

Fruition Sciences is one of the first tech startups to deliver a complete solution for wineries that want to adopt a data-driven approach (Bort & McLaughlin, 2013). Their sensors apply heat to the vine stem and measure the temperature before and after, to figure out how much water is in the plant. The sensors are solar powered and send this information directly to a server.
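As a rough illustration of the principle (Fruition Sciences’ actual sensors and calculations are proprietary and surely far more sophisticated; the function and baseline value below are invented): the more water moves through the stem, the faster a heat pulse is carried away, so a smaller temperature rise suggests a well-watered vine.

```python
def water_stress_index(temp_before, temp_after, baseline_rise=2.0):
    """Toy sap-flow proxy: convert the temperature rise after a heat pulse
    into a 0-1 stress index, where 0 means plenty of water flow (heat
    carried away quickly) and 1 means maximal stress (no flow).
    baseline_rise is an invented calibration constant."""
    rise = temp_after - temp_before
    return max(0.0, min(1.0, rise / baseline_rise))
```

A winery’s server could compute such an index per vine and per hour, and trigger irrigation whenever it creeps toward 1.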

Although this data-driven approach seems a big step forward, there are many more variables at work in the process of producing wine. It may not be the way to make the perfect wine, but it can be a way to enhance a wine’s quality.
I can’t say too much about the taste of a good glass of wine, but I can imagine this data-driven approach changing the game for wineries in the future. Being able to monitor exactly how your precious grapes are developing, and consequently produce high-quality wine, may be the next big thing.


Big data only matters if you have a question

Everybody is talking about “big data”. The possibilities for collecting, generating and storing data are endless. Because of its popularity, big data is also increasingly becoming a buzzword. It has taken on such a variety of meanings that it has become an unclear term if not further specified. When somebody talks about big data, chances are slim that everybody in the room is thinking about the same thing.


Today, every company feels the need to start using big data. Everybody is talking about it, more and more companies are doing it, hence we need to use it as well! But they often forget a very important thing:

…data on its own is meaningless. The value of data is not the data itself – it’s what you do with the data.

First you need to know what data you need and only after that you should start collecting. Because why would you put so much energy and effort in collecting data that you cannot use to deliver relevant business insights?

Many people start with the solution (“we need to use big data”) rather than with the problem. For a successful data strategy, you need to begin by defining the insights needed to identify pathways to growth; otherwise you will drown in all the available data. The focus should always lie on the questions, not on the solution. Big data can offer many answers, but it will always require a person to frame the question, identify the data that can provide an answer, and interpret the results. These results can then be used to create a strategy that adds value to your business.


For example, if you wish to enter the weight management market, the questions you need answers to might be: ‘How many people are overweight?’, ‘How many people are interested in losing weight?’, ‘What is the average income of these people?’, and so on. Identifying what needs to be done to collect this data will then be a lot easier.

Everybody now has the opportunity to use data; availability is no longer the issue. However, answers to questions that don’t matter won’t bring you any further. If you focus on the relevant questions first, and tackle them with big data, the power of data will be of great value to your company.

By: Melanie Pieters, 420914MP





Electronic Markets, Computing Power and the Quants: Volatility & High Frequency Trading


“Markets can be – and usually are – too active, and too volatile”

Joseph E. Stiglitz – Nobel prize-winning economist

As some of you might have noticed, the oil market is currently showing wilder fluctuations at a higher frequency than before: volatility has increased. This happened after the market enjoyed relative price stability during the last few years. Of course, this is partly due to U.S. shale oil production, high supply, lower demand in the aftermath of the financial crisis, and growing demand and supply uncertainties. However, another factor affecting volatility is the increased usage of trading indicators in combination with changes in trading practices: a growing number of players in the financial markets use algorithmic and high-frequency trading (HFT) practices.

Like other derivative-based markets, the crude oil market has a wide range of players, many of whom are not interested in buying physical oil. HFT traders are probably drawn towards oil futures because of the market’s volatility: the greater the price swings, the greater their potential profit. HFT is not an entirely new practice, but as technology evolves it is increasingly present in today’s electronic financial markets.

These players make extensive use of computing and information technology to develop complex trading algorithms, and are often referred to as the “quants”. HFT firms try to gain an advantage over competitors that still rely mostly on human intelligence and reaction times. The essence of the game is to use your algobots to get the quickest market access, the fastest processing speeds and the quickest calculations, in order to capture profits that would otherwise have been earned by someone processing market data more slowly (Salmon, 2014). At essentially the speed of light, these systems react to market data, transmit thousands of order messages per second, and automatically cancel and replace orders based on shifting market conditions, capturing price discrepancies with little human intervention (Clark & Ranjan, 2012). New trading strategies are formulated by capturing and recombining new information with large datasets and other forms of big data available to the market. The analysis performed to derive the assumed direction of the market uses a range of indicators, such as historical patterns, price behaviour, price corrections, resistance and support levels, and (moving averages of) trends and counter-trends. By aggregating all this information, the databases and the changes in their averages are usually a pretty good predictor of potential profits for HFT companies.
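To give a flavour of the simplest of these indicators, here is a toy Python sketch of a moving-average crossover signal; real trading systems are of course vastly faster and more complex, and the window sizes here are arbitrary:

```python
def moving_average(prices, window):
    """Simple moving average over a fixed window of past prices."""
    return [sum(prices[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(prices))]

def crossover_signal(prices, short=3, long=5):
    """Emit +1 (buy) when the short-term average crosses above the
    long-term one, -1 (sell) when it crosses below, and 0 otherwise."""
    short_ma = moving_average(prices, short)[long - short:]  # align lengths
    long_ma = moving_average(prices, long)
    signals = [0]
    for prev_s, prev_l, s, l in zip(short_ma, long_ma,
                                    short_ma[1:], long_ma[1:]):
        if prev_s <= prev_l and s > l:
            signals.append(1)
        elif prev_s >= prev_l and s < l:
            signals.append(-1)
        else:
            signals.append(0)
    return signals
```

An HFT system evaluates indicators like this (and far subtler ones) on streaming tick data and turns the signals into orders within microseconds; the logic itself, however, is no more mysterious than the comparison above.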

This IT-enabled way of trading is cheaper for those executing it, but imposes great costs on workers and firms throughout the economy. Although quants provide a lot of liquidity, they can also alter markets by placing more emphasis on technical patterns and by linking electronic markets with other markets (informationally as well as financially). In most cases, short-term, non-overnight strategies are used. These traders are thus in the market for quick wins, using only technical analysis to predict market movements instead of trading based on physical fundamentals, human intelligence or news.

Recent oil price volatility increased


Although some studies have found no direct proof that HFT causes volatility, others concluded that, in certain cases, HFT can transmit disruptions almost simultaneously across markets due to its high speed combined with the interconnectedness of markets (FT, 2011; Caivano, 2015). For example, Andrew Haldane, a top official at the Bank of England, said that HFT was creating systemic risks and that the electronic markets may need a ‘redesign’ in future (Demos & Cohen, 2011). Further sophistication of “robot” trading at decreasing cost is expected to continue for the foreseeable future. This can pose a threat to the stability of financial markets due to amplified risks, undesired interactions and unknown outcomes (FT, 2011). In addition, in a world with intensive HFT the acquisition of information is discouraged, as the value of information about stocks and the economy retrieved by human intelligence is much lower when robots have done all the work before a single human was able to process and act on it (Salmon, 2014). For those interested in the issues of HFT in more detail, I recommend the article by Felix Salmon (2014).

However, it is important to mention that HFT, automated systems and other technicalities do not cause all the volatility. Markets have known swift price swings for centuries. In the oil industry, for example, geopolitical risk can cause price changes, as oil is an exhaustible commodity. As most people know, human emotions can also distort markets, as can terrorist actions. Even incomplete information, such as tweets or Facebook posts, can nowadays cause shares to jump or plummet. As markets become faster, more information is shared, and systems can process and act on this information quickly thanks to (information) technological advancements, which in turn increases volatility. It is therefore more important than ever that there are no flaws in market data streams; the electronic markets and their information systems need enough capacity to process, control and display all the necessary information to market players in order to avoid information asymmetries.

In my opinion, HFT is strengthened by the current state of computing technology: cost reductions in computing power now enable the execution of highly complex algorithms in a split second. As prices go down and speed goes up, these systems will become more and more attractive as they outperform human intelligence. This can become an issue in the future: volatility might increase, and it is this volatility that provides many opportunities for traders, but not the stability needed by producers and consumers, who are more long-term focused.

Therefore, action will be necessary in future to restrict, or at least reduce, HFT. Examples might be big data collection by regulators to monitor risk and predict future flash crashes or volatility events. Another option is the introduction of a “minimum resting period” for trading, so that traders have to hold on to their equity or trade for a pre-specified time before selling it on, reducing the frequency of trades and thus volatility. Widening spreads would also help, as it makes quick selling and buying more costly and thus HFT less attractive.

Meanwhile, the financial markets’ watchdogs currently have difficulty regulating automated trading, and some HFT firms have enjoyed enormous profits from their trading strategies (Jump Trading, Tower Research Capital, DRW). During the market turmoil of last August, for example, a couple of HFT firms earned a lot of money (Hope, 2015). Due to these successes, new players are entering the market and competition is growing. As speed is essential (even milliseconds matter), HFT firms try to place their servers physically near the exchanges (such as the NYSE) to increase their advantage. HFT firms are expected to stay in the market, ultimately resulting in more price volatility (Hope, 2015).

What do you think: how far should we let technology intervene in the financial markets? Do we really need to allow algobots or similar automated trading systems to influence our financial markets because they can do the human job faster, fact-based and at lower cost? Or should the financial markets always be based on human intelligence, which might ultimately be better for the economy as a whole and also provides a richer knowledge base about the real-world economy (as this information remains valuable, and numbers do not always tell the whole story)?

In case you are interested in this dilemma, I can also recommend reading Stiglitz’ speech at the Federal Reserve Bank of Atlanta in 2014.

Author: Glenn de Jong, 357570gj


The Future of Working

The Internet of Things is currently a hot topic and has the potential to disrupt the way industries, and we as individuals, work and live (Gartner, 2015). Yet in our daily lives many of these technologies are not connected to each other to form a truly integrated system. There is, however, a new office building in Amsterdam called The Edge, which declares itself the world’s most sustainable office building.

The Edge, which opened its doors on 29 May 2015, is a multi-tenant office building that is far ahead of its time in terms of quality, sustainability and user comfort (The Edge). From the moment you wake up as an employee, you’re connected to the building. Your schedule is checked, and when you arrive at the office your car is recognized and you are directed to one of the parking spots with charging points for your electric vehicle. When you enter the building you are assigned to one of several types of workspace based on your schedule: a sitting desk, standing desk, work booth, meeting room, balcony seat or concentration room (Randall, 2015). Using this assignment technology, the office is able to provide 2,500 employees with a working space using only 1,000 desks. Once you arrive at your desk, the temperature and lighting change to your preferences. The office currently provides 39,673 square meters of floor space, of which 92.3% is rented, and there are more than 28,000 sensors in the whole building. Data is constantly stored and used for big data analysis to optimise every possible aspect of the building.

The British rating agency BREEAM gave The Edge the highest sustainability score ever awarded: 98.4 percent. Not only do employees experience a comfortable and stimulating working environment, the building also produces more electricity than it uses. Thermal energy is stored 130 meters below the ground and provides the required heating and cooling of the building: in the summer, when it is hot, warm water is stored and insulated below the ground, and in the winter this energy is used.

While this building gives us an insight into what the future of working might hold, the question is whether office buildings, and possibly homes, will look this way in the future. Or is this just a prestige project carried out by multiple partners? What aspects do you think will be implemented in offices in the near future? What are your thoughts on this?

Works Cited

Gartner. (2015). The Internet of Things Enables Digital Business. Retrieved September 27, 2015, from Gartner.com: http://www.gartner.com/technology/research/internet-of-things/

Randall, T. (2015, September 23). The Smartest Building in the World. Retrieved September 27, 2015, from Bloomberg Business: http://www.bloomberg.com/features/2015-the-edge-the-worlds-greenest-building/

The Edge. (n.d.). Info. Retrieved September 27, 2015, from The Edge: http://www.the-edge.nl/en/info

Amazon knows what you’re going to buy BEFORE you even push “buy” – they know you too well!

Everybody is well aware of the data-driven culture at Amazon and how they are utilizing Big Data in every direction to boost revenues. Now they are going down another avenue of their business model powered by Big Data: more specifically, their distribution channels and how they deliver products to their customers. Amazon wants to ship products to customers even before they make a purchase, because Amazon knows its customers very well through Big Data patterns. They use previous orders, product searches, wish lists, shopping-cart contents, returns and even how long an Internet user’s cursor hovers over an item to decide what and when to ship. This “anticipatory shipping” would dramatically reduce delivery time and probably increase customer satisfaction to the extent that customers will be even more willing to use online channels.

Amazon thus continues the battle for customers with instant order fulfillment, which I assume everybody has experienced at IKEA. IKEA’s business model and value proposition rest on its capability for instant order fulfillment, maybe not for all products, but especially the fast movers. We love to get the products we order right away, so imagine ordering from home and getting your purchase the same day, or even within a couple of hours. It is important to mention that Amazon has not implemented this yet, but has only filed a patent. Still, it truly reflects the capability of Amazon’s data scientists to utilize Big Data to transform their business model and, in the end, use their supply chain as a strategic weapon. It shows the powerful implications of predicting customer behavior and demand.
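Purely as an illustration of the idea (Amazon’s actual model is not public; the weights, signal names and threshold below are all invented), an anticipatory-shipping decision could combine exactly the browsing signals mentioned above into a single propensity score:

```python
# Hypothetical signal weights; any real model would be learned from data.
WEIGHTS = {"searches": 0.2, "wish_listed": 0.3,
           "in_cart": 0.4, "hover_seconds": 0.01}

def ship_ahead_score(signals):
    """Combine per-item browsing signals into one propensity score."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def items_to_preship(customers, threshold=0.8):
    """Return (customer, item) pairs whose score clears the threshold,
    i.e. candidates for shipping toward a nearby hub before purchase."""
    return [(customer, item)
            for customer, items in customers.items()
            for item, signals in items.items()
            if ship_ahead_score(signals) >= threshold]
```

The interesting engineering is not this scoring but everything around it: learning the weights from millions of orders, and deciding which regional hub to move the parcel to.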

If we take predictive shipping to the next level and combine it with Amazon’s vision of transporting products using unmanned flying vehicles, they will dramatically change the order fulfillment process for both online and offline players. The drawbacks of this predictive shipping process would of course be costly returns and unnecessary impact on the environment. But as the algorithm is constantly fed with new data, its predictions will improve over time. So next time you’re diving into Amazon’s endless world of products, it might be that one of those products is already on its way!

Source: http://techcrunch.com/2014/01/18/amazon-pre-ships/

Big Data and Mobile Data Security: Two bagels and a Cup of tea

Every day you wake up to the same ritual: the alarm goes off, you get ready, you leave the house. Given your high environmental consciousness (or the lack of a driver’s license), part of your morning ritual is to set off on a train to your destination. To help kill time and make the ride more enjoyable, you take your mobile phone out of your pocket and connect to the train’s Wi-Fi, completing your journey like most of the people on the train.

What seemed to be a normal day may come with an unpleasant surprise. We often make use of public hotspots to save a few of the megabytes that inflate our bill at month’s end. What is often neglected, however, is the security of these connections.

Hannes Muhleisen is an Amsterdam citizen who happens to live on a boat. One ordinary afternoon he was setting up his internet connection when, as a train passed by, his laptop recognized a Wi-Fi network very familiar to many of us: “Wifi in de trein” (Martijn, 2015). Would NS really provide such an insecure connection to its customers? Curious, Muhleisen decided to experiment by setting up equipment to ‘listen in’ on the devices of the train’s travelers. With two antennas and some open-source software, Hannes was set to test. So, you are probably wondering: what kind of information was he able to pick up?

  • 114,558 different MAC addresses over 5 months
  • Unique device identifiers, with times and dates
  • Devices’ web-browsing and app-usage history
  • The types of devices travelers were using (e.g. Apple, Samsung, etc.)

For additional fun, Muhleisen even created a model of Wi-Fi usage as a function of the weather.
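To show how little effort such an analysis takes once the traffic is captured (the records and helper functions below are invented for illustration, not Muhleisen’s actual setup), a few lines of Python suffice to summarize a capture log:

```python
from collections import Counter

# Hypothetical capture records: (MAC address, timestamp, advertised hostname)
records = [
    ("3c:22:fb:01:02:03", "08:12", "iPhone"),
    ("3c:22:fb:01:02:03", "08:14", "iPhone"),
    ("f8:e6:1a:0a:0b:0c", "08:13", "Galaxy-S6"),
    ("f8:e6:1a:0a:0b:0c", "08:14", "Galaxy-S6"),
]

def unique_devices(records):
    """Distinct MAC addresses seen in the capture."""
    return {mac for mac, _, _ in records}

def busiest_minutes(records, n=1):
    """Timestamps with the most observed traffic."""
    return Counter(ts for _, ts, _ in records).most_common(n)
```

Scaled up to months of unencrypted train traffic, exactly this kind of aggregation yields the device counts and usage patterns listed above, which is the whole point: no exotic tooling is required.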


Muhleisen’s example is just one of many big data security and privacy concerns. These concerns extend beyond individuals’ data security; they also affect society and organizations. The top 10 big data privacy risks include (Herold, 2015):

  1. Targeted marketing can cause private information to become public.
  2. Because pieces of data must be linked to one another to make sense, it becomes nearly impossible for your data to remain anonymous.
  3. Following from the previous point, data masking can easily be defeated, revealing personal information.
  4. Big data can be used to influence business decisions without taking into account the human lives involved.
  5. Big data often lacks rigorous validation of user data, which can lead to inaccurate analytics.
  6. Big data can lead to ‘automated’ discrimination in hiring, employee promotion and more.
  7. There are only a few legal protections for the individuals involved.
  8. Big data is growing indefinitely and infinitely, making it ever easier to learn more about individuals.
  9. Big data analytics allows organizations to narrow down the documents relevant to litigation, but raises accusations of not including all necessary documents.
  10. The sheer size of big data makes it difficult to ensure that patents and copyrights are indeed unique.


All these implications lead to major concerns about IT security investments, paranoia and conspiracy theories. How should we tackle all the ethical implications that come with big data? If one man with two cheap antennas can collect enough data to learn what you ate for breakfast, what can corporations do to influence behavior using top-of-the-line equipment? Whether you are an iOS or Android user, Big Brother is watching.

Lilian Shann, 342890ls


Martijn, M (2015). De wifi in de trein is volstrekt onveilig (en de NS doet er niets aan) [The wifi on the train is completely insecure (and NS does nothing about it)], [Online], Available at: https://decorrespondent.nl/3166/De-wifi-in-de-trein-is-volstrekt-onveilig-en-de-NS-doet-er-niets-aan-/97373496-af07ccc1 [Accessed: 13 September 2015].

Herold, R (2015). 10 Big Data Analytics Privacy Problems, [Online], Available at: https://www.secureworldexpo.com/10-big-data-analytics-privacy-problems [Accessed: 13 September 2015].

Damato, T (2015). Infographic: What’s threatening your mobile apps?, [Online], Available at: http://blog.vasco.com/application-security/infographic-whats-threatening-mobile-apps/ [Accessed: 13 September 2015].

Big Data for Big Energy Savings

Articles on big data and its many implications for the world around us have been around for a while and are numerous. From improving healthcare and optimizing firm performance to the NSA foiling terrorist plots, big data’s reach touches nearly all imaginable facets of the modern world. Netflix even based the plot and marketing of the hit series House of Cards on observations from its 33 million users, a topic various authors have discussed on this blog. These are all great examples of the seemingly unlimited applications of big data and the impact it can have on all of us.

What seems to be missing in all the buzz around Big Data, though, is what it can do to help us face probably the biggest challenge of our generation: drastically decreasing our energy consumption.

Is this because of the reluctance of the consumer, or are the large energy companies to blame?
Either way, the solution may very well lie in the use of Big Data.

If Netflix created the success of House of Cards largely by using the known habits, preferences and ratings of its users, why can’t energy firms come up with personalized advice and energy-saving offers for their customers?

The Tata Consultancy Services 2012/2013 Big Data Study revealed that the energy sector is one of the least advanced sectors with regard to the use of Big Data. Yet the same study states that energy firms' Big Data spending in 2012 yielded a 60.6% return, the second-highest score!


Table from The Tata Consultancy Services 2012/2013 Big Data Study

Energy companies could start right away by using the terabytes of data on past bills and participation in past energy-saving initiatives to send out personalized advice on energy saving. Furthermore, they could send personalized offers for energy-saving products and measures. This could greatly increase customer engagement, create new revenue streams, and help face the greatest challenge of our generation.
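As a minimal sketch of what such personalized advice could look like (all names, numbers and the 20% threshold below are my own illustrative assumptions, not anything an energy firm has published), a company could flag customers whose consumption is well above that of comparable households and attach a tailored tip:

```python
def energy_advice(customer_kwh, peer_avg_kwh, threshold=1.2):
    """Return a personalized saving tip when usage exceeds peers.

    customer_kwh: the customer's consumption over a billing period
    peer_avg_kwh: average consumption of comparable households
    threshold: ratio above which usage counts as 'high' (assumed value)
    """
    ratio = customer_kwh / peer_avg_kwh
    if ratio > threshold:
        excess_pct = round((ratio - 1) * 100)
        return (f"You used {excess_pct}% more energy than similar "
                f"households. Consider our insulation offer.")
    return "Your usage is in line with similar households. Keep it up!"

print(energy_advice(450, 300))  # well above peers: tip with an offer
print(energy_advice(280, 300))  # in line with peers: encouragement
```

The peer comparison is the key design choice here: past bills alone tell a customer little, but the gap to similar households turns the same data into an actionable message.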

How would you feel about receiving personalized advice and offers to reduce your energy consumption from your energy provider, based on your past consumption behaviour?


The Tata Consultancy Services 2012/2013 Big Data Study, 2013, Tata Consultancy Services Limited

Why Big Data Analytics Could Spell Big Energy Savings, 27 February 2013, Spotfire Blogging Team

Big Data and Predictive Maintenance: Fast tracking past the Trough of Disillusionment

Last week Rotterdam School of Management organized its annual "Leadership Summit". This year's topic was Big data – what's in it for me? Many examples were presented of how data with high velocity, variety and volume can bring added value to organizations. What was left somewhat uncovered was how much benefit big data can bring: what is the monetary gain of collecting, cleaning and analyzing the vast amount of data points? General Electric recently held its "Minds + Machines" event, presenting its vision on how data will change organizations. It organized a customer panel to provide insights into how customers have benefited from GE's software. Interestingly, GE's customer cases happened to almost coincide with the ones that Jens-Peter Seick, Vice President Product Management and Development at Fujitsu, presented on stage, but they provided some additional information on how much value can be gained. Let's take a look at how these two companies are adding value to other organizations through big data.

In his presentation, Jens-Peter Seick from Fujitsu explained how big data is already being used for predictive maintenance. Predictive maintenance allows machinery and equipment to be maintained based on their condition instead of on a time-based schedule, which saves costs because maintenance is only done when it is required. The condition is monitored using sensors and data logged by information systems, and is analyzed using statistical techniques to plan and predict maintenance operations. These sensors form part of the much-discussed "Internet of Things", or Industrial Internet as GE likes to call it. Mr. Seick used the example of jet airplanes collecting gigabytes of engine data to relay to maintenance personnel in order to predict fleet malfunctions and have the correct parts available. As a GE customer, AirAsia used data collected from the GE engines in its fleet to route its planes more efficiently, saving up to $10 million in fuel costs.
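In its simplest form, the condition-based idea behind predictive maintenance can be sketched as a rolling check on sensor readings (the window size, the limit of 80 and the temperature series are invented for illustration; real systems use far richer statistical models):

```python
from statistics import mean

def needs_maintenance(readings, window=5, limit=80.0):
    """Condition-based check: flag maintenance when the rolling mean
    of the last `window` sensor readings exceeds `limit`.

    Window size and limit are illustrative assumptions, not values
    from Fujitsu's or GE's systems.
    """
    if len(readings) < window:
        return False  # not enough data yet to judge the condition
    return mean(readings[-window:]) > limit

# Example: an engine temperature creeping upward over time.
temps = [70, 71, 72, 74, 76, 79, 83, 86, 90, 94]
print(needs_maintenance(temps))  # recent average 86.4 exceeds 80
```

The contrast with time-based maintenance is visible in the code: nothing happens on a calendar schedule; the trigger is purely the machine's measured condition.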

Mr. Seick also talked about how offshore wind farms can relay information on their condition as observed by sensors. This can then be combined with weather and other external data sets to predict failure points and find the right time to send out a boat for a maintenance operation. GE gave the example of an offshore oil rig that saved $7.5 million by predicting a parts failure and allowing preventive maintenance to be done. Energy company E.ON has benefited from GE's assistance with the data gathered from its wind farms, generating 4% more power from its turbines than previously.

By saving millions for its customers, GE's annual revenue from its big data analytics efforts already tops $1 billion, and the company continues to invest heavily in the field. In addition to GE and Fujitsu, other players in the field include all the big names, from IBM to Microsoft. Professor Eric van Heck mentioned the Gartner Hype Cycle in his presentation at the Summit and pointed out big data's position on the verge of falling into the trough of disillusionment. With so much interest and added value already brought to companies, I can't see big data staying in that valley for too long.

Are there any areas you know where big data is already being used effectively?


GE’s Customer Panel at “Minds + Machines”

Jens-Peter Seick at the RSM Leadership Summit 2014


Is your food healthy and safe? Ask your own nutritionist: new 'smart chopsticks'


Smart tech is a new term for technology that connects to the Internet or communicates machine-to-machine. It has developed rapidly in recent years, moving from heating our homes and nudging people to recycle to keeping a watchful eye on people's daily lives. Recently, Chinese search engine Baidu created chopsticks (known as "Kuaisou" in Chinese) that can detect whether food is safe to eat (The Guardian, 2014).

How does it work?

In early spring 2014, the idea for the device was born out of an April Fools' video in China, but it generated so much excitement that the joke was encouraged to become a reality.

The version 1.0 pair of smart chopsticks was revealed in early September 2014. It runs alongside a smartphone app to judge the freshness of the oil used for cooking: when you place the chopsticks in three cups of cooking oil, the sensors attached inside can measure the temperature, pH level and calorie content. If the oil is not fresh, the chopsticks flash a red light.
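The red-light logic described above might be sketched roughly as follows. To be clear, Baidu has not published its actual criteria; every threshold below is an invented placeholder (TPM, total polar materials, is a common oil-degradation measure, but its use here is my assumption):

```python
def oil_is_fresh(ph, temperature_c, tpm_percent):
    """Toy freshness check for cooking oil.

    ph:            acidity of the oil sample
    temperature_c: measured oil temperature
    tpm_percent:   total polar materials, a degradation indicator
    All thresholds are illustrative assumptions, not Baidu's values.
    """
    if tpm_percent > 25:      # heavily degraded, likely reused oil
        return False
    if ph < 4.0:              # unusually acidic
        return False
    if temperature_c > 200:   # overheated beyond normal cooking range
        return False
    return True

# The chopsticks would flash red when the check fails.
led = "green" if oil_is_fresh(ph=6.5, temperature_c=170, tpm_percent=12) else "red"
print(led)
```

The point of the sketch is only that a handful of cheap sensor readings, combined with simple rules on a paired smartphone, is enough to give an everyday verdict on food safety.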

Although the idea is not being commercially produced yet, it can address public concerns, since it is a marvelous example of how technology can be used for the good of food safety: not just making our lives more convenient, but actually helping people lead safer lives. It detects anomalies in food in such a fast, easy and reasonably accurate manner that anyone can use it.


It is said that China is the largest market for this product because of the severity of its food safety problems; specific threats, such as pork spray-painted red to pass as beef or food coated in paint, could be addressed. Europe, too, has seen food scandals, such as poisonous cooking oil and dioxin-laced chickens, that this technology could help solve. In addition, the intelligent part is not the chopstick itself but the sensor implanted inside, an idea that could later also be applied to forks or knives as smart devices.

However, I wonder whether this technology might be over-sensitive. If people start to care more about detailed food composition because they check everything with the chopsticks, is there a risk that they become afraid of eating food that is actually fine but gets a bad reading? Can people still enjoy their food then? Moreover, whether and how will this technology go into commercial production? As development is still at an early stage, the company has made only a limited run of prototypes, and there is no exact release date or price yet. We will see how Baidu defines its international market in the near future.


Bea, F., 2014. A spin on Google Glass and smart chopsticks on Baidu's menu. [blog] Available at: <http://www.cnet.com/news/a-spin-on-google-glass-and-smart-chopsticks-from-chinese-search-company-baidu/> [Accessed 26 September 2014]

Donews, 2014. Tech.ifeng.com [online] Available at: <http://tech.ifeng.com/a/20140903/40785057_0.shtml> [Accessed 27 September 2014]

The Guardian, 2014. China launches chopsticks that tell you if your food is safe to eat. [online] Available at: <http://www.theguardian.com/world/2014/sep/04/baidu-china-search-engine-smart-chopsticks-food-safety> [Accessed 26 September 2014]

Big Data is watching you!



Edward Snowden has revived the privacy-versus-security debate. Snowden, born in 1983, is a computer specialist who has worked for the CIA and NSA. On June 5th, 2013, he became world-famous for blowing the whistle on massive privacy violations by the United States and British governments.

Snowden revealed that the NSA had developed PRISM, a mass electronic data mining system. This system can track every US citizen with a digital footprint. The problem is not that PRISM can collect data; the problem is the data that is being collected:

…audio and video chats, photographs, e-mails, documents, and connection logs… [Skype] can be monitored for audio when one end of the call is a conventional telephone, and for any combination of “audio, video, chat, and file transfers” when Skype users connect by computer alone. Google’s offerings include Gmail, voice and video chat, Google Drive files, photo libraries, and live surveillance of search terms.

                                                                      – Washington Post

This applies not only to Skype and Google, but also to Microsoft, Facebook, YouTube, Yahoo and many more. Practically everything you have ever done online can be found by PRISM, including a lot of private information that you would not voluntarily share with your government.

When thinking of the consequences a system like PRISM might have for privacy, George Orwell's famous novel '1984' comes to mind as an extreme possibility: a state where every civilian is closely monitored and controlled by the government, a state in which the government knows everything about you and sees everything you do. Big Brother is watching you. And Big Brother uses big data to do it.

In reaction to Snowden's whistleblowing, the NSA proclaimed that this kind of surveillance has already prevented dozens of potential attacks on the US. There is truth in this as well, as increased surveillance also increases the likelihood of detecting acts of violence, crime and terror.

When we abstract to a higher level, there seems to be a constant trade-off between privacy and security. We all want to be secure, but at the same time we all see privacy as an important right that should not be violated. The government is the main body that offers security, but it wants ever more of your private information to provide it.

There remains one question:  How much of your privacy are you willing to trade for security?


Big Data can help in converting brick and mortar store visitors into buyers

A Russian tech start-up called Synqera offers a solution that gives a personalized shopping experience to visitors of physical retailers. According to Synqera's study, 67% of Americans prefer shopping in-store to buying online, and 81% of those surveyed agreed that they prefer to shop in a store that provides a customized shopping experience.

(Source: http://www.marketingcharts.com/wp/topics/e-commerce/shoppers-favor-customized-in-store-experiences-29669/)

The company's solution involves several devices (touchscreens, NFC readers, digital cameras) to collect personal information about customers and offer them personalized discounts, products or coupons. The process starts when the customer enters the shop and uses the "Loyalty Generator", which contains a touchscreen and an NFC reader. The system reads the customer's loyalty card and provides personalized information (store map, offers and announcements), and while he or she is shopping it offers goods and discounts through the in-store devices. When the customer arrives at the cashier, the system once more offers customized ads and then asks for feedback.


The Loyalty Generator can recognize facial expressions to analyse the customer's mood, and in-store sensors scan baskets and visitors' appearance. It gives suggestions based on the season, the time of day and other personal data. For example, the system may offer a six-pack of beer to a 23-year-old student on a Friday night, especially if he or she looks excited, but it won't do the same if the student is under 18.
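The beer example can be expressed as a simple decision rule. This is only a sketch of the kind of logic involved; the specific inputs (day, hour, a "mood" label) are placeholders for whatever Synqera's system actually derives from its cameras and sensors:

```python
def beer_offer_allowed(age, day, hour, mood):
    """Decide whether to show a beer promotion to a recognized shopper.

    Mirrors the example in the text: a Friday evening plus an 'excited'
    mood triggers the offer, but under-18s never see it. All rule
    details are assumptions for illustration.
    """
    if age < 18:
        return False  # the legal age gate always wins
    friday_evening = (day == "Friday" and hour >= 18)
    return friday_evening and mood == "excited"

print(beer_offer_allowed(23, "Friday", 21, "excited"))  # True
print(beer_offer_allowed(17, "Friday", 21, "excited"))  # False: under age
print(beer_offer_allowed(23, "Monday", 21, "excited"))  # False: wrong day
```

Note how the hard constraint (age) is checked before any of the softer, data-driven signals; a real system would layer many more such signals, but the precedence structure stays the same.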

Synqera has already made a deal with the Russian chain "Ulybka Radugi", and the company claims that European and American chains have also expressed interest in the Loyalty Generator.

At first sight, Synqera's idea looks useful, especially considering the time we spend in grocery stores looking for a few things for dinner. However, it remains a question whether customers would be willing to hand over such personal data in exchange for less frustration and better deals.

Do you think the idea will spread or the NSA scandal and EU regulations will prevent Synqera from penetrating Western markets?





When Big Data meets Psychology.

Research on personality has always been quite time-consuming: finding participants who are willing to fill out an extensive questionnaire can be very difficult. The following study provides, next to interesting insights, a great basis for further research on personality.

A group of scientists from the University of Pennsylvania conducted one of the largest studies on language and personality ever. They analyzed the Facebook status updates of 75,000 participating volunteers, resulting in more than 700 million words, phrases and topics.

All the participants completed a personality questionnaire through an application and made their status updates available for the study. This way, the researchers could look for linguistic patterns and match them to character traits. They found striking correlations between personality and the language used on social media, and were able to build models that predicted an individual's age, gender and personality-questionnaire results from their online communication.

The researchers created word clouds with words, expressions and symbols that were common to the 'psychological' world of people with a certain trait. The word clouds provided insights into the relationship between personality traits and language. For instance, participants who scored well on emotional stability in the questionnaire turned out to refer to sports much more than others did. This could be a great opportunity to explore: neurotic people might become more emotionally stable if they engaged more in sports.
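To make the method concrete, the core statistical step is correlating the frequency of a word category with a trait score across participants. The toy data below is entirely invented (the study worked with 75,000 people and millions of words, not six rows), but the mechanics are the same:

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Invented data: per participant, how often sport words appear in
# their status updates, and their emotional-stability score.
sport_mentions   = [12, 3, 8, 1, 15, 6]
stability_scores = [80, 40, 65, 35, 90, 55]

r = pearson(sport_mentions, stability_scores)
print(round(r, 2))  # a value close to +1 would echo the study's finding
```

At the study's scale this is repeated for thousands of word categories, after which the strongest correlates per trait are drawn into the word clouds.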

The researchers are quite fond of this new research method: they state that ''these word clouds come much closer to the heart of the matter than do all the questionnaires in existence.'' Moreover, ''traditional studies have found interesting connections with pre-chosen categories of words such as 'positive emotion' or 'function words.' However, the billions of word instances available in social media allow us to find patterns at a much richer level.''
This research method could be much more efficient, as participants in future studies could simply make their social media updates available instead of filling in surveys and questionnaires. This could lead to an enormous increase in the number of participants willing to volunteer for research purposes.

These are the word clouds that the study made available. Take a look at your own status updates and compare: do you fit your profile?


Don’t blindly trust Big Data

The word 'prediction' has triggered an uncomfortable feeling in me ever since I read Nassim Nicholas Taleb's book 'The Black Swan: The Impact of the Highly Improbable.' The author demonstrates the astonishingly bad track record of human predictions and elaborates on the philosophical, statistical and psychological issues that make it impossible for us to predict. McAfee and Brynjolfsson [1], however, state in an article required for this course that 'using big data leads to better predictions, and better predictions yield better decisions'. Now, these are two contradicting points of view.

Interestingly enough, Taleb recently wrote an article in Wired magazine about the flaws of Big Data, entitled 'Beware the Big Errors of Big Data' [2]. His view is that Big Data gives us more information, but also more false information. The sheer amount of data, he argues, makes it easy to find statistically significant relationships between variables. We humans like the simplicity of thinking that a certain effect has one, or at least a few, understandable causes, a phenomenon Taleb calls the narrative fallacy. To bring this blog back in line with our course, consider the example of taking Twitter activity as the basis for predicting stock markets, as mentioned in the article by Luo et al. [3]. The article states that Twitter activities are 'significant leading indicators of firm equity'. Just like the student 322165pk before me, I doubt this relationship: people are self-aggrandizing and therefore not honest on Twitter, and basing predictions on this sort of data seems careless. Nevertheless, my key concern is not the quality of the collected data but Big Data itself, which draws on sources like Twitter. Big Data makes it virtually impossible not to find relationships like 'Twitter predicts the stock market'. The researcher can easily fall for confirmation bias: a tendency to search for information that confirms one's own point of view while disregarding contrary data.

To visualize this point, please have a look at the following graph, which shows that the more variables and information we have, the more spurious correlations we will obtain.
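Taleb's point can also be demonstrated in a few lines of simulation: generate a set of completely independent random series and count how many pairs still correlate "significantly" by pure chance. The variable count, sample size and cutoff below are arbitrary choices for illustration:

```python
import random

def spurious_pairs(n_vars=40, n_obs=20, cutoff=0.6, seed=42):
    """Count pairs of *independent* random series whose sample
    correlation still exceeds `cutoff`: noise masquerading as signal."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]

    def pearson(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sx = sum((a - mx) ** 2 for a in xs) ** 0.5
        sy = sum((b - my) ** 2 for b in ys) ** 0.5
        return cov / (sx * sy)

    return sum(
        1
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(pearson(series[i], series[j])) > cutoff
    )

# 40 unrelated variables give 780 pairs to test; some will look
# "significant" even though every series is pure noise.
print(spurious_pairs())
```

Increasing `n_vars` grows the number of pairs quadratically while the data stays pure noise, which is exactly the "more variables, more false correlations" effect the graph illustrates.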


Obviously, as a BIM student, I would like to know why I should learn about Big Data if I am this doubtful about its predictive abilities. Big Data can show how people use your products in ways you did not expect: for example, Tumblr started out with a focus on erotica, and its makers later discovered that the site was mainly used for other, less wicked purposes. Generally speaking, it can serve as a tool for self-measurement, as Mouton [4] puts it, and tell us the things we do not know instead of confirming the things we would like to know.

I hope I could spark some interest in the author and his books 'The Black Swan', 'Fooled by Randomness' and the freshly released 'Antifragile'. If you are interested, you should join me in Amsterdam on the 4th of October, as Nassim Taleb is giving a lecture in the Pathé Tuschinski. [https://www.nexus-instituut.nl/en/events/137-nassim-nicholas-taleb] Get your tickets!



[1] McAfee, A., and Brynjolfsson, E. 2012. Big Data: The Management Revolution. Harvard Business Review 90(10) 60-68.

[2] http://www.wired.com/opinion/2013/02/big-data-means-big-errors-people/

[3] Luo, X., Jie, Z., and Duan, W. 2013. Social Media and Firm Equity Value. Information Systems Research 24(1) 146-163.

[4] http://www.minyanville.com/sectors/technology/articles/Does-Big-Data-Have-Us-2527Fooled/8/19/2013/id/51346?refresh=1