Detecting Anomalies in Large Data Sets

Data has become a common concern in recent years. Companies and individuals alike now handle information in many ways to improve their operations or gain insight from them, and IT has enabled unprecedented levels of data management for both. This post, however, focuses on how companies manage data, specifically in the auditing sector.

Fraud has a long history in markets. I like to think of Imtech as an example; if you aren’t familiar with the company, there’s no need to worry, as I’m sure you have an example of your own. Returning to the topic of data and fraud: as the volume of available information grows, it is becoming increasingly difficult to identify potential fraud cases accurately. This has given rise to the use of computer algorithms to detect these cases (Pandit, Chau, Wang & Faloutsos, 2007).

An interesting way to tackle this challenge is to use mathematical laws about large sets of numbers to find anomalies within these data sets. One particularly interesting example is the application of Benford’s Law to detect fraud in company documentation (Kraus & Valverde, 2014). In short, Benford’s Law states that about 30.1% of naturally occurring numbers start with a 1, 17.6% with a 2, 12.5% with a 3, and so on. Logically this makes sense given our counting structure: a quantity that grows from 1 has to double before its leading digit becomes 2, but it only has to grow by about 11% to get from a leading 9 to the next leading 1. This can be expressed as:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

where d is a leading digit in {1, 2, ..., 9} and P(d) is the probability of a number starting with d.
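To make the law a bit more concrete, here is a minimal Python sketch of how a first-digit check might look. It is only an illustration, not the method used by Kraus and Valverde (2014): the function names are my own, and the mean absolute deviation used as a score is an assumed, simple choice rather than a standard prescribed by the paper.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d in 1..9: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a nonzero number, ignoring sign and scale."""
    # Scientific notation puts the first significant digit at position 0,
    # e.g. 0.0042 -> '4.200000000000000e-03'.
    return int(f"{abs(value):.15e}"[0])


def observed_shares(values):
    """Share of each leading digit among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


def mean_absolute_deviation(values):
    """Average gap between observed and expected digit shares.

    Assumed scoring choice for this sketch: larger means further from Benford.
    """
    expected = benford_expected()
    observed = observed_shares(values)
    return sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9


if __name__ == "__main__":
    # Expected shares, e.g. 30.1% for 1, 17.6% for 2, 12.5% for 3.
    for digit, share in benford_expected().items():
        print(digit, f"{share:.1%}")
```

Calling `mean_absolute_deviation` on a column of invoice amounts, for example, gives a single score; a high score does not prove fraud, it only suggests where to start drilling down.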

Although this method seems promising, Kraus and Valverde (2014) could not find any outstanding peculiarities in their data set, which contained fraud perpetrators. However, the law does serve as a starting point for a drill-down approach to discovering perpetrators. This brings us to the more strategic question of whether IT will ever develop a way to outsmart fraud perpetrators in this context. Is an eternal drill-down chase ever going to take the lead?

What do you think? Will this ever be the case? Can you think of a way this might work out?

I think it’s pointless. Of course, like everything, IT methods have their degree of accuracy. However, I firmly believe there will never be a way to completely ensure an honest and transparent market. Not long ago I heard a man say, “Does anybody here know what EBITDA stands for? Exactly. Earnings Before I Tricked the Dumb Auditor.” It’s human nature, and it might take millennia before that changes ever so slightly.

I’d like to say it was nice to write a couple of blogs here. Till next time!


Kraus, C., & Valverde, R. (2014). A data warehouse design for the detection of fraud in the supply chain by using the Benford’s law. American Journal of Applied Sciences, 11(9), 1507-1518.

Pandit, S., Chau, D. H., Wang, S., & Faloutsos, C. (2007, May). Netprobe: A fast and scalable system for fraud detection in online auction networks. In Proceedings of the 16th International Conference on World Wide Web (pp. 201-210). ACM.



2 responses to “Detecting Anomalies in Large Data Sets”

  1. 358545sb says :

    First of all, I would like to say that you have discussed a topic of personal interest to me. I am afraid I have to agree with you that there will probably never be a way to completely ensure an honest and transparent market. There will always be people trying to game the system. However, I do not agree with your statement that trying to find IT methods to detect fraud is pointless.

    During the Accounting & Financial Management master’s I did last year at RSM, we had a guest lecture from an employee of KPMG’s forensic accounting department. In that lecture, he explained what they do there to detect fraud. In fact, among the various data analysis techniques they use to go through all the required data, they use Benford’s Law as well (I’m assuming it actually works for them, otherwise they would have discarded the technique already). However, this is not the point I want to stress here. I want to emphasize the importance of IT in this field.

    There are three types of information a forensic accountant can have about a dataset. There is information they know about the dataset, information they know that they don’t know about the dataset, and lastly information they don’t know they don’t know about the dataset. Acquiring this last kind of knowledge is called data mining, whereby hidden patterns in the data are detected or predicted. By categorizing the data and finding anomalies, patterns and interactions, the investigators get a sense of where to look. After this they can structure their search, visualize specific ledgers and run queries against the database. All of these techniques require extensive use of IT. Hence, while I agree with you that a completely honest and transparent market seems a utopia, the value of IT in forensic accounting should not be underestimated.

    • Dennis Oliver Huisman says :

      Thanks for your reply! Glad you’re interested in this too. Right, I probably went too far by saying it was pointless. What I really meant to say is that detection techniques (in my view) will never reach a point where every ‘bad’ guy is caught. If that were the only purpose of such methods, then it would be pointless, but this is not the case. The objective is not to prevent all illegal activity but a large share of it. As for Benford’s Law, the paper didn’t recommend using it alone. I imagine KPMG followed the drill-down method there.

      Interesting to hear your take on this. Indeed, IT has promising potential in this field and many others, where it can help in “knowing what we don’t know”. I’m eager to see what happens!
