Find real business insights with a data processing pipeline approach

I love beach vacations. They are a great time to relax, have fun and let time slow down. However, nothing spoils a beach vacation like losing your car keys, which happened to me recently. Aside from the inconvenience and disruption, what struck me was the monumental task of searching for keys in sand. Even though I knew what I was looking for, what the keys looked like and roughly where to search, it took me hours to find them. (I would have given up sooner, but it was a remote beach.)

Now imagine how much harder this would have been if I hadn’t even known what I was looking for. This is the challenge we face when it comes to finding insights in our business data. Unlike regular business reporting, insights require noticing things we were not explicitly looking for. To make things worse, data is growing ever faster, driven by connected systems, the internet and now the Internet of Things (IoT). By one widely cited estimate, 90% of the world’s data was generated in just the last two years.

So, how do we handle this challenge? We could take a page from the book of an old, well-established industry: mining, or more specifically, diamond mining. Like us, diamond miners have to go through mountains of material before they find the gems they are looking for. Diamonds are mined from huge, carrot-shaped ore deposits called kimberlite pipes. Digging out these deposits means a lot of waste rock has to be mined along with the diamonds. On average, 1,750 metric tons of material are extracted to find a single 1-carat diamond.
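To put that figure in perspective, take the standard metric carat of 0.2 g (and 1 metric ton = 10^6 g). The ratio of material extracted to diamond recovered, by mass, is then:

```latex
\frac{1750\ \text{t}}{1\ \text{ct}}
  = \frac{1750 \times 10^{6}\ \text{g}}{0.2\ \text{g}}
  = \frac{1.75 \times 10^{9}}{0.2}
  = 8.75 \times 10^{9}
```

That is roughly one part diamond to nine billion parts of ore.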

The diamond industry meets this challenge by setting up a processing pipeline. The ore is mechanically crushed, sorted, and filtered until “roughs” have been separated and can be hand sorted. The roughs are checked and graded, with only the best selected for the grinding and polishing process that ultimately turns a dull, translucent rock into a gemstone.

[Figure: The data processing pipeline]

How can we adapt this model to extract the hidden gems of insight from our data? One possible approach is shown above. Here, the data is processed in multiple stages to extract meaning and insight. These stages convert data into information, then intelligence, then decisions and finally action. At each stage, the volume of data drops by an order of magnitude or more.
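A minimal sketch of the staged idea in Python, assuming each stage is simply a function that takes a batch of records and returns a smaller one. The stage names mirror the figure; the record fields (`device`, `value`, `critical`), the threshold of 95 and the synthetic data are all hypothetical, chosen only to show each stage handing a narrower set to the next.

```python
from typing import Callable

# Each record is a plain dict; the field names used here are hypothetical.
Record = dict
Stage = Callable[[list[Record]], list[Record]]


def run_pipeline(records: list[Record], stages: list[tuple[str, Stage]]) -> list[Record]:
    """Pass records through each named stage in order, reporting how many survive."""
    print(f"{'data':<12} {len(records):>6} records")
    for name, stage in stages:
        records = stage(records)
        print(f"{name:<12} {len(records):>6} records")
    return records


def to_information(rs: list[Record]) -> list[Record]:
    # Drop malformed readings that have no value.
    return [r for r in rs if r.get("value") is not None]


def to_intelligence(rs: list[Record]) -> list[Record]:
    # Keep only readings above a hypothetical "normal" ceiling of 95.
    return [r for r in rs if r["value"] > 95]


def to_decisions(rs: list[Record]) -> list[Record]:
    # One decision per affected device rather than one per reading.
    first_per_device: dict[int, Record] = {}
    for r in rs:
        first_per_device.setdefault(r["device"], r)
    return list(first_per_device.values())


def to_actions(rs: list[Record]) -> list[Record]:
    # Act only where the device is flagged as critical.
    return [r for r in rs if r.get("critical")]


# Synthetic readings from 50 hypothetical devices; every tenth reading is missing.
raw = [
    {"device": i % 50,
     "value": None if i % 10 == 0 else (i * 37) % 100,
     "critical": i % 50 < 5}
    for i in range(10_000)
]
stages = [
    ("information", to_information),
    ("intelligence", to_intelligence),
    ("decisions", to_decisions),
    ("actions", to_actions),
]
final = run_pipeline(raw, stages)
```

With this synthetic input the counts fall from 10,000 raw readings to 9,000, then 400, then 4, and finally a single action. Keeping each stage as a separate function means stages can be swapped or tuned independently, which is the main appeal of the pipeline structure.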

As the old adage goes, “There is no such thing as an original idea.” That is the case here. The data processing pipeline concept is not new and is already used in some shape or form in most organizations. What is new is the sheer amount of data, and our struggle to turn it into competitive advantage.

Dealing with this deluge requires systems that look not only for known patterns but also for unusual or previously unknown ones. Traditional data processing, built around answering questions we already know to ask, is not up to this task. People are good at spotting the unexpected, but the sheer volume of data makes relying on people alone impractical.

Artificial intelligence (AI), however, is well suited to this type of situation. It can sift through mountains of data and identify unusual patterns. It may not understand what those patterns represent, but it can scan gigabytes of data to surface a new, unusual pattern for further analysis.
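The article does not prescribe a particular technique. As a deliberately simple stand-in (not AI), the sketch below uses a z-score test: it flags values that sit far from the rest without knowing what they mean, which is the "noticing" role described above. The function name `flag_unusual`, the threshold of 3 standard deviations and the order data are all assumptions for illustration.

```python
import statistics


def flag_unusual(values: list[float], threshold: float = 3.0) -> list[int]:
    """Return the indices of values lying more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


# Hypothetical daily order counts: a steady pattern around 100 with one odd day.
daily_orders = [100 + (day % 5) - 2 for day in range(30)]
daily_orders[17] = 250
print(flag_unusual(daily_orders))  # → [17]
```

A z-score test only catches values that are extreme on a single measure. Finding genuinely new patterns across many variables is where machine-learning approaches such as clustering or isolation forests would come in.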

So, what does this mean for the people who currently do this work? Won’t it make them redundant? Far from it: they will be even more critical in the future. While an AI-driven pipeline can extract intelligence from data and even make some routine decisions, identifying new insights and making non-routine decisions will still require people. The difference is that their work will be a lot more engaging, because only the interesting situations will be presented to them.
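One way to picture this division of labour is a simple routing step at the end of the pipeline. In the sketch below, which assumes each finding arrives with a hypothetical `unusualness` score from an upstream detector and a hypothetical `routine` flag, only cases that are both routine and unremarkable are handled automatically; everything else is passed to a person. The names, fields and the 0.8 threshold are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    description: str
    unusualness: float  # hypothetical detector score: 0 (normal) to 1 (very unusual)
    routine: bool       # hypothetical flag: True if an agreed playbook covers this case


def route(findings: list[Finding], review_threshold: float = 0.8) -> tuple[list[Finding], list[Finding]]:
    """Split findings into those handled automatically and those escalated to people."""
    automated: list[Finding] = []
    for_people: list[Finding] = []
    for f in findings:
        # Automate only cases that are both routine and not unusual; escalate the rest.
        if f.routine and f.unusualness < review_threshold:
            automated.append(f)
        else:
            for_people.append(f)
    return automated, for_people


findings = [
    Finding("Reorder stock for a fast-selling product", 0.1, True),
    Finding("Sales spike in a region we don't serve yet", 0.9, False),
    Finding("Seasonal dip, but far deeper than usual", 0.85, True),
]
automated, for_people = route(findings)
for f in for_people:
    print("Needs a person:", f.description)
```

Note that the third finding is a routine category but is still escalated because it is unusual, which keeps people involved exactly where judgement matters.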

The later stages in the data processing pipeline require increasing amounts of human input and oversight, especially as the pipeline moves to decisions and actions. This work calls for analysis and judgement, and is generally seen as engaging and interesting. For the foreseeable future, people will continue to do the bulk of the work in these later stages.

Data is a great resource, and how it is utilised will separate the leaders from the followers in the future. Putting data to work does not have to be expensive or resource-intensive. It does, however, require a new strategy, and a data processing pipeline built using AI is a great place to start.


Amitabh Mathur is a technology enthusiast who brings together the curiosity of a child with the knowledge and experience of someone who has spent way too much time in the IT industry. Finding new ways to apply a mix of technology, science and process-based approaches to a problem is what keeps him excited. He started his career as an entrepreneur in India, developing diverse solutions such as voter ID cards, Indian-language support and interactive voice response. Over the years, he has worked in various industries (finance, healthcare, education and government) and countries (India, USA and New Zealand).
