Want your data lake to start generating some real value? Here’s how.


Warehouse. Lake. Repository. The terms we’ve historically used for managing data are all oriented around the concept of containment. That makes sense, because that’s largely how we’ve viewed data: as something to be collected and stored.

But that’s becoming less true. Data volume has been growing exponentially, and the trends point to an even faster rate of growth. We haven’t even seen the fat part of the curve in the growth of Internet-enabled devices. Every IoT device, from the sensors on a jet engine to the smoke detector in your bedroom, is spitting out data that has some value to someone, somewhere. A lot of data.

That’s a problem, because as data volume and speed continue to grow, the number of use cases that rely on real-time analytics grows with them. Smart cities, connected cars and supply chain processing are just a few examples that depend on the ability to analyze data as it’s collected.

But data warehouses and repositories have traditionally focused on data that’s been collected to be analyzed later. What if we could combine the ability to process data-in-motion with the ability to analyze data-at-rest?

Actually, it’s already being done, and it’s yielding examples we use every day. Real-time product recommendations in e-commerce and technologies like Siri and the Google Assistant are powered by the ability to combine historical and real-time data. The problem is, these examples were built using very expensive technologies and tons of custom code. That meant they only made sense for very broad applications, and only the most deep-pocketed companies could afford to build them. But that’s changing.
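
To make the pattern concrete, here is a minimal sketch in plain Python of what combining historical and real-time data looks like for a product recommendation. Every name in it (purchase_history, recommend, click_stream) is a hypothetical illustration, not an API from any particular product; a real system would replace the list with a live event stream and the dictionary with a lookup against the data lake.

from collections import Counter

# Data-at-rest: historical purchase counts, e.g. loaded from a data lake.
# (Hypothetical data for illustration only.)
purchase_history = {
    "user-42": Counter({"hiking boots": 3, "trail socks": 5}),
    "user-77": Counter({"espresso beans": 8}),
}

def recommend(event, history):
    """Combine a live click event with stored history to pick a suggestion."""
    past = history.get(event["user"], Counter())
    # Naive rule: suggest the user's historical favorite, unless the
    # live event shows they're already looking at it.
    for item, _count in past.most_common():
        if item != event["viewing"]:
            return item
    return "bestseller of the week"  # cold-start fallback for new users

# Data-in-motion: a simulated stream of click events.
click_stream = [
    {"user": "user-42", "viewing": "trail socks"},
    {"user": "user-99", "viewing": "tent stakes"},
]

for event in click_stream:
    print(event["user"], "->", recommend(event, purchase_history))

The shape of the flow is the point: a live event arrives, gets enriched against stored history, and produces an answer immediately, rather than waiting for a later batch job.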

New data flow products like Hortonworks DataFlow are bringing these capabilities to a wider audience, enabling any company to make better use of its data. Not only that — and this may be even more important — these tools are dramatically decreasing the time required to try new ideas.

Those early, expensive, highly custom technologies depended on complex data flow orchestration that required a lot of time, highly specialized code, and complex build cycles and development environments. It could easily take months to develop a solution before you saw any value from it. And in today’s environment, by that time the question you set out to answer could easily have changed. More than once.

New-generation data flow products are making the process much simpler and faster. The complex code once required to build an analytics application has been replaced by an easy-to-use drag-and-drop interface. Paired with web-based tools, this all but eliminates the need for coding; building a flow is more like assembling Lego blocks and less like chiseling a solution out of stone.
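
For contrast with the chiseling, here is the spirit of that Lego-block composition in a few lines of plain Python. This is purely illustrative: the block names (ingest, filter_errors, route_to_store) are invented for this sketch and are not part of Hortonworks DataFlow or any other product, which would express the same flow on a visual canvas instead.

def ingest(source):
    """Reusable block: read records from a source (here, just a list)."""
    yield from source

def filter_errors(records):
    """Reusable block: drop records flagged as bad."""
    return (r for r in records if not r.get("error"))

def route_to_store(records):
    """Reusable block: deliver each record (here, just print it)."""
    for r in records:
        print("stored:", r)

# Assembling the flow is composition, not custom plumbing code.
sensor_readings = [
    {"device": "smoke-detector-1", "ppm": 2},
    {"device": "jet-engine-sensor-9", "error": True},
]
route_to_store(filter_errors(ingest(sensor_readings)))

The design point is that each block is small, generic and reusable; the application-specific work is just wiring blocks together, which is exactly what a drag-and-drop canvas does for you.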

These modern solutions give businesses intuitive interfaces for combining data-at-rest and data-in-motion into easy-to-use, reusable applications. They’re giving companies the power to create their own intelligent, real-time analytics applications that can answer questions in the timeframe that matters most. Now.


Larry Kozak is VP Channels & Alliances at Hortonworks.

Samir Sehovic is an Analytics Platform Architect at DXC.
