How to realize the value of Hadoop


With every new technology, there comes a time when exploration and discovery are largely done, and the application must begin. Then comes the real test: can businesses or other institutions put the technology to work and gain value from it? Making that leap is, in part, a leap of faith. But it’s said faith makes things possible, not easy.

And that’s where we are today with Hadoop: developing real-life data insights and operationalizing analytics.

The faith is there, but the challenge remains. Almost everyone now recognizes that the exploding volume of data available in enterprises may enable insights businesses can apply to increase customer intimacy, offer better products and services, and become more competitive. There’s a lot of technology in play, but Hadoop—with its distributed approach to storing and processing vast amounts of data and an ecosystem of analytic tools—is the star attraction.

Unfortunately, many companies struggle to realize, in practice, the value Hadoop deployments promise. Working with our clients, we’ve developed a robust, field-proven approach to overcome these struggles and bring business value out of Hadoop projects. We break this approach down into three parts: a logical adoption cycle; leveraging the right tools and techniques; and ensuring returns on investment.

A Logical Hadoop Adoption Cycle

Complexity is a large part of the problem. From its origins in an open-source web search engine project, Hadoop has grown in scale and complexity into an ever-evolving platform that surrounds the Hadoop Distributed File System (HDFS) and MapReduce technology with tools and applications. Hadoop originator Doug Cutting has compared it to Linux—a kernel surrounded by an ecosystem of open-source projects.
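To make the kernel analogy concrete, here is a minimal word count in the classic MapReduce style, written as Python scripts for Hadoop Streaming. The script names and HDFS paths are illustrative only, not from any particular deployment:

```python
#!/usr/bin/env python3
# mapper.py -- emit (word, 1) for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sum the counts per word; Hadoop Streaming delivers
# mapper output sorted by key, so equal words arrive adjacent
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").rsplit("\t", 1)
    if word != current_word and current_word is not None:
        print(f"{current_word}\t{current_count}")
        current_count = 0
    current_word = word
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

A typical invocation (the streaming jar location varies by distribution) would look like `hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input /data/in -output /data/out`, with HDFS distributing the input splits across the cluster.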

So, planning, deployment, and long-term management are a challenge, and that challenge is exacerbated by a shortage of Hadoop skills. Even more fundamentally, while many businesses profess the data-driven faith, business-oriented drivers for Hadoop applications are often unclear, and projects lack focus and a clear path to value.

The logical adoption cycle we guide our clients through, however, is a three-step, phased approach that balances costs and benefits at each step. Here’s how it works:

  • Discovery—Here we identify candidate Hadoop and data analytics projects, assess the business case for each one, and select the projects that promise the value we seek and have a good chance of success. We also validate the business-value assumptions with actual results on sample data (see the sketch after this list).
  • Development and integration—For a project with proven value potential, we build or modify the analytic applications based on the findings confirmed in Discovery and integrate them with the larger business intelligence (BI) and analytics landscape.
  • Implementation—For the project to actually pay off, of course, we must roll it out in production throughout the target application area. The rollout needs to enable adoption by the targeted user communities, which may be addressed in incremental waves and require tailored training.
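As a hedged illustration of the Discovery validation step, the PySpark sketch below tests a hypothetical business-value assumption (that customers who churn generate more support calls) against a sample extract. The file path and column names are assumptions invented for the example:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("discovery-validation").getOrCreate()

# Hypothetical sample extract staged in the discovery environment
sample = spark.read.parquet("/discovery/samples/customer_activity")

# Compare average support-call volume for churned vs. retained customers;
# a clear gap supports the business case before any build-out begins
(sample.groupBy("churned")
       .agg(F.avg("support_calls").alias("avg_support_calls"),
            F.count("*").alias("customers"))
       .show())
```

If the gap the business case depends on does not show up in the sample, that is exactly the cheap, early failure the Discovery phase is designed to surface.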

Our approach recognizes the potential of data analytics built on the applicable Hadoop ecosystem components, and we augment that with hard quantitative factors and demonstrated benefits to guide our selection, development, and implementation decisions. That helps ensure we focus on the right projects and implement them for the right reasons.

The Right Tools and Techniques for Hadoop

Promises are often easy to make, hard to deliver. That’s what many analytics and data science teams are finding with Hadoop.

This has left most companies at a crossroads—they’re convinced they can use Hadoop to get real business value out of their data, but the promise is eluding them. So, once you’ve wrapped your head around a logical adoption cycle, it’s time to drill down into some of the specific techniques and tools you can apply to help make Hadoop projects pay off.

  • Get help—One of the main obstacles to Hadoop adoption (and success) is lack of skilled personnel. An experienced partner can help you assess opportunities and scope, plan, and execute projects to achieve the value you seek.
  • Structure your approach—A robust discovery environment helps you identify the opportunities in your data with the most value. But your discovery activities should follow a structured methodology to let you explore, test, and learn from your data on Hadoop.
  • Consider SaaS—Hadoop and surrounding technologies are rapidly evolving, so upfront spending on platform infrastructure and software risks paying for assets that quickly become obsolete. Fortunately, consumption-based services offer the discovery and analytic functionality you need while holding down upfront setup costs and speeding time to value.
  • Think hybrid—Much of the promise of Hadoop—and data analytics in general—lies in the vast amount of unstructured data. But traditional BI systems and approaches are proven value producers. Your approach and your platform must bridge those two worlds with hybrid data management and workload optimization across platforms.
  • Dive into a data lake—The data lake is a large repository and processing engine into which data from many sources can flow. It makes it relatively simple to move and store data of all types from the appropriate sources and make it available for targeted analytics (see the ingestion sketch after this list).
  • Govern with an iron hand—Mark Twain said, “Put all your eggs in one basket—and watch that basket!” When you put all your information eggs in a data lake, you must implement formal and effective processes for information governance.
  • Examine deployment alternatives—Lack of skills and financial pressure to limit capital spending can put your project on hold before it even gets off the ground. Consider consumption-based managed analytic and BI services to fast-track deployment. They can reduce up-front costs, get your project moving quickly, and provide better service levels while you grow your organization’s skills and contemplate incremental applications.
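To illustrate the data-lake pattern from the list above, here is a minimal PySpark sketch that lands raw CSV exports in the lake as partitioned Parquet and registers them for SQL access. The zone paths, schema, and table name are assumptions for the example, and it presumes a configured Hive metastore and an existing `curated` database:

```python
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("lake-ingestion")
         .enableHiveSupport()  # assumes a Hive metastore is configured
         .getOrCreate())

# Land a raw operational export from the lake's raw zone
raw = spark.read.option("header", True).csv("/lake/raw/orders/*.csv")

# Persist it to the curated zone as partitioned Parquet and register it
# as a table so BI and analytics tools can reach it over SQL
(raw.write.mode("overwrite")
    .partitionBy("order_date")  # assumes the export carries this column
    .format("parquet")
    .saveAsTable("curated.orders"))
```

The same curated table then serves both worlds of the hybrid approach: exploratory analytics on Hadoop and conventional SQL-based BI reporting.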

Ensuring Hadoop ROI

Hadoop requires a thoughtful approach and the right tools and techniques, of course. But truly delivering on the promise of Hadoop and data insights requires a bit more because, as the poet Robert Service said, “A promise made is a debt unpaid.”

We have found that Hadoop projects tend to pay off in two general ways, and you should examine each case closely to identify value in these areas.

  • Modernizing existing BI environments: Most legacy BI efforts are built around transactional data generated or collected by company operations. The explosion in the volume of data—and the awareness that we can extract insight from unstructured data—is straining the storage and processing capabilities of those existing systems. Hadoop—along with modernized versions of Oracle, SAP, and other BI stacks—lets you apply distributed technology to scale these systems to take advantage of this volume of data while lowering run costs and TCO.
  • Operationalizing analytics: Traditionally, we have delivered analytic insight via PowerPoint or, at best, dashboards. Then someone had to apply it to business decisions or processes to actually achieve the value promised. Now, we are having success integrating analytics models into the applications that run the business day to day. That lets analytical insights flow automatically into the thousands of little decisions the business makes every day: the way you target buyers and interact with customers, how you price and ship, how you deploy and use assets, when and what you purchase. Each of these applications can have a measurable value—multiplied by thousands or millions of instances (a minimal sketch of the pattern follows this list).
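As a minimal sketch of that operationalizing pattern, the snippet below embeds a previously trained, scikit-learn-style model in the request path of a day-to-day decision. The model file, feature vector, and threshold are hypothetical:

```python
import pickle

# Load a model artifact produced during the Discovery phase (hypothetical file)
with open("churn_model.pkl", "rb") as f:
    model = pickle.load(f)

def decide_retention_offer(customer_features):
    """Score one customer and return an operational decision, so the
    analytic insight flows into the business process automatically."""
    churn_risk = model.predict_proba([customer_features])[0][1]
    return "offer_discount" if churn_risk > 0.7 else "standard_service"
```

The point is not the model itself but where it runs: inside the application that makes the decision, rather than in a slide deck that someone has to read and act on.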

I hope you’ve found this information useful. For something you can stick in your briefcase and share with colleagues, check out the white paper, “Realize the value of your data using Hadoop.”


Jan Jonak is Analytics and Big Data Platform Engineering Lead for DXC’s analytics team. He has more than eight years of experience with HP, and now DXC, in areas such as business intelligence (BI) and data warehousing, and has extensive delivery experience with major clients. Jan has been involved in offerings development for big data, data discovery, and production platforms, including Hadoop/Spark/Vertica/Haven, on premises and in the cloud.

RELATED LINKS

The business case for Hadoop technology

The scoop on Hadoop: Is it right for your enterprise?

Cheat sheet: Best data ingestion tools for helping deliver analytic insights
