Since its introduction less than a decade ago, Hadoop has ushered in a data revolution. Even century-old companies are rapidly transforming themselves into data-driven businesses, driving new revenue streams through data-based products and services that were unimaginable before Hadoop:

  • A travel booking service crunches billions of flight-price records to predict whether a particular airfare is likely to increase or decrease.
  • Agricultural researchers speed the breeding of crops for drought resistance by analyzing terabyte genome datasets.
  • A credit-card company evaluates the risk of default based on a consumer’s prior purchase history.

These are just a few examples of the new horizons Hadoop is opening up across numerous industries. The possibilities seem unbounded.

Along with these seemingly unbounded possibilities come seemingly unbounded headaches for IT managers. For many enterprises, Hadoop lacks the capabilities needed to guarantee service-level agreements (SLAs), forcing IT managers to set up dedicated clusters for specific purposes and to size them for theoretical maximum capacity. Meanwhile, Hadoop and its ecosystem continue to evolve rapidly, constantly presenting IT managers with new challenges and new solutions to evaluate. Despite its incredible promise, Hadoop today is not for the faint of heart.

Alex heads the IT department at a large e-commerce company that deployed Hadoop several years ago to give internal teams greater visibility into customer purchase behavior. Since then, usage of the cluster has expanded to include data modeling by the R&D team as well as real-time processing of customer service queries. Despite the notable successes his team has already achieved with its Hadoop cluster, including a glowing mention from the CEO at the last investor meeting, Alex feels overwhelmed.

Hadoop costs at Alex’s company started out high and seem to be spiraling out of control. For its initial Hadoop deployment, the company was among the approximately 50% of enterprises that incurred a six-figure data migration expense. Since then, Alex has had to go back to the CFO several times to request additional budget for expanding the cluster to meet the company’s demonstrable business demand. Each time he revised his estimates for the CFO, Alex felt more than a bit embarrassed that he could not accurately plan for additional capacity demand.

Today, several years into his company’s Hadoop deployment, Alex still cannot reliably predict how usage of the Hadoop cluster will grow. He dreads the next budget meeting, not to mention his upcoming performance review, where his bonus may be in jeopardy because his manager has lost faith in his estimates this year.

At the most recent budget meeting, the CFO asked Alex several seemingly reasonable questions that he struggled to answer:

  • How is the cluster being used today, and how much headroom do we have to grow with the current deployment?
  • What kinds of servers will we need, and how big do they need to be?
  • For the purposes of internal P&L, how do we report usage of the Hadoop cluster across teams and individuals so that we can allocate costs and calculate chargebacks accordingly?

