Since its introduction less than a decade ago, Hadoop has ushered in a data revolution. Even century-old companies are rapidly transforming themselves into data-driven businesses, driving new revenue streams through data-based products and services that were unimaginable before Hadoop:
- A travel booking service crunches billions of flight-price records to predict whether a particular airfare is likely to increase or decrease.
- Agricultural researchers speed the breeding of drought-resistant crops by analyzing terabyte-scale genome datasets.
- A credit-card company evaluates the risk of default based on a consumer’s prior purchase history.
These are just a few examples of the new horizons Hadoop is opening up across numerous industries. The possibilities seem unbounded.
Along with these seemingly unbounded possibilities come seemingly unbounded headaches for IT managers. For many enterprises, Hadoop lacks the capabilities needed to guarantee service-level agreements (SLAs). This forces IT managers to set up dedicated clusters for specific purposes and to size them for theoretical peak capacity. Meanwhile, Hadoop and its ecosystem continue to evolve rapidly, bringing a constant stream of new challenges, and new solutions, for IT managers to contend with. Despite its incredible promise, Hadoop today is not for the faint of heart.
Alex heads the IT department at a large e-commerce company that deployed Hadoop several years ago to give internal teams greater visibility into customer purchase behavior. Since then, use of the cluster has expanded to include data modeling by the R&D team and real-time processing of customer service queries. Despite the notable successes his team has already achieved with the cluster, including a glowing mention from the CEO at the last investor meeting, Alex feels overwhelmed.
Hadoop costs at Alex’s company started out high and seem to be spiraling out of control. For their initial Hadoop deployment, they were among the approximately 50% of enterprises that incurred a six-figure data migration expense. Since then, Alex has had to go back to the CFO several times to request additional budget to expand the cluster, based on the company’s demonstrable business demand. Each time he had to revise his estimates for the CFO, Alex felt more than a little embarrassed by his inability to plan accurately for additional capacity.
Today, several years into his company’s Hadoop deployment, Alex still cannot reliably predict how usage of the cluster will grow. He dreads the next budget meeting, not to mention his upcoming performance review, where his bonus may be in jeopardy because his manager has lost faith in his estimates this year.
At the most recent budget meeting, the CFO posed several seemingly reasonable questions to Alex that he struggled to answer:
- How is the cluster being used today, and how much headroom do we have to grow with the current deployment?
- What kinds of servers will we need, and how big do they need to be?
- For the purposes of internal P&L, how do we report usage of the Hadoop cluster across teams and individuals so that we can allocate costs and calculate chargebacks accordingly?
Alex needs Pepperdata. Pepperdata brings unprecedented visibility, control, and capacity to Hadoop. It monitors every facet of cluster performance in real time, including CPU, memory, disk I/O, and network usage, broken down by user, job, and task. It dynamically adjusts cluster utilization based on your policies and priorities so that your jobs run faster, more reliably, and more efficiently.
With the Pepperdata Dashboard, Alex will be able, for the first time, to see exactly which cluster resources specific users and groups consume over any given time period. He’ll be able to produce chargeback reports that give internal users the visibility they need to plan and run their Hadoop workloads as cost-effectively as possible. At the same time, Finance will be able to allocate Hadoop hardware costs precisely across business units and departments based on actual resource usage.
Alex also will be able to set priorities to m