It all starts so innocently: a dev group sets up Hadoop on a small cluster and starts playing around with it. Within a few weeks, an engineer has loaded everything from the data warehouse into Hadoop, written a few Pig jobs, and even started to get some interesting customer engagement data. She emails the report out to her team, and someone says, “Hey, this is great – can you run this every week?” She sets up the reporting job to run every day, figuring that’s just as easy as weekly.
A month passes, and she gets an urgent email from another team asking why the customer engagement reports have stopped arriving on time. “Our sales team has been using your reports to identify at-risk customers to call. But they haven’t gotten new reports for the past few days. What happened? When will you have them up and running on time again?”
The engineer’s first reaction is likely to be something like this: But that was just an experiment! I didn’t know anyone was actually counting on them. Then she logs into the Hadoop cluster to try to figure out what went wrong – and why these test jobs (which have silently graduated to production) aren’t running when they should. Unfortunately, the answer is usually not easy to find.
We hear stories like this all the time, though sometimes the urgent email turns out to be from the CEO! These scenarios follow a common pattern in Hadoop adoption: Hadoop is such a flexible, scalable system that it’s easy for an engineer to quickly grab data that could never before be combined in one place, write some jobs, and get interesting results. Sometimes the results are so interesting that other teams start using them, and all of a sudden the company’s business depends on something that started as an experiment.
This is the beauty – and challenge – of Hadoop, and it’s one of the main reasons companies large and small, across industries, are adopting Hadoop so fast. We at Pepperdata have seen the huge impact Hadoop can have on driving business results, and we’re excited to be part of making it even better – something companies can count on when they need results on time, every time.