Pepperdata already gives you a granular view of everything happening across your Hadoop clusters, as we actively govern the use of CPU, memory, disk I/O, and network for every task, job, user, and group — all in real time.
Now, there’s more: you can turn any trackable metric into an alert. And it’s ridiculously easy to do. With just two clicks, you can define an alert at any level of granularity (cluster, node, user, queue, job, or task), giving you a deeper understanding of the cluster.
Alerting was a key feature for our newest customer, Trulia. As one of the largest online home-shopping marketplaces, Trulia must deliver the most up-to-date real estate information available to compete in today’s housing market. By using the built-in alerting feature, Trulia can easily create detailed notifications to proactively track performance metrics across its Hadoop environment.
Trulia’s personalized recommendations are based on sophisticated data science models that analyze more than a terabyte of data daily from new listings, public records, and user behavior, then cross-reference search criteria to alert customers quickly when new properties become available. Throughout the night, tens of workflows and hundreds of complex jobs must complete on time to meet customer expectations for real-time market data.
In Trulia’s case, daily email and push notifications are among the drivers of website visits it can control, meaning it has a powerful incentive to eliminate any instability in the workflows that control the accuracy and on-time delivery of those communications. The Pepperdata dashboard and alerting function have made identifying problems far easier and faster.
Hadoop is playing a more critical role in Trulia’s business, and the company has expanded its usage to an entire data engineering department consisting of several teams using multiple clusters. With many teams writing Hadoop jobs or using Hive or Spark, Trulia has to ensure reliability in its multi-tenant, multi-workload environment. In the past, in response to delayed or unpredictable jobs that affected its customer push-notification programs, Trulia would intentionally underutilize its clusters to ensure jobs completed on time and to keep site traffic from being negatively affected.
Now, Trulia uses Pepperdata along with the alerting feature to actively monitor and control all its Hadoop clusters, shedding light on performance issues, optimizing usage, and maximizing utilization.
Trulia is a fantastic example of a company that has successfully disrupted a decades-old industry by using and analyzing real-time data to deliver customized insights straight to consumers. Doing this is no small task; it requires world-class data processing. Yet many companies run into problems when they try to guarantee performance because priority jobs aren’t completing on time and clusters are underutilized.
Pepperdata helps Trulia, and many other customers, head off these problems at the pass, dynamically solving them while guaranteeing optimal performance in real time. Needless to say, we’re pretty excited to be working with Trulia, an entirely data-driven business, to deliver the Quality of Service (QoS) it needs to grow its business.