Webinar: Proven Approaches to Hive Query Tuning

Last night Pepperdata hosted a meetup in San Francisco at our customer Upsight’s office. The evening focused on the specific issues Upsight struggled with when they first implemented Hadoop. Upsight is a mobile analytics and marketing tools company, and their product offering necessitates real-time data ingestion and processing. Upsight ingests upwards of 15 billion events per day across 1.5 billion devices, and they manage a large Hadoop cluster with a variety of systems, processes, and access patterns. All this data ingestion and processing helps them serve up hundreds of millions of push messages every day, respond to billions of content requests per month, and deliver 250 million in-app messages per month. To process all this data, the company uses many different technologies, such as HBase, MapReduce, Hive, and Kafka. Needless to say, they are working with A LOT of data every minute (never mind every day), and with many different technologies on a single cluster. Performance is a must-have for them to stay responsive to customers and remain a valuable service.

Given this high premium on performance and SLA delivery, Upsight started to face contention on their cluster that needed to be addressed. Their HBase workloads were being disrupted by batch MapReduce jobs, and normal methods like manual tuning weren’t addressing the problem. They eventually spun up a test environment to walk through potential solutions, but Upsight’s engineering team couldn’t find one that was simple, performant, and cost effective.

Enter Pepperdata! Upsight turned to Pepperdata to guarantee HBase protection so that they could have stable HDFS write throughput for HBase, regardless of other MapReduce activity. Working with Pepperdata, Upsight saw their contention issues disappear while maintaining the performance they needed, without requiring a large engineering investment, because Pepperdata handles resource contention dynamically and in real time.

Here’s a link to a video of Alex Pierce, a Pepperdata Field Engineer, explaining how Pepperdata works within your existing cluster to do real-time resource monitoring, management, and reporting. See a comparison of HDFS writes on Upsight’s test cluster before and after Pepperdata was installed, as well as the level of consistency achieved on the cluster (enabling the HBase jobs to complete, and service levels to be achieved). Alex also explains how our software dynamically allocates resources based on customer-set policies to ensure high-priority jobs complete on time.

If you’re using HBase and MapReduce on the same cluster, you are probably experiencing these same challenges. And if that’s the case, Pepperdata can help. Why continue to have headaches when we have the panacea? 🙂 Contact us, request a trial, and see how we can help you!