This blog post is part two of a two-part series authored by Pepperdata Field Engineer Jimmy Bates. Bates is a true veteran of the big data world. He speaks from a place of expertise and in-the-trenches experience. This second part describes what he discovered after joining Pepperdata, and his thoughts on how it would have made the lives of so many of his previous MapR customers easier.
For full context, read part one.
Pepperdata to the Rescue
I wish I had known of Pepperdata from day one, during my first deployment of Hortonworks. I wish I had used Pepperdata on every Cloudera project I worked on. I wish I had used it on every MapR success.
I could have saved thousands of hours had I possessed the visibility provided by Pepperdata. It would not have solved all my problems. But it would have allowed me to identify and solve them much faster, and it would have solved some of my tougher production resource problems before I even knew I had them. Pepperdata would have given me 20% of my big data life back.
The GPS of Big Data
I think of Pepperdata as the GPS of Big Data. It continually helps me triangulate my current position on my flight of production success. It helps me in operations, in development, and in planning, as I carry my legacy loads from destination to destination. It gives me feedback to stay on course.
Triangulating Your Current Position
When you have real-world production workloads in big data systems, you always have waste—unless you fully understand every job and have tuned it to perfection. The most obvious benefit I get from Pepperdata? Capacity optimization. This has an immediate impact and takes no effort to implement.
Pepperdata looks at every node in your big data system and compares the resources requested and allocated against the resources on each instance to determine what is actually used. Every 30 seconds, it goes back and adjusts what is reported as available, so that the queues and scheduling rules you have in place have a better understanding of what is still free. This lets new projects, where you don’t really know what you need to ask for, access more resources without fear of turning your big data freeway into a big data parking lot. This works in all major Hadoop offerings and in all cloud-managed Hadoop offerings.
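To make the idea concrete, here is a minimal Python sketch of that kind of per-node adjustment. It is not Pepperdata's actual algorithm: the `NodeSnapshot` type, the `reported_available_mb` function, and the 10% reserve are all assumptions made up for illustration. It only shows how capacity reported from actual usage can differ from capacity reported from allocation.

```python
from dataclasses import dataclass


@dataclass
class NodeSnapshot:
    """Hypothetical view of one node's memory at a point in time."""
    total_mb: int      # physical memory on the node
    allocated_mb: int  # memory the scheduler has handed out to containers
    used_mb: int       # memory those containers are actually using


def reported_available_mb(node: NodeSnapshot, reserve_pct: int = 10) -> int:
    """Memory to advertise as available, based on use rather than allocation.

    reserve_pct is an assumed safety buffer, held back so that a sudden
    spike in real usage does not overcommit the node.
    """
    reserve = node.total_mb * reserve_pct // 100
    by_usage = node.total_mb - node.used_mb - reserve
    by_allocation = node.total_mb - node.allocated_mb
    # Never report less than the allocation-based view, and never below zero.
    return max(by_usage, by_allocation, 0)


# A 64 GB node that is almost fully allocated but only lightly used.
node = NodeSnapshot(total_mb=65536, allocated_mb=61440, used_mb=24576)
print(reported_available_mb(node))  # far more than the 4096 MB left by allocation
```

In a real system, a calculation like this would be re-run on a fixed interval (the post mentions every 30 seconds) and the result handed to the scheduler.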
The advent of auto-scale cloud services may lead you to think that this problem is already solved. But it’s not.
Why? Because an auto-scale cloud service scales resources based on a cluster-wide view. When usage of a cluster resource hits a specific threshold, the cluster scales in or out accordingly. Unfortunately, this also scales your waste in direct proportion to your consumption. A great cost model for the cloud provider, but not so much for the consumer.
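A rough worked example shows why. The Python sketch below assumes a simplified autoscaler that adds nodes whenever requested memory would exceed 80% of cluster capacity, and jobs that use only 40% of what they request. The function names and every number here are hypothetical, not taken from any particular cloud service.

```python
def nodes_needed(requested_mb: int, node_mb: int, target_pct: int = 80) -> int:
    """Nodes an allocation-driven autoscaler would provision (assumed policy)."""
    usable_per_node = node_mb * target_pct // 100
    # Integer ceiling division: round up to a whole node.
    return -(-requested_mb // usable_per_node)


def paid_but_unused_mb(requested_mb: int, used_pct: int, node_mb: int) -> int:
    """Memory that is provisioned, and paid for, but never actually used."""
    provisioned = nodes_needed(requested_mb, node_mb) * node_mb
    used = requested_mb * used_pct // 100
    return provisioned - used


# Jobs request 800,000 MB, then the workload doubles to 1,600,000 MB.
small = paid_but_unused_mb(800_000, used_pct=40, node_mb=100_000)
large = paid_but_unused_mb(1_600_000, used_pct=40, node_mb=100_000)
print(small, large)  # the unused share doubles along with the workload
```

Because the autoscaler reacts to what is requested rather than what is used, doubling the workload doubles the idle capacity you pay for.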