Why Best Practices Lead to Underutilized Clusters, and Which New Tools Can Help
Hadoop is a popular (if not the de facto standard) framework for processing large data sets through distributed computing. YARN allowed Hadoop to evolve from a MapReduce engine into a big data ecosystem that can run heterogeneous (MapReduce and non-MapReduce) applications simultaneously. The result is larger clusters with more users and workloads than ever before. Traditional recommendations encourage provisioning, isolation, and tuning to increase performance and avoid resource contention, but they result in highly underutilized clusters.
The Hadoop Performance Myth looks at the challenges of improving performance and utilization for today’s dynamic, multitenant clusters and how emerging solutions help when best practices fall short.