With Kubernetes emerging as the de facto operating system of the cloud, capable of running almost anything, it’s not a surprise that many enterprises are rapidly porting their Apache Spark workloads to Kubernetes. This includes migrating Amazon EMR workloads to Amazon EKS to gain the additional deployment and scaling benefits of a fully managed Kubernetes service. Amazon EKS automates load distribution and parallel processing and makes it easy to run tooling and plug-ins from the Kubernetes open-source community.

One such enterprise migrating their key Amazon EMR workloads to Amazon EKS is a global software development firm and a Pepperdata customer. Working alongside this customer afforded Pepperdata the opportunity to measure the “before and after” performance of Capacity Optimizer with our Autoscaling Optimization feature enabled for Amazon EMR on EKS using a live customer’s workloads.

Running Capacity Optimizer with this feature enabled produced an impressive 42.5 percent reduction in normalized instance hours in production, and a corresponding reduction in the cost of running these workloads, translating into tens of thousands of dollars in savings per month:


Figure 1: Normalized Instance Hour Savings Per Month running Pepperdata Capacity Optimizer’s Autoscaling Optimization feature for Amazon EMR on Amazon EKS

Methodology and a Closer Look at the Results

To measure the effectiveness of Capacity Optimizer’s Autoscaling Optimization feature in this environment, Pepperdata calculated the ratio of two numbers:

  1. Numerator: The resources the scheduler allocated to Apache Spark applications (a proxy for the amount of workload on the cluster)
  2. Denominator: The resources that are actually provisioned in the cluster (a proxy for the instance hours used and thus dollars spent)

This ratio serves as a proxy for how much work is getting done per dollar spent and can be used to measure the efficiency of a cluster. When the workload increases, or the instance hours required decrease, the ratio (and thus the overall efficiency) increases.
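The efficiency ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not Pepperdata’s implementation: the `Sample` type, the per-interval sampling, and the choice to compute the average as the unweighted mean of per-interval ratios are all assumptions, because the text does not say how measurements were aggregated. The same functions work for memory (for example, GiB) or cores, as long as the numerator and denominator use the same unit.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable, Tuple


@dataclass
class Sample:
    """One hypothetical measurement interval for a cluster.

    Both fields must use the same unit (e.g., GiB of memory or vCPU cores).
    """
    allocated: float    # numerator: resources the scheduler allocated to Spark apps
    provisioned: float  # denominator: resources actually provisioned in the cluster


def efficiency(sample: Sample) -> float:
    """Allocated / provisioned: a proxy for work done per dollar spent."""
    if sample.provisioned <= 0:
        raise ValueError("provisioned capacity must be positive")
    return sample.allocated / sample.provisioned


def summarize(samples: Iterable[Sample]) -> Tuple[float, float]:
    """Return (maximum, average) efficiency across intervals.

    Assumption: "average" is the unweighted mean of per-interval ratios.
    """
    ratios = [efficiency(s) for s in samples]
    if not ratios:
        raise ValueError("need at least one sample")
    return max(ratios), mean(ratios)


if __name__ == "__main__":
    # Hypothetical data, not the customer's measurements. The last interval
    # allocates more than is provisioned, so its ratio exceeds one.
    samples = [Sample(40, 200), Sample(100, 200), Sample(210, 200)]
    peak, avg = summarize(samples)
    print(f"max={peak:.2f} avg={avg:.2f}")
```

A time-weighted alternative (total allocated divided by total provisioned) would give a different average; which one matches the published figures is not stated in the text.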

Without Capacity Optimizer enabled, the theoretical maximum of this ratio would be one, because the scheduler cannot allocate more than is provisioned. With Capacity Optimizer enabled, the maximum cluster efficiency by memory rose above one, to 1.07. This is not uncommon: Capacity Optimizer can safely increase the effective allocation per host, which can raise the ratio beyond one.

Pepperdata suspected that our Autoscaling Optimization feature could deliver even better results, but the actual findings proved dramatic: the maximum cluster efficiency by memory rose all the way to 2.67, an improvement of 150 percent, as shown in the following table:

CLUSTER EFFICIENCY WITH CAPACITY OPTIMIZER ENABLED

Metric                                 | Autoscaling Optimization DISABLED | Autoscaling Optimization ENABLED | Improvement
Maximum Cluster Efficiency by Memory   | 1.07                              | 2.67                             | 150%
Average Cluster Efficiency by Memory   | 0.23                              | 0.40                             | 74%
Maximum Cluster Efficiency by Core     | 0.72                              | 2.06                             | 186%
Average Cluster Efficiency by Core     | 0.15                              | 0.37                             | 146%

Figure 2: Summary of Amazon EMR on EKS Performance Improvement Metrics running Pepperdata Capacity Optimizer’s Autoscaling Optimization feature.  
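The Improvement column is the relative change between the two efficiency columns. A minimal sketch of that arithmetic, using the rounded values from the table (the function name `improvement_pct` and the `TABLE` dictionary are hypothetical, introduced only for illustration):

```python
# Rounded values from the cluster-efficiency table: (feature disabled, feature enabled).
TABLE = {
    "Maximum Cluster Efficiency by Memory": (1.07, 2.67),
    "Average Cluster Efficiency by Memory": (0.23, 0.40),
    "Maximum Cluster Efficiency by Core": (0.72, 2.06),
    "Average Cluster Efficiency by Core": (0.15, 0.37),
}


def improvement_pct(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after / before - 1.0) * 100.0


if __name__ == "__main__":
    for metric, (disabled, enabled) in TABLE.items():
        print(f"{metric}: {improvement_pct(disabled, enabled):.1f}%")
```

Run against the rounded inputs, this gives roughly 149.5, 73.9, 186.1, and 146.7 percent. The first three round to the published figures; the last lands nearer 147 than 146, so small differences are expected when recomputing from two-decimal values.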

Pepperdata’s customer dashboard was used to visualize this increase in core and memory utilization, as shown in the following screenshot:


Figure 3: Increase of Average Core Utilization with Pepperdata’s Autoscaling Optimization feature on Amazon EMR on EKS

Improvements in memory and core utilization create greater efficiency and throughput, allowing jobs to complete faster and with fewer resources. More importantly, they translate directly into fewer Amazon EC2 instances and therefore lower costs. Pepperdata estimated the monthly instance hours saved, and the resulting cost savings, from this increased utilization as follows:

ESTIMATED MONTHLY SAVINGS WITH PEPPERDATA CAPACITY OPTIMIZER AUTOSCALING FEATURE ENABLED

Environment                          | Normalized Instance Hours Saved | r5.8xlarge Instance Hours Saved | Cost Savings*
Customer’s Staging Environment       | 101,221                         | 1,581                           | $3,187.30
Customer’s Production Environment    | Approximately 2 million         | 31,173                          | $62,844.77

Figure 4: Estimated Additional Monthly Cost Savings with Pepperdata Capacity Optimizer’s Autoscaling Optimization feature for Amazon EMR on EKS

*Assuming an on-demand price of $2.016/hour for an r5.8xlarge instance
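The savings table can be reproduced with simple arithmetic. The sketch below assumes the standard Amazon EMR normalization factor of 64 for an 8xlarge instance size (an assumption; the text does not state which factor was used) and the $2.016/hour on-demand price from the footnote. The function and constant names are hypothetical. `Decimal` is used so that currency rounding is exact.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: standard Amazon EMR normalization factor for an 8xlarge instance size.
NORMALIZATION_FACTOR_8XLARGE = 64
# On-demand price from the footnote, in USD per r5.8xlarge instance hour.
R5_8XLARGE_PRICE_USD = Decimal("2.016")


def normalized_to_instance_hours(normalized_hours: int,
                                 factor: int = NORMALIZATION_FACTOR_8XLARGE) -> int:
    """Convert normalized instance hours to whole instance hours (truncating)."""
    return normalized_hours // factor


def monthly_cost_savings(instance_hours: int,
                         price: Decimal = R5_8XLARGE_PRICE_USD) -> Decimal:
    """Instance hours saved times the hourly price, rounded to the cent."""
    return (price * instance_hours).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    staging_hours = normalized_to_instance_hours(101_221)
    print(staging_hours, monthly_cost_savings(staging_hours))
    print(31_173, monthly_cost_savings(31_173))
```

Under these assumptions, `normalized_to_instance_hours(101_221)` returns 1,581 (101,221 / 64 is about 1,581.6, so the table appears to truncate rather than round), and the cost function returns $3,187.30 for staging and $62,844.77 for production. Multiplying 31,173 by 64 gives 1,995,072, consistent with the approximately 2 million normalized hours reported for production.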

Driving Cloud Growth Through Improved Efficiency and Reduced Cost

These findings matter not only for this customer’s cloud bill; they also have important implications for the broader Kubernetes and cloud ecosystem. According to a recent survey, “significant or unexpected spend” was identified as the top challenge to Kubernetes adoption. Solutions like Pepperdata’s that address these cost concerns can help enterprises adopt the latest cloud technologies with greater confidence that their resources will be used at high efficiency and with minimal waste.

Pepperdata Capacity Optimizer, a cost-optimization software solution battle-tested in some of the world’s largest and most complex computing environments, delivers automatic cost control for Kubernetes—and specifically Amazon EMR on EKS—without the need for manual application tuning. Capacity Optimizer empowers enterprises to accelerate their savings on Amazon EKS and increase the overall value and effectiveness of their cloud investment.

If autonomous savings like this customer’s 42.5 percent reduction in production instance hours sound compelling for your own Amazon EMR on EKS deployment, explore a customized proof of value to see Pepperdata Capacity Optimizer at work in your environment.
