Here at Pepperdata, we continuously work to improve our products and better serve our customers. Whether it’s executing more big data workloads or ensuring their resource consumption remains optimal, we want our customers to get the best value and tangible benefits from our products while not overshooting their big data cloud budgets. Today, we’re bringing you the data to back up our claims that all of this is possible.

Recently, we benchmarked Pepperdata Capacity Optimizer to showcase how it can further improve AWS Custom Auto Scaling. Modern enterprises that rely on AWS Auto Scaling often have to choose between better performance and lower costs and waste. However, our benchmark results using both TPC-DS and HiBench revealed that this tradeoff isn’t necessary: Capacity Optimizer can deliver improvements on both fronts.

Priming the Pepperdata-AWS Benchmark Tests

We chose AWS because it is one of the leading cloud service providers in the world. How would AWS Custom Auto Scaling perform on its own? How would it perform with the help of Capacity Optimizer?


These were just a couple of the questions we sought to answer.

For our first benchmark test, we used TPC-DS, which is a Decision Support (DS) framework from the Transaction Processing Performance Council. TPC-DS is a standard benchmark that has been around for 20 years. It consists of 99 queries that simulate the activity of a hypothetical online retailer, including standard and ad hoc queries.

We used a generic AWS Custom Auto Scaling Policy obtained from a customer, presumably tuned to their big data workloads rather than to the TPC-DS benchmarking workloads. We also used the standard Capacity Optimizer configurations that we provide to our prospects and customers—again, nothing tuned to benchmarking workloads.

Specifically, we looked into the following factors as our main points for comparison:

  • Instance hours
  • Cloud CPU utilization
  • AWS memory utilization
  • Duration of individual jobs and the overall duration of the entire benchmark suite
  • Price/performance
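
Of these, price/performance is the only derived metric, and its exact formula isn’t spelled out here. As an illustration only, one common convention divides total cost by throughput; the sketch below is an assumption, not necessarily the precise metric used in this benchmark:

```python
def price_performance(total_cost_usd, queries_completed, runtime_hours):
    """Illustrative price/performance: dollars per unit of throughput
    (queries per hour). Lower is better. This is a common convention,
    not necessarily the exact formula used in the Pepperdata benchmark."""
    throughput = queries_completed / runtime_hours  # queries per hour
    return total_cost_usd / throughput

# Example: a $60 run that completes 30 queries in 1 hour
# costs $2 per (query/hour) of throughput.
print(price_performance(60.0, 30, 1.0))  # -> 2.0
```

Under any such convention, a configuration that both lowers cost and shortens runtime improves price/performance on both ends of the ratio.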

We ran the TPC-DS benchmark four times: twice using the AWS Custom Auto Scaling Policy alone and twice using Capacity Optimizer plus the AWS Custom Auto Scaling Policy. In each case, we averaged the two runs.

Throughout our experiment, we aimed to adhere to prevailing industry standards for big data benchmarking. To ensure that this benchmarking procedure was unbiased, the benchmarks were run “out of the box.” We did not modify or recompile them using any special libraries.

AWS Autoscales Better with Pepperdata

The results were highly encouraging.

AWS Custom Auto Scaling with Capacity Optimizer outperformed AWS Custom Auto Scaling alone in every category we specified.
For the TPC-DS workload, Capacity Optimizer reduced instance hours by 38% while increasing cloud CPU utilization by 157% and AWS memory utilization by 38% compared to AWS Custom Auto Scaling alone. Capacity Optimizer also delivered an approximately 8% decrease in the overall runtime of the entire suite of 104 queries, with the overall duration dropping from 3.62 hours to 3.33 hours.
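The runtime percentage follows directly from the measured durations. As a quick sanity check in Python, using only the numbers reported above:

```python
def pct_change(before, after):
    """Percentage change from `before` to `after`; negative means a reduction."""
    return (after - before) / before * 100.0

# Reported overall TPC-DS suite durations, in hours:
# AWS Custom Auto Scaling alone vs. with Capacity Optimizer added.
runtime_alone = 3.62
runtime_with_co = 3.33

print(f"{pct_change(runtime_alone, runtime_with_co):.0f}% runtime change")
```

Running this prints a change of about -8%, matching the reported reduction in overall suite duration.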


More than 90% of the queries run with Capacity Optimizer resulted in savings versus running under the AWS Custom Auto Scaling policy alone.

Pepperdata excelled in the most complicated queries, which reflect the most demanding real-world environments. These results show that, with Capacity Optimizer, AWS users can run more big data workloads, drive higher CPU and memory utilization, and consume fewer total instance hours than when relying on AWS Custom Auto Scaling alone.

Similar Results with HiBench Benchmark Testing

We also executed a series of benchmark tests on AWS and Capacity Optimizer using HiBench Benchmark workloads. The results, which we compiled in our AWS HiBench Benchmark Report, were also conclusive—AWS ran better with Pepperdata Capacity Optimizer.

On average, HiBench workloads needed 53.4 instance hours with AWS alone. When we ran AWS with Capacity Optimizer, HiBench’s instance hours dropped to just 35.6, a reduction of 33%. This translated into significant savings on AWS costs and other cloud-related expenses, because cloud costs are generally billed as a multiplier of instance hours.
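Because billing scales with instance hours, the savings are easy to estimate. A minimal sketch, assuming a purely hypothetical blended rate of $0.50 per instance hour (actual rates vary by instance type and pricing model):

```python
RATE_PER_INSTANCE_HOUR = 0.50  # hypothetical blended rate, $/instance-hour

hours_aws_alone = 53.4   # HiBench average, AWS Custom Auto Scaling alone
hours_with_co = 35.6     # HiBench average with Capacity Optimizer added

cost_alone = hours_aws_alone * RATE_PER_INSTANCE_HOUR
cost_with_co = hours_with_co * RATE_PER_INSTANCE_HOUR
savings_pct = (cost_alone - cost_with_co) / cost_alone * 100.0

print(f"${cost_alone:.2f} vs ${cost_with_co:.2f}: {savings_pct:.0f}% lower")
```

At any linear rate, the percentage saved in dollars equals the percentage of instance hours eliminated, about 33% here; the hypothetical rate only scales the absolute figures.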

Using AWS with Capacity Optimizer also shortened the average duration from 5.8 hours to 5.1 hours, a decrease of 12%. Capacity Optimizer likewise raised CPU utilization from 8.10% to 10.3%, which is more than a 26% improvement!

Average memory utilization was also higher when AWS ran with Capacity Optimizer: HiBench clocked it at 11.4%, compared to 10.7% with AWS default configurations alone.

The Implications

We published a study earlier this year that pinpointed cost management and containment as a major issue among IT professionals in various industries.

Investing in cloud services and big data is essential in today’s business landscape, so you want to know whether your cloud providers are doing the best possible job of minimizing resource usage, waste, and costs.

AWS Custom Auto Scaling Policy is great. But Pepperdata Capacity Optimizer can further increase the impact of your cloud and big data investment.

In the end, companies that migrate big data workloads to the cloud want to derive the greatest value from their cloud and big data investments. These findings have important implications for their decisions, particularly with regard to cost and resource management.

Watch our benchmarking results webinar to hear Pepperdata Product Manager Heidi Carson discuss these benchmarking results in detail.

Take a free 30-day trial to see what big data success looks like

Pepperdata products provide complete visibility and automation for your big data environment. Get the observability, automated tuning, recommendations, and alerting you need to efficiently and autonomously optimize big data environments at scale.