Apache Spark Cost Optimization

Maximize the value of your Apache Spark workloads

Autonomously and continuously eliminate application inefficiencies with no manual tuning, no need to apply recommendations, and no changes to application code.

Immediately reduce instance hours and cost

Only pay for what you use when CPU and memory are optimized in real time.

Save engineering time and effort with no manual tuning

Reclaim hours of engineering time that can be reallocated to GenAI and agentic AI projects.

Autonomously eliminate in-application waste without code changes

Spark applications are inherently wasteful. Pepperdata eliminates in-app waste in real time.

How Pepperdata optimizes Apache Spark clusters

No matter where you run Apache Spark (in the cloud, on premises, or in hybrid environments), Pepperdata Capacity Optimizer saves you money by:

Automatically identifying where more jobs can be run in real time
Enabling the scheduler to more fully utilize available resources before adding new nodes or pods
Dynamically tuning the Cluster Autoscaler to respond to changing application workload needs

The result: Apache Spark CPU and memory are automatically optimized to reduce costs and increase utilization, enabling more apps to be launched for average cost savings of 30% to 47%.

Autodesk reduced Spark costs by over 50% with Pepperdata Capacity Optimizer

Let us do the same for you.

Challenge

Autodesk experienced runaway costs as the team could not keep up with manually tuning its Spark on Amazon EMR workloads.

Solution

Pepperdata Capacity Optimizer autonomously tuned Autodesk’s Spark applications in real time for maximum resource utilization.

Results

Autodesk realized cost savings of over 50% for its Spark on EMR workloads and automated manual tuning tasks, freeing the developer team for more innovative, high-growth projects.

TPC-DS benchmarks for Apache Spark

Spark on Amazon EKS benchmark

Reduced the total instance hours and related costs by 41.8% and enabled the entire workload to run 45.5% faster (October 2023)

Spark on Amazon EMR benchmark

Optimized resource utilization with a 157% increase in CPU utilization and a 38% increase in memory utilization (August 2021)

*TPC-DS is the Decision Support benchmark from the Transaction Processing Performance Council (TPC), an industry-standard big data analytics benchmark. Pepperdata’s work is not an official audited benchmark as defined by TPC. Amazon EKS benchmark configuration: 1 TB dataset, 500 nodes, and 10 parallel applications with 275 executors per application.
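
As a rough illustration of what the benchmark percentages above mean in practice, here is a minimal Python sketch of the arithmetic. The baseline figures (10,000 instance hours, a $0.50 hourly rate, 20% CPU utilization, 30% memory utilization) and the function names are hypothetical, chosen only to make the math concrete; they are not Pepperdata or TPC-DS data. The sketch also assumes cost scales linearly with instance hours.

```python
def apply_reduction(baseline: float, reduction_pct: float) -> float:
    """Value remaining after a percentage reduction (41.8% off leaves 58.2%)."""
    return baseline * (1 - reduction_pct / 100)


def apply_increase(baseline: float, increase_pct: float) -> float:
    """Value after a relative percentage increase (a 157% increase is 2.57x)."""
    return baseline * (1 + increase_pct / 100)


if __name__ == "__main__":
    # Hypothetical baseline values, for illustration only.
    baseline_hours = 10_000.0   # instance hours before optimization
    hourly_rate = 0.50          # USD per instance hour
    baseline_cpu_util = 20.0    # percent
    baseline_mem_util = 30.0    # percent

    # Percentages quoted in the benchmark summaries above.
    hours_after = apply_reduction(baseline_hours, 41.8)   # EKS, October 2023
    cpu_after = apply_increase(baseline_cpu_util, 157.0)  # EMR, August 2021
    mem_after = apply_increase(baseline_mem_util, 38.0)   # EMR, August 2021

    print(f"Instance hours: {baseline_hours:.0f} -> {hours_after:.0f}")
    print(f"Cost (USD): {baseline_hours * hourly_rate:.2f} -> {hours_after * hourly_rate:.2f}")
    print(f"CPU utilization: {baseline_cpu_util:.1f}% -> {cpu_after:.1f}%")
    print(f"Memory utilization: {baseline_mem_util:.1f}% -> {mem_after:.1f}%")
```

Note that the utilization figures are relative increases: under these assumed baselines, a 157% increase moves CPU utilization from 20% to roughly 51%, not to 157%.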

Explore More

Looking for a safe, proven method to reduce waste and cost by 30% or more and maximize value for your cloud environment? Sign up now for a free cost optimization demo to learn how Pepperdata Capacity Optimizer can help you start saving immediately.
