Capacity Optimizer solves the problem of application-level underutilization by giving the scheduler real-time visibility into actual CPU, GPU, and memory utilization inside applications, enabling more intelligent resource allocation decisions.
Capacity Optimizer enables the YARN or Kubernetes scheduler to see actual physical resource utilization instead of the total resource allocations requested by applications, so the scheduler can launch more pods without launching new nodes.
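To make that difference concrete, here is a minimal, purely illustrative Python sketch (not Pepperdata code) of the CPU headroom a scheduler "sees" on a single node when it reasons from requested allocations versus observed utilization; the node size and pod numbers are invented for the example.

```python
# Illustrative only: compares the headroom a scheduler "sees" on one node
# when it subtracts requested allocations versus observed utilization.
# All numbers are hypothetical.

NODE_CPU_CORES = 64

# (requested cores, actually used cores) for pods already on the node
pods = [
    (8, 2.5),   # an executor requesting far more than it uses
    (16, 4.0),
    (8, 1.0),
    (16, 6.5),
]

requested = sum(r for r, _ in pods)
used = sum(u for _, u in pods)

allocation_headroom = NODE_CPU_CORES - requested   # what allocation-based scheduling sees
utilization_headroom = NODE_CPU_CORES - used       # what utilization-aware scheduling sees

print(f"Requested: {requested} cores, actually used: {used} cores")
print(f"Headroom by allocation:  {allocation_headroom} cores")
print(f"Headroom by utilization: {utilization_headroom} cores")
# By allocation the node looks nearly full (16 cores free), while actual usage
# leaves roughly 50 cores of physical capacity idle on the same node.
```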
A multi-tenant cluster running many concurrent apps is an ideal candidate for Pepperdata Capacity Optimizer.
For Kubernetes: Apache Spark apps, Apache Flink apps, Apache Airflow apps, Jobs, JobController, CronJobs on Kubernetes, custom labeled apps
For YARN: Apache Spark apps, MapReduce apps, Apache Tez apps
Yes, Capacity Optimizer is complementary to all other solutions in the marketplace. Continue to do what you’re doing to optimize your cloud operations, using your existing tooling and processes. Then implement Capacity Optimizer on top of it to achieve an additional cloud cost reduction.
If you run only a handful of instances in the cloud, an engineer might be able to optimize that workload by hand. At larger scale, however, it is impossible to do manually what Capacity Optimizer does. Capacity Optimizer works directly with the native Kubernetes or YARN scheduler to make hundreds or thousands of decisions in real time, around the clock. It operates in the background, autonomously and continuously, optimizing your cloud or on-premises environment in a way that far exceeds what even the most diligent engineer could accomplish.
Yes, cloud providers do provide services that achieve a certain level of resource and cost optimization, such as autoscaling. However, these vendor-provided solutions only go so far. Pepperdata’s IP allows Capacity Optimizer to take advantage of the volumes of data that are involved in the scheduler’s real-time decision making. Capacity Optimizer takes resource and cost optimization to the next level by providing the scheduler with this granular data to ensure pods/executors are continuously operating at maximum utilization. Pepperdata is also an AWS partner, and AWS regularly invites Pepperdata into opportunities where Pepperdata can solve customer problems that the managed service alone cannot.
As new releases of these technologies are introduced, Capacity Optimizer goes through a certification process that takes approximately 30 days, depending on how much has changed from the previous version. Pepperdata’s Customer Success team is happy to work with you to deliver maximum value from your engagement with Pepperdata, including ensuring the smoothest possible rollout of new technologies.
Supported Environments
Supported Schedulers
Supported Autoscalers
No, Capacity Optimizer does not replace any of the cluster autoscaler’s standard components. Capacity Optimizer simply provides those components with better information with which to operate.
Capacity Optimizer makes your current cloud autoscaler work better. Cloud autoscaling is implemented based on resource allocation. Cloud autoscalers add more instances when the scheduler cannot add more applications to the cluster because all the existing resources are being used. Capacity Optimizer ensures that the new instances are added only when the existing instances are fully utilized.
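As a hedged illustration (not Pepperdata’s actual logic), the sketch below contrasts a scale-up decision driven by allocation with one driven by measured utilization; the cluster size, thresholds, and pod request are invented for the example.

```python
# Illustrative scale-up decision for a pending pod, with invented numbers.
# An allocation-driven autoscaler adds a node when requests no longer fit;
# a utilization-aware approach first checks whether physical capacity remains.

CLUSTER_CPU = 256          # total cores across existing nodes
allocated = 240            # sum of CPU requests already granted
used = 110                 # measured CPU actually in use
pending_pod_request = 32   # cores requested by the pod waiting to be scheduled

fits_by_allocation = allocated + pending_pod_request <= CLUSTER_CPU
fits_by_utilization = used + pending_pod_request <= CLUSTER_CPU

if not fits_by_allocation:
    print("Allocation-based view: no room -> autoscaler launches a new node")
if fits_by_utilization:
    print("Utilization-based view: idle capacity remains -> schedule on existing nodes")
```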
Karpenter is an open-source Kubernetes node autoscaler that provisions right-sized compute in response to unschedulable pods.
Although Karpenter is a more capable autoscaler than the default Cluster Autoscaler, it still makes its decisions based on resource allocations rather than actual physical hardware utilization, which is what Capacity Optimizer uses.
Capacity Optimizer uses actual utilization to determine how many resources are available in real time and informs the scheduler so that more pending pods can be launched on the same node.
Horizontal Pod Autoscaling (HPA) is best suited for long-running microservices, while Capacity Optimizer is suited for batch applications that start and stop. HPA also applies only to Kubernetes, not to YARN or any other environment.
Capacity Optimizer prohibits the Cluster Autoscaler from launching new nodes until Capacity Optimizer has fully optimized all existing nodes. In this way, new nodes are launched only when existing nodes are fully utilized. In fact, customers have noticed up to a 71% decrease in the use of autoscaling once Capacity Optimizer is enabled. Capacity Optimizer makes no changes to the Cluster Autoscaler’s downscaling behavior.
Capacity Optimizer works with Karpenter in a similar way to how it works with the default Cluster Autoscaler.
Vertical Pod Autoscaling (VPA) is a component within Kubernetes designed to automatically resize the CPU and memory requests of pods based on their observed, historical usage patterns.
It might seem that Capacity Optimizer acts like VPA, since both Capacity Optimizer and VPA change the resource requests of pods in response to changing application resource requirements. However, there are key differences:
VPA (historically) requires pod restarts, which Capacity Optimizer does not: when VPA updates a pod’s requests, the pod must be restarted. This may take only a few seconds, but it still introduces a delay, which can be particularly challenging for stateful applications or workloads sensitive to downtime. Because Capacity Optimizer updates pods when they are launched, restarts are not an issue. Note that Kubernetes 1.33 supports resizing pods without a restart; although this feature is enabled by default, it is still in beta in that release.
Kubernetes In-Place Pod Resizing is a new feature (recently graduated to beta as of Kubernetes 1.33) for changing the requests and limits for CPU and memory in a pod’s specification. Kubernetes will then attempt to adjust the resources allocated to the running pods without requiring a restart.
It might seem that Capacity Optimizer is similar to Kubernetes In-Place Pod Resizing, since both Capacity Optimizer and Kubernetes In-Place Pod Resizing change the resource requests of pods in response to changing application resource requirements.
However, there is a key difference: Kubernetes In-Place Pod Resizing is simply a mechanism for updating pod resource requests dynamically without restarting the pods. Any product designed to update pod requests could in principle use it: VPA could use it, and even Capacity Optimizer could use it.
But that is all In-Place Pod Resizing does. It does not monitor pods or decide what the requests should be changed to; it has no intelligence of its own and simply applies the update. You can think of it as a knob that sits there waiting for some intelligence to adjust it.
Capacity Optimizer, by contrast, is the intelligence that decides how the pod resource requests should be adjusted in response to changing application resource requirements. Capacity Optimizer uses actual utilization to determine how many resources are available in real time and informs the scheduler so that more pending pods can be launched on the same node.
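A minimal, purely illustrative Python sketch of that separation of concerns follows: resize_pod stands in for the mechanism (such as in-place resize) and decide_new_request stands in for the intelligence that chooses the value. Both functions, the pod name, and the numbers are hypothetical; this is not Pepperdata or Kubernetes code.

```python
# Hypothetical sketch: the "knob" (a resize mechanism) versus the "intelligence"
# (the policy that decides what value to set). Not Pepperdata or Kubernetes code.

def resize_pod(pod_name: str, cpu_request_millicores: int) -> None:
    """The mechanism: applies whatever value it is given (e.g., via in-place resize)."""
    print(f"resize {pod_name}: cpu request -> {cpu_request_millicores}m")

def decide_new_request(observed_usage_millicores: list[int], headroom: float = 1.2) -> int:
    """The intelligence: derives a right-sized request from observed utilization."""
    peak = max(observed_usage_millicores)
    return int(peak * headroom)   # add a safety margin over the observed peak

# Example: a pod that requested 4000m but has never used more than ~1500m.
usage_samples = [900, 1200, 1500, 1100]
resize_pod("spark-executor-7", decide_new_request(usage_samples))
# -> resize spark-executor-7: cpu request -> 1800m
```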
Pepperdata’s pricing is based on your usage. Book a meeting with us to get up and running with a free trial.
Capacity Optimizer typically installs within an hour in most enterprise environments. As soon as optimization is enabled in your cluster, you will start to see waste and cost savings on your Pepperdata-provided dashboard.
Pepperdata Capacity Optimizer operates automatically and autonomously and should require very little ongoing management as it works in the background to improve utilization and reduce costs in your cluster. It’s designed to reduce operational burden, not add to it.
Yes! We welcome the opportunity to bring the same cost savings we see with leading enterprises into your environment. Book a meeting with us to get up and running with a free trial.
Capacity Optimizer typically pays for itself in hard cloud cost savings within a few months of installing. One of the largest companies in the world recently installed Pepperdata on a single cluster and shaved 28% off their cloud cost. In the case of this customer, that cost savings translates to over $400,000 in reduced cloud costs per year—for a single cluster.
In addition, with Capacity Optimizer, you will also enjoy the soft cost savings of reduced personnel costs. You won’t need dedicated engineers constantly monitoring your systems. In addition, your finance personnel will no longer need to corral your engineering teams to implement configuration recommendations with the goal of saving costs. Instead, your development teams can be freed from the tedium of tweaking and tuning code to focus on high-value, innovative activities to grow your business.
You should see your clusters operating at a higher level of utilization of physical hardware resources once Capacity Optimizer is enabled. This can be evaluated through Pepperdata’s dashboard and possibly through existing observability tools you may have. You should also expect to see savings on your next cloud bill.
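If you want an independent check alongside the Pepperdata dashboard, the hedged Python sketch below compares cluster-wide CPU actually in use with CPU requested. It assumes a Prometheus stack scraping cAdvisor and kube-state-metrics; the Prometheus address is hypothetical, and your metric labels or setup may differ.

```python
# Hedged example: compare cluster CPU usage vs. CPU requests via Prometheus.
# Assumes Prometheus is reachable at PROM_URL and scrapes cAdvisor and
# kube-state-metrics; adjust the URL and queries to your environment.
import requests

PROM_URL = "http://prometheus.example.internal:9090"  # hypothetical address

def prom_query(expr: str) -> float:
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": expr}, timeout=10)
    resp.raise_for_status()
    # Takes the first result; a real script should handle empty results.
    return float(resp.json()["data"]["result"][0]["value"][1])

used_cores = prom_query('sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))')
requested_cores = prom_query('sum(kube_pod_container_resource_requests{resource="cpu"})')

print(f"CPU in use:    {used_cores:.1f} cores")
print(f"CPU requested: {requested_cores:.1f} cores")
print(f"Utilization of requested capacity: {100 * used_cores / requested_cores:.0f}%")
```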
Reserved Instances/Savings Plans are financial optimizations that can help reduce your cloud bill, but do not address the fundamental problem of overprovisioning waste that exists inside your applications, like Capacity Optimizer does. Capacity Optimizer is complementary to Reserved Instances/Savings Plans and will save money in addition to the reductions provided by Reserved Instances/Savings Plans.
For critical apps that cannot tolerate any preemptions, or that have strict SLAs, it is a best practice to use conservative optimization settings or to exclude the app or namespace from optimization.
Yes, Capacity Optimizer respects these limits, even down to the individual Apache Spark application.
Consider an application that requests 100 cores of CPU and 1 TB of memory, but this application only requires these resources on Thursdays. With Capacity Optimizer, other pods/executors from other applications would be able to use that application’s resources the other six days of the week, without anyone having to make any changes to the original application.
Moreover, Capacity Optimizer makes no changes to the application itself; it does not need to know that the application only uses its resources on Thursdays. On idle days, Capacity Optimizer simply informs (and re-informs) the scheduler and autoscaler every few seconds that the application is not using the resources it requested, so those resources are not wasted. Capacity Optimizer does the same for every pod/executor on every node.
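To put rough numbers on that example (the figures come from the scenario above, and the arithmetic is just for illustration), the short sketch below estimates how much of the reserved CPU capacity would otherwise sit idle each week.

```python
# Back-of-the-envelope math for the example above; all figures are illustrative.
CPU_CORES_REQUESTED = 100
HOURS_PER_WEEK = 7 * 24
ACTIVE_HOURS = 24          # the application only works on Thursdays

reserved_core_hours = CPU_CORES_REQUESTED * HOURS_PER_WEEK                 # 16,800
idle_core_hours = CPU_CORES_REQUESTED * (HOURS_PER_WEEK - ACTIVE_HOURS)    # 14,400

print(f"Reserved per week: {reserved_core_hours} core-hours")
print(f"Idle per week:     {idle_core_hours} core-hours "
      f"({100 * idle_core_hours / reserved_core_hours:.0f}% of what was reserved)")
# Capacity Optimizer lets other pods use those ~14,400 idle core-hours instead of
# leaving them stranded behind the original allocation.
```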
Looking for a safe, proven method to reduce waste and cost by 30% or more and maximize value for your cloud environment? Sign up now for a free cost optimization demo to learn how Pepperdata Capacity Optimizer can help you start saving immediately.