In this blog series we’re examining the Five Myths of Kubernetes Resource Optimization.
The fifth and final myth in this series concerns another common assumption among Kubernetes users: that Dynamic Allocation for Apache Spark applications automatically prevents Spark from overprovisioning resources while improving workload utilization.
The Value of Apache Spark Dynamic Allocation
Apache Spark Dynamic Allocation is a useful feature that was developed through the Spark community’s focus on continuous innovation and improvement. This feature optimizes the resource utilization of Spark applications by dynamically adding and removing executors based on workload requirements. It attempts to fully utilize the available task slots per executor, eliminating the need for developers to rightsize the number of executors before applications start running.
Because of these benefits, Spark Dynamic Allocation is widely considered a no-brainer: if the application architecture can handle it, most developers will enable it.
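For context, here is a minimal sketch of what enabling Dynamic Allocation typically looks like in PySpark. The property values (executor counts, timeout) and the application name are illustrative assumptions, not tuned recommendations for any particular workload.

```python
from pyspark.sql import SparkSession

# Minimal sketch: enabling Spark Dynamic Allocation from PySpark.
# All values below are illustrative assumptions, not recommendations.
spark = (
    SparkSession.builder
    .appName("dynamic-allocation-demo")  # hypothetical application name
    .config("spark.dynamicAllocation.enabled", "true")
    # On Kubernetes (Spark 3.x), shuffle tracking lets executors be released
    # without an external shuffle service.
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "20")
    # Release executors that have been idle for 60 seconds.
    .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
    .getOrCreate()
)
```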
But an important question to ask is: What can Spark Dynamic Allocation not do?
What Spark Dynamic Allocation Cannot Do
- Tasks Cannot Use Their Full Allocation at All Times
  If an executor can run a certain number of tasks, then ideally that many tasks should be running at all times. But for most applications this number is not constant, because the scheduler does not see the actual resource usage inside tasks; it relies on allocations rather than utilization, which wastes resources. As we saw with application resource requirements in Myth 4, allocations are typically set to accommodate peak usage, even though applications and the tasks within them rarely run at peak. In fact, the number of running tasks often varies dramatically over time.
- Spark Dynamic Allocation Leaves Waste on the Table Due to Task Variability
  No matter what a Spark developer does, there is no knob within Spark that forces all tasks to fully use the available executor capacity. As a result, Spark executors underutilize resources, leading to waste and unnecessary spend.
- Spark Dynamic Allocation Cannot Guarantee Equitable Resource Allocation in Multi-Tenant Environments
  Even when Spark Dynamic Allocation is enabled, a Spark application can request, and potentially consume, all of the cluster's resources. If more than a few applications are running, these resource-hungry applications can starve or even stop other applications in the same cluster. The problem is amplified in multi-tenant environments, which are common for SaaS-based applications, where it can lock entire users or teams out of the environment. One partial mitigation is sketched below.
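The sketch below shows one partial mitigation: capping how many executors a single application may request via spark.dynamicAllocation.maxExecutors. The cap value, resource settings, and application name are illustrative assumptions, and a per-application cap does not by itself enforce equitable sharing across tenants.

```python
from pyspark.sql import SparkSession

# Hedged sketch: capping a single application's executor count. Values are
# illustrative; a per-app cap does not guarantee fair sharing cluster-wide.
spark = (
    SparkSession.builder
    .appName("capped-tenant-app")  # hypothetical application name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    # Hard ceiling on executors for this application only; other tenants'
    # applications are unaffected unless they set their own caps.
    .config("spark.dynamicAllocation.maxExecutors", "10")
    # Per-executor requests still reflect allocations, not actual utilization,
    # so waste inside each executor remains possible.
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .getOrCreate()
)
```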
Spark Dynamic Allocation: A Useful but Incomplete Solution
Spark Dynamic Allocation provides significant efficiency benefits by automatically adding executors when there is a backlog of pending tasks and removing them when they sit idle. It also eliminates the need for developers to rightsize the number of executors up front.
However, Spark Dynamic Allocation is not a standalone solution to the problem of Spark optimization, because it cannot prevent low resource utilization inside Spark executors. Even with Dynamic Allocation in place, resources are often still underutilized: the executor schedules work based on how many task slots are allocated, not on the actual resource utilization of the tasks filling them, so unused executor capacity remains.
Even though an executor may look full based on the number of tasks it is running, those tasks are often not using their allocated CPU and memory, meaning more work could in principle fit on the same executor. As a result, significant waste can still remain.
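To make the "full but underutilized" point concrete, here is a small pure-Python illustration with hypothetical numbers. Spark determines how many tasks an executor runs concurrently from its allocated slots (spark.executor.cores divided by spark.task.cpus), not from measured CPU or memory use.

```python
# Illustrative numbers only: an executor that is "full" by slot count can
# still be far from fully utilized.
executor_cores = 4        # spark.executor.cores (allocated)
task_cpus = 1             # spark.task.cpus (allocated per task)
task_slots = executor_cores // task_cpus   # 4 concurrent tasks -> looks "full"

measured_cpu_cores_in_use = 1.6   # hypothetical observation from monitoring
utilization = measured_cpu_cores_in_use / executor_cores

print(f"Concurrent task slots: {task_slots}")
print(f"CPU utilization while 'full': {utilization:.0%}")   # ~40% in this example
```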
Summarizing the Five Myths
That wraps up our examination of the five myths of Kubernetes Resource Optimization! Here’s a quick recap of each myth and why buying into it means you still leave money and capacity on the table:
Myth 1. Observability & Monitoring
Observing and monitoring my Kubernetes environment means I’ll be able to find the wasteful apps and tune them.
The Truth About Observability & Monitoring
Observing and monitoring your Kubernetes environment can help you find pockets of waste that increase costs, but finding the waste isn’t the same as fixing it. Recommendations for eliminating waste simply generate more tasks to complete, which become impossible to implement at scale. Busy developers may be unwilling to implement such recommendations for apps that aren’t actually broken. And waste still exists even after tuning for peak resource usage, because the non-peak times are still driving peak-level costs.
Myth 2. Cluster Autoscaling
Cluster Autoscaling stops applications from wasting resources.
The Truth About Cluster Autoscaling
Cluster Autoscaling adds tremendous value in automatically responding to requests for resources and terminating instances when they're no longer needed. However, applications—and specifically Apache Spark executors—still generate waste and operate at lower utilization by requesting resources and not using them, regardless of whether Cluster Autoscaling is enabled or not.
Myth 3. Instance Rightsizing
Choosing the right instances will eliminate the waste in my cluster.
The Truth About Instance Rightsizing
Instance Rightsizing can reduce costs by aligning application needs with instance resources. However, it cannot prevent inefficient applications from driving waste, even with optimal instance types. Furthermore, the choice of instance type cannot be changed dynamically from second to second as application resource requirements change, which leads to waste.
Myth 4. Manual Application Tuning
Application tuning can eliminate all of the waste in my applications.
The Truth About Manual Application Tuning
Application tuning can pull resource allocations down to the peak of the utilization curve while preventing the application from failing due to too few resources. However, it cannot eliminate the waste that still occurs when the utilization curve is below its peak, which is most of the time, nor can it account for changing needs as data characteristics shift dynamically. This waste from non-peak times driving peak-level costs is still significant, typically 30% or more for most Kubernetes applications. And most of the time, busy developers want to be developing, not tuning applications.
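As a small, hypothetical illustration of peak-based tuning for a Spark executor, the sketch below pins allocations to an observed peak. The peak figures and application name are assumptions taken from monitoring, not Spark-provided values.

```python
from pyspark.sql import SparkSession

# Hedged sketch: sizing executor allocations to an observed peak. The peak
# figures below are hypothetical monitoring observations, not Spark API values.
observed_peak_memory_gb = 12
observed_peak_cores = 4

spark = (
    SparkSession.builder
    .appName("peak-tuned-app")  # hypothetical application name
    # Allocations sized for the peak: every non-peak minute still pays
    # for peak-level resources.
    .config("spark.executor.memory", f"{observed_peak_memory_gb}g")
    .config("spark.executor.cores", str(observed_peak_cores))
    .getOrCreate()
)
```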
Myth 5. Spark Dynamic Allocation
Spark Dynamic Allocation automatically prevents Spark from wasting resources.
The Truth About Apache Spark Dynamic Allocation
As we saw above, Apache Spark Dynamic Allocation is a no-brainer for many applications, since it eliminates the need for developers to rightsize the number of executors by attempting to fully utilize the available task slots per executor. However, Spark Dynamic Allocation cannot prevent low resource utilization inside Spark executors. Even when it is implemented, Spark applications still underutilize resources because, most of the time, tasks are not consuming resources at their peak allocation levels.
We have one more blog article in this series: an extra, bonus myth that we haven’t covered yet, along with a solution to the fundamental problem of Spark applications wasting resources. Stay tuned for a sneak peek!