In this blog series we are examining the Five Myths of Apache Spark Optimization. So far we’ve looked at Myth 1: Observability and Monitoring and Myth 2: Cluster Autoscaling. Stay tuned for the entire series!
The third myth addresses another common assumption of many Spark users: Choosing the right instances will eliminate waste in a cluster.
The Value of Instance Rightsizing
Selecting appropriate cloud instance types for workload demands is an essential step in any cloud computing effort. Some applications are more CPU intensive, while others might be more memory intensive. In either case, Instance Rightsizing—aligning the CPU and memory requirements of an application with an appropriate instance type—ensures that an application can run more efficiently, which can lead to significant cost savings and performance improvements.
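As a rough illustration of what "aligning requirements with an instance type" means, the sketch below picks the cheapest instance that satisfies a workload's CPU and memory needs. The catalog, instance names, and prices are hypothetical placeholders, not real AWS offerings, and real tools weigh many more factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_usd: float


# Hypothetical catalog: names and prices are illustrative only.
CATALOG = [
    InstanceType("compute-4x8", 4, 8, 0.17),
    InstanceType("general-4x16", 4, 16, 0.19),
    InstanceType("memory-4x32", 4, 32, 0.25),
    InstanceType("general-8x32", 8, 32, 0.38),
]


def rightsize(required_vcpus, required_memory_gib, catalog=CATALOG):
    """Return the cheapest instance type that meets both requirements."""
    candidates = [
        inst for inst in catalog
        if inst.vcpus >= required_vcpus and inst.memory_gib >= required_memory_gib
    ]
    if not candidates:
        raise ValueError("no instance type in the catalog fits this workload")
    return min(candidates, key=lambda inst: inst.hourly_usd)


# A CPU-intensive and a memory-intensive workload land on different types.
cpu_heavy = rightsize(required_vcpus=4, required_memory_gib=6)
memory_heavy = rightsize(required_vcpus=4, required_memory_gib=24)
print(cpu_heavy.name, memory_heavy.name)
```

Note that the selection is only as good as the requirements fed into it, which is the limitation the rest of this post explores.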
Choosing optimal instance types from among the 600+ offered on AWS can be a daunting challenge, especially as workloads change. Many cloud practitioners rely on recommendations from cloud providers or on third-party instance recommendation tools to help with this effort. Karpenter in particular addresses many challenges around Instance Rightsizing in Kubernetes environments.
It is worth noting that an overall Instance Rightsizing effort might include additional financial optimizations such as:
- Reserved Instances and Savings Plans are volume-based and time-based purchase programs designed to reduce costs over on-demand cloud pricing. These programs provide cost and capacity predictability. This strategy can lower costs, but requires a commitment to a specific cloud provider for the duration of the reservation term. In addition, the upfront payment for the entire reservation term often requires a significant investment.
- Spot Instances reduce costs by running fault-tolerant workloads on flexible and less consistently available compute resources. This option reduces costs dramatically, but may not be an appropriate choice for all workloads since Spot Instances are subject to termination with little notice and pricing can fluctuate.
The Limitations of Instance Rightsizing
The Problem of Waste Within Applications
Although Karpenter is a powerful and popular option for Instance Rightsizing with Kubernetes, neither it nor other third-party tools completely eliminate the problem of application waste. These tools can save money by selecting best-fit instance types for each workload, but they do not address the fact that data-intensive applications such as Apache Spark tend to waste the resources that are provisioned for them.
Karpenter is simply not designed to remediate inefficiencies within applications, such as a poorly written job that is overprovisioned. It is quite common for applications to be overprovisioned, leading to “built-in” waste that is difficult to address at scale, even with an army of developers manually tuning each application.
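To make the idea of "built-in" waste concrete, here is a minimal sketch that compares what a job requests with what it uses at peak. The executor counts and memory figures are invented for illustration; the point is only that a perfectly fitted instance still pays for the full request.

```python
def provisioned_waste(requested_gib, peak_used_gib):
    """Fraction of requested memory that is never used, even at peak."""
    if requested_gib <= 0:
        raise ValueError("requested_gib must be positive")
    unused = max(requested_gib - peak_used_gib, 0.0)
    return unused / requested_gib


# Hypothetical job: 10 executors requesting 16 GiB each,
# with each executor peaking at about 6 GiB.
requested = 10 * 16
peak_used = 10 * 6
waste = provisioned_waste(requested, peak_used)
print(f"{waste:.1%} of the requested memory is never used")
```

An instance selector sees only the 160 GiB request, so it will faithfully provision capacity for memory the job never touches.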
The Problem of Real-Time Application Volatility
Another challenge with treating Instance Rightsizing as a complete cost optimization strategy is that it does not address real-time changes in application behavior. An application might be so volatile during its life cycle that the instance type that was appropriate when the application first started running is no longer appropriate by the end of its run.
“Karpenter is a great tool… but modern applications are a different beast now. There are too many real-time changes which can happen during the course of the application life cycle, which may not be easy for Karpenter to handle. It can create a place where… the optimization is not at a peak.”—Shashi Raina, Partner Solution Architect, AWS
Given the volume, velocity, and variety of modern applications in the cloud, responding to the resource needs of dynamically changing applications in real time is often impossible without some form of automation. As a result, there can be times, and indeed long periods of time, when the resources provisioned for an application are wasted because they sit underutilized.
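The effect of volatility can be sketched with a simple calculation: hold the allocation fixed at the level needed for the busiest interval and measure how much of it goes unused over the whole run. The usage samples below are made up for illustration and do not come from a real workload.

```python
def utilization_over_time(allocated_vcpus, usage_samples):
    """Per-interval utilization of a fixed allocation, capped at 100%."""
    if allocated_vcpus <= 0:
        raise ValueError("allocated_vcpus must be positive")
    return [min(used, allocated_vcpus) / allocated_vcpus for used in usage_samples]


def average_waste(allocated_vcpus, usage_samples):
    """Average fraction of the allocation left idle across all intervals."""
    utilization = utilization_over_time(allocated_vcpus, usage_samples)
    return 1.0 - sum(utilization) / len(utilization)


# Hypothetical vCPU usage per interval for one application run.
samples = [14, 16, 6, 3, 3, 12, 2, 2]
allocated = 16  # sized for the busiest interval
print(f"average idle share: {average_waste(allocated, samples):.1%}")
```

The allocation is "right" for the peak interval, yet more than half of it is idle on average in this made-up run.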
Instance Rightsizing: An Important But Partial Solution
To summarize, although Instance Rightsizing can have a major impact on an application’s performance and can significantly reduce overall cloud costs, it is not a complete solution to the problem of Apache Spark optimization. Even the most sophisticated Instance Rightsizing effort cannot remediate the waste inherent in Spark applications. The idea that cluster waste can be fully eliminated through optimal instance type selection truly is a myth.
In our next blog entry in this series, we’ll examine the fourth myth of Apache Spark, which concerns the effectiveness of manual application tuning. Stay tuned!