Myth #1 of Kubernetes Resource Optimization: Observability & Monitoring

In this blog series we’ll be examining the Five Myths of Kubernetes Resource Optimization. (Stay tuned for the entire series!)

The first myth reflects a common assumption among Kubernetes users: observing and monitoring your Kubernetes environment means you’ll be able to find the applications underutilizing resources and tune them.

Certainly, identifying wasteful apps and tuning them for greater efficiency is a great starting point for resource optimization. This effort typically involves deploying services like Amazon CloudWatch or third-party application monitoring tools, which analyze running applications and propose configuration changes for increased efficiency. Some observability and monitoring solutions even provide specific tuning recommendations for individual applications, such as “Change spark.driver.memory from 6g to 4g.”
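
As a purely hypothetical illustration, acting on that recommendation for a PySpark job might mean lowering the driver memory setting wherever the job is configured, for example at session creation (depending on how the job is launched, the setting may instead need to be passed to spark-submit as --driver-memory 4g):

```python
# Minimal sketch, not output from any particular tool: applying a
# "spark.driver.memory: 6g -> 4g" recommendation in a PySpark job.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("nightly-etl")               # hypothetical job name
    .config("spark.driver.memory", "4g")  # was "6g" before the recommendation
    .getOrCreate()
)
```

The mechanics of the change are simple; the catch is that someone has to make it, validate it, and repeat the exercise for every application the tooling flags.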

However, observability and monitoring tools do not actually improve resource utilization or eliminate waste, and they certainly do not do so automatically. These tools can surface problems and often generate recommendations for remediation, but they do not solve them.

This creates a gap, because finding resource waste is not the same as fixing it.

Tuning recommendations generated by observability and monitoring tools often translate into a lengthy to-do list, especially as the number of applications increases. What to do with such a list of recommendations? Most organizations go back to their developers, list in hand.

The Challenges of Implementing Recommendations

Implementing manual tuning recommendations requires significant effort and is a primary pain point for developers, according to the FinOps Foundation. Because developers are generally not responsible for the cost of their applications, asking them to adjust configurations to minimize cost can seem outside their scope of work. Developers may even be reluctant to tweak something that seems to be running well out of fear of breaking it, following a completely reasonable mindset of “if it ain’t broke, don’t fix it.” And no developer, no matter how dedicated, can keep pace with the real-time dynamism of modern applications and their volatile, ever-changing resource requirements.

Even assuming an army of developers is at the ready to tweak and tune Kubernetes applications in real time, that still doesn’t solve the problem of waste inside the application. The waste inside Kubernetes applications stems from how resources are provisioned and used.

Overprovisioning Kubernetes Applications Leads to Higher Cost and Lower Utilization

Many Kubernetes applications utilize resources in a highly dynamic way because the data that these applications process is typically bursty, unpredictable, and highly variable. Even the way they interact with other concurrently running applications may impact their resource requirements. As a result, the resource utilization profile for a typical Kubernetes application might look like this:

Figure 1: The resource utilization of a Kubernetes application can vary dramatically and unpredictably over time.

As is evident from this graph, most applications run at peak resource usage for only a small fraction of their execution time. The peaks represent brief periods of high demand, while the valleys indicate extended stretches of low activity. The majority of an application’s runtime is spent in these lower-demand valleys, with resource-intensive peaks occurring only occasionally. In fact, as much as 80 percent of the application runtime can be in the valleys.

Figure 2: Most applications reach peak resource utilization for only a small fraction of their runtime. 
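
To make that split concrete, here is a rough sketch of how one might measure the fraction of runtime spent near the peak versus in the valleys. The utilization trace is synthetic, standing in for a real monitoring export, and the “near peak” threshold is an arbitrary choice:

```python
# Illustrative only: synthetic memory-usage samples standing in for real metrics.
import numpy as np

rng = np.random.default_rng(seed=0)
baseline = rng.uniform(0.5, 1.5, size=1_000)                         # quiet valleys (GiB)
bursts = (rng.random(1_000) < 0.05) * rng.uniform(4.0, 6.0, 1_000)   # occasional spikes
usage_gib = baseline + bursts

peak = usage_gib.max()
near_peak = usage_gib >= 0.8 * peak   # arbitrary threshold for "running at peak"
print(f"time near peak:  {near_peak.mean():.0%}")
print(f"time in valleys: {1 - near_peak.mean():.0%}")
```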

Developers are required to request a certain allocation of memory and CPU for each of their applications. To prevent their applications from being killed due to insufficient resources, developers typically request memory and CPU to accommodate peak usage (and then some on top, just to be safe). And they can only allocate these compute resources in a static way.

Figure 3: Developers are required to allocate memory and CPU for each of their Kubernetes applications in a static way. To prevent their applications from being killed due to insufficient resources, developers typically request such resources to accommodate peak usage. 
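
In Kubernetes terms, that static request looks something like the following minimal sketch, written with the official Kubernetes Python client; the container name, image, and numbers are hypothetical:

```python
# A static, peak-sized request: fixed for the life of the pod, no matter how
# little the application actually uses most of the time.
from kubernetes import client

resources = client.V1ResourceRequirements(
    requests={"cpu": "4", "memory": "8Gi"},  # sized for the occasional peak, plus headroom
    limits={"cpu": "4", "memory": "8Gi"},    # same value; it does not follow actual usage
)

container = client.V1Container(
    name="analytics-worker",                     # hypothetical container
    image="registry.example.com/analytics:1.0",  # hypothetical image
    resources=resources,
)
```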

Some cost-conscious developers might make an effort to lower the provisioning line as far as possible, aligning it with the application’s peak resource requirement.

Figure 4: From a cost-cutting perspective, the best a developer can do via manual tweaking and tuning is to reduce their resource request level to match the peak of what an application requires. However, since most applications run at peak only about 20 percent of the time, significant waste often remains.

However, even if a developer reduces the allocation level to match the application’s peak requirement, they cannot “bend the allocation line” to follow the peaks and valleys of actual resource usage as it varies in real time. As a result, waste cannot be eliminated entirely by tweaking and tuning alone.
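
A back-of-the-envelope sketch makes the point. The usage samples below are invented; in practice they would come from your monitoring tool:

```python
# Even with the best-case static allocation (request == peak), the gap between
# the flat allocation line and the actual usage curve is pure waste.
usage_gib = [1.2, 0.9, 1.1, 5.8, 1.0, 0.8, 1.3, 0.9, 6.0, 1.1]  # sampled memory usage

allocation_gib = max(usage_gib)                   # best case a developer can reach
waste = [allocation_gib - u for u in usage_gib]   # flat line minus the curve

print(f"static allocation: {allocation_gib:.1f} GiB")
print(f"average usage:     {sum(usage_gib) / len(usage_gib):.1f} GiB")
print(f"average waste:     {sum(waste) / len(waste):.1f} GiB")
```

Closing that remaining gap requires the allocation itself to change continuously with usage, which is exactly what manual tweaking and tuning cannot do.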

Finding Waste ≠ Fixing Waste

Identifying wasteful apps is a great starting point for cost optimization, but observability and monitoring tools don’t fix waste. Attempts to remediate this waste through manual tweaking and tuning can only go so far. To peek ahead at a solution, check out this page on Pepperdata Resource Optimization.

In our next blog entry in this series, we’ll examine the second myth, which centers around cluster autoscaling. Stay tuned!

Explore More

Looking for a safe, proven method to reduce waste and cost by 30% or more and maximize value for your cloud environment? Sign up now for a free cost optimization demo to learn how Pepperdata Capacity Optimizer can help you start saving immediately.