In part one of this two-part blog post, we began our deep dive into tuning Apache Spark to optimize resource usage. We looked at what is involved in executor and partition sizing, particularly how to choose the number of partitions and an executor size. After establishing some principles of optimization, we ended by asking an important question: Is it really practical to optimize every application?
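As a quick refresher on the sizing math, the partition-count heuristic can be sketched in a few lines. This is a generic back-of-envelope rule of thumb (roughly 128 MB of input per partition, rounded up to a multiple of the cluster's executor cores so no core sits idle), not a formula from part one; the function name and parameters are illustrative.

```python
import math

# Assumption: a common community rule of thumb, not a Spark or Pepperdata API.
# Target ~128 MB of input per partition, and keep the partition count a
# multiple of the total executor cores so every core does work in each stage.
def suggested_partitions(input_bytes, total_cores, target_partition_bytes=128 * 1024**2):
    """Back-of-envelope partition count for a Spark stage."""
    by_size = math.ceil(input_bytes / target_partition_bytes)
    # Round up to the next multiple of total_cores to avoid idle cores.
    return max(total_cores, math.ceil(by_size / total_cores) * total_cores)

# Example: 100 GB of input on a cluster with 40 executor cores.
print(suggested_partitions(100 * 1024**3, 40))  # → 800
```

A number like this would then feed into settings such as `spark.sql.shuffle.partitions` or an explicit `repartition()` call; the right target size still depends on the workload, which is exactly why the question above matters.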
As our recent State of the Market report helped reveal, the answer is two-sided. The good news? Yes. The bad news? It’s pretty much impossible without the right tools.
The Challenge of Optimizing All Your Apps
It’s actually quite difficult, even for knowledgeable developers, to keep their jobs optimized because it’s not always clear how many resources an application will use. Pepperdata Platform Spotlight helps with this by showing both the allocated and the used resources for a given job run (see Figure 1).