Code Analyzer for Apache Spark

Pepperdata® Code Analyzer for Apache Spark enables Spark application developers to easily identify performance bottlenecks in their applications. Code Analyzer correlates real-time cluster resource use (CPU, memory, network, and disk I/O) to particular blocks of application code.

Code Analyzer for Apache Spark provides performance feedback during the Code, Build, Test, and Release phases of the DevOps cycle.

Benefits for Developers

  • Identify the lines of code and the stages that cause performance issues related to CPU, memory, garbage collection, network, and disk I/O

  • Easily disambiguate resources used during parallel stages

  • Understand why run-time variations occur for the same application

  • Determine whether performance issues are due to the application or other workloads on the cluster

Benefits for Managers

  • Improve communication of performance issues between Dev and Ops

  • Shorten time to production

  • Increase cluster ROI

Benefits for Ops

  • Reduce the number of performance incidents in production

  • Easily communicate detailed performance issues back to developers

Code-Centric Approach

Code Analyzer uses a code-centric approach that presents the developer with a code block and correlates it with the timeline of cluster resources consumed during execution. This enables developers to pinpoint specific segments of code and stages that require optimization.

For example, if an application consumes a lot of CPU while two stages run in parallel, the Apache Spark UI alone cannot show which stage is responsible. Because Code Analyzer overlays a time-series view of resource use with the stages that were running, it becomes straightforward to see which stage is driving the high CPU usage.
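The attribution problem is easy to reproduce outside Spark. The sketch below is a minimal, illustrative analogue: two plain Python threads stand in for two Spark stages running concurrently (in a real Spark application, jobs submitted from separate threads on one SparkContext run their stages in parallel). The stage names and workload are hypothetical, not part of Code Analyzer; the point is that a single process-wide CPU counter covers both stages combined and cannot say which one dominated.

```python
import threading
import time

def stage(name, n, results):
    # CPU-bound loop standing in for the work a Spark stage performs.
    total = 0
    for i in range(n):
        total += i * i
    results[name] = total

results = {}
t0 = time.process_time()

# Two "stages" run concurrently, analogous to two Spark stages
# scheduled at the same time on the same application.
threads = [
    threading.Thread(target=stage, args=(name, 200_000, results))
    for name in ("stageA", "stageB")
]
for t in threads:
    t.start()
for t in threads:
    t.join()

cpu = time.process_time() - t0

# One aggregate number for both stages together; without per-stage,
# per-time-slice attribution there is no way to tell which stage
# consumed more CPU. This is the gap a time-series view overlaid
# with the running stages closes.
print(f"combined CPU seconds for both stages: {cpu:.2f}")
```

Aggregate counters like this are essentially what a cluster-wide metrics dashboard reports; correlating the timeline with the stages that were active at each moment is what makes the per-stage answer recoverable.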

Using the Apache Spark UI or other tools, it is very difficult to understand variance in an application's run-time performance, because none of these tools provides the context of what else is running on the cluster. Code Analyzer shows the “cluster weather” so that developers can determine whether a performance variation was caused by their application or by the environment in which it ran.

Seamless Integration with Pepperdata Products

Code Analyzer for Apache Spark is integrated with the other Pepperdata products to provide an end-to-end DevOps solution, combining overall cluster awareness (monitoring, troubleshooting, and alerting) with deep recommendations for improving the performance of individual jobs.

Works With All Big Data Distributions

Code Analyzer works with Big Data distributions including Apache Hadoop, Cloudera, Hortonworks, MapR, and IBM. Pepperdata supports clusters running on-premises and in the cloud.