Cluster Analyzer for Amazon EMR

Pepperdata® Cluster Analyzer for Amazon Elastic MapReduce (EMR) monitors the cluster and gives DevOps teams actionable performance feedback so they can quickly pinpoint, diagnose, and fix problems in the cluster.

Cluster Analyzer enables customizable real-time alerting as well as chargeback tracking and allocation. It also provides one-click access to historical runs, so job runs can be compared over time. Installing and running Cluster Analyzer for Amazon EMR requires only a one-line configuration change. To start using Cluster Analyzer for EMR, sign up on the Amazon Web Services Marketplace.

Cluster Analyzer for Amazon EMR provides performance feedback to the Deploy and Operate phases of the DevOps cycle.

Benefits for Ops

  • View historical data of all transient EMR cluster runs

  • Quickly identify unused EMR cluster capacity

  • Proactively alert on outlier behavior

Benefits for Managers

  • Reduce operational costs of running EMR clusters

  • Increase productivity

Benefits for Devs

  • Understand how jobs use resources

  • Understand why jobs run slowly

  • Understand performance of different job phases

  • Compare job runs over time

Cluster Analyzer for Amazon EMR displays real-time and historical hardware usage charts and reports, with drill-down capability, via a cloud-hosted dashboard.

Run Jobs Faster and Reduce Cost

Amazon EMR clusters are ephemeral. When a run completes, the cluster terminates, and all performance data is lost. Without that data, it is almost impossible to troubleshoot problems in the cluster or pinpoint areas for improvement. Managing cost is also a top priority for Amazon EMR customers. Because Amazon EMR charges on an hourly basis, longer run times typically result in significant overall cost increases.
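To make the cost point concrete, here is a rough sketch of how run time translates into cost under the hourly billing described above. It assumes that every started hour is billed in full for each node; the function name, node count, and rate are hypothetical values chosen for illustration, not actual Amazon EMR pricing.

```python
import math


def estimated_run_cost(run_minutes, node_count, hourly_rate_per_node):
    """Estimate the cost of one cluster run under hourly billing.

    Assumption: each started hour is billed in full for every node.
    """
    billed_hours = math.ceil(run_minutes / 60)
    return billed_hours * node_count * hourly_rate_per_node


# Hypothetical figures, for illustration only.
slow_run = estimated_run_cost(run_minutes=130, node_count=10, hourly_rate_per_node=0.50)
fast_run = estimated_run_cost(run_minutes=115, node_count=10, hourly_rate_per_node=0.50)
print(slow_run, fast_run)
```

Under these assumptions, trimming the run from 130 to 115 minutes drops it from three billed hours to two, so a modest speedup can remove a whole billed hour across every node.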

Pepperdata Cluster Analyzer for EMR collects hundreds of metrics that track the use of CPU, memory, disk I/O, and network resources by container/task, job, and user on a second-by-second basis. Operators and developers can use this highly granular performance data to troubleshoot and solve performance issues in their applications and clusters.
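As an illustration of what per-second, per-container data makes possible, the sketch below rolls container samples up to the job level. The record layout, sample values, and function name are all hypothetical; this is not the Pepperdata data format or API, only a minimal example of the kind of aggregation such data allows.

```python
from collections import defaultdict

# Hypothetical per-second samples:
# (timestamp_s, user, job, container, cpu_pct, memory_mb)
samples = [
    (0, "alice", "job-1", "c1", 80.0, 1024),
    (0, "alice", "job-1", "c2", 40.0, 512),
    (1, "alice", "job-1", "c1", 90.0, 1100),
    (1, "bob", "job-2", "c3", 20.0, 256),
]


def peak_memory_by_job(samples):
    """Return each job's peak total memory (MB), summed over its containers."""
    per_second = defaultdict(int)  # (job, timestamp) -> total MB in that second
    for ts, _user, job, _container, _cpu, mem in samples:
        per_second[(job, ts)] += mem

    peaks = defaultdict(int)  # job -> highest per-second total seen
    for (job, _ts), total in per_second.items():
        peaks[job] = max(peaks[job], total)
    return dict(peaks)


print(peak_memory_by_job(samples))
```

The same grouping could be keyed by user instead of job to support chargeback-style reporting.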

This performance data enables granular analysis of current and historical runs, helping the DevOps team optimize workloads and cut run times that are inflated by code inefficiencies. Instant visibility into cluster utilization makes it easier to determine the configuration that completes jobs in the shortest run time and at the lowest cost.

Certified on All Big Data Distributions

Cluster Analyzer is certified on all Big Data distributions, including Apache, Cloudera, Hortonworks, MapR, and IBM, with support for both MapReduce and Spark. Pepperdata supports clusters running on-premises and in the cloud.