Installing Pepperdata Cloud for Amazon EMR

This page describes how to install and launch Pepperdata from the AWS Marketplace on your EMR cluster.
Pepperdata provides automated infrastructure optimization, improved management of AWS autoscaling, full-stack observability, and real-time insights—all in one place. With Pepperdata, you can save money, optimize big data performance, and run EMR applications more efficiently.

We’ll take you through the following steps to get you up and running with Pepperdata:


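- Getting Started
- Task 1: Launch the CloudFormation stack to create the Pepperdata Role and grant Pepperdata access
- Task 2: Create a new cluster with Pepperdata
- Task 3: Reset Your Password
- Task 4: Visit Your Pepperdata Dashboard and Start Optimizing!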


Getting Started

Visit the Pepperdata storefront on the AWS Marketplace. You can choose a 30-day Free Trial, or if you’re ready to fully engage with Pepperdata, choose the Starter Pack to access our complete suite of features and functionality.

Click Continue to Subscribe at the top of the page, and fill out the information we need to create your Pepperdata account. We’ll need your 12-digit AWS account ID so we can link your AWS and Pepperdata accounts. You’ll find this under the My Account dropdown in your AWS account.
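If you prefer to confirm the account ID before pasting it into the signup form, the following minimal sketch checks the 12-digit format and, optionally, asks AWS STS which account your current credentials belong to. The function names are illustrative and not part of any Pepperdata tooling; the lookup assumes boto3 is installed and AWS credentials are configured.

```python
import re

# An AWS account ID is exactly 12 ASCII digits.
ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{12}")


def is_valid_account_id(value: str) -> bool:
    """Return True if value is exactly 12 ASCII digits (no dashes or spaces)."""
    return ACCOUNT_ID_PATTERN.fullmatch(value) is not None


def lookup_account_id() -> str:
    """Return the account ID of the currently configured AWS credentials.

    boto3 is imported lazily so the validator above works without it.
    """
    import boto3

    return boto3.client("sts").get_caller_identity()["Account"]
```

The console sometimes displays the ID with dashes (for example 1234-5678-9012); this sketch treats that form as invalid, so strip the dashes first.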

[Screenshot: the "One more step" signup window]

Once you submit the signup form, we’ll send you a few important emails you’ll need for the remaining setup steps.

Look for the email titled Your Pepperdata Trial Essentials. It contains the steps you need to configure your EMR cluster.

Task 1: Launch the CloudFormation stack to create the Pepperdata Role and grant Pepperdata access

An IAM role is required to enable Pepperdata access for operation in the EMR environment. You can create the policies for access to all resources or to only specific resources, such as clusters where given functionality is required. For details about IAM service roles and permissions, refer to the Customize IAM Roles page from Amazon.

Click the Launch Stack button in Task 1 of your Pepperdata Trial Essentials email to launch the CloudFormation stack that creates the Pepperdata Role and grants it access. Click through the four steps of the CloudFormation template as shown below. You don’t need to edit anything on these pages.

[Screenshots: Create stack template, steps 1 and 2]

At the bottom of the final step, check the acknowledgment box, then click Create Stack.

[Screenshot: the IAM role created by the stack]
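Once the stack finishes, you may want to confirm that the role exists before moving on to Task 2. The sketch below is one way to do that with boto3, not an official Pepperdata check. It takes the IAM client as a parameter, and it assumes the role is named PepperdataManagedAutoscalingRole, the name used later on this page.

```python
def role_exists(iam_client, role_name="PepperdataManagedAutoscalingRole"):
    """Return True if the IAM role exists, False if IAM reports no such entity.

    iam_client is expected to be a boto3 IAM client, for example
    boto3.client("iam"). Other errors (such as access denied) propagate.
    """
    try:
        iam_client.get_role(RoleName=role_name)
    except iam_client.exceptions.NoSuchEntityException:
        return False
    return True
```

With credentials configured, a call such as role_exists(boto3.client("iam")) should return True shortly after the stack reaches the CREATE_COMPLETE state.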

Task 2: Create a new cluster with Pepperdata

Now start up a new EMR cluster as usual. The procedure below assumes that you will not need to leverage any custom cluster management functions, such as certificate management. If such additional (non-Pepperdata) functions are needed, you should create a “helper bootstrap” script to invoke those functions and call the Pepperdata bootstrap script. In this case, upload the helper bootstrap script to the cluster configuration folder, and use its location and filename for the Script location field in the procedure.
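As an illustration of the helper bootstrap idea, here is a minimal sketch written in Python. Everything specific in it is an assumption: the two script paths are placeholders, the helper assumes both scripts are already present on the node, and it assumes python3 is available where the bootstrap action runs. It runs your custom setup first, then the Pepperdata bootstrap script with whatever arguments the helper received, and stops at the first failure.

```python
#!/usr/bin/env python3
"""Hypothetical helper bootstrap: custom setup first, then the Pepperdata bootstrap."""
import subprocess
import sys

# Placeholder paths: replace with the real locations of your scripts.
CUSTOM_SETUP = ["/bin/bash", "/tmp/custom-setup.sh"]
PEPPERDATA_BOOTSTRAP = ["/bin/bash", "/tmp/pepperdata-bootstrap.sh"]


def run_in_order(commands):
    """Run each command in turn; return the first non-zero exit code, else 0."""
    for command in commands:
        returncode = subprocess.run(command, check=False).returncode
        if returncode != 0:
            return returncode
    return 0


def main(argv):
    """Forward the helper's own arguments to the Pepperdata bootstrap script."""
    return run_in_order([CUSTOM_SETUP, PEPPERDATA_BOOTSTRAP + list(argv)])

# When used as a real bootstrap script, end the file with:
#     sys.exit(main(sys.argv[1:]))
```

Returning a non-zero exit code matters because a bootstrap action that fails should stop the cluster from starting half-configured.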

Heads up! Some important items that you must pay attention to as you create your new EMR cluster:

- Be sure to choose EMR Managed Autoscaling.
- Be sure to choose the IAM role created in Task 1 above.
- Ensure that Security-Enhanced Linux (SELinux) is disabled. By default, EMR disables SELinux. But if it’s been enabled, you must disable it before activating Pepperdata.
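To check the SELinux state on a cluster node, you can run the standard getenforce command by hand, or use a sketch like the one below. It treats only the Disabled state as passing, and it assumes that a node without the getenforce binary has SELinux disabled; both are assumptions of this example rather than documented Pepperdata requirements.

```python
import shutil
import subprocess


def is_selinux_disabled(getenforce_output: str) -> bool:
    """Interpret the output of getenforce: Enforcing, Permissive, or Disabled."""
    return getenforce_output.strip().lower() == "disabled"


def selinux_mode() -> str:
    """Return the current SELinux mode as reported by getenforce.

    Assumption: if getenforce is not installed, report "Disabled".
    """
    if shutil.which("getenforce") is None:
        return "Disabled"
    result = subprocess.run(
        ["getenforce"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or "Disabled"
```

On a node, is_selinux_disabled(selinux_mode()) gives a single yes-or-no answer.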


Note: These steps do not tell you how to configure everything in your cluster, only the key items needed for Pepperdata to work correctly.

1. In your AWS environment, launch the Create cluster wizard, and click Go to advanced options (not quick options).
2. Under Software and Steps > Software Configuration, be sure to choose EMR 6.1 or 6.2. If you are using an earlier version of EMR, please contact Pepperdata Support for alternate instructions.

[Screenshot: EMR release selection]

3. Under Hardware > Cluster scaling, check Enable Cluster Scaling and ensure Use EMR-managed scaling is selected.

[Screenshot: Cluster scaling options]

4. Under General Cluster Settings > Additional Options, add a Bootstrap Action, and select Custom Action, then Configure and Add. Enter your custom bootstrap script location and your custom bootstrap optional arguments from your Pepperdata Trial Essentials email, as shown here:

[Screenshot: Bootstrap Actions dialog]

5. Click Add to add the bootstrap script.
6. Under Security > Permissions, select Custom. In the EC2 instance profile list, select the IAM role that you created in Task 1; it’s called PepperdataManagedAutoscalingRole.
7. Click Create Cluster.
8. Wait a few minutes while the cluster is created, the Pepperdata software is installed, and the Pepperdata services are automatically started.
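For readers who script cluster creation instead of using the wizard, the steps above map roughly onto a boto3 run_job_flow request. The sketch below only illustrates that mapping and is not a Pepperdata-supplied procedure. The cluster name, instance types, instance counts, and scaling limits are placeholder values, the bootstrap script location and arguments must come from your Pepperdata Trial Essentials email, and the request assumes the EC2 instance profile has the same name as the role from Task 1.

```python
SUPPORTED_RELEASE_PREFIXES = ("emr-6.1.", "emr-6.2.")


def build_cluster_request(bootstrap_script, bootstrap_args, release_label="emr-6.2.0"):
    """Build keyword arguments for the boto3 EMR client's run_job_flow call."""
    if not release_label.startswith(SUPPORTED_RELEASE_PREFIXES):
        raise ValueError(f"unsupported EMR release: {release_label}")
    return {
        "Name": "pepperdata-trial-cluster",  # placeholder name
        "ReleaseLabel": release_label,       # step 2: EMR 6.1 or 6.2
        "Instances": {
            "InstanceGroups": [              # placeholder sizing
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        "ManagedScalingPolicy": {            # step 3: EMR-managed scaling
            "ComputeLimits": {
                "UnitType": "Instances",
                "MinimumCapacityUnits": 2,
                "MaximumCapacityUnits": 10,
            }
        },
        "BootstrapActions": [                # steps 4 and 5: bootstrap script
            {"Name": "Pepperdata bootstrap",
             "ScriptBootstrapAction": {"Path": bootstrap_script,
                                       "Args": list(bootstrap_args)}}
        ],
        "JobFlowRole": "PepperdataManagedAutoscalingRole",  # step 6
        "ServiceRole": "EMR_DefaultRole",    # assumes the default EMR service role
    }


def launch_cluster(request):
    """Create the cluster (step 7) and wait until it is running (step 8)."""
    import boto3  # imported lazily; requires configured AWS credentials

    emr = boto3.client("emr")
    cluster_id = emr.run_job_flow(**request)["JobFlowId"]
    emr.get_waiter("cluster_running").wait(ClusterId=cluster_id)
    return cluster_id
```

Keeping the request builder separate from the launch call lets you inspect or log the request before anything is created in your account.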

Once your new EMR cluster is up and running, you can start running jobs on it. You may discover you already have a simple default job or two provided by AWS.

Task 3: Reset Your Password

Approximately 30 minutes after your cluster is up and running, you should be able to log in to your Pepperdata dashboard. Click the dashboard login link in your Pepperdata Trial Essentials email. If asked to enter an email address, use the same one you provided when you signed up. Choose a secure password that meets the requirements shown.


Task 4: Visit Your Pepperdata Dashboard and Start Optimizing!

Once you log in, you should start seeing your first reports! You can return to your Pepperdata dashboard at any time by using the dashboard login link in your Pepperdata Trial Essentials email.

To get the most out of your Pepperdata trial, here are some good next steps to check out (you must log in to your Pepperdata account to access these guides):

Platform Spotlight User’s Guide

Capacity Optimizer User’s Guide

If you have any questions, please feel free to contact us, or schedule a time with one of our Support Engineers to learn how to get the most out of Pepperdata. We’re here to help you!
