OPTIMIZE PERFORMANCE FOR YOUR ENTIRE BIG DATA STACK


The 451 Take on Cloud-Native: Truly Transformative for Enterprise IT

Helping to shape the modern software development and IT operations paradigms, cloud-native represents a significant shift in enterprise IT. In this report, we define cloud-native and offer some perspective on why it matters and what it means for the industry.

Elements of Big Data APM Success

Pepperdata delivers proven big data APM products, operational experience, and deep expertise.

PLATFORM SPOTLIGHT

Request a free trial and join the Fortune 500 companies that partner with Pepperdata to save millions on infrastructure. Get real-time visibility across infrastructure and applications for a complete view of your big data performance. Eliminate manual tuning, automatically tune your platform, and run up to 50% more jobs on your Hadoop clusters. Simplify troubleshooting and problem resolution, and quickly resolve issues to meet SLAs. Pepperdata solutions optimize performance on-premises and in the cloud, with no manual tuning or coding needed.

Request Trial

Resources

Cloudwick Collaborates with Pepperdata to Ensure SLAs and Performance are Maintained for AWS Migration Service

Pepperdata Provides Pre- and Post-Migration Workload Analysis, Application Performance Assessment and SLA Validation for Cloudwick AWS Migration Customers

San Francisco — Strata Data Conference (Booth 926) — March 27, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), and Cloudwick, a leading provider of digital business services and solutions to the Global 1000, today announced a collaborative offering for enterprises migrating their big data to Amazon Web Services (AWS). Pepperdata provides Cloudwick with a baseline of on-premises performance, maps workloads to optimal static and on-demand instances, diagnoses any issues that arise during migration, and assesses performance after the move to ensure the same or better performance and SLAs.

“The biggest challenge for enterprises migrating big data to the cloud is ensuring SLAs are maintained without having to devote resources to entirely re-engineer applications,” said Ash Munshi, Pepperdata CEO. “Cloudwick and Pepperdata ensure workloads are migrated successfully by analyzing and establishing a metrics-based performance baseline.”

“Migrating to the cloud without looking at the performance data first is risky for organizations and if a migration is not done right, the complaints from lines of business are unavoidable,” said Mark Schreiber, General Manager for Cloudwick. “Without Pepperdata’s metrics and analysis before and after the migration, there is no way to prove performance levels are maintained in the cloud.”

For Cloudwick’s AWS Migration Services, Pepperdata is installed on customers’ existing, on-premises clusters — it takes under 30 minutes — and automatically collects over 350 real-time operational metrics from applications and infrastructure resources, including CPU, RAM, disk I/O, and network usage metrics on every job, task, user, host, workflow, and queue. These metrics are used to analyze performance and SLAs, accurately map workloads to appropriate AWS instances, and provide cost projections. Once the AWS migration is complete, the same operational metrics from the cloud are collected and analyzed to assess performance results and validate migration success.

To learn more, stop by the Pepperdata booth (926) at Strata Data Conference March 25-28 at Moscone West in San Francisco.

More Info

About Pepperdata
Pepperdata (https://pepperdata.com) is the leader in big data Application Performance Management (APM) solutions and services, solving application and infrastructure issues throughout the stack for developers and operations managers. The company partners with its customers to provide proven products, operational experience, and deep expertise to deliver predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premises and in the cloud. Leading companies like Comcast, Philips Wellcentive and NBC Universal depend on Pepperdata to deliver big data success.

 Founded in 2012 and headquartered in Cupertino, California, Pepperdata has attracted executive and engineering talent from Yahoo, Google, Microsoft and Netflix. Pepperdata investors include Citi Ventures, Costanoa Ventures, Signia Venture Partners, Silicon Valley Data Capital and Wing Venture Capital, along with leading high-profile individual investors. For more information, visit www.pepperdata.com.

About Cloudwick

Cloudwick is the leading provider of digital business services and solutions to the Global 1000. Its solutions include data migration, business intelligence modernization, data science, cybersecurity, IoT and mobile application development and more, enabling data-driven enterprises to gain competitive advantage from big data, cloud computing and advanced analytics. Learn more at www.cloudwick.com.

###

Contact:
Samantha Leggat
samantha@pepperdata.com

Pepperdata and the Pepperdata logo are registered trademarks of Pepperdata, Inc. Other names may be trademarks of their respective owners.

March 27, 2019

Pepperdata Announces Free Big Data Cloud Migration Cost Assessment to Automatically Select Optimal Instance Types and Provide Accurate Cost Projections

Pepperdata Eliminates Guesswork and Complexity Associated with Identifying Best Candidate Workloads Down to Queue, Job and User Level, for Moving to AWS, Azure, Google Cloud or IBM Cloud

CUPERTINO, Calif. — March 6, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), today announced its new Big Data Cloud Migration Cost Assessment for enterprises looking to migrate their big data workloads to AWS, Azure, Google Cloud or IBM Cloud. By analyzing current workloads and service level agreements, the detailed, metrics-based Assessment enables enterprises to make informed decisions, helping minimize risk while ensuring SLAs are maintained after cloud migration.

The Pepperdata Big Data Cloud Migration Cost Assessment provides organizations with an accurate understanding of their network, compute and storage needs to run their big data applications in the hybrid cloud. Analyzing memory, CPU and IO every five seconds for every task, Pepperdata maps the on-premises workloads to optimal static and on-demand instances on AWS, Azure, Google Cloud, and IBM Cloud. Pepperdata also identifies how many of each instance type will be needed and calculates cloud CPU and memory costs to achieve the same performance and SLAs of the existing on-prem infrastructure.

“When enterprises consider a hybrid cloud strategy, they estimate the cost of moving entire clusters, but that’s not the best approach,” said Ash Munshi, Pepperdata CEO. “It’s far better to identify specific workloads that can be moved to take full advantage of the pricing and elasticity of the cloud. Pepperdata collects and analyzes detailed, granular resource metrics to accurately identify optimal workloads for cloud migration while maintaining SLAs.”

The Big Data Cloud Migration Cost Assessment enables enterprises to:

  • Automatically analyze every workload in your cluster to accurately determine projected cloud costs
  • Get cost projections and instance recommendations for workloads, queues, jobs, and users
  • Map big data workloads to various instance types including static and on-demand
  • Compare AWS, Azure, Google Cloud, and IBM Cloud

Availability

Pepperdata Big Data Cloud Migration Cost Assessment is available free at pepperdata.com/free-big-data-cloud-migration-cost-assessment. Pepperdata customers should email support@pepperdata.com for their free assessment.



###

Contact:
Samantha Leggat

925-447-5300
samantha@pepperdata.com


March 5, 2019

Pepperdata Unveils 360° Reports, Enabling Enterprises to Make More Informed Operational Decisions to Maximize Capacity and Improve Application Performance

360° Reports Empower Executives to Better Understand Financial Impacts of Operational Decisions

CUPERTINO, Calif. — February 19, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), today announced the availability of 360° Reports for Platform Spotlight. Pepperdata 360° Reports leverage the vast amount of proprietary data collected and correlated by Pepperdata to give executives capacity utilization insights so they better understand the financial impacts of operational decisions.

“Pepperdata 360° Reports demonstrate the power of data and the valuable insights Pepperdata provides, enabling enterprises to make more informed and effective operational decisions,” said Ash Munshi, Pepperdata CEO. “Operators get a better understanding of what and where they’re spending, where waste can be reclaimed, and where policy and resource adjustments can be made to save money, maximize capacity and improve application performance.”

360° Reports for Pepperdata Platform Spotlight include:

  • Capacity Optimizer Report: This gives operators insight into memory and money saved by leveraging Pepperdata Capacity Optimizer to dynamically recapture wasted capacity.
  • Application Waste Report: This report compares memory requested with actual memory utilization so operators can optimize resources by changing resource reservation parameters.
  • Application Type Report: This gives operators insight into the technologies used across the cluster and the share of each (e.g., the percentage of Spark jobs). It also provides executives with insights into technology trends to support more data-driven investment decisions.
  • Default Container Size Report: This report identifies jobs using default container size and where any waste occurred so operators can make default container size adjustments to save money.
  • Pepperdata Usage Report: This presents Pepperdata dashboard usage data, highlighting top users, days used, and more to give operators insights to maximize their investment. With this data, operators can identify activities to grow the user base, such as promoting features, scheduling onboarding sessions, and training on custom alarms.

Availability

Pepperdata 360° Reports are available immediately for Pepperdata Platform Spotlight customers. For a free trial of Pepperdata, visit https://www.pepperdata.com/trial.


###

Contact:
Samantha Leggat
samantha@pepperdata.com


Sample report attached.

Sample Capacity Optimizer Report – memory and money saved with Capacity Optimizer

February 19, 2019

How to Overcome the Five Most Common Spark Challenges

Everyone in the Big Data world knows Spark. It’s a powerful engine that can run workloads up to 100x faster than Hadoop MapReduce.

However, fantastic as Spark is, like all software, it has its challenges. In a recent webinar, we sat down with Alexander Pierce, a Pepperdata Field Engineer, to discuss them. Alex drew on his experiences across dozens of production deployments, and pointed out the best ways to overcome the five most common Spark challenges.

 

Serialization is Key

Serialization plays an important role in the performance of any distributed application. Formats that are slow to serialize objects into, or those that consume a large number of bytes, will greatly slow down the computation.

“Because you’ll have to distribute your code for running and your data for execution, you need to make sure that your programs can serialize, deserialize, and send objects across the wire quickly,” Alex explains. This is often the first thing you should tune to optimize a Spark application. Furthermore, Alex recommends the Kryo serializer, because Java’s default serializer has mediocre performance with respect to both runtime and the size of its results.
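As a sketch of this advice, switching a PySpark application to Kryo is a configuration change (the `kryo-demo` app name is a placeholder; this assumes a Spark installation is available):

```python
from pyspark import SparkConf
from pyspark.sql import SparkSession

# Enable Kryo instead of Java's default serializer.
conf = (
    SparkConf()
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Optional: require class registration so unregistered (and therefore
    # bloated) serializations fail fast instead of silently slowing you down.
    .set("spark.kryo.registrationRequired", "false")
)

spark = SparkSession.builder.config(conf=conf).appName("kryo-demo").getOrCreate()
```

Kryo is not the default only because it requires registering custom classes for best results; for most shuffles of simple types it is a drop-in improvement.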

 

Getting Partition Recommendations and Sizing to Work for You

Generally speaking, any performance management software that sees data skew will recommend more partitions, but not too many more. “The more partitions you have, the better your parallelization could be,” says Alex. But that’s not always the case.

The best way to decide on the number of partitions in an RDD is to make it a multiple of the number of cores in the cluster, so that all partitions can be processed in parallel and resources are fully utilized. Alex further suggests avoiding situations like four executors with five partitions, where the final partition runs alone while the other executors sit idle.
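That rule of thumb can be sketched as a small helper. The multiplier of 2–3 waves of tasks per core is a common community starting point, not a figure from the webinar:

```python
def recommended_partitions(total_cores: int, factor: int = 3) -> int:
    """Return a partition count that is a multiple of the cluster's cores,
    so every wave of tasks keeps all cores busy with no ragged tail."""
    if total_cores < 1:
        raise ValueError("total_cores must be positive")
    return total_cores * factor

# Example: a cluster with 4 executors x 4 cores each.
print(recommended_partitions(16))             # -> 48
# rdd = rdd.repartition(recommended_partitions(16))  # hypothetical RDD
```

With 16 cores this yields 48 partitions: three full waves of tasks, and never a wave where one straggler partition holds the cluster hostage.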

 

Monitoring Both Executor Size and YARN Memory Overhead

Often, what you’re trying to do is subdivide your data set into the smallest pieces that can be easily consumed by your Spark executors, but you don’t want them to be too small. There are a few ways to find that happy middle ground, but you’ll have to find a way around data skew by ensuring a well-distributed key space.

“Make a guess at the size of your executor based on the amount of data you expect to be processed at any one time,” Alex says. “Know your data set, know your partition count.” However, that’s not all there is to it. There are two values in Spark on YARN to keep an eye on: the size of your executor, and the YARN memory overhead. Tuning the overhead prevents the YARN scheduler from killing an application that uses a large amount of NIO memory or other off-heap memory areas.
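To see why the overhead matters, note that the container YARN allocates is the executor heap plus `spark.executor.memoryOverhead`, which, unless set explicitly, defaults to roughly 10% of executor memory with a 384 MiB floor. The sketch below reproduces that default so you can sanity-check container sizing against your cluster’s node capacity (values in MiB; treat it as an approximation of Spark’s behavior, not a substitute for the docs):

```python
OVERHEAD_FACTOR = 0.10   # Spark's default memory overhead factor
OVERHEAD_MIN_MIB = 384   # Spark's minimum overhead in MiB

def yarn_container_request_mib(executor_memory_mib: int) -> int:
    """Approximate total memory YARN is asked for per executor:
    the heap plus the default off-heap overhead cushion."""
    overhead = max(int(executor_memory_mib * OVERHEAD_FACTOR), OVERHEAD_MIN_MIB)
    return executor_memory_mib + overhead

print(yarn_container_request_mib(4096))   # 4096 + 409 = 4505
print(yarn_container_request_mib(2048))   # 2048 + 384 = 2432
```

If your app uses a lot of NIO or other off-heap memory, raise `spark.executor.memoryOverhead` explicitly rather than relying on the 10% default.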

 

Getting the Most out of DAG Management

It’s always a good idea to keep an eye on the complexity of the execution plan. Use the DAG (directed acyclic graph) Visualization tool that comes with SparkUI for one possible visual map. If something that you think should be straightforward (a basic join, for example) is taking 10 stages, you can look at your query or code and perhaps reduce it to two or three stages.

Alex offers one solid tip. “Look at each of the stages in the parallelization,” he says. “Keep an eye on your DAG, not just on the overall complexity. Make sure each stage in your code is actually running in parallel.” If you have a (non-parallel) stage using less than 60% of the available executors, the questions to keep in mind are: Should that compute be rolled into another stage? Is there a separate partitioning issue?

 

Managing Library Conflicts

When it comes to shading, one quick tip from Alex is to make sure that any external dependencies and classes you bring in are available in the environment you are using, and that they do not conflict with internal libraries used by your version of Spark. A specific example is Google Protobuf, a popular binary format for storing and transporting data that is more compact than JSON.
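Shading is done at build time. As one sketch of the approach, Maven’s shade plugin can relocate a Protobuf dependency under your own namespace so it cannot clash with the copy Spark ships (the `myapp.shaded` prefix is a placeholder for your own package):

```xml
<!-- pom.xml fragment: relocate Protobuf so it cannot clash with Spark's copy -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.protobuf</pattern>
            <shadedPattern>myapp.shaded.com.google.protobuf</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After relocation, your jar’s bytecode references `myapp.shaded.com.google.protobuf`, so both your Protobuf version and Spark’s can coexist on the same classpath.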

Watch the webinar to learn more about these Spark challenges and how to overcome them, or start your Pepperdata free trial now and see how we can solve these challenges for you!

December 4, 2019

Data Correlation: The Key to Optimal Cluster Performance

Modern data analytics platforms such as Hadoop and Spark have become central to many Fortune 1000 businesses. As critical components of digital business success, they produce insights that can only be obtained by analyzing massive amounts of data. The continued growth of these systems means an enterprise may be running 100,000 applications a day on 1,000 nodes while servicing over 2,000 users. At this scale, piecing together performance data from those applications and the infrastructure they run on is a major challenge.

There’s no foreseeable end to the relentless growth of users and applications. So how do you address performance management problems and end the headache of constant manual tuning? 

The answer: deploy a solution that automatically correlates application and infrastructure performance data, allowing you to be laser-focused in your efforts to improve performance. This solution must go beyond standard monitoring and provide real, actionable insights.

 

Auto-Correlate Infrastructure and Application Performance Events

It’s much easier to resolve bottlenecks and failures when you have rich contextual information that traverses infrastructure and application performance. 

Application performance management helps developers improve application and query performance within the context of cluster operations. This also supports better organizational alignment with IT Operations. Within today’s enterprise environment, it’s critical that the process is automated. Manual tuning is not an option.

With detailed application/workload metrics, IT Operations can quickly identify and troubleshoot infrastructure issues within such an environment, optimize related cluster resources, and quickly resolve performance problems. Streamlining this process is essential to successfully scaling analytics environments to meet the business’ needs.

 

Application and Infrastructure Correlation Requires a Holistic Approach

Gaining visibility across your distributed system means correlating and visualizing metrics to quickly pinpoint and resolve issues. This requires a holistic approach, one that looks at how your applications interact within the context of your big data infrastructure. 

Pepperdata solutions provide that holistic strategy, allowing a view of your cluster resources and delivering context-aware application tuning recommendations. You get a unified operational view, real-time granular data, and historical references to optimize application performance and resource utilization.

The solutions also make it easy to quickly see whether an application, the infrastructure, or a combination of both are contributing to the latency of your workloads. A 360-degree view of all your performance data in one dashboard lets you gauge performance, diagnose issues up to 90% faster, and improve the overall efficiency of your entire cluster. 

Furthermore, Pepperdata provides intelligent tuning recommendations for improving application performance, pinpointing exactly what resources each application requires so you can better allocate and utilize them. This application right-sizing also helps you deploy only the on-premises and cloud resources needed to support a given workload.

 

Further Your Efficiency with Pepperdata Application Spotlight and Pepperdata Platform Spotlight 

Application Spotlight is a self-service portal that provides 360-degree visibility and insights into your Hadoop and Spark application performance data, along with powerful APM tools. Developers get useful, actionable recommendations that eliminate the time-consuming “try-test-repeat” processes.  

Meanwhile, Platform Spotlight continuously monitors and collects unique data from all relevant hardware and execution framework sources, providing a 360-degree cluster view that enables IT Operations to quickly diagnose performance issues and make resource decisions based on user priorities and needs. It also enables alerting to identify the root cause of problems before your users and applications are impacted.  

Pepperdata solutions correlate metrics to provide you with rich telemetry data and actionable insights to monitor and manage the performance of your entire cluster. Tap into the unmatched experience and expertise of Pepperdata to:

  • Get real-time visibility into resource utilization, along with the ability to recapture wasted resources and optimize capacity utilization.
  • Easily troubleshoot difficult issues and automatically optimize your big data application and infrastructure performance, both in the cloud and on-premise.
  • Get detailed tuning recommendations for each of your applications, derived from hundreds of metrics that Pepperdata tracks in real time.

November 26, 2019

What We Learned at Big Data London 2019

Big Data LDN returned to London last week. Since launching in 2016, the conference’s overall attendance has tripled to over 8,000 people, making it one of the largest big data events in Europe. Over the two days, attendees heard from over 150 data experts and visionaries on a wide variety of big data topics. 

The Pepperdata team was there to meet with enterprise organizations, hear about their analytics infrastructure performance challenges, and talk about how our performance management solutions can help enterprises optimize their infrastructure and apps.

As we do at most events, we had a blast at Big Data LDN. Reflecting during the flight home, we landed on three key observations that we took away from the event: 

  • Data privacy and ethics are a hot topic 

At the conference, there was a lot of discussion about how countries, lawmakers and businesses should navigate a future defined by data. Much of the conversation centered on the European Union’s globally applicable General Data Protection Regulation (GDPR), which has set a benchmark for data protection standards that resounds far beyond Europe. As more countries adopt GDPR-like rules, it will be interesting to see whether GDPR becomes the de facto world standard and what new global business challenges emerge as a result.

 

  • Big data-driven businesses face common analytics performance management challenges

From conference discussions, it was clear that many organizations are managing large clusters with very little insight into how infrastructure resources are being consumed, or how to manage and optimize performance within a multi-tenant distributed system. Though they are spread across different verticals, these organizations are asking the same questions: Is someone hogging all the resources? Why is my app running slower today? They know that bridging this gap in understanding and being able to optimize their analytics infrastructure is critical to their success.

  • Migrating to the cloud is easy and inexpensive – said no big data company ever

For most organizations, the cloud has quickly transformed from a market disruptor to an essential IT strategy. However, many of these organizations need help, because migrating to the cloud is hard. Migration pitfalls can quickly escalate costs and derail digital transformation efforts. Whether companies were completely cloud-native, hybrid cloud, or taking baby steps toward managing data in the cloud, cloud strategy and cloud migration are top priorities.

November 21, 2019