OPTIMIZE PERFORMANCE FOR YOUR ENTIRE BIG DATA STACK

PLATFORM SPOTLIGHT

APPLICATION SPOTLIGHT

CAPACITY OPTIMIZER

The 451 Take on Cloud-Native: Truly Transformative for Enterprise IT

Helping to shape the modern software development and IT operations paradigms, cloud-native represents a significant shift in enterprise IT. In this report, we define cloud-native and offer some perspective on why it matters and what it means for the industry.

Elements of Big Data APM Success

Pepperdata delivers proven big data APM products, operational experience, and deep expertise.


Request a trial to see firsthand how Pepperdata big data solutions can help you achieve big data performance success. Pepperdata's proven APM solutions provide a 360° view of both your platform and applications, with real-time tuning, recommendations, and alerting. See how Pepperdata big data performance solutions help you quickly pinpoint and resolve big data performance bottlenecks, and why they are used to manage performance on more than 30,000 Hadoop production clusters.

Request Trial

Resources

Cloudwick Collaborates with Pepperdata to Ensure SLAs and Performance Are Maintained for AWS Migration Service

Pepperdata Provides Pre- and Post-Migration Workload Analysis, Application Performance Assessment and SLA Validation for Cloudwick AWS Migration Customers

San Francisco — Strata Data Conference (Booth 926) — March 27, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), and Cloudwick, a leading provider of digital business services and solutions to the Global 1000, today announced a collaborative offering for enterprises migrating their big data to Amazon Web Services (AWS). Pepperdata provides Cloudwick with a baseline of on-premises performance, maps workloads to optimal static and on-demand instances, diagnoses any issues that arise during migration, and assesses performance after the move to ensure the same or better performance and SLAs.

“The biggest challenge for enterprises migrating big data to the cloud is ensuring SLAs are maintained without having to devote resources to entirely re-engineer applications,” said Ash Munshi, Pepperdata CEO. “Cloudwick and Pepperdata ensure workloads are migrated successfully by analyzing and establishing a metrics-based performance baseline.”

“Migrating to the cloud without looking at the performance data first is risky for organizations and if a migration is not done right, the complaints from lines of business are unavoidable,” said Mark Schreiber, General Manager for Cloudwick. “Without Pepperdata’s metrics and analysis before and after the migration, there is no way to prove performance levels are maintained in the cloud.”

For Cloudwick’s AWS Migration Services, Pepperdata is installed on customers’ existing, on-premises clusters — it takes under 30 minutes — and automatically collects over 350 real-time operational metrics from applications and infrastructure resources, including CPU, RAM, disk I/O, and network usage metrics on every job, task, user, host, workflow, and queue. These metrics are used to analyze performance and SLAs, accurately map workloads to appropriate AWS instances, and provide cost projections. Once the AWS migration is complete, the same operational metrics from the cloud are collected and analyzed to assess performance results and validate migration success.

To learn more, stop by the Pepperdata booth (926) at Strata Data Conference March 25-28 at Moscone West in San Francisco.


About Pepperdata
Pepperdata (https://pepperdata.com) is the leader in big data Application Performance Management (APM) solutions and services, solving application and infrastructure issues throughout the stack for developers and operations managers. The company partners with its customers to provide proven products, operational experience, and deep expertise to deliver predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premises and in the cloud. Leading companies like Comcast, Philips Wellcentive and NBC Universal depend on Pepperdata to deliver big data success.

 Founded in 2012 and headquartered in Cupertino, California, Pepperdata has attracted executive and engineering talent from Yahoo, Google, Microsoft and Netflix. Pepperdata investors include Citi Ventures, Costanoa Ventures, Signia Venture Partners, Silicon Valley Data Capital and Wing Venture Capital, along with leading high-profile individual investors. For more information, visit www.pepperdata.com.

About Cloudwick

Cloudwick is the leading provider of digital business services and solutions to the Global 1000. Its solutions include data migration, business intelligence modernization, data science, cybersecurity, IoT, mobile application development, and more, enabling data-driven enterprises to gain competitive advantage from big data, cloud computing and advanced analytics. Learn more at www.cloudwick.com.

###

Contact:
Samantha Leggat
samantha@pepperdata.com

Pepperdata and the Pepperdata logo are registered trademarks of Pepperdata, Inc. Other names may be trademarks of their respective owners.

March 27, 2019

Pepperdata Announces Free Big Data Cloud Migration Cost Assessment to Automatically Select Optimal Instance Types and Provide Accurate Cost Projections

Pepperdata Eliminates Guesswork and Complexity Associated with Identifying Best Candidate Workloads Down to Queue, Job and User Level, for Moving to AWS, Azure, Google Cloud or IBM Cloud

CUPERTINO, Calif. — March 6, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), today announced its new Big Data Cloud Migration Cost Assessment for enterprises looking to migrate their big data workloads to AWS, Azure, Google Cloud or IBM Cloud. By analyzing current workloads and service level agreements, the detailed, metrics-based Assessment enables enterprises to make informed decisions, helping minimize risk while ensuring SLAs are maintained after cloud migration.

The Pepperdata Big Data Cloud Migration Cost Assessment provides organizations with an accurate understanding of their network, compute and storage needs to run their big data applications in the hybrid cloud. Analyzing memory, CPU, and I/O every five seconds for every task, Pepperdata maps the on-premises workloads to optimal static and on-demand instances on AWS, Azure, Google Cloud, and IBM Cloud. Pepperdata also identifies how many of each instance type will be needed and calculates cloud CPU and memory costs to achieve the same performance and SLAs as the existing on-premises infrastructure.
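To make that kind of mapping concrete (as an illustration only, not Pepperdata's actual model), the sketch below sizes a workload's peak CPU and memory demand against a small catalog of hypothetical instance types and picks the cheapest fit; real assessments draw on far more granular, per-task metrics.

    # Minimal sketch: map a workload's observed peak demand to candidate instance
    # types and project a monthly cost. Instance names, specs, and prices are
    # hypothetical placeholders, not actual provider pricing.
    import math
    from dataclasses import dataclass

    @dataclass
    class InstanceType:
        name: str
        vcpus: int
        memory_gb: int
        hourly_usd: float

    CATALOG = [
        InstanceType("general.xlarge", 4, 16, 0.20),    # hypothetical
        InstanceType("memory.2xlarge", 8, 64, 0.50),    # hypothetical
        InstanceType("compute.4xlarge", 16, 32, 0.68),  # hypothetical
    ]

    def cheapest_fit(peak_vcpus, peak_memory_gb, hours_per_month=730):
        """Return (instance name, count, projected monthly cost) for the cheapest fit."""
        best = None
        for inst in CATALOG:
            # Enough instances to cover both the CPU and the memory peak.
            count = max(math.ceil(peak_vcpus / inst.vcpus),
                        math.ceil(peak_memory_gb / inst.memory_gb), 1)
            monthly = count * inst.hourly_usd * hours_per_month
            if best is None or monthly < best[2]:
                best = (inst.name, count, monthly)
        return best

    # Example: a workload that peaks at 24 vcores and 180 GB of memory.
    print(cheapest_fit(peak_vcpus=24, peak_memory_gb=180))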

“When enterprises consider a hybrid cloud strategy, they estimate the cost of moving entire clusters, but that’s not the best approach,” said Ash Munshi, Pepperdata CEO. “It’s far better to identify specific workloads that can be moved to take full advantage of the pricing and elasticity of the cloud. Pepperdata collects and analyzes detailed, granular resource metrics to accurately identify optimal workloads for cloud migration while maintaining SLAs.”

The Big Data Cloud Migration Cost Assessment enables enterprises to:

  • Automatically analyze every workload in your cluster to accurately determine its projected cloud costs
  • Get cost projections and instance recommendations for workloads, queues, jobs, and users
  • Map big data workloads to various instance types including static and on-demand
  • Compare AWS, Azure, Google Cloud, and IBM Cloud

Availability

Pepperdata Big Data Cloud Migration Cost Assessment is available free at pepperdata.com/free-big-data-cloud-migration-cost-assessment. Pepperdata customers should email support@pepperdata.com for their free assessment.


About Pepperdata
Pepperdata (https://www.pepperdata.com) is the leader in big data Application Performance Management (APM) solutions and services, solving application and infrastructure issues throughout the stack for developers and operations managers. The company partners with its customers to provide proven products, operational experience, and deep expertise to deliver predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premises and in the cloud. Leading companies like Comcast, Philips Wellcentive and NBC Universal depend on Pepperdata to deliver big data success.

 Founded in 2012 and headquartered in Cupertino, California, Pepperdata has attracted executive and engineering talent from Yahoo, Google, Microsoft and Netflix. Pepperdata investors include Citi Ventures, Costanoa Ventures, Signia Venture Partners, Silicon Valley Data Capital and Wing Venture Capital, along with leading high-profile individual investors. For more information, visit www.pepperdata.com.

###

Contact:
Samantha Leggat

925-447-5300
samantha@pepperdata.com

Pepperdata and the Pepperdata logo are registered trademarks of Pepperdata, Inc. Other names may be trademarks of their respective owners.

March 5, 2019

Pepperdata Unveils 360° Reports, Enabling Enterprises to Make More Informed Operational Decisions to Maximize Capacity and Improve Application Performance

360° Reports Empower Executives to Better Understand Financial Impacts of Operational Decisions

CUPERTINO, Calif. — February 19, 2019 — Pepperdata, the leader in big data Application Performance Management (APM), today announced the availability of 360° Reports for Platform Spotlight. Pepperdata 360° Reports leverage the vast amount of proprietary data collected and correlated by Pepperdata to give executives capacity utilization insights so they better understand the financial impacts of operational decisions.

“Pepperdata 360° Reports demonstrate the power of data and the valuable insights Pepperdata provides, enabling enterprises to make more informed and effective operational decisions,” said Ash Munshi, Pepperdata CEO. “Operators get a better understanding of what and where they’re spending, where waste can be reclaimed, and where policy and resource adjustments can be made to save money, maximize capacity and improve application performance.”

360° Reports for Pepperdata Platform Spotlight include:

  • Capacity Optimizer Report: This gives operators insight into memory and money saved by leveraging Pepperdata Capacity Optimizer to dynamically recapture wasted capacity.
  • Application Waste Report: This report compares memory requested with actual memory utilization so operators can optimize resources by changing resource reservation parameters.
  • Application Type Report: This gives operators insight into the technologies used across the cluster and the percentage of each (percentage of Spark jobs, etc.). This provides executives with insights into technology trends to make more data-driven investment decisions.
  • Default Container Size Report: This report identifies jobs using default container size and where any waste occurred so operators can make default container size adjustments to save money.
  • Pepperdata Usage Report: This presents Pepperdata dashboard usage data, highlighting top users, days used, and more to give operators insights to maximize their investment. With this data, operators can identify activities to grow the user base, such as promoting features, scheduling onboarding sessions, and training on custom alarms.

Availability

Pepperdata 360° Reports are available immediately for Pepperdata Platform Spotlight customers. For a free trial of Pepperdata, visit https://www.pepperdata.com/trial.

About Pepperdata
Pepperdata (https://pepperdata.com) is the leader in big data Application Performance Management (APM) solutions and services, solving application and infrastructure issues throughout the stack for developers and operations managers. The company partners with its customers to provide proven products, operational experience, and deep expertise to deliver predictable performance, empowered users, managed costs and managed growth for their big data investments, both on-premises and in the cloud. Leading companies like Comcast, Philips Wellcentive and NBC Universal depend on Pepperdata to deliver big data success.

 Founded in 2012 and headquartered in Cupertino, California, Pepperdata has attracted executive and engineering talent from Yahoo, Google, Microsoft and Netflix. Pepperdata investors include Citi Ventures, Costanoa Ventures, Signia Venture Partners, Silicon Valley Data Capital and Wing Venture Capital, along with leading high-profile individual investors. For more information, visit www.pepperdata.com.

###

Contact:
Samantha Leggat
samantha@pepperdata.com

Pepperdata and the Pepperdata logo are registered trademarks of Pepperdata, Inc. Other names may be trademarks of their respective owners.

Sample report attached.

Sample Capacity Optimizer Report – memory and money saved with Capacity Optimizer

February 19, 2019

Big Data and the Amazing Evolution of Application Performance Management (APM)

My journey with Big Data began at a time when performance monitoring wasn’t a major focus, and yet it landed me at a company where that’s all we do. In 2012, I joined Groupon as part of a team of systems engineers who were focused on best practices. With the exception of Nagios telling us that a host was down or otherwise unreachable, there wasn’t much in terms of performance monitoring. The handful of Hadoop clusters at the time were all based on Cloudera CDH3.x (MapReduce v1), and within my first year most of this hardware was replaced with infrastructure that could handle more data with faster throughput. The notion of running Hadoop on commodity servers was fading as organizations adopted high-end infrastructure that could handle the increasing demands of big data workloads.

Eventually, the demand for Hadoop infrastructure grew to cover almost every group in the company. More than 2,000 nodes were in production, spanning two shared clusters that handled ETL and various other batch processing workloads plus some additional large clusters handling various other production workloads. The sudden need for “data-driven everything” was catching on at data-dependent organizations around the world. So while companies were investing millions in upgraded hardware, the question remained, “Who is going to manage all of this infrastructure?” At some point, the answer to this question pointed to me, when in 2014 I became the engineering manager responsible for Hadoop operations. Now I had to build a team.

Managing Big Data systems is tricky. But staffing Big Data operations teams is even harder. Finding experienced engineers with solid Linux administration experience, a decent mastery of an automation system (Puppet, Chef, CFEngine), and experience working at scale with Hadoop was extremely challenging. Going through screening calls and interviews, it was clear that the majority of candidates were not qualified, even though their freshly obtained certified-administrator credentials claimed otherwise. The biggest obstacle was finding candidates who had worked on these systems at scale and had experienced the interesting ways in which Hadoop could "break".

Fortunately, I was able to recruit two rock-star engineers internally. Unfortunately, my new team spent a substantial part of their time answering questions like "Why is my job running so slowly?", "Why did my job fail?", and "Is there something wrong with the Hadoop cluster?" By the time I was managing the team, we had inherited workloads from other teams and migrated them to an Ambari-based shared cluster.

Inheriting clusters with existing workloads and data made it impossible to deploy best practices and policies with a clean slate. Implementing them on existing data pipelines, while not impossible, is quite challenging because of all the moving parts. What type of compression is used? What is the minimum file size? How do you best manage data partitioning, table layout, data-at-rest encryption, security, and so on? These are not things you want to discuss after the fact. When teams spend the majority of their time trying to answer "Why is my job running slowly today?", there isn't much time left to work on infrastructure improvements.

At least now we had some monitoring, but each trouble ticket had to be investigated thoroughly. Triage took ages. It seemed like every support request required digging through application logs, the YARN application history server, the Spark history server, node manager logs, and resource manager logs. Missed SLAs started a blame game among user groups. There were no detailed metrics to either substantiate claims or resolve them.

Eventually, our team received a support ticket from an engineer indicating that there was excessive garbage collection causing delays in workloads. There wasn't. HDFS was hosed. Yes, small files. I spent the majority of my time in meetings explaining what small files were, why they were breaking HDFS, and why we needed to address the issue immediately. The sky was falling, and HDFS performance on the main shared cluster was dismal. Fortunately, at that time, I discovered a company named Pepperdata after a desperate Google search for Hadoop monitoring solutions.

Within a week of connecting with Pepperdata, my Hadoop operations team was working with their field engineers on a POC on that very same failing cluster. Pepperdata instantly confirmed the small-file issue by clearly identifying the number of files opened under 1 MB and then 10 MB per application, in a nifty dashboard with data organized by username, application ID, group, or pretty much any category I wanted. With hundreds of other metrics and the ability to understand what was going on with the cluster as well as individual applications in near real time, my prayers were answered. Now we knew where to start looking when problems arose.
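If you suspect a similar small-file problem and don't yet have such a dashboard, a rough first pass can be done from the command line. Here is a minimal sketch, assuming the Hadoop CLI is on your PATH; the 1 MB threshold and the root path are just illustrative defaults.

    # Count HDFS files under a size threshold per top-level directory by parsing
    # `hdfs dfs -ls -R` output. A rough diagnostic sketch, not a production tool.
    import subprocess
    from collections import Counter

    SMALL_FILE_BYTES = 1 * 1024 * 1024  # flag files under 1 MB

    def small_file_counts(root="/"):
        listing = subprocess.run(
            ["hdfs", "dfs", "-ls", "-R", root],
            capture_output=True, text=True, check=True,
        ).stdout
        counts = Counter()
        for line in listing.splitlines():
            fields = line.split()
            # File lines look like: -rw-r--r-- 3 user group SIZE DATE TIME /path
            if len(fields) < 8 or fields[0].startswith("d"):
                continue  # skip directories and non-file lines
            size, path = int(fields[4]), fields[7]
            if size < SMALL_FILE_BYTES:
                top = "/" + path.strip("/").split("/")[0]
                counts[top] += 1
        return counts

    if __name__ == "__main__":
        for directory, count in small_file_counts().most_common(10):
            print(f"{directory}: {count} files under 1 MB")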

Eventually, we were able to get HDFS performing optimally. Running the load balancer no longer brought the filesystem and the applications accessing it to a screeching halt. I did, however, wind up leaving the company about six months later. After five years as a Hadoop operations manager, I was burnt out. After taking a break for a few months, I was ready to rejoin the workforce. The first company I reached out to was Pepperdata, and I ended up joining the company in 2017. This was no accident.

I was really impressed by the technology and the people at Pepperdata while working with them on a POC deployment at my previous employer. They had a solution that could bridge the communication gap between engineering and operations and correlate application performance with infrastructure performance, providing a single pane of glass to work from…something that could have saved me a lot of lost sleep (and hair) back in 2014. Fast forward to 2019, and I’m still very excited about my role at Pepperdata. I get to work with a lot of awesome customers, several in the Fortune 100, who experience many of the big data performance challenges that I’m familiar with.

At Pepperdata, I'm able to help customers with best practices by optimizing big data performance monitoring and management and dramatically reducing their triage time. I also teach developers to monitor and performance-tune their own applications using our big data APM solutions. If you are managing Hadoop-, Spark-, or YARN-based infrastructure and are experiencing any of the challenges I described in this blog post, please reach out to me at support@pepperdata.com.


April 17, 2019

The Many Ways of Modern Big Data Chargeback

In this week's blog, we'll describe various approaches to successful big data chargeback, as observed by Pepperdata from our unique vantage point providing performance management solutions for hundreds of big data platforms around the world.

Big data chargeback models have been fluid since the earliest deployments of Hadoop. A long-standing question has been: what is the most effective way to quantify and recoup the cost of providing a big data platform to multiple tenants? The answer has changed over time as more enterprises continue trying various, evolving approaches. As you'll see, the answer is neither simple nor static.

The Old Way

Hadoop itself is synonymous with big data, so it's no surprise that early chargeback models were driven by the amount of data an individual or group stored in the Hadoop Distributed File System (HDFS). Initially, charging by the amount of data stored was fairly effective. Early on, groups or individuals were pulling in datasets for manipulation and managing the space they were provided as standard practice. Data lifecycle management as a big data practice began to break this model.

When big data platforms hit their early growth curves, businesses took a critical look at what was driving the growth. They found data duplication and replicated processing pipelines, both of which had a strong negative impact on ROI for the platform as a whole. To tackle these problems, production processes were stood up to ingest and prepare data for multiple tenants, enabling overarching data management without the waste. Extract, transform, and load (ETL) processes became the norm, at which point production or service accounts began to "own" the majority of the data in these environments. The old storage-based chargeback models needed to be revisited.

The New Way

When individuals no longer own the bulk of the data, the model naturally shifts to charging for the use of the data as the primary chargeback method. Charging tenants for the use of data required quantifying the workloads that were run on any data set. Resource management in Hadoop relies heavily on the segmentation of access to CPU and RAM. Given the modern state of data management, charging tenants for the amount of CPU and RAM they used to extract value from data became the norm.
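To make resource-based chargeback concrete, here is a minimal sketch that prices each tenant's usage from YARN's per-application aggregates (memorySeconds, in MB-seconds, and vcoreSeconds, exposed by the ResourceManager REST API). The ResourceManager hostname and the per-unit rates are hypothetical placeholders; a production chargeback system would draw on far more granular data.

    # Sketch: charge each user for the aggregate memory and CPU consumed by their
    # applications, using the YARN ResourceManager REST API. Rates are illustrative.
    from collections import defaultdict
    import requests

    RM_URL = "http://resourcemanager.example.com:8088"  # hypothetical host
    MB_SECOND_RATE = 2e-9      # dollars per MB-second, illustrative
    VCORE_SECOND_RATE = 4e-6   # dollars per vcore-second, illustrative

    def chargeback_by_user():
        apps = requests.get(f"{RM_URL}/ws/v1/cluster/apps").json()
        totals = defaultdict(float)
        for app in (apps.get("apps") or {}).get("app", []):
            cost = (app["memorySeconds"] * MB_SECOND_RATE
                    + app["vcoreSeconds"] * VCORE_SECOND_RATE)
            totals[app["user"]] += cost
        return dict(totals)

    if __name__ == "__main__":
        for user, cost in sorted(chargeback_by_user().items(), key=lambda kv: -kv[1]):
            print(f"{user}: ${cost:,.2f}")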

This new approach has a number of advantages over the old model. First, you eliminate any gray areas that arise when data ownership is not clear cut. Should the fraud prevention group own the transaction data they use to train their models when that same data is used by seven other groups within the business? No need to answer this question if you charge by resources used. Second, this system is a lot harder to game. Believe it or not, tenants will adjust usage patterns to incur less cost. In the extreme, there are cases where the data owner becomes aware of the audit schedule and deletes or compresses data only to reingest or expand it again after the measurements are taken. Better to incentivize the more efficient use of CPU and RAM, as this behavioral adjustment actually improves the overall health of the environment.

The Next Way?

If there's one constant in technology, it's change. The current best practice may not be the best way after the next wave of complexity or innovation. Enterprises are already exploring new ways to improve chargeback along two key vectors:

  • Attribution
    • What is a tenant?
    • Can we change how a tenant is defined on the fly?
    • Can we group workloads across queues or even clusters?
  • Flexibility
    • Can we re-calculate chargeback totals if needed?
    • Can we charge different rates for high-performance hardware like SSDs and GPUs?
    • What about network and disk IO as factors?

Are You Ready for the Future?

The key to successful chargeback is to capture the performance data for every workload running on the platform in a way that supports whichever chargeback model best serves your business needs. Time-series data, captured every five seconds across every hardware component and application layer, is the optimal way to accomplish this goal.
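The sketch below illustrates why that granularity matters: once usage exists as fine-grained samples, the same data can be re-aggregated under any tenant definition after the fact, whether that is a user, a queue, or some custom grouping. The sample records here are hypothetical.

    # Sketch: the same fine-grained usage samples re-aggregated under two
    # different tenant definitions. Sample records are hypothetical.
    from collections import defaultdict

    # Each sample: (timestamp, user, queue, cpu_seconds, memory_mb_seconds)
    samples = [
        (1554800000, "alice", "etl",   4.8, 61440),
        (1554800005, "bob",   "adhoc", 2.1, 20480),
        (1554800005, "alice", "ml",    9.6, 122880),
    ]

    def totals(samples, tenant_of):
        """Aggregate CPU and memory usage by whatever 'tenant' means today."""
        agg = defaultdict(lambda: [0.0, 0.0])
        for _, user, queue, cpu, mem in samples:
            key = tenant_of(user, queue)
            agg[key][0] += cpu
            agg[key][1] += mem
        return dict(agg)

    print(totals(samples, lambda user, queue: user))   # charge by user
    print(totals(samples, lambda user, queue: queue))  # charge by queue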

Pepperdata performance management solutions, including flexible chargeback, are all built on the foundation of our unique, best-in-class data platform. To learn more, contact us at info@pepperdata.com.


April 9, 2019

Maintain SLAs and Performance after Migrating to the Cloud

We learned of some interesting big data use cases and met many people embarking on exciting projects last week at the Strata Data Conference in San Francisco. Cloud migration has been an especially hot topic for enterprises lately, and not surprisingly, it came up often at the show. To remain competitive, organizations need real-time access to their data, and the reality is that a multi-cloud strategy is likely inevitable. So it makes sense that, along with helping attendees understand the importance of unified big data performance management, we spent much of our time discussing cloud migration.

We recently identified Three Crucial Considerations When Assessing Your Big Data Cloud Migration, which addressed pre-migration analysis. Today we’ll look at the bigger picture, from pre- to post-migration, to help ensure applications in the cloud perform reliably and meet their SLAs.

Pre-Migration Baseline and Workload Mapping 

Migrating to the cloud without first looking at on-premises performance data is risky. If a migration is not done right, you will likely have to devote time, resources, and money to re-engineering applications, and complaints (and missed SLAs) will be inevitable. Assess your current on-premises environment carefully and look closely at what's going on. Collect and analyze data, see how applications are performing, gauge your SLAs, and determine where usage bursts and performance dips typically occur. The only way to do that is to collect and measure actual metrics over time: look at what's happening for a month so you can see where the spikes are, when you are more likely to see issues, and what you need to ensure SLAs are maintained when you move to the cloud.

Each cloud service provider offers various instance types, each with different compute, memory, and storage capabilities. You'll want to analyze your baseline assessment against the various instance types to identify which workloads, with their baseline and burst levels, are best suited for static or on-demand instances based on CPU and memory requirements. These results will help you calculate costs for each workload across the various cloud providers to ensure the most cost-effective implementation of your hybrid or multi-cloud strategy.
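As a rough illustration of that split, the sketch below sizes static capacity for the steady portion of an observed memory profile and plans on-demand instances for the burst above it. The instance size, the percentile used for the baseline, and the numbers are illustrative assumptions, not a prescribed method.

    # Sketch: split an observed memory profile into a steady baseline (static
    # instances) and a burst component (on-demand instances). Numbers are
    # illustrative placeholders.
    import math

    # Hourly peak memory demand (GB) from the pre-migration baseline, shortened.
    hourly_peak_gb = [180, 200, 190, 210, 520, 480, 205, 195, 185, 200]

    INSTANCE_MEMORY_GB = 64  # hypothetical instance size

    def static_plus_on_demand(peaks, baseline_pct=0.5):
        ordered = sorted(peaks)
        baseline = ordered[int(len(ordered) * baseline_pct)]  # median demand
        burst = max(peaks) - baseline
        static_count = math.ceil(baseline / INSTANCE_MEMORY_GB)
        on_demand_count = math.ceil(burst / INSTANCE_MEMORY_GB)
        return static_count, on_demand_count

    static_n, burst_n = static_plus_on_demand(hourly_peak_gb)
    print(f"static instances: {static_n}, on-demand instances for bursts: {burst_n}")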

Post-Migration Assessment

One of the biggest cloud migration hurdles is ensuring application SLAs and performance are maintained after migrating to the cloud. Skipping this crucial step could be disastrous.

Once the migration is complete, collect the same performance metrics in the cloud and compare the results to your original on-premises baseline. This will indicate whether performance levels have been maintained in the cloud or whether more work needs to be done to make improvements.
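As a simple illustration, the post-migration check might compare a job's run times against the on-premises baseline and its SLA threshold; the durations and threshold below are hypothetical.

    # Sketch: compare one job's run times before and after migration against its
    # SLA. Durations and threshold are hypothetical.
    import statistics

    on_prem_minutes = [42, 45, 44, 47, 43, 46, 44]  # pre-migration baseline
    cloud_minutes   = [41, 48, 44, 45, 43, 47, 46]  # post-migration runs
    SLA_MINUTES = 50

    def summarize(runs):
        return {"mean": statistics.mean(runs), "worst": max(runs)}

    before, after = summarize(on_prem_minutes), summarize(cloud_minutes)
    print(f"mean runtime: {before['mean']:.1f} -> {after['mean']:.1f} minutes")
    print(f"worst case: {before['worst']} -> {after['worst']} minutes (SLA {SLA_MINUTES})")
    print("SLA maintained" if after["worst"] <= SLA_MINUTES else "SLA at risk")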

Leverage Automation and Expertise

You could do the assessment manually, but it's difficult, time-consuming, and likely to delay the migration. Applying automation to pre- and post-migration discovery accelerates the process and ensures accuracy.

Pepperdata can automatically analyze and profile every workload in your cluster to accurately determine your projected cloud costs, and provide the most appropriate instance recommendations for workloads, queues, jobs, and users (learn more). We map your big data workloads to various instance types to meet SLA requirements and enable you to compare services and costs across AWS, Azure, Google Cloud, and IBM Cloud (start a free assessment here).

Additionally, if your cloud service provider is AWS, Pepperdata has partnered with the big data experts at Cloudwick to ensure migration success. Pepperdata provides Cloudwick migration services customers with a baseline of on-premises performance, maps workloads to optimal static and on-demand instances, and diagnoses any issues that may arise during migration (read the full press release). Once the migration is complete, Pepperdata collects and analyzes the same operational metrics from the cloud to assess performance results and validate migration success.

Learn more about Pepperdata and how we can help you with your cloud migration strategy here.

More resources

April 2, 2019