What You Need to Know When Choosing a Big Data Performance Management Solution

The tendency to classify and categorize people, objects, information, and experiences is a deeply ingrained aspect of human nature. In many cases, this is a good thing. Without this ability, we’d quickly become overwhelmed in every new situation. Nevertheless, this fundamental skill can also be misleading and sow confusion and misunderstanding.

Consider information technology, with its plethora of products, markets, and associated acronyms. Let’s take Application Performance Management (APM), for example. What is APM, exactly? Gartner defines it as one or more software and hardware components that facilitate monitoring to meet five main functional dimensions: end-user experience monitoring; runtime application architecture discovery, modeling, and display; user-defined transaction profiling; component deep-dive monitoring in an application context; and analytics.

In other words, APM comprises the tools and solutions used to analyze and track a software application’s performance as well as to alert the system administrator or other stakeholders of any problematic performance issues. Sometimes the term is used interchangeably with application performance “monitoring”. Either way, performance management tools help IT professionals discover root causes of infrastructure or application issues.

Why APM and IPM Are Merging

Many enterprises maintain separate Infrastructure Performance Management (IPM) and APM services to monitor and manage performance issues for individual vendors, technologies, or domains. In the world of Big Data, APM and IPM are merging, eliminating the need for separate domain-specific tools and processes. Pepperdata, for example, provides an integrated platform that combines performance data across infrastructure and applications and continuously correlates application performance with infrastructure performance in near real time.

Some vendors that have no insight into infrastructure performance management may claim to guarantee the reliability and performance of your Big Data applications. How can they do that, you may ask? The fact is, they can’t. Without insight into your operational infrastructure, you’re missing a critical aspect that can make or break the performance of your Big Data application environment and negatively impact user experience.

Assuming an application’s code is working properly, the performance of the application ultimately depends on physical infrastructure. In distributed computing environments, shared physical resources (CPU, memory, storage and networking) support virtual machines (VMs), and those resources may become overcommitted—especially during peak usage times. For example, a VM may be under-provisioned because it is configured with inadequate memory, a situation similar to that of an application running on a physical server with inadequate memory. One impact of this is constant memory swapping, which can slow application performance.
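To make the overcommitment idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: the Host class, the field names, and the sample figures are all hypothetical. It compares the memory configured for the VMs on each host with the physical memory that host actually has.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical record; field names are illustrative, not from any real tool.
    name: str
    physical_mem_gb: float
    vm_mem_gb: List[float]  # memory configured for each VM on this host


def overcommit_ratio(host: Host) -> float:
    """Memory promised to VMs divided by the host's physical memory."""
    return sum(host.vm_mem_gb) / host.physical_mem_gb


def overcommitted(hosts: List[Host], threshold: float = 1.0) -> List[str]:
    """Names of hosts whose VMs are configured with more memory than exists."""
    return [h.name for h in hosts if overcommit_ratio(h) > threshold]


# Made-up sample data for illustration.
hosts = [
    Host("node-a", 256.0, [64.0, 64.0, 64.0]),
    Host("node-b", 128.0, [64.0, 64.0, 32.0]),
]

for h in hosts:
    print(h.name, overcommit_ratio(h))
print("overcommitted:", overcommitted(hosts))
```

A ratio above 1.0 does not by itself mean applications are swapping; it only marks hosts where swapping becomes possible once the VMs use their full allocation at the same time, which is most likely at peak usage.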

An APM vendor that doesn’t support comprehensive IPM might suggest you buy a separate IPM product to discover the root cause of infrastructure performance issues. But siloed APM and IPM systems are problematic: They discourage collaboration and complicate efforts to correlate and analyze IT performance data and application performance data.

The most efficient approach is a converged solution that blends APM and IPM within a unified Big Data performance platform. This gives IT professionals comprehensive visibility into their platform and the issues impacting their services, and enables them to address infrastructure and application issues proactively rather than reactively. This approach not only delivers greater user satisfaction and productivity but is also much more scalable.

Big Data Performance Management – What’s Needed

A true Big Data performance management solution must be able to determine (a) how applications are impacting infrastructure performance and (b) how the infrastructure is impacting application performance. Without tight integration of APM and IPM, it’s not possible to meaningfully correlate the two. Unlike fragmented and limited approaches to managing Big Data performance, Pepperdata provides a unified solution that combines application performance management (APM) and infrastructure performance management (IPM) into a single, comprehensive platform.
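As a rough illustration of what correlating the two kinds of data can mean, the Python sketch below computes a Pearson correlation between an infrastructure metric and an application metric sampled over the same intervals. It is a simplified stand-in, not a description of Pepperdata's method. The function name, the metric names, and the sample values are assumptions made for this example.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum(a * b for a, b in zip(dx, dy)) / denom


# Hypothetical samples taken over the same five intervals.
cluster_cpu_util_pct = [35, 50, 62, 78, 91]    # infrastructure side
job_runtime_seconds = [120, 131, 150, 171, 198]  # application side

r = pearson(cluster_cpu_util_pct, job_runtime_seconds)
print(f"correlation between cluster CPU and job runtime: {r:.2f}")
```

A value near 1 suggests that job runtimes rise with cluster load, which points toward the infrastructure rather than the application code. Correlation alone does not establish cause, so a real analysis would also need task-level and node-level detail.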

Automation is playing a greater role in performance management solutions. Driven by AI and machine learning, Pepperdata’s unified IPM and APM solution automatically detects performance roadblocks, something that products limited to APM simply cannot effectively do. AI and ML enable Pepperdata to continuously learn, identify, and address abnormalities in infrastructure and application behavior by adjusting policies to account for changes in usage and performance.

Some APM vendors promise to “untangle” your data and help you tune your jobs, but that’s a limited focus. Pepperdata’s Big Data performance management solutions do that as well, but they also include a strong operational component that gives IT answers to important questions such as:

  • What application is blowing up our cluster right now?
  • What applications are reading/writing small files?
  • What drives are about to die?
  • What application just flooded HDFS and is hammering the NameNode?
  • Who are my most expensive users?

Questions to Ask Before You Commit

Before you commit to a Big Data performance management solution from any vendor, here are some important issues to consider:

  • Can the solution correlate the applications to the infrastructure, from node level to application level, and show you the root cause of a performance problem?
  • Can the solution instrument everything on your Big Data platform?
  • Can the solution collect sufficiently granular real-time metrics on jobs and tasks, including CPU, RAM, disk I/O, and network usage, to help you isolate and identify problems?
  • Can the solution continuously collect real-time data from applications and platform resources (history servers, resource managers, etc.)?
  • Can the solution scale to support your organization as it grows to hundreds or thousands of nodes?
  • Can the solution help you automatically optimize resource capacity to help you meet your SLAs?
  • Can the solution provide you with queue-level information and help your workload better fit the cluster?
  • Can the solution be applied in complex, multi-tenant cluster environments?
  • Can the solution help you get more out of your investment?

These are just some of the questions you need to ask when you evaluate a Big Data performance management solution. Remember, Big Data performance management involves more than just Big Data APM: It must also provide infrastructure performance management and be able to correlate operational and application performance data in order to optimize performance across the enterprise.

Pepperdata not only helps optimize applications but also helps your operators understand what’s happening across the system. An application slowdown can be the result of a busy cluster, and Pepperdata is the only solution that can automatically correlate the infrastructure and application events to make this determination. Pepperdata then takes the next step to automatically and instantaneously tune your Big Data platform for optimum performance. As a result, most of our customers see a 30-50% throughput increase without any code or configuration changes, meaning you can get up to 50% more compute power without purchasing additional hardware.

Contact sales@pepperdata.com today and learn how we can help you.
