As a capacity manager, you’re responsible for managing costs and growth as well as driving business by helping developers succeed with their applications. Both the applications and the platform must be high-performing, and developers must understand the aspects of the platform that affect their applications. It’s one thing to get a platform up and running and allow users to run applications; it’s another thing to make truly effective decisions and changes.
Capacity managers are in a unique position because they have a responsibility to the business to ensure that users are satisfied and that the business benefits from optimal performance and capacity. With your APM solution, you should be able to answer the following questions:
- Can your platform go faster?
- Are users constrained in any way?
- Do your business priorities align with application performance?
- Are you getting all the correct data to answer performance questions?
- Is your solution helping you to maximize your big data resource investment?
- Are you able to receive alerts about potential problems before they happen?
- Can you provide self-service access so that users can check their own application status?
- Are there any drives that are about to die?
- Who’s blowing up the cluster?
- How can I run more applications?
- Why does YARN say it’s full when I know I have capacity?
To answer these questions and leverage your existing investment, you need an APM solution that:
- Improves throughput, uptime, efficiency and performance in a multi-tenant environment.
- Provides comprehensive reporting for accurate capacity planning.
- Recaptures wasted resources.
- Identifies the users and applications that are putting the biggest demands on your cluster.
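As a rough illustration of the last two points, the sketch below ranks users by the memory reserved for their applications and estimates how much of that reservation went unused. It is a minimal example under stated assumptions, not how any particular APM product works: the `AppUsage` record, its field names, and the sample figures are all hypothetical, and a real deployment would pull these values from its own metrics store.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AppUsage:
    """One application's memory footprint (hypothetical record layout)."""
    user: str
    app_id: str
    allocated_gb: float  # memory reserved for the application's containers
    peak_used_gb: float  # highest memory the application was observed using


def wasted_gb(app: AppUsage) -> float:
    """Memory that was reserved but never used, clamped at zero."""
    return max(app.allocated_gb - app.peak_used_gb, 0.0)


def demand_by_user(apps):
    """Return (user, allocated_gb, wasted_gb) tuples, largest allocation first."""
    allocated = defaultdict(float)
    wasted = defaultdict(float)
    for app in apps:
        allocated[app.user] += app.allocated_gb
        wasted[app.user] += wasted_gb(app)
    return sorted(
        ((user, allocated[user], wasted[user]) for user in allocated),
        key=lambda row: row[1],
        reverse=True,
    )


# Made-up sample data, for illustration only.
SAMPLE_APPS = [
    AppUsage("alice", "app_01", allocated_gb=64.0, peak_used_gb=20.0),
    AppUsage("bob", "app_02", allocated_gb=16.0, peak_used_gb=15.0),
    AppUsage("alice", "app_03", allocated_gb=32.0, peak_used_gb=30.0),
]

if __name__ == "__main__":
    for user, alloc, waste in demand_by_user(SAMPLE_APPS):
        print(f"{user}: allocated {alloc:.0f} GB, unused {waste:.0f} GB")
```

A gap between reserved and used memory like the one in this sample is one plausible reason a scheduler can report a cluster as full while hosts still have idle capacity, since reservations typically follow the requested container size rather than actual usage.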
A comprehensive APM solution collects all relevant data: metrics on both application workloads (user behaviors, errors, response times, API calls, etc.) and the environment (resource utilization, data sources, etc.), to obtain accurate and useful insights. By measuring and tracking user transactions, you can understand how applications behave and whether SLAs are being met. Environment measurements help you identify patterns in resource usage and capacity demands. The goal of measuring and analyzing both is to deliver an excellent user experience and get the most out of your infrastructure.
Pepperdata: Massive Scale APM
Pepperdata provides a complete view of your entire cluster so that you can uncover performance problems, identify patterns that affect the entire application environment, and make intelligent resource decisions. Pepperdata continuously collects extensive data on hundreds of real-time metrics from all of your applications and infrastructure resources: metrics about CPU, RAM, disk I/O, and network usage for every job, task, user, host, workflow, and queue. This data is not available with any other tool or solution. Pepperdata helps you understand and improve platform performance, reduce mean time to problem resolution, and increase capacity utilization by 30-50% without adding new hardware. In addition to surfacing performance bottlenecks, Pepperdata provides automatic tuning for recurring applications, delivers app-specific recommendations, and allows you to set up alerts on specific behaviors and outcomes to avoid the risk of failure.
360° Platform View
You need a holistic source of operational and performance truth across your clusters. Pepperdata allows you to access real-time and historical data about the cluster, including system demand, abusive users, and wasteful applications, queues, and container sizes. Drill down or zoom out to analyze any application and understand