The Big Data Performance Company

DevOps is the modern standard for application development and deployment, fostering collaboration and communication between developers, quality assurance, and IT operations teams. DevOps toolchains improve and automate stages and feedback loops within the DevOps cycle: plan, code, build, test, release, deploy, operate, and monitor. DevOps can shorten time to delivery, improve user satisfaction, deliver a higher quality product, improve productivity and efficiency, and better meet user needs by allowing faster experimentation.
DevOps is part of many successful Big Data environments, even when it is not labeled as such: rapid iteration, feedback, and frequent release cycles are hallmarks of many such environments.

Performance is Key for Big Data Developers and Operators

With Big Data, performance can mean the difference between business critical and business useless. Better performance allows users to discover more, run more accurate models, and extract more value from data. And it’s not just raw speed—operations teams must be able to manage systems to achieve performance, reliability, and scale, for example when running mixed data science and production-critical jobs on multi-tenant clusters. Modern distributed Big Data systems and applications are complex, and achieving acceptable performance is often a challenge.
Since performance is critical to Big Data, it is a key part of any thinking about DevOps for Big Data. The feedback loop and toolchain must include rich performance feedback that is usable by developers, analysts, data scientists, quality assurance, and operations staff.
Every step of the DevOps cycle—plan, code, build, test, release, deploy, operate, and monitor—must become performance aware. For example, code that is not parallel enough, tests that do not check performance, deployment on improperly configured hardware, and resource under-utilization or over-contention can limit or negate the success of Big Data applications.
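As a generic illustration of a performance-aware test (not a Pepperdata feature; the function and budget here are hypothetical), a test can assert a latency budget alongside correctness so that regressions in speed fail the build just like regressions in output:

```python
import time

def process_records(records):
    # Stand-in for a real batch job: square each value.
    return [r * r for r in records]

def test_throughput_budget():
    records = list(range(100_000))
    start = time.perf_counter()
    result = process_records(records)
    elapsed = time.perf_counter() - start
    # A correctness check alone does not catch performance regressions...
    assert result[10] == 100
    # ...so the test also enforces an explicit performance budget.
    assert elapsed < 2.0, f"processing took {elapsed:.2f}s, budget is 2.0s"

test_throughput_budget()
```

In practice the budget would be calibrated against representative data volumes and hardware, and run in CI as part of the build and test stages.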

Bringing performance feedback to every phase of the DevOps cycle is critical to business success.

Pepperdata Application Performance Management (APM) and Operations Performance Management (OPM)

The Pepperdata product suite brings this critical performance feedback to every phase of the DevOps cycle for Big Data. Pepperdata Cluster Analyzer collects hundreds of metrics tracking the use of CPU, memory, disk I/O, and network resources by container/task, job, and user on a second-by-second basis. Operators and developers can use this highly granular performance data to troubleshoot and solve performance issues in their applications and clusters overall.
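To make the idea of fine-grained sampling concrete, the following is a minimal, generic sketch using only Python's standard library; it is not Pepperdata's actual collector, and the function name and fields are illustrative assumptions. It records coarse CPU and memory usage for the current process at a fixed interval, a much simplified version of per-task, second-by-second sampling:

```python
import resource
import time

def sample_self_metrics(interval_s=1.0, samples=3):
    """Collect coarse CPU/memory readings for this process at a fixed
    interval. Hypothetical sketch, not Pepperdata's collector, which
    gathers hundreds of metrics per container/task, job, and user."""
    readings = []
    for _ in range(samples):
        ru = resource.getrusage(resource.RUSAGE_SELF)
        readings.append({
            "timestamp": time.time(),
            "user_cpu_s": ru.ru_utime,    # CPU time in user mode
            "system_cpu_s": ru.ru_stime,  # CPU time in kernel mode
            "max_rss_kb": ru.ru_maxrss,   # peak resident set size (KB on Linux)
        })
        time.sleep(interval_s)
    return readings
```

A production system would sample every container across the cluster, tag each reading with job and user identity, and stream the results to a central store for troubleshooting and reporting.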
Pepperdata brings total performance management to Big Data. For developers, Application Spotlight, our self-service application performance management (APM) portal, makes it easy to get recommendations and insights into how to optimize applications and into the root causes of bottlenecks and failures. For operators, the Cluster Analyzer operations performance management (OPM) solution makes it easy to identify the applications and users causing issues on the platform, proactively alert on those issues, and improve cluster performance. Cluster Analyzer also provides roll-up reports for tasks such as chargeback and capacity planning. Pepperdata software also automatically and dynamically optimizes the use of cluster resources with machine learning algorithms, enabling administrators to keep the cluster at peak performance. The Capacity Optimizer add-on module automatically increases cluster throughput by 30-50% by addressing some of the inefficiencies in how YARN manages resources today.