DevOps for Big Data

DevOps is the modern standard for application development and deployment, fostering collaboration and communication among developers, quality assurance, and IT operations teams. DevOps toolchains automate the stages and feedback loops of the DevOps cycle: plan, code, build, test, release, deploy, operate, and monitor. DevOps can shorten time to delivery, improve user satisfaction, deliver a higher-quality product, raise productivity and efficiency, and better meet user needs by enabling faster experimentation.

DevOps is already part of many successful Big Data environments, even if it is not always recognized as such: DevOps-style rapid iteration, feedback, and frequent releases are clearly at work in these environments.

Performance Is Key in DevOps for Big Data

With Big Data, performance can mean the difference between business-critical and business-useless. Better performance allows users to discover more, run more accurate models, and extract more value from data. And it is not just raw speed: operations teams must be able to manage systems to achieve performance, reliability, and scale, for example when running mixed data science and production-critical jobs on multi-tenant clusters. Modern distributed Big Data systems and applications are complex, and achieving acceptable performance is often a challenge.
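As a rough illustration of one multi-tenant control point, the sketch below submits a PySpark job to a dedicated YARN scheduler queue so that production work is isolated from ad-hoc analysis. The queue name "production" and the executor count are assumptions; the queues themselves must already be defined in the cluster's scheduler configuration.

    # A minimal sketch, assuming Spark on YARN. The queue name is
    # hypothetical; "spark.yarn.queue" and "spark.executor.instances"
    # are standard Spark configuration keys.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("nightly-etl")
        .master("yarn")
        # Target the production queue so ad-hoc data science jobs in
        # other queues cannot starve this workload of resources.
        .config("spark.yarn.queue", "production")
        .config("spark.executor.instances", "20")
        .getOrCreate()
    )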

Because performance is critical to Big Data, it must be a key part of any thinking about DevOps for Big Data. The feedback loop and toolchain must include rich performance feedback that developers, analysts, data scientists, quality assurance, and operations staff can all use.

Every step of the DevOps cycle (plan, code, build, test, release, deploy, operate, and monitor) must become performance-aware. For example, code that is not parallel enough, tests that do not check performance, deployment on improperly configured hardware, and resource under-utilization or over-contention can each limit or negate the success of Big Data applications.
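To make one of these concrete, the sketch below shows what a performance-aware test might look like: a pytest test that checks a small PySpark word-count job for correctness and then gates it against a time budget. The run_wordcount helper and the 60-second budget are illustrative assumptions, not part of any particular toolchain.

    # A minimal sketch of a performance-aware test, assuming pytest
    # and a local PySpark session. The job and its time budget are
    # hypothetical examples.
    import time

    import pytest
    from pyspark.sql import SparkSession


    @pytest.fixture(scope="module")
    def spark():
        session = (SparkSession.builder
                   .master("local[4]")
                   .appName("perf-test")
                   .getOrCreate())
        yield session
        session.stop()


    def run_wordcount(spark, path):
        # Hypothetical job under test: count occurrences of each word.
        lines = spark.read.text(path).rdd.map(lambda row: row.value)
        return (lines.flatMap(lambda line: line.split())
                     .map(lambda word: (word, 1))
                     .reduceByKey(lambda a, b: a + b)
                     .collect())


    def test_wordcount_meets_time_budget(spark, tmp_path):
        sample = tmp_path / "sample.txt"
        sample.write_text("to be or not to be\n" * 10_000)

        start = time.monotonic()
        counts = dict(run_wordcount(spark, str(sample)))
        elapsed = time.monotonic() - start

        # Check correctness first, then gate on the time budget so a
        # performance regression fails the build just as a bug would.
        assert counts["be"] == 20_000
        assert elapsed < 60, f"word count took {elapsed:.1f}s; budget is 60s"

Gating on wall-clock time is the simplest option; teams can also gate on richer signals, such as records processed per second or peak executor memory.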

Bringing performance feedback to every phase of the DevOps cycle is critical to business success.

Pepperdata Products and DevOps for Big Data

The Pepperdata product suite brings this critical performance feedback to every phase of the DevOps cycle for Big Data. Pepperdata software collects hundreds of metrics tracking the use of CPU, memory, disk I/O, and network resources by container/task, job, and user on a second-by-second basis. Operators and developers can use this highly granular performance data to troubleshoot and resolve performance issues in individual applications and in the cluster as a whole. Pepperdata software also automatically and dynamically optimizes cluster resource usage with machine learning algorithms, enabling administrators to implement policies that guarantee the completion of high-priority jobs while keeping the cluster at peak performance.
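This is not Pepperdata's implementation, but as a rough illustration of what second-by-second, per-process resource sampling involves, the sketch below takes one snapshot per second of CPU, memory, and disk I/O for every visible process using the psutil library. Attributing processes to containers, jobs, and users, and shipping samples to a time-series store, are assumed to happen downstream.

    # A generic sketch of per-process resource sampling with psutil;
    # it is not Pepperdata's implementation. io_counters() assumes a
    # Linux host.
    import time

    import psutil


    def sample_processes():
        """Take one snapshot of CPU, memory, and disk I/O per process."""
        samples = []
        for proc in psutil.process_iter(["pid", "username", "name"]):
            try:
                with proc.oneshot():  # batch the underlying syscalls
                    io = proc.io_counters()
                    samples.append({
                        "ts": time.time(),
                        "pid": proc.info["pid"],
                        "user": proc.info["username"],
                        "name": proc.info["name"],
                        "cpu_percent": proc.cpu_percent(interval=None),
                        "rss_bytes": proc.memory_info().rss,
                        "read_bytes": io.read_bytes,
                        "write_bytes": io.write_bytes,
                    })
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                continue  # the process exited or is not visible to us
        return samples


    if __name__ == "__main__":
        while True:
            snapshot = sample_processes()
            # A real collector would tag samples by job and ship them
            # to a time-series store; here we just report the count.
            print(f"sampled {len(snapshot)} processes")
            time.sleep(1)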