DevOps for Big Data
DevOps is the modern standard for application development and deployment, fostering collaboration and communication between developers, quality assurance, and IT operations teams. DevOps toolchains improve and automate stages and feedback loops within the DevOps cycle: plan, code, build, test, release, deploy, operate, and monitor. DevOps can shorten time to delivery, improve user satisfaction, deliver a higher quality product, improve productivity and efficiency, and better meet user needs by allowing faster experimentation.
DevOps is a part of many successful Big Data environments, even if it is not always recognized as such. Rapid, DevOps-style iteration, feedback, and release cycles are already standard practice in many of these environments.
Performance is Key in DevOps for Big Data
With Big Data, performance can mean the difference between business-critical and business-useless. Better performance lets users discover more, run more accurate models, and extract more value from data. And it is not just raw speed: operations teams must be able to manage systems for performance, reliability, and scale, for example when mixed data science and production-critical jobs share a multi-tenant cluster. Modern distributed Big Data systems and applications are complex, and achieving acceptable performance is often a challenge.
Since performance is critical to Big Data, it is a key part of any thinking about DevOps for Big Data. The feedback loop and toolchain must include rich performance feedback that is usable by developers, analysts, data scientists, quality assurance, and operations staff.
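One way to make performance feedback a routine part of the toolchain is to instrument data jobs so that every run records basic performance metrics. The sketch below is a minimal, hypothetical illustration: the decorator name, the `word_count` job, and the in-memory `METRICS_LOG` sink are all assumptions for the example; a real toolchain would push these metrics to a shared store that developers, data scientists, and operations staff can all query.

```python
import functools
import time

# Hypothetical in-memory sink; a real deployment would push to a
# metrics store feeding dashboards and the DevOps feedback loop.
METRICS_LOG = []


def performance_feedback(job_name):
    """Decorator that records wall-clock duration and throughput
    for a data job, so every run produces performance feedback."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(records, *args, **kwargs):
            start = time.perf_counter()
            result = fn(records, *args, **kwargs)
            duration = time.perf_counter() - start
            METRICS_LOG.append({
                "job": job_name,
                "duration_s": duration,
                "records_in": len(records),
            })
            return result
        return wrapper
    return decorator


@performance_feedback("word_count")
def word_count(records):
    # Toy stand-in for a real distributed job.
    counts = {}
    for line in records:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


result = word_count(["big data", "dev ops big data"])
```

Because the instrumentation is a decorator, the same feedback mechanism can be applied uniformly across jobs without touching their logic.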
Every step of the DevOps cycle (plan, code, build, test, release, deploy, operate, and monitor) must become performance-aware. For example, code that is not parallel enough, tests that do not check performance, deployment on improperly configured hardware, and resource under-utilization or over-contention can each limit or negate the success of Big Data applications.
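A "test that does not check performance" is easy to fix in principle: alongside correctness assertions, a test can assert that the job stays within an agreed latency budget, so a performance regression fails CI like any other bug. The following is a minimal sketch, not a prescribed pattern; the `run_job` stand-in and the 2-second threshold are assumptions for illustration.

```python
import time


def run_job(records):
    # Hypothetical stand-in for the real data job under test.
    return sorted(records)


def test_meets_latency_budget():
    # A functional test extended with a performance assertion.
    records = list(range(100_000, 0, -1))
    start = time.perf_counter()
    result = run_job(records)
    elapsed = time.perf_counter() - start
    assert result[0] == 1    # correctness is still checked
    assert elapsed < 2.0     # latency budget (assumed threshold)
    return elapsed


elapsed = test_meets_latency_budget()
```

In practice the budget would be calibrated per job and per environment, and a benchmarking harness would average over several runs to reduce noise.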