Two years ago, IBM made a $300 million commitment to Spark. The investment shows how far Spark has come from its humble beginnings as a UC Berkeley AMPLab project to a production-grade open source project. More importantly, it demonstrates the urgency around Spark as enterprise adoption continues.
I witnessed Spark’s momentum firsthand as one of its earliest adopters. I first came across Spark back in 2011 at Knobout, a Big Data analytics startup I founded with Dr. Nando DeFreitas, Dr. Alex Smola, Peter Cnudde, and Ian O’Connell. Back then, we relied on MapReduce to run our algorithms, each of which took a day or more running on tens of thousands of machines.
As we learned about Spark, we decided to give it a try. We retooled our infrastructure, learned Scala, rebuilt the algorithms, and ran our analysis. We were among the first to use Spark on tens of thousands of machines and found that Spark was one to two orders of magnitude faster than MapReduce. We were able to take tens of hours of run time and reduce it to minutes!
I remain convinced that Spark is the future of Big Data. Spark has quickly evolved to address the gamut of Big Data applications including interactive, analytical and batch applications. Momentum for Apache Spark continues to build, and I believe Spark will completely replace MapReduce.
This is why Pepperdata has increasingly focused its engineering efforts on Spark. Today we are introducing Pepperdata Code Analyzer for Apache Spark to fill a void in application development for Spark. Code Analyzer helps developers optimize their Spark applications for large-scale production by providing easy access to performance feedback for any particular block of code within a Spark application.
Historically, using the performance metrics from the Spark Web UI has been a challenge for developers. Code Analyzer makes it easy for Spark developers to accurately measure how cluster resources (CPU, memory, and network and disk I/O) are consumed. Dev teams can pinpoint the specific segment of application code responsible for a performance issue and communicate with each other about those issues within the DevOps paradigm.
This is all part of the Pepperdata vision to deliver products that are indispensable for operating Big Data systems in production. As with the recent release of Pepperdata Application Profiler and other Pepperdata products, our commitment is to address the entire DevOps cycle by delivering solutions that increase its velocity for production applications.
Keep an eye out for more exciting news from Pepperdata this year.