It’s been an extremely busy, but very productive, couple of weeks. Pepperdata sponsored two critically important events attended by many of our clients and target customers.
Thousands of developers, data engineers, data scientists, and business professionals attended each event. Spark Summit is dubbed the world’s largest event for the Apache Spark community, and DataWorks Summit is one of the industry’s premier big data community events. Together, these events offered the opportunity to participate in hundreds of formal presentations as well as countless informal conversations.
Events like these let us hear directly from clients and prospects on a range of issues. What types of projects are they working on? What works well? Where do they struggle? What are their primary concerns today, tomorrow, and in the future?
I wanted to share six key takeaways from both events based on the conference sessions that I attended, as well as direct conversations that I had with attendees:
- Formerly known as the Hadoop Summit, DataWorks Summit has expanded its sphere of influence beyond Hadoop. This includes a greater focus on Spark and the use cases Spark enables, such as machine learning, predictive analytics, and artificial intelligence. That emphasis on Spark and its use cases matches our own focus and what we observe within our customer base.
- Both events showed great momentum behind Spark. It was a hot topic in keynotes, in breakout sessions, at the booth, and in the line for coffee. Customers understand the benefits of Spark, but many are new to it and face challenges in writing applications that deliver great performance.
- Developers are using Spark to quickly develop projects with sample data on small development clusters, but face significant challenges when deploying those projects on large production clusters with production scale data.
- DevOps for Big Data strongly resonated with participants at both events, who expressed a corresponding need for tools that accelerate the DevOps cycle from code to monitoring.
- As customers maximize their Big Data investments, multitenancy usage is growing, and operators expressed an acute need for tools that help them deliver consistent performance across clusters.
- Developers and operators are coming to grips with the importance of understanding “cluster weather” when looking at the performance of any single application. Cluster weather is a term we use at Pepperdata to describe the performance impact, at any given point in time, of all the applications vying for resources on a cluster, along with the health of the cluster’s resources, on an application of interest. We observe cluster weather as a combined view of all Spark and Hadoop applications running on the cluster and the health of all its nodes. What many customers think of as a “Big Data” problem is really a DevOps issue that requires a detailed understanding of cluster performance and the impact of individual applications in a multitenant environment. Look for a blog post focused on cluster weather in the coming weeks.
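To make the cluster weather idea concrete, here is a minimal, hypothetical Python sketch. All function names and data are invented for illustration; this is not Pepperdata’s implementation, just one way to picture how an application’s slowdown can be read against the aggregate load of everything else sharing the cluster:

```python
def cluster_weather(usage_by_app, capacity):
    """Aggregate per-application resource usage (e.g. CPU cores) into a
    cluster-wide utilization time series: one value per time step."""
    steps = len(next(iter(usage_by_app.values())))
    return [sum(series[t] for series in usage_by_app.values()) / capacity
            for t in range(steps)]

def contended_steps(weather, threshold=0.9):
    """Time steps where overall demand is near capacity: an application
    running slowly at these times may be a victim of cluster weather
    rather than a badly written job."""
    return [t for t, load in enumerate(weather) if load >= threshold]

# Per-application CPU-core usage sampled at five time steps (invented data).
usage = {
    "etl_job":  [10, 12, 30, 32, 12],
    "ml_train": [20, 22, 40, 44, 20],
    "ad_hoc":   [ 5,  6, 20, 18,  4],
}
weather = cluster_weather(usage, capacity=100)  # fraction of 100 cores in use
print(weather)                  # [0.35, 0.4, 0.9, 0.94, 0.36]
print(contended_steps(weather)) # [2, 3] — steps where the cluster is contended
```

In this toy view, time steps 2 and 3 are “bad weather”: if `etl_job` ran slowly then, the cause may be contention from its neighbors, not the job itself.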
Based on hundreds of conversations over the past couple of weeks, I am confident that Pepperdata is well positioned to serve these industry trends. Pepperdata collects fine-grained time-series data across the full stack and combines it with active, automatic controls that maximize cluster utilization, making visible the performance impacts of developing and running Big Data clusters. Other tools don’t provide performance views into applications, nodes, and clusters to identify where problems originate. Several customers revealed that prior to selecting Pepperdata, they spent months using other