2017 was a good year for Pepperdata, from introducing developer tools that give Hadoop and Spark developers easy-to-understand recommendations for improving job performance, to working closely with our growing customer base of Big Data heavy hitters. We saw Big Data change, with less emphasis on infrastructure and more on machine learning and applications. Spark continued to gain momentum, becoming the dominant framework for developing new Big Data applications. We expect 2018 to bring even more innovation and breakthroughs, and we are excited about what these changes mean for Pepperdata in 2018 and beyond.
We gathered four Pepperdata experts to discuss Big Data trends for 2018 and asked each of them to make a prediction. Here’s what we came up with:

AI will free us to focus on impactful decisions.

Despite the hype, AI has demonstrated value in industries across the board — from agriculture to biotech to manufacturing. AI is just beginning to ingest data to power services and offerings and to provide the information needed for better decision-making.
AI’s success will continue in the new year, particularly in a new area: troubleshooting. Expect AI to make a mark on troubleshooting for operators, data centers, and beyond, helping individuals tackle day-to-day issues so they can focus on the critical problems AI itself can’t solve. In 2018, AI will guide and augment humans in solving hard problems, further cementing its value as a cognitive partner that frees people from worrying about details so they can spend their time on more impactful decisions.
— Pepperdata CEO Ash Munshi

Spark on Kubernetes will catch fire.

Kubernetes, the open-source system for automating deployment, scaling, and management of containerized applications, has been a rising star over the past few years. In 2018, we will see its popularity take center stage. Kubernetes has already been quite successful in running microservices. Enabling the use of Spark on Kubernetes will allow IT organizations to standardize both services and Big Data on the same Kubernetes infrastructure. This will propel adoption and reduce costs by enabling IT organizations to focus more effort on applications.
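As a rough sketch of what running Spark on Kubernetes looks like in practice, the command below submits the bundled SparkPi example using Spark’s native Kubernetes scheduler (available as of Spark 2.3). The API-server URL and container image name are placeholders for illustration:

```shell
# Illustrative only: submit the SparkPi example to a Kubernetes cluster.
# Replace the API-server URL and image with your own values.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=my-registry/spark:2.3.0 \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The `local://` scheme points at a jar already baked into the container image, so driver and executors alike can run as ordinary Kubernetes pods alongside the organization’s microservices.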
— Pepperdata Software Developer Kimoon Kim

In addition to having an infrastructure for ML, soon ML will power infrastructure.

We’ve already been using our Big Data about Big Data systems to build models about performance. We’ve found that we can model and predict the effect of memory swapping on systems, classify jobs by patterns of resource consumption, and predict wait times on busy systems. These algorithms are being used as part of our cluster optimization products.
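As a loose illustration of the “classify jobs by patterns of resource consumption” idea — not Pepperdata’s actual algorithm, and with invented data — even a tiny k-means over per-job CPU and memory summaries can separate CPU-bound from memory-bound workloads:

```python
# Hypothetical sketch: group jobs by resource-consumption profile with a
# minimal k-means. Data, centroids, and thresholds are illustrative.

def kmeans(points, centroids, iters=10):
    """Cluster 2-D points (mean CPU %, mean memory GB) around centroids."""
    clusters = []
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared distance).
        clusters = [min(range(len(centroids)),
                        key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                      (p[1] - centroids[i][1]) ** 2)
                    for p in points]
        # Recompute each centroid as the mean of its assigned points.
        for i in range(len(centroids)):
            members = [p for p, c in zip(points, clusters) if c == i]
            if members:
                centroids[i] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return clusters

# (mean CPU %, mean memory GB) per job: two CPU-heavy, two memory-heavy jobs
jobs = [(90, 4), (85, 6), (20, 60), (15, 55)]
labels = kmeans(jobs, centroids=[(80, 10), (20, 50)])
print(labels)  # → [0, 0, 1, 1]: CPU-bound jobs in cluster 0, memory-bound in 1
```

In production such features would come from time series of container metrics rather than two hand-picked averages, but the principle — learning structure from the cluster’s own telemetry — is the same.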
We will see more applications like these. For example, recent research on learned data structures will influence database design, and we’ll see real-world implementations of GPU-accelerated data structures that use dataset-specific learned models.
— Pepperdata Co-Founder and CTO Sean Suchter

DataOps will replace DevOps to accelerate the pace of application development.

Tools developed today are empowering data scientists to become active users of Big Data, enabling them to develop and iterate on models with production data. Major vendors, including Databricks, MapR, and Cloudera, have built tools that let business users, data engineers, and data scientists collaborate and iterate quickly from business use case to model and application development. We expect these tools to become more sophisticated, reducing the need for data engineers to write special code so that models developed by data scientists run efficiently on Big Data clusters.
— Pepperdata Director of Product Management Vinod Nair

Bonus Prediction: We won’t need to write this yearly prediction blog next year. AI will write it for us.

Take a free 15-day trial to see what Big Data success looks like

Pepperdata products provide complete visibility and automation for your Big Data environment. Get the observability, automated tuning, recommendations, and alerting you need to efficiently and autonomously optimize Big Data environments at scale.