Apache Spark is playing a critical role in the adoption and evolution of Big Data technologies because it gives enterprises more sophisticated ways to leverage Big Data than Hadoop does. The amount of data being analyzed and processed through the framework is massive, still growing, and continues to push the boundaries of the engine.
Built on the premise that Apache Spark is the only unified analytics engine that combines large-scale data processing with state-of-the-art machine learning and AI algorithms, the 2019 Spark + AI Summit rolled into San Francisco last week. The event was billed as the largest data and machine learning conference in the world. The Pepperdata team was there to exhibit and meet customers and prospects, giving us an opportunity to learn more about customer wants and needs in the rapidly evolving Big Data application and infrastructure performance (APM / IPM) environment.
This was not a large-scale event, but it was very well attended. It’s clear that Big Data is playing an increasingly important role in business operations that rely on massive amounts of data to support revenue-generating online applications. Think always-on streaming services, on-demand applications, and business-critical transactional processing for retail, travel, banking, insurance, and healthcare services.
We saw a constant stream of visitors to the Pepperdata exhibit booth, with many large enterprises represented. Here are some of our impressions and takeaways from this event…
Although the summit focused on Spark and AI, the common thread running through it was the adoption of and migration to the cloud. The big cloud service providers were in evidence, including the Big Four: AWS, Azure, Google Cloud and IBM Cloud. And of course, Databricks runs its unified analytics platform in the cloud. Almost every exhibitor had booth messaging that referenced the cloud, and many sessions at the event had a cloud theme. Up, up and away!
We spoke with a lot of people who are considering managed platforms, ephemeral clusters, or no cluster at all because they are tired of wrestling with large Hadoop platforms. Data science in the cloud is happening at scale, but the cloud can be hard to manage, so many enterprises are choosing to off-load cloud management to vendors like Databricks and Snowflake. We also had many visitors to our booth who were adopting a hybrid cloud strategy involving two or more cloud service providers, with one as primary and another as failover.