As we look forward to this week’s Hadoop Summit, I took a few minutes to scan through the program to see what would be new and exciting. There will be a lot of great talks by great speakers – yet the real story isn’t what any one person is going to cover, but just how far Hadoop has come in the past few years.

What a difference a few years make! Check out the 2014 program:

The 2014 summit could be described as analytics, SQL, ETL, and interactive apps on a data management platform used by businesses in an enterprise architecture.
I took a more detailed look at the top 100 terms from the five summit programs, 2010–2014. Here are the words that gained the most in frequency this year vs. the four previous summits:

  1. storm
  2. business
  3. service
  4. yarn
  5. interactive
  6. value
  7. architecture
  8. ecosystem
  9. lessons
  10. warehouse

For comparison, here are the ones that fell the most:

  1. computing
  2. social
  3. users
  4. generation
  5. hdfs
  6. mapreduce
  7. application
  8. graph
  9. infrastructure
  10. algorithms
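The comparison above can be sketched in a few lines of Python. This is a hypothetical reconstruction, not the actual script used: it normalizes word counts per program (so programs of different lengths are comparable), averages the frequencies across the earlier summits, and ranks words by how much their frequency rose or fell in the latest one.

```python
from collections import Counter

def frequency_shifts(current_text, past_texts, top_n=10):
    """Rank words by change in normalized frequency: the current
    program vs. the average of the past programs."""
    def normalized_counts(text):
        words = text.lower().split()
        total = len(words)
        return {w: c / total for w, c in Counter(words).items()}

    current = normalized_counts(current_text)

    # Average each word's normalized frequency across past programs.
    past = Counter()
    for text in past_texts:
        for word, freq in normalized_counts(text).items():
            past[word] += freq / len(past_texts)

    deltas = {w: current.get(w, 0) - past.get(w, 0)
              for w in set(current) | set(past)}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    gainers = ranked[:top_n]          # biggest rises
    fallers = ranked[-top_n:][::-1]   # biggest falls
    return gainers, fallers
```

Feeding it the 2014 program text as `current_text` and the 2010–2013 programs as `past_texts` would produce lists like the ones above (after the usual stop-word filtering, which is omitted here for brevity).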

Hadoop has entered a new phase – it’s not the shiny new thing, it’s not hype, and the community is no longer focused on building and connecting the basic pieces. Hadoop in 2014 is real, and it’s being used by companies large and small, well beyond the original user base of tech companies. The community (and increasingly, the industry) is focused on driving real business value with real applications.
As for this week, I can’t wait for the great sessions and the chance to talk to some really smart people doing cool things. I’d love to talk to you if you’re at the summit – send me a note @chadcc!

P.S. Thanks to the Internet Archive Wayback Machine for keeping the old versions of the Hadoop Summit programs around!
