With this week’s Hadoop Summit coming up, I took a few minutes to scan through the program to see what would be new and exciting. There will be a lot of great talks by great speakers – yet the real story isn’t what any one person is going to cover, but just how far Hadoop has come in the past few years.
The 2011 Hadoop Summit had 34 talks in three tracks. This year has over a hundred more, with 138 talks across six tracks.
But it goes beyond the numbers. Just take a look at the top words from the 2011 program:
String together some of the distinctive words, and you see that the earlier summits were generally about a new cluster system using MapReduce, HDFS, Hive, and HBase to process data from web users.
What a difference a few years make! Check out the 2014 program:
The 2014 summit could be described as analytics, SQL, ETL, and interactive apps on a data management platform used by business in an enterprise architecture.
I took a more detailed look at the top 100 terms across the five summit programs from 2010 to 2014. Here are the words that gained the most in frequency this year compared with the four previous summits:
For comparison, here are the ones that fell the most:
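For the curious, a gained-versus-fell comparison like this can be sketched in a few lines of Python. This is only a minimal sketch under my own assumptions, not the exact script behind the lists above: the tokenizer, the stopword list, the function names (`term_frequencies`, `frequency_shift`, `top_movers`), and the tiny sample strings are all made up for illustration. It scores each term by its share of words in the current program minus its average share across the earlier programs.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on", "with", "from"}


def term_frequencies(text):
    """Return each term's share of all non-stopword tokens in `text`."""
    tokens = [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def frequency_shift(current_text, previous_texts):
    """Current relative frequency minus the mean relative frequency in earlier programs."""
    current = term_frequencies(current_text)
    previous = [term_frequencies(t) for t in previous_texts]
    terms = set(current).union(*previous)
    shift = {}
    for term in terms:
        if previous:
            baseline = sum(p.get(term, 0.0) for p in previous) / len(previous)
        else:
            baseline = 0.0
        shift[term] = current.get(term, 0.0) - baseline
    return shift


def top_movers(shift, n=10):
    """Return (biggest gainers, biggest losers), ties broken alphabetically."""
    gainers = sorted(shift.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    losers = sorted(shift.items(), key=lambda kv: (kv[1], kv[0]))[:n]
    return gainers, losers


if __name__ == "__main__":
    # Placeholder text only; these strings are NOT the real summit programs.
    earlier_programs = ["mapreduce cluster hdfs mapreduce", "hive hbase cluster mapreduce"]
    this_year = "sql analytics enterprise sql platform"
    gained, fell = top_movers(frequency_shift(this_year, earlier_programs), n=3)
    print("gained:", [term for term, _ in gained])
    print("fell:", [term for term, _ in fell])
```

Using each term's share of words rather than raw counts keeps a program with 138 talks from swamping one with 34 simply because it contains more text.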
Hadoop has entered a new phase – it’s not the shiny new thing, it’s not hype, and the community is no longer focused on building and connecting the basic pieces. Hadoop in 2014 is real, and it’s being used by companies large and small, well beyond the original user base of tech companies. The community (and increasingly, the industry) is focused on driving real business value with real applications.
As for this week, I can’t wait for the great sessions and the chance to talk to some really smart people doing cool things. I’d love to talk to you if you’re at the summit – send me a note @chadcc!
P.S. Thanks to the Internet Archive Wayback Machine for keeping the old versions of the Hadoop Summit programs around!