A few of us from Pepperdata attended HBaseCon in San Francisco yesterday. We got to talk to quite a few people and sat in on some of the sessions. There was a lot of information to take in, but here are some of the things that really stood out for us:

  • People want to run HBase and MapReduce on the same cluster, but it’s not fun. We heard a lot of people talking about the challenges of running MapReduce alongside HBase, yet they still want to do it. If you’re batch loading data into HBase, you’re typically running MapReduce jobs on that same cluster. At Pepperdata, we’re working on something that will help with that. More to come!
  • HBase can be pushed to the extreme, if you’re careful. Rocket Fuel gave a great talk on how they implemented large-scale, real-time lookups with consistent response times under 10 ms, including the challenges they faced along the way. Getting HBase and Hadoop to cooperate is painful: the HDFS clients need to follow the circuit breaker design pattern, which has to be implemented separately for each client. It would be much better if the pattern were applied automatically.
  • Operations strategies for HBase are constantly evolving. As several of the operations speakers noted, it’s important to think about how to distribute HBase workloads properly and place them in the right clusters. The challenge is that workload requirements change rapidly as the available technology advances, so a distribution across clusters that works now may not work in a few months. It’s hard for IT operations pros to keep up, and there’s a big opportunity for both the community and the broader ecosystem to help.
  • People love ice cream. Maybe it’s just the nice weather here, but the afternoon ice cream break was great and very popular!
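
For readers who haven’t met the circuit breaker pattern from the Rocket Fuel talk, here is the general idea. Every remote call goes through a breaker. After a run of failures the breaker "opens" and fails fast instead of waiting on a slow node, which matters when you have a tight latency budget. After a cooldown it lets one trial call through and closes again if that call succeeds. The following is a minimal, generic sketch in Python, assuming a simple consecutive-failure threshold and a fixed cooldown. It is not Rocket Fuel’s implementation or any HDFS client API, and the names `CircuitBreaker`, `CircuitOpenError`, and `ManualClock` are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (illustrative only).

    closed    -> calls pass through; consecutive failures are counted
    open      -> calls fail fast until reset_timeout seconds have elapsed
    half-open -> one trial call is allowed; success closes, failure reopens
    """

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn, *args, **kwargs):
        # Fail fast instead of waiting on a backend we believe is unhealthy.
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success resets the breaker to the closed state.
        self.failures = 0
        self.opened_at = None
        return result


class ManualClock:
    """Hypothetical test clock so the cooldown can be advanced deterministically."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now
```

In a real client, you would wrap each read to a storage node in `breaker.call(...)` and use a separate breaker per node, so one slow node doesn’t block the others. That per-client wiring is exactly the repetitive work the bullet above wishes were automatic.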
