There are a number of estimates out there for how many organizations are currently using Hadoop. These estimates vary widely, and no one can put a hard figure on it. But there is little doubt that the number is growing as organizations of all sizes adopt the platform for large-scale data processing. Despite this growth in adoption, there are still myriad implementation roadblocks when moving Hadoop to production.
In a recent industry report published by Taneja Group, analyst Mike Matchett discusses issues such as the evolution of big data apps and the challenges that inherently arise for organizations trying to run one or more applications in a single cluster. Splitting up clusters to accommodate multi-tenancy or multiple applications wastes resources, both in management effort and in hardware. And guaranteeing performance on those clusters is a constant battle. As Matchett observes:
“…live performance matters as much now in Hadoop as it does for any other data center solution. There is no practical way to replicate a big data cluster and its data sets for every constituency looking for production performance sla’s including business analysts, business process owners, multiple departments with differing agendas and applications, big data scientists, agile operators, DBA’s, and even external clients. IT needs to find a way to ensure shared, multi-tenant workload performance QoS in order to succeed with big data in the datacenter.”
In short, as the Hadoop market matures, the ecosystem of tools and applications evolves with it. And though that means the potential for business value is rising, so is the complexity of managing and deploying these environments. Security, data quality, and performance optimization are all top of mind, not to mention keeping costs within reason. And performance is exactly where Pepperdata comes in.
Most organizations that want to run clusters with multiple users or multiple workloads find that putting everything on a single cluster can degrade performance and cause jobs to fail. So the logical solution is to split up the clusters to guarantee performance. But with Pepperdata software, you can run multiple workloads, multiple users, and multiple applications on a single cluster, all while still maintaining performance and guaranteeing SLAs. This fully automated QoS enforcement means even large services like HBase can now run on the same cluster as other big data jobs without the risk of missing SLAs.
Here’s a snapshot from the Pepperdata dashboard showing two separate timelines of HDFS write bytes/sec, broken out by user, for the same customer cluster. There are two users: Red runs all MapReduce jobs, and Green is responsible for HBase. The top graph shows how unpredictable HBase performance was before the customer implemented Pepperdata. The bottom graph shows how Pepperdata enforced HBase protection for performance and reliability.