This blog post is part one of a two-part series authored by Pepperdata Field Engineer Jimmy Bates. Bates is a true veteran of the big data world who speaks from expertise and in-the-trenches experience. Part one lays out his view of the history of big data, the challenges shared across industries, and the perennial obstacles to working efficiently in a big data setting.
The Timeless Challenges of Big Data
I have worked in big data for over seven years. Big data is a truly dynamic field, with a new execution framework emerging every 12 to 18 months. Ours is a tech landscape in which we face evolving and shifting problems.
With this pace of change, you might think it would be impossible to find commonalities between companies, industries, or points in time. Surprisingly, however, many of the same problems recur. They are variations on the challenges one always finds in emerging technologies.
I can identify six big problems common to all eras and all forms of big data. Together, they create the familiar big data feeling of living in a world of constant overcorrection, always flying blind. In part two of this series I will talk about how Pepperdata can help. But first, the big six:
Scarcity of Skills
This problem resurfaces every time we shift from one execution engine to the next. As the new becomes standard, there is always the new new. The shift to distributed storage and compute environments brings challenges of its own, because anticipating the needs of scale is always hard. Skills in development, administration, onboarding, architecture analysis, and scaling… Finding people with these skills, hiring them, and keeping them is a constant challenge.
Scarcity of Resources
A scarcity of resources is always a problem, and it has many wrinkles. I have never worked at an enterprise that had all the resources it needed, and this is especially true of big data systems. We have seen the shift from purpose-built platforms to general commodity-based architectures. This has made hardware easier to incorporate, but it has also made resource allocation more challenging.
Many architectures in these big data environments scale by assuming that you are free to add more resources whenever you want. I have never been anywhere where that was true. When you only have so much, you need to understand how it is used and make sure it is allocated where it has the most positive impact, every hour of every day.
As you achieve production success, this problem quickly grows. The longer your success lasts, the more workloads you support and the more legacy you carry. Add in periodic waves of hardware changes and the continual push to adopt new software engines, and mastering the allocation of your limited resources starts to feel like a severe case of tunnel vision.
Rate of Innovation
The drive to innovate is on an explosive upward curve. Every 12 to 18 months, it feels like I am learning a new big data engine that will purportedly make all my old problems go away if I just re-code all my workflows and believe really, really hard. I am not knocking the new solutions, which can be fantastic. The problem is usually that I still have all my old workflows, and they don't run so well on the new engine. So now we run both. This pattern repeats.
Add to this the changes in hardware. Over the years, the pendulum has swung from storage, to network, to memory, to compute. The software follows shortly after the hardware to take advantage of what is new, and the pendulum swings once again. Add the drive to make all of this more open to fluid, real-time changes, and you get software-based architectures where everything is distributed, nothing is rigid, and it is a modern marvel how well it all works… until it doesn't.
The result is plenty of praise for solving unsolvable problems, but we are left with a mess to troubleshoot and manage. Hindsight is 20/20, but it only helps if the problem in front of you is identical to one behind you.
Empowerment of Open Source
I love the results of the open source revolution. For many years, new initiatives were constrained by what IT would and would not support. With the arrival and acceptance of open source, the balance of power has shifted toward a more even split between Development and IT. This has helped keep the spirit of innovation burning bright. It has helped keep IT more open. It has helped keep development more agile.
Unfortunately, it also exacerbates the problems mentioned above. In most cases today, it seems that the old adage "move fast, break fast, heal fast" is how we solve them. The faster we can get through the thousands of ways that don't work, the faster we can get to the way that does.
The Continued Growth of the Cloud
Now enter the cloud. Every challenge discussed so far is both solved and exacerbated by the cloud; it just depends on where you are in your cloud journey. To make the correct choice, all you need to do is understand all of your issues and strengths, plus the issues and strengths of each cloud service (which seem to differ for each cloud offering), and then your choice becomes clear. Easy, right?!
All kidding aside, the cloud will solve some of your problems while magnifying others. Understanding your resources, jobs, successes, failures, and costs will make your cloud venture smoother. But the key question is: what can be done to give you visibility in these areas? Most big data users simply don’t know.
Scarcity of Tools
The tools, software, and methodologies all feel outdated. Then we get an update, discover something new, and all is well, until we break it all again on the next project. I spent some time in the military, and this brings to mind the old adage: "Every general is fighting the last war."
In Summary: Flying Blind Towards Success
Finding the answers to the six problems listed above is not easy. When I brought in vendors to show me how they could help, they always made it about one thing: data flow, or data governance, or DevOps, or storage, or something else. In reality, it was never just one problem. What I needed was insight into whatever was causing my current fire.
I wish I could say that managing and resolving these problems is just a matter of using the correct software, tool, or methodology. In reality, it is about accepting the new norm and solving them one step at a time.
But this is much easier said than done. That is why I say production success is like getting the plane off the ground, plotting a new course to your next success, and then discovering that storm clouds are rolling in and you are losing all visibility. This has been my life for seven years in big data, and I have the scars to prove it.
Luckily, I have found the GPS of Big Data: Pepperdata.