This blog post is part one of a two-part series authored by Pepperdata Field Engineer Jimmy Bates. Bates is a true veteran of the big data world. He speaks from a place of expertise and in-the-trenches experience. Part one lays out his view of the history of big data, the pan-industry challenges, and the perennial obstacles of working efficiently in a big data setting.
The Timeless Challenges of Big Data
I have worked in big data for over seven years. Big data is a truly dynamic field, with a new execution framework emerging every 12 to 18 months. Ours is a tech landscape in which we face evolving and shifting problems.
With this pace of change, you might think that it would be impossible to find commonalities between companies, industries, or points in time in the tech landscape. Surprisingly, though, many of the same problems recur. They are flavors of the challenges one always finds in emerging technologies.
I can identify six big problems and challenges common to all eras and all forms of big data. Together, they create the common big data feeling that one is in a world of constant overcorrection, always flying blind. In part two of this blog series I will talk about how Pepperdata can help. But first, the big six:
Scarcity of Skills
This is a perennial problem as we shift from one execution engine to the next. As the new becomes standard, there is always the new new. When you add in the shift to distributed storage and compute environments, you get added challenges; anticipating the needs of scale is always hard. Skills in development, skills in administration, skills in onboarding, skills in architecture analysis, skills for scaling… Finding them, hiring them, and keeping them is a constant challenge.
Scarcity of Resources
Scarcity of resources is always a problem, and it has many wrinkles. I have never been at an enterprise that had all the resources it needed. This is especially true with big data systems. We have seen the drive from purpose-built platforms to general commodity-based architectures. This has made hardware easier to incorporate, but it has had the side effect of making resource allocation more challenging.
One of the ways many architectures scale in these big data environments is by assuming that you have the freedom to add more resources whenever you want. I have never been anywhere where that was true. When you only have so much, you need to understand how it is used and make sure it is allocated where it makes the most positive impact at every hour of every day.
As you achieve production success, this problem quickly grows. The longer your success lasts, the more success you support and the more legacy you start to carry. Add in waves of hardware changes and the continual push to use new software engines, and the path to mastering proper allocation of your limited resources starts to feel like a severe case of tunnel vision.
Rate of Innovation
The drive to innovate is on an explosive upward curve. Every 12 to 18 months, it feels like I am learning a new engine for big data that will purportedly make all my old p