Around Halloween, most of us are up for a good scare. The rest of the year, we’re probably not looking for a fright. Especially not in the form of unexpected cloud costs. Unfortunately, data architects are experiencing a scary event every month when their cloud bills show up higher than anticipated and without explanation. In this blog post, we’ll paint a picture of what the big data architect is responsible for, how this nightmare plays out, and how they can optimize their way out of this frightening scenario.
Responsibility: Platform Design
The architect is responsible for the platform design. But what is often left unsaid is that they are also responsible for the platform’s success. Data platforms have always been of critical value to technology companies, but the past twenty years have seen data become more valued in nearly every line of business. The architect is tasked with designing and deploying data platforms that derive the maximum amount of value from data. If we dig a bit deeper into that statement, architects are responsible for storage, security, access, and ease of use along with proving that each of those things is being delivered. All while managing cloud costs, of course.
The Nightmare: Sprawl, Complexity, Unanswered Questions, and Unruly Cloud Costs
The promise of the cloud sold us all on the idea that you would only pay for what you used, thus reducing cloud costs. But what if you look at the list of options and determine that you need a little bit of everything? The reality is that more often than not, enterprises are struggling to stick to a predetermined budget. Cloud computing offers solutions to every challenge you can think of in data management—and they all come at a cost.
The curse of the cloud is options. Each line of business, each dev team, ITOps, other architects, etc., will all have a list of technologies that they feel are “must-have” parts of the platform. Without constraints, you can end up with a technology stack that blows the budget, is hard to maintain, and is not flexible enough to serve the dynamic nature of today’s business landscape. On top of that, the components all need to be connected to one another, secured, and monitored while remaining easy enough to use that end users can actually work with the data.
The failure of a bad design is felt via a list of unanswered questions that grows over time until things are fixed. Question number one is “why is my bill so high?” That’s just the beginning.
You get the idea. While some of these questions are inevitable, good design also allows you to answer some of them.
The Way Out: Optimize Everything
Optimize everything. Done. If only it were that easy. In most cases, Pepperdata is engaging with clients who have already hit a bunch of hurdles in their plan to migrate to the cloud or to deploy components of their “cloud first” initiative. Optimization is something that people are forced into when they should be thinking of it from the start by designing in a box.
Design in a Box
A box is a physical set of constraints. You can’t put in more than will fit. Period. The cloud doesn’t present you with any kind of box, and that’s where the trouble begins. Start out with a set of constraints like which applications to move first or which ones are particularly well suited to a set of features in a given cloud technology offering. Or perhaps you start with the data and move the newest, most dynamically accessed data to where the ability to change the analytics hardware mix is something you couldn’t achieve in the data center. Starting without constraints, or only budgetary constraints, leaves too much room for sprawl and complexity. Box it in and grow as you learn some lessons.
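To make the box idea concrete, here is a minimal Python sketch of constraints written down as data and checked before anything new is added to the platform. Everything in it is hypothetical and for illustration only: the `Box` class, the service names, and the dollar figures are assumptions, not part of any cloud provider's API or of Pepperdata's products.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    """A hypothetical set of platform constraints (illustrative only)."""
    allowed_services: set
    max_components: int
    monthly_budget_usd: float
    # component name -> (service, estimated monthly cost in USD)
    components: dict = field(default_factory=dict)

    def try_add(self, name, service, est_monthly_usd):
        """Return the list of violated constraints.

        The component is added only when the list is empty.
        """
        problems = []
        if service not in self.allowed_services:
            problems.append(f"service '{service}' is outside the box")
        if len(self.components) >= self.max_components:
            problems.append("component limit reached")
        committed = sum(cost for _, cost in self.components.values())
        if committed + est_monthly_usd > self.monthly_budget_usd:
            problems.append("estimated spend exceeds budget")
        if not problems:
            self.components[name] = (service, est_monthly_usd)
        return problems


# Assumed starting box: two service types, three components, $5,000 a month.
box = Box({"object_storage", "managed_spark"}, 3, 5000.0)

fits = box.try_add("raw-data-lake", "object_storage", 1200.0)
wrong_service = box.try_add("graph-db", "graph_database", 900.0)
too_expensive = box.try_add("nightly-etl", "managed_spark", 4500.0)
```

The point of the sketch is that a request is rejected for a stated reason, not only because the money ran out. Growing the box later is a deliberate change to `allowed_services` or the limits rather than something that happens by accident.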
Measure Everything/Trust but Verify
Observability is all the rage these days. That should tell you something. Monitoring and deriving insights from monitored components is nothing new. The reason observability is now Observability with a capital “O” is the complexity introduced by the cloud. As an architect, you will spend your planning phase meeting with all manner of team leads, stakeholders, and executive types, each with their own projections. They’ll tell you what they need, how much they think it will cost, the resource footprint required, etc. Take it all with a grain of salt. Each use case must be measurable and attributable to a source if it is to be contained and accounted for properly. Remember that long list of unanswered questions we talked about? Observability is how you answer these questions. Not with “performance metrics” alone. Those are only the foundation. You need answers.
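As a rough illustration of "measured and attributed to a source," the Python sketch below rolls billing line items up by an owner tag and compares the result with what each team projected. The record layout, the `team` tag, the 10% tolerance, and all of the numbers are assumptions made up for this example; a real billing export will look different.

```python
from collections import defaultdict


def attribute_costs(line_items, tag_key="team"):
    """Sum cost per tag value; untagged spend lands in 'unattributed'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unattributed")
        totals[owner] += item["cost_usd"]
    return dict(totals)


def over_projection(actuals, projections, tolerance=0.10):
    """Flag owners with no projection, or spend above projection + tolerance."""
    flagged = {}
    for owner, actual in actuals.items():
        projected = projections.get(owner)
        if projected is None or actual > projected * (1 + tolerance):
            flagged[owner] = (actual, projected)
    return flagged


# Hypothetical monthly line items and the projections gathered during planning.
line_items = [
    {"service": "compute", "cost_usd": 1200.0, "tags": {"team": "analytics"}},
    {"service": "storage", "cost_usd": 300.0, "tags": {"team": "analytics"}},
    {"service": "compute", "cost_usd": 800.0, "tags": {"team": "ml"}},
    {"service": "compute", "cost_usd": 450.0, "tags": {}},
]
projections = {"analytics": 1000.0, "ml": 900.0}

actuals = attribute_costs(line_items)
flagged = over_projection(actuals, projections)
```

Here the analytics team is flagged for running well past its projection, and so is the untagged spend, which nobody projected at all. That second bucket is usually where "why is my bill so high?" starts.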
Automation can sound pretty aspirational. Thoughts of AI and self-healing show up pretty high in the Google search results. Take those with a grain of salt, too. You want real automation where it counts. Pro tip: There is no tool that automatically rewrites bad code or queries on the fly with no intervention. Pepperdata Capacity Optimizer comes close on the resource side, but no, that tool doesn’t exist. The automation you deploy as part of your platform should answer “yes” to the following:
- Can I get the performance data and insights I need without developers having to do anything (because they won’t)?
- Can this fancy automation provide the data I need for that report my boss asks for every month?
- Can it fix things or is it just a dashboard?
- Can my users access it directly rather than calling the Ops team?
Success: A Combination of Things
No data platform is perfect or without the occasional challenge. In the end, you need it to securely support the business and users. You also need to be able to keep up with change and continuously prove those attributes over time while keeping cloud costs in line. Simple 🙂
Learn more about reducing cloud costs by achieving the best performance on AWS: download our HiBench Benchmark Report today.