What is Scalability in Cloud Computing

How does a CTO determine which data should go to hot cloud storage, and which to cold data storage? How can companies control their growing amount of data, and the escalating costs that come with it? And how can data storage methods benefit from observability solutions and tools?

These questions, and more, were addressed during our webinar, “What Does a CTO Do When a 60PB Hadoop Cluster Devours the IT Budget?” It featured Chuck Yarbrough, Senior Director of Product Marketing at Hitachi Vantara, and our very own Pepperdata Field Engineer Alex Pierce. Here is a look at some of what was discussed.

The Challenges of Data Storage Methods

For the better part of his career, Yarbrough has run into challenges with data storage methods. He recalls that, during his previous stint as a data warehouse manager in Silicon Valley, deciding how much data to make available to apps and developers was a constant back-and-forth.

“Business users always wanted more. Inevitably we’d say, ‘You could only have three years’ worth of data, and that’s it.’ And they’d be like, ‘Well, no, I need seven.’ And you argue about it, but the reality is there were limitations,” Yarbrough says.

Big data and the advent of Hadoop enabled the industry to go well beyond the limitations of prior architectures. However, this advancement brought another challenge: cost optimization.

“It [Hadoop and big data] enabled a mass change in the industry so that we could scale to areas that we hadn’t been able to before,” says Yarbrough. “But that leaves us here now, where we’re talking about lots of data, 60 petabytes of data, literally eating the budget. The cost began to get pretty big,” he adds.

Determining data temperature is now proving to be a challenge as well. Before, when data became “cold” (that is, infrequently accessed), companies could simply move it to offline storage because capacity was limited. Often, they based that decision on the age of the data.
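To make the older, age-based approach concrete, here is a minimal sketch in Python. It is an illustration only, not a method described in the webinar: the function name, the dataset names and dates, and the three-year cutoff (borrowed from the retention limit in Yarbrough's quote) are all assumptions.

```python
from datetime import date, timedelta

# Assumed cutoff for illustration: data older than roughly three years is "cold".
COLD_AFTER = timedelta(days=3 * 365)


def temperature_by_age(last_modified: date, today: date,
                       cold_after: timedelta = COLD_AFTER) -> str:
    """Label a dataset "hot" or "cold" using only its age."""
    return "cold" if today - last_modified > cold_after else "hot"


# Hypothetical datasets and dates, fixed so the result is reproducible.
today = date(2024, 1, 1)
datasets = {
    "sales_2019": date(2019, 6, 30),
    "sales_2023": date(2023, 6, 30),
}

labels = {name: temperature_by_age(modified, today)
          for name, modified in datasets.items()}
print(labels)
```

A rule like this is easy to apply, but age is only a proxy for how often data is actually accessed: an old dataset that is still queried every day would be labeled cold here.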