What is Scalability in Cloud Computing

Our “Pepperdata Profiles” series shines a light on our talented individuals and explores employee experiences. This week, we chatted with Justin Ng, our resident data scientist. Having worked at Pepperdata for more than two years, Justin shared his thoughts on the future of data analytics, the cloud data management challenges that Pepperdata solves, and what people often get wrong about the data analytics industry.

Hey, Justin! How did you start with your career in data science?

Before I joined Pepperdata, I had around five years of experience in the industry. I used to be more of a software developer or software engineer type, but I was always interested in looking at data a lot. So I decided to have a career change of sorts. I basically went back to school to do my graduate studies in statistics, in the hope of getting into any sort of field in the data analytics industry, particularly in data science. After I finished my master’s degree in statistics, I went off and I started working with a focus on the data analytics industry, more than software engineering.

Interesting. And how did you come across Pepperdata?

A recruiter called me. I had worked for large corporations in Canada before, and what drew me in with Pepperdata was its appeal as a startup. It was something I was interested in experiencing. The other thing was the industry: I was particularly keen on looking at different sorts of data that I hadn’t seen before, the type of data that Pepperdata collects.

Really? What were the “new” types of data that you were hoping to come across?

Well, before Pepperdata, I had been working with large corporations, right? A lot of those were banks and, also, a telecom company. So the data I’d often see were transactions: how many people withdrew money, how much people were spending, stuff like that. Pepperdata, by comparison, tackles a different animal. Stuff like memory usage, resource usage, and job types: higher-frequency data that we can potentially use to help customers run their data-intensive workloads more effectively. I just thought that was a nice change.

Yeah, the data science world is very cool, very modern. But tell us, Justin: What do people often get wrong about it?

People always hear about the more “sexy” types of projects in the analytics industry. Things like artificial intelligence, or whatever new thing there is. But for most problems, I found that a lot more work goes into retrieving the data, preparing it, and trying to productionalize it than into actually building some really intelligent models to run on it.

So there’s a lot of, I guess you can say, mundane work, which takes up most of the time. It’s that type of work that most people wouldn’t consider very “fun” or interesting. But it’s definitely something that you have to deal with before you can do any sort of analysis for most problems.

In reality, data science is crunching enormous data sets and doing quite complex statistical work. So if you actually want to do anything, you have to do the boring work first.

Another thing about the analytics industry and sciences is that you don’t actually control the outcome. You’re not sure what you’re going to get before you actually do it. There’s always a little bit of uncertainty