Chad and I founded Pepperdata to help every company benefit from the same kinds of massive data systems we’d built and used at places like Inktomi, Yahoo, and Bing. I managed Yahoo’s web search engineering team – the first commercial user of Hadoop (before it was even called Hadoop). And Chad’s sponsored search optimization and ranking team was the first in the world to increase revenue using Hadoop.

As early Hadoop adopters, we saw great promise in its scalable approach to data processing. For the first time, our teams were able to use huge datasets to directly solve business problems – without worrying about the plumbing needed to move and process all of the data. But at the same time, we struggled with Hadoop’s key limitation: the lack of a distributed cluster supervisor to control hardware use in real time. More than once, I saw an errant Hadoop job bringing down a customer-facing production service.

Data-driven from the beginning

When we started Pepperdata in 2012, we were passionate about helping companies use and have access to the same kinds of data systems we did at Yahoo and Microsoft. But we wanted to make sure we were solving a real problem for real business users.
For the first few months, we were on a full-time fact-finding mission; we set out to understand how both technical and non-technical business users – across a range of industries – use data. We heard from engineers, analysts, line of business owners, and ops teams and met with healthcare providers, real-time trading desks, banks, manufacturers, advertisers, game companies, and the world’s largest online and offline retailers. Once we had collected information from all of these sources, we synthesized feedback and formulated new ideas for months.

The result of all of this work is a vision and product designed and field-tested by the market – before the first code was ever written. As it turns out, we came full circle; after talking to hundreds of users, we saw that as companies start to rely on Hadoop, their needs follow patterns we had already seen at both Yahoo and Microsoft. Ultimately, the problem is that when these clusters are so critical to the business, they have to deliver on time, every time.

Where we are now

So what does Pepperdata mean? Think of it as the extra spice that can give your enterprise data the kick it needs. We let multiple teams run diverse applications on a single cluster in a predictable way – and with full visibility into how the cluster’s hardware resources are being used. We want to make deploying a new distributed application on a thousand-node cluster as seamless as installing a new app on your phone.
We’ve spent the last two years building that vision and putting the product in the hands of customers. But more on that to come…