Editor's note: Bob Muglia, who was the CEO of Snowflake Computing, has left the company, it said in a May 1 press release. He has been replaced by Frank Slootman, who is now Snowflake's chairman and CEO.
Many enterprise users are looking to the data warehouse as their prime strategy for cloud migration planning. Indeed, upstart data-warehouse-as-a-service vendor Snowflake Computing has made progress in recent years against more highly touted open source Hadoop and Spark alternatives. The cloud data warehouse vendor is on the rise, with a new partnership with autonomous analytics vendor Anodot and its acquisition of query vendor Numeracy. Now-former Snowflake CEO Bob Muglia talked recently with SearchDataManagement regarding his views on the unfolding evolution of the cloud data warehouse and cloud analytics.
Before coming to Snowflake in 2014, Muglia was the longtime chief of Microsoft's server operating system operations in the era before cloud computing. In this Q&A, conducted before he left the company, Muglia touched on Snowflake's origins, how the vendor embraced the term cloud data warehouse, using AWS and S3 storage for data, and the importance of ethics in data handling as corporate data moves to the cloud.
This transcript has been edited for length and clarity.
What was the essential direction, or technology spark, when Snowflake was founded?
Bob Muglia: Snowflake's founders included Benoit Dageville and Thierry Cruanes, who had come out of Oracle. They had begun thinking about how they could reinvent the data warehouse and the relational database. They were people who could actually build a relational database, and they were looking at whether it was possible to make the physics work in the cloud on Amazon AWS.
They spent a lot of time looking at the performance throughput that they could get out of CPUs ... and how fast they could move data to an S3 store. When they did the measurements, they realized that with the cloud and with technologies like Blob storage, S3, virtual computing on demand and 10 gigabit Ethernet, it was possible. That was the genesis.
The thing about the data warehouse, it turns out, is it's not about being I/O bound, it's about compute power. So the real key was to do compute and get updates over the network.
At the time, data warehouses had successes, but people didn't totally love them, right?
Muglia: Well, it's hard to find anyone who has built a solution and had it meet their needs. There is a huge unsatisfied demand to work with and analyze data and the existing solutions don't do the job.
Over four years ago, we embraced the term data warehouse -- but we thought a lot about it because it was a quaint, retro-like term that went back 25-plus years and people did not have a particularly positive view of it based on the results they achieved with it.
But what we were building was, in fact, a cloud data warehouse.
Hadoop was getting a lot of attention in Snowflake's early days as an alternative to the data warehouse, although there wasn't much discussion of cloud at that time. What was it like when you started at Snowflake and walked into a Hadoop-intensive big data era?
Muglia: Yes, when I joined Snowflake, [Hadoop] was the rage. The generally held view was that it was going to be the future. And yet, I think it was obvious to a number of us that it had significant challenges. People were trying to make it work but it was apparent that it was a very difficult system to use.
It was very much a toolkit. People had to put things together, you needed to write code, you needed specialized people -- there were so many challenges that were preventing people from making progress. It became apparent that you could build a relational database that allows you to perform a simple SQL query to do that job for you and the issues of scale and concurrency would be gone.
Of course, Amazon came up with Redshift. It too is much simpler to operate. How does Snowflake differentiate itself from Redshift?
Muglia: Redshift is a traditional shared-nothing SQL [cloud] data warehouse. It therefore has the same scalability issues that [similar] existing on-premises systems have. It is simply Amazon's offering of what is effectively an on-premises database offered in the cloud.
The difference between Redshift and Snowflake is architecture. Our architecture we call multi-cluster, shared data. The real key to it is that you only have to make one copy of the data. And we can put as many compute clusters against that data as you might need to do the job.
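The multi-cluster, shared data idea Muglia describes can be illustrated with a toy Python sketch. It is not Snowflake's implementation: the `SharedStore` and `ComputeCluster` classes, the sample rows and the `total` method are all hypothetical names invented for this illustration. The only point it makes is that several independent compute clusters can query one copy of the data without each holding a duplicate.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedStore:
    """Hypothetical single copy of the data, standing in for object storage such as S3."""
    rows: tuple  # immutable, so every cluster reads the same copy


class ComputeCluster:
    """Hypothetical compute cluster: holds a reference to the store, not a copy of it."""

    def __init__(self, name: str, store: SharedStore):
        self.name = name
        self.store = store

    def total(self, region: str) -> int:
        # Scan the shared rows and aggregate using this cluster's own compute.
        return sum(amount for r, amount in self.store.rows if r == region)


# One copy of the data (made-up sample rows).
store = SharedStore(rows=(("east", 10), ("west", 5), ("east", 7), ("west", 1)))

# Add as many clusters as the workload needs; none of them duplicates the data.
clusters = [ComputeCluster(f"cluster-{i}", store) for i in range(3)]

# Each cluster runs the same aggregate concurrently against the shared store.
with ThreadPoolExecutor() as pool:
    east_totals = list(pool.map(lambda c: c.total("east"), clusters))

print(east_totals)
```

In a shared-nothing design, by contrast, each node would own its own slice of `rows`, so adding compute would generally mean redistributing data. Here, adding a cluster is just one more entry in the `clusters` list.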
How do you view cloud and data overall -- that is, in terms of where it is all going?
Muglia: I think we're really in a major transition in the industry right now. You see that with the daily articles on concerns about privacy, data breaches and, generally speaking, the impact that information is having on society. We see a transition from an era that has been dominated by IT systems to a world where information is becoming the primary source of production and future economic growth.
You see that everywhere. You see that with the transition of the auto industry, and the changes that things like Uber, Lyft and autonomous driving [are bringing]. The fuel of all of this is really the data that's behind it and the ability for people to work with data in an effective way, and to do so both responsibly and ethically.
We've seen examples in the last couple of years where [companies] have not done that. The Facebook scandal of the week is a good example. It's a reminder of how important it is to be responsible with information, and it's something that we take very seriously.
Meanwhile, we're going to see an entire industry being built to support every organization as they move to become data-driven. Today, access to [top] data science can be found in Silicon Valley and in hedge funds in New York -- but the industry will transform over the next few years to address every organization.