The data warehouse came under criticism in recent years, as a surplus of big unstructured data led some businesses to rethink analytics. Enthusiasm for Hadoop, for a time, put data warehouses on the defensive.
But the data warehouse may be making a comeback -- sometimes in the form of a cloud data warehouse. Count a veteran consultant among those who see a rebirth of interest in the data warehouse.
"Today, you can see people trying to reestablish the idea of the data warehouse in the organization," said William McKnight, president of McKnight Consulting Group in Plano, Texas. He noted that plans for new data warehouse approaches come against a backdrop of changes in Hadoop.
In the form of the Hadoop data lake, the Hadoop distributed data processing platform took attention away from established, vertically scalable data warehouses. Now, Hadoop itself is meeting new competition in the form of cloud data warehouses.
Migration to the cloud and interest in cloud object storage, rather than Hadoop Distributed File System storage, are partial drivers of new data warehouse interest, McKnight said -- along with the fact that existing data warehouses may be showing their age.
"The data warehouse is probably now the place where data infrastructure needs remediation -- more than anywhere else," he said.
Data warehouse lineup
Snowflake -- which takes its name from the snowflake schema beloved by data warehouse architects -- provides a columnar SQL data warehouse as a service and is led by CEO Bob Muglia, once head of Microsoft's servers and tools businesses.
Yellowbrick -- just out of stealth -- is a maker of a flash memory-based data warehouse appliance and is led by CEO Neil Carson, formerly CEO at flash storage specialist Fusion.io. Yellowbrick's first targets are on-premises, hybrid and private cloud implementations -- not yet including public cloud.
Today, these and other systems vie with the data warehouse efforts of cloud providers. The cloud category is led by Amazon, which became an immediate force in the cloud data warehouse sphere with its Amazon Redshift entry in 2013.
Also ready to help data warehouses climb to the cloud are IBM with Db2 on Cloud, Microsoft with Azure SQL Data Warehouse, Oracle with Autonomous Data Warehouse, Teradata with Vantage and others.
That's not to mention Hadoop players, like MapR, as well as Hortonworks and Cloudera -- now merged as Cloudera -- that target data warehouse applications, among others, for their open source-oriented platforms.
Hadoop's 10-year run of popularity gained it critics among data warehouse ranks -- that is, at least, if Matt Glickman is a guide.
"Hadoop will go down in history as one of the biggest head fakes in technology," said Glickman, vice president of customer and product strategy at Snowflake.
While he acknowledged that the types of data lakes that have grown up around Hadoop can be useful, he maintained that Hadoop still lags in terms of high-concurrency querying, an area on which Snowflake has tried to focus.
Data warehouse workout
For Carlin Eng, data engineer at athletic performance app vendor Strava, the Snowflake cloud data warehouse service has proved a useful platform.
By looking at analytics created in the data warehouse, San Francisco-based Strava can tailor new features for users who employ a GPS tracker to help create their own workout plans, analyze their own exercise activity and participate in the Strava social network.
Formed in 2009, Strava was "born on the cloud," Eng said. Its users' mobile device data goes first to the cloud, so a cloud data warehouse makes perfect sense, he emphasized. Culling that data for trends is important, as is understanding what people like, so developers can prioritize efforts.
"There are a lot of potential products we can build, but we have to know which are the first ones to tackle," Eng said.
Making efficient use of a lean technical team's time was also a key reason for going with Snowflake, he continued.
"There are a lot of open source tools for big data analytics, but administering them can be an issue," he said.
He included innovative Hadoop platforms among the open source tools he has considered, but he encountered some drawbacks.
"Hadoop was a really interesting technology that allowed a lot of things that were impossible before, but it is unlikely that a team our size would want to administer Hadoop clusters," Eng said.
Eng declined to identify other data warehouses Strava has used, but he said support for concurrent queries was one of the main reasons that Strava went with the Snowflake platform.
Clearly, however, reduced infrastructure administration via a cloud data warehouse was also a powerful driver of Strava's move to Snowflake.
"We don't want to have something that requires a lot of care and feeding," Eng said.
Even for large shops, as more and more data arrives, the administrative tasks associated with expanding data warehouses are becoming more onerous. In turn, according to analyst Wayne Eckerson, interest in cloud data warehouse management services is growing.
"You can get rid of infrastructure and IT support, you don't have to spend months tuning deployments and you can scale it up and down," said Eckerson, founder and principal consultant at Eckerson Group in Hingham, Mass.
Moreover, "you don't have to buy for peak capacity," he added.
These and other reasons suggest that, despite rumors of its demise, the data warehouse is staging a comeback -- often in the form of a cloud service.