Sergey Galushko - Fotolia

Feature

Data hub vs. data lake: Deciphering the differences

Data lakes and data hubs are approaches to data management that are typically opposed. Here are the main differences between these two storage options.

Joseph M. Carew

By

Joseph M. Carew

Published: 29 Apr 2020

Companies have realized that the more data they gather, the better they can understand their customers and users. And the way a company stores its data can allow for a more balanced and intelligent view of its operations.

Two storage options are data lakes and data hubs. A data lake stores raw data similar to a regular lake, while a data hub is composed of a core storage system at its center with data in spokes reaching out to different areas.

There has been an ongoing debate on data hub vs. data lake and which is the best way to approach data gathering and storage. Though these are both common terms, differentiating between the two can still be a challenge.

Defining a data lake

A data lake is a centralized option in which all forms of data can be stored in a variety of ways. There is no need to translate data to a singular form, as a data lake can hold a vast amount of raw data in its original format.

Data lakes were created by companies because they understood the value of their data, said Hossein Rahnama, MIT machine intelligence professor and founder and CEO of Flybits. Bringing all that data together allows companies to better predict the needs of their customers and the needs of their business.

A data lake acts as a repository for data from all different parts of an organization. This makes data storage easier than other storage solutions but can become a problem when it comes to drawing that data back out. In order to retrieve desired data from a data lake, it must be queried, and data lake users may struggle with accessibility. Highly technical skills are often required to find relevant information and draw conclusions from that data.

"Companies that are going to be successful leveraging their data lake are the ones that are also building a creative and interactive layer on top of that data lake so non-IT experts can also leverage data assets to build new capabilities," Rahnama said.

Data lakes are often associated with a Hadoop framework; however, many vendors now support data lake architectures, including Amazon, Cloudera and Microsoft. Many even offer the option to deploy data lakes in the cloud.

Key differences between data hub and data lake — A few key differences between data hubs and data lakes

Establishing a data hub

A data hub can be thought of as a hub-and-spoke approach to storing and managing data. Data is physically moved and reindexed into a new system. This provides more structure to the data and permits diverse business users to access information that they need more rapidly than in a data lake.

Data hubs are usually created as a joint effort between complementary businesses, Rahnama said. It could be between a telecom operator, a bank and a supermarket, and they will all come together to share insights and elements of data.

Each spoke of this wheel would have access to some or all of the collective data gathered, depending on what they were looking to gain from it.

"The telecom operator may have a data cloud [storing] telecom information, the financial organization may have another cloud owning transaction data and the supermarket may have another data set," Rahnama said. "Now, these organizations have two options to create a data alliance or a data hub; they may agree to host their data in a centralized repository that can be accessible by all three of them."

This brings up concerns about privacy, as information collected by a bank could find its way to a completely different company. To ease these worries, it is critical for companies using data hubs to ask for user consent to sharing their data.

Similar to data lakes, data hubs were originally built on a Hadoop framework, but there are now other popular vendors, including MarkLogic and Google.

Data hub vs. data lake

Creating a data hub does not mean that data lake architecture is unavailable, however.

"A data hub, at the same time, may or may not use a data lake architecture," Rahnama said. "I can use a data lake with different stakeholders to participate in. Or I can completely decentralize it and leverage something like a blockchain or edge of the cloud or other decentralized mechanism to still form the alliance but in a decentralized way."

Giving numerous businesses access to a communal data lake would, for example, combine both a data lake and a data hub in one solution. This would increase the amount of participating companies but would do nothing to mitigate the accessibility of data lakes.

Data hub and data lake use cases

Both models are strong contenders to reduce data silos, as they are built to be accessible across business divisions' access to the same data.

The multipronged approach of a data hub is popular for use cases that require multiple interpretations to the same data. For example, analyzing similar data for both marketing and financial analytics. This makes data hubs popular for enterprises that analyze various types of data to perform tasks, such as fraud detection and customer service.

Because data lakes are built to store data until it's necessary, they tend to be more popular among enterprise with a less urgent need for data. Data lakes are popular for storing IoT data and archival data. Active archive data stored in a data lake can be used by data scientists for research across industries, including health sciences.

The debate between data lakes vs. data hubs isn't straightforward. With both filling different needs and having a combination as a possibility, the right data management approach boils down to company needs.

Dig Deeper on Database management

Search Business Analytics

Domo acquired for $400M, sells to Progress Software
Amid slow growth and a sliding stock price, the analytics provider is the latest specialist to join forces with a larger ...
Your AI isn't failing -- your metrics are
AI doesn't know what matters to your organization. It only knows what the business measures -- and weak metrics become problems ...
Will AI replace data analysts: A year and a half later
Organizations should foster and fund -- not terminate -- entry-level positions in favor of AI. Junior talent is key to the ...

Search Enterprise AI

The AI metrics trap: Measuring speed but missing value
Rising token counts and pull request throughput look impressive, but they miss where AI creates real value: preventing bad ...
Virtual power plants could help solve the AI energy problem
Virtual power plants can help enterprises and data centers overcome power limitations by unifying energy resources to stabilize ...
When orbital AI data centers become business reality
Shrinking launch costs, hyperscaler validation, serviceable hardware, manageable insurance risks and earthly energy constraints ...

Search Oracle

Click-to-launch tools pull apps through Oracle Cloud Infrastructure marketplace
Oracle has made it easier for customers to choose and launch third-party software onto its cloud. Now, the question is whether ...
Willis develops app to put a personal touch back in voluntary benefits
Part two of a two-part article: Willis uses PeopleSoft 9.1 to bring back the personal feel to automated insurance selection for ...
Willis develops app for real-time voluntary benefit selection
Part one of a two-part article: Willis uses PeopleSoft 9.1 to create real-time automated insurance selection for voluntary ...

Search SAP

SAP Sapphire 2026 news, trends and analysis
Here are the newest developments from SAP Sapphire in Orlando, Fla., with the enterprise software vendor's 2026 announcements and...
Compare SAP greenfield vs. brownfield approach for S/4HANA
Here's an explanation of the key differences between SAP greenfield vs. brownfield, what a third, hybrid approach can do for an S...
At TechEd, SAP continues to lay down the AI data foundation
New tools to speed up agentic AI development, open SAP platforms and provide access to data products were also touted as helping ...

Close