Microsoft, Databricks simplify synchronizing, sharing data
A new integration designed by the tech giant and data platform vendor eliminates the need for users to move or copy data to join it across platforms for better-informed analysis.
Microsoft and Databricks on Tuesday expanded their integration, launching a new feature that unifies data across Azure Databricks and Microsoft Fabric for joint users of the platforms.
First launched in 2018, Azure Databricks is a Microsoft-optimized version of Databricks' platform for data management that is available to Microsoft Azure customers regardless of whether they are also Databricks users. Microsoft Fabric, meanwhile, is a unified platform for data management and analytics released in 2023 that includes a data lakehouse architecture.
With the launch of Mirroring for Azure Databricks Unity Catalog in Microsoft Fabric, Microsoft and Databricks are enabling joint users of Azure Databricks and Fabric to access Azure Databricks data tables from Fabric without forcing them to move data or make duplicate copies.
The feature mirrors data housed in Azure Databricks to OneLake -- providing an exact replica in real time -- and is built on the Databricks Unity Catalog, which governs all data and AI assets. By mirroring data, it gives joint users access to their data across multiple clouds from OneLake, which is a data lake in Fabric.
Previously, joint users had to manually move or copy Azure Databricks tables to consume that data in Fabric tools such as Power BI.
Given that Mirroring for Azure Databricks Unity Catalog in Microsoft Fabric eliminates data egress or the need to make copies of data -- both of which can be complex and expensive -- the integration is a significant addition for joint Azure Databricks and Fabric users, according to Sanjeev Mohan, founder and principal of analyst firm SanjMo.
"Now, analysts in Microsoft Fabric can see the latest schema in Unity Catalog and start querying it -- their credentials are automatically handled by Unity Catalog, so it makes for a seamless experience," he said.
Unifying data
While data management platforms such as Microsoft Azure and Databricks -- as well as Amazon Redshift, Google BigQuery, Oracle Autonomous Data Warehouse, Snowflake and many others -- provide comprehensive data management capabilities, enterprises often not only store data in multiple cloud-based platforms, but also keep some data on-premises.
The reasons are varied, but include avoiding vendor lock-in, mitigating risk in the event that one provider suffers an outage, differences between platforms that make them more optimized for certain capabilities such as AI or scale than others, inherited infrastructures from mergers and acquisitions, industry-specific regulations that are better addressed by certain platforms than others, and security.
However, because they store data in different platforms, enterprises often need to integrate that data to develop analytics and AI tools such as reports, dashboards, chatbots and agents. Such integration is complex, particularly when platforms store data tables in different formats, and can be costly, given the need to move data or make copies.
Mirroring for Azure Databricks Unity Catalog in Microsoft Fabric aims to eliminate that complexity and cost for Azure Databricks and Fabric users by enabling them to work from a single copy of data.
The number of joint users of Azure Databricks and Fabric is growing, according to Dipti Borkar, vice president and general manager of Microsoft OneLake and Fabric ISVs. That growth provided the impetus for developing the new integration.
"Given ... the high number of customers who are adopting both Azure Databricks and Fabric in their data estate, extending Mirroring to Azure Databricks was an obvious next step," she said. "This extension [provides] customers with the flexibility they need to more efficiently leverage their data."
Using the Fabric portal, users can synchronize Azure Databricks Unity Catalog data with OneLake, which is automatically included in Fabric, with a few clicks. Once data is synchronized across Fabric and Azure Databricks, as information gets updated or tables added, removed or renamed in one platform, the data stays in sync across both so that it remains consistent and current.
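The behavior described above, where added, removed and renamed tables show up on the other side without any data being copied, can be pictured as a reconciliation of table references. The following is only a minimal sketch under stated assumptions, not how Fabric or Unity Catalog implement mirroring: the `sync_mirror` function, the `SyncReport` class, the table IDs and the table names are all hypothetical, and the sketch treats the source catalog as authoritative in one direction only.

```python
from dataclasses import dataclass, field


@dataclass
class SyncReport:
    """Hypothetical summary of what one reconciliation pass changed."""
    added: list = field(default_factory=list)
    removed: list = field(default_factory=list)
    renamed: list = field(default_factory=list)


def sync_mirror(source: dict, mirror: dict) -> SyncReport:
    """Make `mirror` reference the same tables as `source`, in place.

    Both dicts map a stable table ID to the table's current name.
    Only these references change; no table data is copied.
    (Illustrative assumption, not a real Fabric or Databricks API.)
    """
    report = SyncReport()
    # Pick up tables that are new or renamed in the source catalog.
    for table_id, name in source.items():
        if table_id not in mirror:
            report.added.append(name)
        elif mirror[table_id] != name:
            report.renamed.append((mirror[table_id], name))
        mirror[table_id] = name
    # Drop references to tables that no longer exist in the source.
    for table_id in list(mirror):
        if table_id not in source:
            report.removed.append(mirror.pop(table_id))
    return report


# Made-up example state: one rename, one new table, one dropped table.
source = {"t1": "orders", "t2": "customers_v2", "t4": "returns"}
mirror = {"t1": "orders", "t2": "customers", "t3": "legacy_events"}
report = sync_mirror(source, mirror)
```

After the call, `mirror` holds the same references as `source`, and `report` lists the single table added, removed and renamed. Keying on a stable ID rather than the table name is what lets the sketch tell a rename apart from a drop followed by an add.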
Because the new integration eliminates the labor previously required to use Azure Databricks and Microsoft Fabric data together -- such as replicating data, building extract, transform and load pipelines, or manually synchronizing between platforms -- it will provide immediate value to joint users, according to Donald Farmer, founder and principal of TreeHive Strategy.
"[Manual] processes are too slow for the current demands of near-instant insight," he said. "Mirroring provides a real-time view into Azure Databricks data from Microsoft Fabric without copying or moving data -- that's the key enablement here."
Among the primary benefits of the integration are unified governance and security through Unity Catalog, Farmer continued.
"What stands out is needing one copy of your data and the unified governance and security," he said. "The ROI on both of these important features should be highly achievable even in the short term, so that's great news for [both Fabric and] Azure Databricks customers."
Next steps
Mirroring for Azure Databricks Unity Catalog in Microsoft Fabric is the latest in a series of integrations between Databricks and Microsoft, according to Borkar, who highlighted recent Azure Databricks integrations with Azure AI Foundry, Power Platform and Copilot Studio.
Looking ahead, Microsoft and Databricks could serve joint users of Fabric and Azure Databricks by adding deeper access controls to improve security when working across platforms, according to Mohan. In addition, Microsoft could add similar integrations with Databricks rival Snowflake.
"The next thing both the vendors should do is converge access governance so secure access to data can become universal," Mohan said. "And I hope to see Microsoft Fabric extend the same mirroring of the Polaris Catalog from Snowflake in the future."
Farmer, meanwhile, suggested that Microsoft and Databricks provide users with federated data tables -- tables stored in an external source that can be connected to and queried from remote platforms as though they were stored internally.
"There are many ways this technical partnership could go further, but if I look at how this specific integration and its advantages could be enhanced, my top request would be for federated tables," he said. "This enables seamless integration of disparate data sources [and] represents a natural evolution of cloud-based data platforms toward increased integration and interoperability."
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.