Snowflake broadens open-source embrace, ups Iceberg support
To make data more interoperable across systems and usable for AI, the vendor is participating in projects that address data quality, integration, governance and discovery.
Snowflake on Wednesday reaffirmed its commitment to Apache Iceberg, announcing pending support for the latest version of the open source table format.
In addition, Snowflake affirmed its embrace of open source technology, noting its recent work to develop pg_lake, an extension for the PostgreSQL database that simplifies integrating transactional and analytical data, as well as its participation in projects to develop the Apache Polaris data catalog, the OpenLineage standard and an open source standard for semantic modeling.
Snowflake's work with open source projects is geared toward making data interoperable across disparate systems so it can be integrated and used to inform AI and analytics applications.
Given that open-source capabilities can help organizations control the rising costs of data management and AI development, support for open source tools is important for vendors such as Snowflake, according to William McKnight, president of McKnight Consulting.
"Snowflake's involvement with open source capabilities is highly important for its customers because it directly tackles the rising costs of data fragmentation and unlocks true data autonomy," he said. "Without this commitment to open interoperability, customers face a heavy architectural tax resulting from sprawling data pipelines, disconnected governance layers and compounded security risks."
Stephen Catanzano, an analyst at Omdia, a division of Informa TechTarget, similarly noted the value of Snowflake's embrace of open source tools.
"Snowflake's work with open source capabilities is important because it provides [users] with flexibility and freedom to manage their data across diverse platforms and engines," he said. "This approach reduces vendor lock-in and empowers customers to leverage open standards while still benefiting from Snowflake's advanced capabilities. This is exactly what customers want."
Based in Bozeman, Mont., but with a campus in Menlo Park, Calif., Snowflake is a data platform vendor that provides data management and AI development capabilities. Recently, Snowflake introduced Project SnowWork, an AI-powered platform in research preview aimed at easily enabling users to automate processes.
Community embrace
Apache Iceberg is an open source format for storing large data sets, called tables, in data lakes and lakehouses. Because Iceberg is a table format, in which data is logically arranged in rows and columns, rather than a file format, it enables users to add metadata, semantics and other components of master data management that make data easier to manage and discover for AI and analytics.
Snowflake first unveiled limited support for Iceberg in 2022, with only some Snowflake core capabilities, such as governance and security, available for the open source table format. As a result, users were forced to choose between storing data in Snowflake, where it benefited from the vendor's full array of capabilities, and storing it in open tables, which offered flexibility but only limited Snowflake capabilities.
In April 2025, Snowflake added full support for Iceberg, extending to Iceberg tables the query performance, data sharing and governance capabilities that are automatically applied to data stored in Snowflake.
At the time, format V2, released in 2021, was the most current edition of Iceberg. Now, Snowflake plans to add support for V3 of the open source table format.
Format V3, which was released in June 2025, expands what Iceberg can handle to include support for semi-structured data, row-level change data capture, geospatial data and nanosecond-precision timestamps to make identifying the instant an event or transaction occurred more accurate.
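To illustrate why nanosecond precision matters, the following is a minimal sketch in plain Python. It does not use Iceberg or any Snowflake API. The event timestamps and the `to_micros` helper are hypothetical, chosen only to show that two events less than a microsecond apart become indistinguishable once their timestamps are truncated to microseconds.

```python
# Hypothetical event times, expressed as integer nanoseconds since the Unix epoch.
# The two events are 400 nanoseconds apart.
event_a_ns = 1_750_000_000_000_000_100
event_b_ns = 1_750_000_000_000_000_500


def to_micros(timestamp_ns: int) -> int:
    """Truncate a nanosecond timestamp to microsecond precision."""
    return timestamp_ns // 1_000


# At nanosecond precision, the order of the two events is preserved.
ordered_at_ns = event_a_ns < event_b_ns

# At microsecond precision, both events collapse to the same value,
# so their relative order can no longer be recovered.
collide_at_us = to_micros(event_a_ns) == to_micros(event_b_ns)

print("distinct at nanosecond precision:", ordered_at_ns)
print("identical at microsecond precision:", collide_at_us)
```

In high-frequency workloads, such as trading or sensor data, this kind of collision is what finer-grained timestamps are meant to avoid.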
"The support for Iceberg v3 is a major milestone because … these capabilities enhance performance, enable more granular data management and open up new use cases, particularly in high-frequency and complex data environments," Catanzano said. "Iceberg [supports] structured, semi-structured, and even unstructured data types. These are all types of data that agents need."
McKnight, meanwhile, noted that it's important for Snowflake to add support for V3 to stay competitive with other vendors supporting the most up-to-date iteration of Iceberg.
"V3 is the latest, so if you're supporting Iceberg, you must support it. … It has been an extremely well-received release that has only furthered Iceberg penetration," he said.
Beyond upcoming support for V3 of Iceberg, Snowflake is working with the open-source community on projects that address data integration, data governance, data quality and data discovery.
Made open source by Snowflake in November 2025, pg_lake eliminates the need for organizations to build and maintain complex extract, transform and load (ETL) pipelines to bridge the gap between transactional PostgreSQL data and analytical Iceberg tables. Meanwhile, the Apache Polaris Catalog, developed and open-sourced by Snowflake in 2024, is a vendor-neutral data catalog designed for Iceberg tables that can be integrated with proprietary data catalogs such as Snowflake's Horizon Catalog, AWS Glue and Dremio Arctic.
Current open source initiatives Snowflake is involved with include the Open Semantic Interchange, which aims to build an open standard for semantic modeling; OpenLineage, which aims to create an open standard for tracking data movement; and the development of V4 of Iceberg, which is intended to improve support for streaming data workloads and upgrade query and search performance.
Snowflake began working closely with the open source community about two years ago, according to James Rowland-Jones, the vendor's director of product management. Previously, Snowflake was a consumer of open source technology but not a significant participant. Now, with better interoperability a motivating factor, Snowflake supports numerous open source foundations, including the Apache Software Foundation, The Linux Foundation and PyTorch.
"We see openness as essential to the future of enterprise data and AI, and we’re committed to building alongside the community so customers have more control over how and where they use their data," Rowland-Jones said. "When interoperability works, customers are able to freely act on their data from any engine."
Feedback from customers attempting to develop agents and other AI tools was the primary motivator for adding production-ready support for Apache Iceberg V3 and participating in projects to develop other open source tools, Rowland-Jones continued.
"This is directly driven by what we're hearing from customers," he said. "As organizations move from experimentation to production AI, they’re running into real challenges around data silos, fragmented architectures and inconsistent governance. … In the AI era, interoperability is no longer optional."
Perhaps the most significant open source capability Snowflake is playing a role in developing is the Apache Polaris data catalog, according to McKnight, who noted that open table formats need to be augmented by architectural and security features.
"An open format alone is not enough to achieve true data autonomy," he said. "Without a catalog like Polaris, governance and business context do not travel with the data when teams use their preferred tools, which introduces security risks, limits data fidelity and increases costs."
Catanzano, meanwhile, named pg_lake as the most valuable open source feature that Snowflake is helping to develop because it enables users to easily integrate analytical and transactional data.
"By bridging transactional and analytical datasets without the need for complex ETL processes, pg_lake simplifies workflows and enables seamless integration between Postgres and Iceberg, which is a game-changer for many organizations," he said.
Looking ahead
As enterprises continue to increase their investments in AI development, Snowflake's focus is on enabling customers to build and scale AI initiatives based on an open, interoperable data foundation, according to Rowland-Jones.
It's toward that end that Snowflake is helping create new capabilities such as an open standard for semantic modeling and participating in developing the next generation of Iceberg.
"These efforts help ensure enterprises can build AI with confidence, knowing their data is trusted and their systems have the context they need to operate reliably," Rowland-Jones said.
Snowflake's continued participation in open source development is wise, according to McKnight, who noted that its involvement in creating the next iteration of Iceberg will be valuable for existing customers and could help attract new ones that are using the open table format.
"Snowflake showing support for open source will attract open source users to fit Snowflake into their architectures," he said.
Catanzano likewise noted Snowflake's ongoing involvement in developing open source capabilities such as standards for semantic modeling and data lineage could help the vendor attract new customers.
"Advancing standards like Open Semantic Interchange and OpenLineage while ensuring seamless integration with other open-source tools could attract new users and further solidify its position as a leader in the open data ecosystem," he said.
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than three decades of experience. He covers analytics and data management.