It was a year like no other for data as organizations of all sizes continued to grapple with the ongoing impact of the coronavirus pandemic.
With work from home mushrooming and a pressing need for organizations to optimize operations in a resource-constrained environment, data became the fuel for digital transformation efforts and business survival.
The need for data meant growth for databases of all types in the cloud, an accelerated movement to make better use of cloud data lakes with different technologies, and increased efforts to improve data observability and quality.
The demand for data capabilities in 2021 also drove an unparalleled amount of financial activity as vendors raised money from private investors to help meet growing demand and opportunities.
Money for data vendors and their cloud data lakehouses
The explosive Snowflake IPO in September 2020 ignited interest in cloud data technology across the venture capital community, and that interest drove funding activity in early 2021.
Data lake query engine platform vendor Dremio struck first on Jan. 6, raising $135 million for its technology that helps organizations more easily query data in data lakes.
Both Dremio and Starburst raised money and grew their technologies to enable organizations to use cloud data lakes for data analytics and business intelligence as demand grew throughout the year.
No technology vendor benefited financially from the interest in the cloud data lakes market as much as Databricks, which pioneered the concept of the data lakehouse.
A data lakehouse is, in essence, a data warehouse that runs on a cloud data lake. Databricks snared multiple funding rounds in 2021, including the largest venture capital round ever for a data vendor, raising $1.6 billion in August.
Event data streaming demand grows
Another big trend in 2021 was the growing demand for streaming and event data.
The biggest technology in the sector is Apache Kafka, and its popularity led to some big outcomes for its lead commercial sponsor, Confluent.
Confluent had its IPO in June, listing on the Nasdaq stock exchange and marking a milestone in the growth of, and demand for, what the vendor refers to as data in motion.
With Kafka, organizations are looking to ease the use of real-time data, which can often be a challenge due to scale and complexity.
The Confluent-led open source ksqlDB project continued to develop in 2021, enabling SQL queries against streaming Kafka data.
Meanwhile, the ability to query streaming data is one that startup Materialize is looking to capitalize on as well, as the vendor builds a streaming database for the cloud that can handle Kafka sources.
Kafka is far from the only open source event streaming technology that grew in 2021.
Apache Pulsar also had its share of growth.
In January, open source stalwart DataStax acquired privately held event streaming vendor Kesque, bringing Pulsar-based technology into the DataStax portfolio. StreamNative is also among the backers of Pulsar, and in September it raised $23 million for its commercial Pulsar-based data streaming technology.
Database vendors go serverless in the cloud
Database as a service (DBaaS) was already a trend before 2021, but it underwent significant change during the year.
Earlier, DBaaS vendors mainly provided database technologies that still required organizations to manage some aspects of cloud infrastructure, including compute and storage.
What became increasingly visible in 2021 was the trend toward serverless DBaaS, with fully managed database services that do not require organizations to have fixed cloud deployments with specific amounts of compute and storage resources. The promise of serverless DBaaS is that organizations can rapidly get started with a service that consumes only the resources that are needed.
While on-premises data will continue to exist for years to come, the growth in data lake, DBaaS and event streaming technologies in the cloud that occurred in 2021 is a trend that cannot be ignored.
There is no escaping the reality that the cloud, in 2021 and beyond, will increasingly be the default choice for data efforts.