What is data warehouse as a service (DWaaS)?
Data warehouse as a service (DWaaS) is an outsourcing model in which a cloud service provider configures and manages the hardware and software resources a data warehouse requires, and the customer provides the data and pays for the managed service.
With DWaaS, an organization doesn't have to spend money upfront to buy data warehouse hardware and software and then install the system in its own data center. It also doesn't need to worry about managing the underlying system infrastructure or doing routine administration work on the database that's at the heart of the data warehouse. DWaaS vendors handle those tasks for customers.
DWaaS deployments are growing rapidly as more organizations shift from on-premises systems to cloud data warehouses. In a survey of 753 cloud users conducted in late 2021 by IT management tools vendor Flexera, 55% said their organizations were using data warehouse cloud services. The increasing adoption of DWaaS environments is part of a broader move toward cloud databases overall. For data that's generated in the cloud, DWaaS is a more natural fit than an on-premises data warehouse.
Cloud data warehouses are similar to on-premises ones from an architecture and technology standpoint. With that in mind, the main components of a typical data warehouse implementation include the following items:
- DBMS. A data warehouse requires a database management system (DBMS) to store, process and access the data it contains. Most commonly, data warehouses use mainstream relational databases that store data in rows, but they can also be built on columnar databases that use column-based storage. Because data warehousing is focused on write-once/read-many operations, and analytical queries often aggregate a few columns across many rows, a columnar engine can improve query efficiency and performance by reading only the columns a query needs. A relational DBMS that offers columnar database support is another alternative.
- Data storage. Like the DBMS and the server hardware it runs on, data storage devices are provided as part of a DWaaS environment. A variety of storage options can be used, including traditional hard disk drives, solid-state drives and cloud object storage services.
- Metadata management tools. Metadata characterizes data, providing documentation so data sets can be understood and more easily used. It answers the who, what, when, where, why and how questions for users of the data. Without metadata management capabilities, it's difficult to use a data warehouse effectively.
- Data pipelines. Data warehouses are designed to support business intelligence (BI) and data analytics uses. Transaction data must be moved from operational systems into a data warehouse; the data also needs to be transformed to better organize and format it for analytical querying. Data integration tools that support extract, transform and load (ETL) processes are therefore required DWaaS components. Other integration methods are usually supported, too. That includes extract, load and transform (ELT), an alternative to ETL often used with sets of big data that are transformed for different analytics uses after being loaded into a warehouse.
- Reporting and analytics tools. The primary purpose of a data warehouse is to enable data analysts and business professionals to glean business insights from operational data. BI tools that support querying, analytics and reporting functions against the data warehouse are thus a must.
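The difference between the ETL and ELT approaches described in the data pipelines item can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: it uses the standard library's `sqlite3` module as a stand-in for a cloud data warehouse, and the sample rows, table names (`orders_etl`, `orders_raw`, `orders_elt`) and helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical rows as they might arrive from an operational order system.
SOURCE_ROWS = [
    ("1001", " alice ", "19.99", "2024-01-05"),
    ("1002", "BOB", "5.00", "2024-01-05"),
    ("1003", "alice", "12.50", "2024-01-06"),
]


def transform(row):
    """Normalize one raw row: cast types and clean up the customer name."""
    order_id, customer, amount, order_date = row
    return (int(order_id), customer.strip().lower(), float(amount), order_date)


def run_etl(conn):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE orders_etl "
        "(order_id INTEGER, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders_etl VALUES (?, ?, ?, ?)",
        [transform(r) for r in SOURCE_ROWS],
    )


def run_elt(conn):
    """ELT: load raw rows as-is, then transform inside the warehouse with SQL."""
    conn.execute(
        "CREATE TABLE orders_raw "
        "(order_id TEXT, customer TEXT, amount TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?, ?)", SOURCE_ROWS)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount,
               order_date
        FROM orders_raw
        """
    )


def revenue_by_customer(conn, table):
    """A typical analytical query: total revenue per customer."""
    # The table name is interpolated only because it comes from this sketch,
    # never from user input.
    cur = conn.execute(
        f"SELECT customer, ROUND(SUM(amount), 2) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_etl(conn)
    run_elt(conn)
    print(revenue_by_customer(conn, "orders_etl"))
    print(revenue_by_customer(conn, "orders_elt"))
```

Both paths end with the same queryable table. The practical difference is where the transformation work runs and whether the raw data stays in the warehouse, which is why ELT suits cases where the same raw data is later reshaped for different analytics uses.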
All of the above can be provided and managed by the DWaaS vendor for the benefit of the user organization. But there are different methods of purchasing, installing and configuring the required hardware and software infrastructure to support a data warehouse in the cloud.
One approach is to deploy traditional data warehouse software on cloud infrastructure. This approach is the most similar to on-premises data warehousing. The expertise to build and manage the data warehouse resides with the customer, while the implementation and much of the ongoing support of the underlying system reside with the chosen cloud platform provider.
On the other hand, a pure DWaaS approach relies on the platform provider or another data warehouse vendor that runs its software on a cloud platform to deliver a complete data warehouse environment. The DWaaS vendor also provides ongoing management of the data warehouse, including configuration, performance management and data integration support. Customers can scale computing and storage resources up and down based on their usage needs, and payments are based on the resources they use. System resources can be provisioned on demand as needed or reserved to get discounted pricing.
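The tradeoff between on-demand and reserved resources comes down to simple arithmetic. The sketch below assumes a simplified model in which on-demand capacity is billed per hour of use and reserved capacity is billed at a discounted rate for every hour in the period, whether it is used or not. The rates, the discount and the function names are hypothetical; actual DWaaS pricing varies by vendor and may be based on other units, such as data scanned or capacity tiers.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def on_demand_cost(hourly_rate, hours_used):
    """Pay only for the hours the warehouse actually runs."""
    return hourly_rate * hours_used


def reserved_cost(hourly_rate, discount, hours_in_period=HOURS_PER_MONTH):
    """Pay a discounted rate for every hour in the period, used or not."""
    return hourly_rate * (1 - discount) * hours_in_period


def break_even_hours(discount, hours_in_period=HOURS_PER_MONTH):
    """Usage above this many hours makes the reserved option cheaper."""
    return (1 - discount) * hours_in_period


def cheaper_option(hourly_rate, discount, hours_used):
    """Compare the two models for one month of expected usage."""
    on_demand = on_demand_cost(hourly_rate, hours_used)
    reserved = reserved_cost(hourly_rate, discount)
    return "reserved" if reserved < on_demand else "on-demand"


if __name__ == "__main__":
    # Hypothetical figures: $2.00 per hour on demand, 40% reserved discount.
    for hours in (200, 438, 600):
        print(hours, cheaper_option(2.00, 0.40, hours))
    print("break-even hours:", break_even_hours(0.40))
```

Under these assumptions, a warehouse that runs only a few hours a day is cheaper on demand, while one that runs most of the month is cheaper with reserved capacity.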
Benefits of DWaaS deployments
The benefits of DWaaS are similar to those of any cloud computing service, including easier deployment and reduced IT management responsibilities. For example, a database administrator (DBA) responsible for a data warehouse no longer needs to install new releases of the database software that's being used, and an organization's IT team doesn't have to install, upgrade or replace the underlying hardware.
The potential benefits of using a DWaaS environment also include the following:
- Lower IT costs. Overall spending on IT and data management can be reduced because DWaaS eliminates the need for capital expenditures on hardware and software and decreases operating costs in on-premises data centers.
- Easier scalability. DWaaS users can quickly add more data processing and storage capacity when necessary and scale their systems back down when resources are no longer required. In addition, that can be done without the need to add or upgrade hardware or to continually renegotiate contract terms and conditions.
- Reduced staffing needs. Because administration and management are mostly done by the service provider, an organization doesn't need to add new workers to support a data warehouse. This makes DWaaS a good choice for organizations with small or limited IT departments, although cloud data warehouses can also handle mission-critical analytics workloads in large organizations.
- Faster access to new software features. Instead of having to wait for a new release of a vendor's data warehouse software and then install it, as in on-premises systems, users can take advantage of software updates that DWaaS vendors often make on an ongoing basis.
DWaaS also offers the same kind of general benefits as on-premises data warehouses, including expanded access to data for end users and improved data quality with better accuracy and consistency. Ultimately, that can lead to more effective BI and analytics applications to help drive better business decision-making.
DWaaS challenges and considerations
As with any cloud-based offering, performance and availability are primary considerations for potential DWaaS users. Because a DWaaS system runs in the cloud, it requires a reliable internet connection for users to access the data warehouse. If connectivity is impaired or lost, the system may perform poorly or be unavailable. Customers also have to rely on the DWaaS vendor to manage performance and ensure high availability. Service outages similarly affect use of a data warehouse.
Latency can also be an issue in DWaaS implementations. The following two aspects of latency must be considered and managed:
- the delay in getting data from operational systems into the data warehouse, which is a data integration issue; and
- the delay in accessing data once it's in the data warehouse for querying and analysis.
The amount of data that must be moved from operational systems to the data warehouse is the primary factor involved in data integration latency. Typically, the more data that must be added, the longer it takes to move it from the data source into a DWaaS environment. Similarly, analytical queries that return large amounts of data are the ones most at risk of access latency.
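A rough, back-of-the-envelope estimate shows how data volume drives both kinds of latency. The function below is only a sketch: it assumes the network link is the bottleneck, and the `efficiency` parameter (the fraction of nominal bandwidth actually achieved) is a hypothetical figure. It ignores transformation time, compression and parallel loading, all of which affect real-world results.

```python
def transfer_seconds(data_gb, bandwidth_mbps, efficiency=0.8):
    """Estimate the time needed to move data_gb gigabytes over a network link.

    bandwidth_mbps is the nominal link speed in megabits per second;
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    data_megabits = data_gb * 1000 * 8  # GB -> MB -> megabits (decimal units)
    return data_megabits / (bandwidth_mbps * efficiency)


if __name__ == "__main__":
    # Hypothetical nightly load sizes over a 1 Gbps link.
    for size_gb in (50, 500, 5000):
        minutes = transfer_seconds(size_gb, bandwidth_mbps=1000) / 60
        print(f"{size_gb} GB: about {minutes:.0f} minutes")
```

The same calculation applies in the other direction when a query returns a large result set to a BI tool outside the cloud, which is one reason to aggregate and filter data inside the warehouse before retrieving it.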
Another DWaaS challenge is to mitigate vendor lock-in. It isn't always easy to move from one DWaaS provider to another -- every offering is different. As such, it's wise to choose a DWaaS system with underlying components that your IT and data management team is knowledgeable about, which helps preserve your ability to migrate to another provider later.
In addition, organizations may have concerns about data security, regulatory compliance and risk management in a DWaaS environment. Cost can also become an issue if use of a cloud data warehouse exceeds expectations or if unneeded system resources aren't identified and removed.
Top DWaaS vendors and technologies
As mentioned above, DWaaS vendors include the leading cloud platform providers -- AWS, Google Cloud, Microsoft and Oracle -- and other makers of cloud data warehouses that use one or more of those platforms to run their software. The following technologies are some of the prominent DWaaS offerings available to organizations:
- Amazon Redshift
- Google BigQuery
- IBM Db2 Warehouse on Cloud
- Microsoft Azure Synapse Analytics
- Oracle Autonomous Data Warehouse
- SAP Data Warehouse Cloud
- Teradata Vantage
- Yellowbrick Data Warehouse