Data virtualization layer feeds logical data warehouse, Agile BI
Indiana University is using data virtualization to combine data from various source systems for analysis, as part of an initiative to improve strategic decision-making.
At the start of 2015, Indiana University found itself saddled with a 15-year-old data warehouse that stored a limited set of operational data and wasn't being used for analytics in a concerted way. Looking to improve decision-making, IU officials mapped out a new strategy -- one that resulted in the deployment of a data virtualization layer to create a logical data warehouse and BI environment.
Data virtualization software from Denodo Technologies is used to combine data from various source systems, both on premises and in the cloud. A team of data analysts and architects creates blended views of data sets in Denodo's platform and builds dashboards and data visualizations for business users with Tableau's self-service BI tools, using an Agile BI process to plan and manage projects.
The virtualized approach helped accelerate the initial implementation, said Dan Young, IU's chief data architect and manager of enterprise business intelligence. The university bought the Denodo software in July 2015 and put a production version of the data virtualization architecture in place late that year, he said.
IU was also able to avoid the need to invest in a full-fledged BI platform, according to Young. Instead, his team does data curation and analysis in the data virtualization layer. The Tableau dashboards and data visualizations are then documented in a report catalog that's part of Data Cookbook, a data governance tool developed by IData Inc. for use by colleges and universities.
Partly because of the decision not to buy a conventional BI software suite, an initial $12 million budget that was expected to fund the project until mid-2017 should now carry it through to the end of 2019, Young said. He added that the budget covers software costs, additional staffing, training and other expenses related to the project, which is known as the Decision Support Initiative (DSI).
A more strategic approach to data analysis
The existing Oracle data warehouse contains data on student registrations and other transactional records, Young said. It supports operational reporting on business functions. But the data isn't updated in a timely way, and many of the data definitions used by different departments, schools and campuses are inconsistent. As a result, the warehouse isn't a good fit for analyzing trends and business strategies.
The DSI project is meant to remedy that. "We're trying to get the right information into the hands of high-level people looking to make decisions on things like how to structure programs," Young said. That includes top university administrators plus deans, department chairs and other workers across IU's flagship campus in Bloomington and its eight satellite locations. All told, several hundred people are using Tableau, according to usage metrics generated by the BI software, he noted.
The data virtualization layer is connected to transactional systems running in Oracle, SQL Server and MySQL databases, as well as a learning management system that runs in the AWS cloud and stores its data in the Amazon Redshift data warehouse. The data virtualization software can also pull data from various external web services that IU uses. It's tied to the operational data warehouse, too, although Young said his team usually goes to the source systems for data.
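The core idea behind that setup is that one query endpoint can span sources that live in separate systems. As a rough illustration only -- using SQLite files to stand in for the Oracle, MySQL and cloud sources, with made-up table names, not IU's actual schema -- attaching two database files to one connection lets a single query join across both:

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
reg_db = os.path.join(tmp, "registrar.db")  # stand-in for a transactional source
lms_db = os.path.join(tmp, "lms.db")        # stand-in for a cloud LMS store

# Populate the two separate database files that play the role of
# distinct source systems. All names here are hypothetical.
c = sqlite3.connect(reg_db)
c.execute("CREATE TABLE students (id INTEGER, name TEXT)")
c.execute("INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace')")
c.commit()
c.close()

c = sqlite3.connect(lms_db)
c.execute("CREATE TABLE course_logins (student_id INTEGER, n INTEGER)")
c.execute("INSERT INTO course_logins VALUES (1, 42), (2, 7)")
c.commit()
c.close()

# One connection federates both files, so a single query can join
# "registrar" data with "LMS" data -- loosely mirroring how a
# virtualization layer presents many sources behind one endpoint.
con = sqlite3.connect(reg_db)
con.execute(f"ATTACH DATABASE '{lms_db}' AS lms")
rows = con.execute("""
    SELECT s.name, l.n
    FROM students s
    JOIN lms.course_logins l ON l.student_id = s.id
    ORDER BY s.id
""").fetchall()
print(rows)  # [('Ada', 42), ('Grace', 7)]
```

A real data virtualization platform does far more than this -- query pushdown, source adapters, security -- but the single-endpoint-over-many-sources pattern is the same.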
Data virtualization streamlines data work
It's much easier, Young said, to link new data sources to the virtualization layer than it would be to incorporate them into the data warehouse, which was built to take in data from Oracle systems. For example, when IU deployed a new Salesforce CRM system, the logical data warehouse architecture "helped to minimize the impact of that from a reporting perspective," he said.
In addition, new data attributes can be added to a data model much faster now, according to Young. That might take weeks in the Oracle data warehouse, but it can be done on the fly in the Denodo software when data sets are joined together, he said. "It gives us the flexibility to write quick business rules, as opposed to going back into the SQL stack and figuring out where modifications are needed."
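The time savings come from where the rule lives: in a virtual view defined at query time rather than in the warehouse schema. A minimal sketch of that pattern, with SQLite standing in for the virtual layer and entirely hypothetical table and column names:

```python
import sqlite3

# Two in-memory tables stand in for separate source data sets; in a
# real deployment these would be federated from different systems.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE registrations (student_id INTEGER, term TEXT, credits INTEGER);
CREATE TABLE students (student_id INTEGER, campus TEXT);
INSERT INTO registrations VALUES (1, '2018FA', 15), (2, '2018FA', 6);
INSERT INTO students VALUES (1, 'Bloomington'), (2, 'IUPUI');
""")

# The derived attribute (a full-time/part-time business rule) is
# defined in the view itself -- no base table is altered, which is
# the virtual-layer analogue of skipping a warehouse schema change.
con.execute("""
CREATE VIEW enrollment_status AS
SELECT s.student_id, s.campus, r.term,
       CASE WHEN r.credits >= 12 THEN 'full-time' ELSE 'part-time' END
         AS status
FROM registrations r
JOIN students s ON s.student_id = r.student_id
""")

rows = con.execute(
    "SELECT campus, status FROM enrollment_status ORDER BY student_id"
).fetchall()
print(rows)  # [('Bloomington', 'full-time'), ('IUPUI', 'part-time')]
```

Changing the rule (say, the 12-credit threshold) means redefining one view, not reworking load jobs and table definitions across the SQL stack.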
Earlier this year, IU upgraded the data virtualization layer to Denodo Platform 7.0, which was released in April 2018. Young said he's looking to take advantage of a new in-memory massively parallel processing feature to speed up processing of complex queries and data joins. The 7.0 release also offers a new UI designed to give business users access to the software's data catalog for self-service data discovery and preparation. "We're looking to see how that meshes with things we're doing here," he said.
Young's team already makes the blended data sets it creates available to data analysts in individual schools or departments who want to do their own analysis and data visualization work. But the DSI's primary output is a set of BI applications delivered in Tableau with embedded dashboards and visualizations. That includes applications for analyzing academic metrics, financial risks and workforce trends.
Rapid response on data needs via Agile BI
As part of the Agile BI process, Young and his team collect requirements for new data analyses from business users and work to deliver something useful in two to four weeks -- even if it's just an initial pie chart for a planned dashboard. But they use a customized Agile development methodology that includes some built-in flexibility on the delivery schedules.
"It doesn't mean we don't have things that take six weeks to deliver," Young said. "Sometimes, it can be six weeks, and sometimes, it's two. It just depends on the complexity."
Doug Henschen, an analyst at Constellation Research, said data virtualization is still a niche technology but a growing one that can provide advantages over traditional extract, transform and load (ETL) processes in some cases, particularly when a variety of source systems need to be integrated.
"The problem of bringing data together from many sources is only being compounded, so there's a need there," Henschen said. He added, though, that there can be some performance penalties associated with data virtualization -- whether it's a good option depends partly "on what you want to do with the data."
In fact, Young said his team has also set up a small Oracle-based analytical data warehouse where it can put data for ETL processing if need be. The Denodo software can then access the data to blend it with information from other sources or pull it into the data virtualization layer to cache it for faster query performance, he said.
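Caching of that kind generally amounts to materializing a virtual view's result into a local table so repeated queries skip the expensive federated work. A simplified sketch of the idea, again using SQLite with invented names rather than Denodo's actual caching mechanism:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE facts (k INTEGER, v INTEGER)")
con.executemany("INSERT INTO facts VALUES (?, ?)",
                [(i, i * i) for i in range(1000)])

# An aggregating view stands in for an expensive virtual data set
# (e.g., a join blending several remote sources).
con.execute("""
CREATE VIEW totals AS
SELECT k % 10 AS bucket, SUM(v) AS total
FROM facts GROUP BY k % 10
""")

# Caching = materializing the view's result into a local table, so
# dashboard queries hit the cached copy instead of re-running the
# underlying federated query each time.
con.execute("CREATE TABLE totals_cache AS SELECT * FROM totals")

live = con.execute("SELECT total FROM totals WHERE bucket = 3").fetchone()
cached = con.execute("SELECT total FROM totals_cache WHERE bucket = 3").fetchone()
assert live == cached  # same answer, cheaper second path
```

The trade-off Henschen alludes to shows up here: the cache answers faster but reflects the sources only as of the last refresh.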