Sergey Nivens - Fotolia
Building leaner, meaner BI data sources
As business intelligence analysis and reporting platforms become increasingly important in the enterprise, so does the data that feeds them. Are your BI data sources up to par?
Over the past few years, business intelligence analysis and reporting has broken out of its traditional data warehousing pen. Microsoft's Power BI is perhaps the most cited exemplar of this jailbreak, but BI is spreading through the enterprise at an unprecedented pace -- and it's never turning back.
BI has enjoyed the greatest surge among analytical technologies mostly because it's the easiest to replatform. AI, machine learning and custom analytics are harder to do, both methodology- and infrastructure-wise.
But getting BI onto a bigger stage also means creating practical bottlenecks while pushing through conceptual ones. One of the stickiest examples of these bottlenecks is with BI data sources, an important -- and sometimes overlooked -- area of BI that increasingly requires new thinking and new tools.
BI back then
Believe it or not, the term business intelligence is more than 150 years old, first appearing in an 1865 book of business anecdotes to describe the strategic victories of Sir Henry Furnese, an English merchant. More recently, it entered the IT mainstream when IBM made data warehousing a thing back in the 1980s. Once information storage became cheap, BI was the serendipitous product of all the information the enterprise started hoarding. When Microsoft brought it to the masses with the 2005 edition of SQL Server, BI became a mission that companies of every size could finally undertake without a massive investment in software and infrastructure.
When data warehousing was exciting and new, it wasn't that troubling that BI and data warehousing were a package deal. But now that data warehousing has become passé and BI has become mission-critical, BI needs data sourcing of its own, with its fingers in big data and ad hoc utility. That means enterprises need BI data sources that are robust, flexible and optimized for the job -- and it takes some effort to get there.
Trimming inbound data
BI software now exists apart from data warehouses, data marts and other reporting sources in part because it has become easier to deploy it throughout an enterprise network -- and in the cloud, in particular. The other reason is that BI is no longer just a reporting tool: It's an analytical resource in its own right.
Scott Robinson
In practice, the many-to-one inputs to BI used to happen between points of origin and the data warehouse; now it happens between those points of origin and whatever data sources are BI-specific. Since those BI data sources are less generalized and more application-focused than a traditional data warehouse or data mart, it becomes important to keep them trimmed down.
That means eliminating extraneous columns in spreadsheet workbook sources and database tables that are feeding the source, pre-filtering unnecessary rows that might be outdated or otherwise out of scope, and rigorously examining unstructured big data from sources like Hadoop or Spark. The key is for users to target and extract specific objects in BI data sources rather than aggregations. Today's vendors easily accommodate these approaches.
Forethought goes a long way
Whereas in the past it was commonplace to do the data crunching in the data warehouse -- a SQL Server OLAP cube with SSAS, for example -- now it's often more convenient to do it in process. There are BI tools today that put the power of analysis in the hands of non-IT people, making both ad hoc analysis and analytical reporting easier and more versatile than before.
But this too is a practice that benefits from some forethought. While it's important to keep data exported from origin sources to BI sources uncluttered, it's often useful to carry over any statistical data that might be useful in an analytical process.
BI software -- like spreadsheet software and platforms like SharePoint -- routinely offer a feature called a calculated column or a derived column. This is a repository for information emerging from analysis done on the data in the BI platform when it's loaded into a report. This feature creates a great opportunity for enhanced analytical utility in the data model and allows BI users a greater range of reporting and analysis opportunities for the data. It's even possible, and often desirable, to do some of this reporting and analysis ahead of time, loading calculated columns into datasets when they are extracted from the original BI data source.
Importing relationships vs. remapping
Finally, and perhaps most importantly, there's the matter of data structuring.
BI data sources employ data models from which BI-friendly datasets are generated for a variety of applications. It is often essential that enterprises structure the data in these datasets differently than how it was structured in the original source.
In the case of spreadsheet data, this usually means that users discard pivots and other structural features in the original source. On the other hand, in a relational database, the structure of the data in the source model may hugely impact its utility and storage efficiency -- and that may impact its overall usefulness in BI reporting. Is the data most efficiently stored when detail tables are keyed strategically to master tables? What data should go in lookup tables? Sometimes these are questions that must be asked of BI data, too.
Creating great BI data sources, then, includes a key decision: Should users import the relational structure of the data from a source database or pull the data over to the BI source flattened out and structure it differently in the model?
Many BI vendor packages will allow you to either import existing structures or let them go, leaving you free to decide what's best in your particular scenario. Usually, these decisions are left to IT people -- data architects who can optimize database storage as much as they'd like. But more and more, when it comes to BI, those decisions rest on those who are building out the reporting environment. It's a wise approach, then, to learn a few tricks of the database trade.