Business intelligence only exists by mining relevant data.
Without the ability to find the right data, there's no actual BI to base decisions upon.
Mining relevant data, however, is no simple task.
Given the organizational complexity of today's big enterprises -- many are multinational with offices around the world, others amalgams of companies sewn together over the years by mergers and acquisitions with product lines that go beyond one small niche -- the amount of available data amassed over decades can be overwhelming and disorganized.
To help organizations curate their data and gain meaningful insights, vendors such as Talend, founded in 2005 and based in Redwood City, Calif., along with others such as cloud data integration provider Informatica and MuleSoft, recently acquired by Salesforce, have risen and become specialists in data integration.
In a two-part Q&A, Talend CEO Mike Tuchen discusses in depth the difficulty companies face in mining relevant data.
In part one, Tuchen talks about the general challenges that have developed over the last 10 to 15 years as organizations digitize and pool their data. In part two, he discusses how the challenges large corporations face differ from those of their small and midsize brethren, as well as Talend's own strategy for helping organizations deal with their sudden abundance of data.
In terms of mining relevant data, what are the challenges organizations face?
Mike Tuchen: The biggest challenge that every company has is that their data is all over the place. It's in a lot of different systems. It's in a lot of different formats -- some of them you might know about, but most of them you don't know about. Where is all the relevant info, and how does it relate to each other? Once you start finding all of this data, you quickly start realizing that you've gone from not knowing where it is to suddenly seeing you've got 10 different versions of everything, and they're all inconsistent and overlapping. How do you start? Where do you go to find the right information? How do you get all that stuff consistent? Those are the core problems every single company faces.
How has it developed to this point -- what has happened in the last 10 to 15 years to lead us to this point where mining relevant data is so difficult?
Tuchen: It was more simple 10 to 15 years ago, but that wasn't necessarily a benefit. It was simpler because many companies simply hadn't digitized. They had a whole lot of manual processes, so the data simply wasn't available in any electronic system. The first part of a digital transformation is digitizing, getting everything in the system and now having electronic workflows, and that's a huge step forward. But it brings that second step, which is that now you've created electronic information which you can start to harness and analyze. That's a huge opportunity that's just now starting to be tapped, but it leads to exactly the problems we just discussed. Where is all the data that's relevant, how does it relate to each other, what's the correct info, how do I make it consistent and correct and find that information and start there and use that to drive my analysis? That's where value comes from.
What can a company do to find the data it needs?
Tuchen: One of the first steps a company takes is to start cataloging their data. There are companies like us that provide a data catalog that allows you to understand where all your data is and now get to the point where you have a common definition. When I talk about annual recurring revenue, what's the actual definition, and how am I defining that here? There's no accounting standard that says here's what ARR [annual recurring revenue] means, so you need to define it somewhere, so how do I define that and say here are the source tables where all that kind of stuff is going to form. So you start with cataloging it, and now you start driving that cleaning and governance process, you start pulling the data together, automating the cleanup steps to start making it consistent and correct. And then, as you've built out those two core capabilities, you now are at the point where your data is consistent and correct and you know what it is. You've defined the most important definitions, and your team knows where to go to analyze it.
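As a rough illustration of the cataloging idea Tuchen describes, the Python sketch below records one agreed definition per business term together with the source tables behind it. It is not Talend's product or API. The class names, the ARR wording and the table names are all hypothetical, chosen only to show the shape of a shared definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One business term, its agreed definition, and where the data lives."""
    term: str
    definition: str
    source_tables: tuple[str, ...]


class DataCatalog:
    """A toy in-memory catalog (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Refuse a second definition so teams cannot silently diverge.
        key = entry.term.lower()
        if key in self._entries:
            raise ValueError(f"{entry.term!r} is already defined")
        self._entries[key] = entry

    def lookup(self, term: str) -> CatalogEntry:
        # Case-insensitive lookup; raises KeyError for unknown terms.
        return self._entries[term.lower()]


catalog = DataCatalog()
catalog.register(CatalogEntry(
    term="ARR",
    # Assumed wording; each company has to settle its own definition.
    definition="Sum of annualized subscription fees for active contracts",
    # Assumed table names, not taken from the interview.
    source_tables=("billing.subscriptions", "crm.contracts"),
))

entry = catalog.lookup("arr")
print(entry.definition)
print(entry.source_tables)
```

Rejecting a duplicate registration is one possible design choice: it forces a second team to reuse or formally revise the existing definition rather than create a competing one.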
Are there potential pitfalls that can arise while mining relevant data?
Tuchen: The secondary problem that's been created that we're now starting to touch on is that different analytical teams, without having a catalog to go find the data, are going to start recreating it themselves -- you not only have duplicate work being created but in some cases inconsistent work, which is even worse. It's not just that they're wasting time that could have been saved, it's that they're coming to different results by creating different definitions or different flows that result in different answers. It's creating more confusion. By creating a catalog, understanding where your data is, and now driving convergence and consistency, you're starting with the right data and everyone is starting in the same place and maximizing use.
Editor's note: This interview has been edited for clarity and conciseness.