Organizations need to pull together the growing volumes of data that they generate and collect to keep workflows moving. To address this, several types of data integration software have emerged to help IT teams simplify and manage the process.
But with so many products to choose from, how do you select the right data integration tool for your enterprise? It isn't about picking the product with the most features, but the one that best matches your integration requirements and enterprise profile.
Before evaluating data integration platforms, ask some questions within your organization to help guide the technology selection process. Your inquiries should cover the following topics.
Source systems. How many do you have? Do you have overlapping systems, such as multiple customer relationship management (CRM) or sales processing applications? Is there unstructured or semi-structured data in addition to conventional structured data? Are there external data sources in addition to internal ones? What are the data volumes and frequency of updates?
Integration use cases. Do you need to integrate data for analytics -- primarily through data warehousing? What about application consolidation? Does your organization need to acquire or process data for master data management (MDM)? What about synchronizing data between on-premises systems and cloud applications or IoT devices? Or exchanging data between internal business processes or applications and those at other organizations? Do you have to capture and deliver data for complex event processing or stream processing applications? Is there a need to integrate data from disparate systems virtually, without moving it to a central data store? Does your company have an aggressive merger strategy with a plan to acquire many different businesses, potentially with disparate systems and databases that must integrate with your existing technology base?
Enterprise size. What is your organization's annual revenue, how many total employees does it have and what is the IT budget for data integration?
Resources and skill sets. Do you have dedicated IT resources to perform data integration work? And what's the level of previous experience with data integration tools?
Data integration tools for large enterprises
Large enterprises generally share the following characteristics:
- A diverse set of often-overlapping source systems with high data volumes. Structured data sources are dominant, but unstructured data sources, such as social media, web server logs and flat files, as well as semi-structured data sources, such as XML or message-oriented data, also need integration.
- They have multiple integration use cases.
- IT budgets are sufficient to purchase any of the available data integration tools and supporting infrastructure as necessary. That doesn't mean these enterprises have an open checkbook, but they have the fiscal means when an investment is justified.
- They have a dedicated IT group with existing data integration specialists or the budget to hire employees or consultants who have experience using the chosen data integration tool.
Large enterprises that fit this profile should consider Informatica PowerCenter and IBM InfoSphere Information Server for Data Integration, because these products address the entire spectrum of integration use cases. Both products also provide the scalability necessary to handle the data complexity, volume and velocity of large enterprises.
These products can also span multiple projects and support integration work by teams of any size. IBM and Informatica both offer MDM and data cleansing capabilities. IBM's product addresses information analytics and management needs, while Informatica concentrates on information integration.
IBM InfoSphere Information Server for Data Integration is particularly well-suited for enterprises that use IBM mainframe computers, because it uses an IBM Db2-based data warehouse and also supports database and analytics platforms, such as Netezza, SQL Server and Oracle, that can run on or in concert with mainframe-hosted applications. InfoSphere also runs on operating systems such as Windows and Linux, deployed on separate servers or, in the case of Linux, as a guest OS in the mainframe environment. Informatica PowerCenter has a mainframe and cloud orientation, supporting mainframe systems as well as Unix, Linux and Windows.
However, these comprehensive tools come at a price. In addition to being generally more expensive than their competitors, they also require a more extensive set of skills and experience to use. Also, they typically require more comprehensive infrastructure and are more complicated to implement than their competitors.
Many of IBM's and Informatica's rivals have significantly increased their capabilities and features over the years, providing more alternatives for large enterprises, especially those with less demanding integration needs than outlined above. Data integration tools from SAP, Oracle and SAS Institute Inc. address a wide variety of data sources and integration use cases.
Each of these companies also offers enterprise applications such as enterprise resource planning (ERP), CRM and analytics that many large enterprises use. To capitalize on that, SAP, Oracle and SAS each mesh their own data integration tools with their applications. As a result, if an enterprise has a significant investment in any of these companies' applications, it's reasonable to also consider that vendor's data integration tools.
SAP Data Services and SAS Data Management both provide extensive data integration capabilities that support large enterprises. SAP Data Services, although not limited to working with SAP's business applications, is increasingly integrated with the company's software portfolio. This means that enterprises that are already SAP customers should consider this integration product. Likewise, SAS customers that use the company's statistical and analytical products, alongside other enterprise products such as those from Oracle and SAP, should consider SAS Data Management.
Data integration tools for midsize enterprises
Midsize enterprises generally have the following characteristics:
- A variety of source systems handle overlapping data subjects and may be on-premises or cloud-based. Data volumes vary by industry and by the products or services offered. Structured data sources are still dominant, and any unstructured data that needs integration is generally limited in scope.
- Extract, transform and load (ETL) and data warehousing are the dominant integration use cases, although application integration may arise in the future once data warehousing needs are addressed.
- IT budgets are constrained.
- The IT group that performs both data integration work and business intelligence development is smaller. Hiring specialists dedicated to specific tools may not be fiscally possible.
Although midsize enterprises with this profile have significant integration needs, they operate with constrained resources in terms of people, budget and time. These companies should consider data integration products from Microsoft, Oracle, Information Builders, Talend, Hitachi Vantara or Dell Boomi. Each of these tools provides capabilities to address the data variety, scope of integration uses and resource constraints typical of such organizations.
Enterprises using Microsoft SQL Server that have developers with deep SQL expertise should consider Microsoft's data-related products, such as SQL Server Integration Services (SSIS). These tools share a common development approach, enabling IT to work with multiple Microsoft tools more effectively. Microsoft has expanded the capabilities of SSIS to handle more complex integration use cases, such as slowly changing dimensions and fuzzy lookups, and a variety of data sources beyond flat files and relational databases.
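To make "slowly changing dimensions" concrete: a Type 2 slowly changing dimension preserves history by closing out the current row and inserting a new version whenever a tracked attribute changes, rather than overwriting the old value. The sketch below shows that logic in plain Python. It is an illustration of the general technique, not how SSIS implements it, and the `CustomerRow` structure, its fields and the `apply_scd2` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    # Hypothetical dimension row; field names are illustrative only.
    customer_id: int          # business (natural) key from the source system
    city: str                 # tracked attribute; a change creates a new version
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_scd2(dimension: list[CustomerRow], customer_id: int,
               city: str, as_of: date) -> None:
    """Apply one incoming source record using Type 2 rules."""
    current = next(
        (row for row in dimension
         if row.customer_id == customer_id and row.valid_to is None),
        None,
    )
    if current is not None and current.city == city:
        return  # attribute unchanged: keep the existing current row
    if current is not None:
        current.valid_to = as_of  # close out the old version, keeping history
    dimension.append(CustomerRow(customer_id, city, valid_from=as_of, valid_to=None))


if __name__ == "__main__":
    dim: list[CustomerRow] = []
    apply_scd2(dim, 42, "Boston", date(2023, 1, 1))
    apply_scd2(dim, 42, "Boston", date(2023, 3, 1))  # unchanged: no new row
    apply_scd2(dim, 42, "Denver", date(2024, 6, 1))  # changed: old row closed, new row added
    for row in dim:
        print(row)
```

A tool with built-in SCD support handles the same close-and-insert pattern declaratively, which is part of what saves development time compared with hand-coding it.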
Although SSIS can connect to sources and targets beyond Microsoft's platform, its deployment has historically been limited to Windows. Microsoft's tools have traditionally been on-premises, but the company has made significant strides in moving capabilities to the cloud. On the downside, SSIS lacks some of the comprehensive integration transformations, workflows and process management of its competitors, such as the ability to track and manage processes using a repository or team-based development administration functions.
Similar to Microsoft, enterprises currently using Oracle databases may wish to consider Oracle Data Integrator. ODI is a comprehensive data and application integration tool that can handle a wide variety of data sources and integration uses, including BI, MDM and application integration; it also scales with data volume and velocity. While the product has numerous capabilities, one of its main functions is to automate SQL scripts for users.
ODI does require sufficient training to handle its somewhat complex implementation. The product's ability to work in conjunction with a variety of Oracle products expands its capabilities, but that also increases deployment complexity, making it harder for resource-constrained IT staff to use.
Information Builders' iWay Integration Suite can handle complex integration uses such as MDM, data cleansing and data governance. In particular, iWay is a feasible option when an enterprise already uses other Information Builders data management and analytics products, as it offers tight integration with them. These tools are known for their scalability and their ability to work in real time with operational systems. One drawback: There's a limited pool of expertise and experience with the iWay software.
Talend's namesake data integration tools and Hitachi Vantara's Pentaho Data Integration platform can also handle a variety of integration uses. Both products have open source versions that enable an IT group to avoid any upfront licensing costs. The open source versions offer solid data integration capabilities that fit well for enterprises that don't have demanding integration needs or for IT groups that are working on a shoestring budget. The enterprise versions of both of these companies' products provide significantly more extensive capabilities.
Dell Boomi offers an extensive library of data integration interfaces to public cloud service providers, social media outlets, leading sales, HR and CRM software suites and several ERP platforms. It is an open and agnostic tool in the sense that it isn't tied to any particular product suite, and it is particularly well-suited to companies that have, or anticipate having, a majority of their IT resources in the cloud.
Data integration tools for small enterprises
Small enterprises generally have the following characteristics:
- A variety of source systems that are primarily structured data sources.
- ETL and data warehousing are the primary integration use cases.
- IT budgets are very limited.
- IT staff is also limited and multitasks across areas such as data integration, BI and operational systems.
These enterprises may want to consider either data integration tools tied to the databases they already use -- e.g., Oracle or Microsoft -- or the products from Talend, Hitachi Vantara or Dell Boomi. These tools are cost-effective: SSIS comes bundled with SQL Server, and the open source versions of Talend or Pentaho provide more data integration capabilities than many smaller enterprises even need. Dell Boomi also offers a broad assortment of APIs and data integration functionality for many popular commercial software products and public clouds. One caveat: Smaller enterprises should ensure that their IT department has sufficient expertise to use these tools effectively.
Tools for small enterprises with limited integration needs
These enterprises are primarily doing operational reporting directly from their source systems and aren't creating a data warehouse to integrate those source systems. Under these circumstances, these enterprises generally won't invest in data integration tools or IT skills. Instead, IT will rely on whatever integration functionality is bundled with existing applications or do custom SQL coding. Business users will rely on the reporting built into their operational applications and use spreadsheets to fill the gaps if they need data from multiple applications for reporting.
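As a rough illustration of what "custom SQL coding" can look like in this scenario, the sketch below joins customer records from one system with invoice records from another to produce a combined report. It uses Python's built-in `sqlite3` module, with two in-memory databases standing in for a CRM and a billing application; the schema names, tables and sample rows are all hypothetical, and real source systems would typically live on separate database servers.

```python
import sqlite3


def build_sources() -> sqlite3.Connection:
    # The main in-memory database stands in for a hypothetical CRM system.
    conn = sqlite3.connect(":memory:")
    # A second, attached in-memory database stands in for a separate billing system.
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute(
        "CREATE TABLE main.customers (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE billing.invoices "
        "(invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO main.customers VALUES (?, ?)",
        [(1, "Acme Corp"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?, ?)",
        [(10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.5)],
    )
    return conn


# Hand-written integration query: joins the two "systems" on a shared key.
REVENUE_BY_CUSTOMER = """
SELECT c.name, SUM(i.amount) AS total_billed
FROM main.customers AS c
JOIN billing.invoices AS i ON i.customer_id = c.customer_id
GROUP BY c.name
ORDER BY c.name
"""


def revenue_report(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    return conn.execute(REVENUE_BY_CUSTOMER).fetchall()


if __name__ == "__main__":
    for name, total in revenue_report(build_sources()):
        print(f"{name}: {total:.2f}")
```

This approach works while the number of sources stays small and the shared keys are clean; once sources multiply or keys disagree across systems, the maintenance burden is what typically pushes enterprises toward dedicated integration tools.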
Mary Shacklett contributed to this report