Selecting the right data integration tools and software is critical to meet the increasing demand for data that helps drive more informed business decisions. The tool a company chooses to integrate and translate that data must also fulfill the organization's requirements. Otherwise, it will become expensive, unused shelfware. Even worse, custom manual coding of integration scripts -- with all its downsides -- will prevail.
Evaluating data integration products starts with gathering and prioritizing requirements such as source and target systems. These systems will dictate the types of data that IT admins will have to pull together and the forms of integration necessary. There are many variables in those requirements. For example, the company may have a mix of structured and unstructured data to integrate. There is also great variability in the tasks that may be required -- from extract, transform and load (ETL) processes and application integration to cloud and real-time data integration, data virtualization and related functions like data cleansing and data profiling.
By first getting a grasp on data handling, integration and business requirements, the organization will be in a better position to navigate through the wide choice of data integration platforms that are available.
Once the requirements are in hand, the next step is to create a list of specific features and functions to compare and evaluate. Ultimately, an organization needs to select the data integration tool that's the best fit for its use cases and budget as well as its resources and skills -- not necessarily the most highly ranked or feature-laden product.
Data integration product evaluation and selection criteria
To simplify the selection process, classify the list of features and functions for data integration tools and software as must-haves, should-haves, nice-to-haves and will-not-use items.
Must-have features should be unambiguous. It is important to eliminate a potential tool option if it doesn't have the necessary features. Should-have features occupy a gray area between must-have and nice-to-have features, where certain capabilities can have a major impact on integration productivity, scalability and maintainability. Although nice-to-have features aren't required, they're often the differentiators in selecting a product.
When determining whether a product has a particular desired feature, sometimes the answer is, "Yes, it meets the criteria, but ..." The "buts" include such things as custom coding that will be required to fully meet integration needs, or a company will have to purchase an add-on product, possibly from a third party, to make up for missing functionality.
Other exceptions could be that the feature is only available in a specific edition of the product, that it's slated to be added in a future release, or that the vendor doesn't have a good reputation for supporting its products.
These exceptions generally mean that additional time, expense and effort will be necessary for the product to meet the criteria. Evaluators need to weigh these caveats alongside the must-have, should-have and nice-to-have features. That's the only way to ensure an accurate product evaluation and avoid surprises after selecting a product.
For example, if a required feature is lacking in an otherwise suitable product, the company could perform custom coding to fill in the gaps. But beware: As part of the product evaluation process, it is crucial to estimate the cost of the coding in terms of time, resources and opportunity loss, and then assess whether it's better to just forgo missing features or choose a different integration platform that offers them.
Compiling a list of data integration features
Each company's laundry list of must-have items will differ based on its detailed requirements. But these core capabilities are generally considered must-have data integration features for most organizations:
Access data from a mix of sources. The chosen data integration product needs to directly access various data structures and types of information, including the following:
- relational, columnar, in-memory and NoSQL databases, plus multidimensional online analytical processing systems and other specialized databases;
- Hadoop systems and other big data platforms;
- flat files, such as tab-delimited, comma-separated values or spreadsheets;
- application messaging technologies, such as enterprise messaging systems, XML and JSON;
- industry-specific protocols, such as Health Level Seven International and Society for Worldwide Interbank Financial Telecommunication;
- enterprise application integration web or data services;
- business applications, such as ERP and customer relationship management systems;
- SaaS applications;
- mobile applications;
- unstructured data, such as social media data, email, website-related data, images and documents; and
- proprietary data protocols to communicate with specialized sensors, devices and legacy systems, such as mainframes.
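As a rough illustration of this first requirement, the sketch below reads records from three of the source types listed above -- a flat CSV file, a JSON document and a relational database -- using only Python's standard library. The file names, table and column names are hypothetical, and SQLite stands in for an enterprise relational database.

```python
import csv
import json
import sqlite3

def read_csv_source(path):
    # Flat-file source: each row becomes a dict keyed by the header columns.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def read_json_source(path):
    # Semi-structured source, e.g. an exported API payload.
    with open(path) as f:
        return json.load(f)

def read_relational_source(conn, query):
    # Relational source: return query results as a list of dicts.
    conn.row_factory = sqlite3.Row
    return [dict(row) for row in conn.execute(query)]
```

A real integration tool hides this plumbing behind prebuilt connectors; the point of the sketch is only that each source type yields records in a common in-memory form that downstream transforms can share.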
Write data to target systems. Data integration tools and software need to be able to insert, modify and delete data in the target systems of integration processes -- for example, data warehouses or operational databases that combine data from various sources for transaction processing.
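The insert, modify and delete operations described above can be sketched as a single synchronization step. This is a minimal illustration using SQLite as a stand-in target; the `customers` table and its columns are hypothetical, not any product's schema.

```python
import sqlite3

def sync_to_target(conn, rows, deleted_ids):
    # Ensure the hypothetical target table exists.
    conn.execute("""CREATE TABLE IF NOT EXISTS customers (
        id INTEGER PRIMARY KEY, name TEXT, region TEXT)""")
    # Insert new rows and overwrite changed ones (upsert-style behavior).
    conn.executemany(
        "INSERT OR REPLACE INTO customers (id, name, region) "
        "VALUES (:id, :name, :region)",
        rows)
    # Remove rows deleted in the source.
    conn.executemany("DELETE FROM customers WHERE id = ?",
                     [(i,) for i in deleted_ids])
    conn.commit()
```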
Interact with sources and targets. An integration tool must support a variety of data capture and delivery methods, including batch acquisition and delivery, bulk import and extract, and change data capture. Streaming and near-real-time data ingestion should also be a standard feature of integration software, along with time-based and event-based data acquisition, the latter triggered by predefined processing events in databases or applications.
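One common form of change data capture is timestamp-based incremental extraction: pull only the rows modified since the last successful run. The sketch below illustrates the idea under the assumption that each source row carries an `updated_at` value; real tools typically read database transaction logs instead.

```python
def extract_changes(rows, last_watermark):
    """Return rows changed since last_watermark plus the new watermark.

    rows: iterable of dicts, each with a comparable 'updated_at' value.
    """
    changed = [r for r in rows if r["updated_at"] > last_watermark]
    # Advance the watermark to the newest change seen, or keep the old one.
    new_watermark = max((r["updated_at"] for r in changed),
                        default=last_watermark)
    return changed, new_watermark
```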
Transform data. Basic data handling features are crucial, including data-type conversions, date functionality, string handling, NULL processing and mathematical functions. The same goes for data mapping capabilities, such as join, merge, lookup, aggregate and substitute, and for workflow support, which enables the creation of an integration process with multiple source-to-target mappings that potentially interconnect based on data or functional dependencies. In addition, integration software should provide workflow orchestration that includes looping, if-then-else, case style and passing variables.
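A few of the basic transform capabilities named above -- data-type conversion, NULL handling, a mathematical function and a lookup -- can be sketched in one small function. The field names and lookup table are illustrative assumptions.

```python
def transform(row, region_lookup):
    return {
        "id": int(row["id"]),                      # data-type conversion
        "name": row.get("name") or "UNKNOWN",      # NULL/missing handling
        "amount": round(float(row["amount"]), 2),  # mathematical function
        # Lookup: substitute a code with its reference value.
        "region": region_lookup.get(row["region_code"], "OTHER"),
    }
```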
Enable effective design and development. Another key data integration feature is a graphical user interface (GUI) that simplifies the construction of source-to-target mappings and integration workflows, with data, transformations and other elements displayed in design palettes. A well-designed, easy-to-use GUI is extremely important because it reduces staff training time. It should be accompanied by software development management functionality, such as version control; support for development, testing and production environments; and the ability to attach comments or notes. Data integration products also need to provide interactive testing and debugging functionality and the ability to create reusable and shareable components.
Support efficient operations. Features for managing and optimizing integration processes are vital as well -- for example, runtime process monitoring; error, warning and condition handling; collection of runtime statistics; and security management.
Provide multiple deployment options. A data integration platform must support operating environments both on premises and in the cloud, the latter through either hosted deployments or integration platform-as-a-service offerings. The platform should also support virtualized servers and distributed processing environments across a variety of operating systems.
The following features aren't necessarily must-haves, but they can significantly enhance developer productivity in designing data transformations:
- support for slowly changing dimensions, if used for business intelligence or data warehousing;
- customized log, error and condition handling;
- text string parsing and matching; and
- data set processing, such as time series and pivots.
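The first item above, slowly changing dimensions, is worth a concrete illustration. In a Type 2 dimension, a change to a tracked attribute expires the current row and opens a new one, preserving history. The row layout below is an illustrative assumption, not a specific product's model.

```python
def apply_scd2(dim_rows, key, new_attrs, today):
    """Apply a Type 2 slowly changing dimension update in place."""
    # Find the current (open-ended) version of this business key.
    current = next((r for r in dim_rows
                    if r["key"] == key and r["end_date"] is None), None)
    if current and all(current[k] == v for k, v in new_attrs.items()):
        return dim_rows  # nothing changed; keep the current version
    if current:
        current["end_date"] = today  # expire the old version
    # Open a new version effective as of today.
    dim_rows.append({"key": key, "start_date": today,
                     "end_date": None, **new_attrs})
    return dim_rows
```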
Other features that could be classified as should-haves include support for team-based development and management, as well as release management for both integration processes and the data structures that companies have in place. Repository-based storage of and access to process, or runtime, metadata is another, as it enhances the ability to analyze runtime performance to identify bottlenecks and trends.
More nice-to-have features include the following:
- self-generating documentation with graphical representations of workflows;
- where-used and what-if capabilities for analyzing the use of sources, targets and transforms;
- data profiling tools to analyze the information in sources and targets;
- data quality tools to cleanse and enhance data;
- integration with other vendors' software development, management, scheduling and monitoring tools; and
- parallelization of integration processes and data loading.
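The last item, parallelized loading, amounts to running independent partitions of a load concurrently. A minimal sketch with Python's standard thread pool, where `load_partition` is a hypothetical stand-in for a real loader:

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_load(partitions, load_partition, max_workers=4):
    # Load independent partitions concurrently; results keep input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load_partition, partitions))
```

Thread-based parallelism suits I/O-bound loads; CPU-bound transforms would use processes instead.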
Additional data integration tool selection criteria
The following criteria are often included in evaluations. But because they're subjective, it's important to clearly weigh their applicability and importance to the organization:
Loading performance. This will vary based on the integration complexity, source systems accessed and data volumes involved. The best practice is to create several prebuilt integration use cases and compare how each product performs on these specific examples.
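Comparing products on the same prebuilt use case can be as simple as running each candidate's job and recording wall-clock time. A minimal harness, where each job is a hypothetical zero-argument callable wrapping one tool's run:

```python
import time

def benchmark(jobs):
    """jobs: mapping of tool name -> zero-argument callable running the job.

    Returns wall-clock seconds per tool for the same integration use case.
    """
    results = {}
    for name, run in jobs.items():
        start = time.perf_counter()
        run()
        results[name] = time.perf_counter() - start
    return results
```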
Scalability. Companies should supplement the loading performance tests with stress tests that simulate anticipated growth in the number and size of the sources and targets.
Ease of use. This will vary based on the knowledge and skills of the data integration developers involved.
Training. This may include vendor in-person classes; online classes, live or prerecorded; or web recordings for specific features or processes.
Documentation and support. There should be separate criteria for developer online help versus technical documentation. How the vendor provides support -- online Q&A for common issues, online chat, in-person discussions and on site -- and pricing should also be included in the evaluation.
Once a company has created its evaluation criteria, it's time to select a shortlist of data integration tools and software and create a request for proposal.
Mary Shacklett contributed to this report.