There are many data integration tools to choose from in today's constantly evolving market, all racing to keep up with the rising influx of data.
Here are 10 data integration tools that are widely used in today's market.
Dell Boomi
Used by SMBs, midmarket companies and large enterprises, Dell Boomi has particular strengths in edge computing and IoT integration. Its data integration capabilities include a large library of on-premises, private cloud and public cloud endpoint connectors, along with extract, transform and load (ETL) support. The product enables organizations to manage their data integration in one central place through a unified reporting portal. It also features a drag-and-drop UI for data mapping.
This tool is especially well suited to move, manage, govern and orchestrate data across hybrid IT architectures, according to the company. It also provides no-code application integration accelerators, support for DevOps environments, and improved data governance and security in a cloud environment.
The latest Dell Boomi product release features the following:
- "last mile" integration of edge and IoT devices and data, whether on premises or in the cloud;
- workflow and business process automation;
- on-premises or cloud deployment of the software; and
- support for a broad range of IoT protocols, including proprietary protocols as well as open standards such as Advanced Message Queuing Protocol (AMQP), Message Queuing Telemetry Transport (MQTT) and REST.
Hitachi Vantara Pentaho Data Integration
Pentaho Data Integration is a scalable product from Hitachi Vantara Corp. that runs in a broad array of organizations, ranging from SMBs to large enterprises. It provides a comprehensive data integration platform, which originated as an open source technology known as Kettle. Pentaho offers a free Community Edition of the platform. Many enterprises start with the open source tool to explore its integration capabilities or to handle limited integration workloads. This gives them an opportunity to try the tool and see how well it works in their IT environment.
Pentaho Data Integration provides ETL functionality to assimilate a wide variety of data sources, including relational databases, enterprise applications, files and big data.
The platform's ETL architecture supports the creation and maintenance of target databases such as data warehouses, data marts and data lakes. The product also serves as the data integration portion of a unified platform that includes Pentaho Business Analytics, giving users a combination of data integration, preparation, governance and analytics capabilities. Pentaho Data Integration can be used stand-alone or in conjunction with other Pentaho products.
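To make the ETL pattern concrete, here is a minimal sketch in Python. It is not Pentaho's API: the sample CSV data, the `orders` table and the function names are all assumptions chosen for illustration, and an in-memory SQLite database stands in for the target data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a file-based source system.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,20.00
3,alice,not_a_number
"""

def extract(text):
    """Extract: read rows from the CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize names, parse amounts, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["order_id"]), row["customer"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn, text=RAW_CSV):
    """Run the three stages in order and return per-customer totals."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()

if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

The point of the pattern is that cleansing happens in the integration layer, before any data reaches the target, so the target only ever holds validated rows.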
The latest version of Pentaho Data Integration is 8.0. It provides the following:
- enhancements for gathering raw data from various sources and moving that data into the Hadoop ecosystem to create a summarized data set for analysis;
- drag-and-drop creation of big data pipelines;
- near-real-time data monitoring;
- compatibility for Spark libraries packaged with the Cloudera, Hortonworks and Apache distributions of Hadoop;
- support for Kafka streaming; and
- data inspection features.
Along with the Community Edition, Pentaho Data Integration comes in an Enterprise Edition that includes more functionality and technical support. It runs on the Windows, Linux and macOS operating systems.
IBM InfoSphere Information Server for Data Integration
Best suited to large enterprises, particularly ones that use IBM mainframes in their IT environments, IBM InfoSphere Information Server for Data Integration addresses all phases of the data integration process, including transformation and delivery of data with a common metadata framework and a built-in data catalog.
The integration product suite, which is part of IBM's InfoSphere Information Server platform, enables organizations to integrate data from disparate systems, including database management systems, big data structures, enterprise resource planning and customer relationship management (CRM) systems, as well as on-premises and cloud sources. These integration capabilities also extend to messaging systems, web services, social media and proprietary OS environments that include mainframes, midrange systems and data appliances.
InfoSphere Information Server for Data Integration can deploy on AIX, Linux and Windows servers, including IBM Z mainframes under Linux, while the metadata repository can utilize Oracle Database, Microsoft SQL Server or IBM's Db2 database software. The data integration system can also run on Hadoop clusters in a version of the technology called BigInsights BigIntegrate or as part of InfoSphere Information Server Enterprise Edition, which bundles the integration software with IBM's data quality tools.
Products in the data integration suite include InfoSphere DataStage, InfoSphere Change Data Capture, InfoSphere Data Architect, InfoSphere Data Replication, InfoSphere Blueprint Director, InfoSphere Information Governance Catalog and InfoSphere Information Services Director. In addition to bulk ETL processes, InfoSphere Information Server for Data Integration supports data replication and data federation through ties to IBM's companion InfoSphere Data Replication and InfoSphere Federation Server products.
The latest version of InfoSphere Information Server, 11.7, provides the following:
- modernized and consolidated user interfaces;
- improved management and runtime performance on Hadoop data lakes;
- automated deployment of runtime environments via Docker containers and Kubernetes;
- creation of data integration flows; and
- application of customer-defined data governance and quality rules.
Informatica PowerCenter
Popular in large enterprises and in some midsize companies, Informatica PowerCenter is a comprehensive platform for data integration, migration and validation. PowerCenter runs in conjunction with an extensive catalogue of related products for cloud application integration, big data integration, data cleansing, master data management (MDM) and other data management functions, which is a potential advantage for IT shops running a variety of software from different vendors.
The data integration platform runs in on-premises, in-cloud and hybrid modes.
PowerCenter often combines with Informatica's Data Integration Hub and its PowerExchange line of packaged connectors.
Data Integration Hub is an add-on to PowerCenter that provides a publish-and-subscribe integration architecture, but it can also run as an independent module. Shops primarily wanting to monitor, control and assure compliance of data might opt to use the Hub software alone. For more expanded data integration, the Hub can combine with PowerCenter and other Informatica products to deliver full ETL capability, big data processing and cloud integration.
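The publish-and-subscribe idea can be sketched in a few lines of Python. This is a toy in-memory model, not Informatica's product or API: the `IntegrationHub` class, the topic name and the sample record are hypothetical. It shows only the core property of the architecture, which is that publishers and subscribers know a topic name but not each other.

```python
from collections import defaultdict

class IntegrationHub:
    """Hypothetical in-memory hub: publishers and subscribers share only topic names."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        """Register a callable that receives every record published to the topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic, record):
        """Deliver one record to every subscriber; return the number of deliveries."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(record)
        return len(handlers)

# Two independent consumers receive the same published record.
hub = IntegrationHub()
warehouse, audit_log = [], []
hub.subscribe("customers", warehouse.append)
hub.subscribe("customers", audit_log.append)
delivered = hub.publish("customers", {"id": 1, "name": "Acme"})
```

Because the source publishes once and the hub fans the record out, adding a new consuming application means adding a subscription rather than building another point-to-point interface.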
PowerExchange, which is an additional add-on to the PowerCenter suite, provides connectivity to a variety of structured, unstructured and semi-structured data sources. A separate PowerExchange module enables PowerCenter users to connect to cloud applications.
Supported data sources include the following:
- ERP and CRM systems;
- on-premises and cloud sources;
- messaging systems;
- web services;
- proprietary systems such as mainframes and midrange systems;
- data appliances; and
- social media.
PowerCenter is available in Standard, Advanced and Premium editions. According to the company, all editions include ETL batch data integration, centralized administration, prototyping, data profiling, and connectivity to relational and batch sources and to the Open Database Connectivity (ODBC) API.
Key enhancements in the latest version, 10.2, include improvements to data governance and code management, as well as expanded data ingestion from sources such as Oracle Exadata, SAP HANA and Micro Focus Vertica, and enhanced data discovery and profiling.
Information Builders Omni-Gen Integration Edition and iWay
Deployed in large and midsize organizations, the Omni-Gen Integration Edition platform and associated iWay integration tools from Information Builders enable organizations to integrate diverse data sources, including legacy hardware and software platforms. The company claims its "sweet spot" is integrating web-facing applications with back-end legacy mainframe and midrange systems.
The core Omni-Gen integration software -- part of a wider data management platform that also offers users data quality and MDM functionality -- provides an integration designer and a governance console with a built-in data profiling tool.
The iWay products include iWay DataMigrator, iWay Service Manager, iWay Big Data Integrator and iWay Universal Adapter Suite.
DataMigrator uses ETL functions and supports the creation and maintenance of target databases, such as data warehouses, data marts and operational data stores. It also has change data capture capabilities, enabling it to support both batch updates and real-time monitoring of data changes in source systems. Service Manager is an enterprise service bus that uses APIs and a microservices architecture for real-time, batch, streaming and other forms of integration, while Big Data Integrator supports Hadoop-based integration. The iWay adapter suite provides more than 300 prebuilt adapters connecting data sources, applications and business-to-business exchange formats.
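As a rough illustration of change data capture, the sketch below compares two snapshots of a source table and emits insert, update and delete events that can be replayed against a target. This is a simplification and an assumption on our part: it is not how iWay DataMigrator is implemented, and production CDC tools commonly read database transaction logs instead of diffing snapshots. The function names and sample rows are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two snapshots keyed by primary key and return change events."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes

def apply_changes(target, changes):
    """Replay change events against a target so it matches the source."""
    for op, key, row in changes:
        if op == "delete":
            target.pop(key, None)
        else:
            target[key] = row
    return target

# Hypothetical snapshots of a source table taken at two points in time.
before = {1: {"name": "Acme"}, 2: {"name": "Globex"}}
after = {1: {"name": "Acme Corp"}, 3: {"name": "Initech"}}
events = capture_changes(before, after)
synced = apply_changes(dict(before), events)
```

Only the changed rows travel to the target, which is what lets the same mechanism serve both periodic batch updates and near-continuous monitoring.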
Key enhancements in the latest version, iWay 8, include the following new capabilities:
- IoT and blockchain data management;
- new connectors and APIs that facilitate broader access to technologies and data sources such as Twitter, Twilio, AWS DynamoDB, MQTT and blockchain; and
- UI business orientation that allows users to focus on business subjects rather than on technically stated models.
Microsoft SQL Server Integration Services
Microsoft SQL Server Integration Services (SSIS) is included with Microsoft's SQL Server database product and provides ETL functionality for data integration and data movement between applications through the use of packages -- integration jobs with embedded control and data flow elements that can be saved and built into ongoing workflows. It shares common development and management tools with SQL Server Analysis Services and SQL Server Reporting Services. In addition, the product works with various Microsoft Azure cloud platform products such as Azure SQL Database and Azure HDInsight.
This tool is best suited for organizations that are Microsoft shops and that use SQL Server as the SSIS target for integration. Accordingly, SSIS is most popular with midmarket companies and SMBs that have committed their IT resources to a Microsoft environment.
In SQL Server 2017, the latest production version of the database platform, Microsoft added a Scale Out feature to SSIS to make it easier to distribute package execution across multiple computers. It also enabled users to run SSIS packages on Linux systems, part of adding overall support for the open source OS to SQL Server.
SQL Server 2019, currently available as a preview release, provides data transformation and integration functionality for both structured and unstructured data via SSIS with the help of the Spark processing engine. It adds an application deployment and management environment for running SSIS jobs on big data clusters that combine SQL Server with Spark and the Hadoop Distributed File System. SSIS now runs in Windows, Linux and Docker container environments.
Oracle Data Integrator 12c
Oracle Data Integrator 12c is best for organizations that use other Oracle systems and applications and want tight data integration with these systems.
The platform combines with Oracle Database, Oracle GoldenGate, Oracle Fusion Middleware, Oracle Big Data Appliance and Exadata. Its core functionality is based on extract, load and transform (ELT) architecture. This architecture enables the software to utilize the functionality, scalability and performance capabilities of relational database management systems and big data systems.
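The ELT ordering can be sketched as follows. This is an illustration in Python with SQLite, not Oracle Data Integrator's API; the table names, sample rows and `run_elt` function are assumptions. Raw rows land in a staging table untouched, and the database engine then performs the transformation in SQL.

```python
import sqlite3

def run_elt(conn, raw_rows):
    """Load raw rows first, then transform them inside the database with SQL."""
    # Load: land the source rows untouched in a staging table.
    conn.execute(
        "CREATE TABLE staging_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)
    # Transform: the database engine does the cleansing and aggregation.
    conn.execute(
        """
        CREATE TABLE customer_totals AS
        SELECT LOWER(TRIM(customer)) AS customer,
               SUM(CAST(amount AS REAL)) AS total
        FROM staging_orders
        GROUP BY LOWER(TRIM(customer))
        """
    )
    return conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()

# Hypothetical raw extract: everything arrives as untyped text.
RAW_ROWS = [("1", " Alice ", "10.50"), ("2", "BOB", "20.00"), ("3", "alice", "4.50")]

if __name__ == "__main__":
    print(run_elt(sqlite3.connect(":memory:"), RAW_ROWS))
```

Pushing the transform step into the target means the work scales with the database rather than with a separate integration server, which is the benefit the ELT approach is meant to deliver.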
Oracle Data Integrator Enterprise Edition includes bundled support for the unstructured big data environment, as it natively runs integration workloads in Spark, Kafka, Hive, HBase, Sqoop, Pig and Cassandra. A separate Oracle Data Integrator for Big Data tool offers that functionality on a stand-alone basis.
Integration jobs can be deployed as bulk loads, in batch or real-time mode, in the cloud or through web services.
The latest version features increased interoperability with Oracle Warehouse Builder, as well as a tool that enables migration from OWB to Oracle Data Integrator. It can integrate with Oracle Enterprise Manager Cloud Control 12c to manage deployments of Oracle products. Users of Oracle Data Integrator can deploy real-time data integration, utilizing Oracle GoldenGate, which allows faster and more efficient loading and transformation of real-time data, according to the company.
SAP Data Services
SAP Data Services can be used stand-alone or with other SAP products. It provides data integration, data transformation, data quality, data profiling and text data processing. Typically used by large enterprises, its best bang-for-buck implementation is in SAP shops that use other SAP applications.
This tool discovers, cleanses, enhances, integrates and manages data from SAP and non-SAP sources, including relational databases, enterprise applications, files and big data sources such as Hadoop and NoSQL databases. It provides data integration and data quality capabilities that utilize SAP Information Steward for enterprise data management and governance.
Key improvements in the latest version, 4.2, include enhanced SAP HANA support as well as better connectivity to various relational and big data sources, including text and XML processing. The SAP HANA support is critical for shops that want to expand their use of analytics. The latest version of the SAP Data Services data integration module also provides expanded developer workbench functionality and usability, as well as enhanced deployment and monitoring operations.
SAS Data Management
SAS Data Management is a unified data integration, data quality, data governance and MDM platform from SAS Institute Inc. It works independently of SAS analytical and statistical packages, but customers can also use it with the full spectrum of SAS products. Large enterprises frequently use the SAS Data Management platform, especially those with data integration and quality needs related to enterprise applications and MDM. This platform is best suited for shops that use other SAS applications.
The applications and tools bundled together in the platform give users the ability to discover, transform, cleanse, enrich, integrate, deliver and govern data from databases, enterprise applications, mainframe legacy files, text, XML, message queues and big data structures, according to the company. Users can also integrate and cleanse data from disparate systems such as on-premises and cloud sources, messaging systems, web services, proprietary systems and social media.
SAS Data Integration Studio, DataFlux and other integration technologies included in SAS Data Management can deploy either an extract, transform and load or an extract, load and transform architecture. Data integration and data quality processes can deploy in batch mode, near real time and real time, depending on the necessary service, using message queues or web services. In addition, data federation provides integrated views of data in different source systems.
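Data federation differs from ETL and ELT in that nothing is copied into a new target; a query combines the sources where they sit. The sketch below is a loose analogy in Python, not SAS functionality: it uses SQLite's `ATTACH DATABASE` to place two separate in-memory databases behind one connection, and the table names and sample rows are hypothetical.

```python
import sqlite3

def build_federated_connection():
    """Open one connection that can see two separate databases."""
    conn = sqlite3.connect(":memory:")  # stands in for an order system
    conn.execute("ATTACH DATABASE ':memory:' AS crm")  # stands in for a separate CRM
    conn.execute("CREATE TABLE main.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (customer_id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO main.orders VALUES (?, ?)", [(1, 10.0), (1, 5.0), (2, 7.5)]
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return conn

def customer_totals(conn):
    """Join across both databases at query time without copying any data."""
    return conn.execute(
        """
        SELECT c.name, SUM(o.amount)
        FROM crm.customers AS c
        JOIN main.orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
        """
    ).fetchall()

if __name__ == "__main__":
    print(customer_totals(build_federated_connection()))
```

The trade-off is that every query pays the cost of reaching the sources, so federation tends to suit integrated views that must be current more than heavy analytical workloads.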
SAS Data Management is available in standard and advanced editions. It's part of SAS 9.4, which combines all of the software vendor's analytics and data management products into a single platform. SAS Data Management components can also now be used with SAS Viya, a cloud-based analytics suite that SAS launched in 2016 as the new centerpiece of its analytics strategy.
New features in SAS Data Integration Studio 4.9, the latest version of the tool, include the following:
- a Git version control plugin for archiving integration jobs and other objects so changes to them can be tracked in metadata;
- a table maintenance transformation that enables users to make changes to database tables with pass-through SQL code to Oracle and other database systems;
- improved connectivity to Hadoop systems, plus updated support for transformations with SQL-on-Hadoop engines Hive, HAWQ and Impala; and
- transformation features for data stored in Amazon S3 (Simple Storage Service), and support for loading data into the Amazon Redshift cloud data warehouse.
Editor's note: Using extensive research into the data integration market, TechTarget editors focused on the vendors that lead in market share, plus those that offer traditional and advanced functionality. Our research included data from TechTarget surveys, as well as reports from other respected research firms, including Gartner and Forrester Research.
Talend Data Management Platform
SMBs were the primary target market for Talend's data integration software in the past, but the vendor is increasingly competing for larger enterprises with Talend Data Management Platform, which combines its integration tools with data quality, data profiling and data governance capabilities. Through the Talend Data Fabric framework, the platform can also be used along with the company's other products, such as Talend Cloud Pipeline Designer, Stitch Data Loader, Talend Big Data Platform and Talend Data Preparation, for a variety of integration use cases.
Talend also offers a free open source tool called Talend Open Studio for Data Integration. Enterprises may use that product either because they have limited integration needs and prefer to avoid writing custom-coded integration or because they're utilizing the free option to explore Talend's integration functionality. The free option is a great advantage for shops that want to get their feet wet with the product before making a larger investment decision, the company claims.
Talend Data Management Platform works independently or with the other products in Talend's portfolio, but it is at its best when it is working with other Talend applications. The software uses both ETL and ELT functionality to process data from and to source and target databases during data integration.
Version 7.1 of Talend Data Management Platform includes the following features:
- support for the OpenJDK implementation of Java;
- enhanced support for Amazon S3 and Redshift, as well as the Snowflake cloud data warehouse;
- added support for Oracle Database 18c and MySQL 8, plus expanded integration features for SAP HANA databases;
- multi-input hierarchical data mapping capabilities to help users do advanced mapping jobs in a GUI; and
- support for word-based pattern profiling for data discovery and data preparation uses.
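Word-based pattern profiling can be illustrated with a short Python sketch. Talend's actual implementation is not described here, so the approach below is an assumption: each whitespace-separated word in a value is mapped to a coarse class, and the resulting patterns are counted to show which formats dominate a column. The class labels and sample values are hypothetical.

```python
import re
from collections import Counter

def word_pattern(value):
    """Map each whitespace-separated word to a coarse class label."""
    classes = []
    for word in value.split():
        if re.fullmatch(r"\d+", word):
            classes.append("<num>")
        elif re.fullmatch(r"[A-Za-z]+", word):
            classes.append("<word>")
        else:
            classes.append("<mixed>")
    return " ".join(classes)

def profile(values):
    """Count how often each word pattern occurs, most frequent first."""
    return Counter(word_pattern(v) for v in values).most_common()

# Hypothetical address column with inconsistent formats.
ADDRESSES = ["12 Main Street", "7 Oak Avenue", "PO Box 44", "Suite 3B"]

if __name__ == "__main__":
    for pattern, count in profile(ADDRESSES):
        print(count, pattern)
```

A profile like this shows at a glance which formats are the norm and which values are outliers that may need cleansing before integration.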