Arcion updated its namesake platform with a series of improvements designed to accelerate the movement of new data through data pipelines.
At the foundation of the Arcion platform are change data capture (CDC) capabilities that enable an organization to get data out of one source, like a database or an online SaaS platform such as Salesforce, and into another database as part of a data pipeline.
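At its core, CDC means capturing each change as an event and applying it to the target. A minimal sketch of that idea, using an in-memory target and hypothetical event names (not Arcion's actual API), might look like this:

```python
# Minimal sketch of change data capture (CDC): change events captured
# from a source are applied, in order, to a target table.
# All names here are illustrative, not Arcion's API.

def apply_change(target, event):
    """Apply one captured change event to a target table (a dict keyed by id)."""
    op, row = event["op"], event["row"]
    if op in ("insert", "update"):
        target[row["id"]] = row          # upsert the changed row
    elif op == "delete":
        target.pop(row["id"], None)      # remove the deleted row
    return target

# A stream of changes captured from the source as they happen:
events = [
    {"op": "insert", "row": {"id": 1, "name": "Ada"}},
    {"op": "update", "row": {"id": 1, "name": "Ada Lovelace"}},
    {"op": "insert", "row": {"id": 2, "name": "Alan"}},
    {"op": "delete", "row": {"id": 2}},
]

target = {}
for event in events:
    apply_change(target, event)

print(target)  # {1: {'id': 1, 'name': 'Ada Lovelace'}}
```

Because events are applied in the order they were captured, the target converges to the source's current state without a full reload.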
The updates to the Arcion platform include better integration with Oracle Autonomous Database, as well as new support for Google's BigQuery, Microsoft Azure-managed SQL Server, and Imply, which is based on Apache Druid. Arcion also added what it calls a "schema evolution" feature that aims to keep data structures updated across the data pipeline.
"CDC isn't a new market anymore," Gartner analyst Sharat Menon said. "For scalable growth, Arcion, or any other new vendor, for that matter, will need to first find innovative ways to grab mind share away from the established vendors."
Menon noted that while Arcion has multiple integrations and partnerships with cloud data platform vendors, it still needs more technology and go-to-market partnerships with product vendors.
Why CDC is needed for modern data pipelines
CDC is an increasingly relevant piece of the data landscape, Menon said.
Traditional data integration workloads were batch oriented, which was good for extracting, transforming and loading (ETL) data at predefined intervals. While those workloads are still there, organizations increasingly require ETL operations that are closer to real time.
"There are many use cases today, such as real-time product recommendations and fraud detection that require data to be moved in near real time, in order to support business decisions in near real time," Menon said. "CDC supports such workloads."
Modern CDC isn't just about basic data replication either.
While database management systems have provided data replication capabilities for decades, organizations now require standalone products that can use logs or triggers to capture changes from a wide range of source databases and replicate those changes to a wide range of target databases, Menon said. And as cloud migration projects become commonplace in many organizations, replicating data from on-premises databases to cloud databases has become a critical requirement, he added.
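The trigger-based approach Menon mentions can be demonstrated end to end with SQLite standing in for any source database; the table and trigger names below are illustrative:

```python
# A sketch of trigger-based change capture: triggers on the source
# table record every change into a change_log table that a
# replication process can read. SQLite stands in for the source here.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE change_log (op TEXT, id INTEGER, name TEXT)")

# Triggers capture each change into change_log as it happens:
conn.execute("""
    CREATE TRIGGER capture_insert AFTER INSERT ON customers
    BEGIN
        INSERT INTO change_log VALUES ('insert', NEW.id, NEW.name);
    END
""")
conn.execute("""
    CREATE TRIGGER capture_update AFTER UPDATE ON customers
    BEGIN
        INSERT INTO change_log VALUES ('update', NEW.id, NEW.name);
    END
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada Lovelace' WHERE id = 1")

# A replication process would read change_log and apply it to targets:
changes = conn.execute("SELECT op, id, name FROM change_log").fetchall()
print(changes)  # [('insert', 1, 'Ada'), ('update', 1, 'Ada Lovelace')]
```

Log-based capture works the same way in spirit, but reads the database's own transaction log instead of maintaining triggers, which keeps overhead off the source tables.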
Arcion adds schema evolution and column transformation
The vendor's platform update includes features designed to specifically address the needs of modern data pipelines, Arcion CEO and president Gary Hagmueller said.
A primary need is for the CDC technology to work as changes happen, rather than in a batch process at a predetermined time interval.
Arcion detects a change in the source system so it can be communicated to the target system. A challenge arises when the schema itself changes, for example when a new column is added.
In that scenario, simply communicating the data change isn't enough; the target database also needs to change to match the new schema. In the past, that was time-consuming, as the source data had to be snapshotted and then rebuilt on the target system.
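The underlying problem can be sketched with SQLite standing in for both source and target; this illustrates the general technique of propagating a DDL change rather than re-snapshotting, not Arcion's implementation:

```python
# A sketch of propagating a schema change (a new column) from a source
# table to a target table without rebuilding the target. SQLite stands
# in for both databases; table and column names are illustrative.
import sqlite3

src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, total REAL)")
tgt.execute("CREATE TABLE orders (id INTEGER, total REAL)")

# The source schema evolves: a new column appears.
src.execute("ALTER TABLE orders ADD COLUMN currency TEXT")

def columns(conn, table):
    """Return {column_name: column_type} for a table."""
    return {row[1]: row[2] for row in conn.execute(f"PRAGMA table_info({table})")}

# Instead of re-snapshotting, apply the equivalent DDL to the target:
for name, ctype in columns(src, "orders").items():
    if name not in columns(tgt, "orders"):
        tgt.execute(f"ALTER TABLE orders ADD COLUMN {name} {ctype}")

print(sorted(columns(tgt, "orders")))  # ['currency', 'id', 'total']
```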
With Arcion's new schema evolution feature, schema changes are propagated to the target faster and more efficiently, Hagmueller said. In the update, Arcion also integrated a column transformation capability that enables basic data transformations as part of the data pipeline.
For example, in a transactional database, a transformation could add column A and column B together and write the result to column C. Arcion is not doing complex transformations in which additional programming or business logic would be required, Hagmueller said. For complex transformations, he noted that Arcion partners with DBT Labs, a provider of data transformation technology.
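The kind of basic, in-pipeline transformation described above can be sketched as a function applied to each row as it flows from source to target; the column names are illustrative:

```python
# A sketch of an in-pipeline column transformation: deriving
# column c = a + b on each replicated row. Names are illustrative,
# not Arcion's API.

def transform_row(row):
    """Add a derived column c = a + b to each replicated row."""
    out = dict(row)
    out["c"] = row["a"] + row["b"]
    return out

source_rows = [{"a": 2, "b": 3}, {"a": 10, "b": 5}]
target_rows = [transform_row(r) for r in source_rows]
print(target_rows)
# [{'a': 2, 'b': 3, 'c': 5}, {'a': 10, 'b': 5, 'c': 15}]
```

Anything beyond simple per-row derivations like this would fall to a dedicated transformation tool, per the partnership the article describes.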
Looking forward, Hagmueller said Arcion platform updates will focus on making the management aspect of CDC-driven data pipelines easier for users.
"So, going beyond the user interface are there things we can do to identify if something is wrong with the data pipeline?" he said. "We basically want to get to a place where it's really easy for people to manage this stuff."