Trifacta moves beyond data wrangling to DataOps

'Data wrangling' is a term that Trifacta assigns to data preparation, but there is more than that to the concept as part of the vendor's Data Engineering Cloud platform.

Sean Michael Kerner

Published: 28 Jul 2021

Trifacta expanded its cloud platform with the general availability of cloud data engineering templates to provide organizations with predefined processes and workflows for data pipelines.

Making data useful takes more than wrangling alone, that is, collecting data and transforming it into a shape suited to data analytics and business intelligence.

Trifacta, based in San Francisco, got its start in data wrangling, its term for preparing and organizing data. In April 2021, Trifacta introduced its Data Engineering Cloud platform, which moved the vendor firmly into DataOps by adding scaling and management of data operations to wrangling.

In this Q&A, Adam Wilson, who has been the CEO of Trifacta since 2014, outlines the changes in the market in recent years and explains where data wrangling and DataOps intersect.

Trifacta's cloud platform update went live July 27.

What have you seen as the big changes in the data industry in the time that you've led Trifacta?

Adam Wilson: What we've really seen since I joined seven years ago is a movement toward data operations beyond the biggest companies into the midmarket.

The most foundational shift that we've seen is that the analytics projects in the beginning, especially for the Fortune 500 and Global 2000 companies, were very stubbornly on premises. So, up until about 18 to 24 months ago, most of the big companies were still doing most of their data warehousing and advanced analytics on premises. That has now changed.

That's also why Trifacta, in the first quarter of the year, announced a repositioning of the company with the Data Engineering Cloud. That was a big shift for us to provide an end-to-end SaaS-based platform to do all of the data engineering work.

Now, with the new announcement for templates, users can share what they know with others in the organization, as well as with others outside of the organization.

What is the role of open source within a DataOps platform?

Wilson: There are a lot of what I would [call] point solutions solving very specific problems that can work for very technical users who want to stitch all of that together by hand, in order to create their overall data stack.

From a Trifacta perspective, we're trying to provide a bit more of a seamless experience that spans a complete set of activities. That includes everything from doing the connectivity piece and handling the data ingest, then profiling the data, understanding data quality, consistency, conformity, completeness and automating the process of cleaning up the data. Then, ultimately, you need the data operations component, which is all the scaffolding around how to scale and orchestrate data.

We integrate with a lot of open source technology under the covers. We tie into projects like dbt, Apache Spark, Apache Beam and Apache Airflow. We're fans of these open source projects.

What do you see as the difference between data wrangling and DataOps?

Wilson: People use different terms. For us, data wrangling is the cleansing, standardization and transformation of the data.

We see the DataOps piece of this as: How do I then take the work that an individual user or a small team of users is doing, and how do I scale that? How do I operationalize that? How do I think through the governance? How do I think through the monitoring?

That tends to be more of the operations piece, which is about taking the hard work that the end user is doing and putting that into production and then making that a reliable pipeline that a business can depend upon.
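To make the distinction concrete, the following is a minimal Python sketch, not Trifacta's API or implementation. The three functions mirror the wrangling steps Wilson names (cleansing, standardization and transformation), while `run_pipeline` adds a small piece of the operations side by reporting counts that could be monitored. The field names `region` and `amount`, the cleansing rule and all function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def cleanse(rows):
    """Cleansing: drop records missing a required field (hypothetical rule)."""
    return [
        r for r in rows
        if r.get("region") and r.get("amount") not in (None, "")
    ]


def standardize(rows):
    """Standardization: normalize region text and coerce amounts to float."""
    return [
        {"region": r["region"].strip().lower(), "amount": float(r["amount"])}
        for r in rows
    ]


def transform(rows):
    """Transformation: aggregate amounts per region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def run_pipeline(rows):
    """Operations side: run the wrangling steps and report basic counts."""
    cleaned = cleanse(rows)
    result = transform(standardize(cleaned))
    stats = {"rows_in": len(rows), "rows_dropped": len(rows) - len(cleaned)}
    return result, stats


# Hypothetical sample records, not real data.
raw = [
    {"region": " West ", "amount": "10.5"},
    {"region": "west", "amount": "4.5"},
    {"region": "", "amount": "3"},
    {"region": "East", "amount": None},
    {"region": "EAST", "amount": "2"},
]

totals, stats = run_pipeline(raw)
print(totals)
print(stats)
```

In a production setting, the counts in `stats` are the kind of signal a scheduler or monitoring tool might track from run to run; how that is done in any particular platform is outside what this sketch assumes.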

Editor's note: This interview has been edited for clarity and conciseness.
