Third-party cloud services for ETL and application testing
Don't default to your cloud provider's ETL and app testing services. Get to know third-party offerings and explore the advantages of using these tools in your cloud environments.
While the major public cloud providers offer a breadth of infrastructure primitives and higher-level functionality, their native services aren't necessarily the best choice for every workload. To fill the gaps in their IT strategy, users look to third-party services to supplement the tools from their primary cloud provider.
As outlined in part 1 of this series, specialized third-party services for imaging, search and authentication can be far superior to similar feature sets available natively on the major cloud providers' platforms. However, those are not the only areas where enterprises could benefit from third-party cloud services.
Here, in part 2 of this series, explore how third-party tools for extract, transform and load (ETL) and application testing can be well worth the investment.
ETL and data pipeline tools
IT staff typically develops ETL and application testing internally, under the assumption that the work is highly specific to the data and applications involved. However, a growing number of companies, including cloud providers, offer these capabilities as a service.
The cloud providers' ETL tools are built as integrated parts of their infrastructure offerings. But none of these tools are comprehensive, and some of them are painfully inferior to popular third-party alternatives.
ETL tools move data from one location and format to another location and format. Over time, IT pros started to call the process of these data movements a data pipeline. Twenty years ago, data pipelines were built so users could structure data perfectly for enterprise data warehouses. Today, similar processes are built to fill data lakes, but they usually don't change the data structure as much. Instead, they rely upon the ability of developers, data analysts and data scientists to query heterogeneous data lakes quickly and effectively.
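The three ETL stages can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library, not a model of any vendor's product; the column names, the sample data and the choice of newline-delimited JSON as the target format are assumptions made for the example.

```python
import csv
import io
import json


def extract(csv_text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: rename fields, trim whitespace and convert types."""
    cleaned = []
    for row in rows:
        cleaned.append({
            "customer_id": int(row["CustomerID"]),
            "email": row["Email"].strip().lower(),
            "total_usd": round(float(row["Total"]), 2),
        })
    return cleaned


def load(records):
    """Load: serialize records as newline-delimited JSON text."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


# Hypothetical source data, invented for this example.
SOURCE = "CustomerID,Email,Total\n1, Ada@Example.com ,19.5\n2,bob@example.com,7\n"

output = load(transform(extract(SOURCE)))
print(output)
```

In a real pipeline, the extract step would read from a database or API and the load step would write to object storage, but the shape of the process is the same.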
Cloud provider offerings
The Big Three cloud providers (AWS, Google and Microsoft) all have data lake products for depositing data files -- such as CSVs or Parquet files -- into object storage services so that data can then be queried. Additionally, AWS and Google have ETL tools to pull data into those data lakes.
With AWS Glue, developers build scripts that run on Apache Spark to take data from a data store that is on AWS -- including Amazon DynamoDB, Amazon Relational Database Service and CSVs in Amazon S3 -- and put it into some other data store on AWS, in a particular format.
Typically, the goal is to pull data from various databases into files on S3 so that it can be queried by Amazon Athena. Unfortunately, Glue and Athena are not the easiest systems to use. In order to move data regularly and get it ready to query, multiple Glue jobs are required, and developers or data analysts must explicitly specify how the data is set up in Athena. Additionally, many of the transformations require writing Python code before the Glue job will work properly.
Google's Dataflow is similar to AWS Glue, except it isn't built on Apache Spark. Instead, Dataflow runs pipelines written with Apache Beam, a programming model that originated at Google, and it is generally more user-friendly and requires less tweaking and coding than AWS tools. It is, however, more limited than Glue and Spark in terms of which sources it can accept natively.
Microsoft Azure offers Data Factory, although it is more common for Azure customers to use Microsoft SQL Server Integration Services (SSIS) for ETL tasks. However, SSIS doesn't work for non-Microsoft databases or file formats, and ETL tasks must be implemented manually in T-SQL. Data Factory doesn't provide anything more than SSIS does beyond scheduling jobs.
Third-party ETL tools
In addition to doing all the same things as cloud providers' ETL tools, third-party providers support significantly more data types. Overall, they're also easier to use and have better operational stability.
Fivetran and Stitch are two of the top third-party ETL services. In addition to handling basic ETL functionality, they support a wide variety of SaaS sources for data extraction, including Zendesk, Salesforce and Stripe. These ETL services will schedule and run custom data retrieval functions and help with regulatory compliance.
They also provide better monitoring, retry and recovery capabilities than the cloud providers' native tools. And they handle changes to underlying data schemas without issue, whereas other ETL tools require reconfiguration upon any change.
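To make the schema-change point concrete, here is a minimal Python sketch of one way a loader can tolerate new fields: it widens the target column list instead of failing. This is an illustration only and does not describe how Fivetran or Stitch work internally; the function names and sample records are hypothetical.

```python
def merge_schema(known_columns, record):
    """Return the column list widened with any fields not seen before."""
    new_columns = [key for key in record if key not in known_columns]
    return known_columns + new_columns


def load_batch(known_columns, records):
    """Build rows for a batch, adding columns as new fields appear."""
    columns = list(known_columns)
    for record in records:
        columns = merge_schema(columns, record)
    # Records that lack a newer column get None for it.
    rows = [[record.get(column) for column in columns] for record in records]
    return columns, rows


# Hypothetical batch: the second record introduces a "plan" field.
batch = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b", "plan": "pro"},
]
columns, rows = load_batch(["id", "name"], batch)
print(columns)
print(rows)
```

A tool that requires reconfiguration would instead reject the second record or silently drop the new field until someone updates the job.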
Fivetran and Stitch even process data better from cloud-specific services than the actual cloud providers. For example, DynamoDB, which is a key-value and document-style database, can be a difficult data source because a top level "field" could have hundreds of objects in it. That data needs to be transferred to a data lake in a way that enables everything to be queried within that record.
AWS Glue compounds those issues because it's slow, requires multiple steps for each transfer and doesn't handle schema changes automatically. Fivetran and Stitch, on the other hand, can process DynamoDB with significantly less effort and into a better format.
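The nested-record problem described above can be illustrated with a short Python sketch that flattens a document-style item into dotted column names, which a query engine can then treat as ordinary columns. This is one simple approach under stated assumptions, not the method any of these services uses; the sample item is invented, and lists are left as-is rather than expanded.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single level of dotted keys."""
    flat = {}
    for key, value in record.items():
        path = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, path, sep))
        else:
            # Scalars and lists are kept unchanged in this sketch.
            flat[path] = value
    return flat


# Hypothetical document-style item with nested objects.
item = {
    "order_id": "o-1",
    "customer": {"name": "Ada", "address": {"city": "London"}},
    "tags": ["gift", "rush"],
}
flat = flatten(item)
print(flat)
```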
These third-party ETL services are also completely cloud-agnostic, so developers can pull and deposit data across cloud platforms.
Application testing tools
Most companies now develop at least some software internally, and they need to be able to test that software properly. Traditionally, organizations have handled application testing as an internal function, done manually by quality assurance staff.
Similar to ETL tooling, there has been a rise in application testing tools and services from the major cloud providers, as well as from third parties seeking to meet this increased demand. But unlike ETL services, the application testing space is very broad and consists of many different offerings -- some competing, but many complementary.
Cloud provider offerings
AWS and Google focus on mobile device testing. Developers use AWS Device Farm to automatically or manually try out mobile applications on a variety of devices and web browsers. The service also supports running Selenium and JUnit tests.
On Google Cloud, developers can use Firebase Test Lab, which is similar to Device Farm but is easier to integrate into cloud environments and simpler to use. However, it does have some downsides relative to AWS: It supports fewer devices, and developers have reported stability issues that force them to rerun tests.
Microsoft Azure currently does not have a comparable offering, but it does have a service called Azure Test Plans, which is a project-management tool to organize how IT teams might run manual tests.
Third-party application testing tools
There are many third-party cloud services for application testing that cover different user requirements. Many of these tools fit into the category of organizing and outsourcing manual testing, but there is a growing number of companies that drive automated testing in new ways that surpass the offerings of AWS and Google in the testing space.
ProdPerfect and Testim.io have recently carved out the AI-driven automated test suites category. Both companies provide different ways to use AI to build and update automated feature tests for web applications so developers can identify regressions before updated versions are released.
ProdPerfect tracks usage of applications over several weeks. It uses that tracking, along with AI, to define a small set of tests that covers a large percentage of the paths that end users take through an application. ProdPerfect builds and maintains those tests for its customers, even as applications change.
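The idea of covering most user behavior with a small set of tests can be sketched as a frequency count over recorded sessions. The Python below is a simplified stand-in, not ProdPerfect's actual method, which the article describes only as usage tracking combined with AI; the function name, the 80% target and the session data are assumptions.

```python
from collections import Counter


def select_paths(sessions, target_coverage=0.8):
    """Pick the most common paths until they cover the target share of sessions."""
    counts = Counter(tuple(session) for session in sessions)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    selected, covered = [], 0
    for path, count in counts.most_common():
        if covered / total >= target_coverage:
            break
        selected.append(path)
        covered += count
    return selected, covered / total


# Hypothetical recorded sessions: each is the ordered list of pages visited.
sessions = (
    [["home", "search", "checkout"]] * 6
    + [["home", "login"]] * 3
    + [["home", "help"]] * 1
)
selected, coverage = select_paths(sessions)
print(selected, coverage)
```

With this sample data, two of the three distinct paths account for nine of the 10 sessions, so the rarely used help path is left without a dedicated test.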
Testim provides a way to build tests visually or with code, and it uses AI to make the tests more resilient to changes. So, if a developer modifies an application to move a button around or change an end user interaction, Testim gives the IT team a good shot at not having to modify the interface tests.
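One way to picture a test that survives interface changes is a locator that remembers several attributes of an element and accepts the closest match, rather than depending on a single ID. The Python sketch below uses a plain attribute-overlap score as a stand-in; it is not Testim's algorithm, and the attribute names, threshold and page data are hypothetical.

```python
def match_score(fingerprint, element):
    """Share of remembered attributes that the candidate element still has."""
    matches = [key for key in fingerprint if element.get(key) == fingerprint[key]]
    return len(matches) / len(fingerprint)


def locate(fingerprint, elements, threshold=0.5):
    """Return the best-matching element, or None if nothing is close enough."""
    best = max(elements, key=lambda e: match_score(fingerprint, e), default=None)
    if best is None or match_score(fingerprint, best) < threshold:
        return None
    return best


# Attributes recorded when the test was first written (hypothetical).
fingerprint = {"id": "buy-btn", "text": "Buy now", "tag": "button", "css_class": "primary"}

# The page after a release in which a developer renamed the button's ID.
new_page = [
    {"id": "purchase", "text": "Buy now", "tag": "button", "css_class": "primary"},
    {"id": "cancel", "text": "Cancel", "tag": "button", "css_class": "secondary"},
]
found = locate(fingerprint, new_page)
print(found)
```

A test that looked up the button by ID alone would fail after the rename, while the multi-attribute lookup still finds it because three of the four remembered attributes match.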
Applitools, another automated third-party application testing service, provides a way to quickly and easily get screenshots of how an application looks through testing flows, and it automatically identifies deviations between test runs.
This third-party cloud service can complete this task across different screen sizes and web browsers, so developers know how their application looks after they merge code. Applitools also provides several ways to handle dynamic content, like specifying that certain regions within a page may change and shouldn't be evaluated as strictly as others.
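The notion of ignoring regions with dynamic content can be shown with a toy screenshot comparison. In this Python sketch, screenshots are small grids of pixel values and regions are rectangles given as (top, left, bottom, right) with exclusive bottom and right edges. It is a conceptual illustration only, not Applitools' API or comparison method, and it assumes both grids have the same dimensions.

```python
def diff_pixels(baseline, current, ignore_regions=()):
    """Return (row, col) positions that differ outside the ignored regions."""

    def ignored(row, col):
        return any(
            top <= row < bottom and left <= col < right
            for top, left, bottom, right in ignore_regions
        )

    return [
        (r, c)
        for r, line in enumerate(baseline)
        for c, pixel in enumerate(line)
        if current[r][c] != pixel and not ignored(r, c)
    ]


# Hypothetical 3x3 screenshots: the top-right pixel is a timestamp that
# changes on every run, and the bottom-left pixel is a real regression.
baseline = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
current = [[0, 0, 9], [0, 0, 0], [5, 0, 0]]

all_diffs = diff_pixels(baseline, current)
real_diffs = diff_pixels(baseline, current, ignore_regions=[(0, 2, 1, 3)])
print(all_diffs, real_diffs)
```

Marking the timestamp area as ignorable removes the noise and leaves only the regression for a developer to review.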
Overall, it pays to investigate the options in the market -- both from cloud providers and third parties. It's time-consuming to understand everything that's available, but it's beneficial to the business to search for and evaluate ETL and application testing tools before deciding to build them from scratch or default to a cloud provider's offering.