Upbound's Crossplane infrastructure-as-code tool beat out incumbent HashiCorp Terraform at two European banks that favored the newcomer's support for asynchronous updates at scale.
Crossplane emerged as a potential IAC disruptor when it was promoted from the sandbox to the incubation stage under the Cloud Native Computing Foundation in 2021. The tool uses the Kubernetes control plane to orchestrate resources outside container clusters through YAML code, in contrast to Terraform's domain-specific language and Pulumi's general-purpose programming languages.
A year later, multiple speakers at KubeCon + CloudNativeCon North America named open source Crossplane their IAC tool of choice in their presentations. In April, Crossplane creator Upbound released its first commercial product: a managed control plane service.
Cloud engineers at banks in Germany and Portugal were among Crossplane's open source adopters and have become paying customers through professional services support contracts with Upbound. But for these companies, the fact that it used Kubernetes as a control plane wasn't the most important factor.
Rather, what mattered was that the tool paired a control plane with eventual consistency. Those two aspects of Crossplane's architecture set it apart from HashiCorp Terraform, which builds a dependency graph and requires updates to deployments to be planned and then applied, sometimes painstakingly.
"If a lot of developers were running concurrent deployments on the infrastructure, it was very slow" using Terraform, said Christopher Haar, lead cloud engineer at Deutsche Kreditbank AG, based in Berlin. "You needed to wait days to deploy things on the environments, or you had to troubleshoot [problems with] dependencies between a lot of stacks."
Deutsche Kreditbank slashes deployment times
The problem with Terraform at high scale, where developers made frequent changes to environments with many interdependencies, lay in the way Terraform builds a dependency graph from resource configurations and then "walks" that graph to generate new IAC deployment plans and refresh state.
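To make the plan-then-apply model concrete, here is a toy Python sketch that computes a complete apply order by walking a dependency graph before anything is changed. It is a simplified illustration, not Terraform's implementation: the resource names are hypothetical, and the sketch ignores Terraform's parallel graph walk, state refresh and provider calls. It uses the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def plan(resources: dict[str, set[str]]) -> list[str]:
    """Compute a full apply order up front, or fail before changing anything.

    `resources` maps each (hypothetical) resource name to the names it depends on.
    """
    declared = set(resources)
    # Every dependency must be declared in the same configuration; otherwise
    # the whole plan is rejected rather than applied partially.
    for name, deps in resources.items():
        missing = deps - declared
        if missing:
            raise ValueError(f"{name} depends on undeclared {sorted(missing)}")
    # static_order() yields each resource only after all of its dependencies,
    # and raises graphlib.CycleError (a ValueError) if they form a cycle.
    return list(TopologicalSorter(resources).static_order())


# Hypothetical AWS networking resources with overlapping dependencies.
network = {
    "vpc": set(),
    "transit_gateway": {"vpc"},
    "vpn": {"vpc", "transit_gateway"},
    "eks_cluster": {"vpc"},
}
order = plan(network)
```

Here `vpc` always comes first and `vpn` always follows `transit_gateway`. A dependency on anything outside the configuration, or a dependency cycle, makes `plan` raise before a single resource is touched, which is the all-or-nothing behavior the banks ran into.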
"Terraform calculates all the dependencies before executing something. With Crossplane, you can deploy everything, and it has a reconciliation mechanism which checks if dependencies are resolvable or not," Haar said. "You can deploy a lot of things together, and then the reconciliation mechanism is doing the rest for you automatically."
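The reconciliation model Haar describes can be illustrated with an equally simplified Python sketch. Resources can be submitted in any order; each pass acts only on those whose dependencies are already ready and leaves the rest pending for a later pass, so one unresolvable dependency does not block everything else. The `Resource` class, the resource names and the fixed pass limit are assumptions made for illustration. This is not Crossplane's code, and real Kubernetes controllers requeue work in response to events and with backoff rather than looping a fixed number of times.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Hypothetical desired-state record for one piece of infrastructure."""
    name: str
    depends_on: set[str] = field(default_factory=set)
    ready: bool = False


def reconcile_once(desired: dict[str, Resource]) -> list[str]:
    """One pass of a toy reconciler; returns names that became ready this pass."""
    progressed = []
    for res in desired.values():
        if res.ready:
            continue  # already matches desired state; nothing to do
        # A dependency counts as resolvable only if it is declared and ready.
        if all(dep in desired and desired[dep].ready for dep in res.depends_on):
            res.ready = True  # stand-in for a real cloud API call
            progressed.append(res.name)
        # Otherwise leave it pending; a later pass will retry it.
    return progressed


def reconcile(desired: dict[str, Resource], max_passes: int = 10) -> list[str]:
    """Run passes until nothing changes; return the names still pending."""
    for _ in range(max_passes):
        if not reconcile_once(desired):
            break
    return sorted(name for name, res in desired.items() if not res.ready)


# Deliberately submitted out of order, plus one dependency that never appears.
desired = {
    r.name: r
    for r in [
        Resource("vpn", {"vpc", "transit_gateway"}),
        Resource("transit_gateway", {"vpc"}),
        Resource("vpc"),
        Resource("peering", {"partner_vpc"}),  # never declared, never resolvable
    ]
}
pending = reconcile(desired)
print(pending)  # ['peering']
```

Here `vpn` is declared before the resources it depends on and still converges after a few passes, while `peering` simply stays pending instead of failing the whole batch.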
For example, Deutsche Kreditbank has a lot of overlap among AWS network resources such as VPCs, VPNs and transit gateways. In Terraform, updates to these resources could get "stuck." Developers then had to discuss which dependencies should be updated first and restart the updates manually, a process that could take days, according to Haar.
Before it started using Crossplane 18 months ago, the bank's cloud team considered building its own reconciliation system for Terraform. It also considered Pulumi's IAC tools, which some of the team had used in the past.
But with 90% of the environment deployed via Kubernetes, the team was well versed in YAML and used to working with its control plane, Haar said. Thus it was most efficient to use Kubernetes to manage the rest of the underlying infrastructure too.
"In the past, with the Terraform approach, we created clusters in weeks," he said. "At the moment, we are creating clusters with Crossplane in under one hour."
Millennium BCP goes for simplified dev self-service
Millennium BCP, Portugal's largest privately owned bank, took a similar path from Terraform to Crossplane in the last 18 months, but with the intermediate step of using Crossplane to manage reconciliation for 170,000 lines of Terraform code.
"Our first goal was, we needed something to run and manage Terraform for us once we had the [IAC] pattern implemented," said Nuno Guedes, cloud compute lead for Millennium BCP, based in Porto, Portugal. "It's running Terraform every four hours, making sure it's checked, reconciled and properly configured."
This was in keeping with the cloud team's desire to encourage developers to use its "golden path" platform engineering patterns without disrupting their existing workflow, Guedes said.
Last year, HashiCorp introduced a reconciliation mechanism for Terraform called continuous validation. But Crossplane also offers developers a more abstract IAC system to interact with, Guedes said.
"Running Terraform isn't the end goal for us. The end goal for us is moving away from it," he said. "[Crossplane has] the ability to define patterns without binding things to engineering details. … For instance, if a developer wants a small relational database, that's what they should be specifying, not that they want this or that database engine or whatever. That's an implementation detail."
This kind of abstraction is preferable to managing dozens of Terraform state files and modules at scale, Guedes said.
"We're at a point where, for instance, [if] someone says, 'I'm going to start a new microservice' … using Terraform, we're spinning up over 100 resources internally, [plus] Kubernetes and observability," he said. "Just running that kills over a minute for that instantiation … We are at a different level of complexity. We need something that scales."
Pros and cons of making an infrastructure-as-code switch
As with Deutsche Kreditbank, Millennium BCP's cloud team is steeped in Kubernetes -- Guedes estimated that 90% of his company's IT environment runs on Kubernetes as well. And like its German counterpart, Millennium signed on for professional support with Upbound after starting with the open source Crossplane project.
Another thing the two companies have in common is that they can't use Upbound's only formal commercial product, because security and governance requirements won't allow them to use a vendor-managed service for a tool with direct access to provisioning infrastructure.
Both companies leaned on Upbound's professional services and their own internal expertise to fill in gaps in Crossplane's integrations with third-party vendors. Upbound teams helped Millennium connect Crossplane with its Datadog observability tools. Deutsche Kreditbank created its own integration for OpenSearch.
Migrating from Terraform on self-hosted Kubernetes clusters and AWS EC2 instances to Crossplane-compatible clusters on Amazon EKS has been a complex undertaking for Haar's team at Deutsche Kreditbank.
"It sounds easy, but in detail, it's not so easy, because we need to switch everything under the hood -- networking, different versions of Kubernetes, everything," he said.
Compounding the complexity is the sheer number of custom resource definitions (CRDs) that Crossplane providers can install on a Kubernetes cluster, Haar said.
"If you're running with 700, 800, 900 CRDs on the cluster, then the cluster [becomes] unresponsive," he said. "We're dealing with this by forking the providers from Upbound so that it's operable on our platform, but this is something we want them to solve on their side [by] filtering CRDs, splitting up providers so we're only installing what we really need in the cluster."
Upbound is working to expand its supported providers and to address some customers' more rigorous security requirements, said Oren Teich, the company's chief product officer, in an email to TechTarget Editorial this week.
This month, Upbound also introduced Provider Families "specifically to address [CRD management] concerns" such as those expressed by Haar, Teich wrote.
Terraform closes management gaps
Since Deutsche Kreditbank and Millennium BCP jumped ship, HashiCorp has made changes to Terraform that address some of their issues. In addition to adding reconciliation with continuous validation, HashiCorp has shipped collaboration and quality-of-life updates for Terraform in the last 18 months. These include a Workspace Overview UI for HashiCorp Cloud Platform that gives administrators detailed, real-time information about what's happening during Terraform runs and which resources are affected. This could shorten the time it takes to resolve incompatible dependencies.
HashiCorp Terraform 1.5, released June 12, also streamlined the process of importing existing infrastructure resources into Terraform runs and added a "checks" construct to further fine-tune continuous validation. Terraform Cloud administrators got another collaboration boost with an Explorer feature released June 13 that they can use to ensure DevOps teams use the most current versions of Terraform modules, providers and Terraform itself.
Finally, in the realm of collaboration among teams using interlinked systems, Terraform Cloud will soon support ephemeral workspaces that administrators can assign a finite time to live, after which they are destroyed automatically or a reminder is issued to destroy them manually. This is meant to avoid problems with orphaned workspaces that can bog down systems over time, according to HashiCorp officials who briefed TechTarget Editorial on the news this month.
"In my experience, Terraform states do become overpopulated. But rather than splitting resources into multiple states, technical solutions and Terraform advancements have covered the gap," said Kyler Middleton, senior principal software engineer at healthcare tech company Veradigm, which automates Terraform management via GitHub Actions. "Kind of like how we've broadly predicted we'll run out of food globally a few times over the past 150 years, and then agriculture innovates, and then we don't run out of food."
Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.