An emerging open source project has caught the attention of enterprise early adopters with a developer-friendly approach to orchestrating cloud cost and security policies.
Cloud Custodian was first created in 2016 by Kapil Thangavelu, former Capital One engineer and now co-founder and CTO at Stacklet, which markets commercial products based on Cloud Custodian. The tool was donated to the Cloud Native Computing Foundation as an early-stage sandbox project in June 2020.
This month, the CNCF technical oversight committee promoted Cloud Custodian to the incubation stage, which means it has reached adoption and code committer benchmarks. Cloud Custodian has more than 350 committers from more than 130 organizations, including Intuit, Microsoft and HashiCorp; it's used in production by organizations such as Capital One, JPMorgan Chase & Co. and Siemens.
What sets Cloud Custodian apart for now, according to early adopters, is that it uses a relatively simple YAML-based DSL to both create and enforce security, governance-as-code and FinOps policies in the cloud. The tool also doesn't require additional persistent cloud resources to run, as it can be deployed via serverless functions.
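To illustrate what a declarative policy means in practice, the following minimal Python sketch treats a policy as plain data (a resource type, a list of filters and a list of actions) and evaluates it against an inventory. It is not Cloud Custodian's engine or its YAML schema; the policy name, resource type, `cpu_avg_percent` key, filter operators and the `evaluate` helper are all hypothetical and chosen only for illustration.

```python
# A policy expressed as data rather than code. Every name below is
# hypothetical; this does not reproduce Cloud Custodian's actual schema.
policy = {
    "name": "stop-idle-instances",
    "resource": "vm",
    "filters": [{"key": "cpu_avg_percent", "op": "lt", "value": 5}],
    "actions": ["stop"],
}

# Comparison operators a filter may name.
OPS = {
    "lt": lambda a, b: a < b,
    "gt": lambda a, b: a > b,
    "eq": lambda a, b: a == b,
}


def matches(resource, filters):
    """Return True when the resource satisfies every filter."""
    return all(OPS[f["op"]](resource[f["key"]], f["value"]) for f in filters)


def evaluate(policy, resources):
    """Return (resource id, action) pairs the policy would apply."""
    return [
        (r["id"], action)
        for r in resources
        if matches(r, policy["filters"])
        for action in policy["actions"]
    ]


inventory = [
    {"id": "vm-a", "cpu_avg_percent": 2},
    {"id": "vm-b", "cpu_avg_percent": 40},
]
print(evaluate(policy, inventory))
```

Because the policy is data, reviewing or handing it off means reading a short declaration rather than tracing API calls.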
"It has allowed us to automatically identify underutilized or unused resources and then quickly address them," said Lindbergh Matillano, director of cost optimization at digital tax compliance company Avalara in Seattle. "If any exceptions come up, it can immediately shut them down."
Cloud Custodian tidies FinOps for Avalara
Avalara's IT teams started using Cloud Custodian in small pockets about two years ago, primarily on AWS cloud resources. In addition to flagging cloud cost policy exceptions and misconfigured resources, Cloud Custodian can automatically shut down resources during the company's regular off hours.
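The off-hours idea can be sketched in a few lines of Python. This is an illustration only, not how Cloud Custodian or Avalara implements it; the business-hours window (weekdays, 07:00 to 19:00), the `opted_out` flag and the function name are assumptions.

```python
from datetime import datetime

# Hypothetical business hours: Monday to Friday, 07:00 to 19:00 local time.
ON_HOUR = 7
OFF_HOUR = 19


def should_be_stopped(now: datetime, opted_out: bool = False) -> bool:
    """Return True when a resource should be stopped at time `now`."""
    if opted_out:
        # Resources explicitly exempted from the schedule are never stopped.
        return False
    is_weekend = now.weekday() >= 5  # 5 = Saturday, 6 = Sunday
    outside_hours = not (ON_HOUR <= now.hour < OFF_HOUR)
    return is_weekend or outside_hours


# A Wednesday at 22:00 falls outside the assumed window.
print(should_be_stopped(datetime(2022, 9, 14, 22, 0)))
```

A scheduler would call a check like this periodically and stop any running development resource for which it returns True.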
Matillano estimated the tool has potentially saved the company hundreds of thousands of dollars in annualized run rates -- enough to warrant investment in Stacklet's paid SaaS tools in May 2022.
"We found that building the infrastructure to support Cloud Custodian required some investment in initially creating it and maintaining it," Matillano said. "As we were looking for a more scalable solution, Stacklet came into the picture [as] a good option for us to deploy Cloud Custodian at scale."
As with many cloud automation tools, Cloud Custodian can be deployed using a shift-left strategy, baking policy creation and deployments into the software delivery pipeline as apps are developed. That's an approach that also set it apart from built-in AWS cost management tools such as Cost Explorer, Matillano said.
"We use those tools as well, [but] oftentimes we would have to manually email people with [remediation] recommendations, whereas with Stacklet and Cloud Custodian, it automatically surfaces policy violations in Jira," he said. "That follows the philosophy of trying to meet the developers where they're at and use what they're comfortable with rather than emailing them or creating an entirely different tool."
Matillano said he's so far held off using Stacklet's tools to create automated cost-based gates to deployment for developers or to auto-remediate cost issues in production. For now, Stacklet policies are updated and redeployed on a quarterly basis, primarily in testing environments that can tolerate brief outages. Stacklet reports on production cost inefficiencies to create a jumping-off point for discussions early in the development process.
"We're looking at having different tiers of how we deploy policies," Matillano said. "But right now, the biggest waste is usually in our development environments where they forget to turn stuff off."
Avalara has not extensively explored using Stacklet and Cloud Custodian for SecOps, he added.
HBO Max employs Cloud Custodian for SecOps
HBO Max primarily uses Cloud Custodian's open source tools for security policy-as-code, not FinOps. But as with Avalara, it's the immediate, automatic enforcement of policies and integrations with developers' familiar tools that made the tool stand out.
"The whole point of Custodian for us is development velocity. You can quickly build something and quickly deploy, along with ease of reading," said Mrunal Shah, head of cloud security at the streaming TV provider based in New York. "Usually we could write the same thing in code using an AWS API, but in a handoff from one cloud engineer to another, there's the possibility something may get missed."
Cloud Custodian also plays a major role in HBO Max's efforts to shift security left and prevent cloud infrastructure misconfigurations from ever being deployed in production, if possible. Custodian is triggered by HBO's Jenkins pipelines to detect misconfigured resources in infrastructure-as-code, such as overly permissive AWS security groups or public S3 buckets, and block them from being deployed.
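A pipeline gate of this kind can be sketched as follows. This is a simplified Python illustration, not HBO Max's pipeline or Cloud Custodian's own checks; the input structure (a list of security groups with `ingress` rules), the function names and the exit-code convention are assumptions.

```python
from typing import Dict, List

# CIDR ranges that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def find_open_ingress(security_groups: List[Dict]) -> List[str]:
    """Return a message for every ingress rule open to the whole internet."""
    violations = []
    for group in security_groups:
        for rule in group.get("ingress", []):
            if OPEN_CIDRS & set(rule.get("cidrs", [])):
                violations.append(
                    f"{group['name']}: port {rule['port']} is open to the internet"
                )
    return violations


def gate(security_groups: List[Dict]) -> int:
    """Print violations and return a nonzero exit code if any were found."""
    violations = find_open_ingress(security_groups)
    for message in violations:
        print("BLOCKED:", message)
    return 1 if violations else 0


# Hypothetical resources parsed from an infrastructure-as-code plan.
planned = [
    {"name": "web-sg", "ingress": [{"port": 443, "cidrs": ["10.0.0.0/8"]}]},
    {"name": "debug-sg", "ingress": [{"port": 22, "cidrs": ["0.0.0.0/0"]}]},
]
exit_code = gate(planned)
```

A CI job that treats a nonzero exit code as a failed stage would stop the deployment before the misconfigured group reaches production.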
Should a misconfiguration slip through, Cloud Custodian can automatically shut it down. But the goal is prevention rather than remediation, Shah said.
"We want to use the software development process to solve security problems," he said. "That's how I see the next frontier, which is being able to build automation, being able to prevent things from happening, and being able to react in a blink of an eye, because I think the future is bots fighting bots."
As it continues to use Cloud Custodian, HBO Max is conducting a proof-of-concept evaluation of Stacklet's commercial products. Cloud Custodian is limited to a command-line interface, while Stacklet offers a graphical set of dashboards and a queryable database of cloud resources, historical revisions and change management data.
Stacklet's SaaS delivery model is also a major part of its appeal, Shah added.
"There's some value to being able to build dashboards and being able to query our resources. But the major selling point is being able to extend this in a very simplistic way to other accounts without having to manage the upgrade process," he said.
Cloud Custodian preps Kubernetes policy support
Cloud Custodian has plenty of competition in both security policy-as-code and FinOps, including fellow CNCF projects Open Policy Agent (OPA) and Kyverno in cloud security, and non-CNCF project Hystax OptScale in FinOps.
OPA vendor Styra, among others, sells commercial policy-as-code tools that cover both cloud resources and Kubernetes clusters, and commercial FinOps tools abound, from Apptio to VMware's CloudHealth. Some CI/CD platforms, such as Harness.io, also build in FinOps feedback for developers.
Cloud Custodian does not yet support Kubernetes, although that's on the project's roadmap for next quarter. That's when it will square off against OPA, which uses a DSL called Rego that carries a steep learning curve for some users, and Kyverno, which uses YAML but is primarily focused on Kubernetes infrastructure.
Shah hasn't ruled out Cloud Custodian for Kubernetes but also hasn't seen a proof of concept yet.
"If Cloud Custodian can leverage YAML and make it simple, why not?" he said. "If it's one tool, one process, it's obviously better than having multiple tools to do the same thing."
Beth Pariseau, senior news writer at TechTarget, is an award-winning veteran of IT journalism. She can be reached on Twitter @PariseauTT.