Infrastructure as code is one of the core philosophies of the DevOps culture, which aims to reduce friction and improve collaboration between different organizations and teams. IaC applies proven best practices from software development, such as version control, testing and CI/CD, to strengthen the reliability, security and quality of the infrastructure being managed.
Today's technology world is changing at unprecedented speeds. Navigating this world of cloud providers, containers and container orchestration, service meshes, serverless, etc. can be daunting. This new-age infrastructure is less costly to change, however. We can add a load balancer with a single API call to the cloud provider, rather than procure and install additional hardware. This has freed teams to iteratively change, learn and improve. A team can deliver small changes, continuously test these changes and capitalize on short release cycles.
This approach reduces the operational overhead and risk of managing or changing infrastructure. Gone are the days when developers had to request hardware and wait weeks for IT teams to procure, rack and stack it in a data center. This makes developers much more productive.
Infrastructure as code fundamentals
At its core, infrastructure as code allows teams to optimize for change, since change is inevitable in this new-age infrastructure. For example, consider setting up a Kubernetes cluster in your cloud provider. Cloud providers constantly add features requested by developers to their managed Kubernetes services, which means organizations constantly tweak their clusters to best fit their needs.
This velocity of change can be intimidating. But if teams stick to the basic infrastructure as code principles, they'll be set up to successfully build and manage these modern, effective systems.
Use version control
In today's infrastructure landscape, almost every cloud platform and tool supports infrastructure as code or configuration as code. Tools such as AWS CloudFormation, Azure Resource Manager (ARM) templates and HashiCorp Terraform each have a domain-specific language to declaratively define the end state of the infrastructure. Vendors are also keen to support defining infrastructure in standard programming languages; examples include Pulumi and the AWS Cloud Development Kit. These tools incorporate a foundational principle of modern IT infrastructure -- they are idempotent. Multiple runs of a tool don't create multiple instances of a resource; instead, each run tries to converge the current state to the desired state.
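To make the idea of idempotent convergence concrete, here is a minimal sketch in Python. It does not use any real provider API: `FakeCloud`, `converge` and the resource names are hypothetical stand-ins, chosen only to show that a second run against an unchanged desired state makes no further changes.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """In-memory stand-in for a cloud provider API (hypothetical, for illustration)."""
    resources: dict = field(default_factory=dict)
    api_calls: int = 0

    def put(self, name: str, spec: dict) -> None:
        # Create or overwrite a resource and count the mutating call.
        self.resources[name] = dict(spec)
        self.api_calls += 1

    def delete(self, name: str) -> None:
        del self.resources[name]
        self.api_calls += 1


def converge(cloud: FakeCloud, desired: dict) -> list:
    """Move the cloud toward the desired state and return the actions taken."""
    actions = []
    for name, spec in desired.items():
        if name not in cloud.resources:
            cloud.put(name, spec)
            actions.append(f"create {name}")
        elif cloud.resources[name] != spec:
            cloud.put(name, spec)
            actions.append(f"update {name}")
    # Anything that exists but is no longer declared gets removed.
    for name in sorted(set(cloud.resources) - set(desired)):
        cloud.delete(name)
        actions.append(f"delete {name}")
    return actions


desired_state = {
    "web-lb": {"type": "load_balancer", "port": 443},
    "web-vm": {"type": "vm", "size": "small"},
}

cloud = FakeCloud()
first_run = converge(cloud, desired_state)
second_run = converge(cloud, desired_state)

print(first_run)   # two create actions
print(second_run)  # empty list: the state already matches
```

Real tools track far more detail (dependencies, partial failures, provider-side defaults), but the basic shape is the same: compare the declared state with the actual state and act only on the difference.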
Open source software such as Docker and Kubernetes allows users to declaratively specify container and deployment specifications in a YAML file. Configuration management tools such as Ansible, Chef and Puppet let teams specify, in a file, the tasks to perform on a deployed operating system. Using containers as the packaging mechanism allows teams to treat them as immutable infrastructure components. No change goes in once the application is packaged and deployed; every change (a commit in version control) creates a new immutable artifact for later consumption.
If you manage any of these modern infrastructure systems, you have a way to define your infrastructure blueprints in files and store them inside version control, such as Git. Adding these artifacts to version control gives the entire team visibility into the code used to provision infrastructure. Version control automatically adds traceability, rollback and correlation to the changes made to the infrastructure. It can also hook into CI/CD pipelines to automatically trigger actions when a change is introduced.
Teams should strive to put their operation runbooks inside version control as well. These runbooks can be scripts, packages or modules (Bash, Python, PowerShell, etc.), Jupyter notebooks, or markdown files. Why go through all this effort when the change can be done via the click of a button in the UI? Remember that this approach to infrastructure is optimized for change. Changes made to these systems are frequent, and should be automated and placed under version control.
Many teams that embrace this fundamental concept stop at this point. But putting their code in version control is just the first step in the journey -- this opens doors for other teams to see your code, contribute and collaborate. Be open to pull requests in other repositories maintained by other teams and individuals. Remember, the DevOps movement is about culture and transformation.
After teams put their source code in version control, they soon realize that multiple people make changes to these files and submit them back. Even small changes can have a tremendous impact on the infrastructure deployed. Teams must determine how to validate changes and their results safely and without affecting production environments.
It sounds like a good idea to test changes to your infrastructure, but building and maintaining a test suite is extra work on top of adopting infrastructure as code. Why make that effort to test changes to infrastructure?
Testing builds confidence to deploy these changes safely. Imagine that an engineer changes something in the version control repo, and before these changes are deployed the validations within a CI/CD system warn of a potential issue solely because of the test suite validating incoming changes. Confidence to make infrastructure changes frees a team from the fear of making change -- tests should be written to catch risks, not introduce them.
Writing tests for infrastructure is a learning process, and teams can build upon them iteratively. For instance, suppose a cloud deployment failed because it exceeded quotas. Even when the code definitions themselves are valid, there is still a risk that the deployment fails at the final stage, when it actually tries to create resources. Ideally, infrastructure teams can author a test that runs before the deployment begins, to catch that risk of failure early.
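A pre-deployment quota check of this kind might look like the following sketch. The resource names and numbers are made up, and in practice the `used` and `limits` values would come from the cloud provider's quota API; here they are plain dictionaries so the logic stays self-contained.

```python
def quota_violations(requested: dict, used: dict, limits: dict) -> list:
    """Return a message for every resource the deployment cannot fit within quota."""
    problems = []
    for resource, amount in requested.items():
        limit = limits.get(resource)
        if limit is None:
            # Treat unknown quotas as a finding rather than silently passing.
            problems.append(f"{resource}: no quota information available")
            continue
        remaining = limit - used.get(resource, 0)
        if amount > remaining:
            problems.append(f"{resource}: need {amount}, only {remaining} available")
    return problems


# Hypothetical figures for illustration only.
requested = {"vcpus": 16, "public_ips": 2}
used = {"vcpus": 40, "public_ips": 1}
limits = {"vcpus": 48, "public_ips": 10}

violations = quota_violations(requested, used, limits)
for message in violations:
    print("PRE-FLIGHT FAILURE:", message)
```

Run as an early pipeline step, a check like this fails fast, before any resources are created, instead of at the end of a long deployment.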
Often, infrastructure teams add low-level tests for their declarative code, which become a pain to manage over time. For example, they set a disk size in a declarative configuration tool, such as Terraform, CloudFormation or ARM templates, and then write a low-level test that asserts the same size is set. These types of reflective tests don't generate any value, because the tool that reads the declarative configuration is also the one that applies it. Instead, check that the configuration is actually applied at this stage -- if the configuration is applied, the desired state must be met. If there are bugs, add specific tests for those.
As you start to roll out changes, document the failures and associated risks and ask yourself: Can we test for this risk before the deployment begins, to catch it early? If the answer is yes, add it to the test suite.
There is traction in the software engineering realm to test in production, even from an infrastructure view, because it is hard to replicate what happens in production inside a sandbox environment. Teams that successfully do this are highly mature and have established guardrails to manage risks of testing in production, such as monitoring, observability and mature deployment schemes. If you're starting out with validating changes, tackle the known risks from your test suite now; as your experience and confidence grow, organically develop into a test-in-production methodology.
Integrate with a CI/CD pipeline
Now we have our code definitions and a test suite, both of which come to life inside a CI/CD pipeline. When applied to infrastructure-as-code projects, this means teams can lint their configuration files and run unit tests on top of the code definitions to provide immediate feedback to the developer making changes. Later, during another stage in the pipeline, teams can test these code definitions against a temporary sandbox environment and publish the results.
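As an illustration of the fast-feedback stage, the sketch below lints a set of resource definitions against two team rules. The rules (required tags, no SSH open to the internet), the field names and the sample resources are assumptions made for this example; a real project would typically load the definitions from its configuration files and lean on the linters that ship with its tooling.

```python
REQUIRED_TAGS = {"owner", "environment"}  # hypothetical team policy


def lint_definitions(resources: list) -> list:
    """Check resource definitions against simple policy rules and return findings."""
    findings = []
    for res in resources:
        name = res.get("name", "<unnamed>")
        missing = REQUIRED_TAGS - set(res.get("tags", {}))
        if missing:
            findings.append(f"{name}: missing tags {sorted(missing)}")
        for rule in res.get("ingress", []):
            if rule.get("cidr") == "0.0.0.0/0" and rule.get("port") == 22:
                findings.append(f"{name}: SSH open to the internet")
    return findings


# Sample definitions, as they might look after parsing a config file.
resources = [
    {
        "name": "web-sg",
        "tags": {"owner": "platform"},
        "ingress": [{"cidr": "0.0.0.0/0", "port": 22}],
    },
    {
        "name": "db-vm",
        "tags": {"owner": "data", "environment": "staging"},
    },
]

findings = lint_definitions(resources)
for finding in findings:
    print(finding)
```

Checks like these test policy and risk rather than echoing individual values back from the configuration, so they stay cheap to maintain as the definitions change.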
Once the changes are thoroughly tested, they can be packaged inside a versioned artifact and made available for later pipelines to consume and deploy infrastructure blueprints from them, i.e., continuous delivery. A key concept that many teams overlook is that these artifacts, generated for a change that was introduced, should enable teams to trace those changes back to version control. To achieve this, build artifacts with a versioning scheme such as semantic versioning. If there is a failure in a later stage, it can be tied back to the change that introduced it.
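One possible way to make artifacts traceable is to combine a semantic version with the commit hash as build metadata. The sketch below assumes a `MAJOR.MINOR.PATCH` version and a `+git.<short-sha>` suffix; the blueprint name, file extension and commit hash are invented for the example, and the naming scheme is just one convention among many.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")
BUILD_SHA = re.compile(r"\+git\.([0-9a-f]{7})")


def bump(version: str, change: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' change."""
    match = SEMVER.match(version)
    if not match:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = map(int, match.groups())
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


def artifact_name(blueprint: str, version: str, commit_sha: str) -> str:
    """Embed the short commit hash as build metadata in the artifact name."""
    return f"{blueprint}-{version}+git.{commit_sha[:7]}.tar.gz"


def commit_of(artifact: str) -> str:
    """Recover the short commit hash from an artifact name."""
    match = BUILD_SHA.search(artifact)
    if not match:
        raise ValueError(f"no commit metadata in {artifact!r}")
    return match.group(1)


# Hypothetical commit hash and blueprint name.
sha = "3f2a9c1d8e7b6a5f4c3d2e1f0a9b8c7d6e5f4a3b"
next_version = bump("1.4.2", "minor")
artifact = artifact_name("network-blueprint", next_version, sha)

print(artifact)
print(commit_of(artifact))
```

With a scheme like this, a failure observed while deploying a given artifact points directly at the commit that produced it.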
Next step: Change management
At this point, application pipelines enter the CD stage and deploy a production-ready version on the infrastructure. For infrastructure artifacts, the next evolutionary stage in their lifecycle is the change management pipeline, which extends the software delivery pipeline mechanism to also deliver changes to the infrastructure. The lifecycle for infrastructure as code or configuration as code is not over yet, because the blueprints are tested and packaged but they don't do anything fruitful for an organization until they actually deploy infrastructure.
In the change management approach, user input is captured as a commit inside version control -- remember to put everything inside version control, even the user input -- which is then raised as a pull request with the intent to merge to master. The pull request provides a feedback and review mechanism: it can run tests such as linting and unit tests to provide immediate feedback while a human also reviews the changes. Once the change is reviewed and merged to master, a pipeline job or agent picks it up and tries to reconcile the state of the infrastructure with what exists inside the version control branch. This practice is often referred to as GitOps and is gaining momentum with projects like Flux for Kubernetes.