Infrastructure as code principles: How IaC works and how to use it

What are the most important things to know about infrastructure as code if you're going to implement it? Follow these guidelines to build a solid IaC foundation.

By Deepak Singh Dhami

Published: 24 Nov 2020

Infrastructure as code is one of the core philosophies of the DevOps culture, which aims to reduce friction and improve collaboration between different organizations and teams. IaC applies proven best practices from software development, such as version control, testing and CI/CD, to strengthen the reliability, security and quality of the infrastructure being managed.

Today's technology world is changing at unprecedented speeds. Navigating this world of cloud providers, containers and container orchestration, service meshes, serverless, etc. can be daunting. This new-age infrastructure is less costly to change, however. We can add a load balancer with a single API call to the cloud provider, rather than procure and install additional hardware. This has freed teams to iteratively change, learn and improve. A team can deliver small changes, continuously test these changes and capitalize on short release cycles.

This approach reduces the operational overhead and risk of managing or changing infrastructure. Gone are the days when developers had to request hardware and wait weeks for IT teams to procure, rack and stack it in a data center. This makes developers much more productive.

Infrastructure as code fundamentals

At its core, infrastructure as code allows teams to optimize for change, and change is inevitable in this new-age infrastructure. Consider setting up a Kubernetes cluster in your cloud provider. Cloud providers constantly add features requested by developers to their managed Kubernetes services, which means organizations constantly tweak their clusters to best fit their needs.

This velocity of change can be intimidating. But if teams stick to the basic infrastructure as code principles, they'll be set up to successfully build and manage these modern, effective systems.

Use version control

In today's infrastructure landscape, almost every cloud platform and tool supports infrastructure as code or configuration as code. Provisioning tools such as AWS CloudFormation, Azure Resource Manager templates and HashiCorp Terraform have a domain-specific language to declaratively define the end state of the infrastructure. Support for defining infrastructure in standard programming languages is also growing; examples include Pulumi and the AWS Cloud Development Kit. These tools incorporate a foundational principle of modern IT infrastructure -- they are idempotent. Multiple runs of a tool don't create multiple instances of a resource; instead, each run tries to converge the current state to the desired state.
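The idempotent, converge-to-desired-state behavior described above can be illustrated with a minimal Python sketch. It is not how any of the named tools is implemented; the `converge` function, the resource names and the in-memory dictionaries standing in for a cloud provider are all assumptions made for illustration.

```python
from typing import Dict, List

Resource = Dict[str, object]


def converge(desired: Dict[str, Resource], current: Dict[str, Resource]) -> List[str]:
    """Bring `current` in line with `desired` and return the actions taken.

    `current` is a plain dict standing in for the live infrastructure
    (hypothetical); a real tool would call the provider's API instead.
    """
    actions: List[str] = []
    for name, spec in desired.items():
        if name not in current:
            current[name] = dict(spec)
            actions.append(f"create {name}")
        elif current[name] != spec:
            current[name] = dict(spec)
            actions.append(f"update {name}")
    # Anything live that is no longer declared gets removed.
    for name in sorted(set(current) - set(desired)):
        del current[name]
        actions.append(f"delete {name}")
    return actions


desired_state = {"web-lb": {"type": "load_balancer", "port": 443}}
live_state: Dict[str, Resource] = {}

first_run = converge(desired_state, live_state)   # creates the load balancer
second_run = converge(desired_state, live_state)  # nothing left to do
```

Running the function twice against the same definition leaves a single load balancer in place: the second run finds no difference between the desired and current state, so it takes no action.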

Open source software such as Docker and Kubernetes allow users to declaratively specify the container spec and deployment specifications in a YAML file. Configuration management tools such as Ansible, Chef and Puppet support the ability to specify the tasks to perform on a deployed operating system in a file. Using containers as the packaging mechanism allows teams to treat them as immutable infrastructure components. No change goes in once the application is packaged and deployed; every change (a commit in version control) creates an immutable artifact for later consumption.

If you manage any of these modern infrastructure systems, you have a way to define your infrastructure blueprints in files and store them inside version control, such as Git. Adding these artifacts to version control gives the entire team visibility into the code used to provision infrastructure. Version control automatically adds traceability, rollback and correlation to the changes made to the infrastructure. It can also hook into CI/CD pipelines to automatically trigger an action when a change is introduced.

Teams should strive to put their operation runbooks inside version control as well. These runbooks can be scripts, packages or modules (Bash, Python, PowerShell, etc.), Jupyter notebooks, or markdown files. Why go through all this effort when the change can be done via the click of a button in the UI? Remember that this approach to infrastructure is optimized for change. Changes made to these systems are frequent, and should be automated and placed under version control.

Many teams that embrace this fundamental concept stop at this point. But putting their code in version control is just the first step in the journey -- this opens doors for other teams to see your code, contribute and collaborate. Be open to pull requests in other repositories maintained by other teams and individuals. Remember, the DevOps movement is about culture and transformation.

Validate changes

After teams put their source code in version control, they soon realize that multiple people make changes to these files and submit them back. Even small changes can have a tremendous impact on the infrastructure deployed. Teams must determine how to validate changes and their results safely and without affecting production environments.

It sounds like a good idea to test changes to your infrastructure, but building and maintaining a test suite adds work on top of writing the infrastructure code itself. Why make that effort to test changes to infrastructure?

Testing builds confidence to deploy these changes safely. Imagine that an engineer changes something in the version control repo and, before the change is deployed, the validations within the CI/CD system warn of a potential issue, one that is caught only because a test suite validates incoming changes. Confidence to make infrastructure changes frees a team from the fear of making change -- tests should be written to catch risks, not introduce them.

Writing tests for infrastructure is a learning process, and teams can build iteratively upon them. For instance, suppose a cloud deployment fails because of exceeded quotas. Without a check, that failure surfaces only at the final stage, when the tool actually tries to deploy. Ideally, infrastructure teams author a test that checks quotas before the deployment begins, to catch the risk of failure early.
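A pre-deployment quota check of the kind described above might look like the following Python sketch. The `QuotaStatus` type, the resource names and the hard-coded numbers are hypothetical; in practice the limits and current usage would be fetched from the cloud provider before the pipeline starts deploying.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QuotaStatus:
    limit: int   # maximum the provider allows for this resource type
    in_use: int  # how much of that limit is already consumed


def quota_violations(requested: Dict[str, int], quotas: Dict[str, QuotaStatus]) -> List[str]:
    """Return one message per resource type the deployment cannot fit."""
    problems: List[str] = []
    for resource, count in sorted(requested.items()):
        status = quotas.get(resource)
        if status is None:
            problems.append(f"{resource}: no quota information available")
            continue
        available = status.limit - status.in_use
        if count > available:
            problems.append(f"{resource}: requested {count}, only {available} available")
    return problems


# Hypothetical numbers for illustration only.
quotas = {
    "vcpus": QuotaStatus(limit=64, in_use=60),
    "public_ips": QuotaStatus(limit=10, in_use=2),
}
planned = {"vcpus": 8, "public_ips": 1}
violations = quota_violations(planned, quotas)
```

A pipeline stage could fail fast whenever `violations` is non-empty, so the quota problem is reported before any resource is created.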

Often, infrastructure teams add low-level tests for their declarative code, which becomes a pain to manage over time. For example, they set a disk size in a declarative configuration tool, such as Terraform, CloudFormation or ARM templates, and confirm that size in a low-level test, which asserts that the correct size is set. These types of reflective tests don't generate any value, as the tool that delivers the declarative configuration module also applies it. Instead, check that the configuration is actually applied at this stage -- if the configuration is applied, the desired state must be met. If there are bugs, add specific tests for those.

As you start to roll out changes, document the failures and associated risks and ask yourself: Can we test for this risk before the deployment begins, to catch it early? If the answer is yes, add it to the test suite.

Testing in production is gaining traction in software engineering, even from an infrastructure view, because it is hard to replicate what happens in production inside a sandbox environment. Teams that successfully do this are highly mature and have established guardrails to manage the risks of testing in production, such as monitoring, observability and mature deployment schemes. If you're starting out with validating changes, tackle the known risks from your test suite now; as your experience and confidence grow, organically develop into a test-in-production methodology.

Integrate with a CI/CD pipeline

Now we have our code definition and a test suite that comes to life inside a CI/CD pipeline. When applied to infrastructure-as-code projects, this means teams can lint their configuration files and run unit tests on top of the code definitions to provide immediate feedback to the developer making changes. Later, during another stage in the pipeline, teams can test these code definitions against a temporary sandbox environment and publish the results.
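As a rough sketch of the lint stage, the Python function below checks a definition against a few team conventions and returns its findings, which a pipeline could report back to the developer. The rule set, the key names and the `lint_vm_definition` function are assumptions for illustration, not part of any particular tool.

```python
from typing import Dict, List

# Hypothetical team conventions; adjust to your own standards.
REQUIRED_KEYS = ("name", "disk_type", "disk_size_gb", "tags")
ALLOWED_DISK_TYPES = {"ssd", "hdd"}


def lint_vm_definition(definition: Dict[str, object]) -> List[str]:
    """Return a list of findings; an empty list means the definition passes."""
    findings: List[str] = []
    for key in REQUIRED_KEYS:
        if key not in definition:
            findings.append(f"missing required key: {key}")
    disk_type = definition.get("disk_type")
    if disk_type is not None and disk_type not in ALLOWED_DISK_TYPES:
        findings.append(f"unsupported disk_type: {disk_type}")
    tags = definition.get("tags")
    if isinstance(tags, dict) and "owner" not in tags:
        findings.append("tags must include an owner")
    return findings


good = {"name": "web-1", "disk_type": "ssd", "disk_size_gb": 100, "tags": {"owner": "platform"}}
bad = {"name": "web-2", "disk_type": "tape", "tags": {}}

good_findings = lint_vm_definition(good)
bad_findings = lint_vm_definition(bad)
```

Checks like these validate conventions the declarative tool does not enforce itself, so they give feedback within seconds of a commit without touching any real environment.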

Once the changes are thoroughly tested, they can be packaged inside a versioned artifact and made available for later pipelines to consume and deploy infrastructure blueprints from them, i.e., continuous delivery. The key concept that many teams miss is that these artifacts, generated for a change that was introduced, should enable teams to track the change back to version control. To achieve this, build artifacts with a versioning scheme such as semantic versioning. If there is a failure in a later stage, it can be tied back to the change that introduced it.
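One possible way to make an artifact traceable is to carry the commit identifier in the build metadata part of a semantic version, which follows a `+` sign. The Python sketch below assumes that convention; the function names and the example version and commit values are hypothetical.

```python
import re

# MAJOR.MINOR.PATCH without leading zeros, per semantic versioning.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")
# Abbreviated or full hexadecimal Git commit ID.
COMMIT_SHA = re.compile(r"^[0-9a-f]{7,40}$")


def artifact_version(version: str, commit: str) -> str:
    """Combine a release version and a commit ID into one artifact version."""
    if not SEMVER_CORE.match(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    if not COMMIT_SHA.match(commit):
        raise ValueError(f"not a commit ID: {commit!r}")
    return f"{version}+{commit}"


def source_commit(tagged_version: str) -> str:
    """Recover the commit ID from an artifact version built above."""
    _, separator, commit = tagged_version.partition("+")
    if not separator or not COMMIT_SHA.match(commit):
        raise ValueError(f"no commit ID in version: {tagged_version!r}")
    return commit


tag = artifact_version("1.4.0", "3f2a9c1")
origin = source_commit(tag)
```

When a later pipeline stage fails while deploying a given artifact, `source_commit` points straight at the commit to inspect in version control.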

Next step: Change management

At this point, application pipelines enter the CD stage and deploy a production-ready version on the infrastructure. For infrastructure artifacts, the next evolutionary stage in their lifecycle is the change management pipeline, which extends the software delivery pipeline mechanism to also deliver changes to the infrastructure. The lifecycle for infrastructure as code or configuration as code is not over yet, because the blueprints are tested and packaged but they don't do anything fruitful for an organization until they actually deploy infrastructure.

In the change management approach, user input is captured as a commit inside version control -- remember to put everything inside version control, even the user input -- which is then raised as a pull request with an intent to merge to master. The pull request provides a feedback and review mechanism: it can run tests such as linting and unit tests to give immediate feedback while a human also reviews the changes. Once the change is reviewed and merged to master, a pipeline job or agent picks it up and tries to reconcile the state of the infrastructure with what exists inside the version control branch. This practice is often referred to as GitOps and is gaining momentum with projects like Flux for Kubernetes.
