The Relativity of Wrong: configuration

This post describes the value provided by managing the infrastructure the same way you manage the source code of software applications, applying standard tools and best practices to the automation. The reference to infrastructure, of course, includes all cloud services incorporated in your architecture.

The following topics areI explore in this post. More posts will follow with a deeper investigation, and to show what is the link between Infrastructure as Code (IaC) and DevOps.

What does Infrastructure as Code mean?
Is IaC a product I can buy?
Most common use cases.
From where do I start?
Resources to practice with Infrastructure as Code.

What does Infrastructure as Code mean

Infrastructure as code (IaC) is the process of managing and provisioning data center environments through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

The IT infrastructure managed by this process includes both physical equipment, such as bare-metal servers, storage and network, as well as virtual machines, and associated configuration resources. The same concept applies to public cloud resources, i.e. IaaS and PaaS services.

The definition files for Infrastructure as Code are maintained in a version control system, similarly to what we do with source code of software applications. Generally, in these files you describe the desired state of the system, rather than a sequence of commands that must be executed. This implies that you trust a component in the infrastructure, called a controller, delegating all the logic and the exception handling to it (or to more than one).

Descriptive model, not commands.

You don't configure the individual components of the system (e.g. 20 switches, or 5 servers or 15 virtual machines and their virtual network) one by one, in the right order, managing eventual error conditions and verifying manually that everything work as expected.

You simply describe what you expect the system to look like to a software controller, that owns the configuration of all the individual components. The controller knows how to contact, provision and configure the elements and to make sure your intent is realised. If any command fails, everything is rolled back to ensure a clean state. The APIC controller in the Cisco ACI architecture has this role, but many examples can be found among Cisco products and other vendors', and open-source solutions.

It is like ordering a slice of cake

versus preparing the cake yourself following the recipe from your grandma:

In other architectures you don't have a centralised controller, but the programmability of the individual targets and the API that they expose allow for a remote, automated management that is still much better than using the command line interface or any GUI offered by the device. One script could update the configuration of dozens of devices at the same time, e.g. adding a VLAN to all the switches in the network.

Treat infrastructure like software (source control, single source of truth).

The input files for this process are text files, using different formats based on the tool you use. They might contain variables, whose values are defined externally to make the template reusable (e.g. via environment variables, databases, or systems designed to keep secrets like Vault). I use the word template for Ansible playbooks, Terraform plans, etc.

In any case these are text files, like the files that contain the source code of a software application. And they can be treated the same way: stored in a versioning system, edited collaboratively, subject to role-based access control, retrieved and built automatically by a pipeline orchestrator.

When you adopt this approach, the latest validated version of the system configuration is stored in the versioning system. You can consider that one the single source of truth, rather than the current configuration of the system (that might be corrupted by uncontrolled manual changes, either made intentionally or by mistake, or consciously applied long time ago for a reason that nobody remembers today). Instead, the last committed version in the repository is documented (including the tests that it passed) and ready to be applied again to reset the system, in case you need to solve a configuration drift, or to clone the environment, or for other use cases that require consistency.

Provision and configure entire environments

One example is creating clones of a complete environment, including computing, network and storage resources, to deploy an application in the different phases of the release process. Even with different sizing, being generated by a single template (or blueprint) makes sure they are identical in the configuration that influences the behaviour of the applications deployed.
There will be no surprise due to a missing configuration of a firewall port, of a datastore or a vlan trunk: consistency is granted, troubleshooting is limited.

Ensure idempotence

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application (you can find the complete definition here). Well designed API ensure that repeated calls with the same input will not alter the state of the system, that is good because in case of retry you don't risk to create duplicated resources or other troubles.

Is Infrastructure as Code a product I can buy?

No: IaC is a methodology, not a product. It's a set of best practices that you can gradually adopt, and learn step by step. You can start with basic use cases, like creating a new tenant with a few associated resources, if that is a recurring activity and you want to make it faster and error-proof.

Then you will grow with more complex use cases, like creating an entire test environment on demand with all needed resources, including services from a cloud platform. The adoption of Infrastructure as Code is not a big bang, you don't need to build complete automation with no manual activity in one week. You can target quick wins, that validate the approach and generate momentum in the organization. The critical factor in the adoption is change management (i.e. the introduction of a new operational model and new ways of doing things), not the technology.

Of course you need supporting tools: automation frameworks like Ansible and/or Terraform (eventually also scripting languages could do the job), versioning systems, collaboration systems. And your infrastructure needs to be programmable (public cloud is programmable by default), meaning that your servers, networks, storage should be managed via software controllers or, at least, expose well-documented API.

Common use cases for Infrastructure as Code

Environments on demand

IT admins and the Operations teams receive a lot of requests from the applications teams: they need a configuration change to fix a problem or to deploy a new service, they need a new test environment, they need a clone for a new tenant, etc. Most of these requests cannot be satisfied immediately because of higher priorities, or because they require the collaboration of different teams that needs planning.

If the provisioning of the system - and some "day 2 operations" - was automated, with controlled execution of validated templates, both parties (Dev and Ops) would save time and be more satisfied... and efficient, for the benefit of the entire company.

Shared resource pool to increase efficiency

Some companies keep separate environments for each stage of a project: integration test, performance test, quality assurance, production, etc. Resources are always allocated, regardless they are only used - let's say - a week every three months (the interval may vary between one day and one year): only when they release a new version of the project, or have a maintenance windows for deploying bug fixes.

Keeping resources allocated when they are not in use is a waste of capacity, hence a waste of money. Imagine if you multiply the waste by the number of projects. But they cannot do it differently because of the complexity and the time required to build the different environments.

If they could - and with automation they can - recreate an identical environment, end to end, whenever required, they could dispose each environment as soon as it's no longer in use. Knowing that they can recreate it in minutes, they would reuse the returned resources for another stage or another project.
Using a shared resource pool (computing, networking and storage) for many project would be more efficient from a cost perspective. It applies to fixed capacity (less capex: you need to buy less hardware to satisfy all the requests) but also to pay per use scenarios (less opex: you dismiss resources when not needed).

Disaster Recovery

In case you need to repurpose existing or new resources to recover from a disaster, recreating a clone of the system from a single source of truth is much faster and safer. Generating the new infrastructure from the same blueprint that had created the old one, makes sure they are identical.

Blueprints and Compliance

Subject matter experts from every technology domain that collaborate to provision and maintain a system, instead of being engaged every time, could design and release Infrastructure as Code models once. Users (i.e. applications teams or other operations teams) could then use the blueprints for a self-service provisioning, without depending on the availability of the SME. The SME would save their time, feeling safe because the blueprints respect all the defined constraints and comply with the policies (no provisioning anarchy is allowed).

Auditing

Running automation scripts with standardised logging, or better using a pipeline orchestrator for provisioning and configuring systems, would trace what operations have been done, by whom, the input and the outcome. Very useful audit information, with no effort.

From where do I start?

Tools: Ansible and Terraform

Those are the most widely used tools for automation (with or without an Infrastructure as Code approach). They are open-source and free, easy to use. An enterprise version also exist, and in some cases you will find it very useful. But you can start practicing with the free tool and use it for years, with great advantage, if you are the only responsible for the infrastructure. In case of teamwork, you can still use the free version and dedicate some time to build your own operational model and additional tools, or you can switch to the enterprise version that makes it easy to scale at the enterprise level.

You can download the software from the Ansible and Terraform websites, along with good documentation and reusable examples (see below). Good tutorials are also available.

Single operating tool: Cisco Intersight

Cisco Intersight™ is a Software-as-a-Service (SaaS) hybrid cloud operations platform which delivers intelligent automation, observability, and optimization to customers for traditional and cloud-native applications and infrastructure. It supports Cisco Unified Computing System™ (Cisco UCS®) and Cisco HyperFlex™ hyperconverged infrastructure, other Intersight-connected devices, third-party Intersight-connected devices, cloud platforms and services, and other integration endpoints. Because it’s a SaaS-delivered platform, Intersight functionality increases and expands with weekly releases.

With Intersight, you get all of the benefits of SaaS delivery and full lifecycle management of distributed infrastructure and workloads across data centers, remote sites, branch offices, and edge environments. This empowers you to analyze, update, fix, and automate your environment in ways that were not previously possible. As a result, your organization can achieve significant TCO savings and deliver applications faster in support of new business initiatives.

Resources to practice Infrastructure as Code

DevNet - Cisco's developers community, that offers tutorials, sandboxes, labs and reusable assets. This is the Infrastructure as Code page at DevNet: https://developer.cisco.com/iac/

Terraform - documentation, download and tutorials at https://www.terraform.io/. The integration with Cisco Intersight is explained at https://www.hashicorp.com/resources/standardizing-hybrid-cloud-environments-with-hashicorpterraform-and-cisco-intersi

Ansible - documentation, download and tutorials can be found at https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html

Cisco workshops on Infrastructure as Code - feel free to contact me if you're interested in participating in our free, 3 x half days hands-on workshop.

The Relativity of Wrong

Pages

April 28, 2022

Infrastructure as Code: what's the advantage