The Relativity of Wrong: public cloud


April 28, 2022

Infrastructure as Code: what's the advantage

This post describes the value provided by managing the infrastructure the same way you manage the source code of software applications, applying standard tools and best practices to the automation. The reference to infrastructure, of course, includes all cloud services incorporated in your architecture.

I explore the following topics in this post. More posts will follow with a deeper investigation and will show the link between Infrastructure as Code (IaC) and DevOps.

What does Infrastructure as Code mean?
Is IaC a product I can buy?
Most common use cases.
From where do I start?
Resources to practice with Infrastructure as Code.

What does Infrastructure as Code mean

Infrastructure as code (IaC) is the process of managing and provisioning data center environments through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.

The IT infrastructure managed by this process includes physical equipment, such as bare-metal servers, storage and network, as well as virtual machines and associated configuration resources. The same concept applies to public cloud resources, i.e. IaaS and PaaS services.

The definition files for Infrastructure as Code are maintained in a version control system, similarly to what we do with source code of software applications. Generally, in these files you describe the desired state of the system, rather than a sequence of commands that must be executed. This implies that you trust a component in the infrastructure, called a controller, delegating all the logic and the exception handling to it (or to more than one).

Descriptive model, not commands.

You don't configure the individual components of the system (e.g. 20 switches, or 5 servers, or 15 virtual machines and their virtual network) one by one, in the right order, handling possible error conditions and verifying manually that everything works as expected.

You simply describe what you expect the system to look like to a software controller, which owns the configuration of all the individual components. The controller knows how to contact, provision and configure the elements, and makes sure your intent is realised. If any command fails, everything is rolled back to ensure a clean state. The APIC controller in the Cisco ACI architecture has this role, and many other examples can be found among Cisco products, other vendors' products and open-source solutions.

It is like ordering a slice of cake versus preparing the cake yourself following the recipe from your grandma.
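To make the descriptive model more concrete, here is a minimal Python sketch of a toy controller. It is an illustration only, not how APIC or any real controller is implemented: the `Controller` class, the `configure` callback and the switch names are all hypothetical. The caller declares the desired state, the controller changes only what differs, and it restores the previous state if any step fails.

```python
from copy import deepcopy


class Controller:
    """Toy controller (hypothetical): converges the tracked state to the declared intent."""

    def __init__(self, current):
        self.current = current  # element name -> configuration dict

    def apply(self, desired, configure):
        snapshot = deepcopy(self.current)  # kept for rollback
        try:
            for name, config in desired.items():
                if self.current.get(name) != config:  # touch only what differs
                    configure(name, config)           # may raise on failure
                    self.current[name] = config
        except Exception:
            self.current = snapshot  # roll back to the last clean state
            return False
        return True


# Stand-in for the real provisioning call (an assumption for this sketch).
def configure(name, config):
    if config.get("vlan", 0) > 4094:
        raise ValueError(f"{name}: invalid VLAN")


controller = Controller({"switch-1": {"vlan": 10}})
ok = controller.apply({"switch-1": {"vlan": 20}, "switch-2": {"vlan": 20}}, configure)
failed = controller.apply({"switch-1": {"vlan": 30}, "switch-2": {"vlan": 5000}}, configure)
```

The second call fails on `switch-2`, so the change already made to `switch-1` is undone as well. In this toy the rollback only resets the tracked state; a real controller would also have to undo what it pushed to the devices.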

In other architectures you don't have a centralised controller, but the programmability of the individual targets and the API that they expose allow for a remote, automated management that is still much better than using the command line interface or any GUI offered by the device. One script could update the configuration of dozens of devices at the same time, e.g. adding a VLAN to all the switches in the network.
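As a sketch of the "one script, many devices" idea, the Python snippet below pushes the same VLAN to a list of switches in parallel and collects a per-device report. The `push` callable stands for whatever API call your devices expose; `fake_push` and the switch names are hypothetical stubs so the example runs without any hardware.

```python
from concurrent.futures import ThreadPoolExecutor


def add_vlan_everywhere(devices, vlan_id, push):
    """Call push(device, vlan_id) for every device in parallel; return the outcome per device."""
    def attempt(device):
        try:
            push(device, vlan_id)
            return device, "ok"
        except Exception as error:
            return device, f"failed: {error}"

    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(attempt, devices))


# Stub standing in for a real device API call (hypothetical).
def fake_push(device, vlan_id):
    if device == "switch-3":
        raise ConnectionError("unreachable")


report = add_vlan_everywhere(["switch-1", "switch-2", "switch-3"], 120, fake_push)
```

Without a controller there is no automatic rollback, so the script has to report which devices failed and leave the decision to you.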

Treat infrastructure like software (source control, single source of truth).

The input files for this process are text files, in different formats depending on the tool you use. They might contain variables, whose values are defined externally to make the template reusable (e.g. via environment variables, databases, or systems designed to keep secrets, like Vault). I use the word template for Ansible playbooks, Terraform configuration files, etc.
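The idea of a reusable template with externally defined values can be sketched with Python's standard `string.Template`. This is not Ansible or Terraform syntax; the template content and the `IAC_*` variable names are assumptions made up for the example.

```python
import os
from string import Template

# A tiny template: the structure is versioned, the values come from outside.
TEMPLATE = Template(
    "tenant: $tenant\n"
    "vlan_id: $vlan_id\n"
    "admin_password: $admin_password\n"
)


def render(template, environ=os.environ):
    """Fill the placeholders from environment-style variables; a missing one raises KeyError."""
    values = {
        "tenant": environ["IAC_TENANT"],
        "vlan_id": environ["IAC_VLAN_ID"],
        "admin_password": environ["IAC_ADMIN_PASSWORD"],  # would come from a secret store
    }
    return template.substitute(values)


rendered = render(
    TEMPLATE,
    {"IAC_TENANT": "finance", "IAC_VLAN_ID": "120", "IAC_ADMIN_PASSWORD": "from-secret-store"},
)
```

The same template can then be rendered for another tenant just by changing the variables, while the secret never lands in the repository.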

In any case these are text files, like the files that contain the source code of a software application. And they can be treated the same way: stored in a versioning system, edited collaboratively, subject to role-based access control, retrieved and built automatically by a pipeline orchestrator.

When you adopt this approach, the latest validated version of the system configuration is stored in the versioning system. You can consider that one the single source of truth, rather than the current configuration of the system (which might be corrupted by uncontrolled manual changes, made intentionally or by mistake, or consciously applied a long time ago for a reason that nobody remembers today). The last committed version in the repository, instead, is documented (including the tests that it passed) and ready to be applied again to reset the system, in case you need to fix a configuration drift, clone the environment, or address other use cases that require consistency.

Provision and configure entire environments

One example is creating clones of a complete environment, including computing, network and storage resources, to deploy an application in the different phases of the release process. Even with different sizing, the fact that they are generated from a single template (or blueprint) makes sure they are identical in the configuration that influences the behaviour of the deployed applications.
There will be no surprises due to a missing configuration of a firewall port, a datastore or a VLAN trunk: consistency is guaranteed and troubleshooting is limited.

Ensure idempotence

Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application (you can find the complete definition here). Well-designed APIs ensure that repeated calls with the same input will not alter the state of the system. That is good because, in case of a retry, you don't risk creating duplicate resources or other trouble.
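A minimal Python sketch of an idempotent operation: instead of "adding" a VLAN, the function declares that the VLAN must exist, so a retry changes nothing. The function name and the shape of the configuration dictionary are hypothetical.

```python
def ensure_vlan(switch_config, vlan_id, name):
    """Idempotent: make sure the VLAN exists with this name; report whether anything changed."""
    vlans = switch_config.setdefault("vlans", {})
    changed = vlans.get(vlan_id) != name
    vlans[vlan_id] = name
    return changed


config = {}
first = ensure_vlan(config, 120, "finance")   # creates the VLAN
second = ensure_vlan(config, 120, "finance")  # retry: nothing to do
```

After both calls the configuration contains the VLAN exactly once, and the returned flag tells you whether the call actually changed anything.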

Is Infrastructure as Code a product I can buy?

No: IaC is a methodology, not a product. It's a set of best practices that you can gradually adopt, and learn step by step. You can start with basic use cases, like creating a new tenant with a few associated resources, if that is a recurring activity and you want to make it faster and error-proof.

Then you will grow with more complex use cases, like creating an entire test environment on demand with all needed resources, including services from a cloud platform. The adoption of Infrastructure as Code is not a big bang: you don't need to build complete automation with no manual activity in one week. You can target quick wins that validate the approach and generate momentum in the organization. The critical factor in the adoption is change management (i.e. the introduction of a new operational model and new ways of doing things), not the technology.

Of course you need supporting tools: automation frameworks like Ansible and/or Terraform (scripting languages could possibly do the job too), versioning systems, collaboration systems. And your infrastructure needs to be programmable (public cloud is programmable by default), meaning that your servers, networks and storage should be managed via software controllers or, at least, expose well-documented APIs.

Common use cases for Infrastructure as Code

Environments on demand

IT admins and the Operations teams receive a lot of requests from the application teams: they need a configuration change to fix a problem or to deploy a new service, they need a new test environment, they need a clone for a new tenant, etc. Most of these requests cannot be satisfied immediately because of higher priorities, or because they require the collaboration of different teams, which needs planning.

If the provisioning of the system - and some "day 2 operations" - were automated, with controlled execution of validated templates, both parties (Dev and Ops) would save time and be more satisfied... and efficient, for the benefit of the entire company.

Shared resource pool to increase efficiency

Some companies keep separate environments for each stage of a project: integration test, performance test, quality assurance, production, etc. Resources are always allocated, even though they are only used - let's say - one week every three months (the interval may vary between one day and one year): only when they release a new version of the project, or have a maintenance window for deploying bug fixes.

Keeping resources allocated when they are not in use is a waste of capacity, hence a waste of money. Imagine if you multiply the waste by the number of projects. But they cannot do it differently because of the complexity and the time required to build the different environments.

If they could - and with automation they can - recreate an identical environment, end to end, whenever required, they could dispose of each environment as soon as it's no longer in use. Knowing that they can recreate it in minutes, they would reuse the returned resources for another stage or another project.
Using a shared resource pool (computing, networking and storage) for many projects would be more efficient from a cost perspective. It applies to fixed capacity (less capex: you need to buy less hardware to satisfy all the requests) but also to pay per use scenarios (less opex: you release resources when not needed).
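A quick back-of-the-envelope calculation in Python shows the size of the waste. The numbers are illustrative only (four environments, each used one week per quarter), and the function names are made up for the example.

```python
WEEKS_PER_QUARTER = 13


def utilisation(weeks_in_use, weeks_in_period=WEEKS_PER_QUARTER):
    """Fraction of the period in which an always-allocated environment is really used."""
    return weeks_in_use / weeks_in_period


def dedicated_vs_on_demand(environments, weeks_in_use_each, weeks_in_period=WEEKS_PER_QUARTER):
    """Environment-weeks allocated when always on, versus allocated only while in use."""
    dedicated = environments * weeks_in_period
    on_demand = environments * weeks_in_use_each
    return dedicated, on_demand


percent_used = round(utilisation(1) * 100, 1)
dedicated, on_demand = dedicated_vs_on_demand(environments=4, weeks_in_use_each=1)
```

With these assumptions each environment is used less than 8% of the time, and recreating environments on demand needs 4 environment-weeks per quarter instead of 52.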

Disaster Recovery

In case you need to repurpose existing or new resources to recover from a disaster, recreating a clone of the system from a single source of truth is much faster and safer. Generating the new infrastructure from the same blueprint that created the old one makes sure they are identical.

Blueprints and Compliance

Subject matter experts (SMEs) from every technology domain, who collaborate to provision and maintain a system, could design and release Infrastructure as Code models once, instead of being engaged every time. Users (i.e. application teams or other operations teams) could then use the blueprints for self-service provisioning, without depending on the availability of the SMEs. The SMEs would save time and feel safe, because the blueprints respect all the defined constraints and comply with the policies (no provisioning anarchy is allowed).

Auditing

Running automation scripts with standardised logging or, better, using a pipeline orchestrator for provisioning and configuring systems would trace which operations have been done, by whom, with what input and with what outcome. Very useful audit information, with no effort.
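As a sketch of standardised logging, the Python decorator below records who ran an operation, when, with which input and with what outcome. It is only an illustration: `AUDIT_TRAIL`, `audited` and `create_tenant` are hypothetical names, and a real setup would send the records to a log system instead of a list in memory.

```python
import functools
import json
import os
from datetime import datetime, timezone

AUDIT_TRAIL = []  # in-memory stand-in for a real log destination


def audited(operation):
    """Wrap an automation step so that every run leaves an audit record."""
    @functools.wraps(operation)
    def wrapper(*args, **kwargs):
        record = {
            "when": datetime.now(timezone.utc).isoformat(),
            "who": os.environ.get("USER", "unknown"),
            "operation": operation.__name__,
            "input": {"args": args, "kwargs": kwargs},
        }
        try:
            result = operation(*args, **kwargs)
            record["outcome"] = "success"
            return result
        except Exception as error:
            record["outcome"] = f"failure: {error}"
            raise
        finally:
            AUDIT_TRAIL.append(json.dumps(record, default=str))
    return wrapper


@audited
def create_tenant(name):
    return f"tenant {name} created"


create_tenant("finance")
last_record = json.loads(AUDIT_TRAIL[-1])
```

Every automation step wrapped this way leaves the same kind of record, whether it succeeds or fails.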

From where do I start?

Tools: Ansible and Terraform

Those are the most widely used tools for automation (with or without an Infrastructure as Code approach). They are open-source, free and easy to use. Enterprise versions also exist, and in some cases you will find them very useful. But you can start practicing with the free tools and use them for years, with great advantage, if you are the only person responsible for the infrastructure. In case of teamwork, you can still use the free versions and dedicate some time to building your own operational model and additional tools, or you can switch to the enterprise versions, which make it easy to scale at the enterprise level.

You can download the software from the Ansible and Terraform websites, along with good documentation and reusable examples (see below). Good tutorials are also available.

Single operating tool: Cisco Intersight

Cisco Intersight™ is a Software-as-a-Service (SaaS) hybrid cloud operations platform which delivers intelligent automation, observability, and optimization to customers for traditional and cloud-native applications and infrastructure. It supports Cisco Unified Computing System™ (Cisco UCS®) and Cisco HyperFlex™ hyperconverged infrastructure, other Intersight-connected devices, third-party Intersight-connected devices, cloud platforms and services, and other integration endpoints. Because it’s a SaaS-delivered platform, Intersight functionality increases and expands with weekly releases.

With Intersight, you get all of the benefits of SaaS delivery and full lifecycle management of distributed infrastructure and workloads across data centers, remote sites, branch offices, and edge environments. This empowers you to analyze, update, fix, and automate your environment in ways that were not previously possible. As a result, your organization can achieve significant TCO savings and deliver applications faster in support of new business initiatives.

Resources to practice Infrastructure as Code

DevNet - Cisco's developer community, which offers tutorials, sandboxes, labs and reusable assets. This is the Infrastructure as Code page at DevNet: https://developer.cisco.com/iac/

Terraform - documentation, download and tutorials at https://www.terraform.io/. The integration with Cisco Intersight is explained at https://www.hashicorp.com/resources/standardizing-hybrid-cloud-environments-with-hashicorpterraform-and-cisco-intersi

Ansible - documentation, download and tutorials can be found at https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html

Cisco workshops on Infrastructure as Code - feel free to contact me if you're interested in participating in our free hands-on workshop (three half days).

May 25, 2020

Cost control in a multicloud world - part 2

This post is the second part of a discussion about optimization of the cloud budget and cost control in your IT organization. The first part is here.

The CloudCenter Suite

The CloudCenter Suite is a key component in this framework: it is made of three modules that offer provisioning, lifecycle automation and – what we are discussing here – cost control.

The Cost Optimizer module collects data from the APIs of all the cloud providers to create a series of detailed reports that you can use to understand where and why you spend your money. You can slice and dice the information across different dimensions, and you can schedule reports or download them to feed a billing system.

complete and granular reporting of cloud spend

You can create any custom organizational hierarchy to map your business (departments, customers, projects, etc.) and give them visibility and responsibility of the budget consumption.

custom hierarchy of cost groups

budget management

Wherever your assets are deployed, the Cost Optimizer analyses their behavior and gives you recommendations for rightsizing. You can evaluate each suggestion and then act manually, or you can enable the tool to do it for you automatically.

recommendations for rightsizing and other optimizations

It will also tell you when it makes sense to adopt reserved instances to get a discount, showing you how much you’re going to save.

Act now to reduce your spending… across all clouds

The best practices and the tools are there. You can choose among tools that are specific to a cloud provider (they all offer good solutions) or a cloud agnostic solution that works with all clouds.

With the Cisco CloudCenter Suite you get a fully detailed report of your inventory, all the services you are consuming everywhere, and the efficiency from a budget standpoint.

You also get actionable suggestions to save money that can be, if you configure the tool to do so, executed automatically. If you prefer, you can just get the list of suggested actions and implement them manually.

Thanks to the Cisco CloudCenter Suite you can set up a common governance model and a set of policies in one place, instead of rebuilding reports and automation in every single cloud. It is all based on full visibility, which goes beyond the suggestions given by your service providers.

All you have to do now is to download and test the CloudCenter Suite (30 days trial), or contact us for a live demo or a discussion of your use cases.

Cost control in a multicloud world

Using public cloud is not as cheap as you expected?

Is allocating budget and verifying spending a little more difficult than they had told you?

The best practices and tools for cost control in public cloud services described in this series of two posts will probably help you improve the efficiency of your consumption. They apply to every IaaS and PaaS service in a multicloud context.

What CxOs expect from IT

Surveys show that enterprise IT is taking a leading role in relation to the lines of business when they need cloud services: not just to select the best solution for them, but to help keep the financial aspects under control (source: RightScale State of the Cloud Report).

role of enterprise IT in cloud decisions

Looking at the initiatives, that is, where budget is allocated, cost control and cost saving are at the top:

Allocation of budget for cloud initiatives

Like drug dealers

Cloud providers are making a lot of money. And, of course, they want more customers and they want to retain the existing ones. Competitors steal some, and others just fly away when they realize that operating applications in production is more expensive than they had thought.

So, they offer you some candies, just to taste what a beautiful trip they can offer you. Even with a free account, you get amazing services that work very well. See also Azure and Google. They all make building business applications quick and easy.

appealing services offered by AWS

Developers are attracted by that, and they love creating cloud native applications using PaaS services. But they do not realize, or they just don’t care, how expensive it will be to run applications in production.

In addition, they generate a lock-in because PaaS services are not portable across clouds.

Is public cloud easier or is it cheaper?

Now ask yourself this question: do you use the public cloud because it’s easier or because it’s cheaper? Or maybe you’re just lazy and you want to delegate the SLA responsibility to the service provider?

I will not discuss whether it’s easier in this post. But, for sure, it’s not cheaper. Budgets go out of control very often.

These are some tips from Amazon (source: AWS re:Invent 2019)

tips for saving from AWS

Of course, cloud providers want to help you to be efficient but… not too much! The two green actions look like the most effective, now let’s see if we can do anything better.

Cost control framework

We could create a cost control framework, as did the large company I took this information from.

cost saving framework

We focus mostly on two aspects: avoidance of unnecessary costs and iteration of the improvement actions.

In the avoidance best practice, we concentrate on the efficiency of VM snapshot retention: how many and for how long. Storage volumes and public IP addresses should be monitored, because they generate a cost. When you use S3 storage you can optimize the cost based on frequency of access, speed and size. And VMs should not be larger than needed, because you also pay for the excess capacity. Suspending VMs at night, at the weekend or whenever the business application is not used also saves money. And so does scaling capacity dynamically.
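To get a feeling for what suspension can save, here is a small Python calculation of the running hours avoided per week. The schedule (12 hours a day, 5 days a week) is an assumption for illustration, and the result covers compute hours only: storage attached to a suspended VM typically keeps generating a cost.

```python
HOURS_PER_WEEK = 24 * 7


def running_hours_saved(hours_per_day, days_per_week):
    """Hours per week a VM can stay suspended, and the same value as a fraction of the week."""
    used = hours_per_day * days_per_week
    saved = HOURS_PER_WEEK - used
    return saved, saved / HOURS_PER_WEEK


saved_hours, saved_fraction = running_hours_saved(hours_per_day=12, days_per_week=5)
```

With this schedule the VM runs 60 hours out of 168, so roughly 64% of the compute hours are avoided.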

You should monitor the situation carefully, and automation helps here. You can set up alerts and governance policies, and have reports sent to the stakeholders. Automation in provisioning, but mostly in reporting and resource adjustment, is key.

You can obtain important savings

This is what cost optimization saved in the context of a large SaaS service operation. Most of the saving comes from the adoption of reserved instances where appropriate and from cleaning up environments used for proofs of concept. The total saving is $7M per year.

savings from cloud optimization

And this is what they are planning: choosing the right instance types, rightsizing and cleaning up will save $7M more per year.

expected savings from recurring optimization

Cost control: a solution

We will examine a simple three-step process to save your budget.

Understand your assets and ecosystem
Optimize based on best practices, on an investigation of your assets or... on recommendations from a tool
Avoid unnecessary costs and iterate the improvement actions

Easy principles to set a governance model are:

Define (and enforce) budgets for your customers, projects and departments.
Monitor and report about that.
Set policies to automate suspension, autoscaling and cleanup.
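The first two principles can be sketched in a few lines of Python: compare the spend of each cost group with its budget and raise a warning before the limit is reached. This is not how any specific product implements it; the group names, amounts and the 80% threshold are assumptions for the example.

```python
def budget_alerts(budgets, spend, warn_at=0.8):
    """Return an alert level for every cost group that is near or over its budget."""
    alerts = {}
    for group, budget in budgets.items():
        ratio = spend.get(group, 0) / budget
        if ratio >= 1:
            alerts[group] = "over budget"
        elif ratio >= warn_at:
            alerts[group] = "warning"
    return alerts


budgets = {"marketing": 1000, "r&d": 5000, "sales": 2000}
spend = {"marketing": 1200, "r&d": 4100, "sales": 500}
alerts = budget_alerts(budgets, spend)
```

A report like this, sent regularly to the owners of each budget, is the simplest form of the monitoring described above.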

Cisco offers a solution that helps you to achieve all that, quickly and easily.

We have built a framework that is based on the feedback loop suggested by best practices (e.g. the second principle of DevOps recommends looking at the entire system as a whole and iterating the optimization).

the optimization loop

In the next post you will find a solution for implementing your cost control quickly and consistently across all the clouds you use: any combination of private cloud technologies and public clouds, with no need to adopt specific solutions for each individual target.

Keep reading, it will prove useful for your strategy.

January 22, 2020

Teaching Alexa to deploy applications in any cloud

Alexa, deploy a webserver in AWS and a database in Azure!

Recently I presented a session at Codemotion Milan, with my colleague Stefano Gioia.

We demonstrated how the API exposed by the Cisco CloudCenter Suite can be easily integrated from within an Alexa skill.

Of course, this is not something you would do in real life in a production environment 🙂

But it’s a fun way to show how easy the integration is, and we found that it appeals to our customers and partners.

Instead of Alexa you could use any client, like a custom script from the command line, a workflow engine, a web portal or an ITSM system like ServiceNow to achieve the same result.

Any program that can do a REST call can drive CCS (CloudCenter Suite) externally to orchestrate the lifecycle of a software deployment.

In case you use Alexa, you will code the REST client logic in the serverless implementation of the “skill” that is executed as a Lambda function. You can use different languages to create the skills: we chose node.js for the demo.

We decided to show some basic CCS features like deploying any kind of software in any cloud, or measuring the cost of all the services we are consuming in all our clouds (for running VMs and containers, for consuming cloud services like load balancers or network bandwidth, for running serverless functions, etc.). Of course there is much more in the product, but we wanted to keep the demo light and fun.

The 3 modules of the Cisco CloudCenter Suite

Everything you can do in the CloudCenter web portal can also be done through its REST API, and the CCS documentation shows examples you can easily reuse and adapt.

These are the API targets, for the different modules, that we used in the implementation of the skill:

Suite Admin API

/suite-idm/

E.g.: https://na.cloudcenter.cisco.com/suite-idm/api/v1/tenants

Workload Manager API

/cloudcenter-ccm-backend/api/v2/apps

E.g.: https://na.cloudcenter.cisco.com/cloudcenter-ccm-backend/api/v2/apps

Action Orchestrator API

/be-console/

E.g.: https://na.cloudcenter.cisco.com/be-console/api/v1/workflow

Cost Optimizer API

/cloudcenter-shared-api/

E.g.: https://na.cloudcenter.cisco.com/cloudcenter-shared-api/api/v1/costByProvider?cloudGroupId
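Any REST client can target these endpoints. As a rough Python sketch (our demo used node.js, so this is not the code of the skill), the snippet below only builds the requests for the example URLs listed above, without sending them. The authentication scheme is not covered in this post, so `auth_headers` is a placeholder you would have to fill according to the CCS documentation; the `Accept` header is also an assumption.

```python
from urllib.parse import urljoin
from urllib.request import Request

BASE_URL = "https://na.cloudcenter.cisco.com"

# Paths taken from the examples above.
ENDPOINTS = {
    "tenants": "/suite-idm/api/v1/tenants",
    "apps": "/cloudcenter-ccm-backend/api/v2/apps",
    "workflows": "/be-console/api/v1/workflow",
}


def build_request(name, auth_headers=None):
    """Build (but do not send) a GET request for one of the listed API targets."""
    headers = {"Accept": "application/json"}  # assumed response format
    headers.update(auth_headers or {})        # placeholder: see the CCS documentation
    return Request(urljoin(BASE_URL, ENDPOINTS[name]), headers=headers, method="GET")


request = build_request("tenants")
```

Passing the request to `urllib.request.urlopen` would perform the actual call, provided that valid credentials are supplied.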

The next picture shows the high level process that allows a user to get something done by Alexa, just by speaking to an Echo device or to the Alexa mobile application.

The speech recognition system translates the user’s voice to text, then the “intent” of the user is matched to one of the functions available in the skill. Skills and their intents are executed based on patterns and keywords that the system is able to recognize in the natural language, thanks to machine learning algorithms.

Recognizing an intent triggers your custom code, which is generally a Lambda function (the Amazon developer console makes it easy to write the code and to host it in the Lambda service, and also provides reusable examples). The outcome is rendered as audio or, depending on the device, also as a video.

In our specific demo, we put the client code for the Cisco CloudCenter API in the serverless implementation of the intent.
These are the commands that we can give Alexa:

give me the list of existing tenants
list all configured target clouds
deploy a database or a web server in a cloud
show current cost of all cloud services
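Conceptually, the code behind these commands is a dispatcher from the recognized intent to a function that calls the CloudCenter API. The Python sketch below shows only that idea: it is not the node.js code of our skill and it does not use the Alexa SDK. The intent names, the slot names and the returned strings are all hypothetical.

```python
def list_tenants(slots):
    # In the real skill this is where the REST call to the Suite Admin API would go.
    return "GET /suite-idm/api/v1/tenants"


def deploy(slots):
    # Slots carry the variable parts of the utterance (hypothetical slot names).
    return f"deploy {slots['software']} in {slots['cloud']}"


HANDLERS = {
    "ListTenantsIntent": list_tenants,
    "DeployIntent": deploy,
}


def handle(intent_name, slots=None):
    """Route a recognized intent to its handler, with a fallback for unknown intents."""
    handler = HANDLERS.get(intent_name)
    if handler is None:
        return "Sorry, I cannot do that yet."
    return handler(slots or {})


answer = handle("DeployIntent", {"software": "web server", "cloud": "AWS"})
fallback = handle("OrderPizzaIntent")
```

Alexa takes care of turning speech into the intent name and the slots; your code only has to implement the handlers.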

Here you can see a sample of the capabilities of Alexa when it calls the Cisco CCS API:

Building a new Alexa custom Skill: as the skill developer, you have to:

Define the requests the skill can handle
Define the name Alexa uses to identify your skill, called the invocation name
Define the utterances and input variables, called slots
Write the code to fulfill the request
Test it from the developer console or from your Alexa device

The documentation at the Amazon Developer console contains excellent tutorials to build Alexa skills.
You can learn easily to create a Hello World skill, then you are ready to incorporate the client code to call the CloudCenter Suite API.
Stefano has published his examples on GitHub here; feel free to test them yourself.

This demo shows that it's easy to build a client to drive the API exposed by CCS.
And it helps position the CloudCenter Suite as a mediation layer in your architecture, to orchestrate the lifecycle management and to define a governance model including cloud cost control.
