The Relativity of Wrong

August 25, 2023

Carbon footprint of your full stack

This post discusses the following subjects:

What is a ESG strategy
You can't control what you don't measure
How to measure the full stack carbon footprint
A solution from Climatiq
Extensibility of the Cisco FSO Platform
Just one month to build the integration

What is a ESG strategy

In the context of information technology (IT), an ESG strategy involves applying the principles of Environmental, Social, and Governance factors specifically to the IT industry and the use of technology within organizations. Here's how each element applies to IT:

1. Environmental (E): In IT, the environmental aspect focuses on the impact of technology on the environment. This includes evaluating the energy consumption of data centers and IT infrastructure, as well as assessing the carbon footprint associated with digital operations and electronic waste management. Companies can adopt environmentally friendly practices by using renewable energy for data centers, implementing energy-efficient hardware and software solutions, and responsibly recycling electronic equipment.

2. Social (S): In the social dimension of IT, the focus is on how technology affects people and communities. This involves considering the ethical use of data, ensuring data privacy and security, promoting digital inclusion, and addressing issues like digital divide and accessibility. Socially responsible IT companies prioritize the protection of user data, work to bridge the digital gap, and develop inclusive technologies that cater to a diverse range of users.

3. Governance (G): Governance in IT pertains to how technology is managed and governed within organizations. This includes assessing the transparency of IT decision-making, the adherence to IT policies and regulations, and the alignment of IT strategies with broader business goals. Good IT governance ensures that technology is used responsibly and ethically and that it aligns with the overall values and objectives of the organization.

By incorporating ESG principles into IT strategies, organizations aim to become more environmentally conscious, socially responsible, and ethically governed in their technological practices. This not only helps businesses reduce their environmental impact and enhance their reputation but also contributes to a more sustainable and inclusive digital world. Additionally, investors in the IT sector increasingly consider ESG factors when evaluating companies' performance and long-term viability.ESG strategy

The purpose of an ESG Strategy is to demonstrate the environmental, social, and governance factors that your organisation believes to be intrinsically important to consider within your current and future business operations.

This post focuses on the "E" in ESG.

You can't control what you don't measure

Collect data toward key performance indicators (KPIs): Once the ESG plan is up and running, it’s time to start collecting data. ESG processes benefit businesses because they provide objective metrics that prove the success of social responsibility efforts. Use the data you gather to track KPIs, measuring success along the way.

How to measure the full stack carbon footprint

Measuring the carbon footprint from the full IT stack involves assessing the environmental impact of various components and processes within the IT infrastructure. Here's a step-by-step guide to help you get started:

1. Identify Components: First, make a list of all the components within your IT stack. This typically includes data centers, servers, network devices, storage systems, end-user devices (e.g., laptops, desktops, smartphones), and any other IT-related equipment.

2. Energy Consumption: Measure the energy consumption of each component. This can be done using energy monitoring tools, power meters, or data provided by equipment manufacturers. Take into account both the direct energy usage (electricity consumed by the IT equipment) and indirect energy usage (cooling systems, ventilation, etc.).

3. Data Center Efficiency: If you have data centers, assess their efficiency. PUE (Power Usage Effectiveness) is a common metric used to measure data center energy efficiency. It's calculated as the total facility energy consumption divided by the IT equipment energy consumption.

4. Virtualization and Utilization: Analyze the virtualization rate and utilization of servers and other hardware. Virtualization allows running multiple virtual machines on a single physical server, which can lead to better resource utilization and energy efficiency.

5. Cloud Services: If your organization uses cloud services, consider the energy consumption of these services. Cloud providers often publish environmental reports that provide insights into their sustainability efforts.

6. Software Efficiency: Evaluate the efficiency of the software applications and services running in your IT stack. Energy-efficient software design and coding practices can help reduce energy consumption.

7. Telecommuting and Travel: Take into account the energy consumption associated with telecommuting (remote work) and business travel when using IT resources. These factors can impact the carbon footprint indirectly.

8. Data Transmission: Assess the energy used for data transmission over networks, including local area networks (LANs) and wide area networks (WANs).

9. Emission Factors: Convert energy consumption data into greenhouse gas (GHG) emissions using emission factors provided by relevant authorities or industry standards. These factors provide the amount of CO2 equivalent emissions associated with each unit of energy consumed.

10. Calculation and Analysis: Calculate the total carbon footprint of your IT stack by summing up the GHG emissions from all components. Analyze the results to identify areas of high impact and potential opportunities for improvement.

11. Benchmarking and Reporting: Consider benchmarking your carbon footprint against industry standards and best practices. This can help you set targets for reducing emissions and track your progress over time. Create reports and share the findings with stakeholders to raise awareness and support sustainability initiatives.

Keep in mind that measuring the carbon footprint from the full IT stack is a complex task that may require specialized knowledge and tools. Consider involving experts in environmental sustainability or seeking support from organizations specializing in carbon footprint assessments.

A solution from Climatiq

Climatiq provides an embedded carbon intelligence software that enables developers to automate GHG emission calculations based on verified scientific models. Its suite of products includes an open dataset of emission factors, and intelligent APIs that integrate with any existing software for real time monitoring of greenhouse gas emissions.

Climatiq offers a calculation engine that allows to convert metrics about cloud CPU and memory usage, storage, and networking traffic to CO2e estimates for Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.

This service is offered by Climatiq directly, but it's also been used to demonstrate the flexibility and the extensibility of the Cisco Full Stack Observability Platform. Climatiq is one of the first ecosystem partners that collaborated with Cisco, even before the platform was released in June 2023. The integration of the Climatiq engine with the Cost Insight module of the FSO Platform has been presented in a session at Cisco Live, Las Vegas.

Extensibility of the Cisco FSO Platform

Climatiq's carbon footprint calculation engine was integrated leveraging the developers friendly framework offered by Cisco to create extension modules for the FSO Platform.

With those tools you can model a business domain, or a technical domain, defining entities and their relationships so that you can visualize data in a dashboard that fit your specific business needs.

Telemetry collected from the infrastructure, and from any asset in your business architecture (processes, applications, cloud resources, robots...), is used to populate a view of the world that helps the operations teams and the lines of business to have full control and deep visibility into every component, as well as to roll up all the information at company level dashboards.

You don't need expert programmers to build complex applications, to collect-filter-correlate-visualize data. You just need a domain subject matter expert (e.g. a business analyst, or a SRE) that design a Entity-Relationship diagram and customize a few JSON files that tell the FSO Platform how to manage the incoming telemetry data.

Just one month to build the integration

Building extension modules is so easy that Climatiq, as a Cisco partner, took just one month to build their module. They were educated about the architecture of the system and the developer framework, then they build the module that is now offered in the Cisco FSO marketplace. Every customer that uses the FSO Platform can now subscribe to the extension module from Climatiq and get the carbon footprint calculation instantly in the user interface of their tenant.

cost insight and carbon footprint in the Cisco FSO Platform

The implementation consisted in defining some new metrics (remember, it's all about OpenTelemetry that means Metrics, Events, Logs and Traces) and configuring the connection to the Climatiq API to send data back and forth. Existing metrics in the FSO Platform (CPU and memory usage, storage, and networking traffic) are sent to the calculation engine, that sends additional metrics back to be added to the original entities (e.g. to Kubernetes clusters, deployments, etc.).

So it's easy to navigate your assets in the User Interface to see their health state, as well as related costs and carbon footprint.

In addition to the emission generated by a single component, that you can roll up to the business transaction and the entire business service, you can also see the aggregated information for the entire company for the day and for last week as well as a projection of possible saving in the emissions if you implements the suggested actions.

carbon footprint exec dashboard in the Cisco FSO Platform

Sustainability is just a part of the visibility you want to have about your assets and your processes.

With Full stack Observability you can really model the view of the world that fits your business needs.

And the Cisco FSO Platform is one of the most complete - and extensible - way to collect and correlate the needed information.

July 28, 2023

Why Application Security is important (and complementary to perimeter security)?

Outstanding application security is foundational to a brand's reputation, creating and building trust and loyalty with users. But vulnerabilities can occur anytime, anywhere (in your code, in commercial applications, in libraries you've integrated and in remote API that you invoke), making it difficult and time-consuming to prioritize responses.

<Suggestion for people in a rush> If you only have 5 minutes, just scroll down and look at the amazing recorded demo: it explains everything better than the post itself </Suggestion for people in a rush>

Avoiding costly delays that can result in continuing damage to revenue and brand reputation means organizations must have clear visibility into each new vulnerability and the insights needed to prioritize remediation based on their business impact.

The traditional security schema, based on just protecting the perimeter with firewalls and IPS, is no longer sufficient. You need to protect the full stack, including all the software tiers.

Business Risk Observability

Speed and coordination are paramount when dealing with application security risks.

Bad actors can take advantage of gaps and delays between siloed security and application teams, resulting in costly and damaging consequences. Traditional vulnerability and threat scanning solutions lack the shared business context needed to rapidly assess risks and align teams based on potential business impact. To triage and align teams as fast as possible, teams need to know where vulnerabilities and threats impact their applications, how likely a risk is to be exploited, and how much business risk each issue presents.

One fundamental use case in Full-Stack Observability is business risk observability, supported by new levels of security intelligence capability that brings business context into application security. The new business risk scoring enables security and applications teams to have a greater threat visibility and intelligent business risk prioritization, so that they respond instantly to revenue-impacting security risks and reduce overall organizational risk profiles.

New Cisco Secure Application features and functionalities include business transaction mapping to understand how and where an attack may occur; threat intelligence feeds from Cisco Talos, Kenna, and Panoptica; and business risk scoring.

Business Transaction Mapping

New business transaction mapping locates how and where an attack may occur within common application workflows like ‘login, checkout, or complete payment’ so that ITOps and SecOps professionals can instantly understand the potential impact to your application and your bottom line.

Threat Intelligence Feeds

New threat intelligence feeds from Cisco Talos, Kenna, and Panoptica provide valuable risk scores from multiple sources to assess the likelihood of threat exploits

Business Risk Scoring (for Security Risk Prioritization)

New Business risk scoring combines threat and vulnerability intelligence, business impact and runtime behavior to identify the most pressing risks, avoiding delays, and speeding response across teams.

Video Demonstration of the Business Risk Observability use case

See a complete, explanatory demonstration of how a risk index associated to your business transactions allows to discover and remediate vulnerabilities with a proper priority assessment:

https://video.cisco.com/detail/video/6321988561112

July 14, 2023

Navigating relationships across monitored entities

I have described the Cisco FSO Platform as an extensible, developer friendly platform that can ingest all kinds of telemetry and is able to correlate those data into a meaningful insight.

But... what does it really mean? Some readers told me it's an abstract concept, they don't get how it relates to their daily job in IT Operations.

Let's define telemetry first: it is all the data that you can get from a running system, like a Formula 1 car running on the race track (speed, consumption, temperature, remaining fuel, etc.). Or from your IT systems, that include applications, infrastructure, cloud, network, etc. In this case, data come in the form of Metrics (any number you can measure), Events (something that happened at an instant in time), Logs (information written by a system somewhere) and Traces (description of the execution of a process).

This is the origin of the acronym MELT, that you see written on the walls these days. Everyone is excited by Observability, that is the ability to infer the internal state of the system by looking at its external signals (e.g. collecting MELT). Generally, Observability is realised within a domain: a consistent set of assets of the same type (technologies, devices, or business processes). Example: network monitoring, application performance monitoring (APM), etc.

The fun comes when you're able to correlate MELT to investigate the root cause of an issue, or to find spots for optimising either performance or cost, or to demonstrate business stakeholders that all the business KPI are OK thanks to the good job done by the IT Operations folks :-)

Even better when you're able to correlate MELT across different domains, to extend observability end-to-end. The entire business architecture is under control. You can navigate all the relationships that link the entities that are relevant in your monitoring, and see if any of those is affecting the global outcome (faults, bottlenecks, etc.).

Example: LinkedIn

One illuminating example for this type of navigation is the parallelism with the LinkedIn website, and the exploration of your network of contacts to find a specific person, or information about their professional role, their company, their activity.

Every IT professional I know has a profile on LinkedIn, and each of them generates information: they post articles or photos, they react to others' posts (either repost, or suggest/like them), they advertise events, they update their profile (this can be associated to generating MELT). In addition, everyone is connected to other people, so that you have 1st degree (direct) connections but also 2nd degree connections that you inherit from the 1st degree ones.

Click on the video below to see a graphical representation of the navigation across a network of connections on Linkedin, and the flow of information generated by each one of the people in the network.

Now you can imagine a similar network of logical connections among entities that you monitor with the Full Stack Observability platform. You can explore how they are related to each other, and how every one affects the behaviour and the outcome of the others.

In a typical IT scenario, the entities might be the navigation of a user in the software application that supports a digital service (a Business Transaction), a service, the Kubernetes cluster where the service is running, a K8s node, the server running the node (that might be a VM in the cloud), the network segment to connect to the cloud, the cost of cloud resources, the carbon footprint generated by the infrastructure.

Correlation

All the relationships among the monitored entities are explicitly shown in the user interface, and you can move your focus to another object and inspect it, accessing the current health state, its history, and all the Metrics, Events, Logs and Traces it has generated. This makes extremely easy to understand if an issue detected in one of the entities propagates to others, affecting the way they work.

Also the Health Rules that you can define for one entity could include the evaluation of related entities, so that you roll up warnings and awareness at the top level based on what supporting entities are doing.

In this screenshot I've highlighted the list of relationships in the panel on the left side, with a green dashed line. That list continues, so scrolling down you would also see Workloads, Pods, Containers, Hosts, Configurations, Persistent Volume Claims, Ingresses, Load Balancers and Teams (yes, the organisational teams that are responsible for this cluster). The number on each entity type shows how many objects of that type are related to the one (the K8s cluster) that is currently in focus in the central pane.

Though we have information about all the entities in the system, all the objects that are not in direct relationship with the entity in focus are automatically hidden in the list, to remove what we call the "background noise". Showing only what really matters increases focus, and makes the investigation easier. You can click, let's say, on the two Business Transactions (luckily in this example both are in green health state) to see what business processes would be impacted by a problem occurring in this K8s cluster.

Of course, scrolling down we would see in the central panel all the information available about this cluster, including all the MELT it has generated in the time interval under investigation (see the options below).

What I have described in this post is just the basic capabilities of the Cisco FSO Platform. You can find the full detail in the official documentation.

In next posts, I'll explain the most relevant use cases and the impact that Full Stack Observability can have on your business.

July 8, 2023

FSO Platform: see everything, correlate everything

The Cisco Full Stack Observability Platform

Cisco has been the first vendor to offer a end-to-end observability solution, based on complementary products that are integrated into each other. The use cases described in my previous post are served by a combination of AppDynamics and ThousandEyes, with information fed by first class security system as Talos, Kenna and Panoptica (more in next posts).

Even if another vendor had such an extensive coverage (and they have not), they would not be integrated out of the box. The native integration enhances the power of each product (Applications Ops see also the network, Network Ops see also the applications, Security Ops see everything, everybody get the business context) and saves a lot of time and effort that a custom integration would require.

But we think this is not enough.

Some companies are already very advanced in their journey to Observability. They have already adopted advanced solutions from APM vendors (including Cisco and competitors), network monitoring and cloud services monitoring. Some have built sophisticated home grown systems for Observability and AIOps.

They might find that the predefined view of the world that is implemented in traditional APM solutions is not enough. Entities like an Application, a Service, a Business Transaction and their relationship might not be sufficient to describe their business domain, or a technical domain that is more complicated than common architectures. They would like to extend the domain model, but they can't because the solution has not been designed for extensibility.

Extensibility of the Observability solution

What they are looking for is the possibility to extend their visibility, and the possibility to correlate collected information to describe what's relevant for them.

Here comes the Cisco FSO Platform.

The Cisco FSO Platform is an open, extensible, API driven platform that empowers a new observability ecosystem for organizations. It is a unified platform built on OpenTelemetry (an open source project by CNCF) and anchored on metrics, events, logs and traces (MELT), enabling extensibility from queries to data models with a composable UI framework.

Cisco FSO Platform is a developer friendly environment to build your own view of the world.

You can tailor the Full Stack Observability to your business domain, or to your technical domain, defining the entities that are relevant for your stakeholders and the relationships that tie them. From business processes to every asset included in your architecture: applications, infrastructure, cloud, network, IoT and business data sources.

Creating a series of connections that you can navigate to fully control what's going on, as you do on Linkedin exploring a people's network and the information they generate (see next post for an example). All based on telemetry that you can collect from virtually everything: Metrics, Events, Logs and Traces. A new open standard, OpenTelemetry (supported by vendors and by the open source community), defines the way data are collected and ingested. These data feed the domain model, and you can later use them to investigate about the root cause of any issue, or to report about the business health state, or to look for opportunities to improve the efficiency.

The Cisco FSO platform is a differentiated solution that brings data together from multiple domains such as application, networking, infrastructure, security, cloud and business sources. Users can get correlated insights that reduce time to resolve issues and optimize experiences; while Partners, ISVs, and software developers can now build meaningful FSO applications enabling new use cases.

So there are alternative solutions for Full Stack Observability?

In their evolution from traditional monitoring, organizations go through some maturity steps. It's not a revolution in one day.

Someone starts replacing individual tools with more complete solutions that unify the visualization of collected metrics from different technical domains. Others start correlating those data with business metrics and KPI. Then they extend the observability to - really - the full stack.

For all those, the solution that I started describing in my previous post provides an excellent value. The seven use cases I've mentioned are completely supported by the Cisco FSO solution based on the integration of Appdynamics, ThousandEyes and the security ecosystem. It's well integrated and offers the various operations teams access to deep visibility as well as shared business context.

Some organizations are already in a more advanced state. They have already realized the Full Stack Observability, either adopting the Cisco solution or a competing one, or growing a AIOps system in house. But they feel that they need more, because their business domain (or parts of their technical domain) is not completely covered by the solution they have.

Thanks to the Cisco FSO Platform, that is extensible and developers friendly, they can build the needed extension themselves (or can have a look at the Cisco FSO App Exchange). This powerful engine, that backs all the Cisco FSO products, will allow those organizations to ingest telemetry from virtually every asset and to show correlated data based on their desired view of the world.

So finally we have two parallel motions, that don't conflict necessarily. The adoption of one or the other depends on your current observability maturity level and specific need for tailored dashboards.

In next post I will show a parallelism between the navigation across your LinkedIn network of contacts and the navigation through connected entities in the FSO Platform to search for the root cause of an issue by exploring Metrics, Events, Logs and traces associated to each entity.

Subsequently, I will describe fundamental use cases like Business Risk Observability.

June 29, 2023

Full Stack Observability use cases

Business Use Cases

Full Stack Observability is all about collecting any possible data from the applications running your digital services (i.e. business KPI) and from the infrastructure and cloud resources supporting them (i.e. the telemetry), including potentially also IoT, robots or whatever device involved in the process.

And then correlating those data to create an actionable insight, so that you have full control of your business processes end-to-end and you do better than your competitors (faster, more reliable, more appealing processes and services).

The FSO value proposition is not only related to technology (the infrastructure that you can monitor and the metrics you can read). It is a business value proposition, because observability has an immediate impact on the business outcomes.

Associating business processes, and digital services supporting those, with the health state of the infrastructure gives the Operations teams an immediate and objective measure of the value - or the troubles - that IT provides to their internal clients, that are the lines of business (LOB). And LOB managers can enjoy dedicated dashboards that show how the business is doing, highlighting all the key performance indicators (KPI) that are relevant for each persona in the organization.

If there is any slowdown in the business, they see it instantly and can eventually relate it to a technical problem, or maybe to the release of a new version of a software application, or to the launch of a new marketing campaign. The outcome of any action and of any incident is connected to the business with... no latency. The same visibility is also useful when the business shows a better performance than the day before. You can relate outcomes to actions and events.

So, before speaking about the technology that supports the Full Stack Observability, let's discuss about the use cases and their impact.

We can group the use cases in three categories: Observe, Secure and Optimize (referred to your end-to-end business architecture).

In the Observe category, we have 4 fundamental use cases:

- Hybrid application monitoring

This refers to every application running on Virtual Machines, in any combination of your Data Center and Public Clouds, or on bare metal servers.

You can relate the business KPI (users served, processes completed, amount of money, etc.) to the health state of the software applications and the infrastructure. You can identify the root cause of any problem and relate it to the business transactions (= user navigation for a specific process) that are affected.

- Cloud native application monitoring

Same as the previous use case, but referred to applications designed based on cloud native patterns (e.g. microservices architecture) that run on Kubernetes or Openshift. Regardless it's on premises, in cloud, or in a hybrid scenario. Traditional APM solutions were not so strong on this use case, because they were designed for older architectures.

- Customer digital experience monitoring

Here the focus is on the experience from the end user perspective, that is affected by the performance of both the applications and the infrastructure, but also - and mostly - by the network. Network problems can eventually affect the response time and the reliability of the service because the end user needs to reach the end point where the application is run (generally a web server), the front end needs to communicate with the application components distributed everywhere, and these may be invoking remote API exposed by a business partner (e.g. a payment gateway or any B2B service).

- Application dependency monitoring

In this use case you want to assure the performance of managed and unmanaged (third-party) application services and APIs, including performance over Internet and cloud networks to reach those services. Visibility of network performance and availability, including both public networks and yours, is critical to resolve issues and to push service providers to respect the SLA of the contract.

In the Secure category, we can discuss the Business Risk Observability use case:

- Application security

Reduce business risk by actively identifying and blocking against vulnerabilities found in application runtimes in production. Associate vulnerabilities with the likelihood that they are exploited in your specific context, so that you can prioritize the suggested remediation actions based on the business impact (shown by the association of vulnerabilities with Business Transactions).

In the Optimize category, we have the following use cases:

- Hybrid cost optimization

Lower costs by only paying for what you need in public cloud and by safely increasing utilization of on—premises assets.

- Application resource optimization

Improve and assure application performance by taking the guesswork out of resource allocation for workloads on—premises and in the public cloud.

Observability and network intelligence coming together

The use cases listed above goes beyond the scope of traditional APM solutions (Application Performance Monitoring) because they require to extend the visibility to every segment of the network. This picture shows an example of possible issues that can affect the end user experience, and need to be isolated and remediated to make sure the user is happy.

That is generally difficult, and requires a number of subject matter experts in different domains, and a number of tools. Very few vendors can offer all the complementary solutions that give you visibility on all aspects of the problem. And, of course, they are not integrated (vertical, siloed monitoring).

Data-driven bi-directional integration

The Full Stack Observability solution from Cisco, instead, covers all the angles and - in addition - it does so in a integrated fashion. The APM tool (AppDynamics) and the Network Monitoring tool (ThousandEyes) are integrated bidirectionally through their API (out of the box, no custom integration is required).

The visibility provided by one tool is greatly enhanced by data coming from the other tool, that are correlated automatically and shown in the same console.

So, if you're investigating about a business transaction, you don't see just the performance of the software stack and its distributed topology, but also the latency, packet loss, jitter and more network metrics in the same context (exactly in the network segments that impact the traffic for that single business transaction, at that instant in time).

Similarly, if you're looking at a network, you immediately know what applications and business transaction would be affected if it fails or slows down. And automated tests can be generated to monitor the networks and the end points, that are created automatically from the topology of the application that the APM tool has discovered.

Exciting times are coming, the Operations teams can expect their life to be much easier when they start adopting a Full stack Observability approach. More detail in next posts...

June 14, 2023

Changing the focus of this blog: now... Observability

My previous post about Infrastructure as Code concludes the exploration of Data Center and Cloud solutions and the methodologies that are related to automation and optimization of IT processes.

I've been working in this area for 15 years, after spending the previous 15 in software development.
It's been an amazing adventure and I really enjoied learning new things, exploring and challenging some limits - and sharing the experience with you.

Now I start focusing on a new adventure... possibly for the next 15 years 😜

I assumed a new professional role, that is the technical lead for Full Stack Observability, EMEA sales, at Cisco Appdynamics. From now on, I will tell you stories about my experience with the ultimate evolution of monitoring: it's all about collecting telemetry from every single component of your business architecture, including digital services (= distributed software applications), computing infrastructure, network, cloud resources, IoT, etc.

It's not just putting all those data together, but correlating them to create an insight. Transforming raw data into information about the health state of your processes, matching business KPI with the state of the infrastructure that supports the services.

To visualize that information and to navigate it, you can subscribe to (or create your own) different domain models, that are views of the world built specifically for each stakeholder: from lines of business to applications managers, from SRE to network operations and security teams...

A domain model is made of entities and their relationships, where entities represent what is relevant in your (business or technical) domain. They might be the usual entities in a APM domain (applications, services, business transactions...) or infrastructure entities (servers, VM, clusters, K8s nodes, etc.).
You can also model entities in a business domain (e.g. trains, stations, passengers, tickets, etc.).

Unlike Application Performance Monitoring (APM), where solutions like Appdynamics and its competitors excel in drilling down in the application architecture and its topology, with Full Stack Observability you really have full control end-to-end and have a context shared among all the teams that collaborate at building, running and operating the business ecosystem.

New standards like OpenTelemetry make it easy to convey Metrics, Events, Logs and Traces (MELT) to a unique platform from every single part of the system, including eventually robots in manufacturing, GPS tracing from your supply chain, etc.

All these data will be filtered according to the domain model and those that are relevant will feed the databases containing information about the domain entities and their relationships, that are used to populate the dashboard.

Those data will be matched with any other source of information that is relevant in your business domain (CRM, sales, weather forecast, logistics...) so that you can analyse and forecast the health of the business and relate it to the technologies and the processes behind. You can immediately remediate any problem because you detect the root cause easily, and even be proactive in preventing problems before they occur (or before they are perceived by end users). At the same time, you are able to spot opportunities for optimising the preformances and the cost efficiency of the system.

To see what is the official messaging from Cisco about the Full Stack Observability, check this page describing the FSO Platform.

Stay tuned, interesting news are coming...

June 9, 2023

Infrastructure as Code: making it easy with Nexus as Code

In previous posts I've described the advantage provided by managing the infrastructure the same way developers manage the application code.

Infrastructure as Code means using the same toolset (version control systems, pipeline orchestrators, automated provisioning) and same processes for building, integrating, testing and releasing the system that are used in the release cycle of a software application. This approach has a positive impact on speed, reliability and security end to end.

Together with Ansible, Terraform is one of the most used tools in the automated provisioning space, and many organizations use it when they adopt Infrastructure as Code. The availability of plugins (Terraform Providers) for almost every possible target (physical and virtual servers, network and storage, cloud services, etc.) makes it a common platform for automation: a "de facto" standard.

As many other technology vendors, Cisco offers Terraform Providers wrapping the API of their products, especially for Data Center and Cloud technologies. The Nexus family of switches, that includes the ACI fabric architecture, makes no exception. You can provision and manage the ACI fabric easily with Terraform (as well as with Ansible), and many examples and reusable assets are available at DevNet.

Generally, Terraform Providers surface the object model of the target system so that resources and the their relationships can be managed easily in a configuration plan, representing the desired state of the system. You need to understand how that particular system works and, in some cases, to manage the relationships among managed objects identifiers explicitly.

This is an example of creating a tenant in ACI, and a VRF contained in it:

Some engineers find this object model, and the use of the HCL (Hashicorp Configuration Language), easy and comfortable. Others, maybe due to a limited experience, would prefer an easier syntax and simpler object model.

For this reason Cisco has created a module called Nexus as Code, that sits on top of the standard ACI provider for Terraform, hiding the perceived complexity and offering a simplified object model. The objects that are contained in each other are simply nested and represented in a way that's very close to the conceptual representation of the logical architecture (represented by the following picture)

Nexus as Code can be seen as a (optional) component in the Terraform solution to automate ACI and other network controllers from Cisco.

Using a configuration language as simple as YAML, nesting is represented with indentation in Nexus as Code. This example corresponds to the HCL snippet above:

This format is particularly suitable for copy/paste operations, that make it easy to clone and modify a template so it is ready for a new project.

If you start from the example above, simply copying one line you can have one more VRF created and contained in the same tenant. Definitely simpler that doing the same in a HCL file, and encouraging for a network engineer the first time he/she uses Terraform.

Everything you need is a folder to store one or more YAML file defining the desired state of the ACI fabric, and the installation of the Terraform binary file (free download from here). After that, you will just use the following two commands:

terraform init (that makes sure that the needed providers are installed, and eventually downloads them automatically)

terraform apply (that reads the input, evaluates changes required to align the state of the target fabric to the desired state, then call the API of the ACI controller)

when you confirm the apply, you will see the log of the execution and finally the message will tell that the job is done.

I believe that Nexus as Code is a powerful tool that may help engineers to approach the IaC (Infrastructure as Code) methodology easier, with no stress due to learning new complex technologies and tools.

Being based on standard, open-source tools, it does not introduce any lock in with Cisco technologies.
It simply translates easy-to-manipulate YAML files, that describe your desired state, into plain Terraform plans that are executed automatically.

So you can start adopting the same tools and same processes that developers use in building, integrating, testing and releasing the system, obtaining the same benefit in terms of speed, consistency, security and self-documentation.

Don't be shy, start today to experiment and see how easy it is 😜

Pages