June 29, 2023

Full Stack Observability use cases

Business Use Cases

Full Stack Observability is all about collecting all available data from the applications running your digital services (i.e. business KPIs) and from the infrastructure and cloud resources supporting them (i.e. the telemetry), potentially including IoT devices, robots or any other device involved in the process.

The next step is correlating those data to create actionable insight, so that you have full control of your business processes end-to-end and do better than your competitors (faster, more reliable, more appealing processes and services).

The FSO value proposition is not only related to technology (the infrastructure that you can monitor and the metrics you can read). It is a business value proposition, because observability has an immediate impact on the business outcomes.


Associating business processes, and the digital services supporting them, with the health state of the infrastructure gives the Operations teams an immediate and objective measure of the value (or the trouble) that IT provides to its internal clients, the lines of business (LOB). And LOB managers get dedicated dashboards that show how the business is doing, highlighting the key performance indicators (KPIs) that are relevant for each persona in the organization.

If there is any slowdown in the business, they see it instantly and can relate it to a technical problem, or perhaps to the release of a new version of a software application, or to the launch of a new marketing campaign. The outcome of any action and of any incident is connected to the business with... no latency. The same visibility is also useful when the business performs better than the day before. You can relate outcomes to actions and events.

So, before speaking about the technology that supports Full Stack Observability, let's discuss the use cases and their impact.

We can group the use cases into three categories: Observe, Secure and Optimize (all applied to your end-to-end business architecture).




In the Observe category, we have 4 fundamental use cases:

- Hybrid application monitoring

This refers to every application running on Virtual Machines, in any combination of your Data Center and Public Clouds, or on bare metal servers.

You can relate the business KPIs (users served, processes completed, amount of money, etc.) to the health state of the software applications and the infrastructure. You can identify the root cause of any problem and relate it to the business transactions (= user navigation for a specific process) that are affected.

- Cloud native application monitoring

Same as the previous use case, but applied to applications designed according to cloud native patterns (e.g. microservices architecture) that run on Kubernetes or OpenShift, regardless of whether they run on premises, in the cloud, or in a hybrid scenario. Traditional APM solutions were not so strong on this use case, because they were designed for older architectures.

- Customer digital experience monitoring

Here the focus is on the experience from the end user's perspective, which is affected by the performance of both the applications and the infrastructure, but also (and mostly) by the network. Network problems can affect the response time and the reliability of the service at several points: the end user needs to reach the endpoint where the application runs (generally a web server), the front end needs to communicate with application components distributed everywhere, and these components may invoke remote APIs exposed by a business partner (e.g. a payment gateway or any B2B service).

- Application dependency monitoring

In this use case you want to assure the performance of managed and unmanaged (third-party) application services and APIs, including performance over the Internet and the cloud networks used to reach those services. Visibility of network performance and availability, for both public networks and your own, is critical to resolve issues and to push service providers to respect the SLA of the contract.

In the Secure category, we can discuss the Business Risk Observability use case:

- Application security

Reduce business risk by actively identifying vulnerabilities found in application runtimes in production and blocking attempts to exploit them. Associate vulnerabilities with the likelihood that they are exploited in your specific context, so that you can prioritize the suggested remediation actions based on the business impact (shown by the association of vulnerabilities with Business Transactions).
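To make the idea of prioritizing by business impact more concrete, here is a minimal Python sketch. It assumes each vulnerability carries a severity score, an estimated exploit likelihood and the list of Business Transactions it touches. The field names, the weights and the scoring formula are all illustrative assumptions, not the algorithm used by any product.

```python
from dataclasses import dataclass


@dataclass
class Vulnerability:
    vuln_id: str                # placeholder identifier, not a real CVE
    severity: float             # 0-10, a CVSS-like base score (assumed)
    exploit_likelihood: float   # 0-1, likelihood of exploitation in this context (assumed)
    transactions: list[str]     # Business Transactions touching the vulnerable component


def risk_score(vuln, transaction_weight):
    """Severity x likelihood x weight of the most important affected transaction."""
    impact = max((transaction_weight.get(t, 0.0) for t in vuln.transactions), default=0.0)
    return vuln.severity * vuln.exploit_likelihood * impact


def prioritize(vulns, transaction_weight):
    """Return vulnerabilities ordered from highest to lowest business risk."""
    return sorted(vulns, key=lambda v: risk_score(v, transaction_weight), reverse=True)


# Hypothetical business weights per transaction (1.0 = most critical).
weights = {"checkout": 1.0, "login": 0.8, "newsletter-signup": 0.1}

findings = [
    Vulnerability("VULN-A", severity=9.8, exploit_likelihood=0.05, transactions=["newsletter-signup"]),
    Vulnerability("VULN-B", severity=7.5, exploit_likelihood=0.60, transactions=["checkout", "login"]),
    Vulnerability("VULN-C", severity=5.0, exploit_likelihood=0.90, transactions=["login"]),
]

ranked = prioritize(findings, weights)
for v in ranked:
    print(v.vuln_id, round(risk_score(v, weights), 2))
```

In this made-up data set the vulnerability with the highest raw severity ends up last, because it is unlikely to be exploited and only touches a low-value transaction.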

In the Optimize category, we have the following use cases:

- Hybrid cost optimization

Lower costs by only paying for what you need in public cloud and by safely increasing utilization of on-premises assets.

- Application resource optimization

Improve and assure application performance by taking the guesswork out of resource allocation for workloads on-premises and in the public cloud.


Observability and network intelligence coming together

The use cases listed above go beyond the scope of traditional APM (Application Performance Monitoring) solutions because they require extending visibility to every segment of the network. This picture shows an example of possible issues that can affect the end user experience and need to be isolated and remediated to make sure the user is happy.



That is generally difficult, and requires a number of subject matter experts in different domains, and a number of tools. Very few vendors can offer all the complementary solutions that give you visibility on all aspects of the problem. And the tools are typically not integrated with each other (vertical, siloed monitoring).

Data-driven bi-directional integration 

The Full Stack Observability solution from Cisco, instead, covers all the angles and, in addition, does so in an integrated fashion. The APM tool (AppDynamics) and the Network Monitoring tool (ThousandEyes) are integrated bidirectionally through their APIs (out of the box, no custom integration is required).


The visibility provided by one tool is greatly enhanced by data coming from the other tool, which are correlated automatically and shown in the same console.

So, if you're investigating a business transaction, you don't see just the performance of the software stack and its distributed topology, but also the latency, packet loss, jitter and other network metrics in the same context (exactly in the network segments that impact the traffic for that single business transaction, at that instant in time).
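The kind of correlation described here can be sketched in a few lines of Python: given one slow business transaction, select the network measurements taken on the segments it traversed, close to the same instant. The data classes, field names and the 60-second window are assumptions made for this illustration; they are not the data model or the API of AppDynamics or ThousandEyes.

```python
from dataclasses import dataclass


@dataclass
class TransactionSample:
    name: str            # business transaction name
    timestamp: int       # seconds since epoch
    response_ms: float
    segments: list[str]  # network segments the traffic traversed (assumed known)


@dataclass
class NetworkSample:
    segment: str
    timestamp: int
    latency_ms: float
    packet_loss_pct: float
    jitter_ms: float


def network_context(txn, network_samples, window_s=60):
    """Network samples on the transaction's segments within +/- window_s seconds."""
    return [
        s for s in network_samples
        if s.segment in txn.segments and abs(s.timestamp - txn.timestamp) <= window_s
    ]


slow_checkout = TransactionSample(
    "checkout", 1_000_000, 4200.0, ["branch-to-dc", "dc-to-payment-api"]
)

samples = [
    NetworkSample("branch-to-dc", 1_000_010, 18.0, 0.0, 1.2),
    NetworkSample("dc-to-payment-api", 999_990, 350.0, 4.5, 40.0),
    NetworkSample("dc-to-payment-api", 990_000, 25.0, 0.0, 2.0),   # outside the time window
    NetworkSample("office-wifi", 1_000_000, 90.0, 1.0, 15.0),      # segment not on the path
]

context = network_context(slow_checkout, samples)
worst = max(context, key=lambda s: s.latency_ms)
print(f"{slow_checkout.name}: worst segment is {worst.segment} ({worst.latency_ms} ms)")
```

The point of the sketch is the join on two keys, network segment and time: that is what puts the network metrics "in the same context" as the transaction.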

Similarly, if you're looking at a network, you immediately know which applications and business transactions would be affected if it fails or slows down. And the automated tests that monitor the networks and the endpoints can be generated automatically from the application topology that the APM tool has discovered.

Exciting times are coming: the Operations teams can expect their life to be much easier when they start adopting a Full Stack Observability approach. More details in the next posts...


June 14, 2023

Changing the focus of this blog: now... Observability

My previous post about Infrastructure as Code concludes the exploration of Data Center and Cloud solutions and the methodologies that are related to automation and optimization of IT processes.

I've been working in this area for 15 years, after spending the previous 15 in software development. 
It's been an amazing adventure and I really enjoyed learning new things, exploring and challenging some limits - and sharing the experience with you.

Now I start focusing on a new adventure... possibly for the next 15 years 😜

I have taken on a new professional role: technical lead for Full Stack Observability, EMEA sales, at Cisco AppDynamics. From now on, I will tell you stories about my experience with the ultimate evolution of monitoring: it's all about collecting telemetry from every single component of your business architecture, including digital services (= distributed software applications), computing infrastructure, network, cloud resources, IoT, etc.

It's not just about putting all those data together, but about correlating them to create insight: transforming raw data into information about the health state of your processes, and matching business KPIs with the state of the infrastructure that supports the services.

To visualize that information and to navigate it, you can subscribe to (or create your own) different domain models, which are views of the world built specifically for each stakeholder: from lines of business to application managers, from SRE to network operations and security teams...

A domain model is made of entities and their relationships, where entities represent what is relevant in your (business or technical) domain. They might be the usual entities in an APM domain (applications, services, business transactions...) or infrastructure entities (servers, VMs, clusters, K8s nodes, etc.).
You can also model entities in a business domain (e.g. trains, stations, passengers, tickets, etc.).
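As a rough illustration of what "entities and their relationships" means, here is a minimal Python sketch of a domain model held in memory. The class names, entity kinds and relation names are hypothetical and chosen only for this example; a real observability platform defines its own schema for this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str   # e.g. "service", "k8s_node", "train" (illustrative kinds)
    name: str


@dataclass
class DomainModel:
    entities: set = field(default_factory=set)
    relationships: list = field(default_factory=list)  # (source, relation, target) triples

    def relate(self, source, relation, target):
        """Register both entities and the relationship between them."""
        self.entities.update({source, target})
        self.relationships.append((source, relation, target))

    def related(self, source, relation):
        """Targets reachable from `source` through the given relation."""
        return [t for s, r, t in self.relationships if s == source and r == relation]


model = DomainModel()

# Technical domain: a service and the node it runs on.
checkout = Entity("service", "checkout")
node = Entity("k8s_node", "node-1")
model.relate(checkout, "runs_on", node)

# Business domain: trains, stations and tickets live in the same structure.
train = Entity("train", "IC-501")
model.relate(train, "stops_at", Entity("station", "Milano Centrale"))
model.relate(train, "stops_at", Entity("station", "Roma Termini"))
model.relate(Entity("ticket", "T-0001"), "valid_on", train)

print([station.name for station in model.related(train, "stops_at")])
```

Technical and business entities share the same representation, which is what allows a dashboard to navigate from a business object to the infrastructure behind it.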




Unlike Application Performance Monitoring (APM), where solutions like AppDynamics and its competitors excel at drilling down into the application architecture and its topology, with Full Stack Observability you really have full control end-to-end and have a context shared among all the teams that collaborate in building, running and operating the business ecosystem.

New standards like OpenTelemetry make it easy to convey Metrics, Events, Logs and Traces (MELT) to a single platform from every part of the system, potentially including robots in manufacturing, GPS tracking from your supply chain, etc.

All these data will be filtered according to the domain model, and those that are relevant will feed the databases containing information about the domain entities and their relationships, which are used to populate the dashboards.
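The filtering step can be pictured with a small Python sketch: incoming records are kept only if they refer to an entity kind that the domain model knows about, and are then grouped per entity. The records are simplified dictionaries invented for this example (they do not follow the OpenTelemetry data model), and the function and variable names are hypothetical.

```python
from collections import defaultdict

# Entity kinds that this (hypothetical) domain model cares about.
MODEL_ENTITY_KINDS = {"service", "k8s_node"}


def ingest(records, entity_kinds):
    """Keep records whose entity kind is in the model and group them by entity."""
    store = defaultdict(list)
    for rec in records:
        if rec["entity_kind"] in entity_kinds:
            store[(rec["entity_kind"], rec["entity_name"])].append(rec)
    return dict(store)


records = [
    {"type": "metric", "entity_kind": "service", "entity_name": "checkout",
     "name": "response_ms", "value": 420},
    {"type": "log", "entity_kind": "service", "entity_name": "checkout",
     "body": "payment timeout"},
    {"type": "metric", "entity_kind": "printer", "entity_name": "floor-2",
     "name": "toner_pct", "value": 12},   # not part of the model: dropped
    {"type": "event", "entity_kind": "k8s_node", "entity_name": "node-1",
     "body": "NodeNotReady"},
]

store = ingest(records, MODEL_ENTITY_KINDS)
for entity, items in store.items():
    print(entity, [item["type"] for item in items])
```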




Those data will be matched with any other source of information that is relevant in your business domain (CRM, sales, weather forecast, logistics...) so that you can analyse and forecast the health of the business and relate it to the technologies and the processes behind it. You can immediately remediate any problem because you detect the root cause easily, and even be proactive in preventing problems before they occur (or before they are perceived by end users). At the same time, you are able to spot opportunities for optimising the performance and the cost efficiency of the system.

To see Cisco's official messaging about Full Stack Observability, check this page describing the FSO Platform.

Stay tuned, interesting news is coming...


June 9, 2023

Infrastructure as Code: making it easy with Nexus as Code

In previous posts I've described the advantages of managing the infrastructure the same way developers manage the application code.

Infrastructure as Code means using the same toolset (version control systems, pipeline orchestrators, automated provisioning) and the same processes for building, integrating, testing and releasing the system that are used in the release cycle of a software application. This approach has a positive impact on speed, reliability and security end to end.


Together with Ansible, Terraform is one of the most used tools in the automated provisioning space, and many organizations use it when they adopt Infrastructure as Code. The availability of plugins (Terraform Providers) for almost every possible target (physical and virtual servers, network and storage, cloud services, etc.) makes it a common platform for automation: a "de facto" standard.

Like many other technology vendors, Cisco offers Terraform Providers wrapping the APIs of its products, especially for Data Center and Cloud technologies. The Nexus family of switches, which includes the ACI fabric architecture, is no exception. You can provision and manage the ACI fabric easily with Terraform (as well as with Ansible), and many examples and reusable assets are available at DevNet.

Generally, Terraform Providers surface the object model of the target system so that resources and their relationships can be managed easily in a configuration plan, representing the desired state of the system. You need to understand how that particular system works and, in some cases, to manage the relationships among managed object identifiers explicitly.

This is an example of creating a tenant in ACI, and a VRF contained in it:



Some engineers find this object model, and the use of HCL (HashiCorp Configuration Language), easy and comfortable. Others, maybe due to limited experience, would prefer an easier syntax and a simpler object model.

For this reason Cisco has created a module called Nexus as Code, which sits on top of the standard ACI provider for Terraform, hiding the perceived complexity and offering a simplified object model. The objects that are contained in each other are simply nested and represented in a way that's very close to the conceptual representation of the logical architecture (represented by the following picture).


Nexus as Code can be seen as an (optional) component in the Terraform solution to automate ACI and other network controllers from Cisco.



Using a configuration language as simple as YAML, nesting is represented with indentation in Nexus as Code. This example corresponds to the HCL snippet above:


This format is particularly suitable for copy/paste operations, which make it easy to clone and modify a template so it is ready for a new project.

If you start from the example above, by simply copying one line you can have one more VRF created and contained in the same tenant. That is definitely simpler than doing the same in an HCL file, and encouraging for network engineers the first time they use Terraform.
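The difference between the two object models can be sketched in Python. A nested structure (here a plain dictionary standing in for the YAML file) is expanded into flat resources, each child carrying an explicit reference to its parent, which is roughly what you would otherwise write by hand. The key names (`tenants`, `vrfs`), the reference format and the `flatten` function are assumptions for this illustration; this is not how the Nexus as Code module is implemented.

```python
def flatten(desired_state):
    """Expand nested tenants/VRFs into flat resources with explicit parent references."""
    resources = []
    for tenant in desired_state.get("tenants", []):
        tenant_ref = f"tenant/{tenant['name']}"   # hypothetical identifier format
        resources.append({"type": "tenant", "name": tenant["name"]})
        for vrf in tenant.get("vrfs", []):
            # In the nested form the parent is implied by position;
            # in the flat form it must be spelled out on every child.
            resources.append({"type": "vrf", "name": vrf["name"], "parent": tenant_ref})
    return resources


# Nested desired state: adding a VRF means adding one entry under "vrfs".
desired_state = {
    "tenants": [
        {"name": "demo", "vrfs": [{"name": "vrf-a"}, {"name": "vrf-b"}]},
    ]
}

resources = flatten(desired_state)
for res in resources:
    print(res)
```

Adding a third VRF to the nested input is a one-line change, while the flat output grows by a full resource that has to repeat the reference to its tenant.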

All you need is a folder containing one or more YAML files defining the desired state of the ACI fabric, and the Terraform binary installed (free download from here). After that, you will just use the following two commands:

terraform init (makes sure that the required providers are installed, downloading them automatically if needed)


terraform apply (reads the input, evaluates the changes required to align the state of the target fabric with the desired state, then calls the API of the ACI controller)



When you confirm the apply, you will see the execution log and, finally, a message telling you that the job is done.



I believe that Nexus as Code is a powerful tool that may help engineers approach the IaC (Infrastructure as Code) methodology more easily, without the stress of learning complex new technologies and tools.

Being based on standard, open-source tools, it does not introduce any lock-in with Cisco technologies.
It simply translates easy-to-manipulate YAML files, which describe your desired state, into plain Terraform plans that are executed automatically.

So you can start adopting the same tools and the same processes that developers use in building, integrating, testing and releasing the system, obtaining the same benefits in terms of speed, consistency, security and self-documentation.

Don't be shy, start today to experiment and see how easy it is 😜