
August 25, 2023

Carbon footprint of your full stack

This post discusses the following subjects:
  • What is an ESG strategy
  • You can't control what you don't measure
  • How to measure the full stack carbon footprint
  • A solution from Climatiq
  • Extensibility of the Cisco FSO Platform
  • Just one month to build the integration



What is an ESG strategy

In the context of information technology (IT), an ESG strategy involves applying the principles of Environmental, Social, and Governance factors specifically to the IT industry and the use of technology within organizations. Here's how each element applies to IT:

1. Environmental (E): In IT, the environmental aspect focuses on the impact of technology on the environment. This includes evaluating the energy consumption of data centers and IT infrastructure, as well as assessing the carbon footprint associated with digital operations and electronic waste management. Companies can adopt environmentally friendly practices by using renewable energy for data centers, implementing energy-efficient hardware and software solutions, and responsibly recycling electronic equipment.

2. Social (S): In the social dimension of IT, the focus is on how technology affects people and communities. This involves considering the ethical use of data, ensuring data privacy and security, promoting digital inclusion, and addressing issues like digital divide and accessibility. Socially responsible IT companies prioritize the protection of user data, work to bridge the digital gap, and develop inclusive technologies that cater to a diverse range of users.

3. Governance (G): Governance in IT pertains to how technology is managed and governed within organizations. This includes assessing the transparency of IT decision-making, the adherence to IT policies and regulations, and the alignment of IT strategies with broader business goals. Good IT governance ensures that technology is used responsibly and ethically and that it aligns with the overall values and objectives of the organization.

By incorporating ESG principles into IT strategies, organizations aim to become more environmentally conscious, socially responsible, and ethically governed in their technological practices. This not only helps businesses reduce their environmental impact and enhance their reputation but also contributes to a more sustainable and inclusive digital world. Additionally, investors in the IT sector increasingly consider ESG factors when evaluating companies' performance and long-term viability.

The purpose of an ESG Strategy is to demonstrate the environmental, social, and governance factors that your organisation believes to be intrinsically important to consider within your current and future business operations.

This post focuses on the "E" in ESG.

You can't control what you don't measure

Once the ESG plan is up and running, it's time to start collecting data toward key performance indicators (KPIs). ESG processes benefit businesses because they provide objective metrics that prove the success of social responsibility efforts. Use the data you gather to track KPIs, measuring success along the way.




How to measure the full stack carbon footprint

Measuring the carbon footprint from the full IT stack involves assessing the environmental impact of various components and processes within the IT infrastructure. Here's a step-by-step guide to help you get started:

1. Identify Components: First, make a list of all the components within your IT stack. This typically includes data centers, servers, network devices, storage systems, end-user devices (e.g., laptops, desktops, smartphones), and any other IT-related equipment.

2. Energy Consumption: Measure the energy consumption of each component. This can be done using energy monitoring tools, power meters, or data provided by equipment manufacturers. Take into account both the direct energy usage (electricity consumed by the IT equipment) and indirect energy usage (cooling systems, ventilation, etc.).

3. Data Center Efficiency: If you have data centers, assess their efficiency. PUE (Power Usage Effectiveness) is a common metric used to measure data center energy efficiency. It's calculated as the total facility energy consumption divided by the IT equipment energy consumption.

4. Virtualization and Utilization: Analyze the virtualization rate and utilization of servers and other hardware. Virtualization allows running multiple virtual machines on a single physical server, which can lead to better resource utilization and energy efficiency.

5. Cloud Services: If your organization uses cloud services, consider the energy consumption of these services. Cloud providers often publish environmental reports that provide insights into their sustainability efforts.

6. Software Efficiency: Evaluate the efficiency of the software applications and services running in your IT stack. Energy-efficient software design and coding practices can help reduce energy consumption.

7. Telecommuting and Travel: Take into account the energy consumption associated with telecommuting (remote work) and business travel when using IT resources. These factors can impact the carbon footprint indirectly.

8. Data Transmission: Assess the energy used for data transmission over networks, including local area networks (LANs) and wide area networks (WANs).

9. Emission Factors: Convert energy consumption data into greenhouse gas (GHG) emissions using emission factors provided by relevant authorities or industry standards. These factors provide the amount of CO2 equivalent emissions associated with each unit of energy consumed.

10. Calculation and Analysis: Calculate the total carbon footprint of your IT stack by summing up the GHG emissions from all components (see the short example after this list). Analyze the results to identify areas of high impact and potential opportunities for improvement.

11. Benchmarking and Reporting: Consider benchmarking your carbon footprint against industry standards and best practices. This can help you set targets for reducing emissions and track your progress over time. Create reports and share the findings with stakeholders to raise awareness and support sustainability initiatives.
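To make the arithmetic of steps 3, 9 and 10 concrete, here is a minimal Python sketch. All numbers, including the emission factor, are illustrative assumptions; in practice you would use your measured consumption and an official factor for your electricity grid.

# Minimal sketch of steps 3, 9 and 10: energy -> PUE -> CO2e.
# All values are illustrative; use measured data and your grid's real factor.
GRID_EMISSION_FACTOR = 0.4      # kgCO2e per kWh (assumed placeholder)

it_equipment_kwh = 120_000      # electricity consumed by IT equipment (step 2)
total_facility_kwh = 180_000    # IT plus cooling, ventilation, etc. (step 2)

# Step 3: Power Usage Effectiveness of the data center.
pue = total_facility_kwh / it_equipment_kwh             # 1.5

# Steps 9-10: convert energy into greenhouse gas emissions and sum.
co2e_kg = total_facility_kwh * GRID_EMISSION_FACTOR     # 72,000 kgCO2e

print(f"PUE: {pue:.2f}")
print(f"Carbon footprint: {co2e_kg / 1000:.1f} tCO2e")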

Keep in mind that measuring the carbon footprint from the full IT stack is a complex task that may require specialized knowledge and tools. Consider involving experts in environmental sustainability or seeking support from organizations specializing in carbon footprint assessments.

A solution from Climatiq

Climatiq provides embedded carbon intelligence software that enables developers to automate GHG emission calculations based on verified scientific models. Its suite of products includes an open dataset of emission factors and intelligent APIs that integrate with any existing software for real-time monitoring of greenhouse gas emissions.
Climatiq also offers a calculation engine that converts metrics about cloud CPU and memory usage, storage, and networking traffic into CO2e estimates for Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
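For illustration, a call to the Climatiq estimation API from Python might look like the sketch below. The endpoint path, the activity ID and the payload shape are assumptions made for this example; always check Climatiq's current API reference before using them.

import requests  # pip install requests

# Hedged sketch of a Climatiq estimate call. The endpoint, activity_id and
# payload shape are assumptions for illustration; verify them in Climatiq's docs.
CLIMATIQ_API_KEY = "YOUR_API_KEY"   # placeholder

response = requests.post(
    "https://api.climatiq.io/estimate",                      # assumed endpoint
    headers={"Authorization": f"Bearer {CLIMATIQ_API_KEY}"},
    json={
        "emission_factor": {"activity_id": "electricity-energy_source_grid_mix"},  # assumed ID
        "parameters": {"energy": 4200, "energy_unit": "kWh"},
    },
    timeout=30,
)
print(response.json())  # expected to contain a CO2e estimate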

This service is offered by Climatiq directly, but it has also been used to demonstrate the flexibility and extensibility of the Cisco Full Stack Observability Platform. Climatiq is one of the first ecosystem partners that collaborated with Cisco, even before the platform was released in June 2023. The integration of the Climatiq engine with the Cost Insight module of the FSO Platform was presented in a session at Cisco Live, Las Vegas.

Extensibility of the Cisco FSO Platform

Climatiq's carbon footprint calculation engine was integrated leveraging the developer-friendly framework offered by Cisco to create extension modules for the FSO Platform.

With those tools you can model a business domain, or a technical domain, defining entities and their relationships so that you can visualize data in dashboards that fit your specific business needs.
Telemetry collected from the infrastructure, and from any asset in your business architecture (processes, applications, cloud resources, robots...), is used to populate a view of the world that helps the operations teams and the lines of business have full control and deep visibility into every component, as well as roll all the information up into company-level dashboards.

You don't need expert programmers to build complex applications that collect, filter, correlate and visualize data. You just need a domain subject matter expert (e.g. a business analyst or an SRE) who designs an Entity-Relationship diagram and customizes a few JSON files that tell the FSO Platform how to manage the incoming telemetry data.

Just one month to build the integration

Building extension modules is so easy that Climatiq, as a Cisco partner, took just one month to build theirs. They were educated about the architecture of the system and the developer framework, then they built the module that is now offered in the Cisco FSO marketplace. Every customer that uses the FSO Platform can now subscribe to the extension module from Climatiq and get the carbon footprint calculation instantly in the user interface of their tenant.

cost insight and carbon footprint in the Cisco FSO Platform



The implementation consisted of defining some new metrics (remember, it's all about OpenTelemetry, which means Metrics, Events, Logs and Traces) and configuring the connection to the Climatiq API to send data back and forth. Existing metrics in the FSO Platform (CPU and memory usage, storage, and networking traffic) are sent to the calculation engine, which sends additional metrics back to be added to the original entities (e.g. to Kubernetes clusters, deployments, etc.).
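For a flavour of what "defining some new metrics" can look like in OpenTelemetry terms, here is a minimal Python sketch. It is not the actual FSO module code: the metric name and attributes are invented for illustration.

from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Export metrics to the console, just for the demo.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("carbon.footprint.demo")

# Counter accumulating estimated emissions per monitored entity.
co2e_counter = meter.create_counter(
    name="cloud.carbon.co2e",        # invented metric name
    unit="gCO2e",
    description="Estimated CO2-equivalent emissions",
)

# A value as it might come back from a calculation engine such as Climatiq.
co2e_counter.add(42.0, {"k8s.cluster.name": "prod", "cloud.provider": "aws"})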

So it's easy to navigate your assets in the User Interface to see their health state, as well as related costs and carbon footprint. 

In addition to the emissions generated by a single component, which you can roll up to the business transaction and the entire business service, you can also see the aggregated information for the entire company for the day and for the last week, as well as a projection of possible savings in emissions if you implement the suggested actions.

carbon footprint exec dashboard in the Cisco FSO Platform

Sustainability is just one part of the visibility you want to have over your assets and your processes.
With Full Stack Observability you can really model the view of the world that fits your business needs.
And the Cisco FSO Platform is one of the most complete - and extensible - ways to collect and correlate the needed information.





July 28, 2023

Why is Application Security important (and complementary to perimeter security)?

Outstanding application security is foundational to a brand's reputation, creating and building trust and loyalty with users. But vulnerabilities can occur anytime, anywhere (in your code, in commercial applications, in libraries you've integrated, and in remote APIs that you invoke), making it difficult and time-consuming to prioritize responses.

<Suggestion for people in a rush> If you only have 5 minutes, just scroll down and look at the amazing recorded demo: it explains everything better than the post itself </Suggestion for people in a rush>



Avoiding costly delays that can result in continuing damage to revenue and brand reputation means organizations must have clear visibility into each new vulnerability and the insights needed to prioritize remediation based on their business impact.

The traditional security schema, based on just protecting the perimeter with firewalls and IPS, is no longer sufficient. You need to protect the full stack, including all the software tiers. 


Business Risk Observability

Speed and coordination are paramount when dealing with application security risks.  

Bad actors can take advantage of gaps and delays between siloed security and application teams, resulting in costly and damaging consequences. Traditional vulnerability and threat scanning solutions lack the shared business context needed to rapidly assess risks and align teams based on potential business impact. To triage and align teams as fast as possible, teams need to know where vulnerabilities and threats impact their applications, how likely a risk is to be exploited, and how much business risk each issue presents.

One fundamental use case in Full-Stack Observability is business risk observability, supported by new levels of security intelligence that bring business context into application security. The new business risk scoring enables security and application teams to have greater threat visibility and intelligent business risk prioritization, so that they can respond instantly to revenue-impacting security risks and reduce their overall organizational risk profile.

New Cisco Secure Application features and functionalities include business transaction mapping to understand how and where an attack may occur; threat intelligence feeds from Cisco Talos, Kenna, and Panoptica; and business risk scoring. 

Business Transaction Mapping 

New business transaction mapping locates how and where an attack may occur within common application workflows like ‘login, checkout, or complete payment’ so that ITOps and SecOps professionals can instantly understand the potential impact to your application and your bottom line.

Threat Intelligence Feeds 

New threat intelligence feeds from Cisco Talos, Kenna, and Panoptica provide valuable risk scores from multiple sources to assess the likelihood of threat exploits.

Business Risk Scoring (for Security Risk Prioritization)

New business risk scoring combines threat and vulnerability intelligence, business impact and runtime behavior to identify the most pressing risks, avoiding delays and speeding up the response across teams.
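To illustrate the idea only (the actual Cisco Secure Application scoring model is not described here, so the formula and weights below are invented), you can think of a business risk score as vulnerability severity weighted by the contextual exploit likelihood and by the business impact of the affected transaction:

# Purely illustrative sketch of risk prioritization; the formula and
# weights are invented, not Cisco Secure Application's actual model.
def business_risk_score(cvss: float, exploit_likelihood: float, bt_impact: float) -> float:
    """Combine severity (CVSS 0-10), contextual exploit likelihood (0-1)
    and business-transaction impact (0-1) into a 0-100 score."""
    return round(cvss * 10 * exploit_likelihood * bt_impact, 1)

# The same CVE scores higher on 'checkout' than on a low-traffic page:
print(business_risk_score(cvss=9.8, exploit_likelihood=0.7, bt_impact=0.9))  # 61.7
print(business_risk_score(cvss=9.8, exploit_likelihood=0.7, bt_impact=0.2))  # 13.7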


Video Demonstration of the Business Risk Observability use case

See a complete, explanatory demonstration of how a risk index associated with your business transactions lets you discover and remediate vulnerabilities with a proper priority assessment:

https://video.cisco.com/detail/video/6321988561112 


 

July 14, 2023

Navigating relationships across monitored entities

I have described the Cisco FSO Platform as an extensible, developer-friendly platform that can ingest all kinds of telemetry and correlate those data into meaningful insights.

But... what does that really mean? Some readers told me it's an abstract concept, and they don't get how it relates to their daily job in IT Operations.

Let's define telemetry first: it is all the data that you can get from a running system, like a Formula 1 car on the race track (speed, consumption, temperature, remaining fuel, etc.). Or from your IT systems, which include applications, infrastructure, cloud, network, etc. In this case, data come in the form of Metrics (any number you can measure), Events (something that happened at an instant in time), Logs (information written by a system somewhere) and Traces (descriptions of the execution of a process).






This is the origin of the acronym MELT, which you see written on the walls these days. Everyone is excited by Observability, which is the ability to infer the internal state of a system by looking at its external signals (e.g. by collecting MELT). Generally, Observability is realised within a domain: a consistent set of assets of the same type (technologies, devices, or business processes). Examples: network monitoring, application performance monitoring (APM), etc.
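As a concrete taste of MELT, here is a minimal OpenTelemetry sketch in Python. The span, attribute and event names are invented for illustration; it produces a Trace describing one execution, with an Event and some numeric context attached:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Print spans to the console, just for the demo.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("melt.demo")

with tracer.start_as_current_span("checkout") as span:   # a Trace: one process execution
    span.set_attribute("cart.items", 3)                  # numeric context, Metric-like
    span.add_event("payment.authorized")                 # an Event at an instant in time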

The fun comes when you're able to correlate MELT to investigate the root cause of an issue, to find spots for optimising either performance or cost, or to demonstrate to business stakeholders that all the business KPIs are OK thanks to the good job done by the IT Operations folks :-)

Even better when you're able to correlate MELT across different domains, extending observability end-to-end. The entire business architecture is under control. You can navigate all the relationships that link the entities that are relevant in your monitoring, and see if any of them is affecting the global outcome (faults, bottlenecks, etc.).

Example: LinkedIn

One illuminating example of this type of navigation is the parallel with the LinkedIn website, and the exploration of your network of contacts to find a specific person, or information about their professional role, their company, or their activity.

Every IT professional I know has a profile on LinkedIn, and each of them generates information: they post articles or photos, they react to others' posts (either reposting or suggesting/liking them), they advertise events, they update their profile (this can be likened to generating MELT). In addition, everyone is connected to other people, so you have 1st degree (direct) connections but also 2nd degree connections that you inherit from the 1st degree ones.

Click on the video below to see a graphical representation of the navigation across a network of connections on LinkedIn, and the flow of information generated by each of the people in the network.



Now you can imagine a similar network of logical connections among entities that you monitor with the Full Stack Observability platform. You can explore how they are related to each other, and how every one affects the behaviour and the outcome of the others.

In a typical IT scenario, the entities might be the navigation of a user through the software application that supports a digital service (a Business Transaction), a service, the Kubernetes cluster where the service is running, a K8s node, the server running the node (which might be a VM in the cloud), the network segment connecting to the cloud, the cost of the cloud resources, and the carbon footprint generated by the infrastructure.

Correlation

All the relationships among the monitored entities are explicitly shown in the user interface, and you can move your focus to another object and inspect it, accessing its current health state, its history, and all the Metrics, Events, Logs and Traces it has generated. This makes it extremely easy to understand whether an issue detected in one of the entities propagates to others, affecting the way they work.

The Health Rules that you define for one entity can also include the evaluation of related entities, so that warnings roll up to the top level based on what the supporting entities are doing.
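To make the idea of navigable relationships and rolled-up health concrete, here is a minimal Python sketch. The entity names and the "worst state wins" rule are invented for illustration; the real FSO data model is much richer:

# Minimal sketch of entity relationships with health roll-up.
# Entities, relationships and the roll-up rule are illustrative only.
RELATIONSHIPS = {
    "business-transaction:checkout": ["service:payment"],
    "service:payment": ["k8s-cluster:prod"],
    "k8s-cluster:prod": ["k8s-node:node-1", "k8s-node:node-2"],
    "k8s-node:node-1": [],
    "k8s-node:node-2": [],
}

HEALTH = {  # health state observed on each entity
    "business-transaction:checkout": "green",
    "service:payment": "green",
    "k8s-cluster:prod": "green",
    "k8s-node:node-1": "red",
    "k8s-node:node-2": "green",
}

SEVERITY = {"green": 0, "yellow": 1, "red": 2}

def rolled_up_health(entity: str) -> str:
    """Worst health state among the entity and everything it depends on."""
    states = [HEALTH[entity]] + [rolled_up_health(e) for e in RELATIONSHIPS[entity]]
    return max(states, key=lambda s: SEVERITY[s])

# A red node propagates up to the business transaction:
print(rolled_up_health("business-transaction:checkout"))  # red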

 


In this screenshot I've highlighted the list of relationships in the panel on the left side, with a green dashed line. That list continues, so scrolling down you would also see Workloads, Pods, Containers, Hosts, Configurations, Persistent Volume Claims, Ingresses, Load Balancers and Teams (yes, the organisational teams that are responsible for this cluster). The number on each entity type shows how many objects of that type are related to the one (the K8s cluster) that is currently in focus in the central pane.

Though we have information about all the entities in the system, the objects that are not in a direct relationship with the entity in focus are automatically hidden from the list, to remove what we call "background noise". Showing only what really matters increases focus and makes the investigation easier. You can click, let's say, on the two Business Transactions (luckily, in this example both are in a green health state) to see what business processes would be impacted by a problem occurring in this K8s cluster.

Of course, scrolling down we would see in the central panel all the information available about this cluster, including all the MELT it has generated in the time interval under investigation (see the options below).




What I have described in this post are just the basic capabilities of the Cisco FSO Platform. You can find the full detail in the official documentation.
In the next posts, I'll explain the most relevant use cases and the impact that Full Stack Observability can have on your business.

 


July 8, 2023

FSO Platform: see everything, correlate everything

The Cisco Full Stack Observability Platform

Cisco has been the first vendor to offer an end-to-end observability solution, based on complementary products that are integrated with each other. The use cases described in my previous post are served by a combination of AppDynamics and ThousandEyes, with information fed by first-class security systems such as Talos, Kenna and Panoptica (more in the next posts).

Even if another vendor had such extensive coverage (and they don't), their products would not be integrated out of the box. The native integration enhances the power of each product (Application Ops also see the network, Network Ops also see the applications, Security Ops see everything, and everybody gets the business context) and saves a lot of the time and effort that a custom integration would require.

But we think this is not enough.

Some companies are already very advanced in their journey to Observability. They have already adopted advanced solutions from APM vendors (including Cisco and competitors), plus network monitoring and cloud services monitoring. Some have built sophisticated home-grown systems for Observability and AIOps.

They might find that the predefined view of the world implemented in traditional APM solutions is not enough. Entities like an Application, a Service, a Business Transaction and their relationships might not be sufficient to describe their business domain, or a technical domain that is more complicated than common architectures. They would like to extend the domain model, but they can't, because the solution has not been designed for extensibility.

Extensibility of the Observability solution

What they are looking for is the possibility to extend their visibility, and to correlate the collected information in a way that describes what's relevant for them.


Here comes the Cisco FSO Platform. 

The Cisco FSO Platform is an open, extensible, API-driven platform that empowers a new observability ecosystem for organizations. It is a unified platform built on OpenTelemetry (an open source project of the CNCF) and anchored on metrics, events, logs and traces (MELT), enabling extensibility from queries to data models with a composable UI framework.

The Cisco FSO Platform is a developer-friendly environment to build your own view of the world.

You can tailor Full Stack Observability to your business domain, or to your technical domain, defining the entities that are relevant for your stakeholders and the relationships that tie them together: from business processes to every asset included in your architecture (applications, infrastructure, cloud, network, IoT and business data sources).

You create a series of connections that you can navigate to fully control what's going on, as you do on LinkedIn when exploring a network of people and the information they generate (see the next post for an example). All of this is based on telemetry that you can collect from virtually everything: Metrics, Events, Logs and Traces. A new open standard, OpenTelemetry (supported by vendors and by the open source community), defines the way data are collected and ingested. These data feed the domain model, and you can later use them to investigate the root cause of any issue, to report on the business health state, or to look for opportunities to improve efficiency.

The Cisco FSO Platform is a differentiated solution that brings together data from multiple domains such as application, networking, infrastructure, security, cloud and business sources. Users get correlated insights that reduce the time to resolve issues and optimize experiences, while partners, ISVs and software developers can build meaningful FSO applications enabling new use cases.


So, are there alternative solutions for Full Stack Observability?

In their evolution from traditional monitoring, organizations go through some maturity steps. It's not a revolution in one day.

Some start by replacing individual tools with more complete solutions that unify the visualization of metrics collected from different technical domains. Others start correlating those data with business metrics and KPIs. Then they extend observability to - really - the full stack.

For all of those, the solution that I started describing in my previous post provides excellent value. The seven use cases I've mentioned are completely supported by the Cisco FSO solution based on the integration of AppDynamics, ThousandEyes and the security ecosystem. It's well integrated and offers the various operations teams access to deep visibility as well as a shared business context.

Some organizations are already in a more advanced state. They have already realized Full Stack Observability, either by adopting the Cisco solution or a competing one, or by growing an AIOps system in house. But they feel that they need more, because their business domain (or parts of their technical domain) is not completely covered by the solution they have.

Thanks to the Cisco FSO Platform, which is extensible and developer-friendly, they can build the needed extension themselves (or have a look at the Cisco FSO App Exchange). This powerful engine, which backs all the Cisco FSO products, allows those organizations to ingest telemetry from virtually every asset and to show correlated data based on their desired view of the world.

So finally we have two parallel motions, which don't necessarily conflict. The adoption of one or the other depends on your current observability maturity level and your specific need for tailored dashboards.

In the next post I will show a parallel between navigating your LinkedIn network of contacts and navigating the connected entities in the FSO Platform, searching for the root cause of an issue by exploring the Metrics, Events, Logs and Traces associated with each entity.

Subsequently, I will describe fundamental use cases like Business Risk Observability.

 



June 29, 2023

Full Stack Observability use cases

Business Use Cases

Full Stack Observability is all about collecting any possible data from the applications running your digital services (i.e. business KPIs) and from the infrastructure and cloud resources supporting them (i.e. the telemetry), potentially also including IoT, robots or whatever devices are involved in the process.

And then correlating those data to create an actionable insight, so that you have full control of your business processes end-to-end and you do better than your competitors (faster, more reliable, more appealing processes and services).  

The FSO value proposition is not only related to technology (the infrastructure that you can monitor and the metrics you can read). It is a business value proposition, because observability has an immediate impact on the business outcomes.


Associating business processes, and the digital services supporting them, with the health state of the infrastructure gives the Operations teams an immediate and objective measure of the value - or the trouble - that IT provides to its internal clients, the lines of business (LOB). And LOB managers can enjoy dedicated dashboards that show how the business is doing, highlighting all the key performance indicators (KPIs) that are relevant for each persona in the organization.

If there is any slowdown in the business, they see it instantly and can relate it to a technical problem, or maybe to the release of a new version of a software application, or to the launch of a new marketing campaign. The outcome of any action and of any incident is connected to the business with... no latency. The same visibility is also useful when the business performs better than the day before: you can relate outcomes to actions and events.

So, before speaking about the technology that supports Full Stack Observability, let's discuss the use cases and their impact.

We can group the use cases into three categories: Observe, Secure and Optimize (referring to your end-to-end business architecture).




In the Observe category, we have 4 fundamental use cases:

- Hybrid application monitoring

This refers to every application running on Virtual Machines, in any combination of your Data Center and Public Clouds, or on bare metal servers.

You can relate the business KPIs (users served, processes completed, amounts of money, etc.) to the health state of the software applications and the infrastructure. You can identify the root cause of any problem and relate it to the business transactions (= the user's navigation for a specific process) that are affected.

- Cloud native application monitoring

Same as the previous use case, but referring to applications designed on cloud native patterns (e.g. microservices architectures) that run on Kubernetes or OpenShift, regardless of whether it's on premises, in the cloud, or in a hybrid scenario. Traditional APM solutions were not so strong on this use case, because they were designed for older architectures.

- Customer digital experience monitoring

Here the focus is on the experience from the end user's perspective, which is affected by the performance of both the applications and the infrastructure, but also - and mostly - by the network. Network problems can affect the response time and the reliability of the service because the end user needs to reach the endpoint where the application runs (generally a web server), the front end needs to communicate with application components distributed everywhere, and these may be invoking remote APIs exposed by a business partner (e.g. a payment gateway or any B2B service).

- Application dependency monitoring

In this use case you want to assure the performance of managed and unmanaged (third-party) application services and APIs, including the performance over Internet and cloud networks to reach those services. Visibility into network performance and availability, for both public networks and yours, is critical to resolve issues and to push service providers to respect the SLA in the contract.

In the Secure category, we can discuss the Business Risk Observability use case:

- Application security

Reduce business risk by actively identifying and blocking vulnerabilities found in application runtimes in production. Associate vulnerabilities with the likelihood that they are exploited in your specific context, so that you can prioritize the suggested remediation actions based on business impact (shown by the association of vulnerabilities with Business Transactions).

In the Optimize category, we have the following use cases:

- Hybrid cost optimization

Lower costs by only paying for what you need in the public cloud and by safely increasing the utilization of on-premises assets.

- Application resource optimization

Improve and assure application performance by taking the guesswork out of resource allocation for workloads on-premises and in the public cloud.


Observability and network intelligence coming together

The use cases listed above go beyond the scope of traditional APM (Application Performance Monitoring) solutions because they require extending visibility to every segment of the network. This picture shows an example of the possible issues that can affect the end user experience, and that need to be isolated and remediated to make sure the user is happy.



That is generally difficult, and requires a number of subject matter experts in different domains, and a number of tools. Very few vendors can offer all the complementary solutions that give you visibility into all aspects of the problem. And, of course, those tools are not integrated (vertical, siloed monitoring).

Data-driven bi-directional integration 

The Full Stack Observability solution from Cisco, instead, covers all the angles and - in addition - does so in an integrated fashion. The APM tool (AppDynamics) and the Network Monitoring tool (ThousandEyes) are integrated bidirectionally through their APIs (out of the box; no custom integration is required).


The visibility provided by one tool is greatly enhanced by data coming from the other tool, which are correlated automatically and shown in the same console.

So, if you're investigating a business transaction, you don't just see the performance of the software stack and its distributed topology, but also the latency, packet loss, jitter and more network metrics in the same context (exactly in the network segments that impact the traffic for that single business transaction, at that instant in time).

Similarly, if you're looking at a network, you immediately know which applications and business transactions would be affected if it fails or slows down. And automated tests to monitor the networks and the endpoints can be created automatically from the topology of the application that the APM tool has discovered.

Exciting times are coming: Operations teams can expect their lives to become much easier when they start adopting a Full Stack Observability approach. More detail in the next posts...


June 14, 2023

Changing the focus of this blog: now... Observability

My previous post about Infrastructure as Code concludes the exploration of Data Center and Cloud solutions and the methodologies that are related to automation and optimization of IT processes.

I've been working in this area for 15 years, after spending the previous 15 in software development. 
It's been an amazing adventure and I really enjoyed learning new things, exploring and challenging some limits - and sharing the experience with you.

Now I start focusing on a new adventure... possibly for the next 15 years 😜

I have taken on a new professional role: technical lead for Full Stack Observability, EMEA sales, at Cisco AppDynamics. From now on, I will tell you stories about my experience with the ultimate evolution of monitoring: it's all about collecting telemetry from every single component of your business architecture, including digital services (= distributed software applications), computing infrastructure, network, cloud resources, IoT, etc.

It's not just about putting all those data together, but about correlating them to create insight: transforming raw data into information about the health state of your processes, matching business KPIs with the state of the infrastructure that supports the services.

To visualize that information and to navigate it, you can subscribe to (or create your own) different domain models, which are views of the world built specifically for each stakeholder: from lines of business to application managers, from SREs to network operations and security teams...

A domain model is made of entities and their relationships, where entities represent what is relevant in your (business or technical) domain. They might be the usual entities in an APM domain (applications, services, business transactions...) or infrastructure entities (servers, VMs, clusters, K8s nodes, etc.).
You can also model entities in a business domain (e.g. trains, stations, passengers, tickets, etc.).




Unlike Application Performance Monitoring (APM), where solutions like AppDynamics and its competitors excel at drilling down into the application architecture and its topology, with Full Stack Observability you really have full control end-to-end and a context shared among all the teams that collaborate on building, running and operating the business ecosystem.

New standards like OpenTelemetry make it easy to convey Metrics, Events, Logs and Traces (MELT) to a single platform from every part of the system, potentially including robots in manufacturing, GPS tracking from your supply chain, etc.

All these data are filtered according to the domain model, and those that are relevant feed the databases containing information about the domain entities and their relationships, which are used to populate the dashboards.




Those data are matched with any other source of information that is relevant in your business domain (CRM, sales, weather forecasts, logistics...) so that you can analyse and forecast the health of the business and relate it to the technologies and processes behind it. You can immediately remediate any problem because you detect the root cause easily, and even be proactive in preventing problems before they occur (or before they are perceived by end users). At the same time, you are able to spot opportunities for optimising the performance and cost efficiency of the system.

To see the official messaging from Cisco about Full Stack Observability, check this page describing the FSO Platform.

Stay tuned, interesting news are coming...


March 27, 2018

Why do you run slow, fragile and useless applications and are still happy?


If you are not interested in the detail, at least browse the post and watch the amazing video recordings embedded   :-)

We have already discussed the value of automation in the deployment of software applications.
It is also clear that collecting telemetry data from systems and applications into analytic platforms enhances visibility and control, with an important return on your business.

Cisco offers best-of-breed solutions for both automation and analytics, but the biggest value is in their end-to-end integration. Your applications can be deployed with Cisco CloudCenter and completely controlled with AppDynamics and Tetration, with no manual intervention.

Why do you need that visibility?


Thanks to the information provided by AppDynamics, you have immediate visibility into the performance and the dependency map of your software applications. Tetration exposes their compliance and performance from a system standpoint.
CloudCenter offers your users a self-service catalog where applications can be selected for deployment in one of the clouds you have configured as a target; the smartest feature is that every deployment can also install the sensors for AppDynamics and Tetration automatically in each node of the application topology, so that telemetry data start to be collected immediately.
Why we need to collect information at runtime


With that, you now have no excuse to keep running applications that do not perform well enough, that expose vulnerabilities, or that produce limited business return due to poor customer satisfaction or inefficiency. The same applies to non-compliant applications that break your security rules or your architectural standards, applications that were deployed some years ago and are now untouchable due to complexity and lack of documentation.

With such an easy integration of telemetry and the insight that you can get immediately, it makes no sense to keep those monsters running in your datacenter or in your cloud. You can evolve them and remove the bottlenecks and security risks once they are identified.


Analytics tools add value to the application telemetry



We want to demonstrate how easy it is.

This post is a follow-up to the demonstration of the integration of Cisco CloudCenter with Tetration: we extended the demo with the addition of AppDynamics, so that our applications are now completely under control when it comes to security, compliance, performance and business impact.

Architectural Overview

 

We used a well-known application as an example: WordPress is an open source tool for website creation, written in PHP. It uses a common LAMP stack: Apache + PHP + MySQL, running on Linux.
WordPress is a two-tier application, so you generally deploy two VMs to run it: the front end is an Apache web server with the PHP application, and the back end runs the database (MySQL).

We want each tier to be monitored, by default, by both AppDynamics and Tetration. This must happen without introducing any complexity for the user who orders WordPress from the self-service catalog, and it must work in any target cloud. Based on the administrator's preference, the user could even be unaware of the monitoring setup.



AppDynamics and Tetration integration with CloudCenter
Overview of the architecture



The next paragraphs describe the architecture of AppDynamics and of Tetration, so that you understand the integration we built to make CloudCenter inject telemetry sensors into each deployment.
Then the process triggered when a user deploys WordPress from the CloudCenter catalog is explained.
A detailed video recording of all the steps is also provided.


AppDynamics: Architecture of the system


AppDynamics uses agents to collect information from the running servers and send it to the Controller. Agents are specific to the runtimes of various programming languages, but there are also agents that interact only with the operating system: you choose the type of agent that best fits each node. Databases and their transactions can also be monitored.
The Controller is where users go to view, understand, and analyze the data sent by the agents.
Agents send data about usage metrics, code exceptions, error conditions and calls to backend systems to the Controller.

AppDynamics overview


CloudCenter: the Application Profile

 

The next picture shows the Application Profile of the WordPress service that we created in CloudCenter. Each VM in the two tiers will contain the application and the required sensors for AppDynamics and Tetration.

The Tetration Injector component is an ephemeral Docker container used by CloudCenter just to invoke the API exposed by the Tetration cluster, so that the telemetry data are expected when they arrive and are associated with the scope of the WordPress deployment. It disappears when the deployment is completed.

Topology of the application deployment, showing the sensors applied


As for any other application, the integration is implemented using custom scripts to deploy the agents for AppDynamics and Tetration.
All application artifacts, scripts and services are stored in a repository and pulled by the CloudCenter agent running in each VM.
CloudCenter executes our scripts during different stages of the deployment, to add the AppDynamics agent and the Tetration sensor (using the same technique you could add any other agent that you use for backup, monitoring, etc.).
This is a video (2 min) showing how the Application Profile for WordPress is built in CloudCenter:




CloudCenter integration with AppDynamics


The green boxes in the next picture show the sequence of actions executed by the CloudCenter agent to deploy the AppDynamics PHP agent in the front-end VM: the same actions that the administrator would perform manually.

Installing and configuring AppDynamics agents


For your reference, we used a shell script with placeholders, where the configuration parameters are replaced dynamically by CloudCenter, as listed below:

AGENT_LOCATION="http://cc-repo.rmlab.local"
APPD_CONTROLLER="appd.rmlab.local"
APPD_CONTROLLER_PORT="8090"
APPD_ACCESS_KEY="a4abcdc7-ce1c-41cb-[cut]"
APPD_ACCOUNT_NAME="customer1"
APP_NAME="$parentJobName"        # replaced by WPDEMO, the name given to the deployment by the user
TIER_NAME="$cliqrAppTierName"    # replaced by WSERVER, how the tier is identified in the Application Profile
HOST_NAME="$cliqrNodeHostname"   # replaced by C3-b2a9-WPDEMO-WSER, generated by CloudCenter when it provisions the VM

This video (2 min) shows how the existing Application Profile is updated adding the deployment of the AppDynamics agent:




Tetration: Architecture of the system


Tetration is a ready-to-use big data platform that runs a Hadoop cluster at its core.
As described in a previous post, Tetration collects telemetry streamed by software and hardware sensors. It stores metadata within the data lake and runs machine learning algorithms to provide business outcomes.
Tetration sensors, downloaded from the cluster itself, embed the required configuration and don't need any user input. As soon as they are installed, they start to stream rich telemetry and can optionally control local workload policy enforcement.


Tetration overview




CloudCenter integration with Tetration


At deployment time, a dropdown list allows the user to select one of two types of sensor: Deep Visibility, or Deep Visibility with Enforcement (of security policies).

The telemetry data for this application are segregated under a specific scope, created by CloudCenter during the provisioning phase using the variable $parentJobName (containing the value WPDEMO in our demonstration).
The sensors are installed in each VM via a custom script, as described in the next picture:


Installing and configuring Tetration agents




WordPress VM with all the agents installed

Next video (6 min) shows how a service (Tetration Injector) is created and then added to the existing Application Profile:



Result of the deployment seen in CloudCenter, Tetration and AppDynamics


This video shows the deployment of the Wordpress application from the CloudCenter self-service catalog.


And next video shows the analysis of telemetry data in Tetration, when the Wordpress application is deployed:


Finally, we look at AppDynamics to see the analysis of the behavior of the application from a business standpoint:


Summary 

Only Cisco can offer automated, end-to-end, real-time application intelligence, giving you 360° visibility of both the business side and the network side. Do you want to run this demo in your lab? Engage with us to set it up.
All the source code, the CloudCenter Services and the Application Profiles are available on GitHub.

Credits and Disclaimer

This post describes a lab activity that was implemented by two colleagues of mine, Riccardo Tortorici and Stefano Gioia.
We created a demonstration lab to show our customers how easy it is to integrate the three products.
This is not the official documentation from Cisco about the integration, which will be released soon.

References

Previous post on the integration of CloudCenter with Tetration: https://lucarelandini.blogspot.it/2017/10/turn-lights-on-in-your-automated.html