The Relativity of Wrong: business risk observability

Showing posts with label business risk observability. Show all posts

July 28, 2023

Why Application Security is important (and complementary to perimeter security)?

Outstanding application security is foundational to a brand's reputation, creating and building trust and loyalty with users. But vulnerabilities can occur anytime, anywhere (in your code, in commercial applications, in libraries you've integrated and in remote API that you invoke), making it difficult and time-consuming to prioritize responses.

<Suggestion for people in a rush> If you only have 5 minutes, just scroll down and look at the amazing recorded demo: it explains everything better than the post itself </Suggestion for people in a rush>

Avoiding costly delays that can result in continuing damage to revenue and brand reputation means organizations must have clear visibility into each new vulnerability and the insights needed to prioritize remediation based on their business impact.

The traditional security schema, based on just protecting the perimeter with firewalls and IPS, is no longer sufficient. You need to protect the full stack, including all the software tiers.

Business Risk Observability

Speed and coordination are paramount when dealing with application security risks.

Bad actors can take advantage of gaps and delays between siloed security and application teams, resulting in costly and damaging consequences. Traditional vulnerability and threat scanning solutions lack the shared business context needed to rapidly assess risks and align teams based on potential business impact. To triage and align teams as fast as possible, teams need to know where vulnerabilities and threats impact their applications, how likely a risk is to be exploited, and how much business risk each issue presents.

One fundamental use case in Full-Stack Observability is business risk observability, supported by new levels of security intelligence capability that brings business context into application security. The new business risk scoring enables security and applications teams to have a greater threat visibility and intelligent business risk prioritization, so that they respond instantly to revenue-impacting security risks and reduce overall organizational risk profiles.

New Cisco Secure Application features and functionalities include business transaction mapping to understand how and where an attack may occur; threat intelligence feeds from Cisco Talos, Kenna, and Panoptica; and business risk scoring.

Business Transaction Mapping

New business transaction mapping locates how and where an attack may occur within common application workflows like ‘login, checkout, or complete payment’ so that ITOps and SecOps professionals can instantly understand the potential impact to your application and your bottom line.

Threat Intelligence Feeds

New threat intelligence feeds from Cisco Talos, Kenna, and Panoptica provide valuable risk scores from multiple sources to assess the likelihood of threat exploits

Business Risk Scoring (for Security Risk Prioritization)

New Business risk scoring combines threat and vulnerability intelligence, business impact and runtime behavior to identify the most pressing risks, avoiding delays, and speeding response across teams.

Video Demonstration of the Business Risk Observability use case

See a complete, explanatory demonstration of how a risk index associated to your business transactions allows to discover and remediate vulnerabilities with a proper priority assessment:

https://video.cisco.com/detail/video/6321988561112

July 14, 2023

Navigating relationships across monitored entities

I have described the Cisco FSO Platform as an extensible, developer friendly platform that can ingest all kinds of telemetry and is able to correlate those data into a meaningful insight.

But... what does it really mean? Some readers told me it's an abstract concept, they don't get how it relates to their daily job in IT Operations.

Let's define telemetry first: it is all the data that you can get from a running system, like a Formula 1 car running on the race track (speed, consumption, temperature, remaining fuel, etc.). Or from your IT systems, that include applications, infrastructure, cloud, network, etc. In this case, data come in the form of Metrics (any number you can measure), Events (something that happened at an instant in time), Logs (information written by a system somewhere) and Traces (description of the execution of a process).

This is the origin of the acronym MELT, that you see written on the walls these days. Everyone is excited by Observability, that is the ability to infer the internal state of the system by looking at its external signals (e.g. collecting MELT). Generally, Observability is realised within a domain: a consistent set of assets of the same type (technologies, devices, or business processes). Example: network monitoring, application performance monitoring (APM), etc.

The fun comes when you're able to correlate MELT to investigate the root cause of an issue, or to find spots for optimising either performance or cost, or to demonstrate business stakeholders that all the business KPI are OK thanks to the good job done by the IT Operations folks :-)

Even better when you're able to correlate MELT across different domains, to extend observability end-to-end. The entire business architecture is under control. You can navigate all the relationships that link the entities that are relevant in your monitoring, and see if any of those is affecting the global outcome (faults, bottlenecks, etc.).

Example: LinkedIn

One illuminating example for this type of navigation is the parallelism with the LinkedIn website, and the exploration of your network of contacts to find a specific person, or information about their professional role, their company, their activity.

Every IT professional I know has a profile on LinkedIn, and each of them generates information: they post articles or photos, they react to others' posts (either repost, or suggest/like them), they advertise events, they update their profile (this can be associated to generating MELT). In addition, everyone is connected to other people, so that you have 1st degree (direct) connections but also 2nd degree connections that you inherit from the 1st degree ones.

Click on the video below to see a graphical representation of the navigation across a network of connections on Linkedin, and the flow of information generated by each one of the people in the network.

Now you can imagine a similar network of logical connections among entities that you monitor with the Full Stack Observability platform. You can explore how they are related to each other, and how every one affects the behaviour and the outcome of the others.

In a typical IT scenario, the entities might be the navigation of a user in the software application that supports a digital service (a Business Transaction), a service, the Kubernetes cluster where the service is running, a K8s node, the server running the node (that might be a VM in the cloud), the network segment to connect to the cloud, the cost of cloud resources, the carbon footprint generated by the infrastructure.

Correlation

All the relationships among the monitored entities are explicitly shown in the user interface, and you can move your focus to another object and inspect it, accessing the current health state, its history, and all the Metrics, Events, Logs and Traces it has generated. This makes extremely easy to understand if an issue detected in one of the entities propagates to others, affecting the way they work.

Also the Health Rules that you can define for one entity could include the evaluation of related entities, so that you roll up warnings and awareness at the top level based on what supporting entities are doing.

In this screenshot I've highlighted the list of relationships in the panel on the left side, with a green dashed line. That list continues, so scrolling down you would also see Workloads, Pods, Containers, Hosts, Configurations, Persistent Volume Claims, Ingresses, Load Balancers and Teams (yes, the organisational teams that are responsible for this cluster). The number on each entity type shows how many objects of that type are related to the one (the K8s cluster) that is currently in focus in the central pane.

Though we have information about all the entities in the system, all the objects that are not in direct relationship with the entity in focus are automatically hidden in the list, to remove what we call the "background noise". Showing only what really matters increases focus, and makes the investigation easier. You can click, let's say, on the two Business Transactions (luckily in this example both are in green health state) to see what business processes would be impacted by a problem occurring in this K8s cluster.

Of course, scrolling down we would see in the central panel all the information available about this cluster, including all the MELT it has generated in the time interval under investigation (see the options below).

What I have described in this post is just the basic capabilities of the Cisco FSO Platform. You can find the full detail in the official documentation.

In next posts, I'll explain the most relevant use cases and the impact that Full Stack Observability can have on your business.

June 29, 2023

Full Stack Observability use cases

Business Use Cases

Full Stack Observability is all about collecting any possible data from the applications running your digital services (i.e. business KPI) and from the infrastructure and cloud resources supporting them (i.e. the telemetry), including potentially also IoT, robots or whatever device involved in the process.

And then correlating those data to create an actionable insight, so that you have full control of your business processes end-to-end and you do better than your competitors (faster, more reliable, more appealing processes and services).

The FSO value proposition is not only related to technology (the infrastructure that you can monitor and the metrics you can read). It is a business value proposition, because observability has an immediate impact on the business outcomes.

Associating business processes, and digital services supporting those, with the health state of the infrastructure gives the Operations teams an immediate and objective measure of the value - or the troubles - that IT provides to their internal clients, that are the lines of business (LOB). And LOB managers can enjoy dedicated dashboards that show how the business is doing, highlighting all the key performance indicators (KPI) that are relevant for each persona in the organization.

If there is any slowdown in the business, they see it instantly and can eventually relate it to a technical problem, or maybe to the release of a new version of a software application, or to the launch of a new marketing campaign. The outcome of any action and of any incident is connected to the business with... no latency. The same visibility is also useful when the business shows a better performance than the day before. You can relate outcomes to actions and events.

So, before speaking about the technology that supports the Full Stack Observability, let's discuss about the use cases and their impact.

We can group the use cases in three categories: Observe, Secure and Optimize (referred to your end-to-end business architecture).

In the Observe category, we have 4 fundamental use cases:

- Hybrid application monitoring

This refers to every application running on Virtual Machines, in any combination of your Data Center and Public Clouds, or on bare metal servers.

You can relate the business KPI (users served, processes completed, amount of money, etc.) to the health state of the software applications and the infrastructure. You can identify the root cause of any problem and relate it to the business transactions (= user navigation for a specific process) that are affected.

- Cloud native application monitoring

Same as the previous use case, but referred to applications designed based on cloud native patterns (e.g. microservices architecture) that run on Kubernetes or Openshift. Regardless it's on premises, in cloud, or in a hybrid scenario. Traditional APM solutions were not so strong on this use case, because they were designed for older architectures.

- Customer digital experience monitoring

Here the focus is on the experience from the end user perspective, that is affected by the performance of both the applications and the infrastructure, but also - and mostly - by the network. Network problems can eventually affect the response time and the reliability of the service because the end user needs to reach the end point where the application is run (generally a web server), the front end needs to communicate with the application components distributed everywhere, and these may be invoking remote API exposed by a business partner (e.g. a payment gateway or any B2B service).

- Application dependency monitoring

In this use case you want to assure the performance of managed and unmanaged (third-party) application services and APIs, including performance over Internet and cloud networks to reach those services. Visibility of network performance and availability, including both public networks and yours, is critical to resolve issues and to push service providers to respect the SLA of the contract.

In the Secure category, we can discuss the Business Risk Observability use case:

- Application security

Reduce business risk by actively identifying and blocking against vulnerabilities found in application runtimes in production. Associate vulnerabilities with the likelihood that they are exploited in your specific context, so that you can prioritize the suggested remediation actions based on the business impact (shown by the association of vulnerabilities with Business Transactions).

In the Optimize category, we have the following use cases:

- Hybrid cost optimization

Lower costs by only paying for what you need in public cloud and by safely increasing utilization of on—premises assets.

- Application resource optimization

Improve and assure application performance by taking the guesswork out of resource allocation for workloads on—premises and in the public cloud.

Observability and network intelligence coming together

The use cases listed above goes beyond the scope of traditional APM solutions (Application Performance Monitoring) because they require to extend the visibility to every segment of the network. This picture shows an example of possible issues that can affect the end user experience, and need to be isolated and remediated to make sure the user is happy.

That is generally difficult, and requires a number of subject matter experts in different domains, and a number of tools. Very few vendors can offer all the complementary solutions that give you visibility on all aspects of the problem. And, of course, they are not integrated (vertical, siloed monitoring).

Data-driven bi-directional integration

The Full Stack Observability solution from Cisco, instead, covers all the angles and - in addition - it does so in a integrated fashion. The APM tool (AppDynamics) and the Network Monitoring tool (ThousandEyes) are integrated bidirectionally through their API (out of the box, no custom integration is required).

The visibility provided by one tool is greatly enhanced by data coming from the other tool, that are correlated automatically and shown in the same console.

So, if you're investigating about a business transaction, you don't see just the performance of the software stack and its distributed topology, but also the latency, packet loss, jitter and more network metrics in the same context (exactly in the network segments that impact the traffic for that single business transaction, at that instant in time).

Similarly, if you're looking at a network, you immediately know what applications and business transaction would be affected if it fails or slows down. And automated tests can be generated to monitor the networks and the end points, that are created automatically from the topology of the application that the APM tool has discovered.

Exciting times are coming, the Operations teams can expect their life to be much easier when they start adopting a Full stack Observability approach. More detail in next posts...

Pages