Creating a top class Digital platform – the AIOps way

Creating a top-class digital platform that delivers a great user experience and continuously improves its performance is every digital technologist’s dream. Getting into every user’s mind and seeing the application through their eyes could be a great first step in this quest.

However, in the absence of these psychic skills, a Digital User Experience Management (DEM) solution based on Artificial Intelligence for IT Operations (AIOps) could be the next best option.

Digital user experience challenges

  • Increasing digital user maturity means that user experience expectations are higher than ever
  • Highly mobile users expect similar experience and performance across all types of connectivity including slow mobile links
  • Need to distinguish and balance access between premium users and other users
  • Seasonal workloads with high variance – need to provision application infrastructure for the highest load, yet keep infrastructure costs in control
  • Problem resolution is becoming increasingly challenging due to complex system integrations within digital services

Key factors that influence these user experience challenges include ease of use, application reliability, end-user device performance, network connectivity and backend system performance.

Given the heterogeneous nature of the environments from which users access digital platforms, the complexity of controlling and analysing these factors is growing rapidly.

AIOps based DEM – an Integrated approach

As you optimise your digital platform to address these challenges and provide a great customer experience, you will need not only data on each user’s experience and environment, but also an effective and efficient means of continuously processing the vast amount of data coming from users and other systems. You will also need to derive meaningful insights from the analysis and take self-correcting actions where possible.

While a traditional DEM solution can provide user experience data, it typically cannot provide smart insights or take automated corrective actions.

The strength of AIOps based systems lies in their ability to automatically learn about and correct critical issues within systems. By integrating a traditional DEM solution with an AIOps solution, you will be able to deliver a highly effective system that combines user experience data with the power of machine learning. This integrated solution can also help build a digital platform that dynamically adapts itself to user experience needs.

It is also critical that the solution integrates with other key service management systems in the enterprise, including service desk systems, collaboration systems and other analytics and visualisation systems.

The integrated solution can be used to manage several complex situations including dynamically improving application performance and dynamically scaling application infrastructure. The solution can also be used to provide recommendations at the system design / coding level and help improve the digital platform’s total cost of ownership.

Anatomy of an AIOps based DEM

The components required to make an integrated AIOps based DEM solution include:

  • Intelligent monitoring
  • Machine learnt correlation
  • Anomaly detection
  • Root cause and self-healing
  • Output integration

Intelligent monitoring: Digital applications tend to be designed with strong user experience principles in mind; however, a lot of them do not seem to complete the loop by designing in intelligence that can monitor and report the platform’s user experience parameters.

The intelligent user experience monitoring component should be able to monitor everything from the user’s actions and response times to the type of network used, the latency, application crash details, transaction failures and client-side system resources consumed. These parameters, collected from every single user, are then constantly fed into a backend AIOps platform for processing. Although these intelligent agents should ideally be designed into the application from the start, retrofitting them into web, mobile and legacy applications should be possible.

While real user monitoring could provide key insights about user experiences, synthetic transaction monitoring could act as an ‘ideal client’ and help continuously baseline the application’s availability and performance. Existing DEM tools could be reconfigured to provide synthetic transaction monitoring capabilities, and such transactions could be initiated from multiple geographic locations, including the data centre. Like real user monitoring outputs, synthetic transactions should be monitored for factors including response times, latency and network links used, and then compared against the corresponding real user experience measurements.
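The comparison between real user measurements and a synthetic baseline could be sketched as follows. This is only an illustration: the function name `slow_segments`, the sample format of (network type, response time in ms) and the `tolerance` multiplier are all assumptions made for this example, not features of any particular DEM tool.

```python
from statistics import median


def slow_segments(real_samples, synthetic_baseline_ms, tolerance=1.5):
    """Return network types whose median real-user response time exceeds
    the synthetic ('ideal client') baseline by more than `tolerance` times.

    real_samples: iterable of (network_type, response_ms) tuples (assumed format).
    """
    by_network = {}
    for network, response_ms in real_samples:
        by_network.setdefault(network, []).append(response_ms)

    flagged = {}
    for network, times in by_network.items():
        typical = median(times)
        if typical > tolerance * synthetic_baseline_ms:
            flagged[network] = typical
    return flagged


# Hypothetical measurements: only the 3G users are far from the baseline.
samples = [("wifi", 210), ("wifi", 190), ("3g", 900), ("3g", 1100), ("4g", 260)]
print(slow_segments(samples, synthetic_baseline_ms=200))
```

Grouping by network type is just one possible cut; the same comparison could be made per location or per device class.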

The user experience picture will not be complete unless the user-side data is combined and correlated with monitoring data from data centre components such as servers, storage and the LAN. While most organisations can already monitor these components, integrating their monitoring outputs into the AIOps platform will be the key to a successful solution.

Machine learnt correlation: Monitoring outputs from the client and server sides could then be fed into an AI based alert correlation engine, where algorithms automatically reduce the flood of alerts, cut the number of false alerts and prevent alert fatigue among staff who monitor alerts manually. Unsupervised algorithms can also be used to group and cluster alerts, allowing intelligent systems to be used for further analysis. AI based correlation engines are known to reduce alerts by over 80% once they have gone through rigorous learning, greatly increasing monitoring effectiveness.
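To make the idea of alert reduction concrete, here is a minimal sketch that collapses repeated alerts into groups. Note that it is a simple rule-based stand-in (same source and symptom within a time window), not the machine learnt clustering described above; the alert format, the `correlate` function and the 300 second window are assumptions for illustration.

```python
def correlate(alerts, window_s=300):
    """Collapse alerts that share a (source, symptom) key and arrive within
    `window_s` seconds of the first alert in their group.

    alerts: iterable of (timestamp_s, source, symptom) tuples (assumed format).
    Returns a list of groups, each a dict with key, first_seen and count.
    """
    groups = []
    latest_group = {}  # key -> most recent group opened for that key
    for ts, source, symptom in sorted(alerts):
        key = (source, symptom)
        group = latest_group.get(key)
        if group is None or ts - group["first_seen"] > window_s:
            # No group yet, or the previous one is too old: open a new group.
            group = {"key": key, "first_seen": ts, "count": 0}
            latest_group[key] = group
            groups.append(group)
        group["count"] += 1
    return groups


# Hypothetical alert stream: five raw alerts collapse into three groups.
raw = [
    (0, "db1", "high_cpu"),
    (30, "db1", "high_cpu"),
    (45, "web1", "slow_response"),
    (60, "db1", "high_cpu"),
    (400, "db1", "high_cpu"),
]
for g in correlate(raw):
    print(g["key"], g["first_seen"], g["count"])
```

A learnt engine would replace the fixed key and window with groupings discovered from historical alert data.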

Anomaly detection: The correlated alerts could then be passed through an unsupervised anomaly detection model to detect and predict system anomalies. For example, after detecting a pattern of high resource utilisation that is causing progressively slower response times, the model could predict that a service failure will occur in the next 10 minutes, based on learnt usage patterns and seasonality predictions.
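The prediction step can be illustrated with a deliberately simple sketch. Instead of a learnt model with seasonality, it fits a straight line to recent per-minute utilisation samples and extrapolates to a failure threshold; the function name, the one-sample-per-minute assumption and the 95% threshold are all hypothetical.

```python
def minutes_until_threshold(utilisation, threshold=95.0):
    """Estimate minutes from the latest sample until utilisation reaches
    `threshold`, using an ordinary least-squares straight-line fit.

    utilisation: list of percentages, one sample per minute (assumed).
    Returns None if there are too few samples or the trend is not rising.
    """
    n = len(utilisation)
    if n < 2:
        return None

    mean_x = (n - 1) / 2
    mean_y = sum(utilisation) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(utilisation))

    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or falling: no predicted breach

    intercept = mean_y - slope * mean_x
    crossing_minute = (threshold - intercept) / slope
    # Measure from the most recent sample; 0.0 means the fit is already there.
    return max(0.0, crossing_minute - (n - 1))


# Hypothetical CPU readings rising by five points per minute.
print(minutes_until_threshold([50, 55, 60, 65, 70, 75]))
```

A production system would also need to account for the seasonality and learnt usage patterns mentioned above, which a straight line ignores.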

Root cause and self-healing: Once a pattern is detected, root cause analysis machine learning models will be able to identify the elements that are causing the issue and generate actionable insights to resolve it. Where possible, expert-led systems could then identify recommended actions, which are passed on to the application through automated interfaces. Actions are then taken within the application to fix issues without human intervention.

Output integration: It is critical that the outputs from this solution propagate to other systems in the organisation, including collaboration, service desk and visualisation systems. The solution should hence be capable of integrating with such systems.

Example use case 1: Automatically improving digital application performance

Application performance issues can arise for several reasons, including issues at the client side and issues at the backend. An AIOps based DEM solution could be deployed to automatically detect and, where possible, self-heal from such performance issues.

Intelligent agents deployed within the client-side application continuously monitor performance and the various parameters that influence it. If a performance issue is detected, data from other users and synthetic transactions could be compared to identify the root cause of the issue.

If the performance issue is caused by certain components (like large media files), the application could be dynamically reconfigured to load the functional, lighter components first and the larger components later. Similarly, if the root cause of the performance issue is found to be a slower network link, a lighter version of the application could automatically be loaded to ensure users can interact with the system and transact.
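The decision described above could look something like the following sketch. The `Diagnosis` fields, the profile names and both thresholds are hypothetical values chosen for illustration; a real platform would derive them from its own root cause analysis.

```python
from dataclasses import dataclass


@dataclass
class Diagnosis:
    """Assumed summary of a root cause analysis for one user session."""
    downlink_kbps: float      # measured network throughput
    heaviest_asset_kb: float  # size of the largest component on the page


def choose_loading_profile(diagnosis, slow_link_kbps=500, heavy_asset_kb=1024):
    """Pick how the application should load for this session."""
    if diagnosis.downlink_kbps < slow_link_kbps:
        # Slow network link: serve the lighter version of the application.
        return "lite"
    if diagnosis.heaviest_asset_kb > heavy_asset_kb:
        # Heavy components: load functional parts first, large media later.
        return "defer_media"
    return "full"


print(choose_loading_profile(Diagnosis(downlink_kbps=300, heaviest_asset_kb=200)))
print(choose_loading_profile(Diagnosis(downlink_kbps=8000, heaviest_asset_kb=4096)))
print(choose_loading_profile(Diagnosis(downlink_kbps=8000, heaviest_asset_kb=200)))
```

Checking the network first reflects a design choice in this sketch: on a slow link, deferring media alone may not be enough to keep the application usable.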

In case of backend system capacity issues, additional capacity could dynamically be added to provide resolution. Key insights from the analysis could also be sent to the developers for continuously improving the application’s user experience.

Example use case 2: Application infrastructure elasticity prediction

Seasonal load on applications, including large intra-day swings, means that application infrastructure is either provisioned for peak periods, resulting in low utilisation and high costs, or, when an auto-scaling cloud platform is used, subject to unpredictable costs.

Using an AIOps based DEM solution, digital platform owners can get a detailed view of platform usage at the application and user level and also predictions of load, based on parameters like user types, usage type, access locations and seasonality.

Detailed infrastructure scaling policies can then be set based on these predictions and real time application usage. As an example, if the system load is caused by unauthenticated users, the policy may restrict scalability to a lower level. However, if the load is caused by authenticated / premium users, the scalability could be set to a higher level. The platform could also be configured to deploy the application across multiple infrastructure instances based on user priorities.
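A user-aware scaling policy of this kind might be sketched as below. The user type names, the instance caps and the `max_instances` function are assumptions for illustration only; they do not correspond to any specific cloud provider's auto-scaling API.

```python
# Hypothetical upper limits on instance count per dominant user type.
DEFAULT_CAPS = {"unauthenticated": 4, "authenticated": 10, "premium": 20}


def max_instances(load_by_user_type, caps=None):
    """Return the scaling ceiling based on which user type drives the load.

    load_by_user_type: dict mapping user type to current requests per
    second (assumed metric). The type with the highest load decides the cap.
    """
    caps = DEFAULT_CAPS if caps is None else caps
    dominant = max(load_by_user_type, key=load_by_user_type.get)
    return caps[dominant]


# Load dominated by anonymous traffic: scaling is held to a lower ceiling.
print(max_instances({"unauthenticated": 800, "authenticated": 150, "premium": 50}))
# Load dominated by premium users: a higher ceiling is allowed.
print(max_instances({"unauthenticated": 100, "authenticated": 200, "premium": 700}))
```

The returned ceiling would then be handed to whatever scaling mechanism the platform uses, alongside the load predictions described earlier.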

This will also give platform owners a better view of infrastructure costs and help them balance performance against cost.
