
Operational Co-Pilots with Generative AI

Generative AI-led Operational Co-Pilots

Operational AI and the role of Generative AI / LLMs:

Some of the common use cases in Ops AI could be:

  • Support co-pilot
    • Guiding and hand-holding support personnel on actions to take
  • Auto-detect incidents
    • Automated detection, and preferably prevention, of incidents through proactive observability, alarms and triggering of actions that remediate potential incidents
  • Automated Resolution
    • Automated resolution of incidents, completion of user requests

Data Sources:

  • Log files
  • ITSM Tickets
  • Alarms
  • User initiated incident reports
  • User requests

In an enterprise setup, context of the data sources can be further classified into:

  • Deterministic Domain
    • Key attributes required to identify the source are well structured and predefined.
      e.g. An alarm is already tied to a specific host, application, event and threshold, which clearly identifies the source and context.
  • Finite/Limited Domain
    • Attributes are not well defined but belong to a small set of possibilities.
      e.g. A user who uses a limited set of applications reports a problem with their access or system slowness. In this scenario, you can narrow down the key attributes based on the role and access of the user.
  • Probabilistic Domain
    • The domain is not clearly defined, or no attributes are attached to the data source. In such instances, the domain needs to be inferred from the content of the source data. Predictive models classify the content and predict the domain, and subsequent process orchestration puts the output to use.
      e.g. An enterprise-wide slowness of IT systems could lead to entries in a variety of web/app/database logs. Based on specific messages, keywords and searches, prediction models could predict potential issues and take remediation action.
      e.g. A negative press article, or several social media feeds depicting a negative sentiment about the company and its products, needs to be parsed and evaluated to predict the relevant areas within the organization that need to act. This would typically be carried out by predictive models.
  • Unknown Domain
    • There will be situations where the data does not sufficiently represent the context and models cannot determine the context with a high level of certainty. In such cases a “human in the loop” step in the workflow plays a critical role.
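
The four domains above can be read as a routing decision. The following is a minimal Python sketch of that decision, not a reference implementation. The names `Event`, `classify_domain`, `REQUIRED_ATTRS`, the `predict` callable and the 0.8 confidence cut-off are all assumptions made for illustration; a real setup would take its attributes from the CMDB and ITSM configuration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Domain(Enum):
    DETERMINISTIC = "deterministic"
    FINITE = "finite"
    PROBABILISTIC = "probabilistic"
    UNKNOWN = "unknown"


# Hypothetical attribute set, taken from the alarm example in the text.
REQUIRED_ATTRS = ("host", "application", "event", "threshold")


@dataclass
class Event:
    text: str
    attrs: dict = field(default_factory=dict)
    user_apps: tuple = ()  # applications the reporting user can access, if known


def classify_domain(event, predict=None, min_confidence=0.8):
    """Return (domain, context) for an incoming event.

    predict is an optional callable: text -> (label, confidence).
    It stands in for whatever predictive model is available.
    """
    # Deterministic: every identifying attribute is already attached.
    if all(key in event.attrs for key in REQUIRED_ATTRS):
        return Domain.DETERMINISTIC, {k: event.attrs[k] for k in REQUIRED_ATTRS}

    # Finite: attributes are missing, but the user's role narrows the options.
    if event.user_apps:
        return Domain.FINITE, {"candidate_applications": list(event.user_apps)}

    # Probabilistic: infer the context from the content, if the model is confident.
    if predict is not None:
        label, confidence = predict(event.text)
        if confidence >= min_confidence:
            return Domain.PROBABILISTIC, {"predicted_area": label, "confidence": confidence}

    # Unknown: nothing reliable to go on, so hand over to a person.
    return Domain.UNKNOWN, {"action": "route_to_human"}
```

In this sketch a low-confidence prediction falls through to the unknown domain, which is where the human-in-the-loop step would take over.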

The extent of automation in these scenarios can vary.

A human in the loop should not be eliminated completely, in order to keep your AI responsible, accurate and relevant. Even so, the efficiency of the end-to-end workflow, and as a result the productivity of resources, can improve significantly: by 20% to 70% in most cases, depending on the current state of operations and automation, the completeness and maturity of the CMDB, and the configuration of ITSM systems and alarms.

Adoption of AI is going to need an increased and renewed focus on process reengineering.

Instead of treating automation as just a means of labor arbitrage, organizations should look for its true value in outcomes such as reduced risk, faster turnaround times, improved availability, better customer experience and improved security controls. This will require designing every business process and operational solution so that manual effort is the exception rather than business as usual (BAU).

Just as organizations train new employees and create runbooks, they need to start thinking about training the AI models for every runbook activity.

Every inquiry, data query request or data maintenance request should first be evaluated for a self-service automated process. Moreover, with the advent of GenAI, each of these should utilize the capabilities of LLMs, Retrieval-Augmented Generation (RAG) and embedded indexes (vectors) of enterprise data.
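
To make the retrieval step of RAG concrete, here is a small, self-contained Python sketch. It is illustrative only: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the `RUNBOOK` entries are made-up examples, and `build_prompt` simply assembles text that would then be sent to an LLM of your choice (no LLM call is made here).

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words term count."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Return the top_k documents most similar to the query."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_prompt(query, documents, top_k=2):
    """Assemble a prompt that grounds the request in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, top_k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nRequest: {query}"


# Hypothetical runbook snippets standing in for indexed enterprise data.
RUNBOOK = [
    "Password reset: verify the user identity, then reset the password in the identity portal.",
    "Disk full alarm: clear temporary files and extend the volume if usage stays above threshold.",
    "VPN access request: confirm manager approval, then add the user to the VPN group.",
]

if __name__ == "__main__":
    print(build_prompt("How do I reset my password?", RUNBOOK, top_k=1))
```

In practice the term counts would be replaced by dense vectors from an embedding model and the linear scan by a vector index, but the flow stays the same: embed the request, retrieve the closest enterprise content, and pass both to the LLM.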