There's too much hardware and too many applications, running across local, edge and cloud data centers, to monitor it all with narrowly focused point tools. Disparate tools produce fragmented information that leads to IT operations oversights and mistakes, impaired infrastructure and app performance, and potential security and compliance risks.
Artificial intelligence for IT operations (AIOps) aims to overcome these limitations. AI-based monitoring tools gather information from the IT tools and devices already in place, apply analytics and machine learning to that data, and use the results to identify and correct potential issues within an environment.
How AI offers IT answers
At the highest level, AI helps IT staff make better decisions. AIOps monitoring platforms correlate events and alerts to recognize relationships across applications and infrastructure and distinguish between normal and abnormal system behavior.
This enables admins to separate serious issues and events from minor ones. Because alerts fire on abnormal behavior rather than on conventional static or manually set thresholds, admins also receive fewer of them, a benefit sometimes called noise reduction. The tool can also route each alert to the administrator or team best suited to the issue, which produces better workflows than one-size-fits-all alert chains. For example, an application performance alert after a patch or update is best routed to the development team, while an application server alert should go to the infrastructure team.
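To make the difference between a static threshold and a learned baseline concrete, here is a minimal Python sketch, not a description of how any particular AIOps product works. It assumes a simple rolling z-score as the definition of "normal" (real tools typically use richer machine learning models), and the `ROUTES` table, category names and team names are hypothetical, modeled on the routing example above.

```python
from collections import deque
from statistics import mean, pstdev

# Hypothetical routing table: alert category -> owning team.
ROUTES = {
    "app_performance": "development",
    "app_server": "infrastructure",
}


class RollingBaseline:
    """Flag a sample as abnormal when it sits more than k standard
    deviations from the mean of the recent window (a rolling z-score)."""

    def __init__(self, window=20, k=3.0, min_samples=5):
        self.samples = deque(maxlen=window)
        self.k = k
        self.min_samples = min_samples  # don't judge until some history exists

    def is_anomalous(self, value):
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = max(pstdev(self.samples), 1e-9)  # avoid a zero-width band
            anomalous = abs(value - mu) > self.k * sigma
        self.samples.append(value)  # every sample updates the baseline
        return anomalous


def route(category):
    """Send an alert to its owning team, falling back to a catch-all queue."""
    return ROUTES.get(category, "operations")


# Latency (ms) for a host whose normal level is around 300 ms.
readings = [300, 305, 298, 302, 301, 299, 303, 900]
baseline = RollingBaseline()
flags = [baseline.is_anomalous(r) for r in readings]
static_flags = [r > 250 for r in readings]  # a fixed 250 ms threshold
print(sum(static_flags), "static alerts vs", sum(flags), "baseline alert(s)")
print(route("app_performance"), route("app_server"))
```

With these readings, the fixed 250 ms threshold fires on all eight samples, while the baseline flags only the 900 ms spike. Appending every sample lets the baseline follow gradual drift; excluding flagged samples from the window is an equally reasonable design choice.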
The importance of integration
The more data an AIOps tool ingests from other IT tools and systems, the more comprehensive and meaningful the results of its analysis will be.
This is why AIOps tools typically provide dozens of out-of-the-box integrations with other major tools, such as those for orchestration, notifications, network management, logging and application performance. Common integrations reach into offerings such as Microsoft System Center, Zabbix, Splunk, ServiceNow, AWS and Microsoft Azure. Some AIOps vendors offer over 100 integration options.
For IT shops that use open source tools, or develop their own tools in-house, custom integrations via REST APIs and webhooks might be necessary.
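On the sending side, such a custom integration might look like the following minimal Python sketch, which packages an alert from an in-house tool as a JSON webhook request using only the standard library. The endpoint URL, the bearer-token header and the field names are all hypothetical; each AIOps product defines its own ingestion API and event schema.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical ingestion endpoint; real products define their own URL and schema.
WEBHOOK_URL = "https://aiops.example.com/api/v1/events"


def build_event_request(source, resource, severity, message, token):
    """Package one alert from an in-house tool as a JSON POST request."""
    payload = {
        "source": source,        # tool that produced the event
        "resource": resource,    # host, service or device affected
        "severity": severity,
        "message": message,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return urllib.request.Request(
        WEBHOOK_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )


req = build_event_request(
    "inhouse-backup", "db01", "warning", "Nightly backup finished late", "example-token"
)
# To deliver it against a real endpoint: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the payload easy to inspect and test before the tool is pointed at a live endpoint.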
Combining machine learning with correlation across this integrated data enables an AIOps monitoring system to handle troubleshooting tasks such as root cause analysis. As incidents and alerts arise during IT operations, the tool makes detailed recommendations about the underlying cause and updates or refines those recommendations if the issue recurs.
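A heavily simplified Python sketch of one correlation step: alerts that arrive close together in time are clustered into a single incident, and the earliest alert in each cluster is offered as the candidate cause. The 60-second window, the earliest-first heuristic and the recurrence counter are illustrative assumptions; production AIOps tools typically combine timing with topology, text similarity and learned models.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alert:
    time: float    # seconds since an arbitrary start
    resource: str
    message: str


def correlate(alerts, window=60.0):
    """Cluster alerts: each alert joins the current group if it arrives within
    `window` seconds of that group's latest alert; otherwise it starts a new group."""
    groups = []
    for alert in sorted(alerts, key=lambda a: a.time):
        if groups and alert.time - groups[-1][-1].time <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


def candidate_cause(group):
    """Naive heuristic: the first alert in a cluster is the likely cause."""
    return group[0]


# Recurrence counts let later recommendations weigh causes seen before.
cause_history = Counter()

alerts = [
    Alert(30, "orders-app", "response time degraded"),
    Alert(0, "disk-7", "media errors"),
    Alert(12, "lun-14", "read latency high"),
    Alert(500, "web-01", "certificate expires in 7 days"),
]
for group in correlate(alerts):
    cause = candidate_cause(group)
    cause_history[cause.resource] += 1
    print(f"{len(group)} alert(s) -> candidate cause: {cause.resource}")
```

Here the three alerts within 30 seconds collapse into one incident pointing at `disk-7`, and the unrelated certificate warning stays separate.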
At a lower level, AI tools provide granular control over how log data is collected and used. Monitoring policies use keywords and event filters to specify which log data must be captured, either in real time or at designated intervals. Auditing controls help administrators review log activity and ensure that they receive notifications when specific log events occur.
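A keyword-and-filter monitoring policy can be sketched in a few lines of Python. The `LogPolicy` structure, the log line format (a severity level followed by a message) and the keyword lists are hypothetical; real tools express policies in their own configuration formats, but the logic of keyword matching, event filtering and notification triggers is similar in spirit.

```python
import re
from dataclasses import dataclass, field

LEVELS = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}
LINE_RE = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


@dataclass
class LogPolicy:
    """Hypothetical policy: capture lines at or above `min_level` that contain
    any keyword; notify when a line contains a keyword listed in `notify_on`."""
    keywords: list                      # lowercase keywords to match
    min_level: str = "WARNING"          # event filter on severity
    notify_on: set = field(default_factory=set)


def apply_policy(policy, lines):
    captured, notifications = [], []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["level"] not in LEVELS:
            continue  # not in the expected "LEVEL message" format
        if LEVELS[m["level"]] < LEVELS[policy.min_level]:
            continue  # filtered out: below the minimum severity
        hits = {k for k in policy.keywords if k in m["msg"].lower()}
        if not hits:
            continue  # no keyword of interest
        captured.append(line)
        if hits & policy.notify_on:
            notifications.append(line)
    return captured, notifications


policy = LogPolicy(keywords=["timeout", "disk"], notify_on={"disk"})
logs = [
    "INFO disk check passed",
    "WARNING request timeout on /api/orders",
    "ERROR disk sda3 I/O error",
    "ERROR unrelated failure",
]
captured, notifications = apply_policy(policy, logs)
```

The INFO line is dropped by the severity filter, the unrelated error by the keyword match, and only the disk error triggers a notification.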
AIOps monitoring vs. APM and IPM
The difference between AIOps monitoring tools and application performance monitoring (APM) or infrastructure performance monitoring (IPM) tools is primarily one of scope rather than category: APM and IPM tools provide specific monitoring functions and produce detailed guidance for a limited set of software and hardware resources. Although such point tools are effective, they do not evaluate other data and behaviors in the environment.
For example, an APM tool might indicate excessive storage latency, but an administrator has to dig into the APM log data and cross-reference it with logs from other sources, such as infrastructure or systems management tools, to find a relationship between the application's performance problem and the storage subsystem. The admin, not the tool, must determine whether the issue lies with the application, the network, the storage subsystem or a specific disk.
AIOps monitoring tools overcome this cause-and-effect opacity by correlating the metrics and events found in logs from many sources to build a bigger picture of operations.
To continue the previous example, an AIOps tool could ingest logs from APM, IPM and other tools to discover the root cause of the application's storage latency: one disk in the related logical unit number (LUN) has reported errors to the storage subsystem and is on the verge of failure. The AIOps tool recommends, or even automatically triggers, a disk rebuild to head off a storage failure with minimal support team involvement.
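The root-cause step in this example can be approximated with a dependency map. In this minimal Python sketch, events reported by APM, IPM and storage tools are reduced to the set of affected components, and a root cause is any affected component with no affected dependency beneath it. The component names and the `DEPENDS_ON` topology are hypothetical stand-ins for the relationships an AIOps tool would discover or import.

```python
# Hypothetical topology: each component -> the components it depends on.
DEPENDS_ON = {
    "orders-app": ["app-server-2"],
    "app-server-2": ["lun-14"],
    "lun-14": ["disk-7", "disk-8"],
    "disk-7": [],
    "disk-8": [],
}


def dependencies(component):
    """Return every component reachable below `component` in the topology."""
    seen, stack = set(), list(DEPENDS_ON.get(component, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, []))
    return seen


def root_causes(affected):
    """Affected components with no affected dependency are the likely roots."""
    return sorted(c for c in affected if not dependencies(c) & affected)


# Events as (reporting tool, component, observation).
events = [
    ("APM", "orders-app", "storage latency above normal"),
    ("IPM", "app-server-2", "elevated I/O wait"),
    ("storage", "disk-7", "media errors, predicted failure"),
]
affected = {component for _, component, _ in events}
for cause in root_causes(affected):
    print(f"Likely root cause: {cause}")
```

The application and server symptoms both sit above `disk-7` in the topology, so the sketch reports the failing disk rather than the components that merely felt its effects.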
AIOps gets IT closer to business
While all AIOps monitoring tools provide a broader and more in-depth view of an IT environment than point offerings do, some extend that view to the business side. They tie infrastructure and software performance to business outcomes, such as revenue generation, or predict the effect of hardware or software changes on business results.
For example, StackState is an AIOps tool that enables enterprises to relate IT events to business events, such as how a server disruption affects sales. Monitoring provider AppDynamics, a Cisco company, offers Business iQ for monitoring insights related to business processes and has built AIOps capabilities into its Cognition Engine, which combines monitoring data correlation and automation.