Any sufficiently advanced technology might be indistinguishable from magic. And the history of IT demonstrates that such technologies invariably exceed human capacity to control them without automation.
VM sprawl was the first sign of the chaos that came to engulf virtual infrastructure. That predicament has since been superseded by broader waste problems as cloud services continue to displace conventional enterprise infrastructure.
IT management has evolved from simple UI portals, which connected multiple systems and exposed various IT admin functions in a central location, to sophisticated statistical and machine learning algorithms. These algorithms harness the massive amounts of telemetry data modern physical IT infrastructure and cloud services generate to filter, correlate, summarize, analyze and, ultimately, predict the behavior of an entire cloud environment.
Let's explore which features are required of a true AIOps tool and, through that lens, examine the role of AI in capacity planning and resource management.
Core AIOps features
Artificial intelligence has meant many different things to various groups of people, spanning decades. AIOps, or AI for IT operations, is a Gartner buzzword that evolved from its initial term for the concept, algorithmic IT operations. The latter term is closer to the reality of what AIOps products actually do.
Most of AI's promised features are little more than the application of operations research (OR) to IT. OR breaks a task into its basic components and uses mathematical analysis -- for example, algorithms, machine learning and graph analysis -- to complete that task in a set of defined steps. AI performs this same process with advanced automation.
Given its breadth of features and potential applications, AIOps cannot be considered a platform or type of product -- it's a feature that fits into a wide variety of IT ops tasks and applications.
Products that incorporate AIOps features demonstrate six key characteristics:
- The ability to ingest data from multiple sources in near real time and in high volume via support for various data transfer protocols and system APIs.
- Support for multiple data types, including system and application telemetry, logs and configuration files.
- Use of a wide variety of statistical, simulation and optimization algorithms, such as linear programming, and machine learning modeling to analyze data.
- The ability to self-correct and optimize statistical and machine learning models, based on data feedback.
- APIs to enable task automation via batch operation or workflows, which are triggered by events, measurements or model predictions.
- A graphical management UI with data visualization and summarization -- i.e., dashboard creation -- tools.
Combined, these capabilities assist IT operations and DevOps teams in several tasks. For example:
- The AI tool identifies baseline norms for system and application performance. This provides an early warning of anomalies that indicate systemic problems, security intrusions or second-order changes (acceleration/deceleration) in usage trends.
- It can predict future capacity requirements based on a combination of persistent increases in demand for the baseline workload, as well as for variable and temporal changes.
- AI draws probabilistic inferences from available data to assist in forensic incident and root-cause analysis.
- It replaces routine manual tasks with automated, data-driven workflows.
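The first two tasks in this list can be illustrated with a minimal Python sketch. It is not how any particular AIOps product works: it assumes a plain list of evenly spaced utilization samples, uses a trailing-window z-score as the baseline norm and fits a simple least-squares line for the capacity forecast. The function names, window size and threshold are hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def find_anomalies(samples, window=12, threshold=3.0):
    """Return indices of samples that deviate sharply from the trailing baseline.

    The baseline for each sample is the mean and standard deviation of the
    `window` samples that precede it (an assumed, simplified norm).
    """
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has no spread to compare against; skip it.
        if sigma == 0:
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


def forecast_linear(samples, steps_ahead):
    """Extrapolate a least-squares trend line `steps_ahead` intervals past the data."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(samples)
    slope_num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(samples))
    slope_den = sum((x - x_mean) ** 2 for x in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


# Hypothetical CPU utilization samples (percent), one per interval.
cpu = [40, 42, 41, 39, 40, 41, 43, 40, 39, 41, 42, 40, 41, 95, 42]
spikes = find_anomalies(cpu)

# Hypothetical monthly storage use (TB) with a steady upward trend.
storage_tb = [10, 20, 30, 40]
projected = forecast_linear(storage_tb, steps_ahead=2)
```

A production tool would typically account for seasonality and would retrain its models as new data arrives, which this sketch does not attempt.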
Apply AI to capacity planning and resource management
Two of the most valuable applications of AI are for capacity planning and resource management -- particularly for VMs, cloud instances and container nodes.
Algorithmic analysis is valuable to cloud environments for two reasons.
First, the ease with which cloud resources deploy, move and scale leads to their proliferation. Human sloth and forgetfulness -- and overfull plates -- often lead to abandoned instances that consume resources and rack up bills. The combination yields cloud resources that are difficult to track and control via manual processes.
Second, that same facility for resource creation and reconfiguration makes cloud elements, such as compute instances, cloud object stores and virtual networks, amenable to optimization based on AIOps analysis. Management APIs enable IT teams to configure, clone and decommission enterprise VM environments and cloud instances programmatically.
AIOps can greatly reduce or solve several of cloud users' common problems. For example, users can:
- Select the best compute instance for a particular application that balances cost and performance. In an era of increasingly specialized hardware, with instances tailored for a variety of workloads -- such as general-purpose, storage-intensive, GPU-accelerated or high-performance computing workloads -- choosing the best instance for the job is quite complicated. And manual methods require regular updates to spreadsheets of instance types to account for changes or additions.
- Proactively monitor VM or cloud instance utilization and performance, as well as predict the need for capacity adjustments, whether measured by the number of instances, instance size or a combination of both.
- Find underutilized resources with comparable workloads and recommend consolidation opportunities to reduce the instance count without compromising performance.
- Flag cloud workloads for hybrid environments that would be better served -- either in terms of price or performance -- by redeployment to on-premises servers.
- Identify cloud instances that are targets for sustained use or reserved instance discounts due to their predictable or relatively static usage patterns.
- Find overprovisioned storage that is a good candidate for cold or near-line services to save money.
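Two of the items above, instance selection and the identification of idle or discount-eligible instances, reduce to simple rules once the data is in hand. The Python sketch below is an illustration under stated assumptions, not a vendor's method: the `InstanceType` record, the catalog entries, their prices and the thresholds `idle_below` and `steady_cv` are all hypothetical, and real tools weigh many more dimensions than vCPUs, memory and average CPU.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class InstanceType:
    """Hypothetical catalog entry; real catalogs carry many more attributes."""
    name: str
    vcpus: int
    memory_gib: float
    hourly_cost: float


def cheapest_fit(catalog, vcpus_needed, memory_gib_needed):
    """Return the lowest-cost instance type that meets both requirements, or None."""
    candidates = [
        t for t in catalog
        if t.vcpus >= vcpus_needed and t.memory_gib >= memory_gib_needed
    ]
    return min(candidates, key=lambda t: t.hourly_cost, default=None)


def classify_usage(cpu_percent, idle_below=5.0, steady_cv=0.15):
    """Label an instance from its CPU samples.

    "idle"     -> average use under `idle_below` percent (consolidation candidate)
    "steady"   -> low relative variation (possible reserved-discount candidate)
    "variable" -> everything else
    """
    avg = mean(cpu_percent)
    if avg < idle_below:
        return "idle"
    # Coefficient of variation: spread relative to the average level.
    if pstdev(cpu_percent) / avg <= steady_cv:
        return "steady"
    return "variable"


# Made-up catalog and prices, for illustration only.
catalog = [
    InstanceType("small", 2, 4, 0.05),
    InstanceType("medium", 4, 16, 0.20),
    InstanceType("compute", 8, 16, 0.34),
    InstanceType("large", 8, 32, 0.40),
]

choice = cheapest_fit(catalog, vcpus_needed=4, memory_gib_needed=8)
labels = [
    classify_usage([1, 2, 1, 3, 2]),
    classify_usage([50, 52, 49, 51, 50]),
    classify_usage([10, 90, 20, 80, 15]),
]
```

Keeping the catalog as data means it can be refreshed from a provider's pricing feed instead of a hand-maintained spreadsheet, which is the manual burden the first item describes.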
AIOps vendor summary
Machine learning, statistical-probabilistic and simulation algorithms improve the efficiency and efficacy of infrastructure management software. Thus, every server, storage, network or IT operations management software vendor is incorporating AIOps features into its products. Because AIOps is a feature set, not a product category, it is difficult to identify a distinct group of AIOps vendors. Some examples of companies with products focused on AIOps for capacity planning and optimization include BMC Software, IBM, Quest, ServiceNow and Splunk.
AIOps can deliver tangible improvements to infrastructure resource use and application performance at lower costs; it optimizes workload placement and identifies underutilized VMs, instances and storage resources. Organizations with sizable fleets of VMs and cloud instances should achieve a rapid ROI with AIOps on their compute environments.