4 essential KPIs for edge computing management
If you want to track edge deployment activity, look at storage, network and processing resources to guide workload configuration and maintenance needs.
Edge computing requires careful attention to infrastructure management. Solid remote management strategies and tools are table stakes for these deployments, and IT administrators and business leaders must consider several important metrics to help gauge the state and performance of each edge site.
Including site availability, network, storage and some processing data on a dashboard makes management and resource provisioning much easier for admins and business owners.
When it comes to edge computing management, there is little value in constantly tracking granular compute metrics such as CPU and memory capacity or utilization. Such granular metrics are often more relevant in dynamic virtualized environments, whereas edge computing sites are usually more static.
Site availability
Any edge site is a significant investment in time, money and hardware. The data collected and work performed at each edge site represents a measurable value for the business. Some measure of edge site availability and uptime is a key performance indicator for IT and business use. Availability is traditionally expressed as a percentage: the hours the site was working in a month divided by the total hours in that month.
Some downtime is perfectly normal. Logs help correlate downtime to external factors, such as power or WAN disruptions, and provide the business with improvement opportunities, such as adding a supplemental power system or utilizing an alternative WAN service. Once a business figures the value of an edge site's work, downtime can translate to lost revenue -- or downtime costs -- as an objective consideration for edge site upgrades.
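As a rough illustration of the arithmetic, the sketch below computes monthly availability and a simple downtime cost. The function names, the 720-hour month and the hourly value of the site's work are assumptions made for the example, not figures from any particular deployment.

```python
def availability_percent(total_hours: float, downtime_hours: float) -> float:
    """Working hours divided by total hours in the period, as a percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime_hours must be between 0 and total_hours")
    working_hours = total_hours - downtime_hours
    return 100.0 * working_hours / total_hours


def downtime_cost(downtime_hours: float, value_per_hour: float) -> float:
    """Lost value, assuming the site's work is worth a flat amount per hour."""
    return downtime_hours * value_per_hour


# Hypothetical site: a 30-day month (720 hours) with 4 hours of downtime,
# and work valued at an assumed 1,000 per hour.
print(round(availability_percent(720, 4), 2))  # 99.44
print(downtime_cost(4, 1000.0))                # 4000.0
```

A flat hourly value is the simplest possible model; a business with peak and off-peak periods would likely weight the hours differently.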
Network performance
Network metrics such as bandwidth utilization and latency help IT and business leaders understand how the LAN and WAN perform over time across the different edge sites. Bandwidth figures gauge the sheer amount of traffic moving across the network, whereas latency indicates delays between data packets being sent and acknowledged.
Bandwidth utilization data tracks how much of the available network is actually being used and may be represented as a percentage of total available bandwidth. Bandwidth utilization may also be tracked as a moving average with recorded spikes. High bandwidth utilization -- and frequent spikes -- can suggest the need for a network upgrade.
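One way to picture these figures is the short sketch below, which converts throughput samples to a utilization percentage, then tracks a moving average and records spikes. The window size, the 80% spike threshold and the sample values are illustrative assumptions rather than recommended settings.

```python
from collections import deque
from typing import List, Sequence, Tuple


def utilization_percent(used_mbps: float, capacity_mbps: float) -> float:
    """Traffic in use as a percentage of total available bandwidth."""
    if capacity_mbps <= 0:
        raise ValueError("capacity_mbps must be positive")
    return 100.0 * used_mbps / capacity_mbps


def moving_average_with_spikes(
    samples_pct: Sequence[float], window: int, spike_threshold_pct: float
) -> Tuple[List[float], List[int]]:
    """Return the moving average per sample and the indexes of spike samples."""
    recent = deque(maxlen=window)  # keeps only the last `window` samples
    averages: List[float] = []
    spikes: List[int] = []
    for index, sample in enumerate(samples_pct):
        recent.append(sample)
        averages.append(sum(recent) / len(recent))
        if sample >= spike_threshold_pct:
            spikes.append(index)
    return averages, spikes


# Hypothetical utilization samples (percent) from one edge site's WAN link.
averages, spikes = moving_average_with_spikes([20, 40, 90, 30], window=2,
                                              spike_threshold_pct=80)
print(averages)  # [20.0, 30.0, 65.0, 60.0]
print(spikes)    # [2]
```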
Latency is typically measured in milliseconds, and admins compare it against a common or desired baseline. As average latency reaches or exceeds that baseline, admins should investigate the network for the underlying performance problem.
In edge computing management, bandwidth and latency concerns can be related. As bandwidth approaches maximum capacity, latency frequently increases to indicate network congestion. If latency unexpectedly increases as bandwidth use remains modest, there may be infrastructure problems lurking elsewhere which might demand closer attention.
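The relationship between the two metrics can be sketched as a small decision rule. This is a minimal illustration, assuming a hypothetical 80% cutoff for "high" bandwidth utilization and a latency baseline the admin has already established; the labels it returns are prompts for investigation, not diagnoses.

```python
def diagnose(utilization_pct: float, latency_ms: float,
             baseline_latency_ms: float,
             high_utilization_pct: float = 80.0) -> str:
    """Classify a site's network state from bandwidth use and latency."""
    latency_high = latency_ms >= baseline_latency_ms      # reaches or exceeds baseline
    bandwidth_high = utilization_pct >= high_utilization_pct

    if latency_high and bandwidth_high:
        return "likely congestion"           # both rising together
    if latency_high:
        return "investigate infrastructure"  # latency up while bandwidth is modest
    if bandwidth_high:
        return "high utilization"            # busy link, latency still within baseline
    return "normal"


# Hypothetical readings: modest bandwidth use but latency above an 80 ms baseline.
print(diagnose(utilization_pct=30.0, latency_ms=120.0, baseline_latency_ms=80.0))
# investigate infrastructure
```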
Storage capacity
Edge computing is relevant because organizations need to store and process huge volumes of data that they cannot readily move to a centralized data center. This means an edge site depends on local storage resources until admins can move the data as a batch, or process it on site and send the results to a data center.
Storage is measured in bytes, and storage capacity is gauged in gigabytes or even terabytes, depending on the number of workloads and the actual amount of data expected at the edge site.
Storage utilization metrics are generally reported as a percentage -- or ratio -- of the used storage versus the total storage capacity available. IT and business leaders use the storage utilization metric to determine whether and when the edge site needs more storage capacity.
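A minimal sketch of the calculation follows. The utilization function is the used-versus-total ratio described above; the days-until-full estimate is an added assumption that data grows by a constant amount each day, which real edge workloads may not do. The capacity and growth figures are hypothetical.

```python
TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def storage_utilization_percent(used_bytes: int, total_bytes: int) -> float:
    """Used storage as a percentage of total storage capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def days_until_full(used_bytes: int, total_bytes: int,
                    daily_growth_bytes: int) -> float:
    """Naive estimate of days until capacity is reached, assuming linear growth."""
    if daily_growth_bytes <= 0:
        raise ValueError("daily_growth_bytes must be positive")
    remaining = max(total_bytes - used_bytes, 0)
    return remaining / daily_growth_bytes


# Hypothetical edge site: 6 TB used of 8 TB, growing by about 50 GB per day.
print(storage_utilization_percent(6 * TB, 8 * TB))  # 75.0
print(days_until_full(6 * TB, 8 * TB, 50 * GB))     # 40.0
```

An estimate like this gives a sense of how much lead time is left to schedule an upgrade or adjust retention before the site runs out of space.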
Utilization information can guide data retention, data protection and backup decisions. Storage capacity planning is critical here because many edge sites are remote, and organizations must often dispatch personnel to perform an upgrade as a field engineering exercise.
Compute resources
Edge computing sites still host workloads that collect data, and many also perform some amount of data processing, such as normalizing or translating data. This typically requires some number of servers running applications, so IT departments and managers follow other metrics that reflect the compute status and performance of the edge site.
In an edge computing management setup, it is more beneficial to follow more abstract metrics such as workload availability and monitor each workload on site with some form of application performance monitoring (APM).
APM software tracks granular compute resources such as CPU utilization, memory use, data throughput and bandwidth utilization, as well as other elements such as transactions and I/O. Measuring APM results against a performance baseline can help IT and business leaders to determine the relative performance and availability of each edge workload.
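Comparing results against a baseline can be as simple as the sketch below, which flags any metric that deviates from its baseline by more than a tolerance. It does not use any real APM product's API; the metric names, the dictionary format and the 20% tolerance are all hypothetical.

```python
from typing import Dict


def deviations_from_baseline(measured: Dict[str, float],
                             baseline: Dict[str, float],
                             tolerance_pct: float = 20.0) -> Dict[str, float]:
    """Return the signed percent change for metrics outside the tolerance band."""
    flagged: Dict[str, float] = {}
    for name, base in baseline.items():
        if name not in measured or base <= 0:
            continue  # skip metrics with no reading or no usable baseline
        change_pct = 100.0 * (measured[name] - base) / base
        if abs(change_pct) > tolerance_pct:
            flagged[name] = change_pct
    return flagged


# Hypothetical baseline and current readings for one edge workload.
baseline = {"cpu_pct": 40.0, "response_ms": 200.0}
measured = {"cpu_pct": 44.0, "response_ms": 300.0}
print(deviations_from_baseline(measured, baseline))  # {'response_ms': 50.0}
```

Deviations are flagged in both directions because a drop, such as lower throughput, can signal a problem just as a rise in response time can.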