Telemetry vs. SNMP: Is one better for network management?
Networks are more complex, making network management more challenging. Enterprises are weighing telemetry vs. SNMP to see which method is better. Each has pluses and minuses.
Simple Network Management Protocol, or SNMP, and telemetry operate with quite different mechanisms. When weighing telemetry vs. SNMP, do those distinctions make one better than the other?
How SNMP works
SNMP has been in use for network management since 1990 and is widely supported by both network devices and monitoring platforms. Device performance data is collected through a polling mechanism and returned to the management platform. There are three major versions of SNMP (SNMPv1, SNMPv2c and SNMPv3), with SNMPv3 adding important authentication and encryption features.
SNMP uses a simple protocol that requests data identified by one or more object IDs (OIDs) in a GetRequest, GetNextRequest or GetBulkRequest packet. Data is returned in response packets. The OIDs are structured in a management information base (MIB). It is easy to perform ad hoc data collection as needed. Asynchronous events can be communicated back to the management system via SNMP traps or via syslog. Data is transported via User Datagram Protocol (UDP), which requires only minimal overhead on both the network device and the management system.
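To make the OID and GetNextRequest mechanics concrete, here is a minimal sketch in Python that simulates an agent's MIB in memory. It does not speak SNMP on the wire. The OIDs are standard MIB-II objects, but the values, the device name and the helper functions (`get`, `get_next`, `walk`) are hypothetical and exist only for illustration.

```python
from bisect import bisect_right


def parse_oid(text):
    """Convert a dotted OID string to a tuple of ints so it sorts numerically."""
    return tuple(int(part) for part in text.strip(".").split("."))


def format_oid(oid):
    """Convert an OID tuple back to dotted notation."""
    return ".".join(str(part) for part in oid)


# Hypothetical agent data: real MIB-II OIDs, made-up values.
AGENT_DATA = {
    "1.3.6.1.2.1.1.5.0": "edge-router-1",  # sysName.0
    "1.3.6.1.2.1.2.2.1.10.1": 1200,        # ifInOctets, ifIndex 1
    "1.3.6.1.2.1.2.2.1.10.2": 3400,        # ifInOctets, ifIndex 2
    "1.3.6.1.2.1.2.2.1.10.10": 5600,       # ifInOctets, ifIndex 10
    "1.3.6.1.2.1.2.2.1.16.1": 900,         # ifOutOctets, ifIndex 1
}

MIB = {parse_oid(oid): value for oid, value in AGENT_DATA.items()}
SORTED_OIDS = sorted(MIB)  # lexicographic order over the numeric sub-identifiers


def get(oid_text):
    """Simulate GetRequest: return the value at exactly this OID, or None."""
    return MIB.get(parse_oid(oid_text))


def get_next(oid_text):
    """Simulate GetNextRequest: return the first (oid, value) after this OID."""
    index = bisect_right(SORTED_OIDS, parse_oid(oid_text))
    if index == len(SORTED_OIDS):
        return None  # end of the MIB view
    next_oid = SORTED_OIDS[index]
    return format_oid(next_oid), MIB[next_oid]


def walk(prefix_text):
    """Repeat get_next until the returned OID leaves the requested subtree."""
    prefix = parse_oid(prefix_text)
    results = []
    current = prefix_text
    while True:
        item = get_next(current)
        if item is None or parse_oid(item[0])[:len(prefix)] != prefix:
            break
        results.append(item)
        current = item[0]
    return results


if __name__ == "__main__":
    for oid, value in walk("1.3.6.1.2.1.2.2.1.10"):
        print(oid, value)
```

Walking the ifInOctets column returns the rows for interfaces 1, 2 and 10 in that order, because the ordering compares numeric sub-identifiers rather than text. Each row costs one request and one response in this sketch, which is the overhead GetBulkRequest is designed to reduce.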
SNMP's polling architecture also has a downside. The management system needs to create and send data requests to each device, only to repeat the process a few minutes later. There is also a processing cost. The MIB's lexicographical ordering differs from the way devices store interface performance data internally, so the device's CPU has to do more processing to handle the polling requests.
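One reason polls have to be repeated is that a counter such as ifInOctets only reports a running total, so the management system must compare two successive samples to derive a rate. The following is a minimal sketch of that arithmetic, assuming a 32-bit counter that wraps at most once between polls. The function names and sample values are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # ifInOctets is a 32-bit counter in MIB-II


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, tolerating a single wrap."""
    return (current - previous) % modulus


def bits_per_second(previous, current, interval_seconds):
    """Average bit rate between two polls taken interval_seconds apart."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) * 8 / interval_seconds


if __name__ == "__main__":
    # Two hypothetical polls taken five minutes (300 seconds) apart.
    print(bits_per_second(1_000, 301_000, 300))
```

The longer the polling interval, the more the result averages out short bursts, which is part of why a polling model struggles to deliver high-resolution data.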
A vendor-independent MIB, named MIB-II, provides a general set of operational variables across a wide range of devices. Vendors can augment MIB-II with custom MIBs, and some network management systems take advantage of this additional data source.
How telemetry works
Streaming network telemetry is a relatively new mechanism that uses a push model to continuously send high-resolution device operational data to a network management system. It sends data at a higher rate and with lower impact on the network devices than other methods, like SNMP or the command-line interface (CLI). Data is streamed either on a configured periodic cadence, which can be subsecond, or in response to an event trigger, such as a threshold breach (e.g., high errors) or a status change (e.g., interface state change).
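The two ways of selecting data, periodic cadence and event trigger, can be sketched with a small simulation. This is not a real telemetry agent. The function `simulate_stream`, the sample values and the threshold are all hypothetical, and the sketch only shows which samples a device would push under each rule.

```python
def simulate_stream(samples, interval, threshold):
    """Decide which samples a device would push to the collector.

    samples   -- list of (timestamp_seconds, error_count) readings on the device
    interval  -- periodic cadence in seconds
    threshold -- error count above which an event-triggered update is sent
    Returns a list of (timestamp, value, reason) tuples.
    """
    pushed = []
    next_periodic = samples[0][0] if samples else None
    above_threshold = False
    for timestamp, value in samples:
        # Periodic cadence: push whenever the interval has elapsed.
        if timestamp >= next_periodic:
            pushed.append((timestamp, value, "periodic"))
            next_periodic = timestamp + interval
        # Event trigger: push once when the value first crosses the threshold.
        breached = value > threshold
        if breached and not above_threshold:
            pushed.append((timestamp, value, "threshold"))
        above_threshold = breached
    return pushed


if __name__ == "__main__":
    readings = [(0, 0), (1, 2), (2, 50), (3, 60), (4, 1), (5, 3)]
    for update in simulate_stream(readings, interval=5, threshold=10):
        print(update)
```

With these readings, the collector receives periodic updates at seconds 0 and 5 plus one threshold update at second 2, without having sent a single request.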
The data is encoded as XML, JSON or Google protocol buffers. Either UDP or TCP transport can be used. TCP is frequently used in conjunction with gRPC, Google's open source remote procedure call framework, which supports encryption. gRPC enables a collector to dynamically request a data stream from a network device. It can be used to establish new data streams or to poll for data that rarely changes.
Model-driven telemetry, meanwhile, is based on YANG (Yet Another Next Generation) models and simplifies the selection of the data to stream. The OpenConfig working group is creating standardized models that can be applied across groups of network devices. In addition, Google, through its gRPC Network Management Interface (gNMI) initiative, is attempting to define a standard that governs how telemetry can be used to retrieve network state data.
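In model-driven telemetry, the data to stream is typically named by a path into the YANG model rather than by an OID. Below is a minimal sketch of splitting such a path into its elements and keys. The path follows the style of the OpenConfig interfaces model, the interface name is hypothetical, and the parser is a simplification that does not handle escaped brackets or slashes outside key values.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
ELEMENT_PATTERN = re.compile(r"/([^/\[]+)((?:\[[^\]]+\])*)")
KEY_PATTERN = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_path(path):
    """Split a path string into a list of (element_name, keys) pairs."""
    elements = []
    for name, key_text in ELEMENT_PATTERN.findall(path):
        keys = dict(KEY_PATTERN.findall(key_text))
        elements.append((name, keys))
    return elements


if __name__ == "__main__":
    # Hypothetical subscription target: input octet counter of one interface.
    path = "/interfaces/interface[name=Ethernet1/1]/state/counters/in-octets"
    for name, keys in parse_path(path):
        print(name, keys)
```

Because the same path can be applied to any device that implements the model, the collector does not need a vendor-specific lookup for each counter.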
The volume of data that can be streamed from even a moderately sized network can be huge, requiring big data storage and processing mechanisms. Network managers have to determine the cadence or event triggers for streaming each type of data so they don't overwhelm the processing capabilities of the network management system in question.
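A back-of-the-envelope estimate shows how quickly the cadence drives data volume. The sketch below is simple arithmetic, and every input (device count, interfaces per device, counters per interface, bytes per sample) is a hypothetical figure chosen for illustration, not a measurement.

```python
SECONDS_PER_DAY = 86_400


def daily_volume_bytes(devices, interfaces_per_device, counters_per_interface,
                       bytes_per_sample, interval_seconds):
    """Estimate the bytes streamed per day for a given periodic cadence."""
    samples_per_day = SECONDS_PER_DAY / interval_seconds
    streams = devices * interfaces_per_device * counters_per_interface
    return streams * bytes_per_sample * samples_per_day


if __name__ == "__main__":
    # Hypothetical network: 1,000 devices, 48 interfaces each,
    # 10 counters per interface, 100 bytes per encoded sample.
    for interval in (60, 10, 1):
        volume = daily_volume_bytes(1_000, 48, 10, 100, interval)
        print(f"{interval:>2}s cadence: {volume / 1e9:.2f} GB per day")
```

Under these assumptions, moving from a 60-second to a 10-second cadence raises the daily volume from about 69 GB to about 415 GB, which is why the cadence for each data type deserves deliberate tuning.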
Comparing telemetry vs. SNMP
SNMP works best for retrieving relatively static data, such as inventory or neighboring devices. Its polling mechanism makes collecting high-volume, high-resolution performance data a challenge.
Note, however, that several network management products can collect a full suite of performance variables from more than 1 million interfaces on thousands of devices every minute from one server. Clearly, a good implementation is critical to good performance.
SNMP is useful for networks equipped with significant numbers of older devices that don't support telemetry. It is also good for collecting nonperformance data, such as routing peers, bridge domain neighbors, Network Time Protocol peers and device inventory information -- i.e., serial numbers, modules and slot locations. Finally, the protocol's use of UDP eliminates the need to allocate large receive buffers, enabling management servers to more efficiently allocate internal memory.
Streaming telemetry is better for collecting high-resolution performance data, such as high-speed network interface statistics. It's becoming more practical as more device and network management vendors begin to support the methodology.
In addition, newer RPC mechanisms make telemetry more efficient than SNMP or CLI in obtaining data from network devices, making telemetry the obvious choice going forward. Keep in mind, however, that telemetry collectors that rely on TCP connections may use a significant amount of memory for receive buffers, depending on the implementation. Moreover, the large number of YANG models for each vendor can make it difficult to analyze streaming data.
For networks that contain a mix of old and new network devices, a combination of SNMP and telemetry will be best. A switch to telemetry is possible when all network devices within an organization support it.
Regardless of how you may assess the data collection methods of telemetry vs. SNMP, network management is essentially a big data problem. The management system needs to process large volumes of data to identify anomalies and alert the network operations team to problems. The OpenConfig and gNMI initiatives are working to simplify data collection and analysis.