Cisco's Tetration adds visibility for monitoring data centers

Cisco's analytics engine takes data center monitoring to a new level by capturing network traffic and applying big data techniques to diagnose problems.

Are you running a data center? Do you know what applications are running there, where they are located and what network traffic they generate? Are you able to lock down network security using a whitelist model that makes it much harder for bad actors to penetrate? Cisco's new analytics engine, Tetration, gives administrators another tool they can use to answer these questions. Without it, you may be spending a lot of time with an application performance management product and staring at packet traces.

Tetration provides a new level of visibility into the data center. "Ahh," you say, "I already have flow data collection in my data center. I know everything that's happening there." Not so fast. Let's take a closer look.

Flow data provides visibility into the flows, which is great. But it doesn't compare with the visibility Tetration provides. The secret is in collecting much more detailed information about network traffic and then analyzing that traffic using machine learning techniques.

Cisco's Tetration collects data from all packets in the data center, allowing it to determine things that flow data cannot provide. (Cisco says Tetration doesn't sample; each packet is combed through.) To make the data collection manageable, only the packet headers (160 bytes) are collected. Encrypted data? No problem. The data portion of the packet isn't used. I like the analysis functionality and that it treats the network as a system.

How Tetration monitors data centers

Data collection is done in two locations. The first is within the network and is performed by Cisco Nexus 9300 switches, a new switch model whose ASIC performs data capture without impacting the switch CPU. The second is within the data center servers themselves, via host sensors. The host sensors also identify the application that's involved, which facilitates application dependency mapping. If your data center is built on other models of data center equipment, you'll have to settle for just the host sensors to collect application traffic data. Network traffic generated by sending the packet headers to the Tetration analytics engine is minimal -- about 1% of total network volume. Overhead on the servers and network equipment is also minimal.

The packet header data and application endpoints create a big data problem. It's addressed by a Hadoop-based machine learning system that correlates the data to arrive at the application dependency map for each application. It can tell you what protocols and protocol ports are in use. It knows the network infrastructure that the packets have traversed. So, how can we use the data?
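To make the idea of a dependency map concrete, here is a minimal Python sketch that groups flow records by who talks to whom and on which protocol and port. The record layout, application names and the `build_dependency_map` helper are my own assumptions for illustration; they are not Tetration's data model or API, and the real system works at a far larger scale.

```python
from collections import defaultdict

# Hypothetical flow records: (source app, destination app, protocol, destination port).
# The field layout and values are illustrative only, not Tetration's schema.
FLOWS = [
    ("web", "app", "tcp", 8080),
    ("web", "app", "tcp", 8080),
    ("app", "db", "tcp", 5432),
    ("app", "dns", "udp", 53),
]


def build_dependency_map(flows):
    """Group observed flows into {source: {destination: {(protocol, port), ...}}}."""
    deps = defaultdict(lambda: defaultdict(set))
    for src, dst, proto, port in flows:
        deps[src][dst].add((proto, port))
    # Convert to plain dicts so the result prints and compares cleanly.
    return {src: dict(dsts) for src, dsts in deps.items()}


dependency_map = build_dependency_map(FLOWS)
for src, dsts in sorted(dependency_map.items()):
    for dst, services in sorted(dsts.items()):
        print(src, "->", dst, sorted(services))
```

Duplicate flows collapse into a single dependency edge, which is the point: the map describes relationships, not traffic volume.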

Application dependencies

Let's say you need to take a switch down for maintenance or replacement. Will your data center redundancy work correctly and automatically route around the switch when you take it out of service? An even better approach is to steer traffic to alternative paths, verify that the switch is no longer handling production traffic, then take the switch down. A similar exercise involves the sudden failure of a switch. Which applications and servers are affected? Do some virtual machines need to be restarted elsewhere in the data center?
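The switch-maintenance question can be sketched in a few lines of Python, assuming you have flow records annotated with the switches each flow traversed. The record format, switch names and both helper functions below are hypothetical; they only illustrate the kind of query that path-aware flow data makes possible.

```python
# Hypothetical records: an application name plus the switches its flows traversed.
FLOW_PATHS = [
    {"app": "billing", "path": ["leaf1", "spine1", "leaf3"]},
    {"app": "billing", "path": ["leaf1", "spine2", "leaf3"]},
    {"app": "hr", "path": ["leaf2", "spine1", "leaf4"]},
    {"app": "wiki", "path": ["leaf2", "spine2", "leaf4"]},
]


def apps_using_switch(flow_paths, switch):
    """Applications with at least one observed flow through the switch."""
    return sorted({f["app"] for f in flow_paths if switch in f["path"]})


def apps_without_observed_alternative(flow_paths, switch):
    """Applications whose observed flows all cross the switch."""
    using = set(apps_using_switch(flow_paths, switch))
    elsewhere = {f["app"] for f in flow_paths if switch not in f["path"]}
    return sorted(using - elsewhere)


print(apps_using_switch(FLOW_PATHS, "spine1"))
print(apps_without_observed_alternative(FLOW_PATHS, "spine1"))
```

Note that "no observed alternative" only means no other path showed up in the captured traffic. It does not prove that a redundant path is missing, so treat the second list as a place to start checking, not a verdict.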

Application dependency mapping can also be used to report on the number of components that make up an application. Some organizations are surprised to find that an alleged three- or four-tier application actually has six or seven layers. Where are the servers located? Studies have shown a significant performance difference when servers within an application are located close to each other, even within a single data center.

Data center migration planning is also an excellent use case. Which applications need to move? What servers are involved? When the move is underway, how do you know that all the servers for all applications have been moved? Tetration can help answer these questions.

By tracking packet flows, it is also easy to determine servers that are no longer part of an application. In one data center move, for example, Cisco found that more than 40% of the servers in a data center were no longer part of an active application. Decommissioning these servers led to a reduction in power and cooling requirements.
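Finding candidate servers for decommissioning amounts to a set difference between your inventory and the hosts that actually appear in flow data. The sketch below assumes a hypothetical inventory and simplified (source, destination) flow pairs; the host names and the `idle_servers` helper are mine, not part of any Cisco tool.

```python
# Hypothetical server inventory and observed (source, destination) flow pairs.
INVENTORY = {"srv-a", "srv-b", "srv-c", "srv-d"}
FLOWS = [("srv-a", "srv-b"), ("srv-b", "srv-c")]


def idle_servers(inventory, flows):
    """Return inventory hosts that never appear as a flow source or destination."""
    active = {host for flow in flows for host in flow}
    return sorted(inventory - active)


print(idle_servers(INVENTORY, FLOWS))
```

The observation window matters here. A server that is silent for a week may still run a quarterly job, so a result like this is a list to investigate, not a list to unplug.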

Whitelist security

Whitelist security depends on having a complete understanding of the network protocols and ports that an application requires. Access control lists (ACLs) are built to permit only those packets that are allowed for the application to function. Of course, operations, administration and maintenance traffic must be allowed. All other network traffic is denied. This approach is a more secure "locked-down" approach than has been typically used in the past. The problem is that network and security administrators often do not know the characteristics of the network traffic that must be allowed in order for the application to function correctly. This is especially true in the healthcare industry where many of the applications are provided by external IT vendors. Tetration can prepare a whitelist policy for applications based on actual traffic flow information.
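As a rough illustration of deriving a whitelist from real traffic, the following Python sketch turns a set of observed flows into permit entries followed by a final deny. The addresses, the `build_whitelist` helper and the ACL-like output text are assumptions for the example; the output is not guaranteed to be valid syntax for any particular device, and it is not how Tetration itself generates policy.

```python
# Hypothetical observed flows: (source IP, destination IP, protocol, destination port).
FLOWS = [
    ("10.0.1.10", "10.0.2.20", "tcp", 8080),
    ("10.0.1.11", "10.0.2.20", "tcp", 8080),
    ("10.0.1.10", "10.0.2.20", "tcp", 8080),  # duplicate observation
    ("10.0.2.20", "10.0.3.30", "tcp", 5432),
]


def build_whitelist(flows):
    """Emit one permit line per unique flow, then deny everything else."""
    lines = [
        f"permit {proto} host {src} host {dst} eq {port}"
        for src, dst, proto, port in sorted(set(flows))
    ]
    # Whitelist model: anything not explicitly permitted is dropped.
    lines.append("deny ip any any")
    return lines


for line in build_whitelist(FLOWS):
    print(line)
```

A real policy would also need entries for operations, administration and maintenance traffic, which may not appear in an application-focused capture.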

Another interesting use case is to validate changes in network security policies. Tetration can replay packet flows against proposed changes to validate whether the applications will be adversely affected. This functionality certainly makes modifying an ACL much less stressful. It also opens the door to validating the active entries in an ACL and removing stale ones. How many times have you wondered whether all the entries in a big ACL were really needed? Now you can answer that question. Better security is obtained if the ACLs are constructed to support only the active applications.
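The replay idea can be sketched as two set comparisons: recorded flows that the proposed ACL would block, and ACL entries that no recorded flow ever matched. This simplified Python example assumes exact-match permit entries with an implicit deny; the data and the `replay` helper are hypothetical, and real ACLs with subnets, port ranges and rule ordering need proper matching logic.

```python
# Hypothetical proposed ACL: exact-match permit entries, implicit deny for the rest.
PROPOSED_ACL = [
    ("10.0.1.10", "10.0.2.20", "tcp", 8080),
    ("10.0.2.20", "10.0.3.30", "tcp", 5432),
    ("10.0.9.99", "10.0.2.20", "tcp", 21),
]

# Hypothetical recorded flows in the same (src, dst, protocol, port) form.
RECORDED_FLOWS = [
    ("10.0.1.10", "10.0.2.20", "tcp", 8080),
    ("10.0.2.20", "10.0.3.30", "tcp", 5432),
    ("10.0.1.10", "10.0.4.40", "udp", 514),
]


def replay(acl, flows):
    """Return (flows the ACL would block, ACL entries no flow matched)."""
    permitted = set(acl)
    seen = set(flows)
    blocked = sorted(seen - permitted)
    stale = sorted(permitted - seen)
    return blocked, stale


blocked, stale = replay(PROPOSED_ACL, RECORDED_FLOWS)
print("would be blocked:", blocked)
print("possibly stale:", stale)
```

An entry flagged as possibly stale is only unused within the recorded window, so it deserves a second look before removal.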

There is a caveat to using real network traffic for security policy creation. If an application performs some actions on a monthly or annual basis, and those actions generate network traffic that's different from the normal flows, then that information will need to be incorporated into the resulting ACLs.

(For more information on the basics of Tetration, take a look at these blogs about the announcement: "An Easy New Way to Inventory Everything in Your Data Center" and "Do You Need to Know It All.")

The implementation

Tetration is an analysis engine aimed at advanced customers that have a high cost of downtime or whose application environment is challenging to understand. The system is sized for large (not huge) customers and comes as a single rack of 39 1U servers. Installation takes four weeks, supported by Cisco. That implies an expensive system, and it is. Tetration's list price is $3,097,830. (Yes, that's $3.1 million, and it will likely have some amount of discounting applied.) Expect to see smaller systems and perhaps a cloud-based system in the future, presumably at lower price points.

The existing system is a first-generation product. I expect to see quite a few improvements in the next year. For example, the current system does not discover the physical infrastructure, yet it is important to be able to identify the physical servers and switch ports used by an application. The security analysis could also be extended to provide IDS and IPS functionality, particularly with extensions that allow Tetration to automatically update ACL configurations to stop malicious actors in their tracks.

Tetration is a promising network management technology for monitoring data centers. It will be interesting to see how it matures over the coming year.

This was last published in July 2016
