In the world of IT, it seems to be a commonly held belief that your environment is somehow unique -- special, even.
It's measurably and materially different from the run-of-the-mill enterprise infrastructure designs that make up everyone else's networks -- except, possibly, the networks you designed before joining your current company; those were special, too, but, of course, not as special as your current environment.
As such, network monitoring best practices, common techniques and standard approaches don't apply, or at least they must be significantly altered to fit the delicate and beautiful snowflake that is your IT architecture.
Nowhere have I seen this perception more common than with system monitoring tools. I've lost count of the number of organizations over the last 30 years that treat their particular mix of servers, applications, network devices and so on as somehow different from any other.
Their monitoring platforms, meanwhile, were built as in-house, custom-crafted artisanal technology that resembles an interpretive dance more than common software and hardware. And they require special care and feeding by either a wizard of an engineer who speaks only in enigmatic koans or a mystical cabal of specially trained sys admins who were raised by Linux-wielding monks in a far-flung technical monastery.
No help from vendors
Sadly, many system monitoring vendors don't help this perception much -- each one adding fear, uncertainty and doubt to the mix with marketing that spins epic yarns of how it leverages "special APIs" and "context-sensitive command sets" to underpin a network monitoring best practices foundation. All of these claims stem from a combination of great skill, long beards and certifications from Hogwarts-like institutions.
I say this: Poppycock. Horsefeathers. Bull-frakking-pucky.
With 30 years in IT, and almost 20 of those focused on the monitoring space -- and having used just about every major monitoring platform on the market since 1998 in environments ranging from a few dozen servers to 250,000 systems in 5,000 locations worldwide -- I'm here to tell you something that flies in the face of all that.
Monitoring is simple.
Successful monitoring is standard, but it can be challenging
Yes, implementing good system monitoring -- monitoring robust enough to collect the statistics you need without injecting observer bias; monitoring that provides meaningful, actionable alerts, rather than noise; monitoring that takes the step of performing automated responses as part of the monitoring motion -- is simple. It's not voodoo. It is as standardized as subnetting. That's not to say it's easy, however. Simple and easy are not the same thing.
And one of the elements that makes monitoring complex is automation. Many IT professionals -- and even a few monitoring experts -- will say automation is really best left in the realm of servers and applications. Or, that the only viable way to get automation in the world of networking is to venture into the undiscovered country of SDN.
Nothing could be further from the truth.
First, let's get something out of the way: Monitoring is not a ticket, a page or a screen. Network monitoring best practices are nothing more -- or less -- than the ongoing, regular and consistent collection of metrics from and about a set of devices. Everything else -- reports, alerts, tickets and even automation -- is a happy byproduct you enjoy as long as you do the first part.
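That "ongoing, regular and consistent collection" can be sketched in a few lines. This is a minimal, hypothetical polling loop -- the device name and the metric reader are stand-ins (a real poller would issue an SNMP GET or call an agent API), but the shape is the point: one metric, one interval, one growing history.

```python
import time
from collections import deque

# Hypothetical metric source -- a stand-in for an SNMP GET of something
# like ifInOctets. It just returns a counter that grows on each read.
def read_interface_octets(device):
    read_interface_octets.counter += 1500
    return read_interface_octets.counter
read_interface_octets.counter = 0

def poll(device, interval_s, cycles):
    """The core of monitoring: collect one metric at a fixed interval."""
    history = deque(maxlen=1440)  # keep roughly a day at one-minute polls
    for _ in range(cycles):
        history.append((time.time(), read_interface_octets(device)))
        time.sleep(interval_s)
    return history

samples = poll("core-sw-01", interval_s=0.01, cycles=5)
print(len(samples))  # 5 timestamped samples
```

Everything else the article mentions -- alerting, reporting, automation -- consumes a history like `samples`; none of it exists without this loop running first.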
That said, good automation is enabled by -- and is a result of -- good monitoring. For example, if you have robust monitoring in place, it's simple to set up the ability to:
- Collect network device configs on a regular basis.
- Receive config-change traps.
- Collect the configs from the device that just sent out a trap.
- Compare the "last-known good" config to the one just collected.
- If it is materially different, force back the old config and send an alert.
In this way, devices modified without proper change control -- which is the cause of anywhere between 40% and 80% of all corporate network downtime, depending on which report you believe -- are forced back to their previous state until the new changes can be understood.
It's elegant; it's simple; and most importantly, it's not artisanal. It's just automation the way automation is meant to work.
There are other examples of network device automation, some of which I've written about in the past, but the biggest barrier to implementing monitoring at most companies is not having the wrong tools or the wrong skills. It's having the wrong mindset -- a mindset that says monitoring and automation are complicated and difficult, "far beyond those of mortal men," to quote the old Superman reruns.
In the end, network monitoring best practices and automation are only limited by your ability to imagine and then implement a good monitoring tool, rather than your ability to perform some weird interpretive dance.