At the Washington, D.C.-based National Parks Conservation Association (NPCA), ensuring the stability of our IT network is one of our biggest priorities in support of our mission of preserving America's national parks for future generations. Our IT department supports more than 100 users, 15 regional offices, and financial and business applications, as well as our popular Web site (http://www.npca.org). As in most government and corporate environments, downtime is simply not an option.
But because NPCA has a limited IT budget and limited resources, it took us some time to achieve reliable network uptime and availability. In late 2003, we often experienced temporary outages and other small network issues that escalated into major crises because they weren't diagnosed immediately. For example, a glitch in our DNS server caused delays and strange behavior in server-to-server and client-to-server communications. Because our servers also cache DNS information, we didn't immediately recognize what was going on, and performance suffered until we pinned down the issue. With proper monitoring, we would have received an immediate alert that our DNS services were not working properly.
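As a minimal sketch of the kind of DNS check described above (not how any particular monitoring product implements it), the Python below resolves a list of hostnames and flags any that fail or answer slowly. The `check_dns` and `alerts` helpers, the hostnames and the 2-second threshold are all hypothetical; the resolver is passed in as a parameter so the logic can be exercised without a live DNS server.

```python
import socket
import time
from typing import Callable, Dict, List

# Hypothetical threshold: lookups slower than this are reported as degraded.
SLOW_LOOKUP_SECONDS = 2.0


def check_dns(
    hostnames: List[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
    slow_after: float = SLOW_LOOKUP_SECONDS,
) -> Dict[str, str]:
    """Resolve each hostname and classify the result as OK, SLOW or FAILED."""
    results: Dict[str, str] = {}
    for name in hostnames:
        start = time.monotonic()
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (a failed lookup) is a subclass of OSError.
            results[name] = "FAILED"
            continue
        elapsed = time.monotonic() - start
        results[name] = "SLOW" if elapsed > slow_after else "OK"
    return results


def alerts(results: Dict[str, str]) -> List[str]:
    """Return one alert message per hostname that did not resolve cleanly."""
    return [f"DNS {status}: {name}" for name, status in results.items() if status != "OK"]
```

A scheduler such as cron could run `check_dns` every few minutes against the internal names your servers depend on and e-mail whatever `alerts` returns. One caveat ties back to the caching problem in the story: `socket.gethostbyname` goes through the operating system's resolver, which may answer from a cache and hide a failing DNS server, so a production check would more likely query the DNS server directly with a dedicated DNS library.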
On another occasion, a malfunctioning core switch corrupted our mail server database. Initially we thought the issue was limited to the mail server itself. Only after spending considerable time restoring the mail system did we realize that we had a second, and primary, issue to resolve: the switch. Tackling the problems one at a time, instead of on all fronts, increased recovery time.
To remedy these problems, we sought a network monitoring solution that could keep small problems from evolving into larger ones by showing us, in real time, exactly what was happening on our switches, servers and routers.
As a non-profit on a limited budget, we knew that any appropriate solution would have to leverage technology rather than headcount.
What's more, we needed to decide what type of network monitoring solution to adopt. System administrators in IT organizations of every kind debate the value of simple monitoring solutions (open source products, for example) versus feature-rich enterprise solutions that are costly to purchase and require labor-intensive implementation and maintenance. To achieve network stability at NPCA at a price our non-profit budget could bear, we knew we had to take a new approach, because neither the high-end system nor the inexpensive solution was going to meet our needs.
Before evaluating any new technology, we developed a set of best practices to help ensure that new investments in system and network monitoring would succeed. We believe these best practices can serve as a model for other non-profits and IT organizations seeking to manage increasingly complex IT environments.
At the core of these best practices are two themes: simplicity and cost-effectiveness. We believe that great systems should be easy to use and should justify their expense. NPCA's core best practices rest on five elements:
- Web-based: Monitoring software must let staff monitor network operations from any location, using any Web browser.
- Standards-based: The typical government or non-profit network environment is heterogeneous and requires a monitoring solution based on industry standards such as SNMP (Simple Network Management Protocol). Proprietary agent-based approaches greatly increase the integration effort and force IT to adapt to the tool rather than the other way around.
- Automation-driven: Automation is at the heart of cost savings. Software should automatically discover each new node, whether at the main location or at a satellite office, including all the equipment and software within it. It should also be able to begin monitoring multiple locations simultaneously, without requiring on-site IT involvement. Any network configuration change should be detected automatically, without human intervention.
- Business-process-based: Whether a data center is large or small, and whether it supports a for-profit or non-profit organization, its operations are business processes that can be defined, tracked and improved. A network monitoring solution should be able to translate performance data into solid business information, such as how long it takes to process a credit card transaction or how often a system was down in a given month. Reports containing little beyond IT-oriented information communicate nothing useful to business users. However, if a report shows, for example, that one hour of downtime for a check-out lane costs $3,200, we can determine the payback of specific IT expenditures and set acceptable service levels.
- Simple licensing: The complex, multi-layered license pricing schemes typical of today's enterprise-class solutions make it hard to calculate the business value of a network monitoring solution. We wanted to know a software solution's total cost of ownership up front, and an easy-to-understand licensing model helps. A vendor's willingness to structure pricing around your business model is a bonus. To control the cost of a monitoring implementation, IT needs visibility into five to 10 years of ongoing maintenance expenses. A corollary to this rule is that software with multiple modules and "optional" add-ons only increases cost and complexity.
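The business-process and licensing items both come down to simple arithmetic. The sketch below prices downtime at the illustrative $3,200 per hour from the check-out-lane example, estimates how many months a monitoring tool would take to pay for itself through avoided downtime, and projects total cost of ownership over a multi-year horizon. The helper names, the outage minutes, the license cost and the flat annual maintenance fee are hypothetical inputs, not NPCA or vendor figures.

```python
import math

# Illustrative figure from the text: one hour of check-out-lane downtime.
COST_PER_DOWNTIME_HOUR = 3200.0


def downtime_cost(downtime_minutes: float, cost_per_hour: float = COST_PER_DOWNTIME_HOUR) -> float:
    """Business cost of a given amount of downtime."""
    return downtime_minutes / 60.0 * cost_per_hour


def payback_months(
    upfront_cost: float,
    monthly_downtime_before: float,
    monthly_downtime_after: float,
    cost_per_hour: float = COST_PER_DOWNTIME_HOUR,
) -> float:
    """Months until the monthly downtime cost avoided repays the up-front spend."""
    monthly_savings = downtime_cost(monthly_downtime_before, cost_per_hour) - downtime_cost(
        monthly_downtime_after, cost_per_hour
    )
    if monthly_savings <= 0:
        # No downtime avoided: the spend never pays for itself on this basis alone.
        return math.inf
    return upfront_cost / monthly_savings


def total_cost_of_ownership(license_cost: float, annual_maintenance: float, years: int) -> float:
    """Up-front license plus an assumed flat yearly maintenance fee over the horizon."""
    return license_cost + annual_maintenance * years
```

With hypothetical inputs of 180 outage minutes a month before monitoring, 60 minutes after and a $20,000 license, `payback_months(20000, 180, 60)` evaluates to 3.125 months. A flat `annual_maintenance` term is itself an assumption; the model only stays this simple when the vendor's pricing is transparent enough to project over five to 10 years, which is the point of the licensing item.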
We evaluated several vendors against these best practices and found that CITTIO's WatchTower™ monitoring platform was the closest fit. Using WatchTower and our best-practice framework, we deployed enterprise-class system and network monitoring across 13 servers and network devices quickly and cost-effectively.
Once we gained the system control and network visibility our organization required, we were able to define and meet internal service level agreements (SLAs). We now produce reports showing that we are meeting those SLAs, and when a user reports a problem, we can quickly troubleshoot it and determine the appropriate course of action.
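An internal SLA report of this kind usually reduces to one calculation: downtime within a reporting period, compared against an availability target. The sketch below assumes outages are recorded as start and end timestamps for a single system; it merges overlapping outages so downtime is not double-counted and clips outages that straddle the period boundary. The record format, the `sla_report` helper and the 99.5 percent target are hypothetical, not NPCA's actual SLA terms or any product's report format.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one outage


def merge_outages(outages: List[Outage]) -> List[Outage]:
    """Merge overlapping outage windows so the same minutes are not counted twice."""
    merged: List[Outage] = []
    for start, end in sorted(outages):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def sla_report(
    outages: List[Outage],
    period_start: datetime,
    period_end: datetime,
    target: float = 0.995,  # hypothetical SLA target: 99.5% availability
) -> Dict[str, object]:
    """Summarize downtime and availability for one system over one period."""
    downtime = timedelta(0)
    for start, end in merge_outages(outages):
        # Clip each outage to the reporting period before counting it.
        start, end = max(start, period_start), min(end, period_end)
        if end > start:
            downtime += end - start
    availability = 1.0 - downtime / (period_end - period_start)
    return {
        "downtime_minutes": downtime.total_seconds() / 60.0,
        "availability": availability,
        "met": availability >= target,
    }
```

For a 30-day month with two overlapping outages (10:00 to 11:00 and 10:30 to 12:00 on the same day) plus a 30-minute outage ending at midnight on the last day, the report counts 150 minutes of downtime, about 99.65 percent availability, which meets a 99.5 percent target but not a 99.9 percent one.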
Through effective network monitoring, any organization can detect IT problems before they escalate out of control and affect end users, ultimately decreasing IT costs and giving users and customers better service.
About the author: Caterina Luppi is the IT Director for the National Parks Conservation Association, a well-respected Washington, D.C. non-profit organization dedicated to ensuring that America's national parks are protected in perpetuity. Before joining NPCA, Caterina worked as a consultant at the United Nations in Switzerland, where she was responsible for a network of more than 1,000 local users and 175 remote offices. Caterina can be reached at [email protected].