Perform Nagios configuration step by step
Nagios Core, formerly Nagios, is a popular open source infrastructure monitoring platform that provides reporting and alert functionality for Windows, UNIX and Linux systems, as well as switches, routers, applications and services.
This video tutorial takes Nagios configuration step by step, walking through a basic installation of Nagios Core on an Ubuntu 18.04 server. The video also explores additional plugins and tools to monitor remote servers and their information, and provides a high-level overview of how to monitor the Nagios Core server itself.
Learn how to install the Nagios Remote Plugin Executor (NRPE). The NRPE plugin enables IT admins to monitor remote Linux and Windows machines, and collect information on their system states. In this video, we install the NRPE plugin on both the Nagios Core server and the Ubuntu 18.04 client machine.
Once the Nagios Core installation and necessary dependencies are in place, review the basic configuration files that get modified on the Nagios Core server and the client Ubuntu server to monitor server uptime and availability. Then, walk through the monitoring configuration of basic system stats, such as user load, system load and disk usage.
Finally, this Nagios configuration step-by-step tutorial offers a few troubleshooting tips to manage the Nagios Core and NRPE configuration files, and explains how to validate that both Nagios Core and NRPE are running correctly on the system after any monitoring configuration changes.
This video should help IT admins get started with Nagios Core monitoring -- but it barely scratches the surface of everything that Nagios Core can provide. With the addition of alerts and other configurations throughout an organization's tool pipeline, IT teams can have comprehensive monitoring, alerting and reporting for their environments.
Today I'll be talking about how to install and configure Nagios monitoring. I will go over both the installation and configuration of a Nagios server, and then we'll also cover how to monitor another server with Nagios.
What I'll be using today is Ubuntu 18.04 for both endpoints. I'll go ahead and configure those. To start off with, I just have a quick Bash script that will run through and install dependencies for you, grab the Nagios packages from the public website, extract those, and build and install them from source. This will also grab the plugins, which give you different tools to monitor different parts of different configurations. And then, at the end of that, you will have a Nagios installation that's up and running. At that point, though, it will only be able to monitor itself.
We'll go ahead and run this, and it will come back. Now, at some point during the execution of that, you will get this prompt. What this prompt is asking for is the admin password you are going to set for Nagios. So, the username is "nagiosadmin" and the password will be whatever you set here.
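The script itself is not shown in the transcript, so the following is only a rough sketch of what such an installer might look like. The package list, the download URL pattern, the extracted directory name, the `DRY_RUN` switch and the function names are all assumptions made for illustration; the version number is deliberately left as an argument you supply.

```shell
#!/usr/bin/env bash
# Sketch of a Nagios Core source install on Ubuntu 18.04.
# Assumptions (not taken from the video): package list, URL pattern,
# extracted directory name, and the DRY_RUN switch.

# Print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Runs in a subshell so that "set -e" and "cd" do not leak into the caller.
install_nagios_core() (
    set -e
    version="${1:?usage: install_nagios_core VERSION}"
    url="https://assets.nagios.com/downloads/nagioscore/releases/nagios-${version}.tar.gz"

    # Build tools, Apache and PHP for the web front end.
    run sudo apt-get update
    run sudo apt-get install -y autoconf gcc libc6 make wget unzip \
        apache2 php libapache2-mod-php7.2 libgd-dev

    # Download, extract and build from source.
    run wget -O "/tmp/nagios-${version}.tar.gz" "$url"
    run tar -xzf "/tmp/nagios-${version}.tar.gz" -C /tmp
    run cd "/tmp/nagios-${version}"
    run ./configure --with-httpd-conf=/etc/apache2/sites-enabled
    run make all

    # Install the user/group, binaries, service unit, sample config and web config.
    run sudo make install-groups-users
    run sudo usermod -a -G nagios www-data
    run sudo make install
    run sudo make install-daemoninit
    run sudo make install-commandmode
    run sudo make install-config
    run sudo make install-webconf
    run sudo a2enmod rewrite cgi

    # This is the step that prompts for the nagiosadmin password.
    run sudo htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
    run sudo systemctl restart apache2
)
```

Calling `DRY_RUN=1 install_nagios_core <version>` prints the steps without executing them, which is a cheap way to review the sequence before running it for real. The plugins and NRPE would follow the same download, configure, make pattern.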
All right, that will take a few minutes, depending on what you're installing on. Once it finishes, you'll have this output, telling you the address of the Nagios web page: the IP of your server, followed by /nagios. We're not going to go there yet; as of now, Nagios is actually not up and running. That's just the Apache front end for this. We still have some things to configure from a Nagios perspective.
The first thing we want to do is change the Nagios config file. Page down a couple of times -- one time -- and we're going to come to this line. What this is saying is this is where Nagios is going to look for configuration files for the other servers it's going to manage. So, everything that we put in here, Nagios will pick up and start monitoring based upon that config file. So we'll go ahead and uncomment that line. Save and exit.
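As a sketch of that edit done non-interactively: the stock nagios.cfg ships with a commented-out `cfg_dir` line for a `servers` directory, and the function below uncomments it and creates the directory. The function name is hypothetical, and the paths in the comments assume a default source install under /usr/local/nagios.

```shell
# Uncomment the cfg_dir line for the given directory and create that directory.
# Assumed typical arguments:
#   cfg: /usr/local/nagios/etc/nagios.cfg
#   dir: /usr/local/nagios/etc/servers
enable_servers_dir() {
    local cfg="$1"
    local dir="$2"

    # Turn "#cfg_dir=<dir>" into "cfg_dir=<dir>"; other cfg_dir lines are left alone.
    sed -i "s|^#[[:space:]]*cfg_dir=${dir}\$|cfg_dir=${dir}|" "$cfg"

    # Nagios refuses to start if a configured cfg_dir does not exist.
    mkdir -p "$dir"

    # Show the resulting line so the change can be confirmed by eye.
    grep -n "^cfg_dir=${dir}\$" "$cfg"
}
```

On the Nagios server this would be run from a root shell, since the files under /usr/local/nagios are not writable by a regular user.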
Now, part of Nagios monitoring typically is the Nagios Remote Plugin Executor, which is used so that Nagios can talk to services on other machines, and so that those services can report back to Nagios. We've already installed that with the script we ran, but we do need to enable the Nagios system to leverage it. So we'll edit that config file and define a command so that Nagios can use NRPE. All that's doing is, as we start executing checks across the system, providing the back end that makes them work.
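The transcript does not show the command definition itself. A common form, given here as an assumption rather than the exact text used in the video, is a `check_nrpe` command appended to the objects/commands.cfg file. The function name is hypothetical; `$USER1$`, `$HOSTADDRESS$` and `$ARG1$` are standard Nagios macros, which is why the heredoc is quoted so the shell leaves them untouched.

```shell
# Append a check_nrpe command definition unless one already exists.
# Assumed typical argument: /usr/local/nagios/etc/objects/commands.cfg
add_check_nrpe_command() {
    local commands_cfg="$1"

    # Do nothing if check_nrpe is already defined, so re-runs are harmless.
    if grep -Eq '^[[:space:]]*command_name[[:space:]]+check_nrpe[[:space:]]*$' "$commands_cfg"; then
        return 0
    fi

    # $USER1$ is the plugin directory, $HOSTADDRESS$ the monitored host,
    # and $ARG1$ the name of the remote NRPE command to run.
    cat >> "$commands_cfg" <<'EOF'

define command {
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
EOF
}
```

With this in place, a service can reference a remote check as `check_nrpe!<remote command name>`.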
We want to validate that the config files are good and sane, and then restart Nagios and check its status. Everything looks good there. Every time you change its config files, you do want to make sure you restart the service and check the status of the service after the restart. If there are any issues in that config file, we'll [address them].
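The validate, restart, check-status routine can be wrapped in one small helper so that a broken config never gets as far as a restart. This is a minimal sketch: the binary and config paths assume a default source install, the service is assumed to be named `nagios`, and the three variables exist only so the paths can be overridden.

```shell
# Paths assume a default source install; override through the environment if needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

reload_nagios() {
    # "nagios -v" runs the pre-flight check and exits non-zero on config errors.
    if ! "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        echo "config check failed; not restarting" >&2
        return 1
    fi

    # Restart only after a clean check, then show the service status.
    "$SYSTEMCTL" restart nagios && "$SYSTEMCTL" --no-pager status nagios
}
```

Run `reload_nagios` from a root shell after each config change; the pre-flight output names the file and line of any definition it cannot parse.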
Now we want to go over to the machine that we want to monitor. Like I said, this is also an Ubuntu 18.04 machine. We need to install the NRPE plugin here so that the Nagios server can talk to it. We have a script set up for this as well. It's pretty basic: we're just going to grab some dependencies, grab the package from the source, install it, and away we go. If you're watching this in the future, you will want to make sure you check for the latest version; this is current as of now, but that may change. Okay, let that run.
Once that completes, which once again takes a couple of minutes, we have some config files we need to update. So we're going to first do the NRPE config file. Just a quick note: if you have multiple NICs on here and you need NRPE to respond only on a dedicated NIC, this is where you'll change that. With most default configs, you can just leave that alone. Down here on the allowed_hosts line, we're going to go ahead and add the IP address of the Nagios server. And if you had another one for backup or monitoring, you could add that as well.
A little bit farther down, you'll see the example commands in the config, so we're going to change the disk check from hda1 to sda1. Also note that there are some settings in here for procs, users and things like that. These are the command definitions NRPE will actually run when the plugins monitor the service. Essentially, each one is: command, the name of it, the path to the plugin that is going to do your health check, then, for most things, what your critical level is, and then any additional parameters you may have. And save that.
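Those two nrpe.cfg edits can also be scripted. The sketch below assumes the stock sample file, where `allowed_hosts` is a comma-separated list and the disk check is defined as `command[check_hda1]` against /dev/hda1; it also assumes that "hda1 to sda1" means renaming both the command and the device. The function name is hypothetical.

```shell
# Add the Nagios server to allowed_hosts and point the sample disk check at sda1.
# Assumed typical first argument: /usr/local/nagios/etc/nrpe.cfg
configure_nrpe() {
    local cfg="$1"
    local nagios_ip="$2"

    # Append the address only if it is not already one of the listed hosts.
    if ! grep '^allowed_hosts=' "$cfg" | tr '=,' '\n\n' | grep -qxF "$nagios_ip"; then
        sed -i "/^allowed_hosts=/ s|[[:space:]]*\$|,${nagios_ip}|" "$cfg"
    fi

    # Rename the sample check and its device: check_hda1 -> check_sda1.
    sed -i 's|^command\[check_hda1]=\(.*\)/dev/hda1|command[check_sda1]=\1/dev/sda1|' "$cfg"
}
```

For example, `configure_nrpe /usr/local/nagios/etc/nrpe.cfg 192.0.2.10` (a placeholder address), followed by a restart of the NRPE service. The device name should match what `df` reports on the client.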
All right, so we're going to go and restart the NRPE service, and then we also want to check its status. You should see something very similar to this: it's up, it's running, and it shows the port it's listening on and the hosts it can talk to. It's very important that every time you change the config files, you also restart the NRPE service and validate that it's running correctly. Skipping this is something that causes a lot of problems for people.
Now back on the Nagios server, we need a configuration file for the node that we're going to manage. Remember, this goes in the folder that we uncommented earlier. The file name doesn't necessarily have to be the name of that server, but it should be, to follow best practice. You'll have one of these for each thing you monitor.
Now, within this file, we'll start defining the services we're going to monitor for the VM. The first thing we'll put in here is the define host. What this will do is define just up and down on this VM. Put in the hostname, the IP address and a description that will show up in the portal and means something. Now, a couple of things: max_check_attempts [is] how many times we want Nagios to check something before it stops. Then there's the check period, 24x7; how often we are doing notifications; and then the notification period, 24x7. You could have your notification period and your check period be different; that would be something defined by maintenance windows or, you know, "I want this to monitor the service 24x7 so I have historical data, but only do notifications during specific windows." I will save this.
Now, if we left it here, all we would get is up and down on the server. We want to grab some additional information out of here, so we'll add some service definitions for monitoring average load, disk usage and user usage. Just like the up and down, this is: define the node hostname, define what plugin command you want to run, and then add a useful description so it shows up on the dashboard. Save and exit that. Once again, we want to restart Nagios.
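To make the shape of that per-host file concrete, here is a small generator that writes one host definition and three NRPE-backed services. Everything specific in it is an assumption for illustration: the function name, the numeric values for `max_check_attempts` and `notification_interval`, the service descriptions, and the remote command names `check_load`, `check_users` and `check_sda1`. The `linux-server` and `generic-service` templates and the `24x7` time period come from the sample config that ships with Nagios Core.

```shell
# Write <dir>/<host>.cfg with a host definition and three services.
# Assumed typical dir: /usr/local/nagios/etc/servers
write_host_config() {
    local host="$1" address="$2" dir="$3"
    local file="${dir}/${host}.cfg"
    local pair

    # Host definition: this alone gives up/down monitoring.
    cat > "$file" <<EOF
define host {
    use                     linux-server
    host_name               ${host}
    alias                   ${host} (Ubuntu 18.04)
    address                 ${address}
    max_check_attempts      5
    check_period            24x7
    notification_interval   30
    notification_period     24x7
}
EOF

    # One service per "description:remote NRPE command" pair.
    for pair in "Load average:check_load" "Current users:check_users" "Disk usage:check_sda1"; do
        cat >> "$file" <<EOF

define service {
    use                     generic-service
    host_name               ${host}
    service_description     ${pair%%:*}
    check_command           check_nrpe!${pair##*:}
}
EOF
    done
}
```

A call such as `write_host_config prod-vm 192.0.2.20 /usr/local/nagios/etc/servers` (placeholder name and address) produces one file per monitored machine. Each remote command name has to match a `command[...]` entry in that machine's nrpe.cfg.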
Now, this is why it's very important that when you change your config files, you check the status: there's something wrong with this config file, and we need to check and update it. So let's go look at that. Here's where we missed the name in the config file. Save that. Try it again, and here's our status. You'll see that Nagios restarted successfully. It's very important that as you're updating these config files, you check the status after you restart the service and make sure that everything's working correctly; otherwise, you will not get the results you expected.
Now after all that, we'll hop over to the admin portal for Nagios. Remember, the credentials are the nagiosadmin username and the password we set earlier in the script. It'll load. A couple of things in here: there's a lot of information on getting started, plus links to where you can find more plugins and things like that. Also, you'll see there are Nagios XI references. Nagios XI is the commercial paid product, while Nagios Core is the open source free product.
So let's go take a look at our hosts. You'll see a couple of things here. We have one dev VM that's currently off, so it's showing as down (host unreachable). We have the localhost entry for the Nagios server itself, and it has its checks. You can view the status details for this host, and it also gives us information on the different services within this specific host. Back in Hosts, we'll see our prod VM. Once again, it's the same kind of status. We can pull up the status and look at this. This is also one of those cases where you need to make sure that you're checking your ports and your firewalls; otherwise, you'll see something like this: connection refused. That's a very common thing as well. If you just go to the full Services view, you'll get the output for everything that's across your environment.
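A quick way to tell a firewall or port problem apart from a Nagios config problem is to probe the NRPE port directly from the Nagios server. NRPE listens on TCP 5666 by default; the helper below is a hypothetical sketch that relies on bash's /dev/tcp pseudo-device and the coreutils `timeout` command.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# The port defaults to 5666, NRPE's default listening port.
nrpe_port_open() {
    local host="$1"
    local port="${2:-5666}"

    # Host and port are passed as arguments to the inner shell, not
    # interpolated into its command string.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (placeholder address):
#   nrpe_port_open 192.0.2.20 && echo "NRPE port reachable" || echo "NRPE port blocked or NRPE not running"
```

If the port is closed, check that the NRPE service is running on the client and that its firewall allows the port; on a client that uses ufw, which is an assumption here, that would be `sudo ufw allow 5666/tcp`. Once the port is reachable, running the `check_nrpe` plugin with `-H <client address>` from the Nagios server should report the NRPE version.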
I hope this helped you with your configurations and services for monitoring with Nagios, along with a couple of gotchas people run into as they go through this. Good luck.