Automate PowerShell scripts for self-healing IT infrastructure

Brien Posey

IT teams lose countless hours performing repetitive system maintenance tasks, while PowerShell's powerful automation capabilities remain largely untapped. This enterprise-ready tool, already available in Windows environments, can transform labor-intensive routines into efficient, self-managing processes that work around the clock.

Deploy self-healing IT with PowerShell

PowerShell's scripting features enable the creation of self-healing systems that actively maintain themselves, dramatically reducing manual intervention. When IT teams automate PowerShell scripts effectively, organizations establish a proactive maintenance framework with four key capabilities:

Self-monitoring. Continuously watch for system anomalies and performance issues.
Self-diagnosing. Identify the root cause of a technical problem.
Self-repairing. Automatically implement pretested solutions.
Self-documenting. Log all automated actions and their outcomes for review.

Real-world applications

PowerShell's deep Windows integration and extensive toolset make it ideal for automating diverse tasks, such as the following:

Storage optimization through automated disk cleanup protocols.
Self-healing network connectivity management.
Automated system health assessments and reporting.
Service monitoring with automatic recovery.

Tutorial overview

Expert Brien Posey presents the fundamentals of how to automate PowerShell scripts to perform IT maintenance, covering the following:

Practical example. Build a script that auto-restarts failed Windows services, examining each line of code as you go.
PowerShell scripting concepts. Learn core concepts for automating system administration tasks, including advanced PowerShell cmdlets, function creation, parameter handling, error management with Try/Catch blocks and loop implementation.
Scripting best practices. The tutorial explains methods for developing safe PowerShell scripts, including implementing error handling protocols, logs for tracking scripts, flow control strategies and testing and verification procedures.

Benefits and outcomes

Organizations that implement these automation techniques with PowerShell benefit in the following ways:

Reduced need for manual system oversight.
Enhanced system reliability and uptime.
Improved resource allocation.
Reclaimed time spent on routine tasks.

While PowerShell automation enhances efficiency, its true value is in augmenting rather than replacing human expertise. Using PowerShell scripts to automate repetitive tasks, IT teams can redirect their focus to strategic initiatives that drive innovation and business growth. This symbiotic relationship between self-healing systems and skilled IT professionals creates a more dynamic, forward-thinking technology environment.

Brien Posey is a former 22-time Microsoft MVP and Commercial Astronaut Candidate. During his 30+ year IT career, Posey has served as the CIO for a national chain of hospitals and healthcare facilities and as the lead network engineer for the U.S. Department of Defense at Fort Knox. He has also worked as a network administrator for some of the largest insurance companies in America.

Transcript - Automate PowerShell scripts for self-healing IT infrastructure

Editor's note: The following transcript has been lightly edited for clarity.

Brien Posey: Hello, greetings, and welcome. I'm Brien Posey. And in this video, I want to talk a little bit about using PowerShell to achieve self-healing IT. In other words, you can build a PowerShell script that can monitor your environment, detect problems and take automatic corrective action without you having to manually intervene. And there are a million different ways that you could potentially do this.

Examples of self-healing IT

One common example is detecting a Windows service that has stopped and automatically restarting that service. Now, you can do this within the Service Control Manager, but you can also do it through PowerShell, and I'll be showing you exactly how to do that. Some other examples might include monitoring disk space and clearing out temp files when certain thresholds are reached. You might also consider building a PowerShell script that resets a network adapter when connectivity is lost. Or you could even use PowerShell to schedule recurring health checks for your IT resources.
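As a rough illustration of the disk space example, the sketch below separates the threshold decision from the cleanup action. The function name Test-LowDiskSpace, the 10% default threshold and the drive letter are assumptions made for this example; they do not come from the video.

```powershell
# Hypothetical helper: returns $true when free space drops below a percentage threshold.
function Test-LowDiskSpace {
    param(
        [Parameter(Mandatory)] [double]$FreeBytes,
        [Parameter(Mandatory)] [double]$TotalBytes,
        [double]$ThresholdPercent = 10   # assumed default, adjust as needed
    )
    # Guard against a zero or negative total to avoid dividing by zero.
    if ($TotalBytes -le 0) { return $false }
    $freePercent = ($FreeBytes / $TotalBytes) * 100
    return ($freePercent -lt $ThresholdPercent)
}

# Possible use on a Windows machine (drive letter is an assumption):
# $drive = Get-PSDrive -Name C
# if (Test-LowDiskSpace -FreeBytes $drive.Free -TotalBytes ($drive.Free + $drive.Used)) {
#     # -WhatIf previews the deletions without performing them.
#     Get-ChildItem -Path $env:TEMP -Recurse -File | Remove-Item -WhatIf
# }
```

Keeping -WhatIf in place until the output looks right is one way to follow the safe-automation advice that comes up later in the video.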

Scripting considerations

Now, when it comes to the actual scripting process, there are some things to keep in mind.

One of the big things to keep in mind is that you want to use safe automation. In other words, you want to write your script in a way that it's not going to introduce any new problems into your environment. You want to perform checks so that nothing unexpected happens. And I'll show you some examples for that.

Another thing that I recommend doing is making your script modular so that you can reuse it for various purposes.

It's also a good idea, if you're using a PowerShell script for self-healing IT, to create a log so that you can take a look at the historical data associated with your script and find out exactly what your script has been doing and when.

It's also important to consider how you're going to run your script. In other words, are you going to run it in the background? Are you going to schedule the script to run using Windows Task Scheduler? And incidentally, if you do choose to run the script in the background, one way that you can do that is by using the Start-Job cmdlet. That tells PowerShell that you want the script to run in the background, and it will just do its thing without you having to worry about it.
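For readers who have not used Start-Job before, here is a minimal sketch of the pattern. The script block is a placeholder that stands in for real monitoring work. To launch a script file instead, Start-Job also accepts a -FilePath parameter, for example a path such as C:\Scripts\SpoolerMonitor.ps1 (the folder and extension are assumed here).

```powershell
# Start a background job. The short sleep stands in for real monitoring work.
$job = Start-Job -ScriptBlock {
    Start-Sleep -Seconds 1
    'Background work finished'
}

# The console remains usable while the job runs.
# When you want the results, wait for the job and collect its output.
$result = $job | Wait-Job | Receive-Job

# Remove the finished job from the session's job list.
Remove-Job -Job $job

$result
```

A job started this way belongs to the PowerShell session that created it, so it ends when that session closes. For monitoring that must survive logoff or reboot, a scheduled task is usually the better fit.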

Let's take a look at what a self-healing script might look like.

Example script structure and initial setup

If you look at the screen, you can see that I've created a script for monitoring and restarting a failed service. And the service that I'm going to be using in this particular example is the Spooler service.

The first thing that we have is a comment that just tells what the script does. And then I've got a CLS command, and CLS, it's just the old DOS clear-screen command. And the only reason why this command is in here is just because, when I was debugging the script, things got a little bit messy, and it was just a little bit easier to clear the screen each time that I ran the script, so that I could see exactly what was going on.

Now, the next thing that I've done is define the parameters that are used by the script, and in this particular case, I've got three parameters. I've got a $LogFile, and that $LogFile points to a file and path -- in this case: C:\Scripts\ServicesLog.txt. So, that's going to be the path and file name that's going to be used by my log file.

The next parameter that I'm defining is the $ServiceName. In other words, which service are we monitoring to make sure that it continues to run? Now, in this case, the service that I'm going to be monitoring is the Windows Spooler service. That service is called Spooler. Now, you'll notice right here that it says 'SpoolerX,' not 'Spooler.' So, what I've done is I've introduced an intentional misspelling, and the reason why I did that is because I mentioned earlier that it's important for your script to perform various checks just to make sure that everything is working properly and that you're not doing anything unexpected. The misspelling is there just to help me demonstrate one of the ways in which your script might catch a problem.

And then, finally, I have a $TestInterval parameter. So, we have to make a decision about how often we actually want the script to check to make sure that the required service is running. So, we could do it several times a second. We could do it a couple of times an hour. We have options. In this case, I'm going to be checking every 30 seconds. And 30 seconds was just an arbitrary value that I pulled out of thin air. We could set this number higher or lower to meet our needs.
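Based on the narration, the three settings might be declared along these lines. This is a reconstruction, not the on-screen script. It uses plain variable assignments, which is an assumption, because the narration places the clear-screen command ahead of the declarations.

```powershell
# Settings as described in the narration. Adjust them for your environment.
$LogFile      = 'C:\Scripts\ServicesLog.txt'   # path and file name of the log
$ServiceName  = 'SpoolerX'                     # intentionally misspelled to exercise the error check
$TestInterval = 30                             # seconds between service checks
```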

Now that I've declared some parameters, the next thing that we've got is we've got a section to find out if our log file exists. And the reason why I'm doing this is because if the log file doesn't exist and we don't provide instructions to create an empty log file, well, then we're going to get an error when we try to write data to a nonexistent log file.

On the other hand, if the log file does exist, and we attempt to create a new log file, then we're going to overwrite the existing log file, and that's bad, too. So, we're handling the check for the log file in a very graceful way.

So, what I'm doing here is I'm using Test-Path. And Test-Path is a native PowerShell cmdlet to find out whether or not a particular file or folder exists in a given location. So, here, we have Test-Path, and then I'm specifying the path, and the path is $LogFile. Now, remember, we declared $LogFile right here, and it points to C:\Scripts\ServicesLog.txt.

This line of code is checking to find out whether C:\Scripts\ServicesLog.txt exists. Then, we have -not. So, we're saying, 'If this file doesn't exist, then we're going to perform the instructions that are between these brackets, right here.' So, what are those instructions? Well, we're going to be creating an empty log file if no log file exists. And the way that we're doing that is by using the New-Item cmdlet. Again, this is a native PowerShell cmdlet.

And then, we're setting the path equal to $LogFile -- in other words, C:\Scripts\ServicesLog.txt. Then, we've got -ItemType File, because we're creating a file instead of a folder. And then, -Force, because sometimes you have to tell PowerShell, 'Yes, I want to create this, and don't worry about any conditions that might exist.' And then, Out-Null just prevents any output from being displayed on the screen.

And then, I've got the Write-Host command. That's going to display a message on the screen saying, 'New Log File Created at:' and then, it's going to provide the path and file name of that log file. Now, if, on the other hand, the log file does exist, well, then we don't have to do anything. We're just going to display a message that says, 'Logging Data Is Available at:' and then, it provides the name and location of the log file.

Now, this raises an important point. Earlier, I mentioned creating a script that runs unattended as a background job. And if you're running a script as a background job, there's really no reason to be displaying information on the screen, because nobody's going to be around to read it. The script is going to run in the background, so all of the data should be written to the log file. In this case, though, I'm going to be demonstrating the script, and it's just a little bit easier to see what's going on if we display the information on the screen as well as writing it to the log file. So, in the real world, you're probably not going to have to worry about displaying information on the screen. I'm just doing that here as a convenience feature.
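The log file check described above can be sketched as follows. So that the example runs on any machine, it points $LogFile at the temp folder, which is an assumption; the video uses C:\Scripts\ServicesLog.txt.

```powershell
# Assumed location for this sketch. The video uses C:\Scripts\ServicesLog.txt.
$LogFile = Join-Path ([System.IO.Path]::GetTempPath()) 'ServicesLog.txt'

if (-not (Test-Path -Path $LogFile)) {
    # Create an empty log file. Out-Null hides the file object New-Item would otherwise print.
    New-Item -Path $LogFile -ItemType File -Force | Out-Null
    Write-Host "New Log File Created at: $LogFile"
}
else {
    # The file already exists, so leave it alone and report where it is.
    Write-Host "Logging Data Is Available at: $LogFile"
}
```

The Test-Path guard matters because New-Item with -Force replaces an existing file, which would wipe out the log history.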

Logging function implementation

So, let's scroll down a little bit. Here we have a function that I've created, and that function is designed to write data to the log file.

How does this function work? Well, like many functions, we have a parameter that we're creating, and the parameter is the data that's being passed to the function. And in this case, the parameter is just going to be a $Message. So, we're passing some kind of message -- it doesn't really matter what -- to the function.

Once the function starts running, the first thing that it's going to do is it's going to retrieve the current date and time, and it's going to write that date and time to a variable called $TimeStamp.

Then, the function creates a variable called $LogEntry, which will contain the entry that will be written to the log. That $LogEntry will consist of the $TimeStamp and the $Message that we passed to the function.

From here, we're going to do two things. We're going to use the Write-Host command to display the $LogEntry on the screen, and then we're going to write the $LogEntry to the log. Now, how are we going to do that? Well, we're taking the $LogEntry variable -- and remember that it contains our timestamp and our message -- and then we're using the pipe symbol to direct the output from that command out to a file. And we're specifying -FilePath as $LogFile. And then, we've got the -Append switch.

Now, this -Append switch, which you see right here, is very important, because if we don't include the switch, then the $LogFile will be overwritten, and we don't want that. We want new entries to be appended to the end of the file, rather than overriding everything that already exists. So, then, we come down to our main script.
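A logging function matching this description might look like the sketch below. The timestamp format, the separator between timestamp and message, and the temp-folder log path are assumptions; the narration does not spell them out.

```powershell
# Assumed location for this sketch. The video uses C:\Scripts\ServicesLog.txt.
$LogFile = Join-Path ([System.IO.Path]::GetTempPath()) 'ServicesLog.txt'

function Write-Log {
    param([string]$Message)

    # Capture the current date and time (the format string is an assumption).
    $TimeStamp = Get-Date -Format 'yyyy-MM-dd HH:mm:ss'

    # Combine the timestamp and the message into one log entry.
    $LogEntry = "$TimeStamp - $Message"

    # Show the entry on screen, then append it to the log file.
    Write-Host $LogEntry
    $LogEntry | Out-File -FilePath $LogFile -Append
}

Write-Log -Message 'Test entry'
```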

Service validation

The first thing that we're doing in our main script is to find out whether or not our service name is valid. And remember, earlier, I said it's really important to use conditional logic and to check to make sure that errors aren't being introduced.

Here, I have a block called Try and a block called Catch. Try and Catch in PowerShell is a way of handling errors. The Try section is the code that you want to run, and, normally, that code should run unimpeded. But, if an error occurs, then the rest of the instructions in the Try section are skipped, and the instructions that are in the Catch section execute instead.

So, let's see what we've got here. The Try section includes Get-Service $ServiceName. And, remember, we declared the $ServiceName way up here. And in this case, the $ServiceName is 'SpoolerX.' So, we're running Get-Service, followed by the $ServiceName, and then -ErrorAction Stop. Now, the -ErrorAction is important because without -ErrorAction Stop, the error that Get-Service raises for an invalid name is non-terminating, and the Catch will never engage. After that, we're using the Write-Host statement to display the words, 'Monitoring Service,' followed by the $ServiceName. So, if these two lines can execute, then everything's good.

Now, if we get an error because the service name was invalid, then the Catch section is going to engage, and we're going to see a message displayed on the screen saying that the service name is invalid.

And then, we run the Exit command, and the Exit command causes the script to terminate. So, the script will stop running at that point.
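The validation step can be sketched as shown here. Unlike the script in the video, this version wraps the check in a hypothetical function named Test-ServiceName, and it leaves Exit as a comment so that the snippet can be pasted into a console without closing it.

```powershell
# Hypothetical wrapper around the Try/Catch check described above.
function Test-ServiceName {
    param([string]$Name)
    try {
        # -ErrorAction Stop makes a "service not found" error terminating,
        # so the catch block can handle it.
        Get-Service -Name $Name -ErrorAction Stop | Out-Null
        return $true
    }
    catch {
        return $false
    }
}

$ServiceName = 'SpoolerX'   # intentionally misspelled, as in the demonstration

if (Test-ServiceName -Name $ServiceName) {
    Write-Host "Monitoring Service $ServiceName"
}
else {
    Write-Host "The service name $ServiceName is invalid."
    # In the full script, Exit would go here to stop the script.
}
```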

Main monitoring loop

Let's take a look at the portion of the code that actually performs the monitoring. The first thing that we have within our monitoring loop is While ($True). And I'm not actually looking for a condition here. $True is always going to be True, and that means that this loop is going to run indefinitely. As long as True is equal to True, which it always will be, then everything between the following two brackets is going to run in a continuous loop.

What are we doing in those brackets? Well, the first thing that we're doing is we're creating a variable called $Service, and we're setting that equal to Get-Service $ServiceName. Remember, $ServiceName was declared earlier in the script, and it contains the name of our service -- in this case, the Spooler service. Then, we're checking to see if the $Service.Status is not equal (that's -ne, not equal) to 'Running.' We're checking to find out if that service has stopped. If it has stopped, then we're going to run everything between these brackets, right here.

So, what are we doing? Well, what we're doing is we're calling the Write-Log function, and we're passing a message saying, Service '$ServiceName' -- so, the name of the service -- 'is' and then whatever the status is. It'll probably be stopped. And then, the message goes on to say, 'Attempting to restart… .'

Now that we've logged that message, the next thing that we need to do is try to restart the service. But, remember, we want to write the script in a way that allows it to check for errors that may occur. So, rather than just trying to start the service without having any kind of error trapping, what I'm going to do is use Try and Catch once again.

Here, we have our Try statement, and the first thing that we're doing within the Try statement is we're trying to start the service. And the name is set to $ServiceName, and the -ErrorAction is Stop. We're trying to start the service, and if this command succeeds, then we go on to call the Write-Log function, and we're passing a message saying: "Service '$ServiceName' was restarted." In this case, we would get a message saying Service 'Spooler' was restarted.

Now, if we get an error message when we try to start the service, then what's going to happen instead is the Catch will engage, and the Write-Log function will be called. And the message that's going to be passed to that function is going to say, 'Failed to restart the spooler service.' All of that is going to happen if the service is not running, because, remember, we're checking -- if the service status is not equal to running, then do everything between this block. Now, if the service status is running, then we're going to execute this Else statement. And what we're going to do is we are going to call our Write-Log function, and we're going to pass a message saying: "Service '$ServiceName' is running." So, the message will say something like: Service 'Spooler' is running.

Then, the last thing that we're doing is calling Start-Sleep. Now, Start-Sleep is a native PowerShell cmdlet, and it inserts a time delay. And the time delay is going to be measured in seconds, and we're using the $TestInterval variable. And the $TestInterval variable was declared at the very beginning, and, as you'll recall, I set it to 30 seconds.

In other words, what's going to happen is a 30-second time delay, and then this loop executes all over again. So, the loop is going to execute once every 30 seconds.
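Putting the pieces of the loop together, one pass of the check might look like the following sketch. It is a reconstruction from the narration rather than the on-screen code. The function name Invoke-ServiceCheck is hypothetical, the Write-Log shown here is a simplified screen-only stand-in, and the endless loop is left as a comment so that the snippet does not block the console.

```powershell
# Simplified stand-in for the logging function (writes to the screen only).
function Write-Log {
    param([string]$Message)
    Write-Host "$(Get-Date -Format 'yyyy-MM-dd HH:mm:ss') - $Message"
}

# Hypothetical function that performs one pass of the monitoring loop.
function Invoke-ServiceCheck {
    param([string]$ServiceName)

    $Service = Get-Service -Name $ServiceName

    if ($Service.Status -ne 'Running') {
        Write-Log "Service '$ServiceName' is $($Service.Status). Attempting to restart..."
        try {
            # -ErrorAction Stop lets the catch block handle a failed start.
            Start-Service -Name $ServiceName -ErrorAction Stop
            Write-Log "Service '$ServiceName' was restarted."
        }
        catch {
            Write-Log "Failed to restart the $ServiceName service."
        }
    }
    else {
        Write-Log "Service '$ServiceName' is running."
    }
}

# The continuous loop from the video, run from an elevated session on Windows:
# while ($true) {
#     Invoke-ServiceCheck -ServiceName 'Spooler'
#     Start-Sleep -Seconds 30
# }
```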

Let's go ahead and take a look at how this works.

Script execution

I'm going to begin by opening up File Explorer. And, here, you can see my SpoolerMonitor script. And what I'm going to do is I'm just going to erase the log file.

So, the log file is gone. We don't have a log file. When I run the script, it should detect the absence of the log file and automatically create one for us. Let me go ahead and minimize File Explorer, and let me pull up PowerShell. And I'll switch over to the appropriate folder. And we'll go ahead and run the script.

When I run the script, the Clear Screen command kicks in. We have a message saying that a new log file was created at C:\Scripts\ServicesLog.txt. And we can actually confirm that. I'll switch over to File Explorer. And here's our new ServicesLog.txt file. We'll look at that file a little bit later on.

Then, we see an entry from today's date. And it says, 'The service name SpoolerX is invalid.' And if you think back to the beginning of this video, you'll recall that I intentionally misspelled the name of the service just so that I could test my ability to catch errors like this one. Since the script was able to catch this error, let me go ahead and minimize PowerShell. Let's get rid of the misspelling, and I'll save my changes. And I'll switch back over to PowerShell.

Incidentally, before I rerun the script, I do want to quickly mention that I'm running an elevated PowerShell session. You'll notice the word 'Administrator,' right here. And the reason why I'm doing that is because we are manipulating Windows Services, and we can't do that under normal permissions. We have to do this from an elevated session.

With that said, let's go ahead and run the script again. And this time, when I run the script, I get a message saying, 'Logging Data is Available at: C:\Scripts\ServicesLog.txt.' This time, we didn't get that message saying the log file didn't exist because when I ran the script previously, it created the log file. Next, I see the current status of the service. So, you can see the Spooler is running. Then, we have a message saying, 'Monitoring Service Spooler.' And then, we have a couple of log entries that confirm the Spooler is running, and these entries occur every 30 seconds. If we wait a few more seconds, we should see another instance of this appear on screen.

Now we see confirmation that the Spooler is still running. Let's go ahead and shut it down. To do that, I will open the Service Control Manager. And you can access the Service Control Manager by typing 'services.msc' at the Windows Run prompt, and I'm going to scroll down. These are all the services on the PC. Here is our Print Spooler, and you can see that it is indeed running. So, I'm going to stop the Print Spooler. I'm going to right-click on the Print Spooler service, and I'll choose the Stop option. Now the Print Spooler is stopped.

Let me go ahead and minimize this. And let's watch what happens. The script detects that the Spooler has stopped and attempts a restart. And we have a message saying that the Spooler was restarted. If we wait until the next round of logging data, we will get confirmation that the Spooler is running. Here, you can see that the Spooler is indeed running.

Let's have a look at the Service Control Manager just to make sure that what PowerShell is telling us is accurate. I'm going to open up the Service Control Manager once again. If we click on the Refresh icon, we can now see that the Print Spooler is indeed running. So, let me go ahead and close this out.

And the last thing I want to do is to take a look at the log file that was created. I'm going to press Ctrl + C to terminate the script. And let me go ahead and close out of PowerShell. And let's open up File Explorer and take a look at the log. Here you can see the log that was created. And the log contains exactly the same data as what was shown on screen.

As you can see, we can use PowerShell to monitor a system service and automatically take action if that service stops. So, hopefully you'll find this technique helpful. I'm Brien Posey. Thanks again for watching.
