Remote possibilities: Out-of-band management admin options
COVID-19 puts added strain on administrators who must compensate for a lack of personnel in data centers, which makes a remote access strategy a necessity.
Among the duties of the administrator is the care and feeding of servers and other infrastructure. Even in times of crisis, the update process for hardware firmware, applications the organization relies on and the Windows operating system needs to continue.
For most admins, this work isn't that hard. The update process, once fraught with anxiety when the blue screen of death was a regular occurrence, no longer induces panic. The deployment of VMware and Windows updates and firmware patches is relatively painless, but every so often an update still goes wrong.
One problem that can occur is a server that gets stuck during an update. Ordinarily, a simple hard reboot is the fix. Until the coronavirus pandemic, this was not much of a concern because the data center was nearby, and a button push would return things to normal. Even if your data center was remote, you had remote console access set up and a remote staff to assist in these situations.
Circumstances have changed dramatically over the last several months. Most people have moved offsite, and many organizations have lost resources they once took for granted, such as staff who can physically access the hardware.
Administrators whose hardware had Hewlett Packard Enterprise's Integrated Lights-Out (iLO), Dell's Integrated Dell Remote Access Controller (iDRAC) or a similar out-of-band management system -- sometimes called lights-out management -- had an extra layer of capability to weather this change: Remote access to a hung server worked the same as before. While not ideal, the impact was low.
Not all remote access technologies are the same
For administrators without this functionality, the regular task of deploying updates and rebooting to complete the patch cycle became less routine. For many organizations with data centers, remote access to onsite hardware may have been on the list of things to accomplish, but typically near the bottom. The coronavirus pandemic quickly moved remote capability to the top of the priority list.
It requires some homework to understand what exactly these functions are because each vendor calls this out-of-band management capability by a different acronym -- IBM calls its technology IMM or Integrated Management Module, while Oracle has ILOM or Integrated Lights Out Manager -- but they offer many of the same features.
These remote management systems work independently of the operating system and some use hardware to ensure this functionality will operate in extreme circumstances. Vendors design this subset of internal systems to stay up and running even if the server becomes unresponsive due to a software issue or a hardware conflict that causes a crash. These microsystems provide remote access to the console, cycle the power systems and give the ability to troubleshoot booting issues that occur before you might have access to traditional remote access tools inside the OS.
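As an illustration of what "cycle the power" looks like in practice, many current controllers, including recent iLO and iDRAC generations, expose the DMTF Redfish REST API alongside their web consoles. The sketch below only builds a Redfish reset request and does not send it. The host name, credentials and system ID are placeholder assumptions; the system ID in particular differs by vendor and should be read from the controller's `/redfish/v1/Systems` collection. The helper name `build_reset_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_reset_request(bmc_host, system_id, username, password,
                        reset_type="ForceRestart"):
    """Build (but do not send) a Redfish ComputerSystem.Reset request.

    bmc_host and system_id are placeholders for this sketch; look up the
    real system ID on the controller before using anything like this.
    """
    url = (f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    # Standard Redfish reset body; "ForceRestart" is the hard-reboot option.
    body = json.dumps({"ResetType": reset_type}).encode("utf-8")
    # HTTP Basic authentication, used here only to keep the sketch short.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token.decode("ascii"),
        },
    )


# Example with made-up values; nothing is sent over the network here.
req = build_reset_request("bmc.example.internal", "1", "admin", "changeme")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(req, timeout=30)` with an appropriate TLS context. Because the controller runs independently of the OS, a request like this can succeed even when the server itself is hung.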
These systems are not recommended as a substitute for traditional remote management tools. The performance won't be ideal because they are tuned for stability in an emergency, not speed. Like most things in IT, remote management systems are not free. While most server vendors include some level of remote management, the real question is which features are not included.
For example, all levels of HPE iLO systems get virtual power buttons, but the remote console that gives pre-OS access is not available without purchasing an advanced license. While each vendor has a slightly different license model, most follow the same format of including the most basic functions with the cost of the server and then selling an additional license for the more advanced out-of-band management features.
When integrated out-of-band management isn't an option
While these internal server systems work, they are not your only option. Administrators looking to compensate for the loss of a remote IT staff can add other remote server management hardware such as KVM over IP switches with power control. These systems support a wide range of different vendors and configurations.
The setup is simple: The KVM unit offers remote access via IP and the servers plug into smart outlets. The servers must be configured to turn on after power loss; otherwise, the machine will not start until someone physically presses the power button. The drawback with this configuration is that it requires physical installation, unlike many of the server-based remote management systems that can be unlocked with a software license.
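The power-on-after-power-loss requirement is easy to overlook, so it helps to treat it as a guard in any power-cycle routine. The following is a minimal sketch of that logic, not a driver for any real product: `FakeOutlet`, `power_cycle` and the `restores_on_ac` flag are hypothetical names, and a real smart outlet would be controlled through its vendor's own interface.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeOutlet:
    """Stand-in for a switched outlet; real units use vendor-specific APIs."""
    is_on: bool = True
    log: list = field(default_factory=list)

    def set_power(self, on: bool) -> None:
        self.is_on = on
        self.log.append("on" if on else "off")


def power_cycle(outlet, restores_on_ac, off_seconds=10.0, sleep=time.sleep):
    """Cut and restore power, but only if the server will boot on its own.

    restores_on_ac: True when the server's firmware is set to power on
    after AC loss. If it is not, cutting power would leave the machine
    off until someone presses the button, so the routine refuses.
    """
    if not restores_on_ac:
        return False
    outlet.set_power(False)
    sleep(off_seconds)  # give the power supply time to fully discharge
    outlet.set_power(True)
    return True


# Example run with the delay stubbed out so it finishes instantly.
outlet = FakeOutlet()
ok = power_cycle(outlet, restores_on_ac=True, sleep=lambda s: None)
print(ok, outlet.log)
```

The default 10-second off period is an arbitrary value chosen for the sketch. The KVM over IP session remains the way to confirm that the machine actually comes back up.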
However, the benefit is this setup supports servers and appliances with no built-in out-of-band management system. This remote management equipment can work with more than servers, such as switches and storage units. The flexibility is only limited by what might need a keyboard and mouse and power. No one wants to power-cycle any part of the infrastructure when it hangs in the middle of an update, but when the hardware is not responsive there is little choice but to restart the machine.
Plans for better remote administration come to the fore
Issues with updates, while on the decline, still happen, and the importance and impact of the affected systems have grown. A few years ago, if a single server didn't restart properly after a patch, then you could wait until morning to tackle the problem. But now problems with a server could be magnified if that machine hosts dozens of virtual machines, meaning any downtime is a crisis.
The need for reliable remote access is not new or driven by a single event. It's something many in IT have on their to-do list, but circumstances have changed to make this functionality almost a necessity. In the absence of hands-on help, many administrators have been looking for a more reliable way to manage Windows machines and the infrastructure in the data center. Now is the time for IT to take a closer look at the way things are managed and seek out ways to improve remote access and management to avoid problems when maintenance and updates get deployed.