Server failure, Linux comprise 2020 data center management tips

When you work in IT, you should consistently try to expand your knowledge base. This tip roundup explores recent content about Linux, IT budgeting and server troubleshooting.

Jessica Lulka, Site Editor

Published: 23 Dec 2020

Data center management requires you to do or learn something new on a regular basis. That can make your role interesting or stressful, depending on the resources you have for exploring new topics.

Here's a look at some of the most popular SearchDataCenter tips from 2020 that can help you build your knowledge and explore new areas of interest.

Analyze the top causes of server failure

You can't run data center infrastructure without servers. That makes it essential to try to predict any problems or events that could cause costly downtime, especially for mission-critical hardware.

The top causes of server failure include power outages, dust obstruction and poor temperature regulation, outdated firmware, hardware configuration issues and cyberattacks.

Because these causes vary so widely, you need several safeguards:
- effective backup power hardware;
- regular physical maintenance, with temperatures set according to ASHRAE guidelines;
- a regular software and firmware update policy;
- routine checks of cabling setups;
- adherence to security protocols.
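
The ASHRAE temperature guidance can be checked in code. Below is a minimal Python sketch that flags rack inlet temperatures outside a recommended range. It assumes the commonly cited ASHRAE recommended envelope of 18 to 27 degrees Celsius. The sensor names, the readings and the `out_of_range` helper are hypothetical. Confirm the limits for your equipment class before relying on them.

```python
# Assumption: ASHRAE's commonly cited recommended inlet envelope is
# 18-27 degrees C. Verify the class and limits that apply to your hardware.
RECOMMENDED_MIN_C = 18.0
RECOMMENDED_MAX_C = 27.0


def out_of_range(readings: dict[str, float],
                 low: float = RECOMMENDED_MIN_C,
                 high: float = RECOMMENDED_MAX_C) -> dict[str, float]:
    """Return only the sensors whose inlet temperature falls outside [low, high]."""
    return {sensor: temp for sensor, temp in readings.items()
            if temp < low or temp > high}


if __name__ == "__main__":
    # Hypothetical rack inlet readings in degrees C.
    readings = {"rack-a1": 22.5, "rack-a2": 28.3, "rack-b1": 16.9}
    for sensor, temp in sorted(out_of_range(readings).items()):
        print(f"{sensor}: {temp:.1f} C is outside the recommended range")
```

In practice, the readings would come from your monitoring system rather than a hard-coded dictionary.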

For big-picture planning, your team should make sure any backup power systems are in working order. It should also have a disaster recovery plan in case all your data centers go offline unexpectedly.

Learn about Linux

Linux is an industry standard for server and data center management, but questions remain about what exactly it is and what it does in the data center. At the most basic level, Linux is an OS that you can run on servers across your infrastructure.

Linux stands apart because it is an open source OS, which shapes how it is licensed. The GNU General Public License sets the terms under which you can use, modify and distribute the OS. The goal of this license model is to keep the OS open source and free for everyone to use.

The other main distinction between Linux and macOS or Windows is the kernel. The open source community continuously maintains and updates the Linux kernel, which makes it a strong choice for server infrastructure that needs ongoing upkeep. The kernel contains subsystems for memory management, process management, the network stack, the virtual file system and the system call interface, as well as architecture-specific code and device drivers.

Set your IT refresh strategy

Depending on the size of your organization, you may be involved in IT budgeting or purchasing strategy. When upper management considers new technology, it's important to have clear business reasons why your organization should refresh its infrastructure on a semiregular basis.

The biggest reason to upgrade is that server hardware, the essential piece of any data center, becomes less reliable with age. IDC research found that server performance erodes by an average of 14% per year.
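
Suppose the 14% loss compounds year over year. That is an interpretation chosen for illustration, not something the IDC figure itself specifies. Under that assumption, the fraction of original performance left after n years is 0.86^n. This short Python sketch prints that curve using a hypothetical `remaining_performance` helper:

```python
def remaining_performance(annual_loss: float, years: int) -> float:
    """Fraction of original performance left after `years`, assuming the
    loss compounds annually at a constant rate (an illustrative assumption)."""
    if not 0.0 <= annual_loss < 1.0:
        raise ValueError("annual_loss must be in [0, 1)")
    return (1.0 - annual_loss) ** years


if __name__ == "__main__":
    for years in range(1, 6):
        share = remaining_performance(0.14, years)
        print(f"year {years}: {share:.0%} of original performance")
```

With these assumptions, a server would keep roughly 64% of its original performance after three years and about 47% after five. That gives budget holders a concrete number when you make the case for a refresh.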

Also, if you refresh your server hardware on a regular cycle, such as every one or two years, your team can forecast overall spending more accurately. You can predict how much your organization will spend on hardware upgrades instead of dealing with surprise costs from overloaded or overworked hardware.
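
A fixed refresh cycle turns the spending forecast into simple arithmetic. With a cycle of N years, roughly 1/N of the fleet is replaced each year. The sketch below assumes a uniform per-server cost. The `annual_refresh_budget` helper and the fleet figures in the example are hypothetical.

```python
import math


def annual_refresh_budget(fleet_size: int, cycle_years: int, unit_cost: float) -> float:
    """Estimated yearly hardware spend for a fixed refresh cycle.

    About 1/cycle_years of the fleet is replaced each year; rounding up
    covers fleets that don't divide evenly across the cycle.
    """
    if fleet_size < 0 or cycle_years < 1 or unit_cost < 0:
        raise ValueError("fleet_size and unit_cost must be >= 0, cycle_years >= 1")
    servers_per_year = math.ceil(fleet_size / cycle_years)
    return servers_per_year * unit_cost


if __name__ == "__main__":
    # Hypothetical fleet: 200 servers on a two-year cycle at 8,000 per server.
    budget = annual_refresh_budget(fleet_size=200, cycle_years=2, unit_cost=8000.0)
    print(f"Expected refresh spend per year: {budget:,.0f}")
```

Rounding up is a conservative choice. When the fleet doesn't divide evenly, actual spend in some years would be slightly lower than this estimate.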

Beyond server reliability, a refresh strategy gives your organization the chance to reduce overall operating costs with hardware that is more environmentally friendly and energy efficient.

Buying newer hardware can also help you consolidate your infrastructure and potentially simplify data center management. That may be because you need fewer machines or because you decide to run more virtual or cloud-based applications.

Troubleshoot kernel panic

No one enjoys a total system shutdown. A kernel panic is the Linux counterpart to the Windows blue screen of death. It occurs when a fault leaves the OS unable to continue running. Causes include bad memory, malware, a software bug or a driver crash.

To figure out how to get your OS back online, you can use kdump, which captures the contents of system memory at the time of the crash. You can then analyze that dump, typically with the crash utility, to perform a root cause analysis and troubleshoot the Linux kernel.

You can practice this with a Linux distro, two VM clients and a network file system. Run a series of commands to set up the network file system, choose where to store the crash dumps and then simulate a kernel crash. With this setup, you can see which processes were running, which files were open and what was in memory at the time of the crash.
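
Before you simulate a crash, it helps to confirm that kdump can actually capture a dump. On most Linux systems, two conditions must hold:
- memory is reserved for the capture kernel through the `crashkernel=` boot parameter;
- a capture kernel is loaded, which the kernel reports as `1` in `/sys/kernel/kexec_crash_loaded`.

The Python sketch below checks both. The helper names are hypothetical, and configuring the network file system dump target depends on your distribution's kdump tooling. The crash itself is commonly simulated by writing `c` to `/proc/sysrq-trigger` as root. That immediately panics the system, so the sketch deliberately does not do it.

```python
from pathlib import Path
from typing import Optional


def crashkernel_param(cmdline: str) -> Optional[str]:
    """Return the value of the crashkernel= boot parameter, or None if absent."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def kdump_ready(cmdline: str, crash_loaded: str) -> bool:
    """True if memory is reserved for the capture kernel (crashkernel=)
    and the kernel reports a loaded capture kernel ("1")."""
    return crashkernel_param(cmdline) is not None and crash_loaded.strip() == "1"


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    loaded_file = Path("/sys/kernel/kexec_crash_loaded")
    # Fall back to "not configured" values when the files are absent,
    # e.g., on a non-Linux machine or a kernel built without kexec support.
    cmdline = cmdline_file.read_text() if cmdline_file.exists() else ""
    loaded = loaded_file.read_text() if loaded_file.exists() else "0"
    print("crashkernel reservation:", crashkernel_param(cmdline))
    print("kdump ready to capture a dump:", kdump_ready(cmdline, loaded))
```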

Together, this information and the dump analysis tools give you a solid basis for finding the root cause and getting your Linux kernel back to normal.

Properly decommission your mainframes

Mainframe technology is still relevant in industries such as healthcare and finance. But as new infrastructure emerges and experienced staff retire, your organization may no longer have the expertise or the need to run these massive computing setups.

If you are considering mainframe retirement, the first step is for your team to decide what data should stay within the organization and what software the mainframe still needs to support. This informs your application inventory, which confirms which applications your organization still requires and which it can decommission.
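
The inventory step can be sketched as a simple triage over application records. The `MainframeApp` fields and the `triage` function below are hypothetical. They show one way to separate the applications you still need from those you can decommission. The same pass flags any application whose data must stay within the organization.

```python
from dataclasses import dataclass


@dataclass
class MainframeApp:
    """Hypothetical inventory record for one mainframe application."""
    name: str
    still_required: bool   # does the business still need this application?
    data_must_stay: bool   # must its data remain within the organization?


def triage(apps: list[MainframeApp]) -> dict[str, list[str]]:
    """Sort applications into keep/decommission lists and flag any whose
    data must be migrated before the mainframe is retired."""
    plan: dict[str, list[str]] = {"keep": [], "decommission": [], "migrate_data": []}
    for app in apps:
        plan["keep" if app.still_required else "decommission"].append(app.name)
        if app.data_must_stay:
            plan["migrate_data"].append(app.name)
    return plan


if __name__ == "__main__":
    inventory = [
        MainframeApp("claims-batch", still_required=True, data_must_stay=True),
        MainframeApp("legacy-reports", still_required=False, data_must_stay=True),
        MainframeApp("print-spooler", still_required=False, data_must_stay=False),
    ]
    for category, names in triage(inventory).items():
        print(f"{category}: {', '.join(names) or '(none)'}")
```

An application can land in both "decommission" and "migrate_data". Retiring the software does not remove the obligation to keep its data.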

Outsourcing mainframe operations may be the easiest option. It allows for a smoother transition and removes questions about physical hardware disposal, but it can be very costly.

You can also replatform mainframe applications to run on x86 hardware, so you can host the software in your own data center or in the cloud. This option is relatively easy as long as the code doesn't require major changes. How often you need the application data can affect where you host it once the software is off the mainframe. You must also determine which new applications your team needs to adopt now that the mainframe is no longer available.

Finally, there's the physical disposal of the mainframe, which is no small feat. For proper and secure removal, migrate all necessary data. Then degauss the storage or destroy the hard drives.
