Troubleshooting Linux boot problems

If your Linux server fails to show the login screen at startup, you have a boot problem. Learn some tricks for diagnosing and fixing a Linux server boot problem, from Grub failure to init and runlevels.

If problems occur on your Linux server, they are often related to the boot process. Something goes wrong during system startup, and the login prompt inviting you to log on to your server never appears. To fix boot problems, you will need to dig into your boot procedure, which is what we'll cover here.

You will need to know how to "read" your boot procedure. The bottom line: when your server doesn't get completely started, it stops somewhere. The last message that you can see on the server's terminal can help you figure out where it went wrong. Based on that information, you can start the real troubleshooting at the next stage.

From Grub to kernel
The first thing that happens after your hardware has been successfully initialized is the loading of the Grub boot loader. Depending on its configuration, there are two ways to see that Grub has loaded successfully from the Master Boot Record. You'll either see a boot menu asking you what you want to start, or you'll see the kernel initializing. So, if you don't see anything happening and you are sure there is no hardware problem, the trouble is likely Grub related, although there can also be an underlying cause in the Master Boot Record.
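To tell a Grub problem from a Master Boot Record problem, it can help to look at the first sector of the boot disk from a rescue environment. The sketch below is a minimal example, not a complete diagnostic. It assumes a BIOS system with a classic 512-byte MBR, and it assumes that Grub's first-stage code contains the ASCII text "GRUB", which you should verify for your Grub version. The function names are made up for this illustration.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a valid MBR
BOOT_CODE_END = 446           # boot code area; the partition table follows


def inspect_mbr(sector: bytes) -> dict:
    """Report basic facts about a 512-byte MBR image."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError(f"expected {SECTOR_SIZE} bytes, got {len(sector)}")
    return {
        # Without this signature the BIOS will not treat the disk as bootable.
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        # Assumption: Grub's first-stage boot code embeds the text "GRUB".
        "mentions_grub": b"GRUB" in sector[:BOOT_CODE_END],
        # A sector of all zeros means the MBR has been wiped.
        "is_blank": not any(sector),
    }


def read_first_sector(device: str) -> bytes:
    """Read the first sector of a disk. Real devices require root access."""
    with open(device, "rb") as handle:
        return handle.read(SECTOR_SIZE)
```

From a rescue CD you could call `inspect_mbr(read_first_sector("/dev/sda"))`, where `/dev/sda` is only an example device name. A missing boot signature points at the Master Boot Record itself, while a valid signature without any trace of Grub suggests that the boot loader needs to be reinstalled.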

After Grub has loaded, your kernel can load. The kernel initializes your server hardware, aided by the initrd, which is also referred to as the initramfs (the initial RAM file system). The initrd makes sure that the drivers needed early in the boot process can load. Things normally don't go wrong with the initrd, but after a kernel upgrade it can happen that the initrd is not recreated successfully. If that happens, you'll end up with a "Kernel panic" message and a server that doesn't react to any input. A kernel panic can have other causes than an error in the initrd, but the initrd is often a good place to start troubleshooting.
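One quick check for an initrd that was not recreated after a kernel upgrade is to compare the kernel images in /boot with the initrd files next to them. The sketch below assumes three common naming schemes (`initrd.img-<version>`, `initramfs-<version>.img`, and `initrd-<version>` with an optional `.img` suffix); your distribution may use another scheme, so treat the patterns as a starting point. The function name is hypothetical.

```python
import re

KERNEL_PATTERN = re.compile(r"^vmlinuz-(?P<version>.+)$")

# Assumed initrd naming schemes; adjust these for your distribution.
INITRD_PATTERNS = (
    re.compile(r"^initrd\.img-(?P<version>.+)$"),
    re.compile(r"^initramfs-(?P<version>.+)\.img$"),
    re.compile(r"^initrd-(?P<version>.+?)(?:\.img)?$"),
)


def kernels_without_initrd(filenames):
    """Return the kernel versions that have no initrd with the same version."""
    kernels, initrds = set(), set()
    for name in filenames:
        match = KERNEL_PATTERN.match(name)
        if match:
            kernels.add(match.group("version"))
            continue
        for pattern in INITRD_PATTERNS:
            match = pattern.match(name)
            if match:
                initrds.add(match.group("version"))
                break
    return sorted(kernels - initrds)
```

With the boot partition mounted, `kernels_without_initrd(os.listdir("/boot"))` (after `import os`) lists the kernel versions to look at first. An empty result does not prove that the initrd is healthy, since a file can exist and still be incomplete, and a kernel with all required drivers built in may not need an initrd at all.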

Init and runlevels
Once the kernel and initrd have loaded successfully, the init process is started. This process is also referred to as "the mother of all processes," because it is responsible for loading all other processes on your server. It is difficult to pin down the exact moment that init starts, but you can recognize it by the services that are initialized. When you see udev waking up, file systems being mounted and /proc becoming active, you know that the boot process has reached init.

The loading of init has two sub-stages. The first is the loading of fundamental processes, such as the mounting of your file systems or the activation of udev, which is responsible for creating the device files in /dev. Typically, if an error occurs in this part of the boot procedure, you'll end up with nothing; the best you can get is a prompt that invites you to provide the root password to enter maintenance mode. In the second sub-stage, your services start loading. The beginning of this stage is marked by the message "initializing run level n," which may scroll by quickly. When you see this message, you already have a functional server. If an error occurs at this stage, you will probably see no more than an error message, and your server can complete the boot procedure anyway.

Problems in this second sub-stage are the easiest to fix, because you can work from the server prompt with the complete server environment loaded. To fix problems that happen before this stage, you'll need to use a rescue CD, like Knoppix. Learn more from my tip on how to use a Knoppix rescue CD to access a server that is broken.
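The reasoning in this article can be summarized as a small lookup from the last message on the console to the boot stage that probably failed. The sketch below is only an illustration of that reasoning. The marker strings are assumptions based on the messages mentioned here, and the exact wording on your server will differ per distribution and version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnosis:
    stage: str
    approach: str


# (marker substrings, diagnosis) pairs, checked in order. Markers are assumed.
RULES = (
    (("kernel panic",),
     Diagnosis("kernel/initrd", "rescue CD; check the initrd first")),
    (("initializing run level",),
     Diagnosis("init: services", "fix from the server prompt")),
    (("udev", "mount", "/proc", "maintenance"),
     Diagnosis("init: fundamental processes", "maintenance mode or rescue CD")),
)


def diagnose(last_message: str) -> Diagnosis:
    """Guess the failing boot stage from the last visible console message."""
    text = last_message.strip().lower()
    if not text:
        # Nothing on screen at all: Grub never took over from the hardware.
        return Diagnosis("Grub or Master Boot Record", "rescue CD")
    for markers, diagnosis in RULES:
        if any(marker in text for marker in markers):
            return diagnosis
    return Diagnosis("unknown", "compare the message with the stages above")
```

The kernel panic rule is checked first on purpose, because a panic message often mentions mounting the root file system and would otherwise be mistaken for an init problem.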

Based on the information in this article, you now know how to analyze a server that fails to boot. Correctly identifying the problem is essential, as it helps you focus on the exact cause. In future articles you will learn how to recognize and fix some of the problems that may occur.

ABOUT THE AUTHOR: Sander van Vugt is an author and independent technical trainer, specializing in Linux since 1994. Vugt is also a technical consultant for high-availability (HA) clustering and performance optimization, as well as an expert on SLED 10 administration.
