March 6, 2010 / gus3

Using Netconsole on Linux

When the only connections to a Linux system are the power cord and the Ethernet cable, sending the console output to another host on the local Ethernet is a great way to observe boot-time behavior, including any panics that hang the system. However, full netconsole requires the system’s Ethernet driver (and netconsole itself) to be built into the core kernel. No stock Linux distribution includes Ethernet support in its supplied kernel except as modules, which has the disadvantage that no console messages are sent before the Ethernet driver module is loaded. Building the driver directly into the kernel makes the kernel messages available much earlier in the boot process, including any panic while mounting the root volume.

NOTE: If the boot process requires human input, such as entering the passphrase for an encrypted volume, then this setup will not work. Netconsole is an output-only technique! If you need to supply input before network logins become active, or you need direct control of LILO or GRUB, then you need either the standard console (VGA and keyboard) or a serial console.

The following assumes you have experience building and installing a Linux kernel, you have two working Linux systems, and the system requiring netconsole has the kernel source installed. For the purpose of this discussion, I will refer to the systems as “remote” and “viewer”.

Building the new kernel

The first step is to gather information. On a piece of paper, make note of the following:

  • the kernel module for remote’s Ethernet
  • remote’s IP address
  • viewer’s IP address and MAC address (shown by ifconfig)

To find the network driver for the remote system, log in to it, then type /sbin/lspci -k. In the output, find the line containing the word “Ethernet”. One of the following indented lines will indicate the “Kernel driver in use”. This name is the one to write down for the first item in the above list.
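If the remote system has many PCI devices, the same search through the lspci -k output can be scripted. The following Python sketch assumes the layout described above (an unindented line per device, followed by indented detail lines such as “Kernel driver in use”); the function names are my own and not part of any tool.

```python
"""Find the kernel driver bound to each Ethernet controller (sketch)."""
import subprocess
from typing import List, Optional, Tuple


def ethernet_drivers(lspci_output: str) -> List[Tuple[str, Optional[str]]]:
    """Return (device line, driver) for each Ethernet controller in `lspci -k` text.

    Device lines start in column 0; their detail lines are indented. The
    driver is None when no "Kernel driver in use:" line follows the device.
    """
    devices: List[List[Optional[str]]] = []
    in_ethernet = False
    for line in lspci_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # A new device begins; remember whether it is an Ethernet controller.
            in_ethernet = "Ethernet" in line
            if in_ethernet:
                devices.append([line.strip(), None])
        elif in_ethernet:
            # Detail line of the current Ethernet device, e.g. "Kernel driver in use: e1000e".
            key, _, value = line.strip().partition(":")
            if key == "Kernel driver in use":
                devices[-1][1] = value.strip()
    return [(str(dev[0]), dev[1]) for dev in devices]


def run_lspci() -> str:
    """Run `/sbin/lspci -k` on the local system and return its output."""
    return subprocess.run(["/sbin/lspci", "-k"], capture_output=True,
                          text=True, check=True).stdout
```

On the remote system, print(ethernet_drivers(run_lspci())) lists each Ethernet controller with its driver; a driver of None means no driver is bound to that device.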

The IP addresses are probably listed in the viewer system’s /etc/hosts file.
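If you would rather not read /etc/hosts by eye, a few lines of Python can pull the addresses out of it. This is a minimal sketch assuming the usual “address name [aliases]” layout with # comments; the host names “remote” and “viewer” are hypothetical and should be replaced with your own.

```python
"""Look up addresses in an /etc/hosts-style file (sketch)."""
from typing import Dict


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias to the first IP address listed for it.

    Each non-comment line is "IP name [alias ...]"; anything after '#'
    is a comment.
    """
    table: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank, comment-only, or malformed line
        ip, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, ip)  # keep the first entry found for a name
    return table


def hosts_lookup(name: str, path: str = "/etc/hosts") -> str:
    """Return the address for `name` from `path`, raising KeyError if absent."""
    with open(path, encoding="utf-8") as f:
        return parse_hosts(f.read())[name]
```

On the viewer system, hosts_lookup("remote") and hosts_lookup("viewer") return the two addresses to write down, if the file lists them.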

The next step is building and installing a new Linux kernel on the remote system, to include the Ethernet driver necessary for immediate kernel logging:

  1. In the menu-based or GUI-based configurator, select “General setup”, then provide a special name for the “Local version”. Appending a name to the kernel version gives the new kernel its own module directory under /lib/modules, so the known good driver modules are left in place for recovery. For this experiment, I suggest “netcons”.
  2. In the top level of the kernel configurator, select “Device Drivers”, then “Network device support”, then the correct Ethernet type for your hardware (10/100, 1000, or 10000 Mbit). Distribution maintainers usually modularize these drivers, so that only the one matching the detected hardware is loaded. Mark the driver for your system as “Yes”, rather than “Module”. If you are not sure which driver you need, the “Help” text for each driver gives its module name. While in “Network device support”, also mark “Network console logging support” as “Yes”.
  3. Exit the configurator, saving the new configuration.
  4. Build the new kernel and modules, and install the modules.
  5. If the new kernel needs an init ramdisk for booting, build a new init ramdisk, taking care to give it a name that won’t overwrite the init ramdisk for the known good kernel. One idea is to call it /boot/initrd-netcons.gz, in keeping with our “netcons” naming convention.
  6. Referring to the notes you made earlier, record the following kernel parameter, in an editor or on the note paper:
    netconsole=6665@[remote IP]/eth0,6666@[viewer IP]/[viewer MAC]
    Substitute your IP addresses and viewer MAC address where noted. This tells netconsole how to address its UDP packets: the source port, IP address and interface on the remote system (6665, remote IP, eth0), and the destination port, IP address and MAC address on the viewer (6666, viewer IP, viewer MAC). Numeric IP addresses are necessary, because DNS and the /etc/hosts file are not available to the kernel for name lookups at that point.
  7. If the remote system boots with LILO:
    • Add a new stanza to /etc/lilo.conf for the test kernel, and call it “linux-netcons”. Include any necessary kernel parameters in the append= line, then add the netconsole= parameter from the previous step. Do not modify the default= name.
    • Run lilo to install the new boot configuration.
    • Run lilo -R linux-netcons. This will use the test kernel for the next boot only. In case the test kernel causes a panic or otherwise fails to boot properly, the following reboot will revert to the known good kernel.
  8. If the remote system boots with GRUB:
    • Add a new stanza to /boot/grub/menu.lst for the test kernel, and call it “linux-netcons”. Include any necessary kernel parameters on the kernel line, then add the netconsole= parameter from the previous step.
    • To make sure the linux-netcons kernel boots only once, in case it panics or otherwise fails to boot properly, follow the instructions here.
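The netconsole= string from step 6 and the boot loader stanzas from steps 7 and 8 are easy to mistype. The sketch below assembles them in Python and rejects host names or malformed MAC addresses, since the kernel cannot resolve names at that stage. The function names, and the kernel path, init ramdisk path, root device and GRUB drive (hd0,0) in the example, are hypothetical placeholders; compare the output with your existing lilo.conf or menu.lst entries before pasting it in.

```python
"""Build the netconsole= parameter and boot-loader stanzas (sketch)."""
import ipaddress
import re

MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def netconsole_param(remote_ip: str, viewer_ip: str, viewer_mac: str,
                     dev: str = "eth0", src_port: int = 6665,
                     tgt_port: int = 6666) -> str:
    """Return 'netconsole=SRCPORT@REMOTE/DEV,TGTPORT@VIEWER/MAC'.

    Numeric IPv4 addresses are required: no name lookup happens at boot time.
    """
    for addr in (remote_ip, viewer_ip):
        ipaddress.IPv4Address(addr)  # raises ValueError for names or typos
    if not MAC_RE.fullmatch(viewer_mac):
        raise ValueError(f"not a MAC address: {viewer_mac!r}")
    for port in (src_port, tgt_port):
        if not 0 < port < 65536:
            raise ValueError(f"bad UDP port: {port}")
    return (f"netconsole={src_port}@{remote_ip}/{dev},"
            f"{tgt_port}@{viewer_ip}/{viewer_mac}")


def lilo_stanza(kernel: str, root: str, append: str,
                initrd: str = "", label: str = "linux-netcons") -> str:
    """Return a lilo.conf image= stanza for the test kernel."""
    lines = [f"image={kernel}", f"    label={label}", f"    root={root}"]
    if initrd:
        lines.append(f"    initrd={initrd}")
    lines += [f'    append="{append}"', "    read-only"]
    return "\n".join(lines) + "\n"


def grub_legacy_stanza(kernel: str, grub_root: str, root: str, append: str,
                       initrd: str = "", title: str = "linux-netcons") -> str:
    """Return a GRUB legacy menu.lst stanza for the test kernel."""
    lines = [f"title {title}", f"root {grub_root}",
             f"kernel {kernel} root={root} ro {append}"]
    if initrd:
        lines.append(f"initrd {initrd}")
    return "\n".join(lines) + "\n"
```

For example, lilo_stanza("/boot/vmlinuz-netcons", "/dev/sda1", netconsole_param("192.168.1.10", "192.168.1.20", "00:11:22:33:44:55"), initrd="/boot/initrd-netcons.gz") returns a stanza for /etc/lilo.conf; the addresses here are examples only. If you already pass other kernel parameters, append them to the netconsole= string before building the stanza.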

Testing the new kernel

  1. On the viewer system, run nc -l -u -p 6666 (no root privileges are needed, because the port is above 1023). On some systems “nc” is called “netcat”, to avoid a name collision with the NEdit client.
  2. Reboot the remote system, then wait a couple of minutes for BIOS testing and the boot loader delay. If kernel boot-up messages do not appear on the viewer system, try the steps in the “Troubleshooting” section before pressing the reset button or power-cycling the remote system.
  3. If you saw the remote system’s kernel boot-up messages on the viewer system, congratulations! You have built a netconsole-enabled Linux kernel. If everything is working properly (meaning you don’t see a kernel panic from the remote system), log on to the remote system and check the system logs for any unusual errors during boot. If the root volume mounted properly and the network services are listening, the odds of serious errors are very slim.
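If netcat is not installed on the viewer, a short Python program can stand in for the nc -l -u -p 6666 command in step 1. This is a minimal sketch: it binds a UDP socket to the given port on all local IPv4 addresses and writes each datagram’s payload to standard output, much as netcat does. Like netcat, it needs no root privileges for port 6666.

```python
"""Minimal netconsole receiver, a stand-in for `nc -l -u -p 6666` (sketch)."""
import socket
import sys


def open_listener(port: int, host: str = "") -> socket.socket:
    """Bind a UDP socket; host "" means all local IPv4 addresses."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock: socket.socket, bufsize: int = 4096) -> str:
    """Wait for one datagram and return its payload as text."""
    data, _addr = sock.recvfrom(bufsize)
    return data.decode("utf-8", errors="replace")


def listen(port: int = 6666) -> None:
    """Print every datagram's payload until interrupted, like netcat does."""
    with open_listener(port) as sock:
        while True:
            sys.stdout.write(receive_one(sock))
            sys.stdout.flush()


if __name__ == "__main__" and len(sys.argv) > 1:
    listen(int(sys.argv[1]))
```

Save it under any name, say netcons_listen.py (a hypothetical name), run python3 netcons_listen.py 6666 before rebooting the remote system, and press Ctrl-C to stop.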

Once you are satisfied with your custom-built kernel, you can make it permanent by making it the default in your LILO or GRUB configuration file. However, if you reconfigure and rebuild your kernel, remember to follow the above safety instructions.

Troubleshooting

Netcat shows nothing, even after waiting several minutes.

This does not mean the remote system has panicked. If the remote system responds to a network ping, the system is up and running. You should be able to open an ssh session, to check the uname -r and dmesg output.
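Sending ICMP echo requests from a script usually requires root privileges, so this Python sketch checks reachability a different way: it tries a TCP connection to the remote system’s ssh port (22 by default). A successful connection shows that the kernel is running and sshd is listening, which is enough to go ahead with the uname -r and dmesg checks.

```python
"""Check from the viewer whether the remote's ssh port answers (sketch)."""
import socket


def port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out
```

For example, port_open("192.168.1.10") (an example address) returns True once the remote system’s sshd is accepting connections.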

If uname -r doesn’t show the “netcons” name that you gave the custom kernel, it means the system is running the known good kernel instead. Repeat step 7 or 8 above, as appropriate for the remote system.

If dmesg shows an early message that netconsole wasn’t able to initialize the Ethernet interface, the necessary Ethernet driver was probably built as a module: the driver clearly works, since the network is up, but it was loaded too late for netconsole. This is especially likely if you started with your distribution’s stock kernel configuration. Another possibility is that the correct driver is built in, but support for the bus the adapter sits on (ISA, PCI or PCIe) is missing from the kernel and is included in the init ramdisk instead. No major Linux distributor does this, but it is not impossible.
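To tell these cases apart, check the configuration of the kernel you actually built. This Python sketch reads a kernel .config (or /proc/config.gz, which exists only if the kernel was built with CONFIG_IKCONFIG_PROC) and reports whether each option is built in, modular or not set. CONFIG_NETCONSOLE is netconsole’s own option. The Ethernet driver’s option name differs from its module name (the e1000e module, for example, is controlled by CONFIG_E1000E), so look it up on the driver’s “Help” screen in the configurator and pass it in.

```python
"""Report whether kernel options are built in (=y), modular (=m) or off (sketch)."""
import gzip
from typing import Dict, Iterable


def parse_config(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {"CONFIG_FOO": value}, with "n" for unset."""
    opts: Dict[str, str] = {}
    suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            opts[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            key, _, value = line.partition("=")
            opts[key] = value
    return opts


def check_builtin(opts: Dict[str, str], symbols: Iterable[str]) -> Dict[str, str]:
    """Classify each symbol as 'built in', 'module' or 'not set'."""
    labels = {"y": "built in", "m": "module"}
    return {sym: labels.get(opts.get(sym, "n"), "not set") for sym in symbols}


def read_config(path: str) -> str:
    """Read a .config file, or a gzip-compressed one such as /proc/config.gz."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return f.read()
```

For example, check_builtin(parse_config(read_config("/proc/config.gz")), ["CONFIG_NETCONSOLE", "CONFIG_E1000E"]) should report “built in” for both options on a correctly built netcons kernel; substitute your own driver’s option, or the .config path in your kernel source tree.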

Yet another possibility is that, on a system with multiple Ethernet interfaces, netconsole was configured to send the console output through the wrong one. You may need to modify the kernel command line in the boot loader configuration, trying eth1, eth2, and so forth, until you have tried every interface on your system.
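Before cycling through eth1, eth2 and so on, it helps to see which interfaces the remote system has and which one has a link. This Python sketch reads the address and operstate files that Linux exposes for each interface under /sys/class/net; run on the viewer, it also shows the MAC address needed for the netconsole= parameter. Treat the names as a hint: if udev renames interfaces after boot, they may not match what the kernel called them when netconsole started.

```python
"""List network interfaces with their MAC address and link state (sketch)."""
import os
from typing import Dict, Tuple


def _read(path: str) -> str:
    """Return the stripped contents of a small sysfs file, or "?" if unreadable."""
    try:
        with open(path, encoding="ascii") as f:
            return f.read().strip()
    except OSError:
        return "?"


def interfaces(sysfs_net: str = "/sys/class/net") -> Dict[str, Tuple[str, str]]:
    """Map interface name -> (MAC address, operstate), sorted by name."""
    return {
        name: (_read(os.path.join(sysfs_net, name, "address")),
               _read(os.path.join(sysfs_net, name, "operstate")))
        for name in sorted(os.listdir(sysfs_net))
    }
```

Calling interfaces() with no argument reads the running system’s /sys/class/net; an interface whose state is “up” is the likeliest candidate for the netconsole= parameter.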

If the dmesg output doesn’t show any obvious cause on the remote system, another possibility is that netcat was not listening to the correct port. Make sure you specified the same port in the remote kernel command line and the netcat -p parameter (6666 in the above example).
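On the remote system, /proc/cmdline holds the command line the running kernel was booted with, so you can read back the port netconsole is actually sending to and compare it with the netcat -p value. This is a minimal Python sketch of that check. The defaults it fills in for empty fields (source port 6665, interface eth0, target port 6666, broadcast MAC) are the ones given in the kernel’s netconsole documentation, and it handles only a single netconsole target.

```python
"""Extract the netconsole= settings from a kernel command line (sketch)."""
from typing import Dict, Optional


def parse_netconsole(cmdline: str) -> Optional[Dict[str, str]]:
    """Return netconsole fields from a /proc/cmdline string, or None if absent.

    Format: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-mac]
    """
    for word in cmdline.split():
        if word.startswith("netconsole="):
            # A leading '+' (extended mode in newer kernels) is ignored here.
            spec = word[len("netconsole="):].lstrip("+")
            break
    else:
        return None
    src, _, tgt = spec.partition(",")
    src_port, _, src_rest = src.partition("@")
    src_ip, _, dev = src_rest.partition("/")
    tgt_port, _, tgt_rest = tgt.partition("@")
    tgt_ip, _, tgt_mac = tgt_rest.partition("/")
    return {
        "src_port": src_port or "6665",
        "src_ip": src_ip,  # empty means "use the interface's address"
        "dev": dev or "eth0",
        "tgt_port": tgt_port or "6666",
        "tgt_ip": tgt_ip,
        "tgt_mac": tgt_mac or "ff:ff:ff:ff:ff:ff",  # broadcast
    }
```

On the remote system, parse_netconsole(open("/proc/cmdline").read()) returns the settings in use; None means the running kernel was booted without netconsole=, which usually means it is the known good kernel.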

Finally, remember that the remote system is booting with the “netcons” kernel only once, and any reboot will revert to the known good kernel unless instructed otherwise. Double-check step 7 or 8 above, as a safety measure, especially if you reconfigure the kernel or the boot loader.

(For the true hacker: Even if netcat shows nothing, the UDP packets on the wire can show up in a tcpdump or Wireshark capture.)

I see a panic in the netcat output.

The most likely cause of a kernel panic is a failure to mount the root volume. Did you include the necessary filesystem driver in the init ramdisk (or directly into the kernel)? Did you specify the correct kernel version when you built the init ramdisk? These are standard concerns for any custom-built kernel. A hard reset, or power-cycle, should reboot with the known good kernel.

Still, you should give yourself a pat on the back. If you see the console messages, including the panic, it means your netconsole setup is working! In fact, such “early” messages sent to netconsole are the precise reason for building netconsole and Ethernet directly into the kernel, rather than as loadable modules.

The remote system doesn’t respond to ping or ssh. Now what?

Force the issue, with either a press of the reset button or a power-cycle. If you set up the boot loader properly, this should get control back by booting the known good kernel.
