July 27, 2008 / gus3

Scratching an Itch

In a prior article, I pointed out that the first filesystem mount in Linux can use only one XFS partition, due to the handling of the “root=” and “rootflags=” kernel command-line parameters. Ext3, JFS, and ReiserFS with external journals contain the journals’ device numbers in the filesystem superblocks, but XFS contains only a flag indicating “internal” or “external.”

I developed the following on SLAMD64, a 64-bit port of Slackware. The techniques presented here are based on the Slackware boot process, but the concepts are not limited to Slackware and its derivatives.

For background: In Linux 2.6.24.5, the cause can be traced in init/do_mounts.c. Following the execution path, you can see that the first root filesystem mount happens in mount_root(). First, the device name is converted to a (major,minor) device number pair, then /dev/root is created with that (major,minor) pair. Finally, /dev/root is mounted as the root filesystem.
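
To make the (major,minor) pair concrete, here is a rough userspace sketch that prints it for an existing device node. It is only an illustration, not what the kernel does internally; the function names are made up for this example, and it assumes a stat that supports the GNU-style '%t %T' format (major and minor in hex).

```shell
# Convert the hex major/minor pair printed by stat into decimal "major,minor".
hex_to_devnum() {
    printf '%d,%d\n' "0x$1" "0x$2"
}

# Print the decimal (major,minor) pair of a device node.
# Assumes stat -c '%t %T' is available (GNU coreutils style).
devnum() {
    out=$(stat -c '%t %T' "$1") || return 1
    hex_to_devnum $out    # unquoted on purpose: split into two arguments
}

# Example: "devnum /dev/sda2" would typically print 8,2, and a node with
# the same pair could be created by hand with "mknod /dev/root b 8 2".
```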

The catch is that, at this point, the only partition accessible is the one behind /dev/root. The XFS code assumes all partitions are available, which is not the case during the first mount. Because of this, trying to use an external journal on the first mount fails with an “Invalid device” error and a kernel panic. The way around this is to make some other mount the first mount, then switch to the XFS root filesystem later in the boot process. This is exactly what an init ramdisk gives us.

For Slackware and SLAMD64, the “mkinitrd” command provides the starting point. Here is the general approach (but be sure to read the addendum at the end):

  1. Make sure a kernel capable of booting an init ramdisk is installed to the system, and the bootloader (LILO or GRUB) is configured to use it (but not by default). You should also have prepared a new root partition.
  2. Build an init ramdisk with the necessary module(s) included. For me, this is:
    mkinitrd -c -m xfs -f xfs -r \
    /dev/sdXX -o /boot/initrd-hacked.gz

    Substitute your desired (and unusable?) root partition for “/dev/sdXX”. You may also need to specify the kernel version with “-k 2.6.xxxx”, especially if you have built a custom kernel.
  3. On Slackware and SLAMD64, the /boot/initrd-tree directory now contains the files built into the init ramdisk. One of those files is a shell script called “init”; the Busybox multi-call binary, invoked by the kernel as “/sbin/init”, will read this shell script and execute it. (Side note: I have found nothing, in either the Busybox docs or the Slackware/SLAMD64 build scripts, to indicate how /init in Slackware’s init ramdisk is executed. However, I know it is executed. As Arthur C. Clarke said, “any sufficiently advanced technology is indistinguishable from magic”; I suppose this makes Patrick Volkerding a magician.)
  4. Using your favorite editor, open /boot/initrd-tree/init and find the line containing the string “mount -o ro”. This is the line which mounts the actual root filesystem. Comment out that line, then add a replacement line that accomplishes your nefarious purposes. In my case, this is the following:
    mount -o ro,logdev=/dev/sda2,[...] \
    -t $ROOTFS $ROOTDEV /mnt

    Save the file and exit the editor.
  5. Re-build the init ramdisk using the same “mkinitrd” command, but this time omitting the “-c” parameter. Without “-c”, most of the files in the initrd-tree directory will be unmodified, including the customized /init script. You now have a customized init ramdisk in your /boot/initrd-hacked.gz.
  6. Re-check your bootloader configuration file, to make sure you are loading the customized init ramdisk. For LILO, you should add a new line
    initrd=/boot/initrd-hacked.gz
    to the new entry in /etc/lilo.conf, then run “lilo” to apply the changes. If you are using GRUB instead, you can probably add one of
    initrd /initrd-hacked.gz
    or
    initrd (hd0,X)/initrd-hacked.gz
    to the pertinent entry in /boot/grub/menu.lst (GRUB’s menu.lst separates the command and its argument with a space, not “=”). My /boot directory resides on /dev/sda1, so the latter would be “(hd0,0)” in my case.
  7. Now comes the big test: Reboot!
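
The edit in step 4 can also be scripted. The following is only a sketch under stated assumptions: the helper name patch_init is hypothetical, /dev/sda2 is just my log device, and the init script is assumed to contain one uncommented “mount -o ro” line as described above. It writes the patched script to standard output instead of changing the file in place, so the result can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print a copy of an init script in which the
# "mount -o ro" line is commented out and followed by a replacement
# line carrying extra mount options.
#   $1 = path to the init script (e.g. /boot/initrd-tree/init)
#   $2 = extra options, e.g. logdev=/dev/sda2 (must not contain & or \)
patch_init() {
    awk -v extra="$2" '
        index($0, "mount -o ro") && $0 !~ /^[ \t]*#/ {
            print "#" $0                              # keep the original line as a comment
            sub(/mount -o ro/, "mount -o ro," extra)  # add the extra options
        }
        { print }
    ' "$1"
}

# Example use (review the diff before installing the new script):
#   patch_init /boot/initrd-tree/init logdev=/dev/sda2 > /tmp/init.new
#   diff -u /boot/initrd-tree/init /tmp/init.new
```

After copying the reviewed script back over /boot/initrd-tree/init (and keeping it executable), re-run “mkinitrd” without “-c” as in step 5.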

If all went well, your new root partition is mounted with options that weren’t available through “rootflags=”. If your testing shows that your system is working well, you may wish to change your bootloader configuration so that the hacked init ramdisk is loaded by default.
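
One way to check the result after rebooting is to look for “logdev=” in the root entry of the mount table. This is a minimal sketch, assuming the XFS driver lists the option in /proc/mounts when the filesystem was mounted with it; the function name root_logdev is made up for this example.

```shell
# Print the logdev= value of an XFS mount point, reading a table in
# /proc/mounts format. Exits non-zero if no logdev= option is found.
#   $1 = mount point (default: /)
#   $2 = mounts table to read (default: /proc/mounts)
root_logdev() {
    awk -v mp="${1:-/}" '
        $2 == mp && $3 == "xfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^logdev=/)
                    found = substr(opts[i], 8)   # text after "logdev="
        }
        END { if (found == "") exit 1; print found }
    ' "${2:-/proc/mounts}"
}

# Example: root_logdev / || echo "root is not using an external log"
```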

For XFS, this approach allows the root partition to have an external journal. On my system, this makes everything faster, both boot-up and normal operation. However, an external XFS journal isn’t the only problem resolved with a hacked init ramdisk. Any filesystem options which are ignored on remount may be specified in a customized init ramdisk.

These steps are merely an outline, based heavily on Slackware’s mkinitrd system. Other distributions, such as Fedora and Debian, use different commands for init ramdisk management; I have no experience with those. However, using the concepts explained here, you should be able to build a customized init ramdisk for your own Linux system.

Addendum 2010-07-06: This method can also be made to work with a generic init ramdisk, using a filesystem driver built into the kernel. Having already built an init ramdisk using the above method, I tried booting a newer kernel, with the XFS driver built-in, but using the same init ramdisk. It worked, with only one error: the init ramdisk contained no XFS module for the new kernel, so the “insmod” command failed. However, that didn’t stop the system from booting, because the driver was already built in.
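
If the failed “insmod” bothers you, the init script could check whether the kernel already knows the filesystem before trying to load a module. This is a sketch only, not something Slackware’s init script does as far as I know; the function name fs_available is hypothetical, and it relies on /proc being mounted inside the ramdisk.

```shell
# Succeed if the named filesystem type is already registered with the
# kernel (built in, or its module is already loaded).
#   $1 = filesystem name, e.g. xfs
#   $2 = file to read (default: /proc/filesystems)
fs_available() {
    # Each line is "<nodev or empty><TAB><name>"; the name is the last field.
    awk -v fs="$1" '$NF == fs { ok = 1 } END { exit !ok }' \
        "${2:-/proc/filesystems}"
}

# Example (the module path is a placeholder, not a real location):
#   fs_available xfs || insmod /path/to/xfs.ko
```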


10 Comments

  1. I Am, Therefore I Think / Jul 27 2008 1:22 am

    Finding the Fastest Filesystem

    What follows is based on my observations. My focus is the relative performance of different filesystems, not the raw benchmark numbers of my hardware. For this reason, I have not included any specific model numbers of the hardware. Part of

  2. TITO / Jul 27 2008 8:22 am

    “the Busybox multi-call binary, invoked by the kernel as “/sbin/init”, will read this shell script and execute it”
    I suppose that this script is executed in place of busybox’s init.

  3. gus3 / Jul 27 2008 12:29 pm

    The shell script resides in /init, but the kernel doesn’t search for that. The only one it will find for PID 1 is /sbin/init, which is the link to Busybox. According to the Busybox docs (http://busybox.net/downloads/BusyBox.html), busybox:init looks for /etc/init.d/rcS or /etc/inittab for execution guidelines. Neither of these exist in the Slackware initrd. Busybox’s internal inittab doesn’t contain any explicit direction toward /init.
    To run the /init script, Busybox is called with its internal ash interpreter; PID 1 is still running as init.
    I also looked in the SLAMD64 build script for mkinitrd, and didn’t find any patch to direct busybox:init to look for /init and execute it. So, on this one, I’m stumped!

  4. TITO / Jul 27 2008 12:46 pm

    #define INITTAB "/etc/inittab" /* inittab file location */
    #ifndef INIT_SCRIPT
    #define INIT_SCRIPT "/etc/init.d/rcS" /* Default sysinit script. */
    #endif
    Maybe they changed the hardcoded paths to point to /init

  5. gus3 / Jul 27 2008 8:39 pm

    Nope, I checked for that.

  6. TITO / Jul 28 2008 3:05 am

    From man INITRAMFS-TOOLS(8)
    INIT SCRIPT
    The script which is executed first and is in charge of running all other scripts can be found in /usr/share/initramfs-tools/init.
    This script then:
    1)mounts the real root
    2)moves virtual filesystems over to the real filesystem
    3)searches for valid init on the new root
    4)does run-init /sbin/init
    The init script will not have pid 1 in this case, but the init launched by run-init will:
    1 ? Ss 0:01 init [2]
    5) by inspecting the kernel source in linux-source-2.6.25/init/main.c
    you will find:
    /*
    * check if there is an early userspace init. If yes, let it do all
    * the work
    */
    if (!ramdisk_execute_command)
    ramdisk_execute_command = "/init";

  7. gus3 / Jul 28 2008 3:17 am

    Aha! My friend, I believe you have found it.
    …not that Patrick is any less a magician…

  8. Anonymous / Nov 9 2010 11:09 pm

    How do I verify that XFS is using the external journal? I set logdev=/dev/sdxX in fstab but I want to make sure or see some information to show me that the log is indeed residing externally on my other internal drive.

    • musicman529 / Nov 10 2010 5:54 pm

      If you specified an external logdev when you created the filesystem, you can trust that if you can mount it, it’s using the external logdev. An XFS filesystem, created with an external logdev, cannot be mounted without also passing “logdev=” as a mount option. There is no internal pointer in an XFS filesystem to the log device, as ext[34] and JFS have; there is only a flag specifying that the log is internal or external.

Trackbacks

  1. Xfs problems - caused by?
