July 14, 2008 / gus3

Finding the Fastest Filesystem

What follows is based on my observations. My focus is the relative performance of different filesystems, not the raw benchmark numbers of my hardware. For this reason, I have not included any specific model numbers of the hardware.

Part of my “economic stimulus check” went to a 500GB SATA drive. My original intention was to buy two of them, so I could claim, “over a terabyte of disk space!”. Alas, I got a little ahead of myself; my system had only one open hard drive bay. With a slightly bruised ego, I returned the unopened second hard drive and began to ponder how to exploit my super-roomy disk space. I quickly settled on one goal: find the fastest journaling filesystem (FS) for my SLAMD64 dual-core computer, with 2G of memory. My testing focused on three main areas: filesystem, disk I/O scheduler, and CPU speed.

I chose ext3, JFS, and XFS for my filesystem options. I specifically excluded ReiserFS from my testing, due to its tendency to bypass many of Linux’s internal disk management functions.

So that others may run similar tests on their own systems, I have provided a gzipped tarball (CAPTCHA and 30-second delay for free download) containing the scripts and my own test results.

Frankly, the final results stunned me.

FINDING BENCHMARKS

My first round of tests was a home-brew hack involving Slackware’s package management suite, distributed via threads across several directories, but I found the results too difficult to interpret. I also tried bonnie++, but test after test turned up no clear winner on anything. Part of the problem, I think, is that most of the testing in bonnie++ runs in the context of a single thread, leaving each test CPU-bound.

After presenting some early test results to a discussion board, I learned about irqbalance, which works to distribute the IRQ load evenly across multiple processors while keeping a particular IRQ’s handling on a single CPU or core as much as possible. On general principle, I installed it immediately.

A little more research reminded me of dbench, the fantastic suite by Andrew Tridgell of Samba fame. His general goal with dbench is to exercise and time all of the basic file operations: create, write, seek, read, rename, lock, unlock, stat, and delete. In addition, the design of dbench specifically includes multi-threading; if a system is designed with multi-processing in mind, dbench will be able to demonstrate its advantage. The results I report here are from dbench, because each run showed a clear winner, by a substantial margin.

As an aside, another point in dbench’s favor over bonnie++ is that dbench determines its run length by clock time, rather than by number of operations completed. This gives a more deterministic approach to filesystem benchmarking, something I prefer. I’d rather provide a test script that runs for 5 minutes on every system than a test that operates on 10,000 files in 30 seconds here and 20 minutes there.

FINDING A STARTING POINT

My first step (while still stuck in bonnie++-land) was to find a set of FS options that provided reasonably good performance. One option that stood out for each filesystem was an external journal on a different controller. By isolating main partition I/O from journal I/O as much as possible, an SMP system can drive both of them at once. This helps all journaling filesystems.

With that in mind, I used /dev/sdb5 for the main partition, with /dev/sda2 for the journal.

The following are the commands and options I used:

ext3: mke2fs -O journal_dev /dev/sda2 #first step
mke2fs -j -J device=/dev/sda2 /dev/sdb5 #second step

JFS: jfs_mkfs -j /dev/sda2 /dev/sdb5

XFS: mkfs.xfs -l logdev=/dev/sda2,lazy-count=1,size=64m -d agcount=8 -f /dev/sdb5

Three things to note:

1. mke2fs looks for /etc/mke2fs.conf, which may contain additional ext2/ext3 options. My system specifies sparse_super, filetype, resize_inode, dir_index, and ext_attr, with a 4K block size and a 128-byte inode size (a sketch of the relevant stanza follows this list).

2. jfs_mkfs has surprisingly few documented options.

3. mkfs.xfs has many options. The two mandatory options I needed were “logdev=” for the journal (logging) device, and “size=64m” to clarify that only 64 megs (the maximum) of the 2G partition would go to the journal. The other options are “lazy-count=1”, which eliminates a serialization point in the superblock, and “agcount=8”, meaning 8 allocation groups in the main data partition.
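
Regarding the first note, the stanza in /etc/mke2fs.conf that carries those settings looks roughly like this (a sketch of the file’s format with my values filled in, not a verbatim copy of my file):

[defaults]
        base_features = sparse_super,filetype,resize_inode,dir_index,ext_attr
        blocksize = 4096
        inode_size = 128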

After creating each filesystem, I mounted it at /tester. I always passed “-o noatime” to each mount, so that calls to read() would not trigger access-time writes back to the disk.

For ext3, I also specified “data=writeback”, which greatly increased overall performance. This option is explained in the Linux kernel documentation as well as the mount(8) man page.

For XFS, I added “logdev=/dev/sda2,logbufs=8,logbsize=262144”. One drawback of XFS is that the external journal device must always be specified; it has no data in the superblock to indicate the journal location, other than “internal” or “external”. (I will follow up on this in a later article.) I specified 8 logging buffers, to match the number of allocation groups in the filesystem, and gave as much RAM to each log buffer as possible (262,144 bytes).

As with jfs_mkfs, JFS has a dearth of mount options.
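
Putting the pieces together, the mount commands looked roughly like this (a sketch assembled from the options described above, using the same devices and mount point):

mount -t ext3 -o noatime,data=writeback /dev/sdb5 /tester
mount -t jfs -o noatime /dev/sdb5 /tester
mount -t xfs -o noatime,logdev=/dev/sda2,logbufs=8,logbsize=262144 /dev/sdb5 /tester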

RUNNING THE TESTS

In my initial dbench tests, I noticed that my filesystem throughput was mostly stabilized after about 15 seconds of actual test time. Since I was interested more in the short-term throughput typical of program loading and linking (and I tend to be impatient), I shortened the test time to 60 seconds with the following command line:

dbench -t 60 -D /tester $THREAD_COUNT

For a mild system load, I used 5 threads; for a heavy load, I used 20 threads. These are admittedly arbitrary figures, but they did expose well-threaded or poorly-threaded design on my dual-core system.

I created a script to do two primary tasks (a rough sketch follows the list):

1. Warm the cache with a preliminary run of dbench.
2. Run dbench with each of the four I/O elevator algorithms (noop, deadline, anticipatory [which was later deprecated], and CFQ), at both the slowest and fastest CPU speeds available on my system. The output of these 8 iterations went to /tmp/$FILESYSTEM-$SPEED-$ELEVATOR.txt.
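
Here is a minimal sketch of that script. The set_cpu_speed helper is a placeholder for however your system changes CPU frequency (I used the cpufreq interface), the data partition is assumed to be on /dev/sdb, and the filesystem name is passed as the first argument:

#!/bin/sh
FILESYSTEM=$1    # ext3, jfs, or xfs

# Warm the cache with a throwaway run.
dbench -t 60 -D /tester 5 > /dev/null

for SPEED in slow fast; do
    set_cpu_speed $SPEED    # placeholder for your CPU frequency control

    for ELEVATOR in noop deadline anticipatory cfq; do
        # Switch the elevator for the data disk on the fly.
        echo $ELEVATOR > /sys/block/sdb/queue/scheduler

        for THREADS in 5 20; do
            dbench -t 60 -D /tester $THREADS >> /tmp/$FILESYSTEM-$SPEED-$ELEVATOR.txt
        done
    done
done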

I also hacked up a little script for preliminary analysis of these output files. It found the best and second-best performers for each dbench operation, followed by an ascending list of overall throughput. I found that the elevator can be more important than CPU speed for a filesystem’s performance. The deadline elevator generally did best for all three filesystems in my tests, although the impact of elevator choice varied.
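
For the overall-throughput list, something as simple as the following gets most of the way there, since dbench ends each run with a “Throughput … MB/sec” summary line (the per-operation comparison took a little more text mangling):

grep -H Throughput /tmp/*-*-*.txt | sort -k 2 -n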

Since not everyone has an SMP system, I also ran the tests after booting with “nosmp”.

THE RESULTS

I was amazed by the numbers. For all but one uniform subset of tests, XFS was the clear winner. I even tried synchronous mounts and encrypted volumes as little “what if?” exercises, and XFS still came out on top. The parameter that cost XFS a total victory was the CFQ elevator on a slower system; ext3 won most of those cases.

One thing that shines about XFS is its highly-threaded design. For every permutation of elevator and CPU speed, XFS scaled upward from 5 threads to 20, although the degree of scaling with “noop” is probably within statistical noise. However, with the CFQ elevator there was a very large difference between a slow and a fast CPU. In fact, the combination of XFS, CFQ, and the fast CPU speed clashed so badly on my system that its 5-thread test was the worst result for XFS.

JFS was a surprising disappointment: it never scaled upward. In fact, every single test performed better with 5 threads than with 20; CPU speed and I/O elevator did not matter. All the 5-thread tests had throughput somewhere between 100 and 160 MB/s, while all the 20-thread tests came in between 40 and 45 MB/s. Once JFS reaches saturation, it performs no better than it would with a synchronous mount (-o sync).

What about ext3? Well, it showed some strange behavior on my system. With a fast CPU, it too performed (slightly) better under light load than under heavy load. With a slow CPU, however, all elevator differences disappeared into statistical noise, with throughput falling between 160 and 170 MB/s.

Grouping the results by CPU speed, threads, and I/O elevator, I found that XFS was best on SMP in all but two permutations. For the noop, deadline, and anticipatory elevators, every result came up XFS first, then ext3, then JFS. With the CFQ elevator and 5 threads, ext3 won, then XFS, then JFS.

With a single CPU and 20 threads, the story was the same: XFS, then ext3, then JFS. However, with the lighter load of 5 threads, there was no uniform winner. ext3 topped the list with the CFQ scheduler and a faster CPU; otherwise, XFS was the winner.

APPLYING THE RESULTS

So which was fastest? On my SMP system, XFS with the deadline elevator topped the list at over 400 MB/s throughput. I switched my /home and /usr directories to XFS with external journals, set the deadline elevator on the kernel command line, and OpenOffice.org’s launch time dropped from 6.5 seconds to 3.5 seconds. I am not the only one to have noticed that XFS performance improves with the deadline elevator.
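
For anyone who wants to try the same change, the deadline elevator can be selected either at boot or at runtime on a 2.6 kernel:

# On the kernel command line (e.g. the append= line in lilo.conf):
elevator=deadline

# Or on the fly, per block device:
echo deadline > /sys/block/sda/queue/scheduler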

Another drawback of XFS’s external journal is that it is currently impossible to specify the journal location on the kernel command line. The option “rootflags=logdev=/dev/XXXX” is not handled properly, due to shortcomings in the kernel’s handling of the first root volume mount. I circumvented this with a hacked initrd, which is another article in itself.

14 Comments

  1. Rich / Jul 14 2008 10:47 am

    How about running these with reiserfs? I’d be interested in those results as well.

  2. gus3 / Jul 14 2008 12:39 pm

    I thought about it, but as I indicated, ReiserFS re-invents so many wheels that it’s just that much more to break. But if you want to do a comparison yourself, the scripts are in the tarball linked in the first section.

  3. gus3 / Jul 14 2008 12:52 pm

    For the curious, here are the best figures for each filesystem.
    XFS: 403 MB/s, deadline elevator, 20 threads, SMP
    ext3: 278 MB/s, noop elevator, 5 threads, SMP
    JFS: 159 MB/s, deadline elevator, 5 threads, UP

  4. Chad / Jul 14 2008 3:21 pm

    Do you know what impact irqbalance had on the results? It sounds like a cool program, so I’m curious to know how well it actually did.
    Chad
    http://linuxappfinder.com
    http://feedsanywhere.com

  5. Alex Chekholko / Jul 14 2008 3:24 pm

    Maybe I’m confused, but how are you getting 403MB/s (apples? read? write?) out of a single SATA spindle?
    The reason you’re getting roughly the same results from bonnie++ for all the filesystems is that bonnie++ takes care to avoid using data cached by the kernel. With your modern CPU and 2GB RAM, you’re plain I/O limited, and your FS choice and FS options have minimal effect on performance, particularly in real-world use, i.e. not synthetic benchmark.

  6. assente / Jul 14 2008 6:35 pm

    Where are reiser4 and ext4?

  7. gus3 / Jul 15 2008 12:25 am

    Chad:
    I’m not sure how much impact irqbalance actually had on my particular results, although I can guess it really wouldn’t be that much overall. When I ran the tests, I was in single-user mode, so the only other IRQs beyond disk and timer would be the keyboard (and I made sure not to touch the keyboard while the tests ran).
    Running the tests on a heavily-loaded system would be more likely to show benefit from irqbalance.
    Alex:
    The MB/s figure is the final result from dbench. Truthfully, I don’t know that much about the internals of dbench and bonnie++, but as I said, bonnie++ didn’t show me any great performance difference between the filesystems, whereas dbench did.
    Not to denigrate bonnie++, but I don’t spend that much time writing 1G files, which seems to be what bonnie++ is testing. However, I do spend time launching programs, when I log in and when I start Firefox, OpenOffice.org, the GNOME text editor… So I think dbench actually reflects my own usage patterns better. And with 2G of RAM, my system has lots of room for buffer cache. If that is what gives XFS such a big advantage on my system, then so be it.
    However, as I said, I’m not the only one to notice that XFS is much more responsive with the deadline elevator. The default elevator with Linux is CFQ, which interferes badly with XFS. Changing the elevator, in one user’s words, was “like getting a brand new laptop.”
    assente:
    ext4 may be progressing well, but I don’t want to trust it yet. Who knows, though, when it’s ready it might leave XFS in the dust.
    As for ReiserFS, I’ve already explained why I omitted it.

  8. chris w / Jul 15 2008 5:40 am

    ” I specifically excluded ReiserFS from my testing, due to its tendency to bypass many of Linux’s internal disk management functions.”

    Boooo! hisssss! What “management functions”??? You could have explained your reasons!

    A little insensitive of you considering the state of Reiserfs.

    And, hey, you could have just included it anyways, just to see what gives.

    This makes you seem shortsighted and such a filesystem geek bore. ugh.

  9. gus3 / Jul 15 2008 11:43 am

    “You could have explained your reasons”? What do you think I did?
    “Insensitive”? To whom? Whose life was made more difficult?
    Grow a skin.

  10. x / Jul 15 2008 3:54 pm

    [spam link comment deleted]

  11. I Am, Therefore I Think / Jul 27 2008 1:20 am

    Scratching an Itch

    In a prior article, I pointed out that the first filesystem mount in Linux can use only one XFS partition, due to the handling of the root= and rootflags= kernel command-line parameters. Ext3, JFS, and ReiserFS with external journals contain

  12. I Am, Therefore I Think / Sep 20 2008 1:08 pm

    The Fastest Filesystem: Update

    I’m sure I’m not the first, but I’ve made an interesting discovery about filesystems: System load affects data path speed. Yes, I’m good at stating the obvious. XFS did very well on my desktop system through the summer, but that

  13. Cormac / Oct 31 2008 12:04 pm

    I’ve been using XFS for a number of years now, and I’ve got to say it’s just awesome and reliable. I only once had a corruption issue, due to a hard reset, and it resulted in only one file being damaged.
    I have 3 systems running XFS, all super fast, especially the dual Raptors in RAID 1.

  14. Shane Kerns / Jan 12 2009 8:10 pm

    XFS seems to be quite a bit faster than even ext4, based on my personal benchmark tests and the ones on Phoronix. That means XFS stacks up pretty well against even the latest filesystems; however, the lack of sufficient development on XFS may leave it in the dust as time moves on. I would love it if XFS could be benchmarked against ZFS and Btrfs (still in development).
