August 9, 2013 / gus3

Accessing the Raspberry Pi’s 1MHz timer, via kernel driver

Introduction

As it turns out, my previous effort was, shall we say, somewhat off-base. Being a member of the “kmem” group, or running a SUID-root program, isn’t enough to access /dev/mem; the program capability CAP_SYS_RAWIO must also be present.

The root user already has this capability, and passes it to all programs run from a root shell. Non-root users can gain this capability when running a program, if the root user has run the following command:

/sbin/setcap cap_sys_rawio+epi /path/to/program

It also helps to make the program set-group-ID “kmem”:

/usr/bin/chgrp kmem /path/to/program
/usr/bin/chmod g+s /path/to/program

Thus, with the correct group and capability, opening /dev/mem for reading succeeds.
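To check whether those steps actually took effect for a particular binary, a probe like the following can be compiled into it. This is only a sketch: probe_open_ro() is a hypothetical helper of my own, not part of any library, and the errno values it reports (typically EACCES or EPERM when the group or capability is missing) are simply whatever open(2) returns on your system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if open(path, O_RDONLY) succeeds,
 * otherwise the errno value from the failed open(). For /dev/mem,
 * EACCES or EPERM usually means the group or capability is missing.
 */
int probe_open_ro(const char *path)
{
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;
	close(fd);
	return 0;
}
```

For example, probe_open_ro("/dev/mem") returning 0 means the open succeeded; any other value can be handed to strerror() for a readable message.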

There is one catch: the /sbin/setcap program stores program capabilities in a program file’s extended attributes, the same mechanism used for access control lists (ACLs). Thus, /sbin/setcap will fail when a program file is on a FAT filesystem. As I found out after much hair-pulling, it will also fail on an NFS mount. So I took the shortcut and ran from local storage instead.

And then there’s gettimeofday(2). Even though it has its own page in the glibc documentation, it is a standard Unix system call with microsecond resolution. Contrary to what I had thought, its resolution isn’t limited to updates at each timer tick. For many benchmark timing purposes, gettimeofday() will probably be sufficient.

Motivation

The problems with gettimeofday() are twofold:

  1. The values returned aren’t ordinary integers, but instead two integers indicating seconds and microseconds, so arithmetic operations on them require extra code. To compensate, Linux and BSD provide the macros timeradd(), timersub(), and timercmp() in <sys/time.h>.
  2. The values returned are supposed to reflect the Unix epoch. If someone (a sysadmin) or something (an NTP client) changes the system time during a benchmark, gettimeofday() will return jittery values.

The second point is worth noting: Slackware ARM on the Raspberry Pi’s custom kernel will lose a surprising amount of system time under heavy load. I have a cron job set up to adjust the time every 15 minutes, to keep the system timer from going too far astray. Even so, when mostly idle, the system loses ~6ms per adjustment, according to /var/log/messages. (I should point out, in defense of gettimeofday(), that in my performance tests it came out on top in “normal” jitter, with a visibly lower standard deviation than my kernel driver. However, my kernel driver had some calls with <1µs duration, whereas gettimeofday() never had a call return that fast.)

So once again, resorting to the 1MHz timer via /dev/mem seems to be the best bet. Yet, it seems a nuisance to execute the above commands as root after every build of a program (or worse, building the program as root, with the above commands in the Makefile), just to get non-root access to the timer. The ideal solution, at least to my way of thinking, is to expose the timer to all users, read-only, through a character device file.

Once I got it in my head that I could do this, safely, on a read-only basis, I started reading the 3rd edition of Linux Device Drivers by Corbet, Rubini, and Kroah-Hartman, specifically chapter 3, “Char Drivers.” It explains the following concepts:

  • Major and minor device numbers
  • The file operations structure, with pointers to handler functions
  • The file and inode structures (which aren’t germane to this driver)
  • The cdev structure, and its role in global device management

I also took some notes concerning how I wanted the driver to behave, i.e. how I wanted it to respond to requests from user programs. I decided on a very simple subset of the file API to support:

  • open() and close() should work normally.
  • read() should return up to 8 bytes, the length of the timer. If a program requests fewer bytes, that should be permitted; the program knows its intentions. If a program requests more than 8 bytes, the call to read() should still return only 8 bytes.
  • write() should fail, for everyone, since the timer isn’t writable.
  • mmap() should also fail. The timer is part of a larger device control block; it can’t be exposed in a mapped memory page without also exposing much more system-critical information.
  • Asynchronous operations, scatter-gather, and ioctl() aren’t supported.

So basically, the only file operation that needs explicit support in the module is read(). Other methods are left at their defaults, which for most methods means the kernel simply returns an error (mmap(), for instance, fails with ENODEV).

I decided to impose one non-standard restriction: all calls to read() ignore file positioning. The BCM2708 timer runs at 1MHz; because the driver always returns the N least-significant bytes, whatever bytes are returned are guaranteed to have 1MHz granularity. If a user program requests only the lower 3 bytes, the value it sees wraps around every ~16.78 seconds (2^24 = 16,777,216 µs, the range of a 24-bit counter). A programmer may want exactly that behavior; who am I to judge? Similarly, if a programmer wants the lower resolution of the timer’s top N bits, the programmer can read all 8 bytes and shift the result to the right by 64-N bits. The driver provides the low N bytes to a program; what the programmer does with those bytes is not the driver’s concern.
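To make that concrete, here is how a program might handle the values it gets back. This sketch assumes the bytes were read into a zero-initialized uint64_t on a little-endian machine like the Pi, so any bytes not requested stay zero; low_bytes_elapsed() and top_bits() are hypothetical helpers, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: elapsed ticks (microseconds) between two readings
 * of the low n bytes (1 <= n <= 8). Subtracting modulo 2^(8n) stays
 * correct across a single wraparound, e.g. every ~16.78 s when n == 3.
 */
uint64_t low_bytes_elapsed(uint64_t earlier, uint64_t later, unsigned n)
{
	assert(n >= 1 && n <= 8);
	if (n == 8)
		return later - earlier;	/* uint64_t arithmetic already wraps */
	return (later - earlier) & ((UINT64_C(1) << (8 * n)) - 1);
}

/* Hypothetical helper: keep only the top n bits of a full 8-byte reading. */
uint64_t top_bits(uint64_t full, unsigned n)
{
	assert(n >= 1 && n <= 64);
	return full >> (64 - n);
}
```

Masking the difference to 8N bits is what keeps a 3-byte reading usable across the wraparound, as long as no more than one wrap occurs between the two samples.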

So, with these design decisions in hand, it was time to start writing code.

The module prologue

Oddly, I found myself writing the module’s initialization and unloading routines first. Initializing the driver consists of two main steps:

  1. Set up the structures for a Linux character device driver.
  2. Acquire a static pointer to the 1MHz timer.

If either of these steps fails, the initialization function releases any allocated resources, sends a failure report to the kernel log, and then returns the error -ENODEV to indicate initialization failure.

The first step is standard fare. Linux requires two main structures, cdev and file_operations. The former defines “what,” and contains a pointer to the latter, which defines “how.” The C file should contain at least one static instance of each of these structures. A third structure, file, is created by the Linux kernel for every open file on the system. Most character and block drivers need to be aware of this structure, because it contains important tracking information (such as the current file position). However, I chose to ignore the file structure. A simple device should have a simple driver, and the BCM2708 1MHz timer couldn’t be a simpler device.

The second step, acquiring a pointer to the timer, has a specialized Linux kernel function: ioremap(). Each major architecture has its own version of ioremap.c, dedicated to mapping a specific physical address range (typically a device’s registers, as here, rather than ordinary RAM) into the kernel’s virtual address space. A kernel driver can’t simply dereference a physical address: the paging mechanism is still in effect in kernel code, and trying to access an unmapped address will generate a kernel OOPS at best, or a panic at worst. Calling ioremap() gets a pointer to the mapped page, then a separate pointer is set to point to the timer itself within the page. Both these pointers are declared static, so they’re available to the module code, but hidden from external reference.

If the module fails to register the character device properly, or fails to map the timer page, the initialization function immediately calls the unloading routine, which unregisters the character device major/minor numbers.

The module epilogue

Unloading the module is actually very simple: unmap the timer’s (mapped) page, then unregister the major/minor numbers. If the unload function is called during a failed attempt to initialize the module, the timer’s page may not be mapped. In this case, there is no call to iounmap().

Reading the timer

The read() function is the core of the driver. All of the support code is already present in the Linux kernel; the only extra burden is to deal with requests of less than 1 byte, or more than 8 bytes. Less than 1 byte is, of course, zero, so the read request returns immediately, copying nothing. Any requests greater than 8 bytes are truncated to 8 bytes only, because the timer is 64 bits (8 bytes) wide. Once these limitations are accounted for, the only remaining step is to call copy_to_user, passing the addresses of the timer and the user-space destination buffer.
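The rules above can be modeled in user space, which makes them easy to test without loading the module. This is a sketch, not the driver itself: model_timer_read() is a hypothetical function that fills a buffer the way the driver’s read() is meant to, lowest byte first as on the little-endian Pi. The detail worth noticing is the return value: it is the number of bytes actually placed in the buffer (never more than 8), not the number the caller asked for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TIMER_BYTES 8	/* the timer is 64 bits wide */

/*
 * Hypothetical model of the driver's read() rules: a zero-length request
 * copies nothing, larger requests are truncated to 8 bytes, and the bytes
 * delivered are the least significant ones, lowest first. Returns the
 * number of bytes placed in buf.
 */
size_t model_timer_read(uint64_t timer, unsigned char *buf, size_t count)
{
	size_t n = count > TIMER_BYTES ? TIMER_BYTES : count;
	size_t i;

	assert(buf != NULL || n == 0);
	for (i = 0; i < n; i++)
		buf[i] = (unsigned char)(timer >> (8 * i));
	return n;
}
```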

And the rest

The rest? There really isn’t any, unless you count the Makefile. Like I said, a simple device should have a simple driver.

The code

Now that all the explanation is in place, here is the code to make it happen. First, the source file, bcm2708_usec.c:

#include <linux/module.h>
#include <linux/init.h>
#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/cdev.h>
#include <linux/uaccess.h>
#include <linux/io.h>

MODULE_LICENSE("GPL");

/*
 * Major 240 is the base of the "experimental" devices.
 * If this driver enters the kernel tree, these should change.
 */
#define MAJORDEV 240
#define MINORDEV 0

/* prototype for the core function */
ssize_t timer_read(struct file *, char __user *, size_t, loff_t *);

/* other local functions */
static int bcm2708_usec_init(void);
static void bcm2708_usec_cleanup(void);
static int do_bcm2708_usec_init(void);
static void do_bcm2708_usec_deinit(void);

#define TIMER_PAGE_BASE 0x20003000
#define TIMER_OFFSET 4

/* some driver-local storage */
static u8 __iomem *timer_pf;	/* the page frame containing the timer */
static u8 __iomem *timer;	/* byte ptr to the timer itself */

/* NOTE: up to 8 bytes of timer available, enforced in code */

static struct cdev cdev;

/* which init steps succeeded, so cleanup undoes only those */
static bool region_registered;
static bool cdev_added;

static struct file_operations usec_fops = {
	.owner = THIS_MODULE,
	.read = timer_read,
	.write = NULL,
};

/**
 * bcm2708_usec_init - initialize this module's private info
 *
 * Description:
 *   Registers the necessary kernel structures, and
 *   maps the bcm2708's monotonic 1MHz timer into a
 *   kernel page.
 */

static int __init bcm2708_usec_init(void)
{
	dev_t devnum = MKDEV(MAJORDEV, MINORDEV);
	int err;

	/* register the character major/minor */
	err = register_chrdev_region(devnum, 1, "bcm2708_usec");
	if (err)
		goto bail;
	region_registered = true;

	/* set up the chardev control structure (cdev_init also sets .ops) */
	cdev_init(&cdev, &usec_fops);
	cdev.owner = THIS_MODULE;
	err = cdev_add(&cdev, devnum, 1);
	if (err)
		goto bail;
	cdev_added = true;

	/* get the mapping for the page frame containing the timer */
	timer_pf = ioremap(TIMER_PAGE_BASE, SZ_4K);
	if (!timer_pf) {
		err = -ENOMEM;
		goto bail;
	}
	/* and set the pointer to the timer itself */
	timer = timer_pf + TIMER_OFFSET;
	pr_info("bcm2708_usec initialized; timer @ %pK\n", timer);
	return 0;

bail:
	/* undo the developer's brain damage (hopefully) */
	bcm2708_usec_cleanup();
	pr_err("bcm2708_usec failed to initialize, err = %d\n", err);
	return -ENODEV;
}

/**
 * bcm2708_usec_cleanup - release module's allocated resources
 *
 * Description:
 *   Unmaps the module's global page containing the pointer,
 *   then unregisters the driver's kernel structures.
 *   Not marked __exit, because the __init error path calls it too.
 */
static void bcm2708_usec_cleanup(void)
{
	dev_t devnum = MKDEV(MAJORDEV, MINORDEV);

	/* if the timer was mapped (final step of successful module init) */
	if (timer_pf) {
		/* release the mapping */
		iounmap(timer_pf);
		timer_pf = NULL;
		timer = NULL;
	}
	/* remove the cdev, so no new open() can reach this module's code */
	if (cdev_added) {
		cdev_del(&cdev);
		cdev_added = false;
	}
	/* and release the device major/minor allocation, if we got it */
	if (region_registered) {
		unregister_chrdev_region(devnum, 1);
		region_registered = false;
	}
}

/**
 * timer_read - fetch bytes from the BCM2708's 1MHz timer
 * @filp:		The file pointer (guaranteed to be correct after open())
 * @buff:		The buffer receiving the lowest N bytes of the 1MHz timer
 * @count:		How many bytes to transfer (maximum 8 bytes)
 * @offset:		Ignored; every read transfers only the lowest N bytes
 *
 * Description:
 *   This is the core method of the source code here presented. A user program
 *   requests N bytes (N<=8) from the BCM2708's timer, and this routine copies
 *   them to the supplied user-space pointer. Positioning is ignored. All calls
 *   to this routine will copy the lowest N bytes, guaranteeing 1MHz granularity.
 *   There is no synchronization within this scope. Functions called from here
 *   may do their own synchronizations. Otherwise, a simple bounds check is the
 *   only necessary step before calling copy_to_user.
 *
 * Returns:
 *   The number of bytes copied, or -EFAULT if the user buffer is invalid.
 */

ssize_t timer_read(struct file *filp, char __user *buff, size_t count,
		   loff_t *offset)
{
	u64 cur_timer;	/* local snapshot of the current timer */
	u32 hi, lo;
	size_t n;

	/* The null case, return early. Unlikely, but we won't argue. */
	if (count < 1)
		return 0;

	/*
	 * The 64-bit counter is exposed as two 32-bit registers: the low
	 * word at "timer" and the high word 4 bytes above it. Read high,
	 * low, high; if the high word changed, the low word wrapped in
	 * between, so try again.
	 */
	do {
		hi = readl(timer + 4);
		lo = readl(timer);
	} while (hi != readl(timer + 4));
	cur_timer = ((u64)hi << 32) | lo;

	/* Transfer a maximum of 8 bytes. The ARM core runs little-endian,
	 * so the first bytes of cur_timer are the least significant. */
	n = count > 8 ? 8 : count;

	/* copy_to_user returns the number of bytes NOT copied */
	if (copy_to_user(buff, &cur_timer, n))
		return -EFAULT;

	/*
	 * Yes, that's how brain-dead this operation is. Just copying bytes
	 * into a user buffer, and returning the count of bytes copied.
	 */
	return n;
}

/*
 * The following two functions may seem redundant, but this seems to be a
 * Linux kernel code idiom. The users have expectations about how the code
 * behaves; the devs have expectations about how the code looks.
 */

static int __init do_bcm2708_usec_init(void)
{
	return bcm2708_usec_init();
}

static void __exit do_bcm2708_usec_deinit(void)
{
	pr_info("Unloading bcm2708_usec module.\n");
	/*
	 * This routine is the "sane" case, for unloading a successfully-loaded
	 * driver. Others may call bcm2708_usec_cleanup() for their own,
	 * pathological reasons.
	 */
	bcm2708_usec_cleanup();
}

/* End of the code road. Hope you enjoyed the journey. */

module_init(do_bcm2708_usec_init);
module_exit(do_bcm2708_usec_deinit);

And the Makefile to build it:

# remove any special parameters to Make
MAKE = make

ifneq ($(KERNELRELEASE),)
	obj-m := bcm2708_usec.o
else
	KERNELDIR ?= /lib/modules/$(shell uname -r)/build
	PWD := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) modules

endif

clean:
	rm -fv *.ko *.o *.mod.? Module.symvers modules.order

To build the module on a Raspberry Pi, first you will need the kernel source code, preferably for the currently running kernel, with the necessary .config to go with it (for example, /usr/src/linux/.config). By default, the Makefile looks for the kernel tree at /lib/modules/$(uname -r)/build. Save the above two files into the same directory, then as root, type “make”. If all goes well, the local Makefile will redirect the build process to the kernel source’s mechanism for building out-of-tree modules. If the build fails, check for conflicts between the kernel source and its .config file, then type “make clean default” to try again.

If the kernel tree you want to build against isn’t reachable through that default path, the following will invoke the proper make subsystem for the kernel you wish to build this module for:

make KERNELDIR=

Once you get a successful build, the next step is to load the module:

/sbin/insmod ./bcm2708_usec.ko

The #1 cause of failure at this step will be a mismatch between the configurations of the running kernel and the kernel source tree. The error report in this case is “Invalid module format”. The running kernel and the source in the kernel tree must match! If “insmod” reports no error, the module is probably loaded. Confirm this with a simple dmesg. If the module is loaded and initialized, the last line should report the kernel-visible address of the BCM2708 timer.

After a successful build and load, one more step remains to access the timer. Create the char device file:

/bin/mknod -m 444 /dev/bcm2708_usec c 240 0

This file is the program-visible mechanism used to access the timer, bypassing the need for special program capabilities (such as CAP_SYS_RAWIO). Once all these steps are completed, you can confirm that the driver is actually working properly with the following command (try it as both root and non-root!):

od -t u8 /dev/bcm2708_usec

Hopefully, you will see a stream of large decimal values, increasing steadily, until you type Control-C. If each value is instead followed by several zeros, with asterisks marking more skipped zeros, the driver’s read() is reporting more bytes than it actually copied: od asks for a whole buffer at a time, so read() must return only the number of bytes it filled (at most 8). Since only the read() function is implemented, many other conventional programs (which use mmap() or ioctl() instead) won’t work the same. od uses the read() function, which is what we want.

Putting it to use

So what kind of unprivileged program would use the raw BCM2708 timer? Well, a simple demonstration shows how I anticipate using such a “device”:

/* a demo program showing how to access the bcm2708_usec
   timer device (char-major-240 for now). */

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char *argv[])
{
	int timer_file;
	ssize_t result;
	unsigned long long int t1 = 0, t2 = 0;
	unsigned int t3 = 0, t4 = 0;

	/* open device file using kernel call */
	timer_file = open("/dev/bcm2708_usec", O_RDONLY);
	if (timer_file < 0) {
		fputs("open() failed, aborting...\n", stderr);
		return 255;
	}

	/* Notice the lack of file positioning between reads. This is
	   by design, as it really makes no sense for this device. */

	if ((result = read(timer_file, &t1, sizeof(t1))) < 0)
		fputs("read() failed for t1\n", stderr);

	/* a handy macro to reduce redundant typing */
#define READ_TIMER(x) read(timer_file, &x, sizeof(x))

	if ((result = READ_TIMER(t2)) < 0)
		fputs("read() failed for t2\n", stderr);

	printf("t2 - t1 = %llu usec\n", t2 - t1);

	/* the macro is also useful for shorter destination types */

	if ((result = READ_TIMER(t3)) < 0)
		fputs("read() failed for t3\n", stderr);

	sleep(1);

	if ((result = READ_TIMER(t4)) < 0)
		fputs("read() failed for t4\n", stderr);

	/* unsigned subtraction stays correct across a 32-bit wraparound */
	printf("t4 - t3 = %u usec\n", t4 - t3);

	close(timer_file);

	return 0;
}

Open, read, read, read, read, close. I opted for a simple paradigm, in part to minimize the effect on benchmarks of an extra lseek() call. Also, by using sizeof(x) for the number of bytes to transfer, the chances of a buffer overflow are effectively reduced to nil. This is step 1 of secure programming with string buffers, but here it’s simply sensible.
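The demo only checks for a negative return from read(), so a short read would go unnoticed. A slightly stricter wrapper, sketched below with the hypothetical names read_full() and READ_TIMER_CHECKED, keeps the sizeof(x) convention and treats anything other than a complete transfer as an error.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: a single read() of exactly len bytes.
 * Returns 0 on a complete transfer, -1 on error or short read.
 */
int read_full(int fd, void *dst, size_t len)
{
	ssize_t got;

	assert(dst != NULL);
	got = read(fd, dst, len);
	if (got < 0)
		return -1;
	return (size_t)got == len ? 0 : -1;
}

/* sizeof(x) still sets the transfer size, as in the demo's READ_TIMER */
#define READ_TIMER_CHECKED(fd, x) read_full((fd), &(x), sizeof(x))
```

It deliberately doesn’t loop on short reads: with this device, stitching together bytes from two separate reads would mix two different instants of the timer.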

Conclusion—the bigger picture

The stated purpose of the Raspberry Pi is to provide a simple, inexpensive platform for experimentation. For myself, the US$35 (plus shipping, tariffs, and all that) has paid off in spades, giving me the opportunity to attain that Holy Grail of Linux programming: writing a kernel module. A year ago, or even three months ago, such a task wasn’t something I’d consider. But in that tradition of “scratching an itch,” having a Raspberry Pi running Linux opened a door in my mind. All I had to do was step through it.


5 Comments

  1. SpCr / Mar 19 2014 8:22 am

    Nice post ! i have a question. It is something that i didn’t understand. You get the timer address and store it in the timer pointer, but how does timer_read() know, where to read from? I can’t understand if these two are associated somehow.. I hope i was clear of what i asked :S .

    • SpCr / Mar 19 2014 8:45 am

      I just realized that i had to scroll to the right, to see the missing code. :P I got it now.
      Sorry and thanks

Trackbacks

  1. Accessing the Raspberry Pi’s 1MHz timer, via kernel driver | Hallow Demon
  2. Accessing the Raspberry Pi's 1MHz timer, via ke...
  3. Links 11/8/2013: Fedora Flock, Qualcomm Changes Course on Blobs | Techrights
