Basic Loadable Linux Kernel Module Example

In this tutorial, we’re going to walk through the simplest possible loadable Linux kernel module and discuss every last component of it. The source is entirely functional and was written on Ubuntu Server 16.04 LTS (kernel version 4.4.0) so please feel free to clone it, modify it, and try your hand at kernel coding. The repo comes complete with instructions on how to compile, load, unload, and watch for log entries from our module in addition to the module source itself.

Source code can be found here.

More in-depth tutorials on the subject of loadable Linux kernel modules (drivers, more specifically) can be found in the official documentation.

Introduction

Kernel programming can be intimidating. C by itself can be intimidating, even to experienced programmers. There are many pitfalls to be aware of and many ways for even the most veteran coder to slip up. In kernel land, the consequences of these errors are magnified, because code executing within the context of the kernel has absolute authority in the system. Kernel C also differs from the C you’ll write in user space: the language is the same, but there is no standard C library to lean on and the kernel has its own APIs and conventions. In user space, runaway processes can be killed by the operating system, and their resource consumption can be constrained and controlled. In the kernel, there is no such protection.

It’s very important to realize that all of the constraints placed upon user space applications - written in C or otherwise - are imposed by the kernel. The perfect example of such a protection is everyone’s favorite C runtime error: the segmentation fault. When a user space process touches memory it has no right to touch, the kernel kills that one process and everything else carries on. Kernel code has no such safety net. A bad memory access inside the kernel produces a kernel “oops” or an outright panic rather than a tidy segmentation fault, and a stray write to a valid address can silently corrupt any part of the system. There are mechanisms in place to dissuade kernel code running in dynamically loaded modules from altering certain structures (read-only mappings, for example) but code running with kernel privileges can ultimately work around them; nothing can truly stop you, not even other parts of the kernel.
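
To see the user space side of that protection in action, here is a minimal sketch in ordinary user space C (it is not kernel code). The function name demo_readonly_write_is_fatal is made up for this illustration. It forks a child process that writes to a page mapped read-only and then reports whether the kernel ended that child with SIGSEGV.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if the kernel killed the child with SIGSEGV, 0 if the child
 * ended some other way, and -1 if the experiment could not be set up.
 */
int demo_readonly_write_is_fatal(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    volatile char *page;
    pid_t pid;
    int status = 0;

    if (page_size <= 0)
        return -1;

    /* One anonymous page that may be read but never written. */
    page = mmap(NULL, (size_t)page_size, PROT_READ,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    pid = fork();
    if (pid < 0) {
        munmap((void *)page, (size_t)page_size);
        return -1;
    }

    if (pid == 0) {
        signal(SIGSEGV, SIG_DFL);  /* make sure the default action applies */
        page[0] = 'x';             /* forbidden write: the kernel steps in */
        _exit(0);                  /* only reached if the write was allowed */
    }

    /* Parent: find out how the child ended. */
    if (waitpid(pid, &status, 0) != pid) {
        munmap((void *)page, (size_t)page_size);
        return -1;
    }
    munmap((void *)page, (size_t)page_size);

    return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Only the child dies; the parent keeps running and can inspect what happened. That containment is exactly what kernel code does not get.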

The stakes are high when writing kernel code. It forces you to adopt a certain depth of consideration and respect for the code you write because your code runs with absolute authority and can change anything anywhere in the system with impunity. It can also destabilize your entire operating system without warning in a small fraction of a second.

But this extraordinary power (and the ensuing risks) aren’t what discourage most people from trying their hand at writing kernel code. Kernel land is a scary place and has a seemingly high barrier to entry. Ask your average developer how they’d go about writing some kernel code and most of them would be able to do little more than shrug.

There are a couple of components to this project: the Makefile and the module source code itself. Both components are surprisingly straightforward.

The Makefile

The Makefile provides instructions to the “make” command about how to compile and assemble the primary build artifact: the kernel object, which has an extension of .ko. If you’re running on Ubuntu, you’ll need to install the make application and some basic compilation tools. You’ll also need the headers for your running kernel; if the directory /lib/modules/$(uname -r)/build does not exist on your machine, install the linux-headers-$(uname -r) package as well. On any somewhat recent Ubuntu distro, run the following to install the build tools:

$ sudo apt-get install build-essential

The module

The module itself is pretty straightforward. It contains an entry function that is executed when the module is first loaded and an exit function that is executed when the module is unloaded. These functions are marked with the __init and __exit macros, respectively, and registered with the module_init() and module_exit() macros so that the kernel knows which function to call on load and which to call on unload.

In this simple example, these are the only two functions in our module. Anything can be built on top of this basic module.

Deep dive

The Makefile

This particular Makefile is super simple. It contains two targets: all and clean. The “all” target simply compiles the module into a directly loadable .ko file. The “clean” target deletes build artifacts. The file looks like:

obj-m += module-skeleton.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

It’s very important to note the hard tabs in this file. Make is pretty ancient so it won’t work with soft tabs (spaces). Make sure (hah! pun not intended) to use hard tab characters for indentation within a Makefile.

The obj-m line tells the kernel build system (kbuild) which object file should be built as a loadable module (the “m” stands for module). Kbuild compiles module-skeleton.c into module-skeleton.o and then links that into the loadable module-skeleton.ko. In short, an object file contains compiled machine code, along with symbol and relocation information, that hasn’t been linked by a linker yet. That is to say that object files have yet to be combined with other object files in order to produce a final, loadable binary.

In this Makefile, we define two targets: “all” and “clean”. We define the “all” target as:

make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

The -C argument here tells make to change directory to /lib/modules/$(shell uname -r)/build before doing anything else. The $(shell uname -r) part expands to the version of the running kernel, and that directory holds the kernel headers, configuration, and build rules for it. By building our module from this directory, we’re changing our build context to include everything we need to make our module work with this exact kernel.

The M=$(PWD) argument indicates to make that we’re building an external module and that the source code for the module we’re building is in our current working directory.

This is ultimately a recursive call to the make application. By defining our “all” target as a recursive make invocation with a few extra arguments, we save ourselves the work of typing all that out every time we want to build our module. The “modules” component indicates that we’re building a loadable module. It is actually redundant, because “modules” is the default target when M= is given, but it’s nice to have since it’s explicit.

The “clean” target employs all the same reasoning with the exception that the recursive call to make specifies the “clean” target which, in this case, is not redundant. It simply indicates to the make application that we want to delete build artifacts produced as a byproduct of building our module.

For more information on making kernel modules, please refer to the official documentation.

The module itself

The module source itself is pretty simple also but let’s pick it apart piecemeal.

Includes

First and foremost we have our includes:

#include <linux/module.h>  /* Needed by all kernel modules */
#include <linux/kernel.h>  /* Needed for loglevels (KERN_WARNING, KERN_EMERG, KERN_INFO, etc.) */
#include <linux/init.h>    /* Needed for __init and __exit macros. */
#include <linux/slab.h>    /* kmalloc */

The linux/module.h include is required by all kernel modules. It provides a ton of plumbing required for modules including but not limited to:

MODULE_LICENSE()
MODULE_AUTHOR()
MODULE_DESCRIPTION()

We’ll come back to these macros in a moment.

The linux/kernel.h include provides a boatload of basic macros and helper functions that are very commonly used in the kernel. The most prominent in our simple example is:

printk()

The linux/init.h include provides our beloved __init and __exit macros, which we use to mark the entry point and exit point functions of our module.

The linux/slab.h include provides for kmalloc() (the kernel version of malloc) which we don’t actually use anywhere in this module but is critical to pretty much any significant module you could hope to build in the future. We include it here simply because its prominence alone makes it noteworthy.

Entry point

One of the most important aspects of our module is the entry point. That is: the function in our module that is executed the instant the module is loaded into the kernel.

static int __init onload(void) {
  printk(KERN_EMERG "Loadable module initialized\n");
  return 0;
}

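In this example, the module simply logs a message by way of the printk() function and the KERN_EMERG log level. printk() writes to the kernel log buffer (which you can read with dmesg), and on Ubuntu the system logger copies kernel messages into the /var/log/syslog file. KERN_EMERG is the most severe log level and is not strictly necessary here. It guarantees the message is not filtered out by the console log level, but emergency messages may also be broadcast to every logged-in terminal, so KERN_INFO is the more conventional choice for a message like this one.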
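
Notice that there is no comma between KERN_EMERG and the message in the printk() call. As far as I can tell from the kernel’s linux/kern_levels.h, each log level macro is just a short string literal (an ASCII SOH byte followed by a digit from 0 to 7) that the compiler glues onto the front of the format string. The sketch below is plain user space C that mimics the idea; the DEMO_* macros and both functions are hypothetical stand-ins, not kernel APIs.

```c
/*
 * Hypothetical stand-ins that mirror the shape of the kernel's KERN_*
 * macros: a "\001" byte followed by a single digit, 0 (most severe)
 * through 7 (least severe).
 */
#define DEMO_SOH     "\001"
#define DEMO_EMERG   DEMO_SOH "0"  /* system is unusable */
#define DEMO_WARNING DEMO_SOH "4"  /* warning conditions */
#define DEMO_INFO    DEMO_SOH "6"  /* informational */

/* Returns the level (0-7) encoded at the start of msg, or -1 if none. */
int demo_loglevel(const char *msg)
{
    if (msg[0] == '\001' && msg[1] >= '0' && msg[1] <= '7')
        return msg[1] - '0';
    return -1;
}

/* Returns the message text with any level prefix skipped. */
const char *demo_skip_level(const char *msg)
{
    return demo_loglevel(msg) >= 0 ? msg + 2 : msg;
}
```

Because adjacent string literals are concatenated at compile time, DEMO_EMERG "Loadable module initialized\n" is a single string whose first two bytes carry the level, which is the same trick the real printk() call relies on.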

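The __init macro does not make this function the entry point by itself. It marks the function as initialization code so that the kernel can free the memory it occupies once it has run. The function becomes the entry point when it is passed to module_init(), which we’ll get to shortly. The function name can be anything.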

Returning 0 from this function indicates the module was successfully initialized. Returning anything else indicates to the greater kernel that something went wrong and the module could not be loaded. By convention, failures are reported as a negative errno value such as -ENOMEM.
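
As a rough illustration of that return value contract, here is a user space sketch of an entry point that can actually fail. It is an analogue, not kernel code: malloc() and free() stand in for kmalloc() and kfree(), the demo_* names are invented, and the buffer_size parameter exists only so the failure path can be exercised (a real entry point takes no arguments).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* State the pretend module owns while it is "loaded". */
static char *demo_buffer;

/*
 * Entry point sketch: returns 0 on success or a negative errno value
 * on failure, which is the convention kernel init functions follow.
 */
int demo_onload(size_t buffer_size)
{
    /* Assumption: in a real module this would be kmalloc(size, GFP_KERNEL). */
    demo_buffer = malloc(buffer_size);
    if (!demo_buffer)
        return -ENOMEM;  /* tell the loader that initialization failed */

    return 0;            /* success: the module stays loaded */
}

/* Exit point sketch: releases whatever the entry point acquired. */
void demo_onunload(void)
{
    free(demo_buffer);   /* kfree() in a real module */
    demo_buffer = NULL;
}
```

When a real entry point returns a non-zero value, the load fails and the exit function is never called, so anything acquired before the failure has to be released before returning.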

Exit point

Much like the entry point of our module, we define a function to be executed every time our module is removed from the kernel. Again, it’s a simple function that logs to /var/log/syslog:

static void __exit onunload(void) {
  printk(KERN_EMERG "Loadable module removed\n");
}

The exit point of our module doesn’t provide a return value. The reason is not that it can’t go wrong - it can always cause a kernel panic, if by no other means than directly invoking the panic() function, which halts the entire kernel and crashes the system - but that there is nothing useful the kernel could do with a failure code. By the time the exit function runs, the kernel has already committed to removing the module (a loadable module can even be forcefully unloaded), so a return code wouldn’t much matter to the kernel.

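Much like __init on our module entry point, the __exit macro marks this function as cleanup code; it can be discarded entirely when the same code is built directly into the kernel, where it can never be unloaded. It is the module_exit() call, covered next, that tells the kernel to execute this function when the module is removed.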

Entry and exit point registration

The module source also contains these two lines:

module_init(onload);
module_exit(onunload);

These macros are what actually register our entry and exit point functions. The __init and __exit macros only annotate the functions; module_init() and module_exit() tell the greater kernel which function to call when the module is loaded and which to call when it is removed.

Module metadata

In our module, we invoke a few optional macros:

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Tyler Nichols <tyler.n.89@gmail.com>");
MODULE_DESCRIPTION("A simple skeleton for a loadable Linux kernel module.");

The MODULE_LICENSE("GPL") line is the only really impactful one here. On the kernel version used in this tutorial you can decide to omit it but if you do, you’ll see a warning in /var/log/syslog explaining that your kernel has been “tainted” as soon as you load your module. Your kernel and your module will both continue to work just fine for an example this simple. This message is printed to indicate to you that your kernel has loaded a module that does not declare a GPL-compatible license. The practical consequence is that such a module cannot use kernel symbols that are exported as GPL-only.

The MODULE_AUTHOR() and MODULE_DESCRIPTION() macros simply associate some metadata with your module to indicate who wrote the module and a brief description of its purpose, respectively. It’s considered best practice to include all three of these macros but none of them are strictly necessary.
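
Putting the pieces together, the snippets from this walkthrough assemble into roughly the file below. Treat it as a sketch: the ordering is my assumption and the file in the repository may be laid out differently. The #else branch is not part of the original source. It is a hypothetical user space shim (the demo_* names are invented) that lets the same logic compile with an ordinary C compiler, which never defines __KERNEL__, so the control flow can be exercised without loading anything into a kernel.

```c
#ifdef __KERNEL__
#include <linux/module.h>  /* Needed by all kernel modules */
#include <linux/kernel.h>  /* Needed for loglevels (KERN_EMERG, etc.) */
#include <linux/init.h>    /* Needed for __init and __exit macros */
#include <linux/slab.h>    /* kmalloc (unused in this skeleton) */
#else
/*
 * Assumption: everything in this branch is a made-up stand-in for
 * experimenting outside the kernel. None of it is a kernel API.
 */
#include <stdio.h>
#define __init
#define __exit
#define KERN_EMERG ""
#define printk(...) printf(__VA_ARGS__)
#define module_init(fn) int (*demo_init)(void) = fn
#define module_exit(fn) void (*demo_exit)(void) = fn
#define MODULE_LICENSE(s) const char demo_license[] = s
#define MODULE_AUTHOR(s) const char demo_author[] = s
#define MODULE_DESCRIPTION(s) const char demo_description[] = s
#endif

/* Entry point: runs once when the module is loaded. */
static int __init onload(void) {
  printk(KERN_EMERG "Loadable module initialized\n");
  return 0;  /* 0 means the module initialized successfully */
}

/* Exit point: runs once when the module is removed. */
static void __exit onunload(void) {
  printk(KERN_EMERG "Loadable module removed\n");
}

/* Register the entry and exit points. */
module_init(onload);
module_exit(onunload);

/* Module metadata. */
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Tyler Nichols <tyler.n.89@gmail.com>");
MODULE_DESCRIPTION("A simple skeleton for a loadable Linux kernel module.");
```

When kbuild compiles this file, __KERNEL__ is defined and only the real kernel headers are used. Compiled any other way, calling demo_init() and then demo_exit() walks through the same load and unload sequence with printf() standing in for printk().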

Building, loading, and unloading

All of the instructions for building, loading, and unloading the module can be found in the README. We’ll cover them here also for the sake of completeness.

Building

To build the module from the root of the repository:

$ cd src/
$ make

Once you’ve built the module, you’ll want to monitor the /var/log/syslog file for the output produced by the module:

$ tail -f /var/log/syslog

To load the kernel object (file ending in .ko) into the kernel, run:

$ sudo insmod module-skeleton.ko
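
If you’d like to confirm from a program that the module really is loaded, one option is to look for it in /proc/modules, which is the same list that lsmod prints. The helper below is a small user space sketch; demo_module_is_loaded is a name invented for this example. One thing to watch for: the kernel converts dashes in module names to underscores, so our module shows up as module_skeleton.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 1 if `name` is listed in /proc/modules, 0 if it is not,
 * and -1 if the file cannot be opened.
 */
int demo_module_is_loaded(const char *name)
{
    FILE *fp = fopen("/proc/modules", "r");
    char chunk[512];
    size_t name_len = strlen(name);
    int at_line_start = 1;
    int found = 0;

    if (!fp)
        return -1;

    while (!found && fgets(chunk, sizeof chunk, fp)) {
        /* Each line begins with the module name followed by a space. */
        if (at_line_start &&
            strncmp(chunk, name, name_len) == 0 &&
            chunk[name_len] == ' ')
            found = 1;

        /* A chunk without a newline means the line continues. */
        at_line_start = strchr(chunk, '\n') != NULL;
    }

    fclose(fp);
    return found;
}
```

After the insmod command above, demo_module_is_loaded("module_skeleton") should report 1, and it should go back to 0 once the module has been removed.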

To unload the module, run:

$ sudo rmmod module-skeleton

Finally, to delete the .ko and all other build artifacts (basically everything but the Makefile and the source file), run:

$ make clean

And that’s it! Happy hacking!