Avoid a memory page allocation on mount(2)
While working on crun, I was surprised by how much time the kernel spent in the copy_mount_options function. A container runtime issues a large number of mount(2) syscalls during startup (bind mounts, proc, sysfs, devtmpfs, and more), and many of them have no extra options to pass. In this common case, copy_mount_options takes most of the time.
The root cause was passing an empty string instead of NULL as the data argument when there are no options for the mount syscall. On every one of those calls the kernel allocated a full memory page and attempted a copy from user space, adding measurable overhead.
The data option in the mount(2) syscall allows the user to pass down
from userspace to the kernel some additional options. How these
options are interpreted is specific to the file system. Generally it
is a comma-separated string of values.
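As an illustration of what such a string looks like, here is a minimal sketch that joins individual options into the comma-separated form. The helper `join_mount_options` is hypothetical (it is not part of libc or crun), and `size=16m` and `mode=0755` are only example tmpfs options.

```c
#include <stdio.h>
#include <string.h>

/* For reference, the prototype from <sys/mount.h>:
 *   int mount(const char *source, const char *target,
 *             const char *filesystemtype, unsigned long mountflags,
 *             const void *data);
 */

/* Hypothetical helper: join n options into buf as "a,b,c".
 * Returns the length written, or -1 if buf is too small. */
static int join_mount_options(char *buf, size_t size,
                              const char *const *opts, size_t n)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        /* Prefix every option except the first with a comma. */
        int len = snprintf(buf + used, size - used, "%s%s",
                           i > 0 ? "," : "", opts[i]);
        if (len < 0 || (size_t)len >= size - used)
            return -1;            /* output would have been truncated */
        used += (size_t)len;
    }
    return (int)used;
}
```

Joining `size=16m` and `mode=0755` this way produces the string `size=16m,mode=0755`, which could then be passed as the last argument of mount(2) for a tmpfs.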
On a mount, the kernel internally allocates a page of memory and
copies data into it. If the whole page cannot be copied from user space
because a fault happened, the rest of the buffer is zeroed with memset.
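The behaviour just described can be modelled in user space. The sketch below is not kernel code: the name `model_copy_mount_options`, the fixed 4096-byte page size, and the `readable` parameter (which simulates how many bytes can be read before a fault) are all assumptions made for illustration, as is returning `-EFAULT` when nothing at all can be read.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096  /* assumed page size for this model */

/* Userspace model of the copy described above; not kernel code.
 * `readable` is a hypothetical parameter: the number of bytes that can
 * be read from `data` before a fault would occur.
 * On success, *out is NULL (no data given) or a zero-padded page that
 * the caller must free. */
static int model_copy_mount_options(const void *data, size_t readable,
                                    void **out)
{
    *out = NULL;
    if (data == NULL)
        return 0;                 /* no allocation, no copy */

    char *page = malloc(MODEL_PAGE_SIZE);   /* one page per call */
    if (page == NULL)
        return -ENOMEM;

    size_t copied = readable < MODEL_PAGE_SIZE ? readable : MODEL_PAGE_SIZE;
    if (copied == 0) {            /* nothing readable at all */
        free(page);
        return -EFAULT;
    }
    memcpy(page, data, copied);
    /* Zero whatever part of the page could not be filled. */
    memset(page + copied, 0, MODEL_PAGE_SIZE - copied);

    *out = page;
    return 0;
}
```

Note that in this model an empty string still costs a page: the pointer is not NULL, so the allocation and the copy both happen even though the options carry no information.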
If there are no options to pass down, using NULL is preferable to
the empty string, as the kernel will not allocate a memory page and
won’t attempt any copy from user space.
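One way to apply this in a caller is to normalize the options before the syscall. The following is a minimal sketch assuming Linux and glibc; `mount_data_or_null` and `mount_with_options` are hypothetical names and are not taken from crun.

```c
#include <stddef.h>
#include <sys/mount.h>

/* Hypothetical helper: map a missing or empty option string to NULL. */
static const void *mount_data_or_null(const char *options)
{
    if (options == NULL || options[0] == '\0')
        return NULL;
    return options;
}

/* Hypothetical wrapper around mount(2) that never passes "" as data. */
static int mount_with_options(const char *source, const char *target,
                              const char *fstype, unsigned long flags,
                              const char *options)
{
    return mount(source, target, fstype, flags,
                 mount_data_or_null(options));
}
```

Funnelling every mount through a wrapper like this keeps the check in one place, so callers can keep using an empty string to mean "no options".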
To measure the improvement, I ran the following program:
|
|
and I got these results:
|
|
Replacing the empty options string with NULL:
|
|
That is almost 12 times faster!
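To try a comparison like this on your own system, a rough sketch of a timing helper follows. It is a hypothetical benchmark, not the program used for the measurements above: it mounts and unmounts a tmpfs in a loop, and the function name `bench_mount` and the choice of tmpfs are assumptions. Since creating a tmpfs has its own cost, the ratio it reports may differ from the one quoted here.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mount.h>
#include <time.h>

/* Hypothetical benchmark: mount and unmount a tmpfs on `target`
 * `iterations` times, passing `data` as the mount(2) data argument.
 * Returns the elapsed time in seconds, or -1.0 if a mount or umount
 * call fails (mounting normally requires CAP_SYS_ADMIN). */
static double bench_mount(const char *target, const void *data,
                          int iterations)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        if (mount("none", target, "tmpfs", 0, data) < 0) {
            perror("mount");
            return -1.0;
        }
        if (umount(target) < 0) {
            perror("umount");
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(end.tv_sec - start.tv_sec)
           + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Run as root against an empty directory, calling it once with `""` and once with `NULL` as `data` and the same iteration count gives two timings that can be compared directly.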