Why do I have two /sys/fs/cgroup in my container

Users occasionally ask why they see two /sys/fs/cgroup mounts in their unprivileged container. With Podman, this happens when an unprivileged container does not use a new network namespace. The duplication is not a bug but a consequence of how the kernel handles bind mounts that cross user namespace boundaries, combined with the need to give the container a cgroup mount of its own.

The Limitation of Unprivileged Users

An unprivileged user, by definition, lacks some of the permissions available to the root user. One such limitation is that a fresh /sys filesystem cannot be mounted within a new user namespace unless two conditions hold: a /sys filesystem is already mounted and accessible in the current namespace, and the user namespace owns the current network namespace.

When these conditions are not met, Podman instead bind mounts the host's /sys filesystem to provide the container with a /sys filesystem.
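The two conditions can be probed outside of Podman with unshare(1) from util-linux. The following is only a sketch: `try_fresh_sysfs` is a hypothetical helper name, and the result depends on the kernel and on whether unprivileged user namespaces are enabled on the host. If the conditions described above apply, the call that also creates a new network namespace is the one expected to succeed.

```shell
#!/bin/sh
# Hypothetical helper (not part of Podman): try to mount a fresh sysfs
# inside a new user namespace and a new mount namespace.
# Extra arguments, for example --net, are passed through to unshare(1).
try_fresh_sysfs() {
    if unshare --user --map-root-user --mount "$@" \
        mount -t sysfs sysfs /sys 2>/dev/null; then
        echo "allowed"
    else
        echo "denied"
    fi
}

# The network namespace is still the host's, so the user namespace does not own it.
echo "shared network namespace: $(try_fresh_sysfs)"

# A network namespace created together with the user namespace is owned by it.
echo "new network namespace:    $(try_fresh_sysfs --net)"
```

The mount happens in a throwaway mount namespace, so nothing changes on the host whichever way the probe goes.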

Cross-Namespace Bind Mounts

A consequence of a bind mount that crosses two user namespaces is that the kernel automatically 'locks' the new mount together with its submounts, treating the whole tree as a single entity. This prevents the inner container from unmounting the /sys/fs/cgroup mount, because it is considered part of the /sys mount itself.
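The same locking can be observed without Podman, since mounts inherited by a mount namespace owned by a less privileged user namespace are locked in a similar way. The sketch below assumes unshare(1) from util-linux and unprivileged user namespaces; `try_umount_inherited` is a hypothetical helper name, and this is an analogous experiment rather than the code path Podman takes.

```shell
#!/bin/sh
# Hypothetical helper: enter a new user and mount namespace, then try to
# unmount a mount inherited from the more privileged namespace.
try_umount_inherited() {
    if unshare --user --map-root-user --mount \
        umount "$1" 2>/dev/null; then
        echo "unmounted"
    else
        echo "refused"
    fi
}

echo "umount /sys/fs/cgroup: $(try_umount_inherited /sys/fs/cgroup)"
```

A "refused" result is also printed when user namespaces are unavailable, so the probe is only meaningful on hosts where `unshare --user` works.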

New cgroup mount

The /sys/fs/cgroup mount embedded within the /sys mount refers to the host environment's cgroup mount. The container needs a fresh /sys/fs/cgroup mount of its own, and since the embedded one cannot be unmounted, the new one is mounted on top of it.

The consequence of this approach is that two /sys/fs/cgroup mounts appear within the container, as can be seen in the following example:

$ podman run --rm -ti --user podman quay.io/podman/stable podman run --rm --network=host fedora findmnt -R /sys
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 718a00fe3212 done   | 
Copying config 368a084ba1 done   | 
Writing manifest to image destination
TARGET                            SOURCE  FSTYPE  OPTIONS
/sys                              sysfs   sysfs   ro,nosuid,nodev,noexec,relatime,seclabel
|-/sys/fs/cgroup                  cgroup2 cgroup2 ro,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot
| `-/sys/fs/cgroup                cgroup2 cgroup2 ro,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot
|-/sys/firmware                   tmpfs   tmpfs   ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/firmware                 tmpfs   tmpfs   ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/fs/selinux                 tmpfs   tmpfs   ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/fs/selinux               tmpfs   tmpfs   ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/dev/block                  tmpfs   tmpfs   ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/dev/block                tmpfs   tmpfs   ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/devices/virtual/powercap   tmpfs   tmpfs   ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/devices/virtual/powercap tmpfs   tmpfs   ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
`-/sys/kernel                     tmpfs   tmpfs   ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
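To check for the stacked mounts from a script rather than by reading the findmnt tree, the mount point column of /proc/self/mountinfo can be counted directly. This is a minimal sketch: `count_mounts_at` is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a saved copy of mountinfo.

```shell
#!/bin/sh
# count_mounts_at TARGET [MOUNTINFO_FILE]
# Print how many mounts have TARGET as their mount point.
# In mountinfo, the fifth whitespace-separated field is the mount point.
count_mounts_at() {
    awk -v target="$1" '$5 == target { n++ } END { print n + 0 }' \
        "${2:-/proc/self/mountinfo}"
}

# In a container affected by the behaviour described above, this is
# expected to report more than one mount.
count_mounts_at /sys/fs/cgroup
```

Run inside the container, for example in place of the `findmnt -R /sys` command shown above, a value greater than 1 indicates that a fresh cgroup mount has been stacked on top of the one embedded in the bind-mounted /sys.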
