Users have occasionally asked why they see two /sys/fs/cgroup mounts
in their unprivileged container.
When working with unprivileged containers in Podman, users often
notice two /sys/fs/cgroup mounts if the container is not using a new
network namespace.
The Limitation of Unprivileged Users
An unprivileged user, by definition, lacks certain permissions that
are available to the root user. One of these limitations is the
inability to mount a fresh /sys filesystem within a new user
namespace. Such a mount is allowed only if a /sys filesystem is
already mounted and accessible in the current namespace, and the user
namespace also owns the current network namespace.
When these conditions are not met, Podman uses a bind mount from the
/sys filesystem of the host to provide the container with a /sys
filesystem.
Cross-Namespace Bind Mounts
A consequence of a bind mount that crosses two user namespaces is that
the kernel automatically "locks" the new mount, treating it, together
with the mounts beneath it, as a single entity. This prevents the
inner container from unmounting the /sys/fs/cgroup mount, as it is
considered part of the /sys mount itself.
New cgroup mount
The /sys/fs/cgroup mount, embedded within the /sys mount, refers to
the host environment's cgroup mount. A fresh /sys/fs/cgroup mount is
needed for the container, which is then mounted on top of the existing
embedded mount.
The consequence of this approach is the appearance of two
/sys/fs/cgroup mounts within the container, as can be seen in the
following example:
$ podman run --rm -ti --user podman quay.io/podman/stable podman run --rm --network=host fedora findmnt -R /sys
Resolved "fedora" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull registry.fedoraproject.org/fedora:latest...
Getting image source signatures
Copying blob 718a00fe3212 done |
Copying config 368a084ba1 done |
Writing manifest to image destination
TARGET SOURCE FSTYPE OPTIONS
/sys sysfs sysfs ro,nosuid,nodev,noexec,relatime,seclabel
|-/sys/fs/cgroup cgroup2 cgroup2 ro,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot
| `-/sys/fs/cgroup cgroup2 cgroup2 ro,nosuid,nodev,noexec,relatime,seclabel,nsdelegate,memory_recursiveprot
|-/sys/firmware tmpfs tmpfs ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/firmware tmpfs tmpfs ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/fs/selinux tmpfs tmpfs ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/fs/selinux tmpfs tmpfs ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/dev/block tmpfs tmpfs ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/dev/block tmpfs tmpfs ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
|-/sys/devices/virtual/powercap tmpfs tmpfs ro,relatime,context="system_u:object_r:container_file_t:s0:c210,c329",size=0k,uid=1000,gid=1000,inode64
| `-/sys/devices/virtual/powercap tmpfs tmpfs ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
`-/sys/kernel tmpfs tmpfs ro,relatime,seclabel,size=0k,uid=100999,gid=100999,inode64
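
The stacking shown above can also be spotted without findmnt. The following is a minimal sketch, not part of Podman: the function name stacked_mounts is hypothetical, and it assumes a Linux system where /proc/self/mountinfo is readable and where the fifth field of each line is the mount point. It prints every mount point that occurs more than once, preceded by the number of occurrences.

```shell
#!/bin/sh
# Hypothetical helper (not a Podman command): list mount points that
# appear more than once in a mountinfo-style file.
# Field 5 of each /proc/<pid>/mountinfo line is the mount point.
stacked_mounts() {
    # $1: path to a mountinfo file; defaults to this process's own view.
    awk '{ count[$5]++ }
         END { for (m in count) if (count[m] > 1) print count[m], m }' \
        "${1:-/proc/self/mountinfo}" | sort -k2
}

# Only inspect the live system when the file is available (Linux).
if [ -r /proc/self/mountinfo ]; then
    stacked_mounts
fi
```

Run inside a container like the one in the example, this would be expected to report /sys/fs/cgroup among the mount points mounted more than once. A mount point listed here is not necessarily a problem: it only indicates that one mount has been placed on top of another at the same path.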