I have been working on a new feature for the prctl(2) system call that addresses a common security concern with container runtimes.
On a Linux system, /proc contains many interesting files about each process. One of them is /proc/[pid]/exe, which points to the executable file that was used to launch the process. The proc(5) man page states the following:
/proc/[pid]/exe
Under Linux 2.2 and later, this file is a symbolic link containing the actual pathname of the executed command. This symbolic link can be dereferenced normally; attempting to open it will open the executable. You can even type /proc/[pid]/exe to run another copy of the same executable that is being run by process [pid]. If the pathname has been unlinked, the symbolic link will contain the string '(deleted)' appended to the original pathname. In a multithreaded process, the contents of this symbolic link are not available if the main thread has already terminated (typically by calling pthread_exit(3)).
Being able to reach the executable through this link can be useful in certain situations, but it can also pose a security risk. In particular, /proc/self/exe became notorious with CVE-2019-5736, a vulnerability that allowed attackers to escape from a container and gain access to the host! A lot has been written about this vulnerability; to learn more about the issue, I suggest reading https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/.
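The short version is that the attacker was able to overwrite the container runtime executable file on the host by taking advantage of /proc/self/exe: even from inside the container, the exe link of the runtime process still resolves to the runtime binary on the host.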
The workaround implemented in container runtimes was either to bind mount the runtime executable read-only, or to copy the runtime executable and re-exec itself from that copy before handing execution over to the container.
To solve the root problem, I've proposed a new option for prctl() called PR_HIDE_SELF_EXE. This feature makes any access to /proc/self/exe always return ENOENT, effectively preventing a process from accessing its own executable file.
It would have been better if the kernel had never allowed this kind of issue in the first place, but at this point any change to how /proc/self/exe works would be a breaking change. prctl(PR_HIDE_SELF_EXE), instead, is not a breaking change: a program must opt in to use the feature, and programs that don't use it are not affected.
Once the PR_HIDE_SELF_EXE flag has been set, it cannot be cleared; it is cleared automatically only when the process calls execve again.
Here is an example of how it would look:
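```c
#include <stdio.h>
#include <fcntl.h>
#include <errno.h>
#include <unistd.h>
#include <sys/prctl.h>

#ifndef PR_HIDE_SELF_EXE
#error "PR_HIDE_SELF_EXE is only defined by headers from a patched kernel"
#endif

int main(void)
{
    int fd;

    /* Before opting in: /proc/self/exe can be opened normally. */
    errno = 0;
    fd = open("/proc/self/exe", O_RDONLY);
    printf("Got fd: %d (%m)\n", fd);
    if (fd >= 0)
        close(fd);

    /* Opt in; the flag stays set until the next execve(2). */
    if (prctl(PR_HIDE_SELF_EXE, 1, 0, 0, 0) < 0) {
        perror("prctl");
        return 1;
    }

    /* After opting in: the same open() fails with ENOENT. */
    errno = 0;
    fd = open("/proc/self/exe", O_RDONLY);
    printf("Got fd: %d (%m)\n", fd);
    return 0;
}
```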
and then running it:

```
$ gcc -o hide hide.c
$ ./hide
Got fd: 3 (Success)
Got fd: -1 (No such file or directory)
```