I have been working on a new option for the prctl(2) system call that addresses a common security concern with container runtimes.
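On a Linux system, it is possible to find many interesting files about a process under /proc. One of them is /proc/[pid]/exe, which points to the executable file that was used to launch the process. The man page for proc(5) states that this file "is a symbolic link containing the actual pathname of the executed command. This symbolic link can be dereferenced normally; attempting to open it will open the executable."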
This can be useful in certain situations, but it can also pose a security risk. In particular, /proc/self/exe gained notoriety with the CVE-2019-5736 vulnerability, which allowed attackers to escape from a container and gain root access on the host!
Much has been written about this vulnerability; for a detailed explanation of the issue, I suggest reading https://unit42.paloaltonetworks.com/breaking-docker-via-runc-explaining-cve-2019-5736/.
The short version is that the attacker was able to overwrite the container runtime executable (runc) on the host by taking advantage of /proc/self/exe.
The workaround implemented in container runtimes was either to use a read-only bind mount of the runtime executable, or to make a copy of the runtime executable and re-exec from that copy before handing execution over to the container.
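The copy-and-re-exec workaround can be sketched as follows. This is a simplified illustration, not runc's actual code, and the helper names are hypothetical. It copies the running binary into an anonymous in-memory file with memfd_create(2) (Linux 3.17+, glibc 2.27+), seals it against modification with fcntl(F_ADD_SEALS), and re-executes from it with fexecve(3), so that /proc/self/exe in the new image refers to the sealed copy rather than to the runtime binary on the host.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Copy fd `in` to fd `out` until EOF. Returns 0 on success, -1 on error. */
static int copy_fd(int in, int out)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return 0; /* EOF */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n;) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
}

/* Copy the running executable into an anonymous memfd and seal it so it
 * can no longer be written, grown, shrunk, or re-sealed.
 * Returns the sealed memfd, or -1 on error. */
int clone_self_exe(void)
{
    int memfd = -1;
    int src = open("/proc/self/exe", O_RDONLY | O_CLOEXEC);
    if (src < 0)
        return -1;

    memfd = memfd_create("self-exe-copy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (memfd < 0 || copy_fd(src, memfd) < 0)
        goto fail;
    if (fcntl(memfd, F_ADD_SEALS,
              F_SEAL_SEAL | F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE) < 0)
        goto fail;

    close(src);
    return memfd;

fail:
    if (memfd >= 0)
        close(memfd);
    close(src);
    return -1;
}

/* Replace the current process image with the sealed copy; only returns on
 * error. In the new image, /proc/self/exe points at the memfd. */
int reexec_from_copy(int memfd, char *const argv[], char *const envp[])
{
    return fexecve(memfd, argv, envp);
}
```

A runtime would call clone_self_exe() at startup and, on success, reexec_from_copy() with its original argv and environment. A real implementation also needs some way to detect that it is already running from the copy, so it does not re-execute in a loop.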
To solve the root problem, I’ve proposed a new option for prctl() called PR_HIDE_SELF_EXE. This feature makes any access to /proc/self/exe always return ENOENT, effectively preventing a process from accessing its own executable file.
It would have been better if the kernel didn’t allow such issues at all, but at this point any change to how /proc/self/exe works would be a breaking change. By contrast, prctl(PR_HIDE_SELF_EXE) is not a breaking change: a program must opt in to use this feature, and programs that don’t use it are unaffected.
Once the PR_HIDE_SELF_EXE flag has been set, the process cannot clear it; it is cleared automatically only when the process calls execve() again.
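The reset on execve() can be checked with a sketch like the following. A forked child sets the flag (using the same hypothetical PR_HIDE_SELF_EXE placeholder value and the assumed arg2 = 1), then execs /bin/sh, which tests whether its own /proc/self/exe exists. If the flag is cleared on execve() as described, the new program sees the link again. The helper name is hypothetical; on kernels without the patch, the prctl() call simply fails and is ignored.

```c
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_HIDE_SELF_EXE
#define PR_HIDE_SELF_EXE 0x48494445 /* HYPOTHETICAL placeholder value */
#endif

/* Fork, hide /proc/self/exe in the child, then execve() a shell that checks
 * whether its own /proc/self/exe exists.
 * Returns 1 if the exec'd program saw /proc/self/exe, 0 if not, -1 on error. */
int exe_visible_after_execve(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Failure (e.g. EINVAL on an unpatched kernel) is ignored. */
        prctl(PR_HIDE_SELF_EXE, 1UL, 0UL, 0UL, 0UL);
        char *const argv[] = {"sh", "-c", "test -e /proc/self/exe", NULL};
        char *const envp[] = {NULL};
        execve("/bin/sh", argv, envp);
        _exit(127); /* only reached if execve() failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    if (!WIFEXITED(status) || WEXITSTATUS(status) == 127)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```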
Here is an example of how it would look:
And then running it: