<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Kernel on *scratch*</title>
    <link>https://www.scrivano.org/tags/kernel/</link>
    <description>Recent content in Kernel on *scratch*</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Sat, 06 Jun 2026 10:03:54 +0000</lastBuildDate>
    <atom:link href="https://www.scrivano.org/tags/kernel/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Hide the current process executable file</title>
      <link>https://www.scrivano.org/posts/2022-12-21-hide-self-exe/</link>
      <pubDate>Wed, 21 Dec 2022 22:15:00 +0200</pubDate>
      <guid>https://www.scrivano.org/posts/2022-12-21-hide-self-exe/</guid>
      <description>&lt;p&gt;I have been working on new functionality for the &lt;code&gt;prctl&lt;/code&gt; syscall that addresses a common security concern with container runtimes. The &lt;code&gt;/proc/self/exe&lt;/code&gt; symlink, which points to the executable of the running process, was the key ingredient in CVE-2019-5736, a vulnerability that allowed a malicious container to overwrite the container runtime binary on the host. The workarounds deployed at the time, such as re-executing from a copy of the binary or using a read-only bind mount, treat the symptom rather than the cause.&lt;/p&gt;</description>
    </item>
    <item>
      <title>The journey to speed up running OCI containers</title>
      <link>https://www.scrivano.org/posts/2022-10-21-the-journey-to-speed-up-oci-containers/</link>
      <pubDate>Wed, 21 Sep 2022 16:30:00 +0200</pubDate>
      <guid>https://www.scrivano.org/posts/2022-10-21-the-journey-to-speed-up-oci-containers/</guid>
      <description>&lt;p&gt;When I started working on crun, I was looking for a faster way to start and stop containers by improving the OCI runtime, the component in the OCI stack that is responsible for talking to the kernel and setting up the environment where the container runs. Over roughly five years, a combination of kernel patches and userspace fixes reduced the time to start and stop a container from around 160 ms to just over 5 ms, nearly a 30x improvement. The gains came from targeted work on network namespace teardown, mqueue mount overhead, IPC namespace cleanup, and seccomp profile compilation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>An interesting issue handling the seccomp listener</title>
      <link>https://www.scrivano.org/posts/2022-09-05-seccomp-listener/</link>
      <pubDate>Mon, 05 Sep 2022 21:59:12 +0200</pubDate>
      <guid>https://www.scrivano.org/posts/2022-09-05-seccomp-listener/</guid>
      <description>&lt;p&gt;A &lt;a href=&#34;https://github.com/containers/crun/issues/1002&#34;&gt;bug report&lt;/a&gt; filed against crun a few days ago exposed a deadlock: under certain seccomp profiles, the runtime would hang indefinitely before the container process ever started. The root cause is a subtle ordering problem. The runtime installs a seccomp filter that intercepts a syscall, then immediately uses that same syscall to hand the resulting listener file descriptor to the userspace handler. The intercepted call blocks until the handler responds, but the handler cannot respond because it has not yet received the descriptor it needs to process the interception.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Cgroup v2 OOM group</title>
      <link>https://www.scrivano.org/posts/2020-08-14-oom-group/</link>
      <pubDate>Fri, 14 Aug 2020 19:49:32 +0200</pubDate>
      <guid>https://www.scrivano.org/posts/2020-08-14-oom-group/</guid>
      <description>&lt;p&gt;One annoying issue with setting a memory limit for a container is that the OOM killer can leave the container in an inconsistent state, with only some of its processes terminated. When a cgroup hits its memory limit, the kernel selects a single process to kill based on its badness score rather than killing all the processes in the cgroup. As a result, a multi-process container (for example, one running a web server and several worker processes) may keep running in a broken state after the OOM event instead of being cleanly torn down.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Avoid a memory page allocation on mount(2)</title>
      <link>https://www.scrivano.org/2019/12/27/avoid-a-memory-page-allocation-on-mount/</link>
      <pubDate>Fri, 27 Dec 2019 16:16:33 +0000</pubDate>
      <guid>https://www.scrivano.org/2019/12/27/avoid-a-memory-page-allocation-on-mount/</guid>
      <description>&lt;p&gt;While working on crun, I was surprised by how much time the kernel spent in the &lt;code&gt;copy_mount_options&lt;/code&gt; function. A container runtime issues a large number of &lt;code&gt;mount(2)&lt;/code&gt; syscalls during startup (bind mounts, proc, sysfs, tmpfs, and more), many of them with no extra options to pass. It turned out that passing an empty string instead of &lt;code&gt;NULL&lt;/code&gt; for the data argument made the kernel allocate a full memory page and attempt a copy from user space on every one of those calls, adding measurable overhead.&lt;/p&gt;</description>
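      <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[
<p>A minimal sketch of the userspace side, assuming glibc's <code>mount(2)</code> wrapper: map an empty options string to <code>NULL</code> before it reaches the kernel. <code>mount_data</code> and <code>runtime_mount</code> are hypothetical helper names for illustration, not crun's code.</p>

```c
#include <stddef.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: turn an options string into the `data` argument
 * for mount(2). For the kernel, an empty string is not "no options":
 * any non-NULL pointer makes copy_mount_options() allocate a page and
 * copy from user space, so return NULL when there is nothing to pass.
 */
static const void *mount_data(const char *options)
{
    return (options != NULL && options[0] != '\0') ? options : NULL;
}

/* Hypothetical wrapper a runtime could use for every mount it performs.
   Returns mount(2)'s result: 0 on success, -1 with errno set. */
int runtime_mount(const char *source, const char *target, const char *fstype,
                  unsigned long flags, const char *options)
{
    return mount(source, target, fstype, flags, mount_data(options));
}
```

<p>With this helper, a bind mount such as <code>runtime_mount("/src", "/dst", NULL, MS_BIND, "")</code> passes <code>NULL</code> as data, so <code>copy_mount_options</code> returns before allocating anything.</p>
]]></content:encoded>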
    </item>
  </channel>
</rss>
