The Failing Guard

Imagine you are a system administrator tasked with a simple security policy: block and log any attempt to execute binaries from the /tmp directory.

After some research, you settle on the eBPF Linux Security Module (eBPF LSM). It’s the perfect tool for the job—it allows you to hook into the execve path, inspect the filename and arguments, and decide whether to allow the execution.

You write the following eBPF code:

const char tmp_path[] = "/tmp/";
const int tmp_len = 5;

SEC("lsm/bprm_check_security")
int BPF_PROG(bprm_check, struct linux_binprm *bprm, int ret)
{
    char path[256] = {0};

    // Get the path of the executable
    ret = bpf_d_path(&bprm->file->f_path, path, sizeof(path));

    // Check if path starts with "/tmp/"
    for (int i = 0; i < tmp_len; i++) {
        if (path[i] != tmp_path[i]) {
            return 0;
        }
    }

    return -1;
}

The logic is straightforward: initialize a buffer, ask the bpf_d_path helper to fill it with the current file path, and check whether it starts with /tmp/. You compile it against a bleeding-edge 6.18 kernel and attach it.

Time for a test run. You copy a binary to /tmp and try to run it:

$ cp /bin/id /tmp
$ /tmp/id
uid=0(root) gid=0(root) groups=0(root)

Wait, what? The program executed successfully. Your guard failed.

The failing guard (AI Generated)

To figure out what went wrong, you add some logging. You want to see exactly why the comparison is failing:

    // Check if path starts with "/tmp/"
    for (int i = 0; i < 5; i++) {
        if (path[i] != tmp_path[i]) {
            bpf_printk("\"%c\" != \"%c\"\n", path[i], tmp_path[i]);
            return 0;
        }
    }

You check the trace pipe, expecting to see a mismatch. Instead, you see this:

<...>-47606   [062] ...11 41396.090096: bpf_trace_printk: "/" != "/"
<...>-47608   [002] ...11 41396.091072: bpf_trace_printk: "/" != "/"

Impossible. The logs claim that '/' is not equal to '/'. You are staring at a condition where x != x evaluates to true.

You review the code obsessively. Finally, out of desperation, you make one trivial change. You stop zero-initializing the array:

// char path[256] = {0};    // OLD
char path[256];             // NEW

Suddenly, it works.

$ cp /bin/id /tmp
$ /tmp/id
exec: Failed to execute process '/tmp/id': No permission. Either suid/sgid is forbidden or you lack capabilities.

This leaves us with a haunting question: Why does removing a robust coding practice (initializing variables) fix the program?

Did the Verifier Get Fooled?

Diagnosing this bug requires putting on our detective hats. If zero-initializing the array causes the failure, the natural hypothesis is that the BPF Verifier was making an incorrect assumption: it believed the stack memory remained zero even after the helper function ran, which led it to optimize the check away entirely.

To understand how this happens, we need to look at how the Verifier tracks memory.

Verifier Internal Ⅰ: Stack Memory Tracking

To ensure memory safety, the Verifier walks through the bytecode, tracking the state of every slot on the stack with byte-level granularity. The main states are:

  • STACK_INVALID: Nothing stored here (uninitialized).
  • STACK_SPILL: A register spilled to the stack.
  • STACK_MISC: The BPF program wrote some unknown data here.
  • STACK_ZERO: The BPF program wrote a constant zero here.

When the Verifier encounters a helper function call, it checks the arguments against a static prototype defined in the kernel. This prototype tells the Verifier what the helper does.

For bpf_d_path, the kernel prototype looks like this:

// Prototype in Kernel
static const struct bpf_func_proto bpf_d_path_proto = {
    .func        = bpf_d_path,
    .gpl_only    = false,
    .ret_type    = RET_INTEGER,
    .arg1_type   = ARG_PTR_TO_BTF_ID,
    .arg1_btf_id = &bpf_d_path_btf_ids[0],
    .arg2_type   = ARG_PTR_TO_MEM,
    .arg3_type   = ARG_CONST_SIZE_OR_ZERO,
    .allowed     = bpf_d_path_allowed,
};

This prototype defines the contract: arg2 is a pointer to memory (buf), and arg3 is the size of that memory (sz). The Verifier uses this to check memory safety and update stack status.

Here is the logic the Verifier uses when checking the memory argument:

err = check_mem_size_reg(env, reg, regno,
             fn->arg_type[arg - 1] & MEM_WRITE ?
             BPF_WRITE : BPF_READ,
             true, meta);

It checks if the argument type has the MEM_WRITE flag. If it does, it marks the access as BPF_WRITE.

Later, in check_stack_range_initialized, the Verifier updates the stack slots:

static int check_stack_range_initialized(...)
  if (type == BPF_WRITE)
        clobber = true;
  ...
  for (i = min_off; i < max_off + access_size; i++){
    ...
    if ((*stype == STACK_ZERO) ||
            (*stype == STACK_INVALID && env->allow_uninit_stack)) {
            if (clobber) {
                /* helper can write anything into the stack */
                *stype = STACK_MISC;
            }
    ...
  }
  ...

The critical logic: only if clobber is true (which comes from BPF_WRITE) does the Verifier change the stack state from STACK_ZERO to STACK_MISC. If the Verifier doesn’t think the helper writes to memory, it leaves the state as STACK_ZERO.

Verifier Internal Ⅱ: Optimization

The Verifier performs aggressive optimizations based on these states.

  1. Memory Read Optimization: If the Verifier sees a read from a stack slot marked STACK_ZERO, it doesn’t bother generating a memory load instruction. It simply replaces the register value with the constant 0.
  2. Dead Code Elimination: If the Verifier can statically determine a condition (e.g., comparing two known constants), it calculates the result at verification time and deletes the dead branch.

The Triggering Path

We can now reconstruct the crime scene. The bug was a combination of a lying prototype and a trusting Verifier:

  1. The Lie: The kernel developer missed a flag in the bpf_d_path prototype.
.arg2_type    = ARG_PTR_TO_MEM, /* MEM_WRITE is missing! */
  2. The Assumption: Because MEM_WRITE was missing, check_mem_size_reg treated the access as a read. Consequently, the Verifier did not update the stack slots. It believed path[256] still contained the all-zeros we initialized it with.
  3. The Optimization: When our code checked if (path[i] != tmp_path[i]), the Verifier thought, “I know path[i] is 0 (STACK_ZERO). I know tmp_path is /tmp/. 0 is not equal to /, so this condition is always true.”
  4. The Collapse: The Verifier optimized away the read of the actual data written by the helper. It effectively hardcoded the failure logic.
The verifier’s wrong assumption of stack state (AI Generated)

Why did removing initialization fix it? When we removed = {0}, the stack state started as STACK_INVALID instead of STACK_ZERO. Reading uninitialized stack is normally forbidden, but notice the STACK_INVALID && env->allow_uninit_stack branch in the snippet earlier: tracing tools often run with CAP_PERFMON (or equivalent), which sets allow_uninit_stack and relaxes the Verifier:

 * CAP_PERFMON relaxes the verifier checks further:
 * - bpf_trace_printk to print kernel memory is allowed

Because the state was STACK_INVALID (not STACK_ZERO), the “replace with constant 0” optimization didn’t trigger. The program was forced to perform the actual memory read at runtime, fetching the correct data written by bpf_d_path.

The Fix

The fix is technically simple: we just need to tell the truth in the prototype. We submitted a patch to add the MEM_WRITE tag to bpf_d_path, which was quickly merged.
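Judging from the comment in the prototype shown earlier, the merged change in essence adds the missing tag (a sketch of the idea, not the verbatim patch):

```c
static const struct bpf_func_proto bpf_d_path_proto = {
    ...
    .arg2_type   = ARG_PTR_TO_MEM | MEM_WRITE,  /* now declared as a write */
    ...
};
```

With the flag in place, check_mem_size_reg classifies the access as BPF_WRITE, the stack slots are clobbered to STACK_MISC, and the constant-folding optimization can no longer fire.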

[PATCH bpf v5 0/2] bpf: fix bpf_d_path() helper prototype

But once you see a bug like this, you start looking for others. It turns out bpf_d_path wasn’t the only offender. We found several other helpers with incorrect prototypes:

Helper function prototype                  Issue
bpf_snprintf_proto                         Missing MEM_WRITE
bpf_snprintf_btf_proto                     Missing MEM_WRITE
bpf_read_branch_records_proto              Missing MEM_WRITE
bpf_xdp_fib_lookup_proto                   Missing MEM_WRITE
bpf_skb_fib_lookup_proto                   Missing MEM_WRITE
bpf_get_stack_proto_raw_tp                 Incorrectly marked MEM_RDONLY (should be MEM_WRITE)
bpf_tcp_raw_gen_syncookie_ipv4_proto       Missing MEM_RDONLY
bpf_tcp_raw_gen_syncookie_ipv6_proto       Missing MEM_RDONLY

Missing MEM_WRITE causes the optimization bug we just analyzed. Missing MEM_RDONLY causes the Verifier to incorrectly reject valid programs that try to pass read-only buffers to helpers.

We reported these in a follow-up RFC patch: [RFC bpf PATCH 0/2] bpf: Fix memory access tags in helper prototypes

However, as I noted in the RFC, patching these specific instances doesn’t solve the fundamental problem. The Verifier’s correctness relies entirely on manual annotations (prototypes) provided by kernel developers. If the prototype lies, the Verifier is fooled, and C compilers cannot statically detect this mismatch.

Discussion: When Compiler Meets Kernel

The BPF ecosystem sits in a unique position. The Verifier is deeply coupled with the kernel, while the compiler (LLVM/GCC) runs in userspace.

To keep the Verifier maintainable and program load times low, the kernel community enforces simplicity on it. This constraint dictates the trade-offs: we sacrifice analysis precision and optimization opportunities for speed and soundness. Unlike a userspace compiler, the Verifier cannot perform comprehensive, time-consuming analysis.

Furthermore, userspace compilers are flying blind regarding kernel specifics. They don’t know the semantics of helper functions—only the kernel does. Conversely, the kernel’s JIT engine converts bytecode to machine code without the benefit of the rich infrastructure found in traditional compiler backends.

We are seeing interesting movements to bridge this gap. Recent academic work, such as Prove It to the Kernel: Precise Extension Analysis via Proof-Guided Abstraction Refinement, explores offloading verification work to userspace. The idea is to let a userspace prover exchange metadata with the kernel and “convince” it that a program is safe.

Could we go further for code optimization? Maybe the userspace compiler could generate multiple program variants, letting the Verifier pick the “correct” and optimal one for the running kernel? Or perhaps we could define transformation rules in the Verifier, allowing the compiler to supply a “proof of optimization” that the kernel simply replays? I don’t have the final answer yet, but this bug proved one thing: there is still plenty of interesting work to be done in the code verification and optimization of BPF programs.