eBPF, XDP and Network Security

At Path, we aren’t afraid of breaking things and moving fast. We have fostered a strong engineering culture which encourages innovation, and we believe that this is important, as it enables our team members to do their best.

One of the latest innovations from within the Path engineering team is our work with eBPF and XDP.

eBPF and XDP are the newest pieces of technology you need to add to your recruiter's hottest-technology-trends vocabulary, alongside Blockchain and Cyberwarfare.

Not really. They are, however, technologies that I feel have flown under the radar: they not only already possess impressive features but also have exciting futures ahead.

In this article, we will touch briefly on what eBPF and XDP can be used for, but we will focus on how we are incorporating this exciting technology into our network security stack, and share part of our journey doing so.

What is eBPF?

Having started out with the modest purpose of filtering network packets, eBPF is quickly growing into one of the most powerful tools available on Linux. Adopted by the likes of Netflix, Amazon, Google and Microsoft, and dubbed “Super powers for Linux” by some, it lets you run sandboxed programs supplied from user space inside the kernel at ring 0, in an in-kernel virtual machine, once an in-kernel verifier has checked that they are safe to run.

Within Path, eBPF has proven to be an invaluable tool for packet filtering and classification, network diagnostics and tracing, and performance analysis, amongst other things.

The LWN article at https://lwn.net/Articles/740157/ provides further information about eBPF.

What is XDP?

“XDP or eXpress Data Path provides a high performance, programmable network data path in the Linux kernel as part of the IO Visor Project. XDP provides bare metal packet processing at the lowest point in the software stack which makes it ideal for speed without compromising programmability. Furthermore, new functions can be implemented dynamically with the integrated fast path without kernel modification” – iovisor.org

Check out https://www.iovisor.org/technology/xdp for further information about XDP.

eBPF and XDP for Network Security

Why do we want to use eBPF and XDP?

Simply because it greatly strengthens our DDoS Detection, Analysis and Mitigation capabilities. By filtering packets at the earliest possible point, in the network driver itself, on hardware such as the Mellanox ConnectX-5, we are able to reliably outperform any other solution we have tested.

In addition to the obvious performance benefits, eBPF and XDP together allow us to add further layers of security at the lowest level possible. Combined with the rest of our proprietary network security technology, this exceeds the security requirements of any government, enterprise or other organisation with sensitive data, while the Path Network Intelligence & Analytics platform provides real-time, continuous analytics, giving customers unparalleled visibility and insight into their networks and infrastructure.

But, as with any new technology, it is not without its pain points. In fact, we sometimes feel like we are cutting ourselves on the bleeding edge, as the areas we are treading in are truly yet to be explored.

Let's take a code snippet that finds the IP header relative to the Ethernet header (versions of this code appear in several eBPF examples, and we use one too):

static __always_inline bool parse_eth(struct ethhdr* eth,
                                      void* data_end,
                                      u16* eth_proto,
                                      u64* l3_offset) {
    u16 eth_type;
    /* Bytes of VLAN headers between the Ethernet header and the L3 header */
    u64 offset = 0;

    /* Bounds-check the Ethernet header before reading any of its fields */
    if ((void*)(eth + 1) > data_end)
        return false;

    eth_type = eth->h_proto;

    /* Values below ETH_P_802_3_MIN are 802.3 length fields, not Ethertypes */
    if (unlikely(ntohs(eth_type) < ETH_P_802_3_MIN))
        return false;

    /* Handle (double) VLAN tagged packet */
#pragma unroll
    for (int i = 0; i < 2; ++i) {
        if (eth_type == htons(ETH_P_8021Q) || eth_type == htons(ETH_P_8021AD)) {
            struct vlan_hdr* vlan_hdr;

            /* VLAN headers start right after the Ethernet header */
            vlan_hdr = (void*)(eth + 1) + offset;
            offset += sizeof(*vlan_hdr);
            if ((void*)(eth + 1) + offset > data_end)
                return false;
            eth_type = vlan_hdr->h_vlan_encapsulated_proto;
        }
    }

    *eth_proto = ntohs(eth_type);
    *l3_offset = offset;
    return true;
}

In this case, offset can only hold one of three constant values: 0, sizeof(struct vlan_hdr) or 2 * sizeof(struct vlan_hdr). This is apparent in the compiled bytecode, where offset is stored in register r4 and this is what happens to it:

$ llvm-objdump -S --no-show-raw-insn xdp_l3offset_kern.o

; offset = 0;
      15:       r4 = 0
; offset += sizeof(*vlan_hdr);
      20:       r4 = 4
; offset += sizeof(*vlan_hdr);
      26:       r5 = r4
      27:       r5 += 4
      // ...
      35:       r4 = r5
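The three possible values are easy to confirm outside the kernel as well. The sketch below is a hypothetical userspace port of the parse_eth() logic that works on a plain byte buffer instead of an XDP context; parse_eth_buf(), build_frame() and the spelled-out constants are our own names for illustration, and this is not the program we load into the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constants written out so the sketch builds without kernel headers. */
#define ETH_HLEN_BYTES   14      /* dst MAC + src MAC + type field */
#define VLAN_HLEN_BYTES  4       /* TCI + encapsulated Ethertype */
#define ETHERTYPE_MIN    0x0600  /* same value as ETH_P_802_3_MIN */
#define ETHERTYPE_8021Q  0x8100  /* same value as ETH_P_8021Q */
#define ETHERTYPE_8021AD 0x88A8  /* same value as ETH_P_8021AD */

/* Read a big-endian (network order) 16-bit field at byte offset pos. */
static uint16_t be16_at(const uint8_t *p, size_t pos) {
    return (uint16_t)((p[pos] << 8) | p[pos + 1]);
}

/* Userspace port of parse_eth(): on success stores the innermost Ethertype
 * and the L3 offset measured from the end of the Ethernet header. */
static bool parse_eth_buf(const uint8_t *pkt, size_t len,
                          uint16_t *eth_proto, uint64_t *l3_offset) {
    uint64_t offset = 0;
    uint16_t eth_type;

    if (len < ETH_HLEN_BYTES)
        return false;
    eth_type = be16_at(pkt, 12);

    if (eth_type < ETHERTYPE_MIN)   /* 802.3 length field, not an Ethertype */
        return false;

    for (int i = 0; i < 2; ++i) {   /* at most two (QinQ) tags */
        if (eth_type != ETHERTYPE_8021Q && eth_type != ETHERTYPE_8021AD)
            break;
        size_t vlan_start = ETH_HLEN_BYTES + (size_t)offset;
        offset += VLAN_HLEN_BYTES;
        if (ETH_HLEN_BYTES + offset > len)
            return false;
        /* encapsulated Ethertype is the second 16-bit field of the tag */
        eth_type = be16_at(pkt, vlan_start + 2);
    }

    *eth_proto = eth_type;
    *l3_offset = offset;
    return true;
}

/* Hypothetical helper for trying the parser out: writes a frame with `tags`
 * VLAN tags (0-2), then inner_type and a 20-byte zeroed L3 payload. MACs and
 * TCIs are left zero. Returns the frame length. */
static size_t build_frame(uint8_t *buf, int tags, uint16_t inner_type) {
    size_t pos = 12;                  /* skip dst + src MAC */
    assert(tags >= 0 && tags <= 2);
    memset(buf, 0, 12);
    for (int i = 0; i < tags; ++i) {
        uint16_t tpid = (tags == 2 && i == 0) ? ETHERTYPE_8021AD : ETHERTYPE_8021Q;
        buf[pos++] = (uint8_t)(tpid >> 8);
        buf[pos++] = (uint8_t)(tpid & 0xFF);
        buf[pos++] = 0;               /* TCI, high byte */
        buf[pos++] = 0;               /* TCI, low byte */
    }
    buf[pos++] = (uint8_t)(inner_type >> 8);
    buf[pos++] = (uint8_t)(inner_type & 0xFF);
    memset(buf + pos, 0, 20);
    return pos + 20;
}
```

Untagged, single-tagged and double-tagged (QinQ) frames come back with offsets 0, 4 and 8 respectively, and a frame cut off inside its VLAN tags is rejected, just as the in-kernel version bails out on the data_end checks.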

We know that the value is in the range [0, 8], and so does the in-kernel eBPF verifier, which tracks the possible values of every register to make sure that operations on the packet are well defined:

    if (eth_proto == ETH_P_IP) {
        struct iphdr* ip = (void*)(eth + 1) + l3_offset;
        if ((void*)(ip + 1) > data_end)
            return XDP_DROP;

        // Will print the protocol number to /sys/kernel/debug/tracing/trace_pipe
        bpf_trace_printk("ipproto: %u\n", ip->protocol);
    }

And the relevant part of the compiled bytecode:

$ llvm-objdump -S --no-show-raw-insn xdp_l3offset_kern.o

; struct iphdr* ip = (void*)(eth + 1) + l3_offset;
; r2 contains the packet pointer
      42:       r2 += r4

Easy.

The compiler and the verifier are not in sync, though. As an eBPF program grows more complex, the compiler may reorder or optimize operations in a way that turns otherwise verifiable code into code the verifier rejects. This is the journey of l3_offset in our eBPF program, which starts with pretty much the same code as above:

; offset = 0;
      14:       r2 = 0
      15:       *(u64 *)(r10 - 24) = r2
; offset += sizeof(*vlan_hdr);
      25:       r2 = 4
      26:       *(u64 *)(r10 - 24) = r2
; offset += sizeof(*vlan_hdr);
      36:       r2 = 8
      37:       *(u64 *)(r10 - 24) = r2

; struct iphdr* ip = (void*)(eth + 1) + l3_offset;
; r5 contains the packet pointer
      51:       r1 = *(u64 *)(r10 - 24)
      52:       r5 += r1

The compiler moved the l3_offset value to a 64-bit stack slot (r10 is the read-only frame pointer). While this code is functionally the same, here is what the verifier says when we try to load it into the kernel:

49: (55) if r1 != 0x800 goto pc+2540
 R0=inv2 R1=inv2048 R2=pkt_end(id=0,off=0,imm=0) R8=pkt(id=0,off=0,r=26,imm=0) R10=fp0,call_-1 fp-16=pkt_end fp-32=ctx
50: (bf) r5 = r8
51: (79) r1 = *(u64 *)(r10 -24)
52: (0f) r5 += r1
math between pkt pointer and register with unbounded min value is not allowed
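The error comes from a check on the bounds of the scalar being added to the packet pointer. The toy model below is our own simplification, not kernel code: struct scalar_bounds, stack_fill() and pkt_ptr_add_allowed() are hypothetical names, and the check mirrors only the condition named in the message above. It shows how a constant that is fine while it sits in a register becomes unacceptable after a round trip through a stack slot whose contents the verifier does not track; the tracks_spilled_scalars flag stands in for more recent kernels, which can keep the bounds of some spilled scalars.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the bounds the verifier keeps for a scalar register.
 * Field names follow the verifier log (smin_value, umax_value, ...). */
struct scalar_bounds {
    int64_t smin, smax;
    uint64_t umin, umax;
};

/* A register whose value is completely unknown. */
static struct scalar_bounds scalar_unknown(void) {
    struct scalar_bounds b = { INT64_MIN, INT64_MAX, 0, UINT64_MAX };
    return b;
}

/* A register holding a known small constant, e.g. right after "r4 = 8". */
static struct scalar_bounds scalar_const(uint64_t v) {
    assert(v <= (uint64_t)INT64_MAX);
    struct scalar_bounds b = { (int64_t)v, (int64_t)v, v, v };
    return b;
}

/* Reload a register from a stack slot. Without spilled-scalar tracking the
 * reloaded value is treated as unknown, as in the log above. */
static struct scalar_bounds stack_fill(struct scalar_bounds spilled,
                                       bool tracks_spilled_scalars) {
    return tracks_spilled_scalars ? spilled : scalar_unknown();
}

/* Models only the condition behind "math between pkt pointer and register
 * with unbounded min value is not allowed"; the real verifier checks more. */
static bool pkt_ptr_add_allowed(struct scalar_bounds off) {
    return off.smin != INT64_MIN;
}
```

In this model, scalar_const(8) is accepted as an offset, but stack_fill(scalar_const(8), false) is rejected, which is the situation at instruction 52 above.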

Storing l3_offset in memory causes the verifier to forget the constraints: it no longer knows that r1 is in the range [0, 8], and instead assumes that r1 can hold any 64-bit value. What we can do in this case is tell the verifier the range of values that we know to be possible:

    if (eth_proto == ETH_P_IP) {
        if ((s64)l3_offset < 0 || l3_offset > 8)
            return XDP_DROP;

        struct iphdr* ip = (void*)(eth + 1) + l3_offset;
        if ((void*)(ip + 1) > data_end)
            return XDP_DROP;

        // Will print the protocol number to /sys/kernel/debug/tracing/trace_pipe
        bpf_trace_printk("ipproto: %u\n", ip->protocol);
    }

But if we look at the verifier output, the new check compiles to this:

51: (79) r1 = *(u64 *)(r10 -24)
52: (25) if r1 > 0x8 goto pc+2534
 R0=inv1 R1=inv(id=0,umax_value=8,var_off=(0x0; 0xf)) R2=pkt_end(id=0,off=0,imm=0) R8=pkt(id=0,off=0,r=26,imm=0) R10=fp0,call_-1 fp-16=pkt_end fp-32=ctx

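Because l3_offset is unsigned, the compiler can fold the two conditions into one: any value that is negative when reinterpreted as an s64 has its top bit set, so it is also greater than 8 as a u64, and the single unsigned comparison covers both cases. As you can see from the verifier output, this helped the verifier learn the maximum value of r1 (umax_value=8), but did not give it an idea of the minimum value.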
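To convince ourselves that dropping the l3_offset < 0 test is legal, here is a small sketch comparing the check as written with the single unsigned comparison the compiler emitted. The function names are ours, for illustration only; the cast to int64_t assumes two's complement wrap-around, which is what GCC and Clang do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Every u64 with bit 63 set (negative as s64) is already greater than 8. */
static_assert((uint64_t)INT64_MAX + 1 > 8, "sign-bit values exceed 8");

/* The range check as written in the source (true means "drop the packet"). */
static bool out_of_range_as_written(uint64_t l3_offset) {
    return (int64_t)l3_offset < 0 || l3_offset > 8;
}

/* What the compiler may emit instead: one unsigned comparison, r1 > 8. */
static bool out_of_range_folded(uint64_t l3_offset) {
    return l3_offset > 8;
}

/* Spot-check the equivalence on boundary values around 8 and around the
 * s64 sign boundary. Returns true if every sample agrees. */
static bool folding_agrees_on_samples(void) {
    const uint64_t samples[] = {
        0, 4, 8, 9, 15,
        UINT64_C(0x7FFFFFFFFFFFFFFF),  /* INT64_MAX: positive as s64 */
        UINT64_C(0x8000000000000000),  /* INT64_MIN when viewed as s64 */
        UINT64_MAX,                    /* -1 when viewed as s64 */
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        if (out_of_range_as_written(samples[i]) != out_of_range_folded(samples[i]))
            return false;
    return true;
}
```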

Another common trick for telling the verifier about a value's constraints is to bitwise-AND it with a constant mask:

    if (eth_proto == ETH_P_IP) {
        l3_offset &= 0xF;

        struct iphdr* ip = (void*)(eth + 1) + l3_offset;
        if ((void*)(ip + 1) > data_end)
            return XDP_DROP;

        // Will print the protocol number to /sys/kernel/debug/tracing/trace_pipe
        bpf_trace_printk("ipproto: %u\n", ip->protocol);
    }

But in this case the compiler omits the instruction from the bytecode completely, because it knows that l3_offset is never bigger than 0xF, so the AND cannot change its value.

What we're left with is injecting our own instruction into the bytecode to pass this knowledge on to the verifier:

    if (eth_proto == ETH_P_IP) {
        asm volatile("%0 &= 0xF" : "=r"(l3_offset) : "0"(l3_offset));

        struct iphdr* ip = (void*)(eth + 1) + l3_offset;
        if ((void*)(ip + 1) > data_end)
            return XDP_DROP;

        // Will print the protocol number to /sys/kernel/debug/tracing/trace_pipe
        bpf_trace_printk("ipproto: %u\n", ip->protocol);
    }

This asm instruction achieves a couple of things. Let's decompose it:

  //    _ prevents the compiler from optimizing the instruction out
  //   |
  //   |                                       _ store input to the instruction in the same
  //   |                                      |  register the output will be stored in
  //   |                                      |
asm volatile("%0 &= 0xF" : "=r"(l3_offset) : "0"(l3_offset));
  //                                  |              |
  //  the output register is the new _|              |_ take l3_offset as the input value
  //          value of the l3_offset

If the l3_offset value is stored in memory, the "0"(l3_offset) input declaration makes sure that it is loaded into a register before our instruction. Assuming the compiler chooses register r3, the instruction compiles down rather trivially to r3 &= 15. The "=r"(l3_offset) output declaration informs the compiler that the output register now holds the value of the l3_offset variable. Because the verifier tracks which bits of a register may be set, after r3 &= 15 it knows that the value lies in the range [0, 15], which bounds both its minimum and its maximum.
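Why is one AND enough? The verifier keeps a "tracked number" (tnum) for each scalar, printed in its log as var_off=(value; mask); the kernel's version lives in kernel/bpf/tnum.c. The sketch below is a simplified userspace re-creation of that idea, loosely modelled on the kernel's tnum_and(); the struct and helper names here are ours and it is not the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* value: bits known to be 1. mask: bits whose value is unknown.
 * Bits in neither are known to be 0. */
struct tnum {
    uint64_t value;
    uint64_t mask;
};

static struct tnum tnum_unknown(void) { return (struct tnum){ 0, UINT64_MAX }; }
static struct tnum tnum_const(uint64_t v) { return (struct tnum){ v, 0 }; }

/* AND of two partially known numbers: a result bit is known to be 1 only if
 * it is known to be 1 on both sides, and it may be 1 only if it may be 1 on
 * both sides. Bits that may be 1 but are not known to be 1 are unknown. */
static struct tnum tnum_and(struct tnum a, struct tnum b) {
    uint64_t alpha = a.value | a.mask;   /* bits that may be 1 in a */
    uint64_t beta  = b.value | b.mask;   /* bits that may be 1 in b */
    uint64_t v     = a.value & b.value;  /* bits known to be 1 in both */
    struct tnum r = { v, alpha & beta & ~v };
    assert((r.value & r.mask) == 0);     /* known and unknown bits are disjoint */
    return r;
}

/* Smallest and largest unsigned values consistent with the known bits. */
static uint64_t tnum_umin(struct tnum t) { return t.value; }
static uint64_t tnum_umax(struct tnum t) { return t.value | t.mask; }

/* If bit 63 cannot be set, the value is non-negative as an s64 too. */
static bool tnum_sign_bit_known_clear(struct tnum t) {
    return ((t.value | t.mask) >> 63) == 0;
}

/* Whether the concrete value x is consistent with t. */
static bool tnum_contains(struct tnum t, uint64_t x) {
    return (x & ~t.mask) == t.value;
}
```

Applying tnum_and(tnum_unknown(), tnum_const(0xF)) gives value 0 and mask 0xF, the same var_off=(0x0; 0xf) seen in the verifier log earlier: an unsigned range of [0, 15] with the sign bit known to be clear, so the minimum is bounded as well as the maximum.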

In our early days of development with XDP and eBPF we ran into our fair share of issues, so we are happy to share this information with the wider open source community in the hope that it will help someone out there.

We are also excited to keep sharing news and updates along our technology journey, and we will keep you in the loop as we release new features and products.