From bytecode to bytes: automated magic packet generation

2026-04-08

6 min read

Linux malware often hides in Berkeley Packet Filter (BPF) socket programs, which are small bits of executable logic that can be embedded in the Linux kernel to customize how it processes network traffic. Some of the most persistent threats on the Internet use these filters to remain dormant until they receive a specific "magic" packet. Because these filters can be hundreds of instructions long and involve complex logical jumps, reverse-engineering them by hand is a slow process that creates a bottleneck for security researchers.

To find a better way, we looked at symbolic execution: a method of treating code as a series of constraints, rather than just instructions. By using the Z3 theorem prover, we can work backward from a malicious filter to automatically generate the packet required to trigger it. In this post, we explain how we built a tool to automate this, turning hours of manual assembly analysis into a task that takes just a few seconds.

The complexity ceiling

Before we look at how to deconstruct malicious filters, we need to understand the engine running them. The Berkeley Packet Filter (BPF) is a highly efficient technology that allows the kernel to pull specific packets from the network stack based on a set of bytecode instructions.

While many modern developers are familiar with eBPF (Extended BPF), the powerful evolution used for observability and security, this post focuses on "classic" BPF. Originally designed for tools like tcpdump, classic BPF uses a simple virtual machine with just two registers to evaluate network traffic at high speeds. Because it runs deep within the kernel and can "hide" traffic from user-space tools, it has become a favorite tool for malware authors looking to build stealthy backdoors.

Creating a contextual representation of BPF instructions using LLMs is already reducing the manual overhead for analysts, crafting the network packets that correspond to the validating condition can still be a lot of work, even with the added context provided by LLM’s.

Most of the time this is not a problem if your BPF program has only ~20 instructions, but this can get exponentially more complex and time-consuming when a BPF program consists of over 100 instructions as we’ve observed in some of the samples.

If we deconstruct the problem we can see that it boils down to reading a buffer and checking a constraint, depending on the outcome we either continue our execution path or stop and check the end result.

This kind of problem that has a deterministic outcome can be solved by Z3, a theorem prover that has the means to solve problems with a set of given constraints.

Exhibit A: BPFDoor

BPFDoor is a sophisticated, passive Linux backdoor, primarily used for cyberespionage by China-based threat actors, including Red Menshen (also known as Earth Bluecrow). Active since at least 2021, the malware is designed to maintain a stealthy foothold in compromised networks, targeting telecommunications, education, and government sectors, with a strong emphasis on operations in Asia and the Middle East.

BPFDoor uses BPF to monitor all incoming traffic without requiring a specific network port to be open.

BPFDoor example instructions

Let’s focus on the sample of which was shared for the research done by Fortinet (82ed617816453eba2d755642e3efebfcbd19705ac626f6bc8ed238f4fc111bb0). If we dissect the BPF instructions and add some annotations, we can write the following:

(000) ldh [0xc]                   ; Read halfword at offset 12 (EtherType)
(001) jeq #0x86dd, jt 2, jf 6     ; 0x86DD (IPv6) -> ins 002 else ins 006
(002) ldb [0x14]                  ; Read byte at offset 20 (Protocol)
(003) jeq #0x11, jt 4, jf 15      ; 0x11 (UDP) -> ins 004 else DROP
(004) ldh [0x38]                  ; Read halfword at offset 56 (Dst Port)
(005) jeq #0x35, jt 14, jf 15     ; 0x35 (DNS) -> ACCEPT else DROP
(006) jeq #0x800, jt 7, jf 15     ; 0x800 (IPv4) -> ins 007 else DROP
(007) ldb [23]                    ; Read byte at offset 23 (Protocol)
(008) jeq #0x11, jt 9, jf 15      ; 0x11 (UDP) -> ins 009 else DROP
(009) ldh [20]                    ; Read halfword at offset 20 (fragment)
(010) jset #0x1fff, jt 15, jf 11  ; fragmented -> DROP else ins 011
(011) ldxb 4*([14]&0xf)           ; Load index (x) register ihl & 0xf
(012) ldh [x + 16]                ; Read halfword at offset x+16 (Dst Port)
(013) jeq #0x35, jt 14, jf 15     ; 0x35 (DNS) -> ACCEPT else DROP
(014) ret #0x40000 (ACCEPT)
(015) ret #0 (DROP)

In the above example we can establish there are two paths that lead to an ACCEPT outcome (step 5 and step 13). We can also clearly observe certain bytes being checked, including their offsets and values.

Taking these validations, and keeping track of anything that would match the ACCEPT path, we should be able to automatically craft the packets for us.

Calculating the shortest path

To find the shortest path to a packet that validates the conditions presented in the BPF instructions, we need to keep track of paths that are not ending in the unfavorable condition.

We start off by creating a small queue. This queue holds several important data points:

The pointer to the next instruction.
Our current path of executed instructions + the next instruction.

Whenever we encounter an instruction that is checking a condition, we keep track of the outcome using a boolean and store this in our queue, so we can compare paths on the amount of conditions before the ACCEPT condition is reached and calculate our shortest path. In pseudocode we can express this best as:

paths = []
queue = dequeue([(0, [0])])

while queue:
	pc, path = queue.popleft()

	if pc >= len(instructions):
            continue

instruction = instructions[pc]
	
	if instruction.class == return_instruction:
		if instruction_constant != 0:  # accept
			paths.append(path)
		continue  # drop or accept, stop parsing this instruction

if instruction.class == jump_instruction:
	if instruction.operation == unconditional_jump:
		next_pc = pc + 1 + instruction_constant
		queue.append((next_pc, path + [next_pc]))
		continue

	# Conditional jump, explore both
	pc_true = pc + 1 + instruction.jump_true
	pc_false = pc + 1 + instruction.jump_false
	
	if instruction.jump_true <= instruction.jump_false:
		queue.append((pc_true, path + [pc_true]))
		queue.append((pc_false, path + [pc_false]))
	# else: same as above but reverse order of appending
# else: sequential instruction, append to the queue

If we execute this logic against our earlier BPFDoor example, we will be presented with the shortest path to an accepted packet:

(000) code=0x28 jt=0 jf=0  k=0xc     ; Read halfword at offset 12 (EtherType)
(001) code=0x15 jt=0 jf=4  k=0x86dd  ; IPv6 packet
(002) code=0x30 jt=0 jf=0  k=0x14    ; Read byte at offset 20 (Protocol)
(003) code=0x15 jt=0 jf=11 k=0x11    ; Protocol number 17 (UDP)
(004) code=0x28 jt=0 jf=0  k=0x38    ; Read word at offset 56 (Destination Port)
(005) code=0x15 jt=8 jf=9  k=0x35    ; Destination port 53
(014) code=0x06 jt=0 jf=0  k=0x40000 ; Accept

This is already a helpful automation in automatically solving our BPF constraints when it comes to analyzing BPF instructions and figuring out how the accepted packet for the backdoor would look. But what if we can take it a step further?

What if we could create a small tool that will give us the expected packet back in an automated manner?

Employing Z3 and scapy

One such tool that is perfect to solve problems given a set of constraints is Z3. Developed by Microsoft the tool is labeled as a theorem prover and exposes easy to use functions performing very complex mathematical operations under the hood.

The other tool we will use for crafting our valid magic packets is scapy, a popular Python library for interactive packet manipulation.

Given that we already have a way to figure out the path to an accepted packet, we are left with solving the problem by itself, and then translating this solution to the bytes at their respective offsets in a network packet.

Symbolic execution

A common technique for exploring paths taken in a given program is called symbolic execution. For this technique we are giving input that can be used as variables, including the constraints. By knowing the outcome of a successful path we can orchestrate our tool to find all of these successful paths and display the end result to us in a contextualized format.

For this to work we will need to implement a small machine capable of keeping track of the state of things like constants, registers, and different boolean operators as an outcome of a condition that is being checked.

class BPFPacketCrafter:
    MIN_PKT_SIZE = 64           # Minimum packet size (Ethernet + IP + UDP headers)
    LINK_ETHERNET = "ethernet"  # DLT_EN10MB - starts with Ethernet header
    LINK_RAW = "raw"            # DLT_RAW - starts with IP header directly
    MEM_SLOTS = 16              # Number of scratch memory slots (M[0] to M[15])

    def __init__(self, ins: list[BPFInsn], pkt_size: int = 128, ltype: str = "ethernet"):
        self.instructions = ins
        self.pkt_size = max(self.MIN_PKT_SIZE, pkt_size)
        self.ltype = ltype

        # Symbolic packet bytes
        self.packet = [BitVec(f"pkt_{i}", 8) for i in range(self.pkt_size)]

        # Symbolic registers (32-bit)
        self.A = BitVecVal(0, 32)  # Accumulator
        self.X = BitVecVal(0, 32)  # Index register

        # Scratch memory M[0-15] (32-bit words)
        self.M = [BitVecVal(0, 32) for _ in range(self.MEM_SLOTS)]

With the above code we’ve covered most of the machine for keeping a state during the symbolic execution. There are of course more conditions we need to keep track of, but these are handled during the solving process. To handle an ADD instruction, the machine maps the BPF operation to a Z3 addition:

def _execute_ins(self, insn: BPFInsn):
    cls = insn.cls
    if cls == BPFClass.ALU:
        op = insn.op
        src_val = BitVecVal(insn.k, 32) if insn.src == BPFSrc.K else self.X
        if op == BPFOp.ADD:
            self.A = self.A + src_val

Luckily the BPF instruction set is only a small set of instructions that’s relatively easy to implement — only having two registers to keep track of is definitely a welcome constraint!

The overall working of this symbolic execution can be laid out in the following abstracted overview:

Initialize the “x” (index) and “a” (accumulator) registers to their base state.
Loop over the instructions from the path that was identified as a successful path;
- Execute non-jump instructions as-is, keeping track of register states.
- Determine if a jump instruction is encountered, and check if the branch should be taken.
Use the Z3 check() function to check if our condition has been satisfied with the given constraint (ACCEPT).
Convert the Z3 bitvector arrays into bytes.
Use scapy to construct packets of the converted bytes.

If we look at the constraints build by the Z3 solver we can trace the execution steps taken by Z3 to build the packet bytes:

[If(Concat(pkt_12, pkt_13) == 0x800,
    pkt_14 & 0xF0 == 0x40,
    True),
 If(Concat(pkt_12, pkt_13) == 0x800, pkt_14 & 0x0F >= 5, True),
 If(Concat(pkt_12, pkt_13) == 0x800, pkt_14 & 0x0F == 5, True),
 If(Concat(pkt_12, pkt_13) == 0x86DD,
    pkt_14 & 0xF0 == 0x60,
    True),
 0x86DD == ZeroExt(16, Concat(pkt_12, pkt_13)),
 0x11 == ZeroExt(24, pkt_20),
 0x35 == ZeroExt(16, Concat(pkt_56, pkt_57))]

The first part of the Z3 displayed constraints are the constraints added to ensure we’re building up a valid ethernet IP when dealing with link-layer BPF instructions. The “If” statements apply specific constraints based on which protocol is detected:

IPv4 Logic (0x0800):
- pkt_14 & 240 == 64: Byte 14 is the start of the IP header. The 0xF0 mask isolates the high nibble (the Version field) to check if the version is 4 (0x40).
- pkt_14 & 15 == 5: 15 (0x0F), isolating the low nibble (IHL - Internet Header Length). This mandates a header length of 5 (20 bytes), which is the standard size without options.
IPv6 Logic (0x86dd):
- pkt_14 & 240 == 0x60: Check if the version field is version 6 (0x60)

We can observe the network packet values when we look at the second part where different values are being checked:

0x86DD: Packet condition for IPv6 header.
0x11: UDP protocol number.
0x35: The destination port (53).

Next to the expected values we can see the byte offset of where it should exist in a given packet (e.g. pkt_12, pkt_13).

Crafting packets

Now that we’ve established which bytes should exist at specific offsets we can convert it into an actual network packet using scapy. If we generate a new packet from the bytes of our previous Z3 constraints we can clearly see what our packet would look like, and store this for further processing:

###[ Ethernet ]###
  dst       = 00:00:00:00:00:00
  src       = 00:00:00:00:00:00
  type      = IPv6                 <-- IPv6 Packet
###[ IPv6 ]###
     version   = 6
     tc        = 0
     fl        = 0
     plen      = 0
     nh        = UDP               <-- UDP Protocol
     hlim      = 0
     src       = ::
     dst       = ::
###[ UDP ]###
        sport     = 0
        dport     = domain         <-- Port 53
        len       = 0
        chksum    = 0x0

These newly crafted packets can in turn be used for further research or identifying the presence of these implants by scanning for these over the network.

Try it yourself

Understanding what a specific BPF set of instructions is doing can be cumbersome and time-consuming work. The example used is only a total of sixteen instructions, but we’ve encountered samples that were over 200 instructions that would’ve taken at least a day to understand. By using the Z3 solver, we can now reduce this time to just seconds, and not only display the path to an accepted packet, but also the packet skeleton for this as well.

We have open-sourced the filterforge tool to help the community automate the deconstruction of BPF-based implants. You can find the source code, along with usage examples, on our GitHub repository.

By publishing this research and sharing our tool for reducing analysts’ time spent figuring out the BPF instructions, we hope to spark further research by others to expand on this form of automation.

Cloudflare's connectivity cloud protects entire corporate networks, helps customers build Internet-scale applications efficiently, accelerates any website or Internet application, wards off DDoS attacks, keeps hackers at bay, and can help you on your journey to Zero Trust.

Visit 1.1.1.1 from any device to get started with our free app that makes your Internet faster and safer.

To learn more about our mission to help build a better Internet, start here. If you're looking for a new career direction, check out our open positions.

Malware Network Z3 BPF Reverse Engineering