Payload Obfuscation for Red Teams

Instructor:	Duncan Ogilvie
Duration:	2 days
Format:	On-site training with lectures and guided exercises.
Price:	TBD
Registration:	training@ogilvie.pl

Description

Payload obfuscation can move sensitive logic out of native instruction streams and into a virtual execution environment. This training teaches participants how VM-based obfuscation works, how to compile payload logic to RISC-V, and how to execute that code inside a compact interpreter embedded in a host process.

The course starts from first principles with a small custom VM. Participants reverse a bytecode program, identify opcode handlers, write bytecode by hand, inspect interpreter dispatch, and compare a simple switch-based VM with a direct-threaded variant. This gives participants a concrete model for why virtualization raises reverse-engineering cost and why writing bytecode manually does not scale.
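
As a rough picture of what a switch-based VM of this kind looks like, the sketch below interprets a tiny fixed-width bytecode. The opcode names, the four-word instruction layout, and the names vm_context, vm_run, and ternary_prog are assumptions made for this illustration; they are not taken from the course's minivm.cpp.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not the encoding used by minivm.cpp. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_JEQ = 3 };

/* VM context: virtual registers plus a program counter (a word index). */
typedef struct {
    int32_t regs[4];
    size_t pc;
} vm_context;

/* Every instruction is four words: opcode, a, b, c. */
#define INSN_WORDS 4

/* Bytecode for: r0 = (r1 == 42) ? 1337 : 0 */
static const int32_t ternary_prog[] = {
    OP_LOADI, 0, 0,    0,  /* word  0: r0 = 0                     */
    OP_JEQ,   1, 42,   12, /* word  4: if (r1 == 42) goto word 12 */
    OP_HALT,  0, 0,    0,  /* word  8: stop, r0 is still 0        */
    OP_LOADI, 0, 1337, 0,  /* word 12: r0 = 1337                  */
    OP_HALT,  0, 0,    0   /* word 16: stop                       */
};
#define TERNARY_LEN (sizeof(ternary_prog) / sizeof(ternary_prog[0]))

/* Interprets code with r1 preset to input; returns r0, or -1 on a bad opcode. */
int32_t vm_run(const int32_t *code, size_t len, int32_t input)
{
    vm_context ctx = { { 0, input, 0, 0 }, 0 };

    while (ctx.pc < len && len - ctx.pc >= INSN_WORDS) {
        const int32_t *insn = &code[ctx.pc];
        size_t a = (size_t)(insn[1] & 3); /* register operands are masked to 0..3 */

        switch (insn[0]) { /* switch-based dispatch: one case per handler */
        case OP_LOADI: /* regs[a] = immediate b */
            ctx.regs[a] = insn[2];
            ctx.pc += INSN_WORDS;
            break;
        case OP_ADD: /* regs[a] += regs[b], wrapping on overflow */
            ctx.regs[a] = (int32_t)((uint32_t)ctx.regs[a] +
                                    (uint32_t)ctx.regs[insn[2] & 3]);
            ctx.pc += INSN_WORDS;
            break;
        case OP_JEQ: /* jump to word c when regs[a] equals immediate b */
            ctx.pc = (ctx.regs[a] == insn[2]) ? (size_t)insn[3]
                                              : ctx.pc + INSN_WORDS;
            break;
        case OP_HALT:
            return ctx.regs[0];
        default:
            return -1;
        }
    }
    return ctx.regs[0]; /* ran off the end of the bytecode */
}
```

Calling vm_run(ternary_prog, TERNARY_LEN, 42) walks the jump-taken path, while any other input falls through to the first halt. Even this five-instruction program needs hand-computed jump targets, which hints at why hand-written bytecode does not scale.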

The second part introduces RISC-V as a practical VM instruction set. Participants learn the RV64 register model, common instructions, calling convention, position-independent shellcode constraints, linker scripts, ELF containers, raw binary extraction, tracing, and disassembly workflow. They compile small C payloads to rv64im, run them in riscvm, and debug failures with traces and instruction references.
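
To give a flavor of the instruction-reference workflow, the sketch below splits a 32-bit RISC-V I-type instruction word into its fields, following the base instruction format in the RISC-V specification. The names rv_itype and rv_decode_itype are made up for this example and do not refer to riscvm code.

```c
#include <stdint.h>

/* Fields of a RISC-V I-type instruction (for example addi or a load). */
typedef struct {
    uint32_t opcode; /* bits 6:0                   */
    uint32_t rd;     /* bits 11:7                  */
    uint32_t funct3; /* bits 14:12                 */
    uint32_t rs1;    /* bits 19:15                 */
    int32_t imm;     /* bits 31:20, sign-extended  */
} rv_itype;

/* Hypothetical helper name; decodes one 32-bit instruction word. */
rv_itype rv_decode_itype(uint32_t insn)
{
    rv_itype d;
    d.opcode = insn & 0x7f;
    d.rd = (insn >> 7) & 0x1f;
    d.funct3 = (insn >> 12) & 0x7;
    d.rs1 = (insn >> 15) & 0x1f;
    d.imm = (int32_t)(insn >> 20); /* 12-bit value in the range 0..0xfff */
    if (d.imm & 0x800)             /* bit 11 set: the immediate is negative */
        d.imm -= 0x1000;
    return d;
}
```

For instance, the word 0x00150513 decodes to opcode 0x13 with rd = 10, rs1 = 10, and imm = 1, which is addi a0, a0, 1, and 0xff010113 decodes to addi sp, sp, -16.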

The final part builds useful payloads and hardens the VM. Participants study the host/guest memory model, ecall-based syscalls, import resolution, host_call, the LLVM transpiler, the minimal runtime, relocation handling, payload packaging, opcode shuffling, instruction encryption, direct dispatch, C2 integration, and interpreter obfuscation tradeoffs.
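
The ecall boundary can be pictured roughly as follows. This sketch assumes the common RISC-V convention of passing the syscall number in a7 and the first argument and return value in a0. The syscall numbers, the guest_cpu structure, and the function names are hypothetical and chosen only for illustration; riscvm defines its own interface.

```c
#include <stdint.h>
#include <string.h>

/* Standard RISC-V ABI register numbers: a0 = x10, a7 = x17. */
enum { REG_A0 = 10, REG_A7 = 17 };

/* Hypothetical syscall numbers; riscvm defines its own set. */
enum { SYS_EXIT = 1, SYS_STRLEN = 2 };

typedef struct {
    uint64_t regs[32];
    uint64_t pc;
    int exited;
} guest_cpu;

/* Called by the interpreter when it decodes ecall. Returns 0 if handled. */
int handle_ecall(guest_cpu *cpu)
{
    switch (cpu->regs[REG_A7]) {
    case SYS_EXIT:
        cpu->exited = 1; /* the exit code stays in a0 */
        return 0;
    case SYS_STRLEN: {
        /* Shared address space: the guest pointer is usable as a host pointer. */
        const char *s = (const char *)(uintptr_t)cpu->regs[REG_A0];
        cpu->regs[REG_A0] = (uint64_t)strlen(s);
        cpu->pc += 4; /* resume at the instruction after ecall */
        return 0;
    }
    default:
        return -1; /* unknown syscall number */
    }
}

/* Host-side demo: fakes the register state a guest would set up before ecall. */
uint64_t demo_guest_strlen(const char *s)
{
    guest_cpu cpu;
    memset(&cpu, 0, sizeof(cpu));
    cpu.regs[REG_A7] = SYS_STRLEN;
    cpu.regs[REG_A0] = (uint64_t)(uintptr_t)s;
    if (handle_ecall(&cpu) != 0)
        return UINT64_MAX;
    return cpu.regs[REG_A0];
}
```

The point of the design is that the guest never executes host instructions itself: it fills registers, traps out of the VM with ecall, and the host performs the work on its behalf before execution resumes.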

Teaching

The training is exercise-driven. Each lecture block introduces a mechanism that is applied immediately in a lab task. Participants work inside a prepared development environment and build a complete pipeline from C source code to obfuscated RISC-V payload bytes executed by a host VM.

The course goal is pipeline literacy. Participants learn how to inspect each stage, diagnose broken payloads, reason about VM feature mismatches, read the generated RISC-V, and evaluate which obfuscation layers increase analyst effort.

The material is designed for authorized red teams, security researchers, and reverse engineers who need to understand code virtualization from both the builder's and the analyst's perspective.

Learning Objectives

Explain VM-based obfuscation, bytecode, VM contexts, opcode handlers, and interpreter dispatch.
Reverse simple VM bytecode and translate it into C-like pseudocode.
Write basic bytecode programs for arithmetic, conditionals, and loops.
Understand why a well-supported ISA such as RISC-V is useful for payload virtualization.
Read common RV64 instructions, registers, ABI names, branches, loads, stores, and calls.
Compile freestanding C code to rv64im object code with Clang.
Use linker scripts, map files, ELF containers, relocations, and llvm-objcopy to produce raw shellcode bytes.
Debug RISC-V payloads with VM traces, Ghidra, and instruction references.
Understand the shared host/guest memory model used by riscvm.
Implement and use VM syscalls through the RISC-V ecall convention.
Resolve host imports and call host functions from RISC-V with resolve_import and host_call.
Understand how LLVM bitcode extraction and import maps feed the transpiler.
Follow the transpiler pipeline from Windows API C code to RISC-V LLVM IR and shellcode.
Understand the role of crt0, import initialization, relocation processing, minimal CRT functions, and payload startup.
Build and run example payloads through a controlled VM and C2 demo.
Apply VM hardening features such as opcode shuffling, instruction encryption, and direct dispatch.
Understand interpreter-obfuscation options such as LLVM obfuscation passes, native rewriting, junk insertion, opaque predicates, and per-sample variation.
Evaluate limitations such as missing host-to-guest callbacks, restricted C++ support, and the difficulty of translating existing x64 shellcode.

Outline

Day 1: VM Obfuscation and RISC-V Payloads
- Environment setup
  - GitHub Codespaces onboarding
  - Repository fork, machine type selection, and toolchain verification
  - Project tour: exercise_*, riscvm, payload, transpiler, and obfuscator
  - Local Docker workflow for offline use after the training
- VM-based obfuscation model
  - Native execution versus virtualized execution
  - Bytecode, VM contexts, virtual registers, program counters, and handlers
  - Reverse-engineering cost model for custom VMs
  - Non-goals: no turnkey advanced obfuscator and no opaque compiler-theory deep dive
- Mini VM analysis
  - minivm.cpp structure
  - Active bytecode extraction
  - Opcode table recovery
  - Register and context layout
  - Labels, jumps, conditionals, and bytecode patching
  - Switch dispatch versus direct-threaded dispatch
- Mini VM exercises
  - Recover the executed bytecode
  - Document all opcodes and their semantics
  - Translate the bytecode into C pseudocode
  - Write bytecode for a + b, a * b, a - b, and a == 42 ? 1337 : 0
  - Bonus: implement fib(n) and analyze the direct-threaded binary in a disassembler
- RISC-V as a payload VM ISA
  - Why RISC-V fits this use case
  - rv64im scope and disabled compressed instructions
  - Register aliases: zero, ra, sp, temporaries, saved registers, and argument registers
  - Common instructions: addi, mv, sw, lw, add, blt, jal, jalr, and ret
  - Pseudo-instructions and reference documentation workflow
- RISC-V shellcode build pipeline
  - Freestanding C payload structure
  - Clang riscv64 target selection
  - -march=rv64im and -mcmodel=medany
  - Linker script layout for raw shellcode
  - ELF as a temporary container for symbols and relocations
  - llvm-objdump disassembly and llvm-objcopy raw binary extraction
  - Map files and symbol lookup
- RISC-V shellcode exercises
  - Build and run a hello payload in the Linux VM build
  - Run with --trace and inspect the generated trace
  - Load the ELF in Ghidra and comment each instruction
  - Explain what happens when _start returns instead of calling exit
  - Complete a build script that automates compile, link, dump, and run steps
- Host interaction model
  - Host process versus RISC-V guest
  - Shared address space and pointer handling
  - Code, data, heap, and stack layout
  - Why guest code cannot directly execute host instructions
  - VM exit and re-entry through ecall
- VM syscall interface
  - ecall convention and syscall numbers
  - Argument and return registers
  - Debug print syscalls
  - Memory helper syscalls
  - resolve_import for module and function lookup
  - host_call for invoking host functions
  - Syscall tracing and debugging
- Host interaction exercises
  - Recover available syscalls from riscvm.cpp
  - Implement a print_string syscall stub
  - Resolve and call puts from RISC-V code
  - Use host_call to pass arguments into a host function
  - Bonus: create a payload that reads a lab file and displays its contents

Day 2: Transpilation, Runtime, and Hardening
- From handwritten stubs to automated payload builds
  - Limits of writing VM payload code by hand
  - Payload source constraints
  - Windows API-oriented C payloads
  - Debug builds versus hardened builds
- LLVM transpiler pipeline
  - Windows payload compilation with Clang/MinGW
  - Embedding and extracting LLVM bitcode
  - Import map generation from the PE import table
  - LLVM IR transformation from host imports to RISC-V VM runtime calls
  - Pointer-sized argument casting and host_call argument arrays
  - RISC-V LLVM IR emission
  - Object generation, linking, relocation extraction, and raw binary packaging
- Minimal runtime and loader
  - crt0 responsibilities
  - Relocation application at startup
  - Import resolution before main
  - Global constructor initialization
  - main invocation and VM exit
  - Minimal CRT functions for allocation, new/delete, strings, and output
  - Unsupported runtime features such as exceptions and RTTI-heavy C++
- Payload exercises
  - Build riscvm for Windows with tracing enabled
  - Build the payload project
  - Run the hello payload
  - Run the message-box payload through Wine and noVNC or on a Windows host
  - Run the C2 test payload through the controlled demo server
  - Inspect the generated bitcode, import map, RISC-V object, map file, and final payload bytes
- C2 integration patterns
  - Embedding the VM as a library
  - Loading payload bytes from disk, memory, or a network channel
  - HTTP POST demo server workflow
  - Polling versus push-style payload delivery
  - Custom syscall boundary design for a framework
  - Logging, tracing, and controlled lab execution
- VM hardening overview
  - Feature flags and payload/VM compatibility checks
  - Layering obfuscation features safely
  - Debuggability tradeoffs when tracing is disabled
  - Static signatures against an unmodified interpreter
- Instruction encryption
  - Whole-payload encryption versus decode-time instruction encryption
  - Position-dependent keys derived from the program counter
  - Fetch-time decryption inside riscvm_fetch
  - Feature metadata appended to protected payloads
  - What encryption hides and what memory dumping can still recover
- Opcode shuffling
  - Primary opcode, funct3, and funct7 remapping
  - opcodes.json, generated headers, and shuffled payload bytes
  - Keeping the interpreter and payload mapping synchronized
  - Breaking standard RISC-V disassemblers
  - Per-sample randomization and its effect on custom tooling
- Direct dispatch and interpreter control flow
  - Switch-based dispatch recognition
  - Direct-threaded dispatch with computed targets
  - Handler-to-handler jumps
  - Performance and reverse-engineering impact
  - Compiler flags used to preserve the desired dispatch shape
- Hardening exercises
  - Build riscvm with hardening enabled
  - Verify that encrypted payloads still execute
  - Generate a new opcode shuffling map
  - Compare traces and disassembly before and after hardening
  - Bonus: add a new lab payload to the build configuration
- Interpreter obfuscation and signature resistance
  - Why a static VM interpreter is easy to fingerprint
  - Handler-level obfuscation goals
  - LLVM-based obfuscator options
  - Native rewriting with junk instructions and opaque predicates
  - Liveness checks and behavior-preserving rewrites
  - Environment keying and custom feature gates
- Limitations and design tradeoffs
  - No automatic translation of existing x64 shellcode
  - No host-to-guest callbacks without additional stubs
  - Limited C++ runtime support
  - Host API calls remain observable behavior
  - VM size, speed, compatibility, and analysis-cost tradeoffs
  - Follow-up paths for custom syscalls, callbacks, stronger interpreter rewriting, and defensive deobfuscation tooling
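
The instruction encryption topic in the Day 2 outline (position-dependent keys and fetch-time decryption) can be sketched as below. The key-mixing function, the base key, and all function names are assumptions made for this illustration and are not riscvm's actual scheme; the only idea carried over from the outline is that the key for each instruction word depends on its program counter and is applied at fetch time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key derivation: mixes a base key with the program counter. */
static uint32_t key_for_pc(uint32_t base_key, uint64_t pc)
{
    uint32_t k = base_key ^ (uint32_t)pc ^ (uint32_t)(pc >> 32);
    k *= 0x9E3779B1u; /* odd multiplier, so distinct inputs stay distinct */
    return k ^ (k >> 15);
}

/* Build step: encrypts each 32-bit instruction word with its own key. */
void protect_payload(uint32_t *words, size_t count, uint32_t base_key)
{
    for (size_t i = 0; i < count; i++)
        words[i] ^= key_for_pc(base_key, (uint64_t)i * 4);
}

/* Interpreter step: decrypts only the word being fetched; memory stays encrypted. */
uint32_t fetch_decrypted(const uint32_t *words, uint64_t pc, uint32_t base_key)
{
    return words[pc / 4] ^ key_for_pc(base_key, pc);
}

/* Demo: returns 1 when identical plaintext words encrypt differently
   and both decrypt back to the original at fetch time. */
int demo_roundtrip(void)
{
    uint32_t words[2] = { 0x00150513u, 0x00150513u }; /* addi a0, a0, 1 twice */
    const uint32_t key = 0xC0FFEE01u;                 /* arbitrary example key */

    protect_payload(words, 2, key);
    return words[0] != words[1]
        && fetch_decrypted(words, 0, key) == 0x00150513u
        && fetch_decrypted(words, 4, key) == 0x00150513u;
}
```

Because the payload is never decrypted in bulk, a memory dump of the payload region shows only ciphertext, but an analyst who recovers the fetch routine and key can still decrypt every word, which is the tradeoff the outline raises.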

Requirements and Recommendations

Prerequisites

Participants should be familiar with:

C programming. This is required for the hands-on payload exercises.
Basic reverse engineering concepts.
Assembly at a modest level. Prior RISC-V experience is not required.
Python basics for build scripts and helper tooling.
Command-line workflows with CMake, Clang, and common LLVM tools.

Helpful but optional:

LLVM IR familiarity.
Windows API experience.
Prior exposure to VMProtect, Themida, OLLVM, or similar obfuscation systems.

Workstation Requirements

Each participant needs their own workstation. The prepared environment requires:

A browser.
A free personal GitHub account.
Access to GitHub Codespaces during the training.

The exercises can also be run with Docker after the training. A Windows VM or host is useful for follow-up testing, but it is not needed for the workshop labs, because the prepared Codespaces environment uses Wine and noVNC.

Classroom Requirements

The training is delivered on-site only. A dedicated classroom with a projector is required. The training uses a collaborative format with frequent questions, live troubleshooting, and shared exercise discussion.

Instructor

Duncan Ogilvie is the creator of x64dbg and co-author of RISC-Y Business: Raging against the reduced machine. He has professional experience in DRM, mobile security, reverse engineering, and binary tooling. The course materials focus on practical VM internals, transparent build pipelines, and the tradeoffs between obfuscation strength, debuggability, and analyst effort.