The second part of a three-part series on Windows x64 Shellcoding, going from a basic Hello World program in Assembly to implementing guardrails in MSFVenom shellcode.
Part 1 – Assembly and WinAPIs – Hello World
Part 2 – Assembly to Shellcode – Stack Buffers and WinAPIs
Part 3 – Adding extra functionality to MSFVenom
Previously we covered how to write a basic Hello World program in x64 ASM, and showed the issues that we need to overcome to convert our Assembly code into Shellcode. These issues are:
- Handling variables manually on the Stack
- Calling Win API functions without relying on linking
Stack Strings
To demonstrate variable handling on the Stack we’ll start with Stack strings, then continue on to Stack buffers. For Stack strings we will use the MessageBoxA function.
int MessageBoxA(
[in, optional] HWND hWnd,
[in, optional] LPCSTR lpText,
[in, optional] LPCSTR lpCaption,
[in] UINT uType
);
After the first part, you can hopefully now modify the code to call the new MessageBoxA function. If we call this function with 0 for all four arguments, we get an empty message box with the default title Error.
mov rcx, 0
mov rdx, 0
mov r8, 0
mov r9, 0
call MessageBoxA
The second argument, passed in the rdx register, takes a pointer to the string displayed within the message box. One simple way of handling this is to push the string onto the stack in 64-bit (8-byte) chunks, and then assign the address of the stack string to a register. Because of the endianness and structure of the stack, string arguments need to be written in a rather annoying format. For example, the string AAAABBBBCCCDDDD (notice the three C’s, this is intentional) would usually be read as 0x414141414242424243434344444444. However, to preserve the order of the string in memory, each chunk has its bytes reversed and the last chunk is pushed first, so the values are pushed onto the stack in the following order.
0x44444444434343
0x4242424241414141
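To see why the chunks are reversed and the tail is pushed first, the effect of the two pushes can be modelled in a few lines of Python. This is only a sketch: simulate_pushes is a hypothetical helper, and it models the stack as a plain byte string rather than real memory.

```python
def simulate_pushes(values):
    """Return the bytes found at rsp after pushing each 64-bit value in order.

    Each push stores its value little-endian at a lower address than the
    previous one, so the last value pushed ends up first in memory.
    """
    memory = b""
    for value in values:
        memory = value.to_bytes(8, "little") + memory
    return memory


# Push order from the text: the tail of the string first, the head last.
stack = simulate_pushes([0x44444444434343, 0x4242424241414141])
print(stack)  # b'AAAABBBBCCCDDDD\x00'
```

The unused top byte of the shorter value is zero, which is what terminates the string here.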
Another thing to remember is stack alignment. Every push decrements the stack pointer by 8 bytes, so a single push needs an additional 8 bytes subtracted from the stack pointer to keep the stack 16-byte aligned. If we push two values to the stack, the 16-byte alignment works itself out naturally. Therefore the MessageBoxA call could look something like this:
mov rcx, 0
mov rdx, 0x44444444434343
push rdx
mov rdx, 0x4242424241414141
push rdx
mov rdx, rsp
mov r8, 0
mov r9, 0
call MessageBoxA
It’s easy to make mistakes with the stack string format. While I would recommend trying a few different strings manually to learn how it works, the Python script below handles the formatting of the bytes for you.
# run with python heck.py | tac
msg = "Oh heck! A wild message box appeared!"
for i in range(0, len(msg), 8):
    rmsg = msg[i:i+8][::-1]
    print("0x" + ''.join(
        [
            format(ord(ch), '02x')
            for ch in rmsg
        ]
    ))
Putting this together, we get the following ASM code. Note the single sub rsp, 8 instruction, which is needed because of the odd number of push instructions.
start:
sub rsp, 40 ; reserve shadow space
mov rdx, 0x2164657261
push rdx
mov rdx, 0x6570706120786f62
push rdx
mov rdx, 0x206567617373656d
push rdx
mov rdx, 0x20646c6977204120
push rdx
mov rdx, 0x216b63656820684f
push rdx
mov rdx, rsp
mov rcx, 0
mov r8, 0
mov r9, 0
sub rsp, 8
call MessageBoxA
xor rcx, rcx
call ExitProcess
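The mov/push sequence in a listing like the one above can also be generated rather than typed by hand. The sketch below is one possible approach, not part of the original tooling: stack_string_asm is a hypothetical helper, it only handles ASCII text, and it assumes the stack is 16-byte aligned before the first push.

```python
def stack_string_asm(text, reg="rdx"):
    """Emit NASM-style lines that build `text` on the stack, ending with reg = rsp."""
    data = text.encode("ascii")
    chunks = [data[i:i + 8] for i in range(0, len(data), 8)]
    lines = []
    # A final chunk of exactly 8 bytes leaves no room for the terminating
    # NUL, so push a zero qword first in that case.
    if len(data) % 8 == 0:
        lines += [f"xor {reg}, {reg}", f"push {reg}"]
    # Push the last chunk first so the string reads forwards from rsp.
    for chunk in reversed(chunks):
        value = int.from_bytes(chunk, "little")
        lines += [f"mov {reg}, 0x{value:x}", f"push {reg}"]
    lines.append(f"mov {reg}, rsp")
    pushes = sum(1 for line in lines if line.startswith("push"))
    # An odd number of 8-byte pushes moves rsp off a 16-byte boundary.
    if pushes % 2:
        lines.append("sub rsp, 8")
    return lines


print("\n".join(stack_string_asm("Oh heck! A wild message box appeared!")))
```

For this message the helper produces five mov/push pairs followed by mov rdx, rsp and sub rsp, 8. The remaining argument registers and the call itself still need to be written by hand.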
Stack Buffers
Now onto something a little bit more complicated, Stack Buffers. After learning and reading a few tutorials on ASM, I was a bit disappointed as the functionality I was implementing was just message boxes or similar style functions. So to do something a little more interesting, let’s use GetComputerNameA
to get the hostname of our machine. A brief look at the documentation and this function initially looks very simple to implement.
BOOL GetComputerNameA(
[out] LPSTR lpBuffer,
[in, out] LPDWORD nSize
);
Perhaps we could use a stack string, but that just feels messy. What we need instead is a stack buffer: an area of memory reserved on the stack for us to write the hostname to. Starting with the following template, similar to the Hello World program from Part 1, we need to create a stack buffer and then call GetComputerNameA.
start:
sub rsp, 40 ; reserve shadow space
mov rcx, -11
call GetStdHandle
mov rbx, rax ; store into rbx for later
; TODO: create stack buffer
; TODO: call GetComputerNameA
mov rdx, <stackBuffer>
mov rcx, rbx ; store stdout handle to rcx
mov r8, 6 ; length of my hostname
mov r9, 0
mov qword [rsp+32], 0
call WriteFile
xor rcx, rcx
call ExitProcess
Another problem we now face is that we’re handling much more data in our registers and on the stack. We need to make sure that our stack buffer and GetComputerNameA function call don’t destroy our reference to the output handle from GetStdHandle, so we store the output handle in a non-clobbered register, i.e. rbx.
Next we create a stack buffer, by which I mean a space reserved on the stack for us to store our computer name. We do this by first decrementing the stack pointer by however much space we wish to reserve. In this example I will use 40, so the new stack pointer sits 40 bytes below its previous value. This address is then copied into another register that will be used to point to our stack buffer, e.g. mov rcx, rsp. We then add 40 back to rcx, so that it holds the address the stack pointer had before the subtraction, and use that as the pointer to our buffer. Finally we store rcx into r12, a non-clobbered register, because after we call GetComputerNameA our reference in rcx will likely be destroyed. The code for creating a stack buffer is below.
sub rsp, 40 ; create space for stack buffer
mov rcx, rsp
add rcx, 40 ; get stack buffer pointer
mov r12, rcx ; backup stack buffer
There are much nicer and more efficient ways of handling this, but I think it’s easier to explain for the first time when being as excessive as possible.
If you experiment with different values for the stack buffer you may run into some strange issues. Remember, alignment is always a priority here. Decrementing the stack pointer by 40 means that we need to adjust the stack by a further 8 bytes to preserve 16-byte alignment. We do this with a push instruction later on, so it is handled for us and we don’t need an explicit additional sub rsp, 8 instruction.
If we used 80 instead of 40, however, 80 is already a multiple of 16 (80 mod 16 == 0), so the subtraction alone would not disturb the alignment. However, our future code (as discussed previously and shown below) will include a push, so if using 80 as our stack buffer size we would need to add an additional sub rsp, 8 instruction, or rewrite our push instruction as something else, to ensure the stack is 16-byte aligned.
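The alignment arithmetic for the 40 and 80 cases can be checked with a small model. This is a sketch under one assumption: on entry to start, rsp is 8 bytes past a 16-byte boundary, because a return address has just been pushed onto an aligned stack. rsp_offset_at_call is a hypothetical helper name.

```python
def rsp_offset_at_call(ops, entry_offset=8):
    """Return rsp modulo 16 after applying ops; 0 means a call is aligned.

    ops is a list of ("sub", n), ("add", n) or ("push",) tuples.
    entry_offset=8 assumes the return address has just been pushed onto a
    16-byte aligned stack, which is the state on entry to a function.
    """
    offset = entry_offset
    for op in ops:
        if op[0] == "sub":
            offset -= op[1]
        elif op[0] == "add":
            offset += op[1]
        elif op[0] == "push":
            offset -= 8
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    return offset % 16


shadow = ("sub", 40)  # the shadow space reserved at the top of start
print(rsp_offset_at_call([shadow, ("sub", 40), ("push",)]))              # 0
print(rsp_offset_at_call([shadow, ("sub", 80), ("push",)]))              # 8
print(rsp_offset_at_call([shadow, ("sub", 80), ("push",), ("sub", 8)]))  # 0
```

A result of 0 means the stack is aligned at the point of the call. The 80-byte buffer only reaches 0 once the extra sub rsp, 8 is added.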
Anyway, our GetComputerNameA call takes two arguments: rcx, the pointer to the stack buffer, which we have just set up, and rdx, a pointer to a DWORD that holds the size of the buffer on input and receives the length of the computer name on output. My computer name, for context, is jwin11. Also note how we’re handling the rdx argument; this is what I was referring to earlier. This is why we don’t need to align the stack manually when using a stack buffer size of 40: the stack is re-aligned by the push instruction.
mov rax, 0xff
push rax
mov rdx, rsp
call GetComputerNameA
Usually, as seen with other functions, e.g. GetStdHandle, the return value is stored in rax. While the return value from GetComputerNameA is still stored in rax, that value is a boolean indicating whether the function call succeeded, which we don’t really care about. The value we care about is stored in our stack buffer. So rcx, right? No. As I mentioned previously, our rcx value is likely clobbered and our pointer to the stack buffer is gone. This is the reason why we stored it in r12 before the GetComputerNameA call.
So now we can restore our stack buffer address from r12, but this time put it into rdx, as the WriteFile call requires that the second argument point to the data to be written. The final code for using Stack buffers with GetComputerNameA should look something like this.
start:
sub rsp, 40 ; reserve shadow space
mov rcx, -11
call GetStdHandle
mov rbx, rax ; store handle in rbx
sub rsp, 40 ; create space for stack buffer
mov rcx, rsp
add rcx, 40 ; get stack buffer pointer
mov r12, rcx ; backup stack buffer
mov rax, 0xff
push rax
mov rdx, rsp
call GetComputerNameA
mov rdx, r12
mov rcx, rbx
mov r8, 6
mov r9, 0
mov qword [rsp+32], 0
call WriteFile
xor rcx, rcx
call ExitProcess
Now we can remove the .data
section from our ASM, and move onto the final blocker for getting our shellcode working.
Walking the Process Environment Block
I learnt the following technique from the links below. I highly recommend reviewing these before continuing, especially the Nytro Security blog, as I will be extending Nytro’s code in the remainder of this tutorial.
- Nytro Security – Writing shellcodes for Windows x64
- Windows Internals: Walking the Process Environment Block to Discover In-Memory Libraries
- Finding Kernel32 Base and Function Addresses in Shellcode
So the situation we’re in is that we want to call Windows API functions, but we are not able to link to functions during compilation and we don’t know where in memory the functions exist.
Although I just said we don’t know where Windows APIs are, there are two interesting Windows API functions we would like to find:
- LoadLibraryA
- GetProcAddress
The reason we want these functions specifically is that LoadLibraryA allows us to load a module of our choosing, e.g. User32.dll, into the address space of our process, and GetProcAddress allows us to find specific functions within the loaded module. For example, the logic may look like this:
GetProcAddress(LoadLibraryA("User32.dll"), "MessageBoxA")
So we need to find these two functions, which both exist in Kernel32.dll. To do this, we can walk the Process Environment Block (PEB). Simply put, the PEB is unique to each process and stores information about the process’s execution environment. As described further in the links above, we can walk the PEB to locate the address of Kernel32.dll and therefore our two required functions.
TEB->PEB->Ldr->InMemoryOrderModuleList->currentProgram->ntdll->kernel32.DllBase
This is possible because InMemoryOrderModuleList is a doubly linked list of LDR_DATA_TABLE_ENTRY structures, each of which contains the base address and the name of a loaded DLL. With this information we can walk the PEB to find the addresses of the DLLs we’re searching for. Combine that with our Kernel32 functions from earlier, and we have access to Win APIs.
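The list walk itself is simple once the structures are located. The sketch below is a pure-Python model of the logic only: it does not read real process memory, ModuleEntry is a simplified stand-in for LDR_DATA_TABLE_ENTRY with hypothetical field names, and the base addresses are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleEntry:
    """Simplified stand-in for LDR_DATA_TABLE_ENTRY (hypothetical fields)."""
    name: str
    dll_base: int
    flink: "ModuleEntry" = field(default=None, repr=False)
    blink: "ModuleEntry" = field(default=None, repr=False)


def build_module_list(modules):
    """Link entries into a circular doubly linked list behind a list head."""
    head = ModuleEntry(name="", dll_base=0)
    head.flink = head.blink = head
    for name, base in modules:
        entry = ModuleEntry(name, base, flink=head, blink=head.blink)
        head.blink.flink = entry
        head.blink = entry
    return head


def find_module_base(head, wanted):
    """Follow flink pointers from the head until the name matches."""
    entry = head.flink
    while entry is not head:
        if entry.name.lower() == wanted.lower():
            return entry.dll_base
        entry = entry.flink
    return None


# Made-up addresses, in the order described in the text.
head = build_module_list([
    ("show_hostname.exe", 0x140000000),
    ("ntdll.dll", 0x7FFA10000000),
    ("KERNEL32.DLL", 0x7FFA0F000000),
])
print(hex(find_module_base(head, "kernel32.dll")))
```

In shellcode the same walk is done with raw pointer offsets instead of named fields, which is what the linked posts cover in detail.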
So assuming that you’ve read and implemented Nytro’s code to get access to specific Win API functions, we can now extend it with the use of stack buffers to call GetComputerNameA and then display the value in MessageBoxA.
Writing Shellcode
We’re going to implement the following seven areas using Nytro’s code as a base.
- Get Kernel32.dll address
- Get GetComputerNameA address
- Create a stack buffer for our hostname
- Call GetComputerNameA
- Get User32.dll address
- Get MessageBoxA address
- Call MessageBoxA with our stack buffer
Nytro already created a backup of the Kernel32 base address in rbx, so we already have the address. Next we need the address of GetComputerNameA. GetProcAddress is already stored in rdi, so we can make some minor modifications to the ASM used to call GetProcAddress to get the address of GetComputerNameA.
xor rcx, rcx
push rcx
mov rcx, 0x41656d614e726574
push rcx
mov rcx, 0x75706d6f43746547
push rcx
mov rdx, rsp ; GetComputerNameA
mov rcx, rbx ; kernel32.dll base address
sub rsp, 0x28
call rdi ; Call GetProcAddress
add rsp, 0x28
add rsp, 0x18
mov r15, rax ; GetComputerNameA in r15
Next we can create our stack buffer as shown earlier, once again using 40 because the GetComputerNameA call afterwards uses a push instruction.
sub rsp, 40
mov r14, rsp
add r14, 40
mov rcx, r14
mov rdx, 0xff
push rdx
mov rdx, rsp
call r15
Nytro’s blog already includes a code block for fetching the address of User32.dll
, so we can execute that block without any modifications. After getting the address of User32.dll
we can then call GetProcAddress
with the argument for MessageBoxA
.
xor rcx, rcx
push rcx
mov rcx, 0x41786f
push rcx
mov rcx, 0x426567617373654d
push rcx
mov rdx, rsp ; MessageBoxA
mov rcx, r15 ; User32.dll base address
sub rsp, 0x28
call rdi ; Call GetProcAddress
add rsp, 0x28
add rsp, 0x18
mov r15, rax ; Store MessageBoxA into r15
We then take our stack buffer pointer, stored in the non-clobbered register r14, and move it into rdx for the MessageBoxA call.
mov rcx, 0
mov rdx, r14
mov r8, 0
mov r9, 0
call r15 ; MessageBoxA
Then add an ExitProcess call at the end, compile, and we should have an executable that displays a message box with our hostname as the message.
Now that we have removed the .data section and external functions, we can go back to objdump and extract our shellcode. The following one-liner extracts the shellcode when run against the compiled executable. Source.
objdump show_hostname.exe -d | grep '[a-f0-9]:' | grep -v 'file' | cut -f2 -d: | cut -f1-7 -d' ' | tr -s ' ' | tr '\t' ' ' | sed 's/ $//g' | sed 's/ /\\x/g' | paste -d '' -s
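If the shell pipeline is awkward to use, the same extraction can be sketched in Python. This is an illustration rather than a drop-in replacement: shellcode_from_objdump is a hypothetical helper, it assumes the tab-separated address, bytes and mnemonic layout of objdump -d output, and the sample text is hand-written in that shape rather than captured from the real executable.

```python
import re

# An instruction line starts with an indented hex address and a colon.
ADDRESS = re.compile(r"^\s*[0-9a-f]+:$")


def shellcode_from_objdump(disassembly):
    """Collect the hex byte column of each instruction line as \\xNN escapes."""
    hex_bytes = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Skip the file format banner, section labels and blank lines.
        if len(fields) < 2 or not ADDRESS.match(fields[0]):
            continue
        hex_bytes.extend(fields[1].split())
    return "".join(f"\\x{b}" for b in hex_bytes)


# Hand-written sample in the shape of objdump -d output.
sample = (
    "show_hostname.exe:     file format pei-x86-64\n"
    "\n"
    "0000000140001000 <.text>:\n"
    "   140001000:\t48 83 ec 28          \tsub    $0x28,%rsp\n"
    "   140001004:\t48 31 c9             \txor    %rcx,%rcx\n"
)
print(shellcode_from_objdump(sample))  # \x48\x83\xec\x28\x48\x31\xc9
```

Because it reads the whole byte column, this version does not depend on a fixed number of bytes per line.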
Next we can add this into a basic shellcode runner, for example injecting into a local process by using the first example here, also provided below.
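#include "Windows.h"
#include <string.h> // memcpy

int main()
{
    // Paste the \x-escaped bytes from the objdump one-liner here.
    unsigned char shellcode[] = "";
    // Allocate a read/write/execute region large enough for the shellcode.
    void *exec = VirtualAlloc(
        0,
        sizeof shellcode,
        MEM_COMMIT,
        PAGE_EXECUTE_READWRITE
    );
    // Copy the shellcode into the region and call it as a function.
    memcpy(exec, shellcode, sizeof shellcode);
    ((void(*)())exec)();
    return 0;
}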
Next we can compile the final file with our shellcode.
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\Build\vcvars64.bat"
"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.37.32822\bin\Hostx64\x64\cl.exe" payload.cpp
After compiling, we can then run our final payload with our newly created custom shellcode.
In the next and final part of this series on shellcode development, we will walk through writing shellcode that is actually useful and serves an interesting purpose by implementing custom guardrails into MSFVenom shellcode.
Windows x64 Shellcoding – Part 3