It's Called a VEH-tor
— pbo
Reading through an old GuLoader sample in the decompiler, following the exception handler, trying to understand what it was actually doing, made it clear that my knowledge of Windows exception handling was not structured enough to tackle this kind of obfuscation confidently on another family. I knew the broad strokes, enough to recognize the technique, but not enough to follow it precisely or explain it to someone else.
This is a personal writeup, an attempt to connect the dots properly rather than carry around a vague understanding that works until it does not. It covers the theoretical foundation of SEH and VEH, and how the internal structures look in a debugger and a disassembler.
A lot of what ended up here was things I already had a rough idea of but had never verified properly. Documenting what I learned about exceptions allowed me to refine my grasp of the subject. Nothing revolutionary, just notes from someone who went back to the source and wants to avoid future headaches.
What brought me back to this subject is the analysis of GuLoader, which uses VEH (see the SonicWall, Zscaler and Unit42 articles for deeper malware analysis).
This article is my attempt to write down what I learned properly, starting from the actual concepts rather than jumping straight to the tricks. SEH and VEH are legitimate, well-designed mechanisms. Understanding how they are supposed to work is what makes the abuse readable.
The first part covers the concepts and the API, how the OS dispatches exceptions, how SEH and VEH handlers are registered, and what developers normally use them for. The second part gets into the malware side: how exception handling gets repurposed to hide execution flow. To wrap things up, I decided to test some detection logic. I hacked together a basic implementation in C; while my C skills are definitely still a ‘work in progress,’ the code serves its purpose in demonstrating how to catch this behavior.
If you already know Windows internals well, the first part will mostly be a refresher. If you are coming at this from the analysis side without much background in the underlying mechanism, I hope starting from the foundation makes the second part easier to follow.
Before going further, here are some interesting external resources on VEH in a malware context:
- SonicWall - GuLoader Demystified: Unraveling its Vectored Exception Handler Approach
- Zscaler - Technical Analysis of GuLoader Obfuscation Techniques
- CrowdStrike - Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR in the Evolving
- IBM - You just got vectored – using vectored exception handlers (veh) for defense evasion and process injection
- Unit42 - Tackling Anti-Analysis Techniques of GuLoader and RedLine Stealer
SEH, VEH and a Word on C++ Exceptions #
- What an exception is at the OS level and how Windows dispatches it (brief, just enough to understand the rest)
- SEH: the stack-based chain, per-thread, per-frame, how the compiler owns it for you
- VEH: process-wide, heap-resident, fires before SEH, the two-function API
- The difference with C++ exceptions: try/catch is a language abstraction built on top of SEH, not the same thing, why that distinction matters when you are reading disassembly
The three terms (SEH, VEH and Exception) often get conflated, especially in malware analysis writeups (and especially by me).
What is an exception at the OS level?
When something goes wrong during execution, whether it is a divide by zero, an access to an unmapped memory page,
or an explicit int 3 instruction, the CPU raises an exception.
Control transfers to the kernel, which builds an EXCEPTION_RECORD describing what happened and a CONTEXT structure capturing the full register state at the time of the fault.
Windows then tries to find something in user space that knows how to handle it. That search is what SEH and VEH are about.
Structured Exception Handling #
SEH in x86 #
SEH is the older of the two mechanisms. The idea is straightforward: each function that wants to handle exceptions registers a handler on the stack,
forming a linked list rooted at fs:[0] on x86. When an exception occurs, Windows walks that list from the top, giving each registered handler a chance to deal with it.
If a handler claims the exception, execution resumes. If nothing handles it, the process crashes.
From a developer perspective, SEH is what sits behind __try / __except / __finally in C.
The compiler does most of the work, emitting the registration and cleanup code around the blocks.
On x64 the implementation is different: instead of a runtime chain on the stack, the compiler emits a static table in the .pdata section that the OS uses to unwind.
The surface API looks the same but the mechanics underneath are not. That is still unclear to me…
#include <windows.h>
#include <stdio.h>

int main(void)
{
    __try
    {
        // intentionally trigger an access violation
        int *ptr = NULL;
        *ptr = 42;
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        printf("SEH caught the exception\n");
    }
    return 0;
}
The C file is compiled with this command line: cl.exe /Zi /O1 /GS- test-seh.c. The sections below show the difference between the x86 build, which uses fs:[0], and
the x64 build, which keeps the logic in the .pdata section.
Figure 1: Main function in 32bit environment
On the 32-bit architecture, the handler is “registered” by the first instructions of the main function (see Figure 1), where the
compiler adds the following:
push 8
push offset struc_478178
call j__SEH_prolog
xor eax, eax
What __SEH_prolog does internally is:
- Saves the current fs:[0] value (the previous handler in the chain)
- Builds an EXCEPTION_REGISTRATION_RECORD on the stack
- Points fs:[0] to it, inserting this function into the SEH chain
- Sets up the ms_exc local variable, which is the structure MSVC uses to track the current state of the exception handling frame
The structure struc_478178 is:
Figure 2: struc_478178 content
typedef struct _SCOPETABLE_ENTRY
{
    DWORD EnclosingLevel; // index of the enclosing scope, -1 if none
    PVOID FilterFunc;     // pointer to the filter expression
    PVOID HandlerFunc;    // pointer to the __except or __finally block
} SCOPETABLE_ENTRY;
Looking at the entry <0FFFFFFFFh, offset $LN5, offset catch_except_ptr_42>:
- EnclosingLevel = 0xFFFFFFFF: this is -1, meaning this __try block has no enclosing __try block, it is the outermost one in the function
- FilterFunc = $LN5: this is the compiled form of the filter expression, the code that evaluates EXCEPTION_EXECUTE_HANDLER or whatever condition I would put in the C code __except(...)
- HandlerFunc = catch_except_ptr_42: this is the actual __except block that runs if the filter says to handle the exception
SEH in x64 #
On the 64-bit architecture, the main function looks like this:
Figure 3: x64 decompiled main function
Here, as a first observation, there is no fs:[0] and no __SEH_prolog call. There is no explicit registration at function scope level (from my understanding).
The handler is registered statically through the .pdata structures (I read that part of this data can also be stored in the .rdata section).
The .pdata section stores RUNTIME_FUNCTION structures, each defined by three fields: BeginAddress, EndAddress and UnwindData (the last one is
a pointer to the UNWIND_INFO structure).
When the access violation fires at mov dword ptr [rax], 2Ah (writing 42 to a null pointer), the OS:
- Catches the fault in the kernel
- Comes back to user mode and calls RtlDispatchException
- Takes the faulting RIP, does a binary search in .pdata to find the matching RUNTIME_FUNCTION (the structure that validates this condition: BeginAddress <= FaultyRIP < EndAddress)
- Follows it to the UNWIND_INFO, sees __C_specific_handler as the registered handler
- Calls __C_specific_handler, which walks the C_SCOPE_TABLE, finds the scope covering the faulting address and evaluates the filter main$filt$0
- Filter returns EXCEPTION_EXECUTE_HANDLER, execution jumps to $LN6, which is the __except block calling printf
The “workflow” of the exception is defined as below:
Exception Triggers
-> OS looks up RIP in .pdata
-> Locates RUNTIME_FUNCTION (here stru_140092378)
-> Follows pointer to UNWIND_INFO
-> Calls __C_specific_handler
-> Searches C_SCOPE_TABLE
-> Jumps to $LN6 (my __except block)
Figure 4: .pdata section that holds the RUNTIME_FUNCTION for my exception in the main function
IDA labeled it ExceptionDir because it is the first entry in the exception directory. The three fields map directly to the main function:
- rva main is the start address of the function, 0x140007250
- rva byte_14000727E is the end address of the main function
- rva stru_140092378 is the pointer to the UNWIND_INFO structure, the one that contains __C_specific_handler and the C_SCOPE_TABLE
The structure is as follows:
Figure 5: IDA view of the structure stru_140092378
stru_140092378 is the UNWIND_INFO structure that the .pdata entry for main points to. It is made of three parts:
- The UNWIND_INFO_HDR is the header. It describes the prologue of the function.
- The UNWIND_CODE entries are the actual unwinding instructions.
- After the unwind codes, because UNW_FLAG_EHANDLER was set, comes the exception handler pointer pointing to __C_specific_handler, followed by the C_SCOPE_TABLE (which is a bit different from the structure for x86). That table is where the actual exception handling logic is described: which address range is covered by the __try block, which function to call as the filter, and where to redirect execution if the filter decides to handle the exception.
In x64 the C_SCOPE_TABLE_ENTRY structure is defined as:
typedef struct _C_SCOPE_TABLE_ENTRY {
    uint32_t BeginAddress;   // RVA of the start of the __try block
    uint32_t EndAddress;     // RVA of the end of the __try block
    uint32_t HandlerAddress; // RVA of the filter or __finally handler
    uint32_t JumpTarget;     // RVA of the __except block, 0 for __finally
} C_SCOPE_TABLE_ENTRY;
One structure, three responsibilities: unwind the stack, find the handler, map the guarded region.
So the definition of the structure is:
typedef struct _UNWIND_INFO
{
    BYTE VersionAndFlags;        // UNWIND_INFO_HDR - version + flags (UNW_FLAG_EHANDLER etc.)
    BYTE SizeOfProlog;           // UNWIND_INFO_HDR - prologue size in bytes
    BYTE CountOfCodes;           // UNWIND_INFO_HDR - number of UNWIND_CODE slots
    BYTE FrameRegisterAndOffset; // UNWIND_INFO_HDR - frame register + offset
    UNWIND_CODE UnwindCodes[];   // variable length array, CountOfCodes entries
                                 // padded to 4 byte alignment
    // only present if flags contain UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER
    DWORD ExceptionHandlerRVA;   // rva j___C_specific_handler
    // handler specific data, depends on which handler is used
    // for __C_specific_handler this is the C_SCOPE_TABLE
    C_SCOPE_TABLE ScopeTable;
} UNWIND_INFO;
At the end of the UNWIND_INFO (if UNW_FLAG_EHANDLER or UNW_FLAG_UHANDLER is set), there is an extra field
called the ExceptionHandler. For functions that use __try / __except or __finally compiled with MSVC, this points to __C_specific_handler (C++ try / catch frames use the __CxxFrameHandler family instead).
Link to Microsoft documentation
Vectored Exception Handling #
VEH was introduced in Windows XP and works differently. Instead of being tied to the stack,
VEH handlers are registered at the process level and stored in a list maintained by ntdll. The vectored handler list is consulted before SEH.
If any VEH handler claims the exception, the SEH chain is never walked at all.
The API is simple. A handler is registered with AddVectoredExceptionHandler, which takes a flag indicating whether the handler should be first or last in the list,
and a pointer to the handler function. The handler receives an EXCEPTION_POINTERS structure giving it access to both the EXCEPTION_RECORD and the CONTEXT.
It then returns either EXCEPTION_CONTINUE_EXECUTION to resume execution, or EXCEPTION_CONTINUE_SEARCH to pass to the next handler.
There is also a sibling mechanism called Vectored Continue Handlers, registered with AddVectoredContinueHandler, which fires after a handler has already claimed the exception.
I did not exercise this path in the article.
#include <windows.h>
#include <stdio.h>

LONG CALLBACK MyVectoredHandler(PEXCEPTION_POINTERS ExceptionInfo)
{
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
    {
        printf("VEH caught an access violation at 0x%p\n",
               ExceptionInfo->ExceptionRecord->ExceptionAddress);
        // move RIP past the faulting instruction (could be wrapped with macro for 32bit with eip)
        // 6 bytes assumes the store is encoded as mov dword ptr [rax], 2Ah (C7 00 2A 00 00 00);
        // check the disassembly of your own build before reusing this value
        ExceptionInfo->ContextRecord->Rip += 6;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

int main(void)
{
    PVOID handler = AddVectoredExceptionHandler(1, MyVectoredHandler);

    // intentionally trigger an access violation
    int *ptr = NULL;
    *ptr = 42;

    RemoveVectoredExceptionHandler(handler);
    return 0;
}
Here registration is explicit. The first argument to AddVectoredExceptionHandler being 1 means this handler goes to the front of the list,
so it fires before any other VEH handler and before SEH. The handler inspects the exception code, adjusts RIP to skip past the faulting instruction,
and returns EXCEPTION_CONTINUE_EXECUTION to resume. If the exception is not one it cares about, it returns EXCEPTION_CONTINUE_SEARCH to let the next handler in the chain take over.
The key difference to notice: in the SEH example the handler is scoped to the __try block and the stack frame it lives in.
In the VEH example the handler is active process-wide from the moment it is registered until RemoveVectoredExceptionHandler is called, regardless of which function is currently executing.
C++ Exceptions are not the Same Thing #
This one trips people up. When you write try / catch in C++, you are using the C++ exception model, which is a language-level abstraction.
Under the hood on Windows, the compiler implements it on top of SEH, using a special SEH filter to match C++ exception types.
But they are not the same layer. A C++ catch block is not an SEH handler, and it is definitely not a VEH handler.
The reason this distinction matters in practice is that when you are reversing a sample and you see AddVectoredExceptionHandler being called, you are not looking at a compiler artifact.
There is no language feature that emits that call for you. It is explicit, intentional code, and whoever wrote it made a deliberate
choice to intercept exceptions at the process level before anything else gets a chance to see them.
If you are interested in C++ exceptions, I highly encourage you to read C++ Unwind Exception Metadata: A Hidden Reverse Engineering Bonanza written by Rolf Rolles.
How the VEH List is Built and Stored #
The VEH list is a doubly-linked list maintained per-process in user-mode memory, managed by ntdll.dll. It holds pointers to registered PVECTORED_EXCEPTION_HANDLER callbacks.
AddVectoredExceptionHandler itself is only a thin wrapper that forwards to RtlAddVectoredExceptionHandler in ntdll.dll.
That is where the actual work happens, and it is worth understanding what that function does with the handler pointer.
Ntdll maintains two doubly linked lists for exception handling, one for vectored exception handlers and one for vectored continue handlers.
Both lists are anchored by a single global structure that lives inside ntdll’s data segment, commonly referred to as LdrpVectorHandlerList in debugging sessions.
The structure looks roughly like this:
typedef struct _VECTORED_HANDLER_LIST
{
    SRWLOCK    Lock;     // slim reader/writer lock protecting the list
    LIST_ENTRY VEHList;  // head of the vectored exception handler list
    LIST_ENTRY VCHList;  // head of the vectored continue handler list
} VECTORED_HANDLER_LIST;
Each registered handler is wrapped in a node that gets allocated on the heap:
typedef struct _VECTORED_EXCEPTION_NODE
{
    LIST_ENTRY ListEntry;       // links to previous and next node
    PVOID      EncodedHandler;  // the function pointer, but encoded
    ULONG      ReferenceCount;
} VECTORED_EXCEPTION_NODE;
The LIST_ENTRY is the standard Windows doubly linked list structure, with a Flink pointing to the next node and a Blink pointing to the previous one.
The list head in LdrpVectorHandlerList acts as the sentinel node, so walking from VEHList.Flink until you loop back to the head gives you every registered handler in order.
RtlAddVectoredExceptionHandler does the following (in order):
- Allocates a VECTORED_EXCEPTION_NODE on the process heap with RtlAllocateHeap
- Encodes the function pointer using RtlEncodePointer before storing it in EncodedHandler
- Acquires an exclusive lock on the SRWLOCK in LdrpVectorHandlerList
- Inserts the node either at the front or at the back of the list depending on the first parameter you passed
- Releases the lock
- Returns the address of the node as the handle you use later to remove it
The first parameter is documented as ULONG First. A non-zero value puts the handler at the head of the list, meaning it will be called before any previously registered handler. Zero puts it at the tail.
When an exception occurs, after the kernel-side handling and the transition back to user mode,
ntdll calls RtlDispatchException. Before touching SEH, it acquires a shared lock on LdrpVectorHandlerList and walks the VEH list from head to tail.
For each node it decodes the handler pointer and calls it with the EXCEPTION_POINTERS structure.
If a handler returns EXCEPTION_CONTINUE_EXECUTION, the walk stops and execution resumes. If it returns EXCEPTION_CONTINUE_SEARCH, the walk continues to the next node.
If the entire VEH list is exhausted without anyone claiming the exception, the SEH chain is walked.
The VCH list (Vectored Continue Handlers, registered via AddVectoredContinueHandler) comes last: as far as I understand, it is walked once a VEH or SEH handler has asked to continue execution, just before the thread resumes.
The ordering is therefore strict: VEH first, from the head of the list to its tail, then SEH, then VCH.
Practice: Observing it at runtime #
This is a short notes section on how to inspect a VEH-handled exception and its underlying structures in WinDbg. For this short exercise, I used the following C code:
#include <windows.h>
#include <stdio.h>

LONG CALLBACK FirstHandler(PEXCEPTION_POINTERS ExceptionInfo)
{
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
    {
        printf("FirstHandler: passing to next handler\n");
        return EXCEPTION_CONTINUE_SEARCH;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

LONG CALLBACK SecondHandler(PEXCEPTION_POINTERS ExceptionInfo)
{
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
    {
        printf("SecondHandler: claiming the exception\n");
        // x86 build: skip the 6-byte faulting store
        ExceptionInfo->ContextRecord->Eip += 6;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

int main(void)
{
    PVOID h1 = AddVectoredExceptionHandler(1, FirstHandler);
    PVOID h2 = AddVectoredExceptionHandler(1, SecondHandler);

    int *ptr = NULL;
    *ptr = 42;

    RemoveVectoredExceptionHandler(h1);
    RemoveVectoredExceptionHandler(h2);
    printf("execution continued after the fault\n");
    return 0;
}
NB: I skip the part where I set up the symbols in WinDbg.
To watch how the doubly linked list works, the following breakpoints are set:
bp ntdll!RtlpCallVectoredHandlers
bp double_veh!FirstHandler
bp double_veh!SecondHandler
Why breaking at RtlpCallVectoredHandlers? Reading from the bottom up, this is the full execution path that led to the VEH list walk:
Figure 6: capture of the stack after reaching RtlpCallVectoredHandlers in ntdll (just after the ACCESS_VIOLATION occurred)
- _RtlUserThreadStart and BaseThreadInitThunk are the standard thread startup boilerplate
- __scrt_common_main_seh is the MSVC CRT startup wrapper that calls main
- main+0x30 is my code, specifically line 32 in double-veh.c which is the null pointer write *ptr = 42
- KiUserExceptionDispatcher is the first user mode function that ran after the kernel caught the fault, the entry point back from kernel mode
- RtlDispatchException+0x67 is where the OS starts looking for a handler
- RtlpCallVectoredHandlers is where the execution is currently -> the function about to walk the process VEH list
The frames to focus on are 02, 01 and 00.
That three-step sequence from KiUserExceptionDispatcher to RtlDispatchException to RtlpCallVectoredHandlers is the exact dispatch chain.
Let it run with g until it hits another breakpoint, which should be SecondHandler: it was registered second with First set to 1,
so it sits first in the VEH list.
Figure 7: windbg capture of the stack after hitting SecondHandler function during the exception management
Now looking at dd esp, the second value 010fec90 is the EXCEPTION_POINTERS pointer being passed as the argument to the handler (SecondHandler).
It can be followed with: dt EXCEPTION_POINTERS 010fec90.
This gives:
double_veh!_EXCEPTION_POINTERS
+0x000 ExceptionRecord : 0x010fed74 _EXCEPTION_RECORD
+0x004 ContextRecord : 0x010fedc4 _CONTEXT
Then dt _EXCEPTION_RECORD 0x010fed74 is used to inspect the exception record:
Figure 8: Exception record inspection
This is what I expected to observe: the code is 0n-1073741819, which is equivalent to 0xC0000005 (STATUS_ACCESS_VIOLATION).
To convert this value from WinDbg to a hexadecimal representation I used the following Python snippet:
value = -1073741819
print(hex(value & 0xFFFFFFFF))
0xc0000005
Using Exceptions as a Control Flow Primitive #
In this section, I decided to put my modest C skills to the test to see if I could trip up the decompiler.
Three source files are provided as proofs of concept; see them as the rungs of a ladder towards the challenge above.
- A simple PoC with API hashing.
- Inline ASM introduced to produce a faulting instruction.
- Improved code that tries to keep the decompiler from resolving how the faulting instruction is built.
VEH combined with API hashing #
The PoC starts by resolving AddVectoredExceptionHandler through API hashing rather
than a normal import: the function name is reduced to a single 32-bit ROR13 constant (0x159B3EA0),
and a small resolver walks kernel32’s export directory at runtime, transparently following the forwarder into kernelbase.dll.
No string, no IAT entry, no static cross-reference. Once the address is in hand, the handler is registered with CALL_FIRST priority
so it sees exceptions before anything else in the process, and the program deliberately raises an int3 to invoke it.
Inside the handler, instead of calling IsDebuggerPresent, the code reads NtGlobalFlag directly from the PEB at offset 0xBC (x64) or 0x68 (x86)
and tests for the 0x70 heap-debug bit pattern that Windows OR’s in whenever a process is launched under a debugger.
I recently came across this technique, which is documented by CheckPoint in their Anti-Debug: Debug Flags documentation.
In the normal case the bits are clear, the handler advances RIP past the int3,
returns EXCEPTION_CONTINUE_EXECUTION, and the program prints its “survived” message and exits cleanly.
Under a debugger the same read returns 0x70, the process terminates with exit code 0xDEAD.
#include <windows.h>
#include <stdio.h>

#define FLG_HEAP_ENABLE_TAIL_CHECK   0x10
#define FLG_HEAP_ENABLE_FREE_CHECK   0x20
#define FLG_HEAP_VALIDATE_PARAMETERS 0x40
#define NT_GLOBAL_FLAG_DBG_MASK \
    (FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK | FLG_HEAP_VALIDATE_PARAMETERS)

typedef PVOID (WINAPI *pfnAddVectoredExceptionHandler)(
    ULONG First,
    PVECTORED_EXCEPTION_HANDLER Handler);

static ULONG GetNtGlobalFlag(void)
{
#ifdef _WIN64
    PBYTE peb = (PBYTE)__readgsqword(0x60);
    return *(volatile ULONG *)(peb + 0xBC);
#else
    PBYTE peb = (PBYTE)__readfsdword(0x30);
    return *(volatile ULONG *)(peb + 0x68);
#endif
}

static LONG WINAPI MyVectoredHandler(PEXCEPTION_POINTERS ep)
{
    if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_BREAKPOINT)
        return EXCEPTION_CONTINUE_SEARCH;

    ULONG flag = GetNtGlobalFlag();
    printf("[VEH] hit. NtGlobalFlag = 0x%lx\n", flag);

    if ((flag & NT_GLOBAL_FLAG_DBG_MASK) == NT_GLOBAL_FLAG_DBG_MASK) {
        printf("[VEH] debugger detected via NtGlobalFlag -> bailing.\n");
        ExitProcess(0xDEAD);
    }

    printf("[VEH] clean. Skipping the int3 and resuming.\n");
#ifdef _WIN64
    ep->ContextRecord->Rip += 1;
#else
    ep->ContextRecord->Eip += 1;
#endif
    return EXCEPTION_CONTINUE_EXECUTION;
}

#define HASH_ADDVECTOREDEXCEPTIONHANDLER 0x159B3EA0UL

static DWORD Ror13Hash(const char *s)
{
    DWORD h = 0;
    while (*s) {
        h = (h >> 13) | (h << 19);
        h += (BYTE)*s++;
    }
    return h;
}

static FARPROC ResolveByHash(HMODULE hMod, DWORD target)
{
    PBYTE base = (PBYTE)hMod;
    PIMAGE_DOS_HEADER dos = (PIMAGE_DOS_HEADER)base;
    if (dos->e_magic != IMAGE_DOS_SIGNATURE) return NULL;

    PIMAGE_NT_HEADERS nt = (PIMAGE_NT_HEADERS)(base + dos->e_lfanew);
    if (nt->Signature != IMAGE_NT_SIGNATURE) return NULL;

    IMAGE_DATA_DIRECTORY dir =
        nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
    if (!dir.VirtualAddress || !dir.Size) return NULL;

    PIMAGE_EXPORT_DIRECTORY exp =
        (PIMAGE_EXPORT_DIRECTORY)(base + dir.VirtualAddress);
    PDWORD names    = (PDWORD)(base + exp->AddressOfNames);
    PWORD  ordinals = (PWORD) (base + exp->AddressOfNameOrdinals);
    PDWORD funcs    = (PDWORD)(base + exp->AddressOfFunctions);

    for (DWORD i = 0; i < exp->NumberOfNames; i++) {
        const char *name = (const char *)(base + names[i]);
        if (Ror13Hash(name) != target) continue;

        DWORD funcRva = funcs[ordinals[i]];

        // an RVA inside the export directory is a forwarder string ("DLL.Function")
        if (funcRva >= dir.VirtualAddress &&
            funcRva <  dir.VirtualAddress + dir.Size)
        {
            const char *fwd = (const char *)(base + funcRva);
            const char *dot = fwd;
            while (*dot && *dot != '.') dot++;
            if (*dot != '.') return NULL;

            char dllName[64];
            size_t n = (size_t)(dot - fwd);
            if (n + 5 > sizeof(dllName)) return NULL;
            for (size_t k = 0; k < n; k++) dllName[k] = fwd[k];
            dllName[n+0] = '.'; dllName[n+1] = 'd';
            dllName[n+2] = 'l'; dllName[n+3] = 'l';
            dllName[n+4] = 0;

            HMODULE hNext = GetModuleHandleA(dllName);
            if (!hNext) hNext = LoadLibraryA(dllName);
            if (!hNext) return NULL;
            return ResolveByHash(hNext, Ror13Hash(dot + 1));
        }
        return (FARPROC)(base + funcRva);
    }
    return NULL;
}

int main(void)
{
    HMODULE hK32;
    pfnAddVectoredExceptionHandler pAddVEH;
    PVOID hVEH;

    hK32 = GetModuleHandleA("kernel32.dll");
    if (!hK32) {
        fprintf(stderr, "[-] GetModuleHandleA failed (%lu)\n", GetLastError());
        return 1;
    }

    pAddVEH = (pfnAddVectoredExceptionHandler)
        ResolveByHash(hK32, HASH_ADDVECTOREDEXCEPTIONHANDLER);
    if (!pAddVEH) {
        fprintf(stderr, "[-] hash resolution failed\n");
        return 1;
    }
    printf("[+] AddVectoredExceptionHandler resolved at %p\n", (void*)pAddVEH);

    hVEH = pAddVEH(1, MyVectoredHandler);
    if (!hVEH) {
        fprintf(stderr, "[-] AddVectoredExceptionHandler returned NULL\n");
        return 1;
    }
    printf("[+] VEH installed at handle %p. Triggering int3...\n", hVEH);

    __debugbreak();

    printf("[+] Survived. NtGlobalFlag check did not trip.\n");
    return 0;
}
Figure 9: Output of two executions, one nominal and one under a debugger that triggers the ExitProcess(0xDEAD)
This is a very simple scenario of how malware could abuse this feature to “hijack” the execution flow; here the exception is raised with an explicit int3.
But nothing is really hidden when decompiling the binary, even with dynamic API resolution, so once the hash is resolved an analyst will quickly
spot what to analyse. So, from an attacker's point of view, how can this simple scenario be improved?
VEH / A failed way to “Arithmetic as a smokescreen” #
The first idea is to change the code that raises the exception: instead of an int3, why not trigger an ACCESS_VIOLATION based on a simple arithmetic calculation?
Here for instance, we can add an inline ASM block that triggers an ACCESS_VIOLATION by computing, through simple arithmetic and an XOR, a zero that is then used as an address:
mov rbx, 0xdeadbeef
mov rcx, 0xd2acc002
add rcx, 0xc00feed
xor rbx, rcx
mov [rbx], rcx
This results in the register rbx being set to 0, which could read in C as int *ptr = NULL; *ptr = 0xdeadbeef;
#include <windows.h>
#include <stdio.h>

#define FLG_HEAP_ENABLE_TAIL_CHECK   0x10
#define FLG_HEAP_ENABLE_FREE_CHECK   0x20
#define FLG_HEAP_VALIDATE_PARAMETERS 0x40
#define NT_GLOBAL_FLAG_DBG_MASK (FLG_HEAP_ENABLE_TAIL_CHECK | FLG_HEAP_ENABLE_FREE_CHECK | FLG_HEAP_VALIDATE_PARAMETERS)

static ULONG GetNtGlobalFlag(void)
{
#ifdef _WIN64
    PBYTE peb = (PBYTE)__readgsqword(0x60);
    return *(volatile ULONG *)(peb + 0xBC); // NtGlobalFlag in x64
#else
    PBYTE peb = (PBYTE)__readfsdword(0x30);
    return *(volatile ULONG *)(peb + 0x68); // NtGlobalFlag in x86
#endif
}

/* ---------- The vectored handler --------------------------------------- */
static LONG WINAPI MyVectoredHandler(PEXCEPTION_POINTERS ep)
{
    if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;
    /* ExceptionInformation[1] is the faulting address: only handle address 0 */
    if (ep->ExceptionRecord->ExceptionInformation[1] != 0)
        return EXCEPTION_CONTINUE_SEARCH;

    ULONG flag = GetNtGlobalFlag();
    printf("[VEH] hit. NtGlobalFlag = 0x%lx\n", flag);

    if ((flag & NT_GLOBAL_FLAG_DBG_MASK) == NT_GLOBAL_FLAG_DBG_MASK) {
        printf("[VEH] debugger detected via NtGlobalFlag.\n");
        ExitProcess(0xDEAD);
    }

    printf("[VEH] clean. Skipping the faulting store and resuming.\n");
#ifdef _WIN64
    ep->ContextRecord->Rip += 3;   /* mov [rbx], rcx is 3 bytes: 48 89 0B */
#else
    ep->ContextRecord->Eip += 2;
#endif
    return EXCEPTION_CONTINUE_EXECUTION;
}

int main(void)
{
    PVOID hVEH = AddVectoredExceptionHandler(1, MyVectoredHandler);
    if (!hVEH) {
        fprintf(stderr, "[-] AddVectoredExceptionHandler returned NULL\n");
        return 1;
    }
    printf("[+] VEH installed at handle %p. Triggering AV via arithmetic NULL...\n", hVEH);

    /* Compute a NULL pointer at runtime via boolean arithmetic,
       rcx = 0xd2acc002 + 0x0c00feed = 0xdeadbeef
       rbx ^= rcx -> 0xdeadbeef ^ 0xdeadbeef = 0
       [rbx] = rcx -> write to address 0 -> EXCEPTION_ACCESS_VIOLATION */
    __asm__ volatile (
        ".intel_syntax noprefix\n\t"
        "mov rbx, 0xdeadbeef\n\t"
        "mov rcx, 0xd2acc002\n\t"
        "add rcx, 0x0c00feed\n\t"
        "xor rbx, rcx\n\t"
        "mov [rbx], rcx\n\t"
        ".att_syntax prefix\n\t"
        ::: "rbx", "rcx", "memory"
    );

    printf("[+] Survived. NtGlobalFlag check did not trip.\n");
    return 0;
}
x86_64-w64-mingw32-gcc -Wall -O0 veh_arithmetic_access_violationation.c -o veh.exe
The disassembly view shows what I expected; however, Hex-Rays does constant propagation across the basic block. It sees five instructions with pure-immediate inputs and no external state, so it folds the whole computation at decompile time.
Figure 10: Disassembly view and decompiled view in IDA
This implementation is too transparent: Hex-Rays was able to fold the five constant-driven instructions into a single MEMORY[0] = 0xdeadbeef.
VEH / Sealing the fault #
This section builds on the previous proof of concept and pushes the obfuscation one step further, targeting the decompiler specifically: constants are hidden behind an opaque wrapper, and the handler stops stepping over the fault and starts redirecting execution to an entirely separate function.
Basically what I want to test is the following workflow:
hVEH = AddVectoredExceptionHandler(1, MyVectoredHandler);
t = (uint64_t)hVEH;
g_mask = (t ^ Opaque(t)) + Opaque(0xd2acc002) + 0x0c00feed;
mask_val = g_mask;
__asm { mov rcx, 0xd2acc002; add rcx, 0xc00feed; xor rbx, rcx; mov [rbx], rcx }
printf("(decoy) Survived..."); // I don't want to see this in the decompiled view
The idea here was to try to make the decompiler less helpful to the analyst,
both at the operand level and at the control-flow level.
I’m not sure these are the best techniques, but two small changes were layered onto the previous PoC
to see if Hex-Rays could still be coaxed away from showing a tidy MEMORY[0] = 0xdeadbeef.
First, the constants feeding the inline asm go through an Opaque() function, a noinline identity function wrapping a volatile read.
The two attributes seem to pull in different directions, and I think that’s why it works.
noinline forces the compiler to emit a real call at every call site instead of pasting the body inline.
volatile tells it the value inside the function could change between the store and the load
(in practice it can’t, but as far as I understand the standard says the compiler has to assume it might),
so it can’t reason about what comes out.
Together you get something close to a sealed black box: the compiler has to make the call,
and once execution is inside it can’t really prove anything about the return value.
In my tests 0xdeadbeef no longer shows up as a literal anywhere in the binary,
it only exists in rbx at runtime, once Opaque() has computed part of the operation together with the other static variables.
Second, the handler stops being polite. Instead of just stepping over the faulting instruction,
it rewrites CONTEXT.Rip to point at a separate function, named here RealNextStage, which is where the real “work” happens.
From what I’ve seen, IDA seems to treat an access violation as a dead end, so the decompilation of main just stops at the fault.
The printf sitting right after it looks like reachable code but never actually runs,
and the code that does run lives in a function with no static reference from main at all.
An analyst still has to read the handler, spot the Rip write, and follow it by hand.
That probably isn’t a huge obstacle for someone experienced, but it does mean main’s decompilation on its own won’t point the way.
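The redirect itself fits in a few lines. The sketch below is a simplified stand-in for the handler, not the PoC's actual code: it is x64 only, it leaves out any debugger check, and the non-Windows branch defines reduced, hypothetical versions of the SDK types purely so the logic can be compiled and exercised elsewhere. MyVectoredHandler and RealNextStage are the names used in the text; everything else is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch also compiles off Windows. These are NOT the
 * real SDK layouts, only the few fields this handler touches. */
typedef int32_t  LONG;
typedef uint32_t DWORD;
typedef uint64_t DWORD64;
typedef struct { DWORD ExceptionCode; } EXCEPTION_RECORD;
typedef struct { DWORD64 Rip; } CONTEXT;
typedef struct {
    EXCEPTION_RECORD *ExceptionRecord;
    CONTEXT          *ContextRecord;
} EXCEPTION_POINTERS;
#define EXCEPTION_ACCESS_VIOLATION   0xC0000005u
#define EXCEPTION_CONTINUE_EXECUTION (-1)
#define EXCEPTION_CONTINUE_SEARCH    0
#define CALLBACK
#endif

/* Second stage: nothing in main references it statically. */
void RealNextStage(void)
{
    /* the real "work" would go here */
    exit(0); /* reached by a Rip rewrite, so there is no return address */
}

/* Redirects execution instead of stepping over the faulting instruction. */
LONG CALLBACK MyVectoredHandler(EXCEPTION_POINTERS *info)
{
    if (info->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
        return EXCEPTION_CONTINUE_SEARCH;   /* not ours, let others look */

    info->ContextRecord->Rip = (DWORD64)(uintptr_t)&RealNextStage;
    return EXCEPTION_CONTINUE_EXECUTION;    /* resume at the new Rip */
}
```

On Windows the handler would be registered with AddVectoredExceptionHandler(1, MyVectoredHandler) before the faulting store. Because RealNextStage is entered through a register rewrite and not a call, it should not return normally, and stack alignment may need attention in a more careful version.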
x86_64-w64-mingw32-gcc -Wall -O0 veh_hid_arithmetic_result.c -o veh.exe
The code above successfully shows what I expected: the “real” execution flow is hidden by the vectored handler.
Figure 11: Decompiled view of the main function, where the real execution flow is hidden
Figure 12: Decompiled view of the custom vectored handler, which changes the program’s execution flow if a debugger is detected via NtGlobalFlag
Detecting It as a Malware Analyst #
As a first idea, two starting points would be YARA and CAPA rules that search for the following patterns:
- YARA: signatures on RtlAddVectoredExceptionHandler, AddVectoredExceptionHandler and AddVectoredContinueHandler, mixed with known patterns such as IsDebuggerPresent, NtQueryInformationProcess, etc.
- CAPA: relevant rules around exception handler registration and dynamic control flow, and what to write if the rule does not exist yet.
This is a lightweight attempt at a CAPA rule. It may produce false positives, but it has been helpful as a starting point when exploring large binaries. Note that the rule is at function scope, so if the handler is registered at the beginning of the program and the faulting instructions are in different functions, the rule won’t trigger.
NB: The rule only covers a few ways of raising an exception: undefined instruction (ud2), int3 and divide by zero, plus RaiseException and int 0x2D.
rule:
  meta:
    name: register vectored exception handler to redirect control flow
    namespace: anti-analysis/anti-debugging/debugger-evasion
    authors:
      - "@plebourhis"
    scopes:
      static: function
      dynamic: call
    att&ck:
      - Defense Evasion::Debugger Evasion [T1622]
    mbc:
      - Anti-Behavioral Analysis::Debugger Detection [B0001]
      - Anti-Static Analysis::Disassembler Evasion [B0012]
    references:
      - https://anti-debug.checkpoint.com/techniques/exceptions.html
      - https://learn.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-addvectoredexceptionhandler
    description: |
      Malware registers a Vectored Exception Handler and then deliberately
      raises an exception (int3, ud2, divide-by-zero, RaiseException, ...).
  features:
    - and:
      - or:
        - api: AddVectoredExceptionHandler
        - api: kernel32.AddVectoredExceptionHandler
        - api: ntdll.RtlAddVectoredExceptionHandler
      - or:
        - mnemonic: int3
        - mnemonic: ud2
        - api: RaiseException
        - api: kernel32.RaiseException
        - and:
          - mnemonic: div
          - number: 0 = divide-by-zero to trigger EXCEPTION_INT_DIVIDE_BY_ZERO
        - and:
          - mnemonic: int
          - number: 0x2D = EXCEPTION_BREAKPOINT alt path (int 0x2D)
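For a sense of what the mnemonic features correspond to at the byte level, here is a deliberately naive sketch that counts the encodings of int3 (CC), ud2 (0F 0B) and int 0x2D (CD 2D) in a buffer. The function name count_trigger_bytes is hypothetical, and this is not how CAPA works: CAPA matches on disassembled instructions, while a raw byte scan like this one will also hit operands, data and the CC padding some compilers emit between functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts byte sequences that encode the trigger instructions named in
 * the rule: int3 (CC), ud2 (0F 0B) and int 0x2D (CD 2D).
 * No disassembly is done, so false positives are expected. */
size_t count_trigger_bytes(const uint8_t *buf, size_t len)
{
    size_t hits = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xCC) {
            hits++;                         /* int3 */
        } else if (i + 1 < len) {
            if ((buf[i] == 0x0F && buf[i + 1] == 0x0B) ||   /* ud2 */
                (buf[i] == 0xCD && buf[i + 1] == 0x2D)) {   /* int 0x2D */
                hits++;
                i++;                        /* skip second byte of match */
            }
        }
    }
    return hits;
}
```

It can be useful as a quick triage count on a carved function body, but the gap between this and an instruction-aware match is the reason the rule is written against mnemonics and not bytes.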
A second rule, oriented on the handler itself, looks for code that rewrites the Eip/Rip field of the ContextRecord.
rule:
  meta:
    name: vectored exception handler rewrites instruction pointer
    namespace: anti-analysis/anti-debugging/debugger-evasion
    authors:
      - "@plebourhis"
    scopes:
      static: function
      dynamic: call
    att&ck:
      - Defense Evasion::Debugger Evasion [T1622]
    mbc:
      - Anti-Behavioral Analysis::Debugger Detection [B0001]
    description: |
      A VEH/SEH callback writes to the Eip (x86, CONTEXT+0xB8) or Rip
      (x64, CONTEXT+0xF8) field of the EXCEPTION_POINTERS->ContextRecord
      it was handed, redirecting execution after a planted exception.
  features:
    - and:
      - or:
        - number: 0xB8 = offsetof(CONTEXT, Eip) on x86
        - number: 0xF8 = offsetof(CONTEXT, Rip) on x64
      - or:
        - number: 0xFFFFFFFF = (LONG)-1 EXCEPTION_CONTINUE_EXECUTION
        - number: 0 = EXCEPTION_CONTINUE_SEARCH (handler chooses to skip)
None of the paths I wanted to follow seems accurate enough. However, a hint for my future self would be to check for SEH handler functions that could have interesting code inside.
Also, when debugging a new piece of malware, add a breakpoint on RtlAddVectoredExceptionHandler to investigate the handler code.
A Note on the Limits of Detection
It is important to remain humble about the visibility these rules provide. While signature-level detection is highly effective against known threats and reused codebases, it has an inherent ceiling:
The Reality Check: Static signatures catch what we have seen before. Because the underlying technique of using exception handlers to redirect code flow is a generic architectural feature of Windows, it is relatively easy for an author to tweak the implementation. A new sample can sidestep most rules simply by changing the “fault” instruction or obfuscating the registration call.
Ultimately, these signatures are starting points for a deeper investigation, rather than a definitive “case closed” for a new piece of malware!
Wrapping Up #
Going into this I expected exception handling to be a small detour
before getting back to the malware sample. It turned out to be a
bigger topic than I thought, and I am sure parts of what I wrote above
are still imprecise. The x64 unwind machinery in particular is
something I want to revisit, because I don’t yet have a clean mental
model of how __C_specific_handler decides what to do with the scope
table.
What I take away from this exercise:
- SEH and VEH are not exotic. They are the documented Windows exception model, and most of what makes them feel “tricky” in malware is just that the analyst is meeting them for the first time in an adversarial context.
- VEH is interesting to an attacker for a specific combination of reasons: it fires before SEH, it is process-wide, and the handler has full read/write access to the saved register context. That combination is what makes it usable as a control-flow primitive (from a malware author’s point of view).
- On the detection side, my CAPA attempts are honestly a starting
point. The technique is generic enough that signatures will lag
behind any author who is willing to swap the faulting instruction or
wrap the registration call. I think the more durable signal is
behavioural: a handler that writes to
ContextRecord->Rip/Eip and returns EXCEPTION_CONTINUE_EXECUTION is doing something a well-behaved program almost never needs to do (hope so…), but turning that into a rule that does not light up on every C++ runtime is its own project.
If you spotted something wrong, or if you have a cleaner way of writing the CAPA rules, I would genuinely like to hear it. The references at the top of this post (SonicWall, Zscaler, CrowdStrike, IBM, Unit42) remain the better place to read about VEH in the wild; this article is just my attempt to understand the plumbing well enough to recognise it next time.
Other great resources: