Cracking THE tool of the trade (bye bye Wdasm)
(Interactive Disassembler Pro v3.7)
by Quine, (19 October 1997)
Interactive Disassembler Pro v3.7 Demo. This is a brand new version which is much superior to version 3.6.

Source: http://www.datarescue.com/ida.htm (homepage)

Tools used:
  W32DASM 8.9 (soon to be a thing of the past :-)
  BoundsChecker Pro 5.0 (look for the poorly protected demo on NuMega's site)
  SoftICE for NT v3.2
  UltraEdit32 (just for multifile searching)
  HexWorkshop32 (any hex editor will do the trick, including UltraEdit)

Sections in this article:
  I         Why IDA Pro 3.7 is so great
  II        What's disabled in the demo version
  III       Cracking the 64k file size limitation
  IV        Reflections on the protection scheme
  V         The Expiration Date
  VI        Summary of the patch
  VII       The function at 43A314
  APPENDIX  Stack frames and function calls

I. Why IDA Pro 3.7 is so great

IDA Pro is by far the best disassembler available for PCs (and probably for any platform). It is ultimately, I think, much, much better than W32dasm. Why? To begin with, IDA Pro disassembles properly.

(1) It starts disassembling at the program entry point and then follows every possible execution route from there. Having done that, it then looks for functions which are not directly accessible from the main program flow (e.g., window procedures, thread procedures, and other callback functions of various sorts). This method of disassembly enables it to perform much greater levels of analysis of the target program. For example, the beginnings and endings of functions are identified and properly marked. Passed arguments and local variables (both referenced off the stack) are identified and marked. Switch statements are identified and the case values are determined (W32dasm does this to a very limited extent). Furthermore, you have complete control over what it marks as code and what it marks as data. This allows it to disassemble code "hidden" or located in the data section, which happens more often than you might think, precisely because W32dasm can't disassemble it.
Also, a trick I saw in a dongle driver (ssidppd.drv from the program WiT) completely flusters disassemblers like W32dasm, which disassemble blindly straight through the code segment. Here's the trick:

ANTI-WDASM trick

          mov  eax, edx
          jmp  loc_1
          db   0F
  loc_1:  inc  eax
          jmp  loc_2
          db   85
  loc_2:  call sub_1
          ... and so on

W32dasm produces garbage for this code, but IDA Pro does it right because it's following the path of execution.

(2) IDA can recognize an amazing range of library functions within a target's body. This greatly reduces the amount of code to plough through when trying to get an understanding of the target. It also, of course, provides a wealth of clues about what a program is doing at any particular place.

(3) IDA also has a fairly robust macro language which enables high levels of customization.

(4) It can disassemble damn near anything: all pc-based binary formats (all exe formats, lib files, obj files, etc.), code for almost any microprocessor (several of which I've never heard of), Java classes --- that's right, full blown interactive Java disassembly, which, as fravia+ says, is the future of cracking!

(5) Names, labels, etc. can be changed on the fly so that you can gradually accumulate and save more and more information about the target (working the fields, as I like to think of it).

(6) Comments can be added.

There are undoubtedly more features that I am leaving out, but you get the idea. In other words, this is __the__ disassembling tool for our trade. Combined with SoftICE and BoundsChecker (more on the power of BoundsChecker below), no target is safe any more (not that they had been safe however :-)

II. What's disabled in the demo version

(1) It expires on Jan 1, 1998
(2) It cannot load files larger than 64k
(3) It cannot load saved databases (project files)
(4) It cannot produce list (.lst) or asm source files

Limitation (1) will prove, as you might have expected, to be trivial to crack. (2) is the biggest issue.
The size limitation makes the demo almost useless as it is. Cracking this limitation is the main objective of this discussion. (3) would be very nice to crack, but perhaps long and hard if the loading code is simply not present in the demo (I think, however, that it is). There are two big problems that limitation (3) creates: (a) IDA's power comes at a price---it's slow, so it would be nice not to have it redisassemble the target with every session. (b) Any comments, name changes, etc. will not be accessible after you quit a session. Problem (b), fortunately, is avoidable, because the demo will let you create a macro file (a file with an idc extension) that records all of your changes and can be reloaded and run in future sessions. (4) I consider to be the least important, because looking at a dumb listing in a text editor pales in comparison to viewing it in the IDA environment. It might be fun, though, to try to produce compilable asm files from targets. Now, on to the good stuff.

III. Cracking the 64k file size limitation

I will be concerned with the win32 version of IDA. This version is run by executing the idaw.exe file. This file is quite small and does nothing more than load ida.wll [sic], which is a large dll that does most of the work. This is the file that needs cracking. The other files with w32 extensions are the various disassembler modules; pc.w32 is the module that handles x86 code. When you start idaw, it loads ida.wll (hereinafter simply referred to as ida), which asks you which file you want to disassemble. ida determines which module to load based on the file type and then hands control over to that module. The disassembly module then drives execution for the rest of the session, calling functions in ida where necessary. This is a brilliant design, because it allows the programmer to quite easily add modules for completely different file types without having to rewrite the whole program (witness the Java module.
One would think Java disassembly is so different from x86 disassembly that the two could not happily co-exist in one environment).

ida is responsible for loading the file and for all subsequent file i/o along with a lot of other things, so let's start by loading ida.wll into w32dasm (I spoke harshly of w32dasm above and that was unfair. I do have a fondness for it---however, now that I have a useable IDA Pro, I will never go back to it :-). The first thing to do is find the place where IDA decides that a file bigger than 64k is too big. Our first bet might be to look for the text of the message that pops up when you try to open a file that is too big: "The demo version..." Unfortunately it is not in the list of strings w32dasm provides for ida.wll. So, I used UltraEdit32 to do a file search through the entire idademo directory for the string in question. It turns out to be in a file called ida.hlp (which is not a Windows help file, but is in a proprietary format).

Looking at ida.hlp with a hex editor (it's not strictly an ASCII file) we see that the strings are zero-terminated and appear to be prefixed with a word which gives their length. Also, at the beginning of the file is a long series of dwords that appear to be offsets into the file. You guessed it, the dwords point to the length/string pairs. This is undoubtedly how the program gets at the strings. There are 0D (i.e., 13 decimal) bytes at the beginning of the file before the dword list begins. So, the index to a particular string can be calculated in the following way:

1. Find the string in ida.hlp and record the offset where its length word starts.
2. Look up that offset in the list of dwords at the beginning of the file and record the offset of the dword.
3. Subtract 0D from the offset of the dword and divide the result by 4.
4. What you end up with is the index ida uses to reference a string in the help file.

The index for the file-too-big message is 556 (hex).
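The index arithmetic in the four steps above can be sketched in Python. This is only an illustration run on a made-up blob, not a parser for the real ida.hlp: it assumes little-endian dwords and a 16-bit length word sitting directly in front of each string, and the names HEADER_SIZE, index_from_table_offset, find_string_index and build_demo_blob, as well as the table_entries parameter, are all hypothetical. It also reads the index 556 as hexadecimal, which is how it shows up in the push 00000556 instruction found in ida.wll.

```python
import struct

HEADER_SIZE = 0x0D  # bytes before the dword table, as observed in the text


def index_from_table_offset(table_offset):
    """Steps 3 and 4: file offset of a table dword -> string index."""
    rel = table_offset - HEADER_SIZE
    if rel < 0 or rel % 4 != 0:
        raise ValueError("offset is not on a table entry")
    return rel // 4


def find_string_index(blob, text, table_entries):
    """Steps 1 to 4 on an in-memory copy of the file.

    Assumes each table dword holds the offset of a string's length word.
    table_entries (the number of dwords in the table) is a hypothetical
    parameter; the article does not say how the table length is stored.
    """
    pos = blob.find(text)
    if pos < 2:
        return None
    length_word_offset = pos - 2  # step 1: the length word precedes the text
    for i in range(table_entries):
        table_offset = HEADER_SIZE + 4 * i
        (target,) = struct.unpack_from("<I", blob, table_offset)
        if target == length_word_offset:  # step 2: found the matching dword
            return index_from_table_offset(table_offset)
    return None


def build_demo_blob(strings):
    """Build a tiny synthetic file with the layout described above.

    Storing len + 1 (text plus terminator) in the length word is a guess.
    """
    records = [struct.pack("<H", len(s) + 1) + s + b"\0" for s in strings]
    offset = HEADER_SIZE + 4 * len(strings)
    table = b""
    for rec in records:
        table += struct.pack("<I", offset)
        offset += len(rec)
    return b"\0" * HEADER_SIZE + table + b"".join(records)


demo = build_demo_blob([b"first", b"The demo version", b"third"])
print(find_string_index(demo, b"The demo version", 3))   # index inside the demo blob
print(hex(index_from_table_offset(0x0D + 4 * 0x556)))    # round trip for index 556h
```

The second print only checks that the arithmetic inverts cleanly: a table entry at offset 0D + 4 * 556h maps back to index 556h.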
Using w32dasm to search for this value in ida.wll, we find the following code:

:00403CE7 81FD00000100    cmp ebp, 00010000    ; cmp file size, 64k
:00403CED 760B            jbe 00403CFA         ; go ahead and load file
:00403CEF 6856050000      push 00000556        ; ida.hlp index of demo msg
:00403CF4 E85FD10500      call 00460E58        ; message box routine

This looks too good to be true (don't worry - it is). It compares ebp (which must contain the size of the file) with 10000 (i.e., 64k) and jumps past the bad message if ebp is no greater than 64k. No problem. Let's patch the program to force the jump, replacing 76 at 403CED with EB (jmp), and see what happens. Running the program, we find that it now lets us open files of any size and it goes about disassembling them. The problem is that it appears to be disassembling only a small part of the file (I'm using ida.wll, by the way, as the large test file to disassemble). Fairly quickly into the disassembly the message "Execution flows beyond limits" repeatedly appears at the bottom of the screen and nothing past (what appears to be) the 64k boundary is disassembled. In fact, nothing past the 64k boundary is even represented by raw bytes in the disassembly listing. There's another check somewhere.

I was stuck at this point for some time. I tried using the help file method above to search out references to "Execution flows beyond limits", but, while I found one, no hacking around in that area of the program seemed to help. It then occurred to me that maybe ida never even loaded more than 64k of the file. However, that couldn't be right because it would load the entire DATA segment for ida.wll, which is well past the 64k mark. Maybe it only loaded 64k of each segment. To investigate this, I ran IDA with BoundsChecker in order to look at how much of the file was actually being read in. So, fire up BoundsChecker (other API spies will probably work, but they won't give you the wealth of information BC does), and load idaw.exe.
In the program settings, be sure to set it to collect all event data and to load the module ida.wll. Run the program from BC and open ida.wll from ida. Let it run for a while (at least until you start getting the "Execution flows beyond limits" messages), and then quit ida. You've got one hell of a lot of API calls recorded in BC. In BC, search for calls to CreateFile (which, remember, also opens files in Win32) until you find one that passes "ida.wll" as the file name. The return value from CreateFile is the handle to ida.wll, so write that down and start searching for the handle (this will catch all API calls having to do with ida.wll). You'll come across a whole bunch of calls to SetFilePointer and to ReadFile. A lot of these calls set the file pointer to places in the PE Header and read 200 bytes. Forget about these: it's just reading in relevant info about the file. Eventually, though, you'll hit a call that sets the pointer to the beginning of the code segment and reads 7A00 bytes, and then another that sets the pointer to the beginning of the data segment and reads DE00 bytes. 600 is the offset to the beginning of the code segment and 88000 is the offset to the DATA segment. Why is it reading 7A00 bytes instead of 10000 bytes? We'll answer that question in a moment. Write down the location in ida.wll (4316DC) that called ReadFile (this can be found in the right hand pane in BC --- isn't BC great? NuMega wins again) and switch over to w32dasm to see what's going on there.

It was in switching over to w32dasm at this point that I had a bit of dumb luck (dumb because I should have figured this out rather than stumbling across it). W32dasm happened to be positioned at the top of the file where it tells you the segment information. Guess what the size of the code segment is? 87A00. The size of the data segment? DE00. So, it's only taking the low 2 bytes of the segment sizes to get the number of bytes to read in from the file.
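The arithmetic behind that observation is easy to check. Here is a minimal Python sketch using only the sizes reported above; the code segment start address 401000 is taken from the disassembly quoted later in this article, and the helper name low_word is made up for illustration.

```python
def low_word(value):
    """Keep only the low 16 bits, as a 16-bit register would."""
    return value & 0xFFFF


CODE_SIZE = 0x87A00    # code segment size shown by W32dasm
DATA_SIZE = 0xDE00     # data segment size shown by W32dasm
CODE_START = 0x401000  # start address of the code segment (from the later listing)

print(hex(low_word(CODE_SIZE)))               # 0x7a00: the ReadFile length seen in BC
print(hex(low_word(DATA_SIZE)))               # 0xde00: unchanged, it already fits
print(hex(CODE_START + low_word(CODE_SIZE)))  # 0x408a00: where the loaded code ends
```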
That's why all of the DATA segment but only part of the CODE segment is read. This can be verified if you load ida.wll into ida and jump to address 408A00. At exactly that address, the code cuts off and you get question marks instead of data/code.

Now, there are two ways to proceed: the quick and easy way versus the methodical way. I naturally first opted for the quick and easy way, which is to search through the w32dasm listing for 0000FFFF and'ed with something else (searching for ",{space}0000FFFF" is a sufficiently narrow choice). This, I assumed, was what the program did to prevent longer files from being read. While I found some places where there were such AND instructions, nothing panned out. I won't bore you with the mess I created fiddling around with code in these sections. IDA is designed to disassemble 16 bit programs as well as others, so 64k is relevant to it for many other reasons than the protection scheme (the length of a segment in 16 bit mode is 64k), so there are a lot of 64k red herrings in the program. Well, the quick and easy way wasn't quick or easy and didn't work. I'm impatient (a bad quality in a cracker), but it's always better to try to understand what the code is doing, rather than trying to get lucky. If you're methodical, the luck will find you.

Time for SoftICE. The strategy now is to set a breakpoint in SI at the ReadFile call, with the condition that the 3rd parameter be equal to 7A00 (otherwise we'd have to wade through the hundreds of other 200 byte calls). This can be done with the following command:

  bpx 4316DC if *(esp+8)==7a00

Load idaw.exe into SoftICE, set the breakpoint, and load ida.wll into IDA. When the bp hits, we need to trace the 7A00 back until we find the point at which it was changed from 87A00. This will be done by unwinding the stack within SoftICE, which I will explain as we go along. So, the breakpoint has hit and we're sitting in a function which begins at 4316AC.
Looking at the code, we know that 7A00 came into this function as the third argument passed on the stack --- [ebp+10] (see the Appendix to this article for a discussion of parameter passing in C and C++ programs). The strategy is to find the function that called the one we're in and figure out where it got the value 7A00 that it passed to the function we're in. We continue to apply this method, walking back through the call stack until we find out how 87A00 got to be 7A00.

Here's how it works. Use the command dd esp to display the memory at the top of the stack. The first address within the program's code starting from the stack top and going forward (i.e., higher) in memory is the return address for the call to the function we're in. Disassemble from that address and scroll up a little. Directly above the address from which you disassembled you'll see the call instruction. So, 4316AC was called by the function at 43088C:

:004308CD 8B4D10        mov ecx, dword ptr [ebp+10]    ; 3rd arg passed to this func == 7A00
:004308D0 51            push ecx                       ; arg3 to 004316AC
:004308D1 50            push eax                       ; arg2 to 004316AC
:004308D2 8B4508        mov eax, dword ptr [ebp+08]
:004308D5 50            push eax                       ; arg1 to 004316AC
:004308D6 E8D10D0000    call 004316AC

From this code we see that 7A00 came in as the 3rd argument passed to 43088C. So, who called 43088C? Look a little further up the stack (which you should keep in the SoftICE data window) to find a call at 42EB4C, which is in the function that starts at 42EAD4. Keep following the same stack tracing method until you hit a call at 43A395 to a function at 42EC04 (it should only be 2 more steps). 43A395 is in the function that starts at 43A314 and this is where we hit pay-dirt. I have included almost the entire function below because what happens here is very interesting. I've commented many of the lines and will also refer to the code as I continue. At 43A389, we notice that the local variable [ebp-18] is passed as the relevant argument to 42EC04 (and from there on down to 43088C).
Therefore, it contains 7A00. Let's trace backwards from 43A389 and see how [ebp-18] gets its value. It comes in through the cx register, which of course can only hold a 16 bit (i.e., < 64k) value. There's the key to our problem. One more trace back up the stack will take us to the conversion from 87A00 to 7A00. We find the call to 43A314 at 43A836, and looking at the code immediately above it we can see where the conversion occurs:

:0043A822 8BD7          mov edx, edi
:0043A824 8BC3          mov eax, ebx
:0043A826 E885FFFFFF    call 0043A7B0
:0043A82B 53            push ebx                       ; == start addr of segment == 401000
:0043A82C 8BCF          mov ecx, edi                   ; == end addr of segment == 488A00
:0043A82E 662BCB        sub cx, bx                     ; == seg length == 7A00; this is our bad guy
:0043A831 8BD6          mov edx, esi
:0043A833 8B45FC        mov eax, dword ptr [ebp-04]
:0043A836 E8D9FAFFFF    call 0043A314                  ; cx and edx are passed to this func

Now we have to figure out how to crack it. That means we've got to change a 16 bit data type into a 32 bit data type in the functions at 43A804 and 43A314. In 43A804 it's not so hard. sub cx, bx needs to be sub ecx, ebx. This can be done by changing 66 to 90 (nop) at 43A82E. 66 is the opcode prefix representing an operand size override.

A brief digression: Intel chips can operate in 32 bit or 16 bit mode. Win32 programs run in 32 bit mode and therefore default to accessing the 32 bit registers and 32 bit memory operands. However, an instruction in 32 bit mode with the operand size override prefix accesses 16 bit registers and 16 bit memory operands. End digression.

So, changing 66 to a nop (90) switches the instruction back to accessing 32 bit registers. This strategy is very useful for 43A314 as well. Turning our attention to that function, however, we see that things are much more complicated. The instruction at 43A31D has an opsize prefix, but it is accessing a memory location ([ebp-06]) as well as a register. We're going to need 2 more bytes of memory from somewhere.
[ebp-06] is a local variable and we can't have it writing over other local variables. So, let's lay out the local variables on the stack for this function:

  ebp    : prev ebp
  ebp-04 : local_1 (4 bytes)
  ebp-06 : our little friend, local_2 (2 bytes)
  ebp-0C : 10000 (see function code below) (4 bytes)
  and so on.

The variable at ebp-0C is a dword and therefore only takes up 4 bytes (ebp-0C through ebp-09). It stops short of ebp-08. What is there at ebp-08? [ebp-08] is never referenced in the function, so it looks like there's nothing there. Guess what we're going to put there, though? You got it, the rest of the segment length. This can be done by changing all the references to ebp-06 to ebp-08. So, our instruction at 43A31D needs to change from:

:0043A31D 66894DFA      mov word ptr [ebp-06], cx

to:

:0043A31D 90            nop
:0043A31E 894DF8        mov dword ptr [ebp-08], ecx

Removing the 66 takes care of changing cx to ecx and word ptr to dword ptr, while changing FA to F8 changes the ebp offset. The same strategy can be applied at 43A35E, 43A3ED, and 43A414. The instructions at 43A369 and 43A377 also need to be changed. Movzx means move with zero extension and is used for moving smaller operands into larger operands, filling the upper bits with zeros. We want to change these to simple move instructions, moving our new 32 bit local variable into a 32 bit register. This can be done fairly easily:

:0043A369 0FB745FA      movzx eax, word ptr [ebp-06]
:0043A377 0FB755FA      movzx edx, word ptr [ebp-06]

becomes

:0043A369 908B45F8      (nop) mov eax, dword ptr [ebp-08]
:0043A377 908B55F8      (nop) mov edx, dword ptr [ebp-08]

That takes care of all the instances of our local variable, but we're not done. At first, it looks like [ebp-0C] is doing some dirty work here. It gets assigned 10000, which is then compared with the # of bytes to read. If the number of bytes is bigger, then only 10000 bytes are read. It looks for all the world like part of the protection scheme, but it isn't.
If, after having applied the patches I've mentioned so far, you force the jump at 43A370, the program crashes. What ebp-0C does is simply make sure that the prog is reading only 64k at a time. Notice that most of the function is a loop that reads 64k chunks until it's read everything it needs to. Another 64k red herring.

However, the instruction at 43A324 needs to be changed. Remember that edx contains the starting file offset of the segment. For the code segment, this is less than 64k (i.e., 600), but for the data segment it's > 64k. I missed this instruction at first and ended up with a very frustrating problem. The patches I've described so far work, but IDA was lining up the data segment in entirely the wrong place. I spent hours looking for some alternate protection scheme, before I came back and realized what I had missed. 43A324 was the culprit, so that needs to be patched in the same manner as 43A369 and 43A377. That's it. No more 64k boundary and all files are disassembled properly.

IV. Reflections on the protection scheme

At first, I thought that this must have been done in assembly. Operand override prefixes seemed pretty arcane. However, it was odd to leave that 2 byte hole in the stack. If you're programming in assembly, why do that? Of course, even if that hole hadn't been there, we could have simply added space for a local variable (see the Appendix). Upon further reflection, I realized what the programmer did and he did it in C/C++. The function at 43A314 was changed from taking a long int argument to taking a short int argument and the argument passed to it in the function at 43A804 was cast from a long int to a short int. Two tiny changes in the source code produced exactly this effect (changes, of course, from the real, unlimited version of the program). The 2 bytes we needed on the stack were there because compilers generally align 32 bit values on dword boundaries and the other surrounding local variables were 32 bit.
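To see why those two tiny source changes are enough, here is a simplified Python model of the read loop in the function at 43A314. It is a sketch of the logic only, not a translation of the real routine: the function name chunk_sizes and the counter_bits parameter are invented for illustration, and the model assumes every read returns exactly the number of bytes requested.

```python
CHUNK = 0x10000  # the value kept in [ebp-0C]: at most 64k per read


def chunk_sizes(segment_length, counter_bits):
    """Return the list of read sizes the loop would request.

    counter_bits=16 models the demo (length squeezed into cx / a short int);
    counter_bits=32 models the patched or unlimited version.
    """
    mask = (1 << counter_bits) - 1
    remaining = segment_length & mask      # the truncating cast happens here
    sizes = []
    while remaining > 0:
        want = min(remaining, CHUNK)       # the compare against [ebp-0C]
        sizes.append(want)
        remaining = (remaining - want) & mask
    return sizes


print([hex(n) for n in chunk_sizes(0x87A00, 16)])  # ['0x7a00']
print(len(chunk_sizes(0x87A00, 32)))               # 9 reads: eight of 64k plus 0x7a00
print(sum(chunk_sizes(0x87A00, 32)) == 0x87A00)    # True
```

With a 16 bit counter the chunk limit can never kick in, which is why [ebp-0C] looked suspicious but turned out to be harmless: it only matters once the counter is wide enough to hold more than 64k.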
Furthermore, when you run ida.wll through the cracked IDA you see that the call at 0043A395 to 42EC04 is actually a call to the Borland C library routine _fread (read from file). Why are some of the arguments to 43A314 passed through registers instead of on the stack? See the description of the _fastcall calling convention in the Appendix. All in all, however, this is an interesting protection scheme, and certainly not of a kind that I have ever heard of before. Now all we need to do is get those idb files loaded and produce asm and lst files (I think the code is in there, it's just a matter of getting to it).

V. The Expiration Date

This crack is utterly trivial. The code that checks the date is immediately after the test for 64k files at the beginning of the program. I'll leave this crack as a mindless exercise.

VI. Summary of the patch

:00403CED 760B          jbe 00403CFA                   (change 76 to EB)
:0043A82E 662BCB        sub cx, bx                     (change 66 to 90)
:0043A31D 66894DFA      mov word ptr [ebp-06], cx      (change to 90894DF8)
:0043A324 0FB7C2        movzx eax, dx                  (change to 908BC2)
:0043A35E 66837DFA00    cmp word ptr [ebp-06], 0000    (change to 90837DF800)
:0043A369 0FB745FA      movzx eax, word ptr [ebp-06]   (change to 908B45F8)
:0043A377 0FB755FA      movzx edx, word ptr [ebp-06]   (change to 908B55F8)
:0043A3ED 66295DFA      sub word ptr [ebp-06], bx      (change to 90295DF8)
:0043A414 66837DFA00    cmp word ptr [ebp-06], 0000    (change to 90837DF800)

VII. The function at 43A314

:0043A314 55                push ebp
:0043A315 8BEC              mov ebp, esp
:0043A317 83C4E0            add esp, FFFFFFE0
:0043A31A 53                push ebx
:0043A31B 56                push esi
:0043A31C 57                push edi
:0043A31D 66894DFA          mov word ptr [ebp-06], cx      ; here we go!
                                                           ; 7A00 is passed through cx
                                                           ; (a 16 bit register, that's no good!)
:0043A321 8945FC            mov dword ptr [ebp-04], eax
:0043A324 0FB7C2            movzx eax, dx                  ; what's this about?
                                                           ; it's about hours of headache for me (see above)
:0043A327 8BD0              mov edx, eax
:0043A329 8B45FC            mov eax, dword ptr [ebp-04]
:0043A32C 8B7D08            mov edi, dword ptr [ebp+08]
:0043A32F E83065FEFF        call 00420864
:0043A334 F60532EF490010    test byte ptr [0049EF32], 10
:0043A33B C745F400000100    mov [ebp-0C], 00010000         ; 64k?! is this relevant? NO
:0043A342 BA02000000        mov edx, 00000002
:0043A347 7501              jne 0043A34A
:0043A349 4A                dec edx
|:0043A347(C)
|
:0043A34A 8955F0            mov dword ptr [ebp-10], edx
:0043A34D 8B4DF0            mov ecx, dword ptr [ebp-10]
:0043A350 0FAF4DF4          imul ecx, dword ptr [ebp-0C]
:0043A354 51                push ecx
:0043A355 E8028BFFFF        call 00432E5C
:0043A35A 59                pop ecx
:0043A35B 8945EC            mov dword ptr [ebp-14], eax
:0043A35E 66837DFA00        cmp word ptr [ebp-06], 0000    ; cmp 7A00, 0
:0043A363 0F86B6000000      jbe 0043A41F
|:0043A419(C)
|
:0043A369 0FB745FA          movzx eax, word ptr [ebp-06]   ; needs patching
:0043A36D 3B45F4            cmp eax, dword ptr [ebp-0C]    ; [ebp-0C]==10000
:0043A370 7605              jbe 0043A377
:0043A372 8B55F4            mov edx, dword ptr [ebp-0C]
:0043A375 EB04              jmp 0043A37B
|:0043A370(C)
|
:0043A377 0FB755FA          movzx edx, word ptr [ebp-06]   ; [ebp-06]==7A00
|:0043A375(U)
|
:0043A37B 8955E8            mov dword ptr [ebp-18], edx    ; edx==7A00
:0043A37E 8BC7              mov eax, edi
:0043A380 E84B51FEFF        call 0041F4D0
:0043A385 8B4DFC            mov ecx, dword ptr [ebp-04]
:0043A388 51                push ecx
:0043A389 8B45E8            mov eax, dword ptr [ebp-18]    ; ==7A00
:0043A38C 50                push eax                       ; # of bytes to read
:0043A38D 8B55F0            mov edx, dword ptr [ebp-10]
:0043A390 52                push edx
:0043A391 8B4DEC            mov ecx, dword ptr [ebp-14]
:0043A394 51                push ecx
:0043A395 E86A48FFFF        call 0042EC04                  ; this is the call that ends up
                                                           ; at ReadFile and takes # of bytes
                                                           ; to read as third arg

. . . . . some irrelevant code removed . . . . .
|:0043A3AF(C), :0043A3C8(U), :0043A3D4(C)
|
:0043A3ED 66295DFA          sub word ptr [ebp-06], bx      ; this needs patching
:0043A3F1 3B5DE8            cmp ebx, dword ptr [ebp-18]
:0043A3F4 7419              je 0043A40F
:0043A3F6 8B55EC            mov edx, dword ptr [ebp-14]
:0043A3F9 52                push edx
:0043A3FA E8F589FFFF        call 00432DF4
:0043A3FF 59                pop ecx
:0043A400 68BE000000        push 000000BE                  ; message from ida.hlp:
                                                           ; "Error during read, not all of file read"
:0043A405 E872640200        call 0046087C
:0043A40A 59                pop ecx
:0043A40B 33C0              xor eax, eax
:0043A40D EB1F              jmp 0043A42E
|:0043A3F4(C)
|
:0043A40F E8D8FEFFFF        call 0043A2EC
:0043A414 66837DFA00        cmp word ptr [ebp-06], 0000    ; needs patching
:0043A419 0F874AFFFFFF      ja 0043A369
|:0043A363(C)
|
:0043A41F 8B55EC            mov edx, dword ptr [ebp-14]
:0043A422 52                push edx
:0043A423 E8CC89FFFF        call 00432DF4
:0043A428 59                pop ecx
:0043A429 B801000000        mov eax, 00000001
|:0043A40D(U)
|
:0043A42E 5F                pop edi
:0043A42F 5E                pop esi
:0043A430 5B                pop ebx
:0043A431 8BE5              mov esp, ebp
:0043A433 5D                pop ebp
:0043A434 C20400            ret 0004

APPENDIX: Stack frames and calling conventions

The most common way for a program to set up the stack during a function call is to use the ebp register (the base register) to hold the base of the stack relative to that function. An amount is then subtracted from esp which represents the amount of space reserved for local variables. This is done with the following code which is at the beginning of most functions:

  push ebp
  mov ebp, esp
  add esp, -10 (FFFFFFF0)    ; 10h (16 decimal) bytes for local variables

The function can then use positive offsets to ebp to reference arguments and negative offsets to reference local variables.
For example:

  loc1:  push arg3
  loc2:  push arg2
  loc3:  push arg1
  loc4:  call sub
  loc5:  mov [ebp-8], eax

  proc sub
  sub1:  push ebp            ; save ebp from calling function
  sub2:  mov ebp, esp        ; set ebp to point at stack base for this function
  sub3:  sub esp, 10         ; reserve 10h (16) bytes for four local variables
  sub4:  mov eax, [ebp+8]    ; move arg1 into eax
  sub5:  mov ecx, [ebp+C]    ; move arg2 into ecx
  sub6:  mov edx, [ebp+10]   ; move arg3 into edx
  sub7:  mov [ebp-4], eax    ; move arg1 into local_1
  sub8:  mov [ebp-8], edx    ; move arg3 into local_2
  sub9:  push [ebp+C]        ; push arg2 onto stack

At sub1, the stack looks like this (the stack grows downward, so the arguments sit at higher addresses than esp):

  arg3   <-- esp+C
  arg2   <-- esp+8
  arg1   <-- esp+4
  loc5   <-- esp

At sub4 it looks like this:

  arg3                        <-- ebp+10
  arg2                        <-- ebp+C
  arg1                        <-- ebp+8
  loc5                        <-- the return address (ebp+4)
  ebp from calling function   <-- current ebp
  local_1                     <-- ebp-4
  local_2                     <-- ebp-8
  local_3                     <-- ebp-C
  local_4                     <-- ebp-10 <-- esp

Things are not always this pretty, unfortunately. There are three factors that can disrupt this happy picture.

  1. Stack frame optimization
  2. Alternate calling conventions
  3. Enregistered local variables

1. Most compilers these days (certainly M$ VC++ and Borland C++) have an optimization setting that allows you to turn off ebp based stack frames. This makes the function overhead smaller and frees up the ebp register for other uses. The function still references arguments and local variables off the stack, but the ebp register doesn't point to the function's stack base. Instead, esp is used to reference the arguments and locals. The problem is that every time something is pushed onto or popped off of the stack, the value of esp changes. The compiler is able to compensate for this by adjusting the amount esp is offset when referencing the arguments and locals.
Here's an example:

  sub1:  mov eax, [esp+4]    ; move arg1 into eax
  sub2:  push ebx            ; push ebx onto the stack for whatever reason
  sub3:  mov edx, [esp+8]    ; move arg1 into edx---the offset has changed
                             ; because we pushed ebx onto the stack at sub2

This is annoying because it makes it much harder for things that aren't compilers (like us) to keep track of what's getting referenced where. The nice thing about IDA Pro is that it marks all these references for you! That's why it *must* be cracked.

2. A calling convention determines the order in which arguments are passed and how they are passed. The two most common calling conventions are the Pascal and C calling conventions, which fravia+ has explained quite well elsewhere. However, there are two other conventions, which pass arguments through registers, that are worth being aware of. The first is the _thiscall convention, which follows the C calling convention, but also passes a pointer to the object this (in C++, this refers to the current object) through register ecx. The second is the _fastcall convention, which passes the first two arguments that are 32 bits or less through registers ecx and edx, and then passes the rest of the arguments on the stack with the C calling order. That is the Microsoft flavor; Borland's register convention uses eax, edx, and ecx (in that order) before falling back to the stack, which matches what we saw going into 43A314.

3. Local variables are not always stored on the stack after the return address. For optimization purposes (it's much faster to get a value from a register than from memory), registers are sometimes used to hold local variables. Of course, there being a limited number of general registers in Intel chips (RISC chips can have 5-10 times as many registers), this can only be true to a limited extent. Furthermore, the compiler has to use some registers as what are called scratch registers (i.e., places to temporarily hold values) while it moves values in and out of memory. Detecting enregistered local variables can be difficult.
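The standard ebp-based frame from the start of this appendix can be replayed with a toy stack model in Python. This is purely illustrative: the Stack class and the symbolic slot values are made up, and 0x1000 and 0x2000 are arbitrary starting values for esp and the caller's ebp.

```python
class Stack:
    """A toy downward-growing stack of 4-byte slots, keyed by address."""

    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4              # the stack grows toward lower addresses
        self.mem[self.esp] = value

    def read(self, address):
        return self.mem[address]


s = Stack()
caller_ebp = 0x2000                # arbitrary frame base of the calling function

# Caller: push arg3, arg2, arg1, then call (which pushes the return address).
for value in ("arg3", "arg2", "arg1", "return address (loc5)"):
    s.push(value)

# At sub1, before push ebp, the arguments sit at positive offsets from esp.
at_sub1 = [s.read(s.esp + off) for off in (0x0, 0x4, 0x8, 0xC)]

# Callee prologue: push ebp / mov ebp, esp / sub esp, 10h
s.push(caller_ebp)
ebp = s.esp
s.esp -= 0x10                      # room for four dword locals

print(at_sub1)
print(s.read(ebp + 0x8), s.read(ebp + 0xC), s.read(ebp + 0x10))  # arg1 arg2 arg3
print(hex(ebp - s.esp))            # 0x10: local_4 at ebp-10 is also the new esp
```

Once ebp is fixed by the prologue, the argument offsets stay the same however many pushes follow, which is exactly what frameless (esp-relative) code gives up.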
(c) Quine 1997. All rights reversed