When faced with a "live" or "dead" listing of the target program, the reverse-engineer
is like a surgeon to whom the most intricate details of the physiology of his patient
must be known, if there is to be any chance of success. What follows below is the
equivalent of an anatomy primer for the reverse engineer: a summary of the Win32 PE file
header and format, and three examples of the source code of a PE executable: the first
a simple C++ program, the second a rudimentary Windows program written in assembly
language, and the third being a disassembly--via W32Dasm--of the C++ program. It is hoped
that by reviewing and comparing these the reverse-engineer will become familiar with
the fundamental structures and "organs" of his target, and thus be prepared to patch and
"heal" effectively any crippled programs that come his way.
"In the beginning", there were two types of executable files: .COM files (single-segment executables), and .EXE files (multiple-segment executables). COM files needed no file header as they were only a single segment in size, and thus were written directly into the first segment of memory available. EXE files, however, would often cover multiple segments; as a result, they contained a 64-byte file header which provided the OS with information about how many segments were needed and how they would be allocated.
The Windows OS was developed with the idea that executable files would rely on a core collection of OS functions (an "API"). As programs were no longer handling things like keyboard input, mouse tracking, and screen output internally, and as they were expected to share memory with multiple other programs running simultaneously, they needed to provide more information to the OS in order to make communication with the OS both possible and efficient (whereas DOS programs often ignored the OS entirely). Thus came about the NE, LE and now the PE file formats and their subsequent headers: the file format imposes structure on the executable file, and the file header diagrams that structure. Armed with a firm knowledge of the PE file header and format, one can make a surprising number of changes to a PE file with only a hex editor (preferably one that parses the PE header, such as Hiew 5.66).
The PE file header begins with the MS-DOS MZ Header:
MZ Header
    WORD  e_magic;      // Magic number
    WORD  e_cblp;       // Bytes on last page of file
    WORD  e_cp;         // Pages in file
    WORD  e_crlc;       // Relocations
    WORD  e_cparhdr;    // Size of header in paragraphs
    WORD  e_minalloc;   // Minimum extra paragraphs needed
    WORD  e_maxalloc;   // Maximum extra paragraphs needed
    WORD  e_ss;         // Initial (relative) SS value
    WORD  e_sp;         // Initial SP value
    WORD  e_csum;       // Checksum
    WORD  e_ip;         // Initial IP value
    WORD  e_cs;         // Initial (relative) CS value
    WORD  e_lfarlc;     // File address of relocation table
    WORD  e_ovno;       // Overlay number
    WORD  e_res[4];     // Reserved words
    WORD  e_oemid;      // OEM identifier (for e_oeminfo)
    WORD  e_oeminfo;    // OEM information; e_oemid specific
    WORD  e_res2[10];   // Reserved words
    DWORD e_lfanew;     // File address of new exe header

This is followed by a "stub" program (such as winstub.exe) that executes if the program is run outside of Windows; usually this is a simple "This program requires MS-Windows to run." message, but it can be customized by the programmer to say or do anything--in fact one could even write a DOS-mode version of the program that executes whenever the file is run in DOS, thus making it more portable (funny, no one ever seems to take the time to do this). After the "stub" comes the signature 00004550 (which in a hex editor will appear Hex 50 45 00 00, ASCII "PE.."); its file offset is the value stored in e_lfanew, the last field of the MZ header. Then comes the PE file header:
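As a rough illustration of how the MZ header leads to the PE signature, the following C sketch reads e_lfanew and checks the four signature bytes. It assumes the whole file has already been read into a byte buffer; the function names are invented for this example, and the offset 0x3C is simply what falls out of counting the MZ header fields listed above (thirty WORDs precede e_lfanew).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian DWORD from a byte buffer. */
static uint32_t read_dword(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return the file offset of the "PE\0\0"
   signature, or -1 if the buffer does not look like a PE file. */
long find_pe_signature(const uint8_t *file, size_t size)
{
    uint32_t e_lfanew;

    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return -1;                        /* no complete MZ header */

    e_lfanew = read_dword(file + 0x3C);   /* last field of the MZ header */
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;                        /* pointer runs past end of file */

    if (read_dword(file + e_lfanew) != 0x00004550u)
        return -1;                        /* bytes are not 50 45 00 00 */

    return (long)e_lfanew;
}
```

The returned offset is where a hex editor would show "PE.."; the PE file header proper starts four bytes later.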
PE File Header (examples given are from Notepad.exe)
    WORD  Machine Type            ( 014C );
    WORD  Number of Sections      ( 0006 );
    DWORD Time/Date Stamp         ( 2FF3548D );
    DWORD Pointer To Symbol Table ( 00000000 );
    DWORD Number Of Symbols       ( 00000000 );
    WORD  Size Of Optional Header ( 00E0 );
    WORD  Characteristics         ( 010E );

PE Optional Header
    WORD  Magic                   ( 010B );
    BYTE  MajorLinkerVersion      ( 02 );
    BYTE  MinorLinkerVersion      ( 32 );
    DWORD SizeOfCode              ( 00003A00 );
    DWORD SizeOfInitializedData   ( 00004800 );
    DWORD SizeOfUninitializedData ( 00000600 );
    DWORD AddressOfEntryPoint     ( 00001000 );
    DWORD BaseOfCode              ( 00001000 );
    DWORD BaseOfData              ( 00005000 );
    ----NT Optional Fields (used only by Windows NT)----
    DWORD ImageBase;
    DWORD SectionAlignment;
    DWORD FileAlignment;
    WORD  MajorOperatingSystemVersion;
    WORD  MinorOperatingSystemVersion;
    WORD  MajorImageVersion;
    WORD  MinorImageVersion;
    WORD  MajorSubsystemVersion;
    WORD  MinorSubsystemVersion;
    DWORD Reserved1;
    DWORD SizeOfImage;
    DWORD SizeOfHeaders;
    DWORD CheckSum;
    WORD  Subsystem;
    WORD  DllCharacteristics;
    DWORD SizeOfStackReserve;
    DWORD SizeOfStackCommit;
    DWORD SizeOfHeapReserve;
    DWORD SizeOfHeapCommit;
    DWORD LoaderFlags;
    DWORD NumberOfRvaAndSizes;

Immediately after the PE headers come the Section Headers. To reach them, add the Size Of Optional Header value (00E0 here) to the offset at which the PE Optional Header begins; this jumps over the NT optional fields and the table of data directories that follows NumberOfRvaAndSizes.
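The arithmetic for reaching the section headers can be sketched in C as follows. This assumes the file is held in a byte buffer and that the offset of the "PE.." signature is already known; the function names are made up for the example. The only fixed numbers used are the 4-byte signature and the 20 bytes that the PE File Header fields listed above add up to.

```c
#include <stdint.h>

#define PE_SIGNATURE_SIZE   4u   /* "PE\0\0" */
#define PE_FILE_HEADER_SIZE 20u  /* Machine Type through Characteristics */

/* Read a little-endian WORD from a byte buffer. */
static uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: Number of Sections sits 2 bytes into the file header. */
uint16_t pe_number_of_sections(const uint8_t *file, uint32_t pe_offset)
{
    return read_word(file + pe_offset + PE_SIGNATURE_SIZE + 2);
}

/* Hypothetical helper: file offset of the first Section Header.
   Size Of Optional Header sits 16 bytes into the file header. */
uint32_t pe_section_table_offset(const uint8_t *file, uint32_t pe_offset)
{
    uint16_t optional_size =
        read_word(file + pe_offset + PE_SIGNATURE_SIZE + 16);

    return pe_offset + PE_SIGNATURE_SIZE + PE_FILE_HEADER_SIZE + optional_size;
}
```

For a file whose signature sits at offset 80h and whose Size Of Optional Header is 00E0, this works out to 80h + 4 + 14h + E0h = 178h.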
Section Header
    BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
    union { DWORD PhysicalAddress; DWORD VirtualSize; } Misc;   // one DWORD, two names
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD  NumberOfRelocations;
    WORD  NumberOfLinenumbers;
    DWORD Characteristics;

Note that there is one Section Header per section; thus, according to the PE Header for Notepad.exe, there will be 06 Section Headers. Each header contains the section's name in ASCII (e.g. ".text") and a pointer to its location; the headers are 40 bytes apiece and there is no "padding" between them. The sections that are commonly present in an executable are:

    Executable code section: .text
    Data sections: .bss, .rdata, .data
    Resources section: .rsrc
    Export data section: .edata
    Import data section: .idata
    Debug information section: .debug

Note that not all of these sections need be present. When searching for a specific section, it is possible to bypass the PE header entirely and start parsing the section headers by searching for the section name in the ASCII window of a hex editor.
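The same search can be done programmatically. The sketch below walks a table of 40-byte Section Headers looking for a name and then pulls out PointerToRawData. It assumes that PhysicalAddress and VirtualSize occupy a single shared DWORD (which is what makes each header 40 bytes) and that the name field is 8 bytes padded with zeros; the function names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

#define SECTION_HEADER_SIZE 40u
#define SECTION_NAME_SIZE   8u   /* assumed value of IMAGE_SIZEOF_SHORT_NAME */

/* Hypothetical helper: return the index of the section called `name`
   in a table of `count` headers, or -1 if it is not present. */
int find_section(const uint8_t *table, unsigned count, const char *name)
{
    char padded[SECTION_NAME_SIZE] = {0};
    size_t length = strlen(name);
    unsigned i;

    if (length > SECTION_NAME_SIZE)
        return -1;
    memcpy(padded, name, length);         /* zero-pad to 8 bytes */

    for (i = 0; i < count; i++) {
        if (memcmp(table + i * SECTION_HEADER_SIZE, padded,
                   SECTION_NAME_SIZE) == 0)
            return (int)i;
    }
    return -1;
}

/* Hypothetical helper: PointerToRawData lies 20 bytes into a header
   (8 name + 4 Misc + 4 VirtualAddress + 4 SizeOfRawData). */
uint32_t section_pointer_to_raw_data(const uint8_t *table, unsigned index)
{
    const uint8_t *p = table + index * SECTION_HEADER_SIZE + 20;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Here `table` would point at the first Section Header in the file buffer and `count` would be the Number of Sections from the PE File Header.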
Executable code section: .text
This section contains the program code as well as the "fixup" jump table. There is no format to the .text section save that imposed upon the binary code itself.
Data sections: .bss, .rdata, .data
There are three types of data sections: .bss, which contains uninitialized data (including all variables declared as static); .rdata, which contains read-only data (such as strings and constants); and .data, which contains global variables for the program. These sections have no real structure.
Resources section: .rsrc
The .rsrc section contains all of the resources for the application. The first 16 bytes of the .rsrc section contain the Resource Directory Header:
Resource Directory
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    WORD  NumberOfNamedEntries;
    WORD  NumberOfIdEntries;

This is immediately followed by a number of Directory Entries equal to NumberOfNamedEntries + NumberOfIdEntries:
Resource Directory Entry
    DWORD Name;
    DWORD OffsetToData;

The Name of a Directory Entry determines the type of the resource (as defined in winuser.h), while the Offset points either to another Resource Directory or to a Resource Data Entry. The usual structure is one Resource Directory containing the resource type, pointing to one Resource Directory (or subdirectory) containing the resource ID number, which points in turn to the Resource Data Entry:
Resource Data Entry
    DWORD OffsetToData;
    DWORD Size;
    DWORD CodePage;
    DWORD Reserved;

The Resource Data Entry contains the size and offset of the actual resource data, which will be a list of Unicode strings for a String Table, a binary image for a bitmap, or a list of values and strings for a dialog box. Information on the format of the binary resource content can be found in Micro$oft's Win32 Binary Resource Formats document.
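A minimal C sketch of stepping through one Resource Directory follows. It relies on a detail not spelled out above: in winnt.h the top bit of a Directory Entry's OffsetToData (IMAGE_RESOURCE_DATA_IS_DIRECTORY, 80000000h) marks the entry as pointing to a subdirectory rather than a Resource Data Entry, and the remaining bits are an offset measured from the start of the .rsrc section. The function names are invented for the example.

```c
#include <stdint.h>

#define RESOURCE_DIRECTORY_SIZE  16u
#define RESOURCE_ENTRY_SIZE      8u
#define RESOURCE_IS_SUBDIRECTORY 0x80000000u  /* top bit of OffsetToData */

/* Read little-endian values from a byte buffer. */
static uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_dword(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: NumberOfNamedEntries + NumberOfIdEntries. */
unsigned resource_entry_count(const uint8_t *dir)
{
    return (unsigned)read_word(dir + 12) + read_word(dir + 14);
}

/* Hypothetical helper: return the offset stored in entry `index`,
   relative to the start of .rsrc, and report whether it leads to a
   subdirectory (1) or to a Resource Data Entry (0). */
uint32_t resource_entry_offset(const uint8_t *dir, unsigned index,
                               int *is_subdirectory)
{
    uint32_t raw = read_dword(dir + RESOURCE_DIRECTORY_SIZE +
                              index * RESOURCE_ENTRY_SIZE + 4);

    *is_subdirectory = (raw & RESOURCE_IS_SUBDIRECTORY) != 0;
    return raw & ~RESOURCE_IS_SUBDIRECTORY;
}
```

Calling these on the directory at the very start of .rsrc, and again on each subdirectory found, walks the tree down to the Resource Data Entries.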
Export data section: .edata
The .edata section contains the exported data and functions for an application or library (.DLL). The section begins with the Export Directory:
Export Directory
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD *AddressOfFunctions;
    DWORD *AddressOfNames;
    WORD  *AddressOfNameOrdinals;

The last three fields contain pointers to a list of exported function entry points, a list of pointers to the null-terminated function names, and a list of ordinal values for the functions. Note that these pointers assume the program is loaded; to find the lists within the file, one has to subtract the Section Header's VirtualAddress from the AddressOf... field, then add the section's PointerToRawData to the result.
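The conversion just described is a single line of arithmetic; it is written out below in C with a companion check that the address really falls inside the section in question. The function and parameter names are invented for the example, and the values in the note after the code are purely illustrative rather than taken from a real file.

```c
#include <stdint.h>

/* Hypothetical helper: does `rva` fall inside a section that starts at
   `virtual_address` and is `size` bytes long? */
int rva_in_section(uint32_t rva, uint32_t virtual_address, uint32_t size)
{
    return rva >= virtual_address && rva - virtual_address < size;
}

/* Hypothetical helper: turn a loaded-image address (such as an
   AddressOf... field) into an offset within the file on disk, using
   the VirtualAddress and PointerToRawData of the containing section. */
uint32_t rva_to_file_offset(uint32_t rva,
                            uint32_t section_virtual_address,
                            uint32_t section_pointer_to_raw_data)
{
    return rva - section_virtual_address + section_pointer_to_raw_data;
}
```

For instance, an address of 6010h in a section with VirtualAddress 6000h and PointerToRawData 4400h lands at file offset 4410h.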
Import data section: .idata
This section contains the list of functions imported into the program. It begins with the Import Directory:
Import Directory
    DWORD dwRVAFunctionNameList;
    DWORD dwUseless1;
    DWORD dwUseless2;
    DWORD dwRVAModuleName;
    DWORD dwRVAFunctionAddressList;

One such entry appears for each application or library that the program imports from. The order of the data these entries point to is a little strange, as one may notice in a hex editor: the name of the first function imported from a module is given, then the module name is given, then any remaining functions imported from that module are given. The list of imports repeats until there is a null entry.
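Walking the import list until the null entry can be sketched in C as below. The sketch assumes that each imported module gets one 20-byte entry with the five DWORD fields listed above and that the list ends with an entry consisting entirely of zero bytes; the function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define IMPORT_ENTRY_SIZE 20u   /* five DWORDs per module */

/* Hypothetical helper: count the entries that precede the null entry. */
size_t count_import_entries(const uint8_t *dir, size_t size)
{
    size_t count = 0;
    size_t offset, i;

    for (offset = 0; offset + IMPORT_ENTRY_SIZE <= size;
         offset += IMPORT_ENTRY_SIZE) {
        int all_zero = 1;

        for (i = 0; i < IMPORT_ENTRY_SIZE; i++) {
            if (dir[offset + i] != 0) {
                all_zero = 0;
                break;
            }
        }
        if (all_zero)
            break;              /* null entry terminates the list */
        count++;
    }
    return count;
}

/* Hypothetical helper: dwRVAModuleName is the fourth DWORD of an entry. */
uint32_t import_module_name_rva(const uint8_t *dir, size_t index)
{
    const uint8_t *p = dir + index * IMPORT_ENTRY_SIZE + 12;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The value returned by `import_module_name_rva` is an address in the loaded image, so it still has to be converted to a file offset before the module name can be read from disk.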
Debug information section: .debug
This section contains the debug information for the program, if the compiler was set to provide this. It begins with a Debug Directory:
Debug Directory
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    DWORD Type;
    DWORD SizeOfData;
    DWORD AddressOfRawData;
    DWORD PointerToRawData;

The different types of debug information are defined in winnt.h; any further structure imposed on each type of debug information is defined there as well.
Modifying the PE File Header
Since the PE Header gives the starting address and size of each of its various sections (and, as in the case of the .rsrc section, the size of the actual data), it is a simple enough, if tedious, matter to grow the contents of a PE file beyond their original size by adjusting the offset fields of each section following the one that was modified. To add additional code one must extend the .text section and repair every section thereafter; to add or modify resources one must modify each subdirectory of the .rsrc section and repair every section that comes afterwards (this is how BRW and Resource Studio work). It would make sense, then, when changing any section to move its Virtual Address to the end of the PE file, and extend the section before it to cover the resultant gap, which should, of course, be padded with 00 bytes. Note that this is somewhat dicey, especially with the .text and .idata sections, and it increases the size of the executable quite a bit; however, it is the quickest and easiest method in some cases, as only two sections need to be repaired (one padded and extended, the other, "moved" one corrected in every "offset to..." field). An alternative, if the added information is to be data referenced directly by the program (and this may make it desirable to place executable code in the data directories, then refer to it by offset from within the program and insert it into the executable), is to append the data to the last section and extend that section to cover the data.
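When a section is extended, its new size generally has to be padded out rather than set to an arbitrary byte count. The C sketch below assumes that a section's SizeOfRawData should remain a multiple of the FileAlignment value from the optional header, and that the appended bytes go directly after the section's existing raw data; the function names are invented for the example, and other header fields that record sizes would need matching updates that are not shown here.

```c
#include <stdint.h>

/* Round `value` up to the next multiple of `alignment`
   (alignment must be non-zero). */
uint32_t align_up(uint32_t value, uint32_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

/* Hypothetical helper: new SizeOfRawData for a section after
   `extra` bytes are appended to it. */
uint32_t grown_raw_size(uint32_t old_raw_size, uint32_t extra,
                        uint32_t file_alignment)
{
    return align_up(old_raw_size + extra, file_alignment);
}

/* Hypothetical helper: how many 00 bytes must follow the appended
   data to fill the section out to its new size. */
uint32_t padding_after_append(uint32_t old_raw_size, uint32_t extra,
                              uint32_t file_alignment)
{
    return grown_raw_size(old_raw_size, extra, file_alignment) -
           (old_raw_size + extra);
}
```

With a FileAlignment of 200h, appending 10h bytes to a section whose raw size is 4800h gives a new raw size of 4A00h, of which the last 1F0h bytes are padding.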