Mammon_'s Tales to his Grandson
Illustrations of a skeleton


When faced with a "live" or "dead" listing of the target program, the reverse engineer is like a surgeon to whom the most intricate details of the physiology of his patient must be known, if there is to be any chance of success. What follows below is the equivalent of an anatomy primer for the reverse engineer: a summary of the Win32 PE file header and format, and three examples of the source code of a PE executable: the first a simple C++ program, the second a rudimentary Windows program written in assembly language, and the third a disassembly--via W32Dasm--of the C++ program. It is hoped that by reviewing and comparing these the reverse engineer will become familiar with the fundamental structures and "organs" of his target, and thus be prepared to patch and "heal" effectively any crippled programs that come his way.



PE File Format

"In the beginning", there were two types of executable files: .COM files (single-segment executables), and .EXE files (multiple-segment executables). COM files needed no file header as they were only a single segment in size, and thus were written directly into the first segment of memory available. EXE files, however, would often cover multiple segments; as a result, they contained a 64-byte file header which provided the OS with information about how many segments were needed and how they would be allocated.

The Windows OS was developed with the idea that executable files would rely on a core collection of OS functions (an "API"). As the programs were no longer handling things like keyboard input, mouse tracking, and screen output internally, and as they were expected to share memory with multiple other programs running simultaneously, they needed to provide more information to the OS in order to make communication with the OS both possible and efficient (whereas DOS programs often ignored the OS entirely). Thus came about the NE, LE and now the PE file formats and their subsequent headers: the file format imposes structure on the executable file, and the file header diagrams that structure. Armed with a firm knowledge of the PE file header and format, one can make a surprising number of changes on a PE file with only a hex editor (hopefully one that parses the PE header, such as Hiew 5.66).

The PE file header begins with the MS-DOS MZ Header:

 
MZ Header 
WORD e_magic;         // Magic number 
WORD e_cblp;          // Bytes on last page of file 
WORD e_cp;            // Pages in file 
WORD e_crlc;          // Relocations 
WORD e_cparhdr;       // Size of header in paragraphs 
WORD e_minalloc;      // Minimum extra paragraphs needed 
WORD e_maxalloc;      // Maximum extra paragraphs needed 
WORD e_ss;            // Initial (relative) SS value 
WORD e_sp;            // Initial SP value 
WORD e_csum;          // Checksum 
WORD e_ip;            // Initial IP value 
WORD e_cs;            // Initial (relative) CS value 
WORD e_lfarlc;        // File address of relocation table 
WORD e_ovno;          // Overlay number 
WORD e_res[4];        // Reserved words 
WORD e_oemid;         // OEM identifier (for e_oeminfo) 
WORD e_oeminfo;       // OEM information; e_oemid specific 
WORD e_res2[10];      // Reserved words 
DWORD   e_lfanew;        // File address of new exe header 
This is followed by a "stub" program (such as winstub.exe) that executes if the program is run outside of Windows; usually this is a simple "This program requires MS-Windows to run." message, but it can be customized by the programmer to say or do anything--in fact one could even write a DOS-mode version of the program that executes whenever the file is run in DOS, thus making it more portable (funny, no-one ever seems to take the time to do this). After the "stub" comes the signature 00004550 (which in a hex editor will appear Hex 50 45 00 00, ASCII "PE.."), then the PE file header:
 
PE File Header (examples given are from Notepad.exe) 
WORD  Machine Type ( 014C ); 
WORD  Number of Sections ( 0006 ); 
DWORD  Time/Date Stamp ( 2FF3548D ); 
DWORD  Pointer To Symbol Table ( 00000000 ); 
DWORD  Number Of Symbols ( 00000000 ); 
WORD  Size Of Optional Header ( 00E0 ); 
WORD Characteristics ( 010E ); 
PE Optional Header 
WORD  Magic ( 010B ); 
BYTE   MajorLinkerVersion ( 02 ); 
BYTE   MinorLinkerVersion ( 32 ); 
DWORD   SizeOfCode ( 00003A00 ); 
DWORD   SizeOfInitializedData ( 00004800 ); 
DWORD   SizeOfUninitializedData ( 00000600 ); 
DWORD   AddressOfEntryPoint ( 00001000 ); 
DWORD   BaseOfCode ( 00001000 ); 
DWORD   BaseOfData ( 00005000 ); 
----NT Optional Fields (introduced with Windows NT; also read by the other Win32 loaders)---- 
DWORD   ImageBase; 
DWORD   SectionAlignment; 
DWORD   FileAlignment; 
WORD  MajorOperatingSystemVersion; 
WORD  MinorOperatingSystemVersion; 
WORD  MajorImageVersion; 
WORD  MinorImageVersion; 
WORD  MajorSubsystemVersion; 
WORD  MinorSubsystemVersion; 
DWORD   Reserved1; 
DWORD   SizeOfImage; 
DWORD   SizeOfHeaders; 
DWORD   CheckSum; 
WORD  Subsystem; 
WORD  DllCharacteristics; 
DWORD   SizeOfStackReserve; 
DWORD   SizeOfStackCommit; 
DWORD   SizeOfHeapReserve; 
DWORD   SizeOfHeapCommit; 
DWORD   LoaderFlags; 
DWORD   NumberOfRvaAndSizes; 
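
As a rough illustration of how these headers chain together, the following Python sketch reads e_lfanew from the MZ Header, checks the PE signature, and decodes the seven fields of the PE File Header. The function names, the synthetic test image, and its e_lfanew value of 0x80 are assumptions made for the example; only the field layout and the Notepad.exe values come from the listings above.

```python
import struct

# The seven fields of the PE File Header, in the order listed above.
FILE_HEADER_FIELDS = (
    "machine", "number_of_sections", "time_date_stamp",
    "pointer_to_symbol_table", "number_of_symbols",
    "size_of_optional_header", "characteristics",
)


def parse_pe_headers(data: bytes) -> dict:
    """Locate the PE signature through e_lfanew and decode the PE File Header."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew is the last field of the MZ Header: a DWORD at offset 0x3C.
    (e_lfanew,) = struct.unpack_from("<L", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # The 20-byte PE File Header follows the 4-byte signature.
    values = struct.unpack_from("<HHLLLHH", data, e_lfanew + 4)
    header = dict(zip(FILE_HEADER_FIELDS, values))
    header["optional_header_offset"] = e_lfanew + 4 + 20
    return header


def build_demo_image() -> bytes:
    """Build a tiny fake image; the 0x80 header offset is a made-up value."""
    image = bytearray(0x80 + 4 + 20)
    image[0:2] = b"MZ"
    struct.pack_into("<L", image, 0x3C, 0x80)
    image[0x80:0x84] = b"PE\x00\x00"
    # Field values taken from the Notepad.exe example above.
    struct.pack_into("<HHLLLHH", image, 0x84,
                     0x014C, 0x0006, 0x2FF3548D, 0, 0, 0x00E0, 0x010E)
    return bytes(image)


header = parse_pe_headers(build_demo_image())
print(hex(header["machine"]), header["number_of_sections"])  # 0x14c 6
```

On a real file one would pass the bytes read from disk instead of the synthetic image; the returned optional_header_offset marks where the PE Optional Header begins.
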
Immediately after the PE Optional Header come the Section Headers. To find them, add the Size Of Optional Header value (00E0 for Notepad.exe) to the offset at which the Optional Header starts; this jumps over the NT optional fields and the remaining bytes of the Optional Header:
 
Section Header 
BYTE   Name[IMAGE_SIZEOF_SHORT_NAME]; 
DWORD   PhysicalAddress / VirtualSize;   // a union: the two names share one DWORD 
DWORD   VirtualAddress; 
DWORD   SizeOfRawData; 
DWORD   PointerToRawData; 
DWORD   PointerToRelocations; 
DWORD   PointerToLinenumbers; 
WORD  NumberOfRelocations; 
WORD  NumberOfLinenumbers; 
DWORD   Characteristics; 
Note that there is one Section Header per section; thus, according to the PE Header for Notepad.exe, there will be 06 Section Headers. Each section contains its name in ASCII (e.g. ".text") and a pointer to its location; the headers are 40 bytes apiece and there is no "padding" between them. The sections that are commonly present in an executable are:
  • Executable Code Section, named .text
  • Data Sections, named .data, .rdata, or .bss
  • Resources Section, named .rsrc
  • Export Data Section, named .edata
  • Import Data Section, named .idata
  • Debug Information Section, named .debug

    Note that not all of these sections need be present. When searching for a specific section, it is possible to bypass the PE header entirely and start parsing the section headers by searching for the section name in the ASCII window of a hex editor.
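
The section table described above can be walked with a few lines of Python. This is only a sketch: it assumes PhysicalAddress and VirtualSize occupy a single shared DWORD (which is what makes each header come out to the 40 bytes stated above), and the function names and the two demo headers are invented for the illustration.

```python
import struct

# Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData,
# PointerToRelocations, PointerToLinenumbers, NumberOfRelocations,
# NumberOfLinenumbers, Characteristics: 40 bytes in all.
SECTION_HEADER = struct.Struct("<8sLLLLLLHHL")


def parse_section_headers(data: bytes, first_header_offset: int, count: int) -> list:
    """Decode `count` consecutive section headers (no padding between them)."""
    sections = []
    for index in range(count):
        offset = first_header_offset + index * SECTION_HEADER.size
        (name, virtual_size, virtual_address, size_of_raw_data,
         pointer_to_raw_data, _reloc_ptr, _line_ptr, _reloc_count,
         _line_count, characteristics) = SECTION_HEADER.unpack_from(data, offset)
        sections.append({
            "name": name.rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": virtual_size,
            "virtual_address": virtual_address,
            "size_of_raw_data": size_of_raw_data,
            "pointer_to_raw_data": pointer_to_raw_data,
            "characteristics": characteristics,
        })
    return sections


def find_section(sections: list, name: str):
    """Return the first section with the given name, or None if absent."""
    for section in sections:
        if section["name"] == name:
            return section
    return None


# Two hypothetical headers packed back to back, as they would sit in a file.
demo_table = (
    SECTION_HEADER.pack(b".text", 0x3A00, 0x1000, 0x3A00, 0x0400,
                        0, 0, 0, 0, 0x60000020)
    + SECTION_HEADER.pack(b".rsrc", 0x2000, 0x6000, 0x2000, 0x4000,
                          0, 0, 0, 0, 0x40000040)
)

sections = parse_section_headers(demo_table, 0, 2)
print([section["name"] for section in sections])  # ['.text', '.rsrc']
```

With a real executable, first_header_offset would be the offset of the Optional Header plus Size Of Optional Header, and count would be Number of Sections from the PE File Header.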

    Executable code section: .text

    This section contains the program code as well as the "fixup" jump table. The .text section has no format of its own, save that imposed upon the binary code itself.

    Data sections: .bss, .rdata, .data

    There are three types of data sections: .bss, which contains uninitialized data (including all variables declared as static); .rdata, which contains read-only data (such as strings and constants); and .data, which contains global variables for the program. These sections have no real structure.

    Resources section: .rsrc

    The .rsrc section contains all of the resources for the application. The first 16 bytes of the .rsrc section contain the Resource Directory Header:

     
     Resource Directory 
     DWORD   Characteristics; 
     DWORD   TimeDateStamp; 
     WORD  MajorVersion; 
     WORD  MinorVersion; 
     WORD  NumberOfNamedEntries; 
     WORD  NumberOfIdEntries; 
     
    This is immediately followed by the Directory Entries, whose count is NumberOfNamedEntries + NumberOfIdEntries:
     
     Resource Directory Entry 
     DWORD   Name; 
     DWORD   OffsetToData; 
     
    The Name of a Directory Entry determines the type of the resource (as defined in winuser.h), while the Offset points either to another Resource Directory or to a Resource Data Entry. The usual structure is one Resource Directory containing the resource type, which points to one Resource Directory (or subdirectory) containing the resource ID number, which in turn points to the Resource Data Entry:
     
     Resource Data Entry 
     DWORD   OffsetToData; 
     DWORD   Size; 
     DWORD   CodePage; 
     DWORD   Reserved; 
     
    The Resource Data Entry contains the size and offset of the actual resource data, which will be a list of Unicode strings for a String Table, a binary image for a bitmap, or a list of values and strings for a dialog box. Information on the format of the binary resource content can be found in Micro$oft's Win32 Binary Resource Formats document.
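
To make the directory walk concrete, here is a small Python sketch that decodes one Resource Directory and its entries. Two details are not stated in the text above and are supplied here as background: offsets inside .rsrc are counted from the start of the section, and the high bit of OffsetToData marks an entry that leads to a subdirectory rather than to a Resource Data Entry. The function name and the hand-built demo section are hypothetical.

```python
import struct

SUBDIRECTORY_FLAG = 0x80000000  # high bit of OffsetToData


def read_resource_directory(rsrc: bytes, offset: int = 0) -> list:
    """Decode the Resource Directory at `offset` (relative to the .rsrc start)."""
    (_characteristics, _stamp, _major, _minor,
     named_entries, id_entries) = struct.unpack_from("<LLHHHH", rsrc, offset)
    entries = []
    for index in range(named_entries + id_entries):
        # Each Directory Entry is two DWORDs placed right after the 16-byte header.
        name, target = struct.unpack_from("<LL", rsrc, offset + 16 + index * 8)
        entries.append({
            "name": name,
            "is_subdirectory": bool(target & SUBDIRECTORY_FLAG),
            "offset": target & ~SUBDIRECTORY_FLAG,
        })
    return entries


# A made-up .rsrc section: type 6 (a String Table) -> ID 1 -> Resource Data Entry.
demo_rsrc = (
    struct.pack("<LLHHHH", 0, 0, 0, 0, 0, 1)     # root directory, one ID entry
    + struct.pack("<LL", 6, 0x80000018)          # type 6, subdirectory at 0x18
    + struct.pack("<LLHHHH", 0, 0, 0, 0, 0, 1)   # subdirectory at 0x18
    + struct.pack("<LL", 1, 0x30)                # ID 1, data entry at 0x30
    + struct.pack("<LLLL", 0x6000, 0x40, 0, 0)   # OffsetToData, Size, CodePage, Reserved
)

root = read_resource_directory(demo_rsrc)
leaf = read_resource_directory(demo_rsrc, root[0]["offset"])
data_offset, size, _code_page, _reserved = struct.unpack_from(
    "<LLLL", demo_rsrc, leaf[0]["offset"])
print(root[0]["name"], leaf[0]["name"], hex(data_offset), hex(size))
```

The demo follows the two-level layout described above; a deeper tree would be handled by calling read_resource_directory again for each entry flagged as a subdirectory.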

    Export data section: .edata

    The .edata section contains the exported data and functions for an application or library (.DLL). The section begins with the Export Directory:

     
     Export Directory 
     DWORD   Characteristics; 
     DWORD   TimeDateStamp; 
     WORD  MajorVersion; 
     WORD  MinorVersion; 
     DWORD   Name; 
     DWORD   Base; 
     DWORD   NumberOfFunctions; 
     DWORD   NumberOfNames; 
     DWORD  *AddressOfFunctions; 
     DWORD  *AddressOfNames; 
     WORD *AddressOfNameOrdinals; 
     
    The last three fields contain pointers to a list of exported function entry points, a null-separated list of function names, and a list of ordinal values for the functions. Note that these pointers assume the program is loaded; to find the lists within the file, one has to subtract the Section Header Virtual Address from the AddressOf... field, then add the PointerToRawData address to the result.
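
The address arithmetic just described is short enough to write out. The Python sketch below assumes the section headers have already been decoded into dictionaries; the key names and the example .edata values are invented for the illustration.

```python
def rva_to_file_offset(rva: int, sections: list) -> int:
    """Translate a loaded-image address (RVA) into an offset within the file."""
    for section in sections:
        start = section["virtual_address"]
        if start <= rva < start + section["virtual_size"]:
            # Subtract the section's Virtual Address, then add PointerToRawData.
            return rva - start + section["pointer_to_raw_data"]
    raise ValueError("address does not fall inside any section")


# Hypothetical .edata section: loaded at 0x7000, stored in the file at 0x5200.
demo_sections = [
    {"name": ".edata", "virtual_address": 0x7000,
     "virtual_size": 0x400, "pointer_to_raw_data": 0x5200},
]

# An AddressOfNames value of 0x7040 would then be found at file offset 0x5240.
print(hex(rva_to_file_offset(0x7040, demo_sections)))  # 0x5240
```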

    Import data section: .idata

    This section contains the list of functions imported into the program. It begins with the Import Directory:

     
     Import Directory 
     DWORD    dwRVAFunctionNameList; 
     DWORD    dwUseless1; 
     DWORD    dwUseless2; 
     DWORD    dwRVAModuleName; 
     DWORD    dwRVAFunctionAddressList; 
     
    The Import Directory entry is repeated for each application or library that the program imports from, and the list of entries repeats until there is a null entry. The order of the names is a little strange, as one may notice in a hex editor: the name of the first function imported from a module is given, then the module name is given, then any remaining functions imported from that module are given.
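
A sketch of reading the Import Directory in Python follows. It assumes each entry is the five-DWORD (20-byte) structure listed above, that one entry exists per imported module, and that an all-zero entry ends the list. The function name, parameters, and the hand-built demo section are hypothetical.

```python
import struct

ENTRY_SIZE = 20  # five DWORDs per Import Directory entry


def read_import_directory(data: bytes, directory_offset: int,
                          section_va: int, section_raw: int) -> list:
    """List the modules named by the Import Directory starting at directory_offset."""
    modules = []
    offset = directory_offset
    while True:
        entry = struct.unpack_from("<5L", data, offset)
        if not any(entry):
            break  # a null entry terminates the list
        name_list_rva, _useless1, _useless2, module_name_rva, address_list_rva = entry
        # Convert the module-name RVA into a file offset, then read up to the null.
        name_offset = module_name_rva - section_va + section_raw
        name_end = data.index(b"\x00", name_offset)
        modules.append({
            "module": data[name_offset:name_end].decode("ascii"),
            "function_name_list_rva": name_list_rva,
            "function_address_list_rva": address_list_rva,
        })
        offset += ENTRY_SIZE
    return modules


# A made-up .idata section loaded at 0x6000 and stored at file offset 0.
first_name = 3 * ENTRY_SIZE                     # names start after the null entry
second_name = first_name + len(b"KERNEL32.dll\x00")
demo_idata = (
    struct.pack("<5L", 0x6100, 0, 0, 0x6000 + first_name, 0x6200)
    + struct.pack("<5L", 0x6120, 0, 0, 0x6000 + second_name, 0x6220)
    + bytes(ENTRY_SIZE)                         # null entry
    + b"KERNEL32.dll\x00"
    + b"USER32.dll\x00"
)

for module in read_import_directory(demo_idata, 0, 0x6000, 0):
    print(module["module"])
```

The function lists themselves are not followed here; the same RVA conversion would be applied to dwRVAFunctionNameList to reach them.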

    Debug information section: .debug

    This section contains the debug information for the program, if the compiler was set to provide this. It begins with a Debug Directory:

     
    Debug Directory 
    DWORD   Characteristics; 
    DWORD   TimeDateStamp; 
    WORD  MajorVersion; 
    WORD  MinorVersion; 
    DWORD   Type; 
    DWORD   SizeOfData; 
    DWORD   AddressOfRawData; 
    DWORD   PointerToRawData; 
    
    The different types of debug information are defined in winnt.h; any further structure imposed on each type of debug information is defined there as well.

    Modifying the PE File Header

    Since the PE Header gives the starting address and size of each of its various sections (and, as in the case of the .rsrc section, the size of the actual data), it is a simple enough if tedious matter to extend the contents of a section beyond its original size by adjusting the offset fields of each section following the one that was modified. To add additional code one must extend the .text section and repair every section thereafter; to add or modify resources one must modify each subdirectory of the .rsrc section and repair every section that comes afterwards (this is how BRW and Resource Studio work). It would make sense, then, when changing any section to move its Virtual Address to the end of the PE file, and extend the section before it to cover the resultant gap which should, of course, be padded with 00 bytes. Note that this is somewhat dicey, especially with the .text and .idata sections, and it increases the size of the executable quite a bit...however it is the quickest and easiest method in some cases, as only two sections need to be repaired (one padded and extended, the other "moved" and modified in every "offset to..." field). An alternative, if the added information is to be data referenced directly by the program (and this may make it desirable to place executable code in the data directories, then refer to it by offset from within the program and insert it into the executable), is to append the data to the last section and extend that section to cover the data.
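
The last alternative, appending data to the final section, might be sketched in Python as follows. This is a simplified illustration and not a complete patcher: it assumes the last section's raw data ends exactly at the end of the file, uses a hypothetical FileAlignment of 0x200, and updates only the two size fields of the section header (on a real file the SizeOfImage field of the Optional Header would need adjusting as well). All names and demo values are invented.

```python
import struct


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def append_to_last_section(image: bytearray, header_offset: int,
                           payload: bytes, file_alignment: int = 0x200) -> int:
    """Append payload to the last section in place; return its file offset."""
    # VirtualSize, VirtualAddress, SizeOfRawData and PointerToRawData follow
    # the 8-byte section name.
    virtual_size, _virtual_address, raw_size, raw_pointer = struct.unpack_from(
        "<LLLL", image, header_offset + 8)
    if raw_pointer + raw_size != len(image):
        raise ValueError("last section does not end at the end of the file")
    payload_offset = len(image)
    new_raw_size = align_up(raw_size + len(payload), file_alignment)
    image.extend(payload)
    # Pad with 00 bytes up to the new, aligned end of the section.
    image.extend(bytes(raw_pointer + new_raw_size - len(image)))
    # Extend the section so that it covers the appended data.
    new_virtual_size = max(virtual_size, raw_size + len(payload))
    struct.pack_into("<L", image, header_offset + 8, new_virtual_size)
    struct.pack_into("<L", image, header_offset + 16, new_raw_size)
    return payload_offset


# Demo: a 0x600-byte fake file whose last section header sits at offset 0x100.
demo_image = bytearray(0x600)
struct.pack_into("<8sLLLL", demo_image, 0x100,
                 b".rsrc", 0x180, 0x6000, 0x200, 0x400)
where = append_to_last_section(demo_image, 0x100, b"\x90" * 16)
print(hex(where), hex(len(demo_image)))  # 0x600 0x800
```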