REC - Reverse Engineering Compiler User's Guide

REC's Home Page | Using REC | CG's Home page | E-mail : caprino@netcom.com


REC Command Line Syntax

REC uses a number of techniques and information from different sources in addition to the information REC itself extracts from the input executable file.

REC is also highly configurable: it has many options to control the output and to enable or disable its algorithms, producing different types of output.

REC works in batch or interactive mode. It can be called with the following command line syntax:

rec [{+|-}optionname ...] exec_file

To activate an option, precede its name with a + (plus) sign. To disable an option, precede it with a - (minus) sign. To get the list of all the options and their current value, type:

rec +help

Another very important option is +interactive, which starts an interactive session.

Many of the options are used to debug the program or to tune its output. Understanding the complete list of options requires some knowledge of the algorithms and phases that REC goes through to transform an executable file into a source file. If you don't know what an option does, you can experiment by enabling it and checking whether the output becomes clearer. Note that some options are only valid if another option is enabled.

The same set of options is available regardless of the host/target combination.

REC Operation

The following block diagram shows REC's interaction with the files it uses/produces:

The minimum input to REC is the binary executable file. For example:

rec file.exe

If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.

However, since decompilation is a very difficult process, the more additional information the user provides, the better the output.

For example, alternative algorithms can be selected based on the compiler used to build the executable file, or based on readability or output preferences. The default options can be changed in the file .recrc (rec.cfg on MSDOS and Windows), which REC reads. Each line in this file contains one option, written as it would be entered on the command line. For example, if you always want REC to start in interactive mode and to print numeric constants in hexadecimal, put the following lines in the .recrc file:

+interactive
+hexconst

These options can be overridden by command line options. For example, to run REC in batch mode even though the .recrc has a +interactive option, invoke REC with the following command line:

rec -interactive file.exe
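
The precedence rule above can be illustrated with a short Python sketch. This is not REC's own code: the function names parse_option and merge_options are hypothetical, and the sketch only assumes what the text states, namely that options from .recrc are applied first and command line options afterwards, so the later setting wins.

```python
def parse_option(token):
    """Split '+name' or '-name' into (name, enabled)."""
    if len(token) < 2 or token[0] not in "+-":
        raise ValueError(f"not an option: {token!r}")
    return token[1:], token[0] == "+"


def merge_options(rc_lines, cli_tokens):
    """Apply .recrc lines first, then command line tokens, so later settings win."""
    options = {}
    for token in list(rc_lines) + list(cli_tokens):
        token = token.strip()
        if not token:
            continue  # skip blank lines in the configuration file
        name, enabled = parse_option(token)
        options[name] = enabled
    return options


rc = ["+interactive", "+hexconst"]   # contents of .recrc from the example
cli = ["-interactive"]               # options given on the command line
print(merge_options(rc, cli))        # prints {'interactive': False, 'hexconst': True}
```

With the example settings, +hexconst from .recrc stays active while -interactive on the command line switches REC back to batch mode.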

Command Files and handling unrecognized formats

The input file may be in a format not yet recognized by REC. In this case, REC has no knowledge of which areas of the file contain data, which contain code and which contain auxiliary information. This information can be given to REC in an ASCII file, called a command file. A command file can also provide much more information, including predefined types, addresses of functions and configuration options. For example, REC could be invoked with the following command line:

rec file.cmd

where file.cmd has the following content:

#!wrec
option: +hexconst

file: file.exe 0x50 0x53
region: 0x80100000 0x801009b4 0x800 data
region: 0x801009b4 0x8010c1e8 0x11b4 text
region: 0x8010c1e8 0x80120800 0xc888 data

symbol: 0x80107fe0, 0x80108077 T CrearImage()
symbol: 0x80108078, 0x801080d7 T LoadImage(char *, int, int)
symbol: 0x801080d8, 0x8010813b T StoreImage()
symbol: 0x8010813c, 0x801081ff T MoveImage(char *, int, int)

patterns: libmips.pat

types: string.o
types: stdio.o

The file starts with the magic id #!wrec, which must be on the first line. Each of the following lines contains one command, followed by a colon (:) and its arguments. Comments start with a '#' character; the remainder of the line after the '#' is ignored.
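
The line format just described can be read with a few lines of Python. The sketch below is only an illustration of the rules in the text (magic id on the first line, one command: per line, '#' comments). It is not REC's parser, and the function name parse_command_file is hypothetical.

```python
def parse_command_file(text):
    """Return a list of (command, arguments) pairs from command-file text."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#!wrec":
        raise ValueError("command file must start with the magic id #!wrec")
    commands = []
    for line in lines[1:]:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # blank or comment-only line
        command, sep, arguments = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {line!r}")
        commands.append((command.strip(), arguments.strip()))
    return commands


sample = """#!wrec
option: +hexconst

file: file.exe 0x50 0x53
region: 0x80100000 0x801009b4 0x800 data   # first data region
"""
for command, arguments in parse_command_file(sample):
    print(command, "->", arguments)
```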

Each option: command sets one of REC's options. These options override those provided on the command line.

The file: command specifies the binary file to be loaded. There should be only one file: command. After the file name, the optional magic argument specifies an identifier (magic number) that must be present at the beginning of the file.

The region: commands specify the layout of the binary file. The arguments are the start and end memory addresses at which the code or data will be loaded, and the file offset where the region starts. Note that no actual loading occurs: the addresses are only used for informational purposes (they must be correct for call statements to be meaningful). The last argument is the region type, which determines how the content of the region is processed. Only text regions are considered for decompilation. Data regions are scanned to find ASCII strings and generic pointers.

In the example:

region: 0x80100000     0x801009b4     0x800     data
        start addr      end addr      file      region
                                      offset    type
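
The relationship between memory addresses and file offsets can be shown with a small Python sketch. It is an illustration under stated assumptions, not REC's implementation: the names Region, parse_region and address_to_offset are hypothetical, and the end address is assumed to be exclusive because each region in the example starts where the previous one ends.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int        # start memory address of the region
    end: int          # end memory address of the region
    file_offset: int  # offset in the binary file where the region starts
    kind: str         # region type, "text" or "data"


def parse_region(line):
    """Parse a 'region: start end offset type' line from a command file."""
    command, _, rest = line.partition(":")
    if command.strip() != "region":
        raise ValueError(f"not a region command: {line!r}")
    start, end, offset, kind = rest.split()
    return Region(int(start, 16), int(end, 16), int(offset, 16), kind)


def address_to_offset(regions, address):
    """Return (file offset, region type) for an address, or None if unmapped."""
    for region in regions:
        # Assumption: the end address is exclusive (see the lead-in).
        if region.start <= address < region.end:
            return region.file_offset + (address - region.start), region.kind
    return None


regions = [
    parse_region("region: 0x80100000 0x801009b4 0x800 data"),
    parse_region("region: 0x801009b4 0x8010c1e8 0x11b4 text"),
]
offset, kind = address_to_offset(regions, 0x80107fe0)
print(hex(offset), kind)  # prints 0x87e0 text
```

Here 0x80107fe0 is the starting address given for the first symbol: in the example command file; it falls in the text region and corresponds to file offset 0x87e0.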

The symbol: commands specify the starting and ending addresses of functions, along with a symbolic name and possibly a list of parameters for the function. The ending address is optional, and can be computed by REC automatically (see later). The ANSI-C style prototype is also optional, and its use is actually discouraged, as types should be defined in a type file (see the types: command later). It is better to simply mark the symbol as a function by adding ( ).

The patterns: commands specify one or more files containing a list of hex strings (patterns) and symbolic names. REC searches the executable file for each pattern and, when one is found, assigns the symbolic name associated with the pattern to the address where the pattern begins. The following is an example of a pattern file:

open() size: 16
A0 00 0A 24 08 00 40 01
00 00 09 24 00 00 00 00
;
lseek() size: 16
A0 00 0A 24 08 00 40 01 01 00 09 24 00 00 00 00
;
...

Each pattern can be up to 256 bytes long. These patterns are sometimes called signatures in the literature. The size: option tells REC how many bytes the function occupies in the binary file, which can be more than the length of the pattern. For example, you can specify a 16-byte pattern for a 3000-byte function.
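
The pattern search can be sketched in Python as follows. This is a simplified illustration rather than REC's matching algorithm: the function names parse_patterns and find_symbols are hypothetical, the parser only handles the layout shown in the example pattern file, and the search is a plain byte-for-byte comparison.

```python
def parse_patterns(text):
    """Parse pattern-file text into a list of (name, size, pattern bytes)."""
    patterns = []
    for entry in text.split(";"):  # each pattern is terminated by ';'
        lines = [ln.strip() for ln in entry.strip().splitlines() if ln.strip()]
        if not lines:
            continue
        # The header line looks like: "open() size: 16"
        name, _, size = lines[0].partition("size:")
        hex_bytes = " ".join(lines[1:]).split()
        pattern = bytes(int(b, 16) for b in hex_bytes)
        patterns.append((name.strip(), int(size), pattern))
    return patterns


def find_symbols(image, base_address, patterns):
    """Return {address: name} for every place a pattern occurs in the image."""
    found = {}
    for name, _size, pattern in patterns:
        pos = image.find(pattern)
        while pos != -1:
            found[base_address + pos] = name
            pos = image.find(pattern, pos + 1)
    return found


pattern_text = """open() size: 16
A0 00 0A 24 08 00 40 01
00 00 09 24 00 00 00 00
;
lseek() size: 16
A0 00 0A 24 08 00 40 01 01 00 09 24 00 00 00 00
;
"""
patterns = parse_patterns(pattern_text)
# A made-up image: 4 padding bytes followed by the lseek() pattern.
image = bytes(4) + patterns[1][2]
for address, name in find_symbols(image, 0x80100000, patterns).items():
    print(hex(address), name)  # prints 0x80100004 lseek()
```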

The types: commands specify one or more ELF files with STAB symbolic information. These files are read to get predefined types and function prototypes. To create a types file, you can simply use the system compiler on Linux (or gcc on a Solaris system) with the -g option. For example, to let REC know the types of the functions defined in the string.h header file, you can compile the following C source with the command line "gcc -g -c string.c":

/* string.c - types defined by string.h */

int strcmp(const char *s1, const char *s2) { }
int strncmp(const char *s1, const char *s2, int len) { }
char *strcpy(char *dst, const char *src) { }
char *strchr(const char *s, int ch) { }
....

REC will add the prototype information to the symbols specified by the symbol: commands or to those found by the patterns: command. The actual code for the compiled functions is ignored, as well as their addresses. Note that the compiler will not generate symbolic information for functions that are not defined in the file, hence the { } at the end of each function.

REC's output

When the end of the command file is reached, and/or when REC has finished analyzing the executable file, it will either enter interactive mode, or it will process the entire executable file. Currently there can be two types of output:

  1. If the +disasmonly option was specified, a file with the .dis extension will be produced. In this file, every region with the text attribute will be disassembled, and every region with the data attribute will be hexdumped.
  2. Without any option, a file with the .rec extension will be produced with a C-like representation of each procedure in each text section. The C-like representation is not perfect, and cannot be fed to a compiler to recreate the original binary. Its goal is to provide the user a better understanding of the structure of the program. The following is an example of the C-like output:
    	fill_buff()
    	{
    	L8006be08:
    		r7 = buf_2;
    		r6 = r4;
    		r8 = r6 + 0x30;
    	L8006be18:
    		do {
    			r7->f0 = r6->f0;
    			r7->f4 = r6->f4;
    			r7->f8 = r6->f8;
    			r7->f12 = r6->f12;
    			_t = r6 + 0x10 - r8;
    			r7 = r7 + 0x10;
    		} while(_t != 0x0);
    	L8006be44:
    		return(0x0);
    	} 
    	... 
    	symbol: 0x8006BE08, 0x8006BE4B T fill_buff()
    	symbol: 0x8006BE4C, 0x8006BE73 T L8006BE4C()
    	symbol: 0x8006BE74, 0x8006BEB7 T L8006be74() 

The symbol: lines at the end of the output can be modified (for example, by providing a name for the L8006BE4C function) and then copied back into the .cmd file for another decompilation pass.
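
This edit-and-copy step can be scripted. The Python sketch below is a hypothetical helper, not part of REC: it collects the symbol: lines from the text of a .rec file and applies a table of new names keyed by starting address. The function name extract_symbols and the replacement name copy_block are made up for illustration, and only the name() form shown in the output above is handled.

```python
import re

# Matches lines such as: symbol: 0x8006BE4C, 0x8006BE73 T L8006BE4C()
SYMBOL_RE = re.compile(
    r"^\s*symbol:\s*(0x[0-9a-fA-F]+)\s*,\s*(0x[0-9a-fA-F]+)\s+T\s+(\w+)\(\)\s*$"
)


def extract_symbols(rec_text, renames):
    """Collect symbol: lines from .rec output, applying {start address: new name}."""
    result = []
    for line in rec_text.splitlines():
        match = SYMBOL_RE.match(line)
        if not match:
            continue  # not a symbol: line
        start, end, name = match.groups()
        name = renames.get(int(start, 16), name)
        result.append(f"symbol: {start}, {end} T {name}()")
    return result


rec_text = """
    symbol: 0x8006BE08, 0x8006BE4B T fill_buff()
    symbol: 0x8006BE4C, 0x8006BE73 T L8006BE4C()
"""
# copy_block is a made-up name chosen for the unnamed function.
for line in extract_symbols(rec_text, {0x8006BE4C: "copy_block"}):
    print(line)
```

The printed lines can then be pasted into the .cmd file in place of the old symbol: commands.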

Additional output files may be produced if any of the debugging options are enabled. These files hold the intermediate representation of the decompiled file at different stages of the decompilation process.

Options List

The following is a list of all the options supported by REC. The options are presented in hierarchical order; that is, some options are meaningful only if their parent option has been enabled.

I might add more options as I add other features. For example, although REC is able to read ELF, A.OUT and Windows PE COFF files, the MIPS version only reads generic binary files. If the executable file has an internal structure (such as a file header and separate areas for code and data), the user must provide the layout and memory allocation information. This is because the only MIPS files I can use as examples are raw binary files.


TODO List:

Things that I still need to add (I'm working on them in my spare time):


Copyright (C) 1997 - 1998 Backer Street Software -- All rights reserved.

Last revised on 1 Feb. 1998
