REC's Home Page | Using REC | CG's Home page | E-mail : caprino@netcom.com
| REC Command Line Syntax |
|---|
REC combines a number of techniques, and uses information from several sources in addition to the information it extracts from the input executable file.
REC is also highly configurable: it has many options that control the output and enable or disable individual algorithms, so that it can produce different types of output.
REC works in batch or interactive mode. It can be called with the following command line syntax:
rec [{+|-}optionname ...] exec_file
To activate an option, precede its name with a + (plus) sign. To disable an option, precede it with a - (minus) sign. To get the list of all the options and their current value, type:
rec +help
Another very important option is +interactive, which starts an interactive session.
Many of the options are used to debug the program or to tune its output. Understanding the complete list of options requires an understanding of the algorithms and phases that REC uses to transform an executable file into a source file. If you don't know the meaning of an option, you can experiment by enabling it and checking whether the output is clearer. Note that some options are only valid if another option is enabled.
The same set of options is available regardless of the host/target combination.
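The {+|-}optionname convention is simple enough to sketch. The C fragment below is a minimal illustration, not REC's source: the rec_option table, its default values and the functions apply_option, option_value and print_options are all hypothetical names. Any argument that does not start with + or - (such as the executable file name) is treated as a non-option.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical option table. The names are taken from REC's option list,
   but the table layout and the default values are assumptions. */
struct rec_option {
    const char *name;
    int enabled;
};

static struct rec_option options[] = {
    { "interactive", 0 },
    { "hexconst",    0 },
    { "silent",      0 },
    { "disasmonly",  0 },
};

#define NOPTIONS (sizeof options / sizeof options[0])

/* Apply one "{+|-}optionname" argument: '+' enables, '-' disables.
   Returns 0 on success, -1 if the argument has no sign (for example the
   executable's file name) or names an unknown option. */
int apply_option(const char *arg)
{
    int value;
    size_t i;

    if (arg[0] == '+')
        value = 1;
    else if (arg[0] == '-')
        value = 0;
    else
        return -1;

    for (i = 0; i < NOPTIONS; i++) {
        if (strcmp(options[i].name, arg + 1) == 0) {
            options[i].enabled = value;
            return 0;
        }
    }
    return -1;
}

/* Current value of an option, or -1 if there is no such option. */
int option_value(const char *name)
{
    size_t i;
    for (i = 0; i < NOPTIONS; i++)
        if (strcmp(options[i].name, name) == 0)
            return options[i].enabled;
    return -1;
}

/* Print every option with its current value in the same {+|-}name form
   used on the command line (only a guess at what +help looks like). */
void print_options(FILE *out)
{
    size_t i;
    for (i = 0; i < NOPTIONS; i++)
        fprintf(out, "%c%s\n", options[i].enabled ? '+' : '-', options[i].name);
}
```

With this sketch, calling apply_option("+interactive") and then print_options(stdout) would list +interactive alongside the other options, still shown with their defaults.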
| REC Operation |
|---|
The following block diagram shows REC's interaction with the files it uses/produces:
The minimum input to REC is the binary executable file. For example:
rec file.exe
If file.exe is in one of the recognized formats, it will be read, and a file.rec will be produced using the default options, without further intervention from the user.
However, since decompilation is a very difficult process, the more information the user can provide, the better the output.
For example, alternative algorithms could be selected based on the compiler used to compile the executable file, or based on readability or output preferences. To change any of the default options, put them in the file .recrc (rec.cfg on MS-DOS and Windows), which REC reads. Each line in this file contains one option, written as it would be on the command line. For example, if you always want REC to start in interactive mode and to always print numeric constants in hexadecimal, use the following lines in the .recrc file:
+interactive
+hexconst
These options can be overridden by command line options. For example, to run REC in batch mode even though the .recrc file has a +interactive option, invoke REC with the following command line:
rec -interactive file.exe
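The precedence between .recrc and the command line amounts to "apply the file first, then the command line, and let the last setting win". The sketch below assumes exactly that order; the settings structure, apply_token, resolve_options and the all-off defaults are hypothetical, and only two options are modelled.

```c
#include <string.h>

/* Two options are enough to show the precedence rule. */
struct settings {
    int interactive;
    int hexconst;
};

/* Apply one "+name" or "-name" token; anything else is ignored here. */
static void apply_token(struct settings *s, const char *tok)
{
    int on;

    if (tok[0] != '+' && tok[0] != '-')
        return;
    on = (tok[0] == '+');
    if (strcmp(tok + 1, "interactive") == 0)
        s->interactive = on;
    else if (strcmp(tok + 1, "hexconst") == 0)
        s->hexconst = on;
}

/* rc_text is the contents of .recrc (rec.cfg), one option per line.
   argv/argc are the command-line arguments, argv[0] being the program name. */
struct settings resolve_options(const char *rc_text, int argc, char **argv)
{
    struct settings s = { 0, 0 };   /* assumed built-in defaults: both off */
    char line[128];
    const char *p = rc_text;
    int i;

    /* 1. The configuration file: each line acts like a command-line option.
          Lines too long for the buffer are skipped in this sketch. */
    while (*p != '\0') {
        size_t n = strcspn(p, "\r\n");
        if (n > 0 && n < sizeof line) {
            memcpy(line, p, n);
            line[n] = '\0';
            apply_token(&s, line);
        }
        p += n;
        while (*p == '\r' || *p == '\n')   /* skip CR/LF and blank lines */
            p++;
    }

    /* 2. The command line is applied last, so it overrides the file. */
    for (i = 1; i < argc; i++)
        apply_token(&s, argv[i]);
    return s;
}
```

Handling the carriage return lets the same loop read a DOS-style rec.cfg as well as a Unix .recrc.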
Command Files and handling unrecognized formats
The input file could be in a format not yet recognized by REC. In this case, REC has no knowledge of which areas of the file contain data, which contain code and which contain auxiliary information. REC can instead be given this information in an ASCII file, called a command file. A command file can also provide much more information, including predefined types, addresses of functions and configuration options. For example, REC could be invoked with the following command line:
rec file.cmd
where file.cmd has the following content:
#!wrec
option: +hexconst
file: file.exe 0x50 0x53
region: 0x80100000 0x801009b4 0x800 data
region: 0x801009b4 0x8010c1e8 0x11b4 text
region: 0x8010c1e8 0x80120800 0xc888 data
symbol: 0x80107fe0, 0x80108077 T CrearImage()
symbol: 0x80108078, 0x801080d7 T LoadImage(char *, int, int)
symbol: 0x801080d8, 0x8010813b T StoreImage()
symbol: 0x8010813c, 0x801081ff T MoveImage(char *, int, int)
patterns: libmips.pat
The file starts with a magic id, #!wrec, which must be on the first line. Each following line contains one command, followed by a colon (:) and its arguments. Comments start with a '#' character; the remainder of the line after the '#' is ignored.
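The line syntax just described (magic id, command followed by a colon and arguments, '#' comments) can be handled with a few string operations. This is a hypothetical helper, not REC's parser: split_command and has_magic are illustrative names, and the sketch assumes the first colon on a line separates the command from its arguments. Check the first line with has_magic before passing lines to split_command, because split_command would treat #!wrec as a comment.

```c
#include <ctype.h>
#include <string.h>

/* Split one command-file line into its command name and argument string.
   Text after '#' is a comment. The line is modified in place.
   Returns 1 if a command was found, 0 for a blank or comment-only line,
   -1 if the line has no colon. */
int split_command(char *line, char **cmd, char **args)
{
    char *hash = strchr(line, '#');
    char *colon;
    char *end;

    if (hash != NULL)
        *hash = '\0';                      /* drop the comment */

    while (isspace((unsigned char)*line))  /* skip leading blanks */
        line++;
    if (*line == '\0')
        return 0;

    colon = strchr(line, ':');             /* first colon ends the command */
    if (colon == NULL)
        return -1;
    *colon = '\0';

    /* Trim trailing blanks from the command name. */
    end = colon;
    while (end > line && isspace((unsigned char)end[-1]))
        *--end = '\0';

    /* Skip leading blanks in the arguments, then trim trailing ones
       (including a '\n' left by fgets). */
    *cmd = line;
    *args = colon + 1;
    while (isspace((unsigned char)**args))
        (*args)++;
    end = *args + strlen(*args);
    while (end > *args && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return 1;
}

/* The first line of a command file must start with the magic id. */
int has_magic(const char *first_line)
{
    return strncmp(first_line, "#!wrec", 6) == 0;
}
```

Only the first colon is used, so symbol: lines whose prototypes contain further punctuation still split correctly.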
Each option: command sets one of REC's options. These options override those given on the command line.
The file: command specifies the binary file to be loaded. There should be only one file: command. After the file name, the optional magic argument (0x50 0x53 in the example) specifies an identifier that must be present at the beginning of the file (a magic number).
The region: commands specify the layout of the binary file. The arguments are the start and end memory addresses at which the code and data would be loaded into memory, and the file offset where the region starts. Note that no actual loading occurs: the addresses are used only for informational purposes (but they must be correct for call statements to be meaningful). The last argument is the region type, which determines what REC does with the region's contents. Only text regions are considered for decompilation. Data regions are scanned to find ASCII strings and generic pointers.
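The text does not say how data regions are scanned for ASCII strings. One common heuristic, shown here purely as an assumption, is to look for runs of printable characters that are terminated by a NUL byte and are at least some minimum length long; find_strings and its min_len parameter are hypothetical.

```c
#include <stddef.h>

/* Report every run of at least min_len printable ASCII bytes that ends
   with a NUL byte. The criteria (printable range, minimum length, NUL
   terminator) are assumptions, not REC's documented rules. Calls
   report(offset, length) for each string if report is not NULL, and
   returns the number of strings found. */
size_t find_strings(const unsigned char *data, size_t size, size_t min_len,
                    void (*report)(size_t offset, size_t length))
{
    size_t i, start = 0, count = 0;
    int in_run = 0;

    for (i = 0; i < size; i++) {
        unsigned char c = data[i];
        int printable = (c >= 0x20 && c < 0x7f) || c == '\t' || c == '\n';

        if (printable) {
            if (!in_run) {          /* a new candidate string starts here */
                in_run = 1;
                start = i;
            }
        } else {
            /* A candidate counts only if a NUL ends it and it is long enough. */
            if (in_run && c == '\0' && i - start >= min_len) {
                if (report != NULL)
                    report(start, i - start);
                count++;
            }
            in_run = 0;
        }
    }
    return count;   /* a run reaching the end of the region is not counted */
}
```

Raising min_len trades missed short strings for fewer false positives among random data bytes.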
In the command file example above, the first region: line breaks down as follows:

| start addr | end addr | file offset | region type |
|---|---|---|---|
| 0x80100000 | 0x801009b4 | 0x800 | data |

The symbol: commands specify the starting and ending addresses of functions, along with a symbolic name and possibly a list of parameters for the function. The ending address is optional, and can be computed by REC automatically (see later). The ANSI C style prototype is also optional, and its use is actually discouraged, because types should be defined in a type file (see the types: command later). It is better to simply specify that the symbol is a function by adding ( ).
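Because the region: arguments give both memory addresses and a file offset, an address inside a region can be mapped back to a position in the file. The sketch below assumes that each region's bytes are stored contiguously from its file offset, and treats the end address as exclusive (in the example, each region ends where the next one begins); addr_to_offset is a hypothetical name. With the example's regions, CrearImage at 0x80107fe0 falls in the text region at file offset 0x11b4 + (0x80107fe0 - 0x801009b4) = 0x87e0.

```c
#include <stddef.h>
#include <stdint.h>

struct region {
    uint32_t start;    /* first memory address of the region */
    uint32_t end;      /* end memory address, treated as exclusive here */
    uint32_t offset;   /* file offset where the region's bytes begin */
    const char *type;  /* "text" or "data" */
};

/* The three region: lines from the command file example. */
static const struct region regions[] = {
    { 0x80100000u, 0x801009b4u, 0x800u,  "data" },
    { 0x801009b4u, 0x8010c1e8u, 0x11b4u, "text" },
    { 0x8010c1e8u, 0x80120800u, 0xc888u, "data" },
};

/* Map a memory address to its file offset. Returns 0 and fills in
   *file_off and *type on success, -1 if no region contains addr. */
int addr_to_offset(uint32_t addr, uint32_t *file_off, const char **type)
{
    size_t i;
    for (i = 0; i < sizeof regions / sizeof regions[0]; i++) {
        const struct region *r = &regions[i];
        if (addr >= r->start && addr < r->end) {
            *file_off = r->offset + (addr - r->start);
            *type = r->type;
            return 0;
        }
    }
    return -1;
}
```

This is also why the addresses must be correct: a wrong start address shifts every computed offset, and call targets then point at the wrong bytes.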
The patterns: commands specify one or more files containing a list of hex strings (patterns) and symbolic names. REC searches the executable file for each pattern and, when it finds one, assigns the symbolic name associated with the pattern to the address where the pattern begins. The following is an example of a pattern file:
open() size: 16
A0 00 0A 24 08 00 40 01
00 00 09 24 00 00 00 00
;
lseek() size: 16
A0 00 0A 24 08 00 40 01 01 00 09 24 00 00 00 00
;
...
Each pattern can be up to 256 bytes long. These patterns are sometimes called signatures in the literature. The size: option tells REC how many bytes the function occupies in the binary file, which can be more than the length of the pattern. For example, you can specify a 16-byte pattern for a 3000-byte function.
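Pattern matching as described is a byte-for-byte search of the executable for each signature. A naive version is sketched below; the pattern structure and find_pattern are hypothetical, and the sketch reports only the first match. Note that the open() and lseek() patterns above share their first eight bytes, so it is the full 16-byte comparison that tells them apart.

```c
#include <stddef.h>
#include <string.h>

/* One pattern-file entry: name, signature bytes and the size: value. */
struct pattern {
    const char *name;
    const unsigned char *bytes;
    size_t len;          /* up to 256 bytes according to the text */
    size_t func_size;    /* the size: value, may exceed len */
};

/* Naive signature search: return the offset of the first occurrence of
   p->bytes in image, or (size_t)-1 if it does not occur. */
size_t find_pattern(const unsigned char *image, size_t image_len,
                    const struct pattern *p)
{
    size_t i;

    if (p->len == 0 || p->len > image_len)
        return (size_t)-1;
    for (i = 0; i + p->len <= image_len; i++)
        if (memcmp(image + i, p->bytes, p->len) == 0)
            return i;
    return (size_t)-1;
}
```

If a pattern is found at address A, the resulting symbol would span A to A + size - 1, matching the inclusive end addresses used by the symbol: commands. A faster multi-pattern search would give the same results on large files.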
The types: commands specify one or more ELF files with STAB symbolic information. REC reads these files to get predefined types and function prototypes. To create a types file, you can simply use the Linux system compiler (or gcc on a Solaris system) with the -g option. For example, to let REC know the types of the functions declared in the string.h header file, you can compile the following C source with the command line "gcc -g -c string.c":
/* string.c - types defined by string.h */
#include <stddef.h>   /* for size_t */

/* The empty bodies are intentional: only the prototypes matter. */
int strcmp(const char *s1, const char *s2) { }
int strncmp(const char *s1, const char *s2, size_t len) { }
char *strcpy(char *dst, const char *src) { }
char *strchr(const char *s, int ch) { }
...
REC will add the prototype information to the symbols specified by the symbol: commands or to those found by the patterns: command. The compiled code of the functions is ignored, as are their addresses. Note that the compiler does not generate symbolic information for functions that are only declared and not defined in the file, hence the { } at the end of each function.
| REC's output |
|---|
When the end of the command file is reached, and/or when REC has finished analyzing the executable file, it will either enter interactive mode, or it will process the entire executable file. Currently there can be two types of output:
- If the +disasmonly option was specified, a file with the .dis extension will be produced. In this file, every region with the text attribute will be disassembled, and every region with the data attribute will be hexdumped.
- Without any option, a file with the .rec extension will be produced, containing a C-like representation of each procedure in each text section. The C-like representation is not perfect and cannot be fed to a compiler to recreate the original binary; its goal is to give the user a better understanding of the structure of the program. The following is an example of the C-like output:
fill_buff()
{
L8006be08:
    r7 = buf_2;
    r6 = r4;
    r8 = r6 + 0x30;
L8006be18:
    do {
        r7->f0 = r6->f0;
        r7->f4 = r6->f4;
        r7->f8 = r6->f8;
        r7->f12 = r6->f12;
        _t = r6 + 0x10 - r8;
        r7 = r7 + 0x10;
    } while(_t != 0x0);
L8006be44:
    return(0x0);
}
...
symbol: 0x8006BE08, 0x8006BE4B T fill_buff()
symbol: 0x8006BE4C, 0x8006BE73 T L8006BE4C()
symbol: 0x8006BE74, 0x8006BEB7 T L8006be74()

The symbol: lines at the end of the output can be modified (for example, by giving a name to the L8006BE4C function) and then copied back into the .cmd file for another decompilation pass.
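Because the symbol: lines use a fixed layout, they are easy to regenerate, or to read back after editing. The sketch below copies the layout from the sample output above (uppercase hex addresses, a T, then the name followed by parentheses); format_symbol and parse_symbol are hypothetical helpers, not part of REC.

```c
#include <stdio.h>
#include <string.h>

/* Write one symbol: line, e.g. "symbol: 0x8006BE08, 0x8006BE4B T fill_buff()".
   Returns the snprintf result (the length of the full line). */
int format_symbol(char *buf, size_t bufsize, unsigned long start,
                  unsigned long end, const char *name)
{
    return snprintf(buf, bufsize, "symbol: 0x%08lX, 0x%08lX T %s()",
                    start, end, name);
}

/* Read a symbol: line back. The name is copied up to the first '(' so a
   prototype such as "LoadImage(char *, int, int)" yields "LoadImage".
   Returns 0 on success, -1 if the line does not match the layout. */
int parse_symbol(const char *line, unsigned long *start, unsigned long *end,
                 char *name, size_t namesize)
{
    int pos = -1;    /* set by %n only if the whole prefix matched */
    size_t len;

    if (sscanf(line, "symbol: %lx, %lx T %n", start, end, &pos) != 2 || pos < 0)
        return -1;
    len = strcspn(line + pos, "(");
    if (len == 0 || len >= namesize)
        return -1;
    memcpy(name, line + pos, len);
    name[len] = '\0';
    return 0;
}
```

The %n check matters: sscanf reports two successful conversions even when the trailing " T " does not match, so pos is what confirms the full layout.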
Additional output files may be produced if any of the debugging options are enabled. These files contain the intermediate representation of the decompiled file at different stages of the decompilation process.
| Options List |
|---|
The following is a list of all the options supported by REC. The options are presented in hierarchical order; that is, some options are meaningful only if their parent option has been enabled.
- +/-help
  This option prints the list of all the options and their current values on the standard output, and then exits REC.
- +/-interactive
  Enables/disables interactive mode. In interactive mode, no output file is generated. However, you can see internal information (such as the label list, the branch list, the string list, etc.), invoke an interactive hexdump, and decompile individual procedures in any order.
- +/-silent
  This option disables the output of trace information during the decompilation process. If this option is disabled, REC prints its current activity on the standard output.
- +/-validatestr
  This option enables the analysis of the input file areas to detect ASCII strings.
- +/-dfoprocs
  This option tells REC to decompile only the procedures that can be reached from the entry point. REC works bottom-up: the deepest procedure (the farthest from the entry point) is decompiled first, and the entry procedure is decompiled last. This allows more accurate acquisition of information such as the number and types of each procedure's parameters.
- +/-locals
  This option enables/disables the conversion of stack and register references into procedure arguments and local variables.
- +/-rdonly
  This option tries to substitute register references when the only assignment to the register is that of a formal parameter.
- +/-simplifyexprs
  When this option is enabled, processor idioms are converted into more regular expressions. For example, an instruction such as "EAX = EAX ^ EAX" is converted into the expression "EAX = 0". This helps the data flow analyzer.
- +/-doblocks
  This option builds the control-flow graph for each procedure. It must be enabled for the data flow analyzer (the +compsets option) to work correctly.
- +/-compsets
  This option enables/disables register lifetime analysis. This analysis greatly helps the following pass eliminate register variables. If it is disabled, most if not all of the generated C expressions will contain many register references.
- +/-compactexprs
  This option enables the elimination of temporary register variables and the creation of complex expressions. It greatly reduces the number of output statements, but increases the complexity of each statement.
  For example, the following instructions:
      EAX = 1;
      EAX = EAX + *EBX;
      PUSH(EAX);
      CALL 0x1000
  can be compacted into the following expression:
      L1000(1 + *EBX);
- +/-types
  This option enables type detection for variables. Type detection is only partially implemented at this time.
- +/-compactifs
  This option converts sequences of compare and conditional-branch instructions into if-goto-else-goto statements. This is the first stage where actual C code can be produced; the following stages only try to structure the output better by using more complex C statements.
- +/-displaylabels
  When this option is enabled, labels are always printed in the output, even if there is no goto statement to that address. This is useful to compare the C output with the disassembler output.
- +/-dostmts
  This option enables/disables the structuring of the output into more complex C statements. Each type of C statement can also be enabled or disabled individually.
- +/-donullgotos
  This option enables REC to remove goto statements that jump to the next (sequential) statement.
- +/-doifs
  While statements are being built, the if statement is always represented as a sequence of if-goto-else-goto. This representation simplifies moving if statements around. When this option is enabled, REC tries to remove the goto statements inside the true and false blocks by substituting code from the destination of the goto statement, or by removing the else part altogether.
- +/-doloops
  When this option is enabled, loop analysis is used to replace if-goto statements with while or do-while statements.
- +/-dowhile
  Enables/disables while statement detection.
- +/-dofor
  When this option is enabled, REC tries to compact while or do-while statements into a single for statement.
- +/-dopackloops
  This option enables the rewriting of endless while loops into do-while loops, when an if statement at the end of the while block would cause the loop to either continue or end.
- +/-dopackstmt
  This option enables REC to compact statements, primarily if statements. For example, boolean && and || conditions are used to merge two consecutive if statements. Also, the conditional operator (? :) is created when there is a sequence such as if(e1) v = e2; else v = e3;. This option can produce very good-looking output.
- +/-doswitch
  Enables/disables switch statement detection.
- +/-dosort
  This option tries to reduce the nesting depth of conditional statements by rearranging compound statement blocks in the output.
- +/-flag16
  This option forces the i386 disassembler to work in real mode (16-bit mode) instead of the default protected (32-bit) mode. This option is only valid when decompiling x86 files.
- +/-int16
  This option specifies that integers are 16 bits instead of 32 (the default). This is useful for older targets like the 8086 or the Macintosh's 68000.

I might add more options as I add other features. Note that although REC is able to read ELF, A.OUT and Windows PE COFF files, the MIPS version only reads generic binary files. If the executable file has an internal structure (such as a file header and separate areas for code and data), the user must provide the layout and memory allocation information. This is because the only MIPS files I can use as examples are raw binary files.
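The bottom-up order used by +dfoprocs (callees before their callers, the entry procedure last, unreachable procedures skipped) can be obtained with a depth-first post-order traversal of the call graph, starting at the entry point. The text does not say whether REC uses exactly this method; the adjacency-matrix callgraph, MAX_PROCS and bottom_up_order below are assumptions for illustration.

```c
#include <stddef.h>

#define MAX_PROCS 64

/* Hypothetical call graph: calls[i][j] != 0 means procedure i calls j. */
struct callgraph {
    size_t nprocs;
    unsigned char calls[MAX_PROCS][MAX_PROCS];
};

/* Depth-first post-order: each procedure is emitted only after all the
   procedures it calls (that were not already visited) have been emitted.
   The seen[] array stops recursion on recursive call cycles. */
static void visit(const struct callgraph *g, size_t p,
                  unsigned char *seen, size_t *order, size_t *n)
{
    size_t q;

    seen[p] = 1;
    for (q = 0; q < g->nprocs; q++)
        if (g->calls[p][q] && !seen[q])
            visit(g, q, seen, order, n);
    order[(*n)++] = p;
}

/* Fill order[] with the procedures reachable from entry, callees first and
   entry last. Unreachable procedures are left out. Returns the count. */
size_t bottom_up_order(const struct callgraph *g, size_t entry, size_t *order)
{
    unsigned char seen[MAX_PROCS] = { 0 };
    size_t n = 0;

    if (g->nprocs <= MAX_PROCS && entry < g->nprocs)
        visit(g, entry, seen, order, &n);
    return n;
}
```

Decompiling in this order means that, when a caller is processed, the parameter counts and types of its callees have already been analyzed.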
Things that I still need to add (I'm working on them in my spare time):
Copyright (C) 1997 - 1998 Backer Street Software -- All rights reserved.
Last revised on 1 Feb. 1998