---------

scanning of procedures for system traps
marking instructions as live or dead - for dead code detection
	- we already do this implicitly as part of the control flow graphing.
	- time to dump the results!
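
a rough sketch of dumping the live/dead marking, assuming the control flow
graph is a plain dict of node -> successors (not the real graph classes) and
that anything unreachable from the entry points counts as dead. live_dead()
and its arguments are made-up names.

```python
def live_dead(cfg, entries):
    """split cfg nodes into (live, dead) by reachability from the entries."""
    # collect every node, including ones that only show up as successors
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    live, stack = set(), list(entries)
    while stack:
        node = stack.pop()
        if node in live:
            continue
        live.add(node)
        stack.extend(cfg.get(node, ()))
    return live, nodes - live
```

indirect jumps/calls are not followed here, so 'dead' really means 'not
statically reachable'.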

can do some work with elf specifically.. does control flow go into places it
should not?  which executable parts of the text segment are dead code?
do they have function prologues/epilogues?  that could indicate
dead procedures.

this is dead code analysis..  basically we construct a graph or list of
instructions and compress dead/live.. we then do graph colouring (red/white)
to indicate live/dead code.  we can do flat based graphing to give an
indication of file layouts.. eg. make the file one big block. then we
colour in different procedures, dead code, headers etc..  if we overlay these
blocks, then we also mark in some conflicting colours. an executable bit
of code cannot be a binary header etc.

each instruction can have a list of attributes associated with it.. at
the end, we simply look for conflicts - this also helps us resolve overlapping
procedures. an instruction cannot have 2 proc attributes.  how about
this -->

instr.p_attr = {}

p_attr['proc'] = Proc
p_attr['code'] = 1
p_attr['data']
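
a minimal sketch of the conflict check, assuming p_attr is a plain dict on
each instruction. Instr, claim() and find_conflicts() are hypothetical names,
only p_attr and its 'proc'/'code'/'data' keys come from the notes above.

```python
class Instr:
    """hypothetical instruction record; only p_attr comes from the notes."""
    def __init__(self, addr):
        self.addr = addr
        self.p_attr = {}

def claim(instr, key, value, conflicts):
    """set an attribute; log a conflict if a different value is already set."""
    old = instr.p_attr.get(key)
    if old is not None and old != value:
        conflicts.append((instr.addr, key, old, value))
        return False
    instr.p_attr[key] = value
    return True

def find_conflicts(instrs):
    """addresses of instructions that are marked as both code and data."""
    return [i.addr for i in instrs
            if i.p_attr.get('code') and i.p_attr.get('data')]
```

two procedures claiming the same instruction show up in the conflicts list,
which is where overlapping procedures would get resolved.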

its also possible to get even crazier.. and not assume any code, at all.
then do stuff like look at entry point etc, and go from there.

file[1-52] = header (ehdr)
file[ehdr.e_phoff-x] = header (phdr)
file[ehdr.e_shoff-x] = header (shdr)
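
a sketch of building that layout, assuming a little-endian ELF32 file (52
byte ehdr) and taking the unknown 'x' above to be entry size * entry count.
header_regions() and overlaps() are made-up helpers, and the ranges are
0-based and half-open rather than the 1-based ranges written above.

```python
import struct

# little-endian ELF32 header: e_ident[16] followed by 13 fields, 52 bytes
EHDR32 = struct.Struct('<16sHHIIIIIHHHHHH')

def header_regions(data):
    """return [(start, end, label)] for the ehdr, phdr table and shdr table."""
    (ident, _type, _machine, _version, _entry, phoff, shoff, _flags,
     ehsize, phentsize, phnum, shentsize, shnum, _shstrndx) = \
        EHDR32.unpack_from(data, 0)
    if ident[:4] != b'\x7fELF':
        raise ValueError('not an elf file')
    regions = [(0, ehsize, 'ehdr')]
    if phnum:
        regions.append((phoff, phoff + phentsize * phnum, 'phdr'))
    if shnum:
        regions.append((shoff, shoff + shentsize * shnum, 'shdr'))
    return regions

def overlaps(regions):
    """label pairs of regions that cover some of the same bytes."""
    out = []
    for i, (s1, e1, n1) in enumerate(regions):
        for s2, e2, n2 in regions[i + 1:]:
            if s1 < e2 and s2 < e1:
                out.append((n1, n2))
    return out
```

code or procedure ranges can be appended to the same list, so a procedure
sitting on top of a header shows up as an overlap.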

^^ then we take full accountability of the binary - though it turns more
into elf/geometry analysis than code analysis *shrug*

and for large binaries.. LOTS of mem required to do this.

but hey.. if we can catch LOTS of malware/virus/parasites/custom bins, why
the hell not? :)

now.. elf hdr and program headers + dynamic segment + data segment etc,
are not meant to have code in them.

this is going to kill memory usage :(

class BinByte:
	p_segm		= tdf		text/data/file
	p_perm		= rwx-		read/write/execute/none
	p_func		= ho		header/object(code/data)

so the ELFBinary will fill in the Binary with its own header information.

now.. for things like shdr's or non loadable segments of a binary.. what
do we do with em?

a simple matching is this

segm"t" - perm"x"
segm"d" - perm"r"
segm"d" - perm"w"
segm"t" - func"o"
segm"t" - func"h"
segm"f" - func"h"
segm"t" - perm"-"
perm"-" - func"h"

soo... if you have for example

segm"d" - perm"x"

^^ then that should flag an error (trying to execute code in the data segment)

perm"-" - func"o"

^^ should also flag (using the elf header for code or data)

this is basically assigning domains or privileges to certain parts of the
binary, and flagging when unallowable states are entered.
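
a small sketch of the matching, with the table copied as-is from the list
above. allowed() is a made-up helper that checks one pair at a time. the
table looks incomplete (eg. there is no perm"x" - func"o" row), so checking
a whole BinByte against it would need more rows first.

```python
# allowed pairings, copied from the list in the notes
ALLOWED = {
    (('segm', 't'), ('perm', 'x')),
    (('segm', 'd'), ('perm', 'r')),
    (('segm', 'd'), ('perm', 'w')),
    (('segm', 't'), ('func', 'o')),
    (('segm', 't'), ('func', 'h')),
    (('segm', 'f'), ('func', 'h')),
    (('segm', 't'), ('perm', '-')),
    (('perm', '-'), ('func', 'h')),
}

def allowed(a, b):
    """true if the pairing of attributes a and b is in the table, either order."""
    return (a, b) in ALLOWED or (b, a) in ALLOWED

# the two examples from the notes that should flag
flagged = [pair for pair in [
    (('segm', 'd'), ('perm', 'x')),   # executing code in the data segment
    (('perm', '-'), ('func', 'o')),   # using the elf header for code or data
] if not allowed(*pair)]
```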

i will implement part of this tomorrow perhaps.

---------

better graphing of the plt

---------

Binary reads in the Dynamic segment to pull in library dependencies
ldg == library dependency graph

these are the graphs we should have for now

library dependency graph --> connecting to the binary also

call graph
cfg's for each proc
bb graphs for each proc

a 'bug' we can check for is more than 1 level of libraries to resolve a
binary's symbols.

so for each external procedure, we store also the library that resolves
it.
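
a sketch of that check, assuming the ldg is a dict of name -> needed
libraries and that we have a dict of library -> exported symbols. both
structures and resolve() are invented for illustration, and the
breadth-first order is an assumption, not necessarily what the runtime
linker does in every case.

```python
from collections import deque

def resolve(sym, binary, ldg, exports):
    """walk the ldg breadth-first from the binary.

    returns (library, depth) for the first library exporting sym, or None.
    depth 1 means a direct dependency of the binary.
    """
    seen = {binary}
    queue = deque((lib, 1) for lib in ldg.get(binary, []))
    while queue:
        lib, depth = queue.popleft()
        if lib in seen:
            continue
        seen.add(lib)
        if sym in exports.get(lib, ()):
            return lib, depth
        queue.extend((dep, depth + 1) for dep in ldg.get(lib, []))
    return None
```

any symbol that comes back with depth > 1 is the 'bug' case, and the library
name is what gets stored against the external procedure.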

---------

for all paths in a procedure.. if it will always cross an exit() libcall,
then the procedure is actually a __attribute__((noreturn)) (gcc terminology).
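
one way to sketch this, assuming the proc's cfg is a dict of block ->
successors, and that we already know which blocks call exit() and which end
in a ret. is_noreturn() and its arguments are hypothetical. a block that
calls exit() and then rets is treated as never reaching the ret.

```python
def is_noreturn(cfg, entry, exit_blocks, ret_blocks):
    """true if no ret block is reachable from entry without crossing exit()."""
    stack, seen = [entry], set()
    while stack:
        block = stack.pop()
        # do not walk past a block that calls exit()
        if block in seen or block in exit_blocks:
            continue
        seen.add(block)
        if block in ret_blocks:
            return False
        stack.extend(cfg.get(block, ()))
    return True
```

a proc that only loops forever also comes out as noreturn here, which seems
right.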

---------

make a graph connectivity subgraph including only certain nodes (eg, libcalls)
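
a sketch of what that could look like, assuming a dict of node -> successors.
project() is a made-up name. an edge a -> b is kept if b can be reached from
a through nodes that are not in the kept set.

```python
def project(graph, keep):
    """connectivity between the kept nodes, skipping over everything else."""
    out = {node: set() for node in keep}
    for src in keep:
        stack, seen = list(graph.get(src, ())), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in keep:
                out[src].add(node)      # stop here, this is a kept node
            else:
                stack.extend(graph.get(node, ()))
    return out
```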

---------

make ProcCall store ProcArgs
add format string bug checks - simple style
	- look for non immediate pushes for the format string arg in printf,
	  verify that they are rodata, and are actually format strings.

^^^ these are implemented in 0.0.21 (no check for .rodata or valid strings)

	  also use this to say what the types of other args are.
naive 1st layer type analysis
	- look for lib calls, and mark arguments according to their expected
	  type.
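
a sketch of the simple format string check, assuming each call is a
(name, args) pair with args as (is_immediate, value) tuples in call order.
that layout and suspicious_calls() are invented here, not what ProcCall /
ProcArgs actually look like. the format arg positions are the usual libc
ones. like 0.0.21, it does not check .rodata or the string contents.

```python
# index of the format string argument for some printf-style libcalls
FMT_ARG = {'printf': 0, 'fprintf': 1, 'sprintf': 1, 'snprintf': 2, 'syslog': 1}

def suspicious_calls(calls):
    """names of calls whose format string arg is not an immediate push."""
    out = []
    for name, args in calls:
        idx = FMT_ARG.get(name)
        if idx is None or idx >= len(args):
            continue                    # not printf-style, or args missing
        is_immediate, _value = args[idx]
        if not is_immediate:
            out.append(name)
    return out
```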

---------

make ELFSym class to not lose data when pushing into Sym. make this the
Sym's value.

---------

use a concept of known and unknown (authoritative) data.  this will be used
for dataflow and pointer analysis.

mov $1,%eax		data location is known, so is value
mov $1,0x05(%ebx)	data location is unknown if ebx is unknown
				value is known

concept of direct memory or registers
	ebx, 0x8048100 is absolute memory

0x05(%ebx) address is the value of ebx + 0x05
			

so.. to continue with data analysis, requires at least some more layers
for data representation.	

i think maybe storing the data dependencies for a 'value' etc is something
like a tree, or stored like an rpn (reverse polish notation) expression.

for example -->

eax + ebx*S + N

has 2 dependencies.. eax, and ebx..

eax + N -->

is dependent on eax..

initially a simple way of doing this is to store a 1 layer level of
dependants..

thus the Var := eax + ebx*S + N

has [ eax, ebx ] as dependants.
<eax + ebx*S + N, pc> is its key.. actually its not.. its a "constructed"
variable..  the expansion of eax + ebx*S + N is the true key as this
represents the explicit address of data.  so if this cannot be expanded,
(because eax, ebx are unknown), then a key cannot be generated..
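
a sketch of the expansion, assuming known register values live in a dict and
addresses are 32 bit. expand() and the 'known' dict are made-up names.

```python
def expand(base, index, scale, disp, known):
    """expand base + index*scale + disp into a concrete address (the key).

    returns None if any dependant (base or index register) is unknown.
    """
    dependants = [reg for reg in (base, index) if reg is not None]
    if any(reg not in known for reg in dependants):
        return None
    addr = disp
    if base is not None:
        addr += known[base]
    if index is not None:
        addr += known[index] * scale
    return addr & 0xffffffff    # wrap like 32 bit address arithmetic
```

so expand('eax', 'ebx', 4, 5, ...) only gives a key once both eax and ebx are
known, and an absolute address always gives one.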

so when we come across an addressing mode for a variable.. we associate the
list of "dependants" with it.. so the address mode

0x05(eax)...  the dependant is eax.  we then have an expansion for that
variable into its key.  which will use eax.  if eax is not known, then we
cannot say we know what this pointer is, nor what it points to.

so basically we have a pointer (eax, *eax + 5).. and the value that it
points to (*eax, *(*eax + 5)).  notice how 'eax' is a pointer here, but
the 'pointer' itself is always known since its a register. but what it points
to is not.  now when we get (%eax), we have to dereference eax.. and get a
value.. if this value is known, then we can reference that, as our real
address/value.. but the new pointer *eax, is now a new pointer... possibly
with an unknown dereferencing.

so.. we must expand IA32.modes to show dependants as a minimum.

%eax		key is eax, doesnt require val
(%eax)		key is eax, requires val
0x05(%eax)	key is eax, requires val

$0x05		key is None, doesnt require anything. val is 0x05

so IA32.modes gets a list of keys/values now.  if a value is present
then there should be no keys.  but both are in because its easier for
regexp parsing atm

when we read an asm argument.. we check the addressing mode as usual.
we then say if its a value, or a key.

if its a value (its immediate), its authoritative, but doesnt have a key.

if its a keyed value.

then the list of keys, are the dependants.  we also have an expression

0x05(%eax,%ebx,4)

keys are 2 and 3

expression is --> 1 + 2 + 3*4

ok.. change to val now.  val # is not present. only the expression :)
so for an immediate, it has no keys. but it has an expression which is
just the index, as normal.
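
a sketch of pulling keys and an expression out of an operand, assuming AT&T
syntax as in the examples above. the regexp and parse_operand() are
illustrative only, not the actual IA32.modes table. the expression is
returned as (disp, base, index, scale), with the immediate value in the disp
slot for a '$' operand.

```python
import re

# disp(base,index,scale) with every part optional, eg 0x05(%eax,%ebx,4)
MEM = re.compile(
    r'^(-?(?:0x[0-9a-fA-F]+|\d+))?\((%\w+)?(?:,(%\w+)(?:,([1248]))?)?\)$')

def parse_operand(op):
    """return (keys, needs_deref, (disp, base, index, scale))."""
    if op.startswith('$'):                  # immediate: no keys, just a value
        return [], False, (int(op[1:], 0), None, None, 1)
    if op.startswith('%'):                  # plain register
        return [op[1:]], False, (0, op[1:], None, 1)
    m = MEM.match(op)
    if not m:
        raise ValueError('unhandled operand: %r' % op)
    disp, base, index, scale = m.groups()
    base = base[1:] if base else None       # strip the leading '%'
    index = index[1:] if index else None
    keys = [reg for reg in (base, index) if reg]
    return keys, True, (int(disp, 0) if disp else 0, base, index,
                        int(scale) if scale else 1)
```

the keys list is the list of dependants, and needs_deref says whether the
operand needs a val looked up through memory.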


---------

use an abstract for a GraphEdge.p_val in flow control, saying if its
inter/intra procedural etc and its linkage type etc (plt etc) - this will make
the basic block graph construction work properly also.

^^ this is done using a ProcCall.. i changed my mind of including calls
in cfgs.. it makes no sense when i think about it now ;-)

use a global symbol table - this will fix up all the symbol updates.
^^ done

parse an instruction into its opcode/operands, addressing modes etc - this
will allow the beginnings of some other analysis.
^^^ done :)

separate the Printer into various subfunctions - this will allow us to do
interactive analysis using a mini interpreter.
