Native Android Runtime Emulation
================================


The problem
-----------

I have an Android app to reverse engineer. Its main functionality is built on
top of JNI, i.e. it lives in a native shared library. This is not something I
want to spend all my effort on, so I started to look for a shortcut. It would
save me a lot of time if the .so could be loaded and debugged at runtime.


The official Android emulator
-----------------------------

The official Android emulator consists of a QEMU-based emulator, the Android
runtime, and a toolset to deploy, run and debug native executables built with
the Android NDK. However, it turns out to be handicapped.
The gdbserver delivered with the NDK is basically broken. It fails to recognize
many Thumb-2 instructions and consequently sends SIGILL, terminating the
program.


The first approach: QEMU
------------------------

Since the Android emulator is essentially a QEMU image, it is tempting to
load Android executables on an ARMv7-A compatible QEMU image where a complete
native toolchain is available. The effort was in vain. Google uses a different
linker and loader, together with a different libc (Bionic) and libstdc++, none
of which are found in any other Linux distro. The consequence is not only
missing symbols: the Linux loader crashes while loading the library.
One solution is to port Bionic to Linux. But I'm afraid that won't be enough:
the .so is bound to the Android linker/loader, which suggests that the Android
loader must be ported to Linux as well. That is likely to become a nightmare
considering the size of the code and, even worse, Google's apparent strategy
of differentiating and isolating Android from other Linux distros.


Building my own emulator
------------------------

An emulator built from scratch emerged as the final solution. After all, it is
always more interesting to build something from scratch than to fix something
broken.
To provide an emulated environment for an Android native shared library,
several layers of services are required:

* Disassembler, which translates binary to instructions.
* Simulator, which executes the instructions and maintains the processor states.
* Loader, which parses the ELF and does what a loader is supposed to do.
* Memory management, which supervises access to memory.
* C runtime, which implements the C library.
* Facilities for user extension and debugging.

Each layer is described briefly below.

### Disassembler

Capstone (libcapstone) is a disassembly library. Its API is very handy, and it
is evolving rapidly, so it was an easy choice for the disassembler.
Disassembly is done on demand, i.e. only the instruction at the current
address is disassembled. This is because it is slow and difficult to scan the
whole .text section beforehand. The difficulty arises from PC-relative data
fetching, which makes the .text section a mixture of code and data.
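
As an illustration of decoding only at the current address, the sketch below
determines how wide the Thumb instruction at a given address is, which is the
first thing an on-demand decoder has to know before it can advance the PC. It
is not the emulator's code and does not use Capstone (which handles this
internally); the function names are hypothetical. It relies on the Thumb-2
encoding rule that a first halfword whose top five bits are 0b11101, 0b11110
or 0b11111 starts a 32-bit instruction.

```c
#include <stdint.h>

/* Read a little-endian 16-bit halfword from the loaded image. */
static uint16_t read_u16(const uint8_t *image, uint32_t addr)
{
    return (uint16_t)(image[addr] | (image[addr + 1] << 8));
}

/* Return the size in bytes (2 or 4) of the Thumb instruction at addr.
 * Only the halfword at addr is inspected; nothing else in .text is
 * scanned, so literal pools elsewhere in the section are never touched. */
static unsigned thumb_insn_size(const uint8_t *image, uint32_t addr)
{
    uint16_t hw = read_u16(image, addr);
    unsigned top5 = hw >> 11;

    return (top5 == 0x1d || top5 == 0x1e || top5 == 0x1f) ? 4 : 2;
}
```

For example, `bx lr` (0x4770) is reported as 2 bytes wide, while the first
halfword of a `bl` (0xf000) is reported as the start of a 4-byte instruction.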

### Simulator

The simulator keeps the state of the processor and executes the instructions
in a controlled manner. The processor state is defined by a set of registers
and nothing more. The simulator must know how to update the register values
according to the result of each executed instruction. The ARM architecture
manual is the only reference needed while programming it.
Multi-thread support adds a little overhead to the simulator, because each
thread has its own set of registers. The simulator needs to switch register
sets when switching to another thread. Likewise, each thread has its own
isolated stack. Stack switching is handled in the memory management layer.
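
The register-set switch can be sketched as follows. This is a minimal
illustration under assumptions, not the emulator's actual data structures:
the names `cpu_state`, `sim` and `sim_switch_thread`, and the fixed thread
limit, are all hypothetical.

```c
#include <stdint.h>

#define NUM_REGS    16   /* r0-r15; r13 = sp, r14 = lr, r15 = pc */
#define MAX_THREADS 8    /* arbitrary limit for this sketch */

struct cpu_state {
    uint32_t r[NUM_REGS];
    uint32_t cpsr;       /* condition flags and Thumb state bit */
};

struct sim {
    struct cpu_state cur;                 /* registers of the running thread */
    struct cpu_state saved[MAX_THREADS];  /* parked register sets */
    int cur_tid;
};

/* Park the running thread's registers and resume thread `tid`.
 * Returns 0 on success, -1 if `tid` is out of range. */
static int sim_switch_thread(struct sim *s, int tid)
{
    if (tid < 0 || tid >= MAX_THREADS)
        return -1;
    s->saved[s->cur_tid] = s->cur;
    s->cur = s->saved[tid];
    s->cur_tid = tid;
    return 0;
}
```

Because the stack pointer is just r13 in the saved set, switching the register
set also switches to the other thread's stack.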

### Loader

Besides the usual tasks of a generic loader, this loader differs in two ways.
First of all, it does not need to fix up all relocation entries, because it
can safely assume the .so is loaded at address 0x0. Secondly, it has no
knowledge of the runtime, so it cannot resolve external symbols by itself. The
solution is to expose the unresolved symbols to the user, who provides the
location and size of the missing symbols to the loader. Note that not all
missing symbols are needed: only those that are accessed at runtime are
necessary.
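
A user-supplied symbol table of this kind might look like the sketch below.
The structure and function names (`ext_sym`, `sym_provide`, `sym_resolve`) and
the fixed capacity are assumptions made for illustration; the real loader's
interface may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_EXT_SYMS 32   /* arbitrary capacity for this sketch */

struct ext_sym {
    const char *name;
    uint32_t addr;   /* guest address supplied by the user */
    uint32_t size;   /* size of the object in bytes */
};

struct sym_table {
    struct ext_sym syms[MAX_EXT_SYMS];
    size_t count;
};

/* Register the location and size of a symbol the .so does not define. */
static int sym_provide(struct sym_table *t, const char *name,
                       uint32_t addr, uint32_t size)
{
    if (t->count == MAX_EXT_SYMS)
        return -1;
    t->syms[t->count].name = name;
    t->syms[t->count].addr = addr;
    t->syms[t->count].size = size;
    t->count++;
    return 0;
}

/* Look a symbol up by name. NULL means it is still unresolved, which
 * only matters if the executed code actually touches it. */
static const struct ext_sym *sym_resolve(const struct sym_table *t,
                                         const char *name)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->syms[i].name, name) == 0)
            return &t->syms[i];
    return NULL;
}
```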

### Memory management

The memory management layer defines a few things: where the ELF is loaded,
where the heap and stack are located, and how these regions are accessed.
The memory map is as follows:

        +-------------+ ---> STACK END
        |    Stack    | |  Each thread has its own stack
        |             | v
        +-------------+ ---> STACK START
        |    Heap     | ^  Heap is shared between threads
        |             | |
        +-------------+ ---> HEAP START
        |             |
        |   .bss      |
        |   .text     |
        |   .data     |
        |   .rodata   |
        |   .got      |
        |    ...      |
        |             |
        +-------------+ ---> elf load addr: 0x0

All memory references must be tapped and controlled. This is not only because
of the difference between 32-bit and 64-bit architectures, but also because
the memory references are relative to the load address of the ELF, which is
defined to be 0x0. This restriction actually brings a benefit: tapping memory
accesses becomes straightforward. The user can easily set a watchpoint on any
address and on any value.
Also note that the heap region is shared between threads, while each stack is
privately owned by its thread.
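
A minimal sketch of such a tapped access path is shown below. It assumes a
single flat buffer in which guest address 0x0 maps to offset 0, and it counts
stores that touch a watched address instead of trapping. All names and sizes
here are hypothetical and only illustrate the idea.

```c
#include <stdint.h>

#define GUEST_MEM_SIZE 0x10000u   /* size of the flat guest memory in this sketch */
#define MAX_WATCH      4

struct guest_mem {
    uint8_t  bytes[GUEST_MEM_SIZE];  /* guest address 0x0 maps to bytes[0] */
    uint32_t watch[MAX_WATCH];       /* watched guest addresses */
    int      nwatch;
    int      hits;                   /* number of watched stores seen */
};

/* Register a store watchpoint on a guest address. */
static int mem_watch(struct guest_mem *m, uint32_t addr)
{
    if (m->nwatch == MAX_WATCH)
        return -1;
    m->watch[m->nwatch++] = addr;
    return 0;
}

/* Store a 32-bit little-endian word; returns -1 if out of range. */
static int mem_store32(struct guest_mem *m, uint32_t addr, uint32_t val)
{
    if (addr > GUEST_MEM_SIZE - 4)
        return -1;
    for (int i = 0; i < m->nwatch; i++)
        if (m->watch[i] >= addr && m->watch[i] - addr < 4)
            m->hits++;               /* a real emulator would trap here */
    for (int i = 0; i < 4; i++)
        m->bytes[addr + i] = (uint8_t)(val >> (8 * i));
    return 0;
}

/* Load a 32-bit little-endian word; returns -1 if out of range. */
static int mem_load32(const struct guest_mem *m, uint32_t addr, uint32_t *out)
{
    if (addr > GUEST_MEM_SIZE - 4)
        return -1;
    *out = 0;
    for (int i = 0; i < 4; i++)
        *out |= (uint32_t)m->bytes[addr + i] << (8 * i);
    return 0;
}
```

Because every access goes through these functions, guest addresses never have
to fit the host pointer width, and the byte order is handled explicitly.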

### C runtime

The missing PLT entries in the .so mainly come from the C runtime. The
commonly used function families such as string, stdlib, pthread, and socket
are provided. The loader is aware of these functions and cooperates with the
simulator to redirect the calls to stub implementations, which are built on
the local (x86) C runtime.
Arguments and return values need to be translated manually, but this is easier
than it sounds. For instance, sizeof(time_t) differs between 32-bit and 64-bit
platforms, so the stub for gettimeofday(2) has to take care of the
translation.
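
The gettimeofday(2) case can be sketched like this. The sketch assumes that
the 32-bit ARM guest sees `struct timeval` as two 32-bit fields and that the
host is a 64-bit POSIX system; `guest_timeval`, `timeval_to_guest` and
`stub_gettimeofday` are hypothetical names, not the emulator's own.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* struct timeval as assumed for the 32-bit guest: two 32-bit fields. */
struct guest_timeval {
    int32_t tv_sec;
    int32_t tv_usec;
};

/* Narrow a host timeval into the guest layout. */
static void timeval_to_guest(const struct timeval *host,
                             struct guest_timeval *guest)
{
    guest->tv_sec  = (int32_t)host->tv_sec;   /* does not fit after 2038 */
    guest->tv_usec = (int32_t)host->tv_usec;
}

/* Host-side stub for a guest call to gettimeofday(tv, NULL).
 * guest_tv points at the host buffer backing the guest's struct. */
static int stub_gettimeofday(struct guest_timeval *guest_tv)
{
    struct timeval host;
    int ret = gettimeofday(&host, NULL);

    if (ret == 0 && guest_tv != NULL)
        timeval_to_guest(&host, guest_tv);
    return ret;
}
```

Copying the host structure byte for byte would write 16 bytes into an 8-byte
guest object on a typical 64-bit host, which is why the stub converts field by
field.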

### Extensions

It is not unusual that some existing subroutines need to be masked or watched.
The loader is aware of those checkpoints as well. The simulator simply asks
the loader to provide an entry when a called routine is not in place. The
loader uses this hook to let the user swap stubs on the fly.
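
One way to picture this is a table that maps guest addresses to host-side
replacement functions and can be updated while the program runs. The sketch
below is an assumption about how such a table could look; `stub_fn`,
`stub_set`, `stub_lookup` and the two example stubs are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STUBS 16   /* arbitrary capacity for this sketch */

/* A host-side replacement for a guest routine: gets r0-r3, returns r0. */
typedef uint32_t (*stub_fn)(const uint32_t args[4]);

struct stub_entry {
    uint32_t addr;   /* guest address of the routine being masked */
    stub_fn  fn;
};

struct stub_table {
    struct stub_entry e[MAX_STUBS];
    size_t n;
};

/* Install a stub for a guest address, or replace the existing one. */
static int stub_set(struct stub_table *t, uint32_t addr, stub_fn fn)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->e[i].addr == addr) {
            t->e[i].fn = fn;         /* swap on the fly */
            return 0;
        }
    }
    if (t->n == MAX_STUBS)
        return -1;
    t->e[t->n].addr = addr;
    t->e[t->n].fn = fn;
    t->n++;
    return 0;
}

/* Called when the simulator branches to addr; NULL means "run guest code". */
static stub_fn stub_lookup(const struct stub_table *t, uint32_t addr)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->e[i].addr == addr)
            return t->e[i].fn;
    return NULL;
}

/* Two example stubs a user might install. */
static uint32_t stub_add(const uint32_t args[4])  { return args[0] + args[1]; }
static uint32_t stub_zero(const uint32_t args[4]) { (void)args; return 0; }
```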


The result
----------

With this emulator, the user can start from any address or call any subroutine
with any arguments. All calls to libc are adapted to the local C runtime.
Execution can be stopped on a given memory access and/or .text address. The
whole procedure can be monitored from a local gdb session.

### Hello-World

A hello-world example is provided with libtwolib-second.so, which is built
from the official Android NDK sample two-libs. The subroutine "first" can be
invoked with the code below:

        FILE *fp;
        struct armld ld;
        struct elf *elf;
        struct vm *vm;
        int ret;
        uint32_t args[4] = {0};

        if (!(fp = fopen(argv[1], "rb"))) {
            printf("Cannot open file [%s]\n", argv[1]);
            return -1;
        }

        vm = vm_init();
        elf = elf_load(vm, fp);
        vm_set_elf(vm, elf);
        fclose(fp);

        ld.vm = vm;
        ld.elf = elf;

        setbuf(stdout,NULL);
        setbuf(stderr,NULL);

        /* Will trigger a trap exception, which
         * lets a gdb session break at this address.
         */
        //exec_set_breakpoint(0xc62);

        /* Watch any store operation to memory address 0x79490 */
        //vm_set_mem_watch(0x79490, 1);

        /* Set up the arguments to the subroutine.
         * The number of arguments can be as many as needed.
         */
        args[0] = 1;
        args[1] = 2;

        /* 0xc60 is the entry to the subroutine.
         * The address can be read from readelf -s.
         *
         * The 3rd param is the halt address. When it
         * is set to 0, the execution will return
         * when current subroutine returns.
         *
         * The return value of the called subroutine
         * is the return value of arm_exec.
         */
        ret = arm_exec(&ld, 0xc60, 0, args, 4);

        /* It should be 3, which is the result of 1 + 2.
         * Meanwhile, the message printed by the library
         * will be visible on the console.
         */
        printf("return: %d\n", ret);

The program produces the following output:

        $ ./armexec ../libtwolib-second.so
        First is called with 1 + 2
        return: 3
        $


### GDB script

The dbg.gdb script provides several commands for printing register values,
setting breakpoints, printing a backtrace, etc.


Roadmap
-------

This is just the first release of the emulator. It has the potential to become
a more sophisticated library. It could support loading more than one .so file,
other architectures, or even loading .so files built for different
architectures.
