CLASS 26

Linking and Relocation

We have to deal with linking and relocation because we want separate compilation.

Separate compilation causes two problems: linking and relocation.

How is this handled in UNIX?

The program that takes care of linking and relocation is called (variously) the linker, the loader, or the binder. In UNIX it's called the loader, and its name is ld. In all of your compilations to date, ld has been called for you by gcc automatically. (If you want to see what programs gcc is calling, use the -v option.)

In addition to linking together all of the files you have assembled, gcc directs that your files be linked with certain predetermined libraries. What is a library? A library is a collection of compiled object code, set up so that the linker can "pick and choose" from it. When the linker finds unresolved external symbols in your code, it looks in the libraries to see whether those symbols are defined there. If so, it copies the associated subroutine code into your program.
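The "pick and choose" step can be sketched in a few lines of Python. This is only an illustration, not how ld is implemented: the library contents, the member names such as printf.o, and the function pick_from_library are all made up for the example.

```python
# Hypothetical library index: symbol name -> object member that defines it.
LIBRARY = {"printf": "printf.o", "strlen": "strlen.o", "sqrt": "sqrt.o"}

def pick_from_library(unresolved, library):
    """Return (members to copy into the program, symbols still unresolved)."""
    members, missing = [], []
    for symbol in unresolved:
        member = library.get(symbol)
        if member is None:
            missing.append(symbol)      # not defined in this library
        elif member not in members:
            members.append(member)      # copy each member only once
    return members, missing

members, missing = pick_from_library(["printf", "my_helper"], LIBRARY)
print(members)
print(missing)
```

Only the members that are actually needed get copied in; the rest of the library stays out of your program.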

What is linking and relocation?

Linking is the process of finding addresses for all the symbols used by your program. Relocation is the process of modifying addresses that must change because many files are being combined into one.

The assembler always assembles each file as if it started at memory location zero. When files are combined into one program, they are placed one after the other. So when two files are combined into one program, they can't both start at zero; at least one has to be changed, since it will start after the other.
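As a rough sketch of "one after the other", the Python below gives each file a starting address equal to the total size of the files placed before it. The file names and sizes are assumptions for the example, and assign_bases is a hypothetical helper, not part of any real tool.

```python
def assign_bases(files):
    """files: list of (name, size) in link order. Returns {name: start address}."""
    bases = {}
    next_free = 0
    for name, size in files:
        bases[name] = next_free   # this file starts where the previous one ended
        next_free += size
    return bases

# Assumed sizes: file B is 1000 bytes long, file A is 600 bytes long.
bases = assign_bases([("B", 1000), ("A", 600)])
print(bases)
```

Only the first file keeps its assembled starting address of zero; every later file gets a nonzero start.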

For linking and relocation purposes, there are THREE KINDS of symbols:

Type 1, local symbols:

        Source file
        +-----------------------+ 
        |                       | 
        | foo: save %sp...      | Location: 10  <-
        |      ... subroutine...|                | 
        |                       |                |
        |                       |                Distance = -490
        |                       |                |
        | main:                 |                |
        |      call foo         | Location: 500 <-
        |                       | 
        +-----------------------+ 

In this case, foo's address can be calculated by the assembler, since it is in the same file as main (the assembler processes the whole file). In addition, the address used in the "call foo" operation doesn't need to be changed, since subroutine calls and branches are PC-relative. Recall that PC-relative means that instead of storing the address of the subroutine or branch target, the instruction stores the distance to that target. In this case, the argument to the call instruction would be the number -490. During execution, the processor adds this distance to the current PC to get the actual address to branch to. This means that a file can be moved around in memory without changing the addresses of subroutines or branch targets.
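The arithmetic can be checked with a small Python sketch. It uses the simplified numbers from the diagram (plain distances between locations) rather than the real SPARC instruction encoding, and both function names are made up for the example.

```python
def call_displacement(call_site, target):
    """Distance stored in the instruction: target minus location of the call."""
    return target - call_site

def branch_target(pc, displacement):
    """What the processor does at run time: add the distance to the PC."""
    return pc + displacement

disp = call_displacement(call_site=500, target=10)
print(disp)

# Move the whole file up by 1000: foo is now at 1010 and the call at 1500.
# The stored distance is unchanged, and it still lands on foo.
print(branch_target(1500, disp))
```

Because both foo and the call moved by the same amount, the stored distance never had to be touched.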

Note that in the file, after assembly, the symbol "foo" is completely gone.

Type 2, local, position-dependent symbols:

        Source file A
        +-----------------------+ 
        |                       | Location: 0
        |                       | 
        | foo: .word 7          | Location: 10
        |                       | 
        |                       | 
        |                       | 
        | main:                 | 
        |  sethi %hi(foo), %l0  |
        |                       | 
        +-----------------------+ 
The assembler can calculate the address of foo, as before, but now foo is used as a 32-bit absolute address whose value is based on the assumption that the file starts at memory location 0. During assembly, the address the assembler uses for foo in the sethi operation is 10. (Strictly speaking, sethi holds only the upper 22 bits of that address, but the idea is the same.) However, when this file is linked with another, it will probably not start at 0:


File B  +-----------------------+ 
        |      blah             | Location: 0
        |      blah             | 
        |      blah             | 
        |                       | 
        |                       | 
        |                       | 
        |                       | 
        |                       |
        |                       | 
        +-----------------------+ Location: 1000
File A  +-----------------------+ 
        |                       | 
        | foo: .word 7          | Location: 1010
        |                       | 
        |                       | 
        |                       | 
        |                       | 
        | main:                 | 
        |  sethi %hi(foo), %l0  |
        |                       | 
        +-----------------------+ 
So during linking, the linker must change the value "10" that was originally assembled as the argument of "sethi" to the value "1010". The basic operation the linker must do is add the new location of the file (here, 1000) to each Type 2 address.
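Here is a minimal Python sketch of that "add the new location" operation. It assumes the object file is modeled as a table from instruction location to the address stored there, plus a list of which locations hold Type 2 addresses. The location 300 for the sethi instruction is made up, since the diagram does not give one, and relocate is a hypothetical name.

```python
def relocate(address_fields, reloc_locations, base):
    """Add the file's new starting address to every Type 2 address field."""
    patched = dict(address_fields)       # leave the original table untouched
    for location in reloc_locations:
        patched[location] += base
    return patched

# File A as assembled: the sethi at (assumed) location 300 refers to address 10.
file_a = {300: 10}
patched = relocate(file_a, reloc_locations=[300], base=1000)
print(patched)
```

The linker needs the list of locations because nothing in the assembled bits themselves says which numbers are addresses and which are ordinary constants.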

Type 3, external symbols:

        Source file A
        +-----------------------+ 
        |                       |
        |                       | 
        |                       |
        |                       | 
        |                       | 
        |                       | 
        | main:                 | 
        |    call printf        |  Location 200
        |                       | 
        +-----------------------+ 
In this case, the assembler can't find printf anywhere in the source file. So, printf is added to the Unresolved References table, which is kept at the end of each object file:
        Unresolved Reference    At Location
        --------------------    ---------------
        printf                  200
This Unresolved Reference will be resolved by the linker.
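A sketch of how that resolution might go, in Python. It assumes the locations are already final addresses, that the linker has found printf at address 5000 (a made-up number), and that the call is PC-relative as described for Type 1 symbols. The function resolve is hypothetical.

```python
def resolve(unresolved, definitions):
    """unresolved: list of (symbol, location of the call).
    definitions: {symbol: address} gathered from other files and libraries.
    Returns ({location: PC-relative distance to store}, [symbols not found])."""
    patches, errors = {}, []
    for symbol, location in unresolved:
        if symbol in definitions:
            patches[location] = definitions[symbol] - location
        else:
            errors.append(symbol)
    return patches, errors

patches, errors = resolve([("printf", 200)], {"printf": 5000})
print(patches)
print(errors)
```

Any symbol left in the error list is one the linker could not find anywhere.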

Global Symbols.

Any symbol that is declared .global will be added to the Symbol Table at the end of each object file.

        Source file A
        +-----------------------+ 
        | .global var1          |
        | var1: .word 3         | Location 16
        |                       |
        |                       | 
        |                       | 
        | .global main          | 
        | main:                 |  Location 196
        |    call printf        |  
        |                       | 
        +-----------------------+ 

        Global Symbol           At Location
        -------------           -----------
        main                    196
        var1                    16
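To show how these per-file tables could be combined, the Python sketch below adds each file's starting address to its global symbols and collects them in one table. The second file, its symbol helper, and the starting addresses are assumptions for the example. Treating a duplicate name as an error is a simplification chosen for the sketch.

```python
def merge_symbol_tables(objects):
    """objects: list of (start address, {symbol: location within the file}).
    Returns {symbol: final address} for the combined program."""
    merged = {}
    for base, table in objects:
        for symbol, location in table.items():
            if symbol in merged:
                raise ValueError("duplicate global symbol: " + symbol)
            merged[symbol] = base + location
    return merged

file_b = (0, {"helper": 40})                  # hypothetical file placed first
file_a = (1000, {"main": 196, "var1": 16})    # table from the notes, moved to 1000
symbols = merge_symbol_tables([file_b, file_a])
print(symbols)
```

The merged table is what lets an unresolved reference in one file be matched with a global symbol defined in another.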

Actions of the Linker (ld)

Here is a summary, then, of the actions that the linker takes. The linker writes out the result as a file. The file includes information for the OS about how long the text, data, and bss segments are, so the OS can set aside the necessary memory when the program starts running. You can get this information using the "size" command.
% size ci
78445 + 976 + 3280 = 82701
text  + data + bss = total size in memory
If unresolved references still exist, the linker does not make the program file executable, and signals an error.
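The last step can be sketched in Python as well: add up the three segment sizes for the OS, and refuse to finish if anything is still unresolved. The function finish_link and its return format are invented for illustration; the segment sizes are the ones from the size output above.

```python
def finish_link(text, data, bss, unresolved):
    """Return the segment sizes for the OS, or fail if symbols are unresolved."""
    if unresolved:
        raise RuntimeError("unresolved references: " + ", ".join(unresolved))
    return {"text": text, "data": data, "bss": bss, "total": text + data + bss}

info = finish_link(78445, 976, 3280, unresolved=[])
print(info["total"])
```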


For more information, contact me at tvohra@mtu.edu