CLASS 24

Static data and segments in SPARC assembly

There are two kinds of global variables: those initialized to zero and those initialized to some other value. These variables and the program text (code) are stored in separate regions of memory called segments. The program text lives in a read-only segment, whereas the two data areas are read/write segments.

The assembler has to be told about these areas using pseudo-ops:

.text  ! for the start of the text segment
.data  ! for the start of the non-zero initialized variables
.bss   ! (block started by symbol) start of the zero-initialized data

All these segments are in low memory, leaving the stack to occupy the high memory areas.

The assembler maintains three location counters, one for each of these three areas in memory.

To the assembler, each segment is a logically separate chunk of memory. The programmer can direct the assembler to assemble into any particular segment; until further notice, all code and data then goes into the selected segment. SPARC assembly uses the pseudo-ops:

 .text 
and
 .data 
to direct the assembler to switch to the named segment.
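
The bookkeeping described above can be modeled in a few lines of C. This is only a sketch of the idea, not how any real assembler is written: the names `struct assembler`, `select_segment`, and `emit` are hypothetical, and sizes are plain byte counts.

```c
#include <stddef.h>

/* Hypothetical model of an assembler's three location counters. */
enum segment { SEG_TEXT, SEG_DATA, SEG_BSS, SEG_COUNT };

struct assembler {
    size_t counter[SEG_COUNT]; /* next free offset in each segment */
    enum segment current;      /* segment chosen by the last pseudo-op */
};

/* Models .text, .data, or .bss: later output goes to segment s. */
void select_segment(struct assembler *a, enum segment s)
{
    a->current = s;
}

/* Models assembling one instruction or datum of `size` bytes.
   Returns the offset at which it was placed in the current segment. */
size_t emit(struct assembler *a, size_t size)
{
    size_t offset = a->counter[a->current];
    a->counter[a->current] += size;
    return offset;
}
```

Switching back to a segment continues from where its counter left off, which is why a source file may alternate between .text and .data freely.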

To set aside memory for use as a variable, we simply specify how much we want:

 .word   - sets aside a single word
 .half   - sets aside a halfword
 .byte   - sets aside a byte

Naturally we want to be able to refer to these locations; for this we use the same labeling mechanism that we use for instructions:

var1:   .word
var2:   .half

These declarations set aside 4 bytes for the variable var1 and two bytes for the variable var2. Some assemblers emit nothing for a directive that has no operand, so when in doubt give an explicit initial value.

If we want to initialize these locations, the assembler can do that:

var1:   .word   17
var2:   .half   22
These declarations set aside 4 bytes for the variable var1, initialized to the value 17, and two bytes for the variable var2, initialized to the value 22. These values are in decimal; the programmer can also use octal or hexadecimal.
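
For comparison, here is a rough C counterpart of these declarations, assuming the fixed-width types from `<stdint.h>`. The variable `var3` and the three `seventeen_*` names are made up for illustration; the last three lines show the same value written in decimal, octal, and hexadecimal using C's notation.

```c
#include <stdint.h>

/* C counterparts of the assembly declarations above. */
int32_t var1 = 17;  /* like  var1: .word 17  (4 bytes) */
int16_t var2 = 22;  /* like  var2: .half 22  (2 bytes) */
int8_t  var3 = 5;   /* hypothetical .byte example (1 byte) */

/* The value 17 in decimal, octal, and hexadecimal. */
int32_t seventeen_dec = 17;
int32_t seventeen_oct = 021;
int32_t seventeen_hex = 0x11;
```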

Note that a declaration like

.word  15 
simply tells the assembler, "instead of putting an instruction here, put the number 15 here."

The assembler constructs the two segments separately, and then writes each one out to the object file (the machine language file). Of course the assembler also has to write some information at the beginning of the file (a file "header") that describes how big each segment is so that other programs can separate out the two segments.

Text segment: starts at location 0x2000. The code does not depend on the actual address at which it is loaded, only on offsets from the starting location (branch and call targets, for example, are encoded as displacements from the current instruction), so relocation is not a problem.

The .global pseudo-op makes a label visible outside the file in which it is defined, so that the function it names can be called from other functions in other parts of the program.

Data segment: holds initialized data, declared using pseudo-ops:

.word 3, 4 * 3, 5  ! three values, each stored as a 32-bit constant

Usually, the data are labelled, so that they can be referenced by the program.

.half ! 2 bytes
.byte ! 1 byte 
.skip  100  ! reserves 100 bytes of space, not initialized to any value

.skip can be used to create global arrays, which can be accessed by all functions, and are not deleted when the scope of one function ends.

arr:  .skip 4 * 100  ! room for 100 integers of 4 bytes each
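
In C terms this is roughly a global array. The sketch below assumes 4-byte integers (`int32_t`); the helper names `store_item` and `load_item` are hypothetical. One difference to keep in mind: C zero-fills a global array, whereas the notes describe .skip space as not initialized to any value.

```c
#include <stdint.h>

/* Like  arr: .skip 4 * 100 : room for 100 four-byte integers. */
int32_t arr[100];

/* Any function can write to the array... */
void store_item(int index, int32_t value)
{
    arr[index] = value;
}

/* ...and the value is still there after that function has returned. */
int32_t load_item(int index)
{
    return arr[index];
}
```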

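Data needs to be aligned in memory: a halfword must start at an address that is a multiple of 2 and a word at a multiple of 4. This is done using the .align pseudo-op.

.align 2  ! place the next item at an address that is a multiple of 2
.align 4  ! place the next item at an address that is a multiple of 4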
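
The effect of .align on a location counter is a round-up to the next multiple of the boundary. A minimal sketch in C, assuming the boundary is a power of two (the function name `align_up` is made up for this example):

```c
#include <stdint.h>

/* Round addr up to the next multiple of boundary.
   Assumes boundary is a power of two (2, 4, 8, ...). */
uint32_t align_up(uint32_t addr, uint32_t boundary)
{
    return (addr + boundary - 1) & ~(boundary - 1);
}
```

An address that is already aligned is left unchanged, so a redundant .align costs no space.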

Finally, there is a third segment that is used for all the data that is initialized to zero. This segment is called bss (block started by symbol) and its pseudo-op is .bss. The bss segment is just like the data segment except that the programmer cannot give bss memory locations initial values; they are assumed to be zero at the beginning of program execution. The reason for separating out the bss segment from the data segment is that the assembler need not write out the contents of the bss segment to the object file. Instead, the assembler only writes out the length of the bss segment. The operating system, when it creates segments at the beginning of program execution, then creates a bss segment of the stated size and initializes it to all zeros.
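
The space saving can be made concrete with a small C sketch. The header layout here is hypothetical (real object-file formats carry more fields); it only assumes that the header records one size per segment, as described earlier.

```c
#include <stddef.h>

/* Hypothetical object-file header: one size field per segment. */
struct obj_header {
    size_t text_size;
    size_t data_size;
    size_t bss_size;
};

/* Bytes stored in the file: the header plus the text and data contents.
   The bss segment contributes only its length field in the header. */
size_t file_bytes(const struct obj_header *h)
{
    return sizeof *h + h->text_size + h->data_size;
}

/* Bytes the three segments occupy in memory once the program is loaded. */
size_t memory_bytes(const struct obj_header *h)
{
    return h->text_size + h->data_size + h->bss_size;
}
```

With a large zero-initialized array, `memory_bytes` grows by the size of the array while `file_bytes` does not change at all.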


Review of static data in C

Static data is useful since subroutines may need to 1) communicate other than via arguments and 2) have access to persistent variables.

As stated above, there are only two kinds of extent available in C: variables may exist during the execution of a single subroutine, or they may exist for the whole duration of the program. However, for variables with whole-program extent, there are three kinds of scope available.

Case 1. If a variable is declared outside of any function, it is given whole-program extent and whole-program scope. That is, it can be referenced from any point, in any file that is compiled into the program:

        File: main.c                    File: other.c
        -------------------             -------------------      
        | int foo;        |             | extern int foo; |    
        | int bar;        |             |                 |
        |                 |             |                 |
        | main()          |             | subr()          |
        | {               |             | {               |
        |  .              |             |  .              |
        |  .              |             |  .              |
        |  .              |             |  foo = 12;      |
        |  bar = 7;       |             |  .              |
        |  .              |             |  .              |
        |  .              |             |  .              |
        | }               |             | }               |
        -------------------             -------------------     

Note that if a variable defined in another file is going to be used, it needs to be declared as extern to tell the compiler that it is defined elsewhere.

Case 2. If a variable is declared outside of any function, but given the keyword static, then it is given whole-program extent, but its scope is restricted to the current file:

        File: main.c                    File: other.c
        -------------------             -------------------      
        | static int foo; |             | extern int bar; |    
        | int bar;        |             |                 |
        |                 |             |                 |
        | main()          |             | subr()          |
        | {               |             | {               |
        |  .              |             |  .              |
        |  .              |             |  /* can't refer |
        |  .              |             |  to foo here */ |
        |  bar = 7;       |             |  .              |
        |  .              |             |  bar = 12;      |
        |  .              |             |  .              |
        | }               |             | }               |
        -------------------             -------------------     

Case 3. If a variable is declared inside a function, with the keyword static, then it is again given whole-program extent, but its scope is restricted to the current subroutine:

        File: main.c                    File: other.c
        -------------------             -------------------      
        |                 |             | extern int bar; |    
        |                 |             |                 |
        |                 |             |                 |
        | main()          |             | subr()          |
        | {               |             | {               |
        |  .              |             |  static int a;  |
        |  .              |             |  .              |
        |  /* can't refer |             |  .              |
        |  to a here */   |             |  .              |
        |  .              |             |  a = 64;        |
        |  .              |             |  .              |
        | }               |             | }               |
        -------------------             |                 |
                                        | subr2()         |
                                        | {               |
                                        |  /* can't refer |
                                        |  to a here */   |
                                        |  .              |
                                        |  .              |
                                        |  .              |
                                        |  .              |
                                        | }               |
                                        -------------------

Note that the effect of adding the keyword static is totally different in cases 2 and 3. When adding static to a variable declared inside a subroutine (which would otherwise be a stack variable), the effect is to lengthen the extent. When adding static to a variable defined outside a subroutine, the effect is to narrow the scope. This is a confusing overloading of the word static and is generally considered to be a mistake in the definition of the C language, but it's too late to change now.
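
Both uses of static fit in one short file. The names below (`call_count`, `next_id`, `calls_so_far`) are made up for this sketch.

```c
/* Case 2: whole-program extent, scope limited to this file. */
static int call_count;

/* Case 3: last_id has whole-program extent, but only next_id() can name it. */
int next_id(void)
{
    static int last_id;  /* zero at program start, keeps its value between calls */
    call_count = call_count + 1;
    last_id = last_id + 1;
    return last_id;
}

/* Another function in the same file can see call_count but not last_id. */
int calls_so_far(void)
{
    return call_count;
}
```

Successive calls to `next_id` return 1, 2, 3, and so on, because `last_id` is not on the stack and survives each return.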

Using static data in assembly

Accessing static data in assembly is a little different from accessing stack data. Remember that for stack variables, we always had the frame pointer (%fp) to use as our "anchor". All variables were located as offsets from the %fp. Also remember that loads and stores are format 3 instructions, which means that the immediate values are 13-bit two's complement values; thus their range is -4096 to 4095.
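
The 13-bit limit is easy to state as a check in C. The function name `fits_simm13` is hypothetical; the bounds are the ones given above.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if value can be encoded in a 13-bit two's complement immediate. */
bool fits_simm13(int32_t value)
{
    return value >= -4096 && value <= 4095;
}
```

Typical stack offsets pass this check, whereas a full 32-bit address such as 0x20000 does not.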

However, for static data, we have no such anchor; we have instead only the full 32-bit address of the data item. Since there is no way to load a 32-bit constant into a register in a single instruction, we must use the sethi instruction to get the job done. Remember, sethi loads the low 22 bits of the argument into the high 22 bits of the destination and clears the low 10 bits. Also, the %hi() function is equivalent to a right shift of 10 places (i.e., it moves the high 22 bits to the low 22 bit positions). And, the %lo() function is equivalent to ANDing the operand with a mask of 0x3ff, which means that only the low 10 bits are left. Here is an example of a subroutine that performs the sum "k = i + j" when all variables are static data.
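
The arithmetic behind %hi(), %lo(), and sethi can be checked in C. This is a sketch of the bit manipulation only; the function names are made up, and addresses are treated as unsigned 32-bit values.

```c
#include <stdint.h>

/* %hi(addr): the high 22 bits, moved down to the low 22 bit positions. */
uint32_t hi22(uint32_t addr)
{
    return addr >> 10;
}

/* %lo(addr): the low 10 bits. */
uint32_t lo10(uint32_t addr)
{
    return addr & 0x3ff;
}

/* sethi: put a 22-bit value in the high 22 bits, clear the low 10 bits. */
uint32_t sethi(uint32_t imm22)
{
    return imm22 << 10;
}

/* Address formed by  sethi %hi(addr), reg  followed by  [reg + %lo(addr)]. */
uint32_t effective_address(uint32_t addr)
{
    return sethi(hi22(addr)) + lo10(addr);
}
```

For any 32-bit address the two pieces add back up to the original value, and the low piece is at most 0x3ff, so it always fits in the 13-bit immediate of a load or store.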

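        .data
i_m:    .word   3                       ! i = 3
j_m:    .word   9                       ! j = 9
k_m:    .word   0                       ! k, receives the sum

        .text
foo:    save    %sp, -64, %sp           ! new register window and stack frame
        sethi   %hi(i_m), %o0           ! high 22 bits of the address of i
        ld      [%o0 + %lo(i_m)], %l1   ! add the low 10 bits and load i
        sethi   %hi(j_m), %o0           ! high 22 bits of the address of j
        ld      [%o0 + %lo(j_m)], %l2   ! load j
        add     %l1, %l2, %l3           ! %l3 = i + j
        sethi   %hi(k_m), %o0           ! high 22 bits of the address of k
        st      %l3, [%o0 + %lo(k_m)]   ! store the sum into k
        ret                             ! return to the caller
        restore                         ! delay slot: restore the caller's window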
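
For reference, a C version of the same subroutine might look like the sketch below. The labels are reused as variable names, and `read_k` is a hypothetical helper added only so that the result can be inspected. The variables are file-scope static because the assembly does not export them with .global; note also that a C compiler would normally place the uninitialized `k_m` in bss rather than in the data segment.

```c
/* Static data: lives for the whole program, not on the stack. */
static int i_m = 3;
static int j_m = 9;
static int k_m;

/* k = i + j, with all three variables being static data. */
void foo(void)
{
    k_m = i_m + j_m;
}

/* Hypothetical helper: returns the current value of k. */
int read_k(void)
{
    return k_m;
}
```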

Make sure you understand the meaning of the following lines of assembly code:

        .data
        .global buf_size
buf_size:       .word   10000

For more information, contact me at tvohra@mtu.edu