CLASS 14

Data Structures

One-dimensional array

Arrays in C and C++ are stored as pointers to the first element of the array, and a list of offsets from that element.

Arrays are laid out in memory linearly. To store an array, we allocate enough space on the stack for all the array's elements. The address of each element is computed from the address of first element and the array index i:

  address_of_first_element + i * size_of_array_element_in_bytes
so a five-element array:
	int ary[5];
would be:
  ary:	ary[0]
  ary+4:	ary[1]
  ary+8:	ary[2]
  ary+12:	ary[3]
  ary+16:	ary[4]
  
To allocate an array on the stack, we treat it like one big variable:
  ary_s = -20
  save %sp, (-64 - 20) & -8, %sp
So address of element 0 is %fp + ary_s
Address of element 1 is %fp + ary_s + 4
To load ary[2] into %o0:
	add	%fp, 8, %o1
	ld 	[%o1 + ary_s], %o0
To load ary[i] into %o0, if i were stored in %i_r:
	/* multiply i by 4 */
	sll 	%i_r, 2, %o1
	add	%fp, %o1, %o1
	ld	[%o1 + ary_s], %o0

Example of Array allocation:

int a, b, c[3];
char d[3];
short e[2];

Corresponding to the C declarations, the offsets on the stack will be defined as follows:

define(a_s, -4)
define(b_s, -8)
define(c_s, -20)
define(d_s, -23)
define(e_s, -28)
Now, the initial declaration will be:
save %sp, ((-64 - 28) & -8), %sp 
==> save %sp, -96, %sp
And the variables can be loaded into the registers as:
ld   [%fp + a_s], %o0            ! a --> o0
ldsb [%fp + c_s], %o1            ! c[0]  --> o1
ldsb [%fp + c_s + 4], %o2,       ! c[0]  --> o2
ldsh [%fp + d_s], %o3            ! d[0] --> o3
ldsh [%fp + d_s + 4], %o4        ! d[0] --> o4
and so on.......

Another Example of Array allocation

int a[100];
char b[48];
register int x[3];
The sparc assembly code for these definitions is as follows:
define(a_s, -400)
define(b_s, -448)
define(x_r_0, l0 )
define(x_r_1, l1 )
define(x_r_2, l2 )

Hence, the starting instruction of main will be:

save %sp, ((-64 - 448) & -8), %sp
==> save %sp, (-512 & -8), %sp
==> save %sp, -512, %sp

Example: Summing an array

  1	main()
  2	{
  3	  int nums[4] = {1, 45, -16, 23, 62}
  4	  int n = 5;		/* number of elems in array */
  5	  register int i;	/* for index */
  6	  register int sum = 0;	/* holds the sum */
  7
  8       for (i=0; i < n; i++)         
  9       {                             
 10         sum = sum + nums[i];  
 11       }                             
 12  }                                     
This code translates to:
	/* local variables */
	nums_s = -20
	n_s = -24

	/* i is in %l0;  sum is in %l1 */

	.global	main
main:	save	%sp, -88, %sp

	/* line 3 initialization */
	mov	1, %o0
	st	%o0, [%fp + nums_s]

	mov	45, %o0
	st	%o0, [%fp + nums_s + 4]

	mov	-16, %o0
	st	%o0, [%fp + nums_s + 8]

	mov	23, %o0
	st	%o0, [%fp + nums_s + 12]

	mov	62, %o0
	st	%o0, [%fp + nums_s + 16]

	/* line 4 initialization */
	mov 	5, %o0
	st	%o0, [%fp + n_s]
	
	/* line 6 initialization */
	mov	0, %l1

	/* line 8; top of for loop */
	ba	fortest
	mov	0, %l0
for:
	/* line 10; body of the for loop */
	sll	%l0, 2, %o0			! %o0 = i * 4
	add	%fp, %o0, %o0			! %o0 = %fp + i * 4
	ld	[%o0 + nums_s], %o0		! %o0 = nums[i]
	add	%o0, %l1, %l1			! sum = sum + nums[i]

	/* line 11; bottom of the for loop */
	add	1, %l0, %l0			! i++;
fortest:
	ld	[%fp + n_s], %o0		! %o0 = n
	cmp	%l0, %o0			! is (i < n)?
	bl 	for
	nop

	/* line 12; end of program */
	mov	1, %g1
	ta	 0

Pointer and Address arithmetic

Consider the following C declarations:
	int   a[20];
	char  b[20];
	short c[20];
Let's say these are the only declarations in a subroutine, and the value of %fp is 1000. What are the addresses of a[0], b[0], and c[0]?

First, we allocate the items in order. a takes 80 bytes, so its address is 1000-80 = 920. b takes 20 bytes, so its address is 920 - 20 = 900. c takes 40 bytes, so its address is 900 - 40 = 860.

Now, what are the values of:

	&(a[3])	 = 920 + 3 *  4 = 932
	&(b[10]) = 900 + 10 * 1 = 910
	&(c[5])  = 860 + 5 *  2 = 870
So, as before, we multiply the index by the base item size and add it to the array start.

Since this operation is so common, the C language has it built in. Say we added these declarations.

  int   *a_ptr = a;
  char  *b_ptr = b;
  short *c_ptr = c;
Now, each one points at the first element of its array. So
  a_ptr = 920,
  b_ptr = 900, 
  c_ptr = 860.  
That is, *a_ptr = a[0]. To get the second element of the array, no matter what sizethe base item is, just increment the pointer:
  a_ptr++;
  b_ptr++;
  c_ptr++;
Now *a_ptr = a[1], *b_ptr = b[1], *c_ptr = c[1]. Which means that
  a_ptr = 924,
  b_ptr = 901, 
  c_ptr = 862.  
Notice that they have been increased by different amounts, depending on the size of each variable in bytes.

So what is the difference between a and a_ptr? When using them as names for data, nothing. You can refer to a_ptr[20] just as properly as referring to a[20]. However, there is a difference between a and a_ptr. For a, the space has been allocated on the stack by the compiler. For a_ptr, no space has been allocated -- it is up to the programmer to make sure that a_ptr points to a valid piece of memory. If we need to create more space for the pointer to point to during the course of execution of the program, we need to make use of functions like new and malloc to create the space. But is that space on the stack? Well, the answer is no. The space gets allocated on the heap.

For class 15 notes, click here

For more information, contact me at tvohra@mtu.edu