Local Memory in Subroutines

Allocating and managing dynamic memory on the stack


Many subroutines can perform their function using only registers to store data. But sometimes, they need more storage than can be provided by registers alone, and so they must access main memory. Main memory can be accessed using global data locations stored in static memory, or local data stored in temporary (dynamic) locations in the stack. Static memory is a section of memory that is set aside before a program starts, and it used exclusively for data storage for as long as the program executes. Static memory addresses don’t change, so the main program and any subroutines can access the information stored there. Local memory is allocated on the stack when a subroutine is called. It is available only while the subroutine is running, and it is deallocated when the subroutine ends (so the same memory locations can be used by other subroutines later).

When the stack is used to store local/dynamic variables, a “frame pointer” (FP) is used to keep track of where the variables are located. The FP is just a stack address that is stored in a register, and that address is fixed and valid for as long as the subroutine is running. The ARM calling convention defines R11 as the FP, and the GNU compiler recognizes “FP” as an alias for R11 (so, R11 and FP can be used interchangably). The FP serves a different role than the stack pointer (SP) - the FP always points to the local variables, and its contents are fixed for the duration of the subroutine. The SP always points to the next available stack location, and that can change as the subroutine executes. Local variables can always be accessed with reference to the FP. For example, local variable 1 is located at FP-4, local variable 2 at FP-8, etc. Since the FP identifies where the local variables are located for a given subroutine, it must also be saved on the stack so that if another subroutine is called, it can be restored along with the rest of the context.

The first block of instructions in a typical subroutine serve to preserve context, and set aside room on the stack if local variables are needed. This involves: 1) pushing the link register (LR) onto the stack (in case another/nested subroutine is called); 2) configuring the stack for local variables if they are needed, by pushing the FP to preserve its content, and moving the current SP into the current FP (so the FP points to the location of the first local variable); 3) allocating space for local variables by setting the SP to point to a location just beyond the last local variable; 4) and pushing any general-purpose registers (GPRs) that may be used into the stack. This initial block of instructions is typically referred to as “prologue” code.

When the subroutine completes, the stored register values are popped from the stack, and the dynamic memory used in the stack is deallocated by moving the current FP back into the SP. This “collapses the stack frame”, and any information stored in the deallocated stack locations will be overwritten by future stack operations. This block of instructions is typically found at the very end of a subroutine, and it is known as “epilogue” code.

In the example shown in the figure below, the bottom (or initial location) of a 1Kbyte descending stack is located at address 0x000F0000. The main program calls subroutine #1, and subroutine #1 uses two GPRs and creates five local variables. Then subroutine #1 calls subroutine #2, and subroutine #2 uses three GPRs and creates three local variables. When subroutine #1 is called, the LR and FP are pushed, the SP is moved into the FP, 20 is subtracted from the SP to make room for five local variables, and then R4 and R5 are pushed. Subroutine #1 can access local variable 1 at (FP-4), local variable 2 at (FP-8) and so on. Then, when subroutine #2 is called, the process repeats: the LR and FP are pushed, the SP is moved to the FP, 12 is subtracted from the SP to make room for local variables, then R4-R6 are pushed. Subroutine #2 can access local variable 1 at (FP-4), local variable 2 at (FP-8) and so on.

Figure 1. The frame pointer and stack frames

The following code example shows an example of a subroutine that allocates space for six local variables, and saves five GPRs. In the middle of the code, some examples are given for reading and writing local variables stored on the stack. Make sure you understand the prologue and epilogue code, and make sure you understand how to access local variables.

Example Function Prologue and Epiloge

@make room for 6 words on the stack
@then collapse stack frame
	@function Prologue
	push {fp, lr}		@save old fp and lr
	mov  fp, sp		@set FP for function
	sub  sp, sp, #0x18	@6 words, 4*6=24
	push {r4-r8}		@Optional Save regs
	@end of prologue
	@The code for the body of the subroutine goes here. The example
	@statements below show how local variables are accessed
	ldr r0, [fp,#-4]	@load first local variable to r0
	ldr r1, [fp,#-8]	@load second local variable
	str r0, [fp,#-12]	@store content of r0 to third local variable
	ldr r0, [fp,#-16]	@load the sixth variable to r0
	@function epilogue	
	pop  {r4-r8}		@restore saved regs
	mov  sp,fp		@collapse stack frame
	pop  {fp, lr}		@restore old fp and lr
	bx lr			@return

The code below shows an example of a function that allocates and uses local variables. This function could have been implemented using only registers, but there are many cases where using registers only would be impractical.

@Function that uses local vars to compute (a+c)-b
@	r0:	value a
@	r1: value b
@	r2: value c
@	fp-4:	a
@	fp-8:	b
@	fp-12:	c
	push {fp, lr}		@save old fp and lr
	mov fp, sp		@point fp at start of frame
	sub sp,#12		@make space on stack for 3 32-bit values

	@NOTE: not using non-scratch regs
	@no saved registers on the stack

	@end of function prologue

	str r0, [fp,#-4]	@a = r0
	str r1, [fp,#-8]	@b = r1
	str r2, [fp,#-12]	@c = r2

	ldr r0, [fp,#-4]	@r0 = a
	ldr r1, [fp,#-12]	@r1 = c

	add r0, r1
	str r0, [fp,#-4]	@a = a+c

	ldr r1,[fp,#-8]		@r1 = b

	sub r0, r1		@r0-=b
	str r0, [fp,#-4]	@a = (a+c)-b

	ldr r0, [fp,#-4]	@return value in r0

	@function epiloge
	mov sp, fp		@clear function stack
	pop {fp, lr}		@restore saved regs

	bx lr			@return

As a final example, consider a function to reverse the contents of an array. The entries of the array must be copied into a temporary array before being written back to the original in reverse order. For any reasonably sized array, this requires more memory than regsiters alone can provide. Setting aside static memory for just this temporary buffer would be wasteful, so dynamically allocating space on the stack is the best choice.

@reverses the array of 32-bit values pointed at by r0
@	r0:		pointer to array
@	r1:		size of array
@	fp-4:	original array address
@	fp-8	size of array
@	fp-12:	start of temp array
	push {fp, lr}
	mov fp, sp
	sub sp, sp, #8			@make room for orig addr and size

	@make rooom for size*4 bytes on the stack
	sub sp, sp, r1, lsl #2

	@sp points at lowest address of array
	@have r2 point at start of array
	mov r2,sp

	@save parameters to local values
	str r0, [fp,#-4]
	str r1, [fp,#-8]

		ldr r3, [r0],#4		@get entry from array
		str r3, [r2],#4		@put in temp
		subs r1, r1, #1		@r1--
		bne copy_loop		@r1!=0

	ldr r0, [fp, #-4]		
	ldr r1, [fp, #-8]

	@now r2 points to end of temp
	@r0 points to start of orig again
	@r1 holds size of array

		ldr r3, [r2,#-4]!	@value from temp
		str r3, [r0],#4	@store back to orig
		subs r1,r1,#1		@r1--
		bne rev_loop		@r1!=0

	mov sp,fp
	pop {fp,lr}
	bx lr

Passing Parameters on the stack

When subroutines must receive more input data or produce more output data than can be passed in the four parameter passing registers (R0-R3), data can be transferred using dynamic memory in the stack. As before, local/dynamic memory variables can be accessed using addresses that reference the FP.

To use dynamic memory to pass data into a subroutine, the calling program can simply push data into the stack before a subroutine is called. To return data, the subroutine can place new data in the stack locations that were set aside.

The main program must deallocate the stack memory when the subroutine returns. The example program and subroutine below illustrate the process.

.global main
@ Main program calls a subroutine to add six parameters (R=a+b+c+d+e+f)
@ Four parameters are passed in R0-R3, and two are passed on the stack 
@ Result returned in r0
@ Input parameters are:
@	R0-R3:	parameters a-d (a=1, b=2, c=3, d=4)
@	Parameter e=5 will be located at FP+12
@	Parameter f=6 will be located at FP+8

@ Frist, push stack parameters
mov r0,#5	
push {r0}	@ place parameter e on stack
mov r0,#6
push {r0}	@ place parameter f on stack

@ Next, load register parameters a-d
mov r0,#1	
mov r1,#2
mov r2,#3
mov r3,#4

bl add_six	@ Call subroutine

@ R0 now contains sum (=21)
@ Deallocate stack memory by adding 8 to SP
add sp,sp,#8

@ Enter infinite loop
loop: br loop

The subroutine is shown below. Note that after pushing the FP and LR onto the stack and moving the SP into the FP, the FP points two memory locations below the parameters. The last parameter to be pushed (parameter f) is therefore located at FP+8, the next (parameter e) at FP+12, and so on. When using the stack to pass parameters, keep in mind the LIFO structure - the last parameter pushed on the stack will the the closest to the FP (typically, FP+8).

@ Subroutine to add six parameters (four in R0-R3 and two from stack)

	push {fp, lr}
	mov	fp,sp
	@end prologue

	@add parameters passed in regisers
	add r0,r0,r1
	add r2,r2,r3
	add r0,r0,r2

	@get parameters from stack
	ldr r1, [fp,#8]
	ldr r2, [fp,#12]

	add r1,r2, r1
	add r0, r0, r1

	mov sp,fp
	pop {fp, lr}
	bx lr

The figure below illustrates the stack when used for parameter passing.

Figure 2. Passing parameters using the stack