There are many ARM instructions, and we will introduce them over time as we need them for programming projects. For this first project, we need instructions that can load data from main memory into a register, store information from a register to main memory, move data between registers, add data stored in registers, shift data stored in a register, and finally, branch to a “loop” label. These instructions are examined in more detail here. In later projects, we will introduce more instructions. Once you get a feel for ARM instruction format and use, you can refer to the ARM Architecture Reference Manual for complete information on all instructions.
Load and Store Instructions
Data used by a running program must come from an external source (i.e., main memory or a port). Data can be placed into main memory during programming, or it can arrive during run-time via a port. Either way, all data consumed by the ARM processor must be accessed using the external memory bus. Since the ALU can only operate on data stored in a register, we must first “load” data into a register before we can use it. Likewise, ALU results can only be written to a register, so we must “store” results back into main memory in order to free up the register for use by later instructions.
There are several load and store instructions that move data between memory and a register. Some operate on single bytes, some on two consecutive bytes (a halfword), and some on four consecutive bytes (a 32-bit word); some are unprivileged (meaning the access is performed with user-mode permissions even when the processor is running in a privileged mode; more on that later); and some are exclusive, providing shared-memory synchronization. The table below shows the available load and store instructions; we will only use the instructions in the left two columns to load or store words (LDR, STR), halfwords (LDRH, STRH), and/or bytes (LDRB, STRB). You can find more information on the unprivileged and exclusive load and store instructions in the ARM Architecture Reference Manual.
|Data Type||Load||Store||Load Unprivileged||Store Unprivileged||Load-Exclusive||Store-Exclusive|
|32-bit word||LDR||STR||LDRT||STRT||LDREX||STREX|
|16-bit halfword||-||STRH||-||STRHT||-||STREXH|
|16-bit unsigned halfword||LDRH||-||LDRHT||-||LDREXH||-|
|16-bit signed halfword||LDRSH||-||LDRSHT||-||-||-|
|8-bit byte||-||STRB||-||STRBT||-||STREXB|
|8-bit unsigned byte||LDRB||-||LDRBT||-||LDREXB||-|
|8-bit signed byte||LDRSB||-||LDRSBT||-||-||-|
|Two 32-bit words||LDRD||STRD||-||-||-||-|
Load and store instructions can only use memory addresses stored in another register. The basic forms are as follows.
LDR Rn, [Rm] @ Load Rn with the 32-bit memory contents pointed at by Rm
STR Rn, [Rm] @ Store the contents of Rn at the memory location pointed at by Rm
The square brackets around the second register indicate that it contains an address, and that it is acting as a pointer to a memory location. This is the fundamental addressing mode: it uses only the value stored in the “base register” as the address. An immediate offset can also be included (up to 12 bits for word and byte accesses, 8 bits for halfword accesses); the offset is added to or subtracted from the base address to form the address that is accessed. The value in the base register is unchanged.
LDR Rn, [Rm, #0x04] @ Load Rn with the memory contents at Rm + 4
STR Rn, [Rm, #-0x08] @ Store the contents of Rn at the memory location pointed at by Rm - 8
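As a minimal sketch of immediate offsets in use, the fragment below reads two adjacent 32-bit words and writes their sum to a third, all relative to one base register. It assumes R1 already holds the address of a word-aligned block of memory; the register choices and the three-word layout are illustrative and not taken from the project.

```asm
        @ Assumption: R1 holds the address of three consecutive 32-bit words
        LDR   R2, [R1]          @ R2 <- word at address R1
        LDR   R3, [R1, #4]      @ R3 <- word at address R1 + 4
        ADD   R2, R2, R3        @ R2 <- R2 + R3
        STR   R2, [R1, #8]      @ word at address R1 + 8 <- R2; R1 is unchanged
```

Because the base register never changes, the same R1 can be reused for every access to the block.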
An offset stored in another register can also be added to or subtracted from the base address.
LDR Rn, [Rm, -Rt] @ Load Rn with the memory contents at Rm - Rt
STR Rn, [Rm, Rt] @ Store the contents of Rn at the memory location pointed at by Rm + Rt
An offset stored in a register can also be shifted prior to adding to or subtracting from the base register. This addressing mode is useful for accessing arrays – the index can be scaled to the size of the array element.
LDR Rn, [Rm, Rt, LSL #16] @ Load Rn with the memory contents at Rm + (Rt shifted left by 16)
STR Rn, [Rm, Rt, LSR #4] @ Store Rn at the location pointed at by Rm + (Rt shifted right by 4)
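For array access, the shift amount usually matches the element size. The sketch below assumes an array of 32-bit words, so the index is shifted left by 2 (multiplied by 4) to turn an element number into a byte offset. The register assignments are assumptions made for the example.

```asm
        @ Assumptions: R1 = address of element 0 of an array of 32-bit words
        @              R2 = element index (0, 1, 2, ...)
        LDR   R0, [R1, R2, LSL #2]   @ R0 <- array[R2]; address = R1 + (R2 * 4)
        ADD   R0, R0, #1             @ Modify the element
        STR   R0, [R1, R2, LSL #2]   @ array[R2] <- R0; R1 and R2 are unchanged
```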
The load and store instructions presented so far do not change the base register contents. If desired, the base address register can also be automatically updated. Adding a “!” after the closing bracket causes the computed address (base plus offset) to be written back to the base register. This is called a “pre-indexed” access, because the offset is applied to the base address before memory is accessed.
LDR Rn, [Rm, Rt, LSL #16]! @ Load Rn with the memory contents at Rm + (Rt shifted left by 16), then update Rm with the address that was used
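Post-indexed accesses are also available. Here the base register alone supplies the address, and the offset is added to the base register after the access.
LDR Rn, [Rm], #0x04 @ Load Rn from the address in Rm, and then add 4 to Rm afterwards
STR Rn, [Rm], Rt, LSR #0x4 @ Store Rn at the location pointed at by Rm, then add (Rt shifted right by 4) to Rm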
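To contrast the three forms, the sketch below traces the base register through one access of each kind. The starting address 0x1000 is a made-up value chosen to keep the arithmetic easy to follow, and each line should be read independently, as if R1 were reset to 0x1000 before it runs.

```asm
        @ Assumption: R1 = 0x1000 before each instruction (hypothetical address)

        LDR   R0, [R1, #4]      @ Offset only:  R0 <- mem[0x1004]; R1 stays 0x1000
        LDR   R0, [R1, #4]!     @ Pre-indexed:  R0 <- mem[0x1004]; R1 becomes 0x1004
        LDR   R0, [R1], #4      @ Post-indexed: R0 <- mem[0x1000]; R1 becomes 0x1004
```

The pre-indexed and post-indexed forms leave the same value in R1; they differ only in which address is accessed.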
There are also load and store instructions that operate on multiple memory locations. We will examine these later.
A MOV instruction can move data between registers, or from an immediate to a register. An MVN (move NOT) instruction also moves information, but inverts every bit in the process.
MOV Rd, Rs @ Load Rd with the contents of Rs; leave Rs unchanged
MOV Rd, #0xFF @ Store 0xFF in Rd (the immediate is an 8-bit value that can be rotated right by an even number of bits)
MVN Rd, Rs @ Load Rd with the bitwise-inverted contents of Rs; leave Rs unchanged
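In the ARM instruction encoding, a MOV or MVN immediate is stored as an 8-bit value plus a rotation by an even number of bits, so only some 32-bit constants can be written directly. The sketch below shows a few constants that fit this scheme; the values and registers are chosen only for illustration.

```asm
        MOV   R0, #0xFF         @ R0 <- 0x000000FF
        MOV   R1, #0xFF000000   @ R1 <- 0xFF000000 (0xFF rotated into the top byte)
        MVN   R2, #0            @ R2 <- 0xFFFFFFFF (bitwise NOT of 0)
        MVN   R3, #0xFF         @ R3 <- 0xFFFFFF00 (bitwise NOT of 0x000000FF)
```

MVN is a convenient way to reach constants whose inverse fits the 8-bit pattern even though the constant itself does not.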
Move Wide (MOVW) and Move Top (MOVT) instructions each move a 16-bit immediate value into one half of a register.
MOVW Rd, #N @ Move a 16-bit immediate value into the bottom half of Rd, zeroing the top 16 bits of Rd
MOVT Rd, #N @ Move a 16-bit immediate value into the top half of Rd without changing the bottom 16 bits
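Used as a pair, these two instructions can build any 32-bit constant in a register. The sketch below assembles the arbitrary example value 0x12345678; MOVW must come first, because it clears the top half that MOVT then fills in.

```asm
        @ Build the 32-bit constant 0x12345678 in R0 (value chosen for illustration)
        MOVW  R0, #0x5678       @ R0 <- 0x00005678 (top half cleared)
        MOVT  R0, #0x1234       @ R0 <- 0x12345678 (bottom half preserved)
```

The same pattern is a common way to place a full 32-bit memory address in a base register before a load or store.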
There are many other flavors of MOV instructions, to move half words or bytes, to move data into special registers, and to move data to coprocessors. You can get more information about these move instructions from the ARM Architecture Reference Manual starting on page A8-484.
The ADD and ADC (add with carry) instructions add two 32-bit “source” operands (two registers, or a register and an immediate) and place the result in a 32-bit “destination” register (the destination register can be the same as one of the sources).
ADD R0,R1,R2 @ R0 <= R1 + R2; R1 and R2 unchanged; status bits not updated
ADC R0,R1,R2 @ R0 <= R1 + R2 + C; R1 and R2 unchanged; status bits not updated
ADC R3,R2,#0xAB @ R3 <= R2 + 0xAB + C; R2 unchanged; status bits not updated
ADDS R0,R1,R2 @ R0 <= R1 + R2; R1 and R2 unchanged; status bits updated
ADCSNE R0,R1,R2 @ R0 <= R1 + R2 + C *if* Z bit = 0; R1 and R2 unchanged; status bits updated
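One common reason to use ADC is arithmetic on values wider than a register. The sketch below adds two 64-bit numbers held in register pairs: ADDS adds the low words and records any carry in the C flag, and ADC folds that carry into the high words. The register assignments are assumptions made for the example.

```asm
        @ Assumptions: R1:R0 hold one 64-bit value (R1 = upper word, R0 = lower word)
        @              R3:R2 hold the other; the sum is left in R5:R4
        ADDS  R4, R0, R2        @ R4 <- R0 + R2; C flag set if the low words overflow
        ADC   R5, R1, R3        @ R5 <- R1 + R3 + C; carry propagates into the high word
```

Note that the first instruction must be ADDS rather than ADD, since a plain ADD would not update the C flag that ADC reads.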
The ARM processor has no increment instruction. Instead, an immediate ‘1’ is added to a register, and the result is stored back into the same register.
ADD R0,R0,#1 @ R0 <= R0 + 1; status bits not updated
ADDS R0,R0,#1 @ R0 <= R0 + 1; status bits updated
There are also many other related ADD instructions. One source operand can be shifted prior to being added, and/or add instructions can use the SP or PC as the destination. You can get more information about these instructions from the ARM Architecture Reference Manual starting on page A8-300.
The ARM has two kinds of shift instructions: arithmetic shift and logical shift. An arithmetic shift right (ASR) can shift a number by up to 32 bit positions, and the sign bit is copied into all vacated bits. A logical shift can shift right (LSR) by up to 32 bits or left (LSL) by up to 31 bits, and a ‘0’ is placed in all vacated bits (these limits apply to immediate shift amounts). The number of bits to shift can be an immediate operand, or it can be placed in the bottom byte of a register. The source register contents are not changed.
ASR r0,r1,#0x04 @ Arithmetic Shift Right R1 by 4 bits, sign extend, and place result in R0
ASR r0,r1,r2 @ Arithmetic Shift Right R1 by the number of bits in the bottom byte of R2, sign extend, and place result in R0
LSL r0,r1,#0x06 @ Logical Shift Left R1 by 6 bits, 0 fill, place result in R0
LSR r0,r1,r2 @ Logical Shift Right R1 by the number of bits in the bottom byte of R2, 0 fill, place result in R0
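Shifts are often used as cheap multiplication or division by powers of two, and the difference between arithmetic and logical right shifts only shows up on negative numbers. The sketch below illustrates both points with values picked for the example.

```asm
        MOV   R1, #40           @ R1 <- 40
        LSL   R0, R1, #3        @ R0 <- 40 * 8 = 320
        LSR   R0, R1, #2        @ R0 <- 40 / 4 = 10

        MVN   R1, #15           @ R1 <- 0xFFFFFFF0, which is -16 as a signed number
        ASR   R0, R1, #2        @ R0 <- 0xFFFFFFFC (-4): sign bit copied into vacated bits
        LSR   R0, R1, #2        @ R0 <- 0x3FFFFFFC: zeros shifted in, sign is lost
```

For signed data, ASR keeps the result negative; LSR treats the same bit pattern as a large unsigned number.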
A branch instruction (mnemonic B) loads a new value into the PC. It can be used for conditional or unconditional branches (based on the conditional “extended mnemonics”, just like any other instruction). Branch instructions are used for creating loops and if-then-else constructs. Branch and Link instructions (mnemonic BL) are used to call subroutines. BL is identical to B, except that it copies the address of the instruction following the BL into the Link register (R14) before performing the branch. This allows the programmer to copy the link register back into the PC at the end of the subroutine, so that on return, the program will resume executing at the instruction immediately following the one that called it.
B label1 @ Unconditional branch to “label1” – good for “always” loops
BEQ label1 @ Conditional branch to label1 – branch taken only if Z flag set by last instruction
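A subroutine call and return can be sketched as follows. The label `double_it` and the convention of passing a value in R0 are hypothetical, chosen only to show BL saving the return address and the subroutine copying the link register back into the PC.

```asm
        MOV   R0, #5            @ Argument passed in R0 (illustrative convention)
        BL    double_it         @ LR <- address of the next instruction; branch
        ADD   R1, R0, #1        @ Execution resumes here after the return; R1 <- 11
done:   B     done              @ Stop here (endless loop)

double_it:                      @ Hypothetical subroutine: R0 <- R0 * 2
        ADD   R0, R0, R0        @ R0 <- R0 + R0
        MOV   PC, LR            @ Return: copy the link register back into the PC
```

The endless loop at `done` keeps execution from falling through into the subroutine body after the call returns.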
The branch opcode requires only 8 bits (a 4-bit condition field and 4 bits that identify the instruction as B or BL), leaving 24 opcode bits for an immediate operand in the opcode. In this case, the immediate operand is a signed count of 4-byte instructions that is added to (or subtracted from) the current PC. As soon as the new address is calculated and loaded into the PC, the next executed instruction will be at that new address. This means two things: first, branches can only redirect program flow about 2^23 instructions (roughly 32 MB) away from the current PC (not 2^24, because the most significant bit is used as a sign bit so branches can go both forward and backward); and second, we need to know how far away from the current PC to branch.
One of the Assembler’s main duties is to calculate branch addresses from labels. Labels are text strings in assembly source files that are not mnemonics, but are instead included to provide information to the assembler itself. Label text strings start in the leftmost column, and end with a colon. When the Assembler encounters a label, it simply records its location in the source file in a “symbol table”. Any given line in your source file can have a label - they do not change how the program functions in any way (if you wanted to, you could put a label on every line in your source file). The Assembler uses label location information to (among other things) calculate branch offsets.
In the example code below, before the Assembler can create the opcode for the BNE mnemonic, it needs to know what immediate operand to place in the lower 24 bits. It calculates the immediate operand value by finding the difference between the address of the referenced label and the PC value seen by the BNE instruction (in ARM state, the address of the BNE instruction plus 8), expressed as a number of 4-byte words. As discussed earlier, the address of the label is in the symbol table created by the Assembler.
MOV R2,#10 @ R2 <- 10; R2 is loop index
myloop: ADD R1,R1,#1 @ Increment R1
SUBS R2,R2,#1 @ Decrement R2
BNE myloop @ If R2 != 0, branch to myloop
MOV R4,#0xAB @ Continue on with program...
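To make the offset calculation concrete, the listing below repeats the loop with made-up addresses and works out the value the Assembler would place in the BNE opcode. The starting address 0x00100000 is purely an assumption for illustration. The calculation relies on the fact that, in ARM state, the PC read by an instruction is that instruction's address plus 8.

```asm
        MOV   R2, #10           @ at 0x00100000 (hypothetical load address)
myloop: ADD   R1, R1, #1        @ at 0x00100004; symbol table: myloop = 0x00100004
        SUBS  R2, R2, #1        @ at 0x00100008
        BNE   myloop            @ at 0x0010000C; assembles to 0x1AFFFFFC

@ How the BNE immediate is derived:
@   PC seen by BNE   = 0x0010000C + 8            = 0x00100014
@   Byte offset      = 0x00100004 - 0x00100014   = -16
@   Word offset      = -16 / 4                   = -4
@   24-bit immediate = 0xFFFFFC                  (-4 in two's complement)
@   Opcode           = 0x1 (NE condition), 0xA (branch), then 0xFFFFFC
```

Because the offset is relative, the encoded opcode stays the same wherever the program is loaded, as long as the distance between the BNE and the label does not change.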