Introduction to ARM Assembly Language

Overview of ARM Instructions


A simplified block diagram of the ARM CPU and memory system is shown below. The ARM is a RISC machine that uses a load-store architecture, so all ALU operands must come from registers (or from immediate data carried in the instruction itself), and ALU results must be stored in a register. In the diagram below, note that the only inputs to the ALU come from registers or immediates, and the ALU output can only be stored in a register. Most data processed by the ALU originates in main memory or at a port (here, main memory and ports are called an “external source”, or ES). Since there are only a few registers, programs must constantly “load” operand data from an ES into a register, and then “store” results back to the ES to free up the register for use by later instructions.

All ARM instruction codes (in the 32-bit ARM instruction set) are 32 bits wide, and the ARM memory bus is 32 bits wide. So each instruction occupies one 32-bit memory word (four byte addresses), and one “fetch” can deliver one instruction to the CPU.

The general opcode format is shown below. The bits in the opcode all have particular assignments. The upper four bits are “condition codes”. These four bits are combined with the ALU status bits to determine whether the instruction gets executed, or whether it’s replaced with a “nop” (a nop is a “no operation” code that does not cause any changes to registers or status bits). Embedding condition bits in the opcode and enabling the CPU to use “conditional execution” for instructions is a special feature of the ARM processor, and it will be examined in more detail later.

The three “Op1” bits (bits 27-25) and the single “Op” bit (bit 4) form another four-bit field that defines the instruction class (‘x’ is a don’t care, meaning the bit can be a ‘0’ or a ‘1’, and the ‘-’ symbol means the bit is not used). See the table below.
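The class decoding described above can be sketched in a few lines of Python. This is a minimal, hypothetical helper (`instruction_class` is not part of any ARM tool), assuming the 32-bit ARM instruction set and the class names as we read them from the top-level ARM instruction set encoding table in the ARMv7 Architecture Reference Manual; check the manual before relying on it.

```python
def instruction_class(opcode: int) -> str:
    """Return the top-level class of a 32-bit ARM-state opcode.

    Uses cond (bits 31-28), op1 (bits 27-25) and op (bit 4).
    """
    cond = (opcode >> 28) & 0xF
    op1 = (opcode >> 25) & 0b111
    op = (opcode >> 4) & 0b1
    if cond == 0b1111:
        return "Unconditional instructions"
    if op1 in (0b000, 0b001):                  # op1 = 00x
        return "Data-processing and miscellaneous"
    if op1 == 0b010 or (op1 == 0b011 and op == 0):
        return "Load/store word and unsigned byte"
    if op1 == 0b011:                           # op1 = 011, op = 1
        return "Media instructions"
    if op1 in (0b100, 0b101):                  # op1 = 10x
        return "Branch, branch with link, and block data transfer"
    return "Coprocessor instructions and Supervisor Call"  # op1 = 11x


print(instruction_class(0xE0914002))  # ADDS R4, R1, R2
# Data-processing and miscellaneous
print(instruction_class(0xEA000000))  # a B (branch) instruction
# Branch, branch with link, and block data transfer
```

Only the condition field, op1 and op are examined; everything else in the opcode is ignored at this stage, which is why the same helper works for any instruction.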

The “instruction specific fields” in the opcode vary by instruction, and there is no single definition for these 24 bits that applies to all instructions. Nevertheless, the figure below shows some names for these bit fields that are generally used when instructions need a particular field. For example, the Op field generally contains eight bits (including the four bits discussed above) that are decoded to define particular instructions. But some instructions don’t need all eight bits, so some of those bits are often unused. The S field determines whether the ALU status bits are updated or not, and the Rn, Rd, Rs, and Rm fields generally define operand locations. Many instructions don’t need all four register definitions, so those bit fields are either unused by a particular instruction, or repurposed to provide other control information. Several instructions can also use immediate operands, so the lower bit fields are often repurposed to hold the immediate data.
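Pulling these named fields out of an opcode is just shifting and masking. The sketch below assumes the field positions used by data-processing instructions (S at bit 20, Rn at 19-16, Rd at 15-12, Rs at 11-8, Rm at 3-0); other instruction classes may reuse these bits differently. The helper names `field` and `common_fields` are hypothetical.

```python
def field(opcode: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit opcode as an unsigned integer."""
    width = hi - lo + 1
    return (opcode >> lo) & ((1 << width) - 1)


def common_fields(opcode: int) -> dict:
    """Pull out the commonly named fields, assuming data-processing positions."""
    return {
        "cond": field(opcode, 31, 28),  # condition code
        "S": field(opcode, 20, 20),     # update status bits?
        "Rn": field(opcode, 19, 16),    # first operand register
        "Rd": field(opcode, 15, 12),    # destination register
        "Rs": field(opcode, 11, 8),     # shift-amount register (when used)
        "Rm": field(opcode, 3, 0),      # second operand register
    }


print(common_fields(0xE0914002))  # ADDS R4, R1, R2
# {'cond': 14, 'S': 1, 'Rn': 1, 'Rd': 4, 'Rs': 0, 'Rm': 2}
```

For ADDS R4, R1, R2 the bits labelled Rs are really part of the (unused) shift field, which is why that entry reads 0: a concrete case of a field being repurposed by a particular instruction.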

There really isn’t a strong reason to know the individual opcode for any given instruction, but it is helpful to understand how the opcodes are constructed, and what they control. If you are interested, you can read more about ARM instruction encoding in the ARM Architecture Reference manual starting on page A5-193.

In the block diagram and opcode figures above, the Op and S fields provide inputs to configure the ALU, and sometimes to configure the controller state machine as well (more on that later). The Rd field provides the register address for storing the ALU output (labelled DST_Addr in the block diagram), and the Rn and Rm fields select the input data mux channels. Once the ALU is configured by bits from the Op and S fields, operand data (as selected by the Rn and Rm fields) flows through the muxes and ALU, and the ALU output is fed back to the selected destination register in the register file. The next rising edge of the DST_Clk signal “executes” the instruction by writing the ALU output into that register. Pretty straightforward!
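The datapath just described can be mimicked in software as a rough illustration. This toy model (the function `execute_add` is hypothetical) assumes an unshifted ADD (register) opcode and deliberately ignores the condition field, the S bit, the shift fields, and the special behaviour of R15: the register fields select the two ALU inputs, the ALU adds them modulo 2^32, and the result is written to the register named by Rd.

```python
MASK32 = 0xFFFFFFFF


def execute_add(regs: list, opcode: int) -> None:
    """Toy model of the ALU path for an unshifted ADD (register) opcode.

    Ignores the condition field, the S bit, the shift fields, and any
    special behaviour of R15 (the PC).
    """
    rn = (opcode >> 16) & 0xF  # selects the first ALU input (mux channel)
    rd = (opcode >> 12) & 0xF  # destination address (DST_Addr)
    rm = opcode & 0xF          # selects the second ALU input (mux channel)
    alu_out = (regs[rn] + regs[rm]) & MASK32  # 32-bit ALU result
    regs[rd] = alu_out         # stands in for the write on the DST_Clk edge


regs = [0] * 16
regs[1], regs[2] = 5, 7
execute_add(regs, 0xE0914002)  # ADDS R4, R1, R2
print(regs[4])  # 12
```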

ARM has a relatively manageable set of “basic” instructions like Add, Load Register, Store Register, Multiply, Shift Left, And, Xor, Branch, and so forth. Each of these (perhaps 70) instructions has a unique opcode field, and each uses some (or all) of the other bit fields to define which registers hold the source operands, which register the result should be stored in, whether an immediate operand is included, and other related information. (Again, if an immediate operand is used, it occupies some of the otherwise unused bits in the opcode.) These approximately 70 basic instructions are identified by bit patterns in opcode bits 27-21 and bit 4. That’s 8 bits total, which could encode 256 different instructions (so there is room to add more instructions should ARM ever wish to do so).

As an example, the 32-bit opcode 11100000100100010100000000000010 (0xE0914002 in hexadecimal) instructs the ARM processor to add the contents of R1 and R2, place the result in R4, execute the instruction unconditionally, and update the ALU status bits. When written as a binary number, an opcode is also referred to as “machine code”. Note that to figure out the opcode for this example, I referred to page A8-312 of the ARM ARM (see below).

Rather than force low-level programmers to memorize machine codes, Assembler programs use mnemonics instead. The term “mnemonic” refers to a substitute memorization device that is easier to remember than the more complex item that you’re trying to memorize. For example, SOHCAHTOA is a popular mnemonic device for remembering the trigonometric relationships. In Assembly language we use mnemonics for instruction definitions, for source and destination locations, and in the case of ARM, for identifying conditional execution and whether the ALU status bits should be updated. Mnemonics have a 1-to-1 relationship with the opcodes they represent; they’re just easier to remember. One of the main jobs of an Assembler program is to replace the mnemonics with their “1…0…” opcodes… more on that a bit later as well.

As an example, the 32-bit machine code above has the ARM assembler mnemonic “ADDS R4, R1, R2”, which is clearly more readable and more friendly than the 11100000100100010100000000000010 machine code. The ADD part of the mnemonic results in bits 27-21 being “0000100”; the S appended to ADD causes bit 20 to be a ‘1’, which in turn causes the ADD instruction to update the ALU status bits (see the S bit discussion below); bits 19-16 (Rn) are “0001” to select R1 as the first source register, bits 15-12 (Rd) are “0100” to select R4 as the destination register, and so on. Examine the machine code and the mnemonic, and refer to the figure above to be sure you understand how the opcode is constructed.
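To make the assembler’s job concrete, here is a toy assembler for a handful of three-register data-processing instructions. It is a minimal sketch, not how a real assembler is built: it assumes the register form with no shift (bits 11-4 left at ‘0’), upper-case R0-R15 register names, and only the operations listed in its hypothetical `OPS` table; `assemble` is a made-up name.

```python
import re

# 4-bit condition codes (from the condition code table); no suffix means AL.
COND = {"EQ": 0x0, "NE": 0x1, "CS": 0x2, "CC": 0x3, "MI": 0x4, "PL": 0x5,
        "VS": 0x6, "VC": 0x7, "HI": 0x8, "LS": 0x9, "GE": 0xA, "LT": 0xB,
        "GT": 0xC, "LE": 0xD, "AL": 0xE}

# Bits 24-21 for a few register-form data-processing instructions.
OPS = {"AND": 0b0000, "EOR": 0b0001, "SUB": 0b0010, "RSB": 0b0011,
       "ADD": 0b0100, "ADC": 0b0101, "SBC": 0b0110, "ORR": 0b1100}

PATTERN = re.compile(r"([A-Z]{3})([A-Z]{2})?(S)?\s+R(\d+),\s*R(\d+),\s*R(\d+)")


def assemble(line: str) -> int:
    """Assemble 'OP{cond}{S} Rd, Rn, Rm' (no shift) into a 32-bit opcode."""
    m = PATTERN.fullmatch(line.strip().upper())
    if m is None or m.group(1) not in OPS or (m.group(2) or "AL") not in COND:
        raise ValueError(f"unsupported instruction: {line!r}")
    op, cond, s, rd, rn, rm = m.groups()
    rd, rn, rm = int(rd), int(rn), int(rm)
    if max(rd, rn, rm) > 15:
        raise ValueError("only R0-R15 exist")
    return (COND[cond or "AL"] << 28   # condition field
            | OPS[op] << 21            # bits 27-25 stay 000 (data-processing)
            | (1 if s else 0) << 20    # S bit
            | rn << 16 | rd << 12      # first source, destination
            | rm)                      # second source; bits 11-4 = 0 (no shift)


code = assemble("ADDS R4, R1, R2")
print(f"{code:032b}")  # 11100000100100010100000000000010
print(hex(code))       # 0xe0914002
```

Each piece of the mnemonic maps onto one field: the base name fills bits 24-21, the optional condition suffix fills bits 31-28, the optional S sets bit 20, and the register numbers drop into Rn, Rd and Rm.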

Immediate Operands

No instruction needs all 32 opcode bits just to identify the operation and its registers; in fact, many instructions need fewer than 16. The otherwise unused bits can be “overloaded” or repurposed to hold “immediate” data. In the ARM processor, immediate data is contained within the 32-bit opcode itself. Since the opcode has to be fetched from memory anyway, these immediate operands come along for the ride, and they are “immediately” available. The size of the immediate field depends on the instruction: data-processing instructions, for example, hold an 8-bit value (plus a 4-bit rotation), and branch instructions hold a 24-bit offset.
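As a worked example of packing an immediate into the opcode, the sketch below decodes the 12-bit immediate field (bits 11-0) of a data-processing instruction: an 8-bit value rotated right by twice a 4-bit rotate count. It is modelled on the manual’s immediate-expansion pseudocode but ignores the carry-out side effect; `expand_imm12` is a hypothetical name.

```python
def expand_imm12(imm12: int) -> int:
    """Expand a 12-bit data-processing immediate field (opcode bits 11-0).

    The field holds an 8-bit value (bits 7-0) and a 4-bit rotate count
    (bits 11-8); the value is rotated right by twice the rotate count.
    """
    imm8 = imm12 & 0xFF
    amount = 2 * ((imm12 >> 8) & 0xF)
    return ((imm8 >> amount) | (imm8 << (32 - amount))) & 0xFFFFFFFF


print(hex(expand_imm12(0x0FF)))  # 0xff        (no rotation)
print(hex(expand_imm12(0x4FF)))  # 0xff000000  (0xFF rotated right by 8)
print(hex(expand_imm12(0xF01)))  # 0x4         (rotate right 30 = left 2)
```

The rotation trick lets 12 opcode bits reach many useful 32-bit constants (such as 0xFF000000), though not every 32-bit value can be expressed this way.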

Condition Code Bits

The most-significant four bits of all instructions are “condition codes”. As mentioned earlier, these four bits are combined with the ALU status bits to determine whether the instruction gets executed, or whether it’s replaced with a completely benign “nop” instruction. These four bits are set by the instruction mnemonic. For unconditional execution, the “base” instruction mnemonic is used. For example, ADD R4, R1, R2 will always execute the addition, regardless of the ALU status bits. For conditional execution, mnemonic extensions are used to define the conditions under which the instruction should be executed. For example, to cause an ADD to occur only if the most recent flag-setting instruction did not produce a zero result (that is, the Z flag is clear), the extended mnemonic ADDNE R4, R1, R2 could be used. Adding the “NE” extension changes the 4-bit condition code field from “always execute”, or “1110”, to “execute if not equal”, or “0001”. The table below shows the condition code field, and the two-letter mnemonic extension that can be added to an instruction to cause that condition to be checked.
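Since the condition lives entirely in bits 31-28, turning ADD into ADDNE changes only those four bits. A minimal sketch (the helper `set_condition` and its small code table are hypothetical, covering just three suffixes), starting from 0xE0814002, the encoding of ADD R4, R1, R2:

```python
# A few condition codes from the condition code table (bits 31-28).
CONDITION_CODES = {"EQ": 0b0000, "NE": 0b0001, "AL": 0b1110}


def set_condition(opcode: int, suffix: str) -> int:
    """Return opcode with its condition field replaced by the code for suffix."""
    return (opcode & 0x0FFFFFFF) | (CONDITION_CODES[suffix] << 28)


add = 0xE0814002                  # ADD R4, R1, R2 (always executes)
addne = set_condition(add, "NE")  # ADDNE R4, R1, R2
print(hex(addne))                 # 0x10814002
```

The lower 28 bits are untouched: the operation, the registers and the S bit are exactly the same, only the leading hex digit changes from E (1110) to 1 (0001).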

Virtually all ARM instructions offer conditional execution. Checking conditions amounts to checking the flags in the ARM status register (the APSR) to see whether the most recent flag-setting instruction produced a zero or negative result, or a carry or overflow. (The APSR is discussed later.)

Table 2. Condition code bits (Arm Architecture Reference Manual, page A8-288)
Condition | Mnemonic extension | Meaning (integer) | Meaning (floating-point) | Condition flags
0000 | EQ | Equal | Equal | Z == 1
0001 | NE | Not equal | Not equal, or unordered | Z == 0
0010 | CS | Carry set | Greater than, equal, or unordered | C == 1
0011 | CC | Carry clear | Less than | C == 0
0100 | MI | Minus, negative | Less than | N == 1
0101 | PL | Plus, positive or zero | Greater than, equal, or unordered | N == 0
0110 | VS | Overflow | Unordered | V == 1
0111 | VC | No overflow | Not unordered | V == 0
1000 | HI | Unsigned higher | Greater than, or unordered | C == 1 and Z == 0
1001 | LS | Unsigned lower or same | Less than, or equal | C == 0 or Z == 1
1010 | GE | Signed greater than or equal | Greater than or equal | N == V
1011 | LT | Signed less than | Less than, or unordered | N != V
1100 | GT | Signed greater than | Greater than | Z == 0 and N == V
1101 | LE | Signed less than or equal | Less than, equal, or unordered | Z == 1 or N != V
1110 | None (AL) | Always (unconditional) | Always (unconditional) | Any
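The integer column of the table can be turned into a small evaluator. This sketch (the name `condition_passed` is hypothetical) relies on the pattern visible in the table: codes come in pairs, and each odd code is the inverse of the even code before it. Code 1111 does not appear in the table; the sketch simply treats it as “always”, which is an assumption.

```python
def condition_passed(cond: int, n: bool, z: bool, c: bool, v: bool) -> bool:
    """Evaluate a 4-bit condition field against the N, Z, C and V flags."""
    group = cond >> 1  # conditions come in pairs: even code, then its inverse
    if group == 0b000:
        result = z                     # EQ / NE
    elif group == 0b001:
        result = c                     # CS / CC
    elif group == 0b010:
        result = n                     # MI / PL
    elif group == 0b011:
        result = v                     # VS / VC
    elif group == 0b100:
        result = c and not z           # HI / LS
    elif group == 0b101:
        result = n == v                # GE / LT
    elif group == 0b110:
        result = n == v and not z      # GT / LE
    else:
        result = True                  # AL (1110); 1111 treated as always too
    if cond & 1 and cond != 0b1111:    # odd codes invert the even one
        result = not result
    return result


# Z is set (the last flag-setting result was zero): EQ passes, NE fails.
print(condition_passed(0b0000, n=False, z=True, c=False, v=False))  # True
print(condition_passed(0b0001, n=False, z=True, c=False, v=False))  # False
```

When the function returns False, the processor behaves as if the instruction were a nop.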

Update Status (S) Bit

The S bit in the opcode defines whether an instruction updates the ALU status bits. Just like with condition codes, most data-processing mnemonics can be extended with an S to cause the instruction to update the status bits. (Compare and test instructions such as CMP and TST always update the status bits.) As examples, ADD R4, R1, R2 will load R4 with R1 + R2, opcode bit 20 will be a ‘0’, and the instruction will not change the ALU status bits. The instruction mnemonic ADDS R4, R1, R2 will do the same addition, but because the “S” is added, bit 20 will be a ‘1’ and the ALU status bits will be updated to reflect the ALU status immediately after the instruction is executed.
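To see what “updating the status bits” means for ADDS, the sketch below performs a 32-bit addition and computes the four flags the condition codes test: N (bit 31 of the result), Z (result is zero), C (unsigned carry out of bit 31) and V (signed overflow). It models only the flag arithmetic for addition, not the processor itself; `add_with_flags` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF


def add_with_flags(a: int, b: int) -> tuple:
    """32-bit addition returning (result, N, Z, C, V) as ADDS would set them."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    n = bool(result >> 31)        # bit 31 of the result (negative)
    z = result == 0               # result is zero
    c = total > MASK32            # unsigned carry out of bit 31
    # Signed overflow: both operands have the same sign, the result differs.
    v = bool(~(a ^ b) & (a ^ result) & 0x80000000)
    return result, n, z, c, v


print(add_with_flags(0x7FFFFFFF, 1))  # (2147483648, True, False, False, True)
print(add_with_flags(0xFFFFFFFF, 1))  # (0, False, True, True, False)
```

A plain ADD would produce the same result value but leave all four flags unchanged.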

General Instruction Format

The ARM processor supports many classes of instructions, including bitwise logical operations like AND and XOR; arithmetic instructions like add, subtract, compare, and multiply; flow control instructions like branch, breakpoint, supervisor call and exception return; data handling instructions like shift, rotate, pack and extend; and some other miscellaneous instructions. The ARM Architecture Reference Manual defines all available instructions. To give you a feel for the manual, the figure below was taken from the manual to illustrate one class of instructions. The table shows the opcodes for data processing instructions (there are perhaps 10 other similar tables showing the opcodes for other types of instructions). The first four bits are the condition code bits that define the conditions under which the instruction will execute. Bits 27, 26, and 25 are all ‘0’ to indicate the data processing class, and the next 5 bits (24-20) are labeled “op”. The table shows the op field, and which codes select which instruction. The next 8 bits (19-12) hold the Rn and Rd register numbers; they are not needed to identify the instruction, so they are left blank in the figure. Bits 11-7 are used for a possible 5-bit immediate operand (in this case, the immediate operand can define a shift or rotation of one of the operands; more on this later). Finally, bits 6 and 5 (labeled “op2”) are used to further differentiate certain instructions.

Figure 1. Encoding of ARM Data-Processing (register) Instruction (Arm Architecture Reference Manual, page A5-197)
Table 1. Data Processing (register) Instructions (Arm Architecture Reference Manual, page A5-197)
op (bits 24-20) | op2 (bits 6-5) | imm5 (bits 11-7) | Instruction | Mnemonic
0000x | - | - | Bitwise AND | AND (register)
0001x | - | - | Bitwise Exclusive OR | EOR (register)
0010x | - | - | Subtract | SUB (register)
0011x | - | - | Reverse Subtract | RSB (register)
0100x | - | - | Add | ADD (register)
0101x | - | - | Add with Carry | ADC (register)
0110x | - | - | Subtract with Carry | SBC (register)
0111x | - | - | Reverse Subtract with Carry | RSC (register)
10xx0 | - | - | Data processing and miscellaneous instructions | -
10001 | - | - | Test | TST (register)
10011 | - | - | Test Equivalence | TEQ (register)
10101 | - | - | Compare | CMP (register)
10111 | - | - | Compare Negative | CMN (register)
1100x | - | - | Bitwise OR | ORR (register)
1101x | 00 | 00000 | Move | MOV (register, ARM)
1101x | 00 | not 00000 | Logical Shift Left | LSL (immediate)
1101x | 01 | - | Logical Shift Right | LSR (immediate)
1101x | 10 | - | Arithmetic Shift Right | ASR (immediate)
1101x | 11 | 00000 | Rotate Right with Extend | RRX
1101x | 11 | not 00000 | Rotate Right | ROR (immediate)
1110x | - | - | Bitwise Bit Clear | BIC (register)
1111x | - | - | Bitwise NOT | MVN (register)
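The table above can be read as a small decision procedure. The sketch below (the function `decode_dp_register` is hypothetical) assumes the opcode is already known to belong to this class (bits 27-25 equal to 000 and bit 4 equal to 0) and returns only the base mnemonic; the 10xx0 rows are merely flagged, since they are decoded by a different table.

```python
def decode_dp_register(opcode: int) -> str:
    """Name a data-processing (register) instruction using the op/op2/imm5 table.

    Assumes bits 27-25 == 000 and bit 4 == 0 (this instruction class).
    """
    op = (opcode >> 20) & 0x1F    # bits 24-20
    imm5 = (opcode >> 7) & 0x1F   # bits 11-7
    op2 = (opcode >> 5) & 0x3     # bits 6-5
    by_top4 = {0b0000: "AND", 0b0001: "EOR", 0b0010: "SUB", 0b0011: "RSB",
               0b0100: "ADD", 0b0101: "ADC", 0b0110: "SBC", 0b0111: "RSC",
               0b1100: "ORR", 0b1110: "BIC", 0b1111: "MVN"}
    compares = {0b10001: "TST", 0b10011: "TEQ", 0b10101: "CMP", 0b10111: "CMN"}
    if (op >> 1) in by_top4:          # patterns like 0100x: x is the S bit
        return by_top4[op >> 1]
    if (op & 0b11001) == 0b10000:     # 10xx0
        return "data-processing and miscellaneous (another table)"
    if op in compares:
        return compares[op]
    # Only 1101x is left: moves and immediate shifts, picked by op2 and imm5.
    if op2 == 0b00:
        return "MOV" if imm5 == 0 else "LSL"
    if op2 == 0b01:
        return "LSR"
    if op2 == 0b10:
        return "ASR"
    return "RRX" if imm5 == 0 else "ROR"


# ADDS R4, R1, R2 / MOV R0, R1 / LSL R0, R1, #2 / CMP R1, R2
for code in (0xE0914002, 0xE1A00001, 0xE1A00101, 0xE1510002):
    print(hex(code), decode_dp_register(code))
# 0xe0914002 ADD
# 0xe1a00001 MOV
# 0xe1a00101 LSL
# 0xe1510002 CMP
```

Note how MOV, LSL, LSR, ASR, RRX and ROR all share op = 1101x and are told apart only by op2 and imm5.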

The figure below shows details of the ADD instruction. Check to see if the ADD instruction is compatible with the general instruction format in the table above.

Figure 2. ADD Opcode Instruction (Arm Architecture Reference Manual, page A8-312)