ARM mov / mvn
Unlike x86, the target of a mov instruction is always a register: the str, ldr, stm, and ldm instructions
must be used to reference memory.
push and pop are aliases of specific forms of stm and ldm respectively.
Movement to and from floating point registers uses a completely different set of opcodes, see ARM fpu.
For more details about the shift pseudo-instructions or embedded use, see barrel shifter.
For x86, see lea, mov, push, pop, shl, shr (==sar).
For the hll stack/queue see push and pop, and javascript arrays also have an array.push method.
Also, data processing lightly covers the mov and mvn instructions as well.
[DEV/tbc] Note that you can "mov Rn,[local]", which is actually encoded as "ldr Rn,[fp+k]", and likewise "mov [local],Rn", which is actually encoded as "str Rn,[fp+k]", but you’ll get a compilation error for static variables, and instead have to code "lea Rm,[static_var]", which is actually encoded as mov plus up to three orr, followed by "mov Rn,[Rm]". Similarly with "constant THREE = 3, FOUR = four()", you can code "mov Rn,[THREE]" which is the same as "mov Rn,3", while "mov Rn,THREE" gets a complaint that you should use "lea Rn,[THREE]" instead, and the latter is mandatory for all dealings with FOUR.
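As a rough illustration of the "mov plus up to three orr" encoding of lea mentioned above, the following Python sketch splits a 32-bit address into byte-sized chunks. The function name lea_static and the textual output format are purely hypothetical; the real compiler may well pick a different (shorter) sequence.

```python
def lea_static(reg: str, address: int) -> list[str]:
    """Build a 32-bit constant as one mov followed by up to three orr.

    Hypothetical helper, for illustration only.
    """
    address &= 0xFFFFFFFF
    # Each byte, left in place, is an 8-bit value at an even bit position,
    # so it always fits ARM's rotated 8-bit immediate form.
    chunks = [address & (0xFF << shift) for shift in (0, 8, 16, 24)]
    chunks = [c for c in chunks if c] or [0]   # skip zero bytes, but keep one for 0
    ops = [f"mov {reg}, #{chunks[0]:X}"]
    ops += [f"orr {reg}, {reg}, #{c:X}" for c in chunks[1:]]
    return ops


for line in lea_static("r1", 0x00402010):
    print(line)
```

Zero bytes cost nothing, which is why the count is "up to" three orr rather than always three.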
| Opcode | 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| ROTATE | LIT8 | |||||||||||||||||||||||||||||||
| P | U | X | W | SHIFTH | ||||||||||||||||||||||||||||
| COND | OOO | OPCODE | S | RN | RD | RS | V | SHFT | Z | RM | ||||||||||||||||||||||
| mov[cond][s] Rd, Rm OP # | cond | 0 | 0 | 0 | 1 | 1 | 0 | 1 | S | SBZ | Rd | shift# | shft | 0 | Rm | |||||||||||||||||
| mov[cond][s] Rd, Rm OP Rs | cond | 0 | 0 | 0 | 1 | 1 | 0 | 1 | S | SBZ | Rd | Rs | 0 | shft | 1 | Rm | ||||||||||||||||
| mvn[cond][s] Rd, Rm OP # | cond | 0 | 0 | 0 | 1 | 1 | 1 | 1 | S | SBZ | Rd | shift# | shft | 0 | Rm | |||||||||||||||||
| mvn[cond][s] Rd, Rm OP Rs | cond | 0 | 0 | 0 | 1 | 1 | 1 | 1 | S | SBZ | Rd | Rs | 0 | shft | 1 | Rm | ||||||||||||||||
| mov[cond][s] Rd, # | cond | 0 | 0 | 1 | 1 | 1 | 0 | 1 | S | SBZ | Rd | rotate | # | |||||||||||||||||||
| mvn[cond][s] Rd, # | cond | 0 | 0 | 1 | 1 | 1 | 1 | 1 | S | SBZ | Rd | rotate | # | |||||||||||||||||||
| str[cond]<x> Rd, Rn, # | cond | 0 | 1 | 0 | P | U | X | W | 0 | Rn | Rd | # | ||||||||||||||||||||
| ldr[cond]<x> Rd, Rn, # | cond | 0 | 1 | 0 | P | U | X | W | 1 | Rn | Rd | # | ||||||||||||||||||||
| str[cond]<x> Rd, Rn, Rm OP # | cond | 0 | 1 | 1 | P | U | X | W | 0 | Rn | Rd | shift# | shft | 0 | Rm | |||||||||||||||||
| ldr[cond]<x> Rd, Rn, Rm OP # | cond | 0 | 1 | 1 | P | U | X | W | 1 | Rn | Rd | shift# | shft | 0 | Rm | |||||||||||||||||
| stm[cond]<am> Rn<!>, reglist | cond | 1 | 0 | 0 | P | U | 0 | W | 0 | Rn | register list | |||||||||||||||||||||
| ldm[cond]<am> Rn<!>, reglist | cond | 1 | 0 | 0 | P | U | 0 | W | 1 | Rn | register list | |||||||||||||||||||||
| push reglist | cond | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | register list | ||||||||||||||||||
| pop reglist | cond | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | register list | ||||||||||||||||||
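To make the mov/mvn rows of the table concrete, here is a minimal Python sketch that packs the fields into a 32-bit word. The function names and keyword arguments are assumptions made for this example only, and only the "shift by immediate" register form and the immediate form are covered.

```python
AL = 0b1110  # the "always" condition code


def mov_reg(rd, rm, shift_imm=0, shft=0, s=0, cond=AL, mvn=False):
    """Encode mov/mvn Rd, Rm OP #shift_imm (bit 25 = 0, bit 4 = 0)."""
    opcode = 0b1111 if mvn else 0b1101
    return ((cond << 28) | (opcode << 21) | (s << 20) | (rd << 12)
            | (shift_imm << 7) | (shft << 5) | rm)


def mov_imm(rd, imm8, rotate=0, s=0, cond=AL, mvn=False):
    """Encode mov/mvn Rd, # (bit 25 = 1, 4-bit rotate plus 8-bit literal)."""
    opcode = 0b1111 if mvn else 0b1101
    return ((cond << 28) | (1 << 25) | (opcode << 21) | (s << 20)
            | (rd << 12) | (rotate << 8) | imm8)


print(f"{mov_reg(0, 0):08X}")               # mov r0, r0
print(f"{mov_reg(3, 1, shift_imm=2):08X}")  # mov r3, r1 lsl 2
print(f"{mov_imm(0, 42):08X}")              # mov r0, 42
print(f"{mov_imm(0, 0, mvn=True):08X}")     # mvn r0, 0
```

The SBZ (Rn) field is simply left as zero, as the table requires.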
P means pre/post indexing (trust me, you don’t want to know!)
U means up/down ("")
W means write-back ("")
S on mov/mvn determines whether to update the flags; for str/ldr/stm/ldm the same bit (20) is instead the to/from memory direction (0 = store, 1 = load).
If X is 0 then it is a word-sized str/ldr instruction, if 1 it is byte-sized (with the three unused bytes zeroed on a load).
If B is 0 then it is a word-sized swp, if 1 it is byte-sized.
The T (force a user-mode/unprivileged access, aka "translate") and EX (mark exclusive) forms of LDR are not supported.
While stm and ldm have four addressing modes (IA, IB, DA, and DB, aka EA/FD, FA/ED, ED/FA, and FD/EA respectively), only FD is supported by Phix, for obvious reasons of avoiding all that insanity. For data transfer use LDMFD (not LDMIA) and STMFD (not STMDB), and for the stack just use push and pop [DEV??].
Likewise the W bit is always set under Phix, aka ! after Rn, to indicate it is updated.
Since all the generated code is run in user mode, I don’t think the hat(^) is relevant.
For push and pop, Rn is sp, aka r13. P should be 1 on the push and 0 on the pop, and U the opposite (0 on the push, 1 on the pop), which matches the push and pop rows in the table above.
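The push and pop rows of the table can be checked with a small Python sketch that builds the register-list bitmask and fills in the fixed P/U/W/L bits. The helper names are hypothetical, and registers are given as plain numbers (lr = 14, pc = 15).

```python
AL = 0b1110  # the "always" condition code
SP = 13      # Rn for both push and pop


def reglist_mask(regs):
    """One bit per register: bit n set means rn is in the list."""
    mask = 0
    for r in regs:
        mask |= 1 << r
    return mask


def push(regs, cond=AL):
    """stmdb sp!, {regs}: P=1, U=0, W=1, L=0."""
    return ((cond << 28) | (0b100 << 25) | (1 << 24) | (1 << 21)
            | (SP << 16) | reglist_mask(regs))


def pop(regs, cond=AL):
    """ldmia sp!, {regs}: P=0, U=1, W=1, L=1."""
    return ((cond << 28) | (0b100 << 25) | (1 << 23) | (1 << 21) | (1 << 20)
            | (SP << 16) | reglist_mask(regs))


print(f"{push([4, 14]):08X}")  # push {r4, lr}
print(f"{pop([4, 15]):08X}")   # pop {r4, pc}
```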
The mvn ("Move Not") opcode is the same as mov but with (all 32) bits inverted on storage, so, for instance, "mov r0,-1" is technically invalid but automatically mapped by the compiler to "mvn r0,0", and "mvn r1,#FF shl 8" is logically the same as (the technically invalid) "mov r1,#FFFF00FF". Elsewhere, mvn is often documented as "Move Negative", which is just downright misleading, since it is a 1’s complement (aka not_bits), rather than a 2’s complement (as in '-'). The compiler happily converts eg mov r0,-42 into the appropriate mvn, and of course it is recommended you rely on that.
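The mov-to-mvn mapping described above can be sketched in Python as follows. This is only an illustration of the idea, not the compiler's actual code: it tries to express a constant as an 8-bit literal rotated right by an even amount, and if that fails tries the bitwise complement with mvn. All names here are hypothetical.

```python
MASK = 0xFFFFFFFF


def rol32(value, n):
    """Rotate a 32-bit value left by n bits (0 <= n < 32)."""
    return ((value << n) | (value >> (32 - n))) & MASK


def rotated_imm(value):
    """Return (rotate, imm8) such that imm8 ror (2*rotate) == value, else None."""
    for rotate in range(16):
        imm8 = rol32(value, 2 * rotate)   # undo the rotate-right
        if imm8 < 256:
            return rotate, imm8
    return None


def pick_mov_or_mvn(value):
    """Choose mov for value itself, or mvn for its complement, if encodable."""
    value &= MASK
    for mnemonic, candidate in (("mov", value), ("mvn", ~value & MASK)):
        enc = rotated_imm(candidate)
        if enc is not None:
            return (mnemonic, *enc)
    return None   # needs more than one instruction


print(pick_mov_or_mvn(-42))         # mvn with literal 41
print(pick_mov_or_mvn(0xFFFF00FF))  # mvn with #FF rotated into bits 8..15
```

A None result means the constant needs a multi-instruction sequence such as mov followed by orr.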
examples
| Instruction | Description |
| lea r0, [i] | r0 := address of variable i |
| ldr r1, [r0] | r1 := ref of variable i |
| str r2, [r0] | [i] := r2 |
| lea r3, [r1*4] | r3 := r1*4 |
| mov r3, r1 lsl 2 | r3 := r1*4 |
| lsl r3, r1, 2 | r3 := r1*4 |
| lea r3, [r1+r2*4] | r3 := r1+r2*4 (using an add/sub) |
| lea r3, [r1-12] | r3 := r1-12 (ditto) |
| ldr r3, [r1-12] | r3 := [r1-12] |
| ldr r3, [r1+4]! | r3 := [r1+4]; r1 += 4 |
| str r3, [r1-12] | [r1-12] := r3 |
| str r3, [r1+4]! | [r1+4] := r3; r1 += 4 |
| ldr r3, [r1], 4 | r3 := [r1]; r1 += 4 |
| str r3, [r1], 4 | [r1] := r3; r1 += 4 |
| str r3, [r1, r2] | [r1+r2] := r3 |
| str r3, [r1, r2]! | [r1+r2] := r3; r1 += r2 |
| ldr r3, [r1], r2 | r3 := [r1]; r1 += r2 |
| ldr r3, [r1, r2 lsl 2] | r3 := [r1+r2*4] (maybe/not) |
| ldr r3, [r1+r2*4] | "" |
| lsl r0, 2 | r0 *= 4 |
| lsr r0, 2 | r0 := floor(r0/4) |
| lsr r0, r0, r2 | r0 := floor(r0/power(2,r2)) |
| lsr r0, r2 | "" |
| mov r0, 42 | r0 := 42 |
| mvn r1, r0 | r1 := -43 (aka not_bits) |
| mvn r1, 42 | r1 := -43 (ditto) |
| mov r1, -42 | r1 := -42 |
| mov r0, -1 | r0 := -1 (encoded as mvn r0, 0) |
| mov r0, -2 | r0 := -2 (encoded as mvn r0, 1) |
| mvn r0, 0 | r0 := -1 |
| mvn r0, 1 | r0 := -2 |
| mov r0, #FFFFFFFF | r0 := -1 (encoded as mvn r0, 0) |
Take special notice of the first three entries: in x86, you can mov eax,[i] in a single instruction,
whereas in ARM you have to load the variable address separately before you can load the contents.
Attempting ldr r0,[i] (or the matching str) results in a compilation error, as does ldr r1,r0.
There is no actual lea instruction in ARM assembly, it is purely a (Phix specific) pseudo-instruction.
In practice, (eg) "lea r0,[i]" maps to "ldr r0,i", though I don’t like or support that syntax.
You can also use more x86-like lea forms such as lea r0, [r1+r2*4], which maps to an add.
The compiler is at liberty to utilise all manner of dirty tricks to implement lea efficiently,
from constructing it in byte-sized chunks to using an offset from some other known address,
and even utilising the s0..s31 floating point registers as a first level alu register spill cache. The *2/4/8 forms of lea/ldr are implemented via lsl 1/2/3 (not much different to x86 then).
I have no immediate plans to support adr/ldr Rn,=literal pseudo-instructions, for now use mov/orr.
Likewise I don’t support any ldr Rn, <hll_var> or similar(/no-[]-register) forms:
In fact and of course lea Rn, [<hll_var>] is my take and as far as I go with all that.
The swp[b] instruction is noted as deprecated in ARMv6, so I’ll not use/support it.
| Addressing mode | Stack alias | L | P | U |
| LDMDA (Decrement After) | LDMFA (Full Ascending) | 1 | 0 | 0 |
| LDMIA (Increment After) | LDMFD (Full Descending) | 1 | 0 | 1 |
| LDMDB (Decrement Before) | LDMEA (Empty Ascending) | 1 | 1 | 0 |
| LDMIB (Increment Before) | LDMED (Empty Descending) | 1 | 1 | 1 |
| STMDA (Decrement After) | STMED (Empty Descending) | 0 | 0 | 0 |
| STMIA (Increment After) | STMEA (Empty Ascending) | 0 | 0 | 1 |
| STMDB (Decrement Before) | STMFD (Full Descending) | 0 | 1 | 0 |
| STMIB (Increment Before) | STMFA (Full Ascending) | 0 | 1 | 1 |
--stmdb = stmfd = push (if Rn=sp and w=1) [full descending]
--ldmia = ldmfd = pop (if Rn=sp and w=0 and sp is in the list)
--stmib = stmfa, ldmda = ldmfa [full ascending]
--stmda = stmed, ldmib = ldmed [empty descending]
--stmia = stmea, ldmdb = ldmea [empty ascending]
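The relationship between the stack-style names (FD/FA/ED/EA) and the increment/decrement names, along with the P and U bits, follows a simple rule that this Python sketch tries to capture. The function name and return shape are assumptions for the example.

```python
def ldm_stm_mode(op, stack):
    """Map ('ldm'|'stm', 'fd'|'fa'|'ed'|'ea') to (suffix, P, U)."""
    full = stack[0] == "f"
    descending = stack[1] == "d"
    if op == "stm":
        # A store pushes: a descending stack moves down (U=0), and a full
        # stack must move the pointer before storing (P=1).
        up = not descending
        before = full
    else:
        # A load pops, undoing the push: opposite direction, and on a full
        # stack the pointer moves after loading (P=0).
        up = descending
        before = not full
    suffix = ("i" if up else "d") + ("b" if before else "a")
    return suffix, int(before), int(up)


for op in ("ldm", "stm"):
    for stack in ("fd", "fa", "ed", "ea"):
        print(op + stack, "=", op + ldm_stm_mode(op, stack)[0])
```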
I bit Distinguishes between the immediate and register forms of <shifter_operand>.
S bit Signifies that the instruction updates the condition codes.
Rn Specifies the first source operand register.
Rd Specifies the destination register.
shifter_operand Specifies the second source operand.
MOV Move register or constant Rd := Op2
MVN Move negative register Rd := 0xFFFFFFFF EOR Op2 (PL same as not_bits..)
1111 = MVN - Rd := NOT Op2
MOV : Move
MOV<suffix> <dest>, <op1>
dest = op1
MOV loads a value into the destination register, from another register, a shifted register, or an immediate value.
You can specify the same register for the effect of a NOP instruction, or you can shift the same register if you choose:
MOV R0, R0 ; R0 = R0... NOP instruction
MOV R0, R0, LSL#3 ; R0 = R0 * 8
If R15 is the destination, the program counter or flags can be modified.
This is used to return to calling code, by moving the contents of the link register into R15:
MOV PC, R14 ; Exit to caller
MOVS PC, R14 ; Exit to caller preserving flags
(not 32-bit compliant)
MVN : Move Negative
MVN<suffix> <dest>, <op1>
dest = !op1
MVN loads a value into the destination register, from another register, a shifted register, or an immediate value.
The difference is the bits are inverted prior to moving, thus you can move a negative value into a register.
Due to the way this works (two’s complement), you want to move one less than the required number:
MVN R0, #4 ; R0 = -5
MVN R0, #0 ; R0 = -1
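The "one less than the required number" rule is just the identity not(n) = -n-1 in two's complement, which a couple of lines of Python can confirm. The helper names are hypothetical, and results are shown as signed 32-bit values.

```python
def mvn(op):
    """Model MVN: invert all 32 bits, return the result as a signed integer."""
    result = ~op & 0xFFFFFFFF
    return result - (1 << 32) if result & 0x80000000 else result


def mvn_operand_for(wanted):
    """Operand to give MVN so that the register ends up holding 'wanted'."""
    return (-wanted - 1) & 0xFFFFFFFF


print(mvn(4))               # MVN R0, #4
print(mvn(0))               # MVN R0, #0
print(mvn_operand_for(-5))  # operand needed to obtain -5
```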
SWP : Swap
SWP<suffix> <dest>, <op1>, [<op2>]
SWP will:
Load a word from memory, address pointed to by operand two, and put that word in the destination register.
Store the contents of register operand one to that same address.
If the destination and operand one are the same register, then the contents of the register and the contents of the memory location given will be swapped.
If the B suffix is set, then a byte will be transferred, otherwise a word will be transferred.
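For what it is worth, the word form of SWP can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: registers and memory are plain dictionaries, the byte (B) form is omitted, and the atomicity of the real instruction is not modelled.

```python
def swp(regs, memory, dest, op1, op2):
    """Model SWP dest, op1, [op2] for whole words."""
    address = regs[op2]
    loaded = memory[address]        # read the old memory word first
    memory[address] = regs[op1]     # then store op1 to the same address
    regs[dest] = loaded             # finally write the destination register


regs = {0: 111, 1: 0x1000}
memory = {0x1000: 222}
swp(regs, memory, 0, 0, 1)          # SWP R0, R0, [R1]
print(regs[0], memory[0x1000])
```

Reading memory before the store is what makes the dest == op1 case a true exchange.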
(out of place/DEV, and I’m probably not?? going to use this in Phix...):
(not halfword or doubleword, see A3.11.3 [if ever needed], ditto multiple/A3.12)
Load instructions load a single value from memory and write it to a general-purpose register.
Store instructions read a value from a general-purpose register and store it to memory.
These instructions have a single instruction format:
LDR|STR{<cond>}{B}{T} Rd, <addressing_mode>
| 31 | 30 | 29 | 28 | 27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 | 19 | 18 | 17 | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
| cond | 0 | 1 | I | P | U | B | W | L | Rn | Rd | address mode specific | ||||||||||||||||||||
I, P, U, W Are bits that distinguish between different types of <addressing_mode>.
See Addressing Mode 2 - Load and Store Word or Unsigned Byte on page A5-18
L bit Distinguishes between a Load (L==1) and a Store instruction (L==0).
B bit Distinguishes between an unsigned byte (B==1) and a word (B==0) access.
Rn Specifies the base register used by <addressing_mode>.
Rd Specifies the register whose contents are to be loaded or stored.
These are, arguably, the most useful instructions available.
It is all very well being able to do stuff with the registers,
but if you cannot load and store them to the main memory, then... <grin>
Single Data Transfer
The single data transfer instructions (STR and LDR) are used to load and store single bytes
or words of data from/to main memory. The addressing is very flexible.
First, we’ll look at the instruction:
LDR R0, address
STR R0, address
LDRB R0, address
STRB R0, address
These instructions load and store the value of R0 to the specified address.
If 'B' is also specified, as in the latter two instructions, then only a single byte is loaded or saved.
The three unused bytes in the word are zeroed upon loading.
The address can be a simple value, or an offset, or a shifted offset.
Write-back may be performed (to remove the need for adding/subtracting).
STR R0, [Rbase]                   Store R0 at Rbase.
STR R0, [Rbase, Rindex]           Store R0 at Rbase + Rindex.
STR R0, [Rbase, #index]           Store R0 at Rbase + index. Index is an immediate value: STR R0, [R1, #16] would store R0 at R1+16.
STR R0, [Rbase, Rindex]!          Store R0 at Rbase + Rindex, and write back the new address to Rbase.
STR R0, [Rbase, #index]!          Store R0 at Rbase + index, and write back the new address to Rbase.
STR R0, [Rbase], Rindex           Store R0 at Rbase, and write back Rbase + Rindex to Rbase.
STR R0, [Rbase, Rindex, LSL #2]   Store R0 at the address Rbase + (Rindex * 4).
STR R0, place                     Generate a PC-relative offset to 'place', and store R0 there.
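The difference between the offset, pre-indexed (with '!') and post-indexed forms listed above comes down to which address is used and what ends up in the base register. A minimal Python sketch, with a hypothetical function name:

```python
def address_mode(base, offset, pre_indexed, write_back):
    """Return (effective_address, new_base) for an LDR/STR addressing mode."""
    if pre_indexed:
        effective = base + offset                      # [Rbase, offset]
        new_base = effective if write_back else base   # '!' updates Rbase
    else:
        effective = base                               # [Rbase], offset
        new_base = base + offset                       # write-back is implied
    return effective, new_base


print(address_mode(0x1000, 16, True, False))   # STR R0, [R1, #16]
print(address_mode(0x1000, 16, True, True))    # STR R0, [R1, #16]!
print(address_mode(0x1000, 16, False, True))   # STR R0, [R1], #16
```

A shifted register offset such as Rindex, LSL #2 would simply be passed in as offset = Rindex * 4.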
You can, of course, use conditional execution on any of these instructions.
Note, however, that the conditional flag comes before the byte flag, so if you wish to load a byte when
the result is equal, the instruction would be LDREQB Rx, address (not LDRBEQ...).
If you specify pre-indexed addressing (where the base and index are both within square brackets), the
write-back is controlled by the presence or absence of the '!'.
The fourth and fifth examples above reflect this. Using this, you can automatically move forward or backward in memory.
A string print routine could then become:
.loop
LDRB R0, [R1, #1]! ; pre-indexed with write-back: loads from R1+1, so R1 must start one byte before the string
SWI "OS_WriteC"
CMP R0, #0
BNE loop
instead of:
.loop
LDRB R0, [R1]
SWI "OS_WriteC"
ADD R1, R1, #1
CMP R0, #0
BNE loop
The use of '!' is invalid for post-indexed addressing (where the index is outside of the square brackets, as in example six above) as write-back is implied.
As you can see, the offset may be shifted. Additionally, the index offset may be subtracted from the base. In this case, you might use code such as:
LDRB R0, [R1, #-1]
You cannot modify the PSR with a load or store instruction, though you can store or load the PC. In order to load a stored 'state' and correctly restore it, use:
LDR R0, [Rbase]
MOVS R15, R0
The MOVS will cause the PSR bits to be updated, provided that you are privileged.
Using MOVS with PC is not 32-bit compliant.
According to the ARM assembler manual:
A byte load (LDRB) expects the data on bits 0 to 7 if the supplied address is on a word boundary,
on bits 8 to 15 if it is a word address plus one byte, and so on.
The selected byte is placed in the bottom 8 bits of the destination register, and the remaining bits of the register are filled with zeroes.
A byte store (STRB) repeats the bottom 8 bits of the source register four times across the data bus.
The external memory system should activate the appropriate byte subsystem to store the data.
A word load (LDR) or word store (STR) should generate a word aligned address. Using a non-word-aligned address has non-obvious and unspecified results.
The only thing of real note here is that you cannot use LDR to load a word from a non-aligned address.
Multiple Data Transfer
The multiple data transfer instructions (LDM and STM) are used to load and store multiple words of data from/to main memory.
The main use of LDM/STM is to dump registers that need to be preserved onto the stack. We've all seen STMFD R13!, {R0-R12, R14}.
The instruction is:
xxM type cond base write-back, {register list}
'xx' is LD to load, or ST to store.
'type' is:
Stack Other
LDMED LDMIB Pre-incremental load
LDMFD LDMIA Post-incremental load
LDMEA LDMDB Pre-decremental load
LDMFA LDMDA Post-decremental load
STMFA STMIB Pre-incremental store
STMEA STMIA Post-incremental store
STMFD STMDB Pre-decremental store
STMED STMDA Post-decremental store
The assembler takes care of how to map the mnemonics. Note that ED is not IB; it is only the same for a pre-incremental load. When storing, ED is post-decrement.
FD, ED, FA, and EA refer to a Full or Empty stack which is either Ascending or Descending.
A full stack is where the stack pointer points to the last data item written, and an empty stack is where the stack pointer points to the first free slot.
A descending stack grows downwards in memory (ie, from the end of application space down) and an ascending stack is one which grows upwards in memory.
The other forms simply describe the behaviour of the instruction, and mean Increment After, Increment Before, Decrement After, Decrement Before.
RISC OS, by tradition, uses a Full Descending stack. When writing in APCS assembler, it is common to set your stack pointer to the end of application space and
then use a Full Descending stack. If you are working with a high level language (either BASIC or C), then you don’t get a choice.
The stack pointer (traditionally R13) points to the end of a fully descending stack. You must continue this format, or create and manage your own stack (if
you’re the sort of die-hard person that would do something like this!).
'base' is the register containing the address to begin with. Traditionally under RISC OS, the stack pointer is R13, though you can use any available register except R15.
If you would like the stack pointer to be updated with the new register contents, simply set the write-back bit by following the stack pointer register with an '!'.
The register list is given in {curly brackets}. It doesn’t matter what order you specify the registers in, they are stored from lowest to highest.
As a single bit determines whether or not a register is saved, there is no point to trying to specify it twice.
A side effect of this is that code such as:
STMFD R13!, {R0, R1}
LDMFD R13!, {R1, R0}
will not swap the contents of two registers.
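The reason is easy to see in a small Python model of STMFD/LDMFD, in which the register list is treated as a set and always processed in register-number order. This is a sketch with hypothetical names, using dictionaries for registers and memory, and it always writes back (the '!' form).

```python
SP = 13


def stmfd(regs, memory, reglist):
    """STMFD R13!, {reglist}: lowest register ends up at the lowest address."""
    for r in sorted(set(reglist), reverse=True):
        regs[SP] -= 4
        memory[regs[SP]] = regs[r]


def ldmfd(regs, memory, reglist):
    """LDMFD R13!, {reglist}: lowest register is loaded from the lowest address."""
    for r in sorted(set(reglist)):
        regs[r] = memory[regs[SP]]
        regs[SP] += 4


regs = {0: 10, 1: 20, SP: 0x8000}
memory = {}
stmfd(regs, memory, [0, 1])   # STMFD R13!, {R0, R1}
ldmfd(regs, memory, [1, 0])   # LDMFD R13!, {R1, R0}
print(regs[0], regs[1])       # unchanged: the written order is ignored
```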
A useful shorthand has been provided. To encompass a range of registers, simply say the first and the last, and put a dash between them.
For example R0-R3 is identical to R0, R1, R2, R3, only tidier and saner...
When R15 is stored to memory, the PSR bits are also saved. When R15 is reloaded, the PSR bits are NOT restored unless you request it.
The method of requesting is to follow the register list with a '^'.
STMFD R13!, {R0-R12, R14}
...
LDMFD R13!, {R0-R12, PC}
This saves all registers, does some stuff, then reloads all registers. PC is loaded from R14 which was probably set by a BL instruction or some-such. The PSR flags are untouched.
STMFD R13!, {R0-R12, R14}
...
LDMFD R13!, {R0-R12, PC}^
This saves all registers, does some stuff, then reloads all registers. PC is loaded from R14 which was probably set by a BL instruction. The PSR flags are updated.
Warning: This code is not 32 bit compliant. You need to use MRS and MSR to handle the PSR. You cannot use the '^' suffix.
Note that in both examples, R14 is loaded directly into PC. This saves the need to MOV(S) R14 into R15.
Warning: Using MOVS PC, ... is not 32 bit compliant. You need to use MRS and MSR to handle the PSR.
ldr   Rd := (int32)[addr]
str   (int32)[addr] := Rd
ldrb  Rd := (uint8)[addr] (zero-extended)
strb  (uint8)[addr] := (uint8)Rd
ldrh  Rd := (uint16)[addr] (zero-extended)
strh  (uint16)[addr] := (uint16)Rd
ldrsb Rd := (int8)[addr] (sign-extended)
ldrsh Rd := (int16)[addr] (sign-extended)
NB no strsb or strsh, since strb/strh store both signed and unsigned values.
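The zero-extended versus sign-extended distinction can be tried out with this Python sketch, which models memory as a bytes object. Little-endian byte order is an assumption of the example, and the function names merely mirror the mnemonics.

```python
def ldr(mem, addr):
    """Word load, viewed as a signed 32-bit value."""
    return int.from_bytes(mem[addr:addr + 4], "little", signed=True)


def ldrb(mem, addr):
    """Byte load, zero-extended."""
    return int.from_bytes(mem[addr:addr + 1], "little")


def ldrh(mem, addr):
    """Halfword load, zero-extended."""
    return int.from_bytes(mem[addr:addr + 2], "little")


def ldrsb(mem, addr):
    """Byte load, sign-extended."""
    return int.from_bytes(mem[addr:addr + 1], "little", signed=True)


def ldrsh(mem, addr):
    """Halfword load, sign-extended."""
    return int.from_bytes(mem[addr:addr + 2], "little", signed=True)


mem = bytes([0xFF, 0xFF, 0xFF, 0x7F])
print(ldrb(mem, 0), ldrsb(mem, 0), ldrh(mem, 0), ldrsh(mem, 0), ldr(mem, 0))
```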
Copy a block of memory:
=======================
r9: address of src, r10: address of dest, r11: end of src (r11>r9)
loop: ldmia r9!, {r0-r7}    ; load 8 words (32 bytes) from src, advancing r9
      stmia r10!, {r0-r7}   ; store them to dest, advancing r10
      cmp r9,r11
      bne loop ; (PL I would have said BL, might need some kind of shift-loops...)
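A Python model of the loop above shows both the copy itself and why the bne test only terminates when the block length is a multiple of 32 bytes. Memory is modelled as a dictionary of word addresses, and the function name and the ValueError are additions for this sketch only.

```python
def copy_block(memory, src, dest, end):
    """Copy words from src up to (not including) end, 8 words per iteration."""
    r9, r10, r11 = src, dest, end
    while True:
        words = [memory[r9 + 4 * i] for i in range(8)]   # ldmia r9!, {r0-r7}
        r9 += 32
        for i, word in enumerate(words):                 # stmia r10!, {r0-r7}
            memory[r10 + 4 * i] = word
        r10 += 32
        if r9 == r11:                                    # cmp r9,r11 / bne loop
            return r9, r10
        if r9 > r11:
            # The real loop would overshoot and keep going here.
            raise ValueError("block length is not a multiple of 32 bytes")


memory = {0x1000 + 4 * i: i for i in range(16)}
copy_block(memory, 0x1000, 0x2000, 0x1040)
print([memory[0x2000 + 4 * i] for i in range(16)])
```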
Erm...: You cannot use MOVS PC, R14 or LDMFD R13!, {registers, PC}^ in 32 bit code.