Naked and afraid in the world of ARM.

Olof Astrand
5 min read · Sep 15, 2022

AI generated image of the world of ARM (without thumbs).

This will introduce the basics of the Cortex-M7 ARM architecture and some of its ABIs.

Lots of info was found in this great blog, cortex-m-rtos-context-switching

The registers (r0-r15)

r15 PC. The Program Counter (current instruction)
r14 LR. The Link Register (return address)
r13 SP. The Stack Pointer (banked as MSP/PSP)
r12 IP. The Intra-Procedure-call scratch register
r11 (Callee saved)
r10 (Callee saved)
r9. Often used as the platform register (-fpic and -msingle-pic-base)
r8, r7, r6, r5, r4. (Callee saved: a function that uses them must restore them before it returns)
r3, r2, r1, r0. Argument / scratch registers
(need not be preserved across a call; r0 carries arg0 and also the return value)

This register usage is according to the AAPCS

The Procedure Call Standard for the ARM Architecture (AAPCS) forms part of the Base Standard Application Binary Interface for the ARM Architecture (BSABI) specification. By writing code that adheres to the AAPCS, you can ensure that separately compiled and assembled modules can work together.
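To make the register roles easier to remember, here is a minimal sketch in C that classifies r0-r15 the way the AAPCS does. The names reg_role_t, aapcs_reg_role and callee_must_preserve are my own, made up for illustration, and r9 is treated as preserved, which is an assumption that depends on the platform.

```c
#include <stdbool.h>

/* Roles of the core registers under the AAPCS (hypothetical names). */
typedef enum {
    REG_ARGUMENT_SCRATCH,   /* r0-r3: arguments, return value, scratch */
    REG_CALLEE_SAVED,       /* r4-r8, r10, r11: must survive a call    */
    REG_PLATFORM,           /* r9: platform register                   */
    REG_INTRA_CALL_SCRATCH, /* r12 (IP): may be clobbered by veneers   */
    REG_SPECIAL             /* r13 (SP), r14 (LR), r15 (PC)            */
} reg_role_t;

/* Classify a core register number (0..15). */
reg_role_t aapcs_reg_role(unsigned reg) {
    if (reg <= 3)  return REG_ARGUMENT_SCRATCH;
    if (reg == 9)  return REG_PLATFORM;
    if (reg <= 11) return REG_CALLEE_SAVED;
    if (reg == 12) return REG_INTRA_CALL_SCRATCH;
    return REG_SPECIAL;
}

/* True when a function that uses the register has to restore it.
 * Only meaningful for r0-r12; r9 is assumed to be preserved here. */
bool callee_must_preserve(unsigned reg) {
    reg_role_t role = aapcs_reg_role(reg);
    return role == REG_CALLEE_SAVED || role == REG_PLATFORM;
}
```

For example, callee_must_preserve(4) is true while callee_must_preserve(0) and callee_must_preserve(12) are false.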

Function calls and Exceptions (Branch Link)

The ARM does not have RET and RETI instructions. Instead, when calling a function with

 bl functionName 

the address of the next instruction to execute after returning is stored in the LR register. This way you can return from the function with

bx lr

However, if your function calls another function (thus modifying the LR register), it is better to use a prologue of

push { lr }

and an epilogue (at the end of the function) of

pop { pc }

This way you are able to call as many functions as you like. These details are, however, handled by the compiler, which also decides which registers need to be saved because the function uses them.

Odd Addresses

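Note that we oversimplified the usage of LR. Bit 0 of a branch target tells the processor which instruction set to execute, so Thumb code must be entered through an odd address. The Cortex-M7 only executes Thumb code, so this bit is always set.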

LR[31:1] ← return address
LR[0] ← code type at return address (0 ARM, 1 Thumb)
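The bit layout above can be sketched with a few helpers in C. The function names are hypothetical and the code just does the masking on plain 32-bit values, so it runs anywhere, not only on the target.

```c
#include <stdint.h>
#include <stdbool.h>

/* LR[31:1]: the address execution continues at after the return. */
uint32_t lr_return_address(uint32_t lr) {
    return lr & ~UINT32_C(1);
}

/* LR[0]: 1 means the code at the return address is Thumb code. */
bool lr_is_thumb(uint32_t lr) {
    return (lr & UINT32_C(1)) != 0;
}

/* Turn an even code address into a value that is safe to branch to
 * with bx or blx on a Thumb-only core. */
uint32_t thumb_pointer(uint32_t code_address) {
    return code_address | UINT32_C(1);
}
```

So an LR of 0x08000125 means "return to 0x08000124, in Thumb state".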

Long branch call stopovers (aka the Veneer)

The BL instruction is unable to address the full 32-bit address space (due to the limited length of the instruction), so it may be necessary for the linker to insert a veneer (trampoline) between the calling routine and the called subroutine. Veneers may also be needed to support dynamic linking. Any veneer inserted must preserve the contents of all registers except IP (r12) and the condition code flags.
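To get a feeling for when the linker needs a veneer, here is a rough sketch of the range check in C. It assumes the 32-bit Thumb-2 encoding of BL, whose signed 25-bit offset reaches roughly 16 MB in each direction, and it assumes both addresses are given with the Thumb bit cleared. The name bl_reaches is made up; the linker has its own logic for this.

```c
#include <stdint.h>
#include <stdbool.h>

/* Offset range of the 32-bit Thumb-2 BL encoding (assumption of this
 * sketch): a signed 25-bit, halfword-aligned value. */
#define BL_MIN_OFFSET (-16777216LL) /* -16 MiB     */
#define BL_MAX_OFFSET (16777214LL)  /* +16 MiB - 2 */

/* True when a BL located at bl_address can branch to target directly,
 * i.e. without a veneer in between. */
bool bl_reaches(uint32_t bl_address, uint32_t target) {
    /* In Thumb state the PC reads as the instruction address + 4. */
    int64_t offset = (int64_t)target - ((int64_t)bl_address + 4);
    return offset >= BL_MIN_OFFSET && offset <= BL_MAX_OFFSET;
}
```

A call from flash at 0x08000000 to a function placed in RAM at 0x24000000 is out of range in this model, which is the typical case where a veneer shows up in the map file.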

Exception and interrupt handling

In order for the hardware to figure out what state to restore when exiting an exception or interrupt, a special value, known as EXC_RETURN needs to be loaded into the link register, lr. Typically, this will just mirror the value in the lr on exception entry.

On exception entry the hardware puts one of these special values into the LR register: 0xFFFFFFF1 (return to Handler mode, use MSP), 0xFFFFFFF9 (return to Thread mode, use MSP), 0xFFFFFFFD (return to Thread mode, use PSP), or 0xFFFFFFE1, 0xFFFFFFE9, 0xFFFFFFED (the same three cases, but with FPU extended frames).

This way interrupt handlers can be written in C, since an ordinary function return through LR triggers the exception return. However, one should remember that other interrupts may be pending, and in that case the registers are not popped from the stack until the last interrupt has been served.
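The six EXC_RETURN values listed above differ only in three bits, so they are easy to decode. Below is a small sketch in C; the struct and function names are my own, and only the bits needed for the values mentioned here are looked at.

```c
#include <stdint.h>
#include <stdbool.h>

#define EXC_RETURN_USE_PSP     (UINT32_C(1) << 2) /* 1: PSP, 0: MSP                   */
#define EXC_RETURN_THREAD_MODE (UINT32_C(1) << 3) /* 1: Thread mode, 0: Handler mode  */
#define EXC_RETURN_BASIC_FRAME (UINT32_C(1) << 4) /* 1: basic frame, 0: FPU extended  */

typedef struct {
    bool thread_mode; /* return to Thread mode (else Handler mode) */
    bool uses_psp;    /* unstack from PSP (else MSP)               */
    bool fpu_frame;   /* extended frame with FPU registers         */
} exc_return_info_t;

/* Decode the value found in LR while inside an exception handler. */
exc_return_info_t decode_exc_return(uint32_t lr) {
    exc_return_info_t info;
    info.thread_mode = (lr & EXC_RETURN_THREAD_MODE) != 0;
    info.uses_psp    = (lr & EXC_RETURN_USE_PSP) != 0;
    info.fpu_frame   = (lr & EXC_RETURN_BASIC_FRAME) == 0;
    return info;
}
```

Decoding 0xFFFFFFFD gives Thread mode, PSP and no FPU frame, which is what an RTOS task without floating point state normally returns to.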

Exception frame
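When no FPU state is stacked, the hardware pushes eight 32-bit words on exception entry. A sketch of that basic frame as a C struct, lowest address first, could look like this. The type and function names are hypothetical, and the extended FPU frame is left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Basic exception frame as it lies in memory, starting at the stack
 * pointer (MSP or PSP) that was active when the exception was taken. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code           */
    uint32_t pc;   /* where the interrupted code continues */
    uint32_t xpsr; /* program status register              */
} basic_exception_frame_t;

_Static_assert(sizeof(basic_exception_frame_t) == 32,
               "the basic frame is eight words");

/* Address the interrupted code will resume at. */
uint32_t stacked_pc(const basic_exception_frame_t *frame) {
    return frame->pc;
}
```

These are the caller-saved registers of the AAPCS plus PC and xPSR, which is why a handler can be an ordinary C function.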

Qemu

QEMU is a great tool for understanding and tracing what the processor is doing. I have made a patch to emulate the stm32h7 processor, here: qemu

Useful arguments to use with UART emulation,

arm-softmmu/qemu-system-arm -d cpu,unimp,int,page,mmu -machine artpi \
  -gdb "tcp::1234" -S -kernel build/STM32H7.elf \
  -serial tcp::5551,server,nowait -serial tcp::5552,server,nowait \
  -serial tcp::5553,server,nowait -serial tcp::5554,server,nowait \
  -serial tcp::5555,server,nowait -serial tcp::5556,server,nowait \
  -serial tcp::5557,server,nowait -serial tcp::5558,server,nowait \
  -serial tcp::5559,server,nowait

Noteworthy trace options and output:

-d cpu,unimp,int,page,mmu

…tailchaining to pending exception

The Naked attribute

Finally, naked but not so afraid any longer. When compiling with the naked attribute,

void PendSV_Handler(void) __attribute__ (( naked ));

no prologue or epilogue is generated, which can be useful in rare cases, e.g. a context switch in an RTOS.

Local stack variables and stack frames

The prologue of many functions contains a sub sp instruction like the one below. It reserves space for two int-sized (32-bit) variables,

push { r4, r5, r6, lr }
sub sp,#0x8

Assembly, crash course

uint32_t is_flag_set

.global is_flag_set

ldr (load register)

Move register to special. msr psp,r0

Move special to register. mrs r3, psp

Check variable

ldr r0,=is_flag_set
ldr r3,[r0]
cmp r3,#0x0
beq LABEL_not_flag_set

Set variable to 1


ldr r0,=is_flag_set
mov r1,#1
str r1,[r0]
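For comparison, this is roughly what the two assembly snippets do when written in C. It is only a sketch: the helper names flag_is_set and set_flag are made up, and I have added volatile on the assumption that the flag is shared with an interrupt handler or assembly code.

```c
#include <stdint.h>

/* The variable the assembly refers to with =is_flag_set.
 * volatile forces a real load or store on every access. */
volatile uint32_t is_flag_set;

/* "Check variable": the ldr / ldr / cmp / beq sequence. */
int flag_is_set(void) {
    return is_flag_set != 0;
}

/* "Set variable to 1": the ldr / mov / str sequence. */
void set_flag(void) {
    is_flag_set = 1;
}
```

Compiling this with arm-none-eabi-gcc -S and reading the output is a nice way to see the same ldr and str instructions appear.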

What are the differences between “arm-none-eabi-” and “arm-linux-gnueabi”? Can I use the “arm-linux-gnueabi” toolchain in a bare-metal environment? How do you know which toolchain binary to use where?

The general form of compiler/linker prefix is as follows:

A-B-C

Where:

  • A indicates the target (arm for AArch32 little-endian, aarch64 for AArch64 little-endian).
  • B indicates the vendor (none or unknown for generic). Note that this is optional (e.g. not present in arm-linux-gnueabihf).
  • C indicates the ABI in use (linux-gnu* for Linux, linux-android* for Android, elf or eabi for ELF based bare-metal).

The bare-metal ABI (embedded ABI) will assume a different C library (newlib for example, or even no C library) to the Linux ABI (which assumes glibc). Therefore, the compiler may make different function calls depending on what it believes is available above and beyond the Standard C library.
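The A-B-C form can be illustrated with a small parser in C. This is only a sketch built on a heuristic of my own: the vendor is considered omitted when the part after the target starts with "linux-". All names are hypothetical, and real toolchains have more variations than this handles.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char arch[16];   /* A: target, e.g. "arm" or "aarch64"      */
    char vendor[16]; /* B: vendor, empty string when omitted    */
    char abi[32];    /* C: ABI, e.g. "eabi" or "linux-gnueabihf" */
} toolchain_prefix_t;

/* Split a prefix such as "arm-none-eabi-" into its parts.
 * Returns 0 on success and -1 when the prefix cannot be split. */
int parse_toolchain_prefix(const char *prefix, toolchain_prefix_t *out) {
    const char *dash = strchr(prefix, '-');
    if (dash == NULL || dash == prefix) return -1;
    size_t len = (size_t)(dash - prefix);
    if (len >= sizeof out->arch) return -1;
    memcpy(out->arch, prefix, len);
    out->arch[len] = '\0';

    const char *rest = dash + 1;
    out->vendor[0] = '\0';
    dash = strchr(rest, '-');
    /* Heuristic: there is a vendor when more text follows the next dash
     * and the next part is not the start of a "linux-..." ABI. */
    if (dash != NULL && dash[1] != '\0' && strncmp(rest, "linux-", 6) != 0) {
        len = (size_t)(dash - rest);
        if (len >= sizeof out->vendor) return -1;
        memcpy(out->vendor, rest, len);
        out->vendor[len] = '\0';
        rest = dash + 1;
    }

    len = strlen(rest);
    if (len > 0 && rest[len - 1] == '-') len--; /* drop the trailing dash */
    if (len == 0 || len >= sizeof out->abi) return -1;
    memcpy(out->abi, rest, len);
    out->abi[len] = '\0';
    return 0;
}
```

With this, "arm-none-eabi-" splits into arm / none / eabi, while "arm-linux-gnueabihf" gives arm, an empty vendor, and linux-gnueabihf.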

Compiler options explained

  • -specs=nosys.specs
  • -specs=nano.specs
  • -lnosys

GCC uses spec strings, which control which subprocesses to run and which parameters to pass to them. The behavior defined by the spec strings can be overridden using spec files.

Looking at these spec files in the lib folder of the gcc tool chain (e.g. /usr/lib/arm-none-eabi/lib) we can see that the mentioned spec files define which standard library is to be used by the linker.

For example, nosys.specs just defines that system calls should be implemented as stubs that return errors when called (-lnosys). The choice of libc in this case depends on whether nano should be used. With %G the libgcc spec-string is processed, which defines the parameters passed to the linker.
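To show what "stubs that return errors" means, here is a sketch of what such a stub can look like in C. The function is deliberately called write_stub, a hypothetical name, so it does not collide with anything in a real C library; as far as I know the stubs in libnosys follow the same pattern of setting errno to ENOSYS and returning -1, but check the newlib sources for the exact code.

```c
#include <errno.h>
#include <stddef.h>

/* A system call stub in the spirit of -lnosys: there is no operating
 * system to ask, so the call always fails. */
int write_stub(int fd, const void *buf, size_t len) {
    (void)fd;  /* unused: nothing to write to */
    (void)buf;
    (void)len;
    errno = ENOSYS; /* "function not implemented" */
    return -1;
}
```

On a real target you would replace stubs like this with your own implementation, for example one that sends the bytes to a UART.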

Read all about it here: Spec-Files.html
