In this article I will explain three things:
This topic is new for me as well. I've always used the scripts provided by the vendor, and yesterday I decided to write one myself, partly because I enjoy torturing my brain, but mostly to learn something new. The article will guide you step by step, from creating the simplest, barely working linker script to a fully fledged Linker Script That Rocks.
I decided to focus on linker scripts for microcontrollers, because they require a good understanding of the target platform and give you the opportunity to decide about (almost) everything. You need to know the memory addresses and what to place at specific locations. Don't worry, I will explain everything.
You only need two things to start: the ARM toolchain and your favourite text editor. The newest ARM toolchain can be downloaded directly from the ARM site.
If you are lucky and you work on Mac or Linux, you can simply extract the toolchain anywhere and add it temporarily to your shell's path:
$ export PATH=/your/toolchain/bin:$PATH
Now you should be able to execute all of the toolchain's tools, which are prefixed with arm-none-eabi-.
Before we can start, there's one more thing we need to know: the memory layout of the target device. You can't write a linker script without knowing where to put your code, where to put variables etc. The memory layout differs between architectures and devices, so you must grab the datasheet for your microcontroller and find that information. For STM32F103RBT6 it can be found in the reference manual on page 53 (for SRAM) and page 54 (for flash).
This essentially means that you must put your code at address 0x08000000 and all your variables at 0x20000000. Don't worry, I will explain later.
I almost forgot. You also need to know how your particular microcontroller boots. Most of the time you are responsible for setting up the stack pointer at device boot time, and this is done differently on different architectures. The recipe is quite straightforward: find out how to set up the stack pointer, and set it to the top of the SRAM memory. It's logical: the stack always grows downwards, so it needs space to grow.
If you take a look at page 61, you will find a decent description of how STM32 boots up, and how it sets up the stack pointer.
After this startup delay has elapsed, the CPU fetches the top-of-stack value from address 0x00000000, then starts code execution from the boot memory starting from 0x00000004.
Excellent, more useful things to write down.
There's also one more important thing you need to know about STM32, which is also explained on page 61. When the CPU reads from address 0x00000000 and further, it might actually access different memories, depending on the selected boot mode. This is called “memory mapping”. By default, STM32 boots from flash, so it maps the memory region at 0x08000000 to 0x00000000; the flash memory is then accessible both from its original address and from 0x00000000. This allows the CPU to start reading instructions directly from flash.
Having the above information in mind, and knowing that we would like to boot from flash, we can already calculate appropriate addresses for stack initializer and entry point:
The documentation is a little misleading. The entry point address is not where execution will start. In fact, it should contain the address the CPU should jump to in order to start execution. In other words, the dword at 0x00000000 will be loaded into the SP register, and the dword at 0x00000004 will be loaded into the PC register.
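To make the aliasing concrete, here is a small host-side C sketch (not target code) that translates an address in the boot alias into the flash address holding the same word. The names FLASH_BASE and alias_to_flash are made up for this illustration; the constant comes from the addresses quoted above, and the sketch assumes the device boots from flash.

```c
#include <stdint.h>

/* Assumed from the text above: main flash starts at 0x08000000 and,
 * when booting from flash, is mirrored at address 0x00000000. */
#define FLASH_BASE 0x08000000u

/* Hypothetical helper: translate an address in the boot alias
 * (0x00000000 and up) to the flash address that holds the same word. */
static uint32_t alias_to_flash(uint32_t alias_addr)
{
    return FLASH_BASE + alias_addr;
}
```

With this mapping, alias_to_flash(0x0) gives 0x08000000 (the word loaded into SP) and alias_to_flash(0x4) gives 0x08000004 (the word loaded into PC), so these are the two flash locations the linker script has to fill.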
As promised at the beginning, our first goal is to write ANYTHING that works. So let's do this!
A linker script can consist of a single block called SECTIONS. In this block you define the output sections that will be placed in the binary file. The most important sections are:
- .text: your code,
- .data: your initialized data (global and static variables),
- .bss: your uninitialized data (global and static variables).
For our first SSLS (stupidly simple linker script), we will use only the .text section for the code. No data, no variables, just pure code. So, let's create a file called script.ld and write something like this.
SECTIONS
{
    .text : { *(.text) }
}
The above script tells the linker to:
- create an output .text section (the leftmost expression),
- fill it with the .text sections from all object files (the expression in curly braces).
That's pretty simple, isn't it? But we are missing something, even a few somethings. We defined the code section, but we didn't specify where it should be placed. In its current form, the code will be placed at address 0x0, but from what we have read earlier, it should be loaded at address 0x08000000, right? Right. Let's fix this.
SECTIONS
{
    . = 0x08000000;
    .text : { *(.text) }
}
The dot symbol in linker scripts is the location counter. It starts from 0x0 and can be modified either directly, as in the example above, or indirectly by adding sections, constants etc. So if you read the location counter after the .text output section entry, it will be 0x08000000 plus the size of the added section. If you do not specify the address of an output section in some other way (other ways are described later), the address is set from the current value of the location counter.
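The behaviour of the location counter can be mimicked with a toy model in plain C. This is only a sketch to show the bookkeeping: layout_t, set_dot and place_section are invented names, and the model ignores the alignment a real linker applies.

```c
#include <stdint.h>

/* Toy model of the linker's location counter (the "." symbol). */
typedef struct {
    uint32_t dot;   /* current value of the location counter */
} layout_t;

/* Models a direct assignment such as ". = 0x08000000;". */
static void set_dot(layout_t *l, uint32_t addr)
{
    l->dot = addr;
}

/* Models emitting an output section of `size` bytes: the section starts
 * at the current counter value and the counter then moves past it. */
static uint32_t place_section(layout_t *l, uint32_t size)
{
    uint32_t start = l->dot;
    l->dot += size;
    return start;
}
```

Starting from a zeroed layout_t, calling set_dot with 0x08000000 and then place_section with an illustrative size of 0x58 bytes returns 0x08000000 as the section address and leaves the counter at 0x08000058.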
Alright, we have our code at a valid location, that's nice. If only the CPU knew where the code begins and where the stack starts…
ENTRY(main);

SECTIONS
{
    . = 0x08000000;
    LONG(0x20005000);
    LONG(main | 1);
    .text : { *(.text) }
}
I think I owe you an explanation.
If you remember, when STM32 boots, it reads two dwords from the boot memory (flash in our case); the first is the initial stack pointer and the second is the address where execution should start. LONG(0x20005000) simply instructs the linker to place this raw 4-byte value in the output binary. Why this value in particular? SRAM starts at 0x20000000, the STM32 has 20 KB (0x5000) of SRAM, and 0x20000000 + 0x5000 = 0x20005000 = top of the SRAM memory.
The second expression outputs the address of the main function to the binary file. As you see, the address is OR'ed with 1 to produce an odd value. In the ARM architecture, an odd function address tells the CPU to switch to Thumb mode on branch, as opposed to even addresses denoting ARM mode.
Not all branch instructions cause a mode switch. B or BL only branches; BX branches and switches mode according to the last bit of the address; BLX with an immediate target branches and always switches the mode. You can read more on the dedicated page.
STM32F103RBT6 is based on Cortex-M3, which supports only Thumb instructions; this is why we tell it at the start to switch to Thumb mode. This is normally transparent to a developer: the compiler either uses the BL instruction to keep the current mode, or fixes the calling address automatically. We do this manually here because we are creating the SSLS. This will become clearer when we develop the SSLS into the SLS (simple linker script).
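The two values written by the LONG() commands are plain arithmetic, which the following host-side C sketch reproduces. The helper names are invented for illustration; the SRAM base and size are the figures quoted above.

```c
#include <stdint.h>

#define SRAM_BASE 0x20000000u     /* start of SRAM, from the text */
#define SRAM_SIZE (20u * 1024u)   /* 20 KB = 0x5000 */

/* Initial stack pointer: one past the last SRAM byte, because the
 * stack grows downwards. */
static uint32_t initial_sp(void)
{
    return SRAM_BASE + SRAM_SIZE;
}

/* Value to store in the second boot word for a Thumb function:
 * the function address with bit 0 set. */
static uint32_t thumb_vector(uint32_t func_addr)
{
    return func_addr | 1u;
}

/* Address where the instructions actually live: bit 0 cleared again. */
static uint32_t code_address(uint32_t vector)
{
    return vector & ~1u;
}
```

initial_sp() evaluates to 0x20005000. For a function placed at, say, 0x08000010, thumb_vector() yields 0x08000011, and code_address() turns that back into 0x08000010: only the stored pointer changes, not the placement of the code.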
I also added another new thing: ENTRY(main). This tells the linker which symbol should be used as the entry point of the program. It also prevents the .text section containing the main function from being garbage collected by the linker.
Okay, we have a linker script, that's nice, but we also need something to link. Let's write some simple code that lights the green LED on a Nucleo board.
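#include "registers.h"

void main(void)
{
    /* Enable the clock for GPIO port A. */
    RCC->APB2ENR |= (1 << RCC_APB2ENR_IOPAEN);

    /* Configure pin PA5: MODE5 = 0b10 (output), CNF5 = 0b00 (push-pull). */
    GPIOA->CRL |= (0b10 << GPIOA_CRL_MODE5);
    GPIOA->CRL &= ~(0b11 << GPIOA_CRL_CNF5);

    /* Drive PA5 high. */
    GPIOA->BSRR = (1 << 5);

    /* Nothing to return to, so spin forever. */
    while (1);
}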
The mysterious registers.h file is a helper header containing registers' addresses. I've created it from information found in the reference manual. I simply defined a structure per group of registers, and then defined a pointer to the structure using the base address. Thanks to structures, I don't need to perform manual pointer arithmetic, because it's done automagically when accessing a structure's field.
#ifndef LINKER_TUTORIAL_REGISTERS_H
#define LINKER_TUTORIAL_REGISTERS_H

#include <stdint.h>

/* Fields are volatile so the compiler never drops or merges register accesses. */
typedef struct {
    volatile uint32_t CR;
    volatile uint32_t CFGR;
    volatile uint32_t CIR;
    volatile uint32_t APB2RSTR;
    volatile uint32_t APB1RSTR;
    volatile uint32_t AHBENR;
    volatile uint32_t APB2ENR;
    volatile uint32_t APB1ENR;
    volatile uint32_t BDCR;
    volatile uint32_t CSR;
} RCC_Reg;

#define RCC ((RCC_Reg*) 0x40021000)
#define RCC_APB2ENR_IOPAEN 2

typedef struct {
    volatile uint32_t CRL;
    volatile uint32_t CRH;
    volatile uint32_t IDR;
    volatile uint32_t ODR;
    volatile uint32_t BSRR;
    volatile uint32_t BRR;
    volatile uint32_t LCKR;
} GPIOA_Reg;

#define GPIOA ((GPIOA_Reg*) 0x40010800)
#define GPIOA_CRL_MODE5 20
#define GPIOA_CRL_CNF5 22

#endif //LINKER_TUTORIAL_REGISTERS_H
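To see what the structure trick and the bit shifts boil down to, here is a host-side C sketch that can be compiled on a PC. RCC_Layout simply repeats the field order of RCC_Reg, and the two helper functions are invented for this illustration. The CRL starting value of 0x44444444 used in the note below is the reset value as I read it from the reference manual, so treat it as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Same field order as RCC_Reg in registers.h. */
typedef struct {
    uint32_t CR, CFGR, CIR, APB2RSTR, APB1RSTR,
             AHBENR, APB2ENR, APB1ENR, BDCR, CSR;
} RCC_Layout;

#define RCC_BASE 0x40021000u

/* Address the compiler computes for RCC->APB2ENR: base + field offset. */
static uint32_t rcc_apb2enr_address(void)
{
    return RCC_BASE + (uint32_t)offsetof(RCC_Layout, APB2ENR);
}

/* Replays the two GPIOA->CRL statements from main() on a plain value. */
static uint32_t configure_pa5(uint32_t crl)
{
    crl |= (0x2u << 20);    /* MODE5 = 0b10 */
    crl &= ~(0x3u << 22);   /* CNF5  = 0b00 */
    return crl;
}
```

APB2ENR is the seventh 32-bit field, so rcc_apb2enr_address() returns 0x40021018. Feeding 0x44444444 into configure_pa5() gives 0x44244444: only the four bits that belong to pin 5 change.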
And that's all! Since the clock source is not configured, the STM32 will use its internal 8 MHz RC oscillator, and that's more than sufficient for this simple project. Let's compile and link it:
$ arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -Tscript.ld -Wl,--gc-sections -Os main.c
To compile and link, I set the CPU type to Cortex-M3 and the instruction set to Thumb, chose my linker script, told the linker to get rid of unused sections, and enabled code size optimization. If everything went well, the firmware file will be created as a.out. This file is in ELF format and can't be used directly to flash your microcontroller; instead you need to convert it to Intel HEX. This can be easily done with the following command:
$ arm-none-eabi-objcopy -O ihex a.out fw.hex
Before you load fw.hex to ST-Link utility or OpenOCD, take a few minutes to analyze its content. You can open it in any text editor, and some of them (like Sublime Text after installing appropriate plugin) can highlight specific parts for easier reading. You can read more about Intel HEX syntax on Wikipedia.
Take a look at the first two lines:
:02 0000 04 0800 F2
:08 0000 00 0050002011000008 71
The first is a 04 record (Extended Linear Address), which means it sets the upper 16 bits of the starting address for the following 00 records. As you see, the address is 0800, looks familiar, eh? If you extend it to 32 bits (that's how 04 records work) you will get 0x08000000. It's our flash address!
The next record's type is 00, which means data. This is exactly what will be loaded into the microcontroller. This particular line instructs the programmer to flash 8 bytes at the previously set address + 0x0000 offset. Let me translate the payload from little endian to big endian: 20005000 08000011. Holy crap, it's the initial stack pointer and probably the address of the main function! Let's execute one more command:
$ arm-none-eabi-objdump -D a.out
If you scroll the output to top, you should see something like this:
08000010 <main>:
 8000010:   4a07        ldr   r2, [pc, #28]   ; (8000030 <main+0x20>)
The main function is actually at address 08000010, but we OR'ed it earlier to produce an odd result. You see? The physical placement of the function didn't change, only the way calls are made.
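The hand decoding of the HEX records can be double-checked with a few lines of host-side C. This is a minimal sketch with invented helper names; it relies only on two properties of the Intel HEX format: the checksum is the two's complement of the sum of all other record bytes, and the ARM toolchain stores words least significant byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Intel HEX checksum: two's complement of the sum of all record bytes
 * (byte count, address, record type and data). */
static uint8_t ihex_checksum(const uint8_t *bytes, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + bytes[i]);
    return (uint8_t)(0u - sum);
}

/* Assemble a 32-bit word from four bytes stored least significant first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

Passing the bytes 02 00 00 04 08 00 of the 04 record to ihex_checksum() returns F2, the last byte of that record. Passing the payload bytes 00 50 00 20 and 11 00 00 08 to read_le32() returns 0x20005000 and 0x08000011.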
The code compiles, the stack pointer and entry point addresses are at valid locations, everything looks promising. Flash it baby! It worked perfectly on my board, the green LED lit up as it was supposed to.
The linker script we've created undoubtedly works and you can use it freely in your simple projects. But there's one caveat you must be aware of: you won't be able to modify global or static variables! The script lacks a .data section, so the linker will put all your statics and globals right after the .text section in flash. As a consequence they are readable, but not writable. You can clearly see this when you perform an object dump of the final binary file.
Disassembly of section .data:

08000058 <a>:
 8000058:   deadbeef    cdple   14, 10, cr11, cr13, cr15, {7}
I've added an additional global variable, int a = 0xDEADBEEF, and then compiled and linked using our script. As you see, the variable really does live in flash memory. Local variables aren't affected, because they are placed on the stack, so as long as you don't use global or static variables, this linker script will work for you. If you demand something more sophisticated, keep reading.
If you are reading this, that means the SSLS didn't satisfy your needs. That's good. The SSLS was meant to be just an example showing that a linker script doesn't have to be complicated to do its primary job. In this step we will create a Simple Linker Script that can truly be used in projects, without giving up basic language features (like global variables).
In the previous example we used the so-called location counter to set the starting address of the .text section. It's a sufficient approach for simple scripts, but it will become a complete mess as we add more memory regions. It's not only about looks: relying solely on the location counter limits us to a very basic configuration, and we would hit a wall very soon.
In a linker script we can define one, and only one, block named MEMORY. In this block we list all the memory regions that we want to use. The regions we define there don't need to reflect the microcontroller's memory layout exactly, although they are strongly correlated. The MEMORY block is only for you and for the linker; it doesn't affect the target device in any way.
So, what regions should we define in this block? That's obvious: flash and SRAM.
MEMORY
{
    flash (RX) : ORIGIN = 0x08000000, LENGTH = 128K
    sram (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}

ENTRY(main);

SECTIONS
{
    . = 0x08000000;
    LONG(0x20005000);
    LONG(main | 1);
    .text : { *(.text) }
}
The syntax of entries in MEMORY is fairly self-descriptive: each entry gives the region name, its attributes in parentheses (R = readable, W = writable, X = executable), its start address (ORIGIN) and its size (LENGTH).
As I said before, you are free to set regions as you like. You can have, for example, two flash regions: flash_1 starting from address 0x08000000 and flash_2 at 0x08001000. Why? I don't know, maybe you have some reasons to put a part of your code at a specific address.
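One practical benefit of describing regions is that the linker can complain when a section no longer fits into one. The C sketch below imitates that bounds check on the host. The region_t type and the fits() helper are invented for this example and only approximate what the linker does internally.

```c
#include <stdbool.h>
#include <stdint.h>

/* One entry of a MEMORY block: name, ORIGIN and LENGTH. */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} region_t;

/* The two regions from the script above (K means 1024 bytes). */
static const region_t flash = { "flash", 0x08000000u, 128u * 1024u };
static const region_t sram  = { "sram",  0x20000000u,  20u * 1024u };

/* True when [addr, addr + size) lies entirely inside the region.
 * Written so that none of the subtractions can wrap around. */
static bool fits(const region_t *r, uint32_t addr, uint32_t size)
{
    return addr >= r->origin &&
           size <= r->length &&
           addr - r->origin <= r->length - size;
}
```

For instance, fits(&sram, 0x20004FFC, 4) is true because that is the last word of SRAM, while fits(&sram, 0x20005000, 1) is false because the address is already one byte past the end of the region.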
And now it's time to reorganize the script a little.
MEMORY
{
    flash (RX) : ORIGIN = 0x08000000, LENGTH = 128K
    sram (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}

ENTRY(main);

SECTIONS
{
    .text :
    {
        LONG(0x20005000);
        LONG(main | 1);
        *(.text)
    } > flash
}
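Here's a list of things I've done:
- removed the location counter assignment, because the starting address now comes from the memory region,
- moved both LONG() entries into the .text section,
- assigned the .text section to the flash region with > flash.

We also need to do something with the SRAM memory. When we created the SSLS, variables were placed in the flash memory, because the linker was not aware of the existence of other memory regions. And now, we can finally tell it!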
MEMORY
{
    flash (RX) : ORIGIN = 0x08000000, LENGTH = 128K
    sram (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}

ENTRY(main);

SECTIONS
{
    .text :
    {
        LONG(0x20005000);
        LONG(main | 1);
        *(.text)
    } > flash

    .data : { *(.data) } > sram
}
That's all. We simply defined a new output section, .data, that will include all .data sections from all object files and will be placed in the SRAM memory. In order to peek at where the linker will put globals now, I've added one: int a = 0xDEADBEEF. Let's do the object dump and see…
Disassembly of section .data:

20000000 <a>:
20000000:   deadbeef    cdple   14, 10, cr11, cr13, cr15, {7}
That looks good! This time the global variable has been placed in SRAM memory where it can be both read and written. Let's also take a look at the last few lines of Intel HEX file (after doing the objcopy):
:02 0000 04 2000 DA
:04 0000 00 EFBEADDE C4
The first record tells the programmer to set the programming address to 0x20000000 and the next line tells it to write 0xDEADBEEF there. Looks good? Well… no. What you are trying to do here is flash data to SRAM, and that's not possible. Even if it were, everything would vanish at the first reset.
Here comes the first limitation of the Simple Linker Script: you can make use of global/static variables, but you can't set their initial value at declaration time. Actually, this is something you can live with, since the value can just as well be set at runtime.
Global variables that are declared without an initial value will end up in the .bss section. We haven't defined such a section yet, but the linker is smarter than us and placed it right after the .data section, exactly where it should be. And here comes the second (and last) limitation of the Simple Linker Script: uninitialized global/static variables won't be zeroed by default. Well, this is a small handicap, but still tolerable.
If you accept the two limitations I mentioned, the linker script will serve you well. If you still want more, go to the next section where you will learn how to properly initialize the .data and .bss sections, and you will also see how to prepare an interrupt vector table.
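To give an idea of what such initialization usually amounts to, here is a host-side C sketch of the two steps: copying the initial values of .data from flash to SRAM, and zeroing .bss. It is an assumption about the general technique, not the script developed in this article. On a real target the pointers and sizes would come from symbols exported by the linker script, which have not been defined here, so the function names and parameters below are purely hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copy the initial values of .data from where they are stored (flash)
 * to where the program expects them at runtime (SRAM). */
static void init_data(uint8_t *sram_dst, const uint8_t *flash_src, size_t size)
{
    memcpy(sram_dst, flash_src, size);
}

/* Fill .bss with zeros, which is what C promises for statics and
 * globals that have no explicit initial value. */
static void zero_bss(uint8_t *bss_start, size_t size)
{
    memset(bss_start, 0, size);
}
```

On a PC the two buffers can simply be arrays: after init_data() the destination holds the same bytes as the source image (for example EF BE AD DE for the 0xDEADBEEF variable), and after zero_bss() the given range reads as zeros regardless of what it contained before.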
It's time to write something that works in every aspect. We want a robust linker script that initialises variables with their predefined values and zeroes uninitialised ones. Only then can we say that we have everything that's required for a basic Linker Script.
Let's sum up what we are missing:
Let's do this sequentially because these points are correlated. We start with changing how the entry point and stack addresses are set.
We've set the addresses of the entry point and the stack pointer directly in the linker script. This solution works properly, but as you remember, we had to OR the entry point address to inform the CPU that the function at this address uses the Thumb instruction set. This shouldn't be done manually; we aren't supposed to do any low-level voodoo to write simple code, right? Can you imagine reworking every function call in your code? Thankfully, the compiler is aware of such voodoo and fixes all function calls accordingly; we just need to make use of its power.
The “problem” is that the compiler works at the source code level, so it'll properly fix all references there, but it knows nothing about the linker script, so the main reference in it is left untouched. I will do one more thing, just out of curiosity. Below the main function, I've added a global variable holding the address of main.
void (*main_ptr)(void) = main;
Now I compiled it and did the object dump. This is how the .data section looks:
Disassembly of section .data:

20000000 <main_ptr>:
20000000:   08000009    stmdaeq r0, {r0, r3}

And the actual address of main:
08000008 <main>
You see? Compiler automatically fixed the address, we did nothing. Now we just need to put this modified address to the beginning of the binary, and say bye-bye to manual ORing. But how do we put something at a specific memory address? It's easy: the same way we've put all the sections earlier.
Add this small block of code under your main function:
void (*prologue[]) (void) __attribute__((section (".prologue"))) = {
    (void (*)(void)) 0x20005000,
    main
};
Wow, slow down, Satan! This clearly needs an explanation. Let's start by breaking this into simpler parts.
- void (*prologue[]) (void): this is a definition of an array of pointers to functions that take nothing and return nothing;
- __attribute__: this is a special keyword that allows us to specify additional properties of functions, variables, structures etc.;
- section (".prologue"): this is a parameter to __attribute__ that tells the compiler to put the related symbol (the array here) into the section with the specified name (the name itself doesn't matter in this case).
Putting it together: define an array of pointers to void functions and put it into the .prologue section, initialising it with two items: the first is the initial stack pointer and the second is the address of the main function.
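The shape of that array can be tried out on a PC with the section attribute left out. In this sketch vector_t, demo_entry and demo_prologue are stand-in names chosen for illustration; it relies on the compiler accepting an integer constant cast to a function pointer in a static initializer, which GCC does.

```c
#include <stdint.h>

/* Pointer to a function that takes nothing and returns nothing. */
typedef void (*vector_t)(void);

/* Stand-in for main() on the target. */
static void demo_entry(void)
{
}

/* Same shape as the prologue array: slot 0 is really a number (the
 * initial stack pointer) squeezed into a function-pointer type, and
 * slot 1 is a genuine function address. */
static const vector_t demo_prologue[] = {
    (vector_t)(uintptr_t)0x20005000u,
    demo_entry,
};
```

Casting demo_prologue[0] back to uintptr_t gives 0x20005000 again, while demo_prologue[1] compares equal to demo_entry. Slot 0 must never be called; it only exists so that the right bit pattern ends up in the first word.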
Now we just need to tell linker to put this section with an array at the very beginning of a binary file, so the stack pointer and the entry point will be the first two values CPU reads on boot. We did that manually before, now we can have a more elegant solution.
MEMORY
{
    flash (RX) : ORIGIN = 0x08000000, LENGTH = 128K
    sram (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}

ENTRY(main);

SECTIONS
{
    .text :
    {
        KEEP(*(.prologue));
        *(.text)
    } > flash

    .data : { *(.data) } > sram
}
The KEEP() command tells the linker to exclude the mentioned section from garbage collection. The linker would otherwise remove it, because we didn't reference the prologue array anywhere in the code, so it could wrongly assume it's an unused symbol.
If you compile and dump the object file, you will see something beautiful at the beginning:
Disassembly of section .text:

08000000 <prologue>:
 8000000:   20005000    andcs   r5, r0, r0
 8000004:   08000009    stmdaeq r0, {r0, r3}

08000008 <main>:

Exactly how it should look!
In progress!