
Writing linker script for STM32 from scratch

In this article I will explain three things:

  • how to write a simple linker script from scratch,
  • how to do this for STM32 microcontrollers (STM32F103RBT6 in this article),
  • how to avoid using the vendor's files (headers, code etc.).

This topic is new for me as well: I've always used scripts provided by the vendor, and yesterday I decided to write one myself. Partly because I like challenging my brain, but mostly to learn something new. The article will guide you step by step from creating the simplest, barely working linker script, to a fully fledged Linker Script That Rocks.

I decided to focus on linker scripts for microcontrollers, because they require a good understanding of the target platform and give you the opportunity to decide about (almost) everything. You need to know memory addresses and what to place at specific locations. Don't worry, I will explain everything.

Tools preparation

You only need two things to start: the ARM toolchain and your favourite text editor. The newest ARM toolchain can be downloaded directly from the ARM website.

If you are lucky and you work on Mac or Linux, you can simply extract the toolchain anywhere and temporarily add its bin directory to your shell's PATH:

$ export PATH=/your/toolchain/bin:$PATH

Now you should be able to run all the toolchain's tools, which are prefixed with arm-none-eabi-.

Knowledge preparation

Before we can start, there's one more thing we need to know: the memory layout of the target device. You can't write a linker script without knowing where to put your code, your variables etc. The memory layout differs between architectures and devices, so you must grab the documentation for your microcontroller and find that information. For STM32F103RBT6 it can be found in the reference manual on page 53 (for SRAM) and page 54 (for flash).

  • SRAM starts at address 0x2000 0000
  • Flash starts at address 0x0800 0000

This essentially means that your code must go into flash, starting at 0x08000000, and your variables into SRAM, starting at 0x20000000. Don't worry, I will explain this later.

I almost forgot: you also need to know how the particular microcontroller boots. Most of the time you are responsible for setting up the stack pointer at boot time, and this is done differently on different architectures. The recipe is quite straightforward: find out how to set up the stack pointer, and set it to the top of the SRAM. That's logical: on Cortex-M the stack grows downwards, so it needs room to grow.

If you take a look at page 61, you will find a decent description of how STM32 boots up and how it sets up the stack pointer:

After this startup delay has elapsed, the CPU fetches the top-of-stack value from address 0x00000000, then starts code execution from the boot memory starting from 0x00000004.

Excellent, more useful things to write down.

  • Stack initializer: 0x0000 0000
  • Entry point: 0x0000 0004

There's one more important thing you need to know about STM32, also explained on page 61. When the CPU reads from address 0x00000000 onwards, it may actually access different memories, depending on the selected boot mode. This is called “memory mapping”. By default, STM32 boots from flash, so the flash region starting at 0x08000000 is mapped to 0x00000000: the flash memory is now accessible both at its original address and at 0x0000 0000. This allows the CPU to start reading instructions directly from flash.

With the above information in mind, and knowing that we want to boot from flash, we can already work out the appropriate addresses for the stack initializer and entry point:

  • Stack initializer: 0x0800 0000
  • Entry point: 0x0800 0004

The documentation is a little misleading here. The entry point is not where execution starts; it holds the address the CPU should jump to in order to start execution. In other words, the 32-bit word at 0x00000000 is loaded into the SP register, and the word at 0x00000004 is loaded into the PC register.
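As an illustration, here is a small host-side C sketch (it runs on a PC, not on the STM32) of what the core does with those two words at reset. The reset_state type and take_reset() function are made up for this example; clearing the low bits follows the reset pseudocode in the ARMv7-M Architecture Reference Manual, where bit 0 of the second word is the Thumb flag rather than part of the address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two registers the core sets up at reset. */
typedef struct {
    uint32_t sp;  /* main stack pointer */
    uint32_t pc;  /* address of the first instruction executed */
} reset_state;

/* vector_table stands for the memory seen at 0x00000000, which is the
 * flash at 0x08000000 when booting from flash. */
static reset_state take_reset(const uint32_t vector_table[2])
{
    assert(vector_table != 0);
    reset_state s;
    s.sp = vector_table[0] & ~UINT32_C(3);  /* word 0 -> SP (word-aligned) */
    s.pc = vector_table[1] & ~UINT32_C(1);  /* word 1 -> PC; bit 0 is the Thumb flag */
    return s;
}
```

Fed with 0x20005000 and 0x08000011, it yields SP = 0x20005000 and PC = 0x08000010.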

It's coding time

As promised at the beginning, our first goal is to write ANYTHING that works. So let's do this!

Stupidly Simple Linker Script (SSLS)

A linker script can consist of a single block called SECTIONS. In this block you define the output sections that will be placed in the binary file. The most important sections are:

  • .text - your code,
  • .data - your initialized data (global and static variables),
  • .bss - your uninitialized data (global and static variables, which must start as zero).
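To connect the list to actual C code, here is a sketch of where GCC usually places different kinds of objects on an ELF target such as arm-none-eabi. The names are arbitrary, and the exact section names can change with flags (for example -fdata-sections produces .data.counter). Note that constants typically land in .rodata, a section the list above doesn't mention.

```c
#include <assert.h>
#include <stdint.h>

/* Typical GCC placement on an ELF target (section names may carry a
 * suffix with -fdata-sections / -ffunction-sections). */
uint32_t counter = 42;          /* initialized global   -> .data   */
static uint32_t ticks;          /* no initializer       -> .bss    */
const uint32_t magic = 0x1234;  /* read-only constant   -> .rodata */

uint32_t next_tick(void)        /* function code        -> .text   */
{
    /* C guarantees ticks starts at 0; on a microcontroller it is the
     * startup code's job to make that true by clearing .bss. */
    return ++ticks;
}
```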

For our first SSLS (stupidly simple linker script), we will use only the .text section for the code. No data, no variables, just pure code. So, let's create a file called script.ld and write something like this.

SECTIONS
{
  .text : { *(.text) }
}

The above script tells the linker to:

  1. Create an output section named .text (the leftmost expression).
  2. Take all .text sections from all object files (the expression in curly braces).
  3. Put them into the section created in step 1.

That's pretty simple, isn't it? But we are missing something, or even a few somethings. We defined the code section, but we didn't specify where it should be placed. In its current form, the code will be placed at address 0x0, but from what we read earlier, it should be loaded at address 0x08000000, right? Right. Let's fix this.

SECTIONS
{
  . = 0x08000000;
  .text : { *(.text) }
}

The dot symbol in linker scripts is the location counter. It starts at 0x0 and can be modified either directly, as in the example above, or indirectly by adding sections, constants etc. So if you read the location counter right after the .text output section, its value would be 0x08000000 plus the size of that section. If you do not specify the address of an output section in some other way (other ways are described later), the address is set from the current value of the location counter.
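A rough host-side model of this rule may help: the location counter starts at the value you assign, each output section is placed at the current (aligned) counter, and the counter then advances by the section's size. The out_section type and place_sections() are hypothetical, and real ld does considerably more (input-section alignment, memory regions, load addresses); the alignment step is included only because ld does align output sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an output section for this sketch. */
typedef struct {
    const char *name;
    uint32_t size;   /* bytes */
    uint32_t align;  /* power of two */
    uint32_t vma;    /* filled in by place_sections() */
} out_section;

/* Round value up to a multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1u) & ~(align - 1u);
}

/* Mimic ". = start;" followed by a list of output sections: each section
 * starts at the aligned location counter, which then grows by its size.
 * Returns the final value of the location counter. */
static uint32_t place_sections(uint32_t start, out_section *sec, size_t count)
{
    uint32_t dot = start;
    for (size_t i = 0; i < count; i++) {
        assert(sec[i].align != 0 && (sec[i].align & (sec[i].align - 1u)) == 0);
        dot = align_up(dot, sec[i].align);
        sec[i].vma = dot;
        dot += sec[i].size;
    }
    return dot;
}
```

With a 0x46-byte .text followed by an 8-byte-aligned section, the second section starts at 0x08000048 rather than 0x08000046.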

Alright, we have our code at a valid location, that's nice. If only the CPU knew where the code begins and where the stack starts…

script.ld
ENTRY(main);
 
SECTIONS
{
  . = 0x08000000;
  LONG(0x20005000);
  LONG(main | 1);
  .text : { *(.text) }
}

I think I owe you an explanation.

If you remember, when STM32 boots, it reads two 32-bit words from the boot memory (flash in our case): the first is the initial stack pointer and the second is the address where execution should start. LONG(0x20005000) simply instructs the linker to place this raw 4-byte value in the output binary. Why this value in particular? SRAM starts at 0x20000000 and the STM32F103RB has 20 KB (0x5000 bytes) of SRAM, so 0x20000000 + 0x5000 = 0x20005000 is the top of SRAM.
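The arithmetic can be checked at compile time. The sketch below also shows why starting SP at 0x20005000, one byte past the end of SRAM, is safe: Cortex-M uses a full-descending stack, so a push decrements SP before storing. The macro names and first_push_address() are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* STM32F103RB: SRAM starts at 0x20000000 and is 20 KB long. */
#define SRAM_ORIGIN UINT32_C(0x20000000)
#define SRAM_LENGTH (UINT32_C(20) * 1024u)       /* 20 * 1024 = 0x5000 */
#define STACK_TOP   (SRAM_ORIGIN + SRAM_LENGTH)  /* one past the last SRAM byte */

static_assert(SRAM_LENGTH == 0x5000u, "20 KB is 0x5000 bytes");
static_assert(STACK_TOP == 0x20005000u, "initial SP used in script.ld");

/* Full-descending stack: a 32-bit push first decrements SP by 4 and then
 * stores, so the first pushed word lands just below STACK_TOP. */
static uint32_t first_push_address(uint32_t sp)
{
    return sp - 4u;
}
```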

The second expression writes the address of the main function to the binary file. As you can see, the address is OR'ed with 1 to produce an odd value. In the ARM architecture, an odd function address tells the CPU to switch to Thumb mode on branch, whereas an even address denotes ARM mode.

Not all branch instructions cause a mode switch. B and BL only branch; BX branches and switches the mode according to the lowest bit of the target address; BLX with an immediate target always switches the mode, while BLX with a register operand follows the lowest bit, just like BX. You can read more on the dedicated page.

STM32F103RBT6 is based on the Cortex-M3, which supports only Thumb instructions; this is why we tell it right at the start to run in Thumb mode. This is normally transparent to the developer: the compiler either uses the BL instruction, which keeps the current mode, or fixes the call address automatically. We have to do it manually here because our script refers to main directly, and this stupidly simple approach doesn't get that fix-up. This will become clearer when we develop SSLS into SLS (simple linker script).
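The interworking rule can be written down as a tiny host-side C model. decode_bx_target() and usable_on_cortex_m3() are hypothetical helpers, not a real API; the fault mentioned in the comment reflects that ARMv7-M cores such as the Cortex-M3 have no ARM state to switch to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of an interworking branch target. */
typedef struct {
    uint32_t address;  /* where execution continues (bit 0 cleared) */
    bool thumb;        /* bit 0 set: Thumb state, clear: ARM state */
} branch_target;

/* What BX (or BLX with a register operand) does with the target value. */
static branch_target decode_bx_target(uint32_t value)
{
    branch_target t;
    t.address = value & ~UINT32_C(1);
    t.thumb = (value & 1u) != 0;
    return t;
}

/* Cortex-M3 has no ARM state, so a target with bit 0 clear cannot be
 * executed there; the real core raises a fault instead of switching. */
static bool usable_on_cortex_m3(uint32_t value)
{
    return decode_bx_target(value).thumb;
}
```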

I also added another new thing: ENTRY(main). This tells the linker which symbol should be used as the entry point of the program. It also prevents the section containing the main function from being garbage collected by the linker.

Okay, we have a linker script, that's nice, but we also need something to link. Let's write some simple code that lights the green LED on the Nucleo board.

main.c
#include "registers.h"
 
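void main(void) {
    RCC->APB2ENR |= (1 << RCC_APB2ENR_IOPAEN);   // enable the GPIOA clock
    GPIOA->CRL |= (0b10 << GPIOA_CRL_MODE5);     // PA5 mode: output, max 2 MHz
    GPIOA->CRL &= ~(0b11 << GPIOA_CRL_CNF5);     // PA5 config: push-pull
    GPIOA->BSRR = (1 << 5);                      // drive PA5 high: LED on
 
    while (1);                                   // nothing else to do
}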

The mysterious registers.h file is a helper header containing register addresses. I created it from information found in the reference manual: I defined a structure per group of registers, and then a pointer to that structure at the group's base address. Thanks to the structures, I don't need to do any pointer arithmetic manually, because it's done automagically when accessing a structure's field.

registers.h
#ifndef LINKER_TUTORIAL_REGISTERS_H
#define LINKER_TUTORIAL_REGISTERS_H
 
#include <stdint.h>
 
typedef struct {
    uint32_t CR;
    uint32_t CFGR;
    uint32_t CIR;
    uint32_t APB2RSTR;
    uint32_t APB1RSTR;
    uint32_t AHBENR;
    uint32_t APB2ENR;
    uint32_t APB1ENR;
    uint32_t BDCR;
    uint32_t CSR;
} RCC_Reg;
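// volatile: every register access must actually reach the hardware
#define RCC ((volatile RCC_Reg*) 0x40021000)
#define RCC_APB2ENR_IOPAEN 2   // bit 2: I/O port A clock enable
 
typedef struct {
    uint32_t CRL;
    uint32_t CRH;
    uint32_t IDR;
    uint32_t ODR;
    uint32_t BSRR;
    uint32_t BRR;
    uint32_t LCKR;
} GPIOA_Reg;
#define GPIOA ((volatile GPIOA_Reg*) 0x40010800)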
#define GPIOA_CRL_MODE5 20
#define GPIOA_CRL_CNF5 22
 
#endif //LINKER_TUTORIAL_REGISTERS_H
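Because registers.h relies on the structure layout matching the reference manual, it can be worth letting the compiler check the offsets. This is a host-compilable C11 sketch (static_assert, offsetof) using the RCC and GPIO offsets from the RM0008 register maps. The members are declared volatile, a common precaution for memory-mapped registers so the compiler never caches or drops accesses; pa5_as_output() is a hypothetical helper that replays main()'s CRL manipulation on a plain value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layouts as registers.h, with volatile members for MMIO use. */
typedef struct {
    volatile uint32_t CR, CFGR, CIR, APB2RSTR, APB1RSTR,
                      AHBENR, APB2ENR, APB1ENR, BDCR, CSR;
} RCC_Reg;

typedef struct {
    volatile uint32_t CRL, CRH, IDR, ODR, BSRR, BRR, LCKR;
} GPIOA_Reg;

#define GPIOA_CRL_MODE5 20
#define GPIOA_CRL_CNF5  22

/* Offsets as listed in the RM0008 register maps. */
static_assert(offsetof(RCC_Reg, APB2ENR) == 0x18, "RCC_APB2ENR");
static_assert(offsetof(RCC_Reg, CSR) == 0x24, "RCC_CSR");
static_assert(offsetof(GPIOA_Reg, BSRR) == 0x10, "GPIOx_BSRR");
static_assert(offsetof(GPIOA_Reg, LCKR) == 0x18, "GPIOx_LCKR");

/* The read-modify-write main() applies to GPIOA->CRL, done on a plain
 * value so it can be checked on a PC. */
static uint32_t pa5_as_output(uint32_t crl)
{
    crl |= (0x2u << GPIOA_CRL_MODE5);   /* MODE5 = 0b10: output, 2 MHz */
    crl &= ~(0x3u << GPIOA_CRL_CNF5);   /* CNF5  = 0b00: push-pull */
    return crl;
}
```

Starting from the documented CRL reset value 0x44444444, the helper yields 0x44244444: pin 5's nibble changes from 0x4 (floating input) to 0x2 (push-pull output, 2 MHz).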

And that's all! Since the clock source is not configured, the STM32 runs from its internal 8 MHz RC oscillator (HSI), which is more than sufficient for this simple project. Let's compile and link it:

$ arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -Tscript.ld -Wl,--gc-sections -Os main.c

This command sets the CPU type to Cortex-M3 and the instruction set to Thumb, selects our linker script (-Tscript.ld), tells the linker to get rid of unused sections (-Wl,--gc-sections) and optimizes for code size (-Os). If everything went well, the firmware file will be created as a.out. This file is in ELF format, which not every flashing tool accepts, so we convert it to Intel HEX. This can be easily done with the following command:

$ arm-none-eabi-objcopy -O ihex a.out fw.hex

Before you load fw.hex into the ST-Link utility or OpenOCD, take a few minutes to analyze its content. You can open it in any text editor, and some of them (like Sublime Text with an appropriate plugin) can highlight specific parts for easier reading. You can read more about the Intel HEX syntax on Wikipedia.

Take a look at the first two lines:

fw.hex
: 02 0000 04 0800 F2
: 08 0000 00 0050002011000008 6F

The first is a 04 record (Extended Linear Address), which sets the upper 16 bits of the address for the following 00 records. As you can see, the value is 0800; looks familiar, eh? Use it as the upper half of a 32-bit address (that's how 04 records work) and you get 0x08000000. It's our flash address!
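The last byte of every record is a checksum, so it's easy to verify a record by hand or in code. Here's a small C sketch; the function names are made up, and it follows the usual definition: the two's complement of the low byte of the sum of all preceding record bytes (count, address, type and data). The second helper shows how a 04 record's value becomes the upper half of the 32-bit address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intel HEX checksum over count, address, type and data bytes. */
static uint8_t ihex_checksum(const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + bytes[i]);   /* keep only the low byte */
    return (uint8_t)(0x100u - sum);        /* two's complement */
}

/* A type 04 record gives the upper 16 bits; type 00 records give the
 * lower 16 bits (their address field). */
static uint32_t ihex_full_address(uint16_t upper, uint16_t offset)
{
    return ((uint32_t)upper << 16) | offset;
}
```

For the first record above, 0x02 + 0x00 + 0x00 + 0x04 + 0x08 + 0x00 = 0x0E, and 0x100 - 0x0E = 0xF2.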

The next record's type is 00, which means data. This is exactly what will be loaded into the microcontroller. This particular line instructs the programmer to write 8 bytes at the previously set address + 0x0000 offset. Let me read the payload as two little-endian 32-bit words: 20005000 08000011. Holy crap, it's the initial stack pointer and probably the address of the main function! Let's execute one more command:
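Reading a little-endian word means the first byte is the least significant one. A minimal C helper (le32() is a hypothetical name) makes the conversion explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a 32-bit word from four bytes stored least significant first,
 * as they appear in the data field of the HEX record. */
static uint32_t le32(const uint8_t b[4])
{
    assert(b != 0);
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Applied to the payload 00 50 00 20 11 00 00 08, it returns 0x20005000 for the first four bytes and 0x08000011 for the next four.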

$ arm-none-eabi-objdump -D a.out

If you scroll to the top of the output, you should see something like this:

08000010 <main>:
 8000010: 4a07    ldr r2, [pc, #28] ; (8000030 <main+0x20>)

The main function is actually at address 0x08000010; the 0x08000011 in the HEX file is the result of OR'ing it with 1. You see? The physical placement of the function didn't change; the extra bit only tells the CPU to execute it in Thumb mode.

The code compiles, the stack pointer and entry point values are at valid locations, everything looks promising. Flash it, baby! It worked perfectly on my board: the green LED lit up as it was supposed to.

Important notice

The linker script we've created undoubtedly works, and you can use it freely in your simple projects. But there's one caveat you must be aware of: you won't be able to modify global or static variables! The script lacks a .data section, so the linker puts all your statics and globals right after the .text section, in flash. As a consequence they are readable, but not writable. You can clearly see this when you dump the final binary with objdump.

Disassembly of section .data:

08000058 <a>:
 8000058:	deadbeef 	cdple	14, 10, cr11, cr13, cr15, {7}

I added a global variable, int a = 0xDEADBEEF, and then compiled and linked the code using our script. As you can see, the variable really lives in flash memory. Local variables aren't affected, because they are placed on the stack (or in registers), so as long as you don't use global or static variables, this linker script will work for you. If you need something more sophisticated, keep reading.

Simple Linker Script (SLS)

If you are reading this, SSLS didn't satisfy your needs. That's good. SSLS was meant to be just an example showing that a linker script doesn't have to be complicated to do its primary job. In this step we will create a Simple Linker Script that can truly be used in projects, without giving up basic language features (like global variables).

Adding a new block: MEMORY

In the previous example we used the so-called location counter to set the starting address of the .text section. That's sufficient for simple scripts, but it will become a complete mess as we add more memory regions. And it's not just about looks: by relying solely on the location counter we limit ourselves to a very basic configuration, and we would hit a wall very soon.

In a linker script we can define one, and only one, block named MEMORY. In this block we list all memory regions we want to use. The regions defined there don't need to reflect the microcontroller's memory layout exactly, but they are strongly correlated with it. The MEMORY block exists only for you and the linker; it doesn't affect the target device in any way.

So, what regions should we define in this block? That's obvious: flash and SRAM.

sls.ld
MEMORY {
  flash   (RX) : ORIGIN = 0x08000000, LENGTH = 128K
  sram    (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}
 
ENTRY(main);
 
SECTIONS
{
  . = 0x08000000;
  LONG(0x20005000);
  LONG(main | 1);
  .text : { *(.text) }
}

The syntax of entries in MEMORY is kinda self-descriptive.

  • The first column is the name of a region; it can be anything meaningful to you.
  • The second lists the region's attributes: Read and eXecute for flash, Read and Write for SRAM. The linker uses them to decide where to put input sections that the script doesn't place explicitly; they don't enforce any access protection on the device.
  • The next is the starting address of the region, usually taken from the microcontroller's documentation.
  • The last column sets the maximum size of the region; this prevents you from putting too much data into it. The linker raises an error if it detects such a situation.
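To illustrate what ORIGIN and LENGTH give the linker, here is a hypothetical C mirror of the two regions together with a bounds check similar in spirit to the one behind ld's region-overflow error. It's a sketch of the idea, not how ld is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of one MEMORY entry. */
typedef struct {
    const char *name;
    uint32_t origin;
    uint32_t length;
} mem_region;

static const mem_region flash = { "flash", 0x08000000u, 128u * 1024u };
static const mem_region sram  = { "sram",  0x20000000u,  20u * 1024u };

/* First address past the region (ORIGIN + LENGTH). */
static uint32_t region_end(const mem_region *r)
{
    return r->origin + r->length;
}

/* Does [start, start + size) lie entirely inside the region?
 * Written to avoid 32-bit overflow near the top of the address space. */
static bool fits(const mem_region *r, uint32_t start, uint32_t size)
{
    assert(r != 0);
    return start >= r->origin && size <= r->length &&
           start - r->origin <= r->length - size;
}
```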

As I said before, you are free to define regions as you like. You can have, for example, two flash regions: flash_1 starting at address 0x08000000 and flash_2 at 0x08001000. Why? I don't know, maybe you have a reason to put part of your code at a specific address.

And now it's time to reorganize the script a little.

sls.ld
MEMORY {
  flash   (RX) : ORIGIN = 0x08000000, LENGTH = 128K
  sram    (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}
 
ENTRY(main);
 
SECTIONS
{
  .text :
  {
    LONG(0x20005000);
    LONG(main | 1);
    *(.text) 
  } > flash
}

Here's a list of things I've done:

  • Removed the direct location counter manipulation. Since we explicitly told the linker where to put the contents of the section, there's no need to set it manually anymore.
  • Moved the stack pointer and entry point values into the .text section.
  • Told the linker to put this section into flash memory: > flash.

We also need to do something with the SRAM. When we created SSLS, variables were placed in flash memory because the linker was not aware of any other memory region. Now we can finally tell it!

sls.ld
MEMORY {
  flash   (RX) : ORIGIN = 0x08000000, LENGTH = 128K
  sram    (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}
 
ENTRY(main);
 
SECTIONS
{
  .text :
  {
    LONG(0x20005000);
    LONG(main | 1);
    *(.text) 
  } > flash
  .data :
  {
    *(.data)
  } > sram
}

That's all. We simply defined a new output section, .data, that includes all .data sections from all object files and is placed in SRAM. To see where the linker puts globals now, I've added one: int a = 0xDEADBEEF. Let's dump the object file and see…

Disassembly of section .data:

20000000 <a>:
20000000:	deadbeef 	cdple	14, 10, cr11, cr13, cr15, {7}

That looks good! This time the global variable has been placed in SRAM, where it can be both read and written. Let's also take a look at the last few lines of the Intel HEX file (after running objcopy):

:02 0000 04 2000 DA
:04 0000 00 EFBEADDE C4

The first record tells the programmer to set the programming address to 0x20000000, and the next line tells it to write 0xDEADBEEF there. Looks good? Well… no. What we're trying to do here is flash data into SRAM, which a flash programmer can't do, and even if the data got there, it would vanish at the first reset.

Here comes the first limitation of the Simple Linker Script: you can use global/static variables, but you can't set their initial values at declaration time. Actually, this is something you can live with: the values can be set at runtime instead.

What about uninitialized variables?

Global and static variables without an initializer (or initialized to zero) end up in the .bss section. We didn't define such a section yet, but the linker is smarter than us and placed it right after the .data section, exactly where it should be. And here comes the second (and last) limitation of the Simple Linker Script: uninitialized global/static variables won't be zeroed. Well, this is a little handicap, but still tolerable.
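Both limitations come down to work that normally happens before main() runs: a startup routine copies the initial values of .data from flash into SRAM and fills .bss with zeros. Below is a host-side sketch of those two loops. init_sections() and its parameters are hypothetical; on the target, the pointers and sizes would come from symbols defined in the linker script, and .data would additionally need a load address in flash (GNU ld's AT> syntax), none of which the Simple Linker Script provides yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy .data's initial values from their load address (flash) to their
 * run address (SRAM), then zero .bss. Plain arrays stand in for the
 * memory regions so the sketch runs on a PC. */
static void init_sections(uint32_t *data_run, const uint32_t *data_load,
                          size_t data_words, uint32_t *bss, size_t bss_words)
{
    assert((data_run && data_load) || data_words == 0);
    assert(bss || bss_words == 0);
    for (size_t i = 0; i < data_words; i++)
        data_run[i] = data_load[i];   /* initialized globals get their values */
    for (size_t i = 0; i < bss_words; i++)
        bss[i] = 0;                   /* uninitialized globals become zero */
}
```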

If you accept these two limitations, this linker script will serve you well. If you still want more, go on to the next section, where you will learn how to properly initialize the .data and .bss sections, and also how to prepare the interrupt vector table.

Linker Script (LS)

It's time to write something that works in every aspect. We want a robust linker script that initialises variables with their predefined values and zeroes uninitialised ones. Only then can we say that we have everything required for a basic Linker Script.

Let's sum up what we are missing:

  • proper entry point and stack definition,
  • interrupt vectors,
  • data initialisation.

Let's do this sequentially, because these points are related. We start by changing how the entry point and stack addresses are set.

Entry point and stack definitions

We've set the entry point and stack pointer addresses directly in the linker script. This solution works, but as you remember, we had to OR the entry point address to tell the CPU that the function at this address uses the Thumb instruction set. This shouldn't be done manually; we aren't supposed to do any low-level voodoo to write simple code, right? Can you imagine reworking every function call in your code? Thankfully, the compiler is aware of such voodoo and fixes all function addresses accordingly; we just need to make use of its power.

The “problem” is that the compiler works at the source code level, so it properly fixes all references there, but it knows nothing about the linker script, so the main reference in the script is left untouched. I will do one more thing, just out of curiosity: below the main function, I've added a global variable holding the address of main.

void (*main_ptr)(void) = main;

Then I compiled it and dumped the object file. This is what the .data section looks like:

Disassembly of section .data:
 
20000000 <main_ptr>:
20000000:	08000009 	stmdaeq	r0, {r0, r3}

And the actual address of main:

08000008 <main>

You see? The compiler fixed the address automatically; we did nothing. Now we just need to put this modified address at the beginning of the binary and say bye-bye to manual ORing. But how do we put something at a specific memory address? It's easy: the same way we've placed all the sections earlier.

Add this small block of code below your main function:

void (*prologue[]) (void) __attribute__((section (".prologue"))) = {
        (void (*)(void)) 0x20005000,  // initial stack pointer: top of SRAM
        main                          // entry point; the compiler sets the Thumb bit
};

Whoa, slow down, Satan! This clearly needs an explanation. Let's start by breaking it into simpler parts.

  • void (*prologue[]) (void) - this is a definition of an array of pointers to functions that take nothing and return nothing;
  • __attribute__ - a special keyword (a GCC extension) that allows us to specify additional properties of functions, variables, structures etc.;
  • section (".prologue") - a parameter to __attribute__ that tells the compiler to put the related symbol (the array here) into the section with the specified name (the name itself is arbitrary, it just has to match the one used in the linker script);

Putting it together: define an array of pointers to void functions, put it into the .prologue section, and initialise it with two items: the first is the initial stack pointer and the second is the address of the main function.
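If the nested declarator is hard to read, a typedef expresses the same array more plainly. In this sketch, vector_t is a made-up typedef and reset_handler stands in for main (a test program can't reuse the name main); the section attribute is left out so the snippet also builds on a PC, but on the STM32 build it would carry __attribute__((section(".prologue"))) exactly as in the original declaration.

```c
#include <assert.h>
#include <stdint.h>

/* Pointer to a function that takes nothing and returns nothing. */
typedef void (*vector_t)(void);

void reset_handler(void);  /* stand-in for main() */

/* Same contents as the prologue array: initial SP, then the entry point. */
vector_t prologue[] = {
    (vector_t)(uintptr_t)0x20005000u,  /* initial stack pointer (top of SRAM) */
    reset_handler,                     /* entry point */
};

void reset_handler(void)
{
    for (;;) {
        /* application code would run here */
    }
}
```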

Now we just need to tell the linker to put the section containing this array at the very beginning of the binary file, so the stack pointer and the entry point will be the first two values the CPU reads on boot. We did that by hand before; now we have a more elegant solution.

script.ld
MEMORY {
  flash   (RX) : ORIGIN = 0x08000000, LENGTH = 128K
  sram    (RW) : ORIGIN = 0x20000000, LENGTH = 20K
}
 
ENTRY(main);
 
SECTIONS
{
  .text :
  {
    KEEP(*(.prologue));
    *(.text)
  } > flash
  .data :
  {
    *(.data)
  } > sram
}

The KEEP() command tells the linker to exclude the given section from garbage collection. Without it, the linker would discard the section: we don't reference the prologue array anywhere in the code, so the linker would wrongly assume it's unused.

If you compile and dump the object file, you will see something beautiful at the beginning:

Disassembly of section .text:

08000000 <prologue>:
 8000000:	20005000 	andcs	r5, r0, r0
 8000004:	08000009 	stmdaeq	r0, {r0, r3}

08000008 <main>:

Exactly how it should look!

In progress!

hardware/writing_linker_script_for_stm32_from_scratch.txt · Last modified: 2020/06/16 02:00 by itachi