Put arbitrary data into binary
When developing for embedded platforms, there is sometimes a need to include an arbitrary chunk of data in the final binary. When the data is small and not expected to change often, it can be kept as a constant array of bytes, words, or whatever fits. Depending on the target platform and toolchain, the linker either puts constants into flash memory out of the box, or additional steps are required. For example, on the AVR platform one can use the PROGMEM [1] macro to mark objects that should be placed in flash memory. On RP2040 [2] (ARM) it should be enough to add the const modifier, because the .rodata section is part of the flash region.
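For small data, the constant-array approach can look like the sketch below. The array name `splash_icon`, its contents, and the checksum helper are made up for illustration; the only thing taken from the text above is that on RP2040 the `const` qualifier is what allows the array to land in `.rodata`. AVR would additionally need `PROGMEM`, which is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8x8 monochrome icon, one byte per row.
 * `const` lets the toolchain place it in .rodata, which on RP2040
 * is part of the flash region. */
static const uint8_t splash_icon[] = {
    0x3C, 0x42, 0xA5, 0x81, 0xA5, 0x99, 0x42, 0x3C,
};

/* Number of elements, computed at compile time. */
#define SPLASH_ICON_SIZE (sizeof splash_icon / sizeof splash_icon[0])

/* Catch accidental edits to the table at compile time. */
static_assert(sizeof splash_icon == 8, "icon must have exactly 8 rows");

/* Sum of all bytes, as a trivial example of reading the data. */
unsigned splash_icon_checksum(void)
{
    unsigned sum = 0;
    for (size_t i = 0; i < SPLASH_ICON_SIZE; ++i) {
        sum += splash_icon[i];
    }
    return sum;
}
```

This stays manageable for a handful of bytes, which is exactly why it stops being practical once the data grows to megabytes.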
Things get more complicated when the data is large, let’s say 2 MB. Why would anyone keep that much data on an embedded platform? The reasons vary: it can be an image to display, a WAV file to play, or a configuration file. No matter the reason, it’s going to be a horror to keep and maintain an array of 2 million elements. And what if the data changes? I can’t imagine recreating the array by hand, even with some help from Python scripts. The ideal solution would be to include these files automatically when the binary is built. Good news: it’s possible!
Know the platform
Unfortunately, there’s no universal solution; as I mentioned in the introduction, different platforms might require different approaches. What’s universal, though, is the goal: the data must be in flash memory, and it must be accessible from code. There’s no magic here unless some fancy framework is in use, and everything usually starts with reading the linker script used to assemble the final binary. For this article I’m using RP2040, the microcontroller better known from the Raspberry Pi Pico board.
Include with .incbin
The ARM assembler (like most other assemblers) supports a neat directive called .incbin [3]. The purpose of this directive is, of course, to include binary data from a file. In order to use it, an assembly file must be created and added as a source file to the project. The content of the file should be:
.section .flashdata
.balign 4
.global data_file_1
.global data_file_1_size
data_file_1:
.incbin "file1"
.set data_file_1_size, . - data_file_1
This small piece of assembly includes, at the current position, whatever is in the file1 file. The “current position” is in this case somewhere in the .flashdata section, aligned to a 4-byte boundary. The actual placement is unknown until the file is assembled and linked. Thanks to the two global symbols, one holding the address of the data and the other its size, the data can be accessed later from C code. Although having a separate symbol for the size is optional, it comes in very handy.
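The 4-byte alignment matters mostly when the data is read as something wider than single bytes. As a rough illustration, the sketch below checks the alignment of a start address and reads a 32-bit little-endian value in a way that does not depend on it. The helper names `is_aligned` and `read_u32_le` are hypothetical and not part of any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if `ptr` is a multiple of `alignment` (which must be a power of two). */
bool is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}

/* Read a 32-bit little-endian value byte by byte; works at any address. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With the data aligned by `.balign 4`, a check such as `is_aligned(data_file_1, 4)` should hold for the start of the included file, while the byte-wise reader stays safe for offsets inside it.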
The assembler will look for files given to the .incbin directive in the include directories, i.e. directories supplied to the compiler with the -I option. When using CMake, the directories can be added with
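In order to use these external symbols in C code, they must be declared as arrays of unknown size, so that each name evaluates to the symbol value rather than to a byte stored at that location:
extern const char data_file_1[];
extern const char data_file_1_size[];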
The type of the symbols doesn’t matter because they are only placeholders for the symbol values. The following object dump should help in understanding this:
1000d044 g       .rodata  00000000 data_file_1
0000002d g       *ABS*    00000000 data_file_1_size
The first column is the “symbol value”, i.e. the value to which the symbol resolves when it’s used. For data_file_1 the value is an address in memory, but for data_file_1_size the value is just a raw number equal to the size of the data (0x2d, i.e. 45 bytes).
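To make the “symbol value” idea concrete, here is a small sketch of how C code can turn the two symbols into a pointer and a length. The `blob_view` type and the `blob_from_symbols` helper are hypothetical names introduced for this illustration. The sketch assumes both symbols are declared as arrays of unknown size, so that each name evaluates to its symbol value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: a pointer to embedded data plus its length. */
struct blob_view {
    const char *data;
    size_t size;
};

/*
 * Build a view from the two symbols produced by the assembly file.
 *   data_sym - the data symbol; its value is a real address in memory
 *   size_sym - the size symbol; its "address" is not a location at all,
 *              it is the raw byte count, so it must never be dereferenced
 */
struct blob_view blob_from_symbols(const char *data_sym, const char *size_sym)
{
    assert(data_sym != NULL);
    struct blob_view view = {
        .data = data_sym,
        .size = (size_t)(uintptr_t)size_sym, /* symbol value -> number */
    };
    return view;
}
```

With declarations of the form `extern const char data_file_1[];` and `extern const char data_file_1_size[];`, a call would look like `blob_from_symbols(data_file_1, data_file_1_size)`. For the object dump above, `size` would come out as 0x2d.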
On RP2040 I can now print the content to the console:
printf("%.*s", (int) data_file_1_size, data_file_1);
In my file I had some text content. It isn’t null-terminated by default, so I had to supply the size manually.
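Since the included data carries no terminator, any code that prints or copies it has to pass the length explicitly. Below is a minimal sketch of two ways to do that with the C standard library. The function names `format_blob_text` and `write_blob_raw` are made up for this example. `%.*s` takes the precision as an `int` argument and stops after that many bytes (or at the first NUL byte, if one appears earlier), while `fwrite` emits every byte, which is probably the better fit for binary content.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format `size` bytes of text into `out`; the source needs no terminator.
 * Returns what snprintf returns: the number of characters that the full
 * output requires, not counting the terminating NUL. */
int format_blob_text(char *out, size_t out_size, const char *data, size_t size)
{
    assert(out != NULL && data != NULL);
    return snprintf(out, out_size, "%.*s", (int)size, data);
}

/* Write exactly `size` bytes to `stream`, including any embedded NULs.
 * Returns the number of bytes actually written. */
size_t write_blob_raw(FILE *stream, const char *data, size_t size)
{
    assert(stream != NULL && data != NULL);
    return fwrite(data, 1, size, stream);
}
```

The cast to `int` is needed because the `*` precision argument of `printf`-style functions is an `int`, so this approach only suits data whose size fits into that type.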
Automate with .macro
Everything looks okay until there’s a need to include 10 or 20 files. A quick copy-paste of the assembly listing above would solve the problem, but it looks bad. What to do now? Of course: another directive!
.macro incfile file:req, name:req
.section .flashdata
.balign 4
.global \name
.global \name\()_size
\name:
.incbin "\file"
.set \name\()_size, . - \name
.endm
This is the same code as before but wrapped in a macro. The macro takes two arguments: a file name, and a symbol name. Now, instead of ugly duplications, multiple files can be included like below:
incfile file1,data_file_1
incfile file2,data_file_2
The symbols can be referenced later in C code as before: