UNDER CONSTRUCTION
This article describes some ABI and toolchain notes about systems without a Memory Management Unit (MMU), with a focus on FDPIC and the future RISC-V FDPIC ABI.
Linux binfmt loaders
fs/Kconfig.binfmt defines a few loaders.
- BINFMT_ELF defaults to y and depends on MMU.
- BINFMT_ELF_FDPIC defaults to y when BINFMT_ELF is not selected. A few architectures support BINFMT_ELF_FDPIC for NOMMU. ARM supports FDPIC even with an MMU.
- BINFMT_FLAT is provided for a few architectures.
- BINFMT_AOUT, removed in 2022, had been supported for alpha/arm/x86-32.
BFLT
uClinux uses an object file format called Binary Flat format (BFLT). https://myembeddeddev.blogspot.com/2010/02/uclinux-flat-file-format.html has an introduction.
The Linux kernel has supported the format since before the git era (fs/binfmt_flat.c). Both version 2 (OLD_FLAT_VERSION in the kernel) and version 4 are supported. Shared library support was removed in April 2022.
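To make the version check concrete, here is a minimal sketch of how a tool might recognize a BFLT image. It assumes the header begins with the 4-byte magic "bFLT" followed by a 32-bit big-endian revision field; bflt_revision and read_be32 are hypothetical helper names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit big-endian value (assumption: header fields are big-endian). */
uint32_t read_be32(const unsigned char *p) {
  return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Hypothetical helper: return the revision (e.g. 2 or 4) if buf starts
 * with a BFLT header, or -1 if the buffer is too short or the magic differs. */
int bflt_revision(const unsigned char *buf, size_t len) {
  assert(buf != NULL);
  if (len < 8 || memcmp(buf, "bFLT", 4) != 0)
    return -1;
  return (int)read_be32(buf + 4);
}
```

A caller would read the first 8 bytes of a file and accept the image only when the returned revision is one it knows how to load.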
BFLT is not for relocatable files. An image file is typically converted from ELF using elf2flt. ld-elf2flt is an ld wrapper that invokes elf2flt when the option -elf2flt is seen.
GCC's m68k port supports -msep-data, a special -fPIC mode, which assumes that text and data segments are placed in different areas of memory. This option is used for XIP (eXecute In Place).
-mno-pic-data-is-text-relative
kpatch (live kernel patching) uses this option for s390x.
FDPIC
The Linux kernel has supported the format since before the git era (fs/binfmt_elf_fdpic.c). Only ET_DYN executables are supported. Each port that supports FDPIC defines an EI_OSABI value to be checked by the loader.
Several architectures define an FDPIC ABI.
- Fujitsu FR-V: The FR-V FDPIC ABI, initial version in 2004
- Blackfin: Blackfin FDPIC ABI
- SuperH: The SH FDPIC ABI, initial version in 2008
- ARM FDPIC ABI, initial version in 2013.
Here is a summary.
The read-only sections, which can be shared, are commonly referred to as the "text segment", whereas the writable sections are non-shared and commonly referred to as the "data segment". Functions and certain data symbols (.rodata) reside in the text segment, while other data symbols and the GOT reside in the data segment. Special entries called "canonical function descriptors" also reside in the GOT.
A call-clobbered register is reserved as the FDPIC register, used to access the data segment. Upon entry to a function, the FDPIC register holds the address of _GLOBAL_OFFSET_TABLE_. The text segment can be referenced using PC-relative addressing. We will see later that sometimes it's unknown whether a non-preemptible symbol resides in the text segment or the data segment, in which case GOT-indirect addressing with the FDPIC register has to be used.
A function call is called external if the destination may reside in another module, which has a different data segment and therefore needs a different FDPIC register value. Therefore, an external function call needs to update the FDPIC register as well as change the program counter (PC). The FDPIC register can be spilled into a stack slot or a call-saved register, if the caller needs to reference the data segment later.
Calling a function pointer, including calling a PLT entry, also sets both the FDPIC register and PC. When the address of a function is taken, the address of its canonical function descriptor is obtained, not that of the entry point. The descriptor, which resides in the GOT, contains pointers to both the function's entry point and its FDPIC register value. The two GOT entries are relocated by a dynamic relocation of type R_*_FUNCDESC_VALUE (e.g. R_FRV_FUNCDESC_VALUE).
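The descriptor mechanism can be modeled in plain C. In this sketch the FDPIC register is represented as an explicit parameter, since C cannot name a reserved register; struct funcdesc, call_through and the two "module" GOT arrays are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a canonical function descriptor: two adjacent GOT words. */
struct funcdesc {
  long (*entry)(const long *fdpic_reg); /* entry point in the text segment */
  const long *got;                      /* FDPIC register value for the callee */
};

/* Two data segments, each modeled as a one-word GOT. */
const long module_a_got[] = {10};
const long module_b_got[] = {20};

/* A callee reaches its data only through the FDPIC register it was given. */
long read_first_got_word(const long *fdpic_reg) { return fdpic_reg[0]; }

/* An indirect call loads both words, then transfers control. */
long call_through(const struct funcdesc *fd) {
  assert(fd != NULL && fd->entry != NULL);
  return fd->entry(fd->got);
}
```

Two descriptors that share read_first_got_word but carry different got values return different results, which mirrors how one shared text segment serves several data segments.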
If the symbol is preemptible, the code sequence loads a GOT entry. When the symbol is a function, the GOT entry is relocated by a dynamic relocation R_*_FUNCDESC and will contain the address of the function descriptor.
Let's check out examples that take the addresses of functions and variables.
__attribute__((visibility("hidden"))) void hidden_fun();
Function access in FDPIC
Canonical function descriptors are stored in the GOT, and their access depends on whether the referenced function is preemptible or not.
- For non-preemptible functions: the address of the descriptor is directly computed by adding an offset to the FDPIC register.
- For preemptible functions: a GOT entry is loaded first. This entry,
relocated by a
R_*_FUNCDESC
dynamic relocation, holds the final address of the function descriptor.
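The two cases differ by one memory load, which a small C model makes visible. This is a sketch under the assumption that the GOT is an array of pointer-sized words and that the FDPIC register holds its base; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-preemptible: the descriptor sits in this module's GOT, so its
 * address is the FDPIC register plus a link-time constant. No load. */
uintptr_t funcdesc_addr_local(uintptr_t fdpic_reg, ptrdiff_t gotoff) {
  return fdpic_reg + (uintptr_t)gotoff;
}

/* Preemptible: a GOT slot, filled in by the dynamic loader, holds the
 * address of the descriptor, which may live in another module. One load. */
uintptr_t funcdesc_addr_preemptible(const uintptr_t *fdpic_reg,
                                    size_t got_index) {
  assert(fdpic_reg != NULL);
  return fdpic_reg[got_index];
}
```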
// arm-linux-gnueabihf-gcc -c -fpic -mfdpic -Wa,--fdpic
Unfortunately, when linking a DSO, an R_ARM_GOTOFFFUNCDESC relocation referencing a hidden symbol results in a linker error. This error likely arises because the generated R_ARM_FUNCDESC_VALUE dynamic relocation requires a dynamic symbol. While this can be implemented using an STB_LOCAL STT_SECTION dynamic symbol, GNU ld currently lacks support for this approach.
% arm-linux-gnueabihf-gcc -fpic -mfdpic -O2 -Wa,--fdpic q.c -shared
Let's try sh4. sh4-linux-gnu-gcc -fpic -mfdpic -O2 q.c -shared -nostdlib allows taking the address of a hidden function but not a protected function.
Then, let's see a global variable initialized by the address of a function and a C++ virtual table.
struct A { virtual void foo(); };
// arm-linux-gnueabihf-g++ -c -fpic -mfdpic -Wa,--fdpic
Data access in FDPIC
GOT-indirect addressing is required for accessing data symbols under two conditions:
- Preemptible symbols: Traditional GOT requirement.
- Non-preemptible symbols with potential data segment placement: This includes
  - Writable data symbols: This covers both locally declared (int var;) and externally declared (extern int var;) non-const variables.
  - Potential dynamic initialization: const A a; extern const int var;
  - Certain guaranteed constant initialization: extern constinit const int *const extern_const;. Constant initialization may require a relocation, e.g. constinit const int *const extern_const = &var;
get_hidden_var: // non-preemptible data with potential data segment placement
The dynamic relocations R_*_RELATIVE and R_*_GLOB_DAT do not use the standard + load_base semantics. It seems that musl fdpic doesn't support the special R_*_RELATIVE.
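One way to see why a single load_base is insufficient: the text and data segments are placed independently, so a link-time address has to be translated per segment. The sketch below assumes a load map whose entries record a run-time address, a link-time address and a size, modeled on the load map the FDPIC loader hands to user space; struct loadseg here and translate are illustrative names, not a real API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry of a (modeled) FDPIC load map. */
struct loadseg {
  uint32_t addr;    /* run-time address of the segment */
  uint32_t p_vaddr; /* link-time address of the segment */
  uint32_t p_memsz; /* size of the segment in memory */
};

/* Translate a link-time address to a run-time address by finding the
 * segment that covers it. Returns 0 when no segment matches. */
uint32_t translate(const struct loadseg *segs, size_t nsegs, uint32_t vaddr) {
  assert(segs != NULL || nsegs == 0);
  for (size_t i = 0; i < nsegs; i++) {
    uint32_t off = vaddr - segs[i].p_vaddr; /* wraps if vaddr is below the segment */
    if (off < segs[i].p_memsz)
      return segs[i].addr + off;
  }
  return 0;
}
```

With a text segment loaded at 0x40000000 and a data segment at 0x50008000, two link-time addresses that are 0x2000 apart can end up far apart at run time, which a plain addition of one base cannot express.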
If the referenced data symbol is non-preemptible and guaranteed to be in the text segment, we can use PC-relative addressing. However, this scenario is remarkably rare in practice. The most likely use case is like the following:
const int ro_array[] = {1, 2, 3, 4};
GCC's arm port does not seem to utilize PC-relative addressing. We can try GCC's SuperH port:
// sh4-linux-gnu-gcc -S -fpic -mfdpic -O2 q.c
It optimizes get_hidden_var but not get_ro_hidden_var.
Thread-local storage
The ARM FDPIC ABI defines the static TLS relocations R_ARM_TLS_GD32_FDPIC, R_ARM_TLS_LDM32_FDPIC, and R_ARM_TLS_IE32_FDPIC to be relative to the GOT, as opposed to their non-FDPIC counterparts, which are relative to PC.
Toolchain notes
-mfdpic, which only makes sense for -fpie/-fpic, enables FDPIC code generation. Like -mno-pic-data-is-text-relative, external data accesses use a different base register, r9 for arm. In addition, external function calls save and restore r9.
gas's arm port needs --fdpic to assemble FDPIC-related relocation types. -mfdpic -mtls-dialect=gnu2 is not supported. The ARM FDPIC ABI uses ldr to load a 32-bit constant embedded in the text segment. The offset is used to materialize the address of a GOT entry (canonical function descriptor, address of the canonical function descriptor, or address of data).
RISC-V
Several proposals exist for defining FDPIC-like ABIs to work for MMU-less systems.
- [RFC] RISC-V ELF FDPIC psABI addendum and RISC-V FDPIC/NOMMU toolchain/runtime support: This offers a starting point, but needs further discussion.
- lowRISC ePIC: While simple and interesting, ePIC lacks dynamic linking support.
- Stefan O'Rear's proposal: This proposal holds promise and deserves close attention.
Undoubtedly, GP should be used as the FDPIC register.
Loading a constant near code (like ARM) is not efficient. Instead, consider a two-instruction sequence:
- Use hi20 and lo12 instructions to generate an offset relative to the GP register.
- Use c.add a0, gp to compute the address of the GOT entry.
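For readers less familiar with the hi20/lo12 split, here is a sketch of the arithmetic, assuming the usual RISC-V convention that the low 12 bits are a sign-extended immediate and the high part is rounded to compensate. struct hi_lo, split_offset and rejoin are hypothetical names; the value rejoin returns is what would be added to GP.

```c
#include <assert.h>
#include <stdint.h>

struct hi_lo {
  uint32_t hi20; /* 20-bit immediate for the upper-immediate instruction */
  int32_t lo12;  /* signed 12-bit immediate, in [-2048, 2047] */
};

/* Split a 32-bit offset so that (hi20 << 12) + lo12 == offset (mod 2^32). */
struct hi_lo split_offset(int32_t offset) {
  uint32_t u = (uint32_t)offset;
  struct hi_lo r;
  r.hi20 = (u + 0x800u) >> 12;       /* round up when bit 11 is set */
  r.lo12 = (int32_t)(u & 0xfffu);
  if (r.lo12 >= 0x800)
    r.lo12 -= 0x1000;                /* sign-extend the low 12 bits */
  return r;
}

/* Recombine the two parts, as the instruction pair would at run time. */
uint32_t rejoin(struct hi_lo p) {
  assert(p.lo12 >= -2048 && p.lo12 <= 2047);
  return (p.hi20 << 12) + (uint32_t)p.lo12;
}
```

The rounding in hi20 is what lets a following load or add carry the low part as a signed immediate without a third instruction.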
Maciej's code sequence supports both function and data access through indirect GP-relative addressing. We can easily enhance it by adding R_RISCV_RELAX to enable linker relaxation and improve performance. Additionally, for consistency with similar notations on x86-64 and AArch64 ("gotpcrel"), let's adopt "gotgprel" notation.
.L0:
For data access, the code sequence is followed by instructions like:
lb a0,0(rY)
Function descriptors and data have different semantics, requiring two relocation types. Stefan O'Rear proposes:
- R_RISCV_FUNCDESC_GOTGPREL_HI: Find or create two GOT entries for the canonical function descriptor.
- R_RISCV_GOTGPREL_HI: Find or create a GOT entry for the symbol, and return an offset from the FDPIC register.
Drawing inspiration from ARM FDPIC, two additional relocation types are needed for TLS. This results in a 4-type scheme.
RISC-V FDPIC: optimization
Addressing performance concerns is crucial. Stefan suggests an "indirect-to-relative optimization and relaxation scheme":
- R_RISCV_PIC_ADD: Tags c.add rX, gp to enable optimization
- R_RISCV_INTERMEDIATE_LOAD: Tags ld rY, <gotgprel_lo12>(rX) to enable optimization
Indirect GP-relative addressing can be optimized to direct GP-relative addressing under specific conditions:
- Non-preemptible functions
- Non-preemptible data in the data segment
# Indirect GP-relative to direct GP-relative
GOT-indirect addressing can be optimized to PC-relative for non-preemptible data in the text segment.
# Indirect GP-relative to PC-relative
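The conditions for the two optimizations can be summarized as a decision function. This is a sketch of the rules as stated in this section, not linker code; the enum and function names are hypothetical, and it assumes the linker already knows whether the symbol is preemptible and which segment it landed in.

```c
#include <assert.h>
#include <stdbool.h>

enum sym_kind { SYM_FUNCTION, SYM_DATA };
enum segment { TEXT_SEGMENT, DATA_SEGMENT };
enum access { INDIRECT_GP_RELATIVE, DIRECT_GP_RELATIVE, PC_RELATIVE };

/* Pick the cheapest addressing form the rules above allow. */
enum access relax_access(bool preemptible, enum sym_kind kind,
                         enum segment seg) {
  assert(kind == SYM_FUNCTION || kind == SYM_DATA);
  if (preemptible)
    return INDIRECT_GP_RELATIVE; /* must keep the GOT load */
  if (kind == SYM_FUNCTION)
    return DIRECT_GP_RELATIVE;   /* descriptor address is gp + offset */
  return seg == DATA_SEGMENT ? DIRECT_GP_RELATIVE : PC_RELATIVE;
}
```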
While enabling absolute addressing for -no-pie linking might seem straightforward, the added complexity might outweigh the benefits.
# Indirect GP-relative to absolute
Remember, FDPIC aims for position-independent code, and a static dynamic library scheme wouldn't be ideal. Imagine multiple executables sharing the same system: each text segment needs a unique address for proper execution.
RISC-V FDPIC: thread-local storage
To handle TLSDESC, we introduce a new relocation type: R_RISCV_TLSDESC_GPREL_HI. This type instructs the linker to find or create two GOT entries unless optimized to local-exec or initial-exec. The combined hi20 and lo12 offsets compute the GP-relative offset to the first GOT entry.
label:
Existing relocation types, R_RISCV_TLSDESC_LOAD_LO12 and R_RISCV_TLSDESC_ADD_LO12, are extended to work with R_RISCV_TLSDESC_GPREL_HI.
# TLSDESC to initial-exec optimization
For the initial-exec TLS model, we need a new pseudoinstruction, say, la.tls.ie.fd rX, sym. It expands to:
lui rX, 0 # R_RISCV_TLS_GOTGPREL_HI20(sym)
Stefan's scheme uses R_RISCV_PIC_ADDR_LO12_I. I think we can just repurpose R_RISCV_PCREL_LO12_I.