Background:
- -fno-pic can only be used by executables. On most platforms and architectures, direct access relocations are used to reference external data symbols.
- -fpic can be used by both executables and shared objects. Windows has __declspec(dllimport), but most other binary formats allow a default visibility external data symbol to be resolved to a shared object, so direct access relocations are generally disallowed.
- -fpie was introduced as a mode similar to -fpic for ELF: the compiler can assume that the produced object file will only be used by executables, so all definitions are non-preemptible and interprocedural optimizations can apply to them.
For
extern int a;
-fno-pic typically produces an absolute relocation (a PC-relative relocation can be used as well). On ELF x86-64 it is usually R_X86_64_32 in the position-dependent small code model. If a is defined in the executable (by another translation unit), everything works fine. If a turns out to be defined in a shared object, its real address is not known at link time. One of the following actions needs to be taken:
- Emit a dynamic relocation at every use site. Text sections are usually non-writable. A dynamic relocation applied to a non-writable section is called a text relocation.
- Emit a single copy relocation. The linker obtains the size of the symbol, allocates that many bytes in .bss (this may make the object writable; LLD may pick a read-only area instead), and emits an R_*_COPY relocation. All references resolve to the new location.
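The copy relocation step can be modeled in plain C. This is only a sketch under stated assumptions: struct dyn_symbol and apply_copy_relocation are hypothetical names, not a real loader API, and clamping to the smaller size is a choice made for this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a dynamic symbol: where the shared object's
 * definition lives and the size recorded in its symbol table. */
struct dyn_symbol {
    const void *definition;
    size_t size;
};

/* Model of applying one R_*_COPY relocation: copy the initial value from
 * the shared object into the slot the linker reserved in the executable.
 * slot_size was fixed when the executable was linked. */
void apply_copy_relocation(void *slot, size_t slot_size,
                           const struct dyn_symbol *sym)
{
    assert(slot != NULL && sym != NULL);
    /* Never copy more than the reserved slot can hold (model assumption). */
    size_t n = slot_size < sym->size ? slot_size : sym->size;
    memcpy(slot, sym->definition, n);
}
```

In this model, if the shared object later grows the object from two ints to four, an executable linked against the old version still reserves and copies only two. That is the sense in which the symbol size becomes part of the ABI.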
Multiple text relocations are even less acceptable, so on ELF a copy relocation is generally used. Here is a nice description from Rich Felker: "Copy relocations are not a case of overriding the definition in the abstract machine, but an implementation detail used to support data objects in shared libraries when the main program is non-PIC."
Copy relocations have drawbacks:
- Break page sharing.
- Make the symbol properties (e.g. size) part of the ABI.
- If the shared object is linked with -Bsymbolic or --dynamic-list and defines a data symbol copy relocated by the executable, the address of the symbol may be different in the shared object and in the executable.
Traditionally copy relocations could only occur in -fno-pic code. A GCC 5 change made them possible in -fpie code on x86-64. Please read on.
x86: copy relocations and -fpie
In -fpic code, GOT indirection for external data symbols has a cost. Making -fpie behave like -fpic in this regard incurs that cost even when the data symbol turns out to be defined in the executable. Having the data symbol defined in another translation unit linked into the executable is very common, especially if the vendor uses a fully or mostly static linking mode.
In GCC 5, "x86-64: Optimize access to globals in PIE with copy reloc" started to use direct access relocations for external data symbols on x86-64 in -fpie mode.
extern int a;
- GCC<5: movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax (9 bytes)
- GCC>=5: movl a(%rip), %eax (6 bytes)
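The difference between the two instruction sequences can be modeled in plain C. This is only a sketch: a_definition, got_entry_for_a, and rebind_got are hypothetical names, and a real GOT entry is filled in by the dynamic loader rather than by program code.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the data symbol's original definition. */
static int a_definition = 5;

/* Stand-in for the GOT entry: a pointer slot holding the symbol's
 * final run-time address. */
static int *got_entry_for_a = &a_definition;

/* GOT indirection: load the address from the slot, then load the value. */
int load_via_got(void) { return *got_entry_for_a; }

/* Direct access: a single load from the location fixed at link time. */
int load_direct(void) { return a_definition; }

/* Model of the symbol being resolved to a different copy at run time. */
void rebind_got(int *new_target)
{
    assert(new_target != NULL);
    got_entry_for_a = new_target;
}
```

After rebind_got points the slot at another copy, only load_via_got follows it; load_direct keeps reading the original definition. In this model, that is how a direct access and a relocated copy of the same symbol can end up disagreeing.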
This change would actually be useful for architectures other than x86-64 but was never implemented for them. What went wrong: the change was implemented as an inflexible configure-time choice (HAVE_LD_PIE_COPYRELOC), defaulting to this behavior if ld supports PIE copy relocations (true of most binutils installations). Keep in mind that such a default breaks -Bsymbolic and --dynamic-list in shared objects.
Clang addressed the inflexible configure-time choice via an opt-in option -mpie-copy-relocations (D19996).
I noticed that:
- The option can be used for -fno-pic code as well to prevent copy relocations on ELF. This is occasionally what users want, and they switch from -fno-pic to -fpie just for this purpose.
- The option name should describe the code generation behavior, instead of the inferred behavior at the linking stage on a particular binary format.
- The option does not need to be tied to ELF.
  - On COFF, the behavior is always like -fdirect-access-external-data. __declspec(dllimport) is needed to enable indirect access.
  - On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.
- x86-64 psABI introduced R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX as GOT optimization. With the optimization, GOT indirection can be optimized, so the incurred cost is very low now.
So I proposed an alternative option -f[no-]direct-access-external-data: https://reviews.llvm.org/D92633 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is to drop HAVE_LD_PIE_COPYRELOC and (on x86-64) default to GOT indirection for external data symbols in -fpie mode.
Please keep in mind that -f[no-]semantic-interposition is for definitions while -f[no-]direct-access-external-data is for undefined data symbols.
GCC 5 introduced -fno-semantic-interposition to use local aliases for references to definitions in the same translation unit.
STV_PROTECTED
Now let's consider how STV_PROTECTED comes into play. Here is the generic ABI definition of STV_PROTECTED:
A symbol defined in the current component is protected if it is visible in other components but not preemptable, meaning that any reference to such a symbol from within the defining component must be resolved to the definition in that component, even if there is a definition in another component that would preempt by the default rules. A symbol with STB_LOCAL binding may not have STV_PROTECTED visibility. If a symbol definition with STV_PROTECTED visibility from a shared object is taken as resolving a reference from an executable or another shared object, the SHN_UNDEF symbol table entry created has STV_DEFAULT visibility.
A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object on ELF. STV_PROTECTED can make the symbol non-preemptible. You may have noticed that I use "preemptible" while the generic ABI uses "preemptable" and LLVM IR uses "dso_preemptable". Both forms work but "preemptible" is more common.
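At the source level, visibility is usually selected with a compiler attribute. The following is a minimal sketch, assuming GCC or Clang targeting ELF; the variable and function names are hypothetical.

```c
#include <assert.h>

/* STV_DEFAULT: preemptible if this file is built into a shared object. */
int default_counter = 1;

/* STV_PROTECTED: still visible to other components, but references from
 * within the defining component bind to this definition. */
__attribute__((visibility("protected"))) int protected_counter = 2;

/* STV_HIDDEN: not visible outside the defining component. */
__attribute__((visibility("hidden"))) int hidden_counter = 3;

int sum_counters(void)
{
    return default_counter + protected_counter + hidden_counter;
}

/* Small sanity check on the initial values. */
void check_counters(void)
{
    assert(sum_counters() == 6);
}
```

The visibility recorded for each symbol can be inspected in the object file's symbol table, for example in the visibility column printed by readelf -s.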
x86: protected data symbols and copy relocations
Many folks consider that copy relocations are best-effort support provided by the toolchain. STV_PROTECTED is intended as an optimization and the optimization can error out if it can't be done for whatever reason. Since copy relocations are already oftentimes unacceptable, it is natural to think that we should just disallow copy relocations on protected data symbols.
However, GNU ld 2.26 made a change which enabled copy relocations on protected data symbols for i386 and x86-64.
x86: protected data symbols and direct accesses
If a protected data symbol in a shared object is copy relocated, allowing direct accesses will cause the shared object to operate on a different copy from the executable. Therefore, direct accesses to protected data symbols have to be disallowed in -fpic code, just in case the symbols may be copy relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to use GOT indirection for protected external data.
This caused unneeded pessimization for protected external data. Clang always treats protected similarly to hidden/internal.
For older GCC (and all versions of Clang), direct accesses are produced in -fpic code. Mixing such object files can silently break copy relocations on protected data symbols. Therefore, GNU ld made the change https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5 to error in -shared mode.
% cat a.s
This led to a heated discussion: https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks noticed this (https://bugs.swift.org/browse/SR-1023) and their reaction was to switch from GNU ld to gold.
binutils commit "x86: Clear extern_protected_data for GNU_PROPERTY_NO_COPY_ON_PROTECTED" introduced GNU_PROPERTY_NO_COPY_ON_PROTECTED. With this property, ld -shared will not report the error "relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object".
The two issues above are the costs of enabling copy relocations on protected data symbols. Personally I don't think copy relocations on protected data symbols are actually leveraged. GNU ld's x86 port can just (1) reject such copy relocations and (2) allow direct accesses referencing protected data symbols in -shared mode. GNU_PROPERTY_NO_COPY_ON_PROTECTED can be phased out.
Protected function symbols and canonical PLT entries
GNU ld's aarch64 port somehow rejects this:
% gcc -shared -fuse-ld=bfd -fpic b.c -o b.so
The code is supported on many other architectures, including powerpc and x86.
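For illustration, code that takes the address of a protected function inside its defining component might look like the sketch below. This is an assumption made for illustration, not the original b.c listing; foo's body, fn_t, address_of_foo, and call_through_pointer are hypothetical.

```c
#include <assert.h>

typedef int (*fn_t)(void);

/* A protected function: visible to other components, but not preemptible
 * from within the component that defines it. */
__attribute__((visibility("protected"))) int foo(void) { return 42; }

/* Address-taken use of foo inside its defining component. */
fn_t address_of_foo(void) { return foo; }

/* Calls foo through the pointer; within one component the pointer must
 * compare equal to foo itself. */
int call_through_pointer(void)
{
    fn_t f = address_of_foo();
    assert(f == foo);
    return f();
}
```

The interesting case is when such a file is built with -fpic -shared and another component also takes the address of foo: both sides are expected to observe the same address.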
On many architectures, a branch instruction uses a branch-specific relocation type (e.g. R_AARCH64_CALL26, R_PPC64_REL24, R_RISCV_CALL_PLT). This is great because the address is insignificant and the linker can arrange for a regular PLT if the symbol turns out to be external.
On i386, a branch in -fno-pic code emits an R_386_PC32 relocation, which is indistinguishable from an address-taken operation. If the symbol turns out to be external, the linker has to employ a trick called a "canonical PLT entry" (st_shndx=0, st_value!=0).
% gcc -m32 -shared -fuse-ld=bfd -fpic b.c -o b.so
This used to be a problem for x86-64 as well, until "x86-64: Generate branch with PLT32 relocation" changed call/jmp foo to emit R_X86_64_PLT32 instead of R_X86_64_PC32. Note: in -fpie/-fpic code, call/jmp foo@PLT always emits R_X86_64_PLT32.