x86, copy relocations and protected symbols
2021-01-09 17:00:00 Author: maskray.me

  • -fno-pic can only be used by executables. On most platforms and architectures, direct access relocations are used to reference external data symbols.
  • -fpic can be used by both executables and shared objects. Windows has __declspec(dllimport), but most other binary formats allow a default-visibility external data symbol to be resolved to a definition in a shared object, so direct access relocations are generally disallowed.
  • -fpie was introduced as a mode similar to -fpic for ELF. The compiler can assume that the produced object file will only be linked into an executable, so all definitions are non-preemptible and interprocedural optimizations can apply to them.

For

extern int a;
int *foo() { return &a; }

-fno-pic typically produces an absolute relocation (a PC-relative relocation can be used as well). On ELF x86-64 it is usually R_X86_64_32 in the position-dependent small code model. If a is defined in the executable (by another translation unit), everything works fine. If a turns out to be defined in a shared object, its real address is not a link-time constant, and one of the following actions needs to be taken:

  • Emit a dynamic relocation at every use site. Text sections are usually non-writable, and a dynamic relocation applied to a non-writable section is called a text relocation.
  • Emit a single copy relocation. The linker obtains the size of the symbol, allocates that many bytes in .bss (this may make the object writable; LLD may pick a read-only area instead), and emits an R_*_COPY relocation. All references resolve to the new location.
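
The second option can be modeled in a few lines of C. This is a minimal sketch of the idea, not real linker or loader code: `struct symbol`, `so_a`, `exe_copy_of_a` and `apply_copy_reloc` are hypothetical names, and the two variables merely stand in for storage in the shared object and in the executable's .bss.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a dynamic symbol: where it resolves and its size. */
struct symbol {
    const char *name;
    void *addr;
    size_t size;
};

static int so_a = 42;      /* definition of `a` inside the "shared object" */
static int exe_copy_of_a;  /* space the "linker" reserved in the executable's .bss */

/* Models the loader processing an R_*_COPY relocation: copy the initial
   contents into the executable, then bind every reference to the copy. */
static void apply_copy_reloc(struct symbol *sym, void *dst) {
    memcpy(dst, sym->addr, sym->size);
    sym->addr = dst;
}
```

After `apply_copy_reloc(&sym, &exe_copy_of_a)`, `sym.addr` points at the executable's copy, which holds the initial value 42, and the original storage in the shared object is no longer referenced through the symbol.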

Multiple text relocations are even less acceptable, so on ELF a copy relocation is generally used. Here is a nice description from Rich Felker: "Copy relocations are not a case of overriding the definition in the abstract machine, but an implementation detail used to support data objects in shared libraries when the main program is non-PIC."

Copy relocations have drawbacks:

  • They break page sharing.
  • They make symbol properties (e.g. size) part of the ABI.
  • If the shared object is linked with -Bsymbolic or --dynamic-list, the address of a data symbol may differ between the shared object and the executable.
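
The size drawback can be illustrated with a small model. The sketch below assumes the executable was linked when the symbol was 4 bytes and the shared object later grew it to 8 bytes. All names are hypothetical, and copying only the smaller of the two sizes is an assumption about how a loader might behave, not something the ABI mandates.

```c
#include <stddef.h>
#include <string.h>

/* New, larger definition in the updated "shared object". */
static const unsigned char so_a_v2[8] = {1, 2, 3, 4, 5, 6, 7, 8};
/* Space reserved in the "executable" at link time for the old 4-byte size. */
static unsigned char exe_copy[4];

/* Copies at most the space the executable reserved and returns the number
   of bytes of the new definition that did not fit. */
static size_t copy_with_stale_size(void *dst, size_t dst_size,
                                   const void *src, size_t src_size) {
    size_t n = dst_size < src_size ? dst_size : src_size;
    memcpy(dst, src, n);
    return src_size - n;
}
```

Here the executable ends up with only the first 4 bytes, which is why changing the size of an exported data symbol is an ABI break once copy relocations are in play.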

Traditionally copy relocations could only occur in -fno-pic code.

Copy relocations and -fpie

Using GOT indirection for external data symbols in -fpic code has a cost. Making -fpie behave like -fpic in this regard incurs that cost even when the data symbol turns out to be defined in the executable. Having the data symbol defined in another translation unit linked into the executable is very common, especially if the vendor uses fully or mostly static linking.

In GCC 5, "x86-64: Optimize access to globals in PIE with copy reloc" started to use direct access relocations for external data symbols on x86-64 in -fpie mode.

extern int a;
int foo() { return a; }
  • GCC<5: movq a@GOTPCREL(%rip), %rax; movl (%rax), %eax (9 bytes)
  • GCC>=5: movl a(%rip), %eax (6 bytes)
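
In C terms, the difference between the two sequences is one extra pointer load. The sketch below is only an analogy: `a_storage` and `got_entry_for_a` are hypothetical stand-ins for the variable and its GOT slot, which the dynamic loader would normally fill in.

```c
static int a_storage = 7;                           /* stands in for `a` */
static int *volatile got_entry_for_a = &a_storage;  /* stands in for the GOT slot */

/* GOT indirection: load the address from the GOT, then load the value. */
static int foo_via_got(void) {
    int *p = got_entry_for_a;
    return *p;
}

/* Direct access: a single load from a link-time-known location. */
static int foo_direct(void) {
    return a_storage;
}
```

Both functions return the same value; the direct form simply assumes that the address of the variable is fixed relative to the code, which is what a copy relocation has to guarantee when the definition lives in a shared object.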

This change would actually be useful for architectures other than x86-64, but it was never implemented for them. What went wrong was that the change was implemented as an inflexible configure-time choice (HAVE_LD_PIE_COPYRELOC), which defaults to this behavior if ld supports PIE copy relocations (true of most binutils installations). Keep in mind that such a default breaks -Bsymbolic in shared objects.

Clang addressed the inflexible configure-time choice via an opt-in option -mpie-copy-relocations (D19996).

I noticed that:

  • The option can be used for -fno-pic code as well to prevent copy relocations on ELF. This is occasionally what users want, and they switch from -fno-pic to -fpie just for this purpose.
  • The option name should describe the code generation behavior, instead of the inferred behavior at the linking stage on a particular binary format.
  • The option does not need to tie to ELF.
    • On COFF, the behavior is like always -fdirect-access-external-data. __declspec(dllimport) is needed to enable indirect access.
    • On Mach-O, the behavior is like -fdirect-access-external-data for -fno-pic (only available on arm) and the opposite for -fpic.
  • The x86-64 psABI introduced R_X86_64_GOTPCRELX and R_X86_64_REX_GOTPCRELX as a GOT optimization. With it, the linker can optimize out the GOT indirection, so the incurred cost is very low now.
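
The GOT optimization in the last item can be sketched as a byte rewrite. Assuming the symbol turned out to be non-preemptible and within PC-relative reach, a linker may turn `movq a@GOTPCREL(%rip), %rax` into `leaq a(%rip), %rax` by changing one opcode byte. `relax_mov_to_lea` is a hypothetical helper; real linkers also validate the ModRM byte, handle other instruction forms, and recompute the relocated displacement.

```c
#include <stdbool.h>
#include <stdint.h>

/* `opcode` points at the opcode byte of an instruction whose 32-bit
   displacement carries R_X86_64_REX_GOTPCRELX. Returns true if relaxed. */
static bool relax_mov_to_lea(uint8_t *opcode) {
    if (*opcode != 0x8b)  /* 0x8b: MOV r64, r/m64 (load from the GOT slot) */
        return false;
    *opcode = 0x8d;       /* 0x8d: LEA r64, m (compute the address directly) */
    return true;
}
```

For the encoding `48 8b 05 <disp32>`, the helper would be called on the second byte and leaves the REX prefix and ModRM byte untouched.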

So I proposed an alternative option -f[no-]direct-access-external-data: https://reviews.llvm.org/D92633 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98112. My wish on the GCC side is to drop HAVE_LD_PIE_COPYRELOC and (x86-64) default to GOT indirection for external data symbols in -fpie mode.

Please keep in mind that -f[no-]semantic-interposition is for definitions while -f[no-]direct-access-external-data is for undefined data symbols.

GCC 5 introduced -fno-semantic-interposition to use local aliases for references to definitions in the same translation unit.
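
A small example of the kind of code this affects, using hypothetical function names. When it is compiled with -fpic for a shared object, GCC has to assume that `unit_step` may be preempted by another definition at run time, so it generally does not inline it into `advance_counter`. With -fno-semantic-interposition it is free to bind the call locally.

```c
/* Default visibility: preemptible when linked into a shared object. */
int unit_step(void) { return 1; }

/* -fpic (GCC): calls unit_step through the PLT and does not inline it.
   -fpic -fno-semantic-interposition: may call a local alias or inline. */
int advance_counter(int x) { return x + unit_step(); }
```

The observable result is the same in both modes unless another component actually interposes `unit_step`; only the generated code differs.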

STV_PROTECTED

A non-local STV_DEFAULT defined symbol is by default preemptible in a shared object on ELF. STV_PROTECTED can make the symbol non-preemptible.

Here is the generic ABI definition on STV_PROTECTED:

A symbol defined in the current component is protected if it is visible in other components but not preemptable, meaning that any reference to such a symbol from within the defining component must be resolved to the definition in that component, even if there is a definition in another component that would preempt by the default rules. A symbol with STB_LOCAL binding may not have STV_PROTECTED visibility. If a symbol definition with STV_PROTECTED visibility from a shared object is taken as resolving a reference from an executable or another shared object, the SHN_UNDEF symbol table entry created has STV_DEFAULT visibility.
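
In C source, protected visibility is usually requested with the GCC/Clang `visibility` attribute. A minimal sketch with hypothetical names:

```c
/* Exported from the shared object, but references from inside the shared
   object always bind to this definition (STV_PROTECTED). */
__attribute__((visibility("protected"))) int lib_state = 1;

/* Exported with default visibility; reads the protected variable. */
int lib_read_state(void) { return lib_state; }
```

When this is built into a shared object, the dynamic symbol table should list `lib_state` with protected visibility. Whether the link succeeds can depend on the compiler and linker versions, for the reasons discussed in the following sections.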

You may have noticed that I use "preemptible" while the generic ABI uses "preemptable" and LLVM IR uses "dso_preemptable". Both forms work but "preemptible" is more common.

Protected data symbols and copy relocations

Many folks consider that copy relocations are best-effort support provided by the toolchain. STV_PROTECTED is intended as an optimization and the optimization can error out if it can't be done for whatever reason. Since copy relocations are already oftentimes unacceptable, it is natural to think that we should just disallow copy relocations on protected data symbols.

However, GNU ld 2.26 made a change which enabled copy relocations on protected data symbols for i386 and x86-64.

Protected data symbols and direct accesses

If a protected data symbol in a shared object is copy relocated, allowing direct accesses will cause the shared object to operate on a different copy from the executable. Therefore, direct accesses to protected data symbols have to be disallowed in -fpic code, just in case the symbols may be copy relocated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 changed GCC 5 to use GOT indirection for protected external data.

This caused an unneeded pessimization for protected external data. Clang always treats protected visibility similarly to hidden/internal.
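
The two-copy hazard behind this trade-off can be modeled in C. This is a sketch with hypothetical names: `so_var` stands in for the protected definition inside the shared object, `exe_var` for the copy-relocated storage in the executable, and the two `lib_store_*` functions for the two code generation strategies in -fpic code.

```c
static int so_var = 1;           /* protected definition in the "shared object" */
static int exe_var;              /* copy-relocated storage in the "executable" */
static int *got_slot = &so_var;  /* GOT entry, rebound at load time */

/* Models the loader applying the copy relocation. */
static void load_time_copy(void) {
    exe_var = so_var;
    got_slot = &exe_var;
}

/* Shared object code using direct access: writes its own definition. */
static void lib_store_direct(int v) { so_var = v; }

/* Shared object code using GOT indirection: writes the executable's copy. */
static void lib_store_via_got(int v) { *got_slot = v; }

/* Executable code: always reads its copy. */
static int exe_load(void) { return exe_var; }
```

After `load_time_copy()`, a store through `lib_store_direct` is invisible to `exe_load`, while a store through `lib_store_via_got` is observed. That is the reason direct accesses and copy relocations on the same protected symbol cannot be mixed safely.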

For older GCC (and all versions of Clang), direct accesses are produced in -fpic code. Mixing such object files can silently break copy relocations on protected data symbols. Therefore, GNU ld made the change https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=ca3fe95e469b9daec153caa2c90665f5daaec2b5 to error in -shared mode.

% cat a.s
leaq foo(%rip), %rax

.data
.global foo
.protected foo
foo:
% gcc -fuse-ld=bfd -shared a.s
/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status

This led to a heated discussion https://sourceware.org/legacy-ml/binutils/2016-03/msg00312.html. Swift folks noticed this https://bugs.swift.org/browse/SR-1023 and their reaction was to switch from GNU ld to gold.

Summary

The two issues above are the costs of enabling copy relocations on protected data symbols.


Source: http://maskray.me/blog/2021-01-09-x86-copy-relocations-protected-symbols