ELF interposition and -Bsymbolic
2021-05-16 16:00:00 Author: maskray.me

This article discusses ELF interposition and the linker option -Bsymbolic and its friends. (I wrote -fno-semantic-interposition first but realized a reorganization would improve readability, so moved some parts and added more stuff to this new article.)

I added -fno-semantic-interposition and contributed some optimizations in Clang 11, and felt motivated enough to write a post after seeing a great post by Daniel Colascione ("Python is 1.3x faster when compiled in a way that re-examines shitty technical decisions from the 1990s.") and a recent rant from Linus Torvalds on shared objects' performance issues.

Say, we have two default visibility functions f and g. g calls f. We compile g into a shared object. There are 3 cases for f.

First case: f is defined in the same translation unit as g. I will discuss this in depth in my next article, -fno-semantic-interposition. One notable point: GCC -fpic suppresses interprocedural optimizations, including inlining, for such non-inline external linkage functions.

void f() { ... }
void g() { f(); }

Second case: f is defined in a different object file which will be linked into the same shared object.


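// Another translation unit, linked into the same shared object: defines f.
void f() { ... }

// The translation unit of g: references f.
void f();
void g() { f(); }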


Third case: f is defined in a different shared object or the executable. The symbol search on f cannot be prevented.


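// A translation unit in a different shared object or the executable: defines f.
void f() { ... }

// The translation unit of g, in our shared object: references f.
void f();
void g() { f(); }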


You can see that in all three cases no annotation is required on f and g. This reflects an important design philosophy of ELF: dynamic linking should be similar to static linking.

When linked as a shared object (-shared), the linker notices that f is preemptible and will resolve the branch target to a PLT entry with a dynamic relocation R_*_JUMP_SLOT. The cost comes from two places:

  • The dynamic relocation requires a symbol search by the dynamic loader.
  • Every call site goes through a PLT indirection.

The third case is about an external call, where a symbol search cannot be prevented. But why do the first two cases (where f is defined in the same shared object) need a PLT? You may have read somewhere that any of the following linker options can avoid the PLT: -Bsymbolic, -Bsymbolic-functions, --dynamic-list. Read on.

Dynamic linking model in ELF

Since 2000-07-17, the ELF specification has said the following about the STV_DEFAULT visibility (this is the default visibility; you get it unless you do things like -fvisibility= or __attribute__((visibility(...)))):

"Global and weak symbols are also preemptable, that is, they may by [sic] preempted by definitions of the same name in another component."

In Chapter 5 Dynamic Linking, the specification says:

When the dynamic linker creates the memory segments for an object file, the dependencies (recorded in DT_NEEDED entries of the dynamic structure) tell what shared objects are needed to supply the program's services. By repeatedly connecting referenced shared objects and their dependencies, the dynamic linker builds a complete process image. When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on. Shared object files must be readable by the process; other permissions are not required.

The wording has remained unchanged since then, i.e. the evolution of dynamic linking has not been contributed back to the specification.

This paragraph is probably difficult to follow. Let me rephrase it with some additions of dynamic loader behaviors. The dynamic loader does one critical job: resolving dynamic relocations and binding symbol references from one component to another. (A component is an executable or shared object, sometimes called a module.) There is a flat namespace for symbol search. The dynamic loader computes a breadth-first search list (executable, needed0, needed1, needed2, needed0_of_needed0, needed1_of_needed0, ...). For each symbol reference, the dynamic loader iterates over the list and finds the first component which provides a definition. (For dlsym with an explicit handle, the symbol search uses the dependency order, a breadth-first search rooted at the handle.)

The implication is that STB_GLOBAL and STB_WEAK definitions are equivalent in terms of symbol search. An STB_WEAK definition can preempt an STB_GLOBAL definition.

While not mentioned in the ELF specification, many dynamic loader implementations allow the environment variable LD_PRELOAD to inject shared objects. The effect is as if the LD_PRELOAD list were inserted at the beginning of the executable's DT_NEEDED list. The search list may look like executable, preload0, preload1, needed0, needed1, needed2, needed0_of_preload0, ..., needed0_of_needed0, needed1_of_needed0, ... (If the program calls dlopen with RTLD_GLOBAL, the newly loaded component and its dependencies (if not loaded) will be appended to the list.) Here is the algorithm:

fn load(c) {
  ...
  if tail != null { tail->next = &c }
  tail = &c
}

head = &exe
tail = null
load(exe)
for c in LD_PRELOAD {
  load(c)
}
for (c = head; c; c = c->next) {
  for c1 in c.needed {
    // Skip a dependency that is already on the list.
    if c1.loaded { continue; }
    load(c1)
  }
}
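
The list construction can also be sketched as a small C model. This is only a toy under stated assumptions, not how any real dynamic loader is written: struct component, struct search_list, MAX_NEEDED, build_search_list and format_search_list are names invented for illustration, and mapping files into memory is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NEEDED 4 /* hypothetical cap on DT_NEEDED entries per component */

/* A component is an executable or a shared object. */
struct component {
  const char *name;
  struct component *needed[MAX_NEEDED]; /* DT_NEEDED entries, NULL-terminated */
  int loaded;
  struct component *next; /* next element of the search list */
};

struct search_list {
  struct component *head, *tail;
};

/* Append c to the search list unless it is already on it. */
static void load(struct search_list *l, struct component *c) {
  if (c->loaded)
    return;
  c->loaded = 1;
  c->next = NULL;
  if (l->tail)
    l->tail->next = c;
  else
    l->head = c;
  l->tail = c;
}

/* Executable first, then LD_PRELOAD entries, then DT_NEEDED entries. */
static void build_search_list(struct search_list *l, struct component *exe,
                              struct component **preload, size_t n_preload) {
  l->head = l->tail = NULL;
  load(l, exe);
  for (size_t i = 0; i < n_preload; i++)
    load(l, preload[i]);
  /* The list grows while it is walked, which yields breadth-first order. */
  for (struct component *c = l->head; c; c = c->next)
    for (size_t i = 0; i < MAX_NEEDED && c->needed[i]; i++)
      load(l, c->needed[i]);
}

/* Write the component names in search order, space-separated, into buf. */
static void format_search_list(const struct search_list *l, char *buf,
                               size_t size) {
  assert(size > 0);
  buf[0] = '\0';
  for (const struct component *c = l->head; c; c = c->next) {
    if (buf[0])
      strncat(buf, " ", size - strlen(buf) - 1);
    strncat(buf, c->name, size - strlen(buf) - 1);
  }
}
```

For an executable that needs a.so and b.so, a preloaded pre.so that needs pd.so, and a.so that needs ad.so and b.so, the formatted list reads exe pre.so a.so b.so pd.so ad.so: the preload's dependency comes after the executable's direct dependencies, and b.so appears only once.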

Note that the executable is always the first element of the search list, so a defined symbol of any binding in the executable cannot be preempted (interposed). In a shared object, a default visibility STB_GLOBAL or STB_WEAK symbol can be preempted (interposed) because an earlier component may define a symbol of the same name. It may be a bit surprising that a defined default visibility STB_GLOBAL symbol can be interposed.

Alternative symbol search models

Solaris names the above the default search model and introduced an alternative model: direct bindings. With -z defs, one can ensure the dependencies are provided as part of the link and all symbol references are satisfied. The linker can record the bound component for each symbol reference.

Here is an example from Solaris's Linkers and Libraries Guide:

$ elfdump -y W.so.2
[6] [ DEPEND DIRECT ] <self> a
[7] [ DEPEND LAZY DIRECT ] [1] w.so.1 b

With the information about the component name, the dynamic loader can speed up its symbol search by just looking at one component. In particular, frequently the bound component is the component itself.

In Mac OS X, the two-level namespace introduced in 10.1 (default unless you use ld -flat_namespace) is a similar model.

Prelink can be conceived as a direct binding model without great ergonomics.

The standard ELF specification defines DF_SYMBOLIC which can be conceived as a special case of direct bindings. When a shared object is marked as DF_SYMBOLIC (set by ld -Bsymbolic), the symbol search checks the shared object itself before starting the linear search from the executable. It is quite common for a shared object to call STV_DEFAULT definitions in itself. DF_SYMBOLIC can improve the performance greatly.
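
The difference between the default search and DF_SYMBOLIC can be modeled in a few lines of C. This is a simplified sketch, not loader code: the types and the resolve function are hypothetical names, the search list is assumed to be already computed, and symbol versioning and visibility are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum binding { BIND_GLOBAL, BIND_WEAK }; /* stand-ins for STB_GLOBAL/STB_WEAK */

struct definition {
  const char *name;
  enum binding binding;
};

struct component {
  const char *name;
  const struct definition *defs;
  size_t n_defs;
  int symbolic; /* models the DF_SYMBOLIC flag */
};

/* Returns 1 if c provides a definition of sym, of any binding. */
static int defines(const struct component *c, const char *sym) {
  for (size_t i = 0; i < c->n_defs; i++)
    if (strcmp(c->defs[i].name, sym) == 0)
      return 1;
  return 0;
}

/* Resolve a reference to sym made from component `from`.
   `list` holds the components in search order (executable first).
   Returns the defining component, or NULL if there is none. */
static const struct component *resolve(const struct component *const *list,
                                       size_t n,
                                       const struct component *from,
                                       const char *sym) {
  assert(list && from && sym);
  /* DF_SYMBOLIC: check the referencing shared object itself first. */
  if (from->symbolic && defines(from, sym))
    return from;
  /* Default model: the first definition wins; the binding is not consulted,
     so a weak definition found earlier preempts a global one found later. */
  for (size_t i = 0; i < n; i++)
    if (defines(list[i], sym))
      return list[i];
  return NULL;
}
```

With a list of exe, preload.so (weak f) and lib.so (global f), a reference to f from lib.so resolves to preload.so in the default model and to lib.so itself once lib.so is marked symbolic. References to f from other components still resolve to preload.so.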

-Bsymbolic

The linker option -Bsymbolic can be used together with -shared. ld -shared -Bsymbolic is very similar to -pie.

-Bsymbolic follows ELF DF_SYMBOLIC semantics: all defined symbols are non-preemptible. This can optimize relocation processing:

  • function calls: a branch instruction (e.g. call foo@PLT) will not create a PLT entry. The associated R_*_JUMP_SLOT dynamic relocation will be suppressed.
  • variable access and function addresses: the GOT entry will not cause an R_*_GLOB_DAT dynamic relocation. On x86-64, with R_X86_64_GOTPCRELX/R_X86_64_REX_GOTPCRELX, the GOT indirection code sequence can be rewritten. However, the code sequence is still longer than that without GOT. On PowerPC64, there is a similar TOC optimization. On other architectures, there is no difference in code sequences.

-fno-semantic-interposition can address the pessimization when the definition is in the same translation unit as the use site. Working at the shared object level, -Bsymbolic can address the cross-translation-unit pessimization, which -fno-semantic-interposition cannot optimize.

However, in practice, deployment of -Bsymbolic may run into pointer equality problems. Many objects in C++ are not clearly part of a single object file, but are required by the ODR to have a single definition. For example, C++ [dcl.inline]: "An inline function or variable with external or module linkage can be defined in multiple translation units ([basic.def.odr]), but is one entity with one address. A type or static variable defined in the body of such a function is therefore a single entity."

We will discuss variables and functions separately.

Pointer equality for variables

An inline variable with external linkage and a local static variable defined in an inline function with external linkage are required to be unique. The address of such a variable seen by a -Bsymbolic linked shared object may be different from the address seen from outside the shared object. Fortunately it is uncommon to export such a vague linkage variable to both the executable and a shared object.


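// a.h
inline int *addr() {
  static int data;
  return &data;
}

// A translation unit in the -Bsymbolic linked shared object
#include "a.h"
int *addr0 = addr();

// A translation unit outside the shared object
#include "a.h"
int *addr1 = addr();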

(ELF specific) In addition, a regular non-inline variable with external linkage can cause incompatibility problems due to copy relocations. GCC/Clang -fno-pic emit direct access relocations referencing a global variable. If the global variable turns out to be defined in a shared object, there will be a copy relocation in the executable. The object the shared object sees and the executable sees will be different.


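// Declaration shared by both translation units
void fun();

// A translation unit that defines fun and takes its address
void fun() {}
void *addr0 = (void *)&fun;

// Another translation unit that takes the address of fun
void *addr1 = (void *)&fun;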

In Clang, the direct access relocation can be avoided with -fno-pic -fno-direct-access-external-data. GCC feature request: PR98112.

See Copy relocations, canonical PLT entries and protected visibility for details.

(In C++, typeid() on an incomplete class can define a typeinfo name object. A -Bsymbolic linked shared object may see a different copy, but the address can hardly cause a problem.)

Pointer equality for functions

The address of an inline function seen by a -Bsymbolic linked shared object may be different from the address seen from outside the shared object. Fortunately such cases are rare. Windows link.exe has Identical COMDAT Folding. ELF/Mach-O programs may use -fvisibility-inlines-hidden. Code that assumes pointer equality is already broken by Identical COMDAT Folding and -fvisibility-inlines-hidden anyway.

In Mach-O, such symbols are placed into __LINKEDIT,__weak_binding so that dyld can coalesce the definitions across dylibs.

(ELF specific) In addition, a regular non-inline function with external linkage can cause incompatibility problems due to canonical PLT entries. GCC/Clang -fno-pic emit direct access relocations when taking the address of an external function. If the function turns out to be defined in a shared object, there will be a canonical PLT entry in the executable. The function address the shared object sees and the executable sees will be different.

I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100593 in the hope that the -fno-pic behavior of direct access can be dropped.

-Bsymbolic-functions

The function incompatibility problems are uncommon. It is often benign when the function address seen by a shared object is different from outside the shared object. However, the variable case is usually severe: the executable and a shared object may act on different copies of a variable supposed to be the same entity.

In practice, we can usually use the linker option -Bsymbolic-functions. The option applies to STT_FUNC symbols in ld.lld and non-STT_OBJECT symbols in GNU ld and gold, avoiding variable incompatibility problems. Though such cases are rare, it may make sense to add a linker option (say, -Bsymbolic-global-functions) which applies to STT_FUNC STB_GLOBAL symbols to bypass vague linkage STB_WEAK symbols.
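
The scope of the two options, as described above, can be written down as a small predicate. This is a sketch of the rule stated in the text, not code taken from any linker: the enums and the function name is_non_preemptible are invented for illustration, and visibility, dynamic lists and version scripts are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for STT_NOTYPE, STT_OBJECT and STT_FUNC. */
enum sym_type { SYM_NOTYPE, SYM_OBJECT, SYM_FUNC, SYM_TYPE_COUNT };
enum linker { LINKER_GNU, LINKER_LLD }; /* GNU ld and gold vs ld.lld */
enum symbolic_mode { MODE_DEFAULT, MODE_SYMBOLIC_FUNCTIONS, MODE_SYMBOLIC };

/* Returns true if a default visibility definition of the given type is
   treated as non-preemptible when linking a shared object in this mode. */
static bool is_non_preemptible(enum linker ld, enum symbolic_mode mode,
                               enum sym_type type) {
  assert(type < SYM_TYPE_COUNT);
  switch (mode) {
  case MODE_SYMBOLIC:
    return true; /* -Bsymbolic: all defined symbols */
  case MODE_SYMBOLIC_FUNCTIONS:
    /* ld.lld: STT_FUNC only; GNU ld and gold: everything but STT_OBJECT */
    return ld == LINKER_LLD ? type == SYM_FUNC : type != SYM_OBJECT;
  case MODE_DEFAULT:
    break;
  }
  return false; /* plain -shared: definitions stay preemptible */
}
```

The only place where the two linker families disagree in this model is a symbol that is neither STT_FUNC nor STT_OBJECT, such as an STT_NOTYPE label defined in assembly.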

Relation with -fvisibility=protected

A non-default visibility symbol cannot be preempted, even if the binding is STB_WEAK. -fvisibility=protected makes all definitions protected and thus non-preemptible, so -fno-semantic-interposition and -Bsymbolic bring no additional performance benefit. Note: if you want a definition to be preemptible, you will need a default visibility attribute, even if it is weak (e.g. __attribute__((weak,visibility("default")))).
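
A short C sketch of how the attributes might be combined in one translation unit. The function names are hypothetical, and the comments describe the behavior expected when the file is compiled with GCC or Clang into an ELF shared object; other targets may treat protected visibility differently.

```c
#include <assert.h>

/* Weak and explicitly default: stays preemptible even when the file is
   built with -fvisibility=protected or -fvisibility=hidden. */
__attribute__((weak, visibility("default"))) int log_level(void) { return 0; }

/* Protected: exported, but references from within the same component
   bind to this definition. */
__attribute__((visibility("protected"))) int clamp_level(int level) {
  assert(level >= 0);
  return level > 3 ? 3 : level;
}

/* Hidden: not exported from the shared object at all. */
__attribute__((visibility("hidden"))) int fallback_level(void) { return 2; }

/* Only the call to log_level may be redirected to another component. */
int effective_level(void) {
  int level = log_level();
  return clamp_level(level ? level : fallback_level());
}
```

If nothing earlier in the search list defines log_level, effective_level falls back to the hidden helper and returns 2.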

However, -fvisibility=protected shares the same problem with -Bsymbolic: too coarse-grained. It can cause the same sets of problems as discussed above in Pointer equality for variables.

In GCC/binutils's x86 port, there is another STT_OBJECT issue resulting in poor Clang interoperability.

% cat a.s
leaq foo(%rip), %rax

.data
.global foo
.protected foo
foo:
% gcc -fuse-ld=bfd -shared a.s
/usr/bin/ld.bfd: /tmp/ccchu3Xo.o: relocation R_X86_64_PC32 against protected symbol `foo' can not be used when making a shared object
/usr/bin/ld.bfd: final link failed: bad value
collect2: error: ld returned 1 exit status

See Copy relocations, canonical PLT entries and protected visibility for details. There is no problem when you only use Clang and LLD.

-fvisibility=hidden makes all definitions hidden and thus non-preemptible, so -fno-semantic-interposition brings no additional performance benefit.

-fvisibility=hidden requires annotation of exported symbols (__attribute__((visibility("default")))). The explicit annotation sometimes makes it inconvenient to split and join libraries.

However, projects with Windows portability in mind will define macros to dispatch to either the visibility attribute or __declspec(dllexport).
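
Such a macro commonly looks like the following sketch. MYLIB_EXPORT and the mylib_ functions are hypothetical names; the file is assumed to be compiled with -fvisibility=hidden when building the library on ELF or Mach-O targets.

```c
#include <assert.h>

/* MYLIB_EXPORT is a hypothetical macro name, defined by the project. */
#if defined(_WIN32)
#define MYLIB_EXPORT __declspec(dllexport)
#else
#define MYLIB_EXPORT __attribute__((visibility("default")))
#endif

/* Not annotated: hidden under -fvisibility=hidden, so calls to it from
   within the library need neither a PLT entry nor a symbol search. */
int mylib_gcd_unchecked(int a, int b) {
  while (b != 0) {
    int t = a % b;
    a = b;
    b = t;
  }
  return a;
}

/* Annotated: part of the exported interface on every platform. */
MYLIB_EXPORT int mylib_gcd(int a, int b) {
  assert(a > 0 && b > 0);
  return mylib_gcd_unchecked(a, b);
}
```

A project that already carries such annotations for Windows has done most of the work needed to build with -fvisibility=hidden elsewhere.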

Interaction with LD_PRELOAD

There are several types of LD_PRELOAD usage.

First, use LD_PRELOAD=same_soname.so to replace a DT_NEEDED entry with the same SONAME. Both -fno-semantic-interposition and -Bsymbolic are compatible with such usage.

Second, use LD_PRELOAD=malloc.so to intercept some functions not defined in the application or any of its shared object dependencies. Both -fno-semantic-interposition and -Bsymbolic are compatible.

void *f() { return malloc(0xb612); }

Third, use LD_PRELOAD=different_soname.so to replace a function defined in a shared object dependency and the SONAME is different. (This usage is unlikely compatible with C++'s one definition rule.) Such usage is incompatible with -Bsymbolic and -fno-semantic-interposition.

The Last Alliance of ELF and Men

I wish that distributions would default to -fno-semantic-interposition and (in the long term) a variant of -Wl,-Bsymbolic-functions, bringing back performance that has been lost for decades. We can start with a configure-time option, like GCC's --enable-default-pie.

Such interposition doesn't work on macOS (by default) and Windows, so there is a good chance that most pieces of portable software are already in a good state. However, I can imagine that there is still a decent amount of work in annotating software which cannot be built with -fno-semantic-interposition or -Wl,-Bsymbolic-functions. Distributions will need to put in resources (likely fewer than for the -fno-pic to -fPIE transition (GCC's --enable-default-pie)).

There is a trade-off and the downside is that LD_PRELOAD replacing a fragment of a shared object will be more difficult. In some rare cases the user may need LD_PRELOAD: sometimes as a workaround for some broken software. I feel that distributions should not provide such flexibility by default at such a great cost. The users can build the software by themselves.

We need a linker option to cancel default -Bsymbolic-functions. I have added -Bno-symbolic to GNU ld and gold (binutils 2.37; PR27834) and ld.lld 13.

We need a -Bsymbolic-functions variant which only applies to STB_GLOBAL symbols (i.e. STB_WEAK symbols are excluded). The address of an inline function is required to be unique in C++.

(From Peter Smith) The linker can introduce a debugging option for executables to catch accidental interposition, say, --warn-interposition: "Warning symbol S of type STT_FUNC is defined in executable A and shared objects B and C, using definition in A."

We need an option to disable interposition for functions but enable interposition for variables, because we want to be compatible with copy relocations, which will require years to fix. GCC feature request.

GCC -fno-pic should be fixed to use the GOT when taking the address of an external default visibility function. PR100593.


Source: http://maskray.me/blog/2021-05-16-elf-interposition-and-bsymbolic