LLD is the LLVM linker. It started at the end of 2011 as a work-in-progress rewrite of ld64 for the Mach-O binary format, based on the atom model. COFF and ELF ports based on the atom model were contributed subsequently. They shared one symbol resolution model. (IMO, .subsections_via_symbols was invented because of Mach-O's unfortunate limitation of 255 sections. The atom model was an incarnation of the concept, but it did not fit ELF/PE, where sections are the better basic units.)
In 2015, both COFF and ELF ports were rewritten. (See "LLD improvement plan") Today, LLD is a mature and fast linker supporting multiple binary formats (ELF, Mach-O, PE/COFF, WebAssembly). FreeBSD, Android, and Chrome OS have adopted it as the main linker.
As a main contributor of LLD's ELF port who has fixed numerous corner cases in recent years, I consider that its x86-64 support has been mature since the 8.0.0 release and in great shape since 9.0.0. The AArch64 and PowerPC32/PowerPC64 support has been great since the 10.0.0 release. The 11.0.0 release has very solid linker script support. (When people complain that a GNU ld linker script is not immediately usable with LLD, it is almost assuredly a problem in the script itself.) So, what's next? Build glibc with LLD!
glibc is known for tricks used here and there and tons of GNU extensions which challenge a "foreign" toolchain like llvm-project (Clang, LLD, etc). I just expected quirky linker usage which should be fixed on glibc's side, not anything to improve on LLD's side :)
My adventure concluded with the final toggle configure: Allow LD to be LLD 13.0.0 or above [BZ #26558]. The next release glibc 2.35 should be buildable with LLD 13.0.0, with all tests passing on aarch64/i386/x86-64. (I lied. It seems that you cannot assume all tests pass with GNU ld. On Debian and its derivatives, you may observe more failures than Fedora. Anyway, I just wanted to say LLD does not have more failures than GNU ld.)
Read on.
Build
librtld.map
There is a bootstrapping problem between ld.so and libc because they are separate. In a nutshell, elf/Makefile performs the following steps to build elf/ld.so:
- Create elf/libc_pic.a from libc .os files
- Create elf/dl-allobjs.os from a relocatable link of rtld .os files
- Create the link map elf/librtld.map from a relocatable link of elf/dl-allobjs.os, elf/libc_pic.a, and -lgcc
- Get a list of extracted archive members (elf/librtld.mk) from elf/librtld.map and create elf/rtld-libc.a
- Create elf/librtld.os from a relocatable link of elf/dl-allobjs.os and elf/rtld-libc.a
- Create elf/ld.so from a -shared link of elf/librtld.os with the version script ld.map
In a link map printed by GNU ld, the header Archive member included to satisfy reference by file (symbol) is followed by the extracted archive members. elf/Makefile made use of sed -n 's@^$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members. LLD doesn't implement the Archive member included to satisfy reference by file (symbol) header. Fortunately, LLD's output has lines like

    1f350 1f350 1e 16 /home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)

We can use sed -n 's@^[0-9a-f ]*$(common-objpfx)\([^(]*\)(\([^)]*\.os\)) *.*$$@\1 \2@p' to extract the archive members.
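As a rough illustration of what the relaxed pattern matches, here is a minimal Python sketch that mirrors the sed expression. The function name archive_member and the objpfx argument (standing in for $(common-objpfx)) are assumptions made for this example; the sample line is the LLD output shown above.

```python
import re


def archive_member(line, objpfx):
    """Return (archive, member) if the link-map line names an extracted
    .os archive member under objpfx, else None."""
    # Optional leading hex columns (present in LLD's map output), then the
    # build prefix, then "archive(member.os)".
    pattern = r"^[0-9a-f ]*" + re.escape(objpfx) + r"([^(]*)\(([^)]*\.os)\)"
    m = re.match(pattern, line)
    return (m.group(1), m.group(2)) if m else None


sample = ("1f350 1f350 1e 16 "
          "/home/maskray/Dev/glibc/out/lld/elf/rtld-libc.a(rtld-access.os):(.text)")
print(archive_member(sample, "/home/maskray/Dev/glibc/out/lld/"))
```

Because the hex-column prefix is optional, the same pattern also accepts lines that start directly with the build prefix.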
scripts/output-format.sed
libc.so is a linker script. On Debian GNU/Linux, it looks like:

    % cat /lib/x86_64-linux-gnu/libc.so
The idea is that -lc can expand to something like -( libc.so.6 libc_nonshared.a --push-state --as-needed ld-linux-x86-64.so.2 --pop-state -). libc_nonshared.a contains functions which should be statically linked. ld-linux-x86-64.so.2 is mostly for __tls_get_addr, which is used by the general-dynamic/local-dynamic TLS models. Commit d3f5f87569398d11756b3dcb7a66926bfd8ee047 (in 2015) added AS_NEEDED with no description of the purpose. In retrospect, this can make a difference to ld.so performance when an executable has O(1000) shared object dependencies, because the overall shared object uniqueness check has quadratic time complexity.
The first non-comment line is an OUTPUT_FORMAT command, which is derived from the output of ld --verbose. In GNU ld, --verbose prints the internal linker script, which is used when an external one (-T) is not specified.

    ...
Makerules extracted the OUTPUT_FORMAT line with a frightening sed script:

    /ld.*[ ]-E[BL]/b f
LLD does not have an internal linker script, so libc.so did not have the OUTPUT_FORMAT line. (Personally, I think an internal linker script is not useful. It would have some expository value, but the language is not powerful enough to encode all built-in logic. If LLD were to support the feature, we would need to emit a lot of conditional code, which would add a huge maintenance burden.)
Inspired by a Linux kernel usage, I realized that there is a better way to get the output format (bfdname): we can just parse the output of objdump -f.

    % objdump -f elf/ld.so
llvm-objdump -f used to print upper-case output formats; I switched the case in D76046.
Fixed by install: Replace scripts/output-format.sed with objdump -f [BZ #26559].
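To make the objdump -f idea concrete, here is a small Python sketch that pulls the bfdname out of the "file format" line. The sample text is an assumption modeled on typical GNU objdump output (the flags and start address are made up), and the helper name output_format is hypothetical. Lower-casing the result is a defensive choice for tools that print the name in upper case.

```python
import re


def output_format(objdump_f_text):
    """Extract the bfdname from the 'file format' line of objdump -f output."""
    m = re.search(r"file format (\S+)", objdump_f_text)
    # Normalize case so that an upper-case spelling yields the same name.
    return m.group(1).lower() if m else None


# Illustrative sample only; the real output depends on the file and the tool.
sample = """
elf/ld.so:     file format elf64-x86-64
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001100
"""
print(output_format(sample))
```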
elf/Makefile specified -Wl,--defsym=malloc=0 and other malloc.os definitions before libc_pic.a so that libc_pic.a(malloc.os) is not extracted. This trick was used to avoid multiple definition errors.
For the interaction between a linker option and an input file, LLD generally chooses the behavior so that their relative order does not matter. Some options are inherently order dependent, e.g. --as-needed and --no-as-needed, --whole-archive and --no-whole-archive. However, reducing order dependence can improve the robustness of a build system.
I had a debate with others and finally noticed one point: --defsym defines a SHN_ABS symbol, while a normal definition is relative to the image base. So a normal definition is better regardless.
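The difference between the two kinds of definition can be sketched with a little arithmetic. This is a simplified Python model, not loader code: SHN_ABS is the reserved ELF section index for absolute symbols, while the load base, the section index, and the function name runtime_address are hypothetical values chosen for the example.

```python
SHN_ABS = 0xFFF1  # ELF reserved section index marking an absolute symbol


def runtime_address(st_value, st_shndx, load_base):
    """Absolute symbols keep st_value; section-relative ones move with the image."""
    if st_shndx == SHN_ABS:
        return st_value
    return load_base + st_value


LOAD_BASE = 0x555555554000  # hypothetical load base of a position-independent image
TEXT_INDEX = 12             # hypothetical section index of .text

# A symbol such as the one created by --defsym=malloc=0 stays at 0.
print(hex(runtime_address(0, SHN_ABS, LOAD_BASE)))
# A normal definition in .text follows the image wherever it is loaded.
print(hex(runtime_address(0x1234, TEXT_INDEX, LOAD_BASE)))
```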
I sent the patch in April 2020 and pinged once in August. Since nobody responded, I sent it again in December. Finally, this issue was fixed by elf: Replace a --defsym trick with an object file to be compatible with LLD.
-static-pie and __rela_iplt_start/__rela_iplt_end
glibc's static non-pie and static pie modes take different code paths for resolving R_*_IRELATIVE relocations. Its static pie mode expects that __rela_iplt_start==__rela_iplt_end, which is satisfied if ld leaves the symbols zero. Before 13.0.0, LLD defined __rela_iplt_start/__rela_iplt_end for -pie and therefore broke the glibc loader's assumption, causing a program to crash. See GNU indirect function for details about the encapsulation symbols.
I made good arguments, but they were dismissed with "Ulrich and I designed/implemented IFUNC on x86 and for x86. I consider x86 implementation of IFUC as the gold standard."
In the end, I conceded, with grumbling, and changed LLD to define the two encapsulation symbols only for -no-pie. Otherwise, a static pie produced by {gcc,clang} -fuse-ld=lld -static-pie would crash, even if the glibc used was built with GNU ld.
_GLOBAL_OFFSET_TABLE_[0]
BZ #28203 for aarch64.
In nearly every ELF port of GNU ld, _GLOBAL_OFFSET_TABLE_[0] is the link-time address of _DYNAMIC (the start of .dynamic/PT_DYNAMIC). In glibc, sysdeps/*/dl-machine.h files used this approach to compute the load base (the virtual address of the ELF header).
    runtime_DYNAMIC = PC relative address of _DYNAMIC
So you may ask: why can't glibc extract the p_vaddr field of the PT_DYNAMIC program header? Well, its code is poorly organized, which makes this elegant solution difficult...
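The arithmetic behind the trick is a single subtraction, sketched here in Python with made-up addresses. The function name load_base and all numeric values are hypothetical. The last lines assume that an entry the linker does not fill in reads as zero, which shows why the computation depends on the linker's cooperation.

```python
def load_base(runtime_dynamic, link_time_dynamic):
    """Load base = runtime address of _DYNAMIC minus its link-time address."""
    return runtime_dynamic - link_time_dynamic


# Hypothetical values for illustration only.
GOT0 = 0x3E00                             # _GLOBAL_OFFSET_TABLE_[0]: link-time _DYNAMIC
RUNTIME_DYNAMIC = 0x7F1234560000 + 0x3E00  # what a PC-relative access would yield

base = load_base(RUNTIME_DYNAMIC, GOT0)
print(hex(base))

# Assumption: an unset _GLOBAL_OFFSET_TABLE_[0] reads as zero. The same
# arithmetic then returns the runtime address of _DYNAMIC, not the load base.
wrong = load_base(RUNTIME_DYNAMIC, 0)
print(hex(wrong))
```

Substituting the p_vaddr of PT_DYNAMIC for the table entry would give the same result, since both hold the link-time address of _DYNAMIC.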
Due to the glibc requirement, _GLOBAL_OFFSET_TABLE_[0] has unfortunately become a part of the i386/x86-64 and PowerPC64 ELFv2 ABIs.
LLD's AArch64 port does not set _GLOBAL_OFFSET_TABLE_[0], so the trick does not work. I figured out an elegant fix that does not require updating LLD:
In 2012, GNU ld and gold (included in binutils 2.23) started to define __ehdr_start, which has the link-time address zero. Using a PC relative code sequence to take the runtime address of __ehdr_start gives us a better way to get the load base. I submitted patches to use the approach for aarch64/arm/riscv/i386/x86-64. For the aarch64 code, I originally intended to use inline assembly to avoid relying on the compiler generating PC-relative addressing for hidden symbol access, but Szabolcs Nagy recommended the pure C approach.
I used this as an argument to mark LLD PR49672 (reported by the glibc creator :)) as wontfix. This also kept the AArch64 ABI from being polluted by glibc's _GLOBAL_OFFSET_TABLE_[0] requirement.
Fixed by aarch64: Make elf_machine_{load_address,dynamic} robust [BZ #28203].
riscv: Drop reliance on _GLOBAL_OFFSET_TABLE_[0]
x86_64: Simplify elf_machine_{load_address,dynamic}
i386: Port elf_machine_{load_address,dynamic} from x86-64
Non-default version symbols
If linked with LLD before 13.0.0 or gold (PR28196), the built ld.so will define the same symbol twice: once with the non-default version @GLIBC_2.2.5 and once with the default version @@GLIBC_2.2.5. This is due to an unfortunate GNU as/GNU ld implementation flaw of symbol versioning which happens to be relied upon by glibc.
    GLIBC_2.2.5 {
Anyway, I conceded and implemented [ELF] Combine foo@v1 and foo with the same versionId if both are defined so as not to cause confusion in this unfortunate corner case.
See the discussion on defined non-default version symbols on All about symbol versioning.
In addition, I implemented [ELF] Apply version script patterns to non-default version symbols so that local: works correctly.
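For readers unfamiliar with the notation, name@version denotes a non-default version and name@@version the default one. The following Python sketch only illustrates that notation and how one might spot a symbol defined in both forms for the same version. It is not LLD's implementation, and the symbol names foo and bar as well as both function names are hypothetical.

```python
from collections import defaultdict


def parse_versioned(name):
    """Split 'foo@v1', 'foo@@v1' or 'foo' into (base, version, is_default)."""
    if "@@" in name:
        base, version = name.split("@@", 1)
        return base, version, True
    if "@" in name:
        base, version = name.split("@", 1)
        return base, version, False
    return name, None, False


def doubly_defined(names):
    """Return (base, version) pairs defined both as default and non-default."""
    seen = defaultdict(set)
    for name in names:
        base, version, is_default = parse_versioned(name)
        if version is not None:
            seen[(base, version)].add(is_default)
    return sorted(key for key, kinds in seen.items() if kinds == {True, False})


symbols = ["foo@GLIBC_2.2.5", "foo@@GLIBC_2.2.5", "bar@@GLIBC_2.2.5"]
print(doubly_defined(symbols))
```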
Tests
With the aforementioned issues addressed, an LLD-linked glibc was fully functional. However, the glibc maintainers would not accept allowing LLD in configure.ac until all tests passed. So I had to fix the following issues.
.tls_common
The LLVM integrated assembler does not support this assembler directive. LLD does not support SHN_COMMON STT_TLS. The ELF standard common symbol uses STT_COMMON, so it would be incompatible with STT_TLS.
elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205] fixed the issue.
AArch64's general-dynamic/local-dynamic TLS models
AArch64 toolchains use TLS descriptors by default. For AArch64, LLD supports the TLS descriptor relocation types but not the relocation types for general-dynamic/local-dynamic (R_AARCH64_TLSGD_*/R_AARCH64_TLSLD_*). In addition, the LLVM integrated assembler doesn't support the general-dynamic/local-dynamic modifiers. So the following tests did not build with Clang or LLD.
elf/tst-tls1.c
elf/tst-tls2.c
elf/tst-tls3.c
elf/tst-tlsmod1.c
elf/tst-tlsmod2.c
elf/tst-tlsmod3.c
elf/tst-tlsmod4.c
elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205] fixed the issue.
As a follow-up, I pushed Remove sysdeps/*/tls-macros.h. These files may have educational value for future architectures about how to implement TLS models :)
--no-tls-get-addr-optimize
GNU ld's PowerPC64 port has implemented a general-dynamic/local-dynamic TLS model optimization. LLD doesn't support --{,no-}tls-get-addr-optimize.
powerpc: Use --no-tls-get-addr-optimize in test only if the linker supports it skipped the test if necessary.
--audit and --depaudit
gold and LLD do not support --audit or --depaudit.
elf: Skip tst-auditlogmod-* if the linker doesn't support --depaudit [BZ #28151] skipped the tests if necessary.
ifunc resolver calls a lazy binding PLT
For the two tests sysdeps/x86/tst-ifunc-isa-*, an ifunc resolver calls a lazy binding PLT. GNU ld's x86-64 port places the R_*_IRELATIVE relocation after the PLT's R_*_JUMP_SLOT relocation, so that the program can work without an ifunc scheduler. This is GNU ld doing something to make a special case work. The right approach is for glibc to fix the issue systematically.
I just XFAILed the two tests.
Epilogue
It is one giant step for me, but just one small step for glibc. When my ambition of building glibc with LLD started in April 2020, I quickly felt frustrated due to the lack of review. For my configure: Allow LD to be LLD 9.0.0 or above patch series, I could not tell whether this was due to an objection to a non-GNU toolchain or just adherence to strict requirements. I have since been told that, unlike in the past, nobody is actively blocking support for toolchains besides GCC/binutils.
I tried to recover after a few months: "let me try again". I had put this work aside until someone on IRC told me that they were interested in making glibc static pie work with LLD. I am glad that I picked up the patches, persisted, and finally finished the work at the end of August 2021. Porting does not necessarily mean increased complexity. I actually happened to remove quite a few quirks from the code base.
Reviewer resources are never abundant. That said, perhaps a little bit more amicability could have made me feel better.
I hope that Clang will be able to build glibc some day and that the glibc community can advertise that Clang is fully supported. GCC and binutils-gdb have been buildable with Clang for a long time. Many GCC/binutils-gdb contributors are even happy to make Clang portability fixes. I hope that the GNU and LLVM communities can have more collaboration.
So, dear glibc, will you be happy with my sending Clang patches? :)
Current state
- aarch64/x86-32/x86-64: on par with GNU ld
- powerpc64le: 8 more failures