lld 15 ELF changes
2022-09-05 15:00:00 Author: maskray.me

llvm-project 15 will be released soon. I added some lld/ELF notes to https://github.com/llvm/llvm-project/blob/release/15.x/lld/docs/ReleaseNotes.rst. Here I will elaborate on some changes.

  • --package-metadata= has been added to create package metadata notes. (D131439) Fedora uses this to embed an allocable section in a linked image so that core files carry the information. See Package Metadata for Core Files.
  • -z pack-relative-relocs is now available to support DT_RELR for glibc 2.36+. (D120701) There is an exciting size reduction technique called DT_RELR which is finally available for Linux distributions. It took me a lot of energy to lobby for the feature. The benefit will pay off. See Relative relocations and RELR.
  • --no-fortran-common (pre 12.0.0 behavior) is now the default. --fortran-common matches GNU ld behavior but is more likely to cause problems when users mix COMMON and STB_GLOBAL symbols. See All about COMMON symbols for details.
  • --load-pass-plugin has been added to load a new pass manager plugin. (D120490)
  • --android-memtag-{mode=,stack,heap} have been added to synthesize SHT_NOTE for memory tags on Android. (D119384)
  • FORCE_LLD_DIAGNOSTICS_CRASH environment variable is now available to force LLD to crash. (D128195) I don't like this option, but I see that it can be useful for testing the crash reporting behavior. For example, Clang Driver may rerun the link with --reproduce= to get a reproduce tarball.
  • --wrap semantics have been refined. (rG7288b85cc80f1ce5509aeea860e6b4232cd3ca01) (D118756) (D124056) --wrap is quite complex, and previous releases have repeatedly tuned it. At this point I am fairly confident that the ld.lld behavior is the desired one. Where ld.lld and GNU ld differ, I am confident that GNU ld's behavior is not good enough :)
  • --build-id={md5,sha1} are now implemented with truncated BLAKE3. (D121531) The C implementation of BLAKE3 was imported into llvm-project. It is superior to MD5 and SHA1. We can drop the slow MD5 and SHA1 llvm-project implementations.
  • --emit-relocs: .rel[a].eh_frame relocation offsets are now adjusted. (D122459) The use case is rare. I recently learned that gold crashes in this use case: https://sourceware.org/bugzilla/show_bug.cgi?id=25968.
  • --emit-relocs: fixed missing STT_SECTION when the first input section is synthetic. (D122463)
  • (TYPE=<value>) can now be used in linker scripts. (D118840) The syntax can be used to create an output section of the specified type. With data commands, we can create an output section even if there is no corresponding input section.
  • Local symbol initialization is now performed in parallel. (D119909) (D120626)
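To make the DT_RELR size reduction mentioned above more concrete, here is a minimal sketch of the RELR encoding for 64-bit targets. The function names `encode_relr` and `decode_relr` are illustrative and not part of lld. The sketch assumes the offsets are even, which the format requires because the least significant bit distinguishes an address entry from a bitmap entry.

```python
WORD = 8               # size of a relocated word on a 64-bit target
NBITS = WORD * 8 - 1   # each bitmap entry describes the next 63 words

def encode_relr(offsets):
    """Encode relative relocation offsets as a list of RELR entries."""
    offsets = sorted(set(offsets))
    out = []
    i, n = 0, len(offsets)
    while i < n:
        if offsets[i] % 2:
            raise ValueError("RELR cannot represent odd offsets")
        # An even entry is an address: relocate the word at that offset.
        out.append(offsets[i])
        base = offsets[i] + WORD
        i += 1
        # Fold following offsets into bitmap entries while possible.
        while True:
            bitmap = 0
            while i < n:
                d = offsets[i] - base
                if d >= NBITS * WORD or d % WORD:
                    break
                bitmap |= 1 << (d // WORD)
                i += 1
            if bitmap == 0:
                break
            # An odd entry is a bitmap; bit k+1 covers the word at base + k*WORD.
            out.append((bitmap << 1) | 1)
            base += NBITS * WORD
    return out

def decode_relr(entries):
    """Expand RELR entries back into the list of relocated offsets."""
    offsets = []
    where = 0
    for entry in entries:
        if entry & 1 == 0:
            offsets.append(entry)
            where = entry + WORD
        else:
            bits = entry >> 1
            k = 0
            while bits:
                if bits & 1:
                    offsets.append(where + k * WORD)
                bits >>= 1
                k += 1
            where += NBITS * WORD
    return offsets

# 100 consecutive pointers, e.g. a large table of addresses.
offsets = [0x1000 + WORD * k for k in range(100)]
entries = encode_relr(offsets)
print(len(entries) * 8, "bytes as RELR vs", len(offsets) * 24, "bytes as Elf64_Rela")
```

Runs of adjacent pointers, which are common in vtables and GOT-like data, collapse into one address entry plus a few bitmaps. That is where most of the savings come from.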

Breaking changes

  • Archives are now parsed as --start-lib object files. If a member is neither an ELF relocatable object file nor an LLVM bitcode file, ld.lld will give a warning. (D119074)
  • The GNU ld incompatible --no-define-common has been removed.
  • The obscure -dc/-dp options have been removed. (D119108)
  • -d is now ignored.
  • If a prevailing COMDAT group defines an STB_WEAK symbol, having an STB_GLOBAL symbol in a non-prevailing group is now rejected with a diagnostic. (D120626)
  • Support for the legacy .zdebug format has been removed. Run objcopy --decompress-debug-sections in case old object files use .zdebug. (D126793)
  • --time-trace-file=<file> has been removed. Use --time-trace=<file> instead. (D128451)
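For readers unsure whether their old object files are affected by the .zdebug removal, the following sketch shows what the legacy payload looks like, as I understand the GNU format: the 4-byte magic `ZLIB`, an 8-byte big-endian uncompressed size, then a zlib stream. The helper names are hypothetical, and the sketch works on raw section contents rather than parsing an ELF file. objcopy remains the practical tool for converting real objects.

```python
import struct
import zlib

def make_zdebug(data: bytes) -> bytes:
    """Build a legacy .zdebug-style payload (for illustration only)."""
    return b"ZLIB" + struct.pack(">Q", len(data)) + zlib.compress(data)

def decompress_zdebug(payload: bytes) -> bytes:
    """Recover the uncompressed section contents from a legacy payload."""
    if payload[:4] != b"ZLIB":
        raise ValueError("not a legacy .zdebug payload")
    (size,) = struct.unpack(">Q", payload[4:12])
    data = zlib.decompress(payload[12:])
    if len(data) != size:
        raise ValueError("uncompressed size does not match the header")
    return data

section = b"\x00" * 64 + b"example debug bytes"
assert decompress_zdebug(make_zdebug(section)) == section
```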

Speed

I benchmark a -DCMAKE_BUILD_TYPE=Release -DCMAKE_EXE_LINKER_FLAGS=-Wl,--push-state,$HOME/Dev/mimalloc/out/release/libmimalloc.a,--pop-state -DLLVM_ENABLE_PROJECTS='clang;lld' -DLLVM_TARGETS_TO_BUILD=X86 -fPIC -pie build. (Compared with glibc malloc, linking against libmimalloc.a is 1.12x as fast.) The host compiler is a close-to-main clang. Both input and output are in tmpfs.

I have made changes scattered across the lld/ELF codebase to improve performance.

Linking a -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON build of clang:

% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{14,15}/out/release/bin/ld.lld" @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 987.4 ms ± 10.0 ms [User: 1230.7 ms, System: 491.6 ms]
Range (min … max): 966.7 ms … 1009.1 ms 16 runs

Benchmark 2: numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 952.0 ms ± 7.5 ms [User: 1231.7 ms, System: 522.3 ms]
Range (min … max): 934.1 ms … 962.3 ms 16 runs

Summary
'numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8' ran
1.04 ± 0.01 times faster than 'numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8'

Linking a -DCMAKE_BUILD_TYPE=Debug build of clang:

% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{14,15}/out/release/bin/ld.lld" @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 3.839 s ± 0.027 s [User: 7.407 s, System: 1.838 s]
Range (min … max): 3.786 s … 3.877 s 16 runs

Benchmark 2: numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 3.451 s ± 0.016 s [User: 7.145 s, System: 1.879 s]
Range (min … max): 3.416 s … 3.472 s 16 runs

Summary
'numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8' ran
1.11 ± 0.01 times faster than 'numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8'

(--threads=1 => 1.04x (7.449s => 7.169s), --threads=2 => )

Linking a default build of chrome:

% hyperfine --warmup 2 --min-runs 16 "numactl -C 20-27 "/tmp/llvm-{14,15}/out/release/bin/ld.lld" @response.txt --threads=8"
Benchmark 1: numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 5.488 s ± 0.033 s [User: 5.751 s, System: 2.661 s]
Range (min … max): 5.424 s … 5.543 s 16 runs

Benchmark 2: numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8
Time (mean ± σ): 4.912 s ± 0.030 s [User: 5.418 s, System: 2.632 s]
Range (min … max): 4.864 s … 4.961 s 16 runs

Summary
'numactl -C 20-27 /tmp/llvm-15/out/release/bin/ld.lld @response.txt --threads=8' ran
1.12 ± 0.01 times faster than 'numactl -C 20-27 /tmp/llvm-14/out/release/bin/ld.lld @response.txt --threads=8'

(--threads=4 => 1.11x (5.744s => 5.170s), --threads=2 => 1.09x (6.349s => 5.828s))
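The speedup figures can be rechecked from the mean wall times quoted above, since each one is the llvm-14 mean divided by the llvm-15 mean. The sketch below only redoes that arithmetic. The dictionary labels are mine, and the incomplete --threads=2 figure for the Debug clang link is left out.

```python
# (llvm-14 mean, llvm-15 mean) in seconds, copied from the results above.
means = {
    "clang Release+assertions, --threads=8": (0.9874, 0.9520),
    "clang Debug, --threads=8": (3.839, 3.451),
    "clang Debug, --threads=1": (7.449, 7.169),
    "chrome, --threads=8": (5.488, 4.912),
    "chrome, --threads=4": (5.744, 5.170),
    "chrome, --threads=2": (6.349, 5.828),
}

def speedup(old: float, new: float) -> float:
    """Ratio of the old mean time to the new mean time."""
    return old / new

ratios = {name: round(speedup(old, new), 2) for name, (old, new) in means.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```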


Source: https://maskray.me/blog/2022-09-05-lld-15-elf-changes