Before September 2020, FreeBSD could only be built on a FreeBSD host. Alexander Richardson did a lot of work to make building on other hosts possible: https://wiki.freebsd.org/BuildingOnNonFreeBSD.
Get prebuilt Clang and LLD
I use prebuilt Clang and LLD from Chromium: https://chromium.googlesource.com/chromium/src/tools/clang/+/refs/heads/main/scripts/update.py. My output directory is ~/Stable.
Build
FreeBSD src enables -Werror by default. Our Clang is new and may have many new diagnostics. We can use -DWITHOUT_WERROR to drop -Werror.
Now kick off our build.
    mkdir -p obj/default
tools/build/make.py forwards unknown options to bmake.
It may stop at some step due to various non-hermetic issues.
Search for .OBJDIR in man bmake to learn MAKEOBJDIRPREFIX. bmake comes with built-in support for separate src/obj trees. The recursive makefile build system used by FreeBSD does not handle relative MAKEOBJDIRPREFIX paths well, so just use an absolute path.
Build compile_commands.json
I tried Bear but it could not handle such a complex build system. The process got stuck at some step, so I just gave up on it.
bmake meta mode
I learned a bit about bmake (the default make on FreeBSD) and noticed a nice feature: meta mode. In meta mode bmake records build commands into .meta files. For the next build, bmake will consult .meta files to evaluate whether the target has become out-of-date. This is more robust than just comparing file modification times. Build Systems à la Carte says such a build system is self-tracking.
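To make the self-tracking idea concrete, here is a minimal Python sketch of an out-of-date check that also looks at the recorded command. It is a simplified model, not bmake's actual logic; the function name and parameters are hypothetical.

```python
def needs_rebuild(target_mtime, source_mtimes, command, recorded_command):
    """Decide whether a target is out-of-date (simplified, hypothetical model).

    target_mtime: modification time of the target, or None if it does not exist.
    source_mtimes: modification times of the prerequisites.
    command: the command the build system would run now.
    recorded_command: the command recorded by the previous build, or None.
    """
    if target_mtime is None:
        return True  # never built
    if recorded_command != command:
        return True  # flags or tools changed: timestamps alone would miss this
    # Classic make behavior: rebuild if any prerequisite is newer than the target.
    return any(mtime > target_mtime for mtime in source_mtimes)
```

A timestamp-only check returns "up to date" when only the compiler flags change; comparing against the recorded command is what catches that case.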
For FreeBSD src, we can enable meta mode with -DWITH_META_MODE. After buildworld, we can parse these .meta files under objdir and build compile_commands.json.
    MAKEOBJDIRPREFIX=$PWD/obj/meta ./tools/build/make.py --cross-bindir=~/Stable/bin -j 20 buildworld TARGET=amd64 TARGET_ARCH=amd64 -DWITH_META_MODE -DNO_FILEMON -DWITHOUT_WERROR
Some notes about the other -D variables.
bmake has code dealing with filemon, which is a FreeBSD driver. On Linux we need to disable it with -DNO_FILEMON.
As of 2021-08, building on Linux still has some issues. I mostly read libexec/rtld-elf and the build process can proceed beyond libexec/rtld-elf, so I am satisfied.
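The .meta parsing step can be sketched in Python as follows. This is a rough sketch, not the script I actually used. It assumes each .meta file has a header with `CMD <command>` and `CWD <directory>` lines before the first `-- ... --` section marker, and that compile commands contain `-c` and a source file argument. All function names here are hypothetical.

```python
import os
import shlex

# Suffixes treated as source files (an assumption; extend as needed).
SOURCE_SUFFIXES = (".c", ".cc", ".cpp", ".cxx", ".S", ".s")

def parse_meta(text):
    """Return (cwd, commands) from the header of one .meta file's text."""
    cwd = None
    commands = []
    for line in text.splitlines():
        if line.startswith("-- "):  # e.g. "-- command output --": header is over
            break
        if line.startswith("CWD "):
            cwd = line[4:]
        elif line.startswith("CMD "):
            commands.append(line[4:])
    return cwd, commands

def entries_from_meta(text):
    """Turn compile commands found in one .meta file into database entries."""
    cwd, commands = parse_meta(text)
    entries = []
    if cwd is None:
        return entries
    for command in commands:
        try:
            args = shlex.split(command)
        except ValueError:
            continue  # skip commands that shlex cannot tokenize
        if "-c" not in args:
            continue  # not a compile step (e.g. a link command)
        for arg in args:
            if arg.endswith(SOURCE_SUFFIXES):
                entries.append({"directory": cwd, "command": command, "file": arg})
                break
    return entries

def build_compile_commands(objdir):
    """Walk objdir and collect entries from every .meta file."""
    entries = []
    for root, _dirs, files in os.walk(objdir):
        for name in files:
            if name.endswith(".meta"):
                path = os.path.join(root, name)
                with open(path, errors="replace") as f:
                    entries.extend(entries_from_meta(f.read()))
    return entries
```

Dumping the result of `build_compile_commands("obj/meta")` with `json.dump` into compile_commands.json at the top of the source tree should be enough for tools that read a JSON compilation database.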
ccls
With compile_commands.json, my ccls can index the repository.
Here is a screenshot of browsing libexec/rtld-elf code in Emacs (lsp-mode + emacs-ccls).
    (setq ccls-sem-highlight-method 'font-lock)
Contribute to libexec/rtld-elf
I stumbled upon FreeBSD libexec/rtld-elf in 2019 while sorting out how LLD should set the p_memsz field of PT_GNU_RELRO. I noticed an issue but did not get a chance to create a patch. Scroll down for details.
When working on some TLS issues in LLD, I noticed that rtld did not handle p_vaddr % p_align != 0 correctly. (Note: fixed for i386 and amd64.)
In 2020 I noticed a symbol resolution issue related to STB_WEAK, but did not follow up with a patch. (Note: introduced the environment variable LD_DYNAMIC_WEAK=0 to match the ELF spec (glibc/musl behavior).)
Now that I have a proper setup, I can work on the aforementioned problems in a virtual machine running FreeBSD 12.2.
    qemu-system-x86_64 -enable-kvm -m 16384 -smp 16 -drive file=~/Images/freebsd.qcow2,if=virtio -net nic,model=virtio -net user,hostfwd=tcp::2223-:22
    % cat /etc/src.conf
(My experience with SUBDIR_OVERRIDE=libexec/rtld-elf has been bad.)
If the rtld under test and the system libc are of different major versions, use a libc that matches rtld. Simple programs may work even without a libc of the matching version.
    mkdir -p /tmp/opt/lib
Thanks to kib, who reviewed these patches, and lwhsu, who added me to the contributor list: https://docs.freebsd.org/en/articles/contributors/#contrib-additional.
p_memsz of PT_GNU_RELRO
An ELF component usually needs a PT_LOAD program header with the permission bits PF_R|PF_W. Some sections only need to be writable at relocation processing time and can be made read-only during regular program execution. glibc invented PT_GNU_RELRO, which has been ported to FreeBSD/NetBSD/OpenBSD.
While linkers ensure that there is an alignment boundary of max-page-size bytes between two PT_LOAD program headers, the alignment boundary following PT_GNU_RELRO is just common-page-size bytes. GNU ld, gold, and LLD ensure that p_vaddr+p_memsz is a multiple of common-page-size.
glibc and musl do something like
    size_t start = roundDown(p_vaddr, PAGE_SIZE);
FreeBSD rtld did something like
    size_t start = roundDown(p_vaddr, PAGE_SIZE);
If PAGE_SIZE (the system page size) is larger than the link-time common-page-size, the mprotect call may incorrectly make some non-RELRO pages read-only.
https://reviews.freebsd.org/D31498 fixed the bug.
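To illustrate the page-size problem, here is a small Python sketch that computes the protected range in two ways. It assumes the difference between the two approaches is whether the end address is rounded down (which I believe is what glibc and musl do) or rounded up (which I believe is what FreeBSD rtld did before the fix). The function names and the example numbers are hypothetical.

```python
def round_down(x, align):
    """Round x down to a multiple of align (align must be a power of two)."""
    return x & ~(align - 1)

def round_up(x, align):
    """Round x up to a multiple of align (align must be a power of two)."""
    return (x + align - 1) & ~(align - 1)

def relro_range(p_vaddr, p_memsz, page_size, round_end_up):
    """Return the [start, end) range that would be passed to mprotect."""
    start = round_down(p_vaddr, page_size)
    if round_end_up:
        # May extend past the RELRO region into writable data.
        end = round_up(p_vaddr + p_memsz, page_size)
    else:
        # Never extends past the RELRO region.
        end = round_down(p_vaddr + p_memsz, page_size)
    return start, end

# Hypothetical example: linked with common-page-size 0x1000,
# run on a system whose page size is 0x10000.
p_vaddr, p_memsz = 0x2E000, 0x3000  # RELRO ends at 0x31000
safe = relro_range(p_vaddr, p_memsz, 0x10000, round_end_up=False)
unsafe = relro_range(p_vaddr, p_memsz, 0x10000, round_end_up=True)
print([hex(v) for v in safe], [hex(v) for v in unsafe])
```

With rounding down, the tail of the RELRO region stays writable, which is less strict but harmless. With rounding up, the bytes from 0x31000 to 0x40000 are ordinary writable data that would become read-only.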
STB_WEAK in symbol lookup
The first version of the ELF specification http://www.sco.com/developers/gabi/1998-04-29/ch5.dynamic.html says:
When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on...
(This paragraph has not been updated in the latest snapshot.)
The common(?) interpretation is that STB_WEAK/STB_GLOBAL have no differences for symbol lookup.
I asked https://groups.google.com/g/generic-abi/c/YdmpBmukW0g for clarification and archaeology.
glibc and musl use this symbol lookup behavior: resolve to the first found symbol definition, regardless of STB_WEAK/STB_GLOBAL.
FreeBSD/NetBSD/OpenBSD use this non-conforming behavior: when a weak symbol definition is found, remember the definition and keep searching in the remaining shared objects for a non-weak definition. If found, the non-weak definition is preferred, otherwise the remembered weak definition is returned.
https://reviews.freebsd.org/D26352 implemented the Linux behavior under the environment variable LD_DYNAMIC_WEAK=0.
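The two lookup policies can be contrasted with a toy Python model. This is only a sketch of the behaviors described above, not code from any rtld. The `Symbol` type, the function names, and the library names are made up for illustration, and `objects` stands for the symbol tables in breadth-first search order.

```python
from collections import namedtuple

# Toy symbol: name, binding ("STB_GLOBAL" or "STB_WEAK"), and defining object.
Symbol = namedtuple("Symbol", "name binding owner")

def lookup_first_found(objects, name):
    """First definition wins, regardless of binding."""
    for table in objects:
        for sym in table:
            if sym.name == name:
                return sym
    return None

def lookup_prefer_non_weak(objects, name):
    """Remember the first weak definition but keep searching for a non-weak one."""
    remembered_weak = None
    for table in objects:
        for sym in table:
            if sym.name != name:
                continue
            if sym.binding != "STB_WEAK":
                return sym
            if remembered_weak is None:
                remembered_weak = sym
    return remembered_weak

# The executable defines nothing; liba.so has a weak foo, libb.so a global foo.
objects = [
    [],
    [Symbol("foo", "STB_WEAK", "liba.so")],
    [Symbol("foo", "STB_GLOBAL", "libb.so")],
]
print(lookup_first_found(objects, "foo").owner)
print(lookup_prefer_non_weak(objects, "foo").owner)
```

In this model the first policy binds foo to liba.so, while the second skips the weak definition and binds to libb.so.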
p_vaddr % p_align != 0 for PT_TLS
It is very complex. See All about thread-local storage.
https://reviews.freebsd.org/D31538 fixed the i386/amd64 ports.
Epilogue
bmake
bmake supports a few features which make a make-based build system less daunting:
- built-in support for separate src/obj trees
- meta mode (self-tracking build system)
- logical operators && and || are supported in .if and .elif conditional structures
- variable modifiers
These are major pain points in GNU make.
GNU make needs something like http://make.mad-scientist.net/papers/multi-architecture-builds/ to support separate src/obj trees.
Most GNU make-based build systems cannot rebuild the target when the commands change. The Linux kernel uses .cmd files to solve the problem.
In glibc, the following pattern is quite common to work around the lack of logical AND operators.
    ifeq ($(have-foo)$(have-bar),yesyes)
Some variable modifiers can be difficult to remember, but good use of them removes the need to spawn various shell utilities.
rtld
FreeBSD rtld supports many GNU extensions:
- GNU indirect functions (STT_GNU_IFUNC, R_*_IRELATIVE)
- GNU symbol versioning
I came from a musl background. I think FreeBSD's rtld is overly complex in many places, but it is still relatively clean.
It has implemented some features which musl does not support:
- lazy binding PLT
- dlclose which actually unloads a DSO
In comparison, many things are quite messy in glibc rtld. It can learn a lot from musl and FreeBSD rtld.