Distribution of debug information
2022-11-5 16:0:0 Author: maskray.me

UNDER CONSTRUCTION

This article describes some approaches to distribute debug information. Commands below will use two simple C files for demonstration.

cat > a.c <<eof
void foo(int);
int main() { foo(42); }
eof
cat > b.c <<eof
#include <stdio.h>
void foo(int x) { printf("%d\n", x); }
eof

Debug information in the object file

This is the simplest model. Debug information resides in the executable or shared object.

gcc -c -g a.c b.c
gcc a.o b.o -o a

Separate debug files

Debug information is large and not needed by many people. As a general size optimization, many distributions move debug information into separate packages. This leverages the feature that debug information may reside in a separate file. See https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html.

objcopy from binutils-gdb can create a separate debug file (since 2003).

objcopy --only-keep-debug a a.debug
strip -S a -o a.stripped

eu-strip from elfutils can generate two output files in one invocation.

eu-strip -f a.debug a -o a.stripped

The elfutils way is convenient for simple use cases and is adopted by rpm. But for certain operations there is ambiguity whether they need to apply to one output file or both. The --only-keep-debug way has orthogonality and integrates well with other features (e.g. --compress-debug-sections, --remove-section). I favor --only-keep-debug and implemented it in llvm-objcopy.

When debugging a.stripped in gdb, use add-symbol-file -o xxx a.debug to load the separate debug file.

objcopy --add-gnu-debuglink=a.debug a.stripped adds a non-SHF_ALLOC section .gnu_debuglink to a.stripped. The section contains a filename (no directory information) and a four-byte CRC checksum.

% objdump -g a.stripped
...
Contents of the .gnu_debuglink section (loaded from a.stripped):

Separate debug info file: a.debug
CRC value: 0x4d1e2a66
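
The CRC value can be cross-checked without binutils. This is a minimal sketch, assuming the .gnu_debuglink checksum is the common CRC-32 that gzip also stores in its trailer (the table-driven function printed in the gdb manual appears to be that algorithm). The function name debuglink_crc32 is hypothetical.

```shell
# Print the CRC-32 of a file in the 0x%08x style used by objdump -g.
# Assumption: the .gnu_debuglink checksum is the standard CRC-32,
# computed over the full contents of the debug file.
debuglink_crc32() {
  # A gzip stream ends with CRC32 (4 bytes, little-endian) followed by
  # ISIZE (4 bytes). Take the last 8 bytes and dump the first 4 as hex.
  set -- $(gzip -c "$1" | tail -c 8 | od -An -N4 -tx1)
  # Reverse the byte order to print the value big-endian.
  printf '0x%s%s%s%s\n' "$4" "$3" "$2" "$1"
}
```

Running debuglink_crc32 a.debug should then print the same value as the "CRC value" line, provided a.debug has not been modified since objcopy --add-gnu-debuglink was run.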

gdb has supported .gnu_debuglink since 2003. It looks for the debug file in the directory containing the executable and in the .debug/ subdirectory of that directory.

% gdb -ex q a.stripped
Reading symbols from a.stripped...
Reading symbols from /tmp/c/a.debug...

Directories specified by debug-file-directory are used as well. This option needs to be set before loading the inferior.

% pwd
/tmp/c
% install -D -t debug/tmp/c a.debug
% rm a.debug
% gdb -iex 'set debug-file-directory debug' -ex q a.stripped
Reading symbols from a.stripped...
Reading symbols from debug//tmp/c/a.debug...
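
The .gnu_debuglink lookup described above can be sketched as a small shell function that lists the candidate paths in order. The name debuglink_candidates is hypothetical, and the sketch assumes a single debug-file-directory entry. gdb additionally verifies the CRC of each candidate, which is not modeled here.

```shell
# List the paths tried for a .gnu_debuglink name, in order.
# $1: path of the executable
# $2: file name stored in .gnu_debuglink
# $3: one debug-file-directory entry
debuglink_candidates() {
  exe_dir=$(dirname "$1")
  printf '%s\n' \
    "$exe_dir/$2" \
    "$exe_dir/.debug/$2" \
    "$3$exe_dir/$2"
}
```

For example, debuglink_candidates /tmp/c/a.stripped a.debug debug lists /tmp/c/a.debug, /tmp/c/.debug/a.debug and debug/tmp/c/a.debug, the last of which corresponds to the file loaded in the transcript above.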

A debug file can be found by build ID. A build ID resides in an ELF note section. Many Linux distributions configure GCC with --enable-linker-build-id to generate a build ID by default. See --build-id for the option.

% readelf -Wn a.stripped
...
Displaying notes found in: .note.gnu.build-id
  Owner                Data size        Description
  GNU                  0x00000008       NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: a3b3f0788440fd94
% install -D -T debug/tmp/c/a.debug debug/.build-id/a3/b3f0788440fd94.debug
% rm debug/tmp/c/a.debug
% gdb -nx -iex 'set debug-file-directory debug' -ex q a.stripped
Reading symbols from a.stripped...
Reading symbols from /tmp/c/debug/.build-id/a3/b3f0788440fd94.debug...
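
The mapping from a build ID to a file name is mechanical: the first two hex digits name a subdirectory of .build-id/, and the remaining digits plus .debug name the file. A minimal sketch (the function name buildid_debug_path is hypothetical):

```shell
# Print the path of the debug file for a build ID.
# $1: build ID as a hex string
# $2: one debug-file-directory entry
buildid_debug_path() {
  id=$1
  rest=${id#??}          # everything after the first two hex digits
  first=${id%"$rest"}    # the first two hex digits
  printf '%s/.build-id/%s/%s.debug\n' "$2" "$first" "$rest"
}
```

With the build ID from the transcript above, buildid_debug_path a3b3f0788440fd94 debug prints debug/.build-id/a3/b3f0788440fd94.debug, the destination used by the install command.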

lldb uses target.debug-file-search-paths to locate a separate debug file. (TODO)

On Debian, if we install hello-dbgsym, a debug file will be available in /usr/lib/debug.

% gdb -batch -ex 'show debug-file-directory'
The directory where separate debug symbols are searched for is "/usr/lib/debug".
% gdb -ex q =hello
Reading symbols from /usr/bin/hello...
Reading symbols from /usr/lib/debug/.build-id/ff/29703f105c66821e9b10149db8cff3b2e4043a.debug...

See dh_strip for packaging commands.

MiniDebugInfo

See https://sourceware.org/gdb/onlinedocs/gdb/MiniDebugInfo.html (implemented in 2012). When a binary contains .gnu_debugdata, gdb decompresses it with xz and loads it.

objcopy --only-keep-debug a a.debug
xz a.debug
objcopy -S --add-section=.gnu_debugdata=a.debug.xz a a.stripped

% gdb -ex q a.stripped
Reading symbols from a.stripped...
Reading symbols from .gnu_debugdata for /tmp/c/a.stripped...

Fedora uses the feature to improve symbolization of stack traces in the absence of debug information. A Fedora MiniDebugInfo file mostly just provides note sections and a .symtab with symbols not in .dynsym. Non-SHF_ALLOC SHT_PROGBITS/SHT_NOTE/SHT_NOBITS sections (e.g. .comment) are removed. Note sections are duplicated in the original binary and the MiniDebugInfo file due to a small missing optimization in eu-strip -f.

See Support MiniDebugInfo in rpm for the original implementation. The new implementation is in scripts/find-debuginfo.in in the debugedit repository. Support for mini-debuginfo in LLDB introduces the lldb implementation.

Here is a simplified demonstration of what scripts/find-debuginfo.in does:

eu-strip --remove-comment -f a.mini a -o a.stripped
nm a -f sysv --defined-only | awk -F \| '$4 ~ "FUNC" {print $1}' | sort > a.symtab
nm a -f sysv --defined-only -D | awk -F \| '$4 ~ "FUNC" {print $1}' | sort > a.dynsym
comm -13 a.dynsym a.symtab > a.keepsyms
objcopy -S --keep-symbols=a.keepsyms a.mini
xz a.mini
objcopy -S --add-section=.gnu_debugdata=a.mini.xz a a.stripped
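
The comm -13 step is a set difference: it prints the lines that appear only in the second sorted file, i.e. the symbols that are not already in .dynsym. The sketch below isolates that step. The function name non_dynamic_syms and the symbol names in the usage note are hypothetical.

```shell
# Print the names listed in $2 (.symtab) that are absent from $1 (.dynsym).
# Both inputs hold one symbol name per line, in any order.
non_dynamic_syms() {
  tmp1=$(mktemp)
  tmp2=$(mktemp)
  # comm requires sorted input.
  sort -u "$1" > "$tmp1"
  sort -u "$2" > "$tmp2"
  # -1 drops lines unique to the first file, -3 drops common lines,
  # which leaves the lines unique to the second file.
  comm -13 "$tmp1" "$tmp2"
  rm -f "$tmp1" "$tmp2"
}
```

If dyn.txt contains foo and main, and sym.txt contains main, helper, foo and static_fn, then non_dynamic_syms dyn.txt sym.txt prints helper and static_fn. Only those would need to be kept in the MiniDebugInfo .symtab.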

DWARF supplementary object files

Split DWARF object files

This was originally proposed as GCC debug fission. Later in DWARF version 5 this feature was standardized as split DWARF object files, commonly abbreviated as "split DWARF". The idea is to move the bulk of .debug_* sections into a separate file (.dwo) and leave just a small amount in the relocatable object file (.o). .dwo files are not handled by the linker: this reduces the input section combining work and relocation work for the linker, leading to shorter link time and lower memory usage. The smaller input has advantages for a distributed build farm.

% clang -c -g -gsplit-dwarf a.c b.c
% readelf -WS a.o | grep ' .debug_gnu_pub'
[12] .debug_gnu_pubnames PROGBITS 0000000000000000 0000be 00001c 00 0 0 1
[14] .debug_gnu_pubtypes PROGBITS 0000000000000000 0000da 00001b 00 0 0 1
% readelf -WS a.dwo | grep ']'
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000155 000051 00 0 0 1
[ 2] .debug_str_offsets.dwo PROGBITS 0000000000000000 000040 00001c 00 E 0 0 1
[ 3] .debug_str.dwo PROGBITS 0000000000000000 00005c 00009d 01 MSE 0 0 1
[ 4] .debug_info.dwo PROGBITS 0000000000000000 0000f9 00002e 00 E 0 0 1
[ 5] .debug_abbrev.dwo PROGBITS 0000000000000000 000127 00002e 00 E 0 0 1
% clang -fuse-ld=lld -Wl,--gdb-index a.o b.o -o a

(Note: don't perform compiling and linking in one action: the DWO files' location may be surprising.)

-ggnu-pubnames implied by -gsplit-dwarf generates .debug_gnu_pubnames and .debug_gnu_pubtypes sections which are used to build .gdb_index by a linker (gold, ld.lld, mold).

When gdb loads a symbol file, it constructs an internal symbol table. .gdb_index can improve startup time. With split DWARF, the .dwo files as referenced by the executable or shared object will be fully parsed on demand.

Clang supports -gsplit-dwarf=single to embed .dwo sections in the relocatable object file. The compilation will not produce a .dwo file. This mode is convenient for single-machine linking.

lldb uses target.debug-file-search-paths for .dwo searching but does not use a directory structure. (TODO)

Distributing a number of .dwo files can be inconvenient. A DWARF package file (typically given the extension .dwp) can be used in place of .dwo files. (Note: currently DWP has incomplete DWARF64 support.) dwp from binutils-gdb/gold can build a DWARF package file. llvm-dwp is another implementation in llvm-project, but there are some memory usage scaling problems.

dwp a.dwo b.dwo -o a.dwp

It is a TODO to integrate DWARF compressing (e.g. dwz) into dwp.

gold has been under-maintained in recent years. dwp does not support DWARF v5 yet.

bazel build -c dbg --fission=yes :a
bazel build -c dbg --fission=yes :a.dwp

debuginfod

While gdb can hint that debug information is missing, many consider the manual step of installing the relevant debug information package inconvenient. In 2019, elfutils introduced a new program debuginfod. Some distributions now host debug information on public servers. See https://www.redhat.com/en/blog/how-debuginfod-project-evolved-2021.

Here is an example demonstrating what debuginfod does:

objcopy --only-keep-debug a a.debug
strip -S a -o a.stripped
tar cf a.tar.zst --zstd a.stripped a.debug
debuginfod -d debuginfod.sqlite -F -Z .tar.zst=zstdcat .

debuginfod listens on port 8002 by default. -F causes it to scan ELF/DWARF files in the specified directory. -Z .tar.zst=zstdcat additionally makes it scan .tar.zst archives (e.g. Arch Linux packages), using zstdcat to decompress them.

For every archive member, debuginfod classifies the file as a regular executable/shared object (with at least one SHF_ALLOC SHT_PROGBITS section) or a debug file (with a .debug_* section). By default debuginfod parses a .debug_line

debuginfod-find is a client.

buildid=$(readelf -n a | awk '/Build ID:/ {print $3}')
DEBUGINFOD_URLS='http://0:8002/' debuginfod-find debuginfo $buildid
DEBUGINFOD_URLS='http://0:8002/' debuginfod-find executable $buildid
DEBUGINFOD_URLS='http://0:8002/' debuginfod-find source $buildid /tmp/c/a.c

We can query the server manually:

curl -s localhost:8002/buildid/$buildid/debuginfo -o output && cmp a.debug output
curl -s localhost:8002/buildid/$buildid/executable -o output && cmp a.stripped output
curl -s localhost:8002/buildid/a3b3f0788440fd94/source/$(echo /tmp/c/a.c | sed 's,/,%2F,g') | diff /tmp/c/a.c -
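
The three request shapes used above follow one pattern, which the sketch below captures. The function name debuginfod_url is hypothetical. The %2F encoding of slashes in the source path mirrors the sed expression in the curl command above and is not claimed to be required by the server.

```shell
# Build a debuginfod request URL.
# $1: server base URL, $2: build ID, $3: debuginfo | executable | source
# $4: absolute source path (only for "source")
debuginfod_url() {
  base=${1%/}   # drop one trailing slash, if any
  case $3 in
    debuginfo|executable)
      printf '%s/buildid/%s/%s\n' "$base" "$2" "$3" ;;
    source)
      encoded=$(printf '%s' "$4" | sed 's,/,%2F,g')
      printf '%s/buildid/%s/source/%s\n' "$base" "$2" "$encoded" ;;
    *)
      return 1 ;;
  esac
}
```

For example, curl -s "$(debuginfod_url http://localhost:8002 "$buildid" debuginfo)" -o output fetches the same file as the first curl command above.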

With set debuginfod enabled on, gdb can query a debuginfod server if a symbol file is not found. As of today, a few programs support debuginfod, e.g. valgrind.

Apple platforms

Apple platforms use a model very different from Linux distributions. Apple ld64 does not combine input __debug_* sections into output sections. When debugging a program with lldb, lldb reads DWARF information from relocatable object files or dSYM bundles. A dSYM bundle is created by dsymutil.


Source: https://maskray.me/blog/2022-11-08-distribution-of-debug-information