Port LLVM XRay to Apple systems
2023-06-18 15:00:00 Author: maskray.me

I do not use Apple products, but I sometimes like investigating Mach-O as an object file format, and my llvm-project changes sometimes need to work around its quirks.

LLVM has a function call tracing system called XRay. It supports many architectures on Linux and some BSDs but does not support Apple systems. If the target triple is x86_64-apple-darwin*, you may notice that Clang will allow you to perform compilation, but linking will fail. For other architectures, Clang will reject it.

% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
% clang --target=x86_64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 a.o
ld: in section __DATA,xray_instr_map reloc 0: X86_64_RELOC_SUBTRACTOR must have r_extern=1 file 'a.o' for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
% clang --target=arm64-apple-darwin -fxray-instrument -fxray-instruction-threshold=1 -c a.c
clang: error: the clang compiler does not support '-fxray-instrument on aarch64-apple-darwin'

So I dove down the rabbit hole.

.section        __DATA,xray_instr_map
Lxray_sleds_start0:
Ltmp0:
.quad Lxray_sled_0-Ltmp0
.quad Lfunc_begin0-(Ltmp0+8)
.byte 0x00
.byte 0x00
.byte 0x02
.space 13
Ltmp1:
.quad Lxray_sled_1-Ltmp1
.quad Lfunc_begin0-(Ltmp1+8)
.byte 0x01
.byte 0x00
.byte 0x02
.space 13
Lxray_sleds_end0:
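
Reading the listing above, each entry consists of two 8-byte PC-relative fields, three single-byte fields, and 13 bytes of padding, for 32 bytes in total. The following is a minimal sketch of that layout in Python. The field names (kind, always, version) are an assumed reading of the three .byte directives, and the addresses are made up for illustration.

```python
import struct

# Two signed 64-bit offsets, three bytes, 13 bytes of padding (little-endian).
SLED_ENTRY = struct.Struct("<qqBBB13x")

def encode_sled(entry_addr, sled_addr, func_addr, kind, always, version):
    """Encode one entry the way the listing does: each offset is relative
    to the address of the field that stores it."""
    return SLED_ENTRY.pack(sled_addr - entry_addr,
                           func_addr - (entry_addr + 8),
                           kind, always, version)

def decode_sled(entry_addr, raw):
    """Recover absolute addresses from an entry located at entry_addr."""
    sled_off, func_off, kind, always, version = SLED_ENTRY.unpack(raw)
    return (entry_addr + sled_off, entry_addr + 8 + func_off,
            kind, always, version)

# Hypothetical addresses, for illustration only.
ENTRY, SLED, FUNC = 0x4000, 0x1010, 0x1000
raw = encode_sled(ENTRY, SLED, FUNC, kind=0, always=0, version=2)
print(SLED_ENTRY.size)          # size of one entry in bytes: 32
print(decode_sled(ENTRY, raw))  # the original addresses and flag bytes
```

Because both offsets are stored relative to their own location, the entry decodes to the right addresses wherever the image is loaded.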

.quad Lxray_sled_0-Ltmp0 is represented as a pair of relocations (shown with llvm-readobj -r a.o):

0x0 0 3 0 X86_64_RELOC_SUBTRACTOR 0 xray_instr_map
0x0 0 3 1 X86_64_RELOC_UNSIGNED 0 _foo

X86_64_RELOC_SUBTRACTOR must be an external relocation (r_extern=1), where r_symbolnum references a symbol table entry; linkers report an error if r_extern=0. Symbols with the "L" prefix are called temporary symbols in LLVMMC and are not present in the symbol table. The LLVM integrated assembler tries to convert the subtractor symbol to an atom, that is, a non-temporary symbol defined in the same section. However, since xray_instr_map does not define a non-temporary symbol, the X86_64_RELOC_SUBTRACTOR relocation has no associated symbol and its r_extern is 0.
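
To make the r_extern rule concrete, here is a small sketch that packs and unpacks the second 32-bit word of a Mach-O relocation_info entry and applies the same check the linker performs. The bit layout follows the little-endian relocation_info bitfield in <mach-o/reloc.h> as I understand it. The helper names are hypothetical, and the section and symbol numbers are made up.

```python
X86_64_RELOC_UNSIGNED = 0
X86_64_RELOC_SUBTRACTOR = 5

def pack_reloc_word(symbolnum, pcrel, length, extern, rtype):
    """Pack r_symbolnum:24, r_pcrel:1, r_length:2, r_extern:1, r_type:4."""
    return ((symbolnum & 0xFFFFFF) | (pcrel << 24) | (length << 25)
            | (extern << 27) | (rtype << 28))

def unpack_reloc_word(word):
    return {
        "symbolnum": word & 0xFFFFFF,
        "pcrel": (word >> 24) & 1,
        "length": (word >> 25) & 3,
        "extern": (word >> 27) & 1,
        "type": (word >> 28) & 0xF,
    }

def check_subtractor(word):
    """Mimic the linker diagnostic: a SUBTRACTOR must name a symbol."""
    r = unpack_reloc_word(word)
    if r["type"] == X86_64_RELOC_SUBTRACTOR and r["extern"] != 1:
        return "X86_64_RELOC_SUBTRACTOR must have r_extern=1"
    return None

# r_extern=0: r_symbolnum is a section ordinal (hypothetical value 3).
bad = pack_reloc_word(3, pcrel=0, length=3, extern=0,
                      rtype=X86_64_RELOC_SUBTRACTOR)
# r_extern=1: r_symbolnum is a symbol table index (hypothetical value 7).
good = pack_reloc_word(7, pcrel=0, length=3, extern=1,
                       rtype=X86_64_RELOC_SUBTRACTOR)
print(check_subtractor(bad))
print(check_subtractor(good))
```

The first call returns the diagnostic text and the second returns None, which mirrors the difference between a subtractor that refers to a section and one that refers to a symbol.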

To fix this issue, we need to define a non-temporary symbol. We can accomplish this by renaming Lxray_sleds_start0 to lxray_sleds_start0. In LLVMMC, LinkerPrivateGlobalPrefix is set to "l" for Apple targets. We can define an overload of MCContext::createLinkerPrivateTempSymbol(const Twine &Name) to allow LLVMMC to select an unused symbol starting with lxray_sleds_start. (There is a pitfall: "ltmp" should be compiler internal.) For ELF targets, the MCContext::createLinkerPrivateTempSymbol function creates a temporary symbol starting with ".L".
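
The naming scheme can be sketched as follows. This is a toy model, not the MCContext implementation: the class and method names are hypothetical, and the numeric-suffix policy used to find an unused name is an assumption.

```python
class ToyContext:
    """Toy stand-in for a symbol context; not LLVM's MCContext."""

    def __init__(self, linker_private_prefix):
        self.prefix = linker_private_prefix  # "l" for Mach-O, ".L" for ELF
        self.used = set()

    def create_linker_private_temp_symbol(self, name):
        # Append a counter until the prefixed name is unused.
        n = 0
        while f"{self.prefix}{name}{n}" in self.used:
            n += 1
        symbol = f"{self.prefix}{name}{n}"
        self.used.add(symbol)
        return symbol

macho = ToyContext("l")
elf = ToyContext(".L")
print(macho.create_linker_private_temp_symbol("xray_sleds_start"))
print(macho.create_linker_private_temp_symbol("xray_sleds_start"))
print(elf.create_linker_private_temp_symbol("xray_sleds_start"))
```

In this model the Mach-O context yields lxray_sleds_start0 and then lxray_sleds_start1, while the ELF context yields .Lxray_sleds_start0.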

Oleksii Lozovskyi reported that the -fxray-function-index option was broken:

  • (default): no function index
  • -fxray-function-index: no function index
  • -fno-xray-function-index: xray_fn_idx section is present

-fxray-function-index was meant to be the default. It turns out that a clangDriver refactoring accidentally caused this regression, though a negatively named variable was probably the underlying cause. The XRay tests were not great, and there was no driver test to catch this. I fixed the regression.

Now that -fxray-function-index is back, we get the xray_fn_idx section by default. The section contains entries like the following:

.section        __DATA,xray_fn_idx
.p2align 4, 0x90
.quad Lxray_sleds_start0
.quad Lxray_sleds_end0

BTW: I noticed an old workaround (from 2015) for ld64 and proposed removing it: https://reviews.llvm.org/D152831.

These absolute addresses require rebase opcodes in the special section __LINKEDIT,__rebase. This is not great; I wanted to fix it back in 2020 but never got around to doing it. This motivated me to finally fix the issue, and I created https://reviews.llvm.org/D152661 to change the [start,end) representation to a (pc_relative_start, size) representation.
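
The difference between the two representations can be sketched as below. The sketch assumes 32-byte sled entries, as in the xray_instr_map listing, and models a load-time slide as a constant added to every address. Function names and addresses are hypothetical.

```python
SLED_SIZE = 32  # bytes per xray_instr_map entry, per the earlier listing

def encode_absolute(sleds_start, sleds_end):
    # [start, end): two absolute addresses, each of which needs a rebase.
    return (sleds_start, sleds_end)

def encode_relative(index_entry_addr, sleds_start, sleds_end):
    # (pc_relative_start, size): an offset from the entry plus a sled count.
    return (sleds_start - index_entry_addr,
            (sleds_end - sleds_start) // SLED_SIZE)

def decode_relative(index_entry_addr, entry):
    offset, count = entry
    start = index_entry_addr + offset
    return (start, start + count * SLED_SIZE)

# Hypothetical link-time addresses.
IDX, START, END = 0x5000, 0x4000, 0x4040
SLIDE = 0x100000  # the image is loaded this far from its link-time address

rel = encode_relative(IDX, START, END)
# The relative form decodes correctly at the slid address with no fix-up.
print(decode_relative(IDX + SLIDE, rel) == (START + SLIDE, END + SLIDE))
# The absolute form is stale until the loader rebases both words.
print(encode_absolute(START, END) == (START + SLIDE, END + SLIDE))
```

The first comparison prints True and the second prints False: only the absolute form depends on the loader patching the data.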

My initial attempt emitted something like the following. I took the difference of two labels and right-shifted it by 5 to get the number of sleds, since each sled entry is 32 bytes.

.section        __DATA,xray_fn_idx,regular,live_support
.p2align 4, 0x90
lxray_fn_idx0:
.quad lxray_sleds_start0-lxray_fn_idx0
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5

This approach works on ELF targets but not on Mach-O targets due to a pile of assembler issues.

% clang -c --target=x86_64-apple-darwin a.s
/tmp/a-b0645c.s:53:8: error: expected relocatable expression
.quad (Lxray_sleds_end0-lxray_sleds_start0)>>5
^

Assembler issues

When assembling an assembly file into an object file, an expression can be evaluated in multiple steps. Two steps are particularly important:

  • Parsing time. At this stage, we have an MCAssembler object but no MCAsmLayout object. Instruction operands and certain directives like .if require the ability to evaluate an expression early.
  • Object file writing time. At this stage, we have both an MCAssembler object and an MCAsmLayout object. The MCAsmLayout object provides information about the offset of each fragment.
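
The two stages can be pictured with a toy model: a symbol is a (fragment, offset) pair, and a layout maps each fragment to its offset within the section. Before layout, a difference A-B can be folded only when it does not depend on fragment placement; after layout, it can be computed for any two symbols in the same section. All names below are hypothetical, and the model ignores sections, relaxation, and atoms.

```python
from typing import Dict, Optional, Tuple

Symbol = Tuple[str, int]  # (fragment name, offset within that fragment)

def evaluate_difference(a: Symbol, b: Symbol,
                        layout: Optional[Dict[str, int]] = None
                        ) -> Optional[int]:
    """Return a - b, or None if it cannot be computed yet."""
    frag_a, off_a = a
    frag_b, off_b = b
    if frag_a == frag_b:
        # Same fragment: the answer does not depend on layout.
        return off_a - off_b
    if layout is None:
        # Parsing time: fragment offsets are unknown.
        return None
    # Object writing time: fragment offsets are available.
    return (layout[frag_a] + off_a) - (layout[frag_b] + off_b)

start = ("F0", 0)
end = ("F1", 4)
print(evaluate_difference(start, ("F0", 3)))                        # -3
print(evaluate_difference(end, start))                              # None
print(evaluate_difference(end, start, layout={"F0": 0, "F1": 16}))  # 20
```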

The first issue is not specific to this case and is also encountered on ELF. The following assembly code should assemble to the bytes 01 00 00 01, but Clang fails to evaluate .if .-1b == 3.

% cat x.s
1:
.byte 1
.space 2
.if .-1b == 3
.byte 1
.else
.byte 0
.endif
% clang -c x.s
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^

Jian Cai implemented limited expression folding in the LLVM integrated assembler to support a Linux kernel ARM use case that used to fail as follows:

arch/arm/mm/proc-v7.S:169:143: error: expected absolute expression
.pushsection ".alt.smp.init", "a" ; .long 9998b ;9997: orr r1, r1, #((1 << 0) | (1 << 6))|(3 << 3) ; .if . - 9997b == 2 ; nop ; .endif ; .if . - 9997b != 4 ; .error "ALT_UP() content must assemble to exactly 4 bytes"; .endif ; .popsection

I have added support for MCFillFragment (.space and .fill) and for A-B where A is a pending label (which will be reassigned to a real fragment in flushPendingLabels()). Now the LLVM integrated assembler can successfully assemble x.s when an MCAssembler object is present. However, evaluation still does not work without an MCAssembler object, which is expected.

% llvm-mc x.s -filetype=null
x.s:4:5: error: expected absolute expression
.if .-1b == 3
^
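
As an illustration of the folding idea (not the actual LLVM code), the computation of .-1b in x.s can be modelled by walking the fragments between the two symbols and summing their sizes, giving up as soon as a fragment's size is not yet known. The Fragment class and the fold_offset_difference helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fragment:
    kind: str            # "data", "fill", or "relaxable"
    size: Optional[int]  # None when the size is unknown before layout

def fold_offset_difference(frags: List[Fragment],
                           a_frag: int, a_off: int,
                           b_frag: int, b_off: int) -> Optional[int]:
    """Fold A - B where B is at or before A, using fragment indices."""
    if b_frag > a_frag:
        return None  # keep the toy model simple: only handle B before A
    total = a_off - b_off
    for frag in frags[b_frag:a_frag]:
        if frag.size is None:
            return None  # e.g. an instruction that may still be relaxed
        total += frag.size
    return total

# x.s: "1:" at offset 0 of the data fragment holding ".byte 1",
# then ".space 2", then "." at the start of the following fragment.
frags = [Fragment("data", 1), Fragment("fill", 2), Fragment("data", 0)]
print(fold_offset_difference(frags, a_frag=2, a_off=0, b_frag=0, b_off=0))

# With an unresolved fragment in between, folding must wait for layout.
blocked = [Fragment("data", 1), Fragment("relaxable", None),
           Fragment("data", 0)]
print(fold_offset_difference(blocked, a_frag=2, a_off=0, b_frag=0, b_off=0))
```

The first call yields 3, matching the .if condition in x.s; the second yields None because one intervening size is unknown.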

Then I noticed a potential pitfall for Mach-O in MCSection::flushPendingLabels. When flushing pending labels, it did not ensure that the new fragment inherits the previous atom symbol. I fixed this issue, although I haven't been able to create a test case to verify this behavior.

After this fix, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 can be assembled successfully from an assembly file. However, during "direct object emission", the error "expected relocatable expression" is still reported. The issue is quite subtle.

In the case of direct object emission, where LLVM IR is lowered directly to an object file, bypassing assembly (e.g., using clang -c a.c instead of clang -c a.s or clang -c --save-temps a.c), the assembler information is not used for parsing (MCStreamer::UseAssemblerInfoForParsing). As a result, .quad (Lxray_sleds_end0-lxray_sleds_start0)>>5 is not evaluated eagerly and becomes a fixup.

At object writing time, we have an MCAsmLayout object and atom information for fragments. However, the label Lxray_sleds_end0 belongs to the next fragment, causing the condition in MachObjectWriter::isSymbolRefDifferenceFullyResolvedImpl to fail. In my opinion, it may be necessary to relax the condition in this case.

Linker dead stripping

To support linker dead stripping, also known as linker garbage collection, we need to add the S_ATTR_LIVE_SUPPORT attribute to the two sections xray_instr_map and xray_fn_idx.
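
The effect of S_ATTR_LIVE_SUPPORT can be sketched with a toy mark phase: ordinary atoms are kept when reachable from a root, while atoms in live-support sections are kept when they reference a live atom. The flag value is the one from <mach-o/loader.h> as I recall it. The graph, atom names, and function are hypothetical, and real linkers handle many more cases.

```python
S_ATTR_LIVE_SUPPORT = 0x08000000  # section attribute bit in <mach-o/loader.h>

def dead_strip(refs, roots, live_support):
    """refs maps each atom to the set of atoms it references.
    Returns the set of atoms that survive dead stripping."""
    live = set()
    work = list(roots)
    while work:
        atom = work.pop()
        if atom in live:
            continue
        live.add(atom)
        work.extend(refs.get(atom, ()))
        # A live-support atom becomes live once something it references is.
        for ls in live_support:
            if ls not in live and refs.get(ls, set()) & live:
                work.append(ls)
    return live

refs = {
    "_main": {"_foo"},
    "_foo": set(),
    "_bar": set(),               # never called
    "sled_map(_foo)": {"_foo"},  # xray_instr_map entries for _foo
    "sled_map(_bar)": {"_bar"},  # xray_instr_map entries for _bar
}
live = dead_strip(refs, roots={"_main"},
                  live_support={"sled_map(_foo)", "sled_map(_bar)"})
print(sorted(live))
```

In this model the sled entries for _foo survive because _foo is live, while _bar and its sled entries are both discarded.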

Runtime issue

compiler-rt/lib/xray/xray_trampoline_x86_64.S used .Ltmp* symbols, which are temporary on ELF but non-temporary on Mach-O. The non-temporary labels become atoms and can cause bad dead-stripping behavior.

I fixed the problem by using the LOCAL_LABEL macro, which generates an "L" symbol specifically for Mach-O.

Driver change

Once AArch64 works, we can make the Clang driver accept --target=arm64-apple-darwin for XRay.


Source: https://maskray.me/blog/2023-06-18-port-llvm-xray-to-apple-systems