All about UndefinedBehaviorSanitizer
2023-01-29 16:00:00 Author: maskray.me

UndefinedBehaviorSanitizer (UBSan) is an undefined behavior detector for C/C++. It consists of code instrumentation and a runtime (in llvm-project/compiler-rt).

Clang and GCC have independent implementations. Clang implemented the instrumentation first in 2009-12, initially named -fcatch-undefined-behavior. GCC 4.9 implemented -fsanitize=undefined in 2013-08.

Available checks

UBSan can detect many kinds of undefined behavior. One may enable an individual check with -fsanitize=xxx, where xxx is the name of the check, or, more commonly, specify -fsanitize=undefined to enable the whole group.

Some checks do not correspond to undefined behavior per the language standards, but the constructs they flag are often a code smell and lead to surprising results. Examples are implicit-unsigned-integer-truncation, implicit-integer-sign-change, and unsigned-integer-overflow.

% clang -fsanitize=undefined -xc /dev/null '-###'
... -fsanitize=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,return,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,unreachable,vla-bound,vptr" "-fsanitize-recover=alignment,array-bounds,bool,builtin,enum,float-cast-overflow,function,integer-divide-by-zero,nonnull-attribute,null,pointer-overflow,returns-nonnull-attribute,shift-base,shift-exponent,signed-integer-overflow,vla-bound,vptr" ...

GCC implements slightly fewer checks than Clang.

Modes

UBSan provides three modes that trade off code size against diagnostic verbosity.

  • Default mode
  • Minimal runtime (-fsanitize-minimal-runtime)
  • Trap mode (-fsanitize-trap=undefined)

Let's use a C function to compare the behaviors.

#include <stdio.h>
int foo(int a) { return a * 2; }
int main() {
  int x;
  scanf("%d", &x);
  printf("a %d\n", foo(x));
  printf("b %d\n", foo(x));
}

Default mode

With the umbrella option -fsanitize=undefined or the specific -fsanitize=signed-integer-overflow, Clang emits the following LLVM IR for foo:

define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
entry:
  %0 = add i32 %a, 1073741824
  %1 = icmp sgt i32 %0, -1
  br i1 %1, label %cont, label %handler.mul_overflow, !prof !4, !nosanitize !5

handler.mul_overflow:
  ...

The add and icmp instructions check whether the argument is in the range [-0x40000000, 0x40000000) (no overflow). If yes, return the non-overflow result. Otherwise, the emitted code calls the callback __ubsan_handle_mul_overflow with arguments describing the source location. __ubsan_handle_mul_overflow is implemented in the runtime (in llvm-project/compiler-rt).

By default, all UBSan checks except return,unreachable are recoverable (i.e. non-fatal). After __ubsan_handle_mul_overflow (or another callback) prints an error, the program keeps executing. The source location is marked and further errors about this location are suppressed. This deduplication feature makes logs less noisy. In one program invocation, we may observe errors from potentially multiple source locations.

We can make the program exit upon the first UBSan error (with exit code 1 by default; UBSAN_OPTIONS=exitcode=2 changes it to 2). Just specify -fno-sanitize-recover (alias for -fno-sanitize-recover=all), -fno-sanitize-recover=undefined, or -fno-sanitize-recover=signed-integer-overflow. A non-returning callback (__ubsan_handle_mul_overflow_abort) will be emitted in place of __ubsan_handle_mul_overflow. Note that the emitted LLVM IR terminates the basic block with an unreachable instruction.

handler.mul_overflow:
  ...

Linking

The UBSan callbacks are provided by the runtime. For a link action, we need to inform the compiler driver that the runtime should be linked. This can be done by specifying -fsanitize=undefined as in a compile action. (Actually, any specific UBSan check that needs runtime can be used, e.g. -fsanitize=signed-integer-overflow.)

clang -c -O2 -fno-omit-frame-pointer -fsanitize=undefined a.c
clang -fsanitize=undefined a.o -o a

When linking an executable in the default -static-libsan mode, the Clang driver passes --whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan_standalone.a --no-whole-archive to the linker.

Minimal runtime

The default mode provides verbose diagnostics which help programmers identify the undefined behavior. On the other hand, the detailed log can also help attackers, and the code size may be a concern in some configurations.

UBSan provides a minimal runtime mode to log very little information. Specify -fsanitize-minimal-runtime for both compile actions and link actions to enable the mode.

% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
% ./a <<< 1073741824
ubsan: mul-overflow by 0x000055565bab6839
a -2147483648
b -2147483648
% clang -fsanitize=undefined -fno-sanitize-recover -fsanitize-minimal-runtime a.c -o a
% ./a <<< 1073741824
ubsan: mul-overflow by 0x000056443f9a7839
[1] 3857513 IOT instruction ./a <<< 1073741824
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c '-###' |& grep --color=auto ubsan_minimal
... "--whole-archive" "/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a" "--no-whole-archive" "--dynamic-list=/tmp/Rel/lib/clang/17/lib/x86_64-unknown-linux-gnu/libclang_rt.ubsan_minimal.a.syms" ...

In the emitted LLVM IR, a different set of callbacks, __ubsan_handle_*_minimal, is used. They take no arguments and therefore make the instrumented code much smaller.

handler.mul_overflow:
  ...

Trap mode

When -fsanitize=signed-integer-overflow is in effect, we can specify -fsanitize-trap=signed-integer-overflow so that Clang will emit a trap instruction instead of a callback. This will greatly decrease the size bloat from instrumentation compared to the default mode.

Usually, we specify the umbrella options -fsanitize-trap=undefined or -fsanitize-trap (alias for -fsanitize-trap=all).

clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c

(-fsanitize-undefined-trap-on-error is a deprecated alias for -fsanitize-trap=undefined.)

Instead of calling a callback, the emitted LLVM IR will call the LLVM intrinsic llvm.ubsantrap which lowers to a trap instruction aborting the program. As a side benefit of avoiding UBSan callbacks, we don't need a runtime. Therefore, -fsanitize=undefined can be omitted for link actions. Note: -fsanitize-trap=xxx overrides -fsanitize-recover=xxx.

define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
entry:
  %0 = add i32 %a, 1073741824
  %1 = icmp sgt i32 %0, -1
  br i1 %1, label %cont, label %trap, !nosanitize !5

trap:
  ...

llvm.ubsantrap takes an integer argument. On some architectures, this argument is encoded into the trap instruction, so different error types produce different encodings.

On x86-64, llvm.ubsantrap lowers to the 4-byte ud1l ubsan_type(%eax),%eax (UD1 with an address-size override prefix; ud1l is the AT&T syntax). We can write a SIGILL handler that disassembles the instruction at si->si_addr to learn which UBSan check failed (see https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L112). AArch64 is similar. However, many other architectures do not provide sufficient encoding space. For example, PowerPC provides trap (alias for tw 31,0,0). In tw TO,RA,RB, when we specify RA=RB=0, only 4 bits of the 5-bit TO can be used, which is insufficient.

Used together with other sanitizers

UBSan can be used together with many other sanitizers.

clang -fsanitize=address,undefined a.c
clang -fsanitize=fuzzer,undefined -fno-sanitize-recover a.c
clang -fsanitize=hwaddress,undefined a.c
clang -fsanitize=memory,undefined a.c
clang -fsanitize=thread,undefined a.c
clang -fsanitize=cfi,undefined -flto -fvisibility=hidden a.c

Typically one may want to test a project with multiple sanitizers. The ability to combine UBSan with another sanitizer decreases the number of build configurations. In practice people prefer -fsanitize=address,undefined and -fsanitize=hwaddress,undefined over other combinations. Of course, more instrumentation means more overhead and size bloat.

Using UBSan with libFuzzer is interesting. We can build with -fsanitize=fuzzer,address,undefined -fno-sanitize-recover to make libFuzzer abort when an AddressSanitizer or UndefinedBehaviorSanitizer error is detected.
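A minimal fuzz target for this setup could look like the following sketch. LLVMFuzzerTestOneInput is libFuzzer's entry point; parse_and_double and the file name fuzz.c are hypothetical, chosen to reuse the a * 2 overflow from the earlier example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical code under test: reads an int from the input and doubles it. */
static int parse_and_double(const uint8_t *data, size_t size) {
    int value = 0;
    if (size < sizeof value)
        return 0; /* too short to contain an int */
    memcpy(&value, data, sizeof value);
    return value * 2; /* signed overflow for large-magnitude inputs */
}

/* libFuzzer calls this for every generated input.
   Possible build line, using the options from the text:
   clang -fsanitize=fuzzer,address,undefined -fno-sanitize-recover fuzz.c -o fuzz */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    volatile int sink = parse_and_double(data, size);
    (void)sink;
    return 0;
}
```

libFuzzer supplies main, so the file defines none. With -fno-sanitize-recover, the first input that triggers the overflow should stop the run instead of only logging a report.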

Miscellaneous

In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen instead of an LLVM pass in the optimization pipeline.

TODO Interfere with optimizations

Applications

compiler-rt/lib/ubsan is written in C++. It uses features that are unavailable in some environments (e.g. some operating system kernels). Adopting UBSan in kernels typically requires reimplementing the runtime (usually in C).

In the Linux kernel, the commit "UBSAN: run-time undefined behavior sanity checker" introduced CONFIG_UBSAN and a runtime implementation.

NetBSD implemented µUBSan in 2018.


Source: https://maskray.me/blog/2023-01-29-all-about-undefined-behavior-sanitizer