UndefinedBehaviorSanitizer
(UBSan) is an undefined behavior detector for C/C++. It consists of
code instrumentation and a runtime (in
llvm-project/compiler-rt
).
Clang and GCC have independent implementations. Clang implemented
the instrumentation first in 2009-12, initially named
-fcatch-undefined-behavior
. GCC 4.9 implemented
-fsanitize=undefined
in 2013-08.
Available checks
UBSan can detect many kinds of undefined behavior. One may enable a check individually with -fsanitize=xxx, where xxx is the name of the check, or, more commonly, specify -fsanitize=undefined to enable the standard set of checks.
Some checks do not correspond to undefined behavior per the language standards,
but flag code that is often a smell and leads to surprising results. They are
implicit-unsigned-integer-truncation
,
implicit-integer-sign-change
,
unsigned-integer-overflow
, etc.
% clang -fsanitize=undefined -xc /dev/null '-###'
GCC implements slightly fewer checks than Clang.
Modes
UBSan provides three modes that trade off code size against diagnostic verbosity.
- Default mode
- Minimal runtime (-fsanitize-minimal-runtime)
- Trap mode (-fsanitize-trap=undefined)
Let's use a C function to compare the behaviors.
#include <stdio.h>
Default mode
With the umbrella option -fsanitize=undefined
or the
specific -fsanitize=signed-integer-overflow
, Clang emits
the following LLVM IR for foo
:
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
The add
and icmp
instructions check whether
the argument is in the range [-0x40000000, 0x40000000), i.e. whether the multiplication cannot overflow. If
so, the function returns the result. Otherwise, the emitted code calls
the callback __ubsan_handle_mul_overflow
with arguments
describing the source location. __ubsan_handle_mul_overflow
is implemented in the runtime (in
llvm-project/compiler-rt
).
By default, all UBSan checks except return,unreachable
are recoverable (i.e. non-fatal). After
__ubsan_handle_mul_overflow
(or another callback) prints an
error, the program keeps executing. The source location is marked and
further errors about this location are suppressed.
This deduplication feature makes logs less noisy. Because execution continues,
a single program invocation can report errors from multiple source
locations.
We can make the program exit upon a UBSan error (with exit code 1 by default, customizable with e.g. UBSAN_OPTIONS=exitcode=2). Just specify
-fno-sanitize-recover
(alias for
-fno-sanitize-recover=all
),
-fno-sanitize-recover=undefined
, or
-fno-sanitize-recover=signed-integer-overflow
. A non-returning
callback (__ubsan_handle_mul_overflow_abort) is emitted in place of
__ubsan_handle_mul_overflow
. Note that the emitted LLVM IR
terminates the basic block with an unreachable
instruction.
handler.mul_overflow:
Linking
The UBSan callbacks are provided by the runtime. For a link action,
we need to inform the compiler driver that the runtime should be linked.
This can be done by specifying -fsanitize=undefined
as in a
compile action. (Actually, any specific UBSan check that needs the runtime
can be used, e.g. -fsanitize=signed-integer-overflow
.)
clang -c -O2 -fno-omit-frame-pointer -fsanitize=undefined a.c
When linking an executable, with the default
-static-libsan
mode, the Clang driver passes
--whole-archive $resource_dir/lib/$triple/libclang_rt.ubsan.a --no-whole-archive
to the linker.
Minimal runtime
The default mode provides verbose diagnostics which help programmers identify the undefined behavior. However, a detailed log can also help attackers, and the code size may be a concern in some configurations.
UBSan provides a minimal runtime mode that logs very little information.
Specify -fsanitize-minimal-runtime
for both compile actions
and link actions to enable the mode.
% clang -fsanitize=undefined -fsanitize-minimal-runtime a.c -o a
In the emitted LLVM IR, a different set of callbacks
__ubsan_handle_*_minimal
is used. They take no arguments
and therefore make the instrumented code much smaller.
handler.mul_overflow:
Trap mode
When -fsanitize=signed-integer-overflow
is in effect, we
can specify -fsanitize-trap=signed-integer-overflow
so that
Clang will emit a trap instruction instead of a callback. This
greatly reduces the code size overhead of instrumentation compared to the
default mode.
Usually, we specify the umbrella options
-fsanitize-trap=undefined
or -fsanitize-trap
(alias for -fsanitize-trap=all
).
clang -S -emit-llvm -O2 -fsanitize=undefined -fsanitize-trap=undefined a.c
(-fsanitize-undefined-trap-on-error
is a deprecated
alias for -fsanitize-trap=undefined
.)
Instead of calling a callback, the emitted LLVM IR will call the LLVM
intrinsic llvm.ubsantrap
which lowers to a trap instruction
aborting the program. As a side benefit of avoiding UBSan callbacks, we
don't need a runtime. Therefore, -fsanitize=undefined
can
be omitted for link actions. Note: -fsanitize-trap=xxx
overrides -fsanitize-recover=xxx
.
define dso_local i32 @foo(i32 noundef %a) local_unnamed_addr #0 {
llvm.ubsantrap
takes an integer argument identifying the check kind. On some
architectures the argument is encoded in the trap instruction, so
different check kinds produce different encodings.
On x86-64, llvm.ubsantrap
lowers to the 4-byte
ud1l ubsan_type(%eax),%eax
(UD1 with an address-size
override prefix; ud1l
is the AT&T syntax). We can write
a SIGILL handler that disassembles the instruction at si->si_addr
to learn
which UBSan check failed (https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/CodeGenFunction.h#L112).
AArch64 is similar. However, many other architectures do not provide
sufficient encoding space. For example, PowerPC provides
trap
(alias for tw 31,0,0
). In
tw TO,RA,RB
, when we specify RA=RB=0
, only 4
bits of the 5-bit TO
can be used. That is insufficient.
Used together with other sanitizers
UBSan can be used together with many other sanitizers.
clang -fsanitize=address,undefined a.c
A project typically wants to test with multiple sanitizers. Combining
UBSan with another sanitizer in one build reduces the number of
configurations to test. In practice people prefer
-fsanitize=address,undefined
and
-fsanitize=hwaddress,undefined
over other combinations. Of
course, more instrumentation means more overhead and larger code.
Using UBSan with libFuzzer is interesting. We can build with
-fsanitize=fuzzer,address,undefined -fno-sanitize-recover
to make libFuzzer abort when an AddressSanitizer or
UndefinedBehaviorSanitizer error is detected.
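A minimal fuzz target for this setup might look like the following sketch. `LLVMFuzzerTestOneInput` is libFuzzer's standard entry point. `double_it` and `sink` are hypothetical, and `double_it` deliberately contains a latent signed overflow for the fuzzer to find.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Keeps the result alive so the call is not optimized away. */
static volatile int sink;

/* Hypothetical function under test. Intentionally unchecked: a * 2
   overflows for large |a|, which signed-integer-overflow reports. */
static int double_it(int a) { return a * 2; }

/* libFuzzer calls this repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  int a;
  if (size < sizeof(a))
    return 0; /* not enough bytes to form an int */
  memcpy(&a, data, sizeof(a));
  sink = double_it(a);
  return 0;
}
```

Built with the options above, the first input that makes `a * 2` overflow terminates the run, and libFuzzer saves the offending input as a crash artifact.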
Miscellaneous
In Clang, unlike many other sanitizers, UndefinedBehaviorSanitizer instrumentation is performed as part of Clang CodeGen rather than as an LLVM pass in the optimization pipeline.
TODO Interfere with optimizations
Applications
compiler-rt/lib/ubsan
is written in C++. It uses
features that are unavailable in some environments (e.g. some operating
system kernels). Adopting UBSan in kernels typically requires
reimplementing the runtime (usually in C).
In the Linux kernel, the commit "UBSAN:
run-time undefined behavior sanity checker" introduced
CONFIG_UBSAN
and a runtime implementation.
NetBSD implemented µUBSan in 2018.