UNDER CONSTRUCTION
My previous post, LLVM integrated assembler: Improving MCExpr and MCValue, delved into enhancements made to LLVM's internal MCExpr and MCValue representations. This post covers recent refinements to MC, focusing on expression resolving and relocation generation.
Symbol equating directives
In GNU Assembler, the following directives are called symbol equating. I have re-read its documentation https://sourceware.org/binutils/docs/as.html. Yes, it uses "equating" instead of "assignment" or "definition".
- symbol = expression (multiple = on the same symbol is allowed)
- .set symbol, expression (equivalent to =)
- .equ symbol, expression (equivalent to =)
- .equiv symbol, expression (redefinition leads to errors)
- .eqv symbol, expression (lazy evaluation, not implemented in LLVM integrated assembler)
Preventing cyclic dependencies
Equated symbols may form a cycle, which is not allowed.
# CHECK: [[#@LINE+2]]:7: error: cyclic dependency detected for symbol 'a'
Previously, LLVM's integrated assembler used an occurs check to detect these cycles when parsing symbol equating directives.
bool parseAssignmentExpression(StringRef Name, bool allow_redef,
isSymbolUsedInExpression implemented the occurs check as a tree (or more accurately, a DAG) traversal.
bool MCExpr::isSymbolUsedInExpression(const MCSymbol *Sym) const {
While generally effective, this routine wasn't applied to every symbol equating scenario, such as .weakref or some target-specific parsing code. Cycles could therefore go undetected and send the assembler into an infinite loop.
To address this, I adopted a 2-color depth-first search (DFS) algorithm. While a 3-color DFS is typical for DAGs, a 2-color approach suffices for our trees. The cost is that a symbol reachable through several paths is re-traversed on each visit, which is acceptable because shared subexpressions are very rare in LLVM.
Here is the relevant change to evaluateAsRelocatableImpl. It also needs a new bit in MCSymbol.
@@ -497,13 +498,25 @@ bool MCExpr::evaluateAsRelocatableImpl(MCValue &Res, const MCAssembler *Asm,
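To make the idea concrete outside of LLVM, here is a minimal sketch of a 2-color DFS over equated symbols. Everything in it is hypothetical: Symbol, Expr, IsResolving, evaluate, and the make* helpers are simplified stand-ins I made up for illustration, not MCSymbol, MCExpr, or the actual patch. The single IsResolving bit marks symbols on the current DFS path; reaching a marked symbol again means a cycle.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical, simplified stand-ins for MCSymbol and MCExpr (not LLVM's classes).
struct Expr;

struct Symbol {
  std::string Name;
  const Expr *Value = nullptr; // set for equated symbols, null for plain labels
  long Address = 0;            // used only when Value is null
  bool IsResolving = false;    // the one extra bit: symbol is on the DFS path
  explicit Symbol(std::string N) : Name(std::move(N)) {}
};

struct Expr {
  enum Kind { Constant, SymbolRef, Add };
  Kind K = Constant;
  long Imm = 0;              // Constant
  Symbol *Sym = nullptr;     // SymbolRef
  const Expr *LHS = nullptr; // Add
  const Expr *RHS = nullptr; // Add
};

// Helpers for building expressions. makeAdd stores pointers to its operands,
// so the operands must outlive the result.
Expr makeConstant(long V) { Expr E; E.K = Expr::Constant; E.Imm = V; return E; }
Expr makeRef(Symbol &S) { Expr E; E.K = Expr::SymbolRef; E.Sym = &S; return E; }
Expr makeAdd(const Expr &L, const Expr &R) {
  Expr E; E.K = Expr::Add; E.LHS = &L; E.RHS = &R; return E;
}

// Evaluates E. On a cyclic dependency, returns false and reports the symbol
// that was reached again while it was still being resolved.
bool evaluate(const Expr &E, long &Res, const Symbol *&Culprit) {
  switch (E.K) {
  case Expr::Constant:
    Res = E.Imm;
    return true;
  case Expr::SymbolRef: {
    assert(E.Sym && "SymbolRef must name a symbol");
    Symbol &S = *E.Sym;
    if (!S.Value) { // a plain label: nothing to recurse into
      Res = S.Address;
      return true;
    }
    if (S.IsResolving) { // already on the current path: cycle
      Culprit = &S;
      return false;
    }
    S.IsResolving = true;
    bool OK = evaluate(*S.Value, Res, Culprit);
    S.IsResolving = false; // unmark on exit; there is no third "done" color
    return OK;
  }
  case Expr::Add: {
    long L = 0, R = 0;
    if (!evaluate(*E.LHS, L, Culprit) || !evaluate(*E.RHS, R, Culprit))
      return false;
    Res = L + R;
    return true;
  }
  }
  return false;
}
```

Because the bit is cleared on the way out, a symbol referenced twice in one expression is simply evaluated twice rather than being reported as a cycle. That is the extra work the 2-color scheme accepts in exchange for needing only one bit per symbol.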
Unfortunately, I cannot remove MCExpr::isSymbolUsedInExpression, as it is still used by AMDGPU ([AMDGPU] Avoid resource propagation for recursion through multiple functions).
.weakref
Expression resolving and reassignments
= and its equivalents (.set, .equ) allow a symbol to be equated multiple times. This means when a symbol is referenced, its current value is captured at that moment, and subsequent reassignments do not alter prior references.
.data
The assembly code evaluates to .long 0; .long 4; .long 8.
Historically, the LLVM integrated assembler rejected reassignment of a symbol whose value wasn't a parse-time integer constant (MCConstantExpr). This was a safeguard against potentially unsafe reassignments, as an old value might still be referenced.
% clang -c g.s
Over the past few years, while getting various Linux kernel ports to build with Clang, we worked around this by modifying the assembly code itself:
- ARM: 8971/1: replace the sole use of a symbol with its definition in 2020-04
- crypto: aesni - add compatibility with IAS in 2020-07
- powerpc/64/asm: Do not reassign labels in 2021-12
This prior behavior wasn't ideal. I've since enabled proper reassignment by implementing a system where the symbol is cloned upon redefinition, and the symbol table is updated accordingly. Crucially, any existing references to the original symbol remain unchanged, and the original symbol is no longer included in the final emitted symbol table.
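The clone-on-redefinition idea can be pictured with a small toy model. This is a sketch under my own assumptions, not LLVM's implementation: SymbolTable, equate, reference, and emittedSymbolCount are hypothetical names, values are plain integers rather than expressions, and forward references are not modeled. Which object counts as "the clone" is also an implementation detail this sketch glosses over. The point it shows is only that each redefinition gets its own object, so earlier references keep the value they captured.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model; the names are illustrative, not LLVM's API.
struct Symbol {
  std::string Name;
  long Value;
};

class SymbolTable {
  std::vector<std::unique_ptr<Symbol>> Storage; // keeps superseded symbols alive
  std::map<std::string, Symbol *> Current;      // name -> latest definition

public:
  // Models `Name = Value`. Every (re)definition allocates a fresh Symbol
  // instead of mutating the previous one.
  Symbol *equate(const std::string &Name, long Value) {
    Storage.push_back(std::unique_ptr<Symbol>(new Symbol{Name, Value}));
    Symbol *S = Storage.back().get();
    Current[Name] = S;
    return S;
  }

  // Models a reference such as `.long Name`: it binds to whichever Symbol
  // is current at this point and keeps pointing at it afterwards.
  const Symbol *reference(const std::string &Name) const {
    auto It = Current.find(Name);
    assert(It != Current.end() && "reference to an undefined symbol");
    return It->second;
  }

  // Only the latest definition of each name would be emitted.
  std::size_t emittedSymbolCount() const { return Current.size(); }
};
```

Equating x to 0, 4, and 8 with a reference taken after each step leaves the three references reading 0, 4, and 8, matching the `.long 0; .long 4; .long 8` result described earlier, while only one x is counted for emission.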
Before rolling out this improvement, I discovered problematic uses in the AMDGPU and ARM64EC backends that required specific fixes or workarounds. This is a common challenge when making general improvements to LLVM's MC layer: you often need to untangle and resolve individual backend-specific "hacks" before a more generic interface enhancement can be applied.
- MCParser: Error when .set reassigns a non-redefinable variable
- MC: Allow .set to reassign non-MCConstantExpr expressions
Relocation generation
For a deeper dive into the concepts of relocation generation, you might find my previous post, Relocation generation in assemblers, helpful.
Driven by the need to support new RISC-V vendor relocations (e.g., Xqci extensions from Qualcomm) and my preference against introducing an extra MCAsmBackend hook, I've significantly refactored LLVM's relocation generation framework. This effort generalized existing RISC-V/LoongArch ADD/SUB relocation logic and enabled its customization for other targets like AVR and PowerPC.
The linker relaxation framework sometimes generated redundant relocations for fixups that could have been resolved at assembly time. This occurred in several scenarios, including:
.option norelax
And also with label differences within a section without linker-relaxable instructions:
call foo
These issues have now been resolved through a series of patches, significantly revamping the target-neutral relocation generation framework. Key contributions include:
- [MC] Refactor fixup evaluation and relocation generation
- RISCV,LoongArch: Encode RELAX relocation implicitly
- RISCV: Remove shouldForceRelocation and unneeded relocations
- MC: Remove redundant relocations for label differences
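As a rough illustration of the label-difference case, the decision can be modeled as below. This is a deliberately simplified sketch, not the logic in LLVM's MC layer: Section, Label, DiffResult, and resolveDifference are hypothetical names, and the per-section HasLinkerRelaxable flag is an assumption chosen to match the "section without linker-relaxable instructions" wording above.

```cpp
#include <cassert>

// Hypothetical toy model, not LLVM's data structures.
struct Section {
  bool HasLinkerRelaxable = false; // any instruction the linker may shrink
};

struct Label {
  const Section *Sec;
  long Offset;
};

struct DiffResult {
  bool NeedsRelocations; // true: leave A - B to the linker (e.g. an ADD/SUB pair)
  long Value;            // meaningful only when NeedsRelocations is false
};

// Decides whether A - B is already fixed at assembly time in this toy model.
DiffResult resolveDifference(const Label &A, const Label &B) {
  assert(A.Sec && B.Sec && "labels must be defined in a section");
  // Offsets can still change if the linker may shrink code in the section,
  // and labels in different sections have no fixed distance at all.
  if (A.Sec != B.Sec || A.Sec->HasLinkerRelaxable)
    return {true, 0};
  // Nothing in the section can move, so the difference is a constant.
  return {false, A.Offset - B.Offset};
}
```

In this model, two labels at offsets 16 and 4 in a section with no relaxable instructions fold to the constant 12 and need no relocation, which is the redundancy the last patch in the list removes.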
I've also streamlined relocation generation within the SPARC backend. Given its minimal number of relocations, the SPARC implementation could serve as a valuable reference for downstream targets seeking to customize their own relocation handling.