LLVM integrated assembler: Improving expressions and relocations

UNDER CONSTRUCTION

My previous post, LLVM integrated assembler: Improving MCExpr and MCValue, delved into enhancements made to LLVM's internal MCExpr and MCValue representations. This post covers recent refinements to MC, focusing on expression resolving and relocation generation.

Symbol equating directives

In GNU Assembler, the following directives are called symbol equating directives. I re-read its documentation (https://sourceware.org/binutils/docs/as.html) to check: yes, it uses "equating" instead of "assignment" or "definition".

symbol = expression (multiple = on the same symbol is allowed)
.set symbol, expression (equivalent to =)
.equ symbol, expression (equivalent to =)
.equiv symbol, expression (redefinition leads to errors)
.eqv symbol, expression (lazy evaluation, not implemented in LLVM integrated assembler)

Preventing cyclic dependencies

Equated symbols may form a cycle, which is not allowed.

# CHECK: [[#@LINE+2]]:7: error: cyclic dependency detected for symbol 'a'
# CHECK: [[#@LINE+1]]:7: error: expression could not be evaluated
a = a + 1

# CHECK: [[#@LINE+3]]:6: error: cyclic dependency detected for symbol 'b1'
# CHECK: [[#@LINE+1]]:6: error: expression could not be evaluated
b0 = b1
b1 = b2
b2 = b0

Previously, LLVM's integrated assembler used an occurs check to detect these cycles when parsing symbol equating directives.

bool parseAssignmentExpression(StringRef Name, bool allow_redef,
                               MCAsmParser &Parser, MCSymbol *&Sym,
                               const MCExpr *&Value) {
  ...
  Sym = Parser.getContext().lookupSymbol(Name);
  if (Sym) {
    if (Value->isSymbolUsedInExpression(Sym))
      return Parser.Error(EqualLoc, "Recursive use of '" + Name + "'");
    ...
  }

isSymbolUsedInExpression implemented the occurs check as a tree (or more accurately, a DAG) traversal.

bool MCExpr::isSymbolUsedInExpression(const MCSymbol *Sym) const {
  switch (getKind()) {
  case MCExpr::Binary: {
    const MCBinaryExpr *BE = static_cast<const MCBinaryExpr *>(this);
    return BE->getLHS()->isSymbolUsedInExpression(Sym) ||
           BE->getRHS()->isSymbolUsedInExpression(Sym);
  }
  case MCExpr::Target: {
    const MCTargetExpr *TE = static_cast<const MCTargetExpr *>(this);
    return TE->isSymbolUsedInExpression(Sym);
  }
  case MCExpr::Constant:
    return false;
  case MCExpr::SymbolRef: {
    const MCSymbol &S = static_cast<const MCSymbolRefExpr *>(this)->getSymbol();
    if (S.isVariable() && !S.isWeakExternal())
      return S.getVariableValue()->isSymbolUsedInExpression(Sym);
    return &S == Sym;
  }
  case MCExpr::Unary: {
    const MCExpr *SubExpr =
        static_cast<const MCUnaryExpr *>(this)->getSubExpr();
    return SubExpr->isSymbolUsedInExpression(Sym);
  }
  }

  llvm_unreachable("Unknown expr kind!");
}

While generally effective, this routine wasn't applied to every symbol equating scenario. Paths such as .weakref and some target-specific parsing code skipped it, so cycles could go undetected and send the assembler into an infinite loop.

To address this, I adopted a 2-color depth-first search (DFS) algorithm. While a 3-color DFS is typical for DAGs, a 2-color approach suffices for our trees, although this might lead to more work when a symbol is visited multiple times. Shared subexpressions are very rare in LLVM.

Here is the relevant change to evaluateAsRelocatableImpl. It also requires a new bit in MCSymbol to record that the symbol is currently being resolved.

@@ -497,13 +498,25 @@ bool MCExpr::evaluateAsRelocatableImpl(MCValue &Res, const MCAssembler *Asm,

   case SymbolRef: {
     const MCSymbolRefExpr *SRE = cast<MCSymbolRefExpr>(this);
-    const MCSymbol &Sym = SRE->getSymbol();
+    MCSymbol &Sym = const_cast<MCSymbol &>(SRE->getSymbol());
     const auto Kind = SRE->getKind();
     bool Layout = Asm && Asm->hasLayout();

     // Evaluate recursively if this is a variable.
+    if (Sym.isResolving()) {
+      if (Asm && Asm->hasFinalLayout()) {
+        Asm->getContext().reportError(
+            Sym.getVariableValue()->getLoc(),
+            "cyclic dependency detected for symbol '" + Sym.getName() + "'");
+        Sym.IsUsed = false;
+        Sym.setVariableValue(MCConstantExpr::create(0, Asm->getContext()));
+      }
+      return false;
+    }
     if (Sym.isVariable() && (Kind == MCSymbolRefExpr::VK_None || Layout) &&
         canExpand(Sym, InSet)) {
+      Sym.setIsResolving(true);
+      auto _ = make_scope_exit([&] { Sym.setIsResolving(false); });
       bool IsMachO =
           Asm && Asm->getContext().getAsmInfo()->hasSubsectionsViaSymbols();
       if (Sym.getVariableValue()->evaluateAsRelocatableImpl(Res, Asm,
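
The same idea can be sketched outside LLVM. The following is a minimal, hypothetical model rather than MC code: Sym, SymTab, and evaluate are invented names, and a symbol is assumed to be just a sum of other symbols plus an addend. The single IsResolving bit marks symbols that are on the DFS stack. It is cleared on exit, so a symbol reached through two different paths is simply evaluated twice.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model: a symbol's value is the sum of its operands plus Addend.
struct Sym {
  std::vector<std::string> Operands; // symbols referenced by the definition
  long Addend = 0;
  bool IsResolving = false; // the only "color": on the DFS stack or not
};

using SymTab = std::unordered_map<std::string, Sym>;

// Returns true and stores the value in Out on success. Returns false if the
// definition is cyclic or refers to an undefined symbol.
bool evaluate(SymTab &Tab, const std::string &Name, long &Out) {
  auto It = Tab.find(Name);
  if (It == Tab.end())
    return false; // undefined symbol
  Sym &S = It->second;
  if (S.IsResolving)
    return false; // back edge: Name is already being resolved, so a cycle
  S.IsResolving = true;
  long Value = S.Addend;
  bool Ok = true;
  for (const std::string &Op : S.Operands) {
    long Sub = 0;
    if (!evaluate(Tab, Op, Sub)) {
      Ok = false;
      break;
    }
    Value += Sub;
  }
  assert(S.IsResolving && "bit stays set while the symbol is on the stack");
  S.IsResolving = false; // no "finished" color: revisits repeat the work
  if (Ok)
    Out = Value;
  return Ok;
}
```

With a table holding a = a + 1, evaluate fails for a, and it fails the same way for the b0/b1/b2 chain from the test above. A definition such as d = c + c + 1 succeeds and visits c twice, which is the extra work that a third color would avoid.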

Unfortunately, I cannot remove MCExpr::isSymbolUsedInExpression, as it is still used by AMDGPU ([AMDGPU] Avoid resource propagation for recursion through multiple functions).

.weakref

Expression resolving and reassignments

= and its equivalents (.set, .equ) allow a symbol to be equated multiple times. This means when a symbol is referenced, its current value is captured at that moment, and subsequent reassignments do not alter prior references.

.data
.set x, 0
.long x         // reference the first instance
x = .-.data
.long x         // reference the second instance
.set x,.-.data
.long x         // reference the third instance

The assembly code evaluates to .long 0; .long 4; .long 8.

Historically, the LLVM integrated assembler restricted reassigning symbols whose value wasn't a parse-time integer constant (MCConstantExpr). This was a safeguard against potentially unsafe reassignments, as an old value might still be referenced.

% clang -c g.s
g.s:6:8: error: invalid reassignment of non-absolute variable 'x'
.set x,.-.data
       ^

Over the past few years, while getting various Linux kernel ports to build with Clang, we worked around this restriction by modifying the assembly code itself:

ARM: 8971/1: replace the sole use of a symbol with its definition in 2020-04
crypto: aesni - add compatibility with IAS in 2020-07
powerpc/64/asm: Do not reassign labels in 2021-12

This prior behavior wasn't ideal. I've since enabled proper reassignment by implementing a system where the symbol is cloned upon redefinition, and the symbol table is updated accordingly. Crucially, any existing references to the original symbol remain unchanged, and the original symbol is no longer included in the final emitted symbol table.
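
A rough way to picture clone-on-redefinition is the toy model below. It is only a sketch under stated assumptions: Symbol, Context, equate, reference, and emittedSymbolCount are hypothetical names, not the MCContext or MCSymbol API, and values are plain integers rather than expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of an equated symbol: a name and a constant value.
struct Symbol {
  std::string Name;
  long Value;
};

class Context {
  std::vector<std::unique_ptr<Symbol>> Storage;    // owns every instance
  std::unordered_map<std::string, Symbol *> Table; // name -> current instance

public:
  // Equate Name to Value. A redefinition creates a fresh instance and
  // repoints the table at it; the old instance stays alive for the
  // references that already captured it.
  Symbol *equate(const std::string &Name, long Value) {
    assert(!Name.empty() && "symbol names must be non-empty");
    Storage.push_back(std::unique_ptr<Symbol>(new Symbol{Name, Value}));
    Symbol *S = Storage.back().get();
    Table[Name] = S;
    return S;
  }

  // A reference captures whichever instance is current at this point.
  const Symbol *reference(const std::string &Name) const {
    auto It = Table.find(Name);
    return It == Table.end() ? nullptr : It->second;
  }

  // Only the latest instance of each name would reach the symbol table.
  std::size_t emittedSymbolCount() const { return Table.size(); }
};
```

Replaying the x example from above (equate to 0, reference, equate to 4, reference, equate to 8, reference) yields three distinct instances with values 0, 4, and 8, while emittedSymbolCount() stays at 1.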

Before rolling out this improvement, I discovered problematic uses in the AMDGPU and ARM64EC backends that required specific fixes or workarounds. This is a common challenge when making general improvements to LLVM's MC layer: you often need to untangle and resolve individual backend-specific "hacks" before a more generic interface enhancement can be applied.

Relocation generation

For a deeper dive into the concepts of relocation generation, you might find my previous post, Relocation generation in assemblers, helpful.

Driven by the need to support new RISC-V vendor relocations (e.g., Xqci extensions from Qualcomm) and my preference against introducing an extra MCAsmBackend hook, I've significantly refactored LLVM's relocation generation framework. This effort generalized existing RISC-V/LoongArch ADD/SUB relocation logic and enabled its customization for other targets like AVR and PowerPC.

MC: Generalize RISCV/LoongArch handleAddSubRelocations and AVR shouldForceRelocation

The linker relaxation framework sometimes generated redundant relocations that could have been resolved. This occurred in several scenarios, including:

.option norelax
j label
// For assembly input, RISCVAsmParser::ParseInstruction sets ForceRelocs (https://reviews.llvm.org/D46423).
// For direct object emission, RISCVELFStreamer sets ForceRelocs (#77436)
.option relax
call foo  // linker-relaxable

.option norelax
j label   // redundant relocation due to ForceRelocs
.option relax

label:

Redundant relocations also appeared for label differences within a section that contains no linker-relaxable instructions:

call foo

.section .text1,"ax"
# No linker-relaxable instruction. Label differences should be resolved.
w1:
  nop
w2:

.data
# Redundant R_RISCV_SET32 and R_RISCV_SUB32
.long w2-w1
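
The intended rule for this second case can be written down as a small predicate. This is a deliberately coarse sketch, not the logic in LLVM: Section, Label, and tryFoldDifference are hypothetical names, and the model assumes a single per-section flag, whereas the real implementation tracks more detail.

```cpp
#include <cassert>

// Hypothetical model: a section only records whether it contains at least
// one linker-relaxable instruction (such as `call foo` under .option relax).
struct Section {
  bool HasLinkerRelaxable = false;
};

struct Label {
  const Section *Sec;
  long Offset; // offset within Sec at assembly time
};

// Returns true and sets Diff when A - B is an assembly-time constant.
// Returns false when the difference must be left to the linker, for example
// as a paired SET/SUB-style relocation.
bool tryFoldDifference(const Label &A, const Label &B, long &Diff) {
  assert(A.Sec && B.Sec && "labels must be defined in a section");
  if (A.Sec != B.Sec)
    return false; // different sections: not a constant in this model
  if (A.Sec->HasLinkerRelaxable)
    return false; // relaxation may shrink code, so offsets can still change
  Diff = A.Offset - B.Offset;
  return true;
}
```

For the example above, w1 and w2 both live in .text1, which has no linker-relaxable instruction, so w2-w1 folds to a constant (4, assuming a 4-byte nop) and no relocation is needed. A difference between two labels in .text, which contains call foo, would not fold.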

These issues have now been resolved through a series of patches, significantly revamping the target-neutral relocation generation framework. Key contributions include:

I've also streamlined relocation generation within the SPARC backend. Given its minimal number of relocations, the SPARC implementation could serve as a valuable reference for downstream targets seeking to customize their own relocation handling.