Understanding alignment - from source to object file

Alignment refers to the practice of placing data or code at memory addresses that are multiples of a specific value, typically a power of 2. This is usually done to satisfy a hardware requirement or to make accesses more efficient. Misaligned accesses can cause exceptions or performance penalties.
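To make "multiples of a power of 2" concrete, here is a minimal sketch of the two checks that recur throughout this post. The helper names isAligned and alignTo are made up for illustration and are not taken from any particular library; both assume the alignment is a non-zero power of 2.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; `align` must be a non-zero power of 2.

// True if `addr` is a multiple of `align`.
constexpr bool isAligned(std::uint64_t addr, std::uint64_t align) {
  return (addr & (align - 1)) == 0;
}

// Round `addr` up to the next multiple of `align`.
constexpr std::uint64_t alignTo(std::uint64_t addr, std::uint64_t align) {
  return (addr + align - 1) & ~(align - 1);
}

static_assert(isAligned(0x1000, 16), "0x1000 is a multiple of 16");
static_assert(!isAligned(0x1004, 8), "0x1004 is not a multiple of 8");
static_assert(alignTo(0x1005, 16) == 0x1010, "0x1005 rounds up to 0x1010");
```

The bit-mask form only works for powers of 2, which is one reason alignments are restricted to such values in practice.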

This blog post explores how alignment is represented and managed as C++ code is transformed through the compilation pipeline: from source code to LLVM IR, assembly, and finally the object file. We'll focus on alignment for both variables and functions.

Alignment in C++ source code

C++ [basic.align] specifies:

Object types have alignment requirements ([basic.fundamental], [basic.compound]) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier ([dcl.align]). Attempting to create an object ([intro.object]) in storage that does not meet the alignment requirements of the object's type is undefined behavior.

alignas can be used to request a stricter alignment. [dcl.align]

An alignment-specifier may be applied to a variable or to a class data member, but it shall not be applied to a bit-field, a function parameter, or an exception-declaration ([except.handle]). An alignment-specifier may also be applied to the declaration of a class (in an elaborated-type-specifier ([dcl.type.elab]) or class-head ([class]), respectively). An alignment-specifier with an ellipsis is a pack expansion ([temp.variadic]).

Example:

alignas(16) int i0;
struct alignas(8) S {};
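The effect of these specifiers can be checked at compile time with alignof and static_assert. The following is a small sketch; the Packet type and its members are made up for illustration.

```cpp
#include <cstddef>

struct alignas(8) S {};

// Hypothetical example type: the alignas on `payload` raises the alignment
// of the member and, as a consequence, of the enclosing class.
struct Packet {
  char tag;
  alignas(16) float payload[4];
};

static_assert(alignof(S) == 8, "class alignment requested with alignas");
static_assert(alignof(Packet) == 16, "the strictest member alignment wins");
static_assert(offsetof(Packet, payload) == 16, "padding follows tag");
static_assert(sizeof(Packet) % alignof(Packet) == 0,
              "the size is a multiple of the alignment");
```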

If the strictest alignas on a declaration is weaker than the alignment it would have without any alignas specifiers, the program is ill-formed.

% echo 'alignas(2) int v;' | clang -fsyntax-only -xc++ -
<stdin>:1:1: error: requested alignment is less than minimum alignment of 4 for type 'int'
    1 | alignas(2) int v;
      | ^
1 error generated.

However, the GNU extension __attribute__((aligned(1))) can request a weaker alignment.
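As a sketch of the extension, one place where the attribute can lower an alignment is a typedef. The typedef name unaligned_int is made up for illustration, and the example assumes a GCC-compatible compiler such as GCC or Clang.

```cpp
// GNU extension; assumes GCC or Clang. `unaligned_int` is an illustrative name.
typedef int unaligned_int __attribute__((aligned(1)));

// The typedef'd type has a weaker alignment than plain int, which alignas
// cannot express.
static_assert(alignof(unaligned_int) == 1, "alignment lowered to 1");
static_assert(alignof(unaligned_int) <= alignof(int), "never stricter than int");
```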

LLVM IR representation

In the LLVM Intermediate Representation (IR), both global variables and functions can have an align attribute to specify their required alignment.

Global variable alignment:

An explicit alignment may be specified for a global, which must be a power of 2. If not present, or if the alignment is set to zero, the alignment of the global is set by the target to whatever it feels convenient. If an explicit alignment is specified, the global is forced to have exactly that alignment. Targets and optimizers are not allowed to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array, alignment padding would break this iteration. For TLS variables, the module flag MaxTLSAlign, if present, limits the alignment to the given value. Optimizers are not allowed to impose a stronger alignment on these variables. The maximum alignment is 1 << 32.

Function alignment:

An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.

In addition, align can be used in parameter attributes to decorate a pointer or vector of pointers.

LLVM back end representation

Global variables

AsmPrinter::emitGlobalVariable determines the alignment for global variables based on a set of nuanced rules:

With an explicit alignment (explicit):
- If the variable has a section attribute, return explicit.
- Otherwise, compute a preferred alignment for the data layout (getPrefTypeAlign, referred to as pref). Return pref < explicit ? explicit : max(explicit, getABITypeAlign).

Without an explicit alignment: return getPrefTypeAlign.

getPrefTypeAlign employs a heuristic for global variable definitions: if the variable's size exceeds 16 bytes and the preferred alignment is less than 16 bytes, it sets the alignment to 16 bytes. This heuristic balances performance and memory efficiency for common cases, though it may not be optimal for all scenarios. (See Preferred alignment of globals > 16bytes in 2012)
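The rules and the 16-byte heuristic can be summarized in a small model. This is a sketch of the behavior as described above, not LLVM's actual code: the GlobalInfo struct, its fields, and chooseGlobalAlign are hypothetical names, the preferred and ABI alignments are passed in as plain numbers, and the variable is assumed to be a definition.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified model of the rules above, not LLVM's actual code.
// All names are hypothetical. Alignments and sizes are in bytes.
struct GlobalInfo {
  std::uint64_t explicitAlign; // 0 means "no explicit alignment"
  bool hasSection;             // the variable has a section attribute
  std::uint64_t prefAlign;     // stands in for getPrefTypeAlign
  std::uint64_t abiAlign;      // stands in for getABITypeAlign
  std::uint64_t size;          // size of the variable definition
};

constexpr std::uint64_t chooseGlobalAlign(const GlobalInfo &g) {
  if (g.explicitAlign != 0) {
    if (g.hasSection)
      return g.explicitAlign; // honor the explicit value exactly
    return g.prefAlign < g.explicitAlign
               ? g.explicitAlign
               : std::max(g.explicitAlign, g.abiAlign);
  }
  // No explicit alignment: preferred alignment plus the large-object heuristic.
  std::uint64_t align = g.prefAlign;
  if (g.size > 16 && align < 16)
    align = 16;
  return align;
}

static_assert(chooseGlobalAlign({0, false, 4, 4, 4}) == 4, "small object");
static_assert(chooseGlobalAlign({0, false, 4, 4, 64}) == 16, "large object");
static_assert(chooseGlobalAlign({1, false, 4, 4, 4}) == 4, "max(explicit, ABI)");
static_assert(chooseGlobalAlign({1, true, 4, 4, 4}) == 1, "section: exact");
static_assert(chooseGlobalAlign({32, false, 4, 4, 4}) == 32, "over-aligned");
```

In this model, a weak explicit alignment without a section is raised to the ABI alignment, while the same request combined with a section attribute is kept as is, so the section's contents stay densely packed.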

For assembly output, AsmPrinter emits .p2align (power of 2 alignment) directives with a zero fill value (i.e. the padding bytes are zeros).

% echo 'int v0;' | clang --target=x86_64 -S -xc - -o -
        .file   "-"
        .type   v0,@object                      # @v0
        .bss
        .globl  v0
        .p2align        2, 0x0
v0:
        .long   0                               # 0x0
        .size   v0, 4
...

Functions

For functions, AsmPrinter::emitFunctionHeader emits alignment directives based on the machine function's alignment settings.

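void MachineFunction::init() {
  ...
  // Start from the target's minimum function alignment.
  Alignment = STI.getTargetLowering()->getMinFunctionAlignment();

  // Unless the function is optimized for size, raise the alignment to the
  // target's preferred function alignment.
  if (!F.hasOptSize())
    Alignment = std::max(Alignment,
                         STI.getTargetLowering()->getPrefFunctionAlignment());
  ...
}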

The alignment is computed in two steps:
- Start with the subtarget's minimum function alignment.
- If the function is not optimized for size (i.e. not compiled with -Os or -Oz), take the maximum of the minimum alignment and the preferred alignment. For example, X86TargetLowering sets the preferred function alignment to 16.
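As a rough model of this computation, the function below takes the two target values as plain numbers. The name machineFunctionAlign and the sample values are assumptions for illustration; only the preferred alignment of 16 comes from the X86 example above.

```cpp
#include <algorithm>
#include <cstdint>

// Simplified model of the computation above; the name is hypothetical.
// `minAlign` and `prefAlign` stand in for the target's minimum and preferred
// function alignments, in bytes.
constexpr std::uint64_t machineFunctionAlign(std::uint64_t minAlign,
                                             std::uint64_t prefAlign,
                                             bool optSize) {
  return optSize ? minAlign : std::max(minAlign, prefAlign);
}

// Illustrative values: minimum alignment 1, preferred alignment 16.
static_assert(machineFunctionAlign(1, 16, false) == 16, "preferred is used");
static_assert(machineFunctionAlign(1, 16, true) == 1, "optimizing for size");
static_assert(machineFunctionAlign(4, 2, false) == 4, "minimum is a floor");
```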

% echo 'void f(){} [[gnu::aligned(32)]] void g(){}' | clang --target=x86_64 -S -xc - -o -
        .file   "-"
        .text
        .globl  f                               # -- Begin function f
        .p2align        4
        .type   f,@function
f:                                      # @f
...
        .globl  g                               # -- Begin function g
        .p2align        5
        .type   g,@function
g:                                      # @g

The emitted .p2align directives omit the fill value argument: in code sections, the padding is filled with no-op instructions.

Assembly code

GNU Assembler supports multiple alignment directives:

.p2align 3: align to 2**3
.balign 8: align to 8
.align 8: this is identical to .balign on some targets and .p2align on the others.
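The relationship between the exponent form and the byte form can be sketched as follows. The helper names are made up for illustration, and bytesToP2align assumes its argument is a power of 2 no larger than 2**63.

```cpp
#include <cstdint>

// Hypothetical helpers relating the two directive styles.

// `.p2align n` requests an alignment of 2**n bytes.
constexpr std::uint64_t p2alignToBytes(unsigned n) {
  return std::uint64_t{1} << n;
}

// Inverse: the `.p2align` exponent for a byte count as written in `.balign`.
// Assumes `bytes` is a power of 2 no larger than 2**63.
constexpr unsigned bytesToP2align(std::uint64_t bytes) {
  unsigned n = 0;
  while ((std::uint64_t{1} << n) < bytes)
    ++n;
  return n;
}

static_assert(p2alignToBytes(3) == 8, ".p2align 3 and .balign 8 agree");
static_assert(p2alignToBytes(2) == 4, ".p2align 2 is a 4-byte alignment");
static_assert(bytesToP2align(16) == 4, "16 bytes is .p2align 4");
static_assert(bytesToP2align(32) == 5, "32 bytes is .p2align 5");
```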

Clang supports "direct object emission": clang -c typically bypasses a separate assembler, and the LLVM AsmPrinter uses the MCObjectStreamer API directly. This allows Clang to emit machine code straight into the object file, without parsing and interpreting alignment directives and instructions from a text-based assembly file.

Object file format

In an object file, the section alignment is determined by the strictest alignment directive present in that section. The assembler sets the section's overall alignment to the maximum of all these directives, as if an implicit directive were at the start.

.section .text.a,"ax"
# implicit alignment max(4, 8)

.long 0
.balign 4
.long 0
.balign 8

This alignment is stored in the sh_addralign field within the ELF section header table. You can inspect this value using tools such as readelf -WS (llvm-readelf -S) or objdump -h (llvm-objdump -h).
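The "maximum of all directives" rule can be modeled in a few lines. This is a sketch, not assembler code: sectionAlign is a hypothetical name, and the starting value of 1 for a section without directives is an assumption (real assemblers may apply a target-specific default).

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical model: the section alignment is the maximum over all alignment
// directives seen in the section. The baseline of 1 is an assumption.
constexpr std::uint64_t
sectionAlign(std::initializer_list<std::uint64_t> directives) {
  std::uint64_t align = 1;
  for (std::uint64_t d : directives)
    if (d > align)
      align = d;
  return align;
}

static_assert(sectionAlign({4, 8}) == 8, "the .text.a example: max(4, 8)");
static_assert(sectionAlign({16, 2, 4}) == 16, "order does not matter");
```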

Linker considerations

The linker combines multiple object files into a single executable. When it maps input sections from each object file into output sections in the final executable, it ensures that section alignments specified in the object files are preserved.

How the linker handles section alignment

Output section alignment: This is the maximum sh_addralign value among all its contributing input sections. This ensures the strictest alignment requirements are met.

Section placement: The linker also uses input sh_addralign information to position each input section within the output section. As illustrated in the following example, each input section (like a.o:.text.f or b.o:.text) is aligned according to its sh_addralign value before being placed sequentially.

output .text
  # align to sh_addralign(a.o:.text). No-op if this is the first section without any preceding DOT assignment or data command.
  a.o:.text
  # align to sh_addralign(a.o:.text.f)
  a.o:.text.f
  # align to sh_addralign(b.o:.text)
  b.o:.text
  # align to sh_addralign(b.o:.text.g)
  b.o:.text.g
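The placement in the listing above can be modeled as a loop over the input sections that advances a location counter. This is a simplified sketch rather than real linker code: the types, the function place, and all sizes and alignments are made up for illustration, and the output section is assumed to start at an address that already satisfies its own alignment.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of the placement above; all names are hypothetical.
struct InputSection {
  std::uint64_t size;
  std::uint64_t addralign; // sh_addralign, a power of 2
};

template <std::size_t N> struct Placement {
  std::uint64_t addr[N];  // start address of each input section
  std::uint64_t outAlign; // alignment of the output section
};

constexpr std::uint64_t alignTo(std::uint64_t x, std::uint64_t a) {
  return (x + a - 1) & ~(a - 1);
}

template <std::size_t N>
constexpr Placement<N> place(std::uint64_t start,
                             const InputSection (&in)[N]) {
  Placement<N> p{};
  p.outAlign = 1;
  std::uint64_t dot = start; // location counter
  for (std::size_t i = 0; i < N; ++i) {
    dot = alignTo(dot, in[i].addralign); // padding may be inserted here
    p.addr[i] = dot;
    dot += in[i].size;
    if (in[i].addralign > p.outAlign)
      p.outAlign = in[i].addralign;
  }
  return p;
}

// Made-up sizes and alignments for the four input sections in the listing.
constexpr InputSection kInputs[] = {
    {5, 4},  // a.o:.text
    {3, 16}, // a.o:.text.f
    {10, 4}, // b.o:.text
    {1, 1},  // b.o:.text.g
};
constexpr Placement<4> kPlaced = place(0x1000, kInputs);

static_assert(kPlaced.outAlign == 16, "maximum of the input alignments");
static_assert(kPlaced.addr[1] == 0x1010, "11 bytes of padding before .text.f");
static_assert(kPlaced.addr[2] == 0x1014, "1 byte of padding before b.o:.text");
static_assert(kPlaced.addr[3] == 0x101e, "no padding for alignment 1");
```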

Furthermore, a linker script can override the default behavior. The ALIGN keyword enforces a stricter alignment for a section. For example, .text : ALIGN(32) { ... } aligns the section to at least a 32-byte boundary. This is often done to optimize for specific hardware or to satisfy memory mapping requirements.

The SUBALIGN keyword on an output section overrides the input section alignments.
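The difference between the two keywords can be captured in a small model: ALIGN only raises the output section alignment, whereas SUBALIGN replaces the alignment of each input section. The function names below are hypothetical, and a value of 0 is used to mean that the keyword is absent.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical model of the two keywords; 0 means "keyword not used".

// SUBALIGN replaces the alignment of every input section in the output
// section, whether the original value was larger or smaller.
constexpr std::uint64_t inputAlign(std::uint64_t shAddralign,
                                   std::uint64_t subalign) {
  return subalign != 0 ? subalign : shAddralign;
}

// ALIGN can only make the output section alignment stricter.
constexpr std::uint64_t outputAlign(std::uint64_t maxInputAlign,
                                    std::uint64_t align) {
  return std::max(maxInputAlign, align);
}

static_assert(outputAlign(16, 32) == 32, "ALIGN(32) with 16-byte inputs");
static_assert(outputAlign(64, 32) == 64, "a stricter input alignment wins");
static_assert(inputAlign(16, 4) == 4, "SUBALIGN(4) overrides sh_addralign 16");
static_assert(inputAlign(16, 0) == 16, "no SUBALIGN");
```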

Padding: To achieve the required alignment, the linker may insert padding between sections or before the first input section (if there is a gap after the output section start). The fill value is determined by the following rules:

If specified, use the =fillexp output section attribute (within an output section description).
If a non-code section, use zero.
Otherwise, use a trap or no-op instruction.

Target-specific special cases

Some platforms have special rules. For example,

On SystemZ, the larl (load address relative long) instruction cannot generate odd addresses. To prevent GOT indirection, compilers ensure that symbols are aligned to at least 2 bytes. (Toolchain notes on z/Architecture)
On AIX, the default alignment mode is power: for double and long double, the first member of this data type is aligned according to its natural alignment value; subsequent members of the aggregate are aligned on 4-byte boundaries. (https://reviews.llvm.org/D79719)
z/OS caps the maximum alignment of static storage variables to 16. (https://reviews.llvm.org/D98864)

The standard representation of the Itanium C++ ABI requires member function pointers to be even, to distinguish between virtual and non-virtual functions.

In the standard representation, a member function pointer for a virtual function is represented with ptr set to 1 plus the function's v-table entry offset (in bytes), converted to a function pointer as if by reinterpret_cast<fnptr_t>(uintfnptr_t(1 + offset)), where uintfnptr_t is an unsigned integer of the same size as fnptr_t.

A stricter alignment must therefore be imposed on such a function, even if __attribute__((aligned(1))) is specified:

virtual void bar1() __attribute__((aligned(1)));
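To see why an odd function address would be a problem, the encoding of the ptr field can be modeled with plain integers. This is a sketch for illustration only: the function names are hypothetical and the addresses are made up.

```cpp
#include <cstdint>

// Hypothetical model of the `ptr` field of a member function pointer in the
// standard representation described above.

// Virtual function: 1 plus the v-table entry offset in bytes.
constexpr std::uint64_t encodeVirtual(std::uint64_t vtableOffset) {
  return 1 + vtableOffset;
}

// The low bit tells the two cases apart.
constexpr bool isVirtual(std::uint64_t ptr) { return (ptr & 1) != 0; }

static_assert(isVirtual(encodeVirtual(16)), "1 + even offset is odd");
static_assert(!isVirtual(0x401000), "even address: non-virtual");
static_assert(isVirtual(0x401001), "odd address would be misread as virtual");
```

The last assertion shows the failure mode: a non-virtual function placed at an odd address would be indistinguishable from a v-table offset, which is why the alignment must be at least 2.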

IR and assembly generation

In clang/lib/CodeGen/CodeGenModule.cpp, -falign-functions=N sets the alignment if a function does not have the gnu::aligned attribute.