Alignment refers to the practice of placing data or code at memory addresses that are multiples of a specific value, typically a power of 2. This is done either because the underlying hardware requires it or because aligned accesses are more efficient. Misalignment can lead to exceptions or performance penalties.
This blog post explores how alignment is represented and managed as C++ code is transformed through the compilation pipeline: from source code to LLVM IR, assembly, and finally the object file. We'll focus on alignment for both variables and functions.
Alignment in C++ source code
C++ [basic.align] specifies
Object types have alignment requirements ([basic.fundamental], [basic.compound]) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier ([dcl.align]). Attempting to create an object ([intro.object]) in storage that does not meet the alignment requirements of the object's type is undefined behavior.
alignas can be used to request a stricter alignment. [dcl.align] specifies
An alignment-specifier may be applied to a variable or to a class data member, but it shall not be applied to a bit-field, a function parameter, or an exception-declaration ([except.handle]). An alignment-specifier may also be applied to the declaration of a class (in an elaborated-type-specifier ([dcl.type.elab]) or class-head ([class]), respectively). An alignment-specifier with an ellipsis is a pack expansion ([temp.variadic]).
Example:
alignas(16) int i0;
If the strictest alignas on a declaration is weaker than the alignment it would have without any alignas specifiers, the program is ill-formed.
% echo 'alignas(2) int v;' | clang -fsyntax-only -xc++ -
However, the GNU extension __attribute__((aligned(1))) can request a weaker alignment.
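The standard alignas rules above can be checked at compile time. The following is a minimal sketch, not taken from the original post: the names Vec4, Mixed, and isAligned are hypothetical, and the assertions assume a typical target where int has an alignment of at most 8.

```cpp
#include <cstddef>
#include <cstdint>

alignas(16) int i0;  // over-aligned variable, as in the example above

struct alignas(32) Vec4 {  // alignas applied to a class declaration
  float lanes[4];
};

struct Mixed {
  char tag;
  alignas(8) int value;  // alignas applied to a class data member
};

// alignas on a variable constrains that object, not its type.
static_assert(alignof(decltype(i0)) == alignof(int), "type alignment is unchanged");
// alignas on a class raises the alignment of the type itself.
static_assert(alignof(Vec4) == 32, "class alignment follows alignas");
static_assert(sizeof(Vec4) % 32 == 0, "size is padded to a multiple of the alignment");
// An over-aligned member raises the alignment of the enclosing class.
static_assert(alignof(Mixed) >= 8, "member alignment propagates to the class");
static_assert(offsetof(Mixed, value) % 8 == 0, "member sits on an 8-byte boundary");

// Runtime check that an address is a multiple of `alignment`.
inline bool isAligned(const void *p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling isAligned(&i0, 16) at run time should return true, since the implementation must honor the alignas request on i0.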
LLVM IR representation
In the LLVM Intermediate Representation (IR), both global variables and functions can have an align attribute to specify their required alignment.
An explicit alignment may be specified for a global, which must be a power of 2. If not present, or if the alignment is set to zero, the alignment of the global is set by the target to whatever it feels convenient. If an explicit alignment is specified, the global is forced to have exactly that alignment. Targets and optimizers are not allowed to over-align the global if the global has an assigned section. In this case, the extra alignment could be observable: for example, code could assume that the globals are densely packed in their section and try to iterate over them as an array, alignment padding would break this iteration. For TLS variables, the module flag MaxTLSAlign, if present, limits the alignment to the given value. Optimizers are not allowed to impose a stronger alignment on these variables. The maximum alignment is 1 << 32.
Function alignment
An explicit alignment may be specified for a function. If not present, or if the alignment is set to zero, the alignment of the function is set by the target to whatever it feels convenient. If an explicit alignment is specified, the function is forced to have at least that much alignment. All alignments must be a power of 2.
In addition, align can be used in parameter attributes to decorate a pointer or vector of pointers.
LLVM back end representation
Global variables
AsmPrinter::emitGlobalVariable determines the alignment for global variables based on a set of nuanced rules:
- With an explicit alignment (explicit),
  - If the variable has a section attribute, return explicit.
  - Otherwise, compute a preferred alignment for the data layout (getPrefTypeAlign, referred to as pref). Return pref < explicit ? explicit : max(explicit, getABITypeAlign).
- Without an explicit alignment: return getPrefTypeAlign.
getPrefTypeAlign employs a heuristic for global variable definitions: if the variable's size exceeds 16 bytes and the preferred alignment is less than 16 bytes, it sets the alignment to 16 bytes. This heuristic balances performance and memory efficiency for common cases, though it may not be optimal for all scenarios. (See Preferred alignment of globals > 16bytes in 2012)
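The rules above can be restated as a small decision function. This is only a sketch of the logic as described in this post, not LLVM's actual code: GlobalInfo and globalAlignment are hypothetical names, and the preferred and ABI alignments are passed in as plain numbers rather than queried from a data layout.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical summary of the facts the rules depend on.
struct GlobalInfo {
  std::uint64_t explicitAlign;  // 0 means no explicit alignment
  bool hasSection;              // variable has a section attribute
  std::uint64_t prefTypeAlign;  // preferred alignment of the type
  std::uint64_t abiTypeAlign;   // ABI alignment of the type
  std::uint64_t sizeInBytes;    // size of the variable
  bool isDefinition;            // the heuristic applies to definitions only
};

constexpr std::uint64_t globalAlignment(const GlobalInfo &g) {
  if (g.explicitAlign != 0) {
    // A variable in an explicit section keeps exactly the requested alignment.
    if (g.hasSection)
      return g.explicitAlign;
    // Otherwise a weak explicit alignment is raised to the ABI alignment.
    return g.prefTypeAlign < g.explicitAlign
               ? g.explicitAlign
               : std::max(g.explicitAlign, g.abiTypeAlign);
  }
  std::uint64_t align = g.prefTypeAlign;
  // Heuristic: definitions larger than 16 bytes get at least 16-byte alignment.
  if (g.isDefinition && g.sizeInBytes > 16 && align < 16)
    align = 16;
  return align;
}
```

Under these assumptions, a 4-byte int with no explicit alignment gets 4, a 64-byte int array definition gets 16, and an int with an explicit alignment of 1 gets 4 unless it also has a section attribute, in which case it keeps 1.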
For assembly output, AsmPrinter emits .p2align (power of 2 alignment) directives with a zero fill value (i.e. the padding bytes are zeros).
% echo 'int v0;' | clang --target=x86_64 -S -xc - -o -
Functions
For functions, AsmPrinter::emitFunctionHeader emits alignment directives based on the machine function's alignment settings.
void MachineFunction::init() {
- The subtarget's minimum function alignment
- If the function is not optimized for size (i.e. not compiled with -Os or -Oz), take the maximum of the minimum alignment and the preferred alignment. For example, X86TargetLowering sets the preferred function alignment to 16.
% echo 'void f(){} [[gnu::aligned(32)]] void g(){}' | clang --target=x86_64 -S -xc - -o -
The emitted .p2align directives omit the fill value argument: for code sections, this space is filled with no-op instructions.
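The function alignment choice described in this section can be sketched as a pure function. This is an illustration under stated assumptions rather than the LLVM implementation: functionAlignment is a hypothetical name, and treating an explicit alignment as a lower bound is an assumption based on the IR rule that a function has at least its explicit alignment.

```cpp
#include <algorithm>
#include <cstdint>

// minAlign:      the subtarget's minimum function alignment
// prefAlign:     the target's preferred function alignment
// optForSize:    true when the function is optimized for size (-Os or -Oz)
// explicitAlign: explicit alignment on the function, 0 if none (assumed to
//                act as a lower bound on the final alignment)
constexpr std::uint64_t functionAlignment(std::uint64_t minAlign,
                                          std::uint64_t prefAlign,
                                          bool optForSize,
                                          std::uint64_t explicitAlign = 0) {
  std::uint64_t align = minAlign;
  // The preferred alignment only applies when not optimizing for size.
  if (!optForSize)
    align = std::max(align, prefAlign);
  return std::max(align, explicitAlign);
}
```

With an illustrative minimum of 1 and a preferred alignment of 16, the sketch yields 16 for an ordinary function, 1 when optimizing for size, and 32 when the function carries an explicit alignment of 32.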
Assembly code
GNU Assembler supports multiple alignment directives:
- .p2align 3: align to 2**3
- .balign 8: align to 8
- .align 8: this is identical to .balign on some targets and .p2align on the others.
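The arithmetic behind these directives is small enough to sketch. The helper names below (p2alignBytes, alignTo, paddingFor) are hypothetical and only model how an offset is rounded up; they assume the alignment is a power of 2.

```cpp
#include <cstdint>

// .p2align n requests an alignment of 2**n bytes.
constexpr std::uint64_t p2alignBytes(unsigned n) {
  return std::uint64_t{1} << n;
}

// Round `offset` up to the next multiple of `align` (a power of 2),
// which is what .balign align does to the current location.
constexpr std::uint64_t alignTo(std::uint64_t offset, std::uint64_t align) {
  return (offset + align - 1) & ~(align - 1);
}

// Number of padding bytes the directive has to insert.
constexpr std::uint64_t paddingFor(std::uint64_t offset, std::uint64_t align) {
  return alignTo(offset, align) - offset;
}
```

For instance, at offset 5 both .p2align 3 and .balign 8 move the location to 8 by inserting 3 padding bytes, while at offset 16 they insert nothing.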
Clang supports "direct object emission" (clang -c typically bypasses a separate assembler): the LLVM AsmPrinter directly uses the MCObjectStreamer API. This allows Clang to emit the machine code directly into the object file, bypassing the need to parse and interpret alignment directives and instructions from a text-based assembly file.
Object file format
In an object file, the section alignment is determined by the strictest alignment directive present in that section. The assembler sets the section's overall alignment to the maximum of all these directives, as if an implicit directive were at the start.
.section .text.a,"ax"
This alignment is stored in the sh_addralign field within the ELF section header table. You can inspect this value using tools such as readelf -WS (llvm-readelf -S) or objdump -h (llvm-objdump -h).
Linker considerations
The linker combines multiple object files into a single executable. When it maps input sections from each object file into output sections in the final executable, it ensures that section alignments specified in the object files are preserved.
How the linker handles section alignment
Output section alignment: This is the maximum sh_addralign value among all its contributing input sections. This ensures the strictest alignment requirements are met.
Section placement: The linker also uses input sh_addralign information to position each input section within the output section. As illustrated in the following example, each input section (like a.o:.text.f or b.o:.text) is aligned according to its sh_addralign value before being placed sequentially.
output .text
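The two rules above (maximum alignment for the output section, per-input alignment before sequential placement) can be sketched as follows. This is a simplified model, not how any particular linker is implemented: InputSection, Layout, and layoutOutputSection are hypothetical names, and linker script features such as ALIGN or SUBALIGN are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct InputSection {  // hypothetical: just the fields the rules need
  std::uint64_t size;
  std::uint64_t addralign;  // sh_addralign of the input section
};

struct Layout {
  std::uint64_t outputAlign;           // alignment of the output section
  std::vector<std::uint64_t> offsets;  // offset of each input section
  std::uint64_t size;                  // total size including padding
};

inline Layout layoutOutputSection(const std::vector<InputSection> &inputs) {
  Layout layout{1, {}, 0};
  for (const InputSection &sec : inputs) {
    // In ELF, sh_addralign values 0 and 1 both mean "no constraint".
    std::uint64_t align = std::max<std::uint64_t>(sec.addralign, 1);
    assert((align & (align - 1)) == 0 && "sh_addralign must be a power of 2");
    // Output alignment is the maximum over all contributing inputs.
    layout.outputAlign = std::max(layout.outputAlign, align);
    // Pad up to the input's alignment, then place it.
    layout.size = (layout.size + align - 1) & ~(align - 1);
    layout.offsets.push_back(layout.size);
    layout.size += sec.size;
  }
  return layout;
}
```

For three inputs with (size, alignment) of (5, 16), (3, 4), and (8, 32), this model places them at offsets 0, 8, and 32, giving an output section of size 40 with alignment 32.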
Furthermore, a linker script can override the default behavior. The ALIGN keyword enforces a stricter alignment for a section. For example, .text : ALIGN(32) { ... } aligns the section to at least a 32-byte boundary. This is often done to optimize for specific hardware or for memory mapping requirements.
The SUBALIGN keyword on an output section overrides the input section alignments.
Padding: To achieve the required alignment, the linker may insert padding between sections or before the first input section (if there is a gap after the output section start). The fill value is determined by the following rules:
- If specified, use the =fillexp output section attribute (within an output section description).
- If a non-code section, use zero.
- Otherwise, use a trap or no-op instruction.
Target-specific special cases
Some platforms have special rules. For example,
- On SystemZ, the larl (load address relative long) instruction cannot generate odd addresses. To prevent GOT indirection, compilers ensure that symbols are at least aligned by 2. (Toolchain notes on z/Architecture)
- On AIX, the default alignment mode is power: for double and long double, the first member of this data type is aligned according to its natural alignment value; subsequent members of the aggregate are aligned on 4-byte boundaries. (https://reviews.llvm.org/D79719)
- z/OS caps the maximum alignment of static storage variables to 16. (https://reviews.llvm.org/D98864)
The standard representation of the Itanium C++ ABI requires member function pointers to be even, to distinguish between virtual and non-virtual functions.
In the standard representation, a member function pointer for a virtual function is represented with ptr set to 1 plus the function's v-table entry offset (in bytes), converted to a function pointer as if by reinterpret_cast<fnptr_t>(uintfnptr_t(1 + offset)), where uintfnptr_t is an unsigned integer of the same size as fnptr_t.
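The encoding quoted above explains why non-virtual member functions need even addresses. The sketch below models only the ptr field as an integer; encodeVirtual, isVirtual, vtableOffset, and usableAsNonVirtual are hypothetical helper names, and uintfnptr_t is assumed here to be std::uintptr_t.

```cpp
#include <cstdint>

// Assumption: an unsigned integer the same size as a function pointer.
using uintfnptr_t = std::uintptr_t;

// Virtual function: ptr is 1 plus the v-table entry offset in bytes.
// V-table offsets are multiples of the pointer size, so the result is odd.
constexpr uintfnptr_t encodeVirtual(uintfnptr_t offset) {
  return 1 + offset;
}

// The low bit distinguishes virtual (odd) from non-virtual (even) values.
constexpr bool isVirtual(uintfnptr_t ptr) {
  return (ptr & 1) != 0;
}

// Recover the v-table entry offset from a virtual encoding.
constexpr uintfnptr_t vtableOffset(uintfnptr_t ptr) {
  return ptr - 1;
}

// A non-virtual function address must be even, i.e. aligned to at least 2,
// or it would be mistaken for a virtual encoding.
constexpr bool usableAsNonVirtual(uintfnptr_t address) {
  return address % 2 == 0;
}
```

An address such as 0x401001 would be read as a virtual entry at offset 0x401000, which is why a 1-byte function alignment cannot be honored for member functions under this representation.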
Such a function must be given an alignment of at least 2 even if __attribute__((aligned(1))) is specified:
virtual void bar1() __attribute__((aligned(1)));
IR and assembly generation
In clang/lib/CodeGen/CodeGenModule.cpp, -falign-functions=N sets the alignment if a function does not have the gnu::aligned attribute.