My journey with the LLVM project began with a deep dive into the world of lld and binary utilities. Countless hours were spent unraveling the intricacies of object file formats and shaping LLVM's relevant components. Though my interests have since broadened, object file formats remain a personal fascination, often drawing me into discussions around potential changes within LLVM.
This article compares several prominent object file formats, drawing upon my experience and insights.
At the heart of each format lies the representation of essential components like symbols, sections, and relocations. For each control structure, We'll begin with ELF, a widely used format, before venturing into the landscapes of other notable formats.
History of object file formats
The appendix of All about Procedure Linkage Table contains some history of these object file formats. I have move some the notes here with some additions.
a.out
The a.out format was designed for PDP-11 (1970). The quantities were 16-bit, but can be naturally extended to 32-bit or 64-bit.
In Proceedings of the Summer 1990 USENIX Conference, ELF: An Object File to Mitigate Mischievous Misoneism by James Q. Arnold provided some description.
For 32-bit machines, the a.out format was extended in several ways. Most obviously, 16-bit quantities were enlarged to 32-bit values. The symbol table changed to allow names of unlimited length. Relocation entries also changed significantly. Larger programs and different relocation conventions made it necessary to associate a relocation entry with an explicit address, instead of relying on the implicit correspondence between program sections and relocation records.
Many UNIX and UNIX-like operating systems, including SunOS, HP-UX, BSD, and Linux, used a.out before switching to ELF.
The most noticeable extension is dynamic shared library support. (This feature is distinct from static shared library, where each shared library needs a fixed address in the address space.) There are two flavors:
- In 1988, SunOS 4.0 was released with an extended a.out binary format with dynamic shared library support.
- In 1993, on NetBSD, https://github.com/NetBSD/src/commit/97ca10e37476fb84a20a8ec4b0be3188db703670
(
A linker supporting shared libraries.
) and https://github.com/NetBSD/src/commit/3d68d0acaed0a32f929b2c174146c62940005a18 (A linker supporting shared libraries (run-time part).
) added shared library support similar to the SunOS scheme.
FreeBSD a.out(5) provides a nice description.
If you follow recent years' Linux kernel news, there were some discussions when Linux eventually removed a.out support in 2022.
COFF
a.out supports three fixed loadable sections TEXT, DATA, and BSS, which is too restrictive. COFF introduces custom section support and allows up to 32767 sections. The ELF paper contains some remarks:
Common Object File Format (COFF), was designed primarily to support electronic switching systems (the telephone network). Its distinguishing features were multiple sections (text, data, uninitialized memory, reserved memory, overlays, etc.), some support for multiple target processors, defined structures for symbol tables and relocations, and debugging information tailored for the C language.
According to scnhdr.h
in System V Release 2 for NS32xxx,
COFF was designed no later than 1982. Then, System V Release 3 adopted
COFF, which motivated a lot of follow-ups.
- Windows extended COFF to the Portable Executable (PE) format.
- Texas Instruments modified COFF for its TI toolset and then switched to ELF.
- ECOFF used by Tru64 UNIX changed symbol representation.
- IBM developed XCOFF (COFF combined with the TOC module format concept, CSECT, etc) and used it for AIX.
Prominent drawbacks:
- Hard-wiring debugging information tailored for the C language into the symbol structure is complex, space-inefficient, and ugly.
- The auxiliary symbol record design is inflexible and inefficient.
- Not 32-bit-aligned symbol and section structures caused performance issue to earlier systems.
Mach-O
Carnegie Mellon University developed the Mach kernel as a proof of the microkernel concept. The operating system used a format derived from a.out, named the Mach object file format. The abbreviation, Mach-O, is often used instead. The NeXTSTEP operating system and then Darwin adopted Mach-O.
Dynamic shared library support on Mach-O came later than other object
file formats. In a NeXTSTEP manual released in 1995, I can find
MH_FVMLIB
(fixed virtual memory library, which appears to
be a static shared library scheme) but not MH_DYLIB
(used
by modern macOS for .dylib files).
ELF
Frustrations and inherent constraints of COFF, coupled with a self-imposed byte order dilemma, AT&T introduced a groundbreaking format: Executable and Linking Format (ELF). ELF revisited fixed content and hard-wired concepts in previous object file formats, removed unnecessary elements, and made control structures more flexible.
This pivotal shift was embraced by System V Release 4, marking a new era in object file format design. In the 1990s, many UNIX and UNIX-like operating systems, including Solaris, IRIX, HP-UX, Linux, and FreeBSD, switched to ELF.
Symbols
The minimum of a symbol control structure needs to encode the name, section, and value. We can require that every symbol is defined in relation to some section. We can use a section index of zero to represent an undefined symbol.
In a minimum object file format with only few hard-coded sections (a.out), the section field can be omitted. A type field can be used to decide whether the symbol can reference a function or a data object.
1 |
|
The symbol name is represented as a 32-bit index into the string table. A 32-bit integer suffices, while a 16-bit integer would be too small.
st_shndx
uses a size-saving trick. The 16-bit member
encodes a section index. If the member is SHN_XINDEX
(0xffff), then the actual value is contained in the associated section
of type SHT_SYMTAB_SHNDX
. This is a very nice trick because
the number of sections are almost always smaller than 0xff00. In
pathologic cases, there can be more sections, where a section of type
SHT_SYMTAB_SHNDX
is needed.
st_info
specifies the symbol's type (4 bits) and binding
(4 bits) attributes. Types are allocated very conservatively and usually
imply different linker behaviors. The inherently different linker
behaviors for symbol types are not that many. So 4 bits seem small, they
are sufficient in practice. As we will learn, this is significantly
smaller than COFF's type and storage class representation. A symbol's
binding is for the local/weak/global distinction. The reserved 4 bits
can accommodate more values, but only GNU reserves one value
(STB_GNU_UNIQUE
) (a misfeature in my opinion).
In COFF, function symbols can use an auxiliary symbol record to
encode the size of function (x_fsize
;
TotalSize
in PE). In ELF, st_size
is a fixed
member, used for copy relocations and symbolizers. If we eliminate copy
relocations and don't need the symbolization heuristics, this field will
become garbage.
Here is a demonstration if we remove st_size
.
1 |
|
Symbols (a.out)
1 |
|
a.out uses a nlist
to represent a symbol table entry. In
the original format, the name cannot contain more than 8 bytes. Unix's
appreciation of shorter identifier names is related to this:)
To support longer names, extensions allow n_name
to be
interpreted as an index (n_strx
) into the string table.
This member then becomes a size-saving trick by inlining a short name (8
bytes or less) into the structure. Some variants, like binutils' 64-bit
a.out format, use an index exclusively and removed
n_name
.
n_type
, broken down into three sub-fields, describes
whether a symbol is defined or undefined, external or local, and the
symbol type. The values listed on the FreeBSD manpage are also used on
PDP-11. Later System V releases extended the list with a lot of values
for stabs (a debugging format). Some values seem specific to
FORTRAN.
For a defined symbol, n_type
describes whether it is
relative to TEXT, DATA, or BSS.
Symbols (COFF)
1 |
|
COFF adopts a.out's approach to save space in symbol names. This likely made sense when most symbols were shorter. However, with today's often lengthy symbol names, this inlining technique complicates code and increases the control structure size (from 4 to 8 bytes).
The section number is a 16-bit signed integer, supporting up to 32,767 sections. Positive values indicate a section index, while special values include:
N_UNDEF
(0): Undefined symbol (distinct from a.out's n_type representation).N_ABS
(-1): Symbol has an absolute value.N_DEBUG
(-2): Special debugging symbol (value is meaningless).
COFF's n_type
and n_sclass
encode C' type
and storage class information. PE assigns longer names to these types
and storage classes longer names, e.g.,
IMAGE_SYM_TYPE_CHAR/IMAGE_SYM_TYPE_SHORT
,
IMAGE_SYM_CLASS_AUTOMATIC/IMAGE_SYM_CLASS_EXTERNAL
. While
values are mostly consistent, minor differences exist:
- PE's
IMAGE_SYM_TYPE_VOID
(1) is different from System V Release 3's#define T_ARG 1 /* function argument (only used by compiler) */
. - PE's
IMAGE_SYM_CLASS_WEAK_EXTERNAL
(105) is different from System V Release 3's#define C_ALIAS 105 /* duplicate tag */
.
Symbols with C_EXT
(IMAGE_SYM_CLASS_EXTERNAL
) are global and added to the
linker's global symbol table, akin to ELF's STB_GLOBAL
symbol binding.
If we discard the influence of stabs, n_type
and
n_class
just provide a wasteful counterpart to ELF's
st_info
.
n_numaux
relates to Auxiliary Symbol Records, allowing
extra information but introducing non-uniform symbol table entries.
While seemingly beneficial, their use cases are limited and could often
be encoded using separate sections. In PE, an auxiliary symbol record
can represent weak definitions, but weak references are not supported.
They can also provide extra information to section symbols.
ECOFF defines Local Symbol Entry (SYMR) and External Symbol Entry (EXTR).
1 | typedef struct { |
Symbols (Mach-O)
1 |
|
Mach-O's nlist_64
is not that different from a.out's,
with n_other
changed to n_sect
to indicate the
section index. The 8-bit n_sect field restricts representable sections
to 255 without out-of-band data (discussed later). If we extend
n_sect
to 32-bit, with alignment padding the structure size
will increase to 24 bytes, the same as Elf64_Sym
.
Like a.out, the N_EXT
bit of n_type
indicates an external symbol. The N_PEXT
bit indicates a
private external symbol.
Key bits in n_desc
are N_WEAK_DEF
,
N_WEAK_REF
, and N_ALT_ENTRY
.
Sections
1 |
|
The section name is represented as a 32-bit index into the string
table. If we use a 16-bit integer, a large number of section names with
a symbol suffix (e.g. .text.foo
.text.bar
)
could make the index overflow.
sh_type
categorizes the section's contents and
semantics. It avoids hard-coding magic names in many scenarios.
Technically a 16-bit type could work pretty well but was deemed
insufficient for flexibility.
sh_flags
describe miscellaneous attributes, e.g.
writable and executable permissions, and whether the section should
appear in a loadable segment. This member is 32-bit in
Elf32_Shdr
while 64-bit in Elf64_Shdr
. In
practice no architecture defines flags for bits 32 to 63, therefore this
member is somewhat wasteful.
Location and size. sh_offset
gives the byte offset from
the beginning of the file to the first byte in the section. To support
object files larger than 4GiB, this member has to be 64-bit.
sh_size
gives the section's size in bytes. A section type
of SHT_NOBITS
occupies no space in the file. To support
sections larger than 4GiB, this member has to be 64-bit.
Address and alignment. sh_addr
describes the address at
which the section's first byte should reside for an executable or shared
object. It should be zero for relocatable files.
sh_addralign
holds the address alignment. In practice this
member must be a power of 2 even if the generic ABI does not require so.
This member is 64-bit in ELF64, which allows an alignment up to
2**63
. In practice, an alignment larger than the page size
(or the largest huge page size, if huge pages are enabled) does not make
sense, and a maxiumm value of 2**31 is sufficient. Therefore, we could
use a log2 value to hold the alignment.
Connection information. sh_link
holds a section index.
sh_info
holds either a section index or a symbol index. If
you recall that st_shndx
is 16 bits for very solid reason,
you will know that the two fields are somewhat wasteful.
For a table of fixed-size entries, sh_entsize
holds the
entry size in bytes. In some use cases this member is not a power of
two. In practice, one byte suffices.
While ELF's section header structure is designed for flexibility,
potential optimizations could reduce its size without significant loss
of functionality. By using smaller data types for sh_flags
,
sh_link
, sh_info
, and sh_entsize
based on practical needs, we could make the structure significantly
smaller.
1 | // 32 bytes |
Reducing sh_type
into 2 bytes loses flexibility a bit.
If this deems insufficient, we could take 3 bits from
sh_addralign
(by turning it into a bitfield) and give them
to sh_type
.
Sections (COFF)
1 | // COFF (System V Release 3), 40 bytes, when sizeof(long) == 4 |
PE's section control structure demonstrates a minor modification
compared to COFF, s_paddr => VirtualSize
.
The presented structure measures as 40 bytes when long
is 4 bytes. If we extend
s_paddr, s_vaddr, s_size, s_scnptr, s_relptr, s_lnnoptr
to
8 bytes, the structure will be of 64 bytes.
The section name supports up to 8 bytes. A longer name would require an extension similar to the symbol control structure.
Encoding both s_paddr
and s_vaddr
is
wasteful. ELF encodes the physical address in the segment and therefore
removes the member from its section structure.
COFF embeds the location and size of relocations into the section
structure. This is actually pretty nice. A 16-bit s_nreloc
may appear restritive but is sufficient for relocatable files. In
practice, the number of relocations can exceed 65536 for a single
section using relocatable linking.
Line number entries are used to relate addresses to source file line
numbers. s_lnnoptr
and s_nlnno
are for
obsoleted line number information. They are embedded in the section
structure, which is very inflexible.
Sections (Mach-O)
1 |
|
A Mach-O binary is divided into segments, each housing one or more sections. The section structure encodes the section name and the segment name, both can be up to 16 bytes. This representation allows the section names to be read without a string table, but restrictive for descriptive names. Section semantics are derived from the name (unlike ELF).
The segment name is redundantly encoded within the section structure.
We could derive the segment from the section name and flags, e.g.,
S_ATTR_SOME_INSTRUCTIONS => __TEXT
,
S_ZEROFILL => ZeroFill __DATA
.
There is a severe limitation: maximum of 255 sections due to
nlist::n_sect
being a uint8_t
. This is
apparently too restrictive. Thankfully, an innovative feature
.subsections_via_symbols
overcomes the limitation. The
feature uses a monolithic section with "atoms" dividing it into pieces
(subsections). This is more size-efficient than ELF's
-ffunction-sections -fdata-sections
. However, there are
assembler limitations, relocation processing complexity, and potential
loss of ability to ensure that two non-local symbols are not
reordered.
Like COFF, Mach-O embeds the location and size of relocations into the section structure.
The three trailing reserved members are a bad idea. They increase the size considerably. A better approach would be to just change the version number when the structure needs to grow. It's unlikely that an older consumer can interepret a new section with new members set for new semantics.
Relocations
1 |
|
r_info
specifies the symbol table index with respect to
which the relocation must be made, and the type of relocation to
apply.
- ELFCLASS32: 8-bit type, 24-bit symbol index
- ELFCLASS64: 32-bit type, 32-bit symbol index
There are two variants, REL and RELA. Let's quote the generic ABI:
As specified previously, only Elf32_Rela and Elf64_Rela entries contain an explicit addend. Entries of type Elf32_Rel and Elf64_Rel store an implicit addend in the location to be modified. Depending on the processor architecture, one form or the other might be necessary or more convenient. Consequently, an implementation for a particular machine may use one form exclusively or either form depending on context.
Relocatable files need a lot of relocatable types while executables and shared objects need only a few. The former is often called static relocations while the latter is called dynamic relocations.
Of the few dynamic relocation types, most do not need the addend
member. lld provides an option -z rel
to use
SHT_REL/DT_REL
dynamic relocations.
If we disregard the REL dynamic relocation scenario, then all modern architectures use RELA exclusively. Most architectures encode the immediate with only few bits, which are inadequate for many relocatable file uses.
ELFCLASS64, with its 64-bit members, doubles the size compared to ELFCLASS32's 32-bit members. Since relocations often comprise a substantial portion of object files, this size difference can lead to user concerns. However, in practice, a 24-bit symbol index is often sufficient, even in 64-bit contexts. Therefore, if a 64-bit architecture's relocation type requirements are less than 256, ELFCLASS32 can be a viable and more size-efficient option.
Relocations (a.out)
1 |
|
r_symbolnum
mirrors ELF's ELF32_R_SYM
.
The other bitfields, resembling ELF's ELF32_R_TYPE
, but
split into distinct fields:
r_pcrel
r_length
- others
Reserving dedicated semantics for individual bits can limit adaptability. COFF and ELF opted to remove bitfields in favor of a type to provide greater flexibility.
Relocations (COFF)
1 |
|
This format resembles ELF's Elf32_Rel
.
r_vaddr
gives the virtual address of the location at
which to apply the relocation action. If we interpret
r_vaddr
as an offset (as PE does) and restrict section size
to 32 bits, we could reuse this structure for 64-bit architectures.
r_symndx
is a 32-bit symbol table index.
r_type
is a 16-bit relocation type, limited in number
compared to ELF.
COFF generally supports fewer relocation types than ELF. System V
Release 3 defines very few relocations for each architecture. In
binutils, include/coff/*.h
files define relocations for
more architectures.
Relocations (Mach-O)
1 |
|
Mach-O's relocation structure closely mirrors a.out's with adapted
r_symbolnum
meaning. When r_extern == 0
(local), the r_symbolnum
member references a section index
instead of a symbol index. This is to support custom sections, breaking
the three-section limitation (text, data, and bss) of traditional
a.out.
As aforementioned, dedicating bits to bitfields
(r_pcrel
, r_length
, and
r_scattered
greatly restricted the number of relocation
types.
Related to the relocation type limitation, a
.long foo - .
in a data section requires a pair of
relocations, SUBTRACTOR
and/UNSIGNED
. I have
some notes on Port LLVM XRay
to Apple systems.
Size comparison
TODO
Size reduction opportunities
ELFCLASS32 structures are already compact, offering limited size reduction potential. ELFCLASS64 structures, while flexible, can be optimized by sacrificing some flexibility (64-bit quantities). The 64-bit symbol control structure is compact, but section and relocation's are quite wasteful if we can sacrifice some flexibility.
As the ELF paper acknowledges, "Relocatable and executable files do not necessarily have the same constraints, and we considered using two file formats. Eventually, we decided the two activities were similar enough that a single format would suffice." There are more tools inspecting executables than relocatable files. So, naturally, we might want to change just relocatable files. Can we use ELFCLASS32 relocatable files for 64-bit architectures?
Well, x86-64 and AArch64 make a clear distinct of ELFCLASS32 and ELFCLASS64. ELFCLASS32 is for ILP32 (x32, aarch64_ilp32) while ELFCLASS64 is for LP64. However, the discontinued Itanium architecture sets a precedent that ELFCLASS32 can be used for LP64 programs. Quoting its psABI (Intel Itanium Processorspecific Application Binary Interface (ABI)).
For Itanium architecture ILP32 relocatable (i.e. of type ET_REL) objects, the file class value in e_ident[EI_CLASS] must be ELFCLASS32. For LP64 relocatable objects, the file class value may be either ELFCLASS32 or ELFCLASS64, and a conforming linker must be able to process either or both classes. ET_EXEC or ET_DYN object file types must use ELFCLASS32 for ILP32 and ELFCLASS64 for LP64 programs.
Addresses appearing in ELFCLASS32 relocatable objects for LP64 programs are implicitly extended to 64 bits by zero-extending.
Note: Some constructs legal in LP64 programs, e.g. absolute 64-bit addresses outside the 32-bit range, may require use of an ELFCLASS64 relocatable object file.
Given the prior art, it seems promising to allow ELFCLASS32 when the code size concerns people. Ideally there should be a marker to distinguish ILP32 and LP64-using-ELFCLASS32 object files.
The primary changes reside in the assembler and linker. It's also important to ensure that binary manipulation programs (like objcopy) and dump tools are happy with them.
Further optimization potential lies in exploring the use of
Elf32_Rel
instead of Elf32_Rela
for even
smaller relocations.
Replacing control structures
This approach is independent of whether ELFCLASS32 is adopted and can be applied to both ELFCLASS32 and ELFCLASS64. The ELF paper is clear, "ELF allows extension and redefinition for other control structures." However, caution is warranted due to the significant impact on the ecosystem as many tools rely on the existing structures.
One promising example is Elf32_Shdr_minimized
, a custom
structure reduced to 32 bytes from the standard
Elf32_Shdr
's 40 bytes. While I would be nervous, but if we
reduce sh_type
to a uint16_t
, the structure
size can reduce to 28 bytes.
TODO
WebAssembly