A symbol whose st_shndx field holds SHN_COMMON is a COMMON symbol. Alternatively, a symbol whose type (the low four bits of st_info) is STT_COMMON is a COMMON symbol.
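The two conditions above can be sketched as a small predicate. This is only an illustration: struct sym is a hypothetical, trimmed-down stand-in for Elf64_Sym, and is_common is a helper name made up for this example. The constant values are the ones the generic ABI assigns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the generic ABI. */
enum { SHN_COMMON = 0xfff2, STT_COMMON = 5 };

/* Hypothetical, trimmed-down view of a symbol table entry. */
struct sym {
  uint8_t st_info;   /* binding in the high nibble, type in the low nibble */
  uint16_t st_shndx; /* section index, or a special SHN_* value */
};

/* The symbol type is the low 4 bits of st_info. */
static inline unsigned sym_type(const struct sym *s) {
  return s->st_info & 0xf;
}

/* A symbol is COMMON if either labeling scheme says so. */
static inline bool is_common(const struct sym *s) {
  assert(s != NULL);
  return s->st_shndx == SHN_COMMON || sym_type(s) == STT_COMMON;
}
```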
Programming language behavior
FORTRAN 77 COMMON blocks compile to COMMON symbols. You can declare a COMMON block in more than one file, with each declaration specifying the number, type, and size of its variables. The linker allocates enough space to satisfy the largest size.
This feature was somehow ported to C. Unix C compilers traditionally permitted tentative definitions of the same variable in different compilation units, and the linker would allocate enough space without reporting an error.
This behavior is contrary to both the C and C++ standards, but GCC and Clang traditionally defaulted to -fcommon for C. GCC since 10 and Clang since 11 default to -fno-common.
Assembler behavior
The directive .comm identifier, size[, alignment] instructs the assembler to define a COMMON symbol with the specified size and the optional alignment.
In the ELF object file format, the symbol is represented as an STT_OBJECT STB_GLOBAL symbol whose st_shndx field holds SHN_COMMON.
1 | typedef struct { |
1 | % cat a.s |
The st_value field holds the alignment.
The STB_WEAK binding is not allowed. Types other than STT_OBJECT are not allowed either:
1 | % >err.s cat <<e |
The generic ABI supports STT_COMMON as another way to label a COMMON symbol. It says:
Symbols with type STT_COMMON label uninitialized common blocks. In relocatable objects, these symbols are not allocated and must have the special section index SHN_COMMON (see below). In shared objects and executables these symbols must be allocated to some section in the defining object.
In relocatable objects, symbols with type STT_COMMON are treated just as other symbols with index SHN_COMMON. If the link-editor allocates space for the SHN_COMMON symbol in an output section of the object it is producing, it must preserve the type of the output symbol as STT_COMMON.
When the dynamic linker encounters a reference to a symbol that resolves to a definition of type STT_COMMON, it may (but is not required to) change its symbol resolution rules as follows: instead of binding the reference to the first symbol found with the given name, the dynamic linker searches for the first symbol with that name with type other than STT_COMMON. If no such symbol is found, it looks for the STT_COMMON definition of that name that has the largest size.
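The optional dynamic linker rule in the last quoted paragraph can be sketched like this. It is a toy model, not code from any real loader: struct def and pick_definition are hypothetical names, and the array stands for the definitions of one name in search order. Picking the earliest entry when two STT_COMMON definitions have the same size is my assumption; the ABI text does not say.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STT_COMMON = 5 }; /* value from the generic ABI */

/* Hypothetical candidate definition found in one loaded object. */
struct def {
  unsigned type; /* STT_* */
  uint64_t size; /* st_size */
};

/* defs[] holds the definitions of one name in search order.
   Returns the index that the optional rule would bind to. */
static size_t pick_definition(const struct def *defs, size_t n) {
  assert(defs != NULL && n > 0);
  size_t largest = 0;
  for (size_t i = 0; i < n; i++) {
    if (defs[i].type != STT_COMMON)
      return i; /* the first non-STT_COMMON definition wins */
    if (defs[i].size > defs[largest].size)
      largest = i; /* ties keep the earlier entry (assumption) */
  }
  return largest; /* only STT_COMMON definitions: take the largest */
}
```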
--elf-stt-common=yes causes the GNU assembler to use STT_COMMON. It is super rare in the wild, though.
1 | % as a.s --elf-stt-common=yes -o a.o |
Linker behavior
The quoted generic ABI text describes the behavior when a COMMON symbol has different sizes in relocatable objects. The output symbol gets the largest size.
Platforms differ in how the alignment is selected. GNU ld and ld.lld pick the largest alignment.
1 | as -o a.o <<< '.comm x,8,4' |
Mach-O ld64 lets the copy with the largest size decide the alignment.
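To make the difference concrete, here is a minimal sketch of the two merging policies. The struct and function names are made up for this example and do not come from any linker's source. How ld64 breaks ties between equally sized copies is not covered above, so keeping the first copy is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One COMMON declaration of a symbol, as seen in one relocatable object. */
struct common {
  uint64_t size;
  uint64_t align;
};

/* GNU ld / ld.lld style: largest size and largest alignment,
   chosen independently of each other. */
static struct common merge_elf(const struct common *c, size_t n) {
  assert(c != NULL && n > 0);
  struct common out = c[0];
  for (size_t i = 1; i < n; i++) {
    if (c[i].size > out.size)
      out.size = c[i].size;
    if (c[i].align > out.align)
      out.align = c[i].align;
  }
  return out;
}

/* ld64 style: the copy with the largest size also decides the alignment.
   On equal sizes the first copy is kept (assumption). */
static struct common merge_macho(const struct common *c, size_t n) {
  assert(c != NULL && n > 0);
  struct common out = c[0];
  for (size_t i = 1; i < n; i++)
    if (c[i].size > out.size)
      out = c[i];
  return out;
}
```

For the pair (size 8, alignment 4) and (size 4, alignment 16), merge_elf yields size 8 with alignment 16, while merge_macho yields size 8 with alignment 4.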
In ELF, the precedence is STB_GLOBAL > COMMON > STB_WEAK. The generic ABI says:
When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (that is, a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.
1 | as -o a.o <<< '.comm x,8,4' |
1 | % ld.bfd -e 0 a.o b.o |
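The precedence rule can be sketched as a comparison on a small enum. The names def_kind and should_replace are hypothetical, and the sketch only decides which definition is kept. Merging the sizes of two COMMON definitions is a separate step that is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Kinds of definitions, ordered by precedence: STB_GLOBAL > COMMON > STB_WEAK. */
enum def_kind { DEF_WEAK = 1, DEF_COMMON = 2, DEF_GLOBAL = 3 };

/* Returns true when `incoming` should replace `current` in the symbol table.
   *duplicate is set when both are non-COMMON STB_GLOBAL definitions,
   which the link editor reports as an error. */
static bool should_replace(enum def_kind current, enum def_kind incoming,
                           bool *duplicate) {
  assert(duplicate != NULL);
  *duplicate = (current == DEF_GLOBAL && incoming == DEF_GLOBAL);
  return incoming > current; /* equal kinds keep the existing definition */
}
```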
GNU ld ported a strange rule from Sun's linker in 1999-12 (see "GNU-ld behaviour does not match native linker behaviour"):
Here is a table showing when an element is pulled in from an archive with the Solaris 2.6 linker and ar program:
1 | main program\archive undefined common defined |
When a symbol is COMMON and ld sees an archive, ld checks whether the archive index provides an STB_GLOBAL definition of the symbol. If so, ld extracts the archive member as well. This is contrary to the usual rule that only an undefined symbol leads to archive member extraction.
ld.lld since 12.0.0 has this behavior (D86142) with the enabled-by-default --fortran-common option.
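As a rough sketch, the extraction decision looks like the function below. The enum, the function name, and the boolean parameters are all invented for illustration. Real linkers make this decision inside their symbol resolution code, and fortran_common merely models the ld.lld option being on or off.

```c
#include <assert.h>
#include <stdbool.h>

/* State of the symbol in the link so far. */
enum sym_state { SYM_UNDEFINED, SYM_COMMON, SYM_DEFINED };

/* index_has_global_def: the archive index provides an STB_GLOBAL
   (non-COMMON) definition of the symbol.
   fortran_common: the COMMON-triggered extraction rule is enabled. */
static bool should_extract(enum sym_state state, bool index_has_global_def,
                           bool fortran_common) {
  if (!index_has_global_def)
    return false;
  switch (state) {
  case SYM_UNDEFINED:
    return true; /* the usual rule */
  case SYM_COMMON:
    return fortran_common; /* the rule ported from Sun's linker */
  case SYM_DEFINED:
    return false;
  }
  assert(!"unreachable");
  return false;
}
```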
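Say b0.a and b1.a are mostly identical archives, but the objects in b0.a are compiled with -fcommon while the objects in b1.a are compiled with -fno-common. If a.o references a symbol in b0.a, this archive lookup behavior may cause a duplicate definition error for ld a.o b0.a b1.a: a COMMON symbol from b0.a makes ld extract the b1.a member that defines it, and the other symbols defined by both members may then collide. Without the rule, b1.a is simply shadowed by b0.a.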
1 | echo 'extern int ret; int main() { return ret; }' > a.c |
1 | # ret in b0.a(b0.o) is COMMON. b1.a(b1.o) is extracted to override the COMMON symbol with a STB_GLOBAL definition. |
What I am most concerned with is how to parallelize symbol resolution in the presence of this archive lookup rule.
GNU ld and ld.lld treat COMMON symbols as though they are in an input section named COMMON. *(COMMON) in a linker script can match these symbols.
Error-prone COMMON symbols
With -fcommon, due to the linker symbol resolution rule, a tentative definition int x; may be overridden by an STB_GLOBAL definition in another compilation unit. This is error-prone since the user may assume an initial value of zero if unaware of an int x = 1; elsewhere.
1 | gcc -c -fcommon -xc - -o a.o <<< 'int x;' |
GNU ld and ld.lld support --warn-common, which detects the error-prone overriding.
1 | % gcc -shared -fuse-ld=bfd -Wl,--warn-common a.o b.o |
Some legacy code may inadvertently rely on COMMON symbols by having something like int x; in a header file. Such code may fail to link with -fno-common because every compilation unit that includes the header then provides its own definition.