The gcc program is a compiler driver. It invokes other programs to do the work of compiling (cc1, cc1plus), assembling (GNU as), and linking (collect2). The behavior is controlled by spec strings, which are provided by a plain-text spec file.
You can run gcc -dumpspecs to dump the built-in spec file. It is complex but the main idea is construction of cc1/assembler/linker command lines. Note: the interaction with the assembler/the linker should be clear from the output.
The g++ program is another compiler driver. It additionally links against the C++ library. The two programs are otherwise equivalent.
You can specify -specs= to override built-in directives. Here is a spec file derived from musl-gcc:
This makes it easy to try out musl on a glibc-based system.
While a spec file can control some behaviors of gcc, many behaviors (target preferences) are guarded by macros that are configured at build time. It is quite common for toolchain developers to experiment with different configure options.
The Clang driver is similar to the gcc program in concept but does more. You can specify clang --target=aarch64-linux-gnu to get aarch64 defaults. The driver translates the other specified options into cc1 options. In many cases you can observe the differences between two targets by comparing their cc1 command lines. This design makes testing easy. It is recommended to test features with cc1 options and place the target-specific behavior tests under clang/test/Driver/.
The driver recognizes the file name suffix to determine the compilation pipeline.
*.c: C source code which must be preprocessed
*.cc *.cpp: C++ source code which must be preprocessed
*.h: C header file to precompile
*.hh *.hpp: C++ header file to precompile
*.i: C source code which should not be preprocessed
*.ii: C++ source code which should not be preprocessed
...
other: object file to be fed straight into linking
gcc a.c performs preprocessing/analysis/compiling/assembly generation/assembling/linking. gcc a.i skips preprocessing. g++ a.cc b.cc performs every phase before linking for each input file and does a link on all object files.
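The suffix table can be read as a lookup from file extension to the phases the driver schedules. Here is a minimal sketch of that idea in Python. The names `SUFFIX_PHASES` and `phases_for` are made up for illustration, the phase list is coarser than what a real driver tracks, and only the suffixes listed above are modeled.

```python
import os

# Hypothetical, simplified model of the suffix table above.
# A real driver knows many more suffixes and finer-grained phases.
FULL = ["preprocess", "compile", "assemble", "link"]
SUFFIX_PHASES = {
    ".c": FULL, ".cc": FULL, ".cpp": FULL,   # source that must be preprocessed
    ".i": FULL[1:], ".ii": FULL[1:],         # already preprocessed source
    ".h": ["precompile"], ".hh": ["precompile"], ".hpp": ["precompile"],
}

def phases_for(filename):
    """Return the phases scheduled for one input file."""
    ext = os.path.splitext(filename)[1]
    # Anything unrecognized is treated as an object file and goes to the linker.
    return SUFFIX_PHASES.get(ext, ["link"])

for name in ("a.c", "a.i", "a.o"):
    print(name, phases_for(name))
```

In this model `a.i` starts at the compile phase, and an unknown suffix such as `a.o` goes straight to the link.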
Some options can cause the driver/compiler to dispatch/do less work. The most common ones are:
Clang has an integrated assembler which is enabled by default for most cases. When it is enabled, clang -c and clang -S just choose the different streamers (assembly vs object file). clang -S -fno-integrated-as may behave differently because certain features may be integrated assembler only, or only supported by very new GNU as. I added -fbinutils-version= to give users a choice not to worry about old GNU as/ld.
GCC does not have an integrated assembler. -c causes GCC to additionally feed the assembly to GNU as.
Debugging
-v and -### can print the command lines. -### skips execution.
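The printed command lines are long, so it helps to break them before each include or library path option. The sketch below does that with a regular expression. The sample command line is invented for illustration and is not real driver output.

```python
import re

# Invented sample of a quoted linker command line (not real driver output).
sample = '"/usr/bin/ld" "-o" "a.out" "-L/opt/x/lib" "-L/lib/../lib64" "-lc"'

def split_search_paths(command_line):
    """Start a new line before each -i*, -I or -L option, quoted or not."""
    return re.sub(r' "?-[iIL]', lambda m: "\n" + m.group(0), command_line)

print(split_search_paths(sample))
```

Only options starting with -i, -I or -L begin a new line, so `"-lc"` and `"-o"` stay where they are.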
In GCC, the built-in include paths are computed by cc1/cc1plus (global variable include_prefixes). The rule is very complex. -v can give us the precedence.
Vanilla GCC
--enable-multiarch is the default for native builds if glibc supports it.
# Configured with --disable-bootstrap --enable-languages=c,c++ --disable-multilib.
% /tmp/opt/gcc-debug/bin/gcc --print-multiarch
x86_64-linux-gnu
% /tmp/opt/gcc-debug/bin/gcc --print-multi-os
../lib64
% /tmp/opt/gcc-debug/bin/gcc --print-multi-lib
.;
% /tmp/opt/gcc-debug/bin/gcc -fsyntax-only a.cc -v
...
#include "..." search starts here:
#include <...> search starts here:
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../include/c++/11.0.1
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../include/c++/11.0.1/x86_64-pc-linux-gnu
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../include/c++/11.0.1/backward
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include
 /usr/local/include/x86_64-linux-gnu  # affected by sysroot, multiarch, usually nonexistent
 /usr/local/include                   # affected by sysroot
 /tmp/opt/gcc-debug/include
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include-fixed
 /tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../x86_64-pc-linux-gnu/include
 /usr/include/x86_64-linux-gnu        # affected by sysroot, multiarch
 /usr/include                         # affected by sysroot
...
% /tmp/opt/gcc-debug/bin/g++ a.cc '-###' |& sed -E 's/ "?-[iIL]/\n&/g'
...
 -L/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1
 -L/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../../../lib64
 -L/lib/x86_64-linux-gnu      # affected by sysroot, multiarch
 -L/lib/../lib64              # affected by sysroot
 -L/usr/lib/x86_64-linux-gnu  # affected by sysroot, multiarch
 -L/usr/lib/../lib64          # affected by sysroot
 -L/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/../../..
 # -L$sysroot/lib if sysroot is not "" or "/"
 # -L$sysroot/usr/lib if sysroot is not "" or "/"
...
Some paths are relative to the GCC installation:
The first three search paths (include/c++) are for libstdc++. Debian patched native gcc has altered the search paths.
/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/include refers to GCC's private headers.
The others are relative to sysroot. I have annotated the lines in the output. Due to multiarch, $sysroot/usr/local/include and $sysroot/usr/include are preceded by their $multiarch counterparts. That is the main point: different architectures have separate include directories while they can share some common directories. However, the library directories cannot really be shared and the common directories just cause issues. The problem in practice is that Debian has local multiarch patches which do things differently, and the differences seem entirely unnecessary to me. Read on.
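Two observations about the listing can be checked mechanically: the `..` components collapse to the installation prefix, and each sysroot-relative include directory is preceded by its multiarch counterpart. The Python sketch below shows both. The function names are made up for illustration, and it models only the sysroot-relative include entries, not the full search order computed by cc1.

```python
import posixpath

def collapse(path):
    """Lexically collapse '..' components (symlinks are not resolved)."""
    return posixpath.normpath(path)

def sysroot_include_dirs(sysroot, multiarch):
    """Sysroot-relative include dirs, each preceded by its multiarch variant.

    Hypothetical helper: it ignores the installation-relative directories
    that appear between these entries in the real search list.
    """
    dirs = []
    for base in ("/usr/local/include", "/usr/include"):
        if multiarch:
            dirs.append(f"{sysroot}{base}/{multiarch}")
        dirs.append(f"{sysroot}{base}")
    return dirs

libstdcxx = ("/tmp/opt/gcc-debug/lib/gcc/x86_64-pc-linux-gnu/11.0.1/"
             "../../../../include/c++/11.0.1")
print(collapse(libstdcxx))
print(sysroot_include_dirs("", "x86_64-linux-gnu"))
```

The first path collapses to a directory under /tmp/opt/gcc-debug/include, which shows that the libstdc++ headers sit in the installation prefix rather than in the sysroot.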
Let's see the output of a vanilla --disable-multiarch native compiler. The sysroot directory should be clear from the output.
% aarch64-linux-gnu-g++ --print-multiarch
aarch64-linux-gnu
% aarch64-linux-gnu-g++ --print-multi-os
../lib
% aarch64-linux-gnu-g++ --print-multi-lib
.;
% aarch64-linux-gnu-g++ -fsyntax-only a.cc -v
...
ignoring nonexistent directory "/usr/lib/gcc-cross/aarch64-linux-gnu/10/include-fixed"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/aarch64-linux-gnu
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/backward
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/include
 /usr/local/include/aarch64-linux-gnu  # affected by sysroot, usually nonexistent
 /usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include  # Debian specific, g++-multiarch-incdir.diff
 /usr/include/aarch64-linux-gnu        # affected by sysroot, usually nonexistent
 /usr/include                          # affected by sysroot
...
% aarch64-linux-gnu-g++ a.cc '-###' |& sed -E 's/ "?-[iIL]/\n&/g'
...
 -L/usr/lib/gcc-cross/aarch64-linux-gnu/10
 -L/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/../lib
 -L/lib/aarch64-linux-gnu      # affected by sysroot, Debian specific
 -L/lib/../lib                 # affected by sysroot
 -L/usr/lib/aarch64-linux-gnu  # affected by sysroot, Debian specific
 -L/usr/lib/../lib             # affected by sysroot
 -L/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib
...
I hope these dumps give you some idea of what multiarch and multi-os are. multilib is for integrating -m32/-mx32 functionality into an x86-64 targeted compiler, and similar situations. This section does not touch multilib.
multi-os looks very broken to me. The ../lib64 or ../lib makes no sense.
Arch Linux
aarch64-linux-gnu-gcc --print-sysroot prints /usr/aarch64-linux-gnu. Compilers for different architectures have disjoint include paths. This can cause some redundancy.
Clang
In Clang, the include paths are computed by the driver.
You can specify --target= to ask for cross compiling. Clang will happily detect system GCC installations and add appropriate include and library paths.
Note that Clang before 13.0.0 incorrectly assumes that cross gcc follows the Debian native gcc behavior.
Note that "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../include/aarch64-linux-gnu/c++/10" refers to a nonexistent directory, so compiling a file that includes C++ headers fails with an error like this:
/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../include/c++/10/iostream:38:10: fatal error: 'bits/c++config.h' file not found
#include <bits/c++config.h>
         ^~~~~~~~~~~~~~~~~~
I have fixed the problem in 13.0.0 and cleaned up unneeded search paths. My guideline is to make Clang able to pick up both vanilla and Debian gcc libstdc++/start files.
"-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10" "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/aarch64-linux-gnu" "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include/c++/10/backward" "-internal-isystem" "/usr/local/include" "-internal-isystem" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/include" "-internal-isystem" "/tmp/RelA/lib/clang/13.0.0/include" "-internal-externc-isystem" "/include" "-internal-externc-isystem" "/usr/include" ... "/usr/bin/aarch64-linux-gnu-ld" "-EL" "--eh-frame-hdr" "-m" "aarch64linux" "-dynamic-linker" "/lib/ld-linux-aarch64.so.1" "-o" "a.out" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/crt1.o" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib/crti.o" "/usr/lib/gcc-cross/aarch64-linux-gnu/10/crtbegin.o" "-L/usr/lib/gcc-cross/aarch64-linux-gnu/10" "-L/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../lib64" "-L/lib/aarch64-linux-gnu" # affected by sysroot, multiarch "-L/lib/../lib64" # affected by sysroot "-L/usr/lib/aarch64-linux-gnu" # affected by sysroot, multiarch "-L/usr/lib/../lib64" # affected by sysroot "-L/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../../../aarch64-linux-gnu/lib" "-L/usr/lib/gcc-cross/aarch64-linux-gnu/10/../../.." "-L/tmp/RelA/bin/../lib" # So that -lc++ -lc++abi can pick up libc++/libc++abi built together with clang "-L/lib" # affected by sysroot, always added (unlike gcc) "-L/usr/lib" # affected by sysroot, always added (unlike gcc)
In Clang, --sysroot= additionally changes where Clang detects GCC installations ($sysroot and $sysroot/usr). So the include/library paths for libstdc++/crtbegin/crtend will change as well.
You may specify --gcc-toolchain= to override the prefix used to detect GCC installations.
Link phase
If you link a program with a compiler driver (clang/gcc) in a standard way (not -nostdlib), the following components are usually on the linker command line.
crt1.o (glibc/musl): -no-pie/-pie/-static-pie
  crt1.o: -no-pie
  Scrt1.o: -pie
  rcrt1.o: -static-pie
  gcrt1.o: -pg
crti.o (glibc/musl)
crtbegin.o
  crtbegin.o: -no-pie
  crtbeginS.o: -pie, -static-pie, -shared
  crtbeginT.o: -static
user input
-lstdc++
Some combination of -lc -lgcc_s -lgcc -lgcc_eh
crtend.o
  crtend.o: -no-pie
  crtendS.o: -pie, -static-pie, -shared
crtn.o (glibc/musl)
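To make the ordering concrete, here is a rough Python sketch that assembles the startup and end files around the user input for three common modes. It is modeled on GCC's GNU/Linux defaults as I understand them, covers only -no-pie, -pie and -shared, and the `-lc -lgcc` tail is a placeholder assumption for "some combination" of the runtime libraries. The function name `link_line` is made up.

```python
def link_line(mode, inputs, cxx=False):
    """Return the ordered linker inputs for mode in {"no-pie", "pie", "shared"}.

    Hypothetical model: static and static-pie links are not covered.
    """
    # crt1.o variants are only used by executables, so -shared gets none.
    start = {"no-pie": ["crt1.o"], "pie": ["Scrt1.o"], "shared": []}[mode]
    # Position-independent outputs use the S variants of crtbegin/crtend.
    suffix = "" if mode == "no-pie" else "S"
    # Placeholder for "some combination of -lc -lgcc_s -lgcc -lgcc_eh".
    libs = (["-lstdc++"] if cxx else []) + ["-lc", "-lgcc"]
    return (start + ["crti.o", f"crtbegin{suffix}.o"] + list(inputs) + libs
            + [f"crtend{suffix}.o", "crtn.o"])

print(" ".join(link_line("pie", ["a.o"], cxx=True)))
```

crtn.o comes last because it closes the _init and _fini fragments that crti.o opens.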
crt1.o
This file is only used by executables.
In glibc, the file is -r linked from csu/start.c csu/abi-note.c csu/init.c csu/static-reloc.c. It used to call __libc_start_main with arguments main, __libc_csu_init, __libc_csu_fini (defined by libc_nonshared.a(elf-init.oS)). From BZ #23323 onwards, on most architectures, start.S:_start calls __libc_start_main with two zero arguments instead, and __libc_csu_init and __libc_csu_fini are moved into csu/libc-start.c.
In musl, this file calls __libc_start_main with main, _init, and _fini.
crti.o/crtn.o
crti.o defines _init in the .init section and _fini in the .fini section. The defined _init (_fini) is a fragment which is expected to be concatenated with other files and finally crtn.o to get the full definition.
In glibc x86-64, sysdeps/x86_64/crti.S and sysdeps/x86_64/crtn.S provide the definitions:
crti.o calls __gmon_start__ (gmon profiling system) if defined. This is used by gcc -pg.
The linker defines DT_INIT if _init (default value for -init) is defined, and DT_FINI if _fini is defined.
The section fragment idea is fragile. On RISC-V, DT_INIT/_init is not used. crti.o and crtn.o have no code/.init/.fini.
crtbegin.o/crtend.o
libgcc/crtstuff.c
If __LIBGCC_INIT_ARRAY_SECTION_ASM_OP__ is not defined and __LIBGCC_INIT_SECTION_ASM_OP__ is defined (HAVE_INITFINI_ARRAY_SUPPORT is 0 in $builddir/gcc/auto-host.h):
crtend.o defines a .init section which calls __do_global_ctors_aux. __do_global_ctors_aux calls the static constructors in the .ctors section.
crtbegin.o defines a .fini section which calls __do_global_dtors_aux. __do_global_dtors_aux calls the static destructors in the .dtors section.
crtbegin.o defines .ctors and .dtors with a single -1 value.
crtend.o defines .ctors and .dtors with a single 0 value.
On modern distributions, __LIBGCC_INIT_ARRAY_SECTION_ASM_OP__ is 0 and crtend.o contains no .text/.ctors/.dtors.
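The legacy .ctors scheme is easier to follow as a small simulation. In the Python sketch below the linked .ctors section is a list: the -1 sentinel from crtbegin.o comes first, then the entries contributed by user objects, then the 0 terminator from crtend.o. It assumes, as I read libgcc/crtstuff.c, that __do_global_ctors_aux starts just before the terminator and walks backwards until it meets the sentinel. The Python names are made up for illustration.

```python
CTOR_BEGIN = -1  # sentinel contributed by crtbegin.o
CTOR_END = 0     # terminator contributed by crtend.o

def link_ctors(user_entries):
    """Model the concatenated .ctors section after linking."""
    return [CTOR_BEGIN, *user_entries, CTOR_END]

def do_global_ctors_aux(ctors):
    """Walk backwards from the entry before the terminator to the sentinel."""
    i = len(ctors) - 2
    while ctors[i] != CTOR_BEGIN:
        ctors[i]()  # call one static constructor
        i -= 1

order = []
section = link_ctors([lambda: order.append("a"), lambda: order.append("b")])
do_global_ctors_aux(section)
print(order)
```

Under that assumption the constructors run in reverse section order, so the entry placed last in .ctors runs first.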
glibc startup sequence
Below the control flows are flattened.
Dynamically linked executable
In rtld:
sysdeps/x86_64/dl-machine.h:_start
elf/rtld.c:_dl_start
sysdeps/x86_64/dl-machine.h:_dl_start_user
elf/dl-init.c:_dl_init
Jump to the main executable e_entry
In the main executable:
sysdeps/x86_64/start.S:_start
csu/libc-start.c:__libc_start_main, the SHARED branch
(if ELF_INITFINI is defined) Run DT_INIT
Run DT_INIT_ARRAY
Run main
Run exit
stdlib/exit.c:__run_exit_handlers
Statically linked executable
In the main executable:
sysdeps/x86_64/start.S:_start
csu/libc-start.c:__libc_start_main, the !SHARED branch
_dl_relocate_static_pie
ARCH_SETUP_IREL
ARCH_SETUP_TLS
csu/libc-start.c:call_init
Run [__preinit_array_start, __preinit_array_end)
(if ELF_INITFINI is defined) Run _init
Run [__init_array_start, __init_array_end)
Run main
Run exit
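The order of the initialization steps for a statically linked executable can be sketched as a short Python simulation. This is only a model of the flattened sequence above, with made-up names: relocation, IRELATIVE and TLS setup are left out, and exit is not modeled.

```python
def run_static_startup(preinit_array, init, init_array, main, elf_initfini=True):
    """Simulate the call_init ordering followed by main."""
    for fn in preinit_array:        # [__preinit_array_start, __preinit_array_end)
        fn()
    if elf_initfini and init is not None:
        init()                      # _init, only if ELF_INITFINI is defined
    for fn in init_array:           # [__init_array_start, __init_array_end)
        fn()
    return main()

log = []
status = run_static_startup(
    preinit_array=[lambda: log.append("preinit")],
    init=lambda: log.append("_init"),
    init_array=[lambda: log.append("ctor1"), lambda: log.append("ctor2")],
    main=lambda: log.append("main") or 0,
)
print(log, status)
```

Passing `elf_initfini=False` skips the `_init` step, which corresponds to targets where ELF_INITFINI is not defined.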
musl startup sequence
For a dynamically linked executable, the rtld process:
__libc_start_init has different behaviors for dynamically and statically linked executables. For a dynamically linked executable: it runs DT_INIT (unless NO_LEGACY_INITFINI) then DT_INIT_ARRAY. Note: libc.so has a dummy _init.