Linker notes on AArch32
2023-04-23 15:00:00 Author: maskray.me

UNDER CONSTRUCTION

This article describes target-specific details about AArch32 in ELF linkers. I described AArch64 in a previous article.

AArch32 is the 32-bit execution state of the Arm architecture and runs the A32 and T32 instruction sets. A32 refers to the old ISA with a fixed 32-bit instruction width, while T32 refers to the mixed 16-bit and 32-bit Thumb-2 instructions.

"AArch32", "A32", and "T32" are new names. Many projects use "ARM", "Arm", or "arm" as their port name.

ABI documents

Global Offset Table

The Global Offset Table consists of two sections:

  • .got.plt holds code addresses for PLT.
  • .got holds other addresses and offsets.

The symbol _GLOBAL_OFFSET_TABLE_ is defined at the beginning of the .got section. GNU ld reserves a single entry for .got, and .got[0] holds the link-time address of _DYNAMIC for a legacy reason: versions of glibc prior to 2.35 require it. See All about Global Offset Table.

Procedure Linkage Table

The PLT header looks like:

L1: str lr, [sp, #-4]!
add lr, pc, #0x0NN00000 &(.got.plt - L1 - 4)
add lr, lr, #0x000NN000 &(.got.plt - L1 - 4)
ldr pc, [lr, #0x00000NNN] &(.got.plt -L1 - 4)

If .got.plt - .plt - 4 exceeds the ±128MiB range, a long-form PLT header is needed.
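
The three N fields in the header above can be read as a split of the displacement into an 8-bit, an 8-bit, and a 12-bit slice. A minimal Python sketch of that split follows. The helper name split_plt_offset is hypothetical, the sketch only checks that the value fits the three fields (a real linker may apply a stricter range check), and the rotated-immediate instruction encoding is not modeled.

```python
def split_plt_offset(offset):
    """Split a displacement into the 0x0NN00000, 0x000NN000 and 0x00000NNN parts."""
    # Assumption: the displacement is non-negative and fits the three fields
    # (28 bits in total). The range check of an actual linker may be stricter.
    if not 0 <= offset < (1 << 28):
        raise ValueError("displacement does not fit the short PLT header")
    hi = offset & 0x0FF00000   # immediate of the first ADD
    mid = offset & 0x000FF000  # immediate of the second ADD
    lo = offset & 0x00000FFF   # offset folded into the final LDR
    return hi, mid, lo


hi, mid, lo = split_plt_offset(0x01234567)
print(hex(hi), hex(mid), hex(lo))
```

Adding the three parts back together yields the original displacement, which is what the add/add/ldr sequence relies on.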

Cortex-M Security Extensions

--cmse-implib

This option is for linker support for the Cortex-M Security Extensions (CMSE). It does two jobs:

  • synthesize secure gateway veneers
  • write a CMSE import library when --out-implib= is specified

If a non-local symbol __acle_se_$sym is present, the linker reports an error if $sym is not defined. Otherwise, $sym is considered a secure gateway veneer. Both __acle_se_$sym and $sym must be non-absolute functions with an odd st_value.

If the addresses of __acle_se_$sym and $sym are not equal, the linker considers that there is an inline secure gateway and doesn't do anything special; otherwise the linker synthesizes a secure gateway veneer in a special section .gnu.sgstubs with the following logic.

The linker allocates an input section in .gnu.sgstubs and defines $sym relative to it. In the output file, $sym is moved to .gnu.sgstubs, a different text section.

<.gnu.sgstubs>:
...
$sym:
sg
b.w __acle_se_$sym

If --in-implib is specified and the library defines $sym (say the address is $addr), in the output $sym has a fixed address of $addr. Otherwise, the linker assigns an address (larger than all synthesized secure gateway veneers with fixed addresses).
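
The rules above can be sketched in Python. This is an illustration under stated assumptions, not the logic of any particular linker: plan_veneers, its dictionary inputs, and the stubs_base parameter are hypothetical, veneer addresses are treated as plain addresses without the Thumb bit, and each veneer is taken to be 8 bytes (a 4-byte sg followed by a 4-byte b.w).

```python
ACLE_PREFIX = "__acle_se_"
VENEER_SIZE = 8  # sg (4 bytes) + b.w (4 bytes)


def plan_veneers(symbols, implib, stubs_base):
    """Return {sym: veneer address} for entries that need a synthesized veneer.

    symbols: name -> st_value of defined non-local function symbols (hypothetical input)
    implib:  name -> address taken from the --in-implib library (may be empty)
    stubs_base: hypothetical start address of .gnu.sgstubs
    """
    needed = []
    for name, value in symbols.items():
        if not name.startswith(ACLE_PREFIX):
            continue
        sym = name[len(ACLE_PREFIX):]
        if sym not in symbols:
            raise ValueError(f"{name} is present but {sym} is not defined")
        if value % 2 == 0 or symbols[sym] % 2 == 0:
            raise ValueError(f"{sym}: st_value must be odd")
        if symbols[sym] != value:
            continue  # different addresses: inline secure gateway, nothing to do
        needed.append(sym)

    # Veneers named by the input import library keep their fixed address.
    result = {s: implib[s] for s in needed if s in implib}
    # The remaining veneers are placed above every fixed-address veneer.
    next_addr = max([stubs_base] + [a + VENEER_SIZE for a in result.values()])
    for s in needed:
        if s not in result:
            result[s] = next_addr
            next_addr += VENEER_SIZE
    return result


symbols = {
    "__acle_se_foo": 0x1001, "foo": 0x1001,  # needs a veneer, fixed by implib
    "__acle_se_bar": 0x2001, "bar": 0x2001,  # needs a veneer, linker-assigned
    "__acle_se_baz": 0x3001, "baz": 0x3005,  # inline secure gateway
}
print(plan_veneers(symbols, {"foo": 0x8010}, stubs_base=0x8000))
```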

--out-implib=out.lib

Used with --cmse-implib. Write the CMSE import library to out.lib.

out.lib will have 3 sections: .symtab, .strtab, .shstrtab. For every synthesized Secure Gateway veneer, write a SHN_ABS symbol whose address is $addr (if specified by the --in-implib library) or the linker-assigned address. (The CMSE import library does not contain text sections, so a defined symbol has to use SHN_ABS.)
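
As an illustration of what such a symbol looks like on disk, here is a small Python sketch that packs one little-endian Elf32_Sym entry with st_shndx set to SHN_ABS. The helper name pack_implib_symbol is hypothetical, and the choice of global binding, function type, and a size of 0 by default are assumptions made for the example rather than a statement of what any linker emits.

```python
import struct

SHN_ABS = 0xFFF1   # section index meaning "absolute value"
STB_GLOBAL = 1     # assumption: the veneer symbol is exported as global
STT_FUNC = 2


def pack_implib_symbol(name_offset, address, size=0):
    """Pack one little-endian Elf32_Sym whose value is an absolute address."""
    st_info = (STB_GLOBAL << 4) | STT_FUNC
    # Elf32_Sym layout: st_name, st_value, st_size, st_info, st_other, st_shndx
    return struct.pack("<IIIBBH", name_offset, address, size, st_info, 0, SHN_ABS)


entry = pack_implib_symbol(name_offset=1, address=0x10000001)
print(len(entry), entry.hex())
```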

Thread Local Storage

AArch32 uses a variant of TLS Variant I: the static TLS blocks are placed above the thread pointer. The thread pointer points to the end of the thread control block.
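
To make "above the thread pointer" concrete, the following Python sketch computes the offset of a TLS variable from the thread pointer in the executable's TLS block. It assumes that two 32-bit words are reserved directly above the thread pointer and that the TLS segment start is aligned to its own alignment; neither detail is stated in this article, and tp_offset is a hypothetical helper.

```python
RESERVED_ABOVE_TP = 8  # assumption: two 32-bit words sit at the thread pointer


def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & -alignment


def tp_offset(offset_in_tls_segment, tls_align):
    """Offset from the thread pointer to a variable in the main TLS block."""
    # The TLS block starts after the reserved words, rounded up to its alignment.
    return align_up(RESERVED_ABOVE_TP, tls_align) + offset_in_tls_segment


print(tp_offset(0, 4), tp_offset(4, 16), tp_offset(0, 64))
```

All results are positive, which is the defining property of Variant I: the variables live at increasing addresses above the thread pointer.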

The linker doesn't perform TLS optimization.

The traditional general dynamic and local dynamic TLS models are used by default. A TLSDESC ABI exists.

See All about thread-local storage.

Thunks

A destination not reachable by the branch instruction needs a range extension thunk.
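
A rough Python sketch of the reachability test follows. It uses the architectural ranges of the branch encodings (±32MiB for A32 B/BL, ±16MiB for 4-byte Thumb branches with the J1/J2 encoding, ±4MiB without it, ±1MiB for conditional B.W). The function names are hypothetical and the PC bias and addend are ignored for simplicity, so this is an approximation rather than an exact linker check.

```python
# Width in bits of the signed byte offset each relocation can encode.
BRANCH_OFFSET_BITS = {
    "R_ARM_PC24": 26, "R_ARM_PLT32": 26, "R_ARM_JUMP24": 26, "R_ARM_CALL": 26,
    "R_ARM_THM_JUMP19": 21,
    "R_ARM_THM_JUMP24": 25, "R_ARM_THM_CALL": 25,
}


def fits_signed(value, bits):
    """True if value is representable as a signed integer of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def needs_range_thunk(reloc_type, place, target, has_j1j2=True):
    """Approximate check: is target out of reach of the branch at place?"""
    bits = BRANCH_OFFSET_BITS[reloc_type]
    if not has_j1j2 and reloc_type in ("R_ARM_THM_JUMP24", "R_ARM_THM_CALL"):
        bits = 23  # Thumb BL without the J1/J2 range extension
    # Simplification: the PC bias (+8 in ARM state, +4 in Thumb state) is ignored.
    return not fits_signed(target - place, bits)


print(needs_range_thunk("R_ARM_CALL", 0x10000, 0x10000 + (1 << 25)))
print(needs_range_thunk("R_ARM_THM_CALL", 0x10000, 0x10000 + (1 << 22)))
```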

v4, v4T

ARMv4T introduced the 16-bit Thumb instruction set. These processors do not support BLX. ARM/Thumb state change must be done using BX.

In the ARM state, the relocation types R_ARM_PC24/R_ARM_PLT32/R_ARM_JUMP24/R_ARM_CALL may need range extension or state change. lld 16.0.0 added thunk support for Thumb. We can build Game Boy Advance or Nintendo DS ROMs using Thumb code with ld.lld.

// ARM to ARM, absolute
ldr pc, [pc, #-4]
L1: .word S

// ARM to Thumb, absolute
ldr r12, [pc] ; L1
bx r12
L1: .word S

// ARM to ARM, position-independent
ldr ip, [pc]
L1: add pc, pc, ip
.word S - (L1 + 8)

// ARM to Thumb, position-independent
ldr ip, [pc, #4]
L1: add ip, pc, ip
bx ip
.word S - (L1 + 8)
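
The "+ 8" in the position-independent sequences comes from the fact that, in the ARM state, reading pc yields the address of the current instruction plus 8. A small Python sketch that checks the arithmetic with 32-bit wraparound (the function names are hypothetical):

```python
MASK32 = 0xFFFFFFFF


def pi_thunk_word(target, add_insn_addr, pc_bias=8):
    """Value of the literal `.word S - (L1 + 8)` for a given target S."""
    return (target - (add_insn_addr + pc_bias)) & MASK32


def run_add_pc(add_insn_addr, word, pc_bias=8):
    """Emulate `add ip, pc, ip` at L1, where pc reads as L1 + pc_bias."""
    return (add_insn_addr + pc_bias + word) & MASK32


L1 = 0x02000004
for S in (0x08000100, 0x00001000):  # a forward and a backward destination
    word = pi_thunk_word(S, L1)
    print(hex(word), hex(run_add_pc(L1, word)))
```

Whatever address the thunk is loaded at, the add reproduces S as long as the thunk and the destination move together.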

R_ARM_THM_CALL

// Thumb to ARM, absolute
bx pc
b #-6
ldr pc, [pc, #-4]
L1: .word S

// Thumb to Thumb, absolute
bx pc
b #-6
ldr ip, [pc]
bx ip
L1: .word S

// Thumb to ARM, position-independent
bx pc
b #-6
ldr ip, [pc]
L1: add pc, pc, ip
L2: .word S - (L1 + 8)

// Thumb to Thumb, position-independent
bx pc
b #-6
ldr ip, [pc, #4]
L1: add ip, pc, ip
bx ip
L2: .word S - (L1 + 8)

v5, v6, v6-K, v6-KZ

These are pre-Cortex processors that support BLX, but not Thumb branch range extension or MOVT/MOVW.

There is no Thumb branch instruction in ARMv5 that supports thunks. LDR can switch processor states.

// absolute (LDR can switch processor states)
ldr pc, [pc, #-4]
.word S

// position-independent
ldr ip, [pc, #4]
L1: add ip, pc, ip
bx ip
.word S - (L1 + 8)

v6-M

Only Thumb instructions are supported. These processors support BLX and J1 J2 encodings (branch range extension), but not MOVT/MOVW.

// near
b.w S

// far, absolute
push {r0, r1}
ldr r0, [pc, #4]
str r0, [sp, #4]
pop {r0, pc}
.word S

// far, position-independent
push {r0}
ldr r0, [pc, #8]; L2
mov ip, r0
pop {r0}
L1: add pc, ip
nop
L2: .word S - (L1 + 4)

v6T2, v7 (v7-A, v7-R, v7-M, v7E-M) and newer

ARMv6T2 introduced Thumb-2. These processors support BLX/MOVT/MOVW and Thumb branch range extension. (All architectures used in Cortex processors with the exception of v6-M and v6S-M have the MOVT/MOVW instructions.)

In the ARM state, the relocation types R_ARM_PC24, R_ARM_PLT32, R_ARM_JUMP24, and R_ARM_CALL may need a thunk for range extension or state change.

// near and the destination is in the ARM state
b S

// absolute
movw ip, :lower16:S
movt ip, :upper16:S
bx ip

// position-independent
movw ip, :lower16:S-(L1+8)
movt ip, :upper16:S-(L1+8)
L1: add ip, ip, pc
bx ip
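
The :lower16: and :upper16: operators split a 32-bit value into the two halves loaded by movw and movt. A short Python sketch of the position-independent case, with hypothetical function names and 32-bit wraparound for backward destinations:

```python
MASK32 = 0xFFFFFFFF


def movw_movt_imm(value):
    """Return the (:lower16:, :upper16:) halves of a 32-bit value."""
    value &= MASK32
    return value & 0xFFFF, value >> 16


def pi_thunk_destination(target, add_insn_addr, pc_bias=8):
    """Emulate movw/movt/add for S - (L1 + pc_bias); pc_bias is 8 in ARM state."""
    lower, upper = movw_movt_imm(target - (add_insn_addr + pc_bias))
    ip = (upper << 16) | lower                      # movw then movt
    return (ip + add_insn_addr + pc_bias) & MASK32  # add ip, ip, pc


print([hex(h) for h in movw_movt_imm(0x1000 - (0x20000 + 8))])
print(hex(pi_thunk_destination(0x1000, 0x20000)))
```

Passing pc_bias=4 models the Thumb-state variant of the same sequence.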

In the Thumb state, the relocation types R_ARM_THM_JUMP19, R_ARM_THM_JUMP24, and R_ARM_THM_CALL may need a thunk for range extension or state change.

// near and the destination is in the Thumb state
b.w S

// absolute
movw ip, :lower16:S
movt ip, :upper16:S
bx ip

// position-independent
movw ip, :lower16:S-(L1+4)
movt ip, :upper16:S-(L1+4)
L1: add ip, ip, pc
bx ip
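
Whether a state change alone forces a thunk depends on the relocation type. The Python sketch below encodes one common rule of thumb: a BL can be rewritten to BLX when the processor has BLX, while a plain B cannot change state. It is a simplified illustration with a hypothetical function name; PLT entries and range checks are left out.

```python
ARM_JUMPS = {"R_ARM_PC24", "R_ARM_PLT32", "R_ARM_JUMP24"}
THUMB_JUMPS = {"R_ARM_THM_JUMP19", "R_ARM_THM_JUMP24"}


def needs_state_change_thunk(reloc_type, target_is_thumb, has_blx=True):
    """Simplified: does the branch need a thunk only because of ARM/Thumb state?"""
    if reloc_type in ARM_JUMPS:
        return target_is_thumb                      # B cannot switch to Thumb
    if reloc_type == "R_ARM_CALL":
        return target_is_thumb and not has_blx      # BL becomes BLX if available
    if reloc_type in THUMB_JUMPS:
        return not target_is_thumb                  # B.W cannot switch to ARM
    if reloc_type == "R_ARM_THM_CALL":
        return not target_is_thumb and not has_blx  # BL becomes BLX if available
    raise ValueError(f"not a branch relocation: {reloc_type}")


print(needs_state_change_thunk("R_ARM_JUMP24", target_is_thumb=True))
print(needs_state_change_thunk("R_ARM_THM_CALL", target_is_thumb=False))
print(needs_state_change_thunk("R_ARM_THM_CALL", target_is_thumb=False, has_blx=False))
```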

--fix-cortex-a8

This option enables a linker workaround for Arm Cortex-A8 erratum 657417. Linkers scan for a 4-byte Thumb-2 branch instruction (Bcc.w, B.w, BLX.w, BL.w) that spans two 4KiB regions, whose target address falls within the first region, and that follows a 4-byte non-branch instruction. On an affected processor, this sequence may result in an incorrect instruction fetch or processor deadlock.

Once an erratum condition is detected, linkers try to rewrite it into an alternative code sequence. See the comments in the implementations for details.
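
The three conditions can be expressed as a short predicate. This Python sketch is only an illustration of the pattern described above: the function name and the boolean input describing the preceding instruction are hypothetical, and instruction decoding is not modeled.

```python
PAGE = 0x1000  # 4KiB


def matches_657417_pattern(branch_addr, target, prev_is_32bit_non_branch):
    """Check the address pattern for a 4-byte Thumb-2 branch at branch_addr."""
    # The branch spans two regions when its first halfword is the last
    # halfword of a 4KiB region.
    spans_two_regions = (branch_addr & (PAGE - 1)) == PAGE - 2
    first_region = branch_addr & ~(PAGE - 1)
    target_in_first_region = first_region <= target < first_region + PAGE
    return spans_two_regions and target_in_first_region and prev_is_32bit_non_branch


print(matches_657417_pattern(0x1FFE, 0x1800, True))   # all conditions hold
print(matches_657417_pattern(0x1FFE, 0x2800, True))   # target in the second region
print(matches_657417_pattern(0x1FFC, 0x1800, True))   # branch does not span regions
```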

SHT_ARM_ATTRIBUTES

Usually named .ARM.attributes.

SHT_ARM_EXIDX (.ARM.exidx) and .ARM.extab

See Stack unwinding


Source: https://maskray.me/blog/2023-04-23-linker-notes-on-aarch32