DJI - The ART of obfuscation
2024-02-06 07:00:00 Author: blog.quarkslab.com

Study of an Android runtime (ART) hijacking mechanism for bytecode injection through a step-by-step analysis of the packer used to protect the DJI Pilot Android application.

Introduction

In the world of Android applications, it's not uncommon to come across applications protected by a packer. The role of a packer is to protect all or part of the application code from static analysis. There are many reasons why a developer might want to protect an application:

  • Protect valuable business logic;
  • Protect application monetization logic (e.g. a license management mechanism);
  • Evade conventional analysis tools to hide malicious logic;
  • ...

Here, we take a look at the DJI Pilot application, not to understand why developers want to protect their code - this has already been the subject of previous work (see in particular this DJI Pilot analysis) - but to highlight a runtime mechanism implemented by DJI to protect its application code. This protection is based on the use of a modified version of the SecNeo packer.

The article details the various stages in the analysis to understand how the application code is obfuscated. A Python proof-of-concept named DxFx for statically unpacking the DJI Pilot application is provided as practical support for this article. DxFx does not claim to be a SecNeo unpacker. Its sole aim is to improve the reader's understanding of the various mechanisms implemented by the packer through Python code. It will not be maintained in the future.

Targeted application

The analysis is performed on the latest version of the DJI Pilot application:

  • Version: 2.5.1.17
  • SHA256: 642aa123437c259eea5895fe01dc4210c4a3a430842b79612074d88745f54714
  • Download link

DxFx provided in support of the article has also been tested on the following versions of the DJI Pilot application:

  • Version: 2.5.1.15
  • SHA256: d6f96f049bc92b01c4782e27ed94a55ab232717c7defc4c14c1059e4fa5254c8

and

  • Version: 2.5.1.10
  • SHA256: 860d9d75dc2b2e9426f811589b624b96000fea07cc981b15005686d3c55251d9

Bytecode, where are you?

Primary analysis

Static analysis of the APK initially reveals that the result of bytecode decompilation is, to say the least, uncluttered...

Decompiled tree

This is because, like other packers, SecNeo leaves only a bootstrap code in the bytecode to launch the application's unpacking phase. Here, the packer bootstrap code loads the native libDexHelper.so library:

Decompiled tree

The first step in the analysis is therefore to find the bytecode containing the application's business logic.

The packer logic is present in the native library libDexHelper.so. However, the code of this library is itself packed. So, we have to unpack... the packer to analyze its logic.

As the aim of this article is not to understand how the packer itself is protected, this part is not dealt with in-depth, and we simply dump the library at runtime from the DJI Pilot application process memory space. There are a multitude of ways to do this, using tools such as gdb or Frida.

However, you may be in for a few surprises:

Cannot attach to process 25562: Operation not permitted (1), process 25562 is already traced by process 25598

or:

Failed to attach: process not found

The packer contains some countermeasures, as partially described in this issue, to prevent the use of dynamic tools. Fortunately, these can be easily bypassed.

Once libDexHelper.so has been dumped from memory, it can be analyzed with a disassembly tool.

First look at the packer binary

An initial brief analysis of the libDexHelper.so library reveals the presence of the decrypt_jar_128K symbol. Hooking the associated function with Frida shows that the buffer passed as input contains a DEX file once the function returns:

'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
  const decrypt_jar_128K_addr = Module.getExportByName(
    'libDexHelper.so', 'decrypt_jar_128K'
  );

  /**
  * decrypt_jar_128K function hook
  */
  Interceptor.attach(decrypt_jar_128K_addr, {
    onEnter: function(args) {
      this.dex_buffer_ptr = args[1];
    },
    onLeave: function() {
      console.log(`\nReading dex buffer @ ${this.dex_buffer_ptr}`);
      console.log(this.dex_buffer_ptr.readByteArray(16));
    }
  });
}

/**
 * Bootstrap
 */
const boot_intercept = Interceptor.attach(dlopen_ext, {
  onEnter: function(args) {
    this.name = args[0].readUtf8String();
  },
  onLeave: function() {
    // Guard against a missing library path (dlopen can be called with NULL)
    if (this.name && this.name.includes('libDexHelper.so')) {
      main();
      boot_intercept.detach();
    }
  }
});

The result of the script is:

Reading dex buffer @ 0x74d1e63140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 4a 8b b5 fd 1b 58 54 1f  dex.035.J....XT.

Reading dex buffer @ 0x74d268c140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 6f 02 2a 0b 48 26 a5 e0  dex.035.o.*.H&..

Reading dex buffer @ 0x74d3005140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 8a b4 08 1c 90 61 5a 34  dex.035......aZ4

Reading dex buffer @ 0x74d3643140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 cb b9 8e 72 35 3a d8 bc  dex.035....r5:..

Reading dex buffer @ 0x74d4055140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 c2 8b a3 7b 64 3b c6 54  dex.035....{d;.T

Reading dex buffer @ 0x74d4a5f140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 dd 47 c2 4e a1 39 cc 79  dex.035..G.N.9.y

Reading dex buffer @ 0x74d552f140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 58 17 ae a9 56 21 f1 1f  dex.035.X...V!..

Reading dex buffer @ 0x74d5a77140
           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  64 65 78 0a 30 33 35 00 84 62 14 0d ac 5f b7 f8  dex.035..b..._..

So, here we can see that 8 DEX files (with the dex.035 magic) are unpacked. It is possible to modify the previous hook to be able to dump the various DEX files as they are unpacked. Another solution is to understand where the packed DEX files are stored in the APK and how we can unpack them statically.
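
When dumping the buffers seen by the hook, a quick sanity check helps confirm that each one is a complete DEX file. The following is a minimal sketch and not part of DxFx: it relies only on the public DEX header layout (8-byte magic, Adler-32 checksum at offset 8, file_size at offset 0x20), and the helper name looks_like_dex is ours. A failed checksum does not necessarily mean a bad dump, since nothing guarantees that the packer keeps the header consistent.

```python
import struct
import zlib

DEX_MAGIC = b"dex\n035\x00"
DEX_HEADER_SIZE = 0x70  # size of header_item in the DEX 035 format


def looks_like_dex(buf: bytes) -> bool:
    """Return True if buf starts with a self-consistent DEX 035 header."""
    if len(buf) < DEX_HEADER_SIZE or buf[:8] != DEX_MAGIC:
        return False
    (checksum,) = struct.unpack_from("<I", buf, 8)
    (file_size,) = struct.unpack_from("<I", buf, 0x20)
    if file_size < DEX_HEADER_SIZE or file_size > len(buf):
        return False
    # The Adler-32 checksum covers everything after the magic and the
    # checksum field itself, up to file_size.
    return zlib.adler32(buf[12:file_size]) == checksum
```

The same check can be applied later to the files produced by a static unpacker, to compare both approaches.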

Static unpacking of DEX files

The advantage of the dynamic extraction method lies in its rapid implementation. However, it requires running the application and setting up an environment that allows the process to be instrumented. Static extraction, on the other hand, enables cold unpacking of DEX files directly from the APK. The drawback of the static approach is that it requires a slightly deeper understanding of how the packer works.

DEX files, where are you?

Some versions of the SecNeo packer store the bytecode in the classes0.jar file located in the APK assets. Unfortunately, this is not the case here as the file does not exist.

However, if we take a closer look at the classes.dex file located at the root of the APK and supposed to contain only the packer bootstrap code, we can see that something is wrong with its size:

du -h classes.dex
63M     classes.dex

63MB is a very large size for the code we observed in the first analysis. Usually, the multidex mechanism will split the bytecode file into several .dex files well before reaching this size. File entropy analysis also gives us some interesting clues:

Entropy of classes.dex

We can see 8 peaks tending towards an entropy of 8 bits per byte, which suggests that these chunks are encrypted. The previous Frida hook revealed that 8 DEX files were unpacked, which is probably no coincidence. The 8 chunks shown in the graph are each 128KB long, which ties in with the name of the decrypt_jar_128K symbol. A differential analysis with the dynamically obtained files finally confirms that the classes.dex file contains all 8 DEX files after the SecNeo bootstrap code. The first 128KB chunk of each DEX file is encrypted, probably to conceal information that could reveal the presence of the hidden files, such as the magic number in the header.
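
The entropy observation can be reproduced with a few lines of Python. This is a rough sketch, not the tool used to draw the graph: it scans the file on a fixed 128KB grid, which is an assumption (the embedded DEX files are not necessarily aligned on that grid), and the 7.9 threshold is an arbitrary value chosen for illustration.

```python
import math
from collections import Counter

CHUNK = 128 * 1024  # chunk size suggested by the decrypt_jar_128K symbol


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of data, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(data).values()
    )


def high_entropy_chunks(blob: bytes, threshold: float = 7.9, chunk: int = CHUNK):
    """Yield (offset, entropy) for each chunk whose entropy reaches threshold."""
    for offset in range(0, len(blob), chunk):
        entropy = shannon_entropy(blob[offset:offset + chunk])
        if entropy >= threshold:
            yield offset, entropy
```

Calling list(high_entropy_chunks(open("classes.dex", "rb").read())) on the file extracted from the APK lists the candidate encrypted regions. A sliding window gives a finer picture at the cost of speed.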

Encryption analysis

To understand how the first 128KB of each DEX is decrypted, we need to analyze how the decrypt_jar_128K function works.

One of the function's basic blocks contains the encryption logic:

loc_8DC78
ADD             W3, W3, #1      ; i++
LDRB            W6, [X5],#1     ; x = buffer[cursor++]
AND             W7, W3, #0xFF   ; i %= 256
SUB             W0, W5, W1
MOV             X3, X7
CMP             X2, X0
LDRB            W0, [X8,X7]     ; +--
ADD             W4, W4, W0      ; | j = (j + S[i]) % 256
AND             W9, W4, #0xFF   ; +--
MOV             X4, X9
LDRB            W10, [X8,X9]    ; +--
STRB            W10, [X8,X7]    ; |
STRB            W0, [X8,X9]     ; | S[i], S[j] = S[j], S[i]
LDRB            W7, [X8,X7]     ; +--
ADD             W0, W7, W0      ; +--
UXTB            W0, W0          ; |
LDRB            W0, [X8,X0]     ; | x = S[(S[i] + S[j]) % 256] ^ x
EOR             W0, W0, W6      ; +--
STURB           W0, [X5,#-1]    ; buffer[cursor-1] = x
B.HI            loc_8DC78

This is RC4's pseudo-random generation algorithm (PRGA):

i := 0
j := 0
while GeneratingOutput:
    i := (i + 1) mod 256
    j := (j + S[i]) mod 256
    swap values of S[i] and S[j]
    t := (S[i] + S[j]) mod 256
    K := S[t]
    output K
endwhile
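
For reference, here is a compact Python version of RC4 that can be used to decrypt the first 128KB of a buffer. It is a sketch under one assumption: that the packer pairs the PRGA above with the textbook key-scheduling algorithm (KSA). The names rc4_crypt and decrypt_head are ours and do not come from DxFx.

```python
def rc4_ksa(key: bytes) -> list:
    """Textbook RC4 key scheduling: start from the identity permutation."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt data (RC4 is symmetric)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray(len(data))
    for n, byte in enumerate(data):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out[n] = byte ^ s[(s[i] + s[j]) & 0xFF]
    return bytes(out)


def decrypt_head(key: bytes, blob: bytes, size: int = 128 * 1024) -> bytes:
    """Decrypt only the first `size` bytes and keep the rest untouched."""
    return rc4_crypt(key, blob[:size]) + blob[size:]
```

The loop body of rc4_crypt maps one-to-one onto the annotated basic block: the increment of i, the update of j, the swap, and the final XOR with the input byte.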

Analysis of the decrypt_jar_128K CFG gives us information about where different parts of the RC4 algorithm are located:

decrypt_jar_128K CFG

Encryption key generation

The key's cross-references lead to a generation function based on a simple XOR between a 16-byte hardcoded constant and the first 16 bytes of the string com.dji.industry.pilot:

Generate RC4 key DEX
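
The derivation is small enough to be sketched directly. The 16-byte constant is deliberately not reproduced here: it has to be extracted from libDexHelper.so, and the all-zero value used in the usage note below is only a placeholder. The function name derive_dex_key is hypothetical.

```python
PACKAGE_NAME = b"com.dji.industry.pilot"


def derive_dex_key(constant: bytes, package: bytes = PACKAGE_NAME) -> bytes:
    """XOR a 16-byte constant with the first 16 bytes of the package name."""
    if len(constant) != 16:
        raise ValueError("the hardcoded constant is expected to be 16 bytes long")
    if len(package) < 16:
        raise ValueError("the package name must provide at least 16 bytes")
    return bytes(c ^ p for c, p in zip(constant, package[:16]))
```

With the placeholder constant bytes(16), the function simply returns the first 16 bytes of the package name, which is a convenient way to check the plumbing before plugging in the real constant.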

We are now able to statically unpack DEX files.

The DEX decryption is implemented in the DexPool class of DxFx.

However, disassembly of the unpacked DEX files reveals a problem. The code for a large number of methods seems to have been stolen, overwritten, and replaced mainly by nop instructions:

Stolen bytecode

We can therefore assume that the packer has a second bytecode protection mechanism.

Bytecode, where are you? Again...

Method debug info

The various methods whose code is stolen all seem to contain a debug info offset (debug_info_off) which also appears in the body of the method:

Method debug_info_off

Something looks fishy with debug_info_off: this field could play a role in the method code unpacking mechanism, perhaps as an identifier. Moreover, a classes.dgc file located in the APK assets contains a large number of the debug info offsets used in stolen methods. The classes.dgc file is therefore an interesting candidate for further analysis.
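
To inspect debug_info_off and the body of a method programmatically, the code_item header can be parsed with the struct module. The layout below follows the public DEX format documentation. The nop_ratio helper is only a heuristic of ours for spotting bodies that consist mostly of nop code units, not something taken from the packer or from DxFx.

```python
import struct
from typing import NamedTuple


class CodeItemHeader(NamedTuple):
    registers_size: int
    ins_size: int
    outs_size: int
    tries_size: int
    debug_info_off: int
    insns_size: int  # expressed in 16-bit code units

CODE_ITEM_HEADER = struct.Struct("<HHHHII")  # 16 bytes, little-endian


def parse_code_item(dex: bytes, offset: int):
    """Return (header, insns) for the code_item located at offset."""
    header = CodeItemHeader(*CODE_ITEM_HEADER.unpack_from(dex, offset))
    start = offset + CODE_ITEM_HEADER.size
    return header, dex[start:start + 2 * header.insns_size]


def nop_ratio(insns: bytes) -> float:
    """Fraction of 16-bit code units equal to 0x0000 (a plain nop)."""
    units = len(insns) // 2
    if units == 0:
        return 0.0
    nops = sum(1 for i in range(units) if insns[2 * i:2 * i + 2] == b"\x00\x00")
    return nops / units
```

A method with a non-zero debug_info_off and a nop_ratio close to 1.0 is a likely candidate for a stolen body.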

The classes.dgc file

An entropy analysis reveals that the beginning of the file (oddly enough, a 128KB chunk) probably contains encrypted data:

Entropy of classes.dgc

This is a good lead to follow in the libDexHelper.so binary.

Encryption analysis

It is likely that a mechanism similar to the 128KB chunk encryption of DEX files is used for the classes.dgc file. Analysis of libDexHelper.so reveals a function whose structure also matches the RC4 encryption algorithm:

DGC RC4 decryption

We can confirm that this is the classes.dgc decryption function by using a simple Frida hook:

'use strict';

const dlopen_ext = Module.getExportByName(null, '__loader_android_dlopen_ext');

function main() {
  const rc4_fct_addr = Module.getExportByName(
    'libDexHelper.so',
    'p416302DA23BEF5D5A81473ACFAC4DA25'
  );

  // Dump the first 32 bytes of the buffer passed as first argument
  Interceptor.attach(rc4_fct_addr, {
    onEnter: function(args) {
      console.log(args[0].readByteArray(32));
    }
  });
}

// Wait for libDexHelper.so to be loaded before installing the hook
const boot_intercept = Interceptor.attach(dlopen_ext, {
  onEnter: function(args) {
    this.name = args[0].readUtf8String();
  },
  onLeave: function(retval) {
    if (!retval.isNull() && this.name && this.name.includes('libDexHelper.so')) {
      main();
      boot_intercept.detach();
    }
  }
});

The result is:

           0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F  0123456789ABCDEF
00000000  ef bd de 50 8b bb 81 c7 80 63 35 ca 95 6e 1d 1d  ...P.....c5..n..
00000010  36 d5 ef 02 df 2a 50 2b e8 88 03 c3 9b 45 da 5f  6....*P+.....E._

It matches the first bytes of the classes.dgc file:

First bytes of the classes.dgc file

As with the decrypt_jar_128K function, the basic block initializing S to identity permutation reveals the presence of a cross-reference to the key.

Encryption key generation

From the cross-references, it is possible to locate the key generation function. The CFG of the function looks a bit like the one used to generate the DEX decryption key. However, a slightly more complex mechanism is used to generate the key:

DGC RC4 key generation

First, the MD5 hash of a 4096-byte binary blob in memory is computed. MD5 is identified by looking at a sub-function called in the previous CFG. This sub-function corresponds to the MD5 algorithm for calculating a block (512 bits). The algorithm is flattened and contains hardcoded K constants (0xe8c7b756, 0xd76aa478, ...).
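
The constants mentioned above are easy to recognize because the MD5 table is defined as K[i] = floor(2^32 * |sin(i + 1)|) for i from 0 to 63. The short sketch below regenerates the table so that constants found in a disassembly can be checked against it. The helper names are ours.

```python
import math


def md5_k_table() -> list:
    """The 64 additive constants of MD5: floor(2**32 * abs(sin(i + 1)))."""
    return [int(abs(math.sin(i + 1)) * 2**32) & 0xFFFFFFFF for i in range(64)]


def is_md5_constant(value: int) -> bool:
    """True if value is one of the 64 MD5 round constants."""
    return value in md5_k_table()
```

Both 0xd76aa478 and 0xe8c7b756 belong to this table (they are its first two entries), which supports the identification of the sub-function as the MD5 block transform.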

The binary blob is loaded directly from libDexHelper.so and can be found even in the packed version of the library. This chunk appears to be preceded by a kind of header containing the name mthfilekey:

mthfilekey entry in libDexHelper.so

Once the MD5 has been calculated, a deterministic sequence is generated by calling another sub-function. Analysis of the function reveals that it is a Fibonacci sequence:

Fibonacci generator function CFG

Next, the 16 bytes of the MD5 hash are XORed with 16 bytes retrieved directly from the 4096-byte chunk (mthfilekey) following a deterministic walk based on the Fibonacci sequence previously generated.
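
The overall shape of this derivation can be sketched as follows. Be careful: the exact walk is not spelled out in this article, so the way indices are derived here (the i-th Fibonacci term, starting from 1, 1, taken modulo the blob length) is purely an assumption made for illustration. Only the three ingredients (MD5 of the blob, a Fibonacci sequence, a 16-byte XOR) come from the analysis; the real logic is the one implemented in DxFx.

```python
import hashlib


def fibonacci(count: int) -> list:
    """First `count` Fibonacci terms, starting from 1, 1 (assumed seed)."""
    seq = []
    a, b = 1, 1
    for _ in range(count):
        seq.append(a)
        a, b = b, a + b
    return seq


def derive_dgc_key(mthfilekey: bytes) -> bytes:
    """Illustrative key derivation: MD5(blob) XOR 16 bytes picked in blob."""
    digest = hashlib.md5(mthfilekey).digest()
    # ASSUMPTION: byte i is read at index fib(i) modulo the blob length.
    # The packer's actual walk may differ.
    picks = bytes(mthfilekey[f % len(mthfilekey)] for f in fibonacci(16))
    return bytes(d ^ p for d, p in zip(digest, picks))
```

Whatever the exact walk, the result is a 16-byte key that depends only on data present in libDexHelper.so, which is what makes static key recovery possible.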

We are now able to statically generate the RC4 key that decrypts the first 128KB of the classes.dgc file.

classes.dgc file format

Once decrypted, looking at classes.dgc reveals that the beginning of the file contains a table indexing all the application methods (code_item) whose code has been stolen:

classes.dgc index layout

Each table item points to the code_item of a method:

classes.dgc index layout

However, as it stands, the Dalvik opcodes present in the method bodies seem inconsistent and are therefore probably obfuscated... At this stage, we have all the elements needed to link the stolen bytecode (still obfuscated for the moment; we will address this later) to the application's various damaged methods. First of all, it's interesting to understand when the packer repairs the methods so that the application can run normally. This mechanism is particularly interesting because it relies on a feature of ART.

ART hijacking

ART in a nutshell

The Android Runtime (ART) is Dalvik's successor runtime in charge of optimizing and executing code for Android applications and other Android system components. The Android Runtime — How Dalvik and ART work? article by Paulina Sadowska is a great introduction to ART.

Class loading mechanism

When a method is to be executed, the runtime must first check that the class to which the method belongs is loaded. If this is not the case, the runtime will load and link the class. The linking process involves several phases as described in the Java Language Specification:

  1. Class verification;
  2. Class preparation;
  3. Resolution.

The stage we're interested in here is the class verification because it's precisely this stage that is instrumented by the packer. Among other things, this step checks the bytecode of the class's various methods for inconsistencies. It is implemented in the ClassLinker::VerifyClass method of ART.

One of the interesting features of VerifyClass is that it calls the UpdateClassAfterVerification method:

static void UpdateClassAfterVerification(Handle<mirror::Class> klass,
                                         PointerSize pointer_size,
                                         verifier::FailureKind failure_kind)
    REQUIRES_SHARED(Locks::mutator_lock_) {

  // [...]

  // Now that the class has passed verification, try to set nterp entrypoints
  // to methods that currently use the switch interpreter.
  if (interpreter::CanRuntimeUseNterp()) {
    for (ArtMethod& m : klass->GetMethods(pointer_size)) {
      if (class_linker->IsQuickToInterpreterBridge(m.GetEntryPointFromQuickCompiledCode())) {
        runtime->GetInstrumentation()->InitializeMethodsCode(&m, /*aot_code=*/nullptr);
      }
    }
  }
}

UpdateClassAfterVerification updates the entry points of the various methods of the verified class. So, it has to iterate over all the methods of the class and call the Instrumentation::InitializeMethodsCode method:

InitializeMethodsCode callgraph

Anatomy of the hook

The Instrumentation::InitializeMethodsCode method acts as a choke point through which every executable method of the application passes. It is precisely this choke point that is exploited by the packer to repair methods whose code has been stolen. To do this, libDexHelper.so places a hook on InitializeMethodsCode:

Hook call graph

The prologue of the Instrumentation::InitializeMethodsCode method is patched to redirect the execution flow to a function in libDexHelper.so that we call PatchMethodCode:

PatchMethodCode CFG

A few moments later... we can deduce the hook's anatomy and the different operations performed by PatchMethodCode:

Hook anatomy with callgraph

Once the PatchMethodCode function is called, it first loads the obfuscated bytecode of the current method, using debug_info_off as an identifier to look it up in the method index table of the classes.dgc file. The code is passed to a function that we call DecryptMethodCode to be de-obfuscated. Then the code_item (dex::CodeItem) of the method (art::ArtMethod) is patched to point to the buffer containing the de-obfuscated bytecode.
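
A static unpacker can mimic this repair step offline. The sketch below is a simplified model and not the DxFx implementation: the stolen dictionary stands in for the classes.dgc index (debug_info_off mapped to obfuscated bytes), decode stands in for DecryptMethodCode, and the body is written back in place, which assumes that the stolen code has exactly the same size as the placeholder left in the DEX. At runtime the packer instead redirects the method to a separate buffer.

```python
import struct

DEBUG_INFO_OFF_FIELD = 8  # offset of debug_info_off inside code_item
INSNS_FIELD = 16          # offset of the insns array inside code_item


def repair_method(dex: bytearray, code_item_off: int, stolen: dict, decode) -> bool:
    """Patch one code_item in place; return True if the method was repaired."""
    debug_info_off, insns_size = struct.unpack_from(
        "<II", dex, code_item_off + DEBUG_INFO_OFF_FIELD
    )
    obfuscated = stolen.get(debug_info_off)
    if obfuscated is None:
        return False  # this method was not stolen
    if len(obfuscated) != 2 * insns_size:
        # ASSUMPTION of this sketch: same size as the placeholder body
        raise ValueError("stolen body does not fit the placeholder")
    start = code_item_off + INSNS_FIELD
    dex[start:start + len(obfuscated)] = decode(obfuscated, debug_info_off)
    return True
```

Iterating this function over every code_item of every unpacked DEX file gives the static equivalent of what the hook does lazily, class by class, at verification time.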

This mechanism ensures that the damaged code in each method is repaired before the method is executed. At this point, the last thing we need to understand is how bytecode is obfuscated in classes.dgc. To do this, we need to analyze the DecryptMethodCode function.

Bytecode de-obfuscation

The function is rather small, and an analysis of a few basic blocks gives a good idea of how it works:

DecryptMethodCode CFG

The function iterates over each opcode. Each obfuscated opcode is XORed with the low byte of the method's debug_info_off. The result of this operation is then used as an index into a substitution table, and the obfuscated opcode is replaced by the value found there:

opcode = S[obfuscated_opcode ^ (debug_info_off & 0xff)]
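
In Python, the substitution can be sketched as below. Two assumptions are made: only the opcode byte of each instruction is obfuscated (operands are left untouched), and the caller supplies width_of, a hypothetical function returning the width of an instruction in 16-bit code units from its real opcode. Payload pseudo-instructions (switch and array data) are not handled by this sketch.

```python
def deobfuscate_opcode(obfuscated: int, debug_info_off: int, sbox: bytes) -> int:
    """Apply the XOR with the low byte of debug_info_off, then the S lookup."""
    return sbox[(obfuscated ^ (debug_info_off & 0xFF)) & 0xFF]


def deobfuscate_insns(insns: bytes, debug_info_off: int, sbox: bytes, width_of) -> bytes:
    """Restore the opcode of each instruction, leaving operands unchanged."""
    out = bytearray(insns)
    pos = 0
    while pos < len(out):
        opcode = deobfuscate_opcode(out[pos], debug_info_off, sbox)
        out[pos] = opcode
        width = width_of(opcode)
        if width < 1:
            raise ValueError(f"invalid width for opcode {opcode:#04x}")
        pos += 2 * width  # widths are counted in 16-bit code units
    return bytes(out)
```

The real opcode has to be recovered before moving on, since the instruction width depends on it. This is why the walk is sequential rather than a blind substitution over every byte.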

Since the substitution table is theoretically a maximum of 256 bytes, one might assume that one of the RC4 KSA previously reversed is reused to generate it, but... no.

The S substitution table is simply stored in the libDexHelper.so library and can be directly extracted from the packed binary. We have everything we need to fix all the damaged methods and the unpacked DEX can be decompiled properly:

Fixed method

We are now able to perform static unpacking of the application.

  • The method fixing step is implemented in the Dex class of DxFx.
  • The bytecode de-obfuscation is located in the MethodCipher class of DxFx.

Conclusion

By walking through the analysis that led to a static unpacker, we have seen the different encryption and obfuscation algorithms used by the packer at each stage. We also highlighted an interesting protection mechanism that injects bytecode by hijacking the Android runtime.




Source: http://blog.quarkslab.com/dji-the-art-of-obfuscation.html