By Francesco Bertolaccini

Rellic is a framework for analyzing and decompiling LLVM modules into C code, implementing the concepts described in the original paper presenting the Dream decompiler and its successor, Dream++. It recently made an appearance on this blog when I presented rellic-headergen, a tool for extracting debug metadata from LLVM modules and turning them into compilable C header files. In this post, I am presenting a tool I developed for exploring the relationship between the original LLVM module and its decompiled C version: rellic-xref.

rellic-xref’s interface

rellic-xref

I was tasked with improving the quality of the “provenance information” (i.e., information about how each LLVM instruction relates to the final decompiled code) produced by Rellic. However, the main front end for Rellic (a tool called rellic-decomp that produces C source code in a “batch processing” style) does not give much information about what is happening under the hood. Most importantly, it does not print any of the provenance info I was interested in.

To solve this problem, I developed rellic-xref as a web interface so that Rellic can be used in an interactive and graphical way. rellic-xref spins up a local web server and serves as an interface to the underlying decompiler. When a user uploads an LLVM module, a textual representation of the module is presented in the right-hand pane. The user can then run a variety of preprocessing steps on the LLVM bitcode or proceed to the decompilation step. Preprocessing is sometimes required when the module contains instructions that are not yet supported by the decompilation engine.

Once the module has been decompiled, the original module and decompiled source code are shown side by side:

Original module and decompiled source code

The main attraction of the tool is already available at this point: hovering your mouse over parts of the decompiled source code will highlight the instructions that led to its generation, and vice versa.

The refinement process

The source code that’s produced straight out of the decompiler is not quite as pretty as it could be. To fix this, Rellic offers a series of refinement passes that make the code more readable. These refinement passes are normally executed iteratively, and the order in which these passes are executed is hard-coded in rellic-decomp. For example, pressing the “Use default chain” button loads rellic-decomp’s default pass configuration into rellic-xref.

Refinement passes offered by Rellic

The default chain consists of passes that are executed only once and passes that are computed iteratively to search for a fixed point. However, rellic-xref gives users the ability to change the order and the way in which the refinement passes are run: passes can be removed (using the “x” buttons in the figure above) and inserted at will.

Although the Rellic passes operate on the Clang abstract syntax tree (AST), they are not suitable for any generic C program, as they rely on assumptions about how the decompilation process generates them. In particular, the decompiled code is initially in a form similar to single static assignment (SSA), reflecting the fact that it is directly generated from LLVM bitcode, which is itself in SSA form.

Refinement passes

For those who are unfamiliar with Rellic’s refinement passes, we provide descriptions of these passes followed by an example of how they refine the code.

Dead statement elimination
This pass removes statements from the AST that do not cause any side effects. For example, “null” statements (lone semicolons) and several types of expressions can be safely removed.

Original Refined

void foo() {

int a;

a = 3;

a;

bar(a);

}

void foo() {

int a;

a = 3;

bar(a);

}

Z3 condition simplification
This pass uses the Z3 SMT solver to improve the quality of if and while conditions by reducing their size as much as possible. It works in a recursive manner by inspecting the syntax tree of each condition and pruning any branch that it finds to be trivially true or false.

Original Refined

if(1U && x == 3) {

foo(x);

}

if(x == 3) {

foo(x);

}

if(x != 3 || x == 3) {

foo(x);

}

if(1U) {

foo(x);

}

Nested condition propagation
As the name suggests, this pass propagates conditions from parent statements to their children. In practical terms, this means that conditions in parent if and while statements are assumed to be true in nested if and while statements.

Original Refined

if(x == 0) {

if(x == 0 && y == 1) {

bar();

}

}

if(x == 0) {

if(1U && y == 1) {

bar();

}

}

Nested scope combination
This one is pretty simple: any statement that appears in a compound statement or in a trivially true if statement can be extracted and put into the parent scope. This pass works on the assumption that all local variables have been declared at the beginning of the function.

Original Refined

void foo() {

int x;

int y;

{

x = 1;

}

if(1U) {

y = 1;

}

}

void foo() {

int x;

int y;

x = 1;

y = 1;

}

Condition-based refinement
This pass recognizes when a scope contains two adjacent if statements that have opposite conditions and merges them into an if-else statement.

Original Refined

if(a == 42) {

foo();

}

if(!(a == 42)) {

bar();

}

if(a == 42) {

foo();

} else {

bar();

}

Reachability-based refinement
Similar to the condition-based refinement pass, this one recognizes when successive if statements have exclusive but not opposite reaching conditions and rearranges them into if-else-if statements.

Original Refined

if(a == 42) {

foo();

}

if(!(a == 42) && b == 13) {

bar();

}

if(a == 42) {

foo();

} else if(b == 13) {

bar();

}

Loop refinement
This pass recognizes if statements that should be responsible for terminating infinite while loops. It refactors the code to create loops with conditions.

Original Refined

while(true) {

if(a == 42) {

break;

}

foo();

}

while(!(a == 42)) {

foo();

}

Expression combination
This pass performs a number of simplifications, such as turning pointer arithmetic into array accesses and removing superfluous casts.

Original Refined
*&x
x
!(x == 5)
x != 5
(&expr)->field
expr.field

Condition normalization
This is the only pass that is not used in the default refinement chain. The purpose of this pass is to turn conditions into conjunctive normal form to reveal more opportunities for simplification. Unfortunately, it also tends to produce exponentially large conditions—literally!—so it is best used sparingly and only as a very last step before applying a final simplification using Z3.

I’m sold! How do I use it?

Just follow the instructions in the README and you’ll have an instance of rellic-xref running in no time.

Closing thoughts

rellic-xref turns the traditionally batch-oriented Rellic into a more interactive tool that provides insight into the decompilation and refinement process. It is also a glimpse into what the Rellic framework could be used for. For instance, what if the user had more control over the underlying Clang AST? Allowing custom variable renaming and retyping, for example, would go a long way in making Rellic feel like a proper component of a reverse engineering suite. Further work on rellic-xref (or development of a similar tool) could give users more control over the Clang AST in this way.

Rellic’s passes operate directly at the C AST level and heavily use Clang’s APIs. This was both a blessing and a curse as I worked with Rellic. For instance, calling the Clang AST “abstract” is a bit of a misnomer, as it has characteristics of both an abstract and a concrete syntax tree. For example, it contains information about positions of tokens and comments but also things that are not actually present in the source code text, like implicit casts. My experience with Rellic has taught me that the Clang AST interface is not really meant to be used as a mutable resource, and it has more of a write-once, read-many semantics. We have plans to migrate Rellic for use in an upcoming project featuring MLIR, which may help in this regard. However, that is beyond the scope of this blog post.

I’d like to thank my mentor, Peter Goodman, for his guidance during my internship and Marek Surovič for his precious feedback on my work with Rellic. Working at Trail of Bits continues to prove to be a great experience full of gratifying moments.