Datalog-Based Binary Equivalence

3 months ago 2

Jens Dietrich, Behnaz Hassanshahi: DALEQ - Explainable Equivalence for Java Bytecode (preprint)
Jens Dietrich, Tim White, Behnaz Hassanshahi, Paddy Krishnan: Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds. ICSME'25
Jens Dietrich, Tim White, Valerio Terragni, Behnaz Hassanshahi: Towards Cross-Build Differential Testing. ICST'25

DALEQ has been developed and tested with Java 17. DALEQ can be built with Maven as follows:

mvn clean package dependency:copy-dependencies

Then run DALEQ as follows (for windows: change the classpath separator accordingly):

java -DSOUFFLE=<path-to-souffle> -cp "target/classes:target/dependency/*" io.github.bineq.daleq.cli.Main \ -j1,--jar1 <arg> the first jar file to compare (required) -j2,--jar2 <arg> the second jar file to compare (required) -o,--out <arg> the output folder where the report will be generated (required) -s1,--src1 <arg> the first jar file with source code to compare (optional) -s2,--src2 <arg> the second jar file with source code to compare (optional) -a,--autoopen if set, the generated html report will be opened automatically (dont use for CLI integration)

The jar built can also be used directly as an executable superjar.

java -DSOUFFLE=<path-to-souffle> -jar target/daleq-<daleq-version>.jar \ -j1,--jar1 <arg> the first jar file to compare (required) -j2,--jar2 <arg> the second jar file to compare (required) -o,--out <arg> the output folder where the report will be generated (required) -s1,--src1 <arg> the first jar file with source code to compare (optional) -s2,--src2 <arg> the second jar file with source code to compare (optional) -a,--autoopen if set the generated html report will be opened automatically (dont use for CLI integration)

Running the program will create a report report.html in the specified output folder.

The program returns with the following exit codes:

0 - all classes and resources are equal, source codes (if analysed) are equivalent 1 - some classes are not equal but still equivalent, resources are equal, source-codes (if analysed) are equivalent 2 - some classes are not equivalent (wrt daleq) or some resources (e.g. metadata) are not equal or some sources (if analysed) are not equivalent

This can be used to integrate the tool into CI processes.

Daleq takes Java bytecode as input and produces a relational database. This is done in two steps.

flowchart LR bytecode["bytecode (.class)"] --> asm(("asm")) --> edb["EDB"] --> souffle(("souffle")) --> idb["IDB"]

A database representing the bytecode is extracted. An asm-based static analysis is used for this purpose. The result is the EDB (extensional database), a folder with tsv files, each tsv file corresponding to a predicate. Those facts represent the bytecode, including class name, bytecode version, super types and interfaces, access flags for method, fields and classes, bytecode instructions in methods, etc.

The representation is low-level, however, there is one significant abstraction (provided by asm): constant pool references are resolved.

Each fact has a unique generated id that is used to provide provenance.

The IDB computation applies rules to compute a second set of facts (IDB files) that normalises the EBD to establish whether two bytecode compared are equivalent or not.

Rules that are applied for this purpose are defined in src/main/resources/rules/, advanced.souffle is the default set.

The API to generate the IDB is provided by io.github.bineq.daleq.Souffle::createIDB. This requires souffle to be installed. Any application that uses this API must provide the location of souffle via an argument that is passed to the JVM:

-DSOUFFLE=<path to souffle binary>

The first slot (term) for each fact is an id. For EDB facts, those ids are generated during the bytecode analysis / extracyion phase.

Example (from AALOAD.facts):

F1428629 org/apache/hadoop/hdfs/HAUtil::getConfForOtherNodes(Lorg/apache/hadoop/conf/Configuration;)Ljava/util/List; 3200

When the IDB is created, each IDB fact also has this slot. The values encode a derivation tree. Example:

This fact has been computed by applying rule R_AALOAD to fact F1428629. I.e. those ids encode a derivation tree:

graph TD; R_AALOAD --> F1428629;

A less trivial example is R_42[F1,R_43[F2,F3]] , this encodes the following derivation tree:

graph TD; R_42 --> F1; R_42 --> R_43; R_43 --> F2; R_43 --> F3;

The project contains a parser to transform those strings into trees (io.github.bineq.daleq.souffle.provenance.ProvenanceParser).

The grammar of the encoded proofs is defined here: src/main/antlr4/io/github/bineq/daleq/souffle/provenance/Proof.g4.

When trying to establish equivalence between two (compiled) classes, IDBs cannot be compared directly for the following reasons:

The provenance (id) terms -- this reflects the process of normalisation that might be different for each of the classes compared
the IDB contains additional facts for predicates to keep track of facts that have been removed (i.e. EDB facts that have no counterpart in the IDB), or moved.
facts corresponding to bytecode instruction have an instructioncounter slot to define the order of instructions within a method. Those might change in the IDB as normalisation may move/remove/insert facts corresponding to instructions.

The purpose of projection is to remove the aforementioned parts (facts and slots) from the database. At the moment normalisation is implemented in Java.

Workflow to Compare two Binaries

The classes io.github.bineq.daleq.idb.IDBReader and io.github.bineq.daleq.idb.IDBPrinter are used to read an IDB, and convert it into a textual representation suitable for comparison and diffing (using a standard text fiff tool such as diff or kdiff3). This also supports projection.

flowchart LR bytecode1["class1"] --> asm1(("asm")) --> edb1["EDB1"] --> souffle1(("souffle")) --> idb1["IDB1"] --> IDBReader1(("project")) --> file1 --> diff --> result bytecode2["class2"] --> asm2(("asm")) --> edb2["EDB2"] --> souffle2(("souffle")) --> idb2["IDB2"] --> IDBReader2(("project")) --> file2 --> diff

This is an example of a generated report. Here, we have compared the jar file for the artifact javax.transaction:jta:1.1 from Maven Central with the corresponding artifact rebuilt by Google's assured open source. Daleq employs various analysers to compare classes, meta-data and resources within the jars. The results are displayed in a table, the rows correspond to files within the jar(s), and the columns correspond to the various analysers. Analysers compare the content of files, specialised analysers are used to compare source code and bytecode. The actual daleq analyser results are displayed in the last column. Please have a look at our paper for more details.

Possible analysis result states are:

PASS the comparison yields true
FAIL the comparison yields false
N/A the comparison cannot be applied (e.g., a source Code comparison cannot be used to compare bytecode)
ERROR the comparison has resulted in an error

The markers next to each result are hyperlinks to generated pages with additional provenance. For non-equivalence statements, those pages are usually diff reports rendered in HTML. For daleq equivalence statements, advanced diff reports are generated based on the derivations recorded when datalog rules are applied. This is an example:

Constructing the EDB Only

usage: java -cp <path-to-built-jar> io.github.bineq.daleq.edb.FactExtractor -cl <classes> the location of compiled classes, a jar file or folder -f <facts> a folder where to create the extension database (input .facts files), the folder will be created if it does not exist -s <souffle> a file where to create the souffle program (.souffle file) containing imports and input predicate declarations -v whether to verify the datalog facts against the schema, must be true or false, default is true

Note that tests requires souffle to be set up (see also IDB Computation section). This has to be done locally at the moment, see also issue 11. Tests that require souffle will not fail but will be skipped if souffle is not available.

Performance and Issues with Souffle

Souffle occasionally fails. When souffle is used, a Thread::sleep instruction is used to pause DALEQ. This reduces the number of errors (perhaps a souffle race condition?), but makes DALEQ slower.

Souffle issues will be reported by the appplication, and flagged as error .

Additional analysers can be easily added by following those steps:

Write an analyser (say MyAnalyser) implementing io.github.bineq.daleq.cli.Analyser
In your project, add a file src/main/resources/META-INF/services/io.github.bineq.daleq.cli.Analyser
Add a line with the fully qualified name of MyAnalyser to this file
Build a jar file
When running DALEQ, add this jar file to the classpath
Note that the order in which columns are displayed in reports is determined by the property io.github.bineq.daleq.cli.Analyser::positionHint, a values between 0 and 100.

To contribute to this project, please follow the steps below to set up your development environment.

First, create a virtual environment to isolate dependencies:

Activate the virtual environment and install the pre-commit hooks.

source .venv/bin/activate pip install -r requirements.txt pre-commit install

Now, the hooks will automatically run on each commit to check for any issues in the code. You can manually run the pre-commit hooks on all files at any time with:

pre-commit run --all-files