We provide the three KLEE-based tools (mainline KLEE, SymLive & MoKlee) used in the paper in two docker images. For the majority of our experiments, we provide the open-source, KDAlloc-enhanced versions of mainline KLEE and SymLive in the first docker image, kdalloc/main. For the MoKlee experiments described in §6.4 of the paper, we provide a separate docker image, kdalloc/moklee. Thus, the majority of the following deals with the kdalloc/main image.

The download links for the Docker/VM images will be made available soon!

Overview

The kdalloc/main docker image is structured as follows:

  • /project/examples/: source code to Listing 1 & 3
  • /project/mainline-experiments/: script and parameters for mainline KLEE experiments
  • /project/patches/stats.patch: patch for additional allocator statistics in §6.2
  • /project/queries/: scripts and raw data used for the analysis in §6.3, Figures 7 & 8
  • /project/results/: csv files with the raw data used for Figures 4, 5 & 6; results from §6.5
  • /project/root/bin/: pre-compiled binaries like klee and symlive-klee (part of $PATH)
  • /project/src/klee-src/: source code to mainline KLEE with KDAlloc
  • /project/src/symlive-src/: source code to SymLive with KDAlloc
  • /project/symlive-experiments/: script for SymLive experiments
  • /project/tools/bc/: all benchmarks listed in Tables 1 & 2 as LLVM bitcode files

We have built and installed mainline KLEE and SymLive with KDAlloc in the docker image; they are available as klee and symlive-klee, respectively.

For the MoKlee experiments in §6.4, we provide a separate docker image, whose content is described in the corresponding section of this documentation. Please note that, as the source code for MoKlee is only available on request, we cannot make it public through this artifact.

Functionality

Our docker images provide comprehensive support for repeating all experiments from the paper. In particular, we provide:

  • the source code to KDAlloc, our enhanced mainline KLEE and our enhanced SymLive version (kdalloc/main image)
  • compiled binaries for mainline KLEE, SymLive (kdalloc/main image) and MoKlee (kdalloc/moklee image) enhanced with KDAlloc
  • scripts to repeat all experiments from §6.2, §6.5 (kdalloc/main image) and §6.4 (kdalloc/moklee image)
  • LLVM bitcode for all benchmarks listed in Tables 1-2 (kdalloc/main image) and Table 3 (kdalloc/moklee image)
  • our original data for experiments from §6.2, §6.5 (kdalloc/main image) and §6.4 (kdalloc/moklee image)
  • a script for generating Figures 4-6 from either the included original data or own runs (kdalloc/main image)
  • instructions to repeat the example from §6.3 (Listing 1) in this document (using the kdalloc/main image)
  • a script for generating Table 4 from your own runs (kdalloc/moklee image)
  • instructions to repeat the example from §6.5 (Listing 3) in this document (using the kdalloc/main image)

Reusability

We provide the implementation for KDAlloc, along with its integration into mainline KLEE (and SymLive), to enable easy reuse and further modification. The source code for KDAlloc can be found in /project/src/klee-src/include/pseudoalloc as part of the kdalloc/main image.

KDAlloc is a header-only library that can be reused for any other project, although a small FFI wrapper might become necessary for projects not written in modern C++. In our KLEE-based use cases, we use klee::ref for reference counting, but if the macro PSEUDOALLOC_KLEE_REF is undefined or defined to 0, it will use std::shared_ptr instead.

In addition to reusing the whole allocator as-is, it is also configurable and modular enough to be adapted to similar, but different, use cases. For example, bin cutoffs can be configured in allocator.h by changing the elements of allocator_t::control_t::_meta in the pseudoalloc::allocator namespace (lines 34-43). More freedom can be achieved by changing how the allocator_t member functions dispatch calls to the suballocators.
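To illustrate the dispatch idea, here is a sketch in Python (not KDAlloc's C++ code; the cutoff values below are made up for the example) of size-based binning:

```python
import bisect

# Hypothetical bin cutoffs in bytes -- the real values live in
# allocator_t::control_t::_meta in allocator.h and differ from these.
CUTOFFS = [4, 8, 16, 32, 64, 128, 256, 1024, 2048, 4096]

def dispatch(size: int) -> str:
    """Map an allocation size to a suballocator: sizes up to the
    largest cutoff go to the smallest slot-allocator bin that fits,
    anything larger goes to the large object allocator."""
    if size <= CUTOFFS[-1]:
        return f"slot-bin-{bisect.bisect_left(CUTOFFS, size)}"
    return "large-object-heap"

print(dispatch(24))    # falls into the 32-byte bin (index 3)
print(dispatch(5000))  # exceeds all cutoffs -> large object allocator
```

Changing the cutoff list (or the dispatch function) corresponds to the adaptations described above.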

The suballocators themselves follow a fairly simple API that is somewhat complicated by the need to share static data in a “control” structure, which holds everything that does not change between forks (e.g., the size of the bin and its location in memory). The behaviour of the slot allocator (§4.4 and §5.4 in the paper) resides in suballocators/slot_allocator.h. The large object allocator (§4.5 and §5.5 in the paper) is implemented in two different variants that can be switched out by defining either PSEUDOALLOC_MAP_LOH or PSEUDOALLOC_COW_TREAP_LOH (see also suballocators/split_region_allocator.h):

  • PSEUDOALLOC_COW_TREAP_LOH (enabled in our experiments) uses a custom treap with node-level (i.e., subtreap-level) reference counted copy-on-write. See also suballocators/split_region_allocator/cow_treap_based.h for the suballocator and sized_regions.h (same directory) for the treap itself.
  • PSEUDOALLOC_MAP_LOH (disabled in our experiments) uses two std::maps to achieve a similar goal, but is much slower and does not include node-level copy-on-write.
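To illustrate the node-level copy-on-write idea in isolation, the following sketch (our own simplification in Python, using a plain binary search tree instead of a treap) shows how an insert copies only the root-to-leaf path, so forked versions share every untouched subtree:

```python
# Plain persistent BST standing in for the COW treap: inserting never
# mutates existing nodes; it copies only the path from the root down,
# so two "forked" versions share all untouched subtrees.

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return a new root; `root` itself remains valid and unchanged."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

def keys(root):
    return [] if root is None else keys(root.left) + [root.key] + keys(root.right)

v1 = None
for k in (50, 30, 70):
    v1 = insert(v1, k)
v2 = insert(v1, 60)            # "fork": v2 sees 60, v1 does not
print(keys(v1))                # [30, 50, 70]
print(keys(v2))                # [30, 50, 60, 70]
print(v2.left is v1.left)      # True: untouched subtree is shared
```

The real treap additionally reference-counts shared subtreaps so they can be reclaimed once no forked state uses them.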

To see how we integrated KDAlloc into mainline KLEE, please refer to /project/src/klee-src/lib/Core/MemoryManager.cpp, where we essentially replaced KLEE’s original deterministic allocator for ours. The mechanism for external function calls (§5.3 in the paper) is mainly implemented in Executor::callExternalFunction in /project/src/klee-src/lib/Core/Executor.cpp.

Availability

In addition to our work, the artifact also contains, e.g., containers built on full Linux base images and sources from other open-source projects under different licenses; the parts that constitute our work (this document and our allocator) are available under CC-BY 4.0 International.

Requirements and Setup

Our experiments require up to about 16 GB of main memory; some additional headroom is recommended.

Usage via Docker (Preferred)

A working Docker installation is required (alternatives like Podman may also work, but have not been tested by us). Additionally, transparent huge page (THP) support must be set to madvise or never:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled >/dev/null
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never

If your kernel is set to always use THP and you are unable or unwilling to change this setting, please use the provided virtual machine instead.

The two provided docker images can then be loaded:

$ docker load -i kdalloc_main.tgz
$ docker load -i kdalloc_moklee.tgz

Usage via VM

Import vm.ova into your virtualisation tool of choice (tested with VMware Workstation and VirtualBox). The VM is set to 12 GiB of memory, which is enough to run most, but not all, of our experiments. We therefore encourage you to increase it to at least 24 GiB if your system can sustain it; decreasing it to as little as 4 GiB will significantly reduce the number of experiments that can be run, but may still be viable to ascertain the validity of the approach.

The user user has the password password and SSH is enabled. As this is fairly insecure, please do not expose the VM to an untrusted network without disabling SSH first.

Setup

To start a kdalloc/main container, please execute the following command:

$ docker run -it kdalloc/main /bin/bash

This docker image covers the great majority of our experiments. The second image is only needed for the MoKlee experiments; we describe its setup just before it is used.

Getting Started

After completing the setup described above, we suggest running both of the examples (taken from Listings 1 & 3 of the paper) listed below. Overall, they should take only a couple of minutes to complete, with the actual KLEE and SymLive runs taking just a few seconds in each case.

Example: use-after-free (§6.3, Listing 1)

In §6.3 of the paper, we describe how we use KDAlloc to improve KLEE’s use-after-free detection. The example used in the paper (Listing 1) is included in our artifact. In the following, we provide the steps to repeat our experiments. Each involved KLEE call only takes a couple of seconds to complete.

# change into the example directory
$ cd /project/examples

# compare example with Listing 1 of the paper
$ cat uaf.c

# compilation step (creates uaf.bc)
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g \
    -O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES \
    -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -c uaf.c

# run without KDAlloc (no error detected)
$ klee -libc=uclibc -posix-runtime uaf.bc
# completes without "KLEE: ERROR:" line

# run with KDAlloc
$ klee -allocate-determ=1 -libc=uclibc -posix-runtime uaf.bc
# [...]
KLEE: ERROR: libc/string/strlen.c:22: memory error: use after free
# [...]

# further inspect error (example)
$ cat klee-last/test000001.ptr.err
# [...]
Stack: 
    #000010621 in strlen (s=139100132610048) at libc/string/strlen.c:22
    #100013524 in fputs_unlocked (s=139100132610048, stream=140063547461720) at libc/stdio/fputs.c:25
    #200012678 in puts (s=139100132610048) at libc/stdio/puts.c:26
    #300008828 in __klee_posix_wrapped_main () at uaf.c:14
# [...]

While the error is reported in strlen, inspecting the ptr.err file reveals that this function is (transitively) called by puts, which corresponds to the first use of the pointer s after free(s).

As we describe in the paper, if optimisation (-optimize) is enabled, KLEE can also find an error without KDAlloc:

$ klee -optimize -libc=uclibc -posix-runtime uaf.bc
# [...]
KLEE: ERROR: libc/string/strlen.c:22: memory error: out of bound pointer
# [...]

Note, however, that without KDAlloc the error is reported as “out of bound pointer” instead of “use after free”.

Example: infinite loop detection (§6.5, Listing 3)

In §6.5 of the paper, we describe how the infinite loop in the example from Listing 3 can only be detected when SymLive uses KDAlloc, while it evades detection with the default allocator. If you wonder how the example works, please refer to §6.5. The example from the paper (Listing 3) is included in our artifact. In the following, we provide the steps to repeat our experiments. The two SymLive runs involved take only a couple of seconds to complete.

# change into the example directory
$ cd /project/examples

# compare example with Listing 3 of the paper
$ cat symlive.c

# compilation step (creates symlive.bc)
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g \
    -O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES \
    -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -I/project/root/include/ \
    -c symlive.c

# run with KDAlloc (detects infinite loop)
$ symlive-klee -posix-runtime -only-output-states-covering-new \
    -external-calls=all -exit-on-error-type=InfiniteLoop \
    -infinite-loop-detection-truncate-on-fork=0 \
    -detect-infinite-loops=1 -allocate-determ-quarantine=0 \
    -allocate-determ=1 symlive.bc
# [...]
KLEE: ERROR: symlive.c:11: infinite loop
# [...]
KLEE: done: completed paths = 2
# [...]

# run with default allocator (no infinite loop found)
$ symlive-klee -posix-runtime -only-output-states-covering-new \
    -external-calls=all -exit-on-error-type=InfiniteLoop \
    -infinite-loop-detection-truncate-on-fork=0 \
    -detect-infinite-loops=1 -allocate-determ-quarantine=0 \
    -allocate-determ=0 -max-time=10s symlive.bc
# [...]
# no "infinite loop" error detected
# [...]
# "completed paths" is significantly > 2, e.g. 230
# [...]

The two calls to symlive-klee differ only in the last line (just before symlive.bc):

  • -allocate-determ: set to 1 in the KDAlloc case; 0 for default allocation
  • -max-time=10s is only present in the default allocator case (due to non-detection of the infinite loop)

The output shows that in the KDAlloc case, the infinite loop is detected and, due to -exit-on-error-type=InfiniteLoop, no further exploration is performed. Thus, there are only two paths when SymLive terminates. In the default-allocation case, we abort the exploration after 10 seconds, during which no infinite loop is detected and significantly more paths are forked. This corresponds to the result reported in §6.5 of the paper.

Experiments

Before we describe scripts that include the necessary KLEE parameters, we describe the flags that we added to control KDAlloc:

  • -allocate-determ=1 enables deterministic allocation using KDAlloc (default=0: disabled)
  • -allocate-determ-quarantine=<size> is used to adjust the size of quarantine queues (default=8, disabled=0, unlimited=-1)
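To illustrate the quarantine semantics implied by these defaults, here is a hedged sketch of the mechanism (not KDAlloc's implementation): a bounded FIFO that delays the reuse of freed addresses.

```python
from collections import deque

class Quarantine:
    """Bounded FIFO sketch: a freed address becomes reusable only
    after <size> newer frees have pushed it out of the queue."""
    def __init__(self, size: int):
        self.size = size       # 0 = disabled, -1 = unlimited
        self.queue = deque()

    def free(self, addr):
        """Quarantine `addr`; return an address that is now eligible
        for reuse, or None if everything is still held back."""
        if self.size == 0:     # disabled: address is reusable at once
            return addr
        self.queue.append(addr)
        if self.size > 0 and len(self.queue) > self.size:
            return self.queue.popleft()
        return None            # size == -1 (unlimited) never releases

q = Quarantine(size=2)
print(q.free(0x1000))  # None: held in quarantine
print(q.free(0x2000))  # None: queue not yet over capacity
print(q.free(0x3000))  # 4096: the oldest entry (0x1000) is released
```

A longer quarantine makes use-after-free accesses more likely to hit still-unmapped (or poisoned) memory instead of a recycled allocation.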

The size (in GiB) and (page-aligned) start address of each <segment> can be set using

  • -allocate-determ-<segment>-size
  • -allocate-determ-<segment>-start-address

where <segment> is any of constants, globals, heap and stack. For the respective default sizes, please refer to Figure 2 of the paper. In our experimental runs, we use the default sizes, but specify fixed start addresses to ensure cross-run deterministic behaviour:

  • --allocate-determ-constants-start-address=35184372088832
  • --allocate-determ-globals-start-address=36283883716608
  • --allocate-determ-heap-start-address=37383395344384
  • --allocate-determ-stack-start-address=38482906972160
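As a quick sanity check (our own observation, not stated in the paper), these four addresses are evenly spaced: they place the segments at 1 TiB intervals starting at 32 TiB.

```python
TiB = 1 << 40  # 2**40 bytes

start_addresses = {
    "constants": 35184372088832,
    "globals":   36283883716608,
    "heap":      37383395344384,
    "stack":     38482906972160,
}

# Each segment starts exactly 1 TiB after the previous one,
# beginning at 32 TiB (2**45).
for i, (segment, addr) in enumerate(start_addresses.items()):
    assert addr == (32 + i) * TiB, segment
    print(f"{segment:9} starts at {addr // TiB} TiB")
```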

Comparison with Mainline KLEE (§6.2)

We modified KLEE only slightly by adding a few statistics and thresholds, and implemented KDAlloc as an additional allocator. The files necessary to re-run our experiments can be found in /project/mainline-experiments:

  • dfs_instrs.csv/rndcov_instrs.csv - The maximum number of instructions to execute for each benchmark for both search strategies (dfs/rndcov). The values correspond to runs taking roughly 2hrs on our machines.
  • nondet_dfs_covupdate.csv/nondet_rndcov_covupdate.csv - The number of instructions after which the coverage data gets updated for each benchmark.
  • maxstates.csv - The maximum number of active states. We use a state-based limit instead of memory limits to increase determinism.
  • tools.txt - The list of paths for all benchmarks.
  • run-experiments.py - A script to re-run our experiments.

The run-experiments.py script starts a single run of experiments for all benchmarks. By altering the include_experiments list in the configuration section of the script, the number of benchmarks can be reduced (use /project/mainline-experiments/run-experiments.py -s to check your settings). We highly recommend doing so, as the full set requires more than 24 days on a single machine similar to ours. Additionally, we include the same data that we used for the paper in /project/results (see “Generating the Graphs from Raw Data”).

After setting up /project/mainline-experiments/run-experiments.py, the experiments can be started and the results retrieved:

$ mkdir /project/mainline-experiments/run0
$ cd /project/mainline-experiments/run0
$ ../run-experiments.py
$ klee-stats --print-all --table-format=csv . > ../results/run0.csv
# repeat with run1, run2, ..., run4

$ cd /project/mainline-experiments/results
$ evaluation_artifact.R

Solver Time Improvements (§6.3)

To analyse queries issued by KLEE, we performed separate runs of expand with the default allocator and with KDAlloc. In each of these runs, we used the additional KLEE options --use-query-log=all:kquery,solver:kquery and --compress-query-log. The first flag instructs KLEE to record all queries issued (all), as well as all queries dispatched to the solver (solver) in the KQuery format. The second flag is used to compress the resulting files at runtime, which reduces overhead.

We provide the 100 queries that we manually inspected at /project/queries/expand_dfs/top-100-difftime-det and /project/queries/expand_dfs/top-100-difftime-nondet (with and without KDAlloc, respectively). These files are sufficient to follow our insights on differences in query times resulting from the ability of the solvers to reduce Select expressions (see “Analysing Time Differences of Individual Queries”). In addition, we provide a detailed description, all scripts and the raw data needed to extract the 100 most impactful queries yourself (see “Analysing the Raw Query Logs”).

Analysing Time Differences of Individual Queries (§6.3, Figures 7 & 8)

We provide the 100 queries with the highest difference between default allocation and KDAlloc in top-100-difftime-nondet and top-100-difftime-det, respectively. As we noted in the paper, all these top 100 queries are structurally very similar, and just the constants and arrays differ between both variants.

To see how much the execution times differ despite this similarity, we can execute both variants of a query through kleaver with the z3 solver backend. To reliably measure the short execution times, we use hyperfine.

$ cd /project/queries/expand_dfs/
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=z3 top-100-difftime-nondet/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=z3 top-100-difftime-nondet/query-0533316.kquery
  Time (mean ± σ):      34.0 ms ±   2.1 ms    [User: 27.9 ms, System: 5.4 ms]
  Range (min … max):    30.4 ms …  44.1 ms    200 runs

$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=z3 top-100-difftime-det/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=z3 top-100-difftime-det/query-0533316.kquery
  Time (mean ± σ):      18.5 ms ±   1.1 ms    [User: 13.1 ms, System: 5.0 ms]
  Range (min … max):    16.8 ms …  23.6 ms    200 runs

The mean time z3 takes differs by a factor of 1.83 between the non-deterministic and the deterministic query. We can observe a similar pattern (although with a different factor) for STP:

$ cd /project/queries/expand_dfs/
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=stp top-100-difftime-nondet/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=stp top-100-difftime-nondet/query-0533316.kquery
  Time (mean ± σ):      23.0 ms ±   2.3 ms    [User: 17.9 ms, System: 4.8 ms]
  Range (min … max):    20.7 ms …  32.5 ms    200 runs

$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=stp top-100-difftime-det/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=stp top-100-difftime-det/query-0533316.kquery
  Time (mean ± σ):      10.1 ms ±   0.5 ms    [User: 5.5 ms, System: 4.4 ms]
  Range (min … max):     9.0 ms …  12.3 ms    200 runs

This disparity can be observed similarly for every other top 100 query.

To prepare a more detailed inspection, we manually minimised an example query (query-0533316.kquery), making sure not to change the solving time ratio we just observed. The resulting query depends only on the three arrays A-data, A-data-stat and const_arr5. We provide the two resulting files in KQuery format (with the *.min.kquery suffix).

The next step is to convert the query in both variants into the SMT-LIB 2 format, which is a lot more verbose but can be read by z3 directly. To analyse how z3 treats the two query variants differently, we use a debug version of z3 (provided as z3-debug in the kdalloc/main image). Only such a debug version offers the additional tracing flags used for our analysis.

$ cd /project/queries/expand_dfs/
$ cp -v query-0533316-trace/*.min.kquery .
'query-0533316-trace/query-0533316-det.min.kquery' -> './query-0533316-det.min.kquery'
'query-0533316-trace/query-0533316-nondet.min.kquery' -> './query-0533316-nondet.min.kquery'
$ kleaver --print-smtlib query-0533316-det.min.kquery > query-det.smt2
$ kleaver --print-smtlib query-0533316-nondet.min.kquery > query-nondet.smt2
$ z3-debug -tr:th_rewriter_step -tr:simplifier -v:10 query-det.smt2
(simplifier :num-exprs 203 :num-asts 1289 :time 0.74 :before-memory 0.67 :after-memory 0.69)
(propagate-values :num-exprs 142 :num-asts 1265 :time 0.06 :before-memory 0.68 :after-memory 0.69)
(solve_eqs :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(elim-uncnstr :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(reduce-bv-size :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(simplifier :num-exprs 142 :num-asts 1265 :time 0.01 :before-memory 0.69 :after-memory 0.70)
(max-bv-sharing :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.70)
(ackermannize_bv :num-exprs 142 :num-asts 1403 :time 0.00 :before-memory 0.69 :after-memory 0.75)
(simplifier :num-exprs 106 :num-asts 1403 :time 0.01 :before-memory 0.75 :after-memory 0.75)
(propagate-values :num-exprs 106 :num-asts 1403 :time 0.00 :before-memory 0.75 :after-memory 0.75)
(solve_eqs :num-exprs 35 :num-asts 1371 :time 0.01 :before-memory 0.75 :after-memory 0.76)
(:num-elim-vars 32)
(elim-uncnstr :num-exprs 7 :num-asts 1431 :time 0.00 :before-memory 0.75 :after-memory 0.76)
(:num-elim-apps 16)
(reduce-bv-size :num-exprs 7 :num-asts 1431 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(simplifier :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(simplifier :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(max-bv-sharing :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(ackermannize_bv :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(bv1-blaster :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(smt.tactic start)
(smt.propagate-values)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt :num-exprs 0 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.77)
sat
$ mv .z3-trace z3-trace-det.log
$ z3-debug -tr:th_rewriter_step -tr:simplifier -v:10 query-nondet.smt2
(simplifier :num-exprs 271 :num-asts 1570 :time 2.07 :before-memory 0.73 :after-memory 0.81)
(propagate-values :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.77 :after-memory 0.79)
(solve_eqs :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(elim-uncnstr :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(reduce-bv-size :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(simplifier :num-exprs 269 :num-asts 1494 :time 0.19 :before-memory 0.79 :after-memory 0.80)
(max-bv-sharing :num-exprs 269 :num-asts 1458 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(ackermannize_bv :num-exprs 269 :num-asts 3011 :time 3.12 :before-memory 0.79 :after-memory 1.09)
(smt.tactic start)
(smt.propagate-values)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.simplifying-clause-set :num-deleted-clauses 35)
(smt.simplifying-clause-set :num-deleted-clauses 710)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt :num-exprs 0 :num-asts 2399 :time 0.11 :before-memory 1.09 :after-memory 1.09)
sat
$ mv .z3-trace z3-trace-nondet.log

In the respective first lines output by each z3-debug invocation above, we can see that already after the first major step (“simplifier”), the number of expressions differs considerably (271 for queries without KDAlloc vs 203 with KDAlloc). We use the -tr:th_rewriter_step option to trace the simplification steps performed by the so-called theory rewriter, which runs as part of the simplifier. We also add -tr:simplifier for orientation in the trace (ensuring that the rewriting steps are indeed performed as part of the first simplifier invocation) and -v:10 to set the verbosity (for the output shown above). The trace written by z3-debug is stored in a file called .z3-trace, which we have to back up to a different location before invoking z3-debug again. Our original files are provided in /project/queries/expand_dfs/query-0533316-trace.

We now look at the two trace files, z3-trace-det.log and z3-trace-nondet.log. While we primarily study the former to understand how the query was optimised, we use the latter to find out where the traces differ. We do this by looking at the diff of the two files side-by-side. In the docker container, one can do this using diff -y z3-trace-det.log z3-trace-nondet.log | less, although using a graphical diff viewer might be more comfortable.

The first difference (in the following, we only show excerpts from z3-trace-det.log) occurs at

-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
extract
(let ((a!1 (ite (and (= (select A-data-stat #x00000008) #x00)
                     (= (select A-data-stat #x00000009) #x00)
                     (= (select A-data-stat #x0000000a) #x00)
                     (= (select A-data-stat #x0000000b) #x00)
                     (= (select A-data-stat #x0000000c) #x00)
                     (= (select A-data-stat #x0000000d) #x00)
                     (= (select A-data-stat #x0000000e) #x00)
                     (= (select A-data-stat #x0000000f) #x00))
                #x0000000000000000
                #x0000224400000000)))
  (bvadd #xffffddbc00000000 a!1))
---------->
(let ((a!1 (ite (and (= (select A-data-stat #x00000008) #x00)
                     (= (select A-data-stat #x00000009) #x00)
                     (= (select A-data-stat #x0000000a) #x00)
                     (= (select A-data-stat #x0000000b) #x00)
                     (= (select A-data-stat #x0000000c) #x00)
                     (= (select A-data-stat #x0000000d) #x00)
                     (= (select A-data-stat #x0000000e) #x00)
                     (= (select A-data-stat #x0000000f) #x00))
                #x0000000000000000
                #x0000224400000000)))
  (bvadd ((_ extract 31 0) #xffffddbc00000000) ((_ extract 31 0) a!1)))

where a!1 translates to (Select w64 C 0x00 0x224400000000) in KQuery (Select is called ite here and widths are implicit). With the enclosing (Extract w32 0 ...), the query fragment transformed here corresponds to the example from Figure 7 in the paper. The step shown in the trace can be read as extract Q ----------> Q', where extract is the operation that is being rewritten; Q is its operand before the transformation, and Q' is the resulting query fragment. Thus, (Extract w32 0 Q) is rewritten to Q'. From this, we can see that the rewriting step above corresponds to the first step displayed in Figure 7: The Extract is pushed down to the operands of the Add (here: bvadd) operation.

The following entry in the trace

------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#xffffddbc00000000
---------->
#x00000000

corresponds to the first half of the second step: (Extract w32 0 0xFFFFDDBC00000000) -> 0x00. Then (for the second half), another Extract is pushed down, this time into the Select (here: ite):

------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
extract
(ite (and (= (select A-data-stat #x00000008) #x00)
          (= (select A-data-stat #x00000009) #x00)
          (= (select A-data-stat #x0000000a) #x00)
          (= (select A-data-stat #x0000000b) #x00)
          (= (select A-data-stat #x0000000c) #x00)
          (= (select A-data-stat #x0000000d) #x00)
          (= (select A-data-stat #x0000000e) #x00)
          (= (select A-data-stat #x0000000f) #x00))
     #x0000000000000000
     #x0000224400000000)
---------->
(ite (and (= (select A-data-stat #x00000008) #x00)
          (= (select A-data-stat #x00000009) #x00)
          (= (select A-data-stat #x0000000a) #x00)
          (= (select A-data-stat #x0000000b) #x00)
          (= (select A-data-stat #x0000000c) #x00)
          (= (select A-data-stat #x0000000d) #x00)
          (= (select A-data-stat #x0000000e) #x00)
          (= (select A-data-stat #x0000000f) #x00))
     ((_ extract 31 0) #x0000000000000000)
     ((_ extract 31 0) #x0000224400000000))

We can see that the query fragment after the transformation translates to (Select w64 C (Extract w32 0 0x00) (Extract w32 0 0x224400000000)), where C stands for the and condition.

Afterwards, the two Extract operations are rewritten, corresponding to the third step in Figure 7:

------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#x0000000000000000
---------->
#x00000000
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#x0000224400000000
---------->
#x00000000

Finally, we arrive at the most important step, removing the Select over a non-trivial condition C:

------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
if
(and (= (select A-data-stat #x00000008) #x00)
     (= (select A-data-stat #x00000009) #x00)
     (= (select A-data-stat #x0000000a) #x00)
     (= (select A-data-stat #x0000000b) #x00)
     (= (select A-data-stat #x0000000c) #x00)
     (= (select A-data-stat #x0000000d) #x00)
     (= (select A-data-stat #x0000000e) #x00)
     (= (select A-data-stat #x0000000f) #x00))
#x00000000
#x00000000
---------->
#x00000000

We have seen how the steps performed by z3 simplify the query when KDAlloc is used, as shown in Figure 7 of the paper. To relate the steps shown in Figure 8 (the nondet variant of the query) to the corresponding z3 trace, we can follow the steps above, albeit looking at the other side of the diff. We can see that z3 performs the same initial steps as above, but cannot and, indeed, does not simplify (Select w64 C 0x00 0x58D6F400), since 0x00 != 0x58D6F400.
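The rewrite chain above can be replayed on a toy expression representation (a sketch of the rewriting idea, not z3's code; the ite condition is kept opaque, and the residual constant 0x58D6F400 is taken from the trace):

```python
# Expressions are nested tuples: ("const", value), ("bvadd", a, b) and
# ("ite", cond, then, else); lo32 models Extract[31:0] on 64-bit values.

def lo32(value):
    return value & 0xFFFFFFFF

def fold(e):
    """Local simplifications: ite with equal branches, constant adds."""
    if e[0] == "ite" and e[2] == e[3]:                     # ite C x x -> x
        return e[2]
    if e[0] == "bvadd" and e[1][0] == e[2][0] == "const":
        return ("const", lo32(e[1][1] + e[2][1]))
    return e

def push_extract(e):
    """Push Extract[31:0] through bvadd and ite, folding on constants."""
    if e[0] == "const":
        return ("const", lo32(e[1]))
    if e[0] == "bvadd":
        return fold(("bvadd", push_extract(e[1]), push_extract(e[2])))
    if e[0] == "ite":
        return fold(("ite", e[1], push_extract(e[2]), push_extract(e[3])))
    raise ValueError(e[0])

C = ("cond", "A-data-stat[8..15] all zero")  # non-trivial, kept opaque

# Deterministic variant: both low halves are zero, everything folds away.
det = ("bvadd", ("const", 0xFFFFDDBC00000000),
                ("ite", C, ("const", 0x0), ("const", 0x0000224400000000)))
print(push_extract(det))     # ('const', 0)

# Non-deterministic variant: a nonzero low half keeps the ite alive,
# so the fragment does not collapse.
nondet = ("bvadd", ("const", 0xFFFFDDBC00000000),
                   ("ite", C, ("const", 0x0), ("const", 0x58D6F400)))
print(push_extract(nondet))
```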

Analysing the Raw Query Logs

The following describes how to get from the query-log runs we provided in our raw data to the 100 most impactful queries. This step is time and memory intensive. If you prefer to follow our analysis directly, please refer to “Analysing Time Differences of Individual Queries”.

We included the raw data from our query-log runs in compressed form, which can be extracted as follows:

$ cd /project/queries/expand_dfs/
$ tar xvf expand_dfs.tar.lzma
$ gzip -v -d expand_dfs_*/*queries.kquery.gz

The last step further decompresses two files for each variant (det with KDAlloc and nondet without): an all-queries.kquery and a solver-queries.kquery file. The former logs all queries issued by KLEE, while the latter contains only the queries that reached the solver. Thus, a query that can be answered by one of KLEE’s caches will only appear in the first, not in the second log file.

First, we confirm that the time disparity stems from the solver and is not due to caching. For this, we add up and compare the total query times, once for the all-queries.kquery and once for the solver-queries.kquery files.

$ cd /project/queries/
$ ./diff-queries-all.py expand_dfs/expand_dfs_det_211202132702/all-queries.kquery expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery
Total time elapsed: 3617.93361840200 seconds    expand_dfs/expand_dfs_det_211202132702/all-queries.kquery
Total time elapsed: 6750.46206326700 seconds    expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery
$ ./diff-queries-solver.py expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery
Total time elapsed: 3216.151834275 seconds    expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery
Total time elapsed: 6387.882213567 seconds    expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery

As we can see, the difference in all-queries between the det and nondet variants is indeed dominated by the solver time (solver-queries). Our diff-queries-all.py script also provides an option (--csv FILE) that generates a csv file comparing solver times query by query.

The most time-consuming step is matching the queries, that is, mapping constants that relate to non-deterministic addresses to their deterministic counterparts. We therefore separated this computation from the analysis using a script that creates an expand_dfs.pkl file, which is later used by the other scripts.
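The core idea behind such matching can be sketched as follows (an assumption about the approach, not the actual match-queries.py; the query strings and the address-constant heuristic are made up for illustration):

```python
import re

# Heuristic: treat very large decimal constants as address-derived.
ADDR = re.compile(r"\b\d{11,}\b")

def canonicalize(query):
    """Replace address-like constants with position-indexed
    placeholders so structurally identical queries compare equal."""
    mapping = {}
    def placeholder(match):
        const = match.group(0)
        mapping.setdefault(const, f"ADDR{len(mapping)}")
        return mapping[const]
    return ADDR.sub(placeholder, query), mapping

# Hypothetical query fragments in a KQuery-like notation:
det    = "(Eq 37383395344384 (Add w64 37383395344392 x))"
nondet = "(Eq 140063547461720 (Add w64 140063547461728 x))"

canon_det, map_det = canonicalize(det)
canon_nondet, map_nondet = canonicalize(nondet)
assert canon_det == canon_nondet       # same structure -> queries match
# Zipping both maps yields the nondet -> det constant correspondence.
pairs = dict(zip(map_nondet, map_det))
print(pairs)
```

Matching real logs is far more expensive because millions of queries must be canonicalized and paired up, which is why the result is cached in expand_dfs.pkl.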

$ cd /project/queries/
$ ./match-queries.py -da expand_dfs/expand_dfs_det_211202132702/all-queries.kquery -ds expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery -na expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery -ns expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery -o expand_dfs.pkl
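The core idea behind matching can be illustrated as follows: queries from the det and nondet runs differ only in the constants that encode heap addresses, so rewriting those constants to a common placeholder yields a key under which corresponding queries can be paired. This is a hypothetical sketch; the placeholder heuristic (treating any sufficiently large constant as an address) and the query strings are illustrative, and match-queries.py may use a more precise notion of address constants:

```python
# Hypothetical sketch of query matching: normalise address-like
# constants, then pair queries that agree on the normalised form.
# ASSUMPTION: addresses show up as constants of pointer-like magnitude.
import re

ADDR_RE = re.compile(r"\b\d{7,}\b")  # crude placeholder heuristic

def normalize(query: str) -> str:
    return ADDR_RE.sub("ADDR", query)

det = "(Eq 94824441710592 (ReadLSB w64 0 ptr))"
nondet = "(Eq 140737353906176 (ReadLSB w64 0 ptr))"
print(normalize(det) == normalize(nondet))  # True
```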

Afterwards, we want to look at the statistics of the matching process. We can do so using the just-generated expand_dfs.pkl file:

$ cd /project/queries/
$ ./show-matching-stats.py -i expand_dfs.pkl > expand_dfs-matching-stats.log

The output from this script provides the numbers listed in §6.3 in the paper.

Finally, we use another two-step process to extract the top 100 queries that we analysed in further detail. First, we extract expand_dfs-top100.pkl from expand_dfs.pkl:

$ cd /project/queries/
$ ./extract-top-difftime-queries.py -i expand_dfs.pkl -o expand_dfs-top100.pkl

Then, we use another script to write these queries into individual files:

$ cd /project/queries/
$ ./write-query-files.py -i expand_dfs-top100.pkl

The script write-query-files.py creates two directories, det and nondet, each containing 100 kquery files.
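The top-100 selection boils down to keeping the matched pairs with the largest solver-time difference. A minimal sketch of that step, with illustrative field names (not the ones used internally by extract-top-difftime-queries.py):

```python
# Sketch: keep the N matched query pairs with the largest absolute
# difference between det and nondet solver times.
# ASSUMPTION: each pair is a (det_time, nondet_time, query_id) tuple.
import heapq

def top_difftime(pairs, n=100):
    return heapq.nlargest(n, pairs, key=lambda p: abs(p[1] - p[0]))

pairs = [(1.0, 5.0, "q1"), (2.0, 2.1, "q2"), (0.5, 9.5, "q3")]
print([q for _, _, q in top_difftime(pairs, n=2)])  # ['q3', 'q1']
```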

SymLive (§6.6)

As with the mainline experiments, the set of benchmarks can be reduced in the configuration section of run-experiments.py in /project/symlive-experiments. However, the list of benchmarks is small; it can be found in tools.txt. To re-run the experiments and list the detected infinite loops, execute:

$ cd /project/symlive-experiments/runs
$ ../run-experiments.py
$ find . -name '*.infty.err'
./minimal_dfs_det/test000001.infty.err
...

Our results can be found in the /project/results directory. symlive_infty.txt lists the error files for the detected infinite loops, while symlive.stats contains the execution statistics (use, e.g., less -S to view). The column L marks detected infinite loops with a D. This column was added manually from the results in symlive_infty.txt, as this information is not available from KLEE directly.

Generating the Graphs from Raw Data (Figures 4-6)

The Docker image contains all the packages necessary to generate graphs based on the raw data.

The raw data for all five runs (run0.csv, ..., run4.csv) is in /project/results. An R-based script (evaluation_artifact.R) generates all the png files based on that raw data. The script is commented for better understanding (nano $(which evaluation_artifact.R)).

To execute the script, change into the directory /project/results and execute: evaluation_artifact.R. This should generate the following output:

$ cd /project/results
$ evaluation_artifact.R
[...]
[1] "27 Deterministic apps DFS: ar, asn1Decoding, basename, bc, cut, dircolors, dirname, echo, expand, false, fold, getlimits, logname, make, mkdir, pathchk, pinky, printenv, printf, pwd, tail, tr, true, uname, users, whoami, yes"
[1] "67 Non-Deterministic apps DFS: addr2line, base64, basenc, cksum, comm, cp, datamash, date, [, diff, elfedit, env, expr, factor, find, gawk, head, id, join, link, ln, ls, m4, magick, md5sum, mkfifo, mknod, mktemp, mv, nl, nm-new, numfmt, objdump, od, oggenc, osip, paste, pr, ranlib, readelf, readlink, realpath, sed, seq, sha256sum, shred, size, sqlite3, stat, strip-new, stty, sum, tac, tcpdump, tee, test, tiff, tiffdump, touch, tsort, tty, unexpand, uniq, unlink, uptime, who, xml"
[1] "23 Deterministic apps Rnd+Cov: ar, asn1Decoding, basename, bc, dirname, echo, false, getlimits, logname, make, nm-new, objdump, pinky, printenv, pwd, ranlib, strip-new, true, uname, users, who, whoami, yes"
[1] "71 Non-Deterministic apps Rnd+Cov:  addr2line, base64, basenc, cksum, comm, cp, cut, datamash, date, [, diff, dircolors, elfedit, env, expand, expr, factor, find, fold, gawk, head, id, join, link, ln, ls, m4, magick, md5sum, mkdir, mkfifo, mknod, mktemp, mv, nl, numfmt, od, oggenc, osip, paste, pathchk, pr, printf, readelf, readlink, realpath, sed, seq, sha256sum, shred, size, sqlite3, stat, stty, sum, tac, tail, tcpdump, tee, test, tiff, tiffdump, touch, tr, tsort, tty, unexpand, uniq, unlink, uptime, xml"
Warning messages:
1: Removed 24 rows containing missing values (geom_text_repel).
2: Removed 25 rows containing missing values (geom_text_repel).
3: Removed 23 rows containing missing values (geom_text_repel).
4: Removed 18 rows containing missing values (geom_text_repel).
5: Removed 17 rows containing missing values (geom_text_repel).
6: Removed 17 rows containing missing values (geom_text_repel).

Afterwards, several graphs will have been generated in the same directory:

$ ls *.png
scatterplot_execution_time_dfs_determ_TRUE.png     scatterplot_maxrss_rndcov_determ_TRUE.png
scatterplot_execution_time_rndcov_determ_TRUE.png  scatterplot_solver_time_dfs_determ_TRUE.png
scatterplot_maxrss_dfs_determ_TRUE.png             scatterplot_solver_time_rndcov_determ_TRUE.png

The filenames are of the pattern scatterplot_<variable>_<searcher>_determ_TRUE.png.

Valid values for <variable> are:

  • maxrss: Figure 4.
  • execution_time: Figure 5.
  • solver_time: Figure 6.

<searcher> is either dfs for the left side of each figure or rndcov for the right side of each figure.

Selecting Comparable Experiments

The R script above automatically removes experiments that are not comparable. As explained in the paper, we aim to evaluate the isolated effects of deterministic allocation on the performance of symbolic execution. Therefore, we require that all executions (with KDAlloc and the default allocator) take the same paths through the program so that every value computed and every solver query issued might only differ by memory addresses.

It is important to note that we cannot conclude anything from differing performance due to different exploration, as the slower path might, for example, have to solve more difficult queries due to being in a different part of the program, while the faster path might not even have to solve any queries at all.

We consider an experiment comparable when, under each search strategy, all ten runs (five for each allocator) show the same values for core statistics, e.g. the same number of instructions, covered instructions, allocations, queries, and external calls.
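This criterion amounts to checking that all runs of an application agree on a tuple of core statistics. A small sketch, using illustrative klee-stats-style column names (the R script applies the check to the csv files in /project/results):

```python
# Sketch of the comparability check: an application is comparable iff
# every run yields the identical tuple of core statistics.
# ASSUMPTION: stat names below are illustrative klee-stats-style names.
CORE_STATS = ("Instrs", "CoveredInst", "NAllocs", "NAllQueries", "NExtCalls")

def comparable(runs):
    """runs: list of dicts mapping stat names to values (ten per app)."""
    keys = {tuple(run[s] for s in CORE_STATS) for run in runs}
    return len(keys) == 1

runs = [{"Instrs": 100, "CoveredInst": 40, "NAllocs": 7,
         "NAllQueries": 12, "NExtCalls": 3}] * 10
print(comparable(runs))  # True
```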

Due to non-determinism, three resource-intensive applications had to be terminated early with KDAlloc: sqlite (dfs), tac (dfs), tcpdump (rndcov). Application-specific thresholds determined with the default allocator simply did not limit the execution as desired and KLEE ran out of memory while exploring different code sections. Another benchmark, oggenc (rndcov), hit an assertion in KDAlloc when it tried to allocate several Exabytes of memory due to an unconstrained malloc with symbolic size.

Allocator Statistics (§6.2)

In §6.2 we report several additional statistics about allocations and live objects. These results stem from a single experiment run with an enhanced statistics-tracking version of KLEE with KDAlloc. The raw data can be found in results/run_stats.csv. As the KLEE binary provided with the Docker container does not track those statistics, we provide a patch in patches/stats.patch that can be applied:

/project/src/klee-src$ patch -s -p1 < ../../patches/stats.patch

and then rebuild KLEE to obtain that version:

/project/src/klee-build$ make -j2

The relevant statistics are:

  • MaxNumAllocsPerState: max. number of allocations in a single state
  • MaxNumBytesAllocatedPerState: max. number of allocated bytes in a single state
  • MaxLiveAllocs/MaxLiveAllocsPerState: max. number of live allocations (in a single state)
  • MaxLiveBytesAllocated/MaxLiveBytesAllocatedPerState: max. number of live bytes (in a single state)

Generate Graphs for Other Statistics

To generate graphs for other variables of interest, just add the following lines to the final loop in evaluation_artifact.R:

    plot <-  data %>% plot_scatterplot(searcher, determ, `Time(s)`, "Time (s)", 0.1)
    file_name <- paste("scatterplot_execution_time_", searcher,  "_determ_", determ, sep ="")
    save_plot(file_name, plot)

Replace Time(s) with the variable of interest (e.g. Instrs) and adjust the file-name prefix scatterplot_execution_time_ accordingly (e.g. scatterplot_instructions_). Please see klee-stats --help for a description of the available variables.

MoKlee Setup

The experiments from this point on require the kdalloc/moklee docker image. To import it and start a container, please execute the following commands:

$ docker load -i kdalloc_moklee.tgz
$ docker run -it kdalloc/moklee /bin/bash

Although this document should be self-contained, we refer to MoKlee’s artifact page for additional information on MoKlee.

MoKlee Experiments (§6.5)

In §6.5 we compare the default allocator against KDAlloc in the context of MoKlee and show that KDAlloc improves the determinism of execution, an important requirement for memoised symbolic execution.

The results for our experiments can be found in /project/results. All runs are in their respective .stats files, e.g. coreutils_dfs_det.stats for the Coreutils benchmark suite with the DFS searcher during recording and KDAlloc as allocator (det). These files contain tables that can be viewed in the terminal (e.g. less -S) or converted to .csv files (stats2csv). Most important for the MoKlee experiment is the SDiv column, which counts the divergences encountered during re-execution runs with (resume) or without (replay) path-pruning.

When comparing two different implementations, it is important to compare only runs that explored a program under test in the same way (with a high probability). To this end, we only compared those cases where recording runs for both allocators had the same statistics for executed instructions (Instrs), early terminated states due to memory pressure (KStates), covered instructions (CoveredInst), number of allocations (NAllocs), number of queries (NAllQueries), and external calls (NExtCalls). The list of comparable applications can be found in comparable_<searcher>.csv and viewed with e.g. csvlook.

We observed only a few divergences in comparable runs, which are listed in comparable_divergences_<searcher>.csv. nohup and shred diverged with both allocators; du, however, diverged only with the default allocator. MoKlee logs divergences to a warnings.txt file in its output directory, and in the case of du it showed divergences in the following locations:

KLEE: WARNING: DIVERGENCE: 332,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40
KLEE: WARNING: DIVERGENCE: 331,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40
KLEE: WARNING: DIVERGENCE: 350,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40

This means that three states (331, 332, 350) diverged at bitcode line 50282, in the memmove function located at line 40 of memmove.c. In contrast to the default allocator, KDAlloc’s deterministic allocation is able to prevent this divergence. Please note that, although MoKlee reports three different divergences, we consider it a single location of divergence (see Table 4).
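The per-location counting can be sketched directly from the log format above: several states may diverge at the same place, so we count distinct (bitcode line, source file, source line) triples rather than individual warnings. A minimal sketch (the actual evaluation scripts may aggregate differently):

```python
# Sketch: count distinct divergence locations in MoKlee's warnings.txt.
# Log format (from the example above): DIVERGENCE: state,asmline,file,srcline
import re

DIV_RE = re.compile(r"DIVERGENCE: (\d+),(\d+),([^,]+),(\d+)")

def divergence_locations(lines):
    locs = set()
    for line in lines:
        if m := DIV_RE.search(line):
            _state, asm, src_file, src_line = m.groups()
            locs.add((asm, src_file, src_line))  # ignore the state id
    return locs

log = [
    "KLEE: WARNING: DIVERGENCE: 332,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
    "KLEE: WARNING: DIVERGENCE: 331,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
    "KLEE: WARNING: DIVERGENCE: 350,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
]
print(len(divergence_locations(log)))  # 1
```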

In Table 4 we give the number of locations for divergences in uncomparable runs for the different benchmark suites. We included the list of locations in .csv files with the naming scheme: <benchmark>_uncomparable_divergences_<searcher>_<allocator>.csv. Each row contains the location in the bitcode file (asm line) and the corresponding line (src line) in the source file (src file).

Reproducing Our Results

To reproduce our results, we offer two scripts in /project/experiments. Unmodified, run-experiment.sh repeats all recording/re-execution runs with both allocators. As with the mainline experiments, we highly recommend reducing the number of benchmarks in its configuration section at the top, not only because the experiments take several days on a single machine but also because MoKlee’s persistent execution trees occupy 418 GiB of storage with that many applications. For the sake of simplicity, we removed the categorisation into benchmark suites and put all applications in a single list.

After running the experiments, the evaluate-experiments.sh script can be used to generate the relevant data. It generates the following files:

  • comparable_<searcher>.csv: list of comparable runs
  • comparable_divergences_<searcher>.csv: number of divergences for both allocators among comparable runs
  • uncomparable_divergences_<searcher>_<allocator>.csv: number of divergences per allocator (similar to Table 4 but without the categorisation into benchmark suites)

DOI