We provide the three KLEE-based tools (mainline KLEE, SymLive and MoKlee) used in the paper in two Docker images.
For the majority of our experiments, we provide the open-source, KDAlloc-enhanced versions of mainline KLEE and SymLive in the first Docker image, `kdalloc/main`.
For the MoKlee experiments described in §6.4 of the paper, we provide a separate Docker image, `kdalloc/moklee`.
Thus, the majority of the following deals with the `kdalloc/main` image.
The download links for the Docker/VM images will be made available soon!
The `kdalloc/main` Docker image is structured as follows:

- `/project/examples/`: source code for Listings 1 & 3
- `/project/mainline-experiments/`: script and parameters for mainline KLEE experiments
- `/project/patches/stats.patch`: patch for additional allocator statistics in §6.2
- `/project/queries/`: scripts and raw data used for the analysis in §6.3, Figures 7 & 8
- `/project/results/`: CSV files with the raw data used for Figures 4, 5 & 6; results from §6.5
- `/project/root/bin/`: pre-compiled binaries such as `klee` and `symlive-klee` (part of `$PATH`)
- `/project/src/klee-src/`: source code for mainline KLEE with KDAlloc
- `/project/src/symlive-src/`: source code for SymLive with KDAlloc
- `/project/symlive-experiments/`: script for SymLive experiments
- `/project/tools/bc/`: all benchmarks listed in Tables 1 & 2 as LLVM bitcode files

We have built and installed mainline KLEE and SymLive with KDAlloc in the Docker image; they are available as `klee` and `symlive-klee`, respectively.
For the MoKlee experiments in §6.4, we provide a separate Docker image, whose content is described in the corresponding section of this documentation. Please note that, as the source code for MoKlee is only available on request, we cannot make it public through this artifact.
Our Docker images provide comprehensive support for repeating all experiments from the paper. In particular, we provide the source code of mainline KLEE and SymLive (`kdalloc/main` image) and MoKlee (`kdalloc/moklee` image) enhanced with KDAlloc, as well as the scripts, benchmarks and raw data needed to repeat the experiments from §6.2, §6.3 and §6.5 (`kdalloc/main` image) and from §6.4 and Table 3 (`kdalloc/moklee` image).

We provide the implementation for KDAlloc, along with its integration into mainline KLEE (and SymLive), to enable easy reuse and further modification.
The source code for KDAlloc can be found in `/project/src/klee-src/include/pseudoalloc` as part of the `kdalloc/main` image.
KDAlloc is a header-only library that can be reused for any other project, although a small FFI wrapper might become necessary for projects not written in modern C++.
In our KLEE-based use cases, we use `klee::ref` for reference counting, but if the macro `PSEUDOALLOC_KLEE_REF` is undefined or defined to `0`, it will use `std::shared_ptr` instead.
In addition to simply reusing the whole allocator as-is, it is also configurable and modular, allowing it to be adapted to similar but different use cases.
For example, bin cutoffs can be configured in `allocator.h` by changing the elements of `allocator_t::control_t::_meta` in the `pseudoalloc::allocator` namespace (lines 34-43).
More freedom can be achieved by changing how the `allocator_t` member functions dispatch calls to the suballocators.
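To illustrate the dispatch, size-based bin selection can be sketched as a search over a cutoff list. The cutoff values below are made up for illustration and do not match the actual contents of `_meta`:

```python
import bisect

# Hypothetical bin cutoffs (in bytes); the real values live in
# allocator_t::control_t::_meta in allocator.h and differ from these.
BIN_CUTOFFS = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def pick_suballocator(size: int) -> str:
    """Dispatch a request to the smallest bin that fits, or to the
    large-object allocator if no bin is large enough."""
    i = bisect.bisect_left(BIN_CUTOFFS, size)
    if i == len(BIN_CUTOFFS):
        return "large-object allocator"
    return f"slot allocator (bin {BIN_CUTOFFS[i]} B)"
```

Changing the cutoff list (or the dispatch function itself) is exactly the kind of adaptation the modular design is meant to support.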
The suballocators themselves follow a fairly simple API that is somewhat complicated by sharing static data in a “control” structure that can store data that does not change in between forks (e.g., the size of the bin and its location in memory).
The behaviour of the slot allocator (§4.4 and §5.4 in the paper) resides in `suballocators/slot_allocator.h`.
The large object allocator (§4.5 and §5.5 in the paper) is implemented in two different variants that can be switched by defining either `PSEUDOALLOC_MAP_LOH` or `PSEUDOALLOC_COW_TREAP_LOH` (see also `suballocators/split_region_allocator.h`):

- `PSEUDOALLOC_COW_TREAP_LOH` (enabled in our experiments) uses a custom treap with node-level (i.e., subtreap-level) reference-counted copy-on-write. See also `suballocators/split_region_allocator/cow_treap_based.h` for the suballocator and `sized_regions.h` (same directory) for the treap itself.
- `PSEUDOALLOC_MAP_LOH` (disabled in our experiments) uses two `std::map`s to achieve a similar goal, but is much slower and does not include node-level copy-on-write.

To see how we integrated KDAlloc into mainline KLEE, please refer to `/project/src/klee-src/lib/Core/MemoryManager.cpp`, where we essentially replaced KLEE's original deterministic allocator with ours.
The mechanism for external function calls (§5.3 in the paper) is mainly implemented in `Executor::callExternalFunction` in `/project/src/klee-src/lib/Core/Executor.cpp`.
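The node-level copy-on-write used by the treap variant can be pictured with path copying in a plain binary search tree: an update copies only the nodes on the path to the change and shares every other subtree with the previous version. This sketch is our own illustration and does not model treap priorities, reference counting or region bookkeeping:

```python
# Path-copying (copy-on-write) in a binary search tree: an insert builds
# a new root whose untouched subtrees are shared with the old version.

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def insert(root, key):
    """Return a new version of the tree; only the path to `key` is copied."""
    if root is None:
        return Node(key)
    if key < root.key:
        return Node(root.key, insert(root.left, key), root.right)
    return Node(root.key, root.left, insert(root.right, key))

v1 = insert(insert(insert(None, 8), 4), 12)
v2 = insert(v1, 2)           # new version; only the path 8 -> 4 is copied
assert v2.right is v1.right  # the subtree rooted at 12 is shared
```

For a symbolic-execution state fork, this means both states can keep their own "version" of the large-object metadata while physically sharing almost all of it.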
In addition to our work, the artifact also contains, e.g., containers built on full Linux base images and sources from other open-source projects under different licenses; the parts that constitute our work (this document and our allocator) are available under CC-BY 4.0 International.
Our experiments require up to about 16 GB of main memory; some additional headroom is suggested.
A working Docker installation is required (alternatives like Podman may also work, but have not been tested by us).
Additionally, transparent huge page (THP) support must be set to `madvise` or `never`:
$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
$ echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled >/dev/null
$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
If your kernel is set to `always` use THP and you are unable or unwilling to change this setting, please use the provided virtual machine instead.
The two provided docker images can then be loaded:
$ docker load -i kdalloc_main.tgz
$ docker load -i kdalloc_moklee.tgz
Import `vm.ova` into your virtualisation tool of choice (tested with VMware Workstation and VirtualBox).
The VM is set to 12 GiB memory, which is enough to run most, but not all of our experiments.
You are therefore encouraged to increase it to at least 24 GiB if your system can sustain it; decreasing it down to 4 GiB will significantly reduce the number of experiments that can be run, but may still be viable to ascertain the validity of the approach.
The user `user` has the password `password` and SSH is enabled.
As this is fairly insecure, please do not expose the VM to an unsecured network without disabling SSH first.
To start a `kdalloc/main` container, please execute the following command:
$ docker run -it kdalloc/main /bin/bash
This Docker image contains the great majority of our experiments. Only the MoKlee experiments require the second image, whose setup we describe just before it is needed.
After completing the setup described just above, we suggest running both of the examples (taken from Listings 1 & 3 of the paper) listed below. Overall, they should take only a couple of minutes to complete, with the actual KLEE and SymLive runs taking just a few seconds in each case.
In §6.3 of the paper, we describe how we use KDAlloc to improve KLEE’s use-after-free detection. The example used in the paper (Listing 1) is included in our artifact. In the following, we provide the steps to repeat our experiments. Each involved KLEE call only takes a couple of seconds to complete.
# change into the example directory
$ cd /project/examples
# compare example with Listing 1 of the paper
$ cat uaf.c
# compilation step (creates uaf.bc)
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g \
-O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES \
-D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -c uaf.c
# run without KDAlloc (no error detected)
$ klee -libc=uclibc -posix-runtime uaf.bc
# completes without "KLEE: ERROR:" line
# run with KDAlloc
$ klee -allocate-determ=1 -libc=uclibc -posix-runtime uaf.bc
# [...]
KLEE: ERROR: libc/string/strlen.c:22: memory error: use after free
# [...]
# further inspect error (example)
$ cat klee-last/test000001.ptr.err
# [...]
Stack:
#000010621 in strlen (s=139100132610048) at libc/string/strlen.c:22
#100013524 in fputs_unlocked (s=139100132610048, stream=140063547461720) at libc/stdio/fputs.c:25
#200012678 in puts (s=139100132610048) at libc/stdio/puts.c:26
#300008828 in __klee_posix_wrapped_main () at uaf.c:14
# [...]
While the error is reported in `strlen`, inspection of the error using the `ptr.err` file reveals that this function is called by `puts` and corresponds to the first use of the pointer `s` after `free(s)`.
As we describe in the paper, if optimisation (`-optimize`) is enabled, KLEE can also find an error without KDAlloc:
$ klee -optimize -libc=uclibc -posix-runtime uaf.bc
# [...]
KLEE: ERROR: libc/string/strlen.c:22: memory error: out of bound pointer
# [...]
Note, however, that without KDAlloc the error is reported as “out of bound pointer” instead of “use after free”.
In §6.5 of the paper, we describe how the infinite loop in the example from Listing 3 is only detected when SymLive uses KDAlloc, while it evades detection with the default allocator. If you wonder how the example works, please refer to §6.5. The example from the paper (Listing 3) is included in our artifact. In the following, we provide the steps to repeat our experiments. The two involved SymLive runs only take a couple of seconds to complete.
# change into the example directory
$ cd /project/examples
# compare example with Listing 3 of the paper
$ cat symlive.c
# compilation step (creates symlive.bc)
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g \
-O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES \
-D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -I/project/root/include/ \
-c symlive.c
# run with KDAlloc (detects infinite loop)
$ symlive-klee -posix-runtime -only-output-states-covering-new \
-external-calls=all -exit-on-error-type=InfiniteLoop \
-infinite-loop-detection-truncate-on-fork=0 \
-detect-infinite-loops=1 -allocate-determ-quarantine=0 \
-allocate-determ=1 symlive.bc
# [...]
KLEE: ERROR: symlive.c:11: infinite loop
# [...]
KLEE: done: completed paths = 2
# [...]
# run with default allocator (no infinite loop found)
$ symlive-klee -posix-runtime -only-output-states-covering-new \
-external-calls=all -exit-on-error-type=InfiniteLoop \
-infinite-loop-detection-truncate-on-fork=0 \
-detect-infinite-loops=1 -allocate-determ-quarantine=0 \
-allocate-determ=0 -max-time=10s symlive.bc
# [...]
# no "infinite loop" error detected
# [...]
# "completed paths" is significantly > 2, e.g. 230
# [...]
Both calls to `symlive-klee` differ in the last line (just before `symlive.bc`):

- `-allocate-determ`: set to `1` in the KDAlloc case and `0` for default allocation
- `-max-time=10s` is only present in the default-allocator case (due to the non-detection of the infinite loop)

The output shows that in the KDAlloc case, the infinite loop is detected and, due to `-exit-on-error-type=InfiniteLoop`, no further exploration is performed.
Thus, there are only two paths when SymLive terminates.
In the case of default allocation, we abort the exploration after 10 seconds, during which no infinite loop has been detected and significantly more paths were forked.
This corresponds to the result reported in §6.5 of the paper.
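To see why deterministic allocation matters here: SymLive looks for recurring program states, and a loop that frees and reallocates memory only revisits the same state if the allocator hands back the same address (with the quarantine disabled, as in the runs above). The following toy model is our own illustration, not SymLive's actual mechanism:

```python
import random

def run(alloc_addr, max_iters=100):
    """Simulate a loop that frees and reallocates one object per iteration
    and report after how many iterations a state recurs (None if never).
    The 'state' here is just the address held by the only live pointer."""
    seen = {}
    for i in range(max_iters):
        state = alloc_addr(i)        # address of the freshly allocated object
        if state in seen:
            return i - seen[state]   # cycle detected
        seen[state] = i
    return None

# Deterministic allocator: the freed slot is reused, the same address
# comes back, and the recurring state is found immediately.
deterministic = lambda i: 0x7f0000000000
assert run(deterministic) == 1

# Non-deterministic allocator: fresh (random) addresses on every
# iteration make each state look new, so no recurrence is detected.
rng = random.Random(0)
nondeterministic = lambda i: rng.getrandbits(48)
assert run(nondeterministic) is None
```

This mirrors the observed behaviour: with KDAlloc the loop is flagged after two paths, while with default allocation exploration keeps forking new, seemingly distinct states until the time limit.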
Before we describe scripts that include the necessary KLEE parameters, we describe the flags that we added to control KDAlloc:

- `-allocate-determ=1` enables deterministic allocation using KDAlloc (default=0: disabled)
- `-allocate-determ-quarantine=<size>` adjusts the size of the quarantine queues (default=8, disabled=0, unlimited=-1)

Each `<segment>` size (in GiB) and (page-aligned) start address can be set using `-allocate-determ-<segment>-size` and `-allocate-determ-<segment>-start-address`, where `<segment>` is any of `constants`, `globals`, `heap` and `stack`.
For the respective default sizes, please refer to Figure 2 of the paper.
In our experimental runs, we use the default sizes, but specify fixed start addresses to ensure cross-run deterministic behaviour:
--allocate-determ-constants-start-address=35184372088832
--allocate-determ-globals-start-address=36283883716608
--allocate-determ-heap-start-address=37383395344384
--allocate-determ-stack-start-address=38482906972160
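Conceptually, the quarantine controlled by `-allocate-determ-quarantine` delays the reuse of freed allocations by passing them through a bounded FIFO. The sketch below is our own illustration of that idea, not KDAlloc's actual implementation:

```python
from collections import deque

class Quarantine:
    """Bounded FIFO that delays the reuse of freed slots.  With size 0 a
    freed slot is handed back immediately; with size n it is only handed
    back after n further frees have been pushed through; with a negative
    size (unlimited) freed slots are never handed back."""
    def __init__(self, size: int):
        self.size = size
        self.queue = deque()

    def free(self, slot):
        """Quarantine `slot`; return a slot that may now be reused, or None."""
        if self.size == 0:
            return slot              # quarantine disabled: reuse at once
        self.queue.append(slot)
        if self.size > 0 and len(self.queue) > self.size:
            return self.queue.popleft()
        return None                  # still quarantined

q = Quarantine(size=2)
assert q.free("a") is None   # "a" enters quarantine
assert q.free("b") is None   # "b" enters quarantine
assert q.free("c") == "a"    # oldest entry becomes reusable again
```

A longer quarantine increases the chance of catching use-after-free bugs (freed addresses stay invalid longer), at the cost of address-space consumption.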
We only slightly modified KLEE by adding a few statistics and thresholds and implemented KDAlloc as an additional allocator.
The necessary files to re-run our experiments can be found in /project/mainline-experiments
:
- `dfs_instrs.csv` / `rndcov_instrs.csv`: the maximum number of instructions to execute for each benchmark and search strategy (dfs/rndcov). The values correspond to runs taking roughly 2 hrs on our machines.
- `nondet_dfs_covupdate.csv` / `nondet_rndcov_covupdate.csv`: the number of instructions after which the coverage data is updated for each benchmark.
- `maxstates.csv`: the maximum number of active states. We use a state-based limit instead of memory limits to increase determinism.
- `tools.txt`: the list of paths for all benchmarks.
- `run-experiments.py`: a script to re-run our experiments.

The `run-experiments.py` script starts a single run of experiments for all benchmarks.
By altering the `include_experiments` list in the configuration section of the script, the number of benchmarks can be reduced (use `/project/mainline-experiments/run-experiments.py -s` to check your settings).
We highly recommend doing that, as the full set requires more than 24 days on a single machine similar to ours.
Additionally, we include the same data that we used for the paper in `/project/results` (see "Generating the Graphs from Raw Data").
After setting up `/project/mainline-experiments/run-experiments.py`, the experiments can be started and the results retrieved:
$ mkdir /project/mainline-experiments/run0
$ cd /project/mainline-experiments/run0
$ ../run-experiments.py
$ klee-stats --print-all --table-format=csv . > ../results/run0.csv
# repeat with run1, run2, ..., run4
$ cd /project/mainline-experiments/results
$ evaluation_artifact.R
To analyse queries issued by KLEE, we performed separate runs of `expand` with the default allocator and with KDAlloc.
In each of these runs, we used the additional KLEE options `--use-query-log=all:kquery,solver:kquery` and `--compress-query-log`.
The first flag instructs KLEE to record all queries issued (`all`), as well as all queries dispatched to the solver (`solver`), in the KQuery format.
The second flag compresses the resulting files at runtime, which reduces overhead.
We provide the 100 queries that we manually inspected at `/project/queries/expand_dfs/top-100-difftime-det` and `/project/queries/expand_dfs/top-100-difftime-nondet` (with and without KDAlloc, respectively).
These files are sufficient to follow our insights on differences in query times resulting from the ability of the solvers to reduce `Select` expressions (see "Analysing Time Differences of Individual Queries").
However, we also provide a detailed description, all scripts and the raw data needed to extract the 100 most impactful queries (see "Analysing the Raw Query Logs").
We provide the 100 queries with the highest difference between default allocation and KDAlloc in `top-100-difftime-nondet` and `top-100-difftime-det`, respectively.
As we noted in the paper, all these top-100 queries are structurally very similar; only the constants and arrays differ between the two variants.
To see how much the execution times differ despite this similarity, we can run both variants of a query through `kleaver` with the `z3` solver backend.
To reliably measure the short execution times, we use hyperfine.
$ cd /project/queries/expand_dfs/
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=z3 top-100-difftime-nondet/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=z3 top-100-difftime-nondet/query-0533316.kquery
Time (mean ± σ): 34.0 ms ± 2.1 ms [User: 27.9 ms, System: 5.4 ms]
Range (min … max): 30.4 ms … 44.1 ms 200 runs
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=z3 top-100-difftime-det/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=z3 top-100-difftime-det/query-0533316.kquery
Time (mean ± σ): 18.5 ms ± 1.1 ms [User: 13.1 ms, System: 5.0 ms]
Range (min … max): 16.8 ms … 23.6 ms 200 runs
The mean time `z3` takes differs by a factor of 1.83 between the non-deterministic and the deterministic query.
We can observe a similar pattern (although with a different factor) for STP:
$ cd /project/queries/expand_dfs/
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=stp top-100-difftime-nondet/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=stp top-100-difftime-nondet/query-0533316.kquery
Time (mean ± σ): 23.0 ms ± 2.3 ms [User: 17.9 ms, System: 4.8 ms]
Range (min … max): 20.7 ms … 32.5 ms 200 runs
$ hyperfine --runs 200 --warmup 5 "kleaver -solver-backend=stp top-100-difftime-det/query-0533316.kquery"
Benchmark #1: kleaver -solver-backend=stp top-100-difftime-det/query-0533316.kquery
Time (mean ± σ): 10.1 ms ± 0.5 ms [User: 5.5 ms, System: 4.4 ms]
Range (min … max): 9.0 ms … 12.3 ms 200 runs
This disparity can be observed similarly for every other top-100 query.
To prepare a more detailed inspection, we manually minimised an example query (`query-0533316.kquery`), making sure not to change the solving-time ratio we just observed.
The resulting query only depends on the three arrays `A-data`, `A-data-stat` and `const_arr5`.
We provide the two resulting files in KQuery format (with `*.min.kquery` suffix).
The next step is to convert the query in both variants into the SMT-LIB 2 format, which is a lot more verbose but can be read by `z3` directly.
To analyse how `z3` treats the two query variants differently, we use a debug version of `z3` (provided as `z3-debug` in the `kdalloc/main` image).
Only such a debug build offers the additional tracing flags used for our analysis.
$ cd /project/queries/expand_dfs/
$ cp -v query-0533316-trace/*.min.kquery .
'query-0533316-trace/query-0533316-det.min.kquery' -> './query-0533316-det.min.kquery'
'query-0533316-trace/query-0533316-nondet.min.kquery' -> './query-0533316-nondet.min.kquery'
$ kleaver --print-smtlib query-0533316-det.min.kquery > query-det.smt2
$ kleaver --print-smtlib query-0533316-nondet.min.kquery > query-nondet.smt2
$ z3-debug -tr:th_rewriter_step -tr:simplifier -v:10 query-det.smt2
(simplifier :num-exprs 203 :num-asts 1289 :time 0.74 :before-memory 0.67 :after-memory 0.69)
(propagate-values :num-exprs 142 :num-asts 1265 :time 0.06 :before-memory 0.68 :after-memory 0.69)
(solve_eqs :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(elim-uncnstr :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(reduce-bv-size :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.69)
(simplifier :num-exprs 142 :num-asts 1265 :time 0.01 :before-memory 0.69 :after-memory 0.70)
(max-bv-sharing :num-exprs 142 :num-asts 1265 :time 0.00 :before-memory 0.69 :after-memory 0.70)
(ackermannize_bv :num-exprs 142 :num-asts 1403 :time 0.00 :before-memory 0.69 :after-memory 0.75)
(simplifier :num-exprs 106 :num-asts 1403 :time 0.01 :before-memory 0.75 :after-memory 0.75)
(propagate-values :num-exprs 106 :num-asts 1403 :time 0.00 :before-memory 0.75 :after-memory 0.75)
(solve_eqs :num-exprs 35 :num-asts 1371 :time 0.01 :before-memory 0.75 :after-memory 0.76)
(:num-elim-vars 32)
(elim-uncnstr :num-exprs 7 :num-asts 1431 :time 0.00 :before-memory 0.75 :after-memory 0.76)
(:num-elim-apps 16)
(reduce-bv-size :num-exprs 7 :num-asts 1431 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(simplifier :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(simplifier :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(max-bv-sharing :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(ackermannize_bv :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(bv1-blaster :num-exprs 3 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.76)
(smt.tactic start)
(smt.propagate-values)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt :num-exprs 0 :num-asts 1430 :time 0.00 :before-memory 0.76 :after-memory 0.77)
sat
$ mv .z3-trace z3-trace-det.log
$ z3-debug -tr:th_rewriter_step -tr:simplifier -v:10 query-nondet.smt2
(simplifier :num-exprs 271 :num-asts 1570 :time 2.07 :before-memory 0.73 :after-memory 0.81)
(propagate-values :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.77 :after-memory 0.79)
(solve_eqs :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(elim-uncnstr :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(reduce-bv-size :num-exprs 271 :num-asts 1439 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(simplifier :num-exprs 269 :num-asts 1494 :time 0.19 :before-memory 0.79 :after-memory 0.80)
(max-bv-sharing :num-exprs 269 :num-asts 1458 :time 0.00 :before-memory 0.79 :after-memory 0.79)
(ackermannize_bv :num-exprs 269 :num-asts 3011 :time 3.12 :before-memory 0.79 :after-memory 1.09)
(smt.tactic start)
(smt.propagate-values)
(smt.reduce-asserted)
(smt.maximizing-bv-sharing)
(smt.reduce-asserted)
(smt.flatten-clauses)
(smt.simplifier-done)
(smt.searching)
(smt.simplifying-clause-set :num-deleted-clauses 35)
(smt.simplifying-clause-set :num-deleted-clauses 710)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt.simplifying-clause-set :num-deleted-clauses 0)
(smt :num-exprs 0 :num-asts 2399 :time 0.11 :before-memory 1.09 :after-memory 1.09)
sat
$ mv .z3-trace z3-trace-nondet.log
In the respective first lines output by each `z3-debug` invocation above, we can see that already after the first major step ("simplifier"), the number of expressions differs considerably (271 for queries without KDAlloc vs. 203 with KDAlloc).
We use the `-tr:th_rewriter_step` option to trace the simplification steps performed by the so-called theory rewriter, which runs as part of the simplifier.
We also add `-tr:simplifier` for orientation in the trace (ensuring that the rewriting steps are indeed performed as part of the first simplifier invocation) and `-v:10` to set the verbosity (for the output shown above).
The trace written by `z3-debug` is stored in a file called `.z3-trace`, which we have to move to a different location before invoking `z3-debug` again.
Our original files are provided in `/project/queries/expand_dfs/query-0533316-trace`.
We now look at the two trace files, `z3-trace-det.log` and `z3-trace-nondet.log`.
While we primarily study the former to understand how the query was optimised, we use the latter to find out where the traces differ.
We do this by looking at a `diff` of the two files side-by-side.
In the Docker container, one can do this using `diff -y z3-trace-det.log z3-trace-nondet.log | less`, although a graphical diff viewer might be more comfortable.
The first difference (in the following, we only show excerpts from `z3-trace-det.log`) occurs at
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
extract
(let ((a!1 (ite (and (= (select A-data-stat #x00000008) #x00)
(= (select A-data-stat #x00000009) #x00)
(= (select A-data-stat #x0000000a) #x00)
(= (select A-data-stat #x0000000b) #x00)
(= (select A-data-stat #x0000000c) #x00)
(= (select A-data-stat #x0000000d) #x00)
(= (select A-data-stat #x0000000e) #x00)
(= (select A-data-stat #x0000000f) #x00))
#x0000000000000000
#x0000224400000000)))
(bvadd #xffffddbc00000000 a!1))
---------->
(let ((a!1 (ite (and (= (select A-data-stat #x00000008) #x00)
(= (select A-data-stat #x00000009) #x00)
(= (select A-data-stat #x0000000a) #x00)
(= (select A-data-stat #x0000000b) #x00)
(= (select A-data-stat #x0000000c) #x00)
(= (select A-data-stat #x0000000d) #x00)
(= (select A-data-stat #x0000000e) #x00)
(= (select A-data-stat #x0000000f) #x00))
#x0000000000000000
#x0000224400000000)))
(bvadd ((_ extract 31 0) #xffffddbc00000000) ((_ extract 31 0) a!1)))
where `a!1` translates to `(Select w64 C 0x00 0x224400000000)` in KQuery (`Select` is called `ite` here and widths are implicit).
With the enclosing `(Extract w32 0 ...)`, the query fragment transformed here corresponds to the example from Figure 7 in the paper.
The step shown in the trace can be read as `extract Q ----------> Q'`, where `extract` is the operation that is being rewritten, `Q` is its operand before the transformation, and `Q'` is the resulting query fragment.
Thus, `(Extract w32 0 Q)` is rewritten to `Q'`.
From this, we can see that the rewriting step above corresponds to the first step displayed in Figure 7: the `Extract` is pushed down to the operands of the `Add` (here: `bvadd`) operation.
The following entry in the trace
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#xffffddbc00000000
---------->
#x00000000
corresponds to the first half of the second step: `(Extract w32 0 0xFFFFDDBC00000000) -> 0x00`.
Then (for the second half), another `Extract` is pushed down, this time into the `Select` (here: `ite`):
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
extract
(ite (and (= (select A-data-stat #x00000008) #x00)
(= (select A-data-stat #x00000009) #x00)
(= (select A-data-stat #x0000000a) #x00)
(= (select A-data-stat #x0000000b) #x00)
(= (select A-data-stat #x0000000c) #x00)
(= (select A-data-stat #x0000000d) #x00)
(= (select A-data-stat #x0000000e) #x00)
(= (select A-data-stat #x0000000f) #x00))
#x0000000000000000
#x0000224400000000)
---------->
(ite (and (= (select A-data-stat #x00000008) #x00)
(= (select A-data-stat #x00000009) #x00)
(= (select A-data-stat #x0000000a) #x00)
(= (select A-data-stat #x0000000b) #x00)
(= (select A-data-stat #x0000000c) #x00)
(= (select A-data-stat #x0000000d) #x00)
(= (select A-data-stat #x0000000e) #x00)
(= (select A-data-stat #x0000000f) #x00))
((_ extract 31 0) #x0000000000000000)
((_ extract 31 0) #x0000224400000000))
We can see that the query fragment after the transformation translates to `(Select w64 C (Extract w32 0 0x00) (Extract w32 0 0x224400000000))` with `C` as the `and` operation.
Afterwards, the two `Extract` operations are rewritten, corresponding to the third step in Figure 7:
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#x0000000000000000
---------->
#x00000000
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:645 ---------
extract
#x0000224400000000
---------->
#x00000000
Finally, we arrive at the most important step, removing the `Select` over a non-trivial condition `C`:
------------------------------------------------
-------- [th_rewriter_step] reduce_app ../../z3/src/ast/rewriter/th_rewriter.cpp:627 ---------
if
(and (= (select A-data-stat #x00000008) #x00)
(= (select A-data-stat #x00000009) #x00)
(= (select A-data-stat #x0000000a) #x00)
(= (select A-data-stat #x0000000b) #x00)
(= (select A-data-stat #x0000000c) #x00)
(= (select A-data-stat #x0000000d) #x00)
(= (select A-data-stat #x0000000e) #x00)
(= (select A-data-stat #x0000000f) #x00))
#x00000000
#x00000000
---------->
#x00000000
We have seen how the steps performed by `z3` simplify the query when KDAlloc is used, as shown in Figure 7 of the paper.
To relate the steps shown in Figure 8 (the `nondet` variant of the query) to the corresponding `z3` trace, we can follow the steps above, albeit looking at the other side of the diff.
We can see that `z3` performs the same initial steps as above, but cannot and, indeed, does not simplify `(Select w64 C 0x00 0x58D6F400)`, since `0x00 != 0x58D6F400`.
The following describes how to get from the query-log runs we provided in our raw data to the 100 most impactful queries. This step is time- and memory-intensive. If you prefer to follow our analysis directly, please refer to "Analysing Time Differences of Individual Queries".
We included the raw data from our query-log runs in compressed form; it can be extracted as follows:
$ cd /project/queries/expand_dfs/
$ tar xvf expand_dfs.tar.lzma
$ gzip -v -d expand_dfs_*/*queries.kquery.gz
The last step further decompresses two files for each variant (`det` with KDAlloc and `nondet` without): an `all-queries.kquery` and a `solver-queries.kquery` file.
The former logs all queries issued by KLEE, while the latter only contains the queries that reached the solver.
Thus, a query that can be answered by one of KLEE's caches will only appear in the first, not in the second log file.
First, we confirm that the time disparity stems from the solver and is not due to caching.
For this, we add up and compare the total query times, once for the `all-queries.kquery` and once for the `solver-queries.kquery` files.
$ cd /project/queries/
$ ./diff-queries-all.py expand_dfs/expand_dfs_det_211202132702/all-queries.kquery expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery
Total time elapsed: 3617.93361840200 seconds expand_dfs/expand_dfs_det_211202132702/all-queries.kquery
Total time elapsed: 6750.46206326700 seconds expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery
$ ./diff-queries-solver.py expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery
Total time elapsed: 3216.151834275 seconds expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery
Total time elapsed: 6387.882213567 seconds expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery
As we can see, the difference in `all-queries` between the `det` and `nondet` variants is indeed dominated by the solver time (`solver-queries`).
Our `diff-queries-all.py` script also provides an option (`--csv FILE`) that generates a CSV file comparing solver times query-by-query.
The most time-consuming step is matching the queries, that is, mapping constants that relate to non-deterministic addresses to their deterministic counterparts.
We thus separated this computation from the analysis using a script that creates an `expand_dfs.pkl` file, which is later used by the other scripts.
$ cd /project/queries/
$ ./match-queries.py -da expand_dfs/expand_dfs_det_211202132702/all-queries.kquery -ds expand_dfs/expand_dfs_det_211202132702/solver-queries.kquery -na expand_dfs/expand_dfs_nondet_211202132702/all-queries.kquery -ns expand_dfs/expand_dfs_nondet_211202132702/solver-queries.kquery -o expand_dfs.pkl
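The actual matching logic lives in `match-queries.py`; as a rough illustration of the idea (the regex, function name and log handling below are ours, not the script's), one can pair up the det and nondet query streams and learn a mapping between address constants:

```python
import re

# Heuristic: "large" hex constants in a query are likely addresses.
ADDR = re.compile(r"0x[0-9a-fA-F]{9,}")

def learn_mapping(det_queries, nondet_queries):
    """Pair queries positionally and map nondet constants to det ones.
    Simplified sketch: the real script has to handle reordered queries,
    differing query counts and ambiguous constants."""
    mapping = {}
    for det_q, nondet_q in zip(det_queries, nondet_queries):
        det_consts = ADDR.findall(det_q)
        nondet_consts = ADDR.findall(nondet_q)
        if len(det_consts) != len(nondet_consts):
            continue                  # structure differs: skip this pair
        for n, d in zip(nondet_consts, det_consts):
            mapping.setdefault(n, d)
    return mapping

det = ["(Eq 0x220000000000 (Read w64 A-data))"]
nondet = ["(Eq 0x7f58d6f40000 (Read w64 A-data))"]
assert learn_mapping(det, nondet) == {"0x7f58d6f40000": "0x220000000000"}
```

With such a mapping, a nondet query can be rewritten into its det counterpart so that the two runs' solver times can be compared query-by-query.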
Afterwards, we want to look at the statistics of the matching process.
We can do so using the just-generated `expand_dfs.pkl` file:
$ cd /project/queries/
$ ./show-matching-stats.py -i expand_dfs.pkl > expand_dfs-matching-stats.log
The output from this script provides the numbers listed in §6.3 in the paper.
Finally, we use another two-step process to extract the top 100 queries that we analysed in further detail.
First, we extract `expand_dfs-top100.pkl` from `expand_dfs.pkl`:
$ cd /project/queries/
$ ./extract-top-difftime-queries.py -i expand_dfs.pkl -o expand_dfs-top100.pkl
Then, we use another script to write these queries into individual files:
$ cd /project/queries/
$ ./write-query-files.py -i expand_dfs-top100.pkl
The script `write-query-files.py` creates two directories, `det` and `nondet`, each containing 100 `kquery` files.
Similarly to the mainline experiments, the set of benchmarks can be reduced in the configuration section of `run-experiments.py` in `/project/symlive-experiments`.
However, the list of benchmarks is small; it can be found in `tools.txt`.
To re-run the experiments and list detected infinite loops, execute:
$ cd /project/symlive-experiments/runs
$ ../run-experiments.py
$ find . -name '*.infty.err'
./minimal_dfs_det/test000001.infty.err
...
Our results can be found in the `/project/results` directory. `symlive_infty.txt` lists the error files for the detected infinite loops, whereas `symlive.stats` contains the execution statistics (use e.g. `less -S` to view). The column `L` marks detected infinite loops (`D`). This column was added manually from the results in `symlive_infty.txt` and is not available from KLEE directly.
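Deriving such a column boils down to mapping each `*.infty.err` file back to the run directory that produced it. A small sketch (the helper is ours, not part of the artifact; the paths mirror the `find` output above):

```python
import pathlib

def runs_with_infinite_loops(error_paths):
    """Map *.infty.err files to the run directories that produced them."""
    return sorted({pathlib.PurePosixPath(p).parent.name for p in error_paths})

paths = [
    "./minimal_dfs_det/test000001.infty.err",
    "./minimal_dfs_det/test000002.infty.err",
    "./other_dfs_det/test000001.infty.err",  # hypothetical second run
]
print(runs_with_infinite_loops(paths))  # -> ['minimal_dfs_det', 'other_dfs_det']
```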
The Docker image contains all the packages necessary to generate graphs based on the raw data.
The raw data for all five runs (`run0.csv`, ..., `run4.csv`) is in `/project/results`. An R-based script (`evaluation_artifact.R`) generates all the `png` files from that raw data. The script is commented for better understanding (`nano $(which evaluation_artifact.R)`). To execute it, change into the directory `/project/results` and run `evaluation_artifact.R`.
This should generate the following output:
$ cd /project/results
$ evaluation_artifact.R
[...]
[1] "27 Deterministic apps DFS: ar, asn1Decoding, basename, bc, cut, dircolors, dirname, echo, expand, false, fold, getlimits, logname, make, mkdir, pathchk, pinky, printenv, printf, pwd, tail, tr, true, uname, users, whoami, yes"
[1] "67 Non-Deterministic apps DFS: addr2line, base64, basenc, cksum, comm, cp, datamash, date, [, diff, elfedit, env, expr, factor, find, gawk, head, id, join, link, ln, ls, m4, magick, md5sum, mkfifo, mknod, mktemp, mv, nl, nm-new, numfmt, objdump, od, oggenc, osip, paste, pr, ranlib, readelf, readlink, realpath, sed, seq, sha256sum, shred, size, sqlite3, stat, strip-new, stty, sum, tac, tcpdump, tee, test, tiff, tiffdump, touch, tsort, tty, unexpand, uniq, unlink, uptime, who, xml"
[1] "23 Deterministic apps Rnd+Cov: ar, asn1Decoding, basename, bc, dirname, echo, false, getlimits, logname, make, nm-new, objdump, pinky, printenv, pwd, ranlib, strip-new, true, uname, users, who, whoami, yes"
[1] "71 Non-Deterministic apps Rnd+Cov: addr2line, base64, basenc, cksum, comm, cp, cut, datamash, date, [, diff, dircolors, elfedit, env, expand, expr, factor, find, fold, gawk, head, id, join, link, ln, ls, m4, magick, md5sum, mkdir, mkfifo, mknod, mktemp, mv, nl, numfmt, od, oggenc, osip, paste, pathchk, pr, printf, readelf, readlink, realpath, sed, seq, sha256sum, shred, size, sqlite3, stat, stty, sum, tac, tail, tcpdump, tee, test, tiff, tiffdump, touch, tr, tsort, tty, unexpand, uniq, unlink, uptime, xml"
Warning messages:
1: Removed 24 rows containing missing values (geom_text_repel).
2: Removed 25 rows containing missing values (geom_text_repel).
3: Removed 23 rows containing missing values (geom_text_repel).
4: Removed 18 rows containing missing values (geom_text_repel).
5: Removed 17 rows containing missing values (geom_text_repel).
6: Removed 17 rows containing missing values (geom_text_repel).
After that, several graphs will have been generated in the same directory:
$ ls *.png
scatterplot_execution_time_dfs_determ_TRUE.png scatterplot_maxrss_rndcov_determ_TRUE.png
scatterplot_execution_time_rndcov_determ_TRUE.png scatterplot_solver_time_dfs_determ_TRUE.png
scatterplot_maxrss_dfs_determ_TRUE.png scatterplot_solver_time_rndcov_determ_TRUE.png
The filenames follow the pattern `scatterplot_<variable>_<searcher>_determ_TRUE.png`.
Valid values for `<variable>` are:
- `maxrss`: Figure 4
- `execution_time`: Figure 5
- `solver_time`: Figure 6

`<searcher>` is either `dfs` for the left side of each figure or `rndcov` for the right side.
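For reference, the filename-to-figure mapping can be expressed as a small (hypothetical) helper:

```python
import re

# Mapping from the <variable> part of the filename to the paper's figures.
FIGURES = {"maxrss": "Figure 4", "execution_time": "Figure 5", "solver_time": "Figure 6"}
PATTERN = re.compile(
    r"scatterplot_(maxrss|execution_time|solver_time)_(dfs|rndcov)_determ_TRUE\.png"
)

def describe(filename):
    """Return which figure (and side) a generated png corresponds to."""
    m = PATTERN.fullmatch(filename)
    if not m:
        raise ValueError(f"unexpected filename: {filename}")
    variable, searcher = m.groups()
    side = "left" if searcher == "dfs" else "right"
    return f"{FIGURES[variable]}, {side} side"

print(describe("scatterplot_maxrss_dfs_determ_TRUE.png"))  # -> Figure 4, left side
```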
The R script above automatically removes experiments that are not comparable. As explained in the paper, we aim to evaluate the isolated effects of deterministic allocation on the performance of symbolic execution. Therefore, we require that all executions (with KDAlloc and the default allocator) take the same paths through the program so that every value computed and every solver query issued might only differ by memory addresses.
It is important to note that we cannot conclude anything from differing performance due to different exploration, as the slower path might, for example, have to solve more difficult queries due to being in a different part of the program, while the faster path might not even have to solve any queries at all.
We consider an experiment comparable when, under each search strategy, all ten runs (five for each allocator) show the same values for core statistics, e.g. the same number of instructions, covered instructions, allocations, queries, and external calls.
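This criterion can be sketched as a simple predicate over per-run statistics; the statistic names follow `klee-stats`, but the helper and the sample numbers are illustrative only:

```python
# Core statistics that must agree across all runs of an experiment.
CORE_STATS = ("Instrs", "CoveredInst", "NAllocs", "NAllQueries", "NExtCalls")

def comparable(runs):
    """True iff every run agrees with the first on all core statistics."""
    first = runs[0]
    return all(run[s] == first[s] for run in runs for s in CORE_STATS)

# Five runs per allocator with (hypothetical) identical statistics:
det = [{"Instrs": 100, "CoveredInst": 40, "NAllocs": 7,
        "NAllQueries": 5, "NExtCalls": 2} for _ in range(5)]
nondet = [dict(det[0]) for _ in range(5)]
print(comparable(det + nondet))  # -> True

nondet[0] = dict(det[0], NAllQueries=6)  # one diverging run
print(comparable(det + nondet))  # -> False
```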
Due to non-determinism, three resource-intensive applications had to be terminated early with KDAlloc: `sqlite` (dfs), `tac` (dfs) and `tcpdump` (rndcov).
Application-specific thresholds determined with the default allocator simply did not limit the execution as desired and KLEE ran out of memory while exploring different code sections.
Another benchmark, `oggenc` (rndcov), hit an assertion in KDAlloc when it tried to allocate several exabytes of memory due to an unconstrained `malloc` with a symbolic size.
In §6.2 we report several additional statistics about allocations and live objects.
These results stem from a single experiment run with an enhanced statistics-tracking version of KLEE with KDAlloc.
The raw data can be found in `results/run_stats.csv`.
As the KLEE binary provided with the Docker container does not track those statistics, we provide a patch in `patches/stats.patch`. Apply it with:
/project/src/klee-src$ patch -s -p1 < ../../patches/stats.patch
and rebuild KLEE to obtain that version:
/project/src/klee-build$ make -j2
The relevant statistics are:
- `MaxNumAllocsPerState`: max. number of allocations in a single state
- `MaxNumBytesAllocatedPerState`: max. number of allocated bytes in a single state
- `MaxLiveAllocs`/`MaxLiveAllocsPerState`: max. number of live allocations (in a single state)
- `MaxLiveBytesAllocated`/`MaxLiveBytesAllocatedPerState`: max. number of live bytes (in a single state)

To generate graphs for other variables of interest, just add the following lines to the final loop in `evaluation_artifact.R`:
plot <- data %>% plot_scatterplot(searcher, determ, `Time(s)`, "Time (s)", 0.1)
file_name <- paste("scatterplot_execution_time_", searcher, "_determ_", determ, sep ="")
save_plot(file_name, plot)
Replace `Time(s)` with the variable of interest (e.g. `Instrs`) and change `scatterplot_execution_time_` accordingly (e.g. to `scatterplot_instructions_`). Please use `klee-stats --help` for further descriptions of the variables.
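To clarify what the live-allocation statistics measure, this illustrative sketch replays a sequence of allocation events and tracks the peak number of simultaneously live bytes, analogous to `MaxLiveBytesAllocatedPerState` (the event encoding is our own, not KLEE's):

```python
def max_live_bytes(events):
    """events: list of (op, size) with op in {'alloc', 'free'};
    returns the peak number of simultaneously live bytes."""
    live = peak = 0
    for op, size in events:
        live += size if op == "alloc" else -size
        peak = max(peak, live)
    return peak

# Hypothetical allocation trace of a single state:
events = [("alloc", 64), ("alloc", 128), ("free", 64),
          ("alloc", 32), ("free", 128), ("free", 32)]
print(max_live_bytes(events))  # -> 192 (64 + 128 live at the same time)
```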
The `kdalloc/moklee` docker image is only needed for the experiments that follow.
To import it and start a container, please execute the following commands:
$ docker load -i kdalloc_moklee.tgz
$ docker run -it kdalloc/moklee /bin/bash
Although this document should be self-contained, we refer to MoKlee’s artifact page for additional information on MoKlee.
In §6.4 we compare the default allocator against KDAlloc in the context of MoKlee and show that KDAlloc improves determinism during execution, an important requirement for memoised symbolic execution.
The results for our experiments can be found in `/project/results`. All runs are in their respective `.stats` files, e.g. `coreutils_dfs_det.stats` for the Coreutils benchmark suite with the `DFS` searcher during recording and KDAlloc as the allocator (`det`). These files contain tables that can be viewed in the terminal (e.g. with `less -S`) or converted to `.csv` files (with `stats2csv`).
Most important for the MoKlee experiment is the `SDiv` column, which counts the divergences encountered during re-execution runs with (`resume`) or without (`replay`) path-pruning.
When comparing two different implementations, it is important to compare only runs that explored a program under test in the same way (with a high probability).
To this end, we only compared those cases where the recording runs for both allocators show the same statistics for executed instructions (`Instrs`), states terminated early due to memory pressure (`KStates`), covered instructions (`CoveredInst`), number of allocations (`NAllocs`), number of queries (`NAllQueries`), and external calls (`NExtCalls`).
The list of comparable applications can be found in `comparable_<searcher>.csv` and viewed with, e.g., `csvlook`. We observed only a few divergences in comparable runs; they are listed in `comparable_divergences_<searcher>.csv`.
`nohup` and `shred` diverged with both allocators; `du`, however, diverged only with the default allocator. MoKlee logs divergences to a `warnings.txt` file in its output directory, and in the case of `du` it reported divergences in the following locations:
KLEE: WARNING: DIVERGENCE: 332,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40
KLEE: WARNING: DIVERGENCE: 331,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40
KLEE: WARNING: DIVERGENCE: 350,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40
This means that three states (331, 332, 350) diverged at bitcode line 50282 in the `memmove` function, located at line 40 of `memmove.c`.
In contrast to the default allocator, KDAlloc’s deterministic allocation is able to prevent the divergence.
Please note that, although MoKlee reports three different divergences, we consider it a single location of divergence (see Table 4).
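This convention of counting locations rather than diverging states can be sketched as follows; the parsing assumes the `warnings.txt` format shown above, and the helper itself is hypothetical:

```python
def divergence_locations(warnings):
    """Extract unique (asm line, file, src line) locations from MoKlee
    warnings of the form
    'KLEE: WARNING: DIVERGENCE: <state>,<asm line>,<file>,<src line>'."""
    locations = set()
    for line in warnings:
        state, asm_line, src_file, src_line = \
            line.split("DIVERGENCE: ", 1)[1].split(",")
        locations.add((asm_line, src_file, src_line))
    return locations

warnings = [
    "KLEE: WARNING: DIVERGENCE: 332,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
    "KLEE: WARNING: DIVERGENCE: 331,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
    "KLEE: WARNING: DIVERGENCE: 350,50282,/tmp/klee-uclibc-38/libc/string/memmove.c,40",
]
print(len(divergence_locations(warnings)))  # -> 1 (three states, one location)
```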
In Table 4 we give the number of locations for divergences in uncomparable runs for the different benchmark suites.
We included the list of locations in `.csv` files with the naming scheme `<benchmark>_uncomparable_divergences_<searcher>_<allocator>.csv`. Each row contains the location in the bitcode file (`asm line`) and the corresponding line (`src line`) in the source file (`src file`).
To reproduce our results, we offer two scripts in `/project/experiments`. Unmodified, `run-experiment.sh` repeats all recording/re-execution runs with both allocators. As with the mainline experiment, we highly recommend reducing the number of benchmarks in the configuration section at the top: not only do the experiments take several days on a single machine, but MoKlee’s persistent execution trees also occupy 418 GiB of storage with that many applications.
For the sake of simplicity, we removed the categorisation into benchmark suites and put all applications in a single list.
After running the experiments, the `evaluate-experiments.sh` script can be used to generate the relevant data. It generates the following files:
- `comparable_<searcher>.csv`: list of comparable runs
- `comparable_divergences_<searcher>.csv`: number of divergences for both allocators among comparable runs
- `uncomparable_divergences_<searcher>_<allocator>.csv`: number of divergences per allocator (similar to Table 4 but without the categorisation into benchmark suites)