Setup

  1. Install Docker based on your system configuration: Get Docker.
  2. Download the artefact: moklee_artefact.tgz.
  3. Import the artefact into Docker: docker load --input moklee_artefact.tgz
  4. Run the Docker image: docker run -it --ulimit nofile=1000000:1000000 moklee /bin/bash

This should work in most scenarios. However, in our setup we use:

docker run -it --ulimit nofile=1000000:1000000 --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined moklee /bin/bash

Be aware that --security-opt ... and --privileged remove some security restrictions and allow access to /dev on the host!

Structure

MoKlee, our memoization framework, builds on KLEE, a symbolic execution engine for C and C++. It is based on KLEE 1.4 with multiple changes back-ported from version 2.0. For simplicity, we provide MoKlee as a pre-compiled binary. However, we will propose our changes for incorporation into upstream KLEE.

MoKlee is compiled and linked against the following components:

Cap’n Proto 0.6.1
CVC4 1.7
LLVM 3.8.1
Snappy 1.1.7
SQLite 3.29.0
STP 2.3.3
TCMalloc 2.7
Z3 4.8.4

Although this document should be self-contained, additional resources can be found on the KLEE website.

We placed all of the relevant information in the /project directory:

  • evaluation/ our experimental data and the scripts to compute the figures from the paper
  • experiments/ the scripts to reproduce our experiments
  • binutils/, coreutils/, diffutils/, findutils/, grep/, libspng/, tcpdump/ the benchmark applications with
    • versions documented in /project/VERSIONS,
    • configuration flags (config), patches (patches), and bitcode files (bc/), and
    • the driver.c for libspng

To keep the Docker image small, we unfortunately include neither Clang nor wllvm. If you want to run your own benchmarks, you need to compile them with Clang 3.8 into bitcode files and use docker cp to copy them into the container.
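
For a single-file benchmark, this could look as follows (a sketch, run on the host rather than inside the container; mybench.c is a placeholder, and we assume Clang 3.8 is available as clang-3.8):

clang-3.8 -emit-llvm -c -g -O0 mybench.c -o mybench.bc
docker cp mybench.bc <container>:/project/mybench.bc

Here, <container> is the name or ID of the running container (see docker ps). For projects with multiple translation units, wllvm can be used on the host to obtain a single whole-program bitcode file.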

Modifications

MoKlee modifies KLEE in numerous ways. Some notable changes are:

  • improved coverage tracking
  • extended RandomPath search heuristic that supports state subsets
  • BFS search based on tree depth, not forking depth (differs when KLEE/MoKlee branch multiple times per statement; e.g. switch)
  • stricter -max-depth handling (KLEE continues branching for some branch types)
  • stricter solver timeouts (KLEE occasionally disables the timeout)
  • timeout for external calls
  • CVC4 support
  • improved Z3 integration

Using MoKlee

To execute MoKlee, start the Docker image as described in step 4 of the setup section. Inside the container, you can start MoKlee:

  klee [MoKlee arguments] [application] [application arguments]

with the following arguments:

  • MoKlee arguments: to change the behaviour of MoKlee
  • application: the path to the application under test, provided as an LLVM bitcode file (.bc)
  • application arguments: additional arguments for the application

As an example, run the following inside the container:

klee -posix-runtime -libc=uclibc /project/coreutils/bc/echo.bc "Hello ISSTA!"

An output similar to the following should be displayed:

KLEE: load program '/project/coreutils/bc/echo.bc'...
...
KLEE: output directory is "/project/coreutils/bc/klee-out-0"
KLEE: Using STP solver backend
...
Hello ISSTA!
...
KLEE: done: total instructions = 127616
KLEE: done: skipped instructions = 0
KLEE: done: completed paths = 1
KLEE: done: generated tests = 0

As part of the output, the destination directory for the results of this run is printed:

KLEE: output directory is "/project/coreutils/bc/klee-out-0"

Inspecting the results

Use klee-stats with one or more result directories to display analysis results. For our example, use the following command:

klee-stats /project/coreutils/bc/klee-out-0

The output should look similar to:

-----------------------------------------------------------------------------------------------------------------------------------
|Path                            |  Instrs|  Time(s)|  ICov(%)|  BCov(%)|  ICount|  TSolverChain(%)|  QueryTime(%)|  CachesTime(%)|
-----------------------------------------------------------------------------------------------------------------------------------
|/project/coreutils/bc/klee-out-0|  127616|     0.42|    16.72|   100.00|   20572|             0.68|          0.67|           0.01|
-----------------------------------------------------------------------------------------------------------------------------------

For more details, use the additional --print-all flag and, to avoid line wrapping, pipe the output through less -S:

klee-stats --print-all /project/coreutils/bc/klee-out-0 | less -S

To convert tabular data into comma-separated values you can use the stats2csv alias:

klee-stats --print-all /project/coreutils/bc/klee-out-0 | stats2csv > echo.csv

The Docker image includes csvkit. We use it heavily in our scripts but it may also help you analyse .csv files with SQL queries, for instance:

csvsql --query "SELECT Path,[Time(s)] FROM echo WHERE Instrs > 1000" echo.csv

You may notice that MoKlee tracks much more information than KLEE. The most relevant statistics are:

  • Instrs: executed instructions
  • Irepl: replayed instructions
  • Inew: new instructions
  • Ilost: instructions lost due to divergences
  • ST: terminated states
  • SD: interrupted states (e.g. MoKlee timeout)
  • SDiv: diverging states
  • Qrepl: replayed queries
  • Qrec: recorded queries
  • Qunrec: unrecorded queries (contain memory addresses)
  • States: created states
  • Nodes: created execution tree nodes
  • KStates: states terminated due to memory pressure (or maximum depth)
  • KStatesIntr: interrupted states due to memory pressure (not fully replayed)
  • PTTime: time spent maintaining the process tree (without Record/Replay solver)
  • TSearch: time spent selecting states (search heuristic)

Recording

Of course, the most interesting feature of MoKlee is its memoization capability. In the following example, we run grep with symbolic input (--sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout) for 1 minute (--max-time) with depth-first search (DFS) exploration (-search). We store the execution tree to disk (-write-ptree), enable divergence detection (-hash-constants -hash-structs -disable-sequential-hashes), use a fixed (here: empty) set of environment variables (-environ) to reduce divergences, track instruction coverage (-coverage-instruction), and write the results into the record subdirectory (--output-dir).

klee -max-time=1min -search=dfs -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=record /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout

This example shows the minimal set of arguments needed to record an execution tree. Nonetheless, we recommend additional arguments to limit the memory usage (--max-memory=2000) and the number of written test cases (-dump-states=error), to set a solver timeout (-max-solver-time=10s), and to continue branching close to the memory limit (-max-memory-inhibit=FALSE); a combined command is sketched below. The complete set of arguments for our experiments is documented in the shell scripts in the Evaluation section.
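
For illustration, the recording command extended with these recommendations could look as follows (a sketch; note the different output directory to avoid clashing with the run above):

klee -max-time=1min -search=dfs -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -max-memory=2000 -dump-states=error -max-solver-time=10s -max-memory-inhibit=FALSE -posix-runtime -libc=uclibc --output-dir=record_tuned /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout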

Replaying

Afterwards, we replay the recording run from above (--continue-with=record), use the same exploration strategy during replay (-replay-search=dfs), store the execution tree to disk, and stop once the recording run has been fully replayed (-stop-after-replay). In this run, we do not prune any paths, as we -restart-all-states.

klee --continue-with=record -replay-search=dfs -search=dfs -stop-after-replay -restart-all-states -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=replay /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout

Now let’s omit the -restart-all-states flag and thereby enable path-pruning. Let’s also enable the default RndCov strategy, by omitting the searcher flags, to explore different paths. Instead of merely replaying the whole tree and stopping afterwards as before, we now re-use the execution tree (--continue-with=record) and continue exploration for 1 minute. Keep in mind that all replay runs modify the recorded process tree file (ptree.db) directly; they do not create a copy of that file.

klee -max-time=1min --continue-with=record -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=resume /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout
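
Since resume runs modify record/ptree.db in place, you may want to back up the recording directory beforehand, for example with:

cp -r record record.bak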

The command

klee-stats --print-all record replay resume | less -S

should now display something like

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Path     |  Instrs|  Time(s)|  ICov(%)|  BCov(%)|  ICount|  QTime(%)|  TSolver(%)|  maxStates|  avgStates|  Mem(MB)|  maxMem(MB)|  avgMem(MB)|  Queries|  AvgQC|  Tcex(%)|  Tfork(%)|  TResolve(s)|  QCexCMisses|  QCexCHits|  Irepl|  Iskip|   Inew|  Ilost|  IlostT|  ST|  SD|  SDSolver|  STrepl|  SDrepl|  SPrepl|  SEDErepl|  SDiv|  SDivT|  SDivP|  Qconst|  Qrepl|  Qunrepl|  Qrec|  Qunrec|  QVrec|  PTTime(s)|  ATime(s)|  FTime(s)|  Nodes|  States|  KStates|  KStatesIntr|  KExtCalls|  CoverableBB|  CoveredBB|  CoverableInst|  CoveredInst|  TSearch(%)|  DMemUsage|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|record   |  564757|    61.41|    17.18|   100.00|   83555|     97.99|       98.37|         25|      21.00|   103.90|      115.88|      113.45|      849|     54|    98.12|      0.03|         0.01|          849|        775|      0|      0| 564757|      0|       0|  52|  23|         0|       0|       0|       0|         0|     0|      0|      0|  290825|      0|        0|   853|      62|     29|       0.00|      0.02|      0.00|    149|      75|        0|            0|          0|            0|          0|              0|            0|        0.00|          0|
|replay   |  564757|     3.41|    17.18|   100.00|   83555|     79.01|       79.70|         23|      12.60|   102.98|      114.98|      101.23|       15|    299|    79.05|      0.41|         2.69|           15|         43| 564757|      0|      0|      0|       0|  52|  23|         0|       0|      23|       0|         1|     0|      0|      0|  290825|    853|       62|     0|       0|      0|       0.00|      0.02|      0.00|    149|      75|        0|            0|          0|            0|          0|              0|            0|        0.02|          0|
|resume   | 1960268|    62.35|     6.94|   100.00|   83555|     58.76|       67.87|       8417|    4440.68|   123.72|      302.90|      213.05|     4275|    105|    61.60|      2.65|         1.69|         4275|       3803| 152354|      0|1807914|      0|       0|   6|8417|         0|       2|       5|       5|         1|     0|      0|      0|  964094|      7|        6| 22129|     332|     76|       0.26|      0.02|      0.00|  16849|    8425|        0|            0|          0|            0|          0|              0|            0|        1.82|          0|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|Total (3)| 3089782|   127.17|    13.77|   100.00|  250665|     78.25|       82.91|       8465|    4474.28|   330.60|      533.75|      427.73|     5139|     97|    79.71|      1.33|         4.39|         5139|       4621| 717111|      0|2372671|      0|       0| 110|8463|         0|       2|      28|       5|         2|     0|      0|      0| 1545744|    860|       68| 22982|     394|    105|       0.26|      0.06|      0.00|  17147|    8575|        0|            0|          0|            0|          0|              0|            0|        0.89|          0|
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The replay run re-executed all instructions in a fraction of the time (3.41s) without any divergences (SDiv), while the resume run aimed to explore different paths and, in doing so, replayed 152,354 instructions (Irepl).

Note that if you would like to run the commands above (or variants thereof) multiple times, you need to change the names of the output directories passed with --output-dir. Similarly, you need to pass those directory names to the klee-stats command.
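
Assuming, for instance, that a second set of runs used --output-dir=record2, --output-dir=replay2, and --output-dir=resume2, the statistics would be displayed with:

klee-stats --print-all record2 replay2 resume2 | less -S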

Evaluation

In the first part of this document, we walk the reader through inspecting the data generated by our experiments; in the second part, we show how to re-run our experiments.

In the following sections, we provide the raw data for each of our experiments in separate subdirectories. Each directory contains, where necessary, a minimal script to re-compute the relevant data. These scripts not only document our methodology but can also be used to generate and compare results when reproducing our experiments.

Excluded applications

From our evaluation we exclude several Coreutils:

  • base32, sha1sum, sha224sum, sha384sum, sha512sum as they are similar to base64
  • cat because KLEE/MoKlee ran instantly out of memory
  • chcon, chgrp, chmod, chown, chroot, dd, ginstall, kill, rm, rmdir, stdbuf, truncate, unlink because they (sometimes) interfere with our experiment setup. For instance, chmod removed the access rights to our sandbox directory and truncate destroyed Docker’s overlay file system. Otherwise, MoKlee works fine with these tools, and if you are brave enough, you might run kill, rm, … with symbolic arguments.

Early termination (Figure 1, Section 1)

Directory: /project/evaluation/early_termination

For Figure 1, we ran MoKlee for 2 hours on 87 Coreutils with the default memory limit of 2GB. We use MoKlee’s low-overhead recording mode here so that we can re-use this run for other experiments, especially the long-running experiment in Section 4.6. We also provide the data for three non-recording runs (not shown in the paper) with three different memory limits: 2GB, 3GB, and 10GB, showing a similar behaviour and only slightly reduced termination rates for the 10GB run. The raw data produced by MoKlee for the four runs can be found in 2GB_record.stats, 2GB.stats, 3GB.stats, and 10GB.stats. The relevant columns are the number of created states (States) and the number of early terminated states (KStates). We provide a script that outputs the data in a more readable format, e.g.:

./early_termination.sh 2GB_record.stats

Replaying Coreutils (Figures 7 and 9, Section 4.2)

Directory: /project/evaluation/replay_coreutils

In this experiment, we record 2h runs of various Coreutils with the default search strategy, a combination of random-path and coverage-guided heuristics (RndCov). Afterwards, we replay those runs with and without enabled path-pruning and with two different search strategies. The raw data is available in:

  • record.stats recorded RndCov runs
  • replay_rc.stats replayed runs with RndCov strategy
  • replay_dfs.stats replayed runs with depth-first-search (DFS) strategy
  • prune_rc.stats replayed runs with RndCov strategy and path-pruning enabled
  • prune_dfs.stats replayed runs with DFS strategy and path-pruning enabled

Please note that when a tool is fully explored in the recording stage, not a single instruction (Instrs) needs to be executed with path-pruning enabled.

For the comparisons shown in Figures 7 and 9, we had to exclude all tools that:

  1. diverge (SDiv > 0) and lose instructions (Ilost > 0) or
  2. terminate (kill) states early with unreplayed instructions due to memory pressure (KStatesIntr > 0)

We include a script that computes:

  1. the tools to exclude (excludes.csv)
  2. timing data for all individual tools (e.g. record_execution_times.csv)
  3. summary data for Figure 9 in table form (execution_times_summary.csv)

After running:

./execution_times.sh

you can retrieve data for Figure 7 in textual form with e.g.:

csvstat -c "Time(s)" record_execution_times.csv

The command above shows the data for the Record-RndCov bar in Figure 7, displaying the results in seconds rather than minutes. The other bars can be obtained in a similar fashion by querying the appropriate .csv files.

The results for Figure 9 can be obtained with:

csvlook execution_times_summary.csv

Replaying Coreutils (DFS) (Figures 8 and 10, Section 4.2 and Figure 13, Section 4.3)

Directory: /project/evaluation/replay_coreutils_dfs

This experiment is similar to the one above but uses DFS as the search strategy in the recording run. Again, we provide the raw data and a script to create the timing data.

After running:

./execution_times.sh

you can retrieve data for Figure 8 in textual form from the generated .csv files with e.g.:

csvstat -c "Time(s)" record_execution_times.csv

Again, this shows the results in seconds rather than minutes. The data for Figure 10 can be displayed with:

csvlook execution_times_summary.csv

Although we excluded the replay run with RndCov strategy from the paper as too many tools terminate states early, we do provide the raw data (replay_rc.stats).

We use the same data for Figure 13 in Section 4.3 to report the time spent maintaining the process tree in MoKlee relative to the overall execution time. The statistics for the relevant runs can be retrieved with:

csvstat -c "TTree(%)" record_relttime.csv

and

csvstat -c "TTree(%)" replay_dfs_relttime.csv

Pruned states (Figure 11, Section 4.2)

Directory: /project/evaluation/pruned_states

Figure 11 is based on the path-pruning experiments for Coreutils from above. We use the same data but a different naming scheme:

   record_<record heuristic>.stats

and

   prune_<record heuristic>_<replay heuristic>.stats

After running:

./compute_pruned_states.sh

the distribution of pruned states can be displayed with, e.g.:

csvstat -c "PrunedStates(%)" summary_rc_rc.csv

With the current implementation we can only compute the number of pruned states as the difference of created states between recording and replaying runs. When a replaying run terminates states early due to memory pressure (KStates > 0) or removes subtrees because of divergences (Ilost > 0), we have to remove it from the dataset.
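
Conceptually, the computation amounts to something like the following query (a sketch, assuming the relevant .stats data has been converted to CSV, e.g. with stats2csv, and that both tables share a common tool column, here hypothetically called Path):

csvsql --query "SELECT r.Path, 100.0 * (r.States - p.States) / r.States AS [PrunedStates(%)] FROM record_rc r JOIN prune_rc_rc p ON r.Path = p.Path" record_rc.csv prune_rc_rc.csv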

Execution tree sizes (Figure 12, Section 4.3)

Directory: /project/evaluation/ptree_sizes

We used ls to determine the sizes of the recorded process trees (ptree.db) for the 2h RndCov runs. Hence, the sizes in the corresponding .csv files include the overhead of the database file format (SQLite). For the sake of completeness, we report all sizes, not just those of the runs reaching the 2h limit as in Figure 12.
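
For your own runs, the same can be done with, for example (here for the grep recording from the Using MoKlee section):

ls -l record/ptree.db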

Runtime overhead (Section 4.3)

Directory: /project/evaluation/runtime_overhead

For this experiment, we use the recorded DFS run from above (record.stats) and re-run the same tools with the same number of instructions but without recording (norecord.stats) to compare execution times. Obviously, to compare two executions of the same tool, we need to make sure that they execute the same instructions and create the same execution tree. Unfortunately, we cannot use our divergence detection for the non-recording run, as we do not record the necessary data. Therefore, we use various statistics to ensure that the two executions are identical, particularly the number of:

  • instructions (Instrs)
  • solver queries (Queries)
  • counter-example cache hits (QCexCHits) and misses (QCexCMisses)
  • terminated (ST) and interrupted states (SD)
  • covered instructions (CoveredInst)
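
For your own runs, one quick way to compare these statistics side by side is the following sketch (using the tools introduced earlier, and assuming two MoKlee output directories record and norecord):

klee-stats --print-all record norecord | stats2csv | csvcut -c Path,Instrs,Queries,QCexCHits,QCexCMisses,ST,SD,CoveredInst | csvlook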

After generating the timing data with:

./compute_overhead.sh

the reported data for the remaining 42 tools can be retrieved with:

csvstat -c "Overhead(%)" overhead.csv

Lost instructions (Figure 14, Section 4.4)

Directory: /project/evaluation/lost_instructions

Figure 14 shows the instructions lost due to divergences in non-pruning runs. For that, we select all diverging runs (SDiv > 0) that lose instructions (Ilost > 0) but do not terminate unreplayed states early due to memory pressure (KStatesIntr = 0). After running:

./compute_lost_instructions.sh

the results can be inspected via, e.g.:

csvstat -c "LostInstructions(%)" summary_rc_rc.csv

Replaying diff, find, grep, libspng, readelf, and tcpdump (Table 1, Sections 4.2 and 4.4)

Directory: /project/evaluation/replay_others

We provide statistics for all experiments (.stats) and a script to generate the data for Table 1:

./compute_table.sh

For the sake of completeness, we also include the path-pruning runs. All runs are shown with:

csvlook table.csv

In Section 4.4 we report that replaying find with KLEE’s deterministic memory model reduces the number of divergences significantly. These results are also shown as find-detmem.

Iterative Deepening (Figure 15, Section 4.5)

Directory: /project/evaluation/iterative_deepening

We use the non-Coreutils applications for this experiment and exclude find due to its many divergences. First, we determined the maximum depth that can be computed using BFS (d = 22 for diff, 13 for grep, 42 for libspng, 22 for readelf, and 19 for tcpdump) without terminating states early. These depths were then used as starting depths (00) for iterative deepening with DFS, with and without memoization (nomem). After 5 more levels (05), we terminated the deepening process.

./compute_times.sh

creates summary files that can be inspected with, e.g.:

csvlook diff_summary.csv

Validity

The data for the record-only runs can be found in *_deep_nomem_ptree.stats. We use the ptree-comparison tool to compare process trees (ptree.db) of the highest depth and provide the results in tree_comparison.txt.

Long-running experiments (Figures 16, 17, and 18, Section 4.6)

Directory: /project/evaluation/long_running

For this experiment, we select all Coreutils from the RndCov record run that finish before our 2h timeout because they have to terminate states early due to memory pressure. We then run those experiments repeatedly, using memoization but without a time limit, until they run out of states due to early termination. When the combined runtime for an application exceeds 7 days, we stop the experiment.

We report replayed instructions (Irepl) in Figure 16 and the number of new instructions (Inew) in Figure 17. You may use, e.g.:

./long_run.sh base64_resumeN.stats

to reduce the output to relevant columns. ptree_sizes.csv contains the sizes of the final process trees.

The coverage data for every tool as shown in Figure 18 is included in the coverage_<tool>.csv files. For each data point it contains the overall line coverage (cov) which is the sum of the covered lines in KLEE’s runtime (runtimeCov), the uclibc C library (uclibcCov), and the application itself (appCov). A helper script shows the relevant columns, e.g.:

./coverage.sh coverage_tac.csv

Experiments

This section guides you through the setup of our experiments. For each experiment, we provide a script to reproduce our results. All scripts have a similar structure, so it should be easy to spot configuration changes or to adapt them if need be. The final section explains how to generate the evaluation data for your own experiments.

We ran our experiments on a carefully configured, homogeneous cluster of machines with Intel Core i7-4790 CPUs at 3.6 GHz, 16 GiB of RAM, and 1 TB HDDs. Please be aware that these experiments are time-consuming and resource-intensive; you might want to reduce the set of benchmarks.

The Docker image comes with nano and the more user-friendly nice editor (ne) pre-installed. Feel free to install other editors (e.g. apt update && apt install vim) to configure and adapt our scripts.

Early termination

Directory: /project/experiments/early_termination

In our introduction we compute the percentages of states terminated due to memory pressure. You can re-run this experiment with:

./run_experiments.sh

and configure the selection of applications (e.g. tools=( "base64" "grep" )), the timeout (e.g. max_time=60min) and memory limits (memory_limits) in the configuration section of our script. The output directories are located at ./<limit>/<tool>.

Replaying

Directory: /project/experiments/replay

For all benchmarks (Binutils, Coreutils, Diffutils, Findutils, grep, libspng, and tcpdump) we offer a single script:

./run_experiments.sh

to reproduce our replaying experiments. For each application it runs MoKlee 10 times:

  1. two recording runs with DFS and RndCov strategy
  2. a replay run for each recorded run with RndCov strategy
  3. a replay run for each recorded run with DFS strategy
  4. a replay run for each recorded run with RndCov strategy and path-pruning
  5. a replay run for each recorded run with DFS strategy and path-pruning

Running 93 applications 10 times with an initial timeout of 2h is of course time-consuming. You can alter the set of applications, select the search strategies for record and replaying runs, and reduce the timeout in the configuration section of the script.

After successfully running the script you can find MoKlee’s output directories in the following subdirectories:

  ./record/<strategy>/<tool>
  ./replay/<record strategy>/<replay strategy>/<tool>
  ./prune/<record strategy>/<replay strategy>/<tool>

Runtime overhead

Directory: /project/experiments/overhead

This experiment compares the runtimes of a recorded run (DFS) and a non-recording run (DFS) that executes the same number of instructions (-stop-after-n-instructions=<n>). Hence, it requires that some applications were recorded in DFS mode with the script from the previous section. As in the previous sections, you can adapt the set of applications (tools) in the configuration section and execute:

./run_experiments.sh

MoKlee’s output directories are located at ./norecord/<tool>.
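
For reference, a single non-recording run corresponds to a command along the following lines (a sketch with the instruction count from the grep example in the Using MoKlee section; the script derives the actual instruction budget from the corresponding recorded run):

klee -stop-after-n-instructions=564757 -search=dfs -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=norecord/grep /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout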

Iterative deepening

Directory: /project/experiments/iterative_deepening

We first determine the largest depth that can be computed with the breadth-first strategy without terminating states due to memory pressure. We omit those runs from the artefact. In case you want to determine the relevant depths for other tools or different memory limits, iteratively increase -max-depth and terminate when the memory limit is reached (-terminate-when-oom). The messages.txt file in the output directory should then contain a message like:

KLEE: Reached memory limit, shutting down
Current state (12446) has depth: 22
KLEE: halting execution, dumping 141536 remaining states
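
As we omit the original scripts for this step from the artefact, here is a sketch of such a probing run, based on the recording command from the Using MoKlee section and grep's maximum depth (d = 13, see the Evaluation section):

klee -search=bfs -max-depth=13 -terminate-when-oom -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=bfs-13 /project/grep/bc/grep.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout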

For two of the applications, diff and readelf, the runtimes became quite large (several days) for the higher depths and we had to reduce the solver timeout (-max-solver-time) to 1s.

Again, this directory provides the script (run_experiments.sh) to execute memoized and non-memoized deepening runs. You can change the start depths (start_depths) for each application and the number of deepening steps (iterations) in the configuration section.

Long running experiments

Given a combined runtime of more than 3 months, we consider it impractical to run this experiment in a single Docker container and provide no ready-made script in this case. The setup is identical to the replaying runs with path-pruning: iteratively replay the respective last run with RndCov, omitting the following flags

  1. -stop-after-replay
  2. -replay-kill-replayed
  3. -max-time
  4. -replay-search and -search

until the combined run time per application exceeds 7 days. A possible driver loop is sketched below.
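
A minimal, hypothetical sketch of such a loop (untested; it assumes an existing RndCov recording of base64 in the record directory and simply reuses the symbolic arguments from the grep example above, which may differ from the ones used in the paper):

prev=record
for i in 1 2 3 4 5; do
  klee --continue-with=$prev -write-ptree -hash-constants -hash-structs -disable-sequential-hashes -environ=/project/experiments/empty.env -coverage-instruction -posix-runtime -libc=uclibc --output-dir=resume$i /project/coreutils/bc/base64.bc --sym-args 0 1 10 --sym-args 0 2 2 --sym-files 1 8 --sym-stdin 8 --sym-stdout
  prev=resume$i
done

Stop iterating once a run terminates with no remaining states or the combined runtime exceeds 7 days.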

Evaluating your data

Directory: /project/experiments/evaluation

For convenience, we provide a script (collect_data.sh) that gathers the data from the previous runs and builds a directory structure similar to the one in the Evaluation section. Each subdirectory contains files with the same names and the same scripts as above. The main difference is that we do not distinguish between the different application suites; all runs end up in either replay_rc or replay_dfs, depending on the search strategy of the recording run.
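
Invoke it from its directory, for example:

cd /project/experiments/evaluation
./collect_data.sh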

Keep in mind that compute_overhead.sh filters experiments with a runtime shorter than 2h. This script needs to be adapted accordingly when you change the maximum runtime.
