KLEE-Array is made available both as a source tarball and in binary form, pre-installed in a virtual machine. For more details on KLEE-Array and the benchmarks, please see the ISSTA’17 paper.

Table of contents

  1. Source code
  2. Virtual Machine
  3. Running KLEE-Array
  4. Replicating the Experiments
    1. Configuration
    2. Experiments for RQ1 and RQ2
    3. Experiments for RQ3
    4. Inspecting the Results

Source code

The source code of KLEE-Array is available here.

To compile and run KLEE-Array, you need a Linux-based operating system. You also need:

  • CMake;
  • GNU make;
  • Python 2.7;
  • LLVM 3.4;
  • Clang;
  • Minisat;
  • STP.

To compile the code, follow KLEE’s documentation. The only difference is that, instead of checking out KLEE, you need to use the source distribution available above.

Virtual Machine

All the benchmark applications and the necessary infrastructure have been pre-installed in the VM.

The VM was prepared under VirtualBox and is based on Ubuntu 14.04 LTS.

Click here to download the virtual machine (md5: 663dc4184378a1c76aadb2756a1c9470).

The source code as well as the compiled binaries of all the tools used can be found in the home directory of the VM user (/home/user). In particular, klee-array contains our modified version of KLEE.

This replication package is located in the home folder and is structured as follows:

  • RQ1_2 contains the replication infrastructure to answer research questions RQ1 and RQ2
    • case_studies is the main folder where you can find the source code and binaries of the subject programs
      • common.mk is the main configuration file for our experimental infrastructure
      • sandbox.tgz contains KLEE’s sandbox directory
      • testing-env.sh is a script that creates an environment file for KLEE
      • bandicoot, bc, binutils, bzip2, and coreutils contain the source code and binaries of the subject programs, as well as a Makefile script to replicate the experiments
  • RQ3 contains the replication infrastructure to answer research question RQ3
    • bc-1.06-mod is the folder where you can find the source code and binary for the program BC that has been modified so that some symbolic inputs are concretized during symbolic execution.
    • bc-1.06-gcov is the folder where you can find the source code and binary for the original version of BC. During the experiments this program will be compiled so that it collects code and branch coverage information.

If you have any further questions about the VM or you spot an issue, please contact the authors.

Running KLEE-Array

We implemented our transformations within the KLEE symbolic executor. KLEE is invoked from the command line:

# klee [klee-options] <program.bc> [program-options]

The general form of a KLEE command-line is first the arguments for KLEE itself ([klee-options]), then the LLVM bitcode file to execute (program.bc), and then any arguments to pass to the application ([program-options]). In particular, the KLEE option -posix-runtime enables the use of the symbolic environment options as part of the program’s options. To get a complete list of KLEE’s command-line options run: klee --help. More information on how to use KLEE can be found at: http://klee.github.io/docs/options/.
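As a minimal illustration of this form (a hypothetical invocation; prog.bc is a placeholder name for any bitcode file):

```shell
# Hypothetical example: prog.bc stands in for a real bitcode file.
# --posix-runtime enables the symbolic environment options, so
# program-side flags such as --sym-stdin become available.
klee --posix-runtime --libc=uclibc prog.bc --sym-stdin 8
```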

Our transformations can be enabled and controlled through the following command-line options:

-optimize-array            - Optimize accesses to arrays. (default=off)
  =all                     -   Combining constant, index, and value optimizations
  =constant                -   Constant value optimization
  =index                   -   Index-based optimization
  =read_value              -   Value-based at ReadExpr
  =read_value_concrete     -   Value-based at ReadExpr (concrete only)
  =read_value_concsymb     -   Value-based at ReadExpr (concrete/symbolic only)
  =value                   -   Value-based at branch
  =value_concrete          -   Value-based at branch (concrete only)
  =value_concsymb          -   Value-based at branch (concrete/symbolic only)
-optimize-array-rewrite    - Expression rewriting strategy.
  =or                      -   k == 0 | k ==1 | .. | k == 5
  =range                   -   (0 <= k && k <= 5)
  =hybrid                  -   k!=8

-array-value-ratio=R       - Maximum ratio of unique values to array size for which
                             the array optimizations are applied.
-array-value-symb-ratio=R  - Maximum ratio of symbolic values to array size for which
                             the array optimizations are applied.

In particular, we use the following combinations in our experimental evaluation:

  • Constant: --optimize-array=constant
  • Index-based: --optimize-array=index --optimize-array-rewrite=hybrid
  • Value-based: --optimize-array=read_value --array-value-ratio=0.75 --array-value-symb-ratio=0.75
  • Combined: --optimize-array=all --optimize-array-rewrite=hybrid --array-value-ratio=0.75 --array-value-symb-ratio=0.75
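Putting the general invocation form and one of these combinations together, a full run with the Combined configuration might look as follows (a sketch; prog.bc is again a placeholder bitcode file):

```shell
# Hypothetical example: KLEE with the Combined configuration.
klee --posix-runtime --libc=uclibc \
     --optimize-array=all --optimize-array-rewrite=hybrid \
     --array-value-ratio=0.75 --array-value-symb-ratio=0.75 \
     prog.bc --sym-stdin 8
```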

Our transformations can be applied to all the programs supported by standard KLEE. The most difficult part of applying KLEE to a new case study is compiling the subject program to LLVM bitcode. We usually followed the procedure and the tools (e.g., whole-program-llvm) suggested in KLEE’s documentation. More examples can be found inside our replication package.
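For reference, a typical whole-program-llvm build looks roughly as follows (a sketch assuming wllvm and extract-bc are installed and the project uses an autoconf-style build; details vary per project):

```shell
# Sketch: build a project to LLVM bitcode with whole-program-llvm.
# CC=wllvm makes every compile step also emit bitcode; extract-bc then
# links the embedded bitcode sections into a single .bc file for KLEE.
export LLVM_COMPILER=clang
CC=wllvm ./configure
make
extract-bc src/prog        # src/prog is a placeholder binary name
```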

Replicating the Experiments

We provide a pre-configured virtual machine containing our set of experiments.

The artifact includes the tool together with the complete set of experiments and its replication infrastructure to allow other researchers to reproduce the experiments.

Warning

Our experiments are very sensitive to changes in both environment and memory, since these directly impact the code exploration strategies of KLEE. The best way to replicate our results is to use a physical machine, although the configuration is non-trivial. We did our best to set up the VM to replicate the environment of our physical servers as closely as possible. However, the VM environment might be over-restrictive for our experiments, due to memory, CPU, and disk limits.

The current potential known issues are:

  • Determinism: Running KLEE deterministically is essential for our experiments, and yet it is non-trivial. We ran our experiments on a physical machine with 16 GB RAM, and we optimistically reserve 8 GB of RAM for the VM. Inside the VM, we disable ASLR and rely on TCMalloc to deterministically allocate memory. Failures in deterministically allocating memory may result in divergences during symbolic execution. KLEE might fail to allocate memory if either the VM, the host system, or both do not have enough available memory. In case of failures, a VM reboot might be enough to fix the issue. Reducing the allocated memory might result in KLEE hitting the memory limit, thus consequently discarding execution states and exposing behavior divergences unrelated to our transformations. KLEE runs all programs in a sandbox directory located in /tmp/sandbox, which is deleted and re-created at every run. Failures in setting up the sandbox will likely result in divergences.

  • Runtime fluctuations: The data reported in the paper is collected on a physical machine without a middle-layer hypervisor. Moreover, to avoid subtle and suboptimal processor allocation by the Linux scheduler, we pinned processes to CPUs; unfortunately, VMs like VirtualBox do not allow users to pin physical CPUs. As a result, the performance recorded with this VM will likely differ from that reported in the paper. However, we expect you to observe similar performance benefits when our optimizations are applied. It is crucial to allocate at least two CPUs to the VM and to run all experiments sequentially.

  • Disk usage: The complete replication of our experiments requires almost a terabyte of disk space, and one week of (physical) CPU time. Recording execution traces also incurs a 3x slowdown. If disk usage is an issue (either space, performance, or both), please follow the suggestion in the Configuration section below, although you will no longer be able to record execution traces, and thus to check the results for RQ1.

From now on, the description, and in particular all the paths in the examples, will refer to our virtual machine setup.

Configuration

This step-by-step tutorial focuses on the replication of all the experiments used to evaluate our technique.

Our experimental infrastructure is based on GNU make and is composed of two main parts: a configuration file common to all experiments (case_studies/common.mk) and a Makefile inside the directory of each case study (e.g., case_studies/bc/Makefile).

The common.mk file contains the configuration of KLEE used in our experiments. The KLEE options used during the evaluation are the following:

--simplify-sym-indices \
--output-module \
--max-memory=4095 \
--allocate-determ=true \
--allocate-determ-size=4095 \
--allocate-determ-start-address=0x7ffef66f3000 \
--max-sym-array-size=4096 \
--disable-inlining \
--optimize \
--use-forked-solver \
--use-cex-cache \
--libc=uclibc \
--posix-runtime \
--allow-external-sym-calls \
--only-output-states-covering-new \
--watchdog \
--max-memory-inhibit=false \
--use-query-log=solver:kquery \
--max-static-fork-pct=1 \
--max-static-solve-pct=1 \
--max-static-cpfork-pct=1 \
--switch-type=internal \
--dump-states-on-halt=false \
--debug-print-instructions=$(KINSTRFORMAT) \
--debug-compress-instructions \
--compress-query-log \
--environ=/tmp/test.env \
--run-in=/tmp/sandbox \
--max-instruction-time=$(KINSTRTIMEOUT) \
--max-solver-time=$(KINSTRTIMEOUT) \
--max-time=$(KTIMEOUT) \
--search=$(KSEARCH) \
$(KOPT)

Where $(KINSTRFORMAT), $(KINSTRTIMEOUT), $(KTIMEOUT), $(KSEARCH), and $(KOPT) have the following default values:

KINSTRFORMAT:=compact:file
KINSTRTIMEOUT:=200.
KTIMEOUT:=4000.
KSEARCH=dfs
KOPT=

NOTE: In case disk usage is an issue, remove the following lines:

  --use-query-log=solver:kquery
  --debug-print-instructions=$(KINSTRFORMAT)
  --debug-compress-instructions
  --compress-query-log

Please note that by removing these flags:

  • you will not record execution traces, making it impossible to replicate the experiments to answer RQ1;
  • you will generally observe a noticeable performance speedup, making the resulting runtimes incomparable with previously recorded runs.

We essentially have two sets of experiments: The first set is used to answer research questions RQ1 and RQ2, thus filling the data in Table 3. The second set is used to answer RQ3.

Experiments for RQ1 and RQ2

The experiment follows the protocol described in the paper in Section 5.2: First, we create a baseline, that is, we run KLEE for a given amount of time and record the number of executed instructions (Creating a baseline). Second, we re-run KLEE by limiting the exploration to the number of instructions previously recorded, with and without our transformations (Running KLEE with transformations).

Creating a baseline

The creation and configuration of the baseline in our scripts is non-trivial. Our VM is already configured with the instruction limits that we recorded and used for our experiments with a 30-minute run. Hence, you can skip this step, although running KLEE in the VM with our instruction limits will incur additional runtime (up to a 4x slowdown).

We now describe the experiment with reference to the coreutils case study, unless explicitly mentioned otherwise.

  • Go to the coreutils directory of the replication package:

    # cd ~/RQ1_2/case_studies/coreutils
    
  • Invoke the generation of the baseline. You can specify the timeout (in seconds) through the KTIMEOUT variable. For example, to create a baseline for 10 seconds:

    # make KTIMEOUT=10. create-baseline
    ..
    Makefile:92: warning: ignoring old commands for target `baseline-klee-od'
    Makefile:92: warning: ignoring old commands for target `baseline-klee-pathchk'
    ..
    Creating KLEE baseline with dfs on program [
    ..
    

    Warnings about overriding of commands in the Makefile are expected in the output.

  • Get the instruction count from the KLEE run statistics. The invocation will create a directory and a log file for each subject program in the case study:

    # ls
    klee-out-[ klee-out-base64 klee-out-basename klee-out-cat klee-out-cksum
    baseline-klee-dfs-[.log baseline-klee-dfs-base64.log ...
    

    You can get the executed instruction count with the following command:

    # tail -n1 klee-out-[/run.stats | cut -d ',' -f1 | cut -d '(' -f2
    1843496
    

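The extraction step above can be applied to every subject program at once. The sketch below creates stand-in run.stats fixtures so it runs anywhere; on the VM, skip the fixture part and run the loop directly inside the case-study directory:

```shell
# Sketch: collect the executed-instruction count from every klee-out-*
# directory, in the app:instructions:statistics format used by targets.txt.
set -eu
tmp=$(mktemp -d); cd "$tmp"

# --- fixture: fake run.stats files (stand-ins for real KLEE output) ---
mkdir -p klee-out-base64 klee-out-cat
printf "('Instructions','UserTime')\n(2948624182,40.1)\n" > klee-out-base64/run.stats
printf "('Instructions','UserTime')\n(1843496,12.5)\n"    > klee-out-cat/run.stats

# --- extract the count for each subject program ---
for d in klee-out-*; do
  app=${d#klee-out-}
  instr=$(tail -n1 "$d/run.stats" | cut -d ',' -f1 | cut -d '(' -f2)
  # same value twice: dump statistics only at the start and end of the run
  echo "$app:$instr:$instr"
done
```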
Once you have obtained the executed instructions for all the subject programs, you need to configure the Makefile to use these limits in the next runs. The configuration steps to follow depend on the case study. In general, you need to open the Makefile in the case study directory (e.g., bc/Makefile) and modify the run-* targets with the new instruction limit (e.g., 1234). For example, for the BC case study:

# emacs bc/Makefile
[..]
# Run KLEE on BC given an instruction limit
run-bc: $(KLEE) bc.bc prepare
   @echo "Running KLEE with $(KSEARCH) on program BC"
   -taskset -c 0,1 $(KLEE) --stop-after-n-instructions=1234 \
   --stats-write-interval=0 --istats-write-interval=0 \
   --stats-write-after-instructions=1234 --istats-write-after-instructions=1234 \
   $(KFLAGS) \
   --output-dir=klee-out-bc \
   $(BC_DIR)/bc/bc.bc \
   A --sym-files 1 8 --sym-stdin 8 --sym-stdout \
   > run-klee-$(KSEARCH)-bc.log 2>&1
[..]

In the case of coreutils and binutils, the process is quite different. Given the substantial number of subject programs (63), we rely on the automatic generation of Makefile recipes through macros. This strategy allows us to decouple instruction limits from program invocations, thus simplifying most of the recipes. The instruction limits are stored in a separate file called targets.txt inside each case study’s directory. The file format is simple, with colon-separated fields: app-name:instructions:statistics. The instructions column should contain the executed instructions recorded during the creation of the baseline, while the statistics column indicates after how many executed instructions the statistics should be dumped. Since dumping statistics can be expensive, we force KLEE to dump them only at the beginning and end of the execution by using the same value as the executed instructions. For example, the targets.txt file in coreutils contains:

# cat coreutils/targets.txt
[:1843496:1843496
base64:2948624182:2948624182
[..]

Unfortunately, as discussed on KLEE’s website, some of the subject programs require special symbolic arguments to maximize code coverage. The corresponding special Makefile recipes are stored in the Makefile.specials file. Therefore, once you obtain the executed instructions for all the subject programs, you need to:

  • insert the new instruction limits into the targets.txt;
  • modify the recipe name for all the run-* targets inside Makefile.specials since it contains a reference to instruction and statistic limits;
  • modify the instruction and statistic limits for all the run-* targets inside Makefile.specials as shown for the general case.
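For targets.txt, the update can be done with a one-line sed per program. This is a sketch over a fixture file; on the VM you would run the sed against the real coreutils/targets.txt, with NEW set to the count read from your new baseline:

```shell
# Sketch: replace the instruction/statistics limits for one program
# (base64) in a targets.txt-style file. The fixture stands in for the
# real coreutils/targets.txt.
set -eu
tmp=$(mktemp -d); cd "$tmp"
printf '[:1843496:1843496\nbase64:2948624182:2948624182\n' > targets.txt

NEW=3000000000   # placeholder: the executed instructions from the new baseline
sed -i "s/^base64:.*/base64:${NEW}:${NEW}/" targets.txt
cat targets.txt
```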

After the baseline is created and the configuration updated, you need to move the results to another directory, since new runs will be placed directly in the same path. For example, you can create a directory baseline:

# mkdir baseline
# mv klee-out-* baseline
# mv baseline-* baseline

You can now follow the instructions to run KLEE with and without transformations and compare their runtime.

As mentioned in the introduction, KLEE might fail to deterministically allocate memory. You can check if the runs completed successfully by inspecting the log files. The error message is the following: Could not allocate memory deterministically. In case of failures, please reboot the VM and try to run the command again.
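A quick way to spot failed runs is to grep all log files at once. A sketch (the two log files are a fixture; on the VM, run the grep in the case-study directory after the make invocation):

```shell
# Sketch: list the runs that failed to allocate memory deterministically
# and therefore must be repeated after a VM reboot.
set -eu
tmp=$(mktemp -d); cd "$tmp"

# fixture log files standing in for real KLEE run logs
echo "KLEE: done: total instructions = 1843496" > run-klee-dfs-cat.log
echo "Could not allocate memory deterministically" > run-klee-dfs-od.log

grep -l "Could not allocate memory deterministically" run-klee-*.log || true
```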

Running KLEE with transformations

  • Go to the coreutils directory of the replication package:

    # cd ~/RQ1_2/case_studies/coreutils
    
  • Now you can run different types of experiments:

    • Invoke KLEE on all subject programs without transformations:
      # make all-dfs
      ..
      Running KLEE with dfs on program [
      ..
      
    • Invoke KLEE on all subject programs by specifying the transformation through the KOPT variable. For example, to run KLEE with constant-folding optimizations:
      # make KOPT="--optimize-array=constant" all-dfs
      ..
      Running KLEE with dfs on program [
      ..
      
    • Invoke KLEE on a single subject program with or without transformations: (This is the only target available for the case studies bandicoot, BC, and bzip2.)
      # make run-klee-[_1843496_1843496
      ..
      Running KLEE with dfs on program [
      ..
      # make KOPT="--optimize-array=index" run-bc
      ..
      Running KLEE with dfs on program BC
      ..
      
    • To obtain a list of available targets, type make in the terminal and press the TAB key twice:
      # make <TAB><TAB>
      Display all 175 possibilities? (y or n) y
      Makefile baseline-klee-pathchk run-klee-base64_2948624182_2948624182 ..
      
  • To completely replicate our experimental evaluation to answer RQ1 and RQ2 you have to run, for each case study:

    # make <run-command>
    # make KOPT="--optimize-array=constant" <run-command>
    # make KOPT="--optimize-array=index --optimize-array-rewrite=hybrid" <run-command>
    # make KOPT="--optimize-array=read_value --array-value-ratio=0.75 --array-value-symb-ratio=0.75" <run-command>
    # make KOPT="--optimize-array=all --optimize-array-rewrite=hybrid --array-value-ratio=0.75 --array-value-symb-ratio=0.75" <run-command>
    

Where <run-command> may be either all-dfs (for coreutils and binutils), or a run command specific for a case study (e.g. run-bc for BC).

NOTE: After each command invocation, you need to save the current logs and KLEE output directory:

# mkdir constant
# mv klee-out-* constant
# mv run-klee-* constant

Warnings about overriding of commands in the Makefile are expected in the output.

The runtime for each program should be very close to the one used to create the baseline. Minor fluctuations can be observed due to the heuristic nature of constraint solvers.
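The five invocations plus the archiving step can be wrapped in a small driver script. This sketch only prints the plan (each command is prefixed with echo); remove the echos on the VM to actually execute it. It uses “original” as the directory name for the run without transformations, which is our choice of name, not one mandated by the infrastructure:

```shell
# Sketch: plan the full RQ1/RQ2 sweep for one case study.
set -eu
run=all-dfs   # use e.g. run-bc for the BC case study

while IFS='|' read -r name kopt; do
  echo make KOPT=\""$kopt"\" "$run"
  echo mkdir -p "$name"
  echo mv 'klee-out-*' 'run-klee-*' "$name"
done <<'EOF'
original|
constant|--optimize-array=constant
index|--optimize-array=index --optimize-array-rewrite=hybrid
value|--optimize-array=read_value --array-value-ratio=0.75 --array-value-symb-ratio=0.75
combined|--optimize-array=all --optimize-array-rewrite=hybrid --array-value-ratio=0.75 --array-value-symb-ratio=0.75
EOF
```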

As mentioned in the introduction, KLEE might fail to deterministically allocate memory. You can check if the runs completed successfully by inspecting the log files. The error message is the following: Could not allocate memory deterministically. In case of failures, please reboot the VM and try to run the command again.

Experiments for RQ3

The experiment follows the protocol described in the paper in Section 5.4:

  • First, we run KLEE for 6 hours on a modified version of the program BC. These modifications force a portion of the symbolic bytes to be concretized, allowing us to steer KLEE toward a specific functionality of the program, as described in the paper. The modifications can be found in bc-1.06-mod/bc/scan.c at lines 1005 through 1067; they insert calls to a special function, klee_assume, which allows the user to place constraints on variables in the program.
  • Second, we re-run KLEE on the same version of BC but apply our transformations.
  • Finally, we execute both sets of inputs generated during symbolic execution on the version of BC that has been compiled with gcov support. This allows us to observe the amount of code/branch coverage achieved by symbolic execution with and without our transformations.

Running KLEE for Coverage Experiments

  • Go to the RQ3 directory of the replication package:
    # cd ~/RQ3
    
  • Now you can run the two experiments:

    • Invoke KLEE on BC without transformations:
      # make klee-bc-orig
      ..
      Running KLEE without transformations on BC
      ..
      
    • Invoke KLEE on BC with Transformations:
      # make klee-bc-opt
      ..
      Running KLEE with transformations on BC
      ..
      
  • Finally, you can collect the code/branch coverage information:

    • Collect coverage information for the unoptimized run:
      # make gcov-bc-orig
      ..
      Replaying inputs generated during run with no transformations
      ..
      
    • Collect coverage information for the optimized run:
      # make gcov-bc-opt
      ..
      Replaying inputs generated during run with transformations
      ..
      

After running the commands to collect coverage information, the data for each file in BC will be displayed on the screen, with the total coverage information at the very bottom of the output. Unfortunately, due to the slowdown caused by virtualization, we expect the code coverage achieved to be lower than reported in the paper. However, the difference between the coverage achieved by the original and optimized runs should be similar.
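If you redirect the output of the two gcov targets to log files, the totals can be compared side by side. A sketch over a fixture with placeholder coverage numbers (the two log files stand in for real gcov output):

```shell
# Sketch: extract the overall line-coverage figure from saved gcov output.
set -eu
tmp=$(mktemp -d); cd "$tmp"

# fixture: the last "Lines executed" line gcov prints is the overall total
printf 'Lines executed:41.20%% of 5000\n' > gcov-bc-orig.log
printf 'Lines executed:55.80%% of 5000\n' > gcov-bc-opt.log

for f in gcov-bc-orig.log gcov-bc-opt.log; do
  echo "$f: $(grep -o 'Lines executed:[0-9.]*%' "$f" | tail -n1)"
done
```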

As mentioned in the introduction, KLEE might fail to deterministically allocate memory. You can check if the runs completed successfully by inspecting the log files. The error message is the following: Could not allocate memory deterministically. In case of failures, please reboot the VM and try to run the command again.

Inspecting the Results

During the execution of the experiments, KLEE stores two types of data:

  1. data on the correctness of the approach (the execution traces), to answer RQ1;
  2. data on the effectiveness of the approach (the time spent to symbolically execute the subject program and the generated test cases), to answer RQ2 and RQ3.

Such data is recorded on a per-run basis, although KLEE generates the same files for all runs. We will show how to obtain the results to answer the research questions by referring to the BC case study:

# cd PATH_TO_CASE_STUDIES/bc

Each run of BC will generate a folder named klee-out-bc with the following files inside:

# ls klee-out-bc
array-details.stats  array.stats  assembly.ll  final.bc  info instructions.txt.gz
messages.txt  run.istats  run.stats solver-queries.kquery.gz warnings.txt

Correctness

Inside the KLEE-generated folder there should be a file named instructions.txt.gz. This file contains all instructions interpreted by KLEE during symbolic execution.

The file size can grow quickly and, together with the file solver-queries.kquery.gz, is the main source of disk usage in our experiments (up to 10GB each). Since it is infeasible to compare the files with diff, we rely on the CRC checksum stored in the gzip header to check for equality:

# gzip -lv original/klee-out-bc/instructions.txt.gz | tail -n1 | cut -d ' ' -f2
d22f4d6d
# gzip -lv index/klee-out-bc/instructions.txt.gz | tail -n1 | cut -d ' ' -f2
d22f4d6d

In practice, we found this method extremely accurate. Manually checking the two CRC codes for equality yields the data for the Eq column of Table 3 in the paper (RQ1).
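The CRC check itself can be scripted. A sketch over a fixture (two gzipped copies of the same trace; on the VM, point crc at the real instructions.txt.gz files of the two runs being compared):

```shell
# Sketch: compare the CRCs of two recorded instruction traces.
set -eu
tmp=$(mktemp -d); cd "$tmp"

# fixture: two identical traces, as produced by two equivalent runs
mkdir -p original/klee-out-bc index/klee-out-bc
printf '1843496 instructions\n' > trace.txt
gzip -c trace.txt > original/klee-out-bc/instructions.txt.gz
gzip -c trace.txt > index/klee-out-bc/instructions.txt.gz

# awk instead of cut: it tolerates runs of spaces in gzip -lv output
crc() { gzip -lv "$1" | tail -n1 | awk '{print $2}'; }

a=$(crc original/klee-out-bc/instructions.txt.gz)
b=$(crc index/klee-out-bc/instructions.txt.gz)
[ "$a" = "$b" ] && echo "Eq: traces match ($a)" || echo "DIFF: $a vs $b"
```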

Effectiveness

KLEE stores statistics on the progress of the execution in a CSV file named run.stats, located within the KLEE-generated folder for the run. The CSV file contains several statistics, for example the number of executed instructions, the number of queries issued, and the time required to complete the execution. The content of run.stats can also be easily inspected with the klee-stats tool.

To fill Table 3 (RQ2) it is sufficient to:

  • get the total wall-clock time (we do not compare CPU time since KLEE does not track CPU time for the solver) for each run with the following command:
    # tail -n1 klee-out-bc/run.stats | cut -d ',' -f11
    
  • get the number of successful transformations, recorded in warnings.txt:
    # grep "OPT.*success" klee-out-bc/warnings.txt | wc -l
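Both numbers can be pulled out together. A sketch over fixture files: the run.stats row is a placeholder with the wall-clock time in CSV column 11, and the warnings.txt lines are invented placeholders that merely match the grep pattern above (the real warning text may differ):

```shell
# Sketch: extract the RQ2 figures from one KLEE output directory.
set -eu
tmp=$(mktemp -d); cd "$tmp"
mkdir klee-out-bc

# fixture: 12 CSV columns so that column 11 (wall-clock time) is well-formed
printf '(1,2,3,4,5,6,7,8,9,10,385.2,0)\n' > klee-out-bc/run.stats
printf 'KLEE: WARNING: OPT_C: success\nKLEE: WARNING: OPT_I: success\n' \
  > klee-out-bc/warnings.txt

echo "wall-clock time: $(tail -n1 klee-out-bc/run.stats | cut -d ',' -f11) s"
echo "transformations: $(grep -c 'OPT.*success' klee-out-bc/warnings.txt)"
```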