Artifact for Testing Symbolic Execution Tools -Software Reliability Group

Our artifact consists in the errors that we found, and a VM containing our testing infrastructure.

Errors found

The errors that we found in KLEE are shown in Table III of the paper. We provide below the links to our bug reports:

Some unions not retaining values
Incorrect by value structure passing
Overshift error triggered by -01 optimisations [see note in the paper]
Vector instructions unhandled, caused by -O2 optimisations
Floating point exception
Performance anomaly related to function calls
Performance anomaly related to memory allocation
Incorrect handling of division by 1
Execution forks unexpectedly
Segmentation fault due to % operator
Incorrect casting from signed to unsigned short [see note in the paper]
Abnormal termination in STP solver
Assertion failure in STP solver 1.6
Replaying read-only variables not handled
Unexpected interaction between file system model and replay library
Divergence between test generating path and test replay path [see note in the paper]

The bug reports for errors found in Crest and FuzzBALL as shown in Table V of the paper can be found below:

The Virtual Machine

We make our artifact available as a Virtual Box VM, which can be downloaded from here.

username: symext
password: 12345
ssh_openport: 6500

To run our testing infrastructure, please read the instructions below.

Getting started

Upon logging in, there are a couple of directories to look for:

~/dependencies: directory where the symbolic execution engines and other dependencies are installed
~/testSymExec/scripts: The root directory containing the bulk of the framework
~/testSymExec/scripts/playground/exp1: A directory where we pre-generated 100 programs with which experiments can be performed

A more detailed description of the files in the framework can be found further down. However the quickest way to get started is to run:

~/quickStart.sh

which runs the KLEE experiment in Mode B. Note that each run (a line in the output) can take up to 100s to run. The output should look like:

test3.c n:.003047703 k:.394848257 SUCCESS
test1.c n:.005460284 k:.576356714 SUCCESS
test2.c n:.003106823 k:.556381137 SUCCESS
test4.c n:.003156257 k:.533275876 SUCCESS
test7.c n:.002964223 k:.035298960 SUCCESS
...

Each line represents a program being run, with the run time information of both native (n:) and klee (k:) shown in seconds. SUCCESS indicates that none of the used oracles reported a discrepancy.

Running different experiments

The general steps to running an experiment are:

Make a new set of test programs as described in New experiment setup (or use a previously generated one)
cd into the directory containing the test programs
source the desired experiment setup to set the environment variables for the experiment:
- source path/to/setups/desiredSetupScript
- run ./runGeneralExperiment.sh

This should go through all the test###.c files in the directory and do a run with them. The experiments can take a long time to run, the worst case (where everything timeouts) would be 100s * number of test###.c files.

The first step is already performed on the VM, so to try out other modes and symbolic executors first go to the directory ~/testSymExec/scripts/playground/exp1, which contains the current set of generated C programs. Then source the desired experiment from the ~/testSymExec/scripts/setups folder, for example crestConstrainersModeB.sh to run with Crest in Mode B. Then start the experiment by running ~/testSymExec/scripts/runGeneralExperiment.sh.

That is, to run Crest in mode B, you need to type:

cd ~/testSymExec/scripts/playground/exp1
source ~/testSymExec/scripts/setups/crestConstrainersModeB.sh
~/testSymExec/scripts/runGeneralExperiment.sh

The output is more interesting this time as we have a failure in test5.c:

test3.c n:.002477044 c:.068876037 SUCCESS
test2.c n:.004286914 c:.050952997 SUCCESS
test1.c n:.003200213 c:.047856962 SUCCESS
test4.c n:.002032256 c:.076428134 SUCCESS
test7.c n:.002491375 c:.005579100 SUCCESS
test5.c n:.003236445 c:.056983235 Fail
test8.c n:.002697584 c:.065846521 SUCCESS
test9.c n:.002490838 c:.074152619 SUCCESS
test10.c n:.004403321 c:.006055252 SUCCESS
...

Investigating the bug

We can investigate the failure further by running

~/testSymExec/scripts/generalSingleExperiment.sh test5.c

which yields the following output:

...
1,2c1,2
< checksum = DA80DDC
< g_18: -2
---
> checksum = DF8D0104
> g_18: 29
4,5c4,5
< g_66: 51461
< g_82: 0
---
> g_66: 1
> g_82: -1
test5.c n:.002708271 c:.039520509 Fail

So it seems the checksum being printed is different and some of the symbolic variables have different values. This looks like a bug! It’s time to reduce it and report it. We first need to make a C-reduce oracle that captures the bug. The easiest way to achieve this is to edit ~/testSymExec/scripts/creduceTest.sh appropriately. For this particular bug a good C-reduce oracle script can be found in ~/testSymExec/scripts/playground/crestTest5Creduce.sh.

The full sequence of steps to run the reduction for test5.c with the provided oracle would be:

source ~/testSymExec/scripts/setups/crestConstrainersModeB.sh
mkdir test5reduction
cp ~/testSymExec/scripts/playground/exp1/test5.c test5Reduction/
cd test5Reduction
creduce ~/testSymExec/scripts/playground/crestTest5Creduce.sh test5.c

Which should reduce test5.c to something similar to our report. An example of this reduction after completion can be seen in ~/testSymExec/scripts/playground/test5Reduction.

In general if running generalSingleExperiment.sh is not sufficient to see the bug, it is possible to delve deeper by running $COMPILE_AND_RUN_1 test5.c to run the program just natively (in this case) or $COMPILE_AND_RUN_2 test5.c to run it with Crest. In fact these are the environment variables set by the setup script.

Mode A experiments

Mode A experiments can be ran with these setups:

setups/kleeModeA.sh sets up the experiment for KLEE in Mode A
setups/crestModeA.sh sets up the experiment for Crest in Mode A
setups/fuzzballModeA.sh sets up the experiment for FuzzBALL in Mode A

Mode B experiments

We’ve prepared experiments to be run in Mode B for KLEE, Crest and FuzzBALL. Unless otherwise specified these use crash, output, function call and performance oracles.

setups/kleeLTConstrainerModeB.sh sets up the experiment for KLEE in Mode B with less than constrainer
setups/kleeLTEConstrainerModeB.sh sets up the experiment for KLEE in Mode B with less than or equal constrainer
setups/kleeRangeConstrainerModeB.sh sets up the experiment for KLEE in Mode B with less than or equal constrainer
setups/kleeDivisorConstrainerModeB.sh sets up the experiment for KLEE in Mode B with the divisor constrainer
setups/crestConstrainersModeB.sh sets up the experiment for Crest in Mode B with less than constrainer
setups/fuzzballConstrainersModeB.sh sets up the experiment for FuzzBALL in Mode B with less than constrainer

Mode B - coverage

An example of a coverage oracle being used for crest can be found by . setups/crestCoverageModeB.sh

Mode C experiments

An example of Mode C implementation for Crest can be set up by . setups/modeCCrestSetup.sh

New experiment setup

To generate a new batch of Csmith programs run

~/testSymExec/scripts/SetupExperiment.sh <experimentName>

To change the number of generated test cases change

seq 100 | ...

to the desired number.

To change the Csmith settings used for the experiemnt edit the file ~/testSymExec/scripts/utils/generateTestCase.sh.

Framework Files Overview

The important files of the framework to look out for:

~/testSymExec/scripts/generalSingleExperiment.sh test1.c: Reads the COMPILE_AND_RUN_{1,2} environment variables, runs test1.c with them and compares the output.
~/testSymExec/scripts/runGeneralExperiment.sh: Runs generalSingleExperiment.sh for every test.c file in the current directory
~/testSymExec/scripts/setups/*: Contains configuration files for a particular experiment setup. A file from setups/ should be sourced before running the above two scripts.
~/testSymExec/scripts/CaR/*: Contains compile and run scripts different symbolic executors in different modes
~/testSymExec/scripts/SetupExperiment.sh dirname: generates a new set of test.c in the dirname directory, to be used for experimentation with the first two scripts.
~/testSymExec/scripts/creduceTest.c a skeleton creduce oracle. This should be edited to capture the bug we want to reduce. Then creduce can be ran using this script as the oracle.
~/testSymExec/instrument_lib/*: Different implementations of constrains and instrumentation library
~/testSymExec/src2src/*: Instrumentation implementation

Extending to new symbolic execution engines

To leverage this infrastacture for another symbolic execution engines, one needs to implement a few functions for marking variables as symbolic, logging functions, etc. and write a script for compiling and runnining a program with the desired symbolic executor.

Functions to be implemented

To adapt the infrastructure for a hypothetical symbolic executor AwesomeSE using crash, output and function call oracles we first need to implement the following library methods:

void symbolize_and_constrain_u(uint32_t *var, int size, uint32_t value, char* name): constrain the memory pointed to by var of size size (in bytes) to value
void print_symbolic(const char* name, int64_t *val, char size): print symbolic variable of size size (8,16,32 bits) with the specified name
logFunction(char* name): log the function name. Can be a nop if the function call oracle is not desired.

Inspiration for the implementation can be found in testSymExec/instrument_lib.

Compile and run script contract

The only other thing needed is to write a compile and run script. Such a script receives two arguments:

Location of the source code file that is meant to be compiled and run
Original name of the file (this is used for function call oracle and passing data around. I t can be safely ignored if not needed)

The script should then compile this file, link it with the appropriate constraining and/or instrumentation library and run it using AwesomeSE. The stdout of this script should contain all the data needed to compare this run with the one of the complementary script. For example, the first part could be the output of the run, the second part the function call chain and the last part the coverage information.

The stderr of this script is ignored and can be used for debugging purposes.

We can set the environment variables as COMPILE_AND_RUN_1=CaR/nativeCompileAndRun.sh and COMPILE_AND_RUN_2=CaR/AwsomeSECompileAndRun.sh and run the experiments.

Note regarding the experiments in the paper

Our KLEE experiments were done using an older version of this framework, which we recently refactored in the current form. However, some experiments, such as the grep benchmark and the KLEE mode C ones have not been copied to the new frammework. To run those experiments, you might like to use our old framework which is documented in artifact-outdated-framework.html.