KATCH - ESEC/FSE Artifact - Software Reliability Group

Overview

We are making KATCH available for download as a self-contained virtual machine (Fedora 16, 64-bit). Since the image is very large, please email Cristian Cadar if you would like to retrieve it.

To use the VM, you will require VMware Workstation or VMware Player. The login credentials are username katch and password kleepatch. Make sure that you have at least 20GB of free disk space for the virtual machine. Roughly 70GB are required to re-run all experiments.
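The disk space requirement above can be checked before downloading the image. The following is a minimal sketch for a Linux or macOS host; the function name has_free_gb is hypothetical and not part of the artifact, and the check relies only on the POSIX output format of df.

```shell
# Succeed if the file system holding DIR has at least MIN_GB gigabytes free.
# has_free_gb is a hypothetical helper, not something shipped with KATCH.
has_free_gb() {   # usage: has_free_gb DIR MIN_GB
    # df -Pk prints one header line, then one line whose 4th field is the
    # available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# 20GB for the VM image alone; use 70 instead to re-run all experiments.
if has_free_gb . 20; then
    echo "enough space for the VM image"
else
    echo "less than 20GB free in the current directory's file system"
fi
```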

We first describe how to analyze our readily available results and then how to reproduce them by running KATCH.

Analyzing Data

We make our result files directly available in the ~/infra/results/ folder. They can be viewed with a text editor or analyzed with several scripts available in the ~/infra/postprocessing folder.

First, extract the results:

cd ~/infra/results
tar xjf results.tar.bz2

Some general statistics for all revisions of a particular program can be viewed using the totals.sh script. The following examples use diffutils but the procedure is similar for findutils and binutils.

cd ~/infra/results/diffutils
../../postprocessing/totals.sh

The totals.sh script outputs a summary of all patch basic blocks (targets).

                    all       static    dynamic   noexec    unreach   novered   covered   
------------------------------------------------------------------------------------------
diff                840       136       93        0         0         40        53
diff3               840       11        4         0         0         1         3
sdiff               840       11        7         0         0         3         4
cmp                 840       8         4         0         0         1         3
Total               840       166       108       0         0         45        63

The all column lists the total number of lines contained or modified by the patches. The static column lists the total number of basic blocks corresponding to these lines (equivalent to the Targets column in Table 1 of our paper). The dynamic column lists the total number of basic blocks not covered by the regression tests. In other words, static - dynamic equals the number of basic blocks covered by the test suite (third column of Table 1). The covered column lists the number of basic blocks covered by KATCH in addition to those covered by the regression tests, i.e. static - dynamic + covered represents the targets covered by the test suite and KATCH together (last column of Table 1). Finally, the novered column lists the number of basic blocks that KATCH could not cover.
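As a worked example of this arithmetic, the sketch below derives the Table 1 quantities from a row of totals.sh output. The function name table1_columns is hypothetical, and the column order is assumed to be exactly the one shown in the sample output above.

```shell
# Derive the Table 1 quantities from rows of totals.sh output.
# Assumed column order (from the sample output):
#   name all static dynamic noexec unreach novered covered
table1_columns() {
    # Only rows with 8 fields and a numeric second field are data rows;
    # the header and the dashed separator are skipped.
    awk 'NF == 8 && $2 ~ /^[0-9]+$/ {
        by_tests = $3 - $4          # static - dynamic
        by_both  = by_tests + $8    # static - dynamic + covered
        printf "%s targets=%d tests=%d tests+katch=%d\n", $1, $3, by_tests, by_both
    }'
}

# The Total row for diffutils, copied from the output above.
printf '%s\n' 'Total               840       166       108       0         0         45        63' \
    | table1_columns
# prints: Total targets=166 tests=58 tests+katch=121
```

Inside the VM, the same function could be fed directly, e.g. ../../postprocessing/totals.sh | table1_columns from a results folder.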

Per-revision details can be viewed in the patch.size file, e.g. ~/infra/results/diffutils/patch.size, and per-revision per-program details can be viewed in patch.size.program-name, e.g. ~/infra/results/diffutils/patch.size.sdiff, in a similar format.

A visual representation of the distribution of distances to each target (as defined in our paper), along with their covered/not covered status, can be obtained using the mindistances2.sh script as in the example below:

cd ~/infra/results/diffutils
../../postprocessing/mindistances2.sh
evince ~/infra/postprocessing/histograms/targets-distances2.pdf &

The mindistances2.sh script creates a histogram of the targets based on their distance from the closest test input. The result is stored in ~/infra/postprocessing/histograms/targets-distances2.pdf. This example creates the histogram for diffutils because it is executed in the diffutils folder. To obtain an aggregate result for all systems, similar to Figure 8 in our paper, use the mindistances2-aggregate.sh script:

cd ~/infra/results
../postprocessing/mindistances2-aggregate.sh diffutils findutils binutils
evince ~/infra/postprocessing/histograms/targets-distances2-aggregate.pdf &

The result is saved in ~/infra/postprocessing/histograms/targets-distances2-aggregate.pdf. Note that for presentation purposes, Figure 8 omits one outlier target.

The raw data from which the coverage information is obtained is stored in klee-out-n folders, one for each target that KATCH attempts to cover. These folders are created in the same location as the executable into which the target is compiled. For example, for the second target in the diff program from revision HEAD~18, the raw results are stored in the folder ~/infra/results/diffutils/l-18/src/klee-out-1/. Such a folder contains a .patch.cov file whenever the target is covered and an associated .ktest file containing the inputs required for reaching the target (see the next section for more details on extracting the inputs).
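To list the covered targets of a revision without opening each folder, something along the following lines may help. This is only a sketch: the function name covered_targets is hypothetical, and it assumes that the coverage marker is a file whose name ends in .patch.cov.

```shell
# Print every klee-out-N folder below DIR that holds a *.patch.cov file,
# i.e. every target that KATCH managed to cover.
# Assumption: the marker file name ends in ".patch.cov".
covered_targets() {   # usage: covered_targets DIR
    find "$1" -type f -name '*.patch.cov' -path '*/klee-out-*' \
        | sed 's|/[^/]*$||' \
        | sort -u
}

# Example invocation inside the VM:
#   covered_targets ~/infra/results/diffutils/l-18
```

The sed step strips the file name so that each folder is reported once, even if it contains several matching files.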

Checking for Bugs

When it finds a bug, KATCH generates a .err file containing the error details and a .ktest file containing the inputs for reproducing the error. These inputs can be inspected with the ktest-tool program, e.g.

$ cd ~/infra/results
$ ktest-tool binutils/l-1831/binutils/klee-out-2/test000003.ktest
ktest file : 'binutils/l-1831/binutils/klee-out-2/test000003.ktest'
args       : ['/data/benchmarking/patchtesting/binutils-1800/l-1831/binutils/objdump.bc', '-W', 'tmpdir/dw2-compressed.o']
num objects: 12
object    0: name: 'argv'
object    0: size: 4
object    0: data: '-W\x00\x00'
object    1: name: 'argv'
object    1: size: 25
object    1: data: 'tmpdir/dw2-compressed.o\x00\x00'
...

$ mkdir tmpdir
$ ktest-tool --extract-file=tmpdir/dw2-compressed.o binutils/l-1831/binutils/klee-out-2/test000003.ktest
$ file tmpdir/dw2-compressed.o.test 
tmpdir/dw2-compressed.o.test: ELF 64-bit LSB no file type, x86-64, invalid version (SYSV)
$ valgrind /usr/bin/objdump -W tmpdir/dw2-compressed.o.test
...
==10969== Invalid read of size 1
...

The output of the ktest-tool command shows that:

The bug was found in the objdump program
The program has to be executed with two arguments (there are two objects with the name argv)
The first argument is -W and the second one is a file name

To extract the file contents which trigger the bug, use the --extract-file argument of ktest-tool and then pass the file to the stock objdump program installed with the Linux distribution to confirm the bug.

More details on the ktest file format are available in the KLEE documentation at http://klee.llvm.org/TestingCoreutils.html (Step 6: Replaying KLEE generated test cases).
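To find all reported errors in a results folder, the .err files can be paired with their test inputs. The sketch below assumes the usual KLEE naming scheme, where an error file such as test000003.ptr.err shares its test000003 prefix with the corresponding .ktest file; the function name err_with_ktest is hypothetical.

```shell
# For every *.err file below DIR, print the error file followed by the
# .ktest file assumed to share its test-case prefix.
# Assumption: names look like testNNNNNN.<kind>.err and testNNNNNN.ktest.
err_with_ktest() {   # usage: err_with_ktest DIR
    find "$1" -type f -name '*.err' | sort | while IFS= read -r err; do
        base=${err##*/}                      # e.g. test000003.ptr.err
        ktest="${err%/*}/${base%%.*}.ktest"  # e.g. .../test000003.ktest
        printf '%s %s\n' "$err" "$ktest"
    done
}

# Example invocation inside the VM:
#   err_with_ktest ~/infra/results/binutils
```

Each reported .ktest file can then be passed to ktest-tool as shown above.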

Bug Reports

KATCH identified 14 distinct bugs in binutils, of which 12 were also present in the latest version. We grouped these bugs by their underlying cause and filed 7 bug reports, which were fixed by the developers.

Running Experiments

Note 1: before running the experiments you may have to adjust the KATCH timeout. The timeout for a native (non-VM) experiment on an Intel Xeon at 3.50GHz was 10 minutes (diffutils and findutils) and 15 minutes (binutils). As a precaution, we have already increased these values in the VM image to 15 and 20 minutes, respectively. You may want to adjust them further depending on the machine that you are using, and you should avoid running other CPU-intensive applications at the same time. The diffutils and findutils timeout is set in ~/infra/wrapper.sh.tmpl and the binutils timeout in ~/infra/binutils/wrapper.sh.tmpl; change them by scaling both the --max-time and --max-cp-time arguments. The --max-time argument is the maximum total time allowed per target, while --max-cp-time is the maximum time allowed for the concrete path induced by the selected seed input, as described in Section 4.3 (Symbolic Exploration) of the paper.
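Scaling both arguments by the same factor could be automated roughly as follows. This is a sketch under a loudly stated assumption: the templates are taken to spell the options as --max-time=N and --max-cp-time=N with N a plain integer, which should be checked against the actual wrapper.sh.tmpl files before use. The function name scale_timeouts is hypothetical.

```shell
# Multiply the numeric values of --max-time=N and --max-cp-time=N by FACTOR.
# Assumption: the options are written in the "--option=integer" form.
scale_timeouts() {   # usage: scale_timeouts FACTOR < template > new-template
    awk -v f="$1" '{
        out = ""
        # Consume the line left to right, rewriting one option per iteration.
        while (match($0, /--max-(cp-)?time=[0-9]+/)) {
            arg = substr($0, RSTART, RLENGTH)
            eq  = index(arg, "=")
            out = out substr($0, 1, RSTART - 1) substr(arg, 1, eq) int(substr(arg, eq + 1) * f)
            $0  = substr($0, RSTART + RLENGTH)
        }
        print out $0
    }'
}

# Made-up template line, for illustration only:
echo 'klee --max-time=900 --max-cp-time=300 prog.bc' | scale_timeouts 2
# prints: klee --max-time=1800 --max-cp-time=600 prog.bc
```

Writing the result to a new file and comparing it with the original template before replacing it keeps the change easy to review.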

Note 2: the diffutils compile scripts, and therefore the diffutils experiments, require an internet connection to download program dependencies.

Running the experiments is straightforward, with the general invocation ./analyze-patches-multiple.sh program start-revision end-revision. Both the start and end revisions are integers, translated to actual git revisions as HEAD~revision. Therefore 0 is the most recent revision from the repository snapshots available in /repos/. More details about the revisions that we analyzed are available in the Experimental Evaluation section of our paper.

binutils contains significantly more revisions and patches than diffutils and findutils. While the diffutils and findutils experiments take 5h30 and 10h, respectively, on our test machine, binutils runs for several days. The binutils experiments also require an additional 45GB of disk space. An alternative is to analyze a smaller number of revisions, e.g. the following command checks only revision HEAD~307: ./analyze-patches-multiple.sh binutils 307 308
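The example with 307 and 308 suggests that the range includes the start revision and excludes the end revision. Under that assumption, which is inferred from the example rather than from the script itself, the revisions covered by an invocation can be listed as follows; the function name revisions_in_range is hypothetical.

```shell
# List the git revisions that a start/end pair is assumed to cover,
# treating the range as [start, end).
revisions_in_range() {   # usage: revisions_in_range START END
    i=$1
    while [ "$i" -lt "$2" ]; do
        printf 'HEAD~%d\n' "$i"
        i=$((i + 1))
    done
}

revisions_in_range 307 308
# prints: HEAD~307
```

Inside a repository snapshot, each printed name can be resolved to a commit hash with git rev-parse.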

diffutils

$ cd ~/infra
$ ./analyze-patches-multiple.sh diffutils 0 175

findutils

$ cd ~/infra
$ ./analyze-patches-multiple.sh findutils 0 125

binutils

$ cd ~/infra
$ ./analyze-patches-multiple.sh binutils 0 2000

To re-run an experiment you have to explicitly delete the data of any previous runs, e.g. to re-run the diffutils experiment:

$ cd ~/infra
$ rm -rf diffutils/l-* diffutils/log-* diffutils/patch.size*
$ ./analyze-patches-multiple.sh diffutils 0 175

Note: The VM includes the latest version of KATCH, which makes slightly different trade-offs than the version used to obtain the original (paper) results. Therefore the results will differ slightly: one fewer target will be covered for diffutils and findutils, while more than 40 additional targets will be covered for binutils.

Source code

The source code is available separately at http://srg.doc.ic.ac.uk/projects/katch/katch-src.tar.bz2. Since KATCH is based on KLEE, you can use the same instructions used to compile KLEE at the time. The earliest ones available on the web are the best starting point: http://klee.github.io/releases/docs/v1.3.0/build-llvm29/. It should also be possible to just drop the source into the VM and compile it via make.

Feedback

If you run into any issues and find solutions that would benefit others trying to use KATCH, we would appreciate it if you let us know.