We are making KATCH available for download as a self-contained virtual machine (Fedora 16, 64-bit). Since the image is very large, please email Cristian Cadar if you would like to retrieve it.
To use the VM, you need VMware Workstation or VMware Player. The login credentials are username katch and password kleepatch. Make sure that you have at least 20GB of free disk space for the virtual machine. Roughly 70GB are required to re-run all experiments.
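The disk space requirements can be checked before starting. The following is a minimal sketch, not part of the KATCH infrastructure: the helper name has_free_gb is hypothetical, and which directory to check depends on where the VM image and the experiment data are stored.

```shell
# Hypothetical helper (not shipped with KATCH).
# Succeeds if the filesystem holding DIR has at least NEEDED_GB gigabytes free.
# `df -Pk` prints POSIX-format output in 1024-byte blocks; column 4 of the
# second line is the available space.
has_free_gb() {
    dir=$1
    needed_gb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((needed_gb * 1024 * 1024)) ]
}
```

For example, `has_free_gb "$HOME" 70 || echo "about 70GB are needed to re-run all experiments"` prints a warning when the filesystem holding the home directory is too full.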
We first describe how to analyze our readily available results and then how to reproduce them by running KATCH.
We make our result files directly available in the ~/infra/results/ folder. They can be viewed using a text editor or analyzed with several scripts available in the ~/infra/postprocessing folder.
First, extract the results:
$ cd ~/infra/results
$ tar xjf results.tar.bz2
Some general statistics for all revisions of a particular program can be viewed using the totals.sh script. The following examples use diffutils but the procedure is similar for findutils and binutils.
$ cd ~/infra/results/diffutils
$ ../../postprocessing/totals.sh
The totals.sh script outputs a summary of all patch basic blocks (targets).
        all   static  dynamic  noexec  unreach  novered  covered
------------------------------------------------------------------------------------------
diff    840   136     93       0       0        40       53
diff3   840   11      4        0       0        1        3
sdiff   840   11      7        0       0        3        4
cmp     840   8       4        0       0        1        3
Total   840   166     108      0       0        45       63
The all column lists the total number of lines contained or modified by the patches, and the static column lists the total number of basic blocks corresponding to these lines (equivalent to the Targets column in Table 1 of our paper). The dynamic column lists the total number of basic blocks not covered by the regression tests. In other words, static - dynamic equals the number of basic blocks covered by the test suite (third column of Table 1). The covered column lists the number of basic blocks covered by KATCH in addition to those covered by the regression tests, i.e. static - dynamic + covered represents the targets covered by the test suite and KATCH together (last column of Table 1). Finally, the novered column lists the number of basic blocks KATCH could not cover. For diff in the output above, for example, the test suite covers 136 - 93 = 43 basic blocks and KATCH covers 53 more, for a total of 96, leaving 40 uncovered.
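The derived quantities described above can be computed directly from the script output. The sketch below is not one of the provided postprocessing scripts. The function name coverage_summary is hypothetical, and it assumes the column layout shown above: program name followed by all, static, dynamic, noexec, unreach, novered and covered.

```shell
# Hypothetical helper (not shipped with KATCH).
# Reads totals.sh-style output on stdin and prints, per program:
#   name  targets(static)  covered-by-test-suite  covered-by-suite-plus-KATCH
# Assumed columns: $1=name $2=all $3=static $4=dynamic ... $8=covered.
# The header and separator lines do not have 8 fields and are skipped.
coverage_summary() {
    awk 'NF == 8 && $2 ~ /^[0-9]+$/ {
        suite = $3 - $4                      # static - dynamic
        printf "%s %d %d %d\n", $1, $3, suite, suite + $8
    }'
}
```

It can be used as `../../postprocessing/totals.sh | coverage_summary` from within a results folder.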
Per-revision details can be viewed in the patch.size file, e.g. ~/infra/results/diffutils/patch.size, and per-revision per-program details can be viewed in patch.size.program-name, e.g. ~/infra/results/diffutils/patch.size.sdiff, in a similar format.
A visual representation of the distribution of distances to each target (as defined in our paper), along with their covered/not covered status, can be obtained using the mindistances2.sh script as in the example below:
$ cd ~/infra/results/diffutils
$ ../../postprocessing/mindistances2.sh
$ evince ~/infra/postprocessing/histograms/targets-distances2.pdf &
The mindistances2.sh script creates a histogram of the targets based on their distance from the closest test input. The result is stored in ~/infra/postprocessing/histograms/targets-distances2.pdf. This example creates the histogram for diffutils because it is executed in the diffutils folder. To obtain an aggregate result for all systems, similar to Figure 8 in our paper, use the mindistances2-aggregate.sh script:
$ cd ~/infra/results
$ ../postprocessing/mindistances2-aggregate.sh diffutils findutils binutils
$ evince ~/infra/postprocessing/histograms/targets-distances2-aggregate.pdf &
The result is saved in ~/infra/postprocessing/histograms/targets-distances2-aggregate.pdf.
Note that for presentation purposes, Figure 8 omits one outlier target.
The raw data from which the coverage information is obtained is stored in klee-out-n folders, one for each target that KATCH attempts to cover. These folders are created in the same location as the executable into which the target is compiled. For example, for the second target in the diff program from revision HEAD~18, the raw results are stored in the folder ~/infra/results/diffutils/l-18/src/klee-out-1/. Such a folder contains a .patch.cov file whenever the target is covered and an associated .ktest file containing the inputs required for reaching the target (see the next section for more details on extracting the inputs).
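Because a covered target is marked by a .patch.cov file, the covered targets can be enumerated with find. This is a minimal sketch rather than one of the provided scripts: the function name list_covered_targets is hypothetical, and it assumes the marker files end in .patch.cov.

```shell
# Hypothetical helper (not shipped with KATCH).
# Prints the directory of every *.patch.cov file located under a klee-out-*
# folder below ROOT, one directory per line, without duplicates.
list_covered_targets() {
    root=$1
    find "$root" -type f -name '*.patch.cov' -path '*/klee-out-*/*' \
        -exec dirname {} \; | sort -u
}
```

For example, `list_covered_targets ~/infra/results/diffutils/l-18` would list the folders of the targets covered in that revision.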
When finding a bug, KATCH generates an .err file containing the error details and a .ktest file containing the inputs for reproducing the error. These inputs can be viewed using the ktest-tool program, e.g.
$ cd ~/infra/results
$ ktest-tool binutils/l-1831/binutils/klee-out-2/test000003.ktest
ktest file : 'binutils/l-1831/binutils/klee-out-2/test000003.ktest'
args : ['/data/benchmarking/patchtesting/binutils-1800/l-1831/binutils/objdump.bc', '-W', 'tmpdir/dw2-compressed.o']
num objects: 12
object 0: name: 'argv'
object 0: size: 4
object 0: data: '-W\x00\x00'
object 1: name: 'argv'
object 1: size: 25
object 1: data: 'tmpdir/dw2-compressed.o\x00\x00'
...
$ mkdir tmpdir
$ ktest-tool --extract-file=tmpdir/dw2-compressed.o binutils/l-1831/binutils/klee-out-2/test000003.ktest
$ file tmpdir/dw2-compressed.o.test
tmpdir/dw2-compressed.o.test: ELF 64-bit LSB no file type, x86-64, invalid version (SYSV)
$ valgrind /usr/bin/objdump -W tmpdir/dw2-compressed.o.test
...
==10969== Invalid read of size 1
...
The output of the ktest-tool command shows the program that was run (objdump) and the command-line arguments that trigger the bug (-W tmpdir/dw2-compressed.o).
To extract the file contents which trigger the bug, use the --extract-file argument of ktest-tool and then pass the file to the stock objdump program installed with the Linux distribution to confirm the bug.
More details on the ktest file format are available in the KLEE documentation at http://klee.llvm.org/TestingCoreutils.html (Step 6: Replaying KLEE generated test cases).
KATCH identified 14 distinct bugs in binutils, 12 of which were also present in the latest version. We grouped these bugs by their underlying cause and filed 7 bug reports, which were fixed by the developers.
Note 1: before running the experiments you may have to adjust the KATCH timeout. The timeout for a native (non-VM) experiment on an Intel Xeon@3.50GHz was 10 minutes (diffutils and findutils) and 15 minutes (binutils). As a precaution, we have already increased them in the VM image to 15 and 20 minutes, respectively. You may want to further adjust these values depending on the machine that you are using, and avoid running other CPU-intensive applications at the same time. The diffutils and findutils timeout has to be changed in ~/infra/wrapper.sh.tmpl and the binutils timeout has to be changed in ~/infra/binutils/wrapper.sh.tmpl, by scaling both the --max-time and --max-cp-time arguments. The --max-time argument represents the total maximum allowed time per target, while the --max-cp-time argument represents the maximum allowed time for the concrete path induced by the selected seed input, as described in section 4.3 (Symbolic Exploration) of the paper.
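Scaling both arguments by the same factor can be scripted. The sketch below rests on an assumption that should be checked against the actual template files: that the timeouts appear in the form --max-time=N and --max-cp-time=N with integer values. The function name scale_timeouts is hypothetical. It writes the scaled template to standard output and leaves the input file untouched.

```shell
# Hypothetical helper (not shipped with KATCH).
# Prints FILE with every --max-time=N and --max-cp-time=N value multiplied
# by FACTOR and rounded to the nearest integer.
# ASSUMPTION: the template uses the "--flag=integer" form.
scale_timeouts() {
    factor=$1
    file=$2
    awk -v f="$factor" '{
        out = ""
        rest = $0
        while (match(rest, /--max-(cp-)?time=[0-9]+/)) {
            tok = substr(rest, RSTART, RLENGTH)   # e.g. --max-time=900
            eq = index(tok, "=")
            val = substr(tok, eq + 1)             # numeric part
            out = out substr(rest, 1, RSTART - 1) substr(tok, 1, eq) int(val * f + 0.5)
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$file"
}
```

A possible workflow is to run `scale_timeouts 1.5 wrapper.sh.tmpl > wrapper.sh.tmpl.new`, inspect the result, and only then replace the original template.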
Note 2: the diffutils compile scripts, and therefore the diffutils experiments, require an internet connection to download program dependencies.
Running the experiments is straightforward, with the general invocation ./analyze-patches-multiple.sh program start-revision end-revision.
Both the start and end revisions are integers, translated to actual git revisions as HEAD~revision. Therefore 0 is the most recent revision from the repository snapshots available in /repos/. More details about the revisions that we analyzed are available in the Experimental Evaluation section of our paper.
binutils contains significantly more revisions and patches than diffutils and findutils. While the diffutils and findutils experiments take 5h30 and 10h, respectively, on our test machine, binutils runs for several days. The binutils experiments also require an additional 45GB of disk space. An alternative is to analyze a smaller number of revisions, e.g. the following command checks only revision HEAD~307:
$ ./analyze-patches-multiple.sh binutils 307 308
diffutils
$ cd ~/infra
$ ./analyze-patches-multiple.sh diffutils 0 175
findutils
$ cd ~/infra
$ ./analyze-patches-multiple.sh findutils 0 125
binutils
$ cd ~/infra
$ ./analyze-patches-multiple.sh binutils 0 2000
To re-run an experiment you have to explicitly delete the data of any previous runs, e.g. to re-run the diffutils experiment:
$ cd ~/infra
$ rm -rf diffutils/l-* diffutils/log-* diffutils/patch.size*
$ ./analyze-patches-multiple.sh diffutils 0 175
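Since the deletion is irreversible, it may help to list what would be removed first. The following is a sketch of such a dry run, using the same three patterns as the rm command above. The function name list_run_artifacts and the optional base-directory argument are hypothetical.

```shell
# Hypothetical helper (not shipped with KATCH).
# Lists the per-run artifacts of PROGRAM under BASE (default: ~/infra) that
# match l-*, log-* and patch.size*. Nothing is removed.
list_run_artifacts() {
    program=$1
    base=${2:-$HOME/infra}
    for path in "$base/$program"/l-* "$base/$program"/log-* "$base/$program"/patch.size*; do
        # An unmatched glob stays literal in sh, so skip non-existent paths.
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```

Running `list_run_artifacts diffutils` prints the folders and files of a previous diffutils run. If the list looks right, the rm command can be run as shown.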
Note: The VM includes the latest version of KATCH, which makes slightly different trade-offs than the version used to obtain the original (paper) results. Therefore the results will differ slightly: one fewer target will be covered for diffutils and findutils, while more than 40 additional targets will be covered for binutils.
The source code is available separately at http://srg.doc.ic.ac.uk/projects/katch/katch-src.tar.bz2. Since KATCH is based on KLEE, you can follow the instructions used to compile KLEE at the time; the earliest ones available on the web are the best starting point: http://klee.github.io/releases/docs/v1.3.0/build-llvm29/. It should also be possible to just drop the source into the VM and compile it via make.
If you run into any issues and find solutions that would benefit others trying to use KATCH, we would appreciate it if you let us know.