docker load --input paper288_artefact.tgz
docker run --rm -it paper288 bash
The Docker image drops you into the /artefact/loops directory, where most of the relevant code lives. The folder has many sub-components, and we will walk through them in turn.
To quickly evaluate the main points of the paper, go to the Synthesis heading of the detailed description. If you are interested in the raw data for the figures in the paper, or in how we processed the data obtained by running the experiments, we provide a spreadsheet.
This section describes how to extract loops from the programs. It can be skipped, as the extracted loops are already present in the directory.
The repository contains LLVM 6.0 bitcode binaries (*.bc files) of the programs we used for our evaluation. We provide the binaries for convenience; interested parties can instead compile the appropriate versions of the programs to obtain them.
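For reference, a bitcode file can be produced with clang along the following lines (a sketch only: the file name is illustrative, -g is needed for the source locations the pass reports, and full utilities typically require a whole-program bitcode build, e.g. via a wrapper such as wllvm):
clang-6.0 -c -g -emit-llvm -o make.bc make.c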
The automatic part of the extraction process starts with the LoopFinder/LoopFind/findLoops.sh script, which is a wrapper around the LLVM pass that analyses the code and outputs the raw data. The script can be invoked as:
LoopFinder/LoopFind/findLoops.sh make.bc
which outputs
...
Function: hash_delete_items
hash.c:197:3 call A write loop!
Function: hash_free
Function: hash_map
hash.c:227:3 loop with a pointer call
...
This tells us that function hash_delete_items has a loop in file hash.c at line 197, which was disqualified for being a write loop. Similarly, function hash_map has a loop at line 227 that was disqualified for containing a function call with pointer arguments.
To run the script for all the programs, we provide a make target. Running make all should create a .loopinfo file containing the raw loop information for each of the programs. ./summarizeLoopInfo.sh can then be used to summarise this data into the form presented in Table 2:
make all
ls *.loopinfo | xargs -L1 ./summarizeLoopInfo.sh
This should output:
bash 1085 944 506 174 45
diff 186 140 80 20 14
gawk 608 502 292 105 17
...
The second column is the total number of loops; the third is the number of loops without an inner loop; the fourth is the number of loops with calls to functions that take pointers; the fifth is the number of loops with writes; and the final column is the number of loops we consider in the manual filtering phase. Note that the semantics of these columns differ slightly from Table 2, but the Table 2 numbers can be obtained from them by simple arithmetic.
Loops were then extracted manually, and all the extracted loops were put in the FoundLoops.c file. The file is structured as a sequence of comments indicating each loop's location, followed by the code snippet of the loop.
For the first loop in FoundLoops.c we would first grep:
grep BINGO m4.loopinfo
giving:
builtin.c:1458:7 call BINGO! loop reads from 1 pointers
builtin.c:1412:3 call BINGO! loop reads from 1 pointers
...
We would then go to the builtin.c file in the m4 source code and extract the first loop of FoundLoops.c:
//m4-1.4.18/src/builtin.c:1458:7
int len = strlen (str);
for (i = len; i > 1; i--)
if (str[i - 1] != 'X')
break;
//m4-1.4.18/src/builtin.c:1412:3
...
The loops are then given a driver and separated into files with the extract-loops.py
script:
cat FoundLoops.c | ./extract-loops.py
This creates a loopFiles directory containing all the loops. The file names follow the convention <utility-name>_<source file of the loop>_<source line of the loop>.c; for example, the loop above becomes m4_builtin_1458.c. The files in the loopFiles directory were then split into loops that match our criteria, which were put into the gdLoops folder. The remaining loops were put into one of the badLoops sub-folders, corresponding to the reason they were filtered out.
cd badLoops
./summarize.sh
This gives the data presented in Section 4.1.2.
The 115 loops on which we evaluate our synthesis are found in the gdLoops/ folder. In addition, the gdLoops/ folder contains a Makefile, which is the entry point for the synthesis experiments. We assume gdLoops/ is your working directory for this section.
To synthesise a program, invoke make with the corresponding *.prog target. This links the loop with the code in the synthesize/ directory and runs it with KLEE. When KLEE terminates, it outputs the synthesised program, which is then extracted from KLEE's output into the .prog file. We recommend using hexdump -C to view the synthesised programs. For example, to synthesise and view the loop from git, file commit.c at line 628, run:
make git_commit_628.prog
hexdump -C git_commit_628.prog
Note that during synthesis KLEE reports memory errors. These can be ignored: when executing functions on symbolic strings there are naturally paths that cause memory errors, and for simplicity and clarity of implementation the synthesis program does not carefully constrain the symbolic strings to rule out such paths. The last error reported is an assertion failure, which indicates a successful synthesis. Making successful synthesis manifest as an assertion failure enabled us to easily terminate KLEE at that point.
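To illustrate this mechanism, here is a minimal sketch of the driver pattern (not the actual synthesize/ code; candidate_matches_loop is a hypothetical helper):
#include <assert.h>
#include "klee/klee.h"

#define PROGRAM_SIZE 7

/* hypothetical: returns non-zero if the candidate program behaves like the
   original loop on all strings up to the equivalence bound */
int candidate_matches_loop(const char *prog);

int main(void) {
  char prog[PROGRAM_SIZE];
  /* make the candidate program symbolic so KLEE searches over its contents */
  klee_make_symbolic(prog, sizeof(prog), "prog");
  if (candidate_matches_loop(prog))
    assert(0); /* the "failure" signals success, so KLEE can stop here */
  return 0;
}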
There are four variables you can tweak through make to influence the synthesis: the length of the synthesised program, the number of characters up to which to show equivalence, the timeout, and the gadgets to synthesise with. For example:
make -B TIME=10 PROGRAM_SIZE=5 EXAMPLE_SIZE=3 GADGETS='X P F' git_commit_628.prog
will synthesise a program for the same loop, with a synthesised program size of 5, proving equivalence up to 3 characters, using only the three gadgets X, P and F, and with a timeout of 10 seconds.
To run the synthesis over all loops use:
make synth
Note that this might take a while, depending on the timeout you use. The variables described above still apply. To count the number of successfully synthesised loops, you can use:
ls *.prog | wc -l
We provide some convenience scripts. ./genResults.sh creates results.html, which makes a best effort to show a highlighted snippet of each loop with the hexdump of the synthesised program below it. ./printTodo.sh outputs all the loops that weren't synthesised.
The original loop and the synthesised program can then be cross-checked using the *.verify target of the makefile. This can serve as a sanity check that synthesis ran correctly. You can also change the bound up to which to check with EXAMPLE_SIZE. It can be larger here than during synthesis, as verification is a cheaper run.
make EXAMPLE_SIZE=5 git_commit_628.verify
If KLEE does not terminate with an assertion failure, the run has shown equivalence up to the bound.
make ver can be used to perform this cross-check for all the programs synthesised in the gdLoops directory.
Here we run the loop summaries with two versions of KLEE. One is the current master version (klee in the Docker image), which gathers the constraints in the theory of bit-vectors and arrays. It does not use loop summaries, that is, it executes the loops as they are. The second version is our modification of KLEE, which gathers the constraints in a theory of strings. It therefore makes use of loop summaries: rather than executing the loops of string functions, it translates them directly into constraints. The binary of this version of KLEE is located at /artefact/str.KLEE/bin/klee-str.
To perform the comparison, use the *.compare target of the makefile. For our running example:
make git_commit_628.compare
cat git_commit_628.compare
which outputs:
git_commit_628.nostr 0.12
git_commit_628.str 0.03
This means the vanilla version terminated in 0.12s, whereas the string version terminated in 0.03s. EXAMPLE_SIZE can again be used to change the lengths of the strings on which the comparison is done.
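For example, to redo the comparison with longer strings (a sketch; -B forces a re-run, as in the synthesis example above):
make -B EXAMPLE_SIZE=5 git_commit_628.compare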
make comp can be used to perform the comparison on all the synthesised programs in the gdLoops directory. This can be used to get the data for Figure 4. We used cat *.compare to gather the data, then pasted it into Google Sheets to pivot and plot it.
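If you just want a quick look at the speed-ups without a spreadsheet, a small awk one-liner along these lines can pair up the timings (a sketch, assuming the two-line format shown above):
awk '/\.nostr/ {t=$2} /\.str/ {printf "%s %.1fx\n", $1, t/$2}' *.compare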
The data for Figure 3 can be obtained by running the ./runCompareExp.sh script. It calls make comp with different values of EXAMPLE_SIZE and puts the results in compare-[number].cexp files. These can then again be put into a spreadsheet, pivoted and plotted. Note that klee-str fails on a small fraction of the loops due to implementation bugs; we ignore those. Of course, the actual performance numbers you obtain are sensitive to the machine you use, but you should see the same kind of speed-ups as reported in the paper.
The *.native.run make target provides a facility for comparing the summarised loop with the original implementation. For example:
make git_commit_628.native.run
cat git_commit_628.native.run
This outputs:
Loop function took 0.421490 seconds
Native took 0.205331 seconds
Speedup 2.052734
summary 0.421490 0.205331 2.052734
This means the original loop function took 0.42s and the synthesised program took 0.21s, a speedup of about 2x. make native_comp can be used to run this for all the synthesised programs. The plots can be obtained by copying the concatenated data into a spreadsheet.
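For example, to gather all the summary lines for pasting into a spreadsheet (assuming each .native.run file contains a summary line as shown above):
grep -h '^summary' *.native.run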
To evaluate the refactoring potential, the synthesised programs can be inspected as described previously. We provide links to the patches:
To perform iterative deepening, first synthesise some programs in the gdLoops/ directory as described above (i.e. with make synth). Then go to the iterativeDeepening/ directory.
At a high level, iterative deepening works by copying the gdLoops directory and then performing synthesis there. The iterativeDeepening/Makefile does this automatically with the make do target. This runs the iterative deepening with a 1-hour timeout and creates a 1h.exp directory. The 1h.exp directory has 1.dir 2.dir 3.dir ... subfolders, corresponding to the maximum program size with which synthesis was performed in each subfolder.
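For example, starting from the gdLoops/ directory after a make synth run (assuming iterativeDeepening/ is its sibling under /artefact/loops):
cd ../iterativeDeepening/
make do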
After running make do, make 1h.dat can be run to extract the number of synthesised loops per maximum program size. To get the data for the plot in Figure 2, the runAll.sh script can be used to run the experiments. Then run
make 30s.dat 3min.dat 10min.dat 1h.dat
to extract the data files.
Optimising the vocabulary is performed in the gadgets/ directory by running the ./optimize.py script. It performs the Gaussian optimisation, creating directories of the form exp-[vocabulary used]-[timeout]-[program size]. For example, it could create an exp-PBXE-300s-7 directory, which is a copy of the gdLoops folder where synthesis was performed using only the P B X E elements of the vocabulary, with a 300s timeout and a program size of 7.
The printResults.sh script prints the name of each such folder alongside the number of synthesised programs in that folder, giving the data for Table 4. Note that there is some randomness in the Gaussian optimisation, so your results might not be exactly the same.
The above process can take a long time. To run this experiment quickly, you can delete loops in the gdLoops directory. In the example below we cherry-picked 3 loops that demonstrate the benefits of optimising the vocabulary:
$ cd gdLoops/
$ ls *.c | grep -v git_name-hash_60 | grep -v patch_inp_479 | grep -v patch_pch_1223 | xargs rm
$ cd ../gadgets/
$ ./optimize.py
$ ./printResult.sh
exp-CPNBZXEV-300s-7 0
exp-MPNBZIFV-300s-7 3
exp-CZIVS-300s-7 0
exp-MNBXIFV-300s-7 2
exp-MNBZS-300s-7 0
exp-MPBEIFV-300s-7 3
exp-MPNBZFV-300s-7 2
exp-MPNEIFV-300s-7 2
exp-MPNZXEIF-300s-7 2
exp-MPNZXF-300s-7 1
exp-MPZXEIF-300s-7 2
exp-MPZXIF-300s-7 2
exp-MRCPBZEIFV-300s-7 2
exp-MRPNBIFV-300s-7 3
exp-PNBIFV-300s-7 2
This takes about half an hour on my machine, but it can be faster if the GP decides to stop earlier. These results say, for example, that the CPNBZXEV vocabulary synthesised 0/3 loops, while MPNBZIFV synthesised 3/3. The CPNBZXEV vocabulary included two of the more expensive gadgets (strchr (C) and strlen (E)), which made it too expensive to synthesise any of the loops in 5 minutes, and it is also missing the M gadget, which is needed for patch_inp_479. MPNBZIFV, with its cheaper vocabulary, even synthesised the reverse loop (git_name-hash_60), which does not get synthesised with the 2-hour timeout (the loops were cherry-picked to demonstrate this).
So far we have presented the individual components of our experiments and how you can play around with them. Running cd FullExp && make will run all the experiments we show in the paper with the configuration we used. It will create copies of the above folders and run the experiments there; you can inspect them with the methods presented above.