docker load --input paper288_artefact.tgz
docker run --rm -it paper288 bash
The docker image drops you into the /artefact/loops directory, where most of the relevant code is present. The folder has many sub-components, which we will walk through below.
To quickly evaluate the main points of the paper, go to the Synthesis heading of the detailed description.
If you are interested in the raw data for the figures in the paper, or in how we processed the data obtained by running the experiments, we provide a spreadsheet.
This section focuses on how to extract loops from the programs. It can be skipped, as the extracted loops are already present in the directory.
The repository contains LLVM 6.0 bitcode binaries (*.bc files) of the programs we used for our evaluation. We provide the binaries for convenience; interested parties can compile the appropriate versions of the programs to obtain them.
The automatic part of the extraction process starts with the LoopFinder/LoopFind/findLoops.sh script, which is a wrapper around the LLVM pass that analyses the code and outputs raw data. The script can be invoked as:
...
Function: hash_delete_items
hash.c:197:3 call A write loop!
Function: hash_free
Function: hash_map
hash.c:227:3 loop with a pointer call
...
This tells us that function hash_delete_items has a loop in file hash.c at line 197 that is disqualified for being a write loop. Similarly, function hash_map has a loop at line 227 that was disqualified for containing a function call with pointer arguments.
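The disqualification reasons in the raw output can be tallied with standard tools. The snippet below reproduces the sample lines shown above (the exact .loopinfo format is an assumption based on that sample); piping a real .loopinfo file into grep works the same way:

```shell
# Count loops disqualified as write loops in findLoops-style output.
printf '%s\n' \
  'Function: hash_delete_items' \
  'hash.c:197:3 call A write loop!' \
  'Function: hash_free' \
  'Function: hash_map' \
  'hash.c:227:3 loop with a pointer call' |
grep -c 'write loop'
```

For the sample above this prints 1, since only the hash_delete_items loop was rejected as a write loop.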
To run the script for all the programs, we provide a make target. Running make all should create a .loopinfo file containing the raw loop information for each of the programs.
./summarizeLoopInfo.sh can then be used to summarise the data into the form presented in Table 2:
make all
ls *.loopinfo | xargs -L1 ./summarizeLoopInfo.sh
This should output:
bash 1085 944 506 174 45
diff 186 140 80 20 14
gawk 608 502 292 105 17
...
The second column is the total number of loops; the third is the number of loops without an inner loop; the fourth is the number of loops with calls to functions taking pointers; the fifth is the number of loops with writes; and the final column is the number of loops we consider in the manual filtering phase. Note that the semantics of these columns differ slightly from Table 2, but Table 2 can be obtained from them by simple arithmetic.
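As one illustration of working with this output, the fraction of loops that survive automatic filtering (last column over the total) can be computed directly from the summarised lines; the input below is the sample output shown above:

```shell
# Per program: fraction of all loops that reach the manual filtering
# phase (column 6 divided by column 2), using the sample rows above.
printf '%s\n' \
  'bash 1085 944 506 174 45' \
  'diff 186 140 80 20 14' \
  'gawk 608 502 292 105 17' |
awk '{ printf "%s %.1f%%\n", $1, 100 * $6 / $2 }'
```

This prints 4.1% for bash, 7.5% for diff and 2.8% for gawk, showing how aggressively the automatic filters prune the candidate set.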
Loops were then extracted manually, and all the extracted loops were put in the FoundLoops.c file. The file is structured as a sequence of comments indicating each loop's location, followed by the code snippet of the loop.
For the first loop in FoundLoops.c, we would first run:
grep BINGO m4.loopinfo
builtin.c:1458:7 call BINGO! loop reads from 1 pointers
builtin.c:1412:3 call BINGO! loop reads from 1 pointers
...
We would then go to the m4 source code, open the builtin.c file, and extract the first loop:
//m4-1.4.18/src/builtin.c:1458:7
int len = strlen (str);
for (i = len; i > 1; i--)
  if (str[i - 1] != 'X')
    break;
//m4-1.4.18/src/builtin.c:1412:3
...
The loops are then given a driver and separated into files with:
cat FoundLoops.c | ./extract-loops.py
This creates a loopFiles directory containing all the loops. The file names follow the convention <utility-name>_<source file of the loop>_<source line of the loop>.c. The files in the loopFiles directory were then split: loops that match our criteria were put into the gdLoops folder, and the other loops were put into one of the badLoops sub-folders, corresponding to the reason they were filtered out.
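The naming convention can be decoded with plain shell parameter expansion. The snippet below uses git_commit_628.c purely as an illustration, and assumes the utility name itself contains no underscores:

```shell
# Decode <utility>_<source-file>_<line>.c into its parts.
name=git_commit_628.c
base=${name%.c}       # strip the .c extension
line=${base##*_}      # after the last underscore: the source line
rest=${base%_*}       # drop the line number
util=${rest%%_*}      # before the first underscore: the utility
src=${rest#*_}        # the remainder: the source file (sans extension)
echo "$util $src $line"
```

For git_commit_628.c this prints "git commit 628", i.e. the loop from git's commit.c at line 628.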
cd badLoops
./summarize.sh
This gives the data presented in Section 4.1.2.
The 115 loops we evaluate our synthesis on are found in the gdLoops/ folder. The folder contains a Makefile, which is the entry point for the synthesis experiments. We assume gdLoops/ is your working directory for the rest of this section.
To synthesise a program, invoke make with the *.prog target. It links the loop with the code in the synthesize/ directory and runs it with KLEE. When KLEE terminates, it outputs the synthesised program, which is then extracted from KLEE's output into the .prog file. We recommend using hexdump -C to view the synthesised programs. For example, to synthesise and view the loop from git, file commit.c at line 628, run:
make git_commit_628.prog
hexdump -C git_commit_628.prog
Note that during the synthesis KLEE reports memory errors. These can be ignored: when executing functions on symbolic strings there are naturally paths that cause memory errors, and for simplicity and clarity the implementation of the synthesis program does not carefully constrain the symbolic strings to rule such paths out. The last error reported is an assertion failure, which indicates a successful synthesis. Making successful synthesis an assertion failure enabled us to easily terminate KLEE at that point.
There are 4 variables you can tweak through make to influence the synthesis: the length of the synthesised program, the number of characters to show equivalence with, the timeout, and the gadgets to synthesise with. For example:
make -B TIME=10 PROGRAM_SIZE=5 EXAMPLE_SIZE=3 GADGETS='X P F' git_commit_628.prog
will synthesise a program for the same loop, with a synthesised program size of 5, proving equivalence up to 3 characters, using only 3 gadgets (X P F), and with a timeout of 10 seconds.
To run the synthesis over all loops use:
make synth
Note that it might take a while, depending on the timeout you use. The variables described above still apply. To count the number of successfully synthesised loops you can use:
ls *.prog | wc -l
We provide some convenience scripts. ./genResults.sh creates results.html, which makes a best effort to show a highlighted snippet of each loop with the hexdump of its synthesised program below. ./printTodo.sh outputs all the loops that have not been synthesised yet.
The original loop and the synthesised program can then be cross-checked using the *.verify target of the makefile. This can serve as a sanity check that synthesis ran correctly. You can also change the bound up to which to check via EXAMPLE_SIZE; it can be larger here than in synthesis, as this is a cheaper run.
make EXAMPLE_SIZE=5 git_commit_628.verify
If KLEE doesn't terminate with an assertion failure, the run showed equivalence up to the bound.
make ver can be used to perform this cross-check for all the programs synthesised in the gdLoops directory.
Here we run the loop summaries with two versions of KLEE. One is the current master version (klee in the docker image), which gathers the constraints in the theory of bit-vectors and arrays. It does not use loop summaries, that is, it executes the loops as-is. The second version is our modification of KLEE, which gathers the constraints in the theory of strings. It therefore makes use of loop summaries, in that it doesn't execute the loops of the string functions but directly translates them into constraints. The binary of this version of KLEE is klee-str in the docker image.
To perform the comparison, use the *.compare target of the makefile. For our example loop:
make git_commit_628.compare
cat git_commit_628.compare
git_commit_628.nostr 0.12
git_commit_628.str 0.03
This means the vanilla version terminated in 0.12s, whereas the string version terminated in 0.03s.
EXAMPLE_SIZE can again be used to change the lengths of
the strings to do the comparison on.
make comp can be used to perform the comparison on all the synthesised
programs in the
gdLoops directory. This can be used to get the data for Figure 4.
We used cat *.compare to get the data, then pasted it into Google Sheets to pivot and plot it.
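If you prefer the command line to a spreadsheet, the per-loop speedup can be computed from the concatenated .compare data. The two input lines below reproduce the sample output above; the nostr-line-then-str-line pairing is an assumption based on that sample:

```shell
# Compute the nostr/str speedup per loop from .compare-style lines.
printf '%s\n' \
  'git_commit_628.nostr 0.12' \
  'git_commit_628.str 0.03' |
awk '/\.nostr/ { t = $2; next }      # remember the vanilla time
     { printf "%s %.1fx\n", $1, t / $2 }'  # divide by the string time
```

For the sample pair this prints "git_commit_628.str 4.0x", matching the 0.12s versus 0.03s timings above.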
The data for Figure 3 can be obtained by running the script. It calls make comp with different values for EXAMPLE_SIZE and puts the results into compare-[number].cexp files. These can then again be put in a spreadsheet, pivoted, and plotted. Note that klee-str fails on a small fraction of the loops, due to implementation bugs. We ignore those.
Of course the actual performance numbers you will obtain are sensitive to the machine you use, but you should see the same kind of speed-ups as reported in the paper.
The *.native.run make target provides the facility to compare the summarised loop with the original implementation. For example:
make git_commit_628.native.run
cat git_commit_628.native.run
Loop function took 0.421490 seconds
Native took 0.205331 seconds
Speedup 2.052734
summary 0.421490 0.205331 2.052734
This means the original function took 0.42s and the synthesised program took 0.21s, a speedup of 2x.
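The reported speedup is simply the ratio of the two timings, which is easy to check directly from the summary line:

```shell
# Sanity-check the Speedup column: first timing divided by the second.
awk 'BEGIN { printf "%.6f\n", 0.421490 / 0.205331 }'
```

This prints 2.052734, matching the Speedup value in the output above.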
make native_comp can be used to run this for all the
synthesised programs. The plots can be obtained by copying the concatenated data
into a spreadsheet.
To evaluate the refactoring potential the synthesised programs can be inspected as described previously. We provide links to the patches:
To perform iterative deepening, first synthesise some programs in the gdLoops directory as described above (i.e. with make synth). Then go to
At a high level, iterative deepening works by copying the directory and then performing synthesis there. The make do target does this automatically. It will run the iterative deepening with a 1 hour timeout and create a 1h.exp directory, with 1.dir 2.dir 3.dir ... subfolders corresponding to the maximum program size the synthesis was performed with in that subfolder.
make 1h.dat can be run to extract the number of synthesised loops per maximum program size. To get the data for the plot, the runAll.sh script can be used to run the experiments. Then run make 30s.dat 3min.dat 10min.dat 1h.dat to extract the data files.
Optimising the vocabulary is performed in the gadgets/ directory by running the ./optimize.py script. It performs the Gaussian optimisation, creating directories of the form exp-[vocabulary used]-[timeout]-[program size].
For example, it could create an exp-PBXE-300s-7 directory, which is a copy of the gdLoops folder where synthesis was performed using only the P B X E elements of the vocabulary, with a 300s timeout and a program size of 7.
The printResults.sh script prints the name of each such folder alongside the number of synthesised programs in that folder, giving the data for Table 4.
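We have not reproduced printResults.sh itself here, but a minimal sketch of the count it reports, assuming each exp-* folder is a gdLoops copy holding the synthesised .prog files, would be:

```shell
# For every exp-* directory, print its name and how many .prog files
# it contains (hypothetical re-implementation of the counting step).
for d in exp-*; do
  printf '%s %d\n' "$d" "$(ls "$d"/*.prog 2>/dev/null | wc -l)"
done
```

The 2>/dev/null silences ls when a folder synthesised nothing, so such folders are reported with a count of 0 rather than an error.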
Note that there is some randomness in the Gaussian optimisation so your results
might not be exactly the same.
The above process can take a large amount of time. To run this experiment quickly, you can delete loops in the gdLoops directory. In the example below we cherry-picked 3 loops that demonstrate the benefits of optimising the vocabulary:
$ cd gdLoops/
$ ls *.c | grep -v git_name-hash_60 | grep -v patch_inp_479 | grep -v patch_pch_1223 | xargs rm
$ cd ../gadgets/
$ ./optimize.py
$ ./printResult.sh
exp-CPNBZXEV-300s-7 0
exp-MPNBZIFV-300s-7 3
exp-CZIVS-300s-7 0
exp-MNBXIFV-300s-7 2
exp-MNBZS-300s-7 0
exp-MPBEIFV-300s-7 3
exp-MPNBZFV-300s-7 2
exp-MPNEIFV-300s-7 2
exp-MPNZXEIF-300s-7 2
exp-MPNZXF-300s-7 1
exp-MPZXEIF-300s-7 2
exp-MPZXIF-300s-7 2
exp-MRCPBZEIFV-300s-7 2
exp-MRPNBIFV-300s-7 3
exp-PNBIFV-300s-7 2
This takes about half an hour on our machine, but it can be faster if the Gaussian process decides to stop earlier. These results say, for example, that the CPNBZXEV vocabulary synthesised 0/3 loops, while MPNBZIFV synthesised 3/3. The CPNBZXEV vocabulary contained two of the more expensive gadgets (strchr (C) and strlen (E)), which made it too expensive to synthesise any of the loops within 5 minutes; it is also missing the M gadget, which is needed for patch_inp_479. MPNBZIFV, meanwhile, is a smaller vocabulary that even synthesised the reverse loop (git_name-hash_60), which doesn't get synthesised with a 2 hour timeout (the loops were cherry-picked to demonstrate this).
So far we have presented the individual components of our experiments and how you can play around with them. Running cd FullExp && make will run all the experiments we show in the paper, with the configuration we used.
It will create copies of the above folders and run the experiments there, which you can inspect with the methods presented above.