Getting started

  1. Get the docker image here
  2. Install Docker; for Debian, the install instructions are here
  3. Load the docker image: docker load --input paper288_artefact.tgz
  4. Run the docker image: docker run --rm -it paper288 bash
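
Before step 4, you can sanity-check that the image loaded correctly; listing the local images should show paper288:

docker images | grep paper288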

The docker image drops you into the /artefact/loops directory, where most of the relevant code is present. The folder has many sub-components, and we will walk through them in turn.

To quickly evaluate the main points of the paper, go to the Synthesis heading of the detailed description.

If you are interested in the raw data for the figures in the paper, or in how we processed the data obtained by running the experiments, we provide a spreadsheet.

Detailed description

Extracting loops

This section focuses on how to extract loops from the programs. It can be skipped, as the extracted loops are already present in the directory.

The repository contains LLVM 6.0 bitcode binaries (*.bc files) of the programs we used for our evaluation. We provide the binaries for convenience; interested parties can compile the appropriate versions of the programs to obtain them.
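
As a rough illustration of what that involves (not the exact build we used): a single translation unit can be compiled to bitcode with clang 6.0 as below, while whole-program bitcode is typically obtained with a wrapper such as wllvm and its extract-bc tool.

clang-6.0 -emit-llvm -c hash.c -o hash.bc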

The automatic part of the extraction process starts with the LoopFinder/LoopFind/findLoops.sh script, which is a wrapper around the LLVM pass that analyses the code and outputs raw data. The script can be invoked as:

LoopFinder/LoopFind/findLoops.sh make.bc

which outputs

...
Function: hash_delete_items
hash.c:197:3 call  A write loop!
Function: hash_free
Function: hash_map
hash.c:227:3 loop with a pointer call
...

This tells us that the function hash_delete_items has a loop in file hash.c at line 197 that is disqualified for being a write loop. Similarly, the function hash_map has a loop at line 227 that was disqualified for containing a call to a function that takes pointers.

To run the script for all the programs, we provide a make target. Running make all should create a .loopinfo file containing the raw loop information for each of the programs. ./summarizeLoopInfo.sh can then be used to summarise the data into the form presented in Table 2:

make all
ls *.loopinfo | xargs -L1 ./summarizeLoopInfo.sh

This should output:

bash   1085   944   506   174  45
diff   186   140   80   20  14
gawk   608   502   292   105  17
...

The second column is the total number of loops, the third is the number of loops without an inner loop, the fourth is the number of loops with calls to functions that take pointers, the fifth is the number of loops with writes, and the final column is the number of loops we consider in the manual filtering phase. Note that the semantics of these columns differ slightly from Table 2, but Table 2 can be obtained from them by simple arithmetic.
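
For example, since the third column counts loops without an inner loop, the number of loops that do contain one is column 2 minus column 3; a quick sketch of that arithmetic:

ls *.loopinfo | xargs -L1 ./summarizeLoopInfo.sh | awk '{ print $1, $2 - $3 }'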

Manual phase

Loops were then extracted manually, and all the extracted loops were put in the FoundLoops.c file. The file is structured as a sequence of comments indicating each loop's location, each followed by the code snippet of that loop.

For the first loop in FoundLoops.c we would first grep:

grep BINGO m4.loopinfo

giving:

builtin.c:1458:7 call  BINGO!  loop reads from 1 pointers
builtin.c:1412:3 call  BINGO!  loop reads from 1 pointers
...

We would then go to the builtin.c file in the m4 source code and extract the loop, which becomes the first entry of FoundLoops.c:

//m4-1.4.18/src/builtin.c:1458:7

int len = strlen (str);
for (i = len; i > 1; i--)
    if (str[i - 1] != 'X')
       break;


//m4-1.4.18/src/builtin.c:1412:3
...

The loops are then given a driver and separated into files with the extract-loops.py script:

cat FoundLoops.c | ./extract-loops.py

This creates a loopFiles directory containing all the loops. The file names follow the convention <utility-name>_<source file of the loop>_<source line of the loop>.c. The files in the loopFiles directory were then split into loops that match our criteria, which were put into the gdLoops folder, and the rest, which were put into one of the badLoops sub-folders corresponding to the reason they were filtered out.
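
For instance, following this convention, the m4 loop extracted above should end up in a file named along the lines of:

loopFiles/m4_builtin_1458.c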

cd badLoops 
./summarize.sh

This gives the data presented in Section 4.1.2.

Synthesis

The 115 loops on which we evaluate our synthesis are found in the gdLoops/ folder. In addition, the gdLoops/ folder contains a Makefile, which is the entry point for the synthesis experiments. We assume gdLoops/ is your working directory for this section.

To synthesise a program, invoke make with the corresponding *.prog target. It links the loop with the code in the synthesize/ directory and runs it with KLEE. When KLEE terminates, it outputs the synthesised program, which is then extracted from KLEE's output into the .prog file. We recommend using hexdump -C to view the synthesised programs. For example, to synthesise and view the loop from git, file commit.c at line 628, run:

make git_commit_628.prog
hexdump -C git_commit_628.prog

Note that during synthesis KLEE reports memory errors. These can be ignored: when executing functions on symbolic strings there are inevitably paths that cause memory errors, and for simplicity and clarity of implementation the synthesis program does not constrain the symbolic strings carefully enough to rule such paths out. Finally, the last error reported is an assertion failure, which indicates a successful synthesis. Making successful synthesis an assertion failure enabled us to easily terminate KLEE at that point.
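
If you want to spot the success programmatically, then assuming KLEE's default output layout (a klee-last symlink pointing at the latest output directory), a successful synthesis should leave behind an assertion-failure test case:

ls klee-last/*.assert.err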

There are four variables you can tweak through make to influence the synthesis: the length of the synthesised program, the number of characters up to which to show equivalence, the timeout, and the gadgets to synthesise with. For example:

make -B TIME=10 PROGRAM_SIZE=5 EXAMPLE_SIZE=3 GADGETS='X P F' git_commit_628.prog 

will synthesise a program for the same loop, with a synthesised program size of 5, proving equivalence up to 3 characters, using only 3 gadgets (X P F), and with a timeout of 10 seconds.

To run the synthesis over all loops use:

make synth

Note that it might take a while, depending on the timeout you use. The variables described above still apply. To count the number of successfully synthesised loops you can use:

ls *.prog | wc -l

We provide some convenience scripts. ./genResults.sh creates results.html, which makes a best effort to show a highlighted snippet of each loop with the hexdump of the synthesised program below it. ./printTodo.sh outputs all the loops that weren't synthesised.
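
If you prefer a one-liner, the information given by ./printTodo.sh can be approximated (assuming each loop foo.c yields foo.prog on success) with:

for f in *.c; do [ -e "${f%.c}.prog" ] || echo "$f"; done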

Cross-check original and synthesised loop

The original loop and the synthesised program can then be cross-checked using the *.verify target of the makefile. This serves as a sanity check that synthesis ran correctly. You can also change the bound up to which equivalence is checked with EXAMPLE_SIZE. The bound can be larger here than during synthesis, as verification is a cheaper run.

make EXAMPLE_SIZE=5 git_commit_628.verify

If KLEE doesn't terminate with an assertion failure, the run showed equivalence up to the bound.

make ver can be used to perform this cross-check for all the programs synthesised in the gdLoops directory.

Evaluate loop summaries for symbolic execution (Section 4.3)

Here we run the loop summaries with two versions of KLEE. One is the current master version (klee in the docker image), which gathers the constraints in the theory of bit-vectors and arrays. It does not use loop summaries; that is, it executes each loop as-is. The second version is our modification of KLEE, which gathers the constraints in the theory of strings. It therefore makes use of loop summaries: instead of executing the loops of string functions, it translates them directly into constraints. The binary of this version of KLEE is located at /artefact/str.KLEE/bin/klee-str.

To perform the comparison, use the *.compare target of the makefile. For our running example:

make git_commit_628.compare
cat git_commit_628.compare

which outputs:

git_commit_628.nostr 0.12
git_commit_628.str 0.03

This means the vanilla version terminated in 0.12s, whereas the string version terminated in 0.03s. EXAMPLE_SIZE can again be used to change the lengths of the strings on which the comparison is done.

make comp can be used to perform the comparison on all the synthesised programs in the gdLoops directory, which yields the data for Figure 4. We used cat *.compare to collect the data, then pasted it into Google Sheets to pivot and plot it.
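
If you want a quick look at the speedups without a spreadsheet, a rough sketch (assuming the two-line .compare format shown above) is:

cat *.compare | awk '
    /\.nostr/ { sub(/\.nostr/, "", $1); nostr[$1] = $2 }
    /\.str/   { sub(/\.str/, "", $1); str[$1] = $2 }
    END { for (l in str) if (l in nostr && str[l] > 0)
              printf "%s %.2f\n", l, nostr[l] / str[l] }'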

The data for Figure 3 can be obtained by running the ./runCompareExp.sh script. It calls make comp with different values for EXAMPLE_SIZE and puts the results in compare-[number].cexp files. These can then again be put in a spreadsheet, pivoted, and plotted. Note that klee-str fails on a small fraction of the loops due to implementation bugs; we ignore those.

Of course the actual performance numbers you will obtain are sensitive to the machine you use, but you should see the same kind of speed-ups as reported in the paper.

Evaluate loop summaries for optimisation (Section 4.4)

The *.native.run make target provides the facility to compare the summarised loop with the original implementation. For example:

make git_commit_628.native.run
cat git_commit_628.native.run

This outputs:

Loop function took 0.421490 seconds
Native took 0.205331 seconds
Speedup 2.052734
summary 0.421490 0.205331 2.052734

This means the original function took 0.42s and the synthesised program took 0.21s, a speedup of roughly 2x. make native_comp can be used to run this for all the synthesised programs. The plots can be obtained by copying the concatenated data into a spreadsheet.
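
To get an aggregate number locally (assuming the summary line format shown above, with the speedup in the fourth field), something like the following works:

cat *.native.run | awk '/^summary/ { s += $4; n++ } END { if (n) printf "mean speedup over %d loops: %.2f\n", n, s / n }'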

Evaluate loop summaries for refactoring (Section 4.5)

To evaluate the refactoring potential, the synthesised programs can be inspected as described previously. We also provide links to the patches.

Iterative deepening

To perform iterative deepening, first synthesise some programs in the gdLoops/ directory as described above (i.e. with make synth). Then go to the iterativeDeepening/ directory.

At a high level, iterative deepening works by copying the gdLoops directory and then performing synthesis there. The iterativeDeepening/Makefile performs this automatically with the make do target. This will run the iterative deepening with a 1 hour timeout and create the 1h.exp directory. The 1h.exp directory has 1.dir 2.dir 3.dir ... subfolders, corresponding to the maximum program size with which synthesis was performed in that subfolder.

After running make do, make 1h.dat can be run to extract the number of synthesised loops per maximum program size. To get the data for the plot in Figure 2, the runAll.sh script can be used to run the experiments. Then run make 30s.dat 3min.dat 10min.dat 1h.dat to extract the data files.
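
Assuming each N.dir subfolder is a copy of gdLoops, so that .prog files mark successes, the per-size counts that make 1h.dat extracts can also be eyeballed directly:

for d in 1h.exp/*.dir; do echo "$d $(ls "$d"/*.prog 2>/dev/null | wc -l)"; done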

Optimising the vocabulary

Optimising the vocabulary is performed in the gadgets/ directory by running the ./optimize.py script. It performs the Gaussian process optimisation, creating directories of the form exp-[vocabulary used]-[timeout]-[program size].

For example, it could create the exp-PBXE-300s-7 directory, which is a copy of the gdLoops folder where synthesis was performed using only the P B X E elements of the vocabulary, with a 300s timeout and a program size of 7.

The printResult.sh script prints the name of each of these folders alongside the number of synthesised programs in that folder, giving the data for Table 4. Note that there is some randomness in the Gaussian process optimisation, so your results might not be exactly the same.

The above process might take a long time. To run this experiment quickly, you can delete loops in the gdLoops directory. In the example below, we cherry-picked 3 loops that demonstrate the benefits of optimising the vocabulary:

$ cd gdLoops/
$ ls *.c | grep -v git_name-hash_60 | grep -v patch_inp_479 | grep -v patch_pch_1223 | xargs rm
$ cd ../gadgets/
$ ./optimize.py
$ ./printResult.sh 
exp-CPNBZXEV-300s-7 0
exp-MPNBZIFV-300s-7 3
exp-CZIVS-300s-7 0
exp-MNBXIFV-300s-7 2
exp-MNBZS-300s-7 0
exp-MPBEIFV-300s-7 3
exp-MPNBZFV-300s-7 2
exp-MPNEIFV-300s-7 2
exp-MPNZXEIF-300s-7 2
exp-MPNZXF-300s-7 1
exp-MPZXEIF-300s-7 2
exp-MPZXIF-300s-7 2
exp-MRCPBZEIFV-300s-7 2
exp-MRPNBIFV-300s-7 3
exp-PNBIFV-300s-7 2

This takes about half an hour on our machine, but it can be faster if the GP decides to stop earlier. These results say, for example, that the CPNBZXEV vocabulary synthesised 0/3 loops, while MPNBZIFV synthesised 3/3 loops. The CPNBZXEV vocabulary includes the two more expensive gadgets (strchr (C) and strlen (E)), which made it too expensive to synthesise any of the loops within the 5 minute timeout. It is also missing the M gadget, which is needed for patch_inp_479. MPNBZIFV, on the other hand, is a smaller vocabulary that even synthesised the reverse loop (git_name-hash_60), which otherwise doesn't get synthesised even with a 2 hour timeout (the loops were cherry-picked to demonstrate this).

Running the full set of experiments

So far we have presented the individual components of our experiments and how you can play around with them. Running cd FullExp && make will run all the experiments we show in the paper with the configuration we used.

It will create copies of the above folders and run the experiments there, which you can inspect with the methods presented above.