This artefact provides a Docker image that contains:
#04f5031c
docker load --input kleesa_artefact.tgz
docker run -it --ulimit nofile=1000000:1000000 klee-sa/artefact /bin/bash
In our cluster setup we use a less restricted Docker environment by adding --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined
.
We expect that the experiments have a similar outcome without these flags.
Alternatives like Podman should work as well but were not tested by us.
Midnight commander (mc
), nano
and tmux
are pre-installed, as well as a complete build setup for all tools and benchmarks.
Other packages can be installed as usual via apt update && apt install <package>
.
The aim of our paper is to confirm true positive reports from off-the-shelf static analysers (SA) with dynamic symbolic execution (DSE) by guiding a DSE engine along the generated SA traces.
We give a short overview of how our tooling combines Clang Static Analyzer with the KLEE symbolic execution engine, how to use Infer to generate traces, and finally how bugs can be injected into arbitrary C code. A more detailed description of the tools can be found in the following sections.
The instrumentation/injection tools are research tools and after our initial negative results we did not develop them further to standalone tools. Expect them to fail whenever the use-case differs from our evaluation.
First, we apply Clang Static Analyzer to the source code from Figure 1a:
$ cd /project/example
$ cp figure1.c figure1_original.c # create backup
$ scan-build -o csa_report clang figure1.c -o figure1 # run CSA
Its report can be read in a browser (replace fields in <>
with actual values) and should contain the use-after-free bug from Figure 1b.
$ elinks csa_report/<report>/report-<tag>.html # exit with q
Afterwards, we convert that report into a simpler format (uaf.csa
) and instrument the original program:
$ html_parser.py csa_report/<report>/report-<id>.html > uaf.csa
$ instrumentation figure1.c uaf.csa 5 -- -I/project/root/lib/clang/11.0.0/include # instrument source file
figure1.c
now contains calls to instrumentation functions (INSTR_LINE_*
) which are defined in inst.c
(Figure 1c).
$ less -SN figure1.c
$ less -SN inst.c
To run programs with the symbolic execution engine, in our case KLEE, they have to be compiled to LLVM bitcode (.bc
):
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g -O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -c figure1_original.c # original
$ clang -emit-llvm -fno-discard-value-names -Wall -Wextra -g -O1 -Xclang -disable-llvm-passes -D__NO_STRING_INLINES -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__ -I /project/root-top/include -c figure1.c # instrumented
First run the original program:
$ klee --exit-on-error-location=figure1_original.c:47 --only-output-states-covering-new --posix-runtime --libc=uclibc figure1_original.bc --sym-stdin 8
...
KLEE: ERROR: figure1_original.c:47: memory error: out of bound pointer (asm: 733)
KLEE: NOTE: now ignoring this error at this location
KLEE: halting execution, dumping remaining states
KLEE: done: total instructions = 3803911
KLEE: done: completed paths = 7175
KLEE: done: generated tests = 15
Then run the instrumented code together with our targeted searcher. This combination reduces the number of instructions and hence the run-time significantly:
$ klee --search=step-over-minnextdistpath --exit-on-error-location=figure1.c:49 --only-output-states-covering-new --posix-runtime --libc=uclibc figure1.bc --sym-stdin 8
...
KLEE: ERROR: figure1.c:49: memory error: out of bound pointer (asm: 8849)
KLEE: NOTE: now ignoring this error at this location
KLEE: halting execution, dumping remaining states
KLEE: done: total instructions = 57784
KLEE: done: completed paths = 43
KLEE: done: generated tests = 7
Keep in mind there is non-determinism in KLEE and the numbers above might vary slightly. KLEE’s statistics for the two runs can be displayed with:
$ klee-stats --print-all klee-out-0 klee-out-1 | less -S ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | Path | Instrs| Time(s)| TUser(s)| ICov(%)| BCov(%)| ICount| TSolver(s)| TSolver(%)| States| AvgStates| Mem(MB)| Queries| AvgQC| Tcex(%)| Tfork(%)| TResolve(%)| QCexCMisses| QCexCHits| KStates| NumAssume| MaxAssume| MaxReachedAssume| ReachedAssume| SkippedAssume| ExitOnLocation| SearcherTime| CoveredInstructions| FinalAssumeInExitHandler| FinalAssumeStates| FitNegFail| FitNegSAT| FitNegUNSAT| FullBranches| NumBranches| NumQueryConstructs| PartialBranches| QueryTime| RelCexCacheTime| RelForkTime| RelResolveTime| RelSearcherTime| RelUserTime| UncoveredInstructions| UnreachableAssumeStates| avgMem| maxMem| maxState| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |klee-out-0| 3803911| 42.17| 13.79| 23.80| 14.70| 13204| 28.86| 68.44| 0| 2960.12| 35.65| 1115| 107.00| 27.22| 0.77| 0.23| 1115| 5146| 0| 0| 0| 0| 0| 0| 1| 0.19| 3143| 0| 0| 0| 0| 0| 60| 891| 119470| 142| 26.97| 64.54| 1.83| 0.55| 0.45| 32.70| 10061| 0| 120.02| 217| 6606| |klee-out-1| 57784| 2.22| 1.01| 23.81| 14.09| 13294| 2.00| 89.89| 0| 8.50| 35.89| 143| 63.00| 1.97| 0.01| 0.00| 143| 329| 0| 11| 10| 10| 102| 0| 1| 0.00| 3165| 0| 0| 0| 0| 0| 55| 891| 9014| 141| 1.96| 88.50| 0.32| 0.16| 0.02| 45.34| 10129| 0| 35.86| 37| 21| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |Total (2) | 3861695| 44.39| 14.80| 23.81| 14.39| 26498| 30.86| 79.17| 0| 1484.31| 71.54| 1258| 85.00| 14.59| 0.39| 0.12| 1258| 5475| 0| 11| 10| 10| 102| 0| 2| 0.19| 6308| 0| 0| 0| 0| 0| 115| 1782| 128484| 283| 28.93| 153.04| 2.15| 0.71| 0.48| 78.04| 20190| 0| 155.88| 254| 6627| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
During our experiments we observed that Infer sometimes generates infeasible traces.
It is demonstrated by an example in the paper.
In this program, Infer’s trace iterates the loop once (i = 1
, x
is even at the end), enters the loop a second time (i = 2
, x = x + 2
) but then claims that x % 2 == 0
is false, which is a contradiction as x
is even.
You can generate and inspect the trace with:
$ rm -rf infer-out
$ cat inconsistent.c # show source code
$ infer run --no-filtering --biabduction -- gcc inconsistent.c -o inconsistent
$ jq '.[] | select(.bug_type == "BIABDUCTION_MEMORY_LEAK")' infer-out/report.json | less -S
Parsing Infer reports is a little more complex and we refer to the container-driver.sh
script in /tmp
.
On a high level, you have to extract the relevant report from infer-out/report.json
with jq
and then use our JSON-parser as seen in the script (alias json_parser
works in the terminal).
The json parser currently only works with injected bugs and does not support traces with multiple parts/sections (e.g. figure1.c
above).
For our evaluation and the generated traces this functionality was not needed.
The Docker container provides two programs to inject bugs into C code: null-deref
and use-after-free
.
Given an arbitrary C program:
$ cd /project/example
$ nl -b a injection.c
1 #include<stdio.h>
2 #include<stdlib.h>
3
4 int main(int argc, char *argv[]) {
5 int i = argc + 5;
6
7 if (i == 8) {
8 --i;
9 puts("hello world");
10 } else {
11 ++i;
12 }
13
14 return i;
15 }
a bug, here null dereference, can be injected at a specific non-empty line (11
):
$ cp injection.c injection_original.c # create backup
$ null-deref injection.c 11 -- -I/project/root/lib/clang/11.0.0/include # inject bug
$ nl -b a injection.c
1 #include<stdio.h>
2 #include<stdlib.h>
3
4 int wtmp = 1, *xtmp = &wtmp;
5 int main(int argc, char *argv[]) {
6 int i = argc + 5;
7
8 if (i == 8) {
9 --i;
10 puts("hello world");
11 } else {
12 {xtmp = NULL; printf("%d", *xtmp);} ++i;
13 }
14
15 return i;
16 }
or a sequence of lines (5
, 11
):
$ cp injection_original.c injection.c # restore backup
$ null-deref injection.c 5 11 -- -I/project/root/lib/clang/11.0.0/include
$ nl -b a injection.c
#include<stdio.h>
#include<stdlib.h>
int wtmp = 1, *xtmp = &wtmp;
int main(int argc, char *argv[]) {
xtmp = NULL; int i = argc + 5;
if (i == 8) {
--i;
puts("hello world");
} else {
{printf("%d", *xtmp);} ++i;
}
return i;
}
A static trace generated by an analyser provides a subset of events of an execution path that leads to a reported bug. We parse traces to extract relevant information such as the number of steps in the path and their corresponding locations in the source code along with the message associated with each step. The following instructions instrument a source file, provided CSA or Infer generates a trace for it:
$ instrumentation <src_file> <trace> 5
The ominous 5
at the end selects the instrumentation mode.
We only use that one in the paper.
For more complex programs compilation flags can be forwarded after --
.
For instance, Coreutils require additional include paths:
$ cd /project/src/coreutils-8.31/src
$ instrumentation <src_file> <trace> 5 -- -I./lib -I./obj-llvm/lib -I./obj-llvm/src -I/project/root/lib/clang/11.0.0/include
There could be multiple steps in the trace associated with a given location in the source file:
Line no. | Step no. | Message |
---|---|---|
28 | 2 | Loop condition is true |
28 | 6 | Loop condition is false |
For example, line number 28 is an entry point of a while
-loop.
(We show only the steps associated with line number 28 in the source code.)
Step 2 of the trace indicates that the condition of the while
-loop holds and the control should flow into the body of the loop, whereas step 6 indicates the termination of the loop.
The source code is instrumented as follows for the above two steps of the trace:
Original code snippet | Instrumented code snippet | inst.c |
---|---|---|
|
01: #include "inst.c" |
bool INSTR_LINE_28(bool COND) |
The traces associated with other control-flow elements like if-then-else
and switch
are handled in a similar fashion.
Instrumentation for a step with a message type concretizing a variable:
Line no. | Step no. | Message |
---|---|---|
46 | 4 | Assuming x is equal to 0 |
is performed below:
Original code | Instrumented code | inst.c |
---|---|---|
|
01: #include "inst.c" |
void INSTR_LINE_46(int X) |
As described in the paper, we struggled to find real-world bugs with CSA and Infer and decided to inject artificial bugs for our evaluation. In order to inject a multi-event null dereference bug, specify the source file in which you want to inject the bug and the sequence of locations at which you would like to add the events for the bug. The below instruction modifies the source file by injecting a null dereference bug.
$ null-deref <src_file> <sequence_of_locations>
Similarly, you can inject a use-after-free bug:
$ use-after-free <src_file> <sequence_of_locations>
It is possible to forward compilation flags as well and we do this in the Coreutils experiment to prevent build errors. Depending on the application, an invocation example is:
/project/src/coreutils-8.31/src> null-deref <src_file> <sequence_of_locations> -- -I./lib -I./obj-llvm/lib -I./obj-llvm/src -I/project/root/lib/clang/11.0.0/include
We chose 10 applications from GNU Coreutils 8.31 for fault injection experiments: comm
, csplit
, cut
, env
, join
, ln
, nl
, od
, split
, and uniq
.
The inclusion criteria for these experiments are based on the following factors:
Our initial coverage runs are in /project/coverage_results
.
There is a .csv file for each application (and a second one in cov1/
for the first run).
kill_60.csv
lists excluded tools and the reason for exclusion.
The format for the coverage logs is:
elapsed time (µs), assembly line, source file, source line, no. of executed instructions
The injection/instrumentation script is located in /tmp
.
To re-run the (time-consuming) instrumentation process, (re)move the existing instrumentation
directory and run the script to inject bugs at lines that were covered by KLEE in 10-120 minutes:
$ cd /tmp
$ mv /project/instrumentation /project/instrumentation_original
$ ./container-driver.sh
The APPS
and BUG_TYPE
variables in the script header can be modified to only instrument a subset of applications and bug classes.
We provide this script only for the sake of completeness.
This step is not required to re-run our evaluation, as the Docker image already contains all benchmarks.
Bugs were injected at locations that are hard to reach for KLEE without any static guidance. We used the coverage information of KLEE for our selected Coreutils (see above section) and randomly selected a number of program statements for fault injection provided KLEE took more than 10 minutes to cover that program statement for the first time.
This allowed us to inject 297 1-event bugs, 632 2-event, 478 3-event, and 357 4-event bugs across all 10 Coreutils. The number of distinct 1-event bugs is lower than for other counts because a single statement may be reachable by multiple execution paths, catering for numerous multi-event bugs.
All injected bugs for Coreutils 8.31 can be found in the sub-directories /project/null-dereference-bugs
and /project/use-after-free-bugs
.
For instance, a 4-step null-dereference bug in lines 284, 194, 385, 243 for comm
is located in /project/null-dereference-bugs/comm/comm-4/comm-284-194-385-243/fault/comm.c
.
Simply replace the original file in the Coreutils repository with this file and run the analysis phase of a static analyser.
The results highly depend on the version of the analyser, its command line flags and if the program under test is built in-tree or out-of-tree.
A large percentage of 1-event and 2-event bugs were found by CSA and Infer but only a few 3- and 4-event bugs were found by the analysers.
For the Clang Static Analyzer we noticed that an under-approximation of the error()
function lead to missed bugs.
For the KLEE runs in §5.3 we manually selected 55 distinct bugs across all the 10 Coreutils based on the traces generated by CSA and Infer.
The details about selected bugs can be found in /project/instrumentation-info
.
We extended KLEE in several ways and list the most relevant flags and statistics below.
Be aware, that in our implementation the function assume_sa
from the paper is named klee_assume_sa
.
--search=step-over-minnextdistpath
: enable the targeted searcher (we use it in combination with -search=random-path
)-skip-step-when-unreachable
: skip intermediate steps that become unreachable (enabled in all experiment runs)-target-only-last-step
: target only the klee_assume_sa
call with the highest step number (TargetLast
in paper)--assume-sa-mode=(noop|fit|strict)
: strategies to handle klee_assume_sa
calls, we use different names in the paper (noop=ignore, fit=try, strict=require)--assume-sa-order=(sequential|max)
: keep step order or continue with max. reached step, we only use sequential
in our experiments--exit-on-early-termination
: terminate KLEE when it starts terminating states due to memory pressure (used in initial runs to determine the maximum number of states for threshold)--exit-on-error-location=<location>
: exit when error location reached, location
is a file name or a file name and line number separated by a colon (e.g. uniq.c:653
)--test-fit-condition
: determine if the negated constraint in klee_assume_sa
is feasible (see statistics below)--state-limit=<n>
: set a limit of n
active states (checked every 65k instructions)--uncovered-update-after-instructions=<n>
re-calculate coverage information after n
instructions (to improve determinism across runs)Relevant new statistics are:
ExitOnLocation
: number of states terminated at error location (means bug found)MaxAssume
: largest step number in klee_assume_sa
call in bitcodeMaxReachedAssume
: largest step number reached in klee_assume_sa
callNumAssume
: number of klee_assume_sa
call sitesReachedAssume
: number of executed klee_assume_sa
callsSkippedAssume
: number of klee_assume_sa
calls that were unreachable and hence skippedUnreachableAssumeStates
: number of states terminated due to unreachable next stepWhen condition checks (--test-fit-condition
) for klee_assume_sa
calls are enabled, the following statistics are populated:
FitNegFail
: solver failure during testFitNegSAT
: negated condition satisfiableFitNegUNSAT
: negated condition unsatisfiableThe raw data for Figure 6 can be found in /project/results/figure_6.csv
and /project/results/figure_6_targetlast.csv
.
We repeated all runs 5 times and each experiment is named
<app>_<step>(-<step>)*_(ignore|require|try)_(default|targeted)_(Infer|CSA)_<run>(_targetlast)?
in the Path
column.
For instance, ln_194-351_ignore_targeted_CSA_0_targetlast
describes the first (0
) run of ln
with steps in lines 194
, 351
, ignore
strategy, and the targeted
searcher which in this case only targets the last step (targetlast
).
The trace from CSA
was used for this experiment.
For Section §5.3 we did an additional run, summarised in /project/results/fitneg.csv
, to evaluate the percentage of constraints in klee_assume_sa
that were already implied: in 875,886
cases the negated constraint has been infeasible (FitNegUNSAT
) and only in 192
cases (FitNegSAT
) the negated constraint could have been added as well with the given path constraints.
That means in 99.98% of the cases the provided constraint did not (significantly) restrict the search space further.
We provide two scripts that can be used to reproduce our results:
/tmp/run-exp.sh
: Runs 55 experiments with five different seeds to initialise KLEE’s random number generator, three strategies to handle klee_assume_sa
, and both searchers. On our machines all 1650 runs finished in 812 machine hours and required 13GB of disk space./tmp/run-exp-targetlast.sh
: Runs 55 experiments with five seeds, the ignore
strategy and the targeted searcher that in this case only targets the last step. On our machines all 275 runs finished in 45 machine hours and required 3GB of disk space.KLEE’s output directories will be written to /tmp/results
.
The directory names contain the tool name, relevant source lines, klee_assume_sa
strategy, searcher, static analyser used to generate the trace, the seed id (the counter for the experiment repetitions), and if the searcher only targeted the last step, e.g. uniq_499-513-538-654_noop_targeted_CSA_4_targetonlylast
.
KLEE’s klee-stats
tool can be used to generate a summary of all runs:
/tmp/results> klee-stats --print-all --table-format=csv . > summary.csv
or inspect the results in the terminal:
/tmp/results> klee-stats --print-all . | less -S
The source code of our KLEE version is located in /project/src/klee-src-top
.
It can be modified and then easily rebuild from /project/src/klee-build-top
.
For the sake of brevity, we omit a description of all modifications and point to the two core components:
StepOverMinNextPathSearcher
and can be found in lib/Core/Searcher.cpp
.klee_assume_sa
calls are handled in lib/Core/SpecialFunctionHandler.cpp
by the handleAssumeSA
method.For Table 3 we used Clang Static Analyzer and Infer with their default flags. We provide all reports in:
/project/sa-reports/csa
: reports are html files and can be inspected with elinks
/project/sa-reports/infer
: report summaries as json and txt filesUnfortunately, the build directories and esp. Infer’s SQLite databases require several gigabytes of disk space and we refrained from adding them to the Docker image. To reproduce our results, the applications can be downloaded from the respective project pages and build with CSA or Infer. We used the following configure flags:
Application | Configure flags |
---|---|
apr-1.5.2 | --disable-threads --disable-posix-shm --disable-dso --disable-lfs |
flex-2.5.39 | --disable-nls |
gawk-4.1.2 | --disable-mpfr --disable-largefile --disable-nls --disable-extensions |
bc-1.06 | |
binutils-2.25.1 | --disable-nls --disable-largefile --disable-gdb --disable-sim --disable-readline --disable-libdecnumber --disable-libquadmath --disable-libstdcxx --disable-ld --disable-gprof --disable-intl --disable-etc |
combine-0.4.0 | --disable-nls --disable-threads |
coreutils-8.24 | --disable-nls --without-gmp --disable-threads --disable-acl --disable-libsmack --disable-xattr --disable-libcap --without-selinux |
datamash-1.0.6 | --disable-nls |
diffutils-3.3 | --disable-nls --disable-largefile |
findutils-4.4.2 | --disable-nls --disable-largefile --disable-threads |
grep-2.21 | --disable-nls --disable-largefile --disable-threads |
gzip-1.6 | --disable-threads --disable-largefile |
libtasn1-4.5 | --disable-doc |
m4-1.4.17 | --disable-threads --with-included-regex --disable-largefile |
make-4.1 | --disable-nls --disable-largefile --disable-job-server --disable-load |
libosip2-4.1.0 | |
sed-4.2 | --disable-largefile --disable-threads --disable-acl --disable-nls --disable-i18n --without-selinux -disable-threadsafe |
trueprint-5.4 | |
ImageMagick6-6.9.4-8 | --enable-shared=no -disable-shared --disable-openmp --without-threads --without-magick-plus-plus --disable-hdri --without-zstd --without-bzlib --without-jbig --disable-cipher --without-fftw --without-flif --without-fpx --without-djvu --without-fontconfig --without-freetype --without-raqm --without-lcms --without-openjp2 --without-lqr --without-openexr --without-pango --without-webp --without-xml |
jasper-1.900.1 | |
jpeg-9a | |
libtiff-Release-v3-9-7 | --disable-largefile --disable-zlib --disable-jpeg --disable-jbig --disable-lzma --disable-zstd --disable-webp --disable-cxx |
libxml2-2.9.2 | --without-catalog --without-debug --without-docbook --without-ftp --without-http --without-iconv --without-legacy --without-output --without-python --without-threads --without-modules --without-zlib --without-lzma |
tcpdump-4.7.4 | --without-sandbox-capsicum --without-crypto --without-cap-ng --without-smi |
vorbistools-1.4.0 | --disable-nls --disable-largefile --disable-threads --disable-shared --disable-ogg123 --without-flac --without-speex --without-kate --without-curl --disable-curltest --without-curl --disable-oggtest --disable-vorbistest --disable-curltest |