A rigorous empirical study of the impact of miscompilation bugs in a mature compiler, comparing bugs found using a fuzzer to bugs found while compiling real code.
Overview
Despite much recent interest in randomised testing (fuzzing) of compilers, the practical impact of fuzzer-found compiler bugs on real-world applications has barely been assessed. We present the first quantitative and qualitative study of the tangible impact of miscompilation bugs in a mature compiler. We follow a rigorous methodology in which the impact of a bug on a compiled application is evaluated based on (1) whether the bug appears to trigger during compilation; (2) the extent to which the generated assembly code changes syntactically due to triggering of the bug; and (3) whether such changes cause regression test suite failures, or whether we can manually find application inputs that trigger execution divergence due to such changes. The study is conducted with respect to the compilation of more than 10 million lines of C/C++ code from 309 Debian packages, using 12% of the historical and now-fixed miscompilation bugs found by four state-of-the-art fuzzers in the Clang/LLVM compiler, as well as 18 bugs found by human users compiling real code or as a by-product of formal verification efforts. The results show that almost half of the fuzzer-found bugs propagate to the generated binaries for at least one package; in those cases only a very small part of the binary is typically affected, yet this causes two failures when running the test suites of all the impacted packages. User-reported and formal-verification bugs do not exhibit a higher impact, with a lower rate of triggered bugs and one test failure. A manual analysis of a selection of the syntactic changes caused by some of our bugs (fuzzer-found and non-fuzzer-found) in package assembly code shows that either these changes have no semantic impact, or that they would require very specific runtime circumstances to trigger execution divergence.
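As an illustration of step (2) of this methodology, the sketch below shows one way to check whether a bug's fix alters the assembly emitted for a given source file, by compiling it with two compiler builds and diffing the output. This is a minimal illustrative sketch, not our actual impact measurement platform: the compiler names clang-buggy and clang-fixed (hypothetical builds from just before and just after the bug fix), the flags, and example.c are all assumptions introduced here for illustration.

    # Minimal sketch: compile one source file with two compiler builds and
    # report whether the generated assembly differs syntactically.
    # "clang-buggy" and "clang-fixed" are hypothetical compiler binaries
    # built just before and just after a miscompilation bug fix.
    import subprocess
    import difflib

    def compile_to_asm(compiler, source, flags=("-O2", "-S", "-o", "-")):
        """Compile `source` to assembly on stdout and return it as text."""
        result = subprocess.run(
            [compiler, *flags, source],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    def assembly_diff(source, buggy="clang-buggy", fixed="clang-fixed"):
        """Return the unified diff between the assembly of the two builds."""
        asm_buggy = compile_to_asm(buggy, source)
        asm_fixed = compile_to_asm(fixed, source)
        return list(difflib.unified_diff(
            asm_buggy.splitlines(), asm_fixed.splitlines(),
            fromfile="buggy", tofile="fixed", lineterm="",
        ))

    if __name__ == "__main__":
        diff = assembly_diff("example.c")
        if diff:
            print(f"{len(diff)} diff lines: the bug may affect this file")
        else:
            print("identical assembly: the bug does not propagate here")

In the study itself, such syntactic differences are only a first signal: they are then checked for actual impact via the packages' regression test suites and a manual search for inputs triggering execution divergence.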
Watch It on YouTube
A short talk and a longer talk about this work can be watched on YouTube:
The slides for the two talks can be downloaded here: 20 mins - 50 mins.
Download Impact Measurement Platform
The artifact containing our experimental data and impact measurement platform is available here.
Research Support
This work was supported by the EPSRC through grants EP/R011605/1 and EP/R006865/1.
People
The empirical study presented here was carried out equally by Michaël Marcozzi and Qiyi Tang.
Publications
-
Compiler Fuzzing: How Much Does It Matter?
Michaël Marcozzi, Qiyi Tang, Alastair Donaldson, Cristian Cadar
Proceedings of the ACM on Programming Languages (OOPSLA 2019)
Talks
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ Seminar of the Software Safety and Security Lab, CEA LIST institute
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ Seminar of the Verimag Lab, Université Grenoble Alpes
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ Papers We Love London - CREST/PWL Special Event
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ MTV2 Workshop (co-located with IFIP-ICTSS 2019)
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ SPLASH 2019 OOPSLA
-
Compiler Fuzzing: How Much Does It Matter?
Talk @ S-REPLS 10