We study the effect of relaxing the overly conservative conditions that Csmith imposes, at code-generation time and at code-execution time, to keep generated compiler test cases free from undefined behaviour (UB).
Methods for randomized testing of compilers to find miscompilation bugs typically require a way to generate programs that are free from undefined behaviour (UB). Tools such as Csmith achieve UB-freedom by heavily restricting the form of generated programs. This leads to highly idiomatic programs, and we hypothesise that this limits the thoroughness with which compilers are tested. Our idea is that researchers should investigate ways to generate less restricted programs that are still UB-free: programs that get closer to the edge of undefined behaviour, but that do not quite cross it. We present experiments investigating one instance of this idea via a prototype tool, CsmithEdge, that uses a simple dynamic analysis to detect where Csmith has been too conservative in its use of “safe math” wrappers that guarantee UB-freedom for arithmetic operations, and then eliminates the redundant wrappers. By reducing the use of safe math wrappers, CsmithEdge was able to discover two new miscompilation bugs in GCC that could not be found via intensive testing using regular Csmith, as well as achieving substantial differences in code coverage on GCC compared with regular Csmith.
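To make the notion of a redundant safe math wrapper concrete, the following is a minimal sketch in C and not CsmithEdge's actual implementation. The names `safe_add_i32`, `guard_fired`, `wrapper_is_redundant` and `MAX_SITES` are hypothetical, introduced only for illustration. The sketch assumes that each wrapper call site carries a numeric identifier, and that the generated program is deterministic and takes no input, so that a single instrumented run shows which guards were ever needed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping: one flag per wrapper call site.
   Callers must pass a site index in the range [0, MAX_SITES). */
#define MAX_SITES 64
static bool guard_fired[MAX_SITES];

/* Guarded addition in the style of a safe math wrapper: if a + b would
   overflow int32_t (undefined behaviour in C), the addition is skipped
   and the first operand is returned instead. */
static int32_t safe_add_i32(int32_t a, int32_t b, int site)
{
    bool overflow = (a > 0 && b > 0 && a > INT32_MAX - b) ||
                    (a < 0 && b < 0 && a < INT32_MIN - b);
    if (overflow) {
        guard_fired[site] = true; /* the wrapper was needed at this site */
        return a;
    }
    return a + b; /* cannot overflow: the guard above ruled it out */
}

/* After a complete run, a site whose guard never fired did not need its
   wrapper in that execution (an assumption of this sketch: the program
   is deterministic, so one run is representative). */
static bool wrapper_is_redundant(int site)
{
    return !guard_fired[site];
}
```

Under these assumptions, a call such as `safe_add_i32(1, 2, 0)` leaves site 0 marked as redundant, so the wrapper there could be replaced by a plain `+`, whereas `safe_add_i32(INT32_MAX, 1, 1)` triggers the guard and site 1 must keep its wrapper. Removing the wrapper only where the guard never fired keeps the program UB-free while exposing the compiler to unguarded arithmetic.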
This work is supported by the EPSRC through grant EP/R011605/1.