A counterexample-guided inductive synthesis approach for summarizing simple string loops written in C with equivalent sequences of standard library functions and primitive pointer operations.
Overview
Analysing and comprehending C programs that use strings is hard: Using standard library functions for manipulating strings is not enforced and programs often use complex loops for the same purpose. We introduce the notion of memoryless loops that capture some of these string loops and present a counterexample-guided inductive synthesis approach to summarise memoryless string loops using C standard library functions, which has applications to testing, optimization and refactoring.
We prove our summarization is correct for arbitrary input strings and evaluate it on a database of loops we gathered from a set of 13 open-source programs. Our approach can summarize over two thirds of memoryless loops in less than 5 minutes of computation time per loop. We then show that these summaries can be used to (1) enhance symbolic execution testing, where we observed median speedups of 79x when employing a string constraint solver, (2) optimize native code, where certain summarizations led to significant performance gains, and (3) refactor code, where we had several patches accepted in the codebases of popular applications such as patch and wget
Teaser
Artefact
The artefact page is available here.
Research Support
This research was supported in part by the UK EPSRC via grant EP/N007166/1 and a PhD studentship.
Publications
-
Computing Summaries of String Loops in C for Better Testing and Refactoring
Timotej Kapus, Oren Ish-Shalom, Shachar Itzhaky, Noam Rinetzky, Cristian Cadar
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019)
Talks
-
Computing Summaries of String Loops in C for Better Testing and Refactoring
Talk @ PLDI 2019
-
Summaries of C String Loops for Better Symbolic Execution (and Refactoring)
Talk @ Shonan meeting on Fuzzing and Symbolic Execution