Automatic API Test Generation for Software Libraries using Client Code
Testing the APIs of software libraries requires constructing complex preconditions and data structures, and writing such tests manually is both time-consuming and error-prone. Traditional automated test generation techniques rely on static and dynamic program analysis, but they tend to produce tests that are difficult to read and maintain, scale poorly to large codebases, and derive context solely from the library under test. A library's clients, i.e., the programs that use it, offer a rich and largely untapped source of real-world API usage scenarios that can guide the generation of meaningful, representative tests for library APIs. In this thesis, we investigate whether client code can be systematically leveraged to generate useful API tests for software libraries, and we make three contributions.
We first present an empirical study of API testing and usage across 21 popular open-source C libraries and a combined total of 3,198 C/C++ clients. By comparing how clients exercise library APIs with how well the libraries' test suites cover those APIs, we find that library developers often do not prioritise testing effort according to client usage: popular APIs are frequently under-tested. For example, in one library, 45% of the APIs used by clients are not covered by the library's own test suite. We further demonstrate that client test suites can be used to improve the test coverage of a target library, e.g., by 14.7% for the library mentioned above.
We then propose a static-analysis-based approach that, given the source code of a library client, automatically extracts self-contained test cases targeting the library's APIs. Because these tests originate from real client code, they capture usage scenarios that library developers may not have envisioned. We show that our approach generates tests that compile independently of the library under test and improve test coverage in the target libraries. Although we extract compilable test cases for all libraries, and the developers of some libraries accepted our contributions, we also identify several challenges in scaling the approach to more libraries and clients.
Finally, we improve on this approach by augmenting our static analysis pipeline with LLM-driven code transformation and generation, producing tests that are both more readable and less susceptible to hallucinations than those generated by standalone coding agents. We address the scaling challenges identified in the previous approach by using LLMs and by introducing a novel technique for extracting API call sequences from client code. We realise this idea in SPEAR, a tool that extracts API usage slices and call sequences from client code and refines them through a series of LLM-based steps into effective, self-contained API tests. Comparing SPEAR with Claude Code, a leading commercial LLM coding assistant, we show that the two tools have complementary strengths: in particular, SPEAR generates tests grounded in real-world usage patterns that are out of reach for the purely LLM-generated unit tests produced by Claude Code.
Ahmed Zaki is a PhD student at Imperial College London, where his research focuses on improving the reliability of software libraries through automated testing. Before his doctoral studies, he spent over a decade in industry in a range of cybersecurity roles, most recently at Meta. He is currently the founder of Code Sa, an autonomous code-maintenance agent for high-assurance teams.
