Does Automated White-Box Unit Test Generation Really Help Software Testers?

Work on automated test generation has produced several tools capable of generating test data which achieves high structural coverage over a program. In the absence of a specification, developers are expected to manually construct or verify the test oracle for each test input. Nevertheless, it is assumed that these generated tests ease the task of testing for the developer, as testing is reduced to checking the results of tests. While this assumption has persisted for decades, there has been no conclusive evidence to date confirming it. However, the limited adoption in industry indicates that this assumption may not be correct, and calls into question the practical value of test generation tools. In this talk I report on the results of two controlled experiments comparing writing tests manually with writing tests with the aid of an automated unit test generation tool, EvoSuite. Although tool support leads to clear improvements in commonly applied quality metrics such as code coverage (up to a 300% increase), there was no measurable improvement in the number of bugs actually found by developers. Our results not only cast some doubt on how the research community evaluates test generation tools, but also point to improvements and future work necessary before automated test generation tools will be widely adopted by practitioners.

Gordon Fraser is a lecturer in Computer Science at the University of Sheffield, UK. He received a PhD in computer science from Graz University of Technology, Austria, in 2007. The central theme of his research is improving software quality, and his recent research concerns the prevention, detection, and removal of defects in software. More specifically, he develops techniques to generate test cases automatically, and to guide the tester in validating the output of tests by producing test oracles and specifications.