Dual Channel Software Engineering

Source code combines two channels: a formal algorithmic language (AL) channel and a natural language (NL) channel of identifiers and comments. To date, most work has focused exclusively on one of these two channels. This is a missed opportunity because the two channels interact: the natural language channel often explains, imposes assumptions on, or summarizes the algorithmic channel, so information in one channel can be used to improve analyses of the other. For example, in the absence of explicit security annotations, identifier names can implicitly identify secrets. As an extreme example, consider an identifier named “secret”; printing it to the console at least merits investigation. Here, we have a discrepancy between a name in the NL channel and its use in the AL channel. Dual channel analysis finds such discrepancies. To do so, it must overcome two challenges: finding cross-channel synchronisation points and handling noise, both ambiguity in the NL channel and imprecision from modelling the AL channel. Thus, dual channel analysis is a natural fit for machine learning. I will present RefiNym, a dual channel analysis that models code with name-flows, a dataflow graph augmented to track identifier names. RefiNym targets conceptual types: logically distinct types that do not always coincide with program types. Passwords and URLs, for example, are conceptual types that can share the program type String. RefiNym is an unsupervised method that mines a lattice of conceptual types from name-flows and reifies those conceptual types into distinct nominal types. For the String type, we show that RefiNym reduces co-occurrence of disparate conceptual types in the same scope by 22%, thereby making it harder for a developer to inadvertently introduce an unintended flow across conceptual types.
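To make the idea of reifying conceptual types concrete, here is a minimal sketch of what distinct nominal types for passwords and URLs might look like. It is written in Python purely for illustration; it is not RefiNym's output or implementation, and the names Password, Url, and fetch are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Password:
    """Hypothetical nominal type for the 'password' conceptual type."""
    value: str


@dataclass(frozen=True)
class Url:
    """Hypothetical nominal type for the 'URL' conceptual type."""
    value: str


def fetch(url: Url) -> str:
    """Hypothetical sink that should only ever receive a URL."""
    # Both wrappers hold a plain str, but they are different types,
    # so a Password can no longer flow here unnoticed.
    if not isinstance(url, Url):
        raise TypeError(f"expected Url, got {type(url).__name__}")
    return f"GET {url.value}"
```

With both values typed as plain strings, passing a password to fetch would go unnoticed. Once they are wrapped, the same mistake raises a TypeError at run time, and a static type checker could flag it earlier.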

Earl Barr is a senior lecturer (associate professor) at University College London. He received his Ph.D. from UC Davis in 2009. Earl’s research interests include dual channel software engineering, testing and analysis, game theory, and computer security. His recent work focuses on automated software transplantation, applying game theory to software process, and using machine learning to solve programming problems. Earl dodges vans and taxis on his bike commute in London.