Software Reliability (70024)

Autumn Term, 2024 • Imperial College London

Open this term to students on the MEng Security and Reliability degree, this course provides an overview of exciting recent research into techniques and tools that help developers improve the reliability of their software.

The importance of software reliability

Society is becoming ever more reliant on software and software-controlled systems. Some of this software is safety-critical, e.g., the software used to control cars, aeroplanes and other high-speed transport. Defects in safety-critical software can lead to serious injury or death. A much larger volume of software is business-critical, e.g., software that runs on mobile phones, powers web servers and manages data centres. Defects in this type of software can lead to significant financial losses. Underpinning all of these areas is systems software: the low-level operating systems, compilers, device drivers and networking software on which complex systems are built. This foundational role means that the reliability of systems software is of primary importance.

Traditional methods for improving software reliability

Three of the main techniques used in industrial and open-source projects to improve software reliability are:

Manual testing: Manually crafting a suite of tests that exercises a software system thoroughly, e.g., by covering a high percentage of program statements.
Code reviews: Requiring that source code additions or modifications (patches) are reviewed by experienced developers before being committed to the code base.
Coding standards: Requiring that all developers adhere to a set of rules when writing or maintaining code. Coding standards can improve source code readability, making it easier to spot defects, and may ban the use of programming idioms that are arguably dangerous.

Rigorous manual testing, code reviews and adherence to standards are essential to the success of large software projects, but all three share two problems:

They depend fundamentally on human reasoning and judgement. Humans are clever, but software can be devilishly complex. It is easy for subtle defects to creep into a project despite adherence to coding standards, and to evade manual testing and code review.
They do not provide guarantees. A test suite can demonstrate that certain executions of a software system do not exhibit defects, but provides no further guarantee. For safety- (and often business-) critical systems this may not be enough: it is highly desirable to have a guarantee of defect freedom; ideally an absolute guarantee, but at least a guarantee that system executions have been systematically checked up to some well-defined bound.
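The gap between "the tests pass" and "executions have been checked up to a bound" can be illustrated with a small sketch. It is an illustration only, not course material: the function sat_add_u8, its planted defect, the reference function and the bounded_check helper are all hypothetical names invented for this example.

```python
from itertools import product


def sat_add_u8(a: int, b: int) -> int:
    """Hypothetical 8-bit saturating add with a deliberately planted defect."""
    if a > 127 and b > 127:      # defect: overflow also happens for, e.g., 200 + 100
        return 255
    return (a + b) & 0xFF        # wraps silently whenever the guard misses an overflow


def reference(a: int, b: int) -> int:
    """Intended behaviour: clamp the sum at 255."""
    return min(a + b, 255)


def hand_written_tests() -> bool:
    """A small manual suite; it executes every statement of sat_add_u8 and passes."""
    cases = [((0, 0), 0), ((1, 2), 3), ((200, 200), 255), ((255, 255), 255)]
    return all(sat_add_u8(a, b) == want for (a, b), want in cases)


def bounded_check(bound: int = 256):
    """Check every input pair in [0, bound) x [0, bound).

    Returns the first counterexample found, or None if none exists below the bound.
    """
    for a, b in product(range(bound), repeat=2):
        if sat_add_u8(a, b) != reference(a, b):
            return (a, b)
    return None


if __name__ == "__main__":
    print("hand-written tests pass:", hand_written_tests())
    print("first counterexample:", bounded_check())
```

Here the manual suite passes and covers every statement, yet the systematic check over all 65,536 input pairs reports the counterexample (1, 255). Calling bounded_check(128) returns None, which shows what a bounded guarantee does and does not say: no defect exists for inputs below 128, and nothing is claimed beyond that bound.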

Curriculum

The focus of this course is on automatic techniques for improving software reliability that go beyond manual testing. The course will cover:

Basic program analysis notions
Fuzzing
Derived test oracles
Undefined behaviour
Compiler optimizations and unstable code
Dynamic symbolic execution
Data-flow analysis
Coverage criteria

Distinguishing features of this course

The course differs a little from most other courses in the Department of Computing:

Coursework intensive. The coursework consists of a relatively large development project (to be undertaken in teams), during which you will build a fuzzer and modify a symbolic execution engine. You will also give a presentation about a software reliability tool of your choice. The coursework and presentation will be a significant undertaking. Consequently, coursework carries a higher weighting of marks than in most other courses.
Research papers. One part of the course material is based directly on a research paper, which you are required to study, and we also recommend a set of papers that will broaden your knowledge of the course materials and give you inspiration for the coursework projects. See the Reading List for details.

Prerequisites

Good C/C++ programming skills: You need to be comfortable with C/C++ in order to succeed in the course.
Compilers: Experience from a basic compilers course (such as the Imperial Compilers course) will be useful in helping you master some of the material covered in Software Reliability.
Logic: The course will make use of fundamental concepts from first-order logic, such as those covered in the Imperial Discrete Mathematics, Logic & Reasoning course.

Some of the topics we will study overlap with the research area of computer systems. Background from the Operating Systems course (or equivalent) may thus prove useful.