Less is More: Exploiting the Standard Compiler Optimization Levels for Better Performance and Energy Consumption

Compilers were introduced to abstract away the ever-increasing complexity of hardware and improve software developer productivity. At the same time, compiler developers face the tough challenge of producing optimized code for increasingly complex architectures. A modern compiler typically supports a large number of architectures and programming languages and is used for a wide variety of applications. Thus, tuning the compiler optimizations to perform well across all possible combinations of applications and architectures is impractical. The task becomes even harder as compilers need to adapt to rapid advances in hardware and programming languages.

To tackle this issue, techniques such as iterative compilation and machine learning have been used to find good optimization sequences by exploiting only a fraction of the optimization space. Although such techniques are promising for auto-tuning the compiler’s optimizer settings for a particular application, they typically act as a “black box”, providing no insights into why certain configurations are better than others. Thus, it is difficult to use them as guidance for developing new systematic optimizations or improving the existing ones.

We made the interesting observation that applying only a subset of the optimizations in a standard compiler optimization level such as -O2, while preserving their original ordering, can yield significant savings in both execution time and energy consumption. These savings are in the same range as those achieved by state-of-the-art compiler auto-tuning approaches that use iterative compilation or machine learning to select flags or to determine flag orderings that result in more efficient code. In contrast to these time-consuming and costly techniques, our approach only needs to test a limited number of optimization configurations, fewer than 64, to obtain similar or even better savings. Furthermore, we demonstrate that the proposed technique can provide insights into why a detected optimization configuration performs better than expected or, conversely, degrades a program's performance. Thus, to the best of our knowledge, this is the first compiler auto-tuning technique that can expose hidden architecture-dependent and cross-architecture optimization opportunities and can drive the improvement and tuning of the compiler's common optimizer.
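To make the idea concrete, the following is a minimal sketch, in Python, of how such configurations could be enumerated and timed with an LLVM-based tool chain. It is not the paper's actual infrastructure: the pass names, file names, and timing harness are illustrative assumptions, and energy measurements would additionally require hardware instrumentation.

#!/usr/bin/env python3
# Hypothetical sketch of the "fewer passes, same order" idea described above.
# Assumptions: the pass names, file names, and measurement harness are
# illustrative; the real pass list and its order depend on the LLVM version
# and would be extracted from the compiler itself.
import subprocess
import time

# Illustrative subset of -O2 transformation passes, kept in their usual order.
O2_PASSES = ["sroa", "early-cse", "instcombine", "simplifycfg",
             "reassociate", "gvn", "licm", "loop-unroll", "slp-vectorizer"]

SRC = "benchmark.c"   # hypothetical benchmark source file
IR = "benchmark.ll"   # unoptimized LLVM IR emitted from SRC


def emit_ir():
    # Emit unoptimized IR; -disable-O0-optnone keeps it optimizable by opt.
    subprocess.run(["clang", "-O0", "-Xclang", "-disable-O0-optnone",
                    "-S", "-emit-llvm", SRC, "-o", IR], check=True)


def build(passes, exe):
    # Apply only the selected prefix of passes, preserving the -O2 order.
    if passes:
        subprocess.run(["opt", "-passes=" + ",".join(passes), IR,
                        "-o", "opt.bc"], check=True)
        obj = "opt.bc"
    else:
        obj = IR  # baseline configuration: no mid-end optimization at all
    subprocess.run(["clang", obj, "-o", exe], check=True)


def measure(exe, runs=5):
    # Wall-clock proxy; energy would require a power meter or on-chip counters.
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(["./" + exe], check=True)
        best = min(best, time.perf_counter() - start)
    return best


def main():
    emit_ir()
    results = {}
    # One configuration per prefix length: at most len(O2_PASSES) + 1 builds,
    # far fewer than an exhaustive search over pass subsets or orderings.
    for n in range(len(O2_PASSES) + 1):
        exe = "bench_%d" % n
        build(O2_PASSES[:n], exe)
        results[n] = measure(exe)
    for n, t in sorted(results.items(), key=lambda kv: kv[1]):
        print("first %2d passes: %.4f s" % (n, t))


if __name__ == "__main__":
    main()

Because the original pass ordering is preserved, this sketch builds and measures only one configuration per prefix length, which is why a few dozen compilations suffice instead of a search over all pass subsets or reorderings.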

Kyriakos Georgiou is a Senior Research Associate in the Trustworthy Systems Lab (TSL) at the University of Bristol, where he has been researching a broad spectrum of ICT subjects including energy-aware computing, execution time and energy consumption modeling, compiler auto-tuning, and good software development practices. Kyriakos has been the lead researcher of two European-funded projects at the University of Bristol: ENTRA (http://entraproject.ruc.dk/) and TeamPlay (https://www.teamplay-h2020.eu/). He was also one of the authors of the TeamPlay project's grant proposal, which was awarded €5,415,551.24 by the EU Horizon 2020 research and innovation programme. He is now responsible for coordinating several of TeamPlay's deliverables, which involve the 11 industrial and academic project partners from across Europe. Kyriakos Georgiou holds a Ph.D. from the University of Bristol, an MSc in Internet Technologies with Security from the University of Bristol, and a BSc in Computer Science from the University of Cyprus. He previously worked in industry for two years as a software developer in financial services and for three years as a compiler engineer.