Energy-efficient Superoptimization

This blog post introduces the project I will be working on during an internship with Embecosm. The internship is supported by HiPEAC, an organisation with a key aim of promoting collaboration between industry and academia. The project will look at how we can take advantage of intelligent compilation techniques to improve the energy consumption of the memory hierarchy. In this post I am going to talk about superoptimization as a potential method of generating energy-efficient code.

Superoptimization by itself is the very opposite of ‘intelligent compilation’: a brute-force search through every possible instruction sequence until the best one is found. The technique, first introduced by Henry Massalin in 1987, has been used to find surprising optimal sequences for a variety of architectures. The main problem with superoptimizers is the sheer size of the search space: searching for sequences of more than three instructions can take weeks!

Fortunately there are ways to reduce the number of instruction sequences that need to be enumerated, such as removing redundant or meaningless instructions (e.g. mov r0, r0). More recent studies of stochastic superoptimization have used machine learning and other statistical methods to select which instruction sequences to try, giving good results in far less time and without an exhaustive search.
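To make the brute-force idea concrete, here is a minimal sketch of an exhaustive superoptimizer over a toy two-register machine. It enumerates candidate sequences in order of length, prunes obviously redundant instructions such as mov r0, r0, and returns the first sequence that agrees with the target function on a set of test inputs. The instruction set, semantics and tests are all invented for illustration; a real superoptimizer would also verify the winning sequence rather than trusting a handful of tests.

```python
from itertools import product

REGS = (0, 1)
OPS = ("mov", "add", "xor")
# Every (op, dest, src) combination over two registers.
INSTRS = [(op, d, s) for op in OPS for d in REGS for s in REGS]

def is_redundant(instr):
    op, d, s = instr
    # mov rX, rX has no effect; xor rX, rX is kept, since it usefully zeroes a register.
    return op == "mov" and d == s

def run(seq, r0, r1):
    regs = [r0, r1]
    for op, d, s in seq:
        if op == "mov":
            regs[d] = regs[s]
        elif op == "add":
            regs[d] = (regs[d] + regs[s]) & 0xFF
        elif op == "xor":
            regs[d] = regs[d] ^ regs[s]
    return regs[0]                                  # result convention: r0

def superoptimize(target, max_len, tests):
    """Return the shortest sequence matching `target` on every test input."""
    for length in range(1, max_len + 1):
        for seq in product(INSTRS, repeat=length):
            if any(is_redundant(i) for i in seq):
                continue                            # prune meaningless instructions
            if all(run(seq, a, b) == target(a, b) for a, b in tests):
                return seq
    return None

# Example: look for a sequence computing (2 * r0) mod 256; finds add r0, r0.
print(superoptimize(lambda a, b: (2 * a) & 0xFF, 2, [(3, 5), (7, 1), (0, 9)]))
```

Even in this toy setting the search grows exponentially with sequence length, which is exactly why the pruning and stochastic approaches above matter.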

Another recent study created a database of peephole optimizations using a superoptimizer. These optimizations could then be applied to a new program at compile time, avoiding the superoptimizer's long computation times. Using this method, a database of energy-efficient transformations could be constructed, enabling the compiler to produce an energy-efficient executable.
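As an illustration, such a peephole database can be thought of as a table of instruction-window rewrites applied over the compiler's output. The patterns, replacements and register conventions below are purely illustrative, not entries from any real database; actual entries would be discovered and verified by the superoptimizer.

```python
# Superoptimizer-found rewrites, keyed by the instruction window they match.
# The second entry assumes r1 is dead after the matched window.
PEEPHOLES = {
    ("mul r0, r0, #2",): ("lsl r0, r0, #1",),        # shift assumed cheaper than multiply
    ("mov r1, #0", "add r0, r0, r1"): (),            # adding a fresh zero does nothing
}

def apply_peepholes(asm):
    """Greedily replace known instruction windows with their cheaper equivalents."""
    out, i = [], 0
    while i < len(asm):
        for pattern, replacement in PEEPHOLES.items():
            if tuple(asm[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(asm[i])
            i += 1
    return out

print(apply_peepholes(["mul r0, r0, #2", "mov r1, #0", "add r0, r0, r1"]))
# -> ['lsl r0, r0, #1']
```

The expensive search is done once, offline; applying the resulting table at compile time costs no more than any other peephole pass.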

Superoptimization has also been used before, in the GNU Superoptimizer, to generate short sequences of code that remove branches from the target program. It should be possible to apply superoptimization in a similar way to minimise the amount of energy that caches and memories consume.
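A classic example of branch removal is computing an absolute value without a compare and branch, using only an arithmetic shift, an exclusive-or and a subtraction. The sketch below models this on 32-bit two's-complement values in Python; it is the kind of straight-line sequence a superoptimizer can discover, and an energy-targeting superoptimizer would weigh such sequences by measured energy rather than by instruction count alone.

```python
MASK32 = 0xFFFFFFFF

def asr(x, n):
    """Arithmetic (sign-extending) shift right of a 32-bit two's-complement value."""
    if x & 0x80000000:
        x |= ~MASK32                                 # sign-extend before shifting
    return (x >> n) & MASK32

def abs32_branchy(x):
    # The obvious version: a compare and a conditional branch.
    return ((MASK32 + 1) - x) & MASK32 if x & 0x80000000 else x

def abs32_branchless(x):
    # Three straight-line operations, no branch: shift, xor, subtract.
    mask = asr(x, 31)                                # 0 if x >= 0, 0xFFFFFFFF if x < 0
    return ((x ^ mask) - mask) & MASK32

assert abs32_branchy(0xFFFFFFFB) == abs32_branchless(0xFFFFFFFB) == 5   # abs(-5)
assert abs32_branchy(5) == abs32_branchless(5) == 5
```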

Research questions

There are several interesting questions that I intend to investigate during this project:

  • Does superoptimization for energy consumption produce a different optimal sequence than when targeting code size or performance?
  • Can we create a database of energy-efficient peephole optimizations, and could these be used to transform a non-energy-efficient application into an energy-efficient one?

I intend to implement a superoptimizer to answer these questions. Each potential code sequence the superoptimizer finds will be evaluated on a platform instrumented for energy measurement, allowing the lowest-energy sequence to be chosen.
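In outline, the evaluation loop might look something like the sketch below. The helpers flash_and_run and read_energy_joules are hypothetical stand-ins for whatever interface the instrumented board ends up providing; each candidate is run several times so that measurement noise can be averaged out.

```python
# A sketch of the planned evaluation loop, assuming each candidate can be run
# on the instrumented board and its energy read back afterwards.

def cheapest_sequence(candidates, flash_and_run, read_energy_joules, runs=5):
    """Return the candidate with the lowest average measured energy."""
    best_seq, best_energy = None, float("inf")
    for seq in candidates:
        samples = []
        for _ in range(runs):                        # repeat to average out noise
            flash_and_run(seq)                       # hypothetical: load and execute the candidate
            samples.append(read_energy_joules())     # hypothetical: read the energy measurement
        energy = sum(samples) / len(samples)
        if energy < best_energy:
            best_seq, best_energy = seq, energy
    return best_seq, best_energy
```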

Hardware

Initially the project will use the existing hardware from last summer's project on the energy impact of different compiler optimizations. The BeagleBone fits the criteria well: 32KB instruction and data caches, and a 256KB L2 cache backed by 256MB of DRAM.

BeagleBone instrumented to measure three of the power supplies.

The techniques could then be applied to other embedded platforms with caching, to see if they transfer to different memory hierarchies and cache sizes.

Embecosm’s commitment to open access

All software produced by the project will be open source, and the resulting academic papers will be published open access. Current results and progress will be hosted online via a wiki, as with previous projects.

By the end of the project, I hope to have developed new optimization methods and novel ideas for how the compiler can reduce an application’s energy consumption, further strengthening Embecosm’s expertise in energy-efficient optimization and compilation techniques.