The Security Enhancing Compilation for Use in Real Environments (SECURE) project is an InnovateUK-supported research program by Embecosm which started in July 2017. Its goal is to take the latest academic ideas for improving the security of code and provide practical reference implementations in the main open source compilers, GCC and LLVM. The project gives particular priority to improving the security of Internet-of-Things (IoT) devices. Prof Elisabeth Oswald and Dr Dan Page of Bristol University’s Cryptography Research Group are advisors to the project.

Security is a system-wide challenge. The compiler, as the one tool that looks at all the software in a system, is well placed to improve security in two ways:

  • by detecting insecure code and warning the developer about it; and
  • by providing features to help write secure code.

Features to detect insecure code

  • Sensitive control flow. If a critical variable, such as a cryptographic key, is used to control the flow through a program, then information about that critical variable will leak when the program is run. Techniques such as differential timing or power analysis will reveal the flow and hence information about the critical variable. Professional programmers know to avoid such code, but in a large program it can be hard to spot, particularly where it is not the critical variable itself that controls the flow, but an alias of the variable or a part of the variable. Marking a critical variable with a suitable attribute allows the compiler’s global dataflow analysis to detect uses that affect control flow in the program.
  • Sensitive memory access. This is closely related to sensitive control flow, but in this case a critical variable controls which memory is accessed. Where different memory regions have different timing or energy characteristics, this will cause information to leak. The compiler’s dataflow analysis pass can detect memory accesses that are controlled by a critical variable and so may leak information.
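The two patterns above can be illustrated with a short C sketch. It is not taken from the SECURE implementation: the function names are hypothetical, and because the text does not give the spelling of the attribute, the secret values are plain parameters. The "leaky" functions show the kind of code such an analysis would be expected to flag; the "ct" functions show a common hand-written alternative that avoids branching or indexing on the secret. An optimizing compiler may still reintroduce branches, so this is only an illustration of the source-level pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* Sensitive control flow: the loop exits at the first mismatch, so the
   run time depends on how many leading bytes of the guess are correct. */
int leaky_equal(const uint8_t *secret, const uint8_t *guess, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (secret[i] != guess[i])      /* branch controlled by the secret */
            return 0;
    }
    return 1;
}

/* Same result, but every byte is always examined and no branch
   depends on the secret. */
int ct_equal(const uint8_t *secret, const uint8_t *guess, size_t n)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= (uint8_t)(secret[i] ^ guess[i]);
    /* diff == 0 wraps to all ones, so bit 8 is set only when equal. */
    return (int)(1u & (((unsigned)diff - 1u) >> 8));
}

/* Sensitive memory access: the address read depends on the secret. */
uint8_t leaky_lookup(const uint8_t table[16], uint8_t secret_idx)
{
    return table[secret_idx & 0x0fu];
}

/* Reads every entry and keeps the wanted one with a mask. */
uint8_t ct_lookup(const uint8_t table[16], uint8_t secret_idx)
{
    uint8_t result = 0;
    for (unsigned i = 0; i < 16; i++) {
        unsigned x = i ^ (secret_idx & 0x0fu);    /* 0 only when i matches */
        uint8_t mask = (uint8_t)((x - 1u) >> 8);  /* 0xff if x == 0, else 0 */
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

Both versions of each function return the same values; they differ only in whether the secret influences the branches taken or the addresses touched.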

Features to help write secure code

  • Stack erase. When returning from a function, any values left on the stack remain there, and are potentially accessible from other functions.  Functions can be labelled with an attribute to add code to the epilogue to clear the stack frame.  An option to the compiler will apply this to all functions.
  • Register erase. Often used together with stack erase. A function can be labelled with an attribute to indicate that all local register values should be cleared on return from the function. An option to the compiler will apply this to all functions.
  • Longjmp erase. A special case of stack erase, where longjmp clears the entire stack between its call and resumption point.
  • Cryptographic bit splitting. An attribute can be used to indicate that a critical variable should be split up and the parts stored in multiple locations. The technique is more commonly used as a defence against network attacks. While it defends against attacks which scan memory, it has a significant computational cost, which in turn may increase information leakage.
  • Bit-slicing. Block ciphers and similar cryptographic algorithms are often defined in terms of processing one row of the block at a time. It can be computationally beneficial to instead process all the rows simultaneously one bit at a time. This has a side benefit of improving the security of the code, with reduced information leakage.  Embecosm is developing pragmas and intrinsic functions to help automate this transformation.
  • Code duplication. An attack method sometimes used against secure code is to shine powerful lasers on the chip in the hope of corrupting code, so that a critical variable is not written correctly. To defend against this, programmers will often assign values to critical variables twice, so even if one code point is corrupted, the value will still be assigned. The problem is that modern compilers will spot such duplication and automatically remove it, so such code is often compiled without optimization, seriously reducing performance. Embecosm instead automates the duplication for variables marked as being sensitive to this attack. This simplifies the whole process and allows code to be optimized. Because it is automated, no duplication is missed and the duplication is spaced optimally.
  • Atomicity. Sometimes there is no alternative to a critical variable controlling flow. Atomicity tries to ensure that the performance of alternative control flows matches as closely as possible, not just in terms of timing, but also in power consumption and memory accesses.
  • Permutation. Very often there are multiple ways of computing the same basic block. Rather than choosing just one, the compiler can generate all of them, randomly choosing which one to execute each time through. This serves to confuse attacks analyzing code flow.
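As an illustration of the bit-slicing idea in the list above, here is a hand-written C sketch. It does not use Embecosm's pragmas or intrinsics, which the text does not specify; all names are hypothetical. The operation chosen for the example is multiplication by x in GF(2^8) (the AES "xtime" step), applied to 32 independent bytes. The reference version handles one byte at a time with a data-dependent branch. The sliced version stores bit b of every byte in one 32-bit word and then needs only XORs and word moves, with no branch at all.

```c
#include <stdint.h>

#define LANES 32   /* number of independent bytes processed at once */

/* Reference: one byte at a time, with a branch on the data. */
uint8_t xtime_ref(uint8_t x)
{
    uint8_t y = (uint8_t)(x << 1);
    if (x & 0x80)
        y ^= 0x1b;                 /* reduce by the AES polynomial */
    return y;
}

/* Transpose: bit l of slice[b] becomes bit b of in[l]. */
void to_slices(const uint8_t in[LANES], uint32_t slice[8])
{
    for (int b = 0; b < 8; b++) {
        slice[b] = 0;
        for (int l = 0; l < LANES; l++)
            slice[b] |= (uint32_t)((in[l] >> b) & 1u) << l;
    }
}

/* Inverse transpose: rebuild the bytes from the bit slices. */
void from_slices(const uint32_t slice[8], uint8_t out[LANES])
{
    for (int l = 0; l < LANES; l++) {
        out[l] = 0;
        for (int b = 0; b < 8; b++)
            out[l] |= (uint8_t)(((slice[b] >> l) & 1u) << b);
    }
}

/* Bit-sliced xtime on all lanes at once. The shift becomes a renaming
   of slices, and the conditional XOR with 0x1b (bits 0, 1, 3 and 4)
   becomes an XOR with the old top slice. t must not alias s. */
void xtime_sliced(const uint32_t s[8], uint32_t t[8])
{
    uint32_t hi = s[7];
    t[0] = hi;
    t[1] = s[0] ^ hi;
    t[2] = s[1];
    t[3] = s[2] ^ hi;
    t[4] = s[3] ^ hi;
    t[5] = s[4];
    t[6] = s[5];
    t[7] = s[6];
}
```

A caller would convert its data with to_slices once, run as many sliced operations as the algorithm needs, and convert back with from_slices at the end, so that the cost of the two transpositions is spread over the whole computation.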
