Compilation for Security

Security is a system-wide problem, and increasingly important in a highly connected world. All too often secure hardware is compromised by poorly implemented software. The compiler is ideally placed to help the secure software professional detect problems and write secure code. With support from Innovate UK, Embecosm is developing standard extensions to GCC and LLVM, which detect common security flaws in code and provide features to make writing secure code easier.

Some processors include custom security hardware, for example to detect unauthorized modification of code. Embecosm has experience in adding compiler extensions to take advantage of such hardware features automatically.

Technical Details

Embecosm is extending GCC and LLVM to add generic features to make writing secure code easier for the software engineering professional.

Features to detect insecure code. These provide warnings to the developer about coding that could be insecure.
Features to help write secure code. There are a number of techniques which are known to be good practice, but all too often are not used because they are complex to implement. Our extensions automate a number of these techniques, to make them easier to use.

In addition, some customers working on secure applications build custom security hardware into their processors. Embecosm can extend compilers to take advantage of such hardware automatically (see case study).

In all cases the features have been added using techniques compliant with C and C++ standards, such as variable and function attributes and command line options.

Much of the work on generic features is being carried out with support from Innovate UK, the British government’s innovation agency. This is allowing us to provide reference implementations of some of these features for GCC and LLVM targeting ARM and RISC-V processors. These reference implementations are made freely available from Embecosm’s GitHub repositories as they are completed. Contact Embecosm to discuss porting these to your own architecture, or adding specialist features for your application.

Features to detect insecure code

The following features are being developed to help detect insecure code.

Sensitive control flow. If a critical variable, such as a cryptographic key, is used to control the flow through a program, then information about that critical variable will leak when the program is run. Techniques such as differential timing or power analysis will reveal the flow and hence information about the critical variable. Professional programmers know to avoid such code, but in a large program it can be hard to spot, particularly where control depends not on the critical variable itself, but on an alias of the variable or a part of the variable. Embecosm’s critical attribute is used to mark such variables. The compiler’s global dataflow analysis will then detect uses that affect control flow in the program.
Sensitive memory access. This is closely related to sensitive control flow, but in this case a critical variable controls which memory is accessed. Where different memory regions have different timing or energy characteristics, this will cause information to leak. The compiler’s dataflow analysis pass can detect memory accesses controlled by a critical variable, which may leak information.
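
The two patterns above can be illustrated with a small C sketch. It is an illustration only: the text names a critical attribute but does not give its syntax, so the CRITICAL macro below is a hypothetical placeholder that expands to nothing, and the key bytes, table contents and function names are invented for the example. Each leaky function is paired with a hand-written alternative whose control flow and memory access pattern do not depend on the secret.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical placeholder for the "critical" attribute named in the text.
   Its real spelling is not given there, so this expands to nothing. */
#define CRITICAL

/* Example secret (made-up bytes). */
static CRITICAL const uint8_t secret_key[4] = {0x13, 0x37, 0xc0, 0xde};

/* An arbitrary 4-bit substitution table, used only for illustration. */
static const uint8_t table[16] = {12, 5, 6, 11, 9, 0, 10, 13,
                                  3, 14, 15, 8, 4, 7, 1, 2};

/* Sensitive control flow: the early return makes the run time depend on
   how many leading bytes of the guess match the key. */
int check_leaky(const uint8_t *guess) {
    for (size_t i = 0; i < sizeof secret_key; i++)
        if (secret_key[i] != guess[i])
            return 0;
    return 1;
}

/* Same result, but every byte is always examined and no branch is taken
   on secret data inside the loop. */
int check_constant_time(const uint8_t *guess) {
    uint8_t diff = 0;
    for (size_t i = 0; i < sizeof secret_key; i++)
        diff |= (uint8_t)(secret_key[i] ^ guess[i]);
    return diff == 0;
}

/* Sensitive memory access: the address read depends on the secret nibble. */
uint8_t lookup_leaky(uint8_t key_nibble) {
    return table[key_nibble & 0x0f];
}

/* Reads every entry and selects the wanted one with a mask, so the
   sequence of addresses touched is the same for every secret value. */
uint8_t lookup_scan(uint8_t key_nibble) {
    uint8_t idx = key_nibble & 0x0f;
    uint8_t result = 0;
    for (unsigned i = 0; i < 16; i++) {
        uint8_t mask = (uint8_t)(0u - (unsigned)(i == idx)); /* 0x00 or 0xff */
        result |= (uint8_t)(table[i] & mask);
    }
    return result;
}
```

An optimizing compiler is free to turn source-level branch-free code back into branches, so the safe variants here show intent rather than a guarantee about the generated code.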

Features to help write secure code

The following features are being developed to help write secure code.

Stack erase. When returning from a function, any values left on the stack remain there, and are potentially accessible from other functions. Functions labelled with the stack_erase attribute add code to the epilogue to clear the stack frame. The -fstack-erase option to the compiler applies this to all functions.
Register erase. Often used with stack erase, the register_erase attribute clears all local register values on return from a function. The -fregister-erase option to the compiler applies this to all functions.
Longjmp erase. A special case of stack erase, where longjmp clears the entire stack between its call and resumption point.
Cryptographic bit splitting. This technique is more commonly used as a defence against network attacks. Applied to a global critical variable, the bit_split attribute causes that variable to be split up and the parts stored in multiple locations. This defends against attacks that scan memory, but has a significant computational cost (reassembling the value whenever it is used), which in turn may increase information leakage.
Bit-slicing. Block ciphers and similar cryptographic algorithms are often defined in terms of processing one row of the block at a time. It can be computationally beneficial to instead process all the rows simultaneously one bit at a time. This has a side benefit of improving the security of the code, with reduced information leakage. Transforming algorithms in this way is highly complex. Embecosm is working on compiler support to simplify the process.
Code duplication. An attack method sometimes used with secure code is to shine powerful lasers on the chip in the hope of corrupting code, so a critical variable is not written correctly. To defend against this, programmers will often assign values to critical variables twice, so even if one code point is corrupted, the value will still be assigned. The problem is that modern compilers will spot such duplication and automatically remove it, so such code is often compiled without optimization, seriously reducing performance. Embecosm instead automates the duplication for variables marked as being sensitive to this attack. This simplifies the whole process and allows code to be optimized. Because it is automated, no duplication is missed and the duplication is spaced optimally.
Atomicity. Sometimes there is no alternative to a critical variable controlling flow. Atomicity tries to ensure that the performance of the alternative control flows matches as closely as possible, not just in terms of timing, but also in power consumption and memory accesses.
Permutation. Very often there are multiple ways of computing the same basic block. Rather than choosing just one, the compiler can generate all of them, randomly choosing which one to execute each time through. This serves to confuse attacks analyzing code flow.
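
To make the motivation for stack erase and code duplication concrete, here is a minimal C sketch of what a programmer has to write by hand without compiler support. It is not Embecosm's implementation: the function and variable names are invented for the example, and volatile is used as a standard-C way to stop the optimizer removing stores that look redundant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero a buffer through a volatile pointer, so the stores cannot be
   discarded as dead even though the buffer is never read again. */
void secure_zero(void *p, size_t n) {
    volatile uint8_t *v = (volatile uint8_t *)p;
    while (n--)
        *v++ = 0;
}

/* Hand-written stack erase: the sensitive local copy is wiped before the
   function returns. This covers only the named local, not compiler spill
   slots or registers. */
uint32_t checksum_with_erase(const uint8_t *key, size_t n) {
    uint8_t scratch[32];
    uint32_t sum = 0;
    if (n > sizeof scratch)
        n = sizeof scratch;
    memcpy(scratch, key, n);
    for (size_t i = 0; i < n; i++)
        sum = sum * 31u + scratch[i];
    secure_zero(scratch, sizeof scratch);
    return sum;
}

/* Hand-written code duplication: the flag is stored twice, so a fault
   that corrupts one store still leaves the other. Because the variable
   is volatile, both stores survive optimization. */
volatile uint32_t access_granted;

void set_access(uint32_t value) {
    access_granted = value; /* first store */
    access_granted = value; /* duplicate store */
}
```

Doing this by hand is easy to forget and easy to get wrong, which is the gap the attributes and options described above are meant to close.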

Case Study

Using LLVM to guarantee program integrity

We rely heavily on many embedded systems in our day-to-day lives, and it is crucial that these systems are as robust as possible. To this end, it is important to have strong guarantees about the integrity of running code. Achieving this naturally requires close integration between hardware features and compiler toolchain support for these features.

To achieve this, one Embecosm customer’s architecture uses hardware signing to ensure integrity of a program’s control flow. Each instruction’s interpretation depends on the preceding instruction in the execution flow (and hence the sequence of all preceding instructions). Basic blocks require a “correction value” to bring the system into a consistent state when arriving from different predecessors.

Compiler support is needed so that compiled code can receive the benefits of this feature. During 2016 we implemented the infrastructure for this feature, which can be enabled at a per-function level in LLVM, for functions written in C and/or assembly.

We extended the target’s backend with a pass that produces metadata describing a system’s control flow. This allows branches and calls to be resolved with appropriate correction values. A particular challenge was dealing with function pointers and hence indirect transfers of control. C attributes were implemented to support such functionality in the LLVM front end.

The encoding of each instruction and the correction values cannot be determined until the final program is linked. Using the metadata generated by LLVM, we can recreate the control flow graph for the entire program. From this, each instruction can be signed, and the correction values for each basic block inserted into the binary.
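
The role of correction values can be illustrated with a toy model in C. Everything in it is an assumption made for illustration: the customer's actual signing scheme is not described here, so the state update function, the use of XOR for corrections, and all names are hypothetical. The model keeps a running signature that each executed instruction updates. A block reached from several predecessors would otherwise see a different signature on each path, so every incoming edge carries a correction that maps the predecessor's exit signature onto the single entry signature the block expects.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy signature update (hypothetical): mix one instruction word into the
   running state. Real hardware would define its own function. */
uint32_t sig_step(uint32_t state, uint32_t instr) {
    state ^= instr;
    state = (state << 5) | (state >> 27); /* rotate left by 5 */
    return state * 2654435761u;           /* odd multiplier, so invertible */
}

/* Signature after executing a whole basic block from a given entry state. */
uint32_t sig_block(uint32_t entry, const uint32_t *instrs, size_t n) {
    for (size_t i = 0; i < n; i++)
        entry = sig_step(entry, instrs[i]);
    return entry;
}

/* Correction for the edge pred -> succ, computed once the whole control
   flow graph is known (at link time in this toy model). */
uint32_t edge_correction(uint32_t pred_exit, uint32_t succ_entry) {
    return pred_exit ^ succ_entry;
}

/* What the toy "hardware" does when control transfers along an edge. */
uint32_t apply_correction(uint32_t state, uint32_t correction) {
    return state ^ correction;
}
```

In this model, two branches of an if statement leave different exit signatures, and each edge into the join block gets its own correction so that both arrive with the same state. If an instruction on either path is modified, the exit signature changes and the stored correction no longer produces the expected entry state.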

The full system was demonstrated at the 2016 LLVM Developers’ Meeting in San Jose, California.

Compiler Tool Chain Development

Embecosm is able to provide new and upgraded ports of binutils, GCC, GDB, GNU libraries, LLVM, LLDB, LLVM utilities and LLVM libraries, whether for the smallest deeply embedded processor, or the largest supercomputer cluster.
