Embecosm develops all components of the tool chain, rigorously testing and benchmarking to ensure you achieve high performance code that executes correctly.
We’ll start with the machine specification (GCC) or TableGen description (LLVM) for your architecture. To this we’ll add all the target specific functionality to allow the compiler to generate code. From this we’ll then extend the compiler:
- tuning the compiler’s optimization heuristics to match your architecture;
- adding custom optimization passes to increase performance;
- defining new attributes and/or pragmas to support your needs;
- providing custom inline assembler constraints; and
- creating intrinsic and builtin instructions to exploit the features of your architecture.
Embecosm’s work with deeply embedded systems gives us the expertise to optimize for code size and energy efficiency as well as for execution performance.
All tool chains need a debugger. Almost invariably this is the GNU Debugger (GDB), but there is the possibility of using the new LLVM Debugger (LLDB). Embecosm will implement the target definition for your architecture, supplementing it with custom commands and variables as required. Embecosm’s expertise in this has been captured in an application note, EAN3, as a guide to others who wish to port GDB to a new architecture.
For large production processors, the debugger can simply run natively on the processor. However for embedded systems, it is usual to use remote debugging, with the client (GDB or LLDB) running on a workstation and communicating via TCP/IP, USB, JTAG or another protocol to a server running on or physically connected to the target. Embecosm has developed a standard open source server, which abstracts the interface to the target, allowing easy switching between a model of the target and physical hardware, with no change in functional behavior.
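As an illustration of the client/server link, here is a minimal sketch of packet framing, assuming the wire format is the GDB Remote Serial Protocol, in which a packet is `$`, the payload, `#`, and a two-digit hexadecimal checksum equal to the sum of the payload bytes modulo 256. The function names are hypothetical, and escaping of special characters and the `+`/`-` acknowledgements are deliberately left out.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of the payload bytes modulo 256."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two hex digits> (no escaping handled)."""
    return b"$" + payload + b"#" + f"{rsp_checksum(payload):02x}".encode("ascii")


def rsp_unframe(packet: bytes) -> bytes:
    """Return the payload of a framed packet; raise ValueError if it is bad."""
    if len(packet) < 4 or packet[:1] != b"$" or packet[-3:-2] != b"#":
        raise ValueError("malformed packet")
    payload = packet[1:-3]
    received = int(packet[-2:].decode("ascii"), 16)  # ValueError if not hex
    if received != rsp_checksum(payload):
        raise ValueError("checksum mismatch")
    return payload
```

Because the framing is independent of the transport, the same packets could be carried over TCP/IP to a model of the target or over a hardware link to real silicon.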
This is particularly valuable when a tool chain is developed pre-silicon. Users can start developing applications using a high level architectural model. As the chip is implemented, this can be replaced by a model derived from the actual Verilog or VHDL of the chip. Finally when silicon is delivered, real hardware can be substituted. Users will notice a performance difference, but the functionality will remain the same throughout.
Low Level Binary Utilities
A compiler typically generates assembly code. To produce executable code, an assembler and a linker are required. The debugger will require a disassembler and possibly an instruction set simulator.
Embecosm understands how to port the linker to different targets, and has over the years been a major contributor to the official distribution of the GNU linker. While the GNU linker remains popular, for larger systems we can provide the GOLD linker, and in the LLVM environment the LLVM linker, lld, will soon be robust enough to be considered for general commercial deployment.
Assemblers, disassemblers and simulators are often hand-written. However Embecosm has considerable experience in generating these tools automatically using either CGEN or TableGen. From a formal description of the architecture, these tools generate tables and code frameworks from which the assembler, disassembler and simulator can be created. In the case of LLVM the assembler can be integrated within the compiler for increased performance.
This approach is particularly advantageous for configurable and extensible architectures (for example RISC-V). A new instruction can be defined, and an updated tool chain generated, with immediate support for that new instruction.
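The idea of generating tools from a description can be sketched in a few lines. This is not CGEN or TableGen: the 16-bit instruction format, the opcode values and the function names below are all hypothetical, chosen only to show a single table driving both the assembler and the disassembler.

```python
# Hypothetical 16-bit format: [15:12] opcode, [11:8] rd, [7:4] rs1, [3:0] rs2.
ISA = {"add": 0x1, "sub": 0x2, "and": 0x3}


def assemble(line, isa=ISA):
    """Encode 'mnemonic rD, rS1, rS2' using the instruction table."""
    mnemonic, _, operands = line.strip().partition(" ")
    if mnemonic not in isa:
        raise ValueError(f"unknown instruction: {mnemonic!r}")
    regs = [int(field.strip().lstrip("r")) for field in operands.split(",")]
    if len(regs) != 3 or not all(0 <= r <= 15 for r in regs):
        raise ValueError(f"bad operands: {operands!r}")
    rd, rs1, rs2 = regs
    return (isa[mnemonic] << 12) | (rd << 8) | (rs1 << 4) | rs2


def disassemble(word, isa=ISA):
    """Decode a 16-bit word using the same instruction table."""
    by_opcode = {opcode: mnemonic for mnemonic, opcode in isa.items()}
    opcode = (word >> 12) & 0xF
    if opcode not in by_opcode:
        raise ValueError(f"unknown opcode: {opcode:#x}")
    rd, rs1, rs2 = (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF
    return f"{by_opcode[opcode]} r{rd}, r{rs1}, r{rs2}"


# Adding one table entry extends both tools at once.
EXTENDED_ISA = dict(ISA, mac=0x9)
```

With the extended table, `assemble("mac r4, r5, r6", EXTENDED_ISA)` and the matching `disassemble` call work without any change to the tool code, which is the property that makes this style attractive for extensible architectures.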
All tool chains rely on a number of libraries.
- A low level library, providing emulation of functionality not available directly in the hardware. For GCC this is libgcc, for LLVM it is compiler-rt.
- For C, a standard C library.
- For C++, the standard C++ library.
Embecosm will provide the low level emulation library, with hand-written assembler implementations for critical functionality. This can be particularly important for deeply embedded systems, where the hardware may not have floating point, or even integer multiplication and division.
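To make the emulation concrete, here is a Python model of shift-and-add multiplication, the kind of algorithm a low level library routine (for example libgcc's `__mulsi3`) can fall back on when the hardware has no multiplier. It is a sketch of the technique only, not libgcc's code; a production version would be written in C or hand-tuned assembler, and the name `soft_mul32` is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def soft_mul32(a: int, b: int) -> int:
    """32-bit wrapping multiply using only shifts, adds and bit tests."""
    a &= MASK32
    b &= MASK32
    result = 0
    while b:
        if b & 1:                      # this bit of the multiplier is set
            result = (result + a) & MASK32
        a = (a << 1) & MASK32          # next power-of-two multiple of a
        b >>= 1
    return result
```

The loop runs once per significant bit of the multiplier, which is why such routines are worth hand optimizing on targets that call them frequently.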
There are a number of choices for the standard C library. Newlib provides a very small library, suitable for bare metal systems, or when using a small real time OS (RTOS). The GNU C Library (glibc) and musl are two popular full functionality libraries, essential if a full operating system kernel such as Linux or one of the BSD kernels is to be supported. Embecosm can provide an implementation of the library of your choice.
The standard C++ library is more straightforward. Both GCC and LLVM include implementations which build directly on the standard C library.
Testing and Benchmarking
At the heart of all Embecosm’s compiler work is robustness of the resulting tool chain. Typically half the effort on any project is dedicated to testing and benchmarking. We characterize four types of testing.
- Regression testing. Have we broken functionality which was working in the past, either for this architecture or for some other architecture?
- Functional testing. Do we meet the functionality you required? This will be based on your code, and can be supplemented by standard test suites. Where a new chip is being developed, it can include compliance testing against the architectural specification.
- Non-functional testing (benchmarking). We measure the performance of the code generated by the compiler using standard benchmark programs, and programs provided by the customer. We measure execution performance, code size and (where relevant) energy efficiency.
- Comparative testing. This is used when developing a tool chain pre-silicon. We compare tests using the architectural golden model of the processor with tests using a simulation of the actual implementation in Verilog or VHDL. Differences can indicate a flaw in the chip design prior to silicon tape-out.
Regression testing is the baseline for a robust tool chain. GCC has around 75,000 regression tests of the C compiler and 50,000 tests of the C++ compiler, which test both the compiler and execution of the resulting code. LLVM has around 20,000 regression tests of the compiler, but these do not execute the resulting code. The execution tests for LLVM are based on applications running natively on a Linux system, so for execution tests we use the GCC regression tests, but apply them to the LLVM compiler.
All other GNU tools have their own regression tests, of which GDB, with around 10,000 tests of the debugger, is the largest. When creating a tool chain for a new architecture, Embecosm will supplement these standard test suites with tests of the new features which are added.
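Spotting a regression amounts to comparing a new run against a known baseline. The sketch below assumes results are available as DejaGnu-style summary lines of the form `STATUS: test name` and recognizes only a subset of the statuses; the function names are hypothetical. A test that passed in the baseline but is missing from the current run is also reported.

```python
STATUSES = {"PASS", "FAIL", "XPASS", "XFAIL",
            "UNRESOLVED", "UNSUPPORTED", "UNTESTED"}


def parse_summary(text: str) -> dict:
    """Map test name -> status for every 'STATUS: name' line in the text."""
    results = {}
    for line in text.splitlines():
        status, sep, name = line.partition(": ")
        if sep and status in STATUSES:
            results[name.strip()] = status
    return results


def regressions(baseline: dict, current: dict) -> list:
    """Names of tests that passed in the baseline but do not pass now."""
    return sorted(
        name
        for name, status in baseline.items()
        if status == "PASS" and current.get(name) != "PASS"
    )
```

Tests that already failed in the baseline are deliberately ignored, so the report stays focused on functionality that has been broken since the last good run.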
Functional testing provides pro-active exercising of the compiler for correct behavior in the applications to which it will be applied. We rely heavily on the customer supplying relevant test code we can exploit. In the ideal case we use the actual application code, but in many cases this is not feasible, due to incompleteness or instability as applications are developed. In these circumstances we can often create test cases based on key aspects of the application code.
Compliance testing is a particular type of functional testing which can be important.
- With a new architecture, it is valuable to make an exhaustive check that the compiler and assembler correctly handle all instructions. When the assembler/disassembler has been created using CGEN or TableGen, it is possible to generate an instruction set compliance test suite automatically. Where the simulator has also been generated using CGEN, it is also possible to test that the instructions execute correctly as well as assembling correctly.
- For processors where floating point is important, the open source TestFloat suite is invaluable, testing for IEEE 754 compliance.
- For DSPs with unusual integer behavior (fixed point, saturating arithmetic), additional testing in this area is important. Embecosm has its own in-house test suite for exercising such functionality.
- There are also a number of proprietary C and C++ language compliance test suites, such as Plum Hall and Supertest. However these generally exercise the front-end of the compiler, which in an open source compiler shared across many architectures tends to be stable. Such test suites struggle to keep up with the standards (there are no test suites testing beyond C/C++11) and can generate a huge number of irrelevant failures: do you really care about comprehensive support for C trigraphs or K&R style function declarations?
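For the saturating arithmetic case in the list above, a test suite needs a reference model to supply expected values. The following is a minimal sketch assuming 16-bit signed saturating addition; it does not represent Embecosm's in-house suite, and the names `sat_add16` and `boundary_vectors` are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat_add16(a: int, b: int) -> int:
    """Add two 16-bit signed values, clamping instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def boundary_vectors() -> list:
    """(a, b, expected) triples concentrated on the overflow boundaries."""
    edges = [INT16_MIN, -1, 0, 1, INT16_MAX]
    return [(a, b, sat_add16(a, b)) for a in edges for b in edges]
```

Each triple can then be turned into a small C test that performs the addition on the target and compares the result with the expected value.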
Embecosm also has experience with synthetic tests. Csmith is well established as a tool for generating random C programs, while fuzz testing is emerging as a technique to explore correctness of compiled code.
Once we are confident in the correctness of compiled code, non-functional testing, commonly known as benchmarking, is used to ensure the tool chain is meeting its design objectives, whether for compiled code speed, code size or energy efficiency. Embecosm has experience with many standard benchmarking suites, including EEMBC, and individual benchmarks such as CoreMark. Together with Bristol University, Embecosm has also created its own benchmark suite, BEEBS, intended for very deeply embedded systems. However the most important programs with which to benchmark are the users' actual applications. All too often standard compilers such as GCC and LLVM are very effective with standard benchmarks, but can fall down when applied to particular applications. The other aspect of benchmarks is that they must verify that their results are correct. In one case a customer presented a program they had used for years to benchmark their old compiler, without anyone noticing that the program was generating the wrong result!
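The point about benchmarks verifying their own results can be sketched as a harness that refuses to report a time for a wrong answer. The kernel here is a hypothetical stand-in (a sum of squares with a known closed form); a real benchmark would substitute its own workload and an independently derived expected value.

```python
import time


def sum_of_squares(n: int) -> int:
    """Stand-in benchmark kernel: 0^2 + 1^2 + ... + (n-1)^2."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def expected_sum_of_squares(n: int) -> int:
    """Closed form used as the independent check on the kernel."""
    return (n - 1) * n * (2 * n - 1) // 6


def run_verified(kernel, arg, expected) -> float:
    """Time the kernel and return elapsed seconds, or raise if it is wrong."""
    start = time.perf_counter()
    result = kernel(arg)
    elapsed = time.perf_counter() - start
    if result != expected:
        raise ValueError(f"benchmark produced {result}, expected {expected}")
    return elapsed
```

A timing is only returned when the check passes, so a miscompiled but fast benchmark cannot be mistaken for an improvement.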
For customers we carry out two types of benchmarking:
- Continuous benchmarking is used to monitor progress in the project. Over the months and years, is compiled code tending to get faster, smaller or more energy efficient? Because this is a long-term exercise, the benchmark code needs to be completely stable. For this reason customer applications, which are evolving, are unsuitable, and a set of standard benchmarks should be used. Care should be taken to select benchmarks which are as representative as possible, something with which Embecosm can assist.
- Snapshot benchmarking is used to check the implication of a significant change to the tool chain, such as a new release, or addition of a new optimization. This always uses customer application code, which is compiled and run with both the old and new tool chain. Apart from the tool chain difference (old versus new release, or with and without optimization), the setups must be identical. This provides a clear measure of how any change will affect the actual end use case.
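A snapshot comparison reduces to a percentage change per metric between the old and new tool chain. The sketch below assumes each run yields a dictionary of measurements where lower is better (cycle counts, code bytes, energy); the metric names and function names are hypothetical.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = reduction)."""
    return (new - old) / old * 100.0


def snapshot_report(old_results: dict, new_results: dict) -> dict:
    """Percentage change for every metric measured with both tool chains."""
    if old_results.keys() != new_results.keys():
        raise ValueError("both runs must measure the same metrics")
    return {
        name: percent_change(old_results[name], new_results[name])
        for name in old_results
    }
```

For example, `snapshot_report({"cycles": 200000}, {"cycles": 150000})` reports a 25% reduction in cycle count. Insisting on identical metric sets is one small way to enforce the rule that the two setups differ only in the tool chain.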
Embecosm are pioneers in comparative testing, used pre-silicon. The standard regression and functional tests are run using both a simulation of the high level architectural model and a simulation of the actual implementation in Verilog or VHDL. The results from the two should be identical, and any differences are likely to be due to a flaw in the silicon chip implementation. We have used this in a number of scenarios, including the OpenRISC processor used by NASA in TechEdSat and the Epiphany processor described in the case study below.
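In outline, comparative testing runs every test on both models and reports any disagreement. The sketch below assumes each model is wrapped in a callable that takes a test name and returns a comparable result (for instance exit status and output); these callables and the function name are hypothetical placeholders for real simulator drivers.

```python
def compare_models(tests, run_reference, run_implementation) -> list:
    """Return (test, reference_result, implementation_result) for mismatches."""
    mismatches = []
    for test in tests:
        reference = run_reference(test)
        implementation = run_implementation(test)
        if reference != implementation:
            mismatches.append((test, reference, implementation))
    return mismatches
```

An empty list means the two models agree on every test; any entry is a candidate hardware bug, or a model bug, to investigate before tape-out.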
How much will a tool chain cost to develop?
Historically compilers have required large teams over several years to create. However the ability to share code with open source compilers such as GCC and LLVM has hugely reduced this effort. The actual effort will depend on several factors—how big is the architecture, how unusual is the architecture, what languages need to be supported, how many custom optimizations and features are needed and how much specialist testing and benchmarking is needed. Once created, the compiler will need maintaining, rolling forward to new versions, fixing bugs found by users and adding new optimizations and features.
For most architectures, Embecosm are able to create a proof-of-concept functional tool chain in 3 engineer months. This is sufficient for the customer to explore the potential of the compiler and decide the features needed in the full production tool chain. Taking the tool chain to full release quality, passing all tests and achieving target performance typically takes a total of 1-3 engineer years, over an elapsed period of 12-18 months. Thereafter maintenance typically requires 0.5–1 engineer days per year.
However there is one scenario in which the effort can be much less than this. Very often chip design houses have their own in-house DSP, used for a single class of applications and programmed in assembly language. As such DSPs become more complex, the cost of assembly language programmers can become significant, and it could be cost effective to switch to programming in C. In this case, we typically have a small, relatively simple architecture and need only support C for a small number of applications for an in-house user base. Embecosm have demonstrated that it is feasible to create a tool chain for such a use case in just 6 engineer months, making the switch to programming in C highly cost effective.