My Google Summer of Code (GSoC) project builds upon a previous joint project between Embecosm and Southampton University, which recently developed an open source AI/ML ISA extension for a RISC-V core. That project extended a CV32E40P RISC-V core with a vector instruction accelerator to improve the performance of neural network inference. It produced excellent results, demonstrating a 5-fold increase in performance on an appropriate benchmark (TinyMLPerf).
One limitation of this existing work is that the design currently exists only as a Verilator model. This limits the measurement of performance to cycle counts, without insight into any impact on clock speed in actual silicon. Furthermore, the current design has no pipeline and restricts all operations to be “single cycle”. This makes it difficult to understand how the design will perform in a realistic situation. The goal of my GSoC project was to resolve this deficiency by realizing a hardware implementation of the accelerator. This was done by synthesizing the model for an FPGA; the platform chosen for this project was the Nexys A7. The repository containing my work can be found here.
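To illustrate why cycle counts alone do not settle the question, here is a minimal sketch in Python. Only the 5-fold cycle-count improvement comes from the earlier work; the clock frequencies are hypothetical placeholders, not measurements from this project.

```python
def wall_clock_speedup(cycle_speedup: float,
                       f_base_mhz: float,
                       f_accel_mhz: float) -> float:
    """Real-time speedup of the accelerated core over the baseline.

    Execution time = cycles / frequency, so
    speedup = (cycles_base / f_base) / (cycles_accel / f_accel)
            = cycle_speedup * (f_accel / f_base).
    """
    if min(cycle_speedup, f_base_mhz, f_accel_mhz) <= 0:
        raise ValueError("all inputs must be positive")
    return cycle_speedup * (f_accel_mhz / f_base_mhz)


CYCLE_SPEEDUP = 5.0  # cycle-count improvement reported by the earlier project

# Hypothetical case 1: the accelerator does not lower the achievable clock.
same_clock = wall_clock_speedup(CYCLE_SPEEDUP, 100.0, 100.0)

# Hypothetical case 2: long single-cycle paths halve the achievable clock.
slower_clock = wall_clock_speedup(CYCLE_SPEEDUP, 100.0, 50.0)

print(same_clock, slower_clock)
```

With these made-up frequencies, the second case keeps only half of the cycle-count gain, which is the kind of effect a Verilator model alone cannot reveal.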
The Project
The first step I took in this project was to resolve some outstanding issues in the verification of the current design using randomly generated test cases. This was a necessary step to ensure the integrity of my results, with functional and line coverage checked with Verilator.
Having done this, I was able to focus on bringing up the ISA extension on the FPGA. As a first step, I attempted to do this on the baseline core (without the accelerator), for which I used the Open Hardware Group's core-v-mcu project. This was a reasonably challenging step. Debugging (for which I used OpenOCD and GDB) was not straightforward and, since the code was running on bare metal, I had to create a linker script to assign the program sections to the correct addresses of the core-v-mcu RAM. Anyone interested in following my steps on this project may find the README I wrote a useful reference.
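For readers unfamiliar with what a linker script has to get right, the following Python sketch models the bookkeeping only: placing sections back to back inside one RAM region and checking that they fit. It is not the linker script from this project (a real one is written in the GNU ld command language), and the RAM origin, RAM length, and section sizes are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    name: str
    start: int
    size: int


def align_up(addr: int, align: int) -> int:
    """Round addr up to the next multiple of align."""
    return (addr + align - 1) // align * align


def place_sections(origin, length, sections, align=4):
    """Place (name, size) sections consecutively in one memory region."""
    cursor = origin
    placed = []
    for name, size in sections:
        cursor = align_up(cursor, align)
        placed.append(Placement(name, cursor, size))
        cursor += size
    if cursor > origin + length:
        raise ValueError("sections overflow the memory region")
    return placed


# Hypothetical memory map and section sizes, NOT the real core-v-mcu values.
RAM_ORIGIN = 0x10000000
RAM_LENGTH = 0x4000
SECTIONS = [(".text", 0x1234), (".data", 0x101), (".bss", 0x200)]

layout = place_sections(RAM_ORIGIN, RAM_LENGTH, SECTIONS)
for p in layout:
    print(f"{p.name:6s} start=0x{p.start:08x} size=0x{p.size:x}")
```

If the origin or length does not match the actual RAM of the target, the program is loaded to addresses that do not exist or overlap something else, which is why getting these values right mattered for the bare-metal bring-up.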
Having synthesized and successfully run the baseline CORE-V MCU, I was able to tackle the challenge of synthesizing the accelerated core. This required some significant changes to support the new vector instructions and the accelerator/core interface. OpenOCD required definitions for some vector CSRs as well as some main processor CSRs, and GDB required further modification. At the time of writing this blog post, pure C programs run without issue, while programs containing assembly vector extension instructions still do not perform correctly.
My Experience
The GSoC program has been both challenging and exciting for me. Supported by my mentors, Will Jones and Jeremy Bennett of Embecosm, I have been able to make significant progress on running the AI accelerator on the Nexys A7. There remains a great deal of work to be done on this project, and I have agreed with my mentors that we will continue working on it after the end of GSoC. Future work will involve more robust testing of the accelerator to identify why vector instructions are failing, and benchmarking the hardware implementation against its Verilator counterpart.
This project and my experience with GSoC have left me with some key takeaways. While GSoC is a great platform to explore new areas, develop new skills, and perhaps even discover a new career path, GSoC is not an internship. It is definitely not something you should pursue half-heartedly, and it should not be looked at as a way to earn money.
Another thing this project has driven home for me is the value of mentorship and community engagement. Between my project mentors and GSoC, I was connected to a large array of experts in the field of my project. The collaborations that came from these connections were extremely valuable to my project, and I have no doubt that the connections I made will be valuable long into the future.
This has also helped give me confidence in my own value. As well as completing a project that is difficult in its own right, the collaboration and engagement I’ve done have shown me how my work has both fostered interest in a wider community and made useful contributions to it. It was also made very clear to me how, even in a circle of experts, my efforts could be useful, especially for the fresh perspective they provided.
One specific insight regarding my project, and one that has affected not just me but the wider community, is that the out-of-the-box experience of the CORE-V MCU would benefit from substantial improvement. The difficulty I experienced in setting up the debugging system has been an important factor in focusing attention on the SDK and usability of the CORE-V MCU project. While there remains a great deal of work to be done on this, I have made my own contributions by documenting and publishing the steps I have taken, for the guidance of, and testing by, the wider community. One small example is this README.
Perhaps the most important lesson I have learned from all of this is the value of communication. While I set out to tackle my own specific project, as I discussed above, it ultimately prospered only in the context of mutual collaboration with a wider community. The foundation of all of this was strong communication: expressing myself clearly to mentors and community members, and listening diligently in turn. This is a lesson I will be taking forward, and one I strongly encourage any readers of this blog post to take to heart.