Change logs: Why they still matter

Cray 1 computer core. Let’s hope they had a record of where all those wires were connected!

I have often been told that the change log is obsolete: that with modern distributed version control systems like Git, the commit log, combined with tools such as git blame, git show and git diff, means a textual change log is no longer needed.

That is to miss the point of the change log. Git blame/show/diff can show you what has changed in complete detail. But they cannot show you why it was changed.

A story of failure

The “why” is really important. Here is a story to illustrate why.

For the past year you have been a major contributor to a popular open source project, which is now suffering performance problems. From git show you can see that five years previously a contributor who has since left the project replaced Quicksort with a bubble sort at the heart of the project’s code. This seems wrong: we all know Quicksort is much faster than bubble sort, and you are tempted to put Quicksort back to fix the performance problems. So you spend several hours reverting the change and the performance improves, although only marginally. Still, at least it hasn’t got worse.

Then three days later you get a flurry of bug reports from a group of users. It turns out they are using your project on a very strange bare metal architecture which does not have a Quicksort implementation in its library. Your project no longer works for them. You realize that must be why Quicksort was originally removed, but you still want to improve performance. Fortunately, being experienced you know that Shell’s sort is a helpful compromise. Simple to implement, faster than bubble sort, if not quite as fast as Quicksort. So you rewrite the code again, test it with a simple test case and check in the change. Performance is still only marginally better, but now your project works for those without a Quicksort library implementation.

Within hours there is a flood of bug reports that the tool has just stopped working for most major users. You spend some time debugging and realize that the central sort routine can be used for sorting on multiple keys. Shell’s sort is not stable, so it breaks up the order of earlier keys; your simple test case only had a single key, so it didn’t reveal this. So back to the coding again, adding tags to your Shell’s sort code to make it stable. It is all rather more complex than you originally intended and, disappointingly, although everything works and there are no more bug reports, the project is now no faster than when you started.
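To make the stability problem concrete, here is a minimal Python sketch. It is not the project’s code: the records, the helper names and the simple halving gap sequence are all assumptions chosen for illustration. The records start out sorted by name (the earlier key) and are then sorted by department.

```python
def shell_sort(items, key):
    """Shell's sort using the simple halving gap sequence n/2, n/4, ..., 1."""
    a = list(items)
    gap = len(a) // 2
    while gap > 0:
        for i in range(gap, len(a)):
            current = a[i]
            j = i
            # Gapped insertion: move larger elements 'gap' slots to the right.
            # Elements can jump over equal-keyed neighbours, which is what
            # makes the sort unstable.
            while j >= gap and key(a[j - gap]) > key(current):
                a[j] = a[j - gap]
                j -= gap
            a[j] = current
        gap //= 2
    return a


# Hypothetical (department, name) records, already sorted by name.
RECORDS = [("ops", "alice"), ("ops", "bob"), ("dev", "carol"), ("ops", "dave")]


def department(record):
    return record[0]


def names(records):
    return [name for _, name in records]


if __name__ == "__main__":
    print(names(shell_sort(RECORDS, key=department)))  # Shell's sort
    print(names(sorted(RECORDS, key=department)))      # Python's stable sort
```

With this data Shell’s sort returns carol, bob, alice, dave, so alice and bob have swapped within the same department, while the stable built-in sort returns carol, alice, bob, dave. A test with a single key would never show the difference.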

What the original contributor had discovered, but you did not know, was that the central sort routine, while heavily used, is generally given only a small number of items to sort, and those items are almost fully sorted anyway. Under these circumstances simple bubble sort is as fast as any other sort. Had this “why” been documented with the change, you would have been spared a lot of time coding, you could have kept the code simple, and you would have spared users all their disruption.
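The point about almost-sorted input can be illustrated with a short sketch. It assumes the bubble sort stops as soon as a pass makes no swaps, which is my assumption about the variant used, not something recorded in the story. It counts comparisons so the effect is visible.

```python
def bubble_sort(items, key=lambda x: x):
    """Stable bubble sort with early exit; returns (sorted list, comparisons)."""
    a = list(items)
    comparisons = 0
    end = len(a) - 1          # last index that may still be out of place
    swapped = True
    while swapped and end > 0:
        swapped = False
        for i in range(end):
            comparisons += 1
            # Strict '>' means equal keys are never swapped, so the sort is stable.
            if key(a[i]) > key(a[i + 1]):
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        end -= 1              # the largest remaining item is now in place
    return a, comparisons


if __name__ == "__main__":
    print(bubble_sort([1, 2, 3, 5, 4, 6, 7, 8]))  # one adjacent pair out of order
    print(bubble_sort([8, 7, 6, 5, 4, 3, 2, 1]))  # worst case, fully reversed
```

For eight items with one adjacent pair out of order this takes 13 comparisons, against 28 for the fully reversed list. On small, nearly sorted input the quadratic worst case simply never arises.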

Change logs provide the solution

This story of course illustrates flaws other than problems with change logs, and decent regression testing and benchmarking before commits would have avoided end users experiencing issues. But at the heart of the story is the failure to record why a change was made.

In a perfect world, anyone making a commit would write the “why” in the commit message and we would have the record we need. But the git log alone is not the solution; the message is written at the last minute by the person making the commit, without review, and once pushed it can’t be changed (you surely don’t use git commit --amend after you have pushed). Like anything done at the end without review, it is prone to shortcuts and error. We are all familiar with the commit comprising 300 lines of change and the single-line log entry, “Fixed issue”.

When I talk about change logs, I mean ones with a formal structure, which rigorously cover the reason for any change of significance in the code. We use the GNU ChangeLog format, since it is well tried and tested, but any format that applies sufficient rigor is appropriate. We get two benefits:

  1. The rigorous format ensures nothing is missed. Tools can help here. For example, the GNU mklog script will take a git diff and construct the template for a GNU ChangeLog file entry, leaving you to just add the semantic content—the “why”.
  2. Being a separate entity in the code base, the entry can be reviewed as part of the code review process. A good change log entry speeds review, because the reviewer can see what the change is trying to achieve. And ultimately if a mistake is made, it is a file that can be changed (and that change itself recorded) as a further commit.
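To give a feel for what such a tool does, here is a rough Python sketch that builds an entry template from a unified diff. It is not the real mklog and makes no attempt to match its behavior. The function name, the sample diff and the C identifiers in it are hypothetical, and the header line is laid out like the example entry in this post.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -40,7 +40,7 @@ void central_sort(...)"
HUNK = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@\s*(.*)$")
# First identifier followed by "(" in the hunk context, taken as the function name.
FUNC = re.compile(r"(\w+)\s*\(")


def changelog_template(diff_text, date, name, email):
    """Return a ChangeLog entry skeleton; the 'why' is left for the author."""
    functions_by_file = {}
    current = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":      # file deleted: nothing to list here
                current = None
                continue
            current = path[2:] if path.startswith("b/") else path
            functions_by_file.setdefault(current, [])
        elif current is not None:
            hunk = HUNK.match(line)
            if hunk:
                func = FUNC.search(hunk.group(1))
                if func and func.group(1) not in functions_by_file[current]:
                    functions_by_file[current].append(func.group(1))
    out = [f"{date} {name} <{email}>", ""]
    for path, funcs in functions_by_file.items():
        where = f" ({', '.join(funcs)})" if funcs else ""
        out.append(f"\t* {path}{where}: ")
    return "\n".join(out) + "\n"


# Hypothetical diff, as git diff might print it.
SAMPLE_DIFF = """\
diff --git a/sort-routines.c b/sort-routines.c
--- a/sort-routines.c
+++ b/sort-routines.c
@@ -40,7 +40,7 @@ void central_sort(item *a, int n)
-  quick_sort(a, n);
+  bubble_sort(a, n);
"""

if __name__ == "__main__":
    print(changelog_template(SAMPLE_DIFF, "2014-06-10", "Jeremy Bennett",
                             "jeremy.bennett@embecosm.com"))
```

The output lists the file and function touched and ends each line with a bare colon. Everything after that colon is the part no tool can write for you.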

There is an art to writing change log entries, very similar to the art of writing comments. A good entry should record why a change is being made, not just record the code changes (git show can deal with that). Even more important, it should record why something was not done. In our example above the change log entry should have stated that Quicksort was being dropped to support users who did not have it in their library, and that bubble sort was appropriate because the amount of data being sorted was either small or mostly sorted and the chosen sort method must be stable.

Note that there is no need to explain in detail why bubble sort is appropriate and Shell’s sort is not: a professional software engineer will know this. Similarly there is no need to explain why the method must be stable: a professional will know that there must be sorts on multiple keys going on.

Having prepared our change log entry, typically in a file called ChangeLog, we could just commit it with all the other files in Git. However, we can do better than that: if we also incorporate it in our Git log entry, we will have the convenience of seeing the change log using git log, and of easily relating changes to particular commit IDs. So, using the GNU format, our ChangeLog file entry could be:

2014-06-10 Jeremy Bennett <jeremy.bennett@embecosm.com>
        
        * sort-routines.c (central_sort): Replace Quicksort by bubble sort
        for users without QS library. Data is small and/or mostly sorted
        and sort method must be stable.

Git log messages need a title line of up to 72 characters, followed by a blank line, and after that we put the body of the ChangeLog file entry. The corresponding git commit message would be:

Support users without Quicksort library.
       
        * sort-routines.c (central_sort): Replace Quicksort by bubble sort
        for users without QS library. Data is small and/or mostly sorted
        and sort method must be stable.

There is no need for the line giving the date and the user details—git commit puts that in automatically.
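The conversion from ChangeLog entry to commit message is mechanical, so it can be scripted. The following Python sketch is one way to do it, under my own assumptions: the function name is hypothetical, and the header is recognized simply as a first line starting with a YYYY-MM-DD date.

```python
import re

# A ChangeLog header line starts with an ISO date, e.g. "2014-06-10 Jeremy ..."
HEADER = re.compile(r"\d{4}-\d{2}-\d{2}\s")


def commit_message(title, changelog_entry, max_title=72):
    """Build a commit message: title line, blank line, ChangeLog entry body."""
    if "\n" in title or len(title) > max_title:
        raise ValueError(f"title must be one line of at most {max_title} characters")
    lines = [line.rstrip() for line in changelog_entry.splitlines()]
    # Drop the date/author header; git records that information itself.
    if lines and HEADER.match(lines[0]):
        lines = lines[1:]
    # Trim blank lines left at either end of the body.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    return title + "\n\n" + "\n".join(lines) + "\n"


ENTRY = (
    "2014-06-10 Jeremy Bennett <jeremy.bennett@embecosm.com>\n"
    "\n"
    "\t* sort-routines.c (central_sort): Replace Quicksort by bubble sort\n"
    "\tfor users without QS library. Data is small and/or mostly sorted\n"
    "\tand sort method must be stable.\n"
)

if __name__ == "__main__":
    print(commit_message("Support users without Quicksort library.", ENTRY), end="")
```

The result can be written to a file and handed to git commit -F, so the ChangeLog file and the Git log never drift apart.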

This may seem like quite a labor, but for a long-lived project with multiple contributors it is invaluable. I was motivated to write this blog post having recently picked up one of my old projects where someone else had made major changes for two years, but without keeping a change log (they had even blown away my original log!). I spent a month trying to work out what on earth they were trying to do and how the code was supposed to work. In the end I threw away half the code and rewrote it from scratch. I’ve probably thrown away some really good ideas, but I have no way of knowing about them. If only a proper change log had been maintained!
