Code Reviews, Quality and Coverity ResultsWed, 22 Dec 2010 - 18:00
NOTE: if you just want to see the Coverity results, feel free to jump to the end of this long post :-)
When I helped jump-starting this project one of the things I wanted to try out was a very strict Review Policy for a few reasons.
One of the reasons was consistency. Previous projects I participated in had very lax policies about pretty much everything. Style, review, quality, where not strictly enforced, and this is felt as a way to keep the barrier to entry low. In my experience though, the inconsistent style, unclear direction, poor or no review, in the end caused other different barriers to a new developer.
Lack of enforced consisting style makes code difficult to read, especially when you have to read a pair of interacting functions that are written in wildly different styles.
Lack of required reviews helped creating an environment in which contributions from outside are not promptly commented upon. The developer with commit access is used to throw in pretty much everything without having to wait for someone to review. This makes core developers forget how painful it is waiting for a review that is never happening. This in turn can discourage new developers that do not have direct commit access from proposing patches as they see too little feedback and do not feel properly engaged.
Finally quality is something I think suffers a lot from lack of review. Developers that do not have to stand review tend to become more relaxed, code is thrown in without much thought, as long as it doesn't break the build. But breaking the build is a pretty low standard. So often the way a function perform operations, the semantics, are implicitly assumed by other code. Reviews, in my experience, tend to expose the same piece of code to different point of views, and expertise within the project. Things that seem innocuous are pointed out and both developers at the end of the process gain both more knowledge of each other points of view, and more knowledge in general about the piece of software they are modifying. Usually the net result is that in the mid/long term code quality improves significantly.
When you use a common SCM tool, like git, code reviews can happen in two ways. Review before commit or Review-Commit (R-C), and review after commit or Commit-Review (C-R). In SSSD we use the former. Patches must be reviewed and acked by a second developer before they can be committed.
R-C is generally thought as a stricter method, but I find it much better than C-R.*
In my experience with the C-R method the reviewer is encouraged to do sloppier and cursory reviews and just give acks unless something really stands up as very ugly. Patches regularly slip past review during phases where a lot of churn happens. Long patches tend to get the least review (exactly when reviews are more important). People are less engaged. Also because the code is already committed, bad patches can cause a lot of bad feelings, the patch is seen as breaking the code, reverts are called for and the author may feel embarrassed or angered by how they are being treated.
R-C instead assures review is done, more importantly it requires active intervention from the reviewer. This in turn makes it less problematic to actually comment on all aspects of the code even minor ones. Of course it also risks abuse from obsessive nitpickers, but in general lets people speak frankly of the code, and request the appropriate corrections be done or the code will not be committed. The patch is never seen as breaking anything, as it is not committed yet, so you rarely see that added anxiety, pressing and bad feelings that rise when a fix is needed asap. The patch creator have all interest in fixing the issues and learning why they were issues in the first place, and resubmit a better patch, without pressure or embarrassment.
I found this aspect to be fundamental in helping new developers get up to good code standards quickly. Not only people does not get frustrated by poor commits that need to be "fixed" asap. But the interaction between more senior developers and younger ones benefits both greatly. On the one hand the younger developer gets access to the insights of the more experienced developer. They get to understand why the patch is not OK and how it need to be improved to be made acceptable. On the other hand the more experienced developer gets a grasp of what parts of the code are really difficult to deal with for younger ones. Sometimes you are so used to do things one way that you don't realize that they really are pain points and needs refactoring to make them usable.
Also because all developers are submitted to the same regime there are no 'elites' that escape review. And this prevents bad feelings when a patch takes some more time to get approved. It generally also prevent the 'elite' from looking down on new developers. Or other similar 'status' issues. Of course there always developers that are more authoritative, but that authority is earned on the field and maintained through reviews
Arguably all these arguments are strongly biased by my personal view of things, I definitely do not deny that, but is there a metric that can tell whether I was right or wrong in some respect ?
Coverity Results seem to give some interesting insight.
We have been running Coverity a couple of times during the 1.2.0 development cycle, using spare cycles of an internal Red Hat instance. 1.2.0 was an important release for us because it was going to end up in RHEL 6.0 so we wanted to find and fix as many critical bugs as possible.
The first time, ever, that we ran Coverity on the SSSD code base gave us back a defect density of 1.141 bugs per thousand lines of code. After removing the false positives we were down to 0.556 bugs per thousands lines of code.
This was an astounding result. As you can see in the 'Coverity Scan: 2010 Open Source Integrity Report' the mean of defects for the software industry is around 1 defect per thousand lines, and the mean for first scan is usually much higher. Also looking at the 2006 report the mean for the top most 32 open source projects was around 0.4 defects per thousands lines. So we were pretty close to that metric too.
Of course we fixed most of the bugs that were found and a second scan of the 1.2.1 release revealed a defect density of 0.029 bugs per thousands lines. I call that impressive (and if you know me you know I am not someone that easily shows enthusiasm).
That was all and well, but we didn't have further access to Coverity until recently. During the release of 1.5.0 we got access again to Coverity scans, so we ran the tool to find out how we fared.
Before spitting numbers I have to say that the comparison against 1.2.0 is a bit skewed because we forked off a set of basic libraries that now live in their own tree.
1.2.1 had ~ 74k lines of C code alone and the libraries we forked off constituted ~12k lines of that code. In 1.5.0 we have ~ 65k lines instead. So we roughly lost 12k lines and gained 3k lines total. The amount of code change is quite a different thing though. Using git, I can see that the removal of the libraries amounted to roughly 34k deletions (this counts also makefiles, comments, blanklines, etc... that's why it is different than the 11k LOC numbers I gave above) while the diffstat of the diff between 1.2.1 and 1.5.0 gives ~ 73k deletions and 56k additions. So quite a bit of changes happened on that code base after all.
In mid December we scanned the code base, roughly 6 months after the release of 1.2.1, and the results were again astounding: 0.189 bugs per thousand lines. In total 24 defects, 20 real, and 4 false positive. And a week later the we were down to 0 (zero) outstanding defects.
These numbers tell me that our code quality is quite good, and although I can't claim a causal effect, I believe that our review strategy is to be accounted for much of it.
Finally, Congratulations to all SSSD developers. You've done a fine job guys, quite a fine job!
* - I have to say that w/o git R-C would be probably too painful, but git let's you manage the code so easily that R-C has become much simpler and doesn't block a developer as he can keep piling patchs on top of his own repository while waiting for the review, and later easily use the rebasing features of git to fix whatever need fixing quite easily.