Some quality metrics that helped my teams

I’ve been asked “what are the best metrics to improve software quality?” (or something similar) a million times. This blog post is a selfish time saver: you are probably reading it because you asked me that question and I sent you here.

Firstly, I am not a fan of metrics, and I consider a good 99% of the recommended software quality metrics pure rubbish. Having said that, there are a few metrics that have helped teams I worked with, and these are the ones I will share.

Secondly, metrics should be used to drive change. I believe it is fundamental that each tracked metric is clearly associated with the reason it is tracked, so that people focus not on the number but on the benefit that observing the number will drive.

Good metric#1: In order to be able to refactor without worrying about breaking what we have already built, we decided to raise the unit test coverage to >95% and measure it. Builds would fail if the metric was not respected.

Good metric#2: In order to reduce code complexity, improve readability and make changes easier, we set a limit and measured the maximum size of each method (15 lines) and the cyclomatic complexity (don’t remember the number but I think it was <10). Builds would fail if the metric was not respected.
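To make the two build-failing rules above concrete, here is a minimal sketch of a threshold check. It is not the tooling we used: the class `QualityGate`, the `MethodStats` type and the sample numbers are hypothetical, and the complexity limit is only the value I recall (<10).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QualityGate {
    // Thresholds from the post; the complexity limit is a recollection, not a verified value.
    static final double MIN_COVERAGE = 0.95;   // coverage must be strictly above this
    static final int MAX_METHOD_LINES = 15;    // methods may have at most this many lines
    static final int COMPLEXITY_LIMIT = 10;    // cyclomatic complexity must stay below this

    // Hypothetical shape of what a metrics tool might report for one method.
    static final class MethodStats {
        final String name;
        final int lines;
        final int complexity;

        MethodStats(String name, int lines, int complexity) {
            this.name = name;
            this.lines = lines;
            this.complexity = complexity;
        }
    }

    // Returns one message per broken rule; an empty list means the build may pass.
    static List<String> violations(double coverage, List<MethodStats> methods) {
        List<String> problems = new ArrayList<>();
        if (coverage <= MIN_COVERAGE) {
            problems.add("unit test coverage " + coverage + " is not above " + MIN_COVERAGE);
        }
        for (MethodStats m : methods) {
            if (m.lines > MAX_METHOD_LINES) {
                problems.add(m.name + " has " + m.lines + " lines");
            }
            if (m.complexity >= COMPLEXITY_LIMIT) {
                problems.add(m.name + " has cyclomatic complexity " + m.complexity);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<MethodStats> methods = Arrays.asList(
                new MethodStats("parseOrder", 12, 4),
                new MethodStats("applyDiscounts", 22, 11));
        // A real build step would fail the build when this list is not empty.
        violations(0.93, methods).forEach(System.out::println);
    }
}
```

In practice a coverage or static analysis tool wired into the build would do this job; the point is only that each rule is a simple, automatable comparison.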

Good metric#3: In order to continuously deliver low-complexity, easily testable units of work and help with predictability, we started measuring the full cycle time of user stories from inception to production, with the goal of keeping it between 3 and 5 days. When we had user stories that took more than 5 days, we retrospected and examined the reasons.
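As a rough illustration of the cycle time check, the sketch below counts days from inception to production and flags stories that exceed 5 days. The class name, the dates and the choice of calendar days (rather than working days) are assumptions made for the example.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

public class CycleTime {
    // Upper end of the 3 to 5 day goal from the post.
    static final long MAX_DAYS = 5;

    // Calendar days between inception and production (assumption: weekends are counted).
    static long cycleDays(LocalDate inception, LocalDate production) {
        return ChronoUnit.DAYS.between(inception, production);
    }

    // A story that took longer than the goal is a candidate for a retrospective.
    static boolean needsRetrospective(LocalDate inception, LocalDate production) {
        return cycleDays(inception, production) > MAX_DAYS;
    }

    public static void main(String[] args) {
        LocalDate inception = LocalDate.of(2024, 3, 4); // hypothetical story start
        System.out.println(cycleDays(inception, LocalDate.of(2024, 3, 8)));           // 4
        System.out.println(needsRetrospective(inception, LocalDate.of(2024, 3, 12))); // true
    }
}
```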

In the 3 cases above, the focus is on the goal; the number is only what we think will drive the change, and it can always be adjusted.

If people don’t understand why they write unit tests, they will achieve unit test coverage without guaranteeing the ability to refactor, for example by writing fake tests that don’t have assertions. We should never decouple the metric from the reason we are measuring something.
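Here is a small sketch of what such a fake test looks like next to a real one. It deliberately avoids any test framework so it runs on its own; `priceAfterDiscount` and its bug are invented for the example.

```java
public class FakeTestDemo {
    // Hypothetical production code with a deliberate bug: the discount is added, not subtracted.
    static int priceAfterDiscount(int price, int discount) {
        return price + discount;
    }

    // Fake test: it executes the code, so a coverage tool counts the line as covered,
    // but it checks nothing and can never fail.
    static boolean fakeTestPasses() {
        priceAfterDiscount(100, 20);
        return true;
    }

    // Real test: the same call, with an assertion on the result.
    static boolean realTestPasses() {
        return priceAfterDiscount(100, 20) == 80;
    }

    public static void main(String[] args) {
        System.out.println("fake test passes: " + fakeTestPasses()); // true, bug goes unnoticed
        System.out.println("real test passes: " + realTestPasses()); // false, bug is caught
    }
}
```

Both tests produce the same coverage for `priceAfterDiscount`, but only the second one protects a refactoring.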

These are the good metrics, for me. If you want to see some of the bad ones, have a look at this article I wrote some time ago on confrontational metrics and delivery teams that don’t give a damn about their customers.

8 thoughts on “Some quality metrics that helped my teams”

  1. I agree with the main theme of your post, but I don’t think code coverage is a good metric at all. Code coverage tells us nothing about how good our tests are, only what code got executed while the tests ran. So high coverage does not directly give us any confidence that no regressions have been introduced.

    A tool that I’ve been using to help gain confidence in my tests is PIT, which is a mutation testing tool. What it does is inject defects into your application and then run your tests. If any of the tests fail then the mutant is killed (good), and if none fail then the mutant survives (bad).

    So, use coverage to tell you which parts of your codebase definitely aren’t tested, but don’t assume that coverage tells you anything about the quality of your tests.
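The killed/survived idea described above can be sketched by hand, without PIT itself. In this illustration the mutant is written manually and everything (the `isAdult` rule, the two suites, the class name) is hypothetical; it only shows why a suite with full coverage can still let a mutant survive.

```java
import java.util.function.IntPredicate;
import java.util.function.Predicate;

public class MutationDemo {
    // Code under test: is a person of this age an adult?
    static final IntPredicate ORIGINAL = age -> age >= 18;
    // Hand-written mutant: the boundary operator is changed from >= to >.
    static final IntPredicate MUTANT = age -> age > 18;

    // Weak suite: executes the rule (full coverage) but never checks the boundary.
    static boolean weakSuitePasses(IntPredicate isAdult) {
        return isAdult.test(30) && !isAdult.test(5);
    }

    // Strong suite: adds checks on both sides of the boundary.
    static boolean strongSuitePasses(IntPredicate isAdult) {
        return weakSuitePasses(isAdult) && isAdult.test(18) && !isAdult.test(17);
    }

    // A mutant is killed when a suite that passes on the original fails on the mutant.
    static boolean kills(Predicate<IntPredicate> suite) {
        return suite.test(ORIGINAL) && !suite.test(MUTANT);
    }

    public static void main(String[] args) {
        System.out.println("weak suite kills mutant: " + kills(MutationDemo::weakSuitePasses));     // false
        System.out.println("strong suite kills mutant: " + kills(MutationDemo::strongSuitePasses)); // true
    }
}
```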

    • Hi Seb, thanks for your feedback! I agree, code coverage can also be misused. We tried to reduce this risk by training our developers specifically on unit testing; moreover, our code reviews also include reviews of the unit tests. Deep inside I also trust our developers, don’t you? 🙂
      We started using mutation testing a few months back on some projects and we found it very useful, so yes I would recommend PIT as well, thanks for mentioning it.
      Actually, I just remembered, one of our guys wrote a blog on his experience with mutation testing here

  2. It is not clear from your post whether you are using any frameworks and, if so, which ones. Unless you are living in a pure Java SE environment, you’re missing one of the most important aspects: integration tests. You should be placing far more emphasis on these.

    Also enforcing a strict 95% code coverage will force developers to test getter and setter methods. In my experience these kinds of tests are not only useless but counterproductive since they hide important tests among lots of unnecessary code coverage compliance tests. It is far more important to test important functions properly, rather than have a 95% CC.

    As a rule of thumb only methods with a cyclomatic complexity from 3 upwards should be tested.

    Also, I do not understand the obsession with making builds fail if certain metrics are not adhered to.
    Your CI build should point out integration build issues and not fail because a setter has not been tested.

    • Hi Stephan, thanks for your feedback.
      In regards to your comment on integration testing, maybe I am missing something, but I cannot find anywhere in the article where I say that integration testing is not important. The main focus of this article is the metrics that helped improve software quality; the fact that I don’t mention an integration testing metric doesn’t imply that we don’t do integration testing. As I mentioned, I might be missing something. Could you help me?

      Getters and setters can easily be removed from the coverage metrics if you want; there are a couple of tools to do that quite painlessly.

      As per your rule of thumb on the minimum cyclomatic complexity that requires a unit test, do you have any reference I can check?

      Failing a build for infringing a metric might seem like an obsession to you, but believe me it is not. I haven’t been able to find any other way of enforcing the metrics when you have more than a couple of developers committing code. If you have a better one, please share it with me!

      • Regarding integration tests, you are not missing anything. I personally find them to be far more important than unit tests, so I would place more emphasis on those.

        Regarding the cyclomatic complexity of 3, I can only refer to Real World Java EE Patterns: Rethinking Best Practices by Adam Bien.

        Also I find this blog entry fairly pragmatic and sensible:

        You are right in saying the only way to enforce that rule is by making builds fail. However are the resulting tests really sensible or purely written to achieve the desired code coverage?

        I will hazard a guess, that the code coverage is good, but if you look at some of your more complex functions you may find that they are covered, but not properly tested (boundary cases, special cases, etc…).

        I haven’t tried it myself, however it may be interesting to see how TDD would work for your team and project.

        • Stephan, thanks for the reference and the link to the blog.
          Code coverage can be deceptive and that’s why I believe every developer should study unit testing and understand the benefit of it. We are also experimenting with mutation testing that can help you identify badly written unit tests.

  3. Hi Augusto,
    It’s been a while since I last read your blog. I liked your post and I have some comments which I hope you find useful.

    On point #1
    I think we all agree that well tested code goes together with software quality.
    But code coverage itself is not always a fully reliable metric. In my opinion, the important thing is the amount of testing done at unit level, but also how well it is done. Developers always need to think about which types of paths, assertions, etc. make them comfortable from the testing perspective, and if they cannot test the code that way then the code is not testable. It is a bit of a tradeoff, but in order to test you occasionally have to sacrifice design as a step toward a future evolution of your software. If you are OO developers, I am not telling you to break encapsulation; what I invite you to do is use more parameter injection and move code around until it takes a shape that is testable.
    Now, I know what some are thinking: this may not necessarily make sense if you see it from a BDD point of view. But that is not true; the power of it is that it forces you to rethink what things actually mean in your system, which is now evolving.

    On point #2
    Do loops and shit make you mad? Encourage your devs to spike and play with functional-style languages. I recently started using Java 8, and can tell you that working with streams is super easy and allows you to do very complex things. I also like Java 8 very much because it’s neither purely functional nor purely OO, so you get a bit of both and have flexibility depending on the problem you are solving. This is not surprising; I think multiparadigm languages have a lot to say, and Java 8 is now one of them.
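As a small illustration of the loop-versus-stream point, the sketch below writes the same filter-and-transform logic both ways. The class name, method names and sample data are made up for the example; only standard Java 8 stream calls are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamsDemo {
    // Imperative version: a loop with a nested condition.
    static List<String> longNamesLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Stream version: the same logic as a pipeline of filter and map steps.
    static List<String> longNamesStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Seb", "Stephan", "Augusto");
        System.out.println(longNamesLoop(names));   // [STEPHAN, AUGUSTO]
        System.out.println(longNamesStream(names)); // [STEPHAN, AUGUSTO]
    }
}
```

The stream version has no explicit branching in the method body, which is one reason this style tends to keep per-method cyclomatic complexity low.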

    I liked your post very much. I also have my own opinion on this topic, and you just inspired me to write about it; have a look at my post on quality metrics. Happy to take feedback.

