
Why code coverage matters

I'm a fan of PHPUnit code coverage reports. And with this sentence I can see a lot of developers out there shiver, because they are of the opinion that code coverage reports for unit tests are nonsense and cannot give you any hint about the quality of a test suite. I see it a bit differently. Surely, a high code coverage rate never indicates that code is well tested (if you have not written the code and tests yourself). But the other way around works: a low code coverage rate definitely means that the test suite is not sufficient. But let me dig a bit deeper into code coverage and what it gives you.

The PHPUnit code coverage report indicates how many lines of the code you are testing were executed during the test run. It shows you the figure for each directory (aggregated from the files and directories it contains) and for each file of the tested code, and gives you a color indication of whether you have a high, medium or low code coverage rate. Besides that, it shows you the source code of each of your files and indicates which lines were executed, which were not, and which are unreachable.

So, basically, you can check which lines of the code under test are covered by the unit tests and, more importantly, which are not. A covered line does not actually mean that this line is properly tested, but an uncovered line definitely means that it is not tested at all. For you as a developer, the latter fact is quite important, since it gives you an indicator for test cases you still need to create.

Creating a unit test is, all in all, not easy. To create a proper unit test you need to think about several things: First, you need to know what a specific method should be doing. That is fairly easy if you wrote (or will write) the method yourself, but can be quite hard if you didn't. Second, you need to know what the preconditions of the method are (which attributes the method accesses, what the values of these must be to achieve the desired result, which other objects must be instantiated, ...). Third, you need to know how the method should behave for different input parameters (and with "behave" I mean both success and failure).
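To make these three points a bit more concrete, here is a minimal sketch of how they typically map onto a PHPUnit test case. The Account class and its methods are invented purely for illustration, and I assume PHPUnit's test case base class (here in its PHPUnit 3 spelling) is available:

class AccountTest extends PHPUnit_Framework_TestCase
{
    public function testDepositIncreasesBalance()
    {
        // 1. Preconditions: put the object under test into a known state.
        $account = new Account( 'Toby' );

        // 2. Input: call the method with a specific parameter.
        $account->deposit( 10 );

        // 3. Expected behaviour: assert the result for exactly this input.
        $this->assertEquals( 10, $account->getBalance() );
    }
}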

Once you have collected all this information, you can start writing the test cases for your method. It may still sometimes be hard, if you have a method that performs a more complex operation, since you need to think of all possible (and in most cases you should also think about impossible) combinations your method can be called with. But I'm sure that, if you are not lazy, you can figure out some sensible combinations of preconditions, input parameters and expected results.

But how can you figure out whether you have really covered all necessary cases? Well, that is even harder than creating the test cases, because you usually think in limited dimensions and cannot think of every possible way a co-developer (or a user, if you're developing a library) may try to use your method. Anyway, at this point code coverage comes into the game. Although it cannot tell you how a possible user of your method might try to interact with it, it can give you a hint about which code has not been tested by you so far. If you see some red lines in the code coverage report, you have definitely missed some cases that you had in mind while developing the method but did not test.

Let me give a small example:

class DiceGamePlayer
{
    public $playerName;

    // ... some more attributes and methods here ...

    public function throwDice( $diceSize )
    {
        if ( !is_int( $diceSize ) && $diceSize < 2 )
        {
            throw new RuntimeException(
                "A dice with $diceSize values does not make sense! Player $this->playerName must be cheating!"
            );
        }
        return mt_rand( 1, $diceSize );
    }
}

This tiny method could be from a dice game that contains several different dice (as role-playing games usually have). It is quite small, so it should be easy to think of unit test cases. The method obviously has the attribute $playerName as its only precondition and expects an integer value of at least 2 as its parameter. The desired return value is an integer value between 1 and the given dice size. A typical test case would be to give it the number 6 and expect that it returns an integer value between 1 and 6. With that, you already have the desired functionality of the method covered. At least, you might think so.
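Translated into PHPUnit, this first test case might look roughly like the following sketch (assuming the DiceGamePlayer class above can be loaded; the base class name matches PHPUnit 3, newer versions spell it differently):

class DiceGamePlayerTest extends PHPUnit_Framework_TestCase
{
    public function testThrowDiceReturnsValueInRange()
    {
        $player = new DiceGamePlayer();
        $player->playerName = 'Toby';

        $result = $player->throwDice( 6 );

        // The result must be an integer between 1 and the given dice size.
        $this->assertTrue( is_int( $result ) );
        $this->assertTrue( $result >= 1 && $result <= 6 );
    }
}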

But a look at the code coverage will indicate that you only covered 50% of the (executable) code, because you only checked the functionality for a correct input value and did not test the method with an incorrect value. That means: more test cases are necessary to fully test the method. Since the method expects an integer value, it would be a good idea to test whether the exception is thrown if you give it a string value like "foo". Well, another look at the code coverage will indicate that now 100% of the code is tested... and this is the point where code coverage cannot help you any further.
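Added to the DiceGamePlayerTest class sketched above, such an exception test could look like this (a plain try/catch is used here, so it does not depend on any particular PHPUnit version):

public function testThrowDiceRejectsNonIntegerSize()
{
    $player = new DiceGamePlayer();
    $player->playerName = 'Toby';

    try
    {
        $player->throwDice( 'foo' );
    }
    catch ( RuntimeException $e )
    {
        // Expected: the invalid dice size was rejected.
        return;
    }
    $this->fail( 'Expected RuntimeException was not thrown.' );
}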

Although you have covered the full code now, you still have a bug in it. Another test case is necessary: what happens if you give the method the integer 1? This is an integer value but smaller than 2, so the new test case would expect the RuntimeException to be thrown. This test case would actually fail, because the checking condition is incorrect. You would need to replace the logical-and operator with a logical-or to have it working correctly.
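The missing test case and the corrected condition could look like this (again just a sketch, the test method belongs into the same test class as above):

public function testThrowDiceRejectsTooSmallSize()
{
    $player = new DiceGamePlayer();
    $player->playerName = 'Toby';

    try
    {
        // 1 is an integer, but not a sensible dice size.
        $player->throwDice( 1 );
    }
    catch ( RuntimeException $e )
    {
        // Expected behaviour.
        return;
    }
    $this->fail( 'Expected RuntimeException was not thrown.' );
}

// The corrected check in throwDice(): reject non-integers OR values smaller than 2.
if ( !is_int( $diceSize ) || $diceSize < 2 )
{
    throw new RuntimeException(
        "A dice with $diceSize values does not make sense! Player $this->playerName must be cheating!"
    );
}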

So, why do I tell you all this if code coverage cannot help you at this point? Well, it already helped you by indicating that the given portion of code was not tested. But that does not mean that you can switch off your brain at this point. You still need to think about all possible cases this part of the code should cover. Still, the code coverage report gave you a good hint that you missed testing something at all.

What is the conclusion of all this? First of all: writing sensible unit tests is not easy and requires at least a good imagination. A high code coverage rate does not mean that your test cases are good and that every possibility is tested. But a low code coverage number always means that you missed testing parts of your code. And the report itself can then give you sensible hints about which parts of the code still need explicit testing. And if you are capable of writing sensible test cases for these parts of your code and are sure (sure, you can never be!) that you already wrote sensible test cases for the covered parts, a high code coverage rate can also indicate a good test suite.

I think a much better alternative for testing the quality of your test suite is so-called "code mutation" (mutation testing), which will hopefully be developed for PHPUnit during this year's Google Summer of Code. A code mutation tool will try to change small bits of the code under test over and over again, until a test case fails. If your test suite bails out on (almost) any change in the tested code, this indicates that you have covered a lot of imaginable cases and, in that way, that you have tested well (remember that you can never be really sure!).
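To illustrate the idea, a single mutation of the (corrected) example method might look like this. This is purely illustrative and not the output of any existing tool:

// Original check in throwDice():
//     if ( !is_int( $diceSize ) || $diceSize < 2 )
//
// A mutation tool might flip the boundary and run the whole suite again:
if ( !is_int( $diceSize ) || $diceSize < 1 )
{
    throw new RuntimeException( "A dice with $diceSize values does not make sense!" );
}

// testThrowDiceRejectsTooSmallSize() from above now fails, because
// throwDice( 1 ) no longer throws the exception; the suite "kills" this mutant.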

For now I can only recommend checking your code coverage report to see where you really missed testing code, and then switching on your brain to create sensible test cases for these parts. I'm pretty sure that this alone will raise the quality of your code a lot. At least, this is what I have experienced again and again in the past when looking at my code coverage reports and creating test cases for the parts that were not executed. Rest assured that you can already clean up a lot of tiny (and possibly larger) bugs this way, and that you can ensure much better than before that you do not break backwards compatibility.

So long, happy testing! :)

Comments

I think code coverage is also nice just for motivation. You see this nice percentage go up every time you add a test. And when you add new code it goes down, so you might feel inclined to bring the number back up.

Lukas at 2007-04-12

Yeah, code mutation replaces test case know-how for some people. Sorry, but the whole article misses one important fact: cost. Real-life test suites on complex software very often seem to have very minimal code coverage, because the testing effort (in terms of money) is too great if you'd like to have 100% code coverage. We work a lot with tests; actually, my suite tells me:

26/26 test cases complete: 1050 passes, 0 fails and 3 exceptions.

We do a job here to earn money, and these tests help us to ensure we are delivering a high-quality product in an estimated amount of time, developer resources and money.

As long as we still find bugs with creativity and a little bit of G. Erwin Thaller's know-how (http://www.amazon.de/Software-Test-Verifikation-Validation-Georg-Thaller/dp/3882291982/ref=pd_bbs_5/028-5037639-0784531?ie=UTF8&s=books&qid=1176374153&sr=8-5) I don't mind a code coverage run at all. It is too much of a false friend that has nothing to do with reality. Not that I say you're 100% wrong, but you forget the facts of time-to-market and resources.

Sebs at 2007-04-12

(!is_int( $diceSize ) && $diceSize < 2)

There is an error here... A value can't be a non-integer and smaller than 2 at the same time (OK, a float can, but that should be an error too). (!is_int( $diceSize ) || $diceSize < 2) is more like it.

Philippe Gamache at 2007-04-12

I think this approach is the greatest mistake made in today's software development. If you need to develop any kind of software for a customer (no matter whether it is a completely custom program or a standard product you are distributing), the price is mostly calculated from the pure development effort. Most companies save money in the areas of planning, designing and testing.

This definitely allows you to come to market with a very low price. The major problem occurs later, when your customers complain about bugs and you have to debug them. This usually results in much larger effort than if you had done proper testing before. Writing tests in parallel to (or even before) writing the actual code is much less pain than debugging later. And for most bugs that are found by a user, you pay the fixing price yourself in terms of your warranty.

Toby at 2007-04-12

Please read my text carefully again, I already hinted at this error. It was made intentionally to show the limits of what code coverage can do for you. :)

Toby at 2007-04-12

The TDD cycle is to write a test expressing a real implementation-agnostic requirement, write just enough code to make it pass, and then move on to the next requirement. Refactor later. There would never be any code not covered by tests. It's a nice way to explore new territory: little steps, one at a time.

noel darlow at 2007-04-13

Yes, this is true TDD. Although this works in some cases, you will find a lot of cases where you need to write tests for existing software (e.g. with the rapid prototyping approach).

Toby at 2007-04-13

Rightful note. Thx!

sf at 2007-04-13

I was not clear enough: after a certain amount of experience, when I say coding I mean the whole process, including testing and debugging.

Sebs at 2007-04-13

Interesting article. Oftentimes I hear code coverage treated as an end in itself, instead of being used as a tool as you've outlined here.

One point that was missed here is the benefit of testing a legacy system. In many cases, an acceptance/functional test is more appropriate for a legacy system, as the underlying code will be refactored at some point. For this purpose you don't care whether there's 100% test coverage, as long as the required functionality is tested. That type of test does not aim for 100% coverage, but rather just enough to make sure that the requirements of function X are met.

At that point, you might have 10% test coverage simply because of cruft. Focusing on testing features that aren't necessary spends coding effort (as Sebs describes it) where it isn't really necessary.

The article does bring up a good point though. When inheriting/using an unfamiliar system that does need to be built upon, code coverage definitely does offer a useful tool for determining what needs to be tested and where.

Travis Swicegood at 2007-04-13

A good article, thanks.

My experience with code coverage is a mixed bag. I have recently been assigned to a project where the other "main" programmer has, since day one, highlighted at every meeting how high he is keeping his code coverage. However, when I started looking at the code, his code coverage was almost equivalent to the amount of simple getters and setters; none of the critical code was tested at all.

While this is no problem for me (if I didn't have to work on the project, that is), I think the real issue is that non-tech people are easily amazed by graphs and numbers, as if they were the truth about the state of the code.

Soren at 2007-04-16

I see the problem. That is why I wrote this article: code coverage is often misused and also misunderstood.

Thanks for your feedback! :)

Toby at 2007-04-16