Using Coverage to Test a Prime Number Generator Program

Summary

In this example we use a simple program implementing the Sieve of Eratosthenes algorithm that computes and outputs all prime numbers up to a number supplied by the user. We test this program by focusing on tests that achieve a high percentage of Statement, Branch, or Path coverage.

System Description

The definition of a prime number is:

"A positive integer not divisible without a remainder by any positive integer other than itself and one." (dictionary.com)

"1 and 0 are considered neither prime nor composite" (wikipedia.org)

Let's say that our specification says that the program should find all the prime numbers up to (but excluding) a positive number supplied by the user. A valid input is any number that is greater than 0 and less than the maximum range (which in this case we define as 5000).

The algorithm used in following Java program is from ccins.camosun.bc.ca.

First we make sure that the input is a positive integer less than the maximum allowed input.

We will use a boolean array as a sieve (as in "Sieve of Eratosthenes"). We go through each number from 2 until the number inputted by the user and mark all multiples of that number as composite (setting true in the corresponding position in the sieve array). We start from 2 because 0 and 1 are neither composite nor prime.

We need to cover only the numbers that are less than or equal to the square root of the number inputted by the user, because if one of the multiples of a number is greater than the square root of the result, the other multiple must be smaller and therefore was already checked.

Test Design

Let's start by thinking of test cases that will achieve a high percentage **Statement Coverage**.
The percentage of executable statements in a component that have been exercised by a test case suite (http://www.testingstandards.co.uk/living_glossary.htm).
It turns out that we can achieve 100% statement coverage with only 1 test:

test 1: **input= -5**

Since we have the condition:

The condition evaluates to true and the program uses the default value as input instead of the one supplied by the user.

It turns out that this default value will cause every statement in the program to be executed.

Does this mean that with a single test we can declare that the testing is complete? Certainly not. Here are some examples of bugs that could be missed:

- The simplest counter example would be a program that works
**only**with the default value (e. g. having the answer to the default value hard-coded in it). In this case we cannot catch the error by testing with the default value. - If the first loop was going from 2 to Math.sqrt(inputNum) instead of Math.sqrt(inputNum)
**+ 1**, we would have had an off by one error that could be caught only by an input of a perfect square number, such as 49 for example. In this case the bug would cause the loop to stop before 7, and declare 49 a prime, when it is not. But since we are using 50 as default, this bug would not have been caught.

Now let's consider test cases that achieve a high percentage of **Branch coverage**.
A test coverage criteria which requires that for each decision point each possible branch be executed at least once. Syn: decision coverage (http://www.validationstation.com/glossary/glossaryb.htm). It turns out that we can achieve 100% branch coverage with only 2 test cases:

**input= 0** and **input =1**

In this program we have two 'if' statements and 3 loops.

When input is 0, the first if statement evaluates to true, and the default value (50) is used as input. As a result the bodies of all 3 loops are executed at least once. The second 'if' statement is evaluated several times to true and several times to false.

What is left to do is force the first 'if' statement to evaluate to false, and the bodies of the 3 loops to not be executed. Input = 1 accomplishes that.

Again, the fact that we have achieved 100% Branch Coverage does not mean that we have extensively tested the code. These are some examples of errors that we could have missed:

- Again a counter example could be if the first loop was going from 2 to Math.sqrt(inputNum) instead of Math.sqrt(inputNum)
**+ 1**, we would have had an off by one error that could be caught only by an input of a perfect square number, such as 49 for example. In this case the bug would cause the loop to stop before 7, and declare 49 a prime, when it is not. - Another error that could occur is "off by one" error in the second for loop. If we had the stopping condition to be sieve.length-1, we would not output information about the last number in the range, which is fine if the last number is not a prime (since we output only prime numbers), but in cases where the last number is prime (such as input= 13), the program will not give the correct answer.

The next coverage metric that we will consider is **Path Coverage**. It is measured by the percentage of different execution paths that have been taken through the source code. If we have two independent 'if' statements then we have 4 paths - {false, false}, {true, false}, {false, true}, {true, true}. For loops we consider two possibilities- zero repetitions and more than zero repetitions. For more information see
S. C. Ntafos, A Comparison of Some Structural Testing Strategies, IEEE Transactions on Software Engineering, v.14 n.6, p.868-874, June 1988.

In our program we have two 'if' statements and three 'for' loops. We might think that this means that there are 2 possible branches for each 'for' or 'if' statement or 2^(2+3)= 32 different execution paths for the whole program but in this case many of these paths are impossible because the constraints for the input variable would be contradictory. Let's find out how many possible paths do we actually have.

For the first statement:

We could take the 'true' branch with input < 1 or input > MAX_RANGE (e.g. **input = 0**. It doesn't matter what the actual input is as long it fits these requirements, because it will be changed to the default value, which is 50 in this case), and the 'false' branch with input= 1.

Let's look what happens if we take the 'true' branch of the first 'if' statement. There should have been 2^4= 16 paths which should have been exercised by changing the input variable while still maintaining the constraint for the first 'if' statement (input < 1 or input > MAX_RANGE). In our case we can take **only a single path in the 'true' branch **because changing the input variable cannot change the execution path because we assign the default value (*inputNum= DEFAULT*).

Now let's consider what happens in the 'false' branch of the first 'if' statement. To get in this branch input has to be between 1 and MAX_RANGE. There should have been 2^4= 16 paths in this branch but some of them are not possible (e.g. the second 'for' loop is inside the first one, which limits the number of possible paths).

If the **input is 1**, the first 'if' statement will evaluate to false, and then the first and third 'for' loops will evaluate to false (the second 'for' loop and the second 'if statement will not be reached).

So far we have found all the paths, if the first 'if' statement is true, and one path if the first 'if' statement is false and the first 'for' loop is false. Let's see if there is a case where the first 'if' statement is false and the first 'for' loop is true at least once. **Input = 5** forces the first 'if' statement to evaluate to false; the first 'for' loop evaluates to true at least once; the second 'for' loop evaluates to true at least once also and there is no input that could force it to evaluate to 'false' immediately (it is either 'true' at least once or unreachable). The third 'for' statement will evaluate at least once to true, and the second 'if' statement will also evaluate at least once to true (again this 'if' statement will either be true at least once or not reachable at all).

So far we have explored all paths (except for the impossible ones) except a path that forces the first 'if' statement to be false, the first 'for' loop to be false (second 'for loop will be unreachable), and the third 'for' loop to be true at least once (which means that the second 'if' statement will take both true and false values). This can be achieved with **input = 3.**

As a summary, the following inputs exercise all reachable paths:

input = 0

input = 1

input = 3

input = 5

Again even though we achieved 100% coverage, there are bugs that could be missed:

- Again a counter example could be if the first loop was going from 2 to Math.sqrt(inputNum) instead of Math.sqrt(inputNum)
**+ 1**, we would have had an off by one error that could be caught only by an input of a perfect square number, such as 49 for example. In this case the bug would cause the loop to stop before 7, and declare 49 a prime, when it is not. - We might have a maximum range that doesn't meet user's expectations for a reasonable input range.

Results/Relevance

As we could see it was relatively easy to achieve a high percent coverage for this simple program. We usually cannot achieve 100% of a certain type of coverage on a real world application. Even if we do achieve high percent coverage (e.g. we did 100% in all of these examples), that doesn't mean that we can declare the code completely tested. As we saw there are always bugs that could have been missed. Coverage metrics could be used to get ideas about what areas of the application to test but they are not so useful when we have to decide whether we are done with testing or not.