software testing

Testing

Testing Fundamentals

In this section we first define some of the terms that are commonly used when discussing testing. We then discuss some basic issues relating to how testing can proceed, including the need for test oracles.

Error, Fault, and Failure

The term error is used in two different ways. In its first sense, it refers to the discrepancy between a computed, observed, or measured value and the true, specified, or theoretically correct value; that is, the difference between the actual output of the software and the correct output. In its second sense, noted below, it is used as a loose synonym for a defect.

Fault is a condition that causes a system to fail in performing its required function. A fault is the basic reason for software malfunction and is synonymous with the commonly used term bug. The term error is also often used to refer to defects.

Failure is the inability of a system or component to perform a required function according to its specifications. A software failure occurs if the behavior of the software is different from the specified behavior. Failures may have functional or performance causes. A failure is produced only when there is a fault in the system.

Presence of an error (in the state) implies that a failure must have occurred, and the observance of a failure implies that a fault must be present in the system. However, the presence of a fault does not imply that a failure must occur. The presence of a fault in a system only implies that the fault has a potential to cause a failure to occur.

During the testing process, only failures are observed, by which the presence of faults is deduced. That is, testing only reveals the presence of faults.

The actual faults are identified by separate activities, commonly referred to as “debugging.”

In other words, for identifying faults, after testing has revealed the presence of faults, the expensive task of debugging has to be performed.

 

Test Oracles

A test oracle is a mechanism, different from the program itself, that can be used to check the correctness of the output of the program for the test cases. Conceptually, we can consider testing a process in which the test cases are given to the test oracle and the program under testing. The output of the two is then compared to determine if the program behaved correctly for the test cases, as shown in the figure below.

Testing and Test Oracles.
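
The comparison between the program and the oracle can be sketched in C. This is only a minimal illustration: square_under_test stands in for the program under test and square_oracle for an independent oracle. Both names, and the choice of squaring as the computed function, are assumptions made for this example and do not come from the text.

```c
#include <stddef.h>

/* Hypothetical program under test: squares its input by multiplication. */
int square_under_test(int x)
{
    return x * x;
}

/* Hypothetical oracle: a mechanism different from the program itself.
   It computes the same result by repeated addition. */
int square_oracle(int x)
{
    int n = x < 0 ? -x : x;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += n;
    return sum;
}

/* Gives every test case to both the program and the oracle and returns
   the number of test cases on which their outputs differ (failures). */
size_t count_failures(const int *inputs, size_t n)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++)
        if (square_under_test(inputs[i]) != square_oracle(inputs[i]))
            failures++;
    return failures;
}
```

A nonzero return value from count_failures means a failure was observed; locating the fault behind it is still left to debugging.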

 

Ideally, we would like an automated oracle, which always gives a correct answer. However, often the oracles are human beings, who can make mistakes.

The human oracles generally use the specifications of the program to decide what the “correct” behavior of the program should be. However, the specifications themselves may contain errors, be imprecise, or contain ambiguities.

There are some systems where oracles are automatically generated from specifications of programs or modules. With such oracles, we are assured that the output of the oracle is consistent with the specifications.

 

Test Cases and Test Criteria

Ideally, we would like to determine a set of test cases such that successful execution of all of them implies that there are no errors in the program. This ideal goal cannot usually be achieved due to practical and theoretical constraints. Each test case costs money, as effort is needed to generate the test case, machine time is needed to execute the program for that test case, and more effort is needed to evaluate the results. Therefore, we would also like to minimize the number of test cases needed to detect errors. While selecting test cases, the primary objective is to ensure that if there is an error or fault in the program, it is exercised by one of the test cases.

On what basis should we include some elements of the program domain in the set of test cases and not include others? For this, a test selection criterion (or simply test criterion) can be used. For a given program P and its specifications S, a test selection criterion specifies the conditions that must be satisfied by a set of test cases T. The criterion becomes a basis for test case selection.

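There are two fundamental properties for a testing criterion: reliability and validity. A criterion is reliable if all the sets of test cases that satisfy the criterion detect the same errors. A criterion is valid if, for any error in the program, there is some set of test cases satisfying the criterion that will reveal the error.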

Getting a criterion that is reliable and valid and that can be satisfied by a manageable number of test cases is usually not possible. So, criteria are often chosen that are neither valid nor reliable, such as “90% of the statements should be executed at least once.”

Even when the criterion is specified, generating test cases to satisfy a criterion is not simple.

Generating test cases for most of the criteria cannot be automated.

For example, even for a simple criterion like “each statement of the program should be executed,” it is extremely hard to construct a set of test cases that will satisfy this criterion for a large program, even if we assume that all the statements can be executed (i.e., there is no part that is not reachable).

 

Psychology of Testing

As we have seen, devising a set of test cases that will guarantee that all errors will be detected is not feasible. Moreover, there are no formal or precise methods for selecting test cases.

The basic purpose of testing is to detect the errors that may be present in the program. Hence, one should not start testing with the intent of showing that a program works; but the intent should be to show that a program does not work.

This emphasis on proper intent of testing is not a trivial matter because test cases are designed by human beings, and human beings have a tendency to perform actions to achieve the goal they have in mind. Testing is essentially a destructive process, where the tester has to treat the program as an adversary that must be beaten by the tester by showing the presence of errors.

One of the reasons many organizations require a product to be tested by people not involved with developing the program before finally delivering it to the customer is this psychological factor.

 

Black-Box Testing

There are two basic approaches to testing: black-box and white-box. In black-box testing the structure of the program is not considered. Test cases are decided solely on the basis of the requirements or specifications of the program or module, and the internals of the module or the program are not considered for selection of test cases.

In black-box testing, the tester only knows the inputs that can be given to the system and what output the system should give. The most obvious functional testing procedure is exhaustive testing, which, as we have stated, is impractical. Another option is to generate test cases randomly, but this strategy has little chance of resulting in a set of test cases that is close to optimal.

 

i) Equivalence Class Partitioning

The next natural approach is to divide the input domain into a set of equivalence classes, so that if the program works correctly for a value then it will work correctly for all the other values in that class. An equivalence class is formed of the inputs for which the behavior of the system is specified or expected to be similar. Each group of inputs for which the behavior is expected to be different from others is considered a separate equivalence class.

For robust software, we must also consider invalid inputs. That is, we should define equivalence classes for invalid inputs also.

Equivalence classes are usually formed by considering each condition specified on an input as specifying a valid equivalence class and one or more invalid equivalence classes. For example, if an input condition specifies a range of values (say, 0 < count < Max), then form a valid equivalence class with that range and two invalid equivalence classes, one with values at or below the lower bound of the range (i.e., count <= 0) and the other with values at or above the upper bound (count >= Max).

Also, for each valid equivalence class, one or more invalid equivalence classes should be identified. Once equivalence classes are selected for each of the inputs, then the issue is to select test cases suitably. There are different ways to select the test cases. One strategy is to select each test case covering as many valid equivalence classes as it can, and one separate test case for each invalid equivalence class.
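
For the range condition 0 < count < Max used above, the three equivalence classes and one representative test value per class can be sketched as follows. The names classify_count and representative are hypothetical, and the particular representatives chosen (-1, the midpoint, and max + 1) are just one reasonable choice, not the only one.

```c
/* The three equivalence classes for the condition 0 < count < max. */
enum count_class { INVALID_LOW, VALID, INVALID_HIGH };

/* Returns the equivalence class that a given count falls into. */
enum count_class classify_count(int count, int max)
{
    if (count <= 0)
        return INVALID_LOW;
    if (count >= max)
        return INVALID_HIGH;
    return VALID;
}

/* Picks one representative input from each class.
   Assumes max >= 2 so that the valid class is not empty. */
int representative(enum count_class c, int max)
{
    switch (c) {
    case INVALID_LOW:
        return -1;
    case INVALID_HIGH:
        return max + 1;
    default:
        return max / 2;
    }
}
```

With a single input, this gives three test cases: one for the valid class and one for each invalid class.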

As an example, consider a program that takes two inputs: a string s of length up to N and an integer n. The program is to determine the top n highest occurring characters in s. The tester believes that the programmer may deal with different types of characters separately. One set of valid and invalid equivalence classes for this is shown in the table below.

With these as the equivalence classes, we have to select the test cases. A test case for this is a pair of values for s and n. With the first strategy for deciding test cases, one test case could be: s as a string of length less than N containing lower case, upper case, numbers, and special characters; and n as the number 5. This one test case covers all the valid equivalence classes (EQ1 through EQ6). Then we will have one test case each for covering IEQ1, IEQ2, and IEQ3. That is, a total of 4 test cases is needed.

 

ii) Boundary Value Analysis

It has been observed that programs that work correctly for a set of values in an equivalence class fail on some special values. These values often lie on the boundary of the equivalence class. Test cases that have values on the boundaries of equivalence classes are therefore likely to be “high-yield” test cases, and selecting such test cases is the aim of the boundary value analysis. Boundary value test cases are also called “extreme cases.”

In the case of ranges, for boundary value analysis it is useful to select the boundary elements of the range and an invalid value just beyond the two ends (for the two invalid equivalence classes). So, if the range is 0.0 <= x <= 1.0, then the test cases are 0.0 and 1.0 (valid inputs), and -0.1 and 1.1 (invalid inputs). Furthermore, we should try to form test cases that will produce an output that does not lie in the equivalence class.

If there are multiple inputs, then how should the set of test cases be formed covering the boundary values? Suppose each input variable has a defined range. Then there are 6 boundary values: the extreme ends of the range, just beyond the ends, and just before the ends. If an integer range is min to max, then the six values are min - 1, min, min + 1, max - 1, max, max + 1.

In the first strategy, we select the different boundary values for one variable, and keep the other variables at some nominal value. And we select one test case consisting of nominal values of all the variables. In this case, we will have 6n + 1 test cases. For two variables X and Y, the 13 test cases will be as shown in Figure below.
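
The first strategy can be sketched in C as a generator that produces the 6n + 1 test cases for n integer variables. This is a minimal sketch under stated assumptions: the struct range type, the MAX_VARS limit, and the use of the range midpoint as the "nominal" value are all choices made for this example.

```c
#include <stddef.h>

#define MAX_VARS 8   /* assumed upper limit on the number of variables */

struct range { int min, max; };

/* Nominal value of a variable: here, simply the midpoint of its range. */
static int nominal(struct range r)
{
    return r.min + (r.max - r.min) / 2;
}

/* Fills out[] with one all-nominal test case followed by, for each
   variable, six test cases that set that variable to a boundary value
   while keeping the others nominal. Returns the number of test cases,
   6n + 1. The caller must supply at least 6n + 1 rows and n <= MAX_VARS. */
size_t boundary_tests(const struct range *r, size_t n, int out[][MAX_VARS])
{
    size_t count = 0;

    for (size_t w = 0; w < n; w++)
        out[count][w] = nominal(r[w]);
    count++;

    for (size_t v = 0; v < n; v++) {
        int b[6] = { r[v].min - 1, r[v].min, r[v].min + 1,
                     r[v].max - 1, r[v].max, r[v].max + 1 };
        for (size_t k = 0; k < 6; k++) {
            for (size_t w = 0; w < n; w++)
                out[count][w] = nominal(r[w]);
            out[count][v] = b[k];
            count++;
        }
    }
    return count;
}
```

For two variables the function returns 13, matching the count given above.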

iii) Cause-Effect Graphing

One weakness with the equivalence class partitioning and boundary value methods is that they consider each input separately. They do not consider combinations of input circumstances that may form interesting situations that should be tested. One way to exercise combinations of different input conditions is to consider all valid combinations of the equivalence classes of input conditions. For example, if there are n different input conditions, such that any combination of the input conditions is valid, we will have 2^n test cases.

A cause is a distinct input condition, and an effect is a distinct output condition. Each condition forms a node in the cause-effect graph. The conditions should be stated such that they can be set to either true or false. For example, an input condition can be “file is empty,” which can be set to true by having an empty input file, and false by a nonempty file. Conditions are combined using the Boolean operators “and,” “or,” and “not,” which are represented in the graph by &, |, and ~.

Suppose that for a bank database there are two commands allowed:

credit   acct_number   transaction_amount

debit    acct_number   transaction_amount

 

The requirements are that if the command is credit and acct_number is valid, then the account is credited. If the command is debit, the acct_number is valid, and the transaction_amount is valid (less than the balance), then the account is debited. If the command is not valid, the account number is not valid, or the debit amount is not valid, a suitable message is generated. We can identify the following causes and effects from these requirements:

 

Causes:

c1. Command is credit

c2. Command is debit

c3. Account number is valid

c4. Transaction_amt is valid

 

Effects:

E1. Print “invalid command”

E2. Print “invalid account_number”

E3. Print “Debit amount not valid”

E4. Debit account

E5. Credit account

 

The cause-effect graph of this is shown in Figure below. In the graph, the cause-effect relationship of this example is captured. For all effects, one can easily determine the causes each effect depends on and the exact nature of the dependency. For example, according to this graph the effect E4 depends on the causes c2, c3, and c4 in a manner such that the effect E4 is enabled when all of c2, c3, and c4 are true. Similarly, the effect E2 is enabled if c3 is false.
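
The dependencies can also be written as Boolean expressions over the causes. The sketch below reads these expressions off the stated requirements rather than off the figure, so the exact form of each expression is an assumption; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Causes c1..c4 and effects e1..e5 as listed above. */
struct causes  { bool c1, c2, c3, c4; };
struct effects { bool e1, e2, e3, e4, e5; };

/* Evaluates each effect from the causes using and, or, and not. */
struct effects evaluate(struct causes c)
{
    struct effects e;
    e.e1 = !c.c1 && !c.c2;         /* command is neither credit nor debit */
    e.e2 = !c.c3;                  /* account number is not valid */
    e.e3 = c.c2 && c.c3 && !c.c4;  /* debit with an invalid amount */
    e.e4 = c.c2 && c.c3 && c.c4;   /* debit account */
    e.e5 = c.c1 && c.c3;           /* credit account */
    return e;
}
```

Each combination of cause values that enables an effect is a candidate test case, which is how the graph is used to select combinations of input conditions.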

 

iv) Pair-wise Testing

Single-mode faults can be detected by testing for different values of different parameters. So, if there are n parameters for a system, and each one of them can take m different values (or m different classes of values, each class being considered as same for purposes of testing as in equivalence class partitioning), then with each test case we can test one different value of each parameter. In other words, we can test for all the different values in m test cases.

Not all faults are single-mode, however; there are combinations of inputs that reveal the presence of faults. An example is telephone billing software that does not compute correctly for night time calling (one parameter) to a particular country (another parameter).

If there are n parameters, each with m values, then between each two parameters we have m*m pairs. The first parameter will have this many pairs with each of the remaining n - 1 parameters, the second one will have new pairs with n - 2 parameters (as its pairs with the first are already included in the first parameter's pairs), the third will have pairs with n - 3 parameters, and so on. That is, the total number of pairs is m*m*n*(n - 1)/2.

As there are n parameters, a test case is a combination of values of these parameters and will cover (n - 1) + (n - 2) + ... = n(n - 1)/2 pairs.

As an example consider a software product being developed for multiple platforms that uses the browser as its interface.

So, we have the following three parameters with their different values:

Operating System: Windows, Solaris, Linux

Memory Size: 128M, 256M, 512M

Browser: IE, Netscape, Mozilla

For discussion, we can say that the system has three parameters: A (operating system), B (memory size), and C (browser). Each of them can have three values, which we will refer to as a1, a2, a3, b1, b2, b3, and c1, c2, c3. The total number of pair-wise combinations is 9*3 = 27. The number of test cases, however, to cover all the pairs is much less. A test case consisting of values of the three parameters covers three combinations (of A-B, B-C, and A-C). Hence, in the best case, we can cover all 27 combinations by 27/3 = 9 test cases. These test cases are shown in Table 10.2, along with the pairs they cover.

 

v) State-Based Testing

There are some systems that are essentially state-less in that for the same inputs they always give the same outputs or exhibit the same behavior. There are, however, many systems whose behavior is state-based in that for identical inputs they behave differently at different times and may produce different outputs. The reason for the different behavior is the state of the system: the behavior depends not only on the inputs provided but also on the current state.

A state model for a system has four components:

  • States. Represent the impact of the past inputs to the system.
  • Transitions. Represent how the state of the system changes from one state to another in response to some events.
  • Events. Inputs to the system.
  • Actions. The outputs for the events.

 

To create a state machine model of this system, we notice that, of a series of six requests, the first 5 may be treated differently. Hence, we divide into two states: one representing the receiving of 1-4 requests (state 1), and the other representing the receiving of request 5 (state 2). Next we see that the database can be up or down, and it can go down in either of these two states. However, the behavior of requests if the database is down may be different. Hence, we create another pair of states (states 3 and 4). Once the database has failed, the first 5 requests are serviced using old data. When a request is received after receiving 5 requests, the system enters a failed state (state 5), in which it does not give any response. When the system recovers from the failed state, it must update its cache immediately, hence it goes to state 2. The state model for this system is shown in Figure below (% represents an input from the user for taking the survey).

 

 

Many coverage criteria can be defined for testing against a state model; we discuss only a few here. Suppose the set of test cases is T. Some of the criteria are:

  • All transition coverage (AT). T must ensure that every transition in the state graph is exercised.
  • All transitions pair coverage (ATP). T must execute all pairs of adjacent transitions. (An adjacent transition pair comprises two transitions: an incoming transition to a state and an outgoing transition from that state.)
  • Transition tree coverage (TT). T must execute all simple paths, where a simple path is one which starts from the start state and reaches a state that it has already visited in this path or a final state.

 

The first criterion states that during testing all transitions get fired. This will also ensure that all states are visited. If a state has two incoming transitions t1 and t2, and two outgoing transitions t3 and t4, then a set of test cases T that executes t1;t3 and t2;t4 will satisfy AT. However, to satisfy ATP, T must also ensure execution of t1;t4 and t2;t3. The transition tree coverage is named in this manner as a transition tree can be constructed from the graph and then used to identify the paths. In ATP, we are going beyond transitions, and stating that different paths in the state diagram should be exercised during testing. ATP will generally include AT.
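
The difference between AT and ATP for the state with incoming transitions t1, t2 and outgoing transitions t3, t4 can be checked mechanically. In this minimal sketch, a test case is reduced to the pair of transitions it executes through that state; the struct path type and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One executed path through the state: an incoming transition
   (1 or 2) followed by an outgoing transition (3 or 4).
   Values outside these ranges are not checked in this sketch. */
struct path { int in; int out; };

/* AT: every one of the four transitions is fired at least once. */
bool satisfies_at(const struct path *t, size_t n)
{
    bool fired[5] = { false };
    for (size_t i = 0; i < n; i++) {
        fired[t[i].in] = true;
        fired[t[i].out] = true;
    }
    return fired[1] && fired[2] && fired[3] && fired[4];
}

/* ATP: every adjacent (incoming, outgoing) pair is executed. */
bool satisfies_atp(const struct path *t, size_t n)
{
    bool pair[3][5] = { { false } };
    for (size_t i = 0; i < n; i++)
        pair[t[i].in][t[i].out] = true;
    return pair[1][3] && pair[1][4] && pair[2][3] && pair[2][4];
}
```

The set { t1;t3, t2;t4 } passes satisfies_at but not satisfies_atp; adding t1;t4 and t2;t3 satisfies both.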

For the example above, the set of test cases for AT are given below in Table 10.3. Here req() means that a request for taking the survey should be given, fail() means that the database should be failed, and recover() means that the failed database should be recovered.

 

White-Box Testing

White-box testing, on the other hand, is concerned with testing the implementation of the program. The intent of this testing is not to exercise all the different input or output conditions (although that may be a by-product) but to exercise the different programming structures and data structures used in the program.

 

i) Control Flow-Based Criteria

Let the control flow graph (or simply flow graph) of a program P be G. A node in this graph represents a block of statements that is always executed together, i.e., whenever the first statement is executed, all other statements are also executed. An edge (i, j) (from node i to node j) represents a possible transfer of control after executing the last statement of the block represented by node i to the first statement of the block represented by node j. A node corresponding to a block whose first statement is the start statement of P is called the start node of G, and a node corresponding to a block whose last statement is an exit statement is called an exit node [129]. A path is a finite sequence of nodes (n_1, n_2, ..., n_k), k > 1, such that there is an edge (n_i, n_{i+1}) for all nodes n_i in the sequence (except the last node n_k). A complete path is a path whose first node is the start node and the last node is an exit node.

Perhaps the simplest coverage criterion is statement coverage, which requires that each statement of the program be executed at least once during testing.

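This coverage criterion is not very strong, and can leave errors undetected. For example, if there is an if statement in the program without an else clause, the statement coverage criterion for this statement will be satisfied by a test case that evaluates the condition to true; no test case is needed in which the condition evaluates to false. As an example, consider the following function, which is intended to compute the absolute value of a number: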

int abs(int x)
{
    /* Deliberate fault for this example: the test should be x < 0. */
    if (x >= 0) x = 0 - x;
    return (x);
}

Suppose we execute the function with the set of test cases { x=0 } (i.e., the set has only one test case). The statement coverage criterion will be satisfied by testing with this set, but the error will not be revealed.

A more general coverage criterion is branch coverage, which requires that each edge in the control flow graph be traversed at least once during testing. In other words, branch coverage requires that each decision in the program be evaluated to true and false values at least once during testing. Testing based on branch coverage is often called branch testing.
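
The effect of moving from statement coverage to branch coverage on the absolute-value example can be sketched as follows. The function faulty_abs repeats the faulty logic under a different name (to avoid clashing with the standard library's abs), and expected_abs plays the role of an oracle; both names are hypothetical.

```c
/* The faulty function from the example: the condition should be x < 0. */
int faulty_abs(int x)
{
    if (x >= 0)
        x = 0 - x;
    return x;
}

/* Independent statement of the expected result, used as an oracle. */
int expected_abs(int x)
{
    return x < 0 ? -x : x;
}

/* Returns 1 if the test input x produces a failure, 0 otherwise. */
int reveals_fault(int x)
{
    return faulty_abs(x) != expected_abs(x);
}
```

The test set { x = 0 } executes every statement without a failure. Branch coverage additionally requires an input that makes the condition false, such as x = -5, and that input does produce a failure.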

The trouble with branch coverage comes if a decision has many conditions in it (consisting of a Boolean expression with the Boolean operators "and" and "or"). In such situations, a decision can evaluate to true and false without actually exercising all the conditions.

For example, consider a module whose decision is written as (x > 0) and (x < 200). The module is incorrect, as it is checking for x < 200 instead of 100 (perhaps a typing error made by the programmer). Suppose the module is tested with the following set of test cases: { x = 5, x = -5 }. The branch coverage criterion will be satisfied for this module by this set.

This occurs because the decision is evaluating to true and false because of the condition (x > 0). The condition (x < 200) never evaluates to false during this test, hence the error in this condition is not revealed.
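
A sketch of this situation in C follows. The module itself is not reproduced in the text, so the function below is a reconstruction from the description: it assumes the decision is (x > 0) && (x < 200) and that the intended check was (x > 0) && (x < 100). All function names are hypothetical.

```c
#include <stdbool.h>

/* Faulty check as described: uses 200 where 100 was intended. */
bool check(int x)
{
    return (x > 0) && (x < 200);
}

/* The assumed intended behavior, used here as the oracle. */
bool check_intended(int x)
{
    return (x > 0) && (x < 100);
}

/* Returns true if the test input x produces a failure. */
bool check_reveals_fault(int x)
{
    return check(x) != check_intended(x);
}
```

Neither x = 5 nor x = -5 produces a failure, even though together they drive the decision to both true and false. An input that makes the second condition matter, such as x = 150, does.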

Hence a more general coverage criterion is one that requires all possible paths in the control flow graph be executed during testing. This is called the path coverage criterion or the all-paths criterion, and the testing based on this criterion is often called path testing.

 

ii) Data Flow-Based Testing

The basic idea behind data flow-based testing is to make sure that during testing, the definitions of variables and their subsequent uses are tested.

For data flow-based criteria, a definition-use graph (def/use graph, for short) for the program is first constructed from the control flow graph of the program. A statement in a node in the flow graph representing a block of code has variable occurrences in it. A variable occurrence can be one of the following three types.

  • def represents the definition of a variable. The variable on the left-hand side of an assignment statement is the one getting defined.
  • c-use represents computational use of a variable. Any statement (e.g., read, write, an assignment) that uses the value of variables for computational purposes is said to be making c-use of the variables.
  • p-use represents predicate use. These are all the occurrences of the variables in a predicate.

 

An Example

Let us illustrate the use of some of the control flow-based and data flow-based criteria through the use of an example. Consider the following example of a simple program for computing x^y for any integer x and y [129]:

  1. scanf("%d %d", &x, &y); if (y < 0)
  2. pow = 0 - y;
  3. else pow = y;
  4. z = 1.0;
  5. while (pow != 0)
  6. { z = z * x; pow = pow - 1; }
  7. if (y < 0)
  8. z = 1.0/z;
  9. printf("%f", z);

 

The def/use graph for this program is given in the Figure below. In the graph, the line numbers given in the code segment are used to number the nodes (each line contains all the statements of that block). For each node, the def set (i.e., the set of variables defined in the block) and the c-use set (i.e., the set of variables that have a c-use in the block) are given along with the node. For each edge, if the p-use set is not empty, it is given in the graph.
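
The same logic can be written as a self-contained C function, with comments marking the variable occurrences by type. This is a sketch for illustration: wrapping the code in a function named power, passing x and y as parameters instead of reading them, and returning z instead of printing it are all changes made for this example.

```c
/* Computes x^y for integers x and y, following nodes 1 to 9 of the listing. */
double power(int x, int y)
{
    int pow;
    double z;

    if (y < 0)              /* node 1: p-use of y */
        pow = 0 - y;        /* node 2: def of pow, c-use of y */
    else
        pow = y;            /* node 3: def of pow, c-use of y */
    z = 1.0;                /* node 4: def of z */
    while (pow != 0) {      /* node 5: p-use of pow */
        z = z * x;          /* node 6: def of z, c-use of z and x */
        pow = pow - 1;      /* node 6: def of pow, c-use of pow */
    }
    if (y < 0)              /* node 7: p-use of y */
        z = 1.0 / z;        /* node 8: def of z, c-use of z */
    return z;               /* node 9: c-use of z */
}
```

Test inputs with y > 0, y < 0, and y = 0 take different paths through nodes 2, 3, 6, and 8, and so exercise different definition-use pairs. Note that x = 0 with a negative y divides by zero at node 8.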

 

 

iii) Mutation Testing

Mutation testing is another structural testing technique that differs fundamentally from the approaches discussed earlier. In hardware, testing is based on some fault models that have been developed and that model the actual faults closely. The fault models provide a set of simple faults, combination of which can model any fault in the hardware.

In mutation testing, faults of some pre-decided types are introduced in the program being tested. Testing then tries to identify those faults in the mutants.

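The underlying assumption, that test cases able to reveal these simple faults will also reveal more complex ones, is taken to hold because of the competent programmer hypothesis and the coupling effect.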

The competent programmer hypothesis says that a correct program can be constructed from an incorrect program with some minor changes in the program.

The coupling effect says that the test cases that distinguish programs with minor differences from each other are so sensitive that they will also distinguish programs with more complex differences.

In general, a mutation operator makes a small unit change in the program to produce a mutant.

Examples of mutation operators are: replace an arithmetic operator with some other arithmetic operator, change an array reference (say, from A to B), replace a constant with another constant of the same type.

As an example, consider a mutation operator that replaces an arithmetic operator with another one from the set {+, -, *, **, /}. If a program P contains the expression

a = b * (c - d)

then this particular mutation operator will produce a total of eight mutants (four by replacing '*' and four by replacing '-').

First, a set of test cases T is prepared by the tester, and P is tested by the set of test cases in T. If P fails, then T reveals some errors, and they are corrected. If P does not fail during testing by T, then it could mean that either the program P is correct or that P is not correct but T is not sensitive enough to detect the faults in P.

So, if P does not fail on T, the following steps are performed.

  1. Generate mutants for P. Suppose there are N mutants.
  2. By executing each mutant and P on each test case in T, find how many mutants can be distinguished by T. Let D be the number of mutants that are distinguished; such mutants are called dead.
  3. For each mutant that cannot be distinguished by T (called a live mutant), find out which of them are equivalent to P. That is, determine the mutants that will always produce the same output as P. Let E be the number of equivalent mutants.
  4. The mutation score is computed as D/(N - E).
  5. Add more test cases to T and continue testing until the mutation score is 1.
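
The steps above can be sketched for the expression a = b * (c - d) and its eight mutants. This is a minimal illustration under stated assumptions: integer arithmetic is used, ipow stands in for the ** operator, and test inputs are assumed to keep d nonzero, c different from d, and exponents non-negative so that no mutant divides by zero. All names are hypothetical.

```c
#include <stddef.h>

struct test_case { int b, c, d; };
typedef int (*expr_fn)(int b, int c, int d);

/* Integer power for non-negative exponents; stands in for **. */
static int ipow(int base, int exp)
{
    int result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* The original expression P. */
static int original(int b, int c, int d) { return b * (c - d); }

/* Four mutants from replacing '*'. */
static int m1(int b, int c, int d) { return b + (c - d); }
static int m2(int b, int c, int d) { return b - (c - d); }
static int m3(int b, int c, int d) { return b / (c - d); }
static int m4(int b, int c, int d) { return ipow(b, c - d); }

/* Four mutants from replacing '-'. */
static int m5(int b, int c, int d) { return b * (c + d); }
static int m6(int b, int c, int d) { return b * (c * d); }
static int m7(int b, int c, int d) { return b * (c / d); }
static int m8(int b, int c, int d) { return b * ipow(c, d); }

static const expr_fn mutants[] = { m1, m2, m3, m4, m5, m6, m7, m8 };
#define N_MUTANTS (sizeof mutants / sizeof mutants[0])

/* Returns D: the number of mutants whose output differs from the
   original on at least one test case (the dead mutants). */
size_t count_dead(const struct test_case *t, size_t n)
{
    size_t dead = 0;
    for (size_t m = 0; m < N_MUTANTS; m++)
        for (size_t i = 0; i < n; i++)
            if (mutants[m](t[i].b, t[i].c, t[i].d) !=
                original(t[i].b, t[i].c, t[i].d)) {
                dead++;
                break;
            }
    return dead;
}

/* Mutation score D/(N - E). */
double mutation_score(size_t dead, size_t total, size_t equivalent)
{
    return (double)dead / (double)(total - equivalent);
}
```

With the single test case (b, c, d) = (2, 3, 1), six of the eight mutants are dead, because b + (c - d) and b ** (c - d) also evaluate to 4. Adding the test case (3, 5, 2) distinguishes the remaining two, and with no equivalent mutants the score reaches 1.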

 

Testing Process

To validate that a change has not affected some old functionality of the system, regression testing is done. In regression testing, old test cases are executed with the expectation that the same old results will be produced. The need for regression testing places additional requirements on the testing phase: it must provide the “old” test cases and their outputs.

 

i) Levels of Testing

The basic levels are unit testing, integration testing, and system and acceptance testing. These different levels of testing attempt to detect different types of faults. The relation of the faults introduced in different phases, and the different levels of testing are shown in Figure below.

The first level of testing is called unit testing. In this, different modules are tested against the specifications produced during design for the modules. Unit testing is essentially for verification of the code produced during the coding phase, and hence the goal is to test the internal logic of the modules.

The next level of testing is often called integration testing. In this, many unit tested modules are combined into subsystems, which are then tested. The goal here is to see if the modules can be integrated properly. Hence, the emphasis is on testing interfaces between modules. This testing activity can be considered testing the design.

The next levels are system testing and acceptance testing. Here the entire software system is tested. The reference document for this process is the requirements document, and the goal is to see if the software meets its requirements.

There is another level of testing, called regression testing, that is performed when some changes are made to an existing system. We know that changes are fundamental to software; any software must undergo changes. A change is made to “upgrade” the software by adding new features and functionality. Testing also has to be done to make sure that the modification has not had any undesired side effect of making some of the earlier services faulty.

For regression testing, some test cases that have been executed on the old system are maintained, along with the output produced by the old system. These test cases are executed again on the modified system and its output compared with the earlier output to make sure that the system is working as before on these test cases.
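
A regression run of this kind can be sketched as a loop over stored test cases. The struct regression_case type, the run_regression function, and the example function twice_v2 (standing in for the modified system) are all hypothetical names used only for illustration.

```c
#include <stddef.h>

/* A maintained test case: an input and the output the old system produced. */
struct regression_case { int input; int old_output; };

/* Hypothetical modified system: a reimplementation of "double the input". */
int twice_v2(int x)
{
    return x + x;
}

/* Executes every stored test case on the modified system and returns
   the number of cases whose output differs from the recorded output. */
size_t run_regression(int (*modified)(int),
                      const struct regression_case *cases, size_t n)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++)
        if (modified(cases[i].input) != cases[i].old_output)
            mismatches++;
    return mismatches;
}
```

A return value of zero indicates that the modified system behaves as before on the maintained test cases; it says nothing about inputs outside that set.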

 

ii) Test Plan

Testing commences with a test plan and terminates with acceptance testing. A test plan is a general document for the entire project that defines the scope, approach to be taken, and the schedule of testing as well as identifies the test items for the entire testing process and the personnel responsible for the different activities of testing.

The test planning can be done well before the actual testing commences and can be done in parallel with the coding and design activities. The inputs for forming the test plan are: (1) project plan, (2) requirements document, and (3) system design document.

The project plan is needed to make sure that the test plan is consistent with the overall quality plan for the project and the testing schedule matches that of the project plan.

The requirements document and the design document are the basic documents used for selecting the test units and deciding the approaches to be used during testing.

 

A test plan should contain the following:

  • Test unit specification
  • Features to be tested
  • Approach for testing
  • Test deliverables
  • Schedule and task allocation

 

A test unit is a set of one or more modules, together with associated data, that are from a single computer program and that are the object of testing. A test unit can occur at any level and can contain from a single module to the entire system. The identification of test units establishes the different levels of testing that will be performed in the project.

The basic idea behind forming test units is to make sure that testing is being performed incrementally with each increment including only a few aspects that need to be tested.

Features to be tested include all software features and combinations of features that should be tested. A software feature is a software characteristic specified or implied by the requirements or design documents. These may include functionality, performance, design constraints, and attributes.

The approach for testing specifies the overall approach to be followed in the current project. The techniques that will be used to judge the testing effort should also be specified. This is sometimes called the testing criterion or the criterion for evaluating the set of test cases used in testing.

Testing deliverables should be specified in the test plan before the actual testing begins. Deliverables could be a list of test cases that were used, detailed results of testing including the list of defects found, test summary report, and data about the code coverage. In general, a test case specification report, test summary report, and a list of defects should always be specified as deliverables. Test case specification is discussed later. The test summary report summarizes the results of the testing activities and evaluates the results.

The schedule for testing should be consistent with the overall project schedule. For detailed planning and execution, the different tasks in the test plan should be enumerated and allocated to the test resources responsible for performing them.

 

iii)Test Case Specifications

Test case specification has to be done separately for each unit. Based on the approach specified in the test plan, first the features to be tested for this unit must be determined.

For any review, a formal document or work product is needed. This is the primary reason for having the test case specification in the form of a document. The test case specification document is reviewed, using a formal review process, to make sure that the test cases are consistent with the policy specified in the plan.

The reviewers can check if all the important conditions are being tested. As conditions can also be based on the output, by considering the expected outputs of the test cases, it can also be determined whether all the different types of outputs the unit is supposed to produce are being tested.

The overall approach stated in the plan is refined into specific test techniques that should be followed and into the criteria to be used for evaluation. Based on these, the test cases are specified for testing the unit. Test case specification gives, for each unit to be tested, all test cases, inputs to be used in the test cases, conditions being tested by the test case, and outputs expected for those test cases. Test case specifications look like a table of the form shown in the figure below.
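As a rough illustration of such a table, the sketch below records each test case as a row holding the condition being tested, the input, and the expected output. The column names, the `CaseSpec` class, and the absolute-value unit are hypothetical choices made for this example; they are not taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseSpec:
    """One row of a test case specification table (hypothetical layout)."""
    seq_no: int
    condition: str        # condition being tested by this case
    test_input: str       # input to be used in the test case
    expected_output: str  # output expected for this input


# Hypothetical unit: a routine that returns the absolute value of an integer.
SPEC = [
    CaseSpec(1, "positive input", "5", "5"),
    CaseSpec(2, "negative input", "-5", "5"),
    CaseSpec(3, "zero (boundary)", "0", "0"),
]


def render(spec):
    """Format the specification as a plain-text table for review."""
    header = f"{'No.':<4} {'Condition':<18} {'Input':<6} {'Expected':<8}"
    rows = [
        f"{c.seq_no:<4} {c.condition:<18} {c.test_input:<6} {c.expected_output:<8}"
        for c in spec
    ]
    return "\n".join([header] + rows)
```

Calling `print(render(SPEC))` produces a document-like table that reviewers can check against the conditions they expect to see covered.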

 

 

There are two basic reasons test cases are specified before they are used for testing. It is known that testing has severe limitations and the effectiveness of testing depends very heavily on the exact nature of the test cases. Even for a given criterion, the exact nature of the test cases affects the effectiveness of testing. Constructing “good” test cases that will reveal errors in programs is still a very creative activity that depends a great deal on the ingenuity of the tester.

 

iv)Test Case Execution and Analysis

The test case specifications only specify the set of test cases for the unit to be tested. However, executing the test cases may require construction of driver modules or stubs. It may also require modules to set up the environment as stated in the test plan and test case specifications. Sometimes, the steps to be performed to execute the test cases are specified in a separate document called the test procedure specification.
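A minimal sketch of a driver and a stub may make these terms concrete. Everything here is hypothetical: `compute_total` stands for the unit under test, `tax_lookup_stub` replaces a module the unit depends on but that is assumed not to be available yet, and `run_driver` is the driver that feeds inputs to the unit and compares actual with expected outputs.

```python
def compute_total(order_items, tax_lookup):
    """Unit under test (hypothetical): sum item prices, then add tax."""
    subtotal = sum(price for _name, price in order_items)
    return subtotal + tax_lookup(subtotal)


def tax_lookup_stub(subtotal):
    """Stub: stands in for the real tax module and returns a fixed value."""
    return 10


def run_driver():
    """Driver: calls the unit with each test input and records the outcome."""
    cases = [
        ([("pen", 5), ("book", 20)], 35),  # two items plus the stubbed tax
        ([], 10),                          # empty order: only the stubbed tax
    ]
    results = []
    for items, expected in cases:
        actual = compute_total(items, tax_lookup_stub)
        results.append((items, expected, actual, actual == expected))
    return results
```

The stub lets the unit be exercised before its collaborators exist, while the driver plays the role of the caller that the unit will eventually have in the full system.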

Various outputs are produced as a result of test case execution for the unit under test. These outputs are needed to evaluate whether the testing has been satisfactory. The most common outputs are the test summary report and the error report. The test summary report is meant for project management, and provides a summary of the entire test case execution.

A few metrics are very useful for monitoring testing. Testing effort is the total effort actually spent by the team in testing activities, and is an indicator of whether or not sufficient testing is being performed.

Computer time consumed during testing is another measure that can give valuable information to project management. In general, in a software development project, computer time consumption is low at the start, increases as time progresses, and reaches a peak. Thereafter it declines as the project nears completion. Maximum computer time is consumed during the latter part of coding and testing. By monitoring the computer time consumed, one can get an idea of how thorough the testing has been. Similarly, comparing the computer time consumption of the current project with the buildups seen in previous projects can provide valuable information about whether or not the testing is adequate.

The error report gives the list of all the defects found. The defects are generally also categorized into different categories.

 

v)Defect Logging and Tracking

Often the person who fixes a defect is different from the person who finds or reports the defect. In such a scenario, defect reporting and closing cannot be done informally. The use of informal mechanisms may lead to defects being found but later forgotten, resulting in defects not getting removed or in extra effort in finding the defect again.

Let us look at the life cycle of a defect. A defect can be found by anyone at any time. When a defect is found, it is logged in a defect control system, along with sufficient information about the defect. The defect is then in the state “submitted,” which simply means that it has been logged along with information about it. The job of fixing the defect is then assigned to some person, who is generally the author of the document or code in which the defect is found. The assigned person does the debugging and fixes the reported defect, and the defect then enters the “fixed” state. However, a fixed defect is still not considered done: the fix must first be verified. This verification may be done by another person (often the submitter) or by a test team, and typically involves running some tests. Once the fix is verified, the defect can be marked as “closed.” In other words, the general life cycle of a defect has three states: submitted, fixed, and closed, as shown in the figure below.
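The three-state life cycle can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Defect` class, its method names, and the rule that a defect must be assigned before it can be marked fixed are hypothetical, and a real defect control system would record far more information.

```python
from enum import Enum


class DefectState(Enum):
    SUBMITTED = "submitted"
    FIXED = "fixed"
    CLOSED = "closed"


# Transitions allowed in the basic life cycle: submitted -> fixed -> closed.
ALLOWED = {
    DefectState.SUBMITTED: {DefectState.FIXED},
    DefectState.FIXED: {DefectState.CLOSED},
    DefectState.CLOSED: set(),
}


class Defect:
    def __init__(self, defect_id, description, submitter):
        self.defect_id = defect_id
        self.description = description
        self.submitter = submitter
        self.assignee = None
        self.state = DefectState.SUBMITTED  # logging a defect submits it

    def assign(self, person):
        """Record who is responsible for fixing the defect."""
        self.assignee = person

    def _move(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state

    def mark_fixed(self):
        # Assumption for this sketch: only an assigned defect can be fixed.
        if self.assignee is None:
            raise ValueError("defect has not been assigned")
        self._move(DefectState.FIXED)

    def close(self, verified):
        """Close the defect only after the fix has been verified."""
        if not verified:
            raise ValueError("fix must be verified before closing")
        self._move(DefectState.CLOSED)
```

Encoding the allowed transitions in one table makes it straightforward to expand or contract the life cycle by adding or removing states.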

 

The life cycle can be expanded or contracted to suit the purposes of the project or the organization. Merely tracking each defect, however, is not sufficient for most projects, as analysis of defect data can also be very useful for improving quality.

Frequently defects are categorized into a few types, and the type of each defect is recorded.

The orthogonal defect classification scheme, for example, classifies defects in categories that include functional, interface, assignment, timing, documentation, and algorithm. Some of the defect types used in a commercial organization are: Logic, Standards, User Interface, Component Interface, Performance, and Documentation.
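Once the type of each defect is recorded, simple tallies become possible. The sketch below counts logged defects by type using the commercial type list quoted above; the `tally_by_type` helper and the sample defect log are hypothetical and exist only to show the idea.

```python
from collections import Counter

# Defect types taken from the commercial example in the text.
DEFECT_TYPES = (
    "Logic",
    "Standards",
    "User Interface",
    "Component Interface",
    "Performance",
    "Documentation",
)


def tally_by_type(defects):
    """Count defects per type; reject types outside the agreed list."""
    counts = Counter()
    for defect_id, defect_type in defects:
        if defect_type not in DEFECT_TYPES:
            raise ValueError(f"defect {defect_id}: unknown type {defect_type!r}")
        counts[defect_type] += 1
    return counts


# Hypothetical defect log: (defect id, defect type).
logged = [
    (101, "Logic"),
    (102, "Documentation"),
    (103, "Logic"),
    (104, "Performance"),
]
summary = tally_by_type(logged)
```

A tally like `summary` shows at a glance which kinds of defects dominate, which is the sort of analysis that plain per-defect tracking does not provide.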

The severity of a defect, in terms of its impact on the working of the system, is also often divided into a few categories. Most often a four-level classification is used. One such classification is:

  • Critical: show stopper; affects a lot of users; can delay project.
  • Major: has a large impact but a workaround exists; considerable amount of work needed to fix it, though schedule impact is less.
  • Minor: an isolated defect that manifests rarely and with little impact.
  • Cosmetic: small mistakes that don’t impact the correct working.

 

At the end of the project, ideally no open defects should remain. However, this ideal situation is often not practical for most large systems. Using severity classification, a project may have release criteria like “software can be released only if there are no critical and major bugs, and minor bugs are less than x per feature.”
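A release criterion of this kind can be checked mechanically against the list of open defects. The sketch below is one possible reading, not a prescribed rule: the `can_release` function is hypothetical, severities are plain strings, and “less than x per feature” is interpreted as the average number of open minor defects per feature being below x.

```python
from collections import Counter


def can_release(open_severities, feature_count, max_minor_per_feature):
    """Return True if the open defects satisfy the release criterion.

    open_severities: severity label of each open defect, e.g. "critical".
    Assumption: the minor-defect limit is applied as an average per feature.
    """
    counts = Counter(open_severities)
    if counts["critical"] > 0 or counts["major"] > 0:
        return False
    return counts["minor"] < max_minor_per_feature * feature_count
```

For example, `can_release(["minor", "minor"], feature_count=2, max_minor_per_feature=2)` evaluates to `True`, while a single open `"major"` defect blocks the release regardless of the other counts.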

 
