
Friday, April 29, 2022

Prerelease Errors Yield Post-Release Errors

Graph of future uncertainty.

Software with a lot of prerelease errors will also have a lot of post-release errors. This is bad news for developers, but it is fully supported by empirical data*. The more errors you find in a product, the more you will still find. The best advice is to replace any component with a poor history. Don't throw good money after bad.

If your organization keeps impeccable records, then you could use Bayes' theorem to estimate the number of errors still left in a software component by comparing it to a larger population of similar components. More than likely you don't have that kind of data to draw from, so you can use this pessimistic heuristic:

However many errors have been fixed in the component before release is how many more errors you can expect to still have.

A more optimistic and, hopefully, more accurate approach would be to apply the above heuristic to a quarter-to-quarter or month-to-month timeline. I'm suggesting basic trend projection forecasting, so you could get much more creative if you wanted to. However, in my experience, businesses don't want to spend the time and effort needed to gather and process the data for anything more complex. And there are concerns about employee satisfaction if you go down this rabbit hole, because eventually you'll start tying error rates to certain combinations of people on development teams. The slope gets slippery and the ROI diminishes. K. I. S. S.

Back to trend projection forecasting: when your software reaches the point where your measurement period has zero errors, don't declare the software free of bugs. Instead, adjust your measurement period. For example, if no errors were found last month, it is time to switch to measuring by quarter, and so on. This is useful when a stakeholder asks how many bugs are left in the software, and they will ask. You can say, "We don't expect to find any new errors next month but it's probable that we'll find X new errors over the next quarter." If you are really on top of your game, you'll track the variance of errors from period to period; then you can provide a min-max range (see the graph above).
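
To make this concrete, here's a rough Python sketch of the kind of trend projection and min-max range I'm describing. The monthly error counts and the function name are made up for illustration; any spreadsheet could do the same job.

# Rough sketch of basic trend projection with a min-max band.
# The error counts below are made up for illustration.
from statistics import mean, stdev

def forecast_next_period(errors_per_period):
    """Project the next period's error count from a simple linear trend,
    with a min-max range based on period-to-period variation."""
    n = len(errors_per_period)
    periods = range(n)
    # Least-squares slope and intercept for a straight-line trend.
    x_bar, y_bar = mean(periods), mean(errors_per_period)
    slope_num = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, errors_per_period))
    slope_den = sum((x - x_bar) ** 2 for x in periods)
    slope = slope_num / slope_den
    intercept = y_bar - slope * x_bar
    projected = max(0, intercept + slope * n)
    # Period-to-period changes give a crude spread for the min-max range.
    deltas = [b - a for a, b in zip(errors_per_period, errors_per_period[1:])]
    spread = stdev(deltas) if len(deltas) > 1 else 0
    return max(0, projected - spread), projected, projected + spread

# Example: errors found per month over the last six months.
low, expected, high = forecast_next_period([14, 11, 9, 9, 6, 4])
print(f"Next month: expect ~{expected:.0f} errors (range {low:.0f}-{high:.0f})")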


* Software Defect Removal by Robert Dunn (1984).

Tuesday, March 01, 2022

Validation and Verification


Large software developments need as many checks and balances as possible to ensure a quality product. One proven technique is the use of an organization independent of the development team to conduct verification and validation (V&V). Verification is the process of examining each intermediate product of software development to ensure that it conforms to the previous product. For example, verification ensures that software requirements meet system requirements, that high-level software design satisfies all the software requirements (and none other), that algorithms satisfy the component's external specification, that code implements the algorithms, and so on. Validation is the process of examining each intermediate product of software development to ensure that it satisfies the original requirements.

You can think of V&V as a solution to the children's game of telephone. In telephone, a group of children form a chain and a specific oral message is whispered down the line. At the end, the last child tells what he or she heard, and it is rarely the same as the initial message. Verification would cause each child to ask the previous child, "Did you say x?" Validation would cause each child to ask the first child, "Did you say x?"

On a project, V&V should be planned early. It can be documented in the quality assurance plan or it can exist in a separate V&V plan. In either case, its procedures, players, actions, and results should be approved at roughly the same time the software requirements specification is approved.


Reference:

Wallace, D. and Fujii, R., "Software Verification and Validation: An Overview," IEEE Software, May 1989.

Monday, August 23, 2021

Rotate People Through Product Assurance

 

Utah Skyline

In many organizations, people are moved into product assurance teams as a first assignment or after they have demonstrated poor performance at engineering software. Product assurance, however, requires the same level of engineering quality and discipline as designing and coding. As an alternative, rotate the best engineering talent through the product assurance team. A good guideline might be that every excellent engineer spends six months in the product assurance organization every two or three years. The expectation of all such engineers is that they will make significant improvements to product assurance during their "visit." Such a policy must clearly state that the job rotation is a reward for excellent performance.


Reference:

Mendis, K., "Personnel Requirements to Make Software Quality Assurance Work," in Handbook of Software Quality Assurance, New York: Van Nostrand Reinhold, 1987.

Sunday, May 16, 2021

Product Assurance Is Not a Luxury

Quality 100% Guarantee 

Product assurance includes:

  1. software configuration management
  2. software quality assurance
  3. verification and validation
  4. testing
Of the four, the one whose necessity is most often acknowledged, yet still under-budgeted, is testing. The other three are quite often dismissed as luxuries, as aspects of only large or expensive projects. The checks and balances these disciplines provide result in a significantly higher probability of producing a product that satisfies customer expectations and that is completed closer to schedule and cost goals. The key is to tailor the product assurance disciplines to the project in size, form, and content.


Reference:

Siegel, S., "Why We Need Checks and Balances to Assure Quality," Quality Time Column, IEEE Software, January 1992.

Friday, January 24, 2020

Brooks' Law


Measuring a project solely by person-months makes little sense. If a project could be completed in one year by six people, does that mean that 72 people could complete it in one month? Of course not!

Suppose you have 10 people working on a project that is due for completion in three months. You now believe you are three months behind schedule; that is, you estimate you need 60 more person-months (6 months x 10 people). You cannot add 10 more people and expect the project to be back on schedule. In fact, adding 10 more people would likely delay the project further due to additional training and communications overhead.
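
Brooks attributes much of this to intercommunication, which grows as n(n-1)/2 for a team of n people. A back-of-the-envelope sketch shows why doubling a team doesn't double its output; the 2 hours per week per communication path is a number I picked purely for illustration:

# Back-of-the-envelope sketch of why adding people can slow a project.
# The 2 hours/week per communication path is a made-up assumption.
def effective_person_hours(team_size, hours_per_week=40, hours_per_path=2):
    """Weekly productive hours after subtracting pairwise communication.
    Brooks notes that n people have n*(n-1)/2 communication paths."""
    paths = team_size * (team_size - 1) // 2
    productive = team_size * hours_per_week - paths * hours_per_path
    return max(0, productive)

for n in (10, 20):
    print(n, "people ->", effective_person_hours(n), "productive hours/week")
# 10 people -> 310, 20 people -> 420: doubling the team does not double output.
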
Approximately a decade ago, I worked on a big project alone. I approached it as a proof of concept and focused on getting everything to work. Since ensuring that everything worked was my goal, I didn't devote a lot of time to good object-oriented programming practices... I'm not going to apologize for that. I still think I did the right thing to ensure that the project worked. Half of the work I was doing was testing another team's API and helping them get it right. However, the project managers really screwed me over by announcing the project finished when in reality I had only finished testing the API provided by the other team. My boss put 20 people on the project to finish it in the next month. This could have been a disaster, but she did an amazing job coordinating everyone's efforts. All those people were helpful with writing documentation and performing testing while the code was being rewritten, but it really came down to one person, Charles Forsythe, who took the largely procedural code that I had written and turned it into high-quality OO code.
This project would have gone more smoothly had we intentionally planned to throw away my prototype. Also, it is generally a bad idea to try to retrofit quality. That it worked here is a testament to my prototype being high-quality code (just not in the required OO paradigm), to Charles' coding ability overcoming the difficulty of the retrofit, or both.
The point of that story is that, from the outside, it looks like 20 people were thrown at the project to finish it in a month. In reality, one person finished the project in a month while 19 people cleared the way and focused on tasks that multiplied that one person's impact. By doing it this way, my manager avoided the additional training and communications overhead. If they had all tried to develop code for the project, it would have failed due to Brooks' Law. This isn't a guaranteed way around Brooks' Law, but it is a good way to reframe development problems.


Reference:
Brooks, F., The Mythical Man-Month, Reading, MA: Addison-Wesley, 1975.

Saturday, December 07, 2019

Analyze Causes For Errors

Saturday Night Live "fix it" skit.

Errors are common in software. We spend enormous amounts of resources detecting and fixing them. It is far more cost-effective to reduce their impact by preventing them from occurring in the first place. One way to do this is to analyze the causes of errors as they are detected. The causes are broadcast to all developers, the idea being that we are less apt to make an error of the same type as one whose cause was thoroughly analyzed and learned from.

When an error is detected, there are two things to do: 1) analyze its cause and 2) fix it. Record everything you can about the cause of the error. This is not just technical issues like, "I should have checked the passed parameter for validity before using it" or "I should have found out if I needed to execute the loop n or n-1 times before I gave it to integration testing." It is also management issues like, "I should have desk-checked before unit testing" or "If I had let Ellen check the design to see if it satisfied all the requirements when she wanted to, ..." After collecting all this information, broadcast it, letting everybody know what caused the errors, so that such knowledge can become more widespread and such errors can become less widespread.
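
The record doesn't have to be fancy. Here's a minimal sketch of the kind of information I'd capture for each error; the field names and categories are only suggestions.

# Minimal sketch of an error-cause record worth broadcasting to the team.
# Field names and categories are only suggestions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ErrorCauseRecord:
    error_id: str                 # tracker ID for the fix itself
    summary: str                  # what went wrong, in one sentence
    technical_cause: str          # e.g. "did not validate the passed parameter"
    process_cause: str            # e.g. "skipped desk-check before unit testing"
    phase_introduced: str         # requirements, design, code, ...
    phase_detected: str           # unit test, integration test, field, ...
    prevention_ideas: List[str] = field(default_factory=list)

record = ErrorCauseRecord(
    error_id="BUG-1042",
    summary="Loop executed n-1 times instead of n",
    technical_cause="Off-by-one in loop bound; boundary case never checked",
    process_cause="No desk-check of boundary conditions before unit testing",
    phase_introduced="code",
    phase_detected="integration test",
    prevention_ideas=["Add boundary cases to the unit-test checklist"],
)
print(record.summary)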


Reference:
Kajihara, J., Amamiya, G., Saya, T., "Learning from Bugs," IEEE Software, Sept 1993.

Tuesday, October 22, 2019

Don't Integrate Before Unit Testing

Under normal circumstances, components are separately unit-tested. As they pass their unit tests, a separate team integrates them into meaningful sets to exercise their interfaces. Components that have not been separately unit-tested are often integrated into the subsystem in a vain attempt to recapture a lost schedule. Such attempts actually cause more schedule delays. This is because a failure of a subsystem to satisfy an integration test plan may now be caused either by a fault in the interface or by a fault in the previously untested component, and much time is spent trying to determine which is the cause.

If you are managing a project, you can do a variety of things to avoid this situation. First and foremost is to develop an integration test plan early (for example, very soon after high-level design is complete). This plan should specify which components are most important to integrate first and in what order components may be integrated. Once you have written this down, allocate appropriate resources to coding and unit testing of specific high-priority components to ensure that integration testers don't spend an inordinate amount of time idle. Second, as it becomes evident that important components for integration testing are going to be unavailable as needed, have the integration testers start developing temporary scaffolding software to simulate the missing components.
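
That scaffolding can be very simple. Here's a rough sketch of a stub standing in for a missing component during integration testing; the PricingService interface and canned values are hypothetical.

# Rough sketch of temporary scaffolding for a component that isn't ready.
# The PricingService interface and canned values are hypothetical.
class PricingServiceStub:
    """Stands in for the real pricing component during integration testing.
    Returns canned, predictable answers so interface faults in the rest of
    the subsystem can be isolated from faults in the unfinished component."""

    def __init__(self, canned_prices=None):
        self.canned_prices = canned_prices or {"WIDGET": 9.99}
        self.calls = []  # record calls so testers can verify the interface

    def price_for(self, sku):
        self.calls.append(sku)
        if sku not in self.canned_prices:
            raise KeyError(f"stub has no canned price for {sku!r}")
        return self.canned_prices[sku]

# Integration tests wire the stub in wherever the real component would go.
stub = PricingServiceStub()
assert stub.price_for("WIDGET") == 9.99
assert stub.calls == ["WIDGET"]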


Reference:
Dunn, R., Software Defect Removal, New York: McGraw-Hill, 1984.

Friday, September 27, 2019

Achieve Effective Test Coverage

In spite of the fact that testing cannot prove correctness, it is still important to do a thorough job of testing. Metrics exist to determine how thoroughly the code was exercised during test plan generation or test execution. These metrics are easy to use, and tools exist to monitor the coverage level of tests. Some examples include:
  1. Statement coverage, which measures the percentage of statements that have been executed at least once.
  2. Branch coverage, which measures the percentage of branches in a program that have been executed.
  3. Path coverage, which measures how well the possible paths have been exercised.
Just remember: although "effective" coverage is better than no coverage at all, do not fool yourself into thinking that the program is "correct" by any definition (see Testing Exposes Presence of Flaws).
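
A tiny, invented example shows why the metrics differ and why none of them implies correctness:

# Invented example: one test can give 100% statement coverage while
# leaving a branch unexercised.
def apply_discount(total, is_member):
    discount = 0
    if is_member:
        discount = total // 10  # 10% member discount (integer math for the example)
    return total - discount

# This single test executes every statement, so statement coverage is 100%...
assert apply_discount(100, True) == 90

# ...but the False branch of the `if` was never taken. Branch coverage
# would demand a second test like this one:
assert apply_discount(100, False) == 100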


Reference:
Weiser, M., Gannon, J., and McMullin, P., "Comparison of Structural Test Coverage Metrics," IEEE Software, March 1985.

Thursday, September 26, 2019

Use Effective Test Completion Measures

Graph showing the progress from an imperfect product to a perfect product according to the level of effort.

Many projects proclaim the end of testing when they run out of time. This may make political sense, but it is irresponsible. During test planning, define a measure that can be used to determine when testing should be completed. If you have not met your goal when time runs out, you can still make the choice of whether to ship the product or slip the milestone, but at least you know whether you are shipping a quality product.

Two ideas for this effective measurement of test progress are:

  1. Rate of new error detections per week.
  2. After covertly seeding the software with known bugs (called bebugging), the percentage of these seeded bugs thus far found.
An ineffective measure of test progress is the percentage of test cases correctly passed (unless, of course, you know that the test cases superbly cover the requirements).
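
For the bebugging measure, the usual back-of-the-envelope estimate assumes testing finds seeded and real errors at about the same rate. A sketch, with made-up counts:

# Sketch of the classic defect-seeding estimate; counts are made up.
def estimate_remaining_errors(seeded_total, seeded_found, real_found):
    """If testing has found the same fraction of real errors as it has of
    seeded errors, the total real-error count is roughly
    real_found * seeded_total / seeded_found."""
    if seeded_found == 0:
        raise ValueError("no seeded bugs found yet; estimate not meaningful")
    estimated_total_real = real_found * seeded_total / seeded_found
    return estimated_total_real - real_found

# Example: 20 bugs seeded, 15 of them found, 45 real errors found so far.
print(estimate_remaining_errors(20, 15, 45))  # ~15 real errors still lurking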


Reference:
Dunn, R., Software Defect Removal, New York: McGraw-Hill, 1984.

Monday, September 23, 2019

Use McCabe Complexity Measure

A confused man asking what cyclomatic complexity means.

Although many metrics are available to report the inherent complexity of software, none is as intuitive and as easy to use as Tom McCabe's cyclomatic number measure of testing complexity. While it is not absolutely foolproof, it results in fairly consistent predictions of testing difficulty. Simply draw a graph of your program, in which nodes correspond to sequences of instructions and arcs correspond to non-sequential flow of control. McCabe's metric is simply e - n + 2p, where e is the number of arcs, n is the number of nodes, and p is the number of independent graphs you are examining. This is so simple that there is really no excuse not to use it.
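
Here is the arithmetic as a trivial sketch; the edge and node counts correspond to a small made-up routine with one if/else and one loop.

# McCabe's cyclomatic number: v(G) = e - n + 2p.
def cyclomatic_complexity(edges, nodes, components=1):
    """edges = arcs in the flow graph, nodes = sequences of instructions,
    components = number of independent graphs being examined."""
    return edges - nodes + 2 * components

# Made-up example: a routine with one if/else and one loop has a flow
# graph of 9 arcs and 8 nodes, giving v(G) = 3.
print(cyclomatic_complexity(edges=9, nodes=8))  # 3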

Use McCabe on each module to help assess unit testing complexity. Also, use it at the integration testing level where each procedure is a node and each invocation path is an arc to help assess integration testing complexity.


Reference:
McCabe, T., "A Complexity Measure," IEEE Transactions on Software Engineering, Dec 1976.

Friday, September 20, 2019

The Big Bang Theory Does Not Apply

Big Bang

As a project nears its delivery deadline and the software is not ready, desperation often takes over. Suppose the schedule called for two months of unit testing, two months of integration testing, and two months of software system testing. It is now one month from the scheduled delivery. Suppose 50% of the components have been unit tested. A back-of-the-envelope calculation indicates that you are five months behind schedule. You have two choices:

  1. Admit the five-month delay to your customer: Ask for a postponement.
  2. Put all the components together (including the 50% not yet unit tested) and hope for the best.
In the first case, you are admitting defeat, perhaps prematurely. In the eyes of your managers, you might be giving up before you've done everything in your power to overcome the problem. In the second case, there might be a chance that, when you put it all together, it will work and you'll be back on schedule. Project managers often succumb to the latter because it looks like they are trying everything before admitting defeat. Unfortunately, this will probably add six more months to your schedule since you'll be trying to retrofit quality. You cannot save time by omitting unit and integration testing.


Reference:
Weinberg, G., Quality Software Management, Volume 1: Systems Thinking, New York: Dorset House, 1992.

Thursday, September 19, 2019

Always Stress Test

Stress Test

Software often behaves just fine when confronted with "normal" loads of inputs or stimuli. The true test of software is whether it can stay operational when faced with severe loads. These severe loads are often stated in the requirements as "maximum of x simultaneous widgets" or "maximum of x new widget arrivals per hour."

If the requirements state that the software shall handle up to x widgets per hour, you must verify that the software can do this. In fact, not only should you test that it handles x widgets, you should also subject the software to x+1 or x+2 (or more) widgets to see what happens and to determine at what point the software's behavior stops being "acceptable." After all, the system may not be able to control its environment, and you do not want the software to crash when the environment "misbehaves" in an unexpected manner.
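
A stress test can be as simple as pushing just past the stated limit and checking that the software degrades gracefully. In this sketch, handle_widgets, OverloadError, and the limit are all placeholders for your real system and requirement.

# Rough sketch of a stress test at, and just past, the stated limit.
# handle_widgets and MAX_WIDGETS_PER_HOUR are hypothetical placeholders.
MAX_WIDGETS_PER_HOUR = 1000

class OverloadError(Exception):
    """Raised when the (hypothetical) system sheds load instead of crashing."""

def handle_widgets(count):
    # Placeholder for the real system under test.
    if count > MAX_WIDGETS_PER_HOUR:
        raise OverloadError("load shed")
    return count

def test_at_and_beyond_limit():
    # The stated maximum must be handled.
    assert handle_widgets(MAX_WIDGETS_PER_HOUR) == MAX_WIDGETS_PER_HOUR
    # Past the maximum, refusing work is acceptable; an uncontrolled crash is not.
    for overload in (MAX_WIDGETS_PER_HOUR + 1, MAX_WIDGETS_PER_HOUR + 2):
        try:
            handle_widgets(overload)
        except OverloadError:
            pass  # controlled, documented behavior

test_at_and_beyond_limit()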


Reference:
Myers, G., The Art of Software Testing, New York: John Wiley & Sons, 1979.

Wednesday, September 18, 2019

Test Invalid Input

Invalid Input

It is natural and common to produce test cases for as many acceptable input scenarios as possible. What is equally important -- but also uncommon -- is to produce an extensive set of test cases for all invalid or unexpected input.

For a simple example, let us say we are writing a program to sort lists of integers in the range of 0 to 100. Test lists should include some negative numbers, some nonintegral numbers, some alphabetic data, some null entries, and so on.
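
Continuing that example, a minimal sketch of invalid-input tests might look like this; the sort_scores routine here is just a stand-in so the example runs.

# Sketch: a sort routine specified for lists of integers in 0..100, plus
# tests that feed it invalid input on purpose. The implementation is a
# stand-in so the example is runnable.
def sort_scores(values):
    for v in values:
        if not isinstance(v, int) or not 0 <= v <= 100:
            raise ValueError(f"invalid score: {v!r}")
    return sorted(values)

invalid_inputs = [
    [5, -3, 7],        # negative number
    [5, 3.5, 7],       # nonintegral number
    [5, "seven", 7],   # alphabetic data
    [5, None, 7],      # null entry
    [5, 101, 7],       # outside the stated range
]

for bad in invalid_inputs:
    try:
        sort_scores(bad)
    except ValueError:
        pass  # rejected as expected
    else:
        raise AssertionError(f"accepted invalid input: {bad!r}")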


Reference:
Myers, G., The Art of Software Testing, New York: John Wiley & Sons, 1979.

Thursday, August 22, 2019

A Test Case Includes Expected Results

Documentation for a test case must include a detailed description of the expected correct results. If these are omitted, there is no way for the tester to determine whether the software succeeded or failed. Furthermore, a tester may assess an incorrect result as correct because there is always a subconscious desire to see a correct result. Even worse, a tester may assess a correct result as incorrect, causing a flurry of designer and programmer activity to "repair" the correct code.

Develop an organization standard for test plans that demands the documentation of expected intermediate and final test case results. Your quality assurance organization should confirm that all test plans conform to the standard.
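
In its simplest form, this just means every test case carries its expected result alongside its input. A sketch, with a hypothetical shipping_cost_cents as the function under test:

# Sketch: each test case pairs an input with its expected result, so the
# tester never has to guess whether the output is correct.
# shipping_cost_cents is a hypothetical function under test.
def shipping_cost_cents(weight_kg):
    # Flat 500 cents up to 1 kg, plus 250 cents for each additional kg.
    return 500 if weight_kg <= 1 else 500 + 250 * (weight_kg - 1)

test_cases = [
    # (input weight in kg, expected cost in cents)
    (1, 500),
    (2, 750),
    (4, 1250),
]

for weight, expected in test_cases:
    actual = shipping_cost_cents(weight)
    assert actual == expected, f"{weight} kg: expected {expected}, got {actual}"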


Reference:
Myers, G., The Art of Software Testing, New York: John Wiley & Sons, 1979.

Monday, August 19, 2019

Use Black-box And White-box Testing

A black and a white box representing the need to perform black-box and white-box testing. (Jerry Yoakum)

Black-box testing uses the specification of a component's external behavior as its only input. It is mandatory for determining whether the software does what it is supposed to do and doesn't do what it is not supposed to do. White-box testing uses the code itself to generate test cases. Thus white-box testing might demand, for example, that a certain level of code coverage is obtained. Be aware, however, that even with both black-box and white-box testing, testing can make use of only a small subset of possible data values from the input domain.

To demonstrate how black-box and white-box testing complement each other, let's look at an example. Say a procedure's specification states that it should print the sum of all numbers in an input list. As programmed, however, it looks for an input of 213 and, if it finds it, sets the sum equal to zero. Since that was not in the specification, there is no way to find the error by black-box testing except by accident. White-box testing would demand that paths be more adequately tested, and thus would probably detect the "213" situation. By combining black-box and white-box testing, you maximize the effectiveness of testing. Neither one by itself does a thorough job.
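
Here's my invented version of that example in code (it isn't taken from the reference), showing what black-box tests would likely miss and white-box coverage would catch.

# Invented version of the "213" example: the spec says "return the sum of
# the numbers in the list," but the code hides an extra, unspecified branch.
def sum_list(numbers):
    total = 0
    for n in numbers:
        if n == 213:        # nothing in the specification mentions 213
            return 0
        total += n
    return total

# Black-box tests come only from the spec, so they are unlikely to try 213:
assert sum_list([1, 2, 3]) == 6
assert sum_list([]) == 0

# White-box testing demands that the `n == 213` branch be exercised,
# which exposes the discrepancy with the specification:
print(sum_list([213, 5]))  # 0, but the spec says it should be 218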


Reference:
Dunn, R., Software Defect Removal, New York: McGraw-Hill, 1984.

Thursday, August 15, 2019

Track Errors To Find More Errors

Conservative estimates indicate that, in large systems, approximately half of all software errors are found in 15% of the modules, and 80% of all software errors are found in 50% of the modules. More dramatic results from Gary Okimoto and Gerald Weinberg indicate that 80% of all errors were found in just 2% of the modules. Thus, when testing software, you might consider that, where you find errors, you will probably find more.

Maintain logs not only of how many errors are found per time period for the project, but also of how many errors are found per module. When history shows a module to be highly error-prone, you are probably better off rewriting it from scratch, with an emphasis on simplicity rather than cleverness.
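
The per-module log can be trivial. A sketch, with made-up module names and counts:

# Sketch of a per-module error tally; module names and entries are made up.
from collections import Counter

# One entry per error found, tagged with the module it was traced to.
error_log = ["parser", "parser", "scheduler", "parser", "ui",
             "parser", "scheduler", "parser"]

errors_per_module = Counter(error_log)
for module, count in errors_per_module.most_common():
    print(f"{module}: {count} errors")
# "parser" dominates the tally, so that is where to look for more errors --
# and it is a candidate for a simple rewrite rather than more patching.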


References:
Okimoto, G. and Weinberg, G., Quality Software Management, Vol. 1: Systems Thinking, New York: Dorset House, 1992.

Endres, A., "An Analysis of Errors and Their Causes in System Programs," IEEE Transactions on Software Engineering, June 1975.

Tuesday, August 13, 2019

A Successful Test Finds An Error

I have often heard a programmer or tester gleefully declare, "Great news! My test was successful. The program ran correctly." This is the wrong attitude to have when running a test. It also supports the position that programmers should never be the sole testers of their own software. A more constructive attitude is that one is testing to find errors. Thus, a successful test is one that detects an error. Look at the analogous situation with a medical test. Suppose you are feeling ill. The doctor sends a sample of your blood to a laboratory. A few days later, the doctor calls to tell you, "Great news! Your blood was normal." That is not great news. You are sick or you wouldn't have gone to the doctor. A successful blood test reports what's wrong with you. The software has [the potential for] bugs. A successful test reports how these bugs manifest themselves. When generating test plans, you should first select tests based on the likelihood that they will find faults. Only after those tests are written and run should you write tests for the sake of code coverage.


Reference:
Goodenough, J., and Gerhart, S., "Toward a Theory of Test Data Selection," IEEE Transactions on Software Engineering, June 1975.