Saturday, December 07, 2019

Analyze Causes For Errors

Saturday Night Live "fix it" skit.

Errors are common in software. We spend enormous amounts of resources detecting and fixing them. It is far more cost-effective to reduce their impact by preventing them from occurring in the first place. One way to do this is to analyze the causes for errors and they are detected. The causes are broadcast to all deverlers with the idea being that we are less apt to make an error of the same type as one whose cause was thoroughly analyzed and learned from.

When an error is detected there are two things to do: 1) Analyze its cause and 2) fix it. Record everything you can about the cause of the error. This is not just technical issues like, "I should have checked the passed parameter for validity before using it" or "I should have found out if I needed to execute the loop n or n-1 times before I gave it to integration testing." It is also management issues like, "I should have desk-checked before unit testing" or "If I had let Ellen check the design to see if it satisfied all the requirements when she wanted to, ..." After collecting all this information, broadcast it, letting everybody know what caused the errors, so that such knowledge can become more widespread and such errors can become less widespread.


Reference:
Kajihara, J., Amamiya, G., Saya, T., "Learning from Bugs," IEEE Software, Sept 1993.