Saturday, December 07, 2019

Analyze Causes For Errors

Saturday Night Live "fix it" skit.

Errors are common in software. We spend enormous amounts of resources detecting and fixing them. It is far more cost-effective to prevent them from occurring in the first place. One way to do this is to analyze the causes of errors as they are detected. The causes are broadcast to all developers, the idea being that we are less apt to make an error of the same type as one whose cause was thoroughly analyzed and learned from.

When an error is detected there are two things to do: 1) analyze its cause and 2) fix it. Record everything you can about the cause of the error. This includes not just technical issues like, "I should have checked the passed parameter for validity before using it" or "I should have found out if I needed to execute the loop n or n-1 times before I gave it to integration testing." It also includes management issues like, "I should have desk-checked before unit testing" or "If I had let Ellen check the design to see if it satisfied all the requirements when she wanted to, ..." After collecting all this information, broadcast it, letting everybody know what caused the errors, so that such knowledge can become more widespread and such errors can become less widespread.
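As a rough illustration of recording causes so they can be broadcast, here is a minimal Python sketch. The `ErrorCause` record, its fields, and both helper functions are hypothetical names invented for this example; the sample entries paraphrase the causes quoted above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorCause:
    """One analyzed error: what went wrong and why (all fields hypothetical)."""
    summary: str    # short description of the error
    category: str   # e.g. "technical" or "management"
    lesson: str     # what should have been done differently


def causes_by_category(records):
    """Count recorded causes per category, most common first."""
    return Counter(r.category for r in records).most_common()


def broadcast_digest(records):
    """Format the lessons as plain text for sharing with all developers."""
    return "\n".join(f"[{r.category}] {r.summary}: {r.lesson}" for r in records)


records = [
    ErrorCause("invalid parameter used", "technical",
               "check the passed parameter for validity before using it"),
    ErrorCause("loop ran the wrong number of times", "technical",
               "confirm n versus n-1 iterations before integration testing"),
    ErrorCause("defect reached unit testing", "management",
               "desk-check before unit testing"),
]
```

Calling `broadcast_digest(records)` yields one line per lesson, and `causes_by_category(records)` shows which kinds of causes recur most often.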


Reference:
Kajihara, J., Amamiya, G., Saya, T., "Learning from Bugs," IEEE Software, Sept 1993.

Friday, December 06, 2019

Instrument Your Software

New Relic is just one of many monitoring systems that can instrument your software.

When testing software, it is often difficult to determine why the software failed. One way of uncovering the reasons is to instrument your software, that is, embed special instructions in the software that report traces, anomalous conditions, procedure calls, and the like. Of course, if your debugging system provides these capabilities, don't instrument manually.
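For the manual case, a minimal Python sketch of such instrumentation might look like the following. It uses the standard `logging` and `functools` modules; the `traced` decorator and the `safe_ratio` function are hypothetical names chosen for illustration, not part of any monitoring product.

```python
import functools
import logging

logger = logging.getLogger("instrumentation")


def traced(func):
    """Log each call, its result, and any exception raised by func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("call %s args=%r kwargs=%r", func.__name__, args, kwargs)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Anomalous condition: record the traceback, then re-raise unchanged.
            logger.exception("anomaly in %s", func.__name__)
            raise
        logger.debug("return %s -> %r", func.__name__, result)
        return result
    return wrapper


@traced
def safe_ratio(numerator, denominator):
    """Hypothetical function under test."""
    return numerator / denominator
```

Calling `logging.basicConfig(level=logging.DEBUG)` before running the tests makes the call and return traces visible; without it, only the anomaly reports are printed.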

Reference:
Huang, J., "Program Instrumentation and Software Testing," IEEE Computer, April 1978.

Monday, December 02, 2019

Rust by Jonathan Waldman

Rust: The Longest War by Jonathan Waldman
My rating: 5 of 5 stars

Rust (corrosion) is a constant companion to all life on Earth. It seems to happen so slowly that we fool ourselves into thinking that we can deal with it later. This book covers corrosion across the range of human experience. It is interesting because of how tremendous an impact corrosion has on modern life. A lot of the book is detailed and filled with facts, which slowed my reading down as I tried to absorb and remember the information.


Sunday, October 27, 2019

Who Goes There

Who Goes There? by John W. Campbell Jr.
My rating: 5 of 5 stars

An excellent short book for October as Halloween quickly approaches. Much of the language describing the shapeshifting alien reminds me so strongly of how Changelings were described and detected in Star Trek: Deep Space Nine that I suspect those writers were also fans.


Tuesday, October 22, 2019

Don't Integrate Before Unit Testing

Under normal circumstances, components are separately unit-tested. As they pass their unit tests, a separate team integrates them into meaningful sets to exercise their interfaces. Components that have not been separately unit-tested are often integrated into the subsystem in a vain attempt to recapture a lost schedule. Such attempts actually cause more schedule delays. This is because a failure of a subsystem to satisfy an integration test plan may now be caused either by a fault in the interface or by a fault in the previously untested component. Much time is then spent trying to determine which is the cause.

If you are managing a project, you can do a variety of things to avoid this situation. First and foremost is to develop an integration test plan early (for example, very soon after high-level design is complete). This plan should specify which components are most important to integrate first and in what order components may be integrated. Once you have written this down, allocate appropriate resources to coding and unit testing of specific high-priority components to ensure that integration testers don't spend an inordinate amount of time idle. Second, as it becomes evident that important components for integration testing are going to be unavailable as needed, have the integration testers start developing temporary scaffolding software to simulate the missing components.
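The temporary scaffolding mentioned above can be as simple as a stub that returns canned answers. Here is a minimal Python sketch; `InventoryStub`, `OrderService`, and their methods are hypothetical names invented for this example, standing in for a missing component and an already unit-tested one.

```python
class InventoryStub:
    """Temporary scaffolding standing in for an unfinished inventory component."""

    def __init__(self, canned_stock):
        self._stock = dict(canned_stock)  # fixed answers supplied by the tester
        self.calls = []                   # record of calls for interface checks

    def units_available(self, item):
        self.calls.append(("units_available", item))
        return self._stock.get(item, 0)


class OrderService:
    """Already unit-tested component whose interface is being exercised."""

    def __init__(self, inventory):
        self._inventory = inventory

    def can_fulfil(self, item, quantity):
        return self._inventory.units_available(item) >= quantity
```

Because the stub's behavior is trivially predictable, a failed integration test here points at the interface or at `OrderService`, not at an untested inventory component. When the real component passes its unit tests, it replaces the stub.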


Reference:
Dunn, R., Software Defect Removal, New York: McGraw-Hill, 1984.

Monday, October 14, 2019

Angle of Attack

Cover of the book "Flying Beyond the Stall"

I've been reading Flying Beyond the Stall: The X-31 and the Advent of Supermaneuverability by Douglas A. Joyce. What struck me very quickly was that the issue that caused the X-31 to crash in 1995 was incorrect Angle of Attack (AoA) sensor data causing the software to override the pilot and crash the plane. That is the same thing that happened to the Boeing 737 Max planes that crashed.

Maybe it is because I'm getting older, but I'm getting more and more concerned about how the tech industry forces older engineers out and tries to replace them with entry-level programmers. This isn't a natural matter of passing the baton. It is an attempt to replace highly skilled and experienced people with minimally skilled, inexperienced people.

So many of the software issues that we fight today have already been solved. If we just took the time to learn from the past, we could avoid repeating these mistakes.

Head-on view of a Boeing 737 Max aircraft with the Angle of Attack (AoA) sensor highlighted.

Sunday, October 06, 2019

Spaceman

Spaceman: An Astronaut's Unlikely Journey to Unlock the Secrets of the Universe by Mike Massimino
My rating: 5 of 5 stars

Wow! Who hasn't dreamed of spaceflight?! I loved the humanity that Massimino brings to the subject. This is a story that is told from his perspective but goes through many subjects that many people either relate to or want to know more about - childhood dreams, college, work, NASA, flight, spaceflight, parenthood, etc. It is wonderful to see those things through Massimino's eyes. Space exploration is awesome, and it needs people like Massimino to share the story; to bring us all along.


Wednesday, October 02, 2019

Starman Jones

Starman Jones (Heinlein's Juveniles, #7) by Robert A. Heinlein
My rating: 5 of 5 stars

I loved the concern that Max had for his library book. That really hooked me into the book. After that it was a fast, fun ride.


Monday, September 30, 2019

SuperFreakonomics

SuperFreakonomics: Global Cooling, Patriotic Prostitutes And Why Suicide Bombers Should Buy Life Insurance by Steven D. Levitt
My rating: 5 of 5 stars

Second time reading and still worth the read. It can be a little disheartening to know that some (many / maybe most) of the things discussed didn't make it into the mainstream in the past 10 years. That is a lesson in itself, and the concept is supported by a few of the stories in SuperFreakonomics.


Friday, September 27, 2019

Achieve Effective Test Coverage

In spite of the fact that testing cannot prove correctness, it is still important to do a thorough job of testing. Metrics exist to determine how thoroughly the code was exercised during test plan generation or test execution. These metrics are easy to use, and tools exist to monitor the coverage level of tests. Some examples include:
  1. Statement coverage, which measures the percentage of statements that have been executed at least once.
  2. Branch coverage, which measures the percentage of branches in a program that have been executed.
  3. Path coverage, which measures how well the possible paths have been exercised.
Just remember that, although "effective" coverage is better than no coverage at all, do not fool yourself into thinking that the program is "correct" by any definition (see Testing Exposes Presence of Flaws).
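To make the difference between statement and branch coverage concrete, here is a minimal Python sketch that records which lines of a function run, using the standard `sys.settrace` hook. The `executed_lines` helper and the `clamp_to_zero` function are hypothetical names for illustration; a real project would use a dedicated coverage tool instead of this hand-rolled tracer.

```python
import sys


def executed_lines(func, *args):
    """Return the set of line offsets (relative to the def line) run inside func."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        # Record only "line" events that belong to the function being measured.
        if frame.f_code is code and event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return seen


def clamp_to_zero(x):
    if x < 0:
        x = 0
    return x
```

A single test with a negative argument executes all three statements in `clamp_to_zero`, so statement coverage is complete, yet the case where the `if` condition is false is never exercised. Only a second test with a non-negative argument covers that branch.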


Reference:
Weiser, M., Gannon, J., and McMullin, P., "Comparison of Structural Test Coverage Metrics," IEEE Software, March 1985.