Tuesday, March 22, 2022

Principles of Distributed System Design

Three garage bays to represent a distributed system.

Every day, software engineers face the task of designing new systems or maintaining existing ones. Whether those systems need to be distributed for performance or for reliability hardly matters. Either way, distributed system design should be broken down into a limited number of principles so that the tradeoffs and costs can be assessed adequately.

Below are 10 principles of distributed system design that I think do a good job of summarizing and separating the problem. These are the principles that Amazon used when designing its S3 service (see reference at bottom).

▸ Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.

▸ Asynchrony: The system makes progress under all circumstances.

▸ Autonomy: The system is designed such that individual components can make decisions based on local information.

▸ Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.

▸ Controlled concurrency: Operations are designed such that no or limited concurrency control is required.

▸ Failure tolerant: The system considers the failure of components to be a normal mode of operation and continues operation with no or minimal interruption.

▸ Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.

▸ Decompose into small, well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.

▸ Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.

▸ Simplicity: The system should be made as simple as possible, but no simpler.
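
To make a few of these principles concrete (autonomy, local responsibility, controlled concurrency, and symmetry), here is a minimal Python sketch of a grow-only counter, a well-known replicated data structure. This is my own illustration and is not taken from the S3 design; the class name, method names, and node IDs are all hypothetical.

```python
class GCounter:
    """Grow-only counter replicated across nodes (illustrative sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        # One slot per node; a node only ever writes to its own slot.
        self.counts = {node_id: 0}

    def increment(self, amount=1):
        """Record local events using only local information."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.node_id] += amount

    def merge(self, other):
        """Fold in a peer's state; safe to repeat or apply in any order."""
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self):
        """Total across every node this replica has heard from."""
        return sum(self.counts.values())


# Two identical nodes (symmetry) count independently (autonomy).
a = GCounter("node-a")
b = GCounter("node-b")
a.increment(2)
b.increment(3)

# State is exchanged whenever convenient; duplicate merges are harmless.
a.merge(b)
a.merge(b)
b.merge(a)

print(a.value(), b.value())  # prints "5 5"
```

Because each node writes only to its own slot and merging takes the per-slot maximum, no locks or coordination are needed, and a node that missed some updates catches up simply by merging again later.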


Reference:

Amazon Web Services Launches

Saturday, March 05, 2022

If it ain't broke, don't fix it

A photo of the electronics inside of a microwave.

Of course, the advice "if it ain't broke, don't fix it" applies to many aspects of life, but it applies particularly well to software. By its very name, software is considered malleable and easily modified. Don't be fooled into thinking that it is easy either to see or to repair a "break" in software.

Suppose you are maintaining a system. You are examining the source code of a component, either to enhance it or to find the cause of an error. While examining it, you detect what you believe is another error. Do not try to "repair" it. The probability is very high that you will introduce an error, not fix one. Instead, file a change request. The configuration control process and its associated technical reviews should then determine whether it is an error and what priority its repair should be given.


Reference:
Reagan, R., as quoted in Bentley, J., More Programming Pearls, Reading, MA: Addison-Wesley, 1988.

Tuesday, March 01, 2022

Validation and Verification


Large software developments need as many checks and balances as possible to ensure a quality product. One proven technique is to have an organization independent of the development team conduct validation and verification (V&V). Verification is the process of examining each intermediate product of software development to ensure that it conforms to the previous product. For example, verification ensures that software requirements meet system requirements, that the high-level software design satisfies all the software requirements (and no others), that algorithms satisfy the component's external specification, that code implements the algorithms, and so on. Validation is the process of examining each intermediate product of software development to ensure that it satisfies the requirements.

You can think of V&V as a solution to the children's game of telephone. In telephone, a group of children form a chain and a specific oral message is whispered down the line. At the end, the last child tells what he or she heard, and it is rarely the same as the initial message. Verification would cause each child to ask the previous child, "Did you say x?" Validation would cause each child to ask the first child, "Did you say x?"

On a project, V&V should be planned early. It can be documented in the quality assurance plan or it can exist in a separate V&V plan. In either case, its procedures, players, actions, and results should be approved at roughly the same time the software requirements specification is approved.


Reference:

Wallace, D. and Fujii, R., "Software Verification and Validation: An Overview," IEEE Software, May 1989.