Every day, software engineers face the task of designing new systems or maintaining existing ones. Whether a system must be distributed for performance reasons or for reliability hardly matters: either way, distributed system design needs to be considered, and breaking it down into a limited number of principles makes it possible to adequately assess the tradeoffs and costs.
Below are ten principles of distributed system design that, I think, do a good job of summarizing the problem and separating its concerns. These are the principles Amazon used when designing its S3 service (see the reference at the bottom).
▸ Decentralization: Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.
▸ Asynchrony: The system makes progress under all circumstances.
▸ Autonomy: The system is designed such that individual components can make decisions based on local information.
▸ Local responsibility: Each individual component is responsible for achieving its consistency; this is never the burden of its peers.
▸ Controlled concurrency: Operations are designed such that no or limited concurrency control is required.
▸ Failure tolerant: The system considers the failure of components to be a normal mode of operation and continues operation with no or minimal interruption.
▸ Controlled parallelism: Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.
▸ Decompose into small, well-understood building blocks: Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.
▸ Symmetry: Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.
▸ Simplicity: The system should be made as simple as possible, but no simpler.
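Several of these principles are easiest to see in code. As one illustration of decentralization and symmetry, here is a minimal consistent-hashing sketch: every node is identical, any node can compute an object's placement from local information alone, and there is no central coordinator to become a bottleneck or single point of failure. This is only an illustrative sketch (the `HashRing` class and node names are hypothetical), not Amazon's actual implementation.

```python
import hashlib
from bisect import bisect

class HashRing:
    """Minimal consistent-hash ring (illustrative, not Amazon's design).

    Symmetry: every node runs identical logic with no node-specific config.
    Decentralization: any node can locate a key without a coordinator.
    """

    def __init__(self, nodes, vnodes=100):
        # Each physical node gets several virtual points on the ring, so
        # load spreads evenly and adding or removing a node remaps only a
        # small fraction of keys (aiding parallel recovery and growth).
        self.ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash.
        idx = bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("my-bucket/my-object")  # computable on any node, locally
```

Because placement is a pure function of the key and the node set, this also exhibits autonomy: each component decides where data lives from local information, without asking its peers.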
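Failure tolerance and controlled concurrency often go together in practice: if an operation is a full, idempotent overwrite, it can be safely retried after a timeout without any locking. The sketch below, with a hypothetical `put_with_retries` helper and a simulated flaky store, treats transient failure as a normal mode of operation; it is a sketch under those assumptions, not a prescription.

```python
import random
import time

def put_with_retries(store, key, value, attempts=5):
    """Retry an idempotent PUT with jittered exponential backoff.

    Controlled concurrency: the write is a full overwrite keyed by `key`,
    so replaying it is safe and needs no locks or coordination.
    Failure tolerance: transient failures are expected and handled inline.
    """
    for attempt in range(attempts):
        try:
            store.put(key, value)  # full overwrite: safe to repeat
            return True
        except TimeoutError:
            # Back off so a struggling node is not hammered while recovering.
            time.sleep((2 ** attempt) * 0.01 * random.random())
    return False

class FlakyStore:
    """Hypothetical in-memory store whose first few calls time out."""

    def __init__(self, failures=2):
        self.failures = failures
        self.data = {}

    def put(self, key, value):
        if self.failures > 0:
            self.failures -= 1
            raise TimeoutError("simulated transient failure")
        self.data[key] = value

store = FlakyStore()
ok = put_with_retries(store, "k", "v")  # succeeds despite two timeouts
```

Note that local responsibility shows up here too: the caller, not the store's peers, owns the job of driving the write to a consistent final state.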
Reference: