# Patterns of Distributed Systems

# Service-Oriented Architecture (SOA)

SOA is an architectural pattern that breaks a system down into smaller, independent services that communicate with each other. Each service performs a specific function and can be developed, deployed, and scaled independently. SOA is popular in distributed systems because it improves scalability, fault tolerance, and flexibility.

# MapReduce

MapReduce is a pattern that involves breaking down a large data processing task into smaller, independent tasks that can be processed in parallel. This pattern is commonly used in distributed systems to improve performance and scalability.

# 💎 Event-Driven Architecture (EDA) Patterns

# Event Notification

The event notification pattern is the most basic pattern used in EDA. In this pattern, an event is generated by a component and sent to other components that have subscribed to it. The subscribers can then process the event and take appropriate actions. This pattern is useful when a component needs to notify other components about a change in its state or when a specific condition is met.

# Event-Carried State Transfer

The event-carried state transfer (ECST) pattern is used to transfer the state of an object between different components in an event-driven system. In this pattern, the state of an object is encapsulated in an event and sent to other components. The receiving components can then use the state to update their own state. This pattern is useful when a component needs to share its state with other components in the system.

# Event Sourcing

Event sourcing is a pattern used to store the state of an object as a sequence of events. In this pattern, every change to the state of an object is captured as an event and stored in an event log. The current state of the object can then be reconstructed by replaying the events in the log. This pattern is useful when a system needs to maintain a complete history of changes to an object's state.
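As a minimal sketch of this idea (the class and event names here are illustrative, not from any particular framework), state is never stored directly; it is derived by replaying the event log from the beginning:

```java
import java.util.ArrayList;
import java.util.List;

// Event-sourcing sketch: every change is appended to an event log,
// and current state is reconstructed by folding over that log.
public class EventSourcedCounter {
    public record Event(String type, int amount) {}

    private final List<Event> log = new ArrayList<>();

    public void append(Event e) { log.add(e); } // every change is an event

    // Reconstruct the current state by replaying the log in order.
    public int currentState() {
        int state = 0;
        for (Event e : log) {
            state = switch (e.type()) {
                case "ADD" -> state + e.amount();
                case "SUBTRACT" -> state - e.amount();
                default -> state;
            };
        }
        return state;
    }
}
```

Because the log is append-only, the full history of the object is preserved, and past states can be recovered by replaying only a prefix of the log.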

# CQRS

Command Query Responsibility Segregation (CQRS) is a pattern used to separate the read and write operations of a system. In this pattern, the read operations are handled by a separate component from the write operations. The write operations generate events that are sent to the read component, which updates its state accordingly. This pattern is useful when a system needs to handle a large number of read and write operations.

# Saga

The saga pattern is used to manage long-running transactions in an event-driven system. A saga is a sequence of local transactions executed in a specific order, each typically triggering the next through an event. If a step fails, the saga executes compensating actions to undo the work of the steps that already completed, rather than rolling back a single global transaction. This pattern is useful when a system needs to handle complex transactions that involve multiple components.

# 💎 Fault-Tolerance

# Redundancy

Redundancy is the duplication of critical components or data to ensure that the system can continue operating even if some components fail. There are several types of redundancy:

  • Hardware redundancy: multiple instances of hardware components are used to ensure that the system can continue operating even if some components fail. For example, a distributed database may have multiple servers that store the same data.
  • Software redundancy: multiple instances of software components are used to ensure that the system can continue operating even if some components fail. For example, a load balancer may have multiple instances that distribute requests to multiple servers.
  • Data redundancy: multiple copies of data are stored to ensure that the system can continue operating even if some data is lost. For example, a distributed file system may store multiple copies of files on different servers.

# Replication

Replication is the process of copying data or components to multiple locations to ensure that the system can continue operating even if some components fail. There are several types of replication:

  • Active-active replication: multiple instances of components are active at the same time and handle requests concurrently. For example, a distributed database may have multiple servers that handle read and write requests.
  • Active-passive replication: one instance of a component is active and handles requests, while other instances are passive and only become active if the active instance fails. For example, a load balancer may have one active instance that distributes requests to multiple servers, and one passive instance that becomes active if the active instance fails.

# Graceful Degradation

Graceful degradation is a pattern that allows a system to continue operating even if some components are unavailable or degraded. The system detects the unavailability or degradation of components and adjusts its behavior to minimize the impact on users. For example, a search engine may return fewer results or slower results if some servers are unavailable.

# Retry pattern

The Retry pattern is used to handle transient failures in a distributed system. It works by retrying a failed operation a certain number of times before giving up. This allows the system to recover from temporary failures and continue to function.
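A minimal sketch of this in Java (the helper name and the linear backoff policy are illustrative; production code would usually add jitter and exponential backoff):

```java
import java.util.function.Supplier;

// Retry sketch: attempt an operation up to maxAttempts times,
// backing off between attempts; rethrow once retries are exhausted.
public class Retry {
    public static <T> T withRetry(Supplier<T> op, int maxAttempts, long backoffMillis) {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // transient failure: remember it and try again
                try {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                }
            }
        }
        throw last; // all attempts failed (assumes maxAttempts >= 1)
    }
}
```

Note that retries are only safe when the retried operation is idempotent; otherwise a retry after a timeout can apply the same change twice.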

# Timeout Pattern

The Timeout pattern is used to prevent a distributed system from becoming unresponsive due to long-running operations. It works by setting a timeout for each operation and aborting the operation if it exceeds the timeout. This allows the system to continue to function even if some operations fail.
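One common way to enforce this in Java is with java.util.concurrent (the class name below is illustrative): run the operation on a worker thread and bound the wait with Future.get:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Timeout sketch: run an operation on a worker thread and abort the wait
// if it does not complete within the deadline.
public class TimeoutDemo {
    public static String callWithTimeout(Callable<String> op, long timeoutMillis) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<String> future = executor.submit(op);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS); // blocks at most timeoutMillis
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the still-running task
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```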

# Bulkhead

The Bulkhead pattern is used to isolate failures in a distributed system. It works by partitioning the system into separate compartments, each with its own resources and failure modes. This allows failures in one compartment to be contained and not affect other compartments.
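A compartment can be as simple as a bounded set of permits. The sketch below (class name and rejection behavior are illustrative) caps how many concurrent calls may enter one dependency:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Bulkhead sketch: cap the number of concurrent calls into one dependency
// so a slow or failing dependency cannot exhaust the caller's resources.
public class Bulkhead {
    private final Semaphore permits;

    public Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    public <T> T call(Supplier<T> op) {
        if (!permits.tryAcquire()) { // reject immediately rather than queue up
            throw new IllegalStateException("bulkhead full: rejecting call");
        }
        try {
            return op.get();
        } finally {
            permits.release(); // free the slot for the next caller
        }
    }
}
```

Rejecting immediately (rather than waiting for a permit) is a design choice: it keeps the caller responsive at the cost of shedding load when the compartment is saturated.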

# Failover

Failover is a pattern that involves switching to a backup system when the primary system fails. This can be achieved through hardware redundancy or through software redundancy, where multiple instances of a component are run on different machines.

# Circuit Breaker

A circuit breaker is a pattern that prevents a component from repeatedly failing and causing cascading failures in the system. The circuit breaker monitors the number of failures that occur within a certain period of time, and if the number exceeds a threshold, it trips the circuit and stops sending requests to the component. This allows the component to recover and prevents it from causing further failures in the system.
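The sketch below shows the core of the idea with a consecutive-failure count (the class name and threshold policy are illustrative; a production breaker would also add a half-open state that probes the component after a cool-down period):

```java
import java.util.function.Supplier;

// Circuit-breaker sketch: after `threshold` consecutive failures the breaker
// opens and rejects calls immediately; a success while closed resets the count.
public class CircuitBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    public CircuitBreaker(int threshold) {
        this.threshold = threshold;
    }

    public boolean isOpen() {
        return consecutiveFailures >= threshold;
    }

    public <T> T call(Supplier<T> op) {
        if (isOpen()) {
            throw new IllegalStateException("circuit open: failing fast"); // protect the failing component
        }
        try {
            T result = op.get();
            consecutiveFailures = 0; // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            throw e;
        }
    }
}
```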

# 💎 Communication Patterns

# Request-Reply Pattern

The request-reply pattern is one of the most common communication patterns used in distributed systems. In this pattern, a client sends a request to a server, and the server responds with a reply. The request-reply pattern is synchronous, meaning that the client waits for the server to respond before continuing with its execution. This pattern is commonly used in client-server architectures, where the client sends a request to the server and waits for a response.

# Remote Procedure Call (RPC)

RPC is a pattern in which a component calls a function or method on a remote system as if it were a local function or method. The RPC framework hides the details of serialization and network transport behind a familiar local-call abstraction, which makes it easier to split functionality across services, though callers must still account for network latency and partial failures.

# Message-Oriented Middleware (MOM)

MOM is a pattern that involves using a messaging system to facilitate communication between different components of a distributed system. Messages are sent between components asynchronously, which means that the sender does not have to wait for a response before continuing with other tasks. MOM is a popular pattern used in distributed systems as it allows for better scalability, fault tolerance, and decoupling of components.

# Publish-Subscribe

Publish-Subscribe is a pattern that involves sending messages to a group of subscribers who have expressed interest in receiving those messages. This pattern is commonly used in distributed systems to facilitate communication between different components.

# Message Queue Pattern

The message queue pattern is a communication pattern used in distributed systems to decouple the sender and receiver of a message. In this pattern, a sender sends a message to a message queue, and the receiver retrieves the message from the queue. The message queue pattern is asynchronous, meaning that the sender does not wait for the receiver to receive the message before continuing with its execution. This pattern is commonly used in microservices architectures, where services communicate with each other through message queues.

# 💎 Optimizing Performance

# Caching

Caching is a technique used to store frequently accessed data in memory, reducing the need to fetch it from the network or disk. In distributed systems, caching can be used to reduce the number of requests made to remote services, improving response times and reducing network traffic. There are several caching strategies, including:

  • Client-side caching: where the client caches responses from the server.
  • Server-side caching: where the server caches responses from the database or other services.
  • Distributed caching: where multiple nodes in the system share a cache.
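As a sketch of server-side caching (capacity and names are illustrative), a small LRU cache can sit in front of a slow lookup; Java's LinkedHashMap supports this directly via access ordering:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Server-side caching sketch: a bounded LRU cache in front of a slow lookup.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // access-order: recently used entries move to the back
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry when full
    }
}
```

A typical usage is `cache.computeIfAbsent(key, k -> expensiveFetch(k))`, where `expensiveFetch` stands in for the database or remote-service call being avoided.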

# Load Balancing

Load balancing is a technique used to distribute incoming requests across multiple servers, ensuring that no single server is overloaded. Load balancing can improve performance by reducing response times and increasing availability. There are several load balancing strategies, including:

  • Round-robin: where requests are distributed evenly across servers.
  • Least connections: where requests are sent to the server with the fewest active connections.
  • IP hash: where requests are sent to the server based on the client's IP address.
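The round-robin strategy above can be sketched in a few lines (class and server names are illustrative):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin sketch: cycle through the server list so each server
// receives roughly the same number of requests.
public class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinBalancer(List<String> servers) {
        this.servers = servers;
    }

    public String pick() {
        // getAndIncrement keeps the rotation correct under concurrent callers
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}
```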

# Asynchronous Processing

Asynchronous processing is a technique used to improve performance by allowing tasks to be executed in the background, freeing up resources for other tasks. In distributed systems, asynchronous processing can be used to improve response times and reduce network traffic. There are several asynchronous processing strategies, including:

  • Message queues: where tasks are added to a queue and processed asynchronously.
  • Event-driven architecture: where tasks are triggered by events and processed asynchronously.
  • Actor model: where tasks are executed by actors, which are independent units of computation.

# Data Partitioning

Data partitioning is a technique used to split data across multiple servers, improving performance by reducing the amount of data that needs to be processed by each server. In distributed systems, data partitioning can be used to improve scalability and reduce response times. There are several data partitioning strategies, including:

  • Horizontal partitioning: where data is split by rows.
  • Vertical partitioning: where data is split by columns.
  • Hash partitioning: where data is split based on a hash function.
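Hash partitioning can be sketched as follows (the class name is illustrative). Note that this naive modulo scheme reshuffles most keys whenever the partition count changes; consistent hashing avoids that, but is omitted here for brevity:

```java
// Hash-partitioning sketch: route each key to one of N partitions by hashing.
public class HashPartitioner {
    private final int partitions;

    public HashPartitioner(int partitions) {
        this.partitions = partitions;
    }

    public int partitionFor(String key) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(key.hashCode(), partitions);
    }
}
```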

# Replication

Replication is a technique used to create copies of data across multiple servers, improving performance by reducing the need to fetch data from remote servers. In distributed systems, replication can be used to improve availability and reduce response times. There are several replication strategies, including:

  • Master-slave replication: one server is designated as the master and all writes are sent to it, while reads can be served by any of the slaves (replicas).
  • Master-master replication: two servers both accept reads and writes and replicate changes to each other.
  • Multi-master replication: the general case in which many servers accept writes; concurrent writes to the same data can conflict, so conflicts must be detected and resolved, for example with last-write-wins rules or a consensus algorithm.

# 💎 Idempotency Patterns

# Idempotency Keys

One common pattern for achieving idempotency is to use idempotency keys. An idempotency key is a unique identifier that is associated with a particular operation. When an operation is performed, the idempotency key is recorded, along with the result of the operation. If the operation is repeated with the same idempotency key, the system will recognize that the operation has already been performed and return the same result as before. This pattern is commonly used in REST APIs, where a client can include an idempotency key in a request header to ensure that the request is idempotent.
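A minimal in-memory sketch of this bookkeeping (the class name is illustrative; a real service would persist the key-to-result mapping so it survives restarts):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Idempotency-key sketch: record the result of each operation under its key;
// a repeated request with the same key returns the stored result instead of
// executing the operation again.
public class IdempotentExecutor {
    private final ConcurrentHashMap<String, Object> results = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    public <T> T execute(String idempotencyKey, Supplier<T> operation) {
        // computeIfAbsent runs the operation at most once per key
        return (T) results.computeIfAbsent(idempotencyKey, k -> operation.get());
    }
}
```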

# Idempotent State Machines

Another pattern for achieving idempotency is to use idempotent state machines. In this pattern, the system maintains a state machine that tracks the progress of an operation. Each state in the state machine is idempotent, meaning that it can be repeated without changing the result beyond the initial application. When an operation is performed, the system transitions to the next state in the state machine. If the operation is repeated, the system will recognize that it has already transitioned to the next state and will not repeat the transition. This pattern is commonly used in distributed systems that perform complex operations, such as payment processing or order fulfillment.

# 💎 Patterns for concurrency

# Locking

Locking is a common pattern used to manage concurrency in distributed systems. It involves acquiring a lock on a shared resource to prevent other processes from accessing it simultaneously. Locking can be implemented using various techniques such as mutual exclusion, semaphores, and monitors. However, locking can also introduce performance overhead and can lead to deadlocks if not implemented correctly.

# Message Passing

Message passing is another pattern used to manage concurrency in distributed systems. It involves sending messages between processes to coordinate their activities. Message passing can be implemented using various techniques such as remote procedure calls (RPC), message queues, and publish-subscribe systems. Message passing can be more efficient than locking in some cases, as it allows processes to communicate without the need for shared resources.

# Event-driven Architecture

Event-driven architecture is a pattern that involves processing events asynchronously. It involves decoupling the processing of events from the generation of events, allowing processes to handle events independently. Event-driven architecture can be implemented using various techniques such as event sourcing, event-driven messaging, and reactive programming. Event-driven architecture can be more scalable than locking or message passing in some cases, as it allows processes to handle events independently and asynchronously.

# Actor Model

The actor model is a pattern that involves modeling concurrent processes as actors. Each actor has its own state and can communicate with other actors by sending messages. The actor model can be implemented using various techniques such as Akka, Erlang, and Orleans. The actor model can be more scalable than locking or message passing in some cases, as it allows processes to handle messages independently and asynchronously.

# Data Partitioning

Data partitioning is a pattern that involves partitioning data across multiple nodes in a distributed system. Each node is responsible for a subset of the data, allowing processes to access data independently and concurrently. Data partitioning can be implemented using various techniques such as sharding, consistent hashing, and range partitioning. Data partitioning can be more scalable than locking or message passing in some cases, as it allows processes to access data independently and concurrently.

# 💎 Patterns for Increasing Concurrency

# 1. Thread Pool Pattern

The Thread Pool pattern is a concurrency pattern that involves creating a pool of threads that can be used to execute tasks. When a task is submitted to the system, it is assigned to an available thread in the pool. This pattern is useful for systems that have a large number of short-lived tasks that need to be executed quickly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService executor = Executors.newFixedThreadPool(10); // at most 10 tasks run concurrently
for (int i = 0; i < 100; i++) {
    executor.execute(new Task()); // Task implements Runnable
}
executor.shutdown(); // stop accepting new work; queued tasks still complete
```

# 2. Asynchronous Messaging Pattern

The Asynchronous Messaging pattern is a concurrency pattern that involves using messaging to communicate between different parts of a system. This pattern is useful for systems that have long-running tasks that can be executed asynchronously. When a task is submitted to the system, it is sent as a message to a queue. A worker process then retrieves the message from the queue and executes the task.

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class TaskListener implements MessageListener {
    public void onMessage(Message message) {
        // Execute the task described by the message asynchronously
    }
}

ConnectionFactory connectionFactory = new ActiveMQConnectionFactory("tcp://localhost:61616");
Connection connection = connectionFactory.createConnection();
Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
Destination destination = session.createQueue("taskQueue");
MessageConsumer consumer = session.createConsumer(destination);
consumer.setMessageListener(new TaskListener());
connection.start(); // delivery does not begin until the connection is started
```

# 3. Actor Model Pattern

The Actor Model pattern is a concurrency pattern that involves creating actors that can communicate with each other through messages. Each actor has its own state and can execute tasks independently. This pattern is useful for systems that have a large number of long-running tasks that need to be executed concurrently.

```java
import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.Props;
import akka.actor.UntypedActor;

public class MyActor extends UntypedActor {
    public void onReceive(Object message) throws Exception {
        // Each actor handles one message at a time, so its state needs no locking
    }
}

ActorSystem system = ActorSystem.create("MySystem");
ActorRef myActor = system.actorOf(Props.create(MyActor.class));
myActor.tell("Task", ActorRef.noSender()); // fire-and-forget message send
```

# 💎 Patterns for Race Condition

# Read-Modify-Write

The Read-Modify-Write pattern is a common source of race conditions in distributed systems. It occurs when multiple processes or threads read a shared resource, modify it, and write it back to the same location. If two or more processes modify the same resource concurrently, the final value of the resource may depend on the order in which the modifications are applied.

To avoid this pattern, we can use atomic operations that guarantee that the read-modify-write sequence is executed atomically, without any other process or thread intervening. For example, in Java, we can use the synchronized keyword or the java.util.concurrent.atomic package to ensure atomicity.
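To make the hazard concrete (the class name is illustrative): with a plain int, concurrent `counter = counter + 1` updates can be lost, whereas AtomicInteger performs the whole read-modify-write as one atomic step:

```java
import java.util.concurrent.atomic.AtomicInteger;

// AtomicInteger performs read-modify-write as a single atomic operation,
// so concurrent increments are never lost.
public class AtomicCounterDemo {
    public static int incrementManyTimes(int threads, int perThread) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return counter.get(); // always threads * perThread: no lost updates
    }
}
```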

# Locking

Locking is another common pattern for avoiding race conditions in distributed systems. It involves acquiring a lock on a shared resource before accessing it and releasing the lock after the access is complete. If two or more processes attempt to acquire the same lock concurrently, one of them will block until the lock is released.

To avoid deadlocks, we should use a consistent ordering of locks across all processes or threads. For example, we can use a global ordering of locks based on the identity of the resources being locked.

# Message Passing

Message passing is a pattern for avoiding race conditions in distributed systems by using asynchronous communication between processes or threads. Instead of accessing shared resources directly, processes or threads send messages to each other to request access or to notify each other of changes.

To avoid race conditions, we should use a consistent ordering of messages across all processes or threads. For example, we can use a global ordering of messages based on the identity of the sender and the receiver.

# Versioning

Versioning is a pattern for avoiding race conditions in distributed systems by using version numbers to track changes to shared resources. Each process or thread maintains a local copy of the resource and updates its version number when it modifies the resource. When a process or thread reads the resource, it checks the version number to ensure that it has the latest version.

To avoid conflicts, we should use a consistent ordering of updates across all processes or threads. For example, we can use a global ordering of updates based on the version number of the resource.

# 💎 Locking Patterns

# 1. Centralized Locking

Centralized locking is a simple and effective pattern for locking in distributed systems. In this pattern, a single node is responsible for managing the locks for all the resources in the system. When a process needs to access a resource, it requests a lock from the centralized lock manager. The lock manager grants the lock if it is available, and the process can access the resource. Once the process is done, it releases the lock, and the lock manager makes it available for other processes.

The main advantage of centralized locking is its simplicity. It is easy to implement and does not require complex coordination between nodes. However, it can become a bottleneck if the lock manager becomes overloaded or fails.

# 2. Distributed Locking

Distributed locking is a more complex pattern that distributes the lock management across multiple nodes. In this pattern, each node is responsible for managing the locks for the resources it owns. When a process needs to access a resource, it requests a lock from the node that owns the resource. The node grants the lock if it is available, and the process can access the resource. Once the process is done, it releases the lock, and the node makes it available for other processes.

The main advantage of distributed locking is its scalability. It can handle a large number of resources and processes without becoming a bottleneck. However, it requires more complex coordination between nodes and can be challenging to implement correctly.

# 3. Optimistic Locking

Optimistic locking is a pattern that assumes that conflicts between processes are rare and allows multiple processes to access the same resource simultaneously. In this pattern, each process reads the resource's current state (often tagged with a version number) and stores it locally. When the process writes its update back, it first checks whether the resource has changed since it was read. If the state has not changed, the update is applied; if it has changed, the process retries the operation against the new state.
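A minimal single-process sketch of the read-check-retry loop (the class name is illustrative; here the version check collapses to a compare-and-set on the value itself):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic-locking sketch: read the current value, compute the update
// locally, and apply it only if the value has not changed in the meantime;
// otherwise loop and retry with the fresh value.
public class OptimisticUpdater {
    private final AtomicInteger value = new AtomicInteger(0);

    public int addOptimistically(int delta) {
        while (true) {
            int observed = value.get();      // read current state
            int updated = observed + delta;  // compute new state locally
            if (value.compareAndSet(observed, updated)) {
                return updated;              // no conflict: update committed
            }
            // conflict detected: another writer got in first, so retry
        }
    }
}
```

In a distributed setting the same shape appears as a conditional write, for example an UPDATE guarded by `WHERE version = :observedVersion` in a database.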

The main advantage of optimistic locking is its high concurrency. It allows multiple processes to access the same resource simultaneously, reducing contention and improving performance. However, it requires careful handling of conflicts and can be challenging to implement correctly.

# 4. Pessimistic Locking

Pessimistic locking is a pattern that assumes that conflicts between processes are common and prevents multiple processes from accessing the same resource simultaneously. In this pattern, each process requests a lock on the resource before accessing it. If the lock is granted, the process can access the resource. If the lock is not granted, the process waits until the lock becomes available.

The main advantage of pessimistic locking is its simplicity and reliability. It ensures that conflicts between processes are avoided, and the resource is accessed by only one process at a time. However, it can become a bottleneck if the lock is held for a long time, and it can reduce concurrency and performance.

In conclusion, locking is a critical aspect of distributed systems, and choosing the right locking pattern depends on the specific requirements of the system. Centralized locking is simple and effective but can become a bottleneck. Distributed locking is scalable but requires more complex coordination. Optimistic locking allows high concurrency but requires careful handling of conflicts. Pessimistic locking is simple and reliable but can reduce concurrency and performance.

# 💎 Distributed transaction patterns

# Two-Phase Commit (2PC):

Two-phase commit coordinates a distributed transaction through a coordinator. In the prepare phase, the coordinator asks every participant to vote on whether it can commit; in the commit phase, the coordinator instructs all participants to commit only if every vote was yes, and to abort otherwise. 2PC guarantees atomicity across participants, but it blocks if the coordinator fails at the wrong moment.

# Saga:

A saga is a sequence of local transactions, each executed within a single service, that together form a distributed transaction. Each local transaction updates the data within its own service and publishes events to trigger subsequent local transactions in other services. If a local transaction fails, compensating actions are executed to undo the changes made by previous transactions. Sagas are more flexible than 2PC but require careful design to handle failures and ensure eventual consistency.

# Eventual Consistency:

Instead of enforcing immediate consistency, this pattern allows services to operate independently and asynchronously replicate data changes. Services publish events when they make changes to their data, and other services subscribe to these events to update their own data. While this pattern simplifies the transaction management, it introduces the challenge of handling eventual consistency and resolving conflicts.

# Compensating Transaction:

This pattern is used when a transaction needs to be rolled back due to a failure or error. A compensating transaction is designed to undo the changes made by the original transaction. For example, if a payment transaction fails, a compensating transaction can be triggered to reverse the payment. This pattern requires careful design to ensure that compensating transactions are idempotent and can be executed safely.

# Idempotent Retry:

In this pattern, a failed transaction can be retried without causing any side effects. Each transaction is designed to be idempotent, meaning that executing it multiple times produces the same result as executing it once. By retrying the transaction, eventual consistency can be achieved even in the presence of failures.

# 💎 Patterns for Ensuring Data Consistency

# 1. Two-Phase Commit (2PC)

In this pattern, a coordinator first asks all participating nodes to prepare and vote on a proposed change; the change is committed on every node only if all votes are yes, and aborted everywhere otherwise. This keeps the nodes consistent at the cost of blocking while the protocol runs.

# 2. Saga Pattern

The Saga pattern is another pattern that can be used to ensure data consistency in distributed systems. In this pattern, a long-running transaction is broken down into a series of smaller transactions, each of which is executed by a separate node. Each transaction updates the state of the system and publishes an event to notify other nodes of the change. If a transaction fails, the Saga pattern uses compensating transactions to undo the changes made by the failed transaction and restore the system to its previous state.
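The control flow can be sketched as follows (the Step record and method names are illustrative): each step carries a compensating action, and a failure triggers compensation of the already-completed steps in reverse order:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Saga sketch: run steps in order; on failure, run the compensations of the
// steps that already completed, in reverse order.
public class Saga {
    public record Step(Runnable action, Runnable compensation) {}

    public static boolean run(List<Step> steps) {
        Deque<Runnable> toUndo = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.action().run();
                toUndo.push(step.compensation()); // remember how to undo this step
            } catch (RuntimeException e) {
                while (!toUndo.isEmpty()) {
                    toUndo.pop().run(); // compensate completed steps in reverse
                }
                return false; // saga failed, but the system is consistent again
            }
        }
        return true;
    }
}
```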

# 3. Eventual Consistency

Eventual consistency is a pattern that allows for data consistency to be achieved over time, rather than immediately. In this pattern, updates to the system are propagated asynchronously to all nodes in the system. While this approach may result in temporary inconsistencies, the system eventually converges to a consistent state. This pattern is often used in systems where high availability is more important than immediate consistency.

# 4. Conflict-Free Replicated Data Types (CRDTs)

Conflict-Free Replicated Data Types (CRDTs) are a family of data structures that can be used to ensure data consistency in distributed systems. CRDTs are designed to be replicated across multiple nodes in a system, and they ensure that updates to the data structure are conflict-free. This means that updates can be applied in any order, and the data structure will always converge to a consistent state.
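One of the simplest CRDTs is the grow-only counter (G-Counter), sketched below (class name is illustrative). Each replica increments only its own slot, and merging takes the per-replica maximum, so merges are commutative, associative, and idempotent and replicas converge regardless of merge order:

```java
import java.util.HashMap;
import java.util.Map;

// G-Counter sketch: a grow-only CRDT counter. Each replica increments its own
// slot; merge takes the element-wise maximum, so replicas converge.
public class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>();

    public GCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    public void increment() {
        counts.merge(replicaId, 1L, Long::sum); // only touch our own slot
    }

    public long value() {
        return counts.values().stream().mapToLong(Long::longValue).sum();
    }

    public void merge(GCounter other) {
        // element-wise maximum of the two per-replica vectors
        other.counts.forEach((id, n) -> counts.merge(id, n, Math::max));
    }
}
```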

# 💎 Consistency patterns

| # | Name | Strategy |
| --- | --- | --- |
| 1 | Compensating action | Perform an action that undoes prior action(s) |
| 2 | Retry | Retry until success or timeout |
| 3 | Ignore | Do nothing in the event of errors |
| 4 | Restart | Reset to the original state and start again |
| 5 | Tentative operation | Perform a tentative operation and confirm (or cancel) later |

# Logging for consistency

  • Receiver-based message logging (RBML): every received message is synchronously written to stable storage before any action is taken on it.
  • Sender-based message logging (SBML): each message is written to stable storage before it is sent.
  • Hybrid message logging (HML): combines RBML and SBML to balance the overhead and recovery guarantees of the two approaches.

# Refs

https://martinfowler.com/articles/patterns-of-distributed-systems/

https://www.youtube.com/watch?v=nH4qjmP2KEE