Asynchronous Consensus
What use is it?
It is hard to coordinate independent machines that do not share memory.
Imagine we are across the room from each other.
I ask you a yes/no question, “Did you hear something?”
You shake your head.
Unbeknownst to you, I am blind and cannot see you shaking your head. You think you have answered; I think you have ignored me. Our understanding of the situation has diverged.
I repeat the question.
You have lapsed into some whimsical daydream, thinking our exchange already ended, and cannot hear me.
How do we avoid this situation?
Imagine instead of talking from across the room we are exchanging letters from opposite sides of a valley that is occupied by our enemies. This scenario may seem fantastic but it is an excellent analogy for communication over noisy channels where messages might be delayed or lost, like on the internet.
Imagine proposing something like, “Let us attack our enemies in the valley (but only if we are each certain that the other has agreed to attack).”
Will we ever attack?
This is called the Two Generals Problem and the answer is no, we will never attack, because the proposal is an infinite recursion (each general’s decision depends on the decision of the other).
Just send a receipt!
You can send a receipt, but you cannot immediately be sure your receipt has been received, so you need a receipt of a receipt, and then a receipt of a receipt of a receipt, and so on.
Practically speaking this is no great obstacle. It only means we must avoid infinite recursions in our battle plans.
The More General Case
How can we be sure a system (generals, drones, etc) will do what we want if no part of the system knows the complete global state of the system, and no message can be relied upon to be delivered?
We cannot always be sure. Uncertainty can always, theoretically speaking, be prolonged indefinitely [Fischer, Lynch, Paterson. Impossibility of Distributed Consensus with One Faulty Process. paper]. What we can do is make that as unlikely as possible, as quickly as possible.
Imagine we agree at the outset on a protocol:
Every time I send a message I will consider that message as being appended to the sequence of all our previously confirmed messages. Therefore each message will include an identifier of the message that came before it. Messages might be delayed or lost but this will ensure that when my messages arrive you will know where in the sequence I intended them to be.
After receiving a valid message I will send a receipt that confirms it. However at any time we both might try to append a message to the sequence, in which case there will be a conflict: two messages vying to be appended at the same position. In other words, there will be two competing sequences that are identical except for the last message. In that case, I will pass both sequences to a mysterious orb called DECIDE(X,Y) and DECIDE(X,Y) will tell me which sequence to confirm. We both have identical orbs called DECIDE(X,Y) so we can be certain that for any given conflict we will agree on which sequence to confirm.
Upon receiving a receipt I will know you have seen my message and resolved any conflicts and so I will consider my message irreversibly appended to the sequence. You will know I have seen your receipt when I either send a receipt of your receipt or append a new message to the newly confirmed sequence. In that way we will maintain an ongoing conversation in which a certain sequence of confirmed messages is known with certainty, and a number of unconfirmed messages remain uncertain.
It is a fine protocol but it hinges entirely on the mysterious orb DECIDE(X,Y). How does DECIDE(X,Y) work?
DECIDE(X,Y) takes two sequences, X and Y, as arguments. X has my message on the end and Y has yours. It outputs one of those, which is its decision. It might be very simple: if X is LEADER then X, otherwise Y. Unfortunately that would be fragile. There might be many of us in the conversation (DECIDE(X,Y,Z,…,etc)) so choosing a leader means we can only progress the conversation at the speed of the leader, and if we lose contact with the leader we are in a pickle.
Ideally DECIDE(X,Y,Z,…) needs to work no matter which participants are reachable. In practice we need at least a majority of participants.
DECIDE(X,Y,Z,…) needs to make an unambiguous decision, or no decision. We will stipulate that an unconfirmed sequence can have any number of speculative messages appended to it, as long as each message correctly identifies the message before it. That way if DECIDE(X,Y,Z,…) makes no decision we can wait for more messages and receipts, extend the relevant sequences, and hope that the set of extended sequences will be decidable. In other words it gives us some scope for converging on a decision.
The problem with convergence is that it takes time. We do not want to waste time, we want DECIDE(X,Y,Z,…) to make fast decisions with minimal information. In other words we want rapid convergence. Not only that, we want to know with certainty the probability that a decision will be made within a given number of messages.
For the reader: come up with the best solution for DECIDE(X,Y,Z,…).
Upshot
What does all this get you?
Something marvellous.
If we can agree on a sequence of messages then we can treat those messages like database transactions: we each set up a block of memory (or a piece of paper) and operate on it according to the instructions in each new message. Because we agree on the order of the messages, we also agree on the state of the memory. In other words we have constructed a shared memory.
Oscar the Operating System
A computer is a block of memory with an operating system. The operating system allows all kinds of apps to use that memory to execute all kinds of tasks. Our shared memory is no different.
Oscar will execute arbitrary tasks on a group of independent machines as if they were one super-machine.
