CBC Casper: an introduction to consensus and finality
In this article, we try our best to explain things from first principles -- we assume almost zero prior knowledge of blockchains or consensus algorithms. In short, it's the article Sacha wishes he could have read when we he was first introduced to the concepts of consensus, finality, and CBC Casper at EthParis this year. As such, it should be accessible to anyone with a healthy dose of curiousity.
What is consensus?
In the english language, to reach (or come to a) consensus simply means to arrive at an agreement, or a common conclusion, about something. For the rest of this document, when we mention consensus, we are referring to consensus among nodes in a distributed network.
What is a distributed network?
A network is just a collection of nodes (computers for example) that communicate with each other. And distributed in this context just means that these computers are spread out geographically. In a distributed network, nodes can talk to each other and make decisions based on the information they receive from each other. But because they are located in different places, new information may take more time to reach some nodes than others.
Why is consensus hard?
Coming to consensus in a distributed network is hard because messages that nodes send and receive may be delayed, reordered, or lost. On top of this, some nodes may send nonsensical messages, multiple messages, or no messages at all. This means that nodes don't all see the same information all the time, and may in fact see contradictory information at any one time (this can be because of latency or maliciousness on the part of other nodes) .
What are we even coming to consensus on?
In most standard presentations of consensus, nodes must agree on a bit: either a zero or one. However, consensus can be reached on a whole range of other interesting pieces of data.

For example, we can have nodes come to consensus on integers, financial transactions, or even lists of blocks .

What is a consensus protocol?
A consensus protocol is a protocol that ensures nodes in a distributed network will eventually come to a consensus. To be considered useful, a consensus protocol must satisfy two main properties:

1. Safety: It should be impossible for different nodes to make conflicting decisions.

2. Liveness: Nodes running the consensus protocol will eventually make decisions.

While it is easy to see the need for safety, the need for liveness is a little less obvious. To see why liveness is necessary, consider a scenario in which nodes never output any values.

In such a scenario, since nodes never make any decision, they cannot possibly make any conflicting decisions. This means that the safety property is always satisfied even though the consensus protocol is essentially useless -- nodes never agree (come to consensus) on any values! By ensuring that nodes eventually make decisions, liveness helps us rule out these edge cases.

What is CBC Casper?
CBC Casper is a family of consensus protocols, as well as a flexible technique for defining new consensus protocols . You can think of it as a blueprint for designing consensus protocols on pretty much any data structure you can imagine.
What is CBC Casper's place within Ethereum research?
CBC Casper is one of the two main branches of Ethereum research. The other branch is known as Serenity, Ethereum 2.0, or Shasper. Both branches are focused on finding solutions to help Ethereum scale. However CBC Casper research is more radical, less implentation focused, and further from production.
What is finality?
Finality is the guarantee that -- subject to certain assumptions -- past decisions can't be reversed. As we will see later, in a distributed network, there is a subtle difference between this and consensus: for example nodes may reach consensus without reaching finality.
Why is finality important?
Finality is not something we are used to thinking about, because in our current world we mainly deal with centralized networks that guarantee finality seamlessly and almost instantaneously. Take a credit card purchase for example. Just moments after the swipe, the vendor knows whether you have enough funds to make the purchase. If the transaction goes through, the vendor knows it will not later be reverted, and that money will be received for the exchange. In other words the purchase is finalized.

In a distributed network however, we do not have a central authority we can trust to finalize data/transactions. So this is something we need to worry about.

How is finality defined in CBC Casper?
Before we tackle this, it will be helpful to define a couple of terms:

1. Validators: They are the consensus forming nodes in the network.

2. Messages: Messages are generated and passed around by validators in an effort to try and reach consensus.

The Casper consensus rules guarantee that after a sufficiently high number of validators have made valid messages that propose a given value, it is impossible for them to change their minds and reach consensus on a contradictory value without a large number of them acting in a way that is clearly -- and cryptographically verifiably -- malicious.

In other words, nodes should achieve finality on a consensus value after they have seen enough other validators propose that same value in valid messages.

How do we check for finality in CBC Casper?
Actually detecting all possible situations where a message is final is very difficult, but we can come up with a set of heuristics -- known as safety oracles -- that can help us detect some of the cases where this happens.

You can think of safety oracles as tools that allow nodes to detect final decisions. The simplest of these is the clique oracle, and it essentially does two things:

1. It checks whether enough validators agree with the consensus value -- c -- in question.

2. It then checks if these validators see each other agreeing on the value.

It is not obvious why we need to check for the second requirement. For this reason, we will start the discussion with a naive example that shows why agreement is not sufficient.

In this example consensus is reached when all validators are in agreement. Additionally, validators will always propose (or estimate) the value that (from their point of view) is supported by the majority -- note that in the case of a tie they are free to choose between the tied values.

Picture three validators -- A, B and C -- that send messages with either 0 or 1 values. These validators need to reach consensus -- in other words come to an agreement -- on whether the truth is 0 or 1.
At this stage each validator has sent one message. As you can see, A and B's first messages have value 0, whereas C's message has value 1.
Now imagine that before sending her next message, A sees C's first message but fails to see B's.
From her point of view she has seen one message with a 0 (her own previous message), and one message with a 1 (C's first message).
Since there is a consensus tie (from her point of view) she is free to choose either 0 or 1. Let's suppose she goes with 1.
Now suppose a similar thing happens to B. And that he also chooses to estimate 1.
We have a situation where all validators are currently proposing 1. In other words, it looks like our validators have reached consensus on 1. If we were to use a safety oracle that only checks for the first thing -- i.e., that enough validators are in agreement on a value -- we would conclude that the value 1 is finalized.
However it turns out that there are continuations where 1 may not be final. How is that possible? Simply put, if the validators fail to realise they are in agreement, they may change their minds.
For example, suppose C doesn't receive A and B's latest messages before sending her next message. From her point of view, both A and B's latest messages are still 0. Since she thinks 0 is supported by the majority, she changes her mind and proposes a 0 too.
Now suppose B sees C's latest message, but fails to see A's. From his point of view both C and A -- a majority -- are agreed on 0, so he also changes his mind and proposes a 0.
Finally, if A is in a symmetrical situation to B, she too will change her mind and propose a 0.
We are now in a situation where -- in just one round of messages -- A, B and C have all changed their minds and reached consenus on the opposite value, 0.
The moral of the story is the following: as long as validators don't realize they have reached consensus, we don't have finality. This is why our safety oracle needs to first check for consensus, and then check to see if validators realize they have reached it.
In our naive example above, consensus was reached when all validators were in agreement. However, in practice we don't need all validators to be in agreement -- a majority is enough. As we explored above, to reach finality we also need to check that the validators within this majority see each other agreeing.
In summary, our safety oracle needs to look for a group of validators that (1) form a majority, (2) are in agreement, and (3) see each other agreeing.
Let's take a look at a correct example.
Suppose we have three validators again -- A, B, and C -- and that they are again sending messages with either 0 or 1 values.
This time, B and C's first messages have value 0, whereas A's message has value 1.
Now, suppose that before sending her second message, A sees both B and C's first messages. She sees that a majority are proposing 0, so she changes her mind and proposes 0 too.
Now C also sees A and B's latest messages before sending her next message. Like A, she sees that everyone is proposing 0, so she sticks with 0.
Suppose the same thing happens to B.
We now have a situation where (1) every validator is in agreement and (2) every validator is aware of every other validator's latest message. In other words, not only are validators in agreement, but they know they are in agreement.
So we can safely say that the network has reached finality on zero 😊
We have seen that if we have a group of validators that (1) form a majority, (2) are in agreement on a consensus value c, and (3) see each other agreeing in their latest messages, then we have some notion of finality on the value c.
But what do we mean by this exactly? Does it mean c is final and will never change?
Not quite, a final decision on c can be reverted if more than a certain percentage of validators are malicious (i.e., knowingly break consensus rules).
So when we say that a decision on c is final, what we really mean is that none of the validators that have seen each other agreeing on c can legally (within the rules of the consensus algorithm) switch over to a contradictory value c' unless enough others illegally switch over first.
In simple terms, c is final as long as a large number of validators are honest. A concrete example should make this easier to understand:

In the paragraph below, some numbers are in green. Drag them with your mouse to adjust them. The consequences of your adjustments will be reflected immediately in the blue numbers.

While we do not get into the details here, it is important to note that even if these validators were to cheat in this way, it would be detectable by the network, and they could be punished. This is because every validator's message history is public and uneditable, and each message contains a list of the previous messages which justify that message -- in CBC lingo this part of the message is called the justification.
We've seen that we can use a simple safety oracle -- known as a clique oracle -- to determine if a message is final. And we've seen why it's important to check if validators see each other agreeing. However so far we've only checked to see if validators are in agreement on their latest messages.
While we won't get into the details here, it turns out we can get even stronger guarantees on finality by looking at how long validators have been in agreement for. Simply put, the longer a majority of validators have been in agreement on a message, the more malicious validators are needed to overturn this message.
Put another way, the longer validators have agreed on a value, the harder it is for that value to be overturned. Given this information, we can try to come up with safety oracles that check for majority agreement across many rounds of messages, not just the latest one.
Up until very recently, figuring out how to construct safety oracles that can check for this efficiently was an unsolved problem. However as of February of this year -- thanks to Daniel Kane & Greg Price at the Feb’19 Ethereum Stanford Workshop -- we now have an efficient (polynomial time computable) way of doing this.
At the ethParis Hackathon this year, we worked on taking this oracle out of research and into running code. Here's what we came up with.
Below you can see 50 messages from 5 validators trying to come to an agreement on a 0 or 1 value. Time flows from left to right.
For each message, the oracle efficiently checks to see if, and for how long, validators have seen each other agreeing on the value of that message.
play
RdYlGn
Here we've chosen to represent the strength of agreement using a red-yellow-green color scale: the greener the message, the longer validators have seen each other agreeing on the value of that message.
In other words, the greener the message, the harder it is for the value of that message to be overturned, and so the more final we can consider that message to be.
Note that the safety oracle algorithm is running directly in the browser on randomized data! Click on the play button above to see it run in real-time on a different dataset.
Finally, if you're interested in understanding how this safety oracle works at a deeper level, we recommend you take a look at our deep dive on the matter.
With this article, you should have obtained an overview of consensus and finality under CBC Casper, and developed a deeper understanding for why finality is not trivial in a distributed network.
Even though we've mostly talked about CBC Casper in the context of a binary consensus protocol -- a protocol in which nodes come to agreement on 0 or 1 messages -- it's important to keep in mind that CBC Casper is a family of protocols that can be adapted to pretty much any data structure you can imagine.
The obvious use case is a consensus protocol on lists of blocks -- where nodes need to come to an agreement on a list of blocks or blockchain -- but this is just one of many possible data structures that the CBC family can handle.
We hope this has sparked your interest in CBC Casper. In case we're right, we've created a list of further resources for you to check out!
Finally, if you want to develop a deeper understanding for how the efficient safety oracle we covered above works, we recommend you take a look at our deep dive on the matter.
Acknowledgements.
We thank Vlad Zamfir, Daniel Kane, Greg Price and Yufei Z. for the research supporting this work.
Open-source.
The code for the safety oracle is free to use and available on Github.
Further resources.