The Handover rule

The most tricky part of in Consensus protocols for Partial Synchrony is the handover from one leader to the next. Over the years, Consensus research exhibited handover sub-protocols with unnecessary communication complexity, misconceptions, and mistakes.

What is the leader handover problem?

Solutions in Partial Synchrony follow a common template. They operate in a view-by-view manner, each view driven by a known leader. If the leader is honest, then the view is successful and the protocol terminates.

If the view expires, then the leader is replaced via a handover sub-protocol. The handover must guarantee that if a commit decision was reached by any Honest Validator in the previous view, then the next leader learns the committed value. This is achieved through a commit-lock paradigm: $2f+1$ Validators must hold a lock on a value before it can be committed. In another post, we describe VPCBC, a linear implementation of commit-lock.

The crux of the handover is for the next leader to learn whether it is possible for $2f+1$ to have a lock. For many years, handover was handled via an approach pioneered in PBFT:

PBFT and Tendermint achieved this goal via gossip communication: all Validators must receive excatly the same $2f+1$ messages that the leader received and apply the same justification logic to accept the leader proposal. This requires quadratic word-Communication. Faced with a cascade of failures, this incurs super-quadratic (up to cubic) communication complexity which is sub-optimal.

HotStuff by Yin, Malkhi, Reiter, Abraham and Gueta pioneered a different approach, which unlocked the first linear implementation in the Partial Synchrony setting.The gist of the new approach is encapsulated by the following handover rule:

The Handover rule: When a leader starts a new view, it must learn the latest lock which is held by any Honest Validator. Validators will accept a leader proposal conditioned on it extending the latest lock they know.

The nice thing about this encapsulation is that different Consensus protocols can be derived by concatenating a leader replacement which satisfies the Handover rule with different forms of VPCBC. Whereas HotStuff utilized 3-phase VPCBC, there are multiple materializations of the Handover rule in different settings, all incurring only linear communication, as summarized in the table below.

VPCBC phases Model Leader protocol to start a new view Linear?
3-phase Partial Synchrony Wait to receive a QC from $2f+1$ Validators. yes
3-phase Synchronous setting Wait to receive a QC from $2f+1$ Validators. yes
2-phase Partial Synchrony Wait for one of three conditions:
1. receive QC from the immediately preceding view.
2. receive a QC from all the Validators
3. $\Delta$ elapse since all Validators exited the previous view
yes

The evolution of this rule has been somewhat peculiar, alternating importance and unimportance of the third phase of HotStuff.

Enter 3-phases.

First, HotStuff implemented the Handover rule by adding one phase to the core VPCBC primitive. A 3-phase VPCBC guarantees that if any Honest Validator holds a lock in a view, then $2f+1$ Validators hold a QC (a “key”) on it. In HotStuff, the next leader simply waits to receive the highest QC from $2f+1$ Validators. This suffices for guaranteeing Handover completion: if a lock is held by any Honest Validator, it will learn the QC for it.

This simple handover procedure can be implemented with linear communication complexity. It was introduced in HotStuff, the first communication-optimal Byzantine Agreement solution for Partial Synchrony.

Enter asynchrony.

VABA by Abraham, Malkhi and Spiegelman harnesses the three-phase HotStuff solution to reach the first communication-optimal Validated Byzantine Agreement protocol in Asynchronous Setting. The core idea is that handover with 3-phase VPCBC does not rely on timers, hence it can be utilized in asynchrony.

This works as follows. VABA borrows the template of Byzantine Agreement solutions in Partial Synchrony and operates in a view-by-view manner, each view driven by a leader. VABA implements each view regime exactly the same as HotStuff: the leader of each view utilizes 3-phase VPCBC, and the next leader only waits to receive the highest QC from $2f+1$ Validators, without any timers. The twist is that VABA operates $n$ view instances in parallel, in order to randomize (in retrospect) leader election and thwart an asynchronous adversary. The linearity of the handover regime of in each instances is crucial, acheiving optimal $n \times O(n)$ communication complexity.

Revisiting three phases.

Remarkably, revisiting the third phase in HotStuff, HotStuff-2 by Malkhi and Nayak demonstrates that it is completely unnecessary, and that under Partial Synchrony, the Handover rule can be implemented with 2-phase VPCBC and have linear Communication.

In a nutshell, what they observed is:

It turns out that handover from a good leader is easier than a bad leader!

Indeed, there are two possibilities.

  • Case 1 is that the preceding leader is non-faulty, and therefore it succeeds in driving (2-phase) VPCBC to completion. In this case, the next leader learns a lock from $2f+1$ Validators that intersect the commit quorum.

  • Case 2 is that the preceding leader is faulty and in particular, the VPCBC in the preceding view did not succeed. In this case, the next leader can wait the maximal network delay to learn the highest lock held by any Honest Validator.

Therefore, a leader can implement the Handover rule and discover if it is in case 1 or case 2 as follows.

In each view in HotStuff-2, leaders employ 2-phase VPCBC to make proposals. When a leader starts, it can guarantee the Handover condition by collecting from Validators their highest known $QC$. The next leader waits to learn the highest lock by satisfying one of three conditions:

  1. The leader obtains a QC from the immediately preceding view.

    Note that by definition, this QC is the highest possible lock.

  2. The leader obtains a QC from all the Validators.

    Here, the leader trivially learns the highest lock held by any Honest Validator.

  3. The leader waits until the maximal delay $\Delta$ has elapsed from when all the Validators have exited the previous view.

    Usually, this occurs within $O(\Delta)$ from when the leader itself entered the view. The View Synchronization sub-protocol can notify the leader when this is guaranteed.

Note that implementing the above incurs (still) only linear communication. In summary, HotStuff-2 reaches agreement under the same conditions as HotStuff, while reducing latency by one third.

On Responsiveness and the importance of the third phase.

Importantly, with 2-phase VPCBC, it doesn’t always suffice for the leader to collect QCs from a quorum of $2f+1$ validators like (the original) HotStuff.

In the leader did only that, liveness may be lost. Consider a view with no commit decision, but with one (or more) Honest Validator holding a lock. A leader’s collect-quorum of $2f+1$ Validators doesn’t necessarily intersect with the Validator(s) that holds a lock from the previous view. If the leader proposes a transaction that does not extend the lock, an Honest Validator holding the lock may reject the leader proposal. For this reason, the Handover rule requires the leader to learn the highest lock.

It might seems that implementing the Handover rule with 3 phases is more “responsive” than implementing it with 2 phases. However, expiring the previous view inevitably incurs waiting $O(\Delta)$ in Partial Synchrony. Hence, when the previous view is unsuccessful, there is no loss in waiting.

It should now be clear why the third phase remains important for VABA. Responsiveness is the key: as explained in a previous post on VABA, protocols in the Asynchronous setting must not rely on timers, because there is no known bound on message transmission delays. Unfortunately, without timers the linear implementation of the Handover rule above does not woek with 2 phases.