Common Models in Consensus
The same key problem models and primitives appear in literally thousands of Consensus whitepapers, each of which spends precious space and effort (re)defining them. I often wonder: could we, as a community, reach consensus (pun intended) on common reference points (e.g., a textbook or a paper)?
Below I list some core model terms, shown Capitalized. For each of them, I give a brief definition, not intended to serve as a reference. It would be great if our community had common reference points, so we could simply write “this paper is in the XX model of [@ref]”. (DM me with suggestions.)
The Consensus Problem
Consensus is a building block in solving the State-Machine Replication (“SMR”) problem. In SMR, Clients input transactions to a system of Validators. The Validators repeatedly solve Byzantine Agreement (“BA”) to sequence transactions. We equate a history prefix of transactions with the state resulting from deterministically applying them, and ignore the application of transactions (execution) to the replicated state. To guarantee that Validators output a useful sequence of decisions, Cachin et al. [@Cachin2001SecureAE] stipulated that transactions must satisfy a globally verifiable external validity condition and named this BA variant Validated Byzantine Agreement (“Validated BA”).
::: Validated Byzantine Agreement
In the Validated BA problem with validity predicate $V()$, Validators receive transactions from Clients and output a decision, such that the following must hold:
- Agreement: no two non-faulty Validators decide on different values.
- External Validity: if a non-faulty Validator outputs a value $d$, then $V(d)$ must be satisfied.
- Termination: all non-faulty parties must eventually decide on a value and terminate. :::
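To make External Validity concrete, here is a minimal sketch of a validity predicate $V()$, assuming (purely for illustration) that validity means “every transaction carries a correct Client signature”; all names in the sketch are hypothetical and do not come from any particular protocol or library.

```python
# Toy external validity predicate V(); hypothetical names, illustrative only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Transaction:
    client_id: str
    payload: bytes
    signature: bytes  # the issuing Client's signature over the payload

def make_validity_predicate(
    verify_sig: Callable[[str, bytes, bytes], bool]
) -> Callable[[List[Transaction]], bool]:
    """Build V(): here, 'every transaction is correctly signed by its Client'."""
    def V(decision: List[Transaction]) -> bool:
        return all(verify_sig(tx.client_id, tx.payload, tx.signature)
                   for tx in decision)
    return V

# A Validator only outputs a decision d with V(d) = true, so any party can
# re-check External Validity locally, without trusting the deciding Validators.
```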
Environment
::: Participation
In the Known Participation model, a consensus system has two groups of participants: Clients and $n$ Validators (also referred to as nodes or simply as parties), $n-f$ of which are non-faulty. Solving Consensus is challenging against various adversities, such as Validator failures and communication delays, listed below.
Faulty Validators may be completely corrupted (“Byzantine”) or may only fail by crashing. In the Byzantine model, non-faulty Validators are referred to as Honest. :::
Communication
::: Partial Synchrony
The most commonly adopted communication model in practice is Partial Synchrony.
In this model, there is a global stabilization time (“GST”), unknown to the participants in the system, after which every message sent between two Honest parties is received within a known delay $\Delta$. Under Partial Synchrony, Agreement and External Validity must hold at all times, but Termination (Liveness) is guaranteed only after GST. The intuitive interpretation of GST is that systems are timely in steady state, but may suffer unexpected transient delays that cause them to suspend progress and resume once the network repairs.
In this model, we assume $n=3f+1$ Validators, at most $f$ of which may suffer arbitrary Byzantine failures; this resilience is shown optimal by a lower bound of Dwork, Lynch and Stockmeyer [@Dwork1988ConsensusIT].
Performance metrics are set to zero at GST and reflect execution time after GST, since no decision is guaranteed until GST. :::
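As a toy illustration of the GST semantics, here is a sketch under one common formalization of the model (a message sent at time $t$ arrives by $\max(t, GST) + \Delta$); the constants and names are made up for illustration.

```python
# Toy model of message delivery under Partial Synchrony; purely illustrative.
GST = 10.0    # global stabilization time (unknown to the protocol itself)
DELTA = 0.5   # known delay bound, guaranteed to hold only after GST

def delivery_deadline(send_time: float) -> float:
    """Latest arrival time of a message between two Honest parties.

    One common formalization: a message sent at time t is delivered by
    max(t, GST) + DELTA; before GST, delays are otherwise unbounded.
    """
    return max(send_time, GST) + DELTA
```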
::: Asynchronous setting.
Another important communication model is the Asynchronous model, in some sense the holy grail: solving BA against pure asynchrony. In this model, no bound exists on message transmission delays, so long as messages among Honest Validators eventually arrive. Deterministic solutions are impossible in this setting (the classic FLP impossibility result); mitigating this impossibility, randomized solutions provide upper bounds on the length of runs in expectation. Performance measurements are taken in expectation over the randomized choices made inside the system. :::
::: Synchronous setting.
An opposite assumption underlies the Synchronous model: all messages among Honest parties are delivered within the known delay $\Delta$. Bitcoin uses the Synchronous model, expecting a decision only every 10 minutes and, consequently, providing a low decision rate of single-digit transaction sequencing per second. When a higher transaction rate is required, the Synchronous model may be less practical: relying on the network to transfer messages within a small, bounded delay risks compromising the safety of services. :::
::: Authenticated Channels setting.
For various reasons, it is worthwhile to investigate solutions that do not rely on public-key cryptography. The benefits are faster communication (no signatures), a simpler initial setup, and potentially quantum resilience in the future. In the standard Authenticated Channels model, every pair of parties can communicate over direct channels whose endpoints are verified, but they cannot prove to a third party where a message originated. :::
::: Message authentication.
Except for the Authenticated Channels setting, we can assume all messages are signed when needed by the sending party using a Public-Key Signature Scheme. A message $m$ signed by a private key $sk_i$ belonging to party $i$ is denoted $sign(sk_i, m)$. Anyone who knows the public key $pk_i$ corresponding to $sk_i$ can verify the signature's validity: $true \leftarrow Verify(pk_i, sign(sk_i, m))$.
To collectively sign messages, we can employ a Threshold Signing Scheme. A message $m$ is signed with a secret signing key $stk$ that is shared among the $n$ parties, each holding a key share $stk_i$. Any $k$ parties can sign the message without reconstructing $stk$: each party, upon receiving a message $m$, computes its own signature share by invoking $\sigma^i \leftarrow TE.ShareSign(stk_i, m)$. (By a slight abuse of notation, $TE.ShareSign(i, m)$ is sometimes used instead.) Any set of $k$ unique signature shares suffices to construct a signature $sign(stk, m) \leftarrow TE.Combine(m, \sigma^1, …, \sigma^k)$. Anyone who knows the public key $ptk$ corresponding to $stk$ can verify the signature's validity: $true \leftarrow TE.Verify(m, sign(stk, m))$. :::
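For readers who prefer code, here is a minimal interface sketch mirroring the notation above; the type and method names are hypothetical and do not refer to a concrete cryptographic library.

```python
# Interface sketch of the two signing schemes; hypothetical names only.
from typing import List, Protocol

class SignatureScheme(Protocol):
    def sign(self, sk_i: bytes, m: bytes) -> bytes:
        """sign(sk_i, m): party i's signature on message m."""
        ...

    def verify(self, pk_i: bytes, m: bytes, sig: bytes) -> bool:
        """Verify(pk_i, sign(sk_i, m)): anyone holding pk_i can check it."""
        ...

class ThresholdSigningScheme(Protocol):
    k: int  # any k of the n share-holders suffice to produce a signature

    def share_sign(self, stk_i: bytes, m: bytes) -> bytes:
        """TE.ShareSign(stk_i, m): party i's signature share on m."""
        ...

    def combine(self, m: bytes, shares: List[bytes]) -> bytes:
        """TE.Combine(m, sigma^1, ..., sigma^k): a full signature under stk,
        computed from any k distinct shares without reconstructing stk."""
        ...

    def verify(self, ptk: bytes, m: bytes, sig: bytes) -> bool:
        """TE.Verify(m, sign(stk, m)) checked against the public key ptk."""
        ...
```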
Performance Measures
::: Communication complexity refers to the number of messages, each carrying $O(1)$ words, exchanged among Validators until a decision is reached. We assume that each word is large enough to carry any counter, command, or identity in the protocol (e.g., words are 256 bits). :::
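For example (a purely hypothetical round): a single all-to-all exchange in which each of the $n$ Validators sends one $O(1)$-word message to every other Validator costs $n(n-1) = O(n^2)$ messages, and hence $O(n^2)$ words of communication.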
::: Latency is measured as the number of hops in the longest chain of messages or timer expirations that follow one another until a decision is reached. For each hop, this measure can combine a pessimistic bound $\Delta$ on communication delays with an optimistic bound $\delta$ on actual communication delays. :::
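For example, a hypothetical protocol whose longest decision path consists of three message hops followed by one timer expiration of length $\Delta$ has latency $3\delta + \Delta$ under the optimistic bound and $4\Delta$ under the pessimistic bound.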