Twins: Making BFT Systems Robust

Motivation: Capturing Byzantine Attacks

Designing distributed protocols with Byzantine Fault Tolerance (BFT) is not easy.

BFT models a distributed system in which faulty nodes can behave in an arbitrary, completely unconstrained manner. However, Byzantine nodes are most likely to succeed in subverting a BFT protocol when their deviating behavior is undetectable. That is, the most potent Byzantine attacks come from within: the attacker sends messages that have the correct format and their interaction with other nodes appear to follow the protocol.

Indeed, in several known attacks (e.g., paper1, paper2) on BFT protocols, Byzantine nodes deviated from correct behavior in simple ways: they equivocate (by sending different messages to different recipients), they omit to send messages (to some recipients), they delete an internal state variable, etc.

Enter Twins!

Leveraging this insight, we developed Twins, a “white-glove” approach for BFT testing.

Twins emulates “bad” behavior by running two (or generally k) instances of a node with the same identity. Each of the two instances (or Twins) runs unmodified, correct code.

Twins systematically generates interesting byzantine behaviors which can be efficiently enumerated and tested; it forgoes uninteresting deviations from the protocol such as sending ill-formatted messages or sending unjustified messages, which honest recipients can easily reject.

Most importantly, there is no need to develop attack logic to use Twins testing. Nodes execute unmodified, correct code to attack the system. A Twins tester simply “mis-configures” the system by deploying two instances of the attacking node instead of one. Both instances use the same identity (network address, signing key), hence, to the rest of the system, a “twin” node appears as a single “mis-behaving” node.

Honing on Interesting Byzantine Behaviors

In order to demonstrate how to cover interesting Byzantine behaviors, let’s design a simple service: a K-V store with one entry. We want the K-V store to tolerate f Byzantine servers out of n=3f+1 servers, the same number required to solve consensus.

fourserversKV

To make the task easier, we will have clients sign their updates when requesting to update the stored value. Therefore, a Byzantine server will not be able to make up illegal values, because it cannot forge signatures by clients, but it might respond to queries with stale values, or it might not respond at all.

Store protocol: The protocol for a client to update the stored value will have two-phases: in the first phase, a client queries 2f+1 out of 3f+1 servers for the latest version (say v). In the second phase, the client requests 2f+1 out of 3f+1 servers to update the stored value to version v+1.

fourserversKV-storeNretrieve Let’s see an example. Suppose there are four servers {A, B, C, D} and C is Byzantine. The first client arrives, queries {A, B, C} for the current version which is 0. The client stores the first value with version=1 and receives a confirmation from {A, B, C}. A second client queries {B, C, D} for the current version. Say that C is Byzantine. It could lie and return version 0, hiding version 1. This would be ok, because B will return version 1.

Query protocol: Now we need to take care of querying the service. The problem is that when a client queries the service, it might catch an unfinished update. For example, suppose version 2 was written only to {A, C} so far. A client querying {A, C, D} will get a response containing version 2 from C. But if client queries {B, C, D}, because C is Byzantine and might hide version 2, only version 1 will be returned. Worse, C might switch back and forth, responding to some clients with version 2 and hiding it from others.

fourservierKV-storeNretrieve2

To fix this, we will use two phases in the query protocol: In the first phase, a client queries 2f+1 servers for the current version, in the second it writes back the highest version to 2f+1 servers. In this way, say that a client queries {A, C, D} and returns version 2. Then it writes back version 2 to some 2f+1 servers, say to {A, C, D}, guaranteeing that the update of version 2 has completed.

Remark: As solving K-V store has not been our focus, we glossed over certain details above. Also, as a side note, client signatures are not required to solve the K-V store problem with n=3f+1; see example solution here.

Systematic Generation of Interesting Scenarios

Notice what we did when designing K-V store:

We created simple scenarios, each with a small number of servers and a handful of exchange steps
In each exchange, we selected f servers to exclude: {D} from phases 1 and 2 by the first client, {A} from phase 1 of the second client, and so on.
We allowed Byzantine server C to go back to an old state and respond with an old value.

Putting the above into a structured scenario, we created the following succession of exchanges, partitions and “internal-state amnesia” by Byzantine servers:

Exchange 1: client-1 phase-1 with {A,B,C};
Exchange 2: client-1 phase-2 with {A,B,C};
Exchange 3: client-2 phase-1 with {B,C,D};
Exchange 4 (partial): client-2 phase-2 with {A,C} (and crash);
Exchange 5: client-3 with {B,C,D}, C has its internal state erased.

Twins systematically generates scenarios like the one above and mimics behaviors like amnesia via twin-instances of Byzantine nodes. Importantly, for reasonably small scenarios, it is quite feasible for Twins to effectively enumerate scenarios like those above in order to expose protocol vulnerabilities.

Summary

Designing distributed protocols with Byzantine Fault Tolerance (BFT) is not easy. The community has been grappling with worrisome safety and liveness vulnerabilities for decades, which in some cases, took the community decades to surface.

Twins is a new approach for BFT testing that provides coverage over many (though not all!) Byzantine attacks. Read more about Twins here.

Twitter Facebook LinkedIn

Twins: Making BFT Systems Robust

Motivation: Capturing Byzantine Attacks

Enter Twins!

Honing on Interesting Byzantine Behaviors

Systematic Generation of Interesting Scenarios

Summary

You May Also Enjoy

The Leader Handover Rule in HotStuff Enables Optimal Solutions

From HotStuff to VABA: Using HotStuff under Purely Asynchronous Settings

From Quorum Certified (QC) Broadcast to HotStuff and HotStuff-2

Some Common Models in Consensus Papers (duplicate)