Short bio

Dr. Malkhi’s work over two decades has brought scientific innovation to fruition in several leading industrial settings. Her research spans broad aspects of the reliability and security of distributed systems, with a recent focus on blockchains and advances in financial technology. Her work has resulted in over 200 publications and has had a strong impact on computing technology.

A select sample of technologies she participated in creating includes:

Malkhi has been a Professor in the Department of Computer Science at UCSB since 2024 and a Distinguished Scientist at Chainlink Labs since 2022. Formerly, she was the CTO of Diem Association and Lead Researcher at Novi Financial from 2019 to 2022. In 2014, she co-founded VMware Research, where she was a Principal Researcher until June 2019. Prior to that, Malkhi was a partner principal researcher at Microsoft Research from 2004 to 2014, a tenured Associate Professor at the Hebrew University of Jerusalem, and a senior researcher at AT&T Labs.

Selected academic roles and distinctions:

Technology Impact

I enjoy working on questions that emerge from the engineering of real systems. I have spent the past two decades bringing scientific innovation to fruition in several leading industrial settings. Below, I tell the stories of four technologies I participated in creating.

HotStuff at VMware 2016 and DiemBFT at Diem (Libra) 2019

The blockchain world has renewed interest in scaling and hardening solutions to the long-standing problem of asynchronous Byzantine Fault Tolerant (BFT) consensus.

In 2016, when designing the infrastructure for VMware’s blockchain project, we observed that all BFT solutions contain quadratic voting steps. Why is this so bad? When Byzantine consensus protocols were originally conceived, a typical target system size was n=4 or n=7, tolerating one or two faults. But scaling BFT consensus to n=2000 means that even on a “good day,” when communication is timely and only a handful of failures occur, quadratic steps require 4,000,000 messages. A cascade of failures might bring the communication complexity to a whopping 8,000,000,000 transmissions for a single consensus decision. No matter how good the engineering and how we tweak and batch the system, these theoretical measures are a roadblock for scalability.
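As a rough illustration of the arithmetic above, the sketch below tabulates message counts for the system sizes mentioned. The function names are hypothetical, and reading the 8,000,000,000 figure as n cubed is an assumption inferred from the numbers in the text (2000^3), not a statement about any particular protocol. The fault bound uses the standard BFT requirement n >= 3f + 1.

```python
def max_faults(n: int) -> int:
    """Largest f with n >= 3f + 1, the standard BFT resilience bound."""
    return (n - 1) // 3


def message_counts(n: int) -> dict:
    """Order-of-magnitude message counts for one decision among n replicas.

    Assumption: the "cascade of failures" case is modeled as n**3,
    which matches the 8,000,000,000 figure quoted for n = 2000.
    """
    return {
        "linear": n,          # leader-to-all or all-to-leader step
        "quadratic": n ** 2,  # all-to-all voting step
        "cubic": n ** 3,      # assumed worst case under cascading failures
    }


if __name__ == "__main__":
    for n in (4, 7, 2000):
        c = message_counts(n)
        print(
            f"n={n:>4} f={max_faults(n):>3} "
            f"linear={c['linear']:,} quadratic={c['quadratic']:,} cubic={c['cubic']:,}"
        )
```

The gap between the linear and quadratic columns at n=2000 is the motivation for a protocol whose steps, including the view change, are linear.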

Around that time, tremendous innovation was occurring outside academic circles, at blockchain startups. Two of their protocols caught our attention: Tendermint and Casper. These protocols dramatically simplified the view-change mechanism by introducing a synchronous delay when a leader starts. I observed that by adding one more phase to Tendermint, we could keep the advantage of simplicity while avoiding the delay it introduced. The result is HotStuff: BFT Consensus in the Lens of Blockchain, the first responsive BFT solution with a linear view-change. It is named after a cartoon character from the same family as Casper.

Beyond improving communication complexity, HotStuff embodies a minimalist algorithmic framework that bridges classical BFT solutions and the blockchain world; the entire protocol is captured in less than half a page of pseudo-code. HotStuff became popular in the blockchain developer community not only due to linearity, but (and perhaps mostly) due to its simplicity and developer-friendly design. Diem (Libra) adopted it to drive its blockchain infrastructure, as did Flow, Celo, and Cypherium, among those we know of.

Flexible Paxos at VMware 2016

In the summer of 2016, I hosted a research intern named Heidi Howard from Cambridge, UK. I told her about the CorfuDB protocol and encouraged her to think about the performance benefit of separating the sequencer role from the rest of the system. The result was a stunning revelation we named Flexible Paxos: Quorum Intersection Revisited:

Each of the phases of Paxos may use non-intersecting quorums. Only quorums from different phases are required to intersect. Majority quorums are not necessary as intersection is required only across phases.

Everyone in the field of distributed systems knows that quorums in Paxos must intersect, so what gives? What Heidi observed is that Paxos, which lies at the foundation of many production systems, is conservative. Within each of the phases of Paxos, it is safe to use disjoint quorums, and majority quorums are not necessary. Since the second phase of Paxos (replication) is far more common than the first phase (leader election), we can use Flexible Paxos to reduce the size of the commonly used second-phase quorums. By no longer requiring replication quorums to intersect, we have removed an important limit on scalability. Through smart quorum construction and pragmatic system design, we enabled a new breed of scalable, resilient, and performant consensus algorithms. The algorithmic core of LogDevice, a production scale-out messaging bus at Facebook, is based on it, as is the more flexible Paxos of YouTube’s distributed MySQL backbone.
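The intersection requirement can be checked by brute force on a small cluster. The sketch below is a minimal illustration, assuming simple size-based (counting) quorums; the helper names are hypothetical and this is not code from any production system. For counting quorums over n nodes, every phase-1 quorum of size q1 meets every phase-2 quorum of size q2 exactly when q1 + q2 > n.

```python
from itertools import combinations


def quorums(n: int, size: int) -> list:
    """All subsets of the given size drawn from nodes 0..n-1."""
    return [frozenset(c) for c in combinations(range(n), size)]


def all_pairs_intersect(qs_a: list, qs_b: list) -> bool:
    """True if every quorum in qs_a shares a node with every quorum in qs_b."""
    return all(a & b for a in qs_a for b in qs_b)


def sizes_are_safe(n: int, q1: int, q2: int) -> bool:
    """Counting-quorum condition: phase-1 and phase-2 quorums must overlap."""
    return q1 + q2 > n


if __name__ == "__main__":
    n, q1, q2 = 6, 5, 2  # large election quorum, small replication quorum
    phase1, phase2 = quorums(n, q1), quorums(n, q2)

    # Quorums from different phases always intersect.
    print("cross-phase intersect:", all_pairs_intersect(phase1, phase2))
    # Replication quorums need not intersect one another.
    print("phase-2 self-intersect:", all_pairs_intersect(phase2, phase2))
    print("size condition holds:", sizes_are_safe(n, q1, q2))
```

With six nodes, replication needs acknowledgments from only two of them, at the cost of a larger quorum of five for the rarer leader election.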

CorfuDB at Microsoft 2012 and at VMware 2014

In 2012, Phil Bernstein approached me at Microsoft Research with the following observation. RAM has grown cheap and large enough to hold a complete database index in memory. Therefore, one can build a fully replicated transaction processing engine by storing a database index completely in memory and persisting index modifications to a shared commit-log. His team prototyped an in-memory index called Hyder. The key enabler for this vision would be a reliable, high-throughput distributed log, which Phil wanted to stripe across an array of SSDs. Unfortunately (yet fortunately for me), the initial design of his distributed commit-log was flawed. While fixing the design, I extracted a foundational insight that motivated me to establish and lead the CorfuDB project.

CorfuDB is a database-less database built around a global, reliable, high-throughput distributed commit-log. The CorfuDB log serves as the source of ground truth around which one builds distributed control-planes for large clusters. The key paradigm underlying CorfuDB is the reliable log that operates at high throughput. This was the foundational insight I took from Hyder. I built the first CorfuDB PoC at Microsoft under an open-source license, and later drove it to production at VMware. At VMware, CorfuDB serves as the distributed control-plane for NSX-T, a leading SDN product with a market volume of over $1B. At Facebook, CorfuDB was re-engineered as Delos, a control plane underlying a dynamic cluster storage backend system.
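The log-as-ground-truth idea can be sketched in a few lines. The toy below is a single-process illustration under loud assumptions: the class and method names are hypothetical, it is not the CorfuDB API, and it has no networking, striping, or fault tolerance. It only shows how replicas that replay one totally ordered log converge on the same in-memory index.

```python
class SharedLog:
    """Toy stand-in for a global, totally ordered commit-log."""

    def __init__(self):
        self._entries = []

    def append(self, entry) -> int:
        """Append an entry and return its position in the log."""
        self._entries.append(entry)
        return len(self._entries) - 1

    def read_from(self, position: int) -> list:
        """Return all entries at or after the given position."""
        return self._entries[position:]


class Replica:
    """Keeps an in-memory index that is rebuilt purely from the log."""

    def __init__(self, log: SharedLog):
        self._log = log
        self._applied = 0  # number of log entries already applied
        self.index = {}

    def put(self, key, value) -> None:
        # Writes go to the log only; local state changes on replay.
        self._log.append(("put", key, value))

    def sync(self) -> None:
        # Replay any entries this replica has not applied yet.
        for op, key, value in self._log.read_from(self._applied):
            if op == "put":
                self.index[key] = value
            self._applied += 1

    def get(self, key):
        self.sync()
        return self.index.get(key)


if __name__ == "__main__":
    log = SharedLog()
    a, b = Replica(log), Replica(log)
    a.put("x", 1)
    b.put("x", 2)
    b.put("y", 3)
    # Both replicas observe the same order, hence the same state.
    print(a.get("x"), b.get("x"), a.get("y"))
```

Because every replica applies the same entries in the same order, agreement on state reduces to agreement on the log.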

You might wonder what happened to Phil’s in-memory fully replicated DB. Several years later, it became the backbone of the SQL Azure cloud database.

Fairplay at the Hebrew University of Jerusalem 2004

In 2004, Noam Nisan and I asked ourselves whether cryptographic primitives that were considered completely impractical were actually becoming practical. With my PhD student Yaron Sella, we implemented the MPC protocol, while Noam supervised his graduate students in implementing a language that compiles into a binary circuit. The first fully implemented Fairplay MPC platform was up and running shortly after. By 2008, the millionaires’ problem, mini auctions, and other problems could be solved over an interconnect in seconds. Since then, the Fairplay source code has been downloaded by hundreds of academic groups, and over the past decade it has sparked a wave of crypto-engineering projects that bring crypto theory into practice, including heavy crypto methods like oblivious RAM, ZK proofs, and PCP.

Academic descendants