Conflict-Free Replicated Data Types

CRDTs address a growing concern in making distributed systems reliable. Currently popular models assume perfect connectivity between distributed nodes, which cannot always be guaranteed. Each application could address this concern individually, but CRDTs represent an effort to “standardize” such approaches, at least in the sense that application developers no longer need to reinvent the wheel.

CRDTs in particular assume optimistic replication. In this model, each replica can be modified independently of all other replicas, and replicas are merged when their nodes can connect and synchronize with each other. CRDTs primarily handle conflict resolution during such merges.
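To make the merge step concrete, here is a minimal sketch of a grow-only counter (G-Counter), a classic example from the CRDT literature. It is not taken from Vessel or wyrd. Each replica increments only its own slot, and merging takes the element-wise maximum. Because that merge is commutative, associative and idempotent, replicas converge no matter in which order, or how often, they synchronize.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: each replica only ever increments its own slot."""
    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Local updates touch only this replica's entry, so they never conflict.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replicas' slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: commutative, associative and idempotent.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently while disconnected, then synchronize.
a = GCounter("a")
b = GCounter("b")
a.increment()
a.increment()
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

Running the usage lines prints `5 5`: both replicas arrive at the same state without any coordination beyond exchanging their counters.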

For more details on CRDTs, you can read the article “Conflict-free Replicated Data Types” by Nuno Preguiça, Carlos Baquero, and Marc Shapiro (2018), or the older “A comprehensive study of Convergent and Commutative Replicated Data Types” by Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski (2011).

Relationship to Vessel

Vessel is a container format whose primary feature is splitting large content into smaller extents, which can then be synchronized between nodes individually. Because this chunking interacts with encryption, the format also supports encrypting each extent individually. To do so, it needs to provide some notion of an extent's authorship.
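The following is a minimal sketch of the idea of chunking content into extents that each record an author and a predecessor. The `Extent` record, its fields, the SHA-256-based `extent_id` and the `chunk` helper are all hypothetical illustrations, not Vessel's actual format or API. Encryption is omitted; in a real system the payload would presumably be encrypted under a key tied to the recorded author.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Extent:
    # Hypothetical fields for illustration only; not Vessel's layout.
    author: str            # who wrote (and would encrypt) this extent
    parent: Optional[str]  # ID of the predecessor extent, None for the first
    payload: bytes         # one chunk of the original content (plaintext here)

    @property
    def extent_id(self) -> str:
        # Hypothetical content-derived ID over author, parent and payload.
        h = hashlib.sha256()
        h.update(self.author.encode())
        h.update(b"\0")
        h.update((self.parent or "").encode())
        h.update(b"\0")
        h.update(self.payload)
        return h.hexdigest()


def chunk(content: bytes, author: str, size: int,
          parent: Optional[str] = None) -> List[Extent]:
    """Split content into fixed-size extents, each linked to its predecessor."""
    extents = []
    for offset in range(0, len(content), size):
        ext = Extent(author, parent, content[offset:offset + size])
        extents.append(ext)
        parent = ext.extent_id  # the next extent points back at this one
    return extents


exts = chunk(b"hello world", "alice", 4)
```

Here `exts` holds three extents (`b"hell"`, `b"o wo"`, `b"rld"`), each pointing at its predecessor's ID, so a receiving node can reassemble them in order and knows who authored each one.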

Finally, to support multiple users authoring in parallel, it needs to deconflict at the extent level. Vessel's approach is to treat every author's extent as equally valuable; more precisely, it is agnostic to an extent's meaning or value. As a result, when multiple authors base a contribution on the same prior data, branching can occur. Vessel permits this, but provides a simple algorithm that keeps extent sequences largely linear.

Because Vessel always knows an extent’s predecessor, it can by itself provide a semblance of order in data streams. But when multiple extents are based on the same parent (authored by different authors in parallel), Vessel alone cannot establish an authoritative order. This is where wyrd’s CRDT, embedded within the Vessel extents, provides an answer.
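To illustrate why branching calls for an agreed-upon rule, here is a minimal sketch of a deterministic topological ordering with a tiebreak. It is not wyrd's CRDT, whose algorithm is not described here. Extents are given as a hypothetical mapping from ID to `(parent ID, author)`, and siblings that share a parent are ordered by the arbitrary rule `(author, ID)`. Every replica that applies the same rule to the same set of extents derives the same linear order, regardless of the order in which the extents arrived.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def linearize(extents: Dict[str, Tuple[Optional[str], str]]) -> List[str]:
    """Order extents so that every parent precedes its children.

    `extents` maps a (hypothetical) extent ID to (parent ID, author).
    Siblings are ordered by (author, ID), an arbitrary but deterministic
    tiebreak. Extents whose parent is unknown are not emitted.
    """
    children = defaultdict(list)
    roots = []
    for eid, (parent, author) in extents.items():
        if parent is None:
            roots.append((author, eid))
        else:
            children[parent].append((author, eid))

    order = []
    # Depth-first traversal with an explicit stack; pushing in reverse
    # sorted order makes the smallest (author, ID) pop first.
    stack = sorted(roots, reverse=True)
    while stack:
        _, eid = stack.pop()
        order.append(eid)
        stack.extend(sorted(children[eid], reverse=True))
    return order


# e2 and e3 share the parent e1: a branch created by parallel authors.
extents = {
    "e1": (None, "alice"),
    "e2": ("e1", "bob"),
    "e3": ("e1", "alice"),
    "e4": ("e2", "bob"),
}
print(linearize(extents))
```

The usage lines print `['e1', 'e3', 'e2', 'e4']`. Such a tiebreak is purely mechanical and ignores what the extents mean, which is consistent with Vessel's stance of being agnostic to an extent's value.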