Jonhoo and Helsing AI's DSON: A delta-state CRDT for resilient P2P communication

3 months ago 2

Helsing

Building applications that can reliably synchronise data in peer-to-peer, multi-hop edge networks is a significant challenge. These environments are often characterised by high latency, low bandwidth, and intermittent, opportunistic connectivity. Today, we are excited to open-source dson, a Rust crate that provides a robust foundation for exactly these scenarios.

dson is a low-level, space-efficient (both in memory and on the wire), and composable implementation of a 𝛿-state (delta-state) Conflict-Free Replicated Datatype (CRDT) for JSON-like data structures. It is based on the principles from the research paper DSON: JSON CRDT Using Delta-Mutations For Document Stores and began as a Rust port of the original authors’ JavaScript implementation.

Zoom image will be displayed

CRDTs offer a principled approach to optimistic replication. At their core, they are a tool for building AP (Availability + Partition Tolerance) systems, as defined by the CAP Theorem. In environments where network partitions are a given, CRDTs allow us to choose availability over strong consistency. This means applications can remain fully operational for both reads and writes, even when disconnected.

Think of Alice and Bob working with their collaborative shopping list in a supermarket with spotty reception. Alice checks off “milk,” Bob adds “ice,” and both phones accept the edits offline. When signal returns, the devices sync and produce a single, and consistent list.

Synchronisation is deferred until connectivity is restored, without risking data loss or corruption, allowing applications to always make decisions based on the latest available data. This works as CRDTs can support Strong Eventual Consistency, which is the guarantee that nodes will be in the same state, when they have received the same set of updates.

This guarantee of eventual state convergence is a powerful primitive for building collaborative and decentralised software. However, not all CRDTs are created equal; they exist on a spectrum of trade-offs.

Operation-based (or “op-based” or commutative) CRDTs work by exchanging individual operations (e.g., “insert ‘a’ at position 5”). This makes for small, targeted messages. However, this efficiency comes at the cost of strong network assumptions. Op-based systems typically require a reliable, causally-ordered broadcast mechanism to ensure correctness. This model is fragile in the face of long network partitions or dynamic membership, as the messaging layer must buffer undelivered operations, leading to potentially unbounded memory usage. Additionally, it’s neither possible to filter the data or to partially relay it according to dynamic rules, nor favor the most recent updates, as all intermediate operations must be transmitted as well to be interpretable by the recipient.

State-based CRDTs take the opposite approach: replicas exchange their entire state together with causal information to ensure causal consistency. This model is extremely robust and well-suited for open systems with unreliable networks, as it naturally tolerates message loss, duplication, and reordering. The downside is obvious: for any non-trivial data structure, the bandwidth cost is prohibitively high.

Delta-state (or 𝛿-state) CRDTs offer a pragmatic middle ground. Replicas exchange “deltas” — small, self-contained updates that represent a change. These deltas can be merged (or “joined”) just like full states, inheriting the robustness of the state-based approach. This combination of small message sizes and network resilience is the key to making CRDTs practical for constrained and unreliable environments.

However, there is no free lunch: while the delta-based approach simplifies the CRDT’s internal logic, it shifts complexity to the peer-to-peer gossip protocol. The network protocol is important to get right to ensure Strong Eventual Consistency.

DSON is a delta-state CRDT designed specifically for JSON-like documents. It provides two core collection types: an Observed-Remove Map and an Observed-Remove Array. And one type to represent primitive values: A Multi-Value Register.

DSON’s collections use Observed-Remove semantics. This means an element can only be removed if the replica performing the removal has previously observed its addition. Consider a collaborative shopping list where Alice updates “bananas” to “blueberries” at the same time Bob removes “bananas”. With OR-semantics, the update wins. Alice’s update implies the continued existence of the item, so her change takes precedence over Bob’s concurrent removal. This behavior is often the most intuitive and desirable for preserving user intent.

A key advantage of this model is the elimination of “tombstones”. In many other CRDTs, when an item is deleted, a “tombstone” marker is left behind to signify its removal. These tombstones are never garbage-collected and can cause unbounded metadata growth in long-lived documents, as the document size is proportional to the entire history of edits, not the live data.

DSON avoids this by being a causal CRDT. It tracks the history of operations using unique identifiers called “dots” (which for dson is a tuple out of a replica identifier and a sequence number). A deletion is not a separate marker but simply the absence of a dot from the set of live operations. When replicas synchronise, they compare their causal contexts to determine which elements have been removed, eliminating the need for tombstones. This design ensures that the metadata overhead remains proportional to the size of the live data.

The dson crate is a low-level, unopinionated library that provides the core CRDT primitives. It does neither include a networking layer nor a gossip protocol, which is required to ensure eventual consistency.

  • OrMap<K, V>: An observed-remove map.
  • OrArray<V>: An observed-remove array.
  • MvReg<T>: A multi-value register that handles conflicts for primitive
    values by preserving all concurrently written versions.
  • CausalDotStore<T>: The top-level container that wraps a CRDT (one of the above) and its associated CausalContext.

A core design concept in dson is the ExtensionType trait. This trait allows developers to define their own domain-specific CRDTs that can be composed within DSON’s OrMap and OrArray. This feature enables the creation of specialised data structures — such as counters, rich-text objects, or other domain-specific types — that go beyond the standard JSON primitives without requiring a fork.

Here is a minimal example demonstrating a concurrent write, delta generation, and merging:

use dson::{
crdts::{mvreg::MvRegValue, snapshot::ToValue, NoExtensionTypes},
CausalDotStore, Identifier, OrMap,
};

// 1. Setup two replicas, Alice and Bob
let alice_id = Identifier::new(0, 0);
let mut alice_store = CausalDotStore::<OrMap<String>>::default();

let bob_id = Identifier::new(1, 0);
let mut bob_store = CausalDotStore::<OrMap<String>>::default();

// 2. Alice creates an initial value
let key = "shared_key".to_string();
let delta_from_alice = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
|reg, ctx, id| reg.write("initial".to_string().into(), ctx, id),
key.clone(),
)(&alice_store.store, &alice_store.context, alice_id);

alice_store.join_or_replace_with(delta_from_alice.store.clone(), &delta_from_alice.context);

// 3. Bob receives and applies the initial change
bob_store.join_or_replace_with(delta_from_alice.store, &delta_from_alice.context);
assert_eq!(alice_store, bob_store);

// 4. Alice and Bob make concurrent, un-synced edits
let delta_alice_edit = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
|reg, ctx, id| reg.write("from Alice".to_string().into(), ctx, id),
key.clone(),
)(&alice_store.store, &alice_store.context, alice_id);
alice_store.join_or_replace_with(delta_alice_edit.store.clone(), &delta_alice_edit.context);

let delta_bob_edit = dson::api::map::apply_to_register::<_, NoExtensionTypes, _>(
|reg, ctx, id| reg.write("from Bob".to_string().into(), ctx, id),
key.clone(),
)(&bob_store.store, &bob_store.context, bob_id);
bob_store.join_or_replace_with(delta_bob_edit.store.clone(), &delta_bob_edit.context);

// 5. The replicas exchange their deltas
alice_store.join_or_replace_with(delta_bob_edit.store, &delta_bob_edit.context);
bob_store.join_or_replace_with(delta_alice_edit.store, &delta_alice_edit.context);

// Both stores have converged to the same state
assert_eq!(alice_store, bob_store);

// The MvReg deterministically exposes the conflict
let register_value = alice_store.store.get(&key).unwrap();
let values: Vec<_> = register_value.reg.values().into_iter().collect();

assert_eq!(values.len(), 2);
assert!(values.contains(&&MvRegValue::String("from Alice".to_string())));
assert!(values.contains(&&MvRegValue::String("from Bob".to_string())));

As this example shows, dson does not hide conflicts but exposes them deterministically, allowing your application to define its own resolution logic.

Dson is an efficient and powerful toolkit for developers building systems that require robust and resilient communication in P2P, multi-hop, and other constrained network environments.

Dson was built to be flexible and not too tailored to our use-case: we actively invite others to collaborate! We will continue to invest into dson — the crate (eg, extending the ExtensionType trait to also support custom CRDT types instead of a MvReg), and DSON — the algorithm (arbitrary delta decomposition).

We encourage you to explore the crate, provide feedback, and contribute.

Author: Oliver W

Read Entire Article