CRDTs #4: Convergence, Determinism, Lower Bounds and Inflation

12 hours ago 3

The CRDT literature sometimes leaves room for mathematical ambiguity. Maybe because the bulk of the work tends to be targeted at systems researchers and developers, like a lot of work on eventual consistency.

The discussion below untangles three subtle but important ideas in CRDT design, which turn out to be interrelated:

Determinism vs Convergence guarantees
Early reads: lower bounds or not?
Algebraic property requirements for update functions.

⚠️

The CRDT guarantee of strong eventual consistency does not guarantee determinism! If you want your CRDTs to be deterministic, or you want to treat CRDT reads as lower bounds, then your update functions must be inflationary.

This is the 4th post in a series I'm doing on CRDTs. This one is a bit more technical and narrow than my previous CRDT posts, but practitioners should still know about the main conclusions. This post also contains a small formal result that seems novel -- Strong Eventual Consistency does not guarantee determinism! I'd be curious to hear about prior work that makes this point.

Definitions First

Before we get into the details, let's be a bit careful with our terminology. I'll assume you've seen CRDTs before, but I'll review the key points while trying to keep this breezy.

Join Semi-Lattices

You likely remember that CRDTs are related to lattices in abstract algebra. Let's review that.

Let’s suppose we have a set of possible states SS, equipped with a join operator ⊔:S×S→S\sqcup: S \times S \to S. The join is required to satisfy three properties:

Idempotent: x⊔x=xx \sqcup x = x
Commutative: x⊔y=y⊔xx \sqcup y = y \sqcup x
Associative: x⊔(y⊔z)=(x⊔y)⊔zx \sqcup (y \sqcup z) = (x \sqcup y) \sqcup z

If you have a set of states and a binary operator satisfying these properties, you have a join-semilattice.

From this operator, we can define a partial order:

x≤yiffx⊔y=yx \leq y \quad \text{iff} \quad x \sqcup y = y

This means that applying join never forgets anything. It only moves “upward” in the order induced by join.

From Join Semi-Lattices to CRDTs

A CRDT is essentially a replicated join semi-lattice. To the core mathematical definition we add one more API, an update function that mutates the state of the CRDT (while of course staying within the state space SS).  As we'll see below, these functions can be the source of nondeterminism depending on their algebraic properties. Also, the CRDT literature tends to refer to the join operator as merge. (That works for me, as it disambiguates the CRDT term from the unrelated relational join operator from databases.)

Next, we assume an asynchronous network dissemination service that periodically sends a snapshot of one node's state to another. When state arrives at a destination node, its local CRDT state is mutated to reflect the merge of the current state with the message.  We'll assume that every node eventually merges data from every other node directly or indirectly (transitively).

That's all we need, at the level of detail we'll pursue here. However, the literature on CRDTs typically highlights a subclass of CRDTs called "op-based" CRDTs. You can read about those in an earlier post and I'll have more to say about them below.

Given this background, let’s move on to define a key property of functions on the lattice domain.

Inflationary

We say that a function f:S→Sf:S \rightarrow S over an ordered domain SS is inflationary if its output is never smaller than its input:

x≤f(x)x \leq f(x)

For CRDT discussion, we assume the domain SS is the same as that of our semi-lattice, and the partial order ≤\leq is the one from our semi-lattice definition. Inflationary functions never take us "down" in the lattice order.

ℹ️

Let’s start with something reassuring: the join operator x⊔yx \sqcup y is always inflationary in each of its arguments.

Why? x≤x⊔yx \leq x \sqcup y because x⊔yx \sqcup y is an upper bound for xx.

This is one reason why many CRDT papers don’t emphasize inflationarity: they tend to focus on merge operations, and merge (⊔\sqcup) satisfies the property automatically.

Our issue is going to be the update function each node may run between merges. More on that shortly.

But first, let's define the standard guarantee that CRDTs are designed to provide.

Strong Eventual Consistency (SEC)

The canonical correctness guarantee for CRDTs is Strong Eventual Consistency (SEC), as defined by Shapiro et al. It requires that, eventually, all replicas converge to the same state, provided they have seen the same set of updates.

The SEC condition includes three properties:

Eventual delivery: All updates eventually reach all replicas.
Termination: All operations eventually complete.
Strong convergence: If two replicas have received the same set of updates, they will reach the same state.

Note that SEC says nothing about ordering of operations; in particular it does not constrain the interleaving of update and merge. Given that dissemination is assumed to be asynchronous, a replica might send its current state before or after applying an update. We’ll see how this matters next.

SEC Does Not Guarantee Determinism

We tend to think of convergence as a property that avoids non-determinism: after all, in a convergent replica system, all the replicas agree on the outcome. But that doesn't actually mean that they agree on a deterministic outcome!

As defined, Strong Eventual Consistency only says that replicas converge within a run. It says nothing about whether different executions — with the same set of updates and nodes — produce the same final result.

Consider the following function:

DropTop(x)={aif x=⊤xotherwise\text{DropTop}(x) = \begin{cases} a & \text{if } x = \top \\ x & \text{otherwise} \end{cases}

The ⊤\top ("top") symbol is the standard lattice notation for topmost state in a finite lattice (i.e. the unique join of all the states).

DropTop\text{DropTop} is not inflationary: it "drops" from ⊤\top to a lower value aa. If a user applies it once at node A, the final result depends on whether A sends its state to B before or after applying the update.

Example: Two Possible Outcomes

Let S={⊥,a,⊤}S = \{\bot, a, \top\}, with ⊥<a<⊤\bot < a < \top.

Node A starts at ⊤\top
Node B starts at ⊥\bot
A single user applies DropTop\text{DropTop} once on node A

Outcome 1

Run 1

Outcome 2

Run 2

Run 1: send first, then update

A sends ⊤\top to B
B merges: ⊥⊔⊤=⊤\bot \sqcup \top = \top
A applies DropTop:⊤↦a\text{DropTop}: \top \mapsto a
B sends ⊤\top to A
A merges: a⊔⊤=⊤a \sqcup \top = \top
Final state: A = ⊤\top, B = ⊤\top

Run 2: update first, then send

A applies DropTop:⊤↦a\text{DropTop}: \top \mapsto a
A sends aa to B
B merges: ⊥⊔a=a\bot \sqcup a = a
Final state: A = aa, B = aa

Same updates. Same nodes. Same messages. Different orders. Two different converged states.

Inflationarity is Needed for Deterministic Convergence

As we've seen, non-inflationary functions like DropTop\text{DropTop} may non-deterministically drive the CRDT into one of many possible converged states, which depend on the interleaving of updates and messages.

If all updates are inflationary, then once a state has moved "up" the lattice, it never moves back down — and that ensures:

Each replica's state moves up the lattice.
Merging always moves replicas up in the lattice.
The final state is the least upper bound of all updates.

That last point ensures determinism: the final result depends only on the set of updates, not their timing.

⚠️

Inflationary updates are required for deterministic convergence in CRDTs.

Non-inflationary updates will still converge to some common state, but the choice of converged state will be non-deterministic.

Non-Inflationary Updates Spoil Lower Bounds

In the previous post I talked about how it's unsafe to read the naked state of a CRDT, but at least a "well-behaved" CRDT provides lower bounds. Well, in addition to providing non-deterministic outcomes, non-inflationary updates are "ill-behaved" and do not provide lower bounds.

Imagine you're writing application logic that reads from a CRDT value during execution.

If the update function is inflationary, you have a guarantee: the current state always provides a lower bound of the final state. That makes it safe to make assumptions like:

"This user will never lose this permission."
"This counter will only increase from here."

But if updates are not inflationary, those assumptions no longer hold. You might see a high value now — and a lower one later — because your node applied a non-inflationary update.

That means your application can't treat CRDT reads as stable observations in any sense. It must treat every intermediate value as potentially temporary.

☣️

When using non-inflationary update functions, CRDT read values can be completely arbitrary. Inflationary update ensures that CRDT read is a lower bound on the final converged state, which is a common assumption in the CRDT literature.

Ensuring Inflationarity

There is a simple hack to ensure inflationarity, by changing the implementation of the update API, or removing it entirely.

The change is as follows. When a user invokes update(x), we do not directly mutate the CRDT state to x; rather, we mutate the state to merge(update(x)). Because merge is inflationary (as discussed above), the effect of updates in this implementation is always inflationary. The result is determinism and guaranteed lower bounds, at the cost of a slightly surprising instant/"local" effect from update.

If you want the contract with the CRDT programmer to be more explicit, you can simply remove update from the API and force developers to invoke merge locally. This is nearly the same as the above design, except for type signatures:

merge:S→S: S \to S
update:U→(S→S): U \to (S \to S)