Show HN: Cartesian vs. Sparse Merkle Trees – faster treap-based design in Go

4 months ago 3

Use-Cases & Go Benchmarks

Midjourney

Introduction

As decentralized applications, blockchains, and trust-minimized systems advance, so the need for efficient, authenticated data structures continues to grow. Merkle trees are at the forefront of this evolution, enabling compact proofs of data integrity in untrusted environments.

While Sparse Merkle Trees (SMTs) have been a mainstay for state commitments and verification, a newer construction called the Deterministic Cartesian Merkle Tree (CMT) offers a promising alternative. CMTs blend binary search trees (BST), heaps, and Merkle trees into a deterministically shaped, idempotent data structure that might yield gas and storage savings under certain conditions.

This article dives into the technical aspects of both SMT and CMT, compares their performance in a Go-based example, discusses recent optimizations that improved CMT’s off-chain performance, and explores where these trees can be applied in real-world scenarios.

What Are Merkle Trees?

A Merkle tree is a binary tree where each node’s hash commits to the data at its leaves and all nodes below it. By hashing pairs of children up the tree until a single root hash is formed, it becomes possible to produce succinct proofs that a certain piece of data is included (or excluded) without revealing the entire data set. This cryptographic property underpins trust-minimized verification, allowing systems like blockchains and distributed databases to ensure data integrity with minimal overhead. Merkle trees are widely used in:

Blockchains: Ensuring transaction integrity in Bitcoin, Ethereum, and other networks.
Distributed File Systems: Confirming that a file chunk is present in a larger data set.
Database Synchronization: Verifying partial data sets against a known commitment.

Sparse Merkle Trees (SMTs)

A Sparse Merkle Tree is typically a fixed-depth, full binary tree keyed by hashed keys. Even though not all leaves are populated, a consistent “sparse” structure commits to empty nodes as well. This structure makes it exceptionally convenient to produce and verify inclusion or exclusion proofs. SMTs are widely used for on-chain state commitments (e.g., in rollups, sidechains, or layer-2 solutions), as they can produce small proofs attesting that a given key-value pair is included or not in a global state. However, the overhead of hashing and the depth of the tree can sometimes make operations costly, especially if implemented straightforwardly on-chain. The availability and cost of hash functions (e.g., keccak256, Poseidon) also impact SMT efficiency.

Deterministic Cartesian Merkle Trees (CMTs)

A Deterministic Cartesian Merkle Tree combines three data structures into one:

Binary Search Tree (BST): The nodes are keyed and maintain an in-order property, enabling ordered insertions and lookups.
Heap: Each node has a priority derived deterministically from its value’s hash, ensuring a unique and reproducible tree shape for a given set of inserts.
Merkle Tree: Every node commits to its subtree, enabling cryptographic proofs of inclusion and exclusion.

Unlike traditional Merkle trees, CMTs store values in every node, not just the leaves. This design can cut storage costs roughly in half. CMTs are deterministic and idempotent — re-inserting the same key-value pairs yields the same tree shape and root hash. Preliminary on-chain benchmarks using keccak256 showed that CMT insertions could be cheaper than SMT by around 18%. However, with Poseidon (a more advanced but currently not precompiled hash function on Ethereum), SMT outperformed CMT by about 30%. These results highlight how crucial hashing performance and environment are in choosing a suitable data structure.

Off-Chain Performance Optimizations

Below we present updated Go code example that demonstrates both a CMT and an SMT. After initial measurements showed CMT slower in a naive off-chain setup, we introduced several optimizations:

Caching Child Hashes: Instead of recalculating hashes for subtrees at every insertion, removal, or rotation, we cache child node hashes.
Selective Re-hashing: Only re-hash nodes when children or values actually change.
Flexible Hash Functions: Abstracting the hash function to switch easily between SHA-2 and SHA-3 (Keccak256) for experimentation and performance comparison.

These optimizations yielded significantly better performance for the CMT in the off-chain Go benchmark. While off-chain performance does not map directly to on-chain gas costs, it shows that small implementation details can have a large impact on efficiency.

Example in Go:

Note: This code is for demonstration and educational purposes. It is not production-ready.

Code repo: https://github.com/rafaelescrich/cartesian-sparse-merkle-trees

// smt.go
package merkle

import (
"bytes"
)

// SMTNode represents a node in a Sparse Merkle Tree
type SMTNode struct {
keyHash []byte
value []byte
merkleHash []byte
left *SMTNode
right *SMTNode
}

// SMT represents a Sparse Merkle Tree
type SMT struct {
root *SMTNode
hashFunc HashBytesFunc
emptyHash []byte
}

// NewSMT creates a new Sparse Merkle Tree with specified hash function
func NewSMT(hashFunc HashFunc) *SMT {
hashFn := GetHashFunc(hashFunc)
return &SMT{
hashFunc: hashFn,
emptyHash: hashFn([]byte("empty")),
}
}

// calcNodeHash computes a node's hash value
func (s *SMT) calcNodeHash(n *SMTNode) []byte {
if n == nil {
return s.emptyHash
}
leftHash := s.childHash(n.left)
rightHash := s.childHash(n.right)
return s.hashFunc(n.keyHash, n.value, leftHash, rightHash)
}

// childHash returns the hash of a child node or the empty hash if nil
func (s *SMT) childHash(n *SMTNode) []byte {
if n == nil {
return s.emptyHash
}
return n.merkleHash
}

// updateNodeHash updates a node's merkleHash value
func (s *SMT) updateNodeHash(n *SMTNode) {
if n == nil {
return
}
n.merkleHash = s.calcNodeHash(n)
}

// insertNode recursively inserts a key-value pair into the tree
func (s *SMT) insertNode(n *SMTNode, key, value []byte) *SMTNode {
keyHash := s.hashFunc(key)

if n == nil {
newNode := &SMTNode{
keyHash: keyHash,
value: value,
}
s.updateNodeHash(newNode)
return newNode
}

cmp := bytes.Compare(keyHash, n.keyHash)
if cmp < 0 {
n.left = s.insertNode(n.left, key, value)
} else if cmp > 0 {
n.right = s.insertNode(n.right, key, value)
} else {
// Key hash exists, update value if changed
if !bytes.Equal(n.value, value) {
n.value = value
}
}
s.updateNodeHash(n)
return n
}

// Insert adds or updates a key-value pair in the tree
func (s *SMT) Insert(key, value []byte) {
s.root = s.insertNode(s.root, key, value)
}

// findNode recursively finds a node with the given key hash
func (s *SMT) findNode(n *SMTNode, keyHash []byte) *SMTNode {
if n == nil {
return nil
}

cmp := bytes.Compare(keyHash, n.keyHash)
if cmp < 0 {
return s.findNode(n.left, keyHash)
} else if cmp > 0 {
return s.findNode(n.right, keyHash)
} else {
return n
}
}

// Get retrieves the value associated with a key, or nil if the key doesn't exist
func (s *SMT) Get(key []byte) []byte {
keyHash := s.hashFunc(key)
node := s.findNode(s.root, keyHash)
if node == nil {
return nil
}
return node.value
}

// Exists checks if a key exists in the tree
func (s *SMT) Exists(key []byte) bool {
keyHash := s.hashFunc(key)
return s.findNode(s.root, keyHash) != nil
}

// findMin finds the node with the minimum key hash in the subtree
func (s *SMT) findMin(n *SMTNode) *SMTNode {
for n != nil && n.left != nil {
n = n.left
}
return n
}

// removeNode recursively removes a key from the tree
func (s *SMT) removeNode(n *SMTNode, key []byte) *SMTNode {
if n == nil {
return nil
}

keyHash := s.hashFunc(key)
cmp := bytes.Compare(keyHash, n.keyHash)

if cmp < 0 {
n.left = s.removeNode(n.left, key)
} else if cmp > 0 {
n.right = s.removeNode(n.right, key)
} else {
// Node found
if n.left == nil {
return n.right
} else if n.right == nil {
return n.left
} else {
// Replace with successor (minimum in right subtree)
minNode := s.findMin(n.right)
n.keyHash = minNode.keyHash
n.value = minNode.value
n.right = s.removeNode(n.right, minNode.keyHash)
}
}
s.updateNodeHash(n)
return n
}

// Remove deletes a key from the tree
func (s *SMT) Remove(key []byte) {
s.root = s.removeNode(s.root, key)
}

// RootHash returns the hash of the root node, representing the entire tree
func (s *SMT) RootHash() []byte {
if s.root == nil {
return s.emptyHash
}
return s.root.merkleHash
}

// Size returns the number of nodes in the tree
func (s *SMT) Size() int {
return s.countNodes(s.root)
}

// countNodes recursively counts the number of nodes in the tree
func (s *SMT) countNodes(n *SMTNode) int {
if n == nil {
return 0
}
return 1 + s.countNodes(n.left) + s.countNodes(n.right)
}

// Clear removes all nodes from the tree
func (s *SMT) Clear() {
s.root = nil
}

// cmt.go
package merkle

import (
"bytes"
)

// CMTNode represents a node in a Cartesian Merkle Tree
type CMTNode struct {
key []byte
value []byte
priority []byte
merkleHash []byte

left *CMTNode
right *CMTNode
leftHash []byte
rightHash []byte
}

// CMT represents a Cartesian Merkle Tree
type CMT struct {
root *CMTNode
hashFunc HashBytesFunc
emptyHash []byte
}

// NewCMT creates a new Cartesian Merkle Tree with specified hash function
func NewCMT(hashFunc HashFunc) *CMT {
hashFn := GetHashFunc(hashFunc)
return &CMT{
hashFunc: hashFn,
emptyHash: hashFn([]byte("empty")),
}
}

// calcNodeHash computes a node's hash with cached child hashes
func (c *CMT) calcNodeHash(n *CMTNode) []byte {
if n == nil {
return c.emptyHash
}
return c.hashFunc(n.key, n.value, n.leftHash, n.rightHash)
}

// updateNodeHash updates a node's merkleHash when children or node values have changed
func (c *CMT) updateNodeHash(n *CMTNode) {
if n == nil {
return
}
// Update children's hashes if children changed
if n.left == nil {
n.leftHash = c.emptyHash
} else {
n.leftHash = n.left.merkleHash
}
if n.right == nil {
n.rightHash = c.emptyHash
} else {
n.rightHash = n.right.merkleHash
}
n.merkleHash = c.calcNodeHash(n)
}

// rotateRight performs a right rotation on the given node
func (c *CMT) rotateRight(y *CMTNode) *CMTNode {
x := y.left
y.left = x.right
x.right = y
// Update hashes bottom-up
c.updateNodeHash(y)
c.updateNodeHash(x)
return x
}

// rotateLeft performs a left rotation on the given node
func (c *CMT) rotateLeft(x *CMTNode) *CMTNode {
y := x.right
x.right = y.left
y.left = x
// Update hashes bottom-up
c.updateNodeHash(x)
c.updateNodeHash(y)
return y
}

// generatePriority generates a priority value for a node based on its value
func (c *CMT) generatePriority(value []byte) []byte {
return c.hashFunc(value)
}

// insertNode recursively inserts a key-value pair into the tree
func (c *CMT) insertNode(n *CMTNode, key, value []byte) *CMTNode {
if n == nil {
newNode := &CMTNode{
key: key,
value: value,
priority: c.generatePriority(value),
}
// Initialize hashes
c.updateNodeHash(newNode)
return newNode
}

cmp := bytes.Compare(key, n.key)
if cmp < 0 {
n.left = c.insertNode(n.left, key, value)
if bytes.Compare(n.left.priority, n.priority) < 0 {
n = c.rotateRight(n)
}
} else if cmp > 0 {
n.right = c.insertNode(n.right, key, value)
if bytes.Compare(n.right.priority, n.priority) < 0 {
n = c.rotateLeft(n)
}
} else {
// Key exists, update only if value changed
if !bytes.Equal(n.value, value) {
n.value = value
n.priority = c.generatePriority(value)
c.updateNodeHash(n)
}
return n
}
c.updateNodeHash(n)
return n
}

// Insert adds or updates a key-value pair in the tree
func (c *CMT) Insert(key, value []byte) {
c.root = c.insertNode(c.root, key, value)
}

// findNode recursively finds a node with the given key
func (c *CMT) findNode(n *CMTNode, key []byte) *CMTNode {
if n == nil {
return nil
}

cmp := bytes.Compare(key, n.key)
if cmp < 0 {
return c.findNode(n.left, key)
} else if cmp > 0 {
return c.findNode(n.right, key)
} else {
return n
}
}

// Get retrieves the value associated with a key, or nil if the key doesn't exist
func (c *CMT) Get(key []byte) []byte {
node := c.findNode(c.root, key)
if node == nil {
return nil
}
return node.value
}

// Exists checks if a key exists in the tree
func (c *CMT) Exists(key []byte) bool {
return c.findNode(c.root, key) != nil
}

// removeNode recursively removes a key from the tree
func (c *CMT) removeNode(n *CMTNode, key []byte) *CMTNode {
if n == nil {
return nil
}

cmp := bytes.Compare(key, n.key)
if cmp < 0 {
n.left = c.removeNode(n.left, key)
} else if cmp > 0 {
n.right = c.removeNode(n.right, key)
} else {
// Node found
if n.left == nil {
return n.right
} else if n.right == nil {
return n.left
} else {
// Rotation based on priority
if bytes.Compare(n.left.priority, n.right.priority) < 0 {
n = c.rotateRight(n)
n.right = c.removeNode(n.right, key)
} else {
n = c.rotateLeft(n)
n.left = c.removeNode(n.left, key)
}
}
}
c.updateNodeHash(n)
return n
}

// Remove deletes a key from the tree
func (c *CMT) Remove(key []byte) {
c.root = c.removeNode(c.root, key)
}

// RootHash returns the hash of the root node, representing the entire tree
func (c *CMT) RootHash() []byte {
if c.root == nil {
return c.emptyHash
}
return c.root.merkleHash
}

// Size returns the number of nodes in the tree
func (c *CMT) Size() int {
return c.countNodes(c.root)
}

// countNodes recursively counts the number of nodes in the tree
func (c *CMT) countNodes(n *CMTNode) int {
if n == nil {
return 0
}
return 1 + c.countNodes(n.left) + c.countNodes(n.right)
}

// Clear removes all nodes from the tree
func (c *CMT) Clear() {
c.root = nil
}

// main.go
package main

import (
"encoding/hex"
"flag"
"fmt"
"math/rand"
"time"

"cartesian-sparse-merkle-trees/merkle"
)

// runBenchmark runs performance tests for both tree types with the specified hash function
func runBenchmark(hashFunc merkle.HashFunc, keys, values [][]byte, numElements int) {
hashName := "SHA2"
if hashFunc == merkle.SHA3 {
hashName = "SHA3"
}

fmt.Printf("\n========== %s Benchmark ==========\n", hashName)

// Test CMT Insert
cmt := merkle.NewCMT(hashFunc)
start := time.Now()
for i := 0; i < numElements; i++ {
cmt.Insert(keys[i], values[i])
}
cmtInsertDur := time.Since(start)

// Test SMT Insert
smt := merkle.NewSMT(hashFunc)
start = time.Now()
for i := 0; i < numElements; i++ {
smt.Insert(keys[i], values[i])
}
smtInsertDur := time.Since(start)

// Test CMT access
start = time.Now()
for i := 0; i < numElements; i++ {
_ = cmt.Get(keys[i])
}
cmtGetDur := time.Since(start)

// Test SMT access
start = time.Now()
for i := 0; i < numElements; i++ {
_ = smt.Get(keys[i])
}
smtGetDur := time.Since(start)

// Test CMT Remove
start = time.Now()
for i := 0; i < numElements; i++ {
cmt.Remove(keys[i])
}
cmtRemoveDur := time.Since(start)

// Test SMT Remove
start = time.Now()
for i := 0; i < numElements; i++ {
smt.Remove(keys[i])
}
smtRemoveDur := time.Since(start)

// Print results
fmt.Println("Hash Function:", hashName)
fmt.Println("Number of elements:", numElements)
fmt.Println("CMT Root Hash:", cmt.RootHash())
fmt.Println("SMT Root Hash:", smt.RootHash())

fmt.Println("\nPerformance comparison:")
fmt.Printf("CMT Insert Time: %v\n", cmtInsertDur)
fmt.Printf("SMT Insert Time: %v\n", smtInsertDur)
fmt.Printf("CMT Get Time: %v\n", cmtGetDur)
fmt.Printf("SMT Get Time: %v\n", smtGetDur)
fmt.Printf("CMT Remove Time: %v\n", cmtRemoveDur)
fmt.Printf("SMT Remove Time: %v\n", smtRemoveDur)

// Show relative performance
if smtInsertDur > 0 {
fmt.Printf("\nRelative performance:\n")
fmt.Printf("CMT/SMT Insert: %.2fx\n", float64(cmtInsertDur)/float64(smtInsertDur))
fmt.Printf("CMT/SMT Get: %.2fx\n", float64(cmtGetDur)/float64(smtGetDur))
fmt.Printf("CMT/SMT Remove: %.2fx\n", float64(cmtRemoveDur)/float64(smtRemoveDur))
}
}

// simpleExample demonstrates basic usage of both tree types
func simpleExample() {
fmt.Println("\n========== Simple Example ==========")

// Create trees with SHA-256
cmt := merkle.NewCMT(merkle.SHA2)
smt := merkle.NewSMT(merkle.SHA2)

// Sample data
keys := []string{
"key1",
"key2",
"key3",
}

values := []string{
"value1",
"value2",
"value3",
}

// Insert data into both trees
fmt.Println("Inserting key-value pairs...")
for i, key := range keys {
cmt.Insert([]byte(key), []byte(values[i]))
smt.Insert([]byte(key), []byte(values[i]))
}

// Check sizes
fmt.Printf("CMT size: %d\n", cmt.Size())
fmt.Printf("SMT size: %d\n", smt.Size())

// Get values
fmt.Println("\nRetrieving values:")
for _, key := range keys {
cmtValue := cmt.Get([]byte(key))
smtValue := smt.Get([]byte(key))
fmt.Printf("Key: %s, CMT Value: %s, SMT Value: %s\n",
key, string(cmtValue), string(smtValue))
}

// Check if keys exist
fmt.Println("\nChecking key existence:")
existingKey := "key2"
nonExistingKey := "key4"

fmt.Printf("'%s' exists in CMT: %v\n", existingKey, cmt.Exists([]byte(existingKey)))
fmt.Printf("'%s' exists in SMT: %v\n", existingKey, smt.Exists([]byte(existingKey)))
fmt.Printf("'%s' exists in CMT: %v\n", nonExistingKey, cmt.Exists([]byte(nonExistingKey)))
fmt.Printf("'%s' exists in SMT: %v\n", nonExistingKey, smt.Exists([]byte(nonExistingKey)))

// Compare root hashes
fmt.Println("\nRoot hashes before removal:")
fmt.Printf("CMT Root Hash: %s\n", hex.EncodeToString(cmt.RootHash()))
fmt.Printf("SMT Root Hash: %s\n", hex.EncodeToString(smt.RootHash()))

// Remove a key
keyToRemove := "key2"
fmt.Printf("\nRemoving key: %s\n", keyToRemove)
cmt.Remove([]byte(keyToRemove))
smt.Remove([]byte(keyToRemove))

// Check sizes after removal
fmt.Printf("CMT size after removal: %d\n", cmt.Size())
fmt.Printf("SMT size after removal: %d\n", smt.Size())

// Compare root hashes after removal
fmt.Println("\nRoot hashes after removal:")
fmt.Printf("CMT Root Hash: %s\n", hex.EncodeToString(cmt.RootHash()))
fmt.Printf("SMT Root Hash: %s\n", hex.EncodeToString(smt.RootHash()))
}

// ---------------------------------------------
// Main - Comparison and Demonstration
// ---------------------------------------------
func main() {
// Parse command line arguments
numElements := flag.Int("n", 5000, "Number of elements for benchmarks")
runSHA2 := flag.Bool("sha2", true, "Run SHA2 benchmarks")
runSHA3 := flag.Bool("sha3", true, "Run SHA3 benchmarks")
runExample := flag.Bool("example", true, "Run simple example")
flag.Parse()

// Run simple example if requested
if *runExample {
simpleExample()
}

// Only run benchmarks if at least one hash function is selected
if *runSHA2 || *runSHA3 {
// Generate random test data for benchmarks
fmt.Printf("\nGenerating random data for %d elements...\n", *numElements)
keys := make([][]byte, *numElements)
values := make([][]byte, *numElements)

// Create a local random source instead of using the global one
r := rand.New(rand.NewSource(time.Now().UnixNano()))
for i := 0; i < *numElements; i++ {
k := make([]byte, 32)
v := make([]byte, 32)
r.Read(k)
r.Read(v)
keys[i] = k
values[i] = v
}

fmt.Printf("Running benchmarks with %d elements\n", *numElements)

// Run SHA2 benchmarks if selected
if *runSHA2 {
runBenchmark(merkle.SHA2, keys, values, *numElements)
}

// Run SHA3 benchmarks if selected
if *runSHA3 {
runBenchmark(merkle.SHA3, keys, values, *numElements)
}
}
}

Use Cases

Both CMT and SMT have broad applicability, though their optimal environments differ:

On-Chain State Management:

CMT: With precompiled hashing and a large state, a CMT might reduce on-chain storage costs and gas fees due to deterministic structure and value storage at each node.
SMT: Ideal for representing vast state spaces (like Ethereum’s state trie), where sparse structures and succinct proofs are essential.

Off-Chain Indexing & Proof Generation:

CMT: Off-chain ordering systems or indexes where deterministic reconstruction of the same dataset is critical. This can be useful for caching and syncing data across distributed databases with minimal overhead.
SMT: Off-chain solutions that need to frequently prove the non-existence of a key (e.g., in a DNS-like system or a PKI) can leverage SMT’s well-understood proof sizes.

Light Clients & Cross-Chain Bridges:

CMT: Deterministic trees can simplify consensus proofs and state synchronization for light clients that need to verify the authenticity of a subset of data.
SMT: Standard choice for cross-chain bridges and state proofs because of their compatibility with standardized Merkle proof formats and well-understood depth-based proofs.

Data Availability & Rollups:

CMT: Potentially beneficial in rollup solutions that rely heavily on inclusion and exclusion proofs, especially if underlying hash functions become cheaper on-chain.
SMT: Currently more common in rollups and sidechains due to existing tooling and research around Merkle Patricia Tries and SMT-based commitments.

Conclusion

Deterministic Cartesian Merkle Trees are an emerging alternative to Sparse Merkle Trees, offering deterministic shape, idempotency, and potentially better resource usage under certain hashing conditions.

By implementing both structures in Go, experimenting with hash functions, and applying caching optimizations, we demonstrated that small implementation details significantly influence off-chain performance.

Ultimately, the choice between a CMT and an SMT depends on your environment, hash function availability, and cost model. On-chain, gas efficiency might favor one structure over the other. Off-chain, raw performance and ease of implementation could matter more.

As decentralized systems evolve, having multiple tools in the toolbox — such as CMT and SMT, enables developers to pick the right data structure for their specific use case, whether it’s a blockchain state commitment, a verifiable database index, or a cross-chain bridging system.

About the author: Rafael is a seasoned fintech expert, blockchain engineer and entrepreneur, passionate about enlightening others on the potential of blockchain and DeFi. He strives to demystify complex concepts and make them accessible and engaging for all.

Read Entire Article