Oswald – Object Storage Write-Ahead Log Device

Nicolae Vartolomei · 2025/10

OSWALD is a Write-Ahead Log (WAL) design built exclusively on object storage primitives. It works with any object storage service that provides read-after-write consistency and compare-and-swap operations, including AWS S3, Google Cloud Storage, and Azure Blob Storage.

The design supports checkpointing and garbage collection, making it suitable for State Machine Replication (SMR).

[Figure: High-level diagram]

The design has been formally specified and verified using the P programming language.

Supporting code is available at github.com/nvartolomei/oswald.

High-level overview

OSWALD uses three types of objects:

  1. Manifest - tracks the latest checkpoint (snapshot) and garbage collection progress using Log Sequence Numbers (LSNs)
  2. Snapshots - user-defined state snapshots for optimized recovery
  3. Chunks - log content

The manifest object version, along with its content, is used for synchronizing readers, writers, and the garbage collection process. It is the only mutable object in the system.
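Below is a minimal Python sketch of the three object types, with the manifest as the single compare-and-swap point. `MemStore` is a hypothetical in-memory stand-in for an object store with read-after-write consistency and conditional writes. It is not a real client API. The key layout (`manifest`, `snapshot/<lsn>`, `chunk/<lsn>`), the manifest fields `snapshot_lsn` and `gc_lsn`, and the `checkpoint` helper are illustrative assumptions. The sketch treats `gc_lsn` as the lowest LSN whose chunk is still retained, and it assumes the collector publishes the new watermark before deleting any chunk.

```python
class MemStore:
    """Hypothetical in-memory object store with read-after-write
    consistency and conditional writes; not a real client API."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0     # source of fresh versions (think ETags)

    def _write(self, key, body):
        self.clock += 1
        self.objects[key] = (self.clock, body)
        return self.clock

    def put_if_none_match(self, key, body):
        # Create-only write; None models a failed precondition.
        return None if key in self.objects else self._write(key, body)

    def put_if_match(self, key, body, version):
        # Compare-and-swap on the object version; None if it changed.
        current = self.objects.get(key)
        if current is None or current[0] != version:
            return None
        return self._write(key, body)

    def get(self, key):
        return self.objects.get(key)  # (version, body), or None for 404

    def delete(self, key):
        self.objects.pop(key, None)

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # zero-padded so keys sort in LSN order

def snapshot_key(lsn):
    return f"snapshot/{lsn:020d}"

def init_log(store):
    # Empty log: no snapshot yet and nothing collected.
    return store.put_if_none_match("manifest", {"snapshot_lsn": None, "gc_lsn": 0})

def checkpoint(store, snapshot_lsn, state):
    """Publish a snapshot covering LSNs 0..snapshot_lsn, then collect the
    chunks it makes redundant. Returns False if the manifest CAS lost."""
    version, manifest = store.get("manifest")
    store.put_if_none_match(snapshot_key(snapshot_lsn), state)  # immutable
    new = {"snapshot_lsn": snapshot_lsn, "gc_lsn": snapshot_lsn + 1}
    if store.put_if_match("manifest", new, version) is None:
        return False  # someone else updated the manifest; re-read and retry
    # Delete only after the new watermark is visible, so lagging writers and
    # tailers that re-check the manifest can notice the gap.
    for lsn in range(manifest["gc_lsn"], snapshot_lsn + 1):
        store.delete(chunk_key(lsn))
    return True
```

Publishing the watermark before deleting gives a lagging writer or tailer a chance to notice, on its next manifest read, that chunks below `gc_lsn` may be gone. A `False` return only means another process updated the manifest first. Re-reading it and retrying is left to the caller.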

Appending

Appending to the WAL requires two round trips: a PUT-If-None-Match to create the next chunk, followed by a GET-If-None-Match on the manifest.[1]

[Figure: Append sequence]
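The append sequence above can be sketched as follows, again over a hypothetical in-memory `MemStore` (not a real S3, GCS, or Azure client). The `Writer` class, its cached manifest, the `chunk/<lsn>` key layout, and the `gc_lsn` field (assumed to be the lowest retained LSN) are all assumptions for illustration. A `None` from the conditional PUT stands in for a failed precondition. A `None` from the conditional GET stands in for 304 Not Modified.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0
        self.requests = 0  # counts round trips for illustration

    def put_if_none_match(self, key, body):
        self.requests += 1
        if key in self.objects:
            return None  # precondition failed: the object already exists
        self.clock += 1
        self.objects[key] = (self.clock, body)
        return self.clock

    def get_if_none_match(self, key, version):
        self.requests += 1
        current = self.objects[key]  # the manifest always exists here
        return None if current[0] == version else current  # None = 304

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

class Writer:
    """Hypothetical writer state: next LSN plus the cached manifest."""

    def __init__(self, store, next_lsn, manifest_version, manifest):
        self.store = store
        self.next_lsn = next_lsn
        self.manifest_version = manifest_version
        self.manifest = manifest

    def append(self, entry):
        lsn = self.next_lsn
        # Round trip 1: create the next chunk; fails if another writer won.
        if self.store.put_if_none_match(chunk_key(lsn), entry) is None:
            return "conflict"
        # Round trip 2: revalidate the manifest (304 if unchanged).
        changed = self.store.get_if_none_match("manifest", self.manifest_version)
        if changed is not None:
            self.manifest_version, self.manifest = changed
        if self.manifest["gc_lsn"] > lsn:
            return "restart"  # chunk may have been collected; re-initialize
        self.next_lsn = lsn + 1
        return "ok"  # safe to acknowledge
```

Only an `"ok"` result should be acknowledged to the client. The `"conflict"` and `"restart"` results correspond to the catch-up and re-initialization paths described under Concurrency conflicts.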

Tailing

Tailing requires one round trip per new chunk, plus two additional round trips: a GET for the next expected chunk (404 Not Found when no more chunks exist) and a GET-If-None-Match on the manifest.

[Figure: Tailing sequence]
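A tailing pass can be sketched in Python as shown below. `MemStore`, the `tail` function, the dictionary fields of `state`, and the `gc_lsn` manifest field (assumed to be the lowest retained LSN) are hypothetical. The `requests` counter exists only to make the round-trip count visible.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0
        self.requests = 0  # round trips, for illustration

    def put(self, key, body):
        self.clock += 1
        self.objects[key] = (self.clock, body)
        return self.clock

    def get(self, key):
        self.requests += 1
        return self.objects.get(key)  # None models 404 Not Found

    def get_if_none_match(self, key, version):
        self.requests += 1
        current = self.objects[key]
        return None if current[0] == version else current  # None models 304

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

def tail(store, state):
    """One tailing pass. `state` holds next_lsn, manifest_version, gc_lsn,
    and the applied entries (all hypothetical field names)."""
    while True:
        obj = store.get(chunk_key(state["next_lsn"]))  # one GET per chunk
        if obj is None:
            break  # 404: no chunk at this LSN (yet, or any more)
        state["entries"].append(obj[1])
        state["next_lsn"] += 1
    changed = store.get_if_none_match("manifest", state["manifest_version"])
    if changed is not None:
        state["manifest_version"] = changed[0]
        state["gc_lsn"] = changed[1]["gc_lsn"]
    # A 404 below the GC watermark means the chunk was collected, not absent.
    return "restart" if state["gc_lsn"] > state["next_lsn"] else "caught_up"
```

With three new chunks, a pass issues 3 + 2 = 5 requests: one GET per chunk, the GET that returns 404, and the manifest check.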

Initialization

Readers and writers initialize by:

  1. GET the manifest
  2. GET the snapshot (if it exists)
  3. GET each chunk not covered by the snapshot
  4. GET-If-None-Match the manifest

With local snapshots, initialization can be reduced to a tailing-like sequence.
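The four initialization steps map onto the following Python sketch. `MemStore`, `initialize`, the key layout, and the manifest fields `snapshot_lsn` (last LSN covered by the snapshot) and `gc_lsn` (lowest retained LSN) are assumptions for illustration. Returning `None` signals that garbage collection raced with the reads and the caller should start over.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0

    def put(self, key, body):
        self.clock += 1
        self.objects[key] = (self.clock, body)

    def get(self, key):
        return self.objects.get(key)  # None models 404 Not Found

    def get_if_none_match(self, key, version):
        current = self.objects[key]
        return None if current[0] == version else current  # None models 304

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

def snapshot_key(lsn):
    return f"snapshot/{lsn:020d}"

def initialize(store):
    """Returns the recovered state, or None if garbage collection raced
    with the reads and initialization must be restarted."""
    version, manifest = store.get("manifest")                   # 1. manifest
    snap_lsn = manifest["snapshot_lsn"]
    snapshot = None
    if snap_lsn is not None:
        obj = store.get(snapshot_key(snap_lsn))                  # 2. snapshot
        if obj is None:
            return None  # assumption: superseded snapshots may be deleted
        snapshot = obj[1]
    next_lsn = 0 if snap_lsn is None else snap_lsn + 1
    entries = []
    while (obj := store.get(chunk_key(next_lsn))) is not None:  # 3. chunks
        entries.append(obj[1])
        next_lsn += 1
    changed = store.get_if_none_match("manifest", version)      # 4. re-check
    if changed is not None:
        version, manifest = changed
        if manifest["gc_lsn"] > next_lsn:
            return None  # the 404 may mean "collected", not "end of log"
    return {"snapshot": snapshot, "entries": entries,
            "next_lsn": next_lsn, "manifest_version": version}
```

With a locally stored snapshot, steps 1 and 2 could in principle be skipped and the chunk loop started right after the snapshot's LSN. This is essentially a tailing pass.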

Concurrency conflicts

Writer-Writer conflicts

When multiple writers attempt concurrent writes, conflicts are detected using PUT-If-None-Match when creating chunks.

[Figure: Writer-Writer conflict diagram]

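When a chunk for the same LSN already exists, object storage rejects the conditional PUT (S3, for example, responds with 412 Precondition Failed, or with 409 Conflict when concurrent conditional writes race). The writer must then follow the Tailing protocol for catch-up recovery and retry.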
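Catch-up recovery after a lost race might look like the Python sketch below. `MemStore`, the `Writer` class, `max_attempts`, and the `gc_lsn` field (lowest retained LSN) are hypothetical. The sketch simply re-submits the same entry at a later LSN. Whether an entry is still valid after catching up is an application-level decision that the design leaves open.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0

    def put_if_none_match(self, key, body):
        if key in self.objects:
            return None  # precondition failed: chunk already exists
        self.clock += 1
        self.objects[key] = (self.clock, body)
        return self.clock

    def get(self, key):
        return self.objects.get(key)  # None models 404 Not Found

    def get_if_none_match(self, key, version):
        current = self.objects[key]
        return None if current[0] == version else current  # None models 304

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

class Writer:
    def __init__(self, store, manifest_version):
        self.store = store
        self.next_lsn = 0
        self.manifest_version = manifest_version
        self.gc_lsn = 0
        self.log = []  # entries observed in LSN order

    def refresh_manifest(self):
        changed = self.store.get_if_none_match("manifest", self.manifest_version)
        if changed is not None:
            self.manifest_version, self.gc_lsn = changed[0], changed[1]["gc_lsn"]
        return self.gc_lsn

    def catch_up(self):
        # Tailing protocol: read chunks until 404, then revalidate the manifest.
        while (obj := self.store.get(chunk_key(self.next_lsn))) is not None:
            self.log.append(obj[1])
            self.next_lsn += 1
        return self.refresh_manifest() <= self.next_lsn

    def append(self, entry, max_attempts=3):
        for _ in range(max_attempts):
            lsn = self.next_lsn
            if self.store.put_if_none_match(chunk_key(lsn), entry) is not None:
                if self.refresh_manifest() > lsn:
                    raise RuntimeError("GC overtook writer: re-initialize")
                self.log.append(entry)
                self.next_lsn = lsn + 1
                return lsn
            # Lost the race for this LSN: catch up, then retry at a later LSN.
            if not self.catch_up():
                raise RuntimeError("GC overtook writer: re-initialize")
        return None
```

For example, two writers that both start at LSN 0 end up with the loser's entry at LSN 1. The loser has also read the winner's entry, so both hold the same log prefix.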

Writer-Garbage Collector conflicts

When garbage collection is active, the PUT-If-None-Match mechanism alone is insufficient.

[Figure: Writer-Garbage Collector conflict diagram]

If the garbage collector has already removed chunk n (GC watermark above n) and a writer is lagging behind, the writer's PUT-If-None-Match for chunk n succeeds because the object no longer exists, potentially causing write loss and log divergence. To prevent this, after creating a chunk but before acknowledging the operation, the writer must re-read the manifest with GET-If-None-Match and confirm that the GC watermark is below its LSN. If the watermark has advanced past the writer's LSN, the writer must restart and follow the Initialization protocol.
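The following Python sketch reproduces the lost-write scenario. A lagging writer's create-only PUT lands on a chunk the collector has already deleted, and only the manifest re-check stops it from acknowledging. `MemStore`, `append_checked`, its outcome strings, and the `gc_lsn` field (lowest retained LSN) are assumptions for illustration.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> (version, body)
        self.clock = 0

    def put(self, key, body):
        self.clock += 1
        self.objects[key] = (self.clock, body)
        return self.clock

    def put_if_none_match(self, key, body):
        return None if key in self.objects else self.put(key, body)

    def get_if_none_match(self, key, version):
        current = self.objects[key]
        return None if current[0] == version else current  # None models 304

    def delete(self, key):
        self.objects.pop(key, None)

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

def append_checked(store, lsn, entry, manifest_version, gc_lsn):
    """Create chunk `lsn`, then re-check the GC watermark before acking.
    Returns "ack", "conflict", or "restart" (hypothetical outcome names)."""
    if store.put_if_none_match(chunk_key(lsn), entry) is None:
        return "conflict"  # writer-writer conflict: tail and retry
    changed = store.get_if_none_match("manifest", manifest_version)
    if changed is not None:
        gc_lsn = changed[1]["gc_lsn"]
    # Chunks below the watermark may already be deleted, so a successful
    # create-only PUT there proves nothing about this writer being current.
    return "restart" if gc_lsn > lsn else "ack"
```

In a scenario where the collector publishes `gc_lsn = 3` and then deletes chunks 0 to 2, a writer still at LSN 1 gets a successful PUT but a `"restart"` result. The stray chunk 1 stays below the watermark, where initialization from the snapshot never reads it. This sketch does not clean it up.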

Tailer-Garbage Collector conflicts

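Similar conflicts occur when a tailer falls behind the GC watermark during tailing, catch-up recovery, or initialization. A GET for a chunk that has already been collected returns 404 Not Found, just as it would at the end of the log, so the conflict surfaces only when the manifest check reveals a GC watermark above the tailer's next LSN. When detected, the tailer must restart with the Initialization protocol.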
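The Python sketch below shows a tailer recovering after falling behind the collector. It replays a hypothetical increment-only counter, which is in the spirit of the counter used in the verification model but not taken from it. `MemStore`, `CounterTailer`, the key layout, and the manifest fields are assumptions. Manifest versions are omitted, so a plain GET stands in for GET-If-None-Match.

```python
class MemStore:
    """Hypothetical in-memory object store (not a real client API)."""

    def __init__(self):
        self.objects = {}  # key -> body; versions omitted for brevity

    def put(self, key, body):
        self.objects[key] = body

    def get(self, key):
        return self.objects.get(key)  # None models 404 Not Found

    def delete(self, key):
        self.objects.pop(key, None)

def chunk_key(lsn):
    return f"chunk/{lsn:020d}"  # hypothetical key layout

def snapshot_key(lsn):
    return f"snapshot/{lsn:020d}"

class CounterTailer:
    """Hypothetical tailer for an increment-only counter log: each chunk
    holds an increment, and a snapshot holds the counter value."""

    def __init__(self, store):
        self.store = store
        self.initialize()

    def initialize(self):
        # Steps 1-2: read the manifest, then the snapshot if one exists.
        # Assumes snapshots referenced by the manifest are never deleted.
        snap = self.store.get("manifest")["snapshot_lsn"]
        self.counter = 0 if snap is None else self.store.get(snapshot_key(snap))
        self.next_lsn = 0 if snap is None else snap + 1

    def tail(self):
        # Read chunks until 404, then re-read the manifest. Returns False if
        # the 404 fell below the GC watermark (chunk collected, not absent).
        while (inc := self.store.get(chunk_key(self.next_lsn))) is not None:
            self.counter += inc
            self.next_lsn += 1
        return self.store.get("manifest")["gc_lsn"] <= self.next_lsn

    def poll(self):
        # Restart with the Initialization protocol until a pass is clean.
        while not self.tail():
            self.initialize()
```

Consider a tailer that stopped at LSN 2 while the collector checkpointed through LSN 3 and deleted chunks 0 to 3. It sees a 404 for chunk 2 below the watermark, re-initializes from the snapshot, and then picks up chunk 4.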

Verification

The design has been formally specified and verified using the P programming language. The specification is available at github.com/nvartolomei/oswald and includes an increment-only counter implemented as a Replicated State Machine over OSWALD.

