Moonlink is a Rust library that enables sub-second mirroring (CDC) of Postgres tables into Iceberg. It serves as a drop-in replacement for the Debezium + Kafka + Flink + Spark stack.
Under the hood, it extends Iceberg with a real-time storage engine optimized for low-latency, high-throughput ingestion from update-heavy sources like Postgres logical replication.
Note: Moonlink is in preview. Expect changes. Join our Community to stay updated!
- Sub-second Ingestion: Including updates and deletes
- Real-time Reads: Unified view combining in-memory state and Iceberg files
- Iceberg-native Optimizations: Implements deletion vectors and compaction to maintain read performance
- Simple Deployment: Single Rust library that can be embedded (pg_mooncake) or scaled out
Moonlink makes deep optimizations for Iceberg as the destination, unlike most replication tools that treat it as a black box.
Moonlink extends Iceberg with a thin Arrow buffer with indexes and a positional deletion log. This buffer efficiently handles hot incoming data and is periodically flushed to Iceberg.
Raw Inserts
- Rows are written to an Arrow buffer
- When the buffer is full, its data is efficiently flushed to Parquet
Raw Deletions
- Moonlink maintains primary key indexes for all rows
- Deletions update the positional deletion log using these indexes
- Periodically, deletion logs are converted to Iceberg v3 deletion vectors.
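The insert and delete paths above can be sketched with standard-library types only. This is a simplified illustration, not Moonlink's actual API: `TableWriter`, `RowLocation`, and every method name here are hypothetical, a `Vec` of tuples stands in for the Arrow buffer and for Parquet files, and a sorted list of row positions stands in for an Iceberg v3 deletion vector. It also assumes a primary key is deleted before it is re-inserted.

```rust
use std::collections::{BTreeMap, HashMap};

/// Where a live row currently sits (hypothetical type, for illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RowLocation {
    Buffer { row: usize },
    File { file: usize, row: usize },
}

/// Toy model of the write path: buffer, flushed files, PK index, deletion log.
#[derive(Debug)]
struct TableWriter {
    capacity: usize,
    buffer: Vec<(i64, String)>,          // stand-in for the Arrow buffer
    files: Vec<Vec<(i64, String)>>,      // stand-in for flushed Parquet files
    pk_index: HashMap<i64, RowLocation>, // primary key -> current row position
    deletion_log: Vec<RowLocation>,      // positional deletes not yet converted
}

/// Rewrites a buffer position into a position inside the newly flushed file.
fn promote(loc: &mut RowLocation, file: usize) {
    if let RowLocation::Buffer { row } = *loc {
        *loc = RowLocation::File { file, row };
    }
}

impl TableWriter {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "buffer capacity must be positive");
        Self {
            capacity,
            buffer: Vec::new(),
            files: Vec::new(),
            pk_index: HashMap::new(),
            deletion_log: Vec::new(),
        }
    }

    /// Appends a row to the buffer and flushes once the buffer is full.
    fn insert(&mut self, pk: i64, value: &str) {
        let row = self.buffer.len();
        self.buffer.push((pk, value.to_string()));
        self.pk_index.insert(pk, RowLocation::Buffer { row });
        if self.buffer.len() >= self.capacity {
            self.flush();
        }
    }

    /// Resolves the key through the index and logs a positional delete.
    fn delete(&mut self, pk: i64) -> bool {
        match self.pk_index.remove(&pk) {
            Some(loc) => {
                self.deletion_log.push(loc);
                true
            }
            None => false,
        }
    }

    /// Moves buffered rows into a new file and re-points index and log entries.
    fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let file = self.files.len();
        for loc in self.pk_index.values_mut() {
            promote(loc, file);
        }
        for loc in self.deletion_log.iter_mut() {
            promote(loc, file);
        }
        self.files.push(std::mem::take(&mut self.buffer));
    }

    /// Drains file-level log entries into per-file sorted position lists.
    /// Deletes that still point into the buffer stay in the log.
    fn deletion_vectors(&mut self) -> BTreeMap<usize, Vec<usize>> {
        let mut vectors: BTreeMap<usize, Vec<usize>> = BTreeMap::new();
        self.deletion_log.retain(|loc| match *loc {
            RowLocation::File { file, row } => {
                vectors.entry(file).or_default().push(row);
                false
            }
            RowLocation::Buffer { .. } => true,
        });
        for rows in vectors.values_mut() {
            rows.sort_unstable();
        }
        vectors
    }
}
```

The point of the sketch is that a delete never rewrites data: it costs one index lookup plus one log append, and the conversion to deletion vectors can happen later in batches.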
Moonlink exposes a union read interface that combines its in-memory state with Iceberg files.
Engines can use this union-read interface to access the most current table state. For example, pg_mooncake v0.2 uses this for sub-second consistency between Postgres and Columnstore (Iceberg) tables.
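To make the idea of a union read concrete, here is a minimal sketch under stated assumptions. It is not Moonlink's real interface: `TableState`, `union_read`, and `iceberg_only_read` are hypothetical names, rows are plain tuples, and hash sets of row positions stand in for deletion vectors and the positional deletion log.

```rust
use std::collections::{HashMap, HashSet};

type Row = (i64, String);

/// Toy snapshot of a table: committed Iceberg state plus in-memory state.
#[derive(Debug, Default)]
struct TableState {
    files: Vec<Vec<Row>>,                             // flushed data files
    deletion_vectors: HashMap<usize, HashSet<usize>>, // committed deletes per file
    buffer: Vec<Row>,                                 // rows not yet flushed
    pending_file_deletes: HashSet<(usize, usize)>,    // logged (file, row) deletes
    pending_buffer_deletes: HashSet<usize>,           // logged deletes of buffered rows
}

impl TableState {
    /// What an engine sees when it reads only the committed Iceberg files.
    fn iceberg_only_read(&self) -> Vec<Row> {
        let mut out = Vec::new();
        for (file_id, rows) in self.files.iter().enumerate() {
            let dv = self.deletion_vectors.get(&file_id);
            for (pos, row) in rows.iter().enumerate() {
                if !dv.map_or(false, |d| d.contains(&pos)) {
                    out.push(row.clone());
                }
            }
        }
        out
    }

    /// Committed files minus all known deletes, followed by live buffered rows.
    fn union_read(&self) -> Vec<Row> {
        let mut out = Vec::new();
        for (file_id, rows) in self.files.iter().enumerate() {
            let dv = self.deletion_vectors.get(&file_id);
            for (pos, row) in rows.iter().enumerate() {
                let committed = dv.map_or(false, |d| d.contains(&pos));
                let pending = self.pending_file_deletes.contains(&(file_id, pos));
                if !committed && !pending {
                    out.push(row.clone());
                }
            }
        }
        for (pos, row) in self.buffer.iter().enumerate() {
            if !self.pending_buffer_deletes.contains(&pos) {
                out.push(row.clone());
            }
        }
        out
    }
}
```

In this model the two reads differ exactly by the state that has not reached Iceberg yet: buffered rows and deletes that are still in the log. That gap is what a union read closes for engines that need the latest table state.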
Note: Moonlink writes Iceberg tables with deletion vectors (Iceberg v3). Check your query engine for deletion vector support.
Moonlink can support multiple input data sources through moonlink-connectors. Currently, only Postgres logical replication is supported as a source.
Feel free to request more connectors or open a PR!
Today, Moonlink can be used as a library. Here are two examples of Moonlink in use:
- The pg_mooncake Postgres extension:
  pg_mooncake runs moonlink as a Background Worker Process that:
  - Manages tables and CDC ingestion
  - Processes union read requests
- moonlink-backend test:
  A demo of moonlink-backend running as a server, replicating Postgres tables to Iceberg.
Iceberg Integration
- Integrate & productionize more Iceberg catalogs
- Iceberg DataFile Optimization & Compaction
Performance Optimization
- Read/Write Cache
- Index Optimization
Data Types
- Composite types in connector
Iceberg Integration
- Partitioning & Clustering
- Schema Evolution
- Iceberg V3 types: Geospatial & Variant
Functionality
- Expose Index_Read interface for fast lookup queries
- Implement other index types: inverted index, full-text search, vector
Deployability
- Deploy Moonlink as standalone service
🥮