✨ Xorq is an opinionated framework for cataloging, sharing, and shipping multi-engine compute as diffable artifacts for your data in flight. ✨
Xorq helps teams build declarative, reusable ML pipelines across Python and SQL engines like DuckDB, Snowflake, and DataFusion. It offers:
- 🧠 Multi-engine, declarative expressions using pandas-style syntax and Ibis.
- 📦 Expression Format for Python in YAML, enabling repeatable compute.
- ⚡ Portable UDFs and UDAFs with automatic serialization.
- 🔁 Shift-left with caching, using the expr hash to name cached results.
- 🔍 Column-level lineage and observability out of the box.
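To make the hash-named caching idea concrete, here is a minimal, standard-library-only sketch. It is not Xorq's implementation: `expr_hash`, `HashNamedCache`, and the dict-based expression descriptions are hypothetical names invented for illustration. The point is only that a cache key derived from the content of an expression lets identical expressions share one cached result.

```python
import hashlib
import json


def expr_hash(expr: dict) -> str:
    """Return a short, stable hash for a nested expression description.

    Hypothetical helper: a real framework would hash its own expression graph.
    """
    # Sorting keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(expr, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class HashNamedCache:
    """In-memory cache whose keys are derived from the expression itself."""

    def __init__(self):
        self._store = {}
        self.computations = 0  # counts how often the compute callback ran

    def get_or_compute(self, expr, compute):
        key = expr_hash(expr)
        if key not in self._store:
            self.computations += 1
            self._store[key] = compute(expr)
        return self._store[key]


# Illustrative expression descriptions (assumed shape, not a Xorq format).
expr_a = {"op": "filter", "table": "penguins", "predicate": "bill_length_mm > 40"}
expr_b = {"predicate": "bill_length_mm > 40", "table": "penguins", "op": "filter"}
expr_c = {"op": "filter", "table": "penguins", "predicate": "bill_length_mm > 45"}

cache = HashNamedCache()
first = cache.get_or_compute(expr_a, lambda e: f"result of {e['predicate']}")
second = cache.get_or_compute(expr_b, lambda e: f"result of {e['predicate']}")
third = cache.get_or_compute(expr_c, lambda e: f"result of {e['predicate']}")
```

Here `expr_a` and `expr_b` describe the same computation with keys in a different order, so they map to one cache entry and the callback runs once for both. `expr_c` changes the predicate, so it gets a different name and is computed separately.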
Follow the Quickstart Tutorial for a full walk-through using the Penguins dataset.
ML pipelines are brittle, inconsistent, and hard to reuse. Xorq gives you:
| Pain | Solution |
|---|---|
| Mixing pandas and SQL | Unified declarative API |
| Wasted computation | Transparent caching |
| Manual deployment | `xorq serve` any expr |
| Debugging lineage | Visual lineage trees |
| Engine lock-in | Portable UDxFs |
| Repro issues | Compile-time schema and relational integrity validation |
Once you `xorq build` your pipeline, you get:
- `expr.yaml`: a reproducible expression graph
- `deferred_reads.yaml`: source metadata
- SQL and metadata files for inspection and CI
Here is a sample (abbreviated) output:
Note that the build output format is still in beta and the spec is subject to change.
Xorq uses Apache Arrow for zero-copy data transfer and leverages Ibis and DataFusion under the hood for efficient computation.
Xorq is pre-1.0 and evolving fast. Expect breaking changes.