Target-ducklake: connect 500 sources to Ducklake


July 18, 2025 · 10 minute read

Steven Wang

At Definite, we’ve been big fans of DuckDB and use it as a core part of our data infrastructure. When DuckDB Labs announced Ducklake in May, integrating it into our data stack was a no-brainer. Ducklake improves many aspects of our DuckDB-centric data warehousing strategy, most notably by enabling concurrent reads and writes while preserving DuckDB as the query engine (see our earlier post on how we previously handled reads and writes).

Another big part of our stack is Meltano, which we use for data ingestion. To put all the pieces together, we built target-ducklake, a Meltano- and Singer-compatible target for loading data into Ducklake. A few of the target’s features are highlighted below:

Type conversions: Ensures that data types are correctly converted when loading into Ducklake. Without these conversions, certain types (such as timestamps or JSON fields) would default to being stored as strings in the underlying Parquet files, rather than maintaining their original data types.
Data sync strategies: append, merge/upsert
Storage: Google Cloud Storage, S3, local files
Catalog options: Postgres, MySQL, SQLite, DuckDB
Partitions: Timestamp partitions are supported at year, month, day, and hour granularity. Categorical partitions are also supported (e.g., status, country, or team).
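
To make the type-conversion and timestamp-partition features above more concrete, here is a minimal Python sketch of how such logic might look. It is not taken from target-ducklake's source: the mapping table, the function names `duckdb_type` and `partition_values`, and the chosen column type names are all assumptions for illustration. The only grounded input is that Singer streams describe their fields with JSON Schema.

```python
from datetime import datetime

# Hypothetical mapping from JSON Schema types (as used in Singer SCHEMA
# messages) to DuckDB column type names. Illustrative only.
JSON_TO_DUCKDB = {
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
    "object": "JSON",
    "array": "JSON",
    "string": "VARCHAR",
}

# Timestamp partition granularities named in the feature list, coarse to fine.
GRANULARITIES = ("year", "month", "day", "hour")


def duckdb_type(prop: dict) -> str:
    """Pick a column type for one JSON Schema property (hypothetical helper)."""
    types = prop.get("type", "string")
    if isinstance(types, str):
        types = [types]
    # Singer schemas often mark nullable fields as ["null", "string"].
    non_null = [t for t in types if t != "null"]

    # Without this step, formatted strings would be stored as plain strings.
    if "string" in non_null:
        if prop.get("format") == "date-time":
            return "TIMESTAMP"
        if prop.get("format") == "date":
            return "DATE"

    # Use the first recognised type; fall back to VARCHAR otherwise.
    for t in non_null:
        if t in JSON_TO_DUCKDB:
            return JSON_TO_DUCKDB[t]
    return "VARCHAR"


def partition_values(ts: datetime, granularity: str) -> dict:
    """Return partition key values from year down to the given granularity."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"unsupported granularity: {granularity}")
    depth = GRANULARITIES.index(granularity) + 1
    return {name: getattr(ts, name) for name in GRANULARITIES[:depth]}


if __name__ == "__main__":
    schema = {
        "id": {"type": "integer"},
        "created_at": {"type": ["null", "string"], "format": "date-time"},
        "payload": {"type": ["object", "null"]},
    }
    for column, prop in schema.items():
        print(column, duckdb_type(prop))
    print(partition_values(datetime(2025, 7, 18, 14, 30), "day"))
```

In this sketch, a nullable `date-time` string resolves to `TIMESTAMP` and an object field to `JSON` rather than both falling back to `VARCHAR`, which is the kind of behavior the type-conversion feature describes. The real target may use different type names and rules.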

It’s still early days for Ducklake, and target-ducklake is under active development, but we would love for users to give it a try, contribute, and provide feedback!
