Published Wed, October 15, 2025 ∙ OLAP, ClickHouse ∙ by Johanan Ottensooser
In an OLAP (analytical) database, there are no updates, just merges. Every transaction is just writing a new row.
This is fundamental to the performance of OLAP databases, and very different from the update paradigm in OLTP (transactional) databases. So, you have to make certain design decisions related to this, especially when you are trying to capture data that includes updates, like CDC (change data capture) from your OLTP database. In ClickHouse, the key design choice is the table engine (plus your sorting key and partitions). In other OLAP systems you’ll tune partitioning, clustering, and dedup/compaction instead of picking an engine.
This article will show you the effect of the ClickHouse table engine you choose, in this case when you are ingesting CDC data. You can capture changes by reading your OLTP write-ahead log (WAL) (e.g., the Postgres WAL via Debezium) and streaming those events into ClickHouse. The table engine you pick determines how inserts, updates, and deletes are reconciled into a queryable, up-to-date view without constantly deduplicating at query time.
TL;DR:
- CDC data is append-only event data, not a simple snapshot of state.
- ClickHouse handles this best with the right table engine (Replacing, Collapsing, or VersionedCollapsing).
- Choosing the wrong engine can lead to double-counting or stale rows in your queries.
Recommendation:
When ingesting CDC into ClickHouse, model updates and deletes explicitly and pick a table engine that matches your CDC semantics. MooseStack can handle this logic for you automatically.
We receive data in a form something like the below:
| op | id | name | email | balance | before_balance | lsn | ts_ms | note |
|----|----|------|-------|---------|----------------|-----|-------|------|
| c | 1 | Alice | alice@example.com | 100.00 | null | 100001000 | 2025-10-14T14:00:01Z | initial insert |
| u | 1 | Alice | alice@example.com | 125.00 | 100.00 | 100001500 | 2025-10-14T14:00:10Z | balance update |
| u | 1 | Alice | alice@example.com | 150.00 | 125.00 | 100002000 | 2025-10-14T14:00:20Z | another update |
| d | 1 | null | null | null | 150.00 | 100002500 | 2025-10-14T14:00:30Z | delete event |
*LSN is Postgres’ Log Sequence Number: a strictly increasing WAL position, which makes it a perfect monotonic `ver` for ClickHouse engines to order updates.
If you want to run a query to sum the balance, you can very easily foot-gun with a naive sum over the raw events (table name `cdc_events` is illustrative):
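```sql
-- Naive: sums every historical version of the row.
SELECT sum(balance) AS total_balance
FROM cdc_events;
```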
This returns 375 (100 + 125 + 150), even though the final balance is 0 after the delete, because you summed every historical value.
So, how can we turn the above into a set of data that is queryable?
First, we use a streaming function to tidy up the data a little (the final form of this tidying may depend on the table engine chosen, as you’ll see below), giving us:
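| id | name | email | balance | ver | is_deleted |
|----|------|-------|---------|-----|------------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 0 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 0 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 0 |
| 1 | null | null | null | 100002500 | 1 |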
All we are doing is grabbing the relevant columns, creating the `ver` column (here, the LSN), and explicitly calling out row deletes. A minimal, framework-agnostic sketch of that transform in TypeScript (types and names are illustrative):
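```ts
// Illustrative shapes for the raw CDC event and the tidied row.
interface CdcEvent {
  op: "c" | "u" | "d";
  id: number;
  name: string | null;
  email: string | null;
  balance: number | null;
  lsn: number; // Postgres Log Sequence Number
}

interface TidiedRow {
  id: number;
  name: string | null;
  email: string | null;
  balance: number | null;
  ver: number;       // monotonic version, derived from the LSN
  is_deleted: 0 | 1; // explicit tombstone flag
}

// Grab the relevant columns, derive ver from the LSN,
// and call out deletes explicitly.
function tidy(event: CdcEvent): TidiedRow {
  return {
    id: event.id,
    name: event.name,
    email: event.email,
    balance: event.balance,
    ver: event.lsn,
    is_deleted: event.op === "d" ? 1 : 0,
  };
}
```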
Then, we’ll write them to ClickHouse.
The behaviour of a ClickHouse table is defined by the engine used. We’ll compare the following engines, their behaviour when writing the above CDC data in, and how you’d query from them:
MergeTree is a strong default, but not for CDC
MergeTree is ClickHouse’s default table engine, and it provides a great balance of fast inserts and optimized queries. However, it does not update rows automatically.
- So, inserts are fine: a new row is simply created.
- Updates will be written as new rows, so you’ll have to deduplicate these rows in your consumption queries by selecting the max `ver` or latest `ts_ms` per `id`, as in the sketch after this list. Note, it is the ORDER BY key that is the subject of deduplication; here it just happens to be called `id`.
- Deletes are not dealt with automatically. Either you need to run a slow `ALTER TABLE … DELETE` mutation, or you need to add logic around the `is_deleted` column in your queries.
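A dedup-on-read sketch (table and column names are illustrative):

```sql
-- Latest value per id by highest ver; drop ids whose
-- latest version is a delete.
SELECT
    id,
    argMax(name, ver) AS name,
    argMax(email, ver) AS email,
    argMax(balance, ver) AS balance
FROM user_balances
GROUP BY id
HAVING argMax(is_deleted, ver) = 0;
```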
| Pros | Cons |
|------|------|
| Fastest writes | Slow reads if you want to deduplicate your data. Easy to make mistakes in your queries or subqueries. Manual deletes if you need real deletes. |
Moose OLAP support
Any OLAP table created with Moose OLAP by default uses MergeTree. Just create your OlapTable using defaults (a sketch, with a schema mirroring the tidied rows above):
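```ts
import { Key, OlapTable } from "@514labs/moose-lib";

// Schema mirroring the tidied CDC rows above.
interface UserBalance {
  id: Key<number>;
  name: string;
  email: string;
  balance: number;
  ver: number;
  is_deleted: number;
}

// No engine specified, so Moose defaults to MergeTree.
export const userBalances = new OlapTable<UserBalance>("user_balances");
```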
ReplacingMergeTree is strong for most CDC use-cases
ReplacingMergeTree is becoming the community standard for use in CDC. Essentially, it extends the MergeTree engine: duplicate rows are cleaned up on merge. It always keeps the newest version of a row, discarding the older duplicates during background merges.
ReplacingMergeTree deduplicates by the ORDER BY (sorting) key. Among rows with the same sorting key, it keeps the highest ver (or the last inserted if no ver is configured). If you also pass an is_deleted column, a delete is represented as a tombstone (latest version with is_deleted=1). Queries then filter WHERE is_deleted=0 for active rows; physical cleanup of tombstones requires OPTIMIZE … FINAL CLEANUP with the appropriate setting enabled.
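In plain ClickHouse DDL, that configuration looks something like this (table and column names are illustrative):

```sql
CREATE TABLE user_balances
(
    id UInt64,
    name String,
    email String,
    balance Decimal(18, 2),
    ver UInt64,
    is_deleted UInt8
)
ENGINE = ReplacingMergeTree(ver, is_deleted)
ORDER BY id;
```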
- Inserts are just inserts
- Updates are inserted as duplicate rows initially. On a merge, the rules above dictate which rows are discarded. This means that if you query before a merge, there is a chance you see duplicate data. Use SELECT … FINAL to deduplicate on read (expensive), or OPTIMIZE TABLE … FINAL to force a merge immediately (heavy).
- Deletes are signalled with the is_deleted flag. On a merge, all previous versions are deleted. In order not to see the deleted row, you must either filter for this flag in your query, or you can run OPTIMIZE … FINAL CLEANUP, which gets rid of these tombstone rows.
Safe query (a sketch against the table above):
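```sql
-- FINAL deduplicates at read time (expensive on large tables);
-- the is_deleted filter hides tombstones that haven't merged away yet.
SELECT id, name, email, balance
FROM user_balances FINAL
WHERE is_deleted = 0;
```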
| Pros | Cons |
|------|------|
| Fast writes. Very easy implementation. | Stale versions are visible and queryable until a merge occurs. You need to be aware of the tombstone approach and have the appropriate fields in your ingested data. There are edge cases where you need to understand how your data is partitioned to ensure “finality” of your data. |
Moose OLAP support
To set up an OlapTable with ReplacingMergeTree, we use ClickHouseEngines.ReplacingMergeTree (a sketch, reusing the UserBalance schema above):
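```ts
import { ClickHouseEngines, OlapTable } from "@514labs/moose-lib";

// Reuses the UserBalance interface from the MergeTree example.
export const userBalances = new OlapTable<UserBalance>("user_balances", {
  engine: ClickHouseEngines.ReplacingMergeTree,
  orderByFields: ["id"],   // deduplication is keyed on the sorting key
  ver: "ver",              // the highest ver wins on merge
  isDeleted: "is_deleted", // tombstone flag for deletes
});
```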
This requires an ORDER BY to be specified, and has optional ver and isDeleted parameters for controlling deduplication with more granularity.
CollapsingMergeTree
CollapsingMergeTree uses a Sign column (±1) to cancel rows during merges (in what feels a little like double-entry accounting).
- Inserts are inserts, and have a sign = 1 (this is a live row)
- When you get an update, you create two rows: a negative row (sign = -1) with the same data as the previous version of the row, mathematically cancelling it out, and a positive row (sign = 1) representing the new version.
- Deletes just create a negative row.
On merge, pairs of positive and negative rows are removed.
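In ClickHouse DDL, a collapsing table looks something like this (names are illustrative):

```sql
CREATE TABLE user_balances_collapsing
(
    id UInt64,
    name String,
    email String,
    balance Decimal(18, 2),
    lsn UInt64,
    ts_ms DateTime64(3),
    sign Int8
)
ENGINE = CollapsingMergeTree(sign)
ORDER BY id;
```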
This is more easily explained with a worked example (I’ll use the same data as above).
First transaction (create) creates the row below:
| id | name | email | balance | lsn | ts_ms | sign |
|----|------|-------|---------|-----|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
Second transaction (update) creates the second and third rows below:
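| id | name | email | balance | lsn | ts_ms | sign |
|----|------|-------|---------|-----|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001500 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |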
At this moment, if you sum the balance, taking the sign into account, you get the correct balance of 125 (the first and second rows cancel).
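Third transaction (the second update) creates the fourth and fifth rows below:

| id | name | email | balance | lsn | ts_ms | sign |
|----|------|-------|---------|-----|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001500 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |
| 1 | Alice | alice@example.com | 125.00 | 100002000 | 2025-10-14T14:00:20Z | -1 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 1 |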
Fourth transaction (delete) creates the sixth row below:
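| id | name | email | balance | lsn | ts_ms | sign |
|----|------|-------|---------|-----|-------|------|
| 1 | Alice | alice@example.com | 100.00 | 100001000 | 2025-10-14T14:00:01Z | 1 |
| 1 | Alice | alice@example.com | 100.00 | 100001500 | 2025-10-14T14:00:10Z | -1 |
| 1 | Alice | alice@example.com | 125.00 | 100001500 | 2025-10-14T14:00:10Z | 1 |
| 1 | Alice | alice@example.com | 125.00 | 100002000 | 2025-10-14T14:00:20Z | -1 |
| 1 | Alice | alice@example.com | 150.00 | 100002000 | 2025-10-14T14:00:20Z | 1 |
| 1 | Alice | alice@example.com | 150.00 | 100002500 | 2025-10-14T14:00:30Z | -1 |

Summing balance * sign now gives 0, matching the deleted state, and sum(sign) for the id is 0, so the rows collapse away entirely on merge.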
For aggregates, computing with sign (e.g., SUM(balance * sign)) yields the correct result immediately. For row-level views, you’ll still need FINAL (or a GROUP BY that encodes the collapsing) until background merges complete.
Safe query (a sketch against the collapsing table above):
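```sql
-- Aggregate with sign so uncancelled pairs net out;
-- HAVING sum(sign) > 0 drops ids whose latest state is deleted.
SELECT
    id,
    sum(balance * sign) AS balance
FROM user_balances_collapsing
GROUP BY id
HAVING sum(sign) > 0;
```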
| Pros | Cons |
|------|------|
| Explicit deletes. Immediately correct for certain queries like SUM(value * Sign). | Complex to use and query. Only immediately correct for certain queries, not arbitrary SELECT *s. Shifts complexity onto the CDC / data pipeline / preparation layer (you need to know or query for the previous version to be able to write the negative row). “Immediate correctness” requires you don’t have bad luck with partitions. The engine doesn’t track a global version, which can lead to inconsistency. |
VersionedCollapsingMergeTree
This engine extends the CollapsingMergeTree engine to solve that final con on the list, adding a version column to the collapsing algorithm so the engine can track a global version.
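In DDL, the only change from the collapsing table above is the added version parameter (illustrative):

```sql
CREATE TABLE user_balances_versioned
(
    id UInt64,
    name String,
    email String,
    balance Decimal(18, 2),
    ver UInt64,
    sign Int8
)
ENGINE = VersionedCollapsingMergeTree(sign, ver)
ORDER BY id;
```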
Moose OLAP support
Given the flexibility of ReplacingMergeTree, we haven’t seen demand for support for CollapsingMergeTree or VersionedCollapsingMergeTree. Interested in having it supported? Let us know.
For almost every CDC use-case, ReplacingMergeTree will be the perfect balance of fast write speed, low write complexity and correctness (albeit eventual correctness). For most analytical use-cases, the potential temporary incorrectness is marginal compared to the size of data being queried. If you need perfect row level data immediately, it may be that your OLTP source would be the better database to query.
If you need strict correctness under out‑of‑order ingestion or in replicated clusters with frequent updates/deletes, VersionedCollapsingMergeTree is the safest collapsing choice, but it introduces operational complexity.
In OLAP databases (specifically, ClickHouse with the table engines analysed above), every transaction is just writing a new row. The way the transactions are dealt with on merge depends on the engine, and you can do some operator math to ensure correctness before a merge.
But all rows are still writes. You can get around this, but you pay the cost of that in complexity at write time or query time.
In tomorrow’s article, we’ll show you how to navigate these complexities in the CDC from OLTP use-case.