I ditched Java's Sort for DuckDB's because it's quacker

3 weeks ago 2

Source code available at the end of this article.

The experiment

I compare three different implementations of sort:

Collections.sort(). This implementation is single-threaded. The list is modified after the operation. For that reason, a new copy of the list is created each time.
List#stream().parallel().sorted().toList(). This implementation is multi-threaded making it fair to be compared with DuckDB’s sort which is also multi-threaded.
DuckDB’s sort. It will be done by creating a table with one single column

CREATE TABLE tbl (x INT)

Feeding this table with data (the list to be sorted, 1 row for each element) and executing the following query

SELECT x FROM tbl ORDER BY x

The sort operation looks like this

public <T> List<T> sort(int size, ThrowingFunction<FieldReader, T> columnReader) {
try {
var stmt = this.conn.prepareStatement("SELECT x FROM tbl ORDER BY x");
var resultset = (DuckDBResultSet) stmt.executeQuery();
var allocator = new RootAllocator();
List<T> r = new ArrayList<>(size);
try (var reader = (ArrowReader) resultset.arrowExportStream(allocator, 512)) {
while (reader.loadNextBatch()) {
FieldVector vector = reader.getVectorSchemaRoot().getVector(0);
FieldReader vectorReader = vector.getReader();
for (int i = 0; i < vector.getValueCount(); i++) {
vectorReader.setPosition(i);
r.add(columnReader.apply(vectorReader));
}
}
}
return r;
} catch (Exception e) {
throw new RuntimeException(e);
}
}

I tested sorting list of random integers (32-bits) and strings (from 4 to 16 characters) of different size.

Evaluating results

Results are determined by running the program on my local machine, a Mac Studio with Apple M2 Max chip, 32GB of Memory. DuckDB version 1.4.1, JDK version 25.

These benchmark results are just for fun! Don’t read too much into them; I wasn’t aiming for perfect accuracy or strict comparisons.

Results

Average time in milliseconds, lower is better for different list size, from 100 thousands to 100 million elements.

Press enter or click to view image in full size

+----------------------------+------+-----+------+---------+
| Sort type \ List size | 100K | 1M | 10M | 100M |
+----------------------------+------+-----+------+---------+
| DuckDB sort | 6 | 19 | 143 | 1827 |
| JDK parallel sort | 2 | 18 | 281 | 3911 |
| JDK sort | 11 | 148 | 2112 | > 30000 |
+----------------------------+------+-----+------+---------+

Above 10 million elements in the list, DuckDB’s sort is twice as fast as Java’s parallel sort!

Press enter or click to view image in full size

+-----------------------+------+-----+------+--------------+
| Sort type \ List size | 100K | 1M | 10M | 100M |
+-----------------------+------+-----+------+--------------+
| DuckDB sort | 16 | 61 | 538 | 8216 |
| JDK parallel sort | 3 | 36 | 578 | 7980 |
| JDK sort | 19 | 270 | 4277 | Not Measured |
+-----------------------+------+-----+------+--------------+

Using Strings shows a different result. The difference between the two type of sorts is surprisingly small. DuckDB’s sort is very impressive despite the added overhead of string recreation and JNI calls. (I voluntarily omit the fact that more garbage is generated 🙈).

Press enter or click to view image in full size

CPU Time