Can We ScalaSQL on DuckDB?


Emboldened by the earlier success, I decided to check whether the Arrow methods worked.

I wrote a script to create an Arrow vector. The main interest of that code is that it's a good reminder of how verbose Java can be; as such, its contents are omitted. However, there is something that bit me: you need to generate a full Arrow VectorSchemaRoot. If you just try to load, say, a bare IntVector, DuckDB will throw an error, as it seems to require a Schema.
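The script itself is omitted, but a minimal sketch of the step that bit me looks like the following. The file path and field name are assumptions chosen to match the later ArrowTest.scala; the key point is wrapping the IntVector in a VectorSchemaRoot (which carries the Schema) before writing:

```scala
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.{IntVector, VectorSchemaRoot}
import org.apache.arrow.vector.ipc.ArrowFileWriter
import java.io.FileOutputStream

@main def writeVector = {
  val allocator = RootAllocator()
  val vector = IntVector("integers", allocator)
  vector.allocateNew(3)
  vector.set(0, 1); vector.set(1, 2); vector.set(2, 3)
  vector.setValueCount(3)

  // Wrap the vector in a VectorSchemaRoot: DuckDB rejects a bare
  // IntVector because it needs the accompanying Schema.
  val root = VectorSchemaRoot.of(vector)
  root.setRowCount(3)

  val out = FileOutputStream("data/test_file.arrow")
  val writer = ArrowFileWriter(root, null, out.getChannel())
  writer.start()
  writer.writeBatch()
  writer.end()
  writer.close()
  out.close()
}
```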

The other relevant bit of this setup: if you want to use Arrow, you have to pass the --add-opens JVM option. I have not found a way to do this directly with scala-cli, so I have this config script.
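The config script itself isn't shown; a minimal sketch of one way to do this is to export the flag through JDK_JAVA_OPTIONS, which the java launcher reads on its own, so it reaches the JVM however scala-cli ends up spawning it (the module/package pair is the one Arrow needs for its memory code on recent JDKs):

```shell
#!/usr/bin/env bash
# Arrow reflects into java.nio internals, which modern JDKs block by
# default; --add-opens re-opens the package at runtime. JDK_JAVA_OPTIONS
# is picked up by the `java` launcher itself.
export JDK_JAVA_OPTIONS="--add-opens=java.base/java.nio=ALL-UNNAMED"
scala-cli run ArrowTest.scala
```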

With Arrow properly configured and the vector saved to disk, we can replicate this section of DuckDB's docs using ScalaSQL. We again define our table as a case class, then load and register the Arrow vector with the database. Finally, we can query the vector through ScalaSQL. Find the code in ArrowTest.scala:

```scala
import scalasql._, PostgresDialect._
import org.apache.arrow.c.ArrowArrayStream
import org.apache.arrow.c.Data
import org.apache.arrow.vector.ipc.ArrowFileReader
import org.apache.arrow.memory.RootAllocator
import java.io.File
import java.io.FileInputStream

val allocator = RootAllocator()

// The table definition: a case class with a higher-kinded type
// parameter, as ScalaSQL expects.
case class Asdf[T[_]](
  integers: T[Int]
)
object Asdf extends Table[Asdf]

@main def arrowTest = {
  val file = File("data/test_file.arrow")
  val inputStream = FileInputStream(file)
  val reader = ArrowFileReader(inputStream.getChannel(), allocator)
  // Export the file reader as a C-data ArrowArrayStream, which is what
  // DuckDB's registerArrowStream expects.
  val arrowStream = ArrowArrayStream.allocateNew(allocator)
  Data.exportArrayStream(allocator, reader, arrowStream)
  // `conn` (the DuckDB JDBC connection) and `db` (the ScalaSQL client)
  // come from the earlier setup.
  conn.registerArrowStream("asdf", arrowStream)
  println(db.run(Asdf.select))
}
```