Datafusion-Postgres: Postgres protocol adapter for datafusion query engine
1 month ago
7
A PostgreSQL-compatible server for Apache DataFusion, supporting authentication, role-based access control, and SSL/TLS encryption. Available as both a library and CLI tool.
Built on pgwire to provide PostgreSQL wire protocol compatibility for analytical workloads.
It was originally an example of the pgwire
project.
datafusion-postgres: Postgres frontend for datafusion, as a library.
Serving Datafusion SessionContext with pgwire library
Customizible authentication
Permission control
Built-in pg_catalog tables
Built-in postgres functions for common meta queries
datafusion-postgres-cli: A cli tool starts a postgres compatible server for
datafusion supported file formats, just like python's SimpleHTTPServer.
arrow-pg: A data type mapping, encoding/decoding library for arrow and
postgres(pgwire) data types.
See auth.rs for complete implementation examples using DfAuthSource.
Supported Database Clients
Database Clients
psql
DBeaver
pgcli
VSCode SQLTools
BI
Metabase
The Library datafusion-postgres
The high-level entrypoint of datafusion-postgres library is the serve
function which takes a datafusion SessionContext and some server configuration
options.
use std::sync::Arc;use datafusion::prelude::SessionContext;use datafusion_postgres::{serve,ServerOptions};use datafusion_postgres::pg_catalog::setup_pg_catalog;// Create datafusion SessionContextlet session_context = Arc::new(SessionContext::new());// Configure your `session_context`// ...// Optional: setup pg_catalog schemasetup_pg_catalog(session_context,"datafusion")?;// Start the Postgres compatible server with SSL/TLSlet server_options = ServerOptions::new().with_host("127.0.0.1".to_string()).with_port(5432)// Optional: setup tls.with_tls_cert_path(Some("server.crt".to_string())).with_tls_key_path(Some("server.key".to_string()));serve(session_context,&server_options).await
Command-line tool to serve JSON/CSV/Arrow/Parquet/Avro files as PostgreSQL-compatible tables.
datafusion-postgres-cli 0.6.1
A PostgreSQL interface for DataFusion. Serve CSV/JSON/Arrow/Parquet files as tables.
USAGE:
datafusion-postgres-cli [OPTIONS]
FLAGS:
-h, --help Prints help information
-V, --version Prints version information
OPTIONS:
--arrow <arrow-tables>... Arrow files to register as table, using syntax `table_name:file_path`
--avro <avro-tables>... Avro files to register as table, using syntax `table_name:file_path`
--csv <csv-tables>... CSV files to register as table, using syntax `table_name:file_path`
-d, --dir <directory> Directory to serve, all supported files will be registered as tables
--host <host> Host address the server listens to [default: 127.0.0.1]
--json <json-tables>... JSON files to register as table, using syntax `table_name:file_path`
--parquet <parquet-tables>... Parquet files to register as table, using syntax `table_name:file_path`
-p <port> Port the server listens to [default: 5432]
--tls-cert <tls-cert> Path to TLS certificate file for SSL/TLS encryption
--tls-key <tls-key> Path to TLS private key file for SSL/TLS encryption
# Run with SSL/TLS encryption
datafusion-postgres-cli \
--csv data:sample.csv \
--tls-cert server.crt \
--tls-key server.key
# Run without encryption (development only)
datafusion-postgres-cli --csv data:sample.csv
Host a CSV dataset as a PostgreSQL-compatible table:
Loaded delhiclimate.csv as table climate
TLS not configured. Running without encryption.
Listening on 127.0.0.1:5432 (unencrypted)
🔐 Authentication: The default setup allows connections without authentication for development. For secure deployments, use DfAuthSource with standard pgwire authentication handlers (cleartext, MD5, or SCRAM). See auth.rs for implementation examples.
psql -h 127.0.0.1 -p 5432 -U postgres
postgres=>SELECTCOUNT(*) FROM climate;
count
-------1462
(1 row)
postgres=>SELECTdate, meantemp FROM climate WHERE meantemp >35LIMIT5;
date | meantemp
------------+----------2017-05-15 | 36.92017-05-16 | 37.92017-05-17 | 38.62017-05-18 | 37.42017-05-19 | 35.4
(5 rows)
postgres=>BEGIN;
BEGIN
postgres=>SELECTAVG(meantemp) FROM climate;
avg
------------------25.4955206557617
(1 row)
postgres=>COMMIT;
COMMIT
Loaded delhiclimate.csv as table climate
TLS enabled using cert: server.crt and key: server.key
Listening on 127.0.0.1:5432 with TLS encryption
If you like the idea of pgwire, datafusion-postgres and want to join the
development of the library, or its ecosystem integrations, extensions, you are
welcomed to join our developer mailing list: https://groups.io/g/pgwire-dev/