FlightStream – An Arrow Flight-Based Server and Client Framework for Node.js



⚠️ Alpha Release: This is currently in alpha. APIs may change between releases. This is not production-ready software. For production use, consider waiting for the stable release or pinning to a specific alpha version.

A comprehensive, high-performance Apache Arrow Flight streaming framework for Node.js that enables efficient, real-time data streaming across distributed systems. Built with a modular plugin architecture, FlightStream provides both server-side streaming capabilities and client-side data access patterns, making it ideal for modern data pipelines, analytics applications, and microservices architectures.

See It in Action: Streaming a 45 MB CSV file with ~850K rows in under 4 seconds

[Image: FlightStream Demo]

  • Data Engineering: Stream CSV files to analytics engines (Apache Spark, DuckDB, Pandas)
  • API Modernization: Replace REST APIs with efficient columnar data transfer
  • Real-time Analytics: Power dashboards and BI tools with live data streams
  • Microservices: Enable high-performance data sharing between services
  • Multi-language Integration: Connect applications written in different programming languages

Extensible adapter system for any data source: CSV, databases, cloud storage

Efficient gRPC streaming with Apache Arrow's columnar data format

Comprehensive error handling, monitoring hooks, and Docker support

Connect from Python, Java, C++, JavaScript using standard Arrow Flight clients

Automatic Arrow schema detection from CSV files with type optimization
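The README does not describe how this detection works. As a rough illustration only, the sketch below shows one common approach: look at sample string values per column and pick the narrowest Arrow type name that fits all of them. The function names (`inferColumnType`, `inferSchema`, `widen`) and the widening rules are assumptions made for this example, not FlightStream's actual implementation.

```javascript
// Hypothetical sketch of CSV type inference; not FlightStream's code.
const INT_RE = /^[+-]?\d+$/;
const FLOAT_RE = /^[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?$/;

// Combine the type seen so far with the type of the next value.
function widen(current, next) {
  if (current === null || current === next) return next;
  const numeric = new Set(["int64", "float64"]);
  // Mixed integers and floats still fit in a float column.
  if (numeric.has(current) && numeric.has(next)) return "float64";
  // Anything else falls back to a string column.
  return "utf8";
}

// Infer one column's type from its sample values (all strings).
function inferColumnType(values) {
  let type = null;
  for (const raw of values) {
    const v = raw.trim();
    if (v === "") continue; // empty cells are treated as nulls and ignored
    let t;
    if (INT_RE.test(v)) t = "int64";
    else if (FLOAT_RE.test(v)) t = "float64";
    else if (/^(true|false)$/i.test(v)) t = "bool";
    else t = "utf8";
    type = widen(type, t);
    if (type === "utf8") break; // cannot get any wider
  }
  return type ?? "utf8"; // all-empty columns default to strings
}

// Build a list of { name, type } fields from a header row and sample rows.
function inferSchema(header, rows) {
  return header.map((name, i) => ({
    name,
    type: inferColumnType(rows.map((row) => row[i] ?? "")),
  }));
}

const schema = inferSchema(
  ["id", "price", "name"],
  [["1", "2.5", "a"], ["2", "3", "b"], ["3", "", "c"]]
);
console.log(schema);
```

Sampling only the first rows keeps inference cheap, at the cost of possibly guessing too narrow a type when later rows differ.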

Efficient streaming of large datasets with configurable batch sizes
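To make the idea of a configurable batch size concrete, here is a minimal sketch of grouping rows into fixed-size batches with a generator. The name `toBatches` and the default of 1000 rows are assumptions for illustration; FlightStream's real option names and defaults are not stated here.

```javascript
// Hypothetical sketch: group any iterable of rows into arrays of at most
// `batchSize` rows, so only one batch is held in memory at a time.
function* toBatches(rows, batchSize = 1000) {
  // Note: generator bodies run lazily, so this check fires on first iteration.
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new RangeError("batchSize must be a positive integer");
  }
  let batch = [];
  for (const row of rows) {
    batch.push(row);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly smaller, batch.
  if (batch.length > 0) yield batch;
}

const batches = [...toBatches([1, 2, 3, 4, 5], 2)];
console.log(batches.length); // 3
```

Larger batches generally mean fewer messages on the wire but more memory per batch; smaller batches deliver the first rows to the client sooner.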

Rich examples, comprehensive documentation, and easy setup

    # Clone and install
    git clone https://github.com/ggauravr/flightstream.git
    cd flightstream
    npm install

    # Start the example server
    npm start

    # Test with the first dataset found in the data/ directory
    npm test

    # Test with a specific dataset
    npm test <dataset>

The server automatically discovers CSV files in the data/ directory and serves them via the Arrow Flight protocol.
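Directory-based discovery of this kind can be sketched with Node's standard `fs` and `path` modules. The function name `discoverDatasets` and the choice to use the file name without its extension as the dataset ID are assumptions for this example; the server's actual discovery logic may differ.

```javascript
// Hypothetical sketch of CSV discovery; not FlightStream's code.
const fs = require("node:fs");
const path = require("node:path");

// Return one { id, file } entry per .csv file directly inside dataDir.
function discoverDatasets(dataDir) {
  return fs
    .readdirSync(dataDir, { withFileTypes: true })
    .filter(
      (entry) =>
        entry.isFile() && path.extname(entry.name).toLowerCase() === ".csv"
    )
    .map((entry) => ({
      // Assumed convention: dataset ID is the file name minus its extension.
      id: path.basename(entry.name, path.extname(entry.name)),
      file: path.join(dataDir, entry.name),
    }))
    .sort((a, b) => a.id.localeCompare(b.id));
}
```

Calling `discoverDatasets("./data")` would then return an array that a server could expose as its list of available datasets.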

Server Terminal (npm run dev):

[Image: FlightStream Server Running]

Client Terminal (npm test):

[Image: FlightStream Client Streaming Data]

That's it! The server discovers CSV files in the data/ directory and streams them via the Arrow Flight protocol. The test client connects and displays the streamed data in real time. In the run shown above, a CSV with ~41k rows is streamed to the client in 0.25s!

Client Terminal with a Specific Dataset (npm test MARC2020-County-01):

[Image: FlightStream Client Streaming Data]

The test client connects and displays, in real time, the streamed data for the dataset ID given on the command line. In the example above, a CSV with ~800k rows is streamed to the client in under 4s!
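For a rough sense of scale, the quoted figures can be converted to rows per second. This is plain arithmetic on the approximate numbers above, not a benchmark; the helper name `rowsPerSecond` is introduced only for this example.

```javascript
// Convert a row count and a duration in seconds into rows per second.
function rowsPerSecond(rows, seconds) {
  return rows / seconds;
}

// ~41k rows in 0.25s
console.log(rowsPerSecond(41_000, 0.25)); // 164000

// ~800k rows in under 4s: treating the time as exactly 4s gives a lower bound
console.log(rowsPerSecond(800_000, 4)); // 200000
```

Both runs land in the same range of roughly 150k to 200k rows per second, keeping in mind that the inputs are rounded.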

  • Flight Server: Started on localhost:8080 with CSV adapter
  • Sample Data: Automatically discovered from ./data/ directory
  • Test Client: Connected via gRPC and streamed Arrow data
  • Live Reload: Server restarts automatically when you modify code

The monorepo contains focused, reusable packages:

  • Data Lakes: Serve files efficiently from S3, GCS, Snowflake, or local storage
  • Analytics Pipelines: Stream data to Apache Spark, DuckDB, or custom analytics
  • Real-time ETL: High-performance data transformation and streaming
  • API Modernization: Replace REST APIs with efficient columnar data transfer for real-time analytics products
  • Multi-language Integration: Connect Python, Java, C++, and JavaScript applications

Bug fixes, enhancements, optimizations, docs, anything!

Complete API documentation and examples

Core architecture diagrams and design patterns

The project includes working examples:

  • Basic Server (examples/basic-server/): Complete CSV server implementation
  • Basic Client (examples/basic-client/): Client with connection management and streaming

This project is licensed under the MIT License.

  • Apache Arrow for the columnar data format
  • DuckDB for the embedded analytical database and the mind-blowing single-node performance
  • gRPC for the high-performance RPC framework
  • Apache Arrow Flight for the amazing message transfer protocol