⚠️ Alpha Release: This is currently in alpha. APIs may change between releases. This is not production-ready software. For production use, consider waiting for the stable release or pinning to a specific alpha version.
A comprehensive, high-performance Apache Arrow Flight streaming framework for Node.js that enables efficient, real-time data streaming across distributed systems. Built with a modular plugin architecture, FlightStream provides both server-side streaming capabilities and client-side data access patterns, making it ideal for modern data pipelines, analytics applications, and microservices architectures.
- Data Engineering: Stream CSV files to analytics engines (Apache Spark, DuckDB, Pandas)
- API Modernization: Replace REST APIs with efficient columnar data transfer
- Real-time Analytics: Power dashboards and BI tools with live data streams
- Microservices: Enable high-performance data sharing between services
- Multi-language Integration: Connect applications written in different programming languages
- Extensible adapter system for any data source: CSV, databases, cloud storage
- Efficient gRPC streaming with Apache Arrow's columnar data format
- Comprehensive error handling, monitoring hooks, and Docker support
- Connect from Python, Java, C++, and JavaScript using standard Arrow Flight clients
- Automatic Arrow schema detection from CSV files with type optimization
- Efficient streaming of large datasets with configurable batch sizes
- Rich examples, comprehensive documentation, and easy setup
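To give a feel for how automatic schema detection can work, here is a minimal, illustrative sketch that infers a column's Arrow-style type from sampled string values. The function names (`classify`, `inferColumnType`) and the inference rules are assumptions for illustration, not FlightStream's actual adapter API:

```javascript
// Hypothetical sketch of CSV type inference: classify each sampled value,
// then pick the narrowest type that fits every value in the column.
function classify(v) {
  const s = v.trim();
  if (s === '') return 'null';
  if (s === 'true' || s === 'false') return 'bool';
  if (/^-?\d+$/.test(s)) return 'int64';
  if (/^-?\d+\.\d+$/.test(s)) return 'float64';
  return 'utf8';
}

function inferColumnType(sampleValues) {
  const kinds = new Set(sampleValues.map(classify));
  kinds.delete('null');                    // empty cells don't constrain the type
  if (kinds.size === 0) return 'utf8';     // all-empty column: fall back to strings
  if (kinds.size === 1) return kinds.values().next().value;
  if ([...kinds].every(k => k === 'int64' || k === 'float64')) return 'float64';
  return 'utf8';                           // mixed kinds: widen to strings
}
```

For example, `inferColumnType(['1', '2', '3'])` yields `'int64'`, while mixing in `'2.5'` widens the column to `'float64'`.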
The server automatically discovers CSV files in the data/ directory and serves them via Arrow Flight protocol.
That's it! The test client will connect and display the streamed data in real time. As you can see, a CSV with ~41k rows is streamed to the client in 0.25s!
The test client will connect and stream the dataset specified by its dataset id in real time. In the example above, a CSV with ~800k rows is streamed to the client in under 4s!
- Flight Server: Started on localhost:8080 with CSV adapter
- Sample Data: Automatically discovered from ./data/ directory
- Test Client: Connected via gRPC and streamed Arrow data
- Live Reload: Server restarts automatically when you modify code
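The streaming step sends rows in configurable batch sizes rather than one giant message. The slicing logic can be sketched as below; the generator name and the default size are assumptions for illustration, not documented FlightStream values:

```javascript
// Illustrative sketch: slice a row buffer into fixed-size batches, as a
// server might do before encoding each slice as an Arrow record batch.
function* toBatches(rows, batchSize = 10000) {
  for (let i = 0; i < rows.length; i += batchSize) {
    yield rows.slice(i, i + batchSize);
  }
}
```

Larger batches amortize per-message overhead and improve throughput; smaller batches reduce latency and peak memory on both ends.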
The monorepo contains focused, reusable packages:
- Data Lakes: Serve files efficiently from S3, GCS, Snowflake, or local storage
- Analytics Pipelines: Stream data to Apache Spark, DuckDB, or custom analytics
- Real-time ETL: High-performance data transformation and streaming
- API Modernization: Replace REST APIs with efficient columnar data transfer for real-time analytics products
- Multi-language Integration: Connect Python, Java, C++, and JavaScript applications
Bug fixes, enhancements, optimizations, docs, anything!
Complete API documentation and examples
Core architecture diagrams and design patterns
The project includes working examples:
- Basic Server (examples/basic-server/): Complete CSV server implementation
- Basic Client (examples/basic-client/): Client with connection management and streaming
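Connection management in a client typically includes retrying a failed connect. The wrapper below captures that core pattern in a simplified, synchronous form; it is a hypothetical helper, not the basic-client example's actual code:

```javascript
// Illustrative retry pattern: attempt connect() up to `attempts` times,
// rethrowing the last error if every attempt fails.
function withRetry(connect, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return connect();
    } catch (err) {
      lastError = err;           // remember the failure and try again
    }
  }
  throw lastError;
}
```

A real client would add a backoff delay between attempts and distinguish retryable errors (e.g. connection refused) from fatal ones.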
- GitHub: ggauravr/flightstream
- Issues: Report bugs and request features
- Discussions: Community discussions
- Contributions: Please see the Contributing Guide for details
This project is licensed under the MIT License.
- Apache Arrow for the columnar data format
- DuckDB for the embedded analytical database and the mind-blowing single-node performance
- gRPC for the high-performance RPC framework
- Apache Arrow Flight for the amazing message transfer protocol