Show HN: datarepo – a data catalog without running a service or database
datarepo is a simple query interface for multimodal data at any scale.
With datarepo, you can define a catalog, databases, and tables to query any existing data source. Once you've defined your catalog, you can spin up a static site for easy browsing or a read-only API for programmatic access. No running servers or services!
The datarepo catalog has native, declarative connectors to Delta Lake and Parquet stores. datarepo also supports defining tables via custom Python functions, so you can connect to any data source!
Here's an example catalog:
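What follows is only a minimal sketch of the idea, written in plain Python with no dependencies. It does not use datarepo's actual API: `Catalog`, `Database`, `FunctionTable`, the `table` decorator, and the sample rows are all hypothetical names made up for illustration. It shows how a catalog can be declared as ordinary Python objects, with a table backed by a custom function, and queried without any service running.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, object]


@dataclass
class FunctionTable:
    """Hypothetical table whose rows come from a plain Python function."""
    name: str
    loader: Callable[[], Iterable[Row]]
    description: str = ""

    def query(self, where: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        # Rows are loaded lazily, only when the table is queried.
        return [row for row in self.loader() if where is None or where(row)]


@dataclass
class Database:
    """Hypothetical named group of tables."""
    name: str
    tables: Dict[str, FunctionTable] = field(default_factory=dict)


def table(database: Database, description: str = ""):
    """Hypothetical decorator: register a function as a table in `database`."""
    def register(fn: Callable[[], Iterable[Row]]) -> FunctionTable:
        tbl = FunctionTable(fn.__name__, fn, description)
        database.tables[tbl.name] = tbl
        return tbl
    return register


@dataclass
class Catalog:
    """Hypothetical top-level catalog: a mapping of database names to databases."""
    databases: Dict[str, Database]

    def db(self, name: str) -> Database:
        return self.databases[name]

    def listing(self) -> Dict[str, List[str]]:
        # The kind of summary a static catalog page could be rendered from.
        return {name: sorted(db.tables) for name, db in self.databases.items()}


# Declare the catalog. Nothing is started; these are just Python objects.
recordings = Database("recordings")


@table(recordings, description="Per-session metadata (made-up sample rows)")
def sessions() -> Iterable[Row]:
    yield {"session_id": 1, "duration_s": 120}
    yield {"session_id": 2, "duration_s": 45}
    yield {"session_id": 3, "duration_s": 300}


catalog = Catalog({"recordings": recordings})

# Query the function-backed table with a simple row predicate.
long_sessions = catalog.db("recordings").tables["sessions"].query(
    lambda row: row["duration_s"] > 100
)
```

In this sketch the catalog is just data in a module, so browsing it (`catalog.listing()`) and querying it need nothing beyond a Python interpreter. A real connector to Parquet or Delta Lake would replace the generator with a function that reads from storage.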
Unified interface: Query data across different storage modalities (Parquet, Delta Lake, relational databases)
Declarative catalog syntax: Define catalogs in Python without running services
Catalog site generation: Generate a static site catalog for visual browsing
Extensible: Declare tables as custom Python functions for querying any data
API support: Generate a YAML config for querying with ROAPI
datarepo is part of Neuralink's commitment to the open source community. By maintaining free and open source software, we aim to accelerate data engineering and biotechnology.
Neuralink is creating a generalized brain interface to restore autonomy to those with unmet medical needs today, and to unlock human potential tomorrow.
You don't have to be a brain surgeon to work at Neuralink. We are looking for exceptional individuals from many fields, including software and data engineering. Learn more at neuralink.com/careers.