fenic is a PySpark-inspired DataFrame framework designed for building production AI and agentic applications. It includes built-in support for reading datasets directly from the Hugging Face Hub.

Getting Started
To get started, pip install fenic:
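```shell
pip install fenic
```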
Create a Session
Instantiate a fenic session with the default configuration (sufficient for reading datasets and other non-semantic operations):
Overview
fenic is an opinionated data processing framework that combines:
- DataFrame API: PySpark-inspired operations for familiar data manipulation
- Semantic Operations: Built-in AI/LLM operations including semantic functions, embeddings, and clustering
- Model Integration: Native support for AI providers (Anthropic, OpenAI, Cohere, Google)
- Query Optimization: Automatic optimization through logical plan transformations
Read from Hugging Face Hub
fenic can read datasets directly from the Hugging Face Hub using the hf:// protocol. This functionality is built into fenic’s DataFrameReader interface.
Supported Formats
fenic supports reading the following formats from Hugging Face:
- Parquet files (.parquet)
- CSV files (.csv)
Reading Datasets
To read a dataset from the Hugging Face Hub:
Reading with Schema Management
Note: In fenic, a schema is the set of column names and their data types. When you enable merge_schemas, fenic tries to reconcile differences across files by filling missing columns with nulls and widening types where it can. Some layouts still cannot be merged; consult the fenic docs for CSV schema merging limitations and Parquet schema merging limitations.
Authentication
To read private datasets, you need to set your Hugging Face token as an environment variable:
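For example (the token value below is a placeholder):

```shell
# Placeholder token; create a real one in your Hugging Face account settings.
export HF_TOKEN="hf_xxxxxxxxxxxxxxxxx"
```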
Path Format
The Hugging Face path format in fenic follows this structure:
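In outline (the brace-delimited segments are placeholders):

```
hf://datasets/{owner}/{dataset-name}/{path-or-glob}
```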
You can also specify dataset revisions or versions:
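A revision is appended to the dataset name with @ notation (placeholders again):

```
hf://datasets/{owner}/{dataset-name}@{revision}/{path-or-glob}
```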
Features:
- Supports glob patterns (*, **)
- Dataset revisions/versions using @ notation:
  - Specific commit: @d50d8923b5934dc8e74b66e6e4b0e2cd85e9142e
  - Branch: @refs/convert/parquet
  - Branch alias: @~parquet
- Requires HF_TOKEN environment variable for private datasets
Mixing Data Sources
fenic allows you to combine multiple data sources in a single read operation, including mixing different protocols:
This lets you combine data from the Hugging Face Hub and local files in a single data processing pipeline.
Processing Data from Hugging Face
Once loaded from Hugging Face, you can use fenic’s full DataFrame API:
Basic DataFrame Operations
AI-Powered Operations
To use semantic and embedding operations, configure language and embedding models in your SessionConfig. Once configured: