
fenic is an opinionated, PySpark-inspired DataFrame framework from typedef.ai for building AI and agentic applications. It transforms unstructured and structured data into insights using familiar DataFrame operations enhanced with semantic intelligence, with first-class support for markdown, transcripts, and semantic operators, plus efficient batch inference across any model provider.
fenic supports Python 3.10, 3.11, and 3.12.
fenic requires an API key from at least one LLM provider. Set the appropriate environment variable for your chosen provider:
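For example, a stdlib-only sanity check that at least one key is visible to the process; the variable names below follow each provider's usual convention and are assumptions here, so confirm the exact names fenic expects in its documentation:

```python
import os

# Common provider environment variables (names follow each provider's usual
# convention and are assumptions here, not confirmed against fenic's docs):
#   OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY / GEMINI_API_KEY
if not any(
    os.getenv(name)
    for name in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY", "GEMINI_API_KEY")
):
    raise RuntimeError("Set an API key for at least one supported LLM provider.")
```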
The fastest way to learn fenic is to work through the examples.
Below is a quick list of the examples in this repo:
| Example | Description |
|---------|-------------|
| Hello World! | Introduction to semantic extraction and classification using fenic's core operators through error log analysis. |
| Enrichment | Multi-stage DataFrames with template-based text extraction, joins, and LLM-powered transformations demonstrated via log enrichment. |
| Meeting Transcript Processing | Native transcript parsing, Pydantic schema integration, and complex aggregations shown through meeting analysis. |
| News Analysis | Analyze and extract insights from news articles using semantic operators and structured data processing. |
| Podcast Summarization | Process and summarize podcast transcripts with speaker-aware analysis and key point extraction. |
| Semantic Join | Instead of simple fuzzy matching, use fenic's powerful semantic join functionality to match data across tables. |
| Named Entity Recognition | Extract and classify named entities from text using semantic extraction and classification. |
| Markdown Processing | Process and transform markdown documents with structured data extraction and formatting. |
| JSON Processing | Handle complex JSON data structures with semantic operations and schema validation. |
| Feedback Clustering | Group and analyze feedback using semantic similarity and clustering operations. |
| Document Extraction | Extract structured information from various document formats using semantic operators. |
(Feel free to click any example above to jump right to its folder.)
fenic is an opinionated, PySpark-inspired DataFrame framework for building production AI and agentic applications.
Unlike traditional data tools retrofitted for LLMs, fenic's query engine is built from the ground up with inference in mind.
fenic brings the reliability of traditional data pipelines to AI workloads.
- Query engine designed from scratch for AI workloads, not retrofitted
- Automatic batch optimization for API calls
- Built-in retry logic and rate limiting
- Token counting and cost tracking
- `semantic.analyze_sentiment` - Built-in sentiment analysis
- `semantic.classify` - Categorize text with few-shot examples
- `semantic.extract` - Transform unstructured text into structured data with schemas (shown in the sketch after this list)
- `semantic.group_by` - Group data by semantic similarity
- `semantic.join` - Join DataFrames on meaning, not just values
- `semantic.map` - Apply natural language transformations
- `semantic.predicate` - Create natural-language predicates to filter rows
- `semantic.reduce` - Aggregate grouped data with LLM operations
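A minimal sketch of how these operators compose with ordinary DataFrame calls. The session setup and the exact signatures of `semantic.extract` and `semantic.classify` are assumptions based on fenic's published examples (class and parameter names such as `OpenAILanguageModel`, `rpm`, and `tpm` may differ between versions), so treat this as illustrative rather than canonical:

```python
import fenic as fc
from pydantic import BaseModel

# Hypothetical schema consumed by semantic.extract; field names are illustrative.
class Issue(BaseModel):
    component: str
    severity: str

# Session setup is an assumption based on fenic's examples; rpm/tpm show where
# the engine's rate limits are declared.
session = fc.Session.get_or_create(
    fc.SessionConfig(
        app_name="readme_sketch",
        semantic=fc.SemanticConfig(
            language_models={
                "mini": fc.OpenAILanguageModel(model_name="gpt-4o-mini", rpm=100, tpm=100_000)
            }
        ),
    )
)

logs = session.create_dataframe([
    {"raw": "ERROR auth-service: token validation failed for user 42"},
    {"raw": "WARN billing: webhook delivery retried"},
])

triaged = (
    logs.select(
        fc.col("raw"),
        fc.semantic.extract(fc.col("raw"), Issue).alias("issue"),                   # unstructured text -> schema
        fc.semantic.classify(fc.col("raw"), ["incident", "noise"]).alias("label"),  # categorize each row
    )
    .filter(fc.col("label") == "incident")
)
triaged.show()
```

Because evaluation is lazy, the engine can batch the underlying LLM calls when `show()` triggers execution.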
fenic goes beyond typical multimodal data types (audio, images) with specialized types built for text-heavy workloads:
- Markdown parsing and extraction as a first-class data type
- Transcript processing (SRT, generic formats) with speaker and timestamp awareness
- JSON manipulation with JQ expressions for nested data (see the JSON sketch after this list)
- Automatic text chunking with configurable overlap for long documents
- Multi-provider support (OpenAI, Anthropic, Gemini)
- Local and cloud execution backends
- Comprehensive error handling and logging
- Pydantic integration for type safety
- PySpark-compatible operations
- Lazy evaluation and query optimization (see the lazy-evaluation sketch after this list)
- SQL support for complex queries
- Seamless integration with existing data pipelines
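To make the JSON handling concrete, here is a hedged sketch that casts a string column to fenic's JSON type and drills into nested fields with a JQ expression. `fc.json.jq` and `JsonType` are drawn from fenic's documented type system, but treat the exact names and signatures as assumptions:

```python
import fenic as fc

# No language model is needed for pure JSON manipulation, so the session
# config stays minimal (setup details are assumptions; see the fenic docs).
session = fc.Session.get_or_create(fc.SessionConfig(app_name="json_sketch"))

events = session.create_dataframe([
    {"payload": '{"user": {"name": "Ada", "plan": "pro"}, "action": "login"}'},
    {"payload": '{"user": {"name": "Lin", "plan": "free"}, "action": "logout"}'},
])

# Cast the raw string to the JSON type, then query nested fields with JQ.
plans = events.select(
    fc.json.jq(fc.col("payload").cast(fc.JsonType), ".user.plan").alias("plan"),
)
plans.show()
```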
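And a short sketch of what lazy evaluation means in practice: transformations only assemble a query plan, and no work (including any LLM calls the plan contains) happens until an action such as `show()` runs. The `session.sql` placeholder syntax is likewise an assumption to verify against the docs:

```python
import fenic as fc

session = fc.Session.get_or_create(fc.SessionConfig(app_name="lazy_sketch"))
df = session.create_dataframe([
    {"ticket": "Refund request", "priority": 3},
    {"ticket": "Password reset", "priority": 1},
])

# Building the plan is cheap and has no side effects...
plan = df.filter(fc.col("priority") >= 2).select(fc.col("ticket"))

# ...execution happens only when an action is called.
plan.show()

# SQL over DataFrames; the {placeholder} interpolation shown here is assumed.
session.sql("SELECT ticket FROM {tickets} WHERE priority >= 2", tickets=df).show()
```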
AI and agentic applications are fundamentally pipelines and workflows - exactly what DataFrame APIs were designed to handle. Rather than reinventing patterns for data transformation, filtering, and aggregation, fenic leverages decades of proven engineering practices.
fenic creates a clear separation between heavy inference tasks and real-time agent interactions. By moving batch processing out of the agent runtime, you get:
- More predictable and responsive agents
- Better resource utilization with batched LLM calls
- Cleaner separation between planning/orchestration and execution
DataFrames aren't just for data practitioners. The fluent, composable API makes fenic approachable for any engineer:
- Chain operations naturally: `df.filter(...).semantic.group_by(...)`
- Mix imperative and declarative styles seamlessly
- Get started quickly with familiar patterns from pandas/PySpark or SQL
Join our community on Discord where you can connect with other users, ask questions, and get help with your fenic projects. Our community is always happy to welcome newcomers!
If you find fenic useful, consider giving us a ⭐ at the top of this repository. Your support helps us grow and improve the framework for everyone!
We welcome contributions of all kinds! Whether you're interested in writing code, improving documentation, testing features, or proposing new ideas, your help is valuable to us.
For developers planning to submit code changes, we encourage you to first open an issue to discuss your ideas before creating a Pull Request. This helps ensure alignment with the project's direction and prevents duplicate efforts.
Please refer to our contribution guidelines for detailed information about the development process and project setup.