With extrai, you can use LLMs to extract data from text documents, format it into a given SQLModel, and register it in your database.
The core of the library is its Consensus Mechanism. We make the same request multiple times, using the same or different providers, and then keep only the values whose agreement across the responses meets a certain threshold.
extrai also offers other features, such as generating SQLModels from a prompt and documents and generating few-shot examples. For complex, nested data, the library offers Hierarchical Extraction, which breaks the extraction down into manageable, hierarchical steps. It also includes built-in analytics to monitor performance and output quality.
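The idea behind the Consensus Mechanism can be illustrated with a minimal, stdlib-only sketch. It is not extrai's implementation: it assumes flat JSON objects, exact-match voting per field, and a hypothetical `field_consensus` function whose `threshold` is the fraction of revisions that must agree (a value is kept only if its share is strictly greater than the threshold).

```python
import json
from collections import Counter
from typing import Any


def field_consensus(revisions: list[dict[str, Any]], threshold: float = 0.5) -> dict[str, Any]:
    """Hypothetical sketch: keep a field's most common value if enough revisions agree.

    Assumes flat dicts and exact-match voting; extrai's real rules may differ.
    """
    result: dict[str, Any] = {}
    if not revisions:
        return result
    # Every field name that appears in at least one revision.
    fields = {key for rev in revisions for key in rev}
    for name in sorted(fields):
        values = [rev[name] for rev in revisions if name in rev]
        # Serialize values so lists/dicts can be counted as hashable keys.
        votes = Counter(json.dumps(v, sort_keys=True) for v in values)
        winner, count = votes.most_common(1)[0]
        # Share is measured against all revisions, so a missing field counts as disagreement.
        if count / len(revisions) > threshold:
            result[name] = json.loads(winner)
    return result


# Three revisions of the same extraction, e.g. from different providers.
revisions = [
    {"name": "Acme Corp", "year": 1999, "city": "Paris"},
    {"name": "Acme Corp", "year": 1999, "city": "Lyon"},
    {"name": "Acme Corp", "year": 1998, "city": "Nice"},
]
print(field_consensus(revisions))  # {'name': 'Acme Corp', 'year': 1999}
```

Here `city` is dropped because no value reaches the threshold, while `year` survives with two of three votes. Raising the threshold trades recall for precision.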
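One way to picture Hierarchical Extraction is as a parent-first traversal of a nested schema: the top-level model is extracted first, then its children, level by level, so that each step works on a smaller, simpler target. The sketch below only illustrates that assumption; the `schema` mapping and the `extraction_levels` helper are hypothetical and do not describe how extrai actually splits the work.

```python
from collections import deque


def extraction_levels(schema: dict[str, list[str]], root: str) -> list[list[str]]:
    """Hypothetical helper: group models into parent-first levels (breadth-first)."""
    levels: list[list[str]] = []
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        model, depth = queue.popleft()
        # BFS visits depths in order, so a new depth always starts a new level.
        if depth == len(levels):
            levels.append([])
        levels[depth].append(model)
        for child in schema.get(model, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return levels


# Hypothetical nested schema: model name -> names of its child models.
schema = {"Invoice": ["Customer", "LineItem"], "LineItem": ["Product"], "Customer": []}
print(extraction_levels(schema, "Invoice"))
# [['Invoice'], ['Customer', 'LineItem'], ['Product']]
```

Each inner list would be one extraction step, with the results of earlier levels available as context for the next.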
- Consensus Mechanism: Improves extraction accuracy by consolidating multiple LLM outputs.
- Dynamic SQLModel Generation: Generate SQLModel schemas from natural language descriptions.
- Hierarchical Extraction: Handles complex, nested data by breaking down the extraction into manageable, hierarchical steps.
- Extensible LLM Support: Integrates with various LLM providers through a client interface.
- Built-in Analytics: Collects metrics on LLM performance and output quality to refine prompts and monitor errors.
- Workflow Orchestration: A central orchestrator to manage the extraction pipeline.
- Example JSON Generation: Automatically generate few-shot examples to improve extraction quality.
- Customizable Prompts: Customize prompts at runtime to tailor the extraction process to specific needs.
- Rotating LLM Providers: Create the JSON revisions using multiple LLM providers.
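The orchestration, provider rotation, and analytics features can be sketched together as a loop that asks providers for revisions in round-robin order, parses each response as JSON, and records per-provider counts. Everything here is hypothetical: an `LLMClient` is assumed to be any callable that takes a prompt and returns raw JSON text, and `collect_revisions` and `RunStats` are illustrative names, not extrai's API. The valid revisions it returns are what a consensus step would then consolidate.

```python
import json
from collections import Counter
from dataclasses import dataclass, field
from itertools import cycle
from typing import Any, Callable

# Hypothetical client type: takes a prompt, returns raw JSON text.
LLMClient = Callable[[str], str]


@dataclass
class RunStats:
    """Hypothetical per-provider metrics collected during a run."""
    calls: Counter = field(default_factory=Counter)
    rejected: Counter = field(default_factory=Counter)  # invalid JSON or not an object


def collect_revisions(
    clients: dict[str, LLMClient], prompt: str, num_revisions: int
) -> tuple[list[dict[str, Any]], RunStats]:
    stats = RunStats()
    revisions: list[dict[str, Any]] = []
    rotation = cycle(clients.items())  # round-robin over providers
    for _ in range(num_revisions):
        name, client = next(rotation)
        stats.calls[name] += 1
        try:
            parsed = json.loads(client(prompt))
        except json.JSONDecodeError:
            stats.rejected[name] += 1
            continue
        if isinstance(parsed, dict):
            revisions.append(parsed)
        else:
            stats.rejected[name] += 1
    return revisions, stats


# Fake providers standing in for real LLM clients.
clients = {
    "provider_a": lambda prompt: '{"name": "Acme Corp"}',
    "provider_b": lambda prompt: "not json",
}
revisions, stats = collect_revisions(clients, "Extract the company name.", num_revisions=4)
print(revisions)            # [{'name': 'Acme Corp'}, {'name': 'Acme Corp'}]
print(dict(stats.calls))    # {'provider_a': 2, 'provider_b': 2}
print(dict(stats.rejected)) # {'provider_b': 2}
```

Counting rejections per provider is the kind of signal that helps spot a provider or prompt that consistently produces unusable output.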
For a complete guide, please see the full documentation. Here are the key sections:
- Getting Started
- How-to Guides
- Core Concepts
- Reference
- API Reference
- Community
The library is built around a few key components that work together to manage the extraction workflow. For a diagram of the high-level workflow, see the Architecture Overview.
Install the library from PyPI:
For a more detailed guide, please see the Getting Started Tutorial.
Here is a minimal example:
For more in-depth examples, see the /examples directory in the repository.
We welcome contributions! Please see the Contributing Guide for details on how to set up your development environment, run tests, and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.

