Just dropped ragbits v1.0 and create-ragbits-app – spin up a RAG app in minutes


🔨 Build Reliable & Scalable GenAI Apps

📚 Fast & Flexible RAG Processing

  • Ingest 20+ formats – Process PDFs, HTML, spreadsheets, presentations, and more. Parse data with Docling or Unstructured, or plug in a custom parser.
  • Handle complex data – Extract tables, images, and structured content with built-in VLM support.
  • Connect to any data source – Use prebuilt connectors for S3, GCS, Azure, or implement your own.
  • Scale ingestion – Process large datasets quickly with Ray-based parallel processing.

🚀 Deploy & Monitor with Confidence

  • Real-time observability – Track performance with OpenTelemetry and CLI insights.
  • Built-in testing – Validate prompts with promptfoo before deployment.
  • Auto-optimization – Continuously evaluate and refine model performance.
  • Chat UI – Deploy a chatbot interface with an API, persistence, and user feedback.

To get started quickly, you can install with pip (this assumes the starter bundle is published on PyPI under the project name, ragbits):
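pip install ragbits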

This is a starter bundle of packages, containing:

  • ragbits-core - fundamental tools for working with prompts, LLMs and vector databases.
  • ragbits-agents - abstractions for building agentic systems.
  • ragbits-document-search - retrieval and ingestion pipelines for knowledge bases.
  • ragbits-evaluate - unified evaluation framework for Ragbits components.
  • ragbits-chat - full-stack infrastructure for building conversational AI applications.
  • ragbits-cli - ragbits shell command for interacting with Ragbits components.

Alternatively, you can use individual components of the stack by installing their respective packages.
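For example, a retrieval-only setup could pull in just the pieces it needs; the package names below are taken from the list above, and which ones you actually need depends on your use case:

pip install ragbits-core ragbits-document-search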

To define a prompt and run an LLM:

import asyncio

from pydantic import BaseModel

from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt


class QuestionAnswerPromptInput(BaseModel):
    question: str


class QuestionAnswerPromptOutput(BaseModel):
    answer: str


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = """
    You are a question answering agent.
    Answer the question to the best of your ability.
    """
    user_prompt = """
    Question: {{ question }}
    """


llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)


async def main() -> None:
    prompt = QuestionAnswerPrompt(
        QuestionAnswerPromptInput(question="What are high memory and low memory on linux?")
    )
    response = await llm.generate(prompt)
    print(response.answer)


if __name__ == "__main__":
    asyncio.run(main())

To build and query a simple vector store index:

import asyncio

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch

embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)


async def run() -> None:
    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search("What are the key findings presented in this paper?")
    print(result)


if __name__ == "__main__":
    asyncio.run(run())

Retrieval-Augmented Generation

To build a simple RAG pipeline:

import asyncio

from pydantic import BaseModel

from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch


class QuestionAnswerPromptInput(BaseModel):
    question: str
    context: list[str]


class QuestionAnswerPromptOutput(BaseModel):
    answer: str


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, QuestionAnswerPromptOutput]):
    system_prompt = """
    You are a question answering agent.
    Answer the question that will be provided using context.
    If in the given context there is not enough information refuse to answer.
    """
    user_prompt = """
    Question: {{ question }}

    Context:
    {% for item in context %}
        {{ item }}
    {%- endfor %}
    """


embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
vector_store = InMemoryVectorStore(embedder=embedder)
document_search = DocumentSearch(vector_store=vector_store)
llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)


async def run() -> None:
    question = "What are the key findings presented in this paper?"

    await document_search.ingest("web://https://arxiv.org/pdf/1706.03762")
    result = await document_search.search(question)

    prompt = QuestionAnswerPrompt(
        QuestionAnswerPromptInput(
            question=question,
            context=[element.text_representation for element in result],
        )
    )
    response = await llm.generate(prompt)
    print(response.answer)


if __name__ == "__main__":
    asyncio.run(run())

Chatbot interface with UI

To expose your RAG application through the Ragbits UI:

from collections.abc import AsyncGenerator

from pydantic import BaseModel

from ragbits.chat.api import RagbitsAPI
from ragbits.chat.interface import ChatInterface
from ragbits.chat.interface.types import ChatContext, ChatResponse
from ragbits.core.embeddings import LiteLLMEmbedder
from ragbits.core.llms import LiteLLM
from ragbits.core.prompt import Prompt
from ragbits.core.prompt.base import ChatFormat
from ragbits.core.vector_stores import InMemoryVectorStore
from ragbits.document_search import DocumentSearch


class QuestionAnswerPromptInput(BaseModel):
    question: str
    context: list[str]


class QuestionAnswerPrompt(Prompt[QuestionAnswerPromptInput, str]):
    system_prompt = """
    You are a question answering agent.
    Answer the question that will be provided using context.
    If in the given context there is not enough information refuse to answer.
    """
    user_prompt = """
    Question: {{ question }}

    Context: {% for item in context %}{{ item }}{%- endfor %}
    """


class MyChat(ChatInterface):
    """Chat interface for the application."""

    async def setup(self) -> None:
        self.embedder = LiteLLMEmbedder(model_name="text-embedding-3-small")
        self.vector_store = InMemoryVectorStore(embedder=self.embedder)
        self.document_search = DocumentSearch(vector_store=self.vector_store)
        self.llm = LiteLLM(model_name="gpt-4.1-nano", use_structured_output=True)
        await self.document_search.ingest("web://https://arxiv.org/pdf/1706.03762")

    async def chat(
        self,
        message: str,
        history: ChatFormat | None = None,
        context: ChatContext | None = None,
    ) -> AsyncGenerator[ChatResponse, None]:
        # Search for relevant documents
        result = await self.document_search.search(message)

        prompt = QuestionAnswerPrompt(
            QuestionAnswerPromptInput(
                question=message,
                context=[element.text_representation for element in result],
            )
        )

        # Stream the response from the LLM
        async for chunk in self.llm.generate_streaming(prompt):
            yield self.create_text_response(chunk)


if __name__ == "__main__":
    RagbitsAPI(MyChat).run()

Create Ragbits projects from templates with the create-ragbits-app generator (the command below assumes it is published under that name and can be run with uvx):
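uvx create-ragbits-app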

Explore the create-ragbits-app repo. If you have a new idea for a template, feel free to contribute!

  • Quickstart - Get started with Ragbits in a few minutes
  • How-to - Learn how to use Ragbits in your projects
  • CLI - Learn how to run Ragbits in your terminal
  • API reference - Explore the underlying Ragbits API

We welcome contributions! Please read CONTRIBUTING.md for more information.

Ragbits is licensed under the MIT License.
