Transform unstructured domain data into structured, agent-ready knowledge - automatically.
Socratic automates knowledge synthesis for vertical LLM agents, meaning agents specialized in a specific domain.
Socratic ingests sparse, unstructured source documents (docs, code, logs, etc.) and synthesizes them into compact, structured knowledge bases ready to plug into agents.
Building effective domain agents requires high-quality, domain-specific knowledge. Today, this knowledge is:
- Manually curated by experts 🧠
- Costly to maintain 💸
- Quickly outdated as source documents change ⚠️
The goal of Socratic is to automate this process, enabling accurate and cost-effective domain knowledge management.
Using Socratic to build a knowledge base for a Google Analytics SQL agent: https://youtu.be/L20vOB3whMs
From PyPI:
From Source:
Install OpenAI Codex:
For Web-UI:
Assume that the project is named airline_demo and that the relevant source files are located in examples/repos/tau_airline.
Web UI (recommended):
Command line interface:
Currently only OpenAI models are supported, and an OpenAI API key is required.
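Before running any stage, it can help to confirm that a key is actually available. The sketch below assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is the variable the official OpenAI SDK reads by default; whether Socratic reads the same variable is an assumption here, and `has_openai_key` is a hypothetical helper, not part of Socratic.

```python
import os
from typing import Mapping, Optional


def has_openai_key(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if a non-empty OPENAI_API_KEY is present.

    Assumption: the key is passed via the OPENAI_API_KEY environment
    variable (the OpenAI SDK default). Pass a mapping to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    # Treat a missing, empty, or whitespace-only value as "no key".
    return bool(env.get("OPENAI_API_KEY", "").strip())
```

Calling `has_openai_key()` with no arguments checks the current process environment.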
Socratic uses a combination of LLMs and LLM agents. It runs in three stages: ingest, synthesis, and compose.
Given a directory containing documents relevant to the vertical task, Socratic extracts a list of candidate concepts to research. This is done collaboratively between the user and a terminal agent.
- User provides high-level research directions.
- A terminal agent (codex) quickly scans the source documents to gain context and proposes concepts to research.
- User further refines and finalizes the list of concepts.
The ingest stage generates the final set of concepts to research (concepts.txt).
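If you want to inspect or post-process the finalized concept list yourself, a minimal reader might look like the sketch below. It assumes `concepts.txt` holds one concept per line, which the description above does not confirm, so check the actual file layout first. `load_concepts` is a hypothetical helper, not a Socratic API.

```python
from pathlib import Path
from typing import List


def load_concepts(path: str) -> List[str]:
    """Read a concepts file and return the non-empty lines, stripped.

    Assumption: one concept per line; blank lines are ignored.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```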
For each concept to research generated in the ingest stage, Socratic launches a terminal agent (codex) that explores the source documents to synthesize knowledge related to the specific concept.
For each concept, the synthesis stage stores the synthesized knowledge in both plain-text (concept{i}-synth.txt) and JSON (concept{i}-synth.json) formats.
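The naming scheme above makes it straightforward to pair up the two outputs for each concept. The following sketch assumes that `{i}` is an integer index and that all outputs sit in a single directory; neither is stated explicitly, and `find_synth_outputs` is a hypothetical helper. The JSON schema is not described here, so the sketch only locates files and does not parse them.

```python
import re
from pathlib import Path
from typing import Dict

# Matches names such as concept0-synth.txt or concept12-synth.json.
SYNTH_PATTERN = re.compile(r"^concept(\d+)-synth\.(txt|json)$")


def find_synth_outputs(directory: str) -> Dict[int, Dict[str, Path]]:
    """Map each concept index to the synthesis files found for it.

    The inner dict is keyed by extension ("txt" or "json").
    Files that do not match the naming scheme are skipped.
    """
    outputs: Dict[int, Dict[str, Path]] = {}
    for path in sorted(Path(directory).iterdir()):
        match = SYNTH_PATTERN.match(path.name)
        if match:
            index, ext = int(match.group(1)), match.group(2)
            outputs.setdefault(index, {})[ext] = path
    return outputs
```

A concept whose entry lacks either `"txt"` or `"json"` may indicate that its synthesis run did not finish.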
Convert synthesized knowledge into prompts that are ready to be dropped directly into your LLM agent’s context.
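To make the idea concrete, here is a deliberately simple illustration of turning per-concept text into a single context block. This is not Socratic's compose logic, which is not described here; the function name, the `Domain knowledge:` header, and the section layout are all assumptions made for the example.

```python
from typing import Mapping


def compose_prompt(knowledge: Mapping[str, str]) -> str:
    """Join per-concept knowledge into one prompt-ready string.

    `knowledge` maps a concept name to its synthesized text.
    Illustrative only: the real compose stage may format things differently.
    """
    sections = []
    for concept, text in knowledge.items():
        # One titled section per concept, with surrounding whitespace trimmed.
        sections.append(f"## {concept}\n{text.strip()}")
    return "Domain knowledge:\n\n" + "\n\n".join(sections)
```

The returned string can be prepended to an agent's system prompt or supplied as a separate context message.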
- Local storage: All files and outputs are stored entirely on your own machine. Socratic does not upload, transfer, index, or store your data anywhere else.
- Local processing: All analysis and processing happen locally, except when data is sent to an external LLM provider (e.g., OpenAI) using your own API key.
- Sandboxed terminal agent: Socratic uses Codex as its terminal agent to read and analyze source documents. Socratic runs Codex in read-only mode, which prevents the agent from editing files or running commands that require network access. See the Codex sandbox documentation for more details.

