
Transform Your Documents
Docling turns messy PDFs, DOCX files, and slides into clean, structured data, ready for RAG, GenAI apps, or anything downstream. Complex layouts? Tables? Formulas? It handles them, so you don’t have to.
Advanced Document Parsing
Extracts clean structure from messy PDFs, DOCX files, HTML, and more.
GenAI-Ready Integration
Plugs into LangChain, LlamaIndex, and other popular AI frameworks.
Structured Output
Delivers chunked, labeled data optimized for LLM pipelines.
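As a rough illustration of what "chunked, labeled data" can look like in practice, here is a minimal sketch. It assumes the `docling` and `docling-core` packages are installed and uses `DocumentConverter` together with `HierarchicalChunker`; the helper names `chunks_to_records` and `chunk_source`, and the record layout, are hypothetical and not part of Docling.

```python
from typing import Any, Iterable


def chunks_to_records(chunks: Iterable[Any]) -> list[dict]:
    """Flatten chunk objects into plain dicts that an LLM pipeline can index."""
    records = []
    for i, chunk in enumerate(chunks):
        # Each chunk is assumed to expose `.text` and, optionally, `.meta.headings`.
        meta = getattr(chunk, "meta", None)
        headings = getattr(meta, "headings", None) or []
        records.append({"id": i, "text": chunk.text, "headings": list(headings)})
    return records


def chunk_source(source: str) -> list[dict]:
    """Convert a local path or URL with Docling and return labeled chunks."""
    # Imported lazily so chunks_to_records() works without Docling installed.
    from docling.document_converter import DocumentConverter
    from docling_core.transforms.chunker import HierarchicalChunker

    doc = DocumentConverter().convert(source).document
    return chunks_to_records(HierarchicalChunker().chunk(dl_doc=doc))
```

Each record keeps the chunk text next to the section headings it came from, which is one simple way to carry labels into a retrieval index.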
Features
🗂️ Parse multiple document types: PDF, DOCX, PPTX, XLSX, HTML, audio, and images.
📑 Understand PDFs deeply: layout, tables, reading order, code, and formulas.
🧬 Unified DoclingDocument format for structured output.
↪ Export to Markdown, HTML, DocTags, or lossless JSON.
🔒 Run locally for sensitive or air-gapped environments.
🤖 Integrates easily with LangChain, LlamaIndex, Haystack, Langflow, and more.
🔍 OCR support for scanned PDFs and images.
👓 Works with visual language models (SmolDocling).
🎙 Supports audio via automatic speech recognition (ASR).
💻 Fast and easy to use with a simple CLI.
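The parse-then-export flow from the feature list can be sketched in Python. This is a minimal sketch, assuming the `docling` package is installed; it relies on `DocumentConverter` and on the `export_to_markdown`, `export_to_html`, and `export_to_dict` methods of the converted document. The names `EXPORTERS`, `export_document`, and `convert_and_export` are hypothetical helpers introduced only for illustration.

```python
from typing import Any

# Map a short format name to the document export method assumed for it.
# Note: "json" returns a Python dict here, not a serialized JSON string.
EXPORTERS = {
    "markdown": "export_to_markdown",
    "html": "export_to_html",
    "json": "export_to_dict",
}


def export_document(doc: Any, fmt: str = "markdown") -> Any:
    """Export an already-converted document in the requested format."""
    if fmt not in EXPORTERS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return getattr(doc, EXPORTERS[fmt])()


def convert_and_export(source: str, fmt: str = "markdown") -> Any:
    """Convert a local path or URL with Docling, then export the result."""
    # Imported lazily so export_document() works without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    return export_document(result.document, fmt)
```

For example, `convert_and_export("report.pdf")` (a hypothetical file name) would return the document as Markdown text. The command-line equivalent is roughly `docling report.pdf`.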
Live Assistant
Want AI-powered live support for Docling? Try Chat with Dosu, powered by our friends at Dosu.


