Experience the power of DeepSeek OCR - an open-source AI model that converts complex documents, PDFs, and images into clean Markdown. Try the official demo below, and stay tuned for our upcoming API service with enhanced features.
DeepSeek OCR is powered by cutting-edge AI technology
Hugging Face
PyTorch
vLLM
DeepSeek LLM
SAM ViT-B
CLIP-L
DeepSeek OCR Performance Metrics
Built for Speed and Accuracy
DeepSeek OCR is engineered for production-grade document processing with adaptive resolution modes, high-performance vLLM inference, and efficient token usage. Whether you're processing simple receipts or complex academic papers, DeepSeek OCR scales to meet your needs.
5
Resolution Modes
From Tiny (512px) to Gundam (dynamic)
~2500
Tokens per Second
High-performance inference on A100-40G
400
Vision Tokens in Large Mode
1280×1280 native resolution for high-resolution documents
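If you run the model yourself, the five resolution presets can be written down as plain configuration. The sketch below assumes the base_size, image_size, and crop_mode values listed in the DeepSeek-OCR README at the time of writing (verify them against your release). The ResolutionMode class and the infer_kwargs helper are hypothetical names, not part of the project.

```python
from dataclasses import dataclass

# Hypothetical helper: values mirror the README's resolution presets as we
# understand them; check them against the version you deploy.
@dataclass(frozen=True)
class ResolutionMode:
    base_size: int   # side length of the global view, in pixels
    image_size: int  # side length of each local crop, in pixels
    crop_mode: bool  # True only for the dynamic Gundam mode

MODES = {
    "tiny": ResolutionMode(512, 512, False),
    "small": ResolutionMode(640, 640, False),
    "base": ResolutionMode(1024, 1024, False),
    "large": ResolutionMode(1280, 1280, False),
    "gundam": ResolutionMode(1024, 640, True),
}

def infer_kwargs(mode: str) -> dict:
    """Return the preset as keyword arguments, keyed like the README's infer() call."""
    m = MODES[mode.lower()]
    return {"base_size": m.base_size, "image_size": m.image_size, "crop_mode": m.crop_mode}

print(infer_kwargs("Gundam"))
```

Keeping the presets in one table means a pipeline can switch modes by name instead of scattering magic numbers across call sites.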
DeepSeek OCR Features
Everything You Need
Explore how DeepSeek OCR delivers end-to-end document intelligence, from precise recognition to structure-aware conversion.
Transform any document into clean, structured Markdown while preserving headings, tables, lists, and semantic layout. DeepSeek OCR understands document structure, not just text - perfect for content migration, documentation workflows, and knowledge base creation. The Markdown output is ready for version control, static site generators, or content management systems.
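Which output you get depends on the prompt. A minimal sketch, assuming the prompt strings shown in the DeepSeek-OCR README at the time of writing (check them against the release you deploy). PROMPTS and build_prompt are hypothetical names.

```python
# Prompt strings as we understand them from the project README; treat them as
# assumptions. The <|grounding|> tag also asks the model for layout boxes.
PROMPTS = {
    "markdown": "<image>\n<|grounding|>Convert the document to markdown. ",
    "free_ocr": "<image>\nFree OCR. ",
    "figure": "<image>\nParse the figure. ",
}

def build_prompt(task: str) -> str:
    """Look up the prompt for a task, failing loudly on unknown task names."""
    try:
        return PROMPTS[task]
    except KeyError:
        raise ValueError(
            f"unknown task {task!r}; expected one of {sorted(PROMPTS)}"
        ) from None
```

Failing on an unknown task name is deliberate: a silently wrong prompt produces plausible but unstructured text that is hard to catch downstream.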
Why Choose DeepSeek OCR?
Powerful Benefits
DeepSeek OCR delivers unique advantages that set it apart from traditional OCR solutions and commercial alternatives.
DeepSeek OCR Use Cases
Built for Every Scenario
From academic research to business automation, DeepSeek OCR handles diverse document processing challenges with consistent accuracy and efficiency.
Extract complete text, mathematical formulas, citations, and figure captions from academic papers and research documents. DeepSeek OCR recognizes LaTeX math notation, chemical formulas, and complex equations, making it ideal for literature review, knowledge management, and digital library creation. Process thesis documents, journal articles, and conference papers while maintaining academic formatting and structure.
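Once a paper has been converted, formulas can be pulled out of the Markdown for indexing or re-rendering. This is a hypothetical post-processing sketch that assumes formulas are delimited with $...$, $$...$$, \(...\), or \[...\]. Confirm which delimiters your model version actually emits. The bare-$ pattern can also misfire on prose such as prices, so treat matches as candidates.

```python
import re

# Hypothetical post-processing: the delimiter set is an assumption about the
# Markdown output, not a documented guarantee. Display forms are tried first.
_MATH = re.compile(
    r"\$\$(.+?)\$\$"      # display math with $$ ... $$
    r"|\\\[(.+?)\\\]"     # display math with \[ ... \]
    r"|\\\((.+?)\\\)"     # inline math with \( ... \)
    r"|\$(.+?)\$",        # inline math with $ ... $
    re.DOTALL,
)

def extract_formulas(markdown: str) -> list[str]:
    """Return each math span's LaTeX body, without delimiters, in document order."""
    return [
        next(g for g in m.groups() if g is not None).strip()
        for m in _MATH.finditer(markdown)
    ]
```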
Digitize invoices, contracts, reports, and business correspondence with structure-aware OCR that understands tables, headers, and hierarchical layouts. DeepSeek OCR automates data entry, enables searchable document archives, and accelerates business process automation. Perfect for accounts payable processing, contract management, and compliance documentation.
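For invoices and reports, the usual next step is turning a recognized table into records. The sketch below assumes the table comes back as a GitHub-style pipe table. If your output uses HTML tables instead, swap in an HTML parser. parse_markdown_table is a hypothetical helper, and it does not handle escaped pipes inside cells.

```python
def _cells(row: str) -> list[str]:
    # Drop the outer pipes, then split on the inner ones.
    return [c.strip() for c in row.strip().strip("|").split("|")]

def _is_separator(row: str) -> bool:
    # A separator row contains only pipes, dashes, colons and spaces, e.g. |---|:---:|
    body = row.replace("|", "").replace(":", "").replace("-", "").strip()
    return "-" in row and body == ""

def parse_markdown_table(markdown: str) -> list[dict[str, str]]:
    """Parse the first pipe table in `markdown` into one dict per body row."""
    lines = [ln.strip() for ln in markdown.splitlines()]
    for i in range(len(lines) - 1):
        if lines[i].startswith("|") and _is_separator(lines[i + 1]):
            header = _cells(lines[i])
            rows = []
            for ln in lines[i + 2:]:
                if not ln.startswith("|"):
                    break  # the table ends at the first non-table line
                rows.append(dict(zip(header, _cells(ln))))
            return rows
    return []  # no table found
```

Cell values stay strings; converting quantities and amounts to numbers is left to the caller, who knows the locale and currency conventions.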
Convert old scanned documents, handwritten notes, and low-quality images into clean, editable text. DeepSeek OCR's vision-language model handles image noise, skewed scans, and varying quality levels to produce searchable text datasets. Ideal for archival digitization, historical document preservation, and legacy data migration projects.
Extract data from charts, bar graphs, line plots, diagrams, and infographics for analysis and reporting. DeepSeek OCR understands visual data representation beyond text, capturing labels, legends, axis values, and trend information. Transform visual business intelligence into structured data for further processing and analytics workflows.
DeepSeek OCR Architecture
Powered by State-of-the-Art AI
DeepSeek OCR combines state-of-the-art vision processing with powerful language models to deliver accurate, efficient document understanding. The technology stack is optimized for production use, balancing accuracy, speed, and resource efficiency.

DeepSeek OCR's vision encoder pairs a SAM ViT-B stage, which uses windowed attention to capture fine-grained text detail, with a CLIP-L stage, which uses global attention to capture overall document layout. A 16× convolutional token compressor between the two keeps the number of tokens reaching the global stage small. This dual-level design keeps text extraction accurate even in complex documents with mixed content types, varying fonts, and intricate formatting. The encoder is optimized specifically for document processing rather than general image understanding.
Multi-scale feature extraction
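The token budgets behind the resolution modes follow from simple arithmetic. The sketch assumes 16×16-pixel patches and a 16× token compressor between the local and global stages, as we read the published encoder description. Under those assumptions, a square input of side s yields (s / 16)^2 / 16 vision tokens.

```python
PATCH_SIZE = 16   # assumed ViT patch size, in pixels
COMPRESSION = 16  # assumed token reduction of the convolutional compressor

def vision_tokens(side_px: int) -> int:
    """Vision tokens for a square side_px x side_px input under the assumptions above."""
    if side_px % PATCH_SIZE:
        raise ValueError("side length must be a multiple of the patch size")
    patches = (side_px // PATCH_SIZE) ** 2  # patch tokens seen by the local (SAM) stage
    return patches // COMPRESSION           # tokens passed on to the global (CLIP) stage

for side in (512, 640, 1024, 1280):
    print(side, vision_tokens(side))
```

Under these assumptions the loop reproduces the 400-token figure for the 1280px Large mode. Compressing before the global-attention stage matters because global attention cost grows quadratically with token count.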
At the core of DeepSeek OCR is a powerful language model that brings contextual understanding to OCR. Unlike traditional pattern-matching OCR, the LLM can correct errors using context, understand document semantics, and generate structured output formats like Markdown. This enables intelligent features like grounding, reference extraction, and format-aware text generation.
Supports grounding, reference, and multi-modal reasoning
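Grounded output interleaves text with location tags. The parser below assumes spans of the form <|ref|>label<|/ref|><|det|>[[x1, y1, x2, y2]]<|/det|> with coordinates normalized to a 0-999 grid. Both the markup and the scale are assumptions to verify against real output. Box and parse_grounding are hypothetical names.

```python
import re
from typing import NamedTuple

class Box(NamedTuple):
    label: str
    x1: int
    y1: int
    x2: int
    y2: int

# Assumed grounding markup and 0-999 coordinate grid; verify against real output.
_REF = re.compile(r"<\|ref\|>(.*?)<\|/ref\|><\|det\|>(.*?)<\|/det\|>", re.DOTALL)
_BOX = re.compile(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]")

def parse_grounding(text: str, width: int, height: int) -> list[Box]:
    """Convert grounded spans into pixel-space boxes for an image of the given size."""
    boxes = []
    for label, coords in _REF.findall(text):
        for x1, y1, x2, y2 in _BOX.findall(coords):  # one span may carry several boxes
            boxes.append(Box(
                label.strip(),
                round(int(x1) / 999 * width), round(int(y1) / 999 * height),
                round(int(x2) / 999 * width), round(int(y2) / 999 * height),
            ))
    return boxes
```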
DeepSeek OCR supports the vLLM inference engine for production-grade serving. With continuous batching, PagedAttention memory management, and GPU-optimized kernels, vLLM enables streaming outputs and high-throughput batch processing. On high-performance hardware such as A100 GPUs, this makes it practical to process thousands of pages per hour.
~2500 tokens/s throughput on A100-40G
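To translate the token rate into page throughput, divide tokens per hour by tokens per page. This back-of-the-envelope sketch treats the ~2500 tokens/s figure as aggregate decode throughput under concurrent requests. It also assumes a hypothetical average of 1,000 output tokens per page, and real pages vary widely.

```python
def pages_per_hour(tokens_per_second: float, tokens_per_page: float) -> float:
    """Rough batch throughput: decoded tokens per hour divided by tokens per page."""
    if tokens_per_page <= 0:
        raise ValueError("tokens_per_page must be positive")
    return tokens_per_second * 3600 / tokens_per_page

# ~2500 tokens/s is the reported A100-40G figure; 1,000 tokens per page is a
# hypothetical average for a text-dense page, not a measured value.
print(f"{pages_per_hour(2500, 1000):,.0f} pages/hour")
```

Sparse pages such as receipts generate far fewer tokens than dense academic pages, so measure your own corpus before sizing hardware.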
Gundam mode is DeepSeek OCR's dynamic resolution mode. Instead of resizing every page to one fixed resolution, it combines a 1024×1024 global view of the whole page with multiple 640×640 local crops, so the vision-token budget grows with the page's size and aspect ratio. This multi-crop strategy preserves accuracy on dense content such as formulas and tables without spending a large fixed budget on every page, resulting in strong performance across varied document types.
Gundam mode with multi-crop strategy
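The multi-crop idea can be sketched as picking a tile grid whose shape matches the page, then adding one global view. This is a simplified illustration, not the project's preprocessing code. The 2-to-9 crop limits, the tie-breaking rule, and the per-crop budgets (100 tokens per 640px crop and 256 for the 1024px global view, borrowed from the Small and Base presets) are all assumptions.

```python
from itertools import product

# Hypothetical token budgets, taken from the Small (640px) and Base (1024px) presets.
LOCAL_TOKENS = 100   # per 640x640 local crop
GLOBAL_TOKENS = 256  # for the single 1024x1024 global view

def choose_grid(width: int, height: int, min_tiles: int = 2, max_tiles: int = 9) -> tuple[int, int]:
    """Pick the (cols, rows) crop grid whose aspect ratio best matches the page.

    min_tiles/max_tiles are assumed limits; ties go to the cheaper grid (fewer crops).
    """
    target = width / height
    grids = [
        (cols, rows)
        for cols, rows in product(range(1, max_tiles + 1), repeat=2)
        if min_tiles <= cols * rows <= max_tiles
    ]
    return min(grids, key=lambda g: (abs(g[0] / g[1] - target), g[0] * g[1]))

def gundam_tokens(width: int, height: int) -> int:
    """Estimated vision tokens: one budget per local crop plus the global view."""
    cols, rows = choose_grid(width, height)
    return cols * rows * LOCAL_TOKENS + GLOBAL_TOKENS

print(choose_grid(1240, 1754), gundam_tokens(1240, 1754))  # A4 page scanned at 150 dpi
```

For an A4 portrait page the sketch picks a 2×3 grid. Matching the grid to the page shape means each crop covers roughly square regions of text, so fewer characters get squeezed or stretched.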
Frequently Asked Questions
Got Questions?
Find answers to the most common DeepSeek OCR questions, from supported formats to deployment options.

