Show HN: A highly extensible framework for building OCR systems
4 days ago
MyOCR is a highly extensible and customizable framework for building OCR systems. Engineers can easily train deep learning models and integrate them into custom OCR pipelines for real-world applications.
⚡️ End-to-End OCR Development Framework – Designed for developers to build and integrate detection, recognition, and custom OCR models in a unified and flexible pipeline.
🛠️ Modular & Extensible – Mix and match components: swap models, predictors, or input/output processors with minimal changes.
🔌 Developer-Friendly by Design – Clean Python APIs, prebuilt pipelines and processors, and straightforward customization for training and inference.
🚀 Production-Ready Performance – ONNX Runtime support for fast CPU/GPU inference, with multiple deployment options.
🔥2025.05.17 MyOCR v0.1.1 released
Python 3.11+
CUDA: Version 12.6 or higher is recommended for GPU acceleration. CPU-only mode is also supported.
Operating System: Linux, macOS, or Windows.
# Clone the code from GitHub
git clone https://github.com/robbyzhaox/myocr.git
cd myocr
# You can create your own venv before the following steps
# Install dependencies
pip install -e .
# Development environment installation
pip install -e ".[dev]"
# Download pre-trained model weights to models
# for Linux, macOS
mkdir -p ~/.MyOCR/models/
# for Windows, the "models" directory can be created in the current path
# Download weights from: https://drive.google.com/drive/folders/1RXppgx4XA_pBX9Ll4HFgWyhECh5JtHnY
# Alternative download link: https://pan.baidu.com/s/122p9zqepWfbEmZPKqkzGBA?pwd=yq6j
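The directory step above can also be done from Python, which avoids separate commands per platform. A minimal sketch, assuming the layout described in the comments (`~/.MyOCR/models/` on Linux and macOS, a `models` directory under the current path on Windows); the helper names `default_model_dir` and `ensure_model_dir` are hypothetical and not part of MyOCR:

```python
import sys
from pathlib import Path


def default_model_dir(platform: str = sys.platform) -> Path:
    """Return the weights directory described in the setup notes above."""
    if platform.startswith("win"):
        # Windows: a "models" directory in the current path
        return Path.cwd() / "models"
    # Linux, macOS: ~/.MyOCR/models/
    return Path.home() / ".MyOCR" / "models"


def ensure_model_dir(path: Path) -> Path:
    """Create the directory (and any missing parents) if it does not exist."""
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Calling `ensure_model_dir(default_model_dir())` creates the directory; the downloaded weight files still need to be copied into it by hand.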
from myocr.pipelines import CommonOCRPipeline

# Initialize common OCR pipeline (using GPU)
pipeline = CommonOCRPipeline("cuda:0")  # Use "cpu" for CPU mode

# Perform OCR recognition on an image
result = pipeline("path/to/your/image.jpg")
print(result)
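On machines that may not have a GPU, the device string can be chosen at runtime instead of being hard-coded. A rough sketch, assuming inference goes through ONNX Runtime and that a CUDA-enabled build lists `CUDAExecutionProvider` among its providers; `pick_device` is a hypothetical helper, not part of MyOCR, and a listed provider does not by itself guarantee that the CUDA libraries load correctly:

```python
def pick_device() -> str:
    """Return "cuda:0" if ONNX Runtime reports a CUDA provider, else "cpu"."""
    try:
        import onnxruntime as ort
    except ImportError:
        # ONNX Runtime is not installed: fall back to CPU mode
        return "cpu"
    providers = ort.get_available_providers()
    return "cuda:0" if "CUDAExecutionProvider" in providers else "cpu"


device = pick_device()
print(device)
```

The result can then be passed to the pipeline constructor, for example `CommonOCRPipeline(pick_device())`.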
Structured OCR Output (Example: Invoice Information Extraction)
Configure chat_bot in myocr.pipelines.config.structured_output_pipeline.yaml
from pydantic import BaseModel, Field
from myocr.pipelines import StructuredOutputOCRPipeline

# Define output data model, refer to InvoiceModel in main.py

# Initialize structured OCR pipeline
pipeline = StructuredOutputOCRPipeline("cuda:0", InvoiceModel)

# Process image and get structured data
result = pipeline("path/to/invoice.jpg")
print(result.to_dict())
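The snippet above refers to `InvoiceModel` without showing it. For illustration, a data model of this kind might look like the following; the field names are made up for this example and are not the ones defined in `main.py`:

```python
from typing import Optional

from pydantic import BaseModel, Field


class InvoiceModel(BaseModel):
    """Hypothetical invoice schema; the real one lives in main.py."""

    invoice_number: str = Field(..., description="Invoice number as printed")
    invoice_date: Optional[str] = Field(None, description="Issue date, if found")
    total_amount: Optional[float] = Field(None, description="Total amount due")


# Instantiate by hand to check the schema; for the pipeline, the model class
# itself is passed as the second constructor argument.
example = InvoiceModel(invoice_number="INV-001", total_amount=12.5)
print(example.invoice_number, example.invoice_date, example.total_amount)
```

Marking fields that may be absent from a scan as `Optional` keeps validation from failing when a value cannot be extracted.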
The framework provides support for Docker deployment, which can be built and run using the following commands:
docker run -d -p 8000:8000 robbyzhaox/myocr:latest
# To use the StructuredOutputOCRPipeline, set the following environment variables with the -e option of docker run
docker run -d \
-p 8000:8000 \
-e CHAT_BOT_MODEL="qwen2.5:14b" \
-e CHAT_BOT_BASEURL="http://127.0.0.1:11434/v1" \
-e CHAT_BOT_APIKEY="key" \
robbyzhaox/myocr:latest
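For reference, the three variables passed with `-e` could be read on the application side roughly as follows. This is a sketch of the general pattern only; how MyOCR itself consumes them is not shown in this post, and `ChatBotSettings` and `load_chat_bot_settings` are hypothetical names:

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ChatBotSettings:
    model: str
    base_url: str
    api_key: str


def load_chat_bot_settings(env: Mapping[str, str] = os.environ) -> ChatBotSettings:
    """Read the CHAT_BOT_* variables, failing early if one is missing or empty."""
    names = ("CHAT_BOT_MODEL", "CHAT_BOT_BASEURL", "CHAT_BOT_APIKEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ChatBotSettings(
        model=env["CHAT_BOT_MODEL"],
        base_url=env["CHAT_BOT_BASEURL"],
        api_key=env["CHAT_BOT_APIKEY"],
    )
```

Failing at startup when a variable is missing makes a misconfigured container easier to spot than an error on the first request.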