Show HN: A highly extensible framework for building OCR systems


MyOCR is a highly extensible and customizable framework for building OCR systems. Engineers can easily train and integrate deep learning models into custom OCR pipelines for real-world applications.

Try the online demo on HuggingFace or ModelScope

⚡️ End-to-End OCR Development Framework – Designed for developers to build and integrate detection, recognition, and custom OCR models in a unified and flexible pipeline.

🛠️ Modular & Extensible – Mix and match components: swap models, predictors, or input/output processors with minimal changes (see the sketch after this feature list).

🔌 Developer-Friendly by Design – Clean Python APIs, prebuilt pipelines and processors, and straightforward customization for training and inference.

🚀 Production-Ready Performance – ONNX Runtime support for fast CPU/GPU inference, plus support for various deployment options.
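As a concrete sketch of that modularity: the pipeline class and its usage below are taken from the quick-start example further down, while the swapped-in detector shown in the comment is purely hypothetical and not part of the documented MyOCR API.

from myocr.pipelines import CommonOCRPipeline

# Build the stock pipeline (see the quick-start example below)
pipeline = CommonOCRPipeline("cuda:0")

# In a modular pipeline, replacing a component would be a one-line change,
# e.g. swapping in a custom detector. The attribute and class names here are
# illustrative assumptions only, not the actual MyOCR API:
# pipeline.detector = MyCustomDetector("models/my_detector.onnx")

result = pipeline("path/to/your/image.jpg")
print(result)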

  • 🔥 2025.05.17 MyOCR v0.1.1 released

Prerequisites:

  • Python 3.11+
  • CUDA: version 12.6 or higher is recommended for GPU acceleration. CPU-only mode is also supported.
  • Operating system: Linux, macOS, or Windows.
# Clone the code from GitHub
git clone https://github.com/robbyzhaox/myocr.git
cd myocr

# You can create your own venv before the following steps

# Install dependencies
pip install -e .

# Development environment installation
pip install -e ".[dev]"

# Download pre-trained model weights to models
# for Linux, macOS
mkdir -p ~/.MyOCR/models/
# for Windows, the "models" directory can be created in the current path

# Download weights from:
# https://drive.google.com/drive/folders/1RXppgx4XA_pBX9Ll4HFgWyhECh5JtHnY
# Alternative download link:
# https://pan.baidu.com/s/122p9zqepWfbEmZPKqkzGBA?pwd=yq6j
from myocr.pipelines import CommonOCRPipeline

# Initialize common OCR pipeline (using GPU)
pipeline = CommonOCRPipeline("cuda:0")  # Use "cpu" for CPU mode

# Perform OCR recognition on an image
result = pipeline("path/to/your/image.jpg")
print(result)

Structured OCR Output (Example: Invoice Information Extraction)

Configure chat_bot in myocr.pipelines.config.structured_output_pipeline.yaml:

chat_bot:
  model: qwen2.5:14b
  base_url: http://127.0.0.1:11434/v1
  api_key: 'key'

Note: the chat bot currently supports:

  • Ollama API
  • OpenAI API
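For example, switching to an OpenAI-compatible endpoint only changes the same three keys; the model name and key below are placeholders, not values from the project:

chat_bot:
  model: gpt-4o-mini              # placeholder model name
  base_url: https://api.openai.com/v1
  api_key: 'your-openai-api-key'
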
from pydantic import BaseModel, Field

from myocr.pipelines import StructuredOutputOCRPipeline

# Define output data model, refer to InvoiceModel in main.py

# Initialize structured OCR pipeline
pipeline = StructuredOutputOCRPipeline("cuda:0", InvoiceModel)

# Process image and get structured data
result = pipeline("path/to/invoice.jpg")
print(result.to_dict())
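For reference, here is a minimal sketch of what such an output model could look like; the field names below are illustrative assumptions, not the actual InvoiceModel defined in main.py:

from pydantic import BaseModel, Field

# Illustrative sketch only: these fields are assumptions, see main.py for the real InvoiceModel
class InvoiceModel(BaseModel):
    invoice_number: str = Field(description="Invoice number printed on the document")
    invoice_date: str = Field(description="Date the invoice was issued")
    total_amount: float = Field(description="Total amount payable")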

The framework supports Docker deployment; the container can be run with the following commands:

docker run -d -p 8000:8000 robbyzhaox/myocr:latest

# Set the environment variables as follows with the -e option of docker run
# if you want to use the StructuredOutputOCRPipeline
docker run -d \
  -p 8000:8000 \
  -e CHAT_BOT_MODEL="qwen2.5:14b" \
  -e CHAT_BOT_BASEURL="http://127.0.0.1:11434/v1" \
  -e CHAT_BOT_APIKEY="key" \
  robbyzhaox/myocr:latest

Accessing API Endpoints (Docker)

IMAGE_PATH="your_image.jpg"
BASE64_IMAGE=$(base64 -w 0 "$IMAGE_PATH")                 # Linux
#BASE64_IMAGE=$(base64 -i "$IMAGE_PATH" | tr -d '\n')     # macOS

curl -X POST \
  -H "Content-Type: application/json" \
  -d "{\"image\": \"${BASE64_IMAGE}\"}" \
  http://localhost:8000/ocr

The framework also provides a simple Flask API service that can be called over HTTP:

# Start the service (default port: 5000)
python main.py

API endpoints:

  • GET /ping: Check if the service is running properly
  • POST /ocr: Basic OCR recognition
  • POST /ocr-json: Structured OCR output
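For illustration, here is a minimal client sketch for these endpoints. It assumes the service runs on the default port 5000, that responses are JSON, and that /ocr and /ocr-json accept the same base64 payload shape as the curl example above; those payload and response details are assumptions, not documented behavior.

import base64

import requests

BASE_URL = "http://localhost:5000"  # default Flask port

# Health check
print(requests.get(f"{BASE_URL}/ping").text)

# Basic OCR: send a base64-encoded image, as in the curl example above
with open("your_image.jpg", "rb") as f:
    payload = {"image": base64.b64encode(f.read()).decode("utf-8")}
print(requests.post(f"{BASE_URL}/ocr", json=payload).json())

# Structured OCR output (payload shape assumed to match /ocr)
print(requests.post(f"{BASE_URL}/ocr-json", json=payload).json())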

We also provide a UI for these endpoints; please refer to doc-insight-ui.

🎖 Contribution Guidelines

We welcome any form of contribution, including but not limited to:

  • Submitting bug reports
  • Adding new features
  • Improving documentation
  • Optimizing performance

This project is open-sourced under the Apache 2.0 License, see the LICENSE file for details.
