Show HN: Local AI-powered subtitle generation, translation and embedding

Local AI-powered subtitle generation and translation

Extract and translate video subtitles using Whisper and NLLB-200.

  • Automatic speech recognition using OpenAI Whisper (auto-detects language)
  • Multi-language translation using Meta's NLLB-200 model
  • Subtitle embedding into video files (MP4 or MKV)
  • Multiple subtitle tracks with proper language naming
  • SRT and ASS format support (ASS for better Unicode/CJK character rendering)

Recommended (fast, reproducible), no install needed:

uvx subtool video.mp4 -t en es

With pip:

pip install subtool

Usage:

# Basic usage - extract subtitles (auto-detect language)
subtool video.mp4

# Extract and translate to English and Spanish
subtool video.mp4 -t en es

# Specify the source language explicitly
subtool video.mp4 -l zh -t en

# Use a different Whisper model (default: turbo)
subtool video.mp4 -m large -t en

# Output as MKV (better subtitle support for Unicode/CJK)
subtool video.mp4 -t en zh -f mkv

# Generate SRT files only (no embedding)
subtool video.mp4 -t en es --srt-only

# Custom output path
subtool video.mp4 -t en -o output_with_subs.mp4
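
With --srt-only, the output is plain SRT files saved next to the video. The SRT layout itself is the standard one and is simple enough to sketch by hand; the writer and segment shape below are an illustration, not subtool's internals:

# Minimal SRT writer for segments with start/end times and text
# (an illustration of the standard SRT layout, not subtool's actual code).
def to_srt_timestamp(seconds: float) -> str:
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_srt(segments, path):
    with open(path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n")
            f.write(f"{to_srt_timestamp(seg['start'])} --> {to_srt_timestamp(seg['end'])}\n")
            f.write(f"{seg['text'].strip()}\n\n")

write_srt([{"start": 0.0, "end": 2.5, "text": "Hello, world"}], "video.srt")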

Translation supports 12 common languages:

  • en - English
  • es - Spanish
  • fr - French
  • de - German
  • it - Italian
  • pt - Portuguese
  • ru - Russian
  • ja - Japanese
  • ko - Korean
  • zh - Chinese
  • ar - Arabic
  • hi - Hindi

More languages are available; see the NLLB-200 documentation for the full list.
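
NLLB-200 itself identifies languages with FLORES-200 codes rather than the two-letter codes above (for example, en maps to eng_Latn and zh to zho_Hans). As a rough illustration of what the translation step can look like with Hugging Face transformers and the distilled 600M checkpoint (an assumption; subtool may use a different checkpoint or loading path):

# Illustration only: mapping the two-letter codes to NLLB-200 (FLORES-200) codes
# and running a translation with the Hugging Face pipeline. Not subtool's exact code.
from transformers import pipeline

NLLB_CODES = {
    "en": "eng_Latn", "es": "spa_Latn", "fr": "fra_Latn", "de": "deu_Latn",
    "it": "ita_Latn", "pt": "por_Latn", "ru": "rus_Cyrl", "ja": "jpn_Jpan",
    "ko": "kor_Hang", "zh": "zho_Hans", "ar": "arb_Arab", "hi": "hin_Deva",
}

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",   # assumed checkpoint size
    src_lang=NLLB_CODES["zh"],
    tgt_lang=NLLB_CODES["en"],
)
print(translator("你好，世界")[0]["translation_text"])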

Available Whisper models (trade-off between speed and accuracy; see the transcription sketch after this list):

  • tiny - Fastest, least accurate
  • base - Fast, decent accuracy
  • small - Balanced
  • medium - Good accuracy, slower
  • large - Best accuracy, slowest
  • turbo - Fast and accurate (default)
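
For reference, this is roughly what the transcription step looks like with the openai-whisper package (a sketch under the assumption that subtool drives Whisper through its Python API; model names match the list above):

# Sketch of the ASR step with openai-whisper; not necessarily subtool's exact code.
import whisper

model = whisper.load_model("turbo")       # or tiny / base / small / medium / large
result = model.transcribe("video.mp4")    # Whisper uses ffmpeg to pull the audio track
print(result["language"])                 # auto-detected source language
for seg in result["segments"]:
    print(f"{seg['start']:.2f} --> {seg['end']:.2f}: {seg['text'].strip()}")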

Output formats:

  • MP4 (default): Compatible but limited Unicode support for subtitles
  • MKV: Better subtitle support, recommended for Chinese/Japanese/Korean content

MKV uses ASS format with embedded font information for proper CJK character rendering.
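
The embedding step is essentially an ffmpeg mux. A rough equivalent of adding one English ASS track to an MKV, with the language tag set so players label the track correctly (an illustration; the file names and exact arguments are assumptions, not subtool's actual invocation):

# Illustration of embedding a subtitle track with ffmpeg via Python's subprocess.
import subprocess

subprocess.run([
    "ffmpeg", "-y",
    "-i", "video.mp4",                   # original video
    "-i", "video.en.ass",                # generated subtitle file (name assumed)
    "-map", "0", "-map", "1",            # keep original streams plus the subtitle input
    "-c", "copy",                        # no re-encoding of audio/video
    "-c:s", "ass",                       # MKV stores ASS natively; MP4 needs mov_text instead
    "-metadata:s:s:0", "language=eng",   # language tag for the subtitle track
    "video.subbed.mkv",
], check=True)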

For development from a local checkout:

uv sync
uv run subtool video.mp4 -t en   # run the CLI using local code

# optional: editable install
uv pip install -e .

./scripts/release.sh   # release a new version

Notes:
  • First run downloads Whisper model (~1.5GB for turbo) and NLLB-200 model (~1.2GB)
  • Models are cached in ~/.cache/huggingface/
  • Requires ffmpeg installed on your system (a quick check is sketched after these notes)
  • Generated subtitle files (.srt and .ass) are saved alongside the video
  • For best CJK (Chinese/Japanese/Korean) subtitle rendering, use MKV format (-f mkv)
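
A quick pre-flight check along the lines of the notes above (a convenience sketch, not part of subtool):

# Verify ffmpeg is on PATH and show where models will be cached.
import shutil
from pathlib import Path

if shutil.which("ffmpeg") is None:
    raise SystemExit("ffmpeg not found on PATH; install it before running subtool")

cache = Path.home() / ".cache" / "huggingface"
print(f"Model cache: {cache} ({'exists' if cache.exists() else 'created on first run'})")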