Local AI-powered subtitle generation and translation
Extract and translate video subtitles using Whisper and NLLB-200.
- Automatic speech recognition using OpenAI Whisper (auto-detects language)
- Multi-language translation using Meta's NLLB-200 model
- Subtitle embedding into video files (MP4 or MKV)
- Multiple subtitle tracks with proper language naming
- SRT and ASS format support (ASS for better Unicode/CJK character rendering)
Recommended (fast, reproducible):
Run without installing:
uvx subtool video.mp4 -t en es
With pip:
# Basic usage - extract subtitles (auto-detect language)
subtool video.mp4
# Extract and translate to English and Spanish
subtool video.mp4 -t en es
# Specify source language explicitly
subtool video.mp4 -l zh -t en
# Use a different Whisper model (default: turbo)
subtool video.mp4 -m large -t en
# Output as MKV (better subtitle support for Unicode/CJK)
subtool video.mp4 -t en zh -f mkv
# Generate SRT files only (no embedding)
subtool video.mp4 -t en es --srt-only
# Custom output path
subtool video.mp4 -t en -o output_with_subs.mp4
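When processing many videos from a script, it can help to assemble the command line programmatically. The sketch below is not part of subtool; `build_subtool_command` and its parameter names are hypothetical, and it uses only the flags shown in the examples above (`-t`, `-l`, `-m`, `-f`, `--srt-only`, `-o`).

```python
from typing import List, Optional, Sequence


def build_subtool_command(
    video: str,
    targets: Sequence[str] = (),
    source_lang: Optional[str] = None,
    model: Optional[str] = None,
    container: Optional[str] = None,
    srt_only: bool = False,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for the subtool CLI (hypothetical helper)."""
    cmd = ["subtool", video]
    if source_lang:
        cmd += ["-l", source_lang]      # explicit source language
    if model:
        cmd += ["-m", model]            # Whisper model name
    if targets:
        cmd += ["-t", *targets]         # one or more target languages
    if container:
        cmd += ["-f", container]        # output container, e.g. "mkv"
    if srt_only:
        cmd.append("--srt-only")        # write subtitle files, skip embedding
    if output:
        cmd += ["-o", output]           # custom output path
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.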
Translation supports 12 common languages:
- en - English
- es - Spanish
- fr - French
- de - German
- it - Italian
- pt - Portuguese
- ru - Russian
- ja - Japanese
- ko - Korean
- zh - Chinese
- ar - Arabic
- hi - Hindi
More languages are listed in the NLLB-200 documentation.
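NLLB-200 itself identifies languages with FLORES-200 style codes such as `eng_Latn` rather than two-letter codes. The following sketch shows one way the 12 short codes above could map onto them. The table and the `to_nllb_code` helper are illustrative assumptions, not subtool's actual internals; in particular, the choice of Simplified Chinese (`zho_Hans`) for `zh` is a guess.

```python
# Hypothetical mapping from the short codes listed above to NLLB-200 codes.
NLLB_CODES = {
    "en": "eng_Latn",
    "es": "spa_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "it": "ita_Latn",
    "pt": "por_Latn",
    "ru": "rus_Cyrl",
    "ja": "jpn_Jpan",
    "ko": "kor_Hang",
    "zh": "zho_Hans",  # assumption: Simplified; zho_Hant is Traditional
    "ar": "arb_Arab",  # Modern Standard Arabic
    "hi": "hin_Deva",
}


def to_nllb_code(short_code: str) -> str:
    """Translate a short language code into an NLLB-200 code."""
    try:
        return NLLB_CODES[short_code.lower()]
    except KeyError:
        known = ", ".join(sorted(NLLB_CODES))
        raise ValueError(
            f"Unsupported language {short_code!r}; known codes: {known}"
        ) from None
```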
Available models (trade-off between speed and accuracy):
- tiny - Fastest, least accurate
- base - Fast, decent accuracy
- small - Balanced
- medium - Good accuracy, slower
- large - Best accuracy, slowest
- turbo - Fast and accurate (default)
- MP4 (default): Widely compatible, but limited Unicode support for subtitles
- MKV: Better subtitle support, recommended for Chinese/Japanese/Korean content
MKV uses ASS format with embedded font information for proper CJK character rendering.
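The two subtitle formats also write cue times differently: SRT uses `HH:MM:SS,mmm` with milliseconds, while ASS uses `H:MM:SS.cc` with centiseconds. The sketch below illustrates both conventions; the function names are hypothetical and this is not taken from subtool's source.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def ass_timestamp(seconds: float) -> str:
    """Format a time in seconds as an ASS timestamp (H:MM:SS.cc)."""
    total_cs = round(seconds * 100)  # ASS resolution is 1/100 second
    hours, rem = divmod(total_cs, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, cs = divmod(rem, 100)
    return f"{hours:d}:{minutes:02d}:{secs:02d}.{cs:02d}"


print(srt_timestamp(3661.5))  # 01:01:01,500
print(ass_timestamp(3661.5))  # 1:01:01.50
```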
uv sync
uv run subtool video.mp4 -t en # run the CLI using local code
# optional: editable install
uv pip install -e .
./scripts/release.sh # release a new version
- The first run downloads the Whisper model (~1.5GB for turbo) and the NLLB-200 model (~1.2GB)
- Models are cached in ~/.cache/huggingface/
- Requires ffmpeg installed on your system
- Generated subtitle files (.srt and .ass) are saved alongside the video
- For best CJK (Chinese/Japanese/Korean) subtitle rendering, use MKV format (-f mkv)
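Because ffmpeg is an external requirement, a wrapper script may want to confirm it is available before starting a long transcription. A minimal sketch using the standard library follows; `missing_tools` is a hypothetical helper, not something subtool provides.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("ffmpeg",)) -> List[str]:
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("ffmpeg found")
```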
