Show HN: Local AI-powered subtitle generation, translation and embedding


Local AI-powered subtitle generation and translation

Generate subtitles from a video's speech with Whisper and translate them with NLLB-200.

Automatic speech recognition using OpenAI Whisper (auto-detects language)
Multi-language translation using Meta's NLLB-200 model
Subtitle embedding into video files (MP4 or MKV)
Multiple subtitle tracks, each labeled with its language
SRT and ASS format support (ASS for better Unicode/CJK character rendering)
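For reference, the two formats time each cue differently. SRT numbers every cue and writes times as HH:MM:SS,mmm (comma before milliseconds). An ASS Dialogue line writes H:MM:SS.cc (centiseconds) and names a style. The sketch below formats the same cue both ways. The function names are hypothetical and this is not subtool's code. It assumes times arrive as whole milliseconds, because shell arithmetic is integer-only. It also assumes the ASS file's [V4+ Styles] section defines a style called Default.

```shell
# Hypothetical helpers (not subtool's code). Times are whole milliseconds
# because POSIX shell arithmetic is integer-only.

# SRT timestamp: HH:MM:SS,mmm (comma before the milliseconds).
srt_ts() {
  local ms="$1"
  printf '%02d:%02d:%02d,%03d' $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000))
}

# ASS timestamp: H:MM:SS.cc (centiseconds; the last millisecond digit is truncated).
ass_ts() {
  local ms="$1"
  printf '%d:%02d:%02d.%02d' $((ms / 3600000)) $((ms / 60000 % 60)) $((ms / 1000 % 60)) $((ms % 1000 / 10))
}

# One SRT cue: index, "start --> end" line, text, blank separator line.
# Args: index, start ms, end ms, text.
srt_cue() {
  printf '%s\n%s --> %s\n%s\n\n' "$1" "$(srt_ts "$2")" "$(srt_ts "$3")" "$4"
}

# The matching ASS [Events] line. Field order follows the standard
# "Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text".
# Assumes a style named "Default" exists in [V4+ Styles]. Args: start ms, end ms, text.
ass_dialogue() {
  printf 'Dialogue: 0,%s,%s,Default,,0,0,0,,%s\n' "$(ass_ts "$1")" "$(ass_ts "$2")" "$3"
}
```

Consider a segment that spans 1.0 s to 3.5 s. Running srt_cue 1 1000 3500 'Hello' prints the cue 1 / 00:00:01,000 --> 00:00:03,500 / Hello. Running ass_dialogue 1000 3500 'Hello' prints Dialogue: 0,0:00:01.00,0:00:03.50,Default,,0,0,0,,Hello.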

Recommended (fast, reproducible):

Run without installing:

uvx subtool video.mp4 -t en es

With pip:

# Basic usage - extract subtitles (auto-detect language)
subtool video.mp4

# Extract and translate to English and Spanish
subtool video.mp4 -t en es

# Specify source language explicitly
subtool video.mp4 -l zh -t en

# Use a different Whisper model (default: turbo)
subtool video.mp4 -m large -t en

# Output as MKV (better subtitle support for Unicode/CJK)
subtool video.mp4 -t en zh -f mkv

# Generate SRT files only (no embedding)
subtool video.mp4 -t en es --srt-only

# Custom output path
subtool video.mp4 -t en -o output_with_subs.mp4

Translation supports 12 common languages:

en - English
es - Spanish
fr - French
de - German
it - Italian
pt - Portuguese
ru - Russian
ja - Japanese
ko - Korean
zh - Chinese
ar - Arabic
hi - Hindi

NLLB-200 itself covers about 200 languages. See the NLLB-200 documentation for the full list.
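NLLB-200 does not use two-letter codes. It identifies languages with FLORES-200 codes that combine a language code and a script code, such as eng_Latn. A tool like this therefore has to convert the -t values at some point. The sketch below shows one such lookup for the 12 codes listed above. The function name nllb_code is hypothetical, and so are two mappings: Simplified Chinese (zho_Hans) for zh and Modern Standard Arabic (arb_Arab) for ar. It is not known which variants subtool actually picks.

```shell
# Hypothetical lookup (not subtool's code): CLI two-letter code -> NLLB-200
# FLORES-200 code. zh -> Simplified Chinese and ar -> Modern Standard Arabic
# are assumptions.
nllb_code() {
  case "$1" in
    en) echo eng_Latn ;;
    es) echo spa_Latn ;;
    fr) echo fra_Latn ;;
    de) echo deu_Latn ;;
    it) echo ita_Latn ;;
    pt) echo por_Latn ;;
    ru) echo rus_Cyrl ;;
    ja) echo jpn_Jpan ;;
    ko) echo kor_Hang ;;
    zh) echo zho_Hans ;;
    ar) echo arb_Arab ;;
    hi) echo hin_Deva ;;
    *) echo "unsupported language code: $1" >&2; return 1 ;;
  esac
}
```

The lookup uses a case statement rather than an associative array. That keeps it portable to plain POSIX sh, and unknown codes fail with a non-zero exit status.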

Available Whisper models for -m (trade-off between speed and accuracy):

tiny - Fastest, least accurate
base - Fast, decent accuracy
small - Balanced
medium - Good accuracy, slower
large - Best accuracy, slowest
turbo - Fast and accurate (default)

MP4 (default): Widely compatible, but limited Unicode support for subtitles
MKV: Better subtitle support, recommended for Chinese/Japanese/Korean content

MKV uses ASS format with embedded font information for proper CJK character rendering.
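Embedding means muxing each subtitle file into the container as its own track. subtool's actual ffmpeg invocation isn't shown, so what follows is a hypothetical sketch. It prints, rather than runs, one plausible ffmpeg command for a single track. ffmpeg cannot stream-copy SRT or ASS into MP4, so for MP4 the track is converted to mov_text, which drops ASS styling. MKV can store the ASS or SRT stream unchanged. The language tag is stored as stream metadata using an ISO 639-2 code such as eng, spa or zho. The command assumes you only want the input's video and audio streams kept. That way s:0 is guaranteed to be the newly added subtitle track.

```shell
# Hypothetical helper (not subtool's implementation): print an ffmpeg command
# that adds one subtitle file to a video as a new track.
# Args: input video, .srt/.ass file, ISO 639-2 language code, output path.
embed_cmd() {
  local video="$1" subs="$2" lang="$3" out="$4" scodec
  case "$out" in
    *.mp4) scodec=mov_text ;;  # MP4 text subtitles must be mov_text
    *.mkv) scodec=copy ;;      # Matroska keeps SRT/ASS streams as-is
    *) echo "unsupported container: $out" >&2; return 1 ;;
  esac
  # Keep the input's video and any audio ("?" = optional), add the subtitle
  # file as the only subtitle stream, copy everything except subtitles.
  echo ffmpeg -i "$video" -i "$subs" -map 0:v -map "0:a?" -map 1:0 \
    -c copy -c:s "$scodec" -metadata:s:s:0 language="$lang" "$out"
}
```

To execute the command instead of printing it, drop the leading echo. For several languages, add one -i input and one -map entry per subtitle file, then set language metadata on s:1, s:2, and so on.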

Development:

uv sync
uv run subtool video.mp4 -t en   # run the CLI using local code
# optional: editable install
uv pip install -e .
./scripts/release.sh   # release a new version

The first run downloads the Whisper model (~1.5GB for turbo) and the NLLB-200 model (~1.2GB)
Models are cached in ~/.cache/huggingface/
Requires ffmpeg to be installed on your system
Generated subtitle files (.srt and .ass) are saved alongside the video
For best CJK (Chinese/Japanese/Korean) subtitle rendering, use MKV format (-f mkv)
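ffmpeg is a system dependency rather than a Python package, so it is worth confirming it is installed before a long transcription run. Whether subtool performs its own check isn't stated. Below is a minimal preflight sketch. require_cmd is a hypothetical helper, not a subtool option.

```shell
# Hypothetical preflight helper (not part of subtool): report whether a
# required command is on PATH; non-zero exit status if it is missing.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1 (install it with your system package manager)" >&2
    return 1
  fi
}
```

For example, require_cmd ffmpeg && du -sh ~/.cache/huggingface confirms ffmpeg is present. It then shows how much disk space the cached models occupy.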
