Show HN: Lightning-SimulWhisper: Real-Time ASR for Apple Silicon
The fastest, most power-efficient real-time local transcription on your Apple Silicon devices ✨
Zero PyTorch dependencies ⛔
18x speedup on encoding, 15x speedup on decoding ⚡
Lightning-SimulWhisper implements the Whisper model for simultaneous transcription using MLX (Apple's machine learning framework) and CoreML for optimal performance on Apple Silicon devices. It uses the AlignAtt policy for streaming speech recognition.
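To give a rough feel for the AlignAtt policy, here is a minimal sketch of the published idea, not this repo's actual code (the function name, threshold, and array shapes are illustrative): a newly decoded token is emitted only if the audio frame it attends to most strongly is safely far from the end of the audio received so far.
# Minimal sketch of the AlignAtt emission rule (illustrative, not this repo's code)
import numpy as np

def alignatt_emit(cross_attention, frame_threshold=4):
    # cross_attention: (num_new_tokens, num_audio_frames) decoder cross-attention
    # over the audio frames available so far.
    num_frames = cross_attention.shape[1]
    emitted = []
    for token_idx, weights in enumerate(cross_attention):
        most_attended_frame = int(np.argmax(weights))
        if most_attended_frame >= num_frames - frame_threshold:
            break  # token leans on audio right at the boundary: wait for more input
        emitted.append(token_idx)  # attention is on "settled" audio, safe to emit
    return emitted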
Using the original SimulStreaming project, I could barely run the base model in real time. Now I can run the medium and even large-v3-turbo models in real time on my M2 MacBook Pro.
The MLX-only version consumes way too much power, so using the CoreML encoder is recommended.
CoreML Encoder: While the encoder speedup is dramatic (up to 18x faster), the overall inference-time improvement is more modest because the decoder still runs on MLX (a rough worked example follows this list)
MLX Decoder: MLX provides up to 15x decoder speedup compared to PyTorch implementations, demonstrating excellent Apple Silicon optimization
Power Efficiency: CoreML acceleration uses significantly less power than MLX-only implementations, though exact power measurements weren't captured in this benchmark
Decoder Performance: MLX decoder performance remains consistent across implementations, showing the stability of the MLX framework
Speed Gains: Up to an 18x encoder speedup (CoreML) and a 15x decoder speedup (MLX vs. PyTorch) with the recommended configuration
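To make the "more modest overall improvement" concrete, here is a back-of-the-envelope calculation with made-up per-chunk timings (not measurements from this project): when the decoder dominates the per-chunk time, even an 18x encoder speedup moves end-to-end latency by far less.
# Illustrative only: hypothetical per-chunk timings, not measurements from this project.
encoder_ms, decoder_ms = 300.0, 700.0               # assumed all-MLX per-chunk times
encoder_speedup = 18.0                              # CoreML encoder vs. MLX encoder

before = encoder_ms + decoder_ms                    # 1000 ms per chunk
after = encoder_ms / encoder_speedup + decoder_ms   # ~717 ms per chunk
print(f"end-to-end speedup: {before / after:.2f}x") # ~1.4x, even though the encoder is 18x faster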
Note: I have no idea how to benchmark power consumption for a specific process. Any contributions or suggestions for accurate power measurement on Apple Silicon would be greatly appreciated!
MLX Implementation: Native Apple Silicon optimization with MLX framework (up to 15x decoder speedup)
CoreML Encoder: Up to 18x faster encoding using Apple's Neural Engine
Multiple Model Support: tiny, base, small, medium, large-v1, large-v2, large-v3
Beam Search: Configurable beam search decoding
Real-time Streaming: Both file simulation and live microphone input
Power Efficient: Low power consumption with CoreML acceleration
pip install -r requirements.txt
CoreML Acceleration (Recommended)
For optimal performance on Apple Silicon, install CoreML dependencies:
pip install coremltools ane_transformers
Generate CoreML encoder models for faster inference:
# Clone whisper.cpp for CoreML model generation
git clone https://github.com/ggml-org/whisper.cpp.git
# Generate CoreML encoder for your preferred model
./scripts/generate_coreml_encoder.sh base.en
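To sanity-check a generated encoder outside the streaming pipeline, it can be loaded directly with coremltools. The model path, input feature name, and mel shape below are placeholders (the 80 x 3000 log-mel shape matches standard Whisper encoders); print the model spec first and adjust to whatever the generation script actually produced.
# Rough sanity check of a generated CoreML encoder (paths and feature names are placeholders).
import coremltools as ct
import numpy as np

model = ct.models.MLModel("models/coreml-encoder-base.en.mlpackage")  # placeholder path

# Inspect the declared inputs/outputs to find the real feature names.
spec = model.get_spec()
print([inp.name for inp in spec.description.input])
print([out.name for out in spec.description.output])

# Whisper encoders take a log-mel spectrogram; (1, 80, 3000) covers 30 s of audio.
dummy_mel = np.zeros((1, 80, 3000), dtype=np.float32)
outputs = model.predict({"logmel_data": dummy_mel})  # swap "logmel_data" for the real input name
print({name: value.shape for name, value in outputs.items()})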