Show HN: Talk to your Mac offline – sub-second Voice AI (Apple Silicon and MLX)

5 hours ago 1

Ultra-responsive, full-duplex voice assistant tuned for Apple Silicon + MLX. End-to-end round trip (speech → LLM → TTS) is consistently under 1 second, even while handling barge-in.

Watch the demo

  • On-device VAD, STT (Whisper), LLM, and Kokoro TTS sharing a single MLX runtime
  • Sentence-streaming LLM responses with immediate, cancellable TTS playback
  • Client-side AudioWorklet correlator for robust echo suppression and barge-in
  • Rolling audio buffer to preserve context around interruptions
python -m venv venv source venv/bin/activate pip install -r requirements.txt python server.py

Then open http://localhost:8000 and hit Start Listening.

  • config.py: audio rate, VAD thresholds, queue sizes, model choices
  • index.html: INTERRUPTION object for interruption sensitivity
  • requirements.txt: pin runtime dependencies for MLX + Kokoro

Architecture (at a glance)

Mic (16 kHz) → Correlator → VAD State Machine → Segment Queue ↓ ↓ WebSocket ↔ Browser UI MLX Whisper STT ↓ MLX LLM → Kokoro TTS ↓ Playback + barge-in feedback
  • server.py – FastAPI WebSocket server + pipeline orchestration
  • audio_buffer.py, vad_detector.py – speech segmentation utilities
  • transcriber.py, llm_handler.py, tts_handler.py – model wrappers
  • index.html, correlator.js – interactive UI with barge-in logic

Silero VAD, Whisper, MLX, Kokoro TTS, and Pipecat SmartTurn for EoU detection. MIT licensed.

Read Entire Article