🧠 Local Neural Text-to-Speech for Firefox — fast, private, offline.
Tested on a Xeon E3-1265L v3 (2013): multiple TTS jobs ran in parallel with barely perceptible lag.
If it works on this, it'll fly on your machine.
Kokoro TTS is a Firefox extension that lets you convert selected or pasted text into natural-sounding speech — without needing an internet connection.
It uses a lightweight Flask server and the Kokoro model running locally on your system.
- ✅ No accounts or logins
- ✅ No cloud APIs or telemetry
- ✅ No GPU required: a GPU helps a lot, but the server falls back to the CPU if no usable GPU is found
- 🎙️ Neural TTS with multiple voice options
- 🔒 Offline-first & privacy-respecting
- 🧊 Lightweight: only 82M parameters
- 🥔 Works on low-end CPUs
- 🌍 Linux, macOS, and Windows support
Head to the Releases Page and grab:
- latest kokoro-tts-addon.xpi
- server.py
- Go to about:addons
- Click the gear icon → Install Add-on From File...
- Select the .xpi you downloaded
Create a .bat file like this:
Drop a shortcut to it in the Startup folder (Win + R → shell:startup).
To install espeak-ng on Windows:
- Go to espeak-ng releases
- Click on Latest release
- Download the appropriate *.msi file (e.g. espeak-ng-20191129-b702b03-x64.msi)
- Run the downloaded installer
For advanced configuration and usage on Windows, see the official espeak-ng Windows guide
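After running the installer, a quick way to see whether the `espeak-ng` executable is reachable is to look it up on `PATH`. The snippet below is a minimal sketch, not part of the add-on: `find_espeak` is a hypothetical helper name, and it assumes the executable is called `espeak-ng`. The Windows installer may not add its folder to `PATH`, so a `None` result does not prove espeak-ng is missing.

```python
import shutil
from typing import Optional


def find_espeak(name: str = "espeak-ng") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    # shutil.which applies the platform's lookup rules (PATH, plus PATHEXT on Windows).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_espeak()
    if path:
        print(f"espeak-ng found at {path}")
    else:
        print("espeak-ng not found on PATH; it may still be installed elsewhere")
```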
- Visit http://localhost:8000/health
- You should see a simple “healthy” JSON response
- Use the extension: paste text, pick a voice, click “Generate Speech” 🎉
- First-time run will download the model
- Make sure Python 3.8+ is installed and in PATH
- All processing is local — nothing leaves your machine
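The health check above can also be scripted, for example to confirm the server is up after a reboot. This is a rough sketch using only the standard library. The URL comes from the steps above, but the exact JSON shape is an assumption: the code only looks for a string value equal to `healthy` somewhere in the response, and `looks_healthy` and `check_server` are hypothetical helper names.

```python
import http.client
import json
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8000/health"  # endpoint named in the steps above


def _string_values(node):
    """Yield every string value found anywhere in a decoded JSON document."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from _string_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from _string_values(value)


def looks_healthy(body: bytes) -> bool:
    """True if body is valid JSON with a string value equal to 'healthy'.

    Assumption: the real response schema is not documented here, so adjust
    this check to whatever your server actually returns.
    """
    try:
        payload = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    return any(s.strip().lower() == "healthy" for s in _string_values(payload))


def check_server(url: str = HEALTH_URL, timeout: float = 3.0) -> bool:
    """Fetch the health endpoint; False if unreachable or not healthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and looks_healthy(resp.read())
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


if __name__ == "__main__":
    print("server is up" if check_server() else "server not reachable or not healthy")
```

If the script reports that the server is not reachable, check that `server.py` is running and that nothing else is using port 8000.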
You’ll need Python 3.8+ and pip installed. Most systems already have them.
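If you are unsure whether your interpreter meets these requirements, a short check like the one below can help. It is only a sketch: `python_ok` and `pip_available` are hypothetical helper names, and the check covers only the interpreter that runs the script, which may differ from the one your `PATH` or `.bat` file uses.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated above


def python_ok(version=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """True if the (major, minor) part of version is at least the minimum."""
    return tuple(version[:2]) >= tuple(minimum)


def pip_available() -> bool:
    """True if the pip package can be found by this interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}:",
          "OK" if python_ok() else "too old, 3.8+ needed")
    print("pip:", "available" if pip_available() else "not found for this interpreter")
```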
To install all required Python packages (including some optional extras for extended model usage), run:
Licensed under the Apache License 2.0
Powered by the Kokoro TTS model
Comparison of offline generation using MKLDNN vs. online generation using WASM/WebGPU.



