A simple, hotkey-driven voice transcription script for the Sway window manager. It captures audio, transcribes it through an API, and inserts the text at the cursor position. Instructions for an NVIDIA Parakeet backend are included if you want one.
Purely for personal use, satisfaction not guaranteed.
- Install the required dependencies:

  ```bash
  # Ubuntu/Debian
  sudo apt install alsa-utils curl wtype libnotify-bin jq

  # Arch Linux
  sudo pacman -S alsa-utils curl wtype libnotify jq
  ```
- Create your own configuration by copying the example file:

  ```bash
  cp config.env.example config.env
  ```

  Then set the endpoint in `config.env`:

  ```bash
  API_ENDPOINT="http://localhost:8000/transcribe" # Or wherever your OpenAI-compatible speech-to-text API lives
  ```
- Set up a hotkey in Sway by adding this to your `~/.config/sway/config`, then reload Sway (for example with `swaymsg reload`):

  ```
  bindsym $mod+Shift+v exec /path/to/steno/voice-to-text.sh
  ```
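Before binding the hotkey, a quick preflight can confirm that the tools listed above are on `PATH` and that `config.env` loads. This is only a sketch: `check_deps`, `load_config`, and the fallback endpoint are hypothetical names, and sourcing the file as shell code is an assumption about how such a file might be read, not necessarily what `voice-to-text.sh` does.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not part of steno itself.

# check_deps CMD... : print missing commands to stderr, return 1 if any.
# Checks commands rather than package names, since packages differ per
# distro (libnotify-bin on Debian and libnotify on Arch both ship notify-send).
check_deps() {
  local missing=() cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing+=("$cmd")
  done
  if (( ${#missing[@]} > 0 )); then
    echo "missing: ${missing[*]}" >&2
    return 1
  fi
  return 0
}

# load_config [FILE] : source FILE (default config.env) if it exists and
# export the variables it assigns; fall back to an assumed default endpoint.
# Sourcing executes the file as shell code, so only use a trusted file.
load_config() {
  local file="${1:-config.env}"
  if [[ -f "$file" ]]; then
    set -a            # auto-export every variable assigned while sourcing
    # shellcheck source=/dev/null
    . "$file"
    set +a
  fi
  : "${API_ENDPOINT:=http://localhost:8000/transcribe}"
  export API_ENDPOINT
}
```

Typical use from the `steno` directory: `check_deps arecord curl wtype notify-send jq && load_config && echo "$API_ENDPOINT"`.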
- Start recording: press your configured hotkey.
  - Shows a "🎤 Recording started..." notification
- Stop recording: press the same hotkey again.
  - Shows a "🔄 Transcribing..." notification
  - Transcribes the audio and inserts the text at the cursor
  - Shows a "✅ Text inserted..." confirmation
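The start/stop behavior above can be implemented by keeping the recorder's PID in a state file: if a live recorder exists, stop it and transcribe; otherwise start one. This is a minimal sketch, not the actual `voice-to-text.sh`. The file paths, the `arecord` flags (16 kHz mono WAV), the multipart field name `file`, and the `{"text": ...}` response shape are all assumptions; the last two follow OpenAI-style transcription APIs, which may also require a `model` field.

```bash
#!/usr/bin/env bash
# Minimal toggle sketch; NOT the real voice-to-text.sh. Paths, recorder
# flags, and the request/response format are assumptions.
set -uo pipefail

STATE_DIR="${XDG_RUNTIME_DIR:-/tmp}"
PID_FILE="$STATE_DIR/steno-sketch.pid"   # hypothetical state file
WAV_FILE="$STATE_DIR/steno-sketch.wav"   # hypothetical recording path
API_ENDPOINT="${API_ENDPOINT:-http://localhost:8000/transcribe}"

# Assumed recorder: 16 kHz, mono, 16-bit WAV from ALSA's arecord.
RECORD_CMD=(arecord -q -f S16_LE -r 16000 -c 1 -t wav "$WAV_FILE")

notify() {
  # Desktop notification via libnotify, skipped if notify-send is absent.
  if command -v notify-send >/dev/null 2>&1; then
    notify-send "steno" "$1"
  fi
}

is_recording() {
  # True only if the PID file exists and that process is still alive.
  [[ -f "$PID_FILE" ]] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}

transcribe() {
  # Assumption: multipart field "file", JSON reply like {"text": "..."}.
  curl -sS -F "file=@$WAV_FILE" "$API_ENDPOINT" | jq -r '.text // empty'
}

insert_text() {
  # wtype types the given string into the focused Wayland window.
  wtype "$1"
}

stop_recorder() {
  local pid tries=0
  pid="$(cat "$PID_FILE")"
  rm -f "$PID_FILE"
  kill "$pid" 2>/dev/null || return 0   # SIGTERM; already gone is fine
  # Give the recorder up to ~2 s to exit so it can finish the WAV file.
  while kill -0 "$pid" 2>/dev/null && (( tries < 40 )); do
    sleep 0.05
    tries=$(( tries + 1 ))
  done
}

toggle() {
  if is_recording; then
    stop_recorder
    notify "🔄 Transcribing..."
    local text
    text="$(transcribe)"
    if [[ -n "$text" ]]; then
      insert_text "$text"
      notify "✅ Text inserted..."
    fi
    echo "stopped"
  else
    # Detach the recorder's stdout/stderr so callers using $(toggle) don't block.
    "${RECORD_CMD[@]}" >/dev/null 2>&1 &
    echo "$!" > "$PID_FILE"
    notify "🎤 Recording started..."
    echo "started"
  fi
}
```

A hotkey wrapper would load these functions and call `toggle` once per key press. Because the state lives in a file rather than in memory, each press can be a fresh process, which matches how `bindsym ... exec` launches the script.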
I like to live dangerously, have an NVIDIA GPU with CUDA > 12.1, and containers don't scare me (in small doses).
Fine, here you go:
```bash
git clone https://github.com/Shadowfita/parakeet-fastapi.git
cd parakeet-fastapi
docker build -t parakeet-stt .
# --gpus needs the NVIDIA Container Toolkit on the host
docker run -d -p 8000:8000 --gpus all parakeet-stt

# Then go back to the parent directory
cd ../
git clone https://github.com/winston-bosan/steno.git
cd steno
chmod +x voice-to-text.sh
./voice-to-text.sh   # start recording
# SAY YOUR STUFF
./voice-to-text.sh   # stop recording, transcribe, and insert the text
```
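The container may take a while after `docker run` before it answers on port 8000. A small helper can poll the published port and then send one short test recording. This is a sketch: `wait_for_port` and `smoke_test` are hypothetical helpers, the upload format (multipart field `file`) is an assumption about the server, and an open port does not necessarily mean the model has finished loading.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking the Parakeet container; port 8000
# matches the docker run line above.

# wait_for_port HOST PORT TIMEOUT_SECONDS : return 0 once a TCP connect
# succeeds, 1 if it never does within the timeout.
wait_for_port() {
  local host="$1" port="$2" timeout="$3" waited=0
  while (( waited < timeout )); do
    # Opening bash's /dev/tcp pseudo-path attempts a TCP connection.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$(( waited + 1 ))
  done
  return 1
}

# smoke_test [ENDPOINT] : record 3 s from the default ALSA device and
# print the raw server response.
smoke_test() {
  local endpoint="${1:-http://localhost:8000/transcribe}"
  local wav
  wav="$(mktemp --suffix=.wav)"
  arecord -q -d 3 -f S16_LE -r 16000 -c 1 -t wav "$wav"
  curl -sS -F "file=@$wav" "$endpoint"
  echo
  rm -f "$wav"
}
```

For example: `wait_for_port localhost 8000 120 && smoke_test`.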

