Show HN: Wayland Speech-to-Text Tool


Press a keybind, speak, and get text output. waystt is a background speech-to-text tool that transcribes audio using OpenAI Whisper (or, optionally, Google Speech-to-Text) and either types the result directly or copies it to the clipboard.

Signal-driven: Press keybind → speak → get text (no GUI needed)
Dual output modes: Direct typing or clipboard copy
Background operation: Runs continuously, always ready
Audio feedback: Beeps confirm recording start/stop and success
Wayland native: Works with modern Linux desktops (Hyprland, Niri, etc.)

Wayland desktop (Hyprland, Niri, GNOME, KDE, etc.)
OpenAI API key (for Whisper transcription)
System packages:

# Arch Linux
sudo pacman -S pipewire ydotool wtype

# Ubuntu/Debian
sudo apt install pipewire-pulse ydotool wtype

# Fedora
sudo dnf install pipewire-pulseaudio ydotool wtype

Setup ydotool permissions:

sudo usermod -a -G input $USER # Log out and back in
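The re-login matters because group membership is fixed when a session starts. A minimal sketch for confirming that the current session has picked up the `input` group follows; the helper name `in_current_groups` is hypothetical and not part of waystt.

```shell
#!/bin/sh
# Sketch: check whether the CURRENT session belongs to a group.
# in_current_groups is a hypothetical helper, not part of waystt.
in_current_groups() {
    # With no user argument, id -nG lists the groups of the running process,
    # so a group added by usermod only shows up after logging in again.
    id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_current_groups input; then
    echo "this session is in the input group"
else
    echo "this session is not in the input group (log out and back in after usermod)"
fi
```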

# Using your preferred AUR helper
yay -S waystt-bin
# or
paru -S waystt-bin

Download from GitHub Releases
Install:

wget https://github.com/sevos/waystt/releases/download/v0.1.1/waystt-linux-x86_64
mkdir -p ~/.local/bin
mv waystt-linux-x86_64 ~/.local/bin/waystt
chmod +x ~/.local/bin/waystt

# Add to PATH (add to ~/.bashrc or ~/.zshrc)
export PATH="$HOME/.local/bin:$PATH"

Setup configuration:

# Create config directory and file
mkdir -p ~/.config/waystt
echo "OPENAI_API_KEY=your_api_key_here" > ~/.config/waystt/.env

Start the service:

cd ~/.config/waystt && nohup ~/.local/bin/waystt > /tmp/waystt.log 2>&1 & disown

Use with signals:

# Direct typing mode
pkill --signal SIGUSR1 waystt

# Clipboard mode
pkill --signal SIGUSR2 waystt

Add to your ~/.config/hypr/hyprland.conf:

# waystt - Speech to Text (direct typing)
bind = SUPER, R, exec, pgrep -x waystt >/dev/null && pkill -USR1 waystt || (cd ~/.config/waystt && ~/.local/bin/waystt &)

# waystt - Speech to Text (clipboard copy)
bind = SUPER SHIFT, R, exec, pgrep -x waystt >/dev/null && pkill -USR2 waystt || (cd ~/.config/waystt && ~/.local/bin/waystt &)

These keybindings behave as follows:

Super+R: Start waystt if not running, or send SIGUSR1 to transcribe and type directly
Super+Shift+R: Start waystt if not running, or send SIGUSR2 to transcribe and copy to clipboard
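Both keybindings repeat the same start-or-signal logic inline. As a sketch, that logic could live in one small wrapper script that each keybinding calls with a mode argument. The script name `waystt-trigger`, the function names, and the mode words `type` and `clipboard` are illustrative assumptions, not something waystt ships.

```shell
#!/bin/sh
# Hypothetical wrapper, e.g. saved as ~/.local/bin/waystt-trigger.
# Usage: waystt-trigger type|clipboard

# Map an output mode to the signal used for it in the keybindings above.
signal_for_mode() {
    case "$1" in
        type)      echo USR1 ;;   # transcribe and type directly
        clipboard) echo USR2 ;;   # transcribe and copy to clipboard
        *)         return 1 ;;
    esac
}

# Signal waystt if it is running, otherwise start it in the background.
trigger() {
    sig=$(signal_for_mode "$1") || { echo "usage: type|clipboard" >&2; return 2; }
    if pgrep -x waystt >/dev/null; then
        pkill "-$sig" -x waystt
    else
        (cd "$HOME/.config/waystt" && "$HOME/.local/bin/waystt" &)
    fi
}

# Act only when a mode argument was passed on the command line.
if [ $# -gt 0 ]; then
    trigger "$1"
fi
```

With such a script on the PATH, a Hyprland binding would shrink to something like `bind = SUPER, R, exec, waystt-trigger type`.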

Add to your ~/.config/niri/config.kdl:

binds {
    // waystt - Speech to Text (direct typing)
    Mod+R { spawn "sh" "-c" "pgrep -x waystt >/dev/null && pkill -USR1 waystt || (cd ~/.config/waystt && ~/.local/bin/waystt &)"; }

    // waystt - Speech to Text (clipboard copy)
    Mod+Shift+R { spawn "sh" "-c" "pgrep -x waystt >/dev/null && pkill -USR2 waystt || (cd ~/.config/waystt && ~/.local/bin/waystt &)"; }
}

Configuration is read from ~/.config/waystt/.env by default. You can override this location using the --envfile flag:

waystt --envfile /path/to/custom/.env

waystt supports two transcription providers: OpenAI Whisper (default) and Google Speech-to-Text. Choose the one that best fits your needs.

OpenAI Whisper offers excellent accuracy and supports automatic language detection.

Required: Create ~/.config/waystt/.env with your OpenAI API key:

OPENAI_API_KEY=your_api_key_here

Optional OpenAI settings:

# Whisper model (whisper-1 is default, most cost-effective)
WHISPER_MODEL=whisper-1

# Force specific language (default: auto-detect)
WHISPER_LANGUAGE=en

# API timeout in seconds
WHISPER_TIMEOUT_SECONDS=60

# Max retry attempts
WHISPER_MAX_RETRIES=3

Google Speech-to-Text provides fast, accurate transcription with support for many languages and dialects.

Setup Steps:

Enable Google Cloud Speech-to-Text API:
- Go to Google Cloud Console
- Create a new project or select existing one
- Enable the "Cloud Speech-to-Text API"
- Create a service account and download the JSON key file
Configure waystt for Google:

# Switch to Google provider
TRANSCRIPTION_PROVIDER=google

# Path to your service account JSON file
GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/service-account-key.json

# Primary language (default: en-US)
GOOGLE_SPEECH_LANGUAGE_CODE=en-US

# Model selection (latest_long for longer audio, latest_short for shorter)
GOOGLE_SPEECH_MODEL=latest_long

# Optional: Alternative languages for auto-detection (comma-separated)
GOOGLE_SPEECH_ALTERNATIVE_LANGUAGES=es-ES,fr-FR,de-DE

Popular Google language codes:

en-US - English (United States)
en-GB - English (United Kingdom)
es-ES - Spanish (Spain)
fr-FR - French (France)
de-DE - German (Germany)
ja-JP - Japanese
zh-CN - Chinese (Simplified)

Audio and system settings (apply to both providers):

# Disable audio beeps
ENABLE_AUDIO_FEEDBACK=false

# Adjust beep volume (0.0 to 1.0)
BEEP_VOLUME=0.1

# Debug logging
RUST_LOG=debug
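Before starting waystt, the provider settings described above can be sanity-checked from the shell. The following is a rough sketch only: `env_value` and `check_env` are hypothetical helpers, and the parsing assumes plain `KEY=value` lines without quotes or an `export` prefix, which may not match every detail of how waystt itself reads the file.

```shell
#!/bin/sh
# Sketch: sanity-check a waystt env file (helpers are hypothetical).

# Print the value assigned to KEY ($2) in FILE ($1); the last assignment wins.
env_value() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Check that the selected provider has the setting it needs.
check_env() {
    file=$1
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    provider=$(env_value "$file" TRANSCRIPTION_PROVIDER)
    if [ "$provider" = "google" ]; then
        creds=$(env_value "$file" GOOGLE_APPLICATION_CREDENTIALS)
        [ -r "$creds" ] || { echo "GOOGLE_APPLICATION_CREDENTIALS is missing or unreadable" >&2; return 1; }
        echo "ok: Google provider configured"
    else
        # OpenAI Whisper is the default provider
        key=$(env_value "$file" OPENAI_API_KEY)
        [ -n "$key" ] || { echo "OPENAI_API_KEY is not set" >&2; return 1; }
        echo "ok: OpenAI provider configured"
    fi
}

# Check the file named on the command line, if one was given.
if [ $# -gt 0 ]; then
    check_env "$1"
fi
```

Saved under a name of your choosing, it could be run as `sh check-waystt-env.sh ~/.config/waystt/.env`.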

If audio recording fails:

Ensure PipeWire is running: systemctl --user status pipewire
Check microphone permissions
Verify microphone is not muted

If direct text typing (SIGUSR1) fails:

Ensure ydotool is installed and user is in input group
Check ydotool permissions: sudo usermod -a -G input $USER (requires re-login)
Verify ydotool daemon is running: systemctl --user status ydotool

If clipboard operations (SIGUSR2) fail:

Ensure you're running under Wayland: echo $WAYLAND_DISPLAY
Install wtype, which is required for clipboard pasting
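The local checks above can be bundled into one quick preflight. This is a sketch under the assumption that the tools are expected on the PATH under the names `ydotool` and `wtype` and that PipeWire runs as a user service named `pipewire`; `missing_tools` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: local preflight for the troubleshooting checks above.

# Print each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools ydotool wtype)
[ -z "$missing" ] || echo "not installed:" $missing

# An empty WAYLAND_DISPLAY suggests the session is not running under Wayland.
[ -n "${WAYLAND_DISPLAY:-}" ] || echo "WAYLAND_DISPLAY is empty: not a Wayland session?"

# Assumes PipeWire is managed as a systemd user service.
systemctl --user is-active --quiet pipewire 2>/dev/null \
    || echo "PipeWire does not appear to be running"
```

A run that prints nothing means all of these checks passed.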

OpenAI Provider:

Verify your OpenAI API key is valid and has sufficient credits
Check internet connectivity
Review logs for specific error messages

Google Provider:

Verify your service account JSON file path is correct
Ensure the Speech-to-Text API is enabled in your Google Cloud project
Check that your service account has the necessary permissions
Verify your Google Cloud project has billing enabled
Review logs for specific error messages

Running with Debug Output

# Using default config location (~/.config/waystt/.env)
RUST_LOG=debug cargo run

# Or using project-local .env file for development
RUST_LOG=debug cargo run -- --envfile .env

git clone https://github.com/sevos/waystt.git
cd waystt

# Create config directory and copy example configuration
mkdir -p ~/.config/waystt
cp .env.example ~/.config/waystt/.env
# Edit ~/.config/waystt/.env with your API key

# Build the project
cargo build --release

# Install to local bin
mkdir -p ~/.local/bin
cp ./target/release/waystt ~/.local/bin/

Licensed under GPL v3.0 or later. Source code: https://github.com/sevos/waystt

See LICENSE for full terms.
