Show HN: Sokuji – Open-source real-time speech translation for Microsoft Teams



Live speech translation powered by OpenAI's Realtime API


Sokuji is a desktop application that provides live speech translation using OpenAI's Realtime API. It bridges language barriers in live conversations by capturing audio input, processing it through OpenAI's models, and delivering translated output in real time.

Demo video: demo.mp4

Prefer not to install a desktop application? Try our browser extension for Chrome and Chromium-based browsers. It offers the same powerful live speech translation features directly in your browser, with special integration for Google Meet and Microsoft Teams.


Installing Browser Extension in Developer Mode

If you want to install the latest version of the browser extension:

  1. Download the latest sokuji-extension.zip from the releases page
  2. Extract the zip file to a folder
  3. Open Chrome/Chromium and go to chrome://extensions/
  4. Enable "Developer mode" in the top right corner
  5. Click "Load unpacked" and select the extracted folder
  6. The Sokuji extension will be installed and ready to use

Sokuji goes beyond basic translation by offering a complete audio routing solution with virtual device management, allowing for seamless integration with other applications. It provides a modern, intuitive interface with real-time audio visualization and comprehensive logging.

  1. Real-time speech translation using OpenAI's Realtime API
  2. Support for GPT-4o Realtime and GPT-4o mini Realtime models
  3. Automatic turn detection with multiple modes (Normal, Semantic, Disabled)
  4. Audio visualization with waveform display
  5. Virtual audio device creation and management on Linux (using PulseAudio/PipeWire)
  6. Automatic audio routing between virtual devices
  7. Audio input and output device selection
  8. Comprehensive logs for tracking API interactions
  9. Customizable model settings (temperature, max tokens)
  10. User transcript model selection (gpt-4o-mini-transcribe, gpt-4o-transcribe, whisper-1)
  11. Noise reduction options (None, Near field, Far field)
  12. API key validation with real-time feedback
  13. Configuration persistence in the user's home directory
  14. Multi-channel audio support (stereo)
  15. Push-to-talk functionality with Space key shortcut

Audio Routing Diagram

Sokuji creates virtual audio devices to facilitate seamless audio routing:

  • Sokuji_Virtual_Speaker: A virtual output sink that receives audio from the application
  • Sokuji_Virtual_Mic: A virtual microphone that can be selected as input in other applications
  • Automatic connection between these devices using PipeWire's pw-link tool
  • Multi-channel support (stereo audio)
  • Proper cleanup of virtual devices when the application exits
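
Sokuji manages these devices automatically, but the effect is similar to setting up the routing by hand with standard PulseAudio/PipeWire tools. A minimal sketch follows; the module choices and node names are illustrative, not necessarily the exact calls Sokuji makes:

    # Create a virtual sink that applications can play into
    pactl load-module module-null-sink sink_name=Sokuji_Virtual_Speaker \
        sink_properties=device.description=Sokuji_Virtual_Speaker

    # Expose the sink's monitor as a virtual microphone for other apps
    pactl load-module module-remap-source source_name=Sokuji_Virtual_Mic \
        master=Sokuji_Virtual_Speaker.monitor \
        source_properties=device.description=Sokuji_Virtual_Mic

    # Inspect the PipeWire graph to see the resulting ports
    pw-link --output    # available output ports
    pw-link --input     # available input ports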

Understanding the Audio Routing Diagram

The diagram above illustrates the audio flow between Sokuji and other applications:

  • Chromium: Represents the Sokuji application itself (as an Electron app, Sokuji shows up as "Chromium" in the audio graph)
  • Google Chrome: Represents meeting applications like Google Meet, Microsoft Teams, or Zoom running in Chrome
  • Sokuji_Virtual_Speaker: A virtual speaker created by Sokuji
  • Sokuji_Virtual_Mic: A virtual microphone created by Sokuji
  • HyperX 7.1 Audio: Represents a physical audio device

The numbered connections in the diagram represent:

Connection ①: Sokuji's audio output is always sent to the virtual speaker (this cannot be changed)
Connection ②: Sokuji's audio is also always routed to the virtual microphone (this cannot be changed)
Connection ③: The monitoring device selected in Sokuji's audio settings, used to play back the translated audio
Connection ④: The audio output device selected in Google Meet/Microsoft Teams (configured in their settings)
Connection ⑤: The virtual microphone selected as input in Google Meet/Microsoft Teams (configured in their settings)
Connection ⑥: The input device selected in Sokuji's audio settings

This routing system allows Sokuji to capture audio from your selected input device, process it through OpenAI's Realtime API, and then output the translated audio both to your local speakers and to other applications via the virtual microphone.
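
On Linux you can check that these connections are actually in place with the same PipeWire tool Sokuji relies on. The node names below match the diagram; the exact port names on your system may differ:

    # List all current links in the PipeWire graph
    pw-link -l

    # Example: manually connect the virtual speaker's monitor to the virtual mic
    # (Sokuji normally creates these links itself)
    pw-link "Sokuji_Virtual_Speaker:monitor_FL" "Sokuji_Virtual_Mic:input_FL"
    pw-link "Sokuji_Virtual_Speaker:monitor_FR" "Sokuji_Virtual_Mic:input_FR"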

Prerequisites

  • (required) An OpenAI API key with access to the Realtime API
  • (required) Linux with PulseAudio or PipeWire for virtual audio device support (desktop app only)
  • For building from source:
    • Node.js (latest LTS version recommended)
    • npm
  • For Linux virtual audio device support:
    • PulseAudio or PipeWire
    • PipeWire tools (pw-link)
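
To confirm the PipeWire tools are available before launching the desktop app, you can check for pw-link on the command line. The package names below are typical for Debian/Ubuntu and may differ on other distributions:

    # Check that the PipeWire link tool is installed
    which pw-link || echo "pw-link not found"

    # On Debian/Ubuntu the tool usually ships in the pipewire-bin package
    sudo apt install pipewire pipewire-bin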
Building from Source

  1. Clone the repository

    git clone https://github.com/kizuna-ai-lab/sokuji.git
    cd sokuji
  2. Install dependencies

  3. Launch the application in development mode

  4. Build the application for production

Installing the Debian Package

Download the latest Debian package from the releases page and install it:

    sudo dpkg -i sokuji_*.deb
Usage

  1. Set up your API key:

    API Settings

    • Click the Settings button in the top-right corner
    • Enter your OpenAI API key and click "Validate"
    • Click "Save" to store your API key securely
  2. Configure audio devices:

    Audio Settings

    • Click the Audio button to open the Audio panel
    • Select your input device (microphone)
    • Select your output device (speakers/headphones)
  3. Start a session:

    • Click "Start Session" to begin
    • Speak into your microphone
    • View real-time transcription and translation
  4. Use with other applications:

    • Select "Sokuji_Virtual_Mic" as the microphone input in your target application
    • The translated audio will be sent to that application
Built With

  • Electron 34
  • React 18
  • TypeScript
  • OpenAI Realtime API
  • PulseAudio/PipeWire for virtual audio devices
  • SASS for styling
  • React-Feather for icons

License

AGPL-3.0
