Show HN: Sokuji – Open-source real-time speech translation for Microsoft Teams



Live speech translation powered by OpenAI's Realtime API


Sokuji is a desktop application that provides live speech translation using OpenAI's Realtime API. It bridges language barriers in live conversations by capturing audio input, processing it through OpenAI's models, and delivering translated output in real time.

Demo video: demo.mp4

Prefer not to install a desktop application? Try our browser extension for Chrome and Chromium-based browsers. It offers the same powerful live speech translation features directly in your browser, with special integration for Google Meet and Microsoft Teams.


Installing Browser Extension in Developer Mode

If you want to install the latest version of the browser extension:

  1. Download the latest sokuji-extension.zip from the releases page
  2. Extract the zip file to a folder
  3. Open Chrome/Chromium and go to chrome://extensions/
  4. Enable "Developer mode" in the top right corner
  5. Click "Load unpacked" and select the extracted folder
  6. The Sokuji extension will be installed and ready to use

Sokuji goes beyond basic translation by offering a complete audio routing solution with virtual device management, allowing for seamless integration with other applications. It provides a modern, intuitive interface with real-time audio visualization and comprehensive logging.

  1. Real-time speech translation using OpenAI's Realtime API
  2. Support for GPT-4o Realtime and GPT-4o mini Realtime models
  3. Automatic turn detection with multiple modes (Normal, Semantic, Disabled)
  4. Audio visualization with waveform display
  5. Virtual audio device creation and management on Linux (using PulseAudio/PipeWire)
  6. Automatic audio routing between virtual devices
  7. Audio input and output device selection
  8. Comprehensive logs for tracking API interactions
  9. Customizable model settings (temperature, max tokens)
  10. User transcript model selection (gpt-4o-mini-transcribe, gpt-4o-transcribe, whisper-1)
  11. Noise reduction options (None, Near field, Far field)
  12. API key validation with real-time feedback
  13. Configuration persistence in the user's home directory
  14. Multi-channel audio support (stereo)
  15. Push-to-talk functionality with Space key shortcut

Audio Routing Diagram

Sokuji creates virtual audio devices to facilitate seamless audio routing:

  • Sokuji_Virtual_Speaker: A virtual output sink that receives audio from the application
  • Sokuji_Virtual_Mic: A virtual microphone that can be selected as input in other applications
  • Automatic connection between these devices using PipeWire's pw-link tool
  • Multi-channel support (stereo audio)
  • Proper cleanup of virtual devices when the application exits
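
Sokuji manages these devices automatically, but the effect is similar to setting up the routing by hand with standard PulseAudio/PipeWire tools. A minimal sketch follows; the module choices and node names are illustrative, not necessarily the exact calls Sokuji makes:

    # Create a virtual sink that applications can play into
    pactl load-module module-null-sink sink_name=Sokuji_Virtual_Speaker \
        sink_properties=device.description=Sokuji_Virtual_Speaker

    # Expose the sink's monitor as a virtual microphone for other apps
    pactl load-module module-remap-source source_name=Sokuji_Virtual_Mic \
        master=Sokuji_Virtual_Speaker.monitor \
        source_properties=device.description=Sokuji_Virtual_Mic

    # Inspect the PipeWire graph to see the resulting ports
    pw-link --output    # available output ports
    pw-link --input     # available input ports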

Understanding the Audio Routing Diagram

The diagram above illustrates the audio flow between Sokuji and other applications:

  • Chromium: Represents the Sokuji application itself (as an Electron app, Sokuji shows up as "Chromium" in the audio graph)
  • Google Chrome: Represents meeting applications like Google Meet, Microsoft Teams, or Zoom running in Chrome
  • Sokuji_Virtual_Speaker: A virtual speaker created by Sokuji
  • Sokuji_Virtual_Mic: A virtual microphone created by Sokuji
  • HyperX 7.1 Audio: Represents a physical audio device

The numbered connections in the diagram represent:

Connection ①: Sokuji's audio output is always sent to the virtual speaker (this cannot be changed)
Connection ②: Sokuji's audio is also always routed to the virtual microphone (this cannot be changed)
Connection ③: The monitoring device selected in Sokuji's audio settings, used to play back the translated audio
Connection ④: The audio output device selected in Google Meet/Microsoft Teams (configured in their settings)
Connection ⑤: The virtual microphone selected as input in Google Meet/Microsoft Teams (configured in their settings)
Connection ⑥: The input device selected in Sokuji's audio settings

This routing system allows Sokuji to capture audio from your selected input device, process it through OpenAI's Realtime API, and then output the translated audio both to your local speakers and to other applications via the virtual microphone.
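
On Linux you can check that these connections are actually in place with the same PipeWire tool Sokuji relies on. The node names below match the diagram; the exact port names on your system may differ:

    # List all current links in the PipeWire graph
    pw-link -l

    # Example: manually connect the virtual speaker's monitor to the virtual mic
    # (Sokuji normally creates these links itself)
    pw-link "Sokuji_Virtual_Speaker:monitor_FL" "Sokuji_Virtual_Mic:input_FL"
    pw-link "Sokuji_Virtual_Speaker:monitor_FR" "Sokuji_Virtual_Mic:input_FR"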

Prerequisites

  • (required) An OpenAI API key with access to the Realtime API
  • (required) Linux with PulseAudio or PipeWire for virtual audio device support (desktop app only)
  • For building from source:
    • Node.js (latest LTS version recommended)
    • npm
  • For Linux virtual audio device support:
    • PulseAudio or PipeWire
    • PipeWire tools (pw-link)
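
To confirm the PipeWire tools are available before launching the desktop app, you can check for pw-link on the command line. The package names below are typical for Debian/Ubuntu and may differ on other distributions:

    # Check that the PipeWire link tool is installed
    which pw-link || echo "pw-link not found"

    # On Debian/Ubuntu the tool usually ships in the pipewire-bin package
    sudo apt install pipewire pipewire-bin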
Building from Source

  1. Clone the repository

    git clone https://github.com/kizuna-ai-lab/sokuji.git
    cd sokuji
  2. Install dependencies

  3. Launch the application in development mode

  4. Build the application for production

Installing the Debian Package

Download the latest Debian package from the releases page and install it:

    sudo dpkg -i sokuji_*.deb
Usage

  1. Set up your API key:

    API Settings

    • Click the Settings button in the top-right corner
    • Enter your OpenAI API key and click "Validate"
    • Click "Save" to store your API key securely
  2. Configure audio devices:

    Audio Settings

    • Click the Audio button to open the Audio panel
    • Select your input device (microphone)
    • Select your output device (speakers/headphones)
  3. Start a session:

    • Click "Start Session" to begin
    • Speak into your microphone
    • View real-time transcription and translation
  4. Use with other applications:

    • Select "Sokuji_Virtual_Mic" as the microphone input in your target application
    • The translated audio will be sent to that application
Built With

  • Electron 34
  • React 18
  • TypeScript
  • OpenAI Realtime API
  • PulseAudio/PipeWire for virtual audio devices
  • SASS for styling
  • React-Feather for icons

License

AGPL-3.0
