Vsdk – Hacky, educational voice SDK

A fun, educational, hacky voice SDK.
Not production-ready. For serious projects, consider Pipecat or LiveKit.

Why does this project even exist?

My best friend and I were curious how hard it would be to write a voice SDK without external orchestration libraries, so we hacked one together in a few days. We also wrote an article about our voice-AI journey.

Coolest Feature: The agent can pause its speech when you interject with short phrases like "mhmm" and then seamlessly resume.
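To make the pause-and-resume idea concrete, here is a minimal sketch of one way it could work. It is not taken from the Vsdk source: the phrase list, the `is_backchannel` helper, and the `Playback` class are all hypothetical, and the real project may decide differently (for example, on audio duration rather than transcript text).

```python
import re
from typing import List, Optional

# Hypothetical set of acknowledgement phrases; the project's actual rule may differ.
BACKCHANNELS = {"mhm", "mhmm", "uh huh", "yeah", "ok", "okay", "right", "i see"}


def is_backchannel(transcript: str, max_words: int = 2) -> bool:
    """Return True if the interjection looks like a short acknowledgement."""
    # Lowercase and turn punctuation into spaces so "Mhmm." and "uh-huh" match.
    text = re.sub(r"[^a-z\s]", " ", transcript.lower())
    words = text.split()
    if not words or len(words) > max_words:
        return False
    return " ".join(words) in BACKCHANNELS


class Playback:
    """Toy stand-in for the agent's audio output: a list of chunks plus a cursor."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = chunks
        self.position = 0
        self.paused = False

    def on_user_speech_started(self) -> None:
        # Stop emitting audio as soon as the user starts talking.
        self.paused = True

    def on_user_transcript(self, transcript: str) -> None:
        if not is_backchannel(transcript):
            # Real interruption: discard everything the agent had left to say.
            self.chunks = self.chunks[: self.position]
        # Either way, playback is unblocked; after a backchannel it resumes
        # from the chunk where it was paused.
        self.paused = False

    def next_chunk(self) -> Optional[str]:
        if self.paused or self.position >= len(self.chunks):
            return None
        chunk = self.chunks[self.position]
        self.position += 1
        return chunk
```

The key design choice in this sketch is that pausing never discards audio. Chunks are dropped only once the transcript shows the user actually wanted to take the turn.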

You might find leaking buffers. It's hacky, but it works, and that's pretty cool.

Authors: @bnowako, @moscicky

Python
uv
API keys for OpenAI, Groq, and ElevenLabs

Install packages:
Set up environment variables:
- Create a .env file in the backend directory.
- Use backend/.env.example as a template.
Run the application:
Open http://localhost:8000/vsdk in your browser and start talking to the agent.
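If the agent does not respond, a missing API key is a likely cause. The snippet below is a small, hypothetical pre-flight check. The variable names are assumptions based on each provider's usual convention, so treat backend/.env.example as the source of truth. It only inspects the process environment and does not parse the .env file itself.

```python
import os
from typing import List, Mapping, Optional, Sequence

# Assumed names; check backend/.env.example for the ones the project really uses.
REQUIRED_KEYS = ("OPENAI_API_KEY", "GROQ_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(
    env: Optional[Mapping[str, str]] = None,
    required: Sequence[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected API keys are set.")
```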

Twilio Integration: A Twilio-compatible WebSocket interface is available at http://localhost:8000/twilio.
Customizable Interfaces: You can implement your own Agent with custom logic or integrate STT/TTS services from different providers.

    class VoiceAgent:
        def __init__(
            self,
            stt: BaseSTT,
            tts: BaseTTS,
            agent: BaseAgent,
        ) -> None:
            self.stt = stt
            self.tts = tts
            self.agent = agent
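As an illustration of how the three pieces could be swapped out, here is a self-contained sketch. Only the `VoiceAgent` constructor comes from this README. The base-class methods (`transcribe`, `synthesize`, `respond`), the fake implementations, and the `handle_turn` helper are assumptions made for the example, and the real interfaces are probably asynchronous and streaming rather than one call per turn.

```python
from abc import ABC, abstractmethod


# Hypothetical interfaces: the method names below are NOT confirmed by the project.
class BaseSTT(ABC):
    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...


class BaseTTS(ABC):
    @abstractmethod
    def synthesize(self, text: str) -> bytes: ...


class BaseAgent(ABC):
    @abstractmethod
    def respond(self, user_text: str) -> str: ...


class VoiceAgent:
    # Constructor as shown in the README.
    def __init__(self, stt: BaseSTT, tts: BaseTTS, agent: BaseAgent) -> None:
        self.stt = stt
        self.tts = tts
        self.agent = agent


class FakeSTT(BaseSTT):
    """Pretends the audio bytes are UTF-8 text, so no speech model is needed."""

    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeTTS(BaseTTS):
    """Returns the text as bytes instead of real synthesized audio."""

    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class EchoAgent(BaseAgent):
    """Custom logic goes here; this one just repeats the user."""

    def respond(self, user_text: str) -> str:
        return f"You said: {user_text}"


def handle_turn(voice_agent: VoiceAgent, audio: bytes) -> bytes:
    """Hypothetical single turn: speech -> text -> reply -> speech."""
    user_text = voice_agent.stt.transcribe(audio)
    reply = voice_agent.agent.respond(user_text)
    return voice_agent.tts.synthesize(reply)


voice_agent = VoiceAgent(stt=FakeSTT(), tts=FakeTTS(), agent=EchoAgent())
print(handle_turn(voice_agent, b"hello"))
```

Because `VoiceAgent` depends only on the base types, replacing `FakeSTT` with a wrapper around a different provider should not require touching the agent logic.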