Build an AI Voice Agent for Calls with Open Source


Make inbound and outbound calls with AI agents using VideoSDK. The project supports multiple SIP providers and AI agents through a clean, extensible architecture for VoIP telephony.

Prerequisites

Python 3.11+
VideoSDK account
Twilio account (SIP trunking provider)
Google API key (for Gemini AI)

Clone the repository

git clone https://github.com/yourusername/ai-agent-telephony.git
cd ai-agent-telephony

Install dependencies

pip install -r requirements.txt

Configure environment variables by creating a .env file:

# VideoSDK Configuration
VIDEOSDK_AUTH_TOKEN=your_videosdk_token
VIDEOSDK_SIP_USERNAME=your_sip_username
VIDEOSDK_SIP_PASSWORD=your_sip_password

# AI Configuration
GOOGLE_API_KEY=your_google_api_key

# Twilio SIP Trunking Configuration
TWILIO_SID=your_twilio_sid
TWILIO_AUTH_TOKEN=your_twilio_auth_token
TWILIO_NUMBER=your_twilio_number
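The repository's own loading code is not shown here (python-dotenv is a common choice). As a dependency-free sketch, the following reads a .env file like the one above into `os.environ` and reports which of the required variables are still missing. `parse_env`, `load_env_file`, `missing_required`, and `REQUIRED_VARS` are hypothetical names, and the parser assumes simple `KEY=value` lines without multi-line values or interpolation.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# The seven variables from the example .env above (hypothetical constant name)
REQUIRED_VARS = [
    "VIDEOSDK_AUTH_TOKEN",
    "VIDEOSDK_SIP_USERNAME",
    "VIDEOSDK_SIP_PASSWORD",
    "GOOGLE_API_KEY",
    "TWILIO_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_NUMBER",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are skipped."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, e.g. "abc" -> abc
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Copy a .env file into os.environ; by default, variables already set win."""
    values = parse_env(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

A startup script could call `load_env_file()` and then exit with a clear message if `missing_required(os.environ)` is non-empty.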

Run the server

The server will start on http://localhost:8000

Handle Inbound Calls (SIP User Agent Server)

Handles incoming calls from your SIP provider. The endpoint expects Twilio webhook parameters, so Twilio must be able to reach it: either host this server publicly or expose it locally with ngrok.

POST <server-url>/inbound-call

CallSid: Unique call identifier
From: Caller's phone number (CLI - Calling Line Identification)
To: Recipient's phone number (DID - Direct Inward Dialing)

Initiate Outbound Calls (SIP User Agent Client)

POST /outbound-call
Content-Type: application/json

{
  "to_number": "+1234567890",
  "initial_greeting": "Hello from AI Agent!"
}
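To call this endpoint from Python rather than curl, a small standard-library client could look like the following. The E.164 check on `to_number` is an assumption added on the client side (the server may or may not enforce it), and `build_outbound_request` is a hypothetical helper.

```python
import json
import re
import urllib.request

# E.164: a '+', a non-zero leading digit, at most 15 digits in total
E164 = re.compile(r"\+[1-9]\d{1,14}")


def build_outbound_request(
    base_url: str, to_number: str, initial_greeting: str
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the /outbound-call endpoint."""
    if not E164.fullmatch(to_number):
        raise ValueError(f"to_number must be in E.164 format, got {to_number!r}")
    payload = json.dumps(
        {"to_number": to_number, "initial_greeting": initial_greeting}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/outbound-call",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(req)`. The shape of the JSON response is not documented here, so inspect it before relying on any particular field.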

Configure SIP Provider

POST /configure-provider?provider_name=twilio

Switch SIP providers at runtime (currently supports: twilio).
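Switching providers at runtime only works if request handlers look the provider up on each call instead of binding one at startup. A hypothetical in-memory holder illustrating that idea (not this project's code):

```python
class ProviderSelector:
    """Holds the active SIP provider name so handlers can read it per request."""

    def __init__(self, supported: set[str], default: str) -> None:
        if default not in supported:
            raise ValueError(f"Default provider {default!r} is not supported")
        self._supported = set(supported)
        self._active = default

    @property
    def active(self) -> str:
        return self._active

    def switch(self, provider_name: str) -> str:
        """Validate and activate a provider; returns the normalized name."""
        name = provider_name.strip().lower()
        if name not in self._supported:
            raise ValueError(
                f"Unsupported provider {provider_name!r}; choose from {sorted(self._supported)}"
            )
        self._active = name
        return name
```

A `/configure-provider` handler would call `switch()`, and the outbound-call handler would read `active` on every request. This state lives in a single process, so a deployment with several server workers would need to switch each one or share the setting some other way.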

The modular architecture makes it easy to add new SIP providers and SIP trunking services. Here's how to add a new provider:

1. Create Provider Implementation

Create providers/your_provider.py:

from typing import Any, Dict

from .base import SIPProvider
from config import Config


class YourProvider(SIPProvider):
    def __init__(self):
        self.client = self.create_client()

    def create_client(self) -> Any:
        # YourProviderClient is a placeholder for your provider's SDK client class
        return YourProviderClient(Config.YOUR_API_KEY)

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        # Call-control markup that bridges the call to the SIP endpoint
        # (TwiML shown; use whatever format your provider expects)
        return f"<Response><Dial><Sip>{sip_endpoint}</Sip></Dial></Response>"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        call = self.client.calls.create(
            to=to_number,
            from_=Config.YOUR_NUMBER,
            twiml=twiml,
        )
        return {
            "call_sid": call.id,
            "status": call.status,
            "provider": "your_provider",
        }

    def get_provider_name(self) -> str:
        return "your_provider"
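`providers/base.py` is not shown. Inferring from the four methods `YourProvider` implements, the base class is presumably an abstract interface along these lines. This reconstruction and the `FakeProvider` test double are assumptions; they are included because an in-memory provider lets you exercise the outbound-call path without a real SIP trunk.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SIPProvider(ABC):
    """Hypothetical reconstruction of the provider interface."""

    @abstractmethod
    def create_client(self) -> Any: ...

    @abstractmethod
    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str: ...

    @abstractmethod
    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]: ...

    @abstractmethod
    def get_provider_name(self) -> str: ...


class FakeProvider(SIPProvider):
    """Test double: records calls in a list instead of dialing anyone."""

    def __init__(self) -> None:
        self.client = self.create_client()

    def create_client(self) -> Any:
        return []  # the "client" is just a log of (to_number, twiml) pairs

    def generate_twiml(self, sip_endpoint: str, **kwargs) -> str:
        return f"<Response><Dial><Sip>{sip_endpoint}</Sip></Dial></Response>"

    def initiate_outbound_call(self, to_number: str, twiml: str) -> Dict[str, Any]:
        self.client.append((to_number, twiml))
        return {
            "call_sid": f"FAKE{len(self.client)}",
            "status": "queued",
            "provider": self.get_provider_name(),
        }

    def get_provider_name(self) -> str:
        return "fake"
```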

2. Update Provider Factory

Add to providers/__init__.py:

# Alongside the file's existing imports of SIPProvider and TwilioProvider
from .your_provider import YourProvider


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    providers = {
        "twilio": TwilioProvider,
        "your_provider": YourProvider,
    }
    # ... rest of function
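The snippet above elides the rest of `get_provider`. One plausible completion, using empty stand-in classes so it runs on its own, looks the class up by name and fails loudly on unknown names; the error handling is an assumption, not necessarily what the repository does.

```python
from typing import Dict, Type


class SIPProvider: ...  # stand-in for providers.base.SIPProvider
class TwilioProvider(SIPProvider): ...  # stand-ins so the sketch is self-contained
class YourProvider(SIPProvider): ...


PROVIDERS: Dict[str, Type[SIPProvider]] = {
    "twilio": TwilioProvider,
    "your_provider": YourProvider,
}


def get_provider(provider_name: str = "twilio") -> SIPProvider:
    """Instantiate the provider registered under provider_name."""
    provider_cls = PROVIDERS.get(provider_name)
    if provider_cls is None:
        available = ", ".join(sorted(PROVIDERS))
        raise ValueError(f"Unknown provider {provider_name!r}. Available: {available}")
    return provider_cls()
```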

3. Update Configuration

Update config.py:

import os


class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            # ... existing vars
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # ... rest of validation
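The rest of `validate` is elided as well. A minimal sketch that collects every missing variable and reports them in a single error (the exception type and message are assumptions):

```python
import os


class Config:
    YOUR_API_KEY = os.getenv("YOUR_API_KEY")
    YOUR_NUMBER = os.getenv("YOUR_NUMBER")

    @classmethod
    def validate(cls) -> None:
        required_vars = {
            "YOUR_API_KEY": cls.YOUR_API_KEY,
            "YOUR_NUMBER": cls.YOUR_NUMBER,
        }
        # None (unset) and "" (set but empty) both count as missing
        missing = [name for name, value in required_vars.items() if not value]
        if missing:
            raise ValueError(
                f"Missing required environment variables: {', '.join(missing)}"
            )
```

Calling `Config.validate()` once at startup surfaces configuration mistakes before the first call arrives, rather than in the middle of one.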

Similarly, you can add new AI agents for intelligent call handling:

1. Create AI Agent Implementation

Create ai/your_ai_agent.py:

from typing import Any, Dict

from videosdk.agents import AgentSession, RealTimePipeline

from .base_agent import AIAgent
from voice_agent import VoiceAgent
from config import Config


class YourAIAgent(AIAgent):
    def create_pipeline(self) -> RealTimePipeline:
        # YourAIModel is a placeholder for your model's real-time plugin class;
        # YOUR_AI_API_KEY must also be added to config.py
        model = YourAIModel(
            api_key=Config.YOUR_AI_API_KEY,
            model="your-model-name",
        )
        return RealTimePipeline(model=model)

    def create_session(self, room_id: str, context: Dict[str, Any]) -> AgentSession:
        pipeline = self.create_pipeline()
        agent_context = {
            "name": "Your AI Agent",
            "meetingId": room_id,
            "videosdk_auth": Config.VIDEOSDK_AUTH_TOKEN,
            **context,  # caller-supplied values override the defaults above
        }
        session = AgentSession(
            agent=VoiceAgent(context=agent_context),
            pipeline=pipeline,
            context=agent_context,
        )
        return session

    def get_agent_name(self) -> str:
        return "your_ai_agent"
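`ai/base_agent.py` is not shown either. Judging from `YourAIAgent`, the `AIAgent` base is presumably an abstract class with three methods. The sketch below is a hypothetical reconstruction that uses `Any` in place of VideoSDK's `RealTimePipeline` and `AgentSession` so it runs without the SDK. Its `EchoAgent` test double also demonstrates one detail of the dictionary merge in `create_session`: because `**context` comes last, caller-supplied keys override the defaults.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AIAgent(ABC):
    """Hypothetical reconstruction of the AI agent interface."""

    @abstractmethod
    def create_pipeline(self) -> Any: ...

    @abstractmethod
    def create_session(self, room_id: str, context: Dict[str, Any]) -> Any: ...

    @abstractmethod
    def get_agent_name(self) -> str: ...


class EchoAgent(AIAgent):
    """Test double: returns the merged context instead of a real session."""

    def create_pipeline(self) -> Any:
        return None

    def create_session(self, room_id: str, context: Dict[str, Any]) -> Any:
        # Same merge order as YourAIAgent: defaults first, caller context last
        return {"name": "Echo Agent", "meetingId": room_id, **context}

    def get_agent_name(self) -> str:
        return "echo"
```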

2. Update AI Agent Factory

Add to ai/__init__.py:

# Alongside the file's existing imports of AIAgent and GeminiAgent
from .your_ai_agent import YourAIAgent


def get_ai_agent(agent_name: str = "gemini") -> AIAgent:
    agents = {
        "gemini": GeminiAgent,
        "your_ai_agent": YourAIAgent,
    }
    # ... rest of function
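As an alternative to editing the dictionary in `ai/__init__.py` for each new agent, a decorator-based registry lets each agent class register itself. This is a swapped-in pattern, not how the project is described as working; `register_agent`, `AGENTS`, and the empty stand-in classes are hypothetical.

```python
from typing import Callable, Dict, Type, TypeVar

T = TypeVar("T")

# Maps agent names to classes; filled in as decorated classes are defined
AGENTS: Dict[str, type] = {}


def register_agent(name: str) -> Callable[[Type[T]], Type[T]]:
    """Class decorator that records cls under name, rejecting duplicates."""

    def decorator(cls: Type[T]) -> Type[T]:
        if name in AGENTS:
            raise ValueError(f"Agent {name!r} is already registered")
        AGENTS[name] = cls
        return cls

    return decorator


def get_ai_agent(agent_name: str = "gemini") -> object:
    """Instantiate the agent registered under agent_name."""
    agent_cls = AGENTS.get(agent_name)
    if agent_cls is None:
        raise ValueError(f"Unknown agent {agent_name!r}. Available: {sorted(AGENTS)}")
    return agent_cls()


@register_agent("gemini")
class GeminiAgent:  # stand-in for the real Gemini agent
    pass


@register_agent("your_ai_agent")
class YourAIAgent:  # stand-in for the class defined above
    pass
```

The trade-off is that registration only happens when the defining module is imported, so `ai/__init__.py` would still need to import every agent module.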

Health Check

curl "http://localhost:8000/health"

Outbound Call Test (SIP UAC)

curl -X POST "http://localhost:8000/outbound-call" \
  -H "Content-Type: application/json" \
  -d '{"to_number": "+1234567890", "initial_greeting": "Hello from AI Agent!"}'

Provider Switch Test

curl -X POST "http://localhost:8000/configure-provider?provider_name=twilio"
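For a scripted smoke test, the health check can be wrapped in a function. This sketch only checks for an HTTP 200 status, since the body returned by `/health` is not documented. `check_health` is hypothetical, and it takes an injectable `opener` so it can be tested without a running server.

```python
import urllib.request
from typing import Any, Callable


def check_health(
    base_url: str = "http://localhost:8000",
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 5.0,
) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    try:
        with opener(f"{base_url.rstrip('/')}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and refused connections are all OSError subclasses
        return False
```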

Required Variables

Variable	Description	Required

VIDEOSDK_AUTH_TOKEN	VideoSDK authentication token	✅
VIDEOSDK_SIP_USERNAME	VideoSDK SIP username	✅
VIDEOSDK_SIP_PASSWORD	VideoSDK SIP password	✅
GOOGLE_API_KEY	Google API key for Gemini	✅
TWILIO_SID	Twilio account SID	✅
TWILIO_AUTH_TOKEN	Twilio auth token	✅
TWILIO_NUMBER	Twilio phone number	✅

Provider-Specific Variables

For additional SIP providers, add their specific environment variables to config.py.

Features

SIP/VoIP Integration: Pluggable SIP providers (Twilio, and more) with session initiation protocol support
AI-Powered Voice Agents: Pluggable AI agents (Gemini, and more) for intelligent call handling
Real-time Voice Communication: AI agents with real-time transport protocol (RTP) capabilities
Modular Architecture: Clean separation of concerns for scalable telephony solutions
Runtime Configuration: Switch SIP providers and AI agents without restart
VideoSDK Integration: Seamless room creation and session management
Call Control: Advanced call routing, forwarding, and transfer capabilities
Codec Support: Multiple audio codecs for optimal voice quality

Use Cases

Customer Service (SIP-based)

AI agents handle customer inquiries via VoIP
24/7 availability with SIP trunking
Consistent service quality across PSTN and IP networks

Appointment Scheduling

Automated appointment booking via SIP calls
Reminder calls using SIP user agent client
Rescheduling assistance with DTMF support

Surveys and Feedback

Automated survey calls over SIP
Customer feedback collection via VoIP
Data collection with real-time transport protocol

Alerts and Notifications

Automated emergency alerts via SIP trunking
Mass notification systems using PSTN integration
Status updates through IP multimedia subsystem (IMS)

Architecture Benefits

Separation of Concerns: Each component has a single responsibility
Extensibility: Easy to add new SIP providers and AI agents
Testability: Components can be tested in isolation
Maintainability: Clear structure makes code easier to understand
Reusability: Components can be reused across different projects
Configuration Management: Centralized configuration with validation
SIP Compliance: Full session initiation protocol support
VoIP Integration: Seamless integration with voice over internet protocol

Roadmap

Add support for multiple AI agents per session
Implement SIP-specific features (SBC, registrar, proxy server)
Add monitoring and metrics for SIP sessions
Create provider-specific webhook handlers
Add support for different voice codecs per AI agent
Implement call recording and transcription
Add sentiment analysis for call quality
Create web dashboard for call management
Support for H.323 protocol integration
Advanced call control features (forwarding, transfer, queue)

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow the existing code patterns
Add proper error handling
Include logging
Update documentation
Add tests if possible

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ for the developer community
