Namo-v1 is a collection of open-source turn-detection models that tackle one of the hardest problems in conversational AI: knowing when a user has finished speaking. Namo uses Natural Language Understanding (NLU) to analyze semantic context, producing more natural conversations with fewer interruptions and lower latency.
Namo's solution: analyze the semantic meaning and context of speech with NLU, distinguishing between:
- ✅ Complete utterances (user is done speaking)
- 🔄 Incomplete utterances (user will continue speaking)
- Semantic Understanding: Analyzes meaning and context, not just silence
- Ultra-Fast Inference: <19ms for specialized models, <29ms for multilingual
- Lightweight: ~135MB (specialized) / ~295MB (multilingual)
- High Accuracy: Up to 97.3% for specialized models, 90.25% average for multilingual
- Production-Ready: ONNX-optimized for real-time, enterprise-grade applications
- Easy Integration: Standalone usage or plug-and-play with VideoSDK Agents SDK
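Downstream, the two classes map to a simple end-of-turn decision. A minimal sketch of that mapping (the label order and the 0.5 threshold are assumptions for illustration, not part of the model spec):

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def is_end_of_turn(logits, threshold=0.5):
    """True when the 'complete utterance' class wins.

    Assumes index 1 is the 'complete' class -- check the model's
    label config for the actual order.
    """
    return softmax(logits)[1] >= threshold
```

An agent would respond when `is_end_of_turn(...)` is `True` and keep listening otherwise; raising the threshold trades more latency for fewer interruptions.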
Namo offers both specialized single-language models and a unified multilingual model:
- Model: Namo-Turn-Detector-v1-Multilingual
- Base: mmBERT
- Languages: All 23 supported languages
- Inference: <29ms
- Size: ~295MB
- Average Accuracy: 90.25%
Evaluated on 25,000+ diverse utterances across all supported languages.
| Language | Accuracy | Precision | Recall | F1 Score | Samples |
|---|---|---|---|---|---|
| 🇹🇷 Turkish | 97.31% | 0.9611 | 0.9853 | 0.9730 | 966 |
| 🇰🇷 Korean | 96.85% | 0.9541 | 0.9842 | 0.9690 | 890 |
| 🇯🇵 Japanese | 94.36% | 0.9099 | 0.9857 | 0.9463 | 834 |
| 🇩🇪 German | 94.25% | 0.9135 | 0.9772 | 0.9443 | 1,322 |
| 🇮🇳 Hindi | 93.98% | 0.9276 | 0.9603 | 0.9436 | 1,295 |
| 🇳🇱 Dutch | 92.79% | 0.8959 | 0.9738 | 0.9332 | 1,401 |
| 🇳🇴 Norwegian | 91.65% | 0.8717 | 0.9801 | 0.9227 | 1,976 |
| 🇨🇳 Chinese | 91.64% | 0.8859 | 0.9608 | 0.9219 | 945 |
| 🇫🇮 Finnish | 91.58% | 0.8746 | 0.9702 | 0.9199 | 1,010 |
| 🇬🇧 English | 90.86% | 0.8507 | 0.9801 | 0.9108 | 2,845 |
| 🇵🇱 Polish | 90.68% | 0.8619 | 0.9568 | 0.9069 | 976 |
| 🇮🇩 Indonesian | 90.22% | 0.8514 | 0.9707 | 0.9071 | 971 |
| 🇮🇹 Italian | 90.15% | 0.8562 | 0.9640 | 0.9069 | 782 |
| 🇩🇰 Danish | 89.73% | 0.8517 | 0.9644 | 0.9045 | 779 |
| 🇵🇹 Portuguese | 89.56% | 0.8410 | 0.9676 | 0.8999 | 1,398 |
| 🇪🇸 Spanish | 88.88% | 0.8304 | 0.9681 | 0.8940 | 1,295 |
| 🇮🇳 Marathi | 88.50% | 0.8762 | 0.9008 | 0.8883 | 774 |
| 🇺🇦 Ukrainian | 87.94% | 0.8164 | 0.9587 | 0.8819 | 929 |
| 🇷🇺 Russian | 87.48% | 0.8318 | 0.9547 | 0.8890 | 1,470 |
| 🇻🇳 Vietnamese | 86.45% | 0.8135 | 0.9439 | 0.8738 | 1,004 |
| 🇸🇦 Arabic | 84.90% | 0.7965 | 0.9439 | 0.8639 | 947 |
| 🇧🇩 Bengali | 79.40% | 0.7874 | 0.7939 | 0.7907 | 1,000 |
Average Accuracy: 90.25% across all languages
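The F1 column is the standard harmonic mean of precision and recall, which you can verify against any row of the table; here for the Turkish row:

```python
# F1 = harmonic mean of precision and recall.
precision, recall = 0.9611, 0.9853  # Turkish row of the table above
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # matches the table's 0.9730
```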
For detailed performance metrics of individual specialized models, visit their respective model pages.
- Architecture: DistilBERT-based
- Inference: <19ms
- Size: ~135MB each
📊 View Full Collection: All Namo Models on HuggingFace
All Namo models are optimized using ONNX quantization, which shrinks model size and compute cost while maintaining high accuracy:
- 2.19x relative speedup compared to unquantized models
- Inference time: Reduced from 61.3ms → 28.0ms
- Throughput: More than doubled
- Accuracy Impact: Negligible
- Latency: Virtually zero perceptible delay in conversations
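The quoted 2.19x figure follows directly from the latency numbers above, and a speedup above 2x is what makes the throughput claim hold:

```python
# Relative speedup implied by the quoted latencies.
baseline_ms, quantized_ms = 61.3, 28.0
speedup = baseline_ms / quantized_ms
print(round(speedup, 2))  # 2.19 -- throughput scales by the same factor
```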
We’ve provided an inference script to help you quickly test these models. Just plug it in and start experimenting!
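For orientation, here is a minimal sketch of what standalone inference can look like, assuming `onnxruntime`, `transformers`, and `numpy` are installed. The repo id, input names, and label order below are assumptions based on the collection name; verify them against the model page and the provided script:

```python
def detect_turn(text, session, tokenizer):
    """Classify `text` as a complete (True) or incomplete (False) utterance."""
    import numpy as np

    enc = tokenizer(text, return_tensors="np", truncation=True)
    logits = session.run(
        None,
        {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]},
    )[0]
    # Assumed label order: index 1 = "complete utterance" -- verify
    # against the model card before relying on this.
    return int(np.argmax(logits, axis=-1)[0]) == 1


def load_pipeline(model_path="model.onnx",
                  repo="videosdk-live/Namo-Turn-Detector-v1-Multilingual"):
    """Load the ONNX session and tokenizer.

    The repo id is an assumption based on the collection name;
    check the HuggingFace collection for the actual path.
    """
    import onnxruntime as ort
    from transformers import AutoTokenizer

    return ort.InferenceSession(model_path), AutoTokenizer.from_pretrained(repo)
```

With a loaded pipeline, `detect_turn("Could you book a table for", session, tokenizer)` would return `False` whenever the model judges the utterance incomplete.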
For seamless integration into your voice agent pipeline:
📚 Complete Integration Guide: VideoSDK Agents Documentation
Each model includes Colab notebooks for training and testing:
- Training Notebooks: Fine-tune models on your own datasets
- Testing Notebooks: Evaluate model performance on custom data
Visit the individual model pages for notebook links.
This project is licensed under the Apache License 2.0.
If you use Namo-v1 in your research or project, please cite it.