All the examples I have found are highly unoptimized. For example:

- Modal Labs uses FastAPI: https://modal.com/docs/examples/chatterbox_tts
- BentoML also uses a FastAPI-like service: https://www.bentoml.com/blog/deploying-a-text-to-speech-application-with-bentoml
- Even Chatterbox TTS itself ships only a very naive example: https://github.com/resemble-ai/chatterbox

And the Triton Server docs don’t have a TTS example at all.
I am 100% certain that a highly optimized variant can be written with Triton Server, utilizing model instance concurrency and dynamic batching. Roughly what I have in mind is sketched below.
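For concreteness, here is a minimal sketch using Triton's Python backend. The tensor names (`TEXT`, `AUDIO`), the config values, and the Chatterbox calls are my assumptions (based on the repo's README), not a tested deployment:

```python
# Rough sketch of a Triton Python-backend model.py for Chatterbox.
# A companion config.pbtxt would enable the two features I mean:
#   max_batch_size: 8
#   dynamic_batching { max_queue_delay_microseconds: 5000 }   # batching
#   instance_group [{ count: 2, kind: KIND_GPU }]             # model concurrency
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # Loaded once per model instance; with instance_group count: 2,
        # Triton runs two copies of this model concurrently.
        from chatterbox.tts import ChatterboxTTS  # import path from the repo README
        self.model = ChatterboxTTS.from_pretrained(device="cuda")

    def execute(self, requests):
        # With dynamic batching on, Triton can deliver several queued
        # requests in one call; each still gets its own response.
        responses = []
        for request in requests:
            raw = pb_utils.get_input_tensor_by_name(request, "TEXT").as_numpy()
            text = raw.reshape(-1)[0]
            text = text.decode("utf-8") if isinstance(text, bytes) else str(text)

            # Assuming generate() returns a (1, num_samples) torch tensor,
            # as in the Chatterbox README example. A real implementation
            # would batch the forward pass across requests; the per-request
            # loop here just keeps the sketch simple.
            wav = self.model.generate(text)
            audio = wav.squeeze(0).cpu().numpy().astype(np.float32)

            out = pb_utils.Tensor("AUDIO", audio[np.newaxis, :])  # add batch dim
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```

The point being that Triton's scheduler, not the web framework, would own the queueing, batching, and instance scheduling; the client side would just be tritonclient calls.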
If someone has implemented a TTS service with Triton Server, or knows a better inference server to deploy with, please help me out here. I don’t want to reinvent the wheel.