I am working on a product that uses Azure in the back-end for LLMs and audio models. Just as I test the code before every release, every time I add or update something in the system prompts, whether for calibration or new features, I also test the conversational flow.
What I mean by this is: I have a fixed set of conversations that I run with temperature 0 to get answers as close to deterministic as possible. The interesting thing is that I have been working on this product for over six months, and I can watch the very same LLM get worse and worse. I send the very same messages, and the JSON responses I receive get less and less accurate.
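For context, a minimal sketch of the kind of regression check described above: run the same conversation at temperature 0, then score today's JSON response against a stored baseline field by field. The scoring function, field names, and baseline values here are all hypothetical illustrations, not the actual test system:

```python
import json

def field_accuracy(expected: dict, actual: dict) -> float:
    """Fraction of expected top-level fields the model reproduced exactly."""
    if not expected:
        return 1.0
    matches = sum(1 for key, value in expected.items() if actual.get(key) == value)
    return matches / len(expected)

# Baseline captured from an earlier run of the identical conversation
baseline = {"intent": "book_flight", "city": "Berlin", "passengers": 2}

# Hypothetical response from today's run of the same messages at temperature 0
todays_raw = '{"intent": "book_flight", "city": "Munich", "passengers": 2}'
todays = json.loads(todays_raw)

score = field_accuracy(baseline, todays)
print(f"accuracy: {score:.2f}")  # 2 of 3 fields match the baseline
```

Tracking a score like this per conversation over time is enough to show a trend line, which is how a drop in answer quality becomes measurable rather than anecdotal.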
In other words, you are lobotomizing models in the background: same model, same system prompt, same messages, but worse results.
I currently use gpt-4o-mini for language. Thankfully its speed is still there, but its answer accuracy has been horrible since the gpt-5 release. So I thought I would switch versions and checked out gpt-5-mini and nano. What do you know? gpt-5 is about as good as gpt-4o-mini used to be according to my tests, but insanely slow: it sometimes takes up to 20 seconds even with minimal reasoning (which still produces bad results).
So I am trying to understand what Microsoft's game is here. Presumably you want to onboard people to newer models so you can retire the old ones, but since the newer models are not good and are slow, you have to do this by somehow reducing the old models' quality? By serving smaller-parameter versions while still calling them by the same names?
This is a bad business strategy, and not everyone is working on note-taking apps and text summarization. Accuracy and consistency matter. Which brings me and my team to consider moving away from Azure, since it cannot provide a stable service.
I am glad I have proof of this thanks to the test system we put in place, and that I am not making this up. What you are doing is bad: either provide better products and ask people to switch, or keep things stable and backwards compatible.