China proves that open models are more effective than all the GPUs in the world

Comment OpenAI was supposed to make good on its name and release its first open-weights model since GPT-2 this week.

Unfortunately, what could have been the US's first half-decent open model of the year has been held up by a safety review, according to CEO Sam Altman. "While we trust the community will build great things with this model, once weights are out, they can't be pulled back. This is new for us, and we want to get it right," he wrote in a post on X.

The delay leaves the US in a rather awkward spot. Despite hundreds of billions of dollars poured into GPUs, the best open model America has managed so far this year is Meta's Llama 4, which enjoyed a less than stellar reception and was marred by controversy. Just this week, it was reported that Meta had apparently taken its two-trillion-parameter Behemoth out behind the barn after it failed to live up to expectations.

There have been a handful of other open model releases from US companies. Microsoft rolled out a version of Phi-4 14B, which was trained using reinforcement learning to enable reasoning functionality; IBM has released a handful of tiny LLMs focused on agentic workloads; and Google released its multimodal Gemma 3 family, which topped out at 27 billion parameters. But these models are small fry compared to Meta's 400-billion-parameter Llama 4 Maverick.

As it stands, much of the real progress in generative AI development among US companies this year has been locked away, accessible only through API calls to someone else's servers.

China continues its AI hot streak

But while US model builders continue to do their best work behind closed doors, China is doing it in the open. As Nvidia's CEO likes to point out, half of the world's AI researchers call China home, and it really shows.

In early 2025, DeepSeek, up to that point a relatively obscure AI dev spun out of Chinese quantitative hedge fund High-Flyer, became a household name following the release of its R1 model.

The 671-billion-parameter LLM featured a mixture-of-experts (MoE) architecture that activates only a fraction of its parameters for any given token, allowing it to run far faster and on fewer resources than even smaller dense LLMs like Llama 3.1 405B, while replicating the reasoning functionality of OpenAI's still fresh o1 model.
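For readers unfamiliar with the term, the rough idea behind MoE is that a small router sends each token to only a handful of "expert" sub-networks, so most of the model's weights sit idle for any given token; DeepSeek reports roughly 37 billion active parameters per token for R1. The toy sketch below illustrates top-k expert routing in Python. The layer sizes, expert count, and top_k value are made up for illustration and are not DeepSeek's actual architecture or configuration.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ToyMoELayer:
    """Toy mixture-of-experts layer: a router picks top_k experts per token,
    so only a small fraction of the layer's weights do work on each token.
    Illustrative only -- not DeepSeek's implementation."""

    def __init__(self, d_model=64, d_hidden=256, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router: one score per expert for each token.
        self.router = rng.standard_normal((d_model, n_experts)) * 0.02
        # Each expert is a small two-layer MLP.
        self.w1 = rng.standard_normal((n_experts, d_model, d_hidden)) * 0.02
        self.w2 = rng.standard_normal((n_experts, d_hidden, d_model)) * 0.02

    def forward(self, tokens):
        # tokens: (n_tokens, d_model)
        gate_probs = softmax(tokens @ self.router)          # (n_tokens, n_experts)
        # Keep only the top_k experts per token; the rest are skipped entirely.
        top_experts = np.argsort(-gate_probs, axis=-1)[:, :self.top_k]
        out = np.zeros_like(tokens)
        for t, token in enumerate(tokens):
            for e in top_experts[t]:
                h = np.maximum(token @ self.w1[e], 0.0)      # expert MLP with ReLU
                out[t] += gate_probs[t, e] * (h @ self.w2[e])
        return out

layer = ToyMoELayer()
x = np.random.default_rng(1).standard_normal((4, 64))
print(layer.forward(x).shape)  # (4, 64): each token only touched 2 of 8 experts

Scaled up, that is roughly why a sparse 671-billion-parameter model can be cheaper to serve than a 405-billion-parameter dense one: the total parameter count measures what must be stored, not what must be computed per token.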

More importantly, the model weights were released in the open, alongside detailed technical docs showing how they'd done it. And in what should have come as a surprise to no one, it was just a matter of weeks before we began to see Western devs replicate these processes to imbue their own models with reasoning capabilities.

Since then, Alibaba has rolled out a slew of new reasoning and MoE models including QwQ, Qwen3-235B-A22B, and Qwen3-30B-A3B.

In June, Shanghai-based MiniMax released its 456-billion-parameter reasoning model called M1 under a permissive Apache 2.0 software license. Notable features included a fairly large one-million-token context window and a new attention mechanism the dev claims helps it keep track of all those tokens.

That same month, Baidu open sourced its Ernie family of MoE models, which range in size from 47 billion parameters to 424 billion. Huawei has also open sourced its Pangu models trained on its in-house accelerators, but that release was almost immediately overshadowed by allegations of fraud.

That brings us to July, when Moonshot AI, another Chinese AI dev, lifted the curtain on Kimi K2, a one-trillion-parameter MoE model it claims bests even the West's most potent proprietary LLMs. Take those claims with a grain of salt, but the fact remains, the Chinese have developed a one-trillion-parameter open-weights model. The only US LLMs to come close today are all proprietary.

All of this, it should be remembered, was done in spite of Uncle Sam's crusade to deprive the Chinese of the tools necessary to effectively compete in the AI arena.

The year ain't over yet

This brings us back to OpenAI's promised open-weights model. Not much is known about it other than what AI hype-man Altman has shared on X, in public interviews, and in Congressional hearings.

Altman kicked the whole thing off in February when he asked his followers which they'd prefer for OpenAI's next open source project: an o3-mini-level model that'd run on GPUs, or the best smartphone-sized LLM it could muster. The o3-mini-level LLM won out.

Then in June, OpenAI pushed back the model's release for the first time, with Altman posting that the "research team did something unexpected and quite amazing, and we think it will be very very worth the wait but needs a bit longer."

Say what you will about Altman's penchant for hyperbole, but the fact remains that OpenAI has historically led on model development. Regardless of whether it'll live up to the hype, any new competition in the open model arena is welcome, particularly among US players.

Unfortunately, just as OpenAI prepares to release its first open model in six years, it's reported that Meta, under the direction of its pricey new superintelligence lab, may abandon its own commitment to open source in favor of a closed model.

xAI, by all appearances, seems to have already gone down this route with its Grok family of LLMs. Originally, the Elon Musk-backed startup planned to open source the weights of its last model when a new version was released. And while xAI did release Grok-1 upon Grok-2's debut, Grok-3 has been out since February, and its Hugging Face page is looking a little lonely.

Then again, who is going to want a model whose hobbies include cosplaying as Mecha-Hitler? Perhaps, in this rare instance, this is one best left closed. ®
