With the recent release of new frontier LLMs in 2025, three things became increasingly clear:
Code generation via LLMs is very valuable.
Coding benefits from improved long context.
Long context is still hard to get right.

It’s also not just about having an LLM with a very long nominal context: you can theoretically get that just by combining aggressive positional-embedding interpolation with a hybrid architecture that makes it possible (like Scout).
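To make the interpolation part concrete, here’s a minimal sketch in plain NumPy (the function names and the 8k→32k scale factor are my own illustration, not Scout’s actual implementation). RoPE rotates each pair of query/key dimensions by an angle proportional to the token’s position; positional interpolation squeezes positions so a longer sequence maps back into the range of positions the model saw during training.

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0, scale=1.0):
    """Rotation angles for RoPE. scale > 1 applies positional
    interpolation: positions are divided by `scale`, so a context
    `scale` times longer reuses the angle range seen in training."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions / scale, inv_freq)  # (seq_len, dim/2)

def apply_rope(x, scale=1.0):
    """Rotate each (even, odd) pair of feature dims of x (seq_len, dim)
    by its position-dependent angle."""
    seq_len, dim = x.shape
    theta = rope_angles(np.arange(seq_len), dim, scale=scale)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Trained at 8k tokens but running at 32k? Squeeze positions 4x so
# every rotation angle stays inside the trained range.
q = np.random.randn(32_768, 128)
q_rot = apply_rope(q, scale=32_768 / 8_192)
```

The catch is that squeezing positions also compresses the resolution between nearby tokens, which is part of why interpolation alone only gets you so far.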
There are multiple reasons why long context is hard:
It’s difficult to find quality, coherent long-context data that is actually relevant to users. The benchmarks on this front are still in their early days.
The quadratic time complexity of self-attention makes training on long inputs cumbersome. Linear attention mechanisms remove that bottleneck, but at a cost in quality (see the sketch below).
Training on shorter average input lengths and testing on longer ones causes a sharp performance decline in traditional RoPE-based systems. Positional interpolation (sketched above) helps, but there are limits.
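Here’s the attention trade-off from the second point, again as a rough sketch (the elu(x)+1 feature map follows the “Transformers are RNNs” linear-attention formulation; the rest is illustrative). Softmax attention materializes an n×n score matrix; linear attention reorders the computation around a small d×d state, so the cost grows linearly in sequence length, but dropping the softmax is exactly where the quality hit comes from.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Standard attention: the (n, n) score matrix makes it
    O(n^2) in time and memory."""
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized attention with feature map elu(x) + 1: computing
    k^T v first keeps everything O(n * d^2), linear in n."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))   # elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                                          # (d, d)
    z = q @ k.sum(axis=0)                                 # (n,) normalizer
    return (q @ kv) / (z[:, None] + eps)

n, d = 2048, 64
q, k, v = (np.random.randn(n, d) for _ in range(3))
out_quadratic = softmax_attention(q, k, v)  # n x n intermediate
out_linear = linear_attention(q, k, v)      # d x d intermediate
```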
More on this in my recent review of the subject here:
From RAG-based systems to coding in mature repositories, the clear business case for LLMs is the ability to ingest a large amount of context without breaking and still produce quality output. Even the best models at the moment struggle beyond a certain point.
From what I’ve seen and read, the field will most likely figure out the recipe for models with high-quality 1M–10M context length in 2025. Yes, Scout can theoretically already do 10M, but I would qualify that as low-quality context.
The solution will most likely involve heavy hybridization of the layers, both for positional embeddings (like Scout’s mix of RoPE and NoPE) and for attention (like MiniMax’s mix of regular and linear attention).
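To make “hybridization” concrete, here’s a toy layer schedule in that spirit (the pattern, names, and 1-in-4 ratio are my own illustration, not either model’s actual config): most layers use cheap linear attention with RoPE, and every few layers a full-attention NoPE layer handles global mixing.

```python
from dataclasses import dataclass

@dataclass
class LayerSpec:
    attention: str     # "linear" (cheap) or "full" (global, O(n^2))
    pos_encoding: str  # "rope" or "nope" (no positional encoding)

def hybrid_schedule(n_layers: int, full_every: int = 4) -> list[LayerSpec]:
    """Toy hybrid stack: every `full_every`-th layer is full attention
    with NoPE (global, length-agnostic mixing); the rest are linear
    attention with RoPE (cheap, position-aware processing)."""
    return [
        LayerSpec("full", "nope") if (i + 1) % full_every == 0
        else LayerSpec("linear", "rope")
        for i in range(n_layers)
    ]

for i, spec in enumerate(hybrid_schedule(12)):
    print(f"layer {i:2d}: {spec.attention:>6} attention, {spec.pos_encoding}")
```

The intuition is that the sparse full-attention layers give the model a global channel that doesn’t degrade with length, while the linear/RoPE layers keep the overall cost close to linear.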
It’s a hard problem, but after watching the full interview with Nikolay Savinov from DeepMind, I think the path is somewhat clear:
The emphasis on the solid inference work needed to make the model perform makes me think that Google might still be using standard (non-linear) attention, and that a breakthrough is currently being researched on that front to unlock a useful 10M context length.
In any case, I think this concerted effort to increase context length will have a tremendous impact on multiple AI use cases, like RAG-based systems, reasoning models, and coding agents. It will most likely fuel the next big phase of intelligence growth in LLMs, but it will for sure require new ideas.
Interested in AI engineering? Do check out the AI courses at Scrimba; they’re pretty solid for beginners!
Have a great week 👋