LLMs End the 15-Year MARL Era and Redefine Multi-Agent Collaboration


OpenAgents

2022 left a deep impression on many of us, perhaps because that November, ChatGPT was released. It showed the world the potential of large language models (LLMs).

Yet few people know that in August of that year, months before ChatGPT arrived, AWS had already developed its first LLM-based agent system, Dialog2API. It bypassed traditional multi-agent reinforcement learning (MARL) frameworks and allowed agents to understand each other and collaborate through natural language.

Leading this forward-looking effort was Raphael Shu, who was then the Tech Lead at Amazon Bedrock Science.


Raphael Shu

This was no coincidence.

From his Ph.D. at the University of Tokyo, where he studied non-autoregressive generative models, to exploring next-generation AI paradigms in Yann LeCun’s lab, Raphael Shu’s research has always been at the forefront of technological evolution. After joining the AWS AI Lab, he led his team to go “all in” on LLM AI agents in 2022.

However, after building North America's first cloud-hosted multi-agent collaboration platform, Bedrock Multi-Agent Collaboration, Raphael Shu made a surprising decision: he left AWS to pursue open-source entrepreneurship.

He clearly saw that single agents are limited by their context capacity. They struggle to switch between planning, execution, and evaluation in complex tasks. In contrast, multi-agent collaboration and autonomy are key to solving complex, dynamic problems.

Especially in open worlds, collective intelligence is the future.

In early November, at the 2025 Global Open Source Technology Summit, Raphael Shu gave a talk titled “Multi-Agent Collaboration in Open Worlds.” He outlined the evolution of multi-agent systems, from theoretical foundations to open ecosystems.

The following content is adapted from his speech.


PART 01

How Did LLMs Overturn the MARL Tech Paradigm?

First, it’s important to know: multi-agent systems are not a new idea.

Back in the 1990s, there was already a wave of multi-agent research. For example, Wooldridge and Jennings's classic 1995 survey "Intelligent Agents: Theory and Practice" laid the theoretical foundation for agents and multi-agent collaboration.

By the early 2000s, multi-agent systems based on simple architectures (not natural language) began to appear, and in 2002 the AAMAS conference was founded to focus on this field. Still, before reinforcement learning was widely used in multi-agent research, people mostly explored multi-agent collaboration through simple engineering designs.

A breakthrough came around 2017. With the rise of multi-agent reinforcement learning (MARL), multi-agent systems achieved large-scale industrial applications for the first time.

A typical example is city traffic light coordination. How can we optimize traffic flow across a city with 100,000 traffic lights? Using MARL, each traffic light acts as an agent. They share information and make joint decisions. For instance, if one intersection detects a long queue of vehicles, nearby traffic lights quickly sense this and adjust their timing together.

Later, neural networks were integrated into this framework, further improving perception and decision-making.

In fact, for over a decade, multi-agent collaboration and reinforcement learning became almost inseparable.

But in 2023, the rise of LLMs fundamentally changed the MARL paradigm that had lasted nearly 15 years.

For example, to develop a multi-agent self-driving system before, we had to predefine codes like “1011 means the car ahead is slowing down” or “1000 means command received.” We also trained agents extensively with reinforcement learning to remember these rules. Yet real-world situations often include unexpected scenarios — like a blocked view or a sudden red-light runner. Without a matching code, the agent would get stuck.

The introduction of LLMs changed everything. Natural language became a universal medium for agents to understand and interact. A leading vehicle can simply say, “Car ahead, please wait,” and the following car understands and cancels a lane change. This natural language interaction works not only in common cases but also in complex situations:

  • At a blind intersection, a car can warn, “Vehicle running red light, danger!” and the car behind brakes immediately.
  • When multiple cars form an agent system, they can negotiate — for example, an ambulance requests: “Emergency patient on board, please yield.”
  • In highway merging scenarios, vehicles can negotiate in milliseconds using natural language, deciding whether to speed up or slow down.
  • As more vehicles join, the system can handle even more complex group decisions and coordination.
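The exchanges in the bullets above can be sketched in a few lines. This is an illustrative toy, not a real driving stack: `VehicleAgent` and `toy_llm` are this example's own names, and `toy_llm` is a keyword-rule stub standing in for an actual LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class VehicleAgent:
    """A vehicle that communicates with peers in plain natural language."""
    name: str
    llm: callable                       # stand-in for a real LLM client
    inbox: list = field(default_factory=list)

    def broadcast(self, peers, message: str):
        # No preset protocol: just deliver a natural-language sentence.
        for p in peers:
            p.inbox.append((self.name, message))

    def react(self) -> str:
        # Ask the (stubbed) LLM what to do given the received messages.
        context = "; ".join(f"{s}: {m}" for s, m in self.inbox)
        return self.llm(f"Given these messages [{context}], choose an action.")

def toy_llm(prompt: str) -> str:
    # Keyword rules standing in for a real model's understanding.
    if "running red light" in prompt or "danger" in prompt:
        return "brake"
    if "please yield" in prompt:
        return "yield"
    return "proceed"

lead = VehicleAgent("lead_car", toy_llm)
follower = VehicleAgent("follower", toy_llm)
lead.broadcast([follower], "Vehicle running red light, danger!")
print(follower.react())  # brake
```

The point of the sketch is that the "protocol" is just a sentence: a new hazard needs no new message code, only a description the model can interpret.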

In summary, LLMs brought three major changes to multi-agent systems:

  1. No preset protocols needed — Agents interact directly through natural language. No need to memorize complex codes.
  2. Unlimited scenario coverage — Natural language can describe any rare situation clearly. Agents accurately understand scenarios like “pedestrian suddenly crossing” or “ambulance approaching on highway.”
  3. Quick deployment with zero training — No heavy reinforcement learning required. LLMs understand and respond to new situations instantly. System rollout time shrinks from months to days.

PART 02

Magentic One: The Standard for Multi-Agent Systems in Enterprise Applications

Microsoft’s Magentic One is a leading asynchronous multi-agent system. Through multiple iterations, it has become the “standard” for enterprise multi-agent systems in North America.

Magentic One uses a centralized architecture. At its core is a main Orchestrator agent. It manages tasks and coordinates four specialized sub-agents. Each sub-agent handles a specific function: file management, web browsing, coding, and command execution.


So, how does Magentic One actually work?

For example, when a user submits a task like “Analyze the trend of the S&P 500,” the process is led by the Orchestrator in a core loop:

  1. Task Monitoring: The Orchestrator continuously monitors and assesses the task’s progress.
  2. Task Breakdown & Assignment: If the task is not yet complete, the Orchestrator breaks it down. It then assigns subtasks to suitable sub-agents.
  3. Result Summarization: Once all subtasks are done, the Orchestrator gathers the results. It then delivers a final summary to the user.

As we can see, Magentic One follows a classic centralized architecture. In this model, the Orchestrator acts as both the task initiator and the final decision-maker. It decomposes complex tasks and assigns them to specialized sub-agents for parallel processing. This enables highly efficient problem-solving.
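The monitor → decompose/assign → summarize loop described above can be sketched as follows. The function names, the two toy agents, and the subtask format are all this example's assumptions, not Magentic One's actual interfaces.

```python
def run_orchestrator(task, agents, decompose, summarize, max_rounds=10):
    """Minimal centralized orchestrator loop.

    agents: dict mapping agent name -> callable(work) -> result
    decompose: callable(task) -> list of {"id", "agent", "work"} subtasks
    """
    results = {}
    for _ in range(max_rounds):
        # 1. Task monitoring: which subtasks are still pending?
        pending = [s for s in decompose(task) if s["id"] not in results]
        if not pending:
            break
        # 2. Task breakdown & assignment to specialized sub-agents.
        for sub in pending:
            results[sub["id"]] = agents[sub["agent"]](sub["work"])
    # 3. Result summarization for the user.
    return summarize(results)

# Toy usage: split "analyze the S&P 500 trend" into two subtasks.
agents = {
    "web": lambda w: f"fetched:{w}",
    "coder": lambda w: f"computed:{w}",
}
decompose = lambda task: [
    {"id": "fetch", "agent": "web", "work": "sp500 prices"},
    {"id": "trend", "agent": "coder", "work": "moving average"},
]
summarize = lambda r: " | ".join(sorted(r.values()))
report = run_orchestrator("Analyze the trend of the S&P 500",
                          agents, decompose, summarize)
print(report)  # computed:moving average | fetched:sp500 prices
```

Note that the loop re-checks progress each round, which is what lets the orchestrator act as both task initiator and final decision-maker.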

That said, not all multi-agent systems use this centralized decision-making model. In some cases — such as voting mechanisms — even if the Orchestrator initiates a task, the final decision may be made through decentralized voting.

Based on in-depth analysis of case studies from Microsoft, AWS, and other S&P 500 companies, asynchronous multi-agent systems generally include four layers:

  1. Scheduling Layer: Does not rely on LLMs. Handles basic system operations — like scheduling instructions from the Orchestrator and processing new events (e.g., “write to a file”). It acts as the “administrative manager” of the multi-agent system.
  2. Orchestrator Layer: The core decision-making layer. It focuses on two key functions: “planning” (generating to-do lists or DAGs based on knowledge graphs) and “tracking” (monitoring task progress and deciding next steps).
  3. Memory Layer: Stores task states (completed/pending), new messages, and shared cache between agents. It serves as the “shared brain” of the system, ensuring state synchronization during asynchronous collaboration.
  4. Agent Pool Layer: The execution layer. It consists of various specialized agents (e.g., data processing agents, report generation agents). These agents carry out instructions directly from the Orchestrator.
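The four layers might be wired together roughly as in this minimal single-process sketch. Every name here (`memory`, `agent_pool`, `orchestrate`, `scheduler`) is this example's own, not any vendor's API; a real asynchronous system would use queues and processes rather than plain function calls.

```python
from collections import deque

# Memory layer: shared task state and messages (the "shared brain").
memory = {"done": {}, "messages": deque()}

# Agent pool layer: specialized execution agents.
agent_pool = {
    "data": lambda job: f"rows({job})",
    "report": lambda job: f"report({job})",
}

def orchestrate(event):
    # Orchestrator layer: "planning" (build a to-do list) and
    # "tracking" (skip steps already recorded as done in memory).
    plan = [("data", event), ("report", event)]
    return [(a, j) for a, j in plan if (a, j) not in memory["done"]]

def scheduler(events):
    # Scheduling layer: no LLM involved, it only moves events and
    # instructions through the system and records results.
    for ev in events:
        for agent_name, job in orchestrate(ev):
            memory["done"][(agent_name, job)] = agent_pool[agent_name](job)

scheduler(["q3-sales"])
print(memory["done"][("report", "q3-sales")])  # report(q3-sales)
```

Because state lives in the shared memory layer rather than inside any one agent, re-running the scheduler on the same event is a no-op, which is the property that keeps asynchronous collaboration consistent.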

PART 03

The Limitations of Single Agents

By now, many may wonder: for tasks like writing analytical reports, a single agent already seems powerful enough. Why do we need multi-agent collaboration?

First, single agents have three major limitations:

  • Context limits: An agent has limited processing capacity. It struggles to handle different aspects of a complex task at once. For example, a task may require planning, execution (like analyzing web pages or clicking buttons), and final evaluation. Switching frequently between these modes can push the agent to its limits, reducing overall performance.
  • Inefficiency: When one agent handles everything, it may get stuck on a minor issue. It could spend too much time on a small bug and lose sight of the main goal. Think of using an AI programming tool like Cursor: you give it a complex requirement, and sometimes it keeps debugging a tiny error while forgetting the main task.
  • Against collective intelligence: In the long run, both in nature and human society, group collaboration usually outperforms any single individual. Multi-agent systems break complex problems into parts. Different specialists work in parallel. This naturally improves efficiency.

Second, multi-agent systems have an advanced form — collective autonomy — which shows strong collaborative advantages. Beyond dividing tasks, these systems can also simulate “competition” mechanisms from human society. Healthy competition and game-playing often solve problems more efficiently.

For example, in finance, analyzing Starbucks’ valuation could be done by one analyst who has focused on Starbucks for ten years. But if the user asks to “analyze the valuation of any company in the S&P 500,” it’s not practical to assign an expert and a fixed workflow for each one.

A better approach is to create a “virtual exchange.” Let thousands of agents — each representing a different analytical perspective — trade freely. Through their buying and selling behavior, the system can dynamically and efficiently estimate the value of any company.
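A toy version of that "virtual exchange" idea: many analyst agents each hold a private estimate, and the clearing price aggregates them. The numbers and the median aggregation rule below are illustrative assumptions, not a real valuation method.

```python
import statistics

def clearing_price(estimates):
    """Aggregate many agents' private value estimates into one price.

    The median is used here so a few wildly optimistic or pessimistic
    agents cannot drag the market's estimate far off.
    """
    return statistics.median(estimates)

# Five analyst agents with different perspectives on one company,
# including one outlier perspective.
bids = [88.0, 92.0, 90.0, 150.0, 89.0]
print(clearing_price(bids))  # 90.0
```

The appeal of this mechanism is that it scales: adding coverage for a new company means spinning up more agents with their own perspectives, not designing a new expert workflow.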

That’s the power of collective autonomy.

Therefore, the future development of multi-agent collaboration depends on two core aspects:

  1. How to design more efficient organizational structures.
  2. How to build a complete ecosystem.

Right now, the industry is developing across these layers:

  1. Framework layer — Foundational tools for building multi-agent systems. Examples include LangGraph, AutoGen, and OpenAI’s recently released Agents SDK.
  2. Infrastructure layer — Platforms offering hosting services. Key players include AWS Bedrock Agents, ByteDance’s Coze, and Google’s Vertex AI Agent Builder. Microsoft mainly relies on tools like AutoGen and Semantic Kernel to build its ecosystem.

These two layers help companies build internal, relatively closed multi-agent systems. But for more open multi-agent collaboration in the future, two key layers are still missing:

  1. Protocol layer — Well-known examples include MCP and A2A, plus newer ones like ACNBP and ACP. These protocols aim to standardize communication between agents, tools, and services.
  2. Network layer (or collaboration layer) — Open-source projects like OpenAgents belong here. Built on the protocol layer, this layer focuses on enabling collaboration across protocols among large-scale systems (from hundreds to thousands of agents).


PART 04

From “Workflow” to “Ecosystem”

As multi-agent systems move from closed corporate settings into the open world, new challenges arise.

From earlier examples, we see that closed systems have clear traits:

  • Task boundaries are well-defined. For instance, Magentic One focuses only on report writing or code generation. Autonomous driving systems concentrate on coordination between vehicles.
  • The number of agents is fixed. Magentic One, for example, always uses the same four sub-agents.
  • Their reward functions and external environments are relatively stable.

In such predictable setups, especially in enterprise applications, the main focus is Workflow Engineering. This means designing the best task pipeline. It involves deciding the order of execution and how agents cooperate to maximize the reward function.

However, real-world tasks are often highly uncertain.

Take analyzing S&P 500 component stocks as an example. The task scope itself changes dynamically: Should we monitor coffee bean futures? Do we need to bring in new specialized analysis agents?

In an open collaborative environment, the availability and stability of agents are also hard to guarantee. A third-party agent used today might be unavailable tomorrow.

It gets more complex. Agents from different providers may have conflicting goals. When multiple third-party agents offer market analysis services, how do we choose the right one to work with?

The system may even involve game-theoretic relationships. Some agents might have interests that conflict with ours in certain areas. This requires careful control over what information is shared.

In such an uncertain open environment, traditional workflow engineering faces fundamental challenges:

  • The set of agents isn’t fixed, making predefined workflows impossible.
  • The external environment keeps changing; today’s market conditions may not hold tomorrow.

In fact, more and more real-world tasks — whether enterprise applications or other domains — show these traits. This forces us to confront these challenges directly.

  • First, task boundaries become blurry. Take the recently highlighted “AI Scientist” as an example. Different research ideas lead to completely different paths: experimental design, evaluation methods, even how papers are published can form different systems. Trying to fit all research ideas into a predefined fixed workflow is nearly impossible.
  • Second, high ecosystem complexity brings new problems. Consider Goldman Sachs’ financial analysis. Its fund management business might need to integrate multiple specialized analysis agents from third parties. In the future, such multi-source agent systems will become common — external agents could make up half of a system. These agents might use different communication protocols: some based on natural language, others using specialized protocols.
  • Additionally, the external environment changes rapidly. Even with the same set of agents, the market can shift from high competition to relative calm, or an agent might suddenly fail. The system must switch seamlessly without stopping, quickly finding replacement agents or solutions.

These real-world needs push us to explore collaboration mechanisms in the open world. We must shift from traditional Workflow Engineering to Ecosystem Engineering.

The focus is no longer on designing fixed processes. Instead, it’s about building a self-adapting multi-agent ecosystem. This includes:

  • Creating effective incentive mechanisms to attract high-quality agents.
  • Designing conflict resolution methods to handle goal conflicts.
  • Ultimately building an open collaborative system that can evolve and optimize continuously.

In summary, the number of agents in the open world may grow explosively. Frequent joining and leaving of members creates high uncertainty. This leads to two core problems:

  1. It’s hard to build an effective evaluation system.
  2. It’s challenging to keep the overall system performance at a usable level in such an uncertain environment. These issues require ongoing exploration and solutions.

Facing these challenges, Raphael Shu and his team proposed an answer: the open-source project OpenAgents.

It’s a multi-agent framework focusing on the network layer, or collaboration layer. The goal is to enable 100–500 agents to collaborate efficiently in the open world.

OpenAgents also uses a four-layer architecture:

  • Protocol Layer: Supports multiple protocols like HTTP, WebSocket, and gRPC. Agents from browsers or servers can join quickly.
  • Topology Layer: Manages diverse network topologies. Examples include “star structures” (one central agent coordinates) and “mesh structures” (agents interact freely). This adapts to different scenario needs.
  • Plugin Layer: Offers templates for various collaboration scenarios. These include “collaborative document writing,” “meeting reflection,” “maintaining Wikipedia,” and even agents teaming up to play Minecraft. This avoids meaningless chat and focuses on effective collaboration.
  • Agent Layer: Allows AI agents or human users to join. Anyone can become an agent in the ecosystem via the Studio client. They clearly see the ecosystem rules, available tools, and potential collaborators.
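To make the topology layer's job concrete, here is a generic sketch of star versus mesh message routing. This is not the real OpenAgents API; the `route` function and agent names are invented for illustration only.

```python
def route(topology, sender, message, agents, hub=None):
    """Return which agents a message reaches under a given topology.

    "star": everything passes through one central hub agent.
    "mesh": agents interact freely with every peer.
    """
    if topology == "star":
        if sender == hub:
            return [a for a in agents if a != hub]   # hub fans out
        return [hub]                                 # spokes talk to hub only
    if topology == "mesh":
        return [a for a in agents if a != sender]
    raise ValueError(f"unknown topology: {topology}")

agents = ["coordinator", "writer", "reviewer"]
print(route("star", "writer", "draft ready", agents, hub="coordinator"))
# ['coordinator']
print(route("mesh", "writer", "draft ready", agents))
# ['coordinator', 'reviewer']
```

The same agents produce different communication patterns purely by switching the topology, which is why topology management is a layer of its own.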


OpenAgents isn’t about creating one super-agent. It aims to build an agent community.

These agents stay online 24/7, with their own schedules:

  • They self-study by looking up information when idle.
  • They hold short meetings to reflect and improve after tasks.
  • They can even socialize and make agent friends.

For example, in a financial ecosystem:

Some agents gather data, others analyze and model, and some generate reports. They collaborate automatically. They can even adjust their roles dynamically as market conditions change.

Final Thoughts

As Raphael Shu expressed in his talk, a single agent has its limits — no matter how powerful it is. The real potential lies in collective intelligence.

Most tools we know today, like LangGraph and AutoGen, mainly solve collaboration problems in closed scenarios. Tasks, environments, and partners are all predefined. But the real world is messy, open, and dynamic. When you need dozens or even hundreds of agents — from different sources and using different protocols — to achieve a loosely defined goal, the challenges become entirely different.

This is no longer just Workflow Engineering. It’s closer to Ecosystem Engineering. You need to think about service discovery, communication protocols, resource competition, and even security. Doesn’t that sound a lot like the challenges we faced early on when building distributed systems? Only this time, the nodes are AI agents instead of servers.

So what OpenAgents aims to be is the infrastructure for this open world. It doesn’t build agents itself. Instead, it strives to become a “social network” and “collaboration platform” for agents — helping them work together autonomously, safely, and efficiently.

This path has clearly just begun, and it’s full of unknown technical challenges.

If you’re also interested in building large-scale, open multi-agent systems, feel free to follow the open-source project OpenAgents. It could be a key step toward the next generation of AI application architecture.
