Recursive Multi-Agent Systems

Scaling Agent Collaboration through Latent-space Recursion

1 UIUC  ·  2 Stanford University  ·  3 NVIDIA  ·  4 MIT
  • +8.3% avg. accuracy gain across all benchmarks
  • 2.4× end-to-end speedup compared to text-based MAS
  • −75.6% token usage compared to text-based MAS
  • 5 supported collaboration styles across 9 benchmarks
Overview

Abstract

Recursive (looped) language models have recently emerged as a new scaling axis: they iteratively refine the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems and ask: can agent collaboration itself be scaled through recursion?

We introduce RecursiveMAS, a recursive multi-agent framework that casts the entire system as a unified latent-space recursive computation. RecursiveMAS connects heterogeneous agents into a collaboration loop through a lightweight RecursiveLink module, enabling in-distribution latent-thought generation and cross-agent latent-state transfer. To optimize the framework, we develop an inner–outer loop learning algorithm that iteratively co-optimizes the whole system via shared gradient-based credit assignment across recursion rounds.

Theoretical analyses of runtime complexity and learning dynamics establish that RecursiveMAS is more efficient than text-based MAS and maintains stable gradients during recursive training. Empirically, we instantiate RecursiveMAS under 4 representative collaboration patterns and evaluate across 9 benchmarks spanning mathematics, science, medicine, search, and code. Compared with advanced single-agent, multi-agent, and recursive-computation baselines, RecursiveMAS consistently delivers an average accuracy improvement of +8.3%, together with a 1.2×–2.4× speedup and a 34.6%–75.6% token reduction.

RecursiveMAS overview: scaling trend across recursion depths and generalization across collaboration patterns.

Performance landscape across training/inference recursion depths (top) — the lightweight RecursiveMAS with sub–1.5B agents shows a clean scaling trend as recursion deepens. Generalization across common collaboration patterns (bottom) — the scaled RecursiveMAS seamlessly adapts to diverse multi-agent system structures.

Architecture

RecursiveMAS at a Glance

RecursiveMAS treats the whole multi-agent system as a recursive computation graph. Each agent acts like a layer of a recursive language model — passing latent thoughts to the next, and looping the system's hidden stream across rounds.

Part 1

A Lightweight RecursiveLink

The RecursiveLink is a small two-layer residual module that transmits an agent's last-layer hidden states across two kinds of transitions.

Why residual? The residual branch preserves the original latent semantics, so RecursiveLink only needs to learn the distributional shift — making training more stable and efficient.
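As a concrete illustration, here is a minimal PyTorch sketch of what such a two-layer residual link could look like; the bottleneck width, activation, and interface are our assumptions, not the paper's exact implementation:

import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Two-layer residual adapter over last-layer hidden states (illustrative)."""
    def __init__(self, d_h, d_mid=None):
        super().__init__()
        d_mid = d_mid or d_h  # bottleneck width is an assumption
        self.net = nn.Sequential(
            nn.Linear(d_h, d_mid),
            nn.GELU(),  # activation choice is an assumption
            nn.Linear(d_mid, d_h),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # The residual branch passes the original latent through unchanged,
        # so the MLP only has to model the distributional shift.
        return h + self.net(h)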
Inner and outer RecursiveLink illustrated as 2-layer residual modules connecting latent states.

RecursiveLink. Inner link refines latent thoughts within an agent; outer link bridges hidden representations between heterogeneous agents.

Part 2

Chaining All Agents into a Recursive Loop

Overall architecture of RecursiveMAS. The inner RecursiveLink performs latent-thought generation; the outer RecursiveLink transfers those thoughts across heterogeneous agents, forming a recursive loop.

Overall Architecture. Each agent first uses the inner link for latent-thought generation, then transfers the generated information to the next agent through the outer link. After the last agent finishes, its latent thoughts are fed back to the first agent — closing a recursive loop within the multi-agent system.

Each agent in RecursiveMAS acts like one layer of a recursive language model — information flows within and across agents as the hidden stream of the system:

  • Inside an agent. The inner link folds each generated hidden state back as the next input, producing a sequence of latent thoughts entirely in continuous space.
  • Across agents. The outer link forwards those latent thoughts to the next agent, which conditions on them together with its own input context.
  • Closing the loop. The last agent's latent outputs feed back to the first agent, so each new round refines on top of the previous one. Only the final round decodes text — all intermediate rounds collaborate purely in latent space.
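To make this information flow concrete, the following sketch traces one pass through the system; the agent and link method names (`encode`, `decode`, `latent_prefix`) are hypothetical stand-ins for whatever interface the implementation exposes:

def recursive_forward(agents, inner_links, outer_links, x, n_rounds, m_steps):
    """Sketch of the RecursiveMAS loop: only the final round decodes text."""
    carry = None  # latent thoughts handed to the next agent / next round
    for _ in range(n_rounds):
        for i, agent in enumerate(agents):
            # Each agent conditions on its own context plus incoming latents.
            h = agent.encode(x, latent_prefix=carry)  # hypothetical API
            for _ in range(m_steps):
                h = inner_links[i](h)   # inner link: fold state back as next input
            carry = outer_links[i](h)   # outer link: bridge to the next agent
        # After the last agent, `carry` feeds the first agent in the next round.
    return agents[-1].decode(carry)     # decode to text once, at the very end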
Part 3

Inner–Outer Loop Training

Two-stage inner-outer loop training pipeline for RecursiveMAS.

Two-stage Training Pipeline. An inner loop first warm-starts each agent in parallel, aligning it with latent-thought generation; an outer loop then unrolls the full recursive computation and jointly optimizes all RecursiveLinks at the system level.

Inner Loop · Model-Level

For each agent $A_i$, we warm-start its inner link with a regression objective that pushes the generated latent thoughts toward the agent's own input embeddings of the ground-truth answer $y$:

$$\mathcal{L}_{\text{in}} = 1 - \cos\!\Big(\mathcal{R}_{\text{in}}(H),\; \mathrm{Emb}_{\theta_i}(y)\Big)$$
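In code, this warm-start is essentially a cosine regression; the mean-pooling used to align sequence lengths below is our assumption for the sketch:

import torch.nn.functional as F

def inner_loss(inner_link, H, y_ids, embedding):
    """L_in = 1 - cos(R_in(H), Emb(y)); shapes and pooling are illustrative."""
    latent = inner_link(H)       # generated latent thoughts, (batch, seq, d_h)
    target = embedding(y_ids)    # agent's own input embeddings of y, (batch, len_y, d_h)
    latent = latent.mean(dim=1)  # mean-pool to one vector per sample (assumption)
    target = target.mean(dim=1)
    return (1 - F.cosine_similarity(latent, target, dim=-1)).mean()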
Outer Loop · System-Level

Unroll the system for $n$ recursion rounds, then minimize a single cross-entropy on the final textual prediction. Gradients back-propagate along the full recursive trace, giving every outer link a shared global credit signal:

$$\mathcal{L}_{\text{out}} = \mathrm{CE}\!\left( \mathcal{S}^{(n)}\!\big(\mathcal{S}^{(n-1)}(\cdots \mathcal{S}^{(1)}(x))\big),\;y\right)$$
Lightweight by design. All base-LLM parameters are frozen — only the inner and outer links are trained — yielding ~13M trainable parameters (~0.31% of the full system) while the whole MAS still co-evolves through recursion.
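A system-level update step then looks like ordinary end-to-end training with the recursion unrolled; `system.unroll` below is a hypothetical wrapper around the `recursive_forward` sketch above that returns final-round logits:

import torch.nn.functional as F

def outer_step(system, link_optimizer, x, y):
    """One outer-loop update: unroll n rounds, cross-entropy on the final decode."""
    logits = system.unroll(x)  # hypothetical: runs all recursion rounds in latent space
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    link_optimizer.zero_grad()
    loss.backward()            # credit flows back along the full recursive trace
    link_optimizer.step()      # only the ~13M link parameters update; base LLMs stay frozen
    return loss.item()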
Why latent? Why recursion?

Theoretical Insights

Two analyses motivate why recursion in latent space, mediated by RecursiveLink, is preferable to text-mediated agent interaction.

Proposition 1 — Runtime Complexity

Text-based recursive MAS pays an expensive per-step decoding cost over the full vocabulary $|V|$, while RecursiveMAS replaces it with a much cheaper latent-space transformation:

$$\Theta\!\big(N(m|V|d_h + (t+m)d_h^2 + (t+m)^2 d_h)\big)$$
vs.
$$\Theta\!\big(N(m\,d_h^2 + (t+m)d_h^2 + (t+m)^2 d_h)\big)$$
Takeaway: Since $d_h \ll |V|$ in practice, RecursiveMAS removes the per-step vocabulary projection bottleneck across all $N$ agents and $m$ latent steps.
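To make the takeaway concrete, plug in illustrative magnitudes (ours, not the paper's): with $d_h \approx 2{,}048$ and $|V| \approx 128{,}000$, the per-step projection terms compare as
$$\frac{m\,|V|\,d_h}{m\,d_h^2} = \frac{|V|}{d_h} \approx 62,$$
so each latent step avoids a roughly 62× more expensive vocabulary projection.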

Theorem 1 — Gradient Stability

Under realistic assumptions, when token predictions are confident (entropy $\le \epsilon$), text-based recursive SFT suffers from vanishing gradients, while RecursiveLink maintains stable, near-constant gradients across the looped backpropagation:

$$\left\|\tfrac{\partial \mathcal{R}_{\text{text}}(h)}{\partial h}\right\|_2 \le O(\epsilon) \ll 1$$

$$\left\|\tfrac{\partial \mathcal{R}(h)}{\partial h}\right\|_2 \ge \Omega\!\left(1 - \sqrt{\tfrac{1}{d_h}\log\tfrac{1}{\delta}}\right)$$
Takeaway: Latent recursion preserves informative credit signals across rounds, enabling stable whole-system co-optimization.
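For intuition, instantiating the lower bound with illustrative values (natural log, $d_h = 2{,}048$, $\delta = 0.01$) gives
$$1 - \sqrt{\tfrac{1}{d_h}\log\tfrac{1}{\delta}} = 1 - \sqrt{\tfrac{\ln 100}{2048}} \approx 1 - 0.047 \approx 0.95,$$
i.e., the latent link's Jacobian norm stays near 1 while the text path's is bounded by $O(\epsilon)$.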
Plug-and-play

Four Collaboration Patterns

RecursiveMAS is structure-agnostic. We instantiate it under four representative patterns, each composed of off-the-shelf agents from diverse model families.

Sequential Style

Three complementary roles progressively decompose, judge, refine, and solve a problem.

Planner → Critic → Solver

Mixture Style

Domain specialists reason in parallel; a summarizer aggregates their latent outputs.

Math · Code · Science → Summarizer

Distillation Style

A larger expert pairs with a smaller learner to distill capability while keeping inference fast.

Expert → Learner

Deliberation Style

An inner-thinking reflector iteratively critiques a tool-calling agent (Python & search).

Reflector ⇄ Tool-Caller
Collaboration Pattern   Role                 Model
Sequential (Light)      Planner              Qwen3-1.7B
                        Critic               Llama3.2-1B-Instruct
                        Solver               Qwen2.5-Math-1.5B-Instruct
Sequential (Scaled)     Planner              Gemma3-4B-it
                        Critic               Llama3.2-3B-Instruct
                        Solver               Qwen3.5-4B
Mixture                 Code Specialist      Qwen2.5-Coder-3B-Instruct
                        Science Specialist   BioMistral-7B
                        Math Specialist      DeepSeek-R1-Distill-Qwen-1.5B
                        Summarizer           Qwen3.5-2B
Distillation            Expert               Qwen3.5-9B
                        Learner              Qwen3.5-4B
Deliberation            Reflector            Qwen3.5-4B
                        Tool-Caller          Qwen3.5-4B (Tool-Integration)
Main Results

Whole-System Performance

Under identical training budgets and matched MAS structure, RecursiveMAS consistently outperforms strong single-agent, recursive, and multi-agent baselines.

Method                      MATH500   AIME 2025   AIME 2026   GPQA-D   LiveCodeBench   MedQA

Single-Agent Fine-tuning Baselines
Single Agent (LoRA)         83.1      70.0        73.3        62.0     37.4            76.1
Single Agent (Full-SFT)     83.2      73.3        76.7        62.8     38.6            77.0

Multi-Agent Frameworks
Mixture-of-Agents (MoA)     79.8      60.0        63.3        47.6     27.0            57.5
TextGrad                    84.9      73.3        76.7        62.5     39.8            77.2

Recursion-based Methods
LoopLM                      84.6      66.7        63.3        48.1     24.9            56.4
Recursive-TextMAS           85.8      73.3        73.3        61.6     38.7            77.0
RecursiveMAS (ours)         88.0      86.7        86.7        66.2     42.9            79.3

RecursiveMAS is reported at recursion round $r{=}3$ and achieves the best result in every column across all categories.

+8.3%
Average accuracy gain

over the strongest baseline on each benchmark, under matched training budgets.

+18.1%
AIME 2025 reasoning

RecursiveMAS substantially closes the gap on logic-dense competition math.

+13.0%
Code Generation robustness

Held-out hard code generation benchmark; gains preserved over recursion and text-MAS baselines.

Scaling Behavior

Performance Scales with Recursion Depth

On the sequential-style MAS, RecursiveMAS continues to improve as recursion deepens, while Recursive-TextMAS plateaus or regresses — the gap widens with $r$.

Method              Metric    MATH500        AIME 2025      AIME 2026      GPQA-D         MedQA          Code Gen.      Improve
                              Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled

Recursive Round r = 1
Recursive-TextMAS   Acc.      71.9   84.2    24.0   71.3    16.7   76.7    28.1   61.5    29.0   76.1    30.7   38.5    Base
                    Time (s)  1368   2401    2380   8462    2216   9376    1056   2190    1555   1522    976    8867    Base
                    Token     1185   1471    2993   9397    2754   8854    2084   3693    2382   1427    1146   3154    Base
RecursiveMAS        Acc.      75.8   86.3    30.7   80.0    17.3   82.7    30.3   63.1    30.3   78.2    35.1   40.1    ↑ +3.4
                    Time (s)  825    1701    1829   7784    1788   8134    586    1965    1194   1348    449    7908    × 1.2
                    Token     523    816     1622   6338    1576   7021    829    2675    1369   964     577    2198    ↓ 34.6%

Recursive Round r = 2
Recursive-TextMAS   Acc.      72.5   84.4    23.3   70.7    10.0   77.3    28.7   59.1    28.3   76.1    30.0   38.0    Base
                    Time (s)  2204   3958    4247   14380   3960   14110   1825   4207    3097   2745    1847   14792   Base
                    Token     2117   2794    5318   16372   4982   16213   3708   6128    4436   2609    1998   5369    Base
RecursiveMAS        Acc.      76.6   87.1    33.3   86.0    18.7   84.0    32.3   64.6    31.2   78.3    36.9   41.3    ↑ +6.0
                    Time (s)  1096   1974    2367   8178    2263   8965    752    2342    1427   1664    627    8329    × 1.9
                    Token     495    953     1614   5314    1552   6657    813    2521    1383   1008    531    2020    ↓ 65.5%

Recursive Round r = 3
Recursive-TextMAS   Acc.      69.1   85.8    18.0   73.3    16.7   74.7    28.7   58.6    28.5   77.1    29.3   36.5    Base
                    Time (s)  2952   6010    6183   19304   5907   19678   3322   7537    4684   3922    2310   22036   Base
                    Token     3059   4100    8645   23651   7813   22915   5820   8091    6307   3731    2676   7078    Base
RecursiveMAS        Acc.      77.8   88.2    34.0   86.7    20.0   86.0    32.6   66.2    31.7   79.3    37.4   42.8    ↑ +7.2
                    Time (s)  1360   2320    2727   8981    2629   9623    861    2638    1704   1912    805    10186   × 2.4
                    Token     519    893     1586   5342    1537   6860    786    2524    1378   1056    595    2247    ↓ 75.6%
Performance heatmap of RecursiveMAS across training and inference recursion depths.

Train-time × Test-time recursion. Increasing inference depth keeps improving systems trained with fewer rounds, while deeper training shifts the entire performance frontier upward; the strongest results consistently appear in the upper-right region.

Efficiency

Faster & Cheaper as Recursion Deepens

Because most rounds happen entirely in latent space, RecursiveMAS avoids repeated text decoding. The advantage compounds with $r$.
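A simple cost model (our illustration, not the paper's) explains the compounding: if one decoded round costs $c_{\text{text}}$ and one latent round costs $c_{\text{lat}} \ll c_{\text{text}}$, then
$$T_{\text{text}}(r) \approx r\,c_{\text{text}}, \qquad T_{\text{ours}}(r) \approx (r-1)\,c_{\text{lat}} + c_{\text{text}},$$
so the speedup $T_{\text{text}}(r)/T_{\text{ours}}(r)$ grows with $r$, consistent with the measured 1.2× → 1.9× → 2.4×.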

Inference time speedup of RecursiveMAS over Recursive-TextMAS across recursion rounds.

End-to-end inference time speedup grows from 1.2× at $r{=}1$ to 1.9× at $r{=}2$ to 2.4× at $r{=}3$.

Token usage reduction of RecursiveMAS over Recursive-TextMAS across recursion rounds.

Token-usage reduction scales from 34.6% at $r{=}1$ to 65.5% at $r{=}2$ to 75.6% at $r{=}3$.

Training Method        Peak GPU Mem.   Trainable Params.   Estimated Cost   Avg. Accuracy
LoRA Training          21.67 GB        15.92M (0.37%)      $6.64            66.9
Full SFT               41.40 GB        4.21B (100%)        $9.67            68.6
RecursiveMAS (ours)    15.29 GB        13.12M (0.31%)      $4.27            74.9

With the lowest GPU memory, smallest trainable footprint, and lowest estimated cost, RecursiveMAS still attains the highest average accuracy across all downstream tasks.

Generalization

One Framework, Many Patterns

Beyond sequential MAS, RecursiveMAS extends to mixture, distillation, and deliberation styles with consistent gains over the strongest standalone agent in each setting.

RecursiveMAS generalization across mixture, deliberation, and distillation collaboration patterns.

Across all three additional patterns, RecursiveMAS improves over the strongest standalone agent. In Distillation Style, it boosts the smaller learner while preserving a ~1.5× speed advantage over the expert.

+6.2%
Mixture Style

Beats the strongest single domain specialist via cross-domain composition.

+4.8%
Deliberation Style

Recursive latent coordination remains effective with external tool use.

+8.0%
Distillation Style

Lifts the learner while keeping a 1.5× speed edge over the expert.

Case Study

Recursion Self-Corrects across Rounds

A common pattern across our case studies: an early round produces an incorrect answer; deeper rounds successfully recover the right one through iterative latent refinement.

MATH500 · Question

For how many positive integers $n > 1$ is $2^{24}$ a perfect $n^{\text{th}}$ power?

Round 1 Incorrect
// RecursiveMAS final output, r = 1
The model enumerates the factor pairs of 24 and, after excluding n = 1, reports six remaining cases; it accidentally drops the case n = 24, so its count is off by one — an early-stage counting slip.
final = 6
Round 2 Correct
// RecursiveMAS final output, r = 2
The latent loop reframes the problem: writing $2^{24}\!=\!2^{mn}$ shows perfect n-th powers correspond to divisors of 24. Excluding n = 1, seven valid divisors remain — the spurious factor-pair counting is gone.
final = 7
Round 3 Correct & Cleaner
// RecursiveMAS final output, r = 3
The argument is now compact: n must divide 24, exclude n = 1, and report the count. Latent recursion both fixes the answer and shortens the reasoning trace.
final = 7
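The corrected count is easy to verify by brute force; this check is ours, not part of the paper's trace:

# For which n > 1 is 2^24 a perfect n-th power?
x = 2 ** 24

def is_nth_power(x, n):
    k = round(x ** (1.0 / n))
    return any((k + d) ** n == x for d in (-1, 0, 1))  # guard float rounding

count = sum(is_nth_power(x, n) for n in range(2, 25))  # n > 24 cannot work for 2^24
print(count)  # 7, from n in {2, 3, 4, 6, 8, 12, 24}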

Compared with text-mediated baselines, RecursiveMAS reaches the right answer in one fewer external decoding cycle — intermediate rounds collaborate without ever emitting tokens.

Examples

RecursiveMAS Across Downstream Tasks

Four end-to-end traces from the paper's appendix — one per domain. Recursion rounds 1 and 2 happen entirely in latent space; only the final round emits text.

Cite

BibTeX

@misc{recursivemas,
      title={Recursive Multi-Agent Systems}, 
      author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
      year={2026},
      eprint={2604.25917},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.25917}, 
}