Recursive Multi-Agent Systems

Scaling Agent Collaboration through Latent-space Recursion

1 UIUC  ·  2 Stanford University  ·  3 NVIDIA  ·  4 MIT
  • +8.3% avg. accuracy gain across all benchmarks
  • 2.4× end-to-end speedup compared to text-based MAS
  • −75.6% token usage compared to text-based MAS
  • 5 supported collaboration styles across 9 benchmarks
Overview

Abstract

Recursive (looped) language models have recently emerged as a new scaling axis: they iteratively refine the same model computation over latent states to deepen reasoning. We extend this scaling principle from a single model to multi-agent systems and ask: can agent collaboration itself be scaled through recursion?

We introduce RecursiveMAS, a recursive multi-agent framework that casts the entire system as a unified latent-space recursive computation. RecursiveMAS connects heterogeneous agents into a collaboration loop through a lightweight RecursiveLink module, enabling in-distribution latent-thought generation and cross-agent latent-state transfer. To optimize the framework, we develop an inner–outer loop learning algorithm that iteratively co-optimizes the whole system via shared gradient-based credit assignment across recursion rounds.

Theoretical analyses of runtime complexity and learning dynamics establish that RecursiveMAS is more efficient than text-based MAS and maintains stable gradients during recursive training. Empirically, we instantiate RecursiveMAS under 4 representative collaboration patterns and evaluate across 9 benchmarks spanning mathematics, science, medicine, search, and code. Compared with advanced single-agent, multi-agent, and recursive-computation baselines, RecursiveMAS consistently delivers an average accuracy improvement of +8.3%, together with a 1.2×–2.4× speedup and a 34.6%–75.6% token reduction.

RecursiveMAS overview: scaling trend across recursion depths and generalization across collaboration patterns.

Performance landscape across training/inference recursion depths (top) — the lightweight RecursiveMAS with sub–1.5B agents shows a clean scaling trend as recursion deepens. Generalization across common collaboration patterns (bottom) — the scaled RecursiveMAS seamlessly adapts to diverse multi-agent system structures.

Architecture

RecursiveMAS at a Glance

RecursiveMAS treats the whole multi-agent system as a recursive computation graph. Each agent acts like a layer of a recursive language model — passing latent thoughts to the next, and looping the system's hidden stream across rounds.

Part 1

A Lightweight RecursiveLink

The RecursiveLink is a small two-layer residual module that transmits an agent's last-layer hidden states across two kinds of transitions.

Why residual? The residual branch preserves the original latent semantics, so RecursiveLink only needs to learn the distributional shift — making training more stable and efficient.
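As a concrete illustration, here is a minimal PyTorch sketch of what such a two-layer residual link could look like; the bottleneck width, activation, and interface are our assumptions, not the paper's exact implementation:

import torch
import torch.nn as nn

class RecursiveLink(nn.Module):
    """Two-layer residual adapter over last-layer hidden states (illustrative)."""
    def __init__(self, d_h, d_mid=None):
        super().__init__()
        d_mid = d_mid or d_h  # bottleneck width is an assumption
        self.net = nn.Sequential(
            nn.Linear(d_h, d_mid),
            nn.GELU(),  # activation choice is an assumption
            nn.Linear(d_mid, d_h),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # The residual branch passes the original latent through unchanged,
        # so the MLP only has to model the distributional shift.
        return h + self.net(h)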
Inner and outer RecursiveLink illustrated as 2-layer residual modules connecting latent states.

RecursiveLink. Inner link refines latent thoughts within an agent; outer link bridges hidden representations between heterogeneous agents.

Part 2

Chaining All Agents into a Recursive Loop

Overall architecture of RecursiveMAS. The inner RecursiveLink performs latent-thought generation; the outer RecursiveLink transfers those thoughts across heterogeneous agents, forming a recursive loop.

Overall Architecture. Each agent first uses the inner link for latent-thought generation, then transfers the generated information to the next agent through the outer link. After the last agent finishes, its latent thoughts are fed back to the first agent — closing a recursive loop within the multi-agent system.

Each agent in RecursiveMAS acts like one layer of a recursive language model — information flows within and across agents as the hidden stream of the system:

  • Inside an agent. The inner link folds each generated hidden state back as the next input, producing a sequence of latent thoughts entirely in continuous space.
  • Across agents. The outer link forwards those latent thoughts to the next agent, which conditions on them together with its own input context.
  • Closing the loop. The last agent's latent outputs feed back to the first agent, so each new round refines on top of the previous one. Only the final round decodes text — all intermediate rounds collaborate purely in latent space.
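To make this information flow concrete, the following sketch traces one pass through the system; the agent and link method names (`encode`, `decode`, `latent_prefix`) are hypothetical stand-ins for whatever interface the implementation exposes:

def recursive_forward(agents, inner_links, outer_links, x, n_rounds, m_steps):
    """Sketch of the RecursiveMAS loop: only the final round decodes text."""
    carry = None  # latent thoughts handed to the next agent / next round
    for _ in range(n_rounds):
        for i, agent in enumerate(agents):
            # Each agent conditions on its own context plus incoming latents.
            h = agent.encode(x, latent_prefix=carry)  # hypothetical API
            for _ in range(m_steps):
                h = inner_links[i](h)   # inner link: fold state back as next input
            carry = outer_links[i](h)   # outer link: bridge to the next agent
        # After the last agent, `carry` feeds the first agent in the next round.
    return agents[-1].decode(carry)     # decode to text once, at the very end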
Part 3

Inner–Outer Loop Training

Two-stage inner-outer loop training pipeline for RecursiveMAS.

Two-stage Training Pipeline. An inner loop first warm-starts each agent in parallel, aligning it with latent-thought generation; an outer loop then unrolls the full recursive computation and jointly optimizes all RecursiveLinks at the system level.

Inner Loop · Model-Level

For each agent $A_i$, we warm-start its inner link with a regression objective that pushes the generated latent thoughts toward the agent's own input embeddings of the ground-truth answer $y$:

$$\mathcal{L}_{\text{in}} = 1 - \cos\!\Big(\mathcal{R}_{\text{in}}(H),\; \mathrm{Emb}_{\theta_i}(y)\Big)$$
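In code, this warm-start is essentially a cosine regression; the mean-pooling used to align sequence lengths below is our assumption for the sketch:

import torch.nn.functional as F

def inner_loss(inner_link, H, y_ids, embedding):
    """L_in = 1 - cos(R_in(H), Emb(y)); shapes and pooling are illustrative."""
    latent = inner_link(H)       # generated latent thoughts, (batch, seq, d_h)
    target = embedding(y_ids)    # agent's own input embeddings of y, (batch, len_y, d_h)
    latent = latent.mean(dim=1)  # mean-pool to one vector per sample (assumption)
    target = target.mean(dim=1)
    return (1 - F.cosine_similarity(latent, target, dim=-1)).mean()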
Outer Loop · System-Level

Unroll the system for $n$ recursion rounds, then minimize a single cross-entropy on the final textual prediction. Gradients back-propagate along the full recursive trace, giving every outer link a shared global credit signal:

$$\mathcal{L}_{\text{out}} = \mathrm{CE}\!\left( \mathcal{S}^{(n)}\!\big(\mathcal{S}^{(n-1)}(\cdots \mathcal{S}^{(1)}(x))\big),\;y\right)$$
Lightweight by design. All base-LLM parameters are frozen — only the inner and outer links are trained — yielding ~13M trainable parameters (~0.31% of the full system) while the whole MAS still co-evolves through recursion.
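A system-level update step then looks like ordinary end-to-end training with the recursion unrolled; `system.unroll` below is a hypothetical wrapper around the `recursive_forward` sketch above that returns final-round logits:

import torch.nn.functional as F

def outer_step(system, link_optimizer, x, y):
    """One outer-loop update: unroll n rounds, cross-entropy on the final decode."""
    logits = system.unroll(x)  # hypothetical: runs all recursion rounds in latent space
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
    link_optimizer.zero_grad()
    loss.backward()            # credit flows back along the full recursive trace
    link_optimizer.step()      # only the ~13M link parameters update; base LLMs stay frozen
    return loss.item()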
Why latent? Why recursion?

Theoretical Insights

Two analyses motivate why recursion in latent space, mediated by RecursiveLink, is preferable to text-mediated agent interaction.

Proposition 1 — Runtime Complexity

Text-based recursive MAS pays an expensive per-step decoding cost over the full vocabulary $|V|$, while RecursiveMAS replaces it with a much cheaper latent-space transformation:

$$\Theta\!\big(N(m|V|d_h + (t+m)d_h^2 + (t+m)^2 d_h)\big)$$
vs.
$$\Theta\!\big(N(m\,d_h^2 + (t+m)d_h^2 + (t+m)^2 d_h)\big)$$
Takeaway: Since $d_h \ll |V|$ in practice, RecursiveMAS removes the per-step vocabulary projection bottleneck across all $N$ agents and $m$ latent steps.
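To make the takeaway concrete, plug in illustrative magnitudes (ours, not the paper's): with $d_h \approx 2{,}048$ and $|V| \approx 128{,}000$, the per-step projection terms compare as
$$\frac{m\,|V|\,d_h}{m\,d_h^2} = \frac{|V|}{d_h} \approx 62,$$
so each latent step avoids a roughly 62× more expensive vocabulary projection.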

Theorem 1 — Gradient Stability

Under realistic assumptions, when token predictions are confident (entropy $\le \epsilon$), text-based recursive SFT suffers from vanishing gradients, while RecursiveLink maintains stable, near-constant gradients across the looped backpropagation:

$$\left\|\tfrac{\partial \mathcal{R}_{\text{text}}(h)}{\partial h}\right\|_2 \le O(\epsilon) \ll 1$$

$$\left\|\tfrac{\partial \mathcal{R}(h)}{\partial h}\right\|_2 \ge \Omega\!\left(1 - \sqrt{\tfrac{1}{d_h}\log\tfrac{1}{\delta}}\right)$$
Takeaway: Latent recursion preserves informative credit signals across rounds, enabling stable whole-system co-optimization.
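For intuition, instantiating the lower bound with illustrative values (natural log, $d_h = 2{,}048$, $\delta = 0.01$) gives
$$1 - \sqrt{\tfrac{1}{d_h}\log\tfrac{1}{\delta}} = 1 - \sqrt{\tfrac{\ln 100}{2048}} \approx 1 - 0.047 \approx 0.95,$$
i.e., the latent link's Jacobian norm stays near 1 while the text path's is bounded by $O(\epsilon)$.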
Plug-and-play

Four Collaboration Patterns

RecursiveMAS is structure-agnostic. We instantiate it under four representative patterns, each composed of off-the-shelf agents from diverse model families.

Sequential Style

Three complementary roles progressively decompose, judge, refine, and solve a problem.

Planner → Critic → Solver

Mixture Style

Domain specialists reason in parallel; a summarizer aggregates their latent outputs.

Math · Code · Science → Summarizer

Distillation Style

A larger expert pairs with a smaller learner to distill capability while keeping inference fast.

Expert → Learner

Deliberation Style

An inner-thinking reflector iteratively critiques a tool-calling agent (Python & search).

Reflector ⇄ Tool-Caller
Collaboration Pattern   Role                 Model
Sequential (Light)      Planner              Qwen3-1.7B
                        Critic               Llama3.2-1B-Instruct
                        Solver               Qwen2.5-Math-1.5B-Instruct
Sequential (Scaled)     Planner              Gemma3-4B-it
                        Critic               Llama3.2-3B-Instruct
                        Solver               Qwen3.5-4B
Mixture                 Code Specialist      Qwen2.5-Coder-3B-Instruct
                        Science Specialist   BioMistral-7B
                        Math Specialist      DeepSeek-R1-Distill-Qwen-1.5B
                        Summarizer           Qwen3.5-2B
Distillation            Expert               Qwen3.5-9B
                        Learner              Qwen3.5-4B
Deliberation            Reflector            Qwen3.5-4B
                        Tool-Caller          Qwen3.5-4B (Tool-Integration)
Main Results

Whole-System Performance

Under identical training budgets and matched MAS structure, RecursiveMAS consistently outperforms strong single-agent, recursive, and multi-agent baselines.

Method                      MATH500   AIME 2025   AIME 2026   GPQA-D   LiveCodeBench   MedQA

Single-Agent Fine-tuning Baselines
Single Agent (LoRA)         83.1      70.0        73.3        62.0     37.4            76.1
Single Agent (Full-SFT)     83.2      73.3        76.7        62.8     38.6            77.0

Multi-Agent Frameworks
Mixture-of-Agents (MoA)     79.8      60.0        63.3        47.6     27.0            57.5
TextGrad                    84.9      73.3        76.7        62.5     39.8            77.2

Recursion-based Methods
LoopLM                      84.6      66.7        63.3        48.1     24.9            56.4
Recursive-TextMAS           85.8      73.3        73.3        61.6     38.7            77.0
RecursiveMAS (ours)         88.0      86.7        86.7        66.2     42.9            79.3

RecursiveMAS is reported at recursion round $r{=}3$ and achieves the best result in every column across all categories.

+8.3%
Average accuracy gain

over the strongest baseline on each benchmark, under matched training budgets.

+18.1%
AIME 2025 reasoning

RecursiveMAS substantially closes the gap on logic-dense competition math.

+13.0%
Code Generation robustness

Held-out hard code generation benchmark; gains preserved over recursion and text-MAS baselines.

Scaling Behavior

Performance Scales with Recursion Depth

On the sequential-style MAS, RecursiveMAS continues to improve as recursion deepens, while Recursive-TextMAS plateaus or regresses — the gap widens with $r$.

Method              Metric    MATH500        AIME 2025      AIME 2026      GPQA-D         MedQA          Code Gen.      Improve
                              Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled  Light  Scaled

Recursive Round r = 1
Recursive-TextMAS   Acc.      71.9   84.2    24.0   71.3    16.7   76.7    28.1   61.5    29.0   76.1    30.7   38.5    Base
                    Time (s)  1368   2401    2380   8462    2216   9376    1056   2190    1555   1522    976    8867    Base
                    Token     1185   1471    2993   9397    2754   8854    2084   3693    2382   1427    1146   3154    Base
RecursiveMAS        Acc.      75.8   86.3    30.7   80.0    17.3   82.7    30.3   63.1    30.3   78.2    35.1   40.1    ↑ +3.4
                    Time (s)  825    1701    1829   7784    1788   8134    586    1965    1194   1348    449    7908    × 1.2
                    Token     523    816     1622   6338    1576   7021    829    2675    1369   964     577    2198    ↓ 34.6%

Recursive Round r = 2
Recursive-TextMAS   Acc.      72.5   84.4    23.3   70.7    10.0   77.3    28.7   59.1    28.3   76.1    30.0   38.0    Base
                    Time (s)  2204   3958    4247   14380   3960   14110   1825   4207    3097   2745    1847   14792   Base
                    Token     2117   2794    5318   16372   4982   16213   3708   6128    4436   2609    1998   5369    Base
RecursiveMAS        Acc.      76.6   87.1    33.3   86.0    18.7   84.0    32.3   64.6    31.2   78.3    36.9   41.3    ↑ +6.0
                    Time (s)  1096   1974    2367   8178    2263   8965    752    2342    1427   1664    627    8329    × 1.9
                    Token     495    953     1614   5314    1552   6657    813    2521    1383   1008    531    2020    ↓ 65.5%

Recursive Round r = 3
Recursive-TextMAS   Acc.      69.1   85.8    18.0   73.3    16.7   74.7    28.7   58.6    28.5   77.1    29.3   36.5    Base
                    Time (s)  2952   6010    6183   19304   5907   19678   3322   7537    4684   3922    2310   22036   Base
                    Token     3059   4100    8645   23651   7813   22915   5820   8091    6307   3731    2676   7078    Base
RecursiveMAS        Acc.      77.8   88.2    34.0   86.7    20.0   86.0    32.6   66.2    31.7   79.3    37.4   42.8    ↑ +7.2
                    Time (s)  1360   2320    2727   8981    2629   9623    861    2638    1704   1912    805    10186   × 2.4
                    Token     519    893     1586   5342    1537   6860    786    2524    1378   1056    595    2247    ↓ 75.6%
Performance heatmap of RecursiveMAS across training and inference recursion depths.

Train-time × Test-time recursion. Increasing inference depth keeps improving systems trained with fewer rounds, while deeper training shifts the entire performance frontier upward; the strongest results consistently appear in the upper-right region.

Efficiency

Faster & Cheaper as Recursion Deepens

Because most rounds happen entirely in latent space, RecursiveMAS avoids repeated text decoding. The advantage compounds with $r$.
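A simple cost model (our illustration, not the paper's) explains the compounding: if one decoded round costs $c_{\text{text}}$ and one latent round costs $c_{\text{lat}} \ll c_{\text{text}}$, then
$$T_{\text{text}}(r) \approx r\,c_{\text{text}}, \qquad T_{\text{ours}}(r) \approx (r-1)\,c_{\text{lat}} + c_{\text{text}},$$
so the speedup $T_{\text{text}}(r)/T_{\text{ours}}(r)$ grows with $r$, consistent with the measured 1.2× → 1.9× → 2.4×.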

Inference time speedup of RecursiveMAS over Recursive-TextMAS across recursion rounds.

End-to-end inference time speedup grows from 1.2× at $r{=}1$ to 1.9× at $r{=}2$ to 2.4× at $r{=}3$.

Token usage reduction of RecursiveMAS over Recursive-TextMAS across recursion rounds.

Token-usage reduction scales from 34.6% at $r{=}1$ to 65.5% at $r{=}2$ to 75.6% at $r{=}3$.

Training Method        Peak GPU Mem.   Trainable Params.   Estimated Cost   Avg. Accuracy
LoRA Training          21.67 GB        15.92M (0.37%)      $6.64            66.9
Full SFT               41.40 GB        4.21B (100%)        $9.67            68.6
RecursiveMAS (ours)    15.29 GB        13.12M (0.31%)      $4.27            74.9

With the lowest GPU memory, smallest trainable footprint, and lowest estimated cost, RecursiveMAS still attains the highest average accuracy across all downstream tasks.

Generalization

One Framework, Many Patterns

Beyond sequential MAS, RecursiveMAS extends to mixture, distillation, and deliberation styles with consistent gains over the strongest standalone agent in each setting.

RecursiveMAS generalization across mixture, deliberation, and distillation collaboration patterns.

Across all three additional patterns, RecursiveMAS improves over the strongest standalone agent. In Distillation Style, it boosts the smaller learner while preserving a ~1.5× speed advantage over the expert.

+6.2%
Mixture Style

Beats the strongest single domain specialist via cross-domain composition.

+4.8%
Deliberation Style

Recursive latent coordination remains effective with external tool use.

+8.0%
Distillation Style

Lifts the learner while keeping a 1.5× speed edge over the expert.

Case Study

Recursion Self-Corrects across Rounds

A common pattern across our case studies: an early round produces an incorrect answer; deeper rounds successfully recover the right one through iterative latent refinement.

MATH500 · Question

For how many positive integers $n > 1$ is $2^{24}$ a perfect $n^{\text{th}}$ power?

Round 1 Incorrect
// RecursiveMAS final output, r = 1
The model enumerates the factor pairs of 24 and, after excluding n = 1, reports six remaining cases; it accidentally drops the case n = 24, so its count is off by one — an early-stage counting slip.
final = 6
Round 2 Correct
// RecursiveMAS final output, r = 2
The latent loop reframes the problem: writing $2^{24}\!=\!2^{mn}$ shows perfect n-th powers correspond to divisors of 24. Excluding n = 1, seven valid divisors remain — the spurious factor-pair counting is gone.
final = 7
Round 3 Correct & Cleaner
// RecursiveMAS final output, r = 3
The argument is now compact: n must divide 24, exclude n = 1, and report the count. Latent recursion both fixes the answer and shortens the reasoning trace.
final = 7
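The corrected count is easy to verify by brute force; this check is ours, not part of the paper's trace:

# For which n > 1 is 2^24 a perfect n-th power?
x = 2 ** 24

def is_nth_power(x, n):
    k = round(x ** (1.0 / n))
    return any((k + d) ** n == x for d in (-1, 0, 1))  # guard float rounding

count = sum(is_nth_power(x, n) for n in range(2, 25))  # n > 24 cannot work for 2^24
print(count)  # 7, from n in {2, 3, 4, 6, 8, 12, 24}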

Compared with text-mediated baselines, RecursiveMAS reaches the right answer in one fewer external decoding cycle — intermediate rounds collaborate without ever emitting tokens.

Examples

RecursiveMAS Across Downstream Tasks

Four end-to-end traces from the paper's appendix — one per domain. Recursion rounds 1 and 2 happen entirely in latent space; only the final round emits text.

Cite

BibTeX

@misc{recursivemas,
      title={Recursive Multi-Agent Systems}, 
      author={Xiyuan Yang and Jiaru Zou and Rui Pan and Ruizhong Qiu and Pan Lu and Shizhe Diao and Jindong Jiang and Hanghang Tong and Tong Zhang and Markus J. Buehler and Jingrui He and James Zou},
      year={2026},
      eprint={2604.25917},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2604.25917}, 
}