Recursive Multi-Agent Systems: Summary

Summary (Overview)

  • Core Contribution: Introduces RecursiveMAS, a novel framework that extends the principle of recursive computation from a single language model to an entire multi-agent system (MAS). It enables heterogeneous agents to collaborate and refine their reasoning iteratively within a shared latent space.
  • Key Mechanism: Agents are connected via a lightweight RecursiveLink module (with inner and outer variants). The inner link enables latent thoughts generation within an agent, and the outer link facilitates cross-agent latent state transfer, forming a system-wide recursive loop.
  • Optimization Method: Proposes an inner-outer loop learning algorithm that co-optimizes the entire system by training only the RecursiveLink modules. This allows for shared gradient-based credit assignment across recursion rounds, enabling efficient whole-system evolution.
  • Empirical Performance: Evaluated across 9 benchmarks (mathematics, science, medicine, code, search), RecursiveMAS consistently outperforms advanced single/multi-agent and recursive baselines, achieving an average accuracy improvement of 8.3%, alongside significant inference speedup (1.2×–2.4×) and token usage reduction (34.6%–75.6%).
  • Theoretical Foundation: Provides analyses on runtime complexity and learning dynamics, establishing that latent-space collaboration in RecursiveMAS is more computationally efficient and maintains stable gradient flow during training compared to text-based interaction.

Introduction and Theoretical Foundation

Traditional single language models often struggle with complex tasks due to limited capacity or inefficient solution space exploration. Multi-agent systems (MAS) address this by orchestrating collaboration among specialized agents. However, improving such systems is challenging: prompt-based adaptation doesn't improve agents internally, while training all agent parameters is non-trivial, and text-based sequential interactions introduce latency.

This paper recasts agent collaboration through the lens of recursive language models (RLMs), where shared layers are iteratively applied within a continuous latent space to deepen reasoning. The authors ask: "Can agent collaboration itself be scaled through recursion?"

RecursiveMAS answers affirmatively by treating the entire MAS as a unified recursive computation. Each agent acts like an RLM layer, iteratively passing and refining latent representations in a loop. The framework is optimized via lightweight RecursiveLink modules, avoiding the need to update all model parameters. Theoretical justifications show that latent-space collaboration avoids the expensive per-step vocabulary decoding of text-based systems and maintains stable gradients during training, making system-wide co-optimization more effective.

Key Definitions:

  • Auto-regressive Generation in Latent Space: Given a model $f_{\theta}(\cdot)$ and input embeddings $E = [e_1, \dots, e_t] \in \mathbb{R}^{t \times d_h}$, the next latent thought at step $t+1$ is generated as $h_{t+1} = f_{\theta}([E_{\leq t}; h_t])$. (1)
  • Recursive Computation: A recursive model reuses the same transformation stack $f_{\theta}$ for $n$ iterations: $H^{(0)} = E$, $H^{(r)} = f_{\theta}(H^{(r-1)})$, $r = 1, \dots, n$. (2)
  • Recursive Multi-Agent Evolution (Definition 2.1): The progressive refinement of the system's collective latent state $\mathcal{H} = \{H_1, \dots, H_N\}$ through iterative interaction, i.e., $\mathcal{S}^{(0)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(1)}} \mathcal{S}^{(1)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(2)}} \cdots \xrightarrow[\text{Evolve}]{\mathcal{H}^{(n)}} \mathcal{S}^{(n)}$.
  • Collaboration Patterns: The framework is instantiated under four common MAS patterns: Sequential Style (Planner, Critic, Solver pipeline), Mixture Style (parallel domain experts aggregated by a Summarizer), Distillation Style (Expert-Learner knowledge transfer), and Deliberation Style (Reflector and Tool-Caller with external tools).
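Equations (1)-(2) can be sketched numerically. Below is a minimal numpy illustration in which `f_theta` is stubbed as a fixed random nonlinear map; in the paper it is a frozen transformer stack, and the shapes and seeds here are assumptions made only for the sketch.

```python
import numpy as np

def f_theta(H):
    """Stand-in for a frozen transformer stack: any map from
    (seq_len, d_h) hidden states to (seq_len, d_h) hidden states.
    Here a fixed random linear map plus tanh, for illustration only."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((H.shape[1], H.shape[1])) / np.sqrt(H.shape[1])
    return np.tanh(H @ W)

def recursive_compute(E, n):
    """Eq. (2): reuse the same stack f_theta for n rounds."""
    H = E
    for _ in range(n):
        H = f_theta(H)
    return H

E = np.random.default_rng(1).standard_normal((4, 8))  # t=4 tokens, d_h=8
H3 = recursive_compute(E, n=3)
print(H3.shape)  # latent state keeps shape (t, d_h) across rounds
```

The point of the sketch is structural: recursion reuses one transformation, so depth grows without adding parameters.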

Methodology

3.1 A Lightweight RecursiveLink

The RecursiveLink $\mathcal{R}$ is a two-layer residual projection module designed to preserve and transmit semantic information between embedding spaces.

  • Inner Link $\mathcal{R}_{\text{in}}$: Used within an agent during latent thoughts generation to map its last-layer embedding back to its input space: $\mathcal{R}_{\text{in}}(h) = h + W_2\,\sigma(W_1 h)$. (3) Here $W_1, W_2$ are linear layers, $\sigma(\cdot)$ is the GELU activation, and the residual connection preserves the original semantics.
  • Outer Link $\mathcal{R}_{\text{out}}$: Connects heterogeneous agents with different hidden dimensions by adding a linear layer $W_3$ for cross-space mapping: $\mathcal{R}_{\text{out}}(h) = W_3 h + W_2\,\sigma(W_1 h)$. (4)
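Equations (3)-(4) translate directly into code. The following numpy sketch implements both link variants; the hidden width `d_mid`, the initialization scale, and the tanh-approximate GELU are assumptions, not details given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class RecursiveLink:
    """Two-layer residual projection (Eqs. 3-4, sketch)."""
    def __init__(self, d_in, d_out, d_mid=None):
        d_mid = d_mid or d_in
        self.W1 = rng.standard_normal((d_in, d_mid)) * 0.02
        self.W2 = rng.standard_normal((d_mid, d_out)) * 0.02
        # Inner link (d_in == d_out): identity residual, Eq. (3).
        # Outer link: learned W3 maps across heterogeneous spaces, Eq. (4).
        self.W3 = None if d_in == d_out else rng.standard_normal((d_in, d_out)) * 0.02

    @staticmethod
    def gelu(x):
        # tanh approximation of GELU
        return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

    def __call__(self, h):
        residual = h if self.W3 is None else h @ self.W3
        return residual + self.gelu(h @ self.W1) @ self.W2

inner = RecursiveLink(8, 8)    # R_in: maps back into the same embedding space
outer = RecursiveLink(8, 16)   # R_out: 8-dim agent -> 16-dim agent
h = rng.standard_normal(8)
print(inner(h).shape, outer(h).shape)  # (8,) (16,)
```

Note how the residual path dominates at small weight scales, which is what lets the link preserve the incoming semantics early in training.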

3.2 Chain All Agents Together as a Loop

The architecture forms a recursive loop where agents collaborate in latent space (see Figure 2).

  1. Latent Thoughts Generation: An agent $A_1$ starts with input embeddings $E_{A_1}$. It computes a last-layer hidden state $h_t$, transforms it via $\mathcal{R}_{\text{in}}$ into the next input embedding $e_{t+1}$, and repeats this auto-regressively for $m$ steps to generate a sequence of latent thoughts $H_{A_1} = [h_t, h_{t+1}, \dots, h_{t+m}]$.
  2. Cross-Agent Interaction: $H_{A_1}$ is sent to the next agent $A_2$ via $\mathcal{R}_{\text{out}}$, which maps it into $A_2$-aligned input embeddings. $A_2$ then generates its own latent thoughts conditioned on this transferred information and its own context.
  3. System Loop: The process continues through all agents. After the last agent $A_N$ finishes, its latent outputs are passed back to $A_1$ via the RecursiveLink, closing the loop. Each new recursion round can therefore condition on and refine information from previous rounds. Only in the final recursion round does the last agent decode a textual output.
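The three steps above can be condensed into a schematic loop. In this sketch, agents are stubbed as functions from input embeddings to latent thoughts, the links are identity matrices, and `m`, `n`, and the agent internals are placeholders rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_agent(d_h):
    W = rng.standard_normal((d_h, d_h)) / np.sqrt(d_h)
    def agent(E, m):
        # Step 1: m auto-regressive latent steps (R_in folded in for brevity).
        thoughts = []
        h = np.tanh(E.mean(axis=0) @ W)   # stand-in for the last-layer state
        for _ in range(m):
            thoughts.append(h)
            h = np.tanh(h @ W)
        return np.stack(thoughts)         # (m, d_h) latent thoughts
    return agent

def recursive_mas(E, agents, links, m=4, n=2):
    H = E
    for _ in range(n):                    # Step 3: system-wide recursion rounds
        for agent, link in zip(agents, links):
            H = agent(H, m)               # Step 1: latent thoughts
            H = H @ link                  # Step 2: R_out into the next agent's space
    return H                              # decoded to text only after round n

d = 8
agents = [make_agent(d) for _ in range(3)]
links = [np.eye(d) for _ in agents]       # identity links, for the sketch only
out = recursive_mas(rng.standard_normal((5, d)), agents, links)
print(out.shape)  # (4, 8)
```

The essential property the sketch preserves is that every hand-off stays in latent space; no intermediate decoding to text occurs inside the loop.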

Proposition 3.1 (Runtime Complexity): This design provides an efficiency advantage.

  • Text-based Recursive MAS complexity: $\Theta\left(N\left(m|V|d_h + (t+m)d_h^2 + (t+m)^2 d_h\right)\right)$.
  • RecursiveMAS complexity: $\Theta\left(N\left(m d_h^2 + (t+m)d_h^2 + (t+m)^2 d_h\right)\right)$.

Remark 3.2: Since the hidden dimension $d_h \ll |V|$ (the vocabulary size), RecursiveMAS replaces the expensive decoding term $m|V|d_h$ with the far cheaper $m d_h^2$.
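To get a feel for Remark 3.2, one can plug in illustrative values. The numbers below are typical of small open-weight LLMs and are chosen for illustration only, not taken from the paper.

```python
# Rough magnitude of the term replaced per Remark 3.2.
m, V, d_h = 80, 150_000, 2048

text_term = m * V * d_h      # per-agent vocabulary-decoding cost (text-based MAS)
latent_term = m * d_h ** 2   # the corresponding RecursiveMAS term

print(text_term / latent_term)  # = |V| / d_h, ~73x fewer multiply-accumulates
```

The ratio is exactly $|V| / d_h$, so the saving grows with vocabulary size and shrinks as models get wider.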

4. Learning to Recur as a Whole

The training pipeline (Figure 4) has two stages:

  1. Model-Level Inner-Loop Training: Warms up each agent's inner RecursiveLink $\mathcal{R}_{\text{in}}$ for latent thoughts generation. Given agent $A_i$ and ground-truth text $y$, the objective aligns the generated latent thoughts $H$ with the semantic distribution of $y$'s input embeddings: $\mathcal{L}_{\text{in}} = 1 - \cos\left(\mathcal{R}_{\text{in}}(H), \text{Emb}_{\theta_i}(y)\right)$. (5)
  2. System-Level Outer-Loop Training: Co-optimizes the entire system by training the outer RecursiveLinks $\mathcal{R}_{\text{out}}$ across $n$ recursion rounds. After the final round, a cross-entropy loss is computed on the decoded textual output: $\mathcal{L}_{\text{out}} = \text{CE}\left(\mathcal{S}^{(n)}(\mathcal{S}^{(n-1)}(\cdots \mathcal{S}^{(1)}(x))), y\right)$. (6) Gradients are backpropagated through the full recursive computation graph, enabling shared credit assignment across agents and rounds.
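The inner-loop objective in Eq. (5) is a simple cosine alignment. The sketch below mean-pools both sequences before comparing them, which is a simplifying assumption; the paper's exact pooling is not specified in this summary.

```python
import numpy as np

def inner_loss(H_linked, E_y):
    """Eq. (5): 1 - cosine similarity between the linked latent
    thoughts and the ground-truth text's input embeddings.
    Both (seq_len, d_h) arrays are mean-pooled here (an assumption)."""
    a, b = H_linked.mean(axis=0), E_y.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
print(inner_loss(H, H))   # identical inputs: loss is ~0 (perfect alignment)
```

The loss lives in $[0, 2]$: 0 for perfectly aligned directions, 2 for opposite ones, which makes it a stable warm-up target before the outer loop's cross-entropy takes over.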

Theorem 4.1 (Gradient Stability): Under realistic assumptions, if token distributions are confident (entropy $\leq \epsilon$ with $\epsilon \ll 1$):

  • Text-based interaction suffers from gradient vanishing: $\left\|\frac{\partial \mathcal{R}_{\text{text}}(h)}{\partial h}\right\|_2 \leq O(\epsilon) \ll 1$.
  • RecursiveMAS maintains stable gradients: $\left\|\frac{\partial \mathcal{R}(h)}{\partial h}\right\|_2 \geq \Omega\left(1 - \sqrt{\frac{1}{d_h} \log \frac{1}{\delta}}\right)$.

This demonstrates the learning advantage of latent-space collaboration.
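The contrast behind Theorem 4.1 can be seen numerically: the Jacobian of a softmax at a confident (near one-hot) distribution has near-zero spectral norm, while a residual link's Jacobian stays close to the identity. The numbers below illustrate this intuition only; they are not the paper's proof, and the perturbation scale is an assumption.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    # d softmax(z) / d z = diag(p) - p p^T
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

# Confident (low-entropy) distribution: one logit dominates.
z = np.array([10.0, 0.0, 0.0, 0.0])
text_grad = np.linalg.norm(softmax_jacobian(z), 2)   # spectral norm, near 0

# Residual link Jacobian: identity plus a small perturbation, near 1.
rng = np.random.default_rng(0)
J_link = np.eye(4) + 0.02 * rng.standard_normal((4, 4))
latent_grad = np.linalg.norm(J_link, 2)

print(text_grad < 0.01 < latent_grad)  # True
```

When every hand-off passes through a saturated softmax, these near-zero factors multiply across agents and rounds; the residual path in the RecursiveLink avoids that compounding.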

Empirical Validation / Results

Setup: Evaluated on 9 benchmarks: MATH500, AIME2025, AIME2026 (Math); GPQA-Diamond, MedQA (Science/Medicine); LiveCodeBench, MBPP+ (Code); HotpotQA, Bamboogle (Search). RecursiveMAS was instantiated with diverse LLMs (Qwen, Llama, Gemma, Mistral) under four collaboration patterns (see agent configurations in Table 1). Compared against strong baselines: Single Agents (LoRA/Full-SFT), Mixture-of-Agents (MoA), TextGrad, LoopLM, and Recursive-TextMAS.

Table 1: Agent Configurations for Different Collaboration Patterns

| Collaboration Pattern | Role | Model Size & Version |
| --- | --- | --- |
| Sequential Style (Light) | Planner | Qwen3-1.7B |
| | Critic | Llama3.2-1B-Instruct |
| | Solver | Qwen2.5-Math-1.5B-Instruct |
| Sequential Style (Scaled) | Planner | Gemma3-4B-it |
| | Critic | Llama3.2-3B-Instruct |
| | Solver | Qwen3.5-4B |
| Mixture Style | Code Specialist | Qwen2.5-Coder-3B-Instruct |
| | Science Specialist | BioMistral-7B |
| | Math Specialist | DeepSeek-R1-Distill-Qwen-1.5B |
| | Summarizer | Qwen3.5-2B |
| Distillation Style | Learner | Qwen3.5-4B |
| | Expert | Qwen3.5-9B |
| Deliberation Style | Reflector | Qwen3.5-4B |
| | Tool-Caller | Qwen3.5-4B (with Tool-Integration) |

5.1 Scaling Performance via Recursion

Table 2 shows that RecursiveMAS's performance improves with recursion depth ($r = 1, 2, 3$) across accuracy, runtime, and token usage, consistently outperforming the text-based baseline (Recursive-TextMAS).

Key Results from Table 2 (Averaged Trends):

  • Accuracy: RecursiveMAS shows an average improvement over the text baseline of +8.1% at r=1, +19.6% at r=2, and +20.2% at r=3.
  • Efficiency: RecursiveMAS delivers 1.2× to 2.4× inference speedup and reduces token usage by 34.6% to 75.6% as recursion deepens.

The framework exhibits a clean scaling law: deeper training shifts the performance frontier upward, and deeper inference continues to improve systems trained with fewer rounds (Figure 1 Top).

5.2 Broader Comparison

Table 3: Comparison of RecursiveMAS with Other Methods (at r=3)

| Method | MATH500 | AIME2025 | AIME2026 | GPQA-D | LiveCodeBench | MedQA |
| --- | --- | --- | --- | --- | --- | --- |
| Single Agent (w/ LoRA) | 83.1 | 70.0 | 73.3 | 62.0 | 37.4 | 76.1 |
| Single Agent (w/ Full-SFT) | 83.2 | 73.3 | 76.7 | 62.8 | 38.6 | 77.0 |
| Mixture-of-Agents (MoA) | 79.8 | 60.0 | 63.3 | 47.6 | 27.0 | 57.5 |
| TextGrad | 84.9 | 73.3 | 76.7 | 62.5 | 39.8 | 77.2 |
| LoopLM | 84.6 | 66.7 | 63.3 | 48.1 | 24.9 | 56.4 |
| Recursive-TextMAS | 85.8 | 73.3 | 73.3 | 61.6 | 38.7 | 77.0 |
| RecursiveMAS | 88.0 | 86.7 | 86.7 | 66.2 | 42.9 | 79.3 |

RecursiveMAS delivers a consistent whole-system advantage, achieving an average performance improvement of 8.3% over the strongest baseline on each benchmark.

5.3 Generalization Across Collaboration Patterns

RecursiveMAS generalizes effectively across diverse MAS structures (Figure 1 Bottom):

  • Mixture-Style: Achieves +6.2% average improvement over the strongest domain specialist.
  • Deliberation-Style: Improves the tool-calling agent by +4.8%.
  • Distillation-Style: Improves the Learner by +8.0% while retaining a 1.5× end-to-end speed advantage over the Expert.

5.4 Efficiency Analyses

  • Inference Time Speedup (Figure 5): The efficiency gain of RecursiveMAS over the text baseline grows with recursion depth: 1.2× at r=1, 1.9× at r=2, 2.4× at r=3.
  • Token Reduction (Figure 6): RecursiveMAS reduces token usage by 34.6% at r=1, 65.5% at r=2, and 75.6% at r=3, as it avoids repeated intermediate text generation.

6. In-depth Analyses

  • RecursiveLink Design (Table 4): The proposed two-layer residual design performs best. The residual connection is crucial, providing stable training and stronger inference.

    Table 4: Efficacy of the RecursiveLink Design

    | RecursiveLink Design | MATH500 | GPQA-D | LiveCodeBench |
    | --- | --- | --- | --- |
    | 1-Layer | 84.4 | 63.2 | 40.1 |
    | Res+1-Layer | 86.7 | 65.3 | 41.4 |
    | 2-Layer | 85.6 | 64.5 | 40.5 |
    | Res+2-Layer (ours) | 88.0 | 66.2 | 42.9 |
  • Semantic Representations (Figure 7): PCA visualization shows that the semantic distribution of answers generated by RecursiveMAS progressively aligns with the ground-truth distribution as recursion depth increases from $r=1$ to $r=3$.
  • Optimal Latent Thoughts Length (Figure 8 & Table 9): Performance improves with the latent step length $m$ and stabilizes around $m=80$, indicating that effective collaboration requires only a modest latent budget.
  • Training Cost Analysis (Table 5): RecursiveMAS has the lowest GPU memory usage, the fewest trainable parameters, and the lowest estimated cost while achieving the highest accuracy, offering a superior cost-performance trade-off.

    Table 5: Cost Analysis on RecursiveMAS

    | Method | GPU Mem. | Trainable Param. | Cost | Avg. Acc. |
    | --- | --- | --- | --- | --- |
    | LoRA Training | 21.67 GB | 15.92M (0.37%) | $6.64 | 66.9% |
    | Full-SFT | 41.40 GB | 4.21B (100%) | $9.67 | 68.6% |
    | RecursiveMAS | 15.29 GB | 13.12M (0.31%) | $4.27 | 74.9% |

Theoretical and Practical Implications

Theoretical Implications: The paper provides formal grounding for latent-space multi-agent collaboration. The complexity analysis (Proposition 3.1) justifies the architectural efficiency, and the gradient stability theorem (Theorem 4.1) explains why latent-space recursion enables more effective whole-system optimization compared to text-based interaction, which suffers from gradient vanishing.

Practical Implications:

  1. Scalable Agent Collaboration: RecursiveMAS provides a principled, efficient framework for scaling MAS performance through recursion, a new axis beyond simply using larger models or more agents.
  2. Efficiency Gains: The significant reductions in inference time and token usage make advanced multi-agent reasoning more practical for real-world deployment.
  3. Flexible Deployment: The framework is agnostic to specific MAS architectures and model families, allowing it to be adapted to various collaboration patterns (sequential, mixture, distillation, deliberation) with complementary specialist agents.
  4. Cost-Effective Training: By freezing base LLM parameters and only training the lightweight RecursiveLink, the system achieves state-of-the-art performance at a fraction of the training cost of full fine-tuning.

Conclusion

RecursiveMAS introduces a novel paradigm for scaling multi-agent systems through system-level recursion in latent space. By connecting heterogeneous agents via lightweight RecursiveLink modules and optimizing them with an inner-outer loop algorithm, the framework enables efficient, iterative collaboration and refinement. Theoretical analyses confirm its advantages in runtime complexity and stable learning dynamics.