# Recursive Multi-Agent Systems

> RecursiveMAS introduces a framework enabling multi-agent systems to collaborate recursively in a shared latent space via lightweight RecursiveLink modules, achieving significant performance gains and efficiency improvements.

- **Source:** [arXiv](https://arxiv.org/abs/2604.25917)
- **Published:** 2026-04-30
- **Permalink:** https://picx.dev/p/qxP2k7
- **Whiteboard:** https://picx.dev/p/qxP2k7/image

## Summary

*   **Core Contribution:** Introduces **RecursiveMAS**, a novel framework that extends the principle of recursive computation from a single language model to an entire multi-agent system (MAS). It enables heterogeneous agents to collaborate and refine their reasoning iteratively within a shared **latent space**.
*   **Key Mechanism:** Agents are connected via a lightweight **RecursiveLink** module (with inner and outer variants). The inner link enables latent thoughts generation within an agent, and the outer link facilitates cross-agent latent state transfer, forming a system-wide recursive loop.
*   **Optimization Method:** Proposes an **inner-outer loop learning algorithm** that co-optimizes the entire system by training only the RecursiveLink modules. This allows for shared gradient-based credit assignment across recursion rounds, enabling efficient whole-system evolution.
*   **Empirical Performance:** Evaluated across 9 benchmarks (mathematics, science, medicine, code, search), RecursiveMAS consistently outperforms advanced single/multi-agent and recursive baselines, achieving an **average accuracy improvement of 8.3%**, alongside significant **inference speedup (1.2×–2.4×)** and **token usage reduction (34.6%–75.6%)**.
*   **Theoretical Foundation:** Analyzes runtime complexity and learning dynamics, showing that latent-space collaboration in RecursiveMAS is cheaper to compute than text-based interaction and keeps gradient flow stable during training.

## Introduction and Theoretical Foundation
Traditional single language models often struggle with complex tasks due to limited capacity or inefficient solution space exploration. Multi-agent systems (MAS) address this by orchestrating collaboration among specialized agents. However, improving such systems is challenging: prompt-based adaptation doesn't improve agents internally, while training all agent parameters is non-trivial, and text-based sequential interactions introduce latency.

This paper recasts agent collaboration through the lens of **recursive language models (RLMs)**, where shared layers are iteratively applied within a continuous latent space to deepen reasoning. The authors ask: *"Can agent collaboration itself be scaled through recursion?"*

**RecursiveMAS** answers affirmatively by treating the entire MAS as a unified recursive computation. Each agent acts like an RLM layer, iteratively passing and refining latent representations in a loop. The framework is optimized via lightweight **RecursiveLink** modules, avoiding the need to update all model parameters. Theoretical justifications show that latent-space collaboration avoids the expensive per-step vocabulary decoding of text-based systems and maintains stable gradients during training, making system-wide co-optimization more effective.

**Key Definitions:**
*   **Auto-regressive Generation in Latent Space:** Given a model $f_{\theta}(\cdot)$ and input embeddings $E = [e_1, ..., e_t] \in \mathbb{R}^{t \times d_h}$, the next latent thought at step $t+1$ is generated as:
    $$h_{t+1} = f_{\theta}([E_{\leq t}; h_t]). \tag{1}$$
*   **Recursive Computation:** A recursive model reuses the same transformation stack $f_{\theta}$ for $n$ iterations:
    $$H^{(0)} = E, \quad H^{(r)} = f_{\theta}\left(H^{(r-1)}\right), \quad r = 1, ..., n. \tag{2}$$
*   **Recursive Multi-Agent Evolution (Definition 2.1):** The progressive refinement of the system's collective latent state $\mathcal{H} = \{H_1, ..., H_N\}$ through iterative interaction, i.e., $\mathcal{S}^{(0)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(1)}} \mathcal{S}^{(1)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(2)}} \cdots \xrightarrow[\text{Evolve}]{\mathcal{H}^{(n)}} \mathcal{S}^{(n)}$.
*   **Collaboration Patterns:** The framework is instantiated under four common MAS patterns: **Sequential Style** (Planner, Critic, Solver pipeline), **Mixture Style** (parallel domain experts aggregated by a Summarizer), **Distillation Style** (Expert-Learner knowledge transfer), and **Deliberation Style** (Reflector and Tool-Caller with external tools).

## Methodology
### 3.1 A Lightweight RecursiveLink
The **RecursiveLink** $\mathcal{R}$ is a two-layer residual projection module designed to preserve and transmit semantic information between embedding spaces.
*   **Inner Link** $\mathcal{R}_{\text{in}}$: Used within an agent during latent thoughts generation to map its last-layer embedding back to its input space.
    $$\mathcal{R}_{\text{in}}(h) = h + W_2 \sigma(W_1 h). \tag{3}$$
    Here, $W_1, W_2$ are linear layers, $\sigma(\cdot)$ is GELU activation, and the residual connection preserves original semantics.
*   **Outer Link** $\mathcal{R}_{\text{out}}$: Connects heterogeneous agents with different hidden dimensions by adding a linear layer $W_3$ for cross-space mapping.
    $$\mathcal{R}_{\text{out}}(h) = W_3 h + W_2 \sigma(W_1 h). \tag{4}$$
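
Equations (3) and (4) can be sketched in a few lines of dependency-free Python. This is a minimal illustration rather than the authors' implementation: bias terms are omitted, and the hidden width `d_mid`, the initialization scale, the toy dimensions, and the helper names (`matvec`, `rand_matrix`, `inner_link`, `outer_link`) are all assumptions made here.

```python
import math
import random

def gelu(x):
    """Exact GELU, written with the Gaussian error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def matvec(W, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.02):
    """Small random weights; the scale is an illustrative assumption."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def inner_link(h, W1, W2):
    """Eq. (3): R_in(h) = h + W2 gelu(W1 h). Input and output share one dimension."""
    update = matvec(W2, [gelu(z) for z in matvec(W1, h)])
    return [a + b for a, b in zip(h, update)]

def outer_link(h, W1, W2, W3):
    """Eq. (4): R_out(h) = W3 h + W2 gelu(W1 h). W3 maps source dim to target dim."""
    skip = matvec(W3, h)
    update = matvec(W2, [gelu(z) for z in matvec(W1, h)])
    return [a + b for a, b in zip(skip, update)]

rng = random.Random(0)
d_src, d_dst, d_mid = 4, 6, 8  # toy sizes, not taken from the paper
h = [rng.gauss(0.0, 1.0) for _ in range(d_src)]

# Inner link: stays inside one agent, so the output keeps dimension d_src.
e_next = inner_link(h, rand_matrix(d_mid, d_src, rng), rand_matrix(d_src, d_mid, rng))

# Outer link: crosses to an agent whose hidden size is d_dst.
e_other = outer_link(
    h,
    rand_matrix(d_mid, d_src, rng),
    rand_matrix(d_dst, d_mid, rng),
    rand_matrix(d_dst, d_src, rng),
)
print(len(e_next), len(e_other))
```

The only structural difference between the two links is the skip path: the inner link adds `h` back unchanged, while the outer link must first project `h` with `W3` because the source and target agents may not share a hidden dimension.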

### 3.2 Chain All Agents Together as a Loop
The architecture forms a recursive loop where agents collaborate in latent space (see Figure 2).
1.  **Latent Thoughts Generation:** An agent $A_1$ starts with input embeddings $E_{A_1}$. It computes a last-layer hidden state $h_t$, transforms it via $\mathcal{R}_{\text{in}}$ to get the next input embedding $e_{t+1}$, and repeats this auto-regressively for $m$ steps to generate a sequence of latent thoughts $H_{A_1} = [h_t, h_{t+1}, ..., h_{t+m}]$.
2.  **Cross-Agent Interaction:** $H_{A_1}$ is sent to the next agent $A_2$ via $\mathcal{R}_{\text{out}}$ to become $A_2$-aligned input embeddings. $A_2$ then generates its own latent thoughts conditioned on this transferred information and its own context.
3.  **System Loop:** This process continues through all agents. After the last agent $A_N$ finishes, its latent outputs are passed back to $A_1$ via the RecursiveLink, closing the loop. This allows each new recursion round to condition on and refine information from previous rounds. **Only in the final recursion round does the last agent decode a textual output.**
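
The control flow of the three steps above can be sketched as follows. The agents here are toy stand-ins operating on scalar "hidden states", and the dictionary layout (`step`, `inner`, `outer`), the function names, and the way each agent's own context is concatenated with incoming latents are assumptions for illustration; only the loop structure is taken from the description.

```python
def generate_latents(agent_step, inner_link, context, m):
    """Auto-regressively produce m latent thoughts without decoding any text."""
    seq = list(context)
    thoughts = []
    for _ in range(m):
        h = agent_step(seq)        # last-layer hidden state for the current sequence
        thoughts.append(h)
        seq.append(inner_link(h))  # R_in maps h back into the agent's input space
    return thoughts

def recursive_mas(agents, question_embs, n_rounds, m, decode):
    """Run n_rounds passes over all agents; decode only once, at the very end."""
    incoming = []  # latents handed over by the previous agent (empty at the start)
    for _ in range(n_rounds):
        for i, agent in enumerate(agents):
            context = question_embs[i] + incoming
            thoughts = generate_latents(agent["step"], agent["inner"], context, m)
            # R_out aligns the thoughts with the next agent's input space.
            # After the last agent, this feeds the first agent of the next round.
            incoming = [agent["outer"](h) for h in thoughts]
    return decode(thoughts)  # final round, last agent: the only textual output

# Toy instantiation with three agents and scalar hidden states.
calls = {"step": 0, "decode": 0}

def make_agent(bias):
    def step(seq):
        calls["step"] += 1
        return sum(seq) / len(seq) + bias
    return {"step": step, "inner": lambda h: h, "outer": lambda h: 0.5 * h}

def decode(thoughts):
    calls["decode"] += 1
    return f"answer from {len(thoughts)} latent thoughts"

agents = [make_agent(b) for b in (0.1, 0.2, 0.3)]
question_embs = [[1.0, 2.0]] * len(agents)
answer = recursive_mas(agents, question_embs, n_rounds=2, m=4, decode=decode)
print(answer, calls)
```

Counting calls makes the efficiency argument concrete: the agents take `n_rounds * N * m` latent steps in total, but the decoder runs exactly once.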

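**Proposition 3.1 (Runtime Complexity):** With $N$ agents, $m$ latent steps per agent, context length $t$, hidden dimension $d_h$, and vocabulary size $|V|$, the two designs compare as follows: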
*   **Text-based Recursive MAS** complexity: $\Theta(N(m|V|d_h + (t+m)d_h^2 + (t+m)^2 d_h))$.
*   **RecursiveMAS** complexity: $\Theta(N(md_h^2 + (t+m)d_h^2 + (t+m)^2 d_h))$.

**Remark 3.2:** Since the hidden dimension $d_h \ll |V|$ (vocabulary size), RecursiveMAS replaces the expensive term $m|V|d_h$ with the more efficient $md_h^2$.
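
To get a feel for the size of the term that changes, the two expressions from Proposition 3.1 can be evaluated directly. The values below are illustrative assumptions, not figures from the paper (only $m=80$ echoes the latent length discussed later), and because $\Theta(\cdot)$ hides constant factors the resulting ratio should not be read as a predicted wall-clock speedup.

```python
def text_mas_cost(N, m, t, d_h, V):
    """Dominant terms for text-based recursion: decoding costs m * |V| * d_h."""
    return N * (m * V * d_h + (t + m) * d_h**2 + (t + m) ** 2 * d_h)

def latent_mas_cost(N, m, t, d_h):
    """Dominant terms for RecursiveMAS: the link costs m * d_h^2 instead."""
    return N * (m * d_h**2 + (t + m) * d_h**2 + (t + m) ** 2 * d_h)

# Illustrative sizes (assumptions): 3 agents, 80 latent steps, 512-token context,
# hidden dimension 2048, vocabulary of 150,000 entries.
N, m, t, d_h, V = 3, 80, 512, 2048, 150_000

text = text_mas_cost(N, m, t, d_h, V)
latent = latent_mas_cost(N, m, t, d_h)

print(f"text-based / latent operation count: {text / latent:.1f}x")
print(f"per-step decoding term |V| / d_h:    {V / d_h:.1f}x")
```

The two totals differ by exactly $N \cdot m \cdot d_h \cdot (|V| - d_h)$, so the gap widens with the number of latent steps and with the ratio of vocabulary size to hidden dimension.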

### 4. Learning to Recur as a Whole
The training pipeline (Figure 4) has two stages:
1.  **Model-Level Inner-Loop Training:** Warms up each agent's inner RecursiveLink $\mathcal{R}_{\text{in}}$ for latent thoughts generation. Given agent $A_i$ and ground-truth text $y$, the objective is to align the generated latent thoughts $H$ with the semantic distribution of $y$'s input embeddings:
    $$\mathcal{L}_{\text{in}} = 1 - \cos\left(\mathcal{R}_{\text{in}}(H), \text{Emb}_{\theta_i}(y)\right). \tag{5}$$
2.  **System-Level Outer-Loop Training:** Co-optimizes the entire system by training the outer RecursiveLinks $\mathcal{R}_{\text{out}}$ across $n$ recursion rounds. After the final round, a cross-entropy loss is computed on the decoded textual output:
    $$\mathcal{L}_{\text{out}} = \text{CE}\left(\mathcal{S}^{(n)}(\mathcal{S}^{(n-1)}(\cdots \mathcal{S}^{(1)}(x))), y\right). \tag{6}$$
    Gradients are backpropagated through the full recursive computation graph, enabling shared credit assignment.
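
The inner-loop objective in Eq. (5) is a cosine distance between mapped latent thoughts and target embeddings. A minimal sketch, assuming the loss is averaged over sequence positions (the summary does not specify the reduction) and using hypothetical helper names:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def inner_loop_loss(mapped_latents, target_embs):
    """Eq. (5), averaged over positions: 1 - mean cosine similarity."""
    sims = [cosine(h, e) for h, e in zip(mapped_latents, target_embs)]
    return 1.0 - sum(sims) / len(sims)

# Toy 2-D embeddings for a two-token target sequence.
target = [[1.0, 0.0], [0.0, 2.0]]
aligned = [[3.0, 0.0], [0.0, 0.5]]      # same directions, different lengths
orthogonal = [[0.0, 1.0], [1.0, 0.0]]
opposite = [[-1.0, 0.0], [0.0, -2.0]]

for name, latents in [("aligned", aligned), ("orthogonal", orthogonal), ("opposite", opposite)]:
    print(f"{name:>10}: loss = {inner_loop_loss(latents, target):.2f}")
```

Because cosine similarity ignores vector length, this objective only asks the inner link to point latent thoughts in the same direction as the target embeddings; it places no constraint on their norm.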

**Theorem 4.1 (Gradient Stability):** Under realistic assumptions, if token distributions are confident (entropy $\leq \epsilon$, $\epsilon \ll 1$):
*   Text-based interaction suffers from gradient vanishing: $\left\|\frac{\partial \mathcal{R}_{\text{text}}(h)}{\partial h}\right\|_2 \leq O(\epsilon) \ll 1$.
*   RecursiveMAS maintains stable gradients: $\left\|\frac{\partial \mathcal{R}(h)}{\partial h}\right\|_2 \geq \Omega\left(1 - \sqrt{\frac{1}{d_h} \log \frac{1}{\delta}}\right)$.

This demonstrates the learning advantage of latent-space collaboration.
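
One intuition consistent with the first bound, offered here as an illustration and not as the paper's proof: a text-based hand-off passes through a softmax over the vocabulary, and the Jacobian of softmax with respect to its logits, $\mathrm{diag}(p) - pp^{\top}$, shrinks toward zero as the distribution $p$ becomes confident. The sketch below sharpens a toy four-token distribution and tracks its entropy alongside the Frobenius norm of that Jacobian. The logits, the sharpening factors, and the choice of the Frobenius norm are assumptions made for the demonstration.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def softmax_jacobian_norm(p):
    """Frobenius norm of diag(p) - p p^T, the softmax Jacobian w.r.t. its logits."""
    total = 0.0
    for i, pi in enumerate(p):
        for j, pj in enumerate(p):
            entry = (pi if i == j else 0.0) - pi * pj
            total += entry * entry
    return math.sqrt(total)

base_logits = [2.0, 1.0, 0.5, 0.0]  # toy 4-token vocabulary
rows = []
for sharpness in (1.0, 5.0, 25.0):   # larger factor = more confident distribution
    p = softmax([sharpness * z for z in base_logits])
    rows.append((sharpness, entropy(p), softmax_jacobian_norm(p)))

for sharpness, ent, norm in rows:
    print(f"sharpness {sharpness:>4}: entropy = {ent:.3e}, Jacobian norm = {norm:.3e}")
```

By contrast, differentiating Eq. (3) gives $I + W_2\,\mathrm{diag}(\sigma'(W_1 h))\,W_1$, so the residual path contributes an identity term to the link's Jacobian regardless of how confident any token distribution is.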

## Empirical Validation / Results
**Setup:** Evaluated on 9 benchmarks: MATH500, AIME2025, AIME2026 (Math); GPQA-Diamond, MedQA (Science/Medicine); LiveCodeBench, MBPP+ (Code); HotpotQA, Bamboogle (Search). RecursiveMAS was instantiated with diverse LLMs (Qwen, Llama, Gemma, Mistral) under four collaboration patterns (see agent configurations in Table 1). Compared against strong baselines: Single Agents (LoRA/Full-SFT), Mixture-of-Agents (MoA), TextGrad, LoopLM, and Recursive-TextMAS.

**Table 1: Agent Configurations for Different Collaboration Patterns**
| Collaboration Pattern | Role | Model Size & Version |
| :--- | :--- | :--- |
| **Sequential Style (Light)** | Planner | Qwen3-1.7B |
| | Critic | Llama3.2-1B-Instruct |
| | Solver | Qwen2.5-Math-1.5B-Instruct |
| **Sequential Style (Scaled)** | Planner | Gemma3-4B-it |
| | Critic | Llama3.2-3B-Instruct |
| | Solver | Qwen3.5-4B |
| **Mixture Style** | Code Specialist | Qwen2.5-Coder-3B-Instruct |
| | Science Specialist | BioMistral-7B |
| | Math Specialist | DeepSeek-R1-Distill-Qwen-1.5B |
| | Summarizer | Qwen3.5-2B |
| **Distillation Style** | Learner | Qwen3.5-4B |
| | Expert | Qwen3.5-9B |
| **Deliberation Style** | Reflector | Qwen3.5-4B |
| | Tool-Caller | Qwen3.5-4B (with Tool-Integration) |

### 5.1 Scaling Performance via Recursion
**Table 2** shows that RecursiveMAS's accuracy improves with recursion depth ($r=1,2,3$), and that its runtime and token-usage advantage over the text-based baseline (Recursive-TextMAS) widens at every depth.

**Key Results from Table 2 (Averaged Trends):**
*   **Accuracy:** RecursiveMAS shows an average improvement over the text baseline of **+8.1% at r=1**, **+19.6% at r=2**, and **+20.2% at r=3**.
*   **Efficiency:** RecursiveMAS delivers **1.2× to 2.4× inference speedup** and reduces token usage by **34.6% to 75.6%** as recursion deepens.

The framework exhibits a clean **scaling law**: deeper training shifts the performance frontier upward, and deeper inference continues to improve systems trained with fewer rounds (Figure 1 Top).

### 5.2 Broader Comparison
**Table 3: Comparison of RecursiveMAS with Other Methods (at r=3)**
| Method | MATH500 | AIME2025 | AIME2026 | GPQA-D | LiveCodeBench | MedQA |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Single Agent (w/ LoRA) | 83.1 | 70.0 | 73.3 | 62.0 | 37.4 | 76.1 |
| Single Agent (w/ Full-SFT) | 83.2 | 73.3 | 76.7 | 62.8 | 38.6 | 77.0 |
| Mixture-of-Agents (MoA) | 79.8 | 60.0 | 63.3 | 47.6 | 27.0 | 57.5 |
| TextGrad | 84.9 | 73.3 | 76.7 | 62.5 | 39.8 | 77.2 |
| LoopLM | 84.6 | 66.7 | 63.3 | 48.1 | 24.9 | 56.4 |
| Recursive-TextMAS | 85.8 | 73.3 | 73.3 | 61.6 | 38.7 | 77.0 |
| **RecursiveMAS** | **88.0** | **86.7** | **86.7** | **66.2** | **42.9** | **79.3** |

RecursiveMAS is the strongest method on every benchmark in Table 3. Measured against the best baseline on each benchmark, it achieves an **average performance improvement of 8.3%** in relative terms, which corresponds to 5.7 accuracy points in absolute terms.
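
The 8.3% figure can be checked against Table 3 directly. The sketch below takes the best baseline score per benchmark and averages RecursiveMAS's margin over it. Reading "8.3%" as a relative gain is an inference from the table's numbers, not something stated explicitly here, and the variable names are illustrative.

```python
benchmarks = ["MATH500", "AIME2025", "AIME2026", "GPQA-D", "LiveCodeBench", "MedQA"]

# Baseline rows copied from Table 3 (r=3), one score per benchmark.
baselines = {
    "Single Agent (w/ LoRA)":     [83.1, 70.0, 73.3, 62.0, 37.4, 76.1],
    "Single Agent (w/ Full-SFT)": [83.2, 73.3, 76.7, 62.8, 38.6, 77.0],
    "Mixture-of-Agents (MoA)":    [79.8, 60.0, 63.3, 47.6, 27.0, 57.5],
    "TextGrad":                   [84.9, 73.3, 76.7, 62.5, 39.8, 77.2],
    "LoopLM":                     [84.6, 66.7, 63.3, 48.1, 24.9, 56.4],
    "Recursive-TextMAS":          [85.8, 73.3, 73.3, 61.6, 38.7, 77.0],
}
recursive_mas = [88.0, 86.7, 86.7, 66.2, 42.9, 79.3]

# Strongest baseline per benchmark: column-wise maximum over all baseline rows.
strongest = [max(column) for column in zip(*baselines.values())]

abs_gain = [ours - best for ours, best in zip(recursive_mas, strongest)]
rel_gain = [100.0 * gain / best for gain, best in zip(abs_gain, strongest)]

mean_abs = sum(abs_gain) / len(abs_gain)
mean_rel = sum(rel_gain) / len(rel_gain)

for name, best, a, r in zip(benchmarks, strongest, abs_gain, rel_gain):
    print(f"{name:>14}: best baseline {best:5.1f}, gain {a:+5.1f} pts ({r:+5.1f}%)")
print(f"mean gain: {mean_abs:.1f} points absolute, {mean_rel:.1f}% relative")
```

The gains are uneven: the two AIME benchmarks account for most of the average, while MATH500 and MedQA improve by roughly two points each.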

### 5.3 Generalization Across Collaboration Patterns
RecursiveMAS generalizes effectively across diverse MAS structures (Figure 1 Bottom):
*   **Mixture-Style:** Achieves +6.2% average improvement over the strongest domain specialist.
*   **Deliberation-Style:** Improves the tool-calling agent by +4.8%.
*   **Distillation-Style:** Improves the Learner by +8.0% while retaining a **1.5× end-to-end speed advantage** over the Expert.

### 5.4 Efficiency Analyses
*   **Inference Time Speedup (Figure 5):** The efficiency gain of RecursiveMAS over the text baseline grows with recursion depth: **1.2× at r=1, 1.9× at r=2, 2.4× at r=3**.
*   **Token Reduction (Figure 6):** RecursiveMAS reduces token usage by **34.6% at r=1, 65.5% at r=2, and 75.6% at r=3**, as it avoids repeated intermediate text generation.

### 6. In-depth Analyses
*   **RecursiveLink Design (Table 4):** The proposed 2-layer residual design performs best. The residual connection is crucial, providing stable training and stronger inference.
    **Table 4: Efficacy on RecursiveLink Design**
    | RecursiveLink Design | MATH500 | GPQA-D | LiveCodeBench |
    | :--- | :---: | :---: | :---: |
    | 1-Layer | 84.4 | 63.2 | 40.1 |
    | Res+1-Layer | 86.7 | 65.3 | 41.4 |
    | 2-Layer | 85.6 | 64.5 | 40.5 |
    | **Res+2-Layer (ours)** | **88.0** | **66.2** | **42.9** |
*   **Semantic Representations (Figure 7):** PCA visualization shows that the semantic distribution of answers generated by RecursiveMAS progressively aligns with the ground-truth distribution as recursion depth increases from $r=1$ to $r=3$.
*   **Optimal Latent Thoughts Length (Figure 8 & Table 9):** Performance improves with latent step length $m$ and stabilizes around $m=80$, indicating effective collaboration requires only a modest latent budget.
*   **Training Cost Analysis (Table 5):** RecursiveMAS has the lowest GPU memory usage, trainable parameters, and estimated cost while achieving the highest accuracy, offering a superior cost-performance trade-off.
    **Table 5: Cost Analysis on RecursiveMAS**
    | Methods | GPU Mem. | Trainable Param. | Cost | Avg. Acc. |
    | :--- | :---: | :---: | :---: | :---: |
    | LoRA Training | 21.67 GB | 15.92M (0.37%) | \$6.64 | 66.9% |
    | Full-SFT | 41.40 GB | 4.21B (100%) | \$9.67 | 68.6% |
    | **RecursiveMAS** | **15.29 GB** | **13.12M (0.31%)** | **\$4.27** | **74.9%** |

## Theoretical and Practical Implications
**Theoretical Implications:** The paper provides formal grounding for latent-space multi-agent collaboration. The complexity analysis (Proposition 3.1) justifies the architectural efficiency, and the gradient stability theorem (Theorem 4.1) explains why latent-space recursion enables more effective whole-system optimization compared to text-based interaction, which suffers from gradient vanishing.

**Practical Implications:**
1.  **Scalable Agent Collaboration:** RecursiveMAS provides a principled, efficient framework for scaling MAS performance through recursion, a new axis beyond simply using larger models or more agents.
2.  **Efficiency Gains:** The significant reductions in inference time and token usage make advanced multi-agent reasoning more practical for real-world deployment.
3.  **Flexible Deployment:** The framework is agnostic to specific MAS architectures and model families, allowing it to be adapted to various collaboration patterns (sequential, mixture, distillation, deliberation) with complementary specialist agents.
4.  **Cost-Effective Training:** By freezing base LLM parameters and only training the lightweight RecursiveLink, the system achieves state-of-the-art performance at a fraction of the training cost of full fine-tuning.

## Conclusion
RecursiveMAS introduces a novel paradigm for scaling multi-agent systems through system-level recursion in latent space. By connecting heterogeneous agents via lightweight RecursiveLink modules and optimizing them with an inner-outer loop algorithm, the framework enables efficient, iterative collaboration and refinement. Theoretical analyses confirm its advantages in runtime complexity and stable learning dynamics.

---

_Markdown view of https://picx.dev/p/qxP2k7, served by PicX — AI-generated visual whiteboard summaries of research papers._