Recursive Multi-Agent Systems: Summary

Summary (Overview)

  • Core Contribution: Introduces RecursiveMAS, a novel framework that extends the principle of recursive computation from a single language model to an entire multi-agent system (MAS). It enables heterogeneous agents to collaborate and refine their reasoning iteratively within a shared latent space.
  • Key Mechanism: Agents are connected via a lightweight RecursiveLink module (with inner and outer variants). The inner link enables latent thoughts generation within an agent, and the outer link facilitates cross-agent latent state transfer, forming a system-wide recursive loop.
  • Optimization Method: Proposes an inner-outer loop learning algorithm that co-optimizes the entire system by training only the RecursiveLink modules. This allows for shared gradient-based credit assignment across recursion rounds, enabling efficient whole-system evolution.
  • Empirical Performance: Evaluated across 9 benchmarks (mathematics, science, medicine, code, search), RecursiveMAS consistently outperforms advanced single/multi-agent and recursive baselines, achieving an average accuracy improvement of 8.3%, alongside significant inference speedup (1.2×–2.4×) and token usage reduction (34.6%–75.6%).
  • Theoretical Foundation: Provides analyses on runtime complexity and learning dynamics, establishing that latent-space collaboration in RecursiveMAS is more computationally efficient and maintains stable gradient flow during training compared to text-based interaction.

Introduction and Theoretical Foundation

Traditional single language models often struggle with complex tasks due to limited capacity or inefficient solution space exploration. Multi-agent systems (MAS) address this by orchestrating collaboration among specialized agents. However, improving such systems is challenging: prompt-based adaptation doesn't improve agents internally, while training all agent parameters is non-trivial, and text-based sequential interactions introduce latency.

This paper recasts agent collaboration through the lens of recursive language models (RLMs), where shared layers are iteratively applied within a continuous latent space to deepen reasoning. The authors ask: "Can agent collaboration itself be scaled through recursion?"

RecursiveMAS answers affirmatively by treating the entire MAS as a unified recursive computation. Each agent acts like an RLM layer, iteratively passing and refining latent representations in a loop. The framework is optimized via lightweight RecursiveLink modules, avoiding the need to update all model parameters. Theoretical justifications show that latent-space collaboration avoids the expensive per-step vocabulary decoding of text-based systems and maintains stable gradients during training, making system-wide co-optimization more effective.

Key Definitions:

  • Auto-regressive Generation in Latent Space: Given a model $f_{\theta}(\cdot)$ and input embeddings $E = [e_1, \dots, e_t] \in \mathbb{R}^{t \times d_h}$, the next latent thought at step $t+1$ is generated as $h_{t+1} = f_{\theta}([E_{\leq t}; h_t])$. (1)
  • Recursive Computation: A recursive model reuses the same transformation stack $f_{\theta}$ for $n$ iterations: $H^{(0)} = E$, $H^{(r)} = f_{\theta}(H^{(r-1)})$, $r = 1, \dots, n$. (2)
  • Recursive Multi-Agent Evolution (Definition 2.1): The progressive refinement of the system's collective latent state $\mathcal{H} = \{H_1, \dots, H_N\}$ through iterative interaction, i.e., $\mathcal{S}^{(0)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(1)}} \mathcal{S}^{(1)} \xrightarrow[\text{Evolve}]{\mathcal{H}^{(2)}} \cdots \xrightarrow[\text{Evolve}]{\mathcal{H}^{(n)}} \mathcal{S}^{(n)}$.
  • Collaboration Patterns: The framework is instantiated under four common MAS patterns: Sequential Style (Planner, Critic, Solver pipeline), Mixture Style (parallel domain experts aggregated by a Summarizer), Distillation Style (Expert-Learner knowledge transfer), and Deliberation Style (Reflector and Tool-Caller with external tools).
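Equations (1)-(2) can be sketched numerically. Below is a minimal numpy illustration in which `f_theta` is stubbed as a fixed random nonlinear map; in the paper it is a frozen transformer stack, and the shapes and seeds here are assumptions made only for the sketch.

```python
import numpy as np

def f_theta(H):
    """Stand-in for a frozen transformer stack: any map from
    (seq_len, d_h) hidden states to (seq_len, d_h) hidden states.
    Here a fixed random linear map plus tanh, for illustration only."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((H.shape[1], H.shape[1])) / np.sqrt(H.shape[1])
    return np.tanh(H @ W)

def recursive_compute(E, n):
    """Eq. (2): reuse the same stack f_theta for n rounds."""
    H = E
    for _ in range(n):
        H = f_theta(H)
    return H

E = np.random.default_rng(1).standard_normal((4, 8))  # t=4 tokens, d_h=8
H3 = recursive_compute(E, n=3)
print(H3.shape)  # latent state keeps shape (t, d_h) across rounds
```

The point of the sketch is structural: recursion reuses one transformation, so depth grows without adding parameters.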

Methodology

3.1 A Lightweight RecursiveLink

The RecursiveLink $\mathcal{R}$ is a two-layer residual projection module designed to preserve and transmit semantic information between embedding spaces.

  • Inner Link $\mathcal{R}_{\text{in}}$: Used within an agent during latent thoughts generation to map its last-layer embedding back to its input space: $\mathcal{R}_{\text{in}}(h) = h + W_2\,\sigma(W_1 h)$. (3) Here $W_1, W_2$ are linear layers, $\sigma(\cdot)$ is the GELU activation, and the residual connection preserves the original semantics.
  • Outer Link $\mathcal{R}_{\text{out}}$: Connects heterogeneous agents with different hidden dimensions by adding a linear layer $W_3$ for cross-space mapping: $\mathcal{R}_{\text{out}}(h) = W_3 h + W_2\,\sigma(W_1 h)$. (4)
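Equations (3)-(4) translate directly into code. The following numpy sketch implements both link variants; the hidden width `d_mid`, the initialization scale, and the tanh-approximate GELU are assumptions, not details given in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

class RecursiveLink:
    """Two-layer residual projection (Eqs. 3-4, sketch)."""
    def __init__(self, d_in, d_out, d_mid=None):
        d_mid = d_mid or d_in
        self.W1 = rng.standard_normal((d_in, d_mid)) * 0.02
        self.W2 = rng.standard_normal((d_mid, d_out)) * 0.02
        # Inner link (d_in == d_out): identity residual, Eq. (3).
        # Outer link: learned W3 maps across heterogeneous spaces, Eq. (4).
        self.W3 = None if d_in == d_out else rng.standard_normal((d_in, d_out)) * 0.02

    @staticmethod
    def gelu(x):
        # tanh approximation of GELU
        return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

    def __call__(self, h):
        residual = h if self.W3 is None else h @ self.W3
        return residual + self.gelu(h @ self.W1) @ self.W2

inner = RecursiveLink(8, 8)    # R_in: maps back into the same embedding space
outer = RecursiveLink(8, 16)   # R_out: 8-dim agent -> 16-dim agent
h = rng.standard_normal(8)
print(inner(h).shape, outer(h).shape)  # (8,) (16,)
```

Note how the residual path dominates at small weight scales, which is what lets the link preserve the incoming semantics early in training.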

3.2 Chain All Agents Together as a Loop

The architecture forms a recursive loop where agents collaborate in latent space (see Figure 2).

  1. Latent Thoughts Generation: An agent $A_1$ starts with input embeddings $E_{A_1}$. It computes a last-layer hidden state $h_t$, transforms it via $\mathcal{R}_{\text{in}}$ into the next input embedding $e_{t+1}$, and repeats this auto-regressively for $m$ steps to generate a sequence of latent thoughts $H_{A_1} = [h_t, h_{t+1}, \dots, h_{t+m}]$.
  2. Cross-Agent Interaction: $H_{A_1}$ is sent to the next agent $A_2$ via $\mathcal{R}_{\text{out}}$, which maps it into $A_2$-aligned input embeddings. $A_2$ then generates its own latent thoughts conditioned on this transferred information and its own context.
  3. System Loop: The process continues through all agents. After the last agent $A_N$ finishes, its latent outputs are passed back to $A_1$ via the RecursiveLink, closing the loop. Each new recursion round can therefore condition on and refine information from previous rounds. Only in the final recursion round does the last agent decode a textual output.
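The three steps above can be condensed into a schematic loop. In this sketch, agents are stubbed as functions from input embeddings to latent thoughts, the links are identity matrices, and `m`, `n`, and the agent internals are placeholders rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_agent(d_h):
    W = rng.standard_normal((d_h, d_h)) / np.sqrt(d_h)
    def agent(E, m):
        # Step 1: m auto-regressive latent steps (R_in folded in for brevity).
        thoughts = []
        h = np.tanh(E.mean(axis=0) @ W)   # stand-in for the last-layer state
        for _ in range(m):
            thoughts.append(h)
            h = np.tanh(h @ W)
        return np.stack(thoughts)         # (m, d_h) latent thoughts
    return agent

def recursive_mas(E, agents, links, m=4, n=2):
    H = E
    for _ in range(n):                    # Step 3: system-wide recursion rounds
        for agent, link in zip(agents, links):
            H = agent(H, m)               # Step 1: latent thoughts
            H = H @ link                  # Step 2: R_out into the next agent's space
    return H                              # decoded to text only after round n

d = 8
agents = [make_agent(d) for _ in range(3)]
links = [np.eye(d) for _ in agents]       # identity links, for the sketch only
out = recursive_mas(rng.standard_normal((5, d)), agents, links)
print(out.shape)  # (4, 8)
```

The essential property the sketch preserves is that every hand-off stays in latent space; no intermediate decoding to text occurs inside the loop.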

Proposition 3.1 (Runtime Complexity): This design provides an efficiency advantage.

  • Text-based Recursive MAS complexity: $\Theta\left(N\left(m|V|d_h + (t+m)d_h^2 + (t+m)^2 d_h\right)\right)$.
  • RecursiveMAS complexity: $\Theta\left(N\left(m d_h^2 + (t+m)d_h^2 + (t+m)^2 d_h\right)\right)$.

Remark 3.2: Since the hidden dimension $d_h \ll |V|$ (the vocabulary size), RecursiveMAS replaces the expensive decoding term $m|V|d_h$ with the far cheaper $m d_h^2$.
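To get a feel for Remark 3.2, one can plug in illustrative values. The numbers below are typical of small open-weight LLMs and are chosen for illustration only, not taken from the paper.

```python
# Rough magnitude of the term replaced per Remark 3.2.
m, V, d_h = 80, 150_000, 2048

text_term = m * V * d_h      # per-agent vocabulary-decoding cost (text-based MAS)
latent_term = m * d_h ** 2   # the corresponding RecursiveMAS term

print(text_term / latent_term)  # = |V| / d_h, ~73x fewer multiply-accumulates
```

The ratio is exactly $|V| / d_h$, so the saving grows with vocabulary size and shrinks as models get wider.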

4. Learning to Recur as a Whole

The training pipeline (Figure 4) has two stages:

  1. Model-Level Inner-Loop Training: Warms up each agent's inner RecursiveLink $\mathcal{R}_{\text{in}}$ for latent thoughts generation. Given agent $A_i$ and ground-truth text $y$, the objective aligns the generated latent thoughts $H$ with the semantic distribution of $y$'s input embeddings: $\mathcal{L}_{\text{in}} = 1 - \cos\left(\mathcal{R}_{\text{in}}(H), \text{Emb}_{\theta_i}(y)\right)$. (5)
  2. System-Level Outer-Loop Training: Co-optimizes the entire system by training the outer RecursiveLinks $\mathcal{R}_{\text{out}}$ across $n$ recursion rounds. After the final round, a cross-entropy loss is computed on the decoded textual output: $\mathcal{L}_{\text{out}} = \text{CE}\left(\mathcal{S}^{(n)}(\mathcal{S}^{(n-1)}(\cdots \mathcal{S}^{(1)}(x))), y\right)$. (6) Gradients are backpropagated through the full recursive computation graph, enabling shared credit assignment across agents and rounds.
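The inner-loop objective in Eq. (5) is a simple cosine alignment. The sketch below mean-pools both sequences before comparing them, which is a simplifying assumption; the paper's exact pooling is not specified in this summary.

```python
import numpy as np

def inner_loss(H_linked, E_y):
    """Eq. (5): 1 - cosine similarity between the linked latent
    thoughts and the ground-truth text's input embeddings.
    Both (seq_len, d_h) arrays are mean-pooled here (an assumption)."""
    a, b = H_linked.mean(axis=0), E_y.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 8))
print(inner_loss(H, H))   # identical inputs: loss is ~0 (perfect alignment)
```

The loss lives in $[0, 2]$: 0 for perfectly aligned directions, 2 for opposite ones, which makes it a stable warm-up target before the outer loop's cross-entropy takes over.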

Theorem 4.1 (Gradient Stability): Under realistic assumptions, if token distributions are confident (entropy $\leq \epsilon$ with $\epsilon \ll 1$):

  • Text-based interaction suffers from gradient vanishing: $\left\|\frac{\partial \mathcal{R}_{\text{text}}(h)}{\partial h}\right\|_2 \leq O(\epsilon) \ll 1$.
  • RecursiveMAS maintains stable gradients: $\left\|\frac{\partial \mathcal{R}(h)}{\partial h}\right\|_2 \geq \Omega\left(1 - \sqrt{\frac{1}{d_h} \log \frac{1}{\delta}}\right)$.

This demonstrates the learning advantage of latent-space collaboration.
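The contrast behind Theorem 4.1 can be seen numerically: the Jacobian of a softmax at a confident (near one-hot) distribution has near-zero spectral norm, while a residual link's Jacobian stays close to the identity. The numbers below illustrate this intuition only; they are not the paper's proof, and the perturbation scale is an assumption.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jacobian(z):
    # d softmax(z) / d z = diag(p) - p p^T
    p = softmax(z)
    return np.diag(p) - np.outer(p, p)

# Confident (low-entropy) distribution: one logit dominates.
z = np.array([10.0, 0.0, 0.0, 0.0])
text_grad = np.linalg.norm(softmax_jacobian(z), 2)   # spectral norm, near 0

# Residual link Jacobian: identity plus a small perturbation, near 1.
rng = np.random.default_rng(0)
J_link = np.eye(4) + 0.02 * rng.standard_normal((4, 4))
latent_grad = np.linalg.norm(J_link, 2)

print(text_grad < 0.01 < latent_grad)  # True
```

When every hand-off passes through a saturated softmax, these near-zero factors multiply across agents and rounds; the residual path in the RecursiveLink avoids that compounding.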

Empirical Validation / Results

Setup: Evaluated on 9 benchmarks: MATH500, AIME2025, AIME2026 (Math); GPQA-Diamond, MedQA (Science/Medicine); LiveCodeBench, MBPP+ (Code); HotpotQA, Bamboogle (Search). RecursiveMAS was instantiated with diverse LLMs (Qwen, Llama, Gemma, Mistral) under four collaboration patterns (see agent configurations in Table 1). Compared against strong baselines: Single Agents (LoRA/Full-SFT), Mixture-of-Agents (MoA), TextGrad, LoopLM, and Recursive-TextMAS.

Table 1: Agent Configurations for Different Collaboration Patterns

| Collaboration Pattern | Role | Model Size & Version |
| --- | --- | --- |
| Sequential Style (Light) | Planner | Qwen3-1.7B |
| | Critic | Llama3.2-1B-Instruct |
| | Solver | Qwen2.5-Math-1.5B-Instruct |
| Sequential Style (Scaled) | Planner | Gemma3-4B-it |
| | Critic | Llama3.2-3B-Instruct |
| | Solver | Qwen3.5-4B |
| Mixture Style | Code Specialist | Qwen2.5-Coder-3B-Instruct |
| | Science Specialist | BioMistral-7B |
| | Math Specialist | DeepSeek-R1-Distill-Qwen-1.5B |
| | Summarizer | Qwen3.5-2B |
| Distillation Style | Learner | Qwen3.5-4B |
| | Expert | Qwen3.5-9B |
| Deliberation Style | Reflector | Qwen3.5-4B |
| | Tool-Caller | Qwen3.5-4B (with Tool-Integration) |

5.1 Scaling Performance via Recursion

Table 2 shows that RecursiveMAS's performance improves with recursion depth ($r = 1, 2, 3$) across accuracy, runtime, and token usage, consistently outperforming the text-based baseline (Recursive-TextMAS).

Key Results from Table 2 (Averaged Trends):

  • Accuracy: RecursiveMAS shows an average improvement over the text baseline of +8.1% at r=1, +19.6% at r=2, and +20.2% at r=3.
  • Efficiency: RecursiveMAS delivers 1.2× to 2.4× inference speedup and reduces token usage by 34.6% to 75.6% as recursion deepens.

The framework exhibits a clean scaling law: deeper training shifts the performance frontier upward, and deeper inference continues to improve systems trained with fewer rounds (Figure 1 Top).

5.2 Broader Comparison

Table 3: Comparison of RecursiveMAS with Other Methods (at r=3)

| Method | MATH500 | AIME2025 | AIME2026 | GPQA-D | LiveCodeBench | MedQA |
| --- | --- | --- | --- | --- | --- | --- |
| Single Agent (w/ LoRA) | 83.1 | 70.0 | 73.3 | 62.0 | 37.4 | 76.1 |
| Single Agent (w/ Full-SFT) | 83.2 | 73.3 | 76.7 | 62.8 | 38.6 | 77.0 |
| Mixture-of-Agents (MoA) | 79.8 | 60.0 | 63.3 | 47.6 | 27.0 | 57.5 |
| TextGrad | 84.9 | 73.3 | 76.7 | 62.5 | 39.8 | 77.2 |
| LoopLM | 84.6 | 66.7 | 63.3 | 48.1 | 24.9 | 56.4 |
| Recursive-TextMAS | 85.8 | 73.3 | 73.3 | 61.6 | 38.7 | 77.0 |
| RecursiveMAS | 88.0 | 86.7 | 86.7 | 66.2 | 42.9 | 79.3 |

RecursiveMAS delivers a consistent whole-system advantage, achieving an average performance improvement of 8.3% over the strongest baseline on each benchmark.

5.3 Generalization Across Collaboration Patterns

RecursiveMAS generalizes effectively across diverse MAS structures (Figure 1 Bottom):

  • Mixture-Style: Achieves +6.2% average improvement over the strongest domain specialist.
  • Deliberation-Style: Improves the tool-calling agent by +4.8%.
  • Distillation-Style: Improves the Learner by +8.0% while retaining a 1.5× end-to-end speed advantage over the Expert.

5.4 Efficiency Analyses

  • Inference Time Speedup (Figure 5): The efficiency gain of RecursiveMAS over the text baseline grows with recursion depth: 1.2× at r=1, 1.9× at r=2, 2.4× at r=3.
  • Token Reduction (Figure 6): RecursiveMAS reduces token usage by 34.6% at r=1, 65.5% at r=2, and 75.6% at r=3, as it avoids repeated intermediate text generation.

6. In-depth Analyses

  • RecursiveLink Design (Table 4): The proposed two-layer residual design performs best. The residual connection is crucial, providing stable training and stronger inference.

    Table 4: Efficacy of the RecursiveLink Design

    | RecursiveLink Design | MATH500 | GPQA-D | LiveCodeBench |
    | --- | --- | --- | --- |
    | 1-Layer | 84.4 | 63.2 | 40.1 |
    | Res+1-Layer | 86.7 | 65.3 | 41.4 |
    | 2-Layer | 85.6 | 64.5 | 40.5 |
    | Res+2-Layer (ours) | 88.0 | 66.2 | 42.9 |
  • Semantic Representations (Figure 7): PCA visualization shows that the semantic distribution of answers generated by RecursiveMAS progressively aligns with the ground-truth distribution as recursion depth increases from $r=1$ to $r=3$.
  • Optimal Latent Thoughts Length (Figure 8 & Table 9): Performance improves with the latent step length $m$ and stabilizes around $m=80$, indicating that effective collaboration requires only a modest latent budget.
  • Training Cost Analysis (Table 5): RecursiveMAS has the lowest GPU memory usage, the fewest trainable parameters, and the lowest estimated cost while achieving the highest accuracy, offering a superior cost-performance trade-off.

    Table 5: Cost Analysis on RecursiveMAS

    | Method | GPU Mem. | Trainable Param. | Cost | Avg. Acc. |
    | --- | --- | --- | --- | --- |
    | LoRA Training | 21.67 GB | 15.92M (0.37%) | $6.64 | 66.9% |
    | Full-SFT | 41.40 GB | 4.21B (100%) | $9.67 | 68.6% |
    | RecursiveMAS | 15.29 GB | 13.12M (0.31%) | $4.27 | 74.9% |

Theoretical and Practical Implications

Theoretical Implications: The paper provides formal grounding for latent-space multi-agent collaboration. The complexity analysis (Proposition 3.1) justifies the architectural efficiency, and the gradient stability theorem (Theorem 4.1) explains why latent-space recursion enables more effective whole-system optimization compared to text-based interaction, which suffers from gradient vanishing.

Practical Implications:

  1. Scalable Agent Collaboration: RecursiveMAS provides a principled, efficient framework for scaling MAS performance through recursion, a new axis beyond simply using larger models or more agents.
  2. Efficiency Gains: The significant reductions in inference time and token usage make advanced multi-agent reasoning more practical for real-world deployment.
  3. Flexible Deployment: The framework is agnostic to specific MAS architectures and model families, allowing it to be adapted to various collaboration patterns (sequential, mixture, distillation, deliberation) with complementary specialist agents.
  4. Cost-Effective Training: By freezing base LLM parameters and only training the lightweight RecursiveLink, the system achieves state-of-the-art performance at a fraction of the training cost of full fine-tuning.

Conclusion

RecursiveMAS introduces a novel paradigm for scaling multi-agent systems through system-level recursion in latent space. By connecting heterogeneous agents via lightweight RecursiveLink modules and optimizing them with an inner-outer loop algorithm, the framework enables efficient, iterative collaboration and refinement. Theoretical analyses confirm its advantages in runtime complexity and stable learning dynamics.