FastContext: Training Efficient Repository Explorer for Coding Agents

Summary (Overview)

  • FastContext is a dedicated exploration subagent that separates repository search from the main coding agent's solving process, performing parallel, read-only tool calls and returning compact file-path and line-range evidence.
  • Significant efficiency gains: Integrating FastContext into Mini-SWE-Agent reduces main-agent token consumption by up to 60% while improving end-to-end scores by up to 5.5 points across the SWE-bench Multilingual, SWE-bench Pro, and SWE-QA benchmarks.
  • Specialized trained explorers: A family of models (4B–30B parameters) is introduced, bootstrapped from reference-model trajectories via supervised fine-tuning (SFT) and refined with task-grounded reinforcement learning (RL) using GRPO.
  • Standalone exploration quality: FastContext trained checkpoints achieve 73.71 file-level F1 and 60.35 module-level F1 on SWE-bench Verified, outperforming prior localization methods.
  • Modular design: Repository exploration is treated as a first-class, trainable component, enabling smaller specialized models (e.g., 4B) to collaborate with stronger main agents (GPT-5.4, GLM-5.1, Kimi-K2.6) with clearer context boundaries.

Introduction and Theoretical Foundation

Background and Motivation

Coding agents (e.g., Claude Code, Codex, GitHub Copilot CLI, Cursor) have advanced automated software engineering, but repository exploration remains a major bottleneck. In existing systems, the same model that solves the task also explores the repository, causing:

  • High token consumption: Reading and searching account for 56.2% of all tool-use turns and 46.5% of main-agent total tokens on average (analysis of GPT-5.4 trajectories on SWE-bench Multilingual).
  • Long sequential prelude: Before the first edit, agents execute a median of six sequential exploration turns and 15.5 exploration tool calls. Unresolved trajectories involve more pre-edit exploration turns (8.34 vs. 6.67) than resolved ones.
  • Noisy context: Irrelevant snippets accumulated during navigation pollute the solver's history, leading to mistaken hypotheses and wasted later turns.

Theoretical Basis

The paper argues that repository exploration should be separated from solving and delegated to a dedicated, lightweight subagent. This follows recent observations from SWE-Pruner that coding-agent context can be pruned (Wang et al., 2026b). Prior work includes graph-/structure-guided localization (AutoCodeRover, LocAgent, CoSIL), retrieval/compression methods (RepoCoder, LongCodeZip), and RL-trained search agents (CodeScout, SWE-grep), but none provides a lightweight, trained explorer that coexists with a standard main agent.

Key insight: Exploration is structured (read-only, parallel tool calls) and expensive enough to motivate delegation, yet can be handled by a small, task-optimized model that returns only the evidence the solver truly needs.

Methodology

FastContext Subagent Architecture

FastContext is a delegation mechanism with a simple runtime harness and three language-agnostic tools:

  • READ: Read line-numbered file contents
  • GLOB: Discover file paths
  • GREP: Regex search over repository text (using ripgrep)
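
As a rough illustration of the three read-only tools listed above, the sketch below implements them with the Python standard library. The function names (`glob_tool`, `read_tool`, `grep_tool`), their signatures, and the `demo` helper are hypothetical, and GREP is approximated with Python's `re` module rather than ripgrep, so regex dialect and performance will differ from the real harness.

```python
import fnmatch
import os
import re
import tempfile


def glob_tool(root, pattern):
    """GLOB: repository-relative paths matching a shell-style pattern."""
    matches = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if fnmatch.fnmatchcase(rel, pattern):
                matches.append(rel)
    return sorted(matches)


def read_tool(root, path, start=1, end=None):
    """READ: line-numbered contents of lines start..end (1-indexed, inclusive)."""
    with open(os.path.join(root, path), encoding="utf-8", errors="replace") as f:
        lines = f.read().splitlines()
    end = len(lines) if end is None else min(end, len(lines))
    return [f"{n}: {lines[n - 1]}" for n in range(start, end + 1)]


def grep_tool(root, regex, pattern="*"):
    """GREP: (path, line_number, line) for every line matching the regex.

    Assumption: uses Python's `re` as a stand-in for ripgrep.
    """
    compiled = re.compile(regex)
    hits = []
    for rel in glob_tool(root, pattern):
        with open(os.path.join(root, rel), encoding="utf-8", errors="replace") as f:
            for number, line in enumerate(f.read().splitlines(), start=1):
                if compiled.search(line):
                    hits.append((rel, number, line))
    return hits


def demo():
    """Build a tiny throwaway repository and run each tool once."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "src"))
        with open(os.path.join(root, "src", "router.py"), "w", encoding="utf-8") as f:
            f.write("import os\n\nclass Router:\n    pass\n")
        return (
            glob_tool(root, "*.py"),
            read_tool(root, "src/router.py", 3, 4),
            grep_tool(root, r"class \w+"),
        )
```

None of the three functions writes to the repository, which is the property that makes it safe to run several of them at once.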

The subagent operates in a loop: at each turn, it issues parallel tool calls (multiple calls executed concurrently) or outputs a final evidence list in a compact format:

<final_answer>
/src/router.py:42-58 (Router definition)
/tests/test_router.py:101-119
</final_answer>

This output is directly consumable by the main agent as focused context, avoiding the long exploratory trajectory.
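
To show how a main agent might consume this evidence list, here is a minimal parsing sketch. The `Citation` type and `parse_final_answer` function are hypothetical names; the grammar is inferred only from the two example lines above (path, colon, start-end range, optional parenthesized note), so the real format may allow more.

```python
import re
from typing import List, NamedTuple, Optional


class Citation(NamedTuple):
    path: str
    start: int
    end: int
    note: Optional[str]


_BLOCK = re.compile(r"<final_answer>\s*(.*?)\s*</final_answer>", re.DOTALL)
_LINE = re.compile(
    r"^(?P<path>\S+):(?P<start>\d+)-(?P<end>\d+)(?:\s+\((?P<note>.*)\))?$"
)


def parse_final_answer(text: str) -> List[Citation]:
    """Extract (path, start, end, note) citations from a <final_answer> block."""
    block = _BLOCK.search(text)
    if block is None:
        return []
    citations = []
    for raw in block.group(1).splitlines():
        match = _LINE.match(raw.strip())
        if match is None:
            continue  # skip malformed lines rather than guessing their meaning
        start, end = int(match["start"]), int(match["end"])
        if start <= end:
            citations.append(Citation(match["path"], start, end, match["note"]))
    return citations


EXAMPLE = """<final_answer>
/src/router.py:42-58 (Router definition)
/tests/test_router.py:101-119
</final_answer>"""
```

Calling `parse_final_answer(EXAMPLE)` yields two citations, the first carrying the note `Router definition` and the second with no note.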

Policy Initialization with Supervised Fine-Tuning (SFT)

Training data (2,954 filtered examples) is constructed from Sonnet 4.6 exploration traces with three sources:

  1. parallel_toolcalls: Broad first-turn search – the reference model issues nonredundant parallel tool calls covering complementary signals.
  2. multiturn_traj: Multi-turn evidence gathering – full reference-model trajectories preserved.
  3. linerange: Precise citation generation – model produces only a narrow <final_answer> block from retrieved contents.

The SFT loss is an assistant-token-only objective:

$$\mathcal{L}_{\text{SFT}} = -\frac{1}{|\mathcal{D}_{\text{sft}}|} \sum_{(x,y) \in \mathcal{D}_{\text{sft}}} \sum_{t=1}^{|y|} m_t \log p_\theta(y_t \mid x, y_{<t})$$

where $m_t$ masks out non-assistant tokens. The model is fine-tuned with this objective from an initial checkpoint.
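
The masked objective can be illustrated numerically with a small sketch. It assumes the per-token log-probabilities are already available and that the mask is binary (1 for assistant tokens, 0 otherwise); the function name `sft_loss` and the toy numbers are illustrative only. As in the formula, the sum is normalized by the number of examples, not by the number of tokens.

```python
import math


def sft_loss(batch):
    """Assistant-token-only negative log-likelihood.

    batch: list of examples; each example is a list of (log_prob, mask)
    pairs, one per target token, where mask is 1 for assistant tokens.
    """
    total = 0.0
    for example in batch:
        total += sum(mask * log_prob for log_prob, mask in example)
    return -total / len(batch)


# Two toy examples. Tokens with mask 0 (e.g., tool output) do not contribute.
toy_batch = [
    [(math.log(0.5), 0), (math.log(0.5), 1), (math.log(0.25), 1)],
    [(math.log(0.1), 0), (math.log(0.5), 1)],
]
loss = sft_loss(toy_batch)
```

Here the masked log-probabilities sum to log(0.5 · 0.25 · 0.5) = log(1/16), so the loss is log(16) / 2 = log 4.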

Policy Refinement with Reinforcement Learning (RL)

SFT imitation does not directly optimize whether final citations cover the code locations needed to solve the issue. Therefore, RL is used with a 400-prompt set from issue-resolution tasks with reference patches.

For each instance, the reference patch is parsed into target file and line sets $\mathcal{G}_f$ and $\mathcal{G}_l$, respectively. The model rolls out as the actual FastContext subagent, interacting with tools and finally producing a <final_answer> block. The predicted sets $\mathcal{P}_f$ and $\mathcal{P}_l$ are parsed from the model's output.
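
One way to derive the target sets from a reference patch is sketched below for unified diffs. The summary does not say which lines count as targets, so this sketch assumes the old-side range of each hunk (the lines visible in the pre-patch repository) and ignores newly created files; `patch_targets` and the sample patch are hypothetical.

```python
import re

_HUNK = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def patch_targets(patch):
    """Return (files, lines): old-side paths and (path, line) pairs in a unified diff."""
    files, lines = set(), set()
    current = None            # old-side path of the file being parsed
    old_left = new_left = 0   # body lines still expected in the current hunk
    for row in patch.splitlines():
        if old_left > 0 or new_left > 0:
            if row.startswith("\\"):
                continue          # "\ No newline at end of file" marker
            if not row.startswith("+"):
                old_left -= 1     # removed or context line consumes the old side
            if not row.startswith("-"):
                new_left -= 1     # added or context line consumes the new side
            continue
        if row.startswith("--- "):
            path = row[4:].split("\t")[0].strip()
            if path == "/dev/null":
                current = None    # assumption: skip files created by the patch
            else:
                current = path[2:] if path.startswith("a/") else path
                files.add(current)
            continue
        hunk = _HUNK.match(row)
        if hunk:
            start = int(hunk.group(1))
            old_left = int(hunk.group(2) or 1)   # omitted count means 1
            new_left = int(hunk.group(4) or 1)
            if current is not None:
                lines.update((current, n) for n in range(start, start + old_left))
    return files, lines


SAMPLE_PATCH = "\n".join([
    "diff --git a/src/router.py b/src/router.py",
    "--- a/src/router.py",
    "+++ b/src/router.py",
    "@@ -42,3 +42,4 @@ class Router:",
    "     def route(self):",
    "-        return None",
    "+        target = self.lookup()",
    "+        return target",
    "     # end",
])
target_files, target_lines = patch_targets(SAMPLE_PATCH)
```

Counting hunk body lines, instead of testing prefixes alone, keeps a removed source line that happens to begin with `-- ` from being mistaken for a file header.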

The reward function combines task outcome, parallel bonus, and format penalty:

$$R = \underbrace{F_1(\mathcal{P}_f, \mathcal{G}_f) + F_1(\mathcal{P}_l, \mathcal{G}_l)}_{\text{task outcome}} + \underbrace{r_{\text{parallel}}}_{\text{parallel function call}} - \underbrace{r_{\text{format}}}_{\text{penalty}}$$

  • Task outcome: Sum of file-level and line-level F1 after path normalization (zero for empty sets).
  • $r_{\text{parallel}}$: Small bonus for bounded multi-call exploration.
  • $r_{\text{format}}$: Penalty for empty, overly long, malformed, or excessive-fan-out outputs.
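
The reward terms above can be combined as in the following sketch. The set-based F1 follows the standard definition, with zero returned for empty sets as stated; the magnitudes of the parallel bonus and format penalty are not given in this summary, so the value 0.1 used here is a placeholder, and `f1` and `reward` are hypothetical names.

```python
def f1(pred, gold):
    """Set-level F1; zero when either set is empty or there is no overlap."""
    if not pred or not gold:
        return 0.0
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


def reward(pred_files, gold_files, pred_lines, gold_lines,
           parallel_bonus=0.0, format_penalty=0.0):
    """R = F1(files) + F1(lines) + r_parallel - r_format."""
    task = f1(pred_files, gold_files) + f1(pred_lines, gold_lines)
    return task + parallel_bonus - format_penalty


# Toy rollout: one extra file cited, and a 10-line range around a 3-line target.
gold_files = {"src/router.py"}
gold_lines = {("src/router.py", n) for n in range(42, 45)}
pred_files = {"src/router.py", "src/util.py"}
pred_lines = {("src/router.py", n) for n in range(40, 50)}

# parallel_bonus=0.1 is a placeholder value, not taken from the paper.
r = reward(pred_files, gold_files, pred_lines, gold_lines, parallel_bonus=0.1)
```

In this toy case the file-level F1 is 2/3 (precision 0.5, recall 1) and the line-level F1 is 6/13 (precision 0.3, recall 1), so citing a wide range costs reward even though every target line is covered.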

The model is optimized with GRPO (Shao et al., 2024), sampling multiple trajectories per prompt from the SFT checkpoint. This stage aligns the explorer with the practical goal of returning a compact citation set covering the code regions most likely to matter.
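
For readers unfamiliar with GRPO, its core idea is to score each sampled trajectory relative to the other trajectories drawn for the same prompt, instead of using a learned value function. The sketch below shows only that group-relative normalization; the use of the population standard deviation, the `eps` constant, and the function name are assumptions, and the clipped policy-gradient update and KL term of the full algorithm are omitted.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own prompt group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards of three rollouts sampled for one prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0])

# If every rollout earns the same reward, no rollout is preferred.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```

Rollouts that score above their group mean receive positive advantages and are reinforced, while a group with identical rewards contributes no learning signal.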

Model Variants

  • FC-30B-SFT: 30B parameter model trained only with SFT (scaling reference)
  • FC-4B-SFT: 4B model trained with SFT (compact deployment target)
  • FC-4B-RL: 4B model additionally refined with RL (test of task-grounded optimization)

Empirical Validation / Results

End-to-End Performance (Table 1)

Three benchmarks are used: SWE-bench Multilingual (300 instances), SWE-bench Pro (a random 200-instance subset), and SWE-QA (repository-level QA). Main agents: GPT-5.4, GLM-5.1, and Kimi-K2.6. Each main agent is evaluated with direct solving (w/o Explore), same-model exploration, and the FastContext variants.

Table 1: End-to-end performance and efficiency across three benchmarks.

| Main Agent | Subagent | SWE-bench Multilingual (Score / Tokens / Turns) | SWE-bench Pro (Score / Tokens / Turns) | SWE-QA (Score / Tokens / Turns) |
|---|---|---|---|---|
| GPT-5.4 | w/o Explore | 71.7 / 457k / 17.7 | 46.0 / 818k / 20.7 | 81.3 / 418k / 15.7 |
| GPT-5.4 | Same model | 73.3 / 379k / 18.3 | 51.5 / 703k / 23.7 | 81.4 / 166k / 9.8 |
| GPT-5.4 | FC-30B-SFT | 75.0 / 356k / 18.2 | 49.0 / 688k / 23.5 | 82.0 / 206k / 11.2 |
| GPT-5.4 | FC-4B-SFT | 73.3 / 364k / 18.3 | 47.0 / 689k / 23.2 | 81.9 / 213k / 11.6 |
| GPT-5.4 | FC-4B-RL | 74.7 / 338k / 18.3 | 48.5 / 701k / 23.5 | 82.0 / 210k / 11.4 |
| GLM-5.1 | w/o Explore | 72.3 / 2514k / 73.9 | 17.5 / 2692k / 67.4 | 72.7 / 401k / 27.7 |
| GLM-5.1 | Same model | 73.3 / 1994k / 55.9 | 18.0 / 2356k / 63.9 | 73.4 / 249k / 20.4 |
| GLM-5.1 | FC-30B-SFT | 73.7 / 1797k / 55.0 | 20.0 / 2370k / 64.2 | 73.3 / 292k / 23.0 |
| GLM-5.1 | FC-4B-SFT | 73.3 / 1919k / 56.9 | 18.0 / 2279k / 64.0 | 73.4 / 306k / 23.8 |
| GLM-5.1 | FC-4B-RL | 73.7 / 1971k / 56.6 | 22.5 / 2210k / 64.3 | 73.5 / 302k / 23.2 |
| Kimi-K2.6 | w/o Explore | 76.3 / 1553k / 55.7 | 31.0 / 2383k / 68.0 | 71.6 / 510k / 32.5 |
| Kimi-K2.6 | Same model | 76.3 / 1367k / 50.5 | 32.0 / 2060k / 58.0 | 73.0 / 361k / 24.4 |
| Kimi-K2.6 | FC-30B-SFT | 76.7 / 1360k / 49.9 | 33.0 / 2150k / 58.8 | 72.8 / 373k / 26.4 |
| Kimi-K2.6 | FC-4B-SFT | 75.3 / 1306k / 49.3 | 32.5 / 2159k / 61.6 | 72.6 / 402k / 28.0 |
| Kimi-K2.6 | FC-4B-RL | 78.3 / 1384k / 52.1 | 33.5 / 2158k / 61.1 | 72.6 / 378k / 27.5 |

Score deltas and token reductions relative to w/o Explore. Bold = best, underline = second best per benchmark.

Key observations:

  • FastContext improves accuracy over direct solving for all main agents and benchmarks. Largest gain: GPT-5.4 on SWE-bench Pro (+5.5 points).
  • Token savings are substantial: up to 60.3% for GPT-5.4 on SWE-QA; 14–26% on issue-resolution tasks.
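  • FC-4B-RL often matches or beats FC-30B-SFT in score (e.g., GLM-5.1 SWE-bench Pro: 22.5 vs. 20.0; Kimi-K2.6 Multilingual: 78.3 vs. 76.7), and in some settings it also uses fewer tokens (e.g., GLM-5.1 SWE-bench Pro: 2210k vs. 2370k).
  • On the compact 4B model, RL matches or improves on the SFT score in all nine settings; the only tie is Kimi-K2.6 on SWE-QA (72.6 vs. 72.6).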
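
The headline figures in these observations can be reproduced directly from Table 1. The short calculation below uses the GPT-5.4 rows; the helper name `token_reduction` is illustrative, and token counts are in thousands as reported.

```python
def token_reduction(baseline_k, variant_k):
    """Percentage of main-agent tokens saved relative to direct solving."""
    return 100.0 * (baseline_k - variant_k) / baseline_k


# GPT-5.4 on SWE-QA: w/o Explore (418k) vs. same-model exploration (166k).
swe_qa_saving = round(token_reduction(418, 166), 1)

# GPT-5.4 on SWE-bench Multilingual: w/o Explore (457k) vs. FC-4B-RL (338k).
multilingual_saving = round(token_reduction(457, 338), 1)

# GPT-5.4 on SWE-bench Pro: w/o Explore (46.0) vs. same-model exploration (51.5).
pro_score_gain = round(51.5 - 46.0, 1)
```

These evaluate to a 60.3% token saving on SWE-QA, a 26.0% saving on SWE-bench Multilingual, and a gain of 5.5 points on SWE-bench Pro.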

Standalone Exploration Quality (Table 2)

On SWE-bench Verified, patch-derived reference locations are used to compute F1 at file, module, and function granularity. FastContext is compared to baselines: RepoSearcher, LocAgent, Agentless, OrcaLoca, CoSIL, OpenHands-Bash, CodeScout.

Table 2: Standalone exploration quality on SWE-bench Verified.

| Scaffold | LLM | File-level (F1 / Prec. / Rec.) | Module-level (F1 / Prec. / Rec.) | Function-level (F1 / Prec. / Rec.) |
|---|---|---|---|---|
| RepoSearcher | Qwen3-4B | 55.83 / 59.32 / 54.48 | 38.66 / 35.39 / 53.53 | 20.76 / 15.26 / 46.71 |
| RepoSearcher | Qwen3-30B | 67.14 / 70.94 / 65.61 | 25.18 / 22.24 / 35.93 | 15.40 / 11.82 / 33.11 |
| LocAgent | Qwen3-4B | 61.78 / 65.13 / 60.42 | 43.88 / 44.80 / 47.27 | 28.04 / 24.66 / 41.73 |
| ... (other baselines) | ... | ... | ... | ... |
| FastContext | GPT-5.4 | 72.34 / 76.55 / 70.69 | 55.16 / 51.63 / 71.76 | 35.91 / 29.15 / 70.81 |
| FastContext | GLM-5.1 | 73.88 / 77.96 / 72.29 | 59.31 / 56.28 / 73.51 | 43.50 / 37.46 / 72.02 |
| FastContext | Kimi-K2.6 | 71.34 / 75.15 / 69.86 | 59.34 / 56.85 / 70.80 | 43.87 / 38.68 / 68.22 |
| FastContext | Qwen3-4B | 62.57 / 66.13 / 61.19 | 51.25 / 53.05 / 53.79 | 37.80 / 37.37 / 47.22 |
| FastContext | Qwen3-30B | 65.29 / 69.14 / 63.78 | 57.04 / 56.48 / 64.04 | 42.90 / 39.98 / 60.62 |
| FastContext | FC-30B-SFT | 73.71 / 77.76 / 72.13 | 60.35 / 58.43 / 71.17 | 40.74 / 35.32 / 67.97 |
| FastContext | FC-4B-SFT | 70.55 / 74.75 / 68.85 | 55.26 / 53.00 / 69.25 | 37.48 / 32.83 / 66.82 |
| FastContext | FC-4B-RL | 71.48 / 75.35 / 69.92 | 56.26 / 53.80 / 70.49 | 38.45 / 32.79 / 68.05 |

Bold = best per column excluding frontier-model rows; underline = second best.

Key observations:

  • FastContext trained checkpoints form the strongest group at file and module granularity, reaching 73.71 file-level F1 and 60.35 module-level F1.
  • SFT substantially improves the 4B explorer (file F1: 62.57 → 70.55; module F1: 51.25 → 55.26).
  • RL further improves the 4B model (file F1: 71.48; module F1: 56.26), mainly via higher recall.
  • The advantage is clearest at module and function level, indicating that FastContext narrows evidence toward code regions most likely to matter.

Ablation and Analysis

  • Figure 4 breaks down main-agent total tokens by action category (File Reading, Code Search, File Editing, Testing, Other, FastContext overhead). Adding FastContext drastically reduces File Reading and Code Search tokens while adding only small FastContext invocation overhead.
  • Figure 5 shows per-instance token distributions shift leftward (lower usage) for all FastContext variants.
  • Same-model exploration (frontier model also doing delegation) is usually inferior to trained FastContext in both score and token efficiency.

Theoretical and Practical Implications

  • Separation of concerns: The paper demonstrates that repository exploration can be decoupled from the solving agent and handled effectively by a much smaller, trained subagent. This modular view contrasts with monolithic agent trajectories where exploration and solving are interleaved.
  • Efficiency without sacrifice: Reducing main-agent token consumption by up to 60% while improving accuracy challenges the assumption that better results require more context. Focused, grounded evidence is more valuable than exhaustive exploration.
  • **Scal
