Summary of "OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data"

Summary (Overview)

  • OpenSeeker is the first fully open-source search agent (model and data) that achieves frontier-level performance through strategic data synthesis.
  • Core Innovations: Two technical methods: (1) Fact-grounded scalable controllable QA synthesis that reverse-engineers web graphs to generate complex multi-hop reasoning tasks, and (2) Denoised trajectory synthesis that uses retrospective summarization to produce high-quality actions.
  • Performance: Trained on only 11.7k synthesized samples via simple SFT, OpenSeeker achieves state-of-the-art results on multiple benchmarks (BrowseComp: 29.5%, BrowseComp-ZH: 48.4%, xbench-DeepSearch: 74.0%, WideSearch: 59.4%), surpassing industrial competitors like Tongyi DeepResearch.
  • Democratization: The work aims to break the corporate "data moat" by fully open-sourcing the complete training dataset and model weights to foster transparent research.
  • Academic Achievement: This represents the first work by a purely academic team to achieve SOTA performance on frontier search benchmarks while fully open-sourcing training data.

Introduction and Theoretical Foundation

The paper addresses a critical problem in AI research: the development of high-performance search agents has been dominated by industrial giants due to a lack of transparent, high-quality training data. This data scarcity has hindered progress in the broader research community. While corporate entities have produced capable proprietary agents (OpenAI Deep Research, Kimi-Researcher, Gemini Deep Research) and some have released open-weight models (Kimi K2 series, GLM, MiniMax M2), none have disclosed their training data, creating a "data moat."

The authors argue that to train effective deep search agents, two pivotal challenges must be addressed:

  1. High-difficulty QA: Only sufficiently complex queries compel the system to engage in rigorous multi-turn interaction cycles ("Reasoning → Tool Call → Tool Response"), generating long-horizon trajectories.
  2. High-quality trajectories: Synthesis of solution paths must rely on stable methods to ensure training signals represent "correct and generalizable" strategies.
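The interaction cycle in challenge 1 can be sketched as a minimal agent loop. The `model` and `tools` interfaces below are illustrative assumptions for exposition, not the paper's actual implementation:

```python
def run_agent(question, model, tools, max_turns=8):
    """Drive a "Reasoning -> Tool Call -> Tool Response" loop until the
    model emits a final answer or the turn budget is exhausted."""
    history = [("question", question)]
    for _ in range(max_turns):
        reasoning, action = model(history)                     # r_t, a_t
        history.append(("reasoning", reasoning))
        if action["name"] == "final_answer":                   # terminal action
            return action["args"]["answer"], history
        observation = tools[action["name"]](**action["args"])  # o_t
        history.append(("observation", observation))
    return None, history  # no answer within the turn budget
```

A harder question forces more iterations of this loop, which is exactly what produces the long-horizon trajectories the authors target.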

OpenSeeker is introduced as a solution to democratize frontier search intelligence by providing the complete synthesis pipeline and high-fidelity training data.

Methodology

The methodology consists of two core technical innovations:

1. Fact-Grounded Scalable Controllable QA Synthesis

This framework constructs question-answer pairs $(q, y)$ directly from the web graph $G = (V, E)$, where $V$ denotes web pages and $E$ denotes hyperlinks. The pipeline operates in two phases:

Generative Construction:

  • Graph Expansion: Samples a seed node $v_{seed} \sim V$ and traverses outgoing edges to gather $k$ connected nodes, forming a local dependency subgraph $G_{sub} = \{v_{seed}\} \cup \{v_i \mid (v_{seed}, v_i) \in E\}_k$.
  • Entity Extraction: Identifies the central theme $y_{theme}$ of $v_{seed}$ and distills key entities from across $G_{sub}$ into a condensed Entity Subgraph $G_{entity}$, removing textual noise.
  • Question Generation: Uses a generator $P_{gen}$ to synthesize an initial question $q_{init}$ conditioned on $G_{entity}$, imposing a hard structural constraint that deriving $y_{theme}$ must necessitate traversing multiple edges within $G_{entity}$.
  • Entity Obfuscation: Applies an obfuscation operator $\Phi$ to entity nodes $e$ to map them to vague references $\tilde{e} = \Phi(e)$, yielding a Fuzzy Entity Subgraph $\tilde{G}_{entity}$.
  • Question Obfuscation: Generates the final question $\tilde{q}$ by rewriting $q_{init}$ to incorporate ambiguous descriptions from $\tilde{G}_{entity}$ while preserving the reasoning logic.
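Under stated assumptions (the web graph given as an edge list, the obfuscation operator Φ reduced to a simple lookup table), the expansion and obfuscation steps might look like:

```python
import random

def expand_subgraph(edges, seed, k, rng=None):
    """Graph Expansion: the seed node plus up to k sampled out-neighbors,
    forming a local dependency subgraph G_sub."""
    rng = rng or random.Random()
    neighbors = [v for (u, v) in edges if u == seed]
    return {seed, *rng.sample(neighbors, min(k, len(neighbors)))}

def obfuscate_question(question, entity_map):
    """Entity/Question Obfuscation: replace named entities with vague
    references; a lookup table stands in for the operator Phi here."""
    for entity, vague in entity_map.items():
        question = question.replace(entity, vague)
    return question
```

Note how `k` bounds the subgraph size and thus the number of hops a question can span, which is the controllability knob the paper emphasizes.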

Dual-Criteria Verification via Rejection Sampling: Two indicator functions filter synthesized pairs $(\tilde{q}, y)$:

  • Criterion 1 (Difficulty): $I[\pi_{base}(\tilde{q}) \neq y]$, where $\pi_{base}$ is a strong foundation model in a closed-book setting. If the model answers correctly, the question is discarded as too easy.
  • Criterion 2 (Solvability): $I[\pi_{base}(\tilde{q} \mid G_{entity}) = y]$, where the model is provided $G_{entity}$ as context. If the model still fails, the sample is rejected as unsolvable.
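A minimal sketch of the dual-criteria rejection sampler, assuming `base_model(question, context)` returns the model's answer string (a hypothetical interface, not the paper's code):

```python
def passes_filters(question, answer, base_model, entity_subgraph):
    """Keep a synthesized pair only if it is hard closed-book (Criterion 1)
    yet solvable once the evidence subgraph is supplied (Criterion 2)."""
    # Criterion 1 (difficulty): discard questions the base model already
    # answers correctly without any context.
    if base_model(question, context=None) == answer:
        return False
    # Criterion 2 (solvability): reject questions the model still cannot
    # answer even when given G_entity as context.
    if base_model(question, context=entity_subgraph) != answer:
        return False
    return True
```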

This paradigm offers three strengths:

  1. Factual grounding: Anchored in real web topology, mitigating hallucination.
  2. Scalability: Leverages TB-scale web archives as an inexhaustible source.
  3. Controllability: Difficulty calibrated by tuning the subgraph size $k$.

2. Denoised Trajectory Synthesis

This method synthesizes high-quality search trajectories $\tau = [q, (r_1, a_1, o_1), \ldots, (r_T, a_T, o_T), y]$, where $r_t$ is the reasoning, $a_t$ is the action (tool call), and $o_t$ is the observation (tool response).
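As a concrete data structure, such a trajectory might be represented as follows (the field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Trajectory:
    """tau = [q, (r_1, a_1, o_1), ..., (r_T, a_T, o_T), y]."""
    question: str                                                     # q
    steps: List[Tuple[str, str, str]] = field(default_factory=list)   # (r_t, a_t, o_t)
    answer: str = ""                                                  # y

    @property
    def horizon(self) -> int:
        """Number of tool-use turns T."""
        return len(self.steps)
```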

Synthesis via Dynamic Context Denoising: The context construction follows a "Summarized History + Raw Recent" protocol:

$$H_t = \{q, (r_1, a_1, s_1), \ldots, (r_{t-2}, a_{t-2}, s_{t-2})\} \cup \{(r_{t-1}, a_{t-1}, o_{t-1})\}$$

where $s_i = \mathrm{Summarize}(o_i \mid \text{context})$ is a compressed summary of the raw observation.

The mechanism operates in a two-phase cycle:

  1. Decision phase: The agent generates $(r_t, a_t)$ based on $H_t$, which includes the full raw observation $o_{t-1}$.
  2. Compression phase: After step $t$, the system retrospectively compresses $o_{t-1}$ into $s_{t-1}$, which replaces $o_{t-1}$ in the long-term history for $H_{t+1}$.
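The "Summarized History + Raw Recent" construction can be sketched as follows; `steps` holds the raw turns and `summaries` the retrospectively compressed observations (both names are assumptions):

```python
def build_context(question, steps, summaries):
    """Assemble H_t: summarized long-term history plus the raw latest turn.

    steps:     [(r_1, a_1, o_1), ..., (r_{t-1}, a_{t-1}, o_{t-1})]
    summaries: [s_1, ..., s_{t-2}], one per turn except the most recent
    """
    context = [question]
    for (r, a, _), s in zip(steps[:-1], summaries):
        context.append((r, a, s))   # summary replaces the raw observation
    if steps:
        context.append(steps[-1])   # most recent turn kept raw
    return context
```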

Asymmetric Context Training for Robust Denoising:

  • Synthesis data (Teacher): Uses the clean, denoised context $H_t$ containing summaries.
  • Training data (Student): Uses the noisy, raw context $H^{train}_t = \{q, (r_1, a_1, o_1), \ldots, (r_{t-1}, a_{t-1}, o_{t-1})\}$. The student is supervised to predict the optimal $(r_t, a_t)$ given this noisy context, forcing it to learn denoising capabilities.
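The asymmetry can be made concrete as a training-example builder: the student's input keeps every raw observation, while the label $(r_t, a_t)$ was produced by the teacher under the denoised context (a sketch; the record layout is an assumption):

```python
def make_sft_example(question, raw_steps, target_reasoning, target_action):
    """Pair the noisy student context H_t^train with the teacher's
    clean-context label, so the student must learn to denoise raw
    observations on its own."""
    return {
        "input": [question, *raw_steps],      # all observations left raw
        "target": (target_reasoning, target_action),
    }
```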

Empirical Validation / Results

Experimental Setup:

  • Model: OpenSeeker initialized from Qwen3-30B-A3B-Thinking-2507 (30B total parameters, 3B activated).
  • Training: Single SFT run on 11.7k samples (10.3k English, 1.4k Chinese) without heuristic filtering or hyperparameter optimization.
  • Benchmarks: BrowseComp, BrowseComp-ZH, xbench-DeepSearch, WideSearch.

Key Results:

Table 1: Comparisons among OpenSeeker and other search agents

| Model Name | # Samples | # OS Samples | Training | Academic | BrowseComp | BrowseComp-ZH | xbench | WideSearch |
|---|---|---|---|---|---|---|---|---|
| OpenSeeker-v1-30B-SFT | 11.7k | 11.7k | SFT | ✓ | 29.5% | 48.4% | 74.0% | 59.4% |
| DeepDive-32B | 4.1k | 4.1k | SFT+RL | × | 15.3% | 29.7% | 51.8% | - |
| MiroThinker-32B-v0.1 | 147k | 147k | SFT | × | 10.6% | 13.8% | - | - |
| WebSailor-V2-30B-SFT | ? | 0 | SFT | × | 24.4% | 28.3% | 61.7% | - |
| WebLeaper-30B | 15k | 0 | SFT | × | 27.7% | - | 66.0% | 44.1% |
| Tongyi DeepResearch | ? | 0 | CPT+SFT+RL | × | 43.4% | 46.7% | 75.0% | - |
| OpenAI-o3 | ? | 0 | ? | × | 49.1% | 68.7% | - | 60.0% |

Table 2: Performance comparison of different models trained via SFT

| Data | # Samples | # OS Samples | Academic | BrowseComp | BrowseComp-ZH | xbench | WideSearch-EN |
|---|---|---|---|---|---|---|---|
| OpenSeeker-v1-30B-SFT | 11.7k | 11.7k | ✓ | 29.5% | 48.4% | 74.0% | 59.4% |
| DeepDive-32B | 4.1k | 4.1k | × | 9.5% | 23.0% | 48.5% | - |
| MiroThinker-32B-v0.1 | 147k | 147k | × | 10.6% | 13.8% | - | - |
| WebSailor-V2-30B | ? | 0 | × | 24.4% | 28.3% | 61.7% | - |
| WebLeaper-30B | 15k | 0 | × | 27.7% | - | 66.0% | 44.1% |

Table 3: Performance comparison under comparable data volumes

| Data | # Samples | # OS Samples | Developer | BrowseComp | xbench | WideSearch-EN |
|---|---|---|---|---|---|---|
| OpenSeeker-v1-Data-11.7k | 11.7k | 11.7k | Academic | 29.50% | 74.00% | 59.40% |
| WebSailor-V2-10k | 10k | 0 | Tongyi | 24.50% | 62.67% | 38.91% |
| WebSailor-V2-5k + WebLeaper-Basic-5k | 10k | 0 | Tongyi | 20.67% | 58.33% | 32.26% |
| WebSailor-V2-5k + WebLeaper-Union-5k | 10k | 0 | Tongyi | 27.50% | 62.33% | 41.70% |
| WebSailor-V2-5k + WebLeaper-Reverse-Union-10k | 15k | 0 | Tongyi | 27.67% | 66.00% | 44.07% |

Key Findings:

  1. Outperforming resource-intensive baselines: OpenSeeker achieves 48.4% on BrowseComp-ZH, surpassing Tongyi DeepResearch (46.7%) which uses CPT+SFT+RL.
  2. Superior performance under identical SFT setup: Among ∼30B models trained only with SFT, OpenSeeker outperforms WebSailor-V2-SFT by roughly 20 percentage points on BrowseComp-ZH (48.4% vs. 28.3%).
  3. Superior performance with comparable data volume: Using a comparable number of samples (11.7k vs. 10k-15k), OpenSeeker outperforms the best baseline combinations by 8 percentage points on xbench and 15 percentage points on WideSearch.
  4. Data difficulty analysis: The synthesized Chinese data averages 46.35 tool calls and 76.1k tokens per trajectory, significantly more complex than BrowseComp-ZH (26.98 tool calls, 15.1k tokens).

Theoretical and Practical Implications

  • Breaking Corporate Data Monopoly: OpenSeeker dismantles the "data moat" held by industrial corporations, providing the academic community with resources to replicate industrial-grade capabilities.
  • Data Quality over Quantity: The results demonstrate that high-fidelity, complex data (even with limited volume) is more effective than large volumes of lower-quality data.
  • Democratization of Frontier AI: By fully open-sourcing the synthesis pipeline, training dataset, and model weights, this work fosters a more inclusive, transparent, and collaborative ecosystem for search agent research.
  • Methodological Advancements: The fact-grounded QA synthesis and denoised trajectory synthesis techniques provide scalable, controllable frameworks for generating high-quality training data for complex reasoning tasks.

Conclusion

OpenSeeker represents a significant breakthrough in democratizing frontier search agent development. Through two innovative data synthesis methods, it produces high-fidelity training data that enables state-of-the-art performance with only 11.7k samples and simple SFT. The work:

  • Achieves competitive results against industrial models trained with extensive resources.
  • Demonstrates that data quality is paramount over quantity.
  • Fully open-sources all components (data, model, pipeline) to break down barriers.
  • Provides a foundation for future research to optimize data distributions, implement quality filtering, and generate even more complex data.

The authors emphasize that their current results represent a lower bound due to resource constraints (single training run, no hyperparameter optimization), leaving substantial room for future improvement. OpenSeeker aims to catalyze a more open, collaborative development of autonomous agents.