Large Language Models (LLMs) excel at natural language understanding and generation, yet their reliance on static pre-training corpora leaves them prone to outdated knowledge, hallucinations, and limited adaptability. Retrieval-Augmented Generation (RAG) mitigates these issues by grounding model outputs in external retrieval, but conventional RAG remains constrained by a fixed retrieve-then-generate routine and struggles with multi-step reasoning and tool calls. Agentic RAG addresses these limitations by enabling LLM agents to actively decompose tasks, issue exploratory queries, and refine evidence through iterative retrieval. Despite growing interest, the development of Agentic RAG is impeded by data scarcity: unlike traditional RAG, it demands challenging tasks that call for planning, retrieval, and multiple reasoning decisions, together with the corresponding rich, interactive agent trajectories. This survey presents the first data-centric overview of Agentic RAG, framing its data lifecycle (data collecting, data preprocessing and task formulation, task construction, data for evaluation, and data enhancement for training) and cataloging representative systems and datasets across domains (e.g., question answering, web, software engineering). From a data perspective, we aim to guide the creation of scalable, high-quality datasets for the next generation of adaptive, knowledge-seeking LLM agents.
Large Language Models (LLMs) have greatly advanced AI with strong natural language understanding and generation.
Yet their dependence on static pre-training data leads to outdated facts, hallucinations, and limited adaptability to fast-changing information. Retrieval-Augmented Generation (RAG) mitigates these issues by retrieving real-time knowledge from external databases, APIs, or the web to ground generation.
Nevertheless, traditional RAG follows a fixed retrieve-then-generate routine and struggles with multi-step reasoning or iterative retrieval.
Recent developments in agentic AI introduce autonomous LLM-based agents that can plan, reflect, and coordinate tool use.
Combining this paradigm with RAG yields Agentic RAG, where agents actively drive retrieval, assess evidence, and refine outputs through iterative interaction.
Unlike traditional RAG, these RAG-reasoning agents perform active knowledge seeking: decomposing tasks, issuing exploratory queries to multiple sub-agents, and looping retrieval until sufficient information is obtained.
Despite growing interest, Agentic RAG development is hindered by data scarcity.
Unlike traditional RAG, where static corpora suffice, Agentic RAG demands challenging tasks that call for planning, retrieval, and multiple reasoning decisions, together with the corresponding rich, interactive agent trajectories.
| Stage | Traditional RAG | Agentic RAG |
|---|---|---|
| Data Collection | Static data (e.g., Wikipedia, ArXiv) | Interactive data (e.g., tool/API usage, web navigation) |
| Task Construction | Basic tasks (single-step, solvable with direct retrieval) | Hard tasks (requiring decomposition, different tools, and reasoning) |
| Evaluation Metrics | Correctness | Multiple axes (e.g., correctness, efficiency, safety) |
| Data for Training | Chain-of-Thought | Thought–action trajectories, preference pairs, process rewards, new data generated during training for self-improvement |
Table 1. Comparison of traditional RAG and Agentic RAG in data lifecycle.
Such data are costly to annotate, difficult to scale, and prone to quality issues when automatically synthesized. Therefore, curating scalable and high-quality datasets and benchmarks has been a central problem in the development of Agentic RAG systems.
The data curation process in Agentic RAG has two distinctive aspects:
- Traditional RAG vs. Agentic RAG: traditional RAG relies on query–document pairs, whereas Agentic RAG demands rich agent–environment interaction traces encoding planning and retrieval actions.
- Agentic RAG vs. general agents: general agents often use tools such as calculators or code interpreters for problem solving, whereas Agentic RAG uses search engines and knowledge bases for knowledge seeking. In the former case, a tool call returns a definitive result; in Agentic RAG, a tool call typically returns additional information that the agent must further assess and integrate.
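This contrast can be made concrete with two minimal record schemas. The field names below are illustrative assumptions, not taken from any specific dataset:

```python
from dataclasses import dataclass, field

@dataclass
class RAGExample:
    """Traditional RAG: a static query-document-answer record."""
    query: str
    documents: list[str]  # retrieved once, before generation
    answer: str

@dataclass
class AgentStep:
    """One turn of agent-environment interaction."""
    thought: str       # planning / reflection
    action: str        # e.g. a search query or tool call (illustrative)
    observation: str   # evidence returned by the environment

@dataclass
class AgenticTrajectory:
    """Agentic RAG: an interleaved thought-action-observation trace."""
    task: str
    steps: list[AgentStep] = field(default_factory=list)
    final_answer: str = ""
```

A traditional-RAG example collapses into one retrieval call, while an agentic trajectory records every intermediate decision — which is precisely the data that is costly to collect and scale.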
This survey frames Agentic RAG through a data lifecycle that spans data collecting, data preprocessing and task formulation, task construction, data for evaluation, and data enhancement for training. Specifically, we adopt a generate-verify-filter/refine pipeline to analyze the curation process of tasks and trajectories.
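The generate-verify-filter/refine pipeline above can be sketched as a short loop; `generate_task`, `verify`, and `refine` are hypothetical stand-ins for whatever generator (template- or LLM-based) and validators (rule-based, human, or LLM-as-a-judge) a concrete system plugs in:

```python
def curate(generate_task, verify, refine, n_target, max_refines=2):
    """Generic generate -> verify -> filter/refine curation loop (sketch).

    generate_task() -> a candidate task (e.g. a question-answer pair)
    verify(task)    -> True if the task passes quality checks
    refine(task)    -> a repaired version of a failing task
    """
    kept = []
    while len(kept) < n_target:
        task = generate_task()
        for _ in range(max_refines):
            if verify(task):
                kept.append(task)
                break
            task = refine(task)  # try to repair; unrepairable tasks are filtered out
    return kept
```

Systems differ mainly in which stage they emphasize: synthesis-heavy pipelines invest in the generator, while benchmark curation invests in verification.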
- Data Collecting
- Data Preprocessing and Task Formulation
- Task Construction: Annotation and Synthesis
- Data for Evaluation
- Data Enhancement for Training
- Wikipedia
- (TACL 2019) Natural Questions: A Benchmark for Question Answering Research [Paper] [Code]
- (EMNLP 2018) HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering [Paper] [Code]
- (COLING 2020) Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (2WikiMultihopQA) [Paper] [Code]
- GitHub repositories
- Kaggle competitions
- API-based retrieval
- Web navigation
- (EMNLP 2025) LightRAG: Simple and Fast Retrieval-Augmented Generation [Paper] [Code] (relation schemas)
- T-GRAG: A Dynamic GraphRAG Framework for Resolving Temporal Conflicts and Redundancy in Knowledge Retrieval [Paper] [Code] (chronological structure)
- Close-ended
- Real-world workflows
- Creative (Academic Writing)
- (NeurIPS 2024) AutoSurvey: Large Language Models Can Automatically Write Surveys [Paper] [Code]
- (ACL 2025) SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing [Paper] [Code]
- SurveyX: Academic Survey Automation via Large Language Models [Paper] [Code]
- Agent Laboratory: Using LLM Agents as Research Assistants [Paper] [Code]
- Crowdsourced
- (TACL 2019) Natural Questions: A Benchmark for Question Answering Research [Paper] [Code]
- (EMNLP 2018) HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering [Paper] [Code]
- Measuring short-form factuality in large language models (SimpleQA) [Paper] [Code]
- BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents [Paper] [Code]
- (ICLR 2024) GAIA: a benchmark for General AI Assistants [Paper] [Dataset]
- Ready-made tasks on the Internet
- (ACL 2017) TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension [Paper] [Code]
- (ICLR 2024) SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [Paper] [Code]
- (ICLR 2025) MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering [Paper] [Code]
- Synthetic
- (COLING 2020) Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (2WikiMultihopQA) [Paper] [Code]
- (ACL 2024) INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [Paper] [Code]
- (NeurIPS 2024) Gorilla: Large Language Model Connected with Massive APIs [Paper] [Code]
- WebDancer: Towards Autonomous Information Seeking Agency [Paper] [Code]
- Complexity
- (EMNLP 2018) HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering [Paper] [Code]
(multi-hop)
- (COLING 2020) Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (2WikiMultihopQA) [Paper] [Code]
(multi-hop)
- (TACL 2022) MuSiQue: Multihop Questions via Single-hop Question Composition [Paper] [Code]
(multi-hop)
- TaskCraft: Automated Generation of Agentic Tasks [Paper] [Code]
(multi-hop)
- WebDancer: Towards Autonomous Information Seeking Agency [Paper] [Code]
(multi-hop)
- (ACL 2024) On the Multi-turn Instruction Following for Conversational Web Agents [Paper] [Code]
(multi-turn conversations)
- (ACL 2025) WebWalker: Benchmarking LLMs in Web Traversal [Paper] [Code]
(multiple webpages)
- (ICLR 2024) SWE-bench: Can Language Models Resolve Real-World GitHub Issues? [Paper] [Code]
(repo-level coding)
- (ICLR 2024) RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems [Paper] [Code]
(repo-level coding)
- (ICLR 2024) GAIA: a benchmark for General AI Assistants [Paper] [Dataset] (multiple tools)
- Uncertainty
- (TACL 2021) Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies (StrategyQA) [Paper] [Code]
(implicit reasoning tasks)
- (COLING 2020) Constructing A Multi-hop QA Dataset for Comprehensive Evaluation of Reasoning Steps (2WikiMultihopQA) [Paper] [Code]
(distractors in reference documents)
- (TACL 2022) MuSiQue: Multihop Questions via Single-hop Question Composition [Paper] [Code]
(distractors in reference documents, unanswerable questions)
- WebSailor: Navigating Super-human Reasoning for Web Agent [Paper] [Code]
(obfuscate key information)
- BrowseComp: A Simple Yet Challenging Benchmark for Browsing Agents [Paper] [Code]
(inverted problems)
- Expertise
- Human-based (inter-annotator agreement)
- LLM-based
- QA
- Code
- (ACL 2025) WebWalker: Benchmarking LLMs in Web Traversal [Paper] [Code] (linguistic naturalness)
- A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains (Amazon-bench) [Paper] (linguistic naturalness)
- (EMNLP 2020) Is Multihop QA in DIRE Condition? Measuring and Reducing Disconnected Reasoning [Paper] [Code] (no data leakage or exploitable shortcuts)
- (TACL 2022) MuSiQue: Multihop Questions via Single-hop Question Composition [Paper] [Code] (no data leakage or exploitable shortcuts)
- Agent Laboratory: Using LLM Agents as Research Assistants [Paper] [Code] (source credibility)
- Rule-based
- TaskCraft: Automated Generation of Agentic Tasks [Paper] [Code]
(number of hops)
- (ACL 2025) WebWalker: Benchmarking LLMs in Web Traversal [Paper] [Code]
(number of hops)
- (ICLR 2024) GAIA: a benchmark for General AI Assistants [Paper] [Dataset] (number of tools)
- (TACL 2022) MuSiQue: Multihop Questions via Single-hop Question Composition [Paper] [Code]
(with or without unanswerable questions)
- (COLM 2024) GPQA: A Graduate-Level Google-Proof Q&A Benchmark [Paper] [Code]
(accuracy of experts and non-experts)
- LLM-based (LLM's success rate as proxy)
- (TACL 2022) MuSiQue: Multihop Questions via Single-hop Question Composition [Paper] [Code] (filter out multi-hop questions in the test split with any identical single-hop component in the train split)
- (ICLR 2024) GAIA: a benchmark for General AI Assistants [Paper] [Dataset] (question does not exist on the internet in plain text)
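A common pattern behind these LLM-based verifiers is to use a strong model's success rate as a difficulty proxy and keep only tasks the model fails often enough (SimpleQA's "wrong at least once in four attempts" rule is one instance). A minimal sketch, where `ask_llm` is a hypothetical callable and exact-match comparison stands in for whatever answer checker a pipeline actually uses:

```python
def is_hard_enough(question, gold_answer, ask_llm, trials=4, max_successes=3):
    """Keep a task only if the reference LLM fails it at least once (sketch).

    ask_llm(question) -> a candidate answer string (hypothetical interface).
    A task answered correctly in all `trials` attempts is considered too easy.
    """
    successes = sum(
        ask_llm(question).strip().lower() == gold_answer.strip().lower()
        for _ in range(trials)
    )
    return successes <= max_successes
```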
For this part, please refer to the Task Formulation section above for the relevant papers.
- Gold-standard answers
- Programmatic validators
- LLM-as-a-judge
- (ACL 2025) WebWalker: Benchmarking LLMs in Web Traversal [Paper] [Code] (efficiency: the action count of successful agentic executions)
- A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains (Amazon-bench) [Paper] (safety: benign failures vs. harmful failures)
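An LLM-as-a-judge check, as listed among the evaluation approaches above, typically wraps the candidate answer and the reference in a rubric prompt and parses a verdict. The prompt wording and the `call_llm` interface below are illustrative assumptions, not taken from any of the cited benchmarks:

```python
JUDGE_PROMPT = """You are grading a question-answering system.
Question: {question}
Reference answer: {reference}
Candidate answer: {candidate}
Reply with exactly one word: CORRECT or INCORRECT."""

def llm_judge(question, reference, candidate, call_llm):
    """Return True iff the judge model deems the candidate correct (sketch).

    call_llm(prompt) -> model reply string (hypothetical interface).
    """
    reply = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, candidate=candidate))
    # Parse the one-word verdict; anything else counts as incorrect.
    return reply.strip().upper().startswith("CORRECT")
```

In practice, judge reliability itself has to be validated (e.g. against a human-labeled subset) before its scores are trusted.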
- (NeurIPS 2023) Toolformer: Language Models Can Teach Themselves to Use Tools [Paper] [Code]
(modify pretraining corpora)
- (ACL 2024) INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning [Paper] [Code]
(integrate multiple resources into meta-datasets)
- (NeurIPS 2024) Gorilla: Large Language Model Connected with Massive APIs [Paper] [Code]
(self-instruction and in-context learning)
- Generate
- (NeurIPS 2022) STaR: Bootstrapping Reasoning With Reasoning [Paper] [Code]
(in-context bootstrapping)
- Distilling LLM Agent into Small Models with Retrieval and Code Tools [Paper] [Code]
(trajectory distillation)
- WebSailor: Navigating Super-human Reasoning for Web Agent [Paper] [Code]
(trajectory distillation)
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [Paper] [Code]
(trajectory distillation)
- Filter/Refine
- (ACL 2025 Findings) Unveiling the Key Factors for Distilling Chain-of-Thought Reasoning [Paper] [Code]
(quality influenced by factors such as trajectory granularity, formatting choices, and the teacher model used)
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [Paper] [Code]
(conciseness: filters out trajectories with severe repetition)
- WebSailor: Navigating Super-human Reasoning for Web Agent [Paper] [Code]
(conciseness: reconstructs concise rationales from action–observation sequences)
- Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation [Paper] (conciseness: removes redundant or incorrect reasoning paths)
- (COLM 2025) Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning [Paper] [Code]
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [Paper] [Code]
- DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments [Paper] [Code]
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning [Paper] [Code]
- WebDancer: Towards Autonomous Information Seeking Agency [Paper] [Code]
- WebSailor: Navigating Super-human Reasoning for Web Agent [Paper] [Code]
- WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization [Paper] [Code]
- (COLM 2025) DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning [Paper] [Code]
(retrieval rewards)
- ReZero: Enhancing LLM search ability by trying one-more-time [Paper] [Code]
(retrieval rewards)
- (NeurIPS 2025) WebThinker: Empowering Large Reasoning Models with Deep Research Capability [Paper] [Code]
(preference pairs based on quality, efficiency, and conciseness)
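The outcome rewards driving these RL methods are typically simple rule-based functions. A hedged sketch combining an exact-match answer reward with a small format bonus — a common pattern in Search-R1-style training, though the tag convention, weights, and matching rule all vary across the papers above:

```python
import re

def outcome_reward(response: str, gold: str) -> float:
    """Rule-based reward for one search-augmented rollout (sketch).

    Assumes the final answer is wrapped in <answer>...</answer> tags,
    a convention used by several of the RL methods above (details vary).
    """
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    if match is None:
        return 0.0                 # malformed rollout: no reward at all
    answer = match.group(1).strip().lower()
    format_bonus = 0.1             # small bonus for following the format
    correctness = 1.0 if answer == gold.strip().lower() else 0.0
    return format_bonus + correctness
```

Retrieval-reward variants (e.g. DeepRetrieval, ReZero) add terms scoring the search behavior itself rather than only the final answer.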
| Name | Task | Source | Scale | Metrics | Data Curating Method |
|---|---|---|---|---|---|
| Question Answering (QA) | |||||
| NQ | Single-hop QA | Google queries, Wikipedia | train 307k, dev 7.8k, test 7.8k | - | Select queries from Google. Search for relevant documents in Wikipedia, and ask annotators to identify answers and filter low-quality questions. |
| TriviaQA | Single-hop QA | Quiz websites, Wikipedia and Internet | train 76.5k, val 10.0k, test 9.5k | - | Select questions from 14 quiz websites. Search for relevant documents in Wikipedia and Internet, and keep those with answers. |
| SimpleQA | Single-hop QA | Crowdsourced | 4326 | - | Annotators create questions with a unique, time-invariant answer. Each question is independently verified by a second annotator. Keep only those that GPT-4 answers incorrectly at least once in four attempts. |
| HotpotQA | Multi-hop QA | Crowdsourced from Wikipedia | train 90.4k, val 7.4k, test 7.4k | - | Build a relation graph from the links in Wikipedia. Choose relevant paragraphs from it, and ask annotators to create multi-hop questions based on the paragraphs and identify supporting facts in them. |
| 2WikiMultihopQA | Multi-hop QA | Synthesized from Wikipedia | train medium 155k, train hard 12.6k, dev 12.6k, test 12.6k | - | Classify the entities in Wikidata. Manually write different question templates, and sample entities to create questions. Filter out questions with no answer or multiple answers. Add distractors in supporting documents. |
| MuSiQue | Multi-hop QA | Synthesized and annotated from Wikipedia | train 39.9k, val 4.8k, test 4.9k | - | Collect Wikipedia-based single-hop questions. Compose 2-hop questions and filter out those with shortcuts. Build different multi-hop question structures and crowdsource questions. Add distractors in supporting documents. Add unanswerable questions. |
| Bamboogle | Multi-hop QA | Manually created from Wikipedia | 125 | - | Create 2-hop questions based on Wikipedia. Keep only those whose correct answer cannot be found by direct search. |
| TaskCraft | Multi-hop QA | Synthesized from different corpora | 36k | - | Generate single-hop questions from different corpora with an LLM. Extend them to multi-hop questions via depth-based and width-based extension. Filter out those with shortcuts. |
| Web | |||||
| WebArena | QA-like & task-oriented web interaction | Custom web environments (shopping, email, forum, map, social media) | 7 environments, 812 tasks | Task success rate | Provide realistic multi-page websites. Annotators design diverse tasks requiring navigation, reasoning and interaction. |
| AgentBench | Open-ended web tasks with tool use | Real-world web APIs and websites | 8 domains, 2000+ tasks | Success rate, human eval | Collect tasks from multiple domains (travel, shopping, QA, etc.). Provide tool APIs and human-verified success criteria. |
| GAIA | Complex open-domain information-seeking | Live web environment | 466 tasks (300 retained answers) | F1 score, factual accuracy | Ask annotators to design multi-step questions requiring reasoning, planning and external search. Include hidden evaluation sets to test real-time retrieval. |
| BrowseComp | Fact-seeking QA over web browsing | Internet (open web), human-crafted QA | 1,266 questions | Exact match | Questions designed so answer is short and verifiable. Human annotators ensure difficulty (not solved by existing models, not in top search results), enforce time/effort thresholds. |
| WebWalkerQA | Multi-hop QA via web navigation | Real Wikipedia + open web | 680 questions | Exact match, F1 score | Generate multi-hop QA pairs requiring active web navigation. Filter with LLM-based difficulty control and human verification. |
| Amazon-Bench | E-commerce | Live Amazon.com webpages | 400 user queries across 7 task types | Task success rate, harmful/benign failure rate, efficiency | Explore and categorize 60k+ Amazon pages. Sample diverse pages by functionality score, then prompt LLMs to generate realistic user queries and refine them to make them sound more natural and user-like. |
| Software Engineering | |||||
| SWE-bench | Generate a pull request (PR) to solve a given issue | GitHub issues from 12 Python repositories | train 19k, test 2294 | Unit test pass rate | Select PRs that resolve an issue and contribute tests. Keep only those that install successfully and pass all tests. |
| RepoBench | Code retrieval, code completion | GitHub-code dataset, GitHub Python and Java repositories | Python 24k, Java 26k | Golden snippet matching, line matching | Randomly sample lines as completion goals (with a first-to-use subset). Extract candidate snippets based on import statements, and annotate golden snippets. |
| DevEval | Repository-level function completion | Popular repositories from PyPI | 1874 | Unit test pass rate, recall of reference dependency | Select functions with test cases from repositories. Ask annotators to write requirements and reference dependencies. Filter out those with no cross-file dependency. |
| Machine Learning | |||||
| MLAgentBench | Improve the performance metric by at least 10% over the baseline in the starter code | Kaggle | 13 | Success rate of 10% improvement, total time and tokens | Manually construct task description, starter code and evaluation code. |
| MLE-bench | Achieve the best score on a metric pre-defined for each competition | Kaggle | 75 | Test score compared on leaderboard (e.g., medals) | Crawl task description, dataset, grading code and leaderboard from the Kaggle website. Keep only those reproducible and up-to-date. Manually label the category and difficulty. |
| Medical | |||||
| MedQA | Four-option multiple-choice question | National Medical Board Examination | train 48.9k, dev 6.1k, test 8.1k | Exact match | Collect question-answer pairs from the National Medical Board Examination. |
| MedMCQA | Four-option multiple-choice QA resembling medical exams | Open websites and books, All India Institute of Medical Sciences, National Eligibility cum Entrance Test | train 18.2k, dev 4.2k, test 6.2k | Exact match | Collect question-answer pairs from medical examinations. Use rule-based methods to preprocess the data. Split the dataset by exam: the training set consists of questions from mock and online exams, while the dev and test sets consist of questions from formal exams. |
| Quilt-VQA | VQA (Visual question answering) | Educational histopathology videos on YouTube | Image-dependent: 1055, General-knowledge: 255 | LLM evaluation | Locate the question marks ("?") in the video transcripts. Extract the relevant text and images. Prompt GPT-4 to generate QA pairs. Perform manual verification. |
| PathVQA | VQA | Electronic pathology textbooks and the Pathology Education Informational Resource Digital Library website | Images: 4998, QA pairs: 32799 | Accuracy (yes/no questions), exact match, macro-averaged F1, BLEU | Extract images and their captions from the data sources. Apply natural language processing to the captions to break long sentences into shorter ones and obtain POS tags. Generate open-ended questions based on POS tags and named entities. |
| PMC-VQA | VQA | PMC-OA | Images: 149k, QA pairs: 227k | BLEU, accuracy | Prompt ChatGPT with the images and captions to generate QA pairs. Perform LLM-based and manual data filtering. |
| PathMMU | VQA | PubMed, EduContent, Atlas, SocialPath, PathCLS | Images: train 16312, val 510, test 7213; QA pairs: train 23041, val 710, test 9677 | - | Extract image-caption pairs from the data source. Prompt GPT-4V to generate detailed description of images and then three questions per image. Perform expert validation. |
| Legal | |||||
| LegalBench | Issue-spotting, rule-recall, rule-application and rule-conclusion, interpretation, rhetorical-understanding | Existing datasets, in-house datasets | 9.1k | Accuracy, human evaluation | Filter and restructure the data from the data sources. |
| LegalBench-RAG | Retrieve snippets from legal corpora | LegalBench, PrivacyQA, CUAD, MAUD, ContractNLI | 6889 | Recall@k, precision@k | Start from LegalBench queries. Trace back each query’s context to its original document span in the corpus. Final dataset pairs each query with its exact evidence. |
Metrics for QA datasets are generally string matching (exact/fuzzy) or F1, and are omitted from the table.
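These string-matching metrics are standard; a minimal sketch of exact match and token-level F1 as typically computed in SQuAD-style evaluation (real harnesses also strip punctuation and articles during normalization, and details vary by dataset):

```python
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase and tokenize; a simplified stand-in for full normalization."""
    return text.lower().split()

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized prediction equals the normalized gold answer."""
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Token-level F1 between prediction and gold answer."""
    p, g = normalize(pred), normalize(gold)
    common = Counter(p) & Counter(g)     # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```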
- Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG [Paper] [GitHub]
(a general survey on Agentic RAG pipelines and frameworks)
- (EMNLP 2025) Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [Paper] [GitHub]
(the reasoning methods and frameworks in Agentic RAG)
We welcome contributions to expand this collection! To add your work, please:
- Submit a Pull Request or Open an Issue with the following information:
- Paper Title: Your paper's full title
- Paper Link: DOI, arXiv, or conference link
- GitHub Repository: Link to your open-source implementation (if available)
- Category: Specify which stage under our lifecycle your work belongs to:
- Data Collecting: Static Data / Interactive Data
- Data Preprocessing and Task Formulation: Preprocessing / Task Formulation
- Task Construction: Annotation and Synthesis: Generate / Verify / Filter
- Data for Evaluation: Decontamination / Evaluation Metrics and Approaches
- Data Enhancement for Training: SFT / RL
Note that your work may belong to multiple stages; please choose the 1-3 main focuses of your work.
- Format: Follow the existing format in the README for consistency.
- Relevance: Ensure your work is relevant to Agentic RAG data.
Your contributions help build a comprehensive resource for the research community!

