Feature Type
New functionality
Problem Statement
Human-generated QA sets are time-consuming to produce. Automated QA generation is hard when it involves more complexity than single-hop QA within a targeted domain.
Proposed Solution
Advanced Reading Comprehension Dataset Annotation over Individual Earth Science Papers
Objective:
To annotate high-quality QA pairs that require deep reading of full-length earth science papers. The questions should involve synthesizing information across content types (e.g., text-only snippets, text combined with tables, text combined with figures) and require long-form answers.
Annotation Process:
Stage 1: Expert Pilot Annotation
1. Deep Reading & Question Formulation: Experts perform either a thorough read or a skim of all sections of a paper p. Based on a predefined schema (https://proceedings.mlr.press/v202/lee23n/lee23n.pdf), they formulate reasoning-intensive test questions. The schema can be adapted to the earth science domain.
2. Answer Construction
For each question q derived from a paper p, experts:
a) Highlight all relevant content units in p that support answering q.
b) Compose a comprehensive long-form answer a grounded in the highlighted evidence.
c) Indicate whether the answer required additional background knowledge or inference beyond the highlighted content, or whether the question is not answerable (a sketch of a possible annotation record follows this list).
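A minimal sketch of what a Stage 1 annotation record could look like; the field names and structure here are assumptions for illustration, not a finalized schema:

```python
from dataclasses import dataclass, field

@dataclass
class EvidenceSpan:
    """A highlighted content unit in the source paper that supports the answer."""
    section: str       # e.g. "Results" or "Table 2 caption" (hypothetical labels)
    content_type: str  # "text", "table", or "figure"
    text: str          # verbatim excerpt or table/figure description

@dataclass
class QAAnnotation:
    """One expert-annotated QA pair over a single paper (assumed record format)."""
    paper_id: str
    question: str
    question_type: str                      # category from the adapted schema
    evidence: list[EvidenceSpan] = field(default_factory=list)
    answer: str = ""                        # long-form answer grounded in the evidence
    needs_external_knowledge: bool = False  # required inference beyond the highlighted content
    answerable: bool = True                 # False if the question cannot be answered from the paper
```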
Stage 2: LLM-based auto-annotation
Using the QA taxonomy developed in Stage 1, use a large LLM to generate question-answer pairs for a large set of earth science papers.
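A hedged sketch of how Stage 2 auto-annotation could look with an LLM API (the OpenAI client, model name, prompt wording, and output keys are all assumptions, not part of this proposal):

```python
import json
from openai import OpenAI  # assumes the openai Python package; any LLM client would work

client = OpenAI()

GENERATION_PROMPT = """You are annotating an earth-science reading-comprehension dataset.
Using the question taxonomy below, write one reasoning-intensive question about the paper,
a long-form answer grounded in the paper, and the supporting evidence passages.

Taxonomy:
{taxonomy}

Paper text:
{paper_text}

Respond as JSON with keys: question, question_type, answer, evidence."""

def generate_qa(paper_text: str, taxonomy: str, model: str = "gpt-4o") -> dict:
    """Generate one candidate QA pair for a paper (to be filtered by experts in Stage 3)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": GENERATION_PROMPT.format(taxonomy=taxonomy, paper_text=paper_text),
        }],
        response_format={"type": "json_object"},  # request JSON-formatted output
    )
    return json.loads(response.choices[0].message.content)
```

The generated records would then feed directly into the Stage 3 expert verification queue.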
Stage 3: Expert verification
Domain experts review and filter the automatically generated QA pairs from Stage 2, verifying their quality.
Advanced Reading Comprehension Dataset Annotation over Multiple Earth Science Papers
Repeat the above process, this time focusing on the related work sections of each paper. Ensure that all QA pairs require synthesizing information from multiple papers.
We need to survey existing approaches in the literature for multi-hop / long-form QA pair generation, particularly those that enable deeper zero-shot synthetic QA generation.
Alternative Solutions
User Benefits
Would speed up the evaluation of retrieval systems.
Implementation Ideas
No response
Contribution
- I'm willing to submit a PR for this feature
- I'm willing to test this feature
- I'm willing to help document this feature
Additional Context
- What does it mean to test Literature review components?