    Repositories list

    • HintEval

      Public
      HintEval💡: A Comprehensive Framework for Hint Generation and Evaluation for Questions
      Python
      Updated Jan 6, 2026
    • TempRetriever: Fusion-based Temporal Dense Passage Retrieval for Time-Sensitive Questions. Accepted at WSDM 2026 (main conference).
      Updated Dec 1, 2025
    • RankArena

      Public
      RankArena: A Unified Platform for Evaluating Retrieval, Reranking and RAG with Human and LLM Feedback — CIKM ’25, Seoul, Nov 10–14, 2025.
      Python
      Updated Nov 26, 2025
    • Python
      Updated Nov 11, 2025
    • Parse

      Public
      An Open-Domain Reasoning Question Answering Benchmark for Persian
      Updated Nov 2, 2025
    • Survey of datasets, methods, and tools for Temporal Question Answering.
      Updated Oct 29, 2025
    • Rankify

      Public
      🔥 Rankify: A Comprehensive Python Toolkit for Retrieval, Re-Ranking, and Retrieval-Augmented Generation 🔥. The toolkit integrates 40 pre-retrieved benchmark datasets and supports 7+ retrieval techniques, 24+ state-of-the-art reranking models, and multiple RAG methods.
      Python
      Updated Oct 23, 2025
    • DeAR (Deep Agent Rank): Dual-Stage Document Reranking with Reasoning Agents. Accepted at EMNLP Findings 2025.
      Python
      Updated Oct 23, 2025
    • SustainableQA: A Comprehensive Question Answering Dataset for Corporate Sustainability and EU Taxonomy Reporting
      Python
      Updated Oct 9, 2025
    • HintQA

      Public
      Exploring Hint Generation Approaches in Open-Domain Question Answering
      Jupyter Notebook
      Updated Sep 19, 2025
    • How Good are LLM-based Rerankers? Accepted at EMNLP Findings 2025.
      Updated Aug 28, 2025
    • Evaluating Robustness of LLMs in Question Answering on Multilingual Noisy OCR Data
      Python
      Updated Aug 20, 2025
    • ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages
      Python
      Updated Aug 19, 2025
    • Wrong Answers Can Also Be Useful: PlausibleQA — A QA Dataset with Answer Plausibility Scores
      Updated Jul 27, 2025
    • WikiHint

      Public
      WikiHint: A Human-Annotated Dataset for Hint Ranking and Generation
      Python
      Updated Jul 27, 2025
    • TriviaHG

      Public
      A Dataset for Automatic Hint Generation from Factoid Questions
      Python
      Updated Jul 27, 2025
    • Detecting Temporal Ambiguity in Questions
      Updated Nov 26, 2024
    • ArabicaQA

      Public
      ArabicaQA: A Comprehensive Dataset for Arabic Question Answering. Accepted at SIGIR 2024.
      Python
      Updated Jul 28, 2024