Description
We need to add the following datasets from Lighteval.
abstract_narrative_understanding: Tests understanding of abstract narratives.
analogical_similarity: Tests analogical reasoning.
arithmetic_bb: Basic arithmetic reasoning.
cause_and_effect: Tests understanding of causal relationships.
chess_state_tracking: Tracking the state of a chess board across a sequence of moves.
common_morpheme: Tests understanding of word structures.
contextual_parametric_knowledge_conflicts: Tests whether the model follows the given context when it conflicts with its parametric knowledge.
dyck_languages: Tests understanding of nested structures.
elementary_math_qa: Elementary math question answering.
formal_fallacies_syllogisms_negation: Logical fallacy detection.
general_knowledge: General knowledge question answering.
geometric_shapes: Understanding of geometric concepts.
logical_deduction: Logical deduction tasks.
mathematical_induction: Mathematical induction reasoning.
metaphor_understanding: Understanding of metaphors.
natural_instructions: Natural language instruction following.
object_counting: Counting objects in a description.
qa_wikidata: Question answering using Wikidata.
reasoning_about_colored_objects: Reasoning about colored objects.
simple_arithmetic_json: Simple arithmetic tasks.
tracking_shuffled_objects: Tracking objects in a shuffled sequence.
unit_conversion: Unit conversion tasks.
Harness Tasks (harness):
bbh:logical_deduction_three_objects : Logical deduction with three objects.
bbh:movie_recommendation : Movie recommendation based on preferences.
bbh:navigate : Navigation tasks based on descriptions.
bbh:ruin_names : Choosing the humorous one-character edit that "ruins" an artist or movie name.
bbh:salient_translation_error_detection : Detecting salient translation errors.
bbh:snarks : Identifying which of two statements is sarcastic.
bbh:sports_understanding : Understanding of sports concepts.
bbh:temporal_sequences : Understanding of temporal sequences.
bbh:tracking_shuffled_objects_three_objects : Tracking shuffled objects.
HELM Tasks (helm):
bigbench:auto_debugging : Debugging code based on descriptions.
bigbench:code_line_description : Describing lines of code.
bigbench:conceptual_combinations : Understanding conceptual combinations.
bigbench:conlang_translation : Translating constructed languages.
bigbench:emoji_movie : Identifying movies from emoji descriptions.
bigbench:linguistics_puzzles : Solving linguistic puzzles.
bigbench:logical_deduction-three_objects : Logical deduction with three objects.
bigbench:misconceptions_russian : Identifying misconceptions in Russian.
bigbench:novel_concepts : Understanding novel concepts.
bigbench:symbol_interpretation : Interpreting symbolic representations.
bigbench:vitaminc_fact_verification : Fact verification.
bigbench:winowhy : Judging whether a stated reason justifies a Winograd schema answer.
Leaderboard Tasks (leaderboard):
arc:challenge : AI2 Reasoning Challenge.
gsm8k: Grade school math word problems.
hellaswag: HellaSwag: Commonsense sentence completion.
mmlu:high_school_mathematics : Mathematics at the high school level.
mmlu:high_school_physics : Physics at the high school level.
mmlu:high_school_biology : Biology at the high school level.
mmlu:high_school_chemistry : Chemistry at the high school level.
mmlu:high_school_computer_science : Computer Science at the high school level.
mmlu:high_school_psychology : Psychology at the high school level.
mmlu:high_school_us_history : US History at the high school level.
mmlu:high_school_world_history : World History at the high school level.
truthfulqa:mc : TruthfulQA with multiple choice answers.
winogrande: Winograd schema tasks.
LightEval Tasks (lighteval):
agieval:aqua-rat : AQuA-RAT: Algebraic word problems with rationales.
agieval:gaokao-mathqa : Gaokao math questions.
blimp:adjunct_island : BLiMP syntactic tasks.
blimp:animate_subject_passive : BLiMP syntactic tasks.
blimp:causative : BLiMP syntactic tasks.
blimp:complex_NP_island : BLiMP syntactic tasks.
blimp:determiner_noun_agreement_1 : BLiMP syntactic tasks.
blimp:drop_argument : BLiMP syntactic tasks.
blimp:ellipsis_n_bar_1 : BLiMP syntactic tasks.
blimp:existential_there_object_raising : BLiMP syntactic tasks.
blimp:inchoative : BLiMP syntactic tasks.
blimp:left_branch_island_echo_question : BLiMP syntactic tasks.
blimp:matrix_question_npi_licensor_present : BLiMP syntactic tasks.
blimp:npi_present_1 : BLiMP syntactic tasks.
blimp:passive_1 : BLiMP syntactic tasks.
blimp:principle_A_c_command : BLiMP syntactic tasks.
blimp:regular_plural_subject_verb_agreement_1 : BLiMP syntactic tasks.
blimp:sentential_negation_npi_licensor_present : BLiMP syntactic tasks.
blimp:superlative_quantifiers_1 : BLiMP syntactic tasks.
blimp:tough_vs_raising_1 : BLiMP syntactic tasks.
blimp:wh_island : BLiMP syntactic tasks.
blimp:wh_questions_subject_gap : BLiMP syntactic tasks.
coqa: CoQA: Conversational Question Answering.
gsm8k: Grade school math word problems.
lambada:openai : LAMBADA language modeling task.
math:algebra : Math algebra questions.
math:geometry : Math geometry questions.
math:prealgebra : Math pre-algebra questions.
mathqa: Math question answering.
piqa: PIQA: Physical commonsense reasoning.
super_glue:boolq : SuperGLUE boolean questions.
super_glue:cb : SuperGLUE CommitmentBank (three-way textual entailment).
super_glue:copa : SuperGLUE causal reasoning.
super_glue:rte : SuperGLUE recognizing textual entailment.
super_glue:wic : SuperGLUE word in context.
super_glue:wsc : SuperGLUE winograd schema challenges.
truthfulqa:gen : TruthfulQA generative.
Original Tasks (original):
arc:c:simple : ARC-Challenge science questions with a simplified prompt.
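
For whoever picks this up, here is a minimal sketch of turning the suite and task names above into task strings. It assumes Lighteval's `suite|task|num_fewshot|truncate_fewshot` format, which may differ between Lighteval versions, so check it against the version we pin. The helper names (`task_spec`, `build_specs`) and the small `TASKS` sample are hypothetical and only illustrate the grouping used in this issue.

```python
# Hypothetical helpers: build Lighteval-style task strings from the lists above.
# Assumption: tasks are addressed as "suite|task|num_fewshot|truncate_fewshot".

def task_spec(suite: str, task: str, num_fewshot: int = 0, truncate_fewshot: int = 0) -> str:
    """Format a single task string for one suite/task pair."""
    return f"{suite}|{task}|{num_fewshot}|{truncate_fewshot}"


# A small sample of the tasks listed in this issue, keyed by suite name.
TASKS = {
    "harness": ["bbh:navigate", "bbh:snarks"],
    "helm": ["bigbench:emoji_movie"],
    "leaderboard": ["arc:challenge", "gsm8k"],
    "lighteval": ["piqa", "super_glue:boolq"],
    "original": ["arc:c:simple"],
}


def build_specs(tasks: dict[str, list[str]], num_fewshot: int = 0) -> list[str]:
    """Flatten the suite -> task mapping into a list of task strings."""
    return [
        task_spec(suite, name, num_fewshot)
        for suite, names in tasks.items()
        for name in names
    ]


specs = build_specs(TASKS)
# Comma-joined form, convenient for passing several tasks at once.
task_arg = ",".join(specs)
```

Keeping the suite as the dictionary key means a task that appears in more than one suite (for example `gsm8k` under both leaderboard and lighteval) yields distinct task strings.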