Missing comma in NusaASR benchmark list #736

@sabilmakbar

Describe the bug

A comma is missing between two entries in the NusaASR benchmark list (https://github.com/SEACrowd/seacrowd-datahub/blob/master/seacrowd/config_helper.py#L393-L395), so the two adjacent names are not split into separate entries.

Steps to reproduce the bug

# Sample code to reproduce the bug
import seacrowd as sc  # assumed alias; the original snippet used `sc` without importing it
from seacrowd.config_helper import BENCHMARK_DICT

seacrowd_nusa_asr = sc.load_benchmark("NusaASR")
len(BENCHMARK_DICT["NusaASR"]) == len(seacrowd_nusa_asr)  # expected True, got False

The left-hand side has 19 entries and the right-hand side has 18 (1 entry is invalid), but a manual count of the list gives 20.
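
One plausible reading of these counts is Python's implicit concatenation of adjacent string literals. When the comma between two names is left out, they merge into a single entry (20 becomes 19), and the merged name matches no dataset (19 becomes 18 loadable). A minimal sketch of that effect, using hypothetical dataset names rather than the actual NusaASR entries:

```python
# Hypothetical dataset names, for illustration only.
with_commas = [
    "dataset_a",
    "dataset_b",
    "dataset_c",
]

# Same list, but the comma after "dataset_b" is missing.
# Python joins the two adjacent string literals into one element.
missing_comma = [
    "dataset_a",
    "dataset_b"
    "dataset_c",
]

print(len(with_commas))    # 3
print(len(missing_comma))  # 2
print(missing_comma[1])    # dataset_bdataset_c
```

Python raises no error for this, so the list is just one element shorter and contains a name that does not exist.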

Environment info

  • datasets version: 2.21.0
  • Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
  • Python version: 3.11.14
  • huggingface_hub version: 1.5.0
  • PyArrow version: 23.0.1
  • Pandas version: 3.0.1
  • fsspec version: 2024.6.1

Labels: bug