The module datasets.AutomaticSpeechRecognition is deprecated in 3.0.0, breaking some dataloaders #734

@sabilmakbar

Describe the bug

The class datasets.AutomaticSpeechRecognition (from datasets.tasks) is no longer available as of datasets 3.0.0, so dataloaders that still reference it fail to load.
Release notes for 3.0.0: https://github.com/huggingface/datasets/releases/tag/3.0.0

Steps to reproduce the bug

# Sample code to reproduce the bug
from datasets import load_dataset

load_dataset(
    path="SEACrowd/titml_idn",
    name="titml_idn_source",
    trust_remote_code=True,
)

Expected results

load_dataset should load the dataset successfully, as it does for this example dataset:

load_dataset(
    path="SEACrowd/asr_indocsc",
    name="asr_indocsc_source",
    trust_remote_code=True
)

Actual results


   2057 verification_mode = VerificationMode(
   2058     (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS
   2059 )
   2061 # Create a dataset builder
-> 2062 builder_instance = load_dataset_builder(
   2063     path=path,
   2064     name=name,
   2065     data_dir=data_dir,
   2066     data_files=data_files,
   2067     cache_dir=cache_dir,
   2068     features=features,
   2069     download_config=download_config,
   2070     download_mode=download_mode,
...
     85     citation=_CITATION,
---> 86     task_templates=[datasets.AutomaticSpeechRecognition(audio_column="audio", transcription_column="text")],
     87 )

AttributeError: module 'datasets' has no attribute 'AutomaticSpeechRecognition'
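
One possible stopgap, sketched here as an assumption rather than as the project's chosen fix, is to look the class up with getattr and only build the task_templates kwarg when it exists. The helper name optional_task_templates is hypothetical, and a SimpleNamespace stands in for the datasets module so the snippet runs without the library installed.

```python
from types import SimpleNamespace


def optional_task_templates(datasets_module, audio_column="audio", transcription_column="text"):
    """Return a kwargs dict with task_templates, or {} when the class is missing.

    Hypothetical helper: `datasets_module` is whatever module object the
    dataloader imported as `datasets`.
    """
    asr_cls = getattr(datasets_module, "AutomaticSpeechRecognition", None)
    if asr_cls is None:
        # Class absent (the situation in the traceback above): omit the kwarg.
        return {}
    return {
        "task_templates": [
            asr_cls(audio_column=audio_column, transcription_column=transcription_column)
        ]
    }


# Stand-in for a datasets module that no longer exposes the class.
new_datasets = SimpleNamespace()
print(optional_task_templates(new_datasets))  # {}
```

In a dataloader, the result would be unpacked into the existing DatasetInfo call (for example `**optional_task_templates(datasets)`) in place of the hard-coded task_templates line. Removing the kwarg outright, as suggested below, is simpler if no supported version needs it.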

Environment info

  • datasets version: 3.6.0
  • Platform: Linux-6.8.0-101-generic-x86_64-with-glibc2.39
  • Python version: 3.11.14
  • huggingface_hub version: 1.5.0
  • PyArrow version: 23.0.1
  • Pandas version: 3.0.1
  • fsspec version: 2025.3.0

Suggestion: since only 9 datasets still use the removed class, we can drop the task_templates kwarg from those dataloaders entirely:
https://github.com/search?q=repo%3ASEACrowd%2Fseacrowd-datahub%20AutomaticSpeechRecognition&type=code

https://github.com/search?q=repo%3ASEACrowd%2Fseacrowd-datahub+task_templates&type=code
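
As a local complement to the code searches above, the affected call sites could be located with Python's ast module. This is only a sketch: the function name find_task_templates_calls and the sample source are illustrative assumptions, not code from the repository.

```python
import ast


def find_task_templates_calls(source, filename="<dataloader>"):
    """Return (line, callee) pairs for calls that pass a task_templates keyword."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            for kw in node.keywords:
                if kw.arg == "task_templates":
                    # Record where the keyword's value starts and which call uses it.
                    hits.append((kw.value.lineno, ast.unparse(node.func)))
    return hits


# Illustrative source mimicking the failing dataloader line.
sample = '''
import datasets

info = datasets.DatasetInfo(
    description="demo",
    task_templates=[datasets.AutomaticSpeechRecognition(audio_column="audio", transcription_column="text")],
)
'''
print(find_task_templates_calls(sample))
```

Running the function over each dataloader file's text would list the lines to edit. Since ast.parse only parses and never imports, the scan works regardless of the installed datasets version.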

Labels: bug