added phonetics, speech_disorder, and speech_enhancement tasks - stil… #22

pcsid · 2025-12-19T20:41:51Z

…l in need of full model scoring. Fixed a small inconsistency bug in the config by changing judge_properties to judge_settings.

📌 Description

🔗 Related Issue(s)

🛠️ Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality including new tasks)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Refactor / Code cleanup
Maintenance / Chore / Task
Other (please describe):

✅ How Has This Been Tested?

Unit tests
Integration tests
Manual testing

Test Results / Screenshots (if applicable):

📸 Screenshots / Demos

📋 Checklist

Code follows project style guidelines
Tests have been added/updated (if applicable)
Documentation has been updated (if applicable)
Linked relevant issue(s)
Self-reviewed my code

🙌 Additional Notes

Added three new tasks. Please test them on the full dataset; I don't have access to a cluster, so I can't test that many samples myself.

Phoneme tasks are not working as expected: the audio clips are too short (0.5-1 seconds), so the model often fails to recognize the audio.
The stuttering task worked fine, but was only tested on a few samples.
The noise detection task worked fine, but was only tested on a few samples.

Small bug fix: the code now reads judge_settings instead of judge_properties from the config. This keeps it consistent with the other documentation and the codebase pattern (judge_settings at the top level, judge_properties when creating the judge object itself).
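A minimal sketch of the naming pattern described above, assuming the task config is loaded into a plain dict. `Judge`, `build_judge`, and the `temperature` key are hypothetical stand-ins, not the repo's actual classes or settings; only the two key names come from this PR.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Judge:
    """Hypothetical stand-in for the repo's judge object."""

    judge_properties: dict[str, Any] = field(default_factory=dict)


def build_judge(config: dict[str, Any]) -> Judge:
    """Build a judge from a loaded task config (hypothetical helper)."""
    # The top-level config key is `judge_settings`; before this fix the code
    # looked up `judge_properties` at this level instead.
    settings = config.get("judge_settings", {})
    # Inside the judge object itself, the same values are called `judge_properties`.
    return Judge(judge_properties=dict(settings))


config = {"judge_settings": {"temperature": 0.0}}  # hypothetical setting
judge = build_judge(config)
print(judge.judge_properties)
```

Keeping one name per level (config vs. object) means a reader can tell from the key alone whether they are looking at raw config or at the constructed judge.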


nhhoang96

Make sure to include at least the minimum results needed to compare against the scores reported in the current literature. Add them to the PR's Screenshots / Demos section.

Performance on the phoneme and noise_detection tasks is low. Please inspect the outputs and submit samples to verify that the added tasks behave as expected.

tasks/speech_enhancement/noise_detection/guassian_noise_detection.yaml

pcsid · 2026-01-05T01:52:53Z

I checked the three new tasks against the reported scores. Here is the leaderboard I used: https://huggingface.co/spaces/DynamicSuperb/leaderboard

On the speech disorder (stuttering) task, gpt-4o-audio scored 57%. The scores reported on the DynamicSuperb leaderboard were all around 50% (random guessing), which makes sense given that they come from older, open-source LLMs. The specialized audio models (not LLMs) trained in this paper, https://arxiv.org/pdf/2102.12394, scored much better, ranging from 61% to 67%.

On the speech enhancement task (Gaussian noise detection), gpt-4-audio scored 53.5%, better than the scores reported on the Hugging Face leaderboard, which come mostly from weaker open-source models. Those leaderboard scores were probably very low for a 50/50 task because of refusals or improper prompting.
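To illustrate how refusals can pull a 50/50 task below chance, here is a minimal sketch assuming a yes/no label set and a scorer that treats any response not exactly matching the label as wrong. `binary_accuracy` and the labels are hypothetical; this is not the repo's actual scorer.

```python
def binary_accuracy(predictions: list[str], labels: list[str]) -> float:
    """Exact-match accuracy on a yes/no task; unparseable answers count as wrong."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need equal-length, non-empty predictions and labels")
    correct = 0
    for pred, label in zip(predictions, labels):
        answer = pred.strip().lower()
        # A refusal such as "I can't tell" never equals a label, so it scores 0.
        if answer == label:
            correct += 1
    return correct / len(labels)


# Eight answered items (half of them right) plus two refusals.
preds = ["yes", "no", "yes", "no", "yes", "no", "yes", "no", "I can't tell", "Sorry"]
gold = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "yes", "no"]
print(binary_accuracy(preds, gold))  # 0.4, below the 0.5 chance level
```

A model that guesses at chance on the items it answers still lands below 50% once refusals are scored as misses.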

On the phonetics task (phoneme counting), gpt-4-audio scored 24.1%, better than the exact-match accuracy of the open-source models on the DynamicSuperb leaderboard, which ranged from 1% to 21%.
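Exact match on a counting task gives no partial credit: an answer that is off by one phoneme scores the same as no answer. A minimal sketch, assuming the model replies in free text and the scorer takes the first integer it finds; `count_exact_match` is a hypothetical helper, not the task's actual scoring code.

```python
import re


def count_exact_match(responses: list[str], targets: list[int]) -> float:
    """Exact-match accuracy for a counting task (hypothetical scorer)."""
    if len(responses) != len(targets) or not targets:
        raise ValueError("need equal-length, non-empty responses and targets")
    hits = 0
    for response, target in zip(responses, targets):
        # Take the first run of digits in the answer; no digits means a miss.
        match = re.search(r"\d+", response)
        if match is not None and int(match.group()) == target:
            hits += 1
    return hits / len(targets)


responses = ["There are 3 phonemes.", "4", "five"]
print(count_exact_match(responses, [3, 5, 5]))
```

Under this assumed parsing, a spelled-out answer like "five" is scored as a miss, which is one more way free-text LLM output can depress exact-match scores.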

For all three tasks, and especially the last two, LLMs as a whole don't seem very good at solving these problems: they are very niche classification tasks that are not well suited to the strengths of LLMs.

nhhoang96

LGTM

added phonetics, speech_disorder, and speech_enhancement tasks - stil…

d42af02


pcsid requested review from akshaykalkunte, jash-mehta-3300, nhhoang96 and oluwanifemibamgbose December 19, 2025 20:41

pcsid self-assigned this Dec 19, 2025

pcsid added the bug and enhancement labels Dec 19, 2025

nhhoang96 requested changes Dec 24, 2025


tasks/speech_enhancement/noise_detection/guassian_noise_detection.yaml (outdated, resolved)

Update the correct HF path for noise_detection task

6afecca

nhhoang96 self-assigned this Dec 30, 2025

updated scores

3b7472c

pcsid requested a review from nhhoang96 January 5, 2026 01:53

nhhoang96 approved these changes Jan 5, 2026


nhhoang96 merged commit a77e996 into main Jan 5, 2026

nhhoang96 deleted the feat/dynamic-tasks branch January 5, 2026 21:12

3 participants