
@nithinraok

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do?

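This change converts command strings to argument lists and runs them with shell=False. This is the recommended secure approach for subprocess execution, because the arguments are passed directly to the program and are never interpreted by a shell.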

Collection: [ASR]

Changelog

  • Refactored run_chunked_inference() in tools/asr_evaluator/utils.py to use shell=False by converting the command string to a list of arguments.
  • Refactored run_offline_inference() in tools/asr_evaluator/utils.py to use shell=False by converting the command string to a list of arguments.
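
For illustration, the pattern described in the changelog can be sketched as follows. This is a minimal sketch, not the code from tools/asr_evaluator/utils.py: the helper names build_inference_command and run_command, the script path, and the key=value override format are assumptions made for the example.

```python
import subprocess
import sys


def build_inference_command(script_path, overrides):
    """Return an argument list: interpreter, script, then key=value overrides.

    script_path and overrides are hypothetical placeholders for this sketch,
    not the actual arguments used by the ASR evaluator.
    """
    cmd = [sys.executable, script_path]
    for key, value in overrides.items():
        # Each override is one list element, so spaces or shell
        # metacharacters inside a value are never parsed by a shell.
        cmd.append(f"{key}={value}")
    return cmd


def run_command(cmd):
    """Run an argument list without a shell and return its captured stdout."""
    result = subprocess.run(
        cmd,
        shell=False,  # the default, stated explicitly for clarity
        check=True,  # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )
    return result.stdout
```

With a list, a value such as "/tmp/my model.nemo" reaches the child process as a single argument, whereas the same value interpolated into a shell=True command string would be split on the space unless it were quoted correctly.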

Usage

No changes to usage. The ASR evaluator functions work as before:

from tools.asr_evaluator.utils import run_asr_inference
from omegaconf import DictConfig

cfg = DictConfig({
    "model_path": "/path/to/model.nemo",
    "pretrained_name": None,
    "inference": {"mode": "offline", "decoder_type": "ctc"},
    "test_ds": {
        "manifest_filepath": "/path/to/manifest.json",
        "batch_size": 32,
        "num_workers": 4
    },
    "random_seed": 42,
    # ... other config options
})

result_cfg = run_asr_inference(cfg)

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove the label and add it again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines list specific people who can review PRs in various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: nithinraok <[email protected]>
@nithinraok nithinraok requested a review from melllinia December 22, 2025 21:28
nithinraok and others added 2 commits December 22, 2025 13:35
Comment on lines +168 to +169
'Mozilla/5.0 (Windows NT 10.0; WOW64) '
'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',

Check warning

Code scanning / CodeQL

Implicit string concatenation in a list Warning

Implicit string concatenation. Maybe missing a comma?

Copilot Autofix

AI 18 days ago

To fix this, we should make the concatenation of the two string literals in the commands list explicit. The goal is to keep the User-Agent value exactly the same while avoiding implicit string concatenation inside the list, thereby satisfying CodeQL and improving readability.

The best minimal change is to join the two adjacent literals on lines 168–169 with + so that Python’s intent is clear while still passing a single User-Agent string as the value for the --user-agent option. We should only edit the string element in the commands list and not change any other behavior or imports.

Concretely, in scripts/dataset_processing/get_commonvoice_data.py, locate the commands = [ block in main() and replace the two-line implicit concatenation:

            'Mozilla/5.0 (Windows NT 10.0; WOW64) '
            'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',

with an explicit concatenation (split across lines for readability):

            'Mozilla/5.0 (Windows NT 10.0; WOW64) ' +
            'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',

No new methods, imports, or definitions are needed.
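
The behavior behind this warning can be shown with a short sketch. The lists below are illustrative only: the first two reuse the User-Agent string from the script, while the archive.tar.gz filename in the last list is a made-up value for the example.

```python
# Adjacent string literals are joined by the parser, so this list has
# four elements, not five. The User-Agent value is a single string.
implicit = [
    'wget',
    '--user-agent',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    '-O',
]

# The explicit "+" produces the same list and makes the intent visible.
explicit = [
    'wget',
    '--user-agent',
    'Mozilla/5.0 (Windows NT 10.0; WOW64) ' +
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
    '-O',
]

# The case the warning guards against: a forgotten comma silently merges
# two intended arguments into one (hypothetical filename).
missing_comma = ['-O' 'archive.tar.gz']
```

Both spellings build an identical argument list, so the suggested change affects readability and static analysis only, not the command that is run.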

Suggested changeset 1
scripts/dataset_processing/get_commonvoice_data.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/dataset_processing/get_commonvoice_data.py b/scripts/dataset_processing/get_commonvoice_data.py
--- a/scripts/dataset_processing/get_commonvoice_data.py
+++ b/scripts/dataset_processing/get_commonvoice_data.py
@@ -165,7 +165,7 @@
         commands = [
             'wget',
             '--user-agent',
-            'Mozilla/5.0 (Windows NT 10.0; WOW64) '
+            'Mozilla/5.0 (Windows NT 10.0; WOW64) ' +
             'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36',
             '-O',
             output_archive_filename,
EOF

@melllinia melllinia left a comment


LGTM

@github-actions

[🤖]: Hi @nithinraok 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully.

So it might be time to merge this PR or get some approvals.

//cc @chtruong814 @ko3n1g @pablo-garay @thomasdhc

@nithinraok nithinraok merged commit 4de2018 into main Dec 23, 2025
71 of 72 checks passed
@nithinraok nithinraok deleted the update_asr_evaluator_utils branch December 23, 2025 13:51
chtruong814 pushed a commit that referenced this pull request Jan 5, 2026
* update subprocess cmd

Signed-off-by: nithinraok <[email protected]>

* common voice script

Signed-off-by: nithinraok <[email protected]>

* Apply isort and black reformatting

Signed-off-by: nithinraok <[email protected]>

---------

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Co-authored-by: nithinraok <[email protected]>
Signed-off-by: Charlie Truong <[email protected]>
chtruong814 pushed a commit that referenced this pull request Jan 6, 2026
chtruong814 added a commit that referenced this pull request Jan 6, 2026
* update subprocess cmd



* common voice script



* Apply isort and black reformatting



---------

Signed-off-by: nithinraok <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: Charlie Truong <[email protected]>
Co-authored-by: Nithin Rao <[email protected]>
Co-authored-by: nithinraok <[email protected]>