@paulirish

What does this PR do?

Optimizes model loading performance for ASR models, specifically reducing Canary's setup time by ~44% (from 41.1s to 23.1s) through optimized config processing and eliminating redundant archive extractions.
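To illustrate the idea of skipping a redundant extraction, here is a minimal sketch using only the standard library. It is not NeMo's implementation: the `extract_once` helper and the `_EXTRACTED` cache are hypothetical names, and the assumption is simply that an archive already unpacked earlier in the same process can be reused instead of unpacked again.

```python
import tarfile
from pathlib import Path
from typing import Dict

# Hypothetical in-process cache: resolved archive path -> extraction directory.
_EXTRACTED: Dict[str, Path] = {}


def extract_once(archive: Path, dest: Path) -> Path:
    """Extract `archive` into `dest` unless this process already extracted it."""
    key = str(archive.resolve())
    cached = _EXTRACTED.get(key)
    if cached is not None and cached.is_dir():
        # Reuse the earlier extraction; `dest` is ignored in this case.
        return cached
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)
    _EXTRACTED[key] = dest
    return dest
```

A second call with the same archive returns the directory from the first call, so the cost of unpacking is paid once per process.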

Collection: [ASR]

Changelog

  • Prevent EncDecMultiTaskModel from re-extracting tarballs when they are already handled by nemo.utils.model_utils.
  • Optimized nemo.utils.model_utils.maybe_update_config_version and convert_model_config_to_dict_config to support in-place updates via a make_copy parameter, avoiding expensive OmegaConf deep copies.
  • Updated Serialization and FileIO mixins in nemo.core.classes.common to use non-copying config conversions where safe.
  • Enabled in-place config resolution in EncDecMultiTaskModel, EncDecCTCModelBPE, and EncDecHybridRNNTCTCBPEModel.
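The `make_copy` pattern from the changelog can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `nemo.utils.model_utils`: plain dicts stand in for OmegaConf configs, and `update_config` and its key rename are hypothetical.

```python
import copy
from typing import Any, Dict


def update_config(cfg: Dict[str, Any], make_copy: bool = True) -> Dict[str, Any]:
    """Apply a hypothetical migration to `cfg`.

    make_copy=True leaves the caller's object untouched (the safe default);
    make_copy=False mutates it in place and skips the deep copy.
    """
    target = copy.deepcopy(cfg) if make_copy else cfg
    # Hypothetical migration step: rename a legacy key if it is present.
    if "params" in target:
        target["init_params"] = target.pop("params")
    return target


original = {"params": {"lr": 1e-3}}
copied = update_config(original)                     # `original` is unchanged
in_place = update_config(original, make_copy=False)  # `original` is mutated
```

Keeping `make_copy=True` as the default preserves the old behavior for existing callers; only call sites that own the config and no longer need the original would opt in to the in-place path.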

Measured performance gains on Canary model load:
- Baseline: 41.1s
- After config optimizations: 32.9s (~20% improvement)
- After only the tar extraction fix: 25.6s (~37% improvement)
- Combined: 23.1s (~44% improvement)
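For anyone who wants to re-derive the percentages from the timings above, a small sketch follows. The `improvement_pct` helper is a hypothetical name; the timings are the ones listed in this description.

```python
BASELINE_S = 41.1  # baseline Canary load time in seconds, from the list above


def improvement_pct(after_s: float, before_s: float = BASELINE_S) -> float:
    """Percentage reduction in load time relative to `before_s`."""
    return (before_s - after_s) / before_s * 100.0


measurements = [
    ("config optimizations", 32.9),
    ("tar extraction fix", 25.6),
    ("combined", 23.1),
]
for label, seconds in measurements:
    print(f"{label}: {improvement_pct(seconds):.1f}% faster")
```

The computed values land within about one percentage point of the rounded figures quoted above.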

These two commits are separate and I'm happy to drop one.

Usage

No changes to public APIs. Models will load significantly faster.

from nemo.collections.asr.models import EncDecMultiTaskModel
# This call is now ~18s faster for Canary
model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b-v2")

See #15240 for a more complete repro script.
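A rough way to time the load locally is sketched below. This is not the script from #15240; `time_call` is a hypothetical helper, and the NeMo lines are left as comments because they assume NeMo is installed and the checkpoint is available.

```python
import time
from typing import Any, Callable, Tuple


def time_call(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# With NeMo installed, the load above could be timed like this:
# from nemo.collections.asr.models import EncDecMultiTaskModel
# model, seconds = time_call(EncDecMultiTaskModel.from_pretrained, "nvidia/canary-1b-v2")
# print(f"load took {seconds:.1f}s")
```

Timings vary between cold and warm starts, so the comparison is most meaningful when the baseline and the patched run are measured under the same conditions.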

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
    • No, as the existing tests cover loading
  • Did you add or update any necessary documentation?
    • Nope.
  • Does the PR affect components that are optional to install?
    • No.

PR Type:

  • New Feature
  • Bugfix
  • Documentation

--

fixes #15240 cc @nithinraok

Member

@nithinraok left a comment

@paulirish Thanks for the great PR, really appreciate it.

Made some comments.

@nithinraok
Member

To fix the format checker failure in the CI/CD test, please run:

python setup.py style --scope=<filepath> --fix

(We are working on improving the contributing guide.)

nithinraok previously approved these changes Jan 7, 2026
paulirish and others added 6 commits January 8, 2026 09:53
Reduces load time by ~8.1s (41.1s -> 32.96s) on warm start.
Avoids unnecessary deep copies in `nemo.utils.model_utils` and `nemo.core.classes.modelPT`.
Enables in-place config updates for `EncDecMultiTaskModel`, `EncDecCTCModelBPE`, and `EncDecHybridRNNTCTCBPEModel`.
Updates Serialization and FileIO mixins to use optimized config conversion.

Signed-off-by: Paul Irish <[email protected]>
Reduces load time by ~9.8s when combined with config optimizations (32.96s -> 23.14s).
On its own, reduces load time by ~15.5s (41.1s -> 25.62s).
Prevents `EncDecMultiTaskModel` from re-extracting tarballs when they are already handled by `nemo.utils.model_utils`.

Signed-off-by: Paul Irish <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: nithinraok <[email protected]>
Signed-off-by: nithinraok <[email protected]>
@paulirish
Author

@nithinraok thank you.
I did much of the same over the weekend but my simplification wasn't as effective as yours, so I appreciate you stepping in.

@nithinraok
Member

> @nithinraok thank you. I did much of the same over the weekend but my simplification wasn't as effective as yours, so I appreciate you stepping in.

Thanks @paulirish, appreciate it. CI is delayed because of the many open PRs; I will take care of this PR. :)

Development

Successfully merging this pull request may close these issues.

Performance: EncDecMultiTaskModel (Canary) initialization triggers double .nemo extraction and recursive heavy reload (~52s cold start)