@gwarmstrong (Collaborator) commented Feb 10, 2026

Adds a lightweight nemo-skills-core subpackage (core/ subdirectory)
with only the inference, evaluation, and tool-calling dependencies.
The default pip install nemo-skills is unchanged and still installs everything.

Changes

  • core/pyproject.toml + core/requirements.txt: New subpackage installable via pip install ./core or git URL with #subdirectory=core. Single source of truth for core deps, referenced by both core and root pyproject.toml.
  • nemo_skills/pipeline/__init__.py: Import guard using importlib.metadata -- importing pipeline modules with only core installed raises a clear ImportError instead of a cryptic ModuleNotFoundError.
  • nemo_skills/_cli_stub.py: Stub ns CLI entry point for core-only installs that prints a helpful message.
  • nemo_skills/evaluation/evaluator/__init__.py: Lazy evaluator registry using string paths instead of eager imports, so core-only installs don't fail on benchmark-specific deps (faiss, func_timeout, etc.).
  • nemo_skills/dataset/utils.py + nemo_skills/pipeline/dataset.py: Moved cluster-dependent dataset logic into pipeline module to keep core free of nemo_run imports.
  • requirements/pipeline.txt: New requirements file for pipeline-only deps (nemo_run, typer, etc.).
  • .github/workflows/tests.yml: Install uv in CI so the new installation variants can be tested.
  • Docs: Added installation guide, updated CONTRIBUTING.md with dependency placement guidance.
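As a rough illustration of the import guard described above, the importlib.metadata check might look like the following. This is a sketch under assumptions: the function name, distribution check, and message wording are illustrative, not taken from the PR's diff.

```python
import importlib.metadata


def require_full_install(dist_name="nemo-skills"):
    """Raise a clear ImportError when the full package is not installed."""
    try:
        importlib.metadata.distribution(dist_name)
    except importlib.metadata.PackageNotFoundError:
        # Convert the metadata lookup failure into an actionable message
        # instead of letting a cryptic ModuleNotFoundError surface later.
        raise ImportError(
            f"nemo_skills.pipeline requires the full '{dist_name}' package, "
            f"but only a core install was found. Run: pip install {dist_name}"
        ) from None
```

Calling this at the top of the pipeline package's __init__ turns a missing-dependency crash into a one-line fix for the user.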

Summary by CodeRabbit

  • New Features

    • Lightweight core installation (nemo-skills-core) with a minimal CLI stub and modular dependency groups (core, pipeline, dev).
    • Lazy, on-demand loading for evaluators to reduce startup cost.
    • Improved dataset loading: local-first behavior and a cluster-aware dataset loader with clearer errors and deprecation guidance.
  • Documentation

    • Expanded installation guide covering package variants, dependency placement, and Core/Pipeline boundary rules.
  • Chores

    • Project packaging restructured and CI workflow step added for UV setup.

@gwarmstrong gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from 8fa5c7d to 4e2fad9 Compare February 10, 2026 22:16
@gwarmstrong gwarmstrong changed the title maint: separate dependencies for different Skills components Add nemo-skills-core subpackage and separate core/pipeline dependencies Feb 12, 2026
@gwarmstrong gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from d22246e to 76c2a18 Compare February 12, 2026 21:46
@gwarmstrong gwarmstrong changed the title Add nemo-skills-core subpackage and separate core/pipeline dependencies Add nemo-skills-core subpackage for lightweight installs Feb 13, 2026
@gwarmstrong gwarmstrong force-pushed the georgea/refactor-separable-pipeline branch from a2751f3 to f0eb8d0 Compare February 13, 2026 00:38
@gwarmstrong gwarmstrong marked this pull request as ready for review February 13, 2026 00:58
@gwarmstrong gwarmstrong requested review from Kipok February 13, 2026 00:58
@greptile-apps bot left a comment

13 files reviewed, 1 comment

Comment on lines +106 to +107
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn

Setting _EVALUATOR_MAP_PATHS[eval_type] = None creates a fragile state. If _resolved_evaluator_map is ever cleared or doesn't contain the eval_type, _get_evaluator_fn will call _resolve(None) and crash.

Suggested change
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn
# Store function directly, bypassing the lazy resolution path
_resolved_evaluator_map[eval_type] = eval_fn

@coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Adds a lightweight core package and docs, reorganizes optional dependencies, introduces CI step for UV, implements lazy evaluator resolution, refactors dataset loading to prefer local modules and delegates cluster handling to a new pipeline dataset module, and adds a runtime guard for pipeline imports.

Changes

  • CI Workflow (.github/workflows/tests.yml): Adds an "Install uv" step (astral-sh/setup-uv@v4) before dependency installation.
  • Docs & Contribution Guidance (CONTRIBUTING.md, docs/basics/installation.md, mkdocs.yml): Adds installation docs and a "Core / Pipeline dependency boundary" section; registers a new docs nav entry.
  • Core package files (core/pyproject.toml, core/requirements.txt): Introduces lightweight core package metadata and its dedicated requirements file.
  • Root packaging & extras (pyproject.toml, requirements/pipeline.txt): Splits optional-dependencies into core, pipeline, and dev groups; adds pipeline-specific deps (nemo_run, wandb, typer, constrained click, nemo-evaluator-launcher).
  • CLI stub (nemo_skills/_cli_stub.py): Adds a minimal CLI entrypoint that prints a message and exits when only the core package is installed.
  • Dataset loading, core (nemo_skills/dataset/utils.py): Reworks get_dataset_module to prefer local imports, removes cluster helpers, emits a DeprecationWarning when cluster_config is used, and delegates cluster resolution to the pipeline dataset module; adds extra_datasets handling and clearer errors.
  • Dataset loading, pipeline (nemo_skills/pipeline/dataset.py): New pipeline-aware dataset loader that handles cluster downloads/imports and local fallbacks, and provides get_dataset_module for cluster-enabled resolution.
  • Evaluator lazy loading (nemo_skills/evaluation/evaluator/__init__.py): Replaces eager evaluator imports with path-based maps and runtime import resolution (lazy loading) for evaluator functions/classes; adds resolver utilities and caches.
  • Pipeline import guard (nemo_skills/pipeline/__init__.py): Adds a runtime check using package metadata to ensure the full package is installed; raises ImportError if only core is present.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Core as nemo_skills.dataset.utils
    participant Pipeline as nemo_skills.pipeline.dataset
    participant Cluster

    rect rgba(100, 150, 200, 0.5)
    Note over User,Core: Local-only flow (default)
    User->>Core: get_dataset_module(dataset, data_dir=None)
    Core->>Core: import from nemo_skills.dataset or local path
    Core-->>User: return dataset module
    end

    rect rgba(200, 100, 150, 0.5)
    Note over User,Cluster: Cluster flow (deprecated in Core)
    User->>Core: get_dataset_module(dataset, cluster_config=...)
    Core->>Core: emit DeprecationWarning
    Core->>Pipeline: delegate get_dataset_module(...)
    Pipeline->>Cluster: fetch / download cluster module (remote)
    Cluster-->>Pipeline: module content / init.py
    Pipeline->>Core: imported module
    Core-->>User: return dataset module
    end
```
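The local-first branch of the diagram reduces to a try-import with a clear error, roughly as below. This is a standalone sketch: the real get_dataset_module also handles data_dir and extra_datasets, and the namespace argument here is only for illustration.

```python
import importlib


def get_local_module(dataset, namespace="nemo_skills.dataset"):
    """Local-first resolution: import from the packaged namespace or fail clearly."""
    try:
        return importlib.import_module(f"{namespace}.{dataset}")
    except ModuleNotFoundError as err:
        # Chain the original error so the traceback shows both failures.
        raise RuntimeError(f"Dataset {dataset} not found in {namespace}") from err
```

With the default namespace this requires nemo_skills to be installed; any importable package namespace behaves the same way.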

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • activatedgeek
  • Kipok
🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: docstring coverage is 64.29%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
✅ Passed checks (3 passed)
  • Description Check: skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: the title 'Add nemo-skills-core subpackage for lightweight installs' clearly summarizes the PR's main objective of introducing a lightweight core subpackage.
  • Merge Conflict Detection: no merge conflicts detected when merging into main.


@coderabbitai bot left a comment

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@nemo_skills/evaluation/evaluator/__init__.py`:
- Around line 113-117: The error message incorrectly labels
`_EVALUATOR_MAP_PATHS.keys()` as "All supported types" when it only contains
function-based evaluators; update the ValueError text in the raise block (the
code that references eval_type) to clearly distinguish class-based vs
function-based types by either listing both maps together (combine
`_EVALUATOR_CLASS_MAP_PATHS.keys()` and `_EVALUATOR_MAP_PATHS.keys()`) or
renaming the second label to "Function-based evaluator types" so users see
accurate descriptions of `_EVALUATOR_CLASS_MAP_PATHS` and
`_EVALUATOR_MAP_PATHS`.
- Around line 106-107: register_evaluator currently stores None into
_EVALUATOR_MAP_PATHS[eval_type], which will cause AttributeError when code later
iterates or calls _resolve expecting a path string; change register_evaluator so
it stores a sentinel string (e.g. "<dynamic>") into
_EVALUATOR_MAP_PATHS[eval_type] instead of None, and ensure any
resolution/display logic in _resolve/_get_evaluator_fn treats that sentinel as a
dynamic entry (or filters it out) so rsplit is only called on real path strings;
update references to _EVALUATOR_MAP_PATHS, register_evaluator,
_resolved_evaluator_map, _get_evaluator_fn, and _resolve accordingly.
- Line 137: Remove the leftover debug print statement print(f"evaluator:
{evaluator}") from the module (it should not be in production code); either
delete that line or replace it with an appropriate logger.debug call using the
module logger (e.g., logger.debug("evaluator: %s", evaluator)) so diagnostics
use the configured logging system and not stdout—locate the print by searching
for the exact string and update in the __init__ module where the evaluator
variable is in scope.
- Around line 93-94: Remove the debug print statement print(f"evaluator:
{evaluator}") from the module so it no longer emits debug output; locate the
temporary print in the evaluator initialization block near where EVALUATOR_MAP
and EVALUATOR_CLASS_MAP are set and delete that single line, leaving the maps
and the helper functions (_get_evaluator_fn, _get_evaluator_cls, evaluate,
get_evaluator_class) intact so iteration via EVALUATOR_MAP/EVALUATOR_CLASS_MAP
still works per the documented design.

In `@nemo_skills/pipeline/dataset.py`:
- Around line 60-62: The check uses cluster_config.get("executor") which masks a
missing-key error; change it to access the key directly
(cluster_config["executor"]) so missing executor raises immediately, and keep
the logic that if cluster_config is None or cluster_config["executor"] in (None,
"none") then return _get_local_dataset_module(dataset, data_dir); update any
related code paths that assume executor exists (e.g., the code around
get_unmounted_path in nemo_skills/pipeline/utils/mounts.py) to rely on the same
direct-access semantics to fail fast on misconfiguration.
🧹 Nitpick comments (6)
CONTRIBUTING.md (1)

56-59: Fenced code block missing language specifier.

Minor nit from markdownlint — adding a language (e.g., text) would silence MD040.

Proposed fix
-```
+```text
 Pipeline can import from Core.
 Core CANNOT import from Pipeline.
-```
+```
core/requirements.txt (1)

17-27: Section label "math evaluation" is misleading — several packages below it aren't math-specific.

mcp, numpy, openai, requests, rich, tqdm, and transformers are general-purpose dependencies, not math evaluation specific. Consider either reorganizing sections or using a broader label like # --- general / shared ---.

nemo_skills/pipeline/dataset.py (3)

39-51: Imported module outlives its backing file.

import_from_path is called inside a TemporaryDirectory context manager. Once the with block exits, the downloaded init.py is deleted, but the module object (and its __file__ attribute) still references the now-removed path. This works at runtime because CPython caches the compiled bytecode in memory, but it can cause confusing errors if any downstream code inspects module.__file__ or attempts a reload.

Consider moving the temp directory lifecycle to the caller or keeping it alive longer if module introspection is needed.


44-50: Chain the re-raised exception for clearer tracebacks.

Per the static analysis hint (B904), raise ... from err preserves the original traceback context.

Proposed fix
         try:
             cluster_download_file(cluster_config, cluster_dataset_path, tmp_path)
-        except FileNotFoundError:
-            raise RuntimeError(
+        except FileNotFoundError as err:
+            raise RuntimeError(
                 f"Init file {mounted_path} not found on the cluster. "
                 f"Please check the dataset name you're using. Did you forget to run prepare data commands?"
-            )
+            ) from err

109-113: Chain the re-raised RuntimeError for clearer tracebacks.

Same B904 pattern — add from err to preserve the original ModuleNotFoundError context.

Proposed fix
-        except ModuleNotFoundError:
-            raise RuntimeError(
+        except ModuleNotFoundError as err:
+            raise RuntimeError(
                 f"Dataset {dataset} not found in any of the searched locations: "
                 f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
-            )
+            ) from err
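As a standalone demonstration of what `from err` buys you (not the PR's code): the original exception is preserved on __cause__, so tracebacks read "The above exception was the direct cause of..." instead of the misleading "During handling of the above exception, another exception occurred".

```python
def load(name):
    """Toy loader that wraps a low-level error in a higher-level one."""
    try:
        raise ModuleNotFoundError(f"No module named {name!r}")
    except ModuleNotFoundError as err:
        # Chaining keeps the original error attached to the new one.
        raise RuntimeError(f"Dataset {name} not found") from err


try:
    load("mystery")
except RuntimeError as exc:
    # The low-level cause survives for debugging and tracebacks.
    assert isinstance(exc.__cause__, ModuleNotFoundError)
```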
nemo_skills/dataset/utils.py (1)

116-135: Chain re-raised exceptions for clearer tracebacks.

Same pattern as flagged in pipeline/dataset.py — the raise RuntimeError(...) statements at Lines 120 and 126 inside except clauses should use from to preserve the original exception context.

Proposed fix
     except ModuleNotFoundError:
         dataset = dataset.replace(".", "/")
         extra_datasets = extra_datasets or os.environ.get("NEMO_SKILLS_EXTRA_DATASETS")
         if extra_datasets is None:
-            raise RuntimeError(f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}")
+            raise RuntimeError(
+                f"Dataset {dataset} not found in {data_dir if data_dir else 'nemo_skills.dataset'}"
+            ) from None
         if extra_datasets_type == ExtraDatasetType.local or extra_datasets_type is None:
             with add_to_path(extra_datasets):
                 try:
                     dataset_module = importlib.import_module(dataset)
-                except ModuleNotFoundError:
-                    raise RuntimeError(
+                except ModuleNotFoundError as err:
+                    raise RuntimeError(
                         f"Dataset {dataset} not found in any of the searched locations: "
                         f"{data_dir if data_dir else 'nemo_skills.dataset'}, {extra_datasets}"
-                    )
+                    ) from err

Comment on lines +93 to +94
EVALUATOR_MAP = _EVALUATOR_MAP_PATHS
EVALUATOR_CLASS_MAP = _EVALUATOR_CLASS_MAP_PATHS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove debug print statement on line 137.

The print(f"evaluator: {evaluator}") should be removed.

Regarding the breaking change concern: The design explicitly intends EVALUATOR_MAP and EVALUATOR_CLASS_MAP to be used for iteration only (e.g., listing available types), with all actual access going through the helper functions (_get_evaluator_fn(), _get_evaluator_cls(), evaluate(), get_evaluator_class()). The comment on lines 90–92 documents this. No code in the repository accesses these maps via bracket notation, so there is no breaking change in practice.


Comment on lines +106 to +107
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn
⚠️ Potential issue | 🟡 Minor

register_evaluator stores None in the path map, which will break _resolve.

When a dynamically registered evaluator is later resolved via _get_evaluator_fn, it correctly hits the _resolved_evaluator_map cache (Line 77). However, if any code iterates _EVALUATOR_MAP_PATHS values and tries to resolve them (or displays them), the None entry will cause an AttributeError on rsplit. Storing a sentinel like "<dynamic>" or filtering None values in error messages would be safer.

Proposed fix
-    _EVALUATOR_MAP_PATHS[eval_type] = None
+    _EVALUATOR_MAP_PATHS[eval_type] = "<dynamically-registered>"
     _resolved_evaluator_map[eval_type] = eval_fn

Comment on lines 113 to 117
raise ValueError(
f"Evaluator class not found for type: {eval_type}.\n"
f"Available types with class support: {list(EVALUATOR_CLASS_MAP.keys())}\n"
f"All supported types: {list(EVALUATOR_MAP.keys())}"
f"Available types with class support: {list(_EVALUATOR_CLASS_MAP_PATHS.keys())}\n"
f"All supported types: {list(_EVALUATOR_MAP_PATHS.keys())}"
)
⚠️ Potential issue | 🟡 Minor

Misleading "All supported types" label — only shows function-based types.

Line 116 lists _EVALUATOR_MAP_PATHS.keys() under the label "All supported types", but it only includes function-based evaluators, not class-based ones. This is confusing for users hitting this error.

Proposed fix
         raise ValueError(
             f"Evaluator class not found for type: {eval_type}.\n"
             f"Available types with class support: {list(_EVALUATOR_CLASS_MAP_PATHS.keys())}\n"
-            f"All supported types: {list(_EVALUATOR_MAP_PATHS.keys())}"
+            f"All supported types: {sorted(list(_EVALUATOR_CLASS_MAP_PATHS.keys()) + list(_EVALUATOR_MAP_PATHS.keys()))}"
         )
🧰 Tools
🪛 Ruff (0.15.0)

[warning] 113-117: Avoid specifying long messages outside the exception class

(TRY003)


if eval_type in EVALUATOR_CLASS_MAP:
if eval_type in _EVALUATOR_CLASS_MAP_PATHS:
evaluator = get_evaluator_class(eval_type, eval_config)
print(f"evaluator: {evaluator}")
⚠️ Potential issue | 🟡 Minor

Debug print left in production code.

This looks like a leftover debug statement.

Proposed fix
-        print(f"evaluator: {evaluator}")

Comment on lines +60 to +62
if cluster_config is None or cluster_config.get("executor") in (None, "none"):
# Delegate to core for local-only loading
return _get_local_dataset_module(dataset, data_dir)
⚠️ Potential issue | 🟡 Minor

Use direct dictionary access for executor instead of .get().

cluster_config["executor"] is accessed directly elsewhere (e.g., Line 70, and in get_unmounted_path from nemo_skills/pipeline/utils/mounts.py), so executor is expected to be present. Using .get() here silently treats a missing key the same as None, masking a configuration error.

Proposed fix
-    if cluster_config is None or cluster_config.get("executor") in (None, "none"):
+    if cluster_config is None or cluster_config["executor"] in (None, "none"):

As per coding guidelines, "Do not use .get() for accessing dictionary keys if the code expects them to be present; use direct dictionary access dict[key] instead to allow proper error handling and fail fast with clear errors".


@greptile-apps bot left a comment

13 files reviewed, 2 comments

Comment on lines +106 to +107
_EVALUATOR_MAP_PATHS[eval_type] = None
_resolved_evaluator_map[eval_type] = eval_fn
Setting _EVALUATOR_MAP_PATHS[eval_type] = None is fragile. If _resolved_evaluator_map gets cleared or doesn't contain eval_type, _get_evaluator_fn will call _resolve(None) and crash with ValueError: not enough values to unpack.

The current implementation works only because the function is immediately added to _resolved_evaluator_map, but this implicit dependency is error-prone. Consider either:

  1. Not setting _EVALUATOR_MAP_PATHS[eval_type] at all (just use _resolved_evaluator_map)
  2. Setting it to a sentinel string that provides a better error message if accidentally resolved

@greptile-apps bot commented Feb 13, 2026

Additional Comments (1)

requirements/main.txt
rich is in core/requirements.txt but missing from requirements/main.txt, violating the rule that "all core and pipeline deps must also appear in requirements/main.txt"

rich
