Fix output path argument bug in build_sequences_per_dataset script #4143

@DhineshPonnarasan

Summary

The script build_sequences_per_dataset.py defines the CLI argument --per-dataset-sequences-path, but when it saves its output it reads the path from a different attribute name that argparse never defines. This causes an AttributeError at runtime when the script tries to write the output file.

Problem

In build_sequences_per_dataset.py, the script uses args.path_to_sequences_per_dataset_json, but that field is never defined by argparse.

The defined argument is:

--per-dataset-sequences-path
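
For context, argparse derives the attribute name from the long option by stripping the leading "--" and replacing the remaining "-" with "_". The sketch below uses a hypothetical one-argument parser (the real script defines more arguments) to show which attribute name the flag actually produces:

```python
import argparse

# Hypothetical minimal parser; only the flag named in this issue is modeled.
parser = argparse.ArgumentParser()
parser.add_argument("--per-dataset-sequences-path", type=str)

args = parser.parse_args(["--per-dataset-sequences-path", "out.json"])

# The parsed namespace exposes exactly one attribute, derived from the flag.
defined_names = sorted(vars(args))
defined_value = args.per_dataset_sequences_path
print(defined_names, defined_value)
```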

Steps to Reproduce

  1. Run the script with valid data arguments and:
    --per-dataset-sequences-path out.json
  2. Let the script reach the output-writing section.
  3. Observe an AttributeError: the parsed arguments have no attribute path_to_sequences_per_dataset_json.
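
The failure can also be reproduced without the real data arguments. This is a minimal sketch, assuming a stand-in parser that defines only the output flag; the function name reproduce is hypothetical and not part of the script:

```python
import argparse


def reproduce():
    """Mimic the buggy attribute access and report the resulting error."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--per-dataset-sequences-path", type=str)
    args = parser.parse_args(["--per-dataset-sequences-path", "out.json"])
    try:
        # Same attribute name the issue reports in the output-writing section.
        return args.path_to_sequences_per_dataset_json
    except AttributeError as exc:
        return f"AttributeError: {exc}"


message = reproduce()
print(message)
```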

Expected Behavior

The script should write JSON output to the path provided by --per-dataset-sequences-path and finish successfully.

Proposed Fix

  1. Replace:

    args.path_to_sequences_per_dataset_json

    with:

    args.per_dataset_sequences_path

    in:

    • build_sequences_per_dataset.py
  2. Optional cleanup:

    • Fix malformed help text so train, valid, test key names are formatted correctly.
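
A rough sketch of the corrected output-writing step follows. The helper name write_sequences, the sample dictionary, and the wording of the success message are assumptions for illustration; only the attribute name args.per_dataset_sequences_path comes from this issue.

```python
import argparse
import json
import os
import tempfile


def write_sequences(args, sequences_per_dataset):
    """Write the result to the path given by --per-dataset-sequences-path."""
    out_path = args.per_dataset_sequences_path  # attribute argparse defines
    with open(out_path, "w") as f:
        json.dump(sequences_per_dataset, f, indent=2)
    # Report the same path that was actually written.
    print(f"Saved sequences per dataset to {out_path}")
    return out_path


parser = argparse.ArgumentParser()
parser.add_argument("--per-dataset-sequences-path", type=str)

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "out.json")
    args = parser.parse_args(["--per-dataset-sequences-path", target])
    sample = {"train": {"dataset_a": 10}}  # placeholder data, not real output
    written_path = write_sequences(args, sample)
    with open(written_path) as f:
        loaded = json.load(f)
    path_matches = written_path == target
```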

Acceptance Criteria

  • Script runs without AttributeError.
  • Output JSON file is created at the path passed to --per-dataset-sequences-path.
  • Console success message prints the correct output path.
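
One possible way to guard the first criterion against regressions is a static check that every attribute read from args is one the parser defines. The sketch below is an assumption, not part of the proposed fix: the helper undefined_attributes and the two source snippets are hypothetical, and the check only catches direct accesses of the form args.name.

```python
import argparse
import ast


def undefined_attributes(source, namespace, var_name="args"):
    """Return attribute names read from `var_name` that the namespace lacks."""
    defined = set(vars(namespace))
    used = {
        node.attr
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Attribute)
        and isinstance(node.value, ast.Name)
        and node.value.id == var_name
    }
    return used - defined


parser = argparse.ArgumentParser()
parser.add_argument("--per-dataset-sequences-path", type=str)
args = parser.parse_args(["--per-dataset-sequences-path", "out.json"])

# Stand-ins for the output-writing code before and after the fix.
buggy_source = "out = open(args.path_to_sequences_per_dataset_json, 'w')\n"
fixed_source = "out = open(args.per_dataset_sequences_path, 'w')\n"

buggy_missing = undefined_attributes(buggy_source, args)
fixed_missing = undefined_attributes(fixed_source, args)
print(buggy_missing, fixed_missing)
```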
