fix: resolve retrieval dataset corpus paths relative to training file#1367

Merged
adil-a merged 2 commits into main from
oholworthy/retrieval-dataset-resolve-relative-corpus
Feb 24, 2026

Conversation

@oliverholworthy
Contributor

  • Relative corpus paths in retrieval training JSON files are now resolved relative to the JSON file's directory, not the current working directory
  • This makes training data portable across machines and containers without needing to rewrite corpus paths

Changelog

  • load_datasets() in retrieval_dataset.py: resolve relative path entries in the corpus config before passing them to add_corpus()
  • Added unit test covering the relative path resolution
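The resolution step described above can be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the helper name resolve_corpus_config and the assumed shape of the corpus config (a list of dicts, each with a "path" key) are hypothetical.

```python
import os


def resolve_corpus_config(corpus_config, train_file):
    """Hypothetical helper: anchor relative "path" entries at the training file's directory.

    Assumes corpus_config is a list of dicts with a "path" key; the real
    config shape in retrieval_dataset.py may differ.
    """
    base_dir = os.path.dirname(os.path.abspath(train_file))
    resolved = []
    for entry in corpus_config:
        entry = dict(entry)  # copy so the caller's config is not mutated
        path = entry["path"]
        if not os.path.isabs(path):
            # Join against the JSON file's directory, not os.getcwd().
            path = os.path.normpath(os.path.join(base_dir, path))
        entry["path"] = path
        resolved.append(entry)
    return resolved
```

Absolute paths pass through unchanged, so existing training files that already use absolute corpus paths would behave as before under this sketch.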

Tests

  • New unit test test_load_datasets_resolves_relative_corpus_path
  • All existing test_retrieval_dataset.py tests pass
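For readers who want to see what such a test might check, here is a rough, self-contained sketch. It does not reproduce test_load_datasets_resolves_relative_corpus_path; load_corpus_paths is a hypothetical stand-in for load_datasets(), and the {"corpus": [{"path": ...}]} layout is an assumption about the training JSON.

```python
import json
import os
import tempfile


def load_corpus_paths(train_file):
    """Stand-in loader (hypothetical, not the real load_datasets())."""
    with open(train_file) as f:
        data = json.load(f)
    base_dir = os.path.dirname(os.path.abspath(train_file))
    # os.path.join keeps absolute entries as-is and anchors relative ones at base_dir.
    return [
        os.path.normpath(os.path.join(base_dir, entry["path"]))
        for entry in data["corpus"]
    ]


def check_relative_corpus_path_is_resolved():
    with tempfile.TemporaryDirectory() as tmp_dir:
        data_dir = os.path.join(tmp_dir, "data")
        os.makedirs(os.path.join(data_dir, "corpus"))
        train_file = os.path.join(data_dir, "train.json")
        with open(train_file, "w") as f:
            json.dump({"corpus": [{"path": "corpus"}]}, f)

        # Load from a different working directory to show that cwd is irrelevant.
        previous_cwd = os.getcwd()
        os.chdir(tmp_dir)
        try:
            resolved = load_corpus_paths(train_file)
        finally:
            os.chdir(previous_cwd)

        expected = os.path.normpath(os.path.join(data_dir, "corpus"))
        return resolved == [expected]
```

The key point is that the training file and its corpus directory sit side by side, while the process runs from somewhere else.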

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?

If you haven't finished some of the above items, you can still open a "Draft" PR.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
@oliverholworthy oliverholworthy force-pushed the oholworthy/retrieval-dataset-resolve-relative-corpus branch from e1a77c6 to 4359e8a Compare February 24, 2026 16:27
@oliverholworthy
Contributor Author

/ok to test f2938d7

Collaborator

@adil-a adil-a left a comment

LGTM, thanks @oliverholworthy

@adil-a adil-a merged commit 9183cb0 into main Feb 24, 2026
51 checks passed
@adil-a adil-a deleted the oholworthy/retrieval-dataset-resolve-relative-corpus branch February 24, 2026 19:42

2 participants