@hh-space-invader hh-space-invader commented Nov 8, 2024

Refactored how markers are added in ColBERT models: the ticks we used to add are replaced with query and document markers.

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  • Does your submission pass the existing tests?
  • Have you added tests for your feature?
  • Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

New models submission:

  • Have you added an explanation of why it's important to include this model?
  • Have you added tests for the new model? Were canonical values for tests computed via the original model?
  • Have you added the code snippet for how canonical values were computed?
  • Have you successfully run tests with your changes locally?

Comment on lines 78 to 80
if not is_doc:
    onnx_input["input_ids"] = onnx_input["input_ids"][:, :original_length]
    onnx_input["attention_mask"] = onnx_input["attention_mask"][:, :original_length]
Member

Why are we cutting back to original_length? I thought we should do that only if we exceed the model's context window size.

Contributor Author

But then there would be an output mismatch: the length would be 33, whereas 32 is expected, since that's the minimum query length.
Please note that I'm inserting the query/document marker into the array, not replacing an existing element.
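
To illustrate the point about inserting versus replacing, here is a minimal sketch of the behaviour under discussion. It is not the code from this PR: the marker token IDs, the `insert_marker` helper, and the choice of position 1 (right after the first token) are assumptions made for the example. Only the trimming branch mirrors the lines under review.

```python
import numpy as np

# Hypothetical marker token IDs, chosen only for this sketch.
QUERY_MARKER_ID = 1001
DOC_MARKER_ID = 1002


def insert_marker(onnx_input: dict, is_doc: bool) -> dict:
    """Insert a query/document marker and, for queries, trim back to the original length."""
    original_length = onnx_input["input_ids"].shape[1]
    marker_id = DOC_MARKER_ID if is_doc else QUERY_MARKER_ID

    # Inserting (not replacing) grows every row by one token.
    # Position 1 is an assumption: right after the first (e.g. [CLS]) token.
    input_ids = np.insert(onnx_input["input_ids"], 1, marker_id, axis=1)
    attention_mask = np.insert(onnx_input["attention_mask"], 1, 1, axis=1)

    if not is_doc:
        # A query padded to the minimum length would otherwise come out one token longer,
        # so cut it back to the length it had before the insertion.
        input_ids = input_ids[:, :original_length]
        attention_mask = attention_mask[:, :original_length]

    return {"input_ids": input_ids, "attention_mask": attention_mask}


# A query already at the minimum query length of 32 keeps that length.
query = {
    "input_ids": np.arange(32).reshape(1, 32),
    "attention_mask": np.ones((1, 32), dtype=np.int64),
}
query_out = insert_marker(query, is_doc=False)
print(query_out["input_ids"].shape)
```

In this sketch the trim drops the last position of the row, which for a short query padded to 32 tokens would be a padding slot. A document input is left one token longer than before.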

@joein force-pushed the change-handling-colbert-markers branch from 7e3d715 to 0424528 on November 13, 2024 08:46
@joein self-requested a review on November 13, 2024 09:23
@joein merged commit 7c93571 into main on November 13, 2024
@joein deleted the change-handling-colbert-markers branch on November 13, 2024 09:23