[reranker] refactor reranker #7334
Summary of Changes
Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request fixes a bug in the calculation of reranker metrics by refining how logits are processed and selected before evaluation. The changes streamline the logic for handling …
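For context, a generative reranker is commonly scored by reading the next-token logits at the last position of each sequence and comparing the logits of a positive and a negative token (for example 'yes' and 'no'). The following is a minimal PyTorch sketch, assuming right-padded batches and a score defined as logit(yes) - logit(no); select_reranker_logits is a hypothetical helper for illustration, not the function changed in this PR.

```python
import torch


def select_reranker_logits(logits, attention_mask, positive_token_id, negative_token_id):
    """Return one relevance score per sequence from generative LM logits (hypothetical helper).

    logits: [batch, seq_len, vocab]
    attention_mask: [batch, seq_len], assumed right-padded (1 = real token, 0 = padding)
    """
    # Index of the last real (non-padding) token in each right-padded sequence.
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(logits.shape[0], device=logits.device)
    last_logits = logits[batch_idx, last_idx]  # [batch, vocab]
    # Score = logit(positive) - logit(negative): the log-odds of a 2-way softmax over the two tokens.
    return last_logits[:, positive_token_id] - last_logits[:, negative_token_id]


# Usage: vocab of 5 tokens, positive ID 1, negative ID 2; sequence lengths 3 and 2.
logits = torch.zeros(2, 3, 5)
logits[0, 2, 1] = 2.0  # last token of sequence 0 favors the positive token
logits[1, 1, 2] = 1.0  # last token of sequence 1 favors the negative token
mask = torch.tensor([[1, 1, 1], [1, 1, 0]])
scores = select_reranker_logits(logits, mask, positive_token_id=1, negative_token_id=2)
print(scores)  # tensor([ 2., -1.])
```

Taking the difference of the two logits equals the log-odds of a softmax restricted to the positive and negative tokens, so applying a sigmoid to the score gives the probability of the positive token.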
Code Review
This pull request refactors the reranker metric calculation by moving the padding_free logic into the _preprocess_generative_reranker_logits function. While this is a good structural change, the implementation in _preprocess_generative_reranker_logits introduces a potential runtime error by not correctly handling unexpected logit shapes. I've provided a suggestion to fix this issue while retaining the intended refactoring.
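The shape handling this review refers to can be illustrated with a hypothetical sketch. It assumes that in padding-free mode all sequences are packed into a single row and that their boundaries are given as cumulative lengths (called cu_seqlens here, borrowing FlashAttention's varlen naming); the actual implementation may recover boundaries differently, for example from position_ids. Any unexpected layout raises a ValueError instead of failing later with an obscure indexing error.

```python
import torch


def last_token_logits(logits, attention_mask=None, cu_seqlens=None):
    """Pick the logits of the final token of every sequence (hypothetical helper).

    logits: [batch, seq_len, vocab] when padded, or [1, total_tokens, vocab] when padding-free.
    cu_seqlens (assumed name): cumulative lengths [0, n0, n0 + n1, ...] of the packed sequences.
    """
    if logits.dim() != 3:
        # Fail loudly instead of silently indexing a tensor with an unexpected layout.
        raise ValueError(f'Expected 3-D logits [batch, seq_len, vocab], got shape {tuple(logits.shape)}')
    if cu_seqlens is not None:
        # Padding-free: every sequence is packed into a single row.
        if logits.shape[0] != 1:
            raise ValueError(f'Padding-free logits must have batch size 1, got {logits.shape[0]}')
        last_idx = cu_seqlens[1:] - 1  # end position of each packed sequence
        return logits[0, last_idx]  # [num_seqs, vocab]
    if attention_mask is None:
        return logits[:, -1]  # no padding information: assume the last position is a real token
    # Right-padded batch (assumption): the last real token sits at length - 1.
    last_idx = attention_mask.sum(dim=1) - 1
    batch_idx = torch.arange(logits.shape[0], device=logits.device)
    return logits[batch_idx, last_idx]


# Usage: two sequences of lengths 2 and 3 packed into one row of 5 tokens.
packed = torch.zeros(1, 5, 3)
packed[0, 1] = torch.tensor([1.0, 0.0, 0.0])  # last token of sequence 0
packed[0, 4] = torch.tensor([0.0, 0.0, 1.0])  # last token of sequence 1
selected = last_token_logits(packed, cu_seqlens=torch.tensor([0, 2, 5]))
print(selected.shape)  # torch.Size([2, 3])
```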
/gemini review
Code Review
This pull request refactors the handling of generative rerankers by centralizing the logit processing logic into a new utility function, get_generative_reranker_logits. This significantly simplifies the code in the trainer, loss functions, and inference engine, reducing duplication and improving maintainability.
The changes are well-structured, but I've identified a critical issue in the inference engine where the new utility function is called with incorrect arguments, which will lead to a runtime error. I've also suggested a minor improvement for error handling in the new utility function to make it more robust.
Overall, this is a great refactoring. Once the identified issues are addressed, the code will be much cleaner.
```python
positive_token_id = tokenizer.convert_tokens_to_ids(positive_token)
negative_token_id = tokenizer.convert_tokens_to_ids(negative_token)
```
The tokenizer.convert_tokens_to_ids call can fail if the positive or negative tokens (e.g., 'yes'/'no') are not present in the tokenizer's vocabulary. The previous implementation handled this with a try...except block. It's good practice to add similar error handling here to provide a more informative error message to the user.
Suggested change:
```diff
-positive_token_id = tokenizer.convert_tokens_to_ids(positive_token)
-negative_token_id = tokenizer.convert_tokens_to_ids(negative_token)
+try:
+    positive_token_id = tokenizer.convert_tokens_to_ids(positive_token)
+    negative_token_id = tokenizer.convert_tokens_to_ids(negative_token)
+except Exception as e:
+    raise ValueError(
+        f"Failed to convert reranker tokens '{positive_token}' or '{negative_token}' to IDs. "
+        f"Please check if these tokens exist in the tokenizer vocabulary. Error: {e}"
+    ) from e
```
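A try...except only helps if the lookup actually raises. In common Hugging Face tokenizers, convert_tokens_to_ids typically maps an out-of-vocabulary token to unk_token_id (or to None when the tokenizer defines no unknown token) rather than raising, so checking the returned ID may also be worthwhile. The sketch below assumes that behavior; resolve_reranker_token_ids and ToyTokenizer are hypothetical names for illustration, not part of this PR.

```python
def resolve_reranker_token_ids(tokenizer, positive_token='yes', negative_token='no'):
    """Map the reranker's positive/negative tokens to IDs, failing loudly if either is missing."""
    unk_id = getattr(tokenizer, 'unk_token_id', None)
    token_ids = []
    for token in (positive_token, negative_token):
        token_id = tokenizer.convert_tokens_to_ids(token)
        # Out-of-vocabulary tokens often come back as None or the unknown-token ID instead of raising.
        if token_id is None or (unk_id is not None and token_id == unk_id):
            raise ValueError(
                f"Reranker token '{token}' is not in the tokenizer vocabulary "
                f"(got ID {token_id}). Please check the positive/negative token settings.")
        token_ids.append(token_id)
    return token_ids[0], token_ids[1]


class ToyTokenizer:
    """Hypothetical stand-in that mimics the unknown-token fallback, for demonstration only."""
    unk_token_id = 0

    def __init__(self, vocab):
        self.vocab = vocab

    def convert_tokens_to_ids(self, token):
        return self.vocab.get(token, self.unk_token_id)


tok = ToyTokenizer({'<unk>': 0, 'yes': 1, 'no': 2})
print(resolve_reranker_token_ids(tok))  # (1, 2)
```

Calling resolve_reranker_token_ids(tok, 'yes', 'maybe') raises a ValueError, whereas a bare try...except around the lookup would silently accept the unknown-token ID.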
No description provided.