[Tools] Add Support for RWKV Model Evaluation #880
Conversation
Summary of Changes

Hello @yanqingliu35-stack, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces native support for evaluating RWKV language models within a standardized framework. By adding a new script and a custom adapter, it enables seamless integration with the lm-eval evaluation harness.
Code Review
This pull request adds a new script for evaluating RWKV models using lm-eval. The script provides a good starting point by wrapping the RWKV model in an adapter compatible with the evaluation harness. My review includes several suggestions to improve the script's correctness, robustness, and maintainability. The most critical issue is that the script, as written, will not run any evaluation tasks because the task list is empty. I've also pointed out a bug where a global model variable is used instead of the instance's self.model, and suggested improvements like using dependency injection and enhancing command-line argument parsing for better flexibility.
tools/checkpoint/rwkv/run_lm_eval.py
Outdated
```python
eval_tasks = [
#'lambada_openai', 'piqa', 'storycloze_2016', 'hellaswag', 'winogrande',
#'arc_challenge', 'arc_easy', 'headqa_en', 'openbookqa', 'sciq',
#'mmlu','glue']
```
The eval_tasks list is currently empty because all task names are commented out. This will cause the script to run without performing any evaluations. To make the script functional, you should uncomment the tasks you intend to run.
Additionally, there's a typo on line 40: a full-width comma (，) is used instead of a standard comma (,), which would cause a SyntaxError if that line is uncommented. I've corrected it in the suggestion.
```python
eval_tasks = [
    'lambada_openai', 'piqa', 'storycloze_2016', 'hellaswag', 'winogrande',
    #'arc_challenge', 'arc_easy', 'headqa_en', 'openbookqa', 'sciq',
    #'mmlu', 'glue']
```

```python
all_tokens = []
state = None

out, state = model.forward(context_tokens, state)
```
```python
break

all_tokens.append(token)
out, state = model.forward([token], state)
```
Both forward calls quoted above go through the module-level model global rather than the instance's self.model. As noted in the review summary, methods on EvalHarnessAdapter should use self.model so the adapter works with whatever model it was constructed with.
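A minimal sketch of a corrected generation helper on EvalHarnessAdapter, assuming greedy decoding; the method name, signature, and stop condition are illustrative assumptions, not the PR's actual code:

```python
def greedy_generate(self, context_tokens, max_new_tokens=64, stop_token=0):
    """Hypothetical helper: generate tokens greedily via the instance's model."""
    all_tokens = []
    # Prime the recurrent state on the full context (self.model, not the global).
    out, state = self.model.forward(context_tokens, None)
    for _ in range(max_new_tokens):
        token = int(out.argmax())  # greedy: pick the most likely next token
        if token == stop_token:
            break
        all_tokens.append(token)
        # Feed back the single new token, threading the RWKV state through.
        out, state = self.model.forward([token], state)
    return all_tokens
```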
```python
########################################################################################################
# pip install rwkv lm_eval --upgrade

import os, sys, types, json, math, time
```
```python
if len(sys.argv) < 2:
    print("Usage: python your_script_name.py /path/to/your/model.pth")
    sys.exit(1)

MODEL_NAME = sys.argv[1]

print(f'Loading model - {MODEL_NAME}')
model = RWKV(model=MODEL_NAME, strategy='cuda fp16')
pipeline = PIPELINE(model, "rwkv_vocab_v20230424")

eval_tasks = [
#'lambada_openai', 'piqa', 'storycloze_2016', 'hellaswag', 'winogrande',
#'arc_challenge', 'arc_easy', 'headqa_en', 'openbookqa', 'sciq',
#'mmlu','glue']
num_fewshot = 0
```
The script currently uses basic command-line argument parsing for the model path only, and hardcodes many important parameters like strategy, vocab, eval_tasks, and num_fewshot. This makes the script less flexible. Consider using Python's argparse module to allow users to configure these settings from the command line. This would make the script more powerful and easier to use for different evaluation scenarios.
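A minimal sketch of the suggested argparse-based CLI; the flag names are illustrative, and the defaults mirror the values currently hardcoded in the script:

```python
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Evaluate an RWKV model with lm-eval.")
    parser.add_argument("model_path",
                        help="Path to the RWKV checkpoint (.pth)")
    parser.add_argument("--strategy", default="cuda fp16",
                        help="RWKV loading strategy, e.g. 'cuda fp16' or 'cpu fp32'")
    parser.add_argument("--vocab", default="rwkv_vocab_v20230424",
                        help="Tokenizer vocabulary name passed to PIPELINE")
    parser.add_argument("--tasks", nargs="+", default=["lambada_openai"],
                        help="lm-eval task names to run")
    parser.add_argument("--num-fewshot", type=int, default=0,
                        help="Number of few-shot examples per task")
    return parser.parse_args()
```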
```python
def __init__(self):
    super().__init__()
    self.tokenizer = pipeline.tokenizer
    self.model = model
```
The EvalHarnessAdapter's __init__ method relies on global variables pipeline and model. It's a better practice to pass these dependencies explicitly to the constructor (dependency injection). This improves modularity, reusability, and makes the class easier to test. After this change, you'll also need to update how the adapter is instantiated on line 160 to adapter = EvalHarnessAdapter(model, pipeline.tokenizer).
```python
def __init__(self, model, tokenizer):
    super().__init__()
    self.tokenizer = tokenizer
    self.model = model
```
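With that change, the instantiation becomes, as the comment above suggests:

```python
adapter = EvalHarnessAdapter(model, pipeline.tokenizer)
```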
This PR introduces native support for evaluating RWKV models.

How to use
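Per the install comment and usage message in the script itself (the invocation path below assumes the file lands at the location shown in the review):

```
pip install rwkv lm_eval --upgrade
python tools/checkpoint/rwkv/run_lm_eval.py /path/to/your/model.pth
```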