Refactor single_task_bench to remove compression references (to fix issue #250) #259
base: main
Conversation
Updated paths in YAML files to remove references to 'single_task_bench_with_compression' and use the correct 'single_task_bench' directories. Added a new edge_prompts.jsonl dataset for benchmarking. Improved LlamaCppModel with a stub fallback and environment-based sample limiting. Fixed throughput metric calculation to handle milliseconds correctly and avoid division by zero.
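For readers skimming the description, here is a minimal sketch of the throughput correction it mentions. The function name and signature are assumptions; only the milliseconds-to-seconds conversion and the zero guard come from this PR:

```python
def throughput(num_samples, total_latency_ms):
    """Samples processed per second.

    Illustrative version of the fix: `total_latency` is reported in
    milliseconds, so convert to seconds before dividing, and return 0.0
    for a zero/negative latency instead of raising ZeroDivisionError.
    """
    total_latency_s = total_latency_ms / 1000.0
    if total_latency_s <= 0:
        return 0.0
    return num_samples / total_latency_s
```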
Welcome @NIKOPACK! It looks like this is your first PR to kubeedge/ianvs 🎉
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
Summary of Changes
Hello @NIKOPACK, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request focuses on refining the LLM edge benchmarking suite by standardizing configuration paths, introducing a new dataset for relevant prompts, and enhancing the robustness and control of the LlamaCppModel. It also includes critical fixes for metric calculations to ensure accuracy and prevent errors, alongside adjustments to metric prioritization in the benchmarking job definition.
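For the metric-prioritization change mentioned above, a sketch of what the reordered rank section could look like. The `sort_by` shape follows the pattern used in other ianvs `benchmarkingjob.yaml` files; treat the exact keys as assumptions rather than the literal diff:

```yaml
rank:
  # Lower latency first, then higher throughput as a tie-breaker.
  sort_by: [ { "latency": "ascend" }, { "throughput": "descend" } ]
```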
Highlights
- Benchmarking Configuration Refinement: Paths in YAML files were updated to remove references to `single_task_bench_with_compression`, standardizing on the `single_task_bench` directories. This includes updates to `benchmarkingjob.yaml`, `algorithm.yaml`, and `testenv.yaml`.
- New Benchmarking Dataset: A new `edge_prompts.jsonl` dataset has been added, providing prompts for benchmarking edge-computing and LLM scenarios.
- LlamaCppModel Robustness and Control: `LlamaCppModel` now includes a stub fallback for `llama_cpp` import failures, preserving basic functionality even when the library is unavailable. Additionally, an environment variable (`IANVS_SAMPLE_LIMIT`) was introduced to limit the number of samples processed during prediction, useful for quick smoke tests (see the sketch after this list).
- Throughput Metric Correction: The throughput calculation in `throughput.py` now correctly handles `total_latency` in milliseconds by converting it to seconds, and a division-by-zero safeguard was added.
- Benchmarking Job Metric Prioritization: `benchmarkingjob.yaml` now sorts results by `latency` in ascending order and `throughput` in descending order, reflecting a preference for lower latency and higher throughput. `mem_usage` has also been explicitly added to the metrics collected.
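A minimal sketch of the environment-based sample limiting described in the highlights above. `IANVS_SAMPLE_LIMIT` is the variable named in this PR; the helper function, its name, and its placement are illustrative assumptions, not the literal `basemodel.py` code:

```python
import os

def apply_sample_limit(samples):
    """Optionally truncate benchmark inputs for quick smoke tests.

    Hypothetical helper: reads IANVS_SAMPLE_LIMIT (named in this PR)
    and caps how many samples predict() would process.
    """
    raw = os.getenv("IANVS_SAMPLE_LIMIT")
    if not raw:
        return samples
    try:
        limit = int(raw)
    except ValueError:
        # Ignore malformed values rather than failing the whole run.
        return samples
    return samples[:limit] if limit > 0 else samples
```

Exporting, say, `IANVS_SAMPLE_LIMIT=5` before launching the benchmark would then process only the first five prompts, which is handy before committing to a full run.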
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
This pull request effectively refactors the single_task_bench by removing outdated _with_compression references, which cleans up the benchmark suite. It also introduces several valuable improvements, such as adding a new dataset, making the sample limit configurable for easier testing, and providing a stub for the llama_cpp dependency. Most importantly, it corrects the metric sorting logic and fixes a critical bug in the throughput calculation. My review includes a minor suggestion to use a more specific exception when handling the optional llama_cpp import to improve code robustness.
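For context, the suggestion amounts to narrowing the fallback to import-related errors only. A sketch under the assumption that the stub is a no-op `Llama` replacement (the stub body here is illustrative; only the `ImportError` narrowing is what the review suggests):

```python
try:
    from llama_cpp import Llama
except ImportError:  # catch only import failures, not unrelated errors
    class Llama:  # minimal stand-in so the benchmark can still run end to end
        def __init__(self, *args, **kwargs):
            pass

        def __call__(self, prompt, **kwargs):
            # Mimic the llama-cpp-python completion response shape.
            return {"choices": [{"text": ""}]}
```

A bare `except` would silently mask unrelated failures (e.g., a typo inside the module), so catching `ImportError` keeps the fallback honest.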
examples/llm-edge-benchmark-suite/single_task_bench/testalgorithms/basemodel.py
Updated paths in YAML files to remove references to 'single_task_bench_with_compression' and use the correct 'single_task_bench' directories. Added a new edge_prompts.jsonl dataset for benchmarking. Improved LlamaCppModel with a stub fallback and environment-based sample limiting. Fixed throughput metric calculation to handle milliseconds correctly and avoid division by zero.
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
…rror
Signed-off-by: Aryan <nandaaryan823@gmail.com>
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
dataset instructions changed from hugging face to kaggle
matplotlib added to requirements.txt
Signed-off-by: Aryan Nanda <nandaaryan823@gmail.com>
changes in readme of cloud-edge-collaborative-inference done to use kaggle instead of huggingface
Signed-off-by: Aryan <nandaaryan823@gmail.com>
readme file updated
Signed-off-by: Aryan <nandaaryan823@gmail.com>
print changed to logger
Signed-off-by: Aryan <nandaaryan823@gmail.com>
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
…ge collaborative-inference-for-llm example
Updated the documentation of ianvs by replacing pcb-aoi with the cloud-edge-collaborative-inference-for-llm example.
Updated the ianvs Quick Start guide: replaced the PCB-AOI related content with cloud-edge-collaborative-inference-for-LLM.
how-to-use-ianvs-command-line updated to use cloud-edge-collaborative-inference-for-llm as an example instead of pcb-aoi.
Cloud-Edge-Collaborative-Inference-For-LLM scenario added to the Scenarios section of the docs, with details of the MMLU-5-Shot dataset.
Joint Inference: Query-Routing algorithm added to the algorithms section of the ianvs documentation.
benchmarking.yml in "How to build a simulation env" updated to use cloud-edge inference for llm instead of pcb-aoi.
testenv in how-to-contribute-test-environments.md updated to use cloud-edge-collaborative-inference-for-llm as an example instead of pcb-aoi.
how-to-contribute-algorithms updated to use cloud-edge-collaborative-inference-for-llm as an example as well.
images folder removed from docs/proposals/scenarios/cloud-edge-collaborative-inference.
how-to-test-algorithms updated to include the cloud-edge-collaborative-inference-for-llm example.
Leaderboard of the cloud-edge-collaborative-inference-for-llm scenario added.
"Testing Joint Inference Learning in Cloud Edge Collaborative Inference for LLM Scenario with Ianvs-MMLU-5-shot dataset" added.
user_interfaces guides updated to use cloud-edge-collaborative-inference-for-llm as an example instead of pcb-aoi.
index.rst updated to restructure leaderboards as per test-reports.
cloud-edge-collaborative-inference-for-llm design image added.
cloud-robotics-lifelong-learning scenario added.
dataset download instruction changed from hugging face to kaggle.
Version in how-to-install updated.
Signed-off-by: Aryan <nandaaryan823@gmail.com>
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
- Added pose-estimation module proposal
- Added industrialEI design documentation with high-quality diagrams
- Updated directory structure and documentation
Signed-off-by: anshRastogi02 <rastogiansh2@gmail.com>
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
Signed-off-by: NishantSinghhhhh <nishantsingh_230137@aitpune.edu.in>
Signed-off-by: NIKOPACK <ayanamirei@mail.com>
Please rebase your branch and only include your updates in this PR. Also, please test these changes and share an output screenshot; that would help. @NIKOPACK