implement LLM benchmark #1129
Codecov Report
❌ Patch coverage is
@@            Coverage Diff             @@
##           master    #1129      +/-   ##
==========================================
+ Coverage   72.87%   72.95%   +0.08%
==========================================
  Files          79       79
  Lines       15199    15240      +41
==========================================
+ Hits        11076    11119      +43
+ Misses       3019     3017       -2
  Partials     1104     1104
…to CFSplit interface
Pull request overview
This PR implements LLM benchmarking capabilities for the Gorse recommendation system by adding timestamp tracking to feedback data and creating evaluation functions for three model types: collaborative filtering (CF), attentional factorization machines (AFM), and LLM-based rankers.
Changes:
- Added timestamp tracking to the dataset structure to enable temporal data splitting
- Implemented the SplitLatest() method to create train/test splits based on the most recent feedback per user
- Created a comprehensive benchmark command with evaluation functions for CF models, AFM, and LLM rankers
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| dataset/dataset.go | Added timestamps field and tracking, implemented SplitLatest method for temporal splitting, added GetItems to CFSplit interface |
| dataset/dataset_test.go | Updated tests for AddFeedback signature change, added test coverage for SplitLatest method |
| model/cf/evaluator_test.go | Updated AddFeedback call to include timestamp parameter |
| master/tasks.go | Updated AddFeedback call to pass actual timestamp from feedback |
| logics/chat.go | Added ChatTemplateKwargs to disable thinking mode for chat completions |
| common/parallel/parallel_test.go | Increased test sleep duration from 100ms to 1 second |
| cmd/gorse-benchmark/main.go | Implemented comprehensive benchmark command with EvaluateCF, EvaluateAFM, EvaluateLLM functions and CTR dataset splitting logic |
…tLatest, update LLM evaluation, and integrate table output for metrics
Pull request overview
Copilot reviewed 8 out of 9 changed files in this pull request and generated 8 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…aluateAFM, ensure proper error handling, and streamline output formatting
… handling, and log execution duration
No description provided.