Detection of non-zero-shot models from the annotations we have #2158
Replies: 3 comments · 2 replies
-
I HAVE NOT RIGOROUSLY CROSS-VALIDATED MY RESULTS, THIS MIGHT BE INCORRECT
-
I think this is related to issue #1636
-
Should we examine discrepancies on a held-out group of models to see what the classifier misses, what it gets wrong, and what it gets right? (Just glancing over these, there seem to be a lot of false positives; we might adjust the threshold to address this.) Will you also share the script? E.g., just drop it in the scripts folder in a branch.
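For concreteness, here is a minimal sketch of the kind of held-out check and threshold adjustment suggested above, assuming the classifier setup described in the post below. The names (`annotated`, the `model`/`score_z`/`n_parameters`/`is_zero_shot` columns, the hyperparameters, and the 0.8 threshold) are hypothetical placeholders, not part of any existing script.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# `annotated` is assumed to be a per-(model, task) DataFrame with z-scored scores,
# n_parameters, and a trusted is_zero_shot annotation.
rng = np.random.default_rng(42)
models = annotated["model"].unique()
holdout_models = rng.choice(models, size=max(1, len(models) // 5), replace=False)

# Hold out an entire group of models so their annotations never enter training.
train_df = annotated[~annotated["model"].isin(holdout_models)]
holdout_df = annotated[annotated["model"].isin(holdout_models)]

features = ["score_z", "n_parameters"]
clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(train_df[features], train_df["is_zero_shot"].astype(bool))

# Column 0 of predict_proba corresponds to the False class (not zero-shot),
# i.e. the probability that the pair was trained on the task.
p_trained = clf.predict_proba(holdout_df[features])[:, 0]

threshold = 0.8  # stricter than the default 0.5, trading recall for fewer false positives
pred_trained = p_trained >= threshold
true_trained = ~holdout_df["is_zero_shot"].astype(bool)

# Shows what the classifier misses, what it gets wrong, and what it gets right.
print(classification_report(true_trained, pred_trained))
```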
-
Since we have quite a few annotations of which datasets models were and weren't trained on, I thought it might make sense to look for patterns and try to identify models that were likely trained on benchmark datasets.
I have trained a random forest classifier on z-score-normalized scores and n_parameters to predict whether a model is zero-shot, and then made predictions for all models lacking annotations.
I now have a list of (model, task) pairs where one could suspect that the model has been trained on the benchmark task.
The results should definitely be interpreted with a pinch of salt.
While some of them seem unlikely (I highly doubt that gte-Qwen1.5-7B-instruct has been trained on DKHateClassification, for instance), a lot of these seem very reasonable and confirm my intuition about what some of these models might have been trained on.
It is also reassuring that, for instance, for Linq embedding, where we know our annotations were incorrect, some of the tasks are marked as having been trained on.
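For reference, here is a minimal sketch of the setup described above (not the actual script). The input file name, the column names (`model`, `task`, `score`, `n_parameters`, `is_zero_shot`), and the hyperparameters are all assumptions for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical export of per-(model, task) scores plus existing zero-shot annotations.
df_results = pd.read_csv("mteb_results_with_annotations.csv")

def zscore_per_task(df: pd.DataFrame) -> pd.DataFrame:
    """z-score normalize raw scores within each task so models are comparable across tasks."""
    df = df.copy()
    df["score_z"] = df.groupby("task")["score"].transform(
        lambda s: (s - s.mean()) / s.std(ddof=0)
    )
    return df

df = zscore_per_task(df_results)
annotated = df[df["is_zero_shot"].notna()]  # (model, task) pairs with an annotation
unlabeled = df[df["is_zero_shot"].isna()]   # pairs we want predictions for

features = ["score_z", "n_parameters"]
clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(annotated[features], annotated["is_zero_shot"].astype(bool))

# Column order of predict_proba follows clf.classes_ ([False, True] for boolean labels),
# so column 0 is the probability that the pair is NOT zero-shot, i.e. trained on the task.
p_trained = clf.predict_proba(unlabeled[features])[:, 0]
suspects = unlabeled.assign(p_trained=p_trained).sort_values("p_trained", ascending=False)
print(suspects[["model", "task", "p_trained"]].head(20))
```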
cc. @KennethEnevoldsen @isaac-chung @Samoed @tomaarsen @Muennighoff