Add extended task for LiveCodeBench codegeneration #548
NathanHB merged 23 commits into huggingface:main
Hi! Thanks for the PR.
Hi @plaguss @NathanHB, will it be possible to run this eval without needing a YAML file? The reason I ask is that all of our codebases assume one can run […]. Also, perhaps we can speed this up dramatically by using […]
Hi Lewis, I couldn't find a way of passing the generation parameters in the CLI, which seem relevant for this model. I can update the code to pass them through ARGS (it should be here unless there's already a better way, @NathanHB?)

NEW:

    lighteval vllm \
        "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,data_parallel_size=4,max_model_length=32768,gpu_memory_utilisation=0.8,generation_parameters={temperature:0.7,top_p:5}" \
        "extended|lcb:codegeneration|0|0" \
        --custom-tasks src/lighteval/tasks/extended/lcb/main.py \
        --output-dir $OUTPUT_DIR \
        --save-details

Now we could read the generation parameters from the model args following this pattern. Let me know what you both think.
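
To make the proposed pattern concrete, here is a minimal sketch of how a model-args string with an embedded `generation_parameters={...}` block could be split into plain model arguments and generation parameters. The function name `parse_model_args` and the parsing rules are assumptions for illustration only; this is not lighteval's actual parser, and it assumes all generation parameter values are numeric.

```python
import re


def parse_model_args(arg_string: str) -> tuple[dict[str, str], dict[str, float]]:
    """Hypothetical helper: split 'k=v,k=v,generation_parameters={a:1,b:2}'."""
    generation_parameters: dict[str, float] = {}

    # Pull out the braces section first, since it contains commas of its own.
    match = re.search(r"generation_parameters=\{([^}]*)\}", arg_string)
    if match:
        for item in match.group(1).split(","):
            key, value = item.split(":", 1)
            # Assumption: every generation parameter is a number.
            generation_parameters[key.strip()] = float(value)
        # Drop the section so remaining commas only separate top-level pairs.
        arg_string = arg_string[: match.start()] + arg_string[match.end():]

    model_args: dict[str, str] = {}
    for pair in arg_string.split(","):
        if pair.strip():  # skip the empty piece left by a trailing comma
            key, value = pair.split("=", 1)
            model_args[key.strip()] = value.strip()
    return model_args, generation_parameters


model_args, generation_parameters = parse_model_args(
    "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,"
    "generation_parameters={temperature: 0.7}"
)
print(model_args)
print(generation_parameters)
```

Extracting the braces section before splitting on commas is the design choice that matters here: a naive `split(",")` over the whole string would cut the generation parameters apart.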
Sure, I run it with […]
The 32B is still running due to an error, but the other values can be found here:
Great! Thanks for adding a way to pass generation params as args.
NathanHB left a comment:
Great work on this! The results look great! I was only wondering about dynamically changing the metric config at runtime, and whether you could add some docs.
Otherwise ready to merge :)
    with open(model_args, "r") as f:
        config = yaml.safe_load(f)["model"]
    model_args = config["base_params"]["model_args"]
    metric_options = config.get("metric_options", {})
Can you add some docs for this?
src/lighteval/pipeline.py (outdated)
    if metric_data := self._metric_options.get(metric.metric_name, None):
        num_samples = metric_data.get("num_samples", None)
        if num_samples:
            task.num_samples.append(num_samples)
Done. It had 2 bugs in fact, thanks! It now works as expected:

    for metric in task.metrics:
        if metric_data := self._metric_options.get(metric.metric_name, None):
            num_samples = metric_data.get("num_samples", None)
            if num_samples:
                task.num_samples = [num_samples]

```shell
lighteval vllm \
"pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,data_parallel_size=4,max_model_length=32768,gpu_memory_utilisation=0.8,generation_parameters={temperature: 0.7}" \
"extended|lcb:codegeneration|0|0" \
--use-chat-template
```

Adds a new extended task to run LiveCodeBench's codegeneration subset.

The results for deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B:

    lighteval vllm \
        "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16,data_parallel_size=4,max_model_length=32768,gpu_memory_utilisation=0.8,generation_parameters={temperature: 0.7}" \
        "extended|lcb:codegeneration|0|0" \
        --use-chat-template

with the yaml file like so:

    lighteval vllm \
        "lcb.yaml" \
        "extended|lcb:codegeneration|0|0" \
        --use-chat-template
    ...

| Task                          | Version | Metric | Value |   | Stderr |
|-------------------------------|--------:|--------|------:|---|-------:|
| all                           |         | maj@16 | 0.163 | ± | 0.0188 |
| extended:lcb:codegeneration:0 |       0 | maj@16 | 0.163 | ± | 0.0188 |

Note: This is just an idea, not sure it's the best approach.
Additionally, it adds a way of updating the number of samples required to run a metric via the yaml file:
Under `metric_options`, an entry can be added with the `metric_name` to be updated. For now it only works with `num_samples`, but defined like this it shouldn't need more updates. Otherwise, the `num_samples` can be specified using the `metric_name`.
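
As a rough illustration of the `metric_options` mechanism described above, the sketch below mirrors the lookup and override logic from the review snippets, using a plain dict in place of the parsed yaml file. The `Metric` and `Task` classes, the `apply_metric_options` function, the metric names `example_metric` and `other_metric`, and the default of one sample are hypothetical stand-ins and not lighteval's real classes or metric names.

```python
from dataclasses import dataclass, field


@dataclass
class Metric:  # hypothetical stand-in for a metric object
    metric_name: str


@dataclass
class Task:  # hypothetical stand-in for a task object
    metrics: list[Metric]
    num_samples: list[int] = field(default_factory=lambda: [1])  # assumed default


# Stand-in for yaml.safe_load(f)["model"]: the "model" section of the yaml file.
config = {
    "base_params": {
        "model_args": "pretrained=deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B,dtype=float16"
    },
    # One entry per metric name; only num_samples is read here.
    "metric_options": {"example_metric": {"num_samples": 16}},
}


def apply_metric_options(task: Task, metric_options: dict) -> None:
    """Override task.num_samples for every metric listed in metric_options."""
    for metric in task.metrics:
        if metric_data := metric_options.get(metric.metric_name):
            num_samples = metric_data.get("num_samples")
            if num_samples:
                # Assignment replaces the existing list instead of extending it.
                task.num_samples = [num_samples]


metric_options = config.get("metric_options", {})

configured = Task(metrics=[Metric("example_metric")])
untouched = Task(metrics=[Metric("other_metric")])
apply_metric_options(configured, metric_options)
apply_metric_options(untouched, metric_options)

print(configured.num_samples)  # [16]
print(untouched.num_samples)   # [1]
```

A task whose metrics have no entry under `metric_options` keeps its own `num_samples`, so the yaml section stays optional.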