Describe the bug
While trying to evaluate MixEval, I get the following error after generation completes (possibly in the scorer?):
(EngineCore_0 pid=3460416) WARNING 10-03 20:40:28 [cudagraph_dispatcher.py:101] cudagraph dispatching keys are not initialized. No cudagraph will be used. | 0/500 [00:00<?, ?it/s]
Adding requests: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [00:00<00:00, 10319.61it/s]
Processed prompts: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 500/500 [02:14<00:00, 3.73it/s, est. speed input: 564.50 toks/s, output: 403.32 toks/s]
Splits: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [02:14<00:00, 134.30s/it]
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ /fsx/lewis/git/hf/lighteval/src/lighteval/main_vllm.py:129 in vllm │
│ │
│ 126 │ │ metric_options=metric_options, │
│ 127 │ ) │
│ 128 │ │
│ ❱ 129 │ pipeline.evaluate() │
│ 130 │ │
│ 131 │ pipeline.show_results() │
│ 132 │
│ │
│ /fsx/lewis/git/hf/lighteval/src/lighteval/pipeline.py:282 in evaluate │
│ │
│ 279 │ │ │ │ ) │
│ 280 │ │ │ │ outputs = self._run_model() │
│ 281 │ │ else: │
│ ❱ 282 │ │ │ outputs = self._run_model() │
│ 283 │ │ │
│ 284 │ │ if self.is_main_process(): │
│ 285 │ │ │ self._post_process_outputs(outputs) │
│ │
│ /fsx/lewis/git/hf/lighteval/src/lighteval/pipeline.py:335 in _run_model │
│ │
│ 332 │ │ if self.model.is_async: │
│ 333 │ │ │ outputs = asyncio.run(self._run_model_async()) │
│ 334 │ │ else: │
│ ❱ 335 │ │ │ outputs = self._run_model_sync() │
│ 336 │ │ │
│ 337 │ │ # Cleaning up the model before running metrics │
│ 338 │ │ self.model.cleanup() │
│ │
│ /fsx/lewis/git/hf/lighteval/src/lighteval/pipeline.py:316 in _run_model_sync │
│ │
│ 313 │ │ │ logger.info(f"Running {sampling_method} requests") │
│ 314 │ │ │ match sampling_method: │
│ 315 │ │ │ │ case SamplingMethod.GENERATIVE: │
│ ❱ 316 │ │ │ │ │ model_outputs = self.model.greedy_until(docs) │
│ 317 │ │ │ │ │ outputs[sampling_method] = model_outputs │
│ 318 │ │ │ │ case SamplingMethod.LOGPROBS: │
│ 319 │ │ │ │ │ model_outputs = self.model.loglikelihood(docs) │
│ │
│ /fsx/lewis/git/hf/lighteval/src/lighteval/utils/cache_management.py:405 in wrapper │
│ │
│ 402 │ │ │ │ new_results = func(self, docs_not_cached, *args, **kwargs) │
│ 403 │ │ │ │ │
│ 404 │ │ │ │ # Store new results in file cache │
│ ❱ 405 │ │ │ │ cache.cache_samples( │
│ 406 │ │ │ │ │ docs=docs_not_cached, │
│ 407 │ │ │ │ │ results=new_results, │
│ 408 │ │ │ │ │ task_ids=task_ids, │
│ │
│ /fsx/lewis/git/hf/lighteval/src/lighteval/utils/cache_management.py:308 in cache_samples │
│ │
│ 305 │ │ │ task_id = self.get_task_id(doc.task_name, sampling_method) │
│ 306 │ │ │ sample = self._dump_sample(result) │
│ 307 │ │ │ │
│ ❱ 308 │ │ │ processed_data[task_id].append({"sample_id": doc.id, "sample": sample}) │
│ 309 │ │ processed_data = {task_id: task_data for task_id, task_data in processed_data.it │
│ 310 │ │ │
│ 311 │ │ # Concatenate it with existing data and save to file │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: TaskID(task_name='extended|mixeval_hard:multichoice|0', task_hash='5029516ccc122911', sampling_method=<SamplingMethod.GENERATIVE: 'GENERATIVE'>)
[rank0]:[W1003 20:42:43.820905213 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[2025-10-03 20:42:43,783] [ ERROR]: Engine core proc EngineCore_0 died unexpectedly, shutting down client. (core_client.py:562)
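The KeyError is raised at `processed_data[task_id].append(...)` in `cache_samples`, which suggests that the TaskID computed for a document is not among the keys of `processed_data`. A minimal sketch of that failure mode follows. It is an assumption about the cause, not a reading of lighteval's code: `TaskID` and `SamplingMethod` are simplified stand-ins, `group_samples` and `group_samples_tolerant` are hypothetical helpers, and the seeded hash `0000000000000000` is made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class SamplingMethod(Enum):
    # Simplified stand-in for the enum shown in the error message.
    GENERATIVE = "GENERATIVE"


@dataclass(frozen=True)
class TaskID:
    # Simplified stand-in; frozen so instances are hashable dict keys.
    task_name: str
    task_hash: str
    sampling_method: SamplingMethod


def group_samples(docs, known_task_ids):
    """Hypothetical grouping step: a plain dict seeded with the known task ids."""
    processed = {task_id: [] for task_id in known_task_ids}
    for task_id, sample in docs:
        # Raises KeyError when task_id was never seeded, e.g. if its hash differs.
        processed[task_id].append(sample)
    return processed


def group_samples_tolerant(docs):
    """Hypothetical variant: defaultdict creates the list on first access."""
    processed = defaultdict(list)
    for task_id, sample in docs:
        processed[task_id].append(sample)
    return dict(processed)


# Same task name, different hash (the seeded hash is invented for this sketch).
seeded = TaskID("extended|mixeval_hard:multichoice|0", "0000000000000000", SamplingMethod.GENERATIVE)
looked_up = TaskID("extended|mixeval_hard:multichoice|0", "5029516ccc122911", SamplingMethod.GENERATIVE)
docs = [(looked_up, "sample-0")]

try:
    group_samples(docs, [seeded])
except KeyError as err:
    print(f"KeyError: {err}")

print(group_samples_tolerant(docs))
```

If the real cause is a mismatch of this kind, the tolerant grouping would only hide it; the difference between the seeded and looked-up task ids would still need to be explained.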
To Reproduce
Run the following command:
lighteval vllm "model_name=Qwen/Qwen3-4B-Instruct-2507" "extended|mixeval_hard:multichoice|0"
Expected behavior
The MixEval evaluation runs to completion and reports results without raising a KeyError.
Version info
absl-py==2.3.1
accelerate==1.10.1
aenum==3.1.15
aiohappyeyeballs==2.6.1
aiohttp==3.12.15
aiosignal==1.4.0
annotated-types==0.7.0
antlr4-python3-runtime==4.13.2
anyio==4.10.0
astor==0.8.1
attrs==25.3.0
blake3==1.0.6
blis==1.3.0
cachetools==6.2.0
catalogue==2.0.10
cbor2==5.7.0
certifi==2025.8.3
cffi==2.0.0
cfgv==3.4.0
chardet==5.2.0
charset-normalizer==3.4.3
click==8.3.0
cloudpathlib==0.22.0
cloudpickle==3.1.1
colorama==0.4.6
colorlog==6.9.0
compressed-tensors==0.10.2
confection==0.1.5
cupy-cuda12x==13.6.0
cymem==2.0.11
dataproperty==1.1.0
datasets==4.1.1
deepdiff==8.6.1
depyf==0.19.0
dill==0.4.0
diskcache==5.6.3
distlib==0.4.0
distro==1.9.0
dnspython==2.8.0
einops==0.8.1
email-validator==2.3.0
emoji==2.15.0
fastapi==0.117.1
fastapi-cli==0.0.13
fastapi-cloud-cli==0.2.0
fastrlock==0.8.3
filelock==3.19.1
frozendict==2.4.6
frozenlist==1.7.0
fsspec==2025.9.0
gguf==0.17.1
gitdb==4.0.12
gitpython==3.1.45
h11==0.16.0
hf-xet==1.1.10
httpcore==1.0.9
httptools==0.6.4
httpx==0.28.1
huggingface-hub==0.35.0
identify==2.6.14
idna==3.10
iniconfig==2.1.0
interegular==0.3.3
jieba==0.42.1
jinja2==3.1.6
jiter==0.11.0
joblib==1.5.2
jsonschema==4.25.1
jsonschema-specifications==2025.9.1
langcodes==3.5.0
langdetect==1.0.9
language-data==1.3.0
lark==1.2.2
latex2sympy2-extended==1.0.6
-e file:///fsx/lewis/git/hf/lighteval
llguidance==0.7.30
llvmlite==0.44.0
lm-format-enforcer==0.10.12
lxml==6.0.2
marisa-trie==1.3.1
markdown-it-py==4.0.0
markupsafe==3.0.2
mbstrdecoder==1.1.4
mdurl==0.1.2
mistral-common==1.8.5
more-itertools==10.8.0
mpmath==1.3.0
msgpack==1.1.1
msgspec==0.19.0
multidict==6.6.4
multiprocess==0.70.16
murmurhash==1.0.13
natto-py==1.0.1
networkx==3.5
ninja==1.13.0
nltk==3.9.1
nodeenv==1.9.1
numba==0.61.2
numpy==2.2.6
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
openai==1.108.1
openai-harmony==0.0.4
opencv-python-headless==4.12.0.88
orderly-set==5.5.0
outlines-core==0.2.10
packaging==25.0
pandas==2.3.2
partial-json-parser==0.2.1.1.post6
pathvalidate==3.3.1
pillow==11.3.0
pip==25.2
platformdirs==4.4.0
pluggy==1.6.0
portalocker==3.2.0
pre-commit==4.3.0
preshed==3.0.10
prometheus-client==0.23.1
prometheus-fastapi-instrumentator==7.1.0
propcache==0.3.2
protobuf==6.32.1
psutil==7.1.0
py-cpuinfo==9.0.0
pyarrow==21.0.0
pybase64==1.4.2
pycountry==24.6.1
pycparser==2.23
pydantic==2.11.9
pydantic-core==2.33.2
pydantic-extra-types==2.10.5
pygments==2.19.2
pytablewriter==1.2.1
pytest==8.4.2
pythainlp==5.1.2
python-crfsuite==0.9.11
python-dateutil==2.9.0.post0
python-dotenv==1.1.1
python-json-logger==3.3.0
python-multipart==0.0.20
pytz==2025.2
pyvi==0.1.1
pyyaml==6.0.2
pyzmq==27.1.0
ray==2.49.2
referencing==0.36.2
regex==2025.9.18
requests==2.32.5
rich==14.1.0
rich-toolkit==0.15.1
rignore==0.6.4
rouge-score==0.1.2
rpds-py==0.27.1
ruff==0.13.1
sacrebleu==2.5.1
safetensors==0.6.2
scikit-learn==1.7.2
scipy==1.16.2
sentencepiece==0.2.1
sentry-sdk==2.38.0
setproctitle==1.3.7
setuptools==80.9.0
shellingham==1.5.4
six==1.17.0
sklearn-crfsuite==0.5.0
smart-open==7.3.1
smmap==5.0.2
sniffio==1.3.1
soundfile==0.13.1
soxr==1.0.0
spacy==3.8.7
spacy-legacy==3.0.12
spacy-loggers==1.0.5
srsly==2.5.1
stanza==1.10.1
starlette==0.48.0
sudachidict-core==20250825
sudachipy==0.6.10
syllapy==0.7.2
sympy==1.14.0
tabledata==1.3.4
tabulate==0.9.0
tcolorpy==0.1.7
termcolor==2.3.0
thinc==8.3.6
threadpoolctl==3.6.0
tiktoken==0.11.0
tokenizers==0.22.1
torch==2.7.1
torchaudio==2.7.1
torchvision==0.22.1
tqdm==4.67.1
transformers==4.56.2
triton==3.3.1
typepy==1.3.4
typer==0.19.1
typing-extensions==4.15.0
typing-inspection==0.4.1
tzdata==2025.2
urllib3==2.5.0
uvicorn==0.36.0
uvloop==0.21.0
virtualenv==20.34.0
vllm==0.10.1.1
wasabi==1.1.3
watchfiles==1.1.0
weasel==0.4.1
websockets==15.0.1
wrapt==1.17.3
xformers==0.0.31
xgrammar==0.1.21
xxhash==3.5.0
yarl==1.20.1
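For triage, a shorter report of the packages most likely to matter can be easier to scan than the full list above. The sketch below uses the standard library's `importlib.metadata`; the helper name `collect_versions` and the choice of packages are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Package selection is a guess at what is relevant to this issue.
    key_packages = ["lighteval", "vllm", "torch", "transformers", "datasets"]
    for name, ver in collect_versions(key_packages).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```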