When evaluating a single model on multiple partitioned datasets, the model is loaded repeatedly #1088

IcyFeather233 · 2024-04-25T07:20:36Z

For example, with a config like this:

judge_models = [
    dict(
        type=LmdeployPytorchModel,
        abbr='Qwen1.5-14B-Chat',
        path='/xxx/Qwen1.5-14B-Chat',
        engine_config=dict(session_len=2048,
                        max_batch_size=8),
        gen_config=dict(top_p=0.9,
                        temperature=0.1,
                        max_new_tokens=1024),
        max_out_len=1024,
        max_seq_len=2048,
        concurrency=8,
        meta_template=_meta_template,
        batch_size=GPU_NUMS * 8,
        end_str='<|im_end|>',
        run_cfg=dict(num_gpus=GPU_NUMS, num_procs=1),
    )
]

## single evaluation
eval = dict(
    partitioner=dict(type=SubjectiveSizePartitioner, strategy='split', max_task_size=10000, mode='singlescore', models=models, judge_models=judge_models),
    runner=dict(type=LocalRunner, max_num_workers=32, task=dict(type=SubjectiveEvalTask)),
)

summarizer = dict(type=MTBenchSummarizer, judge_type='single')

The MTBench dataset was split into three tasks. When running, I found that the model is loaded again for every task, which wastes time:

2024-04-25 15:13:31,711 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-25 15:13:32,450 - lmdeploy - INFO - Checking model.
Loading checkpoint shards: 100%|██████████| 8/8 [01:04<00:00,  8.11s/it]
2024-04-25 15:14:51,715 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=81, num_gpu_blocks=779, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-25 15:14:54,057] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|██████████| 2/2 [00:44<00:00, 22.12s/it]
04/25 15:15:38 - OpenCompass - INFO - Task [Qwen1.5-14B-Chat/mtbench_0.0]
04/25 15:15:38 - OpenCompass - INFO - time elapsed: 150.39s
04/25 15:15:45 - OpenCompass - DEBUG - Get class `SubjectiveEvalTask` from "task" registry in "opencompass"
04/25 15:15:45 - OpenCompass - DEBUG - An `SubjectiveEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.subjective_eval
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
        Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
2024-04-25 15:16:37,318 - lmdeploy - INFO - Checking environment for PyTorch Engine.
2024-04-25 15:16:37,596 - lmdeploy - INFO - Checking model.
Loading checkpoint shards: 100%|██████████| 8/8 [01:06<00:00,  8.34s/it]
2024-04-25 15:17:59,429 - lmdeploy - INFO - build CacheEngine with config:CacheConfig(block_size=64, num_cpu_blocks=81, num_gpu_blocks=779, window_size=-1, cache_max_entry_count=0.8, max_prefill_token_num=4096)
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[2024-04-25 15:18:01,755] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
100%|██████████| 1/1 [00:15<00:00, 15.35s/it]
04/25 15:18:17 - OpenCompass - INFO - Task [Qwen1.5-14B-Chat/mtbench_0.1]
04/25 15:18:17 - OpenCompass - INFO - time elapsed: 125.36s
04/25 15:18:22 - OpenCompass - DEBUG - Get class `SubjectiveEvalTask` from "task" registry in "opencompass"
04/25 15:18:22 - OpenCompass - DEBUG - An `SubjectiveEvalTask` instance is built from registry, and its implementation can be found in opencompass.tasks.subjective_eval

However, I don't fully understand the code and can't find where to change this (TAT). Still, I think this issue should be fixed.
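To put a rough number on the waste, here is a minimal sketch that uses only figures read off the two task logs above: the reported `time elapsed` and the duration of the inference progress bar. The task names come from the logs; the dictionary layout and the `overhead_fraction` helper are illustrative, not part of OpenCompass. Everything outside the inference loop (environment checks, the 64 s to 66 s checkpoint load, cache engine build, process startup) is lumped together as overhead.

```python
# Figures copied from the two task logs above (seconds).
tasks = {
    "mtbench_0.0": {"elapsed": 150.39, "inference": 44.0},
    "mtbench_0.1": {"elapsed": 125.36, "inference": 15.0},
}


def overhead_fraction(elapsed: float, inference: float) -> float:
    """Share of a task's wall-clock time not spent in the inference loop."""
    return (elapsed - inference) / elapsed


overheads = {
    name: overhead_fraction(t["elapsed"], t["inference"])
    for name, t in tasks.items()
}

for name, frac in overheads.items():
    print(f"{name}: {frac:.0%} of the task time is outside the inference loop")
```

On these two tasks, roughly 71% and 88% of the elapsed time is spent outside inference, and most of that is the repeated model load.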

bittersweet1999 (Collaborator) · 2024-04-25T07:31:06Z

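This is how OpenCompass (OC) is designed: running each partition as a separate task makes it easier to grab resources on a cluster. If you don't want a small dataset like MTBench to be split, I suggest using this config: https://github.com/open-compass/opencompass/blob/main/configs/datasets/subjective/multiround/mtbench_single_judge.py It launches only one task.

The split exists because the original MTBench assigns one of three temperature settings according to question category, so different sub-datasets have to be inferred at different temperatures. We made the same division to stay consistent with the original. In our own tests, however, setting everything to the same temperature makes little difference (the latest ArenaHard also uses a unified temperature), and the effect is smaller than MTBench's own bias. So you can run inference directly with the unified-temperature config, and the dataset will not be split.

By the way, we have also finished support for the ArenaHard dataset and will open a PR soon.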
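A minimal sketch of how the suggested config might be pulled into an evaluation config, assuming the new file sits directly under `configs/` and that the linked dataset file exports a list named `subjective_datasets`. Both the file name and the exported name are assumptions, so check the linked file before relying on them. `read_base` is the standard mmengine helper used for config inheritance.

```python
# Hypothetical file: configs/eval_mtbench_unified_temp.py (name is illustrative).
from mmengine.config import read_base

with read_base():
    # The import path mirrors the linked config file.
    # The exported name `subjective_datasets` is an assumption; verify it in that file.
    from .datasets.subjective.multiround.mtbench_single_judge import subjective_datasets

# With one unified-temperature MTBench entry there is nothing to split by
# temperature, so the run is expected to launch a single task and load the model once.
datasets = [*subjective_datasets]
```

The rest of the evaluation config (`judge_models`, `eval`, `summarizer`) can stay as in the original post.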
