Hi,
How can I set vLLM's "enforce eager" mode from the Llama-3.1-8B benchmark command line? For this command:
python -u main.py \
--scenario Offline \
--model-path $CHECKPOINT_PATH \
--batch-size $BATCH_SIZE \
--dtype bfloat16 \
--user-conf user.conf \
--total-sample-count 1 \
--dataset-path $DATASET_PATH \
--output-log-dir output \
--tensor-parallel-size $GPU_COUNT \
--vllm
I see this message:
INFO 12-08 09:22:13 gpu_executor.py:122] # GPU blocks: 28190, # CPU blocks: 2048
INFO 12-08 09:22:13 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.44x
INFO 12-08 09:22:16 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 12-08 09:22:16 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.
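As far as I understand the message, when using vllm.LLM directly this is just a constructor argument. A minimal sketch of what I mean (the model id below is only a placeholder; in the benchmark I point at $CHECKPOINT_PATH):

from vllm import LLM

# enforce_eager=True skips CUDA graph capture (lower memory use, some throughput cost)
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model id
    dtype="bfloat16",
    enforce_eager=True,
)

But I would like to enable this through main.py rather than by calling vLLM directly.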
I tried adding --enforce-eager to the python command (alongside --vllm), but apparently main.py has no such option.
I also tried setting that option manually in main.py as below:
if args.vllm:
    sut = sut_cls(
        model_path=args.model_path,
        dtype=args.dtype,
        batch_size=args.batch_size,
        dataset_path=args.dataset_path,
        total_sample_count=args.total_sample_count,
        workers=args.num_workers,
        tensor_parallel_size=args.tensor_parallel_size,
        enforce_eager=True,  # <======= added this line
    )
But I get this error:
  File "/mnt/users/m/inference/language/llama3.1-8b/main.py", line 173, in main
    sut = sut_cls(
          ^^^^^^^^
TypeError: SUT.__init__() got an unexpected keyword argument 'enforce_eager'
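So it seems the SUT class (and main.py's argument parser) would also need to accept and forward this option. Below is a minimal standalone sketch of the kind of change I am imagining; EagerSUT and the flag wiring are hypothetical stand-ins, not the actual repo code, and I don't know whether this is the intended way to do it:

from vllm import LLM

class EagerSUT:
    """Hypothetical stand-in for the repo's SUT class, assuming it wraps vllm.LLM."""

    def __init__(self, model_path, dtype, tensor_parallel_size, enforce_eager=False):
        # Forward enforce_eager so vLLM skips CUDA graph capture when requested.
        self.llm = LLM(
            model=model_path,
            dtype=dtype,
            tensor_parallel_size=tensor_parallel_size,
            enforce_eager=enforce_eager,
        )

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--dtype", default="bfloat16")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    # Hypothetical flag corresponding to vLLM's --enforce-eager
    parser.add_argument("--enforce-eager", action="store_true")
    args = parser.parse_args()

    sut = EagerSUT(
        args.model_path,
        args.dtype,
        args.tensor_parallel_size,
        enforce_eager=args.enforce_eager,
    )

Is something like this the recommended approach, or is there an existing way to pass enforce_eager through the benchmark harness?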