
Using enforce_eager=True in llama-3.1-8B #2408

@mahmoodn

Description

Hi,
How can I enable enforce_eager mode for Llama-3.1-8B from the command line? I am running this command:

python -u main.py \
  --scenario Offline \
  --model-path $CHECKPOINT_PATH \
  --batch-size $BATCH_SIZE \
  --dtype bfloat16 \
  --user-conf user.conf \
  --total-sample-count 1 \
  --dataset-path $DATASET_PATH \
  --output-log-dir output \
  --tensor-parallel-size $GPU_COUNT \
  --vllm

I see this message:

INFO 12-08 09:22:13 gpu_executor.py:122] # GPU blocks: 28190, # CPU blocks: 2048
INFO 12-08 09:22:13 gpu_executor.py:126] Maximum concurrency for 131072 tokens per request: 3.44x
INFO 12-08 09:22:16 model_runner.py:1402] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 12-08 09:22:16 model_runner.py:1406] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing `gpu_memory_utilization` or enforcing eager mode. You can also reduce the `max_num_seqs` as needed to decrease memory usage.

I tried adding --enforce-eager to the Python command (alongside --vllm), but apparently main.py has no such option.

I also tried setting the option manually in main.py, as shown below:

    if args.vllm:
        sut = sut_cls(
            model_path=args.model_path,
            dtype=args.dtype,
            batch_size=args.batch_size,
            dataset_path=args.dataset_path,
            total_sample_count=args.total_sample_count,
            workers=args.num_workers,
            tensor_parallel_size=args.tensor_parallel_size,
            enforce_eager=True,  # <== added manually
        )

But I get this error:

  File "/mnt/users/m/inference/language/llama3.1-8b/main.py", line 173, in main
    sut = sut_cls(
          ^^^^^^^^
TypeError: SUT.__init__() got an unexpected keyword argument 'enforce_eager'
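
For context, vLLM's own LLM constructor does accept an enforce_eager argument (the log above points at exactly that setting), so the harness would need to expose a flag and forward it through the SUT. Below is a minimal, hypothetical sketch of that plumbing; EagerAwareSUT and its signature are my own guess, not the actual SUT class from the reference implementation.

import argparse

from vllm import LLM


class EagerAwareSUT:
    """Hypothetical stand-in for the harness SUT: accepts enforce_eager
    and forwards it to vllm.LLM, which is where the option actually lives."""

    def __init__(self, model_path, dtype="bfloat16",
                 tensor_parallel_size=1, enforce_eager=False):
        self.model = LLM(
            model=model_path,
            dtype=dtype,
            tensor_parallel_size=tensor_parallel_size,
            enforce_eager=enforce_eager,  # True skips CUDA graph capture
        )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--dtype", default="bfloat16")
    parser.add_argument("--tensor-parallel-size", type=int, default=1)
    parser.add_argument("--enforce-eager", action="store_true",
                        help="Run vLLM in eager mode (no CUDA graph capture)")
    args = parser.parse_args()

    sut = EagerAwareSUT(
        model_path=args.model_path,
        dtype=args.dtype,
        tensor_parallel_size=args.tensor_parallel_size,
        enforce_eager=args.enforce_eager,
    )

The equivalent change in the reference code would presumably be an --enforce-eager flag in main.py's argument parser plus an enforce_eager keyword on SUT.__init__ that is passed into the harness's LLM(...) call.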
