
Commit 3c7cb66

Add EXAONE-Deep (NVIDIA#3054)
Signed-off-by: yechank <[email protected]>
Co-authored-by: QI JUN <[email protected]>
1 parent e6cb34d commit 3c7cb66

File tree: 9 files changed, +62 -43 lines changed


examples/exaone/README.md

Lines changed: 29 additions & 16 deletions

@@ -7,7 +7,9 @@ See the LLaMA example [`examples/llama`](../llama) for details.
 - [EXAONE](#exaone)
   - [Support Matrix](#support-matrix)
-  - [Download model checkpoints](#download-model-checkpoints)
+  - [Supported Models](#supported-models)
+    - [EXAONE-3.0](#exaone-30)
+    - [EXAONE-Deep](#exaone-deep)
   - [Usage](#usage)
     - [Convert checkpoint and build TensorRT engine(s)](#convert-checkpoint-and-build-tensorrt-engines)
     - [FP8 Post-Training Quantization](#fp8-post-training-quantization)
@@ -25,12 +27,23 @@ See the LLaMA example [`examples/llama`](../llama) for details.
 * INT8 SmoothQuant
 * INT4 AWQ & W4A8 AWQ
 
-## Download model checkpoints
+## Supported Models
+### EXAONE-3.0
 
-First, download the HuggingFace FP32 checkpoints of EXAONE model.
+Download the HuggingFace FP32 checkpoints of the EXAONE-3.0 model. The whole EXAONE-3.0 family is supported; this example uses the `EXAONE-3.0-7.8B-Instruct` model.
 
 ```bash
-git clone https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct hf_models/exaone
+export HF_MODEL_DIR=hf_models/exaone
+git clone https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct $HF_MODEL_DIR
+```
+
+### EXAONE-Deep
+
+Download the HuggingFace BF16 checkpoints of the EXAONE-Deep model. This example uses the `EXAONE-Deep-2.4B` model; the weights are converted and the TensorRT engine is built with the same procedure as for EXAONE-3.0.
+
+```bash
+export HF_MODEL_DIR=hf_models/exaone_deep
+git clone https://huggingface.co/LGAI-EXAONE/EXAONE-Deep-2.4B $HF_MODEL_DIR
 ```
 
 ## Usage
@@ -43,7 +56,7 @@ The next section describe how to convert the weights from the [HuggingFace (HF)
 
 # Build the EXAONE model using a single GPU and FP16.
 python ../llama/convert_checkpoint.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --output_dir trt_models/exaone/fp16/1-gpu \
     --dtype float16
 
@@ -54,7 +67,7 @@ trtllm-build \
 
 # Build the EXAONE model using a single GPU and apply INT8 weight-only quantization.
 python ../llama/convert_checkpoint.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --output_dir trt_models/exaone/int8_wq/1-gpu \
     --use_weight_only \
     --weight_only_precision int8 \
@@ -67,7 +80,7 @@ trtllm-build \
 
 # Build the EXAONE model using a single GPU and apply INT4 weight-only quantization.
 python ../llama/convert_checkpoint.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --output_dir trt_models/exaone/int4_wq/1-gpu \
     --use_weight_only \
     --weight_only_precision int4 \
@@ -78,9 +91,9 @@ trtllm-build \
     --output_dir trt_engines/exaone/int4_wq/1-gpu \
     --gemm_plugin auto
 
-# Build the EXAONE model using using 2-way tensor parallelism and FP16.
+# Build the EXAONE model using 2-way tensor parallelism and FP16.
 python ../llama/convert_checkpoint.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --output_dir trt_models/exaone/fp16/2-gpu \
     --tp_size 2 \
     --dtype float16
@@ -101,7 +114,7 @@ First make sure Modelopt toolkit is installed (see [examples/quantization/README
 ```bash
 # Build the EXAONE model using a single GPU and apply FP8 quantization.
 python ../quantization/quantize.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --dtype float16 \
     --qformat fp8 \
     --kv_cache_dtype fp8 \
@@ -122,7 +135,7 @@ First make sure Modelopt toolkit is installed (see [examples/quantization/README
 ```bash
 # Build the EXAONE model using a single GPU and apply INT8 SmoothQuant.
 python ../quantization/quantize.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --dtype float16 \
     --qformat int8_sq \
     --output_dir trt_models/exaone/int8_sq/1-gpu
@@ -142,7 +155,7 @@ First make sure Modelopt toolkit is installed (see [examples/quantization/README
 ```bash
 # Build the EXAONE model using a single GPU and apply INT4 AWQ.
 python ../quantization/quantize.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --dtype float16 \
     --qformat int4_awq \
     --output_dir trt_models/exaone/int4_awq/1-gpu
@@ -161,7 +174,7 @@ Please make sure your system contains a Hopper GPU before trying the commands be
 ```bash
 # Build the EXAONE model using a single GPU and apply W4A8 AWQ.
 python ../quantization/quantize.py \
-    --model_dir hf_models/exaone \
+    --model_dir $HF_MODEL_DIR \
     --dtype float16 \
     --qformat w4a8_awq \
     --output_dir trt_models/exaone/w4a8_awq/1-gpu
@@ -180,21 +193,21 @@ Test your engine with the [run.py](../run.py) script:
 python3 ../run.py \
     --input_text "When did the first world war end?" \
     --max_output_len=100 \
-    --tokenizer_dir hf_models/exaone \
+    --tokenizer_dir $HF_MODEL_DIR \
     --engine_dir trt_engines/exaone/fp16/1-gpu
 
 # Run with 2 GPUs
 mpirun -n 2 --allow-run-as-root \
     python3 ../run.py \
     --input_text "When did the first world war end?" \
     --max_output_len=100 \
-    --tokenizer_dir hf_models/exaone \
+    --tokenizer_dir $HF_MODEL_DIR \
     --engine_dir trt_engines/exaone/fp16/2-gpu
 
 python ../summarize.py \
     --test_trt_llm \
     --data_type fp16 \
-    --hf_model_dir hf_models/exaone \
+    --hf_model_dir $HF_MODEL_DIR \
     --engine_dir trt_engines/exaone/fp16/1-gpu
 ```

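Taken together, the README changes parameterize every command on `$HF_MODEL_DIR`, so one convert-and-build flow covers both checkpoints. Below is a minimal Python sketch of that flow (not part of the commit; it simply shells out to the scripts named in the README, and the directory layout is the README's own example):

```python
# Hypothetical driver for the README's convert-and-build flow.
# Assumes it runs from examples/exaone/, next to ../llama/convert_checkpoint.py,
# and that HF_MODEL_DIR points at an EXAONE-3.0 or EXAONE-Deep checkout.
import os
import subprocess

hf_model_dir = os.environ.get("HF_MODEL_DIR", "hf_models/exaone")
ckpt_dir = "trt_models/exaone/fp16/1-gpu"
engine_dir = "trt_engines/exaone/fp16/1-gpu"

# Step 1: convert the HF checkpoint to a TensorRT-LLM checkpoint.
subprocess.run([
    "python", "../llama/convert_checkpoint.py",
    "--model_dir", hf_model_dir,
    "--output_dir", ckpt_dir,
    "--dtype", "float16",
], check=True)

# Step 2: build the engine from the converted checkpoint.
subprocess.run([
    "trtllm-build",
    f"--checkpoint_dir={ckpt_dir}",
    f"--output_dir={engine_dir}",
    "--gemm_plugin", "auto",
], check=True)
```

Re-exporting `HF_MODEL_DIR=hf_models/exaone_deep` reuses the identical flow for EXAONE-Deep, which is the point of the README rewrite.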
tensorrt_llm/models/llama/model.py

Lines changed: 1 addition & 1 deletion

@@ -478,7 +478,7 @@ def from_hugging_face(
             }
         elif "vila" in model_name:
            hf_model_dir += "/llm"
-        elif "exaone" in model_name:
+        elif "exaone" in model_name.lower():
             custom_dict = {
                 "transformer": "transformer",
                 "layers": "h",

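The one-line `model.py` change is load-bearing: the EXAONE-Deep checkout lives in a mixed-case directory (`EXAONE-Deep-2.4B`, per the test fixture below), so the old case-sensitive substring test never matched it. A quick standalone illustration (not from the commit):

```python
# Directory names as used by the test fixture in conftest.py below.
for model_name in ("exaone", "EXAONE-Deep-2.4B"):
    before = "exaone" in model_name          # old, case-sensitive check
    after = "exaone" in model_name.lower()   # new check in this commit
    print(f"{model_name!r}: before={before}, after={after}")
```

`EXAONE-Deep-2.4B` prints `before=False, after=True`, which is exactly the case the commit fixes.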
tests/integration/defs/.test_durations

Lines changed: 2 additions & 2 deletions

@@ -168,8 +168,8 @@
 "examples/test_draft_target_model.py::test_llm_draft_target_model_1gpu[streaming-gpt2-use_cpp_session-use_logits-draft_len_4-float16-bs2]": 222.54111004807055,
 "examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:2-disable_fp8]": 203.55354792065918,
 "examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1-disable_fp8]": 189.6864925120026,
-"examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:1]": 473.8068177103996,
-"examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:4]": 205.28752172738314,
+"examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1]": 473.8068177103996,
+"examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:4]": 205.28752172738314,
 "examples/test_multimodal.py::test_llm_multimodal_general[deplot-pp:1-tp:1-float16-bs:1-cpp_e2e:False-nb:1]": 179.15185776166618,
 "examples/test_prompt_lookup.py::test_llm_prompt_lookup_1gpu[no_streaming-gpt2-use_cpp_session-use_tokens-max_matching_ngram_size_2-prompt_lookup_num_tokens_8-float16-bs1]": 233.80333462916315,
 "examples/test_qwen.py::test_llm_qwen_single_gpu_summary[qwen2_0.5b_instruct-enable_paged_kv_cache-enable_remove_input_padding-enable_weight_only-enable_fmha]": 123.65003899484873,

tests/integration/defs/conftest.py

Lines changed: 6 additions & 3 deletions

@@ -453,10 +453,13 @@ def llm_exaone_model_root(request) -> str:
     "Get EXAONE model root"
     models_root = llm_models_root()
     assert models_root, "Did you set LLM_MODELS_ROOT?"
-    assert request.param == "exaone", "Is the name of model root is exaone?"
 
-    exaone_model_root = os.path.join(models_root, request.param)
-    assert exists(exaone_model_root), f"{exaone_model_root} does not exist!"
+    exaone_model_root = os.path.join(models_root, "exaone")
+    if hasattr(request, "param"):
+        if request.param == "exaone_3.0_7.8b_instruct":
+            exaone_model_root = os.path.join(models_root, "exaone")
+        elif request.param == "exaone_deep_2.4b":
+            exaone_model_root = os.path.join(models_root, "EXAONE-Deep-2.4B")
 
     return exaone_model_root

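The reworked fixture defaults to the EXAONE-3.0 root and switches directories only for the new `exaone_deep_2.4b` parameter; the `hasattr(request, "param")` guard keeps it usable from tests that don't parametrize it at all. A standalone sketch of the same mapping (hypothetical helper, not in the commit):

```python
import os

def resolve_exaone_root(models_root: str, param: str | None = None) -> str:
    """Mirror the fixture's logic: EXAONE-3.0 lives under 'exaone',
    EXAONE-Deep under its mixed-case HuggingFace directory name."""
    root = os.path.join(models_root, "exaone")  # default and 3.0 case
    if param == "exaone_deep_2.4b":
        root = os.path.join(models_root, "EXAONE-Deep-2.4B")
    return root

assert resolve_exaone_root("/models").endswith("exaone")
assert resolve_exaone_root("/models", "exaone_deep_2.4b").endswith("EXAONE-Deep-2.4B")
```

Note that the old `assert exists(...)` is gone: with two possible roots, the fixture now resolves a path from the parameter instead of hard-failing on one fixed directory.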
tests/integration/defs/examples/test_exaone.py

Lines changed: 11 additions & 10 deletions

@@ -23,7 +23,9 @@
 @pytest.mark.parametrize("num_beams", [1, 2, 4],
                          ids=lambda num_beams: f'nb:{num_beams}')
 @pytest.mark.parametrize("data_type", ['bfloat16', 'float16'])
-@pytest.mark.parametrize("llm_exaone_model_root", ['exaone'], indirect=True)
+@pytest.mark.parametrize("llm_exaone_model_root",
+                         ['exaone_3.0_7.8b_instruct', 'exaone_deep_2.4b'],
+                         indirect=True)
 @pytest.mark.parametrize("use_weight_only", [True, False],
                          ids=["enable_weight_only", "disable_weight_only"])
 def test_llm_exaone_1gpu(data_type, exaone_example_root, llm_exaone_model_root,
@@ -44,13 +46,11 @@ def test_llm_exaone_1gpu(data_type, exaone_example_root, llm_exaone_model_root,
         data_type=data_type,
         use_weight_only=use_weight_only)
 
-    # TODO: Should we add use_weight_only_groupwise_quant_matmul_plugin?
-
     build_cmd = [
-        "trtllm-build", f"--checkpoint_dir={model_dir}",
-        f"--output_dir={engine_dir}", f"--gpt_attention_plugin={data_type}",
-        f"--gemm_plugin={data_type}", f"--max_beam_width={num_beams}",
-        "--max_batch_size=256"
+        "trtllm-build",
+        f"--checkpoint_dir={model_dir}",
+        f"--output_dir={engine_dir}",
+        f"--max_beam_width={num_beams}",
     ]
     check_call(" ".join(build_cmd), shell=True, env=llm_venv._new_env)
 
@@ -80,7 +80,9 @@ def test_llm_exaone_1gpu(data_type, exaone_example_root, llm_exaone_model_root,
 @pytest.mark.parametrize("num_beams", [1],
                          ids=lambda num_beams: f'nb:{num_beams}')
 @pytest.mark.parametrize("data_type", ['float16'])
-@pytest.mark.parametrize("llm_exaone_model_root", ['exaone'], indirect=True)
+@pytest.mark.parametrize("llm_exaone_model_root",
+                         ['exaone_3.0_7.8b_instruct', 'exaone_deep_2.4b'],
+                         indirect=True)
 def test_llm_exaone_2gpu(data_type, exaone_example_root, llm_exaone_model_root,
                          llama_example_root, llm_datasets_root, llm_rouge_root,
                          llm_venv, cmodel_dir, engine_dir, num_beams):
@@ -102,8 +104,7 @@ def test_llm_exaone_2gpu(data_type, exaone_example_root, llm_exaone_model_root,
 
     build_cmd = [
         "trtllm-build", f"--checkpoint_dir={model_dir}",
-        f"--output_dir={engine_dir}", f"--gpt_attention_plugin={data_type}",
-        f"--gemm_plugin={data_type}", f"--max_beam_width={num_beams}"
+        f"--output_dir={engine_dir}", f"--max_beam_width={num_beams}"
     ]
     check_call(" ".join(build_cmd), shell=True, env=llm_venv._new_env)

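Because the parametrization is `indirect=True`, each value is delivered to the `llm_exaone_model_root` fixture as `request.param` rather than to the test function directly, and the values become the bracketed segments of the test IDs that the list files below are renamed to match. A self-contained toy showing the mechanism (the fixture body here is hypothetical):

```python
# Run with: pytest -v this_file.py
# Generates IDs like test_root[exaone_3.0_7.8b_instruct], matching the
# naming pattern used in the integration test lists.
import pytest

_ROOTS = {
    "exaone_3.0_7.8b_instruct": "/models/exaone",
    "exaone_deep_2.4b": "/models/EXAONE-Deep-2.4B",
}

@pytest.fixture
def llm_exaone_model_root(request):
    # indirect=True routes each parametrize value here as request.param.
    return _ROOTS[request.param]

@pytest.mark.parametrize("llm_exaone_model_root",
                         ["exaone_3.0_7.8b_instruct", "exaone_deep_2.4b"],
                         indirect=True)
def test_root(llm_exaone_model_root):
    assert llm_exaone_model_root in _ROOTS.values()
```

The build-command cleanup in the same diff also drops the explicit `--gpt_attention_plugin` and `--gemm_plugin` flags, leaving those choices to `trtllm-build` defaults.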
tests/integration/test_lists/qa/examples_test_list.txt

Lines changed: 5 additions & 4 deletions

@@ -38,10 +38,11 @@ examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float32-e
 examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-enable_fp8]
 examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-byt5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1-enable_fp8]
 examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-byt5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-disable_fp8]
-examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:1]
-examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:4]
-examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone-float16-nb:1]
-examples/test_exaone.py::test_llm_exaone_2gpu[exaone-float16-nb:1]
+examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1]
+examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:4]
+examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_deep_2.4b-float16-nb:4]
+examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_deep_2.4b-float16-nb:1]
+examples/test_exaone.py::test_llm_exaone_2gpu[exaone_3.0_7.8b_instruct-float16-nb:1]
 examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2-27b-it-other-bfloat16-8]
 examples/test_gemma.py::test_llm_hf_gemma_quantization_1gpu[gemma-2-27b-it-fp8-bfloat16-8]
 examples/test_gemma.py::test_hf_gemma_fp8_base_bf16_multi_lora[gemma-2-9b-it]

tests/integration/test_lists/qa/llm_sanity_test.txt

Lines changed: 2 additions & 2 deletions

@@ -16,8 +16,8 @@ examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-flan-t5-small-floa
 examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:2-nb:1-enable_fp8]
 examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-enable_fp8]
 examples/test_enc_dec.py::test_llm_enc_dec_general[no_compare_hf-byt5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:2-pp:1-nb:1-disable_fp8]
-examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone-float16-nb:1]
-examples/test_exaone.py::test_llm_exaone_2gpu[exaone-float16-nb:1]
+examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1]
+examples/test_exaone.py::test_llm_exaone_2gpu[exaone_3.0_7.8b_instruct-float16-nb:1]
 examples/test_gemma.py::test_llm_gemma_1gpu_summary[gemma-2-27b-it-other-bfloat16-8]
 examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8]
 examples/test_gpt.py::test_streaming_beam[batch_size_3-return_all_generated_tokens-num_beams_4]

tests/integration/test_lists/test-db/l0_a30.yml

Lines changed: 4 additions & 3 deletions

@@ -103,9 +103,10 @@ l0_a30:
   - examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization] # 5 mins
   - examples/test_mistral.py::test_llm_mistral_v1_1gpu[mistral-7b-v0.1-float16-max_attention_window_size_4096-summarization_long] # 6 mins
   - examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-t5-small-float32-enable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:2-disable_fp8]
-  - examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:1] # ? mins
-  - examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone-float16-nb:4] # ? mins
-  - examples/test_exaone.py::test_llm_exaone_2gpu[exaone-float16-nb:1] # ? mins
+  - examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1] # ? mins
+  - examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:4] # ? mins
+  - examples/test_exaone.py::test_llm_exaone_1gpu[disable_weight_only-exaone_deep_2.4b-float16-nb:4]
+  - examples/test_exaone.py::test_llm_exaone_2gpu[exaone_3.0_7.8b_instruct-float16-nb:1] # ? mins
   - examples/test_granite.py::test_llm_granite[granite-3.0-2b-instruct-bfloat16] # 5 mins
   - examples/test_draft_target_model.py::test_llm_draft_target_model_1gpu[no_streaming-gpt2-use_cpp_session-use_tokens-draft_len_4-float16-bs2] # 1 min
   - examples/test_draft_target_model.py::test_llm_draft_target_model_1gpu[no_streaming-gpt2-use_cpp_session-use_logits-draft_len_4-float16-bs2] # 1 min

tests/integration/test_lists/waives.txt

Lines changed: 2 additions & 2 deletions

@@ -111,7 +111,7 @@ full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bar
 full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
 full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
 full:B200_PCIe/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
-full:B200_PCIe/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone-float16-nb:1] SKIP (Disable for Blackwell)
+full:B200_PCIe/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1] SKIP (Disable for Blackwell)
 full:B200_PCIe/examples/test_gpt.py::test_llm_gpt2_next_prompt_tuning_1gpu SKIP (Disable for Blackwell)
 full:B200_PCIe/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16] SKIP (Disable for Blackwell)
 full:B200_PCIe/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8] SKIP (Disable for Blackwell)
@@ -224,7 +224,7 @@ full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-lar
 full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-enable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
 full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-bart-large-cnn-float16-disable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
 full:B200/examples/test_enc_dec.py::test_llm_enc_dec_general[compare_hf-mbart-large-50-many-to-one-mmt-float16-enable_gemm_plugin-enable_attention_plugin-disable_paged_kv_cache-tp:1-pp:1-nb:1] SKIP (Disable for Blackwell)
-full:B200/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone-float16-nb:1] SKIP (Disable for Blackwell)
+full:B200/examples/test_exaone.py::test_llm_exaone_1gpu[enable_weight_only-exaone_3.0_7.8b_instruct-float16-nb:1] SKIP (Disable for Blackwell)
 full:B200/examples/test_gpt.py::test_llm_gpt2_next_prompt_tuning_1gpu SKIP (Disable for Blackwell)
 full:B200/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp16] SKIP (Disable for Blackwell)
 full:B200/examples/test_gpt.py::test_llm_gpt_starcoder_lora_1gpu[peft-lora-starcoder2-15b-unity-copilot-starcoder2-lora_fp16-base_fp8] SKIP (Disable for Blackwell)
