Conversation

@notsyncing

What does this PR do?

Hello, I'm trying to export a Qwen2.5 GPTQ model loaded with gptqmodel to OpenVINO, and it errors:


INFO  ENV: Auto setting PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' for memory saving.                                                       
INFO  ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.                                                                               
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
INFO   Kernel: Auto-selection: adding candidate `IPEXQuantLinear`                                                                                   
`loss_type=None` was set in the config but it is unrecognised.Using the default loss: `ForCausalLMLoss`.
Traceback (most recent call last):
  File "/var/home/sfc/Projects/azarrot-py312/.venv/bin/azarrot", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/src/azarrot/server.py", line 347, in main
    server = create_server()
             ^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/src/azarrot/server.py", line 279, in create_server
    model_manager = ModelManager(config, backends)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/src/azarrot/models/model_manager.py", line 57, in __init__
    self.refresh_models()
  File "/var/home/sfc/Projects/azarrot-py312/src/azarrot/models/model_manager.py", line 204, in refresh_models
    model_info = backend.load_model(model)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/src/azarrot/backends/transformers_based_backend.py", line 157, in load_model
    transformers_model: Any = model_class.from_pretrained(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_base.py", line 485, in from_pretrained
    return super().from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/modeling_base.py", line 438, in from_pretrained
    return from_pretrained_method(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/intel/openvino/modeling_decoder.py", line 319, in _from_transformers
    main_export(
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/exporters/openvino/__main__.py", line 386, in main_export
    model = TasksManager.get_model_from_task(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/exporters/tasks.py", line 2283, in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 262, in _wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/transformers/modeling_utils.py", line 4400, in from_pretrained
    hf_quantizer.postprocess_model(model, config=config)
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/transformers/quantizers/base.py", line 207, in postprocess_model
    return self._process_model_after_weight_loading(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/transformers/quantizers/quantizer_gptq.py", line 111, in _process_model_after_weight_loading
    model = self.optimum_quantizer.post_init_model(model)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/home/sfc/Projects/azarrot-py312/.venv/lib/python3.12/site-packages/optimum/exporters/openvino/__main__.py", line 345, in post_init_model
    from auto_gptq import exllama_set_max_input_length
ModuleNotFoundError: No module named 'auto_gptq'
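
For context, a minimal sketch of the kind of call that hits this path (the model id and loading code here are illustrative, not the exact azarrot code):

```python
# Hypothetical repro: exporting a GPTQ checkpoint to OpenVINO on the fly.
from optimum.intel import OVModelForCausalLM

# export=True routes through main_export -> post_init_model, where the
# unconditional `from auto_gptq import exllama_set_max_input_length`
# fails if only gptqmodel is installed.
model = OVModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4",  # illustrative GPTQ model id
    export=True,
)
```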

I found that gptqmodel provides the same exllama_set_max_input_length function as auto_gptq, so I added an import that tries gptqmodel first, and it works.
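
In code terms, the fix amounts to a fallback import along these lines (a minimal sketch of the idea, not the exact diff):

```python
# Prefer gptqmodel, which provides the same function; fall back to
# auto_gptq for environments that still have it installed.
try:
    from gptqmodel import exllama_set_max_input_length
except ImportError:
    from auto_gptq import exllama_set_max_input_length
```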

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@IlyasMoutawwakil (Member) left a comment

Thanks for the catch! auto_gptq is being deprecated, so we should probably also warn the user about that and advise using gptqmodel instead.
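
A possible shape for that warning, sketched on top of the fallback above (hypothetical wording, not part of this PR):

```python
import logging

logger = logging.getLogger(__name__)

try:
    from gptqmodel import exllama_set_max_input_length
except ImportError:
    logger.warning(
        "auto_gptq is deprecated and will no longer be supported; "
        "please install gptqmodel instead."
    )
    from auto_gptq import exllama_set_max_input_length
```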

@notsyncing force-pushed the fix-gptqmodel-exporting branch from 8e1aab1 to 5d29d0f on May 6, 2025 at 13:39
@notsyncing force-pushed the fix-gptqmodel-exporting branch from 5d29d0f to d642dde on May 9, 2025 at 14:26
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

