@noemotiovon
Collaborator

@noemotiovon noemotiovon commented Aug 4, 2025

This commit adds support for executing ggml computational graphs using Huawei's ACL graph mode via the USE_ACL_GRAPH flag. The support can be enabled at compile time using the CMake option:

-DUSE_ACL_GRAPH=ON

By default, ACL graph execution is disabled, and the fallback path uses node-by-node execution.

Key additions:

  • CMake option to toggle graph mode
  • Graph capture and execution logic using
  • Tensor property matching to determine whether graph update is required
  • Safe fallback and logging if the environment variable LLAMA_SET_ROWS is unset or invalid

This prepares the backend for performance improvements in repetitive graph execution scenarios on Ascend devices.
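
The tensor property matching mentioned in the list above can be pictured with a minimal sketch. This is not the code from the PR: `TensorProps` and `graph_update_required` are hypothetical names, and the choice of compared fields (data pointer, op, shape, strides) is an assumption about which properties would invalidate a captured graph.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the tensor fields that a captured graph depends on.
struct TensorProps {
    const void *           data;  // buffer address the kernel reads or writes
    int                    op;    // operation id (for example a GGML_OP_* value)
    std::array<int64_t, 4> ne;    // number of elements per dimension
    std::array<size_t, 4>  nb;    // stride in bytes per dimension

    bool operator==(const TensorProps & other) const {
        return data == other.data && op == other.op && ne == other.ne && nb == other.nb;
    }
};

// Returns true when the cached properties no longer match the incoming nodes,
// meaning the graph would have to be captured again. The cache is refreshed so
// that the next call compares against the latest graph.
bool graph_update_required(std::vector<TensorProps> & cached,
                           const std::vector<TensorProps> & current) {
    if (cached == current) {
        return false;  // same node count and identical properties: replay the captured graph
    }
    cached = current;
    return true;
}
```

With a check of this shape, consecutive decode steps whose nodes keep the same addresses, ops, and shapes can replay the captured graph, which is the repetitive scenario the description refers to.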

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Ascend NPU (issues specific to Ascend NPUs) labels Aug 4, 2025
@noemotiovon
Collaborator Author

noemotiovon commented Aug 4, 2025

Model: Qwen2.5-0.5B

| | with GGML_OP_CPY | with GGML_OP_SET_ROWS |
|---|---|---|
| with ACL Graph | fall back without ACL Graph | 120 tok/s |
| without ACL Graph | 70 tok/s | 70 tok/s |

With ACL Graph Test:

user
Building a website can be done in 10 steps:
assistant
Certainly! Here are 10 steps to help you build a website:

1. **Define Your Purpose and Goals**
   - Identify what you want your website to do.
   - Determine your audience and what they want.

2. **Choose the Right Platform**
   - Select a web development platform that suits your needs.
   - Consider factors like stability, security, and scalability.

3. **Set Up a Domain Name**
   - Choose a domain name that is easy to remember and reflects your website.
   - Ensure it is available and easy to register.

4. **Choose a Domain Hosting Provider**
   - Select a reliable domain hosting service that meets your needs.
   - Consider features like speed, security, and scalability.

5. **Choose a Content Management System (CMS)**
   - Select a CMS that fits your needs and the complexity of your website.
   - Choose based on features, ease of use, and community support.

6. **Design Your Website**
   - Choose a design that is visually appealing and easy to navigate.
   - Consider the style, color scheme, and layout.

7. **Choose a Content Writer**
   - Hire a skilled content writer to create the content.
   - Consider the writer's skills, experience, and style.

8. **Choose a Website Builder**
   - Select a website builder that suits your needs.
   - Choose based on features, ease of use, and scalability.

9. **Implement Your Website**
   - Use your chosen content management system and website builder to create your website.
   - Test the website thoroughly for errors and usability.

10. **Launch and Promote Your Website**
    - Launch your website and start promoting it.
    - Engage with your audience through social media, email marketing, and other channels.

11. **Monitor and Update**
    - Monitor the performance of your website.
    - Keep your website updated with new content and features.

12. **Refine and Improve**
    - Regularly review and refine your website content and design.
    - Stay updated with the latest trends and best practices in web development.

These steps should help you build a website that meets your needs and helps drive traffic to your site.

> 
llama_perf_sampler_print:    sampling time =     117.75 ms /   469 runs   (    0.25 ms per token,  3982.91 tokens per second)
llama_perf_context_print:        load time =    7527.43 ms
llama_perf_context_print: prompt eval time =      40.20 ms /    20 tokens (    2.01 ms per token,   497.49 tokens per second)
llama_perf_context_print:        eval time =    3754.24 ms /   448 runs   (    8.38 ms per token,   119.33 tokens per second)
llama_perf_context_print:       total time =    4803.95 ms /   468 tokens
llama_perf_context_print:    graphs reused =        433

Collaborator

@hipudding hipudding left a comment

Thank you for your contribution. Enabling acl_graph support greatly reduces NPU idle cycles, with particularly noticeable performance gains on models with smaller parameter sizes.
The code is basically fine; only a few minor changes are needed.

if (ggml_is_empty(node) || node->op == GGML_OP_NONE) {
continue;
#ifdef USE_CANN_GRAPH
bool use_cann_graph = true;
Collaborator

@hipudding hipudding Aug 5, 2025

I think use_cann_graph is not necessary. You only need to set cann_graph_update_required = true, whether acl_graph is on or off.

Collaborator Author

I think use_cann_graph is still needed. Even when cann_graph is enabled, we may fall back to eager mode if LLAMA_SET_ROWS is not set. So we need a way to track that cann_graph is not being used in this case.
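
A minimal sketch of the distinction being argued here, under stated assumptions: `set_rows_enabled`, `ExecMode`, and `choose_exec_mode` are hypothetical names, and treating LLAMA_SET_ROWS as enabled only when it parses to a nonzero integer is an assumption, not something the thread confirms. The point is that a build with graph support can still end up on the eager path at runtime, so a separate runtime flag records which path was actually taken.

```cpp
#include <cstdlib>

// Hypothetical helper: LLAMA_SET_ROWS counts as enabled only when it is set
// and parses to a nonzero integer (assumed interpretation of "unset or invalid").
bool set_rows_enabled() {
    const char * value = std::getenv("LLAMA_SET_ROWS");
    return value != nullptr && std::atoi(value) != 0;
}

enum class ExecMode { Eager, Graph };

// compiled_with_graph mirrors the compile-time option; use_cann_graph is the
// runtime decision, which can be false even when the option is on.
ExecMode choose_exec_mode(bool compiled_with_graph, bool set_rows_supported) {
    bool use_cann_graph = compiled_with_graph;
    if (!set_rows_supported) {
        use_cann_graph = false;  // fall back to node-by-node (eager) execution
    }
    return use_cann_graph ? ExecMode::Graph : ExecMode::Eager;
}
```

A caller could evaluate `choose_exec_mode(true, set_rows_enabled())` once per compute call and skip graph capture entirely when the result is `ExecMode::Eager`.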

@hipudding hipudding changed the title feat(cann): add optional support for ACL Graph execution CANN: add optional support for ACL Graph execution Aug 5, 2025
This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:

    -DUSE_CANN_GRAPH=ON

By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.

Key additions:
- CMake option to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
  is unset or invalid

This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.

Signed-off-by: noemotiovon <[email protected]>
@hipudding hipudding merged commit 2241453 into ggml-org:master Aug 6, 2025
48 checks passed