
[v0.9.1-dev] dynamic eplb metrics #2364


Open · wants to merge 13 commits into base: v0.9.1-dev

Conversation

@baxingpiaochong commented Aug 14, 2025

What this PR does / why we need it?

With dynamic EPLB enabled, this PR writes expert heat to the metrics interface so that expert load can be observed.

Two metrics are exported: the heat of each physical expert, and the physical-expert-to-logical-expert mapping table. Heat is recorded per physical expert because redundant experts make it impossible to attribute a logical expert's heat accurately to each card; the mapping table is exported alongside so that heat can be mapped back to the logical expert.

The physical expert heat metric is labeled with layer, phy_expert_id, and rank, i.e. the MoE layer, physical expert ID, and card. The mapping table metric is labeled with layer, log_expert_id, phy_expert_id, and rank, i.e. the MoE layer, logical expert ID, physical expert ID, and card. The mapping metric encodes a binding relationship: its value is 1 when phy_expert_id is bound to log_expert_id, and 0 otherwise.
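A minimal sketch of what the two metrics could look like with prometheus_client; the metric names and example label values below are illustrative, not necessarily the PR's exact identifiers:

    from prometheus_client import Gauge

    # Hypothetical metric names for illustration only.
    phy_expert_heat = Gauge(
        "vllm_ascend_phy_expert_heat",
        documentation="heat of each physical expert per rank",
        labelnames=["layer", "phy_expert_id", "rank"])

    phy2log = Gauge(
        "vllm_ascend_phy2log",
        documentation="physical expert to logical expert per rank",
        labelnames=["layer", "log_expert_id", "phy_expert_id", "rank"])

    # Binding relationship: 1 while physical expert 12 on rank 3 serves
    # logical expert 7 in MoE layer 0; reset to 0 when it is remapped.
    phy2log.labels(layer=0, log_expert_id=7, phy_expert_id=12, rank=3).set(1)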

In testing, the impact on overall TPOT is less than 1%.

Does this PR introduce any user-facing change?

How was this patch tested?

Tested on vllm-ascend v0.9.1-dev with:
export DYNAMIC_EXPERT_LOAD_METRICS=1
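
As a sketch of how a consumer could fold the scraped physical-expert heat back onto logical experts using the binding table (plain dictionaries stand in for scraped samples; nothing below is part of the PR itself):

    # Assumed shapes:
    #   phy_heat: {(layer, phy_expert_id, rank): heat}
    #   phy2log:  {(layer, log_expert_id, phy_expert_id, rank): 0 or 1}
    def logical_heat(phy_heat, phy2log):
        out = {}
        for (layer, log_id, phy_id, rank), bound in phy2log.items():
            if bound:  # phy_id currently serves log_id on this rank
                key = (layer, log_id)
                out[key] = out.get(key, 0.0) + phy_heat.get((layer, phy_id, rank), 0.0)
        return out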

[screenshots: exported expert-heat metrics]

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces metrics for dynamic Expert Parallel Load Balancing (EPLB) to monitor expert heat and mappings. The changes involve creating a new logger class, EplbStatLogger, to handle Prometheus metrics, and integrating it into the model runner. The review identified a few high-severity issues: a non-daemon background thread that could prevent clean shutdowns, an incorrect implementation of a singleton's get_instance method, and improper handling of an environment variable for a boolean flag. These issues should be addressed to ensure correctness and robustness.

            documentation="physical expert to logical expert per rank",
            labelnames=labelnames_phy2log)

        self.do_record_loop = threading.Thread(target=self.record_loop)


Severity: high

The background thread do_record_loop is not configured as a daemon thread. This can prevent the application from shutting down cleanly, as the non-daemon thread will keep the process alive. It's recommended to create it as a daemon thread to ensure the process can exit properly.

Suggested change:

    - self.do_record_loop = threading.Thread(target=self.record_loop)
    + self.do_record_loop = threading.Thread(target=self.record_loop, daemon=True)
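
For context, a self-contained sketch of the behavior the suggestion relies on (names are illustrative):

    import threading
    import time

    def record_loop():
        while True:
            time.sleep(10)  # periodically flush expert-heat samples

    # daemon=True means this loop cannot keep the interpreter alive:
    # when the main thread exits, the process shuts down cleanly.
    recorder = threading.Thread(target=record_loop, daemon=True)
    recorder.start()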

Comment on lines 83 to 87:

    def get_instance():
        if EplbStatLogger._instance is None:
            raise ValueError(
                "ExpertLoadBalancer instance has not been initialized.")
        return EplbStatLogger


Severity: high

The get_instance method is implemented incorrectly. It returns the EplbStatLogger class object instead of the singleton instance _instance. Additionally, the ValueError message incorrectly refers to ExpertLoadBalancer instead of EplbStatLogger.

Suggested change:

    - def get_instance():
    -     if EplbStatLogger._instance is None:
    -         raise ValueError(
    -             "ExpertLoadBalancer instance has not been initialized.")
    -     return EplbStatLogger
    + def get_instance():
    +     if EplbStatLogger._instance is None:
    +         raise ValueError(
    +             "EplbStatLogger instance has not been initialized.")
    +     return EplbStatLogger._instance
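
For reference, a minimal version of the init/get_instance singleton shape the fix implies (a sketch only; the real class has its own constructor arguments):

    class EplbStatLogger:
        _instance = None

        @staticmethod
        def init_instance():
            # Real constructor arguments omitted in this sketch.
            EplbStatLogger._instance = EplbStatLogger()
            return EplbStatLogger._instance

        @staticmethod
        def get_instance():
            if EplbStatLogger._instance is None:
                raise ValueError(
                    "EplbStatLogger instance has not been initialized.")
            return EplbStatLogger._instance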

@@ -386,10 +388,12 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):

        # EPLB
        self.dynamic_eplb = ascend_config.dynamic_eplb
        self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", False)


Severity: high

The os.getenv function returns a string. When checking for a boolean environment variable, simply checking the truthiness of the returned value can lead to unexpected behavior. For example, if DYNAMIC_EXPERT_LOAD_METRICS is set to "0", it will be evaluated as True. The value should be explicitly compared to a string like "1".

Suggested change:

    - self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", False)
    + self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", "0") == "1"
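
An alternative, slightly more permissive parse (a sketch; which strings count as truthy is a design choice, not something the PR specifies):

    import os

    def env_flag(name: str, default: bool = False) -> bool:
        # Treat "1", "true", "yes" (case-insensitive) as enabled.
        val = os.getenv(name)
        if val is None:
            return default
        return val.strip().lower() in ("1", "true", "yes")

    dynamic_eplb_metrics = env_flag("DYNAMIC_EXPERT_LOAD_METRICS")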

@baxingpiaochong (Author) commented:

ready for review
