[v0.9.1-dev]dynamic eplb metrics #2364
base: v0.9.1-dev
Conversation
Code Review
This pull request introduces metrics for dynamic Expert Parallel Load Balancing (EPLB) to monitor expert heat and mappings. The changes involve creating a new logger class, EplbStatLogger, to handle Prometheus metrics, and integrating it into the model runner. The review identified a few high-severity issues: a non-daemon background thread that could prevent clean shutdowns, an incorrect implementation of a singleton's get_instance method, and improper handling of an environment variable for a boolean flag. These issues should be addressed to ensure correctness and robustness.
vllm_ascend/eplb/eplb_loggers.py
Outdated
documentation="physical expert to logical expert per rank",
labelnames=labelnames_phy2log)

self.do_record_loop = threading.Thread(target=self.record_loop)
The background thread do_record_loop is not configured as a daemon thread. This can prevent the application from shutting down cleanly, as the non-daemon thread will keep the process alive. It is recommended to create it as a daemon thread to ensure the process can exit properly.
Suggested change:
- self.do_record_loop = threading.Thread(target=self.record_loop)
+ self.do_record_loop = threading.Thread(target=self.record_loop, daemon=True)
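For illustration, a minimal sketch of a daemon-ized record loop with an explicit stop signal (the stop event and interval are assumptions for this sketch, not part of this PR; the suggestion itself only needs daemon=True):

import threading

# Hypothetical stop event and polling interval for the sketch.
stop_event = threading.Event()
RECORD_INTERVAL_S = 10.0

def record_loop():
    # Event.wait doubles as an interruptible sleep: the loop exits promptly
    # once stop_event is set, and daemon=True keeps the thread from blocking
    # interpreter shutdown even if it never is.
    while not stop_event.wait(RECORD_INTERVAL_S):
        pass  # push current expert-heat values into the Prometheus gauges here

do_record_loop = threading.Thread(target=record_loop, daemon=True)
do_record_loop.start()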
vllm_ascend/eplb/eplb_loggers.py
Outdated
def get_instance():
    if EplbStatLogger._instance is None:
        raise ValueError(
            "ExpertLoadBalancer instance has not been initialized.")
    return EplbStatLogger
The get_instance method is implemented incorrectly. It returns the EplbStatLogger class object instead of the singleton instance _instance. Additionally, the ValueError message incorrectly refers to ExpertLoadBalancer instead of EplbStatLogger.
Suggested change:
  def get_instance():
      if EplbStatLogger._instance is None:
          raise ValueError(
-             "ExpertLoadBalancer instance has not been initialized.")
-     return EplbStatLogger
+             "EplbStatLogger instance has not been initialized.")
+     return EplbStatLogger._instance
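For context, a minimal sketch of the singleton access pattern the fix implies (the init_instance classmethod and its rank argument are assumptions for illustration, not the PR's actual constructor):

class EplbStatLogger:
    """Sketch of the singleton pattern implied by the suggested fix."""

    _instance = None

    def __init__(self, rank):
        self.rank = rank

    @classmethod
    def init_instance(cls, rank):
        # Create the singleton exactly once, e.g. from the model runner.
        if cls._instance is None:
            cls._instance = cls(rank)
        return cls._instance

    @staticmethod
    def get_instance():
        if EplbStatLogger._instance is None:
            raise ValueError(
                "EplbStatLogger instance has not been initialized.")
        return EplbStatLogger._instance

# Usage: initialize once, then retrieve the same object everywhere else.
EplbStatLogger.init_instance(rank=0)
assert EplbStatLogger.get_instance() is EplbStatLogger.get_instance()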
@@ -386,10 +388,12 @@ def __init__(self, vllm_config: VllmConfig, device: torch.device):

        #EPLB
        self.dynamic_eplb = ascend_config.dynamic_eplb
        self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", False)
The os.getenv function returns a string. When checking for a boolean environment variable, simply checking the truthiness of the returned value can lead to unexpected behavior. For example, if DYNAMIC_EXPERT_LOAD_METRICS is set to "0", it will be evaluated as True. The value should be explicitly compared to a string like "1".
Suggested change:
- self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", False)
+ self.dynamic_eplb_metrics = os.getenv("DYNAMIC_EXPERT_LOAD_METRICS", "0") == "1"
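More generally, a small helper along these lines makes the intent explicit (a sketch, not part of this PR; the accepted truthy spellings are an assumption):

import os

def env_flag(name, default=False):
    """Interpret an environment variable as a boolean flag."""
    value = os.getenv(name)
    if value is None:
        return default
    # Treat common truthy spellings as True; everything else (incl. "0") as False.
    return value.strip().lower() in ("1", "true", "yes", "on")

# Equivalent in spirit to the suggested == "1" check, but more forgiving.
dynamic_eplb_metrics = env_flag("DYNAMIC_EXPERT_LOAD_METRICS")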
ready for review |
What this PR does / why we need it?
In dynamic EPLB, expert heat is exported through the metrics interface so that it can be observed.
Two metrics are recorded: per-physical-expert heat and the physical-expert-to-logical-expert mapping table. Because redundant experts prevent logical-expert heat from accurately reflecting the load on each card, heat is recorded per physical expert, and the mapping table is recorded so that heat can be mapped back to logical experts.
Physical expert heat is labelled with layer, phy_expert_id, and rank, representing the MoE layer, physical expert ID, and card, respectively. The mapping table is labelled with layer, log_expert_id, phy_expert_id, and rank, representing the MoE layer, logical expert ID, physical expert ID, and card, respectively. The mapping table encodes the binding relationship: the indicator for a log_expert_id is 1 when the phy_expert_id is bound to it, and 0 otherwise.
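As an illustration of this label scheme, gauges along the following lines could express it with prometheus_client (the metric and variable names here are assumptions, not necessarily those used in the PR):

from prometheus_client import Gauge

# Physical expert heat, labelled by MoE layer, physical expert ID, and card (rank).
phy_expert_heat = Gauge(
    "vllm_ascend_phy_expert_heat",  # assumed metric name
    "Load (heat) observed on each physical expert per rank",
    labelnames=["layer", "phy_expert_id", "rank"])

# Physical-to-logical mapping: 1 if the physical expert is currently bound to
# the logical expert on that rank, 0 otherwise.
phy2log_mapping = Gauge(
    "vllm_ascend_phy2log_mapping",  # assumed metric name
    "Physical expert to logical expert mapping per rank",
    labelnames=["layer", "log_expert_id", "phy_expert_id", "rank"])

# Example updates for one (layer, expert, rank) combination.
phy_expert_heat.labels(layer=0, phy_expert_id=3, rank=1).set(1234)
phy2log_mapping.labels(layer=0, log_expert_id=7, phy_expert_id=3, rank=1).set(1)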
In testing, the impact on overall TPOT was less than 1%.
Does this PR introduce any user-facing change?
How was this patch tested?
Tested on vllm-ascend v0.9.1-dev with:
export DYNAMIC_EXPERT_LOAD_METRICS=1