refactor(server-plan): Optimize machine plan availability #5026
base: develop
Conversation
Force-pushed from 2920726 to 6e9ba24
Pull request overview
This PR refactors server plan “machine unavailable” syncing to reduce redundant availability checks by batching plan instance-types per cluster and running the sync as a deduplicated background job.
Changes:
- Enqueue the server-plan availability sync as a deduplicated background job and move the actual work into a private helper (a sketch of this pattern follows the list).
- Sync availability in batches per cluster (single check call per cluster) instead of per plan.
- Extend `Cluster.check_machine_availability` (and provider-specific helpers) to accept a list of machine types and return per-type availability results.
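For context, here is a minimal sketch of the deduplicated-enqueue pattern described above, assuming Frappe's `frappe.enqueue(..., job_id=..., deduplicate=True)` API; the method path, queue name, and job id are illustrative rather than lifted from the PR:

```python
import frappe


def sync_machine_availability_status_of_plans():
    """Scheduled entry point: enqueue the actual sync as a background job."""
    frappe.enqueue(
        # Private helper that does the real work (path assumed for illustration).
        "press.press.doctype.server_plan.server_plan._sync_machine_availability_status_of_plans",
        queue="long",
        job_id="sync_machine_availability_status_of_plans",
        # With deduplicate=True, Frappe skips enqueueing when a job with the
        # same job_id is already queued or running, so overlapping scheduler
        # ticks cannot stack redundant syncs.
        deduplicate=True,
    )
```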
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `press/press/doctype/server_plan/server_plan.py` | Enqueues sync job and batches plan availability checks by cluster to reduce repeated calls. |
| `press/press/doctype/cluster/cluster.py` | Adds list/batch support to machine availability checks for AWS/OCI/Hetzner. |
Comments suppressed due to low confidence (2)
`press/press/doctype/server_plan/server_plan.py:129`
- The `except` block logs `plan.name`, but `plan` here is the loop variable from the inner `for plan in cluster_plans[c]` loop and may be undefined if the exception occurs before that loop runs (or may refer to an unrelated plan from a previous iteration). This can mask the original exception with an `UnboundLocalError` and/or log the wrong context. Log using the cluster key (`c`) and/or include a safe identifier (e.g. the first plan in `cluster_plans[c]`, guarded by a check).
```python
except Exception:
    frappe.log_error(
        f"Failed to sync machine availability status for Server Plan {plan.name}",
    )
```
`press/press/doctype/cluster/cluster.py:1149`
- The new list/batch behavior of `check_machine_availability` (and the server-plan sync relying on it) isn't covered by tests. There are existing cluster/server tests in the repo; consider adding unit tests for both `str` and `list[str]` inputs (including Hetzner and a non-AWS/OCI/Hetzner provider) to prevent regressions.
```python
@frappe.whitelist()
def check_machine_availability(self, machine_type: str | list) -> bool | dict[str, bool]:
    "Check availability of machine in the region before allowing provision"
    if self.cloud_provider == "AWS EC2":
        return self._check_aws_machine_availability(machine_type)
    if self.cloud_provider == "OCI":
        return self._check_oci_machine_availability(machine_type)
```
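One possible shape for those tests, mocking the provider-specific helper so no cloud API is called; `create_test_cluster` is a hypothetical stand-in for whatever fixture helper the repo's existing cluster tests use:

```python
from unittest.mock import patch

from frappe.tests.utils import FrappeTestCase

from press.press.doctype.cluster.cluster import Cluster


class TestCheckMachineAvailability(FrappeTestCase):
    @patch.object(Cluster, "_check_aws_machine_availability", return_value=True)
    def test_str_input_returns_bool(self, _mock):
        # create_test_cluster is hypothetical; reuse the repo's real fixtures.
        cluster = create_test_cluster(cloud_provider="AWS EC2")
        self.assertIs(cluster.check_machine_availability("m5.large"), True)

    @patch.object(
        Cluster,
        "_check_aws_machine_availability",
        return_value={"m5.large": True, "t3.micro": False},
    )
    def test_list_input_returns_per_type_dict(self, _mock):
        cluster = create_test_cluster(cloud_provider="AWS EC2")
        result = cluster.check_machine_availability(["m5.large", "t3.micro"])
        self.assertEqual(result, {"m5.large": True, "t3.micro": False})
```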
```diff
  available_machine_ids = []
  for dc in datacenters:
      for st in dc.server_types.available:
-         if st.id == machine_id:
-             return True
- return False
+         available_machine_ids.append(st.id)

  results = {}
  for m in machine_type if isinstance(machine_type, list) else [machine_type]:
      results[m] = m in available_machine_ids

  return results
```
Copilot AI · Feb 9, 2026
In `_check_hetzner_machine_availability`, the result computation compares the machine type name (`m`) against `available_machine_ids` (a list of numeric IDs), so availability will always be reported incorrectly. Use `machine_type_id_map[m]` (defaulting to `False` for missing entries) when checking membership, and consider using a set for `available_machine_ids` to avoid repeated linear scans.
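A sketch of that fix, assuming `machine_type_id_map` maps type names to numeric Hetzner server-type IDs as elsewhere in this diff:

```python
# Collect available server-type IDs into a set for O(1) membership checks.
available_machine_ids = {st.id for dc in datacenters for st in dc.server_types.available}

results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
    # Compare resolved IDs rather than raw names; names missing from
    # machine_type_id_map resolve to None and are reported as unavailable.
    results[m] = machine_type_id_map.get(m) in available_machine_ids
```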
```python
results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
    results[m] = m in available_machine_ids
```
Copilot AI · Feb 9, 2026
`_check_hetzner_machine_availability` now always returns a dict (even when `machine_type` is a `str`), which is a behavior change from the previous `bool` return and will break existing call sites like `if not cluster.check_machine_availability(instance_type):`. Preserve the original behavior by returning a `bool` for `str` inputs (and only returning a dict for `list` inputs).
```diff
- results = {}
- for m in machine_type if isinstance(machine_type, list) else [machine_type]:
-     results[m] = m in available_machine_ids
+ # For a single machine type, return a boolean to preserve the original behavior.
+ if isinstance(machine_type, str):
+     machine_id = machine_type_id_map[machine_type]
+     return machine_id in available_machine_ids
+ # For a list of machine types, return a mapping of name -> availability.
+ results: dict[str, bool] = {}
+ for m in machine_type:
+     # If a machine type was not resolved earlier, treat it as unavailable.
+     machine_id = machine_type_id_map.get(m)
+     results[m] = machine_id in available_machine_ids
```
```python
plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
    [p["instance_type"] for p in cluster_plans[c]]
)
```
Copilot AI · Feb 9, 2026
`_sync_machine_availability_status_of_plans` assumes `cluster.check_machine_availability([...])` returns a dict and immediately calls `.get(...)`. For clusters whose `cloud_provider` is not AWS EC2/OCI/Hetzner, `check_machine_availability` currently returns `True`, so this will raise an `AttributeError` and prevent syncing for those clusters (e.g. DigitalOcean/Generic). Update `check_machine_availability` to return a dict when passed a list for all providers, or defensively handle a `bool` return here.
```diff
- plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
-     [p["instance_type"] for p in cluster_plans[c]]
- )
+ instance_types = [p["instance_type"] for p in cluster_plans[c]]
+ plan_availability_results = cluster.check_machine_availability(instance_types)
+ # Some Cluster implementations may return a boolean instead of a dict.
+ # Normalize to a dict so that the code below can safely use .get(...).
+ if isinstance(plan_availability_results, bool):
+     plan_availability_results = {
+         instance_type: plan_availability_results for instance_type in instance_types
+     }
```
```diff
- is_unavailable = not cluster_doc_map[plan.cluster].check_machine_availability(plan.instance_type)
- if is_unavailable != plan.machine_unavailable:
-     frappe.db.set_value("Server Plan", plan.name, "machine_unavailable", is_unavailable)
+ cluster: Cluster = frappe.get_doc("Cluster", c)
```
Copilot AI · Feb 9, 2026
`cluster_doc_map` is populated but not used in the second phase (the code re-fetches the cluster doc with `frappe.get_doc("Cluster", c)`). Either reuse `cluster_doc_map[c]` (and drop the extra DB fetch) or remove `cluster_doc_map` entirely to reduce complexity and queries.
```diff
- cluster: Cluster = frappe.get_doc("Cluster", c)
+ cluster: Cluster = cluster_doc_map[c]
```
No description provided.