Skip to content

Conversation

@tanmoysrt
Copy link
Member

No description provided.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors server plan “machine unavailable” syncing to reduce redundant availability checks by batching plan instance-types per cluster and running the sync as a deduplicated background job.

Changes:

  • Enqueue the server-plan availability sync as a deduplicated background job and move the actual work into a private helper.
  • Sync availability in batches per cluster (single check call per cluster) instead of per plan.
  • Extend Cluster.check_machine_availability (and provider-specific helpers) to accept a list of machine types and return per-type availability results.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
press/press/doctype/server_plan/server_plan.py Enqueues sync job and batches plan availability checks by cluster to reduce repeated calls.
press/press/doctype/cluster/cluster.py Adds list/batch support to machine availability checks for AWS/OCI/Hetzner.
Comments suppressed due to low confidence (2)

press/press/doctype/server_plan/server_plan.py:129

  • The except block logs plan.name, but plan here is the loop variable from the inner for plan in cluster_plans[c] loop and may be undefined if the exception occurs before that loop runs (or may refer to an unrelated plan from a previous loop). This can mask the original exception with an UnboundLocalError and/or log the wrong context. Log using the cluster key (c) and/or include a safe identifier (e.g. first plan in cluster_plans[c] guarded by a check).
		except Exception:
			frappe.log_error(
				f"Failed to sync machine availability status for Server Plan {plan.name}",
			)

press/press/doctype/cluster/cluster.py:1149

  • New list/batch behavior for check_machine_availability (and the server-plan sync relying on it) isn’t covered by tests. There are existing cluster/server tests in the repo; consider adding unit tests for both str and list[str] inputs (including Hetzner and a non-(AWS/OCI/Hetzner) provider) to prevent regressions.
	@frappe.whitelist()
	def check_machine_availability(self, machine_type: str | list) -> bool | dict[str, bool]:
		"Check availability of machine in the region before allowing provision"
		if self.cloud_provider == "AWS EC2":
			return self._check_aws_machine_availability(machine_type)
		if self.cloud_provider == "OCI":
			return self._check_oci_machine_availability(machine_type)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1132 to +1141
available_machine_ids = []
for dc in datacenters:
for st in dc.server_types.available:
if st.id == machine_id:
return True
return False
available_machine_ids.append(st.id)

results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
results[m] = m in available_machine_ids

return results
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In _check_hetzner_machine_availability, the result computation is comparing the machine type name (m) against available_machine_ids (a list of numeric IDs), so availability will always be reported incorrectly. Use machine_type_id_map[m] (and default to False for missing entries) when checking membership, and consider using a set for available_machine_ids to avoid repeated linear scans.

Copilot uses AI. Check for mistakes.
Comment on lines +1137 to +1140
results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
results[m] = m in available_machine_ids

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_check_hetzner_machine_availability now always returns a dict (even when machine_type is a str), which is a behavior change from the previous bool return and will break existing call sites like if not cluster.check_machine_availability(instance_type):. Preserve the original behavior by returning a bool for str inputs (and only returning a dict for list inputs).

Suggested change
results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
results[m] = m in available_machine_ids
# For a single machine type, return a boolean to preserve the original behavior.
if isinstance(machine_type, str):
machine_id = machine_type_id_map[machine_type]
return machine_id in available_machine_ids
# For a list of machine types, return a mapping of name -> availability.
results: dict[str, bool] = {}
for m in machine_type:
# If a machine type was not resolved earlier, treat it as unavailable.
machine_id = machine_type_id_map.get(m)
results[m] = machine_id in available_machine_ids

Copilot uses AI. Check for mistakes.
Comment on lines +116 to +119
plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
[p["instance_type"] for p in cluster_plans[c]]
)

Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_sync_machine_availability_status_of_plans assumes cluster.check_machine_availability([...]) returns a dict and immediately calls .get(...). For clusters with cloud_provider not in (AWS EC2/OCI/Hetzner) check_machine_availability currently returns True, so this will raise an AttributeError and prevent syncing for those clusters (e.g. DigitalOcean/Generic). Update check_machine_availability to return a dict when passed a list for all providers, or defensively handle a bool return here.

Suggested change
plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
[p["instance_type"] for p in cluster_plans[c]]
)
instance_types = [p["instance_type"] for p in cluster_plans[c]]
plan_availability_results = cluster.check_machine_availability(instance_types)
# Some Cluster implementations may return a boolean instead of a dict.
# Normalize to a dict so that the code below can safely use .get(...).
if isinstance(plan_availability_results, bool):
plan_availability_results = {
instance_type: plan_availability_results for instance_type in instance_types
}

Copilot uses AI. Check for mistakes.
is_unavailable = not cluster_doc_map[plan.cluster].check_machine_availability(plan.instance_type)
if is_unavailable != plan.machine_unavailable:
frappe.db.set_value("Server Plan", plan.name, "machine_unavailable", is_unavailable)
cluster: Cluster = frappe.get_doc("Cluster", c)
Copy link

Copilot AI Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster_doc_map is populated but not used in the second phase (the code re-fetches the cluster doc with frappe.get_doc("Cluster", c)). Either reuse cluster_doc_map[c] (and drop the extra DB fetch) or remove cluster_doc_map entirely to reduce complexity and queries.

Suggested change
cluster: Cluster = frappe.get_doc("Cluster", c)
cluster: Cluster = cluster_doc_map[c]

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

1 participant