refactor(server-plan): Optimize machine plan availability #5026
base: develop
Conversation
Force-pushed from 2920726 to 6e9ba24
Pull request overview
This PR refactors server plan “machine unavailable” syncing to reduce redundant availability checks by batching plan instance-types per cluster and running the sync as a deduplicated background job.
Changes:
- Enqueue the server-plan availability sync as a deduplicated background job and move the actual work into a private helper (a sketch of this pattern follows the list).
- Sync availability in batches per cluster (single check call per cluster) instead of per plan.
- Extend `Cluster.check_machine_availability` (and provider-specific helpers) to accept a list of machine types and return per-type availability results.
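For context, here is a minimal sketch of the deduplicated-enqueue pattern described above, assuming Frappe's `frappe.enqueue(..., job_id=..., deduplicate=True)` API; the method path, queue name, and job id are illustrative rather than lifted from the PR:

```python
import frappe


def sync_machine_availability_status_of_plans():
    """Scheduled entry point: enqueue the actual sync as a background job."""
    frappe.enqueue(
        # Private helper that does the real work (path assumed for illustration).
        "press.press.doctype.server_plan.server_plan._sync_machine_availability_status_of_plans",
        queue="long",
        job_id="sync_machine_availability_status_of_plans",
        # With deduplicate=True, Frappe skips enqueueing when a job with the
        # same job_id is already queued or running, so overlapping scheduler
        # ticks cannot stack redundant syncs.
        deduplicate=True,
    )
```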
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `press/press/doctype/server_plan/server_plan.py` | Enqueues sync job and batches plan availability checks by cluster to reduce repeated calls. |
| `press/press/doctype/cluster/cluster.py` | Adds list/batch support to machine availability checks for AWS/OCI/Hetzner. |
Comments suppressed due to low confidence (2)
`press/press/doctype/server_plan/server_plan.py:129`
- The `except` block logs `plan.name`, but `plan` here is the loop variable from the inner `for plan in cluster_plans[c]` loop and may be undefined if the exception occurs before that loop runs (or may refer to an unrelated plan from a previous iteration). This can mask the original exception with an `UnboundLocalError` and/or log the wrong context. Log using the cluster key (`c`) and/or include a safe identifier (e.g. the first plan in `cluster_plans[c]`, guarded by a check).
```python
except Exception:
    frappe.log_error(
        f"Failed to sync machine availability status for Server Plan {plan.name}",
    )
```
`press/press/doctype/cluster/cluster.py:1149`
- The new list/batch behavior of `check_machine_availability` (and the server-plan sync relying on it) isn't covered by tests. There are existing cluster/server tests in the repo; consider adding unit tests for both `str` and `list[str]` inputs (including Hetzner and a non-AWS/OCI/Hetzner provider) to prevent regressions.
```python
@frappe.whitelist()
def check_machine_availability(self, machine_type: str | list) -> bool | dict[str, bool]:
    "Check availability of machine in the region before allowing provision"
    if self.cloud_provider == "AWS EC2":
        return self._check_aws_machine_availability(machine_type)
    if self.cloud_provider == "OCI":
        return self._check_oci_machine_availability(machine_type)
```
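One possible shape for those tests, mocking the provider-specific helper so no cloud API is called; `create_test_cluster` is a hypothetical stand-in for whatever fixture helper the repo's existing cluster tests use:

```python
from unittest.mock import patch

from frappe.tests.utils import FrappeTestCase

from press.press.doctype.cluster.cluster import Cluster


class TestCheckMachineAvailability(FrappeTestCase):
    @patch.object(Cluster, "_check_aws_machine_availability", return_value=True)
    def test_str_input_returns_bool(self, _mock):
        # create_test_cluster is hypothetical; reuse the repo's real fixtures.
        cluster = create_test_cluster(cloud_provider="AWS EC2")
        self.assertIs(cluster.check_machine_availability("m5.large"), True)

    @patch.object(
        Cluster,
        "_check_aws_machine_availability",
        return_value={"m5.large": True, "t3.micro": False},
    )
    def test_list_input_returns_per_type_dict(self, _mock):
        cluster = create_test_cluster(cloud_provider="AWS EC2")
        result = cluster.check_machine_availability(["m5.large", "t3.micro"])
        self.assertEqual(result, {"m5.large": True, "t3.micro": False})
```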
```diff
  available_machine_ids = []
  for dc in datacenters:
      for st in dc.server_types.available:
-         if st.id == machine_id:
-             return True
- return False
+         available_machine_ids.append(st.id)

  results = {}
  for m in machine_type if isinstance(machine_type, list) else [machine_type]:
      results[m] = m in available_machine_ids

  return results
```
Copilot AI · Feb 9, 2026
In `_check_hetzner_machine_availability`, the result computation compares the machine type name (`m`) against `available_machine_ids` (a list of numeric IDs), so availability will always be reported incorrectly. Use `machine_type_id_map[m]` (defaulting to `False` for missing entries) when checking membership, and consider using a set for `available_machine_ids` to avoid repeated linear scans.
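A sketch of that fix, assuming `machine_type_id_map` maps type names to numeric Hetzner server-type IDs as elsewhere in this diff:

```python
# Collect available server-type IDs into a set for O(1) membership checks.
available_machine_ids = {st.id for dc in datacenters for st in dc.server_types.available}

results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
    # Compare resolved IDs rather than raw names; names missing from
    # machine_type_id_map resolve to None and are reported as unavailable.
    results[m] = machine_type_id_map.get(m) in available_machine_ids
```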
```python
results = {}
for m in machine_type if isinstance(machine_type, list) else [machine_type]:
    results[m] = m in available_machine_ids
```
Copilot AI · Feb 9, 2026
`_check_hetzner_machine_availability` now always returns a dict (even when `machine_type` is a `str`), which is a behavior change from the previous `bool` return and will break existing call sites like `if not cluster.check_machine_availability(instance_type):`. Preserve the original behavior by returning a `bool` for `str` inputs (and only returning a dict for `list` inputs).
```diff
- results = {}
- for m in machine_type if isinstance(machine_type, list) else [machine_type]:
-     results[m] = m in available_machine_ids
+ # For a single machine type, return a boolean to preserve the original behavior.
+ if isinstance(machine_type, str):
+     machine_id = machine_type_id_map[machine_type]
+     return machine_id in available_machine_ids
+ # For a list of machine types, return a mapping of name -> availability.
+ results: dict[str, bool] = {}
+ for m in machine_type:
+     # If a machine type was not resolved earlier, treat it as unavailable.
+     machine_id = machine_type_id_map.get(m)
+     results[m] = machine_id in available_machine_ids
```
```python
plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
    [p["instance_type"] for p in cluster_plans[c]]
)
```
Copilot AI · Feb 9, 2026
`_sync_machine_availability_status_of_plans` assumes `cluster.check_machine_availability([...])` returns a dict and immediately calls `.get(...)`. For clusters whose `cloud_provider` is not AWS EC2/OCI/Hetzner, `check_machine_availability` currently returns `True`, so this will raise an `AttributeError` and prevent syncing for those clusters (e.g. DigitalOcean/Generic). Update `check_machine_availability` to return a dict when passed a list for all providers, or defensively handle a `bool` return here.
```diff
- plan_availability_results: dict[str, bool] = cluster.check_machine_availability(
-     [p["instance_type"] for p in cluster_plans[c]]
- )
+ instance_types = [p["instance_type"] for p in cluster_plans[c]]
+ plan_availability_results = cluster.check_machine_availability(instance_types)
+ # Some Cluster implementations may return a boolean instead of a dict.
+ # Normalize to a dict so that the code below can safely use .get(...).
+ if isinstance(plan_availability_results, bool):
+     plan_availability_results = {
+         instance_type: plan_availability_results for instance_type in instance_types
+     }
```
```diff
- is_unavailable = not cluster_doc_map[plan.cluster].check_machine_availability(plan.instance_type)
- if is_unavailable != plan.machine_unavailable:
-     frappe.db.set_value("Server Plan", plan.name, "machine_unavailable", is_unavailable)
+ cluster: Cluster = frappe.get_doc("Cluster", c)
```
Copilot AI · Feb 9, 2026
`cluster_doc_map` is populated but not used in the second phase (the code re-fetches the cluster doc with `frappe.get_doc("Cluster", c)`). Either reuse `cluster_doc_map[c]` (and drop the extra DB fetch) or remove `cluster_doc_map` entirely to reduce complexity and queries.
```diff
- cluster: Cluster = frappe.get_doc("Cluster", c)
+ cluster: Cluster = cluster_doc_map[c]
```
No description provided.