
Commit 2fc937e

Check domain stage before returning RUNNING status
Additional fix for the continuous polling issue. A Space can be in the RUNNING stage (Docker container started) while its domain is not yet ready (DNS still propagating, routing not yet configured). This caused RUNNING to be reported prematurely.

Changes:
- Only return DeploymentStatus.RUNNING when BOTH conditions are met:
  1. runtime.stage == SpaceStage.RUNNING
  2. domains[0]['stage'] == "READY"
- Space RUNNING but domain not ready → PENDING status
- Only set the deployment URL when the domain stage is READY
- Add domain_stage to the metadata for debugging

This ensures health checks only run once the domain can actually receive traffic, not merely once the Docker container has started.

Note: If polling continues after the domain is READY, the FastAPI app inside the container is most likely still initializing. The base deployer keeps polling until the /health endpoint responds with 200 OK.
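The readiness gate and the polling behavior described above can be sketched in a few lines. This is a minimal, self-contained sketch, not the deployer's code. `Stage` and `Status` are stand-in enums for huggingface_hub's `SpaceStage` and ZenML's `DeploymentStatus`. The BUILDING-to-PENDING mapping and the catch-all UNKNOWN are simplifications. `wait_until_healthy`, `check_health`, `max_attempts` and `interval` are hypothetical names, not the base deployer's API.

```python
import time
from enum import Enum
from typing import Any, Callable, Dict, List


class Stage(str, Enum):
    """Stand-in for huggingface_hub's SpaceStage (only the members used here)."""

    RUNNING = "RUNNING"
    BUILDING = "BUILDING"


class Status(str, Enum):
    """Stand-in for ZenML's DeploymentStatus (values are illustrative)."""

    RUNNING = "running"
    PENDING = "pending"
    UNKNOWN = "unknown"


def classify(stage: Stage, domains: List[Dict[str, Any]]) -> Status:
    """Gate RUNNING on both the Space stage and the first domain's stage."""
    if stage == Stage.RUNNING:
        if domains and domains[0].get("stage") == "READY":
            return Status.RUNNING
        # Container is up, but DNS/routing for the domain is not ready yet.
        return Status.PENDING
    if stage == Stage.BUILDING:
        # Assumption: build stages map to PENDING (health endpoint not up yet).
        return Status.PENDING
    # Every other stage is collapsed to UNKNOWN in this sketch.
    return Status.UNKNOWN


def wait_until_healthy(
    get_status: Callable[[], Status],
    check_health: Callable[[], bool],
    max_attempts: int = 60,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical polling loop: stop once RUNNING and /health returns 200."""
    for _ in range(max_attempts):
        # check_health() is only called once the status gate reports RUNNING,
        # so no health request is sent to a domain that cannot route traffic.
        if get_status() == Status.RUNNING and check_health():
            return True
        sleep(interval)
    return False


if __name__ == "__main__":
    # Scripted observations: build, container up with domain not ready,
    # then domain READY (the app needs one more poll to pass /health).
    observations = iter(
        [
            (Stage.BUILDING, []),
            (Stage.RUNNING, [{"domain": "demo.hf.space", "stage": "PENDING"}]),
            (Stage.RUNNING, [{"domain": "demo.hf.space", "stage": "READY"}]),
            (Stage.RUNNING, [{"domain": "demo.hf.space", "stage": "READY"}]),
        ]
    )
    health = iter([False, True])
    ok = wait_until_healthy(
        lambda: classify(*next(observations)),
        lambda: next(health),
        sleep=lambda _: None,
    )
    print("healthy:", ok)  # prints "healthy: True"
```

Injecting `sleep` and `check_health` keeps the loop testable without a network. The `and` short-circuit means the health probe only runs after the gate reports RUNNING, which mirrors the intent of the commit. The `"PENDING"` domain stage and the `demo.hf.space` hostname are made-up illustrative values.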
1 parent f58c1fd commit 2fc937e


1 file changed: +17, -9 lines changed


src/zenml/integrations/huggingface/deployers/huggingface_deployer.py

Lines changed: 17 additions & 9 deletions
```diff
@@ -501,9 +501,15 @@ def do_get_deployment_state(
         runtime = api.get_space_runtime(repo_id=space_id)
 
         # Map HuggingFace Space stages to ZenML standard deployment states
-        # Only RUNNING means fully provisioned with health endpoint available
+        # Only RUNNING + domain READY means fully provisioned with health endpoint available
         if runtime.stage == SpaceStage.RUNNING:
-            status = DeploymentStatus.RUNNING
+            # Check if domain is also ready (not just Space running)
+            domains = runtime.raw.get("domains", [])
+            if domains and domains[0].get("stage") == "READY":
+                status = DeploymentStatus.RUNNING
+            else:
+                # Space is running but domain not ready yet (DNS propagating, etc.)
+                status = DeploymentStatus.PENDING
         # Building/updating states - health endpoint not yet available
         elif runtime.stage in [
             SpaceStage.BUILDING,
@@ -529,22 +535,24 @@ def do_get_deployment_state(
             # Unknown/future stages
             status = DeploymentStatus.UNKNOWN
 
-        # Get deployment URL from Space domains (only available when RUNNING)
+        # Get deployment URL from Space domains (only when fully ready)
         url = None
-        if status == DeploymentStatus.RUNNING and runtime.raw.get(
-            "domains"
-        ):
-            # Extract the first domain (primary domain for the Space)
+        domain_stage = None
+        if runtime.raw.get("domains"):
             domains = runtime.raw.get("domains", [])
-            if domains and domains[0].get("domain"):
-                url = f"https://{domains[0]['domain']}"
+            if domains:
+                domain_stage = domains[0].get("stage")
+                # Only set URL if domain is ready for traffic
+                if domain_stage == "READY" and domains[0].get("domain"):
+                    url = f"https://{domains[0]['domain']}"
 
         return DeploymentOperationalState(
             status=status,
             url=url,
             metadata={
                 "space_id": space_id,
                 "external_state": runtime.stage,
+                "domain_stage": domain_stage,
             },
         )
 
```
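The second hunk separates two concerns. It always records the first domain's stage, but it only builds a URL once that stage is READY. A minimal sketch of that extraction as a standalone helper follows. `extract_url_and_domain_stage` is a hypothetical name, and the payload shape (a list of dicts with `domain` and `stage` keys under `runtime.raw["domains"]`) is taken from the diff. The `"PENDING"` stage and the `user-demo.hf.space` hostname are invented sample values.

```python
from typing import Any, Dict, Optional, Tuple


def extract_url_and_domain_stage(
    raw: Dict[str, Any],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (url, domain_stage) for the Space's first (primary) domain.

    domain_stage is recorded whenever a domain entry exists, so it can be
    surfaced in metadata for debugging; url is set only when that stage is
    READY and a domain name is present.
    """
    url: Optional[str] = None
    domain_stage: Optional[str] = None
    # `or []` also covers a "domains" key that is present but null.
    domains = raw.get("domains") or []
    if domains:
        domain_stage = domains[0].get("stage")
        if domain_stage == "READY" and domains[0].get("domain"):
            url = f"https://{domains[0]['domain']}"
    return url, domain_stage


# Hypothetical payloads shaped like runtime.raw["domains"] in the diff.
for payload in (
    {"domains": [{"domain": "user-demo.hf.space", "stage": "READY"}]},
    {"domains": [{"domain": "user-demo.hf.space", "stage": "PENDING"}]},
    {},
):
    print(extract_url_and_domain_stage(payload))
```

Recording `domain_stage` even when `url` is `None` means a deployment stuck in PENDING shows in its metadata whether the domain or the container is the holdup.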
