skypilot/utils.py: create_docker_run_command passes -e KEY without value, breaking env vars under sudo

### Contact Details [Optional]

arrobaouassim@gmail.com

### System Information

```
ZENML_LOCAL_VERSION: 0.94.1
ZENML_SERVER_VERSION: 0.94.1
ZENML_SERVER_DATABASE: mysql
ZENML_SERVER_DEPLOYMENT_TYPE: other
ZENML_CONFIG_DIR: /home/<user>/.config/zenml
ZENML_LOCAL_STORE_DIR: /home/<user>/.config/zenml/local_stores
ZENML_SERVER_URL: <redacted>
ZENML_ACTIVE_REPOSITORY_ROOT: /home/<user>/dev/<project>
PYTHON_VERSION: 3.13.12
ENVIRONMENT: wsl
SYSTEM_INFO: {'os': 'linux', 'linux_distro': 'ubuntu', 'linux_distro_like': 'debian', 'linux_distro_version': '24.04'}
ACTIVE_PROJECT: default
ACTIVE_STACK: cloud_stack
ACTIVE_USER: <redacted>
TELEMETRY_STATUS: enabled
ANALYTICS_CLIENT_ID: <redacted>
ANALYTICS_USER_ID: <redacted>
ANALYTICS_SERVER_ID: <redacted>
INTEGRATIONS: ['scipy', 'numpy', 'kaniko', 'kubernetes', 'wandb', 'airflow', 's3', 'sklearn', 'pandas', 'pillow']

CURRENT STACK

Name: cloud_stack
ID: <redacted>
User: <redacted>

IMAGE_BUILDER: local_builder

Name: local_builder
ID: <redacted>
Type: image_builder
Flavor: local
Configuration: {}
User: <redacted>

EXPERIMENT_TRACKER: wandb_tracker

Name: wandb_tracker
ID: <redacted>
Type: experiment_tracker
Flavor: wandb
Configuration: {'api_key': '********', 'entity': '<redacted>', 'project_name': '<redacted>'}
User: <redacted>

ORCHESTRATOR: skypilot_gcp

Name: skypilot_gcp
ID: <redacted>
Type: orchestrator
Flavor: vm_gcp
Configuration: {'region': 'us-east4', 'idle_minutes_to_autostop': 30, 'down': True, 'stream_logs': True, 'project': '<gcp-project>'}
User: <redacted>

CONTAINER_REGISTRY: artifact_registry

Name: artifact_registry
ID: <redacted>
Type: container_registry
Flavor: gcp
Configuration: {'uri': '<artifact-registry-uri>'}
User: <redacted>

ARTIFACT_STORE: gcs_store

Name: gcs_store
ID: <redacted>
Type: artifact_store
Flavor: gcp
Configuration: {'path': 'gs://<bucket>/zenml'}
User: <redacted>
```

### What happened?

> *I'll be honest, this is something Claude debugged for me, so I'm not entirely sure it isn't a misconfiguration on my end. That said, the root cause traces to a specific line in the integration, so I'm filing it in case it helps others.*

Filling in the checkboxes I missed when creating this issue via CLI.

**ZenML version:** `0.94.1`
**Stack:** `vm_gcp` orchestrator + GCP Artifact Registry (containerized steps)

---

## Describe the bug

In `zenml/integrations/skypilot/utils.py`, the function `create_docker_run_command()` generates environment flags in the form `-e KEY` (key only, no value):

```python
docker_environment_str = " ".join(
    f"-e {shlex.quote(k)}" for k in environment
)
```

Docker's `-e KEY` syntax means *"inherit this variable from the calling shell environment."* This works fine **without `sudo`**.

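However, `skypilot_base_vm_orchestrator.py` calls this function with `use_sudo=True` (hardcoded), and `sudo` resets the environment by default (the `env_reset` sudoers option on most distributions). The variables are therefore no longer present when `docker run` executes, so every `-e KEY` flag leaves its variable unset in the container.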
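The effect can be approximated without Docker or `sudo`. This is only a rough sketch: it assumes that `sudo`'s environment reset behaves like launching a child process from which the variable has been stripped. The helper `child_sees` and the example URL are hypothetical and not part of ZenML.

```python
import os
import subprocess
import sys


def child_sees(var_name: str, env: dict[str, str]) -> str:
    """Run a child Python process with the given environment and report
    the value it sees for var_name ("<unset>" if the variable is missing)."""
    code = f"import os; print(os.environ.get({var_name!r}, '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical value standing in for the real server URL.
parent_env = dict(os.environ, ZENML_STORE_URL="https://zenml.example.com")

# Inherited environment: comparable to `docker run -e KEY` without sudo.
inherited = child_sees("ZENML_STORE_URL", parent_env)

# Variable stripped: an approximation of what remains after sudo's reset.
reset_env = {k: v for k, v in parent_env.items() if k != "ZENML_STORE_URL"}
stripped = child_sees("ZENML_STORE_URL", reset_env)

print(inherited)
print(stripped)
```

In the second case the child has nothing to inherit, which matches what the container sees when the flag carries no value.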

**Effect:** the container receives none of the ZenML configuration variables (`ZENML_STORE_URL`, `ZENML_STORE_TYPE`, auth tokens, etc.), falls back to a local SQLite store, and immediately crashes:

```text
ModuleNotFoundError: No module named 'sqlalchemy_utils'
```

This only affects the `vm_gcp` orchestrator path. The `vm_kubernetes` path takes a different branch (runs Python directly in a virtualenv and never calls `create_docker_run_command()`), which is likely why this went unnoticed.

---

## Expected behavior

The container should receive all environment variables and connect to the remote ZenML server as configured.

---

## Proposed fix

Use `-e KEY=VALUE` to pass values explicitly, bypassing `sudo`'s environment reset:

```python
docker_environment_str = " ".join(
    f"-e {shlex.quote(k)}={shlex.quote(str(v))}"
    for k, v in environment.items()
)
```
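To make the difference concrete, here is a self-contained sketch comparing the two flag styles. `key_only_flags` and `key_value_flags` are hypothetical helpers written for this issue, not functions from the integration, and the environment values are made up.

```python
import shlex


def key_only_flags(environment: dict[str, str]) -> str:
    """Current behavior: emit `-e KEY`, relying on the caller's environment."""
    return " ".join(f"-e {shlex.quote(k)}" for k in environment)


def key_value_flags(environment: dict[str, str]) -> str:
    """Proposed behavior: emit `-e KEY=VALUE` with both parts shell-quoted."""
    return " ".join(
        f"-e {shlex.quote(k)}={shlex.quote(str(v))}"
        for k, v in environment.items()
    )


# Made-up values; the last one checks that spaces survive shell parsing.
env = {
    "ZENML_STORE_URL": "https://zenml.example.com",
    "ZENML_STORE_TYPE": "rest",
    "EXAMPLE_WITH_SPACE": "hello world",
}

print(key_only_flags(env))
print(key_value_flags(env))
```

One possible trade-off: with `-e KEY=VALUE` the values become part of the command string, so secrets may appear in logs or process listings. If that is a concern, `docker run --env-file` or `sudo --preserve-env` might be worth considering instead.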


### Reproduction steps

1. Set up a ZenML stack with a `vm_gcp` SkyPilot orchestrator and a container registry
2. Run any pipeline that uses a Docker image (i.e. the stack has a container registry configured)
3. The orchestrator submits `sudo docker run -e ZENML_STORE_URL ...` on the GCP VM
4. Observe that the container starts with no ZenML configuration

### Relevant log output

```shell
ModuleNotFoundError: No module named 'sqlalchemy_utils'

# And a few lines above it in the traceback:

KeyError: 'ZENML_STORE_URL'
```

### Code of Conduct

- [x] I agree to follow this project's Code of Conduct
