Fail early on Windows for vLLM-based cloud-edge LLM example #316
base: main
Conversation
Welcome @iashutoshyadav! It looks like this is your first PR to kubeedge/ianvs 🎉

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: iashutoshyadav. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment.
Summary of Changes: Hello @iashutoshyadav, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the robustness and user experience of the cloud-edge collaborative LLM example by implementing proactive platform and dependency validation. It ensures that the example gracefully exits with informative error messages when executed in unsupported environments, such as Windows, or when essential dependencies like vLLM are missing.
Code Review
This pull request introduces an important improvement by adding platform and dependency checks to the vLLM-based example, allowing it to fail early on unsupported environments like Windows. This is a great step towards a better user experience.
My review includes a few suggestions:
- To fully address the environment requirements mentioned in the PR description (CPU-only is not supported), I've recommended adding a check for CUDA availability. This will ensure a GPU is present, providing an even clearer error message.
- I've also pointed out that several docstrings were simplified to the point of losing valuable information about parameters. I've suggested restoring some of these details to maintain code clarity and ease of use for future developers.
The other changes, such as using `super()` and cleaning up the `_infer` method, are positive and improve the code. Overall, this is a solid contribution.
```diff
+import platform
+import importlib.util
+
+if platform.system() == "Windows":
+    raise RuntimeError(
+        "Cloud-edge LLM example requires Linux + GPU. "
+        "The vLLM backend is not supported on Windows."
+    )
+
+if importlib.util.find_spec("vllm") is None:
+    raise RuntimeError(
+        "vLLM is required for the cloud-edge LLM example but is not installed. "
+        "Please run this example on Linux with GPU support and install vllm."
+    )
```
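A quick way to exercise the new guard without a Windows machine is to monkeypatch `platform.system` in a test and re-import the module. The sketch below is illustrative only: the module path `models.vllm_llm` and the use of pytest are assumptions, not part of this PR.

```python
# Illustrative test sketch -- the module path "models.vllm_llm" is an
# assumption; adjust it to wherever the guarded example module lives.
import importlib
import platform
import sys

import pytest


def test_fails_early_on_windows(monkeypatch):
    # Pretend the current platform is Windows so the guard should trip.
    monkeypatch.setattr(platform, "system", lambda: "Windows")
    # Drop any cached copy so the module-level guard runs again on import.
    sys.modules.pop("models.vllm_llm", None)
    with pytest.raises(RuntimeError, match="not supported on Windows"):
        importlib.import_module("models.vllm_llm")
```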
The PR description mentions that vLLM doesn't support CPU-only environments. To fully address this and fail early, it would be beneficial to also check for CUDA availability. This ensures the environment has a GPU, as required.
This suggestion moves the `torch` import into the guard block to perform this check. The now-redundant `import torch` on line 37 should be removed.
```python
import platform
import importlib.util
import torch

if platform.system() == "Windows":
    raise RuntimeError(
        "Cloud-edge LLM example requires Linux + GPU. "
        "The vLLM backend is not supported on Windows."
    )

if importlib.util.find_spec("vllm") is None:
    raise RuntimeError(
        "vLLM is required for the cloud-edge LLM example but is not installed. "
        "Please run this example on Linux with GPU support and install vllm."
    )

if not torch.cuda.is_available():
    raise RuntimeError(
        "Cloud-edge LLM example requires a GPU, but CUDA is not available. "
        "Please run this example on a machine with a GPU and CUDA installed."
    )
```
````diff
-        """Initialize the VllmLLM class
-
-        See details about special parameters in [vLLM's Named Arguments](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
-        """
-        BaseLLM.__init__(self, **kwargs)
+        """Initialize the VllmLLM class"""
+        super().__init__(**kwargs)

         self.tensor_parallel_size = kwargs.get("tensor_parallel_size", 1)
         self.gpu_memory_utilization = kwargs.get("gpu_memory_utilization", 0.8)

     def _load(self, model):
-        """Load the model via vLLM API
-
-        Parameters
-        ----------
-        model : str
-            Hugging Face style model name. Example: `Qwen/Qwen2.5-0.5B-Instruct`
-        """
+        """Load the model via vLLM API"""
         self.model = LLM(
             model=model,
             trust_remote_code=True,
             dtype="float16",
             tensor_parallel_size=self.tensor_parallel_size,
             gpu_memory_utilization=self.gpu_memory_utilization,
-            max_model_len = 8192
-            #quantization=self.quantization # TODO need to align with vllm API
+            max_model_len=8192,
+            # quantization=self.quantization # TODO align with vLLM API
         )

         self.sampling_params = SamplingParams(
             temperature=self.temperature,
             top_p=self.top_p,
             repetition_penalty=self.repetition_penalty,
-            max_tokens=self.max_tokens
+            max_tokens=self.max_tokens,
         )

-        # Warmup to make metrics more accurate
+        # Warmup for accurate metrics
         self.warmup()

     def warmup(self):
-        """Warm up the Model for more accurate performance metrics
-        """
+        """Warm up the model"""
         try:
             self.model.chat(
                 [{"role": "user", "content": "Hello"}],
                 self.sampling_params,
-                use_tqdm=False
+                use_tqdm=False,
             )
         except Exception as e:
             raise RuntimeError(f"Warmup failed: {e}")

     def _infer(self, messages):
-        """Call the vLLM Offline Inference API to get the response
-
-        Parameters
-        ----------
-        messages : list
-            OpenAI style message chain. Example:
-            ```
-            [{"role": "user", "content": "Hello, how are you?"}]
-            ```
-
-        Returns
-        -------
-        dict
-            Formatted Response. See `_format_response()` for more details.
-        """
+        """Run inference using vLLM"""
````
This pull request simplifies many of the docstrings in this class (`__init__`, `_load`, `_infer`). While conciseness is good, the previous docstrings contained valuable information about parameters (e.g., `model` in `_load`, special kwargs in `__init__`) and return values that are important for maintainability and understanding the code's contract. Please consider restoring these details to the docstrings to help future developers.
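As one example, the `_load` docstring could keep the new one-line summary style while re-adding the parameter description that the diff removes. A sketch based on the removed lines (the method body is elided):

```python
# Sketch of a restored docstring for _load, reusing the removed parameter notes.
def _load(self, model):
    """Load the model via vLLM API.

    Parameters
    ----------
    model : str
        Hugging Face style model name. Example: `Qwen/Qwen2.5-0.5B-Instruct`
    """
    ...
```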
Signed-off-by: Ashutosh Yadav <ashutosh2213072@akgec.ac.in>
e949f1f to e118c33
What this PR does
Adds an early platform check for the cloud-edge collaborative LLM example.
Why this is needed
The example depends on vLLM, which does not support Windows or CPU-only environments. Previously, running the example on Windows resulted in a late `ModuleNotFoundError`. This change fails early with a clear, actionable error message.
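For illustration only (expected behavior, not captured output): on Windows the example should now stop immediately with the message introduced in this change, instead of failing later with a `ModuleNotFoundError` for `vllm`.

```
RuntimeError: Cloud-edge LLM example requires Linux + GPU. The vLLM backend is not supported on Windows.
```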
Tested
Fixes #310