
Conversation

@iashutoshyadav

What this PR does

Adds an early platform check for the cloud-edge collaborative LLM example.

Why this is needed

The example depends on vLLM, which does not support Windows or CPU-only
environments. Previously, running the example on Windows failed late with a
ModuleNotFoundError. With this change, the example fails early with a clear,
actionable error message.
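
For reference, a minimal sketch of the kind of guard this PR adds near the top of vllm_llm.py; the exact messages used in the PR appear in the review diff further down, so treat this as an illustration rather than the final code:

import platform
import importlib.util

# Fail fast on platforms vLLM cannot support, instead of hitting a late
# ModuleNotFoundError deep inside the example.
if platform.system() == "Windows":
    raise RuntimeError(
        "The vLLM backend is not supported on Windows; "
        "run this example on Linux with a GPU."
    )

# Fail fast if the vllm package itself is missing.
if importlib.util.find_spec("vllm") is None:
    raise RuntimeError(
        "vLLM is required for this example but is not installed."
    )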

Tested

  • Windows 11 (CPU-only)
  • Verified that the example fails early with a clear RuntimeError

Fixes #310

@kubeedge-bot
Collaborator

Welcome @iashutoshyadav! It looks like this is your first PR to kubeedge/ianvs 🎉

@kubeedge-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iashutoshyadav
To complete the pull request process, please assign jaypume after the PR has been reviewed.
You can assign the PR to them by writing /assign @jaypume in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubeedge-bot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) on Feb 3, 2026
@gemini-code-assist

Summary of Changes

Hello @iashutoshyadav, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the robustness and user experience of the cloud-edge collaborative LLM example by implementing proactive platform and dependency validation. It ensures that the example gracefully exits with informative error messages when executed in unsupported environments, such as Windows, or when essential dependencies like vLLM are not installed, thereby preventing cryptic failures later in the execution flow.

Highlights

  • Early Platform Checks: Introduced checks at the start of vllm_llm.py to detect unsupported environments before execution proceeds.
  • Windows OS Detection: The example now explicitly raises a RuntimeError if run on Windows, as vLLM does not support this operating system.
  • vLLM Dependency Verification: A check for the vllm package installation ensures it is present, failing early with a RuntimeError if missing.
  • Improved Error Messaging: Users now receive clear, actionable error messages, preventing cryptic ModuleNotFoundError exceptions later in the process.
  • Code Structure and Readability: The vllm_llm.py file has been refactored with clear sections for platform guards, safe imports, and environment variables, along with more concise docstrings.
  • Git Ignore Updates: Added entries to .gitignore for ianvs-env/ (virtual environment) and test_vllm_import.py (local verification script).


Changelog
  • .gitignore
    • Added ianvs-env/ to the ignore list for virtual environments.
    • Added test_vllm_import.py to the ignore list for local verification scripts.
  • examples/cloud-edge-collaborative-inference-for-llm/testalgorithms/query-routing/models/vllm_llm.py
    • Implemented an early check using platform.system() to raise a RuntimeError if the script is run on Windows.
    • Added an early check using importlib.util.find_spec("vllm") to raise a RuntimeError if the vllm package is not installed.
    • Grouped initial imports and environment variable settings into distinct, commented sections for clarity.
    • Updated method docstrings (__init__, _load, warmup, _infer, cleanup) to be more concise and descriptive.
    • Replaced BaseLLM.__init__(self, **kwargs) with super().__init__(**kwargs) for better inheritance practice.
    • Added a trailing comma to max_model_len=8192 and max_tokens=self.max_tokens for consistent formatting.
    • Removed redundant comments like "Completion Text" and "Prompt Token Count" within the _infer method.
    • Ensured the cleanup method's exception handling is robust (see the sketch after this changelog).
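
For context only, here is a minimal, hypothetical sketch of what defensive cleanup can look like for a vLLM-backed wrapper. The actual cleanup body in vllm_llm.py is not shown in this page's diff, so the class and attribute names below are assumptions, not the PR's code:

import gc

import torch


class VllmLLMSketch:
    # Hypothetical stand-in for the real wrapper class in vllm_llm.py.
    def cleanup(self):
        """Release the engine and cached GPU memory, tolerating partial setup."""
        try:
            del self.model
        except AttributeError:
            pass  # model was never loaded, nothing to release
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
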
Activity
  • The pull request was created by iashutoshyadav.
  • The author provided a clear description outlining the purpose ("Adds an early platform check"), the rationale ("vLLM... does not support Windows or CPU-only environments... resulted in a late ModuleNotFoundError"), and testing details ("Windows 11 (CPU-only)", "Verified that the example fails early").
  • The PR fixes issue #310 ("Cloud-edge LLM example fails on Windows due to undocumented vllm dependency").
  • No further review comments or activities have been recorded.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces an important improvement by adding platform and dependency checks to the vLLM-based example, allowing it to fail early on unsupported environments like Windows. This is a great step towards a better user experience.

My review includes a few suggestions:

  • To fully address the environment requirements mentioned in the PR description (CPU-only is not supported), I've recommended adding a check for CUDA availability. This will ensure a GPU is present, providing an even clearer error message.
  • I've also pointed out that several docstrings were simplified to the point of losing valuable information about parameters. I've suggested restoring some of these details to maintain code clarity and ease of use for future developers.

The other changes, such as using super() and cleaning up the _infer method, are positive and improve the code. Overall, this is a solid contribution.

Comment on lines 18 to 32
import platform
import importlib.util

if platform.system() == "Windows":
    raise RuntimeError(
        "Cloud-edge LLM example requires Linux + GPU. "
        "The vLLM backend is not supported on Windows."
    )

if importlib.util.find_spec("vllm") is None:
    raise RuntimeError(
        "vLLM is required for the cloud-edge LLM example but is not installed. "
        "Please run this example on Linux with GPU support and install vllm."
    )


Severity: high

The PR description mentions that vLLM doesn't support CPU-only environments. To fully address this and fail early, it would be beneficial to also check for CUDA availability. This ensures the environment has a GPU, as required.

This suggestion moves the torch import into the guard block to perform this check. The now-redundant import torch on line 37 should be removed.

import platform
import importlib.util
import torch

if platform.system() == "Windows":
    raise RuntimeError(
        "Cloud-edge LLM example requires Linux + GPU. "
        "The vLLM backend is not supported on Windows."
    )

if importlib.util.find_spec("vllm") is None:
    raise RuntimeError(
        "vLLM is required for the cloud-edge LLM example but is not installed. "
        "Please run this example on Linux with GPU support and install vllm."
    )

if not torch.cuda.is_available():
    raise RuntimeError(
        "Cloud-edge LLM example requires a GPU, but CUDA is not available. "
        "Please run this example on a machine with a GPU and CUDA installed."
    )

Comment on lines 57 to 100
"""Initialize the VllmLLM class"""

See details about special parameters in [vLLM's Named Arguments](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
"""

BaseLLM.__init__(self, **kwargs)
super().__init__(**kwargs)

self.tensor_parallel_size = kwargs.get("tensor_parallel_size", 1)
self.gpu_memory_utilization = kwargs.get("gpu_memory_utilization", 0.8)

def _load(self, model):
"""Load the model via vLLM API
"""Load the model via vLLM API"""

Parameters
----------
model : str
Hugging Face style model name. Example: `Qwen/Qwen2.5-0.5B-Instruct`
"""
self.model = LLM(
model=model,
trust_remote_code=True,
dtype="float16",
tensor_parallel_size=self.tensor_parallel_size,
gpu_memory_utilization=self.gpu_memory_utilization,
max_model_len = 8192
#quantization=self.quantization # TODO need to align with vllm API
max_model_len=8192,
# quantization=self.quantization # TODO align with vLLM API
)

self.sampling_params = SamplingParams(
temperature=self.temperature,
top_p=self.top_p,
repetition_penalty=self.repetition_penalty,
max_tokens=self.max_tokens
max_tokens=self.max_tokens,
)

# Warmup to make metrics more accurate
# Warmup for accurate metrics
self.warmup()

def warmup(self):
"""Warm up the Model for more accurate performance metrics
"""
"""Warm up the model"""

try:
self.model.chat(
[{"role": "user", "content": "Hello"}],
self.sampling_params,
use_tqdm=False
use_tqdm=False,
)
except Exception as e:
raise RuntimeError(f"Warmup failed: {e}")

def _infer(self, messages):
"""Call the vLLM Offline Inference API to get the response

Parameters
----------
messages : list
OpenAI style message chain. Example:
```
[{"role": "user", "content": "Hello, how are you?"}]
```

Returns
-------
dict
Formatted Response. See `_format_response()` for more details.
"""
"""Run inference using vLLM"""


Severity: medium

This pull request simplifies many of the docstrings in this class (__init__, _load, _infer). While conciseness is good, the previous docstrings contained valuable information about parameters (e.g., model in _load, special kwargs in __init__) and return values that are important for maintainability and understanding the code's contract. Please consider restoring these details to the docstrings to help future developers.
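
To make the suggestion concrete, here is a hedged sketch of how the _load docstring could stay concise while restoring the parameter details removed in this PR; the wording is adapted from the previous docstring shown in the diff above:

    def _load(self, model):
        """Load the model via the vLLM offline inference API.

        Parameters
        ----------
        model : str
            Hugging Face style model name, e.g. `Qwen/Qwen2.5-0.5B-Instruct`.
        """
        ...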

Signed-off-by: Ashutosh Yadav <ashutosh2213072@akgec.ac.in>
@iashutoshyadav force-pushed the fix/cloud-edge-llm-windows-guard branch from e949f1f to e118c33 on February 3, 2026 at 13:44
