Fail early on Windows for vLLM-based cloud-edge LLM example #316
base: main
Conversation
Welcome @iashutoshyadav! It looks like this is your first PR to kubeedge/ianvs 🎉

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: iashutoshyadav. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing `/approve` in a comment.
Summary of Changes: Hello @iashutoshyadav, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the robustness and user experience of the cloud-edge collaborative LLM example by implementing proactive platform and dependency validation. It ensures that the example gracefully exits with informative error messages when executed in unsupported environments, such as Windows, or when essential dependencies like vLLM are missing.
Code Review
This pull request introduces an important improvement by adding platform and dependency checks to the vLLM-based example, allowing it to fail early on unsupported environments like Windows. This is a great step towards a better user experience.
My review includes a few suggestions:
- To fully address the environment requirements mentioned in the PR description (CPU-only is not supported), I've recommended adding a check for CUDA availability. This will ensure a GPU is present, providing an even clearer error message.
- I've also pointed out that several docstrings were simplified to the point of losing valuable information about parameters. I've suggested restoring some of these details to maintain code clarity and ease of use for future developers.
The other changes, such as using `super()` and cleaning up the `_infer` method, are positive and improve the code. Overall, this is a solid contribution.
```diff
+import platform
+import importlib.util
+
+if platform.system() == "Windows":
+    raise RuntimeError(
+        "Cloud-edge LLM example requires Linux + GPU. "
+        "The vLLM backend is not supported on Windows."
+    )
+
+if importlib.util.find_spec("vllm") is None:
+    raise RuntimeError(
+        "vLLM is required for the cloud-edge LLM example but is not installed. "
+        "Please run this example on Linux with GPU support and install vllm."
+    )
```
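A quick way to exercise the new guard without a Windows machine is to monkeypatch `platform.system` in a test and re-import the module. The sketch below is illustrative only: the module path `models.vllm_llm` and the use of pytest are assumptions, not part of this PR.

```python
# Illustrative test sketch -- the module path "models.vllm_llm" is an
# assumption; adjust it to wherever the guarded example module lives.
import importlib
import platform
import sys

import pytest


def test_fails_early_on_windows(monkeypatch):
    # Pretend the current platform is Windows so the guard should trip.
    monkeypatch.setattr(platform, "system", lambda: "Windows")
    # Drop any cached copy so the module-level guard runs again on import.
    sys.modules.pop("models.vllm_llm", None)
    with pytest.raises(RuntimeError, match="not supported on Windows"):
        importlib.import_module("models.vllm_llm")
```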
The PR description mentions that vLLM doesn't support CPU-only environments. To fully address this and fail early, it would be beneficial to also check for CUDA availability. This ensures the environment has a GPU, as required.
This suggestion moves the `torch` import into the guard block to perform this check. The now-redundant `import torch` on line 37 should be removed.
```python
import platform
import importlib.util
import torch

if platform.system() == "Windows":
    raise RuntimeError(
        "Cloud-edge LLM example requires Linux + GPU. "
        "The vLLM backend is not supported on Windows."
    )

if importlib.util.find_spec("vllm") is None:
    raise RuntimeError(
        "vLLM is required for the cloud-edge LLM example but is not installed. "
        "Please run this example on Linux with GPU support and install vllm."
    )

if not torch.cuda.is_available():
    raise RuntimeError(
        "Cloud-edge LLM example requires a GPU, but CUDA is not available. "
        "Please run this example on a machine with a GPU and CUDA installed."
    )
```
````diff
-        """Initialize the VllmLLM class
-
-        See details about special parameters in [vLLM's Named Arguments](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
-        """
-        BaseLLM.__init__(self, **kwargs)
+        """Initialize the VllmLLM class"""
+        super().__init__(**kwargs)

         self.tensor_parallel_size = kwargs.get("tensor_parallel_size", 1)
         self.gpu_memory_utilization = kwargs.get("gpu_memory_utilization", 0.8)

     def _load(self, model):
-        """Load the model via vLLM API
-
-        Parameters
-        ----------
-        model : str
-            Hugging Face style model name. Example: `Qwen/Qwen2.5-0.5B-Instruct`
-        """
+        """Load the model via vLLM API"""
         self.model = LLM(
             model=model,
             trust_remote_code=True,
             dtype="float16",
             tensor_parallel_size=self.tensor_parallel_size,
             gpu_memory_utilization=self.gpu_memory_utilization,
-            max_model_len = 8192
-            #quantization=self.quantization # TODO need to align with vllm API
+            max_model_len=8192,
+            # quantization=self.quantization # TODO align with vLLM API
         )

         self.sampling_params = SamplingParams(
             temperature=self.temperature,
             top_p=self.top_p,
             repetition_penalty=self.repetition_penalty,
-            max_tokens=self.max_tokens
+            max_tokens=self.max_tokens,
         )

-        # Warmup to make metrics more accurate
+        # Warmup for accurate metrics
         self.warmup()

     def warmup(self):
-        """Warm up the Model for more accurate performance metrics
-        """
+        """Warm up the model"""
         try:
             self.model.chat(
                 [{"role": "user", "content": "Hello"}],
                 self.sampling_params,
-                use_tqdm=False
+                use_tqdm=False,
             )
         except Exception as e:
             raise RuntimeError(f"Warmup failed: {e}")

     def _infer(self, messages):
-        """Call the vLLM Offline Inference API to get the response
-
-        Parameters
-        ----------
-        messages : list
-            OpenAI style message chain. Example:
-            ```
-            [{"role": "user", "content": "Hello, how are you?"}]
-            ```
-
-        Returns
-        -------
-        dict
-            Formatted Response. See `_format_response()` for more details.
-        """
+        """Run inference using vLLM"""
````
This pull request simplifies many of the docstrings in this class (`__init__`, `_load`, `_infer`). While conciseness is good, the previous docstrings contained valuable information about parameters (e.g., `model` in `_load`, special kwargs in `__init__`) and return values that are important for maintainability and understanding the code's contract. Please consider restoring these details to the docstrings to help future developers.
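As one example, the `_load` docstring could keep the new one-line summary style while re-adding the parameter description that the diff removes. A sketch based on the removed lines (the method body is elided):

```python
# Sketch of a restored docstring for _load, reusing the removed parameter notes.
def _load(self, model):
    """Load the model via vLLM API.

    Parameters
    ----------
    model : str
        Hugging Face style model name. Example: `Qwen/Qwen2.5-0.5B-Instruct`
    """
    ...
```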
Signed-off-by: Ashutosh Yadav <ashutosh2213072@akgec.ac.in>
e949f1f to e118c33
What this PR does
Adds an early platform check for the cloud-edge collaborative LLM example.
Why this is needed
The example depends on vLLM, which does not support Windows or CPU-only environments. Previously, running the example on Windows resulted in a late `ModuleNotFoundError`. This change fails early with a clear, actionable error message.
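For illustration only (expected behavior, not captured output): on Windows the example should now stop immediately with the message introduced in this change, instead of failing later with a `ModuleNotFoundError` for `vllm`.

```
RuntimeError: Cloud-edge LLM example requires Linux + GPU. The vLLM backend is not supported on Windows.
```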
Tested
Fixes #310