feat: Add tool to deploy LLM models to Cloud Run #124
Conversation
steren left a comment
Thanks for the PR.
While I like the idea, I wonder if we should add it. What are the use cases for an Agent deploying a model?
Note that later, we will have presets that will cover this use case.
remove file
remove file
);

server.registerTool(
  'cloud_run_deploy_model',
Remove the cloud_run_ prefix.
Suggestion: deploy_ai_model
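Applied to the registration call from the diff above, the suggestion would look roughly like this (a sketch only; everything except the name stays as in the PR):

```ts
// Sketch: the registration call with the suggested tool name applied.
// The schema and handler are unchanged from the PR's diff above.
server.registerTool(
  'deploy_ai_model',
  /* ...schema and handler as in the PR... */
);
```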
framework:
  z.enum(['ollama', 'vllm']).describe('The framework to use for serving the model.'),
model:
  z.string().describe('The model to deploy from Ollama library or Hugging Face Hub.'),
Can you add more details? Give examples of accepted formats.
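One way to address this (a sketch only; the model names below are illustrative examples, not taken from the PR):

```ts
import { z } from 'zod';

// Sketch of a more detailed parameter description, as the review asks.
// The example model names are illustrative, not from the PR.
const modelSchema = z
  .string()
  .describe(
    'The model to deploy. For Ollama, use a library tag such as "gemma3:4b"; ' +
      'for vLLM, use a Hugging Face Hub repo ID such as "Qwen/Qwen2.5-1.5B-Instruct".'
  );
```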
Refactors the vLLM deployment strategy to use a dedicated Cloud Function for streaming models from Hugging Face to GCS. This avoids slow local downloads and network bottlenecks. Also includes:
- Hardening the vLLM container with --max-model-len and HF_HUB_OFFLINE.
- Correcting the container port to 8000.
- Cleaning up unused code in the deployment scripts.
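For context, a rough sketch of how the hardening described above could surface in the container configuration the tool generates (field names and concrete values are assumptions for illustration, not the PR's actual code):

```ts
// Illustrative sketch of the hardened vLLM container settings described above.
// Field names and the concrete values are assumptions, not the PR's code.
const vllmContainer = {
  // vLLM serves on port 8000 by default; the PR corrects the container port to match.
  port: 8000,
  args: [
    // Cap the context window so KV-cache memory stays within the GPU budget.
    '--max-model-len', '4096',
  ],
  env: {
    // Model weights are pre-staged in GCS by the Cloud Function, so the
    // container should never reach out to the Hugging Face Hub at runtime.
    HF_HUB_OFFLINE: '1',
  },
};
```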
This PR introduces a new tool, cloud-run-deploy-model, which simplifies the deployment of Large Language Models (LLMs) to Google Cloud Run.

Key Features:
- A cloud-run-deploy-model tool that supports deploying models using the Ollama and vLLM frameworks.

Example
Prompt:
Output: