MTA-5378 - LLM Configurations for Developer Lightspeed #177
Conversation
Signed-off-by: Prabha Kylasamiyer Sundara Rajan <[email protected]>
----

See link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/applying-autoscaling#additional-resources-2[Machine autoscaler resource definition] for descriptions of the CR parameters.
+
Please note that I need to make formatting changes in proc_configuring-machine-auto-scaling.adoc
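For reference while reworking that file, a minimal MachineAutoscaler CR of the kind the linked resource definition describes could look like the sketch below; the machine set name, namespace, and replica bounds are illustrative placeholders, not values from this PR.

```yaml
# Hedged sketch of a MachineAutoscaler CR -- the machine set name and the
# replica bounds are placeholders, not tested values from this PR.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-gpu-us-east-1a        # conventionally matches the machine set name
  namespace: openshift-machine-api
spec:
  minReplicas: 1                      # lower bound for the scaled machine set
  maxReplicas: 4                      # upper bound for the scaled machine set
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-gpu-us-east-1a       # name of an existing GPU machine set
```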
See link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/applying-autoscaling#additional-resources-2[Cluster autoscaler resource definition] for descriptions of the CR parameters.
+
. Enter the following command to deploy the cluster autoscaler CR.
+
Please note that I need to make formatting changes in proc_configuring-node-auto-scaling.adoc
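Similarly, for the "deploy the cluster autoscaler CR" step above, a minimal ClusterAutoscaler CR could look like the sketch below; the GPU resource type and limits are illustrative assumptions rather than values taken from this PR.

```yaml
# Hedged sketch of a ClusterAutoscaler CR -- the GPU type and limits are
# placeholders chosen for illustration only.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default                # the cluster autoscaler CR must be named "default"
spec:
  resourceLimits:
    gpus:
      - type: nvidia.com/gpu   # GPU resource type exposed by the GPU operator
        min: 0
        max: 4
  scaleDown:
    enabled: true              # allow the autoscaler to remove idle nodes
```

It would typically be applied with something like `oc create -f cluster-autoscaler.yaml` (filename assumed).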
As a member of the hybrid cloud infrastructure team, your initial set of tasks to deploy a large language model (LLM) through model-as-a-service involves creating {ocp-short} clusters with primary and secondary nodes and configuring an identity provider with role-based access control for users to log in to the clusters.

Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
Suggested change:
-Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
+Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short}. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
* Azure OpenAI
* Google Gemini
* Amazon Bedrock
* Deepseek
"Deepseek" should be removed, we do not explicitly test this for downstream.
@Pkylas007 I think this PR should be made simpler; we are covering extra permutations that we don't need for this release.
I think we should target covering:
- How to specify LLMs in the IDE and Solution Server (unsure if both are covered here, or if just the IDE is and the Solution Server is covered in another section). Assume this is walking through how to access provider-settings.yaml from the IDE.
- Introduce the "&active" YAML anchor tag and how it is used to specify which stanza in provider-settings.yaml is active/used (see the illustrative sketch after this comment).
- We should show examples for configuring the following model providers:
  - OpenAI
  - OpenAI Compatible (covering OpenShift AI, Podman AI, etc.)
  - Potential (we may document a few more examples of model providers):
    - Azure OpenAI
    - AWS Bedrock
    - Google Gemini
- Walk through how to debug a bad credential issue with an LLM.
We also need to introduce the "GenAI Enabled/Disabled" configuration option and be sure users know it exists.
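To make the "&active" anchor point concrete, here is a hedged sketch of what a provider-settings.yaml could look like. The stanza names, `provider` values, environment keys, model IDs, and the top-level `active: *active` alias are assumptions based on the upstream Kai examples and should be verified against the sample file shipped with the IDE plugin.

```yaml
# Illustrative provider-settings.yaml -- field names, provider values, and
# model IDs are assumptions, not tested values from this PR.
models:
  # The &active anchor marks the stanza that the IDE plugin should use.
  OpenAI: &active
    environment:
      OPENAI_API_KEY: "<your-api-key>"
    provider: "ChatOpenAI"
    args:
      model: "gpt-4o"

  # OpenAI-compatible endpoint (for example, a model served on OpenShift AI):
  # reuse the OpenAI provider and point base_url at the served model.
  OpenShiftAI:
    environment:
      OPENAI_API_KEY: "<token-or-placeholder>"
    provider: "ChatOpenAI"
    args:
      model: "<served-model-name>"
      base_url: "https://<inference-endpoint>/v1"

  AmazonBedrock:
    environment:
      AWS_ACCESS_KEY_ID: "<access-key>"
      AWS_SECRET_ACCESS_KEY: "<secret-key>"
      AWS_DEFAULT_REGION: "us-east-1"
    provider: "ChatBedrock"
    args:
      model_id: "<bedrock-model-id>"

# Moving the &active anchor to a different stanza switches which model is used.
active: *active
```

A bad-credential walkthrough could reuse the same file: an intentionally wrong OPENAI_API_KEY should surface an authentication error from the provider, which is where the debugging section could point users first.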
= Installing and configuring {ocp-short} cluster
:context: maas-oc-install-config

As a member of the hybrid cloud infrastructure team, your initial set of tasks to deploy a large language model (LLM) through model-as-a-service involves creating {ocp-short} clusters with primary and secondary nodes and configuring an identity provider with role-based access control for users to log in to the clusters.
As a member of the hybrid cloud infrastructure team
This seems strange; I assume the "maas" section was included in the downstream docs by mistake.
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-preparing-llm-analysis: {context}]
I see the file name includes "llm-analysis"; just want to be clear that our analysis in MTA is not driven by an LLM today.
Our analysis is rules-driven static code analysis.
Our code suggestions/problem fixes are driven by LLMs.
<productnumber>{DocInfoProductNumber}</productnumber>
<subtitle>Using the {ProductName} command-line interface to migrate your applications</subtitle>
<abstract>
<para>Use {ProductFullName} Developer Lightspeed for application modernization in your organization by running Artificial Intelligence-driven static code analysis for Java applications.</para>
running Artificial Intelligence-driven static code analysis for Java applications.
Note our analysis is not AI-driven; our code suggestions/fixes are AI-driven. Our analysis is rules-based static code analysis.
The code suggestions differ based on various parameters about the large language model (LLM) used for an analysis. Therefore, model-as-a-service enables you more control over using {mta-dl-plugin} with an LLM that is trained for your specific requirements than general purpose models from the public AI providers.

{mta-dl-plugin} is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common strategy in an enterprise to manage the underlying resources that power the models that must be exposed to multiple members in different teams.
is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams
This isn't quite accurate; our analysis does not improve. The analysis is the same rules-driven analysis.
Code suggestions may potentially improve through better/improved "context" from the Solution Server.
So it's the fixing of a problem that may get better the more an organization uses MTA (if they have the Solution Server running).
[role="_abstract"]

The code suggestions differ based on various parameters about the large language model (LLM) used for an analysis. Therefore, model-as-a-service enables you more control over using {mta-dl-plugin} with an LLM that is trained for your specific requirements than general purpose models from the public AI providers.
I don't think we want to be documenting model-as-a-service, IF this is implying the internal instance of https://github.com/rh-aiservices-bu/models-aas
{mta-dl-plugin} is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common strategy in an enterprise to manage the underlying resources that power the models that must be exposed to multiple members in different teams.

To cater to an enterprise-wide LLM deployment, {mta-dl-plugin} integrates with LLMs that are deployed as a scalable service on {ocp-full} clusters. These deployments, called model-as-a-service (MaaS), provide you with a granular control over resources such as compute, cluster nodes, and auto-scaling Graphical Processing Units (GPUs) while enabling you to leverage LLMs to perform analysis at a large scale.
I can see a potential desire to call out the ability to run LLMs on OpenShift AI.
As I'm reading over this PR I'm not quite clear on what we are doing with model-as-a-service; in the past I have heard model-as-a-service referred to as the maas endpoint at https://maas.apps.prod.rhoai.rh-aiservices-bu.com/ from the code at https://github.com/rh-aiservices-bu/models-aas.
I see the term is generic and can refer to other things; I'm just a bit concerned that some testing information may have bled over into product docs where it wouldn't be appropriate.
[role="_abstract"]
In {ocp-short}, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers in your organization.

In your data science project, you must create a data connection to your existing S3-compatible storage bucket to which you uploaded a large language model.
This seems strange for us to document; it feels out of scope for MTA.
[role="_abstract"]

The Podman AI lab extension enables you to use an open-source model from a curated list of models and use it locally in your system.
Covering instructions for Podman AI sounds useful.
There is a concern that for practical purposes it is HIGHLY likely that users will be disappointed/frustrated running local models. This is because the majority of models run locally are smaller and less powerful, which directly limits the potential help they can provide to MTA.
For most purposes a local model will yield poor results.
We likely want to clearly instruct the user that MTA's behavior is related to the capability of the model; local models are often smaller and less powerful, and therefore often lack the ability to significantly help with MTA use cases. (An illustrative local-endpoint stanza is sketched after this comment.)
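If Podman AI Lab instructions do go in, a local-endpoint stanza might look like the hedged sketch below; Podman AI Lab model services expose an OpenAI-compatible endpoint, and the port, model name, and dummy API key here are illustrative assumptions.

```yaml
# Hypothetical provider-settings.yaml stanza for a model served locally by
# Podman AI Lab; the port and model name are placeholders.
models:
  PodmanAILab: &active
    environment:
      OPENAI_API_KEY: "unused-but-required"   # local endpoints typically ignore the key
    provider: "ChatOpenAI"
    args:
      model: "<local-model-name>"
      base_url: "http://localhost:<service-port>/v1"

active: *active
```

Pairing an example like this with the capability caveat above would help set expectations for local models.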
= Configuring the LLM serving runtime

[role="_abstract"]
It takes several minutes to scale nodes and pull the image to serve the virtual large language model (vLLM). However, the default time for deploying a vLLM is 10 minutes. A vLLM deployment that takes longer fails on the {ocp-short} AI cluster.
I would remove this section.
Not sure we want to get into documenting how to run models via vLLM for our MTA product docs.
Interesting to consider, but for this release both Engineering and QE have NOT tested MTA against a model deployed by vLLM directly.