@@ -0,0 +1,27 @@
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-configuring-openshift-ai: {context}]

:_mod-docs-content-type: ASSEMBLY

ifndef::context[]
[id="configuring-openshift-ai"]
endif::[]
ifdef::context[]
[id="configuring-openshift-ai_{context}"]
endif::[]
= Configuring {ocp-short} AI
:context: configuring-openshift-ai

The configurations that you must complete for {ocp-short} AI include creating a data science cluster instance in the {ocp-short} AI Operator. Next, you configure model-specific settings in the *Red Hat {ocp-short} AI* console.
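
For reference, the following is a minimal sketch of a data science cluster resource, assuming the {ocp-short} AI Operator's `datasciencecluster.opendatahub.io/v1` API; the resource name, component list, and management states shown here are illustrative, not prescriptive.

[source, yaml]
----
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc              # illustrative name
spec:
  components:
    dashboard:
      managementState: Managed   # enables the Red Hat OpenShift AI console
    workbenches:
      managementState: Managed
    kserve:
      managementState: Managed   # required for single-model serving
----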

include::topics/developer-lightspeed/proc_creating-datascience-cluster.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_configuring-llm-serving-runtime.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_creating-accelerator-profile.adoc[leveloffset=+1]


ifdef::parent-context-of-configuring-openshift-ai[:context: {parent-context-of-configuring-openshift-ai}]
ifndef::parent-context-of-configuring-openshift-ai[:!context:]
@@ -0,0 +1,45 @@
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-configuring-llm: {context}]

:_mod-docs-content-type: ASSEMBLY

ifndef::context[]
[id="configuring-llm"]
endif::[]
ifdef::context[]
[id="configuring-llm_{context}"]
endif::[]
= Configuring large language models for analysis
:context: configuring-llm

In an analysis, {mta-dl-plugin} provides the large language model (LLM) with the contextual prompt to identify the issues in the current application and generate suggestions to resolve them.

{mta-dl-plugin} is designed to be model agnostic. It works with LLMs that run in different environments (in local containers, as local AI, or as a shared service) to support analyzing Java applications in a wide range of scenarios. You can choose an LLM from well-known providers, local models that you run from Ollama or Podman Desktop, and OpenAI API-compatible models that are configured as Model-as-a-Service deployments.

The result of an analysis performed by {mta-dl-plugin} depends on the parameters of the LLM that you choose.

You can run an LLM from the following generative AI providers:

* OpenAI
* Azure OpenAI
* Google Gemini
* Amazon Bedrock
* Deepseek
Review comment (Member): "Deepseek" should be removed, we do not explicitly test this for downstream.

* OpenShift AI
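
After you choose a provider, you reference it in the `provider-settings.yaml` file of the {mta-dl-plugin} extension. The following is a minimal, hedged sketch for an OpenAI-hosted model; the entry name, model name, and placeholder key are illustrative assumptions that mirror the Podman Desktop example later in this guide.

[source, yaml]
----
openai_gpt: # hypothetical entry name
  provider: "ChatOpenAI"
  environment:
    OPENAI_API_KEY: "<your-api-key>" # replace with the API key from your provider
  args:
    model: "gpt-4o" # any model that your provider account can access
----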

include::topics/developer-lightspeed/con_model-as-a-service.adoc[leveloffset=+1]

include::assembly_maas-oc-install-config.adoc[leveloffset=+2]

include::assembly_configuring-openshift-ai.adoc[leveloffset=+2]

include::assembly_deploying-openshift-ai-llm.adoc[leveloffset=+2]

include::assembly_preparing-llm-analysis.adoc[leveloffset=+2]

include::topics/developer-lightspeed/proc_configuring-llm-podman-desktop.adoc[leveloffset=+1]

ifdef::parent-context-of-configuring-llm[:context: {parent-context-of-configuring-llm}]
ifndef::parent-context-of-configuring-llm[:!context:]
@@ -0,0 +1,29 @@
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-deploying-openshift-ai-llm: {context}]

:_mod-docs-content-type: ASSEMBLY

ifndef::context[]
[id="deploying-openshift-ai-llm"]
endif::[]
ifdef::context[]
[id="deploying-openshift-ai-llm_{context}"]
endif::[]
= Deploying the large language model
:context: deploying-openshift-ai-llm

To connect the {ocp-short} AI platform to a large language model (LLM), first, you must upload your LLM to a data source.

{ocp-short} AI, which runs on pods in a Red Hat {ocp-short} on AWS (ROSA) cluster, can access the LLM from a data source such as an Amazon Web Services (AWS) S3 bucket. You must create an AWS S3 bucket and configure access permissions so that the pods running in the ROSA cluster can access it. See how to enable a link:https://docs.redhat.com/en/documentation/red_hat_openshift_service_on_aws/4/html/authentication_and_authorization/assuming-an-aws-iam-role-for-a-service-account#how-service-accounts-assume-aws-iam-roles-in-user-defined-projects_assuming-an-aws-iam-role-for-a-service-account[service account to assume an AWS IAM role in ROSA pods].

Next, you must configure a data connection to the bucket and deploy the LLM from the {ocp-short} AI platform.
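
Deploying the model from the {ocp-short} AI console ultimately results in a KServe `InferenceService` resource. The following is a hedged sketch only; the model format, runtime name, bucket, and path are assumptions, and the console normally generates this resource for you.

[source, yaml]
----
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-llm                                  # hypothetical name
spec:
  predictor:
    model:
      modelFormat:
        name: vLLM                              # assumed model format label
      runtime: vllm-runtime                     # the serving runtime configured later in this guide
      storageUri: s3://my-models-bucket/my-llm  # bucket and path are placeholders
----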

include::topics/developer-lightspeed/proc_adding-data-connection.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_deploying-the-model.adoc[leveloffset=+1]


ifdef::parent-context-of-deploying-openshift-ai-llm[:context: {parent-context-of-deploying-openshift-ai-llm}]
ifndef::parent-context-of-deploying-openshift-ai-llm[:!context:]
@@ -0,0 +1,32 @@
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-maas-oc-install-config: {context}]

:_mod-docs-content-type: ASSEMBLY

ifndef::context[]
[id="maas-oc-install-config"]
endif::[]
ifdef::context[]
[id="maas-oc-install-config_{context}"]
endif::[]
= Installing and configuring the {ocp-short} cluster
:context: maas-oc-install-config

As a member of the hybrid cloud infrastructure team, your initial tasks for deploying a large language model (LLM) through model-as-a-service are to create {ocp-short} clusters with primary and secondary nodes and to configure an identity provider with role-based access control so that users can log in to the clusters.
Review comment (Member), on "As a member of the hybrid cloud infrastructure team": This seems strange; assume there was a mistake with including a section on "maas" for downstream docs.


Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
Suggested change (Collaborator Author, @Pkylas007, Aug 26, 2025):
- Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
+ Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short}. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.


include::topics/developer-lightspeed/proc_install-oc-cluster.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_configuring-operators.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_creating-gpu-machine-set.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_configuring-node-auto-scaling.adoc[leveloffset=+1]

include::topics/developer-lightspeed/proc_configuring-machine-auto-scaling.adoc[leveloffset=+1]
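
For the GPU machine set and autoscaling procedures included above, the cluster typically relies on the {ocp-short} autoscaling APIs. The following is a minimal sketch of a `MachineAutoscaler` that targets a GPU machine set; the resource name, replica counts, and machine set name are placeholder assumptions.

[source, yaml]
----
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: gpu-machine-autoscaler        # hypothetical name
  namespace: openshift-machine-api
spec:
  minReplicas: 1
  maxReplicas: 2                      # add GPU nodes only when workloads need them
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: my-cluster-gpu-machineset   # replace with your GPU machine set name
----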

ifdef::parent-context-of-maas-oc-install-config[:context: {parent-context-of-maas-oc-install-config}]
ifndef::parent-context-of-maas-oc-install-config[:!context:]
@@ -0,0 +1,22 @@
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-preparing-llm-analysis: {context}]
Review comment (Member): I see the file name includes "llm-analysis"; just want to be clear that our analysis in MTA is not driven by an LLM today. Our analysis is rules-driven static code analysis. Our code suggestions and problem fixes are driven by LLMs.

:_mod-docs-content-type: ASSEMBLY

ifndef::context[]
[id="preparing-llm-analysis"]
endif::[]
ifdef::context[]
[id="preparing-llm-analysis_{context}"]
endif::[]
= Preparing the large language model for analysis
:context: preparing-llm-analysis

To access the large language model (LLM), you must create an API key for the model and update settings in {mta-dl-plugin} to enable the extension to use the LLM.

include::topics/developer-lightspeed/proc_configuring-openai-api-key.adoc[leveloffset=+1]

ifdef::parent-context-of-preparing-llm-analysis[:context: {parent-context-of-preparing-llm-analysis}]
ifndef::parent-context-of-preparing-llm-analysis[:!context:]
1 change: 1 addition & 0 deletions assemblies/developer-lightspeed-guide/topics
1 change: 1 addition & 0 deletions docs/developer-lightspeed-guide/assemblies
11 changes: 11 additions & 0 deletions docs/developer-lightspeed-guide/master-docinfo.xml
@@ -0,0 +1,11 @@
<title>MTA Developer Lightspeed Guide</title>
<productname>{DocInfoProductName}</productname>
<productnumber>{DocInfoProductNumber}</productnumber>
<subtitle>Using the {ProductName} command-line interface to migrate your applications</subtitle>
<abstract>
<para>Use {ProductFullName} Developer Lightspeed for application modernization in your organization by running Artificial Intelligence-driven static code analysis for Java applications.</para>
Review comment (Member), on "running Artificial Intelligence-driven static code analysis for Java applications": Note our analysis is not AI-driven; our code suggestions and fixes are AI-driven. Our analysis is rules-based static code analysis.

</abstract>
<authorgroup>
<orgname>Red Hat Customer Content Services</orgname>
</authorgroup>
<xi:include href="Common_Content/Legal_Notice.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
26 changes: 26 additions & 0 deletions docs/developer-lightspeed-guide/master.adoc
@@ -0,0 +1,26 @@
:mta:
include::topics/templates/document-attributes.adoc[]
:_mod-docs-content-type: ASSEMBLY
[id="mta-developer-lightspeed"]
= MTA Developer Lightspeed

:toc:
:toclevels: 4
:numbered:
:imagesdir: topics/images
:context: mta-developer-lightspeed
:mta-developer-lightspeed:

//Inclusive language statement
include::topics/making-open-source-more-inclusive.adoc[]








include::assemblies/developer-lightspeed-guide/assembly_configuring_llm.adoc[leveloffset=+1]

:!mta-developer-lightspeed:
1 change: 1 addition & 0 deletions docs/developer-lightspeed-guide/topics
23 changes: 23 additions & 0 deletions docs/topics/developer-lightspeed/con_model-as-a-service.adoc
@@ -0,0 +1,23 @@
:_newdoc-version: 2.15.0
:_template-generated: 2024-2-21

:_mod-docs-content-type: CONCEPT

[id="model-as-a-service_{context}"]
= Deploying an LLM as a scalable service

[role="_abstract"]

The code suggestions differ based on various parameters of the large language model (LLM) used for an analysis. Therefore, model-as-a-service gives you more control over using {mta-dl-plugin} with an LLM that is trained for your specific requirements than general-purpose models from public AI providers do.
Review comment (Member): I don't think we want to be documenting model-as-a-service, if this is implying the internal instance of https://github.com/rh-aiservices-bu/models-aas.


{mta-dl-plugin} is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common enterprise strategy to manage the underlying resources that power the models and to expose those models to members of different teams.
Review comment (Member), on "is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams": This isn't quite accurate; our analysis does not improve. The analysis is the same rules-driven analysis. Code suggestions may potentially improve through better context from the Solution Server. So it is the fixing of a problem that may get better the more an organization uses MTA (if they run the Solution Server).


To cater to an enterprise-wide LLM deployment, {mta-dl-plugin} integrates with LLMs that are deployed as a scalable service on {ocp-full} clusters. These deployments, called model-as-a-service (MaaS), provide you with granular control over resources such as compute, cluster nodes, and auto-scaling graphics processing units (GPUs) while enabling you to use LLMs to perform analysis at a large scale.
Review comment (Member): I can see a potential desire to call out the ability to run LLMs on OpenShift AI. As I'm reading over this PR, I'm not quite clear what we are doing with model-as-a-service; in the past I have heard model-as-a-service referred to as the MaaS endpoint at https://maas.apps.prod.rhoai.rh-aiservices-bu.com/ from the code at https://github.com/rh-aiservices-bu/models-aas. I see the term is generic and can refer to other things; I'm just a bit concerned that some testing information may have bled over into product docs that wouldn't be appropriate.


The workflow for configuring an LLM on {ocp-short} AI can be broadly divided into the following parts:

* Installing and configuring infrastructure resources
* Configuring {ocp-short} AI
* Connecting {ocp-short} AI with the LLM
* Preparing the LLM for analysis
//* Configuring monitoring and alerting for the storage resource: creating a ConfigMap for monitoring storage and an alert configuration file.
44 changes: 44 additions & 0 deletions docs/topics/developer-lightspeed/proc_adding-data-connection.adoc
@@ -0,0 +1,44 @@
:_newdoc-version: 2.15.0
:_template-generated: 2024-2-21
:_mod-docs-content-type: PROCEDURE

[id="adding-data-connection_{context}"]
= Adding a data connection

[role="_abstract"]
In {ocp-short}, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers in your organization.

In your data science project, you must create a data connection to your existing S3-compatible storage bucket to which you uploaded a large language model.
Review comment (Member): This seems strange for us to document; it feels out of scope for MTA.


.Prerequisites

You need the following credential information for the storage buckets:

* Endpoint URL
* Access key
* Secret key
* Region
* Bucket name

If you do not have this information, contact your storage administrator.

.Procedure

. In the {ocp-short} AI web console, select *Data science projects*.
The *Data science projects* page shows a list of projects that you can access. For each user-requested project in the list, the *Name* column shows the project display name, the user who requested the project, and the project description.

. Click *Create project*.
In the *Create project* dialog, enter a unique display name for your project in the *Name* field.

. Optional: In the *Description* field, provide a project description.

. Click *Create*.
Your project is listed on the *Data science projects* page.

. Click the name of your project, select the *Connections* tab, and click *Create connection*.

. In the *Connection type* drop-down list, select *S3 compatible object storage - v1*.

. In the *Connection details* section, enter the connection name, the access key, the secret key, the endpoint of your storage bucket, and the region.

. Click *Create*.
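
When you create the connection, {ocp-short} AI stores the credentials in your data science project as a Kubernetes secret. The following is a hedged sketch of what such a secret typically looks like; the name, label, annotation, and key names are assumptions based on common {ocp-short} AI data connections and can differ in your version.

[source, yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: aws-connection-my-models        # hypothetical name
  labels:
    opendatahub.io/dashboard: "true"
  annotations:
    opendatahub.io/connection-type: s3
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key>
  AWS_SECRET_ACCESS_KEY: <secret-key>
  AWS_S3_ENDPOINT: https://s3.amazonaws.com
  AWS_DEFAULT_REGION: us-east-1
  AWS_S3_BUCKET: my-models-bucket
----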
@@ -0,0 +1,58 @@
:_newdoc-version: 2.15.0
:_template-generated: 2024-2-21
:_mod-docs-content-type: PROCEDURE

[id="configuring-llm-podman_{context}"]
= Configuring the LLM in Podman Desktop

[role="_abstract"]

The Podman AI Lab extension enables you to choose an open-source model from a curated list of models and run it locally on your system.
Review comment (Member): Covering instructions for Podman AI sounds useful. There is a concern that for practical purposes it is highly likely that users will be disappointed or frustrated running local models. By nature, the majority of models run locally are smaller and not powerful, which directly limits the potential help they can provide to MTA. For most purposes a local model will yield poor results. We likely want to be clear to instruct the user that MTA's behavior is related to the capability of a model; local models are often smaller and less powerful, and therefore they often lack the ability to significantly help with MTA use cases.


.Prerequisites

* You installed link:https://podman-desktop.io/docs/installation[Podman Desktop] in your system.

* You completed the initial configuration in {mta-dl-plugin} that is required for the analysis.

.Procedure

. Go to the Podman AI Lab extension and click *Catalog* under *Models*.

. Download one or more models.

. Go to *Services* and click *New Model Service*.

. Select a model that you downloaded in the *Model* drop-down menu and click *Create Service*.

. Click the deployed model service to open the *Service Details* page.

. Note the server URL and the model name.
You must configure these values in the {mta-dl-plugin} extension.

. Export the inference server URL as follows:
+
[source, terminal]
----
export OPENAI_API_BASE=<server-url>
----
+
. In VS Code, click *Configure GenAI Settings* to open the `provider-settings.yaml` file.

. Enter the model details from Podman Desktop. For example, use the following configuration for a Mistral model.
+
[source, yaml]
----
podman_mistral:
  provider: "ChatOpenAI"
  environment:
    OPENAI_API_KEY: "unused value"
  args:
    model: "mistral-7b-instruct-v0-2"
    base_url: "http://localhost:35841/v1"
----
+
[NOTE]
====
The Podman Desktop service endpoint does not need a password but the OpenAI library expects the `OPENAI_API_KEY` to be set. In this case, the value of the `OPENAI_API_KEY` variable does not matter.
====
@@ -0,0 +1,38 @@
:_newdoc-version: 2.15.0
:_template-generated: 2024-2-21
:_mod-docs-content-type: PROCEDURE

[id="configuring-llm-serving-runtime_{context}"]
= Configuring the LLM serving runtime

[role="_abstract"]
It takes several minutes to scale nodes and pull the image for the vLLM model-serving runtime. However, the default timeout for deploying a model with vLLM is 10 minutes, and a deployment that takes longer fails on the {ocp-short} AI cluster.
Review comment (Member): I would remove this section. Not sure we want to get into documenting how to run models via vLLM for our MTA product docs. Interesting to consider, but for this release both Engineering and QE have NOT tested MTA against a model deployed by vLLM directly.


To mitigate this issue, you must create a custom serving runtime configuration with a longer deployment deadline.

.Procedure

. On the {ocp-short} AI dashboard, click *Settings > Serving runtimes*.
The *Serving runtimes* page lists the `vLLM ServingRuntime for KServe` custom resource (CR).
`KServe` orchestrates model serving for all types of models and includes model-serving runtimes that implement the loading of given types of model servers. KServe also handles the lifecycle of the deployment object, storage access, and networking setup.

. Click the kebab menu for `vLLM ServingRuntime for KServe` and select *Duplicate serving runtime*.

. Enter a different display name for the serving runtime and increase the value for `serving.knative.dev/progress-deadline` to `60m`.

. To support multiple GPU nodes and scaling, add `--distributed-executor-backend` and `--tensor-parallel-size` to `containers.args` as follows:
+
[source, yaml]
----
spec:
  containers:
  - args:
    - --port=8080
    - --model=/mnt/models
    - --served-model-name={{.Name}}
    - --distributed-executor-backend=mp
    - --tensor-parallel-size=8
----

Next, you must create an accelerator profile if you are running a GPU node for the first time.
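
The following is a minimal sketch of an accelerator profile for NVIDIA GPUs, assuming the {ocp-short} AI `AcceleratorProfile` API; the profile name, namespace, and toleration are illustrative assumptions.

[source, yaml]
----
apiVersion: dashboard.opendatahub.io/v1
kind: AcceleratorProfile
metadata:
  name: nvidia-gpu                      # hypothetical name
  namespace: redhat-ods-applications
spec:
  displayName: NVIDIA GPU
  enabled: true
  identifier: nvidia.com/gpu            # resource name that GPU workloads request
  tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
----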