MTA-5378 - LLM Configurations for Developer Lightspeed #177
Conversation
Signed-off-by: Prabha Kylasamiyer Sundara Rajan <[email protected]>
----

See link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/applying-autoscaling#additional-resources-2[Machine autoscaler resource definition] for descriptions of the CR parameters.
+
Please note that I need to make formatting changes in proc_configuring-machine-auto-scaling.adoc
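For reference while reworking that file, a minimal MachineAutoscaler CR of the kind the linked resource definition describes could look like the sketch below; the machine set name, namespace, and replica bounds are illustrative placeholders, not values from this PR.

```yaml
# Hedged sketch of a MachineAutoscaler CR -- the machine set name and the
# replica bounds are placeholders, not tested values from this PR.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: worker-gpu-us-east-1a        # conventionally matches the machine set name
  namespace: openshift-machine-api
spec:
  minReplicas: 1                      # lower bound for the scaled machine set
  maxReplicas: 4                      # upper bound for the scaled machine set
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: worker-gpu-us-east-1a       # name of an existing GPU machine set
```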
See link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/applying-autoscaling#additional-resources-2[Cluster autoscaler resource definition] for descriptions of the CR parameters.
+
. Enter the following command to deploy the cluster autoscaler CR.
+
Please note that I need to make formatting changes in proc_configuring-node-auto-scaling.adoc
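Similarly, for the "deploy the cluster autoscaler CR" step above, a minimal ClusterAutoscaler CR could look like the sketch below; the GPU resource type and limits are illustrative assumptions rather than values taken from this PR.

```yaml
# Hedged sketch of a ClusterAutoscaler CR -- the GPU type and limits are
# placeholders chosen for illustration only.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default                # the cluster autoscaler CR must be named "default"
spec:
  resourceLimits:
    gpus:
      - type: nvidia.com/gpu   # GPU resource type exposed by the GPU operator
        min: 0
        max: 4
  scaleDown:
    enabled: true              # allow the autoscaler to remove idle nodes
```

It would typically be applied with something like `oc create -f cluster-autoscaler.yaml` (filename assumed).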
As a member of the hybrid cloud infrastructure team, your initial set of tasks to deploy a large language model (LLM) through model-as-a-service involves creating {ocp-short} clusters with primary and secondary nodes and configuring an identity provider with role-based access control for users to log in to the clusters.

Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
Suggested change:
-Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short} AI. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
+Next, you configure the GPU operators required to run an LLM, GPU nodes, and auto scaling for the GPU nodes in your namespace on {ocp-short}. The following procedures refer to an {ocp-full} cluster hosted on Amazon Web Services.
* Azure OpenAI
* Google Gemini
* Amazon Bedrock
* Deepseek
"Deepseek" should be removed, we do not explicitly test this for downstream.
@Pkylas007 I think this PR should be made simpler; we are covering extra permutations that we don't need for this release.
I think we should target covering:
- How to specify LLMs in the IDE and Solution Server (unsure if both are covered here, or if just the IDE is and the Solution Server is covered in another section). Assume this is walking through how to access provider-settings.yaml from the IDE.
- Introduce the "&active" YAML anchor tag and how it is used to specify which stanza in provider-settings.yaml is active/used (see the illustrative sketch after this comment).
- We should show examples for configuring the following model providers:
  - OpenAI
  - OpenAI Compatible (covering OpenShift AI, Podman AI, etc.)
  - Potential (we may document a few more examples of model providers):
    - Azure OpenAI
    - AWS Bedrock
    - Google Gemini
- Walk through how to debug a bad credential issue with an LLM.
We also need to introduce the "GenAI Enabled/Disabled" configuration option and be sure users know it exists.
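To make the "&active" anchor point concrete, here is a hedged sketch of what a provider-settings.yaml could look like. The stanza names, `provider` values, environment keys, model IDs, and the top-level `active: *active` alias are assumptions based on the upstream Kai examples and should be verified against the sample file shipped with the IDE plugin.

```yaml
# Illustrative provider-settings.yaml -- field names, provider values, and
# model IDs are assumptions, not tested values from this PR.
models:
  # The &active anchor marks the stanza that the IDE plugin should use.
  OpenAI: &active
    environment:
      OPENAI_API_KEY: "<your-api-key>"
    provider: "ChatOpenAI"
    args:
      model: "gpt-4o"

  # OpenAI-compatible endpoint (for example, a model served on OpenShift AI):
  # reuse the OpenAI provider and point base_url at the served model.
  OpenShiftAI:
    environment:
      OPENAI_API_KEY: "<token-or-placeholder>"
    provider: "ChatOpenAI"
    args:
      model: "<served-model-name>"
      base_url: "https://<inference-endpoint>/v1"

  AmazonBedrock:
    environment:
      AWS_ACCESS_KEY_ID: "<access-key>"
      AWS_SECRET_ACCESS_KEY: "<secret-key>"
      AWS_DEFAULT_REGION: "us-east-1"
    provider: "ChatBedrock"
    args:
      model_id: "<bedrock-model-id>"

# Moving the &active anchor to a different stanza switches which model is used.
active: *active
```

A bad-credential walkthrough could reuse the same file: an intentionally wrong OPENAI_API_KEY should surface an authentication error from the provider, which is where the debugging section could point users first.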
= Installing and configuring {ocp-short} cluster
:context: maas-oc-install-config

As a member of the hybrid cloud infrastructure team, your initial set of tasks to deploy a large language model (LLM) through model-as-a-service involves creating {ocp-short} clusters with primary and secondary nodes and configuring an identity provider with role-based access control for users to log in to the clusters.
As a member of the hybrid cloud infrastructure team
This seems strange; I assume the "maas" section was included in the downstream docs by mistake.
:_newdoc-version: 2.18.3
:_template-generated: 2025-04-08

ifdef::context[:parent-context-of-preparing-llm-analysis: {context}]
I see the file name includes "llm-analysis"; just want to be clear that our analysis in MTA is not driven by an LLM today.
Our analysis is rules-driven static code analysis.
Our code suggestions/problem fixes are driven by LLMs.
<productnumber>{DocInfoProductNumber}</productnumber>
<subtitle>Using the {ProductName} command-line interface to migrate your applications</subtitle>
<abstract>
<para>Use {ProductFullName} Developer Lightspeed for application modernization in your organization by running Artificial Intelligence-driven static code analysis for Java applications.</para>
running Artificial Intelligence-driven static code analysis for Java applications.
Note our analysis is not AI-driven; our code suggestions/fixes are AI-driven. Our analysis is rules-based static code analysis.
The code suggestions differ based on various parameters about the large language model (LLM) used for an analysis. Therefore, model-as-a-service enables you more control over using {mta-dl-plugin} with an LLM that is trained for your specific requirements than general purpose models from the public AI providers.

{mta-dl-plugin} is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common strategy in an enterprise to manage the underlying resources that power the models that must be exposed to multiple members in different teams.
is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams
This isn't quite accurate; our analysis does not improve. The analysis is the same rules-driven analysis.
Code suggestions may potentially improve through better/improved "context" from the Solution Server.
So it's the fixing of a problem that may get better the more an organization uses MTA (if they have the Solution Server running).
[role="_abstract"]

The code suggestions differ based on various parameters about the large language model (LLM) used for an analysis. Therefore, model-as-a-service enables you more control over using {mta-dl-plugin} with an LLM that is trained for your specific requirements than general purpose models from the public AI providers.
I don't think we want to be documenting model-as-a-service, IF this is implying the internal instance of https://github.com/rh-aiservices-bu/models-aas
{mta-dl-plugin} is built to analyze better when it can access code changes resulting from analysis performed at scale across many application teams. In an enterprise, changes at scale become more consistent when the LLMs that generate the code change suggestions are shared across application teams than when each team uses a different LLM. This approach calls for a common strategy in an enterprise to manage the underlying resources that power the models that must be exposed to multiple members in different teams.

To cater to an enterprise-wide LLM deployment, {mta-dl-plugin} integrates with LLMs that are deployed as a scalable service on {ocp-full} clusters. These deployments, called model-as-a-service (MaaS), provide you with a granular control over resources such as compute, cluster nodes, and auto-scaling Graphical Processing Units (GPUs) while enabling you to leverage LLMs to perform analysis at a large scale.
I can see a potential desire to call out the ability to run LLMs on OpenShift AI.
As I'm reading over this PR I'm not quite clear on what we are doing with model-as-a-service; in the past I have heard model-as-a-service referred to as the maas endpoint at https://maas.apps.prod.rhoai.rh-aiservices-bu.com/ from the code at https://github.com/rh-aiservices-bu/models-aas.
I see the term is generic and can refer to other things; I'm just a bit concerned that some testing information may have bled over into product docs where it wouldn't be appropriate.
[role="_abstract"]
In {ocp-short}, a project is a Kubernetes namespace with additional annotations, and is the main way that you can manage user access to resources. A project organizes your data science work in one place and also allows you to collaborate with other developers in your organization.

In your data science project, you must create a data connection to your existing S3-compatible storage bucket to which you uploaded a large language model.
This seems strange for us to document; it feels out of scope for MTA.
[role="_abstract"]

The Podman AI lab extension enables you to use an open-source model from a curated list of models and use it locally in your system.
Covering instructions for Podman AI sounds useful.
There is a concern that for practical purposes it is HIGHLY likely that users will be disappointed/frustrated running local models. This is because the majority of models run locally are smaller and less powerful, which directly limits the potential help they can provide to MTA.
For most purposes a local model will yield poor results.
We likely want to clearly instruct the user that MTA's behavior is related to the capability of the model; local models are often smaller and less powerful, and therefore often lack the ability to significantly help with MTA use cases. (An illustrative local-endpoint stanza is sketched after this comment.)
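If Podman AI Lab instructions do go in, a local-endpoint stanza might look like the hedged sketch below; Podman AI Lab model services expose an OpenAI-compatible endpoint, and the port, model name, and dummy API key here are illustrative assumptions.

```yaml
# Hypothetical provider-settings.yaml stanza for a model served locally by
# Podman AI Lab; the port and model name are placeholders.
models:
  PodmanAILab: &active
    environment:
      OPENAI_API_KEY: "unused-but-required"   # local endpoints typically ignore the key
    provider: "ChatOpenAI"
    args:
      model: "<local-model-name>"
      base_url: "http://localhost:<service-port>/v1"

active: *active
```

Pairing an example like this with the capability caveat above would help set expectations for local models.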
= Configuring the LLM serving runtime

[role="_abstract"]
It takes several minutes to scale nodes and pull the image to serve the virtual large language model (vLLM). However, the default time for deploying a vLLM is 10 minutes. A vLLM deployment that takes longer fails on the {ocp-short} AI cluster.
I would remove this section.
Not sure we want to get into documenting how to run models via vLLM for our MTA product docs.
Interesting to consider, but for this release both Engineering and QE have NOT tested MTA against a model deployed by vLLM directly.