4 changes: 2 additions & 2 deletions artifacts/attributes.adoc
@@ -53,8 +53,8 @@
:odf-name: OpenShift Data Foundation
:osd-brand-name: Red Hat OpenShift Dedicated
:osd-short: OpenShift Dedicated
:rcs-name: Road-Core Service
:rcs-short: RCS
:lcs-name: Lightspeed Core Service
:lcs-short: LCS
:rhacs-brand-name: Red Hat Advanced Cluster Security
:rhacs-short: Advanced Cluster Security
:rhacs-very-short: ACS
4 changes: 1 addition & 3 deletions assemblies/assembly-customizing-developer-lightspeed.adoc
@@ -4,9 +4,7 @@
[id="{context}"]
= Customizing {ls-short}

You can customize {ls-short} functionalities, such as, question validation, gathering feedback, and storing chat history in PostgreSQL.

include::modules/developer-lightspeed/proc-using-question-validation.adoc[leveloffset=+1]
You can customize {ls-short} functionalities such as gathering feedback, storing chat history in PostgreSQL, and {model-context-protocol-link}#proc-configure-mcp-tools-for-developer-lightspeed_assembly-model-context-protocol-tools[configuring Model Context Protocol (MCP) tools].

include::modules/developer-lightspeed/proc-gathering-feedback.adoc[leveloffset=+1]

2 changes: 1 addition & 1 deletion assemblies/assembly-developer-lightspeed.adoc
@@ -8,7 +8,7 @@ include::modules/developer-lightspeed/con-about-developer-lightspeed.adoc[levelo

include::modules/developer-lightspeed/con-supported-architecture.adoc[leveloffset=+1]

include::modules/developer-lightspeed/con-about-road-core-service.adoc[leveloffset=+2]
include::modules/developer-lightspeed/con-about-lightspeed-stack-and-llama-stack.adoc[leveloffset=+2]

include::modules/developer-lightspeed/con-rag-embeddings.adoc[leveloffset=+1]

9 changes: 9 additions & 0 deletions assemblies/assembly-using-developer-lightspeed.adoc
@@ -13,6 +13,15 @@ endif::[]

{ls-brand-name} is designed to support you when performing various tasks during your development workflow.

[NOTE]
====
The `Question Validation` feature is enabled by default if you use the `quay.io/redhat-ai-dev/llama-stack` image without overriding the `run.yaml` configuration file in the image. To disable `Question Validation`, you must mount a `run.yaml` file to the container with the following changes, as sketched in the example after this note:

* The `Safety` section removed
* The `Shields` section removed
* `External_providers_dir` set to `null`
====
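
For reference, a `run.yaml` that disables `Question Validation` might look similar to the following fragment. This is a rough sketch only, assuming the standard Llama Stack `run.yaml` layout; the exact keys and provider entries depend on the Llama Stack version and the defaults shipped in the image:

[source,yaml]
----
version: '2'
apis:
  - agents
  - inference
  - vector_io
  # "safety" is not listed as an enabled API
providers:
  inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL}
  # no "safety" provider section and no top-level "shields" section
external_providers_dir: null
----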

With `Question Validation` enabled, you can ask {ls-short} the following types of questions:

* “Tell me about {product}.”
Binary file not shown.
@@ -3,7 +3,7 @@
[id="con-about-bring-your-own-model_{context}"]
= About Bring Your Own Model

{ls-short} does not provide its own inference services, but uses a _Bring Your Own Model_ approach. This means that you can configure the {rcs-name} to talk to the inference server or service of your choice. This also means that you are responsible for ensuring that the configured service meets your particular company policies and legal requirements, including any applicable terms with the third-party model provider.
{ls-short} does not provide its own inference services, but uses a _Bring Your Own Model_ approach. This means that you can configure the {lcs-name} to talk to the inference server or service of your choice. This also means that you are responsible for ensuring that the configured service meets your particular company policies and legal requirements, including any applicable terms with the third-party model provider.
//Add the cross reference to "Bring your own model"
The only technical requirements for inference services are:

@@ -14,7 +14,7 @@ This early access program enables customers to share feedback on the user experi
You can experience {ls-short} Developer Preview by installing the Developer Lightspeed for {product} plugin within an existing {product-very-short} instance.
Alternatively, if you prefer to test it locally first, you can try {ls-short} using {product-local-very-short}.

image::rhdh-plugins-reference/developer-lightspeed.png[]
image::rhdh-plugins-reference/developer-lightspeed-1-8-0.png[]

.Additional resources
* link:https://github.com/redhat-developer/rhdh-local/blob/main/README.md[{product-local-very-short}]
* link:https://github.com/redhat-developer/rhdh-local/blob/main/README.md[{product-local-very-short}]
@@ -0,0 +1,33 @@
:_mod-docs-content-type: CONCEPT

[id="con-about-lightspeed-stack-and-llama-stack_{context}"]
= About {lcs-name} and Llama Stack

The {lcs-name} and Llama Stack deploy together as sidecar containers to augment {product-very-short} functionality.

The Llama Stack delivers the augmented functionality by integrating and managing core components, which include:

* Large language model (LLM) inference providers

* Model Context Protocol (MCP) or Retrieval Augmented Generation (RAG) tool runtime providers

* Safety providers

* Vector database settings

The {lcs-name} serves as the Llama Stack service intermediary. It manages the operational configuration and key data, specifically:

* User feedback collection

* MCP server configuration

* Conversation history

Llama Stack provides the inference functionality that {lcs-short} uses to process requests. For more information, see https://llamastack.github.io/docs#what-is-llama-stack[What is Llama Stack].

The {ls-brand-name} plugin in {product-very-short} sends prompts and receives LLM responses through the {lcs-short} sidecar. {lcs-short} then uses the Llama Stack sidecar service to perform inference and MCP or RAG tool calling.

[NOTE]
====
{ls-brand-name} is a Developer Preview release. You must manually deploy the {lcs-name} and Llama Stack sidecar containers, and install the {ls-brand-name} plugin on your {product-very-short} instance.
====
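
As an illustration of this sidecar layout, a Kubernetes Deployment for {product-very-short} might include containers similar to the following fragment. The container names, images, and ports shown here are placeholders for illustration, not the exact values used by a given installation:

[source,yaml]
----
spec:
  containers:
    - name: backstage-backend        # existing Developer Hub container
      image: <rhdh-image>
    - name: lightspeed-core          # Lightspeed Core Service sidecar (placeholder name)
      image: <lightspeed-core-image>
      ports:
        - containerPort: 8080        # port is an assumption for illustration
    - name: llama-stack              # Llama Stack sidecar (placeholder name)
      image: quay.io/redhat-ai-dev/llama-stack
      ports:
        - containerPort: 8321        # default Llama Stack port; verify for your version
----
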
11 changes: 0 additions & 11 deletions modules/developer-lightspeed/con-about-road-core-service.adoc

This file was deleted.

7 changes: 3 additions & 4 deletions modules/developer-lightspeed/con-llm-requirements.adoc
@@ -5,11 +5,10 @@

{ls-short} follows a _Bring Your Own Model_ approach. This model means that to function, {ls-short} requires access to a large language model (LLM) which you must provide. An LLM is a type of generative AI that interprets natural language and generates human-like text or audio responses. When an LLM is used as a virtual assistant, the LLM can interpret questions and provide answers in a conversational manner.

LLMs are usually provided by a service or server. Since {ls-short} does not provide an LLM for you, you must configure your preferred LLM provider during installation.
You can use {ls-short} with a number of LLM providers that offer the OpenAI API interface including the following LLMS:
LLMs are usually provided by a service or server. Because {ls-short} does not provide an LLM for you, you must configure your preferred LLM provider during installation. You can configure the underlying Llama Stack server to integrate with a number of LLM providers that offer compatibility with the OpenAI API, including the following inference providers:

* OpenAI (cloud-based inference service)
* Red Hat OpenShift AI (enterprise model builder & inference server)
* Red Hat Enterprise Linux AI (enterprise inference server)
* {rhoai-brand-name} (enterprise model builder and inference server)
* {rhel} AI (enterprise inference server)
* Ollama (popular desktop inference server)
* vLLM (popular enterprise inference server)
6 changes: 4 additions & 2 deletions modules/developer-lightspeed/con-rag-embeddings.adoc
@@ -1,6 +1,8 @@
:_mod-docs-content-type: CONCEPT

[id="con-rag-embeddings_{context}"]
= Retrieval Augmented Generation embeddings
= Retrieval augmented generation (RAG) embeddings

The {product} documentation set has been added to the {rcs-name} as a RAG embedding.
The {product} documentation serves as the Retrieval-Augmented Generation (RAG) data source.

RAG initialization occurs through an initialization container, which copies the RAG data to a shared volume. The Llama Stack sidecar then mounts this shared volume to access the RAG data. The Llama Stack service uses the resulting RAG embeddings in the vector database as a reference, which allows the service to provide citations to the product documentation during the inference process.
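
As an illustration of this initialization flow, the Pod specification might wire the shared volume similar to the following sketch. The names, image, and paths here are placeholders for illustration only:

[source,yaml]
----
spec:
  initContainers:
    - name: rag-content                     # placeholder name
      image: <rag-content-image>            # image that ships the RAG embeddings
      command: ["sh", "-c", "cp -r /rag/. /shared/"]
      volumeMounts:
        - name: rag-data
          mountPath: /shared
  containers:
    - name: llama-stack
      image: quay.io/redhat-ai-dev/llama-stack
      volumeMounts:
        - name: rag-data                    # Llama Stack reads the embeddings from this shared volume
          mountPath: /rag
  volumes:
    - name: rag-data
      emptyDir: {}
----
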
10 changes: 4 additions & 6 deletions modules/developer-lightspeed/con-supported-architecture.adoc
@@ -3,13 +3,11 @@
[id="con-supported-architecture_{context}"]
= Supported architecture for {ls-brand-name}

{ls-short} is available as a plugin on all platforms that host {product-very-short}, and it requires the use of {rcs-name} ({rcs-short}) as a sidecar container.
{ls-short} is available as a plugin on all platforms that host {product-very-short}. It requires two sidecar containers: the {lcs-name} ({lcs-short}) and the Llama Stack service.

The {lcs-short} container acts as the intermediary layer, which interfaces with and manages the Llama Stack service.

[NOTE]
====
Currently, the provided {rcs-short} image is built for x86 platforms. To use other platforms (for example, arm64), ensure that you enable emulation.
====
image::rhdh-plugins-reference/developer-lightspeed-architecture-1-8-0.png[]

.Additional resources
* link:https://access.redhat.com/support/policy/updates/developerhub[{product} Life Cycle and supported platforms]
* link:https://access.redhat.com/support/policy/updates/developerhub[{product} Life Cycle and supported platforms]
73 changes: 16 additions & 57 deletions modules/developer-lightspeed/proc-changing-your-llm-provider.adoc
@@ -3,66 +3,25 @@
[id="proc-changing-your-llm-provider_{context}"]
= Changing your LLM provider in {ls-short}

{ls-short} operates on a {developer-lightspeed-link}#con-about-bring-your-own-model_appendix-about-user-data-security[_Bring Your Own Model_] approach, meaning you must provide and configure access to your preferred Large Language Model (LLM) provider for the service to function. The Road-Core Service (RCS) acts as an intermediary layer that handles the configuration and setup of these LLM providers.

[IMPORTANT]
====
The LLM provider configuration section includes a mandatory dummy provider block. Due to limitations of Road Core, this dummy provider must remain present when working with Lightspeed. This block is typically marked with comments (# Start: Do not remove this block and # End: Do not remove this block) and must not be removed from the configuration file.
====

.Prerequisites

* The path to the file containing your API token must be accessible by the RCS container, requiring the file to be mounted to the RCS container.
{ls-short} operates on a {developer-lightspeed-link}#con-about-bring-your-own-model_appendix-about-user-data-security[_Bring Your Own Model_] approach, meaning you must provide and configure access to your preferred large language model (LLM) provider for the service to function. Llama Stack acts as an intermediary layer that handles the configuration and setup of these LLM providers.

.Procedure

You can define additional LLM providers using either of following methods:

* Recommended: In your Developer Lightspeed plugin configuration (the `lightspeed` section within the `lightspeed-app-config.yaml` file), define the new provider or providers under the `lightspeed.servers` key as shown in the following code:
+
[source,yaml]
----
lightspeed:
servers:
- id: _<my_new_provider>_
url: _<my_new_url>_
token: _<my_new_token>_
----
+
[NOTE]
====
In Developer preview, only one LLM server is supported at a time.
====
** Optional: You can set the `id`, `url`, and `token` values in a Kubernetes Secret and reference them as environment variables using the `envFrom` section.
[source,yaml]
----
containers:
- name: my-container
image: my-image
envFrom:
- secretRef:
name: my-secret
----

* You can add new LLM providers by updating the `rcsconfig.yaml` file.
.. In the `llm_providers` section within your `rcsconfig.yaml` file, add your new provider configuration below the mandatory dummy provider block as shown in the following code:
* You can define additional LLM providers by updating your Llama Stack configuration file (`llama-stack.yaml`). In the `inference` section of this file, add your new provider configuration as shown in the following example:
+
[source,yaml]
----
llm_providers:
# Start: Do not remove this block
- name: dummy
type: openai
url: https://dummy.com
models:
- name: dummymodel
# END: Do not remove this block
- name: _<my_new_providers>_
type: openai
url: _<my_provider_url>_
credentials_path: path/to/token
disable_model_check: true
----
.. If you need to define a new provider in `rcsconfig.yaml`, you must configure the following critical parameters:
** `credentials_path`: Specifies the path to a `.txt` file that contains your API token. This file must be mounted and accessible by the RCS container.
** `disable_model_check`: Set this field to `true` to allow the RCS to locate models through the `/v1/models` endpoint of the provider. When you set this field to `true`, you avoid the need to define model names explicitly in the configuration.
#START - Adding your LLM provider
inference:
- provider_id: vllm
provider_type: remote::vllm
config:
url: ${env.VLLM_URL}
api_token: ${env.VLLM_API_KEY}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
#END - Adding your LLM provider
----
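+
The `${env.*}` references in the example resolve from the environment of the Llama Stack container. One way to supply these values, sketched here with placeholder names, is to store them in a Kubernetes Secret and expose the Secret with `envFrom`:
+
[source,yaml]
----
containers:
  - name: llama-stack
    image: quay.io/redhat-ai-dev/llama-stack
    envFrom:
      - secretRef:
          name: llm-provider-credentials   # Secret containing VLLM_URL and VLLM_API_KEY keys
----
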
@@ -3,45 +3,37 @@
[id="proc-customizing-the-chat-history-storage_{context}"]
= Customizing the chat history storage in {ls-short}

By default, the {rcs-short} service stores chat history using an in-memory database. This means that if you restart the Pod containing the server, the chat history is lost. You can manually configure {ls-short} to store the chat history persistently as a long-term backup with PostgreSQL by any of the following methods:

* {product-very-short} Operator
* {product-very-short} Helm chart
By default, the {ls-short} service stores chat history in a non-persistent local SQLite database within the {lcs-short} container. This means that the chat history is lost if you create and use a new {lcs-short} sidecar. You can manually configure {ls-short} to store the chat history persistently as a long-term backup with PostgreSQL by updating your {lcs-short} service configuration.

+
[WARNING]
====
If you configure {ls-short} to store chat history using PostgreSQL, prompts and responses are recorded and can be reviewed by your platform administrators. If any of your user's chat history contains any private, sensitive, or confidential information, this might have data privacy and security implications that you need to assess. For users that wish to have their chat data removed, they must request their respective platform administrator to perform this action. {company-name} does not collect (or have access to) any of this chat history data.
Configuring {ls-short} to use PostgreSQL records prompts and responses, which platform administrators can review. If any user's chat history contains private, sensitive, or confidential information, you must assess the resulting data privacy and security implications. Users who want their chat data removed must ask their platform administrator to perform this action. {company-name} does not collect or access this chat history data.
====

.Procedure
* When you are using {ls-short} on an Operator-installed {product-very-short} instance, in your {product-very-short} instance ConfigMap, update the `conversation-cache` field as shown in the following example:
+
. Configure the chat history storage type in the {lcs-short} configuration file (`lightspeed-stack.yaml`) by using one of the following options:
** To enable persistent storage with PostgreSQL, add the following configuration:
+
[source,yaml]
----
conversation_cache:
type: postgres
postgres:
host: _<your_database_host>_
port: _<your_database_port>_
dbname: _<your_database_name>_
user: _<your_user_name>_
password_path: postgres_password.txt
ca_cert_path: postgres_cert.crt
ssl_mode: "require"
db: _<your_database_name>_
user: _<your_user_name>_
password: _<postgres_password>_
----

* When you are using {ls-short} on a Helm-installed {product-very-short} instance, in your {product-very-short} instance `values.yaml` file, update the `conversation-cache` field as shown in the following example:
** To retain the default, non-persistent SQLite storage, ensure that the configuration is set as shown in the following example:
+
[source,yaml]
----
conversation_cache:
type: postgres
postgres:
host: _<your_database_host>_
port: _<your_database_port>_
dbname: _<your_database_name>_
user: _<your_user_name>_
password_path: postgres_password.txt
ca_cert_path: postgres_cert.crt
conversation_cache:
type: "sqlite"
sqlite:
db_path: "/tmp/cache.db"
----

. Restart your {lcs-short} service to apply the new configuration.
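
How you provide the updated `lightspeed-stack.yaml` to the sidecar depends on your deployment. As a generic illustration with placeholder names and a hypothetical mount path, the file can be supplied from a ConfigMap:

[source,yaml]
----
containers:
  - name: lightspeed-core                            # Lightspeed Core Service sidecar (placeholder name)
    volumeMounts:
      - name: lightspeed-stack-config
        mountPath: /app-root/lightspeed-stack.yaml   # hypothetical path inside the container
        subPath: lightspeed-stack.yaml
volumes:
  - name: lightspeed-stack-config
    configMap:
      name: lightspeed-stack-config                  # ConfigMap containing lightspeed-stack.yaml
----
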
18 changes: 10 additions & 8 deletions modules/developer-lightspeed/proc-gathering-feedback.adoc
@@ -3,19 +3,21 @@
[id="proc-gathering-feedback_{context}"]
= Gathering feedback in {ls-short}

Feedback collection is an optional feature configured on the {rcs-short}. This feature gathers user feedback by providing thumbs-up/down ratings and text comments directly from the chat window. {rcs-short} gathers the feedback, along with the user's query and the response of the model, and stores it as a JSON file within the local file system of the Pod for later collection and analysis by the platform administrator. This can be useful for assessing model performance and improving your users' experience. The collected feedback is stored in the cluster where {product-very-short} and {rcs-short} are deployed, and as such, is only accessible by the platform administrators for that cluster. For users that intend to have their data removed, they must request their respective platform administrator to perform that action as {company-name} does not collect (or have access to) any of this data.
Feedback collection is an optional feature configured on the {lcs-short}. This feature gathers user feedback by providing thumbs-up/down ratings and text comments directly from the chat window.

{lcs-short} collects the feedback, the user's query, and the response of the model, storing the data as a JSON file on the local file system of the Pod. A platform administrator must later collect and analyze this data to assess model performance and improve the user experience.

The collected data resides in the cluster where {product-very-short} and {lcs-short} are deployed, making it accessible only to platform administrators for that cluster. For data removal, users must request this action from their platform administrator, as {company-name} neither collects nor accesses this data.

.Procedure

* To enable or disable feedback, in your {rcs-short} configuration file, add the following settings:
. To enable or disable feedback collection, in the {lcs-short} configuration file (`lightspeed-stack.yaml`), add the following settings:
+
[source,yaml]
----
llm_providers:
.......
ols_config:
......
user_data_collection:
feedback_disabled: <true/false>
feedback_storage: "/app-root/tmp/data/feedback"
feedback_enabled: true
feedback_storage: "/tmp/data/feedback"
transcripts_enabled: true
transcripts_storage: "/tmp/data/transcripts"
----
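+
The feedback and transcript files are written to the local file system of the Pod, so they do not survive Pod replacement on their own. One option, sketched here with placeholder names, is to back the storage paths with a persistent volume mounted into the Lightspeed Core Service container:
+
[source,yaml]
----
containers:
  - name: lightspeed-core                  # Lightspeed Core Service sidecar (placeholder name)
    volumeMounts:
      - name: user-data
        mountPath: /tmp/data               # parent of the feedback and transcripts paths above
volumes:
  - name: user-data
    persistentVolumeClaim:
      claimName: lightspeed-user-data      # hypothetical PVC name
----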