4 changes: 2 additions & 2 deletions artifacts/attributes.adoc
@@ -53,8 +53,8 @@
:odf-name: OpenShift Data Foundation
:osd-brand-name: Red Hat OpenShift Dedicated
:osd-short: OpenShift Dedicated
:rcs-name: Road-Core Service
:rcs-short: RCS
:lcs-name: Lightspeed Core Service
:lcs-short: LCS
:rhacs-brand-name: Red Hat Advanced Cluster Security
:rhacs-short: Advanced Cluster Security
:rhacs-very-short: ACS
4 changes: 1 addition & 3 deletions assemblies/assembly-customizing-developer-lightspeed.adoc
@@ -4,9 +4,7 @@
[id="{context}"]
= Customizing {ls-short}

You can customize {ls-short} functionalities, such as, question validation, gathering feedback, and storing chat history in PostgreSQL.

include::modules/developer-lightspeed/proc-using-question-validation.adoc[leveloffset=+1]
You can customize {ls-short} functionalities such as gathering feedback, storing chat history in PostgreSQL, and {model-context-protocol-link}#proc-configure-mcp-tools-for-developer-lightspeed_assembly-model-context-protocol-tools[configuring Model Context Protocol (MCP) tools].

include::modules/developer-lightspeed/proc-gathering-feedback.adoc[leveloffset=+1]

2 changes: 1 addition & 1 deletion assemblies/assembly-developer-lightspeed.adoc
@@ -8,7 +8,7 @@ include::modules/developer-lightspeed/con-about-developer-lightspeed.adoc[levelo

include::modules/developer-lightspeed/con-supported-architecture.adoc[leveloffset=+1]

include::modules/developer-lightspeed/con-about-road-core-service.adoc[leveloffset=+2]
include::modules/developer-lightspeed/con-about-lightspeed-stack-and-llama-stack.adoc[leveloffset=+2]

include::modules/developer-lightspeed/con-rag-embeddings.adoc[leveloffset=+1]

9 changes: 9 additions & 0 deletions assemblies/assembly-using-developer-lightspeed.adoc
@@ -13,6 +13,15 @@ endif::[]

{ls-brand-name} is designed to support you when performing various tasks during your development workflow.

[NOTE]
====
The `Question Validation` feature is enabled by default if you use the `quay.io/redhat-ai-dev/llama-stack` image without overriding the `run.yaml` configuration file in the image. To disable `Question Validation`, you must mount a `run.yaml` file to the container with the following changes, as sketched in the example after this note:

* The `Safety` section removed
* The `Shields` section removed
* `External_providers_dir` set to `null`
====
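
For reference, a `run.yaml` that disables `Question Validation` might look similar to the following fragment. This is a rough sketch only, assuming the standard Llama Stack `run.yaml` layout; the exact keys and provider entries depend on the Llama Stack version and the defaults shipped in the image:

[source,yaml]
----
version: '2'
apis:
  - agents
  - inference
  - vector_io
  # "safety" is not listed as an enabled API
providers:
  inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        url: ${env.VLLM_URL}
  # no "safety" provider section and no top-level "shields" section
external_providers_dir: null
----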

With `Question Validation` enabled, you can ask {ls-short} the following types of questions:

* “Tell me about {product}.”
Binary file not shown.
@@ -3,7 +3,7 @@
[id="con-about-bring-your-own-model_{context}"]
= About Bring Your Own Model

{ls-short} does not provide its own inference services, but uses a _Bring Your Own Model_ approach. This means that you can configure the {rcs-name} to talk to the inference server or service of your choice. This also means that you are responsible for ensuring that the configured service meets your particular company policies and legal requirements, including any applicable terms with the third-party model provider.
{ls-short} does not provide its own inference services, but uses a _Bring Your Own Model_ approach. This means that you can configure the {lcs-name} to talk to the inference server or service of your choice. This also means that you are responsible for ensuring that the configured service meets your particular company policies and legal requirements, including any applicable terms with the third-party model provider.
//Add the cross reference to "Bring your own model"
The only technical requirements for inference services are:

@@ -14,7 +14,7 @@ This early access program enables customers to share feedback on the user experi
You can experience {ls-short} Developer Preview by installing the Developer Lightspeed for {product} plugin within an existing {product-very-short} instance.
Alternatively, if you prefer to test it locally first, you can try {ls-short} using {product-local-very-short}.

image::rhdh-plugins-reference/developer-lightspeed.png[]
image::rhdh-plugins-reference/developer-lightspeed-1-8-0.png[]

.Additional resources
* link:https://github.com/redhat-developer/rhdh-local/blob/main/README.md[{product-local-very-short}]
* link:https://github.com/redhat-developer/rhdh-local/blob/main/README.md[{product-local-very-short}]
@@ -0,0 +1,33 @@
:_mod-docs-content-type: CONCEPT

[id="con-about-lightspeed-stack-and-llama-stack_{context}"]
= About {lcs-name} and Llama Stack

The {lcs-name} and Llama Stack deploy together as sidecar containers to augment {product-very-short} functionality.

The Llama Stack delivers the augmented functionality by integrating and managing core components, which include:

* Large language model (LLM) inference providers

* Model Context Protocol (MCP) or Retrieval Augmented Generation (RAG) tool runtime providers

* Safety providers

* Vector database settings

The {lcs-name} serves as the Llama Stack service intermediary. It manages the operational configuration and key data, specifically:

* User feedback collection

* MCP server configuration

* Conversation history

Llama Stack provides the inference functionality that {lcs-short} uses to process requests. For more information, see https://llamastack.github.io/docs#what-is-llama-stack[What is Llama Stack].

The {ls-brand-name} plugin in {product-very-short} sends prompts and receives LLM responses through the {lcs-short} sidecar. {lcs-short} then uses the Llama Stack sidecar service to perform inference and MCP or RAG tool calling.

[NOTE]
====
{ls-brand-name} is a Developer Preview release. You must manually deploy the {lcs-name} and Llama Stack sidecar containers, and install the {ls-brand-name} plugin on your {product-very-short} instance.
====
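
As an illustration of this sidecar layout, a Kubernetes Deployment for {product-very-short} might include containers similar to the following fragment. The container names, images, and ports shown here are placeholders for illustration, not the exact values used by a given installation:

[source,yaml]
----
spec:
  containers:
    - name: backstage-backend        # existing Developer Hub container
      image: <rhdh-image>
    - name: lightspeed-core          # Lightspeed Core Service sidecar (placeholder name)
      image: <lightspeed-core-image>
      ports:
        - containerPort: 8080        # port is an assumption for illustration
    - name: llama-stack              # Llama Stack sidecar (placeholder name)
      image: quay.io/redhat-ai-dev/llama-stack
      ports:
        - containerPort: 8321        # default Llama Stack port; verify for your version
----
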
11 changes: 0 additions & 11 deletions modules/developer-lightspeed/con-about-road-core-service.adoc

This file was deleted.

7 changes: 3 additions & 4 deletions modules/developer-lightspeed/con-llm-requirements.adoc
@@ -5,11 +5,10 @@

{ls-short} follows a _Bring Your Own Model_ approach. This model means that to function, {ls-short} requires access to a large language model (LLM) which you must provide. An LLM is a type of generative AI that interprets natural language and generates human-like text or audio responses. When an LLM is used as a virtual assistant, the LLM can interpret questions and provide answers in a conversational manner.

LLMs are usually provided by a service or server. Since {ls-short} does not provide an LLM for you, you must configure your preferred LLM provider during installation.
You can use {ls-short} with a number of LLM providers that offer the OpenAI API interface including the following LLMS:
LLMs are usually provided by a service or server. Because {ls-short} does not provide an LLM for you, you must configure your preferred LLM provider during installation. You can configure the underlying Llama Stack server to integrate with a number of LLM providers that offer compatibility with the OpenAI API, including the following inference providers:

* OpenAI (cloud-based inference service)
* Red Hat OpenShift AI (enterprise model builder & inference server)
* Red Hat Enterprise Linux AI (enterprise inference server)
* {rhoai-brand-name} (enterprise model builder and inference server)
* {rhel} AI (enterprise inference server)
* Ollama (popular desktop inference server)
* vLLM (popular enterprise inference server)
6 changes: 4 additions & 2 deletions modules/developer-lightspeed/con-rag-embeddings.adoc
@@ -1,6 +1,8 @@
:_mod-docs-content-type: CONCEPT

[id="con-rag-embeddings_{context}"]
= Retrieval Augmented Generation embeddings
= Retrieval augmented generation (RAG) embeddings

The {product} documentation set has been added to the {rcs-name} as a RAG embedding.
The {product} documentation serves as the Retrieval-Augmented Generation (RAG) data source.

RAG initialization occurs through an initialization container, which copies the RAG data to a shared volume. The Llama Stack sidecar then mounts this shared volume to access the RAG data. The Llama Stack service uses the resulting RAG embeddings in the vector database as a reference, which allows the service to provide citations to the product documentation during the inference process.
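
As an illustration of this initialization flow, the Pod specification might wire the shared volume similar to the following sketch. The names, image, and paths here are placeholders for illustration only:

[source,yaml]
----
spec:
  initContainers:
    - name: rag-content                     # placeholder name
      image: <rag-content-image>            # image that ships the RAG embeddings
      command: ["sh", "-c", "cp -r /rag/. /shared/"]
      volumeMounts:
        - name: rag-data
          mountPath: /shared
  containers:
    - name: llama-stack
      image: quay.io/redhat-ai-dev/llama-stack
      volumeMounts:
        - name: rag-data                    # Llama Stack reads the embeddings from this shared volume
          mountPath: /rag
  volumes:
    - name: rag-data
      emptyDir: {}
----
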
10 changes: 4 additions & 6 deletions modules/developer-lightspeed/con-supported-architecture.adoc
@@ -3,13 +3,11 @@
[id="con-supported-architecture_{context}"]
= Supported architecture for {ls-brand-name}

{ls-short} is available as a plugin on all platforms that host {product-very-short}, and it requires the use of {rcs-name} ({rcs-short}) as a sidecar container.
{ls-short} is available as a plugin on all platforms that host {product-very-short}. It requires two sidecar containers: the {lcs-name} ({lcs-short}) and the Llama Stack service.

The {lcs-short} container acts as the intermediary layer, which interfaces with and manages the Llama Stack service.

[NOTE]
====
Currently, the provided {rcs-short} image is built for x86 platforms. To use other platforms (for example, arm64), ensure that you enable emulation.
====
image::rhdh-plugins-reference/developer-lightspeed-architecture-1-8-0.png[]

.Additional resources
* link:https://access.redhat.com/support/policy/updates/developerhub[{product} Life Cycle and supported platforms]
* link:https://access.redhat.com/support/policy/updates/developerhub[{product} Life Cycle and supported platforms]
73 changes: 16 additions & 57 deletions modules/developer-lightspeed/proc-changing-your-llm-provider.adoc
@@ -3,66 +3,25 @@
[id="proc-changing-your-llm-provider_{context}"]
= Changing your LLM provider in {ls-short}

{ls-short} operates on a {developer-lightspeed-link}#con-about-bring-your-own-model_appendix-about-user-data-security[_Bring Your Own Model_] approach, meaning you must provide and configure access to your preferred Large Language Model (LLM) provider for the service to function. The Road-Core Service (RCS) acts as an intermediary layer that handles the configuration and setup of these LLM providers.

[IMPORTANT]
====
The LLM provider configuration section includes a mandatory dummy provider block. Due to limitations of Road Core, this dummy provider must remain present when working with Lightspeed. This block is typically marked with comments (# Start: Do not remove this block and # End: Do not remove this block) and must not be removed from the configuration file.
====

.Prerequisites

* The path to the file containing your API token must be accessible by the RCS container, requiring the file to be mounted to the RCS container.
{ls-short} operates on a {developer-lightspeed-link}#con-about-bring-your-own-model_appendix-about-user-data-security[_Bring Your Own Model_] approach, meaning you must provide and configure access to your preferred large language model (LLM) provider for the service to function. Llama Stack acts as an intermediary layer that handles the configuration and setup of these LLM providers.

.Procedure

You can define additional LLM providers using either of following methods:

* Recommended: In your Developer Lightspeed plugin configuration (the `lightspeed` section within the `lightspeed-app-config.yaml` file), define the new provider or providers under the `lightspeed.servers` key as shown in the following code:
+
[source,yaml]
----
lightspeed:
servers:
- id: _<my_new_provider>_
url: _<my_new_url>_
token: _<my_new_token>_
----
+
[NOTE]
====
In Developer preview, only one LLM server is supported at a time.
====
** Optional: You can set the `id`, `url`, and `token` values in a Kubernetes Secret and reference them as environment variables using the `envFrom` section.
[source,yaml]
----
containers:
- name: my-container
image: my-image
envFrom:
- secretRef:
name: my-secret
----

* You can add new LLM providers by updating the `rcsconfig.yaml` file.
.. In the `llm_providers` section within your `rcsconfig.yaml` file, add your new provider configuration below the mandatory dummy provider block as shown in the following code:
* You can define additional LLM providers by updating your Llama Stack configuration file (`llama-stack.yaml`). In the `inference` section of this file, add your new provider configuration as shown in the following example:
+
[source,yaml]
----
llm_providers:
# Start: Do not remove this block
- name: dummy
type: openai
url: https://dummy.com
models:
- name: dummymodel
# END: Do not remove this block
- name: _<my_new_providers>_
type: openai
url: _<my_provider_url>_
credentials_path: path/to/token
disable_model_check: true
----
.. If you need to define a new provider in `rcsconfig.yaml`, you must configure the following critical parameters:
** `credentials_path`: Specifies the path to a `.txt` file that contains your API token. This file must be mounted and accessible by the RCS container.
** `disable_model_check`: Set this field to `true` to allow the RCS to locate models through the `/v1/models` endpoint of the provider. When you set this field to `true`, you avoid the need to define model names explicitly in the configuration.
#START - Adding your LLM provider
inference:
- provider_id: vllm
provider_type: remote::vllm
config:
url: ${env.VLLM_URL}
api_token: ${env.VLLM_API_KEY}
max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
tls_verify: ${env.VLLM_TLS_VERIFY:=true}
- provider_id: sentence-transformers
provider_type: inline::sentence-transformers
config: {}
#END - Adding your LLM provider
----
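+
The `${env.*}` references in the example resolve from the environment of the Llama Stack container. One way to supply these values, sketched here with placeholder names, is to store them in a Kubernetes Secret and expose the Secret with `envFrom`:
+
[source,yaml]
----
containers:
  - name: llama-stack
    image: quay.io/redhat-ai-dev/llama-stack
    envFrom:
      - secretRef:
          name: llm-provider-credentials   # Secret containing VLLM_URL and VLLM_API_KEY keys
----
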
@@ -3,45 +3,37 @@
[id="proc-customizing-the-chat-history-storage_{context}"]
= Customizing the chat history storage in {ls-short}

By default, the {rcs-short} service stores chat history using an in-memory database. This means that if you restart the Pod containing the server, the chat history is lost. You can manually configure {ls-short} to store the chat history persistently as a long-term backup with PostgreSQL by any of the following methods:

* {product-very-short} Operator
* {product-very-short} Helm chart
By default, the {ls-short} service stores chat history in a non-persistent local SQLite database within the {lcs-short} container. This means that the chat history is lost if you create and use a new {lcs-short} sidecar. You can manually configure {ls-short} to store the chat history persistently as a long-term backup with PostgreSQL by updating your {lcs-short} service configuration.

+
[WARNING]
====
If you configure {ls-short} to store chat history using PostgreSQL, prompts and responses are recorded and can be reviewed by your platform administrators. If any of your user's chat history contains any private, sensitive, or confidential information, this might have data privacy and security implications that you need to assess. For users that wish to have their chat data removed, they must request their respective platform administrator to perform this action. {company-name} does not collect (or have access to) any of this chat history data.
Configuring {ls-short} to use PostgreSQL records prompts and responses, which platform administrators can review. If any user's chat history contains private, sensitive, or confidential information, you must assess the resulting data privacy and security implications. Users who want their chat data removed must ask their platform administrator to perform this action. {company-name} does not collect or access this chat history data.
====

.Procedure
* When you are using {ls-short} on an Operator-installed {product-very-short} instance, in your {product-very-short} instance ConfigMap, update the `conversation-cache` field as shown in the following example:
+
. Configure the chat history storage type in the {lcs-short} configuration file (`lightspeed-stack.yaml`) by using one of the following options:
** To enable persistent storage with PostgreSQL, add the following configuration:
+
[source,yaml]
----
conversation_cache:
type: postgres
postgres:
host: _<your_database_host>_
port: _<your_database_port>_
dbname: _<your_database_name>_
user: _<your_user_name>_
password_path: postgres_password.txt
ca_cert_path: postgres_cert.crt
ssl_mode: "require"
db: _<your_database_name>_
user: _<your_user_name>_
password: _<postgres_password>_
----

* When you are using {ls-short} on a Helm-installed {product-very-short} instance, in your {product-very-short} instance `values.yaml` file, update the `conversation-cache` field as shown in the following example:
** To retain the default, non-persistent SQLite storage, ensure that the configuration is set as shown in the following example:
+
[source,yaml]
----
conversation_cache:
type: postgres
postgres:
host: _<your_database_host>_
port: _<your_database_port>_
dbname: _<your_database_name>_
user: _<your_user_name>_
password_path: postgres_password.txt
ca_cert_path: postgres_cert.crt
conversation_cache:
type: "sqlite"
sqlite:
db_path: "/tmp/cache.db"
----

. Restart your {lcs-short} service to apply the new configuration.
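
How you provide the updated `lightspeed-stack.yaml` to the sidecar depends on your deployment. As a generic illustration with placeholder names and a hypothetical mount path, the file can be supplied from a ConfigMap:

[source,yaml]
----
containers:
  - name: lightspeed-core                            # Lightspeed Core Service sidecar (placeholder name)
    volumeMounts:
      - name: lightspeed-stack-config
        mountPath: /app-root/lightspeed-stack.yaml   # hypothetical path inside the container
        subPath: lightspeed-stack.yaml
volumes:
  - name: lightspeed-stack-config
    configMap:
      name: lightspeed-stack-config                  # ConfigMap containing lightspeed-stack.yaml
----
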
18 changes: 10 additions & 8 deletions modules/developer-lightspeed/proc-gathering-feedback.adoc
@@ -3,19 +3,21 @@
[id="proc-gathering-feedback_{context}"]
= Gathering feedback in {ls-short}

Feedback collection is an optional feature configured on the {rcs-short}. This feature gathers user feedback by providing thumbs-up/down ratings and text comments directly from the chat window. {rcs-short} gathers the feedback, along with the user's query and the response of the model, and stores it as a JSON file within the local file system of the Pod for later collection and analysis by the platform administrator. This can be useful for assessing model performance and improving your users' experience. The collected feedback is stored in the cluster where {product-very-short} and {rcs-short} are deployed, and as such, is only accessible by the platform administrators for that cluster. For users that intend to have their data removed, they must request their respective platform administrator to perform that action as {company-name} does not collect (or have access to) any of this data.
Feedback collection is an optional feature configured on the {lcs-short}. This feature gathers user feedback by providing thumbs-up/down ratings and text comments directly from the chat window.

{lcs-short} collects the feedback, the user's query, and the response of the model, storing the data as a JSON file on the local file system of the Pod. A platform administrator must later collect and analyze this data to assess model performance and improve the user experience.

The collected data resides in the cluster where {product-very-short} and {lcs-short} are deployed, making it accessible only to platform administrators for that cluster. For data removal, users must request this action from their platform administrator, as {company-name} neither collects nor accesses this data.

.Procedure

* To enable or disable feedback, in your {rcs-short} configuration file, add the following settings:
. To enable or disable feedback collection, in the {lcs-short} configuration file (`lightspeed-stack.yaml`), add the following settings:
+
[source,yaml]
----
llm_providers:
.......
ols_config:
......
user_data_collection:
feedback_disabled: <true/false>
feedback_storage: "/app-root/tmp/data/feedback"
feedback_enabled: true
feedback_storage: "/tmp/data/feedback"
transcripts_enabled: true
transcripts_storage: "/tmp/data/transcripts"
----
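+
The feedback and transcript files are written to the local file system of the Pod, so they do not survive Pod replacement on their own. One option, sketched here with placeholder names, is to back the storage paths with a persistent volume mounted into the Lightspeed Core Service container:
+
[source,yaml]
----
containers:
  - name: lightspeed-core                  # Lightspeed Core Service sidecar (placeholder name)
    volumeMounts:
      - name: user-data
        mountPath: /tmp/data               # parent of the feedback and transcripts paths above
volumes:
  - name: user-data
    persistentVolumeClaim:
      claimName: lightspeed-user-data      # hypothetical PVC name
----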