:_module-type: CONCEPT

[id="working-with-llama-stack_{context}"]
= Working with Llama Stack

[role="_abstract"]
Llama Stack is a unified AI runtime environment designed to simplify the deployment and management of generative AI workloads on {productname-short}. Llama Stack integrates LLM inference servers, vector databases, and retrieval services in a single stack, optimized for Retrieval-Augmented Generation (RAG) and agent-based AI workflows. In {openshift-platform}, the Llama Stack Operator manages the deployment lifecycle of these components, ensuring scalability, consistency, and integration with {productname-short} projects.

ifndef::upstream[]
[IMPORTANT]
====
ifdef::self-managed[]
Llama Stack integration is currently available in {productname-long} {vernum} as a Technology Preview feature.
endif::[]
ifdef::cloud-service[]
Llama Stack integration is currently available in {productname-long} as a Technology Preview feature.
endif::[]
Technology Preview features are not supported with {org-name} production service level agreements (SLAs) and might not be functionally complete.
{org-name} does not recommend using them in production.
These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of {org-name} Technology Preview features, see link:https://access.redhat.com/support/offerings/techpreview/[Technology Preview Features Support Scope].
====
endif::[]

Llama Stack includes the following components:

* **Inference model servers** such as vLLM, designed to efficiently serve large language models.
* **Vector storage** solutions, primarily Milvus, to store embeddings generated from your domain data.
* **Retrieval and embedding management** workflows using integrated tools, such as Docling, to handle continuous data ingestion and synchronization.
* **Integration with {productname-short}** by using the `LlamaStackDistribution` custom resource, simplifying configuration and deployment, as shown in the example after this list.

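The following example shows the general shape of a `LlamaStackDistribution` resource. It is a minimal sketch based on the upstream Llama Stack Kubernetes Operator examples: the `llamastack.io/v1alpha1` API group reflects the upstream operator, and the resource name, namespace, distribution name, and environment variables are placeholder values that vary by operator version and environment.

[source,yaml]
----
apiVersion: llamastack.io/v1alpha1   # API group used by the upstream operator; may differ by version
kind: LlamaStackDistribution
metadata:
  name: my-llama-stack               # hypothetical resource name
  namespace: my-rag-project          # hypothetical data science project namespace
spec:
  replicas: 1
  server:
    distribution:
      name: <distribution-name>      # placeholder; available distribution names depend on the operator version
    containerSpec:
      port: 8321                     # default Llama Stack server port
      env:
      - name: INFERENCE_MODEL        # assumed variable: model identifier served by your inference server
        value: <model-id>
      - name: VLLM_URL               # assumed variable: endpoint of an existing vLLM inference service
        value: <vllm-service-endpoint>
----

After you create the resource, for example with `oc apply -f`, the Operator reconciles it into a running Llama Stack server deployment and service in the target project.
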
ifdef::upstream[]
For information about how to deploy Llama Stack in {productname-short}, see link:{odhdocshome}/working-with-rag/#deploying-a-rag-stack-in-a-data-science-project_rag[Deploying a RAG stack in a Data Science Project].
endif::[]
ifndef::upstream[]
For information about how to deploy Llama Stack in {productname-short}, see link:{rhoaidocshome}{default-format-url}/working_with_rag/deploying-a-rag-stack-in-a-data-science-project_rag[Deploying a RAG stack in a Data Science Project].
endif::[]

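After the stack is deployed, applications typically interact with it through the Llama Stack client SDK. The following is a minimal sketch that assumes the Python `llama-stack-client` package from the upstream Llama Stack project and a server reachable at the default port 8321; the service URL and model identifier are placeholders, and the exact client API can differ between Llama Stack releases.

[source,python]
----
from llama_stack_client import LlamaStackClient

# Connect to the Llama Stack server; replace the URL with your route or service address.
client = LlamaStackClient(base_url="http://my-llama-stack-service:8321")  # hypothetical endpoint

# List the models registered with the stack to verify connectivity.
for model in client.models.list():
    print(model.identifier)

# Run a basic chat completion against a served model (placeholder model ID).
response = client.inference.chat_completion(
    model_id="<model-id>",
    messages=[{"role": "user", "content": "What is Retrieval-Augmented Generation?"}],
)
print(response.completion_message.content)
----
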
[role="_additional-resources"]
.Additional resources
* link:https://github.com/opendatahub-io/llama-stack-demos[Llama Stack Demos repository^]
* link:https://llama-stack-k8s-operator.pages.dev/[Llama Stack Kubernetes Operator documentation^]
* link:https://llama-stack.readthedocs.io/en/latest/[Llama Stack documentation^]