:_module-type: CONCEPT

[id="working-with-llama-stack_{context}"]
= Working with Llama Stack

[role="_abstract"]
Llama Stack is a unified AI runtime environment designed to simplify the deployment and management of generative AI workloads on {productname-short}. Llama Stack integrates LLM inference servers, vector databases, and retrieval services in a single stack, optimized for Retrieval-Augmented Generation (RAG) and agent-based AI workflows. In {openshift-platform}, the Llama Stack Operator manages the deployment lifecycle of these components, ensuring scalability, consistency, and integration with {productname-short} projects.

ifndef::upstream[]
[IMPORTANT]
====
ifdef::self-managed[]
Llama Stack integration is currently available in {productname-long} {vernum} as a Technology Preview feature.
endif::[]
ifdef::cloud-service[]
Llama Stack integration is currently available in {productname-long} as a Technology Preview feature.
endif::[]
Technology Preview features are not supported with {org-name} production service level agreements (SLAs) and might not be functionally complete.
{org-name} does not recommend using them in production.
These features provide early access to upcoming product features, enabling customers to test functionality and provide feedback during the development process.

For more information about the support scope of {org-name} Technology Preview features, see link:https://access.redhat.com/support/offerings/techpreview/[Technology Preview Features Support Scope].
====
endif::[]

Llama Stack includes the following components:

* **Inference model servers** such as vLLM, designed to efficiently serve large language models.
* **Vector storage** solutions, primarily Milvus, to store embeddings generated from your domain data.
* **Retrieval and embedding management** workflows using integrated tools, such as Docling, to handle continuous data ingestion and synchronization.
* **Integration with {productname-short}** by using the `LlamaStackDistribution` custom resource, simplifying configuration and deployment, as shown in the example after this list.

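The following example shows the general shape of a `LlamaStackDistribution` resource. It is a minimal sketch based on the upstream Llama Stack Kubernetes Operator examples: the `llamastack.io/v1alpha1` API group reflects the upstream operator, and the resource name, namespace, distribution name, and environment variables are placeholder values that vary by operator version and environment.

[source,yaml]
----
apiVersion: llamastack.io/v1alpha1   # API group used by the upstream operator; may differ by version
kind: LlamaStackDistribution
metadata:
  name: my-llama-stack               # hypothetical resource name
  namespace: my-rag-project          # hypothetical data science project namespace
spec:
  replicas: 1
  server:
    distribution:
      name: <distribution-name>      # placeholder; available distribution names depend on the operator version
    containerSpec:
      port: 8321                     # default Llama Stack server port
      env:
      - name: INFERENCE_MODEL        # assumed variable: model identifier served by your inference server
        value: <model-id>
      - name: VLLM_URL               # assumed variable: endpoint of an existing vLLM inference service
        value: <vllm-service-endpoint>
----

After you create the resource, for example with `oc apply -f`, the Operator reconciles it into a running Llama Stack server deployment and service in the target project.
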
ifdef::upstream[]
For information about how to deploy Llama Stack in {productname-short}, see link:{odhdocshome}/working-with-rag/#deploying-a-rag-stack-in-a-data-science-project_rag[Deploying a RAG stack in a Data Science Project].
endif::[]
ifndef::upstream[]
For information about how to deploy Llama Stack in {productname-short}, see link:{rhoaidocshome}{default-format-url}/working_with_rag/deploying-a-rag-stack-in-a-data-science-project_rag[Deploying a RAG stack in a Data Science Project].
endif::[]

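After the stack is deployed, applications typically interact with it through the Llama Stack client SDK. The following is a minimal sketch that assumes the Python `llama-stack-client` package from the upstream Llama Stack project and a server reachable at the default port 8321; the service URL and model identifier are placeholders, and the exact client API can differ between Llama Stack releases.

[source,python]
----
from llama_stack_client import LlamaStackClient

# Connect to the Llama Stack server; replace the URL with your route or service address.
client = LlamaStackClient(base_url="http://my-llama-stack-service:8321")  # hypothetical endpoint

# List the models registered with the stack to verify connectivity.
for model in client.models.list():
    print(model.identifier)

# Run a basic chat completion against a served model (placeholder model ID).
response = client.inference.chat_completion(
    model_id="<model-id>",
    messages=[{"role": "user", "content": "What is Retrieval-Augmented Generation?"}],
)
print(response.completion_message.content)
----
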
[role="_additional-resources"]
.Additional resources
* link:https://github.com/opendatahub-io/llama-stack-demos[Llama Stack Demos repository^]
* link:https://llama-stack-k8s-operator.pages.dev/[Llama Stack Kubernetes Operator documentation^]
* link:https://llama-stack.readthedocs.io/en/latest/[Llama Stack documentation^]