
Commit 6c32514

chore: Updating KFP docs (kubeflow#11927)
* chore: Updating KFP docs
* remove html reference
* updating python version
* Update index.rst
* Update quickstart.rst
* Update quickstart.rst
* rebasing
* removed overview file
* updated to make things look better
* Update docs/source/overview.rst
* Update docs/source/overview.rst
* Update docs/source/installation.rst
* updating conf.py

---------

Signed-off-by: Francisco Javier Arceo <[email protected]>
Signed-off-by: Francisco Arceo <[email protected]>
Co-authored-by: Matt Prahl <[email protected]>
1 parent 8329e64 commit 6c32514

File tree

7 files changed: +282 -2 lines changed

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@
  }

  html_theme = 'sphinx_immaterial'
- html_title = 'KFP SDK API Reference'
+ html_title = 'Kubeflow Pipelines (KFP)'
  html_static_path = ['_static']
  html_css_files = ['custom.css']
  html_logo = '_static/kubeflow.png'

docs/index.rst

Lines changed: 8 additions & 1 deletion
@@ -1,14 +1,21 @@
- Kubeflow Pipelines SDK API Reference
+ Kubeflow Pipelines (KFP)
  ====================================

  .. mdinclude:: ../sdk/python/README.md

+ .. mdinclude:: Architecture.md
+
  .. toctree::
     :caption: Contents
     :hidden:

     Home <self>
+    Quickstart <source/quickstart>
+    GenAI <source/genai>
+    Overview <source/overview>
+    Installation <source/installation>
     API Reference <source/kfp>
     Command Line Interface <source/cli>
+
     Usage Docs (kubeflow.org) <https://kubeflow.org/docs/pipelines/>
     Source Code <https://github.com/kubeflow/pipelines/>

docs/source/genai.rst

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
GenAI Use Cases
===============

Generative AI (GenAI) workflows typically span multiple stages—from **data preparation** to **model fine-tuning**, **prompt engineering**, **evaluation**, and **deployment**. Kubeflow Pipelines provides a flexible and scalable orchestration engine to support these end-to-end workflows in a reproducible, modular way.
Data Preparation
----------------
Effective GenAI starts with high-quality, well-structured data. Use Kubeflow Pipelines to:

- Ingest and preprocess unstructured data such as PDFs, HTML, images, or audio.
- Convert raw documents into structured formats and chunk them for tokenization.
- Clean, normalize, and deduplicate datasets for training and evaluation.
- Generate embeddings using models like SentenceTransformers or CLIP (see the sketch below).
- Create and store metadata-rich artifacts for traceability and downstream reuse.
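For example, a lightweight component along these lines could chunk a document and emit embeddings as a reusable artifact. This is a minimal sketch; the ``sentence-transformers`` dependency and the ``all-MiniLM-L6-v2`` model are illustrative choices, not project defaults:

.. code-block:: python

   from kfp import dsl

   @dsl.component(packages_to_install=['sentence-transformers'])
   def embed_chunks(text: str, chunk_size: int,
                    embeddings: dsl.Output[dsl.Artifact]):
       # Imports live inside the function so the component is self-contained.
       import json
       from sentence_transformers import SentenceTransformer

       # Naive fixed-width chunking; real pipelines often chunk by tokens.
       chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
       model = SentenceTransformer('all-MiniLM-L6-v2')  # illustrative model
       vectors = model.encode(chunks).tolist()

       with open(embeddings.path, 'w') as f:
           json.dump(vectors, f)
       embeddings.metadata['num_chunks'] = len(chunks)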
Fine-tuning & Training
----------------------
Once data is prepared, Kubeflow Pipelines can orchestrate training jobs at scale:

- Automate tokenization and model fine-tuning (e.g., LoRA, full fine-tuning).
- Parallelize hyperparameter sweeps (e.g., learning rate, batch size) using conditional and parallel components, as sketched below.
- Leverage GPUs, TPUs, or managed training backends across environments.
- Use pipeline components to separate data prep, training, and checkpoint saving.
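A fan-out over learning rates can be expressed with ``dsl.ParallelFor``. In this sketch the ``train`` component is a stand-in for your actual training logic:

.. code-block:: python

   from kfp import dsl

   @dsl.component
   def train(learning_rate: float) -> float:
       # Placeholder training step; returns a dummy metric.
       print(f'training with lr={learning_rate}')
       return 0.0

   @dsl.pipeline
   def sweep_pipeline():
       # Each learning rate becomes an independent, parallel task.
       with dsl.ParallelFor(items=[1e-4, 3e-4, 1e-3]) as lr:
           train(learning_rate=lr)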
Prompt Engineering Experiments
------------------------------
Experiment with prompt templates using parameterized pipelines:

- Evaluate prompt effectiveness at scale using batch scoring jobs.
- Log and compare model outputs with evaluation metrics and annotations.
- Enable iterative prompt design with easy-to-swap text templates (see the sketch below).
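Because pipeline parameters are ordinary function arguments, a prompt template can be swapped per run. A minimal sketch, where ``score_prompt`` is a hypothetical stand-in for a call to your model endpoint:

.. code-block:: python

   from kfp import dsl

   @dsl.component
   def score_prompt(template: str, question: str) -> str:
       # Stand-in for a model call; just renders the template.
       prompt = template.format(question=question)
       print(prompt)
       return prompt

   @dsl.pipeline
   def prompt_experiment(template: str = 'Answer concisely: {question}'):
       score_prompt(template=template, question='What is KFP?')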
Evaluation & Monitoring
-----------------------
Build pipelines to evaluate and monitor model outputs:

- Compare generations against reference outputs using BLEU, ROUGE, or custom metrics (sketched below).
- Integrate human-in-the-loop review and scoring.
- Run periodic evaluation pipelines to detect degradation or drift in output quality.
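A metric component can wrap an off-the-shelf scorer. This sketch assumes the ``rouge-score`` package:

.. code-block:: python

   from kfp import dsl

   @dsl.component(packages_to_install=['rouge-score'])
   def rouge_l(generated: str, reference: str) -> float:
       from rouge_score import rouge_scorer

       # ROUGE-L F-measure between a generation and its reference.
       scorer = rouge_scorer.RougeScorer(['rougeL'])
       return scorer.score(reference, generated)['rougeL'].fmeasure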
Inference & Deployment
----------------------
Turn generative models into production services with reproducible deployment steps:

- Package and deploy models as containerized services using KServe or custom backends.
- Use CI/CD pipelines to roll out new versions with A/B testing or canary releases.
- Scale endpoints dynamically based on request volume and latency metrics.
Multimodal Generative Workflows
-------------------------------
Design rich pipelines that support multiple input/output modalities:

- Combine text, image, and audio generation into a unified DAG.
- Orchestrate complex workflows involving model chaining and data routing.
- Use custom components to process modality-specific inputs and outputs.


See Also
--------
- :doc:`dsl`
- :doc:`components`
- :doc:`compiler`

docs/source/installation.rst

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
.. _open-source-deployment:

Deploying Kubeflow Pipelines
============================

As an alternative to deploying Kubeflow Pipelines (KFP) as part of the
`Kubeflow deployment <https://www.kubeflow.org/docs/started/installing-kubeflow/>`_,
you also have the option to deploy only Kubeflow Pipelines.

Follow the instructions below to deploy Kubeflow Pipelines standalone using the supplied Kustomize manifests.

You should be familiar with the following tools:

- `Kubernetes <https://kubernetes.io/docs/home/>`_
- `kubectl <https://kubernetes.io/docs/reference/kubectl/overview/>`_
- `kustomize <https://kustomize.io/>`_

Deployment steps
----------------

1. Deploy Kubeflow Pipelines:

   .. code-block:: bash

      export PIPELINE_VERSION={{% pipelines/latest-version %}}
      kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
      kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
      kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"

   The Kubeflow Pipelines deployment takes approximately 3 minutes to complete. During this time, it is normal for pods to crash in the ``kubeflow`` namespace until the deployment completes.
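   To watch the rollout while you wait, you can list the pods in the namespace (an optional check):

   .. code-block:: bash

      kubectl get pods -n kubeflow --watch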
2. Port-forward the Kubeflow Pipelines UI:

   .. code-block:: bash

      kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

3. Open the following URL in your browser to access the UI:

   `http://localhost:8080 <http://localhost:8080>`_

docs/source/kfp.rst

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
+ .. _kfp-python-sdk:
+
  API Reference
  ==========================

docs/source/overview.rst

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
Overview
========

What is Kubeflow Pipelines?
----------------------------

Kubeflow Pipelines (KFP) is a platform for building and deploying portable and scalable machine learning (ML) workflows using containers on Kubernetes-based systems.
With KFP you can author :ref:`components <what-is-a-component>` and :ref:`pipelines <what-is-a-pipeline>` using the :ref:`KFP Python SDK <kfp-python-sdk>`, compile pipelines
to an :ref:`intermediate representation YAML <what-is-a-compiled-pipeline>`, and submit the pipeline to run on a KFP-conformant backend such as the :ref:`open source KFP backend <open-source-deployment>`, `Google Cloud Vertex AI Pipelines <https://cloud.google.com/vertex-ai/docs/pipelines/introduction>`_, or KFP local.

The open source KFP backend is available as a core component of Kubeflow or as a standalone installation.
Why Kubeflow Pipelines?
-----------------------

KFP enables data scientists and machine learning engineers to:

* Author end-to-end ML workflows natively in Python
* Create fully custom ML components or leverage an ecosystem of existing components
* Easily pass parameters and ML artifacts between pipeline components
* Easily manage, track, and visualize pipeline definitions, runs, experiments, and ML artifacts
* Efficiently use compute resources through parallel task execution and through caching to eliminate redundant executions
* Keep experimentation and iteration light and Python-centric, minimizing the need to (re)build and maintain containers
* Maintain cross-platform pipeline portability through a platform-neutral IR YAML pipeline definition
* Abstract Kubernetes complexity while running pipelines on your organization's existing infrastructure investments (on-prem, cloud, or hybrid)
.. _what-is-a-pipeline:

What is a pipeline?
-------------------

A `pipeline` is a definition of a workflow that composes one or more `components` together to form a computational directed acyclic graph (DAG). At runtime, each component execution corresponds to a single container execution, which may create ML artifacts. Pipelines may also feature `control flow`.
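For illustration, here is a minimal sketch of a pipeline whose second task runs conditionally. The component bodies are toy examples, and ``dsl.If`` assumes a recent KFP SDK:

.. code-block:: python

   from kfp import dsl

   @dsl.component
   def flip_coin() -> str:
       import random
       return random.choice(['heads', 'tails'])

   @dsl.component
   def announce(result: str):
       print(f'Got {result}!')

   @dsl.pipeline
   def coin_pipeline():
       flip_task = flip_coin()
       # Control flow: only announce when the coin lands heads.
       with dsl.If(flip_task.output == 'heads'):
           announce(result=flip_task.output)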
.. _what-is-a-component:

What is a component?
--------------------
Components are the building blocks of KFP pipelines. A component is a remote function definition; it specifies inputs, has user-defined logic in its body, and can create outputs. When the component template is instantiated with input parameters, we call it a task.

KFP provides two high-level ways to author components: Python Components and Container Components.

Python Components are a convenient way to author components implemented in pure Python. There are two specific types of Python Components: Lightweight Python Components and Containerized Python Components.

Container Components expose a more flexible, advanced authoring approach by allowing you to define a component using an arbitrary container definition. This is the recommended approach for components that are not implemented in pure Python.
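For instance, a Container Component that wraps a shell command looks roughly like this (a minimal sketch; the ``alpine`` image is an illustrative choice):

.. code-block:: python

   from kfp import dsl

   @dsl.container_component
   def say_hello_container():
       # The component is defined entirely by its container spec.
       return dsl.ContainerSpec(
           image='alpine',
           command=['echo'],
           args=['Hello from a Container Component'])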
Importer Components are a special "pre-baked" component provided by KFP which allows you to import an artifact into your pipeline when that artifact was not created by tasks within the pipeline.
.. _what-is-a-compiled-pipeline:

What is a compiled pipeline?
----------------------------
When you compile a pipeline or component, the result is an intermediate representation (IR) of it, serialized as YAML. The IR YAML is not intended to be written directly.

While IR YAML is not intended to be easily human-readable, you can still inspect it if you know a bit about its contents. You produce it with the compiler:
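A minimal sketch of compiling a toy pipeline to IR YAML:

.. code-block:: python

   from kfp import compiler, dsl

   @dsl.component
   def say_hello() -> str:
       return 'hello'

   @dsl.pipeline
   def hello_pipeline() -> str:
       return say_hello().output

   # Writes the intermediate representation to pipeline.yaml.
   compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')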
.. _pipelines: #what-is-a-pipeline
.. _components: #what-is-a-component
.. _compiled-pipeline: #what-is-a-compiled-pipeline

docs/source/quickstart.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
Quickstart
==========

This guide shows how to create, compile, and run a simple pipeline with Kubeflow Pipelines (KFP).

Prerequisites
-------------
- Python 3.9+
- A running Kubeflow Pipelines deployment (local or remote).

Installation
------------
Install the Kubeflow Pipelines SDK:

.. code-block:: bash

   pip install kfp
Local Initialization
--------------------
Use the ``SubprocessRunner`` for local execution without Docker:

.. code-block:: python

   from kfp import local

   local.init(runner=local.SubprocessRunner())
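If a local Docker daemon is available, you can instead run components in containers for stronger isolation (this assumes Docker is installed and running):

.. code-block:: python

   local.init(runner=local.DockerRunner())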
Writing a Simple Component
--------------------------

Define a lightweight component using the ``@dsl.component`` decorator:

.. code-block:: python

   from kfp import dsl

   @dsl.component
   def say_hello(name: str) -> str:
       message = f"Hello, {name}!"
       print(message)
       return message

You can run this component directly like a Python function:

.. code-block:: python

   task = say_hello(name="World")
   assert task.output == "Hello, World!"
Writing and Running a Pipeline
------------------------------

Define a pipeline using the ``@dsl.pipeline`` decorator:

.. code-block:: python

   @dsl.pipeline
   def hello_pipeline(recipient: str) -> str:
       hello_task = say_hello(name=recipient)
       return hello_task.output

Run the pipeline locally as a regular function:

.. code-block:: python

   pipeline_task = hello_pipeline(recipient="Local Dev")
   assert pipeline_task.output == "Hello, Local Dev!"

The ``@dsl.component`` and ``@dsl.pipeline`` decorators turn type-annotated Python functions into reusable pipeline components and workflows.
Working with Artifacts
----------------------

You can also write artifacts to disk and read them locally:

.. code-block:: python

   import json

   from kfp.dsl import Artifact, Output

   @dsl.component
   def add(a: int, b: int, out_artifact: Output[Artifact]):
       # Modules used inside a lightweight component must be
       # imported within its body.
       import json

       result = a + b
       with open(out_artifact.path, 'w') as f:
           f.write(json.dumps(result))
       out_artifact.metadata['operation'] = 'addition'

   task = add(a=1, b=2)

   with open(task.outputs['out_artifact'].path) as f:
       result = json.loads(f.read())

   assert result == 3
   assert task.outputs['out_artifact'].metadata['operation'] == 'addition'
Running the pipeline
--------------------
You can run the pipeline locally with Python:

.. code-block:: bash

   python my_pipeline.py
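To run against a full KFP backend instead, compile the pipeline to IR YAML and submit it with the client. This is a minimal sketch, assuming the backend's UI is port-forwarded to ``http://localhost:8080`` as in the installation guide:

.. code-block:: python

   import kfp
   from kfp import compiler

   # Compile the pipeline defined above to IR YAML.
   compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')

   # Adjust the host for your cluster.
   client = kfp.Client(host='http://localhost:8080')
   client.create_run_from_pipeline_package(
       'pipeline.yaml', arguments={'recipient': 'KFP'})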
Next steps
----------
- Explore the DSL: :doc:`dsl`
- Learn about Components: :doc:`components`
- See the CLI reference: :doc:`cli`
