Welcome! This guide covers everything you need to know to contribute components and pipelines to this repository.
- Prerequisites
- Quick Setup
- What We Accept
- Component Structure
- Naming Conventions
- Development Workflow
- Testing and Quality
- Adding a Custom Base Image
- Submitting Your Contribution
- Getting Help
Before contributing, ensure you have the following tools installed:
- Python 3.11+ for component development
- uv (installation guide) to manage Python dependencies, including the `kfp` and `kfp-kubernetes` packages
- pre-commit (installation guide) for automated code quality checks
- Docker or Podman to build container images for custom components
- kubectl (installation guide) for Kubernetes operations
All contributors must follow the Kubeflow Community Code of Conduct.
This project uses uv for fast Python package management.
Follow the installation instructions at: https://docs.astral.sh/uv/getting-started/installation/
Verify installation:
uv --version

Get your development environment ready with these commands:
# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/pipelines-components.git
cd pipelines-components
git remote add upstream https://github.com/kubeflow/pipelines-components.git
# Set up Python environment
uv venv
source .venv/bin/activate
uv sync # Installs package in editable mode
uv sync --extra dev # Include dev dependencies if defined
pre-commit install
# Verify your setup works
pytest

uv build

After building, you can install and test the wheel locally:
# Install the built wheel
uv pip install dist/kfp_components-*.whl
# Test imports work correctly
python -c "from kfp_components import components, pipelines; print('Core package imports OK')"

This repository uses uv with a committed lockfile:
- Dependency definitions live in `pyproject.toml`
- The resolved dependency graph lives in `uv.lock`
Prefer leaving dependency versions unpinned/unrestricted in pyproject.toml unless you have a concrete reason
(e.g., known incompatibility, security issue, or a required feature/behavior). If you restrict a dependency, add a
short comment explaining why (and link an issue if applicable). Use uv.lock to lock the resolved versions for
reproducible local development and CI.
If you change dependencies (e.g., edit pyproject.toml), update the lockfile and ensure it is in sync:
uv lock
uv lock --check

CI also verifies that uv.lock is in sync (see .github/workflows/python-lint.yml).
Before opening a PR, run pre-commit locally so you catch formatting/lint/validation issues early:
pre-commit run

We welcome contributions of production-ready ML components and reusable pipelines:
- Components are individual ML tasks (data processing, training, evaluation, deployment)
- Pipelines are complete multi-step workflows that can be nested within other pipelines
- Bug fixes improve existing components or fix documentation issues
Components must be organized by category under components/<category>/.
Pipelines must be organized by category under pipelines/<category>/.
For better organization of related components or pipelines, you can create subcategories within a category. Subcategories provide:
- Logical grouping of related assets (e.g., all sklearn-based trainers, related ML workflows)
- Dedicated ownership via subcategory-level OWNERS file
- Shared utilities via an optional `shared/` package
Component subcategory structure:
components/<category>/<subcategory>/
├── __init__.py # Subcategory package
├── OWNERS # Subcategory maintainers
├── README.md # Auto-generated subcategory index (lists components)
├── shared/ # Optional shared utilities package
│ ├── __init__.py
│ └── training_utils.py # Common code for components in this subcategory
└── <component_name>/ # Individual component
├── __init__.py
├── component.py
├── metadata.yaml
├── OWNERS
├── README.md # Auto-generated from metadata.yaml
└── tests/
Pipeline subcategory structure:
pipelines/<category>/<subcategory>/
├── __init__.py # Subcategory package
├── OWNERS # Subcategory maintainers
├── README.md # Auto-generated subcategory index (lists pipelines)
├── shared/ # Optional shared utilities package
│ ├── __init__.py
│ └── workflow_utils.py # Common code for pipelines in this subcategory
└── <pipeline_name>/ # Individual pipeline
├── __init__.py
├── pipeline.py
├── metadata.yaml
├── OWNERS
├── README.md # Auto-generated from metadata.yaml
└── tests/
- Component and pipeline names use `snake_case` (e.g., `data_preprocessing`, `model_trainer`)
- Commit messages follow the Conventional Commits format with a type prefix (feat, fix, docs, etc.)
Every component must include these files in its directory:
components/<category>/<component_name>/
├── __init__.py # Exposes the component function for imports
├── component.py # Main implementation
├── metadata.yaml # Complete specification (see schema below)
├── README.md # Overview, inputs/outputs, usage examples, development instructions
├── OWNERS # Maintainers (approvers must be Kubeflow community members)
├── Containerfile # Container definition (optional; required only when using a custom image)
├── example_pipelines.py # Working usage examples (optional)
└── tests/
    ├── test_component.py  # Unit tests (optional)
    └── <supporting_files>
Similarly, every pipeline must include these files:
pipelines/<category>/<pipeline_name>/
├── __init__.py # Exposes the pipeline function for imports
├── pipeline.py # Main implementation
├── metadata.yaml # Complete specification (see schema below)
├── README.md # Overview, inputs/outputs, usage examples, development instructions
├── OWNERS # Maintainers (approvers must be Kubeflow community members)
├── example_pipelines.py # Working usage examples (optional)
└── tests/
    ├── test_pipeline.py   # Unit tests (optional)
    └── <supporting_files>
Note: When using subcategories, the same required files apply at
components/<category>/<subcategory>/<component_name>/ or pipelines/<category>/<subcategory>/<pipeline_name>/.
Your metadata.yaml must include these fields:
name: my_component
stability: stable          # 'experimental', 'alpha', 'beta', or 'stable'
dependencies:
  kubeflow:
    - name: Pipelines
      version: '>=2.5'
external_services:         # Optional list of external dependencies
  - name: Argo Workflows
    version: "3.6"
tags:                      # Optional keywords for discoverability
  - training
  - evaluation
lastVerified: 2025-11-18T00:00:00Z  # Updated annually; components are removed after 12 months without update
ci:
  skip_dependency_probe: false      # Optional. Set true only with justification
links:                     # Optional; custom key-value pairs are allowed (not limited to documentation, issue_tracker)
  documentation: https://kubeflow.org/components/my_component
  issue_tracker: https://github.com/kubeflow/pipelines-components/issues

The OWNERS file enables component owners to self-service maintenance tasks, including approvals, metadata updates, and lifecycle management:
approvers:
  - maintainer1  # Approvers must be Kubeflow community members
  - maintainer2
reviewers:
  - reviewer1

The OWNERS file enables code review automation by leveraging Prow commands:
- Reviewers (as well as approvers) who find a PR good to merge can comment `/lgtm`, which applies the `lgtm` label to the PR
- Approvers (but not reviewers) can comment `/approve`, which marks the PR as approved for automation to merge into the repo
- Once a PR carries both the `lgtm` and `approve` labels and all required CI checks are passing, Prow merges the PR into the destination branch
See full Prow documentation for usage details.
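Circling back to the metadata.yaml schema above: once the file is parsed (e.g., with a YAML library), the required fields can be sanity-checked along these lines. This is purely illustrative — the repository's actual validation lives in scripts/validate_metadata.py:

```python
REQUIRED_FIELDS = {"name", "stability", "dependencies", "lastVerified"}
VALID_STABILITY = {"experimental", "alpha", "beta", "stable"}


def metadata_problems(meta: dict) -> list[str]:
    """Return a list of schema problems found in a parsed metadata.yaml mapping."""
    problems = []
    missing = sorted(REQUIRED_FIELDS - meta.keys())
    if missing:
        problems.append(f"missing required fields: {missing}")
    if "stability" in meta and meta["stability"] not in VALID_STABILITY:
        problems.append(f"invalid stability: {meta['stability']!r}")
    return problems


example = {
    "name": "my_component",
    "stability": "stable",
    "dependencies": {"kubeflow": [{"name": "Pipelines", "version": ">=2.5"}]},
    "lastVerified": "2025-11-18T00:00:00Z",
}
print(metadata_problems(example))  # An empty list means the basic checks pass
```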
This repository follows a branch naming convention aligned with Kubeflow Pipelines:
| Branch | Purpose | Base Image Tag |
|---|---|---|
| `main` | Active development | `:main` |
| `release-<major>.<minor>` | Release maintenance (e.g., `release-1.11`) | `:v<major>.<minor>.<z-stream>` |
Release branches are created for each minor version release:
- Naming: `release-<major>.<minor>` (e.g., `release-1.11`, `release-2.0`)
- Purpose: Maintain stable releases and backport critical fixes
- Base images: Components on release branches should reference the appropriate release tag (e.g., `:v1.11.0`, `:v1.11.1`, ...)
When working on a release branch:
# For release-1.11, components should use the appropriate patch tag:
@dsl.component(base_image="ghcr.io/kubeflow/pipelines-components-example:v1.11.0")

In addition to the initial x.y.0 release for a given release-x.y branch, we may cut one or more patch (z-stream) releases (x.y.1, x.y.2, ...).
Typical characteristics:
- Contents: Backported bug fixes, security fixes, dependency/base-image updates, and other low-risk changes needed to keep the release usable.
- Triggers: Critical regressions, CVEs, or other issues that require updates on a maintained `release-x.y` branch
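The branch-to-base-image-tag mapping above can be sketched as a small helper. This is illustrative only — tag selection in this repository is a documentation convention, not code:

```python
import re

# Branches of the form release-<major>.<minor>, e.g. "release-1.11"
RELEASE_BRANCH = re.compile(r"^release-(\d+)\.(\d+)$")


def base_image_tag(branch: str, z_stream: int = 0) -> str:
    """Map a branch name to the base-image tag its components should use."""
    match = RELEASE_BRANCH.match(branch)
    if match is None:
        return ":main"  # main and feature branches track the latest main build
    major, minor = match.groups()
    return f":v{major}.{minor}.{z_stream}"
```

For example, components on `release-1.11` after the first z-stream release would use `base_image_tag("release-1.11", z_stream=1)`, i.e. `:v1.11.1`.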
Start by syncing with upstream and creating a feature branch:
git remote add upstream https://github.com/kubeflow/pipelines-components.git # if not already set
git fetch upstream
git checkout -b component/my-component upstream/main

You can create components and pipelines using either the automated approach (recommended) or manually. Both approaches are detailed below.
For rapid development, this repository provides convenient make commands that automate the entire development process.
📋 Overview of Make Commands
The following make targets simplify the development workflow:
| Command | Description |
|---|---|
| `make component CATEGORY=<cat> NAME=<name>` | Create a new component skeleton |
| `make pipeline CATEGORY=<cat> NAME=<name>` | Create a new pipeline skeleton |
| `make tests TYPE=<type> CATEGORY=<cat> NAME=<name>` | Add tests to an existing component/pipeline |
| `make readme TYPE=<type> CATEGORY=<cat> NAME=<name>` | Generate/update README from code |
| `make format` | Auto-fix code formatting and linting issues |
| `make lint` | Check code quality (formatting, linting, imports) |
| `make sync-packages` | Sync package entries in pyproject.toml with discovered packages |
Note: `make component` and `make pipeline` automatically run `make sync-packages`, so `pyproject.toml` may be updated after generating a new skeleton.
Optional flags (append to component/pipeline commands):
- `SUBCATEGORY=<sub>` - Create the asset in a subcategory
- `NO_TESTS=true` - Skip test file generation
- `CREATE_SHARED=true` - Create a shared utilities package (requires SUBCATEGORY)
Create a component with tests (recommended):
make component CATEGORY=data_processing NAME=my_data_processor

Create a pipeline with tests:
make pipeline CATEGORY=training NAME=my_training_pipeline

Create without tests (for rapid prototyping):
make component CATEGORY=data_processing NAME=my_prototype NO_TESTS=true
make pipeline CATEGORY=training NAME=my_prototype NO_TESTS=true

This generates the complete directory structure:
components/data_processing/my_data_processor/
├── __init__.py # Import configuration
├── component.py # Implementation template with TODOs
├── metadata.yaml # Pre-configured metadata
├── README.md # Documentation template
├── OWNERS # Maintainer template
└── tests/ # Test directory (if not using NO_TESTS)
├── __init__.py
├── test_component_unit.py # Unit test template
└── test_component_local.py # Integration test template
Create a component within a subcategory:
# Create component in a subcategory (subcategory files created automatically)
make component CATEGORY=training SUBCATEGORY=sklearn_trainer NAME=logistic_regression
# Create component in subcategory with shared utilities package
make component CATEGORY=training SUBCATEGORY=sklearn_trainer NAME=random_forest CREATE_SHARED=true

This generates a nested structure:
components/training/sklearn_trainer/
├── __init__.py # Subcategory package
├── OWNERS # Subcategory maintainers
├── README.md # Subcategory documentation
├── shared/ # (if CREATE_SHARED=true) Shared utilities
│ ├── __init__.py
│ └── sklearn_trainer_utils.py # Placeholder utility file
└── logistic_regression/ # Your component
├── __init__.py
├── component.py
├── metadata.yaml
├── OWNERS
├── README.md
└── tests/
├── __init__.py
├── test_component_local.py
└── test_component_unit.py
Create a pipeline within a subcategory:
# Create pipeline in a subcategory (subcategory files created automatically)
make pipeline CATEGORY=training SUBCATEGORY=ml_workflows NAME=batch_training
# Create pipeline in subcategory with shared utilities package
make pipeline CATEGORY=training SUBCATEGORY=ml_workflows NAME=batch_training CREATE_SHARED=true

This generates a nested structure:
pipelines/training/ml_workflows/
├── __init__.py # Subcategory package
├── OWNERS # Subcategory maintainers
├── README.md # Subcategory documentation
├── shared/ # (if CREATE_SHARED=true) Shared utilities
│ ├── __init__.py
│ └── ml_workflows_utils.py # Placeholder utility file
└── batch_training/ # Your pipeline
├── __init__.py
├── pipeline.py
├── metadata.yaml
├── OWNERS
├── README.md
└── tests/
├── __init__.py
├── test_pipeline_local.py
└── test_pipeline_unit.py
🔧 Alternative: Manual Creation
If you prefer to create components manually or need more control over the structure, you can create your component following the directory structure above. Here's a basic template:
# component.py
from kfp import dsl


@dsl.component(base_image="python:3.11")
def hello_world(name: str = "World") -> str:
    """A simple hello world component.

    Args:
        name: The name to greet. Defaults to "World".

    Returns:
        A greeting message.
    """
    message = f"Hello, {name}!"
    print(message)
    return message

Write comprehensive tests for your component:
# tests/test_component.py
from ..component import hello_world


def test_hello_world_default():
    """Test hello_world with default parameter."""
    # Access the underlying Python function from the component
    result = hello_world.python_func()
    assert result == "Hello, World!"


def test_hello_world_custom_name():
    """Test hello_world with custom name."""
    result = hello_world.python_func(name="Kubeflow")
    assert result == "Hello, Kubeflow!"

Edit the generated component.py or pipeline.py file to replace TODO placeholders with your actual implementation. The skeleton includes:
- Proper imports and decorators
- Parameter and return type hints
- Docstring templates
- Compilation logic
make tests TYPE=component CATEGORY=data_processing NAME=my_data_processor
make tests TYPE=pipeline CATEGORY=training NAME=my_training_pipeline

After implementing your logic, generate comprehensive README documentation using the existing README generation utility:
make readme TYPE=component CATEGORY=data_processing NAME=my_data_processor
make readme TYPE=pipeline CATEGORY=training NAME=my_training_pipeline

This automatically:
- Extracts parameters and return types from your code
- Parses docstrings for descriptions
- Generates usage examples
- Creates standardized documentation sections
Alternatively, you can create documentation manually following the standardized README.md format required by this repository. See the README Generator Script Documentation for details on the expected structure.
Auto-fix formatting and linting issues:
make format

Check code quality before committing:
make lint

Run tests:
# Run your component/pipeline tests
pytest components/data_processing/my_data_processor/tests/ -v
pytest pipelines/training/my_training_pipeline/tests/ -v
# Or run all tests
pytest

Run the complete pre-commit validation:
pre-commit run

This ensures your contribution meets all quality standards before submission.
Follow the Submitting Your Contribution section below to commit your changes and create a pull request.
Here's a complete example creating a data processing component:
# 1. Create feature branch
git checkout -b component/csv-cleaner upstream/main
# 2. Create component skeleton
make component CATEGORY=data_processing NAME=csv_cleaner
# 3. Edit components/data_processing/csv_cleaner/component.py
# (Implement your logic, replace TODOs)
# 4. Generate documentation
make readme TYPE=component CATEGORY=data_processing NAME=csv_cleaner
# 5. Format and validate
make format
make lint
# 6. Run tests
pytest components/data_processing/csv_cleaner/tests/ -v
# 7. Final validation
pre-commit run
# 8. Commit and submit PR
git add .
git commit -m "feat(data_processing): add csv_cleaner component"
git push origin component/csv-cleaner

This workflow typically takes just a few minutes to set up the complete component structure with documentation and tests.
When creating related components that share ownership or utilities:
# 1. Create component in subcategory
make component CATEGORY=training SUBCATEGORY=sklearn_trainer NAME=logistic_regression
# 2. Edit the component and subcategory files:
# - components/training/sklearn_trainer/logistic_regression/component.py (your logic)
# - components/training/sklearn_trainer/OWNERS (subcategory maintainers)
# 3. Generate documentation
make readme TYPE=component CATEGORY=training SUBCATEGORY=sklearn_trainer NAME=logistic_regression
# 4. Run tests
pytest components/training/sklearn_trainer/logistic_regression/tests/ -v
# 5. Format, lint, and submit
make format
make lint
pre-commit run
git add .
git commit -m "feat(training): add logistic_regression component in sklearn_trainer subcategory"

Run these commands from your component/pipeline directory before submitting your contribution:
# Run all unit tests with coverage reporting
pytest --cov=src --cov-report=html
# Run specific test files when debugging
pytest tests/test_my_component.py -v

Ensure your code meets quality standards:
# Format and lint with ruff
uv run ruff format --check . # Check formatting (120 char line length)
uv run ruff check . # Check linting, docstrings, and import order
# Or use make commands for convenience
make lint # Run all linting checks
make format # Auto-format and auto-fix issues
# Validate import guard (enforces stdlib-only top-level imports)
uv run .github/scripts/check_imports/check_imports.py \
--config .github/scripts/check_imports/import_exceptions.yaml \
components pipelines
# Validate YAML files
uv run yamllint -c .yamllint.yml .
# Validate Markdown files
markdownlint -c .markdownlint.json **/*.md
# Validate metadata schema
python scripts/validate_metadata.py
# Run all pre-commit hooks
pre-commit run

All components and pipelines must use approved base images. The validation script compiles components
using kfp.compiler to extract the actual runtime images, which correctly handles:
- Variable references (`base_image=MY_IMAGE`)
- `functools.partial` wrappers
- Default image resolution
Valid base images:
- Images starting with `ghcr.io/kubeflow/` (the Kubeflow official registry)
- Standard Python images (`python:<version>`, e.g., `python:3.11`, `python:3.11-slim`)
Run the validation locally:
# Run with default settings
uv run scripts/validate_base_images/validate_base_images.py

The script allows any standard Python image matching python:<version> (e.g., python:3.11, python:3.10-slim) in addition to Kubeflow registry images.
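The allow rule can be expressed roughly as follows. This is an approximation for illustration only — the authoritative logic lives in scripts/validate_base_images/validate_base_images.py:

```python
import re

KUBEFLOW_PREFIX = "ghcr.io/kubeflow/"
# python:<version> with an optional variant suffix such as -slim
PYTHON_IMAGE = re.compile(r"^python:\d+(\.\d+){0,2}(-[A-Za-z0-9.]+)?$")


def is_allowed_base_image(image: str) -> bool:
    """True if an image matches the repository's base-image allow list."""
    return image.startswith(KUBEFLOW_PREFIX) or PYTHON_IMAGE.match(image) is not None
```

So `python:3.11-slim` and `ghcr.io/kubeflow/pipelines-components-example:main` pass, while an arbitrary registry image does not.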
Every component and pipeline that sets ci.compile_check: true in its metadata.yaml must compile
successfully and declare well-formed dependency metadata. The compile-check CLI discovers
metadata-backed assets, validates their dependencies block, and compiles the exposed
@dsl.component/@dsl.pipeline functions.
Run it locally with:
# Run against all metadata-backed targets
uv run python -m scripts.compile_check.compile_check
# Limit to one directory (can be repeated)
uv run python -m scripts.compile_check.compile_check \
--path components/training/my_component

The script exits non-zero if any dependency metadata is malformed or if compilation fails, matching
the behaviour enforced by CI (.github/workflows/compile-and-deps.yml).
Import Guard: This repository enforces that top-level imports must be limited to Python's
standard library. Heavy dependencies (like kfp, pandas, etc.) should be imported within
function/pipeline bodies. Exceptions can be added to
.github/scripts/check_imports/import_exceptions.yaml when justified (e.g., for test files
importing pytest).
Note: kfp is allowlisted at module scope; kfp_components is allowlisted at module scope for pipelines/**.
Common error: imports non-stdlib module '<module>' at top level
This often happens in modules under components/ or pipelines/.
Keep top-level imports to a bare minimum for compilation, and place imports needed at runtime inside pipeline/component bodies.
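In practice the deferred-import pattern looks like this. The `pandas` import and the function itself are illustrative — any non-stdlib dependency is handled the same way:

```python
# Top level: standard library only, so the import guard passes.
import json
from pathlib import Path


def summarize_csv(data_path: str) -> str:
    """Count data rows in a CSV file and return a JSON summary."""
    # Heavy third-party imports belong inside the function body, so this
    # module can be imported (and the component compiled) without pandas
    # being installed.
    import pandas as pd  # deferred, guard-compliant import

    frame = pd.read_csv(data_path)
    return json.dumps({"file": Path(data_path).name, "rows": len(frame)})
```

Importing this module never triggers the `pandas` import; it only runs when the function body executes at runtime.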
Scripts tests (relative imports): For tests under scripts/**/tests/ and .github/scripts/**/tests/, use relative
imports from the parent module so imports work consistently in both IDEs and pytest. Canonical guidance:
scripts/README.md (Import Conventions).
This section explains how to write comprehensive tests for your components, using the yoda_data_processor component as a reference example.
Unit Tests test your component's core logic in isolation:
- Use mocking to avoid external dependencies
- Test the component's Python function directly via `.python_func()`
- Fast execution, no external resources required
- Located in `tests/test_component_unit.py`
Local Runner Tests test your component in a real execution environment:
- Execute the component using KFP's LocalRunner
- Test actual component behavior end-to-end
- Located in `tests/test_component_local.py`
Create a tests/ directory in your component folder with the following structure:
components/<category>/<component_name>/tests/
├── __init__.py
├── test_component_unit.py # Unit tests with mocking
└── test_component_local.py # LocalRunner integration tests
Unit tests should verify your component's logic without external dependencies. Here's the pattern used in yoda_data_processor:
# tests/test_component_unit.py
from unittest import mock

from ..component import your_component_function


class TestYourComponentUnitTests:
    """Unit tests for component logic."""

    def test_component_function_exists(self):
        """Test that the component function is properly imported."""
        assert callable(your_component_function)
        assert hasattr(your_component_function, "python_func")

    @mock.patch("external_library.some_function")
    def test_component_with_mocked_dependencies(self, mock_function):
        """Test component behavior with mocked external calls."""
        # Set up mocks
        mock_function.return_value = "expected_result"

        # Create mock output objects
        mock_output = mock.MagicMock()
        mock_output.path = "/tmp/test_output"

        # Call the component's Python function directly
        your_component_function.python_func(
            input_param="test_value",
            output_artifact=mock_output,
        )

        # Verify expected interactions
        mock_function.assert_called_once_with("test_value")

Key patterns for unit tests:
- Use `@mock.patch` to mock external dependencies
- Call `your_component.python_func()` to test the underlying Python function
- Mock output artifacts with `.path` attributes pointing to test paths
- Verify function calls and parameter passing
Local Runner tests execute your component in a real KFP environment. Use the provided fixtures:
# tests/test_component_local.py
from ..component import your_component_function


class TestYourComponentLocalRunner:
    """Test component with LocalRunner (subprocess execution)."""

    def test_local_execution(self, setup_and_teardown_subprocess_runner):
        """Test component execution with LocalRunner."""
        # Execute the component with real parameters
        your_component_function(
            input_param="real_value",
            # Output artifacts are handled automatically by LocalRunner
        )

        # Add assertions about expected outputs if needed
        # (files created, logs generated, etc.)

Important notes for Local Runner tests:
- The `setup_and_teardown_subprocess_runner` fixture is automatically available (no import required)
- Use the fixture as a test method parameter: `def test_local_execution(self, setup_and_teardown_subprocess_runner)`
- The fixture handles LocalRunner setup, workspace creation, and cleanup
- Resource Requirements: Ensure your test environment has sufficient CPU, memory, and disk space to execute the component's actual workload
- Tests may download data, install packages, or perform computationally intensive operations
The repository provides test infrastructure through a global conftest.py file at the project root:
Global Test Configuration (conftest.py):
- Session Setup Hook: Uses `pytest_sessionstart` to configure the test environment before any tests run
- Path Management: Automatically adds the project root to `sys.path` for clean imports during testing
- LocalRunner Fixture: `setup_and_teardown_subprocess_runner` (module-scoped)
  - Creates isolated workspace and output directories (`./test_workspace_subprocess`, `./test_pipeline_outputs_subprocess`)
  - Configures KFP LocalRunner with subprocess execution (no virtual environment)
  - Enables `raise_on_error=True` for immediate test failure on component errors
  - Automatically cleans up test artifacts after each test module completes

Pytest Configuration (pyproject.toml):
- Test Discovery: Configured to find tests in `components/*/tests` and `pipelines/*/tests` directories
- Import Mode: Uses `--import-mode=importlib` for better import handling
- Automatic Detection: Automatically discovers component and pipeline tests without manual configuration
If Ruff complains about pytest fixture imports, you may encounter two types of errors:
F401 (unused import) - If Ruff removes imports that are only used as pytest fixture parameters:
from tests.utils.fixtures import setup_and_teardown_subprocess_runner  # noqa: F401

F811 (redefinition) - If Ruff thinks the fixture parameter redefines the imported name:
def test_local_execution(self, setup_and_teardown_subprocess_runner):  # noqa: F811

These comments tell Ruff that the import and parameter usage are intentional pytest fixture patterns.
From your component directory, run:
# Run all tests
pytest tests/
# Run only unit tests (fast)
pytest tests/test_component_unit.py -v
# Run only local runner tests (slower, requires resources)
pytest tests/test_component_local.py -v
# Run with coverage reporting
pytest tests/ --cov=. --cov-report=html

- Unit tests: Should have high coverage of your component's logic
- Local runner tests: Should verify end-to-end component execution
- Resource considerations: Local runner tests require adequate system resources for your component's workload
- Dependencies: Mock external services in unit tests; use real dependencies in local runner tests
- Cleanup: Use provided fixtures to ensure proper test environment cleanup
If your component uses a custom image, test the container build:
# Build your component image
docker build -t my-component:test components/<category>/my-component/
# Test the container runs correctly
docker run --rm my-component:test echo "Hello, world!"

GitHub Actions automatically runs these checks on every pull request:
- Python linting: Code formatting, style checks, docstring validation, and import sorting
- Import guard: Validates that top-level imports are limited to Python's standard library
- YAML linting: Validates YAML file syntax and style (yamllint)
- Markdown linting: Validates Markdown formatting and style (markdownlint)
- Unit and integration tests with coverage reporting
- Container image builds for components with Containerfiles
- Security vulnerability scans
- Metadata schema validation
- Standardized README content and formatting conformance
This repository uses Dependabot to keep:
- Python dependencies (including pinned direct dependencies in `pyproject.toml`) and `uv.lock` up to date
- GitHub Actions versions in workflow files up to date
Configuration lives in .github/dependabot.yml.
Components that require specific dependencies beyond what's available in standard KFP images can use custom base images. This section explains how to add and maintain custom base images for your components.
Custom base images are:
- Built automatically by CI on every push to `main` and on tags
- Published to `ghcr.io/kubeflow/pipelines-components-<name>`
- Tagged with `:main` for the latest main branch build, or `:v<version>` for releases
Create a Containerfile in your component's directory:
components/
└── training/
└── my_component/
├── Containerfile # Your custom base image
├── component.py
├── metadata.yaml
└── README.md
See examples/Containerfile for a complete example with recommended patterns
(labels, environment settings, non-root user, etc.).
Guidelines:
- Keep images minimal - only include dependencies your component needs
- Pin dependency versions for reproducibility
- Use official base images when possible
- Avoid including secrets or credentials
Edit .github/workflows/container-build.yml and add your image to the strategy.matrix.include
array in the build job:
strategy:
  fail-fast: false
  matrix:
    include:
      - name: example
        context: docs/examples
      # Add your new image:
      - name: my-training-image
        context: components/training/my_component

Matrix fields:
- `name`: Unique identifier for your image. The final image will be `ghcr.io/kubeflow/pipelines-components-<name>`.
- `context`: Build context directory containing your `Containerfile`.
Naming convention:
- Use lowercase with hyphens: `my-training-component`
- Be descriptive: `sklearn-preprocessing`, `pytorch-training`
- The full image path will be: `ghcr.io/kubeflow/pipelines-components-my-training-component`
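A quick way to check a matrix `name` and preview the resulting registry path — an illustrative helper, not part of the repository:

```python
import re

# Lowercase-with-hyphens, per the naming convention above.
MATRIX_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")


def full_image_path(name: str) -> str:
    """Registry path produced for a container-build matrix entry."""
    if MATRIX_NAME.match(name) is None:
        raise ValueError(f"matrix name must be lowercase-with-hyphens: {name!r}")
    return f"ghcr.io/kubeflow/pipelines-components-{name}"


print(full_image_path("my-training-component"))
# ghcr.io/kubeflow/pipelines-components-my-training-component
```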
In your component.py, use the base_image parameter with the :main tag:
from kfp import dsl


@dsl.component(
    base_image="ghcr.io/kubeflow/pipelines-components-my-training-image:main"
)
def my_component(input_path: str) -> str:
    import pandas as pd
    from sklearn import preprocessing

    # Your component logic here
    ...

Important: Always use the :main tag during development. This ensures:
- Your component uses the latest image from the main branch
- PR validation can override the tag to test against PR-built images
| Event | Behavior |
|---|---|
| Pull request | Images are built but not pushed. Validation uses locally-loaded `:<sha>` tags (full 40-character commit SHA). |
| Push to `main` | Images are built and pushed with tag `:main` |
| Push to a tag (e.g., `v1.0.0`) | Images are built and pushed with tag `:<tag>` |
Your image will be available with these tags:
| Tag | Description | Example |
|---|---|---|
| `:main` | Latest build from the main branch | `...-my-component:main` |
| `:<tag>` | Git tag | `...-my-component:v1.0.0` |
| `:<sha>` | PR validation tag (local only; full 40-character commit SHA; not pushed to the registry) | `...-my-component:3f5c8e2a9d4b7c1e0f6a3b9d8c2e4f1a6b7c3d9` |
Before submitting a PR, test your image locally:
Docker
# Build the image
docker build -t my-component:test -f components/training/my_component/Containerfile components/training/my_component
# Test it
docker run --rm my-component:test python -c "import pandas; print(pandas.__version__)"

Podman
# Build the image
podman build -t my-component:test -f components/training/my_component/Containerfile components/training/my_component
# Test it
podman run --rm my-component:test python -c "import pandas; print(pandas.__version__)"

Use descriptive commit messages following the Conventional Commits format:
git add .
git status # Review what you're committing
git diff --cached # Check the actual changes
git commit -m "feat(training): add <my_component> training component
- Implements <my_component> component
- Includes comprehensive unit tests with 95% coverage
- Provides working pipeline examples
- Resolves #123"

Push your changes and create a pull request on GitHub:
git push origin component/my-component

On GitHub, click "Compare & pull request" and fill out the provided PR template with appropriate details.
All PRs must pass:
- Automated checks (linting, tests, builds)
- Code review by maintainers and community members
- Documentation review
All pull requests must complete the following:
- All automated CI checks passing
- Code review - reviewers will verify the following:
  - Component works as described
  - Code is clean and well-documented
  - Included tests provide good coverage
- Approval from component OWNERS (for updates to existing components) or repository maintainers (for new components)
- Governance questions: See GOVERNANCE.md for ownership, verification, and process details
- Community discussion: Join the `#kubeflow-pipelines` channel on the CNCF Slack
- Bug reports and feature requests: Open an issue at GitHub Issues
This repository was established through KEP-913: Components Repository.
Thanks for contributing to Kubeflow! 🚀