
feat: sdg component#88

Open
beatsmonster wants to merge 52 commits into kubeflow:main from beatsmonster:main

Conversation


@beatsmonster beatsmonster commented Jan 29, 2026

Description of your changes:

This is a mirror of the design documentation for the SDG KFP component, shared to collect feedback.

Checklist:

Pre-Submission Checklist

Additional Checklist Items for New or Updated Components/Pipelines

  • metadata.yaml includes fresh lastVerified timestamp
  • All required files are present and complete
  • OWNERS file lists appropriate maintainers
  • README provides clear documentation with usage examples
  • Component follows snake_case naming convention
  • No security vulnerabilities in dependencies
  • Containerfile included if using a custom base image

@google-oss-prow google-oss-prow bot requested a review from HumairAK January 29, 2026 14:26
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Contributor

Copilot AI left a comment


Pull request overview

Adds an SDG Hub–backed Kubeflow Pipelines (KFP) component plus supporting documentation, examples, and tests, along with test fixtures for validating transform-only and LLM flows.

Changes:

  • Introduce sdg_hub KFP component implementation with flow selection, optional model configuration, metrics, and optional PVC export.
  • Add component documentation (README + architecture design) and example pipelines/local runner script.
  • Add unit + local/integration-style tests and SDG Hub flow/prompt test fixtures; update packaging config to include new component package.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated no comments.

| File | Description |
|------|-------------|
| components/sdg/ARCHITECTURE.md | Architecture/design documentation for the SDG component. |
| components/sdg/sdg_hub/component.py | New KFP component implementation wrapping SDG Hub SDK. |
| components/sdg/sdg_hub/README.md | Component usage documentation and local dev/testing instructions. |
| components/sdg/sdg_hub/example_pipelines.py | Example KFP pipelines demonstrating component usage patterns. |
| components/sdg/sdg_hub/run_local.py | Local execution script for running an LLM test flow and printing outputs. |
| components/sdg/sdg_hub/metadata.yaml | Component metadata (deps/tags/verification timestamp). |
| components/sdg/sdg_hub/OWNERS | Ownership/maintainer declarations for the component. |
| components/sdg/sdg_hub/tests/test_component_unit.py | Unit tests with SDG Hub SDK mocked to validate component logic. |
| components/sdg/sdg_hub/tests/test_component_local.py | Local runner/integration tests using real flows (LLM test gated by env var). |
| test_data/sdg_hub/* | Sample input + minimal transform/LLM flows and prompt fixture for tests. |
| pyproject.toml | Adds deps for tests and registers new component packages for distribution. |

Comments suppressed due to low confidence (9)

components/sdg/sdg_hub/component.py:13

  • The component imports and uses pandas, but packages_to_install only includes sdg-hub. Unless sdg-hub guarantees pandas as a dependency, this can fail at runtime when KFP installs the component dependencies. Add pandas to packages_to_install (or remove the direct dependency on pandas).
@dsl.component(
    packages_to_install=["sdg-hub"],
)

components/sdg/sdg_hub/run_local.py:29

  • run_local.py executes its logic at import time (top-level with tempfile.TemporaryDirectory()...). This makes it unsafe to import and can cause unexpected side effects in tooling. Wrap execution in an if __name__ == "__main__": guard (and optionally move logic into a main() function).
with tempfile.TemporaryDirectory() as tmp_dir:
    output_artifact = Artifact(os.path.join(tmp_dir, "output.jsonl"))
    output_metrics = Artifact(os.path.join(tmp_dir, "metrics.json"))

    print(f"Input:   {INPUT_PATH}")
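A minimal sketch of the suggested restructuring (the `main()` helper and the paths here are illustrative stand-ins, not the actual run_local.py contents):

```python
import os
import tempfile


def main() -> None:
    # All side-effecting logic lives in main(), so importing this module
    # (e.g., from test tooling) triggers nothing at import time.
    with tempfile.TemporaryDirectory() as tmp_dir:
        output_path = os.path.join(tmp_dir, "output.jsonl")
        metrics_path = os.path.join(tmp_dir, "metrics.json")
        print(f"Output:  {output_path}")
        print(f"Metrics: {metrics_path}")


if __name__ == "__main__":
    main()
```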

components/sdg/sdg_hub/README.md:44

  • The README import paths use from components.sdg.sdg_hub import sdg, but this repo is packaged as kfp_components (see pyproject.toml package-dir mapping). These examples likely won’t work for users who pip install kfp-components; consider switching examples to from kfp_components.components.sdg.sdg_hub import sdg (or whatever the intended public import path is).
```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```

components/sdg/sdg_hub/example_pipelines.py:75

  • The secret key name in the example (OPENAI_APIKEY) conflicts with the generic secret key naming described elsewhere in this PR (e.g., api_key/api_base in ARCHITECTURE.md). To avoid confusing users, align the examples/documentation on one secret schema (either update this mapping to api_key or clarify that any key name works as long as it’s mapped to LLM_API_KEY).
    use_secret_as_env(
        task=sdg_task,
        secret_name="llm-credentials",
        secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"},
    )

pyproject.toml:38

  • pyproject.toml adds new test dependencies (sdg-hub, pandas) but the committed uv.lock does not appear to include them. If this repo relies on uv.lock for reproducible environments/CI, it should be regenerated/updated to match the new dependency set.
    "semver",
    "pip",
    "sdg-hub",
    "pandas",
]

components/sdg/ARCHITECTURE.md:5

  • PR description says this is a mirror design doc for feedback, but the PR also adds a full component implementation, tests, and test data. Please update the PR description/title or split the PR so reviewers understand whether they’re reviewing design docs only vs. production code changes.
# SDG Hub KFP Component Architecture Design

## Document Information

| Field | Value |

components/sdg/sdg_hub/component.py:67

  • logging.basicConfig(level=getattr(logging, log_level.upper())) will raise AttributeError if an invalid log_level is provided (e.g., typo), causing the component to crash before doing any work. Consider validating log_level against allowed values and/or using a safe fallback (e.g., default to INFO) when the attribute is missing.
    logging.basicConfig(
        level=getattr(logging, log_level.upper()),
        format="%(asctime)s - %(levelname)s - %(message)s",
    )
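One way to implement the suggested safe fallback (a sketch; `resolve_log_level` is a hypothetical helper, not part of the component):

```python
import logging

_ALLOWED_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(log_level: str) -> int:
    """Map a user-supplied level name to a logging constant.

    Falls back to INFO instead of raising AttributeError on typos.
    """
    name = (log_level or "").upper()
    if name not in _ALLOWED_LEVELS:
        return logging.INFO  # safe default for unknown values
    return getattr(logging, name)
```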

components/sdg/sdg_hub/README.md:20

  • The Parameters table uses double leading/trailing pipes (e.g., || Parameter | ... |), which does not render as a standard Markdown table in many renderers. Use a normal Markdown table row format with single | delimiters.
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_artifact` | `dsl.Input[dsl.Dataset]` | `None` | KFP Dataset artifact from upstream component |
| `input_pvc_path` | `str` | `""` | Path to JSONL input file on a mounted PVC |
| `flow_id` | `str` | `""` | Built-in flow ID from the SDG Hub registry |

components/sdg/ARCHITECTURE.md:1496

  • This section states packages_to_install was rejected in favor of a pre-baked image, but the actual component implementation in this PR uses @dsl.component(packages_to_install=[...]). Either update the design doc to match the chosen implementation, or switch the component to a base_image approach as described here to avoid runtime installs.
**Why pre-baked (not runtime install):**

| Alternative | Why Not Selected |
|-------------|------------------|
| **`packages_to_install`** | 30-60s pip install overhead per run; network dependency; version resolution risk |


@beatsmonster beatsmonster force-pushed the main branch 3 times, most recently from 0cde74f to b9120fd Compare February 25, 2026 15:24
@beatsmonster beatsmonster changed the title feat: sdg component design feat: sdg component Feb 25, 2026
Copilot AI review requested due to automatic review settings February 26, 2026 16:31
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (2)

components/sdg/sdg_hub/README.md:45

  • The README examples import sdg from components.sdg.sdg_hub, but this repo packages components under kfp_components.components (via tool.setuptools.package-dir). Use the installable import path in examples (e.g., from kfp_components.components.sdg.sdg_hub import sdg) so copy/paste works for users.
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env

components/sdg/sdg_hub/component.py:66

  • getattr(logging, log_level.upper()) will raise AttributeError for invalid log_level values, which turns a user input error into an unexpected crash. Validate log_level (or provide a default like logging.INFO) and raise a clear ValueError listing allowed levels.
    logging.basicConfig(
        level=getattr(logging, log_level.upper()),
        format="%(asctime)s - %(levelname)s - %(message)s",


]
}
with open(output_metrics.path, "w") as f:
    json.dump(metrics_data, f)


I was curious whether this is the most appropriate pattern for logging metrics in KFP? I don't have full context, but referring back to ARCHITECTURE.md, I found that the metrics logging there was in the format of:

metrics.log_metric("total_input_rows", 1000)
metrics.log_metric("total_output_rows", 850)
metrics.log_metric("execution_time_seconds", 3456.7)
metrics.log_metric("successful_blocks", 8)
metrics.log_metric("failed_blocks", 0)

in the current implementation, a hand-crafted JSON dict is written out to output_metrics.path. If this is the correct pattern, then maybe we can amend the architecture, or if the architecture pattern is more appropriate, could we adopt the same?

Copilot AI review requested due to automatic review settings March 2, 2026 21:58
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (8)

components/sdg/sdg_hub/run_local.py:69

  • run_local.py executes the LocalRunner workflow at import time (module-level with tempfile.TemporaryDirectory() as pipeline_root:). This makes the file unsafe to import (e.g., by tooling) and can cause surprising side effects. Wrap execution in an if __name__ == "__main__": block and keep only definitions/patching setup at module scope.
# Initialize KFP LocalRunner
with tempfile.TemporaryDirectory() as pipeline_root:
    kfp.local.init(
        runner=kfp.local.SubprocessRunner(use_venv=False),
        pipeline_root=pipeline_root,
    )

components/sdg/sdg_hub/run_local.py:22

  • REPO_ROOT is set to the directory of this file (components/sdg/sdg_hub), not the repository root. This name is misleading and makes path logic harder to follow; consider renaming to something like COMPONENT_DIR (or compute the actual repo root if that’s what you need).
# Paths
REPO_ROOT = os.path.dirname(os.path.abspath(__file__))
TEST_DATA = os.path.join(REPO_ROOT, "..", "..", "..", "test_data", "sdg_hub")
INPUT_PATH = os.path.abspath(os.path.join(TEST_DATA, "sample_input.jsonl"))
FLOW_PATH = os.path.abspath(os.path.join(TEST_DATA, "llm_test_flow.yaml"))
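A sketch of the suggested renaming, deriving the actual repo root explicitly (the three-levels-up depth is an assumption based on the components/sdg/sdg_hub layout):

```python
import os

# COMPONENT_DIR is exactly what the old REPO_ROOT was: this file's directory.
COMPONENT_DIR = os.path.dirname(os.path.abspath(__file__))
# The real repo root is three levels up from components/sdg/sdg_hub/.
REPO_ROOT = os.path.abspath(os.path.join(COMPONENT_DIR, "..", "..", ".."))
TEST_DATA = os.path.join(REPO_ROOT, "test_data", "sdg_hub")
INPUT_PATH = os.path.join(TEST_DATA, "sample_input.jsonl")
FLOW_PATH = os.path.join(TEST_DATA, "llm_test_flow.yaml")
```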

components/sdg/sdg_hub/component.py:238

  • PVC export directory naming uses second-resolution timestamps (%Y%m%d_%H%M%S). If two runs export the same flow_id within the same second to the same export_path, they can collide and overwrite generated.jsonl. Consider including higher-resolution time (milliseconds) or a unique suffix (UUID/run id) to make export paths collision-safe.
        flow_name = flow_id if flow_id else "custom"
        timestamp = time.strftime("%Y%m%d_%H%M%S")
        export_dir = os.path.join(export_path, flow_name, timestamp)

        os.makedirs(export_dir, exist_ok=True)
        export_file_path = os.path.join(export_dir, "generated.jsonl")
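A collision-safe variant might look like this (a sketch; `make_export_dir` is a hypothetical helper, and the UUID suffix is one of the options the comment suggests):

```python
import os
import tempfile
import time
import uuid


def make_export_dir(export_path: str, flow_name: str) -> str:
    # A short random suffix makes two exports of the same flow within the
    # same second land in distinct directories instead of overwriting.
    timestamp = time.strftime("%Y%m%d_%H%M%S")
    unique = uuid.uuid4().hex[:8]
    export_dir = os.path.join(export_path, flow_name, f"{timestamp}_{unique}")
    os.makedirs(export_dir, exist_ok=True)
    return export_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(make_export_dir(base, "my_flow"))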

components/sdg/sdg_hub/README.md:46

  • The README examples import sdg from components.sdg.sdg_hub, but this repo is packaged with kfp_components.components mapped to the components/ directory (see pyproject.toml package-dir). The import path in docs should use kfp_components.components.sdg.sdg_hub (or whatever the intended public import is), otherwise users will get ModuleNotFoundError when using the installed package.
```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```

pyproject.toml:39

  • The test extra adds sdg-hub without a version constraint, but the component pins runtime to sdg-hub>=0.7.0,<1.0. To reduce CI flakes from upstream breaking changes, align the test dependency constraint with the component’s supported range (or document why tests should run against unpinned latest).
test = [
    "docker",
    "pytest",
    "pytest-cov",
    "pytest-timeout",
    "docstring-parser",
    "jinja2",
    "packaging",
    "pyyaml",
    "semver",
    "pip",
    "setuptools",
    "sdg-hub",
    "pandas",
]

components/sdg/ARCHITECTURE.md:1497

  • This section states dependencies are pre-baked into an image and specifically calls out packages_to_install as a rejected approach, but the actual component implementation uses @dsl.component(packages_to_install=[...]). To avoid misleading readers, either update the design doc to reflect the current implementation approach, or adjust the implementation to match the documented packaging strategy.
### 10.3 Dependency Management

All dependencies are **pre-baked** into the container image at build time.

```mermaid
flowchart LR
    subgraph build["Build Time"]
        A["Dockerfile"] --> B["pip install sdg-hub"]
        B --> C["Image with all deps"]
    end

    subgraph runtime["Runtime"]
        C --> D["Component executes immediately"]
    end
```

**Why pre-baked (not runtime install):**

| Alternative | Why Not Selected |
|-------------|------------------|
| **`packages_to_install`** | 30-60s pip install overhead per run; network dependency; version resolution risk |
components/sdg/sdg_hub/component.py:31

  • `runtime_params` uses a mutable default (`{}`), which can leak state between invocations in the same Python process (including unit tests) and is generally unsafe. Use `runtime_params: dict | None = None` (or similar) and set it to `{}` inside the function when needed.
    max_tokens: int = -1,
    export_to_pvc: bool = False,
    export_path: str = "",
    runtime_params: dict = {},
) -> None:

components/sdg/sdg_hub/component.py:68

  • `logging.basicConfig(level=getattr(logging, log_level.upper()), ...)` will raise `AttributeError` if an invalid `log_level` is passed (e.g., a typo). Consider validating `log_level` against known levels and raising a clear `ValueError`, or defaulting to INFO when unknown.
    # Configure logging
    logging.basicConfig(
        level=getattr(logging, log_level.upper()),
        format="%(asctime)s - %(levelname)s - %(message)s",
    )




Copilot AI review requested due to automatic review settings March 3, 2026 05:19
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 19 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (10)

components/sdg/sdg_hub/run_local.py:16

  • from component import sdg is brittle (it depends on the current working directory / import path). Use an explicit package import (e.g., relative from .component import sdg or from kfp_components.components.sdg.sdg_hub.component import sdg) so the script works when run from other directories or via python -m ....
import kfp.local
import pandas as pd
from component import sdg
from kfp.local import executor_input_utils, task_dispatcher

components/sdg/sdg_hub/README.md:46

  • The README examples import sdg from components..., but the installable package namespace is kfp_components.components (per pyproject.toml). This will confuse users following the docs from an installed wheel. Update the import paths in the examples to use kfp_components.components.sdg.sdg_hub (or whichever public import you want to support).
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env

pyproject.toml:66

  • The packages list is manually enumerated, but this PR adds pipelines/sdg/ without adding kfp_components.pipelines.sdg (and likely kfp_components.pipelines.sdg.*) here. As-is, the new pipeline examples won’t be included in the built wheel/sdist. Either add the new pipelines package(s) or move these examples to docs/examples outside the packaged modules.
    "kfp_components.components.sdg",
    "kfp_components.components.sdg.sdg_hub",
    "kfp_components.components.deployment",
    "kfp_components.components.evaluation",
    "kfp_components.components.training",
    "kfp_components.pipelines",
    "kfp_components.pipelines.data_processing",
    "kfp_components.pipelines.deployment",
    "kfp_components.pipelines.evaluation",
    "kfp_components.pipelines.training",
]

scripts/tests/run_component_tests.py:134

  • If installing requirements-test.txt fails, the script only prints a warning and then continues into pytest. That can turn a dependency resolution problem into confusing test import errors later. Consider failing fast (return non-zero / raise) and printing stdout/stderr to make the root cause obvious.
            print(f"Installing test dependencies from {req_file.relative_to(REPO_ROOT)}")
            result = subprocess.run(
                [sys.executable, "-m", "pip", "install", "-q", "-r", str(req_file)],
                capture_output=True,
                text=True,
            )
            if result.returncode != 0:
                print(f"⚠️  Failed to install {req_file}: {result.stderr.strip()}")
            installed.append(req_file)
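A fail-fast variant of that install step might look like this (a sketch; `install_requirements` is a hypothetical helper, not the script's actual function name):

```python
import subprocess
import sys


def install_requirements(req_file: str) -> None:
    """Install test deps, raising immediately if pip fails."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r", req_file],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Surface pip's own output so the root cause is obvious,
        # instead of letting pytest fail later with import errors.
        sys.stderr.write(result.stdout + result.stderr)
        raise RuntimeError(
            f"Failed to install {req_file} (exit code {result.returncode})"
        )
```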

pipelines/sdg/example_pipelines.py:16

  • This module imports kfp_kubernetes, but the repo’s declared dependencies don’t include the kfp-kubernetes distribution (and it doesn’t appear in the lockfile). As written, compiling these example pipelines will fail in a fresh dev environment. Either add kfp-kubernetes as an (optional) dependency, or change the examples to use a dependency that is already part of the project.
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env

pipelines/sdg/example_pipelines.py:18

  • Imports use from components..., but this repository packages components under kfp_components.components (see pyproject.toml package-dir mapping). These example pipelines will break when used from an installed distribution. Update imports to from kfp_components.components.sdg.sdg_hub import sdg (or a relative import within the kfp_components.pipelines package).
from components.sdg.sdg_hub.component import sdg

components/sdg/sdg_hub/README.md:67

  • The README uses secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"} in the quickstart examples, but later sections show secrets with keys like api_key/api_base. Since Secret key names are user-defined, pick one convention and use it consistently across the README (and consider using a more standard key name like api_key or OPENAI_API_KEY to reduce confusion).
    # Inject LLM API credentials
    use_secret_as_env(
        task=sdg_task,
        secret_name="llm-credentials",
        secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"},
    )

components/sdg/sdg_hub/README.md:46

  • The usage snippets rely on kfp_kubernetes (from kfp_kubernetes import mount_pvc, use_secret_as_env), but kfp-kubernetes is not declared as a dependency/extra in this repo. Add it to an appropriate optional-dependency (e.g. test or a new pipelines/kubernetes extra) or adjust the docs to use APIs that are available with the declared dependencies.
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env

components/sdg/sdg_hub/component.py:68

  • logging.basicConfig(level=getattr(logging, log_level.upper())) will raise AttributeError for an invalid log_level (e.g., typo), crashing the component before it can emit a helpful error. Validate log_level against allowed values and fall back to INFO (or raise a clear ValueError).
    logging.basicConfig(
        level=getattr(logging, log_level.upper()),
        format="%(asctime)s - %(levelname)s - %(message)s",
    )

components/sdg/sdg_hub/component.py:13

  • This component imports and uses pandas inside python_func, but packages_to_install only installs sdg-hub. If sdg-hub doesn’t guarantee pandas as a dependency, the component can fail at runtime with ImportError. Consider adding pandas (and any other direct imports) to packages_to_install, or avoid the direct pandas dependency if the SDK can accept file paths/records instead.
@dsl.component(
    packages_to_install=["sdg-hub>=0.7.0,<1.0"],
)


Comment on lines +1893 to +1895
@component(
packages_to_install=["sdg-hub>=0.7.0,<1.0"],
)

Copilot AI Mar 3, 2026


The example sdg component definition here uses packages_to_install=["sdg-hub>=0.7.0,<1.0"], which causes KFP to pip-install a third-party package from PyPI at runtime for every task execution, pinned only to a mutable version range. If the sdg-hub package or the PyPI supply chain is compromised, an attacker could gain code execution in the pipeline pod (with access to LLM_API_KEY, service account tokens, and artifacts) as soon as this step starts. For production use, prefer a pre-baked image that already contains a vetted, immutable sdg-hub version (or at least pin to a specific version from a trusted internal index) instead of relying on packages_to_install.

Copilot AI review requested due to automatic review settings March 9, 2026 16:24
Copilot AI review requested due to automatic review settings March 21, 2026 02:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 23 out of 28 changed files in this pull request and generated 1 comment.



Copilot AI review requested due to automatic review settings March 21, 2026 03:35
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 27 out of 32 changed files in this pull request and generated 1 comment.



Copilot AI review requested due to automatic review settings March 21, 2026 18:58
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 26 changed files in this pull request and generated 2 comments.



@beatsmonster
Author

CI fixed and ready for review.

Comment on lines +37 to +42
# Component-specific SDK deps required for python_func() tests.
# KFP SubprocessRunner cannot auto-install these because it does not
# support Input[Dataset] artifacts (see: https://github.com/kubeflow/pipelines/issues/13061).
# Once KFP fixes this, these can be removed and tests can use
# SubprocessRunner(use_venv=True) instead.
"sdg-hub>=0.7.0,<1.0",

Author


Happy to refactor if we can get a new release with the fix. I think the latest SDK is still 2.16.

Comment on lines +10 to +13
- pandas
"components/**/shared/*":
- pandas
- component
Contributor


Just for awareness. I've created this issue:

from kfp import compiler, dsl
from kfp.kubernetes import use_config_map_as_volume, use_secret_as_env

from components.data_processing.sdg.component import sdg
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import from components.data_processing.sdg.component import sdg uses the bare
filesystem path, which only works when running from the repo root with PYTHONPATH=..
Consumers who install the package will get ModuleNotFoundError because the installed
namespace is kfp_components.components.*.

Suggested change
from components.data_processing.sdg.component import sdg
from kfp_components.components.data_processing.sdg.component import sdg

Comment on lines +21 to +22
- kfp_kubernetes
- components
Contributor


The two new entries are unnecessary and should be removed:

  • kfp_kubernetes: pipeline.py imports from kfp.kubernetes import ..., which
    canonicalizes to kfp — already in the global allowlist. This entry would only
    apply to from kfp_kubernetes import ... (old package name), which isn't used here.
  • components: only needed because pipeline.py uses the wrong import path
    (from components...). If fixed to use kfp_components.components...,
    it canonicalizes to kfp_components, which is already allowed for
    pipelines/**/*.py.

witold-nowogorski pushed a commit to dryszka/pipelines-components that referenced this pull request Mar 27, 2026
Copilot AI review requested due to automatic review settings March 28, 2026 01:41
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 26 changed files in this pull request and generated 5 comments.



output_metrics = MockArtifact(os.path.join(tmp_dir, "metrics.json"))

# Attempt to run with an invalid model identifier
with pytest.raises(Exception): # SDG Hub will raise an error for invalid models

Copilot AI Mar 28, 2026


pytest.raises(Exception) is overly broad for validating "invalid model" behavior; it will also pass for unrelated errors (e.g., networking, auth, prompt parsing). Prefer asserting a more specific exception type and/or matching an expected message so the test actually verifies the intended failure mode.

Suggested change
with pytest.raises(Exception): # SDG Hub will raise an error for invalid models
with pytest.raises(ValueError, match="invalid model"): # SDG Hub should raise a ValueError for invalid models

Comment on lines +27 to +31
max_tokens: int = -1,
export_to_pvc: bool = False,
export_path: str = "",
runtime_params: dict = {},
):

Copilot AI Mar 28, 2026


runtime_params uses a mutable default ({}), which can leak state between invocations of the component function (and is generally unsafe for KFP component signatures). Use None as the default and normalize to {} inside the function body.

"Supply export_path (base PVC directory for exports)."
)

flow_name = flow_id.replace("/", "_").replace("\\", "_") if flow_id else "custom"

Copilot AI Mar 28, 2026


When both flow_id and flow_yaml_path are provided, the component correctly prioritizes flow_yaml_path, but the PVC export directory name is still derived from flow_id if it is non-empty. This can cause exports to be written under the wrong flow name. Consider deriving flow_name from the actually-selected flow (e.g., based on flow.metadata.name or the chosen YAML path) instead of flow_id here.

Suggested change
flow_name = flow_id.replace("/", "_").replace("\\", "_") if flow_id else "custom"
if flow_yaml_path:
    # Derive flow name from the selected YAML path (basename without extension)
    yaml_basename = os.path.basename(flow_yaml_path)
    flow_name_root, _ = os.path.splitext(yaml_basename)
    flow_name = flow_name_root.replace("/", "_").replace("\\", "_") or "custom"
elif flow_id:
    flow_name = flow_id.replace("/", "_").replace("\\", "_")
else:
    flow_name = "custom"

Comment on lines +9 to +10
1. Create sample input data (inline) 2. Run the LLM test flow via SDG Hub component 3. Output generated data as KFP
artifact

Copilot AI Mar 28, 2026


The overview steps are written as 1. ... 2. ... 3. ... on a single line, which Markdown will treat as a single list item rather than a numbered list. Put each numbered step on its own line (or use bullets) so the rendered README matches the intended structure.

Suggested change
1. Create sample input data (inline) 2. Run the LLM test flow via SDG Hub component 3. Output generated data as KFP
artifact
1. Create sample input data (inline)
2. Run the LLM test flow via SDG Hub component
3. Output generated data as KFP artifact
