Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: … The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files; approvers can indicate their approval by writing …
Pull request overview
Adds an SDG Hub–backed Kubeflow Pipelines (KFP) component plus supporting documentation, examples, and tests, along with test fixtures for validating transform-only and LLM flows.
Changes:
- Introduce `sdg_hub` KFP component implementation with flow selection, optional model configuration, metrics, and optional PVC export.
- Add component documentation (README + architecture design) and example pipelines/local runner script.
- Add unit + local/integration-style tests and SDG Hub flow/prompt test fixtures; update packaging config to include new component package.
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| `components/sdg/ARCHITECTURE.md` | Architecture/design documentation for the SDG component. |
| `components/sdg/sdg_hub/component.py` | New KFP component implementation wrapping the SDG Hub SDK. |
| `components/sdg/sdg_hub/README.md` | Component usage documentation and local dev/testing instructions. |
| `components/sdg/sdg_hub/example_pipelines.py` | Example KFP pipelines demonstrating component usage patterns. |
| `components/sdg/sdg_hub/run_local.py` | Local execution script for running an LLM test flow and printing outputs. |
| `components/sdg/sdg_hub/metadata.yaml` | Component metadata (deps/tags/verification timestamp). |
| `components/sdg/sdg_hub/OWNERS` | Ownership/maintainer declarations for the component. |
| `components/sdg/sdg_hub/tests/test_component_unit.py` | Unit tests with the SDG Hub SDK mocked to validate component logic. |
| `components/sdg/sdg_hub/tests/test_component_local.py` | Local runner/integration tests using real flows (LLM test gated by env var). |
| `test_data/sdg_hub/*` | Sample input + minimal transform/LLM flows and prompt fixture for tests. |
| `pyproject.toml` | Adds deps for tests and registers new component packages for distribution. |
Comments suppressed due to low confidence (9)
**components/sdg/sdg_hub/component.py:13**
- The component imports and uses `pandas`, but `packages_to_install` only includes `sdg-hub`. Unless `sdg-hub` guarantees `pandas` as a dependency, this can fail at runtime when KFP installs the component dependencies. Add `pandas` to `packages_to_install` (or remove the direct dependency on pandas).

```python
@dsl.component(
    packages_to_install=["sdg-hub"],
)
```
**components/sdg/sdg_hub/run_local.py:29**
- `run_local.py` executes its logic at import time (top-level `with tempfile.TemporaryDirectory() ...`). This makes it unsafe to import and can cause unexpected side effects in tooling. Wrap execution in an `if __name__ == "__main__":` guard (and optionally move the logic into a `main()` function).

```python
with tempfile.TemporaryDirectory() as tmp_dir:
    output_artifact = Artifact(os.path.join(tmp_dir, "output.jsonl"))
    output_metrics = Artifact(os.path.join(tmp_dir, "metrics.json"))
    print(f"Input: {INPUT_PATH}")
```
**components/sdg/sdg_hub/README.md:44**
- The README import paths use `from components.sdg.sdg_hub import sdg`, but this repo is packaged as `kfp_components` (see the `pyproject.toml` package-dir mapping). These examples likely won't work for users who `pip install kfp-components`; consider switching the examples to `from kfp_components.components.sdg.sdg_hub import sdg` (or whatever the intended public import path is).

```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**components/sdg/sdg_hub/example_pipelines.py:75**
- The secret key name in the example (`OPENAI_APIKEY`) conflicts with the generic secret key naming described elsewhere in this PR (e.g., `api_key`/`api_base` in ARCHITECTURE.md). To avoid confusing users, align the examples/documentation on one secret schema (either update this mapping to `api_key` or clarify that any key name works as long as it's mapped to `LLM_API_KEY`).

```python
use_secret_as_env(
    task=sdg_task,
    secret_name="llm-credentials",
    secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"},
)
```
**pyproject.toml:38**
- `pyproject.toml` adds new test dependencies (`sdg-hub`, `pandas`) but the committed `uv.lock` does not appear to include them. If this repo relies on `uv.lock` for reproducible environments/CI, it should be regenerated/updated to match the new dependency set.

```toml
    "semver",
    "pip",
    "sdg-hub",
    "pandas",
]
```
**components/sdg/ARCHITECTURE.md:5**
- The PR description says this is a mirror design doc for feedback, but the PR also adds a full component implementation, tests, and test data. Please update the PR description/title or split the PR so reviewers understand whether they're reviewing design docs only vs. production code changes.

```markdown
# SDG Hub KFP Component Architecture Design
## Document Information
| Field | Value |
```
**components/sdg/sdg_hub/component.py:67**
- `logging.basicConfig(level=getattr(logging, log_level.upper()))` will raise `AttributeError` if an invalid `log_level` is provided (e.g., a typo), causing the component to crash before doing any work. Consider validating `log_level` against allowed values and/or using a safe fallback (e.g., default to `INFO`) when the attribute is missing.

```python
logging.basicConfig(
    level=getattr(logging, log_level.upper()),
    format="%(asctime)s - %(levelname)s - %(message)s",
)
```
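A defensive variant could look like the following (a hypothetical sketch with an illustrative `configure_logging` helper, not the component's actual code):

```python
import logging

def configure_logging(log_level: str = "INFO") -> int:
    """Resolve a user-supplied level name safely, falling back to INFO."""
    allowed = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}
    level_name = log_level.upper()
    if level_name not in allowed:
        # A typo like "DEBUGG" no longer crashes the component with AttributeError.
        level_name = "INFO"
    level = getattr(logging, level_name)
    logging.basicConfig(
        level=level,
        format="%(asctime)s - %(levelname)s - %(message)s",
    )
    return level
```

Raising a `ValueError` that lists the allowed names is equally valid; the key point is that the failure mode becomes deliberate rather than an `AttributeError` from `getattr`.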
**components/sdg/sdg_hub/README.md:20**
- The Parameters table uses double leading/trailing pipes (e.g., `|| Parameter | ... |`), which does not render as a standard Markdown table in many renderers. Use a normal Markdown table row format with single `|` delimiters.

```markdown
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `input_artifact` | `dsl.Input[dsl.Dataset]` | `None` | KFP Dataset artifact from upstream component |
| `input_pvc_path` | `str` | `""` | Path to JSONL input file on a mounted PVC |
| `flow_id` | `str` | `""` | Built-in flow ID from the SDG Hub registry |
```
**components/sdg/ARCHITECTURE.md:1496**
- This section states `packages_to_install` was rejected in favor of a pre-baked image, but the actual component implementation in this PR uses `@dsl.component(packages_to_install=[...])`. Either update the design doc to match the chosen implementation, or switch the component to a `base_image` approach as described here to avoid runtime installs.

```markdown
**Why pre-baked (not runtime install):**
| Alternative | Why Not Selected |
|-------------|------------------|
| **`packages_to_install`** | 30-60s pip install overhead per run; network dependency; version resolution risk |
```
Force-pushed from 0cde74f to b9120fd.
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (2)
**components/sdg/sdg_hub/README.md:45**
- The README examples import `sdg` from `components.sdg.sdg_hub`, but this repo packages components under `kfp_components.components` (via `tool.setuptools.package-dir`). Use the installable import path in examples (e.g., `from kfp_components.components.sdg.sdg_hub import sdg`) so copy/paste works for users.

```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**components/sdg/sdg_hub/component.py:66**
- `getattr(logging, log_level.upper())` will raise `AttributeError` for invalid `log_level` values, which turns a user input error into an unexpected crash. Validate `log_level` (or provide a default like `logging.INFO`) and raise a clear `ValueError` listing the allowed levels.

```python
logging.basicConfig(
    level=getattr(logging, log_level.upper()),
    format="%(asctime)s - %(levelname)s - %(message)s",
```
**components/sdg/sdg_hub/component.py** (outdated)

```python
    ]
}
with open(output_metrics.path, "w") as f:
    json.dump(metrics_data, f)
```
I was curious whether this is the most appropriate pattern for logging metrics in KFP. I don't have full context, but referring back to the ARCHITECTURE.md, the metrics logging there is in the format of:

```python
metrics.log_metric("total_input_rows", 1000)
metrics.log_metric("total_output_rows", 850)
metrics.log_metric("execution_time_seconds", 3456.7)
metrics.log_metric("successful_blocks", 8)
metrics.log_metric("failed_blocks", 0)
```

In the current implementation, a hand-crafted JSON dict is written out to `output_metrics.path`. If this is the correct pattern, then maybe we can amend the architecture; or if the architecture pattern is more appropriate, could we adopt the same?
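For reference, the `log_metric` style and a hand-written JSON dict end up producing similar on-disk data; a minimal stand-in (a hypothetical class for illustration, not the KFP SDK's actual `Metrics` artifact) shows the shape:

```python
import json
import os
import tempfile

class MetricsStandIn:
    """Tiny illustration of the log_metric pattern: collect key/value
    pairs, then serialize them to the artifact path as JSON."""

    def __init__(self, path: str):
        self.path = path
        self._data = {}

    def log_metric(self, key: str, value) -> None:
        self._data[key] = value

    def flush(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self._data, f)

metrics_path = os.path.join(tempfile.mkdtemp(), "metrics.json")
metrics = MetricsStandIn(metrics_path)
metrics.log_metric("total_input_rows", 1000)
metrics.log_metric("total_output_rows", 850)
metrics.flush()
```

The advantage of the `log_metric` API is that KFP records each key as a typed metric visible in the UI, rather than an opaque JSON blob, so aligning on one pattern matters for the run dashboard as well as the docs.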
Pull request overview
Copilot reviewed 15 out of 17 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (8)
**components/sdg/sdg_hub/run_local.py:69**
- `run_local.py` executes the LocalRunner workflow at import time (module-level `with tempfile.TemporaryDirectory() as pipeline_root:`). This makes the file unsafe to import (e.g., by tooling) and can cause surprising side effects. Wrap execution in an `if __name__ == "__main__":` block and keep only definitions/patching setup at module scope.

```python
# Initialize KFP LocalRunner
with tempfile.TemporaryDirectory() as pipeline_root:
    kfp.local.init(
        runner=kfp.local.SubprocessRunner(use_venv=False),
        pipeline_root=pipeline_root,
    )
```
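The usual restructuring is a sketch like this (assuming the module-level body can move wholesale into a `main()` function; the names and body are illustrative):

```python
import tempfile

def main() -> None:
    # All side-effectful work lives here instead of at module import time,
    # so importing this file (e.g. during pytest collection) does nothing.
    with tempfile.TemporaryDirectory() as pipeline_root:
        print(f"Pipeline root: {pipeline_root}")
        # ... kfp.local.init(...), component invocation, output printing ...

if __name__ == "__main__":
    main()
```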
**components/sdg/sdg_hub/run_local.py:22**
- `REPO_ROOT` is set to the directory of this file (`components/sdg/sdg_hub`), not the repository root. This name is misleading and makes the path logic harder to follow; consider renaming it to something like `COMPONENT_DIR` (or compute the actual repo root if that's what you need).

```python
# Paths
REPO_ROOT = os.path.dirname(os.path.abspath(__file__))
TEST_DATA = os.path.join(REPO_ROOT, "..", "..", "..", "test_data", "sdg_hub")
INPUT_PATH = os.path.abspath(os.path.join(TEST_DATA, "sample_input.jsonl"))
FLOW_PATH = os.path.abspath(os.path.join(TEST_DATA, "llm_test_flow.yaml"))
```
**components/sdg/sdg_hub/component.py:238**
- PVC export directory naming uses second-resolution timestamps (`%Y%m%d_%H%M%S`). If two runs export the same `flow_id` within the same second to the same `export_path`, they can collide and overwrite `generated.jsonl`. Consider including higher-resolution time (milliseconds) or a unique suffix (UUID/run ID) to make export paths collision-safe.

```python
flow_name = flow_id if flow_id else "custom"
timestamp = time.strftime("%Y%m%d_%H%M%S")
export_dir = os.path.join(export_path, flow_name, timestamp)
os.makedirs(export_dir, exist_ok=True)
export_file_path = os.path.join(export_dir, "generated.jsonl")
```
**components/sdg/sdg_hub/README.md:46**
- The README examples import `sdg` from `components.sdg.sdg_hub`, but this repo is packaged with `kfp_components.components` mapped to the `components/` directory (see the `pyproject.toml` package-dir). The import path in the docs should use `kfp_components.components.sdg.sdg_hub` (or whatever the intended public import is); otherwise users will get `ModuleNotFoundError` when using the installed package.

```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**pyproject.toml:39**
- The `test` extra adds `sdg-hub` without a version constraint, but the component pins runtime to `sdg-hub>=0.7.0,<1.0`. To reduce CI flakes from upstream breaking changes, align the test dependency constraint with the component's supported range (or document why tests should run against the unpinned latest).

```toml
test = [
    "docker",
    "pytest",
    "pytest-cov",
    "pytest-timeout",
    "docstring-parser",
    "jinja2",
    "packaging",
    "pyyaml",
    "semver",
    "pip",
    "setuptools",
    "sdg-hub",
    "pandas",
]
```
**components/sdg/ARCHITECTURE.md:1497**
- This section states dependencies are pre-baked into an image and specifically calls out `packages_to_install` as a rejected approach, but the actual component implementation uses `@dsl.component(packages_to_install=[...])`. To avoid misleading readers, either update the design doc to reflect the current implementation approach, or adjust the implementation to match the documented packaging strategy.

````markdown
### 10.3 Dependency Management
All dependencies are **pre-baked** into the container image at build time.
```mermaid
flowchart LR
    subgraph build["Build Time"]
        A["Dockerfile"] --> B["pip install sdg-hub"]
        B --> C["Image with all deps"]
    end
    subgraph runtime["Runtime"]
        C --> D["Component executes immediately"]
    end
```
**Why pre-baked (not runtime install):**
| Alternative | Why Not Selected |
|---|---|
| `packages_to_install` | 30-60s pip install overhead per run; network dependency; version resolution risk |
````
**components/sdg/sdg_hub/component.py:31**
* `runtime_params` uses a mutable default (`{}`), which can leak state between invocations in the same Python process (including unit tests) and is generally unsafe. Use `runtime_params: dict | None = None` (or similar) and set it to `{}` inside the function when needed.
```python
    max_tokens: int = -1,
    export_to_pvc: bool = False,
    export_path: str = "",
    runtime_params: dict = {},
) -> None:
```
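The fix is the standard `None`-default idiom; a minimal sketch (the `sdg_sketch` name and body are hypothetical, not the real component signature):

```python
from typing import Optional

def sdg_sketch(runtime_params: Optional[dict] = None) -> dict:
    # Normalize inside the body so every call gets a fresh dict instead of
    # all calls sharing a single mutable default object.
    if runtime_params is None:
        runtime_params = {}
    runtime_params.setdefault("calls", 0)
    runtime_params["calls"] += 1
    return runtime_params
```

With the mutable `{}` default, repeated no-argument calls would all mutate one shared dict; with the `None` default each call starts clean.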
**components/sdg/sdg_hub/component.py:68**
* `logging.basicConfig(level=getattr(logging, log_level.upper()), ...)` will raise `AttributeError` if an invalid `log_level` is passed (e.g., a typo). Consider validating `log_level` against known levels and raising a clear `ValueError`, or defaulting to INFO when unknown.
```python
# Configure logging
logging.basicConfig(
    level=getattr(logging, log_level.upper()),
    format="%(asctime)s - %(levelname)s - %(message)s",
)
```
---
Pull request overview
Copilot reviewed 16 out of 19 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (10)
**components/sdg/sdg_hub/run_local.py:16**
- `from component import sdg` is brittle (it depends on the current working directory / import path). Use an explicit package import (e.g., relative `from .component import sdg` or `from kfp_components.components.sdg.sdg_hub.component import sdg`) so the script works when run from other directories or via `python -m ...`.

```python
import kfp.local
import pandas as pd
from component import sdg
from kfp.local import executor_input_utils, task_dispatcher
```
**components/sdg/sdg_hub/README.md:46**
- The README examples import `sdg` from `components...`, but the installable package namespace is `kfp_components.components` (per `pyproject.toml`). This will confuse users following the docs from an installed wheel. Update the import paths in the examples to use `kfp_components.components.sdg.sdg_hub` (or whichever public import you want to support).

```python
from components.sdg.sdg_hub import sdg
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**pyproject.toml:66**
- The `packages` list is manually enumerated, but this PR adds `pipelines/sdg/` without adding `kfp_components.pipelines.sdg` (and likely `kfp_components.pipelines.sdg.*`) here. As-is, the new pipeline examples won't be included in the built wheel/sdist. Either add the new pipelines package(s) or move these examples to docs/examples outside the packaged modules.

```toml
    "kfp_components.components.sdg",
    "kfp_components.components.sdg.sdg_hub",
    "kfp_components.components.deployment",
    "kfp_components.components.evaluation",
    "kfp_components.components.training",
    "kfp_components.pipelines",
    "kfp_components.pipelines.data_processing",
    "kfp_components.pipelines.deployment",
    "kfp_components.pipelines.evaluation",
    "kfp_components.pipelines.training",
]
```
**scripts/tests/run_component_tests.py:134**
- If installing `requirements-test.txt` fails, the script only prints a warning and then continues into pytest. That can turn a dependency resolution problem into confusing test import errors later. Consider failing fast (return non-zero / raise) and printing stdout/stderr to make the root cause obvious.

```python
print(f"Installing test dependencies from {req_file.relative_to(REPO_ROOT)}")
result = subprocess.run(
    [sys.executable, "-m", "pip", "install", "-q", "-r", str(req_file)],
    capture_output=True,
    text=True,
)
if result.returncode != 0:
    print(f"⚠️ Failed to install {req_file}: {result.stderr.strip()}")
installed.append(req_file)
```
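A fail-fast variant might look like this (a hypothetical sketch of the helper; the real script's structure may differ):

```python
import subprocess
import sys

def install_requirements(req_file: str) -> None:
    """Install a requirements file and abort immediately on failure,
    surfacing pip's output so the root cause is obvious."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", "-q", "-r", req_file],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Print pip's own diagnostics before aborting, instead of letting
        # pytest fail later with confusing import errors.
        print(result.stdout)
        print(result.stderr, file=sys.stderr)
        raise SystemExit(f"Failed to install {req_file}; aborting before pytest.")
```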
**pipelines/sdg/example_pipelines.py:16**
- This module imports `kfp_kubernetes`, but the repo's declared dependencies don't include the `kfp-kubernetes` distribution (and it doesn't appear in the lockfile). As written, compiling these example pipelines will fail in a fresh dev environment. Either add `kfp-kubernetes` as an (optional) dependency, or change the examples to use a dependency that is already part of the project.

```python
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**pipelines/sdg/example_pipelines.py:18**
- Imports use `from components...`, but this repository packages components under `kfp_components.components` (see the `pyproject.toml` package-dir mapping). These example pipelines will break when used from an installed distribution. Update the imports to `from kfp_components.components.sdg.sdg_hub import sdg` (or a relative import within the `kfp_components.pipelines` package).

```python
from components.sdg.sdg_hub.component import sdg
```
**components/sdg/sdg_hub/README.md:67**
- The README uses `secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"}` in the quickstart examples, but later sections show secrets with keys like `api_key`/`api_base`. Since Secret key names are user-defined, pick one convention and use it consistently across the README (and consider a more standard key name like `api_key` or `OPENAI_API_KEY` to reduce confusion).

```python
# Inject LLM API credentials
use_secret_as_env(
    task=sdg_task,
    secret_name="llm-credentials",
    secret_key_to_env={"OPENAI_APIKEY": "LLM_API_KEY"},
)
```
**components/sdg/sdg_hub/README.md:46**
- The usage snippets rely on `kfp_kubernetes` (`from kfp_kubernetes import mount_pvc, use_secret_as_env`), but `kfp-kubernetes` is not declared as a dependency/extra in this repo. Add it to an appropriate optional dependency (e.g., `test` or a new `pipelines`/`kubernetes` extra) or adjust the docs to use APIs that are available with the declared dependencies.

```python
from kfp import dsl
from kfp_kubernetes import mount_pvc, use_secret_as_env
```
**components/sdg/sdg_hub/component.py:68**
- `logging.basicConfig(level=getattr(logging, log_level.upper()))` will raise `AttributeError` for an invalid `log_level` (e.g., a typo), crashing the component before it can emit a helpful error. Validate `log_level` against allowed values and fall back to `INFO` (or raise a clear `ValueError`).

```python
logging.basicConfig(
    level=getattr(logging, log_level.upper()),
    format="%(asctime)s - %(levelname)s - %(message)s",
)
```
**components/sdg/sdg_hub/component.py:13**
- This component imports and uses `pandas` inside `python_func`, but `packages_to_install` only installs `sdg-hub`. If `sdg-hub` doesn't guarantee `pandas` as a dependency, the component can fail at runtime with `ImportError`. Consider adding `pandas` (and any other direct imports) to `packages_to_install`, or avoid the direct pandas dependency if the SDK can accept file paths/records instead.

```python
@dsl.component(
    packages_to_install=["sdg-hub>=0.7.0,<1.0"],
)
```
```python
@component(
    packages_to_install=["sdg-hub>=0.7.0,<1.0"],
)
```
The example `sdg` component definition here uses `packages_to_install=["sdg-hub>=0.7.0,<1.0"]`, which causes KFP to pip-install a third-party package from PyPI at runtime for every task execution, pinned only to a mutable version range. If the `sdg-hub` package or the PyPI supply chain is compromised, an attacker could gain code execution in the pipeline pod (with access to `LLM_API_KEY`, service account tokens, and artifacts) as soon as this step starts. For production use, prefer a pre-baked image that already contains a vetted, immutable `sdg-hub` version (or at least pin to a specific version from a trusted internal index) instead of relying on `packages_to_install`.
…t path Signed-off-by: Yi Zheng <237498169+beatsmonster@users.noreply.github.com>
… dependency Signed-off-by: Yi Zheng <237498169+beatsmonster@users.noreply.github.com>
Pull request overview
Copilot reviewed 23 out of 28 changed files in this pull request and generated 1 comment.
… errors Signed-off-by: Yi Zheng <237498169+beatsmonster@users.noreply.github.com>
Pull request overview
Copilot reviewed 27 out of 32 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 22 out of 26 changed files in this pull request and generated 2 comments.
…and pipeline Signed-off-by: Yi Zheng <237498169+beatsmonster@users.noreply.github.com>
CI fixed and ready for review.
```python
# Component-specific SDK deps required for python_func() tests.
# KFP SubprocessRunner cannot auto-install these because it does not
# support Input[Dataset] artifacts (see: https://github.com/kubeflow/pipelines/issues/13061).
# Once KFP fixes this, these can be removed and tests can use
# SubprocessRunner(use_venv=True) instead.
"sdg-hub>=0.7.0,<1.0",
```
Happy to refactor if we can have a new release with the fix. I think the latest SDK is still 2.16.
```yaml
- pandas
"components/**/shared/*":
  - pandas
  - component
```
Just for awareness. I've created this issue:
```python
from kfp import compiler, dsl
from kfp.kubernetes import use_config_map_as_volume, use_secret_as_env

from components.data_processing.sdg.component import sdg
```
The import `from components.data_processing.sdg.component import sdg` uses the bare filesystem path, which only works when running from the repo root with `PYTHONPATH=.`. Consumers who install the package will get `ModuleNotFoundError` because the installed namespace is `kfp_components.components.*`. Suggested change:

```python
from kfp_components.components.data_processing.sdg.component import sdg
```
```yaml
- kfp_kubernetes
- components
```
The two new entries are unnecessary and should be removed:
- `kfp_kubernetes`: `pipeline.py` imports `from kfp.kubernetes import ...`, which canonicalizes to `kfp` — already in the global allowlist. This entry would only apply to `from kfp_kubernetes import ...` (the old package name), which isn't used here.
- `components`: only needed because `pipeline.py` uses the wrong import path (`from components...`). If fixed to use `kfp_components.components...`, it canonicalizes to `kfp_components`, which is already allowed for `pipelines/**/*.py`.
Assisted-by: Claude Code Signed-off-by: Mateusz Switala <mswitala@redhat.com>
…t exceptions Signed-off-by: Yi Zheng <237498169+beatsmonster@users.noreply.github.com>
Pull request overview
Copilot reviewed 22 out of 26 changed files in this pull request and generated 5 comments.
```python
output_metrics = MockArtifact(os.path.join(tmp_dir, "metrics.json"))

# Attempt to run with an invalid model identifier
with pytest.raises(Exception):  # SDG Hub will raise an error for invalid models
```
`pytest.raises(Exception)` is overly broad for validating "invalid model" behavior; it will also pass for unrelated errors (e.g., networking, auth, prompt parsing). Prefer asserting a more specific exception type and/or matching an expected message so the test actually verifies the intended failure mode. Suggested change:

```python
with pytest.raises(ValueError, match="invalid model"):  # SDG Hub should raise a ValueError for invalid models
```
```python
    max_tokens: int = -1,
    export_to_pvc: bool = False,
    export_path: str = "",
    runtime_params: dict = {},
):
```
runtime_params uses a mutable default ({}), which can leak state between invocations of the component function (and is generally unsafe for KFP component signatures). Use None as the default and normalize to {} inside the function body.
```python
        "Supply export_path (base PVC directory for exports)."
    )

flow_name = flow_id.replace("/", "_").replace("\\", "_") if flow_id else "custom"
```
When both `flow_id` and `flow_yaml_path` are provided, the component correctly prioritizes `flow_yaml_path`, but the PVC export directory name is still derived from `flow_id` if it is non-empty. This can cause exports to be written under the wrong flow name. Consider deriving `flow_name` from the actually-selected flow (e.g., based on `flow.metadata.name` or the chosen YAML path) instead of `flow_id` here.
Suggested change:

```python
if flow_yaml_path:
    # Derive flow name from the selected YAML path (basename without extension)
    yaml_basename = os.path.basename(flow_yaml_path)
    flow_name_root, _ = os.path.splitext(yaml_basename)
    flow_name = flow_name_root.replace("/", "_").replace("\\", "_") or "custom"
elif flow_id:
    flow_name = flow_id.replace("/", "_").replace("\\", "_")
else:
    flow_name = "custom"
```
```markdown
1. Create sample input data (inline) 2. Run the LLM test flow via SDG Hub component 3. Output generated data as KFP artifact
```
The overview steps are written as 1. ... 2. ... 3. ... on a single line, which Markdown will treat as a single list item rather than a numbered list. Put each numbered step on its own line (or use bullets) so the rendered README matches the intended structure.
Suggested change:

```markdown
1. Create sample input data (inline)
2. Run the LLM test flow via SDG Hub component
3. Output generated data as KFP artifact
```
Description of your changes:
This is a mirror design documentation of the SDG KFP Component for collecting feedback.
Checklist:
Pre-Submission Checklist
Learn more about the pull request title convention used in this repository.
Additional Checklist Items for New or Updated Components/Pipelines
- `metadata.yaml` includes a fresh `lastVerified` timestamp
- … are present and complete
- `snake_case` naming convention