Skip to content

chore : add pipelines, components in library, remove pyproject.toml#96

Open
EncHawk wants to merge 1 commit intokubeflow:mainfrom
EncHawk:main
Open

chore : add pipelines, components in library, remove pyproject.toml#96
EncHawk wants to merge 1 commit intokubeflow:mainfrom
EncHawk:main

Conversation

@EncHawk
Copy link
Copy Markdown

@EncHawk EncHawk commented Feb 10, 2026

pyproject.toml was removed to include the repository into kubeflow-sdk as part of pipelines. it is followed here

the pipelines, components directories are included under a common library/ namespace, this is to ensure that import command contians library in the sdk as followed here.

issues : #80

Pre-Submission Checklist

…oml to integrate to sdk

Signed-off-by: D.Leap <dilipkumar2000.r@gmail.com>
Copilot AI review requested due to automatic review settings February 10, 2026 13:48
@google-oss-prow
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mprahl for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Migrates this repository’s pipelines/components into a shared library/ namespace to support inclusion in the Kubeflow SDK (per issue #80 / kubeflow/sdk#125), and removes standalone packaging configuration.

Changes:

  • Removed pyproject.toml (previous packaging/build configuration).
  • Added library/components/* and library/pipelines/* package scaffolding (__init__.py) for category-based imports.
  • Added an initial example component (yoda_data_processor) with metadata, owners, and unit/local-runner tests, plus a sample deployment component metadata/owners.

Reviewed changes

Copilot reviewed 1 out of 22 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pyproject.toml Removes project packaging/build metadata previously used for distribution.
library/pipelines/init.py Adds top-level pipelines package initializer for the new library/ layout.
library/pipelines/training/init.py Adds training pipelines namespace initializer.
library/pipelines/evaluation/init.py Adds evaluation pipelines namespace initializer.
library/pipelines/data_processing/init.py Adds data-processing pipelines namespace initializer.
library/pipelines/deployment/init.py Adds deployment pipelines namespace initializer.
library/components/init.py Adds top-level components package initializer for the new library/ layout.
library/components/training/init.py Adds training components namespace initializer.
library/components/evaluation/init.py Adds evaluation components namespace initializer.
library/components/data_processing/init.py Adds data-processing components namespace initializer.
library/components/data_processing/README.md Adds category-level README linking to the new component.
library/components/deployment/init.py Adds deployment components namespace initializer.
library/components/deployment/component_valid/metadata.yaml Adds sample component metadata for deployment category.
library/components/deployment/component_valid/OWNERS Adds sample OWNERS for the deployment sample component.
library/components/data_processing/yoda_data_processor/component.py Introduces the prepare_yoda_dataset KFP component implementation.
library/components/data_processing/yoda_data_processor/init.py Re-exports the new component entrypoint.
library/components/data_processing/yoda_data_processor/metadata.yaml Adds metadata for yoda_data_processor.
library/components/data_processing/yoda_data_processor/OWNERS Adds owners for yoda_data_processor.
library/components/data_processing/yoda_data_processor/README.md Adds component documentation for yoda_data_processor.
library/components/data_processing/yoda_data_processor/tests/init.py Declares the tests package for the component.
library/components/data_processing/yoda_data_processor/tests/test_component_unit.py Adds unit tests that mock HuggingFace datasets interactions.
library/components/data_processing/yoda_data_processor/tests/test_component_local.py Adds a LocalRunner execution test for the component.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mprahl
Copy link
Copy Markdown
Collaborator

mprahl commented Feb 11, 2026

@mprahl to review this

@EncHawk
Copy link
Copy Markdown
Author

EncHawk commented Feb 23, 2026

Hi @mprahl, could take a look at this

@EncHawk
Copy link
Copy Markdown
Author

EncHawk commented Mar 3, 2026

@mprahl could you review this? If there are no issue can this be merged?

Copy link
Copy Markdown

@manaswinidas manaswinidas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the restructuring work. The renames look correct, but there are a few issues worth addressing before merge — mainly around test quality, a missing library/__init__.py, stale docstrings referencing the now-deleted pyproject.toml packaging, and test fixture data mixed into the production tree. See inline comments.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tautological test: test_component_function_exists is redundant — if the top-level import from ..component import prepare_yoda_dataset fails, the entire module fails to load and this test never runs. It's asserting something Python's import system already guarantees. Remove it.

Mock-mirror anti-pattern: test_component_with_default_parameters reproduces the component's internal call sequence step-by-step via mocks. This makes the test brittle to any refactor (e.g., combining rename_column calls) even if output is unchanged. The actual transformation logic (add_yoda_prefix) is never tested against real data — .map.assert_called_once() only checks that .map() was invoked, not that it does the right thing.

Consider testing the add_yoda_prefix function directly with a small fixture dataset instead.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two issues:

  1. Zero assertions. This calls the component but never checks outputs — not even that the output datasets exist or have rows. "Didn't crash" is not a meaningful test contract.
  2. Undeclared network dependency. This hits the real HuggingFace endpoint (dvgodoy/yoda_sentences), making it an integration test that will fail in air-gapped CI. Either mock the download or move this to a separate integration test suite with appropriate markers.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import kfp.compiler is only used inside the if __name__ == "__main__" guard. Move it inside that block to avoid loading the compiler module during normal component usage and test execution.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring still references from kfp_components.components import training, but pyproject.toml (which defined the kfp_components package mapping) is deleted in this PR. These import paths are now dangling. Update the docstrings to reflect the new package structure, or remove them until the SDK integration defines the final import paths.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has name: invalid-dependency-semantic-versioning with fields like whatever: foo — clearly test fixture data. It shouldn't live in the production component tree under deployment/. Move it to a test_data/ or fixtures/ directory to avoid confusion.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Metadata" section manually duplicates everything from metadata.yaml (tags, dependencies, owners, last-verified date). This creates a maintenance burden — any metadata.yaml change requires a synchronized README update. Consider auto-generating this section or just linking to the YAML.

@@ -1,123 +0,0 @@
[build-system]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleting the entire build/packaging config (dependencies, ruff/pytest config, setuptools mappings) without any replacement leaves the repo un-buildable and un-lintable. If the intent is to absorb this into kubeflow-sdk, the PR should at minimum document what replaces pyproject.toml or land the SDK-side config first.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same stale-docstring issue as library/components/__init__.py — references kfp_components.pipelines which no longer resolves to anything after pyproject.toml removal.

Also: library/ itself has no __init__.py, so it's not a Python package. import library.pipelines will fail unless namespace packages are intentionally being used (which isn't documented).

@EncHawk
Copy link
Copy Markdown
Author

EncHawk commented Mar 16, 2026

Thank you for the review @manaswinidas will get to fixing it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants